I want to generate analytics on a per-user basis showing how many times they've viewed a particular page over the last 7 days, and how many times they've viewed that page during their lifetime as a user.
This will be tracked in 1d, 7d, 14d, 30d, and lifetime intervals. If a user visits the page today, they will have 1 visit in the 1d range, as well as 1 more visit in every date range above 1d, since visiting once today means you have visited at least once within each of the longer date ranges.
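To make the window semantics concrete, here is a minimal Python sketch of the counts I mean for one user and one page. The names `WINDOWS` and `window_counts` are made up for illustration and are not from my actual pipeline, and I'm assuming each window includes today.

```python
from datetime import date, timedelta

# Window lengths in days; None stands for the lifetime bucket.
WINDOWS = {"1d": 1, "7d": 7, "14d": 14, "30d": 30, "lifetime": None}


def window_counts(view_dates, today):
    """Count page views per date range for one user and one page."""
    counts = {}
    for name, days in WINDOWS.items():
        if days is None:
            # Lifetime: every view up to and including today.
            counts[name] = sum(1 for d in view_dates if d <= today)
        else:
            # An N-day window covers today and the N-1 days before it.
            start = today - timedelta(days=days - 1)
            counts[name] = sum(1 for d in view_dates if start <= d <= today)
    return counts
```

A single view today shows up as 1 in every range, which is the overlap described above.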
To do this I'm storing all events in a data lake, and rolling up these date range counts every 24 hours based on the criteria. The actual user data is in a document, but it wouldn't be feasible to store all events for a user in their document, given how much data that would eventually be and the inevitable data skew that would create on the cluster. This works for now, but it's taking more and more time to generate these rollups as the data grows, and even with partitioning, the number of pages we're doing this rollup on is growing at a pace where the current method may not scale gracefully.
When we receive these events, we could update the user's counts at runtime. But without the context of the date, there would be no "dropoff". If it were a lifetime count we could always increment that field, but the windowed counts need to update daily, since a page view today isn't a page view tomorrow.
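Roughly, the runtime update I have in mind would look like the sketch below, which also shows where it falls short. `record_view` and the counter keys are hypothetical names, not part of my real schema.

```python
def record_view(counts):
    """Increment every date range counter when a page-view event arrives."""
    for name in counts:
        counts[name] += 1
    return counts


counts = {"1d": 0, "7d": 0, "14d": 0, "30d": 0, "lifetime": 0}
record_view(counts)  # a single view arrives today

# Tomorrow, with no new events, nothing decrements "1d": the document
# still reports 1 view in the last day, although the correct value is 0.
# Only "lifetime" stays correct without knowing the date of each view.
```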
Something that stands out to me is that the counts will consistently gravitate towards one end. But that may just be an observation and not anything useful.
Is the only way to do this the way I've described, or is there a more clever way to update these fields?