Definition of Daily Active Users (DAU)

Suppose there exists an application X.
Is DAU defined as (a) the number of users that log in to X every single day over a specified period, or (b) the average number of users that log in to X each day over that period?
For example:
Specified period = 5 days
The same 50 users log in to X every day. In addition, a varying number of extra users log in each day, say 20, 40, 10, 25, 30.
Does DAU = 50, or does DAU = (70+90+60+75+80)/5?

DAU is the number of unique users who visit the site on a given day. In other words, DAU is defined for a single day only, not for any longer period.
In your example, DAU for the first day is 70 users (50 + 20), for the second day 90, for the third 60, and so on.
DAU = (70+90+60+75+80)/5 is not a DAU; it is the average DAU over the 5 days,
and 50 is not a DAU either: it is the number of users who were active on every one of the 5 days.
If you want to calculate an active-users figure for a longer period, you can use Weekly Active Users (WAU), Monthly Active Users (MAU) or, let's say, a [5 days] Active Users counter.
To calculate "[N days]AU", count the number of unique users seen during the measurement period, such as within the previous N days.
So, if User1 (and no one else) logs in to the site every 5 days, you will still have [N days]AU = 1 for the site, because you have only 1 unique active user during that period.
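A minimal sketch of the distinction (the login events here are made up): DAU is a per-day unique count, while an [N days]AU deduplicates users across the whole window, so it is not the sum or average of the daily figures.

```python
from collections import defaultdict

# Hypothetical login events as (user_id, day) pairs.
events = [
    ("alice", 1), ("bob", 1),
    ("alice", 2), ("carol", 2),
    ("alice", 3),
]

# DAU: one set of unique user ids per day.
users_by_day = defaultdict(set)
for user, day in events:
    users_by_day[day].add(user)

for day in sorted(users_by_day):
    print(f"day {day}: DAU = {len(users_by_day[day])}")  # 2, 2, 1

# [3 days]AU: unique users across the whole window, not a sum of DAUs.
print(f"[3 days]AU = {len({user for user, _ in events})}")  # 3
```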

Related

Understanding locust users and spawn rate

I am trying to understand the exact relationship between the number of users and the spawn rate in Locust, which I could not find anywhere in the documentation.
So assume that I set the number of users to be 100, the spawn rate to be 50 users per second, and the test duration to be 10 seconds.
Is this a correct interpretation that in 2 seconds (100/50) all the required 100 users are ready and for the next 8 seconds we have 100 constant users running the tasks?
Furthermore, if I use the LoadTestShape class, where I can change the number of users and the spawn rate at every tick, so that for the next tick I increase the number of users to 150: does it take 1 second to spawn another 50 users to reach 150, or does it spawn a whole new batch of 150 users in 3 seconds?
Is this a correct interpretation that in 2 seconds (100/50) all the required 100 users are ready and for the next 8 seconds we have 100 constant users running the tasks?
Yes.
Furthermore, if I use the LoadTestShape class, where I can change the number of users and the spawn rate at every tick, so that for the next tick I increase the number of users to 150: does it take 1 second to spawn another 50 users to reach 150, or does it spawn a whole new batch of 150 users in 3 seconds?
With LoadTestShape you set the target user count (total) and the rate at which you want to approach that number, so option #1.
Note that the exact rate of the ramp-up is not 100% guaranteed. If you set the new target to take effect after X seconds, you can't be sure you'll have all 150 users running at X+1 seconds.
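A minimal LoadTestShape sketch of the behaviour described above (the user class, URL, and timings are made up for illustration): `tick()` returns the total target user count plus the spawn rate used to approach it, or `None` to stop the test.

```python
from locust import HttpUser, LoadTestShape, between, task

class WebsiteUser(HttpUser):
    wait_time = between(1, 2)

    @task
    def index(self):
        self.client.get("/")

class StepShape(LoadTestShape):
    def tick(self):
        run_time = self.get_run_time()
        if run_time < 10:
            return (100, 50)   # ramp to 100 users at 50 users/s (~2 s)
        if run_time < 20:
            return (150, 50)   # only the extra 50 users are spawned (~1 s)
        return None            # end the test
```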

Time-based allocation of assets

I am working on a resource allocation problem and was looking for an algorithm that I could use. Here are the data items:
Each timeslot is 15 mins in duration
There are n resources of a given type
A resource can be requested for k timeslots (e.g. 1 hour = 4 timeslots) starting at a certain time (e.g. 10 am)
Input to the algorithm: a request for a resource for k timeslots starting at hour h; would a resource be available to fulfil the request?
E.g. can I hire a car for 1 hour at 10:00 am from an inventory of 4 cars, when 2 of these cars are already booked between 9:30 and 10:00 am?
Any pointers on how this can be done would be gratefully received.
Al
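No answer is recorded here, but for illustration, one simple way to model the problem is to count bookings per 15-minute slot and check that every requested slot has spare capacity. All names below are hypothetical.

```python
from collections import defaultdict

SLOT_MINUTES = 15

def to_slot(hour, minute=0):
    """Map a wall-clock time to a discrete 15-minute slot index."""
    return (hour * 60 + minute) // SLOT_MINUTES

class SlotInventory:
    """n interchangeable resources, booked per 15-minute slot."""

    def __init__(self, total):
        self.total = total
        self.booked = defaultdict(int)  # slot index -> resources in use

    def is_available(self, start_slot, n_slots):
        return all(self.booked[s] < self.total
                   for s in range(start_slot, start_slot + n_slots))

    def book(self, start_slot, n_slots):
        if not self.is_available(start_slot, n_slots):
            return False
        for s in range(start_slot, start_slot + n_slots):
            self.booked[s] += 1
        return True

# 4 cars, 2 of them already booked 9:30-10:00.
cars = SlotInventory(total=4)
cars.book(to_slot(9, 30), 2)
cars.book(to_slot(9, 30), 2)
# Hire a car 10:00-11:00 (4 slots)?
print(cars.is_available(to_slot(10), 4))  # True: those bookings end at 10:00
```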

Best data structure to find price difference between two time frames

I am working on a project where my task is to find the % price change between two points in time for 100+ stocks in an efficient way.
The time frames are predefined and can only be 5 mins, 10 mins, 30 mins, 1 hour, 4 hours, 12 hours, or 24 hours.
Given a time frame, I need a way to efficiently figure out the % price change of all the stocks that I am tracking.
In the current implementation, I am fetching price data for those stocks every second and dumping it into a price table.
I have another cron job which, every few seconds, updates the % change of each stock based on the values in the price table.
The solution kind of works, but it is not efficient. Is there any data structure or algorithm that I can use to find the % change efficiently?
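One possibility, sketched under the question's own assumptions (one price sample per second, a fixed set of lookback windows): keep a per-stock in-memory ring buffer, so the price from t seconds ago sits at a known offset from the end, rather than requiring a table scan. The class and names are illustrative.

```python
from collections import deque

# Predefined lookback windows, in seconds.
TIMEFRAMES = [300, 600, 1800, 3600, 4 * 3600, 12 * 3600, 24 * 3600]

class PriceHistory:
    """Ring buffer of per-second prices for one stock."""

    def __init__(self):
        self.prices = deque(maxlen=max(TIMEFRAMES) + 1)

    def add(self, price):
        self.prices.append(price)  # called once per second

    def pct_change(self, seconds):
        if len(self.prices) <= seconds:
            return None                    # not enough history yet
        old = self.prices[-1 - seconds]    # sample from `seconds` ago
        return (self.prices[-1] - old) / old * 100.0

history = PriceHistory()
for p in (100.0, 101.0, 99.0, 103.0):
    history.add(p)
print(history.pct_change(3))  # 3.0 (% change vs 3 samples ago)
```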

Maximizing profits for given stock quotes and volumes

Given an array of stock quotes Q[0], ..., Q[n-1] in chronological order and corresponding volumes V[0], ..., V[n-1] as numbers of shares, how can you find the optimal times and volumes to buy or sell when the volumes of your trades are each limited by V[0], ..., V[n-1]?
I assume that you want to start and end with 0 shares in each stock and that you have unlimited capital.
The problem can be boiled down to buying at the lowest prices available and selling at the highest, with the side condition that buying a share has to be done prior to selling it.
I would process the data in time order and add purchases as long as there is available volume at a higher price in the future (for each purchase, earmark the same number of shares as sold at the highest future price available).
Continue to move forward in time and add buys as long as there is a profitable time to sell in the future. If there is surplus volume available but no profitable selling spot in the future, look back to see whether the current price is lower than that of any purchase already made. In that case, exchange the most expensive shares from the past for the cheaper ones, but only if there is a future selling point available. Also check whether there is any profitable selling point available for the scrapped purchase order.
Example:
Day Price Volume
1 100 1000
2 80 1000
3 110 1000
4 70 1000
5 120 2000
Day 1:
Purchase 1000 at 100 per share. Sell 1000 day 5 at 120.
Day 2:
Purchase 1000 at 80 per share. Sell 1000 day 5 at 120.
Day 3:
No available profitable selling opportunity because all future volume at prices above 110 is already booked!
Look back and see if you have purchased at prices above 110.
You haven't, so there is no purchase.
Day 4:
No available profitable opportunity because all future volumes at prices above 70 are already booked!
Look back and see if you have purchased at prices above 70.
Replace purchase of 1000 shares day 1 with purchase of 1000 shares at 70 day 4.
Re-examine the shares of day one and check if there is any other profitable sale available (you only need to consider the timeline up to day 4).
There is, so purchase 1000 at 100 per share day 1 and sell them at 110 per share day 3.
The final order book is:
Day Price Volume Order type shares owned
1 100 1000 Buy 1000
2 80 1000 Buy 2000
3 110 1000 Sell 1000
4 70 1000 Buy 2000
5 120 2000 Sell 0
Total profit: 100000
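For what it's worth, the same greedy idea can be written compactly with a min-heap of (price, units) entries: each day sells against the cheapest earlier units, and re-inserting the sale price lets a later, even higher price displace an earlier sale, which plays the role of the look-back/replace step above. This is a sketch of that variant, not the exact procedure described:

```python
import heapq

def max_profit(prices, volumes):
    heap = []   # [price, units] entries; heapq orders by price
    profit = 0
    for p, v in zip(prices, volumes):
        matched = 0
        # Sell up to v units against the cheapest units seen so far.
        while matched < v and heap and heap[0][0] < p:
            price, units = heap[0]
            take = min(units, v - matched)
            profit += (p - price) * take
            matched += take
            if take == units:
                heapq.heappop(heap)
            else:
                heap[0][1] -= take   # price key unchanged, heap stays valid
        if matched:
            # Re-insert the sale price so a later, even higher price can
            # "undo" this sale and take the better one instead.
            heapq.heappush(heap, [p, matched])
        heapq.heappush(heap, [p, v])  # today's volume as buy candidates
    return profit

print(max_profit([100, 80, 110, 70, 120],
                 [1000, 1000, 1000, 1000, 2000]))  # 100000
```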

Algorithm for aggregating stock chart datapoints out of many DB entries

I have a database with stock values in a table, for example:
id - unique id for this entry
stockId - ticker symbol of the stock
value - price of the stock
timestamp - timestamp of that price
I would like to create separate arrays for timeframes of 24 hours, 7 days and 1 month from my database entries, each array containing datapoints for a stock chart.
For some stockIds, I have just a few data points per hour, for others it could be hundreds or thousands.
My question:
What is a good algorithm to "aggregate" the possibly many datapoints into a few? For example, for the 24-hour chart I would like to have at most 10 datapoints per hour. How do I handle exceptionally high/low values?
What is the common approach in regards to stock charts?
Thank you for reading!
Some options (assuming 10 points per hour, i.e. one roughly every 6 minutes):
For every 6-minute period, pick the data point closest to the centre of the period
For every 6-minute period, take the average of the points over that period
For each hour, find the maximum and minimum for each 4-minute period, and pick the 5 largest maxima and 5 smallest minima from these respective sets (4 minutes is somewhat arbitrarily chosen)
I originally thought to pick the 5 minimum points and the 5 maximum points such that each maximum point is at least 8 minutes from the next, and similarly for the minimum points.
The 8 minutes here is so that we don't have all the points stacked up on each other. Why 8 minutes? At an even distribution, 60/5 = 12 minutes, so moving a bit away from that gives us 8 minutes.
But in terms of implementation, the 4-minute approach is much simpler and should give similar results.
You'll have to see which one gives you the most desirable results. The last one is likely to give a decent indication of the variation across the period, whereas the second is likely to produce a more stable graph. The first can be a bit more erratic.
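A sketch of options 2 and 3 on (timestamp, value) points, under the bucket widths discussed above (6 minutes for the average, 4 minutes for the min/max variant applied to one hour of data):

```python
from statistics import mean

def buckets(points, width_seconds):
    """Group (timestamp, value) points into fixed-width time buckets."""
    grouped = {}
    for ts, value in points:
        grouped.setdefault(ts // width_seconds, []).append((ts, value))
    return [grouped[k] for k in sorted(grouped)]

def downsample_average(points, width_seconds=360):
    """Option 2: one averaged point per 6-minute bucket."""
    return [(mean(ts for ts, _ in b), mean(v for _, v in b))
            for b in buckets(points, width_seconds)]

def downsample_extremes(hour_points, width_seconds=240, keep=5):
    """Option 3: per 4-minute min/max, then the 5 highest maxima and
    the 5 lowest minima across the hour, returned in time order."""
    groups = buckets(hour_points, width_seconds)
    maxima = [max(b, key=lambda p: p[1]) for b in groups]
    minima = [min(b, key=lambda p: p[1]) for b in groups]
    chosen = sorted(maxima, key=lambda p: p[1])[-keep:]
    chosen += sorted(minima, key=lambda p: p[1])[:keep]
    return sorted(set(chosen))  # dedupe: a point can be both max and min
```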
