Ranking algorithm based on time spent and user level

Think about a game where players try to solve problems, and each problem corresponds to a "level".
Players see a real-time ranking while they play.
Redis (redis.io) has a sorted set feature, which I plan to use.
But I don't know how to score the players:
PlayerA at 7 level, total game time 80 seconds
PlayerB at 7 level, total game time 65 seconds
PlayerC at 5 level, total game time 40 seconds
PlayerD at 1 level, total game time 200 seconds
The ranking I want is:
1) PlayerB - because level 7 and 65 seconds
2) PlayerA - because level 7 and 80 seconds
3) PlayerC - because level 5 and 40 seconds
4) PlayerD - because level 1 and 200 seconds
I tried scoring with
(timeSpent / level)
but it doesn't work well when a player is at a lower level and has also spent less time than the other players.

Short answer: you can use the following scoring function:
score = (level * HUGE_NUMBER) - timeSpent
For HUGE_NUMBER, pick a value that is larger than the maximum possible total timeSpent, so that a higher level always beats a lower level regardless of time.
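As a minimal sketch (assuming redis-py, a sorted set named "leaderboard" and a made-up MAX_TOTAL_TIME bound), scoring and reading the ranking could look like this:

import redis

r = redis.Redis()

MAX_TOTAL_TIME = 10_000            # assumed upper bound on total seconds per game
HUGE_NUMBER = MAX_TOTAL_TIME + 1   # must exceed any possible timeSpent

def update_player(player_id, level, time_spent):
    # Higher level always wins; within a level, less time wins.
    score = level * HUGE_NUMBER - time_spent
    r.zadd("leaderboard", {player_id: score})

def top_players(n=10):
    # ZREVRANGE returns members ordered by score, highest first.
    return r.zrevrange("leaderboard", 0, n - 1, withscores=True)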
While that might be enough for most cases, I would rather sort explicitly for this problem, to avoid any subtle, unseen bugs in the ranking.
Assuming that the level of a player is the dominant factor in the ranking, I'd first sort all players by level in descending order. This may give you something like this (note that it's not the final ranking yet):
1) PlayerA - because level 7 and 80 seconds
2) PlayerB - because level 7 and 65 seconds
3) PlayerC - because level 5 and 40 seconds
4) PlayerD - because level 1 and 200 seconds
Following that, I'd create sublists of players for each level and sort them by time in ascending order. In the above example, this second sort gives you the final correct ranking; a small sketch of the equivalent two-key sort follows the list.
1) PlayerB - because level 7 and 65 seconds
2) PlayerA - because level 7 and 80 seconds
3) PlayerC - because level 5 and 40 seconds
4) PlayerD - because level 1 and 200 seconds
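Both passes can be collapsed into a single sort with a composite key. A minimal sketch in Python, using the example data from the question (the tuple layout is just an assumption):

# Each player is a (name, level, time) tuple.
players = [("PlayerA", 7, 80), ("PlayerB", 7, 65),
           ("PlayerC", 5, 40), ("PlayerD", 1, 200)]

# Sort by level descending, then by time ascending; equivalent to the two passes above.
ranking = sorted(players, key=lambda p: (-p[1], p[2]))
# -> PlayerB, PlayerA, PlayerC, PlayerD

Because Python's sort is stable, doing it in two passes (first by time, then by level) gives the same result.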

Related

Why is the average response time decreasing when we increase the number of users?

I am using JMeter to run a performance test with different numbers of users. With 1 user the average response time is 1.4 seconds; with more users it seems logical that the average response time should go up, but instead it goes down. Can anyone explain why? The test scenario is a few interactions (2-3) with a chat bot.
Please help me understand these confusing results (users - test duration - average response time):
1 user - 30 seconds - 1.3 seconds (average response time)
5 users - 60 seconds - 0.92 seconds (average response time)
10 users - 60 seconds - 0.93 seconds (average response time)
20 users - 120 seconds - 0.92 seconds (average response time)
The first iteration of the first user often involves some overhead on the client side (most commonly DNS resolution), and can involve some overhead on the server side (server "warm-up"). That overhead is not incurred in the following iterations or by subsequent users.
So what you see as a reduction in the average time is really the shrinking impact of that slower "first user, first iteration" execution on the overall result. This is why it's important to collect a sufficient sample, so that such a local spike no longer matters much. My rule of thumb is at least 10,000 iterations before looking at any averages, although the comfort level is up to each tester.
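A tiny numeric illustration of that effect (the warm-up and steady-state times below are made up, not taken from the test):

# One slow first iteration dominates a small sample and vanishes in a large one.
warmup = 5.0     # hypothetical first-iteration time, seconds
steady = 0.9     # hypothetical steady-state response time, seconds

for n in (10, 100, 10_000):
    avg = (warmup + steady * (n - 1)) / n
    print(f"{n:>6} iterations -> average {avg:.3f} s")
# 10 -> 1.310 s, 100 -> 0.941 s, 10000 -> 0.900 s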
Also, when increasing the number of users you should not expect the average to get worse unless you have reached a saturation point; rather, it should stay stable. So if you expect your app to support no more than 20 users, your result is surprising, but if you expect the application to support 20,000 users, you should not see any degradation of the average at 20 users.
To test whether this is what is happening, run the 1-user test for much longer, so that the total number of iterations is similar to the 20-user run. Roughly, 20 users for 120 seconds is 20 × 120 s = 2400 s of user time, so a single user would need to run for about 40 minutes to reach a similar number of iterations.

If I have a 15 min sliding window, how can I collect daily/weekly aggregates?

I have a 15-minute sliding window and can compute aggregates over the data in that window at any given time. Due to memory constraints I can't increase the size of the window, but I still think I should be able to get aggregates (like trending items, which is basically a frequency counter) over a day or a week.
It doesn't have to be a very accurate count; it just needs to surface the top 3-5 items.
Will running a cron job every 15 minutes and putting the results into 4 (15-minute) counters work?
Can I update some kind of rolling counter over the aggregates?
Is there any other method to do this?
My suggestion is an exponentially decaying moving average, like the one used for the Unix load average. (See http://www.howtogeek.com/194642/understanding-the-load-average-on-linux-and-other-unix-like-systems/ for an explanation.)
What you do is pick a constant 0 < k < 1 then update every 5 minutes as follows:
moving_average = k * average_over_last_5_min + (1-k) * moving_average
This behaves roughly like an average over the last 5/k minutes. So if you set k = 5 / (24 * 60) ≈ 0.00347, you get roughly a daily moving average. Divide k by 7 and you get roughly a weekly moving average.
The averages won't be exact, but should work perfectly well to identify what things are trending recently.
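A minimal sketch of that update rule (the 5-minute cadence and k values follow the text above; the per-item dictionary and function names are assumptions):

UPDATE_MINUTES = 5
k_daily = UPDATE_MINUTES / (24 * 60)   # ~0.00347, roughly a one-day memory
k_weekly = k_daily / 7                 # roughly a one-week memory

def update(moving_average, average_over_last_5_min, k):
    return k * average_over_last_5_min + (1 - k) * moving_average

daily_avg = {}   # item -> decaying average of its 5-minute frequency

def tick(freq_last_5_min):
    # Call every 5 minutes with {item: count in the last 5 minutes}; returns the top 5.
    for item, freq in freq_last_5_min.items():
        daily_avg[item] = update(daily_avg.get(item, 0.0), freq, k_daily)
    return sorted(daily_avg, key=daily_avg.get, reverse=True)[:5]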

Evenly schedule timed items into fixed size containers

I got a task at work to evenly schedule timed commercial items into pre-defined commercial breaks (containers).
Each campaign has a set of commercials, with or without a spreading order. I need to allow users to choose multiple campaigns and distribute all the commercials to best fit the breaks within a time window.
Example of Campaign A:
Item | Duration | # of times to schedule | Order
A1   | 15 sec   | 2                      | 1
A2   | 25 sec   | 2                      | 2
A3   | 30 sec   | 2                      | 3
Required outcome:
Each item should appear only once in a break, no repeating.
If there is a specific order, try to best fit while keeping that order; if there is no order, shuffle the items.
At the end of the process the breaks should contain an even amount of commercial time.
An ideal spread would fully fit all the chosen campaigns into the breaks.
For example, with campaigns given as {Item, Duration, # of times, Order}:
Campaign A has the set {A1,15,2,1}, {A2,25,2,2}, {A3,10,1,3}
Campaign B has the set {B1,20,2,2}, {B2,35,3,1}
Campaign C has the set {C1,10,1,1}, {C2,15,2,3}, {C3,15,1,2}, {C4,40,1,4}
A client will choose to schedule those campaigns on a specific date that holds 5 breaks of 60 seconds each.
A good outcome would result in:
Container 1: {A1,15}{B2,35}{C1,10} total of 60 sec
Container 2: {C3,15}{A2,25}{B1,20} total of 60 sec
Container 3: {A3,10}{C2,15}{B2,35} total of 60 sec
Container 4: {C4,40}{B1,20} total of 60 sec
Container 5: {C2,15}{A3,10}{B2,35} total of 60 sec
Of course it's rare that everything fits so perfectly in real-life cases.
There are so many combinations with a large number of items that I'm not sure how to approach it. The order of items inside a break needs to be calculated dynamically so that the end result best fits all the items into the breaks.
If this framing is poor and someone has a better idea (like giving priority to items over order and scheduling based on priority, or something similar), I'll be glad to hear it.
It seems like simulated annealing might be a good way to approach this problem. Incorporate your constraints (keeping order, spreading evenly, fitting into the 60-second break) into the scoring function. Your random neighbour function might simply swap two items with each other or move an item to a different break.
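A rough skeleton of that idea, as a sketch only: the item representation, cost weights and cooling schedule below are all assumptions, and the scoring only loosely encodes the order, even-spread and 60-second constraints.

import math, random, copy

def cost(breaks, break_len=60):
    c = 0.0
    for brk in breaks:                                     # each break is a list of (name, duration, order)
        total = sum(duration for _, duration, _ in brk)
        c += abs(break_len - total)                        # penalty for under/over-filled break
        names = [name for name, _, _ in brk]
        c += 100 * (len(names) - len(set(names)))          # penalty for a repeated item in a break
        orders = [order for _, _, order in brk if order is not None]
        c += sum(1 for a, b in zip(orders, orders[1:]) if a > b)  # penalty for order violations
    return c

def neighbor(breaks):
    new = copy.deepcopy(breaks)
    a, b = random.randrange(len(new)), random.randrange(len(new))
    if new[a]:
        item = new[a].pop(random.randrange(len(new[a])))
        new[b].insert(random.randrange(len(new[b]) + 1), item)   # move an item to another position/break
    return new

def anneal(breaks, start_temp=10.0, cooling=0.995, steps=20_000):
    best = current = breaks
    temp = start_temp
    for _ in range(steps):
        candidate = neighbor(current)
        delta = cost(candidate) - cost(current)
        if delta < 0 or random.random() < math.exp(-delta / temp):
            current = candidate
            if cost(current) < cost(best):
                best = current
        temp *= cooling
    return best

The initial solution can be anything reasonable, e.g. the items dealt round-robin into the breaks; the annealing loop then accepts some worsening moves early on to escape local optima.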

Algorithm for aggregating stock chart datapoints out of many DB entries

I have a database with stock values in a table, for example:
id - unique id for this entry
stockId - ticker symbol of the stock
value - price of the stock
timestamp - timestamp of that price
I would like to create separate arrays for timeframes of 24 hours, 7 days and 1 month from my database entries, each array containing the data points for a stock chart.
For some stockIds, I have just a few data points per hour, for others it could be hundreds or thousands.
My question:
What is a good algorithm to "aggregate" the possibly many data points into a few? For example, for the 24-hour chart I would like at most 10 data points per hour. How do I handle exceptionally high or low values?
What is the common approach with regard to stock charts?
Thank you for reading!
Some options (assuming 10 points per hour, i.e. roughly one every 6 minutes):
For every 6-minute period, pick the data point closest to the centre of the period.
For every 6-minute period, take the average of the points over that period.
For each hour, find the maximum and minimum of each 4-minute period and pick the 5 largest maxima and the 5 smallest minima from those sets (4 minutes is a somewhat arbitrary choice).
I originally thought to pick the 5 minimum points and the 5 maximum points such that each maximum point is at least 8 minutes from the others, and similarly for the minimum points.
The 8 minutes is so that the chosen points don't all stack up on each other. Why 8 minutes? With an even distribution, 60/5 = 12 minutes, so moving a bit below that gives 8 minutes.
But in terms of implementation, the 4-minute approach is much simpler and should give similar results.
You'll have to see which one gives you the most desirable results. The last one is likely to give a decent indication of the variation across the period, whereas the second is likely to produce a more stable graph. The first can be a bit more erratic.
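A sketch of the first two options (the (timestamp, value) layout follows the table in the question; the bucket size and function name are assumptions):

from collections import defaultdict

BUCKET = 6 * 60  # seconds per bucket, i.e. 10 points per hour

def downsample(points, mode="average"):
    # points: list of (timestamp, value) pairs for one stockId, timestamps in seconds
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts // BUCKET].append((ts, value))

    out = []
    for bucket, pts in sorted(buckets.items()):
        if mode == "average":
            out.append((bucket * BUCKET, sum(v for _, v in pts) / len(pts)))
        else:  # "centre": keep the raw point nearest the middle of the bucket
            centre = bucket * BUCKET + BUCKET / 2
            out.append(min(pts, key=lambda p: abs(p[0] - centre)))
    return out

The min/max variant (the third option) would instead take max(pts, key=...) and min(pts, key=...) per 4-minute bucket and then keep the 5 most extreme of each per hour.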

Log data reduction for variable bandwidth data link

I have an embedded system which generates samples (16-bit numbers) at 1 millisecond intervals. The variable uplink bandwidth can at best transfer a sample every 5 ms, so I am looking for ways to adaptively reduce the data rate while minimizing the loss of important information -- in this case the minimum and maximum values in a time interval.
A scheme which I think should work involves sparse coding and a variation of lossy compression. Like this:
The system will internally store the min and max values during a 10ms interval.
The system will internally queue a limited number (say 50) of these data pairs.
No loss of min or max values is allowed but the time interval in which they occur may vary.
When the queue gets full, neighboring data pairs will be combined, starting at the (old) end of the queue, so that the combined min/max pairs then represent 20ms intervals.
The scheme should be iterative so that further interval combining to 40ms, 80ms etc is done when necessary.
The scheme should be linearly weighted across the length of the queue so that there is no combining for the newest data and maximum necessary combining of the oldest data.
For example with a queue of length 6, successive data reduction should cause the data pairs to cover these intervals:
initial: 10 10 10 10 10 10 (60ms, queue full)
70ms: 10 10 10 10 10 20
80ms: 10 10 10 10 20 20
90ms: 10 10 20 20 20 20
100ms: 10 10 20 20 20 40
110ms: 10 10 20 20 40 40
120ms: 10 20 20 20 40 40
130ms: 10 20 20 40 40 40
140ms: 10 20 20 40 40 80
New samples are added on the left, data is read out from the right.
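A rough sketch of such a queue (the capacity and the choice of which pair to combine are assumptions; the point is only that min/max values are never dropped, the interval they are attributed to just grows):

from collections import deque

CAPACITY = 50

def combine_oldest_equal_pair(q):
    # Combine the oldest adjacent pair of equal width (fallback: the two oldest entries).
    idx = len(q) - 2
    for i in range(len(q) - 2, -1, -1):
        if q[i][0] == q[i + 1][0]:
            idx = i
            break
    a, b = q[idx], q[idx + 1]
    q[idx] = (a[0] + b[0], min(a[1], b[1]), max(a[2], b[2]))
    del q[idx + 1]

def push(q, lo, hi, interval_ms=10):
    # q holds (interval_ms, min, max) triples; newest on the left, oldest on the right.
    if len(q) >= CAPACITY:
        combine_oldest_equal_pair(q)
    q.appendleft((interval_ms, lo, hi))

q = deque()
# every 10 ms: push(q, minimum, maximum); the uplink reads out from the right with q.pop()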
This idea obviously falls into the categories of lossy compression and sparse coding.
I assume this problem must occur often in data-logging applications with limited uplink bandwidth, so some "standard" solution might have emerged.
I have deliberately simplified and left out other issues such as time stamping.
Questions:
Are there already algorithms which do this kind of data logging? I am not looking for standard lossy picture or video compression algorithms, but something more specific to data logging as described above.
What would be the most appropriate implementation for the queue? Linked list? Tree?
The term you are looking for is "lossy compression" (see http://en.wikipedia.org/wiki/Lossy_compression). The optimal compression method depends on various aspects, such as the distribution of your data.
As I understand it, you want to transmit the min() and max() of all samples in a time period, e.g. transmit min/max every 10 ms while taking samples every 1 ms?
If you do not need the individual samples, you can simply compare them after each sample:
# TYPE_MAX, TYPE_MIN, getSample() and send() are placeholders for the platform-specific parts
i = 0
minimum = TYPE_MAX  # the first sample always overwrites the initial values
maximum = TYPE_MIN
while True:
    sample = getSample()
    if sample < minimum:
        minimum = sample
    if sample > maximum:
        maximum = sample
    i += 1
    if i % 10 == 0:
        send(minimum, maximum)
        # if each period should be handled separately:
        # minimum = TYPE_MAX; maximum = TYPE_MIN
You can also save bandwidth by sending data only when it changes (this depends on the sample data: if it does not change very quickly, you will save a lot).
Define a combination cost function that matches your needs, e.g. (len(i) + len(i+1)) / i^2, then iterate over the queue to find the "cheapest" pair to combine.
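A small sketch of that selection step, reusing the (interval, min, max) triples from the queue sketch above (indexing from the newest end and the 1-based denominator are assumptions, the latter to avoid dividing by zero):

def cheapest_pair_index(q):
    # q holds (interval_ms, min, max) triples, newest first; returns the index i of
    # the adjacent pair (i, i+1) with the lowest combination cost.
    def cost(i):
        return (q[i][0] + q[i + 1][0]) / float((i + 1) ** 2)
    return min(range(len(q) - 1), key=cost)

Plugging this into the push() above instead of combine_oldest_equal_pair() biases the combining strongly toward the oldest, shortest intervals.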
