Algorithm recommendation for calculating score jumps - algorithm

I have to come up with an algorithm that determines jumps and changes in a person's scores. Imagine that a person participates in a competition every day and the scores are recorded daily. My task is to compute the person's performance ratio over a given time interval. For example: a person scored 7 yesterday and 6 today, which means the performance is negative: -1.
My current solution:
I have two collections of numbers representing scores, where each element is a person's daily score (three days each):
dataFor2014-07-11/13 = {6,6,6}
dataFor2014-07-13/15 = {6,3,5}
double personsScores = AVG(dataFor2014-07-13/15) - AVG(dataFor2014-07-11/13);
Output: 4.67 - 6 = -1.33 (the person's performance for the past six days is negative.)
Do you think this is a reasonable algorithm? Do you have any suggestions for improving it, or can you recommend a better solution?

I think this is more of a mathematical problem, and the Math forum would probably be a better place to ask.
Probably, convolution filters are the way to go. This is a technique to make a 'graph' through some points (the individual scores).
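As a minimal sketch of the convolution idea, assuming a uniform (moving-average) kernel since the answer does not say which filter to use: smooth the daily scores, then compare the last smoothed value with the first. The function names and the window of 3 days are illustrative choices, and the five daily scores are the question's two windows merged on the shared date.

```python
def moving_average(scores, window=3):
    """Convolve the scores with a uniform kernel, keeping only full windows."""
    if window < 1 or window > len(scores):
        raise ValueError("window must be between 1 and len(scores)")
    kernel = [1.0 / window] * window
    return [
        sum(k * s for k, s in zip(kernel, scores[i:i + window]))
        for i in range(len(scores) - window + 1)
    ]


def performance_change(scores, window=3):
    """Difference between the last and the first smoothed value."""
    smoothed = moving_average(scores, window)
    return smoothed[-1] - smoothed[0]


# Daily scores for 2014-07-11 .. 2014-07-15, taken from the question.
daily_scores = [6, 6, 6, 3, 5]
change = performance_change(daily_scores)
```

With a window of 3 this reproduces the question's "average of the last three days minus average of the first three days", but the smoothed series also shows the intermediate days.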

Related

Normalizing workouts based on activity, total mileage, and total time

My friends and I are competing in our own fitness challenge (Sober October) where we are keeping track of Activity, Total Time Spent Moving, and Distance. Our activities include running (outdoors), running (treadmill), running (elliptical), rowing, biking (stationary), biking (outdoors), swimming, and stair stepper.
As a group, we weren't really interested in using a calorie estimation because those results can be easily manipulated by increasing the weight that the equation uses, so we wanted to keep it based on just distance and time.
What kind of equation should I use to best normalize such exercises? I'm looking for something that would weight distance and time differently based on the activity; for example, when compared to running, biking should give more weight to time than to mileage because it takes less work to go a mile on a bike than it does on foot.
I was able to find this article on how calories are calculated, and just thought about removing the weight portion of the equation to get our normalized number, but wanted to see if there was a better way to calculate what I'm looking for.
Objective measure
You are seeking an objective measurement which is independent of weight. Use METs.
A human expends a baseline of one MET sitting quietly. Maybe your measure will be excess-MET-hours.
Score = (METs - 1) × Hours
MET values
On that link above you can find reference MET values for various activities, including several of your target activities. These are independent of speed.
You can further improve the calculation by factoring in your distance/time measurements. For example, given the cited MET figures:
Walking slowly (1 mph) = 2.0 MET
Walking (3 mph) = 3.0 MET
Jogging (6.8 mph) = 11.2 MET
You can fit them to a curve. Use Desmos.
So your score for walking/jogging/running is:
Excess MET-hours = [(1 + 0.2 × (miles/hours)^2) - 1] × hours
You can make similar estimations for other activities.
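The walking/jogging/running formula above can be sketched in code as follows. This assumes the fitted curve METs = 1 + 0.2 × speed² (speed in mph) from the answer; the function name and the example workouts are made up for illustration.

```python
def excess_met_hours(miles, hours):
    """Excess MET-hours for walking/jogging/running.

    Uses the fitted curve METs = 1 + 0.2 * mph**2 and the score
    (METs - 1) * hours from the answer above.
    """
    if hours <= 0:
        return 0.0
    speed_mph = miles / hours
    mets = 1 + 0.2 * speed_mph ** 2
    return (mets - 1) * hours


run_score = excess_met_hours(miles=6.0, hours=1.0)   # 6 mph for one hour
walk_score = excess_met_hours(miles=3.0, hours=1.0)  # 3 mph for one hour
```

Other activities would need their own fitted curve; only the body of the function changes.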

How to give players a score on a ranking/prediction task?

I have a website built with php/mysql, and I am looking for help in communicating to a Programmer what I want him to do with a Poll/Prediction game that I am trying to create.
For purposes of discussion, assume a game where perhaps 100 players try to predict the top 5 finishers in a Golf Tournament of perhaps 9 Golfers.
I am looking for help in how to create and assign a score based upon the accuracy of prediction.
The players provide a rank ordering using a drag and drop function to order the players from 1 through 5. This ordering has already been coded, and the ranks are stored somehow in the DB (I do not know how).
My initial thinking is to ask the coder to create a script which will assign a score from 1 to 5 for each Golfer that the player nominated to be in the Top 5.
So, a player who predicted perfectly would be awarded a perfect score of 12345.
His first golfer receives a 1 for finishing first, his second a 2 for finishing second, his third a 3 for finishing third, and so on.
Anybody less than perfect would have a score higher than 12345.
Players who got the first four positions correct would have to be differentiated on the basis of the finish of their fifth Golfer.
So, one might score 12347 and the other 12348 and the player with the highest score (12348) would be the loser in a matchup of the two players.
A player who did poorly, might have a score of 53419.
Question:
Is this a viable way of creating a score which the players of my game can be ranked upon?
Is it possible to instead simply have something like a Spearman Rank-Order Correlation calculated comparing the Actual Finish Positions with the Predicted Finish Positions for each player,
and then rank players on the basis of the correlation coefficients for their rankings?
Thanks for any help in clarifying how to conceptualize this before approaching a programmer who gets annoyed when I don't really know what I want him to do ahead of time.
It's quite an interesting problem.
It seems that there are three components that need to be considered in the scoring: the number of correct predictions, the order of correct predictions, and the weight of correct predictions.
For example, assume the truth is:
1,5,10,15,20
Here are some predictions:
1,6,7,8,9 : only predicted first one
2,1,10,21,30 : 1 and 10, but the order of 1 is incorrect
20,15,1,5,30 : hit four in the top 5, but the orders are incorrect
It depends on what you value most. You may first check how many of the top 5 the user has predicted, add a value for each, and then penalize wrong orders. The weight for each position should also be different; this way
1,5,10,15,20 will rank higher than 1,5,10,20,15 and higher than 1,10,5,20,15
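One way to sketch this "count hits, then weight positions" idea is shown below. The hit bonus of 1 and the position weights 5, 4, 3, 2, 1 are arbitrary assumptions chosen only so that the three predictions above come out in the stated order; a higher score is better.

```python
def prediction_score(predicted, actual_top,
                     position_weights=(5, 4, 3, 2, 1), hit_bonus=1):
    """Score a top-N prediction against the actual top-N finishers."""
    actual_set = set(actual_top)
    score = 0
    for predicted_golfer, actual_golfer, weight in zip(
            predicted, actual_top, position_weights):
        if predicted_golfer in actual_set:
            score += hit_bonus   # golfer is somewhere in the actual top N
        if predicted_golfer == actual_golfer:
            score += weight      # golfer is in exactly the right place
    return score


truth = [1, 5, 10, 15, 20]
perfect = prediction_score([1, 5, 10, 15, 20], truth)
last_two_swapped = prediction_score([1, 5, 10, 20, 15], truth)
more_swaps = prediction_score([1, 10, 5, 20, 15], truth)
```

Changing the relative size of the hit bonus and the position weights shifts the balance between "picked the right golfers" and "picked the right order".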
Spearman may work, but I feel it could be too coarse for your purpose.
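For comparison, a minimal sketch of the Spearman rank-order correlation. It assumes both orderings contain exactly the same golfers with no ties (for example, every player orders the whole field), which is a simplification of the top-5 nomination described in the question.

```python
def spearman_rho(predicted_order, actual_order):
    """Spearman rank correlation between two orderings of the same items."""
    n = len(actual_order)
    if n < 2 or len(predicted_order) != n \
            or set(predicted_order) != set(actual_order):
        raise ValueError("both orderings must contain the same 2+ items")
    actual_rank = {item: rank for rank, item in enumerate(actual_order, start=1)}
    d_squared = sum(
        (rank - actual_rank[item]) ** 2
        for rank, item in enumerate(predicted_order, start=1)
    )
    return 1 - 6 * d_squared / (n * (n * n - 1))


actual = ["A", "B", "C", "D", "E"]
rho_perfect = spearman_rho(["A", "B", "C", "D", "E"], actual)
rho_reversed = spearman_rho(["E", "D", "C", "B", "A"], actual)
rho_last_swapped = spearman_rho(["A", "B", "C", "E", "D"], actual)
```

Note that a swap at the top and a swap at the bottom cost the same here, which is one sense in which the measure is coarse.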
This is actually very similar to a problem that search engines have. E.g., in search engine evaluation, the actual outcomes are preferred results provided by humans, and the predicted outcomes are the results delivered by the search engine. In both your task and for search engines, I'd guess you care a lot more about the accuracy of the winner than the accuracy of the 5th place finisher. If that is the case, then the mean average precision is probably a good measure.
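A rough sketch of average precision for one prediction follows, assuming the "relevant" items are simply the golfers who actually finished in the top 5. Averaging this value over several tournaments would give a mean average precision per player. The function name is illustrative.

```python
def average_precision(predicted, relevant):
    """Average precision of a ranked prediction against a set of relevant items."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, item in enumerate(predicted, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / k   # precision at cut-off k
    return precision_sum / len(relevant)


truth = [1, 5, 10, 15, 20]
ap_perfect = average_precision([1, 5, 10, 15, 20], truth)
ap_first_only = average_precision([1, 6, 7, 8, 9], truth)
ap_two_hits = average_precision([2, 1, 10, 21, 30], truth)
```

Because the truth is treated as a set, this rewards naming actual top-5 golfers early in the list but does not check their exact finishing order.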

Total Score ranking algorithm

Background:
I am looking for a way to calculate the score of a piece of audio based on listeners' feedback. Each time a user listens to the track, they must vote on whether they like it, a simple yes or no. Each track then has a score based on the number of yes and no votes.
Additionally I would like to decay the value of each vote uniformly over the course of 31 days, so after this amount of time, its value is 0 and doesn't contribute to the overall total score.
I have found a lot of discussions based on reddit and hacker news ranking algorithms, but these seem to decay the total score, and not individual votes themselves. Each vote will have a different amount of decay, based on when the vote was originally cast.
Can anyone help or recommend some material to look at?
Thanks
You could model it as "yes" = 1.0 and "no" = 0.0.
Then, the value of a vote on the nth day after it was cast = (31-n)/31. Add one more condition: if n > 31, set the value to 0.
Hope this answers your question.
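A minimal sketch of this linear per-vote decay, assuming each vote is stored as a (liked, age in days) pair; the storage format and function names are assumptions for illustration.

```python
def linear_vote_weight(days_since_vote, lifetime_days=31):
    """Weight (lifetime - n) / lifetime, clamped to 0 once the vote has expired."""
    if days_since_vote >= lifetime_days:
        return 0.0
    return (lifetime_days - max(days_since_vote, 0)) / lifetime_days


def track_score(votes, lifetime_days=31):
    """Sum of decayed votes; `votes` is an iterable of (is_yes, days_since_vote)."""
    return sum(
        (1.0 if is_yes else 0.0) * linear_vote_weight(age, lifetime_days)
        for is_yes, age in votes
    )


# One fresh "yes", one fresh "no", and one "yes" that has fully decayed.
score = track_score([(True, 0), (False, 0), (True, 31)])
```

Since the weight depends only on each vote's own age, the score can be recomputed at read time without storing a decayed total.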
What acceleration do you want on the degradation? A common choice, because it is easy to implement, is reciprocal decay. Score a 1 for a like and a -1 for a dislike. Then, when adding up the likes and dislikes, divide each vote by the number of days since it was cast. On day 1, the vote will have an absolute value of 1. On day 2, it will be 1/2. On day 3, it will be worth 1/3, and so on. On day 31, it will be worth 1/31 (about 0.03).
The problem with reciprocal decay is that it drops very quickly. You can use many other methods, such as multiplying by log(11-d), where d=1 on the first day, 2 on the second day, and so on. Since log(1) = 0, that weight reaches 0 on day 10, so it only allows 10 days of degradation. log(31-d) would allow 30 days. You need to ensure you don't try to do log(0) or log(-x).
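Both decay shapes described above can be sketched as follows: the divide-by-age weight and the log(11-d) variant. The base-10 logarithm is an assumption (it makes the day-1 weight exactly 1 for a horizon of 11), and the clamping to 0 is how this sketch avoids log(0) and log of a negative number.

```python
import math


def reciprocal_weight(day):
    """Weight 1/day, where day = 1 on the day the vote is cast."""
    if day < 1:
        raise ValueError("day must be >= 1")
    return 1.0 / day


def log_weight(day, horizon=11):
    """Weight log10(horizon - day), clamped to 0 once the argument reaches 1."""
    remaining = horizon - day
    if remaining <= 1:   # log10(1) is 0; smaller arguments are not allowed
        return 0.0
    return math.log10(remaining)


def decayed_score(votes, weight=reciprocal_weight):
    """`votes` is an iterable of (sign, day) with sign +1 for like, -1 for dislike."""
    return sum(sign * weight(day) for sign, day in votes)


total = decayed_score([(+1, 1), (+1, 2), (-1, 4)])
```

With horizon=31 the day-1 weight is log10(30), which is larger than 1, so dividing by that value would be needed if weights should start at 1.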
Another problem with this entire model is how to handle things that only have old votes. What if something has nothing but likes, but all the likes are old? It will register as not liked much because all the likes have degraded.

Feedback on ranking algorithm options for my website

I am currently working on writing an algorithm for my new site I plan to launch soon. The index page will display the "hottest" posts at the moment.
Variables to consider are:
Number of votes
How controversial the post is (# between 0-1)
Time since post
I have come up with two possible algorithms, the first and most simple is:
controversial * (numVotesThisHour / (numVotesTotal - numVotesThisHour))
Denom = numVotesThisHour if numVotesTotal - numVotesThisHour == 0
Highest number is hottest
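A minimal sketch of this first formula, including the fallback denominator. Returning 0 for a post with no votes at all is an assumption the question does not cover.

```python
def hotness_recent_share(controversial, votes_this_hour, votes_total):
    """controversial * votes_this_hour / (votes_total - votes_this_hour)."""
    older_votes = votes_total - votes_this_hour
    # Fallback from the question: use votes_this_hour when there are no older votes.
    denominator = older_votes if older_votes != 0 else votes_this_hour
    if denominator == 0:
        return 0.0   # assumption: a post without any votes is not hot
    return controversial * votes_this_hour / denominator


new_post = hotness_recent_share(0.5, votes_this_hour=5, votes_total=5)
```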
My other option is to use an algorithm similar to Reddit's (except that the score decreases as time goes by):
[controversial * log(x)] - (TimePassed / interval)
x = numVotesTotal if numVotesTotal >= 10, otherwise 10
Highest number is hottest
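The second formula can be sketched like this. The base-10 logarithm is an assumption (the question only writes log), and time_passed and interval are assumed to be in the same unit, for example seconds.

```python
import math


def hotness_decaying(controversial, votes_total, time_passed, interval):
    """controversial * log10(max(votes_total, 10)) - time_passed / interval."""
    x = max(votes_total, 10)   # the floor of 10 keeps log10(x) >= 1
    return controversial * math.log10(x) - time_passed / interval


fresh_small_post = hotness_decaying(1.0, votes_total=5, time_passed=0, interval=3600)
older_big_post = hotness_decaying(1.0, votes_total=100, time_passed=3600, interval=3600)
```

In this example a 100-vote post that is one interval old ties with a brand-new post below the 10-vote floor, which shows how interval trades votes against age.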
The first algorithm would allow older posts to become "hot" again in the future while the second one wouldn't.
So my question is, which one of these two algorithms do you think is more effective? Which one do you think will display the truly "hot" topics at the moment? Can you think of any advantages or disadvantages to using one over the other? I just want to make sure I don't overlook anything so that I can ensure the content is as relevant as possible. Any feedback would be great! Thanks!
Am I missing something? In the first formula you have numVotesTotal in the denominator, so a higher all-time number of votes means a post will never be as hot, even if it is not very old.
For example, say I have two posts, P1 and P2 (both equally controversial). P1 has numVotesTotal = 20, and P2 has numVotesTotal = 1000. Now in the last hour P1 gets numVotesThisHour = 10 and P2 gets numVotesThisHour = 200.
According to the algorithm, P1 scores 10 / (20 - 10) = 1 and P2 scores 200 / (1000 - 200) = 0.25, so P1 is hotter than P2. That doesn't make sense to me.
I think the first algorithm relies too heavily on the instantaneous trend. Think of NASCAR: the current leader could be going 0 m.p.h. because he's at a pit stop. The second one uses the notion of an average trend. I think both have their uses.
So take two posts with the same total votes and controversial rating, where post one receives 20 votes in the first hour and zero in the second, while the other receives 10 in each hour. The first post will be buried by the first algorithm, but the second algorithm will rank them equally.
YMMV, but I think the 'hotness' is entirely dependent on the time frame, and not at all on the total votes unless your time frame is 'all time'. Also, it seems to me that the proportion of all votes in the relevant time frame, rather than the absolute number of them, is the important figure.
You might have several categories of hot:
Hottest this hour
Hottest this week
Hottest since your last visit
Hottest all time
So, 'Hottest in the last [whatever]' could be calculated like this:
votes_for_topic_in_timeframe / all_votes_in_timeframe
if you especially want a number between 0 and 1, (useful for comparing across categories) or, if you only want the ones in a specific timeframe, just take the votes_for_topic_in_timeframe values and sort into descending order.
If you don't want the user explicitly choosing the time frame, you may want to calculate all (say) four versions (or perhaps just the top 3), assign a multiplier to each category to give each category a relative importance, and calculate total values for each topic to take the top n. This has the advantage of potentially hiding from the user that no-one at all has voted in the last hour ;)
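A small sketch of the share-of-votes measure and the multiplier-weighted combination described above. The time-frame names, the multipliers, and the vote counts are invented for illustration.

```python
def timeframe_share(votes_for_topic, all_votes):
    """votes_for_topic_in_timeframe / all_votes_in_timeframe (0 if nobody voted)."""
    return votes_for_topic / all_votes if all_votes else 0.0


def combined_hotness(topic_votes, total_votes, multipliers):
    """Weighted sum of per-time-frame shares; all arguments are dicts keyed by frame."""
    return sum(
        multiplier * timeframe_share(topic_votes.get(frame, 0),
                                     total_votes.get(frame, 0))
        for frame, multiplier in multipliers.items()
    )


multipliers = {"hour": 4.0, "week": 2.0, "all_time": 1.0}
topic_votes = {"hour": 0, "week": 30, "all_time": 100}
total_votes = {"hour": 0, "week": 300, "all_time": 10000}
hotness = combined_hotness(topic_votes, total_votes, multipliers)
```

An hour with no votes at all contributes 0 here, so the weekly and all-time shares decide the ranking.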

How to calculate scores?

This question is more related to logic than any programming language. If the question is not apt for the forum please do let me know and I will delete this.
I have to write a logic to calculate scores for blogs for a Blog Award website. A blog may be nominated for multiple award categories and is peer-reviewed or rated by a Jury on a -1 to 5 scale (-1 to indicate a blog they utterly dislike). Now, a blog can be rated by one or more Jurors. One criterion while calculating final score for a blog is that if a blog is rated positively by more people it should get more weightage (and vice-versa). Similarly a blog rated -1 even by one Juror should have its score affected (-1 is sort of a Veto here). Lastly, I also want to have an additional score based on the Technorati rank of the blog (so that the final score is based on a mix of Juror rating + Technorati ranking).
Example: A blog is rated in category A by a total of 6 Jurors. 2 rate it at 3, 3 rate it at 2 and 1 rates it at 4. (I used to calculate the score as (2*3 + 3*2 + 1*4)/6 = 16/6 = 2.67 to get a weighted average, but I am not satisfied with this, primarily because it doesn't work well when a Juror rating is -1. Moreover, I need to add the Technorati ranking criterion too.)
Could you help me decide the best way to calculate the final scores (keeping the rating method same as above as that cannot be changed now)?
If you want to weight the effect of a -1 rating more strongly, use the same average score calculation but substitute -10 whenever you see -1. You can choose a value other than -10 if you don't want a negative rating to weight as strongly.
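A minimal sketch of this substitution, using the six Juror ratings from the question's example. The function name is illustrative, and -10 is only the default suggested above.

```python
def remapped_average(ratings, veto_value=-10):
    """Average rating after replacing every -1 with a stronger veto value."""
    if not ratings:
        return 0.0
    remapped = [veto_value if rating == -1 else rating for rating in ratings]
    return sum(remapped) / len(remapped)


no_veto = remapped_average([3, 3, 2, 2, 2, 4])
one_veto = remapped_average([3, 3, 2, 2, 2, -1])
```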
You might look at using the lower bound of the Wilson score interval for your ratings.
See http://www.evanmiller.org/how-not-to-sort-by-average-rating.html for more details. Although, there, it is used for the simpler Bernoulli case.
The gist is if you have a lot of ratings you have a higher degree of confidence in your scoring. You can then combine the scores from your local ratings and the Technorati ratings, by weighting the scores by the number of voters locally and on Technorati.
As for wanting a single -1 vote to have high impact, just remap it to a large negative value proportional to your desired impact before feeding it into your scoring formula.
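A sketch of the Wilson lower bound for the simple Bernoulli case follows. To apply it to the -1 to 5 scale, this example assumes a rating of 3 or more counts as "positive"; that threshold is an assumption and not part of the answer above.

```python
import math


def wilson_lower_bound(positive, total, z=1.96):
    """Lower bound of the Wilson score interval (z = 1.96 is roughly 95% confidence)."""
    if total == 0:
        return 0.0
    p = positive / total
    z2 = z * z
    centre = p + z2 / (2 * total)
    margin = z * math.sqrt((p * (1 - p) + z2 / (4 * total)) / total)
    return (centre - margin) / (1 + z2 / total)


ratings = [3, 3, 2, 2, 2, 4]
positive_count = sum(1 for rating in ratings if rating >= 3)   # assumed threshold
blog_confidence = wilson_lower_bound(positive_count, len(ratings))
```

With the same proportion of positive ratings, a blog rated by more Jurors gets a higher lower bound, which matches the "more raters, more weight" requirement.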
Calculating a score based on votes will be pretty easy. Adding the Technorati rank will be the tricky part.
I made a quick script that calculates some scores based on this algorithm
score = ( vote_sum - ( vetos * veto_weight ) ) / number_of_votes
You can change the URL parameters to get different values.
There are a lot of ties, so maybe you could use the Technorati blog rank as a tie breaker.
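The original script is not shown, so the following is only a guess at how the formula and the tie breaker could look. It assumes vote_sum adds up the non-veto ratings, number_of_votes counts every rating, veto_weight is 5, and a lower Technorati rank number is better.

```python
def veto_penalised_score(ratings, veto_weight=5.0):
    """(vote_sum - vetos * veto_weight) / number_of_votes."""
    if not ratings:
        return 0.0
    vetos = sum(1 for rating in ratings if rating == -1)
    vote_sum = sum(rating for rating in ratings if rating != -1)
    return (vote_sum - vetos * veto_weight) / len(ratings)


def rank_blogs(blogs):
    """Sort (name, ratings, technorati_rank) tuples, best first.

    Ties on score are broken by the lower Technorati rank number.
    """
    return sorted(blogs, key=lambda blog: (-veto_penalised_score(blog[1]), blog[2]))


blogs = [
    ("blog_a", [3, 3, 2, 2, 2, 4], 900),
    ("blog_b", [3, 3, 2, 2, 2, 4], 150),
    ("blog_c", [3, 3, 2, 2, 2, -1], 10),
]
ranking = [name for name, _, _ in rank_blogs(blogs)]
```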
You could internally work with scores from 0 to 6. Just shift every rating up by one, calculate the score, and shift back. I guess the -1 has some disrupting effect on your calculation.

Resources