Adjusting votes based on different numbers of voters - algorithm

I have a 1 to 5 voting system and I'm trying to figure out the best way to find the most popular item voted on, taking into account the total possible number of votes cast. To get a vote total, I'm counting "1" votes as -3, "2" votes as -2, "3" votes as +1, "4" votes as +2, and "5" votes as +3, so a "1" vote cancels out a "5" vote and vice versa.
For this example, say we have 3 films playing in 3 different size theaters.
Film 1: 800 seats / Film 2: 400 seats / Film 3: 180 seats
In a way, we're limiting the total number of votes based on seats, so I would like a way to keep the film in the smaller theater from automatically being overwhelmed by the film in the larger theater. More votes are likely to be cast in the larger theater, resulting in a higher total score.
Edit 10/18:
Alright, hopefully I can explain this better. I'm working for a film festival, and we're balloting the first screening of each film in the fest. Therefore, each film will receive anywhere from 0 votes up to a maximum set by the size of its theater. I'm looking to find the most popular film in 3 categories: narrative, documentary, and short film. By popular I mean a combination of highest average vote and number of votes.
It seems like a weighted average is what I'm looking for, giving less weight to votes from a bigger theater and more weight to votes from a smaller theater to even things out.

You're working with weighted averages.
Instead of just adding up and dividing by the total number of elements (arithmetic mean):
(a + b + c) / 3
you add a weight to each element, because they are not all evenly distributed (the weights sum to 1):
w1*a + w2*b + w3*c
In your case, the weights could be:
w_i = (# of people in theater i) / (# of people in all the theaters)
Let's try a test case:
Theater 1: 100 people (rating: 1)
Theater 2: 1,000,000 people (rating: 5)
Average = (100 / (100 + 1000000)) * 1 + (1000000 / (100 + 1000000)) * 5
≈ 4.9996
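Here is a minimal JavaScript sketch of this weighting. It assumes each theater is a hypothetical `{ people, rating }` object, where `rating` is that theater's rating. The weights are each theater's share of all people, so they sum to 1.

```javascript
// Weighted average of per-theater ratings, weighting each theater by its
// share of all the people across all theaters (the weights sum to 1).
function weightedAverage(theaters) {
  const totalPeople = theaters.reduce((sum, t) => sum + t.people, 0);
  if (totalPeople === 0) return NaN; // nobody voted anywhere
  return theaters.reduce(
    (avg, t) => avg + (t.people / totalPeople) * t.rating,
    0
  );
}

// The test case above: 100 people rated 1, 1,000,000 people rated 5.
const avg = weightedAverage([
  { people: 100, rating: 1 },
  { people: 1000000, rating: 5 },
]);
console.log(avg.toFixed(4)); // prints "4.9996"
```

Note that with these weights the large theater dominates the result.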

Well, depending on your goals it sounds like you are interested in some sort of weighted average.
Continuing your film example, it sounds to me like you are trying to rate how "good" the films are. To do this, you don't want to factor the number of views of any particular film too highly into the final determination. However, you have to take it into account somewhat since a film that only got viewed 5 times and had an average rating of +2.7 has much less credibility than a film with 10,000 views getting the same rating.
You might consider simply not including a film in the results unless it has a minimum number of votes.
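A possible sketch of the minimum-vote cutoff, assuming each film is a hypothetical `{ title, votes }` object whose votes are already converted to points. The `minVotes` threshold is something you would choose yourself. It is never allowed below 1, so the average is always defined.

```javascript
// Rank films by average points per vote, leaving out films with fewer than
// `minVotes` votes (a hypothetical threshold you would tune yourself).
function rankFilms(films, minVotes) {
  const threshold = Math.max(1, minVotes); // at least one vote, so the average is defined
  return films
    .filter((film) => film.votes.length >= threshold)
    .map((film) => ({
      title: film.title,
      votes: film.votes.length,
      average: film.votes.reduce((sum, v) => sum + v, 0) / film.votes.length,
    }))
    .sort((a, b) => b.average - a.average);
}
```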

Given a uniform (even) distribution of votes across {1,2,3,4,5}, the expected rating of your film is 0.2. This is because the votes {1 and 5} cancel each other out, as do {2 and 4}. But a "3" vote is worth +1 and occurs with probability 1/5, so it contributes 1 × 1/5 = 0.2 to the expected value. So if people give a rating of {1,2,3,4,5} with equal probability, you would expect a film (no matter how many people see it) to have an average rating close to 0.2.
So I think the best option for you would be to add up all the scores received and simply divide by the number of people who have seen each film. This should be a good guess at people's sentiment toward the film as the average of the distribution should not get larger simply because more people see the film.
If I were you, I would also suggest adding a small penalty term to your final result, to take into account the fact that some people didn't even want to go see the movie. If lots of people didn't want to see the movie in the first place, but the 5 or so people who saw it gave it a 5-star rating, that doesn't make it a good movie, does it?
So a final solution I would recommend: Add up all the points as you have described, and divide by the total number of people who have gone to the cinema. While not perfect (whatever perfect means), it should give you some indication of what people like and don't like. This essentially means people who chose not to see a movie are adding zero to the points total, but still affect the average because the end result is divided by a larger number.
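A minimal sketch of this answer, using the point mapping from the question. The function names are hypothetical. The denominator is a parameter so that you can pass either the number of people who saw the film or the total number of people who went to the cinema, as suggested above.

```javascript
// Points for each 1-5 ballot, as described in the question.
const POINTS = { 1: -3, 2: -2, 3: 1, 4: 2, 5: 3 };

// Sum the points and divide by the number of people counted. Anyone counted
// who didn't vote (or didn't see the film) adds 0 points but still enlarges
// the denominator.
function scorePerPerson(ballots, peopleCounted) {
  const total = ballots.reduce((sum, b) => sum + POINTS[b], 0);
  return total / peopleCounted;
}

// Expected points per vote if each rating 1-5 is equally likely:
// (-3 - 2 + 1 + 2 + 3) / 5 = 0.2
const expectedUniform =
  Object.values(POINTS).reduce((sum, p) => sum + p, 0) / 5;
```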


Developing player rankings with ELO

I recently created a tournament system that will soon lead into player rankings. Basically, after players are done with the tournament, they are given a rank based on how they did in the tournament. So the person who won the tournament will have the most points and be ranked #1, while the second will have the second most points and be ranked #2, and so on...
However, after they are ranked in the new rankings, they can challenge other members and have a way to play other members and change their ranks. So basically (using a ranking system), if Player A who is ranked #2 beats Player B who is ranked #1, Player A will now become #1.
I've also decided that if a player wants to compete in the rankings but was not present during the tournament, they can sign up after the tournament, and will be given the lowest possible rank with the lowest points (but they have a chance to move up).
So now I want to know how I should go about planning this. When I convert the players from the tournament to the match rankings, I have to assign them points in order to rank them. I decided this seems like the best way to do it:
Rank  Points
1     1000
2     900
3     800
4     700
5     600
6     500
7     400
8     300
9     200
10    100
After looking on the internet, I've decided it would be wise to use Elo to give players their new rank after the players have played against each other. I based my approach on this page:
So if I go about it this way, let's say I have rank #10 facing rank #1. According to the website above, my formula is:
R' = R + K * (S - E)
where #10 has a rating of only 100 points, #1 has 1,000, and K = 32.
So after doing the math, rank #10's expected score against #1 is:
E = 1 / [ 1 + 10 ^ ( [1000 - 100] / 400) ]
≈ 0.0056 (about 0.56%)
R' = 100 + 32 * (1 - 0.0056)
≈ 131.82
The problem I have with Elo is that it makes no sense. After a player ranked #10 beats #1, he should not gain as few as 32 points. I'm not sure if I'm doing the math wrong, or if I'm splitting up the points wrong. Or maybe I shouldn't use Elo at all? Any suggestions would be very helpful.
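A short JavaScript sketch of the formula in the question, using the logistic expected score and K = 32 as the question does. The function names are hypothetical. For the #10 vs #1 example it gives an expected score of about 0.0056 and a new rating of about 131.8. The underdog therefore gains nearly the full K.

```javascript
// Expected score of a player rated `ratingA` against one rated `ratingB`
// (logistic Elo curve with the usual 400-point scale).
function expectedScore(ratingA, ratingB) {
  return 1 / (1 + Math.pow(10, (ratingB - ratingA) / 400));
}

// R' = R + K * (S - E), with S = 1 for a win, 0.5 for a draw, 0 for a loss.
function updateRating(rating, opponentRating, score, k = 32) {
  return rating + k * (score - expectedScore(rating, opponentRating));
}

// The example above: #10 (100 points) beats #1 (1000 points).
const e = expectedScore(100, 1000);
const newRating = updateRating(100, 1000, 1);
```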
Don't be offended, but it is your table that doesn't make sense.
The Elo system is based on the premise that a rating is an accurate estimate of strength, and that the difference in ratings accurately predicts the outcome of a match (a player rated 200 points higher is expected to score 75%). If an actual outcome does not agree with the prediction, the ratings do not reflect strength, and they must be adjusted according to how much the actual outcome differs from the predicted one.
An official (as in FIDE) Elo system has a few arbitrary constants (e.g., the 200/75 gauge, Erf as predictor, etc.); choosing them (reasonably) differently may lead to different rating values, yet would result (in the long run) in the same ranking. There is some interesting math behind this assertion; this is not the right place to get into the details.
Now back to your table. It assigns the rating based on the place, not on the points scored. The champion gets 1000 no matter whether she swept the tournament with an absolute 100% result, or barely made it among equals. These points do not estimate the strength of the participants.
So my advice is to abandon the table altogether, assign each new player an entry rating (say, 1000; it really doesn't matter as long as you are consistent), and stick to Elo from the very beginning.
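A minimal sketch of that advice. It assumes game results arrive as hypothetical `[winner, loser]` pairs with no draws, and it uses a fixed K = 32 (the value from the question, not an official constant). Every player enters at 1000, including late sign-ups, and the ranking is simply the ratings sorted.

```javascript
const START_RATING = 1000; // entry rating; the value matters less than using it consistently
const K = 32; // assumed K-factor

function expectedScore(ra, rb) {
  return 1 / (1 + Math.pow(10, (rb - ra) / 400));
}

// Apply [winner, loser] results in order; unseen players start at START_RATING.
function applyResults(ratings, games) {
  for (const [winner, loser] of games) {
    const rw = ratings.get(winner) ?? START_RATING;
    const rl = ratings.get(loser) ?? START_RATING;
    const gain = K * (1 - expectedScore(rw, rl));
    // Zero-sum update: the winner gains exactly what the loser gives up.
    ratings.set(winner, rw + gain);
    ratings.set(loser, rl - gain);
  }
  return ratings;
}

// The ranking is just the players sorted by rating, highest first.
function ranking(ratings) {
  return [...ratings.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([name]) => name);
}
```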

Find all possible combinations of scores consistent with data

So I've been working on a problem in my spare time and I'm stuck. Here is where I'm at. I have the number 40, which is the number of players. I've also been given the numbers 39, 38, ..., 10. These are the scores of the first 30 players (1-30). The rest of the players (31-40) have unknown scores. What I would like to find is how many combinations of scores are consistent with the given data.
So for a simpler example: suppose you have 3 players, and one of them has a score of 1. Then the number of possible combinations of scores is 3 (0,2; 2,0; 1,1), where (a,b) stands for the number of wins of the two remaining players. A combination of (3,0) wouldn't work, because no player can have 3 wins. Nor would (0,0) work, because we need a total of 3 wins (and wouldn't get it with 0,0).
I've found the total possible number of games. This is the total number of games played, which means it is the total number of wins. (There are no ties.) Finally, I have a variable for the max wins per player (which is one less than the total number of players. No player can have more than that.)
I've tried finding the number of unique combinations by spreading N wins among the players and then subtracting the combinations that don't fit the criteria. E.g., to figure out how many ways there are to give 10 victories to 5 people with no more than 4 victories each, you would use:
C(14,4) - C(5,1)*C(9,4) + C(5,2)*C(4,4) = 381. C(14,4) comes from the formula C(n+k-1, k-1) ("stars and bars"). The next term subtracts the distributions in which someone has 5 or more victories (not allowed), and the last term adds back the ones we subtracted twice.
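Here is a small JavaScript sketch of the inclusion-exclusion count described above. It uses `BigInt`, so huge binomials like C(780, 39) stay exact instead of overflowing. The function names are hypothetical. Like the question, it only enforces the total number of wins and the per-player cap.

```javascript
// Exact binomial coefficient C(n, k) using BigInt.
function choose(n, k) {
  if (k < 0 || k > n) return 0n;
  k = Math.min(k, n - k);
  let result = 1n;
  for (let i = 1; i <= k; i++) {
    // After step i, result equals C(n - k + i, i), so the division is exact.
    result = (result * BigInt(n - k + i)) / BigInt(i);
  }
  return result;
}

// Ways to give `wins` wins to `people` players with at most `cap` each:
// stars and bars, with inclusion-exclusion over the players forced above the cap.
function countDistributions(wins, people, cap) {
  let total = 0n;
  for (let j = 0; j <= people; j++) {
    const remaining = wins - j * (cap + 1); // j chosen players get at least cap+1 wins
    if (remaining < 0) break;
    const term = choose(people, j) * choose(remaining + people - 1, people - 1);
    total += j % 2 === 0 ? term : -term;
  }
  return total;
}
```

For the 40-player case, the remaining 10 players share 780 - 735 = 45 wins with a cap of 39, i.e. `countDistributions(45, 10, 39)`.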
Yeah, there has got to be an easier way. Lastly, the numbers get so big that I'm not sure that my computer can adequately handle them. We're talking about C(780, 39), which is 1.15495183 × 10^66. Regardless, there should be a better way of doing this.
To recap: you have 40 people. The scores of the first 30 people are 10 to 39. The last ten people have unknown scores. How many score combinations meet the criteria that all the scores add up to the total possible number of wins and that each player gets no more than 39 wins?
Generating functions:
Since the question is more about math, but still on a programming Q&A site, let me give you a partial solution that works for many of these problems using a symbolic algebra system (like Maple or Mathematica). I highly recommend that you grab an intro combinatorics book; these kinds of questions are answered there.
First of all the first 30 players who score 10-39 (with a total score of 735) are a bit of a red herring - what we would like to do is solve the other problem, the remaining 10 players whose score could be in the range of (0...39).
If we think of the possible scores of the players as the polynomial:
f(x) = x^0 + x^1 + x^2 + ... x^39
where the term x^2 stands for a score of 2, for example. Now consider f(x)^10:
This represents the combined score of all 10 players, i.e., the coefficient of x^385 is 2002, which means there are 2002 ways for the 10 players to score 385 in total. Wolfram Alpha (a programming language, IMO) can evaluate this for us.
If you'd like to know how many possible ways there are in total, just substitute x=1 into the expression, giving f(1)^10 = 40^10 = 10,485,760,000,000,000 (no surprise, since each of the 10 players has 40 possible scores).
Why is this useful?
While I know it may seem silly to some to set up all this machinery for a simple problem that can be solved on paper - the approach of generating functions is useful when the problem gets messy (and asymptotic analysis is possible). Consider the same problem, but now we restrict the players to only score prime numbers (2,3,5,7,11,...). How many possible ways can the 10 of them score a specific number, say 344? Just modify your f(x):
f(x) = x^2 + x^3 + x^5 + x^7 + x^11 ...
and repeat the process! (I get [x^344]f(x)^10 = 1390).
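If you don't have a computer algebra system at hand, you can expand the polynomial directly. Here is a JavaScript sketch that does so with `BigInt` coefficients, where index = exponent. The helper names are hypothetical. The prime variant below keeps the 0..39 score bound, which is an assumption; the answer does not say which bound its 1390 figure used.

```javascript
// Multiply two polynomials given as BigInt coefficient arrays (index = exponent).
function multiply(a, b) {
  const out = new Array(a.length + b.length - 1).fill(0n);
  for (let i = 0; i < a.length; i++) {
    if (a[i] === 0n) continue;
    for (let j = 0; j < b.length; j++) out[i + j] += a[i] * b[j];
  }
  return out;
}

// Coefficients of f(x)^players, where f(x) = sum of x^s over the allowed scores s.
function scorePolynomial(allowedScores, players) {
  const f = new Array(Math.max(...allowedScores) + 1).fill(0n);
  for (const s of allowedScores) f[s] = 1n;
  let result = [1n];
  for (let p = 0; p < players; p++) result = multiply(result, f);
  return result;
}

// Primes up to n by trial division against the primes found so far.
function primesUpTo(n) {
  const primes = [];
  for (let k = 2; k <= n; k++) {
    if (primes.every((p) => k % p !== 0)) primes.push(k);
  }
  return primes;
}

// Scores 0..39 for 10 players: poly[385] is the coefficient of x^385.
const poly = scorePolynomial(Array.from({ length: 40 }, (_, s) => s), 10);
// Prime-only scores (assumed bound of 39): primePoly[k] = ways to total k.
const primePoly = scorePolynomial(primesUpTo(39), 10);
```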

Weighted Voting Algorithm/Calculation

I am creating a 'duel' app and I have hit a dead end calculating the results.
Each user either has an upvote or downvote. There is no 1-5 or five-star rating.
For example: If I were displayed 5 times and won 3, I would have 3 'upvotes' and 2 'downvotes'.
If I used straight percentages, anyone who was displayed 1 time and selected 1 time (100%) would always be on top, whereas someone at 9/10 (90%) would be below the 1/1, even though in theory they belong on top.
Anyone have any ideas of how to accomplish this?
I, too, have been looking for a suitable algorithm for a voting website.
Whilst what @joshhendo suggested would appear to be a sound method of ranking votes, it doesn't take into account the percentage of positive votes.
For example:
Item 1 has 70 'up' votes and 30 'down' votes.
Item 2 has 400 'up' votes and 300 'down' votes.
For Item 1: 70-30 = 40
For Item 2: 400-300 = 100
Item 2 will appear above Item 1 because it has a higher net score. But only ~57% of Item 2's votes are positive, whereas 70% of Item 1's votes are positive. Item 1 should obviously appear above Item 2, because even though it doesn't have as many overall votes, it has a better 'up' to 'down' ratio.
But then again, one wants to avoid the initial problem of items with 1 vote (positive) appearing above everything else.
I recommend you read this:
It suggests a more mathematically sound solution to this problem. It's actually a very interesting read, and I will be implementing something similar into my own website.
This is also a very good read:
Rather than the positive vote percentage, track a Bayesian average of it, e.g.:
(positive votes + arbitrary sample * average positive ratio) / (total votes + arbitrary sample)
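A minimal sketch of that Bayesian average. `priorRatio` (e.g. the site-wide positive ratio) and `priorVotes` (the "arbitrary sample" size) are assumed parameters you would choose yourself. The usage below plugs in the items from the earlier answer with a prior of 0.5 and 10 imaginary votes.

```javascript
// Bayesian average of the positive-vote ratio: every item starts with
// `priorVotes` imaginary votes at the positive ratio `priorRatio`.
function bayesianPositiveRatio(up, down, priorRatio, priorVotes) {
  return (up + priorRatio * priorVotes) / (up + down + priorVotes);
}

const oneForOne = bayesianPositiveRatio(1, 0, 0.5, 10); // (1 + 5) / 11
const nineOfTen = bayesianPositiveRatio(9, 1, 0.5, 10); // (9 + 5) / 20
const item1 = bayesianPositiveRatio(70, 30, 0.5, 10);
const item2 = bayesianPositiveRatio(400, 300, 0.5, 10);
```

With these settings, 9/10 outranks 1/1 and Item 1 outranks Item 2.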
You could just tally up the votes, with an up vote counting as +1 and a down vote counting as -1.
For example, let's say someone was 9/10 (i.e., had 9 up votes and 1 down vote); then their score would be 9 - 1 = 8. This is higher than someone at 1/1, who has 1 up vote and 0 down votes, so their score would be 1 - 0 = 1. So the person who would have got 90% in your percentage system now has a score of 8, which is higher than the person who would have got 100% with a score of 1.
That's the best and simplest solution I can think of. There may be more complex solutions that would work, but for what you want, I think that should work.
You can use a weighted score.
Track each user's points and a ranking score.
Let q be your opponent's score (which is in the range 0 to 1, inclusive on both ends).
When you do battle, you gain q points when you win and lose 1-q points when you lose. This means that if you lose against someone who always wins, it's not going to hurt you much. If you lose to someone who almost always loses, you're going to lose lots of points for it.
Each (day, hour, whatever), recalculate everyone's q, where the #1 person gets a q of 1 (or 1.5, or 2, whatever, but 1 works best) and the lowest person gets a q of 0.
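A possible sketch of this scheme, where each player is a hypothetical `{ points, q }` object. The even spacing of q between the top player (1) and the bottom player (0) is an assumption, since the answer only fixes the two ends.

```javascript
// Winner gains the loser's q; loser loses 1 minus the winner's q.
// So beating a strong player pays well, and losing to one costs little.
function recordDuel(winner, loser) {
  winner.points += loser.q;
  loser.points -= 1 - winner.q;
}

// Periodically recompute q from the ranking by points: best gets 1, worst gets 0,
// evenly spaced in between (assumed spacing).
function recomputeQ(players) {
  const ranked = [...players].sort((a, b) => b.points - a.points);
  const n = ranked.length;
  ranked.forEach((player, i) => {
    player.q = n === 1 ? 1 : 1 - i / (n - 1);
  });
}
```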

How to balance number of ratings versus the ratings themselves?

For a school project, we'll have to implement a ranking system. However, we figured that a dumb rank average would suck: something that one user ranked 5 stars would have a better average than something 188 users ranked 4 stars, and that's just stupid.
So I'm wondering if any of you have an example algorithm of "smart" ranking. It only needs to take into account the rankings given and the number of rankings.
You can use a method inspired by Bayesian probability. The gist of the approach is to have an initial belief about the true rating of an item, and use users' ratings to update your belief.
This approach requires two parameters:
What do you think is the true "default" rating of an item, if you have no ratings at all for the item? Call this number R, the "initial belief".
How much weight do you give to the initial belief, compared to the user ratings? Call this W, where the initial belief is "worth" W user ratings of that value.
With the parameters R and W, computing the new rating is simple: assume you have W ratings of value R along with any user ratings, and compute the average. For example, if R = 2 and W = 3, we compute the final score for various scenarios below:
100 (user) ratings of 4: (3*2 + 100*4) / (3 + 100) = 3.94
3 ratings of 5 and 1 rating of 4: (3*2 + 3*5 + 1*4) / (3 + 3 + 1) = 3.57
10 ratings of 4: (3*2 + 10*4) / (3 + 10) = 3.54
1 rating of 5: (3*2 + 1*5) / (3 + 1) = 2.75
No user ratings: (3*2 + 0) / (3 + 0) = 2
1 rating of 1: (3*2 + 1*1) / (3 + 1) = 1.75
This computation takes into consideration the number of user ratings, and the values of those ratings. As a result, the final score roughly corresponds to how happy one can expect to be about a particular item, given the data.
Choosing R
When you choose R, think about what value you would be comfortable assuming for an item with no ratings. Is the typical no-rating item actually 2.4 out of 5, if you were to instantly have everyone rate it? If so, R = 2.4 would be a reasonable choice.
You should not use the minimum value on the rating scale for this parameter, since an item rated extremely poorly by users should end up "worse" than a default item with no ratings.
If you want to pick R using data rather than just intuition, you can use the following method:
Consider all items with at least some threshold of user ratings (so you can be confident that the average user rating is reasonably accurate).
For each item, assume its "true score" is the average user rating.
Choose R to be the median of those scores.
If you want to be slightly more optimistic or pessimistic about a no-rating item, you can choose R to be a different percentile of the scores, for instance the 60th percentile (optimistic) or 40th percentile (pessimistic).
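A possible JavaScript sketch of the data-driven way of picking R. It assumes each item is represented as a plain array of its ratings (a hypothetical data shape). It uses a nearest-rank percentile, which is only one of several reasonable percentile definitions.

```javascript
// Pick the initial belief R as a percentile of the average ratings of items that
// already have at least `minRatings` ratings (percentile 0.5 = median).
function chooseR(items, minRatings, percentile = 0.5) {
  const averages = items
    .filter((ratings) => ratings.length > 0 && ratings.length >= minRatings)
    .map((ratings) => ratings.reduce((s, r) => s + r, 0) / ratings.length)
    .sort((a, b) => a - b);
  if (averages.length === 0) return NaN; // not enough data yet
  // Nearest-rank percentile, clamped to a valid index.
  const rank = Math.ceil(percentile * averages.length);
  const index = Math.min(averages.length - 1, Math.max(0, rank - 1));
  return averages[index];
}
```

Passing 0.6 or 0.4 as `percentile` gives the optimistic or pessimistic variants described above.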
Choosing W
The choice of W should depend on how many ratings a typical item has, and how consistent ratings are. W can be higher if items naturally obtain many ratings, and W should be higher if you have less confidence in user ratings (e.g., if you have high spammer activity). Note that W does not have to be an integer, and can be less than 1.
Choosing W is a more subjective matter than choosing R. However, here are some guidelines:
If a typical item obtains C ratings, then W should not exceed C, or else the final score will be more dependent on R than on the actual user ratings. Instead, W should be close to a fraction of C, perhaps between C/20 and C/5 (depending on how noisy or "spammy" ratings are).
If historical ratings are usually consistent (for an individual item), then W should be relatively small. On the other hand, if ratings for an item vary wildly, then W should be relatively large. You can think of this algorithm as "absorbing" W ratings that are abnormally high or low, turning those ratings into more moderate ones.
In the extreme, setting W = 0 is equivalent to using only the average of user ratings. Setting W = infinity is equivalent to proclaiming that every item has a true rating of R, regardless of the user ratings. Clearly, neither of these extremes is appropriate.
Setting W too large can have the effect of favoring an item with many moderately-high ratings over an item with slightly fewer exceptionally-high ratings.
I appreciated the top answer at the time of posting, so here it is codified as JavaScript:
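const defaultR = 2; // initial belief R: assumed rating of an item with no ratings
const defaultW = 3; // weight W of the initial belief; should not exceed the typical number of ratings per item (0 = plain average of the ratings)
function getSortAlgoValue(ratings) {
  // Sum of all user ratings for this item
  const ratingSum = ratings.reduce((sum, r) => sum + r, 0);
  // Average that includes W imaginary ratings of value R
  return (defaultR * defaultW + ratingSum) / (defaultW + ratings.length);
}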
Only listed as a separate answer because code blocks don't format well in a comment reply.
Since you've stated that the machine would only be given the rankings and the number of rankings, I would argue that it may be negligent to attempt a calculated weighting method.
First, there are too many unknowns to confirm the proposition that, in enough circumstances, a larger quantity of ratings is a better indication of quality than a smaller number of ratings. One example is how long rankings have been given. Has there been equal collection duration (equal attention) for the different items ranked with this same method? Others are which markets have had access to this item and, of course, who specifically ranked it.
Secondly, you've stated in a comment below the question that this is not for front-end use but rather "the ratings are generated by machines, for machines," as a response to my comment that "it's not necessarily only statistical. One person might consider 50 ratings enough, where that might not be enough for another. And some raters' profiles might look more reliable to one person than to another. When that's transparent, it lets the user make a more informed assessment."
Why would that be any different for machines? :)
In any case, if this is about machine-to-machine rankings, the question needs greater detail in order for us to understand how different machines might generate and use the rankings.
Can a ranking generated by a machine be flawed (so as to suggest that more rankings may somehow compensate for those "flawed" rankings)? What does that even mean: is it a machine error? Or is it because the item has no use to this particular machine, for example? There are many issues here we might first want to unpack. For instance, if we have access to how the machines generate the ranking, then on some level we may already know the meaning this item has for this machine, making the aggregated ranking superfluous.
What you can find on different platforms is the hiding of ratings without enough votes: "This item does not have enough votes"
The problem is that you can't capture this in an easy formula for calculating a ranking.
I would suggest hiding rankings with fewer than a minimum number of votes, but calculating a moving average internally. I always prefer a moving average over a total average, as it favors recent votes over very old votes, which might have been given under totally different circumstances.
Additionally, you do not need to store a list of all votes. You just keep the calculated average, and each new vote updates this value:
newAverage = weight * newVote + (1-weight) * oldAverage
with a weight of about 0.05, which roughly favors the last 20 values (just experiment with this weight).
Additionally, I would start with these conditions:
no votes = middle of the range (1-5 stars => start with 3 stars)
the average is not shown if fewer than 10 votes were given.
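A minimal sketch of the moving-average approach with those starting conditions. The constant and function names are hypothetical, and the weight of 0.05 is the suggested starting point rather than a tuned value.

```javascript
const WEIGHT = 0.05; // roughly favors the last 20 votes; tune by experiment
const START_AVERAGE = 3; // no votes yet: middle of a 1-5 star scale
const MIN_VOTES_SHOWN = 10; // hide the average until this many votes exist

function newItem() {
  return { average: START_AVERAGE, votes: 0 };
}

// newAverage = weight * newVote + (1 - weight) * oldAverage
function addVote(item, vote) {
  item.average = WEIGHT * vote + (1 - WEIGHT) * item.average;
  item.votes += 1;
}

// What to display: null while there are too few votes.
function displayedAverage(item) {
  return item.votes < MIN_VOTES_SHOWN ? null : item.average;
}
```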
A simple solution might be a plain average:
sum(votes) / number_of_votes
That way, 3 people voting 1 star and one person voting 5 would give an average of (1+1+1+5)/4 = 2 stars.
Simple, effective, and probably sufficient for your purposes.

Algorithm to create fair / evenly matched teams based on player rankings

I have a data set of players' skill ranking, age and sex and would like to create evenly matched teams.
Teams will have the same number of players (currently 8 teams of 12 players).
Teams should have the same or similar male to female ratio.
Teams should have similar age curve/distribution.
I would like to try this in Haskell but the choice of coding language is the least important aspect of this problem.
This is a bin packing problem, or a multi-dimensional knapsack problem. Björn B. Brandenburg has made a bin packing heuristics library in Haskell that you may find useful.
You need something like...
import Control.Arrow ((&&&))
import Math.Statistics (pvar) -- from the hstats package
data Player = P { skill :: Int, gender :: Bool, age :: Int }
Decide on a number of teams n (I'm guessing this is a function of the total number of players).
Find the desired total skill per team:
teamSkill :: Double -> [Player] -> Double
teamSkill n ps = fromIntegral (sum (map skill ps)) / n
Find the ideal gender ratio:
genderRatio :: [Player] -> Double
genderRatio ps = fromIntegral (length (filter gender ps)) / fromIntegral (length ps)
Find the ideal age variance (you'll want the Math.Statistics package):
ageDist :: [Player] -> Double
ageDist ps = pvar (map (fromIntegral . age) ps)
And you must assign the three constraints some weights to come up with a scoring for a given team:
score :: Double -> Double -> Double -> [Player] -> Double
score skillW genderW ageW team = skillW * sk + genderW * g + ageW * a
  where (sk, (g, a)) = (teamSkill 1 &&& genderRatio &&& ageDist) team
The problem reduces to minimizing the difference in scores between teams. A brute-force approach will take time proportional to Θ(n^(k−1)). Given the size of your problem (8 teams of 12 players each), this translates to about 6 to 24 hours on a typical modern PC.
An approach that may work well for you (since you don't need an exact solution in practise) is simulated annealing, or continual improvement by random permutation:
Pick teams at random.
Get a score for this configuration (see above).
Randomly swap players between two or more teams.
Get a score for the new configuration. If it's better than the previous one, keep it and recurse to step 3. Otherwise discard the new configuration and try step 3 again.
When the score has not improved for some fixed number of iterations (experiment to find the knee of this curve), stop. It's likely that the configuration you have at this point will be close enough to the ideal. Run this algorithm a few times to gain confidence that you have not hit on some local optimum that is considerably worse than ideal.
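A JavaScript sketch of the random-swap procedure above, with several simplifications that are assumptions rather than part of the answer:

- The score is only the spread between the strongest and weakest team's total skill. Gender and age terms could be added to `spread`.
- Each step swaps one player between two teams.
- A small seeded PRNG (mulberry32) makes runs reproducible.

```javascript
// Small seeded PRNG (mulberry32) so runs are reproducible.
function mulberry32(seed) {
  return function () {
    let t = (seed += 0x6d2b79f5);
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Spread between the strongest and weakest team's total skill (lower is better).
function spread(teams) {
  const totals = teams.map((team) => team.reduce((s, p) => s + p.skill, 0));
  return Math.max(...totals) - Math.min(...totals);
}

function improveTeams(players, numTeams, maxStale = 2000, seed = 1) {
  const rand = mulberry32(seed);
  // Step 1: random teams (shuffle, then deal round-robin so sizes stay equal).
  const shuffled = [...players];
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  const teams = Array.from({ length: numTeams }, () => []);
  shuffled.forEach((player, i) => teams[i % numTeams].push(player));

  const initialSpread = spread(teams); // step 2
  let best = initialSpread;
  let stale = 0;
  while (stale < maxStale && best > 0) {
    // Step 3: swap one random player between two random teams.
    const a = Math.floor(rand() * numTeams);
    const b = Math.floor(rand() * numTeams);
    if (a === b || teams[a].length === 0 || teams[b].length === 0) {
      stale++;
      continue;
    }
    const i = Math.floor(rand() * teams[a].length);
    const j = Math.floor(rand() * teams[b].length);
    [teams[a][i], teams[b][j]] = [teams[b][j], teams[a][i]];
    const candidate = spread(teams); // step 4
    if (candidate < best) {
      best = candidate; // keep the improvement
      stale = 0;
    } else {
      [teams[a][i], teams[b][j]] = [teams[b][j], teams[a][i]]; // undo the swap
      stale++; // step 5: give up after maxStale swaps without improvement
    }
  }
  return { teams, spread: best, initialSpread };
}
```

Running it with a few different seeds corresponds to the advice to repeat the search to avoid a poor local optimum.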
Given the number of players per team and the gender ratio (which you can easily compute), the remaining problem is called the n-partition problem, which is unfortunately NP-complete and thus very hard to solve exactly. You will have to use approximate or heuristic algorithms (such as evolutionary algorithms) if your problem size is too big for a brute-force solution. A very simple approximation would be sorting by age and assigning players to the teams in an alternating way.
1. Assign point values to the skill levels, gender, and age
2. Assign each player the sum of their points for each criterion
3. Sort players by their calculated point value
4. Assign the next player to the first team
5. Assign players to the second team until it has at least as many total points as the first team, or until it reaches the maximum number of players
6. Repeat step 5 for each team, looping back to the first team, until all players are assigned
You can tweak the skill level, gender, and age point values to change the distribution of each.
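One possible reading of these steps as JavaScript, assuming each player already carries a combined `points` value from steps 1-2 (a hypothetical field). In step 5, the team currently being filled is compared with the team filled just before it. Each turn adds at least one player, so the loop always terminates.

```javascript
function greedyTeams(players, numTeams, maxSize) {
  if (players.length > numTeams * maxSize) {
    throw new Error("Not enough team slots for all players");
  }
  const sorted = [...players].sort((a, b) => b.points - a.points); // step 3
  const teams = Array.from({ length: numTeams }, () => ({ members: [], total: 0 }));
  const add = (t, player) => {
    teams[t].members.push(player);
    teams[t].total += player.points;
  };
  if (sorted.length === 0) return teams;
  let next = 0;
  add(0, sorted[next++]); // step 4
  let prev = 0;
  let t = 1 % numTeams;
  while (next < sorted.length) {
    // Step 5: add at least one player, then keep adding until team t has at
    // least as many points as the previous team, or it is full.
    let added = 0;
    while (
      next < sorted.length &&
      teams[t].members.length < maxSize &&
      (added === 0 || teams[t].total < teams[prev].total)
    ) {
      add(t, sorted[next++]);
      added++;
    }
    prev = t;
    t = (t + 1) % numTeams; // step 6: next team, looping back to the first
  }
  return teams;
}
```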
Let's say you have six players (for a simple example). We can take the algorithm that pairs opponents in single-elimination tournaments and adapt it to generate "even" teams based on any criteria you choose.
First rank your players best-to-worst. Don't take this too literally. You want a list of players sorted by the criteria you wish to separate them.
Let's look at single-elimination tournaments for a second. The idea of using an algorithm to generate optimal single-elimination matches is to avoid the problem of the "top players" meeting too soon in the tournament. If top players meet too soon, one of them will be eliminated early on, making the tournament less interesting. We can use this "optimal" pairing to generate teams in which the top players are spread out evenly across the teams, then the second-best players, and so on.
So list your players by the criteria you want them separated by: men first, then women, sorted by age second. We get (for example):
Player 1: Male - 18
Player 2: Male - 26
Player 3: Male - 45
Player 4: Female - 18
Player 5: Female - 26
Player 6: Female - 45
Then we'll apply the single-elimination algorithm which uses their "rank" (which is just their player number) to create "good match ups".
The single-elimination tournament generator basically works like this: take each player's rank (player number) and reverse its bits (binary). The new number you come up with becomes that player's "slot" in the tournament.
Player 1 in binary (001), reversed becomes 100 (4 decimal) = slot 4
Player 2 in binary (010), reversed becomes 010 (2 decimal) = slot 2
Player 3 in binary (011), reversed becomes 110 (6 decimal) = slot 6
Player 4 in binary (100), reversed becomes 001 (1 decimal) = slot 1
Player 5 in binary (101), reversed becomes 101 (5 decimal) = slot 5
Player 6 in binary (110), reversed becomes 011 (3 decimal) = slot 3
In a single-elimination tournament, slot 1 plays slot 2, 3-vs-4, and 5-vs-6. We're going to use these pair-ups to generate optimal teams.
Looking at the player number above, ordered by their "slot number", here is the list we came up with:
Slot 1: Female - 18
Slot 2: Male - 26
Slot 3: Female - 45
Slot 4: Male - 18
Slot 5: Female - 26
Slot 6: Male - 45
When you split the slots up into teams (two or more) you get the players in slot 1-3 vs players in slot 4-6. That is the best/optimal grouping you can get.
This technique scales very well with many more players, multiple criteria (just group them together correctly), and multiple teams.
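A JavaScript sketch of the bit-reversal trick described above. It assumes `players` is already sorted by your criteria (player number = index + 1), and it splits the slot order into equally sized consecutive chunks. The function names are hypothetical.

```javascript
// Reverse the bits of `n`, padded to a fixed width of `bits`.
function reverseBits(n, bits) {
  const reversed = n.toString(2).padStart(bits, "0").split("").reverse().join("");
  return parseInt(reversed, 2);
}

function teamsByBitReversal(players, numTeams) {
  // Bit width needed for the largest player number.
  const bits = players.length.toString(2).length;
  const bySlot = players
    .map((player, index) => ({ player, slot: reverseBits(index + 1, bits) }))
    .sort((a, b) => a.slot - b.slot)
    .map((entry) => entry.player);
  // Split the slot order into consecutive, equally sized chunks.
  const size = Math.ceil(bySlot.length / numTeams);
  const teams = [];
  for (let t = 0; t < numTeams; t++) {
    teams.push(bySlot.slice(t * size, (t + 1) * size));
  }
  return teams;
}

// The six players from the example, in the order listed above.
const example = teamsByBitReversal(
  ["Male - 18", "Male - 26", "Male - 45", "Female - 18", "Female - 26", "Female - 45"],
  2
);
```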
1. Sort players by skill
2. Assign the best players in order (i.e., team A: 1st player, team B: 2nd player, ...)
3. Assign the worst players in order
4. Loop back to step 2
5. Evaluate possible corrections and perform them (e.g., if team A has a total skill of 19 with a player of skill 4, and team B has a total skill of 21 with a player of skill 5, swap those two players so both teams end up at 20)
6. Evaluate possible corrections on gender distribution and perform them
7. Evaluate possible corrections on age distribution and perform them
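A JavaScript sketch of steps 1-5, assuming players are hypothetical `{ skill }` objects. Only the skill corrections are implemented; steps 6-7 would follow the same swap pattern with a gender or age measure. A swap is accepted only when it brings two teams' totals strictly closer. This strictly lowers the sum of squared team totals, so the correction loop terminates.

```javascript
function bestWorstDraft(players, numTeams) {
  const sorted = [...players].sort((a, b) => b.skill - a.skill); // step 1
  const teams = Array.from({ length: numTeams }, () => []);
  let lo = 0;
  let hi = sorted.length - 1;
  while (lo <= hi) {
    for (let t = 0; t < numTeams && lo <= hi; t++) teams[t].push(sorted[lo++]); // step 2
    for (let t = 0; t < numTeams && lo <= hi; t++) teams[t].push(sorted[hi--]); // step 3
  } // step 4: loop
  return teams;
}

const total = (team) => team.reduce((s, p) => s + p.skill, 0);

// Step 5: swap two players between teams whenever it brings the two totals closer.
function balanceBySwaps(teams) {
  let improved = true;
  while (improved) {
    improved = false;
    for (let a = 0; a < teams.length; a++) {
      for (let b = a + 1; b < teams.length; b++) {
        for (let i = 0; i < teams[a].length; i++) {
          for (let j = 0; j < teams[b].length; j++) {
            const ta = total(teams[a]);
            const tb = total(teams[b]);
            const d = teams[b][j].skill - teams[a][i].skill;
            // After the swap the totals become ta + d and tb - d.
            if (Math.abs(ta + d - (tb - d)) < Math.abs(ta - tb)) {
              [teams[a][i], teams[b][j]] = [teams[b][j], teams[a][i]];
              improved = true;
            }
          }
        }
      }
    }
  }
  return teams;
}
```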
Almost trivial approach for two teams:
1. Sort all players by your skill/rank assessment.
2. Assign team A the best player.
3. Assign team B the next two best players.
4. Assign team A the next two best players.
5. Go to step 3.
6. End when you're out of players.
Not very flexible, and it only works on a single ranking column, so it won't try to get similar gender or age profiles. But it does make fair, well-matched teams if the input distribution is reasonably smooth. Plus, it doesn't always end with team A having the spare player when there is an odd number.
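A short JavaScript sketch of this A, BB, AA, ... pattern, assuming each player is represented just by a numeric skill value (a simplification for illustration).

```javascript
// Two-team split: A gets the best player, then B and A alternately take the next two.
function twoTeams(skills) {
  const sorted = [...skills].sort((a, b) => b - a); // step 1, best first
  const teamA = [];
  const teamB = [];
  if (sorted.length > 0) teamA.push(sorted[0]); // step 2
  let toB = true;
  for (let i = 1; i < sorted.length; i += 2) {
    // Steps 3-5: the next two players go to B, then A, then B, ...
    (toB ? teamB : teamA).push(...sorted.slice(i, i + 2));
    toB = !toB;
  }
  return [teamA, teamB];
}
```

With skills 1 through 10, team A gets 10, 7, 6, 3, 2 (28 in total) and team B gets 9, 8, 5, 4, 1 (27 in total).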
My answer is not about scoring strategies for teams/players, because all the posted ones are good, but I would try a brute-force or random-search approach.
I don't think it's worth creating a genetic algorithm.
