I am really looking forward to implementing a Bayesian average rating system for a site I'm developing. I have run into a problem, though: all of the examples I can find on the net are for multi-value rating systems, the smallest being binary likes/dislikes (Apply Bayesian average in a NON 5-star rating system).
I cannot seem to understand how I could apply a binary Bayesian average to a unary rating system.
I have no dislikes; I have only likes.
Given the algorithm:
(n / (n + C)) * j + (C / (n + C)) * m
C is the average number of ratings an item receives
m is the average rating across all items
n is the number of ratings the current item has
j is the average rating for the current item
I get stuck on m, the average rating across all items: the average rating is 1 for everything.
How do I tweak this formula for a unary rating system?
Maybe there are other, better-suited equivalents of the Bayesian approach for such a task?
Number of likes is a one-dimensional input, so it's hard to do anything interesting without another input. Two possibilities are how old the item is and how many users have viewed it.
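For example, if you also track views, you can treat likes-per-view as the rating and apply the same shrinkage idea as the question's formula. A minimal sketch, with hypothetical names; priorRate and priorWeight play the roles of m and C:

// Smooth the raw like rate (likes / views) toward a prior rate, so items
// with few views are pulled toward priorRate instead of swinging wildly.
function smoothedLikeRate(likes, views, priorRate, priorWeight) {
  return (likes + priorWeight * priorRate) / (views + priorWeight);
}

// Example: 10 likes on 50 views, prior rate 0.1 "worth" 20 views:
// (10 + 20 * 0.1) / (50 + 20) = 12 / 70 ≈ 0.17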
Here are a couple of example scenarios about what I'm trying to figure out:
Let's say a grocery store item is listed as 4 for 5.00. How do we go about figuring the unit price for each item, according to the deal that is listed?
A simple solution would be to divide the total price by the quantity listed, and in this case, you would get 1.25.
However, in a situation that is a bit more complicated, such as 3 for 5.00, dividing the price by the quantity gives roughly 1.6666666666666667, which would round to 1.67.
If we round all three items to 1.67, the total price is not 5.00, but in fact 5.01. The individual prices would need to be calculated as 1.67, 1.67, and 1.66 in order to add up correctly.
The same goes for something like 3 for 4.00. The mathematical unit price would be 1.3333333333333333, rounding to 1.33. However, we need to adjust one of them again because the actual price without adjustments would be 3.99. The individual prices would need to be 1.34, 1.33, and 1.33 to add up correctly.
Is there an efficient way to determine how to split up a price deal like this and how to determine adjusted amounts so that the individual prices add up correctly?
If you want to divide an integer number (e.g. of pence) up into as equal portions as possible one way is to mimic dividing up that portion of a line by making marks in it, so portion i of n (counting from 0) when the total is T is of length floor((T * (i + 1)) / n) - floor((T * i) / n).
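Here is that line-marking formula as a short function, working in whole pence/cents so the portions always sum exactly to the total:

// Split T (an integer number of pence) into n portions differing by at most 1.
function fairSplit(T, n) {
  const portions = [];
  for (let i = 0; i < n; i++) {
    portions.push(Math.floor((T * (i + 1)) / n) - Math.floor((T * i) / n));
  }
  return portions;
}

// fairSplit(500, 3) => [166, 167, 167], i.e. 1.66 + 1.67 + 1.67 = 5.00
// fairSplit(400, 3) => [133, 133, 134], i.e. 1.33 + 1.33 + 1.34 = 4.00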
Whether it makes sense to say that the individual prices of 3 items are 1.67, 1.67, and 1.66 is another matter. How do you decide which item is the cheap one?
Since we're talking about basic math here, I'd say efficiency depends more on your implementation than the algorithm. When you say "efficient", do you mean you don't want to add up all prices to check the remainder? That can be done:
The premise is, you're selling x items for a price of y, where x is obviously an integer and y is obviously a float rounded to 2 decimals.
First, you need to calculate the remainder: R = (y * 100) % x.
x - R of the items will cost (y * 100 : x) / 100 (where ":" means integer division and "/" means float division),
and R of the items will cost (y * 100 : x) / 100 + 0.01.
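A minimal sketch of that calculation (splitDeal is a hypothetical name), working in integer cents so the "%" and ":" operations behave exactly as described:

// Split a deal of x items for y currency units into two price tiers.
function splitDeal(x, y) {
  const totalCents = Math.round(y * 100);  // avoid float error in y
  const base = Math.floor(totalCents / x); // the ":" integer division
  const R = totalCents % x;                // R items cost one cent more
  return {
    cheaper: { count: x - R, price: base / 100 },
    dearer: { count: R, price: (base + 1) / 100 },
  };
}

// splitDeal(3, 5.00) => 1 item at 1.66 and 2 items at 1.67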
That algorithm is efficient in theory, since it runs in O(1). But I think I remember that in hardware not much is more efficient than adding floats (don't take my word for it; I didn't pay that much attention in my hardware lectures), so maybe the crude approach is still better.
As we know, Reddit has its own ranking algorithm, and so does Stack Overflow.
I want to develop a ranking algorithm for dynamically ranking content to be audited. Users audit the content by digg/bury votes. The ranking algorithm should make the oldest content with the fewest actions be audited first.
Any ideas?
Here are the classical formulas: http://www.seomoz.org/blog/reddit-stumbleupon-delicious-and-hacker-news-algorithms-exposed
Formula:
(p - 1) / (t + 2)^1.5
Description:
Votes divided by age factor
p = votes (points) from users.
t = time since submission in hours.
1 is subtracted from p to negate the submitter's own vote.
The age factor is (time since submission in hours, plus two) raised to the power of 1.5.
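That formula translates directly into a small function; for instance:

// Score decays with age: a 10-point story scores about 1.73 at 1 hour old
// and about 0.22 at 10 hours old.
function score(points, ageInHours) {
  return (points - 1) / Math.pow(ageInHours + 2, 1.5);
}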
I'm developing a tournament model for a virtual city commerce game (Urbien.com) and would love to get some algorithm suggestions. Here's the scenario and current "basic" implementation:
Scenario
Entries are paired up duel-style, like on the original Facemash or Pixoto.com.
The "player" is a judge, who gets a stream of dueling pairs and must choose a winner for each pair.
Tournaments never end; people can submit new entries at any time, and winners of the day/week/month/millennium are chosen based on the data at that date.
Problems to be solved
Rating algorithm - how to rate tournament entries and how to adjust their ratings after each match?
Pairing algorithm - how to choose the next pair to feed the player?
Current solution
Rating algorithm - the Elo rating system currently used in chess and other tournaments.
Pairing algorithm - our current algorithm recognizes two imperatives:
Give more duels to entries that have had fewer duels so far
Match people with similar ratings with higher probability
Given:
N = total number of entries in the tournament
D = total number of duels played in the tournament so far by all players
Dx = how many duels player x has had so far
To choose players x and y to duel, we first choose player x with probability:
p(x) = (1 - (Dx / D)) / N
Then choose player y the following way:
Sort the players by rating
Let the probability of choosing player j at index jIdx in the sorted list be:
p(j) = ...
0, if (j == x)
n*r^abs(jIdx - xIdx) otherwise
where 0 < r < 1 is a coefficient to be chosen, and n is a normalization factor.
Basically, the probabilities in either direction from x form a geometric series, normalized so they sum to 1.
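As a minimal sketch of those two steps (pickPair and weightedPick are hypothetical helpers; each entry is assumed to carry { duels, rating }):

// Pick the pair (x, y) as described above: x favors entries with fewer
// duels, y favors entries whose rating rank is close to x's.
function pickPair(entries, totalDuels, r) {
  const weightsX = entries.map(e => 1 - e.duels / Math.max(totalDuels, 1));
  const x = weightedPick(entries, weightsX);
  const sorted = [...entries].sort((a, b) => a.rating - b.rating);
  const xIdx = sorted.indexOf(x);
  const weightsY = sorted.map((e, i) =>
    e === x ? 0 : Math.pow(r, Math.abs(i - xIdx)));
  const y = weightedPick(sorted, weightsY);
  return [x, y];
}

// Pick one item with probability proportional to its weight; dividing by
// the total plays the role of the normalization factor n above.
function weightedPick(items, weights) {
  const total = weights.reduce((s, w) => s + w, 0);
  let roll = Math.random() * total;
  for (let i = 0; i < items.length; i++) {
    roll -= weights[i];
    if (roll <= 0) return items[i];
  }
  return items[items.length - 1];
}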
Concerns
Maximize informational value of a duel - pairing the lowest rated entry against the highest rated entry is very unlikely to give you any useful information.
Speed - we don't want to do massive amounts of calculations just to choose one pair. One alternative is to use something like the Swiss pairing system and pair up all entries at once, instead of choosing new duels one at a time. This has the drawback (?) that all entries submitted in a given timeframe will experience roughly the same amount of duels, which may or may not be desirable.
Equilibrium - Pixoto's ImageDuel algorithm detects when entries are unlikely to further improve their rating and gives them less duels from then on. The benefits of such detection are debatable. On the one hand, you can save on computation if you "pause" half the entries. On the other hand, entries with established ratings may be the perfect matches for new entries, to establish the newbies' ratings.
Number of entries - if there are just a few entries, say 10, perhaps a simpler algorithm should be used.
Wins/Losses - how does the player's win/loss ratio affect the next pairing, if at all?
Storage - what to store about each entry and about the tournament itself? Currently stored:
Tournament Entry: # duels so far, # wins, # losses, rating
Tournament: # duels so far, # entries
Instead of throwing in Elo and ad-hoc probability formulas, you could use a standard approach based on the maximum likelihood method.
The maximum likelihood method is a method for parameter estimation and it works like this (example). Every contestant (player) is assigned a parameter s[i] (1 <= i <= N where N is total number of contestants) that measures the strength or skill of that player. You pick a formula that maps the strengths of two players into a probability that the first player wins. For example,
P(i, j) = 1/(1 + exp(s[j] - s[i]))
which is the logistic curve (see http://en.wikipedia.org/wiki/Sigmoid_function). When you then have a table that shows the actual results between the users, you use global optimization (e.g. gradient descent) to find those strength parameters s[1] .. s[N] that maximize the probability of the actually observed match results. E.g. if you have three contestants and have observed two results:
Player 1 won over Player 2
Player 2 won over Player 3
then you find parameters s[1], s[2], s[3] that maximize the value of the product
P(1, 2) * P(2, 3)
Incidentally, it can be easier to maximize
log P(1, 2) + log P(2, 3)
Note that if you use something like the logistic curve, it is only the difference of the strength parameters that matters, so you need to anchor the values somewhere, e.g. choose arbitrarily
s[1] = 0
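A minimal sketch of that fit by plain gradient ascent on the log-likelihood (fitStrengths and its parameters are hypothetical; a real implementation would add the time weights described below, and some regularization for sparse data):

// matches is a list of [winnerIdx, loserIdx]; s[0] is pinned at 0 as the anchor.
function fitStrengths(numPlayers, matches, steps = 1000, lr = 0.05) {
  const s = new Array(numPlayers).fill(0);
  const winProb = (i, j) => 1 / (1 + Math.exp(s[j] - s[i]));
  for (let step = 0; step < steps; step++) {
    const grad = new Array(numPlayers).fill(0);
    for (const [w, l] of matches) {
      const p = winProb(w, l); // predicted chance that w beats l
      grad[w] += 1 - p;        // push the winner's strength up...
      grad[l] -= 1 - p;        // ...and the loser's strength down
    }
    for (let i = 1; i < numPlayers; i++) s[i] += lr * grad[i]; // i = 0 stays anchored
  }
  return s;
}

// The example above, 0-indexed: player 0 beat 1, player 1 beat 2.
// fitStrengths(3, [[0, 1], [1, 2]]) yields s[0] > s[1] > s[2].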
In order to have more recent matches "weigh" more, you can adjust the importance of the match results based on their age. If t measures the time since a match took place (in some time units), you can maximize the value of the sum (using the example)
e^-t log P(1, 2) + e^-t' log P(2, 3)
where t and t' are the ages of the matches 1-2 and 2-3, so that those games that occurred more recently weigh more.
The interesting thing in this approach is that when the strength parameters have values, the P(...) formula can be used immediately to calculate the win/lose probability for any future match. To pair contestants, you can pair those where the P(...) value is close to 0.5, and then prefer those contestants whose time-adjusted number of matches (sum of e^-t1 + e^-t2 + ...) for match ages t1, t2, ... is low. The best thing would be to calculate the total impact of a win or loss between two players globally and then prefer those matches that have the largest expected impact on the ratings, but that could require lots of calculations.
You don't need to run the maximum likelihood estimation / global optimization algorithm all the time; you can run it e.g. once a day as a batch run and use the results for the next day for matching people together. The time-adjusted match masses can be updated real time anyway.
On the algorithmic side, you can sort the players after the maximum likelihood run based on their s parameter, so it's very easy to find equal-strength players quickly.
I need to provide a weighted sort on 2+ factors, ordered by "relevancy". However, the factors aren't completely isolated, in that I want one or more of the factors to affect the "urgency" (weight) of the others.
Example: contributed content (articles) can be up-/down-voted, and thus have a rating; they have a post date, and they're also tagged with categories. Users write the articles and can vote, and may or may not have some kind of ranking themselves (expert, etc). Probably similar to StackOverflow, right?
I want to provide each user with a list of articles grouped by tag but sorted by "relevancy", where relevancy is calculated based on the rating and age of the article, and possibly affected by the ranking of the author. I.e., a highly ranked article that was written several years ago may not necessarily be as relevant as a medium-ranked article written yesterday. And maybe if an article was written by an expert it would be treated as more relevant than one written by "Joe Schmoe".
Another good example would be assigning hotels a "meta score" comprised of price, rating, and attractions.
My question is, what is the best algorithm for multiple factor sorting? This may be a duplicate of that question, but I'm interested in a generic algorithm for any number of factors (a more reasonable expectation is 2 - 4 factors), preferably a "fully-automatic" function that I don't have to tweak or require user input, and I can't parse linear algebra and eigenvector wackiness.
Possibilities I've found so far:
Note: S is the "sorting score"
"Linearly weighted" - use a function like: S = (w1 * F1) + (w2 * F2) + (w3 * F3), where wx are arbitrarily assigned weights, and Fx are the values of the factors. You'd also want to normalize F (i.e. Fx_n = Fx / Fmax). I think this is kinda how Lucene search works.
"Base-N weighted" - more like grouping than weighting, it's just a linear weighting where weights are increasing multiples of base-10 (a similar principle to CSS selector specificity), so that more important factors are significantly higher: S = 1000 * F1 + 100 * F2 + 10 * F3 ....
Estimated True Value (ETV) - this is apparently what Google Analytics introduced in their reporting, where the value of one factor influences (weights) another factor - the consequence being to sort on more "statistically significant" values. The link explains it pretty well, so here's just the equation: S = (F2 / F2_max * F1) + ((1 - (F2 / F2_max)) * F1_avg), where F1 is the "more important" factor ("bounce rate" in the article), and F2 is the "significance modifying" factor ("visits" in the article).
Bayesian Estimate - looks really similar to ETV; this is how IMDb calculates their rating. See this StackOverflow post for an explanation; equation: S = (F2 / (F2 + F2_lim)) * F1 + (F2_lim / (F2 + F2_lim)) * F1_avg, where Fx are the same as #3, and F2_lim is the minimum threshold limit for the "significance" factor (i.e. any value less than X shouldn't be considered).
Options #3 or #4 look really promising, since you don't really have to choose an arbitrary weighting scheme like you do in #1 and #2, but the problem is how do you do this for more than two factors?
I also came across the SQL implementation for a two-factor weighting algorithm, which is basically what I'll need to write eventually.
As mentioned in the comments, I would suggest what's called the 'compromise solution' to anyone with a similar problem who is more concerned with not having to set weights than with making one criterion more heavily weighted than the others.
Basically, you consider each of your criterion as a coordinate (after normalization, of course). Based on your judgement, you choose the absolute optimal point, e.g. in this case, the highest rank author, the newest article, etc. Once you choose the optimal solution, each other 'solution' is rated based on its distance from that optimal. A sample formula would be the inverse of the Euclidean distance for each article's score: S = 1/(sqrt((rank - rank_ideal)^2 + (age - age_ideal)^2 + ... + (xn - xn_ideal)^2)).
This treats all criteria as equal, so keep that in mind.
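A minimal sketch of that scoring, assuming each factor has already been normalized to [0, 1] (compromiseScore is a hypothetical name):

// Score = inverse Euclidean distance from the ideal point; the ideal item
// itself gets Infinity, which naturally sorts first.
function compromiseScore(factors, ideal) {
  let sumSq = 0;
  for (let i = 0; i < factors.length; i++) {
    sumSq += Math.pow(factors[i] - ideal[i], 2);
  }
  return 1 / Math.sqrt(sumSq);
}

// e.g. compromiseScore([normalizedRank, normalizedAge], [1, 1])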
Consider chaining of the weights. E.g. you have 3 factors: X, Y and Z.
You can calculate ETVyz as W = (Z/Zmax * Y) + (1 - Z/Zmax) * Yavg for each record and then calculate ETVxw as S = (W/Wmax * X) + (1 - W/Wmax) * Xavg.
You can chain more factors similarly.
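A minimal sketch of one chaining step (etv is a hypothetical name):

// One ETV step: blend the primary factor toward its average, weighted by
// how significant this record is relative to the maximum.
function etv(primary, primaryAvg, significance, sigMax) {
  const s = significance / sigMax;
  return s * primary + (1 - s) * primaryAvg;
}

// Three factors X, Y, Z as above:
// const w = etv(y, yAvg, z, zMax);      // W = ETVyz
// const score = etv(x, xAvg, w, wMax);  // S = ETVxw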
The solution briefly pointed out by #gankoji is a simplification of the TOPSIS method.
In TOPSIS the compromise solution can be regarded as choosing the solution with the shortest Euclidean distance from the ideal solution and the farthest Euclidean distance from the negative ideal solution.
This class of problems falls under the term MCDM - Multiple Criteria Decision Making.
Python packages scikit-criteria and mcdm provide implementations of most popular methods. The package docs link to the respective algorithm papers.
For a school project, we'll have to implement a ranking system. However, we figured that a dumb rank average would suck: something that one user ranked 5 stars would have a better average than something 188 users ranked 4 stars, and that's just stupid.
So I'm wondering if any of you have an example algorithm of "smart" ranking. It only needs to take in account the rankings given and the number of rankings.
Thanks!
You can use a method inspired by Bayesian probability. The gist of the approach is to have an initial belief about the true rating of an item, and use users' ratings to update your belief.
This approach requires two parameters:
What do you think is the true "default" rating of an item, if you have no ratings at all for the item? Call this number R, the "initial belief".
How much weight do you give to the initial belief, compared to the user ratings? Call this W, where the initial belief is "worth" W user ratings of that value.
With the parameters R and W, computing the new rating is simple: assume you have W ratings of value R along with any user ratings, and compute the average. For example, if R = 2 and W = 3, we compute the final score for various scenarios below:
100 (user) ratings of 4: (3*2 + 100*4) / (3 + 100) = 3.94
3 ratings of 5 and 1 rating of 4: (3*2 + 3*5 + 1*4) / (3 + 3 + 1) = 3.57
10 ratings of 4: (3*2 + 10*4) / (3 + 10) = 3.54
1 rating of 5: (3*2 + 1*5) / (3 + 1) = 2.75
No user ratings: (3*2 + 0) / (3 + 0) = 2
1 rating of 1: (3*2 + 1*1) / (3 + 1) = 1.75
This computation takes into consideration the number of user ratings, and the values of those ratings. As a result, the final score roughly corresponds to how happy one can expect to be about a particular item, given the data.
Choosing R
When you choose R, think about what value you would be comfortable assuming for an item with no ratings. Is the typical no-rating item actually 2.4 out of 5, if you were to instantly have everyone rate it? If so, R = 2.4 would be a reasonable choice.
You should not use the minimum value on the rating scale for this parameter, since an item rated extremely poorly by users should end up "worse" than a default item with no ratings.
If you want to pick R using data rather than just intuition, you can use the following method:
Consider all items with at least some threshold of user ratings (so you can be confident that the average user rating is reasonably accurate).
For each item, assume its "true score" is the average user rating.
Choose R to be the median of those scores.
If you want to be slightly more optimistic or pessimistic about a no-rating item, you can choose R to be a different percentile of the scores, for instance the 60th percentile (optimistic) or 40th percentile (pessimistic).
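A minimal sketch of that procedure, assuming each item carries a ratings array (chooseR is a hypothetical name):

// Average the ratings of every item with at least minRatings ratings, then
// take a percentile of those averages (0.5 = the median suggested above).
function chooseR(items, minRatings, percentile = 0.5) {
  const averages = items
    .filter(item => item.ratings.length >= minRatings)
    .map(item => item.ratings.reduce((s, r) => s + r, 0) / item.ratings.length)
    .sort((a, b) => a - b);
  return averages[Math.floor(percentile * (averages.length - 1))];
}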
Choosing W
The choice of W should depend on how many ratings a typical item has, and how consistent ratings are. W can be higher if items naturally obtain many ratings, and W should be higher if you have less confidence in user ratings (e.g., if you have high spammer activity). Note that W does not have to be an integer, and can be less than 1.
Choosing W is a more subjective matter than choosing R. However, here are some guidelines:
If a typical item obtains C ratings, then W should not exceed C, or else the final score will depend more on R than on the actual user ratings. Instead, W should be a small fraction of C, perhaps between C/20 and C/5 (depending on how noisy or "spammy" ratings are).
If historical ratings are usually consistent (for an individual item), then W should be relatively small. On the other hand, if ratings for an item vary wildly, then W should be relatively large. You can think of this algorithm as "absorbing" W ratings that are abnormally high or low, turning those ratings into more moderate ones.
In the extreme, setting W = 0 is equivalent to using only the average of user ratings. Setting W = infinity is equivalent to proclaiming that every item has a true rating of R, regardless of the user ratings. Clearly, neither of these extremes is appropriate.
Setting W too large can have the effect of favoring an item with many moderately-high ratings over an item with slightly fewer exceptionally-high ratings.
I appreciated the top answer at the time of posting, so here it is codified as JavaScript:
const defaultR = 2;
// defaultW should not exceed the typical number of ratings per item;
// 0 would be equivalent to using only the average of the user ratings.
const defaultW = 3;
function getSortAlgoValue(ratings) {
  const sumOfRatings = ratings.reduce((sum, r) => sum + r, 0);
  return (defaultR * defaultW + sumOfRatings) / (defaultW + ratings.length);
}
Only listed as a separate answer because the formatting of the code block as a reply wasn't very readable.
Since you've stated that the machine would only be given the rankings and the number of rankings, I would argue that it may be negligent to attempt a calculated weighting method.
First, there are too many unknowns to confirm the proposition that, in enough circumstances, a larger quantity of ratings is a better indication of quality than a smaller number of ratings. One example: how long have rankings been collected? Has an equal collection duration (equal attention) been given to the different items ranked with this same method? Other unknowns are which markets have had access to the item and, of course, who specifically ranked it.
Secondly, you've stated in a comment below the question that this is not for front-end use but rather "the ratings are generated by machines, for machines," as a response to my comment that "it's not necessarily only statistical. One person might consider 50 ratings enough, where that might not be enough for another. And some raters' profiles might look more reliable to one person than to another. When that's transparent, it lets the user make a more informed assessment."
Why would that be any different for machines? :)
In any case, if this is about machine-to-machine rankings, the question needs greater detail in order for us to understand how different machines might generate and use the rankings.
Can a ranking generated by a machine be flawed (so as to suggest that more rankings may somehow compensate for those "flawed" rankings)? What does that even mean? Is it a machine error? Or is it because the item has no use to this particular machine, for example? There are many issues here we might first want to unpack, including that if we have access to how the machines generate the ranking, then on some level we may already know what the item means for a given machine, making the aggregated ranking superfluous.
What you can find on different platforms is the hiding of ratings that do not have enough votes: "This item does not have enough votes."
The problem is that you can't capture this in one easy formula for calculating a ranking.
I would suggest hiding the rating of anything with fewer than a minimum number of votes, but internally calculating a moving average. I always prefer a moving average over a total average, as it favors recent votes over very old votes, which might have been given under totally different circumstances.
Additionally, you then do not need to keep a list of all votes; you just keep the calculated average, and each new vote simply updates this value:
newAverage = weight * newVote + (1 - weight) * oldAverage
with a weight of about 0.05 for a preference for roughly the last 20 values (just experiment with this weight).
Additionally, I would start with these conditions:
no votes = medium-range value (1-5 stars => start with 3 stars)
the average is not shown if fewer than 10 votes have been given.
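A minimal sketch of that whole scheme (the names are hypothetical):

// Exponential moving average with a medium-range start value; the rating is
// hidden (null) until at least minVotes votes have been collected.
function makeMovingRating(weight = 0.05, startValue = 3, minVotes = 10) {
  let average = startValue;
  let votes = 0;
  return {
    addVote(v) {
      average = weight * v + (1 - weight) * average;
      votes++;
    },
    display() {
      return votes < minVotes ? null : average;
    },
  };
}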
A simple solution might be a weighted average:
sum(votes) / number_of_votes
That way, 3 people voting 1 star and one person voting 5 would give a weighted average of (1+1+1+5)/4 = 2 stars.
Simple, effective, and probably sufficient for your purposes.