Weighted Average and Ratings - algorithm

Maths isn't my strong point and I'm at a loss here.
Basically, all I need is a simple formula that will give a weighted rating on a scale of 1 to 5. If there are very few votes, they carry less influence and the rating presses more towards the average (in this case I want it to be 3, not the average of all other ratings).
I've tried a few different Bayesian implementations but these haven't worked out. I believe the graphical representation I am looking for could be shown as:
        ___
       /
   ___/
Cheers

I'd do it this way:
1*num(1) + 2*num(2) + 3*num(3) + 4*num(4) + 5*num(5) + A*3
-----------------------------------------------------------
num(1) + num(2) + num(3) + num(4) + num(5) + A
Where num(i) is number of votes for i.
A is a parameter. I can't give you an exact value for it; it depends on what you mean by "few votes". In general, a high value of A means you need many votes to move the average away from 3, and a low value of A means you need only a few votes to move it away from 3.
If you consider 5 to be "few votes", then you can take A=5.
In this solution I just assume that each product starts with A votes for 3 instead of no votes.
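As a minimal sketch of the formula above (the function name, the `counts` object shape, and the default values are assumptions for illustration, not part of the answer), the "A phantom votes for 3" idea can be written as:

```javascript
// Weighted rating that starts every item with `priorVotes` phantom votes
// for `priorRating` (A and 3 in the formula above).
// `counts` maps each star value (1..5) to its number of votes.
// Assumes priorVotes > 0, otherwise an item with no votes divides by zero.
function weightedRating(counts, priorVotes = 5, priorRating = 3) {
  let weightedSum = priorVotes * priorRating;
  let totalVotes = priorVotes;
  for (let stars = 1; stars <= 5; stars++) {
    const n = counts[stars] || 0;
    weightedSum += stars * n;
    totalVotes += n;
  }
  return weightedSum / totalVotes;
}

console.log(weightedRating({}));         // no votes: exactly the prior, 3
console.log(weightedRating({ 5: 1 }));   // one 5-star vote: (5 + 15) / 6
console.log(weightedRating({ 5: 100 })); // many 5-star votes: close to 5
```

With few votes the result stays near 3, and it approaches the plain average as real votes outnumber the phantom ones.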
Hope it helps.

(sum(ratings) / number(ratings)) * min(number(ratings), 10)/max(number(ratings), 10)
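The first part is the plain (un-normalized) average rating. The second part is a factor that grows from 0.1 to 1 as the number of individual ratings grows to 10, so the result slowly rises toward the plain average. Note that, as written, the factor becomes 10/number(ratings) once there are more than 10 ratings and so shrinks again; dividing by a constant 10 instead of max(number(ratings), 10) keeps it at 1. The question isn't clear enough for me to provide a better answer, but I believe the above formula might be something you can start with and adapt as you go. It goes without saying that you have to check whether there are any ratings at all (so as not to divide by zero).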

Related

Determining an algorithm to calculate individual item prices for "deal" listings

Here are a couple of example scenarios about what I'm trying to figure out:
Let's say a grocery store item is listed as 4 for 5.00. How do we go about figuring the unit price for each item, according to the deal that is listed?
A simple solution would be to divide the total price by the quantity listed, and in this case, you would get 1.25.
However, in a situation that is a bit more complicated, such as 3 for 5.00, dividing the price by the quantity gives roughly 1.6666666666666667, which would round to 1.67.
If we round all three items to 1.67, the total price is not 5.00, but in fact 5.01. The individual prices would need to be calculated as 1.67, 1.67, and 1.66 in order to add up correctly.
The same goes for something like 3 for 4.00. The mathematical unit price would be 1.3333333333333333, rounding to 1.33. However, we need to adjust one of them again because the actual price without adjustments would be 3.99. The individual prices would need to be 1.34, 1.33, and 1.33 to add up correctly.
Is there an efficient way to determine how to split up a price deal like this and how to determine adjusted amounts so that the individual prices add up correctly?
If you want to divide an integer number (e.g. of pence) up into as equal portions as possible one way is to mimic dividing up that portion of a line by making marks in it, so portion i of n (counting from 0) when the total is T is of length floor((T * (i + 1)) / n) - floor((T * i) / n).
Whether it makes sense to say that the individual prices of 3 items are 1.67, 1.67, and 1.66 is another matter. How do you decide which item is the cheap one?
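The floor-difference formula above can be sketched as follows, working in integer cents (the function name `splitEvenly` is hypothetical):

```javascript
// Split an integer total (pence/cents) into n near-equal integer portions
// using floor(T * (i + 1) / n) - floor(T * i / n) for i = 0 .. n-1.
function splitEvenly(totalCents, n) {
  const parts = [];
  for (let i = 0; i < n; i++) {
    const upper = Math.floor((totalCents * (i + 1)) / n);
    const lower = Math.floor((totalCents * i) / n);
    parts.push(upper - lower);
  }
  return parts;
}

console.log(splitEvenly(500, 3)); // [ 166, 167, 167 ]
console.log(splitEvenly(400, 3)); // [ 133, 133, 134 ]
```

Because the portions are differences of consecutive "marks", they always sum to the total. Which position receives the odd cent is a side effect of the formula, not a pricing decision.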
Since we're talking about basic math here, I'd say efficiency depends more on your implementation than the algorithm. When you say "efficient", do you mean you don't want to add up all prices to check the remainder? That can be done:
The premise is, you're selling x items for a price of y, where x is obviously integer and y is obviously a float rounded to 2 decimals.
First, you need to calculate the remainder: R = (y*100)%x.
x-R of the items will cost (y * 100 : x) /100 (where ":" means integer division and "/" means float division)
and R of the items will cost (y * 100 : x) / 100 + 0.01
That algorithm is theoretically speaking efficient since it has a complexity of O(1). But I think I remember that in hardware not much is more efficient than adding floats (Don't take my word for it, I didn't pay that much attention to my hardware lectures), so maybe the crude approach is still better.
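A rough sketch of the remainder approach described above, using integer cents so no float drift creeps in (the function name and the returned object shape are assumptions for illustration):

```javascript
// Unit prices for a deal of `quantity` items for `totalPrice`.
// R = totalCents % quantity items cost one cent more than the rest.
function dealPrices(quantity, totalPrice) {
  const totalCents = Math.round(totalPrice * 100);
  const base = Math.floor(totalCents / quantity); // integer division
  const remainder = totalCents % quantity;        // R in the text above
  return {
    higher: { count: remainder, cents: base + 1 },
    lower: { count: quantity - remainder, cents: base },
  };
}

console.log(dealPrices(3, 5.0)); // 2 items at 167 cents, 1 item at 166 cents
console.log(dealPrices(3, 4.0)); // 1 item at 134 cents, 2 items at 133 cents
```

When the price divides evenly, `higher.count` is 0 and every item costs `lower.cents`.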

Find all possible combinations of a scores consistent with data

So I've been working on a problem in my spare time and I'm stuck. Here is where I'm at. I have a number 40. It represents players. I've been given other numbers 39, 38, .... 10. These represent the scores of the first 30 players (1 -30). The rest of the players (31-40) have some unknown score. What I would like to do is find how many combinations of the scores are consistent with the given data.
So for a simpler example: if you have 3 players. One has a score of 1. Then the number of possible combinations of the scores is 3 (0,2; 2,0; 1,1), where (a,b) stands for the number of wins for player one and player two, respectively. A combination of (3,0) wouldn't work because no person can have 3 wins. Nor would (0,0) work because we need a total of 3 wins (and wouldn't get it with 0,0).
I've found the total possible number of games. This is the total number of games played, which means it is the total number of wins. (There are no ties.) Finally, I have a variable for the max wins per player (which is one less than the total number of players. No player can have more than that.)
I've tried finding the number of unique combinations by spreading out N wins to each player and then subtracting combinations that don't fit the criteria. E.g., to figure out how many ways there are to give 10 victories to 5 people with no more than 4 victories to each person, you would use:
C(14,4) - C(5,1)*C(9,4) + C(5,2)*C(4,4) = 381. C(14,4) comes from the formula C(n+k-1, k-1) (google "stars and bars", I believe). The next term removes the distributions in which someone gets 5 or more (not allowed), and the last term adds back the ones we subtracted twice.
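The inclusion-exclusion count above can be sketched in code. This is only an illustration: the function names are hypothetical, and BigInt is used so the large binomials mentioned later stay exact.

```javascript
// Binomial coefficient C(n, k) computed exactly with BigInt.
function binomial(n, k) {
  if (k < 0 || k > n) return 0n;
  let result = 1n;
  for (let i = 1n; i <= BigInt(k); i++) {
    // After step i, result equals C(n - k + i, i), so the division is exact.
    result = (result * (BigInt(n) - BigInt(k) + i)) / i;
  }
  return result;
}

// Ways to give `wins` wins to `players` players with at most `cap` each:
// sum over j of (-1)^j * C(players, j) * C(wins - j*(cap+1) + players - 1, players - 1).
function boundedDistributions(wins, players, cap) {
  let total = 0n;
  for (let j = 0; j <= players; j++) {
    const remaining = wins - j * (cap + 1);
    if (remaining < 0) break;
    const term = binomial(players, j) * binomial(remaining + players - 1, players - 1);
    total += j % 2 === 0 ? term : -term;
  }
  return total;
}

console.log(boundedDistributions(10, 5, 4)); // 381n, matching the hand calculation
```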
Yeah, there has got to be an easier way. Lastly, the numbers get so big that I'm not sure that my computer can adequately handle them. We're talking about C(780, 39), which is 1.15495183 × 10^66. Regardless, there should be a better way of doing this.
To recap, you have 40 people. The scores of the first 30 people are 10 - 39. The last ten people have unknown scores. How many score combinations can you generate that meet the criteria: all the scores add up to the total possible wins, and each player gets no more than 39 wins?
Thoughts?
Generating functions:
Since the question is more about math, but still on a programming QA site, let me give you a partial solution that works for many of these problems using a symbolic algebra system (like Maple or Mathematica). I highly recommend that you grab an intro combinatorics book; these kinds of questions are answered there.
First of all the first 30 players who score 10-39 (with a total score of 735) are a bit of a red herring - what we would like to do is solve the other problem, the remaining 10 players whose score could be in the range of (0...39).
If we think of the possible scores of the players as the polynomial:
f(x) = x^0 + x^1 + x^2 + ... x^39
Where a value of x^2 is the score of 2 for example, consider what this looks like
f(x)^10
This represents the combined score of all 10 players, e.g. the coefficient of x^385 is 2002, which represents the fact that there are 2002 ways for the 10 players to score 385. Wolfram Alpha (a programming language IMO) can evaluate this for us.
If you'd like to know the total number of possible score combinations, just substitute x=1 into the expression: f(1)^10 = 40^10 = 10,485,760,000,000,000, since f(x) as written has 40 terms (no surprise!)
Why is this useful?
While I know it may seem silly to some to set up all this machinery for a simple problem that can be solved on paper - the approach of generating functions is useful when the problem gets messy (and asymptotic analysis is possible). Consider the same problem, but now we restrict the players to only score prime numbers (2,3,5,7,11,...). How many possible ways can the 10 of them score a specific number, say 344? Just modify your f(x):
f(x) = x^2 + x^3 + x^5 + x^7 + x^11 ...
and repeat the process! (I get [x^344]f(x)^10 = 1390).
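If no symbolic algebra system is at hand, the coefficients can also be obtained by multiplying the polynomial out directly. The following is a minimal sketch under that assumption (function names are hypothetical; BigInt keeps the coefficients exact):

```javascript
// Multiply two polynomials stored as arrays of BigInt coefficients,
// where the array index is the exponent of x.
function multiply(a, b) {
  const out = new Array(a.length + b.length - 1).fill(0n);
  for (let i = 0; i < a.length; i++) {
    if (a[i] === 0n) continue;
    for (let j = 0; j < b.length; j++) {
      out[i + j] += a[i] * b[j];
    }
  }
  return out;
}

// Raise a polynomial to a non-negative integer power by repeated multiplication.
function power(poly, exponent) {
  let result = [1n];
  for (let k = 0; k < exponent; k++) result = multiply(result, poly);
  return result;
}

// f(x) = x^0 + x^1 + ... + x^39: every score from 0 to 39 is allowed once.
const f = new Array(40).fill(1n);
const f10 = power(f, 10);

console.log(f10[385]); // 2002n: ways for 10 players to total 385
```

To restrict the allowed scores (for example to primes), one would set only those indices of `f` to `1n` and leave the rest at `0n`; summing all entries of `f10` gives the value of the expression at x = 1.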

How to provide most relevant results with Multiple Factor Weighted Sorting

I need to provide a weighted sort on 2+ factors, ordered by "relevancy". However, the factors aren't completely isolated, in that I want one or more of the factors to affect the "urgency" (weight) of the others.
Example: contributed content (articles) can be up-/down-voted, and thus have a rating; they have a post date, and they're also tagged with categories. Users write the articles and can vote, and may or may not have some kind of ranking themselves (expert, etc). Probably similar to StackOverflow, right?
I want to provide each user with a list of articles grouped by tag but sorted by "relevancy", where relevancy is calculated based on the rating and age of the article, and possibly affected by the ranking of the author. I.E. a highly ranked article that was written several years ago may not necessarily be as relevant as a medium ranked article written yesterday. And maybe if an article was written by an expert it would be treated as more relevant than one written by "Joe Schmoe".
Another good example would be assigning hotels a "meta score" comprised of price, rating, and attractions.
My question is, what is the best algorithm for multiple factor sorting? This may be a duplicate of that question, but I'm interested in a generic algorithm for any number of factors (a more reasonable expectation is 2 - 4 factors), preferably a "fully-automatic" function that I don't have to tweak or require user input, and I can't parse linear algebra and eigenvector wackiness.
Possibilities I've found so far:
Note: S is the "sorting score"
1. "Linearly weighted" - use a function like: S = (w1 * F1) + (w2 * F2) + (w3 * F3), where wx are arbitrarily assigned weights, and Fx are the values of the factors. You'd also want to normalize F (i.e. Fx_n = Fx / Fmax). I think this is kinda how Lucene search works.
2. "Base-N weighted" - more like grouping than weighting, it's just a linear weighting where weights are increasing multiples of base-10 (a similar principle to CSS selector specificity), so that more important factors are significantly higher: S = 1000 * F1 + 100 * F2 + 10 * F3 ....
3. Estimated True Value (ETV) - this is apparently what Google Analytics introduced in their reporting, where the value of one factor influences (weights) another factor - the consequence being to sort on more "statistically significant" values. The link explains it pretty well, so here's just the equation: S = (F2 / F2_max * F1) + ((1 - (F2 / F2_max)) * F1_avg), where F1 is the "more important" factor ("bounce rate" in the article), and F2 is the "significance modifying" factor ("visits" in the article).
4. Bayesian Estimate - looks really similar to ETV, this is how IMDb calculates their rating. See this StackOverflow post for explanation; equation: S = (F2 / (F2+F2_lim)) * F1 + (F2_lim / (F2+F2_lim)) * F1_avg, where Fx are the same as #3, and F2_lim is the minimum threshold limit for the "significance" factor (i.e. any value less than X shouldn't be considered).
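To make the last two equations concrete, here is a small sketch of both (the function and parameter names are assumptions chosen to mirror F1, F2, F2_max, F2_lim and F1_avg):

```javascript
// Estimated True Value: blend f1 with its average, trusting f1 more
// as the significance factor f2 approaches f2Max.
function etv(f1, f2, f2Max, f1Avg) {
  const trust = f2 / f2Max;
  return trust * f1 + (1 - trust) * f1Avg;
}

// Bayesian estimate: same blend, but trust is f2 / (f2 + f2Lim),
// so it approaches 1 only as f2 grows large relative to f2Lim.
function bayesianEstimate(f1, f2, f2Lim, f1Avg) {
  const trust = f2 / (f2 + f2Lim);
  return trust * f1 + (1 - trust) * f1Avg;
}

// A 5-star article with 1 vote vs. a 4-star article with 100 votes,
// assuming an average rating of 3 across all articles.
console.log(etv(5, 1, 100, 3), etv(4, 100, 100, 3));
console.log(bayesianEstimate(5, 1, 10, 3), bayesianEstimate(4, 100, 10, 3));
```

In both variants the well-voted 4-star article outranks the barely-voted 5-star one, which is the behavior the two formulas are meant to produce.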
Options #3 or #4 look really promising, since you don't really have to choose an arbitrary weighting scheme like you do in #1 and #2, but the problem is how do you do this for more than two factors?
I also came across the SQL implementation for a two-factor weighting algorithm, which is basically what I'll need to write eventually.
As mentioned in the comments, I would suggest what's called the 'compromise solution' to anyone with a similar problem who is more concerned with not having to set weights than with making one criterion more heavily weighted than the others.
Basically, you consider each of your criteria as a coordinate (after normalization, of course). Based on your judgement, you choose the absolute optimal point, e.g. in this case, the highest rank author, the newest article, etc. Once you choose the optimal solution, each other 'solution' is rated based on its distance from that optimal. A sample formula would be the inverse of the Euclidean distance for each article's score: S = 1/(sqrt((rank - rank_ideal)^2 + (age - age_ideal)^2 + ... + (xn - xn_ideal)^2)).
This treats all criteria as equal, so keep that in mind.
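A possible sketch of this distance-based score, assuming every criterion has already been normalized to the range 0 to 1 and that an item sitting exactly on the ideal point should rank first (the function name is hypothetical):

```javascript
// Score = inverse Euclidean distance between an item and the ideal point.
// `values` and `ideal` are arrays of normalized criteria in the same order.
function compromiseScore(values, ideal) {
  const squared = values.reduce((sum, v, i) => sum + (v - ideal[i]) ** 2, 0);
  const distance = Math.sqrt(squared);
  // An item at the ideal point has distance 0; treat it as the best score.
  return distance === 0 ? Infinity : 1 / distance;
}

const ideal = [1, 1]; // e.g. [normalized rating, normalized freshness]
console.log(compromiseScore([0.9, 0.2], ideal)); // highly rated but old
console.log(compromiseScore([0.6, 0.9], ideal)); // medium rating, recent: scores higher
```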
Consider chaining of the weights. E.g. you have 3 factors: X, Y and Z.
You can calculate ETVyz as W = (Z/Zmax * Y) + (1 - Z/Zmax) * Yavg for each record and then calculate ETVxw as S = (W/Wmax * X) + (1 - W/Wmax) * Xavg.
You can chain more factors similarly.
The solution briefly pointed out by @gankoji is a simplification of the TOPSIS method.
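The chaining of X, Y and Z described above might look like this in code. It is a sketch under the assumptions that all factor values are positive and that records are plain objects with `x`, `y` and `z` fields (both the shape and the function names are hypothetical):

```javascript
// One ETV step: blend `primary` with its average according to modifier / modifierMax.
function blend(primary, modifier, modifierMax, primaryAvg) {
  const trust = modifier / modifierMax;
  return trust * primary + (1 - trust) * primaryAvg;
}

// First compute W from Y and Z, then compute S from X and W.
function chainedScores(records) {
  const avg = (xs) => xs.reduce((s, v) => s + v, 0) / xs.length;
  const zMax = Math.max(...records.map((r) => r.z));
  const yAvg = avg(records.map((r) => r.y));
  const w = records.map((r) => blend(r.y, r.z, zMax, yAvg));
  const wMax = Math.max(...w);
  const xAvg = avg(records.map((r) => r.x));
  return records.map((r, i) => blend(r.x, w[i], wMax, xAvg));
}

console.log(chainedScores([
  { x: 1, y: 1, z: 1 },
  { x: 3, y: 3, z: 3 },
]));
```

Adding a fourth factor would mean running one more `blend` step with the previous result as the modifier.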
In TOPSIS the compromise solution can be regarded as choosing the solution with the shortest Euclidean distance from the ideal solution and the farthest Euclidean distance from the negative ideal solution.
This class of problems falls under the term MCDM - Multiple Criteria Decision Making.
Python packages scikit-criteria and mcdm provide implementations of most popular methods. The package docs link to the respective algorithm papers.

How to combine various measures into a single measure

I have several measures:
Profit and loss (PNL).
Win to loss ratio (W2L).
Avg gain to drawdown ratio (AG2AD).
Max gain to maximum drawdown ratio (MG2MD).
Number of consecutive gains to consecutive losses ratio (NCG2NCL).
If there were only 3 measures (A, B, C), then I could represent the "total" measure as a magnitude of a 3D vector:
R = SQRT(A^2 + B^2 + C^2)
If I want to combine those 5 measures into a single value, would it make sense to represent them as the magnitude of a 5D vector? Is there a way to put more "weight" on certain measures, such as the PNL? Is there a better way to combine them?
Update:
I'm trying to write a function (in C#) that takes in 5 measures and represents them in a linear manner so I can collapse the multidimensional values into a single linear value. The point of this is that it will allow me to only use one variable (save memory) and it will provide a fast method of comparison between two sets of measures. Almost like building a hash value, but each hash can be used for comparison (i.e. >, <, ==).
The statistical significance of the values is the same as the order they're listed: PNL is the most significant while NCG2NCL is the least significant.
If I want to combine those 5 measures into a single value, would it make sense to represent them as the magnitude of a 5D vector?
Absolutely, if result suits you.
Is there a way to put more "weight" on certain measures, such as the PNL?
You can introduce constant weights
SQRT(wa*A^2 + wb*B^2 + wc*C^2)
Is there a better way to combine them?
That depends on your requirements. In particular, there's nothing wrong with using simple sum |A| + |B| + |C|, that would favour 'average' properties better. I.e., with your formula (0, 0, 9) gives much better total than (3, 3, 3), while with the simple sum they would be equivalent.
Generally speaking Oli is right: you'll have to make the decision yourself, no algorithm book can evaluate the requirements for you.
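For comparison, the two combinations discussed in this answer can be sketched as follows (the question asks about C#, but JavaScript is used here for illustration; the function names and the example weights are assumptions):

```javascript
// Weighted magnitude: SQRT(w1*m1^2 + w2*m2^2 + ...).
function weightedMagnitude(measures, weights) {
  return Math.sqrt(measures.reduce((sum, m, i) => sum + weights[i] * m * m, 0));
}

// Simple sum of absolute values: |m1| + |m2| + ...
function absoluteSum(measures) {
  return measures.reduce((sum, m) => sum + Math.abs(m), 0);
}

const equalWeights = [1, 1, 1];
console.log(weightedMagnitude([0, 0, 9], equalWeights)); // 9
console.log(weightedMagnitude([3, 3, 3], equalWeights)); // about 5.2
console.log(absoluteSum([0, 0, 9]), absoluteSum([3, 3, 3])); // 9 9
```

The magnitude rewards one extreme value, while the plain sum treats the balanced and the lopsided set as equal; the same functions work unchanged for five measures.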
Combining measures into a single value is risky at best. However you do it, you lose information. If I have 3 oranges, an apple and a couple of slices of bread I can combine them in various ways:
Sum (3 + 1 + 2 ) = 6
Weighted sum ( .5 * 3 + 2 * 1 + 1.5 * 2) = 6.5
SQRT( 3 ^ 2 + 1 ^ 2 + 2 ^ 2) = SQRT ( 14 ) ~= 3.7
SQRT( 3 ^ 2 + 2 * 1 ^ 2 + 2 ^ 2) = SQRT ( 15 ) ~= 3.9
and on and on.
Whichever result I get is less meaningful than the first. Throw in a steak and a glass of water and the value becomes even less meaningful. The result is always some measure of servings of food.
You need to figure out how to convert your various values into values with equivalent scales (linear or log) and equivalent value (1 X ~= 1 Y ~= 1 Z). At that point a simple sum or product may be sufficient. In your case, it appears you are trying to combine various measures of financial return. Some of the measures you are using are not highly comparable.
As others have noted, there are an infinite number of ways of combining values. You've tagged the question machine-learning and artificial-intelligence, which suggests you might want to find the optimum way of combining them? E.g. come up with a "goodness" metric, and try to model this from the others. Then there are a range of machine learning algorithms - e.g. a Bayesian model would be a good start: fast, and it generally performs well, if not necessarily the best.
I would suggest implementing this using principal component analysis. That will give you the weights you need for your coefficients. You can either do this via a stat package or use a packaged C# function.
-Ralph Winters

How to balance number of ratings versus the ratings themselves?

For a school project, we'll have to implement a ranking system. However, we figured that a dumb rank average would suck: something that one user ranked 5 stars would have a better average than something 188 users ranked 4 stars, and that's just stupid.
So I'm wondering if any of you have an example algorithm of "smart" ranking. It only needs to take in account the rankings given and the number of rankings.
Thanks!
You can use a method inspired by Bayesian probability. The gist of the approach is to have an initial belief about the true rating of an item, and use users' ratings to update your belief.
This approach requires two parameters:
What do you think is the true "default" rating of an item, if you have no ratings at all for the item? Call this number R, the "initial belief".
How much weight do you give to the initial belief, compared to the user ratings? Call this W, where the initial belief is "worth" W user ratings of that value.
With the parameters R and W, computing the new rating is simple: assume you have W ratings of value R along with any user ratings, and compute the average. For example, if R = 2 and W = 3, we compute the final score for various scenarios below:
100 (user) ratings of 4: (3*2 + 100*4) / (3 + 100) = 3.94
3 ratings of 5 and 1 rating of 4: (3*2 + 3*5 + 1*4) / (3 + 3 + 1) = 3.57
10 ratings of 4: (3*2 + 10*4) / (3 + 10) = 3.54
1 rating of 5: (3*2 + 1*5) / (3 + 1) = 2.75
No user ratings: (3*2 + 0) / (3 + 0) = 2
1 rating of 1: (3*2 + 1*1) / (3 + 1) = 1.75
This computation takes into consideration the number of user ratings, and the values of those ratings. As a result, the final score roughly corresponds to how happy one can expect to be about a particular item, given the data.
Choosing R
When you choose R, think about what value you would be comfortable assuming for an item with no ratings. Is the typical no-rating item actually 2.4 out of 5, if you were to instantly have everyone rate it? If so, R = 2.4 would be a reasonable choice.
You should not use the minimum value on the rating scale for this parameter, since an item rated extremely poorly by users should end up "worse" than a default item with no ratings.
If you want to pick R using data rather than just intuition, you can use the following method:
Consider all items with at least some threshold of user ratings (so you can be confident that the average user rating is reasonably accurate).
For each item, assume its "true score" is the average user rating.
Choose R to be the median of those scores.
If you want to be slightly more optimistic or pessimistic about a no-rating item, you can choose R to be a different percentile of the scores, for instance the 60th percentile (optimistic) or 40th percentile (pessimistic).
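The data-driven choice of R described above could be sketched like this. The function name, the input shape (one array of ratings per item), and the use of linear interpolation between neighboring averages for the percentile are all assumptions; other percentile conventions would work too.

```javascript
// Pick R as a percentile (median by default) of the average rating of all
// items that have at least `minRatings` ratings.
// `items` is an array of rating arrays, one per item.
function chooseR(items, minRatings, percentile = 50) {
  const averages = items
    .filter((ratings) => ratings.length >= minRatings)
    .map((ratings) => ratings.reduce((s, v) => s + v, 0) / ratings.length)
    .sort((a, b) => a - b);
  if (averages.length === 0) return null; // not enough data to decide

  // Linear interpolation between the two nearest sorted averages.
  const pos = (percentile / 100) * (averages.length - 1);
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return averages[lo] + (averages[hi] - averages[lo]) * (pos - lo);
}

const items = [[4, 4], [2, 2], [5, 5], [3]];
console.log(chooseR(items, 2));     // median of [2, 4, 5] is 4
console.log(chooseR(items, 2, 60)); // slightly more optimistic choice
```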
Choosing W
The choice of W should depend on how many ratings a typical item has, and how consistent ratings are. W can be higher if items naturally obtain many ratings, and W should be higher if you have less confidence in user ratings (e.g., if you have high spammer activity). Note that W does not have to be an integer, and can be less than 1.
Choosing W is a more subjective matter than choosing R. However, here are some guidelines:
If a typical item obtains C ratings, then W should not exceed C, or else the final score will be more dependent on R than on the actual user ratings. Instead, W should be close to a fraction of C, perhaps between C/20 and C/5 (depending on how noisy or "spammy" ratings are).
If historical ratings are usually consistent (for an individual item), then W should be relatively small. On the other hand, if ratings for an item vary wildly, then W should be relatively large. You can think of this algorithm as "absorbing" W ratings that are abnormally high or low, turning those ratings into more moderate ones.
In the extreme, setting W = 0 is equivalent to using only the average of user ratings. Setting W = infinity is equivalent to proclaiming that every item has a true rating of R, regardless of the user ratings. Clearly, neither of these extremes is appropriate.
Setting W too large can have the effect of favoring an item with many moderately-high ratings over an item with slightly fewer exceptionally-high ratings.
I appreciated the top answer at the time of posting, so here it is codified as JavaScript:
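const defaultR = 2; // initial belief: rating assumed for an item with no ratings
// Weight of the initial belief, counted in ratings. It should not exceed the
// typical number of ratings per item; 0 is equivalent to using only the average.
const defaultW = 3;

// Average of the user ratings plus defaultW phantom ratings of value defaultR.
function getSortAlgoValue(ratings) {
  const allRatings = ratings.reduce((sum, r) => sum + r, 0);
  return (defaultR * defaultW + allRatings) / (defaultW + ratings.length);
}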
Only listed as a separate answer because the formatting of the code block as a reply wasn't very
Since you've stated that the machine would only be given the rankings and the number of rankings, I would argue that it may be negligent to attempt a calculated weighting method.
First, there are too many unknowns to confirm the proposition that in enough circumstances a larger quantity of ratings is a better indication of quality than a smaller number of ratings. One example is how long have rankings been given? Has there been equal collection duration (equal attention) given to different items ranked with this same method? Others are, which markets have had access to this item and, of course, who specifically ranked it?
Secondly, you've stated in a comment below the question that this is not for front-end use but rather "the ratings are generated by machines, for machines," as a response to my comment that "it's not necessarily only statistical. One person might consider 50 ratings enough, where that might not be enough for another. And some raters' profiles might look more reliable to one person than to another. When that's transparent, it lets the user make a more informed assessment."
Why would that be any different for machines? :)
In any case, if this is about machine-to-machine rankings, the question needs greater detail in order for us to understand how different machines might generate and use the rankings.
Can a ranking generated by a machine be flawed (so as to suggest that more rankings may somehow compensate for those "flawed" rankings)? What does that even mean - is it a machine error? Or is it because the item has no use to this particular machine, for example? There are many issues here we might first want to unpack, including whether we have access to how the machines are generating the ranking; on some level we may already know the meaning this item has for this machine, making the aggregated ranking superfluous.
What you can find on various platforms is the hiding of ratings that don't have enough votes: "This item does not have enough votes."
The problem is that you can't capture this in a simple ranking formula.
I would suggest hiding the rating of any item with fewer than a minimum number of votes, but calculating a moving average internally. I always prefer a moving average to a total average because it favors recent votes over very old votes, which might have been given under totally different circumstances.
Additionally, you don't need to keep a list of all votes. You only store the calculated average, and each new vote simply updates this value:
newAverage = weight * newVoting + (1-weight) * oldAverage
with a weight of about 0.05 to favor roughly the last 20 votes (just experiment with this weight).
Additionally, I would start with these conditions:
no votes = mid-range value (1-5 stars => start with 3 stars)
the average is not shown if fewer than 10 votes have been given.
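Put together, the moving average and the two starting conditions could be sketched as below. The factory name `createRating` and its property names are hypothetical; the defaults simply mirror the numbers suggested in this answer.

```javascript
// Keeps only the running average and a vote counter, never the full vote list.
function createRating(weight = 0.05, startValue = 3, minVotes = 10) {
  let average = startValue; // no votes yet: start in the middle of the scale
  let votes = 0;
  return {
    addVote(value) {
      // newAverage = weight * newVote + (1 - weight) * oldAverage
      average = weight * value + (1 - weight) * average;
      votes += 1;
    },
    // Value used internally, e.g. for sorting.
    get internal() {
      return average;
    },
    // Value shown to users, or null while there are too few votes.
    get displayed() {
      return votes < minVotes ? null : average;
    },
  };
}

const rating = createRating();
rating.addVote(5);
console.log(rating.internal);  // slightly above 3
console.log(rating.displayed); // null until 10 votes have been given
```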
A simple solution might be a plain average (arithmetic mean):
sum(votes) / number_of_votes
That way, 3 people voting 1 star, and one person voting 5 would give an average of (1+1+1+5)/4 = 2 stars.
Simple, effective, and probably sufficient for your purposes.

Resources