Ranking items amongst each other - algorithm

I'm making a website much like Facemash, where people rate a duel between two items: how much one item is "better" than the other. It's essentially just a choice based on some item information that isn't relevant to the algorithm; the outcome is purely what a user "believes" is the winner.
For two random items, the user selects one of these options:
Item 1 much better than item 2. # item 1 win
Item 1 slightly better than item 2. # item 1 win
Item 1 equally good as item 2. # draw
Item 2 slightly better than item 1. # item 2 win
Item 2 much better than item 1. # item 2 win
I would like the grading scale to be dynamic, so that I could later add another level such as "item 1 is superior to item 2", which would count as a greater win than "much better".
I'm not at all a master of algorithms, but I think I could have managed if it were as simple as "item 1 win / item 2 loss" or "item 2 win / item 1 loss"; what I really need is to grade a win, i.e. whether it's a BIG win or a small win. I've looked at ranking algorithms for soccer matches, but their goal is just to make predictions, which isn't what I'm after.
The goal is to create a ranking amongst ALL items in my set.

AFAIK, implementing something similar to the Elo rating system is the most common approach for maintaining the kind of object ratings you want.
The difference from naively updating the ratings in constant steps is that the Elo system takes the current difference between ratings into account when calculating how they should be updated.
That is, if A already had a much higher rating than B and a human votes for A, the rating of A would be increased only a tiny bit (and the rating of B decreased only a tiny bit), since A was already known to be much better than B and it was already expected that a human would probably rate A over B. On the other hand, if a human votes for B, the changes to the ratings of A and B would be much more severe, as the human contradicts the current ratings, so they need to be adjusted more.
And if the ratings for A and B are similar, they would only be adjusted mildly, according to the human's decision.
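For example, here is a minimal sketch of an Elo-style update in Python. The OUTCOME_SCORES mapping and the K factor are assumptions of mine, not part of the standard system: any monotone mapping of your grade scale onto [0, 1] works, and adding a new grade ("superior") is just a new entry in the mapping.

# Minimal Elo-style update with graded outcomes (sketch).
K = 32  # step size; larger K reacts faster but is noisier

OUTCOME_SCORES = {           # assumed mapping of grades to scores in [0, 1]
    "item1_much_better":     1.00,
    "item1_slightly_better": 0.75,
    "draw":                  0.50,
    "item2_slightly_better": 0.25,
    "item2_much_better":     0.00,
}

def expected_score(rating_a, rating_b):
    """Expected score of item A against item B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update_ratings(rating_a, rating_b, outcome):
    """Return new (rating_a, rating_b) after one duel."""
    score_a = OUTCOME_SCORES[outcome]
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + K * (score_a - exp_a)
    new_b = rating_b + K * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b

# A was far ahead, but the user says B is slightly better: big correction.
print(update_ratings(1600, 1400, "item2_slightly_better"))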

Related

Overall rank from multiple ranked lists

I've looked through a lot of the literature available online, including this forum, without any luck, and am hoping someone can help with a statistical issue I currently face:
I have 5 lists of ranked data, each containing 10 items ranked from position 1 (best) to position 10 (worst). For the sake of context, the 10 items in each list are the same, but in different ranked orders, because the technique used to decide their rank differs.
Example data:

          List 1    List 2    List 3    ... etc
Item 1    Rank 1    Rank 2    Rank 1
Item 2    Rank 3    Rank 1    Rank 2
Item 3    Rank 2    Rank 3    Rank 3
... etc
I am looking for a way to interpret and analyse the above data so that I get a final result showing the overall rank of each item based on each test and its position, e.g.
Result
Rank 1 = Item 1
Rank 2 = Item 3
Rank 3 = Item 4
... etc
Does anyone know how I can interpret this data in a statistically sound way (at a postgraduate / PhD level) so that I can determine the overall ranks, signalling the importance of each item across the 5 tests, please? Or, if there is another technique or statistical test I could look into, I would appreciate any hints or guidance.
(It may also be worth noting that I have performed the simpler mathematical techniques such as sums, averaging, and minimum-maximum tests, but I do not feel these are statistically rigorous enough at this level.)
Any help or advice would be greatly appreciated, thank you for your time.
You can use machine learning to get your ranked list. In the Information Retrieval research field this is called Learning to Rank, and there is a wide range of literature about it. This tutorial (heads up: it is a high-level tutorial) can help you understand the basic concepts and point you to articles for further reading.
You might also want to have a look at interleaved ranking. It was originally designed for evaluating two lists, but it might also work for your case.
A number of non-parametric statistical tests work by turning the data received into ranks and then analysing the ranks (this can make life easier if the data are very far from being normally distributed). If your ranks are plausibly derived from some underlying score or goodness that you can't observe directly, you could apply any of these tests - there is a short list at http://en.wikipedia.org/wiki/Ranking#Ranking_in_statistics or any book on non-parametric statistics, such as Conover, should cover them.
If you can come up with a statistic you are interested in, such as the total rank of any one item, you could use a permutation test - http://en.wikipedia.org/wiki/Resampling_%28statistics%29#Permutation_tests - to work out the probability that the statistic is at least as extreme as observed under the null hypothesis that all of the rankings are simply random: you just generate lots of data that follows the null hypothesis and look at the distribution of the statistic in the randomly generated data. You can then use this to get a P-value or, better, a confidence bound.
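As a rough illustration, here is a Monte Carlo permutation-test sketch in Python for the total rank of one item, under the null hypothesis that each list is an independent random permutation (so the item's rank in each list is uniform on 1..10); the observed total in the example is hypothetical.

# Permutation/randomization test sketch for one item's total rank.
import random

def p_value_total_rank(observed_total, n_lists=5, n_items=10, n_sims=100_000):
    """Fraction of simulated null datasets in which the item's total rank
    is at least as small (as good) as the observed total."""
    hits = 0
    for _ in range(n_sims):
        simulated_total = sum(random.randint(1, n_items) for _ in range(n_lists))
        if simulated_total <= observed_total:
            hits += 1
    return hits / n_sims

# Hypothetical example: the item's ranks across the 5 lists sum to 7.
print(p_value_total_rank(observed_total=7))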

Finding most distinct elements in a set

Say we have a perfume shop that has 100 different perfumes.
Let's say 10,000 customers come in and rate each perfume from one to five stars.
Let's say the question is: how do we best construct a pack of 5 perfumes so that 95% of customers will give a 4+ star rating to at least one of them?
How to do this algorithmically?
NOTE: I can see that the question isn't even properly formed; there's no guarantee that such a construction exists. There is a trade-off between the two parameters.
NOTE: Also (and this makes the perfume analogy slightly artificial), it doesn't matter whether we get one good match or three good matches. So {4.3, 0, 0, 0, 0} would be equivalent to {4.3, 4.2, 4.2, 4.2, 4.2} -- in both cases the score is 4.3.
Let's say, for the purpose of argument, that perfumes 0-19 are sweet, perfumes 20-39 are sour, and so on (similarly salty, bitter, umami).
So there would be a very high cross-correlation among perfumes 0-19.
If you modelled this with 100 points in space, then 0-19 would all attract each other very strongly, they would form a cluster.
Similarly you would get 4 other clusters for the other four tastes.
So from just one metric, we have separated out 5 distinct flavours.
But does this technique extend?
PS just giving the names of related techniques would be very helpful, as this would allow me to Google for further information. So any answer that just restates the question in industry accepted terminology would be useful!
This algorithm should find a solution to the problem:
1. Order the perfumes by the number of customers giving a 4+ rating.
2. Choose the first perfume not yet considered from the list.
3. Delete the ratings from the customers who are now satisfied.
4. Repeat the process for perfumes 2-5 in the pack.
5. Backtrack when necessary to obtain a selection satisfying the criterion.
The true problem is NP-hard, but you can make use of a greedy algorithm:
1. Let C be the whole set of your customers.
2. Assign to each perfume a coverage, given by the number of customers in C that gave it 4+.
3. Sort by descending coverage and pick the perfume with the highest coverage. If all coverages are zero (e.g. C is empty), choose a perfume at random (actually, if C is nonempty but smaller than 5% of the original set, your requirement is already met).
4. Remove from C all customers (not ratings) satisfied by the perfume just chosen.
5. Repeat from 2 unless you already have 5 perfumes.
This automatically takes care of taste clustering: a customer giving high marks to sweet perfumes will be satisfied by the most voted sweet perfume, and he will then be struck out from C, all his further ratings ignored, and the algorithm will proceed to satisfy other customers.
Also, you should notice that even if you can't satisfy the requirement (95%, 4+) with five perfumes, perfume similarity will ensure that this algorithm maximizes both the coverage and the marks - so you might end up with, say, (93%, 3.9).
Also, suppose that 10% of users do not give any marks above 3. Then there is no way you can 4-satisfy 95% of customers, since 10% of the total are at most 3-satisfiable. You might want to build C only from customers that actually did give at least one 4+ rating.
Or you could change the algorithm and, instead of the one in your question, use a knapsack approach: you want to take home the highest cumulative rating. This also raises the likelihood of a customer being satisfied by the overall package (as is, he is almost guaranteed to very much like one perfume, but he might strongly dislike the other four).
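A rough sketch of the greedy coverage loop described above, assuming a hypothetical data layout where ratings[customer_id][perfume_id] gives the star rating:

# Greedy max-coverage sketch: repeatedly pick the perfume that satisfies
# the most still-uncovered customers, then strike those customers out.
def greedy_pack(ratings, pack_size=5, threshold=4):
    uncovered = set(ratings)                       # customers not yet satisfied
    all_perfumes = {p for r in ratings.values() for p in r}
    pack = []
    for _ in range(pack_size):
        best, best_cover = None, -1
        for perfume in all_perfumes - set(pack):
            cover = sum(1 for c in uncovered
                        if ratings[c].get(perfume, 0) >= threshold)
            if cover > best_cover:                 # ties / zero coverage: first seen wins
                best, best_cover = perfume, cover
        pack.append(best)
        uncovered -= {c for c in uncovered
                      if ratings[c].get(best, 0) >= threshold}
    covered = 1 - len(uncovered) / len(ratings)    # fraction of satisfied customers
    return pack, covered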

Rating Algorithm

I'm trying to develop a rating system for an application I'm working on. Basically, the app allows you to rate an object from 1 to 5 (represented by stars). But of course I know that just keeping a rating count and adding up the raw rating numbers is not feasible.
So the first thing that came to mind was dividing the received rating by the total number of ratings given: for example, if the object receives a rating of 2 from a user and the object has been rated 100 times, maybe add 2/100. However, I believe this method is not good enough because 1) it is a naive approach, and 2) in order to get the number of times the object has been rated I have to do a lookup in the DB, which might end up with time complexity O(n).
So I was wondering: what are alternative and possibly better ways to approach this problem?
You can keep 2 additional values in the DB - the number of times the object was rated and the total sum of all ratings. This way, to update the object's rating you only need to:
Add the new rating to the total sum.
Divide the total sum by the total number of times it was rated.
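A minimal sketch of that idea, with hypothetical field names for the stored count and sum; each new rating is an O(1) update with no per-rating lookups:

def add_rating(obj, new_rating):
    # obj holds two persisted values plus the derived average.
    obj["rating_count"] += 1
    obj["rating_sum"] += new_rating
    obj["rating_avg"] = obj["rating_sum"] / obj["rating_count"]
    return obj

item = {"rating_count": 99, "rating_sum": 350, "rating_avg": 350 / 99}
print(add_rating(item, 2))  # average drops slightly after a 2-star vote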
There are many approaches to this, but before choosing one, check:
whether all feedback givers are treated as equal, or some have more weight than others (like a panel review, etc.)
whether the objective is to provide only an average, or a score band or similar. Consider a scenario like this website, which shows a total reputation score.
And yes - if an average is to be computed, you need the total and the count of feedback and then have to compute it - that's plain maths. But if you need any other method, be prepared for more compute cycles. Balance database hits against compute cycles, but that's the next stage of design. First get your requirements and approach to the solution in place.
I think you should keep separate counters for 1 star, 2 stars, and so on. To calculate the rating, you'd compute rating = (1*numOneStars + 2*numTwoStars + 3*numThreeStars + 4*numFourStars + 5*numFiveStars) / (numOneStars + numTwoStars + numThreeStars + numFourStars + numFiveStars)
This way you can, like Amazon, also show how many people voted 1 star and how many voted 5 stars...
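A small sketch of the per-star-counter idea (the counts below are made up):

# star_counts[k] = number of k-star votes; index 0 is unused.
star_counts = [0, 120, 80, 300, 250, 400]

total_votes = sum(star_counts[1:])
rating = sum(stars * count for stars, count in enumerate(star_counts)) / total_votes
print(rating)          # weighted average rating
print(star_counts[5])  # "400 people voted 5 stars"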
Have you considered a vote up/down mechanism instead of a number of stars? It doesn't directly solve your problem, but it's worth noting that other sites such as YouTube, Facebook, StackOverflow, etc. all use +/- voting, as it is often much more effective than star-based ratings.

Algorithm for Rating Objects Based on Amount of Votes and 5 Star Rating

I'm creating a site where people can rate an object of their choice by allotting a star rating (say a 5-star rating). Objects are arranged in a series of tags and categories, e.g. electronics > graphics cards > pci express > ... or maintenance > contractor > plumber.
If another user searches for a specific category or tag, the hits must return the highest-rated object in that category. However, the system would be flawed if only 1 person votes 5 stars for an object while 1000 users vote an average of 4.5 stars for another object. Obviously, logic dictates that credibility should be given to the object rated by 1000 users rather than the object evaluated by a single user, even though it has a "lower" score.
Conversely, it is more reasonable to trust an object with 500 user ratings and a score of 4.8 than an object with 1000 user ratings averaging 4.5, for example.
What algorithm can achieve this weighting?
A great answer to this question is here:
http://www.evanmiller.org/how-not-to-sort-by-average-rating.html
You can use the Bayesian average when sorting by recommendation.
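For illustration, a minimal Bayesian-average sketch; the prior mean and the prior weight (a pseudo-count of imaginary votes) below are arbitrary tuning knobs, not recommended values:

def bayesian_average(rating_sum, rating_count, prior_mean=3.5, prior_weight=20):
    """Pull sparsely-rated objects toward the site-wide prior mean."""
    return (prior_weight * prior_mean + rating_sum) / (prior_weight + rating_count)

# A single 5-star vote barely moves an object off the prior ...
print(bayesian_average(5, 1))             # about 3.57
# ... while 1000 votes averaging 4.5 dominate it.
print(bayesian_average(4.5 * 1000, 1000)) # about 4.48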
I'd be tempted to have a cutoff (say, fifty votes, though this is obviously traffic-dependent) before which you consider the item unranked. That would significantly reduce the motivation for spam/idiot rankings (especially if each vote is tied to a user account), and it also gets you a simple, quick-to-implement, and reasonably reliable system.
sigmoid_function(value) = 1 / (1 + e^(-value))
rating = sigmoid_function(number_of_voters) + sigmoid_function(average_rating)

Algorithm to Rate Objects with Numerous Comparisons

Let's say I have a list of 500 objects. I need to rate each one out of 10.
At random I select two and present them to a friend. I then ask the friend which they prefer. I then use this comparison (i.e. OBJECT1 is better than OBJECT2) to alter the two objects' ratings out of ten.
I then repeat this random selection and comparison thousands of times with a group of friends until I have a list of 500 objects with a reliable rating out of ten.
I need to figure out an algorithm which takes the two objects current ratings, and alters them depending on which is thought to be better...
Each object's rating could be (number of victories)/(number of contests entered) * 10. So the rating of the winner goes up a bit and the rating of the loser goes down a bit, according to how many contests they've previously entered.
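A minimal sketch of that win-ratio idea (class and function names here are just for illustration):

class RatedObject:
    def __init__(self):
        self.wins = 0
        self.contests = 0

    @property
    def rating(self):
        # Score out of ten: win fraction times 10 (0 until the first contest).
        return 10.0 * self.wins / self.contests if self.contests else 0.0

def record_duel(winner, loser):
    winner.wins += 1
    winner.contests += 1
    loser.contests += 1

a, b = RatedObject(), RatedObject()
record_duel(a, b)
print(a.rating, b.rating)  # 10.0 0.0 after a single contest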
For something more complicated and less sensitive to the luck of the draw with smaller numbers of trials, I'd suggest http://en.wikipedia.org/wiki/Elo_rating_system, but it's not out of 10. You could rescale everyone's scores so that the top score becomes 10, but then a match could affect everyone's rating, not just the rating of the two involved.
It all sort of depends what "reliable" means. Different friends' judgements will not be consistent with respect to each other, and possibly not even consistent over time for the same person, so there's no "real" sorted order for you to sanity-check the rankings against.
On a more abstruse point, Arrow's Impossibility Theorem states some nice properties that you'd like to have in a system that takes individual preferences and combines them to form an aggregated group preference. It then proceeds to prove that they're mutually inconsistent - you can't have them all. Any intuitive idea of a "good" overall rating runs a real risk of being unachievable.

Resources