I've seen sites that show two random items from a list and let users pick which one they prefer; based on those user preferences, a ranking is then generated for the entire data set. Does anyone know what this ranking algorithm is called and how it works?
Thank you.
I believe you're referring to the Elo rating system.
A simple implementation would be to always choose two random items for the comparison and give the preferred item a point. Then rank in order of decreasing points.
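For what it's worth, here is a minimal Python sketch of an Elo-style variant of that idea. The item names are made up, the winner of each comparison is simulated with a coin flip, and K = 32 with the 400-point scale are just the conventional Elo defaults, not anything prescribed above:

    import random
    from collections import defaultdict

    items = ["apple", "banana", "cherry", "durian"]   # made-up example items
    rating = defaultdict(lambda: 1000.0)              # every item starts at the same rating
    K = 32                                            # conventional Elo update step

    def record_comparison(winner, loser):
        """Update Elo ratings after the user preferred `winner` over `loser`."""
        expected_win = 1.0 / (1.0 + 10 ** ((rating[loser] - rating[winner]) / 400))
        rating[winner] += K * (1.0 - expected_win)
        rating[loser] -= K * (1.0 - expected_win)

    # Simulate a few random pairwise votes, then rank by rating.
    for _ in range(100):
        a, b = random.sample(items, 2)
        winner, loser = (a, b) if random.random() < 0.5 else (b, a)   # stand-in for a real vote
        record_comparison(winner, loser)

    print(sorted(items, key=lambda i: rating[i], reverse=True))

With the simple point-counting scheme instead, the update would just be points[winner] += 1 and you would sort by points; Elo's advantage is that beating a highly rated item counts for more than beating a weak one.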
The usual method for this is collaborative filtering. The choices of all users are compared, and a similarity between users is used to weight their choices when recommending or rating items. In other words, people whose past choices are similar to yours contribute more to your recommendations than people whose behavior differs from yours.
There are several methods for doing this inference, and which one is best, or how to optimize its performance, is an open research question. Most often the simplest implementation achieves sufficient predictions and is easy to build: it just performs two multiplications of the preference matrix with its own transpose.
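As a rough sketch of that matrix idea (the 0/1 user-by-item preference matrix P below is made up, and a real system would at least normalize the counts):

    import numpy as np

    # Made-up 0/1 preference matrix: rows are users, columns are items.
    P = np.array([
        [1, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 1, 1],
    ])

    item_cooccurrence = P.T @ P   # (i, j): number of users who liked both items i and j
    user_overlap = P @ P.T        # (u, v): number of items both users u and v liked

    # The "two multiplications": P @ P.T @ P scores every item for every user,
    # weighting other users by how much their choices overlap with yours.
    predictions = user_overlap @ P            # equivalently P @ item_cooccurrence

    scores = predictions[0].astype(float)
    scores[P[0] == 1] = -np.inf               # don't recommend what user 0 already liked
    print(np.argsort(scores)[::-1])           # item indices for user 0, best first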
I am trying to build a small app so that my friends and I can more easily make decisions about where we want to eat. The idea is that, given a list of restaurants, each person puts down a score from 0-100 indicating how much they like that restaurant. I want to figure out a good way to combine those scores to output an ordered list of recommendations. For discussion's sake, we can assume that everyone scores restaurants normally across the scale (i.e. let's assume the individual preference scores are valid/normalized/etc.).
As of now, I was thinking of just sorting by the average score of each restaurant while enforcing a minimum score from each person so that no one is very unhappy. In other words, the goal is to maximize happiness with the constraint that no one should be extremely unhappy.
Does anyone have any suggestions on a clever algorithm or better way to achieve this? Is there any research on matching problems that could be relevant to this, or am I just over-thinking it?
You can first compute, for each restaurant:
- the mean value
- the minimum value
Then you can easily sort by mean with any constraint you need.
Other interesting methods exist. You can, for instance, use a maximin (minimax) rule: sort the restaurants by their minimum score, highest first, so the winner is the one with the maximum of the per-person minimums. This guarantees that no one would hate the chosen restaurant.
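A small sketch of both orderings, with made-up 0-100 scores per person (the threshold of 40 is just an assumption):

    # Made-up scores: each list is the group's 0-100 ratings of one restaurant.
    scores = {
        "Thai Palace": [90, 80, 20],
        "Burger Barn": [70, 65, 60],
        "Sushi Spot":  [85, 55, 50],
    }

    MIN_ACCEPTABLE = 40   # assumed threshold: nobody should score the pick below this

    # Sort by mean, but only among restaurants that clear everyone's minimum.
    acceptable = {r: s for r, s in scores.items() if min(s) >= MIN_ACCEPTABLE}
    by_mean = sorted(acceptable, key=lambda r: sum(scores[r]) / len(scores[r]), reverse=True)

    # Maximin: rank by the worst individual score, highest first.
    by_maximin = sorted(scores, key=lambda r: min(scores[r]), reverse=True)

    print(by_mean)     # ['Burger Barn', 'Sushi Spot']
    print(by_maximin)  # ['Burger Barn', 'Sushi Spot', 'Thai Palace']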
This looks like a weighted voting system; check Wikipedia or other Internet resources.
Note that such a system can easily be manipulated if voters are not honest, so in that case you probably want another system.
I have a list of users which need to be sorted into committees. The users can rank committees based on their particular preference, but must choose at least one to join. When they have all made their selections, the algorithm should sort them as evenly as possible taking into account their committee preference, gender, age, time zone and country (for now). I have looked at this question and its answer would seem like a good choice, but it is unclear to me how to add the various constraints to the algorithm for it to work.
Would anyone point me in the right direction on how to do this, please?
Looking for "clustering" will get you nowhere, because this is not a clustering type if task.
Instead, this is an assignment problem.
For further information, see:
Knapsack Problem
Generalized Assignment Problem
These problems are usually NP-hard to solve exactly, so one typically chooses a greedy optimization heuristic to find a reasonably good solution faster.
Think about how to best assign one person at a time.
Then, process the data as follows:
1. Assign everybody who can only be assigned in a single way.
2. Find an unassigned person who is hard to assign; stop if everybody is assigned.
3. Assign that person in the best possible way.
4. Remove preferences that are no longer admissible, and go to step 1 again (there may now be a new person with only a single choice left).
For bonus points, add a source of randomness and an overall quality measure. Then run the algorithm 10 times and keep only the best result.
For further bonus, add a postprocessing optimization: can you transfer one person to another group, or swap two persons, to improve the overall quality? Iterate over all persons to find such small improvements until you cannot find any more.
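Here is a rough Python sketch of that greedy loop, with made-up people, committees and capacities. It only covers the preference side; the gender, age, timezone and country balance would go into the quality measure and into the choice of "best possible way":

    import random

    def greedy_assign(preferences, capacity, seed=None):
        """Greedily place the hardest-to-assign person first.

        preferences: dict person -> ranked list of committees they would accept.
        capacity:    dict committee -> number of open seats.
        Returns a dict person -> committee; people who cannot be placed are left out.
        """
        rng = random.Random(seed)
        capacity = dict(capacity)                  # work on a copy
        unassigned = {p: list(c) for p, c in preferences.items()}
        assignment = {}

        while unassigned:
            # Drop choices that no longer have room (step 4 above).
            for person, choices in unassigned.items():
                unassigned[person] = [c for c in choices if capacity.get(c, 0) > 0]

            # Hardest person first = fewest remaining choices; people with a single
            # choice are naturally picked before everyone else. Random tie-breaking.
            person = min(unassigned, key=lambda p: (len(unassigned[p]), rng.random()))
            choices = unassigned.pop(person)
            if not choices:
                continue                           # nobody can take this person any more

            committee = choices[0]                 # best possible way = highest-ranked open choice
            assignment[person] = committee
            capacity[committee] -= 1

        return assignment

    # Made-up example:
    prefs = {"ann": ["finance", "events"], "bob": ["events"], "eve": ["events", "finance"]}
    seats = {"finance": 1, "events": 1}
    print(greedy_assign(prefs, seats, seed=0))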
I am trying to fully understand the item-to-item Amazon's algorithm to apply it to my system to recommend items the user might like, matching the previous items the user liked.
So far I have read these: Amazon paper, item-to-item presentation and item-based algorithms. Also I found this question, but after that I just got more confused.
From what I can tell, I need to follow these steps to get the list of recommended items:
Have my data set with the items that users liked (I have set liked = 1 and not liked = 0).
Use Pearson Correlation Score (How is this done? I found the formula, but is there any example?).
Then what should I do?
So I came up with these questions:
What are the differences between the item-to-item and item-based filtering? Are both algorithms the same?
Is it right to replace the rating score with a simple liked / not liked value?
Is it right to use the item-to-item algorithm, or is there any other more suitable for my case?
Any information about this topic will be appreciated.
Great questions.
Think about your data. You might have unary (consumed or null), binary (liked and not liked), ternary (liked, not liked, unknown/null), or continuous (null and some numeric scale), or even ordinal (null and some ordinal scale). Different algorithms work better with different data types.
Item-item collaborative filtering (also called item-based) works best with numeric or ordinal scales. If you just have unary, binary, or ternary data, you might be better off with data mining algorithms like association rule mining.
Given a matrix of users and their ratings of items, you can calculate the similarity of every item to every other item. Matrix manipulation and calculation are built into many libraries: try out scipy and numpy in Python, for example. You can just iterate over items and use the built-in matrix calculations to do much of the work of computing cosine similarity (https://en.wikipedia.org/wiki/Cosine_similarity). Or download a framework like Mahout or Lenskit, which does this for you.
Now that you have a matrix of every item's similarity to every other item, you might want to suggest items for user U. So look at her history of items. For each item I in her history and each item ID in your dataset, add the similarity of I to ID to that candidate item's score. When you've gone through all history items, sort the list of candidate items by score, descending, and recommend the top ones.
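A minimal numpy sketch of those two steps, using a made-up ratings matrix and plain cosine similarity (a real system would also mean-center the ratings and damp similarities computed from few co-ratings):

    import numpy as np

    # Made-up ratings matrix: rows are users, columns are items, 0 means "not rated".
    R = np.array([
        [5.0, 3.0, 0.0, 1.0],
        [4.0, 0.0, 0.0, 1.0],
        [1.0, 1.0, 0.0, 5.0],
        [0.0, 1.0, 5.0, 4.0],
    ])

    # Step 1: item-item cosine similarity. Normalise each item (column) vector;
    # the dot products of the normalised columns are the cosine similarities.
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0                  # avoid dividing by zero for unrated items
    Rn = R / norms
    item_similarity = Rn.T @ Rn              # shape (n_items, n_items)

    # Step 2: score candidates for user U by summing their similarity to each item
    # in U's history, then recommend the best items U has not rated yet.
    def recommend(user_ratings, top_n=3):
        history = np.flatnonzero(user_ratings > 0)
        scores = item_similarity[history].sum(axis=0)
        scores[history] = -np.inf            # never re-recommend history items
        ranked = np.argsort(scores)[::-1]
        return [int(i) for i in ranked if np.isfinite(scores[i])][:top_n]

    print(recommend(R[0]))   # item indices for the first user, best first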
To answer the remaining questions: a continuous or ordinal scale will give you the best collaborative filtering results. Don't use a "liked" versus "unliked" scale if you have better data.
Matrix factorization algorithms perform well, and if you don't have many users or lots of updates to your rating matrix, you can also use user-user collaborative filtering. Try item-item first though: it's a good all-purpose recommender algorithm.
I'm attempting to write some code for item-based collaborative filtering for product recommendations. The input has buyers as rows and products as columns, with a simple 0/1 flag to indicate whether or not a buyer has bought an item. The output is a list of similar items for a given purchased item, ranked by cosine similarity.
I am attempting to measure the accuracy of a few different implementations, but I am not sure of the best approach. Most of the literature I find mentions using some form of mean square error, but this really seems more applicable when your collaborative filtering algorithm predicts a rating (e.g. 4 out of 5 stars) instead of recommending which items a user will purchase.
One approach I was considering was as follows...
Split the data into training/holdout sets, and train on the training data
For each item (A) in the set, select data from the holdout set where users bought A
Determine what percentage of A-buyers bought one of the top 3 recommendations for A-buyers
The above seems kind of arbitrary, but I think it could be useful for comparing two different algorithms when trained on the same data.
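For concreteness, a rough sketch of that hit-rate measurement, where recommend_similar is a hypothetical stand-in for whichever implementation is under test (trained on the training split only):

    from collections import defaultdict

    def top3_hit_rate(holdout_baskets, recommend_similar):
        """For each item A, the fraction of holdout buyers of A who also bought
        one of the top 3 items recommended for A.

        holdout_baskets:   list of sets of item ids, one set per holdout buyer.
        recommend_similar: callable item_id -> ranked list of recommended item ids.
        """
        hits, trials = defaultdict(int), defaultdict(int)
        for basket in holdout_baskets:
            for item in basket:
                top3 = set(recommend_similar(item)[:3])
                trials[item] += 1
                if top3 & (basket - {item}):      # bought at least one of the top 3
                    hits[item] += 1
        return {item: hits[item] / trials[item] for item in trials}

    # Toy usage with a stand-in recommender that always suggests the same items:
    baskets = [{"a", "b"}, {"a", "c"}, {"b", "c"}]
    print(top3_hit_rate(baskets, lambda item: ["b", "c", "d"]))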
Actually your approach is quite similar to the literature, but I think you should consider using recall and precision, as most of the papers do.
http://en.wikipedia.org/wiki/Precision_and_recall
Moreover, if you use Apache Mahout, there is an implementation of recall and precision in the GenericRecommenderIRStatsEvaluator class.
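If you are not on Mahout, the two measures are also easy to compute directly. A small sketch at a fixed cutoff k (the function name is my own):

    def precision_recall_at_k(recommended, relevant, k):
        """Precision and recall of the top-k recommendations.

        recommended: list of item ids, ranked best first.
        relevant:    set of item ids the user actually bought / liked (ground truth).
        """
        top_k = recommended[:k]
        true_positives = sum(1 for item in top_k if item in relevant)
        precision = true_positives / k
        recall = true_positives / len(relevant) if relevant else 0.0
        return precision, recall

    # Toy example: only "b" of the top 3 is relevant, so both values are about 0.33.
    print(precision_recall_at_k(["a", "b", "c", "d"], {"b", "d", "e"}, k=3))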
The best way to test a recommender is always to manually verify the results. However, some kind of automatic verification is also good.
In the spirit of a recommendation system, you should split your data in time and see whether your algorithm can predict the user's future purchases. This should be done for all users.
Don't expect it to predict everything; 100% correctness is usually a sign of over-fitting.
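A rough sketch of such a time-based check, where recommend_for is a hypothetical stand-in for the recommender under test, trained only on each user's pre-split history:

    def temporal_hit_rate(purchases_by_user, recommend_for, split_time, k=10):
        """Fraction of post-split purchases that appear in the top-k recommendations
        computed from each user's pre-split history.

        purchases_by_user: dict user -> list of (timestamp, item) tuples.
        recommend_for:     callable (user, history_items, k) -> list of item ids.
        """
        hits = total = 0
        for user, purchases in purchases_by_user.items():
            history = [item for t, item in purchases if t < split_time]
            future = [item for t, item in purchases if t >= split_time]
            if not history or not future:
                continue
            recs = set(recommend_for(user, history, k))
            hits += sum(1 for item in future if item in recs)
            total += len(future)
        return hits / total if total else 0.0

    # Toy usage with a stand-in recommender that always suggests the same items:
    data = {"u1": [(1, "a"), (2, "b"), (5, "c")], "u2": [(1, "a"), (6, "d")]}
    print(temporal_hit_rate(data, lambda u, h, k: ["c", "d"], split_time=4))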
I have an algorithm that chooses a list of items that should fit the user's likings.
I'll skip the algorithm's details because of confidentiality issues...
Now, I'm trying to think of a way to check it statistically, with a group of people.
The way I'm checking it now is:
The algorithm picks its best results for each user.
Shuffle the top 5 results together with the lowest 5 results.
Ask the person to order the results by preference (0 = liked best, 9 = liked least).
Compare the user's ordering to the algorithm's ordering.
I'm doing this because I figured that to show the algorithm chooses good results, I need to mix in some bad results and show that the algorithm recognizes them as bad as well.
So, what I'm asking is:
Is shuffling top results with low results a good idea?
And if not, do you have an idea how to get good statistics on how well an algorithm matches user preferences (we have users who can make choices)?
First ask yourself:
What am I trying to measure?
Not to rag on the other answers here, but while mjv's and Sjoerd's answers offer some plausible heuristic reasons why what you are trying to do may not work as you expect, they are not constructive in the sense that they do not explain why your experiment is flawed and what you can do to improve it. Before either of these issues can be addressed, you need to define what you hope to measure, and only then should you go about devising an experiment.
Now, I can't say for certain what would constitute a good metric for your purposes, but I can offer you some suggestions. As a starting point, you could try using a precision vs. recall graph:
http://en.wikipedia.org/wiki/Precision_and_recall
This is a standard technique for assessing the performance of ranking and classification algorithms in machine learning and information retrieval (i.e. web searching). If you have an engineering background, it may help to think of precision/recall as a generalization of the notion of precision/accuracy:
http://en.wikipedia.org/wiki/Accuracy_and_precision
Now let us suppose that your algorithm does something like this: it takes as input some prior data about a user, then returns a ranked list of other items that user might like. For example, your algorithm is a web search engine and the items are pages, or you have a movie recommender and the items are movies. This sounds pretty close to what you are trying to do now, so let us continue with this analogy.
Then the precision of your algorithm's results at n is the fraction of your top n recommendations that the user actually liked:
precision = #(items user actually liked out of top n) / n
And the recall is the number of items you got right out of the total number of items the user actually likes:
recall = #(items correctly marked as liked) / #(items user actually likes)
Ideally, one would want to maximize both of these quantities, but they are in a certain sense competing objectives. To illustrate this, consider a few extremal situations: For example, you could have a recommender that returns everything, which would have perfect recall, but very low precision. A second possibility is to have a recommender that returns nothing or only one sure-fire hit, which would have (in a limiting sense) perfect precision, but almost no recall.
As a result, to understand the performance of a ranking algorithm, people typically look at its precision vs. recall graph. This is just a plot of the precision against the recall as the number of items returned is varied.
An example plot can be found in the following tutorial (which is worth reading):
http://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-ranked-retrieval-results-1.html
Now, to approximate a precision vs. recall curve for your algorithm, here is what you can do. First, return a large set of, say, n results as ranked by your algorithm. Next, get the user to mark which items they actually liked out of those n results. This trivially gives us enough information to compute the precision at every partial set of documents up to n (since we know its size). We can also compute the recall (as restricted to this set of documents) by taking the total number of items liked by the user in the entire set. Thus, we can plot a precision-recall curve for this data. There are fancier statistical techniques for estimating this with less work, but I have already written enough. For more information please check out the links in the body of my answer.
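For concreteness, the bookkeeping could look like the following sketch, assuming "liked" is the list of 0/1 marks the user gave to your n ranked results (the names are my own):

    def precision_recall_curve(liked):
        """Precision and recall after each prefix of the ranked results.

        liked: list of 0/1 flags, in the order the algorithm ranked the items,
               where 1 means the user marked that item as liked.
        """
        total_liked = sum(liked)
        points, hits = [], 0
        for n, flag in enumerate(liked, start=1):
            hits += flag
            precision = hits / n
            recall = hits / total_liked if total_liked else 0.0
            points.append((recall, precision))
        return points

    # Toy example: the user liked the 1st, 2nd and 5th of five returned items.
    for recall, precision in precision_recall_curve([1, 1, 0, 0, 1]):
        print(f"recall={recall:.2f}  precision={precision:.2f}")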
Your method is biased. If you use the top 5 and bottom 5 results, it is very likely that the user will order them exactly as your algorithm did. Let's say we have an algorithm which rates music, and I present the top 1 and bottom 1 to the user:
Queen
The Cheeky Girls
Of course the user will rank them exactly like your algorithm, because the difference between the top and the bottom is so big. You need to have the user rate randomly selected items instead.
Independently of the question of mixing top and bottom guesses, an implicit drawback of the experimental process, as described, is that the data related to the user's choice can only be exploited in the context of one particular version of the algorithm:
When / if the algorithm or its parameters are ever slightly tuned, the record of past users' choices cannot be reused to validate the changes to the algorithm.
On mixing high and low results:
The main drawback of producing sets of items by mixing the algorithm's top and bottom guesses is that it may further complicate the choice of the error/distance function used to measure how well the algorithm performed. Unless the two subsets of items (topmost choices, bottommost choices) are kept separate for the purpose of computing distinct measurements, typical statistical measures of the error (say RMSE) will not be a good measurement of the algorithm's effective quality.
For example, an algorithm whose low guesses frequently end up being picked as top choices by the user may have the same averaged error rate as an algorithm which never confuses highs with lows but whose users tend to reorder the items more within each subset.
A second drawback is that the evaluation method may merely measure the algorithm's ability to rank the relative like/dislike of the items it [the algorithm] chooses, rather than its ability to produce the user's actual top choices.
In other words, the user's actual top choices may never be offered to them; so yes, the algorithm does a good job of guessing that the user will like, say, rock-and-roll before rap, but it never discovers that the user in fact prefers Baroque classical music above all.