Evaluating the performance of item-based collaborative filtering for binary (yes/no) product recommendations

I'm attempting to write some code for item-based collaborative filtering for product recommendations. The input has buyers as rows and products as columns, with a simple 0/1 flag to indicate whether or not a buyer has bought an item. The output is a list of similar items for a given purchase, ranked by cosine similarity.
I am attempting to measure the accuracy of a few different implementations, but I am not sure of the best approach. Most of the literature I find mentions using some form of mean squared error, but this really seems more applicable when your collaborative filtering algorithm predicts a rating (e.g. 4 out of 5 stars) instead of recommending which items a user will purchase.
One approach I was considering was as follows...
Split the data into training/holdout sets and train on the training data
For each item A in the set, select data from the holdout set where users bought A
Determine what percentage of those A-buyers also bought one of the top 3 recommendations for A
The above seems kind of arbitrary, but I think it could be useful for comparing two different algorithms when trained on the same data.

Actually your approach is quite similar to the literature, but I think you should consider using recall and precision, as most of the papers do.
http://en.wikipedia.org/wiki/Precision_and_recall
Moreover, if you use Apache Mahout, there is an implementation of recall and precision in the GenericRecommenderIRStatsEvaluator class.
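As a rough sketch, precision@k and recall@k for a single user could be computed like this; the `precision_recall_at_k` helper and its inputs are hypothetical, not part of any library:

```python
# Sketch of precision@k and recall@k for a binary recommender.
# `recommended` is a ranked list produced by your algorithm;
# `actually_bought` is the user's holdout-set purchases.

def precision_recall_at_k(recommended, actually_bought, k):
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in actually_bought)
    precision = hits / k
    recall = hits / len(actually_bought) if actually_bought else 0.0
    return precision, recall

# Example: 2 of the top-3 recommendations were actually bought.
p, r = precision_recall_at_k(["a", "b", "c"], {"a", "c", "d", "e"}, k=3)
print(p, r)  # ~0.667, 0.5
```

Averaging these numbers over all holdout users gives a single score you can use to compare two algorithms trained on the same data.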

The best way to test a recommender is always to manually verify the results. However, some kind of automatic verification is also good.
In the spirit of a recommendation system, you should split your data in time and see if your algorithm can predict the user's future purchases. This should be done for all users.
Don't expect it to predict everything; 100% correctness is usually a sign of over-fitting.
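A minimal sketch of such a time-based split, assuming the purchase history is available as (user, item, timestamp) tuples:

```python
# Train on purchases before a cutoff timestamp, then test whether the
# recommender predicts the purchases made after it.

def time_split(purchases, cutoff):
    train = [(u, i) for u, i, t in purchases if t < cutoff]
    test = [(u, i) for u, i, t in purchases if t >= cutoff]
    return train, test

purchases = [
    ("alice", "book", 1), ("alice", "pen", 5),
    ("bob", "book", 2), ("bob", "mug", 7),
]
train, future = time_split(purchases, cutoff=4)
print(train)   # [('alice', 'book'), ('bob', 'book')]
print(future)  # [('alice', 'pen'), ('bob', 'mug')]
```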


Clustering+Regression-the right approach or not?

I have the task of predicting how quickly goods will sell (for example, within one category). E.g., the client inputs the price at which he wants his item sold, and the algorithm should display that it will sell at that price within n days. It should have 3 intervals: quick, medium, and long sell. Like in the picture:
The question: how exactly should I prepare the algorithm?
My suggestion: use clustering techniques to identify these three price ranges, and then solve a regression task for each cluster to predict the number of days. Is this the right approach?
There are two questions here, and I think the answer to each lies in a different domain:
Given an input price, predict how long it will take to sell the item. This is a well-defined prediction problem and can be tackled using ML algorithms, e.g. use your entire dataset to train and test a regression model for prediction.
Translate the prediction into a class: quick-, medium- or slow-sell. This problem is product oriented - there doesn't seem to be any concrete data allowing you to train a classifier on this translation; and I agree with #anony-mousse that using unsupervised learning might not yield easy-to-use results.
You can either consult your users or a product manager on reasonable thresholds to use (there might be considerations here like the type of item, season etc.), or try getting some additional data in order to train a supervised classifier.
E.g. you could ask your users, post-sell, if they think the sell was quick, medium or slow. Then you'll have some data to use for thresholding or for classification.
I suggest you simply define thresholds of 10 days and 31 days. Keep it simple.
These are the values the users will want to understand. If you use clustering, you may end up with 0.31415 days or similarly non-intuitive values that you cannot explain to the user anyway.

Item-to-item Amazon collaborative filtering

I am trying to fully understand Amazon's item-to-item algorithm so I can apply it to my system to recommend items the user might like, based on the items the user previously liked.
So far I have read these: Amazon paper, item-to-item presentation and item-based algorithms. Also I found this question, but after that I just got more confused.
What I can tell is that I need to follow the next steps to get the list of recommended items:
Have my data set with the items that users liked (I have set liked=1 and not liked=0).
Use the Pearson correlation score (How is this done? I found the formula, but is there an example?).
Then what should I do?
So I came up with these questions:
What are the differences between the item-to-item and item-based filtering? Are both algorithms the same?
Is it right to replace the rating score with liked/not liked?
Is it right to use the item-to-item algorithm, or is there any other more suitable for my case?
Any information about this topic will be appreciated.
Great questions.
Think about your data. You might have unary (consumed or null), binary (liked and not liked), ternary (liked, not liked, unknown/null), or continuous (null and some numeric scale), or even ordinal (null and some ordinal scale). Different algorithms work better with different data types.
Item-item collaborative filtering (also called item-based) works best with numeric or ordinal scales. If you just have unary, binary, or ternary data, you might be better off with data mining algorithms like association rule mining.
Given a matrix of users and their ratings of items, you can calculate the similarity of every item to every other item. Matrix manipulation and calculation is built into many libraries: try out scipy and numpy in Python, for example. You can just iterate over items and use the built-in matrix calculations to do much of the work in https://en.wikipedia.org/wiki/Cosine_similarity. Or download a framework like Mahout or Lenskit, which does this for you.
Now that you have a matrix of every item's similarity to every other item, you might want to suggest items for user U. Look at her history of items: for each history item I and each item J in your dataset, add the similarity of I to J to a list of candidate item scores. When you've gone through all history items, sort the candidate items by score descending and recommend the top ones.
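The two steps above (building the item-item cosine similarity matrix, then scoring candidates against a user's history) could be sketched in numpy as follows; the tiny 0/1 purchase matrix is made-up data:

```python
import numpy as np

# Users as rows, items as columns; 1 = bought.
R = np.array([
    [1, 1, 0, 0],   # user 0 bought items 0 and 1
    [1, 1, 1, 0],   # user 1 bought items 0, 1, 2
    [0, 0, 1, 1],   # user 2 bought items 2 and 3
], dtype=float)

# Cosine similarity between item columns: (R^T R) divided by norms.
norms = np.linalg.norm(R, axis=0)
S = (R.T @ R) / np.outer(norms, norms)
np.fill_diagonal(S, 0.0)  # an item should not recommend itself

def recommend(user_row, top_n=2):
    # Sum each candidate item's similarity to the user's history,
    # then rank the items the user has not already bought.
    scores = S @ user_row
    scores[user_row > 0] = -np.inf  # exclude already-bought items
    return np.argsort(scores)[::-1][:top_n]

print(recommend(R[0]))  # items ranked by similarity to user 0's purchases
```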
To answer the remaining questions: a continuous or ordinal scale will give you the best collaborative filtering results. Don't use a "liked" versus "unliked" scale if you have better data.
Matrix factorization algorithms perform well, and if you don't have many users and you don't have lots of updates to your rating matrix, you can also use user-user collaborative filtering. Try item-item first, though: it's a good all-purpose recommender algorithm.

Algorithm for cross-selling - like Amazon's "people who bought this"

New to this and a long time since I've done any programming or forums ....
However, this is really getting under my skin.
I've been looking around at the algorithms used for Amazon etc on the recommendations they make around products which have an affinity to the ones people have selected - clearly this works very well.
Here is what I am wondering....
A - why would this be limited to affinity? Is there never a situation where a product would be exclusive of the original selection, and perhaps a parallel but dissimilar product might make sense?
B - why would a neural network not make sense? Could this not work well to provide a good link, or would you just end up with a few products which have a very low weighting, which perpetuates their non-selection?
Thanks for your views.
James
Question A: You do not need to limit it to affinity. However, you do need to "package up" all other pertinent information in a way that you can present it to the algorithm. You should read up on "association rules", "frequent itemsets" and recommender algorithms. Most of these algorithms analyze transaction data and learn rules like {peanuts, hot dogs} → {beer}. To move away from pure affinity, you can produce multiple item sets where you generalize beer to {alcoholic beverage}, then use multiple frequent item sets at different levels of specificity, with some sort of ensemble algorithm to combine them.
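As an illustration, frequent pairs and the confidence of simple one-to-one rules can be counted directly from transaction data; the baskets and the `confidence` helper below are made up for the sketch:

```python
from itertools import combinations
from collections import Counter

# Toy transaction data: each basket is a set of items.
transactions = [
    {"peanuts", "hot dogs", "beer"},
    {"peanuts", "beer"},
    {"hot dogs", "beer"},
    {"peanuts", "hot dogs"},
]

# Count how often each pair of items is bought together.
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

def confidence(a, b):
    # Confidence of the rule {a} -> {b}: P(b | a).
    together = sum(1 for t in transactions if a in t and b in t)
    a_alone = sum(1 for t in transactions if a in t)
    return together / a_alone

print(confidence("peanuts", "beer"))  # 2 of the 3 peanut baskets contain beer
```

Real association-rule miners (Apriori, FP-Growth) do the same counting but prune the search space so it scales to large catalogs.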
Question B: A neural network, or any other similar model, would not work well due to the dimensionality. Say you are Amazon.com and you want to input an item set of up to 5 items. You might encode that as one input neuron per item, times 5. How many items are on Amazon? I have no idea, but I am guessing it's over 100k. Even if you get creative with dimensionality reduction, this is going to be a MASSIVE neural network, support vector machine, or random forest.

Collaborative or structured recommendation?

I am building a Rails app that recommends tutors to students and vice versa. I need to match them based on multiple dimensions, such as their majors (Math, Biology etc.), experience (junior etc.), class (Math 201 etc.), preference (self-described keywords) and ratings.
I checked out some Rails collaborative recommendation engines (recommendable, recommendify) and Mahout. It seems that collaborative recommendation is not the best choice in my case, since I have much more structured data, which allows a more structured query. For example, I can have a recommendation logic for a student like:
if student looks for a Math tutor in Math 201:
if there's a tutor in Math major offering tutoring in Math 201 then return
else if there's a tutor in Math major then sort by experience then return
else if there's a tutor in quantitative major then sort by experience then return
...
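A sketch of that cascade in plain Python, with hypothetical tutor records and field names:

```python
# Made-up tutor records; field names are assumptions for the sketch.
tutors = [
    {"name": "T1", "major": "Math", "classes": ["Math 101"], "experience": 3},
    {"name": "T2", "major": "Math", "classes": ["Math 201"], "experience": 1},
    {"name": "T3", "major": "Physics", "classes": [], "experience": 5},
]

QUANT_MAJORS = {"Math", "Physics", "Statistics"}

def find_tutors(major, course):
    # Rule 1: same major and offering the exact course.
    exact = [t for t in tutors if t["major"] == major and course in t["classes"]]
    if exact:
        return exact
    # Rule 2: same major, sorted by experience.
    same_major = [t for t in tutors if t["major"] == major]
    if same_major:
        return sorted(same_major, key=lambda t: -t["experience"])
    # Rule 3: any quantitative major, sorted by experience.
    quant = [t for t in tutors if t["major"] in QUANT_MAJORS]
    return sorted(quant, key=lambda t: -t["experience"])

print([t["name"] for t in find_tutors("Math", "Math 201")])  # ['T2']
```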
My questions are:
What are the benefits of a collaborative recommendation algorithm given that my recommendation system will be preference-based?
If it does provide significant benefits, how I can combine it with a preference-based recommendation as mentioned above?
Since my approach will involve querying multiple tables, it might not be efficient. What should I do about this?
Thanks a lot.
It sounds like your measurement of compatibility could be profitably reformulated as a metric. What you should do is try to interpret your "columns" as being different components of the dimension of your data. The idea is that you ultimately should produce a two-argument function which returns a measurement of compatibility between students and tutors (and also students/students and tutors/tutors). The motivation for extending this metric to all types of data is that you can then use this idea to reformulate your matching criteria as a nearest-neighbor search:
http://en.wikipedia.org/wiki/Nearest_neighbor_search
There are plenty of data structures and solutions to this problem as it has been very well studied. For example, you could try out the following library which is often used with point cloud data:
http://www.cs.umd.edu/~mount/ANN/
To optimize things a bit, you could also try prefiltering your data by running principal component analysis on your data set. This would let you reduce the dimension of the space in which you do nearest neighbor searches, and usually has the added benefit of reducing some amount of noise.
http://en.wikipedia.org/wiki/Principal_component_analysis
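A minimal sketch of PCA (via SVD) followed by a brute-force nearest-neighbour lookup, on random stand-in data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))          # 50 profiles, 10 numeric features

# PCA via SVD on the centred data; keep the top 3 components.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:3].T                      # reduced 3-D representation

def nearest(query_index):
    # Brute-force nearest neighbour in the reduced space.
    d = np.linalg.norm(Z - Z[query_index], axis=1)
    d[query_index] = np.inf            # exclude the query itself
    return int(np.argmin(d))

print(nearest(0))  # index of the profile closest to profile 0
```

For large data sets you would replace the brute-force loop with a k-d tree or an approximate-nearest-neighbour library such as the ANN library linked above.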
Good luck!
Personally, I think collaborative filtering (cf) would work well for you. Note that a central idea of cf is serendipity. In other words, adding too many constraints might result in lukewarm recommendations for users. The whole point of cf is to provide exciting and relevant recommendations based on similar users. You need not impose such tight constraints.
If you decide to implement a custom cf algorithm, I would recommend reading this article published by Amazon [pdf], which discusses Amazon's recommendation system. Briefly, the algorithm they use is as follows:
for each item I1
  for each customer C who bought I1
    for each item I2 bought by customer C
      record the purchase pair {I1, I2}
  for each item I2
    calculate sim(I1, I2)
    // this could use your own similarity measure, e.g. cosine-based
    // similarity: sim(A, B) = cos(A, B) = (A . B) / (|A| |B|), where A
    // and B are vectors (items, or courses in your case) and the
    // dimensions are customers
return the similarity table
Note that the creation of this table would be done offline. The online algorithm would be quick to return recommendations. Apparently, the recommendation quality is excellent.
In any case, if you want to get a better idea about cf in general (e.g., various cf strategies) and why it might be suited to you, go through that article (don't worry, it is very readable). Implementing a simple cf recommender is not difficult. Optimizations can be made later.
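For illustration, the offline similarity-table pass could be sketched like this, using cosine similarity over binary purchase vectors (the customer data is made up):

```python
from collections import defaultdict
from math import sqrt

purchases = {                       # customer -> items bought
    "c1": {"I1", "I2"},
    "c2": {"I1", "I2", "I3"},
    "c3": {"I2", "I3"},
}

# Count co-purchases: for each customer, record every ordered item pair.
co = defaultdict(int)
bought = defaultdict(int)
for items in purchases.values():
    for i1 in items:
        bought[i1] += 1
        for i2 in items:
            if i1 != i2:
                co[(i1, i2)] += 1

def sim(i1, i2):
    # For binary vectors, cos(A, B) = |A and B| / sqrt(|A| * |B|).
    return co[(i1, i2)] / sqrt(bought[i1] * bought[i2])

print(round(sim("I1", "I2"), 3))  # 0.816
```

In production this table would be computed offline over the full purchase log; the online step is just a lookup and sort.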

Beyond item-to-item recommendations

Simple item-to-item recommendation systems are well-known and frequently implemented. An example is the Slope One algorithm. This is fine if the user hasn't rated many items yet, but once they have, I want to offer more finely-grained recommendations. Let's take a music recommendation system as an example, since they are quite popular. If a user is viewing a piece by Mozart, a suggestion for another Mozart piece or Beethoven might be given. But if the user has made many ratings on classical music, we might be able to make a correlation between the items and see that the user dislikes vocals or certain instruments. I'm assuming this would be a two-part process: the first part is to find correlations between each user's ratings; the second would be to build the recommendation matrix from these extra data. So the question is: are there any open-source implementations or papers that can be used for each of these steps?
Taste may have something useful. It's moved to the Mahout project:
http://taste.sourceforge.net/
In general, the idea is that given a user's past preferences, you want to predict what they'll select next and recommend it. You build a machine-learning model in which the inputs are what a user has picked in the past and the attributes of each pick. The output is the item(s) they'll pick. You create training data by holding back some of their choices, and using their history to predict the data you held back.
Lots of different machine learning models you can use. Decision trees are common.
One answer is that any recommender system ought to have some of the properties you describe. Initially, recommendations aren't so good and are all over the place. As it learns tastes, the recommendations will come from the area the user likes.
But, the collaborative filtering process you describe is fundamentally not trying to solve the problem you are trying to solve. It is based on user ratings, and two songs aren't rated similarly because they are similar songs -- they're rated similarly just because similar people like them.
What you really need is to define your notion of song-song similarity. Is it based on how the song sounds? the composer? Because it sounds like the notion is not based on ratings, actually. That is 80% of the problem you are trying to solve.
I think the question you are really answering is, what items are most similar to a given item? Given your item similarity, that's an easier problem than recommendation.
Mahout can help with all of these things, except song-song similarity based on its audio -- or at least provide a start and framework for your solution.
There are two techniques that I can think of:
Train a feed-forward artificial neural net using Backpropagation or one of its successors (e.g. Resilient Propagation).
Use version space learning. This starts with the most general and the most specific hypotheses about what the user likes and narrows them down when new examples are integrated. You can use a hierarchy of terms to describe concepts.
Common characteristics of these methods are:
You need a different function for each user. This pretty much rules out efficient database queries when searching for recommendations.
The function can be updated on the fly when the user votes for an item.
The dimensions along which you classify the input data (e.g. has vocals, beats per minute, musical scales, whatever) are very critical to the quality of the classification.
Please note that these suggestions come from university courses in knowledge based systems and artificial neural nets, not from practical experience.
