Recently, I have noticed that several web sites have something like "Recommended for You", for example YouTube or Facebook: the site can study my usage behavior and recommend some content to me. I would like to know how they analyze this information. Is there an algorithm to do so? Thank you.
Amazon and Netflix (among others) use a technique called Collaborative filtering to suggest things you might like based on the likes/dislikes of others who have made purchases and selections similar to yours.
Is there any Algorithm to do so?
Yes
Yes. One fairly common one is to look at things you've selected in the past, find other people who've made those selections, then find the other selections most common among those other people, and guess that you're likely to be interested in those as well.
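A minimal sketch of that idea in Python (the users, items, and selection data are purely illustrative):

```python
from collections import Counter

# Toy selection history: user -> set of items they picked (hypothetical data).
history = {
    "alice": {"item_a", "item_b", "item_c"},
    "bob":   {"item_a", "item_b", "item_d"},
    "carol": {"item_b", "item_d", "item_e"},
}

def recommend(user, history, top_n=3):
    """Suggest items picked by people whose selections overlap with the user's."""
    mine = history[user]
    scores = Counter()
    for other, items in history.items():
        if other == user:
            continue
        overlap = len(mine & items)       # how much this person resembles the user
        for item in items - mine:         # things the user hasn't picked yet
            scores[item] += overlap       # weight by the overlap
    return [item for item, _ in scores.most_common(top_n)]

print(recommend("alice", history))        # ['item_d', 'item_e'] for this toy data
```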
Yup, there are lots of algorithms; things such as k-nearest neighbor: http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm.
Here is a pretty good book on the subject that covers making these sorts of systems along with others: http://www.amazon.com/gp/product/0596529325?ie=UTF8&tag=ianburriscom-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=0596529325.
It's generally done by matching you with other users who have a similar usage history / profile and then recommending other things that they've purchased/watched/whatever.
Searching for "recommendation algorithm" yields lots of papers. Most algorithms incorporate "machine learning" algorithms to determine groups of things (comedy movies, books on gardening, orchestral music, etc.). Your matching with those groups yields recommendations. Some companies use humans to classify things, too.
Such an algorithm is going to vary wildly from company to company. In many cases, it analyzes some combination of your search history, purchase history, physical location, and other factors. It probably will also compare purchases/searches amongst other people to find what those people have purchased/searched for, and recommend some of those products to you.
There are probably hundreds of these algorithms out there, but I doubt you can use any of them (that are actually good). Probably you are better off figuring it out yourself.
If you can categorize your contents (i.e. by tagging or content analysis), you can also categorize your users and their preferences.
For example: you have a video portal with 5 million videos; 1 million of them are tagged mostly "red". If 80% of all videos watched by a user (who is defined by an IP, a persistent user account, ...) are tagged mostly red, you might want to recommend even more red videos to him. You might want to refine your recommendations by looking at his further actions: does he like your recommendations -- if so, why not give him even more; if not, try the second-best guess -- maybe he's not looking for the color, but for the background music ...
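Here's a rough sketch of that tag-based idea; the tags, videos, and counts are all invented for illustration:

```python
from collections import Counter

# Hypothetical watch history: each entry is the tag set of a watched video.
watched = [
    {"red", "music"},
    {"red", "cars"},
    {"red", "music"},
    {"blue", "music"},
]

# Candidate videos in the catalogue, keyed by id -> tags (also invented).
catalogue = {
    "v1": {"red", "sports"},
    "v2": {"blue", "cooking"},
    "v3": {"red", "music"},
}

# The user's tag profile: how often each tag shows up in what they watched.
profile = Counter(tag for tags in watched for tag in tags)

def score(tags):
    # A candidate scores higher the more its tags match the user's profile.
    return sum(profile[t] for t in tags)

recommended = sorted(catalogue, key=lambda v: score(catalogue[v]), reverse=True)
print(recommended)   # ['v3', 'v1', 'v2'] for this toy data: mostly-red, musical first
```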
There's no absolute algorithm to do it, but all implementations will go in a similar direction. It's always based on observing users, which scares me from time to time :-)
There's a whole lot of algorithms tackling the issue: see the Wiki article. It's a machine learning problem. Computers can be taught using two main techniques: classification and clustering. They require some dataset as input. If the dataset is informative (really holds some useful patterns), then those ML techniques can dig most of it out.
Clustering could be the best fit for this kind of problem. Its main use is to find similarities among points in the provided dataset. If the points are, e.g., search histories, they can be grouped together to form clusters. If your search history closely relates to another user's, a hint can be given by picking the links most similar to yours.
The same goes for book recommendations; it's obvious what dataset they use: "Other people who bought this product also bought Product A, Product B, ...". The key here is to match your profile to others' and use the most similar ones to recommend from.
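As a toy illustration of the clustering idea, here is a sketch using scikit-learn (my choice of library, not something the answer prescribes), with invented search histories:

```python
# Cluster users by their search histories; users in the same cluster can then
# be shown links that are popular within that cluster.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import KMeans

histories = [
    "guitar chords acoustic songs",          # invented search histories
    "python pandas dataframe merge",
    "electric guitar amp settings",
    "python list comprehension examples",
]

X = CountVectorizer().fit_transform(histories)   # bag-of-words vectors
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)   # users sharing a label have similar histories
```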
The computer retrieves information from the human brain with a complex memory-scan process, sorts it accordingly, and outputs results based on what you have experienced in your life so far.
Related
I want to give users the ability to view a personalized list of users they might find interesting and might want to follow...
I was thinking of doing it like this:
- Get all users he is currently following
- Get all users that those users follow
- Rank them by total posts made (descending), preferring those who have filled in their personal information fields
- Show 5 of them on each page load
In case the user has followers, an information message will appear...
Can this kind of feature be done with this algorithm or is there a better or even easier way to do it?
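For concreteness, here is a rough sketch of the steps above, assuming an in-memory follow graph and per-user post counts (all names and numbers are hypothetical):

```python
# Hypothetical data: who follows whom, and how many posts each user has made.
follows = {
    "me":    {"ann", "bob"},
    "ann":   {"bob", "cathy", "dave"},
    "bob":   {"cathy", "erin"},
    "cathy": {"me"},
}
post_count = {"ann": 10, "bob": 3, "cathy": 25, "dave": 7, "erin": 1, "me": 4}

def suggest(user, page=0, per_page=5):
    following = follows.get(user, set())
    # Users followed by the people I follow, excluding myself and existing follows.
    candidates = set()
    for friend in following:
        candidates |= follows.get(friend, set())
    candidates -= following | {user}
    # Rank by total posts, descending, and show 5 per page load.
    ranked = sorted(candidates, key=lambda u: post_count.get(u, 0), reverse=True)
    return ranked[page * per_page:(page + 1) * per_page]

print(suggest("me"))   # ['cathy', 'dave', 'erin'] for this toy data
```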
In your algorithm, I'm wondering why you need to sort users based on number of posts, maybe it has something to do with reputation?
Recommendation is indeed a very large, open topic, and is also a hot academic research field. If we are working on a practical project, I think it is best to stay simple and focused.
I witnessed the following two kinds of recommendations on a very popular social website. From my experience, the recommendation output is of high quality. Here I'm brainstorming the algorithms behind them. Hope it helps.
Discover people you might know: Recommend people whose 'following set' intersects with your 'following set'. It is based on the "clustering effect" of social networks: the friend of your friend is more likely to be your friend.
Recommend person based on interests: If the users could be celebrities, companies, institutions, press media, etc., then recommendations like the following might be useful: "People following #Linus also follow #Stallman, #LinuxDeveloper, ...". Suppose you've just followed #Linus, to recommend #Stallman, #LinuxDeveloper, first we need to find out all users following #Linus, then figure out their common following list, possibly ranked by number of followers. The idea is to recommend users based on interest correlations. We calculate and discover high correlation users, assuming that users' following list are grouped by their interests.
(I'm also thinking that algorithm 1 will discover people who share common interests with you, if the users can be celebrities, etc. This might be preferred for some scenarios.)
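A toy sketch of the second idea, counting what else the followers of a just-followed account follow (the accounts and lists are made up):

```python
from collections import Counter

# Hypothetical follow lists: user -> accounts they follow.
follows = {
    "u1": {"#Linus", "#Stallman", "#LinuxDeveloper"},
    "u2": {"#Linus", "#Stallman"},
    "u3": {"#Linus", "#LinuxDeveloper", "#SomeBand"},
    "u4": {"#SomeBand"},
}

def related_accounts(just_followed, follows, top_n=3):
    """Accounts most commonly followed by people who follow `just_followed`."""
    counts = Counter()
    for user, accounts in follows.items():
        if just_followed in accounts:
            counts.update(accounts - {just_followed})
    return counts.most_common(top_n)

print(related_accounts("#Linus", follows))
# [('#Stallman', 2), ('#LinuxDeveloper', 2), ('#SomeBand', 1)]
```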
You're asking a very open-ended question here - how to pick a small number of recommendations out of a large set. So the answer is - you can make it as simple or as complicated as you want it to be! The simplest would be to pick a few at random (and any more complex algorithm had better prove that it produces better results than that.) Your solution of gathering all users who are two hops away, and then ranking by number of posts, is just a bit more complex, and then at the other extreme are the sophisticated algorithms used by the Amazons and Googles of the world. Companies put a lot of effort into building this sort of thing - have you heard of the Netflix Prize?
As I understand it, you want to follow users that could offer high-quality information about your topic. We need an algorithm that gives us these users as a result, but how can we find them?
- Users that have many followers are a good choice, but not always: many users on Twitter follow other users only out of respect or etiquette.
- Users whose tweets are retweeted many times by other users are a good choice,
- as are users who are mentioned many times by other users.
I think that to find these users we should use link-based analysis such as the HITS or PageRank algorithms.
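To illustrate, here is a bare-bones PageRank power iteration over a small invented follow graph; a real system would use an existing graph library and tuned parameters:

```python
# Minimal PageRank by power iteration on a toy "who follows whom" graph.
# An edge a -> b means "a follows b", so b gains authority from a.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a"), ("d", "c")]
nodes = {n for edge in edges for n in edge}
out_links = {n: [b for a, b in edges if a == n] for n in nodes}

damping = 0.85
rank = {n: 1.0 / len(nodes) for n in nodes}
for _ in range(50):
    new = {n: (1 - damping) / len(nodes) for n in nodes}
    for n in nodes:
        targets = out_links[n] or list(nodes)   # dangling nodes spread evenly
        share = damping * rank[n] / len(targets)
        for t in targets:
            new[t] += share
    rank = new

print(sorted(rank, key=rank.get, reverse=True))   # most "authoritative" users first
```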
You may want to consider not including people that are already following the given user. I imagine they might not be so interested in you, and this could potentially be problematic. However, you may be very interested in finding out more about the people that he is following.
Are you considering showing the user the reason why these people were recommended to them? For example, saying you may be interested in what little Billy is saying because of his connection to your wife. If so, to avoid angering users, it may be worth allowing them to opt out of such recommendations.
Other than that, it seems like a pretty good way of recommending users that someone would be interested in. The only other thing I can think of that might also help find people with similar interests is allowing users to tag posts, so you can find users by similar interests or by what they are posting about.
One other, more involved, thing that you could look into is finding users by similar interests. For example, if person A is following person C, and person B is following person C, then maybe recommend person A to person B, though this seems like it could make for some very lengthy queries if you are not careful.
Simple item-to-item recommendation systems are well-known and frequently implemented. An example is the Slope One algorithm. This is fine if the user hasn't rated many items yet, but once they have, I want to offer more finely-grained recommendations. Let's take a music recommendation system as an example, since they are quite popular. If a user is viewing a piece by Mozart, a suggestion for another Mozart piece or for Beethoven might be given. But if the user has made many ratings on classical music, we might be able to make a correlation between the items and see that the user dislikes vocals or certain instruments. I'm assuming this would be a two-part process: the first part is to find correlations between each user's ratings, the second would be to build the recommendation matrix from these extra data. So the question is, are there any open-source implementations or papers that can be used for each of these steps?
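For reference, a minimal (weighted) Slope One sketch on invented ratings, just to show the mechanics the question refers to:

```python
from collections import defaultdict

# Toy ratings: user -> {item: rating} (invented data).
ratings = {
    "u1": {"mozart": 5, "beethoven": 3, "metal": 1},
    "u2": {"mozart": 4, "beethoven": 2},
    "u3": {"beethoven": 5, "metal": 4},
}

# Average rating difference between every pair of co-rated items.
diffs, counts = defaultdict(float), defaultdict(int)
for user_ratings in ratings.values():
    for i, ri in user_ratings.items():
        for j, rj in user_ratings.items():
            if i != j:
                diffs[(j, i)] += rj - ri
                counts[(j, i)] += 1
for pair in diffs:
    diffs[pair] /= counts[pair]

def predict(user, item):
    """Weighted Slope One prediction of `user`'s rating for `item`."""
    num = den = 0.0
    for i, ri in ratings[user].items():
        c = counts.get((item, i), 0)
        if i != item and c:
            num += (diffs[(item, i)] + ri) * c
            den += c
    return num / den if den else None

print(predict("u2", "metal"))   # prediction for an item u2 hasn't rated yet
```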
Taste may have something useful. It's moved to the Mahout project:
http://taste.sourceforge.net/
In general, the idea is that given a user's past preferences, you want to predict what they'll select next and recommend it. You build a machine-learning model in which the inputs are what a user has picked in the past and the attributes of each pick. The output is the item(s) they'll pick. You create training data by holding back some of their choices, and using their history to predict the data you held back.
There are lots of different machine learning models you can use. Decision trees are common.
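As a sketch of that train/hold-back loop, here is a decision-tree example using scikit-learn (my choice of library; the item attributes are invented):

```python
# Predict whether a user will pick an item from the item's attributes,
# training on past picks/skips and holding some history back as a test set.
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Each row: [genre_id, tempo, has_vocals]; label 1 = user picked it, 0 = skipped.
X = [[0, 120, 1], [0, 90, 1], [1, 140, 0], [1, 150, 0], [0, 100, 1], [1, 130, 0]]
y = [1, 1, 0, 0, 1, 0]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)
model = DecisionTreeClassifier().fit(X_train, y_train)

print(model.score(X_test, y_test))    # accuracy on the held-back picks
print(model.predict([[0, 110, 1]]))   # would the user pick this new item?
```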
One answer is that any recommender system ought to have some of the properties you describe. Initially, recommendations aren't so good and are all over the place. As it learns tastes, the recommendations will come from the area the user likes.
But, the collaborative filtering process you describe is fundamentally not trying to solve the problem you are trying to solve. It is based on user ratings, and two songs aren't rated similarly because they are similar songs -- they're rated similarly just because similar people like them.
What you really need is to define your notion of song-song similarity. Is it based on how the song sounds? the composer? Because it sounds like the notion is not based on ratings, actually. That is 80% of the problem you are trying to solve.
I think the question you are really answering is, what items are most similar to a given item? Given your item similarity, that's an easier problem than recommendation.
Mahout can help with all of these things, except song-song similarity based on its audio -- or at least provide a start and framework for your solution.
There are two techniques that I can think of:
Train a feed-forward artificial neural net using backpropagation or one of its successors (e.g. Resilient Propagation).
Use version space learning. This starts with the most general and the most specific hypotheses about what the user likes and narrows them down when new examples are integrated. You can use a hierarchy of terms to describe concepts.
Common characteristics of these methods are:
- You need a different function for each user. This pretty much rules out efficient database queries when searching for recommendations.
- The function can be updated on the fly when the user votes for an item.
- The dimensions along which you classify the input data (e.g. has vocals, beats per minute, musical scales, whatever) are very critical to the quality of the classification.
Please note that these suggestions come from university courses in knowledge based systems and artificial neural nets, not from practical experience.
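In that spirit, here is a much-simplified, hypothetical stand-in for a per-user model that is updated on the fly with each vote; it is a tiny perceptron-style linear update rather than a real backpropagation-trained network, and the feature dimensions are invented:

```python
# One tiny model per user, updated on the fly with every vote.
# Features are invented: [has_vocals, tempo/200, is_live].
def make_user_model(n_features, lr=0.1):
    weights = [0.0] * n_features

    def score(x):                  # how much this user is predicted to like x
        return sum(w * xi for w, xi in zip(weights, x))

    def vote(x, liked):            # perceptron-style update after each vote
        error = (1.0 if liked else -1.0) - score(x)
        for i, xi in enumerate(x):
            weights[i] += lr * error * xi

    return score, vote

score, vote = make_user_model(3)
vote([1, 0.6, 0], liked=True)      # liked a vocal, mid-tempo studio track
vote([0, 0.9, 1], liked=False)     # disliked an instrumental live track
print(score([1, 0.5, 0]))          # positive: resembles what they liked
```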
I've been thinking about this for a while now, so I thought I would ask for suggestions:
I have a crawler which enters the root of some site (could be anything from www.StackOverFlow.com, www.SomeDudesPersonalSite.se or even www.Facebook.com). Then I need to determine what "kind of homepage" I'm visiting. Different types could, for instance, be:
Forum
Blog
Link catalog
Social media site
News site
"One man site"
I've been brainstorming for a while, and the best solution seems to be some heuristic with a point system. By this I mean that different trends give some points to the different types, and then the program makes a guess afterwards.
But this is where I get stuck: how do you detect trends?
Catalogs could be easy: If sitesIndexed/Outgoing links is very high, catalogs should get several points.
News sites/blogs could be easy: if a high proportion of the pages indexed have a datetime, those types should get several points.
BUT I can't really find too many trends.
SO: My question is:
Any ideas on how to do this?
Thanks so much..
I believe you are attempting document classification, which is a well-researched topic.
http://en.wikipedia.org/wiki/Document_classification
You will see a considerable list of many different methods. But to suggest any one of those (or neural networks or the like) prior to determining the "trends" as you call them is to suggest it prematurely. I would recommend looking into "web document classification" or the like. It is evidently a considerable subset of document classification, and if you have access to academic journals there are plenty of incomprehensible articles for your enjoyment.
I did also find your idea as a homework assignment -- perhaps if you are particularly audacious you could contact the professor.
http://uhaweb.hartford.edu/compsci/ccli/wdc.htm
Lastly, I believe that this is an accessible (if strangely formatted) website that has a general and perhaps outdated discussion:
http://www.webology.ir/2008/v5n1/a52.html
I'm afraid I don't have much personal knowledge of the topic, so the most I could do was tell you the keyword "document classification" and provide some quick googling. However, if I wanted to play around with this concept, I think simply looking for the rate of certain keywords is a decent starting "trend." ("Sale" or "purchase" or "customers" are trends for shopping sites, "my," "opinion," "comment," for blogs, and so on)
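A tiny sketch of that keyword-rate scoring, with invented keyword lists that would obviously need real tuning:

```python
# Each site type earns points for every occurrence of its tell-tale keywords;
# the type with the best score wins. The keyword lists are invented.
KEYWORDS = {
    "shop":  ["sale", "purchase", "customers", "cart"],
    "blog":  ["my", "opinion", "comment", "posted"],
    "forum": ["thread", "reply", "topic", "member"],
}

def classify(page_text):
    words = page_text.lower().split()
    scores = {site_type: sum(words.count(kw) for kw in kws)
              for site_type, kws in KEYWORDS.items()}
    return max(scores, key=scores.get), scores

print(classify("Huge sale today: add to cart and purchase now, customers love it"))
```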
You could train a neural network to recognise them. Give it number/types of links, maybe types of HTML tags as well.
I think otherwise you're just going to be second-guessing what makes a site what it is.
I want to implement a media recommendation engine. I saw similar posts on this, but I think my requirements are a bit different from those, so I am posting here.
Here is the deal.
I want to implement a recommendation engine for media players like VLC, an engine that has to care for only a single user. It would be embedded in a media player on a PC which is typically used by a single user, and it would start learning the likes and dislikes of the user and gradually learn what that user likes. It will not be able to find similar users and use their data for recommendations, since it's a single-user system. So how do I go about this?
Or you can consider it as a recommendation engine that has to be put in, say, iPods, which has to learn about a single user and recommend music/movies from the collection it has.
I thought of starting to collect the genre of music/movies (maybe even artist names) that the user watches and recommending movies from the most-watched genre, but that looks very crude, doesn't it?
So are there any algorithms I can use or any resources I can refer to?
Regards,
MicroKernel :)
What you're trying to do is quite challenging... particularly because it's still in the research stage and a lot of PhDs from reputable universities across the world are trying to find a good solution for it.
So here are some things that you might need:
- Data that you can analyze:
  - Lots, and lots, and lots of data!
  - It could be metadata about the media (name, duration, title, author, style, etc.),
  - or you can try to do some crazy feature extraction from the media itself.
- References to correlate the data to:
  - Since you can't get other users, you always need the user's feedback.
  - If you don't want to annoy your user to death with feedback questions, then make your application connect to a central server so you can compare users.
- An algorithm that can model your data sufficiently well:
  - If you have no experience at all, then try k-nearest neighbor (the simplest one).
  - Collaborative filtering
  - Pearson correlation
  - Matrix factorization/decomposition
  - Singular value decomposition (SVD)
  - Ensemble learning <-- allows you to combine multiple algorithms and take advantage of their strengths.
The winners of the Netflix Prize said this:
"Predictive accuracy is substantially improved when blending multiple predictors. Our experience is that most efforts should be concentrated in deriving substantially different approaches, rather than refining a single technique. Consequently, our solution is an ensemble of many methods."
Conclusion:
There is no silver bullet for recommendation engines and it takes years of exploration to find a good combination of algorithms that produce sufficient results. :)
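As one concrete starting point from the list above, here is a rough single-user k-nearest-neighbor sketch that predicts a rating for a new item from the k most similar items the user has already rated (the metadata features are invented):

```python
# Predict the user's rating for a new track from the k most similar tracks
# they have already rated. Features are made-up metadata: (genre_id, year, tempo).
rated = {                     # feature vector -> rating this single user gave
    (0, 1990, 0.6): 5,
    (0, 2000, 0.7): 4,
    (1, 2010, 0.9): 1,
    (1, 2005, 0.8): 2,
}

def distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def predict(candidate, k=2):
    nearest = sorted(rated, key=lambda f: distance(f, candidate))[:k]
    return sum(rated[f] for f in nearest) / k    # average rating of the neighbours

print(predict((0, 1995, 0.65)))   # high: it resembles the tracks rated 5 and 4
```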
What's a good algorithm for suggesting things that someone might like based on their previous choices? (e.g. as popularised by Amazon to suggest books, and used in services like iRate Radio or YAPE where you get suggestions by rating items)
Simple and straightforward (order cart):
Keep a list of transactions in terms of what items were ordered together. For instance when someone buys a camcorder on Amazon, they also buy media for recording at the same time.
When deciding what is "suggested" on a given product page, look at all the orders where that product was ordered, count all the other items purchased at the same time, and then display the top 5 items that were most frequently purchased at the same time.
You can expand it from there based not only on orders, but what people searched for in sequence on the website, etc.
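That counting step might look roughly like this (the orders are invented):

```python
from collections import Counter, defaultdict

# Hypothetical order history: each order is the set of items bought together.
orders = [
    {"camcorder", "sd_card", "tripod"},
    {"camcorder", "sd_card"},
    {"camcorder", "bag"},
    {"sd_card", "card_reader"},
]

# For every item, count what else appeared in the same order.
bought_with = defaultdict(Counter)
for order in orders:
    for item in order:
        bought_with[item].update(order - {item})

# Top items to show on the camcorder's product page.
print(bought_with["camcorder"].most_common(5))
# [('sd_card', 2), ('tripod', 1), ('bag', 1)]
```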
In terms of a rating system (ie, movie ratings):
It becomes more difficult when you throw in ratings. Rather than a discrete basket of items one has purchased, you have a customer history of item ratings.
At that point you're looking at data mining, and the complexity is tremendous.
A simple algorithm, though, isn't far from the above, but it takes a different form. Take the customer's highest rated items, and the lowest rated items, and find other customers with similar highest rated and lowest rated lists. You want to match them with others that have similar extreme likes and dislikes - if you focus on likes only, then when you suggest something they hate, you'll have given them a bad experience. In suggestions systems you always want to err on the side of "lukewarm" experience rather than "hate" because one bad experience will sour them from using the suggestions.
Suggest items from those other customers' highest-rated lists to the customer.
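A rough sketch of matching on both extremes, with invented ratings (4-5 treated as "loved", 1-2 as "hated"):

```python
# Users are similar if they share both favourites and dislikes; suggest from
# the favourites of the best match.
ratings = {
    "me": {"a": 5, "b": 1, "c": 4},
    "u1": {"a": 5, "b": 2, "d": 5},
    "u2": {"a": 1, "b": 5, "e": 4},
}

def extremes(r):
    loved = {i for i, s in r.items() if s >= 4}
    hated = {i for i, s in r.items() if s <= 2}
    return loved, hated

def similarity(r1, r2):
    l1, h1 = extremes(r1)
    l2, h2 = extremes(r2)
    # Shared extremes raise the score; opposite extremes lower it.
    return len(l1 & l2) + len(h1 & h2) - len(l1 & h2) - len(h1 & l2)

me = ratings["me"]
best = max((u for u in ratings if u != "me"), key=lambda u: similarity(me, ratings[u]))
print(best, extremes(ratings[best])[0] - set(me))   # 'u1' and {'d'} for this toy data
```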
Consider looking at "What is a Good Recommendation Algorithm?" and its discussion on Hacker News.
There isn't a definitive answer and it's highly unlikely there is a standard algorithm for that.
How you do that heavily depends on the kind of data you want to relate and how it is organized. It depends on how you define "related" in the scope of your application.
Often the simplest thought produces good results. In the case of books, if you have a database with several attributes per book entry (say author, date, genre, etc.) you can simply choose to suggest a random set of books from the same author, the same genre, ones with similar titles, and others like that.
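That simple attribute-based idea might look roughly like this (the book records are invented):

```python
import random

# Invented book records: (title, author, genre).
books = [
    ("Dune", "Herbert", "sci-fi"),
    ("Dune Messiah", "Herbert", "sci-fi"),
    ("Foundation", "Asimov", "sci-fi"),
    ("Emma", "Austen", "classic"),
]

def suggest(current, n=2):
    _, author, genre = current
    # Candidates that share the author or the genre with the current book.
    pool = [b for b in books if b != current and (b[1] == author or b[2] == genre)]
    return random.sample(pool, min(n, len(pool)))

print(suggest(("Dune", "Herbert", "sci-fi")))   # a random pick of related books
```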
However, you can always try more complicated stuff. Keep a record of other users that requested this "product" and suggest other "products" those users requested in the past (a product can be anything from a book to a song to anything you can imagine). This is something most major sites that have a suggest function do (although they probably take in a lot of information, from product attributes to demographics, to best serve the client).
Or you can even resort to so-called AI; neural networks can be constructed that take in all those attributes of the product and try (based on previous observations) to relate it to others and update themselves.
A mix of any of those cases might work for you.
I would personally recommend thinking about how you want the algorithm to work and how to suggest related "products". Then, you can explore all the options: from simple to complicated and balance your needs.
Recommended-products algorithms are huge business nowadays. Netflix, for one, is offering $100,000 for only minor increases in the accuracy of their algorithm.
As you have deduced from the answers so far, and indeed as you suggest, this is a large and complex topic. I can't give you an answer, at least nothing that hasn't already been said, but I can point you to a couple of excellent books on the topic:
- Programming CI: http://oreilly.com/catalog/9780596529321/ is a fairly gentle introduction with samples in Python.
- CI In Action: http://www.manning.com/alag looks a bit more in depth (but I've only just read the first chapter or 2) and has examples in Java.
I think doing a Google on Least Mean Square Regression (or something like that) might give you something to chew on.
I think most of the useful advice has already been suggested, but I thought I'd just put in how I would go about it; this is just a thought, though, since I haven't done anything like this.
First I would find where in the application I will sample the data to be used; if I have a store, it will probably be in the checkout. Then I would save a relation between the items in the checkout cart.
Now, if a user goes to an item's page, I can count the number of relations from other items and pick, for example, the 5 items with the highest number of relations to the selected item.
I know it's simple, and there are probably better ways, but I hope it helps.
Market basket analysis is the field of study you're looking for:
Microsoft offers two suitable algorithms with their Analysis server:
- Microsoft Association Algorithm
- Microsoft Decision Trees Algorithm
Check out this msdn article for suggestions on how best to use Analysis Services to solve this problem.
There is a recommendation platform called Certona which you may find useful; it is used by companies such as B&Q and Screwfix. Find more information at www.certona.com/