I want to give users the ability to view some personalized users they might find interesting and might follow them...
I was thinking of it like that:
- Get all users he is currently following
- Get all followers that they follow
- rank them by total posts they made (DESC), filled up personal information fields
- show 5 of them on each page load
in case user has followers then an information message will appear...
Can this kind of feature be done with this algorithm or is there a better or even easier way to do it?
In your algorithm, I'm wondering why you need to sort users based on number of posts, maybe it has something to do with reputation?
Recommendation is indeed a very large, open topic, and is also a hot academic research fields. If we are working on a practical project, I think it will be nice to to stay simple and focused.
I witnessed the following two kinds of recommendations on a very popular
social website. From my experience, the recommendation output is of high quality. Here I'm brainstorming the algorithms behind. Hope it helps.
Discover persons you might know: Recommend person whose 'following set' intersects with your 'following set'. It is based on the "clustering effect" of social network: The friend of your friend is more likely to be your friend.
Recommend person based on interests: If the users could be celebrities, companies, institutions, press media, etc., then recommendations like the following might be useful: "People following #Linus also follow #Stallman, #LinuxDeveloper, ...". Suppose you've just followed #Linus, to recommend #Stallman, #LinuxDeveloper, first we need to find out all users following #Linus, then figure out their common following list, possibly ranked by number of followers. The idea is to recommend users based on interest correlations. We calculate and discover high correlation users, assuming that users' following list are grouped by their interests.
(I'm also thinking, algorithm 1 will discover persons that share common interests with you, if users could be celebrities, etc.. This might be preferred for some scenarios.)
You're asking a very open-ended question here - how to pick a small number of recommendations out of a large set. So the answer is - you can make it as simple or as complicated as you want it to be! The simplest would be to pick a few at random (and any more complex algorithm had better prove that it produces better results than that.) Your solution of gathering all users who are two hops away, and then ranking by number of posts, is just a bit more complex, and then at the other extreme are the sophisticated algorithms used by the Amazons and Googles of the world. Companies put a lot of effort into building this sort of thing - have you heard of the Netflix Prize?
as I understand you want to follow the user that could offer high quality information about your Thema .we need an Algorithm to give this user as result to us ,but how can I find these users:
The users that have many Followers are a good choice but not always many of users in Twitter follow another users only as respect or ethiquet.
The users that his/her twitts retwitt many times with other user is a good choice
and the user that they are mentioned many times by other users.
I think ,to find theses users we should use Link based Analyse such as HITS or Page rank algorithim
You may want to consider not including people that are following the given user. I imagine might not be so interested in you, and this could potentially be problematic. However, you maybe very interested in finding more about the people that is following.
Are you considering showing the user the reason why these people were recommended to them? For example, saying like you may be interested in what little billy is saying because of his connection to your wife. If so, to potentially avoid angered users, it may be worth allowing them to in a sense opt-out.
It seems like other than that, it seems like it is a pretty good way of recommending users that someone would be interested in. The only other things that I can think of that might also help find people with similar interests, is if you allow users to tag posts. Allowing you to find users by similar interests, or by what they are posting about.
One other more problematic thing that you could look into is finding users by similar interest. for example, if person a is following person c, and person b is following person c, then maybe recommend person a to person b. though this seems like it could make for some very lengthy queries if you are not careful.
Related
This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Where can I learn about recommendation systems?
I am always interested in how web sites recommend articles and users to me, based on what I "like", what I follow, what I vote up/vote down.
And it could also recommend items when I browse an item, "related articles", "people who like this article also like ..."
I need some articles and images to teach me how to implement such a system. Thanks very much.
Update:
I got a keyword "Slope one"
The Wikipedia article, Recommender system, is a good place to start. Also, this Recommendation engine blog post has some good information and illustrations.
The simplest method is one that uses the "people who like this article also like..." approach. If you keep track of each users' article ratings, and also keep track of who likes which articles, then you have the basis for a recommendation system.
For example, say that you're viewing Article A. The system can look up in its index every user who liked Article A. From that list, it can then create a list of all the articles liked by every user who liked Article A. In all likelihood, there will be significant overlap (that is, some articles were liked by multiple people). Your algorithm keeps track of how many likes it got for each article, and then shows the top N that got the most votes.
That simple system is surprisingly effective in many cases, but not perfect. You'll find that exceptionally popular articles dominate, even if they're not related to the article you're viewing. There are ways to prevent the hugely popular articles from dominating. One way is to use a floating point number for an article's score. Rather than adding 1 to the score for each "like", you add 1 / sqrt(users_number_of_likes). So that a user who likes, say, 100 articles, would only give 1/10 point to any individual article, but a user who likes only four articles would give 1/2 a point to each. Although this doesn't sound "fair," it does tend to attenuate the effect of hugely popular, but unrelated, items.
As I said, that's the simplest approach. If you're looking for "related" articles, not based on user input, then you have to either have keywords assigned to each article, or you need some way to examine an article and extract relevant keywords.
There are many ways to do what you're looking to do. Which one you choose depends on the nature of your data, whether you're doing collaborative filtering, how much time you want to spend developing it, and how good you want the results to be.
Netflix spent 1M dollars in price for movie recommendation system (algorhitm)
http://www.netflixprize.com/
You can read here about algorithm
I have 3 main questions about the algorithms in intelligent web (web 2.0)
Here the book I'm reading http://www.amazon.com/Algorithms-Intelligent-Web-Haralambos-Marmanis/dp/1933988665 and I want to learn the algorithms in deeper
1. People You may follow (Twitter)
How can one determine the nearest result to my requests ? Data mining? which algorithms?
2. How you’re connected feature (Linkedin)
Simply algorithm works like that. It draws the path between two nodes let say between Me and the other person is C. Me -> A, B -> A connections -> C . It is not any brute force algorithms or any other like graph algorithms :)
3. Similar to you (Twitter, Facebook)
This algorithms is similar to 1. Does it simply work the max(count) friend in common (facebook) or the max(count) follower in Twitter? or any other algorithms they implement? I think the second part is true because running the loop
dict{count, person}
for person in contacts:
dict.add(count(common(person)))
return dict(max)
is a silly act in every refreshing page.
4. Did you mean (Google)
I know that they may implement it with phonetic algorithm http://en.wikipedia.org/wiki/Phonetic_algorithm simply soundex http://en.wikipedia.org/wiki/Soundex and here is the Google VP of Engineering and CIO Douglas Merrill speak http://www.youtube.com/watch?v=syKY8CrHkck#t=22m03s
What about first 3 questions? Any ideas are welcome !
Thanks
People who you may follow
You can use the factors based calculations:
factorA = getFactorA(); // say double(0.3)
factorB = getFactorB(); // say double(0.6)
factorC = getFactorC(); // say double(0.8)
result = (factorA+factorB+factorC) / 3 // double(0.5666666666666667)
// if result is more than 0.5, you show this person
So say in the case of Twitter, "People who you may follow" can based on the following factors (User A is the user viewing this "People who you may follow" feature, there may be more or less factors):
Relativity between frequent keywords found in User A's and User B's tweets
Relativity between the profile description of both users
Relativity between the location of User A and B
Are people User A is following follows User B?
So where do they compare "People who you may follow" from? The list probably came from a combination of people with high amount of followers (they are probably celebrities, alpha geeks, famous products/services, etc.) and [people whom User A is following] is following.
Basically there's a certain level of data mining to be done here, reading the tweets and bios, calculations. This can be done on a daily or weekly cron job when the server load is least for the day (or maybe done 24/7 on a separate server).
How are you connected
This is probably a smart work here to make you feel that loads of brute force has been done to determine the path. However after some surface research, I find that this is simple:
Say you are User A; User B is your connection; and User C is a connection of User B.
In order for you to visit User C, you need to visit User B's profile first. By visiting User B's profile, the website already save the info indiciating that User A is at User B's profile. So when you visit User C from User B, the website immediately tells you that 'User A -> User B -> User C', ignoring all other possible paths.
This is the max level as at User C, User Acannot go on to look at his connections until User C is User A's connection.
Source: observing LinkedIN
Similar to you
It's the exact same thing as #1 (People you may follow), except that the algorithm reads in a different list of people. The list of people that the algorithm reads in is the people whom you follow.
Did you mean
Well you got it right there, except that Google probably used more than just soundex. There's language translation, word replacement, and many other algorithms used for the case of Google. I can't comment much on this because it will probably get very complex and I am not an expert to handle languages.
If we research a little more into Google's infrastructure, we can find that Google has servers dedicated to Spelling and Translation services. You can get more information on Google platform at http://en.wikipedia.org/wiki/Google_platform.
Conclusion
The key to highly intensified algorithms is caching. Once you cache the result, you don't have to load it every page. Google does it, Stack Overflow does it (on most of the pages with list of questions) and Twitter not surprisingly too!
Basically, algorithms are defined by developers. You may use others' algorithms, but ultimately, you can also create your own.
People you may follow
Could be one of many types of recommendation algorithms, maybe collaborative filtering?
How you are connected
This is just a shortest path algorithm on the social graph. Assuming there is no weight to the connections, it will simply use breadth-first.
Similar to you
Simply a re-arrangement of the data set using the same algorithm as People you may follow.
Check out the book Programming Collective Intelligence for a good introduction to the type of algorithms that are used for People you may follow and Similar to you, it has great python code available too.
People You may follow
From Twitter blog - "suggestions are based on several factors, including people you follow and the people they follow" http://blog.twitter.com/2010/07/discovering-who-to-follow.html
So if you follow A and B and they both follow C, then Twitter will suggest C to you...
How you’re connected feature
I think you have answered this one.
Similar to you
As above and as you say, although the results are probably cached - so its only done once per session or maybe even less frequently...
Hope that helps,
Chris
I don't use twitter; but with that in mind:
1). On the surface, this isn't that difficult: For each person I follow, see who they follow. Then for each of the people they follow, see who they follow, etc. The deeper you go, of course, the more number crunching it takes.
You can take this a bit further, if you can also efficiently extract the reverse: For those I follow, who also follows them?
For both ways, what's unsaid is a way to weight the tweeters to see if they're someone I'd really want to follow: A liberal follower may also follow a conservative tweeter, but that doesn't mean I'd want follow the conservative (see #3).
2). Not sure, thinking about it...
3). Assuming the bio and tweets are the only thing to go on, the hard parts are:
Deciding what attributes should exist (political affiliation, topic types, etc.)
Cleaning each 140 characters to data-mine.
Once you have the right set of attributes, then two different algorithms come to mind:
K means clustering, to decide which attributes I tend to discriminate on.
N-Nearest neighbor, to find the N most similar tweeters to you given the attributes I tend to give weight to.
EDIT: Actually, a decision tree is probably a FAR better way to do all of this...
This is all speculative, but it sounds fun if one were getting paid to do this.
I am implementing a web application that has many users and I would give the users rating based on their activities and based on other users liking their activities. How would I implement such an algorithm for that? I am looking for elegant and smart algorithm that could help.
You are basically looking for Scoring Algos. These articles might help -
How not to sort by average rating
Rank hotness with Newtons law of Cooling
How Reddit Ranking Algorithms work
Hope this helps.
Maybe your answer is staring right at you next to your username on this site :-) Stackoverflow.com's scoring system and badges are here to promote certain behaviors on the site. The algorithm is simple and the feedback is immediate so that everybody can see the consequences of certain actions.
What are the ratings used for? If you want to use the ratings as incentives for you users to encourage a specific behavior, then I believe you need to look at disciplines like behavioral psychology to figure out what behaviors you want to measure and reward.
If you already have a user base that reflects the typical user base you're trying to address, you might want to try with simple trial and error. Pick some actions, like e.g. receiving a like on a post and add points to the user's score whenever that happens. Watch the user community's reaction when you introduce the scoring system and see it it helps motivate the behavior you want. If not, try to change some other parameters and repeat.
Depending on your system, some users might try to game the system, so you could find yourself locked into an eternal cat and mouse game once you introduce a rating system (example: Google page ranking).
Recently, I found several web site have something like : "Recommended for You", for example youtube, or facebook, the web site can study my using behavior, and recommend some content for me... ...I would like to know how they analysis this information? Is there any Algorithm to do so? Thank you.
Amazon and Netflix (among others) use a technique called Collaborative filtering to suggest things you might like based on the likes/dislikes of others who have made purchases and selections similar to yours.
Is there any Algorithm to do so?
Yes
Yes. One fairly common one is to look at things you've selected in the past, find other people who've made those selections, then find the other selections most common among those other people, and guess that you're likely to be interested in those as well.
Yup there are lots of algorithms. Things such as k-nearest neighbor: http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm.
Here is a pretty good book on the subject that covers making these sorts of systems along with others: http://www.amazon.com/gp/product/0596529325?ie=UTF8&tag=ianburriscom-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=0596529325.
It's generally done by matching you with other users who have similar usage history / profile and then recommending other things that they've purhased/watched/whatever.
Searching for "recommendation algorithm" yields lots of papers. Most algorithms incorporate "machine learning" algorithms to determine groups of things (comedy movies, books on gardening, orchestral music, etc.). Your matching with those groups yields recommendations. Some companies use humans to classify things, too.
Such an algorithm is going to vary wildly from company to company. In many cases, it analyzes some combination of your search history, purchase history, physical location, and other factors. It probably will also compare purchases/searches amongst other people to find what those people have purchased/searched for, and recommend some of those products to you.
There are probably hundreds of these algorithms out there, but I doubt you can use any of them (that are actually good). Probably you are better off figuring it out yourself.
If you can categorize your contents (i.e. by tagging or content analysis), you can also categorize your users and their preferences.
For example: you have a video portal with 5 million videos .. 1 mio of them are tagged mostly red. If 80% of all videos watched by a user (who is defined by an IP, a persistent user account, ...) are tagged mostly red, you might want to recommend even more red videos to him. You might want to refine your recommendations by looking at his further actions: does he like your recommendations -- if so, why not give him even more, if not, try the second-best guess, maybe he's not looking for color, but for the background music ...
There's no absolute algorithm to do it, but all implementations will go into a similar direction. It's always basing on observing users, which scares me from time to time :-)
There's whole lot of algorithms tackling the issue: Wiki article. It's a Machine Learning domain problem. Computer's can be learned using two main techniques: classification and clustering. They require some datasets as input. If the dataset is informative (really holds some useful patterns) than those ML techniques can dig most of it.
Clustering could be best to use for this kind of problem. It's main usage is to find similarities among points in provided dataset. If the points are, e.g. your search history, they can be grouped together to form certain clusters. If Your search history closely relates to another, a hint can be given - picking links that are most similar to Your's.
The same comes with book recommendations - it's obvious what dataset they use: "Other people who bought this product also bought Product A, Product B,...". The key here is to match your profile to other's and use the most similar to recommend.
The computer retrieves information from the human brain with complex memory scan process, sorts it accordingly and outputs results based on what you have experienced in your life so far.
What's a good algorithm for suggesting things that someone might like based on their previous choices? (e.g. as popularised by Amazon to suggest books, and used in services like iRate Radio or YAPE where you get suggestions by rating items)
Simple and straightforward (order cart):
Keep a list of transactions in terms of what items were ordered together. For instance when someone buys a camcorder on Amazon, they also buy media for recording at the same time.
When deciding what is "suggested" on a given product page, look at all the orders where that product was ordered, count all the other items purchased at the same time, and then display the top 5 items that were most frequently purchased at the same time.
You can expand it from there based not only on orders, but what people searched for in sequence on the website, etc.
In terms of a rating system (ie, movie ratings):
It becomes more difficult when you throw in ratings. Rather than a discrete basket of items one has purchased, you have a customer history of item ratings.
At that point you're looking at data mining, and the complexity is tremendous.
A simple algorithm, though, isn't far from the above, but it takes a different form. Take the customer's highest rated items, and the lowest rated items, and find other customers with similar highest rated and lowest rated lists. You want to match them with others that have similar extreme likes and dislikes - if you focus on likes only, then when you suggest something they hate, you'll have given them a bad experience. In suggestions systems you always want to err on the side of "lukewarm" experience rather than "hate" because one bad experience will sour them from using the suggestions.
Suggest items in other's highest lists to the customer.
Consider looking at "What is a Good Recommendation Algorithm?" and its discussion on Hacker News.
There isn't a definitive answer and it's highly unlikely there is a standard algorithm for that.
How you do that heavily depends on the kind of data you want to relate and how it is organized. It depends on how you define "related" in the scope of your application.
Often the simplest thought produces good results. In the case of books, if you have a database with several attributes per book entry (say author, date, genre etc.) you can simply chose to suggest a random set of books from the same author, the same genre, similar titles and others like that.
However, you can always try more complicated stuff. Keeping a record of other users that required this "product" and suggest other "products" those users required in the past (product can be anything from a book, to a song to anything you can imagine). Something that most major sites that have a suggest function do (although they probably take in a lot of information, from product attributes to demographics, to best serve the client).
Or you can even resort to so called AI; neural networks can be constructed that take in all those are attributes of the product and try (based on previous observations) to relate it to others and update themselves.
A mix of any of those cases might work for you.
I would personally recommend thinking about how you want the algorithm to work and how to suggest related "products". Then, you can explore all the options: from simple to complicated and balance your needs.
Recommended products algorithms are huge business now a days. NetFlix for one is offering 100,000 for only minor increases in the accuracy of their algorithm.
As you have deduced by the answers so far, and indeed as you suggest, this is a large and complex topic. I can't give you an answer, at least nothing that hasn't already been said, but I an point you to a couple of excellent books on the topic:
Programming CI:
http://oreilly.com/catalog/9780596529321/
is a fairly gentle introduction with
samples in Python.
CI In Action:
http://www.manning.com/alag looks a
bit more in depth (but I've only just
read the first chapter or 2) and has
examples in Java.
I think doing a Google on Least Mean Square Regression (or something like that) might give you something to chew on.
I think most of the useful advice has already been suggested but I thought I'll just put in how I would go about it, just thinking though, since I haven't done anything like this.
First I Would find where in the application I will sample the data to be used, so If I have a store it will probably in the check out. Then I would save a relation between each item in the checkout cart.
now if a user goes to an items page I can count the number of relations from other items and pick for example the 5 items with the highest number of relation to the selected item.
I know its simple, and there are probably better ways.
But I hope it helps
Market basket analysis is the field of study you're looking for:
Microsoft offers two suitable algorithms with their Analysis server:
Microsoft Association Algorithm Microsoft Decision Trees Algorithm
Check out this msdn article for suggestions on how best to use Analysis Services to solve this problem.
link text
there is a recommendation platform created by amazon called Certona, you may find this useful, it is used by companies such as B&Q and Screwfix find more information at www.certona.com/