Trying to work out an algorithm for grouping my users into distinct profiles based on their activity - e.g. "regular users", "occasional users", "regular poster", "lurker" for someone who doesn't post but does stuff.
For regular user, I was thinking that the algorithm would have to include the total users, how much an average user visits a site, completing actions such as "like" or "favorite" or "view" or "clicking link".
I'm not very good with algorithms so looking for some help.
Depends on the applications traffic you can try to count every action performed by your users and divide by the users count making it average. And then based on that information apply the ranks e.g.:
1. Regular user with 15% more or less than the average.
2. Active user with >= 16% than the average.
and so on..
The important thing is to do it with the percentage as you want it to become based on YOUR traffic. You can also set some static requirements like 3 clicks per day.
Your best bet is to first really consider how you would define the terminology. Once you've defined the termininology, you should then be able to rank people based on those terms. For example, a user might be "someone who accesses the site", thus a regular user is "someone who accesses the site regularly". The same idea applies to a poster. A lurker doesn't really fit as it seems pretty much like "a user, who does not post".
Thus it might be a good idea to define what actions are related to what terminology. A lurker might then be "someone who accesses the site but rarely does much in terms of actions" while a user is "someone who accesses the site and uses some of it's features, but has does not posted anything".
You then need to determine a metric for defining what regular, occasional, active and other such terms mean. ailvenge gives two good ways this can be done.
Related
I want to give users the ability to view some personalized users they might find interesting and might follow them...
I was thinking of it like that:
- Get all users he is currently following
- Get all followers that they follow
- rank them by total posts they made (DESC), filled up personal information fields
- show 5 of them on each page load
in case user has followers then an information message will appear...
Can this kind of feature be done with this algorithm or is there a better or even easier way to do it?
In your algorithm, I'm wondering why you need to sort users based on number of posts, maybe it has something to do with reputation?
Recommendation is indeed a very large, open topic, and is also a hot academic research fields. If we are working on a practical project, I think it will be nice to to stay simple and focused.
I witnessed the following two kinds of recommendations on a very popular
social website. From my experience, the recommendation output is of high quality. Here I'm brainstorming the algorithms behind. Hope it helps.
Discover persons you might know: Recommend person whose 'following set' intersects with your 'following set'. It is based on the "clustering effect" of social network: The friend of your friend is more likely to be your friend.
Recommend person based on interests: If the users could be celebrities, companies, institutions, press media, etc., then recommendations like the following might be useful: "People following #Linus also follow #Stallman, #LinuxDeveloper, ...". Suppose you've just followed #Linus, to recommend #Stallman, #LinuxDeveloper, first we need to find out all users following #Linus, then figure out their common following list, possibly ranked by number of followers. The idea is to recommend users based on interest correlations. We calculate and discover high correlation users, assuming that users' following list are grouped by their interests.
(I'm also thinking, algorithm 1 will discover persons that share common interests with you, if users could be celebrities, etc.. This might be preferred for some scenarios.)
You're asking a very open-ended question here - how to pick a small number of recommendations out of a large set. So the answer is - you can make it as simple or as complicated as you want it to be! The simplest would be to pick a few at random (and any more complex algorithm had better prove that it produces better results than that.) Your solution of gathering all users who are two hops away, and then ranking by number of posts, is just a bit more complex, and then at the other extreme are the sophisticated algorithms used by the Amazons and Googles of the world. Companies put a lot of effort into building this sort of thing - have you heard of the Netflix Prize?
as I understand you want to follow the user that could offer high quality information about your Thema .we need an Algorithm to give this user as result to us ,but how can I find these users:
The users that have many Followers are a good choice but not always many of users in Twitter follow another users only as respect or ethiquet.
The users that his/her twitts retwitt many times with other user is a good choice
and the user that they are mentioned many times by other users.
I think ,to find theses users we should use Link based Analyse such as HITS or Page rank algorithim
You may want to consider not including people that are following the given user. I imagine might not be so interested in you, and this could potentially be problematic. However, you maybe very interested in finding more about the people that is following.
Are you considering showing the user the reason why these people were recommended to them? For example, saying like you may be interested in what little billy is saying because of his connection to your wife. If so, to potentially avoid angered users, it may be worth allowing them to in a sense opt-out.
It seems like other than that, it seems like it is a pretty good way of recommending users that someone would be interested in. The only other things that I can think of that might also help find people with similar interests, is if you allow users to tag posts. Allowing you to find users by similar interests, or by what they are posting about.
One other more problematic thing that you could look into is finding users by similar interest. for example, if person a is following person c, and person b is following person c, then maybe recommend person a to person b. though this seems like it could make for some very lengthy queries if you are not careful.
I'd like to implement a feedback mechanism in my application--basically, a score. The requirements are:
A total exists, and can be read
A user can add his score to the total
A user cannot add a second score, but could change his original score, again updating the total by removing (subtracting) the original score, and adding the new one.
It is impossible to determine what a given user's vote was
It seems that this borders on (or even overlaps) cryptography theory, but I haven't been able to find anything that would address this. Does anyone have any specific algorithms that would address this? Or even additional search vectors I could use to pursue it?
If there is an anonymous ID, such as a hash of a value that the user supplies, then anyone who can produce something that yields the same hash could modify the corresponding vote.
In this sense, there is still anonymity, because the hash doesn't reveal the source. Instead of listing (userName, vote), list (hashValue, vote). If there is some concern that tracking the hashValue is traceable across many polls, then encode an additional poll-specific wrapping for the hash, which is not revealed publicly. Or let the user embed (e.g. prepend) that into their string to be hashed, so they are still producing a unique submission.
You can never have anonymous voting without the ability to trust that the anonymous individuals will not vote twice. By definition, true anonymity guarantees that you can never detect duplicate voting.
If you instead force the user to identify themself, you can implement a voting system that prevents duplicate voting and provides anonymity within the context of the vote.
Here is a simple algorithm.
User logs in. The onus is on your system to prevent one user from obtaining multiple user accounts.
User (not anonymous) selects an issue on which to vote.
User (not anonymous) casts a vote.
Your system stores the following:
An indication that the user voted on the selected issue. This prevents duplicate voting.
The value of the users vote on the selected issue (this is the score you mentioned). This value is stored without reference to the user who cast the vote.
The value of the user's score if they voted on an issue. You probably need this to be a calculated value
If the user wants to change their vote, they login, select the issue, then unvote (your system knows they voted because it stored this). At this point they can choose the issue again (their vote indication was cleared) and vote.
Note that your system will need to subtract the value of the user's vote from the tally for the issue when they unvote.
You don't give enough information on what a legal vote is, but if it's, say, an integer, then you can just keep a sum and allow multiple votes. This works because changing a vote from A to B has the exact same effect as voting A and then voting (B - A).
Actually, online voting is pretty tricky.
If you want the most extreme approach to voting safety, you may need to consider something like this:
https://docs.google.com/document/d/1SPYFAkVNjqDP4HOt_A_YGFZy-SFXVxHoN1hpLGNFKXI/pub
It is an algorithm that distributes the voting secret among n distinct servers that each cannot break the voting anonymity by themselves. All n servers would have to cooperate in order to break anonymity, and if only one of the server cover its tracks ans wipes all cryptographic data away, the voting secret is lost/hidden forever.
The system can also deal with re-sending of votes, with some limitations inherent to any secure system for online voting:
For voting security online there is always an ultimate limitation in that it is vulnerable to traffic analysis. For example, if one day only one person votes, it can be concluded that any update of the voting result is a result of that persons voting.
A perfect secure online voting system should be viewed as a one-time vote-mixer. It takes a number of votes. Buffers them, and when the voting is finally closed, it mixes all of them in one go. This makes it extremely difficult to associate a vote with a voter. This can be achieved with pretty solid technology.
However, when we want to update votes things get much more tricky. There would be an intrinsic need for synchronization if we want to avoid the possibility of traffic analysis. Ideally all voters would have to re-send an update at regular intervals (even if their update is actually not an update).
I am implementing a web application that has many users and I would give the users rating based on their activities and based on other users liking their activities. How would I implement such an algorithm for that? I am looking for elegant and smart algorithm that could help.
You are basically looking for Scoring Algos. These articles might help -
How not to sort by average rating
Rank hotness with Newtons law of Cooling
How Reddit Ranking Algorithms work
Hope this helps.
Maybe your answer is staring right at you next to your username on this site :-) Stackoverflow.com's scoring system and badges are here to promote certain behaviors on the site. The algorithm is simple and the feedback is immediate so that everybody can see the consequences of certain actions.
What are the ratings used for? If you want to use the ratings as incentives for you users to encourage a specific behavior, then I believe you need to look at disciplines like behavioral psychology to figure out what behaviors you want to measure and reward.
If you already have a user base that reflects the typical user base you're trying to address, you might want to try with simple trial and error. Pick some actions, like e.g. receiving a like on a post and add points to the user's score whenever that happens. Watch the user community's reaction when you introduce the scoring system and see it it helps motivate the behavior you want. If not, try to change some other parameters and repeat.
Depending on your system, some users might try to game the system, so you could find yourself locked into an eternal cat and mouse game once you introduce a rating system (example: Google page ranking).
What's a good algorithm for suggesting things that someone might like based on their previous choices? (e.g. as popularised by Amazon to suggest books, and used in services like iRate Radio or YAPE where you get suggestions by rating items)
Simple and straightforward (order cart):
Keep a list of transactions in terms of what items were ordered together. For instance when someone buys a camcorder on Amazon, they also buy media for recording at the same time.
When deciding what is "suggested" on a given product page, look at all the orders where that product was ordered, count all the other items purchased at the same time, and then display the top 5 items that were most frequently purchased at the same time.
You can expand it from there based not only on orders, but what people searched for in sequence on the website, etc.
In terms of a rating system (ie, movie ratings):
It becomes more difficult when you throw in ratings. Rather than a discrete basket of items one has purchased, you have a customer history of item ratings.
At that point you're looking at data mining, and the complexity is tremendous.
A simple algorithm, though, isn't far from the above, but it takes a different form. Take the customer's highest rated items, and the lowest rated items, and find other customers with similar highest rated and lowest rated lists. You want to match them with others that have similar extreme likes and dislikes - if you focus on likes only, then when you suggest something they hate, you'll have given them a bad experience. In suggestions systems you always want to err on the side of "lukewarm" experience rather than "hate" because one bad experience will sour them from using the suggestions.
Suggest items in other's highest lists to the customer.
Consider looking at "What is a Good Recommendation Algorithm?" and its discussion on Hacker News.
There isn't a definitive answer and it's highly unlikely there is a standard algorithm for that.
How you do that heavily depends on the kind of data you want to relate and how it is organized. It depends on how you define "related" in the scope of your application.
Often the simplest thought produces good results. In the case of books, if you have a database with several attributes per book entry (say author, date, genre etc.) you can simply chose to suggest a random set of books from the same author, the same genre, similar titles and others like that.
However, you can always try more complicated stuff. Keeping a record of other users that required this "product" and suggest other "products" those users required in the past (product can be anything from a book, to a song to anything you can imagine). Something that most major sites that have a suggest function do (although they probably take in a lot of information, from product attributes to demographics, to best serve the client).
Or you can even resort to so called AI; neural networks can be constructed that take in all those are attributes of the product and try (based on previous observations) to relate it to others and update themselves.
A mix of any of those cases might work for you.
I would personally recommend thinking about how you want the algorithm to work and how to suggest related "products". Then, you can explore all the options: from simple to complicated and balance your needs.
Recommended products algorithms are huge business now a days. NetFlix for one is offering 100,000 for only minor increases in the accuracy of their algorithm.
As you have deduced by the answers so far, and indeed as you suggest, this is a large and complex topic. I can't give you an answer, at least nothing that hasn't already been said, but I an point you to a couple of excellent books on the topic:
Programming CI:
http://oreilly.com/catalog/9780596529321/
is a fairly gentle introduction with
samples in Python.
CI In Action:
http://www.manning.com/alag looks a
bit more in depth (but I've only just
read the first chapter or 2) and has
examples in Java.
I think doing a Google on Least Mean Square Regression (or something like that) might give you something to chew on.
I think most of the useful advice has already been suggested but I thought I'll just put in how I would go about it, just thinking though, since I haven't done anything like this.
First I Would find where in the application I will sample the data to be used, so If I have a store it will probably in the check out. Then I would save a relation between each item in the checkout cart.
now if a user goes to an items page I can count the number of relations from other items and pick for example the 5 items with the highest number of relation to the selected item.
I know its simple, and there are probably better ways.
But I hope it helps
Market basket analysis is the field of study you're looking for:
Microsoft offers two suitable algorithms with their Analysis server:
Microsoft Association Algorithm Microsoft Decision Trees Algorithm
Check out this msdn article for suggestions on how best to use Analysis Services to solve this problem.
link text
there is a recommendation platform created by amazon called Certona, you may find this useful, it is used by companies such as B&Q and Screwfix find more information at www.certona.com/
If you were running a news site that created a list of 10 top news stories, and you wanted to make tweaks to your algorithm and see if people liked the new top story mix better, how would you approach this?
Simple Click logging in the DB associated with the post entry?
A/B testing where you would show one version of the algorithm togroup A and another to group B and measure the clicks?
What sort of characteristics would you base your decision on as to whether the changes were better?
A/B test seems a good start, and randomize the participants. You'll have to remember them so they never see both.
You could treat it like a behavioral psychology experiment, do a T-Test etc...
In addition to monitoring number of clicks, it might also be helpful to monitor how long they look at the story they clicked on. It's more complicated data, but provides another level of information. You would then not only be seeing if the stories you picked out grab the user's attentions, but also that the stories are able to keep it.
You could do statistical analysis (i.e. T-test like Tim suggested), but you probably won't get low enough of a standard deviation on either measure to prove significance. Although, it won't really matter: all you need is for one of the algorithms to have a higher average number of clicks and/or time spent. No need to fool around with hypothesis testing, hopefully.
Of course, there is always the option of simply asking the user if the recommendations were relevant, but that may not be feasible for your situation.