I've always wondered about how and what the best way to go about implementing the 'Genius' feature on iTunes.
I could probably brute force it, but was just wondering if anyone had any insight.
Thanks.
The Genius algorithm is an example of a recommendation system, which is a hot topic in E-commerce systems. So much so that Netflix had a $1 million prize that went on for several years to improve their recommendation system by a mere 10%.
On iTunes you have a collection of music. Genius can make assumptions that if you have this music that you must like it. If enough people have song B that have song A then Genius can say that if you have song A you'll probably like song B.
Just having the song would be a fairly weak recommendation. Better would be if the user had rated that music so you can improve the strength of the "recommendation" on that basis.
I'd highly recommend reading If You Liked This, You’re Sure to Love That as a good primer on recommendation systems.
Step1- collect the data, for all the clicks/play per user. That would be lots of data.
Step2- make a ranking/recommendation list generation system. For every song, generate a ranking/priority type list with all the products/songs people are viewing/playing. A simple example say no of people share the same combination or the amount of play time each song is played for.
Step3- keep a limit (say top10) to show your recommendations from the above made list for a song.
This was not so difficult, the trick or the genius lies in adding weights to the list you make in step2. How your recommendation system works with weights (for ex page rank).
I might have disappointed data mining engineers by giving such a naive/simple explanation to extremely complex computer science field. Do pardon me. :)
Have a look at this, term frequency–inverse document frequency, it's a method that ranks according to what you like, the more "unique" the more effect a liked song has on the recommendations.
Basically, if you only like and play U2, it will be hard for the algorithm/program to recommend something special, which is to your liking.
On the other hand if you are more varied in your iTunes usage, those lesser known bands that you really like will be weighted more, since they isolate you more from the masses.
Important point: you have to have data from lots of users. You couldn't do this yourself by brute force (unless you mean creating it entirely by hand).
Related
Simple item-to-item recommendation systems are well-known and frequently implemented. An example is the Slope One algorithm. This is fine if the user hasn't rated many items yet, but once they have, I want to offer more finely-grained recommendations. Let's take a music recommendation system as an example, since they are quite popular. If a user is viewing a piece by Mozart, a suggestion for another Mozart piece or Beethoven might be given. But if the user has made many ratings on classical music, we might be able to make a correlation between the items and see that the user dislikes vocals or certain instruments. I'm assuming this would be a two-part process, first part is to find correlations between each users' ratings, the second would be to build the recommendation matrix from these extra data. So the question is, are they any open-source implementations or papers that can be used for each of these steps?
Taste may have something useful. It's moved to the Mahout project:
http://taste.sourceforge.net/
In general, the idea is that given a user's past preferences, you want to predict what they'll select next and recommend it. You build a machine-learning model in which the inputs are what a user has picked in the past and the attributes of each pick. The output is the item(s) they'll pick. You create training data by holding back some of their choices, and using their history to predict the data you held back.
Lots of different machine learning models you can use. Decision trees are common.
One answer is that any recommender system ought to have some of the properties you describe. Initially, recommendations aren't so good and are all over the place. As it learns tastes, the recommendations will come from the area the user likes.
But, the collaborative filtering process you describe is fundamentally not trying to solve the problem you are trying to solve. It is based on user ratings, and two songs aren't rated similarly because they are similar songs -- they're rated similarly just because similar people like them.
What you really need is to define your notion of song-song similarity. Is it based on how the song sounds? the composer? Because it sounds like the notion is not based on ratings, actually. That is 80% of the problem you are trying to solve.
I think the question you are really answering is, what items are most similar to a given item? Given your item similarity, that's an easier problem than recommendation.
Mahout can help with all of these things, except song-song similarity based on its audio -- or at least provide a start and framework for your solution.
There are two techniques that I can think of:
Train a feed-forward artificial neural net using Backpropagation or one of it's successors (e.g. Resilient Propagation).
Use version space learning. This starts with the most general and the most specific hypotheses about what the user likes and narrows them down when new examples are integrated. You can use a hierarchy of terms to describe concepts.
Common characteristics of these methods are:
You need a different function for
each user. This pretty much rules
out efficient database queries when
searching for recommendations.
The function can be updated on the fly
when the user votes for an item.
The dimensions along which you classify
the input data (e.g. has vocals, beats
per minute, musical scales,
whatever) are very critical to the
quality of the classification.
Please note that these suggestions come from university courses in knowledge based systems and artificial neural nets, not from practical experience.
Recently, I found several web site have something like : "Recommended for You", for example youtube, or facebook, the web site can study my using behavior, and recommend some content for me... ...I would like to know how they analysis this information? Is there any Algorithm to do so? Thank you.
Amazon and Netflix (among others) use a technique called Collaborative filtering to suggest things you might like based on the likes/dislikes of others who have made purchases and selections similar to yours.
Is there any Algorithm to do so?
Yes
Yes. One fairly common one is to look at things you've selected in the past, find other people who've made those selections, then find the other selections most common among those other people, and guess that you're likely to be interested in those as well.
Yup there are lots of algorithms. Things such as k-nearest neighbor: http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm.
Here is a pretty good book on the subject that covers making these sorts of systems along with others: http://www.amazon.com/gp/product/0596529325?ie=UTF8&tag=ianburriscom-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=0596529325.
It's generally done by matching you with other users who have similar usage history / profile and then recommending other things that they've purhased/watched/whatever.
Searching for "recommendation algorithm" yields lots of papers. Most algorithms incorporate "machine learning" algorithms to determine groups of things (comedy movies, books on gardening, orchestral music, etc.). Your matching with those groups yields recommendations. Some companies use humans to classify things, too.
Such an algorithm is going to vary wildly from company to company. In many cases, it analyzes some combination of your search history, purchase history, physical location, and other factors. It probably will also compare purchases/searches amongst other people to find what those people have purchased/searched for, and recommend some of those products to you.
There are probably hundreds of these algorithms out there, but I doubt you can use any of them (that are actually good). Probably you are better off figuring it out yourself.
If you can categorize your contents (i.e. by tagging or content analysis), you can also categorize your users and their preferences.
For example: you have a video portal with 5 million videos .. 1 mio of them are tagged mostly red. If 80% of all videos watched by a user (who is defined by an IP, a persistent user account, ...) are tagged mostly red, you might want to recommend even more red videos to him. You might want to refine your recommendations by looking at his further actions: does he like your recommendations -- if so, why not give him even more, if not, try the second-best guess, maybe he's not looking for color, but for the background music ...
There's no absolute algorithm to do it, but all implementations will go into a similar direction. It's always basing on observing users, which scares me from time to time :-)
There's whole lot of algorithms tackling the issue: Wiki article. It's a Machine Learning domain problem. Computer's can be learned using two main techniques: classification and clustering. They require some datasets as input. If the dataset is informative (really holds some useful patterns) than those ML techniques can dig most of it.
Clustering could be best to use for this kind of problem. It's main usage is to find similarities among points in provided dataset. If the points are, e.g. your search history, they can be grouped together to form certain clusters. If Your search history closely relates to another, a hint can be given - picking links that are most similar to Your's.
The same comes with book recommendations - it's obvious what dataset they use: "Other people who bought this product also bought Product A, Product B,...". The key here is to match your profile to other's and use the most similar to recommend.
The computer retrieves information from the human brain with complex memory scan process, sorts it accordingly and outputs results based on what you have experienced in your life so far.
Occasionally I read news articles that mention about some computer models that computer scientists use to predict winners of some sporting events or the odds for betting which I think there must be a mathematical model behind it. I never bothered to think twice even though I am a "pseudo computer scientist" myself. With the 2010 FIFA World Cup just underway, and since I am also a "pseudo football/soccer player" myself, I just started to wonder about these calculations algorithms.
For example, I know one factor is determining the strength of opponents, so that a win against a strong opponent can count more than a win against a weak opponent. But it now kind of gets in a circular loop, or at least how does one determine the strength of a team in the first place, before that team can be considered strong or weak? If it's based on a historical data then there's no way that could be accurate, because those players of the past are no longer on the fields so their impact is none (except maybe if they become coaches like Maradona)
Anyway, long question short, if you're happen to be working in this field or have some knowledge, please shed some lights.
I know of some work, but its basis might surprise you a bit -- it's been used to predict (quite accurately) what countries do how well in the Olympics, but it's based purely on the economics of the countries in question, not looking at the individual athletes at all. I don't believe it's been used specifically to look at FIFA world cup, but I suspect it would apply about as well, or maybe even a bit better.
Some of the large Investment Banks started a competition for thier quants to write models to predict the wold cup winner.
http://kaggle.com/worldcup2010
More info on the models
http://kaggle.com/blog/2010/06/03/predicting-the-2010-fifa-world-cup-can-statisticians-outdo-the-investment-banks/
There's been some modeling to select horse racing winners using logit models here. The general principle can be applied to predicting which teams advance to the Round of 16 and subsequent rounds. Horse racing is at least, if not more, complex with regard to the number of variables that have a statistically significant effect on the outcome. For instance, in the author's model, weight, win rate, jockey characteristics, speed, post position, distance, and winnings were all significant variables. The authors didn't have access to "trip handicapping" at the time which has proven to be another important effect.
Reading this paper might help generate some thoughts around handicapping FIFA.
To tanascius point, developing a predictive model is the first step. As the authors further explain, developing a betting strategy based on the results is a different problem that's based, in part, on the accuracy of the model.
One guy has been using googles page-rank for sports. Not sure why he felt the need to rename it:
http://www.physorg.com/news180094320.html
I found that like by a quick search for using page-rank for sports because I realized it solves the circular references in rankings. Was curious if anyone had tried it, and there it was.
BTW, anyone who can make accurate predictions for things you can bet on is not publishing their methods or results. They should be making money instead.
I think there's too much to take into consideration:
Injured players, form players, form teams, pressure on teams, rivalries, weather, home advantage, past meetings, formations, team styles, expectations....
Dunno if this applies to the FIFA soccer video games, but I know that for the Super Bowl (american football, for those who dont know) they use the latest version of Madden to predict the winner.
Not very scientific, I suppose, but its there.
My friend works for a non-profit organization working to stop the illegal exploitation of minors over sites such as craigslist.org, which is one of the more popular mediums. The question is whether or not it is possible, now or in the near future, to develop an algorithm to analyze a photo of a person and return a prediction of their relative age.
It sounds like a mammoth task. My only thought was some sort of Bayesian probability system. I know even people often have trouble judging someone's age but Bayesian spam filters are advertised as being "10 times as accurate as a human" so maybe it's possible?
I am pretty inexperienced though. I would appreciate it if someone else could suggest whether or not this is feasible and if so how and when?
EDIT: Thank you everyone for the responses. Smoore that study was very helpful but I think Hal's solution is the most practical for the time being.
Here's a possible (left-field) solution. Perhaps, you could tie it into some type of a captcha solution for the site itself. Prompt new users with images of other new users with the question: "Is this person over 18?". It's true that a 50% success rate is not a very effective captcha system, but it's a start.
Coupled with some other checks or repetitive checks and it could work. You could display the image to a number of new users, and base the result on a certain threshold. If, 8 out of 10 people flagged a certain image as not a minor, than it's probably pretty safe they are of age.
But, this whole system can be circumvented by simply uploading someone else's image so I'm not sure how effective any of this really is. :)
I expect it would be pretty hard to get right. Consider this set of photos where the same model is made up to look very different ages.
There are algorithm to reliably determine the attractiveness of a face. See acm.org and uni-regensburg.de. It wouldn't be too much of a stretch to imagine an algorithm which could predict age.
Characteristics such as smoothness would probably have a strong correlation with age. It would probably take a great deal of effort to be more reliable than your average carney though.
I think you would need some input from a forensic anthropoligist ( or at least an anatomist).
Differnet parts of the body grow at different rates so it might be possible to do something like size of head vs. shoulder width, arm length vs. body width.
Unfortunately it sounds like he is trying to differentiate between say a 14 year olds and 18 year olds. Which is only a four year difference, variations in genetic makeup and nutitrition would probaly give any system an accuracy of +/- 20% which would equate to three years for this age group.
On the other hand if you had a large sample of photos then you could account for the variance statisticaly and get a pretty good idea whether a site was likely to be exploiting minors systematicaly.
The direct answer to your question is that no, no such algorithm will exist in the near future, and is probably impossible to achieve with any accuracy without strong AI.
That said, a practical solution to your problem is probably the amazon mechanical turk:
http://mturk.com
There, you can pay a small fee to have real people complete a task for you. I'd probably set your task up so that you paid $0.02 to have a person estimate the age of maybe 5 faces at a time. You could double or triple check your results with other workers, particularly for those faces who seemed close to your age limit. This is probably your only practical solution other than hiring minimum wage interns to manually review all submissions.
Use mechanical turk
In this study they tried it by analysing facial geometry and wrinkle features. Problem is this would be affected by shot angle, lighting, etc.
In some theoretical sense it is probably possible. For all practical purposes though, it is currently impossible.
Mammoth is an understatement I think. "Giant glacier" or "moon" might be more appropriate.
This isn't to say it wouldn't be worth looking into but I have a feeling you'd be in for a lot of man hours before you came up with something remotely useful.
I don't think it's something that a computer could do with any degree of accuracy. It's even really hard for people to do. I mean, have you been the the liquor store lately, they are supposed to ask for ID from anybody who looks under 25 (drinking age is 19 here). Apparently some 40 year olds don't look old enough. Telling somebody's age just by looking at them is a very hard thing to do. Especially when you get into to erotic picture arena, where they are trying to make models seem younger than they really are.
I think you will also have difficulties with different composited pictures. For instance angles on a face, different lighting, as well as context and probably most of all... image quality/resolution. It's a lot easier to work with a 800x600 pic then it is to work with a 320x240. The algorithm is only as good as the subject.
I cannot see this approach (a software solution to measuring age) being very effective. I like the idea of users flagging images - a human being can discern age many times more effectively then any algorithm.
Practical approach aside, I'd advice against trying to develop anything in that direction for now.
Few reasons:
1. guessing someone's age is not a grateful task
2. "biological" age and "calendar" age of people vary greatly - I know people who are 30 and are still asked for an ID when buying liquor, and some who are barely 18 and already look over 30
3. some people's looks don't change over time - they just have that kind of looks
4. nowadays, everyone's working to look as young as they can - so basically, you've got the whole industry working against you :(
Anyways, to cut long story short, I don't think it's feasible for now.
A neural net is a reasonable approach, you would need a training set of pictures of people with known ages and a bit of image processing to remove hats etc.
edit: Question changed?
You might be ale to classify someone as 20-30 or 40-50 on a CCTV but you aren't going to be ale to tell if a model is 17 or 18 in a posed photo.
Just like nearly all advanced tasks in image classification this topic is still in research. Judging from this paper it is possible to do it but non-trivial, also you have to have a lot of (manually) annotated training data. Without any knowledge of this field and no experience in image processing this task is going to take you several months.
Develop a classification algorithm that bases a heuristic on many values of the pictures, amount of pixels that are dark within the face area (possibly wrinkles), and the color of the hair. These values should fall within a general area of any profile-esque picture, if you want to be fancy, carry weights with these values and develop a type of game tree that would be able to search hundreds of thousands of images quickly, finding where this image "falls" in the tree within an age-specific set of values.
Some Japanese cigarette vending machines do this. Not terribly well by all accounts, but then it probably doesn't matter since, as Hal mentioned, the easiest hack is just to use someone else's image...
Impossible is nothing, Only amount of efforts changes :
I think it would be near impossible if you target one particular feature of face.
you have to consider multiple factor, So decision will be lying in a matrix and you have to feed multiple things and you will get your answer i would enlist some feature :
1) Beard (Detect face , Now detect beard on face , Help full in distinguish male/female
/childern )
2) Hair
3) Wrinkles
4) Size of face
5) Ration between height and breadth of face
It would be a tough assignment but algorithm can be developed.
As of now, this is possible with 90% accuracy. Yes. please refer the following link..
http://www.omron.com/r_d/coretech/vision/okao.html
I've read the book Programming Collective Intelligence and found it fascinating. I'd recently heard about a challenge amazon had posted to the world to come up with a better recommendation engine for their system.
The winner apparently produced the best algorithm by limiting the amount of information that was being fed to it.
As a first rule of thumb I guess... "More information is not necessarily better when it comes to fuzzy algorithms."
I know's it's subjective, but ultimately it's a measurable thing (clicks in response to recommendations).
Since most of us are dealing with the web these days and search can be considered a form of recommendation... I suspect I'm not the only one who'd appreciate other peoples ideas on this.
In a nutshell, "What is the best way to build a recommendation ?"
You don't want to use "overall popularity" unless you have no information about the user. Instead, you want to align this user with similar users and weight accordingly.
This is exactly what Bayesian Inference does. In English, it means adjusting the overall probability you'll like something (the average rating) with ratings from other people who generally vote your way as well.
Another piece of advice, but this time ad hoc: I find that there are people where if they like something I will almost assuredly not like it. I don't know if this effect is real or imagined, but it might be fun to build in a kind of "negative effect" instead of just clumping people by similarity.
Finally there's a company specializing in exactly this called SenseArray. The owner (Ian Clarke of freenet fame) is very approachable. You can use my name if you call him up.
There is an entire research area in computer science devoted to this subject. I'd suggest reading some articles.
Agree with #Ricardo. This question is too broad, like asking "What's the best way to optimize a system?"
One common feature to nearly all existing recommendation engines is that making the final recommendation boils down to multiplying some number of matrices and vectors. For example multiply a matrix containing proximity weights between users by a vector of item ratings.
(Of course you have to be ready for most of your vectors to be super sparse!)
My answer is surely too late for #Allain but for other users finding this question through search -- send me a PM and ask a more specific question and I will be sure to respond.
(I design recommendation engines professionally.)
#Lao Tzu, I agree with you.
According to me, recommendation engines are made up of:
Context Input fed from context aware systems (logging all your data)
Logical reasoning to filter the most obvious
Expert systems that improve your subjective data over the period of time based on context inputs, and
Probabilistic reasoning to do decision-making close-to-proximity based on weighted sum of previous actions(beliefs, desires, & intentions).
P.S.
I made such recommendation engine.