Elo rating system with multiple skills - algorithm

Does anybody have a good idea as to how I could score multiple skills and then combine them into a 'top-level' Elo rating?
Use case:
I want to rate a student. I give the student 100 English exercises of different types, i.e. 25 grammar exercises, 25 vocabulary exercises, 25 listening exercises, and 25 comprehension exercises.
Then I would like to be able to calculate four individual Elo ratings, as well as one combined rating that shows how good the student generally is at English.
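There's no canonical way to do this, but one hedged sketch: treat each exercise as an Elo "match" between the student and the exercise (each exercise carrying its own difficulty rating), update the matching skill rating after every exercise, and take the combined rating as the average of the four skill ratings. The K-factor, starting ratings, and the averaging step are all illustrative assumptions, not an established method:

```python
# Sketch: per-skill Elo updates plus a combined average rating.
# Assumptions (not from the question): each exercise has its own Elo-style
# difficulty, a correct answer counts as a "win" for the student, and the
# overall rating is the mean of the skill ratings.

K = 32  # illustrative K-factor

def expected_score(rating_a, rating_b):
    """Standard Elo expectation of player A beating player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update(rating, difficulty, solved):
    """Update a single skill rating after one exercise."""
    actual = 1.0 if solved else 0.0
    return rating + K * (actual - expected_score(rating, difficulty))

skills = {"grammar": 1200.0, "vocabulary": 1200.0,
          "listening": 1200.0, "comprehension": 1200.0}

# results: (skill, exercise difficulty, solved?)
results = [("grammar", 1300.0, True), ("listening", 1100.0, False)]

for skill, difficulty, solved in results:
    skills[skill] = update(skills[skill], difficulty, solved)

overall = sum(skills.values()) / len(skills)  # combined "top-level" rating
print(skills, overall)
```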

Related

Ranked Feed Algorithm

I'm building a sports newsfeed for an app, and I'd like it to be sorted by popularity as well as chronologically. I've implemented the sorting using the open-source Reddit algorithm (my app has likes for each post in the newsfeed). So far I've tested it and it seems to be working well, but there's one main problem I've encountered: news about popular sports always shows up above news from other sports.

Example: my app has 100,000 basketball fans and 1,000 soccer fans. A big piece of soccer news comes out. It'll still have fewer likes than the regular daily basketball news. How can I resolve this issue? One possible solution I considered is feeding the Reddit algorithm the percentage of all fans that liked a certain post.
I suggest that you normalize the percentage across your fan base. "Popularity" should measure not only the percentage of up-votes, but the relative percentage within the fan base.
For each article, count the up-votes. Next, convert this to a z-score: how many standard deviations above or below the mean this article was rated, within the fan base for that sport. Use this in place of the raw vote count.
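A minimal sketch of that normalization, assuming you have per-sport like counts available (the data layout here is an illustrative assumption):

```python
# Sketch: score each article by its z-score within its own sport's fan base,
# so a soccer article is compared against other soccer articles rather than
# against basketball's much larger vote counts.
from statistics import mean, stdev

# Illustrative data: sport -> list of (article_id, like_count)
likes_by_sport = {
    "basketball": [("b1", 5000), ("b2", 4200), ("b3", 6100)],
    "soccer": [("s1", 300), ("s2", 40), ("s3", 55)],
}

scores = {}
for sport, articles in likes_by_sport.items():
    counts = [n for _, n in articles]
    mu, sigma = mean(counts), stdev(counts)
    for article_id, n in articles:
        # z-score: standard deviations above/below that sport's mean
        scores[article_id] = (n - mu) / sigma if sigma > 0 else 0.0

# A breakout soccer article can now outrank routine basketball news.
print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```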

Fit a bag of routes to a bag of rooms

I came up with the idea of practicing algorithmics when I was given an interesting task.
Namely, my university organizes "open days" and wants to guide visiting students along a few different routes (for example, 10 routes), so that all the rooms hosting exhibitions are assigned to the routes effectively. Importantly, the rooms have limited capacity and the exhibitions take different amounts of time.
Do you have any idea for an efficient algorithm? Or is a greedy algorithm the only option here?
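The problem statement is underspecified, but one way to read it is: distribute rooms across routes so that each route's total exhibition time is balanced. Under that reading, a greedy longest-processing-time (LPT) heuristic is a reasonable first attempt. This is only a sketch of that interpretation, with invented data:

```python
# Sketch: assign rooms to routes so total exhibition time is balanced.
# Interpretation assumed here: minimize the longest route (makespan),
# which the classic LPT greedy heuristic approximates well.
import heapq

def assign_rooms(room_durations, num_routes):
    """Greedy LPT: always give the next-longest room to the lightest route."""
    # Min-heap of (total_time_so_far, route_index)
    routes = [(0, i) for i in range(num_routes)]
    heapq.heapify(routes)
    assignment = {i: [] for i in range(num_routes)}
    for room, duration in sorted(room_durations.items(),
                                 key=lambda kv: kv[1], reverse=True):
        total, idx = heapq.heappop(routes)
        assignment[idx].append(room)
        heapq.heappush(routes, (total + duration, idx))
    return assignment

# Illustrative data: room -> exhibition duration in minutes
rooms = {"A101": 30, "A102": 15, "B201": 45, "B202": 20, "C301": 10}
print(assign_rooms(rooms, num_routes=2))
```

Room capacity limits would need an extra constraint on top of this (e.g., cap the group size per route), which the sketch doesn't attempt.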

Clustering + regression: the right approach or not?

I have the task of predicting how quickly goods will sell (for example, within one category). E.g., the client inputs the price at which they want their item to be sold, and the algorithm should display that, at that price, the item will sell within n days. It should also have three intervals: quick, medium, and long sell (as in the picture).
The question: how exactly should I build the algorithm?
My suggestion: use clustering techniques to identify these three price ranges, and then solve a regression task within each cluster to predict the number of days. Is this the right approach?
There are two questions here, and I think the answer to each lies in a different domain:
1. Given an input price, predict how long it will take to sell the item. This is a well-defined prediction problem and can be tackled using ML algorithms, e.g. use your entire dataset to train and test a regression model for prediction.
2. Translate the prediction into a class: quick, medium, or slow sell. This problem is product-oriented: there doesn't seem to be any concrete data allowing you to train a classifier on this translation, and I agree with @Anony-Mousse that unsupervised learning might not yield easy-to-use results.
You can either consult your users or a product manager on reasonable thresholds to use (there might be considerations here like the type of item, the season, etc.), or try getting some additional data in order to train a supervised classifier.
E.g., you could ask your users, post-sale, whether they think the sale was quick, medium, or slow. Then you'll have some data to use for thresholding or for classification; a sketch of the overall flow follows below.
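A minimal sketch of the two-step idea, using scikit-learn for the regression and hand-picked thresholds for the translation step. The feature set, model choice, and cutoffs are all illustrative assumptions:

```python
# Sketch: (1) regress days-to-sell on price, (2) map the prediction
# to a quick/medium/slow label using fixed, explainable thresholds.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Illustrative training data: price -> days the item took to sell
prices = np.array([[10.0], [15.0], [20.0], [25.0], [40.0], [60.0]])
days_to_sell = np.array([2, 4, 8, 12, 25, 50])

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(prices, days_to_sell)

def classify(predicted_days, quick_max=10, medium_max=31):
    """Translate a day estimate into a product-facing label."""
    if predicted_days <= quick_max:
        return "quick"
    if predicted_days <= medium_max:
        return "medium"
    return "slow"

asked_price = np.array([[30.0]])
estimate = model.predict(asked_price)[0]
print(f"~{estimate:.0f} days -> {classify(estimate)} sell")
```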
I suggest you simply define thresholds of 10 days and 31 days. Keep it simple.
These are the values the users will want to understand. If you use clustering, you may end up with thresholds like 0.31415 days or similarly unintuitive values that you cannot explain to the user anyway.

Algorithm for preference based grouping

I am looking to figure out a way to sort people into classes by preference.
For example, say there are 100 students that are each going to be assigned one of five classes:
Science - 40 seats
Math - 15 seats
History - 15 seats
Computers - 20 seats
Writing - 10 seats
Each student has three preferred classes, ordered by preference. What is the best way to divide up the students so that as many people as possible get their first- or second-choice classes, while making sure that no class has more students than its room can hold?
I've thought about approaching it with the following method:
1. Group all students by their first-choice class.
2. See which classes have too many students and which have too few.
3. Check whether any students in the overbooked classes have second-choice classes that are underbooked.
4. Move those students accordingly.
5. Repeat steps 2-4 with third-choice classes.
While I feel this is a reasonable approach, I'm wondering whether there are other algorithms that solve this problem in a better way. I've searched all over, but I can't find anything that addresses this kind of problem.
From your description, this sounds very much like a variation of the Stable Marriage Problem (specifically, the capacitated hospitals/residents variant).
Check the Wikipedia link and you will see a description of the Gale-Shapley algorithm, which is a good solution.
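As a hedged sketch, here is student-proposing deferred acceptance adapted to this setting. Since the classes here have no preferences over students, a class simply keeps proposers until it is full (first-come priority is assumed purely for illustration); this also formalizes the iterative reassignment idea from the question:

```python
# Sketch: students propose to classes in preference order; a class accepts
# up to its capacity and rejects the rest. Rejected students move on to
# their next choice. With no class-side preferences this never "bumps"
# an accepted student, so it terminates quickly.
from collections import deque

def assign(preferences, capacity):
    """preferences: student -> ordered list of class choices."""
    enrolled = {c: [] for c in capacity}
    next_choice = {s: 0 for s in preferences}
    unplaced = deque(preferences)
    leftovers = []
    while unplaced:
        student = unplaced.popleft()
        prefs = preferences[student]
        if next_choice[student] >= len(prefs):
            leftovers.append(student)  # exhausted all three choices
            continue
        cls = prefs[next_choice[student]]
        next_choice[student] += 1
        if len(enrolled[cls]) < capacity[cls]:
            enrolled[cls].append(student)
        else:
            unplaced.append(student)  # rejected; try next preference
    return enrolled, leftovers

capacity = {"Science": 40, "Math": 15, "History": 15,
            "Computers": 20, "Writing": 10}
preferences = {"alice": ["Science", "Math", "Writing"],
               "bob": ["Math", "Science", "History"]}
print(assign(preferences, capacity))
```

Students whose three choices are all full end up in leftovers and need a fallback rule (e.g., assign them to the emptiest remaining class).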

Algorithms to find stuff a user would like based on other users likes

I'm thinking of writing an app to classify movies in an HTPC based on what the family members like.
I don't know statistics or AI, but the stuff here looks very juicy. I wouldn't know where to start, though.
Here's what I want to accomplish:
Compose a set of samples from each user's likes, rating each sample attribute separately. For example, maybe a user likes western movies a lot, so the western genre would carry a bit more weight for that user (and so on for other attributes, like actors, director, etc.).
A user can get suggestions based on the likes of the other users. For example, if both user A and B like Spielberg (connection between the users), and user B loves Batman Begins, but user A loathes Katie Holmes, weigh the movie for user A accordingly (again, each attribute separately, for example, maybe user A doesn't like action movies so much, so bring the rating down a bit, and since Katie Holmes isn't the main star, don't take that into account as much as the other attributes).
Basically, compare sets from user A with similar sets from user B, and come up with a rating for user A.
I have a crude idea about how to implement this, but I'm certain some bright minds have already thought of a far better solution, so... any suggestions?
Actually, after some quick research, it seems a Bayesian filter would work. If so, would this be the best approach? Would it be as simple as just "normalizing" the movie data, training a classifier for each user, and then classifying each movie?
If your suggestion includes some brain-melting concepts (I'm not experienced in these subjects, especially in AI), I'd appreciate it if you also included a list of some basics for me to research before diving into the meaty stuff.
Thanks!
Matthew Podwysocki had some interesting articles on this stuff:
http://codebetter.com/blogs/matthew.podwysocki/archive/2009/03/30/functional-programming-and-collective-intelligence.aspx
http://codebetter.com/blogs/matthew.podwysocki/archive/2009/04/01/functional-programming-and-collective-intelligence-ii.aspx
http://weblogs.asp.net/podwysocki/archive/2009/04/07/functional-programming-and-collective-intelligence-iii.aspx
This is similar to this question, where the OP wanted to build a recommendation system. In a nutshell, we are given a set of training data consisting of users' ratings of movies (a 1-5 star rating, for example) and a set of attributes for each movie (year, genre, actors, ...). We want to build a recommender that outputs a predicted rating for unseen movies. So the input data looks like:
user  movie  year  genre  ...  | rating
---------------------------------------
1     1      2006  action      | 5
3     2      2008  drama       | 3.5
...
and for an unrated movie X:
10    20     2009  drama       | ?
we want to predict a rating. Doing this for all unseen movies, then sorting by predicted rating and outputting the top 10, gives you a recommendation system.
The simplest approach is to use a k-nearest neighbor algorithm. Among the rated movies, search for the "closest" ones to movie X, and combine their ratings to produce a prediction.
This approach has the advantage of being very easy to implement from scratch.
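A minimal sketch of that nearest-neighbor prediction, with a toy distance over (year, genre) features; the feature encoding and the value of k are illustrative assumptions:

```python
# Sketch: predict a rating for movie X as the average rating of the
# k movies "closest" to it in feature space.
def distance(a, b):
    """Toy distance: year gap (scaled) plus a genre-mismatch penalty."""
    year_gap = abs(a["year"] - b["year"]) / 10.0
    genre_penalty = 0.0 if a["genre"] == b["genre"] else 1.0
    return year_gap + genre_penalty

def knn_predict(rated_movies, movie_x, k=3):
    """Average the ratings of the k nearest rated movies."""
    nearest = sorted(rated_movies, key=lambda m: distance(m, movie_x))[:k]
    return sum(m["rating"] for m in nearest) / len(nearest)

rated = [
    {"year": 2006, "genre": "action", "rating": 5.0},
    {"year": 2008, "genre": "drama", "rating": 3.5},
    {"year": 2007, "genre": "drama", "rating": 4.0},
]
x = {"year": 2009, "genre": "drama"}
print(knn_predict(rated, x, k=2))  # mean of the two closest dramas
```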
Other, more sophisticated approaches exist. For example, you can build a decision tree and fit a set of rules to the training data. You can also use Bayesian networks, artificial neural networks, or support vector machines, among many others... Going through each of these won't be easy for someone without the proper background.
Still, I expect you would be using an external tool/library. Now, you seem to be familiar with Bayesian networks, so a simple naive Bayes net could in fact be very powerful. One advantage is that it allows for prediction under missing data.
The main idea would be much the same: take the input data you have, train a model, then use it to predict the class of new instances.
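As a hedged illustration of that same train-then-predict flow with naive Bayes, here the 1-5 ratings are treated as discrete classes and the movie features as categorical codes; all of this encoding is an assumption for the sketch:

```python
# Sketch: naive Bayes over categorical movie features, treating each
# star rating as a class label.
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Illustrative encoding: columns are (genre_code, decade_code)
X_train = np.array([[0, 0],   # action, 2000s
                    [1, 0],   # drama,  2000s
                    [1, 1]])  # drama,  2010s
y_train = np.array([5, 3, 4])  # star ratings as classes

model = CategoricalNB()
model.fit(X_train, y_train)

# Predict a rating class for an unseen (drama, 2000s) movie
print(model.predict(np.array([[1, 0]])))
```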
If you want to play around with different algorithms in a simple, intuitive package that requires no programming, I suggest you take a look at Weka (my first choice), Orange, or RapidMiner. The most difficult part would be preparing the dataset in the required format. The rest is as easy as choosing an algorithm and applying it (all in a few clicks!).
I guess for someone not looking to go into too much detail, I would recommend the nearest-neighbor method, as it is intuitive and easy to implement. Still, the option of using Weka (or one of the other tools) is worth looking into.
There are a few algorithms that are good for this:
ARTMAP: groups items via probability against each other (this isn't fast, but IMO it's the best fit for your problem). ARTMAP holds a group of common attributes and determines the likelihood of similarity via percentages.
KMeans: separates the vectors by the distance between them (see the k-means article on Wikipedia).
PCA: separates the average of all the values from the varying bits. This is what you would use to do face detection and background subtraction in computer vision.
The K-nearest neighbor algorithm may be right up your alley.
Check out some of the work of the top teams in the Netflix Prize.
