Naive Bayes Probabilities applying to Web Based Travel Recommender System - probability

Hi, we're developing a travel recommender system and have committed to using Naive Bayes, but we don't know enough about how to apply it.
We want the formula for the probability that a new user will choose a place that was chosen or rated by an existing user, given that they share the same gender, age, province and traveler style.
Demographics of User:
Gender
Age
Province
Traveler Style
Rated Places (Existing User)
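For reference, a sketch of the formula being asked about, assuming the usual naive Bayes simplification that the four attributes are conditionally independent given the chosen place. The symbols p (place), g (gender), a (age group), v (province) and s (traveler style) are notation introduced here, not taken from the post:

```latex
P(p \mid g, a, v, s) =
  \frac{P(p)\, P(g \mid p)\, P(a \mid p)\, P(v \mid p)\, P(s \mid p)}
       {\sum_{p'} P(p')\, P(g \mid p')\, P(a \mid p')\, P(v \mid p')\, P(s \mid p')}
```

Each factor would be estimated from the existing users' ratings, e.g. P(g | p) as the fraction of users who chose place p that have gender g; age would probably need to be binned into groups first.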

Related

optimal k for knn algorithm in item based recommendation system using cross-validation

I'm trying to build a recommendation system for my graduation project app using the k-nearest neighbor algorithm.
It is an item-based recommender that recommends products to the active user based on product ratings.
My dataset is a matrix filled from the database: the columns represent the system's users, the rows represent the products, and each cell holds the rating a user gave a product. It looks like the usual movie-rating matrix, except that in my case the items are products, not movies.
There are currently 17 products (rows) and 12 users (columns), but I want an accurate, general way to find k.
I used k = sqrt(n)/2 to pick a good k value (n is the number of products in the application), but I want a more accurate method. Someone advised me to use k-fold cross-validation; after searching, I understand how to use cross-validation for classification.
Can anyone explain to me how to use it in recommendation systems?
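One common way to adapt k-fold cross-validation to a rating matrix is to hold out individual known ratings rather than whole rows: hide one fold of ratings, predict them from the rest with item-based kNN, and keep the k with the lowest error. The sketch below assumes cosine similarity between items and RMSE as the error measure; the function names and the synthetic data are made up for illustration and are not from the question.

```python
import math
import random
from collections import defaultdict

def item_similarity(a, b):
    """Cosine similarity over the users who rated both items (dicts user -> rating)."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    norm_a = math.sqrt(sum(a[u] ** 2 for u in common))
    norm_b = math.sqrt(sum(b[u] ** 2 for u in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def predict(train, item, user, k, default):
    """Similarity-weighted average of the user's ratings on the k most similar items."""
    target = train.get(item, {})
    neighbours = [
        (item_similarity(target, ratings), ratings[user])
        for other, ratings in train.items()
        if other != item and user in ratings
    ]
    neighbours.sort(key=lambda pair: pair[0], reverse=True)
    top = [(s, r) for s, r in neighbours[:k] if s > 0]
    total = sum(s for s, _ in top)
    return sum(s * r for s, r in top) / total if total else default

def cross_validated_rmse(ratings, k, folds=5, seed=0):
    """ratings: list of (item, user, rating). Returns RMSE over all held-out ratings."""
    shuffled = ratings[:]
    random.Random(seed).shuffle(shuffled)
    squared_errors = []
    for f in range(folds):
        test = shuffled[f::folds]
        train_triples = [t for i, t in enumerate(shuffled) if i % folds != f]
        train = defaultdict(dict)
        for item, user, r in train_triples:
            train[item][user] = r
        # Fall back to the global training mean when no neighbour is available.
        mean = sum(r for _, _, r in train_triples) / len(train_triples)
        for item, user, r in test:
            squared_errors.append((predict(train, item, user, k, mean) - r) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def best_k(ratings, candidates, folds=5):
    """Return the candidate k with the lowest cross-validated RMSE, plus all scores."""
    scores = {k: cross_validated_rmse(ratings, k, folds) for k in candidates}
    return min(scores, key=scores.get), scores

# Made-up data: 17 products x 12 users, roughly 70% of cells rated 1-5.
rng = random.Random(42)
ratings = [(i, u, rng.randint(1, 5))
           for i in range(17) for u in range(12) if rng.random() < 0.7]
k, scores = best_k(ratings, candidates=[1, 2, 3, 5, 8])
```

With only 17 x 12 ratings the RMSE estimates will be noisy, so repeating the procedure with several seeds before settling on k is probably worthwhile.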

Organize a trade event || Business "speed dating" algorithm

I'm a student of software engineering.
Right now I am working on my final project: scheduling business matchmaking on a trade day.
The idea is to bring a seller (a developer) and a buyer (a person with financial means) together. The algorithm should work like "speed dating".
Let's say I have 15 tables and 10 sessions.
It means that in each session 15 buyers will meet 15 sellers for 20 minutes.
My question is how do I make the matching?
Suppose each person has 8 attributes that characterize him.
• I am thinking of creating a bipartite graph (group A: sellers, group B: buyers).
• Then link a seller and a buyer based on similar attributes (I should consider what level of error is acceptable). I don't want to bring together people who are not related.
• Then in each session look for a maximum matching.
Constraints: it's not real time; I'll close registration a few days before the event.
I'm currently "idea blocked" on how to do the linking step (based on a person's attributes).
I would appreciate your help,
Even a dialogue on the matter would help me a lot!:)
Given multi-dimensional data describing each point, you typically define a similarity or "kernel" between points. This could be, for example, the dot product after normalizing each dimension by its standard deviation. Or it could be a Gaussian kernel e^(-d^2/y), where d is the distance between the points and y is a constant bandwidth parameter. If certain dimensions are categorical, you could define the one-dimensional dot product to be 1 if the categorical values agree and 0 otherwise, and then form the overall dot product across all dimensions. The point is that once you have a similarity or kernel between points, you can define a weighted bipartite graph where the weight of an edge equals the similarity between its endpoints, and your problem becomes finding a maximum weight matching. This is a well-known problem with solutions in the literature, e.g. the Hungarian algorithm; see http://en.wikipedia.org/wiki/Matching_%28graph_theory%29#In_weighted_bipartite_graphs .
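A minimal sketch of that idea, assuming each person is a dict of attributes, numeric attributes are scaled by a known standard deviation, and categorical attributes contribute 0 or 1 to the squared distance. The attribute handling, the `bandwidth` default and the function names are illustrative assumptions; the assignment routine is a standard O(n^3) Hungarian-style implementation, run on negated weights to get a maximum weight matching.

```python
import math

def similarity(a, b, numeric_std, bandwidth=2.0):
    """Gaussian kernel exp(-d^2 / bandwidth) on a mixed numeric/categorical distance.

    a, b: dicts attribute -> value. numeric_std maps each numeric attribute to its
    standard deviation; every other attribute is treated as categorical.
    """
    d2 = 0.0
    for key in a:
        if key in numeric_std:
            d2 += ((a[key] - b[key]) / numeric_std[key]) ** 2
        else:
            d2 += 0.0 if a[key] == b[key] else 1.0
    return math.exp(-d2 / bandwidth)

def min_cost_assignment(cost):
    """Hungarian algorithm with potentials. cost is n x m with n <= m.

    Returns a list where entry i is the column assigned to row i.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)      # row potentials
    v = [0.0] * (m + 1)      # column potentials
    p = [0] * (m + 1)        # p[j] = row matched to column j (0 = free)
    way = [0] * (m + 1)      # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while True:          # flip the augmenting path back to the start
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            assignment[p[j] - 1] = j - 1
    return assignment

def max_weight_matching(weights):
    """weights[i][j] = similarity of seller i and buyer j; maximizes the total."""
    return min_cost_assignment([[-w for w in row] for row in weights])
```

For the ten sessions, one option (again an assumption, not part of the answer) is to rerun the matching each session after setting the weight of every pair that has already met to a very low value, so that nobody meets the same person twice.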

A user ranking model

I am trying to develop a simple game where a group of users can come and play. Based on their performance, users get a positive or a negative score.
I want two parameters to be taken into account: the user's weight (the number of matches he has played and his performance in those matches) and his instantaneous skill set. These two, combined for each user and compared with the other users' scores, might give his score in the current match.
Then, combining that score with the previous rating, we might arrive at the user's new rating.
I do not want to reinvent the wheel. I tried and came up with this, but it looks pretty naive and I am not sure how it will perform in a real-world scenario.
Pos[i] and Neg[i] are the positive and negative scores of the users in a match.
Step 1: Calculate the average positive score of the N players: `total = 0; for i in range(N): total += Pos[i]`, then `Average = total / N`. Do the same for the negative scores.
Step 2: Calculate the standard deviation (SD).
Step 3: Calculate the weight of the user as follows: if the user has played M matches, his weight is W = M * abs(sum(Pos)/N1 - sum(Neg)/N2)
(where N1 is the number of times he has scored a positive score and N2 the number of times he has scored a negative one).
Step 4: Current Score = (Pos[i] - SD) * W
Step 5: New Rating = Old Rating + Current Score
Please suggest something standard.
You should check out how chess ratings are calculated. There are some variations to choose from, but I think it should be appropriate for your case.
Well, it has been done here; it takes into consideration previous literature and other work. It also shows what the most famous methods are and how they work.
You could check the Elo ratings used in chess, as mentioned by Running Wild. Alternatively, you could look into the Power Rating system used in Age of Empires 3. In this post they explain how it works and why they replaced the old Elo rating system used in MSN Zone with it.
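As a minimal sketch of the Elo update mentioned above: each player's expected score follows a logistic curve in the rating difference, and the rating moves by K times the gap between actual and expected score. K = 32 is a commonly used value assumed here, and the function names are illustrative. Plain Elo covers two players; for a game played by a group, applying it to every pair of players is one possible adaptation, not something the answers prescribe.

```python
def expected_score(rating_a, rating_b):
    """Expected score of player A against player B (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def elo_update(rating_a, rating_b, score_a, k=32):
    """Return the new ratings after one game.

    score_a is 1 for a win by A, 0.5 for a draw, 0 for a loss.
    """
    expected_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - expected_a)
    # B's actual and expected scores are the complements of A's.
    new_b = rating_b + k * ((1 - score_a) - (1 - expected_a))
    return new_a, new_b

# Two equally rated players: the winner gains k/2 points and the loser drops k/2.
winner, loser = elo_update(1500, 1500, score_a=1)
```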
Check out Glicko, TrueSkill, different approaches in the Kaggle chess rating competition
http://timsalimans.com/how-i-won-the-deloittefide-chess-rating-challenge/
http://blog.kaggle.com/2012/03/20/could-world-chess-ratings-be-decided-by-the-stephenson-system/
I put some URLs up here before noticing the OP was an SO question: http://www.reddit.com/r/statistics/comments/rsomx/how_do_i_calculate_the_rating_of_a_player_in_a/

algorithm for solving resource allocation problems

Hi, I am building a program in which students sign up for an exam that is conducted in several cities throughout the country. While signing up, students provide a list of three cities where they would like to take the exam, in order of preference. So a student may say his first preference for an exam centre is New York, followed by Chicago, followed by Boston.
Since the exam centres have limited capacity, they cannot accommodate every student's first choice. We would, however, like to give as many students as possible their first or second choice of centre and, as far as possible, avoid assigning a student their third-choice centre.
Any ideas for an algorithm that would make this process more efficient? The simple way would be to first go through the students' first choices and allot as many as possible, then go through the second choices and allot. However, this may lead to students at the top of the list getting their first-choice centre and students at the end getting their third choice or, worse, none of their choices. Is there anything that could make this more efficient?
Sounds like a variant of the classic stable marriage problem or the college admission problem. Wikipedia lists an algorithm for the former that is linear in the number of preferences (O(n²) in the number of persons); the NRMP describes an efficient algorithm for the latter.
I suspect that if you randomly generate each exam place's preferences over the students (one Fisher–Yates shuffle per exam place) and then apply the stable marriage algorithm, you'll get a pretty fair and efficient solution.
This problem could be formulated as an instance of minimum cost flow. Let N be the number of students. Let each student be a source vertex with capacity 1. Let each exam center be a sink vertex with capacity, well, its capacity. Make an arc from each student to his first, second, and third choices. Set the cost of first choice arcs to 0; the cost of second choice arcs to 1; and the cost of third choice arcs to N + 1.
Find a minimum-cost flow that moves N units of flow. Assuming that your solver returns an integral solution (it should; flow LPs are totally unimodular), each student flows one unit to his assigned center. The costs minimize the number of third-choice assignments, breaking ties by the number of second-choice assignments.
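The construction above can be sketched with a small successive-shortest-paths solver. This is an illustrative pure-Python version, not a production solver: the function name and data layout are assumptions, a super source and super sink are added to connect the students and centres, and shortest paths are found with Bellman-Ford because residual arcs carry negative costs. It pushes as much flow as possible, so if the centres cannot seat everyone, the unplaced students come back as `None`.

```python
def assign_centres(preferences, capacity):
    """preferences: one [first, second, third] list of centre names per student.
    capacity: dict centre -> number of seats.
    Returns the assigned centre (or None) for each student, in input order.
    """
    n = len(preferences)
    centres = list(capacity)
    source, sink = 0, n + len(centres) + 1
    centre_node = {c: n + 1 + i for i, c in enumerate(centres)}
    graph = [[] for _ in range(sink + 1)]   # node -> indices into edges
    edges = []   # [to, residual capacity, cost]; edge e and e ^ 1 are a forward/reverse pair

    def add_edge(u, v, cap, cost):
        graph[u].append(len(edges)); edges.append([v, cap, cost])
        graph[v].append(len(edges)); edges.append([u, 0, -cost])

    choice_cost = [0, 1, n + 1]             # first, second, third choice
    for s, prefs in enumerate(preferences, start=1):
        add_edge(source, s, 1, 0)
        for rank, c in enumerate(prefs):
            add_edge(s, centre_node[c], 1, choice_cost[rank])
    for c in centres:
        add_edge(centre_node[c], sink, capacity[c], 0)

    inf = float("inf")
    while True:
        # Bellman-Ford: cheapest source-to-sink path in the residual graph.
        dist = [inf] * (sink + 1)
        prev_edge = [-1] * (sink + 1)
        dist[source] = 0
        for _ in range(sink):
            changed = False
            for u in range(sink + 1):
                if dist[u] == inf:
                    continue
                for e in graph[u]:
                    v, cap, cost = edges[e]
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v], prev_edge[v], changed = dist[u] + cost, e, True
            if not changed:
                break
        if dist[sink] == inf:
            break
        # Every path starts with a capacity-1 student arc, so push exactly one unit.
        v = sink
        while v != source:
            e = prev_edge[v]
            edges[e][1] -= 1
            edges[e ^ 1][1] += 1
            v = edges[e ^ 1][0]

    node_centre = {node: c for c, node in centre_node.items()}
    result = []
    for s in range(1, n + 1):
        chosen = None
        for e in graph[s]:
            v, cap, _ = edges[e]
            if e % 2 == 0 and cap == 0:     # saturated forward arc = assignment
                chosen = node_centre[v]
        result.append(chosen)
    return result

prefs = [["New York", "Boston", "Chicago"],
         ["New York", "Chicago", "Boston"],
         ["Chicago", "Boston", "New York"]]
seats = {"New York": 1, "Chicago": 1, "Boston": 1}
allocation = assign_centres(prefs, seats)
```

In this made-up example, the first-come-first-served pass described in the question would push the second student down to their third choice, while the flow formulation finds an allocation with no third choices at all.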
There is a class of algorithms, called auctions, that addresses this kind of allocation of limited resources. Basically, in this case each student would get a certain amount of money (a number they can spend), and your software would then run the bidding between those students. You might use a formula based on preferences.
An example would be for tutorial times. If you put down your preferences, then you would effectively bid more for those times and less for the times you don't want. So if you don't get your preferences you have more "money" to bid with for other tutorials.

Algorithm for computing timetable given restrictions

I'm considering a hypothetical problem, and looking for guidance on how to approach solving the problem, from an algorithmic point of view.
The Problem:
Consider a university. You have the following objects:
Teaching staff. Each staff member teaches one or more papers.
Students. Each student takes one or more papers.
Rooms. Rooms hold a certain number of students, and contain certain types of equipment.
Papers. Require a certain type of equipment, and a certain amount of time each week.
Given information about enrollments (i.e.- how many students are enrolled in each paper, and what staff are allocated to teach each paper), how can I compute a timetable that obeys the following restrictions:
Staff can only teach one thing at once.
Students can only attend one paper at once.
Rooms can only hold a certain number of students.
Papers that require a certain type of equipment can only be held in a room that provides that type of equipment.
Hours of operation are Monday to Friday, 8-12 and 1-5.
Discussion:
In reality I'm not too concerned with the situation outlined above; it's the general class of problem that I'm curious about. At first glance it seems to me like a good fit for a genetic algorithm, but the fitness function for such an algorithm would be incredibly complex.
What's a good approach for trying to solve this kind of constraint-satisfying problem?
I guess there's probably no way to solve this perfectly, since students may well take a combination of papers that leads to impossible situations, especially as the number of students & papers grows.
Staying on genetic algorithms, I don't think the fitness function for this would be very complex, quite the opposite.
You basically just check your candidate solution (whatever the encoding) for each of the constraints (you only have 5) and assign a weight to them so that when a constraint is not satisfied the weight is added to a total score that could represent fitness.
In such a scenario you just minimize the fitness function (because best fitness possible is 0, meaning all the constraints are satisfied) and let the GA crunch the numbers.
The encoding will take a bit of figuring out, but once that's done it should be straightforward, unless I am missing something, of course :)
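A minimal sketch of such a penalty-based fitness function, under a deliberately simplified encoding that is assumed here rather than taken from the answer: each paper meets once a week, and a candidate timetable maps every paper to a (room, slot) pair. The opening-hours constraint is then enforced by only generating valid slots. The dictionary layout, the weight names and the extra penalty for double-booking a room are all illustrative choices.

```python
from itertools import combinations

def fitness(timetable, papers, rooms, weights):
    """Weighted count of violated constraints; 0 means a feasible timetable.

    timetable: paper -> (room, slot)
    papers:    paper -> {"staff": name, "students": set, "equipment": name}
    rooms:     room  -> {"capacity": int, "equipment": set of names}
    weights:   penalty per violation type
    """
    penalty = 0
    # Per-paper checks: room size and required equipment.
    for paper, (room, _slot) in timetable.items():
        info, room_info = papers[paper], rooms[room]
        if len(info["students"]) > room_info["capacity"]:
            penalty += weights["capacity"]
        if info["equipment"] not in room_info["equipment"]:
            penalty += weights["equipment"]
    # Pairwise checks: clashes only matter for papers in the same slot.
    for (p1, (r1, s1)), (p2, (r2, s2)) in combinations(timetable.items(), 2):
        if s1 != s2:
            continue
        if papers[p1]["staff"] == papers[p2]["staff"]:
            penalty += weights["staff"]
        # One penalty per student who would have to be in two places at once.
        penalty += weights["student"] * len(papers[p1]["students"] & papers[p2]["students"])
        if r1 == r2:
            penalty += weights["room"]
    return penalty

papers = {
    "math": {"staff": "ann", "students": {"s1", "s2"}, "equipment": "projector"},
    "chem": {"staff": "bob", "students": {"s2", "s3"}, "equipment": "lab"},
}
rooms = {
    "r1": {"capacity": 30, "equipment": {"projector"}},
    "lab1": {"capacity": 2, "equipment": {"lab", "projector"}},
}
weights = {"capacity": 1, "equipment": 1, "staff": 1, "student": 1, "room": 1}

feasible = fitness({"math": ("r1", 0), "chem": ("lab1", 1)}, papers, rooms, weights)
clashing = fitness({"math": ("r1", 0), "chem": ("r1", 0)}, papers, rooms, weights)
```

The GA would then minimize this value; giving hard constraints much larger weights than soft ones is a common way to steer the search toward feasible timetables first.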
A very restricted version of this problem is NP-Complete.
Consider the case when exactly one student can take a paper.
Now for a given time slot (say the paper is taught all day), you can construct a 3-partite graph with Rooms, Papers and Students, with an edge between a paper and a student if that student wants to take it. Also add edges between a paper and its possible rooms.
We now see that the 3 Dimensional matching problem is an instance of your problem: you need to pick a non-overlapping (student, paper, room) combination for that particular timeslot.
You are probably better off with some heuristics for the general problem. Sorry, I can't help you there.
Hope that helps.
