Confusing cb results with ips cb_type - vowpalwabbit

Is cb_type ips expected to be much better than dm or dr?
I wrote the script here (results included) https://gist.github.com/travisbrady/546b40a0ed328d8f83dbc6230ee83fec using the personalization bandit data from Tony Jebara's ML for personalization class and the evaluation metric ("take rate") varies by a lot as I change the cb_type.
Specifically 0.23 for dm, 0.79 for dr and 0.85 for ips. Those jumps are quite large so I'm wondering if I've misunderstood something.
Is this expected? Is ips typically the better choice? In what circumstances is each cb_type preferred?

From your gist, the action probabilities are all declared as 1/10, so an effective difference between dm and ips is a factor of 10 in the learning rate.
Secondarily, --mtr is the typically best option but this is only available in the --cb_adf code path.

Related

What matching algorithm could I use?

I would need some help because I don't know what algorithm i could use for the following (I use python) :
Steve is 25 and he buys everyday orange juice
Maria is 23 and she likes to buy smoothies
Steve & Maria tastes are pretty much the same.
Juan is 16 and he only drinks sodas
Juan tastes are not the same as Steve and Maria.
====================================================
I would like to use a matching algorithm that will detect the users who have the same drink preference and a close age. To continue with the example, Steve and Maria would be matched together but not Juan. Which one should I use ?
I agree with #klutt that your task is pretty vague. There are two approaches that come to mind, but not knowing more details about your problem does limit the details I can provide in my answer that would help you. I am interpreting the question as if you are taking in raw text and might want to process more sentences that have very similar semantic and syntactical structure.
An algorithmic approach:
Assuming that your word choices are static in their semantic meaning (Maria is 23 ... Steve is 25), we can parse each sentence and identify tokens like is or and or same and essentially perform lexical analysis on the text... from here, you could continue thinking about how you would go about matching and so forth... but this is rather complicated...
Neural Network approach:
If you are taking in raw text in the form of sentences, it's a problem that's not straight forward to solve using a top-down algorithmic approach.
You could take an approach with neural networks that trains a model to solve your problem, but then again what you seem to be asking is quite complex since there are multiple "facts" within each sentence that are not semantically related. For example, your second sentence identifies that Maria is 23 but at the end of that sentence there is a comparison between Steve and Maria. And your first sentence only identifies Steve as 25.
Even if you chunk raw text into sentences, you would have to have a very fine tuned neural network architecture and a lot of training data to get remotely close to your goal.
Now, both of those solutions are very complex... but if you wanted to create an application that collects this data (via a form or prompt) and puts it into a structured format (like a json or xml object) to organize and store the data in memory (perhaps writing out to a database or file for persistent storage), that might be a good route to go down.
This can serve as a good lesson in how to think about data as well. It is one thing if you have a pool of thousands of sentences, just raw data that you need to organize for quantitative purposes (classic qualitative -> quantitative problems). It is another thing if you are going to be collecting this data. If you are going to be collecting data, having a program that collects and organizes names, ages, and drink preferences (and then organizes that data within certain data structures), then we can talk about matching algorithms.
I will also add here that if you do have structured data, Collaborative filtering (mentioned by Shridhar) is a great starting place.
Collaborative filtering best suits your needs.
In the newer, narrower sense, collaborative filtering is a method of
making automatic predictions (filtering) about the interests of a user
by collecting preferences or taste information from many users
(collaborating). The underlying assumption of the collaborative
filtering approach is that if a person A has the same opinion as a
person B on an issue, A is more likely to have B's opinion on a
different issue than that of a randomly chosen person. For example, a
collaborative filtering recommendation system for television tastes
could make predictions about which television show a user should like
given a partial list of that user's tastes (likes or dislikes).[3]
Note that these predictions are specific to the user, but use
information gleaned from many users. This differs from the simpler
approach of giving an average (non-specific) score for each item of
interest, for example based on its number of votes.

Evaluating a specific Information retrieval system with P#1

I am working on a information retrieval system which aims to select the first result and to link it to other database. Indeed, our system is based on a Keyword description of a video and try to interlink the video to a DBpedia entity which has the same meaning of the description. In the step of evaluation, i noticid that the majority of evaluation set the minimum of the precision cut-off to 5, whereas in our system is not suitable. I am thinking to put an interval [1,5]: (P#1,...P#5).Will it be possible? !!
Please provide your suggestions and your reference to some notes.. Thanks..
You can definitely calculate P#1 for a retrieval system, if you have truth labels. (In this case, it sounds like they would be [Video, DBPedia] matching pairs generated by humans).
People generally look at this measure for things like Question-Answering or recommendation systems. The only caveat is that you typically wouldn't use it to train a learning to rank system or any other learning system -- it's not "continuous enough" a near miss (best at rank 2) and a total miss (best at rank 4 million) get equivalent scores, so it can be hard to smoothly improve a system by tuning weights in such a case.
For those kinds of tasks, using Mean Reciprocal Rank is pretty common, if you need something tunable. Also NDCG tends to be okay, too, since it has an exponential discounting factor.
But there's nothing in the definition of precision that prevents you from calculating it at rank 1. It may be more correct to describe it as a "success#1" feature, since you're going to get 0/1 or 1/1 as your two options.

Positivity/Negativity percentage of a file in a hotel-review dataset

There's a hotel -review dataset having 1500 each of positive and negative files. To determine the accuracy of my algorithm, I have to first check the percentage positivity or negativity of the original file in the hotel-review dataset.
I tried the basic percentage criterion:
positivity % = no. of positive words/ (Total positive + total neg words)
But this holds no significant ground, so can't work on this. Is there any other method or ground on which I can work?
Example-> (She's the most beautiful lady I've ever seen.) should get a better positivity percentage than (She is a nice lady.)
I'm doing the work in Python.
The first thing you can try is switching from a binary category for words (positive vs. negative), to a sliding scale. The SentiWordNet project provides this.
However on your specific example this could actually make things worse. E.g. nice gives P = 0.875. Whereas beautiful only gets P = 0.75. Of course you could fix the SentiWordNet ratings if you disagree, but I'd suggest doing that kind of tuning using an automatic system, with as much domain-specific training data as you can find.
BTW, there are at least a couple of Python interfaces to SentiWordNet.
http://compprag.christopherpotts.net/code-data/sentiwordnet.py describes itself as an "Interface to SentiWordNet using the NLTK WordNet classes."
https://pypi.python.org/pypi/sentiment_classifier is a more general tool, using SentiWordNet.
Going back to your example, the key difference is the "the most [SOMETHING] I've ever seen" structure. This requires switching from a bag of words approach to actually parsing and understanding the sentence. I have no useful leads to give you there, so I'll be as delighted as you if someone says there is a ready-made open-source package already doing that :-)
I'd also like to mention the importance of context. Without any context "She's a beautiful lady" and "She is a nice lady" are both simple and positive. But in the context of the hotel reviews, and their relevance to me, maybe "nice" is more useful than "beautiful" And, for fun, compare these two:
"The receptionist was a nice lady."
"At breakfast, at a table near to me, was the most beautiful lady I've ever seen. It was a welcome distraction from the food."
That is the challenge I love about sentiment analysis; the commercial applications are just excuses to work on problems like that!

Collaborative or structured recommendation?

I am building a Rails app that recommends tutors to students and vise versa. I need to match them based on multiple dimensions, such as their majors (Math, Biology etc.), experience (junior etc.), class (Math 201 etc.), preference (self-described keywords) and ratings.
I checked out some Rails collaborative recommendation engines (recommendable, recommendify) and Mahout. It seems that collaborative recommendation is not the best choice in my case, since I have much more structured data, which allows a more structured query. For example, I can have a recommendation logic for a student like:
if student looks for a Math tutor in Math 201:
if there's a tutor in Math major offering tutoring in Math 201 then return
else if there's a tutor in Math major then sort by experience then return
else if there's a tutor in quantitative major then sort by experience then return
...
My questions are:
What are the benefits of a collaborative recommendation algorithm given that my recommendation system will be preference-based?
If it does provide significant benefits, how I can combine it with a preference-based recommendation as mentioned above?
Since my approach will involve querying multiple tables, it might not be efficient. What should I do about this?
Thanks a lot.
It sounds like your measurement of compatibility could be profitably reformulated as a metric. What you should do is try to interpret your `columns' as being different components of the dimension of your data. The idea is that you ultimately should produce a binary function which returns a measurement of compatibility between students and tutors (and also students/students and tutors/tutors). The motivation for extending this metric to all types of data is that you can then use this idea to reformulate your matching criteria as a nearest-neighbor search:
http://en.wikipedia.org/wiki/Nearest_neighbor_search
There are plenty of data structures and solutions to this problem as it has been very well studied. For example, you could try out the following library which is often used with point cloud data:
http://www.cs.umd.edu/~mount/ANN/
To optimize things a bit, you could also try prefiltering your data by running principal component analysis on your data set. This would let you reduce the dimension of the space in which you do nearest neighbor searches, and usually has the added benefit of reducing some amount of noise.
http://en.wikipedia.org/wiki/Principal_component_analysis
Good luck!
Personally, I think collaborative filtering (cf) would work well for you. Note that a central idea of cf is serendipity. In other words, adding too many constraints might potentially result in a lukewarm recommendations to users. The whole point of cf is to provide exciting and relevant recommendations based on similar users. You need not impose such tight constraints.
If you might decide on implementing a custom cf algorithm, I would recommend reading this article published by Amazon [pdf], which discusses Amazon's recommendation system. Briefly, the algorithm they use is as follows:
for each item I1
for each customer C who bought I1
for each I2 bought by a customer
record purchase C{I1, I2}
for each item I2
calculate sim(I1, I2)
//this could use your own similarity measure, e.g., cosine based
//similarity, sim(A, B) = cos(A, B) = (A . B) / (|A| |B|) where A
//and B are vectors(items, or courses in your case) and the dimensions
//are customers
return table
Note that the creation of this table would be done offline. The online algorithm would be quick to return recommendations. Apparently, the recommendation quality is excellent.
In any case, if you want to get a better idea about cf in general (e.g., various cf strategies) and why it might be suited to you, go through that article (don't worry, it is very readable). Implementing a simple cf recommender is not difficult. Optimizations can be made later.

What should be considered when building a Recommendation Engine?

I've read the book Programming Collective Intelligence and found it fascinating. I'd recently heard about a challenge amazon had posted to the world to come up with a better recommendation engine for their system.
The winner apparently produced the best algorithm by limiting the amount of information that was being fed to it.
As a first rule of thumb I guess... "More information is not necessarily better when it comes to fuzzy algorithms."
I know's it's subjective, but ultimately it's a measurable thing (clicks in response to recommendations).
Since most of us are dealing with the web these days and search can be considered a form of recommendation... I suspect I'm not the only one who'd appreciate other peoples ideas on this.
In a nutshell, "What is the best way to build a recommendation ?"
You don't want to use "overall popularity" unless you have no information about the user. Instead, you want to align this user with similar users and weight accordingly.
This is exactly what Bayesian Inference does. In English, it means adjusting the overall probability you'll like something (the average rating) with ratings from other people who generally vote your way as well.
Another piece of advice, but this time ad hoc: I find that there are people where if they like something I will almost assuredly not like it. I don't know if this effect is real or imagined, but it might be fun to build in a kind of "negative effect" instead of just clumping people by similarity.
Finally there's a company specializing in exactly this called SenseArray. The owner (Ian Clarke of freenet fame) is very approachable. You can use my name if you call him up.
There is an entire research area in computer science devoted to this subject. I'd suggest reading some articles.
Agree with #Ricardo. This question is too broad, like asking "What's the best way to optimize a system?"
One common feature to nearly all existing recommendation engines is that making the final recommendation boils down to multiplying some number of matrices and vectors. For example multiply a matrix containing proximity weights between users by a vector of item ratings.
(Of course you have to be ready for most of your vectors to be super sparse!)
My answer is surely too late for #Allain but for other users finding this question through search -- send me a PM and ask a more specific question and I will be sure to respond.
(I design recommendation engines professionally.)
#Lao Tzu, I agree with you.
According to me, recommendation engines are made up of:
Context Input fed from context aware systems (logging all your data)
Logical reasoning to filter the most obvious
Expert systems that improve your subjective data over the period of time based on context inputs, and
Probabilistic reasoning to do decision-making close-to-proximity based on weighted sum of previous actions(beliefs, desires, & intentions).
P.S.
I made such recommendation engine.

Resources