Contextual Bandit using Vowpal wabbit - vowpalwabbit

In this case, one of the inputs is the probability of choosing an arm/action but how do we find that probability?
Isn't finding that probability itself a big task in hand?

Supplying the probability means you are taking a scenario where you are feeding actions taken historically, e.g. from a log, rather than performing the real online scenario. This is useful because (at least some of) Vowpal's Contextual Bandits models can be bootstrapped from historical data. Meaning, a Contextual Bandits policy learnt over historical data can outperform one that learns online from scratch ― something that you can do only if you have historical data relevant to the online scenario of yours.
The Wiki page has been recently edited to better reflect that this format generalizes for this case.
Another (contrived) use case for including probabilities might be that you are acting against multiple environments, but in any event to the best of my understanding the probability here can be interpreted as a mere frequency.
As such, my understanding is you do not have to supply the probability part in your input, when not feeding in historical interaction data. Just skip it as in the example here.

Related

How `vw --audit` internally computes the weights of the features?

In vowpawabbit there is an option --audit that prints the weights of the features.
If we have a vw contextual bandit model with four arms, how is this feature weight created?
From what I understand vowpawabbit tries to fit one linear model to each arm.
So if weights were calculated using an average across all the arms, then they would correlate with getting a reward generally, instead of which features makes the model pick one variant from another.
I am interested know out how they are calculated to see how I can interpret the results obtained. I tried searching its Github repository but could not find anything meaningful.
I am interested know out how they are calculated to see how I can interpret the results obtained.
Unfortunately knowing the first does not lead to knowing the second.
Your question is concerned with contextual bandits, but it is important to note that interpreting model parameters is an issue that also occurs in supervised learning. Machine learning has made progress recently (i.e., my lifetime) largely by focusing concern on quality of predictions rather than meaningfulness of model parameters. In a blog post, Phoebe Wong outlines the issue while being entertaining.
The bottom line is that our models are not causal, so you simply cannot conclude because "the weight of feature X is for arm A is large means that if I were to intervene in the system and increase this feature value that I will get more reward for playing arm A".
We are currently working on tools for model inspection that leverage techniques such as permutation importance that will help you answer questions like "if I were to stop using a particular feature how would the frequency of playing each arm change for the trained policy". We're hoping that is helpful information.
Having said all that, let me try to answer your original question ...
In vowpawabbit there is an option --audit that prints the weights of the features.
If we have a vw contextual bandit model with four arms, how is this feature weight created?
The format is documented here. Assuming you are using --cb (not --cb_adf) then there are a fixed number of arms and so the offset field will increment over the arms. So for an example like
1:2:0.4 |foo bar
with --cb 4 you'll get an audit output with namespace of foo, feature of bar, and offset of 0, 1, 2, and 3.
Interpreting the output when using --cb_adf is possible but difficult to explain succinctly.
From what I understand vowpawabbit tries to fit one linear model to each arm.
Shorter answer: With --cb_type dm, essentially VW independently tries to predict the average reward for each arm using only examples where the policy played that arm. So the weight you get from audit at a particular offset N is analogous to what you would get from a supervised learning model trained to predict reward on a subset of the historical data consisting solely of times the historical policy played arm N. With other --cb_type settings the interpretation is more complicated.
Longer answer: "Linear model" refers to the representation being used. VW can incorporate nonlinearities into the model but let's ignore that for now. "Fit" is where some important details are. VW takes the partial feedback information of a CB problem (partial feedback = "for this example you don't know the reward of the arms not pulled") and reduces it to a full feedback supervised learning problem (full feedback = "for this example you do the reward of all arms"). The --cb_type argument selects the reduction strategy. There are several papers on the topic, a good place to start is Dudik et. al. and then look for papers that cite this paper. In terms of code, ultimately things are grounded here, but the code is written more for performance than intelligibility.

What type of algorithm should I use for forecasting with only very little historic data?

The problem is as follows:
I want to use a forecasting algorithm to predict heat demand of a not further specified household during the next 24 hours with a time resolution of only a few minutes within the next three or four hours and lower resolution within the following hours.
The algorithm should be adaptive and learn over time. I do not have much historic data since in the beginning I want the algorithm to be able to be used in different occasions. I only have very basic input like the assumed yearly heat demand and current outside temperature and time to begin with. So, it will be quite general and unprecise at the beginning but learn from its Errors over time.
The algorithm is asked to be implemented in Matlab if possible.
Does anyone know an apporach or an algortihm designed to predict sensible values after a short time by learning and adapting to current incoming data?
Well, this question is quite broad as essentially any algorithm for forcasting or data assimilation could do this task in principle.
The classic approach I would look into first would be Kalman filtering, which is a quite general approach at least once its generalizations to ensemble Filters etc. are taken into account (This is also implementable in MATLAB easily).
https://en.wikipedia.org/wiki/Kalman_filter
However the more important part than the actual inference algorithm is typically the design of the model you fit to your data. For your scenario you could start with a simple prediction from past values and add daily rhythms, influences of outside temperature etc. The more (correct) information you put into your model a priori the better your model should be at prediction.
For the full mathematical analysis of this type of problem I can recommend this book: https://doi.org/10.1017/CBO9781107706804
In order to turn this into a calibration problem, we need:
a model that predicts the heat demand depending on inputs and parameters,
observations of the heat demand.
Calibrating this model means tuning the parameters so that the model best predicts the heat demand.
If you go for Python, I suggest to use OpenTURNS, which provides several data assimilation methods, e.g. Kalman filtering (also called BLUE):
https://openturns.github.io/openturns/latest/user_manual/calibration.html

evaluating the performance of item-based collaborative filtering for binary (yes/no) product recommendations

I'm attempting to write some code for item based collaborative filtering for product recommendations. The input has buyers as rows and products as columns, with a simple 0/1 flag to indicate whether or not a buyer has bought an item. The output is a list similar items for a given purchased, ranked by cosine similarities.
I am attempting to measure the accuracy of a few different implementations, but I am not sure of the best approach. Most of the literature I find mentions using some form of mean square error, but this really seems more applicable when your collaborative filtering algorithm predicts a rating (e.g. 4 out of 5 stars) instead of recommending which items a user will purchase.
One approach I was considering was as follows...
split data into training/holdout sets, train on training data
For each item (A) in the set, select data from the holdout set where users bought A
Determine which percentage of A-buyers bought one of the top 3 recommendations for A-buyers
The above seems kind of arbitrary, but I think it could be useful for comparing two different algorithms when trained on the same data.
Actually your approach is quiet similar with the literature but I think you should consider to use recall and precision as most of the papers do.
http://en.wikipedia.org/wiki/Precision_and_recall
Moreover if you will use Apache Mahout there is an implementation for recall and precision in this class; GenericRecommenderIRStatsEvaluator
Best way to test a recommender is always to manually verify that the results. However some kind of automatic verification is also good.
In the spirit of a recommendation system, you should split your data in time, and see if you algorithm can predict what future buys the user does. this should be done for all users.
Don't expect that it can predict everything, a 100% correctness is usually a sign of over-fitting.

What will be a efficient searching mechanism to traverse through 200+ chemistry reaction library

Im currently developing an organic conversion simulator which would help to generate in-between steps of the conversion when the starting compound and the ending compound is been provided. I'm in search of a searching mechanism to get the in-between compounds. I have come across Brute force as a solution to search through. But as I'm having like 200 reactions in my library to check for each compound i think it will result in a lot of time consumption. What other techniques or algorithms will be less time consuming to get my requirement done efficiently. For example I have fuzzy logic, genetic algorithm etc
Concerning genetic algorithms, it's a quite good tool for multiparametric optimization. But, surely, you need to evaluate kinetic parameters for each reaction, not only reaction mechanism framework. Also, thermodynamic data is needed for each intermediate species. So, you have tons of parameters - you need tons of experimental data to evaluate these parameters correctly.
If you already have your own reaction set, from which reactions should be taken, you can generate full kinetic mechanism (with all pathways) and further implement the reduction of chemical mechanisms technique. There are different reduction techniques, which are not for discussing on stackoverflow.
Also, if your reaction parameters are estimated (so reactions are ready to deal with), you can generate different schemes with GA - e.g. with binary genes - 0 - this reaction won't be included into scheme, 1 - will be included.

Beyond item-to-item recommendations

Simple item-to-item recommendation systems are well-known and frequently implemented. An example is the Slope One algorithm. This is fine if the user hasn't rated many items yet, but once they have, I want to offer more finely-grained recommendations. Let's take a music recommendation system as an example, since they are quite popular. If a user is viewing a piece by Mozart, a suggestion for another Mozart piece or Beethoven might be given. But if the user has made many ratings on classical music, we might be able to make a correlation between the items and see that the user dislikes vocals or certain instruments. I'm assuming this would be a two-part process, first part is to find correlations between each users' ratings, the second would be to build the recommendation matrix from these extra data. So the question is, are they any open-source implementations or papers that can be used for each of these steps?
Taste may have something useful. It's moved to the Mahout project:
http://taste.sourceforge.net/
In general, the idea is that given a user's past preferences, you want to predict what they'll select next and recommend it. You build a machine-learning model in which the inputs are what a user has picked in the past and the attributes of each pick. The output is the item(s) they'll pick. You create training data by holding back some of their choices, and using their history to predict the data you held back.
Lots of different machine learning models you can use. Decision trees are common.
One answer is that any recommender system ought to have some of the properties you describe. Initially, recommendations aren't so good and are all over the place. As it learns tastes, the recommendations will come from the area the user likes.
But, the collaborative filtering process you describe is fundamentally not trying to solve the problem you are trying to solve. It is based on user ratings, and two songs aren't rated similarly because they are similar songs -- they're rated similarly just because similar people like them.
What you really need is to define your notion of song-song similarity. Is it based on how the song sounds? the composer? Because it sounds like the notion is not based on ratings, actually. That is 80% of the problem you are trying to solve.
I think the question you are really answering is, what items are most similar to a given item? Given your item similarity, that's an easier problem than recommendation.
Mahout can help with all of these things, except song-song similarity based on its audio -- or at least provide a start and framework for your solution.
There are two techniques that I can think of:
Train a feed-forward artificial neural net using Backpropagation or one of it's successors (e.g. Resilient Propagation).
Use version space learning. This starts with the most general and the most specific hypotheses about what the user likes and narrows them down when new examples are integrated. You can use a hierarchy of terms to describe concepts.
Common characteristics of these methods are:
You need a different function for
each user. This pretty much rules
out efficient database queries when
searching for recommendations.
The function can be updated on the fly
when the user votes for an item.
The dimensions along which you classify
the input data (e.g. has vocals, beats
per minute, musical scales,
whatever) are very critical to the
quality of the classification.
Please note that these suggestions come from university courses in knowledge based systems and artificial neural nets, not from practical experience.

Resources