VW contextual bandits: historical data and online learning - vowpalwabbit

I'd like to test CB for e-commerce task: personal offer recommendations (like "last chance to buy", "similar positions", "consumers recommend", "bestsellers", etc.). My task is to order them (more relevant issue is higher in the list of recommendations).
So, there are 5 possible offers.
I have some historical data collected without using any model: context (user and web-session features), action id (one of my 5 offers), reward (1 if user clicked this offer, 0 - not clicked). So I have N users and 5 offers with known reward, totally 5*N rows in my historical data.
Ex:
1:1:1 | user_id:1 f1:... f2:...
2:-1:1 | user_id:1 f1:... f2:...
3:-1:1 | user_id:1 f1:... f2:...
This means that user 1 have seen 3 offers (1,2,3), cost of the 1 offer is equal to 1 (didn't click), user ckickes on offers 2 and 3 (cost is negative -> reward is positive). Probabilities are equal to 1, since all offers were shown and we know rewards.
Global task is to increase CTR. I'd like to use this data for training CB and then improve the model by exploration/exploitation policies. I set probabilities equal to 1 in this data (Is it right?). But next I'd like to set the order of offers according to rewards.
Should I use for this warm start in VW CB? Will this work correctly with data collected without using CB? Maybe you can advise more relevant methods in CB for this data and task?
Thanks a lot.

If there are only 5 possible offers and if you (as indicated) have data of the form "I have N users and 5 offers with known reward, totally 5*N rows in my historical data." then your historical data is supervised multilabel data and the warm-start functionality would apply; make sure you use the cost-sensitive version to accommodate the multilabel aspect of your historical data (i.e., there is more than one offer that would result in a click).
Will this work correctly with data collected without using CB?
Because the every action-reward is specified for every user in the data set, you only have to ensure that the sample of users is representative of the population you care about.
Maybe you can advise more relevant methods in CB for this data and task?
The first paragraph started with "if" because the more typical case is 1) there are many possible offers and 2) users have only seen a few of them historically.
In such case what you have is a combination of a degenerate logging policy and multiple rewards being revealed. If there are k possible actions but each user has only seen n<=k historically then you could try and make n lines for each user as you did. Theoretically this does not necessarily work but in practice it might help.
Out of the box: change the data
If the data you have was collected as the result of running an existing policy, then an alternative would be to start randomizing the decisions made by that system in order to collect a dataset which conforms to CB. For example, use your current system to pick the "best" action 96% of the time, and one of the other 4 actions at random 4% of the time, and log the probability along with the reward (either 0.96 or 0.01 depending upon whether it was the considered best), and then set up a proper CB-style training set for vw. With this you can also counterfactually estimate the value of both your current policy and the policy vw generates, and only switch to vw when it is winning.
The fastest way to implement the last paragraph is to just start using APS.

Related

How can I make randomization automatic on REDCap?

I am running a survey on REDCap where participants need to be assigned to one of three groups before receiving a group-specific intervention to reduce their smartphone use (e.g., 1 - intervention one, 2 - intervention 2, 3 - intervention 3). I have tried using the randomization module, but it requires one to allocate each record manually. For this specific study it becomes a problem, because we want to collect data from hundreds of people who will be completing the study all over the world, meaning that I cannot be on the computer at all times manually randomizing people and entering their records.
Is there a way to set up the randomization (or any other method) so that participants are randomly assigned to one of the three groups?
As you noted, randomisation in REDCap must be executed by a user with sufficient rights to do so, and ordinarily cannot be automated. But there are other options.
Realtime Randomization
You should reach out to your local REDCap administrators, as they may be amenable to installing the Realtime Randomization External Module, which may provide you the functionality that you want. This will (I think) automate the execution of the randomize button when a form is completed. Whether it works on surveys I don't fully know. Assuming it does, this is advantageous as it will uses the pre-defined randomisation allocation table that you generate outside REDCap, possibly with the help of a statistician. This is preferred if you need real randomisation.
Pseudo-randomisation
If you don't need to use a pre-defined randomisation allocation table, and can get by with each successive participant being allocated to a different group (record 1 -> intervention 1, record 2 -> intervention 2, record 3 -> intervention 3, record 4 -> intervention 1, etc.), so in fact not random at all, but sort of gated, then you can use the record ID in a calculated field to determine which of the three interventions a record should be allocated to. To do this you should return the modulo of the record ID by 3:
[record-name] - (rounddown([record-name]/3) * 3)
This will return 1, 2 and 0 for record IDs 1, 2 and 3, respectively, and for 4, 5 and 6, respectively, and so on ad infinitum.
Then, from this value, you can use standard branching logic to display a different fields, direct respondents to different surveys using logic in the survey queue, invite them to specific instruments using logic in automated survey invitations, fire off different alerts with instructions for each intervention group, etc.

Cross-validation in Lenskit

I'm trying to understand how exactly is performed cross-validation in lenskit. In the documentation, it says that by default the data are partitioned by user. Does that mean that, in each fold, none of the users in the test set has been used for training? Is this achieved through the "holdout" option? If so, does this option break the user-based partioning and yields folds in which each user shows up in both the training and test sets?
Right now, my evaluation code looks something like this:
dataset crossfold("data") {
source csvfile(sourceFile) {
delimiter "\t"
domain {
minimum 0.0
maximum 10.0
precision 0.1
}
}
// order RandomOrder
holdoutFraction 0.1
}
I commented out the "order" option because, when using it, lenskit eval throws an error.
Cheers!!!
Each user appears in both the training and the test sets, no matter the holdout, holdoutFraction, or retain options.
However, for each test user (when using 5 partitions, 20% of the users), part of their ratings (the test ratings) are held out and placed in the test set. The remainder of their ratings are placed in the training set, along with all ratings from other users.
This simulates the common case of a recommender system: you have users, for whom some of their history is already known and can be used in model training, and you're trying to recommend or predict their future behavior.
The holdout, holdoutFraction, and retain options are different ways of deciding how many ratings are put in the test set. If you say holdout 5, then 5 ratings from each test user are put in the test set, and the rest are used for training. If you say holdoutFraction 0.2, then 20% are used for testing and 80% for training. If you say retain 5, then 5 are used for training and the rest are used for testing.

How to manage multiple positive implicit feedbacks?

When there are no ratings, a common scenario is to use implicit feedback (items bought, pageviews, clicks, ...) to suggests recommendations. I'm using a model-based approach and I wondering how to deal with multiple identical feedback.
As an example, let's imagine that consummers buy items more than once. Should I have to consider the number of feedback (pageviews, items bought, ...) as a rating or compute a custom value ?
To model implicit feedback, we usually have a mapping procedure to map implicit user feedback into the explicit ratings. I guess in most domains, repeated user action against the same item indicates that the user's preference over the item is increasing.
This is certainly true if the domain is music or video recommendation. In a shopping site, such a behavior might indicate the item is consumed periodically, e.g., diapers or printer ink.
One way I am aware of to model this multiple implicit feedback is to create a numeric rating mapping function. When the number of times (k) of implicit feedback increases, the mapped value of rating should increase. At k = 1, you have a minimal rating of positive feedback, for example 0.6; when k increases, it approaches 1. For sure, you don't need to map to [0,1]; you can have integer ratings, 0,1,2,3,4,5.
To give you a concrete example of the mapping, here is what they did in a music recommendation domain. For short, they used the statistic info of the items per user to define the mapping function.
We assume that the more
times the user has listened to an artist the more the user
likes that particular artist. Note that user’s listening habits
usually present a power law distribution, meaning that a few
artists have lots of plays in the users profile, while the rest
of the artists have significantly less play counts. Therefore,
we compute the complementary cumulative distribution of
artist plays in the users’ profile. Artists located in the top
80-100% of the distribution are assigned a score of 5, while
artists in the 60-80% range assign a score of 4.
Another way I have seen in the literature is to create another variable besides a binary rating variable. They call it confidence levels. See here for details.
Probably not that helpful for OP any longer, but it might be for others in the same boat.
Evaluating Various Implicit Factors in E-commerce
Modelling User Preferences from Implicit Preference Indicators via Compensational Aggregations
If anyone knows more papers/methods, please share as I'm currently looking for state of the art approaches to this problem. Thanks in advance.
You typically use a sum of clicks, or some weighted sum of events, as a "score" for each user-item pair in implicit feedback systems. It's not a rating, and that's more than a semantic distinction. You won't get good results if you feed these values into a process that's expecting rating-like and trying to minimize a squared-error loss.
You treat 3 clicks as adding 3 times the value of 1 click to the user-item interaction strength. Other events, like a purchase, might be weighted much more highly than a click. But in the end it also adds to a sum.

Binary values in Collaborative Filtering

Can the values in User-Item matrix be binary values like 0 and 1 which indicate “didn’t buy”-vs-“bought”?
And if apply latent factor model on the matrix, can the predicted value (for example 0.8) stand for the probability of user's behavior(i.e. didn’t buy or bought)?
Yes, it is quite common to have implicit feedback to represent ratings. One slight pitfall with the suggestion you made would be if 0 means the user saw the item but chose not to buy it, or the user never even saw the item (i.e gave no feedback.)
Typically the value output from your recommendation algorithm isn't a probability of a purchase, but rather a numerical score used to rank that item versus all other potential items. This way you can identify the top X items to recommend to a user.
You can use standard collaborative filtering on the type of data you discussed, and also using factorisation techniques.

Algorithm for most recently/often contacts for auto-complete?

We have an auto-complete list that's populated when an you send an email to someone, which is all well and good until the list gets really big you need to type more and more of an address to get to the one you want, which goes against the purpose of auto-complete
I was thinking that some logic should be added so that the auto-complete results should be sorted by some function of most recently contacted or most often contacted rather than just alphabetical order.
What I want to know is if there's any known good algorithms for this kind of search, or if anyone has any suggestions.
I was thinking just a point system thing, with something like same day is 5 points, last three days is 4 points, last week is 3 points, last month is 2 points and last 6 months is 1 point. Then for most often, 25+ is 5 points, 15+ is 4, 10+ is 3, 5+ is 2, 2+ is 1. No real logic other than those numbers "feel" about right.
Other than just arbitrarily picked numbers does anyone have any input? Other numbers also welcome if you can give a reason why you think they're better than mine
Edit: This would be primarily in a business environment where recentness (yay for making up words) is often just as important as frequency. Also, past a certain point there really isn't much difference between say someone you talked to 80 times vs say 30 times.
Take a look at Self organizing lists.
A quick and dirty look:
Move to Front Heuristic:
A linked list, Such that whenever a node is selected, it is moved to the front of the list.
Frequency Heuristic:
A linked list, such that whenever a node is selected, its frequency count is incremented, and then the node is bubbled towards the front of the list, so that the most frequently accessed is at the head of the list.
It looks like the move to front implementation would best suit your needs.
EDIT: When an address is selected, add one to its frequency, and move to the front of the group of nodes with the same weight (or (weight div x) for courser groupings). I see aging as a real problem with your proposed implementation, in that it requires calculating a weight on each and every item. A self organizing list is a good way to go, but the algorithm needs a bit of tweaking to do what you want.
Further Edit:
Aging refers to the fact that weights decrease over time, which means you need to know each and every time an address was used. Which means, that you have to have the entire email history available to you when you construct your list.
The issue is that we want to perform calculations (other than search) on a node only when it is actually accessed -- This gives us our statistical good performance.
This kind of thing seems similar to what is done by firefox when hinting what is the site you are typing for.
Unfortunately I don't know exactly how firefox does it, point system seems good as well, maybe you'll need to balance your points :)
I'd go for something similar to:
NoM = Number of Mail
(NoM sent to X today) + 1/2 * (NoM sent to X during the last week)/7 + 1/3 * (NoM sent to X during the last month)/30
Contacts you did not write during the last month (it could be changed) will have 0 points. You could start sorting them for NoM sent in total (since it is on the contact list :). These will be showed after contacts with points > 0
It's just an idea, anyway it is to give different importance to the most and just mailed contacts.
If you want to get crazy, mark the most 'active' emails in one of several ways:
Last access
Frequency of use
Contacts with pending sales
Direct bosses
Etc
Then, present the active emails at the top of the list. Pay attention to which "group" your user uses most. Switch to that sorting strategy exclusively after enough data is collected.
It's a lot of work but kind of fun...
Maybe count the number of emails sent to each address. Then:
ORDER BY EmailCount DESC, LastName, FirstName
That way, your most-often-used addresses come first, even if they haven't been used in a few days.
I like the idea of a point-based system, with points for recent use, frequency of use, and potentially other factors (prefer contacts in the local domain?).
I've worked on a few systems like this, and neither "most recently used" nor "most commonly used" work very well. The "most recent" can be a real pain if you accidentally mis-type something once. Alternatively, "most used" doesn't evolve much over time, if you had a lot of contact with somebody last year, but now your job has changed, for example.
Once you have the set of measurements you want to use, you could create an interactive apoplication to test out different weights, and see which ones give you the best results for some sample data.
This paper describes a single-parameter family of cache eviction policies that includes least recently used and least frequently used policies as special cases.
The parameter, lambda, ranges from 0 to 1. When lambda is 0 it performs exactly like an LFU cache, when lambda is 1 it performs exactly like an LRU cache. In between 0 and 1 it combines both recency and frequency information in a natural way.
In spite of an answer having been chosen, I want to submit my approach for consideration, and feedback.
I would account for frequency by incrementing a counter each use, but by some larger-than-one value, like 10 (To add precision to the second point).
I would account for recency by multiplying all counters at regular intervals (say, 24 hours) by some diminisher (say, 0.9).
Each use:
UPDATE `addresslist` SET `favor` = `favor` + 10 WHERE `address` = 'foo#bar.com'
Each interval:
UPDATE `addresslist` SET `favor` = FLOOR(`favor` * 0.9)
In this way I collapse both frequency and recency to one field, avoid the need for keeping a detailed history to derive {last day, last week, last month} and keep the math (mostly) integer.
The increment and diminisher would have to be adjusted to preference, of course.

Resources