Multilevel model repeated measures pre/post - longitudinal

I have a multilevel model repeated measures dataset, but I'm not sure it's appropriate to run it as such.
My professor says I can't do RM because I only have two time points -- he says it's a measurement model.
I have 3 groups, and each participant completed a task twice (pre/post). I want to look at change over time in the pre and post data.
Without using a difference score, is it possible to use MLM to look at change over time?
Level 1: measurements from the task (reaction time and accuracy); 4 DVs from pre, 4 DVs from post
Level 2: participants, who belong to one of 3 groups (1 control, 2 experimental)

VW contextual bandits: historical data and online learning

I'd like to test CB for an e-commerce task: personal offer recommendations (like "last chance to buy", "similar positions", "consumers recommend", "bestsellers", etc.). My task is to order them (more relevant offers appear higher in the list of recommendations).
So, there are 5 possible offers.
I have some historical data collected without using any model: context (user and web-session features), action id (one of my 5 offers), reward (1 if the user clicked this offer, 0 if not clicked). So I have N users and 5 offers with known rewards, 5*N rows in total in my historical data.
Ex:
1:1:1 | user_id:1 f1:... f2:...
2:-1:1 | user_id:1 f1:... f2:...
3:-1:1 | user_id:1 f1:... f2:...
This means that user 1 has seen 3 offers (1, 2, 3); the cost of offer 1 is equal to 1 (didn't click), and the user clicked on offers 2 and 3 (cost is negative -> reward is positive). Probabilities are equal to 1, since all offers were shown and we know the rewards.
The global task is to increase CTR. I'd like to use this data for training a CB model and then improve the model with exploration/exploitation policies. I set the probabilities equal to 1 in this data (is that right?). But next I'd like to set the order of offers according to the rewards.
Should I use warm start in VW CB for this? Will this work correctly with data collected without using CB? Maybe you can advise more relevant methods in CB for this data and task?
Thanks a lot.
If there are only 5 possible offers and if (as indicated) you have data of the form "I have N users and 5 offers with known rewards, 5*N rows in total in my historical data", then your historical data is supervised multilabel data and the warm-start functionality would apply; make sure you use the cost-sensitive version to accommodate the multilabel aspect of your historical data (i.e., there is more than one offer that would result in a click).
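For instance (an illustrative line, not taken from the original data), a cost-sensitive training example listing all 5 offers could give cost 0 to the offers that were clicked and cost 1 to the others:
1:1 2:0 3:0 4:1 5:1 | user_id:1 f1:... f2:...
Here offers 2 and 3 were clicked (cost 0) and offers 1, 4, and 5 were not (cost 1).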
Will this work correctly with data collected without using CB?
Because every action-reward pair is specified for every user in the data set, you only have to ensure that the sample of users is representative of the population you care about.
Maybe you can advise more relevant methods in CB for this data and task?
The first paragraph started with "if" because the more typical case is that 1) there are many possible offers and 2) users have only seen a few of them historically.
In that case what you have is a combination of a degenerate logging policy and multiple rewards being revealed. If there are k possible actions but each user has only seen n <= k of them historically, then you could try making n lines for each user as you did. Theoretically this does not necessarily work, but in practice it might help.
Out of the box: change the data
If the data you have was collected as the result of running an existing policy, then an alternative would be to start randomizing the decisions made by that system in order to collect a dataset which conforms to CB. For example, use your current system to pick the "best" action 96% of the time, and one of the other 4 actions at random 4% of the time; log the probability along with the reward (either 0.96 or 0.01, depending upon whether the shown action was the considered best), and then set up a proper CB-style training set for vw. With this you can also counterfactually estimate the value of both your current policy and the policy vw generates, and only switch to vw when it is winning.
The fastest way to implement the last paragraph is to just start using APS.
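As a minimal sketch of that randomization step (Python; pick_best_offer and the feature string are hypothetical stand-ins for your current system, not VW API calls):

import random

OFFERS = [1, 2, 3, 4, 5]
EPSILON = 0.04  # explore 4% of the time

def pick_best_offer(context):
    # placeholder for the existing (non-CB) recommendation logic
    return 1

def choose_and_log(context):
    best = pick_best_offer(context)
    if random.random() < EPSILON:
        action = random.choice([o for o in OFFERS if o != best])
        prob = EPSILON / (len(OFFERS) - 1)  # 0.01 with 5 offers
    else:
        action = best
        prob = 1.0 - EPSILON                # 0.96
    return action, prob

def vw_cb_line(action, cost, prob, features):
    # VW contextual-bandit format: action:cost:probability | features
    return "%d:%g:%g | %s" % (action, cost, prob, features)

# e.g. offer 3 was shown with probability 0.01 and was clicked (cost -1):
print(vw_cb_line(3, -1, 0.01, "user_id:1 f1:... f2:..."))

Lines logged in this form can then be trained with vw --cb 5, or with --cb_explore 5 once you move to online exploration.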

Deep learning vocabulary: images/second and step time?

There's a table at the bottom of
https://www.tensorflow.org/performance/performance_guide#optimizing_for_cpu that talks about images per second and step time, in the context of performance in deep learning.
How does one compute images/second and step time?
When training in Tensorflow using the Estimator API, there's a number reported as global_step/sec. Is this the same thing? If so, does that take into account the time required to process the input pipeline, or just the time it takes to do the forward pass through the model?
global_step/sec means how many steps per second your TensorFlow model is doing. A step is usually a minibatch. So the inverse of global_step/sec is your step time, and batch_size * global_step/sec is your number of images per second.
Because these numbers are throughput numbers computed on the steady state of your model training, they include the input pipeline (i.e. if your input pipeline cannot produce X minibatches per second, then you necessarily have to run at less than X global_step/sec).
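For example (illustrative numbers, not from the guide): with a batch size of 32 and 5 reported global_step/sec, the step time is 0.2 s and the throughput is 160 images/second:

batch_size = 32              # images per minibatch
global_step_per_sec = 5.0    # as reported in the training logs

step_time = 1.0 / global_step_per_sec              # 0.2 seconds per step
images_per_sec = batch_size * global_step_per_sec  # 160 images per second
print(step_time, images_per_sec)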

Interpreting basic output from Vowpal Wabbit

I had a couple of questions about the output from a simple run of VW. I have read around the internet and the wiki sites but am still unsure about a couple of basic things.
I ran the following on the boston housing data:
vw -d housing.vm --progress 1
where the housing.vm file is set up as (partially):
and output is (partially):
Question 1:
1) Is it correct to think about the average loss column as the following steps:
a) predict zero, so the first average loss is the squared error of the first example (with the prediction as zero)
b) build a model on example 1 and predict example 2. Average the now 2 squared losses
c) build a model on example 1-2 and predict example 3. Average the now 3 squared losses
d) ...
Do this until you hit the end of the data (assuming a single pass)
2) What is the current features column? It appears to be the number of non-zero features + an intercept. What is shown in the example suggests that a feature is not counted if it is zero - is that true? For instance, the second record has a value of zero for 'ZN'. Does VW really treat that numeric feature as missing?
Your statements are basically correct. By default, VW does online learning, so in step c it takes the current model (weights) and updates it with the current example (rather than learning from all the previous examples again).
As you supposed, the current features column is the number of (non-zero) features for the current example. The intercept feature is included automatically, unless you specify --noconstant.
There is no difference between a missing feature and a feature with zero value. Both mean that you won't update the corresponding weight.
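For example (made-up values, not from the housing file), VW treats these two lines identically, because ZN:0 contributes nothing to the prediction or to the weight update:

22.0 | CRIM:0.1 ZN:0
22.0 | CRIM:0.1

Per the above, both lines count the same number of features and update the same weights (CRIM and the constant).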

Best practices for overall rating calculation

I have a LAMP-based business application, SugarCRM to be more precise. There are 120+ active users at the moment. Every day each user generates some records that are used in a complex calculation to get a so-called “individual rating”.
It takes about 6 seconds to calculate one “individual rating” value, and that was not a big problem before: each user hits the provided link to start the “individual rating” calculation, waits for 6-7 seconds, and gets the value displayed.
But now I need to implement an “overall rating” calculation. That means that in addition to the “individual rating” I have to calculate and display to the user:
minimum individual rating among ALL the users of the application
maximum individual rating among ALL the users of the application
the current user's position among all individual ratings.
Say the current user has an individual rating of 220 points, the minimum rating is 80, the maximum is 235, and he is in 23rd position among all users.
What are (imho) the main problems to be solved?
If one calculation takes 6 seconds, then the overall calculation will take more than 10 minutes. I think it’s no good to make the application almost inaccessible for this period. And what if the number of users grows 2-3 times in the near future?
Those calculations could be done as a nightly job, but the users are in different timezones. In Russia the difference between the extreme timezones is 9 hours, so people in the western part of Russia are still working “today” while people in the eastern part are already waking up to work “tomorrow”. So what is the best time for a nightly job in this case?
Are there any best practices / approaches / algorithms to build such a rating system?
Given only the information provided, the only options I see:
The obvious one - reduce the time taken for a rating calculation (6 seconds to calculate 1 user's rating seems like a lot)
If possible, have intermediate values of which you only recalculate some, as required (for example, have 10 values that make up the rating, all based on different data; when some of the data changes, flag the appropriate values for recalculation - see the sketch after this list). Either do this recalculation:
During your daily recalculation or
When the update happens
Partial batch calculation - only recalculate x of the users' ratings at chosen intervals (where x is some chosen value) - has the disadvantage that, at all times, some of the ratings can be out of date
Calculate if not busy - either continuously recalculate ratings or only do so at a chosen interval, but instead of locking the system, have it run as a background process, only doing work if the system is idle
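A rough sketch of the "intermediate values" idea from option 2 (in Python for brevity; the real application is PHP/SugarCRM, and all names here are hypothetical):

# Cache per-user sub-scores and only recompute the ones whose underlying
# data has changed since the last calculation.
components = {}   # (user_id, component_name) -> cached sub-score
dirty = set()     # components flagged for recalculation

def on_data_change(user_id, component_name):
    # call this whenever a record that feeds the component is created/updated
    dirty.add((user_id, component_name))

def compute_component(user_id, component_name):
    return 0.0    # placeholder for the expensive per-component query/logic

def individual_rating(user_id, component_names):
    total = 0.0
    for name in component_names:
        key = (user_id, name)
        if key in dirty or key not in components:
            components[key] = compute_component(user_id, name)
            dirty.discard(key)
        total += components[key]
    return total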
(Sorry, I didn't manage to post this as a "long" comment, so I decided to post it as an answer.)
@Dukeling
The SQL query that takes almost all of the calculation time mentioned above is just a replication of business logic that should be executed in PHP code. The logic was moved into SQL in the hope of reducing the calculation time. OK, I’ll try both optimizing the SQL query and running the logic in PHP code.
Suppose that after optimization the application calculates an individual rating in just 1 second. Great! But even in this case the first user logged into the system would have to wait 120 seconds (120+ users * 1 sec = 120 sec) for the overall rating to be calculated and to get his position in it.
I’m thinking of implementing the following approach:
Let’s have 2 “overall ratings” – “today” and “yesterday”.
For display purposes we’ll use the “yesterday” overall rating, represented as a huge, already sorted PHP array.
When a user hits the calculation link he starts the “today” calculation, but the application displays the “yesterday” value to him. Thus we have a quickly accessible “yesterday” rating, and each user effectively launches the rating calculation that will be displayed to them tomorrow.
The user list is partitioned by timezone. Each hour a cron job starts and checks whether there are any users in the selected timezone who don’t have a “today” individual rating calculated yet (e.g. the user didn’t log into the application). If so, the application starts the calculation of the individual rating and puts its value into the (still invisible) “today” overall rating array. Thus we have a cron job that runs nightly for each timezone-specific user group and fills the probable gaps in case users didn’t log into the system.
After all users in all timezones have been processed, the application
sorts the “today” array,
drops the “yesterday” one,
renames “today” to “yesterday” and
initializes a new “today”.
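A minimal sketch of that “today”/“yesterday” double buffer (Python instead of PHP, with hypothetical names), just to make the flow concrete:

ratings = {"yesterday": [], "today": {}}  # "yesterday": sorted list of (rating, user_id)

def display_overall(user_id):
    # reads always hit the pre-sorted "yesterday" snapshot, so they are cheap
    snapshot = ratings["yesterday"]
    if not snapshot:
        return None  # no snapshot yet (first day)
    values = [r for r, _ in snapshot]
    position = next((i + 1 for i, (_, uid) in enumerate(snapshot) if uid == user_id), None)
    return {"min": min(values), "max": max(values), "position": position}

def record_today(user_id, individual_rating):
    # filled in when the user hits the link, or by the hourly per-timezone cron job
    ratings["today"][user_id] = individual_rating

def rollover():
    # run once all timezones have been processed
    today = sorted(((r, uid) for uid, r in ratings["today"].items()), reverse=True)
    ratings["yesterday"] = today   # becomes the new display snapshot
    ratings["today"] = {}          # start collecting tomorrow's values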
What do you think of it? Is it reasonable enough or not?

Cross Validation in Weka

I've always thought from what I read that cross validation is performed like this:
In k-fold cross-validation, the original sample is randomly partitioned into k subsamples. Of the k subsamples, a single subsample is retained as the validation data for testing the model, and the remaining k − 1 subsamples are used as training data. The cross-validation process is then repeated k times (the folds), with each of the k subsamples used exactly once as the validation data. The k results from the folds then can be averaged (or otherwise combined) to produce a single estimation.
So k models are built and the final one is the average of those.
In the Weka guide it is written that each model is always built using ALL of the data set. So how does cross-validation in Weka work? Is the model built from all the data, and does "cross-validation" mean that k folds are created, the model is evaluated on each fold, and the final output is simply the averaged result from the folds?
So, here is the scenario again: you have 100 labeled data
Use training set
weka will take 100 labeled data
it will apply an algorithm to build a classifier from these 100 data
it applies that classifier AGAIN on these 100 data
it provides you with the performance of the classifier (applied to the same 100 data from which it was developed)
Use 10 fold CV
Weka takes 100 labeled data
it produces 10 equal sized sets. Each set is divided into two groups: 90 labeled data are used for training and 10 labeled data are used for testing.
it produces a classifier with an algorithm from 90 labeled data and applies that on the 10 testing data for set 1.
It does the same thing for sets 2 to 10 and produces 9 more classifiers
it averages the performance of the 10 classifiers produced from the 10 equal-sized (90 training and 10 testing) sets
Let me know if that answers your question.
I would have answered in a comment but my reputation still doesn't allow me to:
In addition to Rushdi's accepted answer, I want to emphasize that the models which are created for the cross-validation fold sets are all discarded after the performance measurements have been carried out and averaged.
The resulting model is always based on the full training set, regardless of your test options. Since M-T-A was asking for an update to the quoted link, here it is: https://web.archive.org/web/20170519110106/http://list.waikato.ac.nz/pipermail/wekalist/2009-December/046633.html/. It's an answer from one of the WEKA maintainers, pointing out just what I wrote.
I think I figured it out. Take (for example) weka.classifiers.rules.OneR -x 10 -d outmodel.xxx. This does two things:
It creates a model based on the full dataset. This is the model that is written to outmodel.xxx. This model is not used as part of cross-validation.
Then cross-validation is run. Cross-validation involves creating (in this case) 10 new models, with training and testing on segments of the data as has been described. The key is that the models used in cross-validation are temporary and only used to generate statistics. They are not equivalent to, nor used for, the model that is given to the user.
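A small illustration of that flow (plain Python with scikit-learn standing in for Weka and random stand-in data, just to show which models are kept):

import numpy as np
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = np.random.rand(100, 4), np.random.randint(0, 2, 100)  # stand-in data

# 1) cross-validation: 10 temporary models, used only for the statistics
scores = []
for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=1).split(X):
    fold_model = DecisionTreeClassifier().fit(X[train_idx], y[train_idx])
    scores.append(accuracy_score(y[test_idx], fold_model.predict(X[test_idx])))
    # fold_model is discarded here; it is never the model handed back
print("cross-validated accuracy estimate:", sum(scores) / len(scores))

# 2) the model actually returned (what -d outmodel.xxx saves in Weka)
final_model = DecisionTreeClassifier().fit(X, y)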
Weka follows the conventional k-fold cross-validation you mentioned here. You have the full data set, then divide it into k equal sets (k1, k2, ..., k10, for example, for 10-fold CV) without overlaps. Then in the first run, take k1 to k9 as the training set and develop a model. Use that model on k10 to get the performance. Next take k1 to k8 and k10 as the training set, develop a model from them and apply it to k9 to get the performance. In this way, use all the folds, where each fold is used exactly once as the test set.
Then Weka averages the performances and presents that on the output pane.
Once we've done the 10-fold cross-validation by dividing the data into 10 segments, building a decision tree on each training split and evaluating it, what Weka does is run the algorithm an eleventh time on the whole dataset. That then produces a classifier that we might deploy in practice. We use 10-fold cross-validation in order to get an evaluation result and an estimate of the error, and then finally we run the learning algorithm one more time on the whole dataset to get the actual classifier to use in practice.
During the k cross-validation runs we get a different decision tree each time, but the final one is created on the whole dataset. CV is used to see whether we have an overfitting or large-variance issue.
According to "Data Mining with Weka" at The University of Waikato:
Cross-validation is a way of improving upon repeated holdout.
Cross-validation is a systematic way of doing repeated holdout that actually improves upon it by reducing the variance of the estimate.
We take a training set and we create a classifier
Then we’re looking to evaluate the performance of that classifier, and there’s a certain amount of variance in that evaluation, because it’s all statistical underneath.
We want to keep the variance in the estimate as low as possible.
Cross-validation is a way of reducing the variance, and a variant on cross-validation called “stratified cross-validation” reduces it even further.
(In contrast to the “repeated holdout” method, in which we hold out 10% for testing and we repeat that 10 times.)
So how does cross-validation in Weka work?
With cross-validation, we divide our dataset just once, but into k pieces, for example 10 pieces. Then we take 9 of the pieces and use them for training, and the last piece we use for testing. Then with the same division, we take another 9 pieces and use them for training and the held-out piece for testing. We do the whole thing 10 times, using a different segment for testing each time. In other words, we divide the dataset into 10 pieces, and then we hold out each of these pieces in turn for testing, train on the rest, do the testing and average the 10 results.
That would be 10-fold cross-validation. Divide the dataset into 10 parts (these are called “folds”);
hold out each part in turn;
and average the results.
So each data point in the dataset is used once for testing and 9 times for training.
That’s 10-fold cross-validation.
