Qualtrics survey: draw question stimuli from answers in a previous wave

I'm trying to set up two Qualtrics surveys, where participants take part in both surveys across two waves, and where in the second wave they respond to stimuli drawn from other participants' responses to the first wave. To make things concrete, an example could be: everyone submits a paragraph of text in the first wave, and in the second wave the same group of participants rates the creativity of the text submitted by three participants other than themselves.
This question is somewhat similar to "How do you display previous answers of a DIFFERENT survey participant in qualtrics survey?", but not identical, because I would be working with two waves. I hope this means a solution is possible, for example one based on assigning all participants a sequential participation ID before the survey starts and then retrieving answers from the previous wave, or something of the kind.
As this is for a large group of participants, I need to rule out solutions that require a lot of manual data handling.
Any help, ideas, or even example projects would be highly appreciated. Also, if you believe solutions other than Qualtrics would work better, I would be open to those as well.
Thanks in advance,
Bart

Related

Many small models vs. one big model

I am currently implementing a kind of questionnaire with a chatbot and am using LUIS to interpret the answers. This questionnaire is divided into segments of maybe 20 questions.
Now the following question came to mind: Which option is better?
I make one LUIS model per question. Since these questions can include implicit subquestions (e.g. "Do you want to pay all at once or in rates" could include a "How high should these rates be?"), I end up with maybe 3-5 intents per question (including None).
I make one model per segment. Let's assume that this is possible and fits within the limit of 80 intents per model.
My first intuition was that the first alternative is better, since it should be way more robust. When there are only 5 intents to choose from, it may be easier to determine the right one. As far as I know, there is no restriction on how many models you can have (...is this correct?).
So here is my question for SO: what other benefits/drawbacks are there, and is there perhaps an option that is objectively the best?
You can have as many models as you want; there is no limit on this. But on to the rest of your question:
You intend to use LUIS to interpret every response? I'm curious as to the actual design of the questionnaire and why you need (or want) open ended responses and not multiple-choice questions. "Do you want to pay all at once or in rates" itself is a binary question. Branching off of this, users might respond with, "Yes I want to pay all at once", which could use LUIS. Or they could respond with, "rates" which could be one of two choices available to the user in a Prompt/FormFlow. "rates" is also much shorter than the first answer and thus a selection that would probably be typed more often than not.
Multiple-choice questions provide a standardized input which would reduce the amount of work you'd have to do in managing your data. It also would most likely reduce the amount of effort needed to maintain the models and questionnaire.
Objectively speaking, one model is more likely to be less work, but we can drill down a little further:
First option:
If your questionnaire segments include 20 questions and you have 2 segments, you have 40 individual models to maintain, which is a nightmare.
Additionally, you might experience latency issues depending on your recognizer order, because you have to wait for a response from 40 endpoints. That said, it IS possible to turn off recognizers, so you might only need to wait for one recognizer, but then you need to manually turn on the next recognizer and turn off the previous one. You should also be aware that handling multiple "None" intents is a nightmare, in case you wish to leave every recognizer active.
I'm assuming that you'll want assistance in managing your models after you realize the pain of handling forty of them by yourself. You can add collaborators, but then you need to add them to multiple models as well. One day you'll (probably) have to remove them from all of the models they have collaborator status on.
The first option IS more robust but also involves a rather extreme amount of work hours. You're partially right in that fewer intents is helpful because of fewer possibilities for the model to predict. But the predictions of your models become more accurate with more utterances and labeling, so any bonus gained by having 5 intents per model is most likely to be rendered moot.
Second option:
Using one model per segment, as mentioned above, is less work. It's less work to maintain, but what are some downsides? Until you train your model well enough, there may indeed be some unintended consequences due to false-positive predictions. That said, you could account for that in your questionnaire/bot/questionnaire-bot's code by specifically looking for the expected intents for the question and then using the highest-scoring intent from this subset if the overall highest-scoring intent doesn't match your question.
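For instance, a minimal sketch of that fallback (assuming the LUIS result has already been reduced to a plain dict of intent name to score; the intent names and threshold below are made up):

    # Hypothetical sketch: prefer the intents the current question expects
    # when the overall top-scoring intent is not one of them.
    def pick_intent(scores, expected_intents, threshold=0.3):
        top_intent = max(scores, key=scores.get)
        if top_intent in expected_intents:
            return top_intent
        # Fall back to the best intent among those this question expects.
        subset = {name: s for name, s in scores.items() if name in expected_intents}
        if subset:
            best = max(subset, key=subset.get)
            if subset[best] >= threshold:
                return best
        return "None"

    # e.g. pick_intent({"Smalltalk": 0.6, "PayInRates": 0.5, "None": 0.1},
    #                  {"PayAllAtOnce", "PayInRates"})  ->  "PayInRates"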
Another downside is that if it's one model and a collaborator makes a catastrophic mistake, it affects the entire segment. With multiple models, the mistake would only affect the one question/model, so that's a plus.
Aside from not having to deal with multiple None-intent handling, you can quickly label utterances that should belong to the None intent. What you label as an intent in a singular model essentially makes it stand out more against the other intents inside of the model. If you have multiple models, an answer that triggers a specific intent in one model needs to trigger the None intent in your other models, otherwise, you'll end up with multiple high scoring intents (and the relevant/expected intents might not be the highest scoring).
End:
I recommend the second option, simply because it's less work. Also, I'm not sure of the questionnaire's goals, but as a general rule, I question the need to put AI where it's not needed. Here is a link that talks about factors that do not contribute to a bot's success (note that Natural Language is one of these factors).

Comparing Two Images / Matching a Complex Gesture

Can anyone give some advice on the thought process of matching complex patterns? For example, suppose I show the user a treble clef and they draw the pattern.
I need to generate a number representing how close the drawn line was to the source. For my example, I want to ignore speed, pauses, or other time-based elements; just the final product matters.
I have been racking my brain and Googling but have not seen any example that helps.
I searched for "Computer Vision Gesture Recognition" and found a student review paper that surveys the field and may give you some helpful references.

How to implement this recommendation algorithm?

Most recommendation algorithm articles I've read are focused on the Netflix model where users rate items. What I want to do is slightly different (I think).
Let's say that instead, I want to create a site where a user is presented with two pictures of cars. The user can then select which car they like better. The user can repeat this process as many times as s/he likes, but hopefully as they continue, the pictures become more and more refined towards what the user likes.
How would you implement this algorithm? It seems like one possible way would simply be to implement an Elo rating algorithm and use the order of those results as a "rating", but that has serious flaws in that multiple items can't be given a maximum rating (which the user may have done if given the ability to rate the items themselves).
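For concreteness, a minimal Elo-style update for pairwise picks might look like this (the K factor and the starting rating of 1500 are arbitrary illustrative choices):

    # Elo-style update when the user prefers one car over another.
    def expected_score(rating_a, rating_b):
        return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

    def update_elo(rating_winner, rating_loser, k=32):
        exp_win = expected_score(rating_winner, rating_loser)
        new_winner = rating_winner + k * (1 - exp_win)
        new_loser = rating_loser + k * (0 - (1 - exp_win))
        return new_winner, new_loser

    ratings = {"car_a": 1500.0, "car_b": 1500.0}
    # The user preferred car_a over car_b:
    ratings["car_a"], ratings["car_b"] = update_elo(ratings["car_a"], ratings["car_b"])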
Another method, which seems more promising to me, would be to predetermine the general properties of each vehicle (e.g. color, body type, 2 door vs 4 door, etc.) and use those to get a general idea of the properties each user likes and base recommendations off of that.
I'll take a stab at this.
Suppose that each car is given a set of properties. If this set of properties were coded as a vector, one potential method of recommendation would be to use Self-Organizing Maps (SOM). The basic gist of a SOM is that it is a categorizer of input vectors. If you train a SOM with input vectors representing distinct classes of input, the SOM will start to cluster its storage vectors around each class of input. Note that the original input vectors are not retained. To train a SOM with an input vector, the best-matching vector currently in the SOM is picked and then the area around that vector becomes more like the input. Of course, see Wikipedia: http://en.wikipedia.org/wiki/Self-organizing_map.
So how does this apply to this situation? Well, one SOM could be trained on images the person does like and another on the ones they don't like. Even if there is no single style that they like, clusters should form around cars they like/don't like. Then seeing whether they would like a car they haven't yet picked is a matter of finding how well it matches the groups formed from their likes and dislikes. Note that in this case, it would be best to pair up cars that are dissimilar to each other, or more likely not to both be liked.
When the person first joins the site, it may be advantageous to allow them to pick a few likes and dislikes right off the bat to seed the SOMs.
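As a very rough illustration (not production SOM code), assuming each car has already been encoded as a numeric feature vector, and with the neighbourhood update reduced to just the best matching unit:

    import numpy as np

    def train_som(nodes, inputs, learning_rate=0.1, epochs=10):
        """Toy SOM-style training: pull the best matching node toward each input.

        A real SOM would also update the neighbourhood around the winner and
        decay the learning rate over time; both are omitted here for brevity.
        """
        for _ in range(epochs):
            for x in inputs:
                distances = np.linalg.norm(nodes - x, axis=1)
                bmu = np.argmin(distances)              # best matching unit
                nodes[bmu] += learning_rate * (x - nodes[bmu])
        return nodes

    # One map trained on "liked" car vectors, another on "disliked" ones; a new
    # car is then scored by how close it falls to each map.
    liked = train_som(np.random.rand(4, 3),
                      np.array([[1.0, 0.0, 0.8], [0.9, 0.1, 0.7]]))
    disliked = train_som(np.random.rand(4, 3),
                         np.array([[0.1, 0.9, 0.2], [0.2, 0.8, 0.3]]))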
Good luck!
Maybe it's a bit too late to answer, but you might want to check this out. It's about an MIT professor who argues that 5-star ratings, like-ratings, etc. don't work; he proposes an algorithm that works with input given in pairs, just as you suggest (Car A or Car B).
The algorithm is quite complex, but it can be found at the link.

algorithms to evaluate user responses

I'm working on a web application which will be used for classifying photos of automobiles. The users will be presented with photos of various vehicles, and will be asked to answer a series of questions about what they see. The results will be recorded to a database, averaged, and displayed.
I'm looking for algorithms to help me identify users who frequently don't vote with the group, indicating that they're probably either not paying attention to the photos or lying about what they see. I then want to exclude these users and recalculate the results, such that I can say, with a known amount of confidence, that this particular photo shows a vehicle that is this and that.
This question goes out to all you computer science folks: where can I find such algorithms, or how can I get the theoretical background to design them myself? I'm assuming I'm going to have to learn some probability and statistics, maybe some data mining. Some book recommendations would be great. Thanks!
P.S. These are multiple choice questions.
All of these are good suggestions. Thank you! I wish there was a way on stack overflow to select multiple correct answers so more of you could be acknowledged for your contributions!!
Read The Elements of Statistical Learning; it is a great compendium on data mining.
You may be especially interested in unsupervised algorithms, for example clustering. Assuming that most people do not lie, the biggest cluster is right and the rest are wrong. Mark people accordingly, then apply some Bayesian statistics and you'll be done.
Of course, most data mining techniques are pretty experimental, so don't count on them always being right... or even being right in most cases.
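As a rough sketch of the "biggest cluster wins" idea, using scikit-learn's KMeans on numerically coded answers (two clusters is just an assumption for illustration):

    import numpy as np
    from sklearn.cluster import KMeans

    # Each row is one user's answers, coded as the index of the option they
    # chose for each question.
    answers = np.array([
        [0, 2, 1],
        [0, 2, 1],
        [0, 1, 1],
        [3, 0, 2],   # a user answering very differently from the rest
    ])

    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(answers)
    labels = km.labels_
    majority = np.bincount(labels).argmax()           # the biggest cluster
    suspect_users = np.where(labels != majority)[0]   # everyone outside it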
I believe what you described is solved using outlier/anomaly detection.
A number of techniques exist:
statistical-based methods
distance-based methods
model-based methods
I suggest you take a look at these slides from the excellent book Introduction to Data Mining
If you know what answers you are expecting, why do you ask people to vote? By excluding some values you basically turn the vote into something that you like. Automobiles make different impressions on different individuals. If 100 people loved a car and then someone comes along and says that he/she doesn't like it, you exclude that vote?
But anyway, considering that you still want to do this, first of all you will need a large set of data from "trusted" voters. This will give you an idea of a "good" answer, and from this point you can choose the exclusion threshold.
Without an initial set of data you cannot apply any algorithm, because you will get false results. Consider just one vote of 100 on a scale from 0 to 100. The second vote is "1". Then you will exclude this vote because it is too far away from the average.
I think a pretty simple algorithm could accomplish this for you. You could try and get fancier by calculating the standard deviations and such, but I wouldn't bother.
Here's a simple approach that should be sufficient:
For each of your users, calculate the number of questions they answered and the number of times they selected the most popular answer for the question. The users with the lowest ratio of popular answers to total answers are the ones you can guess are providing bogus data.
You probably would not want to throw out data from users who have only answered a small number of questions, because they have likely just disagreed on a few rather than put in bogus data.
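Something along these lines would do it (the data layout and the minimum-answer cutoff below are assumptions for illustration):

    from collections import Counter, defaultdict

    # votes[question_id] = {user_id: chosen_option}
    def agreement_ratios(votes, min_answers=10):
        agreed = defaultdict(int)
        answered = defaultdict(int)
        for question, user_answers in votes.items():
            popular = Counter(user_answers.values()).most_common(1)[0][0]
            for user, answer in user_answers.items():
                answered[user] += 1
                if answer == popular:
                    agreed[user] += 1
        # Only score users with enough answers, so a few disagreements don't
        # get someone flagged as bogus.
        return {u: agreed[u] / answered[u]
                for u in answered if answered[u] >= min_answers}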
What kind of questions are they (Yes/No, or 1 to 10?).
You may be able to get away with not discarding anything by using a median instead of a mean. With averages, extreme outliers in the responses can skew the result, but if you use the median you may get a better answer. So, for example, if you had 5 answers, order them and pick the middle one.
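For example, with Python's statistics module:

    import statistics

    votes = [7, 8, 8, 7, 1]           # one extreme outlier
    statistics.mean(votes)            # 6.2 -- pulled down by the outlier
    statistics.median(votes)          # 7   -- unaffected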
I think what you are saying is that you are concerned that certain people are "outliers", and they are adding noise to your data, making the categorizations less reliable. So, if you have a Chevy Camaro, and most people say it is either a pony car, a muscle car, or a sports car, but you have some goofball who says it's a family sedan, you would want to minimize the impact of his vote.
One thing you could do is provide a Stack Overflow-like reputation score for users:
The more a user is "in agreement" with other users, the better his or her score would be. For a given user (User X), this could be determined by a simple calculation of what percentage of users who responded to a question chose the same category as User X, then averaging this value over all questions answered.
You may want to multiply this value by the total number of questions answered to encourage people to answer as many questions as possible. (Note: if you choose to do this, it would be equivalent to just summing the percentage agreement scores rather than averaging them.)
You could present the final reputation score to users, making sure to explain that they will be rewarded for how well their responses agree with those of other users. This will encourage people to answer more questions but also to take care in their answers.
Finally, you could calculate a certainty score for a given categorization by adding up the total reputation score of all people who chose a given category.
Some of these ideas may need some refinement, especially since I don't know your exact situation. Certainly, if people can see what other people chose before they vote, it would be way too easy to game the system.
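A rough sketch of the reputation and certainty calculations above (the data layout, a mapping of photo to user to chosen category, is an assumption):

    from collections import Counter, defaultdict

    # responses[photo_id] = {user_id: chosen_category}
    def reputation_scores(responses):
        agreements = defaultdict(list)
        for photo, answers in responses.items():
            counts = Counter(answers.values())
            total = len(answers)
            for user, choice in answers.items():
                # fraction of respondents to this photo who chose the same category
                agreements[user].append(counts[choice] / total)
        # Summing (rather than averaging) also rewards answering more questions.
        return {user: sum(vals) for user, vals in agreements.items()}

    def category_certainty(answers, scores):
        """Total reputation behind each category for a single photo."""
        certainty = defaultdict(float)
        for user, choice in answers.items():
            certainty[choice] += scores.get(user, 0.0)
        return dict(certainty)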
If you were to collect votes like "on a scale from 1 to 10, how would you rate this car", you could probably use simple average and standard deviation: the smaller the standard deviation, the more unanimous the general consensus is among your voters, and you can flag users who are e.g. 3 standard devs from the average.
For multiple choice, you need to be more careful. Simply discarding all but the most-voted option will do nothing but disgruntle the voters. You need to establish a measure of how significant the winner is w.r.t. the other options, e.g. flag users who voted for options with less than 1/3 of the winning option's count.
Note that I wrote "flag users", not discard votes. If you discard votes, you can't tell how confident you are about the result ("91% voted this to be a Ford Mustang"). If a user has more than a certain percentage of his votes flagged - well, that's up to you.
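For illustration, both flagging rules might look something like this (the thresholds are the arbitrary ones mentioned above):

    import statistics

    def flag_numeric(all_votes, user_vote, n_devs=3):
        """Flag a 1-to-10 style vote that sits far from the group consensus."""
        mean = statistics.mean(all_votes)
        stdev = statistics.stdev(all_votes)
        return stdev > 0 and abs(user_vote - mean) > n_devs * stdev

    def flag_choice(option_counts, user_choice, fraction=1 / 3):
        """Flag a multiple-choice vote for an option far behind the winner."""
        winner_count = max(option_counts.values())
        return option_counts.get(user_choice, 0) < fraction * winner_count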
Your trickiest problem, however, will probably be to collect sufficient votes. Depending on how easy the multiple choice problem is, you probably need several times the number of options as votes, per photo. Otherwise the statistics are meaningless.

Algorithms to find stuff a user would like based on other users likes

I'm thinking of writing an app to classify movies in an HTPC based on what the family members like.
I don't know statistics or AI, but the stuff here looks very juicy. I wouldn't know where to start, though.
Here's what I want to accomplish:
Compose a set of samples from each user's likes, rating each sample attribute separately. For example, maybe a user likes western movies a lot, so the western genre would carry a bit more weight for that user (and so on for other attributes, like actors, director, etc.).
A user can get suggestions based on the likes of the other users. For example, if both user A and B like Spielberg (connection between the users), and user B loves Batman Begins, but user A loathes Katie Holmes, weigh the movie for user A accordingly (again, each attribute separately, for example, maybe user A doesn't like action movies so much, so bring the rating down a bit, and since Katie Holmes isn't the main star, don't take that into account as much as the other attributes).
Basically, compare sets from user A that are similar to sets from user B, and come up with a rating for user A.
I have a crude idea about how to implement this, but I'm certain some bright minds have already thought of a far better solution already, so... any suggestions?
Actually, after some quick research, it seems a Bayesian filter would work. If so, would this be the better approach? Would it be as simple as just "normalizing" movie data, training a classifier for each user, and then just classifying each movie?
If your suggestion includes some brain-melting concepts (I'm not experienced in these subjects, especially in AI), I'd appreciate it if you also included a list of some basics for me to research before diving into the meaty stuff.
Thanks!
Matthew Podwysocki had some interesting articles on this stuff
http://codebetter.com/blogs/matthew.podwysocki/archive/2009/03/30/functional-programming-and-collective-intelligence.aspx
http://codebetter.com/blogs/matthew.podwysocki/archive/2009/04/01/functional-programming-and-collective-intelligence-ii.aspx
http://weblogs.asp.net/podwysocki/archive/2009/04/07/functional-programming-and-collective-intelligence-iii.aspx
This is similar to this question, where the OP wanted to build a recommendation system. In a nutshell, we are given a set of training data consisting of users' ratings of movies (a 1-5 star rating, for example) and a set of attributes for each movie (year, genre, actors, ...). We want to build a recommender so that it will output a possible rating for unseen movies. So the input data looks like:
    user  movie  year  genre   ...  | rating
    -----------------------------------------
    1     1      2006  action       | 5
    3     2      2008  drama        | 3.5
    ...

and for an unrated movie X:

    10    20     2009  drama        | ?
we want to predict a rating. Doing this for all unseen movies, then sorting by predicted rating and outputting the top 10, gives you a recommendation system.
The simplest approach is to use a k-nearest neighbor algorithm. Among the rated movies, search for the "closest" ones to movie X, and combine their ratings to produce a prediction.
This approach has the advantage of being very easy to implement from scratch.
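A minimal sketch of that nearest-neighbour idea, assuming each movie has been encoded as a numeric feature vector:

    import numpy as np

    def knn_predict(rated_features, ratings, new_features, k=3):
        """Predict a rating for one unseen movie from its k closest rated movies."""
        distances = np.linalg.norm(rated_features - new_features, axis=1)
        nearest = np.argsort(distances)[:k]
        return ratings[nearest].mean()

    # Each row: [year_scaled, is_action, is_drama] for movies the user has rated.
    rated = np.array([[0.6, 1, 0], [0.8, 0, 1], [0.3, 0, 1]])
    stars = np.array([5.0, 3.5, 4.0])
    knn_predict(rated, stars, np.array([0.9, 0, 1]), k=2)   # -> 3.75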
Other, more sophisticated approaches exist. For example, you can build a decision tree and fit a set of rules to the training data. You can also use Bayesian networks, artificial neural networks, support vector machines, among many others... Going through each of these won't be easy for someone without the proper background.
Still, I expect you would be using an external tool/library. Since you seem to be familiar with Bayesian networks, a simple naive Bayes net could in fact be very powerful. One advantage is that it allows for prediction under missing data.
The main idea would be somewhat the same; take the input data you have, train a model, then use it to predict the class of new instances.
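As a small sketch of that idea using scikit-learn's naive Bayes (the attributes and labels below are made up, and a tool like Weka would work just as well):

    from sklearn.naive_bayes import CategoricalNB
    from sklearn.preprocessing import OrdinalEncoder

    # Made-up training data: attributes of movies a user has rated, with the
    # rating reduced to a "liked"/"disliked" class for simplicity.
    X_train = [["action", "spielberg"],
               ["drama",  "nolan"],
               ["drama",  "spielberg"],
               ["action", "bay"]]
    y_train = ["liked", "liked", "liked", "disliked"]

    encoder = OrdinalEncoder()               # naive Bayes wants integer codes
    model = CategoricalNB().fit(encoder.fit_transform(X_train), y_train)

    # Predict the class of an unseen (genre, director) combination:
    model.predict(encoder.transform([["drama", "spielberg"]]))   # e.g. ['liked']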
If you want to play around with different algorithms in a simple, intuitive package that requires no programming, I suggest you take a look at Weka (my 1st choice), Orange, or RapidMiner. The most difficult part would be to prepare the dataset in the required format. The rest is as easy as choosing an algorithm and applying it (all in a few clicks!)
I guess for someone not looking to go into too much detail, I would recommend going with the nearest-neighbor method as it is intuitive and easy to implement. Still, the option of using Weka (or one of the other tools) is worth looking into.
There are a few algorithms that are good for this:
ARTMAP: groups via probability against each other (this isn't fast, but it's the best thing for your problem IMO).
ARTMAP holds a group of common attributes and determines the likelihood of similarity via percentages.
ARTMAP
KMeans: This separates out the vectors by how far they are from each other.
KMeans: Wikipedia
PCA: will separate the average of all the values from the varying bits. This is what you would use to do face detection and background subtraction in computer vision.
PCA
The K-nearest neighbor algorithm may be right up your alley.
Check out some of the work of the top teams for the Netflix Prize.
