Ranking questions on REDCap - matrix

We have a matrix of questions with 12 field labels (resources), and the options for this matrix run from "Strongly disagree" to "Strongly agree". I want to build the next question from the "Strongly agree" answers, asking participants to rank those variables/resources.
'The final set of questions relate to return to work resources the MDT team may have offered while you were still in the recovery unit. Please indicate how much you agree with the following statements.'
1 Help with CV
2 Apply funding .....
3
......
12 Adaptive equipment
For the following question, is it possible to pull out the variables/resources that the participant marked as "strongly agree" and ask them to rank them 1 to 5?
Thanks,
JM

If I understand correctly, you have a matrix of 12 fields with strongly agree to strongly disagree, and for each of those that are marked strongly agree, you want to have the respondent rank them from 1 through to 5.
What you can do is have another matrix field that asks respondents to select a choice between 1 and 5 for each of the 12 statements in the first matrix. Then, for each statement in the second matrix, add branching logic to only show it if the corresponding statement in the first matrix was strongly agree. Finally, set the matrix to 'Ranking'; this will only allow a single response per column (in addition to only a single response per row). This means a user may have only one 1, one 2, etc.
Here's what that looks like in the designer:
However, there are problems with this. First, there is nothing to prevent someone selecting strongly agree to more than 5 choices, meaning there will be statements that cannot be ranked. Maybe this is good; maybe you only want them to be able to rank 5 and the way to handle that is to only provide 5 columns to rank. Here's a screenshot of a record with 6 statements to rank. Notice the last cannot be ranked or it will displace another statement's ranking:
The reverse is also true: if someone answers strongly agree to fewer than 5 statements, they will still see 5 columns in which to rank them, so you might get 1, 2, 3, or you might get 1, 3, 4. This screenshot shows three statements being ranked, leaving two rank positions unfilled.
The problem is that the number of choices in a ranking matrix field cannot grow or shrink depending on the number of statements in it.

Overall rank from multiple ranked lists

I've looked through a lot of the literature available online, including this forum, without any luck, and am hoping someone can help with a statistical issue I currently face:
I have 5 lists of ranked data, each containing 10 items ranked from position 1 (best) to position 10 (worst). For the sake of context, the 10 items in each list are the same, but they appear in different ranked orders because the technique used to decide their rank is different.
Example data:

          List 1     List 2     List 3     ... etc
Item 1    Ranked 1   Ranked 2   Ranked 1
Item 2    Ranked 3   Ranked 1   Ranked 2
Item 3    Ranked 2   Ranked 3   Ranked 3
... etc
I am looking for a way to interpret and analyse the above data so that I get a final result showing the overall rank of each item based on each test and its position, e.g.
Result
Rank 1 = Item 1
Rank 2 = Item 3
Rank 3 = Item 4
... etc
Does anyone know how I can interpret this data in a statistically sound method (at a post graduate / PhD applicable level) so that I can understand the overall ranks signalling the importance of each item in the list across the 5 tests please? Or, if there is another type of technique or statistical test I can look into I would appreciate any hints or guidance.
(It may also be worth noting that I have already tried the simpler mathematical techniques such as sums, averaging, minimum-maximum tests etc., but do not feel these are statistically rigorous enough at this level.)
Any help or advice would be greatly appreciated, thank you for your time.
You can use machine learning to get your ranked list. In the Information Retrieval research field this is called Learning to Rank, and there is a wide range of literature about it. This tutorial (heads up: high level tutorial) can help you understand the basic concepts and point you to articles for going deeper.
You might also want to have a look at interleaved ranking. This was originally engineered for the evaluation of two lists, but it might also be good for your case.
A number of non-parametric statistical tests work by turning the data received into ranks and then analysing the ranks (this can make life easier if the data are very far from being normally distributed). If your ranks are plausibly derived from some underlying score or goodness that you can't observe directly, you could apply any of these tests - there is a short list at http://en.wikipedia.org/wiki/Ranking#Ranking_in_statistics or any book on non-parametric statistics, such as Conover, should cover them.
If you can come up with a statistic you are interested in, such as the total rank of any one item, you could use a Permutation Test - http://en.wikipedia.org/wiki/Resampling_%28statistics%29#Permutation_tests to work out the probability that the statistic concerned is at least as extreme as observed, under the null hypothesis that all of the rankings are simply random. You just generate loads of data that follows the null hypothesis and look at the distribution of the statistic in the randomly generated data. You can then use this to get a P-value, or, better, a confidence bound.
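As a rough illustration of the permutation idea, here is a minimal Python sketch. It assumes the statistic is an item's rank sum across the lists (lower is better) and that the null hypothesis is "every list is an independent random ranking". The function names and the three-item example data (taken from the question's table) are only for illustration.

```python
import random

def rank_sums(lists):
    """lists[l][i] is the rank of item i in list l; return each item's total rank."""
    n_items = len(lists[0])
    return [sum(lst[i] for lst in lists) for i in range(n_items)]

def permutation_p_value(lists, item, n_sim=10000, seed=0):
    """Estimate P(rank sum of `item` <= observed) when every list is a random ranking."""
    rng = random.Random(seed)
    n_items = len(lists[0])
    observed = rank_sums(lists)[item]
    hits = 0
    for _ in range(n_sim):
        simulated = []
        for _ in lists:
            perm = list(range(1, n_items + 1))
            rng.shuffle(perm)          # one random ranking per list (null hypothesis)
            simulated.append(perm)
        if rank_sums(simulated)[item] <= observed:
            hits += 1
    # add-one correction so the estimate is never exactly zero
    return (hits + 1) / (n_sim + 1)

# Ranks of Item 1, Item 2, Item 3 in List 1, List 2, List 3 from the example table
example = [[1, 3, 2], [2, 1, 3], [1, 2, 3]]
totals = rank_sums(example)
overall_order = sorted(range(len(totals)), key=lambda i: totals[i])
p_item1 = permutation_p_value(example, item=0)
```

With only three lists of three items the p-value is not small for any item; the sketch is only meant to show the mechanics, and testing every item this way would also call for a multiple-comparison correction.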

Need help to adapt a genetic algorithm to "solve" Traveling Salesman on Ruby

I just downloaded ai4r library http://ai4r.rubyforge.org/ and i am using the genetic algorithm to get a good route from multiple places, just like this:
http://ai4r.rubyforge.org/geneticAlgorithms.html
But i need to be able to set a start city.
Any clue on how to use this on a "fixed" start city?
In general, not looking at the Ruby-specific implementation, you can just remove the start city, and assume that the first edge followed is from your start city to whatever the GA produces. Just make sure that first edge is included in the cost/fitness function.
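A minimal sketch of such a fitness function, in Python rather than Ruby and independent of ai4r. The names `tour_cost`, `dist` and `return_to_start` are hypothetical, and `dist` is assumed to be a square matrix of pairwise costs.

```python
def tour_cost(start, order, dist, return_to_start=True):
    """Cost of a route that begins at `start` and then visits `order`.

    `order` is the GA individual: a permutation of every city except `start`.
    """
    route = [start] + list(order)
    # includes the first edge, from the fixed start city to the GA's first city
    cost = sum(dist[a][b] for a, b in zip(route, route[1:]))
    if return_to_start:
        cost += dist[route[-1]][start]  # close the round trip
    return cost

# Hypothetical 4-city cost matrix; city 0 is the fixed start
dist = [
    [0, 1, 4, 3],
    [1, 0, 2, 5],
    [4, 2, 0, 1],
    [3, 5, 1, 0],
]
round_trip = tour_cost(0, [1, 2, 3], dist)
one_way = tour_cost(0, [1, 2, 3], dist, return_to_start=False)
```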
The solution of a traveling salesman problem is a round trip and is therefore independent of a start city. After you have found your solution, you can take any city of the round trip as your start city.
EDIT: If you do not need to return to your start-city, you can select the end city, by removing the larger of the two distances that leave your start city. If you remove in your final solution the largest distance in the whole round-trip you get the overall shortest tour that is not a round-trip. This is likely what they did on the web page you linked (Dublin - Moscow looks to be the most expensive direction). However, note that the authors of that page used a wrong location for Vienna and Madrid seems to be off as well.
Another case where you need a start city is when you have an additional time window constraint. This constraint specifies that for each city you need to be there at a certain time; the "depot", as your start city is called in this case, has a time window that covers the whole trip. However, the TSPtw is a much more complex problem and often requires advanced genetic operators. You can also model the TSPtw as a CVRPtw (Capacitated Vehicle Routing Problem with Time Windows) if you use just one vehicle. You can try our VRP implementation in HeuristicLab to solve this problem. We have a mailing list if you require further support.
The answer you'll get from the GA to the TSP is a cycle, which means that the first city can be whichever one you desire. For example, if the answer is [3 4 2 6 1 5], and I want the first city to be 2, then I can "roll" the solution to [2 6 1 5 3 4].
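The "roll" is a plain list rotation. A small Python sketch (the function name is made up for illustration):

```python
def roll_to_start(tour, start):
    """Rotate a cyclic tour so that it begins at `start`, keeping the visiting order."""
    i = tour.index(start)
    return tour[i:] + tour[:i]

rolled = roll_to_start([3, 4, 2, 6, 1, 5], 2)  # [2, 6, 1, 5, 3, 4]
```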
That said, you can reduce the size of your problem by 1 if you fix the first city in your evaluation function. In that case the individuals must be decoded to account for this. For example, suppose you set the first city to #2 (in a 6-city problem) and you have an individual [1 2 3 4 5] (6 cities minus one). Its entries index the remaining cities [1 3 4 5 6], so the individual to evaluate is [2 1 3 4 5 6].
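One way to read that example is that each gene is a 1-based index into the list of cities that remain once the fixed start is removed. Under that assumption (which is an interpretation, not something the answer spells out), the decoding step could look like this in Python:

```python
def decode(individual, start, n_cities):
    """Map a reduced individual (n_cities - 1 genes) to a full tour beginning at `start`.

    Assumption: gene g refers to the g-th remaining city (1-based) after
    removing `start` from the cities 1..n_cities.
    """
    remaining = [c for c in range(1, n_cities + 1) if c != start]
    return [start] + [remaining[g - 1] for g in individual]

full_tour = decode([1, 2, 3, 4, 5], start=2, n_cities=6)  # [2, 1, 3, 4, 5, 6]
```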

Algorithm to get probability of reaching a goal

Okay, I'm gonna be as detailed as possible here.
Imagine the user goes through a set of 'options' he can choose. Every time he chooses, he gets, say, 4 different options. There are many more options that can appear in those 4 'slots'. Each of those has a certain definite and known probability of appearing. Not all options are equally probable to appear, and some options require others to have already been selected previously, in a complex interdependence tree (this I have already defined).
When the user chooses one of the 4, he is presented another choice of 4 options. The pool of options is defined again and can depend on what the user has chosen previously.
Among all possible 'options' that can ever appear, there are a certain select few which are special, call them KEY options.
When the program starts, the user is presented the first 4 options. For every one of those 4, the program needs to compute the total probability that the user will 'achieve' all the KEY options in a period of (variable) N choices.
e.g. if there are 4 options altogether the probability of achieving any one of them is exactly 1 since all of them appear right at the beginning.
If anyone can advise me as to what logic i should start with, I'd be very grateful.
I was thinking of counting all possible choice sequences, and counting the ones resulting in KEY options being chosen within N 'steps', but the problem is the probability is not uniform for all of them to appear, and also the pool of options changes as the user chooses and accumulates his options.
I'm having difficulty implementing the well defined probabilities and dependencies of the options into an algorithm that can give sensible total probability. So the user knows each time which of the 4 puts him in the best position to eventually acquire the KEY options.
Any ideas?
EDIT:
here's an example:
say there are 7 options in the pool. option1, ..., option7
option7 requires option6; option6 requires option4 and option5;
option1 thru 5 dont require anything and can appear immediately, with respective probabilities option1.p, ..., option5.p;
the KEY option is, say, option7;
user gets 4 randomly (but weighted) chosen options among 1-5, and the program needs to say something like:
"if you choose (first), you have ##% chance of getting option7 in at most N tries." analogous for the other 3 options.
naturally, for some low N it is impossible to get option7, and for some large N it is certain. N can be chosen but is fixed.
EDIT: So, the point here is NOT the user chooses randomly. Point is - the program suggests which option to choose, as to maximize the probability that eventually, after N steps, the user will be offered all key options.
For the above example; say we choose N = 4. so the program needs to tell us which of the first 4 options that appeared (any 4 among option1-5), which one, when chosen, yields the best chance of obtaining option7. since for option7 you need option6, and for that you need option4 and option5, it is clear that you MUST select either option4 or option5 on the first set of choices. one of them is certain to appear, of course.
Let's say we get this for the first choice {option3, option5, option2, option4}. The program then says:
if you chose option3, you'll never get option7 in 4 steps. p = 0;
if you chose option5, you might get option7, p=....;
... option2, p = 0;
... option4, p = ...;
Whatever we choose, for the next 4 options, the p's are re calculated. Clearly, if we chose option3 or option2, every further choice has exactly 0 probability of getting us to option7. But for option4 and option5, p > 0;
Is it clearer now? I don't know how to get these probabilities p.
This sounds like a moderately fiddly Markov chain type problem. Create a node for every state; a state has no history, and is just dependent on the possible paths out of it (each weighted with some probability). You put a probability on each node, the chance that the user is in that state, so, for the first step, there will be a 1 on his starting node and 0 everywhere else. Then, according to which nodes are adjacent and the chances of getting to them, you iterate to the next step by updating the probabilities on each vertex. So, you can easily calculate which states the user could land on in, say, 15 steps, and the associated probabilities. If you are interested in asymptotic behaviour (what would happen if he could play forever), you make a big pile of linear simultaneous equations and just solve them directly, or using some tricks if your tree or graph has a neat form. You often end up with cyclical solutions, where the user could get stuck in a loop, and so on.
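The step-by-step update can be sketched in a few lines of Python. The three-state chain below is a made-up toy, not the option tree from the question; in a real model each state would have to encode the set of options already acquired, and the transition probabilities would come from the option weights and the choice policy.

```python
def step(dist, transitions):
    """Push the probability mass on every state one step along its outgoing edges."""
    new_dist = {}
    for state, p in dist.items():
        for nxt, q in transitions[state].items():
            new_dist[nxt] = new_dist.get(nxt, 0.0) + p * q
    return new_dist

def distribution_after(start, transitions, n_steps):
    """Start with probability 1 on `start` and iterate n_steps times."""
    dist = {start: 1.0}
    for _ in range(n_steps):
        dist = step(dist, transitions)
    return dist

# Toy chain: "goal" is absorbing, standing in for "all KEY options acquired"
transitions = {
    "A": {"A": 0.5, "B": 0.5},
    "B": {"B": 0.5, "goal": 0.5},
    "goal": {"goal": 1.0},
}
after_two = distribution_after("A", transitions, 2)
# after_two["goal"] is the chance of having reached the goal within 2 steps
```

Because the goal state is absorbing, the mass on it after N steps is the probability of reaching it in at most N steps, which is the quantity the question asks for.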
If you think the user selects the options at random, and he is always presented the same distribution of options at a node, you model this as a random walk on a graph. There was a recent nice post on calculating terminating probabilities of particular random walks on the Mathematica blog.

sorting cards with wildcards

i am programming a card game and i need to sort a stack of cards by their rank. so that they form a gapless sequence.
in this special game the card with value 2 could be used as a wild card, so for example the cards
2 3 5
should be sorted like this
3 2 5
because the 2 replaces the 4, otherwise it would not be a valid sequence.
however the cards
2 3 4
should stay like they are.
restriction: there can be only one '2' used as a wildcard.
2 2 3 4
would also stay like it is, because the first 2 would replace the ACE (or 1, whatever you call it).
the following would not be a valid input sequence, since one of the 2s must be used as a wildcard and one not. it is not possible to make up a gapless sequence then.
2 4 2 6
now i have a difficulty to figure out if a 2 is used as a wildcard or not. once i got that, i think i can do the rest of the sorting
thanks for any algorithmic help on this problem!
EDIT in response to your clarification of your new requirement:
You're implying that you'll never get data for which a gapless sequence cannot be formed. (If only I could have such guarantees in the real world.) So:
Do you have a 2?
No: your sequence must already be gapless.
Yes: You need to figure out where to put it.
Sort your input. Do you see a gap? Since you can only use one 2 as a wildcard, there can be at most one gap.
No: treat the 2 as a legitimate number two.
Yes: move the 2 to the gap to fill it in.
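The steps above can be sketched in Python as follows. This assumes cards are plain integers, that the input can always be turned into a gapless sequence (no validation is done), and that at most one 2 acts as a wildcard; the function name is only for illustration. It scans from the high end so that the highest single gap is the one that gets filled.

```python
def sort_with_wildcard(cards):
    """Sort ranks; if one rank is missing, move a single 2 into that gap."""
    ordered = sorted(cards)
    if 2 not in ordered:
        return ordered              # no wildcard: input is assumed gapless already
    rest = ordered.copy()
    rest.remove(2)                  # set one 2 aside as the candidate wildcard
    # look for the highest place where exactly one rank is missing
    for i in range(len(rest) - 2, -1, -1):
        if rest[i + 1] - rest[i] == 2:
            return rest[:i + 1] + [2] + rest[i + 1:]
    return ordered                  # no gap: the 2 stays at the front

print(sort_with_wildcard([2, 3, 5]))     # [3, 2, 5]
print(sort_with_wildcard([2, 3, 4]))     # [2, 3, 4]
print(sort_with_wildcard([2, 2, 3, 4]))  # [2, 2, 3, 4]
```

An invalid hand such as 2 4 2 6 is not detected here; a check that the result is gapless would have to be added separately.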
EDIT in response to your new requirement:
In this case, just look for the highest single gap, and plug it with a 2 if you have a 2 available.
Original answer:
Since your sequence must be gapless, you could count the number of 2s you have and the sizes of all the gaps that are present. Then just fill in the highest gap for which you have a sufficient number of 2s.

What class of algorithms can be used to solve this?

EDIT: Just to make sure someone is not breaking their head on the problem... I am not looking for the best optimal algorithm. Some heuristic that makes sense is fine.
I made a previous attempt at formulating this and realized I did not do a great job at it so I removed that question. I have taken another shot at formulating my problem. Please feel free to provide any constructive criticism that can help me improve this.
Input:
N people
k announcements that I can make
Distance that my voice can be heard (say 5 meters) i.e. I may decide to announce or not depending on the number of people within these 5 meters
Goal:
Maximize the total number of people who have heard my k announcements and (optionally) minimize the time in which I can finish announcing all k announcements
Constraints:
Once a person hears my announcement, he is removed from the total, i.e. if he had heard my first announcement, I do not count him even if he hears my second announcement
I can see the same person as well as the same set of people within my proximity
Example:
Let us consider 10 people numbered from 1 to 10 and the following pattern of arrival:
Time slot 1: 1 (payoff = 1)
Time slot 2: 2 3 4 5 (payoff = 4)
Time slot 3: 5 6 7 8 (payoff = 4 if no announcement was made previously in time slot 2, 3 if an announcement was made in time slot 2)
Time slot 4: 9 10 (payoff = 2)
and I am given 2 announcements to make. Now if I were an oracle, I would choose time slots 2 and time slots 3 because then 7 people would have heard (because 5 already heard my announcement in Time slot 2, I do not consider him anymore). I am looking for an online algorithm that will help me make these decisions on whether or not to make an announcement and if so based on what factors. Does anyone have any ideas on what algorithms can be used to solve this or a simpler version of this problem?
There should be an approach relying upon a max-flow algorithm. In essence, you're trying to push the maximum number of messages from start to end. Though it would be multidimensional, you could have a super-source that connects to each value of t, have each value of t connect to the people you can reach at that time, and then connect those people to a super-sink. This way, you simply have to compute a max-flow (with the added constraint of no more than k shouts, which should be solvable with a bit of dynamic programming). It's a terrifically dirty way to solve it, but it should get the job done deterministically and without the use of heuristics.
I don't know that there is really a way to solve this or an algorithm to do it the way you have formulated it.
It seems like basically you are trying to reach the maximum number of people with exactly 2 announcements. But without knowing any information about the groups of people in advance, you can't really make any kind of intelligent decision about whether or not to use your first announcement. Your second one at least has the benefit of knowing when not to be used (i.e. if the group has no new members, then you know it's not worth wasting the announcement). But it still has basically the same problem.
The only real way to solve this is to use knowledge about the type of data or the desired outcome to make guesses. If you know that groups average 100 people with a standard deviation of 10, then you could just refuse to announce if fewer than 90 people are present. Or, if you know you need to reach at least 100 people with two announcements, you could choose never to announce to fewer than 50 at once. Obviously those approaches risk never announcing at all if the actual data does not meet what you would expect. But that's always going to be a risk, since you could get 1 person in the first group and then 0 in all of the rest, no matter what you do.
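A threshold rule of this kind is easy to state as an online procedure. The Python sketch below is one possible reading: announce whenever the number of people present who have not yet heard an announcement reaches a fixed threshold. The threshold value and the function name are assumptions; the arrival pattern is the example from the question.

```python
def announce_online(slots, k, threshold):
    """Greedy online rule: announce when enough not-yet-reached people are present.

    slots     -- list of lists: who is within earshot in each time slot
    k         -- number of announcements available
    threshold -- minimum number of new listeners needed to spend an announcement
    """
    heard = set()
    chosen = []
    for t, present in enumerate(slots, start=1):
        if len(chosen) == k:
            break                      # no announcements left
        new_listeners = set(present) - heard
        if len(new_listeners) >= threshold:
            heard |= new_listeners
            chosen.append(t)
    return chosen, heard

slots = [[1], [2, 3, 4, 5], [5, 6, 7, 8], [9, 10]]
chosen, heard = announce_online(slots, k=2, threshold=3)
# chosen == [2, 3] and 7 people are reached, matching the oracle in the question
```

With a threshold of 5 the same data would produce no announcement at all, which is the risk described above.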
Or you could try defining the problem more clearly; I have a hard time figuring out how to relate this to computers.
Let's start by trying to solve the simplest possible variant of the problem: let's assume N people and K timeslots, but only one possible announcement. Let's also assume that each person will only ever stay for one timeslot and that each person who hasn't yet shown up has an equal chance of showing up at any future timeslot.
Given these simplifications, at each timeslot you look at the payoff of announcing at the current timeslot and compare it to the chance of a future timeslot having a higher payoff. E.g., let's assume 4 people and 3 timeslots:
Timeslot 1: Person 1 shows up, so you know you could get a payoff of 1 by announcing, but then you have 3 people to show up in 2 remaining timeslots, so at least one of those timeslots is guaranteed to have 2 people, so don't announce.
So at each timeslot, you can calculate the chance that a later timeslot will have a higher payoff than the current one by treating the remaining N people and K timeslots as N independent random numbers, each from 1..K, and calculating the chance of at least one value being hit at least as many times as the current payoff. (This is similar to the Birthday problem, but for more than 1 collision.) Then you need to decide how much to discount based on expected variances (bird in the hand, etc.).
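That chance can be estimated by simulation instead of a closed formula. A Python sketch under the same simplifying assumptions (each remaining person independently picks one of the remaining timeslots uniformly at random); the function name and the number of simulation runs are arbitrary choices.

```python
import random
from collections import Counter

def prob_later_slot_reaches(n_people, n_slots, target, n_sim=20000, seed=0):
    """Estimate P(some remaining timeslot gets at least `target` people)."""
    if target <= 0:
        return 1.0
    if n_people == 0 or n_slots == 0:
        return 0.0
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        # each remaining person picks one remaining slot uniformly at random
        counts = Counter(rng.randrange(n_slots) for _ in range(n_people))
        if max(counts.values()) >= target:
            hits += 1
    return hits / n_sim

# The 4-people, 3-timeslot example: after timeslot 1, 3 people and 2 slots remain,
# and beating the current payoff of 1 means some slot gets at least 2 people.
p_better = prob_later_slot_reaches(n_people=3, n_slots=2, target=2)
```

For the example this returns 1.0, since three people in two slots always put at least two in one slot. To compare "at least as good" rather than "strictly better", pass the current payoff itself as `target`.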
Generalization of this solution to the original problem is left as an exercise for the reader.
