Obs Best _streak_
1 Freeburg Foxes 1
2 Freeburg Foxes 2
3 Freeburg Foxes 3
4 Freeburg Foxes 4
5 Charlotte Chipmunks 1
6 Toronto Turtles 1
7 Toronto Turtles 2
8 Freeburg Foxes 1
9 Freeburg Foxes 2
10 Toronto Turtles 1
...
Obs Best _streak_
1 Freeburg Foxes 4
2 Charlotte Chipmunks 1
3 Toronto Turtles 2
4 Freeburg Foxes 2 (thanks for correcting)
...
Above (the first, longer listing) is my current SAS output. However, I want to display only the maximum streak for each run of a team, with the team name shown once per run. So my output would look like the second (shorter) listing.
If the data is sorted in the order you've specified then you can get your result with just one pass of the data, using the NOTSORTED option.
data have;
/* the & modifier lets BEST contain single blanks; two or more blanks end the field */
input best & $20. _streak_;
datalines;
Freeburg Foxes  1
Freeburg Foxes  2
Freeburg Foxes  3
Freeburg Foxes  4
Charlotte Chipmunks  1
Toronto Turtles  1
Toronto Turtles  2
Freeburg Foxes  1
Freeburg Foxes  2
Toronto Turtles  1
;
run;
data want;
set have;
by best notsorted;
if last.best;
run;
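For readers who do not use SAS, the same "keep the last row of each contiguous run" idea can be sketched in Python. This is only an illustration, not a translation of the SAS step: the function name `last_of_each_run` and the tuple layout are assumptions made for the example.

```python
from itertools import groupby

# (team, streak) rows in the same order as the SAS input
rows = [
    ("Freeburg Foxes", 1), ("Freeburg Foxes", 2),
    ("Freeburg Foxes", 3), ("Freeburg Foxes", 4),
    ("Charlotte Chipmunks", 1),
    ("Toronto Turtles", 1), ("Toronto Turtles", 2),
    ("Freeburg Foxes", 1), ("Freeburg Foxes", 2),
    ("Toronto Turtles", 1),
]

def last_of_each_run(rows):
    """Keep the final row of every contiguous run of the same team.

    groupby only merges adjacent equal keys, which mirrors what
    BY ... NOTSORTED does; the data is not sorted first.
    """
    return [list(group)[-1] for _, group in groupby(rows, key=lambda r: r[0])]

result = last_of_each_run(rows)
for team, streak in result:
    print(team, streak)
```

Because the streak counter increases within a run, the last row of a run is also the row with the maximum streak for that run.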
I believe in the desired output observation 4 should be:
4 Freeburg Foxes 2
?
So we choose the maximum streak for each contiguous series of records for each team, not the absolute maximum, right?
Then you can do it like this: read a second instance of the same dataset, shifted one row up, so that you can "look ahead" and decide whether the current record is the last in its series:
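data want;
/* LAST is 1 only on the final row of the main read */
set have end=last;
/* look ahead: same dataset shifted up one row; skipped on the final row so the step does not stop early */
if not last then
set have(firstobs=2 keep=Best
rename=(Best=nextBest));
/* output the row that closes a series: the next team differs, or there is no next row */
if Best^=nextBest or last then output;
drop next:;
run;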
I have a matrix (m) of scores for 4 students on 3 different exams.
4 3 1
3 2 5
8 4 6
1 5 2
I want to know, for each student, how their exams rank from best to worst. Desired output:
1 2 3
2 3 1
1 3 2
3 1 2
Now, I'm new to the language (and coding in general), so I read GeeksforGeeks' page on sorting in Julia and tried
mapslices(sortperm, -m; dims = 2)
However, this gives something subtly different: a matrix in which each row holds the indices that would sort that row.
1 2 3
3 1 2
1 3 2
2 3 1
Perhaps it was obvious, but I now realize this is not actually what I want, but I cannot find a built-in function/fast way to complete this operation. Any ideas? Preferably something which doesn't iterate through items in the matrix/row, as in reality my matrix is very, very large. Thanks!
Such functionality is provided by StatsBase.jl. Here is an example:
julia> using StatsBase
julia> m = [4 3 1
3 2 5
8 4 6
1 5 2]
4×3 Array{Int64,2}:
4 3 1
3 2 5
8 4 6
1 5 2
julia> mapslices(x -> ordinalrank(x, rev=true), m, dims = 2)
4×3 Array{Int64,2}:
1 2 3
2 3 1
1 3 2
3 1 2
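You might want to use a different ranking function (StatsBase also provides competerank, denserank and tiedrank), depending on how you want to handle ties; see the StatsBase documentation on rankings for details.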
Figured out something which works!
Run m_index_rank = mapslices(sortperm, -m; dims = 2) on the matrix and get a ranking for each row through index. Then, realizing this is, in each row, an inverse permutation away from the desired output, run mapslices(invperm, m_index_rank; dims = 2) for the desired result.
In one line, this is mapslices(r -> invperm(sortperm(r, rev=true)), m; dims=2) over the desired matrix m. dims = 2 is to carry out the operation row-wise.
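To make the "inverse permutation of the sort order" step concrete, here is a small sketch in plain Python rather than Julia. The helper name `rank_descending` is made up for the example, and ties are broken by first occurrence, which is an assumption of this sketch.

```python
def rank_descending(row):
    """Rank of each entry in `row`, where 1 is the largest value."""
    # Analogue of sortperm(row, rev=true): indices ordered from best to worst.
    order = sorted(range(len(row)), key=lambda i: -row[i])
    # Analogue of invperm: the entry at `index` sits at `position` in that order.
    ranks = [0] * len(row)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks

m = [
    [4, 3, 1],
    [3, 2, 5],
    [8, 4, 6],
    [1, 5, 2],
]
ranked = [rank_descending(row) for row in m]
for row in ranked:
    print(row)
```

This loops explicitly and is meant only to show why the two steps compose into a ranking; it is not a suggestion for very large matrices.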
I'm marking this resolved for now, but please let me know if there are cleaner/faster ways to do this.
Edit: Replaced my syntactically clunky mapslices(invperm, mapslices(sortperm, -m; dims = 2); dims = 2) with a more natural one, thanks to @phipsgabler
Let's assume these are our numbers and we want their remainders modulo 7,
which we can find using mod() from library(pracma):
> mod(c(1,4,23,13,8,9,11,27,32,2),7)
[1] 1 4 2 6 1 2 4 6 4 2
I also want a number that tells me which row each value falls in when the values are laid out in a matrix:
1,1,4,2,2,2,2,4,5,1
For example, suppose this is an m by 7 matrix.
Take the values whose remainder is 2: we know they are in the 2nd column, but in which row? 9 is in the 2nd row, at (2,2), although its quotient is 1; 23 is in the 4th row, at (4,2), although its quotient is 3. Finally, the last element, 2, is at (1,2).
I am looking for the row position, since I can use the mod as the column position.
I came up with this:
b=c(1,4,23,13,8,9,11,27,32,2)
floor(b/7+1)
[1] 1 1 4 2 2 2 2 4 5 1
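As a cross-check, here is a small Python sketch of the row and column lookup. It assumes the matrix is filled row by row with 1, 2, 3, ... and has 7 columns; that layout is an assumption taken from the (2,2) and (4,2) examples in the question, and the helper `row_col` is hypothetical.

```python
def row_col(value, ncols=7):
    """1-based (row, column) of `value` in a grid filled row by row with 1, 2, 3, ..."""
    row, col = divmod(value - 1, ncols)
    return row + 1, col + 1

b = [1, 4, 23, 13, 8, 9, 11, 27, 32, 2]
rows = [row_col(v)[0] for v in b]
print(rows)

# Multiples of the column count sit in the last column of their row.
print(row_col(7), row_col(14))
```

Under that layout, subtracting 1 before dividing matters only for multiples of 7: floor(b/7+1) gives row 2 for the value 7, whereas a row-by-row fill puts 7 at the end of row 1.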
I'm fairly new to R and not familiar with PCA. My problem is this: from a survey I have observations on nine variables. The first is the gender of the respondents, the next five (Q51_1_c, Q51_2_c, Q51_4_c, Q51_6_c, Q51_7_c) ask about entrepreneurial issues, and the remaining three (Q56_1_c, Q56_2_c, Q56_3_c) ask about future expectations. Except for gender, all of these variables take values between 1 and 5. I want to make a scatter plot with two axes, the first built from the "entrepreneurial" variables and the second from the "future expectations" variables, and then plot the positions of Male and Female as points. My data look like this:
x <- "Q1b Q51_1_c Q51_2_c Q51_4_c Q51_6_c Q51_7_c Q56_1_c Q56_2_c Q56_3_c
3 Male 5 4 4 4 4 5 4 4
4 Female 4 3 4 4 3 3 4 3
5 Female 1 1 1 1 1 3 1 1
7 Female 2 1 1 1 1 5 1 4
8 Female 4 4 5 4 4 5 4 4
9 Female 3 3 4 4 3 3 4 4
13 Male 4 4 4 4 5 3 3 3
15 Female 3 4 4 4 4 1 1 5
16 Female 4 1 4 4 4 3 3 3
19 Female 3 2 3 3 3 3 3 3
20 Male 1 1 1 1 1 3 1 5
21 Female 3 1 1 2 1 3 3 3
26 Female 5 5 1 2 1 4 4 3
27 Female 2 1 1 1 1 1 1 1
29 Male 2 2 2 2 1 4 4 4
31 Female 3 1 1 1 1 5 2 3
34 Female 4 1 1 4 3 3 1 4
36 Female 5 1 1 4 4 5 1 2
37 Male 5 1 2 4 4 5 4 5
38 Female 3 1 1 1 1 1 1 1"
To run PCA this is my code:
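x <- read.table(text = x, header = TRUE) # parse the text block above; the first column becomes row names
x <- na.omit(x) # just to simplify
resul <- prcomp(x[,-1], scale = TRUE) # drop gender (Q1b) before running PCA
x$PC1 <- resul$x[,1] # save PC1 scores
x$PC2 <- resul$x[,2] # save PC2 scores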
The resulting axes look like this:
biplot(resul, scale = 0)
Finally, to make the scatter plot:
library(dplyr) # group_by(), summarise(), %>%
library(ggplot2) # ggplot(), geom_point()
x %>%
group_by(Q1b) %>%
summarise(mean_PC1 = mean(PC1),
mean_PC2 = mean(PC2)) %>%
ggplot(aes(x=mean_PC1, y=mean_PC2, colour=Q1b)) +
geom_point() +
theme_bw()
Which gives me this:
I'm not sure how to read the results. Should I conclude that females in general score higher on the future-expectations dimension than males, and that males score higher on the entrepreneurial dimension?
Thanks in advance!!
Your interpretation of the axes looks correct, i.e., PC1 is a gradient which from left to right represents decreasing "entrepreneurialness", while PC2 is a gradient which from bottom to top represents increasing future expectations (assuming that "5" in the original data means highest entrepreneurialness/expectations).
In terms of whether males and females are different, you probably need to plot more than just the means for each group: even if males and females are truly identical in their entrepreneurialness/expectations, you'd never expect the means from two samples to sit right on top of each other on a scatter plot. To address this, you could plot the actual observations rather than their means (i.e., one point per row, coloured by gender) and see whether they intermingle or separate in the plot space. Or, regress gender against the principal components.
Another issue is whether it's appropriate to use PCA on ordinal data - see here for discussion.
I'm familiar with finding two step dominances when the players involved have only played each other once - you create a matrix of results filled with 1's (for wins) and 0's (for losses/ties), then square it. To find the power of each team you square the matrix then add it to itself.
So, how does the process change when you have teams involved that have played each other more than once and there are 2's introduced into the matrix? I'm working this with Matlab (Octave actually), and when I enter the matrix, which is actually a 31x31 matrix showing the results from the 2001-2002 NFL season, then square it, I get results showing that teams had dominance over themselves - like this:
Original Matrix (abbreviated):
Buf Ind Mia NE NYJ
Buf 0 0 0 0 1
Ind 2 0 0 0 1
Mia 2 2 0 1 0
NE 2 2 1 0 1
NYJ 1 1 2 1 0
Squared Matrix (abbreviated):
Buf Ind Mia NE NYJ
Buf 1 1 2 1 0
Ind 2 1 2 2 2
Mia 8 3 1 1 5
NE 10 4 2 2 4
NYJ 9 8 1 3 3
So how do I address the issue of the results showing a team having dominance over itself and get to my final power numbers like I would in a "played only once" scenario?
Thanks in advance.
I've had this same problem with soccer games, scoring 2 points for a win, 1 for a draw and 0 for a loss, but I believe it is possible for a team to have dominance over itself, because it has beaten a team that beat it (or, in soccer, drew with it). Therefore, I would say that you can just continue as is. (P.S. I am a Year 11 Maths C student, so there may be other explanations for this.)
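To see where the diagonal entries come from, here is a small Python sketch with a made-up three-team league (X, Y, Z); the teams and results are hypothetical and are not taken from the NFL matrix above. It computes the power as the row sums of the matrix plus its square, and also shows a variant that ignores the diagonal of the square. Dropping the diagonal is only one possible convention, shown for comparison.

```python
def matmul(a, b):
    """Plain matrix product of two square lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# wins[i][j] = number of times team i beat team j (hypothetical results):
# X beat Y twice and Z once, Y beat Z once, Z beat X once.
wins = [
    [0, 2, 1],  # X
    [0, 0, 1],  # Y
    [1, 0, 0],  # Z
]
squared = matmul(wins, wins)  # two-step dominances
n = len(wins)

# Power "as is": row sums of wins + squared, diagonal included.
power = [sum(wins[i][j] + squared[i][j] for j in range(n)) for i in range(n)]

# Variant: ignore a team's two-step dominance over itself.
power_no_self = [
    sum(wins[i][j] + (squared[i][j] if i != j else 0) for j in range(n))
    for i in range(n)
]

print(squared)
print(power, power_no_self)
```

Here X and Z each get a 1 on the diagonal of the squared matrix, because X beat Z and Z beat X, which matches the explanation that a team can "dominate itself" through an opponent it has both beaten and lost to.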
I am trying to find an O(n) algorithm for this problem but have been unable to, even after spending 3-4 hours on it. The brute-force method, which is O(n^2), times out. I am confused about how to approach it. Does the solution require dynamic programming?
http://acm.timus.ru/problem.aspx?space=1&num=1794
In short the problem is this:
Some students are sitting in a circle, and each of them has a preference for when he wants to be asked a question by the teacher. The teacher asks the questions in clockwise order only. For example:
5
3 3 1 5 5
This means that there are 5 students and:
1st student wants to go third
2nd student wants to go third
3rd student wants to go first
4th student wants to go fifth
5th student wants to go fifth.
The question is where the teacher should start asking so that the maximum number of students get the turn they want. For this particular example, the answer is 5 because
3 3 1 5 5
2 3 4 5 1
You can see that by starting with the fifth student as 1st, 2 students (the 2nd and the 4th, who wanted turns 3 and 5) get the turns they wanted. For the next example the answer is the 12th student:
12
5 1 2 3 6 3 8 4 10 3 12 7
because
5 1 2 3 6 3 8 4 10 3 12 7
2 3 4 5 6 7 8 9 10 11 12 1
four students get their choices fulfilled.
It's actually a rather simple problem. If student k wants to be the jth to present, then she will be satisfied iff the (k - j + 1)th student (modulo n) is the first to present. This should lead you to a simple O(n) algorithm.
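A minimal sketch of that idea in Python: each student "votes" for the one starting position that would satisfy her, and the start with the most votes wins. The function name `best_start` and the tie-breaking rule (lowest-numbered start) are assumptions of this sketch; input parsing for the judge is left out.

```python
def best_start(wishes):
    """Return (start, satisfied) for 1-based wishes; runs in O(n).

    wishes[k-1] = j means student k wants to be asked j-th.
    Student k is satisfied exactly when student (k - j + 1) mod n starts.
    """
    n = len(wishes)
    votes = [0] * n  # votes[s] counts students satisfied by start s+1
    for k, j in enumerate(wishes, start=1):
        votes[(k - j) % n] += 1  # 0-based index of start (k - j + 1)
    best = max(range(n), key=lambda s: votes[s])  # first maximum on ties
    return best + 1, votes[best]

print(best_start([3, 3, 1, 5, 5]))
print(best_start([5, 1, 2, 3, 6, 3, 8, 4, 10, 3, 12, 7]))
```

Both examples from the question come out as described there: start at student 5 with 2 satisfied students, and start at student 12 with 4.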