Using SAS to transfer data structure

I have a question about using SAS to restructure a dataset. This is my old dataset:
question answer
1 3
2 4
3 5
4 3
5 1
1 2
2 4
3 1
4 3
5 6
The ideal output dataset is
ques1 ques2 ques3 ques4 ques5
3 4 5 3 1
2 4 1 3 6

The solution is simple: create a helper column that stores the group each question belongs to, then transpose the data using that column as the BY variable, which yields one output row per group (two rows here). See the following code.
data have;
infile datalines missover;
input question answer ;
if question=1 then group+1;
datalines;
1 3
2 4
3 5
4 3
5 1
1 2
2 4
3 1
4 3
5 6
;;;;
run;
proc transpose data=have out=want prefix=ques;
by group;
var answer;
id question;
run;
proc print data=want;run;
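
For readers who want to check the grouping logic outside SAS, the same idea can be sketched in plain Python. This is only an illustration, not part of the SAS solution: the function name transpose_by_group is made up here, and it assumes, as the SAS code does, that every block of answers starts with question 1.

```python
def transpose_by_group(rows):
    """Turn (question, answer) pairs into one dict per group.

    A new group starts whenever question == 1, mirroring
    `if question=1 then group+1;` in the SAS data step.
    """
    groups = []
    for question, answer in rows:
        # Assumption: data that does not begin with question 1 still
        # opens a first group instead of being dropped.
        if question == 1 or not groups:
            groups.append({})
        groups[-1][f"ques{question}"] = answer
    return groups


have = [(1, 3), (2, 4), (3, 5), (4, 3), (5, 1),
        (1, 2), (2, 4), (3, 1), (4, 3), (5, 6)]
want = transpose_by_group(have)
for row in want:
    print(row)
```

Each dict corresponds to one row of the transposed SAS dataset, with keys ques1 to ques5.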


Making a scatterplot with PCA and how to read results

I'm fairly new to R and not familiar with PCA. My problem is this: from a survey I have a list of observations on nine variables. The first is the gender of the respondents, the next five (Q51_1_c, Q51_2_c, Q51_4_c, Q51_6_c, Q51_7_c) ask about entrepreneurial issues, and the remaining three (Q56_1_c, Q56_2_c, Q56_3_c) ask about future expectations. Except for gender, all of these variables take values between 1 and 5. I want to make a scatter plot with two axes, the first for the "entrepreneurial variables" and the second for the "future expectations variables", and then plot the positions of Male and Female as points. My data look like this:
x <- "Q1b Q51_1_c Q51_2_c Q51_4_c Q51_6_c Q51_7_c Q56_1_c Q56_2_c Q56_3_c
3 Male 5 4 4 4 4 5 4 4
4 Female 4 3 4 4 3 3 4 3
5 Female 1 1 1 1 1 3 1 1
7 Female 2 1 1 1 1 5 1 4
8 Female 4 4 5 4 4 5 4 4
9 Female 3 3 4 4 3 3 4 4
13 Male 4 4 4 4 5 3 3 3
15 Female 3 4 4 4 4 1 1 5
16 Female 4 1 4 4 4 3 3 3
19 Female 3 2 3 3 3 3 3 3
20 Male 1 1 1 1 1 3 1 5
21 Female 3 1 1 2 1 3 3 3
26 Female 5 5 1 2 1 4 4 3
27 Female 2 1 1 1 1 1 1 1
29 Male 2 2 2 2 1 4 4 4
31 Female 3 1 1 1 1 5 2 3
34 Female 4 1 1 4 3 3 1 4
36 Female 5 1 1 4 4 5 1 2
37 Male 5 1 2 4 4 5 4 5
38 Female 3 1 1 1 1 1 1 1"
To run PCA this is my code:
x <- read.table(text = x, header = TRUE) # parse the text above into a data frame
x <- na.omit(x) # just to simplify
resul <- prcomp(x[,-1], scale = TRUE)
x$PC1 <- resul$x[,1] #Saving Scores PC1
x$PC2 <- resul$x[,2] #Saving Scores PC2
The resulting axes look like this:
biplot(resul, scale = 0)
Finally, to make the scatter plot:
library(dplyr)
library(ggplot2)
x %>%
  group_by(Q1b) %>%
  summarise(mean_PC1 = mean(PC1),
            mean_PC2 = mean(PC2)) %>%
  ggplot(aes(x = mean_PC1, y = mean_PC2, colour = Q1b)) +
  geom_point() +
  theme_bw()
Which gives me this:
I'm not sure how to read the results. Should I conclude that Females in general score higher on the future expectations dimension than Males, and that Males score higher on the entrepreneurial dimension?
Thanks in advance!!
Your interpretation of the axes looks correct, i.e., PC1 is a gradient which from left to right represents decreasing "entrepreneurialness", while PC2 is a gradient which from bottom to top represents increasing future expectations (assuming that "5" in the original data means highest entrepreneurialness/expectations).
In terms of whether males and females are different, you probably need to plot more than just the means for each group: even if males and females are truly identical in their entrepreneurialness/expectations, you'd never expect the means from two samples to sit right on top of each other on a scatter plot. To address this, you could plot the actual observations rather than their means (i.e., one point per row, coloured by gender) and see whether they intermingle or separate in the plot space. Alternatively, regress gender against the principal components.
Another issue is whether it's appropriate to use PCA on ordinal data - see here for discussion.

Find multiple based on a single criterium (arrayfun)

I am trying to retrieve all values from a variable (b) that satisfy a criterium based on another variable (a), much like the =IF function in Excel. For example:
Example:
(a): 1 2 2 2 3 3 3 3
(b): 3 6 3 5 6 4 5 4
my criteria is
(a) = 2
my reply has to be:
(b) = 6 3 5
I tried to find a solution using arrayfun, like this:
arrayfun(@(x) b(find(a == x, 1, 'first')), 2)
Obviously, it only returns the 6, the first number that matches the criterium. Can I somehow formulate arrayfun correctly, or do I need a different function altogether?
Thanks!
Don't you just want:
a = [ 1 2 2 2 3 3 3 3]
b = [3 6 3 5 6 4 5 4]
b(a == 2)
ans =
6 3 5
If a were a matrix, you would index with the row of interest:
a = [ 1 2 2 2 3 3 3 3; ...
1 1 1 2 2 3 4 4; ]
b = [3 6 3 5 6 4 5 4]
b(a(1,:)==2)
ans =
6 3 5
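
For comparison only, the same mask-based selection can be sketched outside MATLAB. The Python snippet below is an illustration under the assumption that a and b are plain lists of equal length; the variable name selected is introduced here and does not appear in the question.

```python
a = [1, 2, 2, 2, 3, 3, 3, 3]
b = [3, 6, 3, 5, 6, 4, 5, 4]

# Keep every element of b whose partner in a equals 2,
# the same selection that b(a == 2) performs in MATLAB.
selected = [bv for av, bv in zip(a, b) if av == 2]
print(selected)  # [6, 3, 5]
```

The point is the same in both languages: a single comparison over a produces the mask, so no find or arrayfun loop is needed.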

Sort on specific columns, output only one of those identical but having the highest number in another column

I have records like these:
1 4 6 4 2 4 8
2 3 5 4 6 7 1
5 4 6 4 3 8 4
1 4 6 4 5 7 1
5 7 3 3 3 6 3
6 7 3 3 4 8 4
I want to sort them on columns 2, 3, 4, and 6 and, among records that are identical in columns 2, 3, and 4, keep only the one with the biggest number in column 6, such as:
1 4 6 4 5 7 1
2 3 5 4 6 7 1
5 4 6 4 3 8 4
5 7 3 3 3 6 3
6 7 3 3 4 8 4
I have tried all kinds of combinations of sort and uniq, but everything fails because uniq cannot be applied to a specific column. The only thing I came up with is to change the order of the columns: first sort as above, then move columns 2, 3, and 4 to the end, and then run uniq with -w so that it looks only at those last 3 columns. This seems quite inefficient to me.
Thanks for help!
You can achieve this with two passes of sort (assuming I understand your requirement correctly, since the desired output posted above does not match your description of it). The first pass sorts by fields 2 through 4 ascending and field 6 descending. The second pass sorts on fields 2 through 4 only, with the stable (-s) and unique (-u) flags, so that for each combination of fields 2-4 only the first row is kept, which is the one with the highest value in field 6. Each field is given as its own key because a numeric key that spans several fields, such as -k2,4n, is compared only on its leading number, i.e. field 2.
sort -k2,2n -k3,3n -k4,4n -k6,6nr file.txt | sort -k2,2n -k3,3n -k4,4n -s -u
2 3 5 4 6 7 1
5 4 6 4 3 8 4
6 7 3 3 4 8 4
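
The "keep the best row per key" idea behind the two sort passes can also be sketched in Python, which may help when checking the expected output by hand. This is an illustrative sketch, not a replacement for the sort pipeline: the function name best_per_group is made up here, and it assumes whitespace-separated integer fields. When two rows tie on column 6, it keeps the one seen first.

```python
def best_per_group(lines):
    """For each (col2, col3, col4) combination keep the row with the largest col6."""
    best = {}
    for line in lines:
        fields = [int(f) for f in line.split()]
        key = tuple(fields[1:4])          # columns 2, 3 and 4
        if key not in best or fields[5] > best[key][5]:
            best[key] = fields            # larger column 6 wins
    # Return the surviving rows ordered by the grouping columns.
    return [best[key] for key in sorted(best)]


records = """1 4 6 4 2 4 8
2 3 5 4 6 7 1
5 4 6 4 3 8 4
1 4 6 4 5 7 1
5 7 3 3 3 6 3
6 7 3 3 4 8 4""".splitlines()

result = best_per_group(records)
for row in result:
    print(*row)
```

On the sample records this prints the same three rows as the output shown above.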

Data structures - Queue

I have a queue q of integers stored in an array in circular fashion, from front to rear, like this:
f: 1
r: 8
index: 0 1 2 3 4 5 6 7 8 9
Q:     - 2 8 4 4 3 5 4 - -
(- marks an empty slot.)
What is the array representation of queue q after I perform the following?
while q.front() is an even number do q.enqueue(q.dequeue()).
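The loop runs until the front of the queue is 3, the first odd number. Each of 2, 8, 4, 4 is dequeued and re-enqueued at the rear, wrapping around to index 0, so the array becomes (- marks an empty slot):
index: 0 1 2 3 4 5 6 7 8 9
Q:     4 4 - - - 3 5 4 2 8
with f: 5 and r: 2.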
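
To check the result mechanically, the loop can be simulated with a small array-backed circular queue. The Python class below is a minimal sketch written for this example, not a standard library type; it assumes, as the question's f: 1 and r: 8 suggest, that r is the index of the next free slot.

```python
class CircularQueue:
    """Fixed-capacity queue stored in a list, wrapping around at the end."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.f = 0      # index of the front element
        self.r = 0      # index of the next free slot
        self.size = 0

    def enqueue(self, value):
        if self.size == len(self.slots):
            raise OverflowError("queue is full")
        self.slots[self.r] = value
        self.r = (self.r + 1) % len(self.slots)
        self.size += 1

    def dequeue(self):
        if self.size == 0:
            raise IndexError("queue is empty")
        value = self.slots[self.f]
        self.slots[self.f] = None          # mark the slot as empty
        self.f = (self.f + 1) % len(self.slots)
        self.size -= 1
        return value

    def front(self):
        if self.size == 0:
            raise IndexError("queue is empty")
        return self.slots[self.f]


q = CircularQueue(10)
q.f = q.r = 1                 # start at index 1 to match the question
for v in [2, 8, 4, 4, 3, 5, 4]:
    q.enqueue(v)

# Rotate even numbers from the front to the rear until an odd one is in front.
while q.front() % 2 == 0:
    q.enqueue(q.dequeue())

print(q.slots, q.f, q.r)
```

Note that the while loop only terminates because the queue contains an odd number; with all-even contents it would rotate forever.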

How can I sort a 2-D array in MATLAB with respect to 2nd row?

I have an array, say "a":
a =
1 4 5
6 7 2
If I use the function
b=sort(a)
it gives
b =
1 4 2
6 7 5
but I want the result to be
b =
5 1 4
2 6 7
That is, the 2nd row should be sorted, and the elements of the 1st row should not be sorted independently but should stay with their corresponding elements in the 2nd row.
sortrows(a',2)'
Pulling this apart:
a = 1 4 5
    6 7 2
a' = 1 6
     4 7
     5 2
sortrows(a',2) = 5 2
                 1 6
                 4 7
sortrows(a',2)' = 5 1 4
                  2 6 7
The key here is that sortrows orders the rows by the specified column (column 2 of a', i.e. row 2 of a), and the other columns follow that order.
You can use the SORT function on just the second row, then use the index output to sort the whole array:
[junk,sortIndex] = sort(a(2,:));
b = a(:,sortIndex);
How about
a = [1 4 5; 6 7 2]
a =
1 4 5
6 7 2
>> [s,idx] = sort(a(2,:))
s =
2 6 7
idx =
3 1 2
>> b = a(:,idx)
b =
5 1 4
2 6 7
In other words, you use the second output of sort to get the sort order you want, and then you apply it to the whole array.
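
The same "sort one row, reuse its index order for every row" technique can be sketched in Python for comparison. This is an illustration only, assuming the array is a list of two equal-length rows; note that Python indexes from 0, so idx holds 2, 0, 1 where MATLAB reports 3 1 2.

```python
a = [[1, 4, 5],
     [6, 7, 2]]

# Column indices ordered by the values in the second row.
idx = sorted(range(len(a[1])), key=lambda j: a[1][j])

# Apply that column order to every row so the columns stay paired.
b = [[row[j] for j in idx] for row in a]

print(idx)
print(b)
```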
