How to sort a data frame in R while preserving the order of a specified column?

Let's say I have a data.frame
x <- data.frame(a = c('A','A','A','A','A', 'C','C','C','C', 'B','B','B'),
b = c('a','c','a','a','c', 'd', 'e','e','d', 'b','b','b'),
c = c( 7, 3, 2, 4, 5, 3, 1, 1, 5, 5, 2, 3),
stringsAsFactors = FALSE)
> x
a b c
1 A a 7
2 A c 3
3 A a 2
4 A a 4
5 A c 5
6 C d 3
7 C e 1
8 C e 1
9 C d 5
10 B b 5
11 B b 2
12 B b 3
I would like to sort x by columns b and c while keeping the order of a as before. x[order(x$b, x$c),] breaks the order of column a. This is what I want:
a b c
3 A a 2
4 A a 4
1 A a 7
2 A c 3
5 A c 5
6 C d 3
9 C d 5
7 C e 1
8 C e 1
11 B b 2
12 B b 3
10 B b 5
Is there a quick way of doing it?
Currently I run a "for" loop and sort each subset; I'm sure there must be a better way.
Thank you!
Ilya

If column "a" is already ordered, then it's this simple:
> x[order(x$a,x$b, x$c),]
a b c
3 A a 2
4 A a 4
1 A a 7
2 A c 3
5 A c 5
6 B d 3
9 B d 5
7 B e 1
8 B e 1
11 C b 2
12 C b 3
10 C b 5
If column a isn't ordered (but is grouped), create a new factor whose levels are the values of x$a in order of first appearance, and sort by that.

Thank you, Spacedman! Your recommendation works well.
x$a <- factor(x$a, levels = unique(x$a), ordered = TRUE)
x[order(x$a,x$b, x$c),]
Following Gavin's comment (a plain factor is enough; ordered = TRUE is not required):
x$a <- factor(x$a, levels = unique(x$a))
x[order(x$a,x$b, x$c),]
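The factor trick works because order() sorts a factor by its level order, and unique(x$a) lists the groups in order of first appearance. The same key (group rank first, then b, then c) can be sketched outside R; here is a minimal Python version on the question's data, where the helper name sort_keep_group_order is hypothetical:

```python
# Rows of x from the question: (row name, a, b, c)
rows = [
    (1, "A", "a", 7), (2, "A", "c", 3), (3, "A", "a", 2), (4, "A", "a", 4),
    (5, "A", "c", 5), (6, "C", "d", 3), (7, "C", "e", 1), (8, "C", "e", 1),
    (9, "C", "d", 5), (10, "B", "b", 5), (11, "B", "b", 2), (12, "B", "b", 3),
]


def sort_keep_group_order(rows):
    """Sort by b, then c, within each a-group, keeping the groups in their
    order of first appearance (like factor(a, levels = unique(a)))."""
    rank = {}
    for _, a, _, _ in rows:
        rank.setdefault(a, len(rank))  # first appearance gets the next rank
    # sorted() is stable, so exact ties (rows 7 and 8) keep their input order
    return sorted(rows, key=lambda r: (rank[r[1]], r[2], r[3]))


result = sort_keep_group_order(rows)
print([r[0] for r in result])  # [3, 4, 1, 2, 5, 6, 9, 7, 8, 11, 12, 10]
```

The printed row names match the desired output in the question.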

require(doBy)
orderBy(~ a + b + c, data=x)
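One caveat, assuming orderBy sorts on the column values the way order() does: on the original x, where a is still a character column, the groups come out alphabetically (A, B, C) instead of A, C, B, so this reproduces the desired result only once a has been converted to the factor shown above. A small Python sketch of the difference (group_order is a hypothetical helper):

```python
# Rows of x from the question: (row name, a, b, c)
rows = [
    (1, "A", "a", 7), (2, "A", "c", 3), (3, "A", "a", 2), (4, "A", "a", 4),
    (5, "A", "c", 5), (6, "C", "d", 3), (7, "C", "e", 1), (8, "C", "e", 1),
    (9, "C", "d", 5), (10, "B", "b", 5), (11, "B", "b", 2), (12, "B", "b", 3),
]

# Plain multi-key sort on the raw values of a, b and c
alphabetical = sorted(rows, key=lambda r: (r[1], r[2], r[3]))


def group_order(rows):
    """Distinct values of a in the order they appear in rows."""
    seen = []
    for r in rows:
        if r[1] not in seen:
            seen.append(r[1])
    return seen


print(group_order(rows))          # ['A', 'C', 'B']  (original group order)
print(group_order(alphabetical))  # ['A', 'B', 'C']  (groups reordered)
```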

Related

Sorting a DataFrame by values and storing index and columns

I have a pandas DataFrame that is actually a matrix. It looks like this:
a b c
d 1 0 5
e 0 6 2
f 2 0 3
I need the values sorted, together with the index and column label of each value. The result should be:
index Column Value
e b 6
d c 5
f c 3
Use stack to reshape the DataFrame into a Series, then nlargest:
df1 = df.stack().nlargest(3).rename_axis(['idx','col']).reset_index(name='val')
print (df1)
idx col val
0 e b 6
1 d c 5
2 f c 3
To keep the (index, column) pairs as a MultiIndex:
df2 = df.stack().nlargest(3).to_frame(name='val')
print (df2)
val
e b 6
d c 5
f c 3
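stack() turns the matrix into one value per (index, column) pair, and nlargest(3) keeps the three largest. The same two steps can be sketched in plain Python with heapq.nlargest, using a dict of dicts as a stand-in for the DataFrame (an assumption for illustration, not how pandas stores data); top_cells is a hypothetical helper:

```python
import heapq

# The matrix from the question: row label -> {column label: value}
matrix = {
    "d": {"a": 1, "b": 0, "c": 5},
    "e": {"a": 0, "b": 6, "c": 2},
    "f": {"a": 2, "b": 0, "c": 3},
}


def top_cells(matrix, n):
    """Flatten to (index, column, value) triples, as stack() does,
    then keep the n largest values, as nlargest(n) does."""
    cells = [(idx, col, val)
             for idx, row in matrix.items()
             for col, val in row.items()]
    return heapq.nlargest(n, cells, key=lambda cell: cell[2])


for idx, col, val in top_cells(matrix, 3):
    print(idx, col, val)
# e b 6
# d c 5
# f c 3
```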

Pandas pivot table Nested Sorting Part 3

Episode 3:
In part 2, we retained the hierarchical nature of the indices while sorting within the right-most level. In part 1, we applied a custom sort to the left-most index level while sorting the values within the right-most index level.
Now, I'd like to combine both methods.
Given the following data frame and resultant pivot table:
import pandas as pd
df=pd.DataFrame({'A':['a','a','a','a','a','b','b','b','b'],
'B':['x','y','z','x','y','z','x','y','z'],
'C':['a','b','a','b','a','b','a','b','a'],
'D':[7,5,3,4,1,6,5,3,1]})
df
A B C D
0 a x a 7
1 a y b 5
2 a z a 3
3 a x b 4
4 a y a 1
5 b z b 6
6 b x a 5
7 b y b 3
8 b z a 1
table = pd.pivot_table(df, index=['A', 'B','C'],aggfunc='sum')
table
       D
A B C
a x a  7
    b  4
  y a  1
    b  5
  z a  3
b x a  5
  y b  3
  z a  1
    b  6
I would like to specify a custom order of 'B'.
This seems to work:
df['B'] = df['B'].astype('category')
df['B'] = df['B'].cat.set_categories(['z', 'x', 'y'])  # assign back; inplace=True was removed in pandas 2.0
Next, I'd like the pivot table to keep the order of 'B' specified above while sorting the values of 'D' in descending order within each category of 'B'.
Like this:
       D
A B C
a z a  3
  x a  7
    b  4
  y b  5
    a  1
b z b  6
    a  1
  x a  5
  y b  3
Thanks in advance!
UPDATE: using pivot_table()
In [79]: df.pivot_table(index=['A','B','C'], aggfunc='sum').reset_index().sort_values(['A','B','D'], ascending=[1,1,0]).set_index(['A','B','C'])
Out[79]:
       D
A B C
a x a  7
    b  4
  y b  5
    a  1
  z a  3
b x a  5
  y b  3
  z b  6
    a  1
Is that what you want?
In [64]: df.sort_values(['A','B','D'], ascending=[1,1,0]).set_index(['A','B','C'])
Out[64]:
D
A B C
a z a 3
x a 7
b 4
y b 5
a 1
b z b 6
a 1
x a 5
y b 3
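Both answers rely on the same three-part sort key: A ascending, B by its position in the custom category order (not alphabetically), and D descending. A plain-Python sketch of that key on the question's rows, with the custom order ['z', 'x', 'y'] written as a hypothetical rank dictionary b_order:

```python
# Rows of df from the question: (A, B, C, D)
rows = [
    ("a", "x", "a", 7), ("a", "y", "b", 5), ("a", "z", "a", 3),
    ("a", "x", "b", 4), ("a", "y", "a", 1), ("b", "z", "b", 6),
    ("b", "x", "a", 5), ("b", "y", "b", 3), ("b", "z", "a", 1),
]

# Custom category order for B, mirroring set_categories(['z', 'x', 'y'])
b_order = {b: i for i, b in enumerate(["z", "x", "y"])}

# A ascending, B by custom rank, D descending (negating the number)
ordered = sorted(rows, key=lambda r: (r[0], b_order[r[1]], -r[3]))
for r in ordered:
    print(*r)
```

Negating D is the usual trick for a descending numeric key inside an otherwise ascending sort; pandas expresses the same thing with ascending=[1, 1, 0].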

Merge two text files line by line in Ruby

I'm trying to figure out how to merge two text files line by line. The letters file contains the letters A to I, one per line, and the numbers file contains the numbers 1 to 9, one per line. This is what I have so far:
file='C:\\Users\\USERNAME\\Desktop\\numbers.txt'
f = File.open(file, "r")
f.each_line { |line|
dile='C:\\Users\\USERNAME\\Desktop\\letters.txt'
d = File.open(dile, "r")
d.each_line { |dine|
this = line + dine
print this
}
}
But my results are like this:
1
A
1
B
1
C
1
D
1
E
1
F
1
G
1
H
1
I
1
J2
A
2
B
2
C
2
D
2
E
2
F
2
G
2
H
2
I
2
J3
A
3
B
3
C
3
D
3
E
3
F
3
G
3
H
3
I
3
J4
A
4
B
4
C
4
D
4
E
4
F
4
G
4
H
4
I
4
J5
A
5
B
5
C
5
D
5
E
5
F
5
G
5
H
5
I
5
J6
A
6
B
6
C
6
D
6
E
6
F
6
G
6
H
6
I
6
J7
A
7
B
7
C
7
D
7
E
7
F
7
G
7
H
7
I
7
J8
A
8
B
8
C
8
D
8
E
8
F
8
G
8
H
8
I
8
J9
A
9
B
9
C
9
D
9
E
9
F
9
G
9
H
9
I
9
J10A
10B
10C
10D
10E
10F
10G
10H
10I
10J
What I really want is something like this:
1A
2B
3C
4D
5E
6F
7G
8H
9I
Does anyone have an idea how to do this?
f1, f2 = [
'C:\\Users\\USERNAME\\Desktop\\numbers.txt',
'C:\\Users\\USERNAME\\Desktop\\letters.txt'
]
File.readlines(f1).map(&:chomp)
.zip(File.readlines(f2).map(&:chomp))
.map(&:join)
Or, without double chomping:
File.readlines(f1).zip(File.readlines(f2))
.map(&:join)
.map { |s| s.gsub /#$//, '' }
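For comparison, the same lockstep merge can be sketched in Python: zip pairs line i of one file with line i of the other, and rstrip("\n") plays the role of chomp. Unlike Ruby's zip, which pads with nil, Python's zip stops at the shorter file. The helper merge_lines and the temporary files are made up for the demo and stand in for numbers.txt and letters.txt:

```python
import os
import tempfile


def merge_lines(path1, path2):
    """Pair line i of path1 with line i of path2, dropping each line's
    trailing newline (zip stops at the shorter file)."""
    with open(path1) as f1, open(path2) as f2:
        return [a.rstrip("\n") + b.rstrip("\n") for a, b in zip(f1, f2)]


with tempfile.TemporaryDirectory() as tmp:
    numbers = os.path.join(tmp, "numbers.txt")
    letters = os.path.join(tmp, "letters.txt")
    with open(numbers, "w") as f:
        f.write("\n".join(str(n) for n in range(1, 10)) + "\n")
    with open(letters, "w") as f:
        f.write("\n".join("ABCDEFGHI") + "\n")
    merged = merge_lines(numbers, letters)

print("\n".join(merged))  # 1A, 2B, ..., 9I, one per line
```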
It's because each line already ends with a line feed (\n). Try using chomp:
this = line.chomp + dine.chomp
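Note that chomp only removes the stray line breaks; the nested each_line loops in the question still pair every number with every letter, which is why each number shows up once per letter in the output. A small Python sketch contrasting nested iteration with lockstep (zip) iteration, using three made-up lines per file:

```python
numbers = ["1\n", "2\n", "3\n"]
letters = ["A\n", "B\n", "C\n"]

# Nested loops (as in the question): every number meets every letter
nested = [n.rstrip("\n") + l.rstrip("\n") for n in numbers for l in letters]

# Lockstep iteration: the i-th number meets only the i-th letter
paired = [n.rstrip("\n") + l.rstrip("\n") for n, l in zip(numbers, letters)]

print(nested)  # ['1A', '1B', '1C', '2A', '2B', '2C', '3A', '3B', '3C']
print(paired)  # ['1A', '2B', '3C']
```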
Like @mudasobwa's answer, just without the double chomp (chomp rather than chomp!, since chomp! returns nil when a line has no trailing newline):
File.readlines('num').zip(File.readlines('let')).map { |pair| pair.map(&:chomp).join }
=> [
    [0] "1A",
    [1] "2B",
    [2] "3C",
    [3] "4D",
    [4] "5E"
]

Converting 3 variables into matrix form to create a heatmap in SAS

I'm trying to convert 3 variables into a matrix. For example, if you have the following:
(CHAR) (char) (num)
Var1 Var2 Var3
A B 1
C D 2
E F 3
A D 4
A F 5
C B 6
C F 7
E B 8
E D 9
Any ideas on how to convert the three variables above into the matrix below? My goal is to construct a heatmap from this matrix.
B D F
A 1 4 5
C 6 2 7
E 8 9 3
Can anyone help me do this in SAS, either using SAS/IML or other Procedure? Thanks!
Assuming you are using a recent version of SAS/IML (13.1 or later), use the HEATMAPCONT or HEATMAPDISC call:
proc iml;
m = {1 4 5,
6 2 7,
8 9 3};
call heatmapcont(m) xvalues={B D F} yvalues={A C E};
For details, see Creating heat maps in SAS/IML
It would be better to post your code first and then ask questions.
I think PROC TRANSPOSE is the quickest solution:
data _t1;
input var1 $ var2 $ var3 5.;
cards;
A B 1
C D 2
E F 3
A D 4
A F 5
C B 6
C F 7
E B 8
E D 9
run;
proc sort data=_t1;by var1;run;
proc transpose data=_t1 out=_t2(drop=_name_ rename=(var1=HereUpToYou));
by var1;
var var3;
id var2;
run;
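PROC TRANSPOSE with BY var1 and ID var2 is a long-to-wide reshape: the var1 values become rows, the var2 values become columns, and var3 fills the cells. A minimal Python sketch of the same reshape, assuming the row and column labels should be sorted alphabetically (which matches the matrix in the question); to_matrix is a hypothetical helper:

```python
# (Var1, Var2, Var3) triples from the question
triples = [
    ("A", "B", 1), ("C", "D", 2), ("E", "F", 3),
    ("A", "D", 4), ("A", "F", 5), ("C", "B", 6),
    ("C", "F", 7), ("E", "B", 8), ("E", "D", 9),
]


def to_matrix(triples):
    """Long-to-wide reshape: Var1 -> rows, Var2 -> columns, Var3 -> cells.
    Missing (row, column) combinations become None."""
    row_labels = sorted({r for r, _, _ in triples})
    col_labels = sorted({c for _, c, _ in triples})
    cells = {(r, c): v for r, c, v in triples}
    matrix = [[cells.get((r, c)) for c in col_labels] for r in row_labels]
    return row_labels, col_labels, matrix


rows, cols, matrix = to_matrix(triples)
print(" ", *cols)
for label, values in zip(rows, matrix):
    print(label, *values)
```

The resulting matrix is what the SAS/IML answer above passes to HEATMAPCONT.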

How to compute a natural join?

Table R (A, C) contains the following entries:
A C
3 3
6 4
2 3
3 5
7 1
Table S (B, C, D) contains the following:
B C D
5 1 6
1 5 8
4 3 9
Calculate the natural join of R and S. Which rows would be in the result? Each resulting row has the schema (A, B, C, D).
Please help!
The natural join matches rows on the shared column C (C = 4 in R has no partner in S), so the answer should be: {(3,4,3,9),(2,4,3,9),(3,1,5,8),(7,5,1,6)}
A B C D
3 4 3 9
2 4 3 9
3 1 5 8
7 5 1 6
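A natural join pairs rows that agree on every column the two tables share (here only C) and keeps that column once. A simple nested-loop sketch in Python, with the hypothetical helper natural_join, reproduces the answer above; sorting the output columns is an assumption made to get the (A, B, C, D) schema:

```python
def natural_join(r_cols, r_rows, s_cols, s_rows):
    """Join every row of R with every row of S that agrees on all shared
    columns; each shared column appears only once in the output."""
    shared = [c for c in r_cols if c in s_cols]
    # Sorted column names give the (A, B, C, D) schema used in the question
    out_cols = sorted(set(r_cols) | set(s_cols))
    result = []
    for r in r_rows:
        r_rec = dict(zip(r_cols, r))
        for s in s_rows:
            s_rec = dict(zip(s_cols, s))
            if all(r_rec[c] == s_rec[c] for c in shared):
                merged = {**r_rec, **s_rec}
                result.append(tuple(merged[c] for c in out_cols))
    return out_cols, result


R_cols, R = ("A", "C"), [(3, 3), (6, 4), (2, 3), (3, 5), (7, 1)]
S_cols, S = ("B", "C", "D"), [(5, 1, 6), (1, 5, 8), (4, 3, 9)]

cols, joined = natural_join(R_cols, R, S_cols, S)
print(cols)    # ['A', 'B', 'C', 'D']
print(joined)  # [(3, 4, 3, 9), (2, 4, 3, 9), (3, 1, 5, 8), (7, 5, 1, 6)]
```

If the tables share no columns, the all(...) check is trivially true and the result is the Cartesian product, which is also what a natural join gives in that case.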
