Rewrite matrix into rules - algorithm

I have a lot of rectangular matrices where each cell represents some outcome. As matrices are difficult to maintain, it is my goal to rewrite all of them into rules.
Example Matrix 1:
This is easy to turn into rules (pseudocode):
if (i <= 5 and j <=3) then A
else if (i <= 5 and j >=4) then B
else C
How do I rewrite the following matrix?
Plain text:
ij 1 2 3 4 5 6 7 8 9
1 A A A A C C C C B
2 A A A C C C C B B
3 A A C C C C B B B
4 A C C C C B B B B
5 C C C C B B B B B
6 C C C B B B B B B
7 C C B B B B B B B
8 C B B B B B B B B
9 B B B B B B B B B

The second matrix can be represented as:
if (i+j <= 5)
    return A;
else if (i+j <= 9)
    return C;
else
    return B;
In general, you can check which side of a diagonal line a point is on by testing i+j for a / line, or i-j for a \ line.
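As a quick sanity check, here is a minimal Python sketch of the same rule, compared cell by cell against the matrix from the question. The names classify and MATRIX are made up for this illustration, and i and j are assumed to be 1-based as in the question.

```python
def classify(i: int, j: int) -> str:
    """Return the outcome for row i, column j (both 1-based)."""
    s = i + j  # constant along each "/" anti-diagonal
    if s <= 5:
        return "A"
    if s <= 9:
        return "C"
    return "B"

# The second matrix from the question, one string per row.
MATRIX = [
    "AAAACCCCB",
    "AAACCCCBB",
    "AACCCCBBB",
    "ACCCCBBBB",
    "CCCCBBBBB",
    "CCCBBBBBB",
    "CCBBBBBBB",
    "CBBBBBBBB",
    "BBBBBBBBB",
]

# Collect every cell where the rule disagrees with the matrix.
mismatches = [
    (i, j)
    for i in range(1, 10)
    for j in range(1, 10)
    if classify(i, j) != MATRIX[i - 1][j - 1]
]
print(mismatches)  # an empty list means the rules reproduce the matrix
```

Comparing the rules against the original matrix like this is a cheap way to confirm a rewrite before the matrix is thrown away.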

Related

sorting a dataframe by values and storing index and columns

I have a pandas DataFrame which is actually a matrix. It looks as shown below
a b c
d 1 0 5
e 0 6 2
f 2 0 3
I need the values sorted in descending order, together with the index and column label of each value. The result should be:
index Column Value
e b 6
d c 5
f c 3
Use stack to reshape the DataFrame into a Series, then nlargest to pick the top values:
df1 = df.stack().nlargest(3).rename_axis(['idx','col']).reset_index(name='val')
print (df1)
idx col val
0 e b 6
1 d c 5
2 f c 3
For MultiIndex:
df2 = df.stack().nlargest(3).to_frame(name='val')
print (df2)
val
e b 6
d c 5
f c 3
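nlargest(3) hard-codes how many rows come back. If every cell should be ranked, a sketch along the following lines might work instead. The frame is rebuilt from the question, and the names ranked and top3 are only for illustration.

```python
import pandas as pd

# Rebuild the matrix-like frame from the question.
df = pd.DataFrame(
    [[1, 0, 5], [0, 6, 2], [2, 0, 3]],
    index=list("def"),
    columns=list("abc"),
)

# stack() turns the matrix into a Series keyed by (row label, column label);
# sorting that Series ranks every cell, not just the top few.
ranked = (
    df.stack()
    .sort_values(ascending=False)
    .rename_axis(["index", "Column"])
    .reset_index(name="Value")
)

top3 = ranked.head(3)
print(top3)
```

Tied values (the zeros and the twos here) come out in no guaranteed relative order, so add a secondary sort key if that matters.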

Pandas pivot table Nested Sorting Part 3

Episode 3:
In part 2, we retained the hierarchical nature of the indices while sorting within right-most level. In part 1, we applied a custom sort to the left-most index level while sorting the values within the right-most index.
Now, I'd like to combine both methods.
Given the following data frame and resultant pivot table:
import pandas as pd
df=pd.DataFrame({'A':['a','a','a','a','a','b','b','b','b'],
'B':['x','y','z','x','y','z','x','y','z'],
'C':['a','b','a','b','a','b','a','b','a'],
'D':[7,5,3,4,1,6,5,3,1]})
df
A B C D
0 a x a 7
1 a y b 5
2 a z a 3
3 a x b 4
4 a y a 1
5 b z b 6
6 b x a 5
7 b y b 3
8 b z a 1
table = pd.pivot_table(df, index=['A', 'B','C'],aggfunc='sum')
table
       D
A B C
a x a  7
    b  4
  y a  1
    b  5
  z a  3
b x a  5
  y b  3
  z a  1
    b  6
I would like to specify a custom order of 'B'.
This seems to work:
df['B']=df['B'].astype('category')
df['B']=df['B'].cat.set_categories(['z','x','y'])
Next, I'd like the pivot table to keep the order for 'B' specified above while sorting the values of 'D' in descending order within each category of 'B'.
Like this:
       D
A B C
a z a  3
  x a  7
    b  4
  y b  5
    a  1
b z b  6
    a  1
  x a  5
  y b  3
Thanks in advance!
UPDATE: using pivot_table()
In [79]: df.pivot_table(index=['A','B','C'], aggfunc='sum').reset_index().sort_values(['A','B','D'], ascending=[1,1,0]).set_index(['A','B','C'])
Out[79]:
       D
A B C
a x a  7
    b  4
  y b  5
    a  1
  z a  3
b x a  5
  y b  3
  z b  6
    a  1
Is that what you want?
In [64]: df.sort_values(['A','B','D'], ascending=[1,1,0]).set_index(['A','B','C'])
Out[64]:
       D
A B C
a z a  3
  x a  7
    b  4
  y b  5
    a  1
b z b  6
    a  1
  x a  5
  y b  3
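Putting the custom category order and the descending sort together, a self-contained sketch could look like this. Note that it swaps pivot_table for groupby(..., observed=True), which is an assumption on my part to keep unobserved category combinations out of the result; the variable name table is reused from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["a", "a", "a", "a", "a", "b", "b", "b", "b"],
    "B": ["x", "y", "z", "x", "y", "z", "x", "y", "z"],
    "C": ["a", "b", "a", "b", "a", "b", "a", "b", "a"],
    "D": [7, 5, 3, 4, 1, 6, 5, 3, 1],
})

# Custom order for 'B': sorting a categorical follows the category order.
df["B"] = pd.Categorical(df["B"], categories=["z", "x", "y"], ordered=True)

table = (
    df.groupby(["A", "B", "C"], observed=True, as_index=False)["D"]
    .sum()
    # A ascending, B in category order, D descending within each (A, B).
    .sort_values(["A", "B", "D"], ascending=[True, True, False])
    .set_index(["A", "B", "C"])
)
print(table)
```

The sort happens on flat columns before the index is rebuilt, which is why the hierarchy survives while D is ordered inside each group.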

Removing certain columns from a text file [duplicate]

I have a text file that looks like this:
A B C A B C A B C A B
G T C A G T C A G T C
A B C A B C A B C A B
A B C A B C A B C A B
A D E A B D E A B D E
A B C A B C A B C A B
C B D G C B D G C B D
Is there a way to remove only certain columns and leave the other columns intact?
For example removing only columns 2 and 5:
A C A C A B C A B
G C A T C A G T C
A C A C A B C A B
A C A C A B C A B
A E A D E A B D E
A C A C A B C A B
C D G B D G C B D
Thanks in advance.
UPDATE:
I found this answer using awk, but it removes a whole contiguous "block" of columns, and I only want to remove specific ones.
Awk for removing columns 3 to 5:
awk -F 'FS' 'BEGIN{FS="\t"}{for (i=1; i<=NF-1; i++) if(i<3 || i>5) {printf $i FS};{print $NF}}' input.txt
In your case you could do:
cut -d ' ' --complement -s -f2,5 your_file
where ' ' is the delimiter (a space in your case). Note that --complement is a GNU cut option.
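If GNU cut is not available, the same idea can be sketched in a few lines of Python. The helper name drop_columns is made up here, and columns are assumed to be separated by single spaces and numbered from 1, as in the question.

```python
def drop_columns(line: str, drop: set, sep: str = " ") -> str:
    """Return line with the 1-based column numbers in drop removed."""
    fields = line.split(sep)
    kept = [field for n, field in enumerate(fields, start=1) if n not in drop]
    return sep.join(kept)

rows = [
    "A B C A B C A B C A B",
    "G T C A G T C A G T C",
    "A D E A B D E A B D E",
]
for row in rows:
    print(drop_columns(row, {2, 5}))
```

For a real file, the same function can be applied line by line while reading, after stripping the trailing newline.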

Which function/algorithm for this merging and filling operation?

I have written R code that merges two data frames on their first column and, where data is missing, fills in the value from the row above. Here is what it does:
Two input data frames:
1 a
2 b
3 c
5 d
And
1 e
4 f
6 g
My code gives this output:
1 a e
2 b e
3 c e
4 c f
5 d f
6 d g
However, my code is inefficient because it is not properly vectorized. Are there R functions I could use? Basically, I am looking for a function that replaces each missing (NA) value with the value of the previous element.
I looked through reference book of R, but could not find anything.
Here is a solution that uses zoo::na.locf ("last observation carried forward"):
library(zoo)
a <- data.frame(id=c(1,2,3,5), v=c("a","b","c", "d"))
b <- data.frame(id=c(1,4,6), v=c("e", "f", "g"))
n <- max(c(a$id, b$id))
an <- merge(data.frame(id=1:n), a, all.x=T)
bn <- merge(data.frame(id=1:n), b, all.x=T)
an$v <- na.locf(an$v)
bn$v <- na.locf(bn$v)
data.frame(an$id, an$v, bn$v)
an.id an.v bn.v
1 1 a e
2 2 b e
3 3 c e
4 4 c f
5 5 d f
6 6 d g
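For comparison, the same merge-then-carry-forward idea can be sketched in pandas. This is only an analogue of the R code above, not a replacement for it; the column suffixes _a and _b and the name merged are choices made for this example.

```python
import pandas as pd

a = pd.DataFrame({"id": [1, 2, 3, 5], "v": ["a", "b", "c", "d"]})
b = pd.DataFrame({"id": [1, 4, 6], "v": ["e", "f", "g"]})

merged = (
    a.merge(b, on="id", how="outer", suffixes=("_a", "_b"))  # union of ids
    .sort_values("id")         # make sure rows are in id order before filling
    .ffill()                   # carry the last seen value forward into gaps
    .reset_index(drop=True)
)
print(merged)
```

Unlike the R version, this only produces rows for ids that occur in at least one input, which happens to cover 1 through 6 for this data.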

How to sort dataframe in R with specified column order preservation?

Let's say I have a data.frame
x <- data.frame(a = c('A','A','A','A','A', 'C','C','C','C', 'B','B','B'),
b = c('a','c','a','a','c', 'd', 'e','e','d', 'b','b','b'),
c = c( 7, 3, 2, 4, 5, 3, 1, 1, 5, 5, 2, 3),
stringsAsFactors = FALSE)
> x
a b c
1 A a 7
2 A c 3
3 A a 2
4 A a 4
5 A c 5
6 C d 3
7 C e 1
8 C e 1
9 C d 5
10 B b 5
11 B b 2
12 B b 3
I would like to sort x by columns b and c while keeping the order of a as before. x[order(x$b, x$c),] breaks the order of column a. This is what I want:
a b c
3 A a 2
4 A a 4
1 A a 7
2 A c 3
5 A c 5
6 C d 3
9 C d 5
7 C e 1
8 C e 1
11 B b 2
12 B b 3
10 B b 5
Is there a quick way of doing it?
Currently I run "for" loop and sort each subset, I'm sure there must be a better way.
Thank you!
Ilya
If column "a" is ordered already, then it's this simple:
> x[order(x$a,x$b, x$c),]
a b c
3 A a 2
4 A a 4
1 A a 7
2 A c 3
5 A c 5
6 B d 3
9 B d 5
7 B e 1
8 B e 1
11 C b 2
12 C b 3
10 C b 5
If column a isn't ordered (but is grouped), create a new factor with the levels of x$a and use that.
Thank you Spacedman! Your recommendation works well.
x$a <- factor(x$a, levels = unique(x$a), ordered = TRUE)
x[order(x$a,x$b, x$c),]
Following Gavin's comment
x$a <- factor(x$a, levels = unique(x$a))
x[order(x$a,x$b, x$c),]
require(doBy)
orderBy(~ a + b + c, data=x)
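The factor trick carries over to other tools as well. As a rough pandas sketch of the same idea (an analogue, not part of the R answers above), the grouping column becomes a categorical whose categories follow order of first appearance; the name out is just for illustration.

```python
import pandas as pd

x = pd.DataFrame({
    "a": list("AAAAACCCCBBB"),
    "b": ["a", "c", "a", "a", "c", "d", "e", "e", "d", "b", "b", "b"],
    "c": [7, 3, 2, 4, 5, 3, 1, 1, 5, 5, 2, 3],
})

# Categories in order of first appearance (A, C, B), like
# factor(x$a, levels = unique(x$a)) in R.
x["a"] = pd.Categorical(x["a"], categories=x["a"].unique(), ordered=True)

# Sorting by the categorical keeps the original group order of 'a',
# while b and c are sorted within each group.
out = x.sort_values(["a", "b", "c"])
print(out)
```

Note that pandas row labels are 0-based here, so they are one less than the row numbers shown in the R output.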

Resources