Pandas pivot table nested sorting, Part 3

Episode 3:
In part 2, we retained the hierarchical nature of the indices while sorting within the right-most level. In part 1, we applied a custom sort to the left-most index level while sorting the values within the right-most index level.
Now, I'd like to combine both methods.
Given the following data frame and resultant pivot table:
import pandas as pd
df=pd.DataFrame({'A':['a','a','a','a','a','b','b','b','b'],
'B':['x','y','z','x','y','z','x','y','z'],
'C':['a','b','a','b','a','b','a','b','a'],
'D':[7,5,3,4,1,6,5,3,1]})
df
A B C D
0 a x a 7
1 a y b 5
2 a z a 3
3 a x b 4
4 a y a 1
5 b z b 6
6 b x a 5
7 b y b 3
8 b z a 1
table = pd.pivot_table(df, index=['A', 'B','C'],aggfunc='sum')
table
       D
A B C
a x a  7
    b  4
  y a  1
    b  5
  z a  3
b x a  5
  y b  3
  z a  1
    b  6
I would like to specify a custom order of 'B'.
This seems to work:
df['B'] = df['B'].astype('category')
# inplace=True was removed from cat.set_categories in pandas 2.0, so assign the result back
df['B'] = df['B'].cat.set_categories(['z','x','y'])
Next, I'd like the pivot table to keep the order of 'B' specified above while sorting the values of 'D' in descending order within each category of 'B'.
Like this:
       D
A B C
a z a  3
  x a  7
    b  4
  y b  5
    a  1
b z b  6
    a  1
  x a  5
  y b  3
Thanks in advance!

UPDATE: using pivot_table()
In [79]: df.pivot_table(index=['A','B','C'], aggfunc='sum').reset_index().sort_values(['A','B','D'], ascending=[1,1,0]).set_index(['A','B','C'])
Out[79]:
       D
A B C
a x a  7
    b  4
  y b  5
    a  1
  z a  3
b x a  5
  y b  3
  z b  6
    a  1
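Is that what you want? Sorting the original DataFrame directly also works here, because every (A, B, C) combination occurs only once in df, so the sum aggregation leaves the values unchanged: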
In [64]: df.sort_values(['A','B','D'], ascending=[1,1,0]).set_index(['A','B','C'])
Out[64]:
       D
A B C
a z a  3
  x a  7
    b  4
  y b  5
    a  1
b z b  6
    a  1
  x a  5
  y b  3
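Here is a self-contained sketch that applies the custom 'B' order to the aggregated pivot table itself rather than to the raw frame. Instead of a categorical dtype it uses a temporary rank column (a swapped-in technique; `b_rank` and `B_rank` are hypothetical names), and it rebuilds the question's data so it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({'A': list('aaaaabbbb'),
                   'B': list('xyzxyzxyz'),
                   'C': list('ababababa'),
                   'D': [7, 5, 3, 4, 1, 6, 5, 3, 1]})

# Aggregate first, as in the question
table = pd.pivot_table(df, index=['A', 'B', 'C'], values='D', aggfunc='sum')

# Hypothetical rank mapping for the custom order of 'B': z, then x, then y
b_rank = {'z': 0, 'x': 1, 'y': 2}

flat = table.reset_index()
flat['B_rank'] = flat['B'].map(b_rank)  # temporary sort key

# A ascending, B in custom order, D descending within each (A, B) group
result = (flat.sort_values(['A', 'B_rank', 'D'], ascending=[True, True, False])
              .drop(columns='B_rank')
              .set_index(['A', 'B', 'C']))
print(result)
```

Ranking through a plain dict avoids depending on how a particular pandas version carries categorical levels through pivot_table and reset_index.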


How to build a special Turing machine

Good day. I have a question. Many people are familiar with the Turing machine. I have been stuck on the following task for a long time: the alphabet consists of the letters "X", "Y" and "Z". If the number of "Z"s in the word is exactly 2 more than the number of "X"s, replace the second "Z" with "X"; otherwise, leave the word unchanged. Since I cannot change the original word and the tape is infinite (that is, I cannot write an infinite number of states for the machine), I do not understand how to do this.
To make it clearer: if the input is, for example:
XXXYZZZZZ
then the output you need is:
XXXYZXZZZ
And if the input is:
XXYZZ
Then everything needs to stay the same, since (number of Z's) - (number of X's) != 2
XXYZZ
If I understood the problem correctly, as defined above, then here is a solution for Morphett's TM simulator, using the $ sign as a left-end marker:
1 $ $ r 1
1 X A r 2
1 Y Y r 4
2 X X r 2
2 B B r 2
2 Y Y r 2
2 Z B l 3
2 _ _ l 7
3 Y Y l 3
3 B B l 3
3 X X l 3
3 A A r 1
3 Z Z l 3
4 Y Y r 4
4 B B r 4
4 Z B r 5
4 _ _ l 7
5 Z B r 6
5 _ _ l 7
6 Z Z l 7
6 _ _ l 8
7 B Z l 7
7 Y Y l 7
7 X X l 7
7 A X l 7
7 $ $ r 12
8 B Z l 8
8 Y Y l 8
8 A X l 8
8 $ $ r 9
9 X X r 9
9 Y Y r 9
9 Z Z r 10
10 Z X r 11
11 Z Z r 11
11 _ _ l halt
12 X X r 12
12 Y Y r 12
12 Z Z r 12
12 _ _ l halt
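Copy this code and paste it into http://morphett.info/turing/turing.html
In the advanced options, change the initial state from 0 to 1.
Do not forget to add a "$" to the beginning of every input.
Note that the rules assume inputs shaped like the examples above: all X's first, then at least one Y, then the Z's.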
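To check the machine's output on test words, it helps to have the expected mapping in plain code. This is a minimal Python reference of the transformation as the question defines it, not a Turing machine and not the answer's method; `transform` is a hypothetical helper name:

```python
def transform(word: str) -> str:
    """If #Z == #X + 2, replace the second Z with X; otherwise return the word unchanged."""
    if word.count("Z") - word.count("X") != 2:
        return word
    first = word.index("Z")
    second = word.index("Z", first + 1)  # exists, because #Z >= 2 in this branch
    return word[:second] + "X" + word[second + 1:]


for w in ["XXXYZZZZZ", "XXYZZ"]:
    print(w, "->", transform(w))
```

Running the same words through the simulator (with the leading "$") and comparing against `transform` is a quick way to catch a missing rule.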

Split a subset with a constraint

Today, while practicing some algorithm questions, I found an interesting one.
The question is:
You have to divide the numbers 1 to n (with one missing value x) into two halves such that the sums of the two halves are equal.
Example:
If n = 7 and x = 4
The solution will be {7, 5} and {1, 2, 3, 6}
I can solve it by brute force, but I want an efficient solution.
Can anyone help me out?
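If the sum of the elements 1→N without x is odd, then there is no solution.
Otherwise, you can usually find a solution in O(N) with balanced selection (very small cases can still fail: for n = 3 and x = 2, the set {1, 3} has an even sum but no equal split).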
4 in a row
First, note that any sequence of four consecutive numbers can be split into two sets with equal sums, since:
[x, x+1, x+2, x+3] → [x+3, x];[x+2, x+1]
Thus, placing them into the sets in the pattern A B B A balances sets A and B.
4 across
Moreover, two pairs on either side of an omitted value have a similar property:
[x-2, x-1, x+1, x+2] → [x+2, x-2]; [x+1, x-1]
so still A B B A
At this point we can handle the following cases:
we have a quadruplet: we split it as in case 1 (4 in a row)
we have 2 numbers, then x, then 2 more numbers: we split them as in case 2 (4 across)
Alright, but we might also have 3 numbers, then x, then 3 more numbers, or other configurations. How can we still select in a balanced manner?
+2 Gap
If we look again at the gap around x:
[x-1, x+1]
we notice that if we put the two neighbors into separate sets, the set holding x+1 gets a surplus of 2, which must be balanced elsewhere.
Balancing Tail
We can do this by using the last four numbers of the sequence:
[4 3 2 1] → [4, 2] ; [3, 1] → 6 ; 4
Finally, we might have only three numbers left at the end, so let's build that case too:
[3 2 1] → [2] ; [3, 1] → 2 ; 4
We can also do the very same at the other end of the sequence with an A B A B (or B A B A) pattern, if our +2 falls on B (or A).
4 across +
It is remarkable that 4 across still holds if we skip h (odd!) numbers:
[x+3, x+2, x-2, x-3] → [x+3, x-3]; [x+2, x-2]
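All of these balancing patterns come down to equal pair sums. Written out, with x the generic value used in each pattern above and the gap case showing the surplus of 2:

```latex
\begin{aligned}
\text{4 in a row:}\quad & x + (x+3) = (x+1) + (x+2) = 2x + 3 \\
\text{4 across:}\quad & (x+2) + (x-2) = (x+1) + (x-1) = 2x \\
\text{4 across +:}\quad & (x+3) + (x-3) = (x+2) + (x-2) = 2x \\
\text{+2 gap:}\quad & (x+1) - (x-1) = 2
\end{aligned}
```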
So, by scanning the array, we can build the solution step by step.
An example:
11 10 9 8 7 6 5 4 3 2 1
The sum is even, so x can only be an even number:
x = 10
11 - 9 | 8 7 6 5 | 4 3 2 1 → (+2 gap - on A) (4 in a row) (balancing tail)
A B A B B A B A B A
x = 8
11 10 | 9 - 7 | 6 5 | 4 3 2 1 → (4 across +) (+2 gap - on A) (balancing tail)
a b A B | b a | B A B A
x = 6
11 10 9 8 | 7 - 5 | 4 3 2 1 → (4 in a row) (+2 gap - on A) (balancing tail)
A B B A A B A B B B
x = 4: we have no balancing tail, so we have to do the balancing with the head
11 10 9 8 | 7 6 | 5 - 3 | 2 1 → (balancing head) (4 across +) (+2 gap)
A B A B A B | b a | B A
x = 2
11 10 9 8 | 7 6 5 4 | 3 - 1 → (balancing head) (4 in a row) (+2 gap)
A B A B A B B A B A
It is interesting to notice the symmetry of the solutions. Another example.
10 9 8 7 6 5 4 3 2 1
The sum is odd, so x can only be an odd number, and the number of remaining elements is odd.
x = 9
10 - 8 | 7 6 5 4 | 3 2 1 → (+2 gap - on A) (4 in a row) (balancing tail)
A B A B B A B A B
x = 7
10 9 | 8 - 6 | 5 4 | 3 2 1 → (4 across +) (+2 gap - on A) (balancing tail)
a b | A B | b a B A B
x = 5
10 9 8 7 | 6 - 4 | 3 2 1 → (4 in a row) (+2 gap - on A) (balancing tail)
A B B A A B B A B
x = 3
10 9 8 7 | 6 5 | 4 - 2 | 1 → (balancing head) (4 across + virtual 0) (+2 gap)
A B A B B A | b a | A
x = 1
10 9 8 7 | 6 5 4 3 | 2 → (balancing head) (4 in a row) (+2 gap virtual 0)
A B A B A B B A B
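A small helper makes it easy to verify any labelled example above by summing each side. This is only a checking aid, not the balanced-selection algorithm itself; `side_sums` is a hypothetical name, and lowercase a/b are treated the same as A/B, as in the examples:

```python
def side_sums(numbers, labels):
    """Return (sum of A/a items, sum of B/b items) for one labelled example."""
    if len(numbers) != len(labels):
        raise ValueError("need exactly one label per number")
    a = sum(n for n, lab in zip(numbers, labels) if lab.upper() == "A")
    b = sum(n for n, lab in zip(numbers, labels) if lab.upper() == "B")
    return a, b


# n = 11, x = 10: "11 - 9 | 8 7 6 5 | 4 3 2 1"
print(side_sums([11, 9, 8, 7, 6, 5, 4, 3, 2, 1], "A B A B B A B A B A".split()))  # (28, 28)
# n = 11, x = 8: lowercase a/b mark the "4 across +" pairs
print(side_sums([11, 10, 9, 7, 6, 5, 4, 3, 2, 1], "a b A B b a B A B A".split()))  # (29, 29)
```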
Finally, it is worth noticing that we can swap A and B within any fully balanced segment (i.e., 4 in a row or 4 across).
Funnily enough, the requirement that sum([1 ... N]) - x be even makes the cases quite repetitive, as you will see if you try it yourself.
I am pretty sure this algorithm can be generalized; I'll probably provide a revised version soon.
This problem can be solved by wrapping the standard dynamic-programming subset sum problem with a few preprocessing steps. These steps take O(1) time, apart from building the array, which takes O(n).
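def algorithm(n, x):
    total = n * (n + 1) // 2
    needed_sum = total - x
    if needed_sum % 2 != 0:
        return 0  # odd remaining sum: no equal split exists
    arr = [k for k in range(1, n + 1) if k != x]  # array [1..n] with x removed
    return subsetsum(arr, 0, needed_sum // 2, [])  # standard subset sum, defined below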
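Below is a working Python implementation of the subsetsum algorithm that prints all subsets. Note that it is plain recursion over include/exclude choices, so it runs in exponential time rather than filling a DP table.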
def subsetsum(arr, i, target, ss):
    # Base case: every element has been decided
    if i >= len(arr):
        if target == 0:
            print(ss)  # ss is one subset that hits the target
            return 1
        return 0
    # Branch 1: leave arr[i] out
    ss1 = ss[:]
    count = subsetsum(arr, i + 1, target, ss1)
    # Branch 2: put arr[i] in
    ss1.append(arr[i])
    count += subsetsum(arr, i + 1, target - arr[i], ss1)
    return count

arr = [1, 2, 3, 10, 5, 7]
target = 14
print(subsetsum(arr, 0, target, []))
Hope it helps!
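The recursive search enumerates every subset, so it takes exponential time. If a single split is enough, a classic pseudo-polynomial subset-sum DP finds one in O(n · S), where S is half of the remaining sum. This is a swapped-in technique rather than either answer's code, and `find_half` is a hypothetical name:

```python
def find_half(n: int, x: int):
    """Return one subset of {1..n} without x that sums to half the total, or None."""
    nums = [k for k in range(1, n + 1) if k != x]
    total = sum(nums)
    if total % 2:
        return None  # odd total: no equal split
    target = total // 2
    # parent[s] = the number whose addition first reached sum s (0 marks the empty sum)
    parent = [None] * (target + 1)
    parent[0] = 0
    for k in nums:
        # walk sums downward so each number is used at most once
        for s in range(target, k - 1, -1):
            if parent[s] is None and parent[s - k] is not None:
                parent[s] = k
    if parent[target] is None:
        return None  # even total, but no subset reaches half of it
    subset, s = [], target
    while s > 0:  # follow the parent links back to sum 0
        subset.append(parent[s])
        s -= parent[s]
    return subset


print(find_half(7, 4))  # [6, 3, 2, 1], i.e. {1, 2, 3, 6} vs {5, 7}
```

The other half is simply the remaining numbers; `None` covers both the odd-sum case and small cases such as n = 3, x = 2.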

Sorting a dataframe by values and storing index and columns

I have a pandas DataFrame which is actually a matrix. It looks like this:
a b c
d 1 0 5
e 0 6 2
f 2 0 3
I need the values sorted, together with the index and column label of each value. The result should be:
index Column Value
e b 6
d c 5
f c 3
You need stack to reshape, then nlargest:
df1 = df.stack().nlargest(3).rename_axis(['idx','col']).reset_index(name='val')
print (df1)
idx col val
0 e b 6
1 d c 5
2 f c 3
To keep the result as a MultiIndex:
df2 = df.stack().nlargest(3).to_frame(name='val')
print (df2)
val
e b 6
d c 5
f c 3
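If you need every value sorted rather than only the top 3, here is a sketch using melt instead of stack (an alternative route; the matrix from the question is rebuilt by hand, and `long_df` is a hypothetical name). It keeps the index and column labels as ordinary columns:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 0, 2], 'b': [0, 6, 0], 'c': [5, 2, 3]},
                  index=['d', 'e', 'f'])

# One row per (index label, column label) pair, then sort all values descending
long_df = (df.rename_axis('idx')
             .reset_index()
             .melt(id_vars='idx', var_name='col', value_name='val')
             .sort_values('val', ascending=False)
             .reset_index(drop=True))
print(long_df.head(3))
```

Ties (the three zeros here) may come out in any order, so add a secondary sort key if that matters.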

How to compute a natural join?

Table R (A, C) contains the following entries:
A C
3 3
6 4
2 3
3 5
7 1
Table S (B, C, D) contains the following entries:
B C D
5 1 6
1 5 8
4 3 9
Calculate the natural join of R and S. Which rows would be in the result? Each resulting row has the schema (A, B, C, D).
Please help!
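A natural join pairs every row of R with every row of S that has the same value in the shared column C; the R row (6, 4) has no partner in S, so it drops out. So your answer should be: {(3,4,3,9),(2,4,3,9),(3,1,5,8),(7,5,1,6)}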
A B C D
3 4 3 9
2 4 3 9
3 1 5 8
7 5 1 6
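The definition can be checked mechanically: a natural join keeps every pair of rows that agree on the shared attribute C. A plain-Python sketch over the two tables as given (a nested-loop join, chosen for clarity rather than speed):

```python
# Rows of R over (A, C) and S over (B, C, D), copied from the tables above
R = [(3, 3), (6, 4), (2, 3), (3, 5), (7, 1)]
S = [(5, 1, 6), (1, 5, 8), (4, 3, 9)]

# Natural join on the shared attribute C; output schema (A, B, C, D)
joined = [(a, b, c, d)
          for (a, c) in R
          for (b, c_s, d) in S
          if c == c_s]

for row in joined:
    print(row)
```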

How to sort dataframe in R with specified column order preservation?

Let's say I have a data.frame
x <- data.frame(a = c('A','A','A','A','A', 'C','C','C','C', 'B','B','B'),
b = c('a','c','a','a','c', 'd', 'e','e','d', 'b','b','b'),
c = c( 7, 3, 2, 4, 5, 3, 1, 1, 5, 5, 2, 3),
stringsAsFactors = FALSE)
> x
a b c
1 A a 7
2 A c 3
3 A a 2
4 A a 4
5 A c 5
6 C d 3
7 C e 1
8 C e 1
9 C d 5
10 B b 5
11 B b 2
12 B b 3
I would like to sort x by columns b and c while keeping the order of a as before. x[order(x$b, x$c),] breaks the order of column a. This is what I want:
a b c
3 A a 2
4 A a 4
1 A a 7
2 A c 3
5 A c 5
6 C d 3
9 C d 5
7 C e 1
8 C e 1
11 B b 2
12 B b 3
10 B b 5
Is there a quick way of doing it?
Currently I run a "for" loop and sort each subset; I'm sure there must be a better way.
Thank you!
Ilya
If column "a" is already ordered, then it's this simple:
> x[order(x$a,x$b, x$c),]
a b c
3 A a 2
4 A a 4
1 A a 7
2 A c 3
5 A c 5
6 B d 3
9 B d 5
7 B e 1
8 B e 1
11 C b 2
12 C b 3
10 C b 5
If column a isn't ordered (but is grouped), create a new factor with the levels of x$a and use that.
Thank you Spacedman! Your recommendation works well.
x$a <- factor(x$a, levels = unique(x$a), ordered = TRUE)
x[order(x$a,x$b, x$c),]
Following Gavin's comment
x$a <- factor(x$a, levels = unique(x$a))
x[order(x$a,x$b, x$c),]
require(doBy)
orderBy(~ a + b + c, data=x)
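For comparison, the same "factor with levels = unique(x$a)" idea can be sketched in pandas, the library used elsewhere on this page. This is a cross-language illustration, not part of the R answers; it assumes pandas is installed and mirrors R's row names with an index starting at 1:

```python
import pandas as pd

x = pd.DataFrame({
    "a": list("AAAAACCCCBBB"),
    "b": list("acaacdeedbbb"),
    "c": [7, 3, 2, 4, 5, 3, 1, 1, 5, 5, 2, 3],
}, index=range(1, 13))

# Like factor(x$a, levels = unique(x$a)): categories in order of first appearance
x["a"] = pd.Categorical(x["a"], categories=x["a"].unique(), ordered=True)

# Sort by a (in appearance order A, C, B), then b, then c
result = x.sort_values(["a", "b", "c"])
print(result)
```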
