Special Case of Natural Join - relational-algebra

What is the result of a natural join if two relations have exactly the same attributes? That is, suppose we have the two relations
A B      A B
1 2      7 8
3 4      9 10
5 6      1 2
Would the result just be
A B
1 2
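For intuition: when both relations have the same attribute set, every attribute is a join attribute, so a natural join keeps exactly the tuples that appear in both relations, i.e. it behaves like set intersection. A minimal Python sketch of that idea follows; the `natural_join` helper and the names `r` and `s` are made up for illustration and are not part of the question.

```python
def natural_join(r, s):
    """Natural join of two relations given as lists of dicts (attribute -> value)."""
    result = []
    for row_r in r:
        for row_s in s:
            common = row_r.keys() & row_s.keys()  # shared attribute names
            if all(row_r[a] == row_s[a] for a in common):
                merged = {**row_r, **row_s}
                if merged not in result:  # relations are sets: no duplicate tuples
                    result.append(merged)
    return result


r = [{"A": 1, "B": 2}, {"A": 3, "B": 4}, {"A": 5, "B": 6}]
s = [{"A": 7, "B": 8}, {"A": 9, "B": 10}, {"A": 1, "B": 2}]

joined = natural_join(r, s)
intersection = [row for row in r if row in s]

print(joined)                  # [{'A': 1, 'B': 2}]
print(joined == intersection)  # True
```

So for the example data the result is indeed the single tuple (1, 2).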

Find unrelated partitions of a complete binary tree

I have a complete binary tree of height 'h'.
How do I find 'h' unrelated partitions of its nodes?
NOTE:
"Unrelated partition" means that no child may be in the same partition as its immediate parent.
There is a constraint on the number of nodes in each partition:
the difference between the maximum and the minimum number of nodes in a partition must be either 0 or 1.
Also, the root is excluded from the partitions.
Whoever devised the problem probably had a more elegant solution in mind, but the following works.
Let's say we have h partitions numbered 1 to h, and that the nodes of partition n have value n. The root node has value 0 and does not participate in the partitions. Let's call a partition even if n is even, and odd if n is odd. Let's also number the levels of the complete binary tree, ignoring the root and starting from level 1 with 2 nodes. Level n has 2^n nodes, and the complete tree has 2^(h+1)-1 nodes, but only P = 2^(h+1)-2 nodes belong to the partitions (because the root is excluded). Each partition consists of p = ⌊P/h⌋ or p = ⌈P/h⌉ nodes, such that ∑ᵢpᵢ = P.
If the height h of the tree is even, put all even partitions into the even levels of the left subtree and the odd levels of the right subtree, and put all odd partitions into the odd levels of the left subtree and the even levels of the right subtree.
If h is odd, distribute all partitions up to partition h-1 like in the even case, but distribute partition h evenly into the last level of the left and right subtrees.
This is the result for h up to 7 (I wrote a tiny Python library to print binary trees to the terminal in a compact way for this purpose):
0
1 1
0
1 2
2 2 1 1
0
1 2
2 2 1 1
1 1 3 3 2 2 3 3
0
1 2
2 2 1 1
1 1 1 1 2 2 2 2
2 4 4 4 4 4 4 4 1 3 3 3 3 3 3 3
0
1 2
2 2 1 1
1 1 1 1 2 2 2 2
2 2 2 2 2 2 4 4 1 1 1 1 1 1 3 3
3 3 3 3 3 3 3 3 3 3 5 5 5 5 5 5 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5
0
1 2
2 2 1 1
1 1 1 1 2 2 2 2
2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1
1 1 1 1 1 1 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2 2 4 4 4 4 4 4 4 4 4 4
4 4 4 4 4 4 4 4 4 4 4 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 3 3 3 3 3 3 3 3 3 3 3 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
0
1 2
2 2 1 1
1 1 1 1 2 2 2 2
2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 1 1 1 1 1 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
3 3 3 3 3 3 3 3 3 3 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 4 4 4 4 4 4 4 4 4 4 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7
And this is the code that generates it:
# basicbintree is the author's small tree-printing library mentioned above
from basicbintree import Node

for h in range(1, 7 + 1):
    root = Node(0)
    P = 2 ** (h + 1) - 2  # nodes in partitions
    p = P // h  # partition size (may be p or p + 1)
    if h & 1:  # odd height
        t = (p + 1) // 2  # subtree tail nodes from split partition
        n = (h - 1) // 2  # odd or even partitions in subtrees except tail
    else:  # even height
        t = 0  # no subtree tail nodes from split partition
        n = h // 2  # odd or even partitions in subtrees
    s = P // 2 - t  # subtree nodes excluding tail
    r = s - n * p  # partitions of size p + 1 in subtrees
    x = [p + 1] * r + [p] * (n - r)  # nodes indexed by subtree partition - 1
    odd = [1 + 2 * i for i, c in enumerate(x) for _ in range(c)] + [h] * t
    even = [2 + 2 * i for i, c in enumerate(x) for _ in range(c)] + [h] * t
    for g in range(1, h + 1):
        start = 2 ** (g - 1) - 1
        stop = 2 ** g - 1
        if g & 1:  # odd level
            root.set_level(odd[start:stop] + even[start:stop])
        else:  # even level
            root.set_level(even[start:stop] + odd[start:stop])
    print('```none')
    root.print_tree()
    print('```')
All trees produced up to height 27 have been programmatically confirmed to meet the specifications.
Some parts of the algorithm would need a proof, e.g. that it's always possible to choose an even size for the split partition in the odd height case, but this and other proofs are left as an exercise to the reader ;-)
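The programmatic check can be reproduced without the tree-printing library. The sketch below recomputes the per-level labels with the same arithmetic as the generator and tests both constraints. The names `partition_levels` and `is_valid` are invented here, and it assumes each level is stored left to right, so that the parent of the node at index i sits at index i // 2 of the level above.

```python
from collections import Counter


def partition_levels(h):
    """Partition labels for levels 1..h, using the same arithmetic as the generator."""
    P = 2 ** (h + 1) - 2
    p = P // h
    if h & 1:  # odd height: partition h is split between both subtrees
        t = (p + 1) // 2
        n = (h - 1) // 2
    else:      # even height: no split partition
        t = 0
        n = h // 2
    s = P // 2 - t
    r = s - n * p
    x = [p + 1] * r + [p] * (n - r)
    odd = [1 + 2 * i for i, c in enumerate(x) for _ in range(c)] + [h] * t
    even = [2 + 2 * i for i, c in enumerate(x) for _ in range(c)] + [h] * t
    levels = []
    for g in range(1, h + 1):
        start, stop = 2 ** (g - 1) - 1, 2 ** g - 1
        if g & 1:
            levels.append(odd[start:stop] + even[start:stop])
        else:
            levels.append(even[start:stop] + odd[start:stop])
    return levels


def is_valid(levels):
    """Check the size constraint and the parent/child constraint."""
    h = len(levels)
    counts = Counter(label for level in levels for label in level)
    sizes = [counts[k] for k in range(1, h + 1)]
    if max(sizes) - min(sizes) > 1:
        return False
    # The root (value 0) is in no partition, so only levels 1..h are compared.
    for parents, children in zip(levels, levels[1:]):
        for i, child in enumerate(children):
            if child == parents[i // 2]:
                return False
    return True


for h in range(1, 11):
    print(h, is_valid(partition_levels(h)))
```

Building the levels as plain lists keeps the check independent of how the tree is printed; only the level order assumption above matters.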

Pandas Sort Order of Columns by Row

Given the following data frame:
import pandas as pd

df = pd.DataFrame({'A': ['1', '2', '3', '7'],
                   'B': [7, 6, 5, 4],
                   'C': [5, 6, 7, 1],
                   'D': [1, 9, 9, 8]})
df = df.set_index('A')
df
   B  C  D
A
1  7  5  1
2  6  6  9
3  5  7  9
7  4  1  8
I want to sort the columns in descending order of the values in the bottom row, like this:
   D  B  C
A
1  1  7  5
2  9  6  6
3  9  5  7
7  8  4  1
Thanks in advance!
The easiest way is to take the transpose, then sort_values, then transpose back.
df.T.sort_values('7', ascending=False).T
or
df.T.sort_values(df.index[-1], ascending=False).T
Gives:
   D  B  C
A
1  1  7  5
2  9  6  6
3  9  5  7
7  8  4  1
Testing
my solution
%%timeit
df.T.sort_values(df.index[-1], ascending=False).T
1000 loops, best of 3: 444 µs per loop
alternative solution
%%timeit
df[[c for c in sorted(list(df.columns), key=df.iloc[-1].get, reverse=True)]]
1000 loops, best of 3: 525 µs per loop
You can use sort_values with axis=1, sorting by the index label of your row (ascending=False gives the descending order you asked for):
df.sort_values(by=df.index[-1], axis=1, ascending=False, inplace=True)
Here is a variation that does not involve transposition:
df = df[[c for c in sorted(list(df.columns), key=df.iloc[-1].get, reverse=True)]]
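As a quick cross-check, the sketch below rebuilds the frame from the question and compares the three approaches from the answers. The variable names are made up for illustration, and `ascending=False` is passed to the `axis=1` variant because the question asks for descending order.

```python
import pandas as pd

df = pd.DataFrame({'A': ['1', '2', '3', '7'],
                   'B': [7, 6, 5, 4],
                   'C': [5, 6, 7, 1],
                   'D': [1, 9, 9, 8]}).set_index('A')

last = df.index[-1]  # label of the bottom row, here the string '7'

# 1) transpose, sort by the (now) column '7', transpose back
by_transpose = df.T.sort_values(last, ascending=False).T
# 2) sort the columns directly by the values in row '7'
by_axis = df.sort_values(by=last, axis=1, ascending=False)
# 3) reorder the column labels with plain Python sorting
by_key = df[sorted(df.columns, key=df.iloc[-1].get, reverse=True)]

for result in (by_transpose, by_axis, by_key):
    print(list(result.columns))  # ['D', 'B', 'C'] each time
```

All three give the column order D, B, C; which one to prefer is mostly a matter of readability.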

Hive - add column with number of distinct values in groups

Suppose I have the following data.
number group
1 a
1 a
3 a
4 a
4 a
5 c
6 b
6 b
6 b
7 b
8 b
9 b
10 b
14 b
15 b
I would like to group the data by group and add a further column which says how many distinct values of number each group has.
My desired output would look as follows:
number group dist_number
1 a 3
1 a 3
3 a 3
4 a 3
4 a 3
5 c 1
6 b 7
6 b 7
6 b 7
7 b 7
8 b 7
9 b 7
10 b 7
14 b 7
15 b 7
What I have tried is:
> select *, count(distinct number) over(partition by group) from numbers;
11 11
As one can see, this aggregates globally and counts the distinct values independently of the group.
One thing I could do is to use group by as follows:
hive> select group, count(distinct number) from numbers group by group;
a 3
b 7
c 1
And then join over group.
But I thought maybe there is an easier solution to this, e.g., using the over(partition by group) method?
You definitely want to use windowing functions here. I'm not exactly sure how you got 11 11 from the query you tried; I'm 99% sure that if you try to count(distinct _) in Hive with an over/partition it will complain. To get around this you can use collect_set() to get an array of the distinct elements in the partition, and then you can use size() to count the elements.
Query:
select *
     , size(num_arr) dist_num
from (
    select *
         , collect_set(num) over (partition by grp) num_arr
    from db.tbl ) x
Output:
4 a [4,3,1] 3
4 a [4,3,1] 3
3 a [4,3,1] 3
1 a [4,3,1] 3
1 a [4,3,1] 3
15 b [15,14,10,9,8,7,6] 7
14 b [15,14,10,9,8,7,6] 7
10 b [15,14,10,9,8,7,6] 7
9 b [15,14,10,9,8,7,6] 7
8 b [15,14,10,9,8,7,6] 7
7 b [15,14,10,9,8,7,6] 7
6 b [15,14,10,9,8,7,6] 7
6 b [15,14,10,9,8,7,6] 7
6 b [15,14,10,9,8,7,6] 7
5 c [5] 1
I included the arrays in the output so you could see what was happening; obviously you can discard them in your query. As a note, doing a self-join here is really a disaster with regard to performance (and it's more lines of code).
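To make the two steps concrete outside Hive, here is a small Python sketch of the same idea: first collect the distinct numbers per group (the role of collect_set over the partition), then attach the size of that set to every row (the role of size). The names `rows`, `distinct_by_group` and `result` are made up for illustration.

```python
from collections import defaultdict

# (number, group) pairs from the question
rows = [(1, "a"), (1, "a"), (3, "a"), (4, "a"), (4, "a"), (5, "c"),
        (6, "b"), (6, "b"), (6, "b"), (7, "b"), (8, "b"), (9, "b"),
        (10, "b"), (14, "b"), (15, "b")]

# Step 1: distinct numbers per group
distinct_by_group = defaultdict(set)
for number, group in rows:
    distinct_by_group[group].add(number)

# Step 2: every original row gets the distinct count of its group
result = [(number, group, len(distinct_by_group[group]))
          for number, group in rows]

for row in result:
    print(*row)
```

Every row is kept, and the extra column is 3 for group a, 7 for group b and 1 for group c.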
As per your requirement, this may work:
select number, group1, size(collect_set(number) OVER (PARTITION BY group1)) as dist_num from table1;

How to compute a natural join?

Table R (A, C) contains the following entries:
A C
3 3
6 4
2 3
3 5
7 1
Table S (B, C, D) contains the following entries:
B C D
5 1 6
1 5 8
4 3 9
Calculate the natural join of R and S. Which rows would be in the result? Each resulting row has the schema (A, B, C, D).
Please help!!!
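Got the answer by looking at this. R and S share only the attribute C, so each row of R is combined with every row of S that has the same C value; the row (6, 4) of R has no match in S and drops out. So your answer should be: {(3,4,3,9),(2,4,3,9),(3,1,5,8),(7,5,1,6)}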
A B C D
3 4 3 9
2 4 3 9
3 1 5 8
7 5 1 6
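The result can be double-checked with a few lines of Python. This is only a sketch: the tuple layout and the names `R`, `S`, `s_by_c` and `joined` are chosen here for illustration, and the join is done by indexing S on the shared attribute C.

```python
from collections import defaultdict

R = [(3, 3), (6, 4), (2, 3), (3, 5), (7, 1)]   # rows of R as (A, C)
S = [(5, 1, 6), (1, 5, 8), (4, 3, 9)]          # rows of S as (B, C, D)

# Index S by its C value so each row of R can look up its partners directly.
s_by_c = defaultdict(list)
for b, c, d in S:
    s_by_c[c].append((b, d))

# Pair every row of R with the rows of S that share its C value.
joined = [(a, b, c, d) for a, c in R for b, d in s_by_c.get(c, [])]

print(joined)  # [(3, 4, 3, 9), (2, 4, 3, 9), (3, 1, 5, 8), (7, 5, 1, 6)]
```

The row (6, 4) of R produces nothing because no row of S has C = 4.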

How can I sort a 2-D array in MATLAB with respect to 2nd row?

I have an array, say "a":
a =
1 4 5
6 7 2
If I use the function
b=sort(a)
it gives
b =
1 4 2
6 7 5
but I want the answer to be
b =
5 1 4
2 6 7
That is, the 2nd row should be sorted, and each element of the 1st row should stay in the same column as its corresponding element of the 2nd row.
sortrows(a',2)'
Pulling this apart:
a               = 1 4 5
                  6 7 2
a'              = 1 6
                  4 7
                  5 2
sortrows(a',2)  = 5 2
                  1 6
                  4 7
sortrows(a',2)' = 5 1 4
                  2 6 7
The key here is that sortrows sorts the rows by the specified column (here column 2 of a', i.e. row 2 of a), and all the other columns follow that order.
You can use the SORT function on just the second row, then use the index output to sort the whole array:
[junk,sortIndex] = sort(a(2,:));
b = a(:,sortIndex);
How about
a = [1 4 5; 6 7 2]
a =
1 4 5
6 7 2
>> [s,idx] = sort(a(2,:))
s =
2 6 7
idx =
3 1 2
>> b = a(:,idx)
b =
5 1 4
2 6 7
In other words, you use the second output of sort to get the sort order you want, and then you apply it to the whole array.
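The same two-step idea (get the order from one row, then apply it to all rows) can be sketched in plain Python for comparison. This is not MATLAB code; the names `order` and `b` are chosen here, and indices are 0-based rather than 1-based.

```python
a = [[1, 4, 5],
     [6, 7, 2]]

# Column order that sorts the second row (the role of the index output of sort).
order = sorted(range(len(a[1])), key=lambda j: a[1][j])

# Apply the same column order to every row (the role of a(:, idx)).
b = [[row[j] for j in order] for row in a]

print(order)  # [2, 0, 1], i.e. MATLAB's idx 3 1 2 shifted to 0-based
print(b)      # [[5, 1, 4], [2, 6, 7]]
```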
