Hierarchical Update in Hive - hadoop

I have a Hive table as follows:
Table A
docid  corr_docid  header
100                a
101    100         b
102                c
105    101         d
106    102         e
107    106         f
108    107         g
109                h
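Here corr_docid holds the docid of the document that a row corrects; for example, docid 108 has corr_docid 107, so it corrects the document with docid 107.
Is it possible to create another table, Table B, in which newdocid is the last document in each correction chain?
Table B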
docid  corr_docid  header  newdocid
100                a       105
101    100         b       105
102                c       108
105    101         d       105
106    102         e       108
107    106         f       108
108    107         g       108
109                h       109
Is this possible in Hive?

You can try this plain SQL to get the desired result. It only works if you know the maximum depth of the hierarchy in advance, which is 4 here.
`-- b corrects a, c corrects b, d corrects c; the last non-null docid wins
select a.docid, a.corr_docid, a.header,
case when b.docid is null then a.docid
when c.docid is null then b.docid
when d.docid is null then c.docid
else d.docid
end as newdocid
from Table_A a
left join Table_A b on a.docid = b.corr_docid
left join Table_A c on b.docid = c.corr_docid
left join Table_A d on c.docid = d.corr_docid;`
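
If the depth is not known in advance, the same lookup can be done outside Hive by following each chain until it ends. The following is only a minimal Python sketch of that idea, not a Hive solution; the names `rows`, `corrected_by` and `resolve_newdocid` are made up for illustration, and it assumes each document is corrected by at most one other document.

```python
# Rows of Table A as (docid, corr_docid, header); None means the row corrects nothing.
rows = [
    (100, None, "a"), (101, 100, "b"), (102, None, "c"), (105, 101, "d"),
    (106, 102, "e"), (107, 106, "f"), (108, 107, "g"), (109, None, "h"),
]

# Map each corrected document to the document that corrects it
# (assumption: at most one correction per document).
corrected_by = {corr: docid for docid, corr, _ in rows if corr is not None}


def resolve_newdocid(docid):
    """Follow the correction chain until no further correction exists."""
    seen = set()
    while docid in corrected_by and docid not in seen:
        seen.add(docid)  # guard against accidental cycles in the data
        docid = corrected_by[docid]
    return docid


table_b = [(docid, corr, header, resolve_newdocid(docid)) for docid, corr, header in rows]

for row in table_b:
    print(row)
```

Unlike the fixed chain of left joins, this does not need one extra join per level, at the cost of pulling the data out of Hive first.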


Google sheet query 2 columns as search key and search

I ran into a problem with a Google Sheets function.
I have 2 tables. I want to use Date + User from table 1 as the key to look up values in table 2.
Date User Unit
2022/05/30 A 109
2022/05/30 B 119
2022/05/30 C 119
2022/05/29 D 109
2022/05/29 E 114
Date User Amount
2022/05/30 A 1
2022/05/30 B 2
2022/05/30 C 3
2022/05/30 D 41
2022/05/30 E 5
2022/05/29 D 6
2022/05/29 E 7
2022/05/29 F 81
2022/05/29 G 9
2022/05/29 A 101
2022/05/29 B 11
2022/05/29 C 121
2022/05/29 D 13
After the query, I hope the table looks like this:
Desired result
Date User Unit Amount
2022/05/30 A 109 1
2022/05/30 B 119 2
2022/05/30 C 119 3
2022/05/29 D 109 6
2022/05/29 E 114 7
This is a sample google sheet
Can I ask for help?
Many Thanks
Two options. The first pulls all matching combinations of DATE and USER
"select Col1, Col2, Col4, Col3
where Col4 is not null
Col1 'Date',
Col2 'User',
Col3 'Amount',
Col4 'Unit'"))
which returns
The second matches your output exactly, but omits the second D value for the 29th (13):
"where Col4 is not null
format Col1 'yyyy/mm/dd'"))
Both have been added to your sheet. If either of these works for you, I can break it down.
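
To make the matching logic explicit, here is a minimal Python sketch of the second option: a lookup on the combined (Date, User) key that keeps only the first matching Amount. It is not the Sheets formula itself; the variable names are made up for illustration, and the data is copied from the two tables in the question.

```python
# Table 1: (date, user, unit)
table1 = [
    ("2022/05/30", "A", 109), ("2022/05/30", "B", 119), ("2022/05/30", "C", 119),
    ("2022/05/29", "D", 109), ("2022/05/29", "E", 114),
]

# Table 2: (date, user, amount)
table2 = [
    ("2022/05/30", "A", 1), ("2022/05/30", "B", 2), ("2022/05/30", "C", 3),
    ("2022/05/30", "D", 41), ("2022/05/30", "E", 5), ("2022/05/29", "D", 6),
    ("2022/05/29", "E", 7), ("2022/05/29", "F", 81), ("2022/05/29", "G", 9),
    ("2022/05/29", "A", 101), ("2022/05/29", "B", 11), ("2022/05/29", "C", 121),
    ("2022/05/29", "D", 13),
]

# Index table 2 by the composite key; setdefault keeps the first amount
# seen for a key and ignores later duplicates (such as the second D on the 29th).
first_amount = {}
for date, user, amount in table2:
    first_amount.setdefault((date, user), amount)

# Keep table 1 rows that have a match and append the amount.
result = [
    (date, user, unit, first_amount[(date, user)])
    for date, user, unit in table1
    if (date, user) in first_amount
]

for row in result:
    print(row)
```

Collecting every amount per key in a list instead of using `setdefault` would correspond to the first option, which returns all matching combinations.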

Sum data in one column in a specific order in Spotfire

Does anyone know how to create a calculated column (in Spotfire) that will sum data in order of increasing values contained within another column?
For example, what would the expression be to sum the data in [P] in increasing order of [K], for each [Well]?
Some example data:
Well Depth P K
A 85 0.191 108
A 85.5 0.192 102
A 87 0.17 49
A 88 0.184 47
A 89 0.192 50
B 298 0.215 177
B 298.5 0.2 177
B 300 .017 105
B 301 0.23 200
You can use:
Sum([P]) OVER (intersect([Well],AllPrevious([K])))
This returns the cumulative sum of P per Well, in ascending order of K. Rows that share the same K within a Well get the same total:
Well K P Cumulative Sum of P
A 47 0,184 0,184
A 49 0,17 0,354
A 50 0,192 0,546
A 102 0,192 0,738
A 108 0,191 0,929
B 105 0,017 0,017
B 177 0,215 0,432
B 177 0,2 0,432
B 200 0,23 0,662
Edit based on OP's comment:
To get the cumulative sum in descending order of K, you can use:
Sum([P]) OVER (intersect([Well],AllNext([K])))
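
For readers who want to check the numbers, the following Python sketch reproduces the arithmetic behind the result table: for each row it sums P over all rows of the same Well whose K is less than or equal to its own K (or greater than or equal, for the descending variant). This is only an illustration of the arithmetic, not of how Spotfire evaluates the expression; the function name `cumulative_p` is made up.

```python
# Example data from the question: (well, depth, p, k)
rows = [
    ("A", 85, 0.191, 108), ("A", 85.5, 0.192, 102), ("A", 87, 0.17, 49),
    ("A", 88, 0.184, 47), ("A", 89, 0.192, 50),
    ("B", 298, 0.215, 177), ("B", 298.5, 0.2, 177), ("B", 300, 0.017, 105),
    ("B", 301, 0.23, 200),
]


def cumulative_p(rows, descending=False):
    """Sum P over rows of the same well with K <= own K (or >= if descending)."""
    out = []
    for well, _, p, k in rows:
        total = sum(
            p2
            for w2, _, p2, k2 in rows
            if w2 == well and (k2 >= k if descending else k2 <= k)
        )
        out.append((well, k, p, round(total, 3)))
    return out


ascending = cumulative_p(rows)
descending = cumulative_p(rows, descending=True)

# Print in the same order as the result table: by well, then by K.
for row in sorted(ascending, key=lambda r: (r[0], r[1])):
    print(row)
```

Because ties on K are included on both sides of the comparison, the two rows with K = 177 in well B receive the same total, as in the table above.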

Computing lag in Hive by a variable

My input table looks like:
guest_id days
101 79
101 70
101 68
101 61
102 101
102 90
102 55
103 99
103 90
Note that days are in descending order within each guest_id.
Desired output table:
guest_id days days_diff
101 79 0
101 70 9
101 68 2
101 61 7
102 101 0
102 90 11
102 55 35
103 99 0
103 90 9
days_diff is the first-order difference within each guest_id (not across the whole days column).
You need to have a unique id column as well (otherwise Hive doesn't know about the order of your rows).
Then you can just self join on id=id+1 to get your differences:
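-- b is the row just before a; the left join keeps the first row, which gets 0
select a.guest_id, a.days,
case when a.guest_id = b.guest_id then b.days - a.days else 0 end as days_diff
from input a
left join input b on a.id = b.id + 1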
Edit: As pointed out by Kunal in the comments, Hive does have a LAG window function, which requires a PARTITION BY ... ORDER BY clause. You still need something to order your table by; for example, if you have a date column you would use it like the following:
-- previous days minus current days; the first row per guest has no previous row, so it gets 0
SELECT guest_id, days,
COALESCE(LAG(days, 1) OVER (PARTITION BY guest_id ORDER BY date) - days, 0) AS days_diff
FROM input;
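
The expected days_diff values can be checked with a small Python sketch of the same "previous row minus current row, per guest" logic. It is an illustration only, and it assumes the rows are already in the order shown in the question (grouped by guest_id, days descending).

```python
from itertools import groupby

# Input rows from the question: (guest_id, days), already ordered.
rows = [
    (101, 79), (101, 70), (101, 68), (101, 61),
    (102, 101), (102, 90), (102, 55),
    (103, 99), (103, 90),
]

result = []
# groupby only groups consecutive rows, hence the ordering assumption above.
for guest_id, group in groupby(rows, key=lambda r: r[0]):
    prev = None  # no previous row at the start of each guest
    for _, days in group:
        diff = 0 if prev is None else prev - days
        result.append((guest_id, days, diff))
        prev = days

for row in result:
    print(row)
```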

Merging two files and ordering them

I want to merge two files into one and order the lines by the values in the second column. Here is an example:
File 1:
+ 1.01 id 120
- 1.20 id 145
+ 2.15 id 411
File 2:
r 0.21 id 4
r 1.78 id 85
r 102 id 850
I want to merge them into one file, sorted in ascending order by column 2, like this:
File 3:
r 0.21 id 4
+ 1.01 id 120
- 1.20 id 145
r 1.78 id 85
+ 2.15 id 411
r 102 id 850
How could I do this?
How about:
sort -k2n file1 file2
For example, with f1 and f2 as your files:
kent$ sort -k2n f1 f2
r 0.21 id 4
+ 1.01 id 120
- 1.20 id 145
r 1.78 id 85
+ 2.15 id 411
r 102 id 850
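
If the merge needs to happen inside a script rather than on the command line, the same numeric sort on the second column can be sketched in Python. The two lists below stand in for the contents of the files and are an assumption for illustration; in practice the lines would be read from the files.

```python
# Lines of the two example files (stand-ins for reading the real files).
file1 = ["+ 1.01 id 120", "- 1.20 id 145", "+ 2.15 id 411"]
file2 = ["r 0.21 id 4", "r 1.78 id 85", "r 102 id 850"]

# Sort all lines by the numeric value of the second whitespace-separated field.
merged = sorted(file1 + file2, key=lambda line: float(line.split()[1]))

for line in merged:
    print(line)
```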

Getting the first occurrence of group of rows

I have a case where each transaction consists of several steps. I want to return the first occurrence of each transaction, for example:
Trn Loc step
111 0 1
111 0 2
111 0 3
222 3 1
222 3 2
333 5 1
333 5 2
333 5 3
and I want to get this result:
Trn Loc
111 0
222 3
333 5
I think it is supposed to be done with a partition function, but I don't know how. Any help, please?
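-- number the rows of each (trn, loc) group by step and keep the first one
select t.trn, t.loc
from (select trn, loc, ROW_NUMBER() OVER (PARTITION BY trn, loc ORDER BY step) as rnum
from table ) t
where t.rnum = 1
You can also use RANK() instead of ROW_NUMBER(); the rest of the syntax is the same. This only returns one row per group if step is unique within each (trn, loc) group, because RANK() gives tied rows the same rank.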
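
The effect of "number the rows per group and keep the first" can be illustrated with a short Python sketch that keeps the lowest step for each (trn, loc) pair. It is only an illustration of the logic on the sample data, not a replacement for the query; the name `first_step` is made up.

```python
# Sample rows from the question: (trn, loc, step)
rows = [
    (111, 0, 1), (111, 0, 2), (111, 0, 3),
    (222, 3, 1), (222, 3, 2),
    (333, 5, 1), (333, 5, 2), (333, 5, 3),
]

# Track the lowest step seen for each (trn, loc) group.
first_step = {}
for trn, loc, step in rows:
    key = (trn, loc)
    if key not in first_step or step < first_step[key]:
        first_step[key] = step

# One (trn, loc) pair per group, in sorted order.
result = sorted(first_step)

for trn, loc in result:
    print(trn, loc)
```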
