Repeat values within the GROUP in Hive - hadoop

I am trying to repeat a row value in the subsequent rows within a group. A group can have one or more TAGs. The requirement is to populate NEW_TAG in the row where TAG is populated and in the subsequent rows, until another TAG is populated within the same group or we reach the end of that group.
Current Table              Required Table
GROUPID  SEQ  TAG          GROUPID  SEQ  TAG  NEW_TAG
-------  ---  ---          -------  ---  ---  -------
1        1                 1        1
1        2                 1        2
1        3                 1        3
1        4    4            1        4    4    4
1        5                 1        5         4
1        6                 1        6         4
1        7                 1        7         4
1        8                 1        8         4
2        1                 2        1
2        2                 2        2
2        3                 2        3
2        4                 2        4
2        5    5            2        5    5    5
2        6                 2        6         5
2        7                 2        7         5
2        8                 2        8         5
2        9    9            2        9    9    9
2        10                2        10        9
2        11                2        11        9
Thanks

Assuming TAG is always increasing within a group (in SEQ order), a running maximum gives NEW_TAG:
max(TAG) over
(
partition by GROUPID
order by SEQ
rows between unbounded preceding
and current row
) as NEW_TAG
select *
,max(TAG) over
(
partition by GROUPID
order by SEQ
rows between unbounded preceding
and current row
) as NEW_TAG
from mytable
;
+---------+--------+--------+---------+
| groupid | seq | tag | new_tag |
+---------+--------+--------+---------+
| 1 | 1 | | |
| 1 | 2 | | |
| 1 | 3 | | |
| 1 | 4 | 4 | 4 |
| 1 | 5 | | 4 |
| 1 | 6 | | 4 |
| 1 | 7 | | 4 |
| 1 | 8 | | 4 |
| 2 | 1 | | |
| 2 | 2 | | |
| 2 | 3 | | |
| 2 | 4 | | |
| 2 | 5 | 5 | 5 |
| 2 | 6 | | 5 |
| 2 | 7 | | 5 |
| 2 | 8 | | 5 |
| 2 | 9 | 9 | 9 |
| 2 | 10 | | 9 |
| 2 | 11 | | 9 |
+---------+--------+--------+---------+
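The running maximum only works because TAG is assumed to increase. As a rough illustration of the intended "carry the last populated TAG forward" semantics without that assumption, here is a minimal Python sketch (not HiveQL). The function name `fill_new_tag` and the sample rows are made up for this example, and empty TAG cells are represented as `None`.

```python
from itertools import groupby
from operator import itemgetter


def fill_new_tag(rows):
    """Carry the last non-empty tag forward within each group.

    rows: iterable of (groupid, seq, tag) tuples, tag is None when empty.
    Returns a list of (groupid, seq, tag, new_tag) tuples.
    """
    out = []
    # Sort by group, then by sequence, like PARTITION BY ... ORDER BY.
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for _, group in groupby(ordered, key=itemgetter(0)):
        last = None  # reset at the start of every group
        for groupid, seq, tag in group:
            if tag is not None:
                last = tag
            out.append((groupid, seq, tag, last))
    return out


# Hypothetical data where TAG decreases inside group 1 (7, then 4).
rows = [
    (1, 1, None), (1, 2, None), (1, 3, 7), (1, 4, None),
    (1, 5, 4), (1, 6, None),
    (2, 1, None), (2, 2, 5), (2, 3, None),
]
result = fill_new_tag(rows)
```

With a decreasing TAG such as 7 followed by 4, a running maximum would keep returning 7, whereas this carry-forward logic switches to 4 from SEQ 5 onwards.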

Related

Combinations of values in a table to get the sum of each combination

I have a table with numeric data, and I need to build every combination of its values.
For example:
| A |
|---|
| 1 |
| 2 |
| 3 |
| 4 |
I need to combine this single column to get the following result:
| A | B | C | D |
| - | - | - | - |
| 1 | | | |
| 1 | 2 | | |
| 1 | 2 | 3 | |
| 1 | 2 | 3 | 4 |
| 1 | 2 | | 4 |
| 1 | | 3 | |
| 1 | | 3 | 4 |
| 1 | | | 4 |
| | 2 | | |
| | 2 | 3 | |
| | 2 | 3 | 4 |
| | 2 | | 4 |
| | | 3 | |
| | | 3 | 4 |
| | | | 4 |
At the end of the table, I have to add one column with the count of columns that contain data in each row, and another column with the sum of the numbers in that row.
Maybe it sounds very difficult or impossible, but I haven't found a way to make it work.
I have tried a CROSS JOIN in SQL but didn't get the expected result.
Help!
In this case, you can solve this by counting in binary, using as many binary digits as there are numbers in the set and stopping when every digit is 1. For example, the starting set 2, 5, 6, 8 would end at 1111. Each digit of the binary number decides whether the corresponding number is shown in that row. Here's a table of how it would work.
| A |
|---|
| 2 |
| 5 |
| 6 |
| 8 |
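| A | B | C | D | Binary | Row number |
|---|---|---|---|--------|------------|
|   |   |   | 8 | 0001   | 1          |
|   |   | 6 |   | 0010   | 2          |
|   |   | 6 | 8 | 0011   | 3          |
|   | 5 |   |   | 0100   | 4          |
|   | 5 |   | 8 | 0101   | 5          |
|   | 5 | 6 |   | 0110   | 6          |
|   | 5 | 6 | 8 | 0111   | 7          |
| 2 |   |   |   | 1000   | 8          |
| 2 |   |   | 8 | 1001   | 9          |
| 2 |   | 6 |   | 1010   | 10         |
| 2 |   | 6 | 8 | 1011   | 11         |
| 2 | 5 |   |   | 1100   | 12         |
| 2 | 5 |   | 8 | 1101   | 13         |
| 2 | 5 | 6 |   | 1110   | 14         |
| 2 | 5 | 6 | 8 | 1111   | 15         |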
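The binary-counting idea can be sketched in a few lines of Python (not SQL). This is only an illustration under the assumption that the values fit in memory; the function name `combinations_by_binary` is invented for the example. It also produces the count and sum columns the question asks for.

```python
def combinations_by_binary(values):
    """Enumerate every non-empty combination by counting in binary.

    Returns a list of (picked, bits, count, total) where picked has one
    slot per input value (None when the value is left out of that row).
    """
    n = len(values)
    rows = []
    for mask in range(1, 2 ** n):  # 1 .. 2^n - 1, i.e. 0001 .. 1111 for n = 4
        bits = format(mask, "0{}b".format(n))  # zero-padded binary string
        picked = [v if b == "1" else None for v, b in zip(values, bits)]
        chosen = [v for v in picked if v is not None]
        rows.append((picked, bits, len(chosen), sum(chosen)))
    return rows


rows = combinations_by_binary([2, 5, 6, 8])
for picked, bits, count, total in rows:
    print(picked, bits, count, total)
```

The leftmost binary digit maps to the first value, which matches the table above: row 5 has bits 0101 and shows 5 and 8.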

Iterate in oracle using incremental iterable variables

I'm trying to get some data iteratively from an Oracle database. Here's a skeleton of the query I'm trying:
select a.id, a.cd,
sum(case
when b.count>0
then b.count+1
when b.count=0
then 1
else 0
end),
sum(case
when b.count>0 and b.received is NULL and c.rid is NULL
then b.count+1
when b.count=0 and b.received is NULL and c.rid is NULL
then 1
else 0
end),
sum(case
when c.err='Y'
then 1
else 0
end),
max(b.sent)
from
b
left join c
on b.oid=c.oid
inner join a
on b.id=a.id
inner join d
on (d.oid=b.oid
and d.tss=
(select max(tss)
from e where rid=b.rid
and rcd=b.rcd))
where b.sent is not null
/* !!! The line below !!! */
and b.sent > sysdate -2*i/24 and b.sent <= sysdate -2*(i-1)/24
and b.id=7
group by a.id,a.cd
order by a.id
Please notice the highlighted part and b.sent > sysdate -2*i/24 and b.sent <= sysdate -2*(i-1)/24. This is where I want to iterate incrementing the value of i each time. I tried putting this in a for loop incrementing the value of i each time but got the error
"PLS-00428: an INTO clause is expected in this SELECT statement".
When I tried to declare variables and put the results of the select statement into them, I was unable to due to the group by statements. I'm getting stuck at this point hence reaching out to the members for the expert help. Thank you for patiently going through this. Please do let me know if there is a way to resolve this issue.
You do not need an explicit loop if you just want to execute a query for successive values of a parameter. You can generate a sequence of values and use a lateral join (in Oracle 12+) to run the query for each of them:
create table t
as
select level as l
, sysdate + trunc(level / 5) as v
from dual
connect by level < 13;
with a as (
select level as i
from dual
connect by level < 6
)
select
a.i as iter,
t2.*
from a
outer apply (
select *
from t
where v between sysdate - i
and sysdate + i
) t2
ITER | L | V
---: | -: | :--------
1 | 1 | 19-APR-21
1 | 2 | 19-APR-21
1 | 3 | 19-APR-21
1 | 4 | 19-APR-21
1 | 5 | 20-APR-21
1 | 6 | 20-APR-21
1 | 7 | 20-APR-21
1 | 8 | 20-APR-21
1 | 9 | 20-APR-21
2 | 1 | 19-APR-21
2 | 2 | 19-APR-21
2 | 3 | 19-APR-21
2 | 4 | 19-APR-21
2 | 5 | 20-APR-21
2 | 6 | 20-APR-21
2 | 7 | 20-APR-21
2 | 8 | 20-APR-21
2 | 9 | 20-APR-21
2 | 10 | 21-APR-21
2 | 11 | 21-APR-21
2 | 12 | 21-APR-21
3 | 1 | 19-APR-21
3 | 2 | 19-APR-21
3 | 3 | 19-APR-21
3 | 4 | 19-APR-21
3 | 5 | 20-APR-21
3 | 6 | 20-APR-21
3 | 7 | 20-APR-21
3 | 8 | 20-APR-21
3 | 9 | 20-APR-21
3 | 10 | 21-APR-21
3 | 11 | 21-APR-21
3 | 12 | 21-APR-21
4 | 1 | 19-APR-21
4 | 2 | 19-APR-21
4 | 3 | 19-APR-21
4 | 4 | 19-APR-21
4 | 5 | 20-APR-21
4 | 6 | 20-APR-21
4 | 7 | 20-APR-21
4 | 8 | 20-APR-21
4 | 9 | 20-APR-21
4 | 10 | 21-APR-21
4 | 11 | 21-APR-21
4 | 12 | 21-APR-21
5 | 1 | 19-APR-21
5 | 2 | 19-APR-21
5 | 3 | 19-APR-21
5 | 4 | 19-APR-21
5 | 5 | 20-APR-21
5 | 6 | 20-APR-21
5 | 7 | 20-APR-21
5 | 8 | 20-APR-21
5 | 9 | 20-APR-21
5 | 10 | 21-APR-21
5 | 11 | 21-APR-21
5 | 12 | 21-APR-21
db<>fiddle here
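To make the windows in the question concrete, the condition `b.sent > sysdate - 2*i/24 and b.sent <= sysdate - 2*(i-1)/24` describes consecutive two-hour buckets counted backwards from the current time. A minimal Python sketch of that arithmetic follows (not Oracle SQL); the names `two_hour_windows` and `bucket_for` and the fixed `now` value are assumptions for illustration only.

```python
from datetime import datetime, timedelta


def two_hour_windows(now, count):
    """Return (i, start, end) for i = 1..count, newest window first.

    Window i covers start < sent <= end, mirroring the question's predicate.
    """
    windows = []
    for i in range(1, count + 1):
        start = now - timedelta(hours=2 * i)
        end = now - timedelta(hours=2 * (i - 1))
        windows.append((i, start, end))
    return windows


def bucket_for(sent, now, count):
    """Return the window index i that contains sent, or None if outside."""
    for i, start, end in two_hour_windows(now, count):
        if start < sent <= end:
            return i
    return None


now = datetime(2021, 4, 19, 12, 0)  # stand-in for sysdate
windows = two_hour_windows(now, 3)
print(bucket_for(datetime(2021, 4, 19, 9, 30), now, 3))
```

Generating the values of i as rows, as the answer does with `connect by level`, lets a single query cover every window instead of looping.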

How to use 2 for loops in Hive

How do I use 2 for loops in Hive?
I have input data as below:
1 a 3
15 b 4
1 b 2
25 a 5
15 c 3
1 a 3
15 c 2
25 b 4
Intermediate output: for key 1, sum the third column separately for a and for b; do the same for 15 and 25.
1 a 6
1 b 2
15 b 4
15 c 5
25 a 5
25 b 4
Final output: for each key, keep only the row with the maximum total.
1 a 6
15 c 5
25 a 5
You can use window functions to get the results. Check this out:
> select * from shailesh;
INFO : OK
+----------------+----------------+----------------+--+
| shailesh.col1 | shailesh.col2 | shailesh.col3 |
+----------------+----------------+----------------+--+
| 1 | a | 3 |
| 15 | b | 4 |
| 1 | b | 2 |
| 25 | a | 5 |
| 15 | c | 3 |
| 1 | a | 3 |
| 15 | c | 2 |
| 25 | b | 4 |
+----------------+----------------+----------------+--+
8 rows selected (0.359 seconds)
> create table shailesh2 as select col1, col2, max(col3s) col3s2 from (select col1,col2,sum(col3) over(partition by col1,col2) col3s from shailesh ) t group by col1, col2;
INFO : OK
+-----------------+-----------------+-------------------+--+
| shailesh2.col1 | shailesh2.col2 | shailesh2.col3s2 |
+-----------------+-----------------+-------------------+--+
| 1 | a | 6 |
| 1 | b | 2 |
| 15 | b | 4 |
| 15 | c | 5 |
| 25 | a | 5 |
| 25 | b | 4 |
+-----------------+-----------------+-------------------+--+
6 rows selected (0.36 seconds)
> select col1, col2, col3s2 from (select col1,col2,col3s2, rank() over(partition by col1 order by col3s2 desc) as rk from shailesh2) t2 where rk=1;
INFO : OK
+-------+-------+---------+--+
| col1 | col2 | col3s2 |
+-------+-------+---------+--+
| 1 | a | 6 |
| 15 | c | 5 |
| 25 | a | 5 |
+-------+-------+---------+--+
3 rows selected (37.224 seconds)
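The same two steps (total per key and label, then the best label per key) can be illustrated outside Hive with a short Python sketch. The function name `max_total_per_key` is invented for this example. Note one difference from the query above: `rank()` keeps every tied row, while this sketch keeps only the first label that reaches the maximum.

```python
from collections import defaultdict


def max_total_per_key(rows):
    """rows: iterable of (key, label, count) tuples.

    Returns (totals, best): totals maps (key, label) to the summed count,
    best maps key to the (label, total) pair with the highest total.
    """
    totals = defaultdict(int)
    for key, label, count in rows:  # step 1: sum per (key, label)
        totals[(key, label)] += count

    best = {}
    for (key, label), total in totals.items():  # step 2: max per key
        if key not in best or total > best[key][1]:
            best[key] = (label, total)
    return dict(totals), best


data = [
    (1, "a", 3), (15, "b", 4), (1, "b", 2), (25, "a", 5),
    (15, "c", 3), (1, "a", 3), (15, "c", 2), (25, "b", 4),
]
totals, best = max_total_per_key(data)
print(best)
```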

How to sort data by hierarchy

Let's say I have some data like this:
Name | ID | ParentID | Level
------------+-----+----------+-------
Fruits | 1 | 0 | 1
Vegetables | 2 | 0 | 1
Apple | 3 | 1 | 2
Banana!! | 4 | 1 | 2
Tomato | 5 | 2 | 2
Potato | 6 | 2 | 2
red | 7 | 5 | 3
green | 8 | 5 | 3
How to sort (compare) this data to get a result like this:
Name | ID | ParentID | Level
------------+-----+----------+---------
Fruits | 1 | 0 | 1
Apple | 3 | 1 | 2
Banana!! | 4 | 1 | 2
Vegetables | 2 | 0 | 1
Tomato | 5 | 2 | 2
red | 7 | 5 | 3
green | 8 | 5 | 3
Potato | 6 | 2 | 2
The background is that I have a model collection, and I want to add the models to it in the order given by the ID/ParentID hierarchy.
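One possible way to get this order is a depth-first walk: group the rows by ParentID, then emit each row followed immediately by its children. The Python sketch below is only an illustration; it assumes that root rows have ParentID 0 and that siblings should appear in ID order (both inferred from the sample data), and the function name `sort_by_hierarchy` and the dictionary keys are made up for the example.

```python
from collections import defaultdict


def sort_by_hierarchy(rows, root_parent=0):
    """Return rows in depth-first (parent before children) order.

    rows: list of dicts with "name", "id" and "parent_id" keys.
    Assumes roots use parent_id == root_parent and siblings sort by id.
    """
    children = defaultdict(list)
    for row in rows:
        children[row["parent_id"]].append(row)

    ordered = []

    def visit(parent_id):
        for row in sorted(children[parent_id], key=lambda r: r["id"]):
            ordered.append(row)
            visit(row["id"])  # place the whole subtree right after its parent

    visit(root_parent)
    return ordered


data = [
    {"name": "Fruits", "id": 1, "parent_id": 0},
    {"name": "Vegetables", "id": 2, "parent_id": 0},
    {"name": "Apple", "id": 3, "parent_id": 1},
    {"name": "Banana!!", "id": 4, "parent_id": 1},
    {"name": "Tomato", "id": 5, "parent_id": 2},
    {"name": "Potato", "id": 6, "parent_id": 2},
    {"name": "red", "id": 7, "parent_id": 5},
    {"name": "green", "id": 8, "parent_id": 5},
]
names = [row["name"] for row in sort_by_hierarchy(data)]
print(names)
```

The Level column is not needed for the ordering itself; it follows from the depth of the recursion.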

Sort values in an associative array pl/sql

If the ID is even, I must sort the values that belong to that ID in descending order; if the ID is odd, I must sort them in ascending order. This is the table, called Grades.
ID|COL1|COL2|COL3|COL4|COL5|COL6|COL7|
1 | 6 | 3 | 8 | 4 | 7 | 8 | 4 |
2 | 5 | 7 | 9 | 2 | 1 | 7 | 8 |
3 | 2 | 7 | 4 | 8 | 1 | 5 | 9 |
4 | 8 | 4 | 7 | 9 | 4 | 1 | 4 |
5 | 7 | 5 | 2 | 5 | 2 | 6 | 4 |
The result must be this:
ID|COL1|COL2|COL3|COL4|COL5|COL6|COL7|
1 | 3 | 4 | 4 | 6 | 7 | 8 | 8 |
2 | 9 | 8 | 7 | 7 | 5 | 2 | 1 |
3 | 1 | 2 | 4 | 5 | 7 | 8 | 9 |
4 | 9 | 8 | 7 | 4 | 4 | 4 | 1 |
5 | 2 | 2 | 4 | 5 | 5 | 6 | 7 |
As you can see, ID=1 is an odd number, so its values must be sorted ASC.
This is the code so far:
declare
type grades_array is table of grades%rowtype index by pls_integer;
grades_a grades_array;
cnt number;
begin
Select count(id) into cnt from grades;
For i in 1..cnt loop
--I used an associative array
Select * into grades_a(i) from grades where grades.id=i;
end loop;
For i in grades_a.FIRST..grades_a.LAST loop
if (mod(grades_a(i).id,2)=1)then .......
--I don't know how to sort the specific rows, in this case ASC
--dbms_output.put_line(grades_a(i).col1);
end if;
end loop;
--Also it is specified in the exercise that the table can change, e.g add more columns
end;
I would simply use PIVOT/UNPIVOT for this.
First UNPIVOT the table and assign a rank to each column value in ascending/descending order.
SQL Fiddle
Query 1:
SELECT id,
colval,
ROW_NUMBER () OVER (
PARTITION BY id
ORDER BY CASE MOD (id, 2) WHEN 1 THEN colval END,
CASE MOD (id, 2) WHEN 0 THEN colval END DESC) r
FROM x UNPIVOT (colval FOR colname
IN (col1 AS 'col1', col2 AS 'col2', col3 AS 'col3', col4 AS 'col4',
col5 AS 'col5', col6 AS 'col6', col7 AS 'col7')
)
Results:
| ID | COLVAL | R |
|----|--------|---|
| 1 | 3 | 1 |
| 1 | 4 | 2 |
| 1 | 4 | 3 |
| 1 | 6 | 4 |
| 1 | 7 | 5 |
| 1 | 8 | 6 |
| 1 | 8 | 7 |
| 2 | 9 | 1 |
| 2 | 8 | 2 |
| 2 | 7 | 3 |
| 2 | 7 | 4 |
| 2 | 5 | 5 |
| 2 | 2 | 6 |
| 2 | 1 | 7 |
| 3 | 1 | 1 |
| 3 | 2 | 2 |
| 3 | 4 | 3 |
| 3 | 5 | 4 |
| 3 | 7 | 5 |
| 3 | 8 | 6 |
| 3 | 9 | 7 |
| 4 | 9 | 1 |
| 4 | 8 | 2 |
| 4 | 7 | 3 |
| 4 | 4 | 4 |
| 4 | 4 | 5 |
| 4 | 4 | 6 |
| 4 | 1 | 7 |
| 5 | 2 | 1 |
| 5 | 2 | 2 |
| 5 | 4 | 3 |
| 5 | 5 | 4 |
| 5 | 5 | 5 |
| 5 | 6 | 6 |
| 5 | 7 | 7 |
Then PIVOT the result based on the rank.
Query 2:
WITH pivoted AS (
SELECT id,
colval,
ROW_NUMBER () OVER (
PARTITION BY id
ORDER BY CASE MOD (id, 2) WHEN 1 THEN colval END,
CASE MOD (id, 2) WHEN 0 THEN colval END DESC) r
FROM x UNPIVOT (colval FOR colname
IN (col1 AS 'col1', col2 AS 'col2', col3 AS 'col3', col4 AS 'col4',
col5 AS 'col5', col6 AS 'col6', col7 AS 'col7')
)
)
SELECT * FROM pivoted
PIVOT (MAX (colval)
FOR r
IN (1 AS col1, 2 AS col2, 3 AS col3, 4 AS col4,
5 AS col5, 6 AS col6, 7 AS col7))
Results:
| ID | COL1 | COL2 | COL3 | COL4 | COL5 | COL6 | COL7 |
|----|------|------|------|------|------|------|------|
| 1 | 3 | 4 | 4 | 6 | 7 | 8 | 8 |
| 2 | 9 | 8 | 7 | 7 | 5 | 2 | 1 |
| 3 | 1 | 2 | 4 | 5 | 7 | 8 | 9 |
| 4 | 9 | 8 | 7 | 4 | 4 | 4 | 1 |
| 5 | 2 | 2 | 4 | 5 | 5 | 6 | 7 |
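The rule being applied (ascending for odd IDs, descending for even IDs) can be checked independently of Oracle with a small Python sketch. The function name `sort_row_by_id_parity` and the dictionary layout are assumptions for illustration; since the values are handled as a list, the number of columns does not matter.

```python
def sort_row_by_id_parity(row_id, values):
    """Sort ascending when row_id is odd, descending when it is even."""
    return sorted(values, reverse=(row_id % 2 == 0))


# The Grades table from the question, keyed by ID.
grades = {
    1: [6, 3, 8, 4, 7, 8, 4],
    2: [5, 7, 9, 2, 1, 7, 8],
    3: [2, 7, 4, 8, 1, 5, 9],
    4: [8, 4, 7, 9, 4, 1, 4],
    5: [7, 5, 2, 5, 2, 6, 4],
}
result = {row_id: sort_row_by_id_parity(row_id, values)
          for row_id, values in grades.items()}
for row_id, values in result.items():
    print(row_id, values)
```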
