Very similar to my last question, now I want only the, "full combination," for a group in order of priority. So, from this source table:
+-------+-------+----------+
| GROUP | State | Priority |
+-------+-------+----------+
| 1 | MI | 1 |
| 1 | IA | 2 |
| 1 | CA | 3 |
| 1 | ND | 4 |
| 1 | AZ | 5 |
| 2 | IA | 2 |
| 2 | NJ | 1 |
| 2 | NH | 3 |
And so on...
I need a query that returns:
+-------+---------------------+
| GROUP | COMBINATION |
+-------+---------------------+
| 1 | MI, IA, CA, ND, AZ |
| 2 | NJ, IA, NH |
+-------+---------------------+
Thanks for the help, again!
Use listagg() ordering by priority within the group.
SELECT "GROUP",
listagg("STATE", ', ') WITHIN GROUP (ORDER BY "PRIORITY")
FROM "ELBAT"
GROUP BY "GROUP";
db<>fiddle
I want to pivot the following table
| ID | Code | date | qty |
| 1 | A | 1/1/19 | 11 |
| 1 | A | 2/1/19 | 12 |
| 2 | B | 1/1/19 | 13 |
| 2 | B | 2/1/19 | 14 |
| 3 | C | 1/1/19 | 15 |
| 3 | C | 3/1/19 | 16 |
into
| ID | Code | mth_1(1/1/19) | mth_2(2/1/19) | mth_3(3/1/19) |
| 1 | A | 11 | 12 | 0 |
| 2 | B | 13 | 14 | 0 |
| 3 | C | 15 | 0 | 16 |
I am new to hive, i am not sure how to implement it.
NOTE: I don't want to do mapping because my month values change over time.
I need a way to avoid duplicate values from oracle join, I have this scenario.
The first table contain general information about a person.
+-----------+-------+-------------+
| ID | Name | Birtday_date|
+-----------+-------+-------------+
| 1 | Byron | 12/10/1998 |
| 2 | Peter | 01/11/1973 |
| 4 | Jose | 05/02/2008 |
+-----------+-------+-------------+
The second table contain information about a telephone of the people in the first table.
+-------+----------+----------+----------+
| ID |ID_Person |CELL_TYPE | NUMBER |
+-------+- --------+----------+----------+
| 1221 | 1 | 3 | 099141021|
| 2221 | 1 | 2 | 099091925|
| 3222 | 1 | 1 | 098041013|
| 4321 | 2 | 1 | 088043153|
| 4561 | 2 | 2 | 090044313|
| 5678 | 4 | 1 | 092049013|
| 8990 | 4 | 2 | 098090233|
+----- -+----------+----------+----------+
The Third table contain information about a email of the people in the first table.
+------+----------+----------+---------------+
| ID |ID_Person |EMAIL_TYPE| Email |
+------+- --------+----------+---------------+
| 221 | 1 | 1 |jdoe#aol.com |
| 222 | 1 | 2 |jdoe1#aol.com |
| 421 | 2 | 1 |xx12#yahoo.com |
| 451 | 2 | 2 |dsdsa#gmail.com|
| 578 | 4 | 1 |sasaw1#sdas.com|
| 899 | 4 | 2 |cvcvsd#wew.es |
| 899 | 4 | 2 |cvsd#www.es |
+------+----------+----------+---------------+
I was able to produce a result like this, you can check in this link http://sqlfiddle.com/#!4/8e326/1
+-----+-------+-------------+----------+----------+----------+----------------+
| ID | Name | Birtday_date| CELL_TYPE| NUMBER |EMAIL_TYPE|EMAIL|
+-----+-------+-------------+----------+----------+----------+----------------+
| 1 | Byron | 12/10/1998 | 3 | 099141021|1 |jdoe#aol.com |
| 1 | Byron | 12/10/1998 | 2 | 099091925|2 |jdoe1#aol.com |
| 1 | Byron | 12/10/1998 | 1 | 099091925| | |
| 2 | Peter | 01/11/1973 | 1 | 088043153|1 |xx12#yahoo.com |
| 2 | Peter | 01/11/1973 | 2 | 090044313|2 |dsdsa#gmail.com |
| 4 | Jose | 05/02/2008 | 1 | 092049013|1 |sasaw1#sdas.com |
| 4 | Jose | 05/02/2008 | 2 | 098090233|2 |cvcvsd#wew.es |
+-----+-------+-------------+----------+----------+----------+----------------+
If you check the data in table Email for user with ID_Person = 4 only present two of the three emails that have, the problem for this case is the person have more emails that cellphone numbers and only will present the same number of the cellphone numbers.
The result i expected is something like this.
+-----+-------+-------------+----------+----------+----------+----------------+
| ID | Name | Birtday_date| CELL_TYPE| NUMBER |EMAIL_TYPE|EMAIL|
+-----+-------+-------------+----------+----------+----------+----------------+
| 1 | Byron | 12/10/1998 | 3 | 099141021|1 |jdoe#aol.com |
| 1 | Byron | 12/10/1998 | 2 | 099091925|2 |jdoe1#aol.com |
| 1 | Byron | 12/10/1998 | 1 | 099091925| | |
| 2 | Peter | 01/11/1973 | 1 | 088043153|1 |xx12#yahoo.com |
| 2 | Peter | 01/11/1973 | 2 | 090044313|2 |dsdsa#gmail.com |
| 4 | Jose | 05/02/2008 | 1 | 092049013|1 |sasaw1#sdas.com |
| 4 | Jose | 05/02/2008 | 2 | 098090233|2 |cvcvsd#wew.es |
| 4 | Jose | 05/02/2008 | | |2 |cvsd#www.es |
+-----+-------+-------------+----------+----------+----------+----------------+
This is the way that i need to present the data.
I could not understand why your query was so complex, thus, added the simple full outer join and it seems to be working:
select distinct p.id, p.name,
case when Lag(CELL) over(partition by p.id order by p.id,pe.id) = CELL then null else cell_type end as cell_type,
case when Lag(CELL) over(partition by p.id order by p.id,pe.id) = CELL then null else CELL end as CELL,
EMAIL_TYPE as EMAIL_TYPE, EMAIL as EMAIL
from person p full outer join phones pe on p.id = pe.id
full outer join emails e
on p.id = e.id and pe.cell_type = e.email_type;
Having a hive table with age column consisting of age of persons.
Have to count and display the top 3 age categories.
Ex: whether below 10, 10-15, 15-20, 20-25, 25-30, ...
Which age category appears more.
Please suggest me a query to do this.
select case
when age <= 10 then '0-10'
else concat_ws
(
'-'
,cast(floor(age/5)*5 as string)
,cast((floor(age/5)+1)*5 as string)
)
end as age_group
,count(*) as cnt
from mytable
group by 1
order by cnt desc
limit 3
;
You might need to set this parameter:
set hive.groupby.orderby.position.alias=true;
Demo
with mytable as
(
select floor(rand()*100) as age
from (select 1) x lateral view explode(split(space(100),' ')) pe
)
select case
when age <= 10 then '0-10'
else concat_ws('-',cast(floor(age/5)*5 as string),cast((floor(age/5)+1)*5 as string))
end as age_group
,count(*) as cnt
,sort_array(collect_list(age)) as age_list
from mytable
group by 1
order by cnt desc
;
+-----------+-----+------------------------------+
| age_group | cnt | age_list |
+-----------+-----+------------------------------+
| 0-10 | 9 | [0,0,1,3,3,6,8,9,10] |
| 25-30 | 9 | [26,26,28,28,28,28,29,29,29] |
| 55-60 | 8 | [55,55,56,57,57,57,58,58] |
| 35-40 | 7 | [35,35,36,36,37,38,39] |
| 80-85 | 7 | [80,80,81,82,82,82,84] |
| 30-35 | 6 | [31,32,32,32,33,34] |
| 70-75 | 6 | [70,70,71,71,72,73] |
| 65-70 | 6 | [65,67,67,68,68,69] |
| 50-55 | 6 | [51,53,53,53,53,54] |
| 45-50 | 5 | [45,45,48,48,49] |
| 85-90 | 5 | [85,86,87,87,89] |
| 75-80 | 5 | [76,77,78,79,79] |
| 20-25 | 5 | [20,20,21,22,22] |
| 15-20 | 5 | [17,17,17,18,19] |
| 10-15 | 4 | [11,12,12,14] |
| 95-100 | 4 | [95,95,96,99] |
| 40-45 | 3 | [41,44,44] |
| 90-95 | 1 | [93] |
+-----------+-----+------------------------------+
I have a SQL statement that has performance issues.
Adding the following index and a SQL hint to use the index improves the performance 10 fold but I do not understand why.
BUS_ID is part of the primary key(T1.REF is the other part fo the key) and clustered index on the T1 table.
The T1 table has about 100,000 rows. BUS_ID has only 6 different values. Similarly the T1.STATUS column can only have a limited number of
possibilities and the majority of these(99%) will be the same value.
If I run the query without the hint(/*+ INDEX ( T1 T1_IDX1) NO_UNNEST */) it takes 5 seconds and with the hint it takes .5 seconds.
I don't understand how the index helps the subquery as T1.STATUS isn't used in any of the 'where' or 'join' clauses in the subquery.
What am I missing?
SELECT
/*+ NO_UNNEST */
t1.bus_id,
t1.ref,
t2.cust,
t3.cust_name,
t2.po_number,
t1.status_old,
t1.status,
t1.an_status
FROM t1
LEFT JOIN t2
ON t1.bus_id = t2.bus_id
AND t1.ref = t2.ref
JOIN t3
ON t3.cust = t2.cust
AND t3.bus_id = t2.bus_id
WHERE (
status IN ('A', 'B', 'C') AND status_old IN ('X', 'Y'))
AND EXISTS
( SELECT /*+ INDEX ( T1 T1_IDX1) NO_UNNEST */
*
FROM t1
WHERE ( EXISTS ( SELECT /*+ NO_UNNEST */
*
FROM t6
WHERE seq IN ( '0', '2' )
AND t1.bus_id = t6.bus_id)
OR (EXISTS
(SELECT /*+ NO_UNNEST */
*
FROM t6
WHERE seq = '1'
AND (an_status = 'Y'
OR
an_status = 'X')
AND t1.bus_id = t6.bus_id))
AND t2.ref = t1.ref))
AND USER IN ('FRED')
AND ( t2.status != '45'
AND t2.status != '20')
AND NOT EXISTS ( SELECT
/*+ NO_UNNEST */
*
FROM t4
WHERE EXISTS
(
SELECT
/*+ NO_UNNEST */
*
FROM t5
WHERE pd IN ( '1',
'0' )
AND appl = 'RYP'
AND appl_id IN ( 'RL100')
AND t4.id = t5.id)
AND t2.ref = p.ref
AND t2.bus_id = p.bus_id);
Edited to include Explain Plan and index.
Without Index hint
------------------------------------------------------|-------------------------------------
Operation | Options |Cost| # |Bytes | CPU Cost | IO COST
------------------------------------------------------|-------------------------------------
select statement | | 20 | 1 | 211 | 15534188 | 19 |
view | | 20 | 1 | 211 | 15534188 | 19 |
count | | | | | | |
view | | 20 | 1 | 198 | 15534188 | 19 |
sort | ORDER BY | 20 | 1 | 114 | 15534188 | 19 |
nested loops | | 7 | 1 | 114 | 62487 | 7 |
nested loops | | 7 | 1 | 114 | 62487 | 7 |
nested loops | | 6 | 1 | 84 | 53256 | 6 |
inlist iterator | | | | | | |
TABLE access t1 | INDEX ROWID | 4 | 1 | 29 | 36502 | 4 |
index-t1_idx#3 | RANGE SCAN | 3 | 1 | | 28686 | 3 |
TABLE access - t2 | INDEX ROWID | 2 | 1 | 55 | 16754 | 2 |
index t2_idx#0 | UNIQUE SCAN | 1 | 1 | | 9042 | 1 |
filter | | | | | | |
TABLE access-t1 | INDEX ROWID | 2 | 1 | 15 | 7433 | 2 |
TABLE access-t6 | INDEX ROWID | 3 | 1 | 4 | 23169 | 3 |
index-t6_idx#0 | UNIQUE RANGE SCAN | 1 | 3 | | 7721 | 1 |
filter | | | | | | |
TABLE access-t6 | INDEX ROWID | 2 | 2 | 8 | 15363 | 2 |
index-t6_idx#0 | UNIQUE RANGE SCAN | 1 | 3 | | 7521 | 1 |
index-t4_idx#1 | RANGE SCAN | 3 | 1 | 28 | 21584 | 3 |
inlist iterator | | | | | | |
index-t5_idx#1 | RANGE SCAN | 4 | 1 | 24 | 42929 | 4 |
index-t3_idx#0 | INDEX UNIQUE SCAN | 0 | 1 | | 1900 | 0 |
TABLE access-t3 | INDEX ROWID | 1 | 1 | 30 | 9231 | 1 |
--------------------------------------------------------------------------------------------
With Index hint
------------------------------------------------------|-------------------------------------
Operation | Options |Cost| # |Bytes | CPU Cost | IO COST
------------------------------------------------------|-------------------------------------
select statement | | 21 | 1 | 211 | 15549142 | 19 |
view | | 21 | 1 | 211 | 15549142 | 19 |
count | | | | | | |
view | | 21 | 1 | 198 | 15549142 | 19 |
sort | ORDER BY | 21 | 1 | 114 | 15549142 | 19 |
nested loops | | 7 | 1 | 114 | 62487 | 7 |
nested loops | | 7 | 1 | 114 | 62487 | 7 |
nested loops | | 6 | 1 | 84 | 53256 | 6 |
inlist iterator | | | | | | |
TABLE access t1 | INDEX ROWID | 4 | 1 | 29 | 36502 | 4 |
index-t1_idx#3 | RANGE SCAN | 3 | 1 | | 28686 | 3 |
TABLE access - t2 | INDEX ROWID | 2 | 1 | 55 | 16754 | 2 |
index t2_idx#0 | UNIQUE SCAN | 1 | 1 | | 9042 | 1 |
filter | | | | | | |
TABLE access-t1 | INDEX ROWID | 3 | 1 | 15 | 22387 | 2 |
index-t1_idx#1 | FULL SCAN | 2 |97k| | 14643 | |
TABLE access-t6 | INDEX ROWID | 3 | 1 | 4 | 23169 | 3 |
index-t6_idx#0 | UNIQUE RANGE SCAN | 1 | 3 | | 7721 | 1 |
filter | | | | | | |
TABLE access-t6 | INDEX ROWID | 2 | 2 | 8 | 15363 | 2 |
index-t6_idx#0 | UNIQUE RANGE SCAN | 1 | 3 | | 7521 | 1 |
index-t4_idx#1 | RANGE SCAN | 3 | 1 | 28 | 21584 | 3 |
inlist iterator | | | | | | |
index-t5_idx#1 | RANGE SCAN | 4 | 1 | 24 | 42929 | 4 |
index-t3_idx#0 | INDEX UNIQUE SCAN | 0 | 1 | | 1900 | 0 |
TABLE access-t3 | INDEX ROWID | 1 | 1 | 30 | 9231 | 1 |
--------------------------------------------------------------------------------------------
Table Index
CREATE INDEX T1_IDX#1 ON T1 (BUS_ID, STATUS)