Issue with HIVE with row_number() over() syntax - oracle

I have HIVE table ( details below):
hive> select * from abcd ;
OK
a 1 1
b 2 2
a 3 3
Time taken: 0.261 seconds, Fetched: 3 row(s)
hive> desc abcd;
OK
val001 string
val002 int
val003 int
Time taken: 0.084 seconds, Fetched: 3 row(s)
I am writing following query but receiving below error :
select max(rnk) rnk, max(val) val, sum(cnt) cnt from (select val, count(*) cnt, row_number() over (order by case val when null then 0 else count(*) end desc, val) rnk from (select VAL001 val from abcd ) group by val) group by case when rnk <= 100 or val is null then rnk else 100 + 1 end;
FAILED: ParseException line 3:55 missing ) at 'by' near 'by'
line 3:58 missing EOF at 'val' near 'by'
I am looking for following result from above query :
RNK VAL CNT
--- ------------------------------ ---
1 a 2
2 b 1
I was able to achieve the same from Oracle database having similar kind of table. Only difference was instead of order by case I used order by decode in Oracle DB but since decode is not supported in HIVe I can not do the same.
Please find ORacle DB SQL query which is working :
SQL> select max(rnk) rnk, max(val) val, sum(cnt) cnt from
(select val, count(*) cnt, row_number() over (order by
decode(val,null,0,count(*)) desc, val) rnk from (select VAL001 val from
table_name ) group by val)
group by case when rnk <= 100 or val is null then rnk else 100 + 1 end;
RNK VAL CNT
--- ------------------------------ ---
1 a 2
2 b 1
Can anyone please help me fixing HIVE query. Let me know if you need any more details.

This is your query. I suspect there is a simpler way to get what you want:
select max(rnk) as rnk, max(val) as val, sum(cnt) as cnt
from (select val, count(*) as cnt,
row_number() over (order by case val when null then 0 else count(*) end desc, val) as rnk
from (select VAL001 val from abcd )
group by val
)
group by case when rnk <= 100 or val is null then rnk else 100 + 1 end;
I think you just need table aliases for the subqueries in the from clause:
select max(rnk) as rnk, max(val) as val, sum(cnt) as cnt
from (select val, count(*) as cnt,
row_number() over (order by case val when null then 0 else count(*) end desc, val) as rnk
from (select VAL001 val from abcd
) x
group by val
) x
group by case when rnk <= 100 or val is null then rnk else 100 + 1 end;

This is not technically simpler solution, but possible easier to read:
The first subquery performs the count and ranking,
the second subquery the categorisation in the top 1 - top 100 and the special categories for other (top) and unknown.
The final query makes the grouping.
with cnt as (
select VAL001 val,
count(*) as cnt,
row_number() over (order by decode(VAL001,null,0,count(*)) desc, VAL001) as rnk
from abcd
group by VAL001),
ctg as (
select
val, cnt, rnk,
case when val is NULL then 'unknown'
when rnk <= 100 then 'top '||rnk
else 'other' end as category_code
from cnt)
select
max(rnk) as rnk, max(val) as val, sum(cnt) as cnt
from ctg
group by category_code
order by 1

Related

Oracle sort values into column

I have a table like this:
time length name
00:01:00 2 a
00:11:22 2 a
01:01:00 45 a
00:23:00 3 b
and I want to retrieve data from the table in the form:
a b
time length time length
00:01:00 2 00:23:00 3
00:11:22 2
01:01:00 2
so it is a simple task of rearranging data, atm I am doing this in a bash script, but I wonder if there is an easy way to do it in Oracle?
You can use analytical function ROW_NUMBER and full outer join as follows:
WITH CTE1 AS
(SELECT T.*, ROW_NUMBER() OVER (ORDER BY LENGTH, TIME) AS RN FROM YOUR_TABLE T WHERE NAME = 'a'),
CTE2 AS
(SELECT T.*, ROW_NUMBER() OVER (ORDER BY LENGTH, TIME) AS RN FROM YOUR_TABLE T WHERE NAME = 'b')
SELECT A.TIME, A.LENGTH, B.TIME, B.LENGTH
FROM CTE1 A FULL OUTER JOIN CTE2 B
ON A.RN = B.RN
Note: You need to use proper order by to order the records as per your requirement. I have used LENGTH, TIME
You can use a multi-column pivot, by adding an extra column that links the related A and B values; presumably by time order, something like:
select time_col, length_col, name_col,
dense_rank() over (partition by name_col order by time_col) as rnk
from your_table;
TIME_COL LENGTH_COL N RNK
-------- ---------- - ----------
00:01:00 2 a 1
00:11:22 2 a 2
01:01:00 45 a 3
00:23:00 3 b 1
and then pivot based on that:
select *
from (
select time_col, length_col, name_col,
dense_rank() over (partition by name_col order by time_col) as rnk
from your_table
)
pivot (
max(time_col) as time_col, max(length_col) as length_col
for name_col in ('a' as a, 'b' as b)
);
RNK A_TIME_C A_LENGTH_COL B_TIME_C B_LENGTH_COL
---------- -------- ------------ -------- ------------
1 00:01:00 2 00:23:00 3
2 00:11:22 2
3 01:01:00 45
I've left the rnk value in the output; if you don't want that you can list the columns in the select list:
select a_time_col, a_length_col, b_time_col, b_length_col
from ...
Or you could do the same thing with conditional aggregation (which is what pivot uses under the hood anyway):
select
max(case when name_col = 'a' then time_col end) as time_col_a,
max(case when name_col = 'a' then length_col end) as length_col_a,
max(case when name_col = 'b' then time_col end) as time_col_b,
max(case when name_col = 'b' then length_col end) as length_col_b
from (
select time_col, length_col, name_col,
dense_rank() over (partition by name_col order by time_col) as rnk
from your_table
)
group by rnk
order by rnk;
TIME_COL LENGTH_COL_A TIME_COL LENGTH_COL_B
-------- ------------ -------- ------------
00:01:00 2 00:23:00 3
00:11:22 2
01:01:00 45
db<>fiddle

Rownum for a column with lag

I have this query
SELECT CASE WHEN LAG(emp_id) OVER( ORDER BY NULL ) = emp_id THEN '-'
ELSE emp_id END "Employee ID",
row_number() over (partition by emp_id order by emp_id) as "S/N",
family_mem_id "MemID",
CASE WHEN LAG(emp_id) OVER(ORDER BY NULL ) = emp_id THEN 0
ELSE (SUM(amount_paid) OVER(PARTITION BY emp_id)) END "Total amount"
FROM Employee
ORDER BY emp_id;
And it shows me result like this: Resultset
I want to add row number (first column SN) for Employee ID for in between rows I want to set it as null, for e.g. For Employee ID -> S/N 2-> F904 (SN should be null). How can I do that?
You can use the outer query to create the SN column as follows:
SELECT CASE WHEN "S/N" = 1 THEN '<1>' END AS SN, T.* FROM -- added this
(SELECT
CASE WHEN LAG(emp_id) OVER( ORDER BY NULL ) = emp_id THEN '-'
ELSE emp_id END "Employee ID",
row_number() over (partition by emp_id order by emp_id) as "S/N",
family_mem_id "MemID",
CASE WHEN LAG(emp_id) OVER(ORDER BY NULL ) = emp_id THEN 0
ELSE (SUM(amount_paid) OVER(PARTITION BY emp_id)) END "Total amount"
FROM Employee ) T
ORDER BY "Employee ID", "S/N"; -- Added this
Note that I have also changed the ORDER BY clause (which is moved to the outer query).
Cheers!!
Use dense_rank() to enumerate rows:
select case lag(emp_id) over( order by emp_id ) when emp_id then null
else '<'||dense_rank() over (order by emp_id)||'>'
end sn,
nullif(emp_id, lag(emp_id) over( order by emp_id )) empid,
row_number() over (partition by emp_id order by emp_id) rn,
family_mem_id memid,
case lag(emp_id) over(order by emp_id) when emp_id then null
else (sum(amount_paid) over(partition by emp_id))
end total
FROM Employee order by emp_id;
dbfiddle and sample output:
SN EMPID RN MEMID TOTAL
---- ----- ------ ----- ----------
<1> 101 1 F901 200
2 F904
<2> 102 1 F901 135
2 F901
<3> 103 1 F901 185
2 F901
3 F901

Can't understand Oracle logic for Procedure

PROCEDURE DELETE_X1
IS
v_emp_unum VARCHAR2 (25);
BEGIN
/*FOR rec
IN (SELECT l2.*,
row_number ()
OVER (PARTITION BY case_uid,nvl(min_eff_date, eff_begin_date)
ORDER BY case_uid,nvl(min_eff_date, eff_begin_date))
rw,
ROWID rid
FROM (SELECT l1.*,
MIN (
CASE
WHEN eff_end_date = next_begin - 1
THEN
eff_begin_date
END)
OVER (PARTITION BY case_uid)
min_eff_date,
MAX (
CASE
WHEN previous_end <>
TO_DATE ('12/31/9999',
'mm/dd/yyyy')
AND previous_end + 1 = eff_begin_date
THEN
eff_end_date
END)
OVER (PARTITION BY case_uid)
max_end_date
FROM (SELECT GT.*,
LEAD (
EFF_begin_DATE)
OVER (PARTITION BY CASE_UID
ORDER BY EFF_BEGIN_DATE)
next_begin,
LAG (
EFF_end_DATE)
OVER (PARTITION BY CASE_UID
ORDER BY EFF_BEGIN_DATE)
previous_end
FROM TABLE_OUTPUT GT
WHERE SRC = 'ERICSSON' AND STATUS_CODE = 'X1') l1)
l2)*/
FOR rec
IN (SELECT l2.*,
ROW_NUMBER ()
OVER (PARTITION BY case_uid, min_eff_date, max_end_date
ORDER BY case_uid, min_eff_date, max_end_date)
rw,
ROWID rid
FROM (SELECT l1.*,
(MAX (
start_at)
OVER (
PARTITION BY case_uid
ORDER BY EFF_BEGIN_DATE
ROWS UNBOUNDED PRECEDING))
min_eff_date,
(MIN (
break_at)
OVER (
PARTITION BY case_uid
ORDER BY EFF_BEGIN_DATE
ROWS BETWEEN CURRENT ROW
AND UNBOUNDED FOLLOWING))
max_end_date
FROM (SELECT GT.*,
(CASE
WHEN LAG (
EFF_end_DATE)
OVER (
PARTITION BY CASE_UID
ORDER BY EFF_BEGIN_DATE) =
EFF_BEGIN_DATE
- 1
THEN
NULL
ELSE
EFF_BEGIN_DATE
END)
start_at,
(CASE
WHEN LEAD (
EFF_BEGIN_DATE)
OVER (
PARTITION BY case_uid
ORDER BY EFF_BEGIN_DATE) =
CASE
WHEN EFF_end_DATE <>
TO_DATE (
'12/31/9999',
'mm/dd/yyyy')
THEN
EFF_end_DATE
+ 1
ELSE
EFF_end_DATE
END
THEN
NULL
ELSE
EFF_end_DATE
END)
break_at
FROM TABLE_OUTPUT GT
WHERE SRC = 'ERICSSON' AND STATUS_CODE = 'X1') l1)
l2)
The part of code is commented out an re written.
commented out code output
OFF TIME 1/1/2017 1/7/2017 X1
OFF TIME 1/8/2017 2/1/2017 X1
New code output
OFF TIME 1/1/2017 2/1/2017 X1
NORMAL 2/2/2017 2/2/2017 AB
OFF TIME 2/20/2017
The LAG function is used to access data from a previous row.
The LEAD function is used to return data from rows further down the result set.
aggregate functions MIN , MAX
i am pretty confused with the flow of code.
I can't understand that code on the whole please explain the logic for the code

Retrieving page that contains specific row

I have a procedure that returns multiple rows on some criteria and in specific order. These rows are separated into few pages (50 rows per page).
How can I retrieve all rows from page having some specific row.
I've created a query the query that do this work, but it is not optimized and have huge impact on performance. Help me please to optimize it or give an alternative to it:
select *
from
(
select file_id, row_number() over (order by rownum) rn
from my_table
)
where trunc(rn/50) = (
select trunc(rn/50) from
(select t.*, rownum rn from my_table t)
where file_id = 29987);
You have to adjust it a little...
with tab as
(
-- the
-- row_number() over (order by rownum) rn
-- should be here
select level + 1000 as val
, level/50 as rn_50
from dual
connect by
level < 140
)
, val as
(
select rn_50
from tab
where val = 1004 -- pg 1
--where val = 1051 -- pg 2
--where val = 1101 -- pg 3
)
select *
from tab t
where rn_50 >= (select floor(rn_50) from val)
and rn_50 <= (select ceil (rn_50) from val)
;

Query to exclude row based on another row's filter

I'm using Oracle 10g.
Question: How can I write query to return just ID only if ALL the codes for that ID end in 6? I don't want ID=1 because not all its codes end in 6.
TABLE_A
ID Code
===============
1 100
1 106
2 206
3 316
3 326
4 444
Desired Result:
ID
==
2
3
You simply want each ID where the count of rows for that id is the same as the count of rows where the third digit is six.
SELECT ID
FROM TABLE_A
GROUP BY ID
HAVING COUNT(*) = COUNT(CASE WHEN SUBSTR(code,3,1) = '6' THEN 1 END)
Try this:
SELECT DISTINCT b.id
FROM (
SELECT id,
COUNT(1) cnt
FROM table_a
GROUP BY id
) a,
(
SELECT id,
COUNT(1) cnt
FROM table_a
WHERE CODE LIKE '%6'
GROUP BY id
)b
WHERE a.id = b.id
AND a.cnt = b.cnt
Alternative using ANALYTIC functions:
SELECT DISTINCT id
FROM
(
SELECT id,
COUNT(1) OVER(PARTITION BY id) cnt,
SUM(CASE WHEN code LIKE '%6' THEN 1 ELSE 0 END) OVER(PARTITION BY id) sm
FROM table_a
)
WHERE cnt = sm

Resources