How to return non-empty rows for a given ID - Hive - hadoop

I have a table X
ID A B
--------------
1 abc 27
1 - 28
2 - 33
3 xyz 41
3 - 07
I need output as
ID A B
--------------
1 abc 27
2 - 33
3 xyz 41
I tried doing
max(A) OVER (PARTITION BY ID) as the_value
but it did not work: I can still see all the rows in the output table.
Has anybody come across a similar situation and found a solution to this?

You can use this simple trick to get the full record for which some column is maxed:
select original.* from
(select ID,max(B) as B from Tbl group by ID ) maxB
inner join
(select * from Tbl ) original
on original.ID = maxB.ID and original.B = maxB.B
This is, of course, overkill. You can also do:
select Tbl.* from
(select ID,max(B) as B from Tbl group by ID ) maxB
inner join
Tbl
on Tbl.ID = maxB.ID and Tbl.B = maxB.B
But the first version is more of a template that lets you do whatever you want with further columns, fields, conditions, joins, etc.
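Two things are worth noting against the sample data. A window function such as max(A) OVER (PARTITION BY ID) adds a value to every row but never removes rows, which is why all rows still showed up. And the max(B) join returns the row with the largest B per ID, which for ID 1 is the row with the empty A (B = 28), not the "1 abc 27" row from the desired output. A minimal sketch of one alternative follows: rank the rows within each ID so that rows with a non-empty A come first, then keep rank 1. It runs on SQLite through Python only so that it is self-contained. It assumes the "-" in the sample stands for NULL and that ties are broken by the smaller B (both are my assumptions, not something stated in the question). Hive also has ROW_NUMBER(), but the query was not tested there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE X (ID INTEGER, A TEXT, B INTEGER)")
conn.executemany(
    "INSERT INTO X VALUES (?, ?, ?)",
    # Assumption: '-' in the question's sample means NULL.
    [(1, "abc", 27), (1, None, 28), (2, None, 33), (3, "xyz", 41), (3, None, 7)],
)

QUERY = """
SELECT ID, A, B
FROM (
    SELECT ID, A, B,
           ROW_NUMBER() OVER (
               PARTITION BY ID
               -- rows with a value in A sort first; ties go to the smaller B
               ORDER BY CASE WHEN A IS NULL THEN 1 ELSE 0 END, B
           ) AS rn
    FROM X
) ranked
WHERE rn = 1
ORDER BY ID
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

An ID that only has empty rows (ID 2 here) still yields one row, which matches the desired output in the question.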

Related

PL SQL function that includes multiple tables

I'm new to PL SQL and have to write a function, which has customer_id as an input and has to output a product_name of the best selling product for that customer_id.
The schema looks like this:
I found a lot of simple examples where it includes two tables, but I can't seem to find one where you have to do multiple joins and use a function, while selecting only the best selling product.
I could paste a lot of very bad code here and how I tried to approach this, but this seems to be a bit over my head for current knowledge, since I've been learning PL SQL for less than 3 days now and got this task.
With some sample data (minimal column set):
SQL> select * from products order by product_id;
PRODUCT_ID PRODUCT_NAME
---------- ----------------
1 BMW
2 Audi
SQL> select * From order_items;
PRODUCT_ID CUSTOM QUANTITY UNIT_PRICE
---------- ------ ---------- ----------
1 Little 100 1
2 Little 200 2
2 Foot 300 3
If we check some totals:
SQL> select o.product_id,
2 o.customer_id,
3 sum(o.quantity * o.unit_price) total
4 from order_items o
5 group by o.product_id, o.customer_id;
PRODUCT_ID CUSTOM TOTAL
---------- ------ ----------
2 Little 400
1 Little 100
2 Foot 900
SQL>
It says that
for customer Little, product 2 was sold with total = 400 - that's our choice for Little
for customer Little, product 1 was sold with total = 100
for customer Foot, product 2 was sold with total = 900 - that's our choice for Foot
Query might then look like this:
temp CTE calculates totals per each customer
rank_them CTE ranks them in descending order per each customer; row_number so that you get only one product, even if there are ties
finally, select the one that ranks as the highest
SQL> with
2 temp as
3 (select o.product_id,
4 o.customer_id,
5 sum(o.quantity * o.unit_price) total
6 from order_items o
7 group by o.product_id, o.customer_id
8 ),
9 rank_them as
10 (select t.customer_id,
11 t.product_id,
12 row_number() over (partition by t.customer_id order by t.total desc) rn
13 from temp t
14 )
15 select * From rank_them;
CUSTOM PRODUCT_ID RN
------ ---------- ----------
Foot 2 1 --> for Foot, product 2 ranks as the highest
Little 2 1 --> for Little, product 2 ranks as the highest
Little 1 2
SQL>
Moved to a function:
SQL> create or replace function f_product (par_customer_id in order_items.customer_id%type)
2 return products.product_name%type
3 is
4 retval products.product_name%type;
5 begin
6 with
7 temp as
8 (select o.product_id,
9 o.customer_id,
10 sum(o.quantity * o.unit_price) total
11 from order_items o
12 group by o.product_id, o.customer_id
13 ),
14 rank_them as
15 (select t.customer_id,
16 t.product_id,
17 row_number() over (partition by t.customer_id order by t.total desc) rn
18 from temp t
19 )
20 select p.product_name
21 into retval
22 from rank_them r join products p on p.product_id = r.product_id
23 where r.customer_id = par_customer_id
24 and r.rn = 1;
25
26 return retval;
27 end;
28 /
Function created.
SQL>
Testing:
SQL> select f_product ('Little') result from dual;
RESULT
--------------------------------------------------------------------------------
Audi
SQL> select f_product ('Foot') result from dual;
RESULT
--------------------------------------------------------------------------------
Audi
SQL>
Now, you can improve it so that it handles the no-data-found case (when the customer didn't buy anything), ties (but you'd then return a collection or a ref cursor instead of a scalar value), etc.
[EDIT] I'm sorry, the ORDERS table has to be included in the temp CTE; your data model is correct, you don't have to do anything about it - my query was wrong (small screen + late hours issue; not a real excuse, just saying).
So:
with
temp as
(select i.product_id,
o.customer_id,
sum(i.quantity * i.unit_price) total
from order_items i join orders o on o.order_id = i.order_id
group by i.product_id, o.customer_id
),
The rest of my code is - otherwise - unmodified.
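For anyone who wants to try the corrected query end to end without an Oracle instance, here is a rough sketch on SQLite through Python. The orders table, its order_id values and the CTE names totals/ranked are made up for illustration (the schema image from the question is not available), and the Python function only mimics what the PL/SQL function does. It returns None where the PL/SQL version would raise NO_DATA_FOUND.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products    (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE orders      (order_id INTEGER PRIMARY KEY, customer_id TEXT);
CREATE TABLE order_items (order_id INTEGER, product_id INTEGER,
                          quantity INTEGER, unit_price INTEGER);
INSERT INTO products VALUES (1, 'BMW'), (2, 'Audi');
-- Hypothetical orders: two for Little, one for Foot.
INSERT INTO orders VALUES (10, 'Little'), (11, 'Little'), (12, 'Foot');
INSERT INTO order_items VALUES (10, 1, 100, 1), (11, 2, 200, 2), (12, 2, 300, 3);
""")

BEST_SELLER = """
WITH totals AS (
    -- total value sold per product and customer, via the orders table
    SELECT i.product_id, o.customer_id,
           SUM(i.quantity * i.unit_price) AS total
    FROM order_items i JOIN orders o ON o.order_id = i.order_id
    GROUP BY i.product_id, o.customer_id
),
ranked AS (
    -- row_number returns exactly one product per customer, even on ties
    SELECT t.customer_id, t.product_id,
           ROW_NUMBER() OVER (PARTITION BY t.customer_id
                              ORDER BY t.total DESC) AS rn
    FROM totals t
)
SELECT p.product_name
FROM ranked r JOIN products p ON p.product_id = r.product_id
WHERE r.customer_id = ? AND r.rn = 1
"""

def f_product(customer_id):
    """Return the best selling product name, or None if nothing was bought."""
    row = conn.execute(BEST_SELLER, (customer_id,)).fetchone()
    return row[0] if row else None

print(f_product("Little"), f_product("Foot"), f_product("Nobody"))
```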

Write a function which will display details of the train having the maximum number of passengers for a given date?

The following are the tables in the database.
There is a many-to-many relation between train and passenger.
table 1 name=train
TNO TNAME
1 x
2 y
3 z
table 2 name=passenger
PNO PNAME
111 a
222 b
333 c
table 3 name=tp
TNO PNO TPDATE
1 111 23-NOV-15
2 222 24-JUN-14
3 222 19-JUN-13
1 333 23-NOV-15
Using the following code, I can only find out which train number has the highest frequency:
select tno,count(tno) as numberofoccurance from tp group by tno
Try using GROUP BY and the row_number analytic function as follows:
Select tname, tpdate
From
(Select t.tname, tp.tpdate,
row_number() over (partition by tp.tpdate order by count(1) desc nulls last) as rn
From train t
Join tp tp
On (t.tno = tp.tno)
Group by t.tno, t.tname, tp.tpdate)
Where rn = 1;
Cheers!!
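A quick way to check the idea against the sample rows is the sketch below, run on SQLite through Python. It splits the work into two CTEs (count per train and date, then rank within each date) rather than putting count() inside the window's ORDER BY, and it filters on the given date at the end. The ISO date strings, the tie-break on tno and the function name busiest_train are my own choices for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE train (tno INTEGER PRIMARY KEY, tname TEXT);
CREATE TABLE tp    (tno INTEGER, pno INTEGER, tpdate TEXT);
INSERT INTO train VALUES (1, 'x'), (2, 'y'), (3, 'z');
INSERT INTO tp VALUES (1, 111, '2015-11-23'), (2, 222, '2014-06-24'),
                      (3, 222, '2013-06-19'), (1, 333, '2015-11-23');
""")

BUSIEST = """
WITH counts AS (
    -- passengers per train and date
    SELECT t.tno, t.tname, tp.tpdate, COUNT(*) AS passengers
    FROM train t JOIN tp ON t.tno = tp.tno
    GROUP BY t.tno, t.tname, tp.tpdate
),
ranked AS (
    -- rank trains within each date, busiest first (ties: lowest tno)
    SELECT tno, tname, tpdate, passengers,
           ROW_NUMBER() OVER (PARTITION BY tpdate
                              ORDER BY passengers DESC, tno) AS rn
    FROM counts
)
SELECT tno, tname, passengers
FROM ranked
WHERE rn = 1 AND tpdate = ?
"""

def busiest_train(date):
    """Return (tno, tname, passengers) for the date, or None if no trips."""
    return conn.execute(BUSIEST, (date,)).fetchone()

print(busiest_train("2015-11-23"))
```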

Inactivate duplicate record and re-point child records to active one

There are two table as below
Table1
ID Name Age Active PID
-----------------------------
1 A 2 Y 100
2 A 2 Y 100
3 A 2 Y 100
4 B 3 Y 200
5 B 3 Y 200
Table2
T2ID CID
---------
10 1
20 1
30 1
40 2
50 2
60 3
70 3
80 3
90 4
100 5
110 5
I am trying to inactivate the duplicate records in table 1 and reassign the table 2 records to the active rows of table 1. The result for table1 and table2 should be as below:
ID Name Age Active PID
-----------------------------
1 A 2 Y 100
2 A 2 N 100
3 A 2 N 100
4 B 3 N 200
5 B 3 Y 200
T2ID CID
---------
10 1
20 1
30 1
40 1
50 1
60 1
70 1
80 1
90 5
100 5
110 5
Please help with an Oracle query to perform this update.
You can do this by using two merge statements, like so:
Update table2:
MERGE INTO table2 tgt
USING (WITH t1 AS (SELECT ID,
NAME,
age,
active,
pid,
MIN(ID) OVER (PARTITION BY pid) min_id,
CASE WHEN COUNT(CASE WHEN active = 'Y' THEN 1 END) OVER (PARTITION BY pid) > 1 THEN 'Y' ELSE 'N' END multi_active_rows
FROM table1)
SELECT t2.t2id,
t2.cid old_cid,
t1.min_id new_cid
FROM t1
INNER JOIN table2 t2 ON t1.id = t2.cid
WHERE t1.multi_active_rows = 'Y') src
ON (tgt.t2id = src.t2id)
WHEN MATCHED THEN
UPDATE SET tgt.cid = src.new_cid;
Update table1:
MERGE INTO table1 tgt
USING (WITH t1 AS (SELECT ID,
NAME,
age,
active,
pid,
MIN(ID) OVER (PARTITION BY pid) min_id,
CASE WHEN COUNT(CASE WHEN active = 'Y' THEN 1 END) OVER (PARTITION BY pid) > 1 THEN 'Y' ELSE 'N' END multi_active_rows
FROM table1)
SELECT ID
FROM t1
WHERE multi_active_rows = 'Y'
AND ID != min_id) src
ON (tgt.id = src.id)
WHEN MATCHED THEN
UPDATE SET active = 'N';
Since we want to derive the results to update both table1 and table2 from the original dataset in table1, it's easier to update table2 first before updating table1.
This works by finding the lowest id across each set of pids in table1, plus checking to see if there is more than one active row for each pid (there's no need to do any updates if we have at most one active row available).
Once we have that information, we can use that to decide which rows to update in each table, and we can use the min_id to update table2 with, and we can update any rows in table1 where the id doesn't match the min_id to be not active.
N.B. If you could have a mix of Ys and Ns in your data, you may need to skip the and id != min_id check in the second merge statement and amend the update part to update the row to Y if the id is the min_id, otherwise set it to N.
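The same two-step order (re-point table2 first, then deactivate in table1) can be sketched on SQLite through Python. SQLite has no MERGE, so this swaps in plain UPDATE statements driven by a temporary keepers table. That table and its keep_id column are names I made up for illustration. Note that keeping the lowest ID per pid, as the answer does, leaves ID 4 active for pid 200, whereas the expected output in the question keeps ID 5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT, age INTEGER,
                     active TEXT, pid INTEGER);
CREATE TABLE table2 (t2id INTEGER PRIMARY KEY, cid INTEGER);
INSERT INTO table1 VALUES (1,'A',2,'Y',100), (2,'A',2,'Y',100),
                          (3,'A',2,'Y',100), (4,'B',3,'Y',200),
                          (5,'B',3,'Y',200);
INSERT INTO table2 VALUES (10,1), (20,1), (30,1), (40,2), (50,2), (60,3),
                          (70,3), (80,3), (90,4), (100,5), (110,5);

-- One row per pid that has more than one active row: the lowest id survives.
CREATE TEMP TABLE keepers AS
SELECT pid, MIN(id) AS keep_id
FROM table1
GROUP BY pid
HAVING COUNT(CASE WHEN active = 'Y' THEN 1 END) > 1;

-- Step 1: re-point child rows to the surviving parent.
UPDATE table2
SET cid = (SELECT k.keep_id
           FROM table1 t JOIN keepers k ON k.pid = t.pid
           WHERE t.id = table2.cid)
WHERE cid IN (SELECT t.id FROM table1 t JOIN keepers k ON k.pid = t.pid);

-- Step 2: deactivate every duplicate that is not the survivor.
UPDATE table1
SET active = 'N'
WHERE EXISTS (SELECT 1 FROM keepers k
              WHERE k.pid = table1.pid AND k.keep_id <> table1.id);
""")

active = conn.execute("SELECT id, active FROM table1 ORDER BY id").fetchall()
cids = conn.execute("SELECT t2id, cid FROM table2 ORDER BY t2id").fetchall()
print(active)
print(cids)
```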

How to reduce the script lines for same table join condition in multiple times in oracle?

There are two tables table1 and table2.
Table1 is as below:
col1 | col2 | Col3
A 10 X
B 11 X
C 10 X
A 20 X
Table2 is as below:
col1 | col2 | col3 | col4
A 10 1 UDHAY
B 11 2 VIJAY
C 10 1 SURESH
A 20 2 ARUL
A 10 3 UDHAY
B 11 4 VIJAY
C 10 4 SURESH
A 20 5 ARUL
I want to display the column col4 in table2 with 3 join conditions as below.
table1.COl1 = table2.COl1
and table1.COl2 = table2.COl2
and table2.COl3 = '1'
Sample query :
select
table2.col4
from table1
left outer join table2
on(
table1.COl1 = table2.COl1
and table1.COl2 = table2.COl2
and table2.COl3 = '1');
Question: If I want to display table2.col4 for table2.col3 values 1, 2, 3, 4 and 5, with the other conditions matching table1, how should I write the script?
I know we can join the same table 5 times under different aliases and print the result, but I don't want to repeat the same condition 5 times. Only the where conditions should be common for all 5 values.
Added on 30-OCT-2013:
Thanks for your response. No, not by using IN as you mentioned. Right now I am using the script concept below:
select A.col1, A.col2, B1.col4, B2.col4, B3.col4, B4.col4
from table1 A
left outer join table2 B1
on(
A.COl1 = B1.COl1
and A.COl2 = B1.COl2
and B1.COl3 = '1')
left outer join table2 B2
on(
A.COl1 = B2.COl1
and A.COl2 = B2.COl2
and B2.COl3 = '2')
left outer join table2 B3
on(
A.COl1 = B3.COl1
and A.COl2 = B3.COl2
and B3.COl3 = '3')
left outer join table2 B4
on(
A.COl1 = B4.COl1
and A.COl2 = B4.COl2
and B4.COl3 = '4');
So my output will be:
A | 10 | UDHAY | |UDHAY| |
B | 11 | | VIJAY| | VIJAY |
C | 10 | SURESH | | | SURESH |
A | 20 | | ARUL | | |
But I have to do this for 25 combinations (1 to 25), so a script written like the one above would run to more than 200 lines. To avoid that, could you suggest another method to reduce the script lines and get the same output?
I'm not sure I'm getting you correctly, do you mean something like this?
and table2.COl3 IN ('1','2','3','4','5')
First solution (trivial):
Create a view containing the whole set of joins and query it from your script instead. It's a bit dodgy, but it makes your script shorter. If your data set ends up growing off the scales, you could switch to a materialized view instead and simplify what could end up being a costly plan.
Second solution (not so trivial):
If your tables follow a naming convention, you could write a PL/SQL block that loops over the tables you want to join, concatenates the joins into a string at runtime, and runs the result as dynamic SQL.
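To give a rough idea of the second solution, the sketch below generates the repeated joins in a loop and runs the resulting string. It is written in Python against SQLite rather than as a PL/SQL block, purely so that it is self-contained. The function name build_query and the col4_1 ... col4_n column aliases are made up for illustration. The loop counter is an integer produced by range(), so nothing user-supplied is concatenated into the SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (col1 TEXT, col2 INTEGER, col3 TEXT);
CREATE TABLE table2 (col1 TEXT, col2 INTEGER, col3 TEXT, col4 TEXT);
INSERT INTO table1 VALUES ('A',10,'X'), ('B',11,'X'), ('C',10,'X'), ('A',20,'X');
INSERT INTO table2 VALUES
  ('A',10,'1','UDHAY'), ('B',11,'2','VIJAY'), ('C',10,'1','SURESH'),
  ('A',20,'2','ARUL'),  ('A',10,'3','UDHAY'), ('B',11,'4','VIJAY'),
  ('C',10,'4','SURESH'), ('A',20,'5','ARUL');
""")

def build_query(n):
    """Build one LEFT OUTER JOIN on table2 per col3 value 1..n."""
    select_cols = ["A.col1", "A.col2"]
    joins = []
    for i in range(1, n + 1):
        alias = f"B{i}"
        select_cols.append(f"{alias}.col4 AS col4_{i}")
        joins.append(
            f"LEFT OUTER JOIN table2 {alias} "
            f"ON (A.col1 = {alias}.col1 AND A.col2 = {alias}.col2 "
            f"AND {alias}.col3 = '{i}')"
        )
    return (
        "SELECT " + ", ".join(select_cols)
        + " FROM table1 A " + " ".join(joins)
        + " ORDER BY A.rowid"  # keep table1's insertion order (SQLite only)
    )

rows = conn.execute(build_query(5)).fetchall()
for row in rows:
    print(row)
```

For the 25 values in the question you would call build_query(25); the generated statement is still long, but the script that produces it stays short.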

Is there a more efficient way to count the number of aggregate records in Oracle SQL?

I have more experience with MySQL and MSSQL but I don't consider myself a SQL expert.
I have a requirement for some SQL work running on an Oracle database. Not even sure the version yet but it should be somewhat recent (10, 11??).
Anyway, I have to count the number of distinct records that spans two tables. For sake of argument, let's call them master and detail.
The following SQL gives me the number I want against the data. However, this SQL will eventually be put in a UDF (or Oracle equivalent). But my question is, is there a better way? Either using some advanced Oracle optimization or even just a better SQL query.
Thanks
select count(*) from
(
select
mas.barcode
, det.barcode_val
from mas
inner join det on (det.trans_id = mas.trans_id and mas.trans_sub_id = det.trans_sub_id)
where
mas.trans_id = 12345
and det.code_type = 'COMMODORE'
group by
mas.barcode
, det.barcode_val
);
Data:
MAS
trans_id trans_sub_id barcode
-------------------------------------
12345 1 COM_A
12345 2 COM_A
12345 3 COM_B
DET
trans_id trans_sub_id code_type barcode_val
-------------------------------------------------------
12345 1 COMMODORE C64
12345 1 COMMODORE C64
12345 1 TANDY TRASH80
12345 2 COMMODORE C128
12345 2 ATARI 800XL
12345 2 COMMODORE AMIGA500
12345 3 COMMODORE C64
Results before count
--------------------
COM_A C64
COM_A C128
COM_A AMIGA500
COM_B C64
Results after count
-------------------
4
SELECT
COUNT(DISTINCT mas.barcode || det.barcode_val)
FROM mas
INNER JOIN det
ON (det.trans_id = mas.trans_id and mas.trans_sub_id = det.trans_sub_id)
WHERE
mas.trans_id = 12345
AND det.code_type = 'COMMODORE'
or
SELECT COUNT(*) FROM (
SELECT DISTINCT mas.barcode, det.barcode_val
FROM mas
INNER JOIN det
ON (det.trans_id = mas.trans_id and mas.trans_sub_id = det.trans_sub_id)
WHERE
mas.trans_id = 12345
AND det.code_type = 'COMMODORE'
)
If you use the
COUNT(DISTINCT mas.barcode || det.barcode_val)
make sure to put a delimiter between the concatenated values:
COUNT(DISTINCT mas.barcode || '-' || det.barcode_val)
For example imagine the following scenario:
Column1 Column2 Column1 || Column2 Column1 || '-' || Column2
A B AB A-B
AB <null> AB AB-
1 201 1201 1-201
<null> 1201 1201 -1201
This table has 4 rows with 4 different values. But if you try a
COUNT(DISTINCT COLUMN1 || COLUMN2)
you would get just 2 "distinct" groups.
Just a tip to try to avoid those corner cases.
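The corner case is easy to reproduce. The sketch below loads the four example rows into SQLite through Python and compares the three ways of counting. In SQLite, NULL || 'x' yields NULL, whereas Oracle treats NULL as an empty string when concatenating, so COALESCE(..., '') is used here to imitate Oracle's behaviour. The table name t and the helper scalar are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (column1 TEXT, column2 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("A", "B"), ("AB", None), ("1", "201"), (None, "1201")],
)

def scalar(sql):
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]

# Plain concatenation: 'AB', 'AB', '1201', '1201'
naive = scalar(
    "SELECT COUNT(DISTINCT COALESCE(column1, '') || COALESCE(column2, '')) FROM t"
)
# With a delimiter: 'A-B', 'AB-', '1-201', '-1201'
delimited = scalar(
    "SELECT COUNT(DISTINCT COALESCE(column1, '') || '-' || COALESCE(column2, '')) FROM t"
)
# Counting distinct pairs directly avoids concatenation altogether
pairs = scalar("SELECT COUNT(*) FROM (SELECT DISTINCT column1, column2 FROM t)")

print(naive, delimited, pairs)  # 2 4 4
```

A delimiter can still collide if the delimiter character itself occurs in the data ('A-' with 'B' versus 'A' with '-B'), so the SELECT DISTINCT subquery is the safer choice when the column contents are not under your control.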
