How to join in single row result with one row and multiple rows in another table? - oracle

I have 2 tables; table 1 is a tree structure.
Table 1:
ID NAME PARENTID
-------------------------
1 Book1 0
2 Page 1
3 Line1 2
4 Line2 2
5 Book2 0
6 Page1 5
7 Page2 5
8 Line1 6
9 Line2 6
Table 2:
ID BOOK PAGE LINE
1 1 2 4
2 5 7 9
I want to get all rows in table 2 and show the corresponding names from table 1 on one line:
ID BOOK PAGE LINE
1 Book1 Page Line2
2 Book2 Page2 Line2
How can I combine the data from multiple rows of table 1 with one row of table 2 into a single row, without the duplicates I get from a simple select? Sorry for the stupid problem. Thanks for your help.

Use joins (tbl is the second table, treeTbl is the tree table):
select t1.id,
t2.name book,
t3.name page,
t4.name line
from tbl t1
join treeTbl t2
on t1.book = t2.id
join treeTbl t3
on t3.id=t1.page
join treeTbl t4
on t4.id=t1.line
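The triple self-join above can be checked against the sample data with a small sketch. It assumes Python's built-in sqlite3 module as a stand-in for Oracle (the join syntax is the same for this query); the table names tbl and treeTbl follow the answer, and the column types are guesses.

```python
import sqlite3

# SQLite stands in for Oracle here; column types are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE treeTbl (id INTEGER PRIMARY KEY, name TEXT, parentid INTEGER);
    CREATE TABLE tbl (id INTEGER PRIMARY KEY, book INTEGER, page INTEGER, line INTEGER);
    INSERT INTO treeTbl VALUES
        (1,'Book1',0),(2,'Page',1),(3,'Line1',2),(4,'Line2',2),
        (5,'Book2',0),(6,'Page1',5),(7,'Page2',5),(8,'Line1',6),(9,'Line2',6);
    INSERT INTO tbl VALUES (1,1,2,4),(2,5,7,9);
""")

# Join the tree table once per lookup column, each time under its own alias.
rows = conn.execute("""
    SELECT t1.id, t2.name AS book, t3.name AS page, t4.name AS line
    FROM tbl t1
    JOIN treeTbl t2 ON t2.id = t1.book
    JOIN treeTbl t3 ON t3.id = t1.page
    JOIN treeTbl t4 ON t4.id = t1.line
    ORDER BY t1.id
""").fetchall()

for row in rows:
    print(row)
```

Each alias (t2, t3, t4) is an independent lookup into the same table, so every row of tbl yields exactly one output row as long as the referenced ids exist.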

Related

Delete rows from table which does not exists in other table by picking eligible records only

I am using Oracle. I need to delete the rows from one table that do not exist in another table, but only for IDs that appear in a third table holding the eligible IDs.
I am not sure how to explain it better, so below is an example. ID + SUB_ID is the PK.
Final_table -
ID SUB_ID Name
1 1 Football
1 2 Cricket
1 3 Formula1
1 4 Tennis
2 1 Hockey
2 2 Archery
2 3 Badminton
3 1 Basketball
3 2 Dodgeball
Latest_Table
ID SUB_ID Name
1 1 Football
1 2 Cricket
1 3 Formula1
2 1 Hockey
2 2 Archery
3 1 Basketball
Sample_Table
ID
1
3
In the above example, I joined Latest_Table with Sample_Table on the ID column, which gave me IDs (1,3). For these eligible IDs only, I want to delete the rows from Final_Table that are not present in Latest_Table. I don't want anything deleted for ID=2, as it is not eligible.
I have written the code below, but it deletes everything from Final_Table that is not present in Latest_Table.
Delete from FINAL_TABLE FT
where not exists
(select 1 from
Latest_table LT, Sample_table ST
where LT.ID = ST.ID
and LT.ID = FT.ID
and LT.SUB_ID = FT.SUB_ID);
Thank you for your help.
Edited-
desired result in Final_Table should look like
Final_table -
ID SUB_ID Name
1 1 Football
1 2 Cricket
1 3 Formula1
2 1 Hockey
2 2 Archery
2 3 Badminton
3 1 Basketball
Edit: I misread your query originally. I have changed my answer to remove the join to Sample_Table in the first condition.
Delete from FINAL_TABLE FT
where not exists
(select 1 from
Latest_table LT
where LT.ID = FT.ID
and LT.NAME = FT.NAME
and LT.SUB_ID = FT.SUB_ID)
AND FT.id IN
(
SELECT id FROM Sample_Table
)
This will only delete from Final_Table if the id appears in Sample_Table and the record (all 3 columns) does not appear in the Latest_table.
An alternative way of writing this is
Delete from FINAL_TABLE FT
where
(FT.ID, FT.SUB_ID, FT.NAME) NOT IN
(SELECT LT.ID, LT.SUB_ID, LT.NAME FROM Latest_table LT)
AND FT.ID IN
(SELECT ST.id FROM Sample_Table ST)
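As a quick check of the NOT EXISTS variant against the sample data, here is a minimal sketch. It assumes Python's sqlite3 module in place of Oracle, so the statement is written without a table alias on the DELETE target; column types are assumptions.

```python
import sqlite3

# SQLite stands in for Oracle; the data is the sample from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE final_table (id INTEGER, sub_id INTEGER, name TEXT, PRIMARY KEY (id, sub_id));
    CREATE TABLE latest_table (id INTEGER, sub_id INTEGER, name TEXT, PRIMARY KEY (id, sub_id));
    CREATE TABLE sample_table (id INTEGER PRIMARY KEY);
    INSERT INTO final_table VALUES
        (1,1,'Football'),(1,2,'Cricket'),(1,3,'Formula1'),(1,4,'Tennis'),
        (2,1,'Hockey'),(2,2,'Archery'),(2,3,'Badminton'),
        (3,1,'Basketball'),(3,2,'Dodgeball');
    INSERT INTO latest_table VALUES
        (1,1,'Football'),(1,2,'Cricket'),(1,3,'Formula1'),
        (2,1,'Hockey'),(2,2,'Archery'),(3,1,'Basketball');
    INSERT INTO sample_table VALUES (1),(3);
""")

# Delete only rows whose id is eligible AND that have no match in latest_table.
deleted = conn.execute("""
    DELETE FROM final_table
    WHERE id IN (SELECT id FROM sample_table)
      AND NOT EXISTS (
          SELECT 1 FROM latest_table lt
          WHERE lt.id = final_table.id
            AND lt.sub_id = final_table.sub_id
            AND lt.name = final_table.name)
""").rowcount

remaining = conn.execute(
    "SELECT id, sub_id, name FROM final_table ORDER BY id, sub_id"
).fetchall()
print(deleted, remaining)
```

The row (2, 3, 'Badminton') survives even though it is missing from latest_table, because id 2 is not in sample_table. One reason to prefer NOT EXISTS over the NOT IN form is that NOT IN matches nothing when the subquery returns a NULL in a compared column.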

only keep distinct rows when doing collect_set over a moving windowing function in hive

Let's say I have a Hive table that has 3 columns: merchant_id, week_id, acc_id. My goal is to collect the unique customers in the previous 4 weeks for each week, and I am using a moving window to do this.
My code:
Create a test table:
CREATE TABLE table_test_test (merchant_id INT, week_id INT, acc_id INT);
INSERT INTO TABLE table_test_test VALUES
(1,0,8),
(1,0,9),
(1,0,10),
(1,2,1),
(1,2,2),
(1,2,4),
(1,4,1),
(1,4,3),
(1,4,4),
(1,5,1),
(1,5,3),
(1,5,5),
(1,6,1),
(1,6,5),
(1,6,6)
Then do the collect:
select
merchant_id,
week_id,
collect_set(acc_id) over (partition by merchant_id ORDER BY week_id RANGE BETWEEN 4 preceding AND 0 preceding) as uniq_accs_prev_4_weeks
from
table_test_test
The result table is :
merchant_id week_id uniq_accs_prev_4_weeks
1 1 0 []
2 1 0 []
3 1 0 []
4 1 2 [9,8,10]
5 1 2 [9,8,10]
6 1 2 [9,8,10]
7 1 4 [9,8,10,1,2,4]
8 1 4 [9,8,10,1,2,4]
9 1 4 [9,8,10,1,2,4]
10 1 5 [1,2,4,3]
11 1 5 [1,2,4,3]
12 1 5 [1,2,4,3]
13 1 6 [1,2,4,3,5]
14 1 6 [1,2,4,3,5]
15 1 6 [1,2,4,3,5]
As you can see, there are redundant rows in the table. This is just an example; in my actual case the table is huge and the redundancy causes memory problems.
I have tried using distinct and group by but neither of these works.
Is there a good way to do it? Thanks a lot.
Distinct works fine:
select distinct merchant_id, week_id, uniq_accs_prev_4_weeks
from
(
select
merchant_id,
week_id,
collect_set(acc_id) over (partition by merchant_id ORDER BY week_id RANGE BETWEEN 4 preceding AND current row) as uniq_accs_prev_4_weeks
from
table_test_test
)s;
Result:
OK
1 0 [9,8,10]
1 2 [9,8,10,1,2,4]
1 4 [9,8,10,1,2,4,3]
1 5 [1,2,4,3,5]
1 6 [1,2,4,3,5,6]
Time taken: 98.088 seconds, Fetched: 5 row(s)
My Hive does not accept 0 preceding, so I replaced it with current row. This seems to be a Hive bug; my Hive version is 1.2. Your query should work fine with distinct added in an outer query.
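A different technique, not the one used in the answer, avoids producing one output row per input row in the first place: take the distinct (merchant_id, week_id) pairs, join them back to the rows of the preceding 4 weeks, and aggregate with GROUP BY. The sketch below assumes Python's sqlite3 module as a stand-in for Hive, with GROUP_CONCAT(DISTINCT ...) playing the role of collect_set; the join condition and aggregate would need adapting to Hive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_test_test (merchant_id INT, week_id INT, acc_id INT)")
conn.executemany(
    "INSERT INTO table_test_test VALUES (?, ?, ?)",
    [(1, 0, 8), (1, 0, 9), (1, 0, 10), (1, 2, 1), (1, 2, 2), (1, 2, 4),
     (1, 4, 1), (1, 4, 3), (1, 4, 4), (1, 5, 1), (1, 5, 3), (1, 5, 5),
     (1, 6, 1), (1, 6, 5), (1, 6, 6)],
)

# One output row per (merchant, week): join each distinct week to all rows
# from the 4 preceding weeks up to and including the current week.
rows = conn.execute("""
    SELECT w.merchant_id, w.week_id, GROUP_CONCAT(DISTINCT t.acc_id)
    FROM (SELECT DISTINCT merchant_id, week_id FROM table_test_test) AS w
    JOIN table_test_test AS t
      ON t.merchant_id = w.merchant_id
     AND t.week_id BETWEEN w.week_id - 4 AND w.week_id
    GROUP BY w.merchant_id, w.week_id
    ORDER BY w.merchant_id, w.week_id
""").fetchall()

# Turn the comma-separated strings into sets, since element order is not guaranteed.
uniq_accs = {(m, w): {int(a) for a in accs.split(",")} for m, w, accs in rows}
for key, accs in uniq_accs.items():
    print(key, sorted(accs))
```

The sets match the de-duplicated result shown above, with one row per week instead of one per input row.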

How to find the nearest neighbor in Hive? Any windowing function?

Given a table
$cat data.csv
ID,State,City,Price,Flag
1,CA,A,95,0
2,CA,A,96,1
3,CA,A,195,1
4,NY,B,124,0
5,NY,B,128,1
6,NY,C,24,0
7,NY,C,27,1
8,NY,C,29,0
9,NY,C,39,1
Expected Result:
ID0, ID1
1,2
4,5
6,7
8,7
For each ID with Flag=0 above, we want to find another ID with Flag=1 that has the same "State" and "City" and the nearest Price.
I have two rough stupid ideas:
Method 1.
Use a left outer join with the table itself on
(a.State=b.State and a.City=b.city and a.Flag=0 and b.Flag=1),
where a.Flag=0 and b.Flag=1,
and then use RANK() over (partitioned by a.State,a.City order by a.Price - b.Price) as rank
where rank=1
Method 2.
Use a left outer join with the table itself,
on
(a.State=b.State and a.City=b.city and a.Flag=0 and b.Flag=1),
where a.Flag=0 and b.Flag=1,
and then Use Distribute by a.State,a.City Sort by Price_Diff ASC limit 1
What's the best way to find the nearest neighbor in Hive?
Any valuable tips will be greatly appreciated!
select a.id, b.id , min(abs(b.price-a.price)) as delta
from data as a
inner join data as b
on a.state=b.state and
a.flag=0 and b.flag=1 and
a.city=b.city
group by a.id, b.id
order by delta asc;
This returns
1 2 1 <---
8 7 2 <---
6 7 3 <---
4 5 4 <---
8 9 10
6 9 15
1 3 100
The problem is that the last 3 rows reuse ids that already appear in the first 4.
select a.id as id0, b.id as id1, abs(b.price-a.price) as delta,
rank() over ( partition by a.state, a.city order by abs(b.price-a.price) )
from data as a
inner join data as b
on a.state=b.state and
a.flag=0 and b.flag=1 and
a.city=b.city;
This will return
id0 id1 delta rank
1 2 1 1 <---
1 3 100 2
4 5 4 1 <---
8 7 2 1 <---
6 7 3 2
8 9 10 3
6 9 15 4
We are missing 6,7, and this is expected, because the rank is computed per state and city rather than per id0.
6,NY,C,24,0
7,NY,C,27,1
8,NY,C,29,0
9,NY,C,39,1
The lowest price difference for (6,7),(6,9),(8,7),(8,9) is in (8,7). (ambiguous join)
I think you will love this video about this topic : Big Data Analytics Using Window Functions
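If the goal is exactly the expected result from the question (one nearest Flag=1 row for every Flag=0 row), one option is to rank within each a.id instead of within each state and city. The sketch below assumes Python's sqlite3 module as a stand-in for Hive (window functions need SQLite 3.25 or newer) and uses the State column from the question's data; the alias rnk is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INT, state TEXT, city TEXT, price INT, flag INT)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?, ?, ?)",
    [(1, "CA", "A", 95, 0), (2, "CA", "A", 96, 1), (3, "CA", "A", 195, 1),
     (4, "NY", "B", 124, 0), (5, "NY", "B", 128, 1),
     (6, "NY", "C", 24, 0), (7, "NY", "C", 27, 1),
     (8, "NY", "C", 29, 0), (9, "NY", "C", 39, 1)],
)

# Rank the Flag=1 candidates separately for every Flag=0 row (partition by a.id),
# then keep the closest one.
pairs = conn.execute("""
    SELECT id0, id1 FROM (
        SELECT a.id AS id0, b.id AS id1,
               RANK() OVER (PARTITION BY a.id
                            ORDER BY ABS(b.price - a.price)) AS rnk
        FROM data a
        JOIN data b
          ON a.state = b.state AND a.city = b.city
         AND a.flag = 0 AND b.flag = 1
    )
    WHERE rnk = 1
    ORDER BY id0
""").fetchall()
print(pairs)
```

With RANK(), two candidates at the same distance would both be returned; ROW_NUMBER() with an extra tie-breaker in the ORDER BY would keep exactly one.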

Updating status in 1 table, based on most recent response in another table

I'm using an Oracle 11g R1 database. Please help me with what I'm trying to achieve.
Table 1
-------
ID Name Status
-- ---- ------
1 John 0
2 Chris 0
3 Joel 0
4 Mike 0
5 Henry 0
Table 2
-------
ID Status ResponseDate
-- ------ -------------
1 0 1-Jan-2013
1 1 31-Jan-2013
1 2 3-Feb-2013
1 6 19-Jan-2013
2 6 3-Mar-2013
2 2 1-Mar-2013
2 1 4-Mar-2013
2 0 2-Mar-2013
3 0 3-Feb-2013
3 1 2-Feb-2013
3 2 1-Feb-2013
4 2 4-Apr-2013
4 1 6-Apr-2013
4 0 1-Apr-2013
5 1 31-Mar-2013
5 6 4-Apr-2013
5 3 10-Jan-2013
I would like to update Table1.status based on the most recent response each ID has returned. So, the statuses in Table1 should finally be updated as below:
ID Name Status
-- ---- ------
1 John 2
2 Chris 1
3 Joel 0
4 Mike 1
5 Henry 6
update table1 t1
set status = (
select max(status) keep (dense_rank last order by responsedate)
from table2 t2
where t2.id = t1.id
);
update table1 t1
set status =
(
select status
from table2
where id = t1.id and responseDate =
(
select max(responseDate)
from table2
where id = t1.id
)
)
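The correlated-subquery update can be tried on the sample data with the sketch below. It assumes Python's sqlite3 module in place of Oracle and stores the dates as ISO strings so that MAX() orders them correctly. The WHERE EXISTS guard is an addition not in the answers above: it leaves rows without any response untouched instead of setting their status to NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT, status INTEGER);
    CREATE TABLE table2 (id INTEGER, status INTEGER, responsedate TEXT);
    INSERT INTO table1 VALUES
        (1,'John',0),(2,'Chris',0),(3,'Joel',0),(4,'Mike',0),(5,'Henry',0);
    INSERT INTO table2 VALUES
        (1,0,'2013-01-01'),(1,1,'2013-01-31'),(1,2,'2013-02-03'),(1,6,'2013-01-19'),
        (2,6,'2013-03-03'),(2,2,'2013-03-01'),(2,1,'2013-03-04'),(2,0,'2013-03-02'),
        (3,0,'2013-02-03'),(3,1,'2013-02-02'),(3,2,'2013-02-01'),
        (4,2,'2013-04-04'),(4,1,'2013-04-06'),(4,0,'2013-04-01'),
        (5,1,'2013-03-31'),(5,6,'2013-04-04'),(5,3,'2013-01-10');
""")

# For each table1 row, take the status of the table2 row with the latest date.
conn.execute("""
    UPDATE table1
    SET status = (
        SELECT t2.status
        FROM table2 t2
        WHERE t2.id = table1.id
          AND t2.responsedate = (
              SELECT MAX(responsedate) FROM table2 WHERE id = table1.id))
    WHERE EXISTS (SELECT 1 FROM table2 WHERE id = table1.id)
""")

updated = conn.execute("SELECT id, name, status FROM table1 ORDER BY id").fetchall()
print(updated)
```

This form relies on each ID having a single row with the latest date; if two responses share that date, the scalar subquery returns more than one row, which Oracle rejects with an error.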
Of course, you can update the status column of table1 every time the need arises, but you might consider creating a view, V_Table1 for instance, which always provides fresh, up-to-date information:
create or replace view V_Table1 as
select max(t.id) as id
, max(t.name) as name
, max(q.status) keep(dense_rank first
order by q.ResponseDate desc) as status
from table_1 t
join table_2 q
on (q.id = t.id)
group by t.id
Result:
select *
from V_Table1
ID1 NAME1 STATUS
-------- ----- ----------
1 John 2
2 Chris 1
3 Joel 0
4 Mike 1
5 Henry 6

Oracle convert DECODE to PIVOT or force use of index

I have a very complex SQL view definition that has been inherited and requires altering to improve performance. It takes a list of records based on a foreign key and displays the rows returned as columns.
Thus :-
Data from select using RANK
ID RANK DKEY RECORD1 RECORD2 RECORD3
1 1 1 003 Rob Emmerry
1 2 2 004 Sue Emmerry
Returns
ID REC11 REC12 REC13 REC21 REC22 REC23
1 003 Rob Emmerry 004 Sue Emmerry
There are 37 columns of data repeated for each returned row, up to a maximum of 5 rows.
Using
SELECT ID,
MIN(DECODE(ranking,1,RECORD1, NULL)) AS REC11,
MIN(DECODE(ranking,1,RECORD2, NULL)) AS REC12,
MIN(DECODE(ranking,1,RECORD3, NULL)) AS REC13,
MIN(DECODE(ranking,1,RECORD4, NULL)) AS REC14,
MIN(DECODE(ranking,1,RECORD5, NULL)) AS REC15,
MIN(DECODE(ranking,1,RECORD6, NULL)) AS REC16,
MIN(DECODE(ranking,2,RECORD1, NULL)) AS REC21,
MIN(DECODE(ranking,2,RECORD2, NULL)) AS REC22,
MIN(DECODE(ranking,2,RECORD3, NULL)) AS REC23,
MIN(DECODE(ranking,2,RECORD4, NULL)) AS REC24,
MIN(DECODE(ranking,2,RECORD5, NULL)) AS REC25,
MIN(DECODE(ranking,2,RECORD6, NULL)) AS REC26
FROM
(
SELECT ID, RANK () OVER (PARTITION BY id ORDER BY dkey) ranking,
RECORD1,
RECORD2,
RECORD3,
RECORD4,
RECORD5,
RECORD6
FROM TABLEA
JOIN
(SELECT ID, DKEY, RECORD4, RECORD5, RECORD6
FROM TABLEB
) ON TABLEB.DKEY = TABLEA.DKEY AND TABLEB.ID = TABLEA.ID
)
GROUP BY ID;
When I look at the explain plan while filtering on the DKEY field, which has an index, the index is ignored, presumably because of the MIN/DECODE expressions.
So I thought about rewriting this using PIVOT but don't know how to start.
Any thoughts as to how I can
a) Get the query to use the index
b) Rewrite using PIVOT
First option is obviously preferable.
Thanks
Craig
UPDATE
Here is some sample data showing how my tables are.
Table 1
DKEY PID RECORD1 RECORD2 RECORD3
1 1 3 Rob Emmerry
2 1 4 Sue Emmerry
3 1 4 Jan Morris
4 1 4 Sue Pye
5 1 4 Jane Taylor
Table 2
CID DKEY RECORD10
1 3 A
2 3 D
3 3 G
4 3 J
5 4 A
6 5 A
7 5 D
8 6 A
9 6 D
10 6 G
11 7 A
12 7 D
13 7 G
14 7 J
15 7 M
Table 3
QID DKEY RECORD3
1 3 C
2 6 C
3 6 F
4 7 C
5 7 F
So tables 2 & 3 link to table 1 with DKEY
If we took the DKEY=3 as an example I would want to see this:-
PID DKEY REC1 REC2 REC3 REC4 REC5 REC6 REC7 REC8 REC9 REC10 REC11 REC12 REC13
1 3 4 Jan Morris A D G J NULL C NULL NULL NULL NULL
There could be up to 5 rows in each of tables 2 & 3. Fields PID, DKEY, REC1-REC3 from table 1, REC4-REC8 come from table 2 and the rest from table 3. The other records from table 1 would simply continue on the row so after REC13, DKEY=4 etc etc.
Hope this makes sense.
Try an INDEX hint in the inner query (tablea_index stands for the name of your index) and join TABLEB directly:
SELECT
ID,
MIN(DECODE(ranking,1,RECORD1, NULL)) AS REC11,
MIN(DECODE(ranking,1,RECORD2, NULL)) AS REC12,
MIN(DECODE(ranking,1,RECORD3, NULL)) AS REC13,
MIN(DECODE(ranking,1,RECORD4, NULL)) AS REC14,
MIN(DECODE(ranking,1,RECORD5, NULL)) AS REC15,
MIN(DECODE(ranking,1,RECORD6, NULL)) AS REC16,
MIN(DECODE(ranking,2,RECORD1, NULL)) AS REC21,
MIN(DECODE(ranking,2,RECORD2, NULL)) AS REC22,
MIN(DECODE(ranking,2,RECORD3, NULL)) AS REC23,
MIN(DECODE(ranking,2,RECORD4, NULL)) AS REC24,
MIN(DECODE(ranking,2,RECORD5, NULL)) AS REC25,
MIN(DECODE(ranking,2,RECORD6, NULL)) AS REC26
FROM
(
SELECT /*+ INDEX(tablea tablea_index) */
ID,
RANK () OVER (PARTITION BY id ORDER BY dkey) ranking,
RECORD1,
RECORD2,
RECORD3,
RECORD4,
RECORD5,
RECORD6
FROM TABLEA
JOIN TABLEB
-- was: ON TABB.DKEY = TABLEA.DKEY AND TABB ON TABB.ID = TABLEA.ID
ON TABLEB.DKEY = TABLEA.DKEY
AND TABLEB.ID = TABLEA.ID
)
GROUP BY ID;
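The rows-to-columns step itself can be illustrated on the two sample rows from the question. The sketch below assumes Python's sqlite3 module in place of Oracle (window functions need SQLite 3.25 or newer) and writes the pivot with CASE expressions, the portable equivalent of DECODE. It only demonstrates the pivoting logic and says nothing about whether Oracle will use the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tablea (id INTEGER, dkey INTEGER, record1 TEXT, record2 TEXT, record3 TEXT);
    INSERT INTO tablea VALUES
        (1, 1, '003', 'Rob', 'Emmerry'),
        (1, 2, '004', 'Sue', 'Emmerry');
""")

# Rank rows per id, then pick each rank's values into its own set of columns.
# MIN ignores the NULLs produced by non-matching CASE branches.
pivoted = conn.execute("""
    SELECT id,
           MIN(CASE WHEN ranking = 1 THEN record1 END) AS rec11,
           MIN(CASE WHEN ranking = 1 THEN record2 END) AS rec12,
           MIN(CASE WHEN ranking = 1 THEN record3 END) AS rec13,
           MIN(CASE WHEN ranking = 2 THEN record1 END) AS rec21,
           MIN(CASE WHEN ranking = 2 THEN record2 END) AS rec22,
           MIN(CASE WHEN ranking = 2 THEN record3 END) AS rec23
    FROM (
        SELECT id,
               RANK() OVER (PARTITION BY id ORDER BY dkey) AS ranking,
               record1, record2, record3
        FROM tablea
    )
    GROUP BY id
""").fetchall()
print(pivoted)
```

Each rank contributes exactly one non-NULL value per output column, so MIN simply selects it; MAX would work equally well.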
