I have two tables with a foreign key between T1.ID and T2.T_ID:
T1:
ID  PR_ID  Version
------------------
1   1      1
2   2      1
3   2      2
4   3      1
5   3      2
6   4      1
T2:
ID  T_ID  ab_nr
---------------
1   1     56
2   2     3
3   3     76
4   4     4
5   5     87
6   6     64
I need a SELECT that gets all T2.IDs belonging to the highest T1.Version of each PR_ID. For example, PR_ID 2 and 3 each have two versions; here I would only need T1.IDs 1, 3, 5 and 6 as the end result.
I tried it with:
SELECT * FROM T2
JOIN T1 ON T1.ID = T2.T_ID
WHERE T1.Version IN (SELECT MAX(VERSION) FROM T1);
but this doesn't work, because it only returns the rows with Version 2 (the overall maximum) and nothing else.
There are always many ways to skin a SQL cat, but here's a simple one.
SELECT t2.*
FROM t1
INNER JOIN t2 ON t2.t_id = t1.id
WHERE NOT EXISTS ( SELECT 'higher version for the same PR_ID'
FROM t1 t1x
WHERE t1x.pr_id = t1.pr_id
AND t1x.version > t1.version )
That is, add a NOT EXISTS condition to filter out any results that are for old versions.
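To check the NOT EXISTS approach outside Oracle, here is a minimal sketch that runs the same query against an in-memory SQLite database loaded with the sample rows from the question (assumption: SQLite stands in for Oracle only so the result can be verified; the lowercase table and column names are mine).

```python
import sqlite3

# In-memory SQLite database standing in for the Oracle tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER PRIMARY KEY, pr_id INTEGER, version INTEGER);
    CREATE TABLE t2 (id INTEGER PRIMARY KEY,
                     t_id INTEGER REFERENCES t1(id), ab_nr INTEGER);
    INSERT INTO t1 VALUES (1,1,1), (2,2,1), (3,2,2), (4,3,1), (5,3,2), (6,4,1);
    INSERT INTO t2 VALUES (1,1,56), (2,2,3), (3,3,76), (4,4,4), (5,5,87), (6,6,64);
""")

# Keep a T1 row only if no other row with the same PR_ID has a higher version.
latest = conn.execute("""
    SELECT t2.*
    FROM t1
    INNER JOIN t2 ON t2.t_id = t1.id
    WHERE NOT EXISTS (SELECT 'higher version for the same PR_ID'
                      FROM t1 t1x
                      WHERE t1x.pr_id = t1.pr_id
                        AND t1x.version > t1.version)
    ORDER BY t2.id
""").fetchall()

print(latest)
```

Only the T2 rows for T1.ID 1, 3, 5 and 6 come back, which matches the expected result.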
The way you tried to do it was on the right track, but you just needed to correlate your MAX(VERSION) subquery so that it got the max version for the current PR_ID. Like this:
SELECT * FROM T2
JOIN T1 ON T1.ID = T2.T_ID
WHERE T1.Version IN (SELECT MAX(T1X.VERSION) FROM T1 T1X
-- You missed this part, below
WHERE T1X.PR_ID = T1.PR_ID
);
Anyway, try either of these. If performance is not good, we can start looking at more efficient ways of doing it (e.g., MAX ... KEEP).
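A minimal sketch comparing the original, uncorrelated subquery with the correlated one, again using SQLite as a stand-in for Oracle (an assumption made only for testability):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER PRIMARY KEY, pr_id INTEGER, version INTEGER);
    CREATE TABLE t2 (id INTEGER PRIMARY KEY, t_id INTEGER, ab_nr INTEGER);
    INSERT INTO t1 VALUES (1,1,1), (2,2,1), (3,2,2), (4,3,1), (5,3,2), (6,4,1);
    INSERT INTO t2 VALUES (1,1,56), (2,2,3), (3,3,76), (4,4,4), (5,5,87), (6,6,64);
""")

# Original attempt: the subquery is not correlated, so it returns the single
# global maximum (2) and only rows with version 2 survive.
uncorrelated = conn.execute("""
    SELECT t2.id FROM t2
    JOIN t1 ON t1.id = t2.t_id
    WHERE t1.version IN (SELECT MAX(version) FROM t1)
    ORDER BY t2.id
""").fetchall()

# Corrected attempt: the subquery is correlated on PR_ID, so it returns the
# maximum version for the current row's PR_ID.
correlated = conn.execute("""
    SELECT t2.id FROM t2
    JOIN t1 ON t1.id = t2.t_id
    WHERE t1.version IN (SELECT MAX(t1x.version) FROM t1 t1x
                         WHERE t1x.pr_id = t1.pr_id)
    ORDER BY t2.id
""").fetchall()

print(uncorrelated)
print(correlated)
```

The uncorrelated form returns only T2.IDs 3 and 5, while the correlated form returns 1, 3, 5 and 6, the same as the NOT EXISTS query.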
I have a table which has data as:
id payor_name
---------------
1 AETNA
2 UMR
3 CIGNA
4 METLIFE
4 AETNAU
5 ktm
6 ktm
ID and payor_name are the two columns.
My expected output is:
id payor_name
---------------
1 AETNA
2 UMR
3 CIGNA
4 METLIFE
4 AETNAU
6 ktm      <-- I want to change the ID of this row from 5 to 6.
6 ktm
I want a one-to-one mapping between id and payor_name, so this is what I tried:
MERGE INTO offc.payor_collec A
USING (select id from offc.payor_collec where payor_name in(
select payor_name from offc.payor_collec group by payor_name having count(distinct id)>=2)) B
ON (A.id=B.id)
WHEN MATCHED THEN
UPDATE SET A.id=B.id
But when I ran it, I got this error:
Error at line 1
ORA-38104: Columns referenced in the ON Clause cannot be updated: "A"."ID"
ID is a NUMBER, whereas payor_name is a VARCHAR2.
How can I achieve this result?
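MERGE works, but it has to be written slightly differently from your code: join on payor_name rather than on id, because a column referenced in the ON clause cannot be updated.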
SQL> select * from test;
ID PAYOR
---------- -----
1 aetna
2 umr
5 ktm
6 ktm
SQL> merge into test t
  using (select max(t1.id) id,
                t1.payor_name
         from test t1
         group by t1.payor_name
        ) x
  on (x.payor_name = t.payor_name)
  when matched then update set
    t.id = x.id;
4 rows merged.
SQL> select * from test;
ID PAYOR
---------- -----
1 aetna
2 umr
6 ktm
6 ktm
SQL>
Use a correlated subquery:
UPDATE PAYOR_COLLEC pc
SET pc.ID = (SELECT MAX(pc2.ID)
FROM PAYOR_COLLEC pc2
WHERE pc2.PAYOR_NAME = pc.PAYOR_NAME)
dbfiddle here
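A minimal sketch of the correlated-subquery UPDATE, run against SQLite (an assumption for testability; SQLite has no MERGE, so only this variant is shown). The outer table is referenced by name instead of the pc alias, which keeps the SQLite statement simple:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE payor_collec (id INTEGER, payor_name TEXT);
    INSERT INTO payor_collec VALUES
        (1,'AETNA'), (2,'UMR'), (3,'CIGNA'), (4,'METLIFE'),
        (4,'AETNAU'), (5,'ktm'), (6,'ktm');
""")

# Set every row's ID to the largest ID found for the same payor_name.
conn.execute("""
    UPDATE payor_collec
    SET id = (SELECT MAX(pc2.id)
              FROM payor_collec pc2
              WHERE pc2.payor_name = payor_collec.payor_name)
""")

rows = conn.execute(
    "SELECT id, payor_name FROM payor_collec ORDER BY id, payor_name"
).fetchall()
print(rows)
```

Both ktm rows end up with ID 6, and every other row keeps its ID, because each of those names already appears with a single ID.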
You can use a MERGE statement, as you tried and as Littlefoot has shown.
You can also use a correlated subquery as Bob Jarvis has shown, but that will be quite inefficient.
Many Oracle developers are unaware that you can also update through a join. Worse, there are many who say "there is no such thing in Oracle."
In your problem, you need to join your table to an aggregate query (picking just the max id for each payor_name) and the join is on the group by column in the aggregate. This already guarantees that the join column will be unique in the right-hand table; that is all Oracle needs to allow the update through join.
Here is a complete example, starting with the create table statement, then the update and then showing the table after the update. Note that I don't need any constraints (like primary key, not null, unique, etc.) or indexes on the base table. If they do exist, so much the better, but the solution works in the most general case.
create table t (id, payor_name) as
select 1, 'AETNA' from dual union all
select 2, 'UMR' from dual union all
select 3, 'CIGNA' from dual union all
select 4, 'METLIFE' from dual union all
select 4, 'AETNAU' from dual union all
select 5, 'ktm' from dual union all
select 6, 'ktm' from dual;
Table T created.
update
(
select id, payor_name, max_id
from t inner join
(select max(id) as max_id, payor_name from t group by payor_name)
using (payor_name)
)
set id = max_id where id != max_id
;
1 row updated.
select * from t;
ID PAYOR_NAME
----- ----------
1 AETNA
2 UMR
3 CIGNA
4 METLIFE
4 AETNAU
6 ktm
6 ktm
Notice the WHERE clause at the end of the UPDATE statement, too. You don't want to update rows to their pre-existing value; that still generates undo and redo data (although I understand that Oracle has changed this in more recent versions, so that it no longer generates undo and redo unless a row actually changes). I assume ID is NOT NULL; otherwise, you should rewrite the WHERE clause as
where decode(id, max_id, 0) is null
or something equivalent.
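Oracle's update through an inline-view join has no direct SQLite equivalent, but the effect of the WHERE filter can still be sketched there with a correlated subquery (assumption: SQLite and Python's sqlite3 module stand in for Oracle; the helper run_update is hypothetical). cursor.rowcount reports how many rows the UPDATE touched:

```python
import sqlite3


def run_update(only_changed):
    """Run the max-id update on a fresh copy of the sample table.

    Returns the number of rows the UPDATE statement touched.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE t (id INTEGER, payor_name TEXT);
        INSERT INTO t VALUES (1,'AETNA'), (2,'UMR'), (3,'CIGNA'), (4,'METLIFE'),
                             (4,'AETNAU'), (5,'ktm'), (6,'ktm');
    """)
    max_id = "(SELECT MAX(t2.id) FROM t t2 WHERE t2.payor_name = t.payor_name)"
    sql = f"UPDATE t SET id = {max_id}"
    if only_changed:
        # Skip rows whose id already equals the target value.
        sql += f" WHERE id != {max_id}"
    cur = conn.execute(sql)
    return cur.rowcount


print(run_update(only_changed=False))
print(run_update(only_changed=True))
```

Without the filter every row is rewritten (7 rows, most of them to their existing value); with it, only the ktm row with ID 5 changes, matching the "1 row updated." shown above.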
There are two tables, as below:
Table1
ID Name Age Active PID
-----------------------------
1 A 2 Y 100
2 A 2 Y 100
3 A 2 Y 100
4 B 3 Y 200
5 B 3 Y 200
Table2
T2ID CID
---------
10 1
20 1
30 1
40 2
50 2
60 3
70 3
80 3
90 4
100 5
110 5
I am trying to inactivate the duplicate records in table1 and reassign the table2 records to the remaining active rows of table1. The result for table1 and table2 should be as below:
ID Name Age Active PID
-----------------------------
1 A 2 Y 100
2 A 2 N 100
3 A 2 N 100
4 B 3 N 200
5 B 3 Y 200
T2ID CID
---------
10 1
20 1
30 1
40 1
50 1
60 1
70 1
80 1
90 5
100 5
110 5
Please help with an Oracle query to do this update.
You can do this by using two merge statements, like so:
Update table2:
MERGE INTO table2 tgt
USING (WITH t1 AS (SELECT ID,
NAME,
age,
active,
pid,
MIN(ID) OVER (PARTITION BY pid) min_id,
CASE WHEN COUNT(CASE WHEN active = 'Y' THEN 1 END) OVER (PARTITION BY pid) > 1 THEN 'Y' ELSE 'N' END multi_active_rows
FROM table1)
SELECT t2.t2id,
t2.cid old_cid,
t1.min_id new_cid
FROM t1
INNER JOIN table2 t2 ON t1.id = t2.cid
WHERE t1.multi_active_rows = 'Y') src
ON (tgt.t2id = src.t2id)
WHEN MATCHED THEN
UPDATE SET tgt.cid = src.new_cid;
Update table1:
MERGE INTO table1 tgt
USING (WITH t1 AS (SELECT ID,
NAME,
age,
active,
pid,
MIN(ID) OVER (PARTITION BY pid) min_id,
CASE WHEN COUNT(CASE WHEN active = 'Y' THEN 1 END) OVER (PARTITION BY pid) > 1 THEN 'Y' ELSE 'N' END multi_active_rows
FROM table1)
SELECT ID
FROM t1
WHERE multi_active_rows = 'Y'
AND ID != min_id) src
ON (tgt.id = src.id)
WHEN MATCHED THEN
UPDATE SET active = 'N';
Since we want to derive the updates for both table1 and table2 from the original data in table1, it's easier to update table2 before updating table1.
This works by finding the lowest id for each pid in table1, and checking whether there is more than one active row for each pid (there's no need to do any updates if there is at most one active row).
Once we have that information, we can decide which rows to update in each table: we use min_id as the new cid in table2, and we mark as inactive any rows in table1 whose id doesn't match min_id.
N.B. If you could have a mix of Ys and Ns in your data, you may need to drop the AND id != min_id check in the second MERGE statement and amend the update part to set the row to Y if the id is the min_id, and to N otherwise.
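The same logic can be sketched in SQLite, assuming a build with window functions (3.25 or later) and using plain UPDATE statements with IN subqueries in place of MERGE, which SQLite lacks. The helper table t1_info is hypothetical: it snapshots the analytic results from the original table1 before either update runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT, age INTEGER,
                         active TEXT, pid INTEGER);
    CREATE TABLE table2 (t2id INTEGER PRIMARY KEY, cid INTEGER);
    INSERT INTO table1 VALUES (1,'A',2,'Y',100), (2,'A',2,'Y',100), (3,'A',2,'Y',100),
                              (4,'B',3,'Y',200), (5,'B',3,'Y',200);
    INSERT INTO table2 VALUES (10,1), (20,1), (30,1), (40,2), (50,2), (60,3),
                              (70,3), (80,3), (90,4), (100,5), (110,5);

    -- Hypothetical snapshot of the analytic columns used by both MERGE sources,
    -- taken from the original table1 before anything is updated.
    CREATE TEMP TABLE t1_info AS
    SELECT id,
           MIN(id) OVER (PARTITION BY pid) AS min_id,
           CASE WHEN COUNT(CASE WHEN active = 'Y' THEN 1 END)
                     OVER (PARTITION BY pid) > 1
                THEN 'Y' ELSE 'N' END AS multi_active_rows
    FROM table1;
""")

# Step 1: repoint table2 rows at the lowest id of their pid group.
conn.execute("""
    UPDATE table2
    SET cid = (SELECT i.min_id FROM t1_info i WHERE i.id = table2.cid)
    WHERE cid IN (SELECT id FROM t1_info WHERE multi_active_rows = 'Y')
""")

# Step 2: deactivate every table1 row that is not the lowest id of its group.
conn.execute("""
    UPDATE table1
    SET active = 'N'
    WHERE id IN (SELECT id FROM t1_info
                 WHERE multi_active_rows = 'Y' AND id != min_id)
""")

table1_rows = conn.execute("SELECT id, active FROM table1 ORDER BY id").fetchall()
table2_rows = conn.execute("SELECT t2id, cid FROM table2 ORDER BY t2id").fetchall()
print(table1_rows)
print(table2_rows)
```

With the MIN(id) rule, PID 200 keeps ID 4 active and table2 rows 90, 100 and 110 are pointed at 4, whereas the sample output in the question keeps ID 5 for that group; which row to keep per PID is worth confirming.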
I have a question about Hive. Let me explain the scenario:
I am using a Hive action in Oozie, with a query that does successive LEFT JOINs on different tables.
The total number of rows to be inserted is about 35 million.
First, the job was crashing due to lack of memory, so I set "set hive.auto.convert.join=false"; the query then executed successfully, but it took 4 hours.
I tried reordering the LEFT JOINs to put the large tables at the end, but got the same result: about 4 hours.
Here is what the query looks like:
INSERT OVERWRITE TABLE final_table
SELECT
T1.Id,
T1.some_field_name,
T1.another_filed_name,
T2.also_another_filed_name,
FROM table1 T1
LEFT JOIN table2 T2 ON ( T2.Id = T1.Id ) -- T2 is the smallest table
LEFT JOIN table3 T3 ON ( T3.Id = T1.Id )
LEFT JOIN table4 T4 ON ( T4.Id = T1.Id ) -- T4 is the biggest table
So, given the structure of the query, is there a way to rewrite it to avoid so many JOINs?
Thanks in advance
PS: Enabling vectorization gave me the same timing too.
Too long for a comment, will be deleted later.
(1) Your current query won't compile.
(2) You are not selecting anything from T3 and T4, which makes no sense.
(3) Changing the order of tables is not likely to have any impact with cost based optimizer.
(4) Normally I would suggest collecting statistics on the tables, specifically on the id columns, but in your case I have a feeling that id is not unique in more than one table.
Add to your post the result of the following query:
select *
, case when cnt_1 = 0 then 1 else cnt_1 end
* case when cnt_2 = 0 then 1 else cnt_2 end
* case when cnt_3 = 0 then 1 else cnt_3 end
* case when cnt_4 = 0 then 1 else cnt_4 end as product
from (select id
,count(case when tab = 1 then 1 end) as cnt_1
,count(case when tab = 2 then 1 end) as cnt_2
,count(case when tab = 3 then 1 end) as cnt_3
,count(case when tab = 4 then 1 end) as cnt_4
from ( select 1 as tab,id from table1
union all select 2 as tab,id from table2
union all select 3 as tab,id from table3
union all select 4 as tab,id from table4
) t
group by id
having greatest (cnt_1,cnt_2,cnt_3,cnt_4) >= 10
) t
order by product desc
limit 10
;
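The product column estimates how many rows a single id produces once all four LEFT JOINs on id are applied: the per-table counts multiply, and a table with no match still contributes one NULL-extended row. A minimal sketch that checks this in SQLite, with hypothetical single-column tables and data (an assumption; it is not the asker's Hive data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER);
    CREATE TABLE table2 (id INTEGER);
    CREATE TABLE table3 (id INTEGER);
    CREATE TABLE table4 (id INTEGER);
    -- Hypothetical data: id 1 is unique, id 7 is duplicated in several tables.
    INSERT INTO table1 VALUES (1), (7), (7);
    INSERT INTO table2 VALUES (1), (7), (7), (7);
    INSERT INTO table3 VALUES (7), (7);
    INSERT INTO table4 VALUES (1), (7), (7), (7), (7), (7);
""")

# Rows actually produced per id by the LEFT JOIN chain from the question.
joined = conn.execute("""
    SELECT T1.id, COUNT(*)
    FROM table1 T1
    LEFT JOIN table2 T2 ON T2.id = T1.id
    LEFT JOIN table3 T3 ON T3.id = T1.id
    LEFT JOIN table4 T4 ON T4.id = T1.id
    GROUP BY T1.id
    ORDER BY T1.id
""").fetchall()


def expected_rows(conn, key):
    """Product of per-table counts, treating a missing match as 1 row.

    Only meaningful for ids present in table1, the driving table.
    """
    product = 1
    for table in ("table1", "table2", "table3", "table4"):
        (cnt,) = conn.execute(
            f"SELECT COUNT(*) FROM {table} WHERE id = ?", (key,)
        ).fetchone()
        product *= cnt if cnt else 1
    return product


print(joined)
print({key: expected_rows(conn, key) for key, _ in joined})
```

Here id 7 alone turns 2 driving rows into 60 output rows (2 x 3 x 2 x 5), which is the kind of blow-up the diagnostic query is meant to expose.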
Given a table
$ cat data.csv
ID,State,City,Price,Flag
1,CA,A,95,0
2,CA,A,96,1
3,CA,A,195,1
4,NY,B,124,0
5,NY,B,128,1
6,NY,C,24,0
7,NY,C,27,1
8,NY,C,29,0
9,NY,C,39,1
Expected Result:
ID0, ID1
1,2
4,5
6,7
8,7
For each ID with Flag=0 above, we want to find another ID with Flag=1 that has the same State and City and the nearest Price.
I have two rough (possibly stupid) ideas:
Method 1.
Use a left outer join of the table with itself on
(a.State=b.State and a.City=b.City and a.Flag=0 and b.Flag=1),
where a.Flag=0 and b.Flag=1,
and then use RANK() over (partition by a.State, a.City order by a.Price - b.Price) as rank
and keep only rows where rank=1.
Method 2.
Use a left outer join of the table with itself on
(a.State=b.State and a.City=b.City and a.Flag=0 and b.Flag=1),
where a.Flag=0 and b.Flag=1,
and then use DISTRIBUTE BY a.State, a.City SORT BY Price_Diff ASC LIMIT 1.
What's the best way to find the nearest neighbor in Hive?
Any valuable tips will be greatly appreciated!
select a.id, b.id, min(abs(b.price-a.price)) as delta
from data as a
inner join data as b
on a.state=b.state and
a.flag=0 and b.flag=1 and
a.city=b.city
group by a.id, b.id
order by delta asc;
This returns
1 2 1 <---
8 7 2 <---
6 7 3 <---
4 5 4 <---
8 9 10
6 9 15
1 3 100
The problem is that the last 3 rows reuse IDs that already appear in the first 4.
select a.id as id0, b.id as id1, abs(b.price-a.price) as delta,
rank() over ( partition by a.state, a.city order by abs(b.price-a.price) )
from data as a
inner join data as b
on a.state=b.state and
a.flag=0 and b.flag=1 and
a.city=b.city;
This will return
id0 id1 prc rank
1 2 1 1 <---
1 3 100 2
4 5 4 1 <---
8 7 2 1 <---
6 7 3 2
8 9 10 3
6 9 15 4
We are missing (6,7), and in a sense this is correct:
6,NY,C,24,0
7,NY,C,27,1
8,NY,C,29,0
9,NY,C,39,1
The lowest price difference for (6,7),(6,9),(8,7),(8,9) is in (8,7). (ambiguous join)
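If, as the expected result suggests, several Flag=0 rows may share the same Flag=1 neighbour, ranking within each Flag=0 ID (PARTITION BY a.id) rather than within each State/City returns one match per ID. A minimal sketch in SQLite, assuming window-function support (3.25 or later); Hive also supports ROW_NUMBER() OVER (...), but the query is shown here in SQLite only, and the tie-break on b.id is my own addition:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data (id INTEGER, state TEXT, city TEXT, price INTEGER, flag INTEGER);
    INSERT INTO data VALUES
        (1,'CA','A',95,0), (2,'CA','A',96,1), (3,'CA','A',195,1),
        (4,'NY','B',124,0), (5,'NY','B',128,1),
        (6,'NY','C',24,0), (7,'NY','C',27,1), (8,'NY','C',29,0), (9,'NY','C',39,1);
""")

# For each Flag=0 row, rank its same-State/City Flag=1 candidates by price gap
# and keep the closest one (ties broken by the smaller candidate id).
nearest = conn.execute("""
    SELECT id0, id1
    FROM (SELECT a.id AS id0,
                 b.id AS id1,
                 ROW_NUMBER() OVER (PARTITION BY a.id
                                    ORDER BY ABS(b.price - a.price), b.id) AS rn
          FROM data a
          INNER JOIN data b
            ON a.state = b.state AND a.city = b.city
           AND a.flag = 0 AND b.flag = 1)
    WHERE rn = 1
    ORDER BY id0
""").fetchall()

print(nearest)
```

This yields (1,2), (4,5), (6,7) and (8,7), the pairs listed in the question's expected result.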
I think you will love this video on this topic: Big Data Analytics Using Window Functions
My Oracle version is 10.2.
Something very strange happens when a scalar subquery contains an aggregate operation.
My table, named t_test, looks like this:
t_id t_name
1 1
2 1
3 2
4 2
5 3
6 3
The query looks like this:
select t1.t_id,
(select count(t_name)
from (select t2.t_name
from t_test t2
where t2.t_id=t1.t_id
group by t2.t_name)) a
from t_test t1
The query's result is:
t_id a
1 3
2 3
3 3
4 3
5 3
6 3
which is very weird.
Take t1.t_id=1, for example:
select count(t_name)
from (select t2.t_name
from t_test t2
where t2.t_id=1
group by t2.t_name)
the result is 1.
Somehow the WHERE clause doesn't take effect; the result is exactly the same as if I had written the query like this:
select t1.t_id,
(select count(t_name)
from (select t2.t_name
from t_test t2
group by t2.t_name)) a
from t_test t1
Why?
Can you post a cut-and-paste from SQL*Plus showing exactly what query you're running? The query you posted does not appear to be valid: the alias t1 is not going to be valid in the subquery where you're referencing it. That makes me suspect that you're simplifying the problem to post here, but you've accidentally left something important out.
SQL> ed
Wrote file afiedt.buf
1 with x as (
2 select 1 id, 1 name from dual union all
3 select 2,1 from dual union all
4 select 3,2 from dual union all
5 select 4,2 from dual union all
6 select 5,3 from dual union all
7 select 6,3 from dual
8 )
9 select t1.id
10 ,(select count(b.name)
11 from (select t2.name
12 from x t2
13 where t2.id = t1.id
14 group by t2.name) b) a
15* from x t1
SQL> /
where t2.id = t1.id
*
ERROR at line 13:
ORA-00904: "T1"."ID": invalid identifier
Presumably, it would be much more natural to write the query like this (assuming you really want to use a scalar subquery), where t1 is a valid alias inside the scalar subquery:
SQL> ed
Wrote file afiedt.buf
1 with x as (
2 select 1 id, 1 name from dual union all
3 select 2,1 from dual union all
4 select 3,2 from dual union all
5 select 4,2 from dual union all
6 select 5,3 from dual union all
7 select 6,3 from dual
8 )
9 select t1.id
10 ,(select count(t2.name)
11 from x t2
12 where t2.id = t1.id) cnt
13* from x t1
SQL> /
ID CNT
---------- ----------
1 1
2 1
3 1
4 1
5 1
6 1
6 rows selected.
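A minimal sketch reproducing both behaviours in SQLite (an assumption for testability; SQLite stands in for Oracle 10.2): the correlated scalar subquery returns 1 per row, while the uncorrelated nested form returns the total number of distinct t_name values, 3, for every row, which is the result the question reports.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_test (t_id INTEGER, t_name INTEGER);
    INSERT INTO t_test VALUES (1,1), (2,1), (3,2), (4,2), (5,3), (6,3);
""")

# Correlated scalar subquery: counts the rows matching the current t_id.
correlated = conn.execute("""
    SELECT t1.t_id,
           (SELECT COUNT(t2.t_name)
            FROM t_test t2
            WHERE t2.t_id = t1.t_id) AS cnt
    FROM t_test t1
    ORDER BY t1.t_id
""").fetchall()

# Uncorrelated nested form: the same distinct-name count for every row.
uncorrelated = conn.execute("""
    SELECT t1.t_id,
           (SELECT COUNT(t_name)
            FROM (SELECT t2.t_name FROM t_test t2 GROUP BY t2.t_name)) AS a
    FROM t_test t1
    ORDER BY t1.t_id
""").fetchall()

print(correlated)
print(uncorrelated)
```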