Vertica Correlated Subquery with NOT IN clause - vertica

I have put all the DDL and the query in the SQL Fiddle link below:
http://sqlfiddle.com/#!9/89c76/2
The same statements are reproduced here:
create table players(name varchar(32),playerid int);
insert into players values('a',1);
insert into players values('b',2);
insert into players values('c',3);
insert into players values('d',4);
insert into players values('e',5);
insert into players values('f',6);
select * from players;
create table matches(playerid int,game varchar(32));
insert into matches values(1,'game1');
insert into matches values(2,'game1');
insert into matches values(3,'game1');
insert into matches values(1,'game2');
insert into matches values(2,'game2');
insert into matches values(3,'game2');
insert into matches values(4,'game3');
insert into matches values(5,'game2');
select * from matches;
commit;
and the query is
select p.playerid,m.game
from players p, (select distinct game from matches) m
where p.playerid not in (select playerid from matches where game=m.game)
I get the following error:
[Vertica][VJDBC](2795) ERROR: Correlated subquery with NOT IN is not supported [SQL State=0A000, DB Errorcode=2795]
The same restriction is mentioned in the Vertica documentation.
How can I rewrite this query?
I need results like this:
Result:
------------------------
game1 | 4,5,6
------------------------
game2 | 4,6
------------------------
game3 | 1,2,3,5,6
------------------------
....
I posted this issue on the Vertica forum and found the solution thanks to Kim_nicely:
https://forum.vertica.com/discussion/comment/240673#Comment_240673
select p.playerid, m.game from players p cross join (select distinct game from matches) m
minus
select * from matches
order by 1, 2;
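
The set-difference idea above can be checked against the sample data with a small script. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for Vertica, so MINUS is written as EXCEPT (SQLite's spelling of the same operator), and the per-game grouping is done in Python rather than with a database-specific string aggregate.

```python
import sqlite3
from collections import defaultdict

# In-memory SQLite database used as a stand-in for Vertica (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE players (name VARCHAR(32), playerid INT);
    INSERT INTO players VALUES ('a',1),('b',2),('c',3),('d',4),('e',5),('f',6);
    CREATE TABLE matches (playerid INT, game VARCHAR(32));
    INSERT INTO matches VALUES
        (1,'game1'),(2,'game1'),(3,'game1'),
        (1,'game2'),(2,'game2'),(3,'game2'),
        (4,'game3'),(5,'game2');
""")

# Every (player, game) pair, minus the pairs that were actually played.
rows = conn.execute("""
    SELECT p.playerid, m.game
    FROM players p CROSS JOIN (SELECT DISTINCT game FROM matches) m
    EXCEPT
    SELECT playerid, game FROM matches
    ORDER BY 2, 1
""").fetchall()

# Collect the missing players per game.
missing = defaultdict(list)
for playerid, game in rows:
    missing[game].append(playerid)

for game in sorted(missing):
    print(game, "|", ",".join(str(p) for p in missing[game]))
```

With the sample rows this reproduces the requested lists (4,5,6 for game1; 4,6 for game2; 1,2,3,5,6 for game3).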

Based on the query, the logic is to get all players that did not play in game1, plus players that did not play any game. You can also get the players not in game1 with the query below. Notice the extra condition m.game='game1' in the first left join.
I can only do one game because the new requirements would need a stored procedure in Vertica. I have no experience with those, only with SQL.
select 'game1', group_concat(t.playerid) playerid
from (
select p.playerid
from players p
left join matches m
on p.playerid = m.playerid and m.game='game1'
where m.playerid is null
UNION
select p.playerid
from players p
left join matches m
on p.playerid = m.playerid
where m.playerid is null) t

Related

Oracle Performance issues on using subquery in an "In" operator

I have two queries that look almost the same, but Oracle shows very different performance for them.
Query A
Create Table T1 as Select * from FinalView1 where CustomerID in ('A0000001','A000002')
Query B
Create Table T1 as Select * from FinalView1 where CustomerID in (select distinct CustomerID from CriteriaTable)
The CriteriaTable has 800 rows, but all of them belong to CustomerID 'A0000001' and 'A000002'.
This means the subquery "select distinct CustomerID from CriteriaTable" also returns only the same two elements ('A0000001','A000002') that were entered manually in Query A.
The following is the query behind FinalView1:
create or replace view FinalView1_20200716 as
select
Customer_ID,
<Some columns>
from
Table1_20200716 T1
INNER join Table2_20200716 T2 on
T1.Invoice_number = T2.Invoice_number
and
T1.line_id = T2.line_id
left join Table3_20200716 T3 on
T3.id = T1.Customer_ID
left join Table4_20200716 T4 on
T4.Shipping_ID = T1.Shipping_ID
left join Table5_20200716 Table5 on
Table5.Invoice_ID = T1.Invoice_ID
left join Table6_20200716 T6 on
T6.Shipping_ID = T4.Shipping_ID
left join First_Order first on
first.Shipping_ID = T1.Shipping_ID
;
Table1_20200716, Table2_20200716, Table3_20200716, Table4_20200716, Table5_20200716 and Table6_20200716 are views over the corresponding tables, which use the temporal validity feature. For example,
the query under Table1_20200716 is:
Create or replace view Table1_20200716 as
select
*
from Table1 as for period of to_date('20200716','yyyymmdd')
However, the table "First_Order" is just a normal table.
The following is the performance of both queries (according to the explain plan):
Query A:
Cardinality: 102
Cost : 204
Total Runtime: 5 secs max
Query B:
Cardinality:27921981
Cost: 14846
Total Runtime:20 mins until user cancelled
All tables are indexed on the columns used to join against the other tables in FinalView1. According to the explain plan, all of those indexes were used except for the one on the FirstOrder table.
Query A used the unique index on the FirstOrder table, while Query B performed a full scan.
For Query B, I was expecting Oracle to run the subquery first and feed its result into the IN operator before executing the main query, so it should have had only a minor impact on performance.
Thanks in advance!
As mentioned in my comment 2 days ago, someone posted a solution and then removed it, even though the answer actually works. After waiting 2 days, I decided to post that solution myself.
That solution suggested that the performance was slowed down by the "in" operator and suggested replacing it with an inner join:
Create Table T1 as
Select
FV.*
from
FinalView1 FV
inner join (
select distinct
CustomerID
from
CriteriaTable
) CT on CT.customerid = FV.customerID;
The result from the explain plan was worse than before:
Cardinality:28364465 (from 27921981)
Cost: 15060 (from 14846)
However, it takes only 17 secs, which is very good!

oracle select query based on other table row level condition

I want to find the order numbers from table orders where DelivaryDateRevision is less than the max revision for each country (table maxrevisions). CountryCode is not a foreign key to the other table.
Can I also fetch the orders table records when the country code is missing from the maxrevisions table?
Table: orders
OrderNumber | CountryCode | DelivaryDateRevision
123         | IN          | 9
234         | US          | 3
238         | IN          | 3
Table: maxrevisions
CountryCode | MaxRevision
IN          | 6
US          | 4
My query:
SELECT distinct o.ordernumber,o.countrycode
FROM orders o
left outer join maxrevisions m
on o.CountryCode=m.CountryCode
and
o.DelivaryDateRevision<rs.MaxRevision;
but I am getting the wrong result. Can I get any help here?
Your major omission seems to be a WHERE clause which compares the two revisions:
SELECT
o.ordernumber,
o.countrycode
FROM orders o
LEFT JOIN maxrevisions m
ON o.CountryCode = m.CountryCode
WHERE
o.DelivaryDateRevision < m.MaxRevision OR m.MaxRevision IS NULL;
Demo
Select
ordernumber,
countrycode,
deliverydateversion
from orders o
where deliverydateversion >
(
select max(revision)
from maxrevisiontab
where countrycode= o.countrycode
)
Please change the table names and column names as per your structure.
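
To see why the placement of the revision comparison matters, the LEFT JOIN query from the first answer can be run against the sample rows. This is a hedged sketch: it uses Python's sqlite3 module rather than Oracle, and order 999 with country 'DE' is a made-up row added only to exercise the "country code missing from maxrevisions" case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (OrderNumber INT, CountryCode TEXT, DelivaryDateRevision INT);
    -- 999/'DE' is a hypothetical row: its country has no entry in maxrevisions.
    INSERT INTO orders VALUES (123,'IN',9),(234,'US',3),(238,'IN',3),(999,'DE',5);
    CREATE TABLE maxrevisions (CountryCode TEXT, MaxRevision INT);
    INSERT INTO maxrevisions VALUES ('IN',6),('US',4);
""")

# Comparison inside ON: a left join keeps every order, matched or not.
in_on = conn.execute("""
    SELECT o.OrderNumber
    FROM orders o
    LEFT JOIN maxrevisions m
      ON o.CountryCode = m.CountryCode
     AND o.DelivaryDateRevision < m.MaxRevision
    ORDER BY o.OrderNumber
""").fetchall()

# Comparison in WHERE: only orders below the max revision, or with no max at all.
in_where = conn.execute("""
    SELECT o.OrderNumber, o.CountryCode
    FROM orders o
    LEFT JOIN maxrevisions m ON o.CountryCode = m.CountryCode
    WHERE o.DelivaryDateRevision < m.MaxRevision OR m.MaxRevision IS NULL
    ORDER BY o.OrderNumber
""").fetchall()

print(in_on)     # all four orders, including 123
print(in_where)  # 123 is filtered out; 999 is kept because its country is missing
```

With the condition in the ON clause, order 123 (revision 9, max 6) still appears because an outer join never drops rows from the left table. Moving the comparison into WHERE is what actually filters it.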

Left Join with Multiple Conditions and MAX Value

I'm trying to execute a left join where multiple conditions must be met, including pulling in only the row with the MAX sequence number that meets those conditions.
The left join is on the unique identifier in both tables. Table acaps_history has several rows for each app_id. I need to pull in only one row: the one with the highest seq_number and an activity_code of 'XU'. If the code 'XU' doesn't exist for the given app_id, then the CASE expression below should return 'N' for that row. The code I have currently isn't working and returns the error "a column may not be outer-joined to a subquery":
create table orig_play3 as
(select
x.*,
case when xa.activity_code in 'XU' then 'Y' else 'N' end as cpo_flag
from
dfs_tab_orig_play_x x
left join cf.acaps_history xa on
x.APP_ID = xa.FOC_APPL_ID
and xa.activity_code in 'XU'
and xa.seq_number = (select max(seq_number) from cf.acaps_history where FOC_APPL_ID=x.app_id)
)
Given your error, it seems that the issue is the last part of your query:
and xa.seq_number = (select max(seq_number) from cf.acaps_history where FOC_APPL_ID=x.app_id)
This is still operating in the context of the ON clause, so the sub-query to find the max sequence number is the issue.
You should be able to avoid this by moving that sub-query out of the ON clause:
LEFT JOIN (
SELECT FOC_APPL_ID, activity_code, seq_number
FROM cf.acaps_history
WHERE activity_code in 'XU'
) xa
ON x.APP_ID = xa.FOC_APPL_ID
WHERE xa.seq_number = (select max(ah.seq_number) from cf.acaps_history ah where ah.FOC_APPL_ID=x.app_id and ah.activity_code in 'XU')
This may be the most inefficient way to execute this query, but it worked... It took like 3 minutes to run (table size is over 600K rows), but again, it returned the results I needed:
create table test as (
select x.*,
case when xb.activity_code in 'XU' then 'Y' else 'N' end as cpo_flag
from dfs_tab_orig_play_x x
left join
(select
xa.FOC_APPL_ID, xa.activity_code, xa.seq_number
from dfs_tab_orig_play_x x, cf.acaps_history xa
where x.app_id = xa.FOC_APPL_ID (+)
and xa.seq_number = (select max(seq_number) from cf.acaps_history where
x.app_id=FOC_APPL_ID(+) and activity_code in 'XU')) xb
on x.app_id = xb.FOC_APPL_ID (+)
)
If you are on 12c, I like OUTER APPLY for this sort of thing, because it lets you sort the rows for each app_id descending by seq_number and then just pick the highest one.
SELECT
x.*,
CASE
WHEN xa.activity_code IN 'XU' THEN 'Y'
ELSE 'N'
END
AS cpo_flag
FROM
dfs_tab_orig_play_x x
OUTER APPLY ( SELECT *
FROM cf.acaps_history xa
WHERE xa.foc_appl_id = x.app_id
AND xa.activity_code = 'XU'
ORDER BY xa.seq_number DESC
FETCH FIRST 1 ROW ONLY ) xa
Note: this logic is a little different from what you posted. In this version, it will join to the acaps_history row having the highest seq_number from among 'XU' records for the given app_id. Your version was joining to the row having the highest seq_number for the given app_id, whether that row was an 'XU' row or not. I am assuming (with little reason) that that was a bug on your part. But, if it wasn't, my version won't work as given.
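
For databases without OUTER APPLY, the same "highest seq_number among 'XU' rows" logic can be sketched with a pre-aggregated derived table. The example below is an illustration only: it runs on SQLite through Python's sqlite3 module, drops the cf. schema prefix, reduces dfs_tab_orig_play_x to a single app_id column, and uses invented sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified stand-ins for the real tables (assumption: only these columns matter).
    CREATE TABLE dfs_tab_orig_play_x (app_id INT);
    INSERT INTO dfs_tab_orig_play_x VALUES (1),(2),(3);
    CREATE TABLE acaps_history (foc_appl_id INT, activity_code TEXT, seq_number INT);
    INSERT INTO acaps_history VALUES
        (1,'XU',1),(1,'AB',2),(1,'XU',3),   -- app 1: two XU rows, highest seq is 3
        (2,'AB',1);                         -- app 2: history but no XU; app 3: no history
""")

# Aggregate the XU rows first, then outer-join the one-row-per-app result.
flags = conn.execute("""
    SELECT x.app_id,
           xa.max_seq,
           CASE WHEN xa.foc_appl_id IS NOT NULL THEN 'Y' ELSE 'N' END AS cpo_flag
    FROM dfs_tab_orig_play_x x
    LEFT JOIN (
        SELECT foc_appl_id, MAX(seq_number) AS max_seq
        FROM acaps_history
        WHERE activity_code = 'XU'
        GROUP BY foc_appl_id
    ) xa ON xa.foc_appl_id = x.app_id
    ORDER BY x.app_id
""").fetchall()

print(flags)
```

Because the derived table has at most one row per foc_appl_id, the outer join cannot multiply rows from dfs_tab_orig_play_x, and no subquery is needed inside the ON clause.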

Insert Statement Returns ORA-01427 Error While Trying To Insert From Multiple Tables

I have this table F_Flight which I am trying to insert into from 3 different tables. The first, fourth and fifth columns come from the same table, and the second and third columns come from different tables. When I execute the code, I get a "single-row subquery returns more than one row" error.
insert when 1 = 1 then into F_Flight (planeid, groupid, dateid, flightduration, kmsflown) values
(planeid, (select b.groupid from BridgeTable b where exists (select p.p1id from pilotkeylookup p where b.pilotid = p.p1id)),
(select dd.id from D_Date dd where exists (select p.launchtime from PilotKeyLookup p where dd."Date" = p.launchtime)),
flightduration, kmsflown) select * from PilotKeyLookup p;
Your subqueries get multiple rows back, which is what the error message says. There is no correlation between the various bits of data and subqueries you're trying to insert into a single row.
This can be done as a much simpler insert...select with joins, something like:
insert into f_flight (planeid, groupid, dateid, flightduration, kmsflown)
select pkl.planeid, bt.groupid, dd.id, pkl.flightduration, pkl.kmsflown
from pilotkeylookup pkl
join bridgetable bt on bt.pilotid = pkl.p1id
join d_date dd on dd."Date" = pkl.launchtime;
This joins the main PilotKeyLookup table to the other two on the keys you used in your subqueries.
Storing an ID value instead of an actual date is unusual, and if launchtime has a time component - which seems likely from the name - and your d_date entries are just dates (i.e. all with time at midnight) then you won't find matches; you might need to do:
join d_date dd on dd."Date" = trunc(pkl.launchtime);
It also seems like this could be a view, as you're storing duplicate data - everything in f_flight could, obviously, be found from the other tables.
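
The join-based insert, including the date truncation, can be tried on a small data set. This is a sketch under assumptions: SQLite (via Python's sqlite3) stands in for Oracle, date() plays the role of trunc(), launch times are stored as ISO text, and all rows and column types are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PilotKeyLookup (planeid INT, p1id INT, launchtime TEXT,
                                 flightduration INT, kmsflown REAL);
    INSERT INTO PilotKeyLookup VALUES
        (10, 1, '2020-07-16 09:30:00', 90, 120.5),
        (11, 2, '2020-07-17 14:00:00', 45, 60.0);
    CREATE TABLE BridgeTable (pilotid INT, groupid INT);
    INSERT INTO BridgeTable VALUES (1, 100), (2, 200);
    CREATE TABLE D_Date (id INT, "Date" TEXT);
    INSERT INTO D_Date VALUES (1, '2020-07-16'), (2, '2020-07-17');
    CREATE TABLE F_Flight (planeid INT, groupid INT, dateid INT,
                           flightduration INT, kmsflown REAL);
""")

# One INSERT ... SELECT with joins replaces the uncorrelated scalar subqueries.
conn.execute("""
    INSERT INTO F_Flight (planeid, groupid, dateid, flightduration, kmsflown)
    SELECT pkl.planeid, bt.groupid, dd.id, pkl.flightduration, pkl.kmsflown
    FROM PilotKeyLookup pkl
    JOIN BridgeTable bt ON bt.pilotid = pkl.p1id
    JOIN D_Date dd ON dd."Date" = date(pkl.launchtime)  -- strip the time part
""")

flights = conn.execute("SELECT * FROM F_Flight ORDER BY planeid").fetchall()
print(flights)
```

Note that if a pilot appears in BridgeTable under several groups, the join produces one F_Flight row per group, so it is worth checking whether that is the intended grain.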

Copy records from one table to another with pl-sql

I want to copy records from one table to another.
Only the records from table 1 that do not yet exist in table 2 should be copied.
If duplicate records exist in table 1, only the record with the longest name should be copied to table 2.
I have already implemented a query that almost does what I want.
The problem occurs when several names share the same maximum length.
In these cases, my query returns more than one record, and I want to insert just one new record into table 2.
Does anyone know how I can fix this?
Here is my code:
For x in (Select distinct xdd.id_t, xdd.name_t
From table1 xdd
Where xdd.id_t not in (Select distinct det.id_t2
From table2 det)
And LENGTH(xdd.name_t) in (Select Max(LENGTH(xdd2.name_t))
From table1 xdd2
Where xdd2.id_t = xdd.id_t)
) Loop
Insert into table2 (id_t2, name_t2)
Values (x.id_t, x.name_t);
End loop;
Can you give me an example to solve this?
Sure. If I understood the requirements correctly, the merge statement will look similar to the one below.
We use the row_number() analytic function to choose, for each id_t, the duplicate record with the longest name_t:
merge into table_two t2
using(
select id_t
, name_t
from (select id_t
, name_t
, row_number() over(partition by id_t
order by length(name_t) desc) as rn
from table_one) q
where q.rn = 1
) t1
on (t2.id_t = t1.id_t)
when not matched then
insert(id_t, name_t)
values(t1.id_t, t1.name_t)
SQLFiddle demo
This is a merge statement that should "upsert" data from table 1 into table 2. Matching keys are updated only when the name in table1 is longer than the name in table2, and inserts occur when keys from table1 have no match in table2.
MERGE INTO table2 D
USING (SELECT table1.id_t, table1.name_t FROM table1) S
ON (D.id_t2 = S.id_t)
WHEN MATCHED THEN UPDATE SET D.name_t2 = S.name_t
WHERE (LENGTH(S.name_t) > LENGTH(D.name_t2))
WHEN NOT MATCHED THEN INSERT (D.id_t2, D.name_t2)
VALUES (S.id_t, S.name_t);
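
The original problem was ties on the maximum name length, so whichever statement is used needs a deterministic tie-breaker. The sketch below shows one way to do that. It is not Oracle code: it runs on SQLite through Python's sqlite3 module, which has no MERGE, so it swaps in INSERT ... SELECT with NOT EXISTS, and it breaks ties alphabetically (an arbitrary choice made for the example). The sample names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id_t INT, name_t TEXT);
    INSERT INTO table1 VALUES
        (1,'Ann'),(1,'Anna'),(1,'Anne'),   -- 'Anna' and 'Anne' tie on length 4
        (2,'Bob'),
        (3,'Cy');
    CREATE TABLE table2 (id_t2 INT, name_t2 TEXT);
    INSERT INTO table2 VALUES (2,'Robert'); -- id 2 already exists, must be skipped
""")

# For each new id, pick exactly one name: longest first, then alphabetical.
conn.execute("""
    INSERT INTO table2 (id_t2, name_t2)
    SELECT s.id_t,
           (SELECT x.name_t
            FROM table1 x
            WHERE x.id_t = s.id_t
            ORDER BY LENGTH(x.name_t) DESC, x.name_t ASC
            LIMIT 1)
    FROM (SELECT DISTINCT id_t FROM table1) s
    WHERE NOT EXISTS (SELECT 1 FROM table2 d WHERE d.id_t2 = s.id_t)
""")

result = conn.execute("SELECT id_t2, name_t2 FROM table2 ORDER BY id_t2").fetchall()
print(result)
```

In the row_number() version shown earlier, the equivalent change would be adding a second sort key to the ORDER BY inside the OVER clause, so that rn = 1 is assigned deterministically when lengths are equal.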
