Replace self-join with analytic functions - Oracle

How do I go about replacing the following self join using analytics:
SELECT
t1.col1 col1,
t1.col2 col2,
SUM((extract(hour FROM (t1.times_stamp - t2.times_stamp)) * 3600 + extract(minute FROM ( t1.times_stamp - t2.times_stamp)) * 60 + extract(second FROM ( t1.times_stamp - t2.times_stamp)) ) ) div,
COUNT(*) tot_count
FROM tab1 t1,
tab1 t2
WHERE t2.col1 = t1.col1
AND t2.col2 = t1.col2
AND t2.col3 = t1.sequence_num
AND t2.times_stamp < t1.times_stamp
AND t2.col4 = 3
AND t1.col4 = 4
AND t2.col5 NOT IN(103,123)
AND t1.col5 != 549
GROUP BY t1.col1, t1.col2

I'm pretty sure you won't be able to replace the self-join with analytics because you are using inter-row operations (t1.times_stamp - t2.times_stamp). Analytic functions can only access the values of the current row and the values of aggregate functions over a subset of rows (the windowing clause).
See this article from Tom Kyte and this paper for further analysis of the limitations of analytics.
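For reference, a windowed aggregate over the question's columns looks like the sketch below (column names borrowed from the question); the windowing clause only exposes aggregates over a subset of rows, not arbitrary expressions between two specific rows:
-- Count of earlier rows in the same (col1, col2) slice, ordered by times_stamp
SELECT col1, col2, times_stamp,
       COUNT(*) OVER (PARTITION BY col1, col2
                      ORDER BY times_stamp
                      ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS earlier_rows
FROM tab1;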

It almost looks like you could eliminate the self-join on t2 and replace
t1.times_stamp - t2.times_stamp
with something like
t1.times_stamp - lag(t1.times_stamp) over (partition by col1, col2 order by times_stamp)
The different filters on col4 and col5 applied to t1 and t2 are what prevent you from doing this.
Analytic functions are applied after the where / group by on the main query, so you'd need to have a single filter on t1 in order to use lag/lead to specify following or preceding rows in a sequence.
Also, you'd need to push the sum/group by to an outer query to aggregate after the analytic function:
select col1, col2, sum(timestamp_diff) from (
select col1, col2, times_stamp - lag(times_stamp) over(.....) as timestamp_diff
from tab1
where ....
) group by col1, col2
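If the filters could be made uniform, the whole rewrite might look something like the sketch below. This is only a sketch under that assumption (a single set of predicates on tab1, and times_stamp being a TIMESTAMP, as the EXTRACT calls in the question imply), not a drop-in replacement for the original query:
-- Hedged sketch: assumes one uniform filter on tab1, which is exactly what
-- the differing t1/t2 predicates in the original query prevent.
SELECT col1, col2,
       SUM(EXTRACT(HOUR FROM diff) * 3600
         + EXTRACT(MINUTE FROM diff) * 60
         + EXTRACT(SECOND FROM diff)) AS div,
       COUNT(*) AS tot_count
FROM (
  SELECT col1, col2,
         times_stamp - LAG(times_stamp)
           OVER (PARTITION BY col1, col2 ORDER BY times_stamp) AS diff
  FROM tab1
  WHERE col4 = 4        -- illustrative single filter only
    AND col5 != 549
)
WHERE diff IS NOT NULL  -- the first row of each partition has no previous timestamp
GROUP BY col1, col2;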

Related

Oracle - Joining a basic join query with a count/group by having query

I have two queries. One basic join query and one query that uses a count/having/group by. The query that uses count is using a table also used in the basic join, so I figured I could either add another join or some sort of sub-query.
What I want to do is add one or more columns from another table to query 2.
Query 1
SELECT t1.col1,
t2.col12
FROM Table1 t1
inner join Table2 t2
on t1.ID_NO = t2.ID_NO
Query 2
SELECT t2.col1||t2.col2, count(distinct t2.col3) Totals
FROM Table2 t2
GROUP BY t2.col1, t2.col2
HAVING count(distinct t2.col3) >= 15
The desired output is:
Name|Account|Totals
t1.col1|t2.col1 & t2.col2|count(distinct t2.col3)
You have not described what output you are expecting from your query but there are many routes you could take to join the tables:
Join to a sub-query:
SELECT t1.col1,
t2.col1 || t2.col2 AS col12,
t2.max_id_no,
t2.totals
FROM Table1 t1
INNER JOIN (
SELECT col1,
col2,
MAX(id_no) AS max_id_no,
COUNT(DISTINCT col3) AS Totals
FROM Table2
GROUP BY col1, col2
HAVING COUNT(DISTINCT col3) >=15
) t2
ON t1.id_no = t2.max_id_no
Or, join and then group:
SELECT t2.col1 || t2.col2 AS col12,
MAX(t1.col1) AS max_t1_col1,
COUNT(DISTINCT t2.col3) AS Totals
FROM Table1 t1
INNER JOIN Table2 t2
ON (t1.ID_NO = t2.ID_NO)
GROUP BY t2.col1, t2.col2
HAVING COUNT(DISTINCT t2.col3) >=15

Reduce overload on PL/SQL

I have a requirement to match a few attributes one by one. I'm looking to avoid multiple SELECT statements. Below is an example.
Table1
Col1|Price|Brand|size
-----------------------
A|10$|BRAND1|SIZE1
B|10$|BRAND1|SIZE1
C|30$|BRAND2|SIZE2
D|40$|BRAND2|SIZE4
Table2
Col1|Col2|Col3
--------------
B|XYZ|PQR
C|ZZZ|YYY
Table3
Col1|COL2|COL3|LIKECOL1|Price|brand|size
-----------------------------------------
B|XYZ|PQR|A|10$|BRAND1|SIZE1
C|ZZZ|YYY|D|NULL|BRAND2|NULL
In table3, I need to insert data from table2 by checking the conditions below:
Find a match for the table2 record where Brand, Size and Price all match
If no match is found, try just Brand and Size
If still no match is found, try Brand only
In the above example, the first record in table2 found a match on all three attributes and so was inserted into table3; for the second record, record 'D' matches, but only on 'Brand'.
All I can think of is writing 3 different insert statements like below into an oracle pl/sql block.
insert into table3
select from tab2
where all 3 attributes are matching;
insert into table3
select from tab2
where brand and price are matching
and not exists in table3 (not exists is to avoid
inserting the same record which was already
inserted with all 3 attributes matched);
insert into table3
select from tab2
where Brand is matching and not exists in table3;
Can anyone please suggest a better way to achieve this, avoiding selecting from table2 multiple times?
This is a case for OUTER APPLY.
OUTER APPLY is a type of lateral join that allows you to join on dynamic views that refer to tables appearing earlier in your FROM clause. With that ability, you can define a dynamic view that finds all the matches, sorts them by the pecking order you've specified, and then use FETCH FIRST 1 ROW ONLY to include only the first one in the results.
Using OUTER APPLY means that if there is no match, you will still get the table2 record, just with all the match columns null. If you don't want that, you can change OUTER APPLY to CROSS APPLY.
Here is a working example (with step by step comments), shamelessly stealing the table creation scripts from Michael Piankov's answer:
create table Table1 (Col1,Price,Brand,size1)
as select 'A','10','BRAND1','SIZE1' from dual union all
select 'B','10','BRAND1','SIZE1' from dual union all
select 'C','30','BRAND2','SIZE2' from dual union all
select 'D','40','BRAND2','SIZE4' from dual;
create table Table2(Col1,Col2,Col3)
as select 'B','XYZ','PQR' from dual union all
select 'C','ZZZ','YYY' from dual;
-- INSERT INTO table3
SELECT t2.col1, t2.col2, t2.col3,
t1.col1 likecol1,
decode(t1.price,t1_template.price,t1_template.price, null) price,
decode(t1.brand,t1_template.brand,t1_template.brand, null) brand,
decode(t1.size1,t1_template.size1,t1_template.size1, null) size1
FROM
-- Start with table2
table2 t2
-- Get the row from table1 matching on col1... this is our search template
inner join table1 t1_template on
t1_template.col1 = t2.col1
-- Get the best match from table1 for our search
-- template, excluding the search template itself
outer apply (
SELECT * FROM table1 t1
WHERE 1=1
-- Exclude search template itself
and t1.col1 != t2.col1
-- All matches include BRAND
and t1.brand = t1_template.brand
-- order by match strength based on price and size
order by case when t1.price = t1_template.price and t1.size1 = t1_template.size1 THEN 1
when t1.size1 = t1_template.size1 THEN 2
else 3 END
-- Only get the best match for each row in T2
FETCH FIRST 1 ROW ONLY) t1;
Unfortunately it is not clear what you mean by a match. What do you expect if there is more than one match?
Should it be only the first match, or should it generate all available pairs?
Regarding your question about how to avoid multiple inserts, there is more than one way:
You could use a multi-table insert (INSERT FIRST) with conditions; a rough sketch follows this list.
You could join table1 to itself to get all pairs and filter the results in the WHERE condition.
You could use analytic functions.
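As a rough sketch of the first option (this is not from the original answer): a multi-table INSERT FIRST could look something like the following, assuming table3 has columns (col1, col2, col3, likecol1, price, brand, size1) mirroring the sample scripts (no CREATE TABLE for table3 is shown), and using MIN() scalar subqueries as one simple way to pick a candidate at each match level:
INSERT FIRST
  -- full match on brand, size and price
  WHEN like3 IS NOT NULL THEN
    INTO table3 (col1, col2, col3, likecol1, price, brand, size1)
    VALUES (col1, col2, col3, like3, price, brand, size1)
  -- brand and size only
  WHEN like2 IS NOT NULL THEN
    INTO table3 (col1, col2, col3, likecol1, brand, size1)
    VALUES (col1, col2, col3, like2, brand, size1)
  -- brand only
  WHEN like1 IS NOT NULL THEN
    INTO table3 (col1, col2, col3, likecol1, brand)
    VALUES (col1, col2, col3, like1, brand)
  SELECT t2.col1, t2.col2, t2.col3,
         t1.price, t1.brand, t1.size1,
         (SELECT MIN(x.col1) FROM table1 x
           WHERE x.col1 <> t1.col1 AND x.brand = t1.brand
             AND x.size1 = t1.size1 AND x.price = t1.price) AS like3,
         (SELECT MIN(x.col1) FROM table1 x
           WHERE x.col1 <> t1.col1 AND x.brand = t1.brand
             AND x.size1 = t1.size1) AS like2,
         (SELECT MIN(x.col1) FROM table1 x
           WHERE x.col1 <> t1.col1 AND x.brand = t1.brand) AS like1
  FROM table2 t2
  JOIN table1 t1 ON t1.col1 = t2.col1;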
I suppose there are other ways as well. But why would you want to avoid 3 simple inserts? They are easy to read and maintain.
Here is an example with analytic functions:
create table Table1 (Col1,Price,Brand,size1)
as select 'A','10','BRAND1','SIZE1' from dual union all
select 'B','10','BRAND1','SIZE1' from dual union all
select 'C','30','BRAND2','SIZE2' from dual union all
select 'D','40','BRAND2','SIZE4' from dual;
create table Table2(Col1,Col2,Col3)
as select 'B','XYZ','PQR' from dual union all
select 'C','ZZZ','YYY' from dual;
with s as (
select Col1,Price,Brand,size1,
count(*) over(partition by Price,Brand,size1 ) as match3,
count(*) over(partition by Price,Brand ) as match2,
count(*) over(partition by Brand ) as match1,
lead(Col1) over(partition by Price,Brand,size1 order by Col1) as like3,
lead(Col1) over(partition by Price,Brand order by Col1) as like2,
lead(Col1) over(partition by Brand order by Col1) as like1,
lag(Col1) over(partition by Price,Brand,size1 order by Col1) as like_desc3,
lag(Col1) over(partition by Price,Brand order by Col1) as like_desc2,
lag(Col1) over(partition by Brand order by Col1) as like_desc1
from Table1 t )
select t.Col1, t.Col2, t.Col3, coalesce(s.like3, s.like_desc3, s.like2, s.like_desc2, s.like1, s.like_desc1) as like_col,
case when match3 > 1 then size1 end as size1,
case when match1 > 1 then Brand end as Brand,
case when match2 > 1 then Price end as Price
from table2 t
left join s on s.Col1 = t.Col1
COL1 COL2 COL3 LIKE_COL SIZE1 BRAND PRICE
B XYZ PQR A SIZE1 BRAND1 10
C ZZZ YYY D - BRAND2 -

Joining based on column priority

I want to join 2 tables based on column priority.
For example, suppose Table1 has six columns (Col1, Col2, Col3, Col4, Col5, Col6) and I want to join it to Table2 (Col1, Col2, Col3, Col4, Col5, Col7) to select Table2.Col7:
First check Col1, Col2 and Col3; if a match is found, no need to check further.
Second, check Col1 and Col2; if a match is found, no need to check further.
Third, check Col1; if a match is found, no need to check further.
Otherwise, ignore Col1, Col2 and Col3 entirely.
In every case the join must also satisfy Table1.Col4 = Table2.Col4 AND Table1.Col5 = Table2.Col5.
I may not be clear with my words; if anything is unclear, please shout.
You cannot tell SQL to try to join on a certain condition first and in case it finds no match to go on searching. What you can do is join all allowed combinations (matches on col4 and col5 in your case) and then rank your matches (such that a match on col1 and col2 and col3 is considered best etc.). Then only keep the best matches:
select col7
from
(
select
t2.col7,
row_number() over
(
partition by t1.col4, t1.col5
order by case
when t2.col1 = t1.col1 and t2.col2 = t1.col2 and t2.col3 = t1.col3 then 1
when t2.col1 = t1.col1 and t2.col2 = t1.col2 then 2
when t2.col1 = t1.col1 then 3
else 4
end
) as rn
from table1 t1
join table2 t2 on t2.col4 = t1.col4 and t2.col5 = t1.col5
)
where rn = 1;
Select t2.col7
from Table1 t1 inner join Table2 t2
on
case
when t1.col1 = t2.col1 then 1
when t1.col2 = t2.col2 then 1
when t1.col3 = t2.col3 then 1
when t1.Col4=t2.Col4
and t1.Col5=t2.Col5 then 1
else 0 end = 1
;

How to optimize this SELECT with sub query Oracle

Here is my query,
SELECT ID As Col1,
(
SELECT VID FROM TABLE2 t
WHERE (a.ID=t.ID or a.ID=t.ID2)
AND t.STARTDTE =
(
SELECT MAX(tt.STARTDTE)
FROM TABLE2 tt
WHERE (a.ID=tt.ID or a.ID=tt.ID2) AND tt.STARTDTE < SYSDATE
)
) As Col2
FROM TABLE1 a
Table1 has 48850 records and Table2 has 15944098 records.
I have separate indexes on TABLE2: on ID, on (ID, STARTDTE), on STARTDTE, and on (ID, ID2, STARTDTE).
The query is still too slow. How can this be improved? Please help.
I'm guessing that the OR in the inner queries is interfering with the optimizer's ability to use indexes. Also, I wouldn't recommend a solution that would scan all of TABLE2, given its size.
This is why, in this case, I would suggest using a function that will efficiently retrieve the information you are looking for (2 index scans per call):
CREATE OR REPLACE FUNCTION getvid(p_id table1.id%TYPE)
RETURN table2.vid%TYPE IS
l_result table2.vid%TYPE;
BEGIN
SELECT vid
INTO l_result
FROM (SELECT vid, startdte
FROM (SELECT vid, startdte
FROM table2 t
WHERE t.id = p_id
AND t.startdte < SYSDATE
ORDER BY t.startdte DESC)
WHERE rownum = 1
UNION ALL
SELECT vid, startdte
FROM (SELECT vid, startdte
FROM table2 t
WHERE t.id2 = p_id
AND t.startdte < SYSDATE
ORDER BY t.startdte DESC)
WHERE rownum = 1
ORDER BY startdte DESC)
WHERE rownum = 1;
RETURN l_result;
END;
Your SQL would become:
SELECT ID As Col1,
getvid(a.id) vid
FROM TABLE1 a
Make sure you have indexes on both table2(id, startdte DESC) and table2(id2, startdte DESC). The order of the index is very important.
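For example (the index names here are made up; adjust to your own naming conventions):
-- Composite indexes supporting the id/id2 lookups with descending startdte
CREATE INDEX table2_id_startdte_ix ON table2 (id, startdte DESC);
CREATE INDEX table2_id2_startdte_ix ON table2 (id2, startdte DESC);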
Possibly try the following, though untested.
WITH max_times AS
(SELECT a.ID, MAX(t.STARTDTE) AS Startdte
FROM TABLE1 a, TABLE2 t
WHERE (a.ID=t.ID OR a.ID=t.ID2)
AND t.STARTDTE < SYSDATE
GROUP BY a.ID)
SELECT b.ID As Col1, tt.VID
FROM TABLE1 b
LEFT OUTER JOIN max_times mt
ON (b.ID = mt.ID)
LEFT OUTER JOIN TABLE2 tt
ON ((mt.ID=tt.ID OR mt.ID=tt.ID2)
AND mt.startdte = tt.startdte)
You can look at analytic functions to avoid having to hit the second table twice. Something like this might work:
SELECT id AS col1, vid
FROM (
SELECT t1.id, t2.vid, RANK() OVER (PARTITION BY t1.id ORDER BY
CASE WHEN t2.startdte < TRUNC(SYSDATE) THEN t2.startdte ELSE null END
DESC NULLS LAST) AS rn
FROM table1 t1
JOIN table2 t2 ON t1.id IN (t2.id, t2.id2)
)
WHERE rn = 1;
The inner select gets the id and vid values from the two tables with a simple join on id or id2. The rank function calculates a ranking for each matching row in the second table based on the startdte. It's complicated a bit by you wanting to filter on that date, so I've used a case to effectively ignore any dates today or later by changing the evaluated value to null, and in this instance that means the order by in the over clause needs nulls last so they're ignored.
I'd suggest you run the inner select on its own first - maybe with just a couple of id values for brevity - to see what it's doing and what ranks are being allocated.
The outer query is then just picking the top-ranked result for each id.
You may still get duplicates though; if table2 has more than one row for an id with the same startdte they'll get the same rank, but then you may have had that situation before. You may need to add more fields to the order by to break ties in a way that makes sense to you.
But this is largely speculation without being able to see where your existing query is actually slow.
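As a first step, it can help to look at the actual execution plan, for example with EXPLAIN PLAN and DBMS_XPLAN (a minimal sketch reusing the query from the question):
EXPLAIN PLAN FOR
SELECT ID As Col1,
       (SELECT VID FROM TABLE2 t
         WHERE (a.ID = t.ID OR a.ID = t.ID2)
           AND t.STARTDTE = (SELECT MAX(tt.STARTDTE)
                               FROM TABLE2 tt
                              WHERE (a.ID = tt.ID OR a.ID = tt.ID2)
                                AND tt.STARTDTE < SYSDATE)) As Col2
  FROM TABLE1 a;

-- Display the plan that was just explained
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);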

Oracle Query Subselect with order by

I am normally using MS SQL and am a total rookie with oracle.
I get an Oracle driver error when I use an ORDER BY clause in my subquery.
Example (my real statement is much more complex but I doubt it matters to my problem - I can post it if needed):
SELECT col1
, col2
, (SELECT colsub FROM subtbl WHERE idsub = tbl.id AND ROWNUM=1 ORDER BY coldate) col3
FROM tbl
If I use such a construct I get an ODBC driver error: ORA-00907: missing right parenthesis (I translated the message from German, so the wording may differ slightly :)).
If I remove the ORDER BY coldate everything works fine. I couldn't find any reason why, so what am I doing wrong?
It doesn't make any sense to write the ROWNUM and the ORDER BY this way since the ORDER BY is evaluated after the WHERE clause, meaning that it has no effect in this case. An example is given in this question.
This also gets a little more complicated because it is hard to join a sub-query back to the main query if it is nested too deeply.
The query below won't necessarily work because you can't join between tbl and subtbl in this way.
SELECT
col1,
col2,
(
select colsub
from (
SELECT colsub
FROM subtbl
WHERE idsub = tbl.id
order by coldate
)
where rownum = 1
) as col3
FROM tbl
So you'll need to use some sort of analytic function as shown in the example below:
SELECT
col1,
col2,
(SELECT max(colsub) keep (dense_rank first order by coldate) as colsub
FROM subtbl
WHERE idsub = tbl.id
group by idsub
) col3
FROM tbl
The FIRST analytic function is more complicated than it needs to be but it will get the job done.
Another option would be to use the ROW_NUMBER analytic function which would also solve the problem.
SELECT
col1,
col2,
(select colsub
from (
SELECT
idsub,
row_number() over (partition by idsub order by coldate) as rown,
colsub
FROM subtbl a
) a
WHERE a.idsub = tbl.id
and a.rown = 1
) col3
FROM tbl
What you are doing wrong is clear: you are using an ORDER BY in a sub-query. It does not make sense to use an ORDER BY in a sub-query, so why would you want to do that?
Also, you are using an ORDER BY on a sub-query that always returns one row. That also does not make sense.
If you want the query result to be sorted, use an ORDER BY at the highest level.
try:
select
col1,
col2,
colsub
from (
select
t.col1,
t.col2,
st.colsub,
st.coldate,
max(st.coldate) over (partition by st.idsub) max_coldate
from
tbl t,
subtbl st
where
st.idsub = t.id)
where
coldate = max_coldate
