How to optimize this SELECT with sub query Oracle - oracle

Here is my query,
SELECT ID As Col1,
(
SELECT VID FROM TABLE2 t
WHERE (a.ID=t.ID or a.ID=t.ID2)
AND t.STARTDTE =
(
SELECT MAX(tt.STARTDTE)
FROM TABLE2 tt
WHERE (a.ID=tt.ID or a.ID=tt.ID2) AND tt.STARTDTE < SYSDATE
)
) As Col2
FROM TABLE1 a
Table1 has 48850 records and Table2 has 15944098 records.
I have separate indexes in TABLE2 on ID,ID & STARTDTE, STARTDTE, ID, ID2 & STARTDTE.
The query is still too slow. How can this be improved? Please help.

I'm guessing that the OR in the inner queries is interfering with the optimizer's ability to use indexes. I also wouldn't recommend a solution that scans all of TABLE2, given its size.
This is why, in this case, I would suggest using a function that efficiently retrieves the information you are looking for (two index scans per call):
CREATE OR REPLACE FUNCTION getvid(p_id table1.id%TYPE)
RETURN table2.vid%TYPE IS
l_result table2.vid%TYPE;
BEGIN
SELECT vid
INTO l_result
FROM (SELECT vid, startdte
FROM (SELECT vid, startdte
FROM table2 t
WHERE t.id = p_id
AND t.startdte < SYSDATE
ORDER BY t.startdte DESC)
WHERE rownum = 1
UNION ALL
SELECT vid, startdte
FROM (SELECT vid, startdte
FROM table2 t
WHERE t.id2 = p_id
AND t.startdte < SYSDATE
ORDER BY t.startdte DESC)
WHERE rownum = 1
ORDER BY startdte DESC)
WHERE rownum = 1;
RETURN l_result;
END;
Your SQL would become:
SELECT ID As Col1,
getvid(a.id) vid
FROM TABLE1 a
Make sure you have indexes on both table2(id, startdte DESC) and table2(id2, startdte DESC). The order of the index is very important.
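The two indexes mentioned above could be created roughly as follows. This is only a sketch: the index names are made up for illustration, while the table and column names come from the question.

```sql
-- Hypothetical index names; table2(id, id2, startdte) is taken from the question.
-- Leading column = the equality predicate, second column = the ordered date,
-- so each branch of the function can read the newest matching row first.
CREATE INDEX table2_id_startdte_ix  ON table2 (id,  startdte DESC);
CREATE INDEX table2_id2_startdte_ix ON table2 (id2, startdte DESC);
```

With the columns in this order, each inner query in getvid can locate its rows by id (or id2) and read them already sorted by startdte, which is why the column order matters.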

Possibly try the following, though untested.
WITH max_times AS
(SELECT a.ID, MAX(t.STARTDTE) AS Startdte
FROM TABLE1 a, TABLE2 t
WHERE (a.ID=t.ID OR a.ID=t.ID2)
AND t.STARTDTE < SYSDATE
GROUP BY a.ID)
SELECT b.ID As Col1, tt.VID
FROM TABLE1 b
LEFT OUTER JOIN max_times mt
ON (b.ID = mt.ID)
LEFT OUTER JOIN TABLE2 tt
ON ((mt.ID=tt.ID OR mt.ID=tt.ID2)
AND mt.startdte = tt.startdte)

You can look at analytic functions to avoid having to hit the second table twice. Something like this might work:
SELECT id AS col1, vid
FROM (
SELECT t1.id, t2.vid, RANK() OVER (PARTITION BY t1.id ORDER BY
CASE WHEN t2.startdte < TRUNC(SYSDATE) THEN t2.startdte ELSE null END
DESC NULLS LAST) AS rn
FROM table1 t1
JOIN table2 t2 ON t1.id IN (t2.id, t2.id2)
)
WHERE rn = 1;
The inner select gets the id and vid values from the two tables with a simple join on id or id2. The rank function calculates a ranking for each matching row in the second table based on the startdte. It's complicated a bit by you wanting to filter on that date, so I've used a case to effectively ignore any dates today or later by changing the evaluated value to null, and in this instance that means the order by in the over clause needs nulls last so they're ignored.
I'd suggest you run the inner select on its own first, maybe with just a couple of id values for brevity, to see what it's doing and what ranks are being allocated.
The outer query is then just picking the top-ranked result for each id.
You may still get duplicates, though: if table2 has more than one row for an id with the same startdte, they'll get the same rank, but then you may have had that situation before. You may need to add more fields to the order by to break ties in a way that makes sense to you.
But this is largely speculation without being able to see where your existing query is actually slow.
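One way to see where the time goes is to look at the execution plan of the original query. The following is a minimal sketch using EXPLAIN PLAN and DBMS_XPLAN.DISPLAY; the query text is simply the one from the question, reformatted.

```sql
-- Store the optimizer's plan for the original query in the plan table.
EXPLAIN PLAN FOR
SELECT a.id AS col1,
       (SELECT t.vid
          FROM table2 t
         WHERE (a.id = t.id OR a.id = t.id2)
           AND t.startdte = (SELECT MAX(tt.startdte)
                               FROM table2 tt
                              WHERE (a.id = tt.id OR a.id = tt.id2)
                                AND tt.startdte < SYSDATE)) AS col2
  FROM table1 a;

-- Show the plan that was just stored.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Full scans of TABLE2 in the output would suggest that the OR conditions are preventing index use.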

Related

Oracle SQL -- Finding count of rows that match date maximum in table

I am trying to use a query to return the count from rows such that the date of the rows matches the maximum date for that column in the table.
Oracle SQL: version 11.2:
The following syntax would seem to be correct (to me), and it compiles and runs. However, instead of returning JUST the count for the maximum, it returns several counts, more or less as if the "HAVING" clause weren't there.
Select ourDate, Count(1) as OUR_COUNT
from schema1.table1
group by ourDate
HAVING ourDate = max(ourDate) ;
How can this be fixed, please?
You can use:
SELECT MAX(ourDate) AS ourDate,
COUNT(*) KEEP (DENSE_RANK LAST ORDER BY ourDate) AS ourCount
FROM schema1.table1
or:
SELECT ourDate,
COUNT(*) AS our_count
FROM (
SELECT ourDate,
RANK() OVER (ORDER BY ourDate DESC) AS rnk
FROM schema1.table1
)
WHERE rnk = 1
GROUP BY ourDate
Which, for the sample data:
CREATE TABLE table1 (ourDate) AS
SELECT SYSDATE FROM DUAL CONNECT BY LEVEL <= 5 UNION ALL
SELECT SYSDATE - 1 FROM DUAL;
Both output:
OURDATE|OUR_COUNT
-----------------
2022-06-28 13:35:01|5
I don't know if I understand what you want. Try this:
Select x.ourDate, Count(1) as OUR_COUNT
from schema1.table1 x
where x.ourDate = (select max(y.ourDate) from schema1.table1 y)
group by x.ourDate
One option is to use a subquery which fetches maximum date:
select ourdate, count(*)
from table1
where ourdate = (select max(ourdate)
from table1)
group by ourdate;
Or, a more modern approach (if your database version supports it; 11g doesn't, though):
select ourdate, count(*)
from table1
group by ourdate
order by ourdate desc
fetch first 1 rows only;
You can use this SQL query:
select MAX(ourDate),COUNT(1) as OUR_COUNT
from schema1.table1
where ourDate = (select MAX(ourDate) from schema1.table1)
group by ourDate;

How can I count the amount of values in different columns in oracle plsql

For example, I have a table with these values:
ID|Date|Col1|Col2|Col3|Col4
---------------------------
1|01/11/2021|A|A|B|
2|01/11/2021|B|B||
The A and B values are dynamic, they can be other characters as well.
Now I need somehow to get to the result that id 1 has 2 occurrences of A and one of B. Id 2 has 0 occurrences of A and 2 occurrences of B.
I'm using dynamic SQL to do this:
for v_record in table_cursor
loop
for i in 1 .. 4
loop
v_query := 'select col'||i||' from table where id = '||v_record.id;
execute immediate v_query into v_char;
if v_char = "any letter I'm checking" then
amount := amount + 1;
end if;
end loop;
-- do something with the amount
end loop;
But there has to be a better, much more efficient way to do this.
I don't have that much knowledge of PL/SQL and I really don't know how to phrase this question for Google. I've looked into PIVOT, but I don't think that will help me out in this case.
I'd appreciate it if someone could help me out.
Assuming the number of columns would be fixed at four, you could use a union aggregation approach here:
WITH cte AS (
SELECT ID, Col1 AS val FROM yourTable UNION ALL
SELECT ID, Col2 FROM yourTable UNION ALL
SELECT ID, Col3 FROM yourTable UNION ALL
SELECT ID, Col4 FROM yourTable
)
SELECT
t1.ID,
t2.val,
COUNT(c.ID) AS cnt
FROM (SELECT DISTINCT ID FROM yourTable) t1
CROSS JOIN (SELECT DISTINCT val FROM cte) t2
LEFT JOIN cte c
ON c.ID = t1.ID AND
c.val = t2.val
WHERE
t2.val IS NOT NULL
GROUP BY
t1.ID,
t2.val;
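For the sample data in the question, this returns one row per ID and value, in no guaranteed order: (1, A, 2), (1, B, 1), (2, A, 0), (2, B, 2).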
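As a variation on the same idea, Oracle 11g and later also offer UNPIVOT, which turns the four columns into rows in one pass over the table. This is only a sketch, assuming the same yourTable and that Col1 through Col4 share one datatype; src_col is a made-up name for the generated column. Unlike the cross-join version, it does not return zero-count rows, because UNPIVOT skips NULLs by default and only values that actually occur for an ID are grouped.

```sql
-- UNPIVOT turns col1..col4 into (src_col, val) rows; NULL cells are dropped by default.
SELECT id, val, COUNT(*) AS cnt
FROM yourTable
UNPIVOT (val FOR src_col IN (col1, col2, col3, col4))
GROUP BY id, val
ORDER BY id, val;
```

If the zero counts are needed, the cross join from the query above is still required.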

Reduce overload on pl/sql

I have a requirement to do matching of few attributes one by one. I'm looking to avoid multiple select statements. Below is the example.
Table1
Col1|Price|Brand|size
-----------------------
A|10$|BRAND1|SIZE1
B|10$|BRAND1|SIZE1
C|30$|BRAND2|SIZE2
D|40$|BRAND2|SIZE4
Table2
Col1|Col2|Col3
--------------
B|XYZ|PQR
C|ZZZ|YYY
Table3
Col1|COL2|COL3|LIKECOL1|Price|brand|size
-----------------------------------------
B|XYZ|PQR|A|10$|BRAND1|SIZE1
C|ZZZ|YYY|D|NULL|BRAND2|NULL
In table3, I need to insert data from table2 by checking the conditions below, in order:
Find a match for the record in table2 where Brand, Size and Price all match.
If no match is found, try Brand and Size only.
If there is still no match, try Brand only.
In the example above, the first record in table2 matched on all 3 attributes and was inserted into table3 accordingly; for the second record, record 'D' matches, but only on Brand.
All I can think of is writing 3 different insert statements like below into an oracle pl/sql block.
insert into table3
select from tab2
where all 3 attributes are matching;
insert into table3
select from tab2
where brand and price are matching
and not exists in table3 (not exists is to avoid
inserting the same record which was already
inserted with all 3 attributes matched);
insert into table3
select from tab2
where Brand is matching and not exists in table3;
Can anyone please suggest a better way to achieve this that avoids selecting from table2 multiple times?
This is a case for OUTER APPLY.
OUTER APPLY is a type of lateral join that allows you to join on dynamic views that refer to tables appearing earlier in your FROM clause. With that ability, you can define a dynamic view that finds all the matches, sorts them by the pecking order you've specified, and then uses FETCH FIRST 1 ROW ONLY to include only the first one in the results.
Using OUTER APPLY means that if there is no match, you will still get the table2 record, just with all the match columns null. If you don't want that, you can change OUTER APPLY to CROSS APPLY.
Here is a working example (with step by step comments), shamelessly stealing the table creation scripts from Michael Piankov's answer:
create table Table1 (Col1,Price,Brand,size1)
as select 'A','10','BRAND1','SIZE1' from dual union all
select 'B','10','BRAND1','SIZE1' from dual union all
select 'C','30','BRAND2','SIZE2' from dual union all
select 'D','40','BRAND2','SIZE4' from dual;
create table Table2(Col1,Col2,Col3)
as select 'B','XYZ','PQR' from dual union all
select 'C','ZZZ','YYY' from dual;
-- INSERT INTO table3
SELECT t2.col1, t2.col2, t2.col3,
t1.col1 likecol1,
decode(t1.price,t1_template.price,t1_template.price, null) price,
decode(t1.brand,t1_template.brand,t1_template.brand, null) brand,
decode(t1.size1,t1_template.size1,t1_template.size1, null) size1
FROM
-- Start with table2
table2 t2
-- Get the row from table1 matching on col1... this is our search template
inner join table1 t1_template on
t1_template.col1 = t2.col1
-- Get the best match from table1 for our search
-- template, excluding the search template itself
outer apply (
SELECT * FROM table1 t1
WHERE 1=1
-- Exclude search template itself
and t1.col1 != t2.col1
-- All matches include BRAND
and t1.brand = t1_template.brand
-- order by match strength based on price and size
order by case when t1.price = t1_template.price and t1.size1 = t1_template.size1 THEN 1
when t1.size1 = t1_template.size1 THEN 2
else 3 END
-- Only get the best match for each row in T2
FETCH FIRST 1 ROW ONLY) t1;
Unfortunately, it is not clear what you mean by a match. What do you expect if there is more than one match?
Should only the first match be used, or should all available pairs be generated?
Regarding your question of how to avoid multiple inserts, there is more than one way:
You could use a multitable insert with INSERT FIRST and conditions.
You could join table1 to itself, get all pairs, and filter the results in the WHERE clause.
You could use analytic functions.
I suppose there are other ways too. But why would you want to avoid 3 simple inserts? They are easy to read and maintain.
Here is an example with analytic functions:
create table Table1 (Col1,Price,Brand,size1)
as select 'A','10','BRAND1','SIZE1' from dual union all
select 'B','10','BRAND1','SIZE1' from dual union all
select 'C','30','BRAND2','SIZE2' from dual union all
select 'D','40','BRAND2','SIZE4' from dual;
create table Table2(Col1,Col2,Col3)
as select 'B','XYZ','PQR' from dual union all
select 'C','ZZZ','YYY' from dual;
with s as (
select Col1,Price,Brand,size1,
count(*) over(partition by Price,Brand,size1 ) as match3,
count(*) over(partition by Price,Brand ) as match2,
count(*) over(partition by Brand ) as match1,
lead(Col1) over(partition by Price,Brand,size1 order by Col1) as like3,
lead(Col1) over(partition by Price,Brand order by Col1) as like2,
lead(Col1) over(partition by Brand order by Col1) as like1,
lag(Col1) over(partition by Price,Brand,size1 order by Col1) as like_desc3,
lag(Col1) over(partition by Price,Brand order by Col1) as like_desc2,
lag(Col1) over(partition by Brand order by Col1) as like_desc1
from Table1 t )
select t.Col1,t.Col2,t.Col3, coalesce(s.like3, s.like_desc3, s.like2, s.like_desc2, s.like1, s.like_desc1) as like_col,
case when match3 > 1 then size1 end as size1,
case when match1 > 1 then Brand end as Brand,
case when match2 > 1 then Price end as Price
from table2 t
left join s on s.Col1 = t.Col1
COL1 COL2 COL3 LIKE_COL SIZE1 BRAND PRICE
B XYZ PQR A SIZE1 BRAND1 10
C ZZZ YYY D - BRAND2 -

Query taking long when i use user defined function with order by in oracle select

I have a function, which will get greatest of three dates from the table.
create or replace FUNCTION fn_max_date_val(
pi_user_id IN number)
RETURN DATE
IS
l_modified_dt DATE;
l_mod1_dt DATE;
l_mod2_dt DATE;
ret_user_id DATE;
BEGIN
SELECT MAX(last_modified_dt)
INTO l_modified_dt
FROM table1
WHERE id = pi_user_id;
-- this table contains a million records
SELECT nvl(MAX(last_modified_ts),sysdate-90)
INTO l_mod1_dt
FROM table2
WHERE table2_id=pi_user_id;
-- this table contains clob data, 800 000 records, the table 3 does not have user_id and has to fetched from table 2, as shown below
SELECT nvl(MAX(last_modified_dt),sysdate-90)
INTO l_mod2_dt
FROM table3
WHERE table2_id IN
(SELECT id FROM table2 WHERE table2_id=pi_user_id
);
execute immediate 'select greatest('''||l_modified_dt||''','''||l_mod1_dt||''','''||l_mod2_dt||''') from dual' into ret_user_id;
RETURN ret_user_id;
EXCEPTION
WHEN OTHERS THEN
return SYSDATE;
END;
this function works perfectly fine and executes within a second.
-- random user_id , just to test the functionality
SELECT fn_max_date_val(100) as max_date FROM DUAL
MAX_DATE
--------
27-02-14
For reference purpose i have used the table name as table1,table2 and table3 but my business case is similar to what i stated below.
I need to get the details of the table1 along with the highest modified date among the three tables.
I did something like this.
SELECT a.id,a.name,a.value,fn_max_date_val(id) as max_date
FROM table1 a where status_id ='Active';
The above query executes perfectly fine and returns results in milliseconds. But the problem came when I tried to use ORDER BY.
SELECT a.id,a.name,a.value,a.status_id,last_modified_dt,fn_max_date_val(id) as max_date
FROM table1 a where status_id ='Active'
order by status_id desc,last_modified_dt desc ;
-- It took almost 300 seconds to complete
I also tried adding an index on status_id and last_modified_dt, but no luck. Can this be done the right way?
How about if your query is like this?
select a.*, fn_max_date_val(id) as max_date
from
(SELECT a.id,a.name,a.value,a.status_id,last_modified_dt
FROM table1 a where status_id ='Active'
order by status_id desc,last_modified_dt desc) a;
What if you don't use the function and do something like this:
SELECT a.id,a.name,a.value,a.status_id,last_modified_dt, x.max_date
FROM table1 a
(
select max(max_date) as max_date
from (
SELECT MAX(last_modified_dt) as max_date
FROM table1 t1
WHERE t1.id = a.id
union
SELECT nvl(MAX(last_modified_ts),sysdate-90) as max_date
FROM table2 t2
WHERE t2.table2_id=a.id
...
) y
) x
where a.status_id ='Active'
order by status_id desc,last_modified_dt desc;
Syntax might contain errors, but something like that + the third table in the derived table too.

Improve performance of stored procedure where only select query is used

In our environment one procedure is taking long time to execute. I have checked the procedure, and below is the summary -
The procedure contains only select block (around 24). Before each select we are checking if data exists. If yes select the data, else do something else. For example :
-- Select block 1 --
IF EXISTS (SELECT 1 FROM table1 t1
INNER JOIN table2 t2
ON t1.col1=t2.col1
WHERE t1.col2='someValue' AND t2.col2='someValue'
)
BEGIN
SELECT t1.col1,t2.col2,t2.col3 FROM table1 t1
INNER JOIN table2 t2
ON t1.col1=t2.col1
WHERE t1.col2='someValue' AND t2.col2='someValue'
END
ELSE
BEGIN
SELECT 'DEFAULT1', 'DEFAULT2', 'DEFAULT3'
END
-- Select block 2 --
IF EXISTS (SELECT 1 FROM table1 t1
INNER JOIN table2 t2
ON t1.col1=t2.col1
WHERE t1.col5='someValue' AND t2.col5='someValue'
)
BEGIN
SELECT t1.col5,t2.col6,t2.col7 FROM table1 t1
INNER JOIN table2 t2
ON t1.col1=t2.col1
WHERE t1.col5='someValue' AND t2.col5='someValue'
END
ELSE
BEGIN
SELECT 'DEFAULT1', 'DEFAULT2', 'DEFAULT3'
END
I have come to the conclusion that if we can somehow combine the queries used within the IF EXISTS blocks into one query, and set some variables so that we can identify which WHERE condition returns true, that could reduce the execution time and improve performance.
Is my thought correct? Is there any option to do that? Can you suggest any other options?
We are using Microsoft SQL Server 2005.
[Edited : Added] - The select statements don't all return the same column types; they are different. And all select statements are required. If there are 24 IF blocks, the procedure should return 24 result sets.
[Added]
I would like to ask one more thing, which one of the below runs faster -
SELECT 1 FROM table1 t1
INNER JOIN table2 t2
ON t1.col1=t2.col1
WHERE t1.col2='someValue' AND t2.col2='someValue'
SELECT COUNT(1) FROM table1 t1
INNER JOIN table2 t2
ON t1.col1=t2.col1
WHERE t1.col2='someValue' AND t2.col2='someValue'
SELECT TOP 1 1 FROM table1 t1
INNER JOIN table2 t2
ON t1.col1=t2.col1
WHERE t1.col2='someValue' AND t2.col2='someValue'
Thanks.
Kartic
To improve the performance of the select queries, create indexes on the columns you are using in the WHERE clauses.
For example, you are using
WHERE t1.col2='someValue' AND t2.col2='someValue'
WHERE t1.col5='someValue' AND t2.col5='someValue'
so create database indexes on col2 and col5.
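As a rough sketch, the indexes could look like this in SQL Server. The index names are made up for illustration, and table1, table2, col2 and col5 are the placeholder names from the question.

```sql
-- Hypothetical index names; one index per filtered column on each table.
CREATE INDEX ix_table1_col2 ON table1 (col2);
CREATE INDEX ix_table2_col2 ON table2 (col2);
CREATE INDEX ix_table1_col5 ON table1 (col5);
CREATE INDEX ix_table2_col5 ON table2 (col5);
```

Whether these help depends on how selective the filtered values are, so it is worth comparing the execution plans before and after.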
Temp table
You can also use a temp table to store the result. Since you are running the same join 24 times, first store the result of the query below in the temp table (correct the syntax as required):
insert into temp_table (t1_col2, t2_col2, t1_col5, t2_col5)
SELECT t1.col2, t2.col2, t1.col5, t2.col5 FROM table1 t1
INNER JOIN table2 t2
ON t1.col1=t2.col1
Now use the temp table for the checks:
-- Select block 1 --
IF EXISTS (SELECT 1 FROM temp_table
WHERE t1_col2='someValue' AND t2_col2='someValue'
)
BEGIN
SELECT t1.col1,t2.col2,t2.col3 FROM table1 t1
INNER JOIN table2 t2
ON t1.col1=t2.col1
WHERE t1.col2='someValue' AND t2.col2='someValue'
END
-- Select block 2 --
IF EXISTS (SELECT 1 FROM temp_table
WHERE t1_col5='someValue' AND t2_col5='someValue'
)
BEGIN
SELECT t1.col5,t2.col6,t2.col7 FROM table1 t1
INNER JOIN table2 t2
ON t1.col1=t2.col1
WHERE t1.col5='someValue' AND t2.col5='someValue'
END
The current structure is not very efficient - you effectively have to execute each "if" statement (which will be expensive), and then repeat the same where clause (the expensive bit) if the "if" returns true. And you do this 24 times. Worst case (all the queries return data), you're doubling the time for the query.
You say you've checked for indexing - given that each query appears to be subtly different, it would be worth double checking this.
The obvious thing is to refactor the application to execute the 24 select statements, and deal with the fact that sometimes, they don't return any data. That's a fairly large refactoring, and I assume you've considered that...
If you can't do that, consider a less ambitious (though nastier) refactoring. Instead of checking whether data exists, and either returning it or an equivalent default result set, write it as a union:
SELECT t1.col1,t2.col2,t2.col3 FROM table1 t1
INNER JOIN table2 t2
ON t1.col1=t2.col1
WHERE t1.col2='someValue' AND t2.col2='someValue'
UNION
SELECT 'DEFAULT1', 'DEFAULT2', 'DEFAULT3'
This reduces the number of times you're hitting the where clause, but means your client application must filter out the "default" data.
To answer your final question, I'd run it through the query optimizer and look at the execution plan - but I'd imagine that the first version is fastest - the query can complete as soon as it finds the first record that matches the where criteria. The second version must find all records that match and count them; the final version must find all records and select the first one.
You could outer-join the results of a query to a row of default values, then fall back to the defaults when the query's results are empty:
SELECT
col1 = COALESCE(query.col1, defaults.col1),
col2 = COALESCE(query.col2, defaults.col2),
col3 = COALESCE(query.col3, defaults.col3)
FROM
(SELECT 'DEFAULT1', 'DEFAULT2', 'DEFAULT3') AS defaults (col1, col2, col3)
LEFT JOIN
(
SELECT t1.col1, t2.col2, t2.col3
FROM table1 t1
INNER JOIN table2 t2
ON t1.col1=t2.col1
WHERE t1.col2='someValue' AND t2.col2='someValue'
) query
ON 1=1 -- i.e. join all the rows unconditionally
;
The method may not suit you in exactly that form if the subquery may actually return NULLs and those must not be replaced with default values. In that case, have the subqueries return a flag column (just any value). If that column evaluates to NULL in the final query, that can only mean that the subquery hasn't returned rows. You can use that fact in a CASE expression like this:
SELECT
col1 = CASE WHEN query.HasRows IS NULL THEN defaults.col1 ELSE query.col1 END,
col2 = CASE WHEN query.HasRows IS NULL THEN defaults.col2 ELSE query.col2 END,
col3 = CASE WHEN query.HasRows IS NULL THEN defaults.col3 ELSE query.col3 END
FROM
(SELECT 'DEFAULT1', 'DEFAULT2', 'DEFAULT3') AS defaults (col1, col2, col3)
LEFT JOIN
(
SELECT HasRows = 1, t1.col1, t2.col2, t2.col3
FROM table1 t1
INNER JOIN table2 t2
ON t1.col1=t2.col1
WHERE t1.col2='someValue' AND t2.col2='someValue'
) query
ON 1=1
;

Resources