Oracle query optimisation

I have written the query below, which has many AND operators. I would like to know how to optimize its performance. Can I remove some of the AND operators?
SELECT I.date,
K.somcolumn,
L.somcolumn,
D.somcolumn
FROM Table1 I,
Table2 K,
Table3 L,
Table4 D
WHERE I._ID = K._ID
AND K.ID = L._ID
AND L._ID = I._ID
AND I._CODE = L._CODE
AND K.ID = D._ID(+)
AND L._ID IN ( SELECT _id
FROM I
WHERE UPPER (someflag) = 'TRUE'
GROUP BY _id
HAVING COUNT (*) > 1)
AND L._ID IN ( SELECT _id
FROM I
WHERE UPPER (code) = 'OPEN'
GROUP BY _id
HAVING COUNT (*) > 1)
ORDER BY I._ID, I._CODE;

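You can't combine any of the conditions as far as I can tell, but you can make the query easier to read by using standard JOIN syntax: the join conditions move into the ON clauses, and the WHERE clause keeps only the two filters. On its own this usually won't change the execution plan, but it gives a cleaner basis for tuning: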
SELECT I.date,
K.somcolumn,
L.somcolumn,
D.somcolumn
FROM Table1 I
INNER JOIN Table2 K ON I._ID = K._ID
INNER JOIN Table3 L ON K.ID = L._ID AND L._ID = I._ID AND I._CODE = L._CODE
LEFT JOIN Table4 D ON K.ID = D._ID
WHERE L._ID IN ( SELECT _id
FROM I
WHERE UPPER (someflag) = 'TRUE'
GROUP BY _id
HAVING COUNT (*) > 1)
AND L._ID IN ( SELECT _id
FROM I
WHERE UPPER (code) = 'OPEN'
GROUP BY _id
HAVING COUNT (*) > 1)
ORDER BY I._ID, I._CODE;
With that as a basis, you may get an optimization boost if you join to the subquery results instead of filtering with IN subqueries (these subqueries are not correlated, so the optimizer may already unnest them into joins on its own). No guarantees, but something like this may help:
SELECT I.date,
K.somcolumn,
L.somcolumn,
D.somcolumn
FROM Table1 I
INNER JOIN Table2 K ON I._ID = K._ID
INNER JOIN Table3 L ON K.ID = L._ID AND L._ID = I._ID AND I._CODE = L._CODE
LEFT JOIN Table4 D ON K.ID = D._ID
INNER JOIN (
SELECT _id
FROM I
WHERE UPPER (someflag) = 'TRUE'
GROUP BY _id
HAVING COUNT (*) > 1
) someflagtrue ON L._ID = someflagtrue._id
INNER JOIN (
SELECT _id
FROM I
WHERE UPPER (code) = 'OPEN'
GROUP BY _id
HAVING COUNT (*) > 1
) codeopen ON L._ID = codeopen._id
ORDER BY I._ID, I._CODE;

You can replace the two subqueries with one.
Old subqueries:
SELECT _id
FROM I
WHERE UPPER (someflag) = 'TRUE'
GROUP BY _id
HAVING COUNT (*) > 1
SELECT _id
FROM I
WHERE UPPER (code) = 'OPEN'
GROUP BY _id
HAVING COUNT (*) > 1
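New subquery (COUNT ignores NULLs, so the CASE expressions have no ELSE branch, and > 1 keeps the original "more than one row" conditions):
SELECT _ID
FROM I
GROUP BY _ID
HAVING COUNT(CASE WHEN UPPER(SOMEFLAG) = 'TRUE' THEN 1 END) > 1
AND COUNT(CASE WHEN UPPER(CODE) = 'OPEN' THEN 1 END) > 1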
In most cases this should be at least a little faster, because the table is read once instead of twice, which may reduce the number of full table scans and joins. But it's difficult to tell whether it will be faster on your system, since the optimizer has so many possible choices to make.
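The single-pass HAVING relies on COUNT skipping NULLs. A quick way to check that it selects the same ids as the two separate subqueries, and to see what goes wrong if the CASE expressions get an ELSE 0 branch, is the sketch below. It uses SQLite through Python's sqlite3 module instead of Oracle, and the t1 table and its rows are hypothetical.

```python
import sqlite3

# Hypothetical toy data standing in for Table1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (id INTEGER, someflag TEXT, code TEXT);
INSERT INTO t1 VALUES
  (1, 'true', 'OPEN'), (1, 'TRUE', 'open'),
  (2, 'TRUE', 'OPEN'),
  (3, 'TRUE', 'closed'), (3, 'true', 'closed');
""")

def ids(sql):
    """Run a query and return the sorted list of ids it produces."""
    return sorted(row[0] for row in conn.execute(sql))

# Original logic: ids that pass both grouped subqueries.
two_passes = ids("""
SELECT id FROM t1 WHERE UPPER(someflag) = 'TRUE' GROUP BY id HAVING COUNT(*) > 1
INTERSECT
SELECT id FROM t1 WHERE UPPER(code) = 'OPEN' GROUP BY id HAVING COUNT(*) > 1
""")

# Single pass: a CASE with no ELSE yields NULL for non-matching rows, which COUNT skips.
one_pass = ids("""
SELECT id FROM t1 GROUP BY id
HAVING COUNT(CASE WHEN UPPER(someflag) = 'TRUE' THEN 1 END) > 1
   AND COUNT(CASE WHEN UPPER(code) = 'OPEN' THEN 1 END) > 1
""")

# Pitfall: with ELSE 0 every row is counted, so every id passes.
else_zero = ids("""
SELECT id FROM t1 GROUP BY id
HAVING COUNT(CASE WHEN UPPER(someflag) = 'TRUE' THEN 1 ELSE 0 END) > 0
   AND COUNT(CASE WHEN UPPER(code) = 'OPEN' THEN 1 ELSE 0 END) > 0
""")

print(two_passes, one_pass, else_zero)  # [1] [1] [1, 2, 3]
```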
After cleaning up the query, the next step in performance tuning is to generate an explain plan. Run explain plan for select ...; and then run select * from table(dbms_xplan.display);. That shows how the query is executed, which may give you a hint about what is slow and what can be improved. Add the full output of the explain plan to your question if you need more help. It may also help to add the number of rows in the relevant tables, how many rows are returned, and which indexes exist.

Related

Order by in sub query using oracle

I tried to get a value from a subquery after ordering its records, but the following error occurred when executing the query:
ORA-00907: missing right parenthesis
The query is:
select S.value , nvl((select D.value from D
join T on D.subID = t.SubID
where D.subid2 = s.subid2 and t.subid3 = s.subid3 and rownum = 1 order by t.id),0 ) value
from S
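Oracle doesn't let you put an ORDER BY clause in this kind of subquery; that is what raises ORA-00907. Note also that ROWNUM = 1 is evaluated before ORDER BY in the same query block, so that combination would pick an arbitrary row first and only then sort it.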
See if something like this helps: use a CTE (it looks somewhat nicer; it could be a normal subquery if you prefer) which numbers the matching D/T rows separately for each row of S, ordered by t.id. PARTITION BY s.ROWID restarts the numbering for every S row, and the LEFT JOIN keeps S rows without a match, just as NVL(..., 0) did. In the outer (main) query, select the row whose rn = 1, which does what you intended with ORDER BY t.id plus rownum = 1.
WITH
temp
AS
(SELECT s.VALUE s_value,
dt.VALUE d_value,
ROW_NUMBER () OVER (PARTITION BY s.ROWID ORDER BY dt.id) rn
FROM s
LEFT JOIN (SELECT d.VALUE, d.subid2, t.subid3, t.id
FROM d
JOIN t ON d.subid = t.subid) dt
ON s.subid2 = dt.subid2
AND s.subid3 = dt.subid3)
SELECT s_value, NVL (d_value, 0) d_value
FROM temp
WHERE rn = 1
If you are on Oracle 12c or higher, you can use the FETCH FIRST... clause.
SELECT S.VALUE,
NVL (( SELECT D.VALUE
FROM D JOIN T ON D.subID = t.SubID
WHERE D.subid2 = s.subid2 AND t.subid3 = s.subid3
ORDER BY t.id
FETCH FIRST 1 ROWS ONLY),
0) VALUE
FROM S
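Both shapes can be tried side by side on toy data. The sketch below runs them in SQLite via Python's sqlite3 module, so a few substitutions apply: COALESCE replaces NVL, LIMIT 1 replaces FETCH FIRST 1 ROWS ONLY, SQLite's rowid replaces Oracle's ROWID, and the CTE is named ranked. The S, D and T rows are hypothetical; S row 300 deliberately has no match.

```python
import sqlite3

# Hypothetical toy versions of the S, D and T tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE s (value INTEGER, subid2 INTEGER, subid3 INTEGER);
CREATE TABLE d (value INTEGER, subid INTEGER, subid2 INTEGER);
CREATE TABLE t (id INTEGER, subid INTEGER, subid3 INTEGER);
INSERT INTO s VALUES (100, 1, 10), (200, 2, 20), (300, 9, 99);
INSERT INTO d VALUES (11, 1, 1), (22, 2, 1), (33, 3, 2);
INSERT INTO t VALUES (5, 1, 10), (3, 2, 10), (7, 3, 20);
""")

# ROW_NUMBER restarted for each S row; rn = 1 is the match with the lowest t.id.
row_number_version = conn.execute("""
WITH ranked AS (
  SELECT s.value AS s_value, dt.value AS d_value,
         ROW_NUMBER() OVER (PARTITION BY s.rowid ORDER BY dt.id) AS rn
  FROM s
  LEFT JOIN (SELECT d.value, d.subid2, t.subid3, t.id
             FROM d JOIN t ON d.subid = t.subid) dt
    ON s.subid2 = dt.subid2 AND s.subid3 = dt.subid3
)
SELECT s_value, COALESCE(d_value, 0) FROM ranked WHERE rn = 1 ORDER BY s_value
""").fetchall()

# Correlated scalar subquery; ORDER BY ... LIMIT 1 plays the role of FETCH FIRST.
scalar_version = conn.execute("""
SELECT s.value,
       COALESCE((SELECT d.value FROM d JOIN t ON d.subid = t.subid
                 WHERE d.subid2 = s.subid2 AND t.subid3 = s.subid3
                 ORDER BY t.id LIMIT 1), 0)
FROM s ORDER BY s.value
""").fetchall()

print(row_number_version)  # [(100, 22), (200, 33), (300, 0)]
print(scalar_version)      # [(100, 22), (200, 33), (300, 0)]
```

S row 100 has two matches (t.id 5 and 3); both versions pick the one with t.id 3.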

Adding filters in subquery from CTE quadruples run time

I am working on an existing query for an SSRS report that focuses on aggregated financial aid data split into 10 aggregations. The user wants to be able to select the students included in that aggregated data based on new vs. returning status and 'selected for verification'. For the new/returning status, I added a CTE that returns the earliest admit year for each student. 2 of the 10 data fields are created by a subquery. I have been trying for 3 days to get the subquery to use the CTE fields as a filter, but it won't work: either the filter is ignored or I get a 'not a GROUP BY expression' error. If I put the join to the CTE inside the subquery, the query time jumps from 45 seconds to 400 seconds. This shouldn't be that complicated! What am I missing? I have added some of the code below; 3 of the chunks work, but paid_something doesn't.
with stuStatus as
(select
person_uid, min(year_admitted) admit_year
from academic_study
where aid_year between :AidYearStartParameter and :AidYearEndParameter
group by person_uid)
--- above code added to get student information not originally in qry
select
finaid_applicant_status.aid_year
, count(1) as fafsa_cnt --works
, sum( --works
case
when (
package_complete_date is not null
and admit.status is not null
)
then 1
else 0
end
) as admit_and_package
, (select count(*) --doesn't work
from (
select distinct award_by_aid_year.person_uid
from
award_by_aid_year
where
award_by_aid_year.aid_year = finaid_applicant_status.aid_year
and award_by_aid_year.total_paid_amount > 0 )dta
where
(
(:StudentStatusParameter = 'N' and stuStatus.admit_year = finaid_applicant_status.aid_year)
OR
(:StudentStatusParameter = 'R' and stuStatus.admit_year <> finaid_applicant_status.aid_year)
OR :StudentStatusParameter = '%'
)
)
as paid_something
, sum( --works
case
when exists (
select
1
from
award_by_person abp
where
abp.person_uid = fafsa.person_uid
and abp.aid_year = fafsa.aid_year
and abp.award_paid_amount > 0
) and fafsa.requirement is not null
then 1
else 0
end
) as paid_something_fafsa
from
finaid_applicant_status
join finaid_tracking_requirement fafsa
on finaid_applicant_status.person_uid = fafsa.person_uid
and finaid_applicant_status.aid_year = fafsa.aid_year
and fafsa.requirement = 'FAFSA'
left join finaid_tracking_requirement admit
on finaid_applicant_status.person_uid = admit.person_uid
and finaid_applicant_status.aid_year = admit.aid_year
and admit.requirement = 'ADMIT'
and admit.status in ('M', 'P')
left outer join stuStatus
on finaid_applicant_status.person_uid = stuStatus.person_uid
where
finaid_applicant_status.aid_year between :AidYearStartParameter and :AidYearEndParameter
and (
(:VerifiedParameter = '%') OR
(:VerifiedParameter <> '%' AND finaid_applicant_status.verification_required_ind = :VerifiedParameter)
)
and
(
(:StudentStatusParameter = 'N' and (stuStatus.admit_year IS NULL OR stuStatus.admit_year = finaid_applicant_status.aid_year ))
OR
(:StudentStatusParameter = 'R' and stuStatus.admit_year <> finaid_applicant_status.aid_year)
OR :StudentStatusParameter = '%'
)
group by
finaid_applicant_status.aid_year
order by
finaid_applicant_status.aid_year
Not sure if this helps, but you have something like this:
select aid_year, count(1) c1,
(select count(1)
from (select distinct person_uid
from award_by_aid_year a
where a.aid_year = fas.aid_year))
from finaid_applicant_status fas
group by aid_year;
This query throws ORA-00904: FAS.AID_YEAR: invalid identifier. That is because fas.aid_year is nested too deep: it is referenced inside an inline view, which is itself inside the scalar subquery.
If you can change your subquery from select count(1) from (select distinct sth from ... where year = fas.year) to select count(distinct sth) from ... where year = fas.year, then it has a chance to work.
select aid_year, count(1) c1,
(select count(distinct person_uid)
from award_by_aid_year a
where a.aid_year = fas.aid_year) c2
from finaid_applicant_status fas
group by aid_year
The two queries above are a simplified demo of the non-working and working forms. Of course your query is much more complicated, but this is something you could check.
Also, maybe you can use dbfiddle or sqlfiddle to set up a test case? Or show us some sample (anonymized) data and the required output for it?
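SQLite resolves correlated references differently from Oracle, so it will not reproduce the ORA-00904; the sketch below (Python's sqlite3, hypothetical rows) only checks that the rewritten count(distinct ...) shape returns one distinct-person count per aid year next to the grouped count. Person 1 has two awards in 2020 but is counted once.

```python
import sqlite3

# Hypothetical toy rows for the two tables in the simplified demo.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE finaid_applicant_status (person_uid INTEGER, aid_year INTEGER);
CREATE TABLE award_by_aid_year (person_uid INTEGER, aid_year INTEGER);
INSERT INTO finaid_applicant_status VALUES (1, 2020), (2, 2020), (3, 2021);
INSERT INTO award_by_aid_year VALUES (1, 2020), (1, 2020), (2, 2020), (4, 2021);
""")

# The correlated reference fas.aid_year is only one level deep here.
rows = conn.execute("""
SELECT aid_year, COUNT(1) AS c1,
       (SELECT COUNT(DISTINCT person_uid)
          FROM award_by_aid_year a
         WHERE a.aid_year = fas.aid_year) AS c2
FROM finaid_applicant_status fas
GROUP BY aid_year
ORDER BY aid_year
""").fetchall()

print(rows)  # [(2020, 2, 2), (2021, 1, 1)]
```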

Re-writing a join query

I have a question concerning Hive. Let me explain to you the scenario :
I am using a Hive action on Oozie. I have a query which does successive LEFT JOINs on different tables.
The total number of rows to be inserted is about 35 million.
At first, the job was crashing due to lack of memory, so I set "set hive.auto.convert.join=false". The query then executed successfully, but it took 4 hours.
I tried reordering the LEFT JOINs to put the large tables at the end, but got the same result: about 4 hours.
Here is what the query looks like:
INSERT OVERWRITE TABLE final_table
SELECT
T1.Id,
T1.some_field_name,
T1.another_filed_name,
T2.also_another_filed_name,
FROM table1 T1
LEFT JOIN table2 T2 ON ( T2.Id = T1.Id ) -- T2 is the smallest table
LEFT JOIN table3 T3 ON ( T3.Id = T1.Id )
LEFT JOIN table4 T4 ON ( T4.Id = T1.Id ) -- T4 is the biggest table
So, given the structure of the query, is there a way to rewrite it to avoid so many JOINs?
Thanks in advance
PS: Even vectorization gave me the same timing
Too long for a comment, will be deleted later.
(1) Your current query won't compile.
(2) You are not selecting anything from T3 and T4, which makes no sense.
(3) Changing the order of the tables is not likely to have any impact with a cost-based optimizer.
(4) Basically, I would suggest collecting statistics on the tables, specifically on the id columns, but in your case I have a feeling that id is not unique in more than one table, so each join multiplies the rows for that id.
Add to your post the result of the following query:
select *
, case when cnt_1 = 0 then 1 else cnt_1 end
* case when cnt_2 = 0 then 1 else cnt_2 end
* case when cnt_3 = 0 then 1 else cnt_3 end
* case when cnt_4 = 0 then 1 else cnt_4 end as product
from (select id
,count(case when tab = 1 then 1 end) as cnt_1
,count(case when tab = 2 then 1 end) as cnt_2
,count(case when tab = 3 then 1 end) as cnt_3
,count(case when tab = 4 then 1 end) as cnt_4
from ( select 1 as tab,id from table1
union all select 2 as tab,id from table2
union all select 3 as tab,id from table3
union all select 4 as tab,id from table4
) t
group by id
having greatest (cnt_1,cnt_2,cnt_3,cnt_4) >= 10
) t
order by product desc
limit 10
;
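The product column predicts how many rows each id produces once the LEFT JOINs multiply duplicate keys (a table with no match still contributes one NULL-extended row, hence 0 counted as 1). The sketch below checks that prediction on hypothetical toy tables in SQLite via Python's sqlite3, rather than Hive: 3 rows in table1 turn into 18 joined rows.

```python
import sqlite3

# Hypothetical toy tables in which id is not unique.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER);
CREATE TABLE table2 (id INTEGER);
CREATE TABLE table3 (id INTEGER);
CREATE TABLE table4 (id INTEGER);
INSERT INTO table1 VALUES (1), (1), (2);
INSERT INTO table2 VALUES (1), (1), (1);
INSERT INTO table3 VALUES (2), (2);
INSERT INTO table4 VALUES (1), (1), (2), (2), (2);
""")

# Rows per id actually produced by the chain of LEFT JOINs.
joined = dict(conn.execute("""
SELECT t1.id, COUNT(*)
FROM table1 t1
LEFT JOIN table2 t2 ON t2.id = t1.id
LEFT JOIN table3 t3 ON t3.id = t1.id
LEFT JOIN table4 t4 ON t4.id = t1.id
GROUP BY t1.id
""").fetchall())

# Rows per id predicted by the product of per-table counts (0 counted as 1).
predicted = dict(conn.execute("""
SELECT id,
       (CASE WHEN cnt_1 = 0 THEN 1 ELSE cnt_1 END)
     * (CASE WHEN cnt_2 = 0 THEN 1 ELSE cnt_2 END)
     * (CASE WHEN cnt_3 = 0 THEN 1 ELSE cnt_3 END)
     * (CASE WHEN cnt_4 = 0 THEN 1 ELSE cnt_4 END) AS product
FROM (SELECT id,
             COUNT(CASE WHEN tab = 1 THEN 1 END) AS cnt_1,
             COUNT(CASE WHEN tab = 2 THEN 1 END) AS cnt_2,
             COUNT(CASE WHEN tab = 3 THEN 1 END) AS cnt_3,
             COUNT(CASE WHEN tab = 4 THEN 1 END) AS cnt_4
      FROM (SELECT 1 AS tab, id FROM table1
            UNION ALL SELECT 2, id FROM table2
            UNION ALL SELECT 3, id FROM table3
            UNION ALL SELECT 4, id FROM table4)
      GROUP BY id)
""").fetchall())

print(joined)     # {1: 12, 2: 6}
print(predicted)  # {1: 12, 2: 6}
```

The prediction only matches the join for ids that exist in table1, since the LEFT JOINs start from it.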

data not retrieved in the same order

I have a query which returns list of SIMs. Each SIM is linked to a Customer. The SIMs are in T_SIM table and Customers are in T_CUSTOMER table. There can be more than one SIM linked to a single Customer. When returning the SIMs it returns the Customer details also.
The T_SIM table has a foreign key to the T_CUSTOMER table.
The issue is:
1. First, run the query requesting the top 100 records, ordered by CUSTOMER_CODE ascending.
2. Then run the same query requesting the top 1000 records, ordered by CUSTOMER_CODE ascending.
In the 1000 records from point #2, the first 100 records are not the same as the result of point #1. The records are shuffled; the order is not consistent.
To resolve this, I added ROWID to the ORDER BY along with CUSTOMER_CODE.
But the client did not accept this solution.
Could you please suggest an alternative? The data type of CUSTOMER_CODE is VARCHAR2.
Below is the query:
SELECT TT.SIM_ID,
TT.IMSI,
TT.MSISDN,
TT.SECONDARY_MSISDN,
TT.CUSTOMER_ID,
TT.SIM_STATE,
TCU.CUSTOMER_CODE
FROM T_SIM TT
LEFT OUTER JOIN T_CUSTOMER TCU
ON TT.CUSTOMER_ID = TCU.CUSTOMER_ID
WHERE 1 = 1
AND TT.SIM_ID IN
(SELECT SIM_ID
FROM
(SELECT *
FROM
(SELECT Z.*,
ROWNUM RNUM
FROM
(SELECT TT.SIM_ID
FROM T_SIM TT
LEFT OUTER JOIN T_CUSTOMER TCU
ON TT.CUSTOMER_ID = TCU.CUSTOMER_ID
WHERE 1 =1
ORDER BY TCU.CUSTOMER_CODE ASC
) Z
WHERE ROWNUM <= 1000
)
WHERE RNUM >= 0
)
)
ORDER BY TCU.CUSTOMER_CODE ASC
The result in both cases is ordered by CUSTOMER_CODE, but the SIMs belonging to each customer do not come back in the same order.
The problem is that you first limit the number of rows selected from t_sim (so which rows you get is effectively arbitrary), and only then order the output.
So you should remove ROWNUM <= 1000 from the inner query and
put it at the very top level, like this:
select * from
( SELECT TT.SIM_ID,
TT.IMSI,
TT.MSISDN,
TT.SECONDARY_MSISDN,
TT.CUSTOMER_ID,
TT.SIM_STATE,
TCU.CUSTOMER_CODE
FROM T_SIM TT
LEFT OUTER JOIN T_CUSTOMER TCU
ON TT.CUSTOMER_ID = TCU.CUSTOMER_ID
WHERE 1 = 1
AND TT.SIM_ID IN
(SELECT SIM_ID
FROM
(SELECT *
FROM
(SELECT Z.*,
ROWNUM RNUM
FROM
(SELECT TT.SIM_ID
FROM T_SIM TT
LEFT OUTER JOIN T_CUSTOMER TCU
ON TT.CUSTOMER_ID = TCU.CUSTOMER_ID
WHERE 1 =1
ORDER BY TCU.CUSTOMER_CODE ASC
) Z
)
WHERE RNUM >= 0
)
)
ORDER BY TCU.CUSTOMER_CODE ASC
) where rownum <= 1000
Because you first want to build the complete ordered result set, and only then display the top 1000 SIM cards ordered by customer_code.
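Separately from where the ROWNUM filter goes, CUSTOMER_CODE alone does not define a unique order, because several SIMs can belong to one customer; rows that tie on it may come back in any order, and a 100-row run and a 1000-row run can break those ties differently. One common remedy is a unique last ORDER BY term, here SIM_ID (assumed to be T_SIM's key). The sketch below uses SQLite through Python's sqlite3, with LIMIT in place of ROWNUM and hypothetical data; with a total order, the top 2 rows are always a prefix of the top 5.

```python
import sqlite3

# Hypothetical toy data: several SIMs share one customer, so CUSTOMER_CODE has ties.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t_customer (customer_id INTEGER PRIMARY KEY, customer_code TEXT);
CREATE TABLE t_sim (sim_id INTEGER PRIMARY KEY, customer_id INTEGER);
INSERT INTO t_customer VALUES (1, 'C001'), (2, 'C002'), (3, 'C003');
INSERT INTO t_sim VALUES (10, 2), (11, 1), (12, 1), (13, 3), (14, 2), (15, 1);
""")

def top_n(n):
    """First n SIM ids, ordered by customer code with SIM_ID as a unique tiebreaker."""
    rows = conn.execute("""
        SELECT tt.sim_id
        FROM t_sim tt
        LEFT OUTER JOIN t_customer tcu ON tt.customer_id = tcu.customer_id
        ORDER BY tcu.customer_code ASC, tt.sim_id ASC
        LIMIT ?
    """, (n,)).fetchall()
    return [r[0] for r in rows]

print(top_n(2))  # [11, 12]
print(top_n(5))  # [11, 12, 15, 10, 14]
```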

Case based Joins in oracle query

I have the query below, which returns all the data from the inner query plus one more field, ctry, from the tbl_account table. I used a
LEFT JOIN so that even when a_row_id has no matching row_id in tbl_account, I still get all the data from the inner query.
select b.* ,sac.ctry as country from
(SELECT DISTINCT rec.* ,asset.a_row_id
FROM tbl_record rec LEFT JOIN tbl_asset asset
ON asset.from_sn = rec.to_sn
AND asset.from_name = rec.to_productname
AND asset.from_rel = rec.to_rel
LEFT JOIN tbl_country mas
ON rec.loc = mas.a_loc
WHERE rec.cust_id = 2456 ) b
LEFT JOIN tbl_account sac on b.a_row_id = sac.row_id ;
But now I need to implement a case-based join in the query above: when a_row_id is not null, an inner join with one extra condition should be used, and when a_row_id is null, a left join should be used.
I tried using a CASE expression as below, but it is not working, and the cost of the query is very high, which I suppose is because of the CASE expression. The
data in all the tables runs into millions of rows.
select b.* ,sac.ctry as country from
(SELECT DISTINCT rec.* ,asset.a_row_id
FROM tbl_record rec LEFT JOIN tbl_asset asset
ON asset.from_sn = rec.to_sn
AND asset.from_name = rec.to_productname
AND asset.from_rel = rec.to_rel
LEFT JOIN tbl_country mas
ON rec.loc = mas.a_loc
WHERE rec.cust_id = 2456 ) b
INNER JOIN tbl_account sac
ON CASE
WHEN b.a_row_id IS NOT NULL AND b.a_row_id = sac.row_id and b.from_cn = SAC.to_cn THEN 1
END = 1
LEFT JOIN tbl_account sac
ON CASE
WHEN b.a_row_id IS NULL AND b.a_row_id = sac.row_id THEN 1
END = 1 ;
Is there any other way to implement a case-based join condition in the Oracle query above while keeping the query cost lower? Is it possible to use DECODE in this case? Any help would be greatly appreciated.
This syntax is probably what you are looking for (b stands for your inline view, i.e. the parenthesized subquery aliased b):
select b.*, sac.ctry as country
from b
left join tbl_account sac
on b.a_row_id = sac.row_id or (b.a_row_id is null and sac.row_id is null)
where b.from_cn = sac.to_cn or b.a_row_id is null
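To see which rows this keeps, here is a sketch that runs the same join in SQLite via Python's sqlite3. The table b stands in for the inline view, and its key column k and all rows are hypothetical. Rows with a non-null a_row_id survive only when a matching account with the same cn exists (inner-join behaviour), while rows with a null a_row_id are kept with a NULL country (left-join behaviour).

```python
import sqlite3

# Hypothetical toy data; table b stands in for the inline view b from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE b (k INTEGER, a_row_id INTEGER, from_cn TEXT);
CREATE TABLE tbl_account (row_id INTEGER, to_cn TEXT, ctry TEXT);
INSERT INTO b VALUES
  (1, 100, 'X'),    -- matching account, same cn: kept with country
  (2, 200, 'Y'),    -- matching account, other cn: dropped
  (3, NULL, 'Y'),   -- no a_row_id: kept, country NULL
  (4, 300, 'X');    -- a_row_id with no account row: dropped
INSERT INTO tbl_account VALUES (100, 'X', 'US'), (200, 'Z', 'DE');
""")

rows = conn.execute("""
SELECT b.k, sac.ctry AS country
FROM b
LEFT JOIN tbl_account sac
  ON b.a_row_id = sac.row_id OR (b.a_row_id IS NULL AND sac.row_id IS NULL)
WHERE b.from_cn = sac.to_cn OR b.a_row_id IS NULL
ORDER BY b.k
""").fetchall()

print(rows)  # [(1, 'US'), (3, None)]
```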

Resources