Where do I start to optimize this SQL Query? - performance

I have this SQL query, delivered by a customer, that we need to optimize for performance.
Can anyone point me in the right direction as to where to start looking when optimizing the following query?
The query takes about 6-7 seconds on my local machine, but about 30 seconds on the user's machine. It runs on MS SQL Server 2008 R2.
Thanks!
var query = @"
DECLARE @SearchString nvarchar(250)
set @SearchString = '1811820001'
;with BaseSelectorCTE (ID) as
(
SELECT ID FROM BaseCases b
where
b.CPR like @SearchString
OR (b.FirstName + ' ' + b.LastName) like @SearchString
OR b.CustomerInfo_InstitutionName like @SearchString
UNION
Select ID from FlexjobCase
where Kommune like @SearchString
UNION
Select ID from DisabledAssistantCase
where Kommune like @SearchString
UNION
Select ID from AdultStudentCase
where Kommune like @SearchString
UNION
Select ID from DiseaseCase
where Kommune like @SearchString
UNION
Select ID from MaternityCase
where Kommune like @SearchString
UNION
Select ID from MiscellaneousCase
where Kommune like @SearchString
UNION
Select ID from WageSubsidyCase
where Kommune like @SearchString
UNION
Select w.ID from WageSubsidyCase w inner join JobCenters j on
w.JobcenterID = j.ID
where
j.Name like @SearchString
UNION
Select a.ID from AdultStudentCase a inner join JobCenters j on
a.JobcenterID = j.ID
where
j.Name like @SearchString
)
--
-- Select BaseCases mapped to result type
--
,ResultSelectorCTE AS
(
select
bc.Id as CaseID,
bc.ChildCaseName,
bc.CPR,
bc.FirstName,
bc.LastName,
bc.CustomerInfo_CustomerInfoID as CustomerInfoID,
bc.CustomerInfo_InstitutionName as InstitutionName,
bc.CaseDeadline,
bc.StatusID,
cs.Name as [StatusName],
cs.Owner as [StatusOwner],
bc.MetaData_Updated as [LastChange],
bc.LastActionDay,
CASE bc.StatusID WHEN 9 THEN 1 ELSE 0 END as SidstePeriodeSoegt
from BaseCases bc
inner join CaseStatus cs ON
bc.StatusID = cs.ID
inner join BaseSelectorCTE bsCTE ON
bc.ID = bsCTE.ID
)
select * from (Select *, ROW_NUMBER() Over(Order By @@version , CASE WHEN StatusID = 9 then 2 ELSE 1 END, CaseDeadline ASC,
SidstePeriodeSoegt)
As rownum from ResultSelectorCTE where 1=1 AND StatusOwner <> 2 AND StatusOwner <> 3
AND SUBSTRING(CPR, 0, 3) BETWEEN 26-08-2014 AND 26-08-2015) As Result -- note: unquoted 26-08-2014 is evaluated as integer arithmetic (26 - 8 - 2014)
where rownum Between ((1 - 1) * 100 + 1) AND (1 * 100);
";

Yes, the query execution plan:
The SQL Server Database Engine can display how it navigates tables and uses indexes to access or process the data for a query or other DML statement, such as an update. This is a display of an execution plan. To analyze a slow-running query, it is useful to examine the query execution plan to determine what is causing the problem.
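As an illustration of what a plan can reveal, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server. The table and index names (BaseCases, ix_cpr, ix_name) are hypothetical and only loosely follow the question. It compares a predicate on an indexed column with a predicate on a concatenated expression, similar to (b.FirstName + ' ' + b.LastName) in the query above.

```python
import sqlite3

# Cut-down, hypothetical version of the BaseCases table, in memory.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE BaseCases (
        ID INTEGER PRIMARY KEY,
        CPR TEXT,
        FirstName TEXT,
        LastName TEXT
    );
    CREATE INDEX ix_cpr ON BaseCases (CPR);
    CREATE INDEX ix_name ON BaseCases (FirstName, LastName);
""")


def plan(sql, params=()):
    """Return the human-readable 'detail' text of each EXPLAIN QUERY PLAN row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


# Predicate directly on an indexed column: the planner can SEARCH ix_cpr.
by_cpr = plan("SELECT ID FROM BaseCases WHERE CPR = ?", ("1811820001",))

# Predicate on an expression built from two columns: no index seek is
# possible, so the planner falls back to a SCAN.
by_full_name = plan(
    "SELECT ID FROM BaseCases WHERE FirstName || ' ' || LastName = ?",
    ("Jane Doe",),
)

print(by_cpr)
print(by_full_name)
```

In SQL Server the equivalent check is to look at the actual execution plan in Management Studio, where you would be looking for index seeks versus scans on the tables in the UNION.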

Without knowing anything else, start by dropping the wildcard (*). It is almost always a bad idea, because you are just saying "send everything" without ever reviewing what you actually need.
Then, format your code properly. CTEs are great and simplify code, but that defeats the purpose if your SELECTs look like spaghetti. This has nothing to do with performance, though.
Also, I have often seen UNION ALL outperform UNION, because UNION has to remove duplicates; in those cases I hadn't really thought about whether duplicates were a problem or not, so you might want to look into that.
You didn't say whether you are running it from Management Studio, whether you are on a local or remote server, how the CTEs perform individually, etc. Context is king here.
Hope this helps.
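A minimal sketch of the UNION vs UNION ALL point, using Python's sqlite3 as a stand-in for SQL Server and a cut-down, hypothetical schema (equality is used instead of LIKE for simplicity). Because the question joins to the CTE (inner join BaseSelectorCTE), switching to UNION ALL there would repeat any case found in more than one table; filtering with IN (or EXISTS) instead of the join is one way to keep UNION ALL without duplicate rows.

```python
import sqlite3

# Hypothetical, cut-down schema: case 1 matches the search in two tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE BaseCases (ID INTEGER PRIMARY KEY, StatusID INTEGER);
    CREATE TABLE FlexjobCase (ID INTEGER, Kommune TEXT);
    CREATE TABLE DiseaseCase (ID INTEGER, Kommune TEXT);
    INSERT INTO BaseCases VALUES (1, 1), (2, 9), (3, 1);
    INSERT INTO FlexjobCase VALUES (1, 'Aarhus'), (2, 'Aarhus');
    INSERT INTO DiseaseCase VALUES (1, 'Aarhus'), (3, 'Odense');
""")

params = {"s": "Aarhus"}
selector = """
    SELECT ID FROM FlexjobCase WHERE Kommune = :s
    {op}
    SELECT ID FROM DiseaseCase WHERE Kommune = :s
"""

# UNION has to remove duplicates (extra sort/hash work); UNION ALL does not.
union_ids = conn.execute(selector.format(op="UNION"), params).fetchall()
union_all_ids = conn.execute(selector.format(op="UNION ALL"), params).fetchall()

# Joining to the UNION ALL result returns case 1 twice ...
joined = conn.execute(
    "SELECT bc.ID FROM BaseCases bc JOIN ("
    + selector.format(op="UNION ALL")
    + ") s ON bc.ID = s.ID",
    params,
).fetchall()

# ... while an IN (semi-join) filter returns each case once.
semi = conn.execute(
    "SELECT bc.ID FROM BaseCases bc WHERE bc.ID IN ("
    + selector.format(op="UNION ALL")
    + ") ORDER BY bc.ID",
    params,
).fetchall()

print(len(union_ids), len(union_all_ids), len(joined), semi)
```

Whether the IN version is actually faster on SQL Server 2008 R2 is something to confirm in the execution plan rather than assume.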

Related

Adding filters in subquery from CTE quadruples run time

I am working on an existing query for an SSRS report that focuses on aggregated financial aid data split out into 10 aggregations. The user wants to be able to select the students included in that aggregated data based on new vs. returning and 'selected for verification.' For the new/returning status, I added a CTE to return the earliest admit date for a student. 2 of the 10 data fields are created by a subquery. I have been trying for 3 days to get the subquery to use the CTE fields for a filter, but they won't work: either they're ignored or I get a 'not a GROUP BY expression' error. If I put the join to the CTE within the subquery, the query time jumps from 45 seconds to 400 seconds. This shouldn't be that complicated! What am I missing? I have added some of the code; 3 of the chunks work, but paid_something doesn't.
with stuStatus as
(select
person_uid, min(year_admitted) admit_year
from academic_study
where aid_year between :AidYearStartParameter and :AidYearEndParameter
group by person_uid)
--- above code added to get student information not originally in qry
select
finaid_applicant_status.aid_year
, count(1) as fafsa_cnt --works
, sum( --works
case
when (
package_complete_date is not null
and admit.status is not null
)
then 1
else 0
end
) as admit_and_package
, (select count(*) --doesn't work
from (
select distinct award_by_aid_year.person_uid
from
award_by_aid_year
where
award_by_aid_year.aid_year = finaid_applicant_status.aid_year
and award_by_aid_year.total_paid_amount > 0 )dta
where
(
(:StudentStatusParameter = 'N' and stuStatus.admit_year = finaid_applicant_status.aid_year)
OR
(:StudentStatusParameter = 'R' and stuStatus.admit_year <> finaid_applicant_status.aid_year)
OR :StudentStatusParameter = '%'
)
)
as paid_something
, sum( --works
case
when exists (
select
1
from
award_by_person abp
where
abp.person_uid = fafsa.person_uid
and abp.aid_year = fafsa.aid_year
and abp.award_paid_amount > 0
) and fafsa.requirement is not null
then 1
else 0
end
) as paid_something_fafsa
from
finaid_applicant_status
join finaid_tracking_requirement fafsa
on finaid_applicant_status.person_uid = fafsa.person_uid
and finaid_applicant_status.aid_year = fafsa.aid_year
and fafsa.requirement = 'FAFSA'
left join finaid_tracking_requirement admit
on finaid_applicant_status.person_uid = admit.person_uid
and finaid_applicant_status.aid_year = admit.aid_year
and admit.requirement = 'ADMIT'
and admit.status in ('M', 'P')
left outer join stuStatus
on finaid_applicant_status.person_uid = stuStatus.person_uid
where
finaid_applicant_status.aid_year between :AidYearStartParameter and :AidYearEndParameter
and (
(:VerifiedParameter = '%') OR
(:VerifiedParameter <> '%' AND finaid_applicant_status.verification_required_ind = :VerifiedParameter)
)
and
(
(:StudentStatusParameter = 'N' and (stuStatus.admit_year IS NULL OR stuStatus.admit_year = finaid_applicant_status.aid_year ))
OR
(:StudentStatusParameter = 'R' and stuStatus.admit_year <> finaid_applicant_status.aid_year)
OR :StudentStatusParameter = '%'
)
group by
finaid_applicant_status.aid_year
order by
finaid_applicant_status.aid_year
Not sure if this helps, but you have something like this:
select aid_year, count(1) c1,
(select count(1)
from (select distinct person_uid
from award_by_aid_year a
where a.aid_year = fas.aid_year))
from finaid_applicant_status fas
group by aid_year;
This query throws ORA-00904: FAS.AID_YEAR: invalid identifier. That is because fas.aid_year is referenced too deep inside the nested subquery.
If you are able to change your subquery from select count(1) from (select distinct sth from ... where year = fas.year) to select count(distinct sth) from ... where year = fas.year, then it has a chance to work.
select aid_year, count(1) c1,
(select count(distinct person_uid)
from award_by_aid_year a
where a.aid_year = fas.aid_year) c2
from finaid_applicant_status fas
group by aid_year
Here is a simplified demo showing the non-working and working queries. Of course, your query is much more complicated, but this is something you could check.
Also, maybe you could use dbfiddle or sqlfiddle to set up a test case? Or show us some sample (anonymized) data and the required output for it?
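A runnable sketch of the rewritten shape, with Python's sqlite3 standing in for Oracle and a few invented rows. It adds the total_paid_amount > 0 filter from the question's paid_something subquery. SQLite does not reproduce the ORA-00904 error, so this only checks that the one-level correlated COUNT(DISTINCT ...) returns the expected per-year counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE finaid_applicant_status (person_uid INTEGER, aid_year TEXT);
    CREATE TABLE award_by_aid_year (person_uid INTEGER, aid_year TEXT, total_paid_amount REAL);
    INSERT INTO finaid_applicant_status VALUES
        (1, '2021'), (2, '2021'), (3, '2021'), (1, '2022');
    INSERT INTO award_by_aid_year VALUES
        (1, '2021', 500), (1, '2021', 250),  -- two paid awards, same person
        (2, '2021', 0),                      -- nothing paid
        (1, '2022', 100);
""")

# The correlated reference (fas.aid_year) sits directly inside the scalar
# subquery, one level deep, and COUNT(DISTINCT ...) replaces the nested
# SELECT DISTINCT ... inline view.
rows = conn.execute("""
    SELECT fas.aid_year,
           COUNT(*) AS fafsa_cnt,
           (SELECT COUNT(DISTINCT a.person_uid)
              FROM award_by_aid_year a
             WHERE a.aid_year = fas.aid_year
               AND a.total_paid_amount > 0) AS paid_something
      FROM finaid_applicant_status fas
     GROUP BY fas.aid_year
     ORDER BY fas.aid_year
""").fetchall()

print(rows)
```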

How to pass parameter from query to subquery ? ORA-00904

I get ORA-00904 invalid identifier error, from this query
SELECT
tab1."col1" AS ID,
tab1."col4" AS Name,
tab1."col5" AS Place,
(SELECT SUBSTR (SYS_CONNECT_BY_PATH (one_row , ';'), 2) myConString
FROM (SELECT tab2."col3" || ',' || tab2."col4" AS one_row,
ROW_NUMBER () OVER(ORDER BY tab2."col1") rn,
COUNT (*) OVER () cnt
FROM dbo."table2" tab2
WHERE tab2."col1" = tab1."col1"
AND tab2."col2" = tab1."col2")
WHERE rn = cnt
START WITH rn = 1
CONNECT BY rn = PRIOR rn + 1)
FROM dbo."table1" tab1
WHERE tab1."col1" IN (1,2,3)
AND tab1."col2" = 1 AND tab1."col3" = 1;
in this specific place
tab2."col1" = tab1."col1" AND tab2."col2" = tab1."col2"
In the subquery I concatenate rows into string and it works great and give me the right results, something like
1,100;1,200;2,150....
I think the problem is that I try to refer to objects more than one level up from the subquery, but I can't figure out how to rewrite the query.
Thanks for any help
Correlated subqueries can only reference things one level deep. Your tab1 table is two levels away from your tab2 table.
I can't quite wrap my head around your query, but can you rewrite this so that you have a join between tab1 and tab2 instead of having a correlated query in the select clause?
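A sketch of that join-based rewrite, assuming the intent is one concatenated string per tab1 row. Python's sqlite3 stands in for Oracle, and the sample rows are invented to mirror the 1,100;1,200;2,150 example. SQLite's group_concat does not guarantee element order; in Oracle 11g Release 2 and later, LISTAGG(..., ';') WITHIN GROUP (ORDER BY tab2."col1") is the ordered equivalent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (col1 INTEGER, col2 INTEGER, col3 INTEGER, col4 TEXT, col5 TEXT);
    CREATE TABLE table2 (col1 INTEGER, col2 INTEGER, col3 INTEGER, col4 INTEGER);
    INSERT INTO table1 VALUES (1, 1, 1, 'A', 'X'), (2, 1, 1, 'B', 'Y'), (4, 1, 1, 'D', 'Z');
    INSERT INTO table2 VALUES (1, 1, 1, 100), (1, 1, 1, 200), (1, 1, 2, 150), (2, 1, 3, 300);
""")

# Join tab2 to tab1 and aggregate per tab1 row, instead of correlating a
# subquery that is nested two levels deep. LEFT JOIN keeps tab1 rows that
# have no matching tab2 rows (their string is NULL, like the scalar subquery).
rows = conn.execute("""
    SELECT tab1.col1 AS id,
           tab1.col4 AS name,
           tab1.col5 AS place,
           group_concat(tab2.col3 || ',' || tab2.col4, ';') AS my_con_string
      FROM table1 tab1
      LEFT JOIN table2 tab2
        ON tab2.col1 = tab1.col1
       AND tab2.col2 = tab1.col2
     WHERE tab1.col1 IN (1, 2, 3)
       AND tab1.col2 = 1
       AND tab1.col3 = 1
     GROUP BY tab1.col1, tab1.col4, tab1.col5
     ORDER BY tab1.col1
""").fetchall()

for row in rows:
    print(row)
```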

Oracle's SDO_CONTAINS not using spatial index on unioned tables?

I'm trying to use Oracle's sdo_contains spatial operator, but it doesn't seem to work well when you use it on unioned tables.
The code below runs in 2 minutes, but you have to duplicate the spatial operator for every source table:
SELECT -- works
x.code,
count(x.my_id) cnt
FROM (select
c.code,
t.my_id
from my_poi_table_1 t,my_shape c
WHERE SDO_contains(c.shape,
sdo_geometry(2001,null,SDO_POINT_type(t.latitude, t.longitude,null),null,null)
) = 'TRUE'
union all
select
c.code,
t.my_id
from my_poi_table_2 t,my_shape c
where SDO_contains(c.shape,
sdo_geometry(2001,null,SDO_POINT_type(t.lat, t.lng,null),null,null)
) = 'TRUE'
) x
group by x.code
I wanted to make it simpler, so I tried to create the points first and then use sdo_contains on them just once, but it runs for more than 25 minutes because it doesn't use the spatial index:
SELECT -- does not work
c.code,
count(x.my_id) cnt
FROM my_shape c,
(select
my_id,
sdo_geometry(2001,null,SDO_POINT_type(latitude, longitude,null),null,null) point
from my_poi_table_1 t
union all
select
my_id2,
sdo_geometry(2001,null,SDO_POINT_type(lat, lng,null),null,null) point
from my_poi_table_2 t
) x
WHERE SDO_contains(c.shape,
x.point
) = 'TRUE'
group by c.code
Is there a way to use the sdo_contains on the results of multiple tables without having to include it in the select several times?
Oracle: 12.1.0.2
It seems that sdo_contains cannot (efficiently) read from a subselect: if I put one of the POI tables into a subselect, Oracle does not use the spatial index for that part:
SELECT -- does not work
x.code,
count(x.my_id) cnt
FROM (select --+ ordered index(c,INDEX_NAME)
c.code,
t.my_id
from my_shape c,(select t.*,rownum rn from my_poi_table_1 t) t
WHERE SDO_contains(c.shape,
sdo_geometry(2001,null,SDO_POINT_type(t.latitude, t.longitude,null),null,null)
) = 'TRUE'
union all
select
c.code,
t.my_id
from my_poi_table_2 t,my_shape c
where SDO_contains(c.shape,
sdo_geometry(2001,null,SDO_POINT_type(t.lat, t.lng,null),null,null)
) = 'TRUE'
) x
group by x.code

sqlplus: Retrieve results in chunks (update ROWNUM in loop)

I'm really new to using sqlplus, so I'm sorry if this is a stupid question.
I have a really long running query of the form:
SELECT columnA
from tableA
where fieldA in (
select unique columnB
from tableB
where fieldB in
(select columnC
from tableC
where fieldC not in
(select columnD
from tableD
where x=y
and a=b
and columnX in
(select columnE
from tableE
where p=q)))
and columnInTableB = <some value>
and anotherColumnInTableB = <some other value>
and thirdColumnInTableB IN (<set of values>)
and fourthColumnInTableB like <some string>);
Each of the tables has about 15 - 30 columns and a varying number of rows. TableB is the largest, with about 5 million rows in all. Tables A - E have between 500,000 and 1 million rows each.
I've tried a couple of approaches:
1) Run this query as is:
This query runs for a very long time and I get the error:
ORA-01555: snapshot too old: rollback segment number <> with name <>
I did some research and found things like:
ORA-1555: snapshot too old: rollback segment number
However, I don't have privileges to change the undo segment.
2) I rewrote the query using with...as, but then I get the error:
unable to extend temp segment by <> in tablespace TEMP
Again, I found explanations as to how to fix this error, but I don't have privileges to extend the temp segment.
The query that takes longest to run is:
(select unique columnB
from tableB ...
The condition fourthColumnInTableB like <some string> matches about three million rows in tableB in the worst case.
Someone suggested to me to 'run the query in smaller chunks'.
An approach I thought of was to retrieve the data for the long-running subquery (
(select unique columnB
from tableB ...
)
in chunks (using ROWNUM, as suggested here).
My question is this:
I don't know exactly how many potential matches there are for this subquery. Can I dynamically set ROWNUM to retrieve data in chunks?
If yes, could you please give me an example of what the while loop must look like, and how I can determine when the result set has been exhausted?
An option I found for this was to check while @@ROWCOUNT > 0 or use:
while exists (query)
However, I'm still not sure how to write the loop and how to use a variable (?) to dynamically set ROWNUM.
Basically, I'm wondering if I can do:
SELECT columnA
from tableA
where fieldA in (
while all results have not been fetched:
select *
from
(select a.*, rownum rnum
from
(select unique columnB
from tableB
where fieldB in
(select columnC
from tableC
where fieldC not in
(select columnD
from tableD
where x=y
and a=b
and columnX in
(select columnE
from tableE
where p=q)))
and columnInTableB = <some value>
and anotherColumnInTableB = <some other value>
and thirdColumnInTableB IN (<set of values>)
and fourthColumnInTableB like <some string>) a
where rownum <= i) and rnum >= i);
update value of i here (?) so that the loop can repeat
How can I update the value of 'i' for rownum/rnum above dynamically in some loop to retrieve results in chunks (of, say, 10000) until the result set has been exhausted?
Also, what should the while loop look like?
I also have no idea how to rewrite this using joins (my knowledge of sql is very limited), so if someone can help me rewrite this more efficiently using joins or any other method, that would work too.
I'd really appreciate any help on this. I've been stuck on this for a few days now and I'm unable to determine a proper solution.
Thank you!
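1) Try removing the UNIQUE/DISTINCT clause. It forces a sort, which uses memory and the temp segment, and inside an IN (...) subquery duplicates do not change the result anyway.
2) Try applying a 'rownum < x_rows' filter at the end to restrict the number of rows sent to the client and reduce I/O.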
SELECT columnA
FROM tableA
WHERE fieldA IN (
SELECT columnB -- UNIQUE removed (suggestion 1)
FROM tableB
WHERE fieldB IN
(SELECT columnC
FROM tableC
WHERE fieldC NOT IN
(SELECT columnD
FROM tableD
WHERE x =y
AND a =b
AND columnX IN
(SELECT columnE FROM tableE WHERE p=q
)
)
)
AND columnInTableB = <SOME value>
AND anotherColumnInTableB = <SOME other value>
AND thirdColumnInTableB IN (<SET OF VALUES>)
AND fourthColumnInTableB LIKE <SOME string>
)
AND ROWNUM < 50;
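A sketch of the chunked loop the question asks for, in Python with sqlite3 standing in for Oracle and a small generated tableB (the data and the LIKE pattern are invented; the column names follow the question). It swaps the ROWNUM window for keyset pagination, a different technique: each round requests the next chunk of values greater than the last one returned, and an empty chunk signals that the result set is exhausted. Each chunk is a separate query, so chunks are not guaranteed to be consistent with each other if tableB changes in between.

```python
import sqlite3

# Hypothetical stand-in for tableB: each columnB value appears 4 times,
# and values divisible by 5 never match the LIKE pattern.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableB (columnB INTEGER, fourthColumnInTableB TEXT)")
conn.executemany(
    "INSERT INTO tableB VALUES (?, ?)",
    [(i // 4, "other" if (i // 4) % 5 == 0 else "match") for i in range(10_000)],
)


def fetch_in_chunks(conn, pattern, chunk_size):
    """Yield chunks of distinct columnB values until the result set is exhausted.

    Keyset pagination: each round asks for values greater than the last one
    already returned, so no ROWNUM/OFFSET window has to be recomputed.
    """
    last = None
    while True:
        chunk = [row[0] for row in conn.execute(
            """SELECT DISTINCT columnB
                 FROM tableB
                WHERE fourthColumnInTableB LIKE :pattern
                  AND (:last IS NULL OR columnB > :last)
                ORDER BY columnB
                LIMIT :n""",
            {"pattern": pattern, "last": last, "n": chunk_size},
        )]
        if not chunk:       # an empty chunk means everything has been fetched
            return
        yield chunk
        last = chunk[-1]    # resume after the largest value seen so far


chunks = list(fetch_in_chunks(conn, "match%", 750))
print([len(c) for c in chunks])
```

In Oracle, LIMIT :n would be written with ROWNUM in an outer query or, from 12c on, with FETCH FIRST :n ROWS ONLY.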

Using Table Cast

Can you use something along the lines of
Select * from table(cast(select * from tab1 inner join tab2)) inner join tab3
Take into account that what's inside the table(cast()) is something much more complex than a simple select involving a block like with test as (select) select *... etc.
I need a simple way to do this preferably without the need for a temporary table.
Thank you.
Database: Oracle 10g
Later edit:
I have something like
Select a.dummy1, a.dummy2, wm_concat(t2.dummy3)
from table1 a,
(with str as
(Select '1,2,3,4' as str from dual)
Select a.dummy1, t.dummy3
from table1 a
inner join
(Select regexp_substr (str, '[^,]+', 1, rownum) split
from str
connect by level <= length (regexp_replace (str, '[^,]+')) + 1) t
on instr(a.dummy2, t.split) > 0) t2
where a.dummy1='xyz'
group by a.dummy1, a.dummy2
The main idea is that column t2.dummy3 contains CSVs. That's why I have select '1,2,3,4' from dual.
I need to find all rows that contain at least one of the values from str.
Using any kind of loop is out of the question, because I later need to integrate this into a larger query used for an SSRS report, and the tables needed for this are quite large (>1 million rows).
CAST seems completely irrelevant here. You use CAST to change the perceived datatype of an expression. Here, you're passing it a result set, not an expression, and you're not saying what datatype to cast to.
You should be able to simply remove the TABLE and CAST calls and do something like:
SELECT * FROM (SELECT * FROM tab1 INNER JOIN tab2 ON ...) INNER JOIN tab3 ON ...
e.g.
SELECT * FROM
(SELECT d1.dummy FROM dual d1 INNER JOIN dual d2 ON d1.dummy=d2.dummy) d12
INNER JOIN dual d3 ON d12.dummy = d3.dummy
Subquery factoring should work fine here as well.
WITH x AS (SELECT * FROM DUAL)
SELECT * FROM
(SELECT d1.dummy FROM x d1 INNER JOIN x d2 ON d1.dummy=d2.dummy) d12
INNER JOIN dual d3 ON d12.dummy = d3.dummy;
If you're having difficulty getting that kind of construct to work, try adding more detail to your question about specifically what you've tried and what error you're getting.
Yeah... I found the answer... I was just too much of an SQL n00b to see it, as it was right in front of me...
I just moved the "with" statement outside of the query and it worked.
Thank you so much for your help; it was your answer that led me to see my mistake :D
Something like:
with str as
(Select '1,2,3,4' as str from dual)
Select a.dummy1, a.dummy2, wm_concat(t2.dummy3)
from table1 a,
(
Select a.dummy1, t.dummy3
from table1 a
inner join
(Select regexp_substr (str, '[^,]+', 1, rownum) split
from str
connect by level <= length (regexp_replace (str, '[^,]+')) + 1) t
on instr(a.dummy2, t.split) > 0) t2
where a.dummy1='xyz'
group by a.dummy1, a.dummy2
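For reference, here is a minimal sketch of that working shape (all WITH definitions at the top, used from the inline view), run in Python with sqlite3 as a stand-in for Oracle 10g. SQLite has no CONNECT BY, so a recursive CTE splits the string instead; the column alias s, the sample rows and the comma-wrapped match are assumptions added for illustration. It also shows a pitfall of matching with instr on raw CSV text: '1' is found inside '10,20'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (dummy1 TEXT, dummy2 TEXT);
    INSERT INTO table1 VALUES
        ('xyz', '1,5'), ('xyz', '10,20'), ('xyz', '7'), ('abc', '2');
""")

query = """
    WITH RECURSIVE
      str(s) AS (SELECT '1,2,3,4'),
      -- split str into one row per value (stand-in for CONNECT BY LEVEL)
      split(val, rest) AS (
        SELECT '', (SELECT s FROM str) || ','
        UNION ALL
        SELECT substr(rest, 1, instr(rest, ',') - 1),
               substr(rest, instr(rest, ',') + 1)
          FROM split
         WHERE rest <> ''
      )
    SELECT DISTINCT a.dummy1, a.dummy2
      FROM table1 a
      JOIN (SELECT val FROM split WHERE val <> '') t
        ON {match}
     WHERE a.dummy1 = 'xyz'
     ORDER BY a.dummy2
"""

# Plain substring test, as in the question: '1' also matches inside '10,20'.
naive = conn.execute(query.format(match="instr(a.dummy2, t.val) > 0")).fetchall()

# Wrapping both sides in commas only matches whole CSV elements.
exact = conn.execute(
    query.format(match="instr(',' || a.dummy2 || ',', ',' || t.val || ',') > 0")
).fetchall()

print(naive)
print(exact)
```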
