Long runtime while performing MERGE INTO operation in Oracle

I have a requirement to validate records present in a table. First, we load the records into the table and then we validate them using a SQL query. I am using the query below to update the status code, but processing 114,000 records took around 7 hours. Is that acceptable? I am not sure why it is taking so much time. Please suggest a better approach so that I can minimize the time.
Query:
MERGE INTO mem_src_extn t USING
(
SELECT mse.rowid row_id,
CASE WHEN mse.type_value IS NULL OR mse."TYPE" IS NULL OR mse.VALUE_1 IS NULL or mse.VALUE_2 IS NULL THEN '100'
WHEN ( SELECT count(*) FROM cmc_mem_src cms WHERE cms.tn_id = mse.type_value ) = 0 THEN '222'
WHEN count(mse.value_1) over ( partition by type_value ) > 1 THEN '333'
ELSE '000' END int_value_1
FROM mem_src_extn mse
) u
ON ( t.rowid = u.row_id )
WHEN MATCHED THEN UPDATE SET t.int_value_1 = u.int_value_1

The performance problem is probably caused by the SELECT count(*) subquery, not the MERGE.
The MERGE uses a ROWID join, which should be about as fast as any join can get. But it's possible that Oracle is incorrectly optimizing the subquery. Try re-writing the statement with a LEFT JOIN instead of a correlated subquery:
MERGE INTO mem_src_extn t USING
(
SELECT DISTINCT
mse.rowid row_id,
CASE WHEN mse.type_value IS NULL OR mse."TYPE" IS NULL OR mse.VALUE_1 IS NULL or mse.VALUE_2 IS NULL THEN '100'
WHEN cms.tn_id IS NULL THEN '222'
WHEN count(mse.value_1) over ( partition by type_value ) > 1 THEN '333'
ELSE '000' END int_value_1
FROM mem_src_extn mse
LEFT JOIN cmc_mem_src cms
ON mse.type_value = cms.tn_id
) u
ON ( t.rowid = u.row_id )
WHEN MATCHED THEN UPDATE SET t.int_value_1 = u.int_value_1;
But this is only a guess. If I'm wrong, the next step would be to get a query plan. Run explain plan for merge ... and then select * from table(dbms_xplan.display); and post the entire output of that statement in the question.
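For example, a minimal sketch of that step (the placeholder comment stands for the same USING subquery shown in the MERGE above):
EXPLAIN PLAN FOR
MERGE INTO mem_src_extn t USING
( /* the same USING subquery as in the MERGE above */ ) u
ON ( t.rowid = u.row_id )
WHEN MATCHED THEN UPDATE SET t.int_value_1 = u.int_value_1;
-- then display the plan that was just explained
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);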

Related

Adding filters in subquery from CTE quadruples run time

I am working on an existing query for an SSRS report that focuses on aggregated financial aid data split out into 10 aggregations. The user wants to be able to select students included in that aggregated data based on new vs. returning and 'selected for verification.' For the new/returning status, I added a CTE to return the earliest admit date for a student. 2 of the 10 data fields are created by a subquery. I have been trying for 3 days to get the subquery to use the CTE fields for a filter, but they won't work. Either they're ignored or I get a 'not a group by expression' error. If I put the join to the CTE within the subquery, the query time jumps from 45 seconds to 400 seconds. This shouldn't be that complicated! What am I missing? I have added some of the code... 3 of the chunks work - paid_something doesn't.
with stuStatus as
(select
person_uid, min(year_admitted) admit_year
from academic_study
where aid_year between :AidYearStartParameter and :AidYearEndParameter
group by person_uid)
--- above code added to get student information not originally in qry
select
finaid_applicant_status.aid_year
, count(1) as fafsa_cnt --works
, sum( --works
case
when (
package_complete_date is not null
and admit.status is not null
)
then 1
else 0
end
) as admit_and_package
, (select count(*) --doesn't work
from (
select distinct award_by_aid_year.person_uid
from
award_by_aid_year
where
award_by_aid_year.aid_year = finaid_applicant_status.aid_year
and award_by_aid_year.total_paid_amount > 0 )dta
where
(
(:StudentStatusParameter = 'N' and stuStatus.admit_year = finaid_applicant_status.aid_year)
OR
(:StudentStatusParameter = 'R' and stuStatus.admit_year <> finaid_applicant_status.aid_year)
OR :StudentStatusParameter = '%'
)
)
as paid_something
, sum( --works
case
when exists (
select
1
from
award_by_person abp
where
abp.person_uid = fafsa.person_uid
and abp.aid_year = fafsa.aid_year
and abp.award_paid_amount > 0
) and fafsa.requirement is not null
then 1
else 0
end
) as paid_something_fafsa
from
finaid_applicant_status
join finaid_tracking_requirement fafsa
on finaid_applicant_status.person_uid = fafsa.person_uid
and finaid_applicant_status.aid_year = fafsa.aid_year
and fafsa.requirement = 'FAFSA'
left join finaid_tracking_requirement admit
on finaid_applicant_status.person_uid = admit.person_uid
and finaid_applicant_status.aid_year = admit.aid_year
and admit.requirement = 'ADMIT'
and admit.status in ('M', 'P')
left outer join stuStatus
on finaid_applicant_status.person_uid = stuStatus.person_uid
where
finaid_applicant_status.aid_year between :AidYearStartParameter and :AidYearEndParameter
and (
(:VerifiedParameter = '%') OR
(:VerifiedParameter <> '%' AND finaid_applicant_status.verification_required_ind = :VerifiedParameter)
)
and
(
(:StudentStatusParameter = 'N' and (stuStatus.admit_year IS NULL OR stuStatus.admit_year = finaid_applicant_status.aid_year ))
OR
(:StudentStatusParameter = 'R' and stuStatus.admit_year <> finaid_applicant_status.aid_year)
OR :StudentStatusParameter = '%'
)
group by
finaid_applicant_status.aid_year
order by
finaid_applicant_status.aid_year
Not sure if this helps, but you have something like this:
select aid_year, count(1) c1,
(select count(1)
from (select distinct person_uid
from award_by_aid_year a
where a.aid_year = fas.aid_year))
from finaid_applicant_status fas
group by aid_year;
This query throws ORA-00904: "FAS"."AID_YEAR": invalid identifier, because fas.aid_year is nested too deeply in the subquery for Oracle to resolve the correlation.
If you are able to modify your subquery from select count(1) from (select distinct sth from ... where year = fas.year) to select count(distinct sth) from ... where year = fas.year, then it has a chance to work.
select aid_year, count(1) c1,
(select count(distinct person_uid)
from award_by_aid_year a
where a.aid_year = fas.aid_year) c2
from finaid_applicant_status fas
group by aid_year
Here is a simplified demo showing the non-working and working queries (sketched below). Of course your query is much more complicated, but this is something you could check.
Also, maybe you can use dbfiddle or sqlfiddle to set up a test case? Or show us sample (anonymized) data and the required output for it?
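Since the demo itself is not reproduced here, a minimal sketch of what such a test case could look like (the two tables are stripped down to just the columns the queries use; names match the query above):
create table finaid_applicant_status (person_uid number, aid_year number);
create table award_by_aid_year (person_uid number, aid_year number);
insert into finaid_applicant_status values (1, 2020);
insert into award_by_aid_year values (1, 2020);
-- non-working form: raises ORA-00904 because fas.aid_year is referenced
-- two subquery levels down (kept commented out so the script runs end to end)
-- select aid_year, count(1) c1,
--        (select count(1)
--         from (select distinct person_uid
--               from award_by_aid_year a
--               where a.aid_year = fas.aid_year))
-- from finaid_applicant_status fas
-- group by aid_year;
-- working form: the correlation is only one level deep
select aid_year, count(1) c1,
       (select count(distinct a.person_uid)
        from award_by_aid_year a
        where a.aid_year = fas.aid_year) c2
from finaid_applicant_status fas
group by aid_year;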

Oracle's SDO_CONTAINS not using spatial index on unioned tables?

I'm trying to use Oracle's sdo_contains spatial operator, but it seems that it doesn't really work when you use it on unioned tables.
The code below runs in 2 minutes, but you have to duplicate the spatial operator for every source table:
SELECT -- works
x.code,
count(x.my_id) cnt
FROM (select
c.code,
t.my_id
from my_poi_table_1 t,my_shape c
WHERE SDO_contains(c.shape,
sdo_geometry(2001,null,SDO_POINT_type(t.latitude, t.longitude,null),null,null)
) = 'TRUE'
union all
select
c.code,
t.my_id
from my_poi_table_2 t,my_shape c
where SDO_contains(c.shape,
sdo_geometry(2001,null,SDO_POINT_type(t.lat, t.lng,null),null,null)
) = 'TRUE'
) x
group by x.code
I wanted to make it simpler, so I tried to first create the points and then use sdo_contains on them just once, but it runs for more than 25 minutes because it does not use the spatial index:
SELECT -- does not work
c.code,
count(x.my_id) cnt
FROM my_shape c,
(select
my_id,
sdo_geometry(2001,null,SDO_POINT_type(latitude, longitude,null),null,null) point
from my_poi_table_1 t
union all
select
my_id2,
sdo_geometry(2001,null,SDO_POINT_type(lat, lng,null),null,null) point
from my_poi_table_2 t
) x
WHERE SDO_contains(c.shape,
x.point
) = 'TRUE'
group by c.code
Is there a way to use the sdo_contains on the results of multiple tables without having to include it in the select several times?
Oracle: 12.1.0.2
It seems that sdo_contains cannot (efficiently) read from a subselect: if I put one of the POI tables into a subselect, then Oracle will not use the spatial index for that part:
SELECT -- does not work
x.code,
count(x.my_id) cnt
FROM (select --+ ordered index(c,INDEX_NAME)
c.code,
t.my_id
from my_shape c,(select t.*,rownum rn from my_poi_table_1 t) t
WHERE SDO_contains(c.shape,
sdo_geometry(2001,null,SDO_POINT_type(t.latitude, t.longitude,null),null,null)
) = 'TRUE'
union all
select
c.code,
t.my_id
from my_poi_table_2 t,my_shape c
where SDO_contains(c.shape,
sdo_geometry(2001,null,SDO_POINT_type(t.lat, t.lng,null),null,null)
) = 'TRUE'
) x
group by x.code

Re-writing a join query

I have a question concerning Hive. Let me explain the scenario:
I am using a Hive action on Oozie; I have a query which does
successive LEFT JOINs on different tables;
The total number of rows to be inserted is about 35 million;
First, the job was crashing due to lack of memory, so I set hive.auto.convert.join=false; the query then executed perfectly, but it took 4 hours to complete;
I tried reordering the LEFT JOINs, putting the large tables at the end, but got the same result: about 4 hours;
Here is what the query looks like:
INSERT OVERWRITE TABLE final_table
SELECT
T1.Id,
T1.some_field_name,
T1.another_filed_name,
T2.also_another_filed_name,
FROM table1 T1
LEFT JOIN table2 T2 ON ( T2.Id = T1.Id ) -- T2 is the smallest table
LEFT JOIN table3 T3 ON ( T3.Id = T1.Id )
LEFT JOIN table4 T4 ON ( T4.Id = T1.Id ) -- T4 is the biggest table
So, knowing the structure of the query, is there a way to rewrite it so that I can avoid so many JOINs?
Thanks in advance.
PS: Even vectorization gave me the same timing.
Too long for a comment, will be deleted later.
(1) Your current query won't compile.
(2) You are not selecting anything from T3 and T4, which makes no sense.
(3) Changing the order of tables is not likely to have any impact with the cost-based optimizer.
(4) Basically, I would suggest collecting statistics on the tables, specifically on the id columns (see the sketch below), but in your case I have a feeling that id is not unique in more than one table.
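Regarding (4), a minimal sketch of gathering those statistics in Hive (table and column names are taken from the query in the question; partitioned tables would also need a PARTITION clause):
ANALYZE TABLE table1 COMPUTE STATISTICS;
ANALYZE TABLE table1 COMPUTE STATISTICS FOR COLUMNS id;
ANALYZE TABLE table2 COMPUTE STATISTICS;
ANALYZE TABLE table2 COMPUTE STATISTICS FOR COLUMNS id;
-- repeat for table3 and table4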
Add to your post the result of the following query:
select *
, case when cnt_1 = 0 then 1 else cnt_1 end
* case when cnt_2 = 0 then 1 else cnt_2 end
* case when cnt_3 = 0 then 1 else cnt_3 end
* case when cnt_4 = 0 then 1 else cnt_4 end as product
from (select id
,count(case when tab = 1 then 1 end) as cnt_1
,count(case when tab = 2 then 1 end) as cnt_2
,count(case when tab = 3 then 1 end) as cnt_3
,count(case when tab = 4 then 1 end) as cnt_4
from ( select 1 as tab,id from table1
union all select 2 as tab,id from table2
union all select 3 as tab,id from table3
union all select 4 as tab,id from table4
) t
group by id
having greatest (cnt_1,cnt_2,cnt_3,cnt_4) >= 10
) t
order by product desc
limit 10
;

Optimising self-join queries in Oracle 11g

I am trying to execute the query below and it takes a long time to run. Could you please tell me how I can optimize this query so it executes faster? Also, please explain how to interpret the results of EXPLAIN PLAN.
SELECT DISTINCT T1.*,
T3.*
FROM table1 T1
join table1 T2
ON T2.no = T1.no
left outer join table2 T3
ON T1.record_id = T3.record_id
WHERE T2.standard IN ( 'TE' )
AND Upper(T2.ind) = 'PRI'
AND ( ( T2.input = 'SIT'
AND T2.batch_no IS NOT NULL )
OR T2.input <> 'SIT' )
AND Upper(T1.ind) IN ( 'PRI', 'CNJ' )
AND T1.id_no = T2.id_no
ORDER BY T1.no,
T1.ind DESC

RANK OVER function in Hive

I'm trying to run this query in Hive to return only the top 10 urls that appear most often in the adimpression table.
select
ranked_mytable.url,
ranked_mytable.cnt
from
( select iq.url, iq.cnt, rank() over (partition by iq.url order by iq.cnt desc) rnk
from
( select url, count(*) cnt
from store.adimpression ai
inner join zuppa.adgroupcreativesubscription agcs
on agcs.id = ai.adgroupcreativesubscriptionid
inner join zuppa.adgroup ag
on ag.id = agcs.adgroupid
where ai.datehour >= '2014-05-15 00:00:00'
and ag.siteid = 1240
group by url
) iq
) ranked_mytable
where
ranked_mytable.rnk <= 10
order by
ranked_mytable.url,
ranked_mytable.rnk desc
;
Unfortunately I get an error message stating:
FAILED: SemanticException [Error 10002]: Line 26:23 Invalid column reference 'rnk'
I've tried to debug it, and up to the ranked_mytable subquery everything goes smoothly. I've tried commenting out the where ranked_mytable.rnk <= 10 clause, but the error message keeps appearing.
Hive is unable to order by a column that is not in the "output" of a select statement. To fix it, just include that column in the selected columns:
select
ranked_mytable.url,
ranked_mytable.cnt,
ranked_mytable.rnk
from
( select iq.url, iq.cnt, rank() over (partition by iq.url order by iq.cnt desc) rnk
from
( select url, count(*) cnt
from store.adimpression ai
inner join zuppa.adgroupcreativesubscription agcs
on agcs.id = ai.adgroupcreativesubscriptionid
inner join zuppa.adgroup ag
on ag.id = agcs.adgroupid
where ai.datehour >= '2014-05-15 00:00:00'
and ag.siteid = 1240
group by url
) iq
) ranked_mytable
where
ranked_mytable.rnk <= 10
order by
ranked_mytable.url,
ranked_mytable.rnk desc
;
If you don't want that 'rnk' column in your final output, I expect you could wrap the whole thing in another inner query and just select out the 'url' and 'cnt' fields, as sketched below.
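A minimal sketch of that wrapping approach, with the joins and filters from the original query omitted for brevity (the alias wrapped is illustrative):
select wrapped.url, wrapped.cnt
from ( select url, cnt, rank() over (partition by url order by cnt desc) rnk
       from ( select url, count(*) cnt
              from store.adimpression
              group by url ) iq ) wrapped
where wrapped.rnk <= 10
order by wrapped.url;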
RANK OVER is not the best function to achieve this goal.
A better solution would be to use a combination of SORT BY and LIMIT. It's true that LIMIT on its own picks rows from a table arbitrarily, but this can be avoided by combining it with SORT BY. From the Apache wiki:
-- Top k queries. The following query returns the top 5 sales records wrt amount.
SET mapred.reduce.tasks = 1;
SELECT * FROM sales SORT BY amount DESC LIMIT 5;
The query can be re-written in this way:
select
iq.url,
iq.cnt
from
( select url, count(*) cnt
from store.adimpression ai
inner join zuppa.adgroupcreativesubscription agcs
on agcs.id = ai.adgroupcreativesubscriptionid
inner join zuppa.adgroup ag
on ag.id = agcs.adgroupid
where ai.datehour >= '2014-05-15 00:00:00'
and ag.siteid = 1240
group by url ) iq
sort by
iq.cnt desc
limit
10
;
Remove the partition by iq.url clause from the rank() over (...) call and re-run the query.
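Read together with the query from the question, that amounts to ranking globally; a compressed sketch (joins and date/site filters omitted for brevity):
select url, cnt
from ( select url, cnt, rank() over (order by cnt desc) rnk
       from ( select url, count(*) cnt
              from store.adimpression
              group by url ) iq ) ranked
where ranked.rnk <= 10;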
As shown above, include the rnk column in the select list. It should work fine.
