I have a large table insert as part of a job for reporting. For ease of development, I did a single insert with a single select, rather than splitting this up into multiple commits.
insert /*+ parallel(AUTO) */ into sc_session_activity_stage1(fiscal_year
,fiscal_quarter_id
,date_stamp
,time_stamp
,session_key
,activity_call_type_key
,user_key
,device_key
,url_key
,ref_url_key
,event_key
,page_type_key
,link_url_key
,component_key
,content_trace_key
,key) (
select /*+ parallel(AUTO) */
schfql.fiscal_year fiscal_year
,schfql.fiscal_quarter_id fiscal_quarter_id
,pkg_sc_portfolio_load.sc_datestamp_from_epoch(swa.time_stamp)
,swa.time_stamp time_stamp
,schuse.session_key session_key
,schact.activity_call_type_key activity_call_type_key
,schu.user_key user_key
,schde.device_key device_key
,schurl_url.url_key url_key
,schurl_ref.url_key ref_url_key
,schev.event_key event_key
,schapt.page_type_key page_type_key
,schurl_link_url.url_key link_url_key
,schwac.component_id component_id
,schti_content_unique_id.trace_id_key content_unique_id_trace_id_key
,schti_unique_id.trace_id_key unique_id_trace_id_key
from web_activity swa
inner join sc_fiscal_quarter_list schfql
on pkg_sc_portfolio_load.sc_datestamp_from_epoch(swa.time_stamp) between schfql.start_date and schfql.end_date
inner join sc_user_sessions schuse
on schuse.session_id = swa.session_id
inner join sc_activity_call_types schact
on schact.activity_call_type_name = swa.calltype
inner join sc_users schu
on schu.user_email = sc_normalize_email(swa.userid)
inner join sc_devices schde
on swa.device=schde.device and
swa.ip=schde.source_ip and
swa.operation_system = schde.operating_system and
swa.browser = schde.browser
left join sc_urls schurl_url
on schurl_url.full_url = trim(swa.url)
inner join sc_events schev
on schev.event=trim(swa.event)
inner join sc_activity_page_types schapt
on schapt.page_type_name=swa.pagetype
left join sc_urls schurl_link_url
on schurl_link_url.full_url = trim(swa.linkurl)
left join sc_urls schurl_ref
on schurl_ref.full_url = trim(swa.ref)
inner join sc_web_activity_components schwac
on schwac.component_name=trim(swa.component)
left join sc_trace_ids schti_content_unique_id
on schti_content_unique_id.alfresco_trace_id = swa.CONTENT_UNIQUE_ID
left join sc_trace_ids schti_unique_id
on schti_unique_id.alfresco_trace_id=swa.UNIQUE_ID
);
commit;
On production, this triggers alarms for the TEMP tablespace. If I were to split the above into multiple commits, would that reduce the TEMP usage at any one point in time? This may be obvious to some, but I'm not sure how Oracle works here. I'm not seeing any ORA-type errors; rather, some threshold is being triggered and someone from the DBA team sends an email.
Thank you from the Woodsman.
TEMP tablespace blowouts are common and can be addressed by increasing the TEMP tablespace AND/OR tuning the SQL to use less TEMP. For tuning, I usually start with the SQL Tuning Advisor recommendations (requires the Diagnostics and Tuning Pack). BTW: TEMP usage goes up with parallel queries, and it is mostly driven by the SELECT part, not the INSERT. You can also reduce TEMP usage by doing more of the work in memory (i.e. increasing the PGA).
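As for splitting the load into multiple commits: that can be scripted with DBMS_PARALLEL_EXECUTE, which chunks the driving table by rowid and commits each chunk separately. A minimal sketch, assuming the task name, chunk size, and parallel level are illustrative placeholders and the full insert/select from the question is substituted for the elided parts:

```sql
-- Hypothetical sketch: 'LOAD_SC_STAGE1', the chunk size, and the parallel
-- level are made up. The sql_stmt would carry the full insert/select from
-- the question, filtered by the :start_id/:end_id rowid range.
BEGIN
  DBMS_PARALLEL_EXECUTE.create_task(task_name => 'LOAD_SC_STAGE1');
  DBMS_PARALLEL_EXECUTE.create_chunks_by_rowid(
    task_name   => 'LOAD_SC_STAGE1',
    table_owner => USER,
    table_name  => 'WEB_ACTIVITY',
    by_row      => TRUE,
    chunk_size  => 100000);
  DBMS_PARALLEL_EXECUTE.run_task(
    task_name      => 'LOAD_SC_STAGE1',
    sql_stmt       => 'insert into sc_session_activity_stage1 (...)
                       select ... from web_activity swa ...
                       where swa.rowid between :start_id and :end_id',
    language_flag  => DBMS_SQL.NATIVE,
    parallel_level => 4);  -- each chunk runs and commits independently
  DBMS_PARALLEL_EXECUTE.drop_task('LOAD_SC_STAGE1');
END;
/
```

Each chunk then joins only its slice of web_activity, which can lower the peak TEMP footprint of any single execution, though every chunk still probes the full dimension tables.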
I have 3 distributed tables: t_user_info_all, t_user_event_all, t_user_flat_all; they are all on the same cluster with the same shard key, user_id.
I want to insert the join result of t_user_event_all and t_user_info_all into t_user_flat_all, with SQL like this:
insert into t_user_flat_all select * from t_user_info_all t1 left join t_user_event_all t2 on t1.user_id = t2.user_id.
With the setting distributed_product_mode = 'local', the join runs in local mode, but the insert still targets a distributed table.
I found that with parallel_distributed_insert_select = 2, the SELECT and INSERT are executed on each shard from/to the underlying table of the distributed engine. But that only works for queries like INSERT INTO distributed_table_a SELECT ... FROM distributed_table_b; the select query cannot have where conditions or joins.
Alternatively, I can run the local insert insert into t_user_flat_local select * from t_user_info_local t1 left join t_user_event_local t2 on t1.user_id = t2.user_id on each shard, but that makes the setup more complex.
My current problem is on 11g, but I am also interested in how this might be solved more elegantly in later versions.
I want to join two tables. Table A has 10 million rows; Table B is huge, with about a billion records across roughly a thousand partitions. One partition holds around 10 million records. I am not joining on the partition key. For most rows of Table A, one or more rows in Table B will be found.
Example:
select * from table_a a
inner join table_b b on a.ref = b.ref
The above will return about 50 million rows, and the results come from about 30 partitions of table b. I am assuming a hash join is the correct join here, hashing table a and full-scanning/index-scanning table b.
So 970 partitions were scanned for no reason. And I have a third query that could tell Oracle which 30 partitions to check for the join.
Example of third query:
select partition_id from table_c
This query gives exactly the 30 partitions for the query above.
To my question:
In PL/SQL one can solve this by:
selecting the 30 partition_ids into a variable (e.g. select listagg(partition_id,',') within group (order by partition_id) into v_partitions from table_c),
then executing the query like so:
execute immediate 'select * from table_a a
inner join table_b b on a.ref = b.ref
where b.partition_id in ('||v_partitions||')' into ...
Let's say this completes in 10 minutes.
Now, how can I do this in the same amount of time with pure SQL?
Simply writing
select * from table_a a
inner join table_b b on a.ref = b.ref
where b.partition_id in (select partition_id from table_c)
does not do the trick, it seems; or I might be aiming at the wrong plan.
The plan I think I want is
hash join
table a
nested loop
table c
partition pruning here
table b
But, this does not come back in 10 minutes.
So, how do I do this in SQL, and what execution plan should I aim at? One variation I have not tried yet that might be the solution is:
nested loop
table c
hash join
table a
partition pruning here (pushed predicate from the join to c)
table b
Another feeling I have is that the solution might lie in joining table a to table c first (not sure on what, though) and then joining that result to table b.
I am not asking you to type everything out for me, just the general concept of how to do this (getting a partition restriction from a query) in SQL: what plan should I aim for?
Thank you very much! Peter
I'm not an expert at this, but I think Oracle generally evaluates the joins first and then applies the WHERE conditions. So you might get the plan you want by moving the partition restriction up into a join condition:
select * from table_a a
inner join table_b b on a.ref = b.ref
and b.partition_id in (select partition_id from table_c);
I've also seen people try to do this sort of thing with an inline view:
select * from table_a a
inner join (select * from table_b
where partition_id in (select partition_id from table_c)) b
on a.ref = b.ref;
Thank you all for your discussions with me on this one. In my case this was solved (not by me) by adding a join path between table_c and table_a and by overloading the join conditions as below. This was possible because column partition_id could be added to table_a:
select * from
table_c c
JOIN table_a a ON (a.partition_id = c.partition_id)
JOIN table_b b ON (b.partition_id = c.partition_id and b.partition_id = a.partition_id and b.ref = a.ref)
And these are the hints that produce the plan you want:
leading(c,b,a) use_nl(c,b) swap_join_inputs(a) use_hash(a)
So you get:
hash join
table a
nested loop
table c
partition list iterator
table b
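Embedded into the final statement, the hints from above might look like this (hint placement is my assumption; verify with DBMS_XPLAN that you actually get the plan shown):

```sql
-- Sketch: same query as the accepted rewrite, with the hints inlined.
-- leading(c,b,a): join order c -> b -> a; use_nl(c,b) drives the
-- partition-list iteration; swap_join_inputs(a) makes a the hash build side.
select /*+ leading(c,b,a) use_nl(c,b) swap_join_inputs(a) use_hash(a) */ *
from table_c c
join table_a a on (a.partition_id = c.partition_id)
join table_b b on (b.partition_id = c.partition_id
               and b.partition_id = a.partition_id
               and b.ref = a.ref);
```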
I have a statement that is used quite often but performs pretty poorly.
So I want to optimize it as well as possible. The statement consists of various parts and unions.
One part that is pretty costly is the following.
select *
from rc050 F
LEFT OUTER JOIN
(SELECT fi_nr,
fklz,
rvc_status,
lfdnr,
txt
FROM rc0531 temp
WHERE lfdnr =
(SELECT MAX(lfdnr) FROM rc0531 WHERE fi_nr = temp.fi_nr AND fklz = temp.fklz
)
) SUB_TABLE1
ON F.fklz = SUB_TABLE1.fklz
AND F.fi_nr = SUB_TABLE1.fi_nr
LEFT OUTER JOIN
(SELECT fi_nr,
fklz,
rvc_status,
lfdnr,
txt
FROM rc0532 temp
WHERE lfdnr =
(SELECT MAX(lfdnr) FROM rc0532 WHERE fi_nr = temp.fi_nr AND fklz = temp.fklz
)
) SUB_TABLE2
ON F.fklz = SUB_TABLE2.fklz
AND F.fi_nr = SUB_TABLE2.fi_nr
LEFT OUTER JOIN
(SELECT fi_nr,
fklz,
rvc_status,
lfdnr,
txt
FROM rc05311 temp
WHERE lfdnr =
(SELECT MAX(lfdnr)
FROM rc05311
WHERE fi_nr = temp.fi_nr
AND fklz = temp.fklz
)
) SUB_TABLE11
ON F.fklz = SUB_TABLE11.fklz
AND F.fi_nr = SUB_TABLE11.fi_nr
where F.fklz != ' '
This is the part where I have to load from these rc0531 ... rc05311 tables
what their latest entry is. There are 11 of these tables, so this is broken down.
As you can see, I'm currently joining each table via a subquery, and since I only need the latest entry I need an additional subquery to get the max(lfdnr).
This all works well so far, but I want to know if there is a more efficient way to do it.
In my main select I need to be able to address each column from each of these tables.
Do you have any suggestions?
This currently runs in 1.3 sec on 13k rows; to get a decent boost it should get down to 0.1 sec or so. Again, this is only one problem in a bigger statement full of inefficient constructs.
You can't optimize a SQL query in isolation. Oracle takes your query and determines an execution plan that it uses to retrieve the information you've asked for. You might completely rewrite the query and, if it leads to the same execution plan, you'll get exactly the same performance. To tune a query, you first need to look at its execution plan.
You may well benefit from an index on each of these tables, on the (fi_nr, fklz) columns or possibly even (fi_nr, fklz, lfdnr), depending on how selective that gets. There's no way to tell in advance from this information; you'll just have to try.
You should also remove the select * and select only the columns you need. If the needed information can be retrieved from an index alone, Oracle will not need to visit the actual table row.
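As a sketch of the composite index suggested above (the index name is made up; you would create one such index per rc05xx table):

```sql
-- Hypothetical name. With lfdnr as the trailing column, the correlated
-- MAX(lfdnr) subquery can be answered by an index range scan (MIN/MAX)
-- without visiting the table rows.
CREATE INDEX rc0531_latest_ix ON rc0531 (fi_nr, fklz, lfdnr);
```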
Replace the left joins from this:
LEFT OUTER JOIN (
SELECT fi_nr,fklz,rvc_status,lfdnr,txt FROM rc0531 temp
WHERE lfdnr = (SELECT MAX(lfdnr) FROM rc0531 WHERE fi_nr = temp.fi_nr AND fklz = temp.fklz)
) SUB_TABLE1
ON F.fklz = SUB_TABLE1.fklz
AND F.fi_nr = SUB_TABLE1.fi_nr
to this:
LEFT OUTER JOIN (
  SELECT rc0531.fi_nr, rc0531.fklz, rvc_status, rc0531.lfdnr, txt
  FROM rc0531
  INNER JOIN (SELECT fi_nr, fklz, MAX(lfdnr) lfdnr
              FROM rc0531
              GROUP BY fi_nr, fklz) x
          ON x.fi_nr = rc0531.fi_nr
         AND x.fklz = rc0531.fklz
         AND x.lfdnr = rc0531.lfdnr
) SUB_TABLE1
ON F.fklz = SUB_TABLE1.fklz
AND F.fi_nr = SUB_TABLE1.fi_nr
Let me know if it got down to 0.1 sec; I think it will.
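A different rewrite worth benchmarking (my suggestion, not from the answer above) uses Oracle's ROW_NUMBER() analytic function, which finds the latest row per (fi_nr, fklz) in a single pass over each rc05xx table:

```sql
-- Sketch: latest row per (fi_nr, fklz) via ROW_NUMBER(), replacing both
-- the correlated subquery and the GROUP BY self-join. Repeat per table.
LEFT OUTER JOIN (
  SELECT fi_nr, fklz, rvc_status, lfdnr, txt
  FROM (SELECT fi_nr, fklz, rvc_status, lfdnr, txt,
               ROW_NUMBER() OVER (PARTITION BY fi_nr, fklz
                                  ORDER BY lfdnr DESC) rn
        FROM rc0531)
  WHERE rn = 1
) SUB_TABLE1
ON  F.fklz  = SUB_TABLE1.fklz
AND F.fi_nr = SUB_TABLE1.fi_nr
```

Compare the plans; on some data distributions the GROUP BY join wins, on others the single-pass analytic does.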
I have a requirement to read from 2 tables; once read, I have to update the flag on both tables.
My SQL query:
SELECT t1.KUNNR,t1.SETT_KEY,t1.QUART_START,t1.QUART_END,t2.PAY_METH,t2.MAT_NDC,t2.AMOUNT
FROM TSAP_REBATE_MEDI t1
INNER JOIN TSAP_REBATE_LINE t2 ON t1.KUNNR=t2.KUNNR AND t1.SETT_KEY=t2.SETT_KEY
WHERE t1.PROCESSING_STATUS = 'N' AND t2.PROCESSING_STATUS = 'N'
This works fine; now I need an update query for the same rows that sets PROCESSING_STATUS to 'P' in both tables.
You cannot update two tables in a single statement. Run two separate UPDATE statements of the following nature:
UPDATE t1
SET COLUMN = VALUE
FROM TSAP_REBATE_MEDI t1
INNER JOIN TSAP_REBATE_LINE t2
ON t1.KUNNR=t2.KUNNR
AND t1.SETT_KEY=t2.SETT_KEY
WHERE t1.PROCESSING_STATUS = 'N'
AND t2.PROCESSING_STATUS = 'N'
/* Add any other conditions */
However, if you want them both to be updated (or neither one), wrap both updates in a BEGIN TRANSACTION - COMMIT
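A sketch of that pattern in T-SQL (the #keys temp table name is made up). Capturing the qualifying keys first keeps both updates pointed at the same rows, since running the first update would otherwise flip the flag that the second update's join predicate checks:

```sql
BEGIN TRANSACTION;

-- Capture the qualifying key pairs once, while both flags are still 'N'.
SELECT DISTINCT t1.KUNNR, t1.SETT_KEY
INTO   #keys
FROM   TSAP_REBATE_MEDI t1
INNER JOIN TSAP_REBATE_LINE t2
       ON t1.KUNNR = t2.KUNNR AND t1.SETT_KEY = t2.SETT_KEY
WHERE  t1.PROCESSING_STATUS = 'N'
  AND  t2.PROCESSING_STATUS = 'N';

UPDATE t1 SET PROCESSING_STATUS = 'P'
FROM TSAP_REBATE_MEDI t1
INNER JOIN #keys k ON k.KUNNR = t1.KUNNR AND k.SETT_KEY = t1.SETT_KEY;

UPDATE t2 SET PROCESSING_STATUS = 'P'
FROM TSAP_REBATE_LINE t2
INNER JOIN #keys k ON k.KUNNR = t2.KUNNR AND k.SETT_KEY = t2.SETT_KEY;

COMMIT;
```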
I'm learning MS SQL Server 2008 R2, so please excuse my ignorance.
This query takes 3 sec, and I would like to do it in less than 1 sec.
The query is only for testing purposes; in reality I would join on different fields.
select * from
(
select row_number() over(order by t1.id) as n, t1.id as id1, t2.id as id2, t3.id as id3, t4.id as id4, t5.id as id5
from dbo.Context t1
inner join dbo.Context t2 on t1.id = t2.test
inner join dbo.Context t3 on t2.id = t3.test
inner join dbo.Context t4 on t3.id = t4.test
inner join dbo.Context t5 on t4.id = t5.test
) as t
where t.n between 950000 and 950009;
I'm afraid this will get worse by the time I have several billion records in this table.
Do I need to enable multi-threading from configuration or something?
There is no real way to optimize the paging portion of such a query, the part that is
t.n between 950000 and 950009
Which is really
{ ROW_NUMBER } between 950000 and 950009
Without fully materializing the INNER JOINs, there is no way to accurately number the rows of the result. This is unlike Row_Number over a single table, where the Query Optimizer can sometimes just count the index keys and seek directly to the range.
The only thing you can do is ensure that the JOIN conditions are all fully indexed and have the indexes INCLUDE the columns that will be selected (so they become covering indexes). There's no point showing specifics, since those are not your real columns.
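That said, for the test schema in the question (dbo.Context self-joined on id = test), a covering index might look like this (the index name is made up):

```sql
-- Hypothetical name: index the join column and INCLUDE id so the five
-- self-joins can be resolved from the index alone, without key lookups.
CREATE NONCLUSTERED INDEX IX_Context_test
    ON dbo.Context (test)
    INCLUDE (id);
```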
Do I need to enable multi-threading from configuration or something?
By default parallelism is already turned on, so such a query will very likely gather the data in multiple streams.
I'd suggest creating the inner query as an indexed view and then running your paging off that. Since an indexed view has a real index on it, the same optimization tricks that work on tables can be used.
See the SQL Server documentation on indexed views for more information, including the limitations.