I have an Oracle query that runs once a month to get the details of processed orders. It takes painfully long to execute (more than thirty minutes), so I am trying to optimize it. I have decent knowledge of Oracle, and I will explain what I have tried so far; even so, it still takes around 20 minutes to complete. This is the query. The Oracle version is 11g.
SELECT store_typ, store_no, COUNT(order_no) FROM
(
SELECT DISTINCT(order_no), store.store_no, store.store_typ FROM
(
SELECT trx.order_no,trx.ADDED_DATE, odr.prod_typ, odr.store_no FROM daily_trx trx
LEFT OUTER JOIN
(
SELECT odr.order_no,odr.prod_typ,prod.store_no FROM order_main odr
LEFT OUTER JOIN ORDR_PROD_TYP prod
on odr.prod_typ = prod.prod_typ
) odr
ON trx.order_no= odr.order_no
) daily_orders ,
(SELECT store_no,store_typ FROM main_stores ) store
WHERE 1=1
and daily_orders.order_no !='NA'
and store.store_no = daily_orders.store_no
AND to_timestamp(to_char(daily_orders.ADDED_DATE,'DD-MM-YYYY HH24:MI:SS'),'DD-MM-YYYY HH24:MI:SS') >= to_date('01-05-2020 00:00:00','DD-MM-YYYY HH24:MI:SS')
AND to_timestamp(to_char(daily_orders.ADDED_DATE,'DD-MM-YYYY HH24:MI:SS'),'DD-MM-YYYY HH24:MI:SS') <= to_date('31-05-2020 23:59:59','DD-MM-YYYY HH24:MI:SS')
)
GROUP BY store_typ, store_no
Background
order_main - This table has over 4 million records
I introduced an index on the order_no column, which reduced the execution time.
My questions are as follows.
1) Will it help if I move the date validation inside the inner query, like this?
SELECT store_typ, store_no, COUNT(order_no) FROM
(
SELECT DISTINCT(order_no), store.store_no, store.store_typ FROM
(
SELECT trx.order_no,trx.ADDED_DATE, odr.prod_typ, odr.store_no FROM daily_trx trx
LEFT OUTER JOIN
(
SELECT odr.order_no,odr.prod_typ,prod.store_no FROM order_main odr
LEFT OUTER JOIN ORDR_PROD_TYP prod
on odr.prod_typ = prod.prod_typ
) odr
ON trx.order_no= odr.order_no
WHERE to_timestamp(to_char(trx.ADDED_DATE,'DD-MM-YYYY HH24:MI:SS'),'DD-MM-YYYY HH24:MI:SS') >= to_date('01-05-2020 00:00:00','DD-MM-YYYY HH24:MI:SS')
AND to_timestamp(to_char(trx.ADDED_DATE,'DD-MM-YYYY HH24:MI:SS'),'DD-MM-YYYY HH24:MI:SS') <= to_date('31-05-2020 23:59:59','DD-MM-YYYY HH24:MI:SS')
) daily_orders ,
(SELECT store_no,store_typ FROM main_stores ) store
WHERE 1=1
and daily_orders.order_no !='NA'
and store.store_no = daily_orders.store_no
--AND to_timestamp(to_char(daily_orders.ADDED_DATE,'DD-MM-YYYY HH24:MI:SS'),'DD-MM-YYYY HH24:MI:SS') >= to_date('01-05-2020 00:00:00','DD-MM-YYYY HH24:MI:SS')
--AND to_timestamp(to_char(daily_orders.ADDED_DATE,'DD-MM-YYYY HH24:MI:SS'),'DD-MM-YYYY HH24:MI:SS') <= to_date('31-05-2020 23:59:59','DD-MM-YYYY HH24:MI:SS')
)
GROUP BY store_typ, store_no
2) Could someone please suggest any other improvements that can be made to this query?
3) Would additional indexes on any other tables or columns help? Only the daily_trx and order_main tables contain a huge amount of data.
Some general suggestions
Do not combine ANSI and Oracle join syntax in one query
Do not use an outer join if an inner join can be used
Your inner subqueries use outer joins, but the final join to main_stores is an inner join,
which eliminates all rows where store_no is null, so you can use inner joins and get the same result.
Filter rows early
It is a suboptimal practice to join in a subquery first and only then filter the relevant rows with WHERE conditions
Use simple predicates
If you want to constrain a DATE column, do it this way
trx.ADDED_DATE >= to_date('01-05-2020 00:00:00','DD-MM-YYYY HH24:MI:SS')
Use count distinct if appropriate
The SELECT DISTINCT query in the third line can be eliminated if you use COUNT(DISTINCT order_no)
Applying all the above points, I arrive at the following query
select
store.store_no, store.store_typ, count(DISTINCT trx.order_no) order_no_cnt
from daily_trx trx
join order_main odr on trx.order_no = odr.order_no
join ordr_prod_typ prod on odr.prod_typ = prod.prod_typ
join main_stores store on store.store_no = prod.store_no
where trx.ADDED_DATE >= date'2020-05-01' and
trx.ADDED_DATE < date'2020-06-01' and
trx.order_no !='NA'
group by store.store_no, store.store_typ
Performance Considerations
You process a month of data, so there will probably be a large number of transactions (say 100K+). In this case the best approach is to full scan the two large tables and perform HASH JOINs.
You can expect this execution plan
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 199K| 5850K| | 592 (2)| 00:00:08 |
|* 1 | HASH JOIN | | 199K| 5850K| | 592 (2)| 00:00:08 |
| 2 | TABLE ACCESS FULL | MAIN_STORES | 26 | 104 | | 3 (0)| 00:00:01 |
|* 3 | HASH JOIN | | 199K| 5070K| | 588 (2)| 00:00:08 |
| 4 | TABLE ACCESS FULL | ORDR_PROD_TYP | 26 | 104 | | 3 (0)| 00:00:01 |
|* 5 | HASH JOIN | | 199K| 4290K| 1960K| 584 (1)| 00:00:08 |
|* 6 | TABLE ACCESS FULL| ORDER_MAIN | 100K| 782K| | 69 (2)| 00:00:01 |
|* 7 | TABLE ACCESS FULL| DAILY_TRX | 200K| 2734K| | 172 (2)| 00:00:03 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("STORE"."STORE_NO"="PROD"."STORE_NO")
3 - access("ODR"."PROD_TYP"="PROD"."PROD_TYP")
5 - access("TRX"."ORDER_NO"="ODR"."ORDER_NO")
6 - filter("ODR"."ORDER_NO"<>'NA')
7 - filter("TRX"."ADDED_DATE"<TO_DATE(' 2020-06-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') AND "TRX"."ORDER_NO"<>'NA' AND "TRX"."ADDED_DATE">=TO_DATE(' 2020-05-01
00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
If you have the partitioning option available, you will profit massively from defining a monthly partitioning scheme (or alternatively daily partitioning) on the two tables DAILY_TRX and ORDER_MAIN.
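As a rough sketch of the monthly partitioning idea for DAILY_TRX, assuming the Partitioning option is licensed and that ADDED_DATE is a DATE column. The table name daily_trx_part, the column list, and the initial boundary date are hypothetical and not taken from the question:

```sql
-- Hypothetical table: names and column types are assumptions for illustration.
-- Interval partitioning (available since 11g) creates one partition per month
-- automatically as rows arrive.
CREATE TABLE daily_trx_part (
  order_no   VARCHAR2(30),
  added_date DATE NOT NULL
  -- remaining columns of daily_trx would follow here
)
PARTITION BY RANGE (added_date)
INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
(
  -- Rows older than this boundary all land in the initial partition.
  PARTITION p_initial VALUES LESS THAN (DATE '2020-01-01')
);

-- A query restricted to one month should then be able to prune to one partition.
SELECT COUNT(*)
FROM   daily_trx_part
WHERE  added_date >= DATE '2020-05-01'
AND    added_date <  DATE '2020-06-01';
```

ORDER_MAIN would need a suitable date column of its own to be partitioned the same way, which the question does not show.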
If the above assumption is not correct and you have very few transactions in the selected time interval (say below 1K), you will do better with index access and NESTED LOOPS joins.
You will need this set of indices
create index daily_trx_date on daily_trx(ADDED_DATE);
create unique index order_main_idx on order_main (order_no);
create unique index ORDR_PROD_TYP_idx1 on ORDR_PROD_TYP(prod_typ);
create unique index main_stores_idx1 on main_stores(store_no);
The expected plan is as follows
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 92 | 2760 | 80 (4)| 00:00:01 |
|* 1 | HASH JOIN | | 92 | 2760 | 80 (4)| 00:00:01 |
|* 2 | TABLE ACCESS BY INDEX ROWID | DAILY_TRX | 92 | 1288 | 4 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | DAILY_TRX_DATE | 92 | | 3 (0)| 00:00:01 |
|* 4 | HASH JOIN | | 100K| 1564K| 75 (3)| 00:00:01 |
| 5 | MERGE JOIN | | 26 | 208 | 6 (17)| 00:00:01 |
| 6 | TABLE ACCESS BY INDEX ROWID| MAIN_STORES | 26 | 104 | 2 (0)| 00:00:01 |
| 7 | INDEX FULL SCAN | MAIN_STORES_IDX1 | 26 | | 1 (0)| 00:00:01 |
|* 8 | SORT JOIN | | 26 | 104 | 4 (25)| 00:00:01 |
| 9 | TABLE ACCESS FULL | ORDR_PROD_TYP | 26 | 104 | 3 (0)| 00:00:01 |
|* 10 | TABLE ACCESS FULL | ORDER_MAIN | 100K| 782K| 69 (2)| 00:00:01 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("TRX"."ORDER_NO"="ODR"."ORDER_NO")
2 - filter("TRX"."ORDER_NO"<>'NA')
3 - access("TRX"."ADDED_DATE">=TO_DATE(' 2020-06-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') AND "TRX"."ADDED_DATE"<TO_DATE(' 2020-07-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss'))
4 - access("ODR"."PROD_TYP"="PROD"."PROD_TYP")
8 - access("STORE"."STORE_NO"="PROD"."STORE_NO")
filter("STORE"."STORE_NO"="PROD"."STORE_NO")
10 - filter("ODR"."ORDER_NO"<>'NA')
Check here how to get the execution plan of your query
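For reference, a minimal sketch of two common ways to obtain a plan, using the rewritten query from above. DBMS_XPLAN.DISPLAY shows the optimizer's estimate. DBMS_XPLAN.DISPLAY_CURSOR shows the plan actually used by the last statement run in the session, and assumes you have read access to the relevant V$ views:

```sql
-- Estimated plan: the statement is parsed and costed, but not executed.
EXPLAIN PLAN FOR
select store.store_no, store.store_typ, count(DISTINCT trx.order_no) order_no_cnt
from daily_trx trx
join order_main odr on trx.order_no = odr.order_no
join ordr_prod_typ prod on odr.prod_typ = prod.prod_typ
join main_stores store on store.store_no = prod.store_no
where trx.ADDED_DATE >= date'2020-05-01' and
      trx.ADDED_DATE <  date'2020-06-01' and
      trx.order_no != 'NA'
group by store.store_no, store.store_typ;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- Actual plan: run the statement with the hint, then display the cursor
-- of the last statement executed in this session.
select /*+ GATHER_PLAN_STATISTICS */
       store.store_no, store.store_typ, count(DISTINCT trx.order_no) order_no_cnt
from daily_trx trx
join order_main odr on trx.order_no = odr.order_no
join ordr_prod_typ prod on odr.prod_typ = prod.prod_typ
join main_stores store on store.store_no = prod.store_no
where trx.ADDED_DATE >= date'2020-05-01' and
      trx.ADDED_DATE <  date'2020-06-01' and
      trx.order_no != 'NA'
group by store.store_no, store.store_typ;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));
```

Comparing the estimated rows with the actual rows per step is usually the quickest way to see where the optimizer's assumptions go wrong.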
Related
Why does Oracle sometimes return the wrong ORA_ROWSCN, such as in the following? (Note this does not seem to be a ROWDEPENDENCIES issue or a "greater than expected SCN" issue, as I realize both these caveats when using ORA_ROWSCN.)
When I run:
WITH maxIds as (
SELECT table_name, record_rowid, MAX(changed_rows_log_id) AS changed_rows_log_id, ORA_ROWSCN as otherSCN
FROM changed_rows_log
GROUP BY table_name, record_rowid
)
SELECT l.changed_rows_log_id, l.ORA_ROWSCN, otherSCN, l.table_name, l.record_rowid
FROM changed_rows_log l
JOIN maxIds m on l.changed_rows_log_id = m.changed_rows_log_id and l.table_name=m.table_name and l.record_rowid=m.record_rowid
WHERE ORA_ROWSCN > 7884576380618
My result is:
CHANGED_ROWS_LOG_ID ORA_ROWSCN OTHERSCN TABLE_NAME RECORD_ROWID
1887507 7884576380617 7884576380617 FOO AAARiGAAMAAG4B4AA2
1887508 7884576380617 7884576380617 FOO AAARiGAAMAAG4B4AA3
1887512 7884576380617 7884576380617 FOO AAARiGAAMAAG4B4AA7
...
Yep, you see that right. The ORA_ROWSCN returned is less than the literal value that the WHERE clause requires it to exceed. (I also included otherSCN to see if it was throwing me off somehow, but it appears to be irrelevant.)
It appears that the Row in question in reality has a higher ORA_ROWSCN, and indeed the WHERE clause worked properly, as when I then do SELECT ORA_ROWSCN FROM changed_rows_log WHERE changed_rows_log_id=1887507, I get 7884576380644 not 7884576380617.
Also, when I add just one WHERE condition, I also get the correct data returned:
WITH maxIds as (
SELECT table_name, record_rowid, MAX(changed_rows_log_id) AS changed_rows_log_id, ORA_ROWSCN as otherSCN
FROM changed_rows_log
GROUP BY table_name, record_rowid
)
SELECT l.changed_rows_log_id, l.ORA_ROWSCN, otherSCN, l.table_name, l.record_rowid
FROM changed_rows_log l
JOIN maxIds m on l.changed_rows_log_id = m.changed_rows_log_id and l.table_name=m.table_name and l.record_rowid=m.record_rowid
WHERE ORA_ROWSCN > 7884576380618 AND l.changed_rows_log_id=1887507
gives me this, as expected
CHANGED_ROWS_LOG_ID ORA_ROWSCN OTHERSCN TABLE_NAME RECORD_ROWID
1887507 7884576380644 7884576380644 FOO AAARiGAAMAAG4B4AA2
So why, and how, can SELECT ORA_ROWSCN return simply incorrect data like this? Can I work around it somehow, so that I get the ORA_ROWSCN that the more specific queries return?
(If it matters, changed_rows_log has ROWDEPENDENCIES enabled. I'm using Oracle Database 12.1.0.2.0 64-bit.)
More detail--the EXPLAIN PLAN for the first query (with bad value)
Plan hash value: 3153795477
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 62 | | 30794 (1)| 00:00:02 |
|* 1 | FILTER | | | | | | |
| 2 | HASH GROUP BY | | 1 | 62 | | 30794 (1)| 00:00:02 |
|* 3 | HASH JOIN | | 208K| 12M| 3424K| 30787 (1)| 00:00:02 |
|* 4 | TABLE ACCESS FULL| CHANGED_ROWS_LOG | 71438 | 2581K| | 14052 (1)| 00:00:01 |
| 5 | TABLE ACCESS FULL| CHANGED_ROWS_LOG | 1428K| 34M| | 14058 (1)| 00:00:01 |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("L"."CHANGED_ROWS_LOG_ID"=MAX("CHANGED_ROWS_LOG_ID"))
3 - access("L"."TABLE_NAME"="TABLE_NAME" AND "L"."RECORD_ROWID"="RECORD_ROWID")
4 - filter("ORA_ROWSCN">7884576380618)
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
- this is an adaptive plan
- 2 Sql Plan Directives used for this statement
And the last query above (correct value)
Plan hash value: 402632295
---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 62 | 7 (15)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | HASH GROUP BY | | 1 | 62 | 7 (15)| 00:00:01 |
| 3 | NESTED LOOPS | | 3 | 186 | 6 (0)| 00:00:01 |
|* 4 | TABLE ACCESS BY INDEX ROWID | CHANGED_ROWS_LOG | 1 | 37 | 3 (0)| 00:00:01 |
|* 5 | INDEX UNIQUE SCAN | SYS_C00141068 | 1 | | 2 (0)| 00:00:01 |
| 6 | TABLE ACCESS BY INDEX ROWID BATCHED| CHANGED_ROWS_LOG | 3 | 75 | 3 (0)| 00:00:01 |
|* 7 | INDEX RANGE SCAN | CHANGED_ROWS_LOG_IF1 | 1 | | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(MAX("CHANGED_ROWS_LOG_ID")=1887507)
4 - filter("ORA_ROWSCN">7884576380618)
5 - access("L"."CHANGED_ROWS_LOG_ID"=1887507)
7 - access("L"."RECORD_ROWID"="RECORD_ROWID" AND "L"."TABLE_NAME"="TABLE_NAME")
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
- 1 Sql Plan Directive used for this statement
Adding the MATERIALIZE hint to the WITH subquery overcomes this issue. I'd love it if someone could explain why the issue happens at all, but for now:
WITH maxIds as (
SELECT /*+ MATERIALIZE */ table_name, record_rowid, MAX(changed_rows_log_id) AS changed_rows_log_id, ORA_ROWSCN as otherSCN
FROM changed_rows_log
GROUP BY table_name, record_rowid
)
SELECT l.changed_rows_log_id, l.ORA_ROWSCN, otherSCN, l.table_name, l.record_rowid
FROM changed_rows_log l
JOIN maxIds m on l.changed_rows_log_id = m.changed_rows_log_id and l.table_name=m.table_name and l.record_rowid=m.record_rowid
WHERE ORA_ROWSCN > 7884576380618
What is the best practice for converting the following SQL statement, which uses subquery factoring (a with data as clause), so that it can be used in a database view?
AFAIK the with data as clause is not supported in database views (Edited: Oracle does support Common Table Expressions in views), but in my case subquery factoring offers a performance advantage. If I create a database view using a Common Table Expression, then this advantage is lost.
Please have a look at my example:
Description of query
a_table
Millions of entries; the select statement selects a few thousand of them.
anchor_table
For each entry in a_table there is a corresponding entry in anchor_table. This table determines exactly one row as the anchor at runtime. See the example below.
horizon_table
For each selection, exactly one entry is determined at runtime (all entries of a selection from a_table have the same horizon_id).
Please note: this is a strongly simplified SQL statement that works fine so far.
In reality, more than 20 tables are joined together to get the resulting data.
The where clause is much more complex.
Further columns of horizon_table and anchor_table are required to prepare my where condition and result list in the subquery, i.e. moving these tables to the main query is not a solution.
with data as (
select
a_table.id,
a_table.descr,
horizon_table.offset,
case
when anchor_table.a_date = trunc(sysdate) then
1
else
0
end as anchor,
row_number() over(
order by a_table.a_position_field) as position
from a_table
join anchor_table on (anchor_table.id = a_table.anchor_id)
join horizon_table on (horizon_table.id = a_table.horizon_id)
where a_table.a_value between 1 and 10000
)
select *
from data d
where d.position between (
select d1.position - d.offset
from data d1
where d1.anchor = 1)
and (
select d2.position + d.offset
from data d2
where d2.anchor = 1)
example of with data as select:
id  descr  offset  anchor  position
1   bla    3       0       1
2   blab   3       0       2
5   dfkdj  3       0       3
4   dld    3       0       4
6   oeroe  3       1       5
3   blab   3       0       6
9   dfkdj  3       0       7
14  dld    3       0       8
54  oeroe  3       0       9
...
result of select * from data
id  descr  offset  anchor  position
2   blab   3       0       2
5   dfkdj  3       0       3
4   dld    3       0       4
6   oeroe  3       1       5
3   blab   3       0       6
9   dfkdj  3       0       7
14  dld    3       0       8
I.e. the result is the anchor row and the three rows above and below it.
How can I achieve the same within a database view?
My attempt failed, as I expected, because of performance issues:
Create a view data from the with data as select above
Use this view as above
select *
from data d
where d.position between (
select d1.position - d.offset
from data d1
where d1.anchor = 1)
and (
select d2.position + d.offset
from data d2
where d2.anchor = 1)
Thank you for any advice :-)
Amendment
If I create a view as recommended in the first comment, then I get the same performance issue. Oracle does not use the subquery to restrict the results.
Here are the execution plans of my production queries (please click at the images)
a) SQL
b) View
Here are the execution plans of my test cases
-- Create Testdata table with ~ 1,000,000 entries
insert into a_table
(id, descr, a_position_field, anchor_id, horizon_id, a_value)
select level, 'data' || level, mod(level, 10), level, 1, level
from dual
connect by level <= 999999;
insert into anchor_table
(id, a_date)
select level, trunc(sysdate) - 500000 + level
from dual
connect by level <= 999999;
insert into horizon_table (id, offset) values (1, 50);
commit;
-- Create view
create or replace view testdata_vw as
with data as
(select a_table.id,
a_table.descr,
a_table.a_value,
horizon_table.offset,
case
when anchor_table.a_date = trunc(sysdate) then
1
else
0
end as anchor,
row_number() over(order by a_table.a_position_field) as position
from a_table
join anchor_table
on (anchor_table.id = a_table.anchor_id)
join horizon_table
on (horizon_table.id = a_table.horizon_id))
select *
from data d
where d.position between
(select d1.position - d.offset from data d1 where d1.anchor = 1) and
(select d2.position + d.offset from data d2 where d2.anchor = 1);
-- Explain plan of subquery factoring select statement
explain plan for
with data as
(select a_table.id,
a_table.descr,
a_value,
horizon_table.offset,
case
when anchor_table.a_date = trunc(sysdate) then
1
else
0
end as anchor,
row_number() over(order by a_table.a_position_field) as position
from a_table
join anchor_table
on (anchor_table.id = a_table.anchor_id)
join horizon_table
on (horizon_table.id = a_table.horizon_id)
where a_table.a_value between 500000 - 500 and 500000 + 500)
select *
from data d
where d.position between
(select d1.position - d.offset from data d1 where d1.anchor = 1) and
(select d2.position + d.offset from data d2 where d2.anchor = 1);
select plan_table_output
from table(dbms_xplan.display('plan_table', null, null));
/*
Note: Size of SYS_TEMP_0FD9D6628_284C5768 ~ 1000 rows
Plan hash value: 1145408420
----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 62 | 1791 (2)| 00:00:31 |
| 1 | TEMP TABLE TRANSFORMATION | | | | | |
| 2 | LOAD AS SELECT | SYS_TEMP_0FD9D6628_284C5768 | | | | |
| 3 | WINDOW SORT | | 57 | 6840 | 1785 (2)| 00:00:31 |
|* 4 | HASH JOIN | | 57 | 6840 | 1784 (2)| 00:00:31 |
|* 5 | TABLE ACCESS FULL | A_TABLE | 57 | 4104 | 1193 (2)| 00:00:21 |
| 6 | MERGE JOIN CARTESIAN | | 1189K| 54M| 586 (2)| 00:00:10 |
| 7 | TABLE ACCESS FULL | HORIZON_TABLE | 1 | 26 | 3 (0)| 00:00:01 |
| 8 | BUFFER SORT | | 1189K| 24M| 583 (2)| 00:00:10 |
| 9 | TABLE ACCESS FULL | ANCHOR_TABLE | 1189K| 24M| 583 (2)| 00:00:10 |
|* 10 | FILTER | | | | | |
| 11 | VIEW | | 57 | 3534 | 2 (0)| 00:00:01 |
| 12 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6628_284C5768 | 57 | 4104 | 2 (0)| 00:00:01 |
|* 13 | VIEW | | 57 | 912 | 2 (0)| 00:00:01 |
| 14 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6628_284C5768 | 57 | 4104 | 2 (0)| 00:00:01 |
|* 15 | VIEW | | 57 | 912 | 2 (0)| 00:00:01 |
| 16 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6628_284C5768 | 57 | 4104 | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("HORIZON_TABLE"."ID"="A_TABLE"."HORIZON_ID" AND
"ANCHOR_TABLE"."ID"="A_TABLE"."ANCHOR_ID")
5 - filter("A_TABLE"."A_VALUE">=499500 AND "A_TABLE"."A_VALUE"<=500500)
10 - filter("D"."POSITION">= (SELECT "D1"."POSITION"-:B1 FROM (SELECT + CACHE_TEMP_TABLE
("T1") "C0" "ID","C1" "DESCR","C2" "A_VALUE","C3" "OFFSET","C4" "ANCHOR","C5" "POSITION" FROM
"SYS"."SYS_TEMP_0FD9D6628_284C5768" "T1") "D1" WHERE "D1"."ANCHOR"=1) AND "D"."POSITION"<=
(SELECT "D2"."POSITION"+:B2 FROM (SELECT + CACHE_TEMP_TABLE ("T1") "C0" "ID","C1"
"DESCR","C2" "A_VALUE","C3" "OFFSET","C4" "ANCHOR","C5" "POSITION" FROM
"SYS"."SYS_TEMP_0FD9D6628_284C5768" "T1") "D2" WHERE "D2"."ANCHOR"=1))
13 - filter("D1"."ANCHOR"=1)
15 - filter("D2"."ANCHOR"=1)
Note
-----
- dynamic sampling used for this statement (level=4)
*/
-- Explain plan of database view
explain plan for
select *
from testdata_vw
where a_value between 500000 - 500 and 500000 + 500;
select plan_table_output
from table(dbms_xplan.display('plan_table', null, null));
/*
Note: Size of SYS_TEMP_0FD9D662A_284C5768 ~ 1000000 rows
Plan hash value: 1422141561
-------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2973 | 180K| | 50324 (1)| 00:14:16 |
| 1 | VIEW | TESTDATA_VW | 2973 | 180K| | 50324 (1)| 00:14:16 |
| 2 | TEMP TABLE TRANSFORMATION | | | | | | |
| 3 | LOAD AS SELECT | SYS_TEMP_0FD9D662A_284C5768 | | | | | |
| 4 | WINDOW SORT | | 1189K| 136M| 147M| 37032 (1)| 00:10:30 |
|* 5 | HASH JOIN | | 1189K| 136M| | 6868 (1)| 00:01:57 |
| 6 | TABLE ACCESS FULL | HORIZON_TABLE | 1 | 26 | | 3 (0)| 00:00:01 |
|* 7 | HASH JOIN | | 1189K| 106M| 38M| 6860 (1)| 00:01:57 |
| 8 | TABLE ACCESS FULL | ANCHOR_TABLE | 1189K| 24M| | 583 (2)| 00:00:10 |
| 9 | TABLE ACCESS FULL | A_TABLE | 1209K| 83M| | 1191 (2)| 00:00:21 |
|* 10 | FILTER | | | | | | |
|* 11 | VIEW | | 1189K| 70M| | 4431 (1)| 00:01:16 |
| 12 | TABLE ACCESS FULL | SYS_TEMP_0FD9D662A_284C5768 | 1189K| 81M| | 4431 (1)| 00:01:16 |
|* 13 | VIEW | | 1189K| 18M| | 4431 (1)| 00:01:16 |
| 14 | TABLE ACCESS FULL | SYS_TEMP_0FD9D662A_284C5768 | 1189K| 81M| | 4431 (1)| 00:01:16 |
|* 15 | VIEW | | 1189K| 18M| | 4431 (1)| 00:01:16 |
| 16 | TABLE ACCESS FULL | SYS_TEMP_0FD9D662A_284C5768 | 1189K| 81M| | 4431 (1)| 00:01:16 |
-------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("HORIZON_TABLE"."ID"="A_TABLE"."HORIZON_ID")
7 - access("ANCHOR_TABLE"."ID"="A_TABLE"."ANCHOR_ID")
10 - filter("D"."POSITION">= (SELECT "D1"."POSITION"-:B1 FROM (SELECT + CACHE_TEMP_TABLE ("T1")
"C0" "ID","C1" "DESCR","C2" "A_VALUE","C3" "OFFSET","C4" "ANCHOR","C5" "POSITION" FROM
"SYS"."SYS_TEMP_0FD9D662A_284C5768" "T1") "D1" WHERE "D1"."ANCHOR"=1) AND "D"."POSITION"<= (SELECT
"D2"."POSITION"+:B2 FROM (SELECT + CACHE_TEMP_TABLE ("T1") "C0" "ID","C1" "DESCR","C2"
"A_VALUE","C3" "OFFSET","C4" "ANCHOR","C5" "POSITION" FROM "SYS"."SYS_TEMP_0FD9D662A_284C5768" "T1") "D2"
WHERE "D2"."ANCHOR"=1))
11 - filter("A_VALUE">=499500 AND "A_VALUE"<=500500)
13 - filter("D1"."ANCHOR"=1)
15 - filter("D2"."ANCHOR"=1)
Note
-----
- dynamic sampling used for this statement (level=4)
*/
sqlfiddle
explain plan of sql http://www.sqlfiddle.com/#!4/6a7022/3
explain plan of view http://www.sqlfiddle.com/#!4/6a7022/2
You need to write a view definition which returns all possible selectable ranges of a_value as two columns, start_a_value and end_a_value, along with all records which fall into each start/end range. In other words, the correct view definition should logically describe a |n^3| result set given n rows in a_table.
Then query that view as:
SELECT * FROM testdata_vw WHERE START_A_VALUE = 4950 AND END_A_VALUE = 5050;
Also, your multiple references to "data" are unnecessary; the same logic can be delivered with an additional analytic function.
Final view def:
CREATE OR REPLACE VIEW testdata_vw AS
SELECT *
FROM
(
SELECT T.*,
MAX(CASE WHEN ANCHOR=1 THEN POSITION END)
OVER (PARTITION BY START_A_VALUE, END_A_VALUE) ANCHOR_POS
FROM
(
SELECT S.A_VALUE START_A_VALUE,
E.A_VALUE END_A_VALUE,
B.ID ID,
B.DESCR DESCR,
HORIZON_TABLE.OFFSET OFFSET,
CASE
WHEN ANCHOR_TABLE.A_DATE = TRUNC(SYSDATE)
THEN 1
ELSE 0
END ANCHOR,
ROW_NUMBER()
OVER(PARTITION BY S.A_VALUE, E.A_VALUE
ORDER BY B.A_POSITION_FIELD) POSITION
FROM
A_TABLE S
JOIN A_TABLE E
ON S.A_VALUE<E.A_VALUE
JOIN A_TABLE B
ON B.A_VALUE BETWEEN S.A_VALUE AND E.A_VALUE
JOIN ANCHOR_TABLE
ON ANCHOR_TABLE.ID = B.ANCHOR_ID
JOIN HORIZON_TABLE
ON HORIZON_TABLE.ID = B.HORIZON_ID
) T
) T
WHERE POSITION BETWEEN ANCHOR_POS - OFFSET AND ANCHOR_POS+OFFSET;
EDIT: SQL Fiddle with expected execution plan
I'm seeing the same (sensible) plan here that I saw in my database; if you're getting something different, please send fiddle link.
1) Use index lookup to find 1 row in "S" A_TABLE (A_VALUE = 4950)
2) Use index lookup to find 1 row in "E" A_TABLE (A_VALUE = 5050)
3) Nested Loop join #1 and #2 (1 x 1 join, still 1 row)
4) FTS 1 row from HORIZON table
5) Cartesian join #3 and #4 (1 x 1, okay to use Cartesian).
6) Use index lookup to find ~100 rows in "B" A_TABLE with values between 4950 and 5050.
7) Cartesian join #5 and #6 (1 x 102, okay to use Cartesian).
8) FTS ANCHOR_TABLE with hash join to #7.
9) Window-sort for analytic functions
You have a predicate outside the view and you want it to be applied inside the view.
For this, you can use the push_pred hint:
select /*+PUSH_PRED(v)*/
*
from
testdata_vw v
where
a_value between 5000 - 50 and 5000 + 50;
SQLFIDDLE
EDIT: Now I see that you use the data subquery three times. For the first occurrence it makes sense to push the predicate, but for d1 and d2 it doesn't. It's a different query.
What I would do is use two context variables, set them according to my needs, and write the query:
SYS_CONTEXT('my_context_name', 'var5000');
create or replace view testdata_vw as
with data as (
select
a_table.id,
a_table.descr,
horizon_table.offset,
case
when anchor_table.a_date = trunc(sysdate) then
1
else
0
end as anchor,
row_number() over(
order by a_table.a_position_field) as position
from a_table
join anchor_table on (anchor_table.id = a_table.anchor_id)
join horizon_table on (horizon_table.id = a_table.horizon_id)
where a_table.a_value between SYS_CONTEXT('my_context_name', 'var5000') - SYS_CONTEXT('my_context_name', 'var50') and SYS_CONTEXT('my_context_name', 'var5000') + SYS_CONTEXT('my_context_name', 'var50')
)
select *
from data d
where d.position between (
select d1.position - d.offset
from data d1
where d1.anchor = 1)
and (
select d2.position + d.offset
from data d2
where d2.anchor = 1) ;
to use it:
dbms_session.set_context ('my_context_name', 'var5000', 5000);
dbms_session.set_context ('my_context_name', 'var50', 50);
select * from testdata_vw;
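Note that DBMS_SESSION.SET_CONTEXT can only be called from the package that the context was created with, so the two calls above need a wrapper. A minimal sketch, assuming the CREATE ANY CONTEXT privilege; the package name my_context_pkg and the procedure set_range are hypothetical and not part of the original answer:

```sql
-- Bind the context namespace to a trusted package (hypothetical name).
CREATE OR REPLACE CONTEXT my_context_name USING my_context_pkg;

CREATE OR REPLACE PACKAGE my_context_pkg AS
  PROCEDURE set_range(p_center IN NUMBER, p_width IN NUMBER);
END my_context_pkg;
/

CREATE OR REPLACE PACKAGE BODY my_context_pkg AS
  PROCEDURE set_range(p_center IN NUMBER, p_width IN NUMBER) IS
  BEGIN
    -- Context values are stored as strings; the view converts them back
    -- implicitly when they are used in arithmetic.
    DBMS_SESSION.SET_CONTEXT('my_context_name', 'var5000', TO_CHAR(p_center));
    DBMS_SESSION.SET_CONTEXT('my_context_name', 'var50',   TO_CHAR(p_width));
  END set_range;
END my_context_pkg;
/

-- Set the range for this session, then query the view.
BEGIN
  my_context_pkg.set_range(5000, 50);
END;
/

SELECT * FROM testdata_vw;
```

Because SYS_CONTEXT returns a string, wrapping each call in TO_NUMBER inside the view would make the conversion explicit.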
UPDATE: Instead of context variables (which can be used across sessions), you can use package variables, as you commented.
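A minimal sketch of the package-variable variant. Package variables cannot be referenced directly from SQL, so the view has to read them through getter functions. The package view_params and all of its members are hypothetical names chosen for illustration:

```sql
CREATE OR REPLACE PACKAGE view_params AS
  PROCEDURE set_range(p_center IN NUMBER, p_width IN NUMBER);
  FUNCTION get_center RETURN NUMBER;
  FUNCTION get_width  RETURN NUMBER;
END view_params;
/

CREATE OR REPLACE PACKAGE BODY view_params AS
  -- Package state is private to each session.
  g_center NUMBER;
  g_width  NUMBER;

  PROCEDURE set_range(p_center IN NUMBER, p_width IN NUMBER) IS
  BEGIN
    g_center := p_center;
    g_width  := p_width;
  END set_range;

  FUNCTION get_center RETURN NUMBER IS
  BEGIN
    RETURN g_center;
  END get_center;

  FUNCTION get_width RETURN NUMBER IS
  BEGIN
    RETURN g_width;
  END get_width;
END view_params;
/

BEGIN
  view_params.set_range(5000, 50);
END;
/

-- Inside the view, the predicate would then read:
--   where a_table.a_value between view_params.get_center - view_params.get_width
--                             and view_params.get_center + view_params.get_width
SELECT view_params.get_center - view_params.get_width AS range_start,
       view_params.get_center + view_params.get_width AS range_end
FROM   dual;
```

If the range is never set in a session, the getters return NULL and the view returns no rows, so callers need to set it first.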
I have 3 tables in an Oracle 11g database. I don't have access to the trace file or explain plan anymore. I join the 3 tables on the date field like this:
select * from a,b,c where a.date = b.date and b.date = c.date
and that takes forever.
When I run
select * from a,b,c where a.date = b.date and b.date = c.date and a.date = c.date
it's fast. But should that make a difference?
Not sure, but it looks like a transitive dependency; that is to say, if a.date = b.date and b.date = c.date then a.date = c.date. You can rewrite your query like this:
select a.*
from a
join b on a.date = b.date
join c on a.date = c.date;
I would also have an index on the date column in all 3 tables, since that's the column you are joining on.
Apparently the database does not rewrite queries when the joins are such that A = B, B = C ==> A = C, so it's stuck using what it's given.
Consider the following:
create table a (dt date);
create table b (dt date);
create table c (dt date);
Now fill in the tables so that a is the smallest (5 rows), b is the biggest (100 rows), and c is in the middle (50 rows). Also, so that not all rows in b and c will join to a just to make things a bit more interesting.
insert into a
select to_date('2015-01-01', 'yyyy-mm-dd') + rownum - 1
from dual
connect by level <= 5
;
insert into b
select to_date('2015-01-01', 'yyyy-mm-dd') + mod(rownum, 10)
from dual
connect by level <= 100
;
insert into c
select to_date('2015-01-01', 'yyyy-mm-dd') + mod(rownum, 10)
from dual
connect by level <= 50
;
I'm going to bypass statistics for now and leave it totally up to the database on how to figure out a plan.
Take 1: without the join from a to c:
explain plan for
select *
from a
, b
, c
where a.dt = b.dt
and b.dt = c.dt
;
and here's the plan:
select *
from table(dbms_xplan.display())
;
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 250 | 6750 | 9 (0)| 00:00:01 |
|* 1 | HASH JOIN | | 250 | 6750 | 9 (0)| 00:00:01 |
|* 2 | HASH JOIN | | 50 | 900 | 6 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| A | 5 | 45 | 3 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL| B | 100 | 900 | 3 (0)| 00:00:01 |
| 5 | TABLE ACCESS FULL | C | 50 | 450 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("B"."DT"="C"."DT")
2 - access("A"."DT"="B"."DT")
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
First off, since there were no statistics on the tables, Oracle chose to sample the data first so it wasn't going in blind. In this case, table a joins to b first, then the result of that joins to c.
Take 2: introduce the a.dt = c.dt condition:
explain plan for
select *
from a
, b
, c
where a.dt = b.dt
and b.dt = c.dt
and a.dt = c.dt
;
select *
from table(dbms_xplan.display())
;
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 25 | 675 | 9 (0)| 00:00:01 |
|* 1 | HASH JOIN | | 25 | 675 | 9 (0)| 00:00:01 |
|* 2 | HASH JOIN | | 25 | 450 | 6 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| A | 5 | 45 | 3 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL| C | 50 | 450 | 3 (0)| 00:00:01 |
| 5 | TABLE ACCESS FULL | B | 100 | 900 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."DT"="B"."DT" AND "B"."DT"="C"."DT")
2 - access("A"."DT"="C"."DT")
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
And there you go. The order of the joins has switched now that Oracle has been given the extra join path. (FYI, this is the same plan if using just a.dt = b.dt and a.dt = c.dt.)
BUT, notice anything? The estimates are not right anymore. It's guessing 25 rows in the end, not 250. So, the extra condition is actually causing some confusion.
Without the b.dt = c.dt, though, same join path, different estimates (same end result as the first one):
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 250 | 6750 | 9 (0)| 00:00:01 |
|* 1 | HASH JOIN | | 250 | 6750 | 9 (0)| 00:00:01 |
|* 2 | HASH JOIN | | 25 | 450 | 6 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| A | 5 | 45 | 3 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL| C | 50 | 450 | 3 (0)| 00:00:01 |
| 5 | TABLE ACCESS FULL | B | 100 | 900 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."DT"="B"."DT")
2 - access("A"."DT"="C"."DT")
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
Long story a little longer, since the database isn't going to assume any join paths for you, adding one in your query gives the database more options and as such can change its plan...and a change in plan can certainly affect how fast the results are returned.
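One way to check whether an estimate like the 25 above is really off is to run the statement and compare the optimizer's estimated row counts with the actual ones. A minimal sketch, assuming the same a, b and c tables and a session that is allowed to read the cursor views used by dbms_xplan.display_cursor (in SQL*Plus, SET SERVEROUTPUT OFF first, otherwise the "last" statement may not be yours):

```sql
-- Run the statement once with runtime row-source statistics enabled.
select /*+ gather_plan_statistics */ *
from a
, b
, c
where a.dt = b.dt
and b.dt = c.dt
and a.dt = c.dt
;

-- Show the plan of the last statement executed in this session.
-- E-Rows is the optimizer's estimate, A-Rows is what was actually returned.
select *
from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'))
;
```

If E-Rows and A-Rows differ widely at a join step, that step is a good place to start looking for a misleading predicate or for missing statistics.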
This is your query:
select * from a,b,c where a.date = b.date and b.date = c.date and a.date = c.date
In my view, it can be written as:
SELECT * FROM a
JOIN B USING(date)
JOIN C USING(date);
I have one table that is list-partitioned on a numeric column (row_id):
TABLEA (ROW_ID NUMERIC(38), TB_KEY NUMERIC(38), ROW_DATA VARCHAR(20));
Partition pruning works when I query the table with no joins:
SELECT A.* FROM TABLEA A
WHERE ROW_ID IN (SELECT ID FROM TABLEB WHERE DT_COL = SYSDATE);
Partition Pruning fails when I do left outer join to TableB
SELECT A.* FROM TABLEA A
LEFT OUTER JOIN TABLEB B ON A.TB_KEY = B.TB_KEY
WHERE ROW_ID IN (SELECT ID FROM TABLEB WHERE DT_COL = SYSDATE);
Partition Pruning works when I change left outer join to inner join
SELECT A.* FROM TABLEA A
INNER JOIN TABLEB B ON A.TB_KEY = B.TB_KEY
WHERE ROW_ID IN (SELECT ID FROM TABLEB WHERE DT_COL = SYSDATE);
Partition Pruning works when I do left outer join to TableB and do not use IN clause
SELECT A.* FROM TABLEA A
LEFT OUTER JOIN TABLEB B ON A.TB_KEY = B.TB_KEY
WHERE ROW_ID = 123;
Partition Pruning works when I do left outer join to TableB and use static values for IN clause
SELECT A.* FROM TABLEA A
LEFT OUTER JOIN TABLEB B ON A.TB_KEY = B.TB_KEY
WHERE ROW_ID IN (123, 345);
Can someone explain why a left outer join causes partition pruning to fail when I filter on the partitioning column using an IN clause fed by a subquery?
The answer on Oracle 11g is YES, the partition pruning works fine.
There are three main access patterns in your setup with the list-partitioned table TABLEA, so let's go through all of them. Note that I'm using the simplest possible statements to illustrate the behavior.
Access with keys in equal predicate or in IN List
The simplest case is using a literal in an equal predicate on the partition key:
SELECT A.* FROM TABLEA A
LEFT OUTER JOIN TABLEB B ON A.TB_KET = B.TB_KEY
WHERE A.ROW_ID = 123;
This leads to the following execution plan:
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 51 | 4 (0)| 00:00:01 | | |
|* 1 | HASH JOIN OUTER | | 1 | 51 | 4 (0)| 00:00:01 | | |
| 2 | PARTITION LIST SINGLE| | 1 | 38 | 2 (0)| 00:00:01 | KEY | KEY |
|* 3 | TABLE ACCESS FULL | TABLEA | 1 | 38 | 2 (0)| 00:00:01 | 2 | 2 |
| 4 | TABLE ACCESS FULL | TABLEB | 1 | 13 | 2 (0)| 00:00:01 | | |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."TB_KET"="B"."TB_KEY"(+))
3 - filter("A"."ROW_ID"=123)
Only the relevant partition of TABLEA is accessed (here partition #2); see the Pstart and Pstop columns.
Slightly more complicated, but similar, is the case with an IN list:
SELECT A.* FROM TABLEA A
LEFT OUTER JOIN TABLEB B ON A.TB_KET = B.TB_KEY
WHERE ROW_ID IN (123, 345);
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 51 | 4 (0)| 00:00:01 | | |
|* 1 | HASH JOIN OUTER | | 1 | 51 | 4 (0)| 00:00:01 | | |
| 2 | PARTITION LIST INLIST| | 1 | 38 | 2 (0)| 00:00:01 |KEY(I) |KEY(I) |
|* 3 | TABLE ACCESS FULL | TABLEA | 1 | 38 | 2 (0)| 00:00:01 |KEY(I) |KEY(I) |
| 4 | TABLE ACCESS FULL | TABLEB | 1 | 13 | 2 (0)| 00:00:01 | | |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."TB_KET"="B"."TB_KEY"(+))
3 - filter("A"."ROW_ID"=123 OR "A"."ROW_ID"=345)
In this case more than one partition can be accessed, but only the partitions that contain keys from the IN list are considered.
The same is valid for access using a bind variable.
Access with keys from a table using NESTED LOOPS
More complicated is the case where the two tables are joined. With a nested loops join, TABLEA is accessed once for each key from TABLEB. This means that for each key only the one partition where the key is located is accessed.
SELECT A.* FROM TABLEA A
WHERE ROW_ID IN (SELECT ID FROM TABLEB WHERE DT_COL = SYSDATE);
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 60 | 4 (25)| 00:00:01 | | |
| 1 | NESTED LOOPS | | 1 | 60 | 4 (25)| 00:00:01 | | |
| 2 | SORT UNIQUE | | 1 | 22 | 2 (0)| 00:00:01 | | |
|* 3 | TABLE ACCESS FULL | TABLEB | 1 | 22 | 2 (0)| 00:00:01 | | |
| 4 | PARTITION LIST ITERATOR| | 100K| 3710K| 1 (0)| 00:00:01 | KEY | KEY |
|* 5 | TABLE ACCESS FULL | TABLEA | 100K| 3710K| 1 (0)| 00:00:01 | KEY | KEY |
---------------------------------------------------------------------------------------------------
Again there is partition pruning (KEY - KEY), so only partitions holding a key from TABLEB are accessed. Because of the nature of nested loops, though, one partition can be accessed several times (once for each different key).
Access with keys from a table using HASH JOIN
Using HASH JOIN is the most complicated case, because the partition pruning must happen before the join starts. This is where the Bloom filter comes in.
How does it work? After scanning TABLEB, Oracle knows all relevant keys from it. Those keys can be mapped to the relevant partitions, and a Bloom filter (BF) of those partitions is created (operations 3 and 2).
The BF is passed to TABLEA and is used for partition pruning on it (operations 4 and 5).
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100K| 5859K| 5 (20)| 00:00:01 | | |
|* 1 | HASH JOIN RIGHT SEMI | | 100K| 5859K| 5 (20)| 00:00:01 | | |
| 2 | PART JOIN FILTER CREATE | :BF0000 | 1 | 22 | 2 (0)| 00:00:01 | | |
|* 3 | TABLE ACCESS FULL | TABLEB | 1 | 22 | 2 (0)| 00:00:01 | | |
| 4 | PARTITION LIST JOIN-FILTER| | 100K| 3710K| 2 (0)| 00:00:01 |:BF0000|:BF0000|
| 5 | TABLE ACCESS FULL | TABLEA | 100K| 3710K| 2 (0)| 00:00:01 |:BF0000|:BF0000|
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("ROW_ID"="ID")
3 - filter("DT_COL"=SYSDATE#!)
The :BFnnnn values in the Pstart and Pstop columns are the sign that a Bloom filter is used for pruning.
Partition pruning is able to work with a LEFT OUTER JOIN and an IN subquery. You are likely seeing a very specific problem that requires a more specific test case.
Sample schema
drop table tableb purge;
drop table tablea purge;
create table tablea (row_id numeric(38),tb_key numeric(38),row_data varchar(20))
partition by list(row_id)
(partition p1 values (1),partition p123 values(123),partition p345 values(345));
create table tableb (id numeric(38), dt_col date, tb_key numeric(38));
begin
dbms_stats.gather_table_stats(user, 'TABLEA');
dbms_stats.gather_table_stats(user, 'TABLEB');
end;
/
Queries
--Partition pruning works when I query from table with no joins:
explain plan for
SELECT A.* FROM TABLEA A
WHERE ROW_ID IN (SELECT ID FROM TABLEB WHERE DT_COL = SYSDATE);
select * from table(dbms_xplan.display);
--Partition Pruning fails when I do left outer join to TableB
explain plan for
SELECT A.* FROM TABLEA A
LEFT OUTER JOIN TABLEB B ON A.TB_KEY = B.TB_KEY
WHERE ROW_ID IN (SELECT ID FROM TABLEB WHERE DT_COL = SYSDATE);
select * from table(dbms_xplan.display);
--Partition Pruning works when I change left outer join to inner join
explain plan for
SELECT A.* FROM TABLEA A
INNER JOIN TABLEB B ON A.TB_KEY = B.TB_KEY
WHERE ROW_ID IN (SELECT ID FROM TABLEB WHERE DT_COL = SYSDATE);
select * from table(dbms_xplan.display);
--Partition Pruning works when I do left outer join to TableB and do not use IN clause
explain plan for
SELECT A.* FROM TABLEA A
LEFT OUTER JOIN TABLEB B ON A.TB_KEY = B.TB_KEY
WHERE ROW_ID = 123;
select * from table(dbms_xplan.display);
--Partition Pruning works when I do left outer join to TableB and use static values for IN clause
explain plan for
SELECT A.* FROM TABLEA A
LEFT OUTER JOIN TABLEB B ON A.TB_KEY = B.TB_KEY
WHERE ROW_ID IN (123, 345);
select * from table(dbms_xplan.display);
Output
The full execution plan is not displayed here, to save space. The only important columns are Pstart and Pstop, which imply partition pruning is used.
The execution plans look like one of the following:
... -----------------
... | Pstart| Pstop |
... -----------------
...
... | KEY | KEY |
... | KEY | KEY |
...
... -----------------
OR
... -----------------
... | Pstart| Pstop |
... -----------------
...
... | 2 | 2 |
... | 2 | 2 |
...
... -----------------
OR
... -----------------
... | Pstart| Pstop |
... -----------------
...
... |KEY(I) |KEY(I) |
... |KEY(I) |KEY(I) |
...
... -----------------
How does this help?
Not a lot. Even though you have provided much more information than the typical question, even more information is needed to solve this problem.
At least now you know the problem is not caused by a generic limitation in partition pruning. There is a very specific issue here, most likely related to optimizer statistics. Getting to the bottom of such issues can require a lot of time. I'd recommend starting from the sample data above and adding more features and data until the partition pruning goes away.
Post the new test case here, by modifying the question, and someone should be able to solve it.
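As a starting point for growing the test case, the sketch below adds a few rows, refreshes the statistics and re-checks the problem query. The inserted values are hypothetical and only chosen to fit the three partitions defined above; replace them with data that resembles the real tables.

```sql
-- Hypothetical sample rows, one per partition of TABLEA.
insert into tablea values (1, 10, 'one');
insert into tablea values (123, 20, 'one-two-three');
insert into tablea values (345, 30, 'three-four-five');
-- One TABLEB row that points at row_id 123.
insert into tableb values (123, trunc(sysdate), 20);
commit;

-- Refresh statistics so the optimizer sees the new data.
begin
dbms_stats.gather_table_stats(user, 'TABLEA');
dbms_stats.gather_table_stats(user, 'TABLEB');
end;
/

-- Re-check the query that was reported as failing to prune.
explain plan for
SELECT A.* FROM TABLEA A
LEFT OUTER JOIN TABLEB B ON A.TB_KEY = B.TB_KEY
WHERE ROW_ID IN (SELECT ID FROM TABLEB WHERE DT_COL = SYSDATE);

-- BASIC +PARTITION keeps the output short but still shows Pstart and Pstop.
select * from table(dbms_xplan.display(null, null, 'BASIC +PARTITION'));
```

Repeating the last two statements after each change (more rows, more partitions, indexes, different statistics) should show at which step Pstart and Pstop stop indicating pruning.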
This query works but takes 5000 milliseconds.
SELECT
SUM(case
when ((TRUNC(OPEN_DATE) <= thedate and TRUNC(END_DATE) > thedate) or(TRUNC(OPEN_DATE) <= thedate and END_DATE Is Null)) then 1
else 0
end) as Open
From (
select *
FROM PROJECT
WHERE
PROGRAM_NAME = :program
AND ACTION_FOR_ORG = :orgName
)
cross join (
select add_months(last_day(SYSDATE), level-7) as thedate
from dual
connect by level <= 12
)
GROUP BY thedate
ORDER BY thedate
If I copy the subquery to its own table
create table test_project as
select * FROM PROJECT WHERE PROGRAM_NAME = :program
AND ACTION_FOR_ORG = :orgName
and then run the above query with the subquery pointed at the copied table:
From ( select * FROM test_project WHERE PROGRAM_NAME = :program
AND ACTION_FOR_ORG = :orgName )
the query takes 10 milliseconds.
The query counts how many projects were open in each month, covering the past 5 months and future months (the counts for future months will simply equal the current month's total), based on comparing OPEN_DATE and END_DATE against each month's date.
Is there a way to rewrite the original query for optimal performance?
EDIT
OK, I created a second table which is a full copy of the PROJECT table (well, view) that I was allowed access to. The copy took about 5 seconds. Using the full set of data, with either my SQL query or the one from Egor below, the query is super fast. Something is up with the view. When I try to generate an explain plan with the view in the subquery, I get an insufficient privileges error. Here is the explain plan using a full copy of the view:
Plan hash value: 3695211866
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 637 | 1277K| 163 (2)| 00:00:02 |
| 1 | SORT ORDER BY | | 637 | 1277K| 163 (2)| 00:00:02 |
| 2 | HASH GROUP BY | | 637 | 1277K| 163 (2)| 00:00:02 |
| 3 | MERGE JOIN CARTESIAN | | 637 | 1277K| 161 (0)| 00:00:02 |
| 4 | VIEW | | 1 | 6 | 2 (0)| 00:00:01 |
|* 5 | CONNECT BY WITHOUT FILTERING| | | | | |
| 6 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 7 | BUFFER SORT | | 637 | 1273K| 163 (2)| 00:00:02 |
|* 8 | TABLE ACCESS FULL | COMMIT_TEST | 637 | 1273K| 159 (0)| 00:00:02 |
Predicate Information (identified by operation id):
5 - filter(LEVEL<=12)
8 - filter("PROGRAM_NAME"='program_name' AND "ACTION_FOR_ORG"='action_for_org')
Note
- dynamic sampling used for this statement (level=2)
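Since the copy is fast and the view is not, it may be worth looking at what the view actually does. A sketch, assuming the view is named PROJECT and that its definition is visible to the current user through ALL_VIEWS (the TEXT column is a LONG, so SQL*Plus truncates it unless SET LONG is raised):

```sql
-- SQL*Plus only: show up to 100000 characters of LONG columns.
set long 100000

-- Definition of every accessible view named PROJECT.
select owner, view_name, text
from all_views
where view_name = 'PROJECT';
```

If the definition joins several large tables or calls functions per row, that would be consistent with the view being slow while a plain copy of its rows is fast.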
Explain Plan using live table
with
PRJ as (
select /*+ NO_UNNEST */
trunc(OPEN_DATE) as OPEN_DATE,
nvl(trunc(END_DATE), sysdate + 1000) as END_DATE
from
PROJECT
where
PROGRAM_NAME = :program
and ACTION_FOR_ORG = :orgName
),
DATES as (
select
add_months(trunc(last_day(SYSDATE)), level-7) as thedate
from dual
connect by level <= 12
)
SELECT
thedate,
sum(case when thedate between open_date and end_date then 1 end) as Open
FROM
DATES, PRJ
GROUP BY thedate
ORDER BY 1
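Because copying the filtered rows into a table made the query fast, one variant that may be worth trying is to ask Oracle to materialize the PRJ subquery once before the join. This is only a sketch of the query above with the hint swapped: MATERIALIZE is an undocumented hint, and whether it helps depends on what the view does.

```sql
with
PRJ as (
select /*+ MATERIALIZE */ -- undocumented hint: evaluate this subquery once into a temporary result
trunc(OPEN_DATE) as OPEN_DATE,
nvl(trunc(END_DATE), sysdate + 1000) as END_DATE
from
PROJECT
where
PROGRAM_NAME = :program
and ACTION_FOR_ORG = :orgName
),
DATES as (
select
add_months(trunc(last_day(SYSDATE)), level-7) as thedate
from dual
connect by level <= 12
)
SELECT
thedate,
sum(case when thedate between open_date and end_date then 1 end) as Open
FROM
DATES, PRJ
GROUP BY thedate
ORDER BY 1
```

If the hint is honored, the plan should contain a TEMP TABLE TRANSFORMATION step, and the view should then be read once instead of being re-evaluated inside the join.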