Oracle Partition Pruning not working when left outer join is used

I have one table that is list partitioned on a numeric column (row_id),
TABLEA (ROW_ID NUMERIC(38), TB_KEY NUMERIC(38), ROW_DATA VARCHAR(20));
Partition pruning works when I query the table with no joins:
SELECT A.* FROM TABLEA A
WHERE ROW_ID IN (SELECT ID FROM TABLEB WHERE DT_COL = SYSDATE);
Partition pruning fails when I do a left outer join to TableB:
SELECT A.* FROM TABLEA A
LEFT OUTER JOIN TABLEB B ON A.TB_KET = B.TB_KEY
WHERE ROW_ID IN (SELECT ID FROM TABLEB WHERE DT_COL = SYSDATE);
Partition pruning works when I change the left outer join to an inner join:
SELECT A.* FROM TABLEA A
INNER JOIN TABLEB B ON A.TB_KET = B.TB_KEY
WHERE ROW_ID IN (SELECT ID FROM TABLEB WHERE DT_COL = SYSDATE);
Partition pruning works when I do a left outer join to TableB and do not use an IN clause:
SELECT A.* FROM TABLEA A
LEFT OUTER JOIN TABLEB B ON A.TB_KET = B.TB_KEY
WHERE ROW_ID = 123;
Partition pruning works when I do a left outer join to TableB and use static values in the IN clause:
SELECT A.* FROM TABLEA A
LEFT OUTER JOIN TABLEB B ON A.TB_KET = B.TB_KEY
WHERE ROW_ID IN (123, 345);
Can someone explain to me why a left outer join causes partition pruning to fail when I query on the column the table is partitioned on, using an IN clause whose values come from a subquery?

The answer on Oracle 11g is YES, partition pruning works fine.
There are three main access patterns in your setup with the list-partitioned table TABLEA; let's go through all of them. Note that I'm using the
simplest possible statements to illustrate the behavior.
Access with keys in an equality predicate or an IN list
The simplest case is using a literal in an equality predicate on the partition key:
SELECT A.* FROM TABLEA A
LEFT OUTER JOIN TABLEB B ON A.TB_KET = B.TB_KEY
WHERE A.ROW_ID = 123;
This leads to the following execution plan:
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 51 | 4 (0)| 00:00:01 | | |
|* 1 | HASH JOIN OUTER | | 1 | 51 | 4 (0)| 00:00:01 | | |
| 2 | PARTITION LIST SINGLE| | 1 | 38 | 2 (0)| 00:00:01 | KEY | KEY |
|* 3 | TABLE ACCESS FULL | TABLEA | 1 | 38 | 2 (0)| 00:00:01 | 2 | 2 |
| 4 | TABLE ACCESS FULL | TABLEB | 1 | 13 | 2 (0)| 00:00:01 | | |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."TB_KET"="B"."TB_KEY"(+))
3 - filter("A"."ROW_ID"=123)
Only the relevant partition of TABLEA is accessed (here partition #2) - see the Pstart and Pstop columns.
Slightly more complicated, but similar, is the case with an IN list:
SELECT A.* FROM TABLEA A
LEFT OUTER JOIN TABLEB B ON A.TB_KET = B.TB_KEY
WHERE ROW_ID IN (123, 345);
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 51 | 4 (0)| 00:00:01 | | |
|* 1 | HASH JOIN OUTER | | 1 | 51 | 4 (0)| 00:00:01 | | |
| 2 | PARTITION LIST INLIST| | 1 | 38 | 2 (0)| 00:00:01 |KEY(I) |KEY(I) |
|* 3 | TABLE ACCESS FULL | TABLEA | 1 | 38 | 2 (0)| 00:00:01 |KEY(I) |KEY(I) |
| 4 | TABLE ACCESS FULL | TABLEB | 1 | 13 | 2 (0)| 00:00:01 | | |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."TB_KET"="B"."TB_KEY"(+))
3 - filter("A"."ROW_ID"=123 OR "A"."ROW_ID"=345)
In this case more partitions can be accessed, but only those partitions that contain the keys from the IN list are considered.
The same is valid for access using bind variables.
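For instance, a sketch with a hypothetical bind variable :p_row_id (not part of the original question):
SELECT A.* FROM TABLEA A
LEFT OUTER JOIN TABLEB B ON A.TB_KET = B.TB_KEY
WHERE A.ROW_ID = :p_row_id;
-- Pstart/Pstop again show KEY - KEY: the partition is resolved once the bind value is known.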
Access with keys from a table using NESTED LOOPS
More complicated is the case where the two tables are joined. With a nested loops join, TABLEA is probed once for each key coming from TABLEB.
This means that for each key only the one partition where that key is located is accessed.
SELECT A.* FROM TABLEA A
WHERE ROW_ID IN (SELECT ID FROM TABLEB WHERE DT_COL = SYSDATE);
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 60 | 4 (25)| 00:00:01 | | |
| 1 | NESTED LOOPS | | 1 | 60 | 4 (25)| 00:00:01 | | |
| 2 | SORT UNIQUE | | 1 | 22 | 2 (0)| 00:00:01 | | |
|* 3 | TABLE ACCESS FULL | TABLEB | 1 | 22 | 2 (0)| 00:00:01 | | |
| 4 | PARTITION LIST ITERATOR| | 100K| 3710K| 1 (0)| 00:00:01 | KEY | KEY |
|* 5 | TABLE ACCESS FULL | TABLEA | 100K| 3710K| 1 (0)| 00:00:01 | KEY | KEY |
---------------------------------------------------------------------------------------------------
Again there is partition pruning (KEY - KEY), so only partitions with keys from TABLEB are accessed, but due to the nature of nested loops, one partition can be accessed several times (for different keys).
Access with keys from a table using HASH JOIN
Using a HASH JOIN is the most complicated case, because the partition pruning must happen before the join starts. This is where the Bloom filter comes into play.
How does it work? After scanning TABLEB, Oracle knows all relevant keys; those keys can be mapped to the relevant partitions, and a Bloom filter (BF) on those partitions is created (operations 3 and 2).
The BF is passed to TABLEA and used for partition pruning on it (operations 4 and 5).
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100K| 5859K| 5 (20)| 00:00:01 | | |
|* 1 | HASH JOIN RIGHT SEMI | | 100K| 5859K| 5 (20)| 00:00:01 | | |
| 2 | PART JOIN FILTER CREATE | :BF0000 | 1 | 22 | 2 (0)| 00:00:01 | | |
|* 3 | TABLE ACCESS FULL | TABLEB | 1 | 22 | 2 (0)| 00:00:01 | | |
| 4 | PARTITION LIST JOIN-FILTER| | 100K| 3710K| 2 (0)| 00:00:01 |:BF0000|:BF0000|
| 5 | TABLE ACCESS FULL | TABLEA | 100K| 3710K| 2 (0)| 00:00:01 |:BF0000|:BF0000|
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("ROW_ID"="ID")
3 - filter("DT_COL"=SYSDATE#!)
The :BFnnnn values in the Pstart and Pstop columns are the sign of the Bloom filter.
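If you want to confirm at run time which partitions were really touched, and not only what the estimate says, DBMS_XPLAN.DISPLAY_CURSOR can show the actual row source statistics together with the pruning columns - a sketch, assuming you run the statement first in the same session:
SELECT /*+ GATHER_PLAN_STATISTICS */ A.* FROM TABLEA A
WHERE ROW_ID IN (SELECT ID FROM TABLEB WHERE DT_COL = SYSDATE);
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST +PARTITION'));
-- Compare the Starts column with Pstart/Pstop to see how often each partition operation actually ran.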

Partition pruning is able to work with a LEFT OUTER JOIN and an IN subquery. You are likely seeing a very specific problem that requires a more specific test case.
Sample schema
drop table tableb purge;
drop table tablea purge;
create table tablea (row_id numeric(38),tb_key numeric(38),row_data varchar(20))
partition by list(row_id)
(partition p1 values (1),partition p123 values(123),partition p345 values(345));
create table tableb (id numeric(38), dt_col date, tb_key numeric(38));
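Optionally, load a few rows before gathering the statistics so the optimizer sees non-trivial cardinalities - a minimal sketch with made-up values:
insert into tablea values (123, 1, 'row 123');
insert into tablea values (345, 2, 'row 345');
insert into tableb values (123, sysdate, 1);
insert into tableb values (345, sysdate, 2);
commit;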
begin
dbms_stats.gather_table_stats(user, 'TABLEA');
dbms_stats.gather_table_stats(user, 'TABLEB');
end;
/
Queries
--Partition pruning works when i query from table with no joins:
explain plan for
SELECT A.* FROM TABLEA A
WHERE ROW_ID IN (SELECT ID FROM TABLEB WHERE DT_COL = SYSDATE);
select * from table(dbms_xplan.display);
--Partition Pruning fails when I do left outer join to TableB
explain plan for
SELECT A.* FROM TABLEA A
LEFT OUTER JOIN TABLEB B ON A.TB_KEY = B.TB_KEY
WHERE ROW_ID IN (SELECT ID FROM TABLEB WHERE DT_COL = SYSDATE);
select * from table(dbms_xplan.display);
--Partition Pruning works when I change left outer join to inner join
explain plan for
SELECT A.* FROM TABLEA A
INNER JOIN TABLEB B ON A.TB_KEY = B.TB_KEY
WHERE ROW_ID IN (SELECT ID FROM TABLEB WHERE DT_COL = SYSDATE);
select * from table(dbms_xplan.display);
--Partition Pruning works when I do left outer join to TableB and do not use IN clause
explain plan for
SELECT A.* FROM TABLEA A
LEFT OUTER JOIN TABLEB B ON A.TB_KEY = B.TB_KEY
WHERE ROW_ID = 123;
select * from table(dbms_xplan.display);
--Partition Pruning works when I do left outer join to TableB and use static values for IN clause
explain plan for
SELECT A.* FROM TABLEA A
LEFT OUTER JOIN TABLEB B ON A.TB_KEY = B.TB_KEY
WHERE ROW_ID IN (123, 345);
select * from table(dbms_xplan.display);
Output
The full execution plans are not displayed here, to save space. The only important columns are Pstart and Pstop, which show that partition pruning is used.
The execution plans look like one of the following:
... -----------------
... | Pstart| Pstop |
... -----------------
...
... | KEY | KEY |
... | KEY | KEY |
...
... -----------------
OR
... -----------------
... | Pstart| Pstop |
... -----------------
...
... | 2 | 2 |
... | 2 | 2 |
...
... -----------------
OR
... -----------------
... | Pstart| Pstop |
... -----------------
...
... |KEY(I) |KEY(I) |
... |KEY(I) |KEY(I) |
...
... -----------------
How does this help?
Not a lot. Even though you have provided much more information than the typical question, even more information is needed to solve this problem.
At least now you know the problem is not caused by a generic limitation in partition pruning. There is a very specific issue here, most likely related to optimizer statistics. Getting to the bottom of such issues can require a lot of time. I'd recommend starting from the sample data above and adding more features and data until the partition pruning goes away.
Post the new test case here, by modifying the question, and someone should be able to solve it.
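While building that test case, it may also help to check what the optimizer knows about each partition, since stale or missing statistics are a common cause of lost pruning - a sketch against the standard dictionary view:
select table_name, partition_name, num_rows, last_analyzed
from user_tab_partitions
where table_name = 'TABLEA'
order by partition_position;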

Related

ORA_ROWSCN queried properly but why is Oracle returning the wrong value in the results?

Why does Oracle sometimes return the wrong ORA_ROWSCN, such as in the following? (Note this does not seem to be a ROWDEPENDENCIES issue or a "greater than expected SCN" issue, as I am aware of both these caveats when using ORA_ROWSCN.)
When I run:
WITH maxIds as (
SELECT table_name, record_rowid, MAX(changed_rows_log_id) AS changed_rows_log_id, ORA_ROWSCN as otherSCN
FROM changed_rows_log
GROUP BY table_name, record_rowid
)
SELECT l.changed_rows_log_id, l.ORA_ROWSCN, otherSCN, l.table_name, l.record_rowid
FROM changed_rows_log l
JOIN maxIds m on l.changed_rows_log_id = m.changed_rows_log_id and l.table_name=m.table_name and l.record_rowid=m.record_rowid
WHERE ORA_ROWSCN > 7884576380618
My result is:
CHANGED_ROWS_LOG_ID ORA_ROWSCN OTHERSCN TABLE_NAME RECORD_ROWID
1887507 7884576380617 7884576380617 FOO AAARiGAAMAAG4B4AA2
1887508 7884576380617 7884576380617 FOO AAARiGAAMAAG4B4AA3
1887512 7884576380617 7884576380617 FOO AAARiGAAMAAG4B4AA7
...
Yep, you read that right. The ORA_ROWSCN returned is less than the literal value I asked to be greater than in the query's WHERE clause. (I also included otherSCN to see if it was throwing me off somehow, but it appears to be irrelevant.)
It appears that the row in question actually has a higher ORA_ROWSCN, and indeed the WHERE clause worked properly: when I then do SELECT ORA_ROWSCN FROM changed_rows_log WHERE changed_rows_log_id=1887507, I get 7884576380644, not 7884576380617.
Also, when I add just one WHERE condition, I also get the correct data returned:
WITH maxIds as (
SELECT table_name, record_rowid, MAX(changed_rows_log_id) AS changed_rows_log_id, ORA_ROWSCN as otherSCN
FROM changed_rows_log
GROUP BY table_name, record_rowid
)
SELECT l.changed_rows_log_id, l.ORA_ROWSCN, otherSCN, l.table_name, l.record_rowid
FROM changed_rows_log l
JOIN maxIds m on l.changed_rows_log_id = m.changed_rows_log_id and l.table_name=m.table_name and l.record_rowid=m.record_rowid
WHERE ORA_ROWSCN > 7884576380618 AND l.changed_rows_log_id=1887507
gives me this, as expected
CHANGED_ROWS_LOG_ID ORA_ROWSCN OTHERSCN TABLE_NAME RECORD_ROWID
1887507 7884576380644 7884576380644 FOO AAARiGAAMAAG4B4AA2
So why, and how, can SELECT ORA_ROWSCN give me simply incorrect data like this? Can I work around it somehow so that I get the expected ORA_ROWSCN that more specific queries give me?
(If it matters, changed_rows_log has ROWDEPENDENCIES enabled. I'm using Oracle Database 12.1.0.2.0 64-bit.)
More detail: the EXPLAIN PLAN for the first query (with the bad value)
Plan hash value: 3153795477
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 62 | | 30794 (1)| 00:00:02 |
|* 1 | FILTER | | | | | | |
| 2 | HASH GROUP BY | | 1 | 62 | | 30794 (1)| 00:00:02 |
|* 3 | HASH JOIN | | 208K| 12M| 3424K| 30787 (1)| 00:00:02 |
|* 4 | TABLE ACCESS FULL| CHANGED_ROWS_LOG | 71438 | 2581K| | 14052 (1)| 00:00:01 |
| 5 | TABLE ACCESS FULL| CHANGED_ROWS_LOG | 1428K| 34M| | 14058 (1)| 00:00:01 |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("L"."CHANGED_ROWS_LOG_ID"=MAX("CHANGED_ROWS_LOG_ID"))
3 - access("L"."TABLE_NAME"="TABLE_NAME" AND "L"."RECORD_ROWID"="RECORD_ROWID")
4 - filter("ORA_ROWSCN">7884576380618)
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
- this is an adaptive plan
- 2 Sql Plan Directives used for this statement
And the last query above (correct value)
Plan hash value: 402632295
---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 62 | 7 (15)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | HASH GROUP BY | | 1 | 62 | 7 (15)| 00:00:01 |
| 3 | NESTED LOOPS | | 3 | 186 | 6 (0)| 00:00:01 |
|* 4 | TABLE ACCESS BY INDEX ROWID | CHANGED_ROWS_LOG | 1 | 37 | 3 (0)| 00:00:01 |
|* 5 | INDEX UNIQUE SCAN | SYS_C00141068 | 1 | | 2 (0)| 00:00:01 |
| 6 | TABLE ACCESS BY INDEX ROWID BATCHED| CHANGED_ROWS_LOG | 3 | 75 | 3 (0)| 00:00:01 |
|* 7 | INDEX RANGE SCAN | CHANGED_ROWS_LOG_IF1 | 1 | | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(MAX("CHANGED_ROWS_LOG_ID")=1887507)
4 - filter("ORA_ROWSCN">7884576380618)
5 - access("L"."CHANGED_ROWS_LOG_ID"=1887507)
7 - access("L"."RECORD_ROWID"="RECORD_ROWID" AND "L"."TABLE_NAME"="TABLE_NAME")
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
- 1 Sql Plan Directive used for this statement
Adding the MATERIALIZE hint to the WITH subquery overcomes this issue. I'd love it if someone could explain why the issue happens at all, but for now:
WITH maxIds as (
SELECT /*+ MATERIALIZE */ table_name, record_rowid, MAX(changed_rows_log_id) AS changed_rows_log_id, ORA_ROWSCN as otherSCN
FROM changed_rows_log
GROUP BY table_name, record_rowid
)
SELECT l.changed_rows_log_id, l.ORA_ROWSCN, otherSCN, l.table_name, l.record_rowid
FROM changed_rows_log l
JOIN maxIds m on l.changed_rows_log_id = m.changed_rows_log_id and l.table_name=m.table_name and l.record_rowid=m.record_rowid
WHERE ORA_ROWSCN > 7884576380618
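As a side note, if you want to sanity-check what a given ORA_ROWSCN really corresponds to in time, the standard SCN_TO_TIMESTAMP function can translate it (it raises an error for SCNs outside the retained SCN-to-time mapping window) - a sketch against the row from the question:
SELECT changed_rows_log_id,
       ORA_ROWSCN,
       SCN_TO_TIMESTAMP(ORA_ROWSCN) AS approx_change_time
FROM changed_rows_log
WHERE changed_rows_log_id = 1887507;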

Oracle - Query Optimization - Query runs for a long time

I have an Oracle query which is executed once a month to get the order details processed. This query takes a painfully long time to execute (more than thirty minutes), so I am trying to optimize it. I have decent knowledge of Oracle and will explain what I have tried so far. Still, it takes around 20 minutes to complete. This is the query. The Oracle version is 11g.
SELECT store_typ, store_no, COUNT(order_no) FROM
(
SELECT DISTINCT(order_no), store.store_no, store.store_typ FROM
(
SELECT trx.order_no,trx.ADDED_DATE, odr.prod_typ, odr.store_no FROM daily_trx trx
LEFT OUTER JOIN
(
SELECT odr.order_no,odr.prod_typ,prod.store_no FROM order_main odr
LEFT OUTER JOIN ORDR_PROD_TYP prod
on odr.prod_typ = prod.prod_typ
) odr
ON trx.order_no= odr.order_no
) daily_orders ,
(SELECT store_no,store_typ FROM main_stores ) store
WHERE 1=1
and daily_orders.order_no !='NA'
and store.store_no = daily_orders.store_no
AND to_timestamp(to_char(daily_orders.ADDED_DATE,'DD-MM-YYYY HH24:MI:SS'),'DD-MM-YYYY HH24:MI:SS') >= to_date('01-05-2020 00:00:00','DD-MM-YYYY HH24:MI:SS')
AND to_timestamp(to_char(daily_orders.ADDED_DATE,'DD-MM-YYYY HH24:MI:SS'),'DD-MM-YYYY HH24:MI:SS') <= to_date('31-05-2020 23:59:59','DD-MM-YYYY HH24:MI:SS')
)
GROUP BY store_typ, store_no
Background
order_main - This table has over 4 million records
I introduced an index on the order_no column, which reduced the execution time.
My questions are as follows.
1) Will it help if I move the date validation inside the inner query, like this?
SELECT store_typ, store_no, COUNT(order_no) FROM
(
SELECT DISTINCT(order_no), store.store_no, store.store_typ FROM
(
SELECT trx.order_no,trx.ADDED_DATE, odr.prod_typ, odr.store_no FROM daily_trx trx
LEFT OUTER JOIN
(
SELECT odr.order_no,odr.prod_typ,prod.store_no FROM order_main odr
LEFT OUTER JOIN ORDR_PROD_TYP prod
on odr.prod_typ = prod.prod_typ
) odr
ON trx.order_no= odr.order_no
WHERE to_timestamp(to_char(daily_orders.ADDED_DATE,'DD-MM-YYYY HH24:MI:SS'),'DD-MM-YYYY HH24:MI:SS') >= to_date('01-05-2020 00:00:00','DD-MM-YYYY HH24:MI:SS')
AND to_timestamp(to_char(daily_orders.ADDED_DATE,'DD-MM-YYYY HH24:MI:SS'),'DD-MM-YYYY HH24:MI:SS') <= to_date('31-05-2020 23:59:59','DD-MM-YYYY HH24:MI:SS')
) daily_orders ,
(SELECT store_no,store_typ FROM main_stores ) store
WHERE 1=1
and daily_orders.order_no !='NA'
and store.store_no = daily_orders.store_no
--AND to_timestamp(to_char(daily_orders.ADDED_DATE,'DD-MM-YYYY HH24:MI:SS'),'DD-MM-YYYY HH24:MI:SS') >= to_date('01-05-2020 00:00:00','DD-MM-YYYY HH24:MI:SS')
--AND to_timestamp(to_char(daily_orders.ADDED_DATE,'DD-MM-YYYY HH24:MI:SS'),'DD-MM-YYYY HH24:MI:SS') <= to_date('31-05-2020 23:59:59','DD-MM-YYYY HH24:MI:SS')
)
GROUP BY store_typ, store_no
2) Could someone please suggest any other improvements that can be done to this query?
3) Would additional indexing help on any other tables or columns? Only the daily_trx and order_main tables contain huge amounts of data.
Some general suggestions
Do not combine ANSI and Oracle join syntax in one query
Do not use an outer join if an inner join can be used
Your inner subqueries use outer joins, but the final join to main_stores is an inner join that
eliminates all rows where store_no is null - you may use inner joins with the same result.
Filter rows early
A suboptimal practice is to first join in a subquery and then filter the relevant rows with WHERE conditions.
Use simple predicates
If you want to constrain a DATE column, do it this way:
trx.ADDED_DATE >= to_date('01-05-2020 00:00:00','DD-MM-YYYY HH24:MI:SS')
Use COUNT(DISTINCT) if appropriate
The SELECT DISTINCT in the third line can be eliminated if you use COUNT(DISTINCT order_no).
Applying all the above points, I come to the following query:
select
store.store_no, store.store_typ, count(DISTINCT trx.order_no) order_no_cnt
from daily_trx trx
join order_main odr on trx.order_no = odr.order_no
join ordr_prod_typ prod on odr.prod_typ = prod.prod_typ
join main_stores store on store.store_no = prod.store_no
where trx.ADDED_DATE >= date'2020-05-01' and
trx.ADDED_DATE < date'2020-06-01' and
trx.order_no !='NA'
group by store.store_no, store.store_typ
Performance Considerations
You process a month of data, so there will probably be a large number of transactions (say 100K+). In this case the best approach is to full scan the two large tables and perform HASH JOINs.
You can expect this execution plan:
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 199K| 5850K| | 592 (2)| 00:00:08 |
|* 1 | HASH JOIN | | 199K| 5850K| | 592 (2)| 00:00:08 |
| 2 | TABLE ACCESS FULL | MAIN_STORES | 26 | 104 | | 3 (0)| 00:00:01 |
|* 3 | HASH JOIN | | 199K| 5070K| | 588 (2)| 00:00:08 |
| 4 | TABLE ACCESS FULL | ORDR_PROD_TYP | 26 | 104 | | 3 (0)| 00:00:01 |
|* 5 | HASH JOIN | | 199K| 4290K| 1960K| 584 (1)| 00:00:08 |
|* 6 | TABLE ACCESS FULL| ORDER_MAIN | 100K| 782K| | 69 (2)| 00:00:01 |
|* 7 | TABLE ACCESS FULL| DAILY_TRX | 200K| 2734K| | 172 (2)| 00:00:03 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("STORE"."STORE_NO"="PROD"."STORE_NO")
3 - access("ODR"."PROD_TYP"="PROD"."PROD_TYP")
5 - access("TRX"."ORDER_NO"="ODR"."ORDER_NO")
6 - filter("ODR"."ORDER_NO"<>'NA')
7 - filter("TRX"."ADDED_DATE"<TO_DATE(' 2020-06-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') AND "TRX"."ORDER_NO"<>'NA' AND "TRX"."ADDED_DATE">=TO_DATE(' 2020-05-01
00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
If you have the Partitioning option available, you will profit massively from defining a monthly partitioning scheme (or alternatively a daily one) on the two tables DAILY_TRX and ORDER_MAIN.
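A minimal sketch of such a monthly scheme, assuming ADDED_DATE is the partition key and interval partitioning (available from 11g) can be used; the table name DAILY_TRX_PART and the boundary date are only illustrative:
create table daily_trx_part
partition by range (added_date)
interval (numtoyminterval(1, 'MONTH'))
(partition p_before_2020 values less than (date'2020-01-01'))
as select * from daily_trx;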
If the above assumption is not correct and you have very few transactions in the selected time interval (say below 1K), you will do better using index access and NESTED LOOPS joins.
You will need this set of indexes:
create index daily_trx_date on daily_trx(ADDED_DATE);
create unique index order_main_idx on order_main (order_no);
create unique index ORDR_PROD_TYP_idx1 on ORDR_PROD_TYP(prod_typ);
create unique index main_stores_idx1 on main_stores(store_no);
The expected plan is as follows
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 92 | 2760 | 80 (4)| 00:00:01 |
|* 1 | HASH JOIN | | 92 | 2760 | 80 (4)| 00:00:01 |
|* 2 | TABLE ACCESS BY INDEX ROWID | DAILY_TRX | 92 | 1288 | 4 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | DAILY_TRX_DATE | 92 | | 3 (0)| 00:00:01 |
|* 4 | HASH JOIN | | 100K| 1564K| 75 (3)| 00:00:01 |
| 5 | MERGE JOIN | | 26 | 208 | 6 (17)| 00:00:01 |
| 6 | TABLE ACCESS BY INDEX ROWID| MAIN_STORES | 26 | 104 | 2 (0)| 00:00:01 |
| 7 | INDEX FULL SCAN | MAIN_STORES_IDX1 | 26 | | 1 (0)| 00:00:01 |
|* 8 | SORT JOIN | | 26 | 104 | 4 (25)| 00:00:01 |
| 9 | TABLE ACCESS FULL | ORDR_PROD_TYP | 26 | 104 | 3 (0)| 00:00:01 |
|* 10 | TABLE ACCESS FULL | ORDER_MAIN | 100K| 782K| 69 (2)| 00:00:01 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("TRX"."ORDER_NO"="ODR"."ORDER_NO")
2 - filter("TRX"."ORDER_NO"<>'NA')
3 - access("TRX"."ADDED_DATE">=TO_DATE(' 2020-06-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') AND "TRX"."ADDED_DATE"<TO_DATE(' 2020-07-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss'))
4 - access("ODR"."PROD_TYP"="PROD"."PROD_TYP")
8 - access("STORE"."STORE_NO"="PROD"."STORE_NO")
filter("STORE"."STORE_NO"="PROD"."STORE_NO")
10 - filter("ODR"."ORDER_NO"<>'NA')
Check here for how to get the execution plan of your query.
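For reference, a sketch of getting the estimated plan for the rewritten query with EXPLAIN PLAN and DBMS_XPLAN (the same approach works for your original statement):
explain plan for
select store.store_no, store.store_typ, count(distinct trx.order_no) order_no_cnt
from daily_trx trx
join order_main odr on trx.order_no = odr.order_no
join ordr_prod_typ prod on odr.prod_typ = prod.prod_typ
join main_stores store on store.store_no = prod.store_no
where trx.added_date >= date'2020-05-01'
  and trx.added_date < date'2020-06-01'
  and trx.order_no != 'NA'
group by store.store_no, store.store_typ;
select * from table(dbms_xplan.display);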

Convoluted nested CASE WHEN - better to use subqueries in SELECT statement or just join the other table(s)?

Here is pseudocode to describe my current script:
SELECT
A.ID
, A.YEAR
, CASE WHEN (SELECT B.ID
FROM TABLE_B
WHERE B.ID = A.ID
AND B.YEAR = A.YEAR
AND B.CONDITION = 'TRUE') = A.ID THEN
CASE WHEN (SELECT C.STATUS
FROM TABLE_C
WHERE C.ID = A.ID
AND C.YEAR = A.YEAR) = 'X' THEN 'STATUS X'
ELSE 'STATUS Y' END
ELSE CASE WHEN (SELECT C.STATUS
FROM TABLE_C
WHERE C.ID = A.ID
AND C.YEAR = A.YEAR) = 'Z' THEN 'STATUS Z'
ELSE 'STATUS NOT FOUND' END
END AS STATUS
FROM TABLE_A A
I simplified this pseudocode. My actual script has more subqueries, all hitting the same two tables - this seems overly convoluted and I am wondering if it would be better just to join TABLE_B and TABLE_C to my outer query? Or perhaps to select these fields into a temp table and then to write a cursor that will update the STATUS field?
The web application that uses this script will be used by many people, so performance is definitely a concern.
Without seeing the real data it is hard to say anything for certain. xQbert has some interesting comments; I would also recommend benchmarking and testing your real candidate queries against your real data and comparing the results.
The optimizer is smart, but depending on the nature and volume of data in TABLE_A, TABLE_B, TABLE_C, you may get different plans.
The answer may be different, depending on the data and the query. The clearest way to know is to test.
Here's an example:
First, create the test tables and load them. For this example, I'll arbitrarily just throw 20K rows in each, with a high uniformity of data across the three.
CREATE TABLE TABLE_A(ID NUMBER GENERATED ALWAYS AS IDENTITY NOT NULL PRIMARY KEY, YEAR NUMBER);
CREATE TABLE TABLE_B(ID NUMBER GENERATED ALWAYS AS IDENTITY NOT NULL PRIMARY KEY, YEAR NUMBER, CONDITION VARCHAR2(20));
CREATE TABLE TABLE_C(ID NUMBER GENERATED ALWAYS AS IDENTITY NOT NULL PRIMARY KEY, YEAR NUMBER, STATUS VARCHAR2(20));
INSERT INTO TABLE_A(YEAR) SELECT LEVEL FROM DUAL CONNECT BY LEVEL <= 20000;
INSERT INTO TABLE_B(YEAR,CONDITION)
SELECT LEVEL, DECODE(MOD(LEVEL,2),0,'TRUE','FALSE') FROM DUAL CONNECT BY LEVEL <= 20000;
INSERT INTO TABLE_C(YEAR,STATUS)
SELECT LEVEL, DECODE(MOD(LEVEL,3),0,'X',1,'Y','Z') FROM DUAL CONNECT BY LEVEL <= 20000;
... gather stats
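For completeness, gathering the stats could look like this (a minimal sketch using DBMS_STATS):
begin
  dbms_stats.gather_table_stats(user, 'TABLE_A');
  dbms_stats.gather_table_stats(user, 'TABLE_B');
  dbms_stats.gather_table_stats(user, 'TABLE_C');
end;
/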
Then, compare your queries. First with scalars:
SELECT
A.ID
, A.YEAR
, CASE WHEN (SELECT B.ID
FROM TABLE_B B
WHERE B.ID = A.ID
AND B.YEAR = A.YEAR
AND B.CONDITION = 'TRUE') = A.ID THEN
CASE WHEN (SELECT C.STATUS
FROM TABLE_C C
WHERE C.ID = A.ID
AND C.YEAR = A.YEAR) = 'X' THEN 'STATUS X'
ELSE 'STATUS Y' END
ELSE CASE WHEN (SELECT C.STATUS
FROM TABLE_C C
WHERE C.ID = A.ID
AND C.YEAR = A.YEAR) = 'Z' THEN 'STATUS Z'
ELSE 'STATUS NOT FOUND' END
END AS STATUS
FROM TABLE_A A
ORDER BY 1 ASC;
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 20000 | 156K| 89479 (1)| 00:00:04 |
|* 1 | TABLE ACCESS BY INDEX ROWID | TABLE_B | 1 | 14 | 2 (0)| 00:00:01 |
|* 2 | INDEX UNIQUE SCAN | SYS_C00409109 | 1 | | 1 (0)| 00:00:01 |
|* 3 | TABLE ACCESS BY INDEX ROWID | TABLE_C | 1 | 10 | 2 (0)| 00:00:01 |
|* 4 | INDEX UNIQUE SCAN | SYS_C00409111 | 1 | | 1 (0)| 00:00:01 |
|* 5 | TABLE ACCESS BY INDEX ROWID| TABLE_C | 1 | 10 | 2 (0)| 00:00:01 |
|* 6 | INDEX UNIQUE SCAN | SYS_C00409111 | 1 | | 1 (0)| 00:00:01 |
| 7 | SORT ORDER BY | | 20000 | 156K| 89479 (1)| 00:00:04 |
| 8 | TABLE ACCESS FULL | TABLE_A | 20000 | 156K| 11 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Statistics
-----------------------------------------------------------
4 CPU used by this session
4 CPU used when call started
11 DB time
10592 Requests to/from client
10591 SQL*Net roundtrips to/from client
80006 buffer is not pinned count
766 buffer is pinned count
117828 bytes received via SQL*Net from client
2531165 bytes sent via SQL*Net to client
2 calls to get snapshot scn: kcmgss
5 calls to kcmgcs
41127 consistent gets
...
...
etc.
And compare with a join query that returns equivalent data:
SELECT
TABLE_A.ID,
TABLE_A.YEAR,
CASE WHEN TABLE_B.CONDITION = 'TRUE'
THEN
DECODE(TABLE_C.STATUS, 'X', 'STATUS X', 'STATUS Y')
ELSE DECODE(TABLE_C.STATUS, 'Z', 'STATUS Z', 'STATUS NOT FOUND') END AS STATUS
FROM TABLE_A
LEFT OUTER JOIN TABLE_B
ON TABLE_A.ID = TABLE_B.ID
AND TABLE_A.YEAR = TABLE_B.YEAR
LEFT OUTER JOIN TABLE_C
ON TABLE_A.ID = TABLE_C.ID
AND TABLE_A.YEAR = TABLE_C.YEAR
ORDER BY 1 ASC;
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 20000 | 625K| 42 (10)| 00:00:01 |
| 1 | SORT ORDER BY | | 20000 | 625K| 42 (10)| 00:00:01 |
|* 2 | HASH JOIN RIGHT OUTER| | 20000 | 625K| 40 (5)| 00:00:01 |
| 3 | TABLE ACCESS FULL | TABLE_C | 20000 | 195K| 13 (0)| 00:00:01 |
|* 4 | HASH JOIN OUTER | | 20000 | 429K| 26 (4)| 00:00:01 |
| 5 | TABLE ACCESS FULL | TABLE_A | 20000 | 156K| 11 (0)| 00:00:01 |
| 6 | TABLE ACCESS FULL | TABLE_B | 20000 | 273K| 14 (0)| 00:00:01 |
----------------------------------------------------------------------------------
Statistics
-----------------------------------------------------------
1 CPU used by this session
1 CPU used when call started
13 DB time
10592 Requests to/from client
10591 SQL*Net roundtrips to/from client
117622 bytes received via SQL*Net from client
2531166 bytes sent via SQL*Net to client
2 calls to get snapshot scn: kcmgss
11 calls to kcmgcs
160 consistent gets
...
...
etc.
In this example all the tables had similar rows and a 1:1 join relationship.
The queries had different plans, as would be expected.
But what if TABLE_A had 1B rows and very few joined to TABLE_C?
Or if TABLE_B were 100% 'TRUE', or the data were highly skewed, etc.?
Both the scalar plan and the join plan will differ for different data sets.
Throwing a temp table in the mix would surely perform differently as well (I'd be surprised if it won the day in this kind of scenario, but...)
Testing to find the best fit for your data is the surest way, especially given your comment that performance is a concern.
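The run-time statistics shown above look like SQL*Plus autotrace output; if you want to collect the same numbers for your own candidates, a sketch of the standard SQL*Plus commands:
set autotrace traceonly statistics
-- run each candidate query here; SQL*Plus prints consistent gets, round trips, etc.
set autotrace off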

How to reuse sql with subquery factoring by database view

What is the best practice for converting the following SQL statement, which uses subquery factoring (the WITH data AS clause), so that it can be used in a database view?
AFAIK the WITH data AS clause is not supported in database views (Edit: Oracle does support common table expressions), but in my case the subquery factoring offers a performance advantage. If I create a database view using a common table expression, then this advantage is lost.
Please have a look at my example:
Description of the query
a_table
Millions of entries; the select statement picks out a few thousand of them.
anchor_table
For each entry in a_table there is a corresponding entry in anchor_table. Via this table, exactly one row is determined at runtime as the anchor. See the example below.
horizon_table
For each selection exactly one entry is determined at runtime (all entries of a selection of a_table have the same horizon_id).
Please note: this is a strongly simplified SQL statement that works fine so far.
In reality more than 20 tables are joined together to produce the results.
The where clause is much more complex.
Further columns of horizon_table and anchor_table are required to build my where condition and result list in the subquery, i.e. moving these tables to the main query is not a solution.
with data as (
select
a_table.id,
a_table.descr,
horizon_table.offset,
case
when anchor_table.a_date = trunc(sysdate) then
1
else
0
end as anchor,
row_number() over(
order by a_table.a_position_field) as position
from a_table
join anchor_table on (anchor_table.id = a_table.anchor_id)
join horizon_table on (horizon_table.id = a_table.horizon_id)
where a_table.a_value between 1 and 10000
)
select *
from data d
where d.position between (
select d1.position - d.offset
from data d1
where d1.anchor = 1)
and (
select d2.position + d.offset
from data d2
where d2.anchor = 1)
example of with data as select:
id descr offset anchor position
1 bla 3 0 1
2 blab 3 0 2
5 dfkdj 3 0 3
4 dld 3 0 4
6 oeroe 3 1 5
3 blab 3 0 6
9 dfkdj 3 0 7
14 dld 3 0 8
54 oeroe 3 0 9
...
result of select * from data
id descr offset anchor position
2 blab 3 0 2
5 dfkdj 3 0 3
4 dld 3 0 4
6 oeroe 3 1 5
3 blab 3 0 6
9 dfkdj 3 0 7
14 dld 3 0 8
I.e. the result is the anchor row plus the three rows above and below it.
How can I achieve the same within a database view?
My attempt failed, as I expected, due to performance issues:
Create a view data from the WITH data AS select above
Use this view as above:
select *
from data d
where d.position between (
select d1.position - d.offset
from data d1
where d1.anchor = 1)
and (
select d2.position + d.offset
from data d2
where d2.anchor = 1)
Thank you for any advice :-)
Amendment
If I create a view as recommended in the first comment, then I get the same performance issue. Oracle does not use the subquery to restrict the results.
Here are the execution plans of my production queries (please click on the images):
a) SQL
b) View
Here are the execution plans of my test cases
-- Create Testdata table with ~ 1,000,000 entries
insert into a_table
(id, descr, a_position_field, anchor_id, horizon_id, a_value)
select level, 'data' || level, mod(level, 10), level, 1, level
from dual
connect by level <= 999999;
insert into anchor_table
(id, a_date)
select level, trunc(sysdate) - 500000 + level
from dual
connect by level <= 999999;
insert into horizon_table (id, offset) values (1, 50);
commit;
-- Create view
create or replace view testdata_vw as
with data as
(select a_table.id,
a_table.descr,
a_table.a_value,
horizon_table.offset,
case
when anchor_table.a_date = trunc(sysdate) then
1
else
0
end as anchor,
row_number() over(order by a_table.a_position_field) as position
from a_table
join anchor_table
on (anchor_table.id = a_table.anchor_id)
join horizon_table
on (horizon_table.id = a_table.horizon_id))
select *
from data d
where d.position between
(select d1.position - d.offset from data d1 where d1.anchor = 1) and
(select d2.position + d.offset from data d2 where d2.anchor = 1);
-- Explain plan of subquery factoring select statement
explain plan for
with data as
(select a_table.id,
a_table.descr,
a_value,
horizon_table.offset,
case
when anchor_table.a_date = trunc(sysdate) then
1
else
0
end as anchor,
row_number() over(order by a_table.a_position_field) as position
from a_table
join anchor_table
on (anchor_table.id = a_table.anchor_id)
join horizon_table
on (horizon_table.id = a_table.horizon_id)
where a_table.a_value between 500000 - 500 and 500000 + 500)
select *
from data d
where d.position between
(select d1.position - d.offset from data d1 where d1.anchor = 1) and
(select d2.position + d.offset from data d2 where d2.anchor = 1);
select plan_table_output
from table(dbms_xplan.display('plan_table', null, null));
/*
Note: Size of SYS_TEMP_0FD9D6628_284C5768 ~ 1000 rows
Plan hash value: 1145408420
----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 62 | 1791 (2)| 00:00:31 |
| 1 | TEMP TABLE TRANSFORMATION | | | | | |
| 2 | LOAD AS SELECT | SYS_TEMP_0FD9D6628_284C5768 | | | | |
| 3 | WINDOW SORT | | 57 | 6840 | 1785 (2)| 00:00:31 |
|* 4 | HASH JOIN | | 57 | 6840 | 1784 (2)| 00:00:31 |
|* 5 | TABLE ACCESS FULL | A_TABLE | 57 | 4104 | 1193 (2)| 00:00:21 |
| 6 | MERGE JOIN CARTESIAN | | 1189K| 54M| 586 (2)| 00:00:10 |
| 7 | TABLE ACCESS FULL | HORIZON_TABLE | 1 | 26 | 3 (0)| 00:00:01 |
| 8 | BUFFER SORT | | 1189K| 24M| 583 (2)| 00:00:10 |
| 9 | TABLE ACCESS FULL | ANCHOR_TABLE | 1189K| 24M| 583 (2)| 00:00:10 |
|* 10 | FILTER | | | | | |
| 11 | VIEW | | 57 | 3534 | 2 (0)| 00:00:01 |
| 12 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6628_284C5768 | 57 | 4104 | 2 (0)| 00:00:01 |
|* 13 | VIEW | | 57 | 912 | 2 (0)| 00:00:01 |
| 14 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6628_284C5768 | 57 | 4104 | 2 (0)| 00:00:01 |
|* 15 | VIEW | | 57 | 912 | 2 (0)| 00:00:01 |
| 16 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6628_284C5768 | 57 | 4104 | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("HORIZON_TABLE"."ID"="A_TABLE"."HORIZON_ID" AND
"ANCHOR_TABLE"."ID"="A_TABLE"."ANCHOR_ID")
5 - filter("A_TABLE"."A_VALUE">=499500 AND "A_TABLE"."A_VALUE"<=500500)
10 - filter("D"."POSITION">= (SELECT "D1"."POSITION"-:B1 FROM (SELECT + CACHE_TEMP_TABLE
("T1") "C0" "ID","C1" "DESCR","C2" "A_VALUE","C3" "OFFSET","C4" "ANCHOR","C5" "POSITION" FROM
"SYS"."SYS_TEMP_0FD9D6628_284C5768" "T1") "D1" WHERE "D1"."ANCHOR"=1) AND "D"."POSITION"<=
(SELECT "D2"."POSITION"+:B2 FROM (SELECT + CACHE_TEMP_TABLE ("T1") "C0" "ID","C1"
"DESCR","C2" "A_VALUE","C3" "OFFSET","C4" "ANCHOR","C5" "POSITION" FROM
"SYS"."SYS_TEMP_0FD9D6628_284C5768" "T1") "D2" WHERE "D2"."ANCHOR"=1))
13 - filter("D1"."ANCHOR"=1)
15 - filter("D2"."ANCHOR"=1)
Note
-----
- dynamic sampling used for this statement (level=4)
*/
-- Explain plan of database view
explain plan for
select *
from testdata_vw
where a_value between 500000 - 500 and 500000 + 500;
select plan_table_output
from table(dbms_xplan.display('plan_table', null, null));
/*
Note: Size of SYS_TEMP_0FD9D662A_284C5768 ~ 1000000 rows
Plan hash value: 1422141561
-------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2973 | 180K| | 50324 (1)| 00:14:16 |
| 1 | VIEW | TESTDATA_VW | 2973 | 180K| | 50324 (1)| 00:14:16 |
| 2 | TEMP TABLE TRANSFORMATION | | | | | | |
| 3 | LOAD AS SELECT | SYS_TEMP_0FD9D662A_284C5768 | | | | | |
| 4 | WINDOW SORT | | 1189K| 136M| 147M| 37032 (1)| 00:10:30 |
|* 5 | HASH JOIN | | 1189K| 136M| | 6868 (1)| 00:01:57 |
| 6 | TABLE ACCESS FULL | HORIZON_TABLE | 1 | 26 | | 3 (0)| 00:00:01 |
|* 7 | HASH JOIN | | 1189K| 106M| 38M| 6860 (1)| 00:01:57 |
| 8 | TABLE ACCESS FULL | ANCHOR_TABLE | 1189K| 24M| | 583 (2)| 00:00:10 |
| 9 | TABLE ACCESS FULL | A_TABLE | 1209K| 83M| | 1191 (2)| 00:00:21 |
|* 10 | FILTER | | | | | | |
|* 11 | VIEW | | 1189K| 70M| | 4431 (1)| 00:01:16 |
| 12 | TABLE ACCESS FULL | SYS_TEMP_0FD9D662A_284C5768 | 1189K| 81M| | 4431 (1)| 00:01:16 |
|* 13 | VIEW | | 1189K| 18M| | 4431 (1)| 00:01:16 |
| 14 | TABLE ACCESS FULL | SYS_TEMP_0FD9D662A_284C5768 | 1189K| 81M| | 4431 (1)| 00:01:16 |
|* 15 | VIEW | | 1189K| 18M| | 4431 (1)| 00:01:16 |
| 16 | TABLE ACCESS FULL | SYS_TEMP_0FD9D662A_284C5768 | 1189K| 81M| | 4431 (1)| 00:01:16 |
-------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("HORIZON_TABLE"."ID"="A_TABLE"."HORIZON_ID")
7 - access("ANCHOR_TABLE"."ID"="A_TABLE"."ANCHOR_ID")
10 - filter("D"."POSITION">= (SELECT "D1"."POSITION"-:B1 FROM (SELECT + CACHE_TEMP_TABLE ("T1")
"C0" "ID","C1" "DESCR","C2" "A_VALUE","C3" "OFFSET","C4" "ANCHOR","C5" "POSITION" FROM
"SYS"."SYS_TEMP_0FD9D662A_284C5768" "T1") "D1" WHERE "D1"."ANCHOR"=1) AND "D"."POSITION"<= (SELECT
"D2"."POSITION"+:B2 FROM (SELECT + CACHE_TEMP_TABLE ("T1") "C0" "ID","C1" "DESCR","C2"
"A_VALUE","C3" "OFFSET","C4" "ANCHOR","C5" "POSITION" FROM "SYS"."SYS_TEMP_0FD9D662A_284C5768" "T1") "D2"
WHERE "D2"."ANCHOR"=1))
11 - filter("A_VALUE">=499500 AND "A_VALUE"<=500500)
13 - filter("D1"."ANCHOR"=1)
15 - filter("D2"."ANCHOR"=1)
Note
-----
- dynamic sampling used for this statement (level=4)
*/
sqlfiddle
explain plan of sql http://www.sqlfiddle.com/#!4/6a7022/3
explain plan of view http://www.sqlfiddle.com/#!4/6a7022/2
You need to write a view definition which returns all possible selectable ranges of a_value as two columns, start_a_value and end_a_value, along with all records which fall into each start/end range. In other words, the correct view definition should logically describe an n^3-sized result set given n rows in a_table.
Then query that view as:
SELECT * FROM testdata_vw WHERE START_A_VALUE = 4950 AND END_A_VALUE = 5050;
Also, your multiple references to "data" are unnecessary; the same logic can be delivered with an additional analytic function.
Final view def:
CREATE OR REPLACE VIEW testdata_vw AS
SELECT *
FROM
(
SELECT T.*,
MAX(CASE WHEN ANCHOR=1 THEN POSITION END)
OVER (PARTITION BY START_A_VALUE, END_A_VALUE) ANCHOR_POS
FROM
(
SELECT S.A_VALUE START_A_VALUE,
E.A_VALUE END_A_VALUE,
B.ID ID,
B.DESCR DESCR,
HORIZON_TABLE.OFFSET OFFSET,
CASE
WHEN ANCHOR_TABLE.A_DATE = TRUNC(SYSDATE)
THEN 1
ELSE 0
END ANCHOR,
ROW_NUMBER()
OVER(PARTITION BY S.A_VALUE, E.A_VALUE
ORDER BY B.A_POSITION_FIELD) POSITION
FROM
A_TABLE S
JOIN A_TABLE E
ON S.A_VALUE<E.A_VALUE
JOIN A_TABLE B
ON B.A_VALUE BETWEEN S.A_VALUE AND E.A_VALUE
JOIN ANCHOR_TABLE
ON ANCHOR_TABLE.ID = B.ANCHOR_ID
JOIN HORIZON_TABLE
ON HORIZON_TABLE.ID = B.HORIZON_ID
) T
) T
WHERE POSITION BETWEEN ANCHOR_POS - OFFSET AND ANCHOR_POS+OFFSET;
EDIT: SQL Fiddle with expected execution plan
I'm seeing the same (sensible) plan here that I saw in my database; if you're getting something different, please send fiddle link.
Use index lookup to find 1 row in "S" A_TABLE (A_VALUE = 4950)
Use index lookup to find 1 row in "E" A_TABLE (A_VALUE = 5050)
Nested Loop join #1 and #2 (1 x 1 join, still 1 row)
FTS 1 row from HORIZON table
Cartesian join #1 and #2 (1 x 1, okay to use Cartesian).
Use index lookup to find ~100 rows in "B" A_TABLE with values between 4950 and 5050.
Cartesian join #5 and #6 (1 x 102, okay to use Cartesian).
FTS ANCHOR_TABLE with hash join to #7.
Window-sort for analytic functions
You have a predicate outside the view and you want it to be applied inside the view.
For this, you can use the PUSH_PRED hint:
select /*+PUSH_PRED(v)*/
*
from
testdata_vw v
where
a_value between 5000 - 50 and 5000 + 50;
SQLFIDDLE
EDIT: Now I see that you use the data subquery three times. For the first occurrence it makes sense to push the predicate, but for d1 and d2 it doesn't; those are different queries.
What I would do is use two context variables, set them according to my needs, and write the query:
SYS_CONTEXT('my_context_name', 'var5000');
create or replace view testdata_vw as
with data as (
select
a_table.id,
a_table.descr,
horizon_table.offset,
case
when anchor_table.a_date = trunc(sysdate) then
1
else
0
end as anchor,
row_number() over(
order by a_table.a_position_field) as position
from a_table
join anchor_table on (anchor_table.id = a_table.anchor_id)
join horizon_table on (horizon_table.id = a_table.horizon_id)
where a_table.a_value between SYS_CONTEXT('my_context_name', 'var5000') - SYS_CONTEXT('my_context_name', 'var50') and SYS_CONTEXT('my_context_name', 'var5000') + SYS_CONTEXT('my_context_name', 'var50')
)
select *
from data d
where d.position between (
select d1.position - d.offset
from data d1
where d1.anchor = 1)
and (
select d2.position + d.offset
from data d2
where d2.anchor = 1) ;
to use it:
dbms_session.set_context ('my_context_name', 'var5000', 5000);
dbms_session.set_context ('my_context_name', 'var50', 50);
select * from testdata_vw;
UPDATE: Instead of context variables (which can be used across sessions) you can use package variables, as you commented.
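A view cannot reference package variables directly, so you would expose them through getter functions - a minimal sketch with hypothetical names (range_params, set_range):
create or replace package range_params as
  procedure set_range(p_center number, p_width number);
  function center return number;
  function width return number;
end range_params;
/
create or replace package body range_params as
  g_center number;
  g_width  number;
  procedure set_range(p_center number, p_width number) is
  begin
    g_center := p_center;
    g_width  := p_width;
  end;
  function center return number is
  begin
    return g_center;
  end;
  function width return number is
  begin
    return g_width;
  end;
end range_params;
/
-- In the view, the filter would then read:
--   where a_table.a_value between range_params.center - range_params.width
--                             and range_params.center + range_params.width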

Understanding Multi-Row Subqueries in Oracle 10g

I executed a SQL statement and came across a mess. I am not able to understand how this output is produced.
My employee table is below. Emp_Id is the primary key and dept_no is a foreign key to some other table.
EMP_ID EMP_NAME DEPT_NO MGR_NAME MGR_NO
---------- -------------------- ---------- ---------- -----------
111 Anish 121 Tanuj 1123
112 Aman 122 Jasmeet 1234
1123 Tanuj 122 Vipul 122
1234 Jasmeet 122 Anish 111
122 Vipul 123 Aman 112
100 Chetan 123 Anoop 666
101 Antal Aman
1011 Anjali 126
1111 Angelina 127
My dep1 table is:
DEPT_ID DEPT_NAME
---------- -------------
121 CSE
122 ECE
123 MEC
And the two tables are not linked at all.
The SQL Query is:
SQL> select emp_name
from employee
where dept_no IN (select dept_no from dep1 where dept_name='MEC');
And the output is:
EMP_NAME
--------------------
Anish
Aman
Tanuj
Jasmeet
Vipul
Chetan
Anjali
Angelina
8 rows selected.
And if I change the where condition to dept_name='me', it returns no rows.
Can someone explain why the execution does not generate an error, since dept_no is not a column of the dep1 table, and how the output is being generated?
if you run this query:
select emp_name
from employee
where dept_no IN (select t.dept_no from dep1 t where dept_name='MEC');
you will see the error, because in your query dept_no comes from the employee table (not from the dep1 table). When dept_no is null, no result comes back for that row, and if you change dept_name to something which is not in the dep1 table, the subquery clearly returns nothing, and dept_no can't be IN an empty set.
As for your query,
..where dept_no IN (select dept_no ...); -- it is similar to using EXISTS
the EXISTS condition is what is actually being evaluated (Oracle won't return an error for an EXISTS clause):
CREATE TABLE my_test(ID INT);
CREATE TABLE my_new_test ( new_ID INT);
EXPLAIN PLAN FOR
select * from my_test where id in( select id from my_new_test);
select * from table(dbms_xplan.display);
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 4 (0)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | TABLE ACCESS FULL | MY_TEST | 1 | 13 | 2 (0)| 00:00:01 |
|* 3 | FILTER | | | | | |
| 4 | TABLE ACCESS FULL| MY_NEW_TEST | 1 | | 2 (0)| 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( EXISTS (SELECT 0 FROM "MY_NEW_TEST" "MY_NEW_TEST" WHERE
:B1=:B2))
3 - filter(:B1=:B2)
Note
-----
- dynamic sampling used for this statement (level=2)
If you explain the plan with a valid column, here (new_id),
then normal access is done:
1 - ACCESS("ID"="NEW_ID")
And it will cause an error for the following:
EXPLAIN PLAN FOR
select * from my_test where id in( select some_thing from my_new_test);
SQL Error: ORA-00904: "SOME_THING": invalid identifier
Let me try to answer it.
Oracle uses the optimizer to decide the execution plan. Oracle may rewrite your query as it sees fit, depending on which form it thinks is better. IN and EXISTS are interchangeable, and performance depends on different things (EXISTS tends to end in a full table scan while IN can use an index).
Let me take your case. Below is the explain plan for your query
Plan hash value: 3333342911
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9 | 225 | 6 (0)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | TABLE ACCESS FULL | EMPLOYEE | 9 | 225 | 3 (0)| 00:00:01 |
|* 3 | FILTER | | | | | |
|* 4 | TABLE ACCESS FULL| DEPARTMENT | 1 | 12 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( EXISTS (SELECT 0 FROM "DEPARTMENT" "DEPARTMENT" WHERE
:B1=:B2 AND "DEPT_NAME"='MEC'))
3 - filter(:B1=:B2)
4 - filter("DEPT_NAME"='MEC')
Note
-----
- dynamic sampling used for this statement (level=2)
This explain plan clearly shows that the query is rewritten to use EXISTS and is equivalent to:
select emp_name from employee where exists (select 0 from department where dept_name = 'MEC' and dept_no = dept_no);
The above query is a valid query and you are getting the right results.
The bind variables are nothing but dept_no (the joining column).
Refer to this IN vs EXISTS in Oracle link to learn more about IN and EXISTS.
Your explain plan is completely different if you use the correct column name. Below is the query and explain plan
Query:
select emp_name from employee where dept_no IN (select dept_id from department where dept_name='MEC');
Explain Plan:
Plan hash value: 3817251802
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 100 | 7 (15)| 00:00:01 |
|* 1 | HASH JOIN SEMI | | 2 | 100 | 7 (15)| 00:00:01 |
| 2 | TABLE ACCESS FULL| EMPLOYEE | 9 | 225 | 3 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| DEPARTMENT | 1 | 25 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("DEPT_NO"="DEPT_ID")
3 - filter("DEPT_NAME"='MEC')
Note
-----
- dynamic sampling used for this statement (level=2)
Oracle thinks it is better to use a filter and a hash join to get the required details.
This behaviour depends on the Oracle query parser and optimizer.
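One practical way to avoid this surprise in the first place is to qualify every column in a subquery with its table alias, so an unintended correlation fails fast instead of silently turning into EXISTS - a sketch using the question's own tables (dep1 exposes dept_id, not dept_no):
select e.emp_name
from employee e
where e.dept_no in (select d.dept_id from dep1 d where d.dept_name = 'MEC');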
How about a join?
select a.emp_name
from employee a
join dep1 b
on a.dept_no = b.dept_id
where b.dept_name = 'MEC'
