Oracle Optimizer Extraneous Filter Predicate?

Why does Oracle still apply a filter predicate on an index even after the access predicate for that same index guarantees the filter predicate is always true?
drop table index_filter_child
;
drop table index_filter_parent
;
create table index_filter_parent
as
select level id, chr(mod(level - 1, 26) + ascii('A')) code from dual connect by level <= 26
;
create table index_filter_child
as
with
"C" as (select chr(mod(level - 1, 26) + ascii('A')) code from dual connect by level <= 26)
select rownum id, C1.code from C C1, C C2
;
exec dbms_stats.gather_table_stats(user, 'INDEX_FILTER_PARENT')
exec dbms_stats.gather_table_stats(user, 'INDEX_FILTER_CHILD')
create index ix_index_filter_parent on index_filter_parent(code)
;
create index ix_index_filter_child on index_filter_child(code)
;
select P.*
from index_filter_parent "P"
join index_filter_child "C"
on C.code = P.code
where P.code in('A','Z') --same result if we predicate instead on C.code in('A','Z')
;
--------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 | 35 | 4 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 5 | 35 | 4 (0)| 00:00:01 |
| 2 | INLIST ITERATOR | | | | | |
| 3 | TABLE ACCESS BY INDEX ROWID| INDEX_FILTER_PARENT | 2 | 10 | 2 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | IX_INDEX_FILTER_PARENT | 2 | | 1 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | IX_INDEX_FILTER_CHILD | 2 | 4 | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("P"."CODE"='A' OR "P"."CODE"='Z')
5 - access("C"."CODE"="P"."CODE")
filter("C"."CODE"='A' OR "C"."CODE"='Z') <========== why is this needed?
Why is the filter predicate on operation 5 needed, given that access("C"."CODE"="P"."CODE") already guarantees that C.code is 'A' or 'Z'?
Thank you in advance.
Oracle 12.1 Enterprise Edition.

This is a result of the "transitive closure" transformation. You can read more about it here:
Transitivity and Transitive Closure (Doc ID 68979.1)
Jonathan Lewis - Cartesian Merge Join
Jonathan Lewis - Transitive Closure (or, even better, in his book "Cost Based Oracle Fundamentals")
If you get a CBO trace (alter session set events '10053 trace name context forever, level 1' or alter session set events 'trace[SQL_Optimizer.*]'), you will see that the transformation happens before the join methods and access paths are chosen. It lets the CBO consider more access paths and choose the best available plan. Moreover, with adaptive plans, it allows Oracle to switch the join method on the fly. The generated predicate stays in the final plan even when the chosen join method makes it redundant, which is why it shows up as a filter on operation 5 in your nested loops plan.
For example, you can get a plan like this:
----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 52 | 364 | 4 (0)| 00:00:01 |
|* 1 | HASH JOIN | | 52 | 364 | 4 (0)| 00:00:01 |
| 2 | INLIST ITERATOR | | | | | |
| 3 | TABLE ACCESS BY INDEX ROWID BATCHED| INDEX_FILTER_PARENT | 2 | 10 | 2 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | IX_INDEX_FILTER_PARENT | 2 | | 1 (0)| 00:00:01 |
| 5 | INLIST ITERATOR | | | | | |
|* 6 | INDEX RANGE SCAN | IX_INDEX_FILTER_CHILD | 52 | 104 | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("C"."CODE"="P"."CODE")
4 - access("P"."CODE"='A' OR "P"."CODE"='Z')
6 - access("C"."CODE"='A' OR "C"."CODE"='Z')
In fact, you can disable it using the event 10155: CBO disable generation of transitive OR-chains.
Your example:
alter session set events '10155';
explain plan for
select P.*
from index_filter_parent "P"
join index_filter_child "C"
on C.code = P.code
where P.code in('A','Z');
Results:
SQL> alter session set events '10155';
Session altered.
SQL> explain plan for
2 select P.*
3 from index_filter_parent "P"
4 join index_filter_child "C"
5 on C.code = P.code
6 where P.code in('A','Z') ;
Explained.
SQL> #xplan typical
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------
Plan hash value: 2543178509
----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 52 | 364 | 4 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 52 | 364 | 4 (0)| 00:00:01 |
| 2 | INLIST ITERATOR | | | | | |
| 3 | TABLE ACCESS BY INDEX ROWID BATCHED| INDEX_FILTER_PARENT | 2 | 10 | 2 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | IX_INDEX_FILTER_PARENT | 2 | | 1 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | IX_INDEX_FILTER_CHILD | 26 | 52 | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("P"."CODE"='A' OR "P"."CODE"='Z')
5 - access("C"."CODE"="P"."CODE")
Note
-----
- this is an adaptive plan
22 rows selected.
As you can see, that predicate has disappeared.
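A toy model may make the transformation easier to picture. The Python sketch below is not Oracle code, and `transitive_inlists` with its dictionary inputs is a hypothetical name chosen for illustration. It only models the logical step described above: an IN-list on one side of an equi-join is copied to the other side (intersected with any predicate already there), so the optimizer ends up with C.code IN ('A','Z') even though the query only states P.code IN ('A','Z').

```python
from typing import Dict, List, Set, Tuple


def transitive_inlists(
    join_pairs: List[Tuple[str, str]],
    inlists: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Toy transitive closure: propagate IN-list predicates across equi-joins.

    join_pairs: columns joined with '=', e.g. [("C.code", "P.code")].
    inlists:    column -> allowed literals, e.g. {"P.code": {"A", "Z"}}.
    Returns the original predicates plus the derived ones.
    """
    result = {col: set(vals) for col, vals in inlists.items()}
    changed = True
    while changed:  # repeat until no new or narrower predicate can be derived
        changed = False
        for left, right in join_pairs:
            for src, dst in ((left, right), (right, left)):
                if src not in result:
                    continue
                # If dst already has a predicate, both must hold: intersect them.
                derived = result[src] if dst not in result else result[dst] & result[src]
                if result.get(dst) != derived:
                    result[dst] = set(derived)
                    changed = True
    return result


# The question's query: C.code = P.code AND P.code IN ('A', 'Z')
preds = transitive_inlists([("C.code", "P.code")], {"P.code": {"A", "Z"}})
print(sorted(preds["C.code"]))  # ['A', 'Z'], the derived predicate on C
```

Once the derived predicate exists, it can be used as an access predicate (the INLIST ITERATOR over IX_INDEX_FILTER_CHILD in the hash join plan) or be left over as a redundant filter (the nested loops plan in the question).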
PS. Other events for transitive predicates:
ORA-10155: CBO disable generation of transitive OR-chains
ORA-10171: CBO disable transitive join predicates
ORA-10179: CBO turn off transitive predicate replacement
ORA-10195: CBO don't use check constraints for transitive predicates

Related

Increase in query execution time after generating statistics for the target table (Oracle DB)

I have created a new table (tmp_requests_5) in the database and populated it with 290k rows of test data. Right after that, I execute the following query:
select *
from tmp_requests_5 r
join request_statuses rs on ( rs.id = r.status_id )
left join users u on ( u.id = r.created_user_id )
left join executors e on ( e.id = r.executor_id )
order by r.id desc
It runs in 0.045 seconds with the following execution plan:
Execution plan before generating statistics
------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 361K| 10G| 1039K (1)| 00:00:41 |
| 1 | NESTED LOOPS | | 361K| 10G| 1039K (1)| 00:00:41 |
| 2 | NESTED LOOPS OUTER | | 361K| 10G| 678K (1)| 00:00:27 |
| 3 | NESTED LOOPS OUTER | | 361K| 10G| 335K (1)| 00:00:14 |
|* 4 | TABLE ACCESS BY INDEX ROWID| TMP_REQUESTS_5 | 361K| 10G| 622 (0)| 00:00:01 |
| 5 | INDEX FULL SCAN DESCENDING| TMP_REQUESTS_5_PK | 361K| | 622 (0)| 00:00:01 |
| 6 | TABLE ACCESS BY INDEX ROWID| USERS | 1 | 119 | 1 (0)| 00:00:01 |
|* 7 | INDEX UNIQUE SCAN | USERS_PK | 1 | | 0 (0)| 00:00:01 |
|* 8 | TABLE ACCESS BY INDEX ROWID | EXECUTORS | 1 | 50 | 1 (0)| 00:00:01 |
|* 9 | INDEX UNIQUE SCAN | EXECUTORS_PK | 1 | | 0 (0)| 00:00:01 |
| 10 | TABLE ACCESS BY INDEX ROWID | REQUEST_STATUSES | 1 | 69 | 1 (0)| 00:00:01 |
|* 11 | INDEX UNIQUE SCAN | REQUEST_STATUSES_PK | 1 | | 0 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter(INTERNAL_FUNCTION("R"."DESCRIPTION" /*+ LOB_BY_VALUE */ ) AND
INTERNAL_FUNCTION("R"."RESOLUTION" /*+ LOB_BY_VALUE */ ) AND
INTERNAL_FUNCTION("R"."RESPONSE_TO_APPLICANT" /*+ LOB_BY_VALUE */ ))
7 - access("U"."ID"(+)="R"."CREATED_USER_ID")
8 - filter(INTERNAL_FUNCTION("E"."RESPONSIBILITY_AREA" /*+ LOB_BY_VALUE */ ))
9 - access("E"."ID"(+)="R"."EXECUTOR_ID")
11 - access("RS"."ID"="R"."STATUS_ID")
But after statistics are generated for the new table, the same query runs much slower (10 seconds), although the cost is reduced. Here is the execution plan after generating statistics:
Execution plan after generating statistics
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 289K| 390M| | 100K (1)| 00:00:04 |
| 1 | SORT ORDER BY | | 289K| 390M| 453M| 100K (1)| 00:00:04 |
|* 2 | HASH JOIN RIGHT OUTER | | 289K| 390M| | 14375 (1)| 00:00:01 |
| 3 | TABLE ACCESS FULL | USERS | 14 | 1666 | | 3 (0)| 00:00:01 |
|* 4 | HASH JOIN RIGHT OUTER| | 289K| 357M| | 14371 (1)| 00:00:01 |
|* 5 | TABLE ACCESS FULL | EXECUTORS | 4 | 200 | | 3 (0)| 00:00:01 |
|* 6 | HASH JOIN | | 289K| 343M| | 14367 (1)| 00:00:01 |
| 7 | TABLE ACCESS FULL | REQUEST_STATUSES | 5 | 345 | | 3 (0)| 00:00:01 |
|* 8 | TABLE ACCESS FULL | TMP_REQUESTS_5 | 289K| 324M| | 14363 (1)| 00:00:01 |
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("U"."ID"(+)="R"."CREATED_USER_ID")
4 - access("E"."ID"(+)="R"."EXECUTOR_ID")
5 - filter(INTERNAL_FUNCTION( /*+ LVC_LAZY_LOAD */ "E"."RESPONSIBILITY_AREA" /*+
LOB_BY_VALUE */ ))
6 - access("RS"."ID"="R"."STATUS_ID")
8 - filter(INTERNAL_FUNCTION( /*+ LVC_LAZY_LOAD */ "R"."DESCRIPTION" /*+ LOB_BY_VALUE */
) AND INTERNAL_FUNCTION( /*+ LVC_LAZY_LOAD */ "R"."RESOLUTION" /*+ LOB_BY_VALUE */ ) AND
INTERNAL_FUNCTION( /*+ LVC_LAZY_LOAD */ "R"."RESPONSE_TO_APPLICANT" /*+ LOB_BY_VALUE */ ))
I would like the execution plan to remain the same as before the statistics were generated, since it takes much less time. I've tried using optimizer hints, but have not been successful.
Is it possible to specify for the database the execution plan for this query as it was originally, before generating statistics for this table? If so, how can this be done and will there be any negative effects from this? I will be grateful for any help.
Database version: Oracle 19c Standard Edition

Limit rows examined in Oracle

My table has millions of records. In the query below, can I make Oracle 12c examine only the first X rows instead of doing a full table scan?
The value of X, I imagine, should be OFFSET + FETCH NEXT, so in this case 15.
SELECT * FROM my_table OFFSET 5 ROWS FETCH NEXT 10 ROWS ONLY;
Thanks in advance
Edit 1
These are the tables involved and this is the actual query
Orders - This table has 113k records in my test DB (and over 8 million in the prod DB, as my original question mentioned)
--------------------------
| Id | SKUField1|SKUField2|
--------------------------
| 1 | Value1 | Value2 |
| 2 | Value2 | Value2 |
| 3 | Value1 | Value3 |
--------------------------
Products - This table has 2 million records in my test DB (the prod DB is similar)
---------------
| PId| SKU_NUM|
---------------
| 1 | Value1 |
| 2 | Value2 |
| 3 | Value3 |
---------------
Note that values of Orders.SKUField1 and Orders.SKUField2 come from the Products.SKU_NUM values
Actual Query:
SELECT /*+ gather_plan_statistics */ Id, PId, SKUField1, SKUField2, SKU_NUM
FROM Orders
LEFT JOIN (
-- this inner query reduces size of Products from 2 million rows down to 1462 rows
select * from Products where SKU_NUM in (
select SKUField1 from Orders
)
) p1 ON SKUField1 = p1.SKU_NUM
LEFT JOIN (
-- this inner query reduces size of Products from 2 million rows down to 459 rows
select * from Products where SKU_NUM in (
select SKUField2 from Orders
)
) p4 ON SKUField2 = p4.SKU_NUM
OFFSET 5 ROWS FETCH NEXT 10 ROWS ONLY
Execution Plan:
--------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Time | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 10 |00:00:00.06 | 8013 | | | |
|* 1 | VIEW | | 1 | 00:00:01 | 10 |00:00:00.06 | 8013 | | | |
|* 2 | WINDOW NOSORT STOPKEY | | 1 | 00:00:01 | 15 |00:00:00.06 | 8013 | 27M| 1904K| |
|* 3 | HASH JOIN RIGHT OUTER | | 1 | 00:00:01 | 15 |00:00:00.06 | 8013 | 1162K| 1162K| 1344K (0)|
| 4 | VIEW | | 1 | 00:00:01 | 1462 |00:00:00.04 | 6795 | | | |
| 5 | NESTED LOOPS | | 1 | 00:00:01 | 1462 |00:00:00.04 | 6795 | | | |
| 6 | NESTED LOOPS | | 1 | 00:00:01 | 1462 |00:00:00.04 | 5333 | | | |
| 7 | SORT UNIQUE | | 1 | 00:00:01 | 1469 |00:00:00.04 | 3010 | 80896 | 80896 |71680 (0)|
| 8 | TABLE ACCESS FULL | Orders | 1 | 00:00:01 | 113K|00:00:00.02 | 3010 | | | |
|* 9 | INDEX UNIQUE SCAN | UIX_Product_SKU_NUM | 1469 | 00:00:01 | 1462 |00:00:00.01 | 2323 | | | |
| 10 | TABLE ACCESS BY INDEX ROWID | Products | 1462 | 00:00:01 | 1462 |00:00:00.01 | 1462 | | | |
|* 11 | HASH JOIN RIGHT OUTER | | 1 | 00:00:01 | 15 |00:00:00.02 | 1218 | 1142K| 1142K| 1335K (0)|
| 12 | VIEW | | 1 | 00:00:01 | 459 |00:00:00.02 | 1213 | | | |
| 13 | NESTED LOOPS | | 1 | 00:00:01 | 459 |00:00:00.02 | 1213 | | | |
| 14 | NESTED LOOPS | | 1 | 00:00:01 | 459 |00:00:00.02 | 754 | | | |
| 15 | SORT UNIQUE | | 1 | 00:00:01 | 462 |00:00:00.02 | 377 | 24576 | 24576 |22528 (0)|
| 16 | INDEX FAST FULL SCAN | Orders_SKUField2_IDX6 | 1 | 00:00:01 | 113K|00:00:00.01 | 377 | | | |
|* 17 | INDEX UNIQUE SCAN | UIX_Product_SKU_NUM | 462 | 00:00:01 | 459 |00:00:00.01 | 377 | | | |
| 18 | TABLE ACCESS BY INDEX ROWID| Products | 459 | 00:00:01 | 459 |00:00:00.01 | 459 | | | |
| 19 | TABLE ACCESS FULL | Orders | 1 | 00:00:01 | 15 |00:00:00.01 | 5 | | | |
--------------------------------------------------------------------------------------------------------------------------------------------------
Hence, based on the "A-Rows" values for Ids 8 and 16 in the execution plan, it seems like there are full scans of the Orders table (though Id 16 at least uses an index, via INDEX FAST FULL SCAN). So my question is: is it true that there is a full table scan on the Orders table even though I am using OFFSET/FETCH NEXT?
Although a query with a FETCH clause may use a full table scan, Oracle will still only fetch the first X rows from the table.
In the following example, the "TABLE ACCESS FULL" operation does start to read the entire table, but it gets cut off part of the way through by the "WINDOW NOSORT STOPKEY" operation. Not all full table scans actually scan the full table. You would see similar behavior if your query ended with WHERE ROWNUM <= 50.
CREATE TABLE some_table AS SELECT * FROM all_objects;
EXPLAIN PLAN FOR SELECT * FROM some_table OFFSET 5 ROWS FETCH NEXT 10 ROWS ONLY;
SELECT * FROM TABLE(dbms_xplan.display);
Plan hash value: 2559837639
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 15 | 7410 | 2 (0)| 00:00:01 |
|* 1 | VIEW | | 15 | 7410 | 2 (0)| 00:00:01 |
|* 2 | WINDOW NOSORT STOPKEY| | 15 | 2010 | 2 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL | SOME_TABLE | 15 | 2010 | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=15 AND
"from$_subquery$_002"."rowlimit_$$_rownumber">5)
2 - filter(ROW_NUMBER() OVER ( ORDER BY NULL )<=15)
The performance implications get more complicated if you also want to order the results. If that is the case, you may want to post the full query and execution plan.
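The difference between an unordered OFFSET/FETCH and an ordered one can be modelled in a few lines of Python. This is a toy, not Oracle internals, and all names are hypothetical: `CountingScan` stands in for TABLE ACCESS FULL and counts the rows it hands out, `offset_fetch_nosort` plays the role of WINDOW NOSORT STOPKEY, and `offset_fetch_sorted` shows an ORDER BY that no index can satisfy, where every row has to be read before the first one can be returned.

```python
from typing import Iterable, Iterator, List


class CountingScan:
    """Stand-in for a full table scan that counts the rows it actually produced."""

    def __init__(self, rows: List[int]) -> None:
        self.rows = rows
        self.rows_read = 0

    def __iter__(self) -> Iterator[int]:
        for row in self.rows:
            self.rows_read += 1
            yield row


def offset_fetch_nosort(scan: Iterable[int], offset: int, fetch: int) -> List[int]:
    """Unordered OFFSET/FETCH: stop pulling rows once offset + fetch have been seen."""
    if fetch <= 0:
        return []
    out = []
    for rn, row in enumerate(scan, start=1):
        if rn > offset:
            out.append(row)
        if rn >= offset + fetch:  # the "stopkey": the scan is abandoned here
            break
    return out


def offset_fetch_sorted(scan: Iterable[int], offset: int, fetch: int) -> List[int]:
    """ORDER BY without a usable index: the sort must consume the whole scan first."""
    ordered = sorted(scan)
    return ordered[offset:offset + fetch]


table = list(range(100_000, 0, -1))  # hypothetical 100,000-row table

scan = CountingScan(table)
page = offset_fetch_nosort(scan, offset=5, fetch=10)
print(len(page), scan.rows_read)  # 10 15

scan = CountingScan(table)
page = offset_fetch_sorted(scan, offset=5, fetch=10)
print(page[0], scan.rows_read)  # 6 100000
```

Read against the Orders plan in the question, the ORDERS scan at Id 19 (A-Rows 15, 5 buffers) behaves roughly like the first case, while the scan at Id 8 feeds a SORT UNIQUE (Id 7) and has to deliver all 113K rows, like the second.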
(EDIT: 2022-09-25)
Yes, there is a full table scan on the ORDERS table at Id 8 of the execution plan. As you mentioned, you can look at the "A-Rows" column to tell what's really happening.
But the third access of ORDERS, the full table scan at Id 19, is not a "full" full table scan. The "WINDOW NOSORT STOPKEY" operation stops that scan as soon as the 15 necessary rows are read. So the FETCH syntax is helping at least a little.
Applying FETCH to a query does not mean that every table access will be limited. That said, in your query it does seem like there ought to be a way to reduce the full table scans. Perhaps an index on SKUField1 would help?
As far as I know, Oracle doesn't provide something like LIMIT or TOP, but you can build it yourself like this:
Here the inner query sorts the rows and the outer query keeps only the first 10 of them. You can still use any other clauses, such as WHERE or ORDER BY, in the inner query.
SELECT * FROM (
SELECT * FROM Customers ORDER BY CustomerID
)
WHERE ROWNUM <= 10
The full article on this topic is here: Oracle-Fetch
I used an online Oracle instance, so you can try it yourself. Let me know if you still have a problem.

Oracle Compound Join Predicate Causes Row Estimate to be Incorrect

In the example below, the Oracle optimizer's row estimate is off by two orders of magnitude. How do I improve the estimated rows?
Table A has rows with numbers 1 through 1,000 for each of the 10 letters A through J.
Table C has 100 copies of table A.
So, table A has a cardinality of 10K and table C has a cardinality of 1M.
A given single-valued predicate on the number in table A will yield 1/1000 of the rows in table A (same for table C).
A given single-valued predicate on the letter in table A will yield 1/10 of the rows in table A (same for table C).
Setup script.
drop table C;
drop table A;
create table A
( num NUMBER
, val VARCHAR2(3 byte)
, pad CHAR(40 byte)
)
;
insert /*+ append enable_parallel_dml parallel (auto) */
into A (num, val, pad)
select mod(level-1, 1000) +1
, chr(mod(ceil(level/1000) - 1, 10) + ascii('A'))
, ' '
from dual
connect by level <= 10*1000
;
create table C
( id NUMBER
, num NUMBER
, val VARCHAR2(3 byte)
, pad CHAR(40 byte)
)
;
insert /*+ append enable_parallel_dml parallel (auto) */
into C (id, num, val, pad)
with
"D1" as
( select /*+ materialize */ null from dual connect by level <= 100 --320
)
, "D" as
( select /*+ materialize */
level rn
, mod(level-1, 1000) + 1 num
, chr(mod(ceil(level/1000) - 1, 10) + ascii('A')) val
, ' ' pad
from dual
connect by level <= 10*1000
order by 1 offset 0 rows
)
select rownum id
, num num
, val val
, pad pad
from "D1", "D"
;
commit;
exec dbms_stats.gather_table_stats(OwnName => null, TabName => 'A', cascade => true);
exec dbms_stats.gather_table_stats(OwnName => null, TabName => 'C', cascade => true);
Consider the explain plan for the following query.
select *
from A
join C
on A.num = C.num
and A.val = C.val
where A.num = 1
and A.val = 'A'
;
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100 | 9900 | 2209 (1)| 00:00:01 |
|* 1 | HASH JOIN | | 100 | 9900 | 2209 (1)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| A | 1 | 47 | 23 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| C | 100 | 5200 | 2185 (1)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."NUM"="C"."NUM" AND "A"."VAL"="C"."VAL")
2 - filter("A"."NUM"=1 AND "A"."VAL"='A')
3 - filter("C"."NUM"=1 AND "C"."VAL"='A')
The row cardinality of each step makes sense to me.
ID=2 --> (1/1,000) * (1/10) * 10,000 = 1
ID=3 --> (1/1,000) * (1/10) * 1,000,000 = 100
ID=1 --> 100 is correct. The predicates in ID=2 and ID=3 are the same, so every row from ID=3 has one and only one match in the row source from ID=2.
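The arithmetic above can be checked mechanically. This is a minimal Python sketch that assumes uniformly distributed, independent num and val columns (true of this data set by construction). The constants come from the setup script; the function names are made up for illustration.

```python
from fractions import Fraction

ROWS_A = 10_000     # 1,000 numbers x 10 letters
ROWS_C = 1_000_000  # 100 copies of A
NDV_NUM = 1_000     # distinct values of num
NDV_VAL = 10        # distinct values of val


def filtered_rows(table_rows: int, n_nums: int, n_vals: int) -> Fraction:
    """Rows left after num IN (n_nums values) AND val IN (n_vals values)."""
    return Fraction(n_nums, NDV_NUM) * Fraction(n_vals, NDV_VAL) * table_rows


def actual_join_rows(n_nums: int, n_vals: int) -> Fraction:
    """Each surviving A row matches exactly ROWS_C // ROWS_A rows of C."""
    return filtered_rows(ROWS_A, n_nums, n_vals) * (ROWS_C // ROWS_A)


# WHERE A.num = 1 AND A.val = 'A'
print(filtered_rows(ROWS_A, 1, 1), filtered_rows(ROWS_C, 1, 1), actual_join_rows(1, 1))
# 1 100 100

# WHERE A.num IN (1, 2) AND A.val = 'A'
print(filtered_rows(ROWS_A, 2, 1), filtered_rows(ROWS_C, 2, 1), actual_join_rows(2, 1))
# 2 200 200
```

For the second query, the plan reports 2 and 200 for ID=2 and ID=3, matching this model, but only 2 for the join.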
Now consider the explain plan for the slightly modified query below.
select *
from A
join C
on A.num = C.num
and A.val = C.val
where A.num in(1,2)
and A.val = 'A'
;
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 198 | 2209 (1)| 00:00:01 |
|* 1 | HASH JOIN | | 2 | 198 | 2209 (1)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| A | 2 | 94 | 23 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| C | 200 | 10400 | 2185 (1)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."NUM"="C"."NUM" AND "A"."VAL"="C"."VAL")
2 - filter("A"."VAL"='A' AND ("A"."NUM"=1 OR "A"."NUM"=2))
3 - filter("C"."VAL"='A' AND ("C"."NUM"=1 OR "C"."NUM"=2))
The row cardinality of each step ID=2 and ID=3 makes sense to me, but now ID=1 is incorrect by two orders of magnitude.
ID=2 --> (2/1,000)(1/10) * 10,000 = 2
ID=3 --> (2/1,000)(1/10) * 1,000,000 = 200
ID=1 --> The actual result is 200 rows (each of the 2 rows from ID=2 matches 100 rows from ID=3), but the optimizer estimates 2, two orders of magnitude too low.
Adding unique and foreign key constraints and extended statistics did not improve the estimated row counts.
create unique index IU_A on A (num, val);
alter table A add constraint UK_A unique (num, val) rely using index IU_A enable validate;
alter table C add constraint R_C foreign key (num, val) references A (num, val) rely enable validate;
create index IR_C on C (num, val);
select dbms_stats.create_extended_stats(null,'A','(num, val)') from dual;
select dbms_stats.create_extended_stats(null,'C','(num, val)') from dual;
exec dbms_stats.gather_table_stats(OwnName => null, TabName => 'A', cascade => true);
exec dbms_stats.gather_table_stats(OwnName => null, TabName => 'C', cascade => true);
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 198 | 10 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 2 | 198 | 10 (0)| 00:00:01 |
| 3 | INLIST ITERATOR | | | | | |
| 4 | TABLE ACCESS BY INDEX ROWID| A | 2 | 94 | 5 (0)| 00:00:01 |
|* 5 | INDEX UNIQUE SCAN | IU_A | 2 | | 3 (0)| 00:00:01 |
|* 6 | INDEX RANGE SCAN | IR_C | 1 | | 2 (0)| 00:00:01 |
| 7 | TABLE ACCESS BY INDEX ROWID | C | 1 | 52 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access(("A"."NUM"=1 OR "A"."NUM"=2) AND "A"."VAL"='A')
6 - access("A"."NUM"="C"."NUM" AND "C"."VAL"='A')
filter("C"."NUM"=1 OR "C"."NUM"=2)
What do I need to do to make the estimated rows better match reality?
Using Oracle Enterprise Edition 19c.
Thanks in advance.
Edit
After ensuring the most recent optimizer_features_enable was used and modifying one of the predicates, we still have an explain plan whose estimated row count is short by two orders of magnitude.
ID=6 ought to have an estimated rows of 100. It seems it is applying the predicate factor twice. Once for the access and again for the filter.
select /*+ optimizer_features_enable('19.1.0') */
*
from A
join C
on A.num = C.num
and A.val = C.val
where A.num in(1,2)
and A.val in('A','B')
;
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4 | 396 | 16 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 4 | 396 | 16 (0)| 00:00:01 |
| 2 | NESTED LOOPS | | 4 | 396 | 16 (0)| 00:00:01 |
| 3 | INLIST ITERATOR | | | | | |
| 4 | TABLE ACCESS BY INDEX ROWID BATCHED| A | 4 | 188 | 7 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | IU_A | 4 | | 3 (0)| 00:00:01 |
|* 6 | INDEX RANGE SCAN | IR_C | 1 | | 2 (0)| 00:00:01 |
| 7 | TABLE ACCESS BY INDEX ROWID | C | 1 | 52 | 3 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("A"."NUM"=1 OR "A"."NUM"=2)
filter("A"."VAL"='A' OR "A"."VAL"='B')
6 - access("A"."NUM"="C"."NUM" AND "A"."VAL"="C"."VAL")
filter(("C"."NUM"=1 OR "C"."NUM"=2) AND ("C"."VAL"='A' OR "C"."VAL"='B'))
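The "applied twice" hypothesis for ID=6 can be checked with the same kind of arithmetic. This is a hedged sketch, not a description of Oracle's costing code. It assumes that a probe of IR_C on (num, val) is estimated at ROWS_C / 10,000 rows (10,000 distinct pairs, as the column group statistics would record), and that plans never display an estimate below 1 row. Function names are hypothetical.

```python
from fractions import Fraction

ROWS_C = 1_000_000
NDV_NUM_VAL = 10_000  # distinct (num, val) pairs in C


def rows_per_probe(apply_filter_again: bool) -> Fraction:
    """Estimated IR_C rows per outer row for the query in the Edit."""
    per_probe = Fraction(ROWS_C, NDV_NUM_VAL)  # join on (num, val): 100 rows
    if apply_filter_again:
        # Hypothesis: num IN (1,2) AND val IN ('A','B') is charged a second time.
        per_probe *= Fraction(2, 1_000) * Fraction(2, 10)
    return per_probe


def displayed_rows(estimate: Fraction) -> int:
    """Assumption: the plan rounds the estimate and never shows fewer than 1 row."""
    return max(1, round(estimate))


print(displayed_rows(rows_per_probe(False)))  # 100, what the asker expects
print(displayed_rows(rows_per_probe(True)))   # 1, what ID=6 shows
```

Under these assumptions the hypothesis reproduces the 1 shown for ID=6 (0.04 rows before rounding) and, with 4 outer rows, the 4 shown for the join, against an actual 400. Matching numbers are consistent with the hypothesis but do not prove it; a 10053 trace of the statement would show the selectivities the optimizer actually used.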

Query does not use specified parallel degree

In my Oracle 12c database I want a statement to be executed with parallel degree 2 without the use of a hint. Note: this is a sample table so there is no improvement in cost or time.
Execution Plan with parallelism 1
PLAN_TABLE_OUTPUT
-----------------
Plan hash value: 2671887276
-----------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------
| 0 | SELECT STATEMENT | | 1 | 674 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| EVENT | 1 | 674 | 2 (0)| 00:00:01 |
|* 2 | INDEX UNIQUE SCAN | EVENT_PK | 1 | | 1 (0)| 00:00:01 |
--------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("EVENT_PK"='zjmtzhjrth')
Note
-----
- automatic DOP: Computed Degree of Parallelism is 1 because of parallel threshold
Execution plan with hint /*+parallel(2) */ where DoP works fine
PLAN_TABLE_OUTPUT
---------------
Plan hash value: 2851389777
----------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
---------------
| 0 | SELECT STATEMENT | | 1 | 674 | 2 (0)| 00:00:01 | | | |
| 1 | PX COORDINATOR | | | | | | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10001 | 1 | 674 | 2 (0)| 00:00:01 | Q1,01 | P->S | QC (RAND) |
| 3 | TABLE ACCESS BY INDEX ROWID | EVENT | 1 | 674 | 2 (0)| 00:00:01 | Q1,01 | PCWP | |
| 4 | BUFFER SORT | | | | | | Q1,01 | PCWC | |
| 5 | PX RECEIVE | | 1 | | 1 (0)| 00:00:01 | Q1,01 | PCWP | |
| 6 | PX SEND HASH (BLOCK ADDRESS)| :TQ10000 | 1 | | 1 (0)| 00:00:01 | Q1,00 | S->P | HASH (BLOCK|
| 7 | PX SELECTOR | | | | | | Q1,00 | SCWC | |
|* 8 | INDEX UNIQUE SCAN | EVENT_PK | 1 | | 1 (0)| 00:00:01 | Q1,00 | SCWP | |
--------------------
Predicate Information (identified by operation id):
---------------------------------------------------
8 - access("EVENT_PK"='zjmtzhjrth')
Note
-----
- Degree of Parallelism is 2 because of hint
Then I executed the following statements
alter system set parallel_degree_policy=MANUAL;
alter table event parallel 2;
But when I execute the statement without the hint, it doesn't use parallelism. It doesn't even give me the Note about the DoP in the execution plan.
PLAN_TABLE_OUTPUT
----------------
Plan hash value: 2671887276
-----------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 674 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| EVENT | 1 | 674 | 2 (0)| 00:00:01 |
|* 2 | INDEX UNIQUE SCAN | EVENT_PK | 1 | | 1 (0)| 00:00:01 |
-------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("EVENT_PK"='zjmtzhjrth')
Can anyone tell me why this is not working?
Regarding the questions in the comments:
PARALLEL_DEGREE_LIMIT=CPU
When I set PARALLEL_DEGREE_POLICY back to AUTO it gives me the note again:
Note
-----
- automatic DOP: Computed Degree of Parallelism is 1 because of parallel threshold
The statement I issued for my tests is
select * from event where event_pk = 'swdfklwe';
Following Cyrille's comment I tried every combination of selected columns and columns in the where clause. The statement just won't use DoP 2 when an index unique scan is used.
select event_pk, result from event where event_pk = 'swdfklwe'
select event_pk from event where event_pk = 'swdfklwe'
select event_pk, result from event where event_pk = 'swdfklwe' and result = 0
select event_pk from event where event_pk = 'swdfklwe' and result = 0
Parallel execution is for speeding up queries that traverse a large number of records. It divides the total set of records to be searched into smaller sets and processes multiple sets concurrently. This trades increased consumption of system resources (primarily CPU) for a reduced total response time.
Your table has a unique index on the searched column, so there can be only one record that matches "EVENT_PK"='zjmtzhjrth'. There is no way parallelism can make that faster.
The optimizer has chosen the most efficient access path to retrieve one row. Be happy that it has.
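The point about dividing the work can be illustrated with a toy Python model. It is not Oracle's PX implementation, and `block_granules` and `parallel_count` are hypothetical names. A full scan over many blocks splits into several block-range granules that workers can process concurrently (roughly the role of PX BLOCK ITERATOR in the full-scan plan in the next answer), while a unique-key lookup is a single unit of work that cannot be split.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def block_granules(n_blocks: int, dop: int) -> List[range]:
    """Split block numbers 0..n_blocks-1 into at most `dop` contiguous ranges."""
    if n_blocks <= 0:
        return []
    size = -(-n_blocks // dop)  # ceiling division
    return [range(start, min(start + size, n_blocks)) for start in range(0, n_blocks, size)]


def parallel_count(blocks: List[List[int]], predicate: Callable[[int], bool], dop: int) -> int:
    """Count matching rows with one worker per granule, then combine the results."""
    def scan(granule: range) -> int:
        return sum(1 for b in granule for row in blocks[b] if predicate(row))

    with ThreadPoolExecutor(max_workers=dop) as pool:
        return sum(pool.map(scan, block_granules(len(blocks), dop)))


# 100 blocks of 10 rows each: a full scan splits into 2 granules at DOP 2.
blocks = [[b * 10 + r for r in range(10)] for b in range(100)]
print(block_granules(len(blocks), 2))                       # [range(0, 50), range(50, 100)]
print(parallel_count(blocks, lambda row: row % 7 == 0, 2))  # 143

# A unique-index lookup touches one index entry: one granule, nothing to share.
print(block_granules(1, 2))                                 # [range(0, 1)]
```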
Why wouldn't it? It works as expected on my side:
SQL> create table t1 (id number);
Table created.
SQL> alter table t1 parallel 2;
Table altered.
SQL> explain plan for select * from t1;
Explained.
SQL> #?/rdbms/admin/utlxpls
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------
Plan hash value: 2494645258
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 2 (0)| 00:00:01 |
| 1 | PX COORDINATOR | | | | | |
| 2 | PX SEND QC (RANDOM)| :TQ10000 | 1 | 13 | 2 (0)| 00:00:01 |
| 3 | PX BLOCK ITERATOR | | 1 | 13 | 2 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL| T1 | 1 | 13 | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------------
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
- Degree of Parallelism is 2 because of table property
And here are my parameters (all defaults):
SQL> show parameter parallel
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
containers_parallel_degree integer 65535
fast_start_parallel_rollback string LOW
parallel_adaptive_multi_user boolean FALSE
parallel_degree_limit string CPU
parallel_degree_policy string MANUAL
parallel_execution_message_size integer 16384
parallel_force_local boolean FALSE
parallel_instance_group string
parallel_max_servers integer 40
parallel_min_percent integer 0
parallel_min_servers integer 4
parallel_min_time_threshold string AUTO
parallel_servers_target integer 16
parallel_threads_per_cpu integer 2
recovery_parallelism integer 0
SQL>

Oracle 11gR2 - View Function Columns Evaluation

I seem to have an odd issue with an Oracle view whose columns are defined by function calls, concerning when those functions are evaluated.
Let's say I have the following view and function definition:
CREATE OR REPLACE VIEW test_view_one AS
SELECT column_one,
a_package.function_that_returns_a_value(column_one) function_column
FROM a_table;
CREATE OR REPLACE PACKAGE BODY a_package AS
FUNCTION function_that_returns_a_value(p_key VARCHAR2) RETURN VARCHAR2 IS
CURSOR a_cur IS
SELECT value
FROM table_b
WHERE key = p_key;
p_temp VARCHAR2(30);
BEGIN
-- Code here to write into a temp table. The function call is autonomous.
OPEN a_cur;
FETCH a_cur INTO p_temp;
CLOSE a_cur;
RETURN p_temp;
END function_that_returns_a_value;
END a_package;
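The logging code is elided in the function above ("Code here to write into a temp table"). A minimal sketch of one way to do it follows; the table `fn_call_log` and the procedure `log_fn_call` are hypothetical, and a plain table is assumed instead of a temporary one for simplicity. The insert runs in its own autonomous transaction, which is what allows it to happen inside a function called from a SELECT:

```sql
-- Hypothetical log table: one row per function evaluation.
CREATE TABLE fn_call_log (
  logged_at TIMESTAMP DEFAULT SYSTIMESTAMP,
  key_value VARCHAR2(30)
);

-- Hypothetical helper. PRAGMA AUTONOMOUS_TRANSACTION lets it insert and
-- commit independently of the calling SELECT; plain DML inside a query
-- would fail with ORA-14551.
CREATE OR REPLACE PROCEDURE log_fn_call (p_key IN VARCHAR2) IS
  PRAGMA AUTONOMOUS_TRANSACTION;
BEGIN
  INSERT INTO fn_call_log (key_value) VALUES (p_key);
  COMMIT;  -- an autonomous transaction must end before it returns
END log_fn_call;
/

-- In function_that_returns_a_value, call log_fn_call(p_key) at the
-- "Code here" comment. After running a paging query, count the calls:
SELECT COUNT(*) AS calls FROM fn_call_log;

-- Reset before the next test.
TRUNCATE TABLE fn_call_log;
```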
In general, I would expect that if function_column is included in a query, the function would run once for every row that query brings back. This seems to be true in some circumstances but not in others.
For example, let's say I have the following:
SELECT pageouter.*
FROM(WITH page_query AS (SELECT *
FROM test_view_one
ORDER BY column_one)
SELECT page_query.*, ROWNUM as innerrownum
FROM page_query
WHERE rownum <= 25) pageouter WHERE pageouter.innerrownum >= 1
In this scenario, that inner query (the one querying test_view_one) brings back around 90,000 records.
If I have the function insert into a temporary table, I can tell that it ran 25 times, once for each row brought back. Exactly what I would expect.
However, if I add a significant WHERE clause to that inner query, e.g.
SELECT pageouter.*
FROM(WITH page_query AS (SELECT *
FROM test_view_one
WHERE EXISTS (SELECT 'x' FROM some_table WHERE ...)
AND NOT EXISTS (SELECT 'x' FROM some_other_table WHERE ...)
AND EXISTS (SELECT 'x' FROM another_table WHERE ...)
ORDER BY column_one)
SELECT page_query.*, ROWNUM as innerrownum
FROM page_query
WHERE rownum <= 25) pageouter WHERE pageouter.innerrownum >= 1
then the number of rows brought back by the inner query is 60,000, and if I then query the temporary table, I can tell the function has run 60,000 times. Unsurprisingly, this pretty much destroys the performance of the query.
The queries above are run as part of a paging implementation, which is why we only ever bring back 25 rows and why we only ever need the functions to run for those 25 rows.
I should add that if I change the WHERE clause (i.e. remove some of the conditions), the query goes back to behaving itself, only running the functions for the 25 rows that are actually brought back.
Does anyone have any idea when functions in views are evaluated? Or is there any way of determining what causes this, or of identifying when the functions are evaluated? (I've checked the explain plan and nothing in there seems to give it away.) If I knew that, I could hopefully find a solution, but there seems to be little documentation other than "They'll run for each row brought back", which is clearly not the case in some scenarios.
I fully appreciate it's difficult to work out what's going on without a working schema, but if you need any more info then please feel free to ask.
Many Thanks
Additional Info as Requested.
Below is the actual explain plan that I get from the production environment. The table names don't match the above query (in fact there are considerably more tables involved, but they're all joined by NOT EXISTS conditions within the WHERE clause).
The DEMISE table is the equivalent of A_TABLE in the above query.
It's worth noting that stats were gathered just before I ran the explain plan to make it as accurate as possible.
My understanding was that the VIEW row is where the functions would be evaluated, which happens AFTER the rows have been filtered down. My understanding is obviously flawed!
So this is the bad plan, the one that calls the function 60,000 times...
Execution Plan
----------------------------------------------------------
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 | 10230 | 984 (1)|
| 1 | FAST DUAL | | 1 | | 2 (0)|
| 2 | FAST DUAL | | 1 | | 2 (0)|
|* 3 | VIEW | | 5 | 10230 | 984 (1)|
|* 4 | COUNT STOPKEY | | | | |
| 5 | VIEW | | 5 | 10165 | 984 (1)|
|* 6 | SORT ORDER BY STOPKEY | | 5 | 340 | 984 (1)|
| 7 | COUNT | | | | |
|* 8 | FILTER | | | | |
|* 9 | HASH JOIN RIGHT OUTER | | 5666 | 376K| 767 (1)|
|* 10 | INDEX RANGE SCAN | USERDATAI1 | 1 | 12 | 2 (0)|
|* 11 | HASH JOIN RIGHT ANTI | | 5666 | 309K| 765 (1)|
|* 12 | INDEX FAST FULL SCAN | TNNTMVINI1 | 1 | 17 | 35 (0)|
|* 13 | HASH JOIN RIGHT ANTI | | 6204 | 236K| 729 (1)|
|* 14 | INDEX RANGE SCAN | CODESGENI3 | 1 | 10 | 2 (0)|
|* 15 | INDEX FULL SCAN | DEMISEI4 | 6514 | 184K| 727 (1)|
| 16 | NESTED LOOPS | | 1 | 25 | 3 (0)|
| 17 | NESTED LOOPS | | 1 | 25 | 3 (0)|
|* 18 | INDEX RANGE SCAN | PROPERTY_GC | 1 | 15 | 2 (0)|
|* 19 | INDEX UNIQUE SCAN | CODESGENI1 | 1 | | 0 (0)|
|* 20 | TABLE ACCESS BY INDEX ROWID| CODESGEN | 1 | 10 | 1 (0)|
| 21 | TABLE ACCESS FULL | QCDUAL | 1 | | 3 (0)|
|* 22 | INDEX RANGE SCAN | DMSELEASI4 | 1 | 21 | 2 (0)|
|* 23 | INDEX RANGE SCAN | TNNTMVINI1 | 1 | 17 | 1 (0)|
| 24 | TABLE ACCESS FULL | QCDUAL | 1 | | 3 (0)|
-------------------------------------------------------------------------------------------
This is the good plan. It calls the function 25 times, but has some of the NOT EXISTS conditions removed from the WHERE clause.
Execution Plan
----------------------------------------------------------
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 25 | 54200 | 144 (0)|
| 1 | FAST DUAL | | 1 | | 2 (0)|
| 2 | FAST DUAL | | 1 | | 2 (0)|
|* 3 | VIEW | | 25 | 54200 | 144 (0)|
|* 4 | COUNT STOPKEY | | | | |
| 5 | VIEW | | 26 | 56030 | 144 (0)|
| 6 | COUNT | | | | |
|* 7 | FILTER | | | | |
| 8 | NESTED LOOPS ANTI | | 30 | 3210 | 144 (0)|
| 9 | NESTED LOOPS OUTER | | 30 | 2580 | 114 (0)|
| 10 | NESTED LOOPS ANTI | | 30 | 2220 | 84 (0)|
| 11 | NESTED LOOPS ANTI | | 32 | 1824 | 52 (0)|
| 12 | TABLE ACCESS BY INDEX ROWID| DEMISE | 130K| 5979K| 18 (0)|
| 13 | INDEX FULL SCAN | DEMISEI4 | 34 | | 3 (0)|
|* 14 | INDEX RANGE SCAN | CODESGENI3 | 1 | 10 | 1 (0)|
|* 15 | INDEX RANGE SCAN | TNNTMVINI1 | 1 | 17 | 1 (0)|
|* 16 | INDEX RANGE SCAN | USERDATAI1 | 1 | 12 | 1 (0)|
|* 17 | INDEX RANGE SCAN | DMSELEASI4 | 1 | 21 | 1 (0)|
| 18 | TABLE ACCESS FULL | QCDUAL | 1 | | 3 (0)|
----------------------------------------------------------------------------------------
I fully appreciate that the second plan is doing less, but that doesn't explain why the functions aren't being evaluated for every row in it... at least not that I can work out.
Pagination with ROWNUM may be performed in two ways:
A) a full scan of the row source with optimized sorting (limited to the top N rows), or
B) index access of the row source with no sort at all.
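The DDL behind the TEST table, MY_PACKAGE.MY_FUNCTION and TEST_VIEW_ONE (with columns ID and FUNCTION_COLUMN) that appear in the plans and projections of this answer is not shown. A hypothetical reconstruction, assuming a 90,000-row table and a function whose only job is to count its own calls in a package variable:

```sql
-- 90,000 rows, matching the row estimate in the case A plan.
CREATE TABLE test AS
  SELECT ROWNUM AS id FROM dual CONNECT BY LEVEL <= 90000;

CREATE OR REPLACE PACKAGE my_package AS
  g_calls NUMBER := 0;                      -- per-session call counter
  FUNCTION my_function (p_id NUMBER) RETURN NUMBER;
END my_package;
/

CREATE OR REPLACE PACKAGE BODY my_package AS
  FUNCTION my_function (p_id NUMBER) RETURN NUMBER IS
  BEGIN
    g_calls := g_calls + 1;                 -- record every evaluation
    RETURN p_id * 2;                        -- placeholder result
  END my_function;
END my_package;
/

CREATE OR REPLACE VIEW test_view_one AS
  SELECT id, my_package.my_function(id) AS function_column
  FROM   test;

EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'TEST')
```

To count evaluations, reset with `EXEC my_package.g_calls := 0`, run the paging query, then print the counter with `EXEC DBMS_OUTPUT.PUT_LINE(my_package.g_calls)` after `SET SERVEROUTPUT ON`. The counter lives in package state, so it is per session.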
Here is a simplified example of case A:
SELECT *
FROM
(SELECT a.*,
ROWNUM rnum
FROM
( SELECT * FROM test_view_one ORDER BY id
) a
WHERE ROWNUM <= 25
)
WHERE rnum >= 1
The corresponding execution plan looks as follows (note that I also present part of the column projection; I will explain why shortly):
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 25 | 975 | | 1034 (1)| 00:00:01 |
|* 1 | VIEW | | 25 | 975 | | 1034 (1)| 00:00:01 |
|* 2 | COUNT STOPKEY | | | | | | |
| 3 | VIEW | | 90000 | 2285K| | 1034 (1)| 00:00:01 |
|* 4 | SORT ORDER BY STOPKEY| | 90000 | 439K| 1072K| 1034 (1)| 00:00:01 |
| 5 | TABLE ACCESS FULL | TEST | 90000 | 439K| | 756 (1)| 00:00:01 |
-----------------------------------------------------------------------------------------
Column Projection Information (identified by operation id):
-----------------------------------------------------------
...
3 - "A"."ID"[NUMBER,22], "A"."FUNCTION_COLUMN"[NUMBER,22]
4 - (#keys=1) "ID"[NUMBER,22], "MY_PACKAGE"."MY_FUNCTION"("ID")[22]
5 - "ID"[NUMBER,22]
During execution the table is accessed with a FULL SCAN, i.e. all records are read.
The optimization is in the SORT operation: SORT ORDER BY STOPKEY means that not all rows are sorted; only the top 25 are kept and sorted.
Here is the execution plan for case B:
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 25 | 975 | 2 (0)| 00:00:01 |
|* 1 | VIEW | | 25 | 975 | 2 (0)| 00:00:01 |
|* 2 | COUNT STOPKEY | | | | | |
| 3 | VIEW | | 26 | 676 | 2 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN| TEST_IDX | 26 | 130 | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------------
Here only the required 25 rows are accessed, so the function can't be called more than N times.
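Case B needs an index that can hand the rows over already in ORDER BY order. A sketch against a TEST table like the one in these plans; the TEST_IDX name is taken from the plan, while the NOT NULL constraint is an assumption. A single-column B-tree index stores no entries for NULL keys, so without NOT NULL the optimizer cannot rely on it to return every row of an unfiltered ORDER BY:

```sql
-- Rows with a NULL ID would be missing from a single-column B-tree
-- index, so declare the column NOT NULL first.
ALTER TABLE test MODIFY (id NOT NULL);

-- Index on the ORDER BY column, so the rows can be read pre-sorted.
CREATE INDEX test_idx ON test (id);
```

With this in place the optimizer may choose an INDEX FULL SCAN rather than the INDEX RANGE SCAN shown above; either way the rows arrive sorted, so COUNT STOPKEY can stop after the first 25.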
Now the important consideration: in case A, the function can, but need not, be called for each row. How can we see which happens?
The answer is in the column projection in the explain plan.
4 - (#keys=1) "ID"[NUMBER,22], "MY_PACKAGE"."MY_FUNCTION"("ID")[22]
Line 4 shows that the function is called in the SORT operation and therefore for each row (the sort receives all the rows).
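The column projection is not part of the default DBMS_XPLAN output. A sketch of how to request it for the case A query, using the `+PROJECTION` format modifier; the query assumes the TEST_VIEW_ONE(ID, FUNCTION_COLUMN) view used in this answer:

```sql
EXPLAIN PLAN FOR
SELECT *
FROM  (SELECT a.*, ROWNUM rnum
       FROM  (SELECT * FROM test_view_one ORDER BY id) a
       WHERE  ROWNUM <= 25)
WHERE  rnum >= 1;

-- TYPICAL is the default format; +PROJECTION appends the
-- "Column Projection Information" section.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY(format => 'TYPICAL +PROJECTION'));

-- Alternatively, right after actually running the query in the same
-- session (in SQL*Plus, SET SERVEROUTPUT OFF first):
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'TYPICAL +PROJECTION'));
```

Then look for the function call in the projection: if it appears at the SORT ORDER BY STOPKEY step, it is evaluated for every row fed into the sort.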
Conclusion
My test on 11.2 shows that in case A (FULL SCAN with SORT ORDER BY STOPKEY) the view function is called once for each row.
I guess the only workaround is to select only the ID, limit the result and then join back to the original view to get the function value.
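The workaround can be sketched as follows, assuming the TEST table and TEST_VIEW_ONE(ID, FUNCTION_COLUMN) view used in this answer, and assuming ID is unique so the join back cannot duplicate rows. The inner query never references FUNCTION_COLUMN, so the sort carries only ID, and the function is needed only for the (at most) 25 rows that survive. Whether the optimizer really evaluates it that late is worth confirming with the column projection.

```sql
SELECT v.id, v.function_column, page.rnum
FROM  (SELECT id, rnum
       FROM  (SELECT t.id, ROWNUM AS rnum
              -- Step 1: sort and limit the bare key only.
              FROM  (SELECT id FROM test ORDER BY id) t
              WHERE  ROWNUM <= 25)
       WHERE  rnum >= 1) page
-- Step 2: join the surviving keys back to the view.
JOIN   test_view_one v
  ON   v.id = page.id
ORDER  BY page.rnum;
```

In the question's case, the EXISTS / NOT EXISTS conditions would go into the innermost key-only query, which can read the base table (A_TABLE there) instead of the view.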
Final notes
I tested this in 12.1 as well; note the shift in the column projection below.
The function is not calculated until the VIEW (line 3), so both cases work fine.
Column Projection Information (identified by operation id):
-----------------------------------------------------------
...
3 - "A"."ID"[NUMBER,22], "A"."FUNCTION_COLUMN"[NUMBER,22]
4 - (#keys=1) "ID"[NUMBER,22]
5 - "ID"[NUMBER,22]
And of course, in 12c the new OFFSET ... FETCH NEXT row-limiting clause could be used.
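A sketch of the 12c row-limiting form of the same page, against the TEST_VIEW_ONE(ID, FUNCTION_COLUMN) view used in this answer; the bind variable names are hypothetical. Oracle rewrites this clause internally using ROW_NUMBER(), so the plan typically shows a WINDOW step instead of COUNT STOPKEY, and it is still worth checking the projection to see where the function is evaluated.

```sql
-- First page: rows 1-25 in ID order.
SELECT id, function_column
FROM   test_view_one
ORDER  BY id
OFFSET 0 ROWS FETCH NEXT 25 ROWS ONLY;

-- Later pages with SQL*Plus binds (hypothetical names): here rows 26-50.
VARIABLE page_offset NUMBER
VARIABLE page_size NUMBER
EXEC :page_offset := 25
EXEC :page_size := 25

SELECT id, function_column
FROM   test_view_one
ORDER  BY id
OFFSET :page_offset ROWS FETCH NEXT :page_size ROWS ONLY;
```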
Good Luck!

Resources