Both Table P (Parent) and C (Child) have 10 partitions on cat and 316 subpartitions on effective_date.
Table P has the following index create index ix_p_cat on p (cat);.
How is it possible that an index range scan with an index on the partition column is preferable (lower cost) to the optimizer than doing a full partition access?
My thinking is that the same amount of data blocks from P are going to be needed in either case, so might as well avoid having to read additional index blocks. However, the optimizer disagrees.
Below are two explain plans. The first is showing that the optimizer wants to use the index to build the hash table, and the second is with a hint to not use the index.
Tables and index are analyzed.
Oracle Enterprise Edition 19c. Thanks in advance.
select C.some_col
from "P"
join "C"
on P.code = C.code
and P.cat = :cat
and C.cat = :cat
;
------------------------------------------------------------------------------------------------------------------------
| id | Operation | name | rows | Bytes | cost (%CPU)| time | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------
| 0 | select statement | | 275G| 5127G| 1280K (58)| 00:00:51 | | |
|* 1 | hash join | | 275G| 5127G| 1280K (58)| 00:00:51 | | |
| 2 | table access by global index ROWID BATCHED| P | 60363 | 412K| 1642 (1)| 00:00:01 | ROWID | ROWID |
|* 3 | index range scan | IX_P_CAT | 60363 | | 231 (0)| 00:00:01 | | |
| 4 | partition LIST single | | 41M| 510M| 539K (1)| 00:00:22 | key | key |
| 5 | partition range all | | 41M| 510M| 539K (1)| 00:00:22 | 1 | 316 |
| 6 | table access full | C | 41M| 510M| 539K (1)| 00:00:22 | | |
------------------------------------------------------------------------------------------------------------------------
query block name / object alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$58A6D7F6
2 - SEL$58A6D7F6 / p#SEL$1
3 - SEL$58A6D7F6 / p#SEL$1
6 - SEL$58A6D7F6 / C#SEL$1
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("P"."CODE"="C"."CODE")
3 - access("P"."CAT"=:CAT)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=1) "C"."SOME_COL"[NUMBER,22], "C"."SOME_COL"[NUMBER,22]
2 - "P"."CODE"[CHARACTER,2]
3 - "P".ROWID[ROWID,10]
4 - "C"."CODE"[CHARACTER,2], "C"."SOME_COL"[NUMBER,22]
5 - "C"."CODE"[CHARACTER,2], "C"."SOME_COL"[NUMBER,22]
6 - "C"."CODE"[CHARACTER,2], "C"."SOME_COL"[NUMBER,22]
Note
-----
- this is an adaptive plan
No Index
select /*+ no_index(P) */
C.some_col
from "P"
join "C"
on P.code = C.code
and P.cat = :cat
and C.cat = :cat
;
-----------------------------------------------------------------------------------------------
| id | Operation | name | rows | Bytes | cost (%CPU)| time | Pstart| Pstop |
-----------------------------------------------------------------------------------------------
| 0 | select statement | | 275G| 5127G| 1287K (58)| 00:00:51 | | |
|* 1 | hash join | | 275G| 5127G| 1287K (58)| 00:00:51 | | |
| 2 | partition LIST single| | 60363 | 412K| 8152 (1)| 00:00:01 | key | key |
| 3 | partition range all | | 60363 | 412K| 8152 (1)| 00:00:01 | 1 | 316 |
| 4 | table access full | P | 60363 | 412K| 8152 (1)| 00:00:01 | | |
| 5 | partition LIST single| | 41M| 510M| 539K (1)| 00:00:22 | key | key |
| 6 | partition range all | | 41M| 510M| 539K (1)| 00:00:22 | 1 | 316 |
| 7 | table access full | C | 41M| 510M| 539K (1)| 00:00:22 | | |
-----------------------------------------------------------------------------------------------
query block name / object alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$58A6D7F6
4 - SEL$58A6D7F6 / p#SEL$1
7 - SEL$58A6D7F6 / C#SEL$1
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("P"."CODE"="C"."CODE")
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=1) "C"."SOME_COL"[NUMBER,22], "C"."SOME_COL"[NUMBER,22]
2 - "P"."CODE"[CHARACTER,2]
3 - "P"."CODE"[CHARACTER,2]
4 - "P"."CODE"[CHARACTER,2]
5 - "C"."CODE"[CHARACTER,2], "C"."SOME_COL"[NUMBER,22]
6 - "C"."CODE"[CHARACTER,2], "C"."SOME_COL"[NUMBER,22]
7 - "C"."CODE"[CHARACTER,2], "C"."SOME_COL"[NUMBER,22]
Hint Report (identified by operation id / query block name / object alias):
Total hints for statement: 1
---------------------------------------------------------------------------
4 - SEL$58A6D7F6 / p#SEL$1
- no_index(p)
Note
-----
- this is an adaptive plan
A table with thousands of subpartitions but less than a million rows may have an enormous amount of empty segment space that will cause weird optimizer decisions. Run the below query to see how much space is used by your tables and indexes:
select segment_name, sum(bytes)/1024/1024 mb, count(*) segment_count
from dba_segments
where segment_name in ('P', 'C', 'IX_P_CAT')
group by segment_name;
I tried to recreate your tables on my 19c database and each partition of table "P" consumed 2.5 gigabytes of space even though the actual data should only need a few megabytes. The exact value will differ for every system, but I'd guess that most systems will have a large value. Oracle segments are usually heavy data structures intended to hold more than a thousand rows; performance would suck if Oracle allocated one-byte-at-a-time, so instead it usually hands out megabytes-at-a-time. But if you have 316 subpartitions, those megabytes add up.
Normally, the best way to select a large percentage of data is to use a full table scan or a full (sub)partition scan. But if the table has so much wasted space, it's more efficient to use the small index and lookup each row by the ROWID than to full scan all the mostly-empty segments.
You can fix the problem by either using fewer subpartitions, adjusting your segment allocation settings, or shrinking the table like this:
alter table p enable row movement;
alter table p shrink space;
begin
dbms_stats.gather_table_stats(user, 'P');
end;
/
Related
Considering the execution plan for this query :
SQL_ID 1m5r644say02b, child number 0
-------------------------------------
select * from hr.employees where department_id = 80 intersect select *
from hr.employees where first_name like 'A%'
Plan hash value: 1738366820
------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 4 |00:00:00.01 | 8 | | | |
| 1 | INTERSECTION | | 1 | | 4 |00:00:00.01 | 8 | | | |
| 2 | SORT UNIQUE | | 1 | 34 | 34 |00:00:00.01 | 6 | 6144 | 6144 | 6144 (0)|
|* 3 | TABLE ACCESS FULL | EMPLOYEES | 1 | 34 | 34 |00:00:00.01 | 6 | | | |
| 4 | SORT UNIQUE | | 1 | 11 | 10 |00:00:00.01 | 2 | 2048 | 2048 | 2048 (0)|
| 5 | TABLE ACCESS BY INDEX ROWID BATCHED| EMPLOYEES | 1 | 11 | 10 |00:00:00.01 | 2 | | | |
|* 6 | INDEX SKIP SCAN | EMP_NAME_IX | 1 | 11 | 10 |00:00:00.01 | 1 | | | |
------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("DEPARTMENT_ID"=80)
6 - access("FIRST_NAME" LIKE 'A%')
filter("FIRST_NAME" LIKE 'A%')
The execution plan has both access and filter predicates with the same '%A' predicate here on the EMP_NAME_IX index. But shouldn't the access predicate be enough here, as they both will filter the same rows? Why did it perform the additional filter predicate?
Is there a general rule for when both access and filter are the same? Based on GV$SQL_PLAN, when an operation has either an access or a filter predicate, they are only equal about 1% of the time. And this situation only happens with operations and options like INDEX (FULL/RANGE/SKIP/UNIQUE) and SORT (JOIN/UNIQUE).
select *
from gv$sql_plan
where access_predicates = filter_predicates;
Presumably you have an index on hr.employees that includes the first_name column. But you are selecting * from hr.employees such that the rows obtained from the index would have to traced back (i.e. join) with the table.
For conceptual understanding it helps to think of indexes as plain tables with a foreign key to the original table's primary key. When usage of indexes helps, these two tables are joined. The index is used alone when it contains all needed columns.
In this case we assume a join is required since you are selecting *. When accessing the hr.employee table for the second query of the intersect, because its where clause filters on an index column, a join to the index is performed prior to filtering.
The first occurrence of "FIRST_NAME" LIKE 'A%' is the reason usage of the index is decided. The second occurrence, is then the actual filtering. Filtering happens only once, not twice.
These are listed as distinct operations as deciding to use the index (and therefore perform the join) has its own costs.
In the example below Oracle's optimizer's estimated rows is incorrect by two orders of magnitude. How do I improve the estimated rows?
Table A has rows with numbers 1 through 1,000 for each of the 10 letters A through J.
Table C has 100 copies of table A.
So, table A has a cardinality of 10K and table C has a cardinality of 1M.
A given single-valued predicate on the number in table A will yield 1/1000 of the rows in table A (same for table C).
A given single-valued predicate on the letter in table A will yield 1/10 of the rows in table A (same for table C).
Setup script.
drop table C;
drop table A;
create table A
( num NUMBER
, val VARCHAR2(3 byte)
, pad CHAR(40 byte)
)
;
insert /*+ append enable_parallel_dml parallel (auto) */
into A (num, val, pad)
select mod(level-1, 1000) +1
, chr(mod(ceil(level/1000) - 1, 10) + ascii('A'))
, ' '
from dual
connect by level <= 10*1000
;
create table C
( id NUMBER
, num NUMBER
, val VARCHAR2(3 byte)
, pad CHAR(40 byte)
)
;
insert /*+ append enable_parallel_dml parallel (auto) */
into C (id, num, val, pad)
with
"D1" as
( select /*+ materialize */ null from dual connect by level <= 100 --320
)
, "D" as
( select /*+ materialize */
level rn
, mod(level-1, 1000) + 1 num
, chr(mod(ceil(level/1000) - 1, 10) + ascii('A')) val
, ' ' pad
from dual
connect by level <= 10*1000
order by 1 offset 0 rows
)
select rownum id
, num num
, val val
, pad pad
from "D1", "D"
;
commit;
exec dbms_stats.gather_table_stats(OwnName => null, TabName => 'A', cascade => true);
exec dbms_stats.gather_table_stats(OwnName => null, TabName => 'C', cascade => true);
Consider the explain plan to the following query.
select *
from A
join C
on A.num = C.num
and A.val = C.val
where A.num = 1
and A.val = 'A'
;
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100 | 9900 | 2209 (1)| 00:00:01 |
|* 1 | HASH JOIN | | 100 | 9900 | 2209 (1)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| A | 1 | 47 | 23 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| C | 100 | 5200 | 2185 (1)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."NUM"="C"."NUM" AND "A"."VAL"="C"."VAL")
2 - filter("A"."NUM"=1 AND "A"."VAL"='A')
3 - filter("C"."NUM"=1 AND "C"."VAL"='A')
The row cardinality of each step makes sense to me.
ID=2 --> (1/1,000) * (1/10) * 10,000 = 1
ID=3 --> (1/1,000) * (1/10) * 1,000,000 = 100
ID=1 --> 100 is correct. Predicates in ID=2 and ID=3 are the same, every row from ID=2 will have one and only one match in the row source from ID=3.
Now consider the explain plan to the slightly modified query below.
select *
from A
join C
on A.num = C.num
and A.val = C.val
where A.num in(1,2)
and A.val = 'A'
;
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 198 | 2209 (1)| 00:00:01 |
|* 1 | HASH JOIN | | 2 | 198 | 2209 (1)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| A | 2 | 94 | 23 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| C | 200 | 10400 | 2185 (1)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."NUM"="C"."NUM" AND "A"."VAL"="C"."VAL")
2 - filter("A"."VAL"='A' AND ("A"."NUM"=1 OR "A"."NUM"=2))
3 - filter("C"."VAL"='A' AND ("C"."NUM"=1 OR "C"."NUM"=2))
The row cardinality of each step ID=2 and ID=3 makes sense to me, but now ID=1 is incorrect by two orders of magnitude.
ID=2 --> (1/1,000)(1/10) * 10,000 = 1
ID=3 --> (1/1,000)(1/10) * 1,000,000 = 100
ID=1 --> The optimizer's estimate is two orders of magnitude different from the actual.
Adding unique and foreign constraints and extended statistics did not improve the estimated row counts.
create unique index IU_A on A (num, val);
alter table A add constraint UK_A unique (num, val) rely using index IU_A enable validate;
alter table C add constraint R_C foreign key (num, val) references A (num, val) rely enable validate;
create index IR_C on C (num, val);
select dbms_stats.create_extended_stats(null,'A','(num, val)') from dual;
select dbms_stats.create_extended_stats(null,'C','(num, val)') from dual;
exec dbms_stats.gather_table_stats(OwnName => null, TabName => 'A', cascade => true);
exec dbms_stats.gather_table_stats(OwnName => null, TabName => 'C', cascade => true);
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 198 | 10 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 2 | 198 | 10 (0)| 00:00:01 |
| 3 | INLIST ITERATOR | | | | | |
| 4 | TABLE ACCESS BY INDEX ROWID| A | 2 | 94 | 5 (0)| 00:00:01 |
|* 5 | INDEX UNIQUE SCAN | IU_A | 2 | | 3 (0)| 00:00:01 |
|* 6 | INDEX RANGE SCAN | IR_C | 1 | | 2 (0)| 00:00:01 |
| 7 | TABLE ACCESS BY INDEX ROWID | C | 1 | 52 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access(("A"."NUM"=1 OR "A"."NUM"=2) AND "A"."VAL"='A')
6 - access("A"."NUM"="C"."NUM" AND "C"."VAL"='A')
filter("C"."NUM"=1 OR "C"."NUM"=2)
What do I need to do to make the estimated rows better match reality?
Using Oracle Enterprise Edition 19c.
Thanks in advance.
Edit
After ensuring the most recent optimizer_features_enable was used and modifying one of the predicates, we still have an explain plan whose estimated row count is short by two orders of magnitude.
ID=6 ought to have an estimated rows of 100. It seems it is applying the predicate factor twice. Once for the access and again for the filter.
select /*+ optimizer_features_enable('19.1.0') */
*
from A
join C
on A.num = C.num
and A.val = C.val
where A.num in(1,2)
and A.val in('A','B')
;
-----------------------------------------------------------------------------------------------
| id | Operation | name | rows | Bytes | cost (%CPU)| time |
-----------------------------------------------------------------------------------------------
| 0 | select statement | | 4 | 396 | 16 (0)| 00:00:01 |
| 1 | nested LOOPS | | 4 | 396 | 16 (0)| 00:00:01 |
| 2 | nested LOOPS | | 4 | 396 | 16 (0)| 00:00:01 |
| 3 | INLIST ITERATOR | | | | | |
| 4 | table access by index ROWID BATCHED| A | 4 | 188 | 7 (0)| 00:00:01 |
|* 5 | index range scan | IU_A | 4 | | 3 (0)| 00:00:01 |
|* 6 | index range scan | IR_C | 1 | | 2 (0)| 00:00:01 |
| 7 | table access by index ROWID | C | 1 | 52 | 3 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("A"."NUM"=1 or "A"."NUM"=2)
filter("A"."VAL"='A' or "A"."VAL"='B')
6 - access("A"."NUM"="C"."NUM" and "A"."VAL"="C"."VAL")
filter(("C"."NUM"=1 or "C"."NUM"=2) and ("C"."VAL"='A' or "C"."VAL"='B'))
Why does Oracle still apply a filter predicate on an index even after the access predicate for that same index guarantees the filter predicate is always true?
drop table index_filter_child
;
drop table index_filter_parent
;
create table index_filter_parent
as
select level id, chr(mod(level - 1, 26) + ascii('A')) code from dual connect by level <= 26
;
create table index_filter_child
as
with
"C" as (select chr(mod(level - 1, 26) + ascii('A')) code from dual connect by level <= 26)
select rownum id, C1.code from C C1, C C2
;
exec dbms_stats.gather_table_stats('USER','INDEX_FILTER_PARENT')
;
exec dbms_stats.gather_table_stats('USER','INDEX_FILTER_CHILD')
;
create index ix_index_filter_parent on index_filter_parent(code)
;
create index ix_index_filter_child on index_filter_child(code)
;
select P.*
from index_filter_parent "P"
join index_filter_child "C"
on C.code = P.code
where P.code in('A','Z') --same result if we predicate instead on C.code in('A','Z')
;
--------------------------------------------------------------------------------------------------------------
| id | Operation | name | rows | Bytes | cost (%CPU)| time |
--------------------------------------------------------------------------------------------------------------
| 0 | select statement | | 5 | 35 | 4 (0)| 00:00:01 |
| 1 | nested LOOPS | | 5 | 35 | 4 (0)| 00:00:01 |
| 2 | INLIST ITERATOR | | | | | |
| 3 | table access by index ROWID| INDEX_FILTER_PARENT | 2 | 10 | 2 (0)| 00:00:01 |
|* 4 | index range scan | IX_INDEX_FILTER_PARENT | 2 | | 1 (0)| 00:00:01 |
|* 5 | index range scan | IX_INDEX_FILTER_CHILD | 2 | 4 | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("P"."CODE"='A' or "P"."CODE"='Z')
5 - access("C"."CODE"="P"."CODE")
filter("C"."CODE"='A' or "C"."CODE"='Z') <========== why is this needed?
Why is the filter predicate in 5 needed in light of the access("C"."CODE"="P"."CODE") guaranteeing C.code is 'A' or 'Z'?
Thank you in advance.
Oracle 12.1 enterprise Edition.
This is a result of "transitive closure" transformation: you can read more about here:
Transitivity and Transitive Closure (Doc ID 68979.1) Doc id 68979.1
Jonathan Lewis - Cartesian Merge Join
Jonathan Lewis - Transitive Closure (or, even better, in his book "Cost Based Oracle Fundamentals")
If you get CBO trace (alter session set events '10053 trace name context forever, level 1' or alter session set events 'trace[SQL_Optimizer.*]), you will see that the transformation happens before choosing join method and access paths. It allows CBO to analyze more different access paths and choose the best available plan. Moreover, in case of adaptive plans, it allows oracle to change join method on the fly.
For example, you can get a plan like this:
----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 52 | 364 | 4 (0)| 00:00:01 |
|* 1 | HASH JOIN | | 52 | 364 | 4 (0)| 00:00:01 |
| 2 | INLIST ITERATOR | | | | | |
| 3 | TABLE ACCESS BY INDEX ROWID BATCHED| INDEX_FILTER_PARENT | 2 | 10 | 2 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | IX_INDEX_FILTER_PARENT | 2 | | 1 (0)| 00:00:01 |
| 5 | INLIST ITERATOR | | | | | |
|* 6 | INDEX RANGE SCAN | IX_INDEX_FILTER_CHILD | 52 | 104 | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("C"."CODE"="P"."CODE")
4 - access("P"."CODE"='A' OR "P"."CODE"='Z')
6 - access("C"."CODE"='A' OR "C"."CODE"='Z')
In fact, you can disable it using the event 10155: CBO disable generation of transitive OR-chains.
Your example:
alter session set events '10155';
explain plan for
select P.*
from index_filter_parent "P"
join index_filter_child "C"
on C.code = P.code
where P.code in('A','Z');
Results:
SQL> alter session set events '10155';
Session altered.
SQL> explain plan for
2 select P.*
3 from index_filter_parent "P"
4 join index_filter_child "C"
5 on C.code = P.code
6 where P.code in('A','Z') ;
Explained.
SQL> #xplan typical
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------
Plan hash value: 2543178509
----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 52 | 364 | 4 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 52 | 364 | 4 (0)| 00:00:01 |
| 2 | INLIST ITERATOR | | | | | |
| 3 | TABLE ACCESS BY INDEX ROWID BATCHED| INDEX_FILTER_PARENT | 2 | 10 | 2 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | IX_INDEX_FILTER_PARENT | 2 | | 1 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | IX_INDEX_FILTER_CHILD | 26 | 52 | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("P"."CODE"='A' OR "P"."CODE"='Z')
5 - access("C"."CODE"="P"."CODE")
Note
-----
- this is an adaptive plan
22 rows selected.
As you can see, that predicate has disappeared.
PS. Other events for transitive predicates:
ORA-10155: CBO disable generation of transitive OR-chains
ORA-10171: CBO disable transitive join predicates
ORA-10179: CBO turn off transitive predicate replacement
ORA-10195: CBO don't use check constraints for transitive predicates
I'm using Oracle 18c but I guess my question would not be bound to the specific version.
I want to fetch rows from a table but I found a complex, ugly solution.
I would like to know if there is better, simple query that can return the same result as following.
First of all, I have a simple table like this.
Note that col is going to store large text.
CREATE TABLE simpletable
(record_id NUMBER,
col CLOB,
PRIMARY KEY (record_id));
I want to retrieve single row from the above table and whichever row is acceptable.
First query came to my mind is as following.
SELECT * FROM (SELECT * FROM simpletable) WHERE rownum <= 1;
Another is as following.
SELECT * FROM (SELECT * FROM simpletableORDER BY record_id) WHERE rownum <= 1;
Unfortunately, neither of above two does not use primary-key index and uses TABLE ACCESS FULL which can take long time when the table grows enough large.
(I'm guessing that oracle preferred the simpler plan because my table is not enough large yet to use index scan.
Oracle might choose different plan if the table grows up further.)
My final solution that uses primary-key index to narrow down the table access is following.
SELECT simpletable.* FROM
(SELECT * FROM
(SELECT record_id, ROWID as id FROM simpletable ORDER BY record_id)
WHERE rownum<=1) a
JOIN simpletable ON a.id = simpletable.ROWID;
If you have a better solution, please let me know.
It would be very appreciated.
P.S.
The first two queries produced the following plan.
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2015 | 4 (25)| 00:00:01 |
|* 1 | COUNT STOPKEY | | | | | |
| 2 | VIEW | | 1 | 2015 | 4 (25)| 00:00:01 |
|* 3 | SORT ORDER BY STOPKEY| | 1 | 2015 | 4 (25)| 00:00:01 |
| 4 | TABLE ACCESS FULL | SIMPLETABLE | 1 | 2015 | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------------------
the final one is:
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2039 | 3 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 1 | 2039 | 3 (0)| 00:00:01 |
| 2 | VIEW | | 1 | 25 | 2 (0)| 00:00:01 |
|* 3 | COUNT STOPKEY | | | | | |
| 4 | VIEW | | 1 | 25 | 2 (0)| 00:00:01 |
| 5 | INDEX FULL SCAN | SYS_C007561 | 1 | 25 | 2 (0)| 00:00:01 |
| 6 | TABLE ACCESS BY USER ROWID| SIMPLETABLE | 1 | 2014 | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
I think using OFFSET..FETCH method might helps you here -
SELECT *
FROM simpletable
ORDER BY record_id
OFFSET 0 ROWS
FETCH FIRST ROW ONLY;
If the Cost Based Optimizer code has access to reliable statistics, from its dictionary, regarding all the objects available for this query, then it will very likely produce an optimal execution plan. Of course, there are exceptions and you would argue with their support people as to whether or not choosing a suboptimal plan is a bug.
In this specific case, if you are querying a single table and the CBO could choose between a full table scan and some other scan and then chose a full table scan, then chances were good that the CBO determined that the number of blocks scanned (buffer gets) would have been smaller using a full table scan.
You can expose the truth of the matter by tracing the execution of multiple versions of the statement, each one using a different set of hints to force a particular execution plan. You should consider the execution with the fewest buffer gets to be the winner. Alternatively, if the execution plan is of a serial nature, then you can use response time as measure. If the winner is not automatically chosen by the CBO, then it's probably because the statistics it used were not accurate and you should make them accurate. If the statistics are indeed accurate then Oracle support will probably give you a very long homework assignment.
Similar to the Horror Vacui, some database developers suffer under the Horror FULL TABLE SCAN by simply assuming index access good, full scan bad.
But this is not true, FULL TABLE SCAN is a normal access method, that is preferred in some situation.
Let's illustrate it on a simple example with 10K rows in your table
insert into simpletable (record_id, col)
select rownum, rpad('x',3998,'y')
from dual connect by level <= 10000
To get one arbitrary row from the table you simple use the following query
select * from simpletable where rownum = 1;
Here is the output (edited for brevity) you get from SQL*Plus with setting set autotrace traceonly to see the execution plan and the statistics.
Execution Plan
----------------------------------------------------------
Plan hash value: 1007892724
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2015 | 2 (0)| 00:00:01
|* 1 | COUNT STOPKEY | | | | |
| 2 | TABLE ACCESS FULL| SIMPLETABLE | 10188 | 19M| 2 (0)| 00:00:01
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM=1)
Statistics
----------------------------------------------------------
5 consistent gets
2 physical reads
1 rows processed
The most important information is the statistics consistent gets - there were only 5 blocks accessed - the table is much larger.
What is the explanation? See the operation COUNT STOPKEY above the TABLE ACCESS FULL this ensures that the scan is terminated after the first row is found.
If you want to get a specific row, e.g. the one with the highest ID, the prefered approach is using the row_limiting_clause
SELECT *
FROM simpletable
ORDER BY record_id DESC
OFFSET 0 ROWS FETCH NEXT 1 ROW ONLY;
You will see the execution plan below, that performs first the INDEX FULL SCAN DESCENDING. The complete (full) index will be red in the descending order, but again due to STOPKEY you break after reading the highest key (which is the first entry due to the descending order).
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10188 | 19M| 613 (0)| 00:00:01 |
|* 1 | VIEW | | 10188 | 19M| 613 (0)| 00:00:01 |
|* 2 | WINDOW NOSORT STOPKEY | | 10188 | 19M| 613 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| SIMPLETABLE | 10188 | 19M| 613 (0)| 00:00:01 |
| 4 | INDEX FULL SCAN DESCENDING| SYS_C008793 | 10188 | | 29 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=CASE WHEN (0>=0) THEN
0 ELSE 0 END +1 AND "from$_subquery$_002"."rowlimit_$$_rownumber">0)
2 - filter(ROW_NUMBER() OVER ( ORDER BY
INTERNAL_FUNCTION("SIMPLETABLE"."RECORD_ID") DESC )<=CASE WHEN (0>=0) THEN 0 ELSE 0
END +1)
Note if the table is empty or contains very few rows, you will se even here a TABLE ACCESS FULL because the optimizer recognises that it is more effective that to first go to the index and that access the table.
I have a big table (about 4.6 million records) with many columns, I have concatenated some columns and inserted it in clob column(one column have an alias of the name so the name came in a different way and it inserted in clob column )and that column with ctxsys.context index.
I searched with contains a function with the fuzzy operator and for performance, I added tow columns to search and those columns with a bitmap index
and analyzed the table by using DBMS_STATS.GATHER_TABLE_STATS() also I alter my index to take parallel with 4 degrees and increase SORT_AREA_SIZE to 8300000.
My problem is when I searched it's taken from 2 to 5 min to executed.
is there any way to improve performance and reduce time execution(another algorithm to speed searching or I can change the structure of my table by increase the columns and search in multiple columns).
Here is my query:
SELECT first_name,
last_name,
countries,
category,
aliases
FROM (SELECT first_name,
last_name,
countries,
category,
aliases,
rr
FROM (SELECT T.u_id,
T.first_name,
T.last_name,
T.countries,
T.category,
T.aliases,
ROWNUM rr,
all_data
FROM tbl_rsk_list_world T
WHERE t.countries = 'SPAIN'
AND category = 'Eng')
WHERE Contains(all_data, 'fuzzy(JOSE,60,,weight)', 1) > 0)
WHERE rr BETWEEN 1 AND 500
The Execution plan is:
SQL> select * from TABLE(DBMS_XPLAN.DISPLAY);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2747287528
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 20651 | 109M| 5724 (1
|* 1 | VIEW | | 20651 | 109M| 5724 (1
| 2 | COUNT | | | |
| 3 | PX COORDINATOR | | | |
| 4 | PX SEND QC (RANDOM)| :TQ10000 | 20651 | 4638K| 5724 (1
| 5 | PX BLOCK ITERATOR | | 20651 | 4638K| 5724 (1
|* 6 | TABLE ACCESS FULL| TBL_RSK_LIST_WORLD | 20651 | 4638K| 5724 (1
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("CTXSYS"."CONTAINS"("ALL_DATA",'fuzzy(jose,60,,weight)',1)>0 AND "
AND "from$_subquery$_002"."RR">=1)
6 - filter("COUNTRIES"='SPAIN' AND "CATEGORY"='Eng')
20 rows selected
when I using FIRST_ROWS and DOMAIN_INDEX_NO_SORT hints the execution plan be:
SQL> select * from TABLE(DBMS_XPLAN.DISPLAY);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1488722846
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 50 | 173K|
|* 1 | VIEW | | 50 | 173K|
| 2 | COUNT | | | |
| 3 | TABLE ACCESS BY INDEX ROWID| TBL_RSK_LIST_WORLD | 50 | 11500 |
|* 4 | DOMAIN INDEX | NDX_RSK_LIST_WORLD_CTX | | |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("CATEGORY"='Eng' AND "COUNTRIES"='SPAIN' AND
"from$_subquery$_002"."RR"<=500 AND "from$_subquery$_002"."RR">=1)
4 - access("CTXSYS"."CONTAINS"("W"."ALL_DATA",'fuzzy(jose,60,,weight)',1)>0)
18 rows selected
but still the performance bad :\