Optimizer choosing full partition scan over partition index scan - oracle

I have a table(table1) monthly partitions on dt column. I have create a local index on dt column. When I run the below query I see optimizer going for full partition scan instead of using the index on dt column.
WITH
A AS
(
Select * from table1
WHERE
EXISTS (SELECT u_id FROM table2
WHERE u_id=UPPER('ABC'))
)
SELECT DISTINCT
A.id,
A.dt
FROM
A
WHERE
A.dt BETWEEN timestamp '2022-04-01 00:00:00' AND timestamp '2022-04-01 23:59:59.999000000'
Explain plan
----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 755K| 67M| | 34447 (1)| 00:00:02 | | |
| 1 | HASH UNIQUE | | 755K| 67M| 75M| 34447 (1)| 00:00:02 | | |
|* 2 | FILTER | | | | | | | | |
| 3 | PARTITION RANGE SINGLE| | 755K| 67M| | 18298 (1)| 00:00:01 | 5 | 5 |
|* 4 | TABLE ACCESS FULL | TABLE1 | 755K| 67M| | 18298 (1)| 00:00:01 | 5 | 5 |
|* 5 | INDEX UNIQUE SCAN | SYS_C0099684 | 1 | 15 | | 0 (0)| 00:00:01 | | |
----------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter( EXISTS (SELECT 0 FROM "TEST"."TABLE2"
"TABLE2" WHERE "U_ID"=U'ABC'))
4 - filter("TABLE1"."DT"<=TIMESTAMP' 2022-04-02 23:59:59.999000000' AND
"TABLE1"."DT">=TIMESTAMP' 2022-04-02 00:00:00.000000000')
5 - access("U_ID"=U'ABC')

First of all, I'm not sure that INDEX RANGE SCAN would be better here:
We don't know your average row length, so maybe full partition scan using multiblock reads would be faster than sequential single block reads of the index and partition (table access by rowid), especially considering that you need 1/30 part of the partition.
Just consider the following example:
partition size = 100 GB
rows in partition = 400 million
avg row length = 100GB/400M =~ 250 byte
Since we need 1 day, we should get ~400M/30 = 13.3M rows. That means that in case of index range scan we need 13.3 M single block reads only to get column "id" from the table by rowid from the index (table access by rowid), and that's even without counting single block reads required from the index scan. Assume, that your average single block read time is 3ms in that system, so you will need more than 13.3M * 3ms = 40000 seconds only to read table rows in case of index access.
Now consider full partition scan. If your multiblock_read_count = 64 (if I remember correctly by default it's 128) and block size is default 8kB, you will need to perform 100GB/(64*8kB) = 195k multiblock reads. Assuming that your multiblock read time is 8ms, your full partition scan would take just ~1500 sec

Related

Oracle Hash Join - Probe Table: Index over Partition?

Both Table P (Parent) and C (Child) have 10 partitions on cat and 316 subpartitions on effective_date.
Table P has the following index create index ix_p_cat on p (cat);.
How is it possible that an index range scan with an index on the partition column is preferable (lower cost) to the optimizer than doing a full partition access?
My thinking is that the same amount of data blocks from P are going to be needed in either case, so might as well avoid having to read additional index blocks. However, the optimizer disagrees.
Below are two explain plans. The first is showing that the optimizer wants to use the index to build the hash table, and the second is with a hint to not use the index.
Tables and index are analyzed.
Oracle Enterprise Edition 19c. Thanks in advance.
select C.some_col
from "P"
join "C"
on P.code = C.code
and P.cat = :cat
and C.cat = :cat
;
------------------------------------------------------------------------------------------------------------------------
| id | Operation | name | rows | Bytes | cost (%CPU)| time | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------
| 0 | select statement | | 275G| 5127G| 1280K (58)| 00:00:51 | | |
|* 1 | hash join | | 275G| 5127G| 1280K (58)| 00:00:51 | | |
| 2 | table access by global index ROWID BATCHED| P | 60363 | 412K| 1642 (1)| 00:00:01 | ROWID | ROWID |
|* 3 | index range scan | IX_P_CAT | 60363 | | 231 (0)| 00:00:01 | | |
| 4 | partition LIST single | | 41M| 510M| 539K (1)| 00:00:22 | key | key |
| 5 | partition range all | | 41M| 510M| 539K (1)| 00:00:22 | 1 | 316 |
| 6 | table access full | C | 41M| 510M| 539K (1)| 00:00:22 | | |
------------------------------------------------------------------------------------------------------------------------
query block name / object alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$58A6D7F6
2 - SEL$58A6D7F6 / p#SEL$1
3 - SEL$58A6D7F6 / p#SEL$1
6 - SEL$58A6D7F6 / C#SEL$1
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("P"."CODE"="C"."CODE")
3 - access("P"."CAT"=:CAT)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=1) "C"."SOME_COL"[NUMBER,22], "C"."SOME_COL"[NUMBER,22]
2 - "P"."CODE"[CHARACTER,2]
3 - "P".ROWID[ROWID,10]
4 - "C"."CODE"[CHARACTER,2], "C"."SOME_COL"[NUMBER,22]
5 - "C"."CODE"[CHARACTER,2], "C"."SOME_COL"[NUMBER,22]
6 - "C"."CODE"[CHARACTER,2], "C"."SOME_COL"[NUMBER,22]
Note
-----
- this is an adaptive plan
No Index
select /*+ no_index(P) */
C.some_col
from "P"
join "C"
on P.code = C.code
and P.cat = :cat
and C.cat = :cat
;
-----------------------------------------------------------------------------------------------
| id | Operation | name | rows | Bytes | cost (%CPU)| time | Pstart| Pstop |
-----------------------------------------------------------------------------------------------
| 0 | select statement | | 275G| 5127G| 1287K (58)| 00:00:51 | | |
|* 1 | hash join | | 275G| 5127G| 1287K (58)| 00:00:51 | | |
| 2 | partition LIST single| | 60363 | 412K| 8152 (1)| 00:00:01 | key | key |
| 3 | partition range all | | 60363 | 412K| 8152 (1)| 00:00:01 | 1 | 316 |
| 4 | table access full | P | 60363 | 412K| 8152 (1)| 00:00:01 | | |
| 5 | partition LIST single| | 41M| 510M| 539K (1)| 00:00:22 | key | key |
| 6 | partition range all | | 41M| 510M| 539K (1)| 00:00:22 | 1 | 316 |
| 7 | table access full | C | 41M| 510M| 539K (1)| 00:00:22 | | |
-----------------------------------------------------------------------------------------------
query block name / object alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$58A6D7F6
4 - SEL$58A6D7F6 / p#SEL$1
7 - SEL$58A6D7F6 / C#SEL$1
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("P"."CODE"="C"."CODE")
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=1) "C"."SOME_COL"[NUMBER,22], "C"."SOME_COL"[NUMBER,22]
2 - "P"."CODE"[CHARACTER,2]
3 - "P"."CODE"[CHARACTER,2]
4 - "P"."CODE"[CHARACTER,2]
5 - "C"."CODE"[CHARACTER,2], "C"."SOME_COL"[NUMBER,22]
6 - "C"."CODE"[CHARACTER,2], "C"."SOME_COL"[NUMBER,22]
7 - "C"."CODE"[CHARACTER,2], "C"."SOME_COL"[NUMBER,22]
Hint Report (identified by operation id / query block name / object alias):
Total hints for statement: 1
---------------------------------------------------------------------------
4 - SEL$58A6D7F6 / p#SEL$1
- no_index(p)
Note
-----
- this is an adaptive plan
A table with thousands of subpartitions but less than a million rows may have an enormous amount of empty segment space that will cause weird optimizer decisions. Run the below query to see how much space is used by your tables and indexes:
select segment_name, sum(bytes)/1024/1024 mb, count(*) segment_count
from dba_segments
where segment_name in ('P', 'C', 'IX_P_CAT')
group by segment_name;
I tried to recreate your tables on my 19c database and each partition of table "P" consumed 2.5 gigabytes of space even though the actual data should only need a few megabytes. The exact value will differ for every system, but I'd guess that most systems will have a large value. Oracle segments are usually heavy data structures intended to hold more than a thousand rows; performance would suck if Oracle allocated one-byte-at-a-time, so instead it usually hands out megabytes-at-a-time. But if you have 316 subpartitions, those megabytes add up.
Normally, the best way to select a large percentage of data is to use a full table scan or a full (sub)partition scan. But if the table has so much wasted space, it's more efficient to use the small index and lookup each row by the ROWID than to full scan all the mostly-empty segments.
You can fix the problem by either using fewer subpartitions, adjusting your segment allocation settings, or shrinking the table like this:
alter table p enable row movement;
alter table p shrink space;
begin
dbms_stats.gather_table_stats(user, 'P');
end;
/

how to avoid TABLE ACCESS FULL when fetching rows in Oracle

I'm using Oracle 18c but I guess my question would not be bound to the specific version.
I want to fetch rows from a table but I found a complex, ugly solution.
I would like to know if there is better, simple query that can return the same result as following.
First of all, I have a simple table like this.
Note that col is going to store large text.
CREATE TABLE simpletable
(record_id NUMBER,
col CLOB,
PRIMARY KEY (record_id));
I want to retrieve single row from the above table and whichever row is acceptable.
First query came to my mind is as following.
SELECT * FROM (SELECT * FROM simpletable) WHERE rownum <= 1;
Another is as following.
SELECT * FROM (SELECT * FROM simpletableORDER BY record_id) WHERE rownum <= 1;
Unfortunately, neither of above two does not use primary-key index and uses TABLE ACCESS FULL which can take long time when the table grows enough large.
(I'm guessing that oracle preferred the simpler plan because my table is not enough large yet to use index scan.
Oracle might choose different plan if the table grows up further.)
My final solution that uses primary-key index to narrow down the table access is following.
SELECT simpletable.* FROM
(SELECT * FROM
(SELECT record_id, ROWID as id FROM simpletable ORDER BY record_id)
WHERE rownum<=1) a
JOIN simpletable ON a.id = simpletable.ROWID;
If you have a better solution, please let me know.
It would be very appreciated.
P.S.
The first two queries produced the following plan.
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2015 | 4 (25)| 00:00:01 |
|* 1 | COUNT STOPKEY | | | | | |
| 2 | VIEW | | 1 | 2015 | 4 (25)| 00:00:01 |
|* 3 | SORT ORDER BY STOPKEY| | 1 | 2015 | 4 (25)| 00:00:01 |
| 4 | TABLE ACCESS FULL | SIMPLETABLE | 1 | 2015 | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------------------
the final one is:
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2039 | 3 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 1 | 2039 | 3 (0)| 00:00:01 |
| 2 | VIEW | | 1 | 25 | 2 (0)| 00:00:01 |
|* 3 | COUNT STOPKEY | | | | | |
| 4 | VIEW | | 1 | 25 | 2 (0)| 00:00:01 |
| 5 | INDEX FULL SCAN | SYS_C007561 | 1 | 25 | 2 (0)| 00:00:01 |
| 6 | TABLE ACCESS BY USER ROWID| SIMPLETABLE | 1 | 2014 | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
I think using OFFSET..FETCH method might helps you here -
SELECT *
FROM simpletable
ORDER BY record_id
OFFSET 0 ROWS
FETCH FIRST ROW ONLY;
If the Cost Based Optimizer code has access to reliable statistics, from its dictionary, regarding all the objects available for this query, then it will very likely produce an optimal execution plan. Of course, there are exceptions and you would argue with their support people as to whether or not choosing a suboptimal plan is a bug.
In this specific case, if you are querying a single table and the CBO could choose between a full table scan and some other scan and then chose a full table scan, then chances were good that the CBO determined that the number of blocks scanned (buffer gets) would have been smaller using a full table scan.
You can expose the truth of the matter by tracing the execution of multiple versions of the statement, each one using a different set of hints to force a particular execution plan. You should consider the execution with the fewest buffer gets to be the winner. Alternatively, if the execution plan is of a serial nature, then you can use response time as measure. If the winner is not automatically chosen by the CBO, then it's probably because the statistics it used were not accurate and you should make them accurate. If the statistics are indeed accurate then Oracle support will probably give you a very long homework assignment.
Similar to the Horror Vacui, some database developers suffer under the Horror FULL TABLE SCAN by simply assuming index access good, full scan bad.
But this is not true, FULL TABLE SCAN is a normal access method, that is preferred in some situation.
Let's illustrate it on a simple example with 10K rows in your table
insert into simpletable (record_id, col)
select rownum, rpad('x',3998,'y')
from dual connect by level <= 10000
To get one arbitrary row from the table you simple use the following query
select * from simpletable where rownum = 1;
Here is the output (edited for brevity) you get from SQL*Plus with setting set autotrace traceonly to see the execution plan and the statistics.
Execution Plan
----------------------------------------------------------
Plan hash value: 1007892724
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2015 | 2 (0)| 00:00:01
|* 1 | COUNT STOPKEY | | | | |
| 2 | TABLE ACCESS FULL| SIMPLETABLE | 10188 | 19M| 2 (0)| 00:00:01
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM=1)
Statistics
----------------------------------------------------------
5 consistent gets
2 physical reads
1 rows processed
The most important information is the statistics consistent gets - there were only 5 blocks accessed - the table is much larger.
What is the explanation? See the operation COUNT STOPKEY above the TABLE ACCESS FULL this ensures that the scan is terminated after the first row is found.
If you want to get a specific row, e.g. the one with the highest ID, the prefered approach is using the row_limiting_clause
SELECT *
FROM simpletable
ORDER BY record_id DESC
OFFSET 0 ROWS FETCH NEXT 1 ROW ONLY;
You will see the execution plan below, that performs first the INDEX FULL SCAN DESCENDING. The complete (full) index will be red in the descending order, but again due to STOPKEY you break after reading the highest key (which is the first entry due to the descending order).
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10188 | 19M| 613 (0)| 00:00:01 |
|* 1 | VIEW | | 10188 | 19M| 613 (0)| 00:00:01 |
|* 2 | WINDOW NOSORT STOPKEY | | 10188 | 19M| 613 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| SIMPLETABLE | 10188 | 19M| 613 (0)| 00:00:01 |
| 4 | INDEX FULL SCAN DESCENDING| SYS_C008793 | 10188 | | 29 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=CASE WHEN (0>=0) THEN
0 ELSE 0 END +1 AND "from$_subquery$_002"."rowlimit_$$_rownumber">0)
2 - filter(ROW_NUMBER() OVER ( ORDER BY
INTERNAL_FUNCTION("SIMPLETABLE"."RECORD_ID") DESC )<=CASE WHEN (0>=0) THEN 0 ELSE 0
END +1)
Note if the table is empty or contains very few rows, you will se even here a TABLE ACCESS FULL because the optimizer recognises that it is more effective that to first go to the index and that access the table.

Oracle unique constraint violation

I have a table with over 30 million records. When doing insert, I need to avoid the Unique constraint violation.
When I use this NOT EXIST approach, the insert takes forever. In fact, it couldn't finish after 24 hours of running. And I can't use the ignore_row_on_dupkey_index hint, because this table has more than 1 PK columns.
Another option is to insert in subsets. But I want to know if there's any other way before I do sub-setting.
insert into tlb1 a
select * from tlb2 b
where not exists (select 'x' from tlb1 c
where b.pk = c.pk)
The important decision depends on the numbe rof row inserted, i.e. the number of the rows in the table TBL2
If this number is rather low (say in hundreds to thousands) you may use safely your approach, provided there is an index on the PK column(s) - whoch should be to enforce the unique constraint.
Please check that the used execution plan is something like the one below
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | 110 | 2860 | 113 (0)| 00:00:02 |
| 1 | LOAD TABLE CONVENTIONAL | TBL1 | | | | |
| 2 | NESTED LOOPS ANTI | | 110 | 2860 | 113 (0)| 00:00:02 |
| 3 | TABLE ACCESS FULL | TBL2 | 110 | 1430 | 3 (0)| 00:00:01 |
|* 4 | INDEX UNIQUE SCAN | TBL1_IXD | 1 | 13 | 1 (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("B"."PK"="C"."PK")
The NESTED LOOPS ANTI means that for each inserted row a single index lookup will be done to check if the key already exists in the target table.
This will work fine for a low number of inserted rows. For a large insert (millions rows) the optimizer will switch to a HASH JOIN RIGHT ANTI, i.e. all rows from both table will be joined to get th epossible duplicates.
This can take some time (but usually not 24 hours) and the approach with DML Error Logging which eliminates the need of the join.
INSERT INTO tbl1 (pk)
SELECT pk
FROM tbl3
LOG ERRORS INTO err$_tbl1 ('dedup tbl3') REJECT LIMIT UNLIMITED;
This approach will scale well especially when the number of the duplicates is low compared with the number of inserted rows. It is comparable to a normal insert:
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | 876K| 10M| 427 (1)| 00:00:06 |
| 1 | LOAD TABLE CONVENTIONAL | TBL1 | | | | |
| 2 | TABLE ACCESS FULL | TBL3 | 876K| 10M| 427 (1)| 00:00:06 |
---------------------------------------------------------------------------------

How is the cardinality determined in a query?

I have two queries which return the same result set, but when reviewing the execution plans they have different values of cardinality.
The queries are:
select acq_cod
, prp
, df_val
, descr
from acqdefprp
where (prp like '%pswd%' or prp like '%Pswd%')
and prp not like '%kno%'
and prp not like '%encr%';
and
select acq_cod
, prp
, df_val
, descr
from acqdefprp
where regexp_instr(prp, 'pswd', 1,1,0,'i' ) > 0
and regexp_instr(prp, '(encr)|(kno)', 1,1,0,'i' ) = 0;
The first query has the following explain plan:
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 65 | 4485 | 6 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| acqdefprp | 65 | 4485 | 6 (0)| 00:00:01 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(("PRP" LIKE '%pswd%' OR "PRP" LIKE '%Pswd%')
AND "PRP" NOT LIKE '%kno%'
AND "PRP" NOT LIKE '%encr%')
And the explain plan for the second query is:
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 69 | 6 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| acqdefprp | 1 | 69 | 6 (0)| 00:00:01 |
--------------------------------------------------------------------------------
1 - filter(REGEXP_INSTR ("PRP",'(encr)|(kno)',1,1,0,'i') = 0
AND REGEXP_INSTR ("PRP",'pswd',1,1,0,'i') > 0 )
My question is why is the cardinality different between the two execution plans? For the first plan the cardinality (rows) is 65 and for the second it's 1?
My assumption is that this cardinality is the maximum number of rows that will be returned by each condition, if each condition was evaluated separately, and all of this based on table statistics. And that is why for my first query this assumed maximum is 65, since the WHERE conditions are a little more permissive.
And also that is why for the second query the cardinality is 1, since the regexp_instr is more restrictive.
If my assumptions are not correct, I'd really like to know what determines this cardinality number.
Thank you in advance for any help
In your case the expression are too complex for the optimizer to use basic statistics to estimate the cardinality. In these cases (it doesn't seem that you use histograms that might affect LIKE predicates) a fixed selectivity is used:
equality operator: 1%
inequality operator: 5%
So your
LIKE example is approximately (5 % + 5 % - (5 % * 5 %)) * 95 % * 95 % => 8.8 % of total table rows. - (5 % * 5 %) is the intersection because of OR operator.
REGEX example is 1 % * 5 % => 0.05 % of total table rows.
Oracle also supports extended statistics where you can compute statistics and histograms for specific expressions or correlated columns.
You comapare plans with direct WHERE conditions and with REGEXP_INSTR functions. Actually there is no difference which function to use, for oracle very difficult to give a real estimate without function execution.
For example we can create function -
CREATE OR REPLACE FUNCTION f_check(str IN VARCHAR2)
RETURN NUMBER IS
BEGIN
IF str LIKE 'A%' THEN
RETURN 1;
END IF;
RETURN -1;
END;
/
First select -
SELECT *
FROM tmptxt
WHERE dsc LIKE 'A%'
Plan hash value: 2928917536
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 121 | 4356 | 4 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TMPTXT | 121 | 4356 | 4 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("DSC" LIKE 'A%')
and with function -
SELECT *
FROM tmptxt
WHERE f_check(dsc) = 1
Plan hash value: 2928917536
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 36 | 4 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TMPTXT | 1 | 36 | 4 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("F_CHECK"("DSC")=1)
This two queries give the same result, but plan estimate has some difference. It is not too important (fullscan in first way and fullscan in the second), just need to evaluate the whole plan, didn't dwell on the numbers.
My assumption is that this cardinality is the maximum number of rows that will be returned by each condition....
No, cardinality is the estimation of the CBO how many rows will be returned in the operation. (technicaly always >= 1).
The cardinality is calculated either from the object statistics stored in data dictionary or by dynaming sampling (details here).
Dynamic sampling are more costly (as they are calculated in each parse) but can return much precise results.
So one possible workaround to get better estimation is to use dynamic sampling. Here small demo with level 10 (which is extrem and demo only as the whole table is scanned in parsing step; but it is not a problem with 779 rows table and the cardinatlity is exact)
create table tst as
select ltrim(to_char(rownum,'09999')) prp from dual connect by level <= 999999;
select count(*) from tst where prp like '%999%';
280
select count(*) from tst where regexp_instr(prp, '999', 1,1,0,'i' ) > 0;
280
Alter session set optimizer_dynamic_sampling=10;
EXPLAIN PLAN SET STATEMENT_ID = 'jara1' into plan_table FOR
select * from tst where prp like '%999%';
SELECT * FROM table(DBMS_XPLAN.DISPLAY('plan_table', 'jara1','ALL'));
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 280 | 1400 | 467 (2)| 00:00:06 |
|* 1 | TABLE ACCESS FULL| TST | 280 | 1400 | 467 (2)| 00:00:06 |
--------------------------------------------------------------------------
1 - filter("PRP" IS NOT NULL AND "PRP" LIKE '%999%')
EXPLAIN PLAN SET STATEMENT_ID = 'jara1' into plan_table FOR
select * from tst where regexp_instr(prp, '999', 1,1,0,'i' ) > 0;
SELECT * FROM table(DBMS_XPLAN.DISPLAY('plan_table', 'jara1','ALL'));
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 280 | 1400 | 479 (5)| 00:00:06 |
|* 1 | TABLE ACCESS FULL| TST | 280 | 1400 | 479 (5)| 00:00:06 |
--------------------------------------------------------------------------
1 - filter( REGEXP_INSTR ("PRP",'999',1,1,0,'i')>0)

Fetching data is too slow in Oracle DB (query comparison)

I am hitting my head with the following problem:
I have a table with more than 1,000,000,000 data. Now I am running the following query (acc_no is the primary key):
select acc_no from user where acc_no between 753976276998100 and 78776276998199
The above query ran in less than a second and fetched 100,000 records
But if I add one more column ("service_no") in the same query,
select acc_no,service_no from user where acc_no between 753976276998100 and 78776276998199
.. it is taking more than a minute. Why is that? Why is the first query taking less than a second, and the second query is taking more than a minute?
FYI : service_no is a NUMBER column
If you look at the execution plan for both queries, you'll see that the first query is fulfilled with just an index range scan:
explain plan for
select acc_no from t42
where acc_no between 753976276998100 and 78776276998199;
select * from table (dbms_xplan.display);
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 10 | 0 (0)| |
|* 1 | FILTER | | | | | |
|* 2 | INDEX RANGE SCAN| SYS_C0090827 | 1 | 10 | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------------
... which can be quite fast; but the second query has an additional step, table access by index rowid:
explain plan for
select acc_no, service_no from t42
where acc_no between 753976276998100 and 78776276998199;
select * from table (dbms_xplan.display);
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 14 | 0 (0)| |
|* 1 | FILTER | | | | | |
| 2 | TABLE ACCESS BY INDEX ROWID| T42 | 1 | 14 | 3 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | SYS_C0090827 | 1 | | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
When you only query for columns that exist in the index - acc_no in this case, which is in the primary key's backing index - only the index has to be touched. There is no need to go and look at the underlying table data for the values you already have from the indexed column.
When your select list includes columns that are not in the index the table data has to be retrieved too, because the other column - service_no is not in the index. That is another disk operation access the data blocks in the table segments. The table data is likely to be scattered across more blocks than the index as well, which amplifies the effect as you might have to fetch a different block for every matching row.
Basically it's having to do much more work to access more data from the disk, so it's going to take longer.

Resources