How is the cardinality determined in a query? - oracle

I have two queries which return the same result set, but when reviewing the execution plans they have different values of cardinality.
The queries are:
select acq_cod
, prp
, df_val
, descr
from acqdefprp
where (prp like '%pswd%' or prp like '%Pswd%')
and prp not like '%kno%'
and prp not like '%encr%';
and
select acq_cod
, prp
, df_val
, descr
from acqdefprp
where regexp_instr(prp, 'pswd', 1,1,0,'i' ) > 0
and regexp_instr(prp, '(encr)|(kno)', 1,1,0,'i' ) = 0;
The first query has the following explain plan:
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 65 | 4485 | 6 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| acqdefprp | 65 | 4485 | 6 (0)| 00:00:01 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(("PRP" LIKE '%pswd%' OR "PRP" LIKE '%Pswd%')
AND "PRP" NOT LIKE '%kno%'
AND "PRP" NOT LIKE '%encr%')
And the explain plan for the second query is:
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 69 | 6 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| acqdefprp | 1 | 69 | 6 (0)| 00:00:01 |
--------------------------------------------------------------------------------
1 - filter(REGEXP_INSTR ("PRP",'(encr)|(kno)',1,1,0,'i') = 0
AND REGEXP_INSTR ("PRP",'pswd',1,1,0,'i') > 0 )
My question is why is the cardinality different between the two execution plans? For the first plan the cardinality (rows) is 65 and for the second it's 1?
My assumption is that this cardinality is the maximum number of rows that will be returned by each condition, if each condition was evaluated separately, and all of this based on table statistics. And that is why for my first query this assumed maximum is 65, since the WHERE conditions are a little more permissive.
And also that is why for the second query the cardinality is 1, since the regexp_instr is more restrictive.
If my assumptions are not correct, I'd really like to know what determines this cardinality number.
Thank you in advance for any help

In your case the expression are too complex for the optimizer to use basic statistics to estimate the cardinality. In these cases (it doesn't seem that you use histograms that might affect LIKE predicates) a fixed selectivity is used:
equality operator: 1%
inequality operator: 5%
So your
LIKE example is approximately (5 % + 5 % - (5 % * 5 %)) * 95 % * 95 % => 8.8 % of total table rows. - (5 % * 5 %) is the intersection because of OR operator.
REGEX example is 1 % * 5 % => 0.05 % of total table rows.
Oracle also supports extended statistics where you can compute statistics and histograms for specific expressions or correlated columns.

You comapare plans with direct WHERE conditions and with REGEXP_INSTR functions. Actually there is no difference which function to use, for oracle very difficult to give a real estimate without function execution.
For example we can create function -
CREATE OR REPLACE FUNCTION f_check(str IN VARCHAR2)
RETURN NUMBER IS
BEGIN
IF str LIKE 'A%' THEN
RETURN 1;
END IF;
RETURN -1;
END;
/
First select -
SELECT *
FROM tmptxt
WHERE dsc LIKE 'A%'
Plan hash value: 2928917536
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 121 | 4356 | 4 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TMPTXT | 121 | 4356 | 4 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("DSC" LIKE 'A%')
and with function -
SELECT *
FROM tmptxt
WHERE f_check(dsc) = 1
Plan hash value: 2928917536
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 36 | 4 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TMPTXT | 1 | 36 | 4 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("F_CHECK"("DSC")=1)
This two queries give the same result, but plan estimate has some difference. It is not too important (fullscan in first way and fullscan in the second), just need to evaluate the whole plan, didn't dwell on the numbers.

My assumption is that this cardinality is the maximum number of rows that will be returned by each condition....
No, cardinality is the estimation of the CBO how many rows will be returned in the operation. (technicaly always >= 1).
The cardinality is calculated either from the object statistics stored in data dictionary or by dynaming sampling (details here).
Dynamic sampling are more costly (as they are calculated in each parse) but can return much precise results.
So one possible workaround to get better estimation is to use dynamic sampling. Here small demo with level 10 (which is extrem and demo only as the whole table is scanned in parsing step; but it is not a problem with 779 rows table and the cardinatlity is exact)
create table tst as
select ltrim(to_char(rownum,'09999')) prp from dual connect by level <= 999999;
select count(*) from tst where prp like '%999%';
280
select count(*) from tst where regexp_instr(prp, '999', 1,1,0,'i' ) > 0;
280
Alter session set optimizer_dynamic_sampling=10;
EXPLAIN PLAN SET STATEMENT_ID = 'jara1' into plan_table FOR
select * from tst where prp like '%999%';
SELECT * FROM table(DBMS_XPLAN.DISPLAY('plan_table', 'jara1','ALL'));
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 280 | 1400 | 467 (2)| 00:00:06 |
|* 1 | TABLE ACCESS FULL| TST | 280 | 1400 | 467 (2)| 00:00:06 |
--------------------------------------------------------------------------
1 - filter("PRP" IS NOT NULL AND "PRP" LIKE '%999%')
EXPLAIN PLAN SET STATEMENT_ID = 'jara1' into plan_table FOR
select * from tst where regexp_instr(prp, '999', 1,1,0,'i' ) > 0;
SELECT * FROM table(DBMS_XPLAN.DISPLAY('plan_table', 'jara1','ALL'));
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 280 | 1400 | 479 (5)| 00:00:06 |
|* 1 | TABLE ACCESS FULL| TST | 280 | 1400 | 479 (5)| 00:00:06 |
--------------------------------------------------------------------------
1 - filter( REGEXP_INSTR ("PRP",'999',1,1,0,'i')>0)

Related

how to avoid TABLE ACCESS FULL when fetching rows in Oracle

I'm using Oracle 18c but I guess my question would not be bound to the specific version.
I want to fetch rows from a table but I found a complex, ugly solution.
I would like to know if there is better, simple query that can return the same result as following.
First of all, I have a simple table like this.
Note that col is going to store large text.
CREATE TABLE simpletable
(record_id NUMBER,
col CLOB,
PRIMARY KEY (record_id));
I want to retrieve single row from the above table and whichever row is acceptable.
First query came to my mind is as following.
SELECT * FROM (SELECT * FROM simpletable) WHERE rownum <= 1;
Another is as following.
SELECT * FROM (SELECT * FROM simpletableORDER BY record_id) WHERE rownum <= 1;
Unfortunately, neither of above two does not use primary-key index and uses TABLE ACCESS FULL which can take long time when the table grows enough large.
(I'm guessing that oracle preferred the simpler plan because my table is not enough large yet to use index scan.
Oracle might choose different plan if the table grows up further.)
My final solution that uses primary-key index to narrow down the table access is following.
SELECT simpletable.* FROM
(SELECT * FROM
(SELECT record_id, ROWID as id FROM simpletable ORDER BY record_id)
WHERE rownum<=1) a
JOIN simpletable ON a.id = simpletable.ROWID;
If you have a better solution, please let me know.
It would be very appreciated.
P.S.
The first two queries produced the following plan.
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2015 | 4 (25)| 00:00:01 |
|* 1 | COUNT STOPKEY | | | | | |
| 2 | VIEW | | 1 | 2015 | 4 (25)| 00:00:01 |
|* 3 | SORT ORDER BY STOPKEY| | 1 | 2015 | 4 (25)| 00:00:01 |
| 4 | TABLE ACCESS FULL | SIMPLETABLE | 1 | 2015 | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------------------
the final one is:
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2039 | 3 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 1 | 2039 | 3 (0)| 00:00:01 |
| 2 | VIEW | | 1 | 25 | 2 (0)| 00:00:01 |
|* 3 | COUNT STOPKEY | | | | | |
| 4 | VIEW | | 1 | 25 | 2 (0)| 00:00:01 |
| 5 | INDEX FULL SCAN | SYS_C007561 | 1 | 25 | 2 (0)| 00:00:01 |
| 6 | TABLE ACCESS BY USER ROWID| SIMPLETABLE | 1 | 2014 | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
I think using OFFSET..FETCH method might helps you here -
SELECT *
FROM simpletable
ORDER BY record_id
OFFSET 0 ROWS
FETCH FIRST ROW ONLY;
If the Cost Based Optimizer code has access to reliable statistics, from its dictionary, regarding all the objects available for this query, then it will very likely produce an optimal execution plan. Of course, there are exceptions and you would argue with their support people as to whether or not choosing a suboptimal plan is a bug.
In this specific case, if you are querying a single table and the CBO could choose between a full table scan and some other scan and then chose a full table scan, then chances were good that the CBO determined that the number of blocks scanned (buffer gets) would have been smaller using a full table scan.
You can expose the truth of the matter by tracing the execution of multiple versions of the statement, each one using a different set of hints to force a particular execution plan. You should consider the execution with the fewest buffer gets to be the winner. Alternatively, if the execution plan is of a serial nature, then you can use response time as measure. If the winner is not automatically chosen by the CBO, then it's probably because the statistics it used were not accurate and you should make them accurate. If the statistics are indeed accurate then Oracle support will probably give you a very long homework assignment.
Similar to the Horror Vacui, some database developers suffer under the Horror FULL TABLE SCAN by simply assuming index access good, full scan bad.
But this is not true, FULL TABLE SCAN is a normal access method, that is preferred in some situation.
Let's illustrate it on a simple example with 10K rows in your table
insert into simpletable (record_id, col)
select rownum, rpad('x',3998,'y')
from dual connect by level <= 10000
To get one arbitrary row from the table you simple use the following query
select * from simpletable where rownum = 1;
Here is the output (edited for brevity) you get from SQL*Plus with setting set autotrace traceonly to see the execution plan and the statistics.
Execution Plan
----------------------------------------------------------
Plan hash value: 1007892724
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2015 | 2 (0)| 00:00:01
|* 1 | COUNT STOPKEY | | | | |
| 2 | TABLE ACCESS FULL| SIMPLETABLE | 10188 | 19M| 2 (0)| 00:00:01
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM=1)
Statistics
----------------------------------------------------------
5 consistent gets
2 physical reads
1 rows processed
The most important information is the statistics consistent gets - there were only 5 blocks accessed - the table is much larger.
What is the explanation? See the operation COUNT STOPKEY above the TABLE ACCESS FULL this ensures that the scan is terminated after the first row is found.
If you want to get a specific row, e.g. the one with the highest ID, the prefered approach is using the row_limiting_clause
SELECT *
FROM simpletable
ORDER BY record_id DESC
OFFSET 0 ROWS FETCH NEXT 1 ROW ONLY;
You will see the execution plan below, that performs first the INDEX FULL SCAN DESCENDING. The complete (full) index will be red in the descending order, but again due to STOPKEY you break after reading the highest key (which is the first entry due to the descending order).
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10188 | 19M| 613 (0)| 00:00:01 |
|* 1 | VIEW | | 10188 | 19M| 613 (0)| 00:00:01 |
|* 2 | WINDOW NOSORT STOPKEY | | 10188 | 19M| 613 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| SIMPLETABLE | 10188 | 19M| 613 (0)| 00:00:01 |
| 4 | INDEX FULL SCAN DESCENDING| SYS_C008793 | 10188 | | 29 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=CASE WHEN (0>=0) THEN
0 ELSE 0 END +1 AND "from$_subquery$_002"."rowlimit_$$_rownumber">0)
2 - filter(ROW_NUMBER() OVER ( ORDER BY
INTERNAL_FUNCTION("SIMPLETABLE"."RECORD_ID") DESC )<=CASE WHEN (0>=0) THEN 0 ELSE 0
END +1)
Note if the table is empty or contains very few rows, you will se even here a TABLE ACCESS FULL because the optimizer recognises that it is more effective that to first go to the index and that access the table.

Improving query performance to search in clob column in big table

I have a big table (about 4.6 million records) with many columns, I have concatenated some columns and inserted it in clob column(one column have an alias of the name so the name came in a different way and it inserted in clob column )and that column with ctxsys.context index.
I searched with contains a function with the fuzzy operator and for performance, I added tow columns to search and those columns with a bitmap index
and analyzed the table by using DBMS_STATS.GATHER_TABLE_STATS() also I alter my index to take parallel with 4 degrees and increase SORT_AREA_SIZE to 8300000.
My problem is when I searched it's taken from 2 to 5 min to executed.
is there any way to improve performance and reduce time execution(another algorithm to speed searching or I can change the structure of my table by increase the columns and search in multiple columns).
Here is my query:
SELECT first_name,
last_name,
countries,
category,
aliases
FROM (SELECT first_name,
last_name,
countries,
category,
aliases,
rr
FROM (SELECT T.u_id,
T.first_name,
T.last_name,
T.countries,
T.category,
T.aliases,
ROWNUM rr,
all_data
FROM tbl_rsk_list_world T
WHERE t.countries = 'SPAIN'
AND category = 'Eng')
WHERE Contains(all_data, 'fuzzy(JOSE,60,,weight)', 1) > 0)
WHERE rr BETWEEN 1 AND 500
The Execution plan is:
SQL> select * from TABLE(DBMS_XPLAN.DISPLAY);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2747287528
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 20651 | 109M| 5724 (1
|* 1 | VIEW | | 20651 | 109M| 5724 (1
| 2 | COUNT | | | |
| 3 | PX COORDINATOR | | | |
| 4 | PX SEND QC (RANDOM)| :TQ10000 | 20651 | 4638K| 5724 (1
| 5 | PX BLOCK ITERATOR | | 20651 | 4638K| 5724 (1
|* 6 | TABLE ACCESS FULL| TBL_RSK_LIST_WORLD | 20651 | 4638K| 5724 (1
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("CTXSYS"."CONTAINS"("ALL_DATA",'fuzzy(jose,60,,weight)',1)>0 AND "
AND "from$_subquery$_002"."RR">=1)
6 - filter("COUNTRIES"='SPAIN' AND "CATEGORY"='Eng')
20 rows selected
when I using FIRST_ROWS and DOMAIN_INDEX_NO_SORT hints the execution plan be:
SQL> select * from TABLE(DBMS_XPLAN.DISPLAY);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1488722846
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 50 | 173K|
|* 1 | VIEW | | 50 | 173K|
| 2 | COUNT | | | |
| 3 | TABLE ACCESS BY INDEX ROWID| TBL_RSK_LIST_WORLD | 50 | 11500 |
|* 4 | DOMAIN INDEX | NDX_RSK_LIST_WORLD_CTX | | |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("CATEGORY"='Eng' AND "COUNTRIES"='SPAIN' AND
"from$_subquery$_002"."RR"<=500 AND "from$_subquery$_002"."RR">=1)
4 - access("CTXSYS"."CONTAINS"("W"."ALL_DATA",'fuzzy(jose,60,,weight)',1)>0)
18 rows selected
but still the performance bad :\

Why is Oracle's query planner adding a filter predicate that replicates a constraint?

I have a simple Oracle query with a plan that doesn't make sense.
SELECT
u.custid AS "custid",
l.listid AS "listid"
FROM
users u
INNER JOIN lists l ON u.custid = l.custid
And here’s what the autotrace explain tells me it has for a plan
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1468K| 29M| | 11548 (1)| 00:00:01 |
|* 1 | HASH JOIN | | 1468K| 29M| 7104K| 11548 (1)| 00:00:01 |
| 2 | INDEX FAST FULL SCAN| USERS_PK | 404K| 2367K| | 266 (2)| 00:00:01 |
|* 3 | TABLE ACCESS FULL | LISTS | 1416K| 20M| | 9110 (1)| 00:00:01 |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("U"."CUSTID"="L"."CUSTID")
3 - filter(TRUNC("SORT_TYPE")>=1 AND TRUNC("SORT_TYPE")<=16)
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
- this is an adaptive plan
- 1 Sql Plan Directive used for this statement
What concerns me is predicate 3. sort_type does not appear in the query, and is not indexed at all. It seems to me that sort_type should not be involved in this query at all.
There is a constraint on lists.sort_type: (Yes, I know we could have sort_type be an INTEGER not a NUMBER)
sort_type NUMBER DEFAULT 2 NOT NULL,
CONSTRAINT lists_sort_type CHECK ( sort_type BETWEEN 1 AND 16 AND TRUNC(sort_type) = sort_type )
It looks to me that that filter is on sort_type is basically a tautology. Every row in lists must pass that filter because of that constraint.
If I drop the constraint, the filter no longer shows up in the plan, and the estimated cost goes down a little bit. If I add the constraint back, the plan uses the filter again. There's no significant difference in execution speed one way or the other.
I'm concerned because I discovered this filter in a much larger, more complex query that I was trying to optimize down from a couple of minutes of runtime.
Why is Oracle adding that filter, and is it a problem and/or pointing to another problem?
EDIT: If I change the constraint on sort_type to not have the TRUNC part, the filter disappears. If I split the constraint into two different constraints, the filter comes back.
Generally speaking, Oracle generates predicates based on your CHECK constraints whenever doing so will give the optimizer more information to generate a good plan. It is not always smart enough to recognize when those are redundant. Here is a short example in Oracle 12c using your table names:
-- Create the CUSTS table
CREATE TABLE custs ( custid number not null );
CREATE INDEX custs_u1 on custs (custid);
-- Create the LISTS table
CREATE TABLE lists
( listid number not null,
sort_type number not null,
custid number,
constraint lists_c1 check ( sort_type between 1 and 16 and
trunc(sort_type) = sort_Type )
);
-- Explain a join
EXPLAIN PLAN FOR
SELECT /*+ USE_HASH(u) */
u.custid AS "custid",
l.listid AS "listid"
FROM custs u
INNER JOIN lists l ON u.custid = l.custid;
-- Show the plan
select * from table(dbms_xplan.display);
Plan hash value: 2443150416
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 39 | 3 (0)| 00:00:01 |
|* 1 | HASH JOIN | | 1 | 39 | 3 (0)| 00:00:01 |
| 2 | INDEX FULL SCAN | CUSTS_U1 | 1 | 13 | 1 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| LISTS | 1 | 26 | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("U"."CUSTID"="L"."CUSTID")
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
So far, nothing weird. No questionable predicates added.
Now, let's tell the Oracle optimizer that the distribution of data on TRUNC(sort_type) might matter...
declare
x varchar2(30);
begin
x := dbms_stats.create_extended_stats ( user, 'LISTS', '(TRUNC(SORT_TYPE))');
dbms_output.put_line('Extension name = ' || x);
end;
... and, now, let's explain that same query again...
-- Re-explain the same join as before
EXPLAIN PLAN FOR
SELECT /*+ USE_HASH(u) */
u.custid AS "custid",
l.listid AS "listid"
FROM custs u
INNER JOIN lists l ON u.custid = l.custid;
-- Show the new plan
select * from table(dbms_xplan.display);
Plan hash value: 2443150416
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 52 | 3 (0)| 00:00:01 |
|* 1 | HASH JOIN | | 1 | 52 | 3 (0)| 00:00:01 |
| 2 | INDEX FULL SCAN | CUSTS_U1 | 1 | 13 | 1 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| LISTS | 1 | 39 | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("U"."CUSTID"="L"."CUSTID")
3 - filter(TRUNC("SORT_TYPE")>=1 AND TRUNC("SORT_TYPE")<=16)
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
Now, Oracle has added the predicate, because the CBO might benefit from it. Does it really benefit from it? No, but Oracle is only smart enough to know that it might and that it doesn't(*) hurt anything.
(*) there have been numerous bugs in previous versions where this _has_ hurt things by messing up the selectivity estimated by the CBO.
The presence of extended statistics is only one example reason of why Oracle might think it could benefit from this predicate. To find out if that is the reason in your case, you can look for extended statistics in your database like this:
SELECT * FROM dba_stat_extensions where table_name = 'LISTS';
Keep in mind, the Oracle CBO can create stat extensions on its own. So there could be extended stats that you didn't realize were there.

Using DECODE on a parameter in WHERE clause will shortcircuit using the index?

I have a stored procedure with multiple mandatory parameters and a SELECT statement inside it which has multiple conditions in its WHERE clause, like below:
SELECT *
FROM TABLE
WHERE column_1 = param_1
AND column_2 = param_2
AND column_3 = param_3;
This query works fine and it uses the indexes on the table correctly. But a change in requirements implied adjusting the procedure so that you can pass it less parameters, so maybe just the first two, but we want the procedure to work with minimal changes to the stored procedure.
One of the suggestions I've made was to use a DECODE function to treat each possibly NULL parameter, like this:
SELECT *
FROM TABLE
WHERE column_1 = param_1
AND column_2 = param_2
AND column_3 = DECODE(param_3, null, column_3);
And this way, I considered that because the function is not applied on the table column, the index will still be used. I have made some tests and the query still works and uses the indexes even in this situation.
But I'm still getting contradicted by our architect (with no other explanations), that the query will not use the index because I'm using a function in the WHERE clause.
I'm not sure if my change is enough proof that it will always use the index, or if there are other situations which I should check and in which the index might not be used because of the DECODE function.
Any help / suggestions / information will be very much appreciated.
You are right. Test it and prove it.
Setup
SQL> CREATE TABLE t AS SELECT LEVEL id FROM dual CONNECT BY LEVEL <=10;
Table created.
SQL>
SQL> CREATE INDEX id_indx ON t(ID);
Index created.
Test case
Normal query, without any function:
SQL> set autot on explain
SQL>
SQL> SELECT * FROM t WHERE ID = 5;
ID
----------
5
Execution Plan
----------------------------------------------------------
Plan hash value: 1629656632
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 1 (0)| 00:00:01 |
|* 1 | INDEX RANGE SCAN| ID_INDX | 1 | 3 | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("ID"=5)
Using DECODE on the value(not on column):
SQL> SELECT * FROM t WHERE ID = decode(5, NULL, 3, 5);
ID
----------
5
Execution Plan
----------------------------------------------------------
Plan hash value: 1629656632
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 1 (0)| 00:00:01 |
|* 1 | INDEX RANGE SCAN| ID_INDX | 1 | 3 | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("ID"=5)
Using NVL on the value(not on column):
SQL> SELECT * FROM t WHERE ID = nvl(5, 3);
ID
----------
5
Execution Plan
----------------------------------------------------------
Plan hash value: 1629656632
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 1 (0)| 00:00:01 |
|* 1 | INDEX RANGE SCAN| ID_INDX | 1 | 3 | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("ID"=5)
Above all the three cases, index is used.
DECODE on the column:
SQL> SELECT * FROM t WHERE decode(ID, NULL, 3, 5) = 5;
ID
----------
1
2
3
4
5
6
7
8
9
10
10 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 1 | 3 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(DECODE(TO_CHAR("ID"),NULL,3,5)=5)
NVL on the column:
SQL> SELECT * FROM t WHERE nvl(ID, 3) = 3;
ID
----------
3
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 1 | 3 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(NVL("ID",3)=3)
SQL>
As expected, index is not used as you are applying a function on the column having a regular index. You need a function-based index.
So, you are right, you don't have to worry about index usage when you are not applying the function on the column, but on the parameter value.

What reason to use indexing of particular rows in a table?

There is example of using a function-based indexes in the documentation Concepts Oracle 11G:
A function-based index is also useful for indexing only specific rows
in a table. For example, the cust_valid column in the sh.customers
table has either I or A as a value. To index only the A rows, you
could write a function that returns a null value for any rows other
than the A rows.
I can imagine only this use case: the reducing size of index, by eliminating some rows by condition. Is there other use cases when this possibility is useful?
Let's take a look at function-based indexes:
SQL> create table tab1 as select object_name from all_objects;
Table created.
SQL> exec dbms_stats.gather_table_stats(user, 'TAB1');
PL/SQL procedure successfully completed.
SQL> set autotrace traceonly
SQL> select count(*) from tab1 where lower(object_name) = 'all_tables';
Execution Plan
----------------------------------------------------------
Plan hash value: 1117438016
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 19 | 18 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 19 | | |
|* 2 | TABLE ACCESS FULL| TAB1 | 181 | 3439 | 18 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(LOWER("OBJECT_NAME")='all_tables')
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
63 consistent gets
...
As you know, all the objects have unique names, but oracle has to analyze all 181 rows and performs 63 consistent gets (physical or logical block reads)
Let's create a function-based index:
SQL> create index tab1_obj_name_idx on tab1(lower(object_name));
Index created.
SQL> select count(*) from tab1 where lower(object_name) = 'all_tables';
Execution Plan
----------------------------------------------------------
Plan hash value: 707634933
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 17 | 1 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 17 | | |
|* 2 | INDEX RANGE SCAN| TAB1_OBJ_NAME_IDX | 181 | 3077 | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access(LOWER("OBJECT_NAME")='all_tables')
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
2 consistent gets
...
As you can see the cost cuts down (from 18 to 1) dramatically and there are only 2 consistent gets.
So function-based indexes can increase the performance of your application very well.

Resources