My query is like this:
select id, fn_calc(col)
from table_a
order by id
offset 1483800 rows fetch next 100 rows only;
Note that the paging offset is extremely high: table_a has about 1,500,000 rows.
The above query takes a very long time, but when fn_calc(col) is replaced by col, the speed is satisfactory (at least 5 times faster). When the offset is 0 or 100, however, the two queries are almost equally fast. Why is there this difference?
Possible reasons I can think of:
Oracle executes fn_calc() 1,483,900 times, although that is logically not necessary (calling it only 100 times would be enough).
The calling cost for a user function in a query is very high.
I'm using Oracle 12c on ExaData.
Any suggestion can help.
UPDATE:
When the above query is changed as follows:
select id, fn_calc(col)
from
(
select id, col
from table_a
order by id
offset 1483800 rows fetch next 100 rows only
)
order by id
The query speed is comparable to the case when fn_calc() is not called at all.
Why is that?
UPDATE:
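The execution plans are as follows (sorry, SQL Developer is all I have at the moment, so I had to capture the results as screenshots):
The first query: [execution plan screenshot not preserved]
The second query (which uses a subquery): [execution plan screenshot not preserved]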
To down-voters and close-voters: please state your concerns about this question before you vote, and I'll update it accordingly. This is a real and serious problem for me. Please do not take away my chance to get help.
Firstly, what is the purpose of a pagination query when you are not using an ORDER BY clause? You are fetching the rows in no guaranteed order, i.e. Oracle will internally apply ORDER BY NULL.
Secondly, since the function is in the SELECT list and not in a filter predicate, the explain plan should be the same. The only extra time spent should be due to the function.
For example,
SQL> CREATE OR REPLACE FUNCTION f_char(
       i_empno NUMBER)
     RETURN VARCHAR2
     AS
       v_empno VARCHAR2(10);
     BEGIN
       v_empno := TO_CHAR(i_empno);
       RETURN v_empno;
     END;
     /
Function created.
Let's compare the explain plan:
SQL> set autot on explain
SQL>
SQL> SELECT f_char(empno) FROM emp
     OFFSET 5 ROWS
     FETCH NEXT 5 ROWS ONLY;
F_CHAR(EMPNO)
--------------------------------------------------------------------------------
7698
7782
7788
7839
7844
Execution Plan
----------------------------------------------------------
Plan hash value: 3611411408
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 14 | 28210 | 4 (0)| 00:00:01 |
|* 1 | VIEW | | 14 | 28210 | 4 (0)| 00:00:01 |
|* 2 | WINDOW NOSORT STOPKEY| | 14 | 56 | 4 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL | EMP | 14 | 56 | 4 (0)| 00:00:01 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=CASE WHEN
(5>=0) THEN 5 ELSE 0 END +5 AND "from$_subquery$_002"."rowlimit_$$_rownumber">5)
2 - filter(ROW_NUMBER() OVER ( ORDER BY NULL )<=CASE WHEN (5>=0)
THEN 5 ELSE 0 END +5)
SQL> SELECT empno FROM emp
     OFFSET 5 ROWS
     FETCH NEXT 5 ROWS ONLY;
EMPNO
----------
7698
7782
7788
7839
7844
Execution Plan
----------------------------------------------------------
Plan hash value: 3611411408
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 14 | 364 | 4 (0)| 00:00:01 |
|* 1 | VIEW | | 14 | 364 | 4 (0)| 00:00:01 |
|* 2 | WINDOW NOSORT STOPKEY| | 14 | 56 | 4 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL | EMP | 14 | 56 | 4 (0)| 00:00:01 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=CASE WHEN
(5>=0) THEN 5 ELSE 0 END +5 AND "from$_subquery$_002"."rowlimit_$$_rownumber">5)
2 - filter(ROW_NUMBER() OVER ( ORDER BY NULL )<=CASE WHEN (5>=0)
THEN 5 ELSE 0 END +5)
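To check the question's first hypothesis (that the function runs once per scanned row rather than once per returned row), one option is to count the calls. The following is only a sketch: the package call_counter and the wrapper fn_calc_counted are hypothetical names, and the NUMBER parameter and return types are assumptions, since the real signature of fn_calc is not shown. The counter lives in the session, so it would undercount if the query runs in parallel.

```sql
-- Hypothetical helper: a package variable that keeps its value for the session.
CREATE OR REPLACE PACKAGE call_counter AS
  n PLS_INTEGER := 0;
END call_counter;
/

-- Hypothetical wrapper: increments the counter, then delegates to fn_calc.
-- The NUMBER types are assumed; adjust them to the real signature.
CREATE OR REPLACE FUNCTION fn_calc_counted(p_col IN NUMBER)
  RETURN NUMBER
AS
BEGIN
  call_counter.n := call_counter.n + 1;
  RETURN fn_calc(p_col);
END fn_calc_counted;
/

-- Run the slow query with the wrapper in place of fn_calc ...
SELECT id, fn_calc_counted(col)
FROM table_a
ORDER BY id
OFFSET 1483800 ROWS FETCH NEXT 100 ROWS ONLY;

-- ... then read the counter (SQL*Plus or SQL Developer script mode).
SET SERVEROUTPUT ON
EXEC DBMS_OUTPUT.PUT_LINE('fn_calc calls: ' || call_counter.n)
```

A count near 100 would point to a high per-call cost. A count near 1,483,900 would suggest that the function is evaluated before the row-limiting filter is applied, which would fit the speedup seen with the subquery form in the question.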
I'm running tests before we migrate an Oracle database from 12c to 19c.
I'm facing unusual behavior, which the example below illustrates.
I've condensed it to a reproducible case.
Sorry for the very long post; I wanted to include all the relevant information.
If any further information is required, I'll be happy to provide it.
Oracle 12c & 19c versions are as below (from v$instance):
VERSION
12.1.0.2.0
VERSION VERSION_FULL
19.0.0.0.0 19.16.0.0.0
Sample Data
2 tables are as below
TAB1
COLUMN_NAME DATA_TYPE NULLABLE
COL1 VARCHAR2(20 BYTE) Yes
RUL_NO NUMBER(11,0) No
INP_DT TIMESTAMP(6) WITH LOCAL TIME ZONE No
TAB2
COLUMN_NAME DATA_TYPE NULLABLE
COL1 VARCHAR2(20 BYTE) No
COL6 NUMBER(11,0) No
COL7 VARCHAR2(5 BYTE) Yes
INP_DT TIMESTAMP(6) WITH LOCAL TIME ZONE No
Index on TAB2 -
create index tab2_IDX1 on tab2(col6);
create index tab2_IDX2 on tab2(col1);
Problem SQL
SELECT *
FROM tab1 t
WHERE (EXISTS (SELECT 1
FROM tab2 b
WHERE b.col6 = 1088609
AND NVL(t.col1, '<NULL>') = NVL(b.col1, '<NULL>'))
OR t.col1 IS NULL);
This SQL returns 10 rows on the 12c database but none on the 19c database, which is a regression on the 19c side.
Here's the output when the statement is run with autotrace enabled.
12c Trace
SQL> set autotrace traceonly
SQL> set linesize 200
SQL> set pagesize 1000
SQL> SELECT *
FROM tab1 t
WHERE (EXISTS (SELECT 1
FROM tab2 b
WHERE b.col6 = 1088609
AND NVL(t.col1, '<NULL>') = NVL(b.col1, '<NULL>'))
OR t.col1 IS NULL);
10 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 572408916
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 160 | 3 (0)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | TABLE ACCESS FULL | TAB1 | 10 | 160 | 3 (0)| 00:00:01 |
|* 3 | TABLE ACCESS BY INDEX ROWID BATCHED| TAB2 | 1 | 15 | 2 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | TAB2_IDX3 | 1 | | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("T"."COL1" IS NULL OR EXISTS (SELECT 0 FROM "TAB2" "B" WHERE
"B"."COL1"=NVL(:B1,'<NULL>') AND "B"."COL6"=1088609))
3 - filter("B"."COL6"=1088609)
4 - access("B"."COL1"=NVL(:B1,'<NULL>'))
Note
-----
- dynamic statistics used: dynamic sampling (level=4)
19c Trace
SQL> set autotrace traceonly
SQL> set linesize 200
SQL> set pagesize 1000
SQL>
SQL> SELECT *
FROM tab1 t
WHERE (EXISTS (SELECT 1
FROM tab2 b
WHERE b.col6 = 1088609
AND NVL(t.col1, '<NULL>') = NVL(b.col1, '<NULL>'))
OR t.col1 IS NULL);
no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value: 4175419084
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 31 | 5 (0)| 00:00:01 |
|* 1 | HASH JOIN SEMI NA | | 1 | 31 | 5 (0)| 00:00:01 |
| 2 | TABLE ACCESS FULL | TAB1 | 10 | 160 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID BATCHED| TAB2 | 1 | 15 | 2 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | TAB2_IDX1 | 1 | | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access(NVL("T"."COL1",'<NULL>')="B"."COL1")
4 - access("B"."COL6"=1088609)
Note
-----
- this is an adaptive plan
Can somebody explain why 19c behaves this way? It should return 10 rows, as the 12c database does. The HASH JOIN SEMI NA step on the 19c side seems to be causing the issue, but I can't be sure.
Any help on this matter is very much appreciated.
Thanks,
Kailash
It seems that the 19c execution plan somehow loses the OR t.col1 IS NULL predicate; the Predicate Information shows only:
1 - access(NVL("T"."COL1",'<NULL>')="B"."COL1")
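This is most probably a bug (wrong predicate elimination?).
Anyway, a workaround (if you can change the query) seems to be to move the OR into the EXISTS subquery. Note that this is equivalent to the original only while tab2 contains at least one row, because EXISTS over an empty table is false even when t.col1 IS NULL: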
SELECT *
FROM tab1 t
WHERE EXISTS (SELECT 1
FROM tab2 b
WHERE b.col6 = 1088609
AND NVL(t.col1, '<NULL>') = NVL(b.col1, '<NULL>')
OR t.col1 IS NULL );
This also implicitly disables the NA semi join and falls back to the FILTER plan from 12c, which is another sign that the semi join is the cause of the wrong behaviour.
Open an SR with Oracle for a final solution!
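If the query text can be changed but its logic should stay as written, another thing to try is the NO_UNNEST hint, which asks the optimizer not to unnest the subquery into a join. This is a sketch against the tab1/tab2 tables from the question; whether it actually restores the 12c FILTER plan and the 10-row result on 19c is an assumption that has to be verified there.

```sql
SELECT *
FROM tab1 t
WHERE (EXISTS (SELECT /*+ NO_UNNEST */ 1
               FROM tab2 b
               WHERE b.col6 = 1088609
               AND NVL(t.col1, '<NULL>') = NVL(b.col1, '<NULL>'))
       OR t.col1 IS NULL);
```

Unlike the rewritten query above, this keeps the OR outside the EXISTS, so the result does not depend on tab2 being non-empty.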
I am running the following update based on input from a front-end application. Since it updates more than 550M rows, I stopped it manually after it had run for 5 hours. The entire code is big, so I've posted the code for 2 fields below. Could you please suggest what can be improved?
I've researched this extensively, e.g. adding PRAGMA UDF to the functions, but I am hesitant to do that because these functions are used in millions of other transformations/calculations inside PL/SQL in the production environment, and I've read that PRAGMA UDF slows a function down when it is called from PL/SQL rather than from SQL.
1) How can I add PRAGMA UDF in this block without making changes to the original functions?
2) Should I add index hints to the update statement? If yes, how can I add hints, such as index hints, in the update block?
P.S.: 1) The explain plan is from the test environment, which has very few rows compared to the production environment. 2) The other fields use the same functions.
--EXPLAIN PLAN
Plan hash value: 234644540
---------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------------
| 0 | MERGE STATEMENT | | 1 | 2527 | 240 (0)| 00:00:01 |
| 1 | MERGE | MORT_BACK_SEC | | | | |
| 2 | VIEW | | | | | |
| 3 | NESTED LOOPS | | 1 | 2498 | 240 (0)| 00:00:01 |
| 4 | NESTED LOOPS | | 372 | 2498 | 240 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID BATCHED| RPT_ACCT_HIER | 1 | 491 | 2 (0)| 00:00:01 |
|* 6 | INDEX RANGE SCAN | IDX2 | 1 | | 1 (0)| 00:00:01 |
|* 7 | INDEX RANGE SCAN | IDX4 | 372 | | 1 (0)| 00:00:01 |
|* 8 | TABLE ACCESS BY INDEX ROWID | MORT_BACK_SEC | 1 | 2007 | 238 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
6 - access("RPT_ACCT_HIER"."ACCT_GEN2"='a1000')
7 - access("D"."GL_ACCOUNT_ID"="RPT_ACCT_HIER"."ACCT_MEMBER")
8 - filter("D"."AS_OF_DATE"=TO_DATE(' 2019-06-30 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
- automatic DOP: Computed Degree of Parallelism is 1 because of parallel threshold
- 1 Sql Plan Directive used for this statement
-------------------------------------------------------------------------------
--CODE
IF id = 8 then
MERGE INTO(
SELECT A.*,
USB.BAS2_RWA_CALC (BAS_CAPITAL_CALC_CD, NVL (IMP_CUR_BOOK_BAL, 0), NVL (BAS_CAP_FACTOR_K, 0), .06, .08) AS V_IMP_BAS_EB_RWA
--and 20 more function calls for 20 other fields
FROM USB.MORT_BACK_SEC A) D
USING (SELECT * FROM USB.rpt_acct_hier) B
ON (d.gl_account_id = b.acct_member and d.as_of_date = TO_DATE('06/30/2019','MM/DD/YYYY') and b.acct_gen2 = 'a1000')
WHEN MATCHED THEN UPDATE SET
IMP_BAS_EB_RWA = V_IMP_BAS_EB_RWA,
IMP_BAS_EB_TOTAL_CAPITAL = ROUND (USB.BAS2_MGRL_CAPITAL (TO_DATE('06/30/2019','MM/DD/YYYY'), V_IMP_BAS_EB_RWA, 0), 2)
WHERE AS_OF_DATE = TO_DATE('06/30/2019','MM/DD/YYYY') --V_IMP_BAS_EB_RWA is being calculated above and then passed as input here
--20 more fields to be updated
end if;
--USB."BAS2_MGRL_CAPITAL" function is being called for updating IMP_BAS_EB_TOTAL_CAPITAL
CREATE OR REPLACE FUNCTION USB."BAS2_MGRL_CAPITAL"
(v_date in date, v_RWA in number,v_RWC in number) return number result_cache
is
v_capital number(14,2);
v_rate number(15,8);
begin
v_rate := case when year(v_date) < 2010 then 0.06
when year(v_date) >= 2010 and year(v_date) < 2012 then 0.07
when year(v_date) = 2012 then 0.0775
when year(v_date) > 2012 and year(v_date) < 2018 then 0.08
else 0.085 end;
v_capital := (v_RWA + v_RWC) * v_rate;
return round(v_capital,2);
end;
/
--USB."BAS2_RWA_CALC" function which is inside MERGE into(...) block is as below
CREATE OR REPLACE FUNCTION USB."BAS2_RWA_CALC"
(v_formula in char,v_bal in number,v_k_factor in number, v_bas_min in number,v_rwa_adj_rate in number) return number result_cache
is
v_rwa number(15,2);
begin
v_rwa := nvl(v_bal,0)*nvl(v_k_factor,0)/nvl(v_bas_min,0);
v_rwa := v_rwa*(1+v_rwa_adj_rate);
return round(v_rwa,2);
end;
/
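Regarding point 1, one possible approach is a thin wrapper that carries PRAGMA UDF and simply delegates to the existing function, so that only SQL statements call the wrapper and existing PL/SQL callers keep using the original. This is a sketch, not a measured solution: the name bas2_rwa_calc_udf is hypothetical, and whether the gain is noticeable for this MERGE would have to be tested.

```sql
-- Hypothetical wrapper; the original USB.BAS2_RWA_CALC stays untouched.
CREATE OR REPLACE FUNCTION usb.bas2_rwa_calc_udf
  (v_formula IN CHAR, v_bal IN NUMBER, v_k_factor IN NUMBER,
   v_bas_min IN NUMBER, v_rwa_adj_rate IN NUMBER)
  RETURN NUMBER
IS
  PRAGMA UDF;  -- available from Oracle 12c; intended for functions called from SQL
BEGIN
  RETURN usb.bas2_rwa_calc(v_formula, v_bal, v_k_factor, v_bas_min, v_rwa_adj_rate);
END;
/
```

The MERGE would then call usb.bas2_rwa_calc_udf(...) in the inline view instead of USB.BAS2_RWA_CALC(...).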
I have two queries which return the same result set, but when reviewing the execution plans they have different values of cardinality.
The queries are:
select acq_cod
, prp
, df_val
, descr
from acqdefprp
where (prp like '%pswd%' or prp like '%Pswd%')
and prp not like '%kno%'
and prp not like '%encr%';
and
select acq_cod
, prp
, df_val
, descr
from acqdefprp
where regexp_instr(prp, 'pswd', 1,1,0,'i' ) > 0
and regexp_instr(prp, '(encr)|(kno)', 1,1,0,'i' ) = 0;
The first query has the following explain plan:
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 65 | 4485 | 6 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| acqdefprp | 65 | 4485 | 6 (0)| 00:00:01 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(("PRP" LIKE '%pswd%' OR "PRP" LIKE '%Pswd%')
AND "PRP" NOT LIKE '%kno%'
AND "PRP" NOT LIKE '%encr%')
And the explain plan for the second query is:
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 69 | 6 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| acqdefprp | 1 | 69 | 6 (0)| 00:00:01 |
--------------------------------------------------------------------------------
1 - filter(REGEXP_INSTR ("PRP",'(encr)|(kno)',1,1,0,'i') = 0
AND REGEXP_INSTR ("PRP",'pswd',1,1,0,'i') > 0 )
My question is why is the cardinality different between the two execution plans? For the first plan the cardinality (rows) is 65 and for the second it's 1?
My assumption is that this cardinality is the maximum number of rows that will be returned by each condition, if each condition was evaluated separately, and all of this based on table statistics. And that is why for my first query this assumed maximum is 65, since the WHERE conditions are a little more permissive.
And also that is why for the second query the cardinality is 1, since the regexp_instr is more restrictive.
If my assumptions are not correct, I'd really like to know what determines this cardinality number.
Thank you in advance for any help
In your case the expressions are too complex for the optimizer to estimate the cardinality from basic statistics. In such cases (you don't seem to use histograms, which might affect LIKE predicates) a fixed selectivity is used:
equality operator: 1%
inequality operator: 5%
So your
LIKE example is approximately (5 % + 5 % - (5 % * 5 %)) * 95 % * 95 % => 8.8 % of total table rows. The term - (5 % * 5 %) removes the intersection that the OR operator would otherwise count twice.
REGEX example is 1 % * 5 % => 0.05 % of total table rows.
Oracle also supports extended statistics where you can compute statistics and histograms for specific expressions or correlated columns.
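The arithmetic above can be reproduced directly. The query below only restates the fixed selectivities quoted in this answer (5 % and 1 %); it does not read anything from the data dictionary, and the column aliases are made up for illustration.

```sql
SELECT (0.05 + 0.05 - 0.05 * 0.05) * 0.95 * 0.95 AS like_selectivity,
       0.01 * 0.05                               AS regexp_selectivity
FROM dual;
-- like_selectivity   = 0.08799375 (about 8.8 %)
-- regexp_selectivity = 0.0005     (0.05 %)
```

Multiplying each value by the number of rows in the table gives the estimate shown in the Rows column. For a small table the second product falls below 1 and is reported as 1, which would be consistent with the second plan in the question.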
You are comparing plans with direct WHERE conditions against plans with REGEXP_INSTR functions. It actually makes no difference which function is used: it is very difficult for Oracle to give a realistic estimate without executing the function.
For example we can create function -
CREATE OR REPLACE FUNCTION f_check(str IN VARCHAR2)
RETURN NUMBER IS
BEGIN
IF str LIKE 'A%' THEN
RETURN 1;
END IF;
RETURN -1;
END;
/
First select -
SELECT *
FROM tmptxt
WHERE dsc LIKE 'A%'
Plan hash value: 2928917536
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 121 | 4356 | 4 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TMPTXT | 121 | 4356 | 4 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("DSC" LIKE 'A%')
and with function -
SELECT *
FROM tmptxt
WHERE f_check(dsc) = 1
Plan hash value: 2928917536
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 36 | 4 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TMPTXT | 1 | 36 | 4 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("F_CHECK"("DSC")=1)
These two queries give the same result, but the plan estimates differ. That is not very important here (both plans are full scans); evaluate the plan as a whole rather than dwelling on the numbers.
My assumption is that this cardinality is the maximum number of rows that will be returned by each condition....
No, cardinality is the CBO's estimate of how many rows the operation will return (technically always >= 1).
The cardinality is calculated either from the object statistics stored in the data dictionary or by dynamic sampling.
Dynamic sampling is more costly (it is calculated at each parse) but can return much more precise results.
So one possible workaround to get a better estimate is to use dynamic sampling. Here is a small demo with level 10 (which is extreme and for demo purposes only, as the whole table is scanned in the parsing step; but that is not a problem with a 779-row table, and the cardinality is exact)
create table tst as
select ltrim(to_char(rownum,'09999')) prp from dual connect by level <= 999999;
select count(*) from tst where prp like '%999%';
280
select count(*) from tst where regexp_instr(prp, '999', 1,1,0,'i' ) > 0;
280
Alter session set optimizer_dynamic_sampling=10;
EXPLAIN PLAN SET STATEMENT_ID = 'jara1' into plan_table FOR
select * from tst where prp like '%999%';
SELECT * FROM table(DBMS_XPLAN.DISPLAY('plan_table', 'jara1','ALL'));
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 280 | 1400 | 467 (2)| 00:00:06 |
|* 1 | TABLE ACCESS FULL| TST | 280 | 1400 | 467 (2)| 00:00:06 |
--------------------------------------------------------------------------
1 - filter("PRP" IS NOT NULL AND "PRP" LIKE '%999%')
EXPLAIN PLAN SET STATEMENT_ID = 'jara1' into plan_table FOR
select * from tst where regexp_instr(prp, '999', 1,1,0,'i' ) > 0;
SELECT * FROM table(DBMS_XPLAN.DISPLAY('plan_table', 'jara1','ALL'));
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 280 | 1400 | 479 (5)| 00:00:06 |
|* 1 | TABLE ACCESS FULL| TST | 280 | 1400 | 479 (5)| 00:00:06 |
--------------------------------------------------------------------------
1 - filter( REGEXP_INSTR ("PRP",'999',1,1,0,'i')>0)
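Instead of changing the setting for the whole session, dynamic sampling can also be requested for a single statement with the DYNAMIC_SAMPLING hint. A minimal sketch on the tst table from the demo above; to see the effect of the hint alone, the session parameter would first have to be set back to its default.

```sql
EXPLAIN PLAN FOR
SELECT /*+ dynamic_sampling(tst 10) */ *
FROM tst
WHERE regexp_instr(prp, '999', 1, 1, 0, 'i') > 0;

SELECT * FROM table(DBMS_XPLAN.DISPLAY);
```

This limits the extra parse cost to the statements that really need the better estimate.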
I have a stored procedure with multiple mandatory parameters and a SELECT statement inside it which has multiple conditions in its WHERE clause, like below:
SELECT *
FROM TABLE
WHERE column_1 = param_1
AND column_2 = param_2
AND column_3 = param_3;
This query works fine and uses the indexes on the table correctly. But a change in requirements means the procedure must accept fewer parameters, maybe just the first two, and we want to achieve this with minimal changes to the stored procedure.
One of the suggestions I've made was to use a DECODE function to treat each possibly NULL parameter, like this:
SELECT *
FROM TABLE
WHERE column_1 = param_1
AND column_2 = param_2
AND column_3 = DECODE(param_3, null, column_3, param_3);
And this way, I considered that because the function is not applied on the table column, the index will still be used. I have made some tests and the query still works and uses the indexes even in this situation.
But our architect keeps contradicting me (without further explanation), saying that the query will not use the index because I'm using a function in the WHERE clause.
I'm not sure if my change is enough proof that it will always use the index, or if there are other situations which I should check and in which the index might not be used because of the DECODE function.
Any help / suggestions / information will be very much appreciated.
You are right. Test it and prove it.
Setup
SQL> CREATE TABLE t AS SELECT LEVEL id FROM dual CONNECT BY LEVEL <=10;
Table created.
SQL>
SQL> CREATE INDEX id_indx ON t(ID);
Index created.
Test case
Normal query, without any function:
SQL> set autot on explain
SQL>
SQL> SELECT * FROM t WHERE ID = 5;
ID
----------
5
Execution Plan
----------------------------------------------------------
Plan hash value: 1629656632
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 1 (0)| 00:00:01 |
|* 1 | INDEX RANGE SCAN| ID_INDX | 1 | 3 | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("ID"=5)
Using DECODE on the value(not on column):
SQL> SELECT * FROM t WHERE ID = decode(5, NULL, 3, 5);
ID
----------
5
Execution Plan
----------------------------------------------------------
Plan hash value: 1629656632
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 1 (0)| 00:00:01 |
|* 1 | INDEX RANGE SCAN| ID_INDX | 1 | 3 | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("ID"=5)
Using NVL on the value(not on column):
SQL> SELECT * FROM t WHERE ID = nvl(5, 3);
ID
----------
5
Execution Plan
----------------------------------------------------------
Plan hash value: 1629656632
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 1 (0)| 00:00:01 |
|* 1 | INDEX RANGE SCAN| ID_INDX | 1 | 3 | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("ID"=5)
In all three cases above, the index is used.
DECODE on the column:
SQL> SELECT * FROM t WHERE decode(ID, NULL, 3, 5) = 5;
ID
----------
1
2
3
4
5
6
7
8
9
10
10 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 1 | 3 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(DECODE(TO_CHAR("ID"),NULL,3,5)=5)
NVL on the column:
SQL> SELECT * FROM t WHERE nvl(ID, 3) = 3;
ID
----------
3
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 1 | 3 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(NVL("ID",3)=3)
SQL>
As expected, the index is not used, because you are applying a function to a column that has only a regular index. You would need a function-based index.
So, you are right, you don't have to worry about index usage when you are not applying the function on the column, but on the parameter value.
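In the optional-parameter case from the question, the comparison value is not a constant but depends on a parameter and falls back to the column itself. A minimal sketch on the same table t, using a SQL*Plus bind variable :p as a stand-in for the procedure parameter (an assumption for illustration):

```sql
VARIABLE p NUMBER
EXEC :p := 5

SET AUTOTRACE ON EXPLAIN
-- In both forms the function wraps the parameter, not the indexed column.
SELECT * FROM t WHERE id = NVL(:p, id);
SELECT * FROM t WHERE id = DECODE(:p, NULL, id, :p);
```

With :p set, an index access on ID_INDX is the plan to hope for; with :p left NULL, every row with a non-NULL id qualifies, so a full scan is reasonable. The actual plans should be checked on the target version. Also note that rows where id is NULL are never returned by this pattern, because id = id is not true for NULL.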
There is an example of using a function-based index in the Oracle 11g Concepts documentation:
A function-based index is also useful for indexing only specific rows
in a table. For example, the cust_valid column in the sh.customers
table has either I or A as a value. To index only the A rows, you
could write a function that returns a null value for any rows other
than the A rows.
I can imagine only this use case: reducing the size of the index by excluding some rows via a condition. Are there other use cases where this possibility is useful?
Let's take a look at function-based indexes:
SQL> create table tab1 as select object_name from all_objects;
Table created.
SQL> exec dbms_stats.gather_table_stats(user, 'TAB1');
PL/SQL procedure successfully completed.
SQL> set autotrace traceonly
SQL> select count(*) from tab1 where lower(object_name) = 'all_tables';
Execution Plan
----------------------------------------------------------
Plan hash value: 1117438016
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 19 | 18 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 19 | | |
|* 2 | TABLE ACCESS FULL| TAB1 | 181 | 3439 | 18 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(LOWER("OBJECT_NAME")='all_tables')
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
63 consistent gets
...
Only a few rows can match, but Oracle still has to scan the whole table (the 181 in the Rows column is just the optimizer's estimate) and performs 63 consistent gets (logical block reads).
Let's create a function-based index:
SQL> create index tab1_obj_name_idx on tab1(lower(object_name));
Index created.
SQL> select count(*) from tab1 where lower(object_name) = 'all_tables';
Execution Plan
----------------------------------------------------------
Plan hash value: 707634933
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 17 | 1 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 17 | | |
|* 2 | INDEX RANGE SCAN| TAB1_OBJ_NAME_IDX | 181 | 3077 | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access(LOWER("OBJECT_NAME")='all_tables')
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
2 consistent gets
...
As you can see, the cost drops dramatically (from 18 to 1) and there are only 2 consistent gets.
So function-based indexes can improve the performance of your application considerably.
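As for the documentation quote about indexing only specific rows: entirely NULL keys are not stored in a B-tree index, so an expression that returns NULL for the unwanted rows keeps them out of the index. A sketch on the sh.customers sample table mentioned in the quote; the index name cust_valid_a_idx is made up.

```sql
CREATE INDEX cust_valid_a_idx
  ON customers (CASE cust_valid WHEN 'A' THEN 'A' END);

-- The query has to use the same expression for the index to be considered.
SELECT COUNT(*)
FROM customers
WHERE CASE cust_valid WHEN 'A' THEN 'A' END = 'A';
```

Besides the smaller index, this can help when the indexed value is rare and queried often, since the index then contains only the rows of interest.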