Using DECODE on a parameter in WHERE clause will shortcircuit using the index? - oracle

I have a stored procedure with multiple mandatory parameters and a SELECT statement inside it which has multiple conditions in its WHERE clause, like below:
SELECT *
FROM TABLE
WHERE column_1 = param_1
AND column_2 = param_2
AND column_3 = param_3;
This query works fine and it uses the indexes on the table correctly. But a change in requirements implied adjusting the procedure so that you can pass it less parameters, so maybe just the first two, but we want the procedure to work with minimal changes to the stored procedure.
One of the suggestions I've made was to use a DECODE function to treat each possibly NULL parameter, like this:
SELECT *
FROM TABLE
WHERE column_1 = param_1
AND column_2 = param_2
AND column_3 = DECODE(param_3, null, column_3);
And this way, I considered that because the function is not applied on the table column, the index will still be used. I have made some tests and the query still works and uses the indexes even in this situation.
But I'm still getting contradicted by our architect (with no other explanations), that the query will not use the index because I'm using a function in the WHERE clause.
I'm not sure if my change is enough proof that it will always use the index, or if there are other situations which I should check and in which the index might not be used because of the DECODE function.
Any help / suggestions / information will be very much appreciated.

You are right. Test it and prove it.
Setup
SQL> CREATE TABLE t AS SELECT LEVEL id FROM dual CONNECT BY LEVEL <=10;
Table created.
SQL>
SQL> CREATE INDEX id_indx ON t(ID);
Index created.
Test case
Normal query, without any function:
SQL> set autot on explain
SQL>
SQL> SELECT * FROM t WHERE ID = 5;
ID
----------
5
Execution Plan
----------------------------------------------------------
Plan hash value: 1629656632
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 1 (0)| 00:00:01 |
|* 1 | INDEX RANGE SCAN| ID_INDX | 1 | 3 | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("ID"=5)
Using DECODE on the value(not on column):
SQL> SELECT * FROM t WHERE ID = decode(5, NULL, 3, 5);
ID
----------
5
Execution Plan
----------------------------------------------------------
Plan hash value: 1629656632
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 1 (0)| 00:00:01 |
|* 1 | INDEX RANGE SCAN| ID_INDX | 1 | 3 | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("ID"=5)
Using NVL on the value(not on column):
SQL> SELECT * FROM t WHERE ID = nvl(5, 3);
ID
----------
5
Execution Plan
----------------------------------------------------------
Plan hash value: 1629656632
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 1 (0)| 00:00:01 |
|* 1 | INDEX RANGE SCAN| ID_INDX | 1 | 3 | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("ID"=5)
Above all the three cases, index is used.
DECODE on the column:
SQL> SELECT * FROM t WHERE decode(ID, NULL, 3, 5) = 5;
ID
----------
1
2
3
4
5
6
7
8
9
10
10 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 1 | 3 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(DECODE(TO_CHAR("ID"),NULL,3,5)=5)
NVL on the column:
SQL> SELECT * FROM t WHERE nvl(ID, 3) = 3;
ID
----------
3
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 1 | 3 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(NVL("ID",3)=3)
SQL>
As expected, index is not used as you are applying a function on the column having a regular index. You need a function-based index.
So, you are right, you don't have to worry about index usage when you are not applying the function on the column, but on the parameter value.

Related

pl sql declare variable with query too

I want to make a simple query in pl sql
Please suggest and how to make it MORE FAST EXECUTE (maybe only 0.01 second in 1000000 data)
first query:
select datetime
from product
order by datetime desc
FETCH NEXT 1 ROWS ONLY
Result of first query will be used in second query.
select *
from traceability
where endtime = [first query]
Please help me to implement that logic to pl sql
Thank you.
Please find bellow an example with sample data.
create table product as
select rownum product_id, DATE'2020-01-01' + NUMTODSINTERVAL(rownum-1, 'second') datetime
from dual connect by level <= 10;
create index product_idx on product(datetime);
create table traceability as
select
rownum id, DATE'2020-01-01' + NUMTODSINTERVAL(rownum-1, 'second') endtime
from dual connect by level <= 10;
create index traceability_idx on traceability(endtime);
Your query shou be as follows
select *
from traceability
where endtime =
(select max(datetime)
from product );
The query will lead to this execution plan. See here how to get the execution plan.
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 22 | 3 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID | TRACEABILITY | 1 | 22 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | TRACEABILITY_IDX | 1 | | 1 (0)| 00:00:01 |
| 3 | SORT AGGREGATE | | 1 | 9 | | |
| 4 | INDEX FULL SCAN (MIN/MAX)| PRODUCT_IDX | 1 | 9 | 1 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ENDTIME"= (SELECT MAX("DATETIME") FROM "PRODUCT" "PRODUCT"))
Note that in case that in the table TRACEABILITY will be a large number of rows with the max timestamp, you can also see a FULL TABLE SCAN in the line 1.
Similar is valid for the PRODUCT table and the line 4

ORA-01722: invalid number only on select * from view, not against view directly

I'm getting ORA-01722: invalid number but only when I select * from theView but not when I select against the theView directly (using the SQL inside the view's CREATE OR REPLACE...).
(I have faced and understand this error before, as well as running aggregates against NULL values, that one shouldn't store VARCHARS in NUMBER columns, etc. but am struggling to understand this issue)
Usually you get it when oracle executes your filter predicate that should filter only numbers, after predicates where you use it as a number.
Simple example:
create table t as
select '1' x, 'num' xtype from dual union all
select 'A' x, 'str' xtype from dual
/
create index t_ind on t(x);
You can see we get ORA-01722 in this very simple example even though we specified filter xtype='num' before x > 0:
select x
from (
select x
from t
where xtype='num'
) v
where v.x > 0;
ERROR:
ORA-01722: invalid number
Execution plan:
Plan hash value: 1601196873
---------------------------------------------------------------------------
| Id | Operation | Name | E-Rows |E-Bytes| Cost (%CPU)| E-Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 3 (100)| |
|* 1 | TABLE ACCESS FULL| T | 1 | 6 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$F5BB74E1 / T#SEL$2
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter((TO_NUMBER("X")>0 AND "XTYPE"='num'))
As you can see from the plan, inline view was merged and both predicates are on the same level.
Now compare with this:
select/*+
no_merge(v)
opt_param('_optimizer_filter_pushdown' 'false')
*/
x
from (
select x
from t
where xtype='num'
) v
where v.x > 0;
X
-
1
Execution plan:
Plan hash value: 3578092569
----------------------------------------------------------------------------
| Id | Operation | Name | E-Rows |E-Bytes| Cost (%CPU)| E-Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 3 (100)| |
|* 1 | VIEW | | 1 | 3 | 3 (0)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| T | 1 | 6 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$2 / V#SEL$1
2 - SEL$2 / T#SEL$2
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(TO_NUMBER("V"."X")>0)
2 - filter("XTYPE"='num')
Read more about this: http://orasql.org/2013/06/10/too-many-function-executions/

optimizer always full scans table eben though fething only 3 rows

I have a table foo which was created like this.
CREATE TABLE foo AS SELECT * FROM all_objects;
CREATE INDEX foo_I1 ON foo(owner,object_type,status);
exec dbms_stats.gather_table_stats('hr','foo',method_opt=>'FOR ALL COLUMNS size AUTO');
I created an index on 3 columns and firing a query which looks like below.
select * from foo where status='INVALID';
select * from foo where status='VALID';
status='VALID' fetches near about 71000 rows in a table of 71780 rows. it does a full table scan. it's understandable. but in case of status='INVALID' which fetches only 3 rows , it's doing full table scan. It's also getting A rows and E rows very different.
PLAN: same for both queries.
SQL_ID gdhy9j91gu9sm, child number 0
select /*+gather_plan_statistics */ * from foo where status='VALID'
Plan hash value: 1245013993
------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 50 |00:00:00.01 | 4 |
|* 1 | TABLE ACCESS FULL| FOO | 1 | 71773 | 50 |00:00:00.01 | 4 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("STATUS"='VALID')
Please explan this behaviour. Database Version: 11.2g oracle.
A missing histogram is probably causing the full table scan. Histograms are usually only created if the data is skewed and if the column has been used in a relevant predicate.
Sometimes you need to run a query before gathering statistics, to let Oracle know that this column is important enough to deserve a histogram.
select * from foo where status='INVALID';
exec dbms_stats.gather_table_stats('hr','foo',method_opt=>'FOR ALL COLUMNS size AUTO');
Re-run the SELECT and now it can use the histogram. With the histogram Oracle knows that INVALID returns a small number of rows, and an index would be useful:
explain plan for select * from foo where status='INVALID';
select * from table(dbms_xplan.display);
Plan hash value: 1520589999
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 134 | 217 (0)| 00:00:01|
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| FOO | 1 | 134 | 217 (0)| 00:00:01|
|* 2 | INDEX SKIP SCAN | FOO_I1 | 1 | | 216 (0)| 00:00:01|
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("STATUS"='INVALID')
filter("STATUS"='INVALID')

How is the cardinality determined in a query?

I have two queries which return the same result set, but when reviewing the execution plans they have different values of cardinality.
The queries are:
select acq_cod
, prp
, df_val
, descr
from acqdefprp
where (prp like '%pswd%' or prp like '%Pswd%')
and prp not like '%kno%'
and prp not like '%encr%';
and
select acq_cod
, prp
, df_val
, descr
from acqdefprp
where regexp_instr(prp, 'pswd', 1,1,0,'i' ) > 0
and regexp_instr(prp, '(encr)|(kno)', 1,1,0,'i' ) = 0;
The first query has the following explain plan:
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 65 | 4485 | 6 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| acqdefprp | 65 | 4485 | 6 (0)| 00:00:01 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(("PRP" LIKE '%pswd%' OR "PRP" LIKE '%Pswd%')
AND "PRP" NOT LIKE '%kno%'
AND "PRP" NOT LIKE '%encr%')
And the explain plan for the second query is:
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 69 | 6 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| acqdefprp | 1 | 69 | 6 (0)| 00:00:01 |
--------------------------------------------------------------------------------
1 - filter(REGEXP_INSTR ("PRP",'(encr)|(kno)',1,1,0,'i') = 0
AND REGEXP_INSTR ("PRP",'pswd',1,1,0,'i') > 0 )
My question is why is the cardinality different between the two execution plans? For the first plan the cardinality (rows) is 65 and for the second it's 1?
My assumption is that this cardinality is the maximum number of rows that will be returned by each condition, if each condition was evaluated separately, and all of this based on table statistics. And that is why for my first query this assumed maximum is 65, since the WHERE conditions are a little more permissive.
And also that is why for the second query the cardinality is 1, since the regexp_instr is more restrictive.
If my assumptions are not correct, I'd really like to know what determines this cardinality number.
Thank you in advance for any help
In your case the expression are too complex for the optimizer to use basic statistics to estimate the cardinality. In these cases (it doesn't seem that you use histograms that might affect LIKE predicates) a fixed selectivity is used:
equality operator: 1%
inequality operator: 5%
So your
LIKE example is approximately (5 % + 5 % - (5 % * 5 %)) * 95 % * 95 % => 8.8 % of total table rows. - (5 % * 5 %) is the intersection because of OR operator.
REGEX example is 1 % * 5 % => 0.05 % of total table rows.
Oracle also supports extended statistics where you can compute statistics and histograms for specific expressions or correlated columns.
You comapare plans with direct WHERE conditions and with REGEXP_INSTR functions. Actually there is no difference which function to use, for oracle very difficult to give a real estimate without function execution.
For example we can create function -
CREATE OR REPLACE FUNCTION f_check(str IN VARCHAR2)
RETURN NUMBER IS
BEGIN
IF str LIKE 'A%' THEN
RETURN 1;
END IF;
RETURN -1;
END;
/
First select -
SELECT *
FROM tmptxt
WHERE dsc LIKE 'A%'
Plan hash value: 2928917536
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 121 | 4356 | 4 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TMPTXT | 121 | 4356 | 4 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("DSC" LIKE 'A%')
and with function -
SELECT *
FROM tmptxt
WHERE f_check(dsc) = 1
Plan hash value: 2928917536
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 36 | 4 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TMPTXT | 1 | 36 | 4 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("F_CHECK"("DSC")=1)
This two queries give the same result, but plan estimate has some difference. It is not too important (fullscan in first way and fullscan in the second), just need to evaluate the whole plan, didn't dwell on the numbers.
My assumption is that this cardinality is the maximum number of rows that will be returned by each condition....
No, cardinality is the estimation of the CBO how many rows will be returned in the operation. (technicaly always >= 1).
The cardinality is calculated either from the object statistics stored in data dictionary or by dynaming sampling (details here).
Dynamic sampling are more costly (as they are calculated in each parse) but can return much precise results.
So one possible workaround to get better estimation is to use dynamic sampling. Here small demo with level 10 (which is extrem and demo only as the whole table is scanned in parsing step; but it is not a problem with 779 rows table and the cardinatlity is exact)
create table tst as
select ltrim(to_char(rownum,'09999')) prp from dual connect by level <= 999999;
select count(*) from tst where prp like '%999%';
280
select count(*) from tst where regexp_instr(prp, '999', 1,1,0,'i' ) > 0;
280
Alter session set optimizer_dynamic_sampling=10;
EXPLAIN PLAN SET STATEMENT_ID = 'jara1' into plan_table FOR
select * from tst where prp like '%999%';
SELECT * FROM table(DBMS_XPLAN.DISPLAY('plan_table', 'jara1','ALL'));
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 280 | 1400 | 467 (2)| 00:00:06 |
|* 1 | TABLE ACCESS FULL| TST | 280 | 1400 | 467 (2)| 00:00:06 |
--------------------------------------------------------------------------
1 - filter("PRP" IS NOT NULL AND "PRP" LIKE '%999%')
EXPLAIN PLAN SET STATEMENT_ID = 'jara1' into plan_table FOR
select * from tst where regexp_instr(prp, '999', 1,1,0,'i' ) > 0;
SELECT * FROM table(DBMS_XPLAN.DISPLAY('plan_table', 'jara1','ALL'));
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 280 | 1400 | 479 (5)| 00:00:06 |
|* 1 | TABLE ACCESS FULL| TST | 280 | 1400 | 479 (5)| 00:00:06 |
--------------------------------------------------------------------------
1 - filter( REGEXP_INSTR ("PRP",'999',1,1,0,'i')>0)

What reason to use indexing of particular rows in a table?

There is example of using a function-based indexes in the documentation Concepts Oracle 11G:
A function-based index is also useful for indexing only specific rows
in a table. For example, the cust_valid column in the sh.customers
table has either I or A as a value. To index only the A rows, you
could write a function that returns a null value for any rows other
than the A rows.
I can imagine only this use case: the reducing size of index, by eliminating some rows by condition. Is there other use cases when this possibility is useful?
Let's take a look at function-based indexes:
SQL> create table tab1 as select object_name from all_objects;
Table created.
SQL> exec dbms_stats.gather_table_stats(user, 'TAB1');
PL/SQL procedure successfully completed.
SQL> set autotrace traceonly
SQL> select count(*) from tab1 where lower(object_name) = 'all_tables';
Execution Plan
----------------------------------------------------------
Plan hash value: 1117438016
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 19 | 18 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 19 | | |
|* 2 | TABLE ACCESS FULL| TAB1 | 181 | 3439 | 18 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(LOWER("OBJECT_NAME")='all_tables')
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
63 consistent gets
...
As you know, all the objects have unique names, but oracle has to analyze all 181 rows and performs 63 consistent gets (physical or logical block reads)
Let's create a function-based index:
SQL> create index tab1_obj_name_idx on tab1(lower(object_name));
Index created.
SQL> select count(*) from tab1 where lower(object_name) = 'all_tables';
Execution Plan
----------------------------------------------------------
Plan hash value: 707634933
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 17 | 1 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 17 | | |
|* 2 | INDEX RANGE SCAN| TAB1_OBJ_NAME_IDX | 181 | 3077 | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access(LOWER("OBJECT_NAME")='all_tables')
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
2 consistent gets
...
As you can see the cost cuts down (from 18 to 1) dramatically and there are only 2 consistent gets.
So function-based indexes can increase the performance of your application very well.

Resources