I'm doing a basic performance check using both Connect By and a user-defined function to get a parent value. It seems that using a user-defined function performs better than the Connect By query.
I would like to know if using the user-defined function is supposed to be better performing as compared to Connect By.
create table org ( pid number, cid number, type varchar2(10), name varchar2(30) );
alter table org add constraint org_pk primary key ( cid ); -- UPDATE#2
insert into org values (null,1,'MGT','OP');
insert into org values (1,2,'DEP','HR');
insert into org values (1,3,'DEP','IT');
insert into org values (3,4,'DIV','WEB');
insert into org values (3,5,'DIV','DB');
insert into org values (4,6,'SEC','HTML');
insert into org values (4,7,'SEC','JAVA');
create or replace function get_dep ( p_cid in number ) return number
is
l_pid number;
l_cid number;
l_type varchar2(30);
begin
select pid
, cid
, type
into l_pid
, l_cid
, l_type
from org
where cid = p_cid;
if ( l_type = 'MGT' ) then
return null;
elsif ( l_type = 'DEP' ) then
return l_cid;
else
return get_dep ( l_pid );
end if;
end;
/
select cid --correction
from org
where type = 'DEP'
start
with cid = 7
connect
by
prior pid = cid
and
prior type != 'DEP'
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 66 | 6 (17)| 00:00:01 |
|* 1 | FILTER | | | | | |
|* 2 | CONNECT BY NO FILTERING WITH START-WITH| | | | | |
| 3 | TABLE ACCESS FULL | ORG | 7 | 231 | 5 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------
select get_dep ( cid )
from org
where cid = 7;
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 5 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| ORG | 1 | 13 | 5 (0)| 00:00:01 |
--------------------------------------------------------------------------
UPDATE #1:
I updated the function to add logic that returns null if the type is MGT.
I also changed the queries to fetch all records in the table.
select cid, ( select cid
from org
where type = 'DEP'
start
with cid = m.cid
connect
by
prior pid = cid
and
prior type != 'DEP' ) dep
from org m;
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 7 | 91 | 10 (0)| 00:00:01 |
|* 1 | FILTER | | | | | |
|* 2 | CONNECT BY NO FILTERING WITH START-WITH| | | | | |
| 3 | TABLE ACCESS FULL | ORG | 7 | 231 | 5 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL | ORG | 7 | 91 | 5 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------
select cid, get_dep ( cid ) dep
from org;
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 7 | 91 | 5 (0)| 00:00:01 |
| 1 | TABLE ACCESS FULL| ORG | 7 | 91 | 5 (0)| 00:00:01 |
--------------------------------------------------------------------------
UPDATE #2: Added index as suggested. The explain plan improved on both but the query with the user-defined function still performs better based on the explain plan (unless I'm not interpreting the plan correctly).
select cid, ( select cid
from org
where type = 'DEP'
start
with cid = m.cid
connect
by
prior pid = cid
and
prior type != 'DEP' ) dep
from org m;
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 7 | 91 | 4 (0)| 00:00:01 |
|* 1 | FILTER | | | | | |
|* 2 | CONNECT BY WITH FILTERING | | | | | |
| 3 | TABLE ACCESS BY INDEX ROWID | ORG | 1 | 33 | 1 (0)| 00:00:01 |
|* 4 | INDEX UNIQUE SCAN | ORG_PK | 1 | | 0 (0)| 00:00:01 |
| 5 | NESTED LOOPS | | 1 | 53 | 2 (0)| 00:00:01 |
|* 6 | CONNECT BY PUMP | | | | | |
| 7 | TABLE ACCESS BY INDEX ROWID| ORG | 1 | 33 | 1 (0)| 00:00:01 |
|* 8 | INDEX UNIQUE SCAN | ORG_PK | 1 | | 0 (0)| 00:00:01 |
| 9 | INDEX FULL SCAN | ORG_PK | 7 | 91 | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------
select cid, get_dep ( cid ) dep
from org;
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 7 | 91 | 1 (0)| 00:00:01 |
| 1 | INDEX FULL SCAN | ORG_PK | 7 | 91 | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------
Appreciate any feedback.
First of all, in your example, SQL and PL/SQL return different results.
SQL> select pid
2 from org
3 where type = 'DEP'
4 start
5 with cid = 7
6 connect
7 by
8 prior pid = cid
9 and
10 prior type != 'DEP';
PID
----------
1
SQL>
SQL> select get_dep ( cid )
2 from org
3 where cid = 7;
GET_DEP(CID)
------------
3
Secondly, it does not really make sense to compare different approaches on such extremely small data volumes.
Let's assume we have a tree with depth 999 999 and want to find a root for a given node.
In my example there is only one tree (which is actually a list since each parent has one child) therefore root is the same for all nodes.
The important point is that the deeper a given ID sits in the tree, the longer the execution time.
create table org0 ( pid number, cid number, name varchar2(30) );
insert into org0
select rownum, rownum+1, 'name' || rpad(rownum,25,'#')
from dual
connect by rownum < 1e6;
alter table org0 add constraint org0_pk primary key ( cid );
Function for returning the root
create or replace function get_id(p_cid in number) return number is
l_pid number;
begin
select pid into l_pid from org0 where cid = p_cid;
return get_id(l_pid);
exception
when no_data_found then
return p_cid;
end get_id;
/
Testing
SQL
SQL> select pid id
2 from org0
3 where connect_by_isleaf = 1
4 start with cid = 10000
5 connect by prior pid = cid;
ID
----------
1
Elapsed: 00:00:00.07
SQL>
SQL> select pid id
2 from org0
3 where connect_by_isleaf = 1
4 start with cid = 100000
5 connect by prior pid = cid;
ID
----------
1
Elapsed: 00:00:00.55
SQL>
SQL> select pid id
2 from org0
3 where connect_by_isleaf = 1
4 start with cid = 1000000
5 connect by prior pid = cid;
ID
----------
1
Elapsed: 00:00:05.79
PL/SQL
SQL> select get_id(10000) id from dual;
ID
----------
1
Elapsed: 00:00:00.15
SQL> select get_id(100000) id from dual;
ID
----------
1
Elapsed: 00:00:01.47
SQL> select get_id(1000000) id from dual;
ID
----------
1
Elapsed: 00:00:14.83
As you can see, PL/SQL is roughly 2 to 2.7 times slower in these runs.
In some specific cases PL/SQL may be faster though (not for your task).
You can read about fine-grained performance analysis and tools like dbms_hprof in the book Oracle SQL Revealed, chapter "When PL/SQL Is Better Than Vanilla SQL".
Hierarchical queries often lead to suboptimal performance. Frequent use of PL/SQL functions additionally introduces the problem of context switching.
One possible approach to get performance of a hierarchical query comparable with a single-row index access is to define a materialized view that pre-calculates the query.
I'm using the identical data from Dr Y Wit's answer.
create materialized view mv_org as
select
CID, PID, NAME, CONNECT_BY_ROOT PID ROOT_PID
from org0
start with pid in (
select pid from org0
MINUS
select cid from org0
)
connect by prior cid = pid;
Note that the MV contains the original data and adds the column ROOT_PID, which is the pre-calculated root key.
CID PID NAME ROOT_PID
---------- ---------- ------------------------------ ----------
2 1 name1######################## 1
3 2 name2######################## 1
4 3 name3######################## 1
....
The performance of the queries is fine, as there is no need to run a hierarchical query any more.
SQL> select pid id
2 from org0
3 where connect_by_isleaf = 1
4 start with cid = 1000000
5 connect by prior pid = cid;
ID
----------
1
Elapsed: 00:00:05.07
SQL> select root_pid from mv_org where cid = 1000000;
ROOT_PID
----------
1
Elapsed: 00:00:00.01
So if you can apply the changes to the hierarchical table in regular windows (say once per day or month) and refresh the MV afterwards, you localize the complexity (and the performance load) in this refresh, and your regular queries are fast.
SQL> exec DBMS_MVIEW.REFRESH ('mv_org','c');
Elapsed: 00:00:27.58
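A minimal sketch of how such a periodic refresh could be automated. The index name mv_org_cid_idx, the job name REFRESH_MV_ORG_JOB and the nightly 02:00 schedule are assumptions chosen for illustration; they are not part of the setup above.

```sql
-- Hypothetical index so that the lookup by cid can be a single index access.
create index mv_org_cid_idx on mv_org (cid);

-- Hypothetical nightly job that runs the same complete refresh as shown above.
begin
  dbms_scheduler.create_job(
    job_name        => 'REFRESH_MV_ORG_JOB',
    job_type        => 'PLSQL_BLOCK',
    job_action      => q'[begin dbms_mview.refresh('MV_ORG', 'C'); end;]',
    start_date      => systimestamp,
    repeat_interval => 'FREQ=DAILY; BYHOUR=2; BYMINUTE=0; BYSECOND=0',
    enabled         => true
  );
end;
/
```

Between two refreshes the MV returns the hierarchy as of the last refresh, so this only fits when slightly stale results are acceptable.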
In the example below Oracle's optimizer's estimated rows is incorrect by two orders of magnitude. How do I improve the estimated rows?
Table A has rows with numbers 1 through 1,000 for each of the 10 letters A through J.
Table C has 100 copies of table A.
So, table A has a cardinality of 10K and table C has a cardinality of 1M.
A given single-valued predicate on the number in table A will yield 1/1000 of the rows in table A (same for table C).
A given single-valued predicate on the letter in table A will yield 1/10 of the rows in table A (same for table C).
Setup script.
drop table C;
drop table A;
create table A
( num NUMBER
, val VARCHAR2(3 byte)
, pad CHAR(40 byte)
)
;
insert /*+ append enable_parallel_dml parallel (auto) */
into A (num, val, pad)
select mod(level-1, 1000) +1
, chr(mod(ceil(level/1000) - 1, 10) + ascii('A'))
, ' '
from dual
connect by level <= 10*1000
;
create table C
( id NUMBER
, num NUMBER
, val VARCHAR2(3 byte)
, pad CHAR(40 byte)
)
;
insert /*+ append enable_parallel_dml parallel (auto) */
into C (id, num, val, pad)
with
"D1" as
( select /*+ materialize */ null from dual connect by level <= 100 --320
)
, "D" as
( select /*+ materialize */
level rn
, mod(level-1, 1000) + 1 num
, chr(mod(ceil(level/1000) - 1, 10) + ascii('A')) val
, ' ' pad
from dual
connect by level <= 10*1000
order by 1 offset 0 rows
)
select rownum id
, num num
, val val
, pad pad
from "D1", "D"
;
commit;
exec dbms_stats.gather_table_stats(OwnName => null, TabName => 'A', cascade => true);
exec dbms_stats.gather_table_stats(OwnName => null, TabName => 'C', cascade => true);
Consider the explain plan to the following query.
select *
from A
join C
on A.num = C.num
and A.val = C.val
where A.num = 1
and A.val = 'A'
;
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100 | 9900 | 2209 (1)| 00:00:01 |
|* 1 | HASH JOIN | | 100 | 9900 | 2209 (1)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| A | 1 | 47 | 23 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| C | 100 | 5200 | 2185 (1)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."NUM"="C"."NUM" AND "A"."VAL"="C"."VAL")
2 - filter("A"."NUM"=1 AND "A"."VAL"='A')
3 - filter("C"."NUM"=1 AND "C"."VAL"='A')
The row cardinality of each step makes sense to me.
ID=2 --> (1/1,000) * (1/10) * 10,000 = 1
ID=3 --> (1/1,000) * (1/10) * 1,000,000 = 100
ID=1 --> 100 is correct. Predicates in ID=2 and ID=3 are the same, every row from ID=2 will have one and only one match in the row source from ID=3.
Now consider the explain plan to the slightly modified query below.
select *
from A
join C
on A.num = C.num
and A.val = C.val
where A.num in(1,2)
and A.val = 'A'
;
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 198 | 2209 (1)| 00:00:01 |
|* 1 | HASH JOIN | | 2 | 198 | 2209 (1)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| A | 2 | 94 | 23 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| C | 200 | 10400 | 2185 (1)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."NUM"="C"."NUM" AND "A"."VAL"="C"."VAL")
2 - filter("A"."VAL"='A' AND ("A"."NUM"=1 OR "A"."NUM"=2))
3 - filter("C"."VAL"='A' AND ("C"."NUM"=1 OR "C"."NUM"=2))
The row cardinality of each step ID=2 and ID=3 makes sense to me, but now ID=1 is incorrect by two orders of magnitude.
ID=2 --> (2/1,000) * (1/10) * 10,000 = 2
ID=3 --> (2/1,000) * (1/10) * 1,000,000 = 200
ID=1 --> The optimizer estimates 2 rows, but the join actually returns 200, so the estimate is off by two orders of magnitude.
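To put numbers on the mismatch, one option is to run the statement with runtime statistics and compare E-Rows with A-Rows per plan line. This is a sketch, assuming the session is allowed to read the V$ views that dbms_xplan.display_cursor needs. It uses count(*) to keep the output short, which may lead the optimizer to a different plan than select *; fetch all rows of the original query instead if the plans must match.

```sql
-- Collect actual row counts per plan line for this execution.
-- With the data above, each (num, val) pair exists once in A and 100 times in C,
-- so two matching rows in A give a count of 200.
select /*+ gather_plan_statistics */ count(*)
from A
join C
on A.num = C.num
and A.val = C.val
where A.num in (1,2)
and A.val = 'A'
;

-- Show estimated (E-Rows) next to actual (A-Rows) for the last statement of this session.
select *
from table(dbms_xplan.display_cursor(format => 'ALLSTATS LAST'));
```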
Adding unique and foreign constraints and extended statistics did not improve the estimated row counts.
create unique index IU_A on A (num, val);
alter table A add constraint UK_A unique (num, val) rely using index IU_A enable validate;
alter table C add constraint R_C foreign key (num, val) references A (num, val) rely enable validate;
create index IR_C on C (num, val);
select dbms_stats.create_extended_stats(null,'A','(num, val)') from dual;
select dbms_stats.create_extended_stats(null,'C','(num, val)') from dual;
exec dbms_stats.gather_table_stats(OwnName => null, TabName => 'A', cascade => true);
exec dbms_stats.gather_table_stats(OwnName => null, TabName => 'C', cascade => true);
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 198 | 10 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 2 | 198 | 10 (0)| 00:00:01 |
| 3 | INLIST ITERATOR | | | | | |
| 4 | TABLE ACCESS BY INDEX ROWID| A | 2 | 94 | 5 (0)| 00:00:01 |
|* 5 | INDEX UNIQUE SCAN | IU_A | 2 | | 3 (0)| 00:00:01 |
|* 6 | INDEX RANGE SCAN | IR_C | 1 | | 2 (0)| 00:00:01 |
| 7 | TABLE ACCESS BY INDEX ROWID | C | 1 | 52 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access(("A"."NUM"=1 OR "A"."NUM"=2) AND "A"."VAL"='A')
6 - access("A"."NUM"="C"."NUM" AND "C"."VAL"='A')
filter("C"."NUM"=1 OR "C"."NUM"=2)
What do I need to do to make the estimated rows better match reality?
Using Oracle Enterprise Edition 19c.
Thanks in advance.
Edit
After ensuring the most recent optimizer_features_enable was used and modifying one of the predicates, we still have an explain plan whose estimated row count is short by two orders of magnitude.
ID=6 ought to have an estimated row count of 100. It seems the predicate selectivity is applied twice: once for the access and again for the filter.
select /*+ optimizer_features_enable('19.1.0') */
*
from A
join C
on A.num = C.num
and A.val = C.val
where A.num in(1,2)
and A.val in('A','B')
;
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4 | 396 | 16 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 4 | 396 | 16 (0)| 00:00:01 |
| 2 | NESTED LOOPS | | 4 | 396 | 16 (0)| 00:00:01 |
| 3 | INLIST ITERATOR | | | | | |
| 4 | TABLE ACCESS BY INDEX ROWID BATCHED| A | 4 | 188 | 7 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | IU_A | 4 | | 3 (0)| 00:00:01 |
|* 6 | INDEX RANGE SCAN | IR_C | 1 | | 2 (0)| 00:00:01 |
| 7 | TABLE ACCESS BY INDEX ROWID | C | 1 | 52 | 3 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("A"."NUM"=1 OR "A"."NUM"=2)
filter("A"."VAL"='A' OR "A"."VAL"='B')
6 - access("A"."NUM"="C"."NUM" AND "A"."VAL"="C"."VAL")
filter(("C"."NUM"=1 OR "C"."NUM"=2) AND ("C"."VAL"='A' OR "C"."VAL"='B'))
I want to write a simple query in PL/SQL.
Please suggest how to make it execute faster (ideally around 0.01 seconds on 1,000,000 rows).
first query:
select datetime
from product
order by datetime desc
FETCH NEXT 1 ROWS ONLY
Result of first query will be used in second query.
select *
from traceability
where endtime = [first query]
Please help me implement that logic in PL/SQL.
Thank you.
Please find below an example with sample data.
create table product as
select rownum product_id, DATE'2020-01-01' + NUMTODSINTERVAL(rownum-1, 'second') datetime
from dual connect by level <= 10;
create index product_idx on product(datetime);
create table traceability as
select
rownum id, DATE'2020-01-01' + NUMTODSINTERVAL(rownum-1, 'second') endtime
from dual connect by level <= 10;
create index traceability_idx on traceability(endtime);
Your query should be as follows
select *
from traceability
where endtime =
(select max(datetime)
from product );
The query will lead to this execution plan. See here how to get the execution plan.
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 22 | 3 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID | TRACEABILITY | 1 | 22 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | TRACEABILITY_IDX | 1 | | 1 (0)| 00:00:01 |
| 3 | SORT AGGREGATE | | 1 | 9 | | |
| 4 | INDEX FULL SCAN (MIN/MAX)| PRODUCT_IDX | 1 | 9 | 1 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ENDTIME"= (SELECT MAX("DATETIME") FROM "PRODUCT" "PRODUCT"))
Note that if the table TRACEABILITY contains a large number of rows with the maximum timestamp, you may instead see a FULL TABLE SCAN at line 1.
The same applies to the PRODUCT table at line 4.
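Since the question asks for PL/SQL, here is a minimal sketch of the same logic in an anonymous block, assuming the sample tables above. The variable name l_max and the dbms_output formatting are illustrative choices only.

```sql
set serveroutput on

declare
  l_max product.datetime%type;  -- latest DATETIME found in PRODUCT
begin
  -- MAX() on an indexed column can typically be answered by an
  -- INDEX FULL SCAN (MIN/MAX), as in the plan above.
  select max(datetime) into l_max from product;

  -- Fetch the matching TRACEABILITY rows and print them one by one.
  for r in (select id, endtime from traceability where endtime = l_max) loop
    dbms_output.put_line(r.id || ' ' || to_char(r.endtime, 'YYYY-MM-DD HH24:MI:SS'));
  end loop;
end;
/
```

If the rows are only consumed by SQL, the single statement is usually the simpler choice; the block form is mainly useful when each row needs procedural processing.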
I have a procedure in which a table's columns are filled using SUM and NVL functions on other tables' columns. These update queries are slow, which makes the overall procedure slow. One such update query is below:
UPDATE t_final wp
SET PCT =
(
SELECT SUM(NVL(pct,0))
FROM t_overall
WHERE rid = 9
AND rtype = 1
AND sid = 'r12'
AND pid = 21
AND mid = wp.mid
)
WHERE rid = 9 AND rtype = 1 AND sid = 'r12' AND pid = 21;
Neither t_overall nor t_final has any indexes, as both tables receive multiple updates in the overall procedure. The number of records is around 8500 for t_final and around 13000 for t_overall. Is there any way to write the above query in a more optimized way?
Edit 1: Here SUM(NVL(pct,0)) first replaces null with 0 in the 'pct' column of table t_overall, then adds up all pct values and updates the pct column of table t_final depending on the criteria.
Explain plan returns below:
OPERATION OBJECT_NAME CARDINALITY COST
UPDATE STATEMENT 6 424
UPDATE T_FINAL
TABLE ACCESS(FULL) T_FINAL 6 238
. Filter Predicates
. AND
. RTYPE=6
. SID='R12'
. RID=9
. PID=21
SORT(AGGREGATE) 1
TABLE ACCESS(FULL) T_OVERALL 1 30
Filter Predicates
AND
MID=:B1
RTYPE=6
SID='R12'
RID=9
PID=21
The number of updated rows is around 2200.
Edit 2: I have run update query with hint /*+ gather_plan_statistics */ as below:
ALTER session SET statistics_level=ALL;
UPDATE /*+ gather_plan_statistics */ t_final wp
SET PCT =
(
SELECT SUM(NVL(pct,0))
FROM t_overall
WHERE rid = 9
AND rtype = 1
AND sid = 'r12'
AND pid = 21
AND mid = wp.mid
)
WHERE rid = 9 AND rtype = 1 AND sid = 'r12' AND pid = 21;
select * from
table (dbms_xplan.display_cursor (format=>'ALLSTATS LAST'));
The result is:
SQL_ID gypnfv5nzurb0, child number 1
-------------------------------------
select child_number from v$sql where sql_id = :1 order by
child_number
Plan hash value: 4252345203
---------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | OMem | 1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 2 |00:00:00.01 | | | |
| 1 | SORT ORDER BY | | 1 | 1 | 2 |00:00:00.01 | 2048 | 2048 | 2048 (0)|
|* 2 | FIXED TABLE FIXED INDEX| X$KGLCURSOR_CHILD (ind:2) | 1 | 1 | 2 |00:00:00.01 | | | |
---------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(("KGLOBT03"=:1 AND "INST_ID"=USERENV('INSTANCE')))
Thank you.
You did not provide enough information for a definite diagnosis, so I can only give you hints on how to troubleshoot your query.
Here is my setup simulating your data
create table t_final as
select rownum mid, 8 + mod(rownum,4) rid, 1 rtype, 'r12' sid, 21 pid, 0 pct from dual
connect by level <= 8800;
drop table T_OVERALL;
create table T_OVERALL as
select mod(rownum,8800) mid, 8 + mod(rownum,4) rid, 1 rtype, 'r12' sid, 21 pid, rownum pct from dual
connect by level <= 13000;
Now I run the query activating the statistics gathering to see what the query is doing:
SQL> UPDATE /*+ gather_plan_statistics */ t_final wp
2 SET PCT =
3 (
4 SELECT SUM(NVL(pct,0))
5 FROM t_overall
6 WHERE rid = 9
7 AND rtype = 1
8 AND sid = 'r12'
9 AND pid = 21
10 AND mid = wp.mid
11 )
12 WHERE rid = 9 AND rtype = 1 AND sid = 'r12' AND pid = 21;
2200 rows updated.
Elapsed: 00:00:00.97
So the elapsed time is nearly one second, which is slow if you have a lot of such updates. To see the cause, we display the cursor and its statistics (this is possible because of the hint /*+ gather_plan_statistics */)
SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------------------
SQL_ID 3ctaz5gvksb54, child number 0
-------------------------------------
UPDATE /*+ gather_plan_statistics */ t_final wp SET PCT = (
SELECT SUM(NVL(pct,0)) FROM t_overall WHERE rid
= 9 AND rtype = 1 AND sid = 'r12' AND pid =
21 AND mid = wp.mid ) WHERE rid = 9 AND rtype =
1 AND sid = 'r12' AND pid = 21
Plan hash value: 1255260726
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | 1 | | 0 |00:00:00.96 | 116K|
| 1 | UPDATE | T_FINAL | 1 | | 0 |00:00:00.96 | 116K|
|* 2 | TABLE ACCESS FULL | T_FINAL | 1 | 2200 | 2200 |00:00:00.01 | 33 |
| 3 | SORT AGGREGATE | | 2200 | 1 | 2200 |00:00:00.92 | 112K|
|* 4 | TABLE ACCESS FULL| T_OVERALL | 2200 | 33 | 3250 |00:00:00.85 | 112K|
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(("RID"=9 AND "RTYPE"=1 AND "PID"=21 AND "SID"='r12'))
4 - filter(("RID"=9 AND "RTYPE"=1 AND "PID"=21 AND "MID"=:B1 AND "SID"='r12'))
So you see the main problem is the FULL TABLE SCAN on T_OVERALL, which was executed 2200 times (column Starts, line 4).
A remedy could be an index based on the filter predicate of line 4:
create index T_OVERALL_IDX on T_OVERALL(mid, rid, rtype, sid, pid);
On the same data now I got:
Elapsed: 00:00:00.05
with the changed plan now using 2200 INDEX RANGE SCANs
---------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | 1 | | 0 |00:00:00.05 | 10272 |
| 1 | UPDATE | T_FINAL | 1 | | 0 |00:00:00.05 | 10272 |
|* 2 | TABLE ACCESS FULL | T_FINAL | 1 | 2200 | 2200 |00:00:00.01 | 33 |
| 3 | SORT AGGREGATE | | 2200 | 1 | 2200 |00:00:00.01 | 5755 |
| 4 | TABLE ACCESS BY INDEX ROWID| T_OVERALL | 2200 | 33 | 3250 |00:00:00.01 | 5755 |
|* 5 | INDEX RANGE SCAN | T_OVERALL_IDX | 2200 | 1 | 3250 |00:00:00.01 | 2505 |
---------------------------------------------------------------------------------------------------------
Simply repeat the same approach with your data; if you observe a different behavior, feel free to post it.
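As an alternative that does not depend on an index, the correlated subquery can be replaced by a single aggregation that is joined back to t_final. This is only a sketch, and it differs from the original UPDATE in one respect: rows of t_final without any matching row in t_overall are left unchanged here, whereas the original statement sets their PCT to NULL.

```sql
merge into t_final wp
using (
  -- Aggregate t_overall once instead of once per updated row.
  select mid, sum(nvl(pct, 0)) as pct
  from t_overall
  where rid = 9 and rtype = 1 and sid = 'r12' and pid = 21
  group by mid
) o
on (    wp.mid   = o.mid
    and wp.rid   = 9
    and wp.rtype = 1
    and wp.sid   = 'r12'
    and wp.pid   = 21)
when matched then
  update set wp.pct = o.pct;
```

Whether this beats the indexed UPDATE depends on the data, so it is worth checking with gather_plan_statistics in the same way as above.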
I have a stored procedure with multiple mandatory parameters and a SELECT statement inside it which has multiple conditions in its WHERE clause, like below:
SELECT *
FROM TABLE
WHERE column_1 = param_1
AND column_2 = param_2
AND column_3 = param_3;
This query works fine and it uses the indexes on the table correctly. But a change in requirements means adjusting the procedure so that it can be called with fewer parameters, perhaps just the first two, and we want this to work with minimal changes to the stored procedure.
One of the suggestions I've made was to use a DECODE function to treat each possibly NULL parameter, like this:
SELECT *
FROM TABLE
WHERE column_1 = param_1
AND column_2 = param_2
AND column_3 = DECODE(param_3, null, column_3, param_3);
And this way, I considered that because the function is not applied on the table column, the index will still be used. I have made some tests and the query still works and uses the indexes even in this situation.
But I'm still getting contradicted by our architect (with no other explanations), that the query will not use the index because I'm using a function in the WHERE clause.
I'm not sure if my change is enough proof that it will always use the index, or if there are other situations which I should check and in which the index might not be used because of the DECODE function.
Any help / suggestions / information will be very much appreciated.
You are right. Test it and prove it.
Setup
SQL> CREATE TABLE t AS SELECT LEVEL id FROM dual CONNECT BY LEVEL <=10;
Table created.
SQL>
SQL> CREATE INDEX id_indx ON t(ID);
Index created.
Test case
Normal query, without any function:
SQL> set autot on explain
SQL>
SQL> SELECT * FROM t WHERE ID = 5;
ID
----------
5
Execution Plan
----------------------------------------------------------
Plan hash value: 1629656632
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 1 (0)| 00:00:01 |
|* 1 | INDEX RANGE SCAN| ID_INDX | 1 | 3 | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("ID"=5)
Using DECODE on the value(not on column):
SQL> SELECT * FROM t WHERE ID = decode(5, NULL, 3, 5);
ID
----------
5
Execution Plan
----------------------------------------------------------
Plan hash value: 1629656632
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 1 (0)| 00:00:01 |
|* 1 | INDEX RANGE SCAN| ID_INDX | 1 | 3 | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("ID"=5)
Using NVL on the value(not on column):
SQL> SELECT * FROM t WHERE ID = nvl(5, 3);
ID
----------
5
Execution Plan
----------------------------------------------------------
Plan hash value: 1629656632
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 1 (0)| 00:00:01 |
|* 1 | INDEX RANGE SCAN| ID_INDX | 1 | 3 | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("ID"=5)
In all three cases above, the index is used.
DECODE on the column:
SQL> SELECT * FROM t WHERE decode(ID, NULL, 3, 5) = 5;
ID
----------
1
2
3
4
5
6
7
8
9
10
10 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 1 | 3 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(DECODE(TO_CHAR("ID"),NULL,3,5)=5)
NVL on the column:
SQL> SELECT * FROM t WHERE nvl(ID, 3) = 3;
ID
----------
3
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 1 | 3 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(NVL("ID",3)=3)
SQL>
As expected, the index is not used, because you are applying a function to a column that has only a regular index. You need a function-based index.
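A minimal sketch of such a function-based index for the NVL case above. The index name id_nvl_indx is made up for this example, and on a 10-row table the optimizer may still prefer a full scan, so the resulting plan is not guaranteed.

```sql
-- Index the expression itself, so that the predicate NVL(id, 3) = 3
-- has an index access path available.
CREATE INDEX id_nvl_indx ON t (NVL(id, 3));

-- Refresh statistics so the optimizer knows about the new index.
EXEC dbms_stats.gather_table_stats(OwnName => null, TabName => 'T', cascade => true);

-- The predicate must match the indexed expression for the index to be considered.
SELECT * FROM t WHERE nvl(ID, 3) = 3;
```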
So, you are right, you don't have to worry about index usage when you are not applying the function on the column, but on the parameter value.
I executed a SQL statement and got a confusing result. I cannot understand how this output is produced.
My employee table is: Emp_Id is primary key and dept_no is a foreign key to some other table.
    EMP_ID EMP_NAME                DEPT_NO MGR_NAME        MGR_NO
---------- -------------------- ---------- ---------- -----------
       111 Anish                       121 Tanuj             1123
       112 Aman                        122 Jasmeet           1234
      1123 Tanuj                       122 Vipul              122
      1234 Jasmeet                     122 Anish              111
       122 Vipul                       123 Aman               112
       100 Chetan                      123 Anoop              666
       101 Antal                           Aman
      1011 Anjali                      126
      1111 Angelina                    127
My dep1 table is:
DEPT_ID DEPT_NAME
---------- -------------
121 CSE
122 ECE
123 MEC
And the two tables are not linked at all.
The SQL Query is:
SQL> select emp_name
from employee
where dept_no IN (select dept_no from dep1 where dept_name='MEC');
And the output is:
EMP_NAME
--------------------
Anish
Aman
Tanuj
Jasmeet
Vipul
Chetan
Anjali
Angelina
8 rows selected.
And if I change the where condition to dept_name='me' it returns no rows.
Can someone explain why the statement does not raise an error, since dept_no is not a column of the dep1 table, and how this output is being generated?
If you run this query:
select emp_name
from employee
where dept_no IN (select t.dept_no from dep1 t where dept_name='MEC');
you will see the error. In your original query, the unqualified dept_no comes from the employee table (not from dep1), so the subquery simply returns the current employee row's own dept_no. Rows where dept_no is null are not returned, because null never compares equal to anything. If you change dept_name to a value that is not in dep1, the subquery returns no rows, and dept_no cannot be in an empty set.
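The name resolution described above can be reproduced with a minimal sketch using Python's built-in sqlite3 module. This is an assumption made for runnability: SQLite is not Oracle, but it resolves an unqualified column in a subquery against the outer query in the same way when the inner table has no such column. The three sample rows are a cut-down version of the employee data, and the alias `d` is hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; tables are a cut-down copy of the question's.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT, dept_no INTEGER);
    CREATE TABLE dep1 (dept_id INTEGER, dept_name TEXT);
    INSERT INTO employee VALUES (111, 'Anish', 121), (122, 'Vipul', 123), (101, 'Antal', NULL);
    INSERT INTO dep1 VALUES (121, 'CSE'), (122, 'ECE'), (123, 'MEC');
""")

# Unqualified dept_no: dep1 has no such column, so it binds to employee.dept_no.
unqualified = conn.execute("""
    SELECT emp_name FROM employee
    WHERE dept_no IN (SELECT dept_no FROM dep1 WHERE dept_name = 'MEC')
    ORDER BY emp_id
""").fetchall()

# A dept_name that matches nothing: the subquery is empty, so no row qualifies.
no_match = conn.execute("""
    SELECT emp_name FROM employee
    WHERE dept_no IN (SELECT dept_no FROM dep1 WHERE dept_name = 'me')
    ORDER BY emp_id
""").fetchall()

# Qualifying the column with the inner table's alias exposes the mistake.
qualified_error = None
try:
    conn.execute("""
        SELECT emp_name FROM employee
        WHERE dept_no IN (SELECT d.dept_no FROM dep1 d WHERE dept_name = 'MEC')
    """)
except sqlite3.OperationalError as exc:
    qualified_error = str(exc)

print(unqualified)      # every employee with a non-null dept_no
print(no_match)         # empty list
print(qualified_error)  # the engine's "no such column" message
```

Qualifying every column in a subquery with its table alias is a cheap habit that turns this kind of silent mistake into an immediate error.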
In your query, the condition
..where dept_no IN (select dept_no ...);
is evaluated much like an EXISTS condition. Oracle does not raise an error because the unqualified dept_no is resolved against the outer table. The plan for the following example shows the EXISTS filter:
CREATE TABLE my_test(ID INT);
CREATE TABLE my_new_test ( new_ID INT);
EXPLAIN PLAN FOR
select * from my_test where id in( select id from my_new_test);
select * from table(dbms_xplan.display);
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 4 (0)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | TABLE ACCESS FULL | MY_TEST | 1 | 13 | 2 (0)| 00:00:01 |
|* 3 | FILTER | | | | | |
| 4 | TABLE ACCESS FULL| MY_NEW_TEST | 1 | | 2 (0)| 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( EXISTS (SELECT 0 FROM "MY_NEW_TEST" "MY_NEW_TEST" WHERE
:B1=:B2))
3 - filter(:B1=:B2)
Note
-----
- dynamic sampling used for this statement (level=2)
If you generate the plan using a valid column of the subquery table (here new_id), a normal access predicate is used:
1 - ACCESS("ID"="NEW_ID")
The following, however, causes an error:
EXPLAIN PLAN FOR
select * from my_test where id in( select some_thing from my_new_test);
SQL Error: ORA-00904: "SOME_THING": invalid identifier
Let me try to answer it.
Oracle uses the optimizer to choose the execution plan, and it may rewrite your query into a form it considers better. IN and EXISTS are often interchangeable, and which one performs better depends on the data and the available indexes (the often-quoted claim that EXISTS ends in a full table scan while IN uses an index is only a rule of thumb).
Let me take your case. Below is the explain plan for your query:
Plan hash value: 3333342911
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9 | 225 | 6 (0)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | TABLE ACCESS FULL | EMPLOYEE | 9 | 225 | 3 (0)| 00:00:01 |
|* 3 | FILTER | | | | | |
|* 4 | TABLE ACCESS FULL| DEPARTMENT | 1 | 12 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( EXISTS (SELECT 0 FROM "DEPARTMENT" "DEPARTMENT" WHERE
:B1=:B2 AND "DEPT_NAME"='MEC'))
3 - filter(:B1=:B2)
4 - filter("DEPT_NAME"='MEC')
Note
-----
- dynamic sampling used for this statement (level=2)
This explain plan clearly shows that the query is rewritten to use EXISTS, and that it is equivalent to:
select emp_name from employee where exists (select 0 from department where dept_name = 'MEC' and dept_no = dept_no);
The above query is valid, and the results you are getting are correct for it.
The bind variables :B1 and :B2 are both dept_no from the employee table, so the comparison is true for every row whose dept_no is not null.
See the "IN vs EXISTS in Oracle" discussion to learn more about IN and EXISTS.
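As a rough check of the equivalence claimed above, the sketch below runs the IN form, the EXISTS form, and the corrected query side by side. It uses Python's built-in sqlite3 module as a stand-in for Oracle, which is an assumption made for runnability, and a four-row subset of the employee data. It says nothing about how Oracle's optimizer actually transforms the statement; it only compares result sets.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; data is a four-row subset of the question's.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT, dept_no INTEGER);
    CREATE TABLE department (dept_id INTEGER, dept_name TEXT);
    INSERT INTO employee VALUES
        (111, 'Anish', 121), (122, 'Vipul', 123), (100, 'Chetan', 123), (101, 'Antal', NULL);
    INSERT INTO department VALUES (121, 'CSE'), (122, 'ECE'), (123, 'MEC');
""")

def names(sql):
    """Run a query and return its rows as a list of tuples."""
    return conn.execute(sql).fetchall()

# Original query: dept_no inside the subquery binds to employee.dept_no.
in_form = names("""
    SELECT emp_name FROM employee
    WHERE dept_no IN (SELECT dept_no FROM department WHERE dept_name = 'MEC')
    ORDER BY emp_id
""")

# EXISTS form from the answer: both sides of dept_no = dept_no are the outer column.
exists_form = names("""
    SELECT emp_name FROM employee
    WHERE EXISTS (SELECT 0 FROM department
                  WHERE dept_name = 'MEC' AND dept_no = dept_no)
    ORDER BY emp_id
""")

# Corrected query: compare against the column that really exists in department.
corrected = names("""
    SELECT emp_name FROM employee
    WHERE dept_no IN (SELECT dept_id FROM department WHERE dept_name = 'MEC')
    ORDER BY emp_id
""")

print(in_form)
print(exists_form)
print(corrected)
```

The first two result sets contain every employee with a non-null dept_no, while the corrected query returns only the employees in the MEC department.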
The execution plan is completely different if you use the correct column name. Below are the query and its plan.
Query:
select emp_name from employee where dept_no IN (select dept_id from department where dept_name='MEC');
Explain Plan:
Plan hash value: 3817251802
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 100 | 7 (15)| 00:00:01 |
|* 1 | HASH JOIN SEMI | | 2 | 100 | 7 (15)| 00:00:01 |
| 2 | TABLE ACCESS FULL| EMPLOYEE | 9 | 225 | 3 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| DEPARTMENT | 1 | 25 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("DEPT_NO"="DEPT_ID")
3 - filter("DEPT_NAME"='MEC')
Note
-----
- dynamic sampling used for this statement (level=2)
Here Oracle decides that a hash semi-join, with a filter on DEPT_NAME, is the better way to get the required rows.
This behaviour depends on the Oracle query parser and optimizer.
How about a join?
select a.emp_name
from employee a
join dep1 b
on a.dept_no = b.dept_id
where b.dept_name = 'MEC'