Understanding Multi-Row Subqueries in Oracle 10g

I executed a SQL statement and came across a mess; I cannot understand how this output is produced.
My employee table is below. Emp_Id is the primary key and dept_no is a foreign key to another table.
EMP_ID EMP_NAME DEPT_NO MGR_NAME MGR_NO
---------- -------------------- ---------- ---------- -----------
111 Anish 121 Tanuj 1123
112 Aman 122 Jasmeet 1234
1123 Tanuj 122 Vipul 122
1234 Jasmeet 122 Anish 111
122 Vipul 123 Aman 112
100 Chetan 123 Anoop 666
101 Antal Aman
1011 Anjali 126
1111 Angelina 127
My dep1 table is:
DEPT_ID DEPT_NAME
---------- -------------
121 CSE
122 ECE
123 MEC
And the two tables are not linked at all.
The SQL Query is:
SQL> select emp_name
from employee
where dept_no IN (select dept_no from dep1 where dept_name='MEC');
And the output is:
EMP_NAME
--------------------
Anish
Aman
Tanuj
Jasmeet
Vipul
Chetan
Anjali
Angelina
8 rows selected.
And if I change the where condition to dept_name='me', it returns no rows.
Can someone explain why the execution does not generate an error, given that dept_no is not a column of the dep1 table, and how the output is being generated?

If you run this query:
select emp_name
from employee
where dept_no IN (select t.dept_no from dep1 t where dept_name='MEC');
you will see the error, because in your original query dept_no comes from the employee table (not from the dep1 table). When dept_no is null, no result comes back for that row; and if you change dept_name to something that is not in the dep1 table, it is clear the subquery returns nothing, so dept_no can't be IN an empty set.
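This scoping rule is standard SQL, not Oracle-specific, so it can be reproduced in any engine. A minimal sketch in Python's sqlite3 (a cut-down, hypothetical version of the question's tables), showing that the unqualified dept_no binds to the outer employee table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE employee (emp_id INT, emp_name TEXT, dept_no INT);
    CREATE TABLE dep1 (dept_id INT, dept_name TEXT);
    INSERT INTO employee VALUES (111, 'Anish', 121), (1011, 'Anjali', 126),
                                (101, 'Antal', NULL);
    INSERT INTO dep1 VALUES (121, 'CSE'), (123, 'MEC');
""")

# dep1 has no dept_no column, so the inner dept_no silently resolves to
# employee.dept_no -- the subquery becomes correlated and the predicate
# degenerates to "dept_no = dept_no", true for every non-NULL dept_no.
rows = con.execute("""
    SELECT emp_name FROM employee
    WHERE dept_no IN (SELECT dept_no FROM dep1 WHERE dept_name = 'MEC')
""").fetchall()
print(rows)  # every employee with a non-NULL dept_no; Antal is filtered out

# With a dept_name that matches nothing, the subquery produces no rows,
# so the IN is never satisfied and nothing comes back.
empty = con.execute("""
    SELECT emp_name FROM employee
    WHERE dept_no IN (SELECT dept_no FROM dep1 WHERE dept_name = 'me')
""").fetchall()
print(empty)  # []
```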

As for your query,
..where dept_no IN (select dept_no ...); -- behaves like EXISTS
the condition is evaluated as an EXISTS here (Oracle won't return an error for an EXISTS clause):
CREATE TABLE my_test(ID INT);
CREATE TABLE my_new_test ( new_ID INT);
EXPLAIN PLAN FOR
select * from my_test where id in( select id from my_new_test);
select * from table(dbms_xplan.display);
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 4 (0)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | TABLE ACCESS FULL | MY_TEST | 1 | 13 | 2 (0)| 00:00:01 |
|* 3 | FILTER | | | | | |
| 4 | TABLE ACCESS FULL| MY_NEW_TEST | 1 | | 2 (0)| 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( EXISTS (SELECT 0 FROM "MY_NEW_TEST" "MY_NEW_TEST" WHERE
:B1=:B2))
3 - filter(:B1=:B2)
Note
-----
- dynamic sampling used for this statement (level=2)
If you generate the plan with the valid column (here new_id),
then normal access is done:
1 - ACCESS("ID"="NEW_ID")
And it will cause an error for the following:
EXPLAIN PLAN FOR
select * from my_test where id in( select some_thing from my_new_test);
SQL Error: ORA-00904: "SOME_THING": invalid identifier
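The same thing happens in other engines: qualifying the column with the subquery's alias forces resolution inside the inner table, and the error surfaces immediately. A sketch in Python's sqlite3 (Oracle reports ORA-00904 in this situation; sqlite raises its own error):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE my_test (id INT);
    CREATE TABLE my_new_test (new_id INT);
""")

# Unqualified: id resolves outward to my_test, so this parses and runs.
ok = con.execute(
    "SELECT * FROM my_test WHERE id IN (SELECT id FROM my_new_test)"
).fetchall()

# Qualified with the inner alias: the engine must find t.id inside
# my_new_test, cannot, and raises an error instead of running.
try:
    con.execute(
        "SELECT * FROM my_test WHERE id IN (SELECT t.id FROM my_new_test t)"
    )
    failed = False
except sqlite3.OperationalError:
    failed = True
print(ok, failed)
```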

Let me try to answer it.
Oracle uses the optimizer to decide the execution plan, and it may rewrite your query however it thinks is better. IN and EXISTS are interchangeable, and their relative performance depends on several factors (EXISTS tends to end in a full table scan while IN can use an index).
Let me take your case. Below is the explain plan for your query
Plan hash value: 3333342911
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9 | 225 | 6 (0)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | TABLE ACCESS FULL | EMPLOYEE | 9 | 225 | 3 (0)| 00:00:01 |
|* 3 | FILTER | | | | | |
|* 4 | TABLE ACCESS FULL| DEPARTMENT | 1 | 12 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( EXISTS (SELECT 0 FROM "DEPARTMENT" "DEPARTMENT" WHERE
:B1=:B2 AND "DEPT_NAME"='MEC'))
3 - filter(:B1=:B2)
4 - filter("DEPT_NAME"='MEC')
Note
-----
- dynamic sampling used for this statement (level=2)
This explain plan clearly shows that the query is rewritten to use EXISTS, and it is equivalent to
select emp_name from employee where exists (select 0 from department where dept_name = 'MEC' and dept_no = dept_no);
The above query is a valid query, which is why you are getting those results.
The bind variables are nothing but dept_no (the joining column).
Refer to this IN vs EXISTS in Oracle link to learn more about IN and EXISTS.
Your explain plan is completely different if you use the correct column name. Below are the query and explain plan.
Query:
select emp_name from employee where dept_no IN (select dept_id from department where dept_name='MEC');
Explain Plan:
Plan hash value: 3817251802
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 100 | 7 (15)| 00:00:01 |
|* 1 | HASH JOIN SEMI | | 2 | 100 | 7 (15)| 00:00:01 |
| 2 | TABLE ACCESS FULL| EMPLOYEE | 9 | 225 | 3 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| DEPARTMENT | 1 | 25 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("DEPT_NO"="DEPT_ID")
3 - filter("DEPT_NAME"='MEC')
Note
-----
- dynamic sampling used for this statement (level=2)
Here Oracle decides it is better to use the filter and a hash semi join to get the required details.
This behaviour depends on the Oracle query parser and optimizer.
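The EXISTS rewrite can be checked mechanically by running the accidental IN query and its expanded EXISTS form side by side. A sketch in Python's sqlite3 (hypothetical miniature data; the tautological dept_no = dept_no compares the outer column with itself, since department has no dept_no):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE employee (emp_name TEXT, dept_no INT);
    CREATE TABLE department (dept_id INT, dept_name TEXT);
    INSERT INTO employee VALUES ('Anish', 121), ('Vipul', 123), ('Antal', NULL);
    INSERT INTO department VALUES (121, 'CSE'), (123, 'MEC');
""")

# The original query, with the wrong column name in the subquery.
in_form = con.execute("""
    SELECT emp_name FROM employee
    WHERE dept_no IN (SELECT dept_no FROM department WHERE dept_name = 'MEC')
""").fetchall()

# The EXISTS form the optimizer effectively evaluates: both sides of
# dept_no = dept_no resolve to employee.dept_no.
exists_form = con.execute("""
    SELECT emp_name FROM employee
    WHERE EXISTS (SELECT 0 FROM department
                  WHERE dept_name = 'MEC' AND dept_no = dept_no)
""").fetchall()

# Both keep every employee with a non-NULL dept_no.
print(in_form == exists_form)
```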

How about a join?
select a.emp_name
from employee a
join dep1 b
on a.dept_no = b.dept_id
where b.dept_name = 'MEC'
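The join above returns what the original subquery was presumably meant to return. A quick check in Python's sqlite3 (hypothetical sample rows covering the three departments):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE employee (emp_name TEXT, dept_no INT);
    CREATE TABLE dep1 (dept_id INT, dept_name TEXT);
    INSERT INTO employee VALUES ('Anish', 121), ('Vipul', 123),
                                ('Chetan', 123), ('Antal', NULL);
    INSERT INTO dep1 VALUES (121, 'CSE'), (122, 'ECE'), (123, 'MEC');
""")

# Joining on the real key columns returns only the MEC (dept 123) employees.
rows = con.execute("""
    SELECT a.emp_name
    FROM employee a
    JOIN dep1 b ON a.dept_no = b.dept_id
    WHERE b.dept_name = 'MEC'
""").fetchall()
print(rows)  # Vipul and Chetan, the two dept 123 employees
```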

Related

Oracle 12c to Oracle 19c Migration - Unusual behavior

I'm performing tests before we migrate an Oracle database from 12c to 19c.
I'm facing unusual behavior, which can be explained with the example below.
I've condensed it to a reproducible issue.
Sorry for making this a very long post; I wanted to include all possible information.
If any further information is required, I would be happy to provide it.
Oracle 12c & 19c versions are as below (from v$instance):
VERSION
12.1.0.2.0
VERSION VERSION_FULL
19.0.0.0.0 19.16.0.0.0
Sample Data
2 tables are as below
TAB1
COLUMN_NAME DATA_TYPE NULLABLE
COL1 VARCHAR2(20 BYTE) Yes
RUL_NO NUMBER(11,0) No
INP_DT TIMESTAMP(6) WITH LOCAL TIME ZONE No
TAB2
COLUMN_NAME DATA_TYPE NULLABLE
COL1 VARCHAR2(20 BYTE) No
COL6 NUMBER(11,0) No
COL7 VARCHAR2(5 BYTE) Yes
INP_DT TIMESTAMP(6) WITH LOCAL TIME ZONE No
Index on TAB2 -
create index tab2_IDX1 on tab2(col6);
create index tab2_IDX2 on tab2(col1);
Problem SQL
SELECT *
FROM tab1 t
WHERE (EXISTS (SELECT 1
FROM tab2 b
WHERE b.col6 = 1088609
AND NVL(t.col1, '<NULL>') = NVL(b.col1, '<NULL>'))
OR t.col1 IS NULL);
This sql returns 10 rows on 12c db, but none on 19c db which is causing regression on 19c side.
Here's the output when this sql is run in trace mode.
12c Trace
SQL> set autotrace traceonly
SQL> set linesize 200
SQL> set pagesize 1000
SQL> SELECT *
FROM tab1 t
WHERE (EXISTS (SELECT 1
FROM tab2 b
WHERE b.col6 = 1088609
AND NVL(t.col1, '<NULL>') = NVL(b.col1, '<NULL>'))
OR t.col1 IS NULL);
2 3 4 5 6 7
10 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 572408916
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 160 | 3 (0)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | TABLE ACCESS FULL | TAB1 | 10 | 160 | 3 (0)| 00:00:01 |
|* 3 | TABLE ACCESS BY INDEX ROWID BATCHED| TAB2 | 1 | 15 | 2 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | TAB2_IDX3 | 1 | | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("T"."COL1" IS NULL OR EXISTS (SELECT 0 FROM "TAB2" "B" WHERE
"B"."COL1"=NVL(:B1,'<NULL>') AND "B"."COL6"=1088609))
3 - filter("B"."COL6"=1088609)
4 - access("B"."COL1"=NVL(:B1,'<NULL>'))
Note
-----
- dynamic statistics used: dynamic sampling (level=4)
19c Trace
SQL> set autotrace traceonly
SQL> set linesize 200
SQL> set pagesize 1000
SQL>
SQL> SELECT *
FROM tab1 t
WHERE (EXISTS (SELECT 1
FROM tab2 b
WHERE b.col6 = 1088609
AND NVL(t.col1, '<NULL>') = NVL(b.col1, '<NULL>'))
OR t.col1 IS NULL);
2 3 4 5 6 7
no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value: 4175419084
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 31 | 5 (0)| 00:00:01 |
|* 1 | HASH JOIN SEMI NA | | 1 | 31 | 5 (0)| 00:00:01 |
| 2 | TABLE ACCESS FULL | TAB1 | 10 | 160 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID BATCHED| TAB2 | 1 | 15 | 2 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | TAB2_IDX1 | 1 | | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access(NVL("T"."COL1",'<NULL>')="B"."COL1")
4 - access("B"."COL6"=1088609)
Note
-----
- this is an adaptive plan
Can somebody suggest why this behavior is observed in 19c? It should return 10 rows as on the 12c db. It seems the HASH JOIN SEMI NA step on the 19c side is causing this issue, but I can't be sure.
Any help on this matter is very much appreciated.
Thanks,
Kailash
It seems that the 19c execution plan somehow loses the OR t.col1 IS NULL predicate in the Predicate Information:
1 - access(NVL("T"."COL1",'<NULL>')="B"."COL1")
which is most probably a bug (wrong predicate elimination?).
Anyway, a workaround (if you can change the query) seems to be to move the OR into the EXISTS subquery:
SELECT *
FROM tab1 t
WHERE EXISTS (SELECT 1
FROM tab2 b
WHERE b.col6 = 1088609
AND NVL(t.col1, '<NULL>') = NVL(b.col1, '<NULL>')
OR t.col1 IS NULL );
This also implicitly disables the NA semi join, falling back to the FILTER plan from 12c, which is another sign that this is the cause of the wrong behaviour.
Open an SR with Oracle for a final solution!
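Under standard SQL semantics, the original predicate must keep every row whose col1 is NULL regardless of the EXISTS branch. A minimal sketch of the expected (12c) behavior in Python's sqlite3, with hypothetical miniature data and IFNULL standing in for Oracle's NVL:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE tab1 (col1 TEXT);
    CREATE TABLE tab2 (col1 TEXT, col6 INT);
    INSERT INTO tab1 VALUES ('A'), ('B'), (NULL);
    INSERT INTO tab2 VALUES ('A', 1088609), ('C', 1088609);
""")

# The EXISTS branch matches 'A'; the OR branch must also keep the NULL row.
rows = con.execute("""
    SELECT * FROM tab1 t
    WHERE (EXISTS (SELECT 1 FROM tab2 b
                   WHERE b.col6 = 1088609
                     AND IFNULL(t.col1, '<NULL>') = IFNULL(b.col1, '<NULL>'))
           OR t.col1 IS NULL)
""").fetchall()
print(rows)  # the 'A' row and the NULL row; 'B' is filtered out
```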

ORA_ROWSCN queried properly but why is Oracle returning the wrong value in the results?

Why does Oracle sometimes return the wrong ORA_ROWSCN, such as in the following? (Note this does not seem to be a ROWDEPENDENCIES issue or a "greater than expected SCN" issue, as I realize both these caveats when using ORA_ROWSCN.)
When I run:
WITH maxIds as (
SELECT table_name, record_rowid, MAX(changed_rows_log_id) AS changed_rows_log_id, ORA_ROWSCN as otherSCN
FROM changed_rows_log
GROUP BY table_name, record_rowid
)
SELECT l.changed_rows_log_id, l.ORA_ROWSCN, otherSCN, l.table_name, l.record_rowid
FROM changed_rows_log l
JOIN maxIds m on l.changed_rows_log_id = m.changed_rows_log_id and l.table_name=m.table_name and l.record_rowid=m.record_rowid
WHERE ORA_ROWSCN > 7884576380618
My result is:
CHANGED_ROWS_LOG_ID ORA_ROWSCN OTHERSCN TABLE_NAME RECORD_ROWID
1887507 7884576380617 7884576380617 FOO AAARiGAAMAAG4B4AA2
1887508 7884576380617 7884576380617 FOO AAARiGAAMAAG4B4AA3
1887512 7884576380617 7884576380617 FOO AAARiGAAMAAG4B4AA7
...
Yep, you see that right. The ORA_ROWSCN returned is less than the literal value I asked to be greater than in the query's WHERE clause. (I also included otherSCN to see if it was throwing me off somehow, but it appears to be irrelevant.)
It appears that the row in question actually has a higher ORA_ROWSCN, and the WHERE clause did work properly: when I then run SELECT ORA_ROWSCN FROM changed_rows_log WHERE changed_rows_log_id=1887507, I get 7884576380644, not 7884576380617.
Also, when I add just one WHERE condition, I also get the correct data returned:
WITH maxIds as (
SELECT table_name, record_rowid, MAX(changed_rows_log_id) AS changed_rows_log_id, ORA_ROWSCN as otherSCN
FROM changed_rows_log
GROUP BY table_name, record_rowid
)
SELECT l.changed_rows_log_id, l.ORA_ROWSCN, otherSCN, l.table_name, l.record_rowid
FROM changed_rows_log l
JOIN maxIds m on l.changed_rows_log_id = m.changed_rows_log_id and l.table_name=m.table_name and l.record_rowid=m.record_rowid
WHERE ORA_ROWSCN > 7884576380618 AND l.changed_rows_log_id=1887507
gives me this, as expected
CHANGED_ROWS_LOG_ID ORA_ROWSCN OTHERSCN TABLE_NAME RECORD_ROWID
1887507 7884576380644 7884576380644 FOO AAARiGAAMAAG4B4AA2
So why and how can SELECT ORA_ROWSCN give me simply incorrect data like this? Can I work around it somehow, so that I get the ORA_ROWSCN that the more specific queries return?
(If it matters, changed_rows_log has ROWDEPENDENCIES enabled. I'm using Oracle Database 12.1.0.2.0 64-bit.)
More detail--the EXPLAIN PLAN for the first query (with bad value)
Plan hash value: 3153795477
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 62 | | 30794 (1)| 00:00:02 |
|* 1 | FILTER | | | | | | |
| 2 | HASH GROUP BY | | 1 | 62 | | 30794 (1)| 00:00:02 |
|* 3 | HASH JOIN | | 208K| 12M| 3424K| 30787 (1)| 00:00:02 |
|* 4 | TABLE ACCESS FULL| CHANGED_ROWS_LOG | 71438 | 2581K| | 14052 (1)| 00:00:01 |
| 5 | TABLE ACCESS FULL| CHANGED_ROWS_LOG | 1428K| 34M| | 14058 (1)| 00:00:01 |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("L"."CHANGED_ROWS_LOG_ID"=MAX("CHANGED_ROWS_LOG_ID"))
3 - access("L"."TABLE_NAME"="TABLE_NAME" AND "L"."RECORD_ROWID"="RECORD_ROWID")
4 - filter("ORA_ROWSCN">7884576380618)
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
- this is an adaptive plan
- 2 Sql Plan Directives used for this statement
And the last query above (correct value)
Plan hash value: 402632295
---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 62 | 7 (15)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | HASH GROUP BY | | 1 | 62 | 7 (15)| 00:00:01 |
| 3 | NESTED LOOPS | | 3 | 186 | 6 (0)| 00:00:01 |
|* 4 | TABLE ACCESS BY INDEX ROWID | CHANGED_ROWS_LOG | 1 | 37 | 3 (0)| 00:00:01 |
|* 5 | INDEX UNIQUE SCAN | SYS_C00141068 | 1 | | 2 (0)| 00:00:01 |
| 6 | TABLE ACCESS BY INDEX ROWID BATCHED| CHANGED_ROWS_LOG | 3 | 75 | 3 (0)| 00:00:01 |
|* 7 | INDEX RANGE SCAN | CHANGED_ROWS_LOG_IF1 | 1 | | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(MAX("CHANGED_ROWS_LOG_ID")=1887507)
4 - filter("ORA_ROWSCN">7884576380618)
5 - access("L"."CHANGED_ROWS_LOG_ID"=1887507)
7 - access("L"."RECORD_ROWID"="RECORD_ROWID" AND "L"."TABLE_NAME"="TABLE_NAME")
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
- 1 Sql Plan Directive used for this statement
Adding the MATERIALIZE hint to the WITH subquery overcomes this issue. I'd love it if someone could explain why the issue happens at all, but for now:
WITH maxIds as (
SELECT /*+ MATERIALIZE */ table_name, record_rowid, MAX(changed_rows_log_id) AS changed_rows_log_id, ORA_ROWSCN as otherSCN
FROM changed_rows_log
GROUP BY table_name, record_rowid
)
SELECT l.changed_rows_log_id, l.ORA_ROWSCN, otherSCN, l.table_name, l.record_rowid
FROM changed_rows_log l
JOIN maxIds m on l.changed_rows_log_id = m.changed_rows_log_id and l.table_name=m.table_name and l.record_rowid=m.record_rowid
WHERE ORA_ROWSCN > 7884576380618

Improving query performance to search in clob column in big table

I have a big table (about 4.6 million records) with many columns. I concatenated some of the columns and inserted the result into a CLOB column (one column holds an alias of the name, so the name appears in a different form, and it too went into the CLOB column), and that column has a ctxsys.context index.
I search with the CONTAINS function using the fuzzy operator. For performance, I added two more columns to search on, each with a bitmap index,
analyzed the table using DBMS_STATS.GATHER_TABLE_STATS(), altered my index to use parallel degree 4, and increased SORT_AREA_SIZE to 8,300,000.
My problem is that a search takes 2 to 5 minutes to execute.
Is there any way to improve performance and reduce execution time (another algorithm to speed up searching, or should I change the structure of my table by adding columns and searching across multiple columns)?
Here is my query:
SELECT first_name,
last_name,
countries,
category,
aliases
FROM (SELECT first_name,
last_name,
countries,
category,
aliases,
rr
FROM (SELECT T.u_id,
T.first_name,
T.last_name,
T.countries,
T.category,
T.aliases,
ROWNUM rr,
all_data
FROM tbl_rsk_list_world T
WHERE t.countries = 'SPAIN'
AND category = 'Eng')
WHERE Contains(all_data, 'fuzzy(JOSE,60,,weight)', 1) > 0)
WHERE rr BETWEEN 1 AND 500
The Execution plan is:
SQL> select * from TABLE(DBMS_XPLAN.DISPLAY);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2747287528
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 20651 | 109M| 5724 (1
|* 1 | VIEW | | 20651 | 109M| 5724 (1
| 2 | COUNT | | | |
| 3 | PX COORDINATOR | | | |
| 4 | PX SEND QC (RANDOM)| :TQ10000 | 20651 | 4638K| 5724 (1
| 5 | PX BLOCK ITERATOR | | 20651 | 4638K| 5724 (1
|* 6 | TABLE ACCESS FULL| TBL_RSK_LIST_WORLD | 20651 | 4638K| 5724 (1
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("CTXSYS"."CONTAINS"("ALL_DATA",'fuzzy(jose,60,,weight)',1)>0 AND "
AND "from$_subquery$_002"."RR">=1)
6 - filter("COUNTRIES"='SPAIN' AND "CATEGORY"='Eng')
20 rows selected
When I use the FIRST_ROWS and DOMAIN_INDEX_NO_SORT hints, the execution plan becomes:
SQL> select * from TABLE(DBMS_XPLAN.DISPLAY);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1488722846
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 50 | 173K|
|* 1 | VIEW | | 50 | 173K|
| 2 | COUNT | | | |
| 3 | TABLE ACCESS BY INDEX ROWID| TBL_RSK_LIST_WORLD | 50 | 11500 |
|* 4 | DOMAIN INDEX | NDX_RSK_LIST_WORLD_CTX | | |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("CATEGORY"='Eng' AND "COUNTRIES"='SPAIN' AND
"from$_subquery$_002"."RR"<=500 AND "from$_subquery$_002"."RR">=1)
4 - access("CTXSYS"."CONTAINS"("W"."ALL_DATA",'fuzzy(jose,60,,weight)',1)>0)
18 rows selected
but the performance is still bad :\

Oracle Connect By vs Recursive User-Defined Function performance

I'm doing a basic performance check using both Connect By and a user-defined function to get a parent value. It seems that using a user-defined function performs better than the Connect By query.
I would like to know if using the user-defined function is supposed to be better performing as compared to Connect By.
create table org ( pid number, cid number, type varchar2(10), name varchar2(30) );
alter table org add constraint org_pk primary key ( cid ); -- UPDATE#2
insert into org values (null,1,'MGT','OP');
insert into org values (1,2,'DEP','HR');
insert into org values (1,3,'DEP','IT');
insert into org values (3,4,'DIV','WEB');
insert into org values (3,5,'DIV','DB');
insert into org values (4,6,'SEC','HTML');
insert into org values (4,7,'SEC','JAVA');
create or replace function get_dep ( p_cid in number ) return number
is
l_pid number;
l_cid number;
l_type varchar2(30);
begin
select pid
, cid
, type
into l_pid
, l_cid
, l_type
from org
where cid = p_cid;
if ( l_type = 'MGT' ) then
return null;
elsif ( l_type = 'DEP' ) then
return l_cid;
else
return get_dep ( l_pid );
end if;
end;
/
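The function's walk up the tree can be sketched outside the database. A Python rendering of the same logic (a dictionary stands in for the org table, using the rows from the inserts above):

```python
# Each org row as cid -> (pid, type), from the question's insert statements.
org = {
    1: (None, 'MGT'), 2: (1, 'DEP'), 3: (1, 'DEP'), 4: (3, 'DIV'),
    5: (3, 'DIV'), 6: (4, 'SEC'), 7: (4, 'SEC'),
}

def get_dep(cid):
    """Walk parents until a DEP row is found; MGT has no department above it."""
    pid, typ = org[cid]
    if typ == 'MGT':
        return None
    if typ == 'DEP':
        return cid
    return get_dep(pid)

print(get_dep(7))  # 7 (SEC) -> 4 (SEC) -> 3 (DEP), so 3
```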
select cid --correction
from org
where type = 'DEP'
start
with cid = 7
connect
by
prior pid = cid
and
prior type != 'DEP'
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 66 | 6 (17)| 00:00:01 |
|* 1 | FILTER | | | | | |
|* 2 | CONNECT BY NO FILTERING WITH START-WITH| | | | | |
| 3 | TABLE ACCESS FULL | ORG | 7 | 231 | 5 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------
select get_dep ( cid )
from org
where cid = 7;
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 5 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| ORG | 1 | 13 | 5 (0)| 00:00:01 |
--------------------------------------------------------------------------
UPDATE #1:
I updated the function to add logic that returns null if the type is MGT.
I also changed the queries to fetch all records in the table.
select cid, ( select cid
from org
where type = 'DEP'
start
with cid = m.cid
connect
by
prior pid = cid
and
prior type != 'DEP' ) dep
from org m;
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 7 | 91 | 10 (0)| 00:00:01 |
|* 1 | FILTER | | | | | |
|* 2 | CONNECT BY NO FILTERING WITH START-WITH| | | | | |
| 3 | TABLE ACCESS FULL | ORG | 7 | 231 | 5 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL | ORG | 7 | 91 | 5 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------
select cid, get_dep ( cid ) dep
from org;
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 7 | 91 | 5 (0)| 00:00:01 |
| 1 | TABLE ACCESS FULL| ORG | 7 | 91 | 5 (0)| 00:00:01 |
--------------------------------------------------------------------------
UPDATE #2: Added the index as suggested. The explain plan improved for both, but the query with the user-defined function still performs better according to the explain plan (unless I'm misreading the plans).
select cid, ( select cid
from org
where type = 'DEP'
start
with cid = m.cid
connect
by
prior pid = cid
and
prior type != 'DEP' ) dep
from org m;
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 7 | 91 | 4 (0)| 00:00:01 |
|* 1 | FILTER | | | | | |
|* 2 | CONNECT BY WITH FILTERING | | | | | |
| 3 | TABLE ACCESS BY INDEX ROWID | ORG | 1 | 33 | 1 (0)| 00:00:01 |
|* 4 | INDEX UNIQUE SCAN | ORG_PK | 1 | | 0 (0)| 00:00:01 |
| 5 | NESTED LOOPS | | 1 | 53 | 2 (0)| 00:00:01 |
|* 6 | CONNECT BY PUMP | | | | | |
| 7 | TABLE ACCESS BY INDEX ROWID| ORG | 1 | 33 | 1 (0)| 00:00:01 |
|* 8 | INDEX UNIQUE SCAN | ORG_PK | 1 | | 0 (0)| 00:00:01 |
| 9 | INDEX FULL SCAN | ORG_PK | 7 | 91 | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------
select cid, get_dep ( cid ) dep
from org;
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 7 | 91 | 1 (0)| 00:00:01 |
| 1 | INDEX FULL SCAN | ORG_PK | 7 | 91 | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------
Appreciate any feedback.
First of all, in your example, SQL and PL/SQL return different results.
SQL> select pid
2 from org
3 where type = 'DEP'
4 start
5 with cid = 7
6 connect
7 by
8 prior pid = cid
9 and
10 prior type != 'DEP';
PID
----------
1
SQL>
SQL> select get_dep ( cid )
2 from org
3 where cid = 7;
GET_DEP(CID)
------------
3
Secondly, it does not really make sense to compare different approaches on such extremely small data volumes.
Let's assume we have a tree of depth 999,999 and want to find the root of a given node.
In my example there is only one tree (which is actually a list, since each parent has one child), therefore the root is the same for all nodes.
The important thing is: the greater the depth of a given ID, the longer the execution time.
create table org0 ( pid number, cid number, name varchar2(30) );
insert into org0
select rownum, rownum+1, 'name' || rpad(rownum,25,'#')
from dual
connect by rownum < 1e6;
alter table org0 add constraint org0_pk primary key ( cid );
Function for returning the root
create or replace function get_id(p_cid in number) return number is
l_pid number;
begin
select pid into l_pid from org0 where cid = p_cid;
return get_id(l_pid);
exception
when no_data_found then
return p_cid;
end get_id;
/
Testing
SQL
SQL> select pid id
2 from org0
3 where connect_by_isleaf = 1
4 start with cid = 10000
5 connect by prior pid = cid;
ID
----------
1
Elapsed: 00:00:00.07
SQL>
SQL> select pid id
2 from org0
3 where connect_by_isleaf = 1
4 start with cid = 100000
5 connect by prior pid = cid;
ID
----------
1
Elapsed: 00:00:00.55
SQL>
SQL> select pid id
2 from org0
3 where connect_by_isleaf = 1
4 start with cid = 1000000
5 connect by prior pid = cid;
ID
----------
1
Elapsed: 00:00:05.79
PL/SQL
SQL> select get_id(10000) id from dual;
ID
----------
1
Elapsed: 00:00:00.15
SQL> select get_id(100000) id from dual;
ID
----------
1
Elapsed: 00:00:01.47
SQL> select get_id(1000000) id from dual;
ID
----------
1
Elapsed: 00:00:14.83
As you can see, PL/SQL is approximately 2 times slower.
In some specific cases PL/SQL may be faster, though (not for your task).
You can read about fine grained performance analysis and using tools like dbms_hprof in this book Oracle SQL Revealed, chapter "When PL/SQL Is Better Than Vanilla SQL".
Hierarchical queries often lead to suboptimal performance, and frequent calls to PL/SQL functions additionally introduce the problem of context switches.
One possible approach to get hierarchical-query performance comparable with a single-row index access is to define a materialized view that pre-calculates the query.
I'm using the identical data from Dr Y Wit's answer.
create materialized view mv_org as
select
CID, PID, NAME, CONNECT_BY_ROOT PID ROOT_PID
from org0
start with pid in (
select pid from org0
MINUS
select cid from org0
)
connect by prior cid = pid;
Note that the MV contains the original data and adds the column ROOT_PID, which is the pre-calculated root key.
CID PID NAME ROOT_PID
---------- ---------- ------------------------------ ----------
2 1 name1######################## 1
3 2 name2######################## 1
4 3 name3######################## 1
....
Query performance is now fine, as there is no longer any need for a hierarchical query.
SQL> select pid id
2 from org0
3 where connect_by_isleaf = 1
4 start with cid = 1000000
5 connect by prior pid = cid;
ID
----------
1
Elapsed: 00:00:05.07
SQL> select root_pid from mv_org where cid = 1000000;
ROOT_PID
----------
1
Elapsed: 00:00:00.01
So if you can manage the changes to the hierarchical table in regular windows (say, once per day or month) and refresh the MV then, you localize the complexity (and performance load) in that refresh, and your regular queries stay fast.
SQL> exec DBMS_MVIEW.REFRESH ('mv_org','c');
Elapsed: 00:00:27.58
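The same precomputation can be expressed with a recursive CTE in engines that support it. A small sketch in Python's sqlite3 (a 4-node chain instead of a million rows) that materializes each node's root into a lookup table, analogous to the MV above:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE org0 (pid INT, cid INT PRIMARY KEY);
    INSERT INTO org0 VALUES (1, 2), (2, 3), (3, 4);  -- chain 1 -> 2 -> 3 -> 4
""")

# Precompute each node's root once: start from roots (pids that never
# appear as a cid) and walk down, carrying the root along.
con.executescript("""
    CREATE TABLE mv_org AS
    WITH RECURSIVE walk(cid, root_pid) AS (
        SELECT cid, pid FROM org0
        WHERE pid NOT IN (SELECT cid FROM org0)
        UNION ALL
        SELECT o.cid, w.root_pid
        FROM org0 o JOIN walk w ON o.pid = w.cid
    )
    SELECT * FROM walk;
""")

# Root lookups are now single-row reads instead of hierarchical walks.
root = con.execute("SELECT root_pid FROM mv_org WHERE cid = 4").fetchone()[0]
print(root)  # 1
```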

Using DECODE on a parameter in WHERE clause will shortcircuit using the index?

I have a stored procedure with multiple mandatory parameters and a SELECT statement inside it which has multiple conditions in its WHERE clause, like below:
SELECT *
FROM TABLE
WHERE column_1 = param_1
AND column_2 = param_2
AND column_3 = param_3;
This query works fine and uses the table's indexes correctly. But a change in requirements meant adjusting the procedure so that you can pass it fewer parameters (maybe just the first two), while keeping changes to the stored procedure minimal.
One of the suggestions I made was to use a DECODE function to handle each possibly-NULL parameter, like this:
SELECT *
FROM TABLE
WHERE column_1 = param_1
AND column_2 = param_2
AND column_3 = DECODE(param_3, NULL, column_3, param_3);
This way, I reasoned that because the function is not applied to the table column, the index will still be used. I have run some tests, and the query still works and uses the indexes in this situation.
But I'm still being contradicted by our architect (with no further explanation), who says the query will not use the index because I'm using a function in the WHERE clause.
I'm not sure if my test is enough proof that it will always use the index, or if there are other situations I should check in which the index might not be used because of the DECODE function.
Any help / suggestions / information will be very much appreciated.
You are right. Test it and prove it.
Setup
SQL> CREATE TABLE t AS SELECT LEVEL id FROM dual CONNECT BY LEVEL <=10;
Table created.
SQL>
SQL> CREATE INDEX id_indx ON t(ID);
Index created.
Test case
Normal query, without any function:
SQL> set autot on explain
SQL>
SQL> SELECT * FROM t WHERE ID = 5;
ID
----------
5
Execution Plan
----------------------------------------------------------
Plan hash value: 1629656632
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 1 (0)| 00:00:01 |
|* 1 | INDEX RANGE SCAN| ID_INDX | 1 | 3 | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("ID"=5)
Using DECODE on the value(not on column):
SQL> SELECT * FROM t WHERE ID = decode(5, NULL, 3, 5);
ID
----------
5
Execution Plan
----------------------------------------------------------
Plan hash value: 1629656632
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 1 (0)| 00:00:01 |
|* 1 | INDEX RANGE SCAN| ID_INDX | 1 | 3 | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("ID"=5)
Using NVL on the value(not on column):
SQL> SELECT * FROM t WHERE ID = nvl(5, 3);
ID
----------
5
Execution Plan
----------------------------------------------------------
Plan hash value: 1629656632
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 1 (0)| 00:00:01 |
|* 1 | INDEX RANGE SCAN| ID_INDX | 1 | 3 | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("ID"=5)
In all three cases above, the index is used.
DECODE on the column:
SQL> SELECT * FROM t WHERE decode(ID, NULL, 3, 5) = 5;
ID
----------
1
2
3
4
5
6
7
8
9
10
10 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 1 | 3 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(DECODE(TO_CHAR("ID"),NULL,3,5)=5)
NVL on the column:
SQL> SELECT * FROM t WHERE nvl(ID, 3) = 3;
ID
----------
3
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 1 | 3 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(NVL("ID",3)=3)
SQL>
As expected, the index is not used, since you are applying a function to a column that has a regular index; you would need a function-based index for that.
So you are right: you don't have to worry about index usage when the function is applied to the parameter value rather than the column.
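The value-side vs column-side distinction is visible in other engines' planners too. A sketch in Python's sqlite3 (COALESCE standing in for Oracle's NVL/DECODE; table and index names are made up), where the function on the value still allows an index search but the function on the column forces a full scan:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t (id INT);
    CREATE INDEX id_indx ON t(id);
""")
con.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1, 11)])

def plan(sql):
    # The last column of EXPLAIN QUERY PLAN output describes the access path.
    return " ".join(r[-1] for r in con.execute("EXPLAIN QUERY PLAN " + sql))

# Function applied to the value: the planner can still use the index.
p1 = plan("SELECT * FROM t WHERE id = COALESCE(5, 3)")

# Function applied to the column: the index on id no longer applies.
p2 = plan("SELECT * FROM t WHERE COALESCE(id, 3) = 3")

print(p1)  # an index SEARCH
print(p2)  # a full-table SCAN
```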
