I'm testing ahead of migrating an Oracle database from 12c to 19c.
I'm seeing unexpected behavior, which the example below illustrates.
I've condensed it to a reproducible test case.
Sorry for the long post; I wanted to include all the relevant information.
If anything else is needed, I'm happy to provide it.
The Oracle 12c and 19c versions are as follows (from v$instance):
12c:
VERSION
12.1.0.2.0
19c:
VERSION VERSION_FULL
19.0.0.0.0 19.16.0.0.0
Sample Data
The two tables are defined as follows:
TAB1
COLUMN_NAME DATA_TYPE NULLABLE
COL1 VARCHAR2(20 BYTE) Yes
RUL_NO NUMBER(11,0) No
INP_DT TIMESTAMP(6) WITH LOCAL TIME ZONE No
TAB2
COLUMN_NAME DATA_TYPE NULLABLE
COL1 VARCHAR2(20 BYTE) No
COL6 NUMBER(11,0) No
COL7 VARCHAR2(5 BYTE) Yes
INP_DT TIMESTAMP(6) WITH LOCAL TIME ZONE No
Indexes on TAB2:
create index tab2_IDX1 on tab2(col6);
create index tab2_IDX2 on tab2(col1);
Problem SQL
SELECT *
FROM tab1 t
WHERE (EXISTS (SELECT 1
FROM tab2 b
WHERE b.col6 = 1088609
AND NVL(t.col1, '<NULL>') = NVL(b.col1, '<NULL>'))
OR t.col1 IS NULL);
This SQL returns 10 rows on the 12c database but none on the 19c database, which is a regression on the 19c side.
Here is the output when the SQL is run with autotrace enabled.
12c Trace
SQL> set autotrace traceonly
SQL> set linesize 200
SQL> set pagesize 1000
SQL> SELECT *
FROM tab1 t
WHERE (EXISTS (SELECT 1
FROM tab2 b
WHERE b.col6 = 1088609
AND NVL(t.col1, '<NULL>') = NVL(b.col1, '<NULL>'))
OR t.col1 IS NULL);
10 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 572408916
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 160 | 3 (0)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | TABLE ACCESS FULL | TAB1 | 10 | 160 | 3 (0)| 00:00:01 |
|* 3 | TABLE ACCESS BY INDEX ROWID BATCHED| TAB2 | 1 | 15 | 2 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | TAB2_IDX3 | 1 | | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("T"."COL1" IS NULL OR EXISTS (SELECT 0 FROM "TAB2" "B" WHERE
"B"."COL1"=NVL(:B1,'<NULL>') AND "B"."COL6"=1088609))
3 - filter("B"."COL6"=1088609)
4 - access("B"."COL1"=NVL(:B1,'<NULL>'))
Note
-----
- dynamic statistics used: dynamic sampling (level=4)
19c Trace
SQL> set autotrace traceonly
SQL> set linesize 200
SQL> set pagesize 1000
SQL>
SQL> SELECT *
FROM tab1 t
WHERE (EXISTS (SELECT 1
FROM tab2 b
WHERE b.col6 = 1088609
AND NVL(t.col1, '<NULL>') = NVL(b.col1, '<NULL>'))
OR t.col1 IS NULL);
no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value: 4175419084
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 31 | 5 (0)| 00:00:01 |
|* 1 | HASH JOIN SEMI NA | | 1 | 31 | 5 (0)| 00:00:01 |
| 2 | TABLE ACCESS FULL | TAB1 | 10 | 160 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID BATCHED| TAB2 | 1 | 15 | 2 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | TAB2_IDX1 | 1 | | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access(NVL("T"."COL1",'<NULL>')="B"."COL1")
4 - access("B"."COL6"=1088609)
Note
-----
- this is an adaptive plan
Can somebody explain why 19c behaves this way? It should return 10 rows, like the 12c database. The HASH JOIN SEMI NA step on the 19c side seems to be causing the issue, but I can't be sure.
Any help on this matter is much appreciated.
Thanks,
Kailash
It seems that the 19c execution plan somehow loses the OR t.col1 IS NULL predicate; the Predicate Information shows only
1 - access(NVL("T"."COL1",'<NULL>')="B"."COL1")
This is most probably a bug (wrong predicate elimination?).
Anyway, a workaround (if you can change the query) seems to be to move the OR into the EXISTS subquery:
SELECT *
FROM tab1 t
WHERE EXISTS (SELECT 1
FROM tab2 b
WHERE b.col6 = 1088609
AND NVL(t.col1, '<NULL>') = NVL(b.col1, '<NULL>')
OR t.col1 IS NULL );
This also implicitly disables the null-aware (NA) semi join and returns to the FILTER plan seen in 12c, which is another sign that the NA semi join is the cause of the wrong behaviour.
Open an SR with Oracle for a final solution!
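If the query logic has to stay as written, another thing that might be worth trying is the NO_UNNEST hint inside the subquery, which asks the optimizer not to turn the subquery into a semi join. This is only a sketch: it assumes that keeping the subquery un-nested brings back the FILTER plan seen on 12c, and it has not been verified against this particular 19c behaviour, so compare the row counts and the plan before relying on it.

```sql
-- Assumption: NO_UNNEST keeps the EXISTS subquery as a FILTER step
-- instead of converting it to HASH JOIN SEMI NA.
SELECT *
  FROM tab1 t
 WHERE (EXISTS (SELECT /*+ NO_UNNEST */ 1
                  FROM tab2 b
                 WHERE b.col6 = 1088609
                   AND NVL(t.col1, '<NULL>') = NVL(b.col1, '<NULL>'))
        OR t.col1 IS NULL);
```

Note also that the workaround with the OR moved inside the EXISTS is not strictly identical to the original query: because AND binds tighter than OR, a row with t.col1 IS NULL is then returned only if tab2 contains at least one row, whereas the original returns it regardless.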
I want to write a simple query in PL/SQL.
Please suggest how to make it execute as fast as possible (ideally around 0.01 seconds on 1,000,000 rows).
First query:
select datetime
from product
order by datetime desc
FETCH NEXT 1 ROWS ONLY
The result of the first query is used in the second query:
select *
from traceability
where endtime = [first query]
Please help me implement this logic in PL/SQL.
Thank you.
Please find below an example with sample data.
create table product as
select rownum product_id, DATE'2020-01-01' + NUMTODSINTERVAL(rownum-1, 'second') datetime
from dual connect by level <= 10;
create index product_idx on product(datetime);
create table traceability as
select
rownum id, DATE'2020-01-01' + NUMTODSINTERVAL(rownum-1, 'second') endtime
from dual connect by level <= 10;
create index traceability_idx on traceability(endtime);
Your query should be as follows:
select *
from traceability
where endtime =
(select max(datetime)
from product );
The query leads to the following execution plan. See here how to get the execution plan.
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 22 | 3 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID | TRACEABILITY | 1 | 22 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | TRACEABILITY_IDX | 1 | | 1 (0)| 00:00:01 |
| 3 | SORT AGGREGATE | | 1 | 9 | | |
| 4 | INDEX FULL SCAN (MIN/MAX)| PRODUCT_IDX | 1 | 9 | 1 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ENDTIME"= (SELECT MAX("DATETIME") FROM "PRODUCT" "PRODUCT"))
Note that if the TRACEABILITY table contains a large number of rows with the max timestamp, you may instead see a full table scan in line 1.
Something similar applies to the PRODUCT table and line 4.
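Since the question asks for PL/SQL, the same two steps could be wrapped in an anonymous block. This is a minimal sketch against the sample tables created above; the variable name l_max_dt and the output format are illustrative choices, not part of the original question.

```sql
set serveroutput on

declare
  l_max_dt product.datetime%type;
begin
  -- Step 1: latest timestamp in PRODUCT.
  select max(datetime) into l_max_dt from product;

  -- Step 2: all TRACEABILITY rows ending at that timestamp.
  for r in (select id, endtime
              from traceability
             where endtime = l_max_dt)
  loop
    dbms_output.put_line(r.id || ' ' ||
                         to_char(r.endtime, 'YYYY-MM-DD HH24:MI:SS'));
  end loop;
end;
/
```

With the sample data above, the latest PRODUCT timestamp is 2020-01-01 00:00:09, so the block should print the TRACEABILITY row with id 10. For a plain lookup the single SQL statement is usually the simpler choice; the block is only useful if the rows have to be processed procedurally.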
I'm doing a basic performance check using both Connect By and a user-defined function to get a parent value. It seems that using a user-defined function performs better than the Connect By query.
Is the user-defined function expected to perform better than Connect By?
create table org ( pid number, cid number, type varchar2(10), name varchar2(30) );
alter table org add constraint org_pk primary key ( cid ); -- UPDATE#2
insert into org values (null,1,'MGT','OP');
insert into org values (1,2,'DEP','HR');
insert into org values (1,3,'DEP','IT');
insert into org values (3,4,'DIV','WEB');
insert into org values (3,5,'DIV','DB');
insert into org values (4,6,'SEC','HTML');
insert into org values (4,7,'SEC','JAVA');
create or replace function get_dep ( p_cid in number ) return number
is
l_pid number;
l_cid number;
l_type varchar2(30);
begin
select pid
, cid
, type
into l_pid
, l_cid
, l_type
from org
where cid = p_cid;
if ( l_type = 'MGT' ) then
return null;
elsif ( l_type = 'DEP' ) then
return l_cid;
else
return get_dep ( l_pid );
end if;
end;
/
select cid --correction
from org
where type = 'DEP'
start
with cid = 7
connect
by
prior pid = cid
and
prior type != 'DEP'
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 66 | 6 (17)| 00:00:01 |
|* 1 | FILTER | | | | | |
|* 2 | CONNECT BY NO FILTERING WITH START-WITH| | | | | |
| 3 | TABLE ACCESS FULL | ORG | 7 | 231 | 5 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------
select get_dep ( cid )
from org
where cid = 7;
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 5 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| ORG | 1 | 13 | 5 (0)| 00:00:01 |
--------------------------------------------------------------------------
UPDATE #1:
I updated the function to return null if the type is MGT.
I also changed the queries to fetch all records in the table.
select cid, ( select cid
from org
where type = 'DEP'
start
with cid = m.cid
connect
by
prior pid = cid
and
prior type != 'DEP' ) dep
from org m;
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 7 | 91 | 10 (0)| 00:00:01 |
|* 1 | FILTER | | | | | |
|* 2 | CONNECT BY NO FILTERING WITH START-WITH| | | | | |
| 3 | TABLE ACCESS FULL | ORG | 7 | 231 | 5 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL | ORG | 7 | 91 | 5 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------
select cid, get_dep ( cid ) dep
from org;
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 7 | 91 | 5 (0)| 00:00:01 |
| 1 | TABLE ACCESS FULL| ORG | 7 | 91 | 5 (0)| 00:00:01 |
--------------------------------------------------------------------------
UPDATE #2: Added the index as suggested. The explain plan improved for both, but the query with the user-defined function still looks better based on the explain plan (unless I'm interpreting the plan incorrectly).
select cid, ( select cid
from org
where type = 'DEP'
start
with cid = m.cid
connect
by
prior pid = cid
and
prior type != 'DEP' ) dep
from org m;
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 7 | 91 | 4 (0)| 00:00:01 |
|* 1 | FILTER | | | | | |
|* 2 | CONNECT BY WITH FILTERING | | | | | |
| 3 | TABLE ACCESS BY INDEX ROWID | ORG | 1 | 33 | 1 (0)| 00:00:01 |
|* 4 | INDEX UNIQUE SCAN | ORG_PK | 1 | | 0 (0)| 00:00:01 |
| 5 | NESTED LOOPS | | 1 | 53 | 2 (0)| 00:00:01 |
|* 6 | CONNECT BY PUMP | | | | | |
| 7 | TABLE ACCESS BY INDEX ROWID| ORG | 1 | 33 | 1 (0)| 00:00:01 |
|* 8 | INDEX UNIQUE SCAN | ORG_PK | 1 | | 0 (0)| 00:00:01 |
| 9 | INDEX FULL SCAN | ORG_PK | 7 | 91 | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------
select cid, get_dep ( cid ) dep
from org;
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 7 | 91 | 1 (0)| 00:00:01 |
| 1 | INDEX FULL SCAN | ORG_PK | 7 | 91 | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------
Appreciate any feedback.
First of all, in your example, SQL and PL/SQL return different results.
SQL> select pid
2 from org
3 where type = 'DEP'
4 start
5 with cid = 7
6 connect
7 by
8 prior pid = cid
9 and
10 prior type != 'DEP';
PID
----------
1
SQL>
SQL> select get_dep ( cid )
2 from org
3 where cid = 7;
GET_DEP(CID)
------------
3
Secondly, it does not really make sense to compare different approaches on such an extremely small data volume.
Let's assume we have a tree of depth 999,999 and want to find the root for a given node.
In my example there is only one tree (actually a list, since each parent has exactly one child), so the root is the same for all nodes.
The important point: the deeper a given ID sits in the tree, the longer the execution time.
create table org0 ( pid number, cid number, name varchar2(30) );
insert into org0
select rownum, rownum+1, 'name' || rpad(rownum,25,'#')
from dual
connect by rownum < 1e6;
alter table org0 add constraint org0_pk primary key ( cid );
Function for returning the root
create or replace function get_id(p_cid in number) return number is
l_pid number;
begin
select pid into l_pid from org0 where cid = p_cid;
return get_id(l_pid);
exception
when no_data_found then
return p_cid;
end get_id;
/
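For comparison, the same walk to the root can be written as a loop instead of a recursive call. This is a sketch only; the name get_id_iter is hypothetical and it was not part of the timings in this answer. It still runs one SELECT per level, so it does not remove the per-level cost, it only avoids the nested PL/SQL calls.

```sql
create or replace function get_id_iter(p_cid in number) return number is
  l_cid number := p_cid;
  l_pid number;
begin
  loop
    begin
      -- Look up the parent of the current node.
      select pid into l_pid from org0 where cid = l_cid;
    exception
      when no_data_found then
        -- No row for this id: it is the root, same rule as in get_id.
        return l_cid;
    end;
    -- Move one level up and repeat.
    l_cid := l_pid;
  end loop;
end get_id_iter;
/
```

It would be called the same way, for example select get_id_iter(10000) id from dual;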
Testing
SQL
SQL> select pid id
2 from org0
3 where connect_by_isleaf = 1
4 start with cid = 10000
5 connect by prior pid = cid;
ID
----------
1
Elapsed: 00:00:00.07
SQL>
SQL> select pid id
2 from org0
3 where connect_by_isleaf = 1
4 start with cid = 100000
5 connect by prior pid = cid;
ID
----------
1
Elapsed: 00:00:00.55
SQL>
SQL> select pid id
2 from org0
3 where connect_by_isleaf = 1
4 start with cid = 1000000
5 connect by prior pid = cid;
ID
----------
1
Elapsed: 00:00:05.79
PL/SQL
SQL> select get_id(10000) id from dual;
ID
----------
1
Elapsed: 00:00:00.15
SQL> select get_id(100000) id from dual;
ID
----------
1
Elapsed: 00:00:01.47
SQL> select get_id(1000000) id from dual;
ID
----------
1
Elapsed: 00:00:14.83
As you can see, PL/SQL is roughly 2 to 2.7 times slower in these runs.
In some specific cases PL/SQL may be faster, though (not for your task).
You can read about fine-grained performance analysis and tools like dbms_hprof in the book Oracle SQL Revealed, chapter "When PL/SQL Is Better Than Vanilla SQL".
Hierarchical queries often lead to suboptimal performance. Frequent calls to PL/SQL functions from SQL additionally introduce the cost of context switches.
One possible way to make a hierarchical lookup perform like a single-row index access is to define a materialized view that pre-calculates the query.
I'm using the same data as in @Dr Y Wit's answer.
create materialized view mv_org as
select
CID, PID, NAME, CONNECT_BY_ROOT PID ROOT_PID
from org0
start with pid in (
select pid from org0
MINUS
select cid from org0
)
connect by prior cid = pid;
Note that the MV contains the original data and adds the column ROOT_PID, which is the pre-calculated root key.
CID PID NAME ROOT_PID
---------- ---------- ------------------------------ ----------
2 1 name1######################## 1
3 2 name2######################## 1
4 3 name3######################## 1
....
The lookup is fast, as there is no need to run a hierarchical query any more.
SQL> select pid id
2 from org0
3 where connect_by_isleaf = 1
4 start with cid = 1000000
5 connect by prior pid = cid;
ID
----------
1
Elapsed: 00:00:05.07
SQL> select root_pid from mv_org where cid = 1000000;
ROOT_PID
----------
1
Elapsed: 00:00:00.01
So if changes to the hierarchical table can be handled in regular windows (say once per day or month) followed by a refresh of the MV, you localize the complexity (and the performance load) in this refresh, and your regular queries are fast.
SQL> exec DBMS_MVIEW.REFRESH ('mv_org','c');
Elapsed: 00:00:27.58
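The statements above do not show whether mv_org is indexed. To aim for the single-row index access mentioned earlier, one could index the lookup column. A sketch, assuming lookups are always by CID; the index name is hypothetical and was not part of the test above.

```sql
-- Hypothetical index on the lookup column of the materialized view.
create index mv_org_cid_idx on mv_org (cid);

-- Lookup of the pre-calculated root for one node.
select root_pid from mv_org where cid = 1000000;
```

Whether the index pays off depends on how often the MV is refreshed, since the index has to be maintained during each refresh as well.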
I'm playing with Oracle 12 and indexes...
In a query like this:
SELECT a, b, c FROM table WHERE col1 = val1 AND col2 = val2 ORDER BY id DESC
(where id is the primary key of the table), Oracle always uses the index on the primary key.
So even if I create an index on the columns col1 and col2, Oracle doesn't use it because of the ORDER BY clause.
Can I infer that this is a general rule? Should I never add extra indexes if all my queries contain "ORDER BY ID"?
Here is my table structure:
ID NUMBER GENERATED ALWAYS AS IDENTITY NOCACHE ORDER,
USERNAME VARCHAR2(30 CHAR)
TYPE_A CHAR(1 BYTE)
TYPE_B CHAR(1 BYTE)
CREATED DATE
UPDATED DATE
ALTER TABLE my_table
ADD CONSTRAINT my_table_pk
PRIMARY KEY (ID)
USING INDEX TABLESPACE XXX;
On the table I perform only this query:
SELECT id, USERNAME, TYPE_A, TYPE_B, CREATED FROM table
where username = 'MYUSER'
AND created >= TO_DATE('2016-01-01','YYYY-MM-DD')
AND created <= TO_DATE('2016-06-30','YYYY-MM-DD')
AND TYPE_A = 1
order by ID desc;
One index: on pk (ID) (automatically created by oracle)
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 384 | 1 (0)| 00:00:01 |
|* 1 | TABLE ACCESS BY INDEX ROWID| table | 2 | 384 | 1 (0)| 00:00:01 |
| 2 | INDEX FULL SCAN DESCENDING| INDEX_PK | 10 | | 1 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------
Two indexes: first on pk and second on (USERNAME, CREATED, TYPE_A)
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 384 | 1 (0)| 00:00:01 |
|* 1 | TABLE ACCESS BY INDEX ROWID| table | 2 | 384 | 1 (0)| 00:00:01 |
| 2 | INDEX FULL SCAN DESCENDING| INDEX_PK | 10 | | 1 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------
So the second index seems to be useless.
By the way, if I remove the ORDER BY clause, Oracle uses the second index on (USERNAME, CREATED, TYPE_A).
Thanks all!
Let me just give you a counterexample which shows that there are cases where Oracle will use the second index.
SQL> create table tab (
2 ID NUMBER GENERATED ALWAYS AS IDENTITY NOCACHE ORDER,
3 USERNAME VARCHAR2(30 CHAR),
4 TYPE_A CHAR(1 BYTE),
5 TYPE_B CHAR(1 BYTE),
6 CREATED DATE,
7 UPDATED DATE
8 )
9 /
Table created.
SQL> alter table tab add constraint tab_pk primary key (id) using index
2 /
Table altered.
SQL> create index SECOND_IDX on tab(username, created, type_a)
2 /
Index created.
SQL> insert into tab(username, type_a, type_b, created)
2 select 'OTHER_USER', '2', '2', date '2015-06-01'
3 from all_objects, all_objects where rownum <= 1e5;
100000 rows created.
SQL>
SQL> update tab
2 set username = 'MYUSER',
3 created = DATE '2016-06-01',
4 type_a = '1'
5 where id = 50000;
1 row updated.
SQL> commit;
Commit complete.
SQL> begin
2 dbms_stats.gather_table_stats(ownname => USER,
3 tabname => 'TAB',
4 estimate_percent => 100,
5 method_opt => 'FOR ALL INDEXED COLUMNS'
6 );
7 end;
8 /
PL/SQL procedure successfully completed.
SQL>
SQL> set autotrace traceonly exp
SQL>
SQL> SELECT id, USERNAME, TYPE_A, TYPE_B, CREATED FROM tab
2 where username = 'MYUSER'
3 AND created >= TO_DATE('2016-01-01','YYYY-MM-DD')
4 AND created <= TO_DATE('2016-06-30','YYYY-MM-DD')
5 AND TYPE_A = '1'
6 order by ID desc;
Execution Plan
----------------------------------------------------------
Plan hash value: 3658386757
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 29 | 5 (20)| 00:00:01 |
| 1 | SORT ORDER BY | | 1 | 29 | 5 (20)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| TAB | 1 | 29 | 4 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | SECOND_IDX | 1 | | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------
In this case the reason to use the second index is the extremely high selectivity (one row out of 100000).
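To see how selective the predicate is on your own data, you could count the matching rows against the total. This is a small sketch using the TAB table from this example; the column aliases are illustrative.

```sql
-- Autotrace "traceonly" suppresses query results, so switch it off first.
set autotrace off

select count(*) as total_rows,
       count(case
               when username = 'MYUSER'
                and created >= date '2016-01-01'
                and created <= date '2016-06-30'
                and type_a = '1'
               then 1
             end) as matching_rows
  from tab;
```

With the data loaded above this should report 1 matching row out of 100000. The larger the matching share gets, the less attractive the range scan plus sort becomes compared with walking the primary key index in order.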
The short answer is no, but we can't give you a general rule, because each case depends on many variables. For a more specific answer, include the explain plan of this query so we get a better picture of why it doesn't use the index.
Oracle will be able to use such an index as long as the ID column is specified first.
You shouldn't add indexes for selects that run only rarely, or for those that are slow but not too slow. Add indexes only for the most common selects/updates on this table.
If a select with filters on col1 and col2 runs repeatedly, then most likely (again, I don't know what else you do with this table) an index on all 3 columns will be better:
(ID,Col1,Col2)
There is an example of using function-based indexes in the Oracle 11g Concepts documentation:
A function-based index is also useful for indexing only specific rows
in a table. For example, the cust_valid column in the sh.customers
table has either I or A as a value. To index only the A rows, you
could write a function that returns a null value for any rows other
than the A rows.
I can imagine only this use case: reducing the size of the index by excluding some rows based on a condition. Are there other use cases where this feature is useful?
Let's take a look at function-based indexes:
SQL> create table tab1 as select object_name from all_objects;
Table created.
SQL> exec dbms_stats.gather_table_stats(user, 'TAB1');
PL/SQL procedure successfully completed.
SQL> set autotrace traceonly
SQL> select count(*) from tab1 where lower(object_name) = 'all_tables';
Execution Plan
----------------------------------------------------------
Plan hash value: 1117438016
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 19 | 18 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 19 | | |
|* 2 | TABLE ACCESS FULL| TAB1 | 181 | 3439 | 18 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(LOWER("OBJECT_NAME")='all_tables')
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
63 consistent gets
...
The predicate matches very few rows, but without an index Oracle has to scan the whole table (the 181 in the Rows column is only the optimizer's estimate of matching rows) and performs 63 consistent gets (logical block reads).
Let's create a function-based index:
SQL> create index tab1_obj_name_idx on tab1(lower(object_name));
Index created.
SQL> select count(*) from tab1 where lower(object_name) = 'all_tables';
Execution Plan
----------------------------------------------------------
Plan hash value: 707634933
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 17 | 1 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 17 | | |
|* 2 | INDEX RANGE SCAN| TAB1_OBJ_NAME_IDX | 181 | 3077 | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access(LOWER("OBJECT_NAME")='all_tables')
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
2 consistent gets
...
As you can see, the cost drops dramatically (from 18 to 1) and there are only 2 consistent gets.
So function-based indexes can noticeably improve the performance of queries that filter on an expression.
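The use case quoted in the question (indexing only the A rows) can be sketched along the same lines. This assumes the sh.customers sample table from the quoted passage is available; the index name is hypothetical.

```sql
-- Index only the rows where cust_valid = 'A'. For every other row the
-- expression is NULL, and keys that are entirely NULL are not stored
-- in a B-tree index, so those rows take no space in it.
create index cust_valid_a_idx
  on sh.customers (case cust_valid when 'A' then 'A' end);

-- The query has to use the same expression for the index to be considered.
select count(*)
  from sh.customers
 where case cust_valid when 'A' then 'A' end = 'A';
```

So the two use cases are related but different: the LOWER() index above speeds up searches on a computed value for all rows, while the CASE index keeps the index small by covering only a subset of rows.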
I executed a SQL statement and got a confusing result. I can't understand how this output is produced.
My employee table is below; emp_id is the primary key and dept_no is a foreign key to some other table.
EMP_ID EMP_NAME DEPT_NO MGR_NAME MGR_NO
---------- -------------------- ---------- ---------- -----------
111 Anish 121 Tanuj 1123
112 Aman 122 Jasmeet 1234
1123 Tanuj 122 Vipul 122
1234 Jasmeet 122 Anish 111
122 Vipul 123 Aman 112
100 Chetan 123 Anoop 666
101 Antal Aman
1011 Anjali 126
1111 Angelina 127
My dep1 table is:
DEPT_ID DEPT_NAME
---------- -------------
121 CSE
122 ECE
123 MEC
And the two tables are not linked at all.
The SQL Query is:
SQL> select emp_name
from employee
where dept_no IN (select dept_no from dep1 where dept_name='MEC');
And the output is:
EMP_NAME
--------------------
Anish
Aman
Tanuj
Jasmeet
Vipul
Chetan
Anjali
Angelina
8 rows selected.
And if I change the where condition to dept_name='me', it returns no rows.
Can someone explain why the statement doesn't raise an error, since dept_no is not a column of the dep1 table? And how is the output being generated?
If you run this query:
select emp_name
from employee
where dept_no IN (select t.dept_no from dep1 t where dept_name='MEC');
you will see the error. In your query, dept_no therefore comes from the employee table (not from dep1). Rows where dept_no is null are not returned, because a null never satisfies IN. And if you change dept_name to a value that is not in dep1, the subquery returns no rows, so dept_no cannot be in an empty set.
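Writing the query with explicit aliases makes the name resolution visible. The sketch below uses the employee and dep1 tables from the question; the aliases e and d are just illustrative.

```sql
-- What the original query effectively means: the unqualified dept_no in
-- the subquery resolves to the outer table, because dep1 has no such column.
select e.emp_name
  from employee e
 where e.dept_no in (select e.dept_no
                       from dep1 d
                      where d.dept_name = 'MEC');

-- What was probably intended: compare against dep1's own column.
select e.emp_name
  from employee e
 where e.dept_no in (select d.dept_id
                       from dep1 d
                      where d.dept_name = 'MEC');
```

With the sample data, the first form returns every employee with a non-null dept_no (the 8 rows in the question), while the second returns only the employees whose dept_no is 123. Qualifying every column with its table alias is a cheap way to turn this kind of mistake into an ORA-00904 error.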
In your query,
..where dept_no IN (select dept_no ...); -- this behaves like an EXISTS
Oracle evaluates it as an EXISTS condition, so no error is raised as long as the column can be resolved against the outer table:
CREATE TABLE my_test(ID INT);
CREATE TABLE my_new_test ( new_ID INT);
EXPLAIN PLAN FOR
select * from my_test where id in( select id from my_new_test);
select * from table(dbms_xplan.display);
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 4 (0)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | TABLE ACCESS FULL | MY_TEST | 1 | 13 | 2 (0)| 00:00:01 |
|* 3 | FILTER | | | | | |
| 4 | TABLE ACCESS FULL| MY_NEW_TEST | 1 | | 2 (0)| 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( EXISTS (SELECT 0 FROM "MY_NEW_TEST" "MY_NEW_TEST" WHERE
:B1=:B2))
3 - filter(:B1=:B2)
Note
-----
- dynamic sampling used for this statement (level=2)
If you generate the plan using a valid column of the subquery table (here new_id),
then a normal join access is shown:
1 - ACCESS("ID"="NEW_ID")
A column that exists in neither table causes an error:
EXPLAIN PLAN FOR
select * from my_test where id in( select some_thing from my_new_test);
SQL Error: ORA-00904: "SOME_THING": invalid identifier
Let me try to answer it.
Oracle uses the optimizer to choose the execution plan. It may rewrite your query into a form it considers better. IN and EXISTS are often interchangeable, and which one performs better depends on the data and the available indexes.
Let me take your case. Below is the explain plan for your query:
Plan hash value: 3333342911
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9 | 225 | 6 (0)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | TABLE ACCESS FULL | EMPLOYEE | 9 | 225 | 3 (0)| 00:00:01 |
|* 3 | FILTER | | | | | |
|* 4 | TABLE ACCESS FULL| DEPARTMENT | 1 | 12 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( EXISTS (SELECT 0 FROM "DEPARTMENT" "DEPARTMENT" WHERE
:B1=:B2 AND "DEPT_NAME"='MEC'))
3 - filter(:B1=:B2)
4 - filter("DEPT_NAME"='MEC')
Note
-----
- dynamic sampling used for this statement (level=2)
This explain plan clearly shows that the query is rewritten to use EXISTS and is equivalent to
select emp_name from employee where exists (select 0 from department where dept_name = 'MEC' and dept_no = dept_no);
The above query is valid, and you are getting the correct results for it.
The bind variables :B1 and :B2 are both simply dept_no from the employee table (the column you intended to join on).
Refer to this IN vs EXISTS in oracle link to learn more about IN and EXISTS.
The explain plan is completely different if you use the correct column name. Below are the query and its explain plan.
Query:
select emp_name from employee where dept_no IN (select dept_id from department where dept_name='MEC');
Explain Plan:
Plan hash value: 3817251802
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 100 | 7 (15)| 00:00:01 |
|* 1 | HASH JOIN SEMI | | 2 | 100 | 7 (15)| 00:00:01 |
| 2 | TABLE ACCESS FULL| EMPLOYEE | 9 | 225 | 3 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| DEPARTMENT | 1 | 25 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("DEPT_NO"="DEPT_ID")
3 - filter("DEPT_NAME"='MEC')
Note
-----
- dynamic sampling used for this statement (level=2)
Here Oracle decides it is better to use a filter and a hash semi join to get the required rows.
This behaviour depends on Oracle's query parser and optimizer.
How about a join?
select a.emp_name
from employee a
join dep1 b
on a.dept_no = b.dept_id
where b.dept_name = 'MEC'