I have a procedure , in which a table's columns is being filled using sum and nvl functions on other tables' column. These update queries are slow and which is making overall Proc slow.One of such update query is below:
UPDATE t_final wp
SET PCT =
(
SELECT SUM(NVL(pct,0))
FROM t_overall
WHERE rid = 9
AND rtype = 1
AND sid = 'r12'
AND pid = 21
AND mid = wp.mid
)
WHERE rid = 9 AND rtype = 1 AND sid = 'r12' AND pid = 21;
Here t_overall and t_final , both the tables do not have any indexes as they have multiple updates in the overall procedure. Number of records for table t_final is around 8500 and for table t_overall is around 13000. Is there any other way , I can write above query in more optimized way?
Edit 1: Here SUM(NVL(pct,0)) function is first replacing null to 0 in 'pct' column of table t_overall and then adds all pct values using sum function and updates pct column of the table t_final depending on the criteria.
Explain plan returns below:
OPERATION OBJECT_NAME CARDINALITY COST
UPDATE STATEMENT 6 424
UPDATE T_FINAL
TABLE ACCESS(FULL) T_FINAL 6 238
. Filter Predicates
. AND
. RTYPE=6
. SID='R12'
. RID=9
. PID=21
SORT(AGGREGATE) 1
TABLE ACCESS(FULL) T_OVERALL 1 30
Filter Predicates
AND
MID-:B1
RTYPE=6
SID='R12'
RID=9
PID=21
Updated number of rows are around 2200
Edit 2: I have run update query with hint /*+ gather_plan_statistics */ as below:
ALTER session SET statistics_level=ALL;
UPDATE /*+ gather_plan_statistics */ t_final wp
SET PCT =
(
SELECT SUM(NVL(pct,0))
FROM t_overall
WHERE rid = 9
AND rtype = 1
AND sid = 'r12'
AND pid = 21
AND mid = wp.mid
)
WHERE rid = 9 AND rtype = 1 AND sid = 'r12' AND pid = 21;
select * from
table (dbms_xplan.display_cursor (format=>'ALLSTATS LAST'));
The result is:
SQL_ID gypnfv5nzurb0, child number 1
-------------------------------------
select child_number from v$sql where sql_id = :1 order by
child_number
Plan hash value: 4252345203
---------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | OMem | 1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 2 |00:00:00.01 | | | |
| 1 | SORT ORDER BY | | 1 | 1 | 2 |00:00:00.01 | 2048 | 2048 | 2048 (0)|
|* 2 | FIXED TABLE FIXED INDEX| X$KGLCURSOR_CHILD (ind:2) | 1 | 1 | 2 |00:00:00.01 | | | |
---------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(("KGLOBT03"=:1 AND "INST_ID"=USERENV('INSTANCE')))
Thank you.
You did not provide enough information to make unique diagnose, so I can only hint you how to troubleshoot your query.
Here is my setup simulation your data
create table t_final as
select rownum mid, 8 + mod(rownum,4) rid, 1 rtype, 'r12' sid, 21 pid, 0 pct from dual
connect by level <= 8800;
drop table T_OVERALL;
create table T_OVERALL as
select mod(rownum,8800) mid, 8 + mod(rownum,4) rid, 1 rtype, 'r12' sid, 21 pid, rownum pct from dual
connect by level <= 13000;
Now I run the query activating the statistics gathering to see what the query is doing:
SQL> UPDATE /*+ gather_plan_statistics */ t_final wp
2 SET PCT =
3 (
4 SELECT SUM(NVL(pct,0))
5 FROM t_overall
6 WHERE rid = 9
7 AND rtype = 1
8 AND sid = 'r12'
9 AND pid = 21
10 AND mid = wp.mid
11 )
12 WHERE rid = 9 AND rtype = 1 AND sid = 'r12' AND pid = 21;
2200 rows updated.
Elapsed: 00:00:00.97
So nearly one second elapsed time, which is is slow if you have lot of such updates. To see the cause we display the cursor and the statsitics (hist is possible using the hint /*+ gather_plan_statistics */)
SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------------------
SQL_ID 3ctaz5gvksb54, child number 0
-------------------------------------
UPDATE /*+ gather_plan_statistics */ t_final wp SET PCT = (
SELECT SUM(NVL(pct,0)) FROM t_overall WHERE rid
= 9 AND rtype = 1 AND sid = 'r12' AND pid =
21 AND mid = wp.mid ) WHERE rid = 9 AND rtype =
1 AND sid = 'r12' AND pid = 21
Plan hash value: 1255260726
-------------------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | 1 | | 0 |00:00:00.96 | 116K|
| 1 | UPDATE | T_FINAL | 1 | | 0 |00:00:00.96 | 116K|
|* 2 | TABLE ACCESS FULL | T_FINAL | 1 | 2200 | 2200 |00:00:00.01 | 33 |
| 3 | SORT AGGREGATE | | 2200 | 1 | 2200 |00:00:00.92 | 112K|
|* 4 | TABLE ACCESS FULL| T_OVERALL | 2200 | 33 | 3250 |00:00:00.85 | 112K|
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------------------
2 - filter(("RID"=9 AND "RTYPE"=1 AND "PID"=21 AND "SID"='r12'))
4 - filter(("RID"=9 AND "RTYPE"=1 AND "PID"=21 AND "MID"=:B1 AND "SID"='r12'))
So you see the main problem was in the FULL TABLE SCAN on T_OVERALL which was called 2200 times (columns Starts, line 4).
A remedy could provide an Index based on the filter predicate of line 4:
create index T_OVERALL_IDX on T_OVERALL(mid, rid, rtype, sid, pid);
On the same data now I got:
Elapsed: 00:00:00.05
with the changed plan using now 2200 INDEX RANGE SCANs
---------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | 1 | | 0 |00:00:00.05 | 10272 |
| 1 | UPDATE | T_FINAL | 1 | | 0 |00:00:00.05 | 10272 |
|* 2 | TABLE ACCESS FULL | T_FINAL | 1 | 2200 | 2200 |00:00:00.01 | 33 |
| 3 | SORT AGGREGATE | | 2200 | 1 | 2200 |00:00:00.01 | 5755 |
| 4 | TABLE ACCESS BY INDEX ROWID| T_OVERALL | 2200 | 33 | 3250 |00:00:00.01 | 5755 |
|* 5 | INDEX RANGE SCAN | T_OVERALL_IDX | 2200 | 1 | 3250 |00:00:00.01 | 2505 |
---------------------------------------------------------------------------------------------------------
Simple recheck the same approach with your data, if you observe a different behavior feel free to post it.
Related
In the example below Oracle's optimizer's estimated rows is incorrect by two orders of magnitude. How do I improve the estimated rows?
Table A has rows with numbers 1 through 1,000 for each of the 10 letters A through J.
Table C has 100 copies of table A.
So, table A has a cardinality of 10K and table C has a cardinality of 1M.
A given single-valued predicate on the number in table A will yield 1/1000 of the rows in table A (same for table C).
A given single-valued predicate on the letter in table A will yield 1/10 of the rows in table A (same for table C).
Setup script.
drop table C;
drop table A;
create table A
( num NUMBER
, val VARCHAR2(3 byte)
, pad CHAR(40 byte)
)
;
insert /*+ append enable_parallel_dml parallel (auto) */
into A (num, val, pad)
select mod(level-1, 1000) +1
, chr(mod(ceil(level/1000) - 1, 10) + ascii('A'))
, ' '
from dual
connect by level <= 10*1000
;
create table C
( id NUMBER
, num NUMBER
, val VARCHAR2(3 byte)
, pad CHAR(40 byte)
)
;
insert /*+ append enable_parallel_dml parallel (auto) */
into C (id, num, val, pad)
with
"D1" as
( select /*+ materialize */ null from dual connect by level <= 100 --320
)
, "D" as
( select /*+ materialize */
level rn
, mod(level-1, 1000) + 1 num
, chr(mod(ceil(level/1000) - 1, 10) + ascii('A')) val
, ' ' pad
from dual
connect by level <= 10*1000
order by 1 offset 0 rows
)
select rownum id
, num num
, val val
, pad pad
from "D1", "D"
;
commit;
exec dbms_stats.gather_table_stats(OwnName => null, TabName => 'A', cascade => true);
exec dbms_stats.gather_table_stats(OwnName => null, TabName => 'C', cascade => true);
Consider the explain plan to the following query.
select *
from A
join C
on A.num = C.num
and A.val = C.val
where A.num = 1
and A.val = 'A'
;
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100 | 9900 | 2209 (1)| 00:00:01 |
|* 1 | HASH JOIN | | 100 | 9900 | 2209 (1)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| A | 1 | 47 | 23 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| C | 100 | 5200 | 2185 (1)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."NUM"="C"."NUM" AND "A"."VAL"="C"."VAL")
2 - filter("A"."NUM"=1 AND "A"."VAL"='A')
3 - filter("C"."NUM"=1 AND "C"."VAL"='A')
The row cardinality of each step makes sense to me.
ID=2 --> (1/1,000) * (1/10) * 10,000 = 1
ID=3 --> (1/1,000) * (1/10) * 1,000,000 = 100
ID=1 --> 100 is correct. Predicates in ID=2 and ID=3 are the same, every row from ID=2 will have one and only one match in the row source from ID=3.
Now consider the explain plan to the slightly modified query below.
select *
from A
join C
on A.num = C.num
and A.val = C.val
where A.num in(1,2)
and A.val = 'A'
;
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 198 | 2209 (1)| 00:00:01 |
|* 1 | HASH JOIN | | 2 | 198 | 2209 (1)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| A | 2 | 94 | 23 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| C | 200 | 10400 | 2185 (1)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."NUM"="C"."NUM" AND "A"."VAL"="C"."VAL")
2 - filter("A"."VAL"='A' AND ("A"."NUM"=1 OR "A"."NUM"=2))
3 - filter("C"."VAL"='A' AND ("C"."NUM"=1 OR "C"."NUM"=2))
The row cardinality of each step ID=2 and ID=3 makes sense to me, but now ID=1 is incorrect by two orders of magnitude.
ID=2 --> (1/1,000)(1/10) * 10,000 = 1
ID=3 --> (1/1,000)(1/10) * 1,000,000 = 100
ID=1 --> The optimizer's estimate is two orders of magnitude different from the actual.
Adding unique and foreign constraints and extended statistics did not improve the estimated row counts.
create unique index IU_A on A (num, val);
alter table A add constraint UK_A unique (num, val) rely using index IU_A enable validate;
alter table C add constraint R_C foreign key (num, val) references A (num, val) rely enable validate;
create index IR_C on C (num, val);
select dbms_stats.create_extended_stats(null,'A','(num, val)') from dual;
select dbms_stats.create_extended_stats(null,'C','(num, val)') from dual;
exec dbms_stats.gather_table_stats(OwnName => null, TabName => 'A', cascade => true);
exec dbms_stats.gather_table_stats(OwnName => null, TabName => 'C', cascade => true);
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 198 | 10 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 2 | 198 | 10 (0)| 00:00:01 |
| 3 | INLIST ITERATOR | | | | | |
| 4 | TABLE ACCESS BY INDEX ROWID| A | 2 | 94 | 5 (0)| 00:00:01 |
|* 5 | INDEX UNIQUE SCAN | IU_A | 2 | | 3 (0)| 00:00:01 |
|* 6 | INDEX RANGE SCAN | IR_C | 1 | | 2 (0)| 00:00:01 |
| 7 | TABLE ACCESS BY INDEX ROWID | C | 1 | 52 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access(("A"."NUM"=1 OR "A"."NUM"=2) AND "A"."VAL"='A')
6 - access("A"."NUM"="C"."NUM" AND "C"."VAL"='A')
filter("C"."NUM"=1 OR "C"."NUM"=2)
What do I need to do to make the estimated rows better match reality?
Using Oracle Enterprise Edition 19c.
Thanks in advance.
Edit
After ensuring the most recent optimizer_features_enable was used and modifying one of the predicates, we still have an explain plan whose estimated row count is short by two orders of magnitude.
ID=6 ought to have an estimated rows of 100. It seems it is applying the predicate factor twice. Once for the access and again for the filter.
select /*+ optimizer_features_enable('19.1.0') */
*
from A
join C
on A.num = C.num
and A.val = C.val
where A.num in(1,2)
and A.val in('A','B')
;
-----------------------------------------------------------------------------------------------
| id | Operation | name | rows | Bytes | cost (%CPU)| time |
-----------------------------------------------------------------------------------------------
| 0 | select statement | | 4 | 396 | 16 (0)| 00:00:01 |
| 1 | nested LOOPS | | 4 | 396 | 16 (0)| 00:00:01 |
| 2 | nested LOOPS | | 4 | 396 | 16 (0)| 00:00:01 |
| 3 | INLIST ITERATOR | | | | | |
| 4 | table access by index ROWID BATCHED| A | 4 | 188 | 7 (0)| 00:00:01 |
|* 5 | index range scan | IU_A | 4 | | 3 (0)| 00:00:01 |
|* 6 | index range scan | IR_C | 1 | | 2 (0)| 00:00:01 |
| 7 | table access by index ROWID | C | 1 | 52 | 3 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("A"."NUM"=1 or "A"."NUM"=2)
filter("A"."VAL"='A' or "A"."VAL"='B')
6 - access("A"."NUM"="C"."NUM" and "A"."VAL"="C"."VAL")
filter(("C"."NUM"=1 or "C"."NUM"=2) and ("C"."VAL"='A' or "C"."VAL"='B'))
I have a big table (about 4.6 million records) with many columns, I have concatenated some columns and inserted it in clob column(one column have an alias of the name so the name came in a different way and it inserted in clob column )and that column with ctxsys.context index.
I searched with contains a function with the fuzzy operator and for performance, I added tow columns to search and those columns with a bitmap index
and analyzed the table by using DBMS_STATS.GATHER_TABLE_STATS() also I alter my index to take parallel with 4 degrees and increase SORT_AREA_SIZE to 8300000.
My problem is when I searched it's taken from 2 to 5 min to executed.
is there any way to improve performance and reduce time execution(another algorithm to speed searching or I can change the structure of my table by increase the columns and search in multiple columns).
Here is my query:
SELECT first_name,
last_name,
countries,
category,
aliases
FROM (SELECT first_name,
last_name,
countries,
category,
aliases,
rr
FROM (SELECT T.u_id,
T.first_name,
T.last_name,
T.countries,
T.category,
T.aliases,
ROWNUM rr,
all_data
FROM tbl_rsk_list_world T
WHERE t.countries = 'SPAIN'
AND category = 'Eng')
WHERE Contains(all_data, 'fuzzy(JOSE,60,,weight)', 1) > 0)
WHERE rr BETWEEN 1 AND 500
The Execution plan is:
SQL> select * from TABLE(DBMS_XPLAN.DISPLAY);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2747287528
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 20651 | 109M| 5724 (1
|* 1 | VIEW | | 20651 | 109M| 5724 (1
| 2 | COUNT | | | |
| 3 | PX COORDINATOR | | | |
| 4 | PX SEND QC (RANDOM)| :TQ10000 | 20651 | 4638K| 5724 (1
| 5 | PX BLOCK ITERATOR | | 20651 | 4638K| 5724 (1
|* 6 | TABLE ACCESS FULL| TBL_RSK_LIST_WORLD | 20651 | 4638K| 5724 (1
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("CTXSYS"."CONTAINS"("ALL_DATA",'fuzzy(jose,60,,weight)',1)>0 AND "
AND "from$_subquery$_002"."RR">=1)
6 - filter("COUNTRIES"='SPAIN' AND "CATEGORY"='Eng')
20 rows selected
when I using FIRST_ROWS and DOMAIN_INDEX_NO_SORT hints the execution plan be:
SQL> select * from TABLE(DBMS_XPLAN.DISPLAY);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1488722846
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 50 | 173K|
|* 1 | VIEW | | 50 | 173K|
| 2 | COUNT | | | |
| 3 | TABLE ACCESS BY INDEX ROWID| TBL_RSK_LIST_WORLD | 50 | 11500 |
|* 4 | DOMAIN INDEX | NDX_RSK_LIST_WORLD_CTX | | |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("CATEGORY"='Eng' AND "COUNTRIES"='SPAIN' AND
"from$_subquery$_002"."RR"<=500 AND "from$_subquery$_002"."RR">=1)
4 - access("CTXSYS"."CONTAINS"("W"."ALL_DATA",'fuzzy(jose,60,,weight)',1)>0)
18 rows selected
but still the performance bad :\
I'm doing a basic performance check using both Connect By and a user-defined function to get a parent value. It seems that using a user-defined function performs better than the Connect By query.
I would like to know if using the user-defined function is supposed to be better performing as compared to Connect By.
create table org ( pid number, cid number, type varchar2(10), name varchar2(30) );
alter table org add constraint org_pk primary key ( cid ); -- UPDATE#2
insert into org values (null,1,'MGT','OP');
insert into org values (1,2,'DEP','HR');
insert into org values (1,3,'DEP','IT');
insert into org values (3,4,'DIV','WEB');
insert into org values (3,5,'DIV','DB');
insert into org values (4,6,'SEC','HTML');
insert into org values (4,7,'SEC','JAVA');
create or replace function get_dep ( p_cid in number ) return number
is
l_pid number;
l_cid number;
l_type varchar2(30);
begin
select pid
, cid
, type
into l_pid
, l_cid
, l_type
from org
where cid = p_cid;
if ( l_type = 'MGT' ) then
return null;
elsif ( l_type = 'DEP' ) then
return l_cid;
else
return get_dep ( l_pid );
end if;
end;
/
select cid --correction
from org
where type = 'DEP'
start
with cid = 7
connect
by
prior pid = cid
and
prior type != 'DEP'
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 66 | 6 (17)| 00:00:01 |
|* 1 | FILTER | | | | | |
|* 2 | CONNECT BY NO FILTERING WITH START-WITH| | | | | |
| 3 | TABLE ACCESS FULL | ORG | 7 | 231 | 5 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------
select get_dep ( cid )
from org
where cid = 7;
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 5 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| ORG | 1 | 13 | 5 (0)| 00:00:01 |
--------------------------------------------------------------------------
UPDATE #1:
I updated the function to add a logic to return null if id is MGT.
Also, change the queries to fetch all records in the table.
select cid, ( select cid
from org
where type = 'DEP'
start
with cid = m.cid
connect
by
prior pid = cid
and
prior type != 'DEP' ) dep
from org m;
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 7 | 91 | 10 (0)| 00:00:01 |
|* 1 | FILTER | | | | | |
|* 2 | CONNECT BY NO FILTERING WITH START-WITH| | | | | |
| 3 | TABLE ACCESS FULL | ORG | 7 | 231 | 5 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL | ORG | 7 | 91 | 5 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------
select cid, get_dep ( cid ) dep
from org;
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 7 | 91 | 5 (0)| 00:00:01 |
| 1 | TABLE ACCESS FULL| ORG | 7 | 91 | 5 (0)| 00:00:01 |
--------------------------------------------------------------------------
UPDATE #2: Added index as suggested. The explain plan improved on both but the query with the user-defined function still performs better based on the explain plan (unless I'm not interpreting the plan correctly).
select cid, ( select cid
from org
where type = 'DEP'
start
with cid = m.cid
connect
by
prior pid = cid
and
prior type != 'DEP' ) dep
from org m;
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 7 | 91 | 4 (0)| 00:00:01 |
|* 1 | FILTER | | | | | |
|* 2 | CONNECT BY WITH FILTERING | | | | | |
| 3 | TABLE ACCESS BY INDEX ROWID | ORG | 1 | 33 | 1 (0)| 00:00:01 |
|* 4 | INDEX UNIQUE SCAN | ORG_PK | 1 | | 0 (0)| 00:00:01 |
| 5 | NESTED LOOPS | | 1 | 53 | 2 (0)| 00:00:01 |
|* 6 | CONNECT BY PUMP | | | | | |
| 7 | TABLE ACCESS BY INDEX ROWID| ORG | 1 | 33 | 1 (0)| 00:00:01 |
|* 8 | INDEX UNIQUE SCAN | ORG_PK | 1 | | 0 (0)| 00:00:01 |
| 9 | INDEX FULL SCAN | ORG_PK | 7 | 91 | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------
select cid, get_dep ( cid ) dep
from org;
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 7 | 91 | 1 (0)| 00:00:01 |
| 1 | INDEX FULL SCAN | ORG_PK | 7 | 91 | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------
Appreciate any feedback.
First of all, in your example, SQL and PL/SQL return different results.
SQL> select pid
2 from org
3 where type = 'DEP'
4 start
5 with cid = 7
6 connect
7 by
8 prior pid = cid
9 and
10 prior type != 'DEP';
PID
----------
1
SQL>
SQL> select get_dep ( cid )
2 from org
3 where cid = 7;
GET_DEP(CID)
------------
3
Secondly, it does not really make sense to compare different approaches on such extremely small data volumes.
Let's assume we have a tree with depth 999 999 and want to find a root for a given node.
In my example there is only one tree (which is actually a list since each parent has one child) therefore root is the same for all nodes.
The important thing is: the bigger depth of a given ID the longer execution time.
create table org0 ( pid number, cid number, name varchar2(30) );
insert into org0
select rownum, rownum+1, 'name' || rpad(rownum,25,'#')
from dual
connect by rownum < 1e6;
alter table org0 add constraint org0_pk primary key ( cid );
Function for returning the root
create or replace function get_id(p_cid in number) return number is
l_pid number;
begin
select pid into l_pid from org0 where cid = p_cid;
return get_id(l_pid);
exception
when no_data_found then
return p_cid;
end get_id;
/
Testing
SQL
SQL> select pid id
2 from org0
3 where connect_by_isleaf = 1
4 start with cid = 10000
5 connect by prior pid = cid;
ID
----------
1
Elapsed: 00:00:00.07
SQL>
SQL> select pid id
2 from org0
3 where connect_by_isleaf = 1
4 start with cid = 100000
5 connect by prior pid = cid;
ID
----------
1
Elapsed: 00:00:00.55
SQL>
SQL> select pid id
2 from org0
3 where connect_by_isleaf = 1
4 start with cid = 1000000
5 connect by prior pid = cid;
ID
----------
1
Elapsed: 00:00:05.79
PL/SQL
SQL> select get_id(10000) id from dual;
ID
----------
1
Elapsed: 00:00:00.15
SQL> select get_id(100000) id from dual;
ID
----------
1
Elapsed: 00:00:01.47
SQL> select get_id(1000000) id from dual;
ID
----------
1
Elapsed: 00:00:14.83
As you can see, PL/SQL is approximately 2 times slower.
In some specific cases PL/SQL may be faster though (not for your task).
You can read about fine grained performance analysis and using tools like dbms_hprof in this book Oracle SQL Revealed, chapter "When PL/SQL Is Better Than Vanilla SQL".
The hierarchical queries often lead to suboptimal performance. Frequent use of PL/SQL functions additionally introduce the problem of context switch.
One possible approach to get performance of a hierarchical query comparable with a single row index access is to define a materialize view that pre-calculates the query.
I’m using the identical data from the #Dr Y Wit answer.
create materialized view mv_org as
select
CID, PID, NAME, CONNECT_BY_ROOT PID ROOT_PID
from org0
start with pid in (
select pid from org0
MINUS
select cid from org0
)
connect by prior cid = pid;
Note that the MV contain the original data and adds the column PID_ROOT which is the pre-calculated root key.
CID PID NAME ROOT_PID
---------- ---------- ------------------------------ ----------
2 1 name1######################## 1
3 2 name2######################## 1
4 3 name3######################## 1
....
The performance of the queries is fine, as there is no need to do hierarchical query any more.
SQL> select pid id
2 from org0
3 where connect_by_isleaf = 1
4 start with cid = 1000000
5 connect by prior pid = cid;
ID
----------
1
Elapsed: 00:00:05.07
SQL> select root_pid from mv_org where cid = 1000000;
ROOT_PID
----------
1
Elapsed: 00:00:00.01
So if you can manage the changes in the hierarchical table in regular windows (say once per day or month) and performs refresh of the MV, you localize the complexity (and performance load) in this refresh and you regular queries are fast.
SQL> exec DBMS_MVIEW.REFRESH ('mv_org','c');
Elapsed: 00:00:27.58
What is the best practice to convert the following sql statement using a subquery (with data as clause) to use it in a database view.
AFAIK the with data as clause is not supported in database views (Edited: Oracle supports Common Table Expressions), but in my case the subquery factoring offers advantage for performance. If I create a database view using Common Table Expression, than this advantage is lost.
Please have a look at my example:
Description of query
a_table
Millions of entries, by the select statement a few thousand are selected.
anchor_table
For each entry in a_table exists a corresponding entry in anchor_table. By this table is determined at runtime exactly one row as anchor. See example below.
horizon_table
For each selection exactly one entry is determined at runtime (all entries of a selection of a_table have the same horizon_id)
Please notice: This is a strongly simplified sql that works fine so far.
In reality more than 20 tables are joined together to get the results of data.
The where clause is much more complex.
Further columns of horizon_table and anchor_table are required to prepare my where condition and result list in the subquery, i.e. moving these tables to the main query is no solution.
with data as (
select
a_table.id,
a_table.descr,
horizon_table.offset,
case
when anchor_table.a_date = trunc(sysdate) then
1
else
0
end as anchor,
row_number() over(
order by a_table.a_position_field) as position
from a_table
join anchor_table on (anchor_table.id = a_table.anchor_id)
join horizon_table on (horizon_table.id = a_table.horizon_id)
where a_table.a_value between 1 and 10000
)
select *
from data d
where d.position between (
select d1.position - d.offset
from data d1
where d1.anchor = 1)
and (
select d2.position + d.offset
from data d2
where d2.anchor = 1)
example of with data as select:
id descr offset anchor position
1 bla 3 0 1
2 blab 3 0 2
5 dfkdj 3 0 3
4 dld 3 0 4
6 oeroe 3 1 5
3 blab 3 0 6
9 dfkdj 3 0 7
14 dld 3 0 8
54 oeroe 3 0 9
...
result of select * from data
id descr offset anchor position
2 blab 3 0 2
5 dfkdj 3 0 3
4 dld 3 0 4
6 oeroe 3 1 5
3 blab 3 0 6
9 dfkdj 3 0 7
14 dld 3 0 8
I.E. the result is the anchor row and the tree rows above and below.
How can I achieve the same within a database view?
My attempt failed as I expected by performance issues:
Create a view data of with data as select above
Use this view as above
select *
from data d
where d.position between (
select d1.position - d.offset
from data d1
where d1.anchor = 1)
and (
select d2.position + d.offset
from data d2
where d2.anchor = 1)
Thank you for any advice :-)
Amendment
If I create a view as recommended in first comment, than I get the same performance issue. Oracle does not use the subquery to restrict the results.
Here are the execution plans of my production queries (please click at the images)
a) SQL
b) View
Here are the execution plans of my test cases
-- Create Testdata table with ~ 1,000,000 entries
insert into a_table
(id, descr, a_position_field, anchor_id, horizon_id, a_value)
select level, 'data' || level, mod(level, 10), level, 1, level
from dual
connect by level <= 999999;
insert into anchor_table
(id, a_date)
select level, trunc(sysdate) - 500000 + level
from dual
connect by level <= 999999;
insert into horizon_table (id, offset) values (1, 50);
commit;
-- Create view
create or replace view testdata_vw as
with data as
(select a_table.id,
a_table.descr,
a_table.a_value,
horizon_table.offset,
case
when anchor_table.a_date = trunc(sysdate) then
1
else
0
end as anchor,
row_number() over(order by a_table.a_position_field) as position
from a_table
join anchor_table
on (anchor_table.id = a_table.anchor_id)
join horizon_table
on (horizon_table.id = a_table.horizon_id))
select *
from data d
where d.position between
(select d1.position - d.offset from data d1 where d1.anchor = 1) and
(select d2.position + d.offset from data d2 where d2.anchor = 1);
-- Explain plan of subquery factoring select statement
explain plan for
with data as
(select a_table.id,
a_table.descr,
a_value,
horizon_table.offset,
case
when anchor_table.a_date = trunc(sysdate) then
1
else
0
end as anchor,
row_number() over(order by a_table.a_position_field) as position
from a_table
join anchor_table
on (anchor_table.id = a_table.anchor_id)
join horizon_table
on (horizon_table.id = a_table.horizon_id)
where a_table.a_value between 500000 - 500 and 500000 + 500)
select *
from data d
where d.position between
(select d1.position - d.offset from data d1 where d1.anchor = 1) and
(select d2.position + d.offset from data d2 where d2.anchor = 1);
select plan_table_output
from table(dbms_xplan.display('plan_table', null, null));
/*
Note: Size of SYS_TEMP_0FD9D6628_284C5768 ~ 1000 rows
Plan hash value: 1145408420
----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 62 | 1791 (2)| 00:00:31 |
| 1 | TEMP TABLE TRANSFORMATION | | | | | |
| 2 | LOAD AS SELECT | SYS_TEMP_0FD9D6628_284C5768 | | | | |
| 3 | WINDOW SORT | | 57 | 6840 | 1785 (2)| 00:00:31 |
|* 4 | HASH JOIN | | 57 | 6840 | 1784 (2)| 00:00:31 |
|* 5 | TABLE ACCESS FULL | A_TABLE | 57 | 4104 | 1193 (2)| 00:00:21 |
| 6 | MERGE JOIN CARTESIAN | | 1189K| 54M| 586 (2)| 00:00:10 |
| 7 | TABLE ACCESS FULL | HORIZON_TABLE | 1 | 26 | 3 (0)| 00:00:01 |
| 8 | BUFFER SORT | | 1189K| 24M| 583 (2)| 00:00:10 |
| 9 | TABLE ACCESS FULL | ANCHOR_TABLE | 1189K| 24M| 583 (2)| 00:00:10 |
|* 10 | FILTER | | | | | |
| 11 | VIEW | | 57 | 3534 | 2 (0)| 00:00:01 |
| 12 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6628_284C5768 | 57 | 4104 | 2 (0)| 00:00:01 |
|* 13 | VIEW | | 57 | 912 | 2 (0)| 00:00:01 |
| 14 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6628_284C5768 | 57 | 4104 | 2 (0)| 00:00:01 |
|* 15 | VIEW | | 57 | 912 | 2 (0)| 00:00:01 |
| 16 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6628_284C5768 | 57 | 4104 | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("HORIZON_TABLE"."ID"="A_TABLE"."HORIZON_ID" AND
"ANCHOR_TABLE"."ID"="A_TABLE"."ANCHOR_ID")
5 - filter("A_TABLE"."A_VALUE">=499500 AND "A_TABLE"."A_VALUE"<=500500)
10 - filter("D"."POSITION">= (SELECT "D1"."POSITION"-:B1 FROM (SELECT + CACHE_TEMP_TABLE
("T1") "C0" "ID","C1" "DESCR","C2" "A_VALUE","C3" "OFFSET","C4" "ANCHOR","C5" "POSITION" FROM
"SYS"."SYS_TEMP_0FD9D6628_284C5768" "T1") "D1" WHERE "D1"."ANCHOR"=1) AND "D"."POSITION"<=
(SELECT "D2"."POSITION"+:B2 FROM (SELECT + CACHE_TEMP_TABLE ("T1") "C0" "ID","C1"
"DESCR","C2" "A_VALUE","C3" "OFFSET","C4" "ANCHOR","C5" "POSITION" FROM
"SYS"."SYS_TEMP_0FD9D6628_284C5768" "T1") "D2" WHERE "D2"."ANCHOR"=1))
13 - filter("D1"."ANCHOR"=1)
15 - filter("D2"."ANCHOR"=1)
Note
-----
- dynamic sampling used for this statement (level=4)
*/
-- Explain plan of database view
explain plan for
select *
from testdata_vw
where a_value between 500000 - 500 and 500000 + 500;
select plan_table_output
from table(dbms_xplan.display('plan_table', null, null));
/*
Note: Size of SYS_TEMP_0FD9D662A_284C5768 ~ 1000000 rows
Plan hash value: 1422141561
-------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2973 | 180K| | 50324 (1)| 00:14:16 |
| 1 | VIEW | TESTDATA_VW | 2973 | 180K| | 50324 (1)| 00:14:16 |
| 2 | TEMP TABLE TRANSFORMATION | | | | | | |
| 3 | LOAD AS SELECT | SYS_TEMP_0FD9D662A_284C5768 | | | | | |
| 4 | WINDOW SORT | | 1189K| 136M| 147M| 37032 (1)| 00:10:30 |
|* 5 | HASH JOIN | | 1189K| 136M| | 6868 (1)| 00:01:57 |
| 6 | TABLE ACCESS FULL | HORIZON_TABLE | 1 | 26 | | 3 (0)| 00:00:01 |
|* 7 | HASH JOIN | | 1189K| 106M| 38M| 6860 (1)| 00:01:57 |
| 8 | TABLE ACCESS FULL | ANCHOR_TABLE | 1189K| 24M| | 583 (2)| 00:00:10 |
| 9 | TABLE ACCESS FULL | A_TABLE | 1209K| 83M| | 1191 (2)| 00:00:21 |
|* 10 | FILTER | | | | | | |
|* 11 | VIEW | | 1189K| 70M| | 4431 (1)| 00:01:16 |
| 12 | TABLE ACCESS FULL | SYS_TEMP_0FD9D662A_284C5768 | 1189K| 81M| | 4431 (1)| 00:01:16 |
|* 13 | VIEW | | 1189K| 18M| | 4431 (1)| 00:01:16 |
| 14 | TABLE ACCESS FULL | SYS_TEMP_0FD9D662A_284C5768 | 1189K| 81M| | 4431 (1)| 00:01:16 |
|* 15 | VIEW | | 1189K| 18M| | 4431 (1)| 00:01:16 |
| 16 | TABLE ACCESS FULL | SYS_TEMP_0FD9D662A_284C5768 | 1189K| 81M| | 4431 (1)| 00:01:16 |
-------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("HORIZON_TABLE"."ID"="A_TABLE"."HORIZON_ID")
7 - access("ANCHOR_TABLE"."ID"="A_TABLE"."ANCHOR_ID")
10 - filter("D"."POSITION">= (SELECT "D1"."POSITION"-:B1 FROM (SELECT + CACHE_TEMP_TABLE ("T1")
"C0" "ID","C1" "DESCR","C2" "A_VALUE","C3" "OFFSET","C4" "ANCHOR","C5" "POSITION" FROM
"SYS"."SYS_TEMP_0FD9D662A_284C5768" "T1") "D1" WHERE "D1"."ANCHOR"=1) AND "D"."POSITION"<= (SELECT
"D2"."POSITION"+:B2 FROM (SELECT + CACHE_TEMP_TABLE ("T1") "C0" "ID","C1" "DESCR","C2"
"A_VALUE","C3" "OFFSET","C4" "ANCHOR","C5" "POSITION" FROM "SYS"."SYS_TEMP_0FD9D662A_284C5768" "T1") "D2"
WHERE "D2"."ANCHOR"=1))
11 - filter("A_VALUE">=499500 AND "A_VALUE"<=500500)
13 - filter("D1"."ANCHOR"=1)
15 - filter("D2"."ANCHOR"=1)
Note
-----
- dynamic sampling used for this statement (level=4)
*/
sqlfiddle
explain plan of sql http://www.sqlfiddle.com/#!4/6a7022/3
explain plan of view http://www.sqlfiddle.com/#!4/6a7022/2
You need to write a view definition which returns all possible selectable ranges of a_value as two columns, start_a_value and end_a_value, along with all records which fall into each start/end range. In other words, the correct view definition should logically describe a |n^3| result set given n rows in a_table.
Then query that view as:
SELECT * FROM testdata_vw WHERE START_A_VALUE = 4950 AND END_A_VALUE = 5050;
Also, your multiple references to "data" are unnecessary; same logic can be delivered with an additional analytic function.
Final view def:
CREATE OR REPLACE VIEW testdata_vw AS
SELECT *
FROM
(
SELECT T.*,
MAX(CASE WHEN ANCHOR=1 THEN POSITION END)
OVER (PARTITION BY START_A_VALUE, END_A_VALUE) ANCHOR_POS
FROM
(
SELECT S.A_VALUE START_A_VALUE,
E.A_VALUE END_A_VALUE,
B.ID ID,
B.DESCR DESCR,
HORIZON_TABLE.OFFSET OFFSET,
CASE
WHEN ANCHOR_TABLE.A_DATE = TRUNC(SYSDATE)
THEN 1
ELSE 0
END ANCHOR,
ROW_NUMBER()
OVER(PARTITION BY S.A_VALUE, E.A_VALUE
ORDER BY B.A_POSITION_FIELD) POSITION
FROM
A_TABLE S
JOIN A_TABLE E
ON S.A_VALUE<E.A_VALUE
JOIN A_TABLE B
ON B.A_VALUE BETWEEN S.A_VALUE AND E.A_VALUE
JOIN ANCHOR_TABLE
ON ANCHOR_TABLE.ID = B.ANCHOR_ID
JOIN HORIZON_TABLE
ON HORIZON_TABLE.ID = B.HORIZON_ID
) T
) T
WHERE POSITION BETWEEN ANCHOR_POS - OFFSET AND ANCHOR_POS+OFFSET;
EDIT: SQL Fiddle with expected execution plan
I'm seeing the same (sensible) plan here that I saw in my database; if you're getting something different, please send fiddle link.
Use index lookup to find 1 row in "S" A_TABLE (A_VALUE = 4950)
Use index lookup to find 1 row in "E" A_TABLE (A_VALUE = 5050)
Nested Loop join #1 and #2 (1 x 1 join, still 1 row)
FTS 1 row from HORIZON table
Cartesian join #1 and #2 (1 x 1, okay to use Cartesian).
Use index lookup to find ~100 rows in "B" A_TABLE with values between 4950 and 5050.
Cartesian join #5 and #6 (1 x 102, okay to use Cartesian).
FTS ANCHOR_TABLE with hash join to #7.
Window-sort for analytic functions
You have a predicate outside the view and you want to be applied in the view.
For this, you can use push_pred hint:
select /*+PUSH_PRED(v)*/
*
from
testdata_vw v
where
a_value between 5000 - 50 and 5000 + 50;
SQLFIDDLE
EDIT: Now I've seen that you use the data subquery three times. For the first occurrence it makes sense to push the predicate, but for d1 and d2 it doesn't. It's another query.
What would I do is to use two context variables, set them according my needs and write the query:
SYS_CONTEXT('my_context_name', 'var5000');
create or replace view testdata_vw as
with data as (
select
a_table.id,
a_table.descr,
horizon_table.offset,
case
when anchor_table.a_date = trunc(sysdate) then
1
else
0
end as anchor,
row_number() over(
order by a_table.a_position_field) as position
from a_table
join anchor_table on (anchor_table.id = a_table.anchor_id)
join horizon_table on (horizon_table.id = a_table.horizon_id)
where a_table.a_value between SYS_CONTEXT('my_context_name', 'var5000') - SYS_CONTEXT('my_context_name', 'var50') and SYS_CONTEXT('my_context_name', 'var5000') + SYS_CONTEXT('my_context_name', 'var50')
)
select *
from data d
where d.position between (
select d1.position - d.offset
from data d1
where d1.anchor = 1)
and (
select d2.position + d.offset
from data d2
where d2.anchor = 1) ;
to use it:
dbms_session.set_context ('my_context_name', 'var5000', 5000);
dbms_session.set_context ('my_context_name', 'var50', 50);
select * from testdata_vw;
UPDATE: Instead of context variables(which can be used across sessions) you can use package variables as you commented.
This query works but takes 5000 miliseconds.
SELECT
SUM(case
when ((TRUNC(OPEN_DATE) <= thedate and TRUNC(END_DATE) > thedate) or(TRUNC(OPEN_DATE) <= thedate and END_DATE Is Null)) then 1
else 0
end) as Open
From (
select *
FROM PROJECT
WHERE
PROGRAM_NAME = :program
AND ACTION_FOR_ORG = :orgName
)
cross join (
select add_months(last_day(SYSDATE), level-7) as thedate
from dual
connect by level <= 12
)
GROUP BY thedate
ORDER BY thedate
If I copy the subquery to its own table
create table test_project as
select * FROM PROJECT WHERE PROGRAM_NAME = :program
AND ACTION_FOR_ORG = :orgName
then do the above query but the subquery is on the copied table as:
From ( select * FROM test_project WHERE PROGRAM_NAME = :program
AND ACTION_FOR_ORG = :orgName )
the query takes 10 milliseconds
The query produces a count of how many projects were open in that month over the past 5 and future months (count of open projects for furture months will just equal todays months totals) based on comparing OPEN_DATE to END_DATE
Is there a way to rewrite the original query for optimal performance?
EDIT
OK, I created a second table which is a full copy of the project table (well view) that I was allowed access to. The table copy took about 5 seconds. Using the full set of data and either my sql query or from Egor below, the query is super fast. Something is up with the view. Trying to spit out explain plan using the View in the subquery I get insufficient privileges. Here is the explain plan using a full copy of the view
Plan hash value: 3695211866
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 637 | 1277K| 163 (2)| 00:00:02 |
| 1 | SORT ORDER BY | | 637 | 1277K| 163 (2)| 00:00:02 |
| 2 | HASH GROUP BY | | 637 | 1277K| 163 (2)| 00:00:02 |
| 3 | MERGE JOIN CARTESIAN | | 637 | 1277K| 161 (0)| 00:00:02 |
| 4 | VIEW | | 1 | 6 | 2 (0)| 00:00:01 |
|* 5 | CONNECT BY WITHOUT FILTERING| | | | | |
| 6 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 7 | BUFFER SORT | | 637 | 1273K| 163 (2)| 00:00:02 |
|* 8 | TABLE ACCESS FULL | COMMIT_TEST | 637 | 1273K| 159 (0)| 00:00:02 |
Predicate Information (identified by operation id):
5 - filter(LEVEL<=12)
8 - filter("PROGRAM_NAME"='program_name' AND "ACTION_FOR_ORG"='action_for_org')
Note
- dynamic sampling used for this statement (level=2)
Explain Plan using live table
with
PRJ as (
select /*+ NO_UNNEST */
trunc(OPEN_DATE) as OPEN_DATE,
nvl(trunc(END_DATE), sysdate + 1000) as END_DATE
from
PROJECT
where
PROGRAM_NAME = :program
and ACTION_FOR_ORG = :orgName
),
DATES as (
select
add_months(trunc(last_day(SYSDATE)), level-7) as thedate
from dual
connect by level <= 12
)
SELECT
thedate,
sum(case when thedate between open_date and end_date then 1 end) as Open
FROM
DATES, PRJ
GROUP BY thedate
ORDER BY 1