How to reuse SQL with subquery factoring in a database view (Oracle)

What is the best practice for converting the following SQL statement, which uses a subquery (WITH data AS clause), so that it can be used in a database view?
AFAIK the WITH data AS clause is not supported in database views (Edit: Oracle does support common table expressions in views), but in my case subquery factoring offers a performance advantage. If I create a database view using a common table expression, then this advantage is lost.
Please have a look at my example:
Description of query
a_table
Millions of entries; the select statement picks a few thousand of them.
anchor_table
For each entry in a_table there is a corresponding entry in anchor_table. This table determines exactly one row as the anchor at runtime. See the example below.
horizon_table
For each selection, exactly one entry is determined at runtime (all entries of a selection from a_table have the same horizon_id).
Please note: this is a heavily simplified SQL statement that works fine so far.
In reality, more than 20 tables are joined to produce the result of data.
The where clause is much more complex.
Further columns of horizon_table and anchor_table are required to build the where condition and result list in the subquery, i.e. moving these tables to the main query is not a solution.
with data as (
select
a_table.id,
a_table.descr,
horizon_table.offset,
case
when anchor_table.a_date = trunc(sysdate) then
1
else
0
end as anchor,
row_number() over(
order by a_table.a_position_field) as position
from a_table
join anchor_table on (anchor_table.id = a_table.anchor_id)
join horizon_table on (horizon_table.id = a_table.horizon_id)
where a_table.a_value between 1 and 10000
)
select *
from data d
where d.position between (
select d1.position - d.offset
from data d1
where d1.anchor = 1)
and (
select d2.position + d.offset
from data d2
where d2.anchor = 1)
example of with data as select:
id descr offset anchor position
1 bla 3 0 1
2 blab 3 0 2
5 dfkdj 3 0 3
4 dld 3 0 4
6 oeroe 3 1 5
3 blab 3 0 6
9 dfkdj 3 0 7
14 dld 3 0 8
54 oeroe 3 0 9
...
result of select * from data
id descr offset anchor position
2 blab 3 0 2
5 dfkdj 3 0 3
4 dld 3 0 4
6 oeroe 3 1 5
3 blab 3 0 6
9 dfkdj 3 0 7
14 dld 3 0 8
I.e. the result is the anchor row and the three rows above and below it.
How can I achieve the same within a database view?
My attempt failed, as I expected, because of performance issues:
Create a view data of with data as select above
Use this view as above
select *
from data d
where d.position between (
select d1.position - d.offset
from data d1
where d1.anchor = 1)
and (
select d2.position + d.offset
from data d2
where d2.anchor = 1)
Thank you for any advice :-)
Amendment
If I create a view as recommended in the first comment, then I get the same performance issue. Oracle does not use the subquery to restrict the results.
Here are the execution plans of my production queries (originally posted as images):
a) SQL
b) View
Here are the execution plans of my test cases
-- Create Testdata table with ~ 1,000,000 entries
insert into a_table
(id, descr, a_position_field, anchor_id, horizon_id, a_value)
select level, 'data' || level, mod(level, 10), level, 1, level
from dual
connect by level <= 999999;
insert into anchor_table
(id, a_date)
select level, trunc(sysdate) - 500000 + level
from dual
connect by level <= 999999;
insert into horizon_table (id, offset) values (1, 50);
commit;
-- Create view
create or replace view testdata_vw as
with data as
(select a_table.id,
a_table.descr,
a_table.a_value,
horizon_table.offset,
case
when anchor_table.a_date = trunc(sysdate) then
1
else
0
end as anchor,
row_number() over(order by a_table.a_position_field) as position
from a_table
join anchor_table
on (anchor_table.id = a_table.anchor_id)
join horizon_table
on (horizon_table.id = a_table.horizon_id))
select *
from data d
where d.position between
(select d1.position - d.offset from data d1 where d1.anchor = 1) and
(select d2.position + d.offset from data d2 where d2.anchor = 1);
-- Explain plan of subquery factoring select statement
explain plan for
with data as
(select a_table.id,
a_table.descr,
a_value,
horizon_table.offset,
case
when anchor_table.a_date = trunc(sysdate) then
1
else
0
end as anchor,
row_number() over(order by a_table.a_position_field) as position
from a_table
join anchor_table
on (anchor_table.id = a_table.anchor_id)
join horizon_table
on (horizon_table.id = a_table.horizon_id)
where a_table.a_value between 500000 - 500 and 500000 + 500)
select *
from data d
where d.position between
(select d1.position - d.offset from data d1 where d1.anchor = 1) and
(select d2.position + d.offset from data d2 where d2.anchor = 1);
select plan_table_output
from table(dbms_xplan.display('plan_table', null, null));
/*
Note: Size of SYS_TEMP_0FD9D6628_284C5768 ~ 1000 rows
Plan hash value: 1145408420
----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 62 | 1791 (2)| 00:00:31 |
| 1 | TEMP TABLE TRANSFORMATION | | | | | |
| 2 | LOAD AS SELECT | SYS_TEMP_0FD9D6628_284C5768 | | | | |
| 3 | WINDOW SORT | | 57 | 6840 | 1785 (2)| 00:00:31 |
|* 4 | HASH JOIN | | 57 | 6840 | 1784 (2)| 00:00:31 |
|* 5 | TABLE ACCESS FULL | A_TABLE | 57 | 4104 | 1193 (2)| 00:00:21 |
| 6 | MERGE JOIN CARTESIAN | | 1189K| 54M| 586 (2)| 00:00:10 |
| 7 | TABLE ACCESS FULL | HORIZON_TABLE | 1 | 26 | 3 (0)| 00:00:01 |
| 8 | BUFFER SORT | | 1189K| 24M| 583 (2)| 00:00:10 |
| 9 | TABLE ACCESS FULL | ANCHOR_TABLE | 1189K| 24M| 583 (2)| 00:00:10 |
|* 10 | FILTER | | | | | |
| 11 | VIEW | | 57 | 3534 | 2 (0)| 00:00:01 |
| 12 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6628_284C5768 | 57 | 4104 | 2 (0)| 00:00:01 |
|* 13 | VIEW | | 57 | 912 | 2 (0)| 00:00:01 |
| 14 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6628_284C5768 | 57 | 4104 | 2 (0)| 00:00:01 |
|* 15 | VIEW | | 57 | 912 | 2 (0)| 00:00:01 |
| 16 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6628_284C5768 | 57 | 4104 | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("HORIZON_TABLE"."ID"="A_TABLE"."HORIZON_ID" AND
"ANCHOR_TABLE"."ID"="A_TABLE"."ANCHOR_ID")
5 - filter("A_TABLE"."A_VALUE">=499500 AND "A_TABLE"."A_VALUE"<=500500)
10 - filter("D"."POSITION">= (SELECT "D1"."POSITION"-:B1 FROM (SELECT + CACHE_TEMP_TABLE
("T1") "C0" "ID","C1" "DESCR","C2" "A_VALUE","C3" "OFFSET","C4" "ANCHOR","C5" "POSITION" FROM
"SYS"."SYS_TEMP_0FD9D6628_284C5768" "T1") "D1" WHERE "D1"."ANCHOR"=1) AND "D"."POSITION"<=
(SELECT "D2"."POSITION"+:B2 FROM (SELECT + CACHE_TEMP_TABLE ("T1") "C0" "ID","C1"
"DESCR","C2" "A_VALUE","C3" "OFFSET","C4" "ANCHOR","C5" "POSITION" FROM
"SYS"."SYS_TEMP_0FD9D6628_284C5768" "T1") "D2" WHERE "D2"."ANCHOR"=1))
13 - filter("D1"."ANCHOR"=1)
15 - filter("D2"."ANCHOR"=1)
Note
-----
- dynamic sampling used for this statement (level=4)
*/
-- Explain plan of database view
explain plan for
select *
from testdata_vw
where a_value between 500000 - 500 and 500000 + 500;
select plan_table_output
from table(dbms_xplan.display('plan_table', null, null));
/*
Note: Size of SYS_TEMP_0FD9D662A_284C5768 ~ 1000000 rows
Plan hash value: 1422141561
-------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2973 | 180K| | 50324 (1)| 00:14:16 |
| 1 | VIEW | TESTDATA_VW | 2973 | 180K| | 50324 (1)| 00:14:16 |
| 2 | TEMP TABLE TRANSFORMATION | | | | | | |
| 3 | LOAD AS SELECT | SYS_TEMP_0FD9D662A_284C5768 | | | | | |
| 4 | WINDOW SORT | | 1189K| 136M| 147M| 37032 (1)| 00:10:30 |
|* 5 | HASH JOIN | | 1189K| 136M| | 6868 (1)| 00:01:57 |
| 6 | TABLE ACCESS FULL | HORIZON_TABLE | 1 | 26 | | 3 (0)| 00:00:01 |
|* 7 | HASH JOIN | | 1189K| 106M| 38M| 6860 (1)| 00:01:57 |
| 8 | TABLE ACCESS FULL | ANCHOR_TABLE | 1189K| 24M| | 583 (2)| 00:00:10 |
| 9 | TABLE ACCESS FULL | A_TABLE | 1209K| 83M| | 1191 (2)| 00:00:21 |
|* 10 | FILTER | | | | | | |
|* 11 | VIEW | | 1189K| 70M| | 4431 (1)| 00:01:16 |
| 12 | TABLE ACCESS FULL | SYS_TEMP_0FD9D662A_284C5768 | 1189K| 81M| | 4431 (1)| 00:01:16 |
|* 13 | VIEW | | 1189K| 18M| | 4431 (1)| 00:01:16 |
| 14 | TABLE ACCESS FULL | SYS_TEMP_0FD9D662A_284C5768 | 1189K| 81M| | 4431 (1)| 00:01:16 |
|* 15 | VIEW | | 1189K| 18M| | 4431 (1)| 00:01:16 |
| 16 | TABLE ACCESS FULL | SYS_TEMP_0FD9D662A_284C5768 | 1189K| 81M| | 4431 (1)| 00:01:16 |
-------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("HORIZON_TABLE"."ID"="A_TABLE"."HORIZON_ID")
7 - access("ANCHOR_TABLE"."ID"="A_TABLE"."ANCHOR_ID")
10 - filter("D"."POSITION">= (SELECT "D1"."POSITION"-:B1 FROM (SELECT + CACHE_TEMP_TABLE ("T1")
"C0" "ID","C1" "DESCR","C2" "A_VALUE","C3" "OFFSET","C4" "ANCHOR","C5" "POSITION" FROM
"SYS"."SYS_TEMP_0FD9D662A_284C5768" "T1") "D1" WHERE "D1"."ANCHOR"=1) AND "D"."POSITION"<= (SELECT
"D2"."POSITION"+:B2 FROM (SELECT + CACHE_TEMP_TABLE ("T1") "C0" "ID","C1" "DESCR","C2"
"A_VALUE","C3" "OFFSET","C4" "ANCHOR","C5" "POSITION" FROM "SYS"."SYS_TEMP_0FD9D662A_284C5768" "T1") "D2"
WHERE "D2"."ANCHOR"=1))
11 - filter("A_VALUE">=499500 AND "A_VALUE"<=500500)
13 - filter("D1"."ANCHOR"=1)
15 - filter("D2"."ANCHOR"=1)
Note
-----
- dynamic sampling used for this statement (level=4)
*/
sqlfiddle
explain plan of sql http://www.sqlfiddle.com/#!4/6a7022/3
explain plan of view http://www.sqlfiddle.com/#!4/6a7022/2

You need to write a view definition that returns all possible selectable ranges of a_value as two columns, start_a_value and end_a_value, along with all records that fall into each start/end range. In other words, the correct view definition should logically describe a result set on the order of n^3 rows, given n rows in a_table.
Then query that view as:
SELECT * FROM testdata_vw WHERE START_A_VALUE = 4950 AND END_A_VALUE = 5050;
Also, your multiple references to "data" are unnecessary; the same logic can be delivered with an additional analytic function.
Final view def:
CREATE OR REPLACE VIEW testdata_vw AS
SELECT *
FROM
(
SELECT T.*,
MAX(CASE WHEN ANCHOR=1 THEN POSITION END)
OVER (PARTITION BY START_A_VALUE, END_A_VALUE) ANCHOR_POS
FROM
(
SELECT S.A_VALUE START_A_VALUE,
E.A_VALUE END_A_VALUE,
B.ID ID,
B.DESCR DESCR,
HORIZON_TABLE.OFFSET OFFSET,
CASE
WHEN ANCHOR_TABLE.A_DATE = TRUNC(SYSDATE)
THEN 1
ELSE 0
END ANCHOR,
ROW_NUMBER()
OVER(PARTITION BY S.A_VALUE, E.A_VALUE
ORDER BY B.A_POSITION_FIELD) POSITION
FROM
A_TABLE S
JOIN A_TABLE E
ON S.A_VALUE<E.A_VALUE
JOIN A_TABLE B
ON B.A_VALUE BETWEEN S.A_VALUE AND E.A_VALUE
JOIN ANCHOR_TABLE
ON ANCHOR_TABLE.ID = B.ANCHOR_ID
JOIN HORIZON_TABLE
ON HORIZON_TABLE.ID = B.HORIZON_ID
) T
) T
WHERE POSITION BETWEEN ANCHOR_POS - OFFSET AND ANCHOR_POS+OFFSET;
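The remark about replacing the repeated references to "data" with one analytic function also applies to the question's original standalone statement. The following is a minimal sketch, assuming the simplified tables and columns from the question; the aliases t and d and the column name anchor_pos are illustrative only.

```sql
-- Sketch only: one pass over the filtered rows instead of three references
-- to the factored subquery. Assumes at most one row has anchor = 1.
select d.id, d.descr, d.offset, d.anchor, d.position
from (
  select t.*,
         -- position of the anchor row, repeated on every row of the selection
         max(case when t.anchor = 1 then t.position end) over () as anchor_pos
  from (
    select a_table.id,
           a_table.descr,
           horizon_table.offset,
           case
             when anchor_table.a_date = trunc(sysdate) then 1
             else 0
           end as anchor,
           row_number() over (order by a_table.a_position_field) as position
    from a_table
    join anchor_table on (anchor_table.id = a_table.anchor_id)
    join horizon_table on (horizon_table.id = a_table.horizon_id)
    where a_table.a_value between 1 and 10000
  ) t
) d
where d.position between d.anchor_pos - d.offset and d.anchor_pos + d.offset;
```

If no row has anchor = 1, anchor_pos is null and no rows are returned, which matches the behavior of the scalar subqueries. If several rows have anchor = 1, the original statement raises an error, whereas this sketch silently uses the highest anchor position.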
EDIT: SQL Fiddle with expected execution plan
I'm seeing the same (sensible) plan here that I saw in my database; if you're getting something different, please send fiddle link.
1. Use index lookup to find 1 row in "S" A_TABLE (A_VALUE = 4950)
2. Use index lookup to find 1 row in "E" A_TABLE (A_VALUE = 5050)
3. Nested Loop join #1 and #2 (1 x 1 join, still 1 row)
4. FTS 1 row from HORIZON table
5. Cartesian join #1 and #2 (1 x 1, okay to use Cartesian).
6. Use index lookup to find ~100 rows in "B" A_TABLE with values between 4950 and 5050.
7. Cartesian join #5 and #6 (1 x 102, okay to use Cartesian).
8. FTS ANCHOR_TABLE with hash join to #7.
9. Window-sort for analytic functions

You have a predicate outside the view and you want it to be applied inside the view.
For this, you can use push_pred hint:
select /*+PUSH_PRED(v)*/
*
from
testdata_vw v
where
a_value between 5000 - 50 and 5000 + 50;
SQLFIDDLE
EDIT: Now I've seen that you use the data subquery three times. For the first occurrence it makes sense to push the predicate, but for d1 and d2 it doesn't. It's a different query.
What I would do is use two context variables, set them according to my needs, and write the query:
SYS_CONTEXT('my_context_name', 'var5000');
create or replace view testdata_vw as
with data as (
select
a_table.id,
a_table.descr,
horizon_table.offset,
case
when anchor_table.a_date = trunc(sysdate) then
1
else
0
end as anchor,
row_number() over(
order by a_table.a_position_field) as position
from a_table
join anchor_table on (anchor_table.id = a_table.anchor_id)
join horizon_table on (horizon_table.id = a_table.horizon_id)
where a_table.a_value between SYS_CONTEXT('my_context_name', 'var5000') - SYS_CONTEXT('my_context_name', 'var50') and SYS_CONTEXT('my_context_name', 'var5000') + SYS_CONTEXT('my_context_name', 'var50')
)
select *
from data d
where d.position between (
select d1.position - d.offset
from data d1
where d1.anchor = 1)
and (
select d2.position + d.offset
from data d2
where d2.anchor = 1) ;
to use it:
dbms_session.set_context ('my_context_name', 'var5000', 5000);
dbms_session.set_context ('my_context_name', 'var50', 50);
select * from testdata_vw;
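Note that dbms_session.set_context can only be called from the package that the context was created with, so the two calls above need some supporting objects. A minimal sketch, assuming the hypothetical package name my_context_pkg and procedure name set_value (neither appears in the original answer), and assuming the schema has the CREATE ANY CONTEXT privilege:

```sql
-- Bind the context namespace to a trusted package (hypothetical name).
create or replace context my_context_name using my_context_pkg;

create or replace package my_context_pkg as
  procedure set_value(p_name in varchar2, p_value in varchar2);
end my_context_pkg;
/

create or replace package body my_context_pkg as
  procedure set_value(p_name in varchar2, p_value in varchar2) is
  begin
    -- Only code inside this package may set attributes of my_context_name.
    dbms_session.set_context('my_context_name', p_name, p_value);
  end set_value;
end my_context_pkg;
/

-- Set the two attributes for the current session, then query the view.
begin
  my_context_pkg.set_value('var5000', '5000');
  my_context_pkg.set_value('var50', '50');
end;
/

select * from testdata_vw;
```

SYS_CONTEXT returns a string, so the arithmetic in the view relies on implicit conversion to a number; wrapping each call in to_number makes that explicit.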
UPDATE: Instead of context variables (which can be used across sessions), you can use package variables, as you commented.


Oracle Compound Join Predicate Causes Row Estimate to be Incorrect

In the example below, the Oracle optimizer's row estimate is off by two orders of magnitude. How do I improve the estimate?
Table A has rows with numbers 1 through 1,000 for each of the 10 letters A through J.
Table C has 100 copies of table A.
So, table A has a cardinality of 10K and table C has a cardinality of 1M.
A given single-valued predicate on the number in table A will yield 1/1000 of the rows in table A (same for table C).
A given single-valued predicate on the letter in table A will yield 1/10 of the rows in table A (same for table C).
Setup script.
drop table C;
drop table A;
create table A
( num NUMBER
, val VARCHAR2(3 byte)
, pad CHAR(40 byte)
)
;
insert /*+ append enable_parallel_dml parallel (auto) */
into A (num, val, pad)
select mod(level-1, 1000) +1
, chr(mod(ceil(level/1000) - 1, 10) + ascii('A'))
, ' '
from dual
connect by level <= 10*1000
;
create table C
( id NUMBER
, num NUMBER
, val VARCHAR2(3 byte)
, pad CHAR(40 byte)
)
;
insert /*+ append enable_parallel_dml parallel (auto) */
into C (id, num, val, pad)
with
"D1" as
( select /*+ materialize */ null from dual connect by level <= 100 --320
)
, "D" as
( select /*+ materialize */
level rn
, mod(level-1, 1000) + 1 num
, chr(mod(ceil(level/1000) - 1, 10) + ascii('A')) val
, ' ' pad
from dual
connect by level <= 10*1000
order by 1 offset 0 rows
)
select rownum id
, num num
, val val
, pad pad
from "D1", "D"
;
commit;
exec dbms_stats.gather_table_stats(OwnName => null, TabName => 'A', cascade => true);
exec dbms_stats.gather_table_stats(OwnName => null, TabName => 'C', cascade => true);
Consider the explain plan to the following query.
select *
from A
join C
on A.num = C.num
and A.val = C.val
where A.num = 1
and A.val = 'A'
;
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100 | 9900 | 2209 (1)| 00:00:01 |
|* 1 | HASH JOIN | | 100 | 9900 | 2209 (1)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| A | 1 | 47 | 23 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| C | 100 | 5200 | 2185 (1)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."NUM"="C"."NUM" AND "A"."VAL"="C"."VAL")
2 - filter("A"."NUM"=1 AND "A"."VAL"='A')
3 - filter("C"."NUM"=1 AND "C"."VAL"='A')
The row cardinality of each step makes sense to me.
ID=2 --> (1/1,000) * (1/10) * 10,000 = 1
ID=3 --> (1/1,000) * (1/10) * 1,000,000 = 100
ID=1 --> 100 is correct. Predicates in ID=2 and ID=3 are the same, every row from ID=2 will have one and only one match in the row source from ID=3.
Now consider the explain plan to the slightly modified query below.
select *
from A
join C
on A.num = C.num
and A.val = C.val
where A.num in(1,2)
and A.val = 'A'
;
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 198 | 2209 (1)| 00:00:01 |
|* 1 | HASH JOIN | | 2 | 198 | 2209 (1)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| A | 2 | 94 | 23 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| C | 200 | 10400 | 2185 (1)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."NUM"="C"."NUM" AND "A"."VAL"="C"."VAL")
2 - filter("A"."VAL"='A' AND ("A"."NUM"=1 OR "A"."NUM"=2))
3 - filter("C"."VAL"='A' AND ("C"."NUM"=1 OR "C"."NUM"=2))
The row cardinality of steps ID=2 and ID=3 makes sense to me, but now ID=1 is incorrect by two orders of magnitude.
ID=2 --> 2 * (1/1,000) * (1/10) * 10,000 = 2
ID=3 --> 2 * (1/1,000) * (1/10) * 1,000,000 = 200
ID=1 --> The optimizer estimates 2 rows, but each of the 2 rows from ID=2 matches 100 rows from ID=3, so the actual count is 200. The estimate is off by two orders of magnitude.
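One way to confirm the gap between estimated and actual rows is to run the statement with rowsource statistics and compare the E-Rows and A-Rows columns. This is a sketch of that check, not part of the original post; it assumes the setup script above has been run and that the session can read the cursor views used by dbms_xplan.display_cursor.

```sql
-- In SQL*Plus, disable serveroutput first so that display_cursor reports
-- this statement rather than an internal dbms_output call:
--   set serveroutput off

-- Run the statement once with rowsource statistics enabled
-- (fetch all rows so that A-Rows is complete).
select /*+ gather_plan_statistics */ *
from A
join C
  on A.num = C.num
 and A.val = C.val
where A.num in (1, 2)
  and A.val = 'A';

-- Show estimated (E-Rows) next to actual (A-Rows) for the last statement
-- executed in this session.
select *
from table(dbms_xplan.display_cursor(format => 'ALLSTATS LAST'));
```

With the data from the setup script, the join returns 200 rows, so A-Rows on the join step should read 200 against an E-Rows value of 2.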
Adding unique and foreign constraints and extended statistics did not improve the estimated row counts.
create unique index IU_A on A (num, val);
alter table A add constraint UK_A unique (num, val) rely using index IU_A enable validate;
alter table C add constraint R_C foreign key (num, val) references A (num, val) rely enable validate;
create index IR_C on C (num, val);
select dbms_stats.create_extended_stats(null,'A','(num, val)') from dual;
select dbms_stats.create_extended_stats(null,'C','(num, val)') from dual;
exec dbms_stats.gather_table_stats(OwnName => null, TabName => 'A', cascade => true);
exec dbms_stats.gather_table_stats(OwnName => null, TabName => 'C', cascade => true);
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 198 | 10 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 2 | 198 | 10 (0)| 00:00:01 |
| 3 | INLIST ITERATOR | | | | | |
| 4 | TABLE ACCESS BY INDEX ROWID| A | 2 | 94 | 5 (0)| 00:00:01 |
|* 5 | INDEX UNIQUE SCAN | IU_A | 2 | | 3 (0)| 00:00:01 |
|* 6 | INDEX RANGE SCAN | IR_C | 1 | | 2 (0)| 00:00:01 |
| 7 | TABLE ACCESS BY INDEX ROWID | C | 1 | 52 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access(("A"."NUM"=1 OR "A"."NUM"=2) AND "A"."VAL"='A')
6 - access("A"."NUM"="C"."NUM" AND "C"."VAL"='A')
filter("C"."NUM"=1 OR "C"."NUM"=2)
What do I need to do to make the estimated rows better match reality?
Using Oracle Enterprise Edition 19c.
Thanks in advance.
Edit
After ensuring that the most recent optimizer_features_enable was used and modifying one of the predicates, we still get an explain plan whose estimated row count is short by two orders of magnitude.
ID=6 ought to have an estimated row count of 100. It seems the predicate selectivity is being applied twice: once for the access and again for the filter.
select /*+ optimizer_features_enable('19.1.0') */
*
from A
join C
on A.num = C.num
and A.val = C.val
where A.num in(1,2)
and A.val in('A','B')
;
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4 | 396 | 16 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 4 | 396 | 16 (0)| 00:00:01 |
| 2 | NESTED LOOPS | | 4 | 396 | 16 (0)| 00:00:01 |
| 3 | INLIST ITERATOR | | | | | |
| 4 | TABLE ACCESS BY INDEX ROWID BATCHED| A | 4 | 188 | 7 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | IU_A | 4 | | 3 (0)| 00:00:01 |
|* 6 | INDEX RANGE SCAN | IR_C | 1 | | 2 (0)| 00:00:01 |
| 7 | TABLE ACCESS BY INDEX ROWID | C | 1 | 52 | 3 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("A"."NUM"=1 OR "A"."NUM"=2)
filter("A"."VAL"='A' OR "A"."VAL"='B')
6 - access("A"."NUM"="C"."NUM" AND "A"."VAL"="C"."VAL")
filter(("C"."NUM"=1 OR "C"."NUM"=2) AND ("C"."VAL"='A' OR "C"."VAL"='B'))

ORA_ROWSCN queried properly but why is Oracle returning the wrong value in the results?

Why does Oracle sometimes return the wrong ORA_ROWSCN, as in the following? (Note that this does not seem to be a ROWDEPENDENCIES issue or a "greater than expected SCN" issue; I am aware of both caveats when using ORA_ROWSCN.)
When I run:
WITH maxIds as (
SELECT table_name, record_rowid, MAX(changed_rows_log_id) AS changed_rows_log_id, ORA_ROWSCN as otherSCN
FROM changed_rows_log
GROUP BY table_name, record_rowid
)
SELECT l.changed_rows_log_id, l.ORA_ROWSCN, otherSCN, l.table_name, l.record_rowid
FROM changed_rows_log l
JOIN maxIds m on l.changed_rows_log_id = m.changed_rows_log_id and l.table_name=m.table_name and l.record_rowid=m.record_rowid
WHERE ORA_ROWSCN > 7884576380618
My result is:
CHANGED_ROWS_LOG_ID ORA_ROWSCN OTHERSCN TABLE_NAME RECORD_ROWID
1887507 7884576380617 7884576380617 FOO AAARiGAAMAAG4B4AA2
1887508 7884576380617 7884576380617 FOO AAARiGAAMAAG4B4AA3
1887512 7884576380617 7884576380617 FOO AAARiGAAMAAG4B4AA7
...
Yep, you see that right. The ORA_ROWSCN returned is less than my literal value that I asked for greater-than in the query WHERE clause. (I also included otherSCN to see if it was throwing me off somehow, but it appears to be irrelevant)
It appears that the Row in question in reality has a higher ORA_ROWSCN, and indeed the WHERE clause worked properly, as when I then do SELECT ORA_ROWSCN FROM changed_rows_log WHERE changed_rows_log_id=1887507, I get 7884576380644 not 7884576380617.
Also, when I add just one WHERE condition, I also get the correct data returned:
WITH maxIds as (
SELECT table_name, record_rowid, MAX(changed_rows_log_id) AS changed_rows_log_id, ORA_ROWSCN as otherSCN
FROM changed_rows_log
GROUP BY table_name, record_rowid
)
SELECT l.changed_rows_log_id, l.ORA_ROWSCN, otherSCN, l.table_name, l.record_rowid
FROM changed_rows_log l
JOIN maxIds m on l.changed_rows_log_id = m.changed_rows_log_id and l.table_name=m.table_name and l.record_rowid=m.record_rowid
WHERE ORA_ROWSCN > 7884576380618 AND l.changed_rows_log_id=1887507
gives me this, as expected
CHANGED_ROWS_LOG_ID ORA_ROWSCN OTHERSCN TABLE_NAME RECORD_ROWID
1887507 7884576380644 7884576380644 FOO AAARiGAAMAAG4B4AA2
So why, and how, can SELECT ORA_ROWSCN return plainly incorrect data like this? Can I work around it somehow so that I get the expected ORA_ROWSCN that more specific queries give me?
(If it matters, changed_rows_log has ROWDEPENDENCIES enabled. I'm using Oracle Database 12.1.0.2.0 64-bit.)
More detail--the EXPLAIN PLAN for the first query (with bad value)
Plan hash value: 3153795477
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 62 | | 30794 (1)| 00:00:02 |
|* 1 | FILTER | | | | | | |
| 2 | HASH GROUP BY | | 1 | 62 | | 30794 (1)| 00:00:02 |
|* 3 | HASH JOIN | | 208K| 12M| 3424K| 30787 (1)| 00:00:02 |
|* 4 | TABLE ACCESS FULL| CHANGED_ROWS_LOG | 71438 | 2581K| | 14052 (1)| 00:00:01 |
| 5 | TABLE ACCESS FULL| CHANGED_ROWS_LOG | 1428K| 34M| | 14058 (1)| 00:00:01 |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("L"."CHANGED_ROWS_LOG_ID"=MAX("CHANGED_ROWS_LOG_ID"))
3 - access("L"."TABLE_NAME"="TABLE_NAME" AND "L"."RECORD_ROWID"="RECORD_ROWID")
4 - filter("ORA_ROWSCN">7884576380618)
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
- this is an adaptive plan
- 2 Sql Plan Directives used for this statement
And the last query above (correct value)
Plan hash value: 402632295
---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 62 | 7 (15)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | HASH GROUP BY | | 1 | 62 | 7 (15)| 00:00:01 |
| 3 | NESTED LOOPS | | 3 | 186 | 6 (0)| 00:00:01 |
|* 4 | TABLE ACCESS BY INDEX ROWID | CHANGED_ROWS_LOG | 1 | 37 | 3 (0)| 00:00:01 |
|* 5 | INDEX UNIQUE SCAN | SYS_C00141068 | 1 | | 2 (0)| 00:00:01 |
| 6 | TABLE ACCESS BY INDEX ROWID BATCHED| CHANGED_ROWS_LOG | 3 | 75 | 3 (0)| 00:00:01 |
|* 7 | INDEX RANGE SCAN | CHANGED_ROWS_LOG_IF1 | 1 | | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(MAX("CHANGED_ROWS_LOG_ID")=1887507)
4 - filter("ORA_ROWSCN">7884576380618)
5 - access("L"."CHANGED_ROWS_LOG_ID"=1887507)
7 - access("L"."RECORD_ROWID"="RECORD_ROWID" AND "L"."TABLE_NAME"="TABLE_NAME")
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
- 1 Sql Plan Directive used for this statement
Adding the MATERIALIZE hint to the WITH subquery overcomes this issue. I'd love if someone could explain why the issue happens at all, but for now:
WITH maxIds as (
SELECT /*+ MATERIALIZE */ table_name, record_rowid, MAX(changed_rows_log_id) AS changed_rows_log_id, ORA_ROWSCN as otherSCN
FROM changed_rows_log
GROUP BY table_name, record_rowid
)
SELECT l.changed_rows_log_id, l.ORA_ROWSCN, otherSCN, l.table_name, l.record_rowid
FROM changed_rows_log l
JOIN maxIds m on l.changed_rows_log_id = m.changed_rows_log_id and l.table_name=m.table_name and l.record_rowid=m.record_rowid
WHERE ORA_ROWSCN > 7884576380618

Update statement is slow with sum and nvl function

I have a procedure in which a table's columns are filled using the SUM and NVL functions on other tables' columns. These update queries are slow, which makes the overall procedure slow. One such update query is below:
UPDATE t_final wp
SET PCT =
(
SELECT SUM(NVL(pct,0))
FROM t_overall
WHERE rid = 9
AND rtype = 1
AND sid = 'r12'
AND pid = 21
AND mid = wp.mid
)
WHERE rid = 9 AND rtype = 1 AND sid = 'r12' AND pid = 21;
Neither t_overall nor t_final has any indexes, because both are updated multiple times in the overall procedure. t_final has around 8,500 records and t_overall around 13,000. Is there a more optimized way to write the query above?
Edit 1: Here SUM(NVL(pct,0)) first replaces nulls with 0 in the pct column of t_overall, then sums all pct values, and the result updates the pct column of t_final according to the criteria.
Explain plan returns below:
OPERATION OBJECT_NAME CARDINALITY COST
UPDATE STATEMENT 6 424
UPDATE T_FINAL
TABLE ACCESS(FULL) T_FINAL 6 238
. Filter Predicates
. AND
. RTYPE=6
. SID='R12'
. RID=9
. PID=21
SORT(AGGREGATE) 1
TABLE ACCESS(FULL) T_OVERALL 1 30
Filter Predicates
AND
MID=:B1
RTYPE=6
SID='R12'
RID=9
PID=21
The number of updated rows is around 2200.
Edit 2: I have run update query with hint /*+ gather_plan_statistics */ as below:
ALTER session SET statistics_level=ALL;
UPDATE /*+ gather_plan_statistics */ t_final wp
SET PCT =
(
SELECT SUM(NVL(pct,0))
FROM t_overall
WHERE rid = 9
AND rtype = 1
AND sid = 'r12'
AND pid = 21
AND mid = wp.mid
)
WHERE rid = 9 AND rtype = 1 AND sid = 'r12' AND pid = 21;
select * from
table (dbms_xplan.display_cursor (format=>'ALLSTATS LAST'));
The result is:
SQL_ID gypnfv5nzurb0, child number 1
-------------------------------------
select child_number from v$sql where sql_id = :1 order by
child_number
Plan hash value: 4252345203
---------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | OMem | 1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 2 |00:00:00.01 | | | |
| 1 | SORT ORDER BY | | 1 | 1 | 2 |00:00:00.01 | 2048 | 2048 | 2048 (0)|
|* 2 | FIXED TABLE FIXED INDEX| X$KGLCURSOR_CHILD (ind:2) | 1 | 1 | 2 |00:00:00.01 | | | |
---------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(("KGLOBT03"=:1 AND "INST_ID"=USERENV('INSTANCE')))
Thank you.
You did not provide enough information for a definite diagnosis, so I can only give you hints on how to troubleshoot your query.
Here is my setup simulating your data:
create table t_final as
select rownum mid, 8 + mod(rownum,4) rid, 1 rtype, 'r12' sid, 21 pid, 0 pct from dual
connect by level <= 8800;
drop table T_OVERALL;
create table T_OVERALL as
select mod(rownum,8800) mid, 8 + mod(rownum,4) rid, 1 rtype, 'r12' sid, 21 pid, rownum pct from dual
connect by level <= 13000;
Now I run the query activating the statistics gathering to see what the query is doing:
SQL> UPDATE /*+ gather_plan_statistics */ t_final wp
SET PCT =
(
SELECT SUM(NVL(pct,0))
FROM t_overall
WHERE rid = 9
AND rtype = 1
AND sid = 'r12'
AND pid = 21
AND mid = wp.mid
)
WHERE rid = 9 AND rtype = 1 AND sid = 'r12' AND pid = 21;
2200 rows updated.
Elapsed: 00:00:00.97
So the elapsed time is nearly one second, which is slow if you have a lot of such updates. To see the cause, we display the cursor and its statistics (this is possible thanks to the hint /*+ gather_plan_statistics */):
SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------------------
SQL_ID 3ctaz5gvksb54, child number 0
-------------------------------------
UPDATE /*+ gather_plan_statistics */ t_final wp SET PCT = (
SELECT SUM(NVL(pct,0)) FROM t_overall WHERE rid
= 9 AND rtype = 1 AND sid = 'r12' AND pid =
21 AND mid = wp.mid ) WHERE rid = 9 AND rtype =
1 AND sid = 'r12' AND pid = 21
Plan hash value: 1255260726
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | 1 | | 0 |00:00:00.96 | 116K|
| 1 | UPDATE | T_FINAL | 1 | | 0 |00:00:00.96 | 116K|
|* 2 | TABLE ACCESS FULL | T_FINAL | 1 | 2200 | 2200 |00:00:00.01 | 33 |
| 3 | SORT AGGREGATE | | 2200 | 1 | 2200 |00:00:00.92 | 112K|
|* 4 | TABLE ACCESS FULL| T_OVERALL | 2200 | 33 | 3250 |00:00:00.85 | 112K|
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(("RID"=9 AND "RTYPE"=1 AND "PID"=21 AND "SID"='r12'))
4 - filter(("RID"=9 AND "RTYPE"=1 AND "PID"=21 AND "MID"=:B1 AND "SID"='r12'))
So you see the main problem is the FULL TABLE SCAN on T_OVERALL, which was executed 2200 times (column Starts, line 4).
An index based on the filter predicate of line 4 could remedy this:
create index T_OVERALL_IDX on T_OVERALL(mid, rid, rtype, sid, pid);
On the same data I now got:
Elapsed: 00:00:00.05
with the changed plan now using 2200 INDEX RANGE SCANs:
---------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | 1 | | 0 |00:00:00.05 | 10272 |
| 1 | UPDATE | T_FINAL | 1 | | 0 |00:00:00.05 | 10272 |
|* 2 | TABLE ACCESS FULL | T_FINAL | 1 | 2200 | 2200 |00:00:00.01 | 33 |
| 3 | SORT AGGREGATE | | 2200 | 1 | 2200 |00:00:00.01 | 5755 |
| 4 | TABLE ACCESS BY INDEX ROWID| T_OVERALL | 2200 | 33 | 3250 |00:00:00.01 | 5755 |
|* 5 | INDEX RANGE SCAN | T_OVERALL_IDX | 2200 | 1 | 3250 |00:00:00.01 | 2505 |
---------------------------------------------------------------------------------------------------------
Simply recheck the same approach with your data; if you observe different behavior, feel free to post it.
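As an alternative to running the correlated subquery once per updated row, the same update could be sketched as a single MERGE that aggregates T_OVERALL only once. This is a swapped-in technique, not what the answer above measured, and it assumes T_FINAL has the same mid, rid, rtype, sid, pid and pct columns used in the UPDATE. Note one behavioral difference: rows in T_FINAL with no matching mid in T_OVERALL keep their old PCT here, whereas the correlated UPDATE sets them to NULL.

```sql
-- Sketch only: aggregate T_OVERALL once, then apply the sums to T_FINAL.
-- Assumes the table and column names from the UPDATE statement above.
merge into t_final wp
using (
    select mid, sum(nvl(pct, 0)) as pct
    from   t_overall
    where  rid = 9 and rtype = 1 and sid = 'r12' and pid = 21
    group  by mid
) agg
on (wp.mid = agg.mid)
when matched then
    update set wp.pct = agg.pct
    -- restrict the update to the same rows the original WHERE clause selected
    where  wp.rid = 9 and wp.rtype = 1 and wp.sid = 'r12' and wp.pid = 21;
```

Whether this beats the indexed correlated update depends on the data volumes, so it is worth comparing both with gather_plan_statistics as shown above.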

Oracle optimizer equijoin of 3 tables

I have 3 tables in an Oracle 11g database. I no longer have access to the trace file or the explain plan. I join the 3 tables on the date field like this:
select * from a,b,c where a.date = b.date and b.date = c.date
and that takes forever.
When I run
select * from a,b,c where a.date = b.date and b.date = c.date and a.date = c.date
it's fast. But should that make a difference?
Not sure, but it looks like a transitive dependency: if a.date = b.date and b.date = c.date, then a.date = c.date. You can rewrite your query like this:
select a.*
from a
join b on a.date = b.date
join c on a.date = c.date;
I would also have an index on the date column of all 3 tables, since that's the column you are joining on.
Apparently the database does not rewrite queries when the joins are such that A = B, B = C ==> A = C, so it's stuck using what it's given.
Consider the following:
create table a (dt date);
create table b (dt date);
create table c (dt date);
Now fill in the tables so that a is the smallest (5 rows), b is the biggest (100 rows), and c is in the middle (50 rows). Also, not all rows in b and c will join to a, just to make things a bit more interesting.
insert into a
select to_date('2015-01-01', 'yyyy-mm-dd') + rownum - 1
from dual
connect by level <= 5
;
insert into b
select to_date('2015-01-01', 'yyyy-mm-dd') + mod(rownum, 10)
from dual
connect by level <= 100
;
insert into c
select to_date('2015-01-01', 'yyyy-mm-dd') + mod(rownum, 10)
from dual
connect by level <= 50
;
I'm going to bypass statistics for now and leave it totally up to the database on how to figure out a plan.
Take 1: without the join from a to c:
explain plan for
select *
from a
, b
, c
where a.dt = b.dt
and b.dt = c.dt
;
and here's the plan:
select *
from table(dbms_xplan.display())
;
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 250 | 6750 | 9 (0)| 00:00:01 |
|* 1 | HASH JOIN | | 250 | 6750 | 9 (0)| 00:00:01 |
|* 2 | HASH JOIN | | 50 | 900 | 6 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| A | 5 | 45 | 3 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL| B | 100 | 900 | 3 (0)| 00:00:01 |
| 5 | TABLE ACCESS FULL | C | 50 | 450 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("B"."DT"="C"."DT")
2 - access("A"."DT"="B"."DT")
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
First off, since there were no statistics on the tables, Oracle chose to sample the data first so it wasn't going in blind. In this case, table a joins to b first, then the result of that joins to c.
Take 2: introduce the a.dt = c.dt condition:
explain plan for
select *
from a
, b
, c
where a.dt = b.dt
and b.dt = c.dt
and a.dt = c.dt
;
select *
from table(dbms_xplan.display())
;
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 25 | 675 | 9 (0)| 00:00:01 |
|* 1 | HASH JOIN | | 25 | 675 | 9 (0)| 00:00:01 |
|* 2 | HASH JOIN | | 25 | 450 | 6 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| A | 5 | 45 | 3 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL| C | 50 | 450 | 3 (0)| 00:00:01 |
| 5 | TABLE ACCESS FULL | B | 100 | 900 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."DT"="B"."DT" AND "B"."DT"="C"."DT")
2 - access("A"."DT"="C"."DT")
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
And there you go. The order of the joins has switched now that Oracle has been given the extra join path. (FYI, this is the same plan if using just a.dt = b.dt and a.dt = c.dt.)
BUT, notice anything? The estimates are not right anymore. It's guessing 25 rows in the end, not 250. So, the extra condition is actually causing some confusion.
Without the b.dt = c.dt, though, same join path, different estimates (same end result as the first one):
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 250 | 6750 | 9 (0)| 00:00:01 |
|* 1 | HASH JOIN | | 250 | 6750 | 9 (0)| 00:00:01 |
|* 2 | HASH JOIN | | 25 | 450 | 6 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| A | 5 | 45 | 3 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL| C | 50 | 450 | 3 (0)| 00:00:01 |
| 5 | TABLE ACCESS FULL | B | 100 | 900 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."DT"="B"."DT")
2 - access("A"."DT"="C"."DT")
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
Long story a little longer, since the database isn't going to assume any join paths for you, adding one in your query gives the database more options and as such can change its plan...and a change in plan can certainly affect how fast the results are returned.
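To check which estimate is closer to reality, one could run the query for real and compare estimated and actual row counts. The sketch below reuses the gather_plan_statistics hint and dbms_xplan.display_cursor, and assumes the three test tables a, b and c created above plus the privileges display_cursor needs on the underlying V$ views. In SQL*Plus, serveroutput must be off for display_cursor(null, null, ...) to pick up the previous statement.

```sql
-- Sketch only: run the three-condition join and collect row-source statistics.
-- With the fill logic above, each of the 5 dates in a matches
-- 10 rows in b and 5 rows in c, i.e. 5 * 10 * 5 = 250 rows in total.
select /*+ gather_plan_statistics */ count(*)
from   a, b, c
where  a.dt = b.dt
and    b.dt = c.dt
and    a.dt = c.dt;

-- Show estimated (E-Rows) next to actual (A-Rows) for the statement just run.
select *
from   table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
```

Comparing E-Rows with A-Rows per plan line shows where the redundant predicate skews the cardinality estimate.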
This is your query:
select * from a,b,c where a.date = b.date and b.date = c.date and a.date = c.date
In my view, it could be written as:
SELECT * FROM a
JOIN B USING(date)
JOIN C USING(date);

SQL cross join query slower than expected, refactoring ideas needed

This query works but takes 5000 milliseconds.
SELECT
SUM(case
when ((TRUNC(OPEN_DATE) <= thedate and TRUNC(END_DATE) > thedate) or(TRUNC(OPEN_DATE) <= thedate and END_DATE Is Null)) then 1
else 0
end) as Open
From (
select *
FROM PROJECT
WHERE
PROGRAM_NAME = :program
AND ACTION_FOR_ORG = :orgName
)
cross join (
select add_months(last_day(SYSDATE), level-7) as thedate
from dual
connect by level <= 12
)
GROUP BY thedate
ORDER BY thedate
If I copy the subquery to its own table
create table test_project as
select * FROM PROJECT WHERE PROGRAM_NAME = :program
AND ACTION_FOR_ORG = :orgName
and then run the above query with the subquery pointed at the copied table:
From ( select * FROM test_project WHERE PROGRAM_NAME = :program
AND ACTION_FOR_ORG = :orgName )
the query takes 10 milliseconds.
The query produces a count of how many projects were open in each month over the past 5 months and future months (the count of open projects for future months will just equal the current month's total), based on comparing OPEN_DATE to END_DATE.
Is there a way to rewrite the original query for optimal performance?
EDIT
OK, I created a second table which is a full copy of the project table (well, view) that I was allowed access to. The table copy took about 5 seconds. Using the full set of data and either my SQL query or the one from Egor below, the query is super fast. Something is up with the view. When I try to produce an explain plan using the view in the subquery, I get insufficient privileges. Here is the explain plan using a full copy of the view:
Plan hash value: 3695211866
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 637 | 1277K| 163 (2)| 00:00:02 |
| 1 | SORT ORDER BY | | 637 | 1277K| 163 (2)| 00:00:02 |
| 2 | HASH GROUP BY | | 637 | 1277K| 163 (2)| 00:00:02 |
| 3 | MERGE JOIN CARTESIAN | | 637 | 1277K| 161 (0)| 00:00:02 |
| 4 | VIEW | | 1 | 6 | 2 (0)| 00:00:01 |
|* 5 | CONNECT BY WITHOUT FILTERING| | | | | |
| 6 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 7 | BUFFER SORT | | 637 | 1273K| 163 (2)| 00:00:02 |
|* 8 | TABLE ACCESS FULL | COMMIT_TEST | 637 | 1273K| 159 (0)| 00:00:02 |
Predicate Information (identified by operation id):
5 - filter(LEVEL<=12)
8 - filter("PROGRAM_NAME"='program_name' AND "ACTION_FOR_ORG"='action_for_org')
Note
- dynamic sampling used for this statement (level=2)
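Since the copy is fast and the view is slow, it may help to look at what the view actually does. A minimal sketch, assuming the view is named PROJECT and is visible to the current user through the ALL_VIEWS data dictionary view (if it is not listed there, a DBA would have to supply the definition):

```sql
-- Sketch only: inspect the defining query of the view.
-- In SQL*Plus, "set long 100000" first so the TEXT column is not truncated.
select owner, view_name, text_length, text
from   all_views
where  view_name = 'PROJECT';
```

If the defining query joins many tables or calls functions per row, that would explain why filtering the view costs seconds while the copied table answers in milliseconds.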
Explain Plan using live table
with
PRJ as (
select /*+ NO_UNNEST */
trunc(OPEN_DATE) as OPEN_DATE,
nvl(trunc(END_DATE), sysdate + 1000) as END_DATE
from
PROJECT
where
PROGRAM_NAME = :program
and ACTION_FOR_ORG = :orgName
),
DATES as (
select
add_months(trunc(last_day(SYSDATE)), level-7) as thedate
from dual
connect by level <= 12
)
SELECT
thedate,
sum(case when thedate between open_date and end_date then 1 end) as Open
FROM
DATES, PRJ
GROUP BY thedate
ORDER BY 1

Resources