SQL cross join query slower than expected, refactoring ideas needed - oracle

This query works but takes 5000 milliseconds.
SELECT
    SUM(case
            when (TRUNC(OPEN_DATE) <= thedate and TRUNC(END_DATE) > thedate)
              or (TRUNC(OPEN_DATE) <= thedate and END_DATE is null) then 1
            else 0
        end) as Open
FROM (
    select *
    FROM PROJECT
    WHERE PROGRAM_NAME = :program
      AND ACTION_FOR_ORG = :orgName
)
cross join (
    select add_months(last_day(SYSDATE), level - 7) as thedate
    from dual
    connect by level <= 12
)
GROUP BY thedate
ORDER BY thedate
If I copy the subquery's result set to its own table:
create table test_project as
select * FROM PROJECT
WHERE PROGRAM_NAME = :program
  AND ACTION_FOR_ORG = :orgName
then run the same query with the subquery reading from the copied table:
From ( select * FROM test_project WHERE PROGRAM_NAME = :program
       AND ACTION_FOR_ORG = :orgName )
the query takes 10 milliseconds.
The query produces a count of how many projects were open in each month over the past and future months (the count for future months will simply equal the current month's total), based on comparing OPEN_DATE to END_DATE.
Is there a way to rewrite the original query for optimal performance?
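For reference, the month-end generator in the cross join can be sanity-checked on its own. A minimal sketch (with trunc added to strip the time of day that last_day otherwise keeps; the answer below does the same). If SYSDATE fell in June 2020, this would return the twelve month-end dates from 2019-12-31 through 2020-11-30:
select add_months(trunc(last_day(sysdate)), level - 7) as thedate
from dual
connect by level <= 12;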
EDIT
OK, I created a second table which is a full copy of the project table (well, view) that I was allowed access to. The table copy took about 5 seconds. Against the full copy, both my SQL query and the one from Egor below are super fast. Something is up with the view. Trying to produce an explain plan with the view in the subquery, I get insufficient privileges. Here is the explain plan using the full copy of the view:
Plan hash value: 3695211866
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 637 | 1277K| 163 (2)| 00:00:02 |
| 1 | SORT ORDER BY | | 637 | 1277K| 163 (2)| 00:00:02 |
| 2 | HASH GROUP BY | | 637 | 1277K| 163 (2)| 00:00:02 |
| 3 | MERGE JOIN CARTESIAN | | 637 | 1277K| 161 (0)| 00:00:02 |
| 4 | VIEW | | 1 | 6 | 2 (0)| 00:00:01 |
|* 5 | CONNECT BY WITHOUT FILTERING| | | | | |
| 6 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 7 | BUFFER SORT | | 637 | 1273K| 163 (2)| 00:00:02 |
|* 8 | TABLE ACCESS FULL | COMMIT_TEST | 637 | 1273K| 159 (0)| 00:00:02 |
------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - filter(LEVEL<=12)
8 - filter("PROGRAM_NAME"='program_name' AND "ACTION_FOR_ORG"='action_for_org')
Note
- dynamic sampling used for this statement (level=2)
Explain Plan using live table

with
PRJ as (
    select /*+ NO_UNNEST */
           trunc(OPEN_DATE) as OPEN_DATE,
           nvl(trunc(END_DATE), sysdate + 1000) as END_DATE
    from PROJECT
    where PROGRAM_NAME = :program
      and ACTION_FOR_ORG = :orgName
),
DATES as (
    select add_months(trunc(last_day(SYSDATE)), level - 7) as thedate
    from dual
    connect by level <= 12
)
SELECT
    thedate,
    sum(case when thedate between open_date and end_date then 1 end) as Open
FROM DATES, PRJ
GROUP BY thedate
ORDER BY 1

Related

pl sql declare variable with query too

I want to make a simple query in PL/SQL.
Please suggest how to make it execute faster (ideally around 0.01 seconds against 1,000,000 rows).
first query:
select datetime
from product
order by datetime desc
FETCH NEXT 1 ROWS ONLY
The result of the first query will be used in the second query.
select *
from traceability
where endtime = [first query]
Please help me implement that logic in PL/SQL.
Thank you.
Please find below an example with sample data.
create table product as
select rownum product_id, DATE'2020-01-01' + NUMTODSINTERVAL(rownum-1, 'second') datetime
from dual connect by level <= 10;
create index product_idx on product(datetime);
create table traceability as
select
rownum id, DATE'2020-01-01' + NUMTODSINTERVAL(rownum-1, 'second') endtime
from dual connect by level <= 10;
create index traceability_idx on traceability(endtime);
Your query should be as follows:
select *
from traceability
where endtime = (select max(datetime)
                 from product);
The query will lead to this execution plan. See here how to get the execution plan.
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 22 | 3 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID | TRACEABILITY | 1 | 22 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | TRACEABILITY_IDX | 1 | | 1 (0)| 00:00:01 |
| 3 | SORT AGGREGATE | | 1 | 9 | | |
| 4 | INDEX FULL SCAN (MIN/MAX)| PRODUCT_IDX | 1 | 9 | 1 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ENDTIME"= (SELECT MAX("DATETIME") FROM "PRODUCT" "PRODUCT"))
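For reference, a minimal way to reproduce the plan above (a sketch against the sample tables):
explain plan for
select *
from traceability
where endtime = (select max(datetime)
                 from product);

select * from table(dbms_xplan.display);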
Note that if the table TRACEABILITY contains a large number of rows with the max timestamp, you may instead see a FULL TABLE SCAN at line 1.
The same holds for the PRODUCT table at line 4.
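Since the question asked for PL/SQL, here is a minimal sketch wrapping the same logic in an anonymous block (it assumes the sample tables above; the single-statement SQL form remains the faster choice):
declare
  v_max_datetime product.datetime%type;
begin
  -- first query: the latest datetime from product
  select max(datetime)
    into v_max_datetime
    from product;

  -- second query: all traceability rows with that endtime
  for rec in (select id, endtime
                from traceability
               where endtime = v_max_datetime)
  loop
    dbms_output.put_line(rec.id || ' ' || to_char(rec.endtime, 'YYYY-MM-DD HH24:MI:SS'));
  end loop;
end;
/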

Oracle - Query Optimization - Query runs for a long time

I have an Oracle query which is executed once a month to get the order details processed. This query takes a painfully long time to execute (more than thirty minutes), so I am trying to optimize it. I have decent knowledge of Oracle and will explain what I have tried so far. It still takes around 20 minutes to complete. This is the query. The Oracle version is 11g.
SELECT store_typ, store_no, COUNT(order_no)
FROM (
    SELECT DISTINCT(order_no), store.store_no, store.store_typ
    FROM (
        SELECT trx.order_no, trx.ADDED_DATE, odr.prod_typ, odr.store_no
        FROM daily_trx trx
        LEFT OUTER JOIN (
            SELECT odr.order_no, odr.prod_typ, prod.store_no
            FROM order_main odr
            LEFT OUTER JOIN ORDR_PROD_TYP prod
                ON odr.prod_typ = prod.prod_typ
        ) odr
            ON trx.order_no = odr.order_no
    ) daily_orders,
    (SELECT store_no, store_typ FROM main_stores) store
    WHERE 1=1
      AND daily_orders.order_no != 'NA'
      AND store.store_no = daily_orders.store_no
      AND to_timestamp(to_char(daily_orders.ADDED_DATE,'DD-MM-YYYY HH24:MI:SS'),'DD-MM-YYYY HH24:MI:SS') >= to_date('01-05-2020 00:00:00','DD-MM-YYYY HH24:MI:SS')
      AND to_timestamp(to_char(daily_orders.ADDED_DATE,'DD-MM-YYYY HH24:MI:SS'),'DD-MM-YYYY HH24:MI:SS') <= to_date('31-05-2020 23:59:59','DD-MM-YYYY HH24:MI:SS')
)
GROUP BY store_typ, store_no
Background
order_main - this table has over 4 million records.
I introduced an index on the order_no column, which reduced the execution time.
My questions are as follows.
1) Will it help if I move the date validation inside the inner query, like this?
SELECT store_typ, store_no, COUNT(order_no)
FROM (
    SELECT DISTINCT(order_no), store.store_no, store.store_typ
    FROM (
        SELECT trx.order_no, trx.ADDED_DATE, odr.prod_typ, odr.store_no
        FROM daily_trx trx
        LEFT OUTER JOIN (
            SELECT odr.order_no, odr.prod_typ, prod.store_no
            FROM order_main odr
            LEFT OUTER JOIN ORDR_PROD_TYP prod
                ON odr.prod_typ = prod.prod_typ
        ) odr
            ON trx.order_no = odr.order_no
        WHERE to_timestamp(to_char(trx.ADDED_DATE,'DD-MM-YYYY HH24:MI:SS'),'DD-MM-YYYY HH24:MI:SS') >= to_date('01-05-2020 00:00:00','DD-MM-YYYY HH24:MI:SS')
          AND to_timestamp(to_char(trx.ADDED_DATE,'DD-MM-YYYY HH24:MI:SS'),'DD-MM-YYYY HH24:MI:SS') <= to_date('31-05-2020 23:59:59','DD-MM-YYYY HH24:MI:SS')
    ) daily_orders,
    (SELECT store_no, store_typ FROM main_stores) store
    WHERE 1=1
      AND daily_orders.order_no != 'NA'
      AND store.store_no = daily_orders.store_no
      --AND to_timestamp(to_char(daily_orders.ADDED_DATE,'DD-MM-YYYY HH24:MI:SS'),'DD-MM-YYYY HH24:MI:SS') >= to_date('01-05-2020 00:00:00','DD-MM-YYYY HH24:MI:SS')
      --AND to_timestamp(to_char(daily_orders.ADDED_DATE,'DD-MM-YYYY HH24:MI:SS'),'DD-MM-YYYY HH24:MI:SS') <= to_date('31-05-2020 23:59:59','DD-MM-YYYY HH24:MI:SS')
)
GROUP BY store_typ, store_no
2) Could someone please suggest any other improvements that can be made to this query?
3) Would additional indexing help on any other tables/columns? Only the daily_trx and order_main tables contain huge amounts of data.
Some general suggestions
Do not combine ANSI and Oracle join syntax in one query
Do not use an outer join if an inner join can be used
Your inner subqueries use outer joins, but the final join to main_stores is an inner join,
eliminating all rows where store_no is null - you may use inner joins with the same result.
Filter rows early
A suboptimal practice is to first join in a subquery and then filter the relevant rows with where conditions.
Use simple predicates
If you want to constrain a DATE column, do it this way:
trx.ADDED_DATE >= to_date('01-05-2020 00:00:00','DD-MM-YYYY HH24:MI:SS')
Use count distinct if appropriate
The SELECT DISTINCT query in the third line can be eliminated if you use COUNT(DISTINCT order_no).
Applying all the above points, I come to the following query:
select store.store_no, store.store_typ, count(DISTINCT trx.order_no) order_no_cnt
from daily_trx trx
join order_main odr on trx.order_no = odr.order_no
join ordr_prod_typ prod on odr.prod_typ = prod.prod_typ
join main_stores store on store.store_no = prod.store_no
where trx.ADDED_DATE >= date'2020-05-01'
  and trx.ADDED_DATE < date'2020-06-01'
  and trx.order_no != 'NA'
group by store.store_no, store.store_typ
Performance Considerations
You process a month of data, so there will probably be a large number of transactions (say 100K+). In this case the best approach is to full scan the two large tables and perform HASH JOINs.
You can expect this execution plan
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 199K| 5850K| | 592 (2)| 00:00:08 |
|* 1 | HASH JOIN | | 199K| 5850K| | 592 (2)| 00:00:08 |
| 2 | TABLE ACCESS FULL | MAIN_STORES | 26 | 104 | | 3 (0)| 00:00:01 |
|* 3 | HASH JOIN | | 199K| 5070K| | 588 (2)| 00:00:08 |
| 4 | TABLE ACCESS FULL | ORDR_PROD_TYP | 26 | 104 | | 3 (0)| 00:00:01 |
|* 5 | HASH JOIN | | 199K| 4290K| 1960K| 584 (1)| 00:00:08 |
|* 6 | TABLE ACCESS FULL| ORDER_MAIN | 100K| 782K| | 69 (2)| 00:00:01 |
|* 7 | TABLE ACCESS FULL| DAILY_TRX | 200K| 2734K| | 172 (2)| 00:00:03 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("STORE"."STORE_NO"="PROD"."STORE_NO")
3 - access("ODR"."PROD_TYP"="PROD"."PROD_TYP")
5 - access("TRX"."ORDER_NO"="ODR"."ORDER_NO")
6 - filter("ODR"."ORDER_NO"<>'NA')
7 - filter("TRX"."ADDED_DATE"<TO_DATE(' 2020-06-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') AND "TRX"."ORDER_NO"<>'NA' AND "TRX"."ADDED_DATE">=TO_DATE(' 2020-05-01
00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
If you have the partitioning option available, you will profit massively from defining a monthly (or alternatively daily) partitioning scheme on the two tables DAILY_TRX and ORDER_MAIN.
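A sketch of what that could look like (daily_trx_part is a hypothetical name; interval partitioning, available from 11g, creates the monthly partitions automatically as data arrives):
-- assumes the partitioning option is licensed
create table daily_trx_part
partition by range (added_date)
interval (numtoyminterval(1, 'MONTH'))
(partition p_initial values less than (date'2020-01-01'))
as select * from daily_trx;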
If the above assumption is not correct and you have very few transactions in the selected time interval (say below 1K), you will do better using index access and NESTED LOOPS joins.
You will need this set of indexes:
create index daily_trx_date on daily_trx(ADDED_DATE);
create unique index order_main_idx on order_main (order_no);
create unique index ORDR_PROD_TYP_idx1 on ORDR_PROD_TYP(prod_typ);
create unique index main_stores_idx1 on main_stores(store_no);
The expected plan is as follows
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 92 | 2760 | 80 (4)| 00:00:01 |
|* 1 | HASH JOIN | | 92 | 2760 | 80 (4)| 00:00:01 |
|* 2 | TABLE ACCESS BY INDEX ROWID | DAILY_TRX | 92 | 1288 | 4 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | DAILY_TRX_DATE | 92 | | 3 (0)| 00:00:01 |
|* 4 | HASH JOIN | | 100K| 1564K| 75 (3)| 00:00:01 |
| 5 | MERGE JOIN | | 26 | 208 | 6 (17)| 00:00:01 |
| 6 | TABLE ACCESS BY INDEX ROWID| MAIN_STORES | 26 | 104 | 2 (0)| 00:00:01 |
| 7 | INDEX FULL SCAN | MAIN_STORES_IDX1 | 26 | | 1 (0)| 00:00:01 |
|* 8 | SORT JOIN | | 26 | 104 | 4 (25)| 00:00:01 |
| 9 | TABLE ACCESS FULL | ORDR_PROD_TYP | 26 | 104 | 3 (0)| 00:00:01 |
|* 10 | TABLE ACCESS FULL | ORDER_MAIN | 100K| 782K| 69 (2)| 00:00:01 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("TRX"."ORDER_NO"="ODR"."ORDER_NO")
2 - filter("TRX"."ORDER_NO"<>'NA')
3 - access("TRX"."ADDED_DATE">=TO_DATE(' 2020-06-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') AND "TRX"."ADDED_DATE"<TO_DATE(' 2020-07-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss'))
4 - access("ODR"."PROD_TYP"="PROD"."PROD_TYP")
8 - access("STORE"."STORE_NO"="PROD"."STORE_NO")
filter("STORE"."STORE_NO"="PROD"."STORE_NO")
10 - filter("ODR"."ORDER_NO"<>'NA')
Check here how to get the execution plan of your query

Update statement is slow with sum and nvl function

I have a procedure in which a table's columns are filled using SUM and NVL functions on other tables' columns. These update queries are slow, which makes the overall procedure slow. One such update query is below:
UPDATE t_final wp
SET PCT = (
    SELECT SUM(NVL(pct, 0))
    FROM t_overall
    WHERE rid = 9
      AND rtype = 1
      AND sid = 'r12'
      AND pid = 21
      AND mid = wp.mid
)
WHERE rid = 9 AND rtype = 1 AND sid = 'r12' AND pid = 21;
Here neither t_overall nor t_final has any indexes, as both tables receive multiple updates in the overall procedure. The number of records is around 8,500 for t_final and around 13,000 for t_overall. Is there any other way I can write the above query in a more optimized fashion?
Edit 1: Here the SUM(NVL(pct,0)) expression first replaces null with 0 in the pct column of t_overall, then adds up all the pct values; the result updates the pct column of t_final according to the criteria.
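As a side note (an illustration, not from the original post): SUM already skips NULLs, so the inner NVL only matters when every matching row has a NULL pct; NVL(SUM(pct), 0) gives the same result with a single NVL call instead of one per row:
select sum(pct)         as sum_pct,     -- NULL: sum over an all-NULL set is NULL
       sum(nvl(pct, 0)) as sum_nvl_pct, -- 0
       nvl(sum(pct), 0) as nvl_sum_pct  -- 0
from (select cast(null as number) as pct from dual);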
Explain plan returns below:
OPERATION                    OBJECT_NAME  CARDINALITY  COST
UPDATE STATEMENT                          6            424
  UPDATE                     T_FINAL
    TABLE ACCESS (FULL)      T_FINAL      6            238
      Filter Predicates
        AND
          RTYPE=6
          SID='R12'
          RID=9
          PID=21
    SORT (AGGREGATE)                      1
      TABLE ACCESS (FULL)    T_OVERALL    1            30
        Filter Predicates
          AND
            MID=:B1
            RTYPE=6
            SID='R12'
            RID=9
            PID=21
The number of updated rows is around 2200.
Edit 2: I have run the update query with the hint /*+ gather_plan_statistics */ as below:
ALTER session SET statistics_level=ALL;
UPDATE /*+ gather_plan_statistics */ t_final wp
SET PCT =
(
SELECT SUM(NVL(pct,0))
FROM t_overall
WHERE rid = 9
AND rtype = 1
AND sid = 'r12'
AND pid = 21
AND mid = wp.mid
)
WHERE rid = 9 AND rtype = 1 AND sid = 'r12' AND pid = 21;
select * from
table (dbms_xplan.display_cursor (format=>'ALLSTATS LAST'));
The result is:
SQL_ID gypnfv5nzurb0, child number 1
-------------------------------------
select child_number from v$sql where sql_id = :1 order by
child_number
Plan hash value: 4252345203
---------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | OMem | 1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 2 |00:00:00.01 | | | |
| 1 | SORT ORDER BY | | 1 | 1 | 2 |00:00:00.01 | 2048 | 2048 | 2048 (0)|
|* 2 | FIXED TABLE FIXED INDEX| X$KGLCURSOR_CHILD (ind:2) | 1 | 1 | 2 |00:00:00.01 | | | |
---------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(("KGLOBT03"=:1 AND "INST_ID"=USERENV('INSTANCE')))
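Note that the output above belongs to a different statement (the v$sql lookup), because dbms_xplan.display_cursor(null, null, ...) reports the most recently executed statement in the session. To capture the UPDATE's plan, the lookup has to be the very next statement, as in this sketch:
alter session set statistics_level = all;

update /*+ gather_plan_statistics */ t_final wp
set pct = (select sum(nvl(pct, 0))
           from t_overall
           where rid = 9 and rtype = 1 and sid = 'r12' and pid = 21
             and mid = wp.mid)
where rid = 9 and rtype = 1 and sid = 'r12' and pid = 21;

-- run this immediately afterwards, in the same session
select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));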
Thank you.
You did not provide enough information for a unique diagnosis, so I can only show you how to troubleshoot your query.
Here is my setup simulating your data:
create table t_final as
select rownum mid, 8 + mod(rownum,4) rid, 1 rtype, 'r12' sid, 21 pid, 0 pct from dual
connect by level <= 8800;
drop table T_OVERALL;
create table T_OVERALL as
select mod(rownum,8800) mid, 8 + mod(rownum,4) rid, 1 rtype, 'r12' sid, 21 pid, rownum pct from dual
connect by level <= 13000;
Now I run the query with statistics gathering activated to see what the query is doing:
SQL> UPDATE /*+ gather_plan_statistics */ t_final wp
2 SET PCT =
3 (
4 SELECT SUM(NVL(pct,0))
5 FROM t_overall
6 WHERE rid = 9
7 AND rtype = 1
8 AND sid = 'r12'
9 AND pid = 21
10 AND mid = wp.mid
11 )
12 WHERE rid = 9 AND rtype = 1 AND sid = 'r12' AND pid = 21;
2200 rows updated.
Elapsed: 00:00:00.97
So nearly one second of elapsed time, which is slow if you have a lot of such updates. To see the cause, we display the cursor and its statistics (this is possible thanks to the hint /*+ gather_plan_statistics */):
SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------------------
SQL_ID 3ctaz5gvksb54, child number 0
-------------------------------------
UPDATE /*+ gather_plan_statistics */ t_final wp SET PCT = (
SELECT SUM(NVL(pct,0)) FROM t_overall WHERE rid
= 9 AND rtype = 1 AND sid = 'r12' AND pid =
21 AND mid = wp.mid ) WHERE rid = 9 AND rtype =
1 AND sid = 'r12' AND pid = 21
Plan hash value: 1255260726
-------------------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | 1 | | 0 |00:00:00.96 | 116K|
| 1 | UPDATE | T_FINAL | 1 | | 0 |00:00:00.96 | 116K|
|* 2 | TABLE ACCESS FULL | T_FINAL | 1 | 2200 | 2200 |00:00:00.01 | 33 |
| 3 | SORT AGGREGATE | | 2200 | 1 | 2200 |00:00:00.92 | 112K|
|* 4 | TABLE ACCESS FULL| T_OVERALL | 2200 | 33 | 3250 |00:00:00.85 | 112K|
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------------------
2 - filter(("RID"=9 AND "RTYPE"=1 AND "PID"=21 AND "SID"='r12'))
4 - filter(("RID"=9 AND "RTYPE"=1 AND "PID"=21 AND "MID"=:B1 AND "SID"='r12'))
So you see the main problem was the FULL TABLE SCAN on T_OVERALL, which was executed 2200 times (column Starts, line 4).
A remedy could be an index based on the filter predicate of line 4:
create index T_OVERALL_IDX on T_OVERALL(mid, rid, rtype, sid, pid);
On the same data I now got:
Elapsed: 00:00:00.05
with the changed plan now using 2200 INDEX RANGE SCANs:
---------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | 1 | | 0 |00:00:00.05 | 10272 |
| 1 | UPDATE | T_FINAL | 1 | | 0 |00:00:00.05 | 10272 |
|* 2 | TABLE ACCESS FULL | T_FINAL | 1 | 2200 | 2200 |00:00:00.01 | 33 |
| 3 | SORT AGGREGATE | | 2200 | 1 | 2200 |00:00:00.01 | 5755 |
| 4 | TABLE ACCESS BY INDEX ROWID| T_OVERALL | 2200 | 33 | 3250 |00:00:00.01 | 5755 |
|* 5 | INDEX RANGE SCAN | T_OVERALL_IDX | 2200 | 1 | 3250 |00:00:00.01 | 2505 |
---------------------------------------------------------------------------------------------------------
Simply recheck the same approach with your data; if you observe different behavior, feel free to post it.
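An alternative worth testing (a sketch, not part of the original answer): pre-aggregate t_overall once and apply the result with a MERGE, so the detail table is scanned a single time instead of once per updated row. Note the semantic difference: rows of t_final with no match in t_overall are left untouched here, whereas the original UPDATE sets their pct to NULL.
merge into t_final wp
using (select mid, sum(nvl(pct, 0)) as total_pct
       from t_overall
       where rid = 9 and rtype = 1 and sid = 'r12' and pid = 21
       group by mid) src
on (wp.mid = src.mid)
when matched then
  update set wp.pct = src.total_pct
  where wp.rid = 9 and wp.rtype = 1 and wp.sid = 'r12' and wp.pid = 21;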

How to reuse sql with subquery factoring by database view

What is the best practice for converting the following SQL statement, which uses subquery factoring (a with data as clause), for use in a database view?
AFAIK the with data as clause is not supported in database views (edited: Oracle does support common table expressions), but in my case the subquery factoring offers a performance advantage. If I create a database view using a common table expression, that advantage is lost.
Please have a look at my example:
Description of query
a_table
Millions of entries; the select statement selects a few thousand of them.
anchor_table
For each entry in a_table there is a corresponding entry in anchor_table. Through this table, exactly one row is determined at runtime as the anchor. See the example below.
horizon_table
For each selection exactly one entry is determined at runtime (all entries of a selection of a_table have the same horizon_id).
Please note: this is a strongly simplified SQL that works fine so far.
In reality more than 20 tables are joined together to get the results.
The where clause is much more complex.
Further columns of horizon_table and anchor_table are required to prepare my where condition and result list in the subquery, i.e. moving these tables to the main query is not a solution.
with data as (
select
a_table.id,
a_table.descr,
horizon_table.offset,
case
when anchor_table.a_date = trunc(sysdate) then
1
else
0
end as anchor,
row_number() over(
order by a_table.a_position_field) as position
from a_table
join anchor_table on (anchor_table.id = a_table.anchor_id)
join horizon_table on (horizon_table.id = a_table.horizon_id)
where a_table.a_value between 1 and 10000
)
select *
from data d
where d.position between (
select d1.position - d.offset
from data d1
where d1.anchor = 1)
and (
select d2.position + d.offset
from data d2
where d2.anchor = 1)
example of with data as select:
id  descr  offset  anchor  position
 1  bla         3       0         1
 2  blab        3       0         2
 5  dfkdj       3       0         3
 4  dld         3       0         4
 6  oeroe       3       1         5
 3  blab        3       0         6
 9  dfkdj       3       0         7
14  dld         3       0         8
54  oeroe       3       0         9
...
result of select * from data
id  descr  offset  anchor  position
 2  blab        3       0         2
 5  dfkdj       3       0         3
 4  dld         3       0         4
 6  oeroe       3       1         5
 3  blab        3       0         6
 9  dfkdj       3       0         7
14  dld         3       0         8
I.e. the result is the anchor row and the three rows above and below it.
How can I achieve the same within a database view?
My attempt failed with the performance issues I expected:
Create a view data from the with data as select above
Use this view as above:
select *
from data d
where d.position between (
select d1.position - d.offset
from data d1
where d1.anchor = 1)
and (
select d2.position + d.offset
from data d2
where d2.anchor = 1)
Thank you for any advice :-)
Amendment
If I create a view as recommended in the first comment, then I get the same performance issue. Oracle does not use the subquery to restrict the results.
Here are the execution plans of my production queries (please click on the images):
a) SQL
b) View
Here are the execution plans of my test cases
-- Create Testdata table with ~ 1,000,000 entries
insert into a_table
(id, descr, a_position_field, anchor_id, horizon_id, a_value)
select level, 'data' || level, mod(level, 10), level, 1, level
from dual
connect by level <= 999999;
insert into anchor_table
(id, a_date)
select level, trunc(sysdate) - 500000 + level
from dual
connect by level <= 999999;
insert into horizon_table (id, offset) values (1, 50);
commit;
-- Create view
create or replace view testdata_vw as
with data as
(select a_table.id,
a_table.descr,
a_table.a_value,
horizon_table.offset,
case
when anchor_table.a_date = trunc(sysdate) then
1
else
0
end as anchor,
row_number() over(order by a_table.a_position_field) as position
from a_table
join anchor_table
on (anchor_table.id = a_table.anchor_id)
join horizon_table
on (horizon_table.id = a_table.horizon_id))
select *
from data d
where d.position between
(select d1.position - d.offset from data d1 where d1.anchor = 1) and
(select d2.position + d.offset from data d2 where d2.anchor = 1);
-- Explain plan of subquery factoring select statement
explain plan for
with data as
(select a_table.id,
a_table.descr,
a_value,
horizon_table.offset,
case
when anchor_table.a_date = trunc(sysdate) then
1
else
0
end as anchor,
row_number() over(order by a_table.a_position_field) as position
from a_table
join anchor_table
on (anchor_table.id = a_table.anchor_id)
join horizon_table
on (horizon_table.id = a_table.horizon_id)
where a_table.a_value between 500000 - 500 and 500000 + 500)
select *
from data d
where d.position between
(select d1.position - d.offset from data d1 where d1.anchor = 1) and
(select d2.position + d.offset from data d2 where d2.anchor = 1);
select plan_table_output
from table(dbms_xplan.display('plan_table', null, null));
/*
Note: Size of SYS_TEMP_0FD9D6628_284C5768 ~ 1000 rows
Plan hash value: 1145408420
----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 62 | 1791 (2)| 00:00:31 |
| 1 | TEMP TABLE TRANSFORMATION | | | | | |
| 2 | LOAD AS SELECT | SYS_TEMP_0FD9D6628_284C5768 | | | | |
| 3 | WINDOW SORT | | 57 | 6840 | 1785 (2)| 00:00:31 |
|* 4 | HASH JOIN | | 57 | 6840 | 1784 (2)| 00:00:31 |
|* 5 | TABLE ACCESS FULL | A_TABLE | 57 | 4104 | 1193 (2)| 00:00:21 |
| 6 | MERGE JOIN CARTESIAN | | 1189K| 54M| 586 (2)| 00:00:10 |
| 7 | TABLE ACCESS FULL | HORIZON_TABLE | 1 | 26 | 3 (0)| 00:00:01 |
| 8 | BUFFER SORT | | 1189K| 24M| 583 (2)| 00:00:10 |
| 9 | TABLE ACCESS FULL | ANCHOR_TABLE | 1189K| 24M| 583 (2)| 00:00:10 |
|* 10 | FILTER | | | | | |
| 11 | VIEW | | 57 | 3534 | 2 (0)| 00:00:01 |
| 12 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6628_284C5768 | 57 | 4104 | 2 (0)| 00:00:01 |
|* 13 | VIEW | | 57 | 912 | 2 (0)| 00:00:01 |
| 14 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6628_284C5768 | 57 | 4104 | 2 (0)| 00:00:01 |
|* 15 | VIEW | | 57 | 912 | 2 (0)| 00:00:01 |
| 16 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6628_284C5768 | 57 | 4104 | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("HORIZON_TABLE"."ID"="A_TABLE"."HORIZON_ID" AND
"ANCHOR_TABLE"."ID"="A_TABLE"."ANCHOR_ID")
5 - filter("A_TABLE"."A_VALUE">=499500 AND "A_TABLE"."A_VALUE"<=500500)
10 - filter("D"."POSITION">= (SELECT "D1"."POSITION"-:B1 FROM (SELECT + CACHE_TEMP_TABLE
("T1") "C0" "ID","C1" "DESCR","C2" "A_VALUE","C3" "OFFSET","C4" "ANCHOR","C5" "POSITION" FROM
"SYS"."SYS_TEMP_0FD9D6628_284C5768" "T1") "D1" WHERE "D1"."ANCHOR"=1) AND "D"."POSITION"<=
(SELECT "D2"."POSITION"+:B2 FROM (SELECT + CACHE_TEMP_TABLE ("T1") "C0" "ID","C1"
"DESCR","C2" "A_VALUE","C3" "OFFSET","C4" "ANCHOR","C5" "POSITION" FROM
"SYS"."SYS_TEMP_0FD9D6628_284C5768" "T1") "D2" WHERE "D2"."ANCHOR"=1))
13 - filter("D1"."ANCHOR"=1)
15 - filter("D2"."ANCHOR"=1)
Note
-----
- dynamic sampling used for this statement (level=4)
*/
-- Explain plan of database view
explain plan for
select *
from testdata_vw
where a_value between 500000 - 500 and 500000 + 500;
select plan_table_output
from table(dbms_xplan.display('plan_table', null, null));
/*
Note: Size of SYS_TEMP_0FD9D662A_284C5768 ~ 1000000 rows
Plan hash value: 1422141561
-------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2973 | 180K| | 50324 (1)| 00:14:16 |
| 1 | VIEW | TESTDATA_VW | 2973 | 180K| | 50324 (1)| 00:14:16 |
| 2 | TEMP TABLE TRANSFORMATION | | | | | | |
| 3 | LOAD AS SELECT | SYS_TEMP_0FD9D662A_284C5768 | | | | | |
| 4 | WINDOW SORT | | 1189K| 136M| 147M| 37032 (1)| 00:10:30 |
|* 5 | HASH JOIN | | 1189K| 136M| | 6868 (1)| 00:01:57 |
| 6 | TABLE ACCESS FULL | HORIZON_TABLE | 1 | 26 | | 3 (0)| 00:00:01 |
|* 7 | HASH JOIN | | 1189K| 106M| 38M| 6860 (1)| 00:01:57 |
| 8 | TABLE ACCESS FULL | ANCHOR_TABLE | 1189K| 24M| | 583 (2)| 00:00:10 |
| 9 | TABLE ACCESS FULL | A_TABLE | 1209K| 83M| | 1191 (2)| 00:00:21 |
|* 10 | FILTER | | | | | | |
|* 11 | VIEW | | 1189K| 70M| | 4431 (1)| 00:01:16 |
| 12 | TABLE ACCESS FULL | SYS_TEMP_0FD9D662A_284C5768 | 1189K| 81M| | 4431 (1)| 00:01:16 |
|* 13 | VIEW | | 1189K| 18M| | 4431 (1)| 00:01:16 |
| 14 | TABLE ACCESS FULL | SYS_TEMP_0FD9D662A_284C5768 | 1189K| 81M| | 4431 (1)| 00:01:16 |
|* 15 | VIEW | | 1189K| 18M| | 4431 (1)| 00:01:16 |
| 16 | TABLE ACCESS FULL | SYS_TEMP_0FD9D662A_284C5768 | 1189K| 81M| | 4431 (1)| 00:01:16 |
-------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("HORIZON_TABLE"."ID"="A_TABLE"."HORIZON_ID")
7 - access("ANCHOR_TABLE"."ID"="A_TABLE"."ANCHOR_ID")
10 - filter("D"."POSITION">= (SELECT "D1"."POSITION"-:B1 FROM (SELECT + CACHE_TEMP_TABLE ("T1")
"C0" "ID","C1" "DESCR","C2" "A_VALUE","C3" "OFFSET","C4" "ANCHOR","C5" "POSITION" FROM
"SYS"."SYS_TEMP_0FD9D662A_284C5768" "T1") "D1" WHERE "D1"."ANCHOR"=1) AND "D"."POSITION"<= (SELECT
"D2"."POSITION"+:B2 FROM (SELECT + CACHE_TEMP_TABLE ("T1") "C0" "ID","C1" "DESCR","C2"
"A_VALUE","C3" "OFFSET","C4" "ANCHOR","C5" "POSITION" FROM "SYS"."SYS_TEMP_0FD9D662A_284C5768" "T1") "D2"
WHERE "D2"."ANCHOR"=1))
11 - filter("A_VALUE">=499500 AND "A_VALUE"<=500500)
13 - filter("D1"."ANCHOR"=1)
15 - filter("D2"."ANCHOR"=1)
Note
-----
- dynamic sampling used for this statement (level=4)
*/
sqlfiddle
explain plan of sql http://www.sqlfiddle.com/#!4/6a7022/3
explain plan of view http://www.sqlfiddle.com/#!4/6a7022/2
You need to write a view definition which returns all possible selectable ranges of a_value as two columns, start_a_value and end_a_value, along with all records which fall into each start/end range. In other words, the correct view definition should logically describe a result set of roughly n^3 rows given n rows in a_table.
Then query that view as:
SELECT * FROM testdata_vw WHERE START_A_VALUE = 4950 AND END_A_VALUE = 5050;
Also, your multiple references to "data" are unnecessary; same logic can be delivered with an additional analytic function.
Final view def:
CREATE OR REPLACE VIEW testdata_vw AS
SELECT *
FROM
(
SELECT T.*,
MAX(CASE WHEN ANCHOR=1 THEN POSITION END)
OVER (PARTITION BY START_A_VALUE, END_A_VALUE) ANCHOR_POS
FROM
(
SELECT S.A_VALUE START_A_VALUE,
E.A_VALUE END_A_VALUE,
B.ID ID,
B.DESCR DESCR,
HORIZON_TABLE.OFFSET OFFSET,
CASE
WHEN ANCHOR_TABLE.A_DATE = TRUNC(SYSDATE)
THEN 1
ELSE 0
END ANCHOR,
ROW_NUMBER()
OVER(PARTITION BY S.A_VALUE, E.A_VALUE
ORDER BY B.A_POSITION_FIELD) POSITION
FROM
A_TABLE S
JOIN A_TABLE E
ON S.A_VALUE<E.A_VALUE
JOIN A_TABLE B
ON B.A_VALUE BETWEEN S.A_VALUE AND E.A_VALUE
JOIN ANCHOR_TABLE
ON ANCHOR_TABLE.ID = B.ANCHOR_ID
JOIN HORIZON_TABLE
ON HORIZON_TABLE.ID = B.HORIZON_ID
) T
) T
WHERE POSITION BETWEEN ANCHOR_POS - OFFSET AND ANCHOR_POS+OFFSET;
EDIT: SQL Fiddle with expected execution plan
I'm seeing the same (sensible) plan here that I saw in my database; if you're getting something different, please send fiddle link.
Use index lookup to find 1 row in "S" A_TABLE (A_VALUE = 4950)
Use index lookup to find 1 row in "E" A_TABLE (A_VALUE = 5050)
Nested Loop join #1 and #2 (1 x 1 join, still 1 row)
FTS 1 row from HORIZON table
Cartesian join #1 and #2 (1 x 1, okay to use Cartesian).
Use index lookup to find ~100 rows in "B" A_TABLE with values between 4950 and 5050.
Cartesian join #5 and #6 (1 x 102, okay to use Cartesian).
FTS ANCHOR_TABLE with hash join to #7.
Window-sort for analytic functions
You have a predicate outside the view and you want it to be applied inside the view.
For this, you can use the push_pred hint:
select /*+PUSH_PRED(v)*/
*
from
testdata_vw v
where
a_value between 5000 - 50 and 5000 + 50;
SQLFIDDLE
EDIT: Now I've seen that you use the data subquery three times. For the first occurrence it makes sense to push the predicate, but for d1 and d2 it doesn't. It's another query.
What I would do is use two context variables, set them according to my needs, and write the query using SYS_CONTEXT('my_context_name', 'var5000'):
create or replace view testdata_vw as
with data as (
select
a_table.id,
a_table.descr,
horizon_table.offset,
case
when anchor_table.a_date = trunc(sysdate) then
1
else
0
end as anchor,
row_number() over(
order by a_table.a_position_field) as position
from a_table
join anchor_table on (anchor_table.id = a_table.anchor_id)
join horizon_table on (horizon_table.id = a_table.horizon_id)
where a_table.a_value between SYS_CONTEXT('my_context_name', 'var5000') - SYS_CONTEXT('my_context_name', 'var50') and SYS_CONTEXT('my_context_name', 'var5000') + SYS_CONTEXT('my_context_name', 'var50')
)
select *
from data d
where d.position between (
select d1.position - d.offset
from data d1
where d1.anchor = 1)
and (
select d2.position + d.offset
from data d2
where d2.anchor = 1) ;
to use it:
dbms_session.set_context ('my_context_name', 'var5000', 5000);
dbms_session.set_context ('my_context_name', 'var50', 50);
select * from testdata_vw;
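One prerequisite the answer glosses over (a sketch; my_context_pkg is a hypothetical name): the context must exist before it can be set, and it is bound to a trusted package, so dbms_session.set_context may only be called from inside that package:
create or replace context my_context_name using my_context_pkg;

create or replace package my_context_pkg as
  procedure set_var(p_name varchar2, p_value varchar2);
end my_context_pkg;
/
create or replace package body my_context_pkg as
  procedure set_var(p_name varchar2, p_value varchar2) is
  begin
    dbms_session.set_context('my_context_name', p_name, p_value);
  end set_var;
end my_context_pkg;
/

-- then, instead of calling dbms_session.set_context directly:
begin
  my_context_pkg.set_var('var5000', '5000');
  my_context_pkg.set_var('var50', '50');
end;
/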
UPDATE: Instead of context variables (which can be used across sessions) you can use package variables, as you commented.

Oracle optimizer equijoin of 3 tables

I have 3 tables in an Oracle 11g database. I don't have access to the trace file or explain plan anymore. I join the 3 tables on the date field like:
select * from a,b,c where a.date = b.date and b.date = c.date
and that takes forever.
when I
select * from a,b,c where a.date = b.date and b.date = c.date and a.date = c.date
it's fast. But should that make a difference?
Not sure, but it looks like a transitive dependency; that is to say, if a.date = b.date and b.date = c.date then a.date = c.date. You can modify your query like:
select a.*
from a
join b on a.date = b.date
join c on a.date = c.date;
I would also have an index on the date column for all these 3 tables, since that's the column you are joining on.
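For example (a sketch; index names are illustrative, and dt stands in for the date column, matching the test setup below):
create index a_dt_idx on a (dt);
create index b_dt_idx on b (dt);
create index c_dt_idx on c (dt);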
Apparently the database does not rewrite queries if the joins are such that A = B, B = C ==> A = C, so it's stuck using what it's given.
Consider the following:
create table a (dt date);
create table b (dt date);
create table c (dt date);
Now fill in the tables so that a is the smallest (5 rows), b is the biggest (100 rows), and c is in the middle (50 rows). Also, so that not all rows in b and c will join to a just to make things a bit more interesting.
insert into a
select to_date('2015-01-01', 'yyyy-mm-dd') + rownum - 1
from dual
connect by level <= 5
;
insert into b
select to_date('2015-01-01', 'yyyy-mm-dd') + mod(rownum, 10)
from dual
connect by level <= 100
;
insert into c
select to_date('2015-01-01', 'yyyy-mm-dd') + mod(rownum, 10)
from dual
connect by level <= 50
;
I'm going to bypass statistics for now and leave it totally up to the database on how to figure out a plan.
Take 1: without the join from a to c:
explain plan for
select *
from a
, b
, c
where a.dt = b.dt
and b.dt = c.dt
;
and here's the plan:
select *
from table(dbms_xplan.display())
;
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 250 | 6750 | 9 (0)| 00:00:01 |
|* 1 | HASH JOIN | | 250 | 6750 | 9 (0)| 00:00:01 |
|* 2 | HASH JOIN | | 50 | 900 | 6 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| A | 5 | 45 | 3 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL| B | 100 | 900 | 3 (0)| 00:00:01 |
| 5 | TABLE ACCESS FULL | C | 50 | 450 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("B"."DT"="C"."DT")
2 - access("A"."DT"="B"."DT")
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
First off, since there were no statistics on the tables, Oracle chose to sample the data first so it wasn't going in blind. In this case, table a joins to b first, then the result of that joins to c.
Take 2: introduce the a.dt = c.dt condition:
explain plan for
select *
from a
, b
, c
where a.dt = b.dt
and b.dt = c.dt
and a.dt = c.dt
;
select *
from table(dbms_xplan.display())
;
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 25 | 675 | 9 (0)| 00:00:01 |
|* 1 | HASH JOIN | | 25 | 675 | 9 (0)| 00:00:01 |
|* 2 | HASH JOIN | | 25 | 450 | 6 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| A | 5 | 45 | 3 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL| C | 50 | 450 | 3 (0)| 00:00:01 |
| 5 | TABLE ACCESS FULL | B | 100 | 900 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."DT"="B"."DT" AND "B"."DT"="C"."DT")
2 - access("A"."DT"="C"."DT")
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
And there you go. The order of the joins has switched now that Oracle has been given the extra join path. (FYI, this is the same plan if using just a.dt = b.dt and a.dt = c.dt.)
BUT, notice anything? The estimates are not right anymore. It's guessing 25 rows in the end, not 250. So, the extra condition is actually causing some confusion.
Without the b.dt = c.dt, though, same join path, different estimates (same end result as the first one):
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 250 | 6750 | 9 (0)| 00:00:01 |
|* 1 | HASH JOIN | | 250 | 6750 | 9 (0)| 00:00:01 |
|* 2 | HASH JOIN | | 25 | 450 | 6 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| A | 5 | 45 | 3 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL| C | 50 | 450 | 3 (0)| 00:00:01 |
| 5 | TABLE ACCESS FULL | B | 100 | 900 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."DT"="B"."DT")
2 - access("A"."DT"="C"."DT")
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
Long story a little longer, since the database isn't going to assume any join paths for you, adding one in your query gives the database more options and as such can change its plan...and a change in plan can certainly affect how fast the results are returned.
This is your query:
select * from a,b,c where a.date = b.date and b.date = c.date and a.date = c.date
Now, as per my view:
SELECT * FROM a
JOIN B USING(date)
JOIN C USING(date);
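One caveat with this form (an editor's note): DATE is a reserved word in Oracle, so the join column needs a legal name for USING to work unquoted. With the dt column from the test setup above it would read:
select *
from a
join b using (dt)
join c using (dt);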