Using Oracle 10gR2 on Linux, I'm trying to tune the following query.
I'm pretty sure that getting rid of the correlated subqueries, possibly with some analytic functions, is the way to go, but I'm just not getting it -- especially with the nested correlated subquery that selects on MAX(TABLE_2.NOTE_DATE). Any help would be much appreciated. Thanks.
EXPLAIN PLAN FOR
SELECT TABLE_4.INCIDENT_TYPE,
TABLE_4.POC_CONTACT,
(SELECT TABLE_2.NOTE_DATE
|| ' '
|| TABLE_1.USER_FIRST_NAME
|| ' '
|| TABLE_1.USER_LAST_NAME
|| ' : '
|| TABLE_2.OTHER_HELP_NOTES
FROM TABLE_1,
TABLE_2
WHERE TABLE_2.USER_ID = TABLE_1.USER_ID
AND TABLE_2.REC_ID = TABLE_4.REC_ID
AND TABLE_2.NOTE_DATE = (SELECT MAX(TABLE_2.NOTE_DATE)
FROM TABLE_2
WHERE TABLE_2.REC_ID = TABLE_4.REC_ID
AND TABLE_2.NOTE_DATE <=
TABLE_4.REPORT_DATE))
AS SUM_OF_SHORTAGE,
(SELECT TABLE_3.NOTE_DATE
|| ' '
|| TABLE_1.USER_FIRST_NAME
|| ' '
|| TABLE_1.USER_LAST_NAME
|| ' : '
|| TABLE_3.HELP_NOTES
FROM TABLE_1,
TABLE_3
WHERE TABLE_3.USER_ID = TABLE_1.USER_ID
AND TABLE_3.REC_ID = TABLE_4.REC_ID
AND TABLE_3.NOTE_DATE = (SELECT MAX(TABLE_3.NOTE_DATE)
FROM TABLE_3
WHERE TABLE_3.REC_ID = TABLE_4.REC_ID
AND TABLE_3.NOTE_DATE <=
TABLE_4.REPORT_DATE)) AS HELP_NOTES,
TABLE_4.REPORT_NUM
FROM TABLE_4
WHERE TABLE_4.SITE_ID = '1';
@C:\ORACLE\PRODUCT\11.2.0\CLIENT_1\RDBMS\ADMIN\UTLXPLS.SQL;
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
PLAN HASH VALUE: 4036328474
------------------------------------------------------------------------------------------------------------
| ID | OPERATION | NAME | ROWS | BYTES | COST (%CPU)| TIME |
------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 13009 | 2286K| 449 (2)| 00:00:06 |
|* 1 | FILTER | | | | | |
| 2 | NESTED LOOPS | | 3 | 612 | 8 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| TABLE_2 | 3 | 552 | 5 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | IX_TABLE_2_REC_ID | 3 | | 1 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID| TABLE_1 | 1 | 20 | 1 (0)| 00:00:01 |
|* 6 | INDEX UNIQUE SCAN | TABLE_1_PK | 1 | | 0 (0)| 00:00:01 |
| 7 | SORT AGGREGATE | | 1 | 13 | | |
|* 8 | TABLE ACCESS BY INDEX ROWID| TABLE_2 | 1 | 13 | 5 (0)| 00:00:01 |
|* 9 | INDEX RANGE SCAN | IX_TABLE_2_REC_ID | 3 | | 1 (0)| 00:00:01 |
|* 10 | FILTER | | | | | |
|* 11 | HASH JOIN | | 17 | 4063 | 482 (2)| 00:00:06 |
|* 12 | TABLE ACCESS FULL | TABLE_3 | 17 | 3723 | 474 (2)| 00:00:06 |
| 13 | TABLE ACCESS FULL | TABLE_1 | 1504 | 30080 | 8 (0)| 00:00:01 |
| 14 | SORT AGGREGATE | | 1 | 13 | | |
|* 15 | TABLE ACCESS FULL | TABLE_3 | 1 | 13 | 474 (2)| 00:00:06 |
|* 16 | TABLE ACCESS FULL | TABLE_4 | 13009 | 2286K| 449 (2)| 00:00:06 |
------------------------------------------------------------------------------------------------------------
PREDICATE INFORMATION (IDENTIFIED BY OPERATION ID):
---------------------------------------------------
1 - FILTER("TABLE_2"."NOTE_DATE"= (SELECT /*+ */ MAX("TABLE_2"."NOTE_DATE")
FROM "TABLE_2" "TABLE_2" WHERE "TABLE_2"."REC_ID"=:B1 AND
"TABLE_2"."NOTE_DATE"<=:B2))
4 - ACCESS("TABLE_2"."REC_ID"=:B1)
6 - ACCESS("TABLE_2"."USER_ID"="TABLE_1"."USER_ID")
8 - FILTER("TABLE_2"."NOTE_DATE"<=:B1)
9 - ACCESS("TABLE_2"."REC_ID"=:B1)
10 - FILTER("TABLE_3"."NOTE_DATE"= (SELECT /*+ */
MAX("TABLE_3"."NOTE_DATE") FROM "TABLE_3" "TABLE_3" WHERE
"TABLE_3"."REC_ID"=:B1 AND "TABLE_3"."NOTE_DATE"<=:B2))
11 - ACCESS("TABLE_3"."USER_ID"="TABLE_1"."USER_ID")
12 - FILTER("TABLE_3"."REC_ID"=:B1)
15 - FILTER("TABLE_3"."REC_ID"=:B1 AND "TABLE_3"."NOTE_DATE"<=:B2)
16 - FILTER("TABLE_4"."SITE_ID"=1)
41 ROWS SELECTED
Breaking down this query -- the key problem seems to be the following:
select REC_ID, TO_CHAR(REPORT_DATE,'DD-MON-YY HH:MI:SS') REPORT_DATE,
(SELECT MAX(TABLE_2.note_date) as MAX_DATE
FROM TABLE_2
where TABLE_2.REC_ID = TABLE_1.REC_ID
and TABLE_2.NOTE_DATE <= TABLE_1.REPORT_DATE
) NOTES_MAX_DATE
from TABLE_1 where REC_ID = 121 order by TO_DATE(REPORT_DATE,'DD-MON-YY HH:MI:SS');
Which should return the following:
REC_ID REPORT_DATE NOTES_MAX_DATE
---------------------- ------------------ -------------------------
121 17-APR-10 12:30:00
121 24-APR-10 12:30:00
121 01-MAY-10 12:30:00
121 08-MAY-10 12:30:00
121 15-MAY-10 12:30:00 12-MAY-10
121 22-MAY-10 12:30:01 17-MAY-10
121 29-MAY-10 12:30:01 25-MAY-10
121 05-JUN-10 12:30:00 25-MAY-10
8 rows selected
The output needs to be the same as the above. I tried creating a join as follows:
SELECT TABLE_1.REC_ID, TO_CHAR(TABLE_1.REPORT_DATE,'DD-MON-YY HH:MI:SS') REPORT_DATE, MAX(TABLE_2.NOTE_DATE) AS NOTES_MAX_DATE
FROM TABLE_2,
TABLE_1
where TABLE_2.REC_ID = TABLE_1.REC_ID
AND TABLE_2.NOTE_DATE <= TABLE_1.REPORT_DATE
and ( TABLE_1.SITE_ID = '1' )
and TABLE_1.REC_ID = 121
group by TABLE_1.REC_ID, TABLE_1.REPORT_DATE
order by TO_DATE(REPORT_DATE,'DD-MON-YY HH:MI:SS');
But that yields only the report dates that have at least one matching note:
REC_ID REPORT_DATE NOTES_MAX_DATE
---------------------- ------------------ -------------------------
121 15-MAY-10 12:30:00 12-MAY-10
121 22-MAY-10 12:30:01 17-MAY-10
121 29-MAY-10 12:30:01 25-MAY-10
121 05-JUN-10 12:30:00 25-MAY-10
So I'm really stumped. Any ideas? -- Thanks.
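A sketch of one possible fix (untested, and it assumes the simplified REC_ID/REPORT_DATE/NOTE_DATE layout shown above): make the join an outer join and move the date condition into the ON clause, so that report dates without any qualifying note still survive the aggregation.
SELECT t1.rec_id,
       TO_CHAR(t1.report_date, 'DD-MON-YY HH:MI:SS') AS report_date,
       MAX(t2.note_date) AS notes_max_date   -- NULL when no note qualifies
FROM   table_1 t1
       LEFT JOIN table_2 t2
              ON  t2.rec_id = t1.rec_id
              AND t2.note_date <= t1.report_date
WHERE  t1.site_id = '1'
AND    t1.rec_id = 121
GROUP  BY t1.rec_id, t1.report_date
ORDER  BY t1.report_date;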
Below is a version that only gets the max once and should remove the correlated sub-queries. It does still use sub-queries, but, as they're in the FROM clause rather than the SELECT clause, the database should do a better job of resolving them. It's probably possible to remove those sub-queries as well, but it's more readable this way. This version also uses ANSI join syntax, which is generally considered preferable.
SELECT table_4.incident_type,
table_4.poc_contact,
t2.sum_of_shortage,
t3.help_notes,
table_4.report_num
FROM table_4
LEFT JOIN (SELECT table_2.rec_id,
table_2.note_date
|| ' '
|| table_1.user_first_name
|| ' '
|| table_1.user_last_name
|| ' : '
|| table_2.other_help_notes
AS sum_of_shortage
FROM table_1
JOIN table_2
ON table_2.user_id = table_1.user_id
WHERE table_2.note_date =
(SELECT MAX(table_2.note_date) AS max_date
FROM table_2
WHERE table_2.rec_id = table_4.rec_id
AND table_2.note_date <= table_4.report_date)) t2
ON t2.rec_id = table_4.rec_id
LEFT JOIN (SELECT table_3.rec_id,
table_3.note_date
|| ' '
|| table_1.user_first_name
|| ' '
|| table_1.user_last_name
|| ' : '
|| table_3.help_notes
AS help_notes
FROM table_1
JOIN table_3
ON table_3.user_id = table_1.user_id
WHERE table_3.note_date =
(SELECT MAX(table_3.note_date) AS max_date
FROM table_3
WHERE table_3.rec_id = table_4.rec_id
AND table_3.note_date <= table_4.report_date)) t3
ON t3.rec_id = table_4.rec_id
WHERE table_4.site_id = '1';
@shawno: You're right, the with clause was flawed because I misread your initial query. Above is a corrected version. Because the max values are specific to each row, the method that you were already using to get those values is probably the most efficient. Your best option for optimizing this appears to be just moving the sub-queries from the SELECT clause to the FROM clause.
Also, this is an untested solution, as I have neither your table structure nor your data. The best I can do without putting far too much work into it is to verify that the syntax is valid.
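Since the question specifically asks about analytic functions, here is another untested sketch, this time for the TABLE_2 branch of the original query; it assumes REPORT_NUM uniquely identifies a TABLE_4 row (otherwise partition by a real key), and the TABLE_3 branch would be handled the same way with a second pair of outer joins.
SELECT incident_type,
       poc_contact,
       note_date || ' ' || user_first_name || ' ' || user_last_name
                 || ' : ' || other_help_notes AS sum_of_shortage,
       report_num
FROM   (SELECT t4.incident_type,
               t4.poc_contact,
               t4.report_num,
               t2.note_date,
               t1.user_first_name,
               t1.user_last_name,
               t2.other_help_notes,
               -- rank notes per TABLE_4 row, latest note on or before the report date first
               ROW_NUMBER() OVER (PARTITION BY t4.report_num
                                  ORDER BY t2.note_date DESC) AS rn
        FROM   table_4 t4
               LEFT JOIN table_2 t2
                      ON  t2.rec_id = t4.rec_id
                      AND t2.note_date <= t4.report_date
               LEFT JOIN table_1 t1
                      ON  t1.user_id = t2.user_id
        WHERE  t4.site_id = '1')
WHERE  rn = 1;   -- rows with no qualifying note come back with NULL note columns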
Related
I'm facing a seemingly unsolvable performance drop while using UNION ALL with two sub-queries in one cursor (at least I think that's the problem). PL/SQL Developer just freezes when opening the cursor results in the test window.
If I turn off either sub-query, everything works fine.
If I take the whole query out of the cursor into a regular SQL query window, everything is okay without any need to turn off any parts.
The procedure structure is below; looking forward to any help:
procedure p_proc(p_param varchar2,
outcur out sys_refcursor) is
begin
open outcur for
select *
from (select -- visible cols
si.item_full_name
, si.final_price
, si.full_price
, si.receipt_num
, si.receipt_date
, si.vendor_code
, case when det.br_summary is null and mr.motiv_rate_value is not null then mr.motiv_rate_value
when det.br_summary is not null then det.br_summary
end personal_bonus_amount
, case when det.br_summary is null and mr.motiv_rate_value is not null then 1
when det.br_summary is not null then det.cross_sale_kt
end personal_bonus_koeff
-- service cols
, case when det.br_summary is null and mr.motiv_rate_value is not null then 'approximate'
when det.br_summary is not null then 'definite'
end personal_bonus_type
, coalesce(det.sale_stream, mr.sale_stream, 'Not defined') item_group_name
, si.operation_type
, si.src
-- pagination
, row_number() over (order by si.receipt_date desc) rn
from (-- curr day
select b.cost final_price
, case when b.discount = 0 then null else b.price
end full_price
, b.doc_number receipt_num
, b.receipt_date receipt_date
, i.item_code vendor_code
, i.full_name item_full_name
, b.subsite code_op
, b.operator_id
, to_char(b.businessday, 'yyyymm') sale_period
, b.oper_type operation_type
, 'bill' src
from scheme.bills b
join scheme.items i on i.item_code = b.item
where b.businessday = trunc(p_date_to)
and b.subsite = p_office_id
and b.operator_id = p_emp_id
union all
-- prev days
select l.txn_amount final_price
, case when l.disc = 0 then null else l.price
end full_price
, t.receipt_num receipt_num
, t.ts receipt_date
, i.item_code vendor_code
, i.full_name item_full_name
, s.office_code code_op
, e.emp_code operator_id
, to_char(l.dt,'yyyymm') sale_period
, l.txn_type operation_type
, 'txn' src
from scheme.txn t
join scheme.txn_lines l on t.rtl_txn_id = l.rtl_txn_id
join scheme.items i on l.item_id = i.item_id
join scheme.offices s on t.subsite_id = s.subsite_id
join scheme.employees e on t.employee_id = e.employee_id
where t.ts between trunc(p_date_from) and trunc(p_date_to)
and t.subsite_id = v_op_id
and t.employee_id = v_emp_id
) si
/* fact */
left join scheme.sales_details det on si.sale_period = det.period
and si.code_op = det.op_code
and ltrim(si.operator_id,'0') = ltrim(det.tab_num,'0')
and si.receipt_num = det.rcpt_num
and si.vendor_code = det.item_article
/* prognosis */
left join scheme.rates mr on si.sale_period = mr.motiv_rate_period
and si.code_op = mr.code_op
and si.vendor_code = mr.code_1c
where 1 = 1
and si.final_price between nvl(p_price_from, si.final_price) and nvl(p_price_to, si.final_price)
/* if no filters */
and (item_group_cnt = 0 or coalesce(det.sale_stream, mr.sale_stream, 'Not defined') in (select * from table(p_item_group)))
and si.receipt_num = nvl(p_receipt_num, si.receipt_num)
)
where rn between p_page_num * p_page_size + 1 and (p_page_num + 1) * p_page_size;
end;
UPD Explain plan for the whole query used in a cursor:
----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
----------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 32810 | 62 | 00:00:01 |
| * 1 | VIEW | | 10 | 32810 | 62 | 00:00:01 |
| * 2 | WINDOW SORT PUSHED RANK | | 2 | 2956 | 62 | 00:00:01 |
| 3 | NESTED LOOPS OUTER | | 2 | 2956 | 61 | 00:00:01 |
| 4 | NESTED LOOPS OUTER | | 2 | 2826 | 53 | 00:00:01 |
| 5 | VIEW | | 2 | 2728 | 46 | 00:00:01 |
| 6 | UNION-ALL | | | | | |
| 7 | NESTED LOOPS | | 1 | 138 | 32 | 00:00:01 |
| 8 | NESTED LOOPS | | 1 | 138 | 32 | 00:00:01 |
| 9 | PARTITION RANGE SINGLE | | 1 | 66 | 29 | 00:00:01 |
| * 10 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED | F003_BILL | 1 | 66 | 29 | 00:00:01 |
| * 11 | INDEX RANGE SCAN | IX_SUBSITE_DOCNUM_BUSINDAY_SEQ | 1 | | 5 | 00:00:01 |
| * 12 | INDEX RANGE SCAN | IX_D001_CODE_1C_ITEM_ID | 1 | | 2 | 00:00:01 |
| 13 | TABLE ACCESS BY INDEX ROWID | D001_ITEM | 1 | 72 | 3 | 00:00:01 |
| 14 | NESTED LOOPS | | 1 | 183 | 14 | 00:00:01 |
| 15 | NESTED LOOPS | | 1 | 183 | 14 | 00:00:01 |
| 16 | NESTED LOOPS | | 1 | 104 | 12 | 00:00:01 |
| 17 | NESTED LOOPS | | 1 | 70 | 7 | 00:00:01 |
| 18 | NESTED LOOPS | | 1 | 30 | 4 | 00:00:01 |
| 19 | TABLE ACCESS BY INDEX ROWID | D005_EMPLOYEE | 1 | 18 | 3 | 00:00:01 |
| * 20 | INDEX UNIQUE SCAN | PK_D005 | 1 | | 2 | 00:00:01 |
| 21 | TABLE ACCESS BY INDEX ROWID | D018_SUBSITE | 1 | 12 | 1 | 00:00:01 |
| * 22 | INDEX UNIQUE SCAN | PK_D018 | 1 | | 0 | 00:00:01 |
| 23 | PARTITION RANGE ITERATOR | | 1 | 40 | 3 | 00:00:01 |
| 24 | PARTITION HASH SINGLE | | 1 | 40 | 3 | 00:00:01 |
| * 25 | TABLE ACCESS FULL | F007_RTL_TXN | 1 | 40 | 3 | 00:00:01 |
| * 26 | TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED | F008_RTL_TXN_LI | 1 | 34 | 5 | 00:00:01 |
| * 27 | INDEX RANGE SCAN | IX_F008_RTL_TXN_ID | 7 | | 3 | 00:00:01 |
| * 28 | INDEX UNIQUE SCAN | PK_D001 | 1 | | 1 | 00:00:01 |
| 29 | TABLE ACCESS BY INDEX ROWID | D001_ITEM | 1 | 79 | 2 | 00:00:01 |
| * 30 | TABLE ACCESS BY INDEX ROWID BATCHED | T_OP_MOTIVATION_RATE_MYRTK | 1 | 49 | 7 | 00:00:01 |
| * 31 | INDEX RANGE SCAN | IDX02_CODE_OP_1C | 3 | | 3 | 00:00:01 |
| * 32 | TABLE ACCESS BY INDEX ROWID BATCHED | DET_SALES_PPT_DWH | 1 | 65 | 4 | 00:00:01 |
| * 33 | INDEX RANGE SCAN | IDX_03_RCPT_NUM | 3 | | 2 | 00:00:01 |
----------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
------------------------------------------
* 1 - filter("RN">=1 AND "RN"<=10)
* 2 - filter(ROW_NUMBER() OVER ( ORDER BY INTERNAL_FUNCTION("SI"."RECEIPT_DATE") DESC )<=10)
* 10 - filter("F003"."OPERATOR_ID"='000189513' AND "F003"."COST">=TO_NUMBER(TO_CHAR("F003"."COST")) AND "F003"."COST"<=TO_NUMBER(TO_CHAR("F003"."COST")))
* 11 - access("F003"."SUBSITE"='S165' AND "F003"."BUSINESSDAY"=TO_DATE(' 2021-11-23 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
* 11 - filter("F003"."BUSINESSDAY"=TO_DATE(' 2021-11-23 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "F003"."DOC_NUMBER" IS NOT NULL)
* 12 - access("I"."D001_CODE_1C"="F003"."ITEM")
* 12 - filter("I"."D001_CODE_1C" IS NOT NULL)
* 20 - access("E"."EMPLOYEE_ID"=3561503543)
* 22 - access("S"."SUBSITE_ID"=29260)
* 25 - filter("T"."EMPLOYEE_ID"=3561503543 AND "T"."SUBSITE_ID"=29260 AND "T"."F007_TS"<=TO_DATE(' 2021-11-23 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "T"."F007_RCPT_NUM_1C" IS NOT NULL)
* 26 - filter("L"."F008_AMOUNT">=TO_NUMBER(TO_CHAR("L"."F008_AMOUNT")) AND "L"."F008_AMOUNT"<=TO_NUMBER(TO_CHAR("L"."F008_AMOUNT")))
* 27 - access("T"."RTL_TXN_ID"="L"."RTL_TXN_ID")
* 28 - access("L"."ITEM_ID"="I"."ITEM_ID")
* 30 - filter("SI"."SALE_PERIOD"="MR"."MOTIV_RATE_PERIOD"(+))
* 31 - access("SI"."CODE_OP"="MR"."CODE_OP"(+) AND "SI"."VENDOR_CODE"="MR"."CODE_1C"(+))
* 32 - filter("SI"."CODE_OP"="DET"."OP_CODE"(+) AND "SI"."VENDOR_CODE"="DET"."ITEM_ARTICLE"(+) AND "DET"."ITEM_ARTICLE"(+) IS NOT NULL AND "DET"."PERIOD"(+)=TO_NUMBER("SI"."SALE_PERIOD") AND
LTRIM("SI"."OPERATOR_ID",'0')=LTRIM("DET"."TAB_NUM_RTK"(+),'0'))
* 33 - access("SI"."RECEIPT_NUM"="DET"."RCPT_NUM"(+))
* 33 - filter("DET"."RCPT_NUM"(+) IS NOT NULL)
Actual solution
Managed to get the procedure execution plan from the DBA. The problem was that the optimizer chose a different index for the join to the scheme.sales_details table when executing the query inside the procedure. Adding an INDEX hint with the same index that was used in the regular query fixed it, and everything works just fine.
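Purely for illustration, a hedged sketch of the shape of that fix; the index name (IDX_03_RCPT_NUM, taken from the plan above) and the trimmed column list are assumptions, since the answer does not show the exact hint that was added:
select /*+ INDEX(det IDX_03_RCPT_NUM) */   -- pin the index the standalone query used
       si.receipt_num
     , det.br_summary
     , det.cross_sale_kt
  from (select b.doc_number receipt_num
             , b.subsite    code_op
             , to_char(b.businessday, 'yyyymm') sale_period
          from scheme.bills b) si
  left join scheme.sales_details det on si.sale_period = det.period
                                    and si.code_op     = det.op_code
                                    and si.receipt_num = det.rcpt_num;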
Deprecated ideas below
As far as I understood, the problem was the Oracle optimizer, which "thought" that doing the UNION ALL first was better than pushing the predicate into the sub-query. Separating this union into two single queries makes it push the predicate without any hesitation.
This can probably be fixed by playing with hints; that's a work in progress for now.
The temporary workaround is to regroup the query, going from this structure
select *
from (select row_number() rn
, u.*
from (select *
from first_query
union all
select *
from second_query) u
-- some joins
join first_table ft
join second_table st
-- predicate block
where 1=1
and a = b
)
where rn between c and d;
to this
select *
from (select row_number() rn
, u.*
from (select *
from first_query) u
-- some joins
join first_table ft
join second_table st
-- predicate block
where 1=1
and a = b
union all
select row_number() rn
, u.*
from (select *
from second_query) u
-- some joins
join first_table ft
join second_table st
-- predicate block
where 1=1
and a = b
)
where rn between c and d;
That's not a perfect solution because it doubles the JOIN section, but at least it works.
I have an Oracle query which is executed once a month to get the details of processed orders. This query takes a painfully long time to execute (more than thirty minutes), so I am trying to optimize it. I have decent knowledge of Oracle and I will explain what I have tried so far. Still, it takes around 20 minutes to complete. This is the query. The Oracle version is 11g.
SELECT store_typ, store_no, COUNT(order_no) FROM
(
SELECT DISTINCT(order_no), store.store_no, store.store_typ FROM
(
SELECT trx.order_no,trx.ADDED_DATE, odr.prod_typ, odr.store_no FROM daily_trx trx
LEFT OUTER JOIN
(
SELECT odr.order_no,odr.prod_typ,prod.store_no FROM order_main odr
LEFT OUTER JOIN ORDR_PROD_TYP prod
on odr.prod_typ = prod.prod_typ
) odr
ON trx.order_no= odr.order_no
) daily_orders ,
(SELECT store_no,store_typ FROM main_stores ) store
WHERE 1=1
and daily_orders.order_no !='NA'
and store.store_no = daily_orders.store_no
AND to_timestamp(to_char(daily_orders.ADDED_DATE,'DD-MM-YYYY HH24:MI:SS'),'DD-MM-YYYY HH24:MI:SS') >= to_date('01-05-2020 00:00:00','DD-MM-YYYY HH24:MI:SS')
AND to_timestamp(to_char(daily_orders.ADDED_DATE,'DD-MM-YYYY HH24:MI:SS'),'DD-MM-YYYY HH24:MI:SS') <= to_date('31-05-2020 23:59:59','DD-MM-YYYY HH24:MI:SS')
)
GROUP BY store_typ, store_no
Background
order_main - This table has over 4 million records
I introduced an index on the order_no column, which reduced the execution time.
My questions are as follows.
1) Will it help if I move the date validation inside the inner query, like this?
SELECT store_typ, store_no, COUNT(order_no) FROM
(
SELECT DISTINCT(order_no), store.store_no, store.store_typ FROM
(
SELECT trx.order_no,trx.ADDED_DATE, odr.prod_typ, odr.store_no FROM daily_trx trx
LEFT OUTER JOIN
(
SELECT odr.order_no,odr.prod_typ,prod.store_no FROM order_main odr
LEFT OUTER JOIN ORDR_PROD_TYP prod
on odr.prod_typ = prod.prod_typ
) odr
ON trx.order_no= odr.order_no
WHERE to_timestamp(to_char(trx.ADDED_DATE,'DD-MM-YYYY HH24:MI:SS'),'DD-MM-YYYY HH24:MI:SS') >= to_date('01-05-2020 00:00:00','DD-MM-YYYY HH24:MI:SS')
AND to_timestamp(to_char(trx.ADDED_DATE,'DD-MM-YYYY HH24:MI:SS'),'DD-MM-YYYY HH24:MI:SS') <= to_date('31-05-2020 23:59:59','DD-MM-YYYY HH24:MI:SS')
) daily_orders ,
(SELECT store_no,store_typ FROM main_stores ) store
WHERE 1=1
and daily_orders.order_no !='NA'
and store.store_no = daily_orders.store_no
--AND to_timestamp(to_char(daily_orders.ADDED_DATE,'DD-MM-YYYY HH24:MI:SS'),'DD-MM-YYYY HH24:MI:SS') >= to_date('01-05-2020 00:00:00','DD-MM-YYYY HH24:MI:SS')
--AND to_timestamp(to_char(daily_orders.ADDED_DATE,'DD-MM-YYYY HH24:MI:SS'),'DD-MM-YYYY HH24:MI:SS') <= to_date('31-05-2020 23:59:59','DD-MM-YYYY HH24:MI:SS')
)
GROUP BY store_typ, store_no
2) Could someone please suggest any other improvements that can be done to this query?
3) Would additional indexing help on any other tables/columns? Only the daily_trx and order_main tables contain huge amounts of data.
Some general suggestions
Do not combine ANSI and Oracle Join Syntax in one Query
Do not use outer join if inner join can be used
Your inner subqueries use outer joins, but the final join to main_stores is an inner join, which eliminates all rows whose store_no is null - so you may use inner joins with the same result.
Filter rows early
A suboptimal practice is to first join in a subquery and only then filter the relevant rows with where conditions
Use simple predicates
If you want to constrain a DATE column, do it this way
trx.ADDED_DATE >= to_date('01-05-2020 00:00:00','DD-MM-YYYY HH24:MI:SS')
Use count distinct if appropriate
The SELECT DISTINCT in the third line can be eliminated if you use COUNT(DISTINCT order_no)
Applying all the above points, I come to the following query
select
store.store_no, store.store_typ, count(DISTINCT trx.order_no) order_no_cnt
from daily_trx trx
join order_main odr on trx.order_no = odr.order_no
join ordr_prod_typ prod on odr.prod_typ = prod.prod_typ
join main_stores store on store.store_no = prod.store_no
where trx.ADDED_DATE >= date'2020-05-01' and
trx.ADDED_DATE < date'2020-06-01' and
trx.order_no !='NA'
group by store.store_no, store.store_typ
Performance Considerations
You process a month of data, so there will probably be a large number of transactions (say 100K+). In this case the best approach is to full-scan the two large tables and perform HASH JOINs.
You can expect this execution plan
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 199K| 5850K| | 592 (2)| 00:00:08 |
|* 1 | HASH JOIN | | 199K| 5850K| | 592 (2)| 00:00:08 |
| 2 | TABLE ACCESS FULL | MAIN_STORES | 26 | 104 | | 3 (0)| 00:00:01 |
|* 3 | HASH JOIN | | 199K| 5070K| | 588 (2)| 00:00:08 |
| 4 | TABLE ACCESS FULL | ORDR_PROD_TYP | 26 | 104 | | 3 (0)| 00:00:01 |
|* 5 | HASH JOIN | | 199K| 4290K| 1960K| 584 (1)| 00:00:08 |
|* 6 | TABLE ACCESS FULL| ORDER_MAIN | 100K| 782K| | 69 (2)| 00:00:01 |
|* 7 | TABLE ACCESS FULL| DAILY_TRX | 200K| 2734K| | 172 (2)| 00:00:03 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("STORE"."STORE_NO"="PROD"."STORE_NO")
3 - access("ODR"."PROD_TYP"="PROD"."PROD_TYP")
5 - access("TRX"."ORDER_NO"="ODR"."ORDER_NO")
6 - filter("ODR"."ORDER_NO"<>'NA')
7 - filter("TRX"."ADDED_DATE"<TO_DATE(' 2020-06-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') AND "TRX"."ORDER_NO"<>'NA' AND "TRX"."ADDED_DATE">=TO_DATE(' 2020-05-01
00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
If you have the partitioning option available, you will profit massively by defining a monthly partitioning scheme (or, alternatively, daily partitioning) on the two tables DAILY_TRX and ORDER_MAIN.
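For example, a sketch of what monthly partitioning could look like on DAILY_TRX (interval partitioning is available in 11g; the new table name and the initial boundary below are made up):
create table daily_trx_part
partition by range (added_date)
interval (numtoyminterval(1, 'MONTH'))
( partition p_initial values less than (date '2020-01-01') )
as
select * from daily_trx;
A query restricted to one month then only reads the matching partition instead of the whole table.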
If the above assumption is not correct and you have very few transactions in the selected time interval (say below 1K), you will do better using index access and NESTED LOOPS joins.
You will need this set of indices
create index daily_trx_date on daily_trx(ADDED_DATE);
create unique index order_main_idx on order_main (order_no);
create unique index ORDR_PROD_TYP_idx1 on ORDR_PROD_TYP(prod_typ);
create unique index main_stores_idx1 on main_stores(store_no);
The expected plan is as follows
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 92 | 2760 | 80 (4)| 00:00:01 |
|* 1 | HASH JOIN | | 92 | 2760 | 80 (4)| 00:00:01 |
|* 2 | TABLE ACCESS BY INDEX ROWID | DAILY_TRX | 92 | 1288 | 4 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | DAILY_TRX_DATE | 92 | | 3 (0)| 00:00:01 |
|* 4 | HASH JOIN | | 100K| 1564K| 75 (3)| 00:00:01 |
| 5 | MERGE JOIN | | 26 | 208 | 6 (17)| 00:00:01 |
| 6 | TABLE ACCESS BY INDEX ROWID| MAIN_STORES | 26 | 104 | 2 (0)| 00:00:01 |
| 7 | INDEX FULL SCAN | MAIN_STORES_IDX1 | 26 | | 1 (0)| 00:00:01 |
|* 8 | SORT JOIN | | 26 | 104 | 4 (25)| 00:00:01 |
| 9 | TABLE ACCESS FULL | ORDR_PROD_TYP | 26 | 104 | 3 (0)| 00:00:01 |
|* 10 | TABLE ACCESS FULL | ORDER_MAIN | 100K| 782K| 69 (2)| 00:00:01 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("TRX"."ORDER_NO"="ODR"."ORDER_NO")
2 - filter("TRX"."ORDER_NO"<>'NA')
3 - access("TRX"."ADDED_DATE">=TO_DATE(' 2020-06-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') AND "TRX"."ADDED_DATE"<TO_DATE(' 2020-07-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss'))
4 - access("ODR"."PROD_TYP"="PROD"."PROD_TYP")
8 - access("STORE"."STORE_NO"="PROD"."STORE_NO")
filter("STORE"."STORE_NO"="PROD"."STORE_NO")
10 - filter("ODR"."ORDER_NO"<>'NA')
Check here how to get the execution plan of your query
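For reference, the usual pattern is shown below with the rewritten query from above (EXPLAIN PLAN gives the estimated plan; the actual plan of an executed statement can be read with DBMS_XPLAN.DISPLAY_CURSOR):
explain plan for
select store.store_no, store.store_typ, count(distinct trx.order_no) order_no_cnt
  from daily_trx trx
  join order_main odr on trx.order_no = odr.order_no
  join ordr_prod_typ prod on odr.prod_typ = prod.prod_typ
  join main_stores store on store.store_no = prod.store_no
 where trx.added_date >= date '2020-05-01'
   and trx.added_date <  date '2020-06-01'
   and trx.order_no != 'NA'
 group by store.store_no, store.store_typ;
select * from table(dbms_xplan.display);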
I am wondering about the following strange behaviour.
This function should log the selected data to a table ps_cs_corr_data_tb (this table is empty):
create or replace function cs_corr_data(i_id in varchar2,
i_key1 in varchar2,
i_key2 in varchar2,
i_key3 in varchar2,
i_key4 in varchar2,
i_key5 in varchar2)
return number as pragma autonomous_transaction;
begin
insert into ps_cs_corr_data_tb
(descr,
cs_key_id_01,
cs_key_id_02,
cs_key_id_03,
cs_key_id_04,
cs_key_id_05)
values
(i_id, i_key1, i_key2, i_key3, i_key4, i_key5);
commit;
return 1; /* insert successful */
exception
when dup_val_on_index then
return 0;
end;
Test a)
The test with the following select statement is successful (as expected):
select b.id, b.key1, b.key2, b.key3, b.key4, b.key5
from (select a.id, a.key1, a.key2, a.key3, a.key4, a.key5
from ( -- test data
select '1' as id,'1' as key1,' ' as key2,' ' as key3,' ' as key4,' ' as key5 from dual union all
select '1' as id,'2' as key1,' ' as key2,' ' as key3,' ' as key4,' ' as key5 from dual union all
select '1' as id,'3' as key1,' ' as key2,' ' as key3,' ' as key4,' ' as key5 from dual union all
select '1' as id,'4' as key1,' ' as key2,' ' as key3,' ' as key4,' ' as key5 from dual union all
select '1' as id,'5' as key1,' ' as key2,' ' as key3,' ' as key4,' ' as key5 from dual
) a
-- some conditions
where a.id = '1'
and a.key1 = '4') b
-- log the results of selection
where cs_corr_data(b.id, b.key1, b.key2, b.key3, b.key4, b.key5) = 1;
result of selection:
ID KEY1 KEY2 KEY3 KEY4 KEY5
1 4
result in logging table:
select * from ps_cs_corr_data_tb d;
DESCR CS_KEY_ID_01 CS_KEY_ID_02 CS_KEY_ID_03 CS_KEY_ID_04 CS_KEY_ID_05
1 4
So far the expected result!
Explain Plan:
Plan hash value: 334628103
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 | 90 | 2 (0)| 00:00:01 |
| 1 | VIEW | | 5 | 90 | 2 (0)| 00:00:01 |
| 2 | UNION-ALL | | | | | |
|* 3 | FILTER | | | | | |
| 4 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
|* 5 | FILTER | | | | | |
| 6 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
|* 7 | FILTER | | | | | |
| 8 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
|* 9 | FILTER | | | | | |
| 10 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
|* 11 | FILTER | | | | | |
| 12 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter(NULL IS NOT NULL AND "CS_CORR_DATA"('1','1',' ',' ',' ','
')=1)
5 - filter(NULL IS NOT NULL AND "CS_CORR_DATA"('1','2',' ',' ',' ','
')=1)
7 - filter(NULL IS NOT NULL AND "CS_CORR_DATA"('1','3',' ',' ',' ','
')=1)
9 - filter("CS_CORR_DATA"('1','4',' ',' ',' ',' ')=1)
11 - filter(NULL IS NOT NULL AND "CS_CORR_DATA"('1','5',' ',' ',' ','
')=1)
Test b)
Now the same test with different test data preparation (but the same test data):
select b.id, b.key1, b.key2, b.key3, b.key4, b.key5
from (select a.id, a.key1, a.key2, a.key3, a.key4, a.key5
from (select '1' as id,
to_char(level) as key1,
' ' as key2,
' ' as key3,
' ' as key4,
' ' as key5
from dual
connect by level <= 5) a
where a.id = '1'
and a.key1 = '4') b
where cs_corr_data(b.id, b.key1, b.key2, b.key3, b.key4, b.key5) = 1;
result of selection:
ID KEY1 KEY2 KEY3 KEY4 KEY5
1 4
result in logging table:
select * from ps_cs_corr_data_tb d;
DESCR CS_KEY_ID_01 CS_KEY_ID_02 CS_KEY_ID_03 CS_KEY_ID_04 CS_KEY_ID_05
1 1
1 2
1 3
1 4
1 5
Explain Plan:
Plan hash value: 2403765415
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 37 | 2 (0)| 00:00:01 |
|* 1 | VIEW | | 1 | 37 | 2 (0)| 00:00:01 |
|* 2 | CONNECT BY WITHOUT FILTERING| | | | | |
| 3 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("CS_CORR_DATA"("A"."ID","A"."KEY1","A"."KEY2","A"."KEY3","A"."KE
Y4","A"."KEY5")=1 AND "A"."ID"='1' AND "A"."KEY1"='4')
2 - filter(LEVEL<=5)
Any ideas what is going on here?
Oracle (along with just about any relational database) is free to evaluate predicates in whatever order it expects to be most efficient. In either query, it is free to evaluate the function predicate first, to evaluate the a.id = '1' and a.key1 = '4' predicates first, or to evaluate the function predicate between those two predicates. It appears that in the second case the optimizer chose (at least this time) to evaluate the function first, while in the first case it chose to evaluate the function last. Of course, the optimizer is free to change its mind tomorrow in both cases, so you shouldn't depend on a particular query plan.
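There is no supported way to force a particular order, but a commonly used workaround (shown below as a sketch, not a guarantee) is to fence off the filtered rows with ROWNUM, which stops Oracle from merging the inline view and pushing the function predicate into it:
select b.id, b.key1, b.key2, b.key3, b.key4, b.key5
  from (select a.id, a.key1, a.key2, a.key3, a.key4, a.key5
          from (select '1' as id,
                       to_char(level) as key1,
                       ' ' as key2, ' ' as key3, ' ' as key4, ' ' as key5
                  from dual
               connect by level <= 5) a
         where a.id = '1'
           and a.key1 = '4'
           and rownum >= 1   -- ROWNUM makes this view non-mergeable, so the
                             -- function below only sees the already-filtered rows
       ) b
 where cs_corr_data(b.id, b.key1, b.key2, b.key3, b.key4, b.key5) = 1;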
What is the best practice for converting the following SQL statement, which uses subquery factoring (the with data as clause), so that it can be used in a database view?
AFAIK the with data as clause is not supported in database views (Edited: Oracle does support Common Table Expressions), but in my case the subquery factoring offers a performance advantage. If I create a database view using a Common Table Expression, then this advantage is lost.
Please have a look at my example:
Description of query
a_table
Millions of entries; the select statement picks out a few thousand of them.
anchor_table
For each entry in a_table there is a corresponding entry in anchor_table. At runtime, this table determines exactly one row as the anchor. See the example below.
horizon_table
For each selection exactly one entry is determined at runtime (all entries of a selection of a_table have the same horizon_id)
Please note: this is a strongly simplified SQL statement that works fine so far.
In reality, more than 20 tables are joined together to get the result data.
The where clause is much more complex.
Further columns of horizon_table and anchor_table are required to prepare my where condition and result list in the subquery, i.e. moving these tables to the main query is not a solution.
with data as (
select
a_table.id,
a_table.descr,
horizon_table.offset,
case
when anchor_table.a_date = trunc(sysdate) then
1
else
0
end as anchor,
row_number() over(
order by a_table.a_position_field) as position
from a_table
join anchor_table on (anchor_table.id = a_table.anchor_id)
join horizon_table on (horizon_table.id = a_table.horizon_id)
where a_table.a_value between 1 and 10000
)
select *
from data d
where d.position between (
select d1.position - d.offset
from data d1
where d1.anchor = 1)
and (
select d2.position + d.offset
from data d2
where d2.anchor = 1)
example of with data as select:
id descr offset anchor position
1 bla 3 0 1
2 blab 3 0 2
5 dfkdj 3 0 3
4 dld 3 0 4
6 oeroe 3 1 5
3 blab 3 0 6
9 dfkdj 3 0 7
14 dld 3 0 8
54 oeroe 3 0 9
...
result of select * from data
id descr offset anchor position
2 blab 3 0 2
5 dfkdj 3 0 3
4 dld 3 0 4
6 oeroe 3 1 5
3 blab 3 0 6
9 dfkdj 3 0 7
14 dld 3 0 8
I.e. the result is the anchor row and the three rows above and below it.
How can I achieve the same within a database view?
My attempt failed, as I expected, due to performance issues:
Create a view data from the with data as select above
Use this view as above
select *
from data d
where d.position between (
select d1.position - d.offset
from data d1
where d1.anchor = 1)
and (
select d2.position + d.offset
from data d2
where d2.anchor = 1)
Thank you for any advice :-)
Amendment
If I create a view as recommended in the first comment, then I get the same performance issue. Oracle does not use the subquery to restrict the results.
Here are the execution plans of my production queries (please click on the images)
a) SQL
b) View
Here are the execution plans of my test cases
-- Create Testdata table with ~ 1,000,000 entries
insert into a_table
(id, descr, a_position_field, anchor_id, horizon_id, a_value)
select level, 'data' || level, mod(level, 10), level, 1, level
from dual
connect by level <= 999999;
insert into anchor_table
(id, a_date)
select level, trunc(sysdate) - 500000 + level
from dual
connect by level <= 999999;
insert into horizon_table (id, offset) values (1, 50);
commit;
-- Create view
create or replace view testdata_vw as
with data as
(select a_table.id,
a_table.descr,
a_table.a_value,
horizon_table.offset,
case
when anchor_table.a_date = trunc(sysdate) then
1
else
0
end as anchor,
row_number() over(order by a_table.a_position_field) as position
from a_table
join anchor_table
on (anchor_table.id = a_table.anchor_id)
join horizon_table
on (horizon_table.id = a_table.horizon_id))
select *
from data d
where d.position between
(select d1.position - d.offset from data d1 where d1.anchor = 1) and
(select d2.position + d.offset from data d2 where d2.anchor = 1);
-- Explain plan of subquery factoring select statement
explain plan for
with data as
(select a_table.id,
a_table.descr,
a_value,
horizon_table.offset,
case
when anchor_table.a_date = trunc(sysdate) then
1
else
0
end as anchor,
row_number() over(order by a_table.a_position_field) as position
from a_table
join anchor_table
on (anchor_table.id = a_table.anchor_id)
join horizon_table
on (horizon_table.id = a_table.horizon_id)
where a_table.a_value between 500000 - 500 and 500000 + 500)
select *
from data d
where d.position between
(select d1.position - d.offset from data d1 where d1.anchor = 1) and
(select d2.position + d.offset from data d2 where d2.anchor = 1);
select plan_table_output
from table(dbms_xplan.display('plan_table', null, null));
/*
Note: Size of SYS_TEMP_0FD9D6628_284C5768 ~ 1000 rows
Plan hash value: 1145408420
----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 62 | 1791 (2)| 00:00:31 |
| 1 | TEMP TABLE TRANSFORMATION | | | | | |
| 2 | LOAD AS SELECT | SYS_TEMP_0FD9D6628_284C5768 | | | | |
| 3 | WINDOW SORT | | 57 | 6840 | 1785 (2)| 00:00:31 |
|* 4 | HASH JOIN | | 57 | 6840 | 1784 (2)| 00:00:31 |
|* 5 | TABLE ACCESS FULL | A_TABLE | 57 | 4104 | 1193 (2)| 00:00:21 |
| 6 | MERGE JOIN CARTESIAN | | 1189K| 54M| 586 (2)| 00:00:10 |
| 7 | TABLE ACCESS FULL | HORIZON_TABLE | 1 | 26 | 3 (0)| 00:00:01 |
| 8 | BUFFER SORT | | 1189K| 24M| 583 (2)| 00:00:10 |
| 9 | TABLE ACCESS FULL | ANCHOR_TABLE | 1189K| 24M| 583 (2)| 00:00:10 |
|* 10 | FILTER | | | | | |
| 11 | VIEW | | 57 | 3534 | 2 (0)| 00:00:01 |
| 12 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6628_284C5768 | 57 | 4104 | 2 (0)| 00:00:01 |
|* 13 | VIEW | | 57 | 912 | 2 (0)| 00:00:01 |
| 14 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6628_284C5768 | 57 | 4104 | 2 (0)| 00:00:01 |
|* 15 | VIEW | | 57 | 912 | 2 (0)| 00:00:01 |
| 16 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6628_284C5768 | 57 | 4104 | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("HORIZON_TABLE"."ID"="A_TABLE"."HORIZON_ID" AND
"ANCHOR_TABLE"."ID"="A_TABLE"."ANCHOR_ID")
5 - filter("A_TABLE"."A_VALUE">=499500 AND "A_TABLE"."A_VALUE"<=500500)
10 - filter("D"."POSITION">= (SELECT "D1"."POSITION"-:B1 FROM (SELECT + CACHE_TEMP_TABLE
("T1") "C0" "ID","C1" "DESCR","C2" "A_VALUE","C3" "OFFSET","C4" "ANCHOR","C5" "POSITION" FROM
"SYS"."SYS_TEMP_0FD9D6628_284C5768" "T1") "D1" WHERE "D1"."ANCHOR"=1) AND "D"."POSITION"<=
(SELECT "D2"."POSITION"+:B2 FROM (SELECT + CACHE_TEMP_TABLE ("T1") "C0" "ID","C1"
"DESCR","C2" "A_VALUE","C3" "OFFSET","C4" "ANCHOR","C5" "POSITION" FROM
"SYS"."SYS_TEMP_0FD9D6628_284C5768" "T1") "D2" WHERE "D2"."ANCHOR"=1))
13 - filter("D1"."ANCHOR"=1)
15 - filter("D2"."ANCHOR"=1)
Note
-----
- dynamic sampling used for this statement (level=4)
*/
-- Explain plan of database view
explain plan for
select *
from testdata_vw
where a_value between 500000 - 500 and 500000 + 500;
select plan_table_output
from table(dbms_xplan.display('plan_table', null, null));
/*
Note: Size of SYS_TEMP_0FD9D662A_284C5768 ~ 1000000 rows
Plan hash value: 1422141561
-------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2973 | 180K| | 50324 (1)| 00:14:16 |
| 1 | VIEW | TESTDATA_VW | 2973 | 180K| | 50324 (1)| 00:14:16 |
| 2 | TEMP TABLE TRANSFORMATION | | | | | | |
| 3 | LOAD AS SELECT | SYS_TEMP_0FD9D662A_284C5768 | | | | | |
| 4 | WINDOW SORT | | 1189K| 136M| 147M| 37032 (1)| 00:10:30 |
|* 5 | HASH JOIN | | 1189K| 136M| | 6868 (1)| 00:01:57 |
| 6 | TABLE ACCESS FULL | HORIZON_TABLE | 1 | 26 | | 3 (0)| 00:00:01 |
|* 7 | HASH JOIN | | 1189K| 106M| 38M| 6860 (1)| 00:01:57 |
| 8 | TABLE ACCESS FULL | ANCHOR_TABLE | 1189K| 24M| | 583 (2)| 00:00:10 |
| 9 | TABLE ACCESS FULL | A_TABLE | 1209K| 83M| | 1191 (2)| 00:00:21 |
|* 10 | FILTER | | | | | | |
|* 11 | VIEW | | 1189K| 70M| | 4431 (1)| 00:01:16 |
| 12 | TABLE ACCESS FULL | SYS_TEMP_0FD9D662A_284C5768 | 1189K| 81M| | 4431 (1)| 00:01:16 |
|* 13 | VIEW | | 1189K| 18M| | 4431 (1)| 00:01:16 |
| 14 | TABLE ACCESS FULL | SYS_TEMP_0FD9D662A_284C5768 | 1189K| 81M| | 4431 (1)| 00:01:16 |
|* 15 | VIEW | | 1189K| 18M| | 4431 (1)| 00:01:16 |
| 16 | TABLE ACCESS FULL | SYS_TEMP_0FD9D662A_284C5768 | 1189K| 81M| | 4431 (1)| 00:01:16 |
-------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("HORIZON_TABLE"."ID"="A_TABLE"."HORIZON_ID")
7 - access("ANCHOR_TABLE"."ID"="A_TABLE"."ANCHOR_ID")
10 - filter("D"."POSITION">= (SELECT "D1"."POSITION"-:B1 FROM (SELECT + CACHE_TEMP_TABLE ("T1")
"C0" "ID","C1" "DESCR","C2" "A_VALUE","C3" "OFFSET","C4" "ANCHOR","C5" "POSITION" FROM
"SYS"."SYS_TEMP_0FD9D662A_284C5768" "T1") "D1" WHERE "D1"."ANCHOR"=1) AND "D"."POSITION"<= (SELECT
"D2"."POSITION"+:B2 FROM (SELECT + CACHE_TEMP_TABLE ("T1") "C0" "ID","C1" "DESCR","C2"
"A_VALUE","C3" "OFFSET","C4" "ANCHOR","C5" "POSITION" FROM "SYS"."SYS_TEMP_0FD9D662A_284C5768" "T1") "D2"
WHERE "D2"."ANCHOR"=1))
11 - filter("A_VALUE">=499500 AND "A_VALUE"<=500500)
13 - filter("D1"."ANCHOR"=1)
15 - filter("D2"."ANCHOR"=1)
Note
-----
- dynamic sampling used for this statement (level=4)
*/
sqlfiddle
explain plan of sql http://www.sqlfiddle.com/#!4/6a7022/3
explain plan of view http://www.sqlfiddle.com/#!4/6a7022/2
You need to write a view definition which returns all possible selectable ranges of a_value as two columns, start_a_value and end_a_value, along with all records which fall into each start/end range. In other words, the correct view definition should logically describe a result set on the order of n^3 rows given n rows in a_table.
Then query that view as:
SELECT * FROM testdata_vw WHERE START_A_VALUE = 4950 AND END_A_VALUE = 5050;
Also, your multiple references to "data" are unnecessary; the same logic can be delivered with an additional analytic function.
Final view def:
CREATE OR REPLACE VIEW testdata_vw AS
SELECT *
FROM
(
SELECT T.*,
MAX(CASE WHEN ANCHOR=1 THEN POSITION END)
OVER (PARTITION BY START_A_VALUE, END_A_VALUE) ANCHOR_POS
FROM
(
SELECT S.A_VALUE START_A_VALUE,
E.A_VALUE END_A_VALUE,
B.ID ID,
B.DESCR DESCR,
HORIZON_TABLE.OFFSET OFFSET,
CASE
WHEN ANCHOR_TABLE.A_DATE = TRUNC(SYSDATE)
THEN 1
ELSE 0
END ANCHOR,
ROW_NUMBER()
OVER(PARTITION BY S.A_VALUE, E.A_VALUE
ORDER BY B.A_POSITION_FIELD) POSITION
FROM
A_TABLE S
JOIN A_TABLE E
ON S.A_VALUE<E.A_VALUE
JOIN A_TABLE B
ON B.A_VALUE BETWEEN S.A_VALUE AND E.A_VALUE
JOIN ANCHOR_TABLE
ON ANCHOR_TABLE.ID = B.ANCHOR_ID
JOIN HORIZON_TABLE
ON HORIZON_TABLE.ID = B.HORIZON_ID
) T
) T
WHERE POSITION BETWEEN ANCHOR_POS - OFFSET AND ANCHOR_POS+OFFSET;
EDIT: SQL Fiddle with expected execution plan
I'm seeing the same (sensible) plan here that I saw in my database; if you're getting something different, please send a fiddle link.
Use index lookup to find 1 row in "S" A_TABLE (A_VALUE = 4950)
Use index lookup to find 1 row in "E" A_TABLE (A_VALUE = 5050)
Nested Loop join #1 and #2 (1 x 1 join, still 1 row)
FTS 1 row from HORIZON table
Cartesian join #3 and #4 (1 x 1, okay to use Cartesian).
Use index lookup to find ~100 rows in "B" A_TABLE with values between 4950 and 5050.
Cartesian join #5 and #6 (1 x 102, okay to use Cartesian).
FTS ANCHOR_TABLE with hash join to #7.
Window-sort for analytic functions
You have a predicate outside the view and you want it to be applied inside the view.
For this, you can use push_pred hint:
select /*+PUSH_PRED(v)*/
*
from
testdata_vw v
where
a_value between 5000 - 50 and 5000 + 50;
SQLFIDDLE
EDIT: Now I've seen that you use the data subquery three times. For the first occurrence it makes sense to push the predicate, but for d1 and d2 it doesn't. It's a different query.
What I would do is use two context variables, set them according to my needs, and write the query:
SYS_CONTEXT('my_context_name', 'var5000');
create or replace view testdata_vw as
with data as (
select
a_table.id,
a_table.descr,
horizon_table.offset,
case
when anchor_table.a_date = trunc(sysdate) then
1
else
0
end as anchor,
row_number() over(
order by a_table.a_position_field) as position
from a_table
join anchor_table on (anchor_table.id = a_table.anchor_id)
join horizon_table on (horizon_table.id = a_table.horizon_id)
where a_table.a_value between SYS_CONTEXT('my_context_name', 'var5000') - SYS_CONTEXT('my_context_name', 'var50') and SYS_CONTEXT('my_context_name', 'var5000') + SYS_CONTEXT('my_context_name', 'var50')
)
select *
from data d
where d.position between (
select d1.position - d.offset
from data d1
where d1.anchor = 1)
and (
select d2.position + d.offset
from data d2
where d2.anchor = 1) ;
to use it:
dbms_session.set_context ('my_context_name', 'var5000', 5000);
dbms_session.set_context ('my_context_name', 'var50', 50);
select * from testdata_vw;
UPDATE: Instead of context variables (which can be used across sessions) you can use package variables, as you commented.
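A sketch of that package-variable variant (all names here are invented); note that a view cannot reference package variables directly, only through functions:
create or replace package testdata_params as
  g_value  number;
  g_offset number;
  function f_value  return number;
  function f_offset return number;
end testdata_params;
/
create or replace package body testdata_params as
  function f_value  return number is begin return g_value;  end;
  function f_offset return number is begin return g_offset; end;
end testdata_params;
/
-- in the view, replace the SYS_CONTEXT calls with:
--   where a_table.a_value between testdata_params.f_value - testdata_params.f_offset
--                             and testdata_params.f_value + testdata_params.f_offset
to use it:
begin
  testdata_params.g_value  := 5000;
  testdata_params.g_offset := 50;
end;
/
select * from testdata_vw;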
This query works but takes 5000 milliseconds.
SELECT
SUM(case
when ((TRUNC(OPEN_DATE) <= thedate and TRUNC(END_DATE) > thedate) or(TRUNC(OPEN_DATE) <= thedate and END_DATE Is Null)) then 1
else 0
end) as Open
From (
select *
FROM PROJECT
WHERE
PROGRAM_NAME = :program
AND ACTION_FOR_ORG = :orgName
)
cross join (
select add_months(last_day(SYSDATE), level-7) as thedate
from dual
connect by level <= 12
)
GROUP BY thedate
ORDER BY thedate
If I copy the subquery to its own table
create table test_project as
select * FROM PROJECT WHERE PROGRAM_NAME = :program
AND ACTION_FOR_ORG = :orgName
then run the above query with the subquery on the copied table:
From ( select * FROM test_project WHERE PROGRAM_NAME = :program
AND ACTION_FOR_ORG = :orgName )
the query takes 10 milliseconds
The query produces a count of how many projects were open in each month over the past 5 months and the future months (the count of open projects for future months will just equal the current month's total), based on comparing OPEN_DATE to END_DATE.
Is there a way to rewrite the original query for optimal performance?
EDIT
OK, I created a second table which is a full copy of the project table (well, view) that I was allowed access to. The table copy took about 5 seconds. Using the full set of data and either my SQL query or the one from Egor below, the query is super fast. Something is up with the view. Trying to output the explain plan using the view in the subquery, I get insufficient privileges. Here is the explain plan using a full copy of the view:
Plan hash value: 3695211866
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 637 | 1277K| 163 (2)| 00:00:02 |
| 1 | SORT ORDER BY | | 637 | 1277K| 163 (2)| 00:00:02 |
| 2 | HASH GROUP BY | | 637 | 1277K| 163 (2)| 00:00:02 |
| 3 | MERGE JOIN CARTESIAN | | 637 | 1277K| 161 (0)| 00:00:02 |
| 4 | VIEW | | 1 | 6 | 2 (0)| 00:00:01 |
|* 5 | CONNECT BY WITHOUT FILTERING| | | | | |
| 6 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 7 | BUFFER SORT | | 637 | 1273K| 163 (2)| 00:00:02 |
|* 8 | TABLE ACCESS FULL | COMMIT_TEST | 637 | 1273K| 159 (0)| 00:00:02 |
Predicate Information (identified by operation id):
5 - filter(LEVEL<=12)
8 - filter("PROGRAM_NAME"='program_name' AND "ACTION_FOR_ORG"='action_for_org')
Note
- dynamic sampling used for this statement (level=2)
Explain Plan using live table
with
PRJ as (
select /*+ NO_UNNEST */
trunc(OPEN_DATE) as OPEN_DATE,
nvl(trunc(END_DATE), sysdate + 1000) as END_DATE
from
PROJECT
where
PROGRAM_NAME = :program
and ACTION_FOR_ORG = :orgName
),
DATES as (
select
add_months(trunc(last_day(SYSDATE)), level-7) as thedate
from dual
connect by level <= 12
)
SELECT
thedate,
sum(case when thedate between open_date and end_date then 1 end) as Open
FROM
DATES, PRJ
GROUP BY thedate
ORDER BY 1