I have the following table (Marks):
firstname lastname Mark
------------------------------
arun prasanth 40
ann antony 45
sruthy abc 41
new abc 47
arun prasanth 45
arun prasanth 49
ann antony 49
I would like to add a column that flags whether a record with specific columns occurs more than once. This is the desired result:
firstname lastname Mark MULTI_FLAG
----------------------------------------------
arun prasanth 40 1
ann antony 45 1
sruthy abc 41 0
new abc 47 0
arun prasanth 45 1
arun prasanth 49 1
ann antony 49 1
I can get the result with the following GROUP BY query:
SELECT M1.firstname
      ,M1.lastname
      ,M1.Mark
      ,M2.MULTI_FLAG
FROM   Marks M1
JOIN  (SELECT firstname, lastname, CASE WHEN COUNT(*) > 1 THEN 1 ELSE 0 END AS MULTI_FLAG
       FROM   Marks
       GROUP BY firstname, lastname) M2
  ON   M2.firstname = M1.firstname AND M2.lastname = M1.lastname;
Or by this much prettier PARTITION BY query:
SELECT firstname,
       lastname,
       Mark,
       CASE WHEN COUNT(*) OVER (PARTITION BY firstname, lastname) > 1
            THEN 1 ELSE 0
       END AS MULTI_FLAG
FROM   Marks
Running the GROUP BY query on a similar large table completed in:
34 m 56 s 595 ms
Running the PARTITION BY query on a similar large table completed in:
First run: 55 m 47 s 851 ms
Second run: 36 m 46 s 95 ms
I would be interested in knowing:
The best way to achieve my results
What accounts for the performance difference.
EDIT: How to read the query plan.
EDIT:
Oracle Version
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
"CORE 11.2.0.3.0 Production"
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production
PARTITION BY Plan
PLAN_TABLE_OUTPUT
Plan hash value: 3822227444
---------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 668K| 90M| | 90429 (1)| 00:18:06 |
| 1 | WINDOW SORT | | 668K| 90M| 98M| 90429 (1)| 00:18:06 |
|* 2 | HASH JOIN RIGHT OUTER | | 668K| 90M| | 69340 (1)| 00:13:53 |
| 3 | TABLE ACCESS FULL | COUNTRY_REGION_MAPPINGS | 177 | 4779 | | 3 (0)| 00:00:01 |
| 4 | NESTED LOOPS | | | | | | |
| 5 | NESTED LOOPS | | 377K| 41M| | 69335 (1)| 00:13:53 |
| 6 | MAT_VIEW ACCESS FULL | PROJINFO_MAX_ITER_MVW | 17713 | 328K| | 782 (1)| 00:00:10 |
|* 7 | INDEX RANGE SCAN | Q_CLIN_ASSUM_BYCOUN_PK | 1 | | | 3 (0)| 00:00:01 |
| 8 | TABLE ACCESS BY INDEX ROWID| Q_CLINICAL_ASSUM_BYCOUNTRY | 21 | 2016 | | 4 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access(UPPER("CRM"."COUNTRY"(+))=UPPER("QCAB"."TRIAL_COUNTRY"))
7 - access("PMIM"."OPPORTUNITYNUM"="QCAB"."OPPORTUNITYNUM" AND "PMIM"."CONTRACTNUM"="QCAB"."CONTRACTNUM"
AND "PMIM"."ITERATION"="QCAB"."ITERATION")
filter(UPPER("QCAB"."SHEET_LOC") LIKE '%COUNTRY ASSUMPTIONS%' OR UPPER("QCAB"."SHEET_LOC") LIKE
'INPUT%')
GROUP BY Plan
PLAN_TABLE_OUTPUT
Plan hash value: 648231064
------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 912 | 2052K| | 226K (1)| 00:45:22 |
|* 1 | HASH JOIN | | 912 | 2052K| | 226K (1)| 00:45:22 |
| 2 | TABLE ACCESS FULL | COUNTRY_REGION_MAPPINGS | 177 | 4779 | | 3 (0)| 00:00:01 |
|* 3 | HASH JOIN | | 89667 | 194M| 45M| 226K (1)| 00:45:22 |
| 4 | NESTED LOOPS | | | | | | |
| 5 | NESTED LOOPS | | 377K| 41M| | 69335 (1)| 00:13:53 |
| 6 | MAT_VIEW ACCESS FULL | PROJINFO_MAX_ITER_MVW | 17713 | 328K| | 782 (1)| 00:00:10 |
|* 7 | INDEX RANGE SCAN | Q_CLIN_ASSUM_BYCOUN_PK | 1 | | | 3 (0)| 00:00:01 |
| 8 | TABLE ACCESS BY INDEX ROWID | Q_CLINICAL_ASSUM_BYCOUNTRY | 21 | 2016 | | 4 (0)| 00:00:01 |
| 9 | VIEW | | 668K| 1377M| | 86518 (1)| 00:17:19 |
| 10 | HASH GROUP BY | | 668K| 72M| 80M| 86518 (1)| 00:17:19 |
|* 11 | HASH JOIN RIGHT OUTER | | 668K| 72M| | 69340 (1)| 00:13:53 |
| 12 | TABLE ACCESS FULL | COUNTRY_REGION_MAPPINGS | 177 | 2478 | | 3 (0)| 00:00:01 |
| 13 | NESTED LOOPS | | | | | | |
| 14 | NESTED LOOPS | | 377K| 35M| | 69335 (1)| 00:13:53 |
| 15 | MAT_VIEW ACCESS FULL | PROJINFO_MAX_ITER_MVW | 17713 | 328K| | 782 (1)| 00:00:10 |
|* 16 | INDEX RANGE SCAN | Q_CLIN_ASSUM_BYCOUN_PK | 1 | | | 3 (0)| 00:00:01 |
| 17 | TABLE ACCESS BY INDEX ROWID| Q_CLINICAL_ASSUM_BYCOUNTRY | 21 | 1701 | | 4 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("R2"."TRIAL_COUNTRY_CD"="CRM"."COUNTRY_CD" AND
UPPER("CRM"."COUNTRY")=UPPER("QCAB"."TRIAL_COUNTRY"))
3 - access("R2"."OPPORTUNITYNUM"="QCAB"."OPPORTUNITYNUM" AND "R2"."ITERATION"="QCAB"."ITERATION" AND
"R2"."CONTRACTNUM"="QCAB"."CONTRACTNUM" AND "R2"."ASSUMPTION"="QCAB"."ASSUMPTION")
7 - access("PMIM"."OPPORTUNITYNUM"="QCAB"."OPPORTUNITYNUM" AND "PMIM"."CONTRACTNUM"="QCAB"."CONTRACTNUM" AND
"PMIM"."ITERATION"="QCAB"."ITERATION")
filter(UPPER("QCAB"."SHEET_LOC") LIKE '%COUNTRY ASSUMPTIONS%' OR UPPER("QCAB"."SHEET_LOC") LIKE 'INPUT%')
11 - access(UPPER("CRM"."COUNTRY"(+))=UPPER("QCAB"."TRIAL_COUNTRY"))
16 - access("PMIM"."OPPORTUNITYNUM"="QCAB"."OPPORTUNITYNUM" AND "PMIM"."CONTRACTNUM"="QCAB"."CONTRACTNUM" AND
"PMIM"."ITERATION"="QCAB"."ITERATION")
filter(UPPER("QCAB"."SHEET_LOC") LIKE '%COUNTRY ASSUMPTIONS%' OR UPPER("QCAB"."SHEET_LOC") LIKE 'INPUT%')
Typically you start with the analytic function count(*), which leads to compact SQL.
The drawback of this approach is that the data must be sorted (see the WINDOW SORT operation). The GROUP BY approach avoids the sorting because a HASH GROUP BY may be used instead, which can lead to better performance.
Your example is a bit more involved, as you do not query a table but a view that joins three tables - this join is performed twice, once for the GROUP BY and once for the detail data, which is of course not optimal.
So I would start with the analytic function version of the query (possibly with a PARALLEL option).
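For reference, a minimal sketch of that analytic version written against the Marks table from the question (the parallel degree of 4 is just an illustrative assumption):

SELECT /*+ PARALLEL(4) */
       firstname,
       lastname,
       Mark,
       -- flag rows whose (firstname, lastname) combination occurs more than once
       CASE WHEN COUNT(*) OVER (PARTITION BY firstname, lastname) > 1
            THEN 1 ELSE 0
       END AS MULTI_FLAG
FROM   Marks;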
If you want to try the GROUP BY, a lighter-weight version is possible:
1) group only the duplicated keys
2) make an OUTER JOIN to assign the MULTI_FLAG
An example with its execution plan is below - a simple test with your data.
with dups as (
select firstname,lastname from tmp
group by firstname,lastname
having count(*) > 1)
select tmp.FIRSTNAME, tmp.LASTNAME, tmp.MARK,
case when dups.firstname is not NULL then 1 else 0 end as MULTI_FLAG
from tmp
left outer join dups on tmp.firstname = dups.firstname and tmp.lastname = dups.lastname;
You still need to access your view twice, but the final join will be faster (especially if you have only a small number of duplicated keys).
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 105K| 26M| | 1673 (1)| 00:00:21 |
|* 1 | HASH JOIN RIGHT OUTER| | 105K| 26M| 11M| 1673 (1)| 00:00:21 |
| 2 | VIEW | | 105K| 10M| | 128 (4)| 00:00:02 |
|* 3 | FILTER | | | | | | |
| 4 | HASH GROUP BY | | 105K| 10M| | 128 (4)| 00:00:02 |
| 5 | TABLE ACCESS FULL| TMP | 105K| 10M| | 125 (1)| 00:00:02 |
| 6 | TABLE ACCESS FULL | TMP | 105K| 15M| | 125 (1)| 00:00:02 |
--------------------------------------------------------------------------------------
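The tmp table used in this test is assumed to hold the question's three columns, loaded with a larger volume of rows; a hypothetical setup:

-- hypothetical test table mirroring the Marks columns
CREATE TABLE tmp (firstname VARCHAR2(30), lastname VARCHAR2(30), mark NUMBER);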
Every time I execute this query it takes about 2 minutes to run:
select * from CPOB_Monitoring_Dashboard
where VOYAGE_STRT_DT >= TO_TIMESTAMP('2014-07-03 00:00:00.000','YYYY-MM-DD HH24:MI:SS.FF')
and VOYAGE_STRT_DT <= TO_TIMESTAMP('2018-07-03 00:00:00.000','YYYY-MM-DD HH24:MI:SS.FF')
However, if I change it to use TO_DATE instead of TO_TIMESTAMP it is really fast.
LINQ is generating the query using TO_TIMESTAMP and I have not yet found a way to change that to TO_DATE. Is there any way I can optimize the TO_TIMESTAMP query?
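For comparison, the fast TO_DATE variant I run by hand looks roughly like this (a sketch; the fractional seconds are dropped because TO_DATE does not accept the FF format element):

select * from CPOB_Monitoring_Dashboard
where VOYAGE_STRT_DT >= TO_DATE('2014-07-03 00:00:00','YYYY-MM-DD HH24:MI:SS')
and VOYAGE_STRT_DT <= TO_DATE('2018-07-03 00:00:00','YYYY-MM-DD HH24:MI:SS')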
Here is the execution plan for the query using TO_TIMESTAMP:
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 246273147
---------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 21842 | 4820K| | 1336 (1)| 00:00:17 |
| 1 | VIEW | CPOB_Monitoring_Dashboard| 21842 | 4820K| | 1336 (1)| 00:00:17 |
| 2 | HASH UNIQUE | | 21842 | 3988K| 4384K| 1336 (1)| 00:00:17 |
| 3 | NESTED LOOPS | | 21842 | 3988K| | 442 (1)| 00:00:06 |
| 4 | NESTED LOOPS | | 47 | 7661 | | 160 (1)| 00:00:02 |
|* 5 | TABLE ACCESS FULL | VOYAGE_INFO | 46 | 1012 | | 68 (0)| 00:00:01 |
| 6 | TABLE ACCESS BY INDEX ROWID| PROCESS_CTRL | 1 | 141 | | 2 (0)| 00:00:01 |
|* 7 | INDEX RANGE SCAN | VOYAGE_ID_IDX | 1 | | | 1 (0)| 00:00:01 |
|* 8 | INDEX RANGE SCAN | PLY_IDX2 | 467 | 11208 | | 6 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - filter(INTERNAL_FUNCTION("CPVI"."VOYAGE_STRT_DT")>=TIMESTAMP' 2014-07-03 00:00:00.000000000' AND
INTERNAL_FUNCTION("CPVI"."VOYAGE_STRT_DT")<=TIMESTAMP' 2018-07-03 00:00:00.000000000')
7 - access("CPC"."VOYAGE_ID"="CPVI"."VOYAGE_ID")
8 - access("CPC"."BRAND_NAME"="CPOB"."BRAND_ID" AND "CPC"."SHIP_NAME"=""SHIP_NAME")
filter("CPC"."SHIP_NAME"="CPOB"."SHIP_NAME")
24 rows selected.
The code below works well in 11g, but in 12c it runs very slowly (over 10 seconds).
SELECT * FROM DMProgValue_00001
WHERE 1=1
AND ProgressOID IN (
SELECT P.OID FROM (
SELECT OID FROM (
SELECT A.OID, ROWNUM as seqNum FROM (
SELECT OID FROM DMProgress_00001 WHERE 1=1
AND Project = 'Q539'
ORDER BY actCode
) A
WHERE ROWNUM <= 40
) WHERE seqNum > 20
) P
);
Plan hash value: 763232015
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 189 | 171 (1)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | TABLE ACCESS FULL | DMPROGVALUE_00001 | 1 | 189 | 68 (0)| 00:00:01 |
|* 3 | VIEW | | 20 | 800 | 103 (1)| 00:00:01 |
|* 4 | COUNT STOPKEY | | | | | |
| 5 | VIEW | | 2916 | 78732 | 103 (1)| 00:00:01 |
|* 6 | SORT ORDER BY STOPKEY| | 2916 | 130K| 103 (1)| 00:00:01 |
|* 7 | TABLE ACCESS FULL | DMPROGRESS_00001 | 2916 | 130K| 102 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( EXISTS (SELECT 0 FROM (SELECT "A"."OID" "OID",ROWNUM "SEQNUM" FROM
(SELECT "OID" "OID" FROM "DMPROGRESS_00001" "DMPROGRESS_00001" WHERE
"PHASE"='Construction' AND "PROJECT"='Q539' ORDER BY "ACTCODE") "A" WHERE ROWNUM<=40)
"from$_subquery$_003" WHERE "SEQNUM">20 AND "OID"=:B1))
3 - filter("SEQNUM">20 AND "OID"=:B1)
4 - filter(ROWNUM<=40)
6 - filter(ROWNUM<=40)
7 - filter("PHASE"='Construction' AND "PROJECT"='Q539')
DMProgress_0001 stats
NUM_ROW : 10385
BLOCKS : 370
AVG_ROW_LEN : 176
SAMPLE_SIZE : 8263
DMProgvalue_0001 stats
NUM_ROW : 15703
BLOCKS : 244
AVG_ROW_LEN : 49
SAMPLE_SIZE : 5033
It's only 10k rows and the indexes are well designed (I can tell from my 11g experience). I know a workaround that makes it fast (the code below runs in about 0.001 sec), BUT I want to know the real problem and fix it.
I cannot understand it: there is only one subquery and 10k rows in each table. Even apart from the comparison with 11g, there is no way this query should take over 10 seconds.
SELECT * FROM DMProgValue_00001 V,
(SELECT OID FROM (
SELECT A.OID, ROWNUM as seqNum FROM (
SELECT OID FROM DMProgress_00001 WHERE 1=1
AND Project = 'Q539'
ORDER BY actCode
) A
WHERE ROWNUM <= 40
) WHERE seqNum > 20
) P
WHERE 1=1
AND V.ProgressOID = P.OID;
Added a similar query plan from 11g:
Plan hash value: 3049049852
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9 | 684 | 49 (5)| 00:00:01 |
|* 1 | HASH JOIN RIGHT SEMI | | 9 | 684 | 49 (5)| 00:00:01 |
| 2 | VIEW | VW_NSO_1 | 3 | 81 | 35 (3)| 00:00:01 |
|* 3 | VIEW | | 3 | 75 | 35 (3)| 00:00:01 |
|* 4 | COUNT STOPKEY | | | | | |
| 5 | VIEW | | 3 | 36 | 35 (3)| 00:00:01 |
|* 6 | SORT ORDER BY STOPKEY| | 3 | 144 | 35 (3)| 00:00:01 |
|* 7 | TABLE ACCESS FULL | DMPROGRESS_00037 | 3 | 144 | 34 (0)| 00:00:01 |
| 8 | TABLE ACCESS FULL | DMPROGVALUE_00037 | 5106 | 244K| 13 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("PROGRESSOID"="OID")
3 - filter("SEQNUM">20)
4 - filter(ROWNUM<=40)
6 - filter(ROWNUM<=40)
7 - filter("DISPLINE"='Q340' AND "PHASE"='Procurement' AND "PROJECT"='Moho')
Oracle 11g automatically transforms the query into a hash join, but 12c does not. I think this is the point. They have the same structure.
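One thing that might be worth trying (purely a sketch - I cannot confirm the optimizer will honor it with the ROWNUM filters in place) is to ask 12c explicitly for the subquery unnesting that 11g chose on its own:

SELECT * FROM DMProgValue_00001
WHERE ProgressOID IN (
    SELECT /*+ UNNEST */ P.OID FROM (   -- hint the IN subquery to be unnested into a join
        SELECT OID FROM (
            SELECT A.OID, ROWNUM as seqNum FROM (
                SELECT OID FROM DMProgress_00001
                WHERE Project = 'Q539'
                ORDER BY actCode
            ) A
            WHERE ROWNUM <= 40
        ) WHERE seqNum > 20
    ) P
);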
I seem to have an odd issue with an Oracle view that has function calls defined as columns, specifically around when those functions are evaluated.
Let's say I have the following view and function definition:
CREATE OR REPLACE VIEW test_view_one AS
SELECT column_one,
a_package.function_that_returns_a_value(column_one) function_column
FROM a_table;
CREATE OR REPLACE PACKAGE BODY a_package AS
FUNCTION function_that_returns_a_value(p_key VARCHAR2) RETURN VARCHAR2 IS
CURSOR a_cur IS
SELECT value
FROM table_b
WHERE key = p_key;
p_temp VARCHAR2(30);
BEGIN
-- Code here to write into a temp table. The function call is autonomous.
OPEN a_cur;
FETCH a_cur INTO p_temp;
CLOSE a_cur;
RETURN p_temp;
END function_that_returns_a_value;
END a_package;
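For the body above to compile, a matching package specification is assumed; a minimal sketch:

CREATE OR REPLACE PACKAGE a_package AS
  FUNCTION function_that_returns_a_value(p_key VARCHAR2) RETURN VARCHAR2;
END a_package;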
In general, I would expect that if function_column is included in a query then the function would be run once for every row brought back by that query. This seems to be true in some circumstances but not in others.
For example, let's say I have the following:
SELECT pageouter.*
FROM(WITH page_query AS (SELECT *
FROM test_view_one
ORDER BY column_one)
SELECT page_query.*, ROWNUM as innerrownum
FROM page_query
WHERE rownum <= 25) pageouter WHERE pageouter.innerrownum >= 1
In this scenario, that inner query (the one querying test_view_one) brings back around 90,000 records.
If I define the function as inserting into a temporary table then I can tell that the function ran 25 times, once for each row brought back. Exactly what I would expect.
However, if I add a significant WHERE clause onto that inner query, e.g.
SELECT pageouter.*
FROM(WITH page_query AS (SELECT *
FROM test_view_one
WHERE EXISTS (SELECT 'x' FROM some_table WHERE ...)
AND NOT EXISTS (SELECT 'x' FROM some_other_table WHERE ...)
AND EXISTS (SELECT 'x' FROM another_table WHERE ...)
ORDER BY column_one)
SELECT page_query.*, ROWNUM as innerrownum
FROM page_query
WHERE rownum <= 25) pageouter WHERE pageouter.innerrownum >= 1
Then the number of rows being brought back by the inner query is 60,000 and if I then query the temporary table, I can tell the function has run 60,000 times. Unsurprisingly, this pretty much destroys performance of the query.
The queries above are run as part of a paging implementation which is why we only ever bring back 25 rows and is why we only ever need the functions to be run for those 25 rows.
I should add that if I change the WHERE clause (i.e. I remove some of the conditions) then the query goes back to behaving itself, only running the functions for the 25 rows that are actually brought back.
Does anyone have any idea when functions in views are evaluated? Or any way of determining what causes this, or of identifying when the functions are evaluated? (I've checked the explain plan and there's nothing in there which seems to give it away.) If I knew that then I could hopefully find a solution to the problem, but there seems to be little documentation other than "they'll run for each row brought back", which is clearly not the case in some scenarios.
I fully appreciate it's difficult to work out what's going on without a working schema, but if you need any more info then please feel free to ask.
Many Thanks
Additional Info as Requested.
Below is the actual explain plan that I get from the production environment. The table names don't match the above query (in fact there are considerably more tables involved, but they're all joined by NOT EXISTS statements within the WHERE clause).
The DEMISE table is the equivalent of a_table in the above query.
It's worth noting that stats were gathered just before I ran the explain plan to make it as accurate as possible.
My understanding of this is that the VIEW row is where the functions would be evaluated, which occurs AFTER the rows have been filtered down. My understanding is obviously flawed!
So this is the bad plan, the one that calls the function 60,000 times...
Execution Plan
----------------------------------------------------------
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 | 10230 | 984 (1)|
| 1 | FAST DUAL | | 1 | | 2 (0)|
| 2 | FAST DUAL | | 1 | | 2 (0)|
|* 3 | VIEW | | 5 | 10230 | 984 (1)|
|* 4 | COUNT STOPKEY | | | | |
| 5 | VIEW | | 5 | 10165 | 984 (1)|
|* 6 | SORT ORDER BY STOPKEY | | 5 | 340 | 984 (1)|
| 7 | COUNT | | | | |
|* 8 | FILTER | | | | |
|* 9 | HASH JOIN RIGHT OUTER | | 5666 | 376K| 767 (1)|
|* 10 | INDEX RANGE SCAN | USERDATAI1 | 1 | 12 | 2 (0)|
|* 11 | HASH JOIN RIGHT ANTI | | 5666 | 309K| 765 (1)|
|* 12 | INDEX FAST FULL SCAN | TNNTMVINI1 | 1 | 17 | 35 (0)|
|* 13 | HASH JOIN RIGHT ANTI | | 6204 | 236K| 729 (1)|
|* 14 | INDEX RANGE SCAN | CODESGENI3 | 1 | 10 | 2 (0)|
|* 15 | INDEX FULL SCAN | DEMISEI4 | 6514 | 184K| 727 (1)|
| 16 | NESTED LOOPS | | 1 | 25 | 3 (0)|
| 17 | NESTED LOOPS | | 1 | 25 | 3 (0)|
|* 18 | INDEX RANGE SCAN | PROPERTY_GC | 1 | 15 | 2 (0)|
|* 19 | INDEX UNIQUE SCAN | CODESGENI1 | 1 | | 0 (0)|
|* 20 | TABLE ACCESS BY INDEX ROWID| CODESGEN | 1 | 10 | 1 (0)|
| 21 | TABLE ACCESS FULL | QCDUAL | 1 | | 3 (0)|
|* 22 | INDEX RANGE SCAN | DMSELEASI4 | 1 | 21 | 2 (0)|
|* 23 | INDEX RANGE SCAN | TNNTMVINI1 | 1 | 17 | 1 (0)|
| 24 | TABLE ACCESS FULL | QCDUAL | 1 | | 3 (0)|
-------------------------------------------------------------------------------------------
This is the good plan. It calls the function 25 times, but has some of the NOT EXISTS statements removed from the WHERE clause.
Execution Plan
----------------------------------------------------------
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 25 | 54200 | 144 (0)|
| 1 | FAST DUAL | | 1 | | 2 (0)|
| 2 | FAST DUAL | | 1 | | 2 (0)|
|* 3 | VIEW | | 25 | 54200 | 144 (0)|
|* 4 | COUNT STOPKEY | | | | |
| 5 | VIEW | | 26 | 56030 | 144 (0)|
| 6 | COUNT | | | | |
|* 7 | FILTER | | | | |
| 8 | NESTED LOOPS ANTI | | 30 | 3210 | 144 (0)|
| 9 | NESTED LOOPS OUTER | | 30 | 2580 | 114 (0)|
| 10 | NESTED LOOPS ANTI | | 30 | 2220 | 84 (0)|
| 11 | NESTED LOOPS ANTI | | 32 | 1824 | 52 (0)|
| 12 | TABLE ACCESS BY INDEX ROWID| DEMISE | 130K| 5979K| 18 (0)|
| 13 | INDEX FULL SCAN | DEMISEI4 | 34 | | 3 (0)|
|* 14 | INDEX RANGE SCAN | CODESGENI3 | 1 | 10 | 1 (0)|
|* 15 | INDEX RANGE SCAN | TNNTMVINI1 | 1 | 17 | 1 (0)|
|* 16 | INDEX RANGE SCAN | USERDATAI1 | 1 | 12 | 1 (0)|
|* 17 | INDEX RANGE SCAN | DMSELEASI4 | 1 | 21 | 1 (0)|
| 18 | TABLE ACCESS FULL | QCDUAL | 1 | | 3 (0)|
----------------------------------------------------------------------------------------
I fully appreciate the second plan is doing less but that doesn't explain why the functions aren't being evaluated... at least not that I can work out.
Pagination with ROWNUM may be performed in two ways:
A) full scan of the row source with an optimized sort (limited to the top N rows), or
B) index access of the row source with no sort at all.
Here is a simplified example of case A:
SELECT *
FROM
(SELECT a.*,
ROWNUM rnum
FROM
( SELECT * FROM test_view_one ORDER BY id
) a
WHERE ROWNUM <= 25
)
WHERE rnum >= 1
The corresponding execution plan looks as follows (note that I also show part of the column projection - I will explain why shortly):
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 25 | 975 | | 1034 (1)| 00:00:01 |
|* 1 | VIEW | | 25 | 975 | | 1034 (1)| 00:00:01 |
|* 2 | COUNT STOPKEY | | | | | | |
| 3 | VIEW | | 90000 | 2285K| | 1034 (1)| 00:00:01 |
|* 4 | SORT ORDER BY STOPKEY| | 90000 | 439K| 1072K| 1034 (1)| 00:00:01 |
| 5 | TABLE ACCESS FULL | TEST | 90000 | 439K| | 756 (1)| 00:00:01 |
-----------------------------------------------------------------------------------------
Column Projection Information (identified by operation id):
-----------------------------------------------------------
...
3 - "A"."ID"[NUMBER,22], "A"."FUNCTION_COLUMN"[NUMBER,22]
4 - (#keys=1) "ID"[NUMBER,22], "MY_PACKAGE"."MY_FUNCTION"("ID")[22]
5 - "ID"[NUMBER,22]
During execution the table is accessed with a FULL SCAN, i.e. all records are read.
The optimization is in the SORT operation: SORT ORDER BY STOPKEY means that not all
rows are sorted; only the top 25 are kept and sorted.
Here is the execution plan for case B:
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 25 | 975 | 2 (0)| 00:00:01 |
|* 1 | VIEW | | 25 | 975 | 2 (0)| 00:00:01 |
|* 2 | COUNT STOPKEY | | | | | |
| 3 | VIEW | | 26 | 676 | 2 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN| TEST_IDX | 26 | 130 | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------------
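Case B assumes the ORDER BY can be satisfied by an index, so no sort is needed at all; a hypothetical definition matching the TEST_IDX name in the plan would be:

CREATE INDEX test_idx ON test (id);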
Here only the required 25 rows are accessed, and therefore the function cannot be called more than N times.
Now the important consideration: in case A the function can, but need not, be called for each row. How do we see this?
The answer is in the column projection of the execution plan.
4 - (#keys=1) "ID"[NUMBER,22], "MY_PACKAGE"."MY_FUNCTION"("ID")[22]
The relevant line 4 shows that the function is called in the SORT operation and therefore for each row (the sort receives all the rows).
Conclusion
My test on 11.2 shows that in case A (FULL SCAN with SORT ORDER BY STOPKEY) the view function is called once per row.
I guess the only workaround is to select only the ID, limit the result, and then join back to the original view to get the function value.
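A minimal sketch of that workaround, using the id / function_column names from the example above (untested, just to show the shape):

SELECT v.id,
       v.function_column,          -- the expensive function, now needed for at most 25 rows
       p.rnum
FROM  (SELECT a.id, ROWNUM rnum
       FROM  (SELECT id FROM test_view_one ORDER BY id) a   -- page on the key only
       WHERE ROWNUM <= 25) p
JOIN   test_view_one v
  ON   v.id = p.id
WHERE  p.rnum >= 1;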
Final notes
I tested this on 12.1 as well; see below how the column projection shifts.
The function is now evaluated in the VIEW step (line 3), i.e. both cases work fine.
Column Projection Information (identified by operation id):
-----------------------------------------------------------
...
3 - "A"."ID"[NUMBER,22], "A"."FUNCTION_COLUMN"[NUMBER,22]
4 - (#keys=1) "ID"[NUMBER,22]
5 - "ID"[NUMBER,22]
And of course in 12c the new OFFSET ... FETCH NEXT syntax could be used.
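For example (a sketch against the same test view):

SELECT *
FROM   test_view_one
ORDER BY id
OFFSET 0 ROWS FETCH NEXT 25 ROWS ONLY;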
Good Luck!
So, I'm in the process of writing a job that will populate a table using about 10 different selects. Along the way I've picked up a few things, but I'm facing some inconsistent behaviour when performing an INSERT ... SELECT. When I run the SELECT query alone, I get results back within 10 seconds (after I flush the buffer cache and shared pool), but as soon as I tack on the INSERT portion, it takes over 10 minutes...
Here's what I'm doing after I clear the table. I added the APPEND hint on the insert, which appeared to help quite a bit earlier this week, but now it's back to taking longer than usual - and only when performing the insert!
--// Disable GL_JLOG_DETAILS INDEXES
Execute immediate 'alter index IDX_GLDTL1 unusable';
Execute immediate 'alter index IDX_GLDTL2 unusable';
--// SECTION 1
INSERT /*+ APPEND PARALLEL(6) */ INTO GL_JLOG_DETAILS ( MAT_SECTION
,JLOG_KEY
,SRC_CD
,SRC_KEY
,CASE_KEY
,CASE_MBR_KEY
,UNALLOC_ACCT_CD
,FDRT_KEY
,FDRT_TR_CD
,TR_CD
,TR_REF_NO
,STAT_CD
,FD_DESC_ID
,FD_NO
,FD_TYP_CD
,BKT_NO
,RVSL_CD
,CO_CD
,BEN_RPT_TYP_CD
,DB_CR_CD
,ACCT_NO
,SUB_ACCT_NO
,JRNL_AMT
,TDTL_AMT
,RVSL_ERROR
,RVSL_SAME_DAY
,TDTL_REF_KEY)
---// SECTION 1 //--------
Select /*+ USE_NL(TMAP,TMOV,ACCT,SBNT,BNTP,FDRT) */ '1' MAT_SECTION,
JLOG.JLOG_KEY,
JLOG.SRC_CD,
TDTL.TDTL_KEY AS SRC_KEY,
TDTL.CASE_KEY,
TDTL.CASE_MBR_KEY,
TDTL.UNALLOC_ACCT_CD,
TDTL.FDRT_KEY,
FDRT.TR_CD FDRT_TR_CD,
TDTL.TR_CD,
TDTL.TR_REF_NO,
TDTL.STAT_CD,
TDTL.FD_DESC_ID,
TDTL.FD_NO,
FDDC.FD_TYP_CD,
TDTL.BKT_NO,
JLOG.RVSL_CD,
BNTP.CO_CD,
SBNT.BEN_RPT_TYP_CD,
JLOG.DB_CR_CD,
ACCT.ACCT_NO,
ACCT.SUB_ACCT_NO,
JLOG.JRNL_AMT,
ABS (TDTL.AMT) AS TDTL_AMT,
CASE
WHEN TDTL.PROC_DT < TDTL.RVSL_CYC_DT AND TDTL.ORIG_INBS_KEY IS NULL
THEN
1
ELSE
0
END RVSL_ERROR,
CASE
WHEN TDTL.PROC_DT = TDTL.RVSL_CYC_DT AND NOT TDTL.TR_CD IN ('3002','3004','1501','1502','1503','1504','1505')
THEN 1
ELSE 0
END RVSL_SAME_DAY,
TDTL.REF_KEY TDTL_REF_KEY
from GL_JOURNAL_LOGS JLOG,
Transact_Details TDTL,
FUND_DESC FDDC,
FD_RATES FDRT,
BEN_TYPES BNTP,
SYS_BEN_TYPES SBNT,
LEDGER_ACCOUNTS ACCT,
TRANS_MAP_OVRD TMOV,
TRANSACTION_MAP TMAP
WHERE JLOG.JRNL_CD = '0'
AND JLOG.SRC_CD = '2'
AND JLOG.MKEY_FD_NUM <> 0
AND NVL(JLOG.TMOV_KEY, -1) > 0
AND NVL(JLOG.ORIG_SCAT_KEY, 1) = 1
AND JLOG.Scat_key = TDTL.SCAT_KEY
AND JLOG.TR_CD = TDTL.TR_CD
AND JLOG.CASE_KEY = TDTL.CASE_KEY
AND JLOG.TR_REF_NO = TDTL.TR_REF_NO
AND JLOG.ACCT_KEY = ACCT.ACCT_KEY
AND JLOG.TMOV_KEY = TMOV.TMOV_KEY
AND NVL(TDTL.ORIG_SCAT_KEY, 1) = 1
AND TDTL.STAT_CD <> '4'
AND TDTL.FD_DESC_ID = FDDC.FD_DESC_ID
AND TDTL.FDRT_KEY = FDRT.FDRT_KEY
AND BNTP.BNTP_KEY = FDRT.BNTP_KEY
AND BNTP.SBNT_KEY (+) = SBNT.SBNT_KEY
AND TMOV.TMAP_KEY = TMAP.TMAP_KEY
AND TMOV.CO_CD = BNTP.CO_CD
AND DECODE(FDDC.MKEY_FD_NUM, NULL, TMAP.MKEY_FD_NUM, FDDC.MKEY_FD_NUM) = TMAP.MKEY_FD_NUM;
Any tips / advice would be greatly appreciated!
Explain Plan
Plan hash value: 4157721082
-----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | 1 | 212 | 1596 (1)| 00:00:20 |
| 1 | LOAD AS SELECT | GL_JLOG_DETAILS | | | | |
| 2 | NESTED LOOPS | | | | | |
| 3 | NESTED LOOPS | | 1 | 212 | 1596 (1)| 00:00:20 |
| 4 | NESTED LOOPS | | 1 | 195 | 1595 (1)| 00:00:20 |
| 5 | NESTED LOOPS | | 1 | 190 | 1594 (1)| 00:00:20 |
| 6 | NESTED LOOPS | | 12 | 2172 | 1582 (1)| 00:00:19 |
|* 7 | HASH JOIN | | 12 | 2004 | 1570 (1)| 00:00:19 |
| 8 | TABLE ACCESS FULL | FUND_DESC | 168 | 1176 | 4 (0)| 00:00:01 |
| 9 | NESTED LOOPS | | | | | |
| 10 | NESTED LOOPS | | 257 | 41120 | 1566 (1)| 00:00:19 |
| 11 | NESTED LOOPS | | 257 | 22359 | 537 (0)| 00:00:07 |
| 12 | NESTED LOOPS | | 257 | 20817 | 280 (0)| 00:00:04 |
|* 13 | TABLE ACCESS BY INDEX ROWID| GL_JOURNAL_LOGS | 257 | 18504 | 23 (0)| 00:00:01 |
|* 14 | INDEX RANGE SCAN | IDX_ORIGSCATKEY | 690 | | 4 (0)| 00:00:01 |
| 15 | TABLE ACCESS BY INDEX ROWID| TRANS_MAP_OVRD | 1 | 9 | 1 (0)| 00:00:01 |
|* 16 | INDEX UNIQUE SCAN | TMOV_PK | 1 | | 0 (0)| 00:00:01 |
| 17 | TABLE ACCESS BY INDEX ROWID | TRANSACTION_MAP | 1 | 6 | 1 (0)| 00:00:01 |
|* 18 | INDEX UNIQUE SCAN | TMAP_PK | 1 | | 0 (0)| 00:00:01 |
|* 19 | INDEX RANGE SCAN | IX_AML8890 | 3 | | 3 (0)| 00:00:01 |
|* 20 | TABLE ACCESS BY INDEX ROWID | TRANSACT_DETAILS | 1 | 73 | 4 (0)| 00:00:01 |
| 21 | TABLE ACCESS BY INDEX ROWID | FD_RATES | 1 | 14 | 1 (0)| 00:00:01 |
|* 22 | INDEX UNIQUE SCAN | FDRT_PK | 1 | | 0 (0)| 00:00:01 |
|* 23 | TABLE ACCESS BY INDEX ROWID | BEN_TYPES | 1 | 9 | 1 (0)| 00:00:01 |
|* 24 | INDEX UNIQUE SCAN | BNTP_PK | 1 | | 0 (0)| 00:00:01 |
| 25 | TABLE ACCESS BY INDEX ROWID | SYS_BEN_TYPES | 1 | 5 | 1 (0)| 00:00:01 |
|* 26 | INDEX UNIQUE SCAN | SBNT_PK | 1 | | 0 (0)| 00:00:01 |
|* 27 | INDEX UNIQUE SCAN | ACCT_PK | 1 | | 0 (0)| 00:00:01 |
| 28 | TABLE ACCESS BY INDEX ROWID | LEDGER_ACCOUNTS | 1 | 17 | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
7 - access("TDTL"."FD_DESC_ID"="FDDC"."FD_DESC_ID")
filter("TMAP"."MKEY_FD_NUM"=DECODE(TO_CHAR("FDDC"."MKEY_FD_NUM"),NULL,"TMAP"."MKEY_FD_NUM","
FDDC"."MKEY_FD_NUM"))
13 - filter("JLOG"."TMOV_KEY" IS NOT NULL AND "JLOG"."SRC_CD"='2' AND "JLOG"."MKEY_FD_NUM"<>0
AND NVL("TMOV_KEY",(-1))>0 AND "JLOG"."JRNL_CD"='0')
14 - access(NVL("ORIG_SCAT_KEY",1)=1)
16 - access("JLOG"."TMOV_KEY"="TMOV"."TMOV_KEY")
18 - access("TMOV"."TMAP_KEY"="TMAP"."TMAP_KEY")
19 - access("JLOG"."SCAT_KEY"="TDTL"."SCAT_KEY" AND "JLOG"."CASE_KEY"="TDTL"."CASE_KEY")
filter("TDTL"."FD_DESC_ID" IS NOT NULL AND "TDTL"."STAT_CD"<>'4' AND
"JLOG"."CASE_KEY"="TDTL"."CASE_KEY")
20 - filter("TDTL"."FDRT_KEY" IS NOT NULL AND NVL("TDTL"."ORIG_SCAT_KEY",1)=1 AND
"JLOG"."TR_CD"="TDTL"."TR_CD" AND "JLOG"."TR_REF_NO"="TDTL"."TR_REF_NO")
22 - access("TDTL"."FDRT_KEY"="FDRT"."FDRT_KEY")
23 - filter("TMOV"."CO_CD"="BNTP"."CO_CD")
24 - access("BNTP"."BNTP_KEY"="FDRT"."BNTP_KEY")
26 - access("BNTP"."SBNT_KEY"="SBNT"."SBNT_KEY")
27 - access("JLOG"."ACCT_KEY"="ACCT"."ACCT_KEY")
There are a couple of problems with direct-path load, a.k.a. /*+ APPEND */.
Firstly, it isn't necessarily faster in general. It does a direct-path load to disk, bypassing the buffer cache. In many cases - especially with smaller sets of data - a direct-path load to disk is slower than a conventional-path load into the cache.
Second, a direct-path load inserts above the high-water mark of the table, which means it does not re-use free space below it. So unless you TRUNCATE the whole table every time, it isn't necessarily going to be faster.
Third, only one session/user can direct-path load into a table at a time. That may cause concurrency issues, because all modifications become serialized (like a serial circuit rather than a parallel circuit): no INSERT, UPDATE, DELETE or MERGE against the table is allowed until the transaction doing the direct-path load commits.
Lastly, the indexes on the table in question matter. It is pretty difficult to say, without looking at the data, what kind of index is appropriate and on which columns. Try a combination of local and global bitmap indexes, but be careful when choosing the columns. Making an index unusable is not a good idea in general; there is a reason it was created, and disabling it largely defeats that purpose.
To start with, remove all hints and review the indexing, and if the table is fully reloaded each time, truncate it prior to every load.
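Following that advice, a hedged sketch of the reload pattern, re-using the names from the question (not a verified fix, just the shape):

-- full reload: TRUNCATE (which resets the high-water mark) rather than DELETE
TRUNCATE TABLE GL_JLOG_DETAILS;
-- then run the Section 1 INSERT ... SELECT exactly as above, but without the
-- APPEND / PARALLEL hints and without marking IDX_GLDTL1 / IDX_GLDTL2 unusable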