I have two paging queries that I am considering using.
The first one is:
SELECT * FROM ( SELECT rownum rnum, a.* from (
select * from members
) a WHERE rownum <= #paging.endRow# ) where rnum > #paging.startRow#
And the second is:
SELECT * FROM ( SELECT rownum rnum, a.* from (
select * from members
) a ) WHERE rnum BETWEEN #paging.startRow# AND #paging.endRow#
Which query do you think is faster?
I don't have access to an Oracle instance right now but, as far as I know, the best SQL query for paging is the following:
select *
from (
select rownum as rn, a.*
from (
select *
from my_table
order by ....a_unique_criteria...
) a
)
where rownum <= :size
and rn > (:page-1)*:size
http://www.oracle.com/technetwork/issue-archive/2006/06-sep/o56asktom-086197.html
To achieve consistent paging you should order rows by a unique criterion. Doing so avoids loading, for page X, a row that you already loaded for a different page Y.
EDIT:
1) Ordering rows by a unique criterion means ordering the data in such a way that each row keeps the same position on every execution of the query.
2) An index containing all the expressions used in the ORDER BY clause will help return results faster, especially for the first pages. With that index, the execution plan chosen by the optimizer doesn't need to sort the rows, because it can return them by walking the index in its natural order.
3) By the way, the fastest way to page through the results of a query is to execute the query only once and handle all the paging on the application side.
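Points 1 and 2 can be sketched as follows. This is only an illustration under stated assumptions: the columns last_name (not unique) and member_id (unique) and the index name are hypothetical, and :start_row and :end_row are placeholder binds. Whether the optimizer actually uses the index to avoid the sort depends on statistics and column nullability.

```sql
-- Hypothetical columns: last_name (not unique) and member_id (unique key).
-- Appending member_id to the ORDER BY makes the row order deterministic.
CREATE INDEX members_name_id_ix ON members (last_name, member_id);

SELECT *
FROM (
  SELECT a.*, ROWNUM AS rn
  FROM (
    SELECT *
    FROM members
    ORDER BY last_name, member_id
  ) a
  WHERE ROWNUM <= :end_row
)
WHERE rn > :start_row;
```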
Take a look at the execution plans, example with 1000 rows:
SELECT *
FROM (SELECT ROWNUM rnum
,a.*
FROM (SELECT *
FROM members) a
WHERE ROWNUM <= endrow#)
WHERE rnum > startrow#;
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1000 | 39000 | 3 (0)| 00:00:01 |
|* 1 | VIEW | | 1000 | 39000 | 3 (0)| 00:00:01 |
| 2 | COUNT | | | | | |
|* 3 | FILTER | | | | | |
| 4 | TABLE ACCESS FULL| MEMBERS | 1000 | 26000 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("RNUM">"STARTROW#")
3 - filter("MEMBERS"."ENDROW#">=ROWNUM)
And for the second query:
SELECT *
FROM (SELECT ROWNUM rnum
,a.*
FROM (SELECT *
FROM members) a)
WHERE rnum BETWEEN startrow# AND endrow#;
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1000 | 39000 | 3 (0)| 00:00:01 |
|* 1 | VIEW | | 1000 | 39000 | 3 (0)| 00:00:01 |
| 2 | COUNT | | | | | |
| 3 | TABLE ACCESS FULL| MEMBERS | 1000 | 26000 | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("RNUM"<="ENDROW#" AND "RNUM">="STARTROW#")
Based on these plans, I'd say version 2 could be slightly faster because it has one step fewer. However, I don't know your indexes or data distribution, so you should generate these execution plans yourself and judge the situation for your data. Or simply test it.
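One way to get such a plan for your own data is EXPLAIN PLAN together with DBMS_XPLAN.DISPLAY. The sketch below uses literal bounds (rows 21 to 30) purely for illustration. With a literal or bind variable as the upper bound, the plan may show a COUNT STOPKEY step instead of the FILTER step above, which could change the comparison.

```sql
-- Literal bounds are used here only for illustration (rows 21 to 30).
EXPLAIN PLAN FOR
SELECT *
FROM (SELECT ROWNUM rnum, a.*
      FROM (SELECT * FROM members) a
      WHERE ROWNUM <= 30)
WHERE rnum > 20;

-- Show the plan that was just explained.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```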
I already answered this here, but let me copy-paste.
I just want to summarize the answers and comments. There are a number of ways of doing pagination.
Prior to Oracle 12c there was no OFFSET/FETCH functionality, so take a look at the whitepaper, as @jasonk suggested. It's the most complete article I found about the different methods, with a detailed explanation of their advantages and disadvantages. It would take a significant amount of time to copy-paste them here, so I won't do it.
There is also a good article from the jOOQ creators explaining some common caveats of pagination in Oracle and other databases: jOOQ's blog post.
Good news: since Oracle 12c we have the new OFFSET/FETCH functionality. See the Oracle Magazine article on 12c new features, and refer to "Top-N Queries and Pagination".
You can check your Oracle version by issuing the following statement:
SELECT * FROM V$VERSION
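For reference, a minimal sketch of the 12c row-limiting syntax on the members table from the question. The column member_id is a hypothetical unique column, used only to keep the ordering stable between pages.

```sql
-- member_id is a hypothetical unique column used to keep the order stable.
-- Skips the first 20 rows and returns the next 10 (Oracle 12c or later).
SELECT *
FROM members
ORDER BY member_id
OFFSET 20 ROWS FETCH NEXT 10 ROWS ONLY;
```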
I faced a puzzling situation. A query had a good execution plan. But when that query was used as an inner query inside a larger query, that plan changed. I am trying to understand why it might be so.
This was on Oracle 11g. My query was:
SELECT * FROM YFS_SHIPMENT_H
WHERE SHIPMENT_KEY IN
(
SELECT DISTINCT SHIPMENT_KEY
FROM YFS_SHIPMENT_LINE_H
WHERE ORDER_HEADER_KEY = '20150113083918815889858'
OR ( ORDER_LINE_KEY IN ( '20150113084438815896336') )
);
As you can see, there is an inner query here, which is:
SELECT DISTINCT SHIPMENT_KEY
FROM YFS_SHIPMENT_LINE_H
WHERE ORDER_HEADER_KEY = '20150113083918815889858'
OR ( ORDER_LINE_KEY IN ( '20150113084438815896336') )
When I run just the inner query, I get the execution plan as:
PLAN_TABLE_OUTPUT
========================================================================================================
SQL_ID 3v82m4j5tv1k3, child number 0
=====================================
SELECT DISTINCT SHIPMENT_KEY FROM YFS_SHIPMENT_LINE_H WHERE
ORDER_HEADER_KEY = '20150113083918815889858' OR ( ORDER_LINE_KEY IN (
'20150113084438815896336') )
Plan hash value: 3691773903
========================================================================================================
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
========================================================================================================
| 0 | SELECT STATEMENT | | | | 10 (100)| |
| 1 | HASH UNIQUE | | 7 | 525 | 10 (10)| 00:00:01 |
| 2 | CONCATENATION | | | | | |
| 3 | TABLE ACCESS BY INDEX ROWID| YFS_SHIPMENT_LINE_H | 1 | 75 | 4 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | YFS_SHIPMENT_LINE_H_I4 | 1 | | 3 (0)| 00:00:01 |
|* 5 | TABLE ACCESS BY INDEX ROWID| YFS_SHIPMENT_LINE_H | 6 | 450 | 5 (0)| 00:00:01 |
|* 6 | INDEX RANGE SCAN | YFS_SHIPMENT_LINE_H_I6 | 6 | | 3 (0)| 00:00:01 |
========================================================================================================
Predicate Information (identified by operation id):
===================================================
4 - access("ORDER_LINE_KEY"='20150113084438815896336')
5 - filter(LNNVL("ORDER_LINE_KEY"='20150113084438815896336'))
6 - access("ORDER_HEADER_KEY"='20150113083918815889858')
The execution plan shows that the table YFS_SHIPMENT_LINE_H is accessed with two indexes YFS_SHIPMENT_LINE_H_I4 and YFS_SHIPMENT_LINE_H_I6; and then the results are concatenated. This plan seems fine and the query response time is great.
But when I run the complete query, the access path of the inner query changes as given below:
PLAN_TABLE_OUTPUT
=======================================================================================================
SQL_ID dk1bp8p9g3vzx, child number 0
=====================================
SELECT * FROM YFS_SHIPMENT_H WHERE SHIPMENT_KEY IN ( SELECT DISTINCT
SHIPMENT_KEY FROM YFS_SHIPMENT_LINE_H WHERE ORDER_HEADER_KEY =
'20150113083918815889858' OR ( ORDER_LINE_KEY IN (
'20150113084438815896336') ) )
Plan hash value: 3651083773
=======================================================================================================
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
=======================================================================================================
| 0 | SELECT STATEMENT | | | | 12593 (100)| |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 7 | 6384 | 12593 (1)| 00:02:32 |
| 3 | SORT UNIQUE | | 7 | 525 | 12587 (1)| 00:02:32 |
|* 4 | INDEX FAST FULL SCAN | YFS_SHIPMENT_LINE_H_I2 | 7 | 525 | 12587 (1)| 00:02:32 |
|* 5 | INDEX UNIQUE SCAN | YFS_SHIPMENT_H_PK | 1 | | 1 (0)| 00:00:01 |
| 6 | TABLE ACCESS BY INDEX ROWID| YFS_SHIPMENT_H | 1 | 837 | 2 (0)| 00:00:01 |
=======================================================================================================
Predicate Information (identified by operation id):
===================================================
4 - filter(("ORDER_HEADER_KEY"='20150113083918815889858' OR
"ORDER_LINE_KEY"='20150113084438815896336'))
5 - access("SHIPMENT_KEY"="SHIPMENT_KEY")
Please note that the YFS_SHIPMENT_LINE_H is now being accessed with a different index (YFS_SHIPMENT_LINE_H_I2). As it turns out, this is not a very good index and the query response time suffers.
My question is: Why would the inner query execution plan change when it is run as part of the larger query? Once the optimizer has figured out the best way to access YFS_SHIPMENT_LINE_H, why wouldn't it continue to use the same execution plan even when it is part of the larger query?
Note: I am not too concerned about what would be the correct access path or the index to use; and hence not giving all the indexes on the table here; and the cardinality of the data. My concern is about the change when executed separately versus as part of another query.
Thanks.
-- Parag
I'm not sure why the Oracle optimizer decides to change the execution path. But, I think this is a better way to write the query:
SELECT s.*
FROM YFS_SHIPMENT_H s
WHERE s.SHIPMENT_KEY IN (SELECT sl.SHIPMENT_KEY
FROM YFS_SHIPMENT_LINE_H sl
WHERE sl.ORDER_HEADER_KEY = '20150113083918815889858'
) OR
s.SHIPMENT_KEY IN (SELECT sl.SHIPMENT_KEY
FROM YFS_SHIPMENT_LINE_H sl
WHERE sl.ORDER_LINE_KEY IN ('20150113084438815896336')
);
Notes:
There is no need to have SELECT DISTINCT in a subquery for IN. I'm pretty sure that Oracle ignores it, but it could add overhead.
Splitting the logic into two queries makes it more likely that Oracle can use indexes for the query (the best ones are on YFS_SHIPMENT_LINE_H(ORDER_HEADER_KEY, SHIPMENT_KEY) and YFS_SHIPMENT_LINE_H(ORDER_LINE_KEY, SHIPMENT_KEY)).
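The two indexes mentioned in the last note could be created as sketched below. The index names are made up for illustration; only the column lists come from the note.

```sql
-- Index names are hypothetical; the column lists follow the note above.
CREATE INDEX yfs_shipment_line_h_ohk_ix
  ON YFS_SHIPMENT_LINE_H (ORDER_HEADER_KEY, SHIPMENT_KEY);

CREATE INDEX yfs_shipment_line_h_olk_ix
  ON YFS_SHIPMENT_LINE_H (ORDER_LINE_KEY, SHIPMENT_KEY);
```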
In the first query (not used as a subquery), the base table is accessed based on the conditions in the where clause. The indexes on the two columns involved are used for accessing the rows.
In the complex query, you are doing a semi-join. The optimizer, rightly or wrongly, has decided that it is more efficient to read the rows from the shipment table first, read the shipment_key, and use the index on shipment_key in the shipment_line table to retrieve rows to see if they are a match. The where clause conditions on the shipment_line table are now just filter predicates, they are not used to decide which rows to be retrieved from the table.
If you feel the optimizer got it wrong (which is possible, although not often with relatively simple queries like this one), make sure statistics are up-to-date. What would be relevant here is the size of each table, how many rows on average have the same shipment_key in shipment_line, and the selectiveness of the conditions in the where clause in the subquery. Keep in mind that for the outer query, it is not necessary to compute the subquery in full (and very likely Oracle does not compute it in full); for each row from the shipment table, as soon as a matching row in the shipment_line table is found that satisfies the where clause, the search for that shipment_key in shipment_line stops.
One thing you can do, if you really think the optimizer got it wrong, is to see what happens if you use hints. For example, you can tell the optimizer not to use the I2 index on shipment_line (pretend it doesn't exist) - see what plan it will come up with.
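A hedged sketch of that experiment, using the NO_INDEX hint inside the subquery. The alias sl is introduced here only so that the hint can reference the table, and the optimizer is free to pick any other plan, so the result needs to be checked rather than assumed.

```sql
-- Ask the optimizer to ignore YFS_SHIPMENT_LINE_H_I2 for this statement only.
EXPLAIN PLAN FOR
SELECT *
FROM YFS_SHIPMENT_H
WHERE SHIPMENT_KEY IN (
  SELECT /*+ NO_INDEX(sl YFS_SHIPMENT_LINE_H_I2) */ sl.SHIPMENT_KEY
  FROM YFS_SHIPMENT_LINE_H sl
  WHERE sl.ORDER_HEADER_KEY = '20150113083918815889858'
     OR sl.ORDER_LINE_KEY IN ('20150113084438815896336')
);

-- Inspect the resulting plan.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```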
The join on shipment_key forces the optimizer to use the most selective index, in this case the YFS_SHIPMENT_LINE_H_I2 index. Sterling created this index for this query and it is WRONG. Drop it (or make it invisible) and watch your query pick up the correct plan. If you are hesitant to drop the index because it is part of the Sterling product, use SQL Plan Management baselines. The columns of the index, in order, are:
YFS_SHIPMENT_LINE_H_I2 SHIPMENT_KEY 1
YFS_SHIPMENT_LINE_H_I2 ORDER_HEADER_KEY 2
YFS_SHIPMENT_LINE_H_I2 ORDER_RELEASE_KEY 3
YFS_SHIPMENT_LINE_H_I2 ORDER_LINE_KEY 4
YFS_SHIPMENT_LINE_H_I2 REQUESTED_TAG_NUMBER 5
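Making the index invisible is the less drastic of the two options, since it can be reverted without rebuilding anything. A minimal sketch, assuming Oracle 11g or later and sufficient privileges on the index:

```sql
-- Hide the index from the optimizer without dropping it.
ALTER INDEX YFS_SHIPMENT_LINE_H_I2 INVISIBLE;

-- Re-run the query and check its plan, then revert if needed:
ALTER INDEX YFS_SHIPMENT_LINE_H_I2 VISIBLE;
```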
Could someone explain this case?
For example, I have a dump table with data like this:
TGL
19810909
19761026
19832529
When I run this query:
SELECT to_date(tgl,'YYYYMMDD') tgl
FROM
(
SELECT tgl
FROM tmpx
WHERE
SUBSTR(tgl,5,2) BETWEEN '01' AND '12'
AND length(tgl) = 8
)
WHERE to_date(tgl,'YYYYMMDD') < to_date('19811231','YYYYMMDD')
result: no error
TGL
09/09/1981
26/10/1976
But when I run this query:
SELECT to_date(tgl,'YYYYMMDD') tgl
FROM
(
SELECT tgl
FROM tmpx
WHERE
SUBSTR(tgl,5,2) IN ('01','02','03','04','05','06','07','08','09','10','01','12')
AND length(tgl) = 8
)
WHERE to_date(tgl,'YYYYMMDD') < to_date('19811231','YYYYMMDD')
result: error
ORA-01843: not a valid month
Why is the third row (19832529) included in the selection, causing an error?
Whereas if I run the following query:
SELECT tgl
FROM tmpx
WHERE
SUBSTR(tgl,5,2) IN ('01','02','03','04','05','06','07','08','09','10','11','12')
AND length(tgl) = 8
the result is like this (without row number 3):
TGL
19810909
19761026
thank you.
If you look at the execution plans for both queries you can see how they are being handled by the optimiser. For the first one:
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 6 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TMPX | 1 | 6 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(LENGTH("TGL")=8 AND SUBSTR("TGL",5,2)>='01' AND
SUBSTR("TGL",5,2)<='12' AND TO_DATE("TGL",'YYYYMMDD')<TO_DATE('
1981-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
And for the second:
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 6 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TMPX | 1 | 6 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(LENGTH("TGL")=8 AND TO_DATE("TGL",'YYYYMMDD')<TO_DATE('
1981-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
(SUBSTR("TGL",5,2)='01' OR SUBSTR("TGL",5,2)='02' OR
SUBSTR("TGL",5,2)='03' OR SUBSTR("TGL",5,2)='04' OR
SUBSTR("TGL",5,2)='05' OR SUBSTR("TGL",5,2)='06' OR
SUBSTR("TGL",5,2)='07' OR SUBSTR("TGL",5,2)='08' OR
SUBSTR("TGL",5,2)='09' OR SUBSTR("TGL",5,2)='10' OR
SUBSTR("TGL",5,2)='12'))
Notice the order in which the filters are applied. In the first one it's looking at the substring first, and only the values that pass that filter will then be converted to a date for the 1981 comparison.
In the second one the date check is being done first, so it tries to convert the invalid value before it filters it out.
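If the conversion has to be guarded explicitly rather than left to the optimizer's filter ordering, one option is to wrap it in a CASE expression, which Oracle evaluates with short-circuit semantics. The sketch below is an alternative technique, not what either query in the question does, and it only guards the month check: a string such as '19810231' would still raise an error.

```sql
-- TO_DATE is only reached when the length and month checks pass;
-- rows failing the checks yield NULL, so the comparison filters them out.
SELECT TO_DATE(tgl, 'YYYYMMDD') tgl
FROM tmpx
WHERE CASE
        WHEN LENGTH(tgl) = 8
         AND SUBSTR(tgl, 5, 2) BETWEEN '01' AND '12'
        THEN TO_DATE(tgl, 'YYYYMMDD')
      END < DATE '1981-12-31';
```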
The real problem here is storing dates as strings, which allows invalid data to be entered. If you're stuck with that, another approach is to use a function that attempts to convert the string to a date and ignores any errors thrown. That still isn't ideal, but it would ignore the same values you are already ignoring. There are lots of examples of this, including this one of mine. With something like that you could do:
SELECT safe_to_date(tgl) tgl
FROM tmpx
WHERE safe_to_date(tgl) < date '1981-12-31';
or if you prefer:
SELECT tgl
FROM (
SELECT safe_to_date(tgl) tgl
FROM tmpx
)
WHERE tgl < date '1981-12-31';
If you don't need it to be flexible, your function could accept only YYYYMMDD format strings, or you could pass in the format you want to check.
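The safe_to_date function used above is not built in. A minimal sketch of what it might look like, assuming a single format with YYYYMMDD as the default; the parameter names are made up for illustration, and this is not the linked implementation.

```sql
-- Hypothetical helper: returns NULL instead of raising an error
-- when the string cannot be converted with the given format.
CREATE OR REPLACE FUNCTION safe_to_date (
  p_str    IN VARCHAR2,
  p_format IN VARCHAR2 DEFAULT 'YYYYMMDD'
) RETURN DATE
DETERMINISTIC
IS
BEGIN
  RETURN TO_DATE(p_str, p_format);
EXCEPTION
  -- Swallows every conversion error; acceptable for a sketch,
  -- but a real function might catch specific ORA- errors only.
  WHEN OTHERS THEN
    RETURN NULL;
END safe_to_date;
/
```

On Oracle 12.2 and later, TO_DATE(tgl DEFAULT NULL ON CONVERSION ERROR, 'YYYYMMDD') may make such a helper unnecessary.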
Consider the problem of applying changes to an aggregate table. Rows that exist must be updated, while new rows must be inserted. My approach was as follows:
Insert all changes in a temporary table (100K at a time)
MERGE the temporary table into the main table (eventually reaching 100s of millions rows)
The SQL (with a SORT MERGE hint) looks as follows (nothing fancy):
merge /*+ USE_MERGE(t s) */
into F_SCREEN_INSTANCE t
using F_SCREEN_INSTANCE_BUF s
on (s.DAY_ID = t.DAY_ID and s.PARTIAL_ID = t.PARTIAL_ID)
when matched then update set
t.ACTIVE_TIME_SUM = t.ACTIVE_TIME_SUM + s.ACTIVE_TIME_SUM,
t.IDLE_TIME_SUM = t.IDLE_TIME_SUM + s.IDLE_TIME_SUM
when not matched then insert values (
s.DAY_ID, s.PARTIAL_ID, s.ID, s.AGENT_USER_ID, s.COMPUTER_ID, s.RAW_APPLICATION_ID, s.APP_USER_ID, s.APPLICATION_ID, s.USER_ID, s.RAW_MODULE_ID, s.MODULE_ID, s.START_TIME, s.RAW_SCREEN_NAME, s.SCREEN_ID, s.SCREEN_TYPE, s.ACTIVE_TIME_SUM, s.IDLE_TIME_SUM)
The F_SCREEN_INSTANCE table has (DAY_ID, PARTIAL_ID) as a primary key and also is IOT (index organized table). This makes it an ideal candidate for a merge join: the rows are physically sorted by the lookup key.
So far so good. I've started a benchmark and the initial times looked good, 10s for one merge. But after about an hour, the merges were taking about 4 min with heavy tempdb usage (4GB per merge). The query plan below shows that F_SCREEN_INSTANCE is re-sorted before the merge, even though the table is ideally sorted already. And of course, as the table grows even more tempdb will be needed and the whole approach falls apart.
OK, so why re-sort the table? It turns out to be a limitation of the merge join implementation: the second table is always sorted.
If an index exists, then the database can avoid sorting the first data
set. However, the database always sorts the second data set,
regardless of indexes.
O...K, so then can I make the main table to be first and the buffer to be second? Nope, that's not possible either. No matter how I list the tables in the USE_MERGE hint, the source table is always first.
Finally, here is my question: Have I missed anything? Is it possible to make this SORT MERGE approach work?
Here are some more details addressing questions you might ask:
What Oracle version? 12c.
Have you tried HASH JOIN? Yes, it's bad, as expected. The main table needs to be scanned in order to build the hash table. It can't scale as F_SCREEN_INSTANCE grows.
Have you tried LOOP JOIN? Yes, it's also bad. Considering the size of the buffer table, 100K lookups into F_SCREEN_INSTANCE take unreasonably long. Merge times climbed to about 3 min very quickly.
All in all, the MERGE JOIN is conceptually the best access strategy, but the Oracle implementation seems to be severely crippled by re-sorting the target table.
Sort merge outer joins will always put the outer-joined table second regardless of the hints. Adding an extra inner-join allows control of the join order, and then ROWID can be used to join again to the large table. Hopefully two good joins will work better than one bad join.
Assumptions
This answer assumes that the sort merge join is the fastest join, and that the manual is correct that the second data set is always sorted. It would be difficult to test these assumptions without significantly more information about the data.
Sample Schema
Here are some similar tables, with fake statistics to make the optimizer think they have 500M rows and 100K rows.
create table F_SCREEN_INSTANCE(DAY_ID number, PARTIAL_ID number, ID number, AGENT_USER_ID number,COMPUTER_ID number, RAW_APPLICATION_ID number, APP_USER_ID number, APPLICATION_ID number, USER_ID number, RAW_MODULE_ID number,MODULE_ID number, START_TIME date, RAW_SCREEN_NAME varchar2(100), SCREEN_ID number, SCREEN_TYPE number, ACTIVE_TIME_SUM number, IDLE_TIME_SUM number,
constraint f_screen_instance_pk primary key (day_id, partial_id)
) organization index;
create table F_SCREEN_INSTANCE_BUF(DAY_ID number, PARTIAL_ID number, ID number, AGENT_USER_ID number,COMPUTER_ID number, RAW_APPLICATION_ID number, APP_USER_ID number,APPLICATION_ID number, USER_ID number, RAW_MODULE_ID number, MODULE_ID number, START_TIME date, RAW_SCREEN_NAME varchar2(100), SCREEN_ID number, SCREEN_TYPE number, ACTIVE_TIME_SUM number, IDLE_TIME_SUM number,
constraint f_screen_instance_buf_pk primary key (day_id, partial_id)
);
begin
dbms_stats.set_table_stats(user, 'F_SCREEN_INSTANCE', numrows => 500000000);
dbms_stats.set_table_stats(user, 'F_SCREEN_INSTANCE_BUF', numrows => 100000);
end;
/
The Problem
The desired join and join order can be achieved with the LEADING hint when an inner join is used. The smaller table, F_SCREEN_INSTANCE_BUF, is the second table.
explain plan for
select /*+ use_merge(t s) leading(t s) */ *
from f_screen_instance_buf s
join f_screen_instance t
on (s.DAY_ID = t.DAY_ID and s.PARTIAL_ID = t.PARTIAL_ID);
select * from table(dbms_xplan.display(format => '-predicate'));
Plan hash value: 563239985
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100K| 19M| | 6898 (66)| 00:00:01 |
| 1 | MERGE JOIN | | 100K| 19M| | 6898 (66)| 00:00:01 |
| 2 | INDEX FULL SCAN | F_SCREEN_INSTANCE_PK | 500M| 46G| | 4504 (100)| 00:00:01 |
| 3 | SORT JOIN | | 100K| 9765K| 26M| 2393 (1)| 00:00:01 |
| 4 | TABLE ACCESS FULL| F_SCREEN_INSTANCE_BUF | 100K| 9765K| | 34 (6)| 00:00:01 |
-----------------------------------------------------------------------------------------------------
The LEADING hint does not work when changing to a left join.
explain plan for
select /*+ use_merge(t s) leading(t s) */ *
from f_screen_instance_buf s
left join f_screen_instance t
on (s.DAY_ID = t.DAY_ID and s.PARTIAL_ID = t.PARTIAL_ID);
select * from table(dbms_xplan.display(format => '-predicate'));
Plan hash value: 1472690071
-----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100K| 19M| | 16M (1)| 00:10:34 |
| 1 | MERGE JOIN OUTER | | 100K| 19M| | 16M (1)| 00:10:34 |
| 2 | TABLE ACCESS BY INDEX ROWID| F_SCREEN_INSTANCE_BUF | 100K| 9765K| | 826 (0)| 00:00:01 |
| 3 | INDEX FULL SCAN | F_SCREEN_INSTANCE_BUF_PK | 100K| | | 26 (0)| 00:00:01 |
| 4 | SORT JOIN | | 500M| 46G| 131G| 16M (1)| 00:10:34 |
| 5 | INDEX FAST FULL SCAN | F_SCREEN_INSTANCE_PK | 500M| 46G| | 2703 (100)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------------
This limitation is not documented as far as I can tell. I tried using the +outline setting of DBMS_XPLAN to see the full set of hints and then changed them around. But nothing I did could make the join order change for the LEFT JOIN version. Perhaps someone else can get this to work.
select * from table(dbms_xplan.display(format => '-predicate +outline'));
...
Outline Data
-------------
/*+
BEGIN_OUTLINE_DATA
USE_MERGE(@"SEL$0E991E55" "T"@"SEL$1")
LEADING(@"SEL$0E991E55" "S"@"SEL$1" "T"@"SEL$1")
INDEX_FFS(@"SEL$0E991E55" "T"@"SEL$1" ("F_SCREEN_INSTANCE"."DAY_ID" "F_SCREEN_INSTANCE"."PARTIAL_ID"))
INDEX(@"SEL$0E991E55" "S"@"SEL$1" ("F_SCREEN_INSTANCE_BUF"."DAY_ID"
"F_SCREEN_INSTANCE_BUF"."PARTIAL_ID"))
OUTLINE(@"SEL$9EC647DD")
OUTLINE(@"SEL$2")
MERGE(@"SEL$9EC647DD")
OUTLINE_LEAF(@"SEL$0E991E55")
ALL_ROWS
DB_VERSION('12.1.0.1')
OPTIMIZER_FEATURES_ENABLE('12.1.0.1')
IGNORE_OPTIM_EMBEDDED_HINTS
END_OUTLINE_DATA
*/
Possible Solution
--#3: Join the large table to the smaller result set. This uses the largest table twice,
--but the plan can use the ROWID for a very quick join.
explain plan for
merge into F_SCREEN_INSTANCE t
using
(
--#2: Now get the missing rows with an outer join. Since the _BUF table is
--small I assume it does not make a big difference exactly how it is joined
--to the 100K result set.
--The hints NO_MERGE and NO_PUSH_PRED are required to keep the INNER_JOIN
--inline view intact.
select /*+ no_merge(inner_join) no_push_pred(inner_join) */ inner_join.*
from f_screen_instance_buf s
left join
(
--#1: Get 100K rows efficiently with an inner join.
--Note that the ROWID is retrieved here.
select /*+ use_merge(t s) leading(t s) */ s.*, s.rowid s_rowid
from f_screen_instance_buf s
join f_screen_instance t
on (s.DAY_ID = t.DAY_ID and s.PARTIAL_ID = t.PARTIAL_ID)
) inner_join
on (s.DAY_ID = inner_join.DAY_ID and s.PARTIAL_ID = inner_join.PARTIAL_ID)
) s
on (s.s_rowid = t.rowid)
when matched then update set
t.ACTIVE_TIME_SUM = t.ACTIVE_TIME_SUM + s.ACTIVE_TIME_SUM,
t.IDLE_TIME_SUM = t.IDLE_TIME_SUM + s.IDLE_TIME_SUM
when not matched then insert values (
s.DAY_ID, s.PARTIAL_ID, s.ID, s.AGENT_USER_ID, s.COMPUTER_ID, s.RAW_APPLICATION_ID, s.APP_USER_ID, s.APPLICATION_ID, s.USER_ID, s.RAW_MODULE_ID, s.MODULE_ID, s.START_TIME, s.RAW_SCREEN_NAME, s.SCREEN_ID, s.SCREEN_TYPE, s.ACTIVE_TIME_SUM, s.IDLE_TIME_SUM);
It ain't pretty, but at least it generates a plan with the large table first in the sort merge join.
select * from table(dbms_xplan.display);
Plan hash value: 1086560566
-------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------------
| 0 | MERGE STATEMENT | | 500G| 173T| | 5355K (43)| 00:03:30 |
| 1 | MERGE | F_SCREEN_INSTANCE | | | | | |
| 2 | VIEW | | | | | | |
|* 3 | HASH JOIN OUTER | | 500G| 179T| 29M| 5355K (43)| 00:03:30 |
|* 4 | HASH JOIN OUTER | | 100K| 28M| 3712K| 8663 (53)| 00:00:01 |
| 5 | INDEX FAST FULL SCAN| F_SCREEN_INSTANCE_BUF_PK | 100K| 2539K| | 9 (0)| 00:00:01 |
| 6 | VIEW | | 100K| 25M| | 6898 (66)| 00:00:01 |
| 7 | MERGE JOIN | | 100K| 12M| | 6898 (66)| 00:00:01 |
| 8 | INDEX FULL SCAN | F_SCREEN_INSTANCE_PK | 500M| 12G| | 4504 (100)| 00:00:01 |
|* 9 | SORT JOIN | | 100K| 9765K| 26M| 2393 (1)| 00:00:01 |
| 10 | TABLE ACCESS FULL| F_SCREEN_INSTANCE_BUF | 100K| 9765K| | 34 (6)| 00:00:01 |
| 11 | INDEX FAST FULL SCAN | F_SCREEN_INSTANCE_PK | 500M| 46G| | 2703 (100)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("INNER_JOIN"."S_ROWID"=("T".ROWID(+)))
4 - access("S"."PARTIAL_ID"="INNER_JOIN"."PARTIAL_ID"(+) AND
"S"."DAY_ID"="INNER_JOIN"."DAY_ID"(+))
9 - access("S"."DAY_ID"="T"."DAY_ID" AND "S"."PARTIAL_ID"="T"."PARTIAL_ID")
filter("S"."PARTIAL_ID"="T"."PARTIAL_ID" AND "S"."DAY_ID"="T"."DAY_ID")
Say we have two tables, TEST and TEST_CHILD, defined as follows:
CREATE TABLE TEST(id number PRIMARY KEY, word VARCHAR(50), numero number);
CREATE TABLE TEST_CHILD (id number references test(id), word2 VARCHAR(50));
CREATE INDEX TEST_IDX ON TEST_CHILD(word2);
CREATE INDEX TEST_JOIN_IDX ON TEST_CHILD(id);
insert into TEST SELECT ROWNUM,U1.USERNAME||U2.TABLE_NAME, LENGTH(U1.USERNAME) FROM ALL_USERS U1,ALL_TABLES U2;
INSERT INTO TEST_CHILD SELECT MOD(ROWNUM,15000)+1,U1.USER_ID||U2.TABLE_NAME FROM ALL_USERS U1,ALL_TABLES U2;
We would like to query to get rows from TEST table that satisfy some criteria in the child table, so we go for:
SELECT /*+FIRST_ROWS(10)*/* FROM TEST T WHERE EXISTS (SELECT NULL FROM TEST_CHILD TC WHERE word2 like 'string%' AND TC.id = T.id ) AND ROWNUM < 10;
We always want just the first 10 results, not any more at all. Therefore, we would like to get the same response time to read 10 results whether table has 10 matching values or 1,000,000; since it could get 10 distinct results from the child table and get the values on the parent table (or at least that is the plan that we would like). But when checking the actual execution plan we see:
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 54 | 5 (20)| 00:00:01 |
|* 1 | COUNT STOPKEY | | | | | |
| 2 | NESTED LOOPS | | | | | |
| 3 | NESTED LOOPS | | 1 | 54 | 5 (20)| 00:00:01 |
| 4 | SORT UNIQUE | | 1 | 23 | 3 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID| TEST_CHILD | 1 | 23 | 3 (0)| 00:00:01 |
|* 6 | INDEX RANGE SCAN | TEST_IDX | 1 | | 2 (0)| 00:00:01 |
|* 7 | INDEX UNIQUE SCAN | SYS_C005145 | 1 | | 0 (0)| 00:00:01 |
| 8 | TABLE ACCESS BY INDEX ROWID | TEST | 1 | 31 | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<10)
6 - access("WORD2" LIKE 'string%')
filter("WORD2" LIKE 'string%')
7 - access("TC"."ID"="T"."ID")
There is a SORT UNIQUE under the COUNT STOPKEY, which, as far as I know, means that it reads all matching rows from the child table and makes them distinct before finally selecting only the first 10. That makes the query less scalable than we would like.
Is there any mistake in my example?
Is it possible to improve this execution plan so it scales better?
The SORT UNIQUE is going to find and sort all of the records from TEST_CHILD that matched 'string%' - it is NOT going to read all results from child table. Your logic requires this. IF you only picked the first 10 rows from TEST_CHILD that matched 'string%', and those 10 rows all had the same ID, then your final results from TEST would only have 1 row.
Anyway, your performance should be fine as long as 'string%' matches a relatively low number of rows in TEST_CHILD. IF your situation is such that 'string%' often matches a HUGE record count on TEST_CHILD, there's not much you can do to make the SQL more performant given the current tables. In such a case, if this is a mission-critical SQL, with performance tied to your annual bonus, there's probably some fancy footwork you could do with MATERIALIZED VIEWs to, e.g. pre-compute 10 TEST rows for high-cardinality WORD2 values in TEST_CHILD.
One final thought - a "risky" solution, but one which should work if you don't have thousands of TEST_CHILD rows matching the same TEST row, would be the following:
SELECT *
FROM TEST
WHERE ID IN
(SELECT ID
FROM TEST_CHILD
WHERE word2 like 'string%'
AND ROWNUM < 1000)
AND ROWNUM <10;
You can adjust 1000 up or down, of course, but if it's too low, you risk finding fewer than 10 distinct ID values, which would give you final results with fewer than 10 rows.
I have the simple dynamic select query below:
Select RELATIONSHIP
from DIME_MASTER
WHERE CIN=? AND SSN=? AND ACCOUNT_NUMBER=?
The table has 1,083,701 records. This query takes 11 to 12 seconds to execute, which is expensive. The DIME_MASTER table has indexes on ACCOUNT and CARD_NUMBER. Please help me optimize this query so that it executes in a fraction of a second.
Look at the predicate information:
--------------------------------------
1 - filter(TO_NUMBER("DIME_MASTER"."SSN")=226550956
AND TO_NUMBER("DIME_MASTER"."ACCOUNT_NUMBER")=4425050005218650
AND TO_NUMBER("DIME_MASTER"."CIN")=00335093464)
The type of your columns is NVARCHAR, but the parameters in the query are NUMBERs.
Oracle has to convert one side of each comparison, and it is sometimes not very smart about it: as the predicates show, it applies TO_NUMBER to the columns instead of converting the parameters to strings.
Oracles and fortune-tellers are not always right ;)
These conversions prevent the query from using indexes.
Rewrite the query using explicit conversion into:
Select RELATIONSHIP
from DIME_MASTER
WHERE CIN=to_char(?) AND SSN=to_char(?) AND ACCOUNT_NUMBER=to_char(?)
then run this command:
exec dbms_stats.gather_table_stats( user, 'DIME_MASTER' );
and run the query and show us a new explain plan.
Please do not paste explain plans here, as they are unreadable;
use pastebin instead and paste only the links here. Thank you.
Look at this simple example, it shows why you need explicit casts:
CREATE TABLE "DIME_MASTER" (
"ACCOUNT_NUMBER" NVARCHAR2(16)
);
insert into dime_master
select round( dbms_random.value( 1, 100000 )) from dual
connect by level <= 100000;
commit;
create index dime_master_acc_ix on dime_master( account_number );
explain plan for select * from dime_master
where account_number = 123;
select * from table( dbms_xplan.display );
Plan hash value: 1551952897
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 54 | 70 (3)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| DIME_MASTER | 3 | 54 | 70 (3)| 00:00:01 |
---------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(TO_NUMBER("ACCOUNT_NUMBER")=123)
explain plan for select * from dime_master
where account_number = to_char( 123 );
select * from table( dbms_xplan.display );
Plan hash value: 3367829596
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 54 | 1 (0)| 00:00:01 |
|* 1 | INDEX RANGE SCAN| DIME_MASTER_ACC_IX | 3 | 54 | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("ACCOUNT_NUMBER"=U'123')
Depending on the cardinality of the columns (total rows / unique values), you can create bitmap indexes on each column. Bitmap indexes are very useful for AND / OR operations.
A rule of thumb says that a bitmap index is useful for a cardinality of more than 10%.
create bitmap index DIME_MASTER_CIN_BIX on DIME_MASTER (CIN);