I have SQL joining a big table (232 million records) with a GTT (global temporary table) by index. The explain plan looks like below:
4 NESTED LOOPS
( Estim. Costs = 439,300 , Estim. #Rows = 548,275 )
Estim. CPU-Costs = 3,642,574,678 Estim. IO-Costs = 438,956
1 INDEX FAST FULL SCAN ZTRM_REXP_PRESEL~0
( Estim. Costs = 336 , Estim. #Rows = 548,275 )
Estim. CPU-Costs = 3,432,714 Estim. IO-Costs = 336
3 TABLE ACCESS BY INDEX ROWID BATCHED TEXT_REXP_ITEM
( Estim. Costs = 1 , Estim. #Rows = 1 )
Estim. CPU-Costs = 6,637 Estim. IO-Costs = 1
Filter Predicates
2 INDEX RANGE SCAN TEXT_REXP_ITEM~Y01
( Estim. Costs = 1 , Estim. #Rows = 1 )
Search Columns: 3
Estim. CPU-Costs = 4,523 Estim. IO-Costs = 1
Access Predicates
It shows wrong estimations because of the GTT usage. The goal is to make the nested loop between the index (2) and the GTT (1) first, and only then access the table itself (3). For some reason, the hint USE_NL_WITH_INDEX("TEXT_REXP_ITEM" "TEXT_REXP_ITEM~Y01") is simply being ignored. Any ideas why?
(1) consists of
EXPOSURE_ID
VERSION
(2) consists of
Column Name #Distinct
MANDT 1
ZZHEAD_EXPOSURE_ID 251,454
ZZHEAD_VERSION 3,217
ZZHEAD_ATTRIBUTE_DH01 1,691
EXT_ITEM_ID 823
ZZHEAD_ATTRIBUTE_LH01 3
ZZHEAD_RELEASE_STATE 1
(1) and (2) are joined by exposure_id and version fields
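For reference, the join is roughly of this shape (illustrative only, not the exact statement; column names are taken from the lists above):

-- Sketch only: join of the GTT to the big table on the two key columns,
-- with the hint that appears to be ignored.
select /*+ USE_NL_WITH_INDEX("TEXT_REXP_ITEM" "TEXT_REXP_ITEM~Y01") */ *
from ztrm_rexp_presel
join text_rexp_item
on text_rexp_item.zzhead_exposure_id = ztrm_rexp_presel.exposure_id
and text_rexp_item.zzhead_version = ztrm_rexp_presel.version;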
Text explain plan:
| 3 | NESTED LOOPS | | 548K| 135M| 439K (1)| 00:00:18 |
| 4 | INDEX FAST FULL SCAN | ZTRM_REXP_PRESEL~0 | 548K| 16M| 336 (0)| 00:00:01 |
|* 5 | TABLE ACCESS BY INDEX ROWID BATCHED| TEXT_REXP_ITEM | 1 | 228 | 1 (0)| 00:00:01 |
|* 6 | INDEX RANGE SCAN | TEXT_REXP_ITEM~Y01 | 1 | | 1 (0)| 00:00:01 |
thank you
The optimizer is obeying the hint. As the docs say:
The USE_NL_WITH_INDEX hint instructs the optimizer to join the
specified table to another row source with a nested loops join using
the specified table as the inner table
In a nested loop, the outer table is the first one accessed. The inner table is the second.
So the plan uses ZTRM_REXP_PRESEL~0 as the outer table. And TEXT_REXP_ITEM as the inner table. Which is exactly what you've asked for!
Constructing a similar example and using Oracle Database 19c's hint reporting mechanism shows the hint is followed:
create table t1 (
c1 int
);
create table t2 (
c1 int, c2 varchar2(100)
);
create index i1
on t1 ( c1 );
create index i2
on t2 ( c1 );
insert into t1 values ( 1 );
insert into t2
with rws as (
select level x from dual
connect by level <= 1000
)
select x, rpad ( 'stuff', 100, 'f' )
from rws;
exec dbms_stats.gather_table_stats ( user, 't1' ) ;
exec dbms_stats.gather_table_stats ( user, 't2' ) ;
set serveroutput off
select /*+ USE_NL_WITH_INDEX ( T2 I2 ) */*
from t1
join t2
on t1.c1 = t2.c1;
select *
from table(dbms_xplan.display_cursor(null, null, 'BASIC LAST +HINT_REPORT'));
Plan hash value: 3271411982
---------------------------------------------
| Id | Operation | Name |
---------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | NESTED LOOPS | |
| 2 | NESTED LOOPS | |
| 3 | INDEX FULL SCAN | I1 |
| 4 | INDEX RANGE SCAN | I2 |
| 5 | TABLE ACCESS BY INDEX ROWID| T2 |
---------------------------------------------
Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1
---------------------------------------------------------------------------
4 - SEL$58A6D7F6 / T2#SEL$1
- USE_NL_WITH_INDEX ( T2 I2 )
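If you also want to control which row source drives the nested loop (not just which table is the inner one), you could combine it with a LEADING hint. A minimal sketch against the same demo tables; the hint report should then list both hints:

-- Sketch: LEADING pins T1 as the driving (outer) row source,
-- USE_NL_WITH_INDEX keeps T2 as the inner table probed via index I2.
select /*+ LEADING ( t1 t2 ) USE_NL_WITH_INDEX ( t2 i2 ) */ *
from t1
join t2
on t1.c1 = t2.c1;

select *
from table(dbms_xplan.display_cursor(null, null, 'BASIC LAST +HINT_REPORT'));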
I have a table with 2 columns
item_name varchar2
brand varchar2
both of them have bitmap index
let's say I create a view for a specific brand and rename the column item_name, something like this:
create view my_brand as
select item_name as item from table x where brand='x'
We cannot create an index on a normal view, but what is Oracle doing when issuing the underlying query of that view? Is the index on the item_name column used if we write select item from my_brand where item='item1'?
thanks
The answer will be “it depends”. The index access path is certainly an option open to the optimizer; but remember that the optimizer makes a cost based decision. So essentially it will evaluate the cost of all the available plans and choose the one with the lowest cost.
Here is an example:
create table tab1 ( item_name varchar2(15), brand varchar2(15) );
insert into tab1
select 'Name '||to_char( rownum), 'Brand '||to_char(mod(rownum,10))
from dual
connect by rownum < 1000000;
commit;
exec dbms_stats.gather_table_stats( user, 'TAB1' );
create bitmap index bm1 on tab1 ( item_name );
create bitmap index bm2 on tab1 ( brand );
create or replace view my_brand
as select item_name as item from tab1 where brand = 'Brand 1';
explain plan for
select item from my_brand where item = 'Name 1001';

select * from table( dbms_xplan.display );
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 20 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS BY INDEX ROWID BATCHED| TAB1 | 1 | 20 | 3 (0)| 00:00:01 |
| 2 | BITMAP CONVERSION TO ROWIDS | | | | | |
|* 3 | BITMAP INDEX SINGLE VALUE | BM1 | | | | |
--------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("BRAND"='Brand 1')
3 - access("ITEM_NAME"='Name 1001')
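To illustrate the "it depends" part: with a much less selective predicate against the same view, reading a large fraction of the table via rowids would cost more than a single full scan, so the optimizer will typically ignore the bitmap indexes. A sketch using the same objects created above (the exact plan depends on your data and statistics):

-- 'Brand 1' matches roughly 10% of the rows, so without the ITEM filter the
-- optimizer may well prefer a full table scan of TAB1 over the BM2 index.
explain plan for
select item from my_brand;

select * from table( dbms_xplan.display );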
We can create an index on a normal view
No, you can't
SQL> create table as_idx_view (item_name varchar2(10), brand varchar2(10));
Table created.
SQL> create view as_view as select item_name item from as_idx_view where brand = 'X';
View created.
SQL> create index as_idx_view_idx1 on as_view (item);
create index as_idx_view_idx1 on as_view (item)
*
ERROR at line 1:
ORA-01702: a view is not appropriate here
I have a query that produces results with rows that contain 0 values. I would like to exclude any rows in which column B or C = 0. To exclude such rows, I have added T2.A <> 0 (and also tried T2.A != 0). When I do this, the 0 values are replaced with NULLs, so I also added T2.A IS NOT NULL.
My results still contain rows that I do not need, which show (null), and I would like to exclude these.
SELECT
(SELECT
SUM(T2.A) as prem
FROM Table_2 T2, Table_2 T1
WHERE T2.ENT_REF = T1.ENT_REF
AND UPPER(T2.PER) = 'HURR'
AND UPPER(T2.ENT_TYPE) = 'POL'
AND T2.Cov NOT IN ('OUTPROP','COV')
AND T2.A <> 0
AND T2.A IS NOT NULL
) as B,
(SELECT
SUM(T2.A) as prem
FROM Table_2 T2, Table_2 T1
WHERE T2.ENT_REF = T1.ENT_REF
AND UPPER(T2.PER) IN ('I', 'II', 'II')
AND UPPER(T2.ENT_TYPE) = 'POL'
AND T2.Cov NOT IN ('OUTPROP','COV')
AND T2.A <> 0
AND T2.A IS NOT NULL
) as C
Ideally the result will go from:
+----+--------+--------+
| ID | B | C |
+----+--------+--------+
| 1 | 24 | 123 |
| 2 | 65 | 78 |
| 3 | 43 | 89 |
| 3 | 0 | 0 |
| 4 | 95 | 86 |
| 5 | 43 | 65 |
| 5 | (null) | (null) |
+----+--------+--------+
To something similar to the following:
+----+-----+-----+
| ID | B | C |
+----+-----+-----+
| 1 | 24 | 123 |
| 2 | 65 | 78 |
| 3 | 43 | 89 |
| 4 | 95 | 86 |
| 5 | 43 | 65 |
+----+-----+-----+
I have also attempted SELECT DISTINCT, but I have other columns, such as dates, which are different per row. Although I need to include the dates, they are not as important to me as getting only rows where the B and C columns have values > 0. I have also tried using a GROUP BY ID statement, but I get an error that states 'ORA-00979: not a GROUP BY expression'.
You have written all the conditions in the SELECT clause.
You are facing the issue because the WHERE clause decides the number of rows to be fetched, while the SELECT clause decides the values to be returned.
In your case, something like the following is happening:
Simple Example:
-- MANUAL DATA
WITH DATAA AS (
SELECT
1 KEY,
'VALS' VALUE,
1 SEQNUM
FROM
DUAL
UNION ALL
SELECT
2,
'IDEAL OPTION',
2
FROM
DUAL
UNION ALL
SELECT
10,
'EXCLUDE',
3
FROM
DUAL
)
-- QUERY OF YOUR TYPE
SELECT
(
SELECT
KEY
FROM
DATAA I
WHERE
I.KEY = 1
AND O.KEY = I.KEY
) AS KEY, -- DECIDE VALUES TO BE SHOWN
(
SELECT
KEY
FROM
DATAA I
WHERE
I.SEQNUM = 1
AND O.SEQNUM = I.SEQNUM
) AS SEQNUM -- DECIDE VALUES TO BE SHOWN
FROM
DATAA O
WHERE
O.KEY <= 2; -- DECIDES THE NUMBER OF RECORDS
OUTPUT (the outer WHERE returns two rows, but the scalar subqueries only find values for the first one, so the second row shows nulls):
+--------+--------+
| KEY    | SEQNUM |
+--------+--------+
| 1      | 1      |
| (null) | (null) |
+--------+--------+
If you don't want to change much logic in your query, then just add an extra WHERE clause outside your final query, like:
SELECT <bla bla bla>
FROM <YOUR FINAL QUERY>
WHERE B IS NOT NULL AND C IS NOT NULL
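Applied to the toy DATAA example above, that outer filter drops the row whose scalar subqueries produced only nulls (a sketch; wrap your real query the same way):

-- Reusing the DATAA query from above as an inline view and filtering out
-- the rows whose scalar subqueries returned only NULLs.
WITH DATAA AS (
    SELECT 1 KEY, 'VALS' VALUE, 1 SEQNUM FROM DUAL UNION ALL
    SELECT 2, 'IDEAL OPTION', 2 FROM DUAL UNION ALL
    SELECT 10, 'EXCLUDE', 3 FROM DUAL
)
SELECT *
FROM (
    SELECT
        ( SELECT KEY FROM DATAA I WHERE I.KEY = 1 AND O.KEY = I.KEY ) AS KEY,
        ( SELECT KEY FROM DATAA I WHERE I.SEQNUM = 1 AND O.SEQNUM = I.SEQNUM ) AS SEQNUM
    FROM DATAA O
    WHERE O.KEY <= 2
)
WHERE KEY IS NOT NULL AND SEQNUM IS NOT NULL;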
Cheers!!
I guess you were on the right track, trying to group values.
In order to do that, the columns that are supposed to stay distinct are left alone (such as ID in the following example), while the rest should be aggregated (using min, max or any other function you find appropriate).
For example, you said there's a date column you don't much care about - in the sense of which of its values you'll select - so take the first one (i.e. min(date_column)). Do the same with the rest. The GROUP BY clause should contain all non-aggregated columns (id in this example).
select id,
sum(b) b,
sum(c) c,
min(date_column) date_column
from your_current_query
group by id
If I understand your query right, it would be much easier and more performant to avoid the lookups in the SELECT clause. Try to bring it all into one query:
SELECT * FROM (
SELECT T2.ENT_REF AS ID,
SUM(CASE WHEN UPPER(T2.PER) = 'HURR' THEN T2.A END) AS B,
SUM(CASE WHEN UPPER(T2.PER) IN ('I', 'II', 'II') THEN T2.A END) as C
FROM Table_2 T2
WHERE UPPER(T2.ENT_TYPE) = 'POL'
AND T2.Cov NOT IN ('OUTPROP','COV')
GROUP BY T2.ENT_REF
)
WHERE B IS NOT NULL
OR C IS NOT NULL
We are optimizing performance and want to create a materialized view (several million records) built on joins of a handful of tables. This view will be used to show users documents in folders with a delay of no more than a few (3-5) seconds.
I suppose it must be an out-of-place MV with a refresh interval of a few seconds.
Is that an acceptable solution from the database point of view?
The view will be something like this:
SELECT *
FROM documents this_
LEFT OUTER JOIN account_statements this_1_
ON this_.Id = this_1_.FK_Document
LEFT OUTER JOIN contracts this_2_ ON this_.Id = this_2_.FK_Document
LEFT OUTER JOIN pension_agreements this_3_
ON this_.Id = this_3_.FK_Contract
LEFT OUTER JOIN dead this_4_ ON this_.Id = this_4_.FK_Document
LEFT OUTER JOIN pay_orders this_5_ ON this_.Id = this_5_.FK_Document
LEFT OUTER JOIN pay_registers this_6_
ON this_.Id = this_6_.FK_Document
LEFT OUTER JOIN pocards this_7_ ON this_.Id = this_7_.FK_Document
LEFT OUTER JOIN ransom_agreements this_8_
ON this_.Id = this_8_.FK_Document
LEFT OUTER JOIN successor_statements this_9_
ON this_.Id = this_9_.FK_Document
INNER JOIN document_treenodes treenodes14_
ON this_.Id = treenodes14_.fk_document
INNER JOIN treenodes treenode2_
ON treenodes14_.fk_treenode = treenode2_.Id
LEFT OUTER JOIN registration_cards regcard1_
ON this_.fk_registration_card = regcard1_.Id
LEFT OUTER JOIN employees todirectem12_
ON regcard1_.to_direct = todirectem12_.Id
LEFT OUTER JOIN REG_CARD_STATUSES regcardsta11_
ON regcard1_.status = regcardsta11_.Id
LEFT OUTER JOIN filestorages filestorag10_
ON this_.fk_file = filestorag10_.Id
LEFT OUTER JOIN actions holdaction4_
ON this_.fk_hold = holdaction4_.Id
LEFT OUTER JOIN employees holdemploy5_
ON holdaction4_.fk_operator = holdemploy5_.Id
LEFT OUTER JOIN actions doneaction6_
ON this_.fk_done = doneaction6_.Id
LEFT OUTER JOIN employees doneemploy7_
ON doneaction6_.fk_operator = doneemploy7_.Id
LEFT OUTER JOIN actions signaction8_
ON this_.fk_signed = signaction8_.Id
LEFT OUTER JOIN employees signemploy9_
ON signaction8_.fk_operator = signemploy9_.Id
LEFT OUTER JOIN actions scanaction3_
ON this_.fk_scan = scanaction3_.Id
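The refresh setup I have in mind is roughly the following (a sketch only; MV_DOCUMENTS is a placeholder name, its defining query would be the join above, and out_of_place requires 12c or later):

-- Sketch: schedule a complete, out-of-place refresh of a hypothetical
-- MV_DOCUMENTS materialized view every 5 seconds.
begin
  dbms_scheduler.create_job(
    job_name        => 'REFRESH_MV_DOCUMENTS',
    job_type        => 'PLSQL_BLOCK',
    job_action      => q'[begin dbms_mview.refresh(list => 'MV_DOCUMENTS', method => 'C', out_of_place => true); end;]',
    repeat_interval => 'FREQ=SECONDLY;INTERVAL=5',
    enabled         => true
  );
end;
/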
UPDATE
The bottleneck is in the following:
SELECT *
FROM documents this_
INNER JOIN document_treenodes treenodes14_ ON this_.Id = treenodes14_.fk_document
INNER JOIN treenodes treenode2_ ON treenodes14_.fk_treenode = treenode2_.Id
LEFT OUTER JOIN registration_cards regcard1_ ON this_.fk_registration_card = regcard1_.Id
WHERE (
regcard1_.status IS NULL OR
(
NOT (
regcard1_.status = 3 /* :p0 */)
AND
NOT (
regcard1_.status = 4 /* :p1 */)
)
)
AND
this_.fk_deleted IS NULL AND
(
this_.isdelete IS NULL OR
this_.isdelete = 0 /* :p2 */)
AND
treenode2_.Id = 1235 /* :p3 */ AND
this_.fk_done IS NULL AND
(
regcard1_.status IS NULL OR
NOT (
regcard1_.status = 1 /* :p4 */)
)
ORDER BY this_.Id DESC
OFFSET 0 ROWS
FETCH FIRST 50 /* :p5 */ ROWS ONLY
treenode2_.Id = 1235 /* :p3 */ AND
this_.fk_done IS NULL AND
(
regcard1_.status IS NULL OR
NOT (
regcard1_.status = 1 /* :p4 */)
)
ORDER BY this_.Id DESC
OFFSET 0 ROWS
FETCH FIRST 50 /* :p5 */ ROWS ONLY
And the plan is:
Plan hash value: 3579815467
-----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 105K| 85M| | 50518 (1)| 00:00:04 |
|* 1 | VIEW | | 105K| 85M| | 50518 (1)| 00:00:04 |
|* 2 | WINDOW SORT PUSHED RANK | | 105K| 13M| 14M| 50518 (1)| 00:00:04 |
|* 3 | HASH JOIN RIGHT OUTER | | 105K| 13M| | 48503 (1)| 00:00:04 |
| 4 | INDEX FULL SCAN | REG_CARD_STATUSES_PK | 4 | 12 | | 1 (0)| 00:00:01 |
|* 5 | FILTER | | | | | | |
|* 6 | HASH JOIN RIGHT OUTER | | 105K| 13M| 4048K| 48502 (1)| 00:00:04 |
| 7 | TABLE ACCESS FULL | REGISTRATION_CARDS | 84317 | 3046K| | 171 (2)| 00:00:01 |
|* 8 | HASH JOIN | | 183K| 17M| 3936K| 47339 (1)| 00:00:04 |
|* 9 | INDEX FAST FULL SCAN| DOCUMENT_TREENODE_PK | 183K| 1788K| | 1872 (2)| 00:00:01 |
|* 10 | TABLE ACCESS FULL | DOCUMENTS | 5064K| 425M| | 24635 (2)| 00:00:02 |
-----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("from$_subquery$_010"."rowlimit_$$_rownumber"<=0+50 AND
"from$_subquery$_010"."rowlimit_$$_rownumber">0)
2 - filter(ROW_NUMBER() OVER ( ORDER BY INTERNAL_FUNCTION("THIS_"."ID") DESC )<=0+50)
3 - access("REGCARD1_"."STATUS"="REGCARDSTA11_"."ID"(+))
5 - filter(("REGCARD1_"."STATUS" IS NULL OR "REGCARD1_"."STATUS"<>3 AND
"REGCARD1_"."STATUS"<>4) AND ("REGCARD1_"."STATUS" IS NULL OR "REGCARD1_"."STATUS"<>1))
6 - access("THIS_"."FK_REGISTRATION_CARD"="REGCARD1_"."ID"(+))
8 - access("THIS_"."ID"="TREENODES14_"."FK_DOCUMENT")
9 - filter("TREENODES14_"."FK_TREENODE"=1235)
10 - filter("THIS_"."FK_DONE" IS NULL AND ("THIS_"."ISDELETE"=0 OR "THIS_"."ISDELETE" IS NULL)
AND "THIS_"."FK_DELETED" IS NULL)
I took the liberty of 'reformatting' your query (BTW: I think there is a bit of a copy-paste error in there, some parts look doubled near the end)
The 'layout' will have zero effect on the actual execution times, but it makes it easier for me to understand what you're doing (simply because I'm used to my own style, I'm not claiming it's better, it's simply what I am used to)
Anyway, if I understood correctly and didn't mess up the brackets, then this should be equivalent to your query:
SELECT *
FROM documents this_
INNER JOIN document_treenodes treenodes14_
ON treenodes14_.fk_document = this_.Id
INNER JOIN treenodes treenode2_
ON treenode2_.Id = treenodes14_.fk_treenode
AND treenode2_.Id = 1235 /* :p3 */
LEFT OUTER JOIN registration_cards regcard1_
ON regcard1_.Id = this_.fk_registration_card
WHERE this_.fk_deleted IS NULL
AND this_.fk_done IS NULL
AND (
this_.isdelete IS NULL OR this_.isdelete = 0 /* :p2 */
)
AND (
regcard1_.status IS NULL OR regcard1_.status NOT IN (3 /* :p0 */, 4 /* :p1 */, 1 /* :p4 */)
)
ORDER BY this_.Id DESC
OFFSET 0 ROWS
FETCH FIRST 50 /* :p5 */ ROWS ONLY
I doubt this change will make much difference, but it might change the way the system approaches steps 8 & 9 in the current plan. Worth a try =)
Anyway, what I 'learned' from the query is that you seem to want all records that have no matching [registration_cards] record, but if there are, then they should not have status 3, 4 or 1 (:p0, :p1, :p4 respectively).
=> wouldn't this be equivalent to saying that you want all [documents] records for which there is no matching [registration_cards] record that has status 3, 4 or 1?
SELECT *
FROM documents this_
INNER JOIN document_treenodes treenodes14_
ON treenodes14_.fk_document = this_.Id
INNER JOIN treenodes treenode2_
ON treenode2_.Id = treenodes14_.fk_treenode
AND treenode2_.Id = 1235 /* :p3 */
WHERE this_.fk_deleted IS NULL
AND this_.fk_done IS NULL
AND (
this_.isdelete IS NULL OR this_.isdelete = 0 /* :p2 */
)
AND NOT EXISTS ( SELECT *
FROM registration_cards regcard1_
WHERE regcard1_.Id = this_.fk_registration_card
AND regcard1_.status IN (3 /* :p0 */, 4 /* :p1 */, 1 /* :p4 */) )
ORDER BY this_.Id DESC
OFFSET 0 ROWS
FETCH FIRST 50 /* :p5 */ ROWS ONLY
Assuming [registration_cards].Id is the PK of the table, or there is a covering index on the status and Id fields, this might be slightly faster. Like I said before, I'm under the impression most of the time is lost in sorting the result-set, but then again I might be totally misinterpreting the explain plan. Googling around actually seems to inform me that the explain plan is but guesswork and not 'the real deal'... sigh, at times I really pity you poor Oracle users =P
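For reference, the covering index mentioned above would be something along these lines (hypothetical name; check which indexes already exist on registration_cards first):

-- Hypothetical index so the NOT EXISTS probe on (Id, status) can be
-- answered from the index alone, without visiting the table.
create index regcard_id_status_ix
  on registration_cards ( id, status );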
Assume I have 2 tables - TABLE-1 & TABLE-2 - and each table has 1 million rows with 10 columns and an index on col1.
Now I build an internal table on these 2 tables (1 + 1 = 2 million rows):
select * from
(select col1, col2,....col10 from table-1
union all
select col1, col2,....col10 from table-2) x
Questions:
How will the internal table be treated in Oracle, since it's an internal table?
1. Will the internal table be treated as a table with an index on col1?
2. Will this be captured in the explain plan?
Yes and yes.
Oracle will effectively treat this inline view as a table. It can use predicate pushing to apply a filter on the inline view to the base tables, and potentially use an index. The explain plan will show this.
Tables, indexes, sample data, and statistics
create table table1(col1 number, col2 number, col3 number, col4 number);
create table table2(col1 number, col2 number, col3 number, col4 number);
create index table1_idx on table1(col1);
create index table2_idx on table2(col1);
insert into table1 select level, level, level, level
from dual connect by level <= 100000;
insert into table2 select level, level, level, level
from dual connect by level <= 100000;
commit;
begin
dbms_stats.gather_table_stats(user, 'TABLE1');
dbms_stats.gather_table_stats(user, 'TABLE2');
end;
/
Explain plan showing predicate pushing and index access
explain plan for
select * from
(
select col1, col2, col3, col4 from table1
union all
select col1, col2, col3, col4 from table2
)
where col1 = 1;
select * from table(dbms_xplan.display);
Plan hash value: 400235428
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 40 | 2 (0)| 00:00:01 |
| 1 | VIEW | | 2 | 40 | 2 (0)| 00:00:01 |
| 2 | UNION-ALL | | | | | |
| 3 | TABLE ACCESS BY INDEX ROWID BATCHED| TABLE1 | 1 | 20 | 2 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | TABLE1_IDX | 1 | | 1 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID BATCHED| TABLE2 | 1 | 20 | 2 (0)| 00:00:01 |
|* 6 | INDEX RANGE SCAN | TABLE2_IDX | 1 | | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("COL1"=1)
6 - access("COL1"=1)
Notice how the predicates are applied before the VIEW (at the index range scans), and both indexes are used. By default everything should work as well as can be expected.
Notes
This type of query structure is called an inline view. Although a physical table is not built, the phrase "internal tables" is a good way of thinking about how the query logically works. Ideally, an inline view would work exactly like a pre-built table with the same data. In reality there are some cases where things don't quite work that way. But in general you are definitely on the right path - build a large query by assembling small inline views, and assume that Oracle will optimize it correctly.
For your particular query no index will be used, but I suppose you'll do some filtering, i.e. where x.col1 = ###. I'm not sure that Oracle will be able to use the table-1/table-2 indexes for that filter, so I suggest you put the where conditions inside the "union query".
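For what it's worth, that suggestion looks like this against the demo tables from the answer above (although, as that answer's plan shows, the optimizer can usually push the predicate into the branches for you):

-- Hypothetical rewrite with the filter repeated inside each UNION ALL branch.
select *
from (
  select col1, col2, col3, col4 from table1 where col1 = 1
  union all
  select col1, col2, col3, col4 from table2 where col1 = 1
) x;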
I have the following use case:
A table stores the changed as well as the original data from a person. My query is designed to get only one row for each person: The changed data if there is some, else the original data.
I populated the table with 100k rows of data and 2k of changed data. When using a primary key on my table, the query runs in less than half a second. If I put an index on the table instead of a primary key, the query runs really slowly. So I'll use the primary key, no doubt about that.
My question is: Why is the PK approach so much faster than the one with an index?
Code here:
drop table up_data cascade constraints purge;
create table up_data(
pk integer,
hp_nr integer,
up_nr integer,
ps_flag varchar2(1),
ps_name varchar2(100)
-- comment this out and uncomment the index below.
, constraint pk_up_data primary key (pk,up_nr)
);
-- insert some data
insert into up_data
select rownum, 1, 0, 'A', 'tester_' || to_char(rownum)
from dual
connect by rownum < 100000;
-- insert some changed data
-- change ps_flag = 'B' and mark it with a change number in up_nr
insert into up_data
select rownum, 1, 1, 'B', 'tester_' || to_char(rownum)
from dual
connect by rownum < 2000;
-- alternative(?) to the primary key
-- CREATE INDEX idx_up_data ON up_data(pk, up_nr);
The select statement looks like this:
select count(*)
from
(
select *
from up_data u1
where up_nr = 1
or (up_nr = 0
and pk not in (select pk from up_data where up_nr = 1)
)
) u
The statement might be a target for optimization, but for the moment it will stay like this.
When you create a primary key constraint, Oracle also creates an index to support it at the same time. A primary key index has a couple of important differences from a basic index, namely:
All the values in it are guaranteed to be unique
There are no nulls in the indexed columns (the columns forming the PK)
These reasons are the key to the performance differences you see. Using your setup, I get the following query plans:
--fast version with PK
explain plan for
select count(*)
from
(
select *
from up_data u1
where up_nr = 1
or (up_nr = 0
and pk not in (select pk from up_data where up_nr = 1)
)
) u
/
select * from table(dbms_xplan.display(NULL, NULL,'BASIC +ROWS'));
-----------------------------------------------------
| Id | Operation | Name | Rows |
-----------------------------------------------------
| 0 | SELECT STATEMENT | | 1 |
| 1 | SORT AGGREGATE | | 1 |
| 2 | FILTER | | |
| 3 | INDEX FAST FULL SCAN| PK_UP_DATA | 103K|
| 4 | INDEX UNIQUE SCAN | PK_UP_DATA | 1 |
-----------------------------------------------------
alter table up_data drop constraint pk_up_data;
CREATE INDEX idx_up_data ON up_data(pk, up_nr);
--slow version with normal index
explain plan for
select count(*)
from
(
select *
from up_data u1
where up_nr = 1
or (up_nr = 0
and pk not in (select pk from up_data where up_nr = 1)
)
) u
/
select * from table(dbms_xplan.display(NULL, NULL,'BASIC +ROWS'));
------------------------------------------------------
| Id | Operation | Name | Rows |
------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 |
| 1 | SORT AGGREGATE | | 1 |
| 2 | FILTER | | |
| 3 | INDEX FAST FULL SCAN| IDX_UP_DATA | 103K|
| 4 | INDEX FAST FULL SCAN| IDX_UP_DATA | 1870 |
------------------------------------------------------
The big difference is that the fast version employs an INDEX UNIQUE SCAN, rather than an INDEX FAST FULL SCAN, in the second access of the table data.
From the Oracle docs (emphasis mine):
In contrast to an index range scan, an index unique scan must have
either 0 or 1 rowid associated with an index key. The database
performs a unique scan when a predicate references all of the columns
in a UNIQUE index key using an equality operator. An index unique scan
stops processing as soon as it finds the first record because no
second record is possible.
This optimization to stop processing proves to be a significant factor in this example. The fast version of your query:
Full scans ~103,000 index entries
For each one of these, finds at most one matching row in the PK index and stops processing that second index any further
The slow version:
Full scans ~103,000 index entries
For each one of these, performs another fast full scan of the index (up to 103,000 entries) to see whether there are any matches.
So to compare the work done:
With the PK, we have one fast full scan, then 103,000 lookups of one index value
With the normal index, we have one fast full scan, then 103,000 scans of 103,000 index entries (roughly 103,000 x 103,000, i.e. on the order of 10 billion entries examined in the worst case) - several orders of magnitude more work!
In this example, both the uniqueness of the primary key and the not null-ness of the index values are necessary to get the performance benefit:
-- create index as unique - we still get two fast full scans
drop index idx_up_data;
create unique index idx_up_data ON up_data(pk, up_nr);
explain plan for
select count(*)
from
(
select *
from up_data u1
where up_nr = 1
or (up_nr = 0
and pk not in (select pk from up_data where up_nr = 1)
)
) u
/
select * from table(dbms_xplan.display(NULL, NULL,'BASIC +ROWS'));
------------------------------------------------------
| Id | Operation | Name | Rows |
------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 |
| 1 | SORT AGGREGATE | | 1 |
| 2 | FILTER | | |
| 3 | INDEX FAST FULL SCAN| IDX_UP_DATA | 103K|
| 4 | INDEX FAST FULL SCAN| IDX_UP_DATA | 1870 |
------------------------------------------------------
-- now the columns are not null, we see the index unique scan
alter table up_data modify (pk not null, up_nr not null);
explain plan for
select count(*)
from
(
select *
from up_data u1
where up_nr = 1
or (up_nr = 0
and pk not in (select pk from up_data where up_nr = 1)
)
) u
/
select * from table(dbms_xplan.display(NULL, NULL,'BASIC +ROWS'));
------------------------------------------------------
| Id | Operation | Name | Rows |
------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 |
| 1 | SORT AGGREGATE | | 1 |
| 2 | FILTER | | |
| 3 | INDEX FAST FULL SCAN| IDX_UP_DATA | 103K|
| 4 | INDEX UNIQUE SCAN | IDX_UP_DATA | 1 |
------------------------------------------------------
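If you want to see the actual work done rather than the optimizer's estimates, row-source execution statistics make the difference very visible. A sketch - run the query once with the PK in place and once with the plain index, then compare the Starts and Buffers columns of the two plans:

-- Re-run the query with run-time statistics collection enabled...
select /*+ GATHER_PLAN_STATISTICS */ count(*)
from
(
  select *
  from up_data u1
  where up_nr = 1
  or (up_nr = 0
      and pk not in (select pk from up_data where up_nr = 1)
     )
) u;

-- ...then display the plan of the last statement with actual rows/buffers per step.
select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));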