Oracle query execution plan

I am a little confused about the execution plan of an Oracle query. This is Oracle Enterprise Edition 11.2.0.1.0 on IBM AIX 6.1. I have a table TEST1 (1 million rows) and another table TEST2 (50,000 rows). Both tables have identical columns, and there is a view created as the union of the two tables. I am firing a query against this view, with an indexed column in the WHERE clause. What I found is that the index is not used and a full table scan is performed instead. With a slight modification of the query, it starts using the index. I am wondering how this particular change can cause the plan to change.
Please find the complete DDL + DML below. I have given a simplified example; the actual schema and requirements are a bit more complex. In fact, the query in question is dynamically constructed and executed by an OCI code generator. My intention here is not to get alternatives, but to really understand the logical reasoning behind the plan change (by the way, I am an application programmer, not a database administrator). Your help is much appreciated.
DROP TABLE TEST1 CASCADE CONSTRAINTS ;
DROP TABLE TEST2 CASCADE CONSTRAINTS ;
CREATE TABLE TEST1
(
ID NUMBER(20) NOT NULL,
NAME VARCHAR2(40),
DAY NUMBER(20)
)
PARTITION BY RANGE (DAY)
(
PARTITION P001 VALUES LESS THAN (2),
PARTITION P002 VALUES LESS THAN (3),
PARTITION P003 VALUES LESS THAN (4),
PARTITION P004 VALUES LESS THAN (5),
PARTITION P005 VALUES LESS THAN (6),
PARTITION P006 VALUES LESS THAN (7),
PARTITION P007 VALUES LESS THAN (8),
PARTITION P008 VALUES LESS THAN (9),
PARTITION P009 VALUES LESS THAN (10),
PARTITION P010 VALUES LESS THAN (11),
PARTITION P011 VALUES LESS THAN (12),
PARTITION P012 VALUES LESS THAN (13),
PARTITION P013 VALUES LESS THAN (14),
PARTITION P014 VALUES LESS THAN (15),
PARTITION P015 VALUES LESS THAN (16),
PARTITION P016 VALUES LESS THAN (17),
PARTITION P017 VALUES LESS THAN (18),
PARTITION P018 VALUES LESS THAN (19),
PARTITION P019 VALUES LESS THAN (20),
PARTITION P020 VALUES LESS THAN (21),
PARTITION P021 VALUES LESS THAN (22),
PARTITION P022 VALUES LESS THAN (23),
PARTITION P023 VALUES LESS THAN (24),
PARTITION P024 VALUES LESS THAN (25),
PARTITION P025 VALUES LESS THAN (26),
PARTITION P026 VALUES LESS THAN (27),
PARTITION P027 VALUES LESS THAN (28),
PARTITION P028 VALUES LESS THAN (29),
PARTITION P029 VALUES LESS THAN (30),
PARTITION P030 VALUES LESS THAN (31)
) ;
CREATE INDEX IX_ID on TEST1 (ID) INITRANS 4 STORAGE(FREELISTS 16) LOCAL
(
PARTITION P001,
PARTITION P002,
PARTITION P003,
PARTITION P004,
PARTITION P005,
PARTITION P006,
PARTITION P007,
PARTITION P008,
PARTITION P009,
PARTITION P010,
PARTITION P011,
PARTITION P012,
PARTITION P013,
PARTITION P014,
PARTITION P015,
PARTITION P016,
PARTITION P017,
PARTITION P018,
PARTITION P019,
PARTITION P020,
PARTITION P021,
PARTITION P022,
PARTITION P023,
PARTITION P024,
PARTITION P025,
PARTITION P026,
PARTITION P027,
PARTITION P028,
PARTITION P029,
PARTITION P030
) ;
CREATE TABLE TEST2
(
ID NUMBER(20) PRIMARY KEY NOT NULL,
NAME VARCHAR2(40),
DAY NUMBER(20)
) ;
CREATE OR REPLACE VIEW TEST_V AS
SELECT
ID, NAME, DAY
FROM
TEST1
UNION
SELECT
ID, NAME, DAY
FROM
TEST2 ;
begin
for count in 1..1000000
loop
insert into test1 values(count, 'John', mod(count, 30) + 1) ;
end loop ;
end ;
/
begin
for count in 1000000..1050000
loop
insert into test2 values(count, 'Mary', mod(count, 30) + 1) ;
end loop ;
end ;
/
commit ;
set lines 300 ;
set pages 1000 ;
-- Actual query
explain plan for
SELECT Key FROM
(
WITH recs AS
(
SELECT * FROM TEST_V WHERE ID = 70000
)
(
SELECT 1 AS Key FROM recs WHERE NAME = 'John'
)
UNION
(
SELECT 2 AS Key FROM recs WHERE NAME = 'Mary'
)
) ;
select * from table(dbms_xplan.display()) ;
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | TempSpc | Cost (%CPU) | Time | Pstart | Pstop |
------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1611K | 4721K | | 13559 (1) | 00:02:43 | | |
| 1 | VIEW | | 1611K | 4721K | | 13559 (1) | 00:02:43 | | |
| 2 | TEMP TABLE TRANSFORMATION | | | | | | | | |
| 3 | LOAD AS SELECT | SYS_TEMP_0FD9D6610_34D3B6C | | | | | | | |
|* 4 | VIEW | TEST_V | 805K | 36M | | 10403 (1) | 00:02:05 | | |
| 5 | SORT UNIQUE | | 805K | 36M | 46M | 10403 (8) | 00:02:05 | | |
| 6 | UNION-ALL | | | | | | | | |
| 7 | PARTITION RANGE ALL | | 752K | 34M | | 721 (1) | 00:00:09 | 1 | 30 |
| 8 | TABLE ACCESS FULL | TEST1 | 752K | 34M | | 721 (1) | 00:00:09 | 1 | 30 |
| 9 | TABLE ACCESS FULL | TEST2 | 53262 | 2496K | | 68 (0) | 00:00:01 | | |
| 10 | SORT UNIQUE | | 1611K | 33M | 43M | 13559 (51) | 00:02:43 | | |
| 11 | UNION-ALL | | | | | | | | |
|* 12 | VIEW | | 805K | 16M | | 1429 (1) | 00:00:18 | | |
| 13 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6610_34D3B6C | 805K | 36M | | 1429 (1) | 00:00:18 | | |
|* 14 | VIEW | | 805K | 16M | | 1429 (1) | 00:00:18 | | |
| 15 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6610_34D3B6C | 805K | 36M | | 1429 (1) | 00:00:18 | | |
------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter("ID"=70000)
12 - filter("NAME"='John')
14 - filter("NAME"='Mary')
-- Modified query (only change is absence of outermost SELECT)
explain plan for
WITH recs AS
(
SELECT * FROM TEST_V WHERE ID = 70000
)
(
SELECT 1 AS Key FROM recs WHERE NAME = 'John'
)
UNION
(
SELECT 2 AS Key FROM recs WHERE NAME = 'Mary'
) ;
select * from table(dbms_xplan.display()) ;
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | Pstart | Pstop |
-----------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4 | 88 | 6 (67) | 00:00:01 | | |
| 1 | TEMP TABLE TRANSFORMATION | | | | | | | |
| 2 | LOAD AS SELECT | SYS_TEMP_0FD9D6611_34D3B6C | | | | | | |
| 3 | VIEW | TEST_V | 2 | 96 | 4 (50) | 00:00:01 | | |
| 4 | SORT UNIQUE | | 2 | 96 | 4 (75) | 00:00:01 | | |
| 5 | UNION-ALL | | | | | | | |
| 6 | PARTITION RANGE ALL | | 1 | 48 | 1 (0) | 00:00:01 | 1 | 30 |
| 7 | TABLE ACCESS BY LOCAL INDEX ROWID | TEST1 | 1 | 48 | 1 (0) | 00:00:01 | 1 | 30 |
|* 8 | INDEX RANGE SCAN | IX_ID | 1 | | 1 (0) | 00:00:01 | 1 | 30 |
| 9 | TABLE ACCESS BY INDEX ROWID | TEST2 | 1 | 48 | 1 (0) | 00:00:01 | | |
|* 10 | INDEX UNIQUE SCAN | SYS_C001242692 | 1 | | 1 (0) | 00:00:01 | | |
| 11 | SORT UNIQUE | | 4 | 88 | 6 (67) | 00:00:01 | | |
| 12 | UNION-ALL | | | | | | | |
|* 13 | VIEW | | 2 | 44 | 2 (0) | 00:00:01 | | |
| 14 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6611_34D3B6C | 2 | 96 | 2 (0) | 00:00:01 | | |
|* 15 | VIEW | | 2 | 44 | 2 (0) | 00:00:01 | | |
| 16 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6611_34D3B6C | 2 | 96 | 2 (0) | 00:00:01 | | |
-----------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
8 - access("ID"=70000)
10 - access("ID"=70000)
13 - filter("NAME"='John')
15 - filter("NAME"='Mary')
quit ;
thanks & regards,
Reji

I cannot reproduce this in 11.2.0.3. I don't think there is a logical explanation for this behavior other than that you hit a bug, which is apparently fixed in 11.2.0.3.
One thing that immediately jumped out at me is the lack of object statistics and - if your output was complete - the fact that OPTIMIZER_DYNAMIC_SAMPLING is set to 0. You could try to reproduce this with OPTIMIZER_DYNAMIC_SAMPLING=2; in that case dynamic sampling kicks in when object statistics are missing. By the way, don't use this feature as a substitute for correct optimizer statistics. More info about dynamic sampling: Dynamic sampling and its impact on the Optimizer.
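To test that hypothesis, something like the following could be used - a sketch only, re-explaining the query from your post after changing the session setting; level 2 is just the 11g default, not a recommendation:
-- Re-enable dynamic sampling for this session and re-check the plan.
ALTER SESSION SET optimizer_dynamic_sampling = 2 ;
EXPLAIN PLAN FOR
SELECT Key FROM
(
  WITH recs AS ( SELECT * FROM TEST_V WHERE ID = 70000 )
  ( SELECT 1 AS Key FROM recs WHERE NAME = 'John' )
  UNION
  ( SELECT 2 AS Key FROM recs WHERE NAME = 'Mary' )
) ;
SELECT * FROM TABLE(dbms_xplan.display()) ;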
In your nicely documented question and script/test case you try to make use of APPEND and NOLOGGING. These only work for bulk (direct-path) inserts, not for row-by-row inserts with VALUES. If it did apply, every insert would push up the high-water mark and dump a full block of data into a fresh block - in your case a block holding only 1 row. Luckily, the database ignores the instruction here.
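For completeness, direct-path (APPEND) does apply to set-based inserts; a bulk load of the same test data would look roughly like this (a sketch, not a tuning recommendation):
-- APPEND (and NOLOGGING on the table) only takes effect for set-based
-- inserts like this, not for row-by-row INSERT ... VALUES in a loop.
INSERT /*+ APPEND */ INTO test1
SELECT level, 'John', MOD(level, 30) + 1
FROM dual
CONNECT BY level <= 1000000 ;
-- a direct-path insert must be committed before the table is queried again
COMMIT ;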
Before you fire SQL at a table, make sure you have given it optimizer statistics. This will certainly help your case.
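For example (a minimal sketch; adjust to your environment):
-- Gather optimizer statistics on both tables (and, via CASCADE, on their indexes)
-- so the CBO has realistic row counts to work with.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'TEST1', cascade => TRUE) ;
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'TEST2', cascade => TRUE) ;
END ;
/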

Related

Why there are both filter and access predicates on the same index on this execution plan?

Consider the execution plan for this query:
SQL_ID 1m5r644say02b, child number 0
-------------------------------------
select * from hr.employees where department_id = 80 intersect select *
from hr.employees where first_name like 'A%'
Plan hash value: 1738366820
------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 4 |00:00:00.01 | 8 | | | |
| 1 | INTERSECTION | | 1 | | 4 |00:00:00.01 | 8 | | | |
| 2 | SORT UNIQUE | | 1 | 34 | 34 |00:00:00.01 | 6 | 6144 | 6144 | 6144 (0)|
|* 3 | TABLE ACCESS FULL | EMPLOYEES | 1 | 34 | 34 |00:00:00.01 | 6 | | | |
| 4 | SORT UNIQUE | | 1 | 11 | 10 |00:00:00.01 | 2 | 2048 | 2048 | 2048 (0)|
| 5 | TABLE ACCESS BY INDEX ROWID BATCHED| EMPLOYEES | 1 | 11 | 10 |00:00:00.01 | 2 | | | |
|* 6 | INDEX SKIP SCAN | EMP_NAME_IX | 1 | 11 | 10 |00:00:00.01 | 1 | | | |
------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("DEPARTMENT_ID"=80)
6 - access("FIRST_NAME" LIKE 'A%')
filter("FIRST_NAME" LIKE 'A%')
The execution plan has both access and filter predicates with the same 'A%' condition here on the EMP_NAME_IX index. But shouldn't the access predicate be enough, since they both filter the same rows? Why did it perform the additional filter predicate?
Is there a general rule for when both access and filter are the same? Based on GV$SQL_PLAN, when an operation has either an access or a filter predicate, they are only equal about 1% of the time. And this situation only happens with operations and options like INDEX (FULL/RANGE/SKIP/UNIQUE) and SORT (JOIN/UNIQUE).
select *
from gv$sql_plan
where access_predicates = filter_predicates;
Presumably you have an index on hr.employees that includes the first_name column. But you are selecting * from hr.employees, so the rows obtained from the index have to be traced back (i.e. joined) to the table.
For conceptual understanding it helps to think of an index as a plain table with a foreign key back to the original table's rows. When using the index helps, these two tables are joined. The index is used alone only when it contains all the needed columns.
In this case we assume a join is required since you are selecting *. When accessing the hr.employees table for the second query of the intersect, because its WHERE clause filters on an indexed column, a join to the index is performed prior to filtering.
The first occurrence of "FIRST_NAME" LIKE 'A%' (the access predicate) is the reason the index is chosen; the second occurrence is the actual filtering. Filtering happens only once, not twice.
These are listed as distinct operations because deciding to use the index (and therefore perform the join) has its own cost.
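To see the contrast, here is a sketch of a query the index can answer on its own, assuming EMP_NAME_IX is the standard HR index on (LAST_NAME, FIRST_NAME); with only indexed columns selected there is no table access, and hence no second step:
-- Satisfied by the index alone: no TABLE ACCESS, only the access predicate remains.
SELECT first_name
FROM hr.employees
WHERE first_name LIKE 'A%' ;
-- Selecting * instead forces a visit to the table for the remaining columns,
-- which is the extra "join" back to the table described above.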

Limit rows examined in Oracle

My table has millions of records. In this query below, can I make Oracle 12c examine the first X rows only instead of doing a full table scan?
The value of X, I imagine, should be Offset + Fetch Next, so in this case 15.
SELECT * FROM table OFFSET 5 ROWS FETCH NEXT 10 ROWS ONLY;
Thanks in advance
Edit 1
These are the tables involved and this is the actual query
Orders - This table has 113k records in my test DB (and over 8 million in the prod DB, as my original question mentioned)
--------------------------
| Id | SKUField1|SKUField2|
--------------------------
| 1 | Value1 | Value2 |
| 2 | Value2 | Value2 |
| 3 | Value1 | Value3 |
--------------------------
Products - This table has 2 million records in my test DB (the prod DB is similar)
---------------
| PId| SKU_NUM|
---------------
| 1 | Value1 |
| 2 | Value2 |
| 3 | Value3 |
---------------
Note that values of Orders.SKUField1 and Orders.SKUField2 come from the Products.SKU_NUM values
Actual Query:
SELECT /*+ gather_plan_statistics */ Id, PId, SKUField1, SKUField2, SKU_NUM
FROM Orders
LEFT JOIN (
-- this inner query reduces size of Products from 2 million rows down to 1462 rows
select * from Products where SKU_NUM in (
select SKUField1 from Orders
)
) p1 ON SKUField1 = p1.SKU_NUM
LEFT JOIN (
-- this inner query reduces size of Products from 2 million rows down to 459 rows
select * from Products where SKU_NUM in (
select SKUField2 from Orders
)
) p4 ON SKUField2 = p4.SKU_NUM
OFFSET 5 ROWS FETCH NEXT 10 ROWS ONLY
Execution Plan:
--------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Time | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 10 |00:00:00.06 | 8013 | | | |
|* 1 | VIEW | | 1 | 00:00:01 | 10 |00:00:00.06 | 8013 | | | |
|* 2 | WINDOW NOSORT STOPKEY | | 1 | 00:00:01 | 15 |00:00:00.06 | 8013 | 27M| 1904K| |
|* 3 | HASH JOIN RIGHT OUTER | | 1 | 00:00:01 | 15 |00:00:00.06 | 8013 | 1162K| 1162K| 1344K (0)|
| 4 | VIEW | | 1 | 00:00:01 | 1462 |00:00:00.04 | 6795 | | | |
| 5 | NESTED LOOPS | | 1 | 00:00:01 | 1462 |00:00:00.04 | 6795 | | | |
| 6 | NESTED LOOPS | | 1 | 00:00:01 | 1462 |00:00:00.04 | 5333 | | | |
| 7 | SORT UNIQUE | | 1 | 00:00:01 | 1469 |00:00:00.04 | 3010 | 80896 | 80896 |71680 (0)|
| 8 | TABLE ACCESS FULL | Orders | 1 | 00:00:01 | 113K|00:00:00.02 | 3010 | | | |
|* 9 | INDEX UNIQUE SCAN | UIX_Product_SKU_NUM | 1469 | 00:00:01 | 1462 |00:00:00.01 | 2323 | | | |
| 10 | TABLE ACCESS BY INDEX ROWID | Products | 1462 | 00:00:01 | 1462 |00:00:00.01 | 1462 | | | |
|* 11 | HASH JOIN RIGHT OUTER | | 1 | 00:00:01 | 15 |00:00:00.02 | 1218 | 1142K| 1142K| 1335K (0)|
| 12 | VIEW | | 1 | 00:00:01 | 459 |00:00:00.02 | 1213 | | | |
| 13 | NESTED LOOPS | | 1 | 00:00:01 | 459 |00:00:00.02 | 1213 | | | |
| 14 | NESTED LOOPS | | 1 | 00:00:01 | 459 |00:00:00.02 | 754 | | | |
| 15 | SORT UNIQUE | | 1 | 00:00:01 | 462 |00:00:00.02 | 377 | 24576 | 24576 |22528 (0)|
| 16 | INDEX FAST FULL SCAN | Orders_SKUField2_IDX6 | 1 | 00:00:01 | 113K|00:00:00.01 | 377 | | | |
|* 17 | INDEX UNIQUE SCAN | UIX_Product_SKU_NUM | 462 | 00:00:01 | 459 |00:00:00.01 | 377 | | | |
| 18 | TABLE ACCESS BY INDEX ROWID| Products | 459 | 00:00:01 | 459 |00:00:00.01 | 459 | | | |
| 19 | TABLE ACCESS FULL | Orders | 1 | 00:00:01 | 15 |00:00:00.01 | 5 | | | |
--------------------------------------------------------------------------------------------------------------------------------------------------
Hence, based on the "A-Rows" column values for row Ids 8 and 16 in the execution plan, it seems like there are full table scans on the Orders table (though row Id 16 at least seems to be using an index). So my question is: is it true that there is a full table scan on the Orders table even though I am using OFFSET/FETCH NEXT?
Although your FETCH clause may use a full table scan, Oracle will still only fetch the first X rows from the table.
In the following example, the "TABLE ACCESS FULL" operation does start to read the entire table, but it gets cut off part of the way through by the "WINDOW NOSORT STOPKEY" operation. Not all full table scans actually scan the full table. You would see similar behavior if your code ended with WHERE ROWNUM <= 50.
CREATE TABLE some_table AS SELECT * FROM all_objects;
EXPLAIN PLAN FOR SELECT * FROM some_table OFFSET 5 ROWS FETCH NEXT 10 ROWS ONLY;
SELECT * FROM TABLE(dbms_xplan.display);
Plan hash value: 2559837639
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 15 | 7410 | 2 (0)| 00:00:01 |
|* 1 | VIEW | | 15 | 7410 | 2 (0)| 00:00:01 |
|* 2 | WINDOW NOSORT STOPKEY| | 15 | 2010 | 2 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL | SOME_TABLE | 15 | 2010 | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=15 AND
"from$_subquery$_002"."rowlimit_$$_rownumber">5)
2 - filter(ROW_NUMBER() OVER ( ORDER BY NULL )<=15)
The performance implications get more complicated if you also want to order the results. If that is the case, you may want to post the full query and execution plan.
(EDIT: 2022-09-25)
Yes, there is a full table scan on the ORDERS table happening on line 8 of the execution plan. As you mentioned, you can look at the "A-rows" column to tell what's really happening.
But the third full table scan of ORDERS, on line 19, is not a "full" full table scan. The operation "WINDOW NOSORT STOPKEY" stops that full table scan as soon as the 15 necessary rows are read. So the FETCH syntax is helping at least a little.
Applying a FETCH to a query does not mean that every single table will be limited. Although, in your query, it does seem like there ought to be a way to reduce the full table scans. Perhaps an index on SKUField1 would help?
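If you want to test that idea, the index would be something along these lines (hypothetical name; verify afterwards that the optimizer actually uses it):
-- Supports the semi-join "SKU_NUM IN (SELECT SKUField1 FROM Orders)"
-- without a full scan of Orders for the first subquery.
CREATE INDEX orders_skufield1_idx ON orders (skufield1) ;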
As far as I know, Oracle does not provide something like LIMIT or TOP, but you can build it yourself, like the following.
What is happening here: the inner query gets the first 10 records and the outer query selects from them; you can still use other clauses like WHERE or ORDER BY.
SELECT * FROM (
SELECT * FROM Customers WHERE CustomerID <= 10 ORDER BY CustomerID
)
A full article on this topic can be found here: Oracle-Fetch.
I am using an online Oracle instance, so you can try it from your end; please let me know if you still have a problem.
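Note that the example above limits by the key range (CustomerID <= 10) rather than by row count. The classic way to limit the number of rows is to apply ROWNUM on top of an ordered inline view, for example (Customers and CustomerID are just the illustrative names from above):
-- Return the first 10 customers by CustomerID, whatever their key values are.
-- ROWNUM is evaluated in the outer query, i.e. after the inner ORDER BY.
SELECT * FROM (
  SELECT * FROM Customers ORDER BY CustomerID
)
WHERE ROWNUM <= 10 ;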

Explain Plan in Oracle - how do I detect the most efficient query?

Intro
I am writing a query to compare two different Oracle database tables containing hashes.
I want to find which hashes from location 1 have not been migrated over to location 2.
I have three different queries and have written EXPLAIN PLAN FOR statements for them.
However, the results don't tell me that much.
Question
How do I find which is the most efficient and fastest?
Current Guess
My suspicion, however, is that the first query is the fastest since it makes one-off use of the remote link. But this is just a guess, not supported by actual results.
Code
--------------------------
EXPLAIN PLAN
SET statement_id = 'ex_plan1' FOR
select* from document doc left outer join migrated_document#V2_PROD migrated on doc.hash = migrated.document_hash AND migrated.document_hash is null ;
SELECT PLAN_TABLE_OUTPUT
FROM TABLE(DBMS_XPLAN.DISPLAY(NULL, 'ex_plan1','BASIC'));
---------------------------------------------------
| Id | Operation | Name |
---------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | HASH JOIN RIGHT OUTER| |
| 2 | REMOTE | MIGRATED_DOCUMENT |
| 3 | TABLE ACCESS FULL | DOCUMENT |
---------------------------------------------------
--------------------------
EXPLAIN PLAN
SET statement_id = 'ex_plan2' FOR
select* from document doc where not exists( select 1 from migrated_document#V2_PROD migrated where migrated.document_hash = doc.HASH ) ;
SELECT PLAN_TABLE_OUTPUT
FROM TABLE(DBMS_XPLAN.DISPLAY(NULL, 'ex_plan2','BASIC'));
------------------------------------------------
| Id | Operation | Name |
------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | FILTER | |
| 2 | TABLE ACCESS FULL| DOCUMENT |
| 3 | REMOTE | MIGRATED_DOCUMENT |
------------------------------------------------
--------------------------
EXPLAIN PLAN
SET statement_id = 'ex_plan3' FOR
select* from document doc where doc.hash not in ( select migrated.document_hash from migrated_document#V2_PROD migrated) ;
SELECT PLAN_TABLE_OUTPUT
FROM TABLE(DBMS_XPLAN.DISPLAY(NULL, 'ex_plan3','BASIC'));
------------------------------------------------
| Id | Operation | Name |
------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | NESTED LOOPS ANTI | |
| 2 | TABLE ACCESS FULL| DOCUMENT |
| 3 | REMOTE | MIGRATED_DOCUMENT |
------------------------------------------------
--------------------------
Update
I updated the explain plan statements to get more results.
Due to the remote operation... could something cost less but be slower?
If I am reading the data correctly it seems that option 2 is best.
But I still think option 1 is quicker.
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Inst |IN-OUT|
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 105K| 51M| 194 | | |
| 1 | HASH JOIN RIGHT OUTER| | 105K| 51M| 194 | | |
| 2 | REMOTE | MIGRATED_DOCUMENT | 1 | 275 | 2 | V2_MN~ | R->S |
| 3 | TABLE ACCESS FULL | DOCUMENT | 105K| 23M| 192 | | |
-------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Inst |IN-OUT|
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 105K| 23M| 104K| | |
| 1 | FILTER | | | | | | |
| 2 | TABLE ACCESS FULL| DOCUMENT | 105K| 23M| 192 | | |
| 3 | REMOTE | MIGRATED_DOCUMENT | 1 | 50 | 1 | V2_MN~ | R->S |
----------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Inst |IN-OUT|
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 105K| 29M| 526 | | |
| 1 | NESTED LOOPS ANTI | | 105K| 29M| 526 | | |
| 2 | TABLE ACCESS FULL| DOCUMENT | 105K| 23M| 192 | | |
| 3 | REMOTE | MIGRATED_DOCUMENT | 1 | 50 | 0 | V2_MN~ | R->S |
----------------------------------------------------------------------------------------
In a perfect world, you would choose the plan with the lowest cost. However, I do not think your first query does what you want. It looks to me like no rows will join; the IS NULL check should be a filter predicate (in the WHERE clause) rather than part of the join condition.
Instead of
select * from document doc
left outer join migrated_document#V2_PROD migrated
on doc.hash = migrated.document_hash
AND migrated.document_hash is null
it should be
select * from document doc
left outer join migrated_document#V2_PROD migrated
on doc.hash = migrated.document_hash
WHERE migrated.document_hash is null
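To actually determine which query is fastest - rather than comparing estimated costs, which can be misleading across a database link - you can run each candidate with rowsource statistics and compare the actual times, along these lines (a sketch using the NOT EXISTS variant):
-- Run the candidate once with rowsource statistics enabled ...
select /*+ gather_plan_statistics */ * from document doc
where not exists ( select 1 from migrated_document#V2_PROD migrated
                   where migrated.document_hash = doc.hash ) ;
-- ... then show actual rows and elapsed time per plan step for that statement.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST')) ;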

Oracle - Optimize query, large database table, CLOB field

So I've been racking my brain on this and admittedly am not very good with Oracle. We have a table that holds about 60 million records with values stored in it for buildings. I have added appropriate indexes where I thought fit, but performance is still poor. Here is the query, as that should help:
SELECT count(*)
FROM viewBuildings
INNER JOIN tblValues
ON viewBuildings.bldg_id = tblValues.bldg_id
WHERE bldg_deleted = 0
AND (bldg_summary = 1
OR (bldg_root = 0 AND bldg_def = 0)
OR bldg_parent = 1)
AND field_id IN (207)
AND UPPER(dbms_lob.substr(v_value, 2000, 1)) = UPPER('2320')
So the above is just one example of a query that can be constructed. It looks in tblValues, on the v_value CLOB field, for a match of '2320'. It uppercases because it can search both numeric and text values. tblValues has 60 million records. It is indexed by the building id and also by the field id.
I may need to supply more info, but as far as stats go, the number that jumped out at me was "consistent gets". Consistent gets = 74069. Is that a large number?
Any advice would be great, primarily on dealing with a CLOB field in a large database table. I can't use context-type indexing as I need exact matches and the data being looked up can be numeric or string.
EDIT (more information):
tblBuildings is part of viewBuildings (a view), has 80,000 records
tblValues has the values of each building, has 68,000,000 records
tblValues has about 550 fields per building (field_id)
Desired result: the query should return results in < 5 seconds. Is this unreasonable? Sometimes it runs indefinitely, other times maybe 80 seconds.
Explain Plan results
Plan hash value: 1480138519
-----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------------------------|
| 0 | SELECT STATEMENT | | 1 | 192 | 32 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 192 | | |
| 2 | NESTED LOOPS | | 1 | 192 | 15 (0)| 00:00:01 |
| 3 | NESTED LOOPS | | 1 | 183 | 12 (0)| 00:00:01 |
|* 4 | FILTER | | | | | |
| 5 | NESTED LOOPS OUTER | | 1 | 64 | 10 (0)| 00:00:01 |
|* 6 | TABLE ACCESS BY INDEX ROWID | TBLBUILDINGS | 1 | 60 | 9 (0)| 00:00:01 |
|* 7 | INDEX RANGE SCAN | SAA_4 | 17 | | 3 (0)| 00:00:01 |
| 8 | NESTED LOOPS | | 1 | 21 | 3 (0)| 00:00:01 |
| 9 | TABLE ACCESS BY INDEX ROWID| TBLBUILDINGSTATUSES | 1 | 15 | 2 (0)| 00:00:01 |
|* 10 | INDEX RANGE SCAN | IDX_BUILDINGSTATUS_EXCLUDEQUERY | 1 | | 1 (0)| 00:00:01 |
|* 11 | INDEX RANGE SCAN | IDX_BUILDING_STATUS_ASID_DELETED | 1 | 6 | 1 (0)| 00:00:01 |
| 12 | TABLE ACCESS BY INDEX ROWID | TBLBUILDINGSTATUSES | 1 | 4 | 1 (0)| 00:00:01 |
|* 13 | INDEX UNIQUE SCAN | PK_TBLBUILDINGSTATUS | 1 | | 0 (0)| 00:00:01 |
|* 14 | TABLE ACCESS BY INDEX ROWID | TBLVALUES | 1 | 119 | 2 (0)| 00:00:01 |
|* 15 | INDEX UNIQUE SCAN | PK_SAA_6 | 1 | | 1 (0)| 00:00:01 |
| 16 | INLIST ITERATOR | | | | | |
|* 17 | INDEX RANGE SCAN | SAA_7 | 1 | 9 | 3 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
4 - filter("TBLBUILDINGSTATUSES"."BUILDING_STATUS_HIDE_REPORTS" IS NULL OR
"TBLBUILDINGSTATUSES"."BUILDING_STATUS_HIDE_REPORTS"=0)
6 - filter("TBLBUILDINGS"."BLDG_SUMMARY"=1 OR "TBLBUILDINGS"."BLDG_SUB_BUILDING_PARENT"=1 OR
"TBLBUILDINGS"."BLDG_BUILDING_DEF"=0 AND "TBLBUILDINGS"."BLDG_ROOT"=0)
7 - access("TBLBUILDINGS"."BLDG_DELETED"=0)
filter( NOT EXISTS (SELECT 0 FROM "TBLBUILDINGSTATUSES" "TBLBUILDINGSTATUSES","TBLBUILDINGS" "TBLBUILDINGS" WHERE
"TBLBUILDINGS"."BLDG_ID"=:B1 AND "TBLBUILDINGSTATUSES"."BUILDING_STATUS_ID"="TBLBUILDINGS"."BUILDING_STATUS_ID" AND
"TBLBUILDINGSTATUSES"."BUILDING_STATUS_EXCLUDE_QUERY"=1))
10 - access("TBLBUILDINGSTATUSES"."BUILDING_STATUS_EXCLUDE_QUERY"=1)
11 - access("TBLBUILDINGS"."BLDG_ID"=:B1 AND "TBLBUILDINGSTATUSES"."BUILDING_STATUS_ID"="TBLBUILDINGS"."BUILDING_STATUS_ID")
filter("TBLBUILDINGSTATUSES"."BUILDING_STATUS_ID"="TBLBUILDINGS"."BUILDING_STATUS_ID")
13 - access("TBLBUILDINGSTATUSES"."BUILDING_STATUS_ID"(+)="TBLBUILDINGS"."BUILDING_STATUS_ID")
14 - filter(UPPER("DBMS_LOB"."SUBSTR"("TBLVALUES"."V_VALUE",2000,1))=U'2320')
15 - access("TBLVALUES"."FE_ID"=207 AND "TBLBUILDINGS"."BLDG_ID"="TBLVALUES"."BLDG_ID")
17 - access("TBLINSPECTORBUILDINGMAP"."IN_ID"=1 AND ("TBLINSPECTORBUILDINGMAP"."IAM_BUILDING_ACCESS_LEVEL"=0 OR
"TBLINSPECTORBUILDINGMAP"."IAM_BUILDING_ACCESS_LEVEL"=1) AND "TBLBUILDINGS"."BLDG_ID"="TBLINSPECTORBUILDINGMAP"."BLDG_ID")
44 rows selected
Plan hash value: 2137789089
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8168 | 16336 | 29 (0)| 00:00:01 |
| 1 | COLLECTION ITERATOR PICKLER FETCH| DISPLAY | 8168 | 16336 | 29 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
Okay, I gathered statistics as you suggested, and here is the plan_table_output. It looks like IDX_CURVAL_FE_ID is the problem here? That is an index on the values table for the field id.
SQL_ID d4aq8nsr1p6uw, child number 0
-------------------------------------
SELECT /*+ gather_plan_statistics */ count(*) FROM
viewAssetsForUser1 INNER JOIN tblCurrentValues ON
viewAssetsForUser1.as_id = tblCurrentValues.as_id WHERE as_deleted =
:"SYS_B_0" AND (as_summary = :"SYS_B_1" OR (as_root =
:"SYS_B_2" AND as_asset_def = :"SYS_B_3") OR
as_sub_asset_parent = :"SYS_B_4") AND fe_id IN (:"SYS_B_5")
AND UPPER(dbms_lob.substr(cv_value, :"SYS_B_6", :"SYS_B_7")) =
UPPER(:"SYS_B_8")
Plan hash value: 4033422776
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:08:43.19 | 56589 | 56084 | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:08:43.19 | 56589 | 56084 | | | |
|* 2 | FILTER | | 1 | | 0 |00:08:43.19 | 56589 | 56084 | | | |
| 3 | NESTED LOOPS | | 1 | | 0 |00:08:43.19 | 56589 | 56084 | | | |
| 4 | NESTED LOOPS | | 1 | 115 | 0 |00:08:43.19 | 56589 | 56084 | | | |
|* 5 | FILTER | | 1 | | 0 |00:08:43.19 | 56589 | 56084 | | | |
|* 6 | HASH JOIN RIGHT OUTER | | 1 | 82 | 0 |00:08:43.19 | 56589 | 56084 | 1348K| 1348K| 742K (0)|
| 7 | TABLE ACCESS FULL | TBLASSETSTATUSES | 1 | 4 | 4 |00:00:00.01 | 3 | 0 | | | |
| 8 | NESTED LOOPS | | 1 | | 0 |00:08:43.19 | 56586 | 56084 | | | |
| 9 | NESTED LOOPS | | 1 | 163 | 0 |00:08:43.19 | 56586 | 56084 | | | |
|* 10 | TABLE ACCESS BY INDEX ROWID | TBLCURRENTVALUES | 1 | 163 | 0 |00:08:43.19 | 56586 | 56084 | | | |
|* 11 | INDEX RANGE SCAN | IDX_CURVAL_FE_ID | 1 | 16283 | 61357 |00:00:05.98 | 132 | 132 | | | |
|* 12 | INDEX RANGE SCAN | SAA_1 | 0 | 1 | 0 |00:00:00.01 | 0 | 0 | | | |
|* 13 | TABLE ACCESS BY INDEX ROWID | TBLASSETS | 0 | 1 | 0 |00:00:00.01 | 0 | 0 | | | |
|* 14 | INDEX UNIQUE SCAN | PK_TBLINSPECTORBRIDGEMAP2 | 0 | 1 | 0 |00:00:00.01 | 0 | 0 | | | |
|* 15 | TABLE ACCESS BY GLOBAL INDEX ROWID| TBLINSPECTORASSETMAP | 0 | 1 | 0 |00:00:00.01 | 0 | 0 | | | |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(:SYS_B_0=0)
5 - filter(("TBLASSETSTATUSES"."ASSET_STATUS_HIDE_REPORTS" IS NULL OR "TBLASSETSTATUSES"."ASSET_STATUS_HIDE_REPORTS"=0))
6 - access("TBLASSETSTATUSES"."ASSET_STATUS_ID"="TBLASSETS"."ASSET_STATUS_ID")
10 - filter(UPPER("DBMS_LOB"."SUBSTR"("TBLCURRENTVALUES"."CV_VALUE",:SYS_B_6,:SYS_B_7))=SYS_OP_C2C(UPPER(:SYS_B_8)))
11 - access("TBLCURRENTVALUES"."FE_ID"=:SYS_B_5)
12 - access("TBLASSETS"."AS_DELETED"=:SYS_B_0 AND "TBLASSETS"."AS_ID"="TBLCURRENTVALUES"."AS_ID")
13 - filter((("TBLASSETS"."AS_ROOT"=:SYS_B_2 AND "TBLASSETS"."AS_ASSET_DEF"=:SYS_B_3) OR "TBLASSETS"."AS_SUMMARY"=:SYS_B_1 OR
"TBLASSETS"."AS_SUB_ASSET_PARENT"=:SYS_B_4))
14 - access("TBLASSETS"."AS_ID"="TBLINSPECTORASSETMAP"."AS_ID" AND "TBLINSPECTORASSETMAP"."IN_ID"=1)
15 - filter(("TBLINSPECTORASSETMAP"."IAM_ASSET_ACCESS_LEVEL"=0 OR "TBLINSPECTORASSETMAP"."IAM_ASSET_ACCESS_LEVEL"=1))
Bad Index Cost
If the stats are fresh, and the optimizer has a relatively good cardinality estimate, why would it pick a bad plan? Perhaps there is a parameter making indexes look artificially cheap. Take a look at: select * from v$parameter where name in ('optimizer_index_cost_adj', 'optimizer_index_caching'); Are they significantly different than their default values, 100 and 0?
Also, take a look at select * from sys.aux_stats$; Maybe your system statistics make full table scans look too expensive. Some versions of Oracle have bugs with workload statistics where the numbers are wrong by several orders of magnitude.
Or perhaps your table is just incredibly huge, and 16K index reads is the best access path. Look at DBA_SEGMENTS.BYTES to find the size of your table and LOB segment.
Even if the table is medium-sized, and the plan changed to a full table scan, that probably won't get the run time to within 5 seconds. But combined with your idea to partition, it might be enough.
LOB STORAGE
Given your example, I assume most of the CLOBs are relatively small? Perhaps you have an unusual LOB setting wasting a lot of space, such as DISABLE STORAGE IN ROW. You may want to check your table DDL, or post all of it here. Or if you can replace the CLOB with a VARCHAR2, that would be even better.
FBI
A function-based index on the CLOB may significantly speed things up. But it may be a very large index: create index TBLCURRENTVALUES_FBI on TBLCURRENTVALUES(UPPER(dbms_lob.substr(v_value, 2000, 1)));
CURSOR_SHARING
The queries are changing a bit, which makes tuning difficult. It looks like this latest version has CURSOR_SHARING=FORCE, which is unusual. For an expensive query, using literals can be a good thing - the extra time spent building query plans is probably worth it. If the system parameter can't change, look into the hint /*+ cursor_sharing_exact */.
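For illustration, the hint simply goes into the statement itself; a sketch reconstructed from the literals in your original query (names and values copied from the question, so treat it as illustrative only):
-- Keep the literals as literals for this one statement even though
-- CURSOR_SHARING=FORCE is set, so the optimizer can use the real values.
SELECT /*+ cursor_sharing_exact gather_plan_statistics */ count(*)
FROM viewAssetsForUser1
INNER JOIN tblCurrentValues ON viewAssetsForUser1.as_id = tblCurrentValues.as_id
WHERE as_deleted = 0
  AND (as_summary = 1 OR (as_root = 0 AND as_asset_def = 0) OR as_sub_asset_parent = 1)
  AND fe_id IN (207)
  AND UPPER(dbms_lob.substr(cv_value, 2000, 1)) = UPPER('2320') ;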
You can do any number of optimizations, but ultimately it is the huge amount of data that causes the problem. When you execute the query and track it on the performance graph in OEM, you will see that a major amount of time is spent on I/O, that is, moving data in and out of memory.
So what's the solution? Partition the table. Whenever the data is huge, you should look at partitioning the table so that you deal only with the relevant data.
In order to partition the table you need some key to segregate the data on, and looking at your data it could be the building id.
You can read more about it at this URL: http://docs.oracle.com/cd/E11882_01/server.112/e25523/partition.htm#g471747
Partitioning comes with many other features, like local indexes, which help optimize queries even more.
Partitioning will not be a solution if you deal with the whole of the large table's data all the time, but then that puts a question mark over the database schema.
So yes, query optimization will help, but as the data is large you should evaluate table partitioning as well.
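A rough sketch of what that could look like - hash partitioning on the building id, with a matching local index; all names and the partition count are illustrative only:
-- Hypothetical: hash-partition the big values table on bldg_id so that a
-- query for one building touches only one partition, and keep the index LOCAL.
CREATE TABLE tblValues_part
(
  bldg_id NUMBER(20) NOT NULL,
  field_id NUMBER(20) NOT NULL,
  v_value CLOB
)
PARTITION BY HASH (bldg_id) PARTITIONS 32 ;
CREATE INDEX ix_values_part ON tblValues_part (bldg_id, field_id) LOCAL ;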

Top N query performance when accessing a list of IDs

I have a top N query that is giving me problems.
First of all, I have a query like the following:
select /*+ gather_plan_statistics */ * from
(
select rowid
from payer_subscription ps
where ps.subscription_status = :i_subscription_status
and ps.merchant_id = :merchant_id2
order by transaction_date desc
) where rownum <= :i_rowcount;
This query works well. It can very efficiently find me the top 10 rows for a massive data set, using an index on merchant_id, subscription_status, transaction_date.
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 10 |00:00:00.01 | 4 |
|* 1 | COUNT STOPKEY | | 1 | | 10 |00:00:00.01 | 4 |
| 2 | VIEW | | 1 | 11 | 10 |00:00:00.01 | 4 |
|* 3 | INDEX RANGE SCAN DESCENDING| SODTEST2_IX | 1 | 100 | 10 |00:00:00.01 | 4 |
-------------------------------------------------------------------------------------------------------
As you can see, the actual rows (A-Rows) at each stage are 10, which is correct.
Now, I have a requirement to get the top N records for a set of merchant_Ids, so if I change the query to include two merchant_ids, the performance tanks:
select /*+ gather_plan_statistics */ * from
(
select rowid
from payer_subscription ps
where ps.subscription_status = :i_subscription_status
and (ps.merchant_id = :merchant_id or
ps.merchant_id = :merchant_id2 )
order by transaction_date desc
) where rownum <= :i_rowcount;
----------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 10 |00:00:00.17 | 178 | | | |
|* 1 | COUNT STOPKEY | | 1 | | 10 |00:00:00.17 | 178 | | | |
| 2 | VIEW | | 1 | 200 | 10 |00:00:00.17 | 178 | | | |
|* 3 | SORT ORDER BY STOPKEY| | 1 | 200 | 10 |00:00:00.17 | 178 | 2048 | 2048 | 2048 (0)|
| 4 | INLIST ITERATOR | | 1 | | 42385 |00:00:00.10 | 178 | | | |
|* 5 | INDEX RANGE SCAN | SODTEST2_IX | 2 | 200 | 42385 |00:00:00.06 | 178 | | | |
----------------------------------------------------------------------------------------------------------------------------
Notice now that there are 42K rows coming out of the two index range scans - Oracle is no longer stopping the index range scan when it reaches 10 rows. What I thought would happen is that Oracle would get at most 10 rows for each merchant_id, knowing that at most 10 rows are to be returned by the query. Then it would sort those 10 + 10 rows and output the top 10 based on the transaction date, but it refuses to do that.
Does anyone know how I can get the performance of the first query when I need to pass a list of merchants into the query? I could probably get the performance using a UNION ALL, but the list of merchants is variable and could be anywhere from 1 or 2 to several hundred.
You can use the --+ use_concat hint to make Oracle execute the query as if it were a UNION ALL.
From documentation:
The USE_CONCAT hint instructs the optimizer to transform combined
OR-conditions in the WHERE clause of a query into a compound query
using the UNION ALL set operator. Without this hint, this
transformation occurs only if the cost of the query using the
concatenations is cheaper than the cost without them. The USE_CONCAT
hint overrides the cost consideration.
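Applied to the query in the question, the hint goes into the query block that contains the OR - a sketch only:
-- Ask the optimizer to expand the OR on merchant_id into a concatenation,
-- effectively a UNION ALL of two index-driven branches.
select * from
(
  select /*+ use_concat */ rowid
  from payer_subscription ps
  where ps.subscription_status = :i_subscription_status
  and (ps.merchant_id = :merchant_id or
       ps.merchant_id = :merchant_id2 )
  order by transaction_date desc
) where rownum <= :i_rowcount ;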
There are many cases where use_concat is ignored.
See: MOS Note: USE_CONCAT hint on different versions (Doc ID 259741.1)
I have had success in 10.2.0.4, 11.2.0.1 with OR_EXPAND where USE_CONCAT will not work.
/*+ OR_EXPAND( alias column_name ) */
Documented here:
http://www.hellodba.com/reader.php?ID=199&lang=EN
I'm not sure if this helps, but you can try to replace the OR operator with IN:
and ps.merchant_id IN (:merchant_id, :merchant_id2)
