query rewrite not working properly on materialized view datasubset - oracle

I'm currently experimenting with Oracle materialized views, but I'm facing a problem:
I created a materialized view (a data subset that only contains data for January 2020) from base table DW_F_TIMESHEETLINE and matching bitmapindex:
create materialized view DW.MV_TSL_4
ENABLE QUERY REWRITE
as
SELECT "Fact_Timesheet_Line".*
FROM
DW."DW_F_TIMESHEETLINE" "Fact_Timesheet_Line"
WHERE
"Fact_Timesheet_Line".DDAT_WORK_SK between 20200101 and 20200131
;
CREATE BITMAP INDEX DW.IDX_BM_MV_TSL_4_DDAT_WORK_SK ON DW.MV_TSL_4 (DDAT_WORK_SK) NOLOGGING TABLESPACE DW_INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 STORAGE ( INITIAL 64K NEXT 1M MAXSIZE UNLIMITED MINEXTENTS 1 MAXEXTENTS UNLIMITED PCTINCREASE 0 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT ) NOPARALLEL;
When I run the following query with a filter on DDAT_WORK_SK, the query is being rewritten by the optimize to use the materialized view:
select sum(data) from DW.DW_F_TIMESHEETLINE where DDAT_WORK_SK between 20200101 and 20200131
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 14490 (1)| 00:00:02 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
| 2 | MAT_VIEW REWRITE ACCESS FULL| MV_TSL_4 | 1732K| 6769K| 14490 (1)| 00:00:02 |
------------------------------------------------------------------------------------------
Note
-----
- automatic DOP: Computed Degree of Parallelism is 1 because of parallel threshold
However, If I change the query so that the base table is joined with table DW_D_DATE on column DDAT_WORK_SK, the query is rewritten to get the data from the materialized view and also from the base table (with a UNION ALL):
select sum(data)
from DW.DW_F_TIMESHEETLINE , DW.DW_D_DATE
where DW_D_DATE.DDAT_SK = DW.DW_F_TIMESHEETLINE.DDAT_WORK_SK and
DW_D_DATE.DDAT_SK between 20200101 and 20200131;
---------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 14500 (1)| 00:00:02 |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
| 2 | VIEW | | 2 | 26 | 14500 (1)| 00:00:02 |
| 3 | UNION-ALL | | | | | |
| 4 | SORT AGGREGATE | | 1 | 16 | | |
|* 5 | HASH JOIN | | 131K| 2052K| 14500 (1)| 00:00:02 |
|* 6 | INDEX RANGE SCAN | DW_PK_DDAT_SK | 2 | 12 | 2 (0)| 00:00:01 |
|* 7 | MAT_VIEW REWRITE ACCESS FULL | MV_TSL_4 | 1732K| 16M| 14494 (1)| 00:00:02 |
| 8 | SORT AGGREGATE | | 1 | 10 | | |
|* 9 | FILTER | | | | | |
| 10 | TABLE ACCESS BY INDEX ROWID BATCHED| DW_F_TIMESHEETLINE | 2361K| 22M| 134K (1)| 00:00:11 |
| 11 | BITMAP CONVERSION TO ROWIDS | | | | | |
|* 12 | BITMAP INDEX RANGE SCAN | IDX_BM_FTSL_DDAT_WORK_SK | | | | |
---------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("DW_D_DATE"."DDAT_SK"="MV_TSL_4"."DDAT_WORK_SK")
6 - access("DW_D_DATE"."DDAT_SK">=20200101 AND "DW_D_DATE"."DDAT_SK"<=20200131)
7 - filter("MV_TSL_4"."DDAT_WORK_SK">=20200101 AND "MV_TSL_4"."DDAT_WORK_SK"<=20200131)
9 - filter(NULL IS NOT NULL)
12 - access("DW_F_TIMESHEETLINE"."DDAT_WORK_SK">=20200101 AND
"DW_F_TIMESHEETLINE"."DDAT_WORK_SK"<=20200131)
Note
-----
- dynamic statistics used: dynamic sampling (level=5)
- automatic DOP: Computed Degree of Parallelism is 1 because of parallel threshold
- this is an adaptive plan
I don't understand why the optimizer is still using the base table?
In the explain plan, I noticed line '9 - filter(NULL IS NOT NULL)'. After some googling this could be because of constraints that are not being validated.
I tried to validate the foreign key constraint DW_FK_DDAT_WORK_SK on DW_F_TIMESHEETLINE and create a few extra constraints on this column:
alter table DW.DW_F_TIMESHEETLINE modify constraint DW_FK_DDAT_WORK_SK validate;
alter table DW.DW_F_TIMESHEETLINE add constraint testconstr check (DDAT_WORK_SK is not null) validate;
alter table DW.MV_TSL_4 add constraint t1a_ck check (DDAT_WORK_SK is not null) validate;
alter table DW.MV_TSL_4 add constraint t1a_ck2 check (DDAT_WORK_SK between 20200101 and 20200131) validate;
But even with these constraints, the optimizer continues to rewrite the query using a UNION ALL between the materialized view and the base table.
Any idea what I am doing wrong?
Thank you for your feedback!

9 - filter(NULL IS NOT NULL) means that Oracle knows it can safely not execute this part.
You can use row source statistics to confirm which lines are executed in a plan (and how many times and how long they take).
alter session set statistics_level=all;
set serverout off
select sum(data)
from DW.DW_F_TIMESHEETLINE , DW.DW_D_DATE
where DW_D_DATE.DDAT_SK = DW.DW_F_TIMESHEETLINE.DDAT_WORK_SK and
DW_D_DATE.DDAT_SK between 20200101 and 20200131;
select * from dbms_xplan.display_cursor(sql_id=>null,format=>'typical allstats last');
(assuming the user you're running this as has the privileges to read plans, otherwise grab the sql_id of the execution and run the dbms_xplan part manually from a privileged session).
I imagine, Oracle has decided to expand this out as there is a foreign key which guarantees that the join is always successful - so if you were to use the base table, you don't need to do the join.
BTW it is a small red flag to have bitmap indexes on this base table, make sure you really need them as they do not make concurrency easy. Should also note that none of your shared queries would benefit from the bitmap index you created on the MView.

Related

Why there are both filter and access predicates on the same index on this execution plan?

Considering the execution plan for this query :
SQL_ID 1m5r644say02b, child number 0
-------------------------------------
select * from hr.employees where department_id = 80 intersect select *
from hr.employees where first_name like 'A%'
Plan hash value: 1738366820
------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 4 |00:00:00.01 | 8 | | | |
| 1 | INTERSECTION | | 1 | | 4 |00:00:00.01 | 8 | | | |
| 2 | SORT UNIQUE | | 1 | 34 | 34 |00:00:00.01 | 6 | 6144 | 6144 | 6144 (0)|
|* 3 | TABLE ACCESS FULL | EMPLOYEES | 1 | 34 | 34 |00:00:00.01 | 6 | | | |
| 4 | SORT UNIQUE | | 1 | 11 | 10 |00:00:00.01 | 2 | 2048 | 2048 | 2048 (0)|
| 5 | TABLE ACCESS BY INDEX ROWID BATCHED| EMPLOYEES | 1 | 11 | 10 |00:00:00.01 | 2 | | | |
|* 6 | INDEX SKIP SCAN | EMP_NAME_IX | 1 | 11 | 10 |00:00:00.01 | 1 | | | |
------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("DEPARTMENT_ID"=80)
6 - access("FIRST_NAME" LIKE 'A%')
filter("FIRST_NAME" LIKE 'A%')
The execution plan has both access and filter predicates with the same '%A' predicate here on the EMP_NAME_IX index. But shouldn't the access predicate be enough here, as they both will filter the same rows? Why did it perform the additional filter predicate?
Is there a general rule for when both access and filter are the same? Based on GV$SQL_PLAN, when an operation has either an access or a filter predicate, they are only equal about 1% of the time. And this situation only happens with operations and options like INDEX (FULL/RANGE/SKIP/UNIQUE) and SORT (JOIN/UNIQUE).
select *
from gv$sql_plan
where access_predicates = filter_predicates;
Presumably you have an index on hr.employees that includes the first_name column. But you are selecting * from hr.employees such that the rows obtained from the index would have to traced back (i.e. join) with the table.
For conceptual understanding it helps to think of indexes as plain tables with a foreign key to the original table's primary key. When usage of indexes helps, these two tables are joined. The index is used alone when it contains all needed columns.
In this case we assume a join is required since you are selecting *. When accessing the hr.employee table for the second query of the intersect, because its where clause filters on an index column, a join to the index is performed prior to filtering.
The first occurrence of "FIRST_NAME" LIKE 'A%' is the reason usage of the index is decided. The second occurrence, is then the actual filtering. Filtering happens only once, not twice.
These are listed as distinct operations as deciding to use the index (and therefore perform the join) has its own costs.

how to avoid TABLE ACCESS FULL when fetching rows in Oracle

I'm using Oracle 18c but I guess my question would not be bound to the specific version.
I want to fetch rows from a table but I found a complex, ugly solution.
I would like to know if there is better, simple query that can return the same result as following.
First of all, I have a simple table like this.
Note that col is going to store large text.
CREATE TABLE simpletable
(record_id NUMBER,
col CLOB,
PRIMARY KEY (record_id));
I want to retrieve single row from the above table and whichever row is acceptable.
First query came to my mind is as following.
SELECT * FROM (SELECT * FROM simpletable) WHERE rownum <= 1;
Another is as following.
SELECT * FROM (SELECT * FROM simpletableORDER BY record_id) WHERE rownum <= 1;
Unfortunately, neither of above two does not use primary-key index and uses TABLE ACCESS FULL which can take long time when the table grows enough large.
(I'm guessing that oracle preferred the simpler plan because my table is not enough large yet to use index scan.
Oracle might choose different plan if the table grows up further.)
My final solution that uses primary-key index to narrow down the table access is following.
SELECT simpletable.* FROM
(SELECT * FROM
(SELECT record_id, ROWID as id FROM simpletable ORDER BY record_id)
WHERE rownum<=1) a
JOIN simpletable ON a.id = simpletable.ROWID;
If you have a better solution, please let me know.
It would be very appreciated.
P.S.
The first two queries produced the following plan.
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2015 | 4 (25)| 00:00:01 |
|* 1 | COUNT STOPKEY | | | | | |
| 2 | VIEW | | 1 | 2015 | 4 (25)| 00:00:01 |
|* 3 | SORT ORDER BY STOPKEY| | 1 | 2015 | 4 (25)| 00:00:01 |
| 4 | TABLE ACCESS FULL | SIMPLETABLE | 1 | 2015 | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------------------
the final one is:
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2039 | 3 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 1 | 2039 | 3 (0)| 00:00:01 |
| 2 | VIEW | | 1 | 25 | 2 (0)| 00:00:01 |
|* 3 | COUNT STOPKEY | | | | | |
| 4 | VIEW | | 1 | 25 | 2 (0)| 00:00:01 |
| 5 | INDEX FULL SCAN | SYS_C007561 | 1 | 25 | 2 (0)| 00:00:01 |
| 6 | TABLE ACCESS BY USER ROWID| SIMPLETABLE | 1 | 2014 | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
I think using OFFSET..FETCH method might helps you here -
SELECT *
FROM simpletable
ORDER BY record_id
OFFSET 0 ROWS
FETCH FIRST ROW ONLY;
If the Cost Based Optimizer code has access to reliable statistics, from its dictionary, regarding all the objects available for this query, then it will very likely produce an optimal execution plan. Of course, there are exceptions and you would argue with their support people as to whether or not choosing a suboptimal plan is a bug.
In this specific case, if you are querying a single table and the CBO could choose between a full table scan and some other scan and then chose a full table scan, then chances were good that the CBO determined that the number of blocks scanned (buffer gets) would have been smaller using a full table scan.
You can expose the truth of the matter by tracing the execution of multiple versions of the statement, each one using a different set of hints to force a particular execution plan. You should consider the execution with the fewest buffer gets to be the winner. Alternatively, if the execution plan is of a serial nature, then you can use response time as measure. If the winner is not automatically chosen by the CBO, then it's probably because the statistics it used were not accurate and you should make them accurate. If the statistics are indeed accurate then Oracle support will probably give you a very long homework assignment.
Similar to the Horror Vacui, some database developers suffer under the Horror FULL TABLE SCAN by simply assuming index access good, full scan bad.
But this is not true, FULL TABLE SCAN is a normal access method, that is preferred in some situation.
Let's illustrate it on a simple example with 10K rows in your table
insert into simpletable (record_id, col)
select rownum, rpad('x',3998,'y')
from dual connect by level <= 10000
To get one arbitrary row from the table you simple use the following query
select * from simpletable where rownum = 1;
Here is the output (edited for brevity) you get from SQL*Plus with setting set autotrace traceonly to see the execution plan and the statistics.
Execution Plan
----------------------------------------------------------
Plan hash value: 1007892724
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2015 | 2 (0)| 00:00:01
|* 1 | COUNT STOPKEY | | | | |
| 2 | TABLE ACCESS FULL| SIMPLETABLE | 10188 | 19M| 2 (0)| 00:00:01
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM=1)
Statistics
----------------------------------------------------------
5 consistent gets
2 physical reads
1 rows processed
The most important information is the statistics consistent gets - there were only 5 blocks accessed - the table is much larger.
What is the explanation? See the operation COUNT STOPKEY above the TABLE ACCESS FULL this ensures that the scan is terminated after the first row is found.
If you want to get a specific row, e.g. the one with the highest ID, the prefered approach is using the row_limiting_clause
SELECT *
FROM simpletable
ORDER BY record_id DESC
OFFSET 0 ROWS FETCH NEXT 1 ROW ONLY;
You will see the execution plan below, that performs first the INDEX FULL SCAN DESCENDING. The complete (full) index will be red in the descending order, but again due to STOPKEY you break after reading the highest key (which is the first entry due to the descending order).
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10188 | 19M| 613 (0)| 00:00:01 |
|* 1 | VIEW | | 10188 | 19M| 613 (0)| 00:00:01 |
|* 2 | WINDOW NOSORT STOPKEY | | 10188 | 19M| 613 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| SIMPLETABLE | 10188 | 19M| 613 (0)| 00:00:01 |
| 4 | INDEX FULL SCAN DESCENDING| SYS_C008793 | 10188 | | 29 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=CASE WHEN (0>=0) THEN
0 ELSE 0 END +1 AND "from$_subquery$_002"."rowlimit_$$_rownumber">0)
2 - filter(ROW_NUMBER() OVER ( ORDER BY
INTERNAL_FUNCTION("SIMPLETABLE"."RECORD_ID") DESC )<=CASE WHEN (0>=0) THEN 0 ELSE 0
END +1)
Note if the table is empty or contains very few rows, you will se even here a TABLE ACCESS FULL because the optimizer recognises that it is more effective that to first go to the index and that access the table.

Oracle unique constraint violation

I have a table with over 30 million records. When doing insert, I need to avoid the Unique constraint violation.
When I use this NOT EXIST approach, the insert takes forever. In fact, it couldn't finish after 24 hours of running. And I can't use the ignore_row_on_dupkey_index hint, because this table has more than 1 PK columns.
Another option is to insert in subsets. But I want to know if there's any other way before I do sub-setting.
insert into tlb1 a
select * from tlb2 b
where not exists (select 'x' from tlb1 c
where b.pk = c.pk)
The important decision depends on the numbe rof row inserted, i.e. the number of the rows in the table TBL2
If this number is rather low (say in hundreds to thousands) you may use safely your approach, provided there is an index on the PK column(s) - whoch should be to enforce the unique constraint.
Please check that the used execution plan is something like the one below
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | 110 | 2860 | 113 (0)| 00:00:02 |
| 1 | LOAD TABLE CONVENTIONAL | TBL1 | | | | |
| 2 | NESTED LOOPS ANTI | | 110 | 2860 | 113 (0)| 00:00:02 |
| 3 | TABLE ACCESS FULL | TBL2 | 110 | 1430 | 3 (0)| 00:00:01 |
|* 4 | INDEX UNIQUE SCAN | TBL1_IXD | 1 | 13 | 1 (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("B"."PK"="C"."PK")
The NESTED LOOPS ANTI means that for each inserted row a single index lookup will be done to check if the key already exists in the target table.
This will work fine for a low number of inserted rows. For a large insert (millions rows) the optimizer will switch to a HASH JOIN RIGHT ANTI, i.e. all rows from both table will be joined to get th epossible duplicates.
This can take some time (but usually not 24 hours) and the approach with DML Error Logging which eliminates the need of the join.
INSERT INTO tbl1 (pk)
SELECT pk
FROM tbl3
LOG ERRORS INTO err$_tbl1 ('dedup tbl3') REJECT LIMIT UNLIMITED;
This approach will scale well especially when the number of the duplicates is low compared with the number of inserted rows. It is comparable to a normal insert:
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | 876K| 10M| 427 (1)| 00:00:06 |
| 1 | LOAD TABLE CONVENTIONAL | TBL1 | | | | |
| 2 | TABLE ACCESS FULL | TBL3 | 876K| 10M| 427 (1)| 00:00:06 |
---------------------------------------------------------------------------------

Fetching data is too slow in Oracle DB (query comparison)

I am hitting my head with the following problem:
I have a table with more than 1,000,000,000 data. Now I am running the following query (acc_no is the primary key):
select acc_no from user where acc_no between 753976276998100 and 78776276998199
The above query ran in less than a second and fetched 100,000 records
But if I add one more column ("service_no") in the same query,
select acc_no,service_no from user where acc_no between 753976276998100 and 78776276998199
.. it is taking more than a minute. Why is that? Why is the first query taking less than a second, and the second query is taking more than a minute?
FYI : service_no is a NUMBER column
If you look at the execution plan for both queries, you'll see that the first query is fulfilled with just an index range scan:
explain plan for
select acc_no from t42
where acc_no between 753976276998100 and 78776276998199;
select * from table (dbms_xplan.display);
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 10 | 0 (0)| |
|* 1 | FILTER | | | | | |
|* 2 | INDEX RANGE SCAN| SYS_C0090827 | 1 | 10 | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------------
... which can be quite fast; but the second query has an additional step, table access by index rowid:
explain plan for
select acc_no, service_no from t42
where acc_no between 753976276998100 and 78776276998199;
select * from table (dbms_xplan.display);
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 14 | 0 (0)| |
|* 1 | FILTER | | | | | |
| 2 | TABLE ACCESS BY INDEX ROWID| T42 | 1 | 14 | 3 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | SYS_C0090827 | 1 | | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
When you only query for columns that exist in the index - acc_no in this case, which is in the primary key's backing index - only the index has to be touched. There is no need to go and look at the underlying table data for the values you already have from the indexed column.
When your select list includes columns that are not in the index the table data has to be retrieved too, because the other column - service_no is not in the index. That is another disk operation access the data blocks in the table segments. The table data is likely to be scattered across more blocks than the index as well, which amplifies the effect as you might have to fetch a different block for every matching row.
Basically it's having to do much more work to access more data from the disk, so it's going to take longer.

Deadlock in SELECT FOR UPDATE query

I have table say
TAB1
ID, TARGET, STATE, NEXT
Column ID is the primary key.
The query is that is showing deadlock is similar to this
SELECT *
FROM TAB1
WHERE NEXT = (SELECT MIN(NEXT) FROM TAB1 WHERE TARGET=? AND STATE=?) FOR UPDATE
I did an explain plan I see something like this:
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 8095 | 6 (0)| 00:00:01 |
| 1 | FOR UPDATE | | | | | |
| 2 | BUFFER SORT | | | | | |
|* 3 | TABLE ACCESS FULL | TAB1 | 1 | 8095 | 3 (0)| 00:00:01 |
| 4 | SORT AGGREGATE | | 1 | 2083 | | |
|* 5 | TABLE ACCESS FULL| TAB1 | 1 | 2083 | 3 (0)| 00:00:01 |
Since the query is doing TABLE ACCESS FULL twice, so I'm suspecting 2 session executing the same query will access the rows in different orders.
Can indexing of columns will help in preventing the deadlock? Say creating an index on NEXT??? Or by changing the PRIMARY to NON CLUSTERED KEY?? Note: Normally, the table will have max 1000 rows.
Addind a non clustered index on the NEXT column would indeed boost your performance and reduce your deadlock issues.

Resources