I am banging my head against the following problem:
I have a table with more than 1,000,000,000 rows. Now I am running the following query (acc_no is the primary key):
select acc_no from user where acc_no between 753976276998100 and 78776276998199
The above query ran in less than a second and fetched 100,000 records
But if I add one more column ("service_no") in the same query,
select acc_no,service_no from user where acc_no between 753976276998100 and 78776276998199
... it takes more than a minute. Why is that? Why does the first query take less than a second while the second takes more than a minute?
FYI: service_no is a NUMBER column.
If you look at the execution plan for both queries, you'll see that the first query is fulfilled with just an index range scan:
explain plan for
select acc_no from t42
where acc_no between 753976276998100 and 78776276998199;
select * from table (dbms_xplan.display);
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 10 | 0 (0)| |
|* 1 | FILTER | | | | | |
|* 2 | INDEX RANGE SCAN| SYS_C0090827 | 1 | 10 | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------------
... which can be quite fast; but the second query has an additional step, table access by index rowid:
explain plan for
select acc_no, service_no from t42
where acc_no between 753976276998100 and 78776276998199;
select * from table (dbms_xplan.display);
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 14 | 0 (0)| |
|* 1 | FILTER | | | | | |
| 2 | TABLE ACCESS BY INDEX ROWID| T42 | 1 | 14 | 3 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | SYS_C0090827 | 1 | | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
When you only query for columns that exist in the index (acc_no in this case, which is in the primary key's backing index), only the index has to be touched. There is no need to go and look at the underlying table data for the values you already have from the indexed column.
When your select list includes columns that are not in the index, the table data has to be retrieved too; here that is service_no. That means an additional read, typically from disk, of the data blocks in the table segment for each rowid found in the index. The table data is likely to be scattered across more blocks than the index as well, which amplifies the effect, as you might have to fetch a different block for every matching row.
Basically it has to do much more work to access more data from disk, so it is going to take longer.
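If the second query has to be fast as well, one option is an index that contains both columns, so the range scan can again be answered from the index alone. This is only a minimal sketch, assuming the t42 table from the plans above; the index name t42_acc_service_ix is made up for illustration, and whether the extra index is worth its storage and DML overhead depends on your workload:

```sql
-- Hypothetical covering index: acc_no leads so the BETWEEN range scan still applies,
-- service_no is carried along so no table access by rowid should be needed.
CREATE INDEX t42_acc_service_ix ON t42 (acc_no, service_no);

-- Re-check the plan (literals copied verbatim from the question). With the index
-- in place the optimizer may choose an INDEX RANGE SCAN on it without the
-- TABLE ACCESS BY INDEX ROWID step.
EXPLAIN PLAN FOR
SELECT acc_no, service_no
FROM   t42
WHERE  acc_no BETWEEN 753976276998100 AND 78776276998199;

SELECT * FROM TABLE(dbms_xplan.display);
```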
Related
I have a table (table1) with monthly partitions on the dt column. I have created a local index on the dt column. When I run the query below, I see the optimizer going for a full partition scan instead of using the index on the dt column.
WITH
A AS
(
Select * from table1
WHERE
EXISTS (SELECT u_id FROM table2
WHERE u_id=UPPER('ABC'))
)
SELECT DISTINCT
A.id,
A.dt
FROM
A
WHERE
A.dt BETWEEN timestamp '2022-04-01 00:00:00' AND timestamp '2022-04-01 23:59:59.999000000'
Explain plan
----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 755K| 67M| | 34447 (1)| 00:00:02 | | |
| 1 | HASH UNIQUE | | 755K| 67M| 75M| 34447 (1)| 00:00:02 | | |
|* 2 | FILTER | | | | | | | | |
| 3 | PARTITION RANGE SINGLE| | 755K| 67M| | 18298 (1)| 00:00:01 | 5 | 5 |
|* 4 | TABLE ACCESS FULL | TABLE1 | 755K| 67M| | 18298 (1)| 00:00:01 | 5 | 5 |
|* 5 | INDEX UNIQUE SCAN | SYS_C0099684 | 1 | 15 | | 0 (0)| 00:00:01 | | |
----------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter( EXISTS (SELECT 0 FROM "TEST"."TABLE2"
"TABLE2" WHERE "U_ID"=U'ABC'))
4 - filter("TABLE1"."DT"<=TIMESTAMP' 2022-04-02 23:59:59.999000000' AND
"TABLE1"."DT">=TIMESTAMP' 2022-04-02 00:00:00.000000000')
5 - access("U_ID"=U'ABC')
First of all, I'm not sure that an INDEX RANGE SCAN would be better here:
We don't know your average row length, so a full partition scan using multiblock reads may well be faster than sequential single-block reads of the index and the partition (table access by rowid), especially considering that you need about 1/30 of the partition.
Just consider the following example:
partition size = 100 GB
rows in partition = 400 million
avg row length = 100 GB / 400M ≈ 250 bytes
Since we need 1 day, we should get ~400M/30 = 13.3M rows. That means that with an index range scan we need up to 13.3M single-block reads just to get the column "id" from the table by rowid (table access by rowid), and that is without counting the single-block reads required for the index scan itself. Assume that the average single-block read time on that system is 3 ms; you would then need about 13.3M * 3 ms = 40,000 seconds just to read the table rows via index access.
Now consider the full partition scan. If your db_file_multiblock_read_count = 64 (if I remember correctly, the default is 128) and the block size is the default 8 kB, you will need to perform 100 GB / (64 * 8 kB) = 195k multiblock reads. Assuming that a multiblock read takes 8 ms, the full partition scan would take just ~1,500 seconds.
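The same back-of-the-envelope estimate can be reproduced with a query against dual. All figures are the assumed ones from the example above (they are not measurements), and 100 GB is taken as 100 * 1,000,000 kB to keep the arithmetic simple:

```sql
-- Assumptions: 400M rows and 100 GB per partition, 1/30 of the rows needed,
-- 3 ms per single-block read, 8 ms per multiblock read of 64 blocks x 8 kB.
SELECT ROUND(400000000 / 30)                       AS rows_needed,
       ROUND(400000000 / 30 * 3 / 1000)            AS index_path_seconds,
       ROUND(100 * 1000000 / (64 * 8))             AS multiblock_reads,
       ROUND(100 * 1000000 / (64 * 8) * 8 / 1000)  AS full_scan_seconds
FROM   dual;
```

With these inputs the index path comes out at roughly 40,000 seconds and the full partition scan at roughly 1,560 seconds, in line with the estimate above. Plugging in your own row counts and I/O timings shows where the break-even point lies on your system.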
I'm using Oracle 18c, but I guess my question is not bound to a specific version.
I want to fetch rows from a table, but I have only found a complex, ugly solution.
I would like to know if there is a better, simpler query that returns the same result as the following.
First of all, I have a simple table like this.
Note that col is going to store large text.
CREATE TABLE simpletable
(record_id NUMBER,
col CLOB,
PRIMARY KEY (record_id));
I want to retrieve a single row from the above table; any row is acceptable.
The first query that came to my mind is the following.
SELECT * FROM (SELECT * FROM simpletable) WHERE rownum <= 1;
Another is the following.
SELECT * FROM (SELECT * FROM simpletable ORDER BY record_id) WHERE rownum <= 1;
Unfortunately, neither of the above two uses the primary-key index; both use TABLE ACCESS FULL, which can take a long time once the table grows large enough.
(I'm guessing that Oracle preferred the simpler plan because my table is not yet large enough to warrant an index scan.
Oracle might choose a different plan if the table grows further.)
My final solution, which uses the primary-key index to narrow down the table access, is the following.
SELECT simpletable.* FROM
(SELECT * FROM
(SELECT record_id, ROWID as id FROM simpletable ORDER BY record_id)
WHERE rownum<=1) a
JOIN simpletable ON a.id = simpletable.ROWID;
If you have a better solution, please let me know.
It would be much appreciated.
P.S.
The first two queries produced the following plan.
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2015 | 4 (25)| 00:00:01 |
|* 1 | COUNT STOPKEY | | | | | |
| 2 | VIEW | | 1 | 2015 | 4 (25)| 00:00:01 |
|* 3 | SORT ORDER BY STOPKEY| | 1 | 2015 | 4 (25)| 00:00:01 |
| 4 | TABLE ACCESS FULL | SIMPLETABLE | 1 | 2015 | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------------------
the final one is:
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2039 | 3 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 1 | 2039 | 3 (0)| 00:00:01 |
| 2 | VIEW | | 1 | 25 | 2 (0)| 00:00:01 |
|* 3 | COUNT STOPKEY | | | | | |
| 4 | VIEW | | 1 | 25 | 2 (0)| 00:00:01 |
| 5 | INDEX FULL SCAN | SYS_C007561 | 1 | 25 | 2 (0)| 00:00:01 |
| 6 | TABLE ACCESS BY USER ROWID| SIMPLETABLE | 1 | 2014 | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
I think the OFFSET..FETCH method might help you here:
SELECT *
FROM simpletable
ORDER BY record_id
OFFSET 0 ROWS
FETCH FIRST ROW ONLY;
If the Cost Based Optimizer code has access to reliable statistics, from its dictionary, regarding all the objects available for this query, then it will very likely produce an optimal execution plan. Of course, there are exceptions and you would argue with their support people as to whether or not choosing a suboptimal plan is a bug.
In this specific case, if you are querying a single table and the CBO could choose between a full table scan and some other scan and chose the full table scan, then chances are good that the CBO determined that the number of blocks scanned (buffer gets) would be smaller using a full table scan.
You can expose the truth of the matter by tracing the execution of multiple versions of the statement, each one using a different set of hints to force a particular execution plan. You should consider the execution with the fewest buffer gets to be the winner. Alternatively, if the execution plan is of a serial nature, then you can use response time as a measure. If the winner is not automatically chosen by the CBO, then it's probably because the statistics it used were not accurate and you should make them accurate. If the statistics are indeed accurate then Oracle support will probably give you a very long homework assignment.
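A sketch of such a comparison, assuming the simpletable from the question and a session that is allowed to query the cursor views. The alias t and the choice of hints are only illustrative; in SQL*Plus, run set serveroutput off first, otherwise display_cursor may report on the wrong statement:

```sql
-- Variant 1: force a full table scan and collect row-source statistics.
SELECT /*+ gather_plan_statistics FULL(t) */ *
FROM   simpletable t
WHERE  rownum <= 1;

-- Show the actual plan, including the Buffers (buffer gets) column,
-- for the statement that was just executed in this session.
SELECT * FROM TABLE(dbms_xplan.display_cursor(format => 'ALLSTATS LAST'));

-- Variant 2: ask for index access instead, then compare the Buffers column.
SELECT /*+ gather_plan_statistics INDEX(t) */ *
FROM   simpletable t
WHERE  rownum <= 1;

SELECT * FROM TABLE(dbms_xplan.display_cursor(format => 'ALLSTATS LAST'));
```

The variant with the lower Buffers value at the top of its plan is the cheaper one in terms of logical I/O.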
Similar to the Horror Vacui, some database developers suffer from Horror FULL TABLE SCAN, simply assuming that index access is good and a full scan is bad.
But this is not true; FULL TABLE SCAN is a normal access method that is preferred in some situations.
Let's illustrate it with a simple example with 10K rows in your table:
insert into simpletable (record_id, col)
select rownum, rpad('x',3998,'y')
from dual connect by level <= 10000
To get one arbitrary row from the table you simply use the following query:
select * from simpletable where rownum = 1;
Here is the output (edited for brevity) you get from SQL*Plus with set autotrace traceonly, which shows the execution plan and the statistics.
Execution Plan
----------------------------------------------------------
Plan hash value: 1007892724
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2015 | 2 (0)| 00:00:01
|* 1 | COUNT STOPKEY | | | | |
| 2 | TABLE ACCESS FULL| SIMPLETABLE | 10188 | 19M| 2 (0)| 00:00:01
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM=1)
Statistics
----------------------------------------------------------
5 consistent gets
2 physical reads
1 rows processed
The most important information is the consistent gets statistic: only 5 blocks were accessed, although the table is much larger.
What is the explanation? See the operation COUNT STOPKEY above the TABLE ACCESS FULL; it ensures that the scan is terminated after the first row is found.
If you want to get a specific row, e.g. the one with the highest ID, the preferred approach is to use the row_limiting_clause:
SELECT *
FROM simpletable
ORDER BY record_id DESC
OFFSET 0 ROWS FETCH NEXT 1 ROW ONLY;
You will see the execution plan below, which first performs the INDEX FULL SCAN DESCENDING. The complete (full) index would be read in descending order, but again due to STOPKEY you stop after reading the highest key (which is the first entry due to the descending order).
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10188 | 19M| 613 (0)| 00:00:01 |
|* 1 | VIEW | | 10188 | 19M| 613 (0)| 00:00:01 |
|* 2 | WINDOW NOSORT STOPKEY | | 10188 | 19M| 613 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| SIMPLETABLE | 10188 | 19M| 613 (0)| 00:00:01 |
| 4 | INDEX FULL SCAN DESCENDING| SYS_C008793 | 10188 | | 29 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=CASE WHEN (0>=0) THEN
0 ELSE 0 END +1 AND "from$_subquery$_002"."rowlimit_$$_rownumber">0)
2 - filter(ROW_NUMBER() OVER ( ORDER BY
INTERNAL_FUNCTION("SIMPLETABLE"."RECORD_ID") DESC )<=CASE WHEN (0>=0) THEN 0 ELSE 0
END +1)
Note that if the table is empty or contains very few rows, you will see a TABLE ACCESS FULL even here, because the optimizer recognises that it is more efficient than first going to the index and then accessing the table.
I have a table with over 30 million records. When doing an insert, I need to avoid unique constraint violations.
When I use this NOT EXISTS approach, the insert takes forever. In fact, it couldn't finish after 24 hours of running. And I can't use the ignore_row_on_dupkey_index hint, because this table has more than one PK column.
Another option is to insert in subsets. But I want to know if there's any other way before I resort to that.
insert into tlb1 a
select * from tlb2 b
where not exists (select 'x' from tlb1 c
where b.pk = c.pk)
The key decision depends on the number of rows inserted, i.e. the number of rows in the table TBL2.
If this number is rather low (say hundreds to thousands), you may safely use your approach, provided there is an index on the PK column(s), which there should be to enforce the unique constraint.
Please check that the execution plan used is something like the one below:
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | 110 | 2860 | 113 (0)| 00:00:02 |
| 1 | LOAD TABLE CONVENTIONAL | TBL1 | | | | |
| 2 | NESTED LOOPS ANTI | | 110 | 2860 | 113 (0)| 00:00:02 |
| 3 | TABLE ACCESS FULL | TBL2 | 110 | 1430 | 3 (0)| 00:00:01 |
|* 4 | INDEX UNIQUE SCAN | TBL1_IXD | 1 | 13 | 1 (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("B"."PK"="C"."PK")
The NESTED LOOPS ANTI means that for each inserted row a single index lookup is done to check whether the key already exists in the target table.
This works fine for a low number of inserted rows. For a large insert (millions of rows) the optimizer will switch to a HASH JOIN RIGHT ANTI, i.e. all rows from both tables are joined to find the possible duplicates.
This can take some time (but usually not 24 hours). An alternative is the approach with DML Error Logging, which eliminates the need for the join.
INSERT INTO tbl1 (pk)
SELECT pk
FROM tbl3
LOG ERRORS INTO err$_tbl1 ('dedup tbl3') REJECT LIMIT UNLIMITED;
This approach scales well, especially when the number of duplicates is low compared with the number of inserted rows. Its plan is comparable to that of a normal insert:
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | 876K| 10M| 427 (1)| 00:00:06 |
| 1 | LOAD TABLE CONVENTIONAL | TBL1 | | | | |
| 2 | TABLE ACCESS FULL | TBL3 | 876K| 10M| 427 (1)| 00:00:06 |
---------------------------------------------------------------------------------
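The LOG ERRORS statement above presupposes that the error log table err$_tbl1 already exists. A minimal sketch of how it could be created and inspected with the DBMS_ERRLOG package, assuming the target table is TBL1 in your own schema and that the default log table name is acceptable:

```sql
-- Create the error log table for TBL1; by default it is named ERR$_TBL1.
BEGIN
  DBMS_ERRLOG.create_error_log(dml_table_name => 'TBL1');
END;
/

-- After the INSERT ... LOG ERRORS statement, inspect what was rejected.
-- ORA_ERR_NUMBER$ = 1 corresponds to ORA-00001 (unique constraint violated).
SELECT ora_err_number$, ora_err_mesg$, pk
FROM   err$_tbl1
WHERE  ora_err_tag$ = 'dedup tbl3';
```

Note that the rejected rows stay in the log table until you delete them, so it is worth cleaning it up (or using a distinct tag per load, as above) between runs.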
I have a table foo which was created like this.
CREATE TABLE foo AS SELECT * FROM all_objects;
CREATE INDEX foo_I1 ON foo(owner,object_type,status);
exec dbms_stats.gather_table_stats('hr','foo',method_opt=>'FOR ALL COLUMNS size AUTO');
I created an index on 3 columns and am running queries like the ones below.
select * from foo where status='INVALID';
select * from foo where status='VALID';
status='VALID' fetches about 71,000 rows from a table of 71,780 rows. It does a full table scan, which is understandable. But status='INVALID', which fetches only 3 rows, also does a full table scan. The A-Rows and E-Rows values are also very different.
PLAN: same for both queries.
SQL_ID gdhy9j91gu9sm, child number 0
select /*+gather_plan_statistics */ * from foo where status='VALID'
Plan hash value: 1245013993
------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 50 |00:00:00.01 | 4 |
|* 1 | TABLE ACCESS FULL| FOO | 1 | 71773 | 50 |00:00:00.01 | 4 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("STATUS"='VALID')
Please explain this behaviour. Database version: Oracle 11.2.
A missing histogram is probably causing the full table scan. Histograms are usually only created if the data is skewed and if the column has been used in a relevant predicate.
Sometimes you need to run a query before gathering statistics, to let Oracle know that this column is important enough to deserve a histogram.
select * from foo where status='INVALID';
exec dbms_stats.gather_table_stats('hr','foo',method_opt=>'FOR ALL COLUMNS size AUTO');
Re-run the SELECT and now it can use the histogram. With the histogram Oracle knows that INVALID returns a small number of rows, and an index would be useful:
explain plan for select * from foo where status='INVALID';
select * from table(dbms_xplan.display);
Plan hash value: 1520589999
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 134 | 217 (0)| 00:00:01|
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| FOO | 1 | 134 | 217 (0)| 00:00:01|
|* 2 | INDEX SKIP SCAN | FOO_I1 | 1 | | 216 (0)| 00:00:01|
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("STATUS"='INVALID')
filter("STATUS"='INVALID')
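To verify whether the histogram is actually there, you can look at the column statistics in the data dictionary. The sketch below assumes you are connected as the owner of FOO (hr in the question). The second statement is an alternative I am adding for illustration, not part of the recipe above: it requests a histogram on STATUS explicitly instead of relying on column usage tracking.

```sql
-- Check whether STATUS has a histogram (HISTOGRAM = 'NONE' means it does not).
SELECT column_name, num_distinct, num_buckets, histogram
FROM   user_tab_col_statistics
WHERE  table_name  = 'FOO'
AND    column_name = 'STATUS';

-- Alternative: keep SIZE AUTO for all columns, but explicitly ask for
-- a histogram with up to 254 buckets on STATUS.
BEGIN
  dbms_stats.gather_table_stats(
    ownname    => 'hr',
    tabname    => 'foo',
    method_opt => 'FOR ALL COLUMNS SIZE AUTO FOR COLUMNS SIZE 254 status');
END;
/
```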
I have a table, say
TAB1
ID, TARGET, STATE, NEXT
Column ID is the primary key.
The query that is showing the deadlock is similar to this:
SELECT *
FROM TAB1
WHERE NEXT = (SELECT MIN(NEXT) FROM TAB1 WHERE TARGET=? AND STATE=?) FOR UPDATE
I did an explain plan and I see something like this:
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 8095 | 6 (0)| 00:00:01 |
| 1 | FOR UPDATE | | | | | |
| 2 | BUFFER SORT | | | | | |
|* 3 | TABLE ACCESS FULL | TAB1 | 1 | 8095 | 3 (0)| 00:00:01 |
| 4 | SORT AGGREGATE | | 1 | 2083 | | |
|* 5 | TABLE ACCESS FULL| TAB1 | 1 | 2083 | 3 (0)| 00:00:01 |
Since the query does TABLE ACCESS FULL twice, I suspect that two sessions executing the same query will access the rows in different orders.
Would indexing columns help prevent the deadlock? Say, creating an index on NEXT? Or changing the PRIMARY KEY to a NON CLUSTERED key? Note: normally, the table will have at most 1000 rows.
Adding a non-clustered index on the NEXT column would indeed boost your performance and reduce your deadlock issues.
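In Oracle there is no NONCLUSTERED keyword; a plain CREATE INDEX on a regular (heap) table already gives you a separate B-tree index. A minimal sketch, assuming the TAB1 table from the question; both index names are made up for illustration, and the second index is my own assumption rather than part of the suggestion above:

```sql
-- Plain B-tree index on NEXT, as suggested above (index name is hypothetical).
CREATE INDEX tab1_next_ix ON tab1 (next);

-- Optional: a composite index that may let the MIN(NEXT) subquery be answered
-- from the index instead of a full table scan (hypothetical, an assumption).
CREATE INDEX tab1_target_state_next_ix ON tab1 (target, state, next);
```

After creating the indexes, re-run the explain plan to confirm that the TABLE ACCESS FULL steps are gone; with only about 1000 rows the optimizer may still prefer a full scan.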