Auto trace statistics comparison - oracle

How can I tell which Oracle plan is good when comparing different queries that produce the same number of rows?
If I pick the query whose last_consistent_gets is low, I see that its elapsed time is higher.
And for the other query the elapsed time is lower but last_consistent_gets is higher.
It's very confusing.

The elapsed time is usually the most important metric for Oracle performance. In theory, we may occasionally want to sacrifice the run time of one SQL statement to preserve resources for other statements. In practice, those situations are rare.
In your specific case, there are many times when a statement that consumes more consistent gets is both faster and more efficient. For example, when retrieving a large percentage of data from a table, a full table scan is often more efficient than an index scan. A full table scan can use a multi-block read, which can be much more efficient than the multiple single-block reads of an index scan. Storage systems generally are much faster at reading large chunks of data than multiple small chunks.
The below example compares reading 25% of the data from a table. The index approach uses only half as many consistent gets, but it is also more than twice as slow.
Sample Schema
Create a simple table and index and gather stats.
create table test1(a number, b number);
insert into test1 select level, level from dual connect by level <= 1000000;
create index test1_ids on test1(a);
begin
dbms_stats.gather_table_stats(user, 'TEST1');
end;
/
Autotrace
The code below shows the full table scan consumes 2082 consistent gets and forcing an index access consumes 1078 consistent gets.
JHELLER#orclpdb> set autotrace on;
JHELLER#orclpdb> set linesize 120;
JHELLER#orclpdb> select sum(b) from test1 where a >= 750000;
SUM(B)
----------
2.1875E+11
Execution Plan
----------------------------------------------------------
Plan hash value: 3896847026
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 10 | 597 (3)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 10 | | |
|* 2 | TABLE ACCESS FULL| TEST1 | 250K| 2441K| 597 (3)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("A">=750000)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
2082 consistent gets
0 physical reads
0 redo size
552 bytes sent via SQL*Net to client
404 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
JHELLER#orclpdb> select /*+ index(test1) */ sum(b) from test1 where a >= 750000;
SUM(B)
----------
2.1875E+11
Execution Plan
----------------------------------------------------------
Plan hash value: 1247966541
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 10 | 1084 (1)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 10 | | |
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| TEST1 | 250K| 2441K| 1084 (1)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | TEST1_IDS | 250K| | 563 (1)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("A">=750000)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
1078 consistent gets
0 physical reads
0 redo size
552 bytes sent via SQL*Net to client
424 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
Performance
If you run the statements a hundred times in a loop (and run those loops multiple times to ignore caching and other system activity), the full table scan version runs much faster than the forced index scan version.
--Seconds to run plan with more consistent gets: 1.7, 1.7, 1.8
declare
v_count number;
begin
for i in 1 .. 100 loop
select sum(b) into v_count from test1 where a >= 750000;
end loop;
end;
/
--Seconds to run plan with fewer consistent gets: 4.5, 4.5, 4.5
declare
v_count number;
begin
for i in 1 .. 100 loop
select /*+ index(test1) */ sum(b) into v_count from test1 where a >= 750000;
end loop;
end;
/
Exceptions
There are some times when resource consumption is more important than elapsed time. For example, parallelism is kind of cheating in that it forces the system to work harder, not smarter. A single out-of-control parallel query can take down an entire system. There are also times when you need to break up statements into less efficient versions to decrease the amount of time something is locked, or to avoid consuming too much UNDO or temporary tablespace.
But the above examples are somewhat uncommon exceptions, and they generally only happen when dealing with data warehouses that query a large amount of data. For most OLTP systems, where every query takes less than a second, the elapsed time is the only metric you need to worry about.
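As a concrete illustration of trading efficiency for resource consumption, a large delete is often broken into batches: the extra commits and repeated scans make it slower overall, but each transaction holds its locks and UNDO for only a short time. A minimal sketch, using hypothetical table and column names (big_table, processed):

```sql
--Hypothetical batched delete: slower in total, but each transaction
--holds its locks and UNDO only briefly.
declare
v_rows number;
begin
loop
delete from big_table
where processed = 'Y'
and rownum <= 10000;   --cap each batch at 10,000 rows
v_rows := sql%rowcount; --capture before the commit resets it
commit;                 --release locks and UNDO after each batch
exit when v_rows = 0;
end loop;
end;
/
```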

Related

Oracle - strategy for large number of inserts but only using last insert in an OLTP environment

I am inserting 10000 rows into an Oracle OLTP table every 30 seconds. That is about 240MB of data every half an hour. All 10000 rows have the same timestamp, which I floor to a 30-second boundary. I also have 3 indexes, one of which is a spatial point geometry index (latitude and longitude). The timestamp is also indexed.
During a test the 2 CPUs showed 50% utilization and Input/Output showed 80%, with inserts doubling in duration after half an hour.
I also select from the table to get the last inserted 10000 rows, by using a sub-query to find the maximum timestamp (the insert and the select are two different processes: Python for the inserts and Google Maps for the select). I tried a strategy of using the current time to retrieve the last 10000 rows, but I could not get it to work, even when going for the second-to-last 10000 rows. It often returned no rows.
My question is: how can I retrieve the last inserted 10000 rows efficiently, and what type of index and/or table would be most appropriate given that all 10000 rows have the same timestamp value? Keeping the insert time low, and stopping it from doubling in duration, is of even greater importance, so I am not sure whether a history table is needed in addition, keeping only the last rows in the current table; but surely that will double the amount of IO, which seems to be the biggest issue currently. Any advice will be appreciated.
The database can "walk" down the "right hand side" of an index to very quickly get the maximum value. Here's an example
SQL> create table t ( ts date not null, x int, y int, z int );
Table created.
SQL>
SQL> begin
2 for i in 1 .. 100
3 loop
4 insert into t
5 select sysdate, rownum, rownum, rownum
6 from dual
7 connect by level <= 10000;
8 commit;
9 end loop;
10 end;
11 /
PL/SQL procedure successfully completed.
SQL>
SQL> create index ix on t (ts );
Index created.
SQL>
SQL> set autotrace on
SQL> select max(ts) from t;
MAX(TS)
---------
12-JUN-20
1 row selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1223533863
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 9 | 3 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 9 | | |
| 2 | INDEX FULL SCAN (MIN/MAX)| IX | 1 | 9 | 3 (0)| 00:00:01 |
-----------------------------------------------------------------------------------
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
Statistics
----------------------------------------------------------
6 recursive calls
0 db block gets
92 consistent gets
8 physical reads
0 redo size
554 bytes sent via SQL*Net to client
383 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
So 92 consistent gets is pretty snappy... However, you can probably do better by jumping straight to the very last leaf block with a descending index read, eg
SQL> select *
2 from (
3 select ts from t order by ts desc
4 )
5 where rownum = 1;
TS
---------
12-JUN-20
1 row selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3852867534
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 9 | 3 (0)| 00:00:01 |
|* 1 | COUNT STOPKEY | | | | | |
| 2 | VIEW | | 1184K| 10M| 3 (0)| 00:00:01 |
| 3 | INDEX FULL SCAN DESCENDING| IX | 1184K| 10M| 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM=1)
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
Statistics
----------------------------------------------------------
9 recursive calls
5 db block gets
9 consistent gets
0 physical reads
1024 redo size
549 bytes sent via SQL*Net to client
430 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
So your current index is fine. Simply get the highest timestamp as per the above and you're good to go.
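To then fetch the full set of rows that carry that latest timestamp, one sketch (reusing the t table from the example above; this is one way of doing it, not necessarily the only one) is to feed the MIN/MAX result into an equality predicate:

```sql
--Get every row whose ts equals the current maximum.
--The subquery is answered by the cheap INDEX FULL SCAN (MIN/MAX),
--then the outer query range-scans the same index for that one value.
select *
from t
where ts = (select max(ts) from t);
```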

How to populate same value for multiple rows using pl/sql

How do I populate the same data into multiple rows if the employee id is the same, without querying the table every time?
E.g.
If I get below rows from the Employee Table
EMPLID CHANGETIME
------ --------------
1234 8/10/2017
1234 8/11/2017
For the above employee I need to query the NAME table to get the names and populate both rows.
EMPLID CHANGETIME FirstNAME LastNAME
------ ---------- --------- --------
1234 08/10/17 JOHN MATHEW
1234 08/11/17 JOHN MATHEW
When I query the first time I would like to store the result in an array or some variable, and reuse it when the EMPLID matches the previous one.
I just want to do this to improve performance. Any hint would be helpful.
Right now I'm using bulk insert into a type table, and it goes and searches the NAME table every time a row is fetched from the EMPLOYEE table.
I would use a join to get the employee name (also within PL/SQL), like:
SELECT e.emplid, e.first_name, e.last_name, c.changetime
FROM employee_changes c
INNER JOIN employee e ON e.emplid = c.emplid
WHERE c.change_time > sysdate - 30
ORDER BY e.emplid, c.change_time
The select can be used as a cursor if you want to.
I think you need some extra criteria, like "the last change time".
In that case you can code something like this:
SELECT e.EMPLID, e.CHANGETIME, n.FirstNAME, n.LastNAME
FROM Employee e, NAME n
WHERE e.emplid = n.emplid
AND e.changetime = (SELECT MAX(e1.changetime)
FROM Employee e1
WHERE e1.emplid = e.emplid);
"For each Employee, just get the max changetime"
If you are using 11g you should consider using the Result Cache feature. This allows us to define functions whose returned values are stored in memory. Something like this:
create or replace function get_name
(p_empid pls_integer)
return varchar2
result_cache relies_on (names)
is
return_value varchar2(30);
begin
select empname into return_value
from names
where empid = p_empid;
return return_value;
end get_name;
/
Note the RELIES_ON clause: this is optional, but it makes the point that caching is only useful for slowly changing tables. If there's a lot of churn in the NAMES table, Oracle will keep flushing the cache and you won't get much benefit. But at least the results will be correct, something which you can't guarantee with your current approach.
Here is a sample. Ignore the elapsed times, but mark how little effort is required to get the same employee name on the second call:
SQL> set autotrace on
SQL> set timing on
SQL> select get_name(7934) from dual
2 /
GET_NAME(7934)
------------------------------------------------------------------------------------------------------------------------------------------------------
MILLER
Elapsed: 00:00:01.25
Execution Plan
----------------------------------------------------------
Plan hash value: 1388734953
-----------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-----------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2 (0)| 00:00:01 |
| 1 | FAST DUAL | | 1 | 2 (0)| 00:00:01 |
-----------------------------------------------------------------
Statistics
----------------------------------------------------------
842 recursive calls
166 db block gets
1074 consistent gets
44 physical reads
30616 redo size
499 bytes sent via SQL*Net to client
437 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
134 sorts (memory)
0 sorts (disk)
1 rows processed
SQL>
Second call:
SQL> r
1* select get_name(7934) from dual
GET_NAME(7934)
------------------------------------------------------------------------------------------------------------------------------------------------------
MILLER
Elapsed: 00:00:00.13
Execution Plan
----------------------------------------------------------
Plan hash value: 1388734953
-----------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-----------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2 (0)| 00:00:01 |
| 1 | FAST DUAL | | 1 | 2 (0)| 00:00:01 |
-----------------------------------------------------------------
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
0 consistent gets
0 physical reads
0 redo size
499 bytes sent via SQL*Net to client
437 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL>
The documentation has lots on this nifty feature. Find out more.

Oracle is not using the Indexes

I have a very large table in Oracle 11g that has a very simple index on a char field (that is normally Y or N).
If I just execute the query below, it takes around 10s to return:
select QueueId, QueueSiteId, QueueData from queue where QueueProcessed = 'N'
However, if I force it to use the index I created, it takes 80ms:
select /*+ INDEX(avaqueue QUEUEPROCESSED_IDX) */ QueueId, QueueSiteId, QueueData
from queue where QueueProcessed = 'N'
Also, if I run explain plan for both, as below:
explain plan for select QueueId, QueueSiteId, QueueData
from queue where QueueProcessed = 'N'
and
explain plan for select /*+ INDEX(avaqueue QUEUEPROCESSED_IDX) */
QueueId, QueueSiteId, QueueData
from queue where QueueProcessed = 'N'
For the first plan I got:
------------------------------------------------------------------------------
Plan hash value: 803924726
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 691K| 128M| 12643 (1)| 00:02:32 |
|* 1 | TABLE ACCESS FULL| AVAQUEUE | 691K| 128M| 12643 (1)| 00:02:32 |
------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("QUEUEPROCESSED"='N')
For the second plan I got:
Plan hash value: 2012309891
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 691K| 128M| 24386 (1)| 00:04:53 |
| 1 | TABLE ACCESS BY INDEX ROWID| AVAQUEUE | 691K| 128M| 24386 (1)| 00:04:53 |
|* 2 | INDEX RANGE SCAN | QUEUEPROCESSED_IDX | 691K| | 1297 (1)| 00:00:16 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("QUEUEPROCESSED"='N')
------------------------------------------------------------------------------
This proves that if I don't explicitly tell Oracle to use the index, it does not use it. My question is: why is Oracle not using this index? Oracle is normally smart enough to make decisions 10 times better than mine, and this is the first time I have actually had to force Oracle to use an index. I am not very comfortable with it.
Does anyone have a good explanation for Oracle's decision not to use the index in this very explicit case?
The QueueProcessed column is probably missing a histogram so Oracle does not know the data is skewed.
If Oracle does not know the data is skewed it will assume the equality predicate, QueueProcessed = 'N', returns DBA_TABLES.NUM_ROWS / DBA_TAB_COLUMNS.NUM_DISTINCT rows. The optimizer thinks the query returns half the rows in the table. Based on the 80ms return time, the real number of rows returned is small.
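You can reproduce that default estimate yourself. A sketch (assuming access to the DBA_ views; the owner name is hypothetical, substitute your own owner, table, and column names):

```sql
--Optimizer's default cardinality guess for an equality predicate
--with no histogram: total rows divided by distinct values.
select t.num_rows / c.num_distinct as estimated_rows
from dba_tables t
join dba_tab_columns c
on c.owner = t.owner and c.table_name = t.table_name
where t.owner = 'MYSCHEMA'  --hypothetical owner
and t.table_name = 'AVAQUEUE'
and c.column_name = 'QUEUEPROCESSED';
```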
Index range scans generally only work well when they select a small percentage of the rows. Index range scans read from a data structure one block at a time. And if the data is randomly distributed, it may need to read every block of data from the table anyway. For those reasons, if the query accesses a large portion of the table, it is more efficient to use a multi-block full table scan.
The bad cardinality estimate from the skewed data causes Oracle to think a full table scan is better. Creating a histogram will fix the issue.
Sample schema
Create a table, fill it with skewed data, and gather statistics the first time.
drop table queue;
create table queue(
queueid number,
queuesiteid number,
queuedata varchar2(4000),
queueprocessed varchar2(1)
);
create index QUEUEPROCESSED_IDX on queue(queueprocessed);
--Skewed data - only 100 of the 100000 rows are set to N.
insert into queue
select level, level, level, decode(mod(level, 1000), 0, 'N', 'Y')
from dual connect by level <= 100000;
begin
dbms_stats.gather_table_stats(user, 'QUEUE');
end;
/
The first execution will have the problem.
In this case the default statistics settings do not gather histograms the first time. The plan shows a full table scan and estimates Rows=50000, exactly half.
explain plan for
select QueueId, QueueSiteId, QueueData
from queue where QueueProcessed = 'N';
select * from table(dbms_xplan.display);
Plan hash value: 1157425618
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 50000 | 878K| 103 (1)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| QUEUE | 50000 | 878K| 103 (1)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("QUEUEPROCESSED"='N')
Create a histogram
The default statistics settings are usually sufficient. Histograms may not be collected for several reasons. They may be manually disabled - check for the tasks, jobs, or preferences set by the DBA.
Also, histograms are only automatically collected on columns that are both skewed and used. Gathering histograms can take time, so there's no need to create a histogram on a column that is never used in a relevant predicate. Oracle tracks when a column is used and could benefit from a histogram, although that data is lost if the table is dropped.
Running a sample query and re-gathering statistics will make the histogram appear:
select QueueId, QueueSiteId, QueueData
from queue where QueueProcessed = 'N';
begin
dbms_stats.gather_table_stats(user, 'QUEUE');
end;
/
Now the Rows=100 and the Index is used.
explain plan for
select QueueId, QueueSiteId, QueueData
from queue where QueueProcessed = 'N';
select * from table(dbms_xplan.display);
Plan hash value: 2630796144
----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100 | 1800 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| QUEUE | 100 | 1800 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | QUEUEPROCESSED_IDX | 100 | | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("QUEUEPROCESSED"='N')
Here's the histogram:
select column_name, histogram
from dba_tab_columns
where table_name = 'QUEUE'
order by column_name;
COLUMN_NAME HISTOGRAM
----------- ---------
QUEUEDATA NONE
QUEUEID NONE
QUEUEPROCESSED FREQUENCY
QUEUESITEID NONE
Manually create the histogram
Try to determine why the histogram was missing. Check that statistics are gathered with the defaults, there are no weird column or table preferences, and that table is not constantly dropped and re-loaded.
If you cannot rely on the default statistics job for your process you can manually gather histograms with the method_opt parameter like this:
begin
dbms_stats.gather_table_stats(user, 'QUEUE', method_opt=>'for columns size 254 queueprocessed');
end;
/
The answer - at least the first one that will just lead to more questions - is right there in the plans. The first plan has an estimated cost and estimated execution time about half that of the second plan. In the absence of the hint, Oracle is choosing the plan that it thinks will run faster.
So of course the next question is: why is its estimate so far off in this case? Not only are the estimated times wrong relative to each other, both are much greater than what you actually experience when running the query.
The first thing I would look at is the estimated number of rows returned. The optimizer is guessing, in both cases, that there are about 691,000 rows in the table matching your predicate. Is this close to the truth, or very far off? If it's far off, then refreshing statistics may be the right solution. Although if the column only has two possible values, I'd be kind of surprised if the existing stats are so off base.

Monitor index usage

Is there a way to find out if a particular Oracle index was ever used by Oracle when executing a query?
We have a function-based index which I suspect is not getting used by Oracle, and hence some queries are running slow. How could I find out if any query run against the database is using this index?
If the question is: are there any queries that ever use the index?
ALTER INDEX myindex MONITORING USAGE;
Wait a few days/months/years:
SELECT *
FROM v$object_usage
WHERE index_name = 'MYINDEX';
http://docs.oracle.com/cd/B28359_01/server.111/b28310/indexes004.htm#i1006905
If you're using some sort of IDE (e.g. Oracle's SQL Developer, PL/SQL Developer from Allround Automations, Toad, etc) each one of them has some way to dump the plan for a statement - poke around in the menus and the on-line help.
If you can get into SQL*Plus (try typing "sqlplus" at your friendly command line) you can turn autotrace on, execute your statement, and the plan should be printed. As in
SQL> set autotrace on
SQL> select * from dept where deptno = 40;
DEPTNO DNAME LOC
---------- -------------- -------------
40 OPERATIONS BOSTON
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=18)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'DEPT' (Cost=1 Card=1 Bytes=18)
2 1 INDEX (UNIQUE SCAN) OF 'PK_DEPT' (UNIQUE)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
2 consistent gets
0 physical reads
0 redo size
499 bytes sent via SQL*Net to client
503 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
This assumes that your friendly neighborhood DBA has performed the necessary incantations to enable this feature. If this hasn't been done, or you just want One More Way (tm) to do this, try something like the following, substituting the query you care about:
SQL> EXPLAIN PLAN FOR select * from dept where deptno = 40;
Explained.
SQL> set linesize 132
SQL> SELECT * FROM TABLE( dbms_xplan.display);
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------
Plan hash value: 2852011669
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 20 | 1 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| DEPT | 1 | 20 | 1 (0)| 00:00:01 |
|* 2 | INDEX UNIQUE SCAN | PK_DEPT | 1 | | 0 (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("DEPTNO"=40)
14 rows selected.
Share and enjoy.

is there a tricky way to optimize this query

I'm working on a table that has 3008698 rows
exam_date is a DATE field.
But the queries I run want to match only the month part. So what I do is:
select * from my_big_table where to_number(to_char(exam_date, 'MM')) = 5;
which I believe takes long because of the function on the column. Is there a way to avoid this and make it faster, other than making changes to the table? exam_date in the table has different date values, like 01-OCT-10 or 12-OCT-10, and so on.
I don't know Oracle, but what about doing
WHERE exam_date BETWEEN first_of_month AND last_of_month
where the two dates are constant expressions.
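In Oracle syntax that idea looks something like the sketch below, assuming you want one specific month such as May 2010 (if you need May of every year, a plain range predicate won't help, and a function-based index, discussed further down, is a better fit):

```sql
--Half-open date range: no function is wrapped around exam_date,
--so a normal index on the column remains usable.
select *
from my_big_table
where exam_date >= date '2010-05-01'
and exam_date < date '2010-06-01';
```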
select * from my_big_table where MONTH(exam_date) = 5
oops.. Oracle huh?..
select * from my_big_table where EXTRACT(MONTH from exam_date) = 5
Bear in mind that since you want approximately 1/12th of all the data, it may well be more efficient for Oracle to perform a full table scan anyway. This may explain why performance was worse when you followed harpo's advice.
Why? Suppose your data is such that 20 rows fit on each database block (on average), so that you have a total of 3,000,000/20 = 150,000 blocks. That means a full table scan will require 150,000 block reads. Now about 1/12th of the 3,000,000 rows will be for month 05. 3,000,000/12 is 250,000. So that's 250,000 table reads if you use the index - and that's ignoring the index reads that will also be required. So in this example the full table scan does a lot less work than the indexed search.
Bear in mind that there are only twelve distinct values for MONTH. So unless you have a strongly clustered set of records (say if you use partitioning) it is possible that using an index is not necessarily the most efficient way of querying in this fashion.
I didn't find that using EXTRACT() led the optimizer to use a regular index on my date column, but YMMV:
SQL> create index big_d_idx on big_table(col3) compute statistics
2 /
Index created.
SQL> set autotrace traceonly explain
SQL> select * from big_table
2 where extract(MONTH from col3) = 'MAY'
3 /
Execution Plan
----------------------------------------------------------
Plan hash value: 3993303771
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 23403 | 1028K| 4351 (3)| 00:00:53 |
|* 1 | TABLE ACCESS FULL| BIG_TABLE | 23403 | 1028K| 4351 (3)| 00:00:53 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(EXTRACT(MONTH FROM INTERNAL_FUNCTION("COL3"))=TO_NUMBER('M
AY'))
SQL>
What definitely can persuade the optimizer to use an index in these scenarios is building a function-based index:
SQL> create index big_mon_fbidx on big_table(extract(month from col3))
2 /
Index created.
SQL> select * from big_table
2 where extract(MONTH from col3) = 'MAY'
3 /
Execution Plan
----------------------------------------------------------
Plan hash value: 225326446
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 23403 | 1028K| 475 (0)|00:00:06|
| 1 | TABLE ACCESS BY INDEX ROWID| BIG_TABLE | 23403 | 1028K| 475 (0)|00:00:06|
|* 2 | INDEX RANGE SCAN | BIG_MON_FBIDX | 9361 | | 382 (0)|00:00:05|
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access(EXTRACT(MONTH FROM INTERNAL_FUNCTION("COL3"))=TO_NUMBER('MAY'))
SQL>
The function call means that Oracle won't be able to use any index that might be defined on the column.
Either remove the function call (as in harpo's answer) or use a function based index.