Oracle scalar function in WHERE clause leads to poor performance - oracle

I've written a scalar function (DYNAMIC_DATE) that converts a text value to a date/time. For example, DYANMIC_DATE('T-1') (T-1 = today minus 1 = 'yesterday') returns 08-AUG-2012 00:00:00. It also accepts date strings: DYNAMIC_DATE('10/10/1974').
The function makes use of CASE statements to parse the sole parameter and calculate a date relative to sysdate.
While it doesn't make use of any table in its schema, it does make use of TABLE type to store date-format strings:
TYPE VARCHAR_TABLE IS TABLE OF VARCHAR2(10);
formats VARCHAR_TABLE := VARCHAR_TABLE ('mm/dd/rrrr','mm-dd-rrrr','rrrr/mm/dd','rrrr-mm-dd');
When I use the function in the SELECT clause, the query returns in < 1 second:
SELECT DYNAMIC_DATE('MB-1') START_DATE, DYNAMIC_DATE('ME-1') END_DATE
FROM DUAL
If I use it against our date dimension table (91311 total records), the query completes in < 1 second:
SELECT count(1)
from date_dimension
where calendar_dt between DYNAMIC_DATE('MB-1') and DYNAMIC_DATE('ME-1')
Others, however, are having problems with the function if it is used against a larger table (26,301,317 records):
/*
cost: 148,840
records: 151,885
time: ~20 minutes
*/
SELECT count(1)
FROM ORDERS ord
WHERE trunc(ord.ordering_date) between DYNAMIC_DATE('mb-1') and DYNAMIC_DATE('me-1')
However, the same query, using 'hard coded' dates, returns fairly rapidly:
/*
cost: 144,257
records: 151,885
time: 62 seconds
*/
SELECT count(1)
FROM ORDERS ord
WHERE trunc(ord.ordering_date) between to_date('01-JUL-2012','dd-mon-yyyy') AND to_date('31-JUL-2012','dd-mon-yyyy')
The vendor's vanilla installation doesn't include an index on the ORDERING_DATE field.
The explain plans for both queries are similar:
with function:
with hard-coded dates:
Is the DYNAMIC_DATE function being called repeatedly in the WHERE clause?
What else might explain the disparity?
** edit **
A NONUNIQUE index was added to ORDERS table. Both queries execute in < 1 second. Both plans are the same (approach), but the one with the function is lower cost.
I removed the DETERMINISTIC keyword from the function; the query executed in < 1 second.
Is the issue really with the function or was it related to the table?
3 years from now, when this table is even larger, and if I don't include the DETERMINISTIC keyword, will query performance suffer?
Will the DETERMINISTIC keyword have any affect on the function's results? If I run DYNAMIC_DATE('T-1') tomorrow, will I get the same results as if I ran it today (08/09/2012)? If so, this approach won't work.

If the steps of the plan are identical, then the total amount of work being done should be identical. If you trace the session (something simple like set autotrace on in SQL*Plus or something more sophisticated like an event 10046 trace), or if you look at DBA_HIST_SQLSTAT assuming you have licensed access to the AWR tables, are you seeing (roughly) the same amount of logical I/O and CPU consumption for the two queries? Is it possible that the difference in runtime you are seeing is the result of the data being cached when you run the second query?

I am guessing that the problem isn't with your function. Try creating a function based index on trunc(ord.ordering_date) and see the explain plans.
CREATE INDEX ord_date_index ON ord(trunc(ord.ordering_date));

Related

Oracle Parameterized Query Performance

Execution time differs too much between the queries below. These are the generated queries from an app using Entity Framework.
The first one is non-parameterized query that takes 0,559 seconds.
SELECT
"Project1"."C2" AS "C1",
"Project1"."C1" AS "C2",
"Project1"."KEYFIELD" AS "KEYFIELD"
FROM ( SELECT
"Extent1"."KEYFIELD" AS "KEYFIELD",
CAST( "Extent1"."LOCALDT" AS date) AS "C1",
2 AS "C2"
FROM "MYTABLE" "Extent1"
WHERE (
("Extent1"."LOCALDT" >= to_timestamp('2017-01-01','YYYY-MM-DD')) AND
("Extent1"."LOCALDT" <= to_timestamp('2018-01-01','YYYY-MM-DD'))
)
) "Project1"
ORDER BY "Project1"."C1" DESC;
The other one has parameterized WHERE clause. It takes 18,372 seconds to fetch the data:
SELECT
"Project1"."C2" AS "C1",
"Project1"."C1" AS "C2",
"Project1"."KEYFIELD" AS "KEYFIELD"
FROM ( SELECT
"Extent1"."KEYFIELD" AS "KEYFIELD",
CAST( "Extent1"."LOCALDT" AS date) AS "C1",
2 AS "C2"
FROM "MYTABLE" "Extent1"
WHERE (
("Extent1"."LOCALDT" >= :p__linq__0) AND
("Extent1"."LOCALDT" <= :p__linq__1)
)
) "Project1"
ORDER BY "Project1"."C1" DESC;
I know that parameterized queries are pretty useful for caching. How can I find the way to improve the performance of the parameterized query?
"parameterized queries are pretty useful for caching"
Just to be clear, when we use bind variables what gets cached is the parsed query and the execution plan. The assumption is that given a query like ...
where col1 = :p1
and col2 = :p2
... the same plan works as well when :p1 = 23 and :p2 = 42 as when :p1 = 42 and :p2 = 23. If our data has an even distribution then the assumption holds good. But if our data has some form of skew we may end up with a plan which works well for one specific combination of values but is rubbish for most of the other queries our users need to run. This is a phenomenon known as bind variable peeking.
Date range queries are a notorious case in point. Your first query provides values that will match records for a well defined range. Presuming that retrieves a narrow slice of the table. However, with the second query the specified date range could be anything: a day, a week, a month, a year, a - well you get the picture.
The upshot is, an index range scan could be very efficient for the first query and shocking for the second.
To understand more you need to explore the specific query:
Run explain plans for the two versions of the queries, understand the differences. (Make sure you're working with realistic (production-like) data: not just volumes but distribution and skew as well.
Check the statistics are accurate, and consider whether refreshing them might help.
Understand the skew of the data, and check whether you are suffering from bind variable peeking. Perhaps you need to look at adaptive cursors.
Alternatively you may need to avoid using bind variables. Especially with date ranged queries on large tables it is not unusual to pass actual values for the date arguments. The cost of parsing the query each time it is executed is offset by getting the best plan for each set of parameters.
In short, we should understand our data and the way our users need to work with it, then write queries accordingly.

Oracle difference between !=(<>) and not in

Today I heard that a query with <> will take more time to execute than one with not in.
I tried to test this and with an equal plan had the following time results:
select * from test_table where test <> 'test'
0,063 seconds
select * from test_table where test not in ('test')
0,073 seconds
So the question is, what is the difference between <> and not in for a single condition and what is better to use.
Whether or not the column is indexed, I would expect both queries to perform a full scan on the table, i.e the query plan is essentially the same. The small timing difference you noted is probably insignificant - run the same query more than once and you will get different timings.
Having said that I would use <> because it is more natural.

oracle : how to ensure that a function in the where clause will be called only after all the remaining where clauses have filtered the result?

I am writing a query to this effect:
select *
from players
where player_name like '%K%
and player_rank<10
and check_if_player_is_eligible(player_name) > 1;
Now, the function check_if_player_is_eligible() is heavy and, therefore, I want the query to filter the search results sufficiently and then only run this function on the filtered results.
How can I ensure that the all filtering happens before the function is executed, so that it runs the minimum number of times ?
Here's two methods where you can trick Oracle into not evaluating your function before all the other WHERE clauses have been evaluated:
Using rownum
Using the pseudo-column rownum in a subquery will force Oracle to "materialize" the subquery. See for example this askTom thread for examples.
SELECT *
FROM (SELECT *
FROM players
WHERE player_name LIKE '%K%'
AND player_rank < 10
AND ROWNUM >= 1)
WHERE check_if_player_is_eligible(player_name) > 1
Here's the documentation reference "Unnesting of Nested Subqueries":
The optimizer can unnest most subqueries, with some exceptions. Those exceptions include hierarchical subqueries and subqueries that contain a ROWNUM pseudocolumn, one of the set operators, a nested aggregate function, or a correlated reference to a query block that is not the immediate outer query block of the subquery.
Using CASE
Using CASE you can force Oracle to only evaluate your function when the other conditions are evaluated to TRUE. Unfortunately it involves duplicating code if you want to make use of the other clauses to use indexes as in:
SELECT *
FROM players
WHERE player_name LIKE '%K%'
AND player_rank < 10
AND CASE
WHEN player_name LIKE '%K%'
AND player_rank < 10
THEN check_if_player_is_eligible(player_name)
END > 1
There is the NO_PUSH_PRED hint to do it without involving rownum evaluation (that is a good trick anyway) in the process!
SELECT /*+NO_PUSH_PRED(v)*/*
FROM (
SELECT *
FROM players
WHERE player_name LIKE '%K%'
AND player_rank < 10
) v
WHERE check_if_player_is_eligible(player_name) > 1
You usually want to avoid forcing a specific order of execution. If the data or the query changes, your hints and tricks may backfire. It's usually better to provide useful metadata to Oracle so it can make the correct decisions for you.
In this case, you can provide better optimizer statistics about the function with ASSOCIATE STATISTICS.
For example, if your function is very slow because it has to read 50 blocks each time it is called:
associate statistics with functions
check_if_player_is_eligible default cost(1000 /*cpu*/, 50 /*IO*/, 0 /*network*/);
By default Oracle assumes that a function will select a row 1/20th of the time. Oracle wants to eliminate as many rows as soon
as possible, changing the selectivity should make the function less likely to be executed first:
associate statistics with functions
check_if_player_is_eligible default selectivity 90;
But this raises some other issues. You have to pick a selectivity for ALL possible conditions, 90% certainly won't always be accurate. The IO cost is the number of blocks fetched, but CPU cost is "machine instructions used", what exactly does that mean?
There are more advanced ways to customize statistics,for example using the Oracle Data Cartridge Extensible Optimizer. But data cartridge is probably one of the most difficult Oracle features.
You did't specify whether player.player_name is unique or not. One could assume that it is and then the database has to call the function at least once per result record.
But, if player.player_name is not unique, you would want to minimize the calls down to count(distinct player.player_name) times. As (Ask)Tom shows in Oracle Magazine, the scalar subquery cache is an efficient way to do this.
You would have to wrap your function call into a subselect in order to make use of the scalar subquery cache:
SELECT players.*
FROM players,
(select check_if_player_is_eligible(player.player_name) eligible) subq
WHERE player_name LIKE '%K%'
AND player_rank < 10
AND ROWNUM >= 1
AND subq.eligible = 1
Put the original query in a derived table then place the additional predicate in the where clause of the derived table.
select *
from (
select *
from players
where player_name like '%K%
and player_rank<10
) derived_tab1
Where check_if_player_is_eligible(player_name) > 1;

Oracle 8i date function slow

I'm trying to run the following PL/SQL on an Oracle 8i server (old, I know):
select
-- stuff --
from
s_doc_quote d,
s_quote_item i,
s_contact c,
s_addr_per a,
cx_meter_info m
where
d.row_id = i.sd_id
and d.con_per_id = c.row_id
and i.ship_per_addr_id = a.row_id(+)
and i.x_meter_info_id = m.row_id(+)
and d.x_move_type in ('Move In','Move Out','Move Out / Move In')
and i.prod_id in ('1-QH6','1-QH8')
and d.created between add_months(trunc(sysdate,'MM'), -1) and sysdate
;
Execution is incredibly slow however. Because the server is taken down around midnight each night, it often fails to complete in time.
The execution plan is as follows:
SELECT STATEMENT 1179377
NESTED LOOPS 1179377
NESTED LOOPS OUTER 959695
NESTED LOOPS OUTER 740014
NESTED LOOPS 520332
INLIST ITERATOR
TABLE ACCESS BY INDEX ROWID S_QUOTE_ITEM 157132
INDEX RANGE SCAN S_QUOTE_ITEM_IDX8 8917
TABLE ACCESS BY INDEX ROWID S_DOC_QUOTE 1
INDEX UNIQUE SCAN S_DOC_QUOTE_P1 1
TABLE ACCESS BY INDEX ROWID S_ADDR_PER 1
INDEX UNIQUE SCAN S_ADDR_PER_P1 1
TABLE ACCESS BY INDEX ROWID CX_METER_INFO 1
INDEX UNIQUE SCAN CX_METER_INFO_P1 1
TABLE ACCESS BY INDEX ROWID S_CONTACT 1
INDEX UNIQUE SCAN S_CONTACT_P1 1
If I change the following where clause however:
and d.created between add_months(trunc(sysdate,'MM'), -1) and sysdate
To a static value, such as:
and d.created between to_date('20110101','yyyymmdd') and sysdate
the execution plan becomes:
SELECT STATEMENT 5
NESTED LOOPS 5
NESTED LOOPS OUTER 4
NESTED LOOPS OUTER 3
NESTED LOOPS 2
TABLE ACCESS BY INDEX ROWID S_DOC_QUOTE 1
INDEX RANGE SCAN S_DOC_QUOTE_IDX1 3
INLIST ITERATOR
TABLE ACCESS BY INDEX ROWID S_QUOTE_ITEM 1
INDEX RANGE SCAN S_QUOTE_ITEM_IDX4 4
TABLE ACCESS BY INDEX ROWID S_ADDR_PER 1
INDEX UNIQUE SCAN S_ADDR_PER_P1 1
TABLE ACCESS BY INDEX ROWID CX_METER_INFO 1
INDEX UNIQUE SCAN CX_METER_INFO_P1 1
TABLE ACCESS BY INDEX ROWID S_CONTACT 1
INDEX UNIQUE SCAN S_CONTACT_P1 1
which begins to return rows almost instantly.
So far, I've tried replacing the dynamic date condition with bind variables, as well as using a subquery which selects a dynamic date from the dual table. Neither of these methods have helped improve performance so far.
Because I'm relatively new to PL/SQL, I'm unable to understand the reasons for such substantial differences in the execution plans.
I'm also trying to run the query as a pass-through from SAS, but for the purposes of testing the execution speed I've been using SQL*Plus.
EDIT:
For clarification, I've already tried using bind variables as follows:
var start_date varchar2(8);
exec :start_date := to_char(add_months(trunc(sysdate,'MM'), -1),'yyyymmdd')
With the following where clause:
and d.created between to_date(:start_date,'yyyymmdd') and sysdate
which returns an execution cost of 1179377.
I would also like to avoid bind variables if possible as I don't believe I can reference them from a SAS pass-through query (although I may be wrong).
I doubt that the problem here has much to do with the execution time of the ADD_MONTHS function. You've already shown that there is a significant difference in the execution plan when you use a hardcoded minimum date. Big changes in execution plans generally have much more impact on run time than function call overhead is likely to, although potentially different execution plans can mean that the function is called many more times. Either way the root problem to look at is why you aren't getting the execution plan you want.
The good execution plan starts off with a range scan on S_DOC_QUOTE_IDX1. Given the nature of the change to the query, I assume this is an index on the CREATED column. Often the optimizer will not choose to use an index on a date column when the filter condition is based on SYSDATE. Because it is not evaluated until execution time, after the execution plan has been determined, the parser cannot make a good estimate of the selectivity of the date filter condition. When you use a hardcoded start date instead, the parser can use that information to determine selectivity, and makes a better choice about the use of the index.
I would have suggested bind variables as well, but I think because you are on 8i the optimizer can't peek at bind values, so this leaves it just as much in the dark as before. On a later Oracle version I would expect that the bind solution would be effective.
However, this is a good case where using literal substitution is probably more appropriate than using a bind variable, since (a) the start date value is not user-specified, and (b) it will remain constant for the whole month, so you won't be parsing lots of slightly different queries.
So my suggestion is to write some code to determine a static value for the start date and concatenate it directly into the query string before parsing & execution.
First of all, the reason you are getting different execution time is not because Oracle executes the date function a lot. The execution of this SQL function, even if it is done for each and every row (it probably is not by the way), only takes a negligible amount of time compared to the time it takes to actually retrieve the rows from disk/memory.
You are getting completely different execution times because, as you have noticed, Oracle chooses a different access path. Choosing one access path over another can lead to orders of magnitudes of difference in execution time. The real question therefore, is not "why does add_months takes time?" but:
Why does Oracle choose this particular unefficient path while there is a more efficient one?
To answer this question, one must understand how the optimizer works. The optimizer chooses a particular access path by estimating the cost of several access paths (all of them if there are only a few tables) and choosing the execution plan that is expected to be the most efficient. The algorithm to determine the cost of an execution plan has rules and it makes its estimation based on statistics gathered from your data.
As all estimation algorithms, it makes assumptions about your data, such as the general distribution based on min/max value of columns, cardinality, and the physical distribution of the values in the segment (clustering factor).
How this applies to your particular query
In your case, the optimizer has to make an estimation about the selectivity of the different filter clauses. In the first query the filter is between two variables (add_months(trunc(sysdate,'MM'), -1) and sysdate) while in the other case the filter is between a constant and a variable.
They look the same to you because you have substituted the variable by its value, but to the optimizer the cases are very different: the optimizer (at least in 8i) only computes an execution plan once for a particular query. Once the access path has been determined, all further execution will get the same execution plan. It can not, therefore, replace a variable by its value because the value may change in the future, and the access plan must work for all possible values.
Since the second query uses variables, the optimizer cannot determine precisely the selectivity of the first query, so the optimizer makes a guess, and that results in your case in a bad plan.
What can you do when the optimizer doesn't choose the correct plan
As mentionned above, the optimizer sometimes makes bad guesses, which result in suboptimal access path. Even if it happens rarely, this can be disastrous (hours instead of seconds). Here are some actions you could try:
Make sure your stats are up-to-date. The last_analyzed column on ALL_TABLES and ALL_INDEXES will tell you when was the last time the stats were collected on these objects. Good reliable stats lead to more accurate guesses, leading (hopefully) to better execution plan.
Learn about the different options to collect statistics (dbms_stats package)
Rewrite your query to make use of constants, when it makes sense, so that the optimizer will make more reliable guesses.
Sometimes two logically identical queries will result in different execution plans, because the optimizer will not compute the same access paths (of all possible paths).
There are some tricks you can use to force the optimizer to perform some join before others, for example:
Use rownum to materialize a subquery (it may take more temporary space, but will allow you to force the optimizer through a specific step).
Use hints, although most of the time I would only turn to hints when all else fails. In particular, I sometimes use the LEADING hint to force the optimizer to start with a specific table (or couple of table).
Last of all, you will probably find that the more recent releases have a generally more reliable optimizer. 8i is 12+ years old and it may be time for an upgrade :)
This is really an interesting topic. The oracle optimizer is ever-changing (between releases) it improves over time, even if new quirks are sometimes introduced as defects get corrected. If you want to learn more, I would suggest Jonathan Lewis' Cost Based Oracle: Fundamentals
That's because the function is run for every comparison.
sometimes it's faster to put it in a select from dual:
and d.created
between (select add_months(trunc(sysdate,'MM'), -1) from dual)
and sysdate
otherwise, you could also join the date like this:
select
-- stuff --
from
s_doc_quote d,
s_quote_item i,
s_contact c,
s_addr_per a,
cx_meter_info m,
(select add_months(trunc(sysdate,'MM'), -1) as startdate from dual) sd
where
d.row_id = i.sd_id
and d.con_per_id = c.row_id
and i.ship_per_addr_id = a.row_id(+)
and i.x_meter_info_id = m.row_id(+)
and d.x_move_type in ('Move In','Move Out','Move Out / Move In')
and i.prod_id in ('1-QH6','1-QH8')
and d.created between sd.startdate and sysdate
Last option and actually the best chance of improved performance: Add a date parameter to the query like this:
and d.created between :startdate and sysdate
[edit]
I'm sorry, I see you already tried options like these. Still odd. If the constant value works, the bind parameter should work as well, as long as you keep the add_months function outside the query.
This is SQL. You may want to use PL/SQL and save the calculation add_months(trunc(sysdate,'MM'), -1) into a variable first ,then bind that.
Also, I've seen SAS calcs take a long while due to pulling data across the network and doing additional work on each row it processes. Depending on your environment, you may consider creating a temp table to store the results of these joins first, then hitting the temp table (try a CTAS).

Oracle - Understanding the no_index hint

I'm trying to understand how no_index actually speeds up a query and haven't been able to find documentation online to explain it.
For example I have this query that ran extremely slow
select *
from <tablename>
where field1_ like '%someGenericString%' and
field1_ <> 'someSpecificString' and
Action_='_someAction_' and
Timestamp_ >= trunc(sysdate - 2)
And one of our DBAs was able to speed it up significantly by doing this
select /*+ NO_INDEX(TAB_000000000019) */ *
from <tablename>
where field1_ like '%someGenericString%' and
field1_ <> 'someSpecificString' and
Action_='_someAction_' and
Timestamp_ >= trunc(sysdate - 2)
And I can't figure out why? I would like to figure out why this works so I can see if I can apply it to another query (this one a join) to speed it up because it's taking even longer to run.
Thanks!
** Update **
Here's what I know about the table in the example.
It's a 'partitioned table'
TAB_000000000019 is the table not a column in it
field1 is indexed
Oracle's optimizer makes judgements on how best to run a query, and to do this it uses a large number of statistics gathered about the tables and indexes. Based on these stats, it decides whether or not to use an index, or to just do a table scan, for example.
Critically, these stats are not automatically up-to-date, because they can be very expensive to gather. In cases where the stats are not up to date, the optimizer can make the "wrong" decision, and perhaps use an index when it would actually be faster to do a table scan.
If this is known by the DBA/developer, they can give hints (which is what NO_INDEX is) to the optimizer, telling it not to use a given index because it's known to slow things down, often due to out-of-date stats.
In your example, TAB_000000000019 will refer to an index or a table (I'm guessing an index, since it looks like an auto-generated name).
It's a bit of a black art, to be honest, but that's the gist of it, as I understand things.
Disclaimer: I'm not a DBA, but I've dabbled in that area.
Per your update: If field1 is the only indexed field, then the original query was likely doing a fast full scan on that index (i.e. reading through every entry in the index and checking against the filter conditions on field1), then using those results to find the rows in the table and filter on the other conditions. The conditions on field1 are such that an index unique scan or range scan (i.e. looking up specific values or ranges of values in the index) would not be possible.
Likely the optimizer chose this path because there are two filter predicates on field1. The optimizer would calculate estimated selectivity for each of these and then multiply them to determine their combined selectivity. But in many cases this will significantly underestimate the number of rows that will match the condition.
The NO_INDEX hint eliminates this option from the optimizer's consideration, so it essentially goes with the plan it thinks is next best -- possibly in this case using partition elimination based on one of the other filter conditions in the query.
Using an index degrades query performance if it results in more disk IO compared to querying the table with an index.
This can be demonstrated with a simple table:
create table tq84_ix_test (
a number(15) primary key,
b varchar2(20),
c number(1)
);
The following block fills 1 Million records into this table. Every 250th record is filled with a rare value in column b while all the others are filled with frequent value:
declare
rows_inserted number := 0;
begin
while rows_inserted < 1000000 loop
if mod(rows_inserted, 250) = 0 then
insert into tq84_ix_test values (
-1 * rows_inserted,
'rare value',
1);
rows_inserted := rows_inserted + 1;
else
begin
insert into tq84_ix_test values (
trunc(dbms_random.value(1, 1e15)),
'frequent value',
trunc(dbms_random.value(0,2))
);
rows_inserted := rows_inserted + 1;
exception when dup_val_on_index then
null;
end;
end if;
end loop;
end;
/
An index is put on the column
create index tq84_index on tq84_ix_test (b);
The same query, but once with index and once without index, differ in performance. Check it out for yourself:
set timing on
select /*+ no_index(tq84_ix_test) */
sum(c)
from
tq84_ix_test
where
b = 'frequent value';
select /*+ index(tq84_ix_test tq84_index) */
sum(c)
from
tq84_ix_test
where
b = 'frequent value';
Why is it? In the case without the index, all database blocks are read, in sequential order. Usually, this is costly and therefore considered bad. In normal situation, with an index, such a "full table scan" can be reduced to reading say 2 to 5 index database blocks plus reading the one database block that contains the record that the index points to. With the example here, it is different altogether: the entire index is read and for (almost) each entry in the index, a database block is read, too. So, not only is the entire table read, but also the index. Note, that this behaviour would differ if c were also in the index because in that case Oracle could choose to get the value of c from the index instead of going the detour to the table.
So, to generalize the issue: if the index does not pick few records then it might be beneficial to not use it.
Something to note about indexes is that they are precomputed values based on the row order and the data in the field. In this specific case you say that field1 is indexed and you are using it in the query as follows:
where field1_ like '%someGenericString%' and
field1_ <> 'someSpecificString'
In the query snippet above the filter is on both a variable piece of data since the percent (%) character cradles the string and then on another specific string. This means that the default Oracle optimization that doesn't use an optimizer hint will try to find the string inside the indexed field first and also find if the data it is a sub-string of the data in the field, then it will check that the data doesn't match another specific string. After the index is checked the other columns are then checked. This is a very slow process if repeated.
The NO_INDEX hint proposed by the DBA removes the optimizer's preference to use an index and will likely allow the optimizer to choose the faster comparisons first and not necessarily force index comparison first and then compare other columns.
The following is slow because it compares the string and its sub-strings:
field1_ like '%someGenericString%'
While the following is faster because it is specific:
field1_ like 'someSpecificString'
So the reason to use the NO_INDEX hint is if you have comparisons on the index that slow things down. If the index field is compared against more specific data then the index comparison is usually faster.
I say usually because when the indexed field contains more redundant data like in the example #Atish mentions above, it will have to go through a long list of comparison negatives before a positive comparison is returned. Hints produce varying results because both the database design and the data in the tables affect how fast a query performs. So in order to apply hints you need to know if the individual comparisons you hint to the optimizer will be faster on your data set. There are no shortcuts in this process. Applying hints should happen after proper SQL queries have been written because hints should be based on the real data.
Check out this hints reference: http://docs.oracle.com/cd/B19306_01/server.102/b14211/hintsref.htm
To add to what Rene' and Dave have said, this is what I have actually observed in a production situation:
If the condition(s) on the indexed field returns too many matches, Oracle is better off doing a Full Table Scan.
We had a report program querying a very large indexed table - the index was on a region code and the query specified the exact region code, so Oracle CBO uses the index.
Unfortunately, one specific region code accounted for 90% of the tables entries.
As long as the report was run for one of the other (minor) region codes, it completed in less than 30 minutes, but for the major region code it took many hours.
Adding a hint to the SQL to force a full table scan solved the problem.
Hope this helps.
I had read somewhere that using a % in front of query like '%someGenericString%' will lead to Oracle ignoring the INDEX on that field. Maybe that explains why the query is running slow.

Resources