FTS execution plans with NULL query parameters - Oracle

Is there any possibility of avoiding full scans when dealing with incoming NULL query parameters in a stored procedure? Suppose I have 4 parameters that the user sends from a form, trying to find an exact match in the table, like this:
SELECT *
FROM table1 t1
WHERE ((:qParam1 is null) OR (t1.col1 = :qParam1)) AND
((:qParam2 is null) OR (t1.col2 = :qParam2)) AND
((:qParam3 is null) OR (t1.col3 = :qParam3)) AND
((:qParam4 is null) OR (t1.col4 = :qParam4));
So when this part of the procedure executes, the NULL checks cause a full table scan (FTS), since the procedure has already been compiled and the execution plan determined. It would take 2^4 different queries written inside the procedure to always use the most efficient plan for the incoming query parameters (and considerably more as the number of input parameters grows). My question is: is there any way, other than dynamic SQL, to avoid the FTS in this type of query?

Maybe not. Oracle does not store entirely-null keys in a B-tree index, so it cannot use the index when the predicate might have to match nulls. If your columns are nullable, then no. Having said that, there's a good chance the full scan is the best plan anyway - your query is so vague (ok - flexible) that the optimizer would be hard pressed to build a single, useful plan for it. Oracle is quite smart with flexible plans, but there's not really much to go on here.
If you do have nullable columns and an index, you might be able to bodge it with
t1.col1 = :qParam1 and t1.col1 is not null
in case it's not smart enough to work that out for itself.
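If the columns are NOT NULL (or you don't need to return rows where they are null), there is also one well-known rewrite that can sometimes avoid both dynamic SQL and the full scan: the NVL trick. The optimizer can expand col = NVL(:bind, col) into a concatenation of an index-driven branch and a full-scan branch, choosing between them at run time by filtering on the bind. A hedged sketch against the original query - note the changed semantics for null columns, and that in practice the optimizer tends to expand only one such predicate, so check the plan you actually get:
SELECT *
FROM table1 t1
WHERE t1.col1 = NVL(:qParam1, t1.col1)
AND t1.col2 = NVL(:qParam2, t1.col2)
AND t1.col3 = NVL(:qParam3, t1.col3)
AND t1.col4 = NVL(:qParam4, t1.col4);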

Related

Oracle not using an index in a simple query, despite the hint

I have a table with a column status; it is a nullable string column. I also have an index on this field alone. Why does the following query not use the index?
select /*+ index(m IDX_STATUS) */ * from messages m where m.status = :1
Try running the query without named (bind) parameters. Sometimes that makes a big difference.
select * from messages m where m.status = 'P'
It may turn out that you don't even need a hint to trigger index usage.
A possible explanation is that the column contains many equal values; for example, 90% of rows have status = 'D' (a low-cardinality column). Now we can understand why Oracle did not use the index :) It simply doesn't make sense for the value 'D', but it is reasonable for the other values. I would prefer Oracle to respect my hint (I know better), but that seems impossible.
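A quick way to check that theory against the actual data distribution (table and column names from the question):
select status, count(*)
from messages
group by status
order by count(*) desc;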
In general there is an extremely useful guide on this, Oracle SQL Tuning Guide: The index is being ignored. Still, it does not mention the situation where omitting the bind parameter makes the problem go away. That's why I insisted on asking the question on SO.

Oracle index with nested conditions

I have a query, something like
select * from table1 where :iparam is null or :iparam = field1;
On field1 there is a non-unique index, but Oracle (11g) doesn't want to use it. As I understand it, the query is optimized at compile time, not at run time. I'm using such queries in stored procedures. I wonder, is there a way to tell Oracle to use the indexes?
I know about hints, but I would like something that works across the whole project, like some optimizer parameter, to optimize queries at run time.
where :iparam is null or :iparam = field1;
Oracle has no way of knowing in advance whether you will pass a NULL value for :iparam.
If you do, a full scan is the best way to access the data. If you don't, an index might be better. You can split the statement into two parts using an IF, then there is no ambiguity, for example:
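A minimal sketch of that split, assuming the statement lives in a procedure with a parameter p_iparam and a ref cursor output p_cur (both names are illustrative):
IF p_iparam IS NULL THEN
  OPEN p_cur FOR SELECT * FROM table1;                          -- full scan is the right plan here
ELSE
  OPEN p_cur FOR SELECT * FROM table1 WHERE field1 = p_iparam;  -- index on field1 can be used
END IF;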
If you have a lot of fields to compare, dynamic SQL might be a better way:
IF p_param1 IS NOT NULL THEN  -- p_param1 is the PL/SQL parameter; a bare :param1 is not valid in an IF
  v_sql := v_sql || ' and field1 = :param1';
ELSE
  v_sql := v_sql || ' and nvl(:param1,1) = 1';  -- dummy predicate that still references the bind
END IF;
The ELSE branch keeps the :param1 placeholder in the statement, so the same USING clause works whether or not a value was supplied.
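Putting the pieces together, a hedged sketch of the dynamic version (the procedure name, parameter, and base query are illustrative, not from the question):
CREATE OR REPLACE PROCEDURE find_rows (p_param1 IN NUMBER,
                                       p_cur    OUT SYS_REFCURSOR) IS
  v_sql VARCHAR2(4000) := 'select * from table1 where 1 = 1';
BEGIN
  IF p_param1 IS NOT NULL THEN
    v_sql := v_sql || ' and field1 = :param1';
  ELSE
    v_sql := v_sql || ' and nvl(:param1,1) = 1';  -- still consumes the bind
  END IF;
  OPEN p_cur FOR v_sql USING p_param1;  -- one USING clause covers both branches
END find_rows;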
It is not true that the execution plan is determined at the time of package compilation. It will be determined just before the query is actually executed.
How the optimizer decides to run the query depends on many things. Foremost is the availability of statistics. These give the optimizer something to go on: how many records are in the table, how many different values are in the index.
Here is an article that goes into more detail:
http://joco.name/2014/01/05/why-wouldnt-oracle-use-a-perfectly-valid-index/

after alter system flush shared_pool low performance Oracle

We did some refactoring and replaced 2 similar queries with one parameterized query
a.isGood = :1
After that, the query executed with parameter 'Y' ran longer than usual (almost as long as with parameter 'N'). We ran the alter system flush shared_pool command, and the query with parameter 'Y' completed fast again (as before the refactoring), while the query with parameter 'N' now hangs for a long time.
As you might guess, the number of rows in the database with 'N' is much larger than with 'Y'.
Oracle 10g
Why did this happen?
I assume that you have an index on that column, otherwise the performance would be the same regardless of the Y/N value. I have seen this happen quite a bit on 10g+ due to Oracle's bind peeking combined with histograms on columns with skewed data distribution. The histograms get created automatically when one gathers table statistics using the parameter method_opt with 'FOR ALL COLUMNS SIZE AUTO' (among other values). Oracle optimizes the query for the value in the bind variables provided in the very first execution of that query. If you run the query with Y the first time, Oracle might want to use an index instead of a full table scan, since Y will return a small number of rows. The next time you run the query with N, Oracle will reuse the first execution plan, which happens to be a poor choice for N, since it returns the vast majority of rows.
The execution plans are cached in the SGA. Once you flush it, you get a brand new execution plan the very first time the query runs again.
My suggestion is:
Obtain the explain plan of both original queries (one with a hardcoded Y and one with a hardcoded N). Investigate whether the two plans use different indexes, or whether one has a much higher cost than the other. I have the feeling that one uses a full table scan and the other uses an index. The first one should be faster for N and the second should be faster for Y (see the EXPLAIN PLAN sketch at the end of this answer).
Try to remove the statistics on the table and see if it makes a difference on the query that has the bind variable. Later you need to gather statistics again for the table or other queries on that table might suffer.
You can also gather statistics for that one table using method_opt => 'FOR ALL COLUMNS SIZE 1'. That will keep the statistics without histograms on any column of that table.
A bitmap index on this column might fix the issue as well. Regular b-tree indexes on a column that has only two possible values (Y and N) are not exactly very efficient.
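For the first suggestion above, here is a quick way to compare the two plans (the table name is made up; DBMS_XPLAN is available on 10g):
EXPLAIN PLAN FOR SELECT * FROM a_table WHERE isGood = 'Y';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
EXPLAIN PLAN FOR SELECT * FROM a_table WHERE isGood = 'N';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);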
If column isGood has 99,000 'N' values and 1,000 'Y' values and you run with the condition isGood = 'Y', then it may be appropriate to use an index to find the results: you are returning 1% of the rows. If you run the query with the condition isGood = 'N', a full table scan would be more appropriate since you are returning most of the table anyway. If you were to use an index for the N condition, you would be doing an extra index lookup for every data item lookup.
Although the general rule is that bind parameters are good, they can be problematic in cases like this, where two genuinely different plans are required for the query. With the bind parameter scenario:
SELECT * FROM x WHERE isGood = :1
The statement will be parsed and a plan computed and saved in the sql cache. The same plan will be used for both query scenarios which is not desirable. But:
SELECT * FROM x WHERE isGood = 'Y'
SELECT * FROM x WHERE isGood = 'N'
will result in two plans being stored in the sql cache, hopefully each with the appropriate plan for the query. Version 11g avoids this problem with adaptive cursor sharing, which can use different plans for different bind variable values.
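On 11g you can check whether adaptive cursor sharing has actually kicked in by looking at the cursor flags in V$SQL (the sql_text filter below is illustrative):
SELECT sql_id, child_number, is_bind_sensitive, is_bind_aware
FROM v$sql
WHERE sql_text LIKE 'SELECT * FROM x WHERE isGood%';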
You need to look at your plans (EXPLAIN PLAN) to see what is happening in your case. Flush the cache, try one method, examine the plan; try the other, examine the plan. It might give you an idea what is happening in your case. There are a bunch of other topics you might follow up on that may help, for example:
using a hint to force the use of an index
cursor_sharing parameter
histograms on statistics

Stumped - Oracle won't use index when value is specified but will when function returns same value

I'm currently working with a database that has two indexes for a specific table. The index I want has two columns "Name" (varchar2) and "Time" (number). When I write the query
SELECT SOMETHING
FROM MYTABLE
WHERE NAME = 'SOME-NAME'
AND TIME BETWEEN STARTVALUE AND ENDVALUE
(where STARTVALUE and ENDVALUE are numbers) it does not use the index. However, if I use the following query instead
SELECT SOMETHING
FROM MYTABLE
WHERE NAME = 'SOME-NAME'
AND TIME BETWEEN MY_FUNC('STARTQUAL') AND MY_FUNC('ENDQUAL')
it does.
The only difference I can think of is that MY_FUNC explicitly returns a value of type NUMBER. Is it possible that the query optimizer is confused about the data type of STARTVALUE and ENDVALUE when they are specified explicitly, and so refuses to use the index? (I saw some similar threads where a type conflict was mentioned as the cause.)
Note:
The value being returned by MY_FUNC is EXACTLY the same value that I am specifying in the first query.
The index in question is UNDOUBTEDLY (absolutely no question) the correct index to be using and execution times are orders of magnitude faster when it does.
I have even specified a query hint with the first query and it refuses to use the index.
I know there must be something silly / simple that I'm overlooking but I just can't see it.
Thanks in advance for your assistance.
Alternatively, Oracle could be optimizing the queries differently based on whether the query involves literal values or bound values.
SELECT SOMETHING
FROM MYTABLE
WHERE NAME = 'SOME-NAME'
AND TIME BETWEEN 7 AND 41;
I'll bet Oracle knows something about the distribution of data in the TIME column, and is making a guess - perhaps using outdated statistics - about what percentage of the rows and blocks the predicate will touch (i.e. its selectivity). Check to see if there's a histogram on that column.
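For example, assuming the table is in your own schema (the HISTOGRAM column exists in 10g and later):
SELECT column_name, num_distinct, histogram
FROM user_tab_col_statistics
WHERE table_name = 'MYTABLE'
AND column_name = 'TIME';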
However, a query like this:
SELECT SOMETHING
FROM MYTABLE
WHERE NAME = 'SOME-NAME'
AND TIME BETWEEN MY_FUNC('7') AND MY_FUNC('41');
is likely to be optimized as semantically equivalent to:
SELECT SOMETHING
FROM MYTABLE
WHERE NAME = 'SOME-NAME'
AND TIME BETWEEN :some_bind AND :some_other_bind;
Because Oracle doesn't know what MY_FUNC('7') does - or even that MY_FUNC('7') will always return the same value of 7 - unless you've told Oracle the function is deterministic. So my experience is that Oracle takes a stab in the dark, for the most part, and tends to prefer an index with a low clustering factor. It seems to guess that even if the index isn't the best choice, at least it minimizes the downside risk by visiting as few data blocks as possible.
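If MY_FUNC genuinely always returns the same output for the same input, declaring it DETERMINISTIC at least gives the optimizer that fact. This is only a sketch - the body below is a placeholder, since the question doesn't show what MY_FUNC actually does:
CREATE OR REPLACE FUNCTION my_func (p_qual IN VARCHAR2)
  RETURN NUMBER
  DETERMINISTIC
IS
BEGIN
  RETURN TO_NUMBER(p_qual);  -- placeholder logic for illustration only
END my_func;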
My recommendation is to find out for yourself why it's behaving differently - take a 10053 trace of each query:
alter session set events '10053 trace name context forever, level 1';
-- run the SQL statement here
alter session set events '10053 trace name context off';
SELECT SOMETHING
FROM MYTABLE
WHERE NAME = 'SOME-NAME'
AND TIME BETWEEN STARTVALUE AND ENDVALUE
Here, you have TIME which is a NUMBER, and STARTVALUE and ENDVALUE which are strings (according to your comment). Therefore, an implicit conversion is done - i.e. your query is effectively:
SELECT SOMETHING
FROM MYTABLE
WHERE NAME = 'SOME-NAME'
AND TO_CHAR(TIME) BETWEEN STARTVALUE AND ENDVALUE
Unless you have a function-based index on TO_CHAR(TIME), it won't use an index.
Therefore, you must tell Oracle that you always expect the string parameters to be convertible to numbers, i.e.:
SELECT SOMETHING
FROM MYTABLE
WHERE NAME = 'SOME-NAME'
AND TIME BETWEEN TO_NUMBER(STARTVALUE) AND TO_NUMBER(ENDVALUE)
(It's always good practice to avoid implicit conversions, especially in queries, anyway)

Oracle 8i date function slow

I'm trying to run the following PL/SQL on an Oracle 8i server (old, I know):
select
-- stuff --
from
s_doc_quote d,
s_quote_item i,
s_contact c,
s_addr_per a,
cx_meter_info m
where
d.row_id = i.sd_id
and d.con_per_id = c.row_id
and i.ship_per_addr_id = a.row_id(+)
and i.x_meter_info_id = m.row_id(+)
and d.x_move_type in ('Move In','Move Out','Move Out / Move In')
and i.prod_id in ('1-QH6','1-QH8')
and d.created between add_months(trunc(sysdate,'MM'), -1) and sysdate
;
Execution is incredibly slow, however. Because the server is taken down around midnight each night, the query often fails to complete in time.
The execution plan is as follows:
SELECT STATEMENT 1179377
NESTED LOOPS 1179377
NESTED LOOPS OUTER 959695
NESTED LOOPS OUTER 740014
NESTED LOOPS 520332
INLIST ITERATOR
TABLE ACCESS BY INDEX ROWID S_QUOTE_ITEM 157132
INDEX RANGE SCAN S_QUOTE_ITEM_IDX8 8917
TABLE ACCESS BY INDEX ROWID S_DOC_QUOTE 1
INDEX UNIQUE SCAN S_DOC_QUOTE_P1 1
TABLE ACCESS BY INDEX ROWID S_ADDR_PER 1
INDEX UNIQUE SCAN S_ADDR_PER_P1 1
TABLE ACCESS BY INDEX ROWID CX_METER_INFO 1
INDEX UNIQUE SCAN CX_METER_INFO_P1 1
TABLE ACCESS BY INDEX ROWID S_CONTACT 1
INDEX UNIQUE SCAN S_CONTACT_P1 1
If I change the following where clause however:
and d.created between add_months(trunc(sysdate,'MM'), -1) and sysdate
To a static value, such as:
and d.created between to_date('20110101','yyyymmdd') and sysdate
the execution plan becomes:
SELECT STATEMENT 5
NESTED LOOPS 5
NESTED LOOPS OUTER 4
NESTED LOOPS OUTER 3
NESTED LOOPS 2
TABLE ACCESS BY INDEX ROWID S_DOC_QUOTE 1
INDEX RANGE SCAN S_DOC_QUOTE_IDX1 3
INLIST ITERATOR
TABLE ACCESS BY INDEX ROWID S_QUOTE_ITEM 1
INDEX RANGE SCAN S_QUOTE_ITEM_IDX4 4
TABLE ACCESS BY INDEX ROWID S_ADDR_PER 1
INDEX UNIQUE SCAN S_ADDR_PER_P1 1
TABLE ACCESS BY INDEX ROWID CX_METER_INFO 1
INDEX UNIQUE SCAN CX_METER_INFO_P1 1
TABLE ACCESS BY INDEX ROWID S_CONTACT 1
INDEX UNIQUE SCAN S_CONTACT_P1 1
which begins to return rows almost instantly.
So far, I've tried replacing the dynamic date condition with bind variables, as well as using a subquery that selects a dynamic date from the dual table. Neither of these methods has helped improve performance.
Because I'm relatively new to PL/SQL, I'm unable to understand the reasons for such substantial differences in the execution plans.
I'm also trying to run the query as a pass-through from SAS, but for the purposes of testing the execution speed I've been using SQL*Plus.
EDIT:
For clarification, I've already tried using bind variables as follows:
var start_date varchar2(8);
exec :start_date := to_char(add_months(trunc(sysdate,'MM'), -1),'yyyymmdd')
With the following where clause:
and d.created between to_date(:start_date,'yyyymmdd') and sysdate
which returns an execution cost of 1179377.
I would also like to avoid bind variables if possible as I don't believe I can reference them from a SAS pass-through query (although I may be wrong).
I doubt that the problem here has much to do with the execution time of the ADD_MONTHS function. You've already shown that there is a significant difference in the execution plan when you use a hardcoded minimum date. Big changes in execution plans generally have much more impact on run time than function call overhead is likely to, although potentially different execution plans can mean that the function is called many more times. Either way the root problem to look at is why you aren't getting the execution plan you want.
The good execution plan starts off with a range scan on S_DOC_QUOTE_IDX1. Given the nature of the change to the query, I assume this is an index on the CREATED column. Often the optimizer will not choose to use an index on a date column when the filter condition is based on SYSDATE. Because it is not evaluated until execution time, after the execution plan has been determined, the parser cannot make a good estimate of the selectivity of the date filter condition. When you use a hardcoded start date instead, the parser can use that information to determine selectivity, and makes a better choice about the use of the index.
I would have suggested bind variables as well, but I think because you are on 8i the optimizer can't peek at bind values, so this leaves it just as much in the dark as before. On a later Oracle version I would expect that the bind solution would be effective.
However, this is a good case where using literal substitution is probably more appropriate than using a bind variable, since (a) the start date value is not user-specified, and (b) it will remain constant for the whole month, so you won't be parsing lots of slightly different queries.
So my suggestion is to write some code to determine a static value for the start date and concatenate it directly into the query string before parsing & execution.
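A minimal sketch of that idea, using native dynamic SQL (available in 8i) and simplified to a COUNT(*) for brevity - the real query would be concatenated the same way:
DECLARE
  -- compute the boundary once and format it as a literal
  v_start VARCHAR2(8) := TO_CHAR(ADD_MONTHS(TRUNC(SYSDATE, 'MM'), -1), 'yyyymmdd');
  v_sql   VARCHAR2(4000);
  v_cnt   PLS_INTEGER;
BEGIN
  -- concatenate the constant so the parser can estimate the date filter's selectivity
  v_sql := 'select count(*) from s_doc_quote d'
        || ' where d.created between to_date(''' || v_start
        || ''',''yyyymmdd'') and sysdate';
  EXECUTE IMMEDIATE v_sql INTO v_cnt;
END;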
First of all, the reason you are getting different execution time is not because Oracle executes the date function a lot. The execution of this SQL function, even if it is done for each and every row (it probably is not by the way), only takes a negligible amount of time compared to the time it takes to actually retrieve the rows from disk/memory.
You are getting completely different execution times because, as you have noticed, Oracle chooses a different access path. Choosing one access path over another can lead to orders of magnitude of difference in execution time. The real question, therefore, is not "why does add_months take time?" but:
Why does Oracle choose this particular inefficient path when there is a more efficient one?
To answer this question, one must understand how the optimizer works. The optimizer chooses a particular access path by estimating the cost of several access paths (all of them if there are only a few tables) and choosing the execution plan that is expected to be the most efficient. The algorithm to determine the cost of an execution plan has rules and it makes its estimation based on statistics gathered from your data.
As with all estimation algorithms, it makes assumptions about your data, such as the general distribution based on the min/max values of columns, cardinality, and the physical distribution of the values in the segment (clustering factor).
How this applies to your particular query
In your case, the optimizer has to make an estimation about the selectivity of the different filter clauses. In the first query the filter is between two variables (add_months(trunc(sysdate,'MM'), -1) and sysdate) while in the other case the filter is between a constant and a variable.
They look the same to you because you have substituted the variable with its value, but to the optimizer the cases are very different: the optimizer (at least in 8i) computes an execution plan only once for a particular query. Once the access path has been determined, all further executions get the same plan. It cannot, therefore, replace a variable with its value, because the value may change in the future, and the access plan must work for all possible values.
Since the first query uses variables, the optimizer cannot determine its selectivity precisely, so it makes a guess, and in your case that guess results in a bad plan.
What can you do when the optimizer doesn't choose the correct plan
As mentioned above, the optimizer sometimes makes bad guesses, which result in a suboptimal access path. Even if it happens rarely, this can be disastrous (hours instead of seconds). Here are some actions you could try:
Make sure your stats are up to date. The last_analyzed column on ALL_TABLES and ALL_INDEXES will tell you when the stats were last collected on these objects (see the quick check after this list). Good, reliable stats lead to more accurate estimates, leading (hopefully) to a better execution plan.
Learn about the different options to collect statistics (dbms_stats package)
Rewrite your query to make use of constants, when it makes sense, so that the optimizer will make more reliable guesses.
Sometimes two logically identical queries will result in different execution plans, because the optimizer will not compute the same access paths (of all possible paths).
There are some tricks you can use to force the optimizer to perform some joins before others, for example:
Use rownum to materialize a subquery (it may take more temporary space, but it lets you force the optimizer through a specific step).
Use hints, although most of the time I would only turn to hints when all else fails. In particular, I sometimes use the LEADING hint to force the optimizer to start with a specific table (or couple of tables).
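For the first point in the list above, a quick check against the tables from your query:
SELECT table_name, last_analyzed
FROM all_tables
WHERE table_name IN ('S_DOC_QUOTE', 'S_QUOTE_ITEM', 'S_CONTACT', 'S_ADDR_PER', 'CX_METER_INFO');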
Last of all, you will probably find that the more recent releases have a generally more reliable optimizer. 8i is 12+ years old and it may be time for an upgrade :)
This is really an interesting topic. The Oracle optimizer is ever-changing (between releases); it improves over time, even if new quirks are sometimes introduced as defects get corrected. If you want to learn more, I would suggest Jonathan Lewis's Cost-Based Oracle Fundamentals.
That's because the function is run for every comparison.
sometimes it's faster to put it in a select from dual:
and d.created
between (select add_months(trunc(sysdate,'MM'), -1) from dual)
and sysdate
otherwise, you could also join the date like this:
select
-- stuff --
from
s_doc_quote d,
s_quote_item i,
s_contact c,
s_addr_per a,
cx_meter_info m,
(select add_months(trunc(sysdate,'MM'), -1) as startdate from dual) sd
where
d.row_id = i.sd_id
and d.con_per_id = c.row_id
and i.ship_per_addr_id = a.row_id(+)
and i.x_meter_info_id = m.row_id(+)
and d.x_move_type in ('Move In','Move Out','Move Out / Move In')
and i.prod_id in ('1-QH6','1-QH8')
and d.created between sd.startdate and sysdate
Last option and actually the best chance of improved performance: Add a date parameter to the query like this:
and d.created between :startdate and sysdate
[edit]
I'm sorry, I see you already tried options like these. Still odd. If the constant value works, the bind parameter should work as well, as long as you keep the add_months function outside the query.
This is SQL. You may want to use PL/SQL and save the calculation add_months(trunc(sysdate,'MM'), -1) into a variable first, then bind that.
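A minimal sketch of that, simplified to a COUNT(*); note the earlier caveat that 8i cannot peek at the bind, so check the plan you actually get:
DECLARE
  v_start DATE := ADD_MONTHS(TRUNC(SYSDATE, 'MM'), -1);  -- computed once, outside the query
  v_cnt   PLS_INTEGER;
BEGIN
  SELECT COUNT(*)
    INTO v_cnt
    FROM s_doc_quote d
   WHERE d.created BETWEEN v_start AND SYSDATE;  -- v_start is passed as a bind
END;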
Also, I've seen SAS calcs take a long while due to pulling data across the network and doing additional work on each row it processes. Depending on your environment, you may consider creating a temp table to store the results of these joins first, then hitting the temp table (try a CTAS).
