Oracle composite index - oracle

I have a table with columns a, b, c, d and a PK B-tree index on (a, b, c), in that order. I want to query like so:
(1)
select b, d from table
where a = :p1
and c = :p2
I.e. there is no WHERE condition on the b column, so the query cannot make perfect use of the index. Now the b column can only have one of a few possible values (20 unique), but c (and a) can have a lot (hundreds of thousands). I figured it would be more efficient to rewrite the query to:
(2)
select /*+USE_NL(table)*/ b, d from table
where a = :p1
and b IN (<allPossibleValues>)
and c = :p2
but I haven't been able to find any Oracle documentation that explains how the range scan in (1) works when the query has no predicate on a non-leading column of the composite index. All the sources seem to cover only the case where the leading column is missing. Those sources suggest using a skip scan like so:
(3)
select /*+INDEX_SS(table <theIndex>)*/ b, d from table
where a = :p1
and c = :p2
Would that work when the missing column is not the leading one but the second one (b)? As I said, all the sources I've found explaining skip scan have the leading column missing. Would query (2) and/or (3) be better than query (1)?

Before starting premature optimization, check the explain plan and the performance of your original query in your production environment.
The Oracle query optimizer is quite good at choosing the correct plan. Depending on the size of your table, it will probably choose either a full index scan (I guess the table has way too many rows for this to happen) or an index range scan.
If Oracle fails to choose a plan with good performance, then you can start optimizing.
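Checking the plan, as suggested above, could look roughly like the sketch below. The names my_table and my_table_pk are placeholders (assumptions) for the question's table and its primary key index on (a, b, c). In the DBMS_XPLAN output, the Predicate Information section lists each condition as an access or a filter predicate, which shows how the condition on c is applied when b is missing.

```sql
-- Hypothetical names: my_table stands in for the question's table,
-- my_table_pk for its primary key index on (a, b, c).

-- Query (1): no predicate on b
EXPLAIN PLAN FOR
SELECT b, d
FROM   my_table
WHERE  a = :p1
AND    c = :p2;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- Query (3): the same query with the skip scan hint
EXPLAIN PLAN FOR
SELECT /*+ INDEX_SS(my_table my_table_pk) */ b, d
FROM   my_table
WHERE  a = :p1
AND    c = :p2;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

The estimated plan is only a starting point; timing the variants against production-sized data remains the real test.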

Related

Oracle join using Hint USE_NL USE_HASH

What is the best way to force the execution plan to use only nested loop joins for all tables with the USE_NL hint in one case,
and only hash joins for all tables with the USE_HASH hint in the other case?
I want to run both queries, see which one has the lower cost in the execution plan, and use that one. Please suggest.
My doubt is about the sequence in which I should list all 4 tables inside the hint, like below:
USE_NL(bl1_gain_adj,customers,bl1_gain,bl1_reply_code)
SELECT bl1_gain_adj.adj_seq_no,
bl1_gain_adj.amount_currency ,
bl1_gain_adj.gain_seq_no,
customers.loan_key,
customers.customer_key
FROM
bl1_gain_adj,
customers,
bl1_gain,
bl1_reply_code
WHERE
bl1_gain.loan_key = customers.loan_key
AND bl1_gain.customer_key = customers.customer_key
AND bl1_gain.receiver_customer = customers.customer_no
AND bl1_gain.cycle_seq_no = customers.cycle_seq_no
AND bl1_reply_code.gain_code = bl1_gain.gain_code
AND bl1_reply_code.revenue_code = 'RC'
AND bl1_gain_adj.gain_seq_no = bl1_gain.gain_seq_no
AND bl1_gain_adj.customer_key = bl1_gain.customer_key;
Records in tables
---------------
bl1_gain_adj = 100 records
customers = 10 Million records
bl1_gain = 1 Million records
bl1_reply_code = 100 million records
Keeping aside the choice of the most appropriate hint for your query (if any), the order you write the table names/aliases in the USE_NL hint does not matter.
According to Oracle documentation:
Note that USE_NL(table1 table2) is not considered a multi-table hint
because it is a shortcut for USE_NL(table1) and USE_NL(table2)
About USE_NL, Oracle says:
The USE_NL hint instructs the optimizer to join each specified table
to another row source with a nested loops join, using the specified
table as the inner table.
That is, if you write USE_NL(table1 table2 table3 table4) this means "use all these tables as inner tables in a nested loop join"; if your query only has these 4 tables, the hint will be ignored for at least one table: to use a table as inner, we need another table to use as outer, so it's impossible to use all the tables as inner.
LEADING does something different, regarding the order in which tables are scanned:
The LEADING hint instructs the optimizer to use the specified set of
tables as the prefix in the execution plan.
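Putting the two hints together, one possible sketch for the question's query is shown below. The join order in LEADING is an assumption chosen for illustration (start from the smallest table); it is not something the quoted documentation prescribes. Only the three non-leading tables appear in USE_NL, since the first table in the order has no outer row source to be joined to.

```sql
-- Assumed join order, for illustration only: start from the smallest
-- table (bl1_gain_adj) and join the remaining tables as inner tables.
SELECT /*+ LEADING(bl1_gain_adj bl1_gain customers bl1_reply_code)
           USE_NL(bl1_gain customers bl1_reply_code) */
       bl1_gain_adj.adj_seq_no,
       bl1_gain_adj.amount_currency,
       bl1_gain_adj.gain_seq_no,
       customers.loan_key,
       customers.customer_key
FROM   bl1_gain_adj,
       customers,
       bl1_gain,
       bl1_reply_code
WHERE  bl1_gain.loan_key = customers.loan_key
AND    bl1_gain.customer_key = customers.customer_key
AND    bl1_gain.receiver_customer = customers.customer_no
AND    bl1_gain.cycle_seq_no = customers.cycle_seq_no
AND    bl1_reply_code.gain_code = bl1_gain.gain_code
AND    bl1_reply_code.revenue_code = 'RC'
AND    bl1_gain_adj.gain_seq_no = bl1_gain.gain_seq_no
AND    bl1_gain_adj.customer_key = bl1_gain.customer_key;
```

For the hash join variant, replacing USE_NL with USE_HASH over the same three tables gives the comparison the question asks for.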

How to update a column with concatenate of two other column in a same table

I have a table with 3 columns a, b and c. I want to know how to update the value of the third column with the concatenation of the two other columns in each row.
before update
A B c
-------------
1 4
2 5
3 6
after update
A B c
-------------
1 4 1_4
2 5 2_5
3 6 3_6
How can I do this in oracle?
Use the concatenation operator ||:
update mytable set
c = a || '_' || b
Or better, to avoid having to rerun this whenever rows are inserted or updated:
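-- Oracle rejects a bare * next to other select-list items, and the view
-- cannot expose two columns named c, so the columns are listed explicitly.
create view myview as
select t.a, t.b, t.a || '_' || t.b as c
from mytable t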
Firstly, you are violating the rules of normalization. You should rethink the design. If you have the values in the table columns, then all you need to get a computed value is a select statement that fetches the result the way you want. Storing computed values is generally a bad idea and considered bad design.
Anyway,
Since you are on 11g, if you really want to have a computed column, I would suggest a VIRTUAL COLUMN rather than manually updating the column. There is a lot of overhead involved with an UPDATE statement, and a virtual column avoids much of it. You also get rid of the manual effort and the lines of code that do the update. Oracle does the job for you.
Of course, you will use the same condition of concatenation in the virtual column clause.
Something like,
Column_c varchar2(50) GENERATED ALWAYS AS (column_a||'_'||column_b) VIRTUAL
Note: There are certain restrictions on its use, so please refer to the documentation before implementing it. However, for the simple use case provided by the OP, a virtual column is a straight fit.
Update: I did a small test and made a few observations. Please read this question for a better understanding of how to implement my suggestion.
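As a rough sketch of the virtual column suggestion applied to the question's table: the table name mytable is borrowed from the other answer, and dropping the existing stored column c first is an assumption about what the asker wants. The data type is left out here so that Oracle derives it from the expression.

```sql
-- Assumption: the stored column c from the question is no longer needed.
ALTER TABLE mytable DROP COLUMN c;

-- No data type is given, so Oracle derives it from the expression.
ALTER TABLE mytable
  ADD (c GENERATED ALWAYS AS (a || '_' || b) VIRTUAL);

-- c is now computed on the fly and never needs an UPDATE.
SELECT a, b, c FROM mytable;
```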

Use of index in multiple join condition oracle

I have two tables: tableA and tableB
TableA has millions of records and tableB has around 1000 records
Table A {
aid
city, (city is indexed)
state,
X,
Y
}
Table B {
bid,
city,
state
}
Now my query is
SELECT X, Y, COUNT(*) FROM A,B
WHERE A.city = B.city
and A.state=B.state
group by X,Y
This query runs very slowly. However, when we joined only on city, everything ran very quickly.
The earlier query was:
SELECT X, Y, COUNT(*) FROM A,B
WHERE A.city = B.city
group by X,Y
So I looked at the explain plans: in the first (slow) case the plan does not use the index, whereas in the second case it uses the city index. I tried adding a state index on table A, which, as expected, did not help. I also tried an index hint like /*+ INDEX(A,city_idx) */ after SELECT, which did not help much. Can you help me out in this case?
Creating indexes for both tables on city and state is likely to help.
Create a composite index on the table A that has all the four columns: city, state, X, Y:
CREATE INDEX index_name ON table_name (city, state, X, Y);
In this way, your query won't need to access the table A, only the newly created index. Of course, the downside of yet another index -> insert/update/delete in this table will be slower.
TableA have millions of record and tableB have around 1000
In this case using nested loops seems like the most suited access path for the job.
You are requesting an aggregation based on two columns from table A, meaning Oracle will have to access pretty much all the blocks in the table anyway. In this case, creating an index on the big table will be useless. Creating an index on the small, inner table of the join will make sense.
WHERE A.city = B.city and A.state=B.state
WHERE A.city = B.city
Can the same city exist in two states? Sounds unlikely... If a city cannot exist in more than one state, then any index on state (in either table) will be redundant.
As @Florin Ghita noted in his comment, you can use the hint USE_NL to force Oracle to use nested loops, but personally I highly recommend avoiding hints (for so many reasons, mostly maintenance).
My suggestions are:
1. Gather stats on both tables to make sure Oracle knows the proportions and has sufficient data to estimate cardinality:
exec dbms_stats.gather_table_stats(user,'tableX')
2. Test the query with parallel execution. Parallel is great at speeding up NL between small and big tables by broadcasting the entire small table to the slave process working the big table chunk (you can get even further with compression on the small table).
Cities and states are related but the optimizer does not understand that. Oracle can probably accurately predict each condition separately but not together.
For example, assume that 10% of all states match and 10% of all cities match. When both conditions are present, Oracle will estimate 0.1 * 0.1 = 0.01. The real number is probably closer to 0.1: if the city name matches, the state name will almost always match.
Adding extended statistics tells Oracle about this column relationship. And these statistics can help any query, not just the current problem query.
declare
v_name varchar2(100);
begin
v_name := dbms_stats.create_extended_stats(user, 'A', '(city, state)');
v_name := dbms_stats.create_extended_stats(user, 'B', '(city, state)');
dbms_stats.gather_table_stats(user, 'A');
dbms_stats.gather_table_stats(user, 'B');
end;
/
Without the plans we can't accurately predict whether this will solve the problem or not. But giving the optimizer more accurate information usually helps and almost never hurts.
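To confirm that the column groups from the block above were actually created, a query along these lines against the USER_STAT_EXTENSIONS dictionary view can help. It is a sketch that assumes the tables are named A and B in the current schema, as in the PL/SQL block.

```sql
-- Lists the extended statistics (column groups) defined on A and B.
SELECT table_name, extension_name, extension
FROM   user_stat_extensions
WHERE  table_name IN ('A', 'B');
```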

different columns on select query results different costs

There is an index at table invt_item_d on (item_id & branch_id & co_id) columns.
The plan results for the first query are TABLE ACCESS FULL and cost is 528,
results for the second query are INDEX FAST FULL SCAN (my index) and cost is 27.
The only difference is, as you can see, the selected column is used in index on the second query.
Is there something wrong with this? And please, can you tell me what should I do to fix this at db administration level?
select d.qty
from invt_item_d d
where d.item_id = 999
and d.branch_id = 888
and d.co_id = 777
select d.item_id
from invt_item_d d
where d.item_id = 999
and d.branch_id = 888
and d.co_id = 777
EDIT:
I made a new query, and its cost is 529, with TABLE ACCESS FULL.
select qty from invt_item_d
So it doesn't matter whether I use an index or not. Some say this is normal. Is this really normal behaviour?
In the first case, the table must be accessed, since the "qty" column is only stored in the table.
In the second case, all the columns used in the query can be read from the index, skipping the table read altogether.
You can add another index on columns (item_id, branch_id, co_id, qty) and it will most probably be used in the first query.
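The suggested index could be created as sketched below. The index name invt_item_d_cov_ix is made up for illustration. Because qty is the last column, the first query can be answered from the index alone.

```sql
-- invt_item_d_cov_ix is a hypothetical index name.
CREATE INDEX invt_item_d_cov_ix
    ON invt_item_d (item_id, branch_id, co_id, qty);
```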
From the Oracle documentation: http://docs.oracle.com/cd/E11882_01/server.112/e25789/indexiot.htm
A fast full index scan is a full index scan in which the database
accesses the data in the index itself without accessing the table, and
the database reads the index blocks in no particular order.
Fast full index scans are an alternative to a full table scan when
both of the following conditions are met:
- The index must contain all columns needed for the query.
- A row containing all nulls must not appear in the query result set. For this result to be guaranteed, at least one column in the index must have either:
  - A NOT NULL constraint
  - A predicate applied to it that prevents nulls from being considered in the query result set
This is exactly the main purpose of using an index: to make searches faster.
Querying columns with indexes is faster than querying columns without indexes.
It's basic Oracle knowledge.
I am adding another answer because it seems more convenient.
First:
"it doesn't hit the index because there are 34000 rows, not millions". This is COMPLETELY WRONG and a dangerous understanding.
What I meant was: if there are a few thousand rows and the index is not hit (the Oracle engine then does a full table scan, TABLE ACCESS FULL), it's not a big deal. Oracle is fast enough to read a few thousand rows in about a second even without indexes, so you won't feel the difference. The query is still slower than it would be with an index, but so minimally slower that you won't notice.
But if there are millions of rows, the query will be much, much slower without an index (this time it will scan millions of rows in a full table scan), and your performance will take a hit.
Second: why on earth do you have to loop over a table with 34000 rows, and 4000 times at that?
That's a terrible approach. Avoid loops as much as possible. There has to be a better approach!
Third:
You can force the Oracle optimizer to hit the index by using the index hint. You will need to know the name of the index for that.
-- The hint must name the alias d, because the query aliases the table.
select /*+ index(d <index_name>) */
d.qty
from invt_item_d d
where d.item_id = 999
and d.branch_id = 888
and d.co_id = 777
Here is the link to a stack overflow question on index hint

Oracle - Understanding the no_index hint

I'm trying to understand how no_index actually speeds up a query and haven't been able to find documentation online to explain it.
For example I have this query that ran extremely slow
select *
from <tablename>
where field1_ like '%someGenericString%' and
field1_ <> 'someSpecificString' and
Action_='_someAction_' and
Timestamp_ >= trunc(sysdate - 2)
And one of our DBAs was able to speed it up significantly by doing this
select /*+ NO_INDEX(TAB_000000000019) */ *
from <tablename>
where field1_ like '%someGenericString%' and
field1_ <> 'someSpecificString' and
Action_='_someAction_' and
Timestamp_ >= trunc(sysdate - 2)
And I can't figure out why? I would like to figure out why this works so I can see if I can apply it to another query (this one a join) to speed it up because it's taking even longer to run.
Thanks!
** Update **
Here's what I know about the table in the example.
It's a 'partitioned table'
TAB_000000000019 is the table not a column in it
field1 is indexed
Oracle's optimizer makes judgements on how best to run a query, and to do this it uses a large number of statistics gathered about the tables and indexes. Based on these stats, it decides whether or not to use an index, or to just do a table scan, for example.
Critically, these stats are not automatically up-to-date, because they can be very expensive to gather. In cases where the stats are not up to date, the optimizer can make the "wrong" decision, and perhaps use an index when it would actually be faster to do a table scan.
If this is known by the DBA/developer, they can give hints (which is what NO_INDEX is) to the optimizer, telling it not to use a given index because it's known to slow things down, often due to out-of-date stats.
In your example, TAB_000000000019 will refer to an index or a table (I'm guessing an index, since it looks like an auto-generated name).
It's a bit of a black art, to be honest, but that's the gist of it, as I understand things.
Disclaimer: I'm not a DBA, but I've dabbled in that area.
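One way to check whether out-of-date statistics might be involved, as described above, is to look at when they were last gathered. The sketch below queries the USER_TAB_STATISTICS dictionary view for the current schema. Treat the STALE_STATS flag as a hint rather than a verdict, since it depends on table-modification monitoring data that Oracle flushes only periodically.

```sql
-- Shows, per table in the current schema, when statistics were last
-- gathered and whether Oracle currently flags them as stale.
SELECT table_name, num_rows, last_analyzed, stale_stats
FROM   user_tab_statistics
WHERE  object_type = 'TABLE'
ORDER  BY last_analyzed NULLS FIRST;
```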
Per your update: If field1 is the only indexed field, then the original query was likely doing a fast full scan on that index (i.e. reading through every entry in the index and checking against the filter conditions on field1), then using those results to find the rows in the table and filter on the other conditions. The conditions on field1 are such that an index unique scan or range scan (i.e. looking up specific values or ranges of values in the index) would not be possible.
Likely the optimizer chose this path because there are two filter predicates on field1. The optimizer would calculate estimated selectivity for each of these and then multiply them to determine their combined selectivity. But in many cases this will significantly underestimate the number of rows that will match the condition.
The NO_INDEX hint eliminates this option from the optimizer's consideration, so it essentially goes with the plan it thinks is next best -- possibly in this case using partition elimination based on one of the other filter conditions in the query.
Using an index degrades query performance if it results in more disk IO compared to querying the table without an index.
This can be demonstrated with a simple table:
create table tq84_ix_test (
a number(15) primary key,
b varchar2(20),
c number(1)
);
The following block inserts 1 million records into this table. Every 250th record gets a rare value in column b, while all the others get a frequent value:
declare
  rows_inserted number := 0;
begin
  while rows_inserted < 1000000 loop
    if mod(rows_inserted, 250) = 0 then
      insert into tq84_ix_test values (
        -1 * rows_inserted,
        'rare value',
        1);
      rows_inserted := rows_inserted + 1;
    else
      begin
        insert into tq84_ix_test values (
          trunc(dbms_random.value(1, 1e15)),
          'frequent value',
          trunc(dbms_random.value(0,2))
        );
        rows_inserted := rows_inserted + 1;
      exception when dup_val_on_index then
        null;
      end;
    end if;
  end loop;
end;
/
An index is put on column b:
create index tq84_index on tq84_ix_test (b);
The same query, run once without the index and once with it, differs in performance. Check it out for yourself:
set timing on
select /*+ no_index(tq84_ix_test) */
sum(c)
from
tq84_ix_test
where
b = 'frequent value';
select /*+ index(tq84_ix_test tq84_index) */
sum(c)
from
tq84_ix_test
where
b = 'frequent value';
Why is that? In the case without the index, all database blocks of the table are read, in sequential order. Usually, this is costly and therefore considered bad. In a normal situation, an index reduces such a "full table scan" to reading, say, 2 to 5 index blocks plus the one database block that contains the record the index points to. The example here is different altogether: the entire index is read, and for (almost) each entry in the index a database block is read, too. So not only is the entire table read, but also the index. Note that this behaviour would differ if c were also in the index, because in that case Oracle could choose to get the value of c from the index instead of taking the detour to the table.
So, to generalize: if the index does not narrow the query down to a few records, it might be beneficial not to use it.
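The remark about c being part of the index can be tried out with a second index that covers both columns. This is a sketch that continues the test case above. The index name tq84_index_bc is made up, and the timings you see will depend on your system.

```sql
-- Hypothetical second index that also contains c, so that sum(c)
-- can be answered from the index without visiting the table.
create index tq84_index_bc on tq84_ix_test (b, c);

set timing on

select /*+ index(tq84_ix_test tq84_index_bc) */
  sum(c)
from
  tq84_ix_test
where
  b = 'frequent value';
```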
Something to note about indexes is that they are precomputed values based on the row order and the data in the field. In this specific case you say that field1 is indexed and you are using it in the query as follows:
where field1_ like '%someGenericString%' and
field1_ <> 'someSpecificString'
In the query snippet above, the filter first uses a pattern with a wildcard on both sides, since the percent (%) character surrounds the string, and then excludes one specific string. This means that, without an optimizer hint, Oracle will by default go through the indexed field first, checking whether the string is a sub-string of the data in the field, and then check that the data doesn't match the other specific string. After the index is checked, the other columns are checked. This is a very slow process if repeated.
The NO_INDEX hint proposed by the DBA removes the optimizer's preference to use an index and will likely allow the optimizer to choose the faster comparisons first and not necessarily force index comparison first and then compare other columns.
The following is slow because it compares the string and its sub-strings:
field1_ like '%someGenericString%'
While the following is faster because it is specific:
field1_ like 'someSpecificString'
So the reason to use the NO_INDEX hint is if you have comparisons on the index that slow things down. If the index field is compared against more specific data then the index comparison is usually faster.
I say usually because when the indexed field contains more redundant data, as in the example @Atish mentions above, it will have to go through a long list of negative comparisons before a positive one is returned. Hints produce varying results because both the database design and the data in the tables affect how fast a query performs. So in order to apply hints, you need to know whether the individual comparisons you hint to the optimizer will be faster on your data set. There are no shortcuts in this process. Applying hints should happen after proper SQL queries have been written, because hints should be based on the real data.
Check out this hints reference: http://docs.oracle.com/cd/B19306_01/server.102/b14211/hintsref.htm
To add to what Rene' and Dave have said, this is what I have actually observed in a production situation:
If the condition(s) on the indexed field return too many matches, Oracle is better off doing a Full Table Scan.
We had a report program querying a very large indexed table. The index was on a region code and the query specified the exact region code, so the Oracle CBO used the index.
Unfortunately, one specific region code accounted for 90% of the table's entries.
As long as the report was run for one of the other (minor) region codes, it completed in less than 30 minutes, but for the major region code it took many hours.
Adding a hint to the SQL to force a full table scan solved the problem.
Hope this helps.
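A hint that forces a full table scan, as described in the answer above, could look like the sketch below. The table name report_data and the column region_code are invented for illustration, since the original report query is not shown.

```sql
-- Hypothetical table and column names. FULL(r) asks the optimizer to
-- scan the whole table instead of using the index on region_code.
SELECT /*+ FULL(r) */ r.*
FROM   report_data r
WHERE  r.region_code = :region_code;
```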
I had read somewhere that using a % in front of query like '%someGenericString%' will lead to Oracle ignoring the INDEX on that field. Maybe that explains why the query is running slow.
