Oracle 10g - optimize WHERE IS NOT NULL - oracle

We have Oracle 10g and we need to query 1 table (no joins) and filter out rows where 1 of the columns is null. When we do this - WHERE OurColumn IS NOT NULL - we get a full table scan on a very large table - BAD BAD BAD. The column has an index on it but it gets ignored in this instance. Are there any solutions to this?
Thanks

The optimizer thinks that the full table scan will be better.
If there are just a few NULL rows, the optimizer is right.
If you are absolutely sure that the index access will be faster (that is, you have more than 75% rows with col1 IS NULL), then hint your query:
SELECT /*+ INDEX (t index_name_on_col1) */
*
FROM mytable t
WHERE col1 IS NOT NULL
Why 75%?
Because using INDEX SCAN to retrieve values not covered by the index implies a hidden join on ROWID, which costs about 4 times as much as table scan.
If the index range includes more than 25% of rows, the table scan is usually faster.
As mentioned by Tony Andrews, clustering factor is more accurate method to measure this value, but 25% is still a good rule of thumb.

The optimiser will make its decision based on the relative cost of the full table scan and using the index. This mainly comes down to how many blocks will have to be read to satisfy the query. The 25%/75% rule of thumb mentioned in another answer is simplistic: in some cases a full table scan will make sense even to get 1% of the rows - i.e. if those rows happen to be spread around many blocks.
For example, consider this table:
SQL> create table t1 as select object_id, object_name from all_objects;
Table created.
SQL> alter table t1 modify object_id null;
Table altered.
SQL> update t1 set object_id = null
2 where mod(object_id,100) != 0
3 /
84558 rows updated.
SQL> analyze table t1 compute statistics;
Table analyzed.
SQL> select count(*) from t1 where object_id is not null;
COUNT(*)
----------
861
As you can see, only approximately 1% of the rows in T1 have a non-null object_id. But due to the way I built the table, these 861 rows will be spread more or less evenly around the table. Therefore, the query:
select * from t1 where object_id is not null;
is likely to visit almost every block in T1 to get data, even if the optimiser used the index. It makes sense then to dispense with the index and go for a full table scan!
A key statistic to help identify this situation is the index clustering factor:
SQL> select clustering_factor from user_indexes where index_name='T1_IDX';
CLUSTERING_FACTOR
-----------------
460
This value 460 is quite high (compared to the 861 rows in the index), and suggests that a full table scan will be used. See this DBAZine article on clustering factors.

If you are doing a select *, then it would make sense to do a table scan rather than using the index. If you know which columns you are interested in, you could create a covered index with those colums plus the one you are applying the IS NOT NULL condition.

It can depend on the type of index you have on the table.
Most B-tree indexes do not store null entries. Bitmap indexes do store null entries.
So, if you have:
select * from mytable
where mycolumn is null
and you have a standard B-tree index on mycolumn, then the query can't use the index as the "null" isn't in the index.
(If the index is against multiple columns, and one of the indexed columns is not null then there will be an entry in the index.)

Create an index on that column.
To make sure the index is used, it should be on the index and other columns in the where.
ocdecio answered:
If you are doing a select *, then it would make sense to do a table scan rather than using the index.
That's not strictly true; an index will be used if there is an index that fits your where clause, and the query optimizer decides using that index would be faster than doing a table scan. If there is no index, or no suitable index, only then must a table scan be done.

It's also worth checking whether Oracle's statistics on the table are up to date. It may not know that a full table scan will be slower.

Oracle database don't index null values at all in regular (b-tree) indexes, so it can't use it nor you can't force oracle database to use it.
BR

Using hints should be done only as a work around rather than a solution.
As mentioned in other answers, the null value is not available in B-TREE indexes.
Since you know that you have mostly null values in this column, would you be able to replace the null value by a range for instance.
That really depends on your column and the nature of your data but typically, if your column is a date type for instance:
where mydatecolumn is not null
Can be translated in a rule saying: I want all rows which have a date.
Then you can most definitely do this:
where mydatecolumn <=sysdate (in oracle)
This will return all rows with a date and ommit null values while taking advantage of the index on that column without using any hints.

See http://www.oracloid.com/2006/05/using-index-for-is-null/
If your index is on one single field, it will NOT be used. Try to add a dummy field or a constant in the index:
create index tind on t(field_to_index, 1);

Related

Indexes in oracle:

Which of the below would be the best way to create an Index, so that my query gives faster results?
My query is:
select emp_name,emp_last_name, salary
from employees_table
where salary <=2000;
1. create index emp_index on employees_table (salary);
2. create index emp_index on employees_table (emp_name,emp_last_name,salary);
In principle create index emp_index on employees_table (emp_name,emp_last_name,salary); would be the better one, because for your query Oracle has to read only the index but not any table data.
However, this means you create a separate index for each particular query which is certainly an overkill.
For 2 index Oracle may will be select table access full, if disctinct values in firsts columns of index to much, Oracle give you Table access full, if distinct values in first columns of index not much then you give index skip scan.
For first index you will see “index range scan” + “table access by rowid”....
but for both index’s - if sql returns >5-7% rows then Oracle maybe want to use “Table access full”...
All depends on your data in Table and count return rows

How to create an index so that the following statement has an index

I have table A(id,name,code)
I have sql statement:
Select * from A where upper(code || name) like upper('%<search text>%');
How to create an index so that the following statement has an index?
Question for two option: table partitioned, and table not partitioned
Thanks & BR
Do you have a performance issue or is this just a hypothetical question?
An index is unlikely to help with this example: a full table scan will probably be the quickest solution. Why? Your table has 3 columns. The best index would be one that avoided looking in the table at all e.g.
create index ai on a (code, name, id);
But that needs to contain all the same data as the table plus a ROWID for each table row - so it is going to be bigger than the table and take longer to scan. You could try putting the columns in the index with the least selective first and using compression:
create index ai on a (code, name, id) compress;
Now the index may be smaller than the table - it depends on how selective the code and name columns are. If it is small enough, the optimizer might decide to use it instead of the table. It still contains all the IDs and ROWIDs so the reduction in size probably won't be dramatic. In the test case I set up the compressed index is about half the size of the table, yet Explain Plan shows the query has a higher cost if I use a hint to force it to use the index - maybe due to overheads of compression, I don't know.
You could look into Oracle Text and the CONTAINS expression - but then you would be writing a different query, not using LIKE.

Best way to identify a handful of records expected to have a flag set to TRUE

I have a table that I expect to get 7 million records a month on a pretty wide table. A small portion of these records are expected to be flagged as "problem" records.
What is the best way to implement the table to locate these records in an efficient way?
I'm new to Oracle, but is a materialized view an valid option? Are there such things in Oracle such as indexed views or is this potentially really the same thing?
Most of the reporting is by month, so partitioning by month seems like an option, but a "problem" record may be lingering for several months theorectically. Otherwise, the reporting shuold be mostly for the current month. Would you expect that querying across all month partitions to locate any problem record would cause significant performance issues compared to usinga single table?
Your general thoughts of where to start would be appreciated. I realize I need to read up and I'll do that but I wanted to get the community thought first to make sure I read the right stuff.
One more thought: The primary key is a GUID varchar2(36). In order of magnitude, how much of a performance hit would you expect this to be relative to using a NUMBER data type PK? This worries me but it is out of my control.
It depends what you mean by "flagged", but it sounds to me like you would benefit from a simple index, function based index, or an indexed virtual column.
In all cases you should be careful to ensure that all the index columns are NULL for rows that do not need to be flagged. This way your index will contain only the rows that are flagged (Oracle does not - by default - index rows in B-Tree indexes where all index column values are NULL).
Your primary key being a VARCHAR2 GUID should make no difference, at least with regards to the specific flagging of rows in this question, indexes will point to rows via Oracle internal ROWIDs.
Indexes support partitioning, so if your data is already partitioned, your index could be set to match.
Simple column index method
If you can dictate how the flagging works, or the column already exists, then I would simply add an index to it like so:
CREATE INDEX my_table_problems_idx ON my_table (problem_flag)
/
Function-based index method
If the data model is fixed / there is no flag column, then you can create a function-based index assuming that you have all the information you need in the target table. For example:
CREATE INDEX my_table_problems_fnidx ON my_table (
CASE
WHEN amount > 100 THEN 'Y'
ELSE NULL
END
)
/
Now if you use the same logic in your SELECT statement, you should find that it uses the index to efficiently match rows.
SELECT *
FROM my_table
WHERE CASE
WHEN amount > 100 THEN 'Y'
ELSE NULL
END IS NOT NULL
/
This is a bit clunky though, and it requires you to use the same logic in queries as the index definition. Not great. You could use a view to mask this, but you're still duplicating logic in at least two places.
Indexed virtual column
In my opinion, this is the best way to do it if you are computing the value dynamically (available from 11g onwards):
ALTER TABLE my_table
ADD virtual_problem_flag VARCHAR2(1) AS (
CASE
WHEN amount > 100 THEN 'Y'
ELSE NULL
END
)
/
CREATE INDEX my_table_problems_idx ON my_table (virtual_problem_flag)
/
Now you can just query the virtual column as if it were a real column, i.e.
SELECT *
FROM my_table
WHERE virtual_problem_flag = 'Y'
/
This will use the index and puts the function-based logic into a single place.
Create a new table with just the pks of the problem rows.

Oracle: Index organised table with null values

I have a table which is basically a tree structure with a column parent_id and id.
parent_id is null for root nodes.
There is also a self referential foreign key, so that every parent_id has a corresponding id.
This table is mainly read-only with mostly infrequent batch updates.
One of the most common queries from the application which accesses this table is select ... where parent_id = X. I thought this might be faster if this table was index organised on parent_id.
However, I'm not sure how to index organise this table if parent_id can be null. I'd rather not fudge things so that parent_id=0 is some special id, as I'd have to add dummy values to the table to ensure the foreign key constraints are satisfied, and it also changes the application logic.
Is there any way to index organise a table by possible null value columns?
Solution from question asker:
I found I could get the same benefits from index organisation just by adding the queried columns to the end of the parent_id index, i.e. instead of:
create index foo_idx on foo_tab(parent_id);
I do:
create index foo_idx on foo_tab(parent_id, col1, col2, col3);
Where col1, col2, col3 etc are frequently accessed columns.
I've only done this with indexes which are used to return multiple rows which benefit from the ordering and hence disk locality provided by the index, instead of having to jump around the table. Indexes which are generally used to return single rows I've left to reference the table, as there is only one row to read anyway so locality matters much less.
Like I mentioned, this is a mainly read table, and also space is not a huge concern, so I don't think the overhead to writes caused by these indexes is a big concern.
(I realise this won't index null parent_ids, but instead I've made another index on decode(parent_id, null, 1, null) which indexes nulls and only nulls).
I would try adding the index on the single column parent_id.
If all of the columns in your index are non-null, then this row does not appear in your index.
So for the parent_id = X you cite above, this should use the index. However, if you're doing parent_id is null, then it won't use the index, and you'll be getting the same performance as you have now. This sounds like behaviour that would suit you.
I have used this in the past to improve the performance of queries. It works particulalry well if the number of items in the index is small compared to the number of rows in the database. We had about 3% of our rows in this particular index, and it flew :-)
But, as always, you need to try it and measure the difference in performance. Your mileage may vary.

Is an index clustered or unclustered in Oracle?

How can I determine if an Oracle index is clustered or unclustered?
I've done
select FIELD from TABLE where rownum <100
where FIELD is the field on which is built the index. I have ordered tuples, but the result is wrong because the index is unclustered.
By default all indexes in Oracle are unclustered. The only clustered indexes in Oracle are the Index-Organized tables (IOT) primary key indexes.
You can determine if a table is an IOT by looking at the IOT_TYPE column in the ALL_TABLES view (its primary key could be determined by querying the ALL_CONSTRAINTS and ALL_CONS_COLUMNS views).
Here are some reasons why your query might return ordered rows:
Your table is index-organized and FIELD is the leading part of its primary key.
Your table is heap-organized but the rows are by chance ordered by FIELD, this happens sometimes on an incrementing identity column.
Case 2 will return sorted rows only by chance. The order of the inserts is not guaranteed, furthermore Oracle is free to reuse old blocks if some happen to have available space in the future, disrupting the fragile ordering.
Case 1 will most of the time return ordered rows, however you shouldn't rely on it since the order of the rows returned depends upon the algorithm of the access path which may change in the future (or if you change DB parameter, especially parallelism).
In both case if you want ordered rows you should supply an ORDER BY clause:
SELECT field
FROM (SELECT field
FROM TABLE
ORDER BY field)
WHERE rownum <= 100;
There is no concept of a "clustered index" in Oracle as in SQL Server and Sybase. There is an Index-Organized Table, which is similar but not the same.
"Clustered" indices, as implemented in Sybase, MS SQL Server and possibly others, where rows are physically stored in the order of the indexed column(s) don't exist as such in Oracle. "Cluster" has a different meaning in Oracle, relating, I believe, to the way blocks and tables are organized.
Oracle does have "Index Organized Tables", which are physically equivalent, but they're used much less frequently because the query optimizer works differently.
The closest I can get to an answer to the identification question is to try something like this:
SELECT IOT_TYPE FROM user_tables
WHERE table_name = '<your table name>'
My 10g instance reports IOT or null accordingly.
Index Organized Tables have to be organized on the primary key. Where the primary key is a sequence generated value this is often useless or even counter-productive (because simultaneous inserts get into conflict for the same block).
Single table clusters can be used to group data with the same column value in the same database block(s). But they are not ordered.

Resources