Oracle: use index for searching null values - oracle

I've done some search but I prefer something like an hint or similar
http://www.dba-oracle.com/oracle_tips_null_idx.htm
http://www.oracloid.com/2006/05/using-index-for-is-null/

What about a functional index using NVL2, like;
CREATE TABLE foo (bar INTEGER);
INSERT INTO foo VALUES (1);
INSERT INTO foo VALUES (NULL);
CREATE INDEX baz ON foo (NVL2(bar,0,1));
and then;
DELETE plan_table;
EXPLAIN PLAN FOR SELECT * FROM foo WHERE NVL2(bar,0,1) = 1;
SELECT operation, object_name FROM plan_table;
should give you
OPERATION OBJECT_NAME
---------------- -----------
SELECT STATEMENT
TABLE ACCESS FOO
INDEX BAZ << yep

If you're asking, "How can I create an index that would allow it to be used when searching for NULL values on a particular field", my suggestion is to create an index on the field you're interested in PLUS the primary key field(s). Thus, if you've got a table called A_TABLE, with field VAL that you want to search for NULLs, and a primary key named PK, I'd create an index on (VAL, PK).
Share and enjoy.

I'm going to "answer" the non-question above.
The articles you link to are kinda right - Oracle's b-tree indexes will not capture when the leaf nodes are null. Take this example:
CREATE TABLE MYTABLE (
ID NUMBER(8) NOT NULL,
DAT VARCHAR2(100)
);
CREATE INDEX MYTABLE_IDX_1 ON MYTABLE (DAT);
/* Perform inserts into MYTABLE where some DAT are null */
SELECT COUNT(*) FROM MYTABLE WHERE DAT IS NULL;
The ending SELECT will not be able to use the index, because the leafs (right-most column) will not capture the nulls. Burleson's solution is stupid, because now you have to use a NVL in all your queries and have compromised the data in the tables. Gorbachev's method includes a known NOT NULL column for the leaves of the b-tree, but this expands the index for no reason. Maybe in his case the index made sense that way for tuning other queries, but if all you want to do is find the NULLs then the easiest solution is to make the leaf a constant.
CREATE INDEX MYTABLE_IDX_1 ON MYTABLE (DAT, 1);
Now, the leaves are all the constant (1), and by default the nulls will all be together (either at the top or bottom of the index, but it doesn't really matter as Oracle can use the index forwards or backwards). There is a slight storage penalty for that constant, but a single number is smaller than most other data fields in a typical table. Now the database can use the index when querying for nulls...if the optimizer finds that the best way to get the data.

Related

Database Indexing

I'm trying to figure out on how to create an Index for below query such that the SELECT statement only traverse the leaf level of the index horizontally and it does not access the relational table. I'm working on a relational database in Oracle.
SELECT SUM(SUM(qty))
FROM PlaceOrder
GROUP BY OrderNumber
HAVING COUNT(LineNumber) > 10;
Am I correct to create the below index?
CREATE INDEX IDX_PO
ON PlaceOrder(qty, OrderNumber, LineNumber);
Thank you.
As commented by astentx, not null constraint is needed. In fact, as long as any of qty, OrderNumber, LineNumber has not null, Oracle should be able to use the index.
Also, note that unless you specially want to exclude lines with null LineNumber, you can replace COUNT(LineNumber) with COUNT(OrderNumber) or even COUNT(*).

find a best way to traverse oracle table

I have an oracle table. Table's DDL is (not have the primary key)
create table CLIENT_ACCOUNT
(
CLIENT_ID VARCHAR2(18) default ' ' not null,
ACCOUNT_ID VARCHAR2(18) default ' ' not null,
......
)
create unique index UK_ACCOUNT
on CLIENT_ACCOUNT (CLIENT_ID, ACCOUNT_ID)
Then, the data's scale is very huge, maybe 100M records. I want to traverse this whole table's data with batch.
Now, I use the table's index to batch traverse. But I have some oracle grammar problems.
# I want to use this SQL, but grammar error.
# try to use b-tree's index to locate start position, but not work
select * from CLIENT_ACCOUNT
WHERE (CLIENT_ID, ACCOUNT_ID) > (1,2)
AND ROWNUM < 1000
ORDER BY CLIENT_ID, ACCOUNT_ID
Has the fastest way to batch touch table data?
Wild guess:
select * from CLIENT_ACCOUNT
WHERE CLIENT_ID > '1'
and ACCOUNT_ID > '2'
AND ROWNUM < 1000;
It would at least compile, although whether it correctly implements your business logic is a different matter. Note that I have cast your filter criteria to strings. This is because your columns have a string datatype and you are defaulting them to spaces, so there's a high probability those columns contain non-numeric values.
If this doesn't solve your problem, please edit your question with more details; sample input data and expected output is always helpful in these situations.
Your data model seems odd.
Your columns are defined as varchar2. So why is your criteria numeric?
Also, why do you default the key columns to space? It would be better to leave unpopulated values as null. (To be clear, NULL is not a good thing in an indexed column, it's just better than a space.)

Define index for sparse column

I have a table with a columns 'A' and 'B'.
'A' is a column with 90% 'null' and 10% different values , and most of the time I query to have record with one or two of these different values.
and 'B' is a column with 90% value='1' and 10% different values and most of the time I query to have record with one or two of these different values.
In this table we have DML transaction most of the time.
now , I don't know define index on these columns is good? if yes which type of index?
In principle Bitmap Index would be the best in such situation. However, due to mulit-user environment they are not suitable - you would slow down your application significantly by table locks and perhaps get even dead-locks.
Maybe you can optimize your application by smart partitioning and usage of Partial Indexes (new feature in Oracle 12c)
CREATE TABLE statements below should be equivalent.
CREATE TABLE YOUR_TABLE (a INTEGER, b INTEGER, ... more COLUMNS)
PARTITION BY LIST (a) SUBPARTITION BY LIST (b) (
PARTITION part_a_NULL VALUES (NULL) (
SUBPARTITION part_a_NULL_b_1 VALUES (1) INDEXING OFF,
SUBPARTITION part_a_NULL_b_other VALUES (DEFAULT) INDEXING ON
),
PARTITION part_a_others VALUES (DEFAULT) (
SUBPARTITION part_a_others_b_1 VALUES (1) INDEXING OFF,
SUBPARTITION part_a_others_b_other VALUES (DEFAULT) INDEXING ON
)
);
CREATE TABLE YOUR_TABLE (a INTEGER, b INTEGER, ... more COLUMNS)
PARTITION BY LIST (a) SUBPARTITION BY LIST (b)
SUBPARTITION TEMPLATE (
SUBPARTITION b_1 VALUES (1) INDEXING OFF,
SUBPARTITION b_other VALUES (DEFAULT) INDEXING ON
)
(
PARTITION part_a_NULL VALUES (NULL),
PARTITION part_a_others VALUES (DEFAULT)
);
CREATE INDEX IND_A ON YOUR_TABLE (A) LOCAL INDEXING PARTIAL;
CREATE INDEX IND_B ON YOUR_TABLE (B) LOCAL INDEXING PARTIAL;
By this your index will consume only 10% of entire tablespace. If your WHERE condition is WHERE A IS NULL or WHERE B = 1 then Oracle optimizer would skip such indexes anyway.
Verify with this query
SELECT table_name, partition_name, subpartition_name, indexing
FROM USER_TAB_SUBPARTITIONS
WHERE table_name = 'YOUR_TABLE';
if INDEXING is used on desired subpartitions.
Update
I just see actually this is an overkill because NULL values on column A do not create any index entry anyway. So, it can be simplified to
CREATE TABLE YOUR_TABLE (a INTEGER, b INTEGER, ... more COLUMNS)
PARTITION BY LIST (b) (
PARTITION part_b_1 VALUES (1) INDEXING OFF,
PARTITION part_b_other VALUES (DEFAULT) INDEXING ON
);
For example, if you have index a_b_idx on A, B (in that order):
a) select ... from ... where A = ... will use index
b) select ... from ... where B = ... will not use index
On the other side, if you have index b_a_idx on B, A:
a) select ... from ... where A = ... will not use index
b) select ... from ... where B = ... will use index
Oracle can't use second column in index if it doesn't filter on first column, since in regular cases index is tree-like structure: column1->column2->column3->etc.
You need index on column A only or on columns A, B if you do queries like a).
You need index on column B only or on columns B, A if you do queries like b).
Oracle doesn't store all-null values in index, but it can store null value for A if B contains non-null value.
Sometimes it's more fruitful to read whole table into memory and ignore index. Optimizer can do it if possible result set is big and it goes for all records, since index-to-record transition costs more than simple records read.
Also sometimes it happens erroneously for tables without statistics, so you either need jobs with alter table ... compute statistics or oracle 11+ that can compute statistics like this without jobs.
Most of the times, another index is good thing for queries, but bad thing for updates/disk. Each index takes disk space and each update of record(s) makes updates to every index. So for heavily updated tables it's not good to have many indexes, but for frequently queried tables it's better to have indexes covering all common cases.
For most flat queries (without joins/subqueries/hierarchy) only 1 index is used, so having indexes for each column is generally just a waste of disk space. You need multicolumn index to optimize where A=... and B=...
As for index type, you probably need simple non-unique indexes.
Column A
Let assume that you create an index named _columnA_index_. In general, indexes in RDBMS would not include NULL values, which means there is no index entries in _columnA_index_ pointing to records having NULL values. Thus, the following query
Q1: select * from MyTable where A is null;
will result in a table scan instead ( or DBMS opts to use another index on another column if any)
However, since there is 10% of records having 'different values', the _columnA_index_ will of course help for queries, for example.
Q2: select * from MyTable where A = '123';
In the above example, if the query returns < 1% of the records, the _columnA_index_ is helpful. Depending on how selective the query is, the index greatly improves the performance. You can create an index that is suitable for datatype of column A.
Column B
Similarly, an index on B will not help
Q3: select * from MyTable where B = 1;
but it will help with different values
Q4: select * from MyTable where B = '456';
NULL values
So far, I answered that any index does not help with NULL values. Therefore, if you need to query Q1 most of the time, I suggest the following ideas
Make sure that your version of DBMS does support NULL values be included in indexes. For example Oracle 11g does but not versions before that.
Plan to create function-based index here, again with Oracle. But you can take the idea at least.
Redesign the logic of your application / your need to do querying on Null values. I prefer this approach.

Procedure to alter and update table on hierarchical relationship to see if there are any children

I have a hierarchical table on Oracle pl/sql. something like:
create table hierarchical (
id integer primary key,
parent_id references hierarchical ,
name varchar(100));
I need to create a procedure to alter that table so I get a new field that tells, for each node, if it has any children or not.
Is it possible to do the alter and the update in one single procedure?
Any code samples would be much appreciated.
Thanks
You can not do the ALTER TABLE (DDL) and the UPDATE (DML) in a single step.
You will have to do the ALTER TABLE, followed by the UPDATE.
BEGIN
EXECUTE IMMEDIATE 'ALTER TABLE hierarchical ADD child_count INTEGER';
--
EXECUTE IMMEDIATE '
UPDATE hierarchical h
SET child_count = ( SELECT COUNT(*)
FROM hierarchical h2
WHERE h2.parent_id = h.id )';
END;
Think twice before doing this though. You can easily find out now if an id has any childs with a query.
This one would give you the child-count of all top-nodes for example:
SELECT h.id, h.name, COUNT(childs.id) child_count
FROM hierarchical h
LEFT JOIN hierarchical childs ON ( childs.parent_id = h.id )
WHERE h.parent_id IS NULL
GROUP BY h.id, h.name
Adding an extra column with redundant data will make changing your data more difficult, as you will always have to update the parent too, when adding/removing childs.
If you just need to know whether children exist, the following query can do it without the loop or the denormalized column.
select h.*, connect_by_isleaf as No_children_exist
from hierarchical h
start with parent_id is null
connect by prior id = parent_id;
CONNECT_BY_LEAF returns 0 if the row has children, 1 if it does not.
I think you could probably get the exact number of children through a clever use of analytic functions and the LEVEL pseudo-column, but I'm not sure.

Sequence with variable

In SQL we will be having a sequence. But it should be appended to a variable like this
M1,M2,M3,M4....
Any way of doing this ?
Consider having the prefix stored in a separate column in the table, e.g.:
CREATE TABLE mytable (
idprefix VARCHAR2(1) NOT NULL,
id NUMBER NOT NULL,
CONSTRAINT mypk PRIMARY KEY (idprefix, id)
);
In the application, or in a view, you can concatenate the values together. Or, in 11g you can create a virtual column that concatenates them.
I give it 99% odds that someone will say "we want to search for ID 12345 regardless of the prefix" and this design means you can have a nice index lookup instead of a "LIKE '%12345'".
select 'M' || my_sequence.nextval from dual;

Resources