SQLITE Operators "=" compare with "< AND >" performance difference - performance

I've got a strange problem with my sql queries' performance. When I use operator = in LEFT JOIN query takes about 30.514 minutes seconds but in a case with < AND > it takes only 1.717 seconds. This is the query:
-- data_filehash.size>4095 AND data_filehash.size<4097 || 1.717 seconds
SELECT files.*, data_filehash.*
FROM v_filesp AS files
LEFT JOIN data_filehash ON files.id = data_filehash.file AND data_filehash.size>4095 AND data_filehash.size<4097
WHERE data_filehash.file IS NULL
-- data_filehash.size=4096 || 30.515 minutes
SELECT files.*, data_filehash.*
FROM v_filesp AS files
LEFT JOIN data_filehash ON files.id = data_filehash.file AND data_filehash.size=4096
WHERE data_filehash.file IS NULL
Results are always same (33016 records in my database);
v_filep is a view; I've got indexes on data_filehash.size, data_filehash.file and primary key on files(v_filesp).id;
I think that isn't normal. Maybe I should configure something or I don't understand that.
There are EXPLAIN QUERY PLAN for both queries:
query wih = (slower)
SEARCH TABLE files USING INDEX files_c_dup (c_dup=?)
SEARCH TABLE dirs USING INTEGER PRIMARY KEY (rowid=?)
SEARCH TABLE data_filehash USING INDEX index_size (size=?)
query with < AND > (faster)
SEARCH TABLE files USING INDEX files_c_dup (c_dup=?)
SEARCH TABLE dirs USING INTEGER PRIMARY KEY (rowid=?)
SEARCH TABLE data_filehash USING INDEX index_file (file=?)
Last steps are different but what does it mean? How can I tell the db that it should use second better algorithm in the first query?

Update, at first I misread this to be that the inequality comparison was much slower. Which is usually what one expects. It wasn't the case, so let's have another crack at it.
With the inequality comparision The engine first has to find all the records that match the condition size > 4095 it's quite likely that there will be very many. There may be so many matches that it would be futile for the engine to use an index. A full table scan could happen.
But sqlite can only use one index per table in a query. If it cannot use an index on size the best thing is to use the index on file. And this null comparison probably eliminates a large number of rows which leads to a faster query.
It's much simpler with an equality comparison so it looks to use the index on the size field but this probably eliminates a much smaller number of rows than with the other index on is null.
If this still doesn't explain matters, can you update your question to show the number of records returned, the number of records with file=4096 and number of null names.

Ok, now it works corectly with equality comparison. I added INDEXED BY:
SELECT files.*, data_filehash.*
FROM v_filesp AS files
LEFT JOIN data_filehash INDEXED BY index_file
ON files.id = data_filehash.file AND data_filehash.size=4096
WHERE files.c_dup=1 AND data_filehash.file IS NULL
Thanks e4c5

Related

Oracle - Select Only Columns That Contain Data

We have a database with a vast number of tables and columns that was set up by a 3rd party.
Many of these columns are entirely unused. I am trying to create a query that returns a list of all the columns that are actually used (contain > 0 values).
My current attempt -
SELECT table_name, column_name
FROM ALL_TAB_COLUMNS
WHERE OWNER = 'XUSER'
AND num_nulls < 1
;
Using num_nulls < 1 dramatically reduces the number of returned values, as expected.
However, on inspection of some of the tables, there are columns missing from the results of the query that appear to have values in them.
Could anybody explain why this might be the case?
First of all, statistics are not always 100% accurate. They can be gathered on a subset of the table rows, since they are, after all, statistics. Just like pollsters do not have to ask every American how they feel about a given politician, Oracle can get an accurate-enough sense of the data in a table by reading only a portion of it.
Even if the statistics were gathered on 100% of the rows in a table (and they can be gathered that way, if you want), the statistics will become outdated as soon as there are any inserts, updates, or deletes on the table.
Second of all, num_nulls < 1 wouldn't tell you the columns that had no data. Imagine a table with 100 rows and "column X" having num_nulls equal to 80. That would imply the column has 20 non-null values, but would NOT pass your filter. A better approach (if you trust your statistics are not stale and based on a 100% sample of the rows), might be to compare DBA_TAB_COLUMNS.NUM_NULLS < DBA_TABLES.NUM_ROWS. For example, a column that has 99 nulls in a 100 row table has data in 1 row.
"there are columns missing from the results of the query that appear to have values in them."
Potentially every non-mandatory column could appear in this set, because it is likely that some rows will have values but not all rows. "Some rows" being greater than zero means such columns won't pass your test for num_nulls < 1.
So maybe you should search for columns which aren't in use. This query will find columns where every row is null:
select t.table_name
, tc.column_name
from user_tables t
join user_tab_cols tc on t.table_name = tc.table_name
where t.num_rows > 0
and t.num_rows = tc.num_nulls;
Note that if you are using Partitioning you will need to scan user_tab_partitions.num_rows and user_part_col_statistics.num_nulls.
Also, I second the advice others have given regarding statistics. The above query may throw out some false positives. I would treat the results generated from that query as a list of candidates to be investigated further. For instance you could generate queries which counted the actual number of nulls for each column.

different columns on select query results different costs

There is an index at table invt_item_d on (item_id & branch_id & co_id) columns.
The plan results for the first query are TABLE ACCESS FULL and cost is 528,
results for the second query are INDEX FAST FULL SCAN (my index) and cost is 27.
The only difference is, as you can see, the selected column is used in index on the second query.
Is there something wrong with this? And please, can you tell me what should I do to fix this at db administration level?
select d.qty
from invt_item_d d
where d.item_id = 999
and d.branch_id = 888
and d.co_id = 777
select d.item_id
from invt_item_d d
where d.item_id = 999
and d.branch_id = 888
and d.co_id = 777
EDIT:
i made a new query and this query's cost is 529, with TABLE ACCESS FULL.
select qty from invt_item_d
so it doesn't matter if i use an index or not. Some says this is normal, is this a normal behaviour really?
In the first case, the table must be accessed, since the "qty" column is only stored in the table.
In the second case, all the columns used in the query can be read from the index, skipping the table read altogether.
You can add another index on columns (item_id, branch_id, co_id, qty) and it will most probably be used in the first query.
From the Oracle documentation: http://docs.oracle.com/cd/E11882_01/server.112/e25789/indexiot.htm
A fast full index scan is a full index scan in which the database
accesses the data in the index itself without accessing the table, and
the database reads the index blocks in no particular order.
Fast full index scans are an alternative to a full table scan when
both of the following conditions are met:
The index must contain all columns needed for the query.
A row containing all nulls must not appear in the query result set. For this result to be guaranteed, at least one column in the
index must have either:
A NOT NULL constraint
A predicate applied to it that prevents nulls from being considered in the query result set
This is exactly the main purpose of using index -- make search faster.
Querying columns with indexes are faster compared to querying columns without indexes.
Its basic oracle knowledge.
I am adding another answer because it seems to be more convinient.
First:
" i doesn't hit the index because there are 34000 rows, not millions". This is COMPLETELY WRONG and a dangerous understanding.
What I meant was, if there are a few thousand rows, and the index is not hit(oracle engine does a full table scan(TABLE ACCESS FULL) then), its not a big deal. Oracle is fast enough to read few thousand rows in a matter of a second(even without indexes) , and hence you wont feel the difference.The query is still slower(than the occasion when there is an index) , but its is so minimally slower that you wont feel the difference.
But, if there are millions of rows, the execution of the query will be much, much slower without index ( as this time it will scan millions of rows in a full table scan)and your performance will be hit.
Second: Why on earth do you have to loop over a table with 34000 rows, that too 4000 times???
Thats a terrible approach. Avoid loops as much as possible.There has to be a better approach!
Third:
You can force the oracle optimiser to hit the index by using the index hint.You will need to know the name of the index for that.
select /*+ index(invt_item_d <index_name>) */
d.qty
from invt_item_d d
where d.item_id = 999
and d.branch_id = 888
and d.co_id = 777
Here is the link to a stack overflow question on index hint

Is it OK to create several indexes on a table if they're really needed

I have a table with 7 columns.
It's going to contain lots and lots of data - something like more than 1.7 million records will be added every month.
Of those 7 columns 5 are the ones that I'll be using in the WHERE clause of my queries against this table in different combinations.
Is it OK to create different indexes for those possible combinations ?
I'm asking this question because if I do that, there'll be more than 10 indexes on this table and I'm not sure if this is a good idea.
On the other hand, I'm afraid of querying a table with this big amount of data without indexes.
Here's the table:
CREATE TABLE AG_PAYMENTS_TO_BE
(
PAYMENTID NUMBER(15, 0) NOT NULL
, DEPARTID NUMBER(3,0)
, PENSIONERID NUMBER(11, 0) NOT NULL
, AMOUNT NUMBER(6, 2)
, PERIOD CHAR(6 CHAR)
, PAYMENTTYPE NUMBER(1,0)
, ST NUMBER(1, 0) DEFAULT 0
, CONSTRAINT AG_PAYMENTS_TO_BE_PK PRIMARY KEY
(
PAYMENTID
)
ENABLE
);
Possible queries:
SELECT AMOUNT FROM AG_PAYMENTS_TO_BE WHERE ST=0 AND DEPARTID=112 AND PERIOD='201207';
SELECT AMOUNT FROM AG_PAYMENTS_TO_BE WHERE ST=0 AND PENSIONERID=123456 AND PERIOD='201207';
SELECT AMOUNT FROM AG_PAYMENTS_TO_BE WHERE ST=0 AND PENSIONERID=123456 AND PERIOD='201207' AND PAYMENTTYPE=1;
SELECT AMOUNT FROM AG_PAYMENTS_TO_BE WHERE ST=0 AND DEPARTID=112 AND ST=0;
SELECT AMOUNT FROM AG_PAYMENTS_TO_BE WHERE ST=0 AND PENSIONERID=123456;
and so on.
Ignoring index skip scans* for the moment, in order for a query to use an index:
The leading index columns must be listed in the query
They must compared using exact joins (i.e. using =, not <,> or like)
For example, a table with a composite index on (a, b) could use the index in the following queries:
a = :b1 and b >= :b2
a = :b1
but not:
b = :b2
because column b is listed second in the index. * In some cases, it's possible for the index to be used in this case via an index skip scan. This is where the leading column in the index is skipped. There needs to be relatively few distinct values for the first column however, which doesn't happen often (in my experience).
Note that a "larger" index can be used by queries which only use some of the leading columns from it. So in the example above, an index on just a is redundant because the queries shown can use the index on a, b. An index on just b may be useful however.
The more indexes you add, the slower your inserts/updates/deletes will be, because the indexes have to be maintained at the same time as the table. Therefore you should aim to keep the number of indexes down, unless there's significant query benefits to adding a new one. This is something you'll have to measure in your environment to determine the exact cost/benefit.
Note that having multiple indexes with similar columns can lead to the wrong index being selected. So there is potential downside for selects when you have many similar indexes. There is also a slight overhead in parse times, as Oracle has more options to consider when selecting the execution plan.
Looking at your queries I believe you only need indexes on:
st, departid, period
st, pensionerid, period
You may wish to add amount at the end of these as well, so your queries can be fully answered from the index, saving you a table lookup. You may also need further indexes if these columns are foreign keys to other tables, to prevent locking issues.
This decision would greatly depend on expected number of distinct values in each column, and thus selectivity of each possible index.
Things I would consider while making decisions:
Obviously, PAYMENTTYPE & ST fields hold up to 10 19 distinct values each, which is pretty unselective if we keep in mind your expected volume of data (~400M rows), so they won't help you much.
However, they probably could become good candidates for list partitioning instead.
I would also think of switching PERIOD CHAR(6 CHAR) to DATE and making a composite range-list partition on period+st/paymenttype.
DEPARTID - If you have hundreds of departments, then it's probably an indexing candidate, but if only dozens - then probably a full scan would perform way faster.
PENSIONERID seems to be a high-selectivity field, so I would consider creating a separate index on it, and including it in a composite index on PERIOD+PENSIONERID (in that field order).
I think you should create a few combined indexes (like ('ST' and 'PERIOD') and ('ST' and 'PENSIONERID'). That will speed up most of your sample queries...

Getting count from large tables

I was trying to get the count from a table with millions of entries. My query looks somewhat like this:
Select count(*)
from Users
where status = 'A' and office_id = '000111' and user_type = 'C'
Status can be A or C, User Type can be C or R.
Status, Office_id and User_type are Strings
The result has around 10 million rows, and its taking a lot of time. I just want the total count.
Would appreciate if anyone could tell me why its taking this much time, and workaround if any.
Do let me know in case of any more details required.
The database engine is Oracle 11g
Edit: I Added index for all three columnns. Still theres no improvement. Also tried the below query, but it always returns the total count in the table without checking the conditions.
SELECT COUNT(office_id_key)
FROM Users
WHERE EXISTS (SELECT * FROM Users WHERE status = 'A' AND office_id = '000111' AND user_type = 'C')
Why not just simply create indexes on the table on age and place this way your search will be faster then simply scanning the entire table for these values.
CREATE INDEX age_index ON Employee(age);
CREATE INDEX place_index ON Employee(place);
This should speed up the process.
AMENDED BASED ON QUERY CHANGE
CREATE INDEX status_index ON Users(status);
CREATE INDEX office_id_index ON Users(office_id);
CREATE INDEX user_type_index ON Users(user_type);
You'll want to create the following multi-column index on the Users table to improve the query:
(office_id, status, user_type)
The database can use a "covering" index with COUNT(*). Create the index with the columns in that order, due to cardinality.
After adding the indexes, I think changing where to where exists and a subquery may help as well.
Edit2: removed exists as it was returning all valid, usually the subquery has multiple joins, but I guess the case with one table returns all true. I read that count is optimized to act similar to exists when it has only one table and no where clause, so I treat the results as a table. Hopefully, this will have the same quick results.
select count(1) from
(select 1 from Employee where age = '25' and place = 'bricksgate')
Edit: When you use 'where exists' the db server doesn't load your data into memory and also takes advantage of the indexes because you will be reading values from the indexes not doing costly table lookups. You may also want to change count(*) to count(place) - that way it will limit the fields to an indexed field as well.
In your original query, your data was doing table lookups and then loading them into memory just to be counted.
count(1) works faster than count(*)

Oracle - Understanding the no_index hint

I'm trying to understand how no_index actually speeds up a query and haven't been able to find documentation online to explain it.
For example I have this query that ran extremely slow
select *
from <tablename>
where field1_ like '%someGenericString%' and
field1_ <> 'someSpecificString' and
Action_='_someAction_' and
Timestamp_ >= trunc(sysdate - 2)
And one of our DBAs was able to speed it up significantly by doing this
select /*+ NO_INDEX(TAB_000000000019) */ *
from <tablename>
where field1_ like '%someGenericString%' and
field1_ <> 'someSpecificString' and
Action_='_someAction_' and
Timestamp_ >= trunc(sysdate - 2)
And I can't figure out why? I would like to figure out why this works so I can see if I can apply it to another query (this one a join) to speed it up because it's taking even longer to run.
Thanks!
** Update **
Here's what I know about the table in the example.
It's a 'partitioned table'
TAB_000000000019 is the table not a column in it
field1 is indexed
Oracle's optimizer makes judgements on how best to run a query, and to do this it uses a large number of statistics gathered about the tables and indexes. Based on these stats, it decides whether or not to use an index, or to just do a table scan, for example.
Critically, these stats are not automatically up-to-date, because they can be very expensive to gather. In cases where the stats are not up to date, the optimizer can make the "wrong" decision, and perhaps use an index when it would actually be faster to do a table scan.
If this is known by the DBA/developer, they can give hints (which is what NO_INDEX is) to the optimizer, telling it not to use a given index because it's known to slow things down, often due to out-of-date stats.
In your example, TAB_000000000019 will refer to an index or a table (I'm guessing an index, since it looks like an auto-generated name).
It's a bit of a black art, to be honest, but that's the gist of it, as I understand things.
Disclaimer: I'm not a DBA, but I've dabbled in that area.
Per your update: If field1 is the only indexed field, then the original query was likely doing a fast full scan on that index (i.e. reading through every entry in the index and checking against the filter conditions on field1), then using those results to find the rows in the table and filter on the other conditions. The conditions on field1 are such that an index unique scan or range scan (i.e. looking up specific values or ranges of values in the index) would not be possible.
Likely the optimizer chose this path because there are two filter predicates on field1. The optimizer would calculate estimated selectivity for each of these and then multiply them to determine their combined selectivity. But in many cases this will significantly underestimate the number of rows that will match the condition.
The NO_INDEX hint eliminates this option from the optimizer's consideration, so it essentially goes with the plan it thinks is next best -- possibly in this case using partition elimination based on one of the other filter conditions in the query.
Using an index degrades query performance if it results in more disk IO compared to querying the table with an index.
This can be demonstrated with a simple table:
create table tq84_ix_test (
a number(15) primary key,
b varchar2(20),
c number(1)
);
The following block fills 1 Million records into this table. Every 250th record is filled with a rare value in column b while all the others are filled with frequent value:
declare
rows_inserted number := 0;
begin
while rows_inserted < 1000000 loop
if mod(rows_inserted, 250) = 0 then
insert into tq84_ix_test values (
-1 * rows_inserted,
'rare value',
1);
rows_inserted := rows_inserted + 1;
else
begin
insert into tq84_ix_test values (
trunc(dbms_random.value(1, 1e15)),
'frequent value',
trunc(dbms_random.value(0,2))
);
rows_inserted := rows_inserted + 1;
exception when dup_val_on_index then
null;
end;
end if;
end loop;
end;
/
An index is put on the column
create index tq84_index on tq84_ix_test (b);
The same query, but once with index and once without index, differ in performance. Check it out for yourself:
set timing on
select /*+ no_index(tq84_ix_test) */
sum(c)
from
tq84_ix_test
where
b = 'frequent value';
select /*+ index(tq84_ix_test tq84_index) */
sum(c)
from
tq84_ix_test
where
b = 'frequent value';
Why is it? In the case without the index, all database blocks are read, in sequential order. Usually, this is costly and therefore considered bad. In normal situation, with an index, such a "full table scan" can be reduced to reading say 2 to 5 index database blocks plus reading the one database block that contains the record that the index points to. With the example here, it is different altogether: the entire index is read and for (almost) each entry in the index, a database block is read, too. So, not only is the entire table read, but also the index. Note, that this behaviour would differ if c were also in the index because in that case Oracle could choose to get the value of c from the index instead of going the detour to the table.
So, to generalize the issue: if the index does not pick few records then it might be beneficial to not use it.
Something to note about indexes is that they are precomputed values based on the row order and the data in the field. In this specific case you say that field1 is indexed and you are using it in the query as follows:
where field1_ like '%someGenericString%' and
field1_ <> 'someSpecificString'
In the query snippet above the filter is on both a variable piece of data since the percent (%) character cradles the string and then on another specific string. This means that the default Oracle optimization that doesn't use an optimizer hint will try to find the string inside the indexed field first and also find if the data it is a sub-string of the data in the field, then it will check that the data doesn't match another specific string. After the index is checked the other columns are then checked. This is a very slow process if repeated.
The NO_INDEX hint proposed by the DBA removes the optimizer's preference to use an index and will likely allow the optimizer to choose the faster comparisons first and not necessarily force index comparison first and then compare other columns.
The following is slow because it compares the string and its sub-strings:
field1_ like '%someGenericString%'
While the following is faster because it is specific:
field1_ like 'someSpecificString'
So the reason to use the NO_INDEX hint is if you have comparisons on the index that slow things down. If the index field is compared against more specific data then the index comparison is usually faster.
I say usually because when the indexed field contains more redundant data like in the example #Atish mentions above, it will have to go through a long list of comparison negatives before a positive comparison is returned. Hints produce varying results because both the database design and the data in the tables affect how fast a query performs. So in order to apply hints you need to know if the individual comparisons you hint to the optimizer will be faster on your data set. There are no shortcuts in this process. Applying hints should happen after proper SQL queries have been written because hints should be based on the real data.
Check out this hints reference: http://docs.oracle.com/cd/B19306_01/server.102/b14211/hintsref.htm
To add to what Rene' and Dave have said, this is what I have actually observed in a production situation:
If the condition(s) on the indexed field returns too many matches, Oracle is better off doing a Full Table Scan.
We had a report program querying a very large indexed table - the index was on a region code and the query specified the exact region code, so Oracle CBO uses the index.
Unfortunately, one specific region code accounted for 90% of the tables entries.
As long as the report was run for one of the other (minor) region codes, it completed in less than 30 minutes, but for the major region code it took many hours.
Adding a hint to the SQL to force a full table scan solved the problem.
Hope this helps.
I had read somewhere that using a % in front of query like '%someGenericString%' will lead to Oracle ignoring the INDEX on that field. Maybe that explains why the query is running slow.

Resources