Oracle query with "in clause" - how to speed up using an index? - oracle

I have an Oracle query that uses an IN clause with eight given values, like:
select * from mytable a
where a.wf_type in ('value1', 'value2', 'value3', 'value4', 'value5', 'value6', 'value7', 'value8');
The table is not really big (about 3 million rows) and the query does a full table scan.
Therefore I added an index for the wf_type attribute.
But the index is not used by the query with the in-clause. If I change the query to one specific value like
select * from mytable a where a.wf_type = 'value1';
the index is used and the query runs fast.
How do I speed up the query with the IN clause? Is it possible by using an index, or are there other ways?
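One way to investigate this, as a rough sketch rather than a definitive fix: refresh the statistics, then ask the optimizer for its plan with an INDEX hint and compare it with the unhinted plan. The index name mytable_wf_type_ix is hypothetical, since the question does not name the index.

```sql
-- Refresh statistics so the optimizer sees current data
-- (cascade => true also gathers statistics for the indexes).
begin
  dbms_stats.gather_table_stats(
    ownname => user,
    tabname => 'MYTABLE',
    cascade => true);
end;
/

-- Ask for the plan with an index hint.
-- mytable_wf_type_ix is an assumed name for the index on wf_type.
explain plan for
select /*+ index(a mytable_wf_type_ix) */ *
from   mytable a
where  a.wf_type in ('value1', 'value2', 'value3', 'value4',
                     'value5', 'value6', 'value7', 'value8');

-- Show the plan that was just explained.
select * from table(dbms_xplan.display);
```

If the eight values together match a large share of the 3 million rows, a full table scan can genuinely be the cheaper plan, so the estimated cost and row counts of both plans are worth comparing before forcing the index.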

Related

Efficient use of an index for a self join with a group by

I'm trying to speed up the following
create table tab2 parallel 24 nologging compress for query high as
select /*+ parallel(24) index(a ix_1) index(b ix_2)*/
a.usr
,a.dtnum
,a.company
,count(distinct b.usr) as num
,count(distinct case when b.checked_1 = 1 then b.usr end) as num_che_1
,count(distinct case when b.checked_2 = 1 then b.usr end) as num_che_2
from tab a
join tab b on a.company = b.company
and b.dtnum between a.dtnum-1 and a.dtnum-0.0000000001
group by a.usr, a.dtnum, a.company;
by using indexes
create index ix_1 on tab(usr, dtnum, company);
create index ix_2 on tab(usr, company, dtnum, checked_1, checked_2);
but the execution plan tells me that it's going to be an index full scan for both indexes, and the calculations are very long (1 day is not enough).
About the data. Table tab has over 3 million records. None of the single columns are unique. The unique values here are pairs of (usr, dtnum), where dtnum is a date with time written as a number in the format yyyy,mmddhh24miss. Columns checked_1, checked_2 have values from the set (null, 0, 1, 2). Company holds an id for a company.
Each pair can only have one value of checked_1, checked_2 and company, as it is unique. Each user can be in multiple pairs with different dtnum.
Edit
@Roberto Hernandez: I've attached the picture with the execution plan. As for parallel 24, in our company we are told to create tables with options 'parallel [num] nologging compress for query high'. I'm using 24 but I'm no expert in this field.
@Sayan Malakshinov: http://sqlfiddle.com/#!4/40b6b/2 Here I've simplified by giving data with checked_1 = checked_2, but in real life this may not be true.
@scaisEdge:
For
create index my_id1 on tab (company, dtnum);
create index my_id2 on tab (company, dtnum, usr);
I get
For table tab, your join condition is based on the columns
company, dtnum
so your index should be based primarily on these columns:
create index my_id1 on tab (company, dtnum);
The indexes you are using are useless because their left-most positions do not contain the columns used in the join/where condition.
Optionally, you can add usr and the checked columns in the right-most positions to avoid the table access and let the database engine retrieve all the information from the index values:
create index my_id1 on tab (company, dtnum, usr, checked_1, checked_2);
Indexes (bitmap or otherwise) are not that useful for this execution. If you look at the execution plan, the optimizer thinks the group-by is going to reduce the output to 1 row. This results in serialization (PX SELECTOR), so I would question the quality of your statistics. What you may need is to create a column group on the three group-by columns to improve the cardinality estimate of the group-by.
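A minimal sketch of the column group idea, assuming the table is owned by the current user and is named TAB as in the question. The variable l_name is only a local holder for the generated extension name.

```sql
declare
  l_name varchar2(128);  -- receives the system-generated extension name
begin
  -- Create a column group (extended statistics) on the three group-by columns.
  l_name := dbms_stats.create_extended_stats(
              ownname   => user,
              tabname   => 'TAB',
              extension => '(USR,DTNUM,COMPANY)');

  -- Re-gather table statistics so the new column group gets statistics too.
  dbms_stats.gather_table_stats(
    ownname    => user,
    tabname    => 'TAB',
    method_opt => 'for all columns size auto');
end;
/
```

After this, the estimated row count of the group-by step in the execution plan is the thing to re-check.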

Understanding the behavior of dbms_random.value in where clause

I have a table with two columns (using an Oracle 11g database): Country and IndexNumber. The table contains 10 rows (10 different countries, each with its unique index number).
For example:
Country    IndexNumber
India      1
Australia  2
...        ...
US         10
Now I want to fetch a random row from the above table by generating a random number using dbms_random.value(1,10). To achieve that I am using the query below:
select * from tab_name where indexnumber = dbms_random.value(1,10);
I am not able to understand the output of this query, as sometimes it fetches one row, sometimes zero rows, and sometimes more than one row.
Can someone please help me understand how Oracle evaluates this query?
Thanks
Ankit
Since dbms_random.value is a nondeterministic PL/SQL function, it will be called once for each row evaluated by the query.
The function might return 4 when evaluating the first row, then it might return 8 on the second row, etc.
To compare each row to a single random number, you can turn the function call into a scalar subquery, e.g.:
select * from tab_name where indexnumber = (select dbms_random.value(1,10) from dual);
Since the subquery is not correlated to the main query, Oracle will execute it only once (for the first row returned from the table) and remember the result for all subsequent rows. In particular, if there is a suitable index on indexnumber, the query will be able to use it more efficiently, since it knows it is probing for a single value.
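One further detail worth checking: dbms_random.value(low, high) returns a fractional number that is greater than or equal to low and less than high, so it will rarely equal a whole-number index as it stands. A sketch of the same scalar subquery that draws a whole number from 1 to 10, assuming indexnumber holds the integers 1..10 as in the example table:

```sql
-- trunc(dbms_random.value(1, 11)) yields an integer from 1 to 10,
-- because the upper bound 11 itself is never returned.
select *
from   tab_name
where  indexnumber = (select trunc(dbms_random.value(1, 11)) from dual);
```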
When you run your original query:
select * from tab_name where indexnumber = dbms_random.value(1,10);
it appears that the call to dbms_random is happening for each record's where clause. In other words, there is a chance that every record in your table might be returned if the random number chosen happens to match the index for every record. If you want to retrieve a single random record, then follow this pattern:
select *
from
( select * from tab_name order by DBMS_RANDOM.VALUE )
where rownum < 2;

SQLite SELECT with max() performance

I have a table with about 1.5 million rows and three columns. Column 'timestamp' is of type REAL and indexed. I am accessing the SQLite database via PHP PDO.
The following three selects run in less than a millisecond:
select timestamp from trades
select timestamp + 1 from trades
select max(timestamp) from trades
The following select needs almost half a second:
select max(timestamp) + 1 from trades
Why is that?
EDIT:
Lasse has asked for an "explain query plan". I have run this within a PHP PDO query since I have no direct SQLite3 command line tool access at the moment. I guess it does not matter; here is the result:
explain query plan select max(timestamp) + 1 from trades:
[selectid] => 0
[order] => 0
[from] => 0
[detail] => SCAN TABLE trades (~1000000 rows)
explain query plan select max(timestamp) from trades:
[selectid] => 0
[order] => 0
[from] => 0
[detail] => SEARCH TABLE trades USING COVERING INDEX tradesTimestampIdx (~1 rows)
The reason this query
select max(timestamp) + 1 from trades
takes so long is that, according to the query plan above, the engine computes the aggregate with a full table scan (SCAN TABLE trades) instead of an index lookup. The scan happens only once, but it has to read all 1.5 million rows.
In the query
select timestamp + 1 from trades
you are doing a calculation for each record, and the engine also needs only a single pass over the data. And in this query
select max(timestamp) from trades
the engine does not have to scan the table at all: the plan shows a single lookup in the covering index tradesTimestampIdx.
From the SQLite documentation:
Queries that contain a single MIN() or MAX() aggregate function whose argument is the left-most column of an index might be satisfied by doing a single index lookup rather than by scanning the entire table.
I emphasized "might" from the documentation, because it appears that the optimization is not applied to a query of the form SELECT MAX(x)+1 FROM table, where the aggregate is wrapped in a larger expression,
or when column x is not the left-most column of an index.
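A common workaround, sketched here on the assumption that the index tradesTimestampIdx from the plan above exists, is to keep the aggregate on its own in a subquery and do the arithmetic outside it. Whether the index lookup is actually used depends on the SQLite version, so it is worth confirming with explain query plan.

```sql
-- The inner query is a bare max() on an indexed column, which is the
-- shape the documentation describes; the + 1 is applied to its result.
select (select max(timestamp) from trades) + 1;

-- Alternative: read the last index entry directly.
-- Note: on an empty table this returns no row, whereas the query above returns NULL.
select timestamp + 1 from trades order by timestamp desc limit 1;
```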

Using index to speed up child <> parent query

I have query similar to this:
select *
from table1
where status = 'ACTV'
and child_id <> parent_id
The problem is that this table is quite large and Oracle is doing a full table scan.
I was trying to create an index (with status, child_id, parent_id columns) that would speed up this query, but Oracle is not using this index even with a hint.
Is there a way to speed up this query?
You can use a function-based index:
CREATE INDEX child_parent ON table1(DECODE(child_id,parent_id,1, 0))
And then use it in your select:
select *
from table1
where status = 'ACTV'
and DECODE(child_id,parent_id,1, 0) = 0
The only drawback of this solution is that it slows down insert and update operations a bit more than a regular index.
Also, if the number of rows that could be returned is large, Oracle may still do a full table scan.
In a parent/child table, "child_id <> parent_id" is almost always true, so the query will fetch around 99% of the data, and a full table scan is then the better approach. An index becomes slower as you select a larger percentage of the data.
If your application always requires "child_id <> parent_id", you can create a check constraint for it. Then you no longer need the condition "child_id <> parent_id" in the where clause.
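A minimal sketch of such a check constraint; the constraint name table1_child_ne_parent_chk is made up for illustration.

```sql
-- Reject any row whose child_id equals its parent_id.
-- The statement fails if existing rows already violate the condition.
-- Rows where either column is NULL still pass, because the check
-- evaluates to unknown rather than false.
alter table table1
  add constraint table1_child_ne_parent_chk
  check (child_id <> parent_id);
```

This only makes sense if rows with child_id = parent_id are never valid in the application. If such rows legitimately exist, the where condition has to stay.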

Oracle: Forcing index usage

I've got these two indexes:
CREATE INDEX NETATEMP.CAMBI_MEM_ANIMALI_ELF_T2A ON NETATEMP.CAMBI_MEM_ANIMALI_ELF_T2
(TELE_TESTATA_LETTURA_ID, ELF_DATA_FINE_FATTURAZIONE)
CREATE INDEX NETATEMP.LET_TESTATE_LETTURE1A ON NETATEMP.LET_TESTATE_LETTURE1
(TELE_STORICO_ID, TRUNC("TELE_DATA_LETTURA"))
CREATE TABLE NETATEMP.cambi_mem_animali_elf
AS
SELECT --/*+ parallel(forn 32) */
DISTINCT
forn_fornitura_id,
TRUNC (tele.TELE_DATA_LETTURA) TELE_DATA_LETTURA,
forn.edw_partition,
DECODE (SUBSTR (forn.TELE_TESTATA_LETTURA_ID, 1, 1), '*', 'MIGRATO', 'INTEGRA') Origine
FROM NETATEMP.cambi_mem_animali_elf_t2 forn,
netatemp.let_testate_letture1 tele
WHERE forn.tele_testata_lettura_id = tele.tele_storico_id
--
AND forn.ELF_DATA_FINE_FATTURAZIONE != TRUNC (tele.TELE_DATA_LETTURA)
It uses two full table scans. I simply can't understand why Oracle doesn't look at both indexes and do an index range scan after that.
How can I force it to do so?
It's because HASH joins don't use indexes on the join predicates.
Read this for all the details: http://use-the-index-luke.com/sql/join/hash-join-partial-objects
You are referencing columns that are not included in the indexes, so even if the join itself would be faster using an index, Oracle would still have to retrieve all the table blocks for the remaining columns.
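If you still want to see what an index-driven plan would cost, hints along these lines could be tried. This is only a sketch for comparison: it asks for a nested loops join that starts from forn and probes tele through the index LET_TESTATE_LETTURE1A, and with two large tables it may well be slower than the hash join.

```sql
select /*+ leading(forn) use_nl(tele) index(tele let_testate_letture1a) */
       distinct
       forn_fornitura_id,
       trunc(tele.tele_data_lettura) tele_data_lettura,
       forn.edw_partition
from   netatemp.cambi_mem_animali_elf_t2 forn,
       netatemp.let_testate_letture1 tele
where  forn.tele_testata_lettura_id = tele.tele_storico_id
and    forn.elf_data_fine_fatturazione != trunc(tele.tele_data_lettura);
```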
For reference: Depending on statistics you may get the index join you are looking for with the first of these two queries because it can be resolved with index only, whereas the second query has to go to the table.
select count(*)
from netatemp.cambi_mem_animali_elf_t2 forn
,netatemp.let_testate_letture1 tele
where forn.tele_testata_lettura_id = tele.tele_storico_id;
select count(*), min(forn.edw_partition)
from netatemp.cambi_mem_animali_elf_t2 forn
,netatemp.let_testate_letture1 tele
where forn.tele_testata_lettura_id = tele.tele_storico_id;
If you have the partitioning option then consider hash partitioning the two tables on the join columns. A partition-wise join will greatly reduce the memory requirement and likelihood of the join spilling to disk.
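As a rough illustration of the partitioning idea, the two tables could be rebuilt as hash-partitioned copies on their join columns. The _hp table names and the partition count of 16 are assumptions; a full partition-wise join needs both tables partitioned the same way, with the same number of partitions, on join columns of matching data type.

```sql
-- Hash-partitioned copy of the first table, partitioned on its join column.
create table netatemp.cambi_mem_animali_elf_t2_hp
partition by hash (tele_testata_lettura_id) partitions 16
as
select * from netatemp.cambi_mem_animali_elf_t2;

-- Hash-partitioned copy of the second table, same partition count.
create table netatemp.let_testate_letture1_hp
partition by hash (tele_storico_id) partitions 16
as
select * from netatemp.let_testate_letture1;
```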

Resources