SQL/Oracle: when indexes on multiple columns can be used - oracle

If I create an index on columns (A, B, C), in that order, my understanding is that the database will be able to use it even if I search only on (A), or (A and B), or (A and B and C), but not if I search only on (B), or (C), or (B and C). Is this correct?

There are actually three index-based access methods that Oracle can use when a predicate is placed on a non-leading column of an index.
i) Index skip-scan: http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/optimops.htm#PFGRF10105
ii) Fast full index scan: http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/optimops.htm#i52044
iii) Index full scan: http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/optimops.htm#i82107
I've most often seen the fast full index scan "in the wild", but all are possible.

That is not correct. It is always best to come up with a test case that represents your data and see for yourself. If you really want to understand the Oracle SQL optimizer, look up Jonathan Lewis: read his books, read his blog, check out his website. The guy is amazing, and he always generates test cases.
create table mytab nologging as (
select mod(rownum, 3) x, rownum y, mod(rownum, 3) z from all_objects, (select 'x' from user_tables where rownum < 4)
);
create index i on mytab (x, y, z);
exec dbms_stats.gather_table_stats(ownname=>'DBADMIN',tabname=>'MYTAB', cascade=>true);
set autot trace exp
select * from mytab where y=5000;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=10)
1 0 INDEX (SKIP SCAN) OF 'I' (INDEX) (Cost=1 Card=1 Bytes=10)

Up to Oracle 8, an index was never used unless its leading column was included in the SQL.
Oracle 9i introduced the Index Skip Scan access feature, which lets the Oracle CBO (cost-based optimizer) attempt to use an index even when the leading (prefix) column is not in the predicate.
Good overview of how skip scan works here: http://www.quest-pipelines.com/newsletter-v5/1004_C.htm
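To make the idea concrete, here is a minimal Python sketch of the concept, not Oracle's actual implementation. It models the index on (x, y, z) as a sorted list of tuples. The data and the function names prefix_range_scan and skip_scan are hypothetical. A search on the leading column is one contiguous range; a skip scan on y probes once per distinct value of x, which is why it tends to pay off only when the leading column has few distinct values (three in the test case above).

```python
from bisect import bisect_left

# Hypothetical data: index entries for columns (x, y, z), kept sorted
# the way the leaf level of a B-tree index would be.
index = sorted((i % 3, i, i % 3) for i in range(1, 31))

def prefix_range_scan(index, x):
    """Search on the leading column: one contiguous run of entries."""
    lo = bisect_left(index, (x,))
    hi = bisect_left(index, (x + 1,))  # assumes integer keys
    return index[lo:hi]

def skip_scan(index, y):
    """Search on the second column only: one probe per distinct leading value."""
    matches = []
    pos = 0
    while pos < len(index):
        x = index[pos][0]
        # Probe as if the predicate were "x = <this value> and y = <target>".
        lo = bisect_left(index, (x, y))
        hi = bisect_left(index, (x, y + 1))
        matches.extend(index[lo:hi])
        # Skip every remaining entry that shares this leading value.
        pos = bisect_left(index, (x + 1,))
    return matches

print(prefix_range_scan(index, 0)[:3])
print(skip_scan(index, 5))
```

With many distinct leading values the number of probes grows accordingly, and a full scan can become the cheaper choice.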

Related

Efficient use of an index for a self join with a group by

I'm trying to speed up the following
create table tab2 parallel 24 nologging compress for query high as
select /*+ parallel(24) index(a ix_1) index(b ix_2)*/
a.usr
,a.dtnum
,a.company
,count(distinct b.usr) as num
,count(distinct case when b.checked_1 = 1 then b.usr end) as num_che_1
,count(distinct case when b.checked_2 = 1 then b.usr end) as num_che_2
from tab a
join tab b on a.company = b.company
and b.dtnum between a.dtnum-1 and a.dtnum-0.0000000001
group by a.usr, a.dtnum, a.company;
by using indexes
create index ix_1 on tab(usr, dtnum, company);
create index ix_2 on tab(usr, company, dtnum, checked_1, checked_2);
but the execution plan tells me that it's going to be an index full scan for both indexes, and the query runs for a very long time (1 day is not enough).
About the data: table tab has over 3 million records. No single column is unique. The unique values here are the pairs (usr, dtnum), where dtnum is a date with time written as a number in the format yyyy,mmddhh24miss. Columns checked_1 and checked_2 take values from the set (null, 0, 1, 2). Company holds the id of a company.
Because each pair is unique, it has exactly one value of checked_1, checked_2 and company. Each user can be in multiple pairs with different dtnum.
Edit
@Roberto Hernandez: I've attached the picture with the execution plan. As for parallel 24, in our company we are told to create tables with the options 'parallel [num] nologging compress for query high'. I'm using 24 but I'm no expert in this field.
@Sayan Malakshinov: http://sqlfiddle.com/#!4/40b6b/2 Here I've simplified by giving data with checked_1 = checked_2, but in real life this may not be true.
@scaisEdge:
For
create index my_id1 on tab (company, dtnum);
create index my_id2 on tab (company, dtnum, usr);
I get
For table tab, your join condition is based on the columns
company, dtnum
so your index should be primarily based on these columns:
create index my_id1 on tab (company, dtnum);
The indexes you are using are useless because their leftmost positions do not contain the columns used in the join/where condition.
You can also add usr, checked_1 and checked_2 in the rightmost positions to avoid the table access and let the database engine retrieve all the information from the index values:
create index my_id1 on tab (company, dtnum, usr, checked_1, checked_2);
Indexes (bitmap or otherwise) are not that useful for this execution. If you look at the execution plan, the optimizer thinks the group-by is going to reduce the output to 1 row. This results in serialization (PX SELECTOR), so I would question the quality of your statistics. What you may need is to create a column group on the three group-by columns, to improve the cardinality estimate of the group by.

Function-based index is not used in Oracle when combined with another operator

Consider this simple query:
select name, code
from item
where length(code) > 5
To avoid a full table scan, there is a function-based index on length(code), created with the following command:
create index index_len_code on item(length(code));
The optimizer detects the index and uses it (INDEX RANGE SCAN). However, the optimizer does not use the same index for the query below:
select i.name, i.code
from item i, item ii
where length(i.code) - length(ii.code) > 0
When I look at the execution plan, it shows a full table access rather than an index range scan, even though the index on length(code) exists.
What is wrong here?
If you have an EMP table with a column HIREDATE, and that column is indexed, then the optimizer may choose to use the index for accessing the table in a query with a condition like
... HIREDATE >= ADD_MONTHS(SYSDATE, -12)
to find employees hired in the last 12 months.
However, HIREDATE has to be alone on the left-hand side. If you add or subtract months or days to it, or if you wrap it within a function call like ADD_MONTHS, the index can't be used. The optimizer will not perform trivial arithmetic manipulations to convert the condition into one where HIREDATE by itself must satisfy an inequality.
The same happened in your second query. If you change the condition to
... length(i.code) > length(ii.code)
then the optimizer can use the function-based index on length(code). But even in your first query, if you change the condition to
... length(code) - 5 > 0
the index will NOT be used, because this is not an inequality condition on length(code). Again, the optimizer is not smart enough to perform trivial algebraic manipulations to rewrite this in a form where it's an inequality condition on length(code) itself.

Oracle Spatial - SDO_BUFFER does not work?

I have a table of SDO_GEOMETRY values. I query all the geometries to find their start and end points, then insert these points into another table called ORAHAN. My main goal is to find, for each point in ORAHAN, whether it intersects another point in ORAHAN when the points are given a 2 cm buffer.
So I wrote some PL/SQL using the SDO_RELATE and SDO_BUFFER functions, but when I checked some records in MapInfo, I saw points within 1 cm of each other that have no record in the intersections table, ORAHANCROSSES.
Am I using these functions incorrectly?
Note: I am using Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production and
PL/SQL Release 11.2.0.1.0 - and SDO_PACKAGE
ORAHAN has approximately 400 thousands records.(points and other columns.)
declare
BEGIN
for curs in (select * from ORAHAN t) loop
for curs2 in (select *
from ORAHAN t2
where SDO_RELATE(t2.geoloc,SDO_GEOM.SDO_BUFFER(curs.geoloc,0.02,0.5) ,
'mask=ANYINTERACT') = 'TRUE'
and t2.mi_prinx <> curs.mi_prinx) loop
Insert INTO ORAHANCROSSES
values
(curs.Mip, curs.Startmi, curs2.Mip, curs2.Startmi);
commit;
end loop;
end loop;
END;
And this is the MapInfo map image that shows 3 points approximately 1 centimeter from each other. But ORAHANCROSSES has no record matching these 3.
Note: 0,00001000km equals 1cm
Orahan Metadata:
select * from user_sdo_geom_metadata where table_name = 'ORAHAN';
And diminfo:
What is the coordinate system of your data? And, most important, what tolerance have you set in your metadata?
Some other comments:
1) Don't use a relate with buffer approach. Just use a within-distance approach.
2) You don't need a PL/SQL loop for that sort of query just use a simple CTAS:
create table orahancrosses as
select c1.mip mip_1, c1.startmi startmi_1, c2.mip mip_2, c2.startmi startmi_2
from orahan c1, orahan c2
where sdo_within_distance (c2.geoloc, c1.geoloc, 'distance=2 unit=cm') = 'TRUE'
and c2.mi_prinx <> c1.mi_prinx;
3) As written, pairs of points A and B that are within 2 cm will be returned twice: once as (A,B) and once again as (B,A). To avoid that (and only return one of the cases), write the query like this:
create table orahancrosses as
select c1.mip mip_1, c1.startmi startmi_1, c2.mip mip_2, c2.startmi startmi_2
from orahan c1, orahan c2
where sdo_within_distance (c2.geoloc, c1.geoloc, 'distance=2 unit=cm') = 'TRUE'
and c1.rowid < c2.rowid;
4) Processing the number of points you mention (400000+) should run better using the SDO_JOIN technique, like this:
create table orahancrosses as
select c1.mip mip_1, c1.startmi startmi_1, c2.mip mip_2, c2.startmi startmi_2
from table (
sdo_join (
'ORAHAN','GEOLOC',
'ORAHAN','GEOLOC',
'DISTANCE=2 UNIT=CM'
)
) j,
orahan c1,
orahan c2
where j.rowid1 < j.rowid2
and c1.rowid = j.rowid1
and c2.rowid = j.rowid2;
This will probably still take time to process, depending on the capacity of your database server. If you are licensed for Oracle Enterprise Edition and your hardware has the proper capacity (number of cores), then parallelism can reduce the elapsed time.
5) You say you are using Oracle 11g. What exact version? Version 11.2.0.4 is the terminal release for 11gR2. Anything older is no longer supported. By now you should really be on 12cR1 (12.1.0.2). The major benefit of 12.1.0.2 in your case is the Vector Performance Accelerator feature that speeds up a number of spatial functions and operators (only if you own the proper Oracle Spatial licenses; it is not available with the free Oracle Locator feature).
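The "return each pair only once" logic from the queries above can be sketched outside the database. This is a brute-force Python illustration with hypothetical point ids and coordinates, assumed to be in meters. It compares every pair, whereas the spatial operators rely on the spatial index. Ordering the ids plays the role of the rowid comparison.

```python
import math
from itertools import combinations

def close_pairs(points, max_distance):
    """points: dict of id -> (x, y). Returns each unordered pair within
    max_distance exactly once, with the smaller id first."""
    pairs = []
    # combinations() over the sorted ids yields (id1, id2) with id1 < id2 only,
    # so (A, B) is produced but never (B, A).
    for (id1, p1), (id2, p2) in combinations(sorted(points.items()), 2):
        if math.hypot(p1[0] - p2[0], p1[1] - p2[1]) <= max_distance:
            pairs.append((id1, id2))
    return pairs

# Hypothetical sample: points 1 and 2 are 1 cm apart, point 3 is far away.
sample = {1: (0.0, 0.0), 2: (0.0, 0.01), 3: (5.0, 5.0)}
print(close_pairs(sample, 0.02))
```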
======================================
Using the two points in your example. Let's compute the distance:
select sdo_geom.sdo_distance(
sdo_geometry (2001,null,sdo_point_type(521554.782174622,4230983.08336913,null),null,null),
sdo_geometry (2001,null,sdo_point_type(521554.782174622,4230983.07336716,null),null,null),
0.005
) distance
from dual;
DISTANCE
----------
.01000197
1 row selected.
Notice I don't specify any SRID. Assuming the coordinates are expressed in meters, the distance between them is indeed a little more than 1 cm.
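As a cross-check outside the database, the same distance can be computed with plain Euclidean geometry. This short Python sketch makes the same assumption as above: the coordinates are planar and expressed in meters.

```python
import math

# The two points from the example, as (x, y) in assumed meters.
p1 = (521554.782174622, 4230983.08336913)
p2 = (521554.782174622, 4230983.07336716)

distance_m = math.hypot(p1[0] - p2[0], p1[1] - p2[1])

print(f"{distance_m:.6f} m")           # roughly one centimeter
print(distance_m <= 0.02)              # inside a 2 cm search distance
```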
======================================
The reason why your original syntax does not work is, as you noticed, because of the tolerance you specify for the SDO_BUFFER() call. You pass it as 0.5 (=50cm) to produce a buffer with a radius of 0.02 (2cm). The effect is that the buffer produced effectively dissolves into the point itself.
For example at tolerance 0.5:
select sdo_geom.sdo_buffer(sdo_geometry (2001,null,sdo_point_type(521554.782174622,4230983.08336913,null),null,null),0.02,0.5) from dual;
Produces:
SDO_GEOMETRY(2001, NULL, SDO_POINT_TYPE(521554.782, 4230983.08, NULL), NULL, NULL)
At tolerance 0.005:
select sdo_geom.sdo_buffer(sdo_geometry (2001,null,sdo_point_type(521554.782174622,4230983.08336913,null),null,null),0.02,0.005) from dual;
You get the proper buffer:
SDO_GEOMETRY(2003, NULL, NULL, SDO_ELEM_INFO_ARRAY(1, 1003, 2), SDO_ORDINATE_ARRAY(521554.782, 4230983.06, 521554.802, 4230983.08, 521554.782, 4230983.1, 521554.762, 4230983.08, 521554.782, 4230983.06))
And the very close point now matches with that buffer:
select sdo_geom.relate(
sdo_geom.sdo_buffer(sdo_geometry (2001,null,sdo_point_type(521554.782174622,4230983.08336913,null),null,null),0.02,0.005),
'determine',
sdo_geometry (2001,null,sdo_point_type(521554.782174622,4230983.07336716,null),null,null),
0.005
) relation
from dual;
RELATION
-------------------------
CONTAINS
1 row selected.
======================================
Now the fact that your data does not have a proper explicit SRID means that the use of explicit units in measurements or distance-based searches will not work. Because the database does not know what coordinate system your data is in, it does not know how to determine that two points are less than a set number of cm or m apart. All you can do is assume the coordinates are in meters.
So in the examples I give above, replace 'DISTANCE=2 UNIT=CM' with 'DISTANCE=0.02'

Select from a loop in Oracle

In Oracle 11g, I want to execute a query like the one described below.
In this case, I am not allowed to use a function or procedure.
I tried to Google it, but I couldn't find a good solution. Almost everything I found shows how to do it with a function or stored procedure.
Table X has columns (A,B,C).
For a row in table X, I want to compute:
Count = B - A;
for(i=0;i<Count;i++)
{
C++;
D = C * A;
}
Expected result: table Y with columns (A,B,C,D)
You are thinking like a 3GL developer. Java (or whatever) only has arrays, so everything is an iteration. But SQL is a set-oriented language: we don't need loops to work on sets of data. Oracle SQL has built-in aggregation functions which allow us to compute values from sets of records.
For instance, this query calculates total remuneration (salary plus commission), number of employees and average salary:
select sum(sal + nvl(comm,0)) as total_renum
, count(*) as total_emps
, avg(sal) as average_salary
from emp
/
Oracle has a comprehensive range of such functions, some of them are really powerful. Find out more. Be sure to check out analytic functions too.
Hmmm, so you subsequently posted a cryptic snippet of code. It's still not clear exactly what you want, but this might produce the outcome for your table Y:
select a
, b
, c
, 0 + ((c+level) * a) as d
from x
connect by level <= (b-a)
/
For each row in table X it will generate (b-a) rows, with a derived value of d. I have assumed a start of 0 for d.
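To check that the query matches the loop in the question, here is a small Python sketch of the row generation described above. The function name expand_rows and the sample row are hypothetical. It reproduces the loop's arithmetic (C is incremented first, then D = C * A) and emits one output row per iteration, with level standing in for the number of increments applied so far.

```python
def expand_rows(rows):
    """rows: iterable of (a, b, c). Yields (a, b, c, d) for level = 1..(b - a),
    mirroring 'connect by level <= (b - a)' and d = (c + level) * a."""
    for a, b, c in rows:
        for level in range(1, (b - a) + 1):
            yield (a, b, c, (c + level) * a)

# Hypothetical single-row table X: a=2, b=5, c=10, so three rows are generated.
x = [(2, 5, 10)]
for row in expand_rows(x):
    print(row)
```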

Performance of query without using OR clause

I am facing a problem. I have one query
Select * from tabA
where (a) in (a,b,c)
OR b in (a,b,c)
I am facing a performance issue with this query, so I need to remove the OR condition. I tried the following query:
Select * from tabA
where (a,b) in (a,b,c)
but this query does not seem to work. Please help; I don't want to use the OR condition.
If you logically need the OR condition, then that is what you need. There is nothing wrong with using OR. If both columns are indexed then the query is likely to take no longer than running these 2 queries independently:
select * from tabA
where a in (a,b,c);
select * from tabA
where b in (a,b,c);
The optimizer may well do that and concatenate the results like this:
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=2 Bytes=256)
1 0 CONCATENATION
2 1 TABLE ACCESS (BY INDEX ROWID) OF 'TABA' (Cost=2 Card=1 Bytes=128)
3 2 INDEX (RANGE SCAN) OF 'TABA_A_IDX' (NON-UNIQUE) (Cost=1 Card=1)
4 1 TABLE ACCESS (BY INDEX ROWID) OF 'TABA' (Cost=2 Card=1 Bytes=128)
5 4 INDEX (UNIQUE SCAN) OF 'TABA_B_IDX' (NON-UNIQUE) (Cost=1 Card=1)
If the logic remains the same, you may try a UNION:
Select * from tabA
where (a) in (a,b,c)
union
Select * from tabA
where b in (a,b,c)
Also, check your indexes and explain plan results; indexing may solve the original OR issues.
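Whether the OR form and the UNION form return the same rows is easy to verify on a small data set. The sketch below uses Python's built-in sqlite3 module as a stand-in for Oracle, with a hypothetical id column and made-up values. Note that UNION removes duplicate rows, so the two forms agree only when the selected rows are unique, as they are here thanks to id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: id makes every row unique, a and b are the searched columns.
conn.execute("create table tabA (id integer primary key, a integer, b integer)")
conn.executemany(
    "insert into tabA (id, a, b) values (?, ?, ?)",
    [(1, 1, 9), (2, 8, 2), (3, 1, 2), (4, 7, 7)],
)

or_rows = conn.execute(
    "select id, a, b from tabA "
    "where a in (1, 2, 3) or b in (1, 2, 3) "
    "order by id"
).fetchall()

union_rows = conn.execute(
    "select id, a, b from tabA where a in (1, 2, 3) "
    "union "
    "select id, a, b from tabA where b in (1, 2, 3) "
    "order by id"
).fetchall()

# Row 3 satisfies both conditions but appears once in each result.
print(or_rows)
print(union_rows)
```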
Your syntax is wrong. If you want to compare pairs of values, you should use something like this:
select * from tabA
where (a,b) in ((a,b), (a,c), (b,c) etc.
Anyway, an IN condition is transformed into multiple OR conditions during query execution.
If you show the table structure and execution plan, people will be able to help you more effectively.

Resources