Utilize function-based spatial index in SELECT list - oracle

I have an Oracle 18c table called LINES with 1000 rows. The DDL for the table can be found here: db<>fiddle.
The data looks like this:
create table lines (shape sdo_geometry);
insert into lines (shape) values (sdo_geometry(2002, 26917, null, sdo_elem_info_array(1, 2, 1), sdo_ordinate_array(574360, 4767080, 574200, 4766980)));
insert into lines (shape) values (sdo_geometry(2002, 26917, null, sdo_elem_info_array(1, 2, 1), sdo_ordinate_array(573650, 4769050, 573580, 4768870)));
insert into lines (shape) values (sdo_geometry(2002, 26917, null, sdo_elem_info_array(1, 2, 1), sdo_ordinate_array(574290, 4767090, 574200, 4767070)));
insert into lines (shape) values (sdo_geometry(2002, 26917, null, sdo_elem_info_array(1, 2, 1), sdo_ordinate_array(571430, 4768160, 571260, 4768040)));
...
I've created a function that's intentionally slow, for testing purposes. The function takes the SDO_GEOMETRY lines and outputs an SDO_GEOMETRY point.
create or replace function slow_function(shape in sdo_geometry) return sdo_geometry
deterministic is
begin
return
--Deliberately make the function slow for testing purposes...
-- ...convert from SDO_GEOMETRY to JSON and back, several times, for no reason.
sdo_util.from_json(sdo_util.to_json(sdo_util.from_json(sdo_util.to_json(sdo_util.from_json(sdo_util.to_json(sdo_util.from_json(sdo_util.to_json(sdo_util.from_json(sdo_util.to_json(
sdo_lrs.geom_segment_start_pt(shape)
))))))))));
end;
As an experiment, I want to create a function-based spatial index, as a way to pre-compute the result of the slow function.
Steps:
Create an entry in USER_SDO_GEOM_METADATA:
insert into user_sdo_geom_metadata (table_name, column_name, diminfo, srid)
values (
'lines',
'infrastr.slow_function(shape)',
-- 🡅 Important: Include the function owner.
sdo_dim_array (
sdo_dim_element('X', 567471.222, 575329.362, 0.5), --note to self: these coordinates are wrong.
sdo_dim_element('Y', 4757654.961, 4769799.360, 0.5)
),
26917
);
commit;
Create a function-based spatial index:
create index lines_idx on lines (slow_function(shape)) indextype is mdsys.spatial_index_v2;
Problem:
When I use the function in the SELECT list of a query, the index isn't being used. Instead, it's doing a full table scan...so the query is still slow when I select all rows (CTRL+ENTER in SQL Developer).
You might ask, "Why select all rows?" Answer: That's how mapping software often works...you display all (or most) of the points in the map — all at once.
explain plan for
select
slow_function(shape)
from
lines;
select * from table(dbms_xplan.display);
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 34 | 7 (0)| 00:00:01 |
| 1 | TABLE ACCESS FULL| LINES | 1 | 34 | 7 (0)| 00:00:01 |
---------------------------------------------------------------------------
Likewise, in my mapping software (ArcGIS Desktop 10.7.1), the map doesn't utilize the index either. I can tell, because the points are slow to draw in the map.
I'm aware that it's possible to create a view, register that view in USER_SDO_GEOM_METADATA (in addition to registering the index), and use that view in the map. I've tried that, but the mapping software still doesn't use the index.
I've also tried an SQL hint, but no luck — I don't think the hint is being used:
create or replace view lines_vw as (
select
/*+ INDEX (lines lines_idx) */
cast(rownum as number(38,0)) as objectid, --the mapping software needs a unique ID column
slow_function(shape) as shape
from
lines
where
slow_function(shape) is not null --https://stackoverflow.com/a/59581129/5576771
);
Question:
How can I utilize the function-based spatial index in the SELECT list in a query?

A spatial index is invoked only by the WHERE clause, not the SELECT list. A function in the SELECT list is invoked for every row returned by the WHERE clause; in your case the WHERE clause is SDO_ANYINTERACT(), which returns all rows, so the function runs for every row.

You don't appear to be firing the index; just adding the function call as an attribute in the SELECT list is insufficient. Instead of:
select
slow_function(shape)
from
lines
It should be:
select slow_function(shape)
from lines
where sdo_anyinteract(slow_function(shape), sdo_geometry(2003, 26917, null, sdo_elem_info_array(1,1003,3), sdo_ordinate_array(1,2,3,4))) = 'TRUE'
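where 1, 2, 3, 4 stand for the lower-left X, lower-left Y, upper-right X and upper-right Y of an optimized rectangle (element type 1003, interpretation 3).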
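To pick values for 1, 2, 3, 4 that cover every feature (which is what a map showing all rows needs), one option is the overall extent of the data. The Python sketch below is only an illustration under stated assumptions: it treats each geometry as a flat 2D ordinate list (x1, y1, x2, y2, ...), the way sdo_ordinate_array stores the question's lines, and the helper name `optimized_rectangle` is hypothetical. Because each start point is also a vertex of its line, a rectangle covering all line vertices also covers every point the function returns.

```python
from typing import Iterable, Sequence, Tuple


def optimized_rectangle(
    ordinate_arrays: Iterable[Sequence[float]],
) -> Tuple[float, float, float, float]:
    """Return (min_x, min_y, max_x, max_y) over 2D ordinate arrays.

    Hypothetical helper: assumes each array is laid out like a 2D
    sdo_ordinate_array, i.e. x1, y1, x2, y2, ...
    """
    xs, ys = [], []
    for ords in ordinate_arrays:
        if len(ords) % 2 != 0:
            raise ValueError("expected an even number of 2D ordinates")
        xs.extend(ords[0::2])  # every even position is an X
        ys.extend(ords[1::2])  # every odd position is a Y
    if not xs:
        raise ValueError("no ordinates supplied")
    # Lower-left corner first, then upper-right corner.
    return min(xs), min(ys), max(xs), max(ys)


# The four sample rows from the question.
sample_lines = [
    [574360, 4767080, 574200, 4766980],
    [573650, 4769050, 573580, 4768870],
    [574290, 4767090, 574200, 4767070],
    [571430, 4768160, 571260, 4768040],
]

print(optimized_rectangle(sample_lines))  # (571260, 4766980, 574360, 4769050)
```

For these rows the corners would go into sdo_ordinate_array(571260, 4766980, 574360, 4769050) together with sdo_elem_info_array(1, 1003, 3). A map client would normally supply its current view extent here instead.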

I tried using sdo_anyinteract() in the WHERE clause, as @SimonGreener suggested.
Unfortunately, the query still accesses the table (TABLE ACCESS BY INDEX ROWID) in addition to using the index. I was hoping to use only the index.
select
slow_function(shape) as shape
from
lines
where
sdo_anyinteract(slow_function(shape),
mdsys.sdo_geometry(2003, 26917, null, mdsys.sdo_elem_info_array(1, 1003, 1), mdsys.sdo_ordinate_array(573085.8702, 4771088.3813, 566461.6349, 4768833.3225, 570335.0629, 4757455.1278, 576959.2982, 4759710.1866, 573085.8702, 4771088.3813))
) = 'TRUE'
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 46 | 1 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID | LINES | 1 | 46 | 1 (0)| 00:00:01 |
|* 2 | DOMAIN INDEX (SEL: 0.000000 %)| LINES_IDX | | | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("MDSYS"."SDO_ANYINTERACT"("INFRASTR"."SLOW_FUNCTION"("SHAPE"),
"MDSYS"."SDO_GEOMETRY"(2003,26917,NULL,"MDSYS"."SDO_ELEM_INFO_ARRAY"(1,1003,1),
"MDSYS"."SDO_ORDINATE_ARRAY"(573085.8702,4771088.3813,566461.6349,4768833.3225,
570335.0629,4757455.1278,576959.2982,4759710.1866,573085.8702,4771088.3813)))='TRUE')
I played around with a SQL hint: /*+ INDEX (lines lines_idx) */. But that didn't seem to make a difference.

Since the function takes SDO_GEOMETRY lines and outputs an SDO_GEOMETRY point, a possible alternative might be:
Instead of returning/indexing a geometry column, maybe I could return/index X and Y numeric columns (using a regular, non-spatial index), and then either:
A) Convert the XY columns to SDO_GEOMETRY after-the-fact in a query on-the-fly, or...
B) Use GIS software to display the XY data as points in a map. For example, in ArcGIS Pro, create an "XY Event Layer".
That technique seemed to work OK here: Improve performance of startpoint query (ST_GEOMETRY). I was able to utilize the function-based (non-spatial) index in a SELECT clause, which made my query significantly faster.
Of course, that technique would work best for points, since converting XYs to point geometries after the fact is easy, performant and practical. Converting lines or polygons (maybe from WKT?) to geometries after the fact likely wouldn't make much sense. Even if that were possible, it would likely be too slow and would defeat the purpose of precomputing the data in the function-based index in the first place.
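Option A can be sketched in a few lines. This is a Python model, not Oracle code: it assumes 2D lines stored as flat x1, y1, x2, y2, ... ordinate lists, takes the first vertex as the start point (standing in for sdo_lrs.geom_segment_start_pt), and uses WKT text as a stand-in for the geometry built on the fly. The function names are hypothetical. In Oracle itself, the on-the-fly step would more likely use the point constructor, e.g. sdo_geometry(2001, 26917, sdo_point_type(x, y, null), null, null).

```python
from typing import List, Sequence, Tuple


def start_point(ordinates: Sequence[float]) -> Tuple[float, float]:
    """First vertex of a 2D line stored as x1, y1, x2, y2, ... (hypothetical helper)."""
    if len(ordinates) < 4 or len(ordinates) % 2 != 0:
        raise ValueError("expected a 2D line with at least two vertices")
    return ordinates[0], ordinates[1]


def point_wkt(x: float, y: float) -> str:
    """Rebuild a point geometry from precomputed X/Y numbers, as WKT text."""
    return f"POINT ({x} {y})"


# Step 1: precompute plain X/Y numbers once (what a regular,
# non-spatial function-based index could hold).
rows = [
    [574360, 4767080, 574200, 4766980],
    [573650, 4769050, 573580, 4768870],
]
xy_columns: List[Tuple[float, float]] = [start_point(r) for r in rows]

# Step 2: turn the numbers back into point geometries on the fly.
points = [point_wkt(x, y) for x, y in xy_columns]
print(points)  # ['POINT (574360 4767080)', 'POINT (573650 4769050)']
```

The expensive part (running the slow function) happens once per row in step 1; step 2 is cheap string or constructor work, which is why this pattern suits points much better than lines or polygons.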

Related

How should I index a FULLNAME field in Oracle when I need to query by first and last name?

I have a rather large table (34 GB, 77M rows) which contains payment information. The table is partitioned by payment date because users usually care about small ranges of dates so the partition pruning really helps queries to return quickly.
The problem is that I have a user who wants to find out all payments that have ever been made to certain people.
Names are stored in columns NAME1 and NAME2, which are both VARCHAR2(40 Byte) and hold free-form full name data. For example, John Q Public could appear in either column as:
John Q Public
John Public
Public, John Q
or even embedded in the middle of the field, like "Estate of John Public"
Right now, the way the query is set up is to look for
NAME1||NAME2 LIKE '%JOHN%PUBLIC%' OR NAME1||NAME2 LIKE '%PUBLIC%JOHN%' and as you can imagine, the performance sucks.
Is this a job for Oracle Text? How else could I better index the atomic bits of the columns so that the user can search by first/last name?
Database Version: Oracle 12c (12.1.0.2.0)
Create a multi-column index on both names and modify your query to use an INDEX FAST FULL SCAN operation.
Traversing a b-tree index is a great way to quickly find a small amount of data. Unfortunately the leading wildcards ruin that access path for your query. However, Oracle has multiple ways of reading data from an index. The INDEX FAST FULL SCAN operation simply reads all of the index blocks in no particular order, as if the index was a skinny table. Since the average row length of your table is 442 bytes, and the two columns use at most 80 bytes, reading all the names in the index may be much faster than scanning the entire table.
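The size argument can be checked with rough arithmetic. A minimal Python sketch using the question's 77M rows and the 442-byte average row length; it counts only the name data and ignores the rowid and per-entry overhead that real index entries also store, so treat the result as a ballpark rather than an index size estimate.

```python
# Rough arithmetic only: real index entries also carry a rowid and
# per-entry overhead, so the actual index will be somewhat larger.
rows = 77_000_000          # row count from the question
avg_row_len = 442          # average row length in bytes, from the answer
max_name_bytes = 40 + 40   # NAME1 + NAME2, VARCHAR2(40 BYTE) each

table_bytes = rows * avg_row_len    # roughly what a full table scan reads
name_bytes = rows * max_name_bytes  # upper bound on the name data itself

print(f"table:       {table_bytes / 1e9:.1f} GB")
print(f"names (max): {name_bytes / 1e9:.1f} GB")
print(f"ratio:       {name_bytes / table_bytes:.0%}")
```

This prints about 34.0 GB for the table (matching the 34 GB in the question), at most 6.2 GB of name data, and a ratio of 18%.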
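But the index alone probably isn't enough. You need to change the concatenation into multiple OR expressions. Rows where both NAME1 and NAME2 are NULL are not stored in the index, so Oracle can only scan the index if it knows such rows could never match. It can infer that from a LIKE on a plain column (the IS NOT NULL filters in the plan below come from that inference), but apparently not from the concatenated expression.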
Sample schema:
--Create payment table and index on name columns.
create table payment
(
id number,
paydate date,
other_data varchar2(400),
name1 varchar2(40),
name2 varchar2(40)
);
create index payment_idx on payment(name1, name2);
--Insert 100K sample rows.
insert into payment
select level, sysdate + level, lpad('A', 400, 'A'), level, level
from dual
connect by level <= 100000;
--Insert two rows with relevant values.
insert into payment values(0, sysdate, 'other data', 'B JOHN B PUBLIC B', 'asdf');
insert into payment values(0, sysdate, 'other data', 'asdf', 'C JOHN C PUBLIC C');
commit;
--Gather stats to help optimizer pick the right plan.
begin
dbms_stats.gather_table_stats(user, 'payment');
end;
/
Original expression uses a full table scan:
explain plan for
select name1, name2
from payment
where NAME1||NAME2 LIKE '%JOHN%PUBLIC%' OR NAME1||NAME2 LIKE '%PUBLIC%JOHN%';
select * from table(dbms_xplan.display);
Plan hash value: 684176532
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9750 | 4056K| 1714 (1)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| PAYMENT | 9750 | 4056K| 1714 (1)| 00:00:01 |
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("NAME1"||"NAME2" LIKE '%JOHN%PUBLIC%' OR "NAME1"||"NAME2"
LIKE '%PUBLIC%JOHN%')
New expression uses a faster INDEX FAST FULL SCAN operation:
explain plan for
select name1, name2
from payment
where
NAME1 LIKE '%JOHN%PUBLIC%' OR
NAME1 LIKE '%PUBLIC%JOHN%' OR
NAME2 LIKE '%JOHN%PUBLIC%' OR
NAME2 LIKE '%PUBLIC%JOHN%';
select * from table(dbms_xplan.display);
Plan hash value: 1655289165
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 18550 | 217K| 152 (3)| 00:00:01 |
|* 1 | INDEX FAST FULL SCAN| PAYMENT_IDX | 18550 | 217K| 152 (3)| 00:00:01 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("NAME1" LIKE '%JOHN%PUBLIC%' AND "NAME1" IS NOT NULL AND
"NAME1" IS NOT NULL OR "NAME1" LIKE '%PUBLIC%JOHN%' AND "NAME1" IS NOT NULL
AND "NAME1" IS NOT NULL OR "NAME2" LIKE '%JOHN%PUBLIC%' AND "NAME2" IS NOT
NULL AND "NAME2" IS NOT NULL OR "NAME2" LIKE '%PUBLIC%JOHN%' AND "NAME2" IS
NOT NULL AND "NAME2" IS NOT NULL)
This solution should definitely be faster than a full table scan. How much faster depends on the average name size and the name being searched. And depending on the query you may want to add additional columns to keep all the relevant data in the index.
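One caveat worth checking: the rewrite is not strictly equivalent to the original predicate. NAME1||NAME2 also matches when JOHN appears in NAME1 and PUBLIC in NAME2 (or the other way round), and the per-column ORs miss those rows. The Python sketch below emulates the two WHERE clauses to show the difference; its `like` helper is a simplified, hypothetical stand-in that only handles the % wildcard, and `concat` follows Oracle in treating NULL || 'x' as 'x'. If such rows matter, cross-column conditions such as NAME1 LIKE '%JOHN%' AND NAME2 LIKE '%PUBLIC%' could be added, though whether the optimizer still picks the INDEX FAST FULL SCAN would need testing.

```python
import re
from typing import Optional


def like(value: Optional[str], pattern: str) -> Optional[bool]:
    """Simplified SQL LIKE: only % is supported; NULL input gives UNKNOWN (None)."""
    if value is None:
        return None
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def concat(a: Optional[str], b: Optional[str]) -> Optional[str]:
    """Oracle ||: NULL acts as an empty string, and NULL || NULL is NULL."""
    if a is None and b is None:
        return None
    return (a or "") + (b or "")


PATTERNS = ("%JOHN%PUBLIC%", "%PUBLIC%JOHN%")


def original_where(name1: Optional[str], name2: Optional[str]) -> bool:
    # NAME1||NAME2 LIKE '%JOHN%PUBLIC%' OR NAME1||NAME2 LIKE '%PUBLIC%JOHN%'
    both = concat(name1, name2)
    return any(like(both, p) is True for p in PATTERNS)


def rewritten_where(name1: Optional[str], name2: Optional[str]) -> bool:
    # NAME1 LIKE ... OR NAME1 LIKE ... OR NAME2 LIKE ... OR NAME2 LIKE ...
    return any(like(col, p) is True for col in (name1, name2) for p in PATTERNS)


for n1, n2 in [("B JOHN B PUBLIC B", "asdf"), ("JOHN", "PUBLIC"), (None, None)]:
    print(n1, n2, original_where(n1, n2), rewritten_where(n1, n2))
```

The answer's two test rows behave the same under both predicates; the ("JOHN", "PUBLIC") row is where they diverge.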
Oracle Text is also a good option, but that feature feels a little "weird" in my opinion. If you're not already using text indexes you might want to stick with normal indexes to simplify administrative tasks.

Oracle CBO when using types [duplicate]

I'm trying to optimize a set of stored procs which are going against many tables including this view. The view is as such:
We have TBL_A (id, hist_date, hist_type, other_columns) with two types of rows: hist_type 'O' vs. hist_type 'N'. The view self joins table A to itself and transposes the N rows against the corresponding O rows. If no N row exists for the O row, the O row values are repeated. Like so:
CREATE OR REPLACE FORCE VIEW V_A (id, hist_date, hist_type, other_columns_o, other_columns_n) AS
select
o.id, o.hist_date, o.hist_type,
o.other_columns as other_columns_o,
case when n.id is not null then n.other_columns else o.other_columns end as other_columns_n
from
TBL_A o left outer join TBL_A n
on o.id=n.id and o.hist_date=n.hist_date and n.hist_type = 'N'
where o.hist_type = 'O';
TBL_A has a unique index on: (id, hist_date, hist_type). It also has a unique index on: (hist_date, id, hist_type) and this is the primary key.
The following query is at issue (in a stored proc, with x declared as TYPE_TABLE_OF_NUMBER):
select b.id BULK COLLECT into x from TBL_B b where b.parent_id = input_id;
select v.id from v_a v
where v.id in (select column_value from table(x))
and v.hist_date = input_date
and v.status_new = 'CLOSED';
This query ignores the index on id column when accessing TBL_A and instead does a range scan using the date to pick up all the rows for the date. Then it filters that set using the values from the array. However if I simply give the list of ids as a list of numbers the optimizer uses the index just fine:
select v.id from v_a v
where v.id in (123, 234, 345, 456, 567, 678, 789)
and v.hist_date = input_date
and v.status_new = 'CLOSED';
The problem also doesn't exist when going against TBL_A directly (and I have a workaround that does that, but it's not ideal). Is there a way to get the optimizer to first retrieve the array values and use them as predicates when accessing the table? Or a good way to restructure the view to achieve this?
Oracle does not use the index because it assumes select column_value from table(x) returns 8168 rows.
Indexes are faster for retrieving small amounts of data. At some point it's faster to scan the whole table than repeatedly walk the index tree.
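A toy model makes the trade-off concrete. The Python sketch below is illustrative only: the cost formulas, the index height and the 10,000-block table are hypothetical, and they are not how Oracle's optimizer actually computes cost. It just shows why the same query can flip from an index to a full scan when the estimated row count jumps from 7 to 8168.

```python
import math

# Toy model only: these formulas and numbers are illustrative assumptions,
# not the cost formulas Oracle's optimizer actually uses.


def index_access_cost(rows: int, index_height: int = 3) -> int:
    # One walk down the index tree per looked-up row, plus one table block visit.
    return rows * (index_height + 1)


def full_scan_cost(table_blocks: int, blocks_per_read: int = 8) -> int:
    # A full scan reads every block, several blocks per I/O request.
    return math.ceil(table_blocks / blocks_per_read)


TABLE_BLOCKS = 10_000  # hypothetical table size

for rows in (7, 8168):
    idx, full = index_access_cost(rows), full_scan_cost(TABLE_BLOCKS)
    choice = "index" if idx < full else "full scan"
    print(f"{rows:>5} rows: index={idx}, full scan={full} -> {choice}")
```

With these made-up numbers, 7 rows favor the index (28 vs 1250) and 8168 rows favor the full scan (32672 vs 1250); only the direction of the flip, not the exact figures, carries over to a real database.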
Estimating the cardinality of a regular SQL statement is difficult enough. Creating an accurate estimate for procedural code is almost impossible. I don't know where they came up with 8168, but since table functions are normally used with pipelined functions in data warehouses, a sorta-large number makes sense.
Dynamic sampling can generate a more accurate estimate and likely generate a plan that will use the index.
Here's an example of a bad cardinality estimate:
create or replace type type_table_of_number as table of number;
explain plan for
select * from table(type_table_of_number(1,2,3,4,5,6,7));
select * from table(dbms_xplan.display(format => '-cost -bytes'));
Plan hash value: 1748000095
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Time |
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8168 | 00:00:01 |
| 1 | COLLECTION ITERATOR CONSTRUCTOR FETCH| | 8168 | 00:00:01 |
-------------------------------------------------------------------------
Here's how to fix it:
explain plan for select /*+ dynamic_sampling(2) */ *
from table(type_table_of_number(1,2,3,4,5,6,7));
select * from table(dbms_xplan.display(format => '-cost -bytes'));
Plan hash value: 1748000095
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Time |
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 7 | 00:00:01 |
| 1 | COLLECTION ITERATOR CONSTRUCTOR FETCH| | 7 | 00:00:01 |
-------------------------------------------------------------------------
Note
-----
- dynamic statistics used: dynamic sampling (level=2)

Oracle PLSQL: Performance issue when using TABLE functions

I am currently facing a performance problem when using table functions. Let me explain.
I am working with Oracle types and one of them is defined like below:
create or replace TYPE TYPESTRUCTURE AS OBJECT
(
ATTR1 VARCHAR2(30),
ATTR2 VARCHAR2(20),
ATTR3 VARCHAR2(20),
ATTR4 VARCHAR2(20),
ATTR5 VARCHAR2(20),
ATTR6 VARCHAR2(20),
ATTR7 VARCHAR2(20),
ATTR8 VARCHAR2(20),
ATTR9 VARCHAR2(20),
ATTR10 VARCHAR2(20),
ATTR11 VARCHAR2(20),
ATTR12 VARCHAR2(20),
ATTR13 VARCHAR2(10),
ATTR14 VARCHAR2(50),
ATTR15 VARCHAR2(13)
);
Then I have one table of this type like:
create or replace TYPE TYPESTRUCTURE_ARRAY AS TABLE OF TYPESTRUCTURE ;
I have one procedure with the following variables:
arr TYPESTRUCTURE_ARRAY;
arr2 TYPESTRUCTURE_ARRAY;
ARR contains only a single instance of TYPESTRUCTURE, with all its attributes set to NULL except ATTR4, which is set to 'ABC'.
ARR2 is completely empty.
Here comes the part which is giving me the performance issue.
The purpose is to take some values from a view (depending on the value on ATTR4) and fill those in same or similar structure. So I do the following:
SELECT TYPESTRUCTURE(MV.A,null,null,MV.B,MV.C,MV.D,null,null,MV.E,null,null,MV.F,MV.F,MV.G,MV.H)
BULK COLLECT INTO arr2
FROM TABLE(arr) PARS
JOIN MYVIEW MV
ON MV.B = PARS.ATTR4;
The code works correctly, except that the query takes 15 seconds to execute.
This query fills ARR2 with around 20 instances of TYPESTRUCTURE (rows).
It might look like there is a lot of data in the view. But what strikes me as strange is that if I change the query to use a hardcoded value, like the one below, it is very fast (milliseconds):
SELECT TYPESTRUCTURE(MV.A,null,null,MV.B,MV.C,MV.D,null,null,MV.E,null,null,MV.F,MV.F,MV.G,MV.H)
BULK COLLECT INTO arr2
FROM (SELECT 'ABC' ATTR4 FROM DUAL) PARS
JOIN MYVIEW MV
ON MV.B = PARS.ATTR4;
In this new query I am hardcoding the value directly, but keeping the join, to test something as similar as possible to the query above, just without the TABLE() function.
So here is my question: is it possible that the TABLE() function causes such a big delay with only a single record inside? I would like to know whether someone can advise me on what is wrong with my approach, and whether there is another way to achieve the same result.
Thanks!!
This problem is likely caused by a poor optimizer estimate for the number of rows returned by the TABLE function. The CARDINALITY or DYNAMIC_SAMPLING hints may be the best way to solve the problem.
Cardinality estimate
Oracle gathers statistics on tables and indexes in order to estimate the cost of accessing those objects. The most important estimate is how many rows will be returned by an object. Procedural code does not have statistics by default, and Oracle does not make any attempt to parse the code and estimate how many rows will be produced. Whenever Oracle sees a procedural row source it uses a static number. On my database, the number is 16360. On most databases the estimate is roughly 8K (8168 with the default 8 KB block size), as beherenow pointed out.
explain plan for
select * from table(sys.odcinumberlist(1,2,3));
select * from table(dbms_xplan.display(format => 'basic +rows'));
Plan hash value: 2234210431
--------------------------------------------------------------
| Id | Operation | Name | Rows |
--------------------------------------------------------------
| 0 | SELECT STATEMENT | | 16360 |
| 1 | COLLECTION ITERATOR CONSTRUCTOR FETCH| | 16360 |
--------------------------------------------------------------
Fix #1: CARDINALITY hint
As beherenow suggested, the CARDINALITY hint can solve this problem by statically telling Oracle how many rows to estimate.
explain plan for
select /*+ cardinality(1) */ * from table(sys.odcinumberlist(1,2,3));
select * from table(dbms_xplan.display(format => 'basic +rows'));
Plan hash value: 2234210431
--------------------------------------------------------------
| Id | Operation | Name | Rows |
--------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 |
| 1 | COLLECTION ITERATOR CONSTRUCTOR FETCH| | 16360 |
--------------------------------------------------------------
Fix #2: DYNAMIC_SAMPLING hint
A more "official" solution is to use the DYNAMIC_SAMPLING hint. This hint tells Oracle to sample some data at run time before it builds the explain plan. This adds some cost to building the explain plan, but it will return the true number of rows. This may work much better if you don't know the number ahead of time.
explain plan for
select /*+ dynamic_sampling(2) */ * from table(sys.odcinumberlist(1,2,3));
select * from table(dbms_xplan.display(format => 'basic +rows'));
Plan hash value: 2234210431
--------------------------------------------------------------
| Id | Operation | Name | Rows |
--------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 |
| 1 | COLLECTION ITERATOR CONSTRUCTOR FETCH| | 3 |
--------------------------------------------------------------
But what's really slow?
We don't know exactly what was slow in your query. But whenever things are slow it's usually best to focus on the worst cardinality estimate. Row estimates are never perfect, but being off by several orders of magnitude can have a huge impact on an execution plan. In the simplest case it may change an index range scan to a full table scan.
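To put "orders of magnitude" in numbers, here is a short Python sketch comparing the default estimate on this database (16360) with the actual row counts from the examples above (3 rows) and from the question (1 row). The helper name is hypothetical.

```python
import math


def orders_of_magnitude_off(estimated_rows: int, actual_rows: int) -> float:
    """How many powers of ten separate the estimate from the actual row count."""
    return abs(math.log10(estimated_rows / actual_rows))


print(round(orders_of_magnitude_off(16360, 3), 1))  # 3.7  (default vs. the 3-element list)
print(round(orders_of_magnitude_off(16360, 1), 1))  # 4.2  (default vs. the single-row ARR)
print(orders_of_magnitude_off(3, 3))                # 0.0  (what dynamic sampling achieved)
```

An estimate that is four orders of magnitude too high is exactly the kind of error that can push the join with MYVIEW away from an index-driven plan.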

Using function based index (oracle) to speed up count(X)

I've a table Film:
CREATE TABLE film (
film_id NUMBER(5) NOT NULL,
title varchar2(255));
And I wanted to make the following query faster using a function-based index. It counts how many titles start with the same word and only displays first words shared by 20 or more titles. The query:
SELECT FW_SEPARATOR.FIRST_WORD AS "First Word", COUNT(FW_SEPARATOR.FIRST_WORD) AS "Count"
FROM (SELECT regexp_replace(FILM.TITLE, '(\w+).*$','\1') AS FIRST_WORD FROM FILM) FW_SEPARATOR
GROUP BY FW_SEPARATOR.FIRST_WORD
HAVING COUNT(FW_SEPARATOR.FIRST_WORD) >= 20;
The thing is, I created this function based index:
CREATE INDEX FIRST_WORD_INDEX ON FILM(regexp_replace(TITLE, '(\w+).*$','\1'));
But it didn't speed anything up...
I was wondering if anyone could help me with this :)
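For reference, here is a Python model of what the query computes. It also illustrates the NULL point made in the answer below: for a non-NULL title the expression never returns NULL, because regexp_replace returns the input unchanged when the pattern does not match. It is a sketch under assumptions: Python's \w is Unicode-aware and may differ from Oracle's for non-ASCII titles, '' is mapped to None to mimic Oracle treating the empty string as NULL, and the sample titles are the ones used in the answer's test schema. The function names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Optional


def first_word(title: Optional[str]) -> Optional[str]:
    r"""Model of regexp_replace(title, '(\w+).*$', '\1').

    Oracle treats '' as NULL, so None and '' both map to None.
    When the pattern does not match, the input comes back unchanged
    (re.sub behaves the same way), so a non-empty title never yields None.
    """
    if not title:
        return None
    return re.sub(r"(\w+).*$", r"\1", title)


def popular_first_words(titles: Iterable[Optional[str]], min_count: int) -> Dict[str, int]:
    """GROUP BY first word HAVING COUNT(first word) >= min_count."""
    # COUNT(expr) ignores NULLs, so None values are dropped before counting.
    counts = Counter(w for w in map(first_word, titles) if w is not None)
    return {w: n for w, n in counts.items() if n >= min_count}


titles = [
    "The Shawshank Redemption", "The Godfather", "The Godfather: Part II",
    "The Dark Knight", "Pulp Fiction", "The Good, the Bad and the Ugly",
    "Schindler's List", "12 Angry Men",
    "The Lord of the Rings: The Return of the King", "Fight Club",
]

print(popular_first_words(titles, 2))  # {'The': 6}
```

With only ten sample titles, the real threshold of 20 returns nothing; a threshold of 2 shows the grouping at work.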
Add a redundant predicate to the query to convince Oracle that the expression will not return null values and an index can be used:
select regexp_replace(film.title, '(\w+).*$','\1') first_word
from film
where regexp_replace(film.title, '(\w+).*$','\1') is not null;
Oracle can use an index like a skinny version of a table. Many queries only contain a small subset of the columns in a table. If all the columns in that set are part of the same index, Oracle can use that index instead of the table. This will be either an INDEX FAST FULL SCAN or an INDEX FULL SCAN. The data may be read similar to the way a regular table scan works. But since the index is much smaller than the table, that access method can be much faster.
But function-based indexes do not store NULLs. Oracle cannot use an index scan if it thinks there is a NULL that is not stored in the index. In this case, if the base column was defined as NOT NULL, the regular expression would always return a non-null value. But unsurprisingly, Oracle has not built code to determine whether or not a regular expression could return NULL. That sounds like an impossible task, similar to the halting problem.
There are several ways to convince Oracle that the expression is not null. The simplest may be to repeat the predicate and add an IS NOT NULL condition.
Sample Schema
create table film (
film_id number(5) not null,
title varchar2(255) not null);
insert into film select rownumber, column_value
from
(
select rownum rownumber, column_value from table(sys.odcivarchar2list(
q'<The Shawshank Redemption>',
q'<The Godfather>',
q'<The Godfather: Part II>',
q'<The Dark Knight>',
q'<Pulp Fiction>',
q'<The Good, the Bad and the Ugly>',
q'<Schindler's List>',
q'<12 Angry Men>',
q'<The Lord of the Rings: The Return of the King>',
q'<Fight Club>'))
);
create index film_idx1 on film(regexp_replace(title, '(\w+).*$','\1'));
begin
dbms_stats.gather_table_stats(user, 'FILM');
end;
/
Query that does not use index
Even with an index hint, the normal query will not use an index. Remember that hints are directives, and this query would use the index if it was possible.
explain plan for
select /*+ index_ffs(film) */ regexp_replace(title, '(\w+).*$','\1') first_word
from film;
select * from table(dbms_xplan.display);
Plan hash value: 1232367652
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 50 | 3 (0)| 00:00:01 |
| 1 | TABLE ACCESS FULL| FILM | 10 | 50 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
Query that uses index
Now add the extra condition and the query will use the index. I'm not sure why it uses an INDEX FULL SCAN instead of an INDEX FAST FULL SCAN. With such small sample data it doesn't matter. The important point is that an index is used.
explain plan for
select regexp_replace(film.title, '(\w+).*$','\1') first_word
from film
where regexp_replace(film.title, '(\w+).*$','\1') is not null;
select * from table(dbms_xplan.display);
Plan hash value: 1151375616
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 50 | 1 (0)| 00:00:01 |
|* 1 | INDEX FULL SCAN | FILM_IDX1 | 10 | 50 | 1 (0)| 00:00:01 |
------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( REGEXP_REPLACE ("TITLE",'(\w+).*$','\1') IS NOT NULL)

Use Oracle unnested VARRAY's instead of IN operator

Let's say users have 1 to n accounts in a system. When they query the database, they may choose to select from m accounts, with m between 1 and n. Typically the SQL generated to fetch their data is something like:
SELECT ... FROM ... WHERE account_id IN (?, ?, ..., ?)
So depending on the number of accounts a user has, this will cause a new hard-parse in Oracle, and a new execution plan, etc. Now there are a lot of queries like that and hence, a lot of hard-parses, and maybe the cursor/plan cache will be full quite early, resulting in even more hard-parses.
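The hard-parse concern comes down to distinct statement texts. A small Python sketch, assuming a hypothetical upper bound of 50 accounts per user and keeping the question's `...` placeholders: the IN-list version produces one statement text per list length, while the collection-bind version always produces the same text.

```python
def in_list_sql(m: int) -> str:
    """Statement text with one '?' placeholder per selected account."""
    placeholders = ", ".join("?" for _ in range(m))
    return f"SELECT ... FROM ... WHERE account_id IN ({placeholders})"


def collection_sql() -> str:
    """Statement text binding the whole list as a single collection."""
    return "SELECT ... FROM ... WHERE account_id IN (SELECT column_value FROM TABLE(?))"


MAX_ACCOUNTS = 50  # hypothetical upper bound on accounts per user

in_list_texts = {in_list_sql(m) for m in range(1, MAX_ACCOUNTS + 1)}
collection_texts = {collection_sql() for _ in range(1, MAX_ACCOUNTS + 1)}

print(len(in_list_texts), len(collection_texts))  # 50 1
```

Each distinct text is a separate cursor to hard parse and cache, which is the cost the question is trying to avoid.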
Instead, I could also write something like this
-- use any of these
CREATE TYPE numbers AS VARRAY(1000) of NUMBER(38);
CREATE TYPE numbers AS TABLE OF NUMBER(38);
SELECT ... FROM ... WHERE account_id IN (
SELECT column_value FROM TABLE(?)
)
-- or
SELECT ... FROM ... JOIN (
SELECT column_value FROM TABLE(?)
) ON column_value = account_id
And use JDBC to bind a java.sql.Array (i.e. an oracle.sql.ARRAY) to the single bind variable. Clearly, this will result in fewer hard parses and fewer cursors in the cache for functionally equivalent queries. But is there any general performance drawback, or any other issue that I might run into?
For example: does bind variable peeking work in a similar fashion for varrays or nested tables? This matters because the amount of data associated with each account may differ greatly.
I'm using Oracle 11g in this case, but I think the question is interesting for any Oracle version.
I suggest you try a plain old join like in
SELECT Col1, Col2
FROM ACCOUNTS ACCT,
TABLE TAB
WHERE ACCT.User = :ParamUser
AND TAB.account_id = ACCT.account_id;
An alternative could be a table subquery
SELECT Col1, Col2
FROM (
SELECT account_id
FROM ACCOUNTS
WHERE User = :ParamUser
) ACCT,
TABLE TAB
WHERE TAB.account_id = ACCT.account_id;
or a where subquery
SELECT Col1, Col2
FROM TABLE TAB
WHERE TAB.account_id IN
(
SELECT account_id
FROM ACCOUNTS
WHERE User = :ParamUser
);
The first one should be better for performance, but you'd better check them all with EXPLAIN PLAN.
Looking at V$SQL_BIND_CAPTURE in a 10g database, I have a few rows where the datatype is VARRAY or NESTED_TABLE; the actual bind values were not captured. In an 11g database, there is just one such row, but it also shows that the bind value is not captured. So I suspect that bind value peeking essentially does not happen for user-defined types.
In my experience, the main problem you run into using nested tables or varrays in this way is that the optimizer does not have a good estimate of the cardinality, which could lead it to generate bad plans. But, there is an (undocumented?) CARDINALITY hint that might be helpful. The problem with that is, if you calculate the actual cardinality of the nested table and include that in the query, you're back to having multiple distinct query texts. Perhaps if you expect that most or all users will have at most 10 accounts, using the hint to indicate that as the cardinality would be helpful. Of course, I'd try it without the hint first, you may not have an issue here at all.
(I also think that perhaps Miguel's answer is the right way to go.)
For a medium-sized list (several thousand items) I would use this approach:
First, generate a prepared statement that joins an XMLTABLE to your main table.
For instance:
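String myQuery = "SELECT ... "
+ "FROM ACCOUNTS A, "
+ "XMLTABLE('tab/row' PASSING XMLTYPE(?) COLUMNS id NUMBER PATH 'id') t "
+ "WHERE A.account_id = t.id";
Then loop through your data (here, ids is your collection of account IDs) and build the XML content:
StringBuilder idList = new StringBuilder("<tab>");
for (long id : ids) {
idList.append("<row><id>").append(id).append("</id></row>");
}
idList.append("</tab>"); // e.g. <tab><row><id>101</id></row><row><id>907</id></row> ...</tab>
Finally, prepare and execute the statement (connection is your open java.sql.Connection), then fetch the results.
PreparedStatement ps = connection.prepareStatement(myQuery);
ps.setString(1, idList.toString());
ResultSet rs = ps.executeQuery();
while (rs.next()) {...}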
With this approach it is also possible to pass a multi-valued list, as in this select statement:
SELECT * FROM TABLE t WHERE (t.COL1, t.COL2) in (SELECT X.COL1, X.COL2 FROM X);
In my experience performances are pretty good, and the approach is flexible enough to be used in very complex query scenarios.
The only limit is the size of the string passed to the database, but it should be possible to bind a CLOB instead of a String for an arbitrarily long XML wrapper around the input list.
The problem of binding a variable number of items into an IN list seems to come up a lot in various forms. One option is to concatenate the IDs into a comma-separated string, bind that, and then use a bit of a trick to split it into a table you can join against, e.g.:
with bound_inlist
as
(
select
substr(txt,
instr (txt, ',', 1, level ) + 1,
instr (txt, ',', 1, level+1) - instr (txt, ',', 1, level) -1 )
as token
from (select ','||:txt||',' txt from dual)
connect by level <= length(:txt)-length(replace(:txt,',',''))+1
)
select *
from bound_inlist a, actual_table b
where a.token = b.token
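To see what the CONNECT BY trick produces, here is a Python emulation of the same SUBSTR/INSTR arithmetic, one iteration per LEVEL (from 1 to the number of commas plus one). It is a model rather than Oracle itself: `instr` and `substr` reimplement only the argument forms used in the query (positive start position), and an empty token becomes None, mirroring Oracle treating '' as NULL.

```python
from typing import List, Optional


def instr(s: str, sub: str, start: int, occurrence: int) -> int:
    """Oracle INSTR(s, sub, start, occurrence) for start >= 1: 1-based position, 0 if absent."""
    pos = start - 1  # 0-based index to search from
    found = 0
    for _ in range(occurrence):
        idx = s.find(sub, pos)
        if idx == -1:
            return 0
        found = idx + 1  # convert to a 1-based position
        pos = idx + 1    # the next search starts after this match
    return found


def substr(s: str, start: int, length: int) -> Optional[str]:
    """Oracle SUBSTR(s, start, length) for start >= 1; '' becomes NULL (None)."""
    if length < 1:
        return None
    piece = s[start - 1:start - 1 + length]
    return piece or None


def bound_inlist(bound: str) -> List[Optional[str]]:
    """Emulate the CONNECT BY split of the comma-separated bind value."""
    txt = "," + bound + ","
    levels = len(bound) - len(bound.replace(",", "")) + 1  # commas + 1
    tokens = []
    for level in range(1, levels + 1):
        left = instr(txt, ",", 1, level)       # comma before the token
        right = instr(txt, ",", 1, level + 1)  # comma after the token
        tokens.append(substr(txt, left + 1, right - left - 1))
    return tokens


print(bound_inlist("101,907,42"))  # ['101', '907', '42']
```

Because the whole list travels as one bind value, the statement text never changes, whatever the number of IDs.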
Bind variable peeking is going to be a problem, though.
Does the query plan actually change for a larger number of accounts, i.e. would it be more efficient to move from an index to a full table scan in some cases, or is it borderline? As someone else suggested, you could use the CARDINALITY hint to indicate how many IDs are being bound; the following test case proves this actually works:
create table actual_table (id integer, padding varchar2(100));
create unique index actual_table_idx on actual_table(id);
insert into actual_table
select level, 'this is just some padding for '||level
from dual connect by level <= 1000;
explain plan for
with bound_inlist
as
(
select /*+ CARDINALITY(10) */
substr(txt,
instr (txt, ',', 1, level ) + 1,
instr (txt, ',', 1, level+1) - instr (txt, ',', 1, level) -1 )
as token
from (select ','||:txt||',' txt from dual)
connect by level <= length(:txt)-length(replace(:txt,',',''))+1
)
select *
from bound_inlist a, actual_table b
where a.token = b.id;
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 840 | 2 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 10 | 840 | 2 (0)| 00:00:01 |
| 3 | VIEW | | 10 | 190 | 2 (0)| 00:00:01 |
|* 4 | CONNECT BY WITHOUT FILTERING| | | | | |
| 5 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
|* 6 | INDEX UNIQUE SCAN | ACTUAL_TABLE_IDX | 1 | | 0 (0)| 00:00:01 |
| 7 | TABLE ACCESS BY INDEX ROWID | ACTUAL_TABLE | 1 | 65 | 0 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------
Another option is to always use n bind variables in every query. Use null for m+1 to n.
Oracle ignores repeated items in the expression_list. Your queries will perform the same way and there will be fewer hard parses. But there will be extra overhead to bind all the variables and transfer the data. Unfortunately I have no idea what the overall effect on performance would be; you'd have to test it.
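A Python sketch of the padding idea, with a hypothetical maximum of 5 binds and the question's `...` placeholders: every call produces the same statement text, and the NULL padding cannot change which rows match, because account_id = NULL is never true. The helper names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

MAX_BINDS = 5  # hypothetical n: the largest number of accounts a user can select


def padded_in_list(
    account_ids: Sequence[int], n: int = MAX_BINDS
) -> Tuple[str, List[Optional[int]]]:
    """Always emit n placeholders; slots m+1..n are bound to NULL (None)."""
    if len(account_ids) > n:
        raise ValueError(f"at most {n} account ids are supported")
    placeholders = ", ".join("?" for _ in range(n))
    sql = f"SELECT ... FROM ... WHERE account_id IN ({placeholders})"
    binds: List[Optional[int]] = list(account_ids) + [None] * (n - len(account_ids))
    return sql, binds


def in_list_matches(value: int, binds: Sequence[Optional[int]]) -> bool:
    """SQL IN: comparing with a NULL entry is UNKNOWN, so it never produces a match."""
    return any(b is not None and b == value for b in binds)


sql_a, binds_a = padded_in_list([101, 907])
sql_b, binds_b = padded_in_list([42])
print(sql_a == sql_b)  # True
print(binds_a)         # [101, 907, None, None, None]
```

This relies on IN semantics. With NOT IN, a single NULL in the list makes the predicate return no rows, so the padding trick does not carry over.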

Resources