Use Oracle unnested VARRAY's instead of IN operator - performance

Let's say users have 1 - n accounts in a system. When they query the database, they may choose to select from m acounts, with m between 1 and n. Typically the SQL generated to fetch their data is something like
SELECT ... FROM ... WHERE account_id IN (?, ?, ..., ?)
So depending on the number of accounts a user has, this will cause a new hard-parse in Oracle, and a new execution plan, etc. Now there are a lot of queries like that and hence, a lot of hard-parses, and maybe the cursor/plan cache will be full quite early, resulting in even more hard-parses.
Instead, I could also write something like this
-- use any of these
CREATE TYPE numbers AS VARRAY(1000) of NUMBER(38);
CREATE TYPE numbers AS TABLE OF NUMBER(38);
SELECT ... FROM ... WHERE account_id IN (
SELECT column_value FROM TABLE(?)
)
-- or
SELECT ... FROM ... JOIN (
SELECT column_value FROM TABLE(?)
) ON column_value = account_id
And use JDBC to bind a java.sql.Array (i.e. an oracle.sql.ARRAY) to the single bind variable. Clearly, this will result in less hard-parses and less cursors in the cache for functionally equivalent queries. But is there anything like general a performance-drawback, or any other issues that I might run into?
E.g: Does bind variable peeking work in a similar fashion for varrays or nested tables? Because the amount of data associated with every account may differ greatly.
I'm using Oracle 11g in this case, but I think the question is interesting for any Oracle version.

I suggest you try a plain old join like in
SELECT Col1, Col2
FROM ACCOUNTS ACCT
TABLE TAB,
WHERE ACCT.User = :ParamUser
AND TAB.account_id = ACCT.account_id;
An alternative could be a table subquery
SELECT Col1, Col2
FROM (
SELECT account_id
FROM ACCOUNTS
WHERE User = :ParamUser
) ACCT,
TABLE TAB
WHERE TAB.account_id = ACCT.account_id;
or a where subquery
SELECT Col1, Col2
FROM TABLE TAB
WHERE TAB.account_id IN
(
SELECT account_id
FROM ACCOUNTS
WHERE User = :ParamUser
);
The first one should be better for perfomance, but you better check them all with explain plan.

Looking at V$SQL_BIND_CAPTURE in a 10g database, I have a few rows where the datatype is VARRAY or NESTED_TABLE; the actual bind values were not captured. In an 11g database, there is just one such row, but it also shows that the bind value is not captured. So I suspect that bind value peeking essentially does not happen for user-defined types.
In my experience, the main problem you run into using nested tables or varrays in this way is that the optimizer does not have a good estimate of the cardinality, which could lead it to generate bad plans. But, there is an (undocumented?) CARDINALITY hint that might be helpful. The problem with that is, if you calculate the actual cardinality of the nested table and include that in the query, you're back to having multiple distinct query texts. Perhaps if you expect that most or all users will have at most 10 accounts, using the hint to indicate that as the cardinality would be helpful. Of course, I'd try it without the hint first, you may not have an issue here at all.
(I also think that perhaps Miguel's answer is the right way to go.)

For medium sized list (several thousand items) I would use this approach:
First:generate a prepared statement with an XMLTABLE in join with your main table.
For instance:
String myQuery = "SELECT ...
+" FROM ACCOUNTS A,"
+ "XMLTABLE('tab/row' passing XMLTYPE(?) COLUMNS id NUMBER path 'id') t
+ "WHERE A.account_id = t.id"
then loop through your data and build a StringBuffer with this content:
StringBuffer idList = "<tab><row><id>101</id></row><row><id>907</id></row> ...</tab>";
eventually, prepare and submit your statement, then fetch the results.
myQuery.setString(1, idList);
ResultSet rs = myQuery.executeQuery();
while (rs.next()) {...}
Using this approach is also possible to pass multi-valued list, as in the select statement
SELECT * FROM TABLE t WHERE (t.COL1, t.COL2) in (SELECT X.COL1, X.COL2 FROM X);
In my experience performances are pretty good, and the approach is flexible enough to be used in very complex query scenarios.
The only limit is the size of the string passed to the DB, but I suppose it is possible to use CLOB in place of String for arbitrary long XML wrapper to the input list;

This binding a variable number of items into an in list problem seems to come up a lot in various form. One option is to concatenate the IDs into a comma separated string and bind that, and then use a bit of a trick to split it into a table you can join against, eg:
with bound_inlist
as
(
select
substr(txt,
instr (txt, ',', 1, level ) + 1,
instr (txt, ',', 1, level+1) - instr (txt, ',', 1, level) -1 )
as token
from (select ','||:txt||',' txt from dual)
connect by level <= length(:txt)-length(replace(:txt,',',''))+1
)
select *
from bound_inlist a, actual_table b
where a.token = b.token
Bind variable peaking is going to be a problem though.
Does the query plan actually change for larger number of accounts, ie would it be more efficient to move from index to full table scan in some cases, or is it borderline? As someone else suggested, you could use the CARDINALITY hint to indicate how many IDs are being bound, the following test case proves this actually works:
create table actual_table (id integer, padding varchar2(100));
create unique index actual_table_idx on actual_table(id);
insert into actual_table
select level, 'this is just some padding for '||level
from dual connect by level <= 1000;
explain plan for
with bound_inlist
as
(
select /*+ CARDINALITY(10) */
substr(txt,
instr (txt, ',', 1, level ) + 1,
instr (txt, ',', 1, level+1) - instr (txt, ',', 1, level) -1 )
as token
from (select ','||:txt||',' txt from dual)
connect by level <= length(:txt)-length(replace(:txt,',',''))+1
)
select *
from bound_inlist a, actual_table b
where a.token = b.id;
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 840 | 2 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 10 | 840 | 2 (0)| 00:00:01 |
| 3 | VIEW | | 10 | 190 | 2 (0)| 00:00:01 |
|* 4 | CONNECT BY WITHOUT FILTERING| | | | | |
| 5 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
|* 6 | INDEX UNIQUE SCAN | ACTUAL_TABLE_IDX | 1 | | 0 (0)| 00:00:01 |
| 7 | TABLE ACCESS BY INDEX ROWID | ACTUAL_TABLE | 1 | 65 | 0 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Another option is to always use n bind variables in every query. Use null for m+1 to n.
Oracle ignores repeated items in the expression_list. Your queries will perform the same way and there will be fewer hard parses. But there will be extra overhead to bind all the variables and transfer the data. Unfortunately I have no idea what the overall affect on performance would be, you'd have to test it.

Related

Utilize function-based spatial index in SELECT list

I have an Oracle 18c table called LINES with 1000 rows. The DDL for the table can be found here: db<>fiddle.
The data looks like this:
create table lines (shape sdo_geometry);
insert into lines (shape) values (sdo_geometry(2002, 26917, null, sdo_elem_info_array(1, 2, 1), sdo_ordinate_array(574360, 4767080, 574200, 4766980)));
insert into lines (shape) values (sdo_geometry(2002, 26917, null, sdo_elem_info_array(1, 2, 1), sdo_ordinate_array(573650, 4769050, 573580, 4768870)));
insert into lines (shape) values (sdo_geometry(2002, 26917, null, sdo_elem_info_array(1, 2, 1), sdo_ordinate_array(574290, 4767090, 574200, 4767070)));
insert into lines (shape) values (sdo_geometry(2002, 26917, null, sdo_elem_info_array(1, 2, 1), sdo_ordinate_array(571430, 4768160, 571260, 4768040)));
...
I've created a function that's intentionally slow — for testing purposes. The function takes the SDO_GEOMETRY lines and outputs a SDO_GEOEMTRY point.
create or replace function slow_function(shape in sdo_geometry) return sdo_geometry
deterministic is
begin
return
--Deliberately make the function slow for testing purposes...
-- ...convert from SDO_GEOMETRY to JSON and back, several times, for no reason.
sdo_util.from_json(sdo_util.to_json(sdo_util.from_json(sdo_util.to_json(sdo_util.from_json(sdo_util.to_json(sdo_util.from_json(sdo_util.to_json(sdo_util.from_json(sdo_util.to_json(
sdo_lrs.geom_segment_start_pt(shape)
))))))))));
end;
As an experiment, I want to create a function-based spatial index, as a way to pre-compute the result of the slow function.
Steps:
Create an entry in USER_SDO_GEOM_METADATA:
insert into user_sdo_geom_metadata (table_name, column_name, diminfo, srid)
values (
'lines',
'infrastr.slow_function(shape)',
-- 🡅 Important: Include the function owner.
sdo_dim_array (
sdo_dim_element('X', 567471.222, 575329.362, 0.5), --note to self: these coordinates are wrong.
sdo_dim_element('Y', 4757654.961, 4769799.360, 0.5)
),
26917
);
commit;
Create a function-based spatial index:
create index lines_idx on lines (slow_function(shape)) indextype is mdsys.spatial_index_v2;
Problem:
When I use the function in the SELECT list of a query, the index isn't being used. Instead, it's doing a full table scan...so the query is still slow when I select all rows (CTRL+ENTER in SQL Developer).
You might ask, "Why select all rows?" Answer: That's how mapping software often works...you display all (or most) of the points in the map — all at once.
explain plan for
select
slow_function(shape)
from
lines
select * from table(dbms_xplan.display);
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 34 | 7 (0)| 00:00:01 |
| 1 | TABLE ACCESS FULL| LINES | 1 | 34 | 7 (0)| 00:00:01 |
---------------------------------------------------------------------------
Likewise, in my mapping software (ArcGIS Desktop 10.7.1), the map doesn't utilize the index either. I can tell, because the points are slow to draw in the map.
I'm aware that it's possible to create a view, and then register that view in USER_SDO_GEOM_METADATA (in addition to registering the index). And use that view in the map. I've tried that, but the mapping software still doesn't use the index.
I've also tried an SQL hint, but no luck — I don't think the hint is being used:
create or replace view lines_vw as (
select
/*+ INDEX (lines lines_idx) */
cast(rownum as number(38,0)) as objectid, --the mapping software needs a unique ID column
slow_function(shape) as shape
from
lines
where
slow_function(shape) is not null --https://stackoverflow.com/a/59581129/5576771
)
Question:
How can I utilize the function-based spatial index in the SELECT list in a query?
A spatial index is invoked only by the WHERE clause, not the SELECT list. A function in the SELECT list is invoked for every row returned by the WHERE clause, which in your case is SDO_ANYINTERACT( ) returning all rows.
You don't appear to be firing the index; just adding the function call as an attribute is insufficient
select
slow_function(shape)
from
lines
Should be....
select slow_function(shape)
from lines
Where sdo_anyinteract(slow_function(shape),sdo_geometry(2003, 26917,null,sdo_elem_info_array(1,1003,3),sdo_ordinate_array(1,2,3,4)) = 'TRUE'
Where 1,2,3,4 are the values of an optimized rectangle.
I tried using sdo_anyinteract() in the WHERE clause, as #SimonGreener suggested.
Unfortunately, the query still seems to be doing a full table scan (in addition to using the index). I was hoping to only use the index.
select
slow_function(shape) as shape
from
lines
where
sdo_anyinteract(slow_function(shape),
mdsys.sdo_geometry(2003, 26917, null, mdsys.sdo_elem_info_array(1, 1003, 1), mdsys.sdo_ordinate_array(573085.8702, 4771088.3813, 566461.6349, 4768833.3225, 570335.0629, 4757455.1278, 576959.2982, 4759710.1866, 573085.8702, 4771088.3813))
) = 'TRUE'
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 46 | 1 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID | LINES | 1 | 46 | 1 (0)| 00:00:01 |
|* 2 | DOMAIN INDEX (SEL: 0.000000 %)| LINES_IDX | | | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
PLAN_TABLE_OUTPUT
-----------------------------------
2 - access("MDSYS"."SDO_ANYINTERACT"("INFRASTR"."SLOW_FUNCTION"("SHAPE"),"MDSYS"."
SDO_GEOMETRY"(2003,26917,NULL,"MDSYS"."SDO_ELEM_INFO_ARRAY"(1,1003,1),"MDSYS"."SDO_OR
DINATE_ARRAY"(573085.8702,4771088.3813,566461.6349,4768833.3225,570335.0629,4757455.1
278,576959.2982,4759710.1866,573085.8702,4771088.3813)))='TRUE')
I played around with a SQL hint: /*+ INDEX (lines lines_idx) */. But that didn't seem to make a difference.
The function takes the SDO_GEOMETRY lines and outputs a SDO_GEOEMTRY point.
A possible alternative might be:
Instead of returning/indexing a geometry column, maybe I could return/index X&Y numeric columns (using a regular non-spatial index). And then either:
A) Convert the XY columns to SDO_GEOMETRY after-the-fact in a query on-the-fly, or...
B) Use GIS software to display the XY data as points in a map. For example, in ArcGIS Pro, create an "XY Event Layer".
That technique seemed to work ok here: Improve performance of startpoint query (ST_GEOMETRY). I was able to utilize the function-based index (non-spatial) in a SELECT clause — making my query significantly faster.
Of course, that technique would work best for points — since converting XYs after-the-fact to point geometries is easy/performant/practical. Whereas converting lines or polygons (maybe from WKT?) to geometries after-the-fact likely wouldn't make much sense. Even if that were possible, it would likely be too slow, and defeat the purpose of precomputing the data in the function-based index in the first place.

Confusion regarding to_char and to_number

First of all, I am aware about basics.
select to_number('A231') from dual; --this will not work but
select to_char('123') from dual;-- this will work
select to_number('123') from dual;-- this will also work
Actually in my package, we have 2 tables A(X number) and B(Y varchar) There are many columns but we are worried about only X and Y. X contains values only numeric like 123,456 etc but Y contains some string and some number for eg '123','HR123','Hello'. We have to join these 2 tables. its legacy application so we are not able to change tables and columns.
Till this time below condition was working properly
to_char(A.x)=B.y;
But since there is index on Y, performance team suggested us to do
A.x=to_number(B.y); it is running in dev env.
My question is, in any circumstances will this query give error? if it picks '123' definitely it will give 123. but if it picks 'AB123' then it will fail. can it fail? can it pick 'AB123' even when it is getting joined with other table.
can it fail?
Yes. It must put every row through TO_NUMBER before it can check whether or not it meets the filter condition. Therefore, if you have any one row where it will fail then it will always fail.
From Oracle 12.2 (since you tagged Oracle 12) you can use:
SELECT *
FROM A
INNER JOIN B
ON (A.x = TO_NUMBER(B.y DEFAULT NULL ON CONVERSION ERROR))
Alternatively, put an index on TO_CHAR(A.x) and use your original query:
SELECT *
FROM A
INNER JOIN B
ON (TO_CHAR(A.x) = B.y)
Also note: Having an index on B.y does not mean that the index will be used. If you are filtering on TO_NUMBER(B.y) (with or without the default on conversion error) then you would need a function-based index on the function TO_NUMBER(B.Y) that you are using. You should profile the queries and check the explain plans to see whether there is any improvement or change in use of indexes.
Never convert a VARCHAR2 column that can contain non-mumeric strings to_number.
This can partially work, but will eventuelly definitively fail.
Small Example
create table a as
select rownum X from dual connect by level <= 10;
create table b as
select to_char(rownum) Y from dual connect by level <= 10
union all
select 'Hello' from dual;
This could work (as you limit the rows, so that the conversion works; if you are lucky and Oracle chooses the right execution plan; which is probable, but not guarantied;)
select *
from a
join b on A.x=to_number(B.y)
where B.y = '1';
But this will fail
select *
from a
join b on A.x=to_number(B.y)
ORA-01722: invalid number
Performance
But since there is index on Y, performance team suggested us to do A.x=to_number(B.y);
You should chalange the team, as if you use a function on a column (to_number(B.y)) index can't be used.
On the contrary, your original query can perfectly use the following indexes:
create index b_y on b(y);
create index a_x on a(x);
Query
select *
from a
join b on to_char(A.x)=B.y
where A.x = 1;
Execution Plan
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 5 | 1 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 1 | 5 | 1 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN| A_X | 1 | 3 | 1 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN| B_Y | 1 | 2 | 0 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("A"."X"=1)
3 - access("B"."Y"=TO_CHAR("A"."X"))

Understanding characteristics of a query for which an index makes a dramatic difference

I am trying to come up with an example showing that indexes can have a dramatic (orders of magnitude) effect on query execution time. After hours of trial and error I am still at square one. Namely, the speed-up is not large even when the execution plan shows using the index.
Since I realized that I better have a large table for the index to make a difference, I wrote the following script (using Oracle 11g Express):
CREATE TABLE many_students (
student_id NUMBER(11),
city VARCHAR(20)
);
DECLARE
nStudents NUMBER := 1000000;
nCities NUMBER := 10000;
curCity VARCHAR(20);
BEGIN
FOR i IN 1 .. nStudents LOOP
curCity := ROUND(DBMS_RANDOM.VALUE()*nCities, 0) || ' City';
INSERT INTO many_students
VALUES (i, curCity);
END LOOP;
COMMIT;
END;
I then tried quite a few queries, such as:
select count(*)
from many_students M
where M.city = '5467 City';
and
select count(*)
from many_students M1
join many_students M2 using(city);
and a few other ones.
I have seen this post and think that my queries satisfy the requirements stated in the replies there. However, none of the queries I tried showed dramatic improvement after building an index: create index myindex on many_students(city);
Am I missing some characteristic that distinguishes a query for which an index makes a dramatic difference? What is it?
The test case is a good start but it needs a few more things to get a noticeable performance difference:
Realistic data sizes. One million rows of two small values is a small table. With a table that small the performance difference between a good and a bad execution plan may not matter much.
The below script will double the table size until it gets to 64 million rows. It takes about 20 minutes on my machine. (To make it go quicker, for larger sizes, you could make the table nologging and add an /*+ append */ hint to the insert.
--Increase the table to 64 million rows. This took 20 minutes on my machine.
insert into many_students select * from many_students;
insert into many_students select * from many_students;
insert into many_students select * from many_students;
insert into many_students select * from many_students;
insert into many_students select * from many_students;
insert into many_students select * from many_students;
commit;
--The table has about 1.375GB of data. The actual size will vary.
select bytes/1024/1024/1024 gb from dba_segments where segment_name = 'MANY_STUDENTS';
Gather statistics. Always gather statistics after large table changes. The optimizer cannot do its job well unless it has table, column, and index statistics.
begin
dbms_stats.gather_table_stats(user, 'MANY_STUDENTS');
end;
/
Use hints to force a good and bad plan. Optimizer hints should usually be avoided. But to quickly compare different plans they can be helpful to fix a bad plan.
For example, this will force a full table scan:
select /*+ full(M) */ count(*) from many_students M where M.city = '5467 City';
But you'll also want to verify the execution plan:
explain plan for select /*+ full(M) */ count(*) from many_students M where M.city = '5467 City';
select * from table(dbms_xplan.display);
Flush the cache. Caching is probably the main culprit behind the index and full table scan queries taking the same amount of time. If the table fits entirely in memory then the time to read all the rows may be almost too small to measure. The number could be dwarfed by the time to parse the query or to send a simple result across the network.
This command will force Oracle to remove almost everything from the buffer cache. This will help you test a "cold" system. (You probably do not want to run this statement on a production system.)
alter system flush buffer_cache;
However, that won't flush the operating system or SAN cache. And maybe the table really would fit in memory on production. If you need to test a fast query it may be necessary to put it in a PL/SQL loop.
Multiple, alternating runs. There many things happening in the background, like caching and other processes. It's so easy to get bad results because something unrelated changed on the system.
Maybe the first run takes extra long to put things in a cache. Or maybe some huge job was started between queries. To avoid those issues, alternate running the two queries. Run them five times, throw out the highs and lows, and compare the averages.
For example, copy and paste the statements below five times and run them. (If using SQL*Plus, run set timing on first.) I already did that and posted the times I got in a comment before each line.
--Seconds: 0.02, 0.02, 0.03, 0.234, 0.02
alter system flush buffer_cache;
select count(*) from many_students M where M.city = '5467 City';
--Seconds: 4.07, 4.21, 4.35, 3.629, 3.54
alter system flush buffer_cache;
select /*+ full(M) */ count(*) from many_students M where M.city = '5467 City';
Testing is hard. Putting together decent performance tests is difficult. The above rules are only a start.
This might seem like overkill at first. But it's a complex topic. And I've seen so many people, including myself, waste a lot of time "tuning" something based on a bad test. Better to spend the extra time now and get the right answer.
An index really shines when the database doesn't need to go to every row in a table to get your results. So COUNT(*) isn't the best example. Take this for example:
alter session set statistics_level = 'ALL';
create table mytable as select * from all_objects;
select * from mytable where owner = 'SYS' and object_name = 'DUAL';
---------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 300 |00:00:00.01 | 12 |
| 1 | TABLE ACCESS FULL| MYTABLE | 1 | 19721 | 300 |00:00:00.01 | 12 |
---------------------------------------------------------------------------------------
So, here, the database does a full table scan (TABLE ACCESS FULL), which means it has to visit every row in the database, which means it has to load every block from disk. Lots of I/O. The optimizer guessed that it was going to find 15000 rows, but I know there's only one.
Compare that with this:
create index myindex on mytable( owner, object_name );
select * from mytable where owner = 'SYS' and object_name = 'JOB$';
select * from table( dbms_xplan.display_cursor( null, null, 'ALLSTATS LAST' ));
----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 3 | 2 |
| 1 | TABLE ACCESS BY INDEX ROWID| MYTABLE | 1 | 2 | 1 |00:00:00.01 | 3 | 2 |
|* 2 | INDEX RANGE SCAN | MYINDEX | 1 | 1 | 1 |00:00:00.01 | 2 | 2 |
----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OWNER"='SYS' AND "OBJECT_NAME"='JOB$')
Here, because there's an index, it does an INDEX RANGE SCAN to find the rowids for the table that match our criteria. Then, it goes to the table itself (TABLE ACCESS BY INDEX ROWID) and looks up only the rows we need and can do so efficiently because it has a rowid.
And even better, if you happen to be looking for something that is entirely in the index, the scan doesn't even have to go back to the base table. The index is enough:
select count(*) from mytable where owner = 'SYS';
select * from table( dbms_xplan.display_cursor( null, null, 'ALLSTATS LAST' ));
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 46 | 46 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 46 | 46 |
|* 2 | INDEX RANGE SCAN| MYINDEX | 1 | 8666 | 9294 |00:00:00.01 | 46 | 46 |
------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OWNER"='SYS')
Because my query involved the owner column and that's contained in the index, it never needs to go back to the base table to look anything up there. So the index scan is enough, then it does an aggregation to count the rows. This scenario is a little less than perfect, because the index is on (owner, object_name) and not just owner, but its definitely better than doing a full table scan on the main table.

Oracle CBO when using types [duplicate]

I'm trying to optimize a set of stored procs which are going against many tables including this view. The view is as such:
We have TBL_A (id, hist_date, hist_type, other_columns) with two types of rows: hist_type 'O' vs. hist_type 'N'. The view self joins table A to itself and transposes the N rows against the corresponding O rows. If no N row exists for the O row, the O row values are repeated. Like so:
CREATE OR REPLACE FORCE VIEW V_A (id, hist_date, hist_type, other_columns_o, other_columns_n)
select
o.id, o.hist_date, o.hist_type,
o.other_columns as other_columns_o,
case when n.id is not null then n.other_columns else o.other_columns end as other_columns_n
from
TBL_A o left outer join TBL_A n
on o.id=n.id and o.hist_date=n.hist_date and n.hist_type = 'N'
where o.hist_type = 'O';
TBL_A has a unique index on: (id, hist_date, hist_type). It also has a unique index on: (hist_date, id, hist_type) and this is the primary key.
The following query is at issue (in a stored proc, with x declared as TYPE_TABLE_OF_NUMBER):
select b.id BULK COLLECT into x from TBL_B b where b.parent_id = input_id;
select v.id from v_a v
where v.id in (select column_value from table(x))
and v.hist_date = input_date
and v.status_new = 'CLOSED';
This query ignores the index on id column when accessing TBL_A and instead does a range scan using the date to pick up all the rows for the date. Then it filters that set using the values from the array. However if I simply give the list of ids as a list of numbers the optimizer uses the index just fine:
select v.id from v_a v
where v.id in (123, 234, 345, 456, 567, 678, 789)
and v.hist_date = input_date
and v.status_new = 'CLOSED';
The problem also doesn't exist when going against TBL_A directly (and I have a workaround that does that, but it's not ideal.).Is there a way to get the optimizer to first retrieve the array values and use them as predicates when accessing the table? Or a good way to restructure the view to achieve this?
Oracle does not use the index because it assumes select column_value from table(x) returns 8168 rows.
Indexes are faster for retrieving small amounts of data. At some point it's faster to scan the whole table than repeatedly walk the index tree.
Estimating the cardinality of a regular SQL statement is difficult enough. Creating an accurate estimate for procedural code is almost impossible. But I don't know where they came up with 8168. Table functions are normally used with pipelined functions in data warehouses, a sorta-large number makes sense.
Dynamic sampling can generate a more accurate estimate and likely generate a plan that will use the index.
Here's an example of a bad cardinality estimate:
create or replace type type_table_of_number as table of number;
explain plan for
select * from table(type_table_of_number(1,2,3,4,5,6,7));
select * from table(dbms_xplan.display(format => '-cost -bytes'));
Plan hash value: 1748000095
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Time |
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8168 | 00:00:01 |
| 1 | COLLECTION ITERATOR CONSTRUCTOR FETCH| | 8168 | 00:00:01 |
-------------------------------------------------------------------------
Here's how to fix it:
explain plan for select /*+ dynamic_sampling(2) */ *
from table(type_table_of_number(1,2,3,4,5,6,7));
select * from table(dbms_xplan.display(format => '-cost -bytes'));
Plan hash value: 1748000095
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Time |
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 7 | 00:00:01 |
| 1 | COLLECTION ITERATOR CONSTRUCTOR FETCH| | 7 | 00:00:01 |
-------------------------------------------------------------------------
Note
-----
- dynamic statistics used: dynamic sampling (level=2)

Generalising Oracle Static Cursors

I am creating an OLAP-like package in Oracle where you call a main, controlling function that assembles its returning output table by making numerous left joins. These joined tables are defined in 'slave' functions within the package, which return specific subsets using static cursors, parameterised by the function's arguments. The thing is, these cursors are all very similar.
Is there a way, beyond generating dynamic queries and using them in a ref cursor, that I can generalise these. Every time I add a function, I get this weird feeling, as a developer, that this isn't particularly elegant!
Pseduocode
somePackage
function go(param)
return select myRows.id,
stats1.value,
stats2.value
from myRows
left join table(somePackage.stats1(param)) stats1
on stats1.id = myRows.id
left join table(somePackage.stats2(param)) stats2
on stats2.id = myRows.id
function stats1(param)
return [RESULTS OF SOME QUERY]
function stats2(param)
return [RESULTS OF A RELATED QUERY]
The stats queries all have the same structure:
First they aggregate the data in a useful way
Then they split this data into logical sections, based on criteria, and aggregate again (e.g., by department, by region, etc.) then union the results
Then they return the results, cast into the relevant object type, so I can easily do a bulk collect
Something like:
cursor myCursor is
with fullData as (
[AGGREGATE DATA]
),
fullStats as (
[AGGREGATE FULLDATA BY TOWN]
union all
[AGGREGATE FULLDATA BY REGION]
union all
[AGGREGATE FULLDATA BY COUNTRY]
)
select myObjectType(fullStats.*)
from fullStats;
...
open myCursor;
fetch myCursor bulk collect into output limit 1000;
close myCursor;
return output;
Filter operations can help build dynamic queries with static SQL. Especially when the column list is static.
You may have already considered this approach but discarded it for performance reasons. "Why execute every SQL block if we only need the results
from one of them?" You're in luck, the optimizer already does this for you with a FILTER operation.
Example Query
First create a function that waits 5 seconds every time it is run. It will help find which query blocks were executed.
create or replace function slow_function return number is begin
dbms_lock.sleep(5);
return 1;
end;
/
This static query is controlled by bind variables. There are three query blocks but the entire query runs in 5 seconds instead of 15.
declare
v_sum number;
v_query1 number := 1;
v_query2 number := 0;
v_query3 number := 0;
begin
select sum(total)
into v_sum
from
(
select total from (select slow_function() total from dual) where v_query1 = 1
union all
select total from (select slow_function() total from dual) where v_query2 = 1
union all
select total from (select slow_function() total from dual) where v_query3 = 1
);
end;
/
Execution Plan
This performance is not the result of good luck; it's not simply Oracle randomly executing one predicate before another. Oracle analyzes the bind variables before
run-time and does not even execute the irrelevant query blocks. That's what the FILTER operation below is doing. (Which is a poor name, many people generally
refer to all predicates as "filters". But only some of them result in a FILTER operation.)
select * from table(dbms_xplan.display_cursor(sql_id => '0cfqc6a70kzmt'));
SQL_ID 0cfqc6a70kzmt, child number 0
-------------------------------------
SELECT SUM(TOTAL) FROM ( SELECT TOTAL FROM (SELECT SLOW_FUNCTION()
TOTAL FROM DUAL) WHERE :B1 = 1 UNION ALL SELECT TOTAL FROM (SELECT
SLOW_FUNCTION() TOTAL FROM DUAL) WHERE :B2 = 1 UNION ALL SELECT TOTAL
FROM (SELECT SLOW_FUNCTION() TOTAL FROM DUAL) WHERE :B3 = 1 )
Plan hash value: 926033116
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 6 (100)| |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
| 2 | VIEW | | 3 | 39 | 6 (0)| 00:00:01 |
| 3 | UNION-ALL | | | | | |
|* 4 | FILTER | | | | | |
| 5 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
|* 6 | FILTER | | | | | |
| 7 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
|* 8 | FILTER | | | | | |
| 9 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter(:B1=1)
6 - filter(:B2=1)
8 - filter(:B3=1)
Issues
The FILTER operation is poorly documented. I can't explain in detail when it does or does not work, and exactly what affects it has on other parts of the query. For example, in the explain plan the Rows estimate is 3, but at run time Oracle should easily be able to estimate the cardinality is 1. Apparently the execution plan is not that dynamic, that poor cardinality estimate may cause later issues. Also, I've seen some weird cases where static expressions are not appropriately filtered. But if a query uses a simple equality predicate it should be fine.
This approach allows you remove all dynamic SQL and replace it with a large static SQL statement. Which has some advantages; dynamic SQL is often "ugly" and difficult to debug. But people only familiar with procedural programming tend to think of a single SQL statement as one huge God-function, a bad practice. They won't appreciate that the UNION ALLs create independent blocks of SQL
Dynamic SQL is Still Probably Better
In general I would recommend against this approach. What you have is good because it looks good. The main problem with dynamic SQL is that people don't treat it like real code; it's not commented or formatted and ends up looking like a horrible mess that nobody can understand. If you are able to spend the extra time to generate clean code then you should stick with that.

Resources