pl sql procedure for update - oracle

This procedure updates rows, formatting phone numbers with dashes and adding a default area code if the phone number does not have one. I do not want to use a cursor.
CREATE OR REPLACE procedure pro
AS
begin
  update judge
  set phone# = substr(phone#, 1, 3) || '-' || substr(phone#, 4, 3) || '-' || substr(phone#, 7, 4)
  where length(trim(phone#)) = 10;

  update judge
  set phone# = '309-298' || substr(phone#, 1, 5)
  where length(trim(phone#)) = 5;
END;
/
I want to add dashes only if the phone number length is 10, and add the area code if the length is 5.
This code works, but is there a more efficient way of doing it?

is there any more efficient way of doing it.
Yes, there may be a faster method, but it depends on how big the table is, and what percentage of records in the table will be changed.
If the entire table is small - say, fewer than 100~500 records - then creating an index will most likely not give you any benefit; a simple full table scan will be fast enough. In this case, use only ONE update command instead of two separate ones. That way the table is read and updated only once instead of twice, and the execution time will be roughly halved:
update judge set phone# =
CASE length(trim(phone#))
WHEN 10
THEN substr(Phone#,1,3) || '-' || substr(Phone#,4,3) || '-' || substr(Phone#,7,4)
WHEN 5
THEN '309-298' || substr(Phone#,1,5)
ELSE phone#
END
where length(trim(phone#)) in (5,10);
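For completeness, the same single statement wrapped back into the procedure (a direct recombination of the code above, nothing new):
CREATE OR REPLACE procedure pro
AS
begin
  update judge set phone# =
    CASE length(trim(phone#))
      WHEN 10 THEN substr(Phone#,1,3) || '-' || substr(Phone#,4,3) || '-' || substr(Phone#,7,4)
      WHEN 5  THEN '309-298' || substr(Phone#,1,5)
      ELSE phone#
    END
  where length(trim(phone#)) in (5,10);
END;
/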
If the entire table is big (thousands or millions of records) but the number of records with lengths of 5 and 10 is relatively small (say, less than 10~15% of all records), then create a function-based index:
CREATE INDEX some_name_ix ON judge( length(trim(phone#)) );
and then, after creating the index, refresh the statistics:
exec DBMS_STATS.gather_table_stats( user, 'judge' );
After the above steps, check whether Oracle is willing to use this index by generating explain plans for the three update commands below:
EXPLAIN PLAN FOR
UPDATE judge SET phone#= '123'
WHERE length( trim( phone# ) ) in ( 5, 10 );
SELECT * FROM table( dbms_xplan.display );

EXPLAIN PLAN FOR
UPDATE judge SET phone#= '123'
WHERE length( trim( phone# ) ) = 5;
SELECT * FROM table( dbms_xplan.display );

EXPLAIN PLAN FOR
UPDATE judge SET phone#= '123'
WHERE length( trim( phone# ) ) = 10;
SELECT * FROM table( dbms_xplan.display );
NOTE: the SET phone# = '123' clause doesn't matter here; Oracle will not actually update the table. What's important to us - and what we are checking with the EXPLAIN PLAN command - is how Oracle will execute the query for the different WHERE clauses.
For each of the above commands you may see something like the output below. The keyword TABLE ACCESS FULL means that Oracle is going to use the full table scan method for this update and will ignore our index:
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | 53421 | 6468K| 36512 (1)| 00:00:02 |
|* 1 | TABLE ACCESS FULL| JUDGE | 53421 | 6468K| 36512 (1)| 00:00:02 |
--------------------------------------------------------------------------------
You may also see something like the output below: TABLE ACCESS BY INDEX ROWID ... + INDEX RANGE SCAN ... index name. This means that Oracle is willing to use the index for this update:
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | 1 | 124 | 5 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| JUDGE | 1 | 124 | 5 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | MY_INDEX_IX | 1 | | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------
There is a Cost (%CPU) column in these plans, which tells you the relative cost of each update (low cost = fast, high cost = slow).
Finally, depending on the results you get from these explain plans, you may decide to:
use only one single update in the procedure,
or keep two separate updates.
If Oracle does not use the index in any of these three cases, then the index is useless and you can drop it using:
DROP INDEX some_name_ix;

For example you can try to combine updates:
update judge
set phone# = decode( length( trim( phone# ) ),
5, '309-298' || substr( Phone#, 1, 5 ),
10, substr( Phone#, 1, 3 ) || '-' || substr( Phone#, 4, 3 ) || '-' || substr( Phone#, 7, 4 ),
phone# )
where length( trim( phone# ) ) in ( 5, 10 );
Using a regexp may be more flexible than searching by length (see the sketch below). And of course the best option is to insert data that is already prepared and formatted in the first place.
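For illustration, a minimal sketch of that regexp idea, assuming phone numbers are stored as plain digits (the patterns are hypothetical; the hard-coded '309-298' prefix is the one from the question):
-- Format values that are exactly 10 digits; prefix values that are
-- exactly 5 digits with the default area code. Untested sketch:
-- adjust the patterns to match your actual data.
update judge
set phone# = CASE
               WHEN regexp_like(trim(phone#), '^\d{10}$')
               THEN regexp_replace(trim(phone#), '^(\d{3})(\d{3})(\d{4})$', '\1-\2-\3')
               WHEN regexp_like(trim(phone#), '^\d{5}$')
               THEN '309-298' || trim(phone#)
             END
where regexp_like(trim(phone#), '^(\d{10}|\d{5})$');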

Related

Confusion regarding to_char and to_number

First of all, I am aware of the basics.
select to_number('A231') from dual; --this will not work but
select to_char('123') from dual;-- this will work
select to_number('123') from dual;-- this will also work
In my package we have 2 tables, A (X NUMBER) and B (Y VARCHAR). There are many columns, but we are concerned only with X and Y. X contains only numeric values like 123, 456, etc., but Y contains some strings and some numbers, e.g. '123', 'HR123', 'Hello'. We have to join these 2 tables. It's a legacy application, so we are not able to change the tables and columns.
Until now the condition below was working properly:
to_char(A.x)=B.y;
But since there is an index on Y, the performance team suggested we use
A.x=to_number(B.y); -- it is running in the dev environment
My question is: can this query give an error under any circumstances? If it picks '123' it will certainly give 123, but if it picks 'AB123' then it will fail. Can it fail? Can it pick 'AB123' even when it is being joined with the other table?
can it fail?
Yes. It must put every row through TO_NUMBER before it can check whether or not it meets the filter condition. Therefore, if you have any one row where it will fail then it will always fail.
From Oracle 12.2 (since you tagged Oracle 12) you can use:
SELECT *
FROM A
INNER JOIN B
ON (A.x = TO_NUMBER(B.y DEFAULT NULL ON CONVERSION ERROR))
Alternatively, put an index on TO_CHAR(A.x) and use your original query:
SELECT *
FROM A
INNER JOIN B
ON (TO_CHAR(A.x) = B.y)
Also note: Having an index on B.y does not mean that the index will be used. If you are filtering on TO_NUMBER(B.y) (with or without the default on conversion error) then you would need a function-based index on the function TO_NUMBER(B.Y) that you are using. You should profile the queries and check the explain plans to see whether there is any improvement or change in use of indexes.
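For example, the two indexing options would look like the sketch below (index names are invented; the first form requires 12.2+ because of the conversion-error clause, and in both cases the expression in the query must match the indexed expression exactly or the index will be ignored):
-- Option 1: a function-based index matching the safe conversion.
CREATE INDEX b_y_num_ix ON B (TO_NUMBER(y DEFAULT NULL ON CONVERSION ERROR));

-- Option 2: index the TO_CHAR side and keep the original join condition.
CREATE INDEX a_x_chr_ix ON A (TO_CHAR(x));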
Never convert a VARCHAR2 column that can contain non-numeric strings with to_number.
This can partially work, but it will eventually fail.
Small Example
create table a as
select rownum X from dual connect by level <= 10;
create table b as
select to_char(rownum) Y from dual connect by level <= 10
union all
select 'Hello' from dual;
This could work (as you limit the rows so that the conversion succeeds, if you are lucky and Oracle chooses the right execution plan, which is probable but not guaranteed):
select *
from a
join b on A.x=to_number(B.y)
where B.y = '1';
But this will fail
select *
from a
join b on A.x=to_number(B.y)
ORA-01722: invalid number
Performance
But since there is index on Y, performance team suggested us to do A.x=to_number(B.y);
You should challenge the team: if you apply a function to a column (to_number(B.y)), a regular index on that column can't be used.
On the contrary, your original query can perfectly use the following indexes:
create index b_y on b(y);
create index a_x on a(x);
Query
select *
from a
join b on to_char(A.x)=B.y
where A.x = 1;
Execution Plan
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 5 | 1 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 1 | 5 | 1 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN| A_X | 1 | 3 | 1 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN| B_Y | 1 | 2 | 0 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("A"."X"=1)
3 - access("B"."Y"=TO_CHAR("A"."X"))

Understanding characteristics of a query for which an index makes a dramatic difference

I am trying to come up with an example showing that indexes can have a dramatic (orders of magnitude) effect on query execution time. After hours of trial and error I am still at square one; namely, the speed-up is not large even when the execution plan shows the index being used.
Since I realized that I had better have a large table for the index to make a difference, I wrote the following script (using Oracle 11g Express):
CREATE TABLE many_students (
student_id NUMBER(11),
city VARCHAR(20)
);
DECLARE
nStudents NUMBER := 1000000;
nCities NUMBER := 10000;
curCity VARCHAR(20);
BEGIN
FOR i IN 1 .. nStudents LOOP
curCity := ROUND(DBMS_RANDOM.VALUE()*nCities, 0) || ' City';
INSERT INTO many_students
VALUES (i, curCity);
END LOOP;
COMMIT;
END;
/
I then tried quite a few queries, such as:
select count(*)
from many_students M
where M.city = '5467 City';
and
select count(*)
from many_students M1
join many_students M2 using(city);
and a few other ones.
I have seen this post and think that my queries satisfy the requirements stated in the replies there. However, none of the queries I tried showed dramatic improvement after building an index: create index myindex on many_students(city);
Am I missing some characteristic that distinguishes a query for which an index makes a dramatic difference? What is it?
The test case is a good start but it needs a few more things to get a noticeable performance difference:
Realistic data sizes. One million rows of two small values is a small table. With a table that small the performance difference between a good and a bad execution plan may not matter much.
The below script will double the table size until it gets to 64 million rows. It takes about 20 minutes on my machine. (To make it go quicker, for larger sizes, you could make the table nologging and add an /*+ append */ hint to the insert.)
--Increase the table to 64 million rows. This took 20 minutes on my machine.
insert into many_students select * from many_students;
insert into many_students select * from many_students;
insert into many_students select * from many_students;
insert into many_students select * from many_students;
insert into many_students select * from many_students;
insert into many_students select * from many_students;
commit;
--The table has about 1.375GB of data. The actual size will vary.
select bytes/1024/1024/1024 gb from dba_segments where segment_name = 'MANY_STUDENTS';
Gather statistics. Always gather statistics after large table changes. The optimizer cannot do its job well unless it has table, column, and index statistics.
begin
dbms_stats.gather_table_stats(user, 'MANY_STUDENTS');
end;
/
Use hints to force a good and bad plan. Optimizer hints should usually be avoided. But to quickly compare different plans they can be helpful to fix a bad plan.
For example, this will force a full table scan:
select /*+ full(M) */ count(*) from many_students M where M.city = '5467 City';
But you'll also want to verify the execution plan:
explain plan for select /*+ full(M) */ count(*) from many_students M where M.city = '5467 City';
select * from table(dbms_xplan.display);
Flush the cache. Caching is probably the main culprit behind the index and full table scan queries taking the same amount of time. If the table fits entirely in memory then the time to read all the rows may be almost too small to measure. The number could be dwarfed by the time to parse the query or to send a simple result across the network.
This command will force Oracle to remove almost everything from the buffer cache. This will help you test a "cold" system. (You probably do not want to run this statement on a production system.)
alter system flush buffer_cache;
However, that won't flush the operating system or SAN cache. And maybe the table really would fit in memory on production. If you need to test a fast query it may be necessary to put it in a PL/SQL loop.
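For example, a minimal sketch of such a loop (the iteration count is arbitrary; DBMS_UTILITY.GET_TIME returns elapsed hundredths of a second):
--Requires: set serveroutput on
declare
  t0 number;
  n  number;
begin
  t0 := dbms_utility.get_time;
  for i in 1 .. 1000 loop
    select count(*) into n from many_students where city = '5467 City';
  end loop;
  dbms_output.put_line('Elapsed centiseconds: ' || (dbms_utility.get_time - t0));
end;
/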
Multiple, alternating runs. There are many things happening in the background, like caching and other processes. It's easy to get bad results because something unrelated changed on the system.
Maybe the first run takes extra long to put things in a cache. Or maybe some huge job was started between queries. To avoid those issues, alternate running the two queries. Run them five times, throw out the highs and lows, and compare the averages.
For example, copy and paste the statements below five times and run them. (If using SQL*Plus, run set timing on first.) I already did that and posted the times I got in a comment before each line.
--Seconds: 0.02, 0.02, 0.03, 0.234, 0.02
alter system flush buffer_cache;
select count(*) from many_students M where M.city = '5467 City';
--Seconds: 4.07, 4.21, 4.35, 3.629, 3.54
alter system flush buffer_cache;
select /*+ full(M) */ count(*) from many_students M where M.city = '5467 City';
Testing is hard. Putting together decent performance tests is difficult. The above rules are only a start.
This might seem like overkill at first. But it's a complex topic. And I've seen so many people, including myself, waste a lot of time "tuning" something based on a bad test. Better to spend the extra time now and get the right answer.
An index really shines when the database doesn't need to go to every row in a table to get your results. So COUNT(*) isn't the best example. Take this for example:
alter session set statistics_level = 'ALL';
create table mytable as select * from all_objects;
select * from mytable where owner = 'SYS' and object_name = 'DUAL';
select * from table( dbms_xplan.display_cursor( null, null, 'ALLSTATS LAST' ));
---------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 300 |00:00:00.01 | 12 |
| 1 | TABLE ACCESS FULL| MYTABLE | 1 | 19721 | 300 |00:00:00.01 | 12 |
---------------------------------------------------------------------------------------
So, here, the database does a full table scan (TABLE ACCESS FULL), which means it has to visit every row in the table, which means it has to load every block from disk. Lots of I/O. The optimizer guessed that it was going to find 19,721 rows, but the actual row count (A-Rows) was only 300.
Compare that with this:
create index myindex on mytable( owner, object_name );
select * from mytable where owner = 'SYS' and object_name = 'JOB$';
select * from table( dbms_xplan.display_cursor( null, null, 'ALLSTATS LAST' ));
----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 3 | 2 |
| 1 | TABLE ACCESS BY INDEX ROWID| MYTABLE | 1 | 2 | 1 |00:00:00.01 | 3 | 2 |
|* 2 | INDEX RANGE SCAN | MYINDEX | 1 | 1 | 1 |00:00:00.01 | 2 | 2 |
----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OWNER"='SYS' AND "OBJECT_NAME"='JOB$')
Here, because there's an index, it does an INDEX RANGE SCAN to find the rowids for the table that match our criteria. Then, it goes to the table itself (TABLE ACCESS BY INDEX ROWID) and looks up only the rows we need and can do so efficiently because it has a rowid.
And even better, if you happen to be looking for something that is entirely in the index, the scan doesn't even have to go back to the base table. The index is enough:
select count(*) from mytable where owner = 'SYS';
select * from table( dbms_xplan.display_cursor( null, null, 'ALLSTATS LAST' ));
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 46 | 46 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 46 | 46 |
|* 2 | INDEX RANGE SCAN| MYINDEX | 1 | 8666 | 9294 |00:00:00.01 | 46 | 46 |
------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OWNER"='SYS')
Because my query involved the owner column and that's contained in the index, it never needs to go back to the base table to look anything up there. So the index scan is enough, and then it does an aggregation to count the rows. This scenario is a little less than perfect, because the index is on (owner, object_name) and not just owner, but it's definitely better than doing a full table scan on the main table.
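If this count-by-owner query were the one that mattered, a narrower index would make the range scan even smaller (hypothetical index name, same technique):
-- One column instead of two means fewer bytes per index entry,
-- so fewer blocks to read when counting by owner alone.
create index myindex_owner on mytable( owner );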

Using function based index (oracle) to speed up count(X)

I've a table Film:
CREATE TABLE film (
film_id NUMBER(5) NOT NULL,
title varchar2(255));
I wanted to use a function-based index to make the following query faster; it counts how many titles start with the same first word and displays only the words that appear 20 or more times. The query:
SELECT FW_SEPARATOR.FIRST_WORD AS "First Word", COUNT(FW_SEPARATOR.FIRST_WORD) AS "Count"
FROM (SELECT regexp_replace(FILM.TITLE, '(\w+).*$','\1') AS FIRST_WORD FROM FILM) FW_SEPARATOR
GROUP BY FW_SEPARATOR.FIRST_WORD
HAVING COUNT(FW_SEPARATOR.FIRST_WORD) >= 20;
The thing is, I created this function based index:
CREATE INDEX FIRST_WORD_INDEX ON FILM(regexp_replace(TITLE, '(\w+).*$','\1'));
But it didn't speed anything up...
I was wondering if anyone could help me with this :)
Add a redundant predicate to the query to convince Oracle that the expression will not return null values and an index can be used:
select regexp_replace(film.title, '(\w+).*$','\1') first_word
from film
where regexp_replace(film.title, '(\w+).*$','\1') is not null;
Oracle can use an index like a skinny version of a table. Many queries only contain a small subset of the columns in a table. If all the columns in that set are part of the same index, Oracle can use that index instead of the table. This will be either an INDEX FAST FULL SCAN or an INDEX FULL SCAN. The data may be read similar to the way a regular table scan works. But since the index is much smaller than the table, that access method can be much faster.
But function-based indexes do not store NULLs. Oracle cannot use an index scan if it thinks there is a NULL that is not stored in the index. In this case, if the base column was defined as NOT NULL, the regular expression would always return a non-null value. But unsurprisingly, Oracle has not built code to determine whether or not a regular expression could return NULL. That sounds like an impossible task, similar to the halting problem.
There are several ways to convince Oracle that the expression is not null. The simplest may be to repeat the predicate and add an IS NOT NULL condition.
Sample Schema
create table film (
film_id number(5) not null,
title varchar2(255) not null);
insert into film select rownumber, column_value
from
(
select rownum rownumber, column_value from table(sys.odcivarchar2list(
q'<The Shawshank Redemption>',
q'<The Godfather>',
q'<The Godfather: Part II>',
q'<The Dark Knight>',
q'<Pulp Fiction>',
q'<The Good, the Bad and the Ugly>',
q'<Schindler's List>',
q'<12 Angry Men>',
q'<The Lord of the Rings: The Return of the King>',
q'<Fight Club>'))
);
create index film_idx1 on film(regexp_replace(title, '(\w+).*$','\1'));
begin
dbms_stats.gather_table_stats(user, 'FILM');
end;
/
Query that does not use index
Even with an index hint, the normal query will not use an index. Remember that hints are directives, and this query would use the index if it were possible.
explain plan for
select /*+ index_ffs(film) */ regexp_replace(title, '(\w+).*$','\1') first_word
from film;
select * from table(dbms_xplan.display);
Plan hash value: 1232367652
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 50 | 3 (0)| 00:00:01 |
| 1 | TABLE ACCESS FULL| FILM | 10 | 50 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
Query that uses index
Now add the extra condition and the query will use the index. I'm not sure why it uses an INDEX FULL SCAN instead of an INDEX FAST FULL SCAN. With such small sample data it doesn't matter. The important point is that an index is used.
explain plan for
select regexp_replace(film.title, '(\w+).*$','\1') first_word
from film
where regexp_replace(film.title, '(\w+).*$','\1') is not null;
select * from table(dbms_xplan.display);
Plan hash value: 1151375616
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 50 | 1 (0)| 00:00:01 |
|* 1 | INDEX FULL SCAN | FILM_IDX1 | 10 | 50 | 1 (0)| 00:00:01 |
------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( REGEXP_REPLACE ("TITLE",'(\w+).*$','\1') IS NOT NULL)

Generalising Oracle Static Cursors

I am creating an OLAP-like package in Oracle where you call a main, controlling function that assembles its returning output table by making numerous left joins. These joined tables are defined in 'slave' functions within the package, which return specific subsets using static cursors, parameterised by the function's arguments. The thing is, these cursors are all very similar.
Is there a way, beyond generating dynamic queries and using them in a ref cursor, that I can generalise these? Every time I add a function I get this weird feeling, as a developer, that this isn't particularly elegant!
Pseudocode
somePackage
function go(param)
return select myRows.id,
stats1.value,
stats2.value
from myRows
left join table(somePackage.stats1(param)) stats1
on stats1.id = myRows.id
left join table(somePackage.stats2(param)) stats2
on stats2.id = myRows.id
function stats1(param)
return [RESULTS OF SOME QUERY]
function stats2(param)
return [RESULTS OF A RELATED QUERY]
The stats queries all have the same structure:
First they aggregate the data in a useful way
Then they split this data into logical sections, based on criteria, and aggregate again (e.g., by department, by region, etc.) then union the results
Then they return the results, cast into the relevant object type, so I can easily do a bulk collect
Something like:
cursor myCursor is
with fullData as (
[AGGREGATE DATA]
),
fullStats as (
[AGGREGATE FULLDATA BY TOWN]
union all
[AGGREGATE FULLDATA BY REGION]
union all
[AGGREGATE FULLDATA BY COUNTRY]
)
select myObjectType(fullStats.*)
from fullStats;
...
open myCursor;
fetch myCursor bulk collect into output limit 1000;
close myCursor;
return output;
Filter operations can help build dynamic queries with static SQL, especially when the column list is static.
You may have already considered this approach but discarded it for performance reasons: "Why execute every SQL block if we only need the results from one of them?" You're in luck: the optimizer already does this for you with a FILTER operation.
Example Query
First create a function that waits 5 seconds every time it is run. It will help find which query blocks were executed.
create or replace function slow_function return number is begin
dbms_lock.sleep(5);
return 1;
end;
/
This static query is controlled by bind variables. There are three query blocks but the entire query runs in 5 seconds instead of 15.
declare
v_sum number;
v_query1 number := 1;
v_query2 number := 0;
v_query3 number := 0;
begin
select sum(total)
into v_sum
from
(
select total from (select slow_function() total from dual) where v_query1 = 1
union all
select total from (select slow_function() total from dual) where v_query2 = 1
union all
select total from (select slow_function() total from dual) where v_query3 = 1
);
end;
/
Execution Plan
This performance is not the result of good luck; it's not simply Oracle randomly executing one predicate before another. Oracle analyzes the bind variables before run time and does not even execute the irrelevant query blocks. That's what the FILTER operation below is doing. (FILTER is a poor name; many people refer to all predicates as "filters", but only some of them result in a FILTER operation.)
select * from table(dbms_xplan.display_cursor(sql_id => '0cfqc6a70kzmt'));
SQL_ID 0cfqc6a70kzmt, child number 0
-------------------------------------
SELECT SUM(TOTAL) FROM ( SELECT TOTAL FROM (SELECT SLOW_FUNCTION()
TOTAL FROM DUAL) WHERE :B1 = 1 UNION ALL SELECT TOTAL FROM (SELECT
SLOW_FUNCTION() TOTAL FROM DUAL) WHERE :B2 = 1 UNION ALL SELECT TOTAL
FROM (SELECT SLOW_FUNCTION() TOTAL FROM DUAL) WHERE :B3 = 1 )
Plan hash value: 926033116
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 6 (100)| |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
| 2 | VIEW | | 3 | 39 | 6 (0)| 00:00:01 |
| 3 | UNION-ALL | | | | | |
|* 4 | FILTER | | | | | |
| 5 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
|* 6 | FILTER | | | | | |
| 7 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
|* 8 | FILTER | | | | | |
| 9 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter(:B1=1)
6 - filter(:B2=1)
8 - filter(:B3=1)
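Applied back to the question's pattern, here is a minimal sketch (table, column, and bind names are all invented) of one static statement where a bind variable picks the grouping level:
-- Each UNION ALL branch is wrapped in an inline view with a
-- bind-variable predicate, exactly as in the demo above, so the
-- FILTER operation skips the branches that cannot match :lvl.
select label, total
from (
  select * from (select town    as label, sum(amount) as total from sales group by town)    where :lvl = 'TOWN'
  union all
  select * from (select region  as label, sum(amount) as total from sales group by region)  where :lvl = 'REGION'
  union all
  select * from (select country as label, sum(amount) as total from sales group by country) where :lvl = 'COUNTRY'
);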
Issues
The FILTER operation is poorly documented. I can't explain in detail when it does or does not work, or exactly what effects it has on other parts of the query. For example, in the explain plan the Rows estimate is 3, but at run time Oracle should easily be able to estimate that the cardinality is 1. Apparently the execution plan is not that dynamic; that poor cardinality estimate may cause issues later. Also, I've seen some weird cases where static expressions are not appropriately filtered. But if a query uses a simple equality predicate it should be fine.
This approach lets you remove all dynamic SQL and replace it with one large static SQL statement, which has some advantages; dynamic SQL is often "ugly" and difficult to debug. But people familiar only with procedural programming tend to think of a single large SQL statement as one huge God-function, a bad practice. They won't appreciate that the UNION ALLs create independent blocks of SQL.
Dynamic SQL is Still Probably Better
In general I would recommend against this approach. What you have is good because it looks good. The main problem with dynamic SQL is that people don't treat it like real code; it's not commented or formatted and ends up looking like a horrible mess that nobody can understand. If you are able to spend the extra time to generate clean code then you should stick with that.

Use Oracle unnested VARRAY's instead of IN operator

Let's say users have 1 to n accounts in a system. When they query the database, they may choose to select from m accounts, with m between 1 and n. Typically the SQL generated to fetch their data is something like
SELECT ... FROM ... WHERE account_id IN (?, ?, ..., ?)
So depending on the number of accounts a user has, this will cause a new hard-parse in Oracle, and a new execution plan, etc. Now there are a lot of queries like that and hence, a lot of hard-parses, and maybe the cursor/plan cache will be full quite early, resulting in even more hard-parses.
Instead, I could also write something like this
-- use any of these
CREATE TYPE numbers AS VARRAY(1000) of NUMBER(38);
CREATE TYPE numbers AS TABLE OF NUMBER(38);
SELECT ... FROM ... WHERE account_id IN (
SELECT column_value FROM TABLE(?)
)
-- or
SELECT ... FROM ... JOIN (
SELECT column_value FROM TABLE(?)
) ON column_value = account_id
And use JDBC to bind a java.sql.Array (i.e. an oracle.sql.ARRAY) to the single bind variable. Clearly, this will result in fewer hard parses and fewer cursors in the cache for functionally equivalent queries. But is there any general performance drawback, or any other issues that I might run into?
E.g: Does bind variable peeking work in a similar fashion for varrays or nested tables? Because the amount of data associated with every account may differ greatly.
I'm using Oracle 11g in this case, but I think the question is interesting for any Oracle version.
I suggest you try a plain old join, like in:
SELECT Col1, Col2
FROM ACCOUNTS ACCT,
     TABLE TAB
WHERE ACCT.User = :ParamUser
AND TAB.account_id = ACCT.account_id;
An alternative could be a table subquery
SELECT Col1, Col2
FROM (
SELECT account_id
FROM ACCOUNTS
WHERE User = :ParamUser
) ACCT,
TABLE TAB
WHERE TAB.account_id = ACCT.account_id;
or a where subquery
SELECT Col1, Col2
FROM TABLE TAB
WHERE TAB.account_id IN
(
SELECT account_id
FROM ACCOUNTS
WHERE User = :ParamUser
);
The first one should be better for performance, but you had better check them all with explain plan.
Looking at V$SQL_BIND_CAPTURE in a 10g database, I have a few rows where the datatype is VARRAY or NESTED_TABLE; the actual bind values were not captured. In an 11g database, there is just one such row, but it also shows that the bind value is not captured. So I suspect that bind value peeking essentially does not happen for user-defined types.
In my experience, the main problem you run into using nested tables or varrays in this way is that the optimizer does not have a good estimate of the cardinality, which could lead it to generate bad plans. But, there is an (undocumented?) CARDINALITY hint that might be helpful. The problem with that is, if you calculate the actual cardinality of the nested table and include that in the query, you're back to having multiple distinct query texts. Perhaps if you expect that most or all users will have at most 10 accounts, using the hint to indicate that as the cardinality would be helpful. Of course, I'd try it without the hint first, you may not have an issue here at all.
(I also think that perhaps Miguel's answer is the right way to go.)
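A sketch of that hint (it is undocumented, so treat it with care; the alias, the 10-row guess, and the accounts table are illustrative, while the numbers collection type is the one declared in the question):
SELECT /*+ CARDINALITY(t 10) */ a.*
FROM accounts a
JOIN TABLE(CAST(:ids AS numbers)) t
  ON t.column_value = a.account_id;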
For a medium-sized list (several thousand items) I would use this approach:
First, generate a prepared statement with an XMLTABLE joined to your main table. For instance:
String myQuery = "SELECT ..."
    + " FROM ACCOUNTS A,"
    + " XMLTABLE('tab/row' PASSING XMLTYPE(?) COLUMNS id NUMBER PATH 'id') t"
    + " WHERE A.account_id = t.id";
then loop through your data and build a string with this content:
String idList = "<tab><row><id>101</id></row><row><id>907</id></row> ...</tab>";
Finally, prepare and submit your statement, then fetch the results:
PreparedStatement stmt = connection.prepareStatement(myQuery);
stmt.setString(1, idList);
ResultSet rs = stmt.executeQuery();
while (rs.next()) {...}
With this approach it is also possible to pass a multi-valued list, as in the select statement
SELECT * FROM TABLE t WHERE (t.COL1, t.COL2) in (SELECT X.COL1, X.COL2 FROM X);
In my experience performance is pretty good, and the approach is flexible enough to be used in very complex query scenarios.
The only limit is the size of the string passed to the DB, but I suppose it is possible to use a CLOB in place of String for an arbitrarily long XML wrapper around the input list.
This problem of binding a variable number of items into an IN list seems to come up a lot in various forms. One option is to concatenate the IDs into a comma-separated string and bind that, then use a bit of a trick to split it into a table you can join against, e.g.:
with bound_inlist
as
(
select
substr(txt,
instr (txt, ',', 1, level ) + 1,
instr (txt, ',', 1, level+1) - instr (txt, ',', 1, level) -1 )
as token
from (select ','||:txt||',' txt from dual)
connect by level <= length(:txt)-length(replace(:txt,',',''))+1
)
select *
from bound_inlist a, actual_table b
where a.token = b.token
Bind variable peeking is going to be a problem though.
Does the query plan actually change for a larger number of accounts, i.e. would it be more efficient to move from an index to a full table scan in some cases, or is it borderline? As someone else suggested, you could use the CARDINALITY hint to indicate how many IDs are being bound; the following test case proves this actually works:
create table actual_table (id integer, padding varchar2(100));
create unique index actual_table_idx on actual_table(id);
insert into actual_table
select level, 'this is just some padding for '||level
from dual connect by level <= 1000;
explain plan for
with bound_inlist
as
(
select /*+ CARDINALITY(10) */
substr(txt,
instr (txt, ',', 1, level ) + 1,
instr (txt, ',', 1, level+1) - instr (txt, ',', 1, level) -1 )
as token
from (select ','||:txt||',' txt from dual)
connect by level <= length(:txt)-length(replace(:txt,',',''))+1
)
select *
from bound_inlist a, actual_table b
where a.token = b.id;
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 840 | 2 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 10 | 840 | 2 (0)| 00:00:01 |
| 3 | VIEW | | 10 | 190 | 2 (0)| 00:00:01 |
|* 4 | CONNECT BY WITHOUT FILTERING| | | | | |
| 5 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
|* 6 | INDEX UNIQUE SCAN | ACTUAL_TABLE_IDX | 1 | | 0 (0)| 00:00:01 |
| 7 | TABLE ACCESS BY INDEX ROWID | ACTUAL_TABLE | 1 | 65 | 0 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------
Another option is to always use n bind variables in every query, binding NULL for positions m+1 to n (see the sketch below).
Oracle ignores repeated items in the expression_list. Your queries will perform the same way and there will be fewer hard parses. But there will be extra overhead to bind all the variables and transfer the data. Unfortunately I have no idea what the overall effect on performance would be; you'd have to test it.
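A sketch of that fixed-placeholder shape, assuming a cap of n = 10 (the table name is illustrative; unused trailing positions are bound to NULL, which can never match a row):
-- One shared statement text no matter how many accounts were picked,
-- so it hard-parses once and soft-parses afterwards.
SELECT *
FROM some_table
WHERE account_id IN (:b1, :b2, :b3, :b4, :b5, :b6, :b7, :b8, :b9, :b10);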
