I asked a question yesterday that got answers, but none of them addressed the main point: I wanted to reduce the amount of time it takes to do a MINUS operation.
Now I'm thinking about doing the MINUS operation in blocks of 5000, appending each iteration's results to the cursor and finally returning the cursor.
I have the following:
select count(1) into v_cnt from TABLE_1
while (v_cnt > 0)
open cv_1 for
However, as you can see, the cursor is overwritten in each iteration. How can I change the code so that each iteration appends to the cv_1 cursor rather than overwriting it?
You haven't stated the requirement clearly.
So I am assuming you want to do a MINUS on two tables, A and B,
i.e. you want to find the tuples in A that are not in B.
Assuming this, the logic you have written is not correct, because you are doing a MINUS on corresponding (5000-row) batches of A and B.
E.g. your logic will return a tuple in the 4000th row of table A even though it is present in, say, the 6000th row of table B.
I suggest you use a left outer join to accomplish this (same as Peter Lang's post).
That should satisfy your performance requirements too, I think.
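To make the batching problem concrete, here is a minimal sketch in plain Python (not PL/SQL). The two lists are hypothetical stand-ins for the key columns of A and B, and `batched_minus` is an illustrative name for the position-paired approach described above, not code from the question.

```python
# Hypothetical keys of tables A and B; B stores its rows in a different order.
table_a = list(range(1, 21))        # keys 1..20
table_b = list(range(20, 10, -1))   # keys 20..11, reversed


def batched_minus(a_rows, b_rows, batch_size):
    """Subtract B from A batch by batch, pairing batches by position (the flawed approach)."""
    result = []
    for start in range(0, len(a_rows), batch_size):
        a_batch = a_rows[start:start + batch_size]
        b_batch = set(b_rows[start:start + batch_size])
        result.extend(row for row in a_batch if row not in b_batch)
    return result


# Subtracting the whole of B gives the intended answer.
correct = sorted(set(table_a) - set(table_b))
# Pairing batches by position misses matches that live in a different batch.
batched = sorted(batched_minus(table_a, table_b, batch_size=5))
```

With this data the batched version keeps all twenty keys, because every key of B sits in a different batch than its counterpart in A, while the whole-table difference keeps only keys 1 to 10.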
That's not how cursors work; you would have to store the values in some sort of collection.
Your current query gets you 5000 random rows from Table_1 and removes rows that also exist in 5000 random rows selected from Table_2.
Have you tried doing it without the MINUS?
As I understand the query, it should produce the same as this one:
Select a.head, a.effective_date
From table_1 a
Left Join table_2 b On (b.head = a.head And b.effective_date = a.effective_date )
Where a.type_of_action='6' And a.effective_date >= ADD_MONTHS(SYSDATE,-15)
And b.head Is Null;
Having a compound index on TABLE_1 (type_of_action, head, effective_date) and on TABLE_2 (head, effective_date) should help you with performance.
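As a rough check that the left join with an IS NULL filter behaves like a set difference, here is a small sketch using Python's built-in sqlite3 module as a stand-in for Oracle. The sample rows are made up, the date filter is left out for brevity, and SQLite's EXCEPT plays the role of Oracle's MINUS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (head INTEGER, effective_date TEXT, type_of_action TEXT);
    CREATE TABLE table_2 (head INTEGER, effective_date TEXT);
    INSERT INTO table_1 VALUES (1, '2024-01-01', '6'), (2, '2024-01-02', '6'),
                               (3, '2024-01-03', '6'), (4, '2024-01-04', '5');
    INSERT INTO table_2 VALUES (2, '2024-01-02'), (9, '2024-01-09');
""")

# Anti-join: keep rows of table_1 that found no partner in table_2.
anti_join = conn.execute("""
    SELECT a.head, a.effective_date
    FROM table_1 a
    LEFT JOIN table_2 b
           ON b.head = a.head AND b.effective_date = a.effective_date
    WHERE a.type_of_action = '6'
      AND b.head IS NULL
    ORDER BY a.head
""").fetchall()

# Set difference: EXCEPT is SQLite's counterpart to Oracle's MINUS.
set_difference = conn.execute("""
    SELECT head, effective_date FROM table_1 WHERE type_of_action = '6'
    EXCEPT
    SELECT head, effective_date FROM table_2
    ORDER BY head
""").fetchall()
```

One difference to keep in mind: MINUS (like EXCEPT) also removes duplicate rows from its result, while the join returns every unmatched row of table_1, so the two only agree exactly when the selected columns are unique in table_1.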
I have a test script that I'm beginning to play with, and I'm getting stuck on something that seems simple.
I want to iterate through the rows, fetch the data from the last row of the result set, and use only that row.
procedure e_test_send
cursor get_rec is
from test_email_tab;
for rec_ in get_rec loop
ifsapp.send_email_api.send_html_email(rec_.email_to,rec_.email_from, rec_.email_subject, rec_.email_message);
end loop;
end e_test_send;
All I'm trying to do is send one email, with the message and recipient taken from the last row only. This is a sample table that will grow in records. At the moment I have 2 rows of data in it; if I execute this procedure it sends 2 emails, which is not the desired behaviour.
I hope this makes sense.
Do you know which row is the last row? The one with the MAX(ID) value? If so, then you could base the cursor on a straightforward query:
FROM test_email_tab
WHERE id = (SELECT MAX (id) FROM test_email_tab)
As it scans the same table twice, its performance will drop as the number of rows grows. In that case, consider
FROM test_email_tab)
SELECT t.id,
FROM temp t
WHERE t.rn = 1
which does it only once; sorts rows by ID in descending order and returns the one that ranks as the "highest" (i.e. the last).
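Both variants can be tried outside Oracle. The sketch below uses Python's sqlite3 module as a stand-in (window functions need SQLite 3.25 or later); the sample rows and the reduced column list are assumptions, and only the table name and the id, email_to and email_message columns come from the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test_email_tab (id INTEGER PRIMARY KEY, email_to TEXT, email_message TEXT);
    INSERT INTO test_email_tab VALUES (1, 'first@example.com', 'old message'),
                                      (2, 'second@example.com', 'newer message'),
                                      (3, 'third@example.com', 'latest message');
""")

# Variant 1: subquery on MAX(id); reads the table twice.
by_max = conn.execute("""
    SELECT id, email_to, email_message
    FROM test_email_tab
    WHERE id = (SELECT MAX(id) FROM test_email_tab)
""").fetchall()

# Variant 2: rank rows by id descending and keep the top-ranked one.
by_rank = conn.execute("""
    WITH temp AS (
        SELECT id, email_to, email_message,
               ROW_NUMBER() OVER (ORDER BY id DESC) AS rn
        FROM test_email_tab
    )
    SELECT t.id, t.email_to, t.email_message
    FROM temp t
    WHERE t.rn = 1
""").fetchall()
```

Either query returns a single row, so a cursor loop built on it would send exactly one email.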
I will try to present my problem as simplified as possible.
Assume that we have 3 tables in Oracle 11g.
Persons (person_id, name, surname, status, etc )
Actions (action_id, person_id, action_value, action_date, calculated_flag)
Calculations (calculation_id, person_id,computed_value,computed_date)
What I want is for each person that meets certain criteria (let's say status=3)
I should get the sum of action_values from the Actions table where calculated_flag=0. (something like this select sum(action_value) from Actions where calculated_flag=0 and person_id=current_id).
Then I shall use that sum in some kind of formula and update the Calculations table for that specific person_id.
update Calculations set computed_value=newvalue, computed_date=sysdate
where person_id=current_id
After that calculated_flag for participated rows will be set to 1.
update Actions set calculated_flag=1
where calculated_flag=0 and person_id=current_id
Now this can be easily done sequentially, by creating a cursor that will run through Persons table and then execute each action needed for the specific person.
(I don't provide the code for the sequential solution as the above is just an example that resembles my real-world setup.)
The problem is that we are talking about quite a large amount of data, and the sequential approach seems like a waste of computational time.
It seems to me that this task could be performed in parallel for a number of person_ids.
So the question is:
Can this kind of task be performed using parallelization in PL/SQL?
What would the solution look like? That is, what special packages (e.g. DBMS_PARALLEL_EXECUTE), keywords (e.g. bulk collect), methods should be used and in what manner?
Also, should I have any concerns about partial failure of parallel updates?
Note that I am not quite familiar with parallel programming with PL/SQL.
Edit 1.
Here is the pseudocode for my sequential solution:
procedure sequential_solution is
cursor persons_of_interest is
select person_id from persons
where status = 3;
tempvalue number;
newvalue number;
for person in persons_of_interest
savepoint personsp;
--step 1
select sum(action_value) into tempvalue
from actions
where calculated_flag = 0
and person_id = person.person_id;
newvalue := dosomemorecalculations(tempvalue);
--step 2
update calculations set computed_value = newvalue, computed_date = sysdate
where person_id = person.person_id;
--step 3
update actions set calculated_flag = 1
where calculated_flag = 0 and person_id = person.person_id;
--step 4 (didn't mention this step before - sorry)
insert into actions
( person_id, action_value, action_date, calculated_flag )
values ( person.person_id, 100, sysdate, 0 );
when others then
rollback to personsp;
-- this call is defined with pragma AUTONOMOUS_TRANSACTION:
end loop;
Now, how would I speed up the above, either with FORALL and BULK COLLECT or with parallel programming, under the following constraints:
proper memory management (taking into consideration the large amount of data)
for a single person, if one part of the step sequence fails, all steps should be rolled back and the failure logged.
I can propose the following. Let's say you have 1 000 000 rows in persons table, and you want to process 10 000 persons per iteration. So you can do it in this way:
id_from persons.person_id%type;
id_to persons.person_id%type;
calc_date date := sysdate;
for i in 1 .. 100 loop
-- BETWEEN is inclusive, so start one past the previous chunk's upper bound
id_from := (i - 1) * 10000 + 1;
id_to := i * 10000;
-- Updating Calculations table, errors are logged into err$_calculations table
merge into Calculations c
using (select p.person_id, sum(action_value) newvalue
from Actions a join persons p on p.person_id = a.person_id
where a.calculated_flag = 0
and p.status = 3
and p.person_id between id_from and id_to
group by p.person_id) s
on (s.person_id = c.person_id)
when matched then update
set c.computed_value = s.newvalue,
c.computed_date = calc_date
log errors into err$_calculations reject limit unlimited;
-- updating actions table only for those person_id which had no errors:
merge into actions a
using (select distinct p.person_id
from persons p join Calculations c on p.person_id = c.person_id
where c.computed_date = calc_date
and p.person_id between id_from and id_to) s
on (a.person_id = s.person_id)
when matched then update
set a.calculated_flag = 1;
-- inserting new action rows for the persons whose calculations were successful
insert into actions (person_id, action_value, action_date, calculated_flag)
select distinct p.person_id, 100, calc_date, 0
from persons p join Calculations c on p.person_id = c.person_id
where c.computed_date = calc_date
and p.person_id between id_from and id_to;
end loop;
How it works:
You split the data in the persons table into chunks of about 10000 rows (the actual size depends on gaps in the ID values; the maximum value of i * 10000 must be at least the largest person_id).
You make the calculation in the MERGE statement and update the Calculations table.
The LOG ERRORS clause prevents exceptions. If an error occurs, the row with the error is not updated but is inserted into an error-logging table instead, and execution is not interrupted. To create this table, execute:
exec dbms_errlog.create_error_log('CALCULATIONS');
The table err$_calculations will be created. For more information about the DBMS_ERRLOG package, see the documentation.
The second MERGE statement sets calculated_flag = 1 only for rows where no errors occurred. The INSERT statement then adds new rows to the actions table for those same persons, who can be found with a simple select from the Calculations table.
Also, I added the variables id_from and id_to to calculate the ID range to update, and the variable calc_date to make sure that all rows updated in the first MERGE statement can be found later by date.
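Because BETWEEN is inclusive on both ends, consecutive chunks should not share a boundary value, otherwise the person sitting on the boundary is processed twice. Here is a minimal Python sketch of how non-overlapping ranges could be derived; `id_ranges` is a hypothetical helper, and it assumes person_id values start at 1.

```python
def id_ranges(max_id, chunk_size):
    """Yield inclusive (id_from, id_to) pairs covering 1..max_id without overlap."""
    id_from = 1
    while id_from <= max_id:
        id_to = min(id_from + chunk_size - 1, max_id)
        yield id_from, id_to
        id_from = id_to + 1  # the next chunk starts one past this chunk's end


# Small numbers for illustration; the answer uses chunks of 10000.
ranges = list(id_ranges(max_id=25, chunk_size=10))
```

Each pair can then be bound to id_from and id_to for one iteration of the loop.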
I have a function that returns a value representing the similarity between tracks. I want the returned results to be ordered by this value, but I cannot figure out how to do it. Here is what I have already tried:
CREATE OR REPLACE PROCEDURE proc_list_similar_tracks(frstTrack IN tracks.track_id%TYPE)
sim number;
res tracks%rowtype;
chosenTrack tracks%rowtype;
select * into chosenTrack from tracks where track_id = frstTrack;
dbms_output.put_line('similarity between');
FOR res IN (select * from tracks WHERE ROWNUM <= 10)LOOP
SELECT * INTO sim FROM ( SELECT func_similarity(frstTrack, res.track_id)from dual order by sim) order by sim; -- that's where I am getting the value and where I am trying to order
dbms_output.put_line( chosenTrack.track_name || '(' ||frstTrack|| ') and ' || res.track_name || '(' ||res.track_id|| ') ---->' || sim);
END proc_list_similar_tracks;
No errors are given; the list is just presented unsorted. Is it not possible to order by a value that was returned by a function? If so, how do I accomplish something like this? Or am I just doing something horribly wrong?
Any help will be appreciated.
In the interests of (over-)optimisation I would avoid ordering by a function if I could possibly avoid it; especially one that queries other tables. If you're querying a table you should be able to add that part to your current query, which enables you to use it normally.
However, let's look at your function:
There's no point using DBMS_OUTPUT for anything but debugging unless you're going to be there looking at exactly what is output every time the function is run; you could remove these lines.
The following is used only for a DBMS_OUTPUT and is therefore an unnecessary SELECT and can be removed:
select * into chosenTrack from tracks where track_id = frstTrack;
You're selecting a random 10 rows from the table TRACKS; why?
FOR res IN (select * from tracks WHERE ROWNUM <= 10)LOOP
Your ORDER BY, order by sim, is ordering by a non-existent column as the column SIM hasn't been declared within the scope of the SELECT
Your ORDER BY is asking for the least similar as the default sort order is ascending (this may be correct but it seems wrong?)
Your function is not a function, it's a procedure (one without an OUT parameter).
Your SELECT INTO is attempting to place multiple rows into a single-row variable.
Assuming your "function" is altered to provide the maximum similarity between the parameter and a random 10 TRACK_IDs it might look as follows:
create or replace function list_similar_tracks (
  frstTrack in tracks.track_id%type
) return number is
  sim number;
begin
  select max(func_similarity(frstTrack, track_id)) into sim
  from tracks
  where rownum <= 10;
  return sim;
end list_similar_tracks;
However, the name of the function suggests that this is not what you're actually attempting to do.
From your comments, your question is actually:
I have the following code; how do I print the top 10 function results? The current results are returned unsorted.
declare
  sim number;
begin
  for res in ( select * from tracks ) loop
    select * into sim
    from ( select func_similarity(var1, var2)
           from dual
           order by sim
         )
    order by sim;
  end loop;
end;
The problem with the above is firstly that you're ordering by the variable sim, which is NULL in the first instance but changes thereafter. However, the select from DUAL is only a single row, which means you're randomly ordering by a single row. This brings us back to my point at the top - use SQL where possible.
In this case you can simply SELECT from the table TRACKS and order by the function result. To do this you need to give the column created by your function result an alias (or order by the positional argument as already described in Emmanuel's answer).
For instance:
select func_similarity(var1, var2) as function_result
from dual
Putting this together the code becomes:
for res in ( select *
             from ( select func_similarity(variable, track_id) as f
                    from tracks
                    order by f desc
                  )
             where rownum <= 10 ) loop
  -- do something
end loop;
You have a query using a function, let's say something like:
select t.field1, t.field2, ..., function1(t.field1), ...
from table1 t
where ...
Oracle supports order by clause with column indexes, i.e. if the field returned by the function is the nth one in the select (here, field1 is in position 1, field2 in position 2), you just have to add:
order by n
For instance:
select t.field1, function1(t.field1) c2
from table1 t
where ...
order by 2 /* 2 being the index of the column computed by the function */
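To see ordering by a function result in action outside Oracle, here is a minimal sketch with Python's sqlite3 module. The `func_similarity` body is a made-up stand-in (closer ids score higher), the sample tracks are hypothetical, and LIMIT replaces the ROWNUM wrapper that Oracle needs.

```python
import sqlite3


def func_similarity(track_a, track_b):
    """Hypothetical stand-in: the closer the two ids, the higher the score."""
    return 100 - abs(track_a - track_b)


conn = sqlite3.connect(":memory:")
# Register the Python function so it can be called from SQL.
conn.create_function("func_similarity", 2, func_similarity)
conn.executescript("""
    CREATE TABLE tracks (track_id INTEGER PRIMARY KEY, track_name TEXT);
    INSERT INTO tracks VALUES (1, 'a'), (5, 'b'), (8, 'c'), (20, 'd');
""")

# Alias the function result and sort by the alias, most similar first.
top = conn.execute("""
    SELECT track_id, func_similarity(?, track_id) AS f
    FROM tracks
    ORDER BY f DESC
    LIMIT 3
""", (6,)).fetchall()
```

The same query could end with ORDER BY 2 DESC instead, using the column position rather than the alias.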
I have 2 delete statements that are taking a long time to complete. There are several indexes on the columns in the WHERE clause.
What is a duplicate?
If 2 or more records have the same values in the columns id, cid, type, trefid, ordrefid, amount and paydt, then they are duplicates.
The DELETEs delete about 1 million records.
Can they be rewritten in any way to make them quicker?
SELECT max(loaddt) FROM TABLE1 B
a.id=b.id and
a.cid=b.cid and
NVL(a.type,'-99999') = NVL(b.type,'-99999') and
NVL(a.trefid,'-99999')=NVL(b.trefid,'-99999') and
NVL(a.ordrefid,'-99999')= NVL(b.ordrefid,'-99999') and
NVL(a.amount,'-99999')=NVL(b.amount,'-99999') and
DELETE FROM TABLE1 a where rowid > (
Select min(rowid) from TABLE1 b
a.id=b.id and
a.cid=b.cid and
NVL(a.type,'-99999') = NVL(b.type,'-99999') and
NVL(a.trefid,'-99999')=NVL(b.trefid,'-99999') and
NVL(a.ordrefid,'-99999')= NVL(b.ordrefid,'-99999') and
NVL(a.amount,'-99999')=NVL(b.amount,'-99999') and
Explain Plan:
HASH JOIN 1296491
Access Predicates
ITEM_7=NVL(PAYDT,TO_DATE(' 9999-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
Filter Predicates
VIEW VW_SQ_1 690385
How large is the table? If the deleted rows make up no more than about 12% of it, then an index may be worth considering.
Could you somehow partition your table, for example week by week, and then scan only the current week?
Maybe this could be more efficient. When you use an aggregate function, Oracle must walk through all relevant rows (in your case, a full scan), but when you use EXISTS it stops as soon as the first occurrence is found. (And of course the query would be much faster if there were one function-based index, because of the NVL calls, on all columns in the WHERE clause.)
WHERE exists (
A.loaddt != b.loaddt
a.id=b.id and
a.cid=b.cid and
NVL(a.type,'-99999') = NVL(b.type,'-99999') and
NVL(a.trefid,'-99999')=NVL(b.trefid,'-99999') and
NVL(a.ordrefid,'-99999')= NVL(b.ordrefid,'-99999') and
NVL(a.amount,'-99999')=NVL(b.amount,'-99999') and
Although some may disagree, I am a proponent of running large, long running deletes procedurally. In my view it is much easier to control and track progress (and your DBA will like you better ;-) Also, not sure why you need to join table1 to itself to identify duplicates (and I'd be curious if you ever run into snapshot too old issues with your current approach). You also shouldn't need multiple delete statements, all duplicates should be handled in one process. Finally, you should check WHY you're constantly re-introducing duplicates each week, and perhaps change the load process (maybe doing a merge/upsert rather than all inserts).
That said, you might try something like:
-- first create mat view to find all duplicates
create materialized view my_dups_mv
tablespace my_tablespace
build immediate
refresh complete on demand
as
select id,cid,type,trefid,ordrefid,amount,paydt, count(1) as cnt
from table1
group by id,cid,type,trefid,ordrefid,amount,paydt
having count(1) > 1;
-- dedup data (or put into procedure and schedule along with mat view refresh above)
-- make sure my_dups_mv is refreshed first
declare
  cursor dup_cur is
    select * from my_dups_mv;
  type duprec_t is record(row_id rowid);
  duprec duprec_t;
  type duptab_t is table of duprec_t index by pls_integer;
  duptab duptab_t;
  l_ctr pls_integer := 0;
  l_dupcnt pls_integer := 0;
begin
  for rec in dup_cur
  loop
    l_ctr := l_ctr + 1;
    -- assuming needed indexes exist
    select rowid
    bulk collect into duptab
    from table1
    where id = rec.id
    and cid = rec.cid
    and type = rec.type
    and trefid = rec.trefid
    and ordrefid = rec.ordrefid
    and amount = rec.amount
    and paydt = rec.paydt
    -- order by whatever makes sense to make the "keeper" float to top
    order by loaddt desc;
    -- row 1 is the keeper; delete every other row of the group
    for i in 2 .. duptab.count
    loop
      l_dupcnt := l_dupcnt + 1;
      delete from table1 where rowid = duptab(i).row_id;
    end loop;
    if (mod(l_ctr, 10000) = 0) then
      -- log to log table here (calling autonomous procedure you'll need to implement)
      insert_logtable('Table1 deletes', 'Commit reached, deleted ' || l_dupcnt || ' rows');
      commit;
    end if;
  end loop;
  commit;
end;
/
Check your log table for progress status.
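The keep-one-delete-the-rest idea can be tried on a small scale with Python's sqlite3 module. This is only a sketch with made-up rows and a reduced column list; it is not the procedure above. Note that the sample includes a NULL type: GROUP BY puts NULLs in the same group, but a plain equality test would not match them, which is why the lookup uses SQLite's NULL-safe IS operator (the question's NVL calls serve the same purpose in Oracle).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, cid INTEGER, type TEXT, amount INTEGER, loaddt TEXT);
    INSERT INTO table1 VALUES
        (1, 10, 'A',  100, '2024-01-01'),
        (1, 10, 'A',  100, '2024-01-08'),
        (1, 10, 'A',  100, '2024-01-15'),
        (2, 20, NULL, 50,  '2024-01-01'),
        (2, 20, NULL, 50,  '2024-01-08'),
        (3, 30, 'B',  75,  '2024-01-01');
""")

# Step 1: find every key that occurs more than once.
dup_keys = conn.execute("""
    SELECT id, cid, type, amount, COUNT(*) AS cnt
    FROM table1
    GROUP BY id, cid, type, amount
    HAVING COUNT(*) > 1
""").fetchall()

deleted = 0
for key_id, cid, type_, amount, _cnt in dup_keys:
    # Step 2: list the rows for this key, newest load first.
    rowids = [r[0] for r in conn.execute("""
        SELECT rowid FROM table1
        WHERE id = ? AND cid = ? AND type IS ? AND amount = ?
        ORDER BY loaddt DESC
    """, (key_id, cid, type_, amount))]
    # Step 3: the first row is the keeper; delete the others.
    for rid in rowids[1:]:
        conn.execute("DELETE FROM table1 WHERE rowid = ?", (rid,))
        deleted += 1
conn.commit()

remaining = conn.execute("SELECT id, loaddt FROM table1 ORDER BY id").fetchall()
```

After the loop, each key is left with only its most recently loaded row.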
1. Parallel
alter session enable parallel dml;
This assumes you have Enterprise Edition, a sane server configuration, and that you are on 11g. If you're not on 11g, the parallel syntax is slightly different.
2. Reduce memory requirements
The plan shows a hash join, which is probably a good thing. But without any useful filters, Oracle has to hash the entire table. (Tbone's query, which only uses a GROUP BY, looks nicer and may run faster. But it will probably run into the same problem trying to sort or hash the entire table.)
If the hash can't fit in memory it must be written to disk, which can be very slow. Since you run this query every week, only one of the tables needs to look at all the rows. Depending on exactly when it runs, you can add something like this to the end of the query: ) where b.loaddt >= sysdate - 14. This may significantly reduce the amount of writing to temporary tablespace. And it may also reduce read IO if you use some partitioning strategy like jakub.petr suggested.
3. Active Report
If you want to know exactly what your query is doing, run the Active Report:
select dbms_sqltune.report_sql_monitor(sql_id => 'YOUR_SQL_ID_HERE', type => 'active')
from dual;
(Save the output to an .html file and open it with a browser.)
Is there any way to get around the Oracle 10g limitation of 1000 items in a static IN clause? I have a comma-delimited list of many IDs that I want to use in an IN clause. Sometimes this list can exceed 1000 items, at which point Oracle throws an error. The query is similar to this:
select * from table1 where ID in (1,2,3,4,...,1001,1002,...)
Put the values in a temporary table and then do a select where id in (select id from temptable)
select column_X, ... from my_table
where ('magic', column_X ) in (
('magic', 1),
('magic', 2),
('magic', 3),
('magic', 4),
('magic', 99999)
) ...
I am almost sure you can split values across multiple INs using OR:
select * from table1 where ID in (1,2,3,4,...,1000) or
ID in (1001,1002,...,2000)
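If the statement text is generated by a program, the OR-splitting can be automated. Below is a minimal Python sketch; `in_clause` is a hypothetical helper name, and it assumes the IDs are whole numbers, so they can be written into the SQL text without quoting.

```python
def in_clause(column, ids, limit=1000):
    """Build "(col IN (...) OR col IN (...))" with at most `limit` literals per list."""
    ids = [int(i) for i in ids]  # whole numbers only, so no quoting is needed
    if not ids:
        return "(1 = 0)"  # an empty list matches nothing
    chunks = [ids[i:i + limit] for i in range(0, len(ids), limit)]
    parts = [f"{column} IN ({', '.join(map(str, chunk))})" for chunk in chunks]
    # The outer parentheses keep the ORs together when other conditions are ANDed on.
    return "(" + " OR ".join(parts) + ")"


# 2501 ids are split into lists of 1000, 1000 and 501 items.
clause = in_clause("ID", range(1, 2502))
sql = f"select * from table1 where {clause}"
```

Keep in mind that every distinct list produces a different statement text, so the database has to parse each one as a new statement.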
You may try to use the following form:
select * from table1 where ID in (1,2,3,4,...,1000)
union all
select * from table1 where ID in (1001,1002,...)
Where do you get the list of ids from in the first place? Since they are IDs in your database, did they come from some previous query?
When I have seen this in the past it has been because:
a reference table is missing, and the correct way would be to add the new table, put an attribute on that table and join to it
a list of ids is extracted from the database and then used in a subsequent SQL statement (perhaps later, or on another server, or whatever). In this case, the answer is to never extract it from the database: either store it in a temporary table or just write one query.
I think there may be better ways to rework this code than just getting this SQL statement to work. If you provide more details you might get some ideas.
Use ...from table(... :
create or replace type numbertype
as object
(nr number(20,10) )
/

create or replace type number_table
as table of numbertype
/

create or replace procedure tableselect
( p_numbers in number_table
, p_ref_result out sys_refcursor)
is
begin
  open p_ref_result for
    select *
    from employees , (select /*+ cardinality(tab 10) */ tab.nr from table(p_numbers) tab) tbnrs
    where id = tbnrs.nr;
end;
/
This is one of the rare cases where you need a hint; otherwise Oracle will not use the index on column id. One of the advantages of this approach is that Oracle doesn't need to hard parse the query again and again. Using a temporary table is slower most of the time.
Edit 1: simplified the procedure (thanks to jimmyorr) and added an example.
create or replace procedure tableselect
( p_numbers in number_table
, p_ref_result out sys_refcursor)
is
begin
  open p_ref_result for
    select /*+ cardinality(tab 10) */ emp.*
    from employees emp
    , table(p_numbers) tab
    where tab.nr = id;
end;
/
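set serveroutput on
create table employees ( id number(10),name varchar2(100));
insert into employees values (3,'Raymond');
insert into employees values (4,'Hans');

declare
  l_number number_table := number_table();
  l_sys_refcursor sys_refcursor;
  l_employee employees%rowtype;
begin
  -- a nested table must be extended before an element can be assigned
  l_number.extend;
  l_number(1) := numbertype(3);
  l_number.extend;
  l_number(2) := numbertype(4);
  tableselect(l_number, l_sys_refcursor);
  loop
    fetch l_sys_refcursor into l_employee;
    exit when l_sys_refcursor%notfound;
    -- assumed: print each fetched employee (the original output line is missing)
    dbms_output.put_line(l_employee.name);
  end loop;
  close l_sys_refcursor;
end;
/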
This will output:
I wound up here looking for a solution as well.
Depending on the upper bound of the number of items you need to query against, and assuming your items are unique, you could split your query into batched queries of 1000 items each and combine the results on your end instead (pseudocode here):
//remove dupes
items = items.RemoveDuplicates();
//how to break the items into 1000 item batches
batches = new batch list;
batch = new batch;
for (int i = 0; i < items.Count; i++) {
    if (batch.Count == 1000) {
        batches.Add(batch);
        batch = new batch;
    }
    batch.Add(items[i]);
    if (i == items.Count - 1) {
        //add the final batch (it has up to 1000 items).
        batches.Add(batch);
    }
}
// now go query the db for each batch (query() is a placeholder)
results = new results;
foreach(batch in batches) {
    results.Add(query(batch));
}
This may be a good trade-off in the scenario where you don't typically have over 1000 items, as having over 1000 items would be your "high end" edge-case scenario. For example, in the event that you have 1500 items, two queries of (1000, 500) wouldn't be so bad. This also assumes that each query isn't particularly expensive in its own right.
This wouldn't be appropriate if your typical number of expected items got to be much larger - say, in the 100000 range - requiring 100 queries. If so, then you should probably look more seriously into using the global temporary tables solution provided above as the most "correct" solution. Furthermore, if your items are not unique, you would need to resolve duplicate results in your batches as well.
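For what it's worth, the batching idea can be sketched as runnable Python against an in-memory SQLite database. `query_in_batches` is a hypothetical helper, the table is filled with made-up ids, and the batch size of 500 is chosen only to stay under the bind-variable limit of older SQLite builds; against Oracle the limit discussed here is 1000 list items.

```python
import sqlite3


def query_in_batches(conn, ids, batch_size=1000):
    """Run one IN query per batch of unique ids and concatenate the rows."""
    unique_ids = list(dict.fromkeys(ids))  # remove duplicates, keep order
    rows = []
    for start in range(0, len(unique_ids), batch_size):
        batch = unique_ids[start:start + batch_size]
        placeholders = ", ".join("?" * len(batch))  # one bind variable per id
        rows.extend(conn.execute(
            f"SELECT id FROM table1 WHERE id IN ({placeholders})", batch))
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO table1 VALUES (?)", [(i,) for i in range(1, 3001)])

wanted = list(range(1, 1501)) + [1, 2, 3]  # 1500 distinct ids plus a few repeats
found = query_in_batches(conn, wanted, batch_size=500)
```

Removing duplicates before batching is what keeps the combined result free of repeated rows.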
Yes, this is a very weird situation in Oracle.
If you specify 2000 IDs inside the IN clause, it will fail.
This fails:
select ...
where id in (1,2,....2000)
But if you simply put the 2000 IDs in another table (a temp table, for example), it will work.
The query below works:
select ...
where id in (select userId
from temptable_with_2000_ids )
Alternatively, you could split the records into groups of 1000 and execute the query group by group.
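The temp-table variant is easy to try out. Here is a small sketch using Python's sqlite3 module as a stand-in for Oracle; the table names and the 2000 sample ids are made up, and in Oracle the id table would more likely be a global temporary table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)",
                 [(i, f"row {i}") for i in range(1, 5001)])

ids = list(range(1, 4001, 2))  # 2000 hypothetical ids, too many for one literal IN list in Oracle

# Load the ids into a temporary table instead of writing them into the statement.
conn.execute("CREATE TEMP TABLE temptable (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO temptable VALUES (?)", [(i,) for i in ids])

# The IN list is now a subquery, so the item limit no longer applies.
matched = conn.execute(
    "SELECT COUNT(*) FROM table1 WHERE id IN (SELECT id FROM temptable)"
).fetchone()[0]
```

The statement text stays the same no matter how many ids are loaded, which also avoids re-parsing a new statement for every list.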
Here is some Perl code that tries to work around the limit by creating an inline view and then selecting from it. The statement text is compressed by using rows of twelve items each instead of selecting each item from DUAL individually, then uncompressed by unioning together all columns. UNION or UNION ALL in decompression should make no difference here as it all goes inside an IN which will impose uniqueness before joining against it anyway, but in the compression, UNION ALL is used to prevent a lot of unnecessary comparing. As the data I'm filtering on are all whole numbers, quoting is not an issue.
# generate the innards of an IN expression with more than a thousand items
use English '-no_match_vars';
sub big_IN_list{
    @_ < 13 and return join ', ',@_;
    my $padding_required = (12 - (@_ % 12)) % 12;
    # get first dozen and make length of @_ an even multiple of 12
    my ($a,$b,$c,$d,$e,$f,$g,$h,$i,$j,$k,$l) = splice @_,0,12, ( ('NULL') x $padding_required );
    my @dozens;
    local $LIST_SEPARATOR = ', '; # how to join elements within each dozen
    while(@_){
        push @dozens, "SELECT @{[ splice @_,0,12 ]} FROM DUAL"
    };
    $LIST_SEPARATOR = "\n union all\n "; # how to join @dozens
    return <<"EXP";
WITH t AS (
select $a A, $b B, $c C, $d D, $e E, $f F, $g G, $h H, $i I, $j J, $k K, $l L FROM DUAL
union all
@dozens
)
select A from t union select B from t union select C from t union
select D from t union select E from t union select F from t union
select G from t union select H from t union select I from t union
select J from t union select K from t union select L from t
EXP
}
One would use that like so:
my $bases_list_expr = big_IN_list(list_your_bases());
update bases_table set belong_to = 'us'
where id in ($bases_list_expr)
Instead of using an IN clause, can you try a JOIN with the other table that the IDs are fetched from? That way we don't need to worry about the limit. Just a thought from my side.
Instead of SELECT * FROM table1 WHERE ID IN (1,2,3,4,...,1000);
Use this :
SELECT * FROM table1 WHERE ID IN (SELECT rownum AS ID FROM dual connect BY level <= 1000);
Note: make sure the generated IDs do not match unrelated IDs that merely happen to fall in this range. To restrict the list to IDs that actually exist, use:
SELECT * FROM table1 WHERE ID IN (SELECT distinct(ID) FROM tablewhereidsareavailable);