I need to perform a delete operation (oldest data first) on a de-normalized table which does not have any unique ID. We are using a date column to decide which data is oldest, but for each date there would be around 600K records. As the table is very large, the delete operation will be done in batches (a loop with ROWNUM).
I used the query below, but it throws the error 'data manipulation operation not legal on this view', even though temp is a table, not a view.
delete from (Select A.*,rownum as rn from (select * from temp order by date) A) B where B.rn<10
Please suggest any alternatives.
Thanks Kifinity, your solution worked, but now I am confused about how to make it work for deleting a huge number of records. I have around 10 million records and need to delete them in batches. Thanks in advance.
In your statement, everything after delete from is an inline view. You can't delete from an inline view; your statement has to be delete from TABLE, e.g.
delete from temp
where rowid in (select rowid
                from (select * from temp order by date)
                where rownum < 10);
Edit: as requested, here's an example as part of a PL/SQL loop.
begin
  loop
    delete from temp
    where rowid in (select rowid
                    from (select * from temp order by date)
                    where rownum < 1000);
    exit when SQL%ROWCOUNT = 0;
    commit;  -- commit each batch so undo stays small
  end loop;
  commit;
end;
/
I am trying to use the following statement for the delete process, and it has to delete around 23,566,424 rows, but Oracle takes almost 3 hours to complete the process. We have already created an index on SCHEDULE_DATE_KEY, but the process is still very slow. Can someone advise on how to make deletes faster in Oracle?
DELETE FROM EDWSOURCE.SCHEDULE_DAY_F
WHERE SCHEDULE_DATE_KEY >
      (SELECT LAST_PAYROLL_DATE_KEY
       FROM EDWSOURCE.LAST_PAYROLL_DATE
       WHERE CURRENT_FLAG = 'Y');
I don't think any index will help here; probably Oracle will decide the best approach is a full table scan to delete 20M rows from 300M. It is deleting at a rate of over 2,000 rows per second (23,566,424 rows in about 10,800 seconds), which isn't bad. In fact, any additional indexes will slow it down, as it has to delete the row's entry from each index as well.
A quicker approach could be to create a new table of the rows you want to keep, something like:
create table EDWSOURCE.SCHEDULE_DAY_F_KEEP as
select *
from EDWSOURCE.SCHEDULE_DAY_F
where SCHEDULE_DATE_KEY <=
      (select LAST_PAYROLL_DATE_KEY
       from EDWSOURCE.LAST_PAYROLL_DATE
       where CURRENT_FLAG = 'Y');
Then recreate any constraints and indexes to use the new table.
Finally drop the old table and rename the new one.
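For example, the final swap might look like this (a sketch, assuming nothing else depends on the old table):
-- swap the tables once the KEEP copy has been verified
drop table EDWSOURCE.SCHEDULE_DAY_F;
alter table EDWSOURCE.SCHEDULE_DAY_F_KEEP rename to SCHEDULE_DAY_F;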
You can try testing a filtered table move. This has an ONLINE clause, so you can do it while the application is still running.
Note that in 12.2 and later the indexes will remain valid. In earlier versions you will need to rebuild the indexes, as they become invalid. Good luck.
Move a Table
Create and populate a new test table.
DROP TABLE t1 PURGE;
CREATE TABLE t1 AS
SELECT level AS id,
'Description for ' || level AS description
FROM dual
CONNECT BY level <= 100;
COMMIT;
Check the contents of the table.
SELECT COUNT(*) AS total_rows,
MIN(id) AS min_id,
MAX(id) AS max_id
FROM t1;
TOTAL_ROWS MIN_ID MAX_ID
---------- ---------- ----------
100 1 100
SQL>
Move the table, filtering out rows with an ID value greater than 50.
ALTER TABLE t1 MOVE ONLINE
INCLUDING ROWS WHERE id <= 50;
Check the contents of the table.
SELECT COUNT(*) AS total_rows,
MIN(id) AS min_id,
MAX(id) AS max_id
FROM t1;
TOTAL_ROWS MIN_ID MAX_ID
---------- ---------- ----------
50 1 50
SQL>
The rows with an ID value between 51 and 100 have been removed.
As mentioned above, it may be best to PARTITION the table and drop a partition every N days as part of a daily task.
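For illustration, a sketch of such a daily task (the partition name here is hypothetical and assumes the table is range-partitioned by date):
-- hypothetical daily job: drop the oldest date partition
alter table EDWSOURCE.SCHEDULE_DAY_F
drop partition P_OLDEST update global indexes;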
I have written a PL/SQL block to delete some records from a bunch of tables.
To identify the records to be deleted, I have created a cursor on top of that query.
declare
  type t_guid_invoice is ref cursor;
  c_invoice t_guid_invoice;
  col1 tab2.cola%type;  -- declarations added; match these to your actual columns
  col2 varchar2(100);
  col3 varchar2(100);
begin
  open c_invoice for
    select * from a, b where a.col = b.col; -- (quite a complex join, renders 200k records)
  loop
    fetch c_invoice into col1, col2, col3;
    exit when c_invoice%NOTFOUND;
    begin
      delete from tab2
      where cola = col1;
      if SQL%rowcount > 0 then
        dbms_output.put_line('INFO: tab2 for ' || col1 || '/' || col2 || ' removed.');
      else
        dbms_output.put_line('WARN: No tab2 for ' || col1 || '/' || col2 || ' found!');
      end if;
    exception
      when others then
        dbms_output.put_line('ERR: Problems while deleting tab2 for ' || col1 || '/' || col2);
        dbms_output.put_line(SQLERRM);
    end;
    ....
  end loop;
  close c_invoice;
end;
This continues to loop through about 26 tables; some of the tables have as many as 60 million records.
Deletion is based on the primary key in each table. All triggers are disabled before the deletion process.
If I try to delete 10k records, it loops through 10k times, deleting multiple rows in each table, but it takes as long as 30 minutes. There is no commit after each block, since I have to cater to a simulation mode too.
Any suggestions to speed up the process? Thanks!
For sure, if you loop 10k times, all those DBMS_OUTPUT.PUT_LINE calls will slow things down, even if you aren't doing anything "smart" (and I wonder whether the buffer is large enough). If you want to log what's going on, create a log table and an autonomous-transaction procedure which inserts that info (and commits).
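A minimal sketch of such a procedure, assuming a delete_log table with logged_at and msg columns (both the table and its columns are assumptions):
create or replace procedure log_msg (p_msg in varchar2) is
  pragma autonomous_transaction;  -- commits independently of the main transaction
begin
  insert into delete_log (logged_at, msg)  -- assumed log table
  values (systimestamp, p_msg);
  commit;
end;
/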
Apart from that, are the tables properly indexed? E.g. that would be the cola column in the tab2 table (in the code you posted). Did you collect statistics on the tables and indexes? It probably wouldn't harm to do that for the whole schema.
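For example, a sketch using DBMS_STATS against the current schema:
begin
  -- statistics for one table and (cascade) its indexes
  dbms_stats.gather_table_stats(ownname => user, tabname => 'TAB2', cascade => true);
  -- or for the whole schema
  dbms_stats.gather_schema_stats(ownname => user);
end;
/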
Did you check the explain plan?
Do you know what takes the most time: is it the ref cursor query (so it has to be optimized), or the deleting itself?
Can't you avoid the loop entirely? Row-by-row processing is slow. For example, instead of using a ref cursor, create a table out of it, index it, and use it as:
create table c_invoice as
select * from a join b on a.col = b.col;
create index i1inv_col1 on c_invoice (col1);
delete from tab2 t
where exists (select null
from c_invoice c
where c.col1 = t.cola
);
You typically never want to delete a large number of rows from a table in a loop.
You want to use one DELETE statement with an appropriate WHERE condition.
Additionally, while processing a large number of rows, you typically do not want to use an index.
So your first step would be to check the rows that would not be deleted (your warnings).
You get their keys with the following query, and you may log them:
select a.col from a,b where a.col=b.col
minus
select cola from tab2;
In the second step you delete all rows with one statement.
delete
from tab2
where cola in (select a.col from a,b where a.col=b.col);
In case of problems, check the execution plan; you expect a TABLE ACCESS FULL (an INDEX FAST FULL SCAN is fine as well) on both sources, combined with a HASH JOIN.
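For example, to inspect the plan with the standard DBMS_XPLAN package:
explain plan for
delete from tab2
where cola in (select a.col from a, b where a.col = b.col);

select * from table(dbms_xplan.display);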
I have identified some duplicates in my table:
-- DUPLICATES: ----
select PPLP_NAME,
START_TIME,
END_TIME,
count(*)
from PPLP_LOAD_GENSTAT
group by PPLP_NAME,
START_TIME,
END_TIME
having count(*) > 1
-- DUPLICATES: ----
How is it possible to delete them?
Even if you don't have a primary key, each record has a unique ROWID associated with it.
The query below deletes only the records that don't have the maximum ROWID, by self-joining the table on the columns that cause the duplication. This makes sure that you delete all the duplicates.
DELETE FROM PPLP_LOAD_GENSTAT plg_outer
WHERE ROWID NOT IN (
  select MAX(ROWID)
  from PPLP_LOAD_GENSTAT plg_inner
  WHERE plg_outer.pplp_name = plg_inner.pplp_name
  AND plg_outer.start_time = plg_inner.start_time
  AND plg_outer.end_time = plg_inner.end_time
);
I'd suggest something easier:
CREATE TABLE NewTable AS
SELECT DISTINCT pplp_name, start_time, end_time
FROM YourTable;
Then drop your table, and rename the new table.
If you really want to delete records, you can find a few examples of how here.
I have to create a procedure which finds any recently added records and, if there are any, moves them to an ARCHIVE table.
This is my statement which filters the recently added records:
SELECT
CL_ID,
CL_NAME,
CL_SURNAME,
CL_PHONE,
VEH_ID,
VEH_REG_NO,
VEH_MODEL,
VEH_MAKE_YEAR,
WD_ID,
WORK_DESC,
INV_ID,
INV_SERIES,
INV_NUM,
INV_DATE,
INV_PRICE
FROM
CLIENT,
INVOICE,
VEHICLE,
WORKS,
WORKS_DONE
WHERE
Client.CL_ID = Invoice.INV_CL_ID and
Client.CL_ID = Vehicle.VEH_CL_ID and
Vehicle.VEH_ID = Works_Done.WD_VEH_ID and
Works_Done.WD_INV_ID = Invoice.INV_ID and
Works_Done.WD_WORK_ID = Works.WORK_ID and
Works_Done.TIMESTAMP >= sysdate - 1;
You may need something like this (pseudo-code):
create or replace procedure moveRecords is
  vLimitDate timestamp := systimestamp - 1;
begin
  insert into table2
  select *
  from table1
  where your_date >= vLimitDate;
  --
  delete from table1
  where your_date >= vLimitDate;
end;
Here are the steps I've used for this sort of task in the past (a sketch follows below):
1. Create a global temporary table (GTT) to hold a set of ROWIDs.
2. Perform a multitable direct-path insert, which selects the rows to be archived from the source table and inserts their ROWIDs into the GTT and the rest of the data into the archive table.
3. Perform a delete from the source table, where the source table ROWID is in the GTT of ROWIDs.
4. Issue a commit.
The business with the GTT and the ROWIDs ensures that you have 100% guaranteed stability in the set of rows that you are selecting and then deleting from the source table, regardless of any changes that might occur between the start of your select and the start of your delete (other than someone causing a partitioned table row migration or shrinking the table).
You could alternatively achieve that through changing the transaction isolation level.
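A minimal sketch of the approach, assuming a source table src_t(id, payload) with a created_date column and an archive table arch_t of the same shape (all names here are hypothetical). Note that a genuine direct-path (APPEND) insert would stop you reading the GTT back in the same transaction, so this sketch uses a conventional insert:
-- one-time setup: a GTT to hold the ROWIDs of the archived rows
create global temporary table archive_rowids (rid rowid)
on commit delete rows;

-- multitable insert: ROWIDs into the GTT, the data into the archive table
insert all
  into archive_rowids (rid) values (rid)
  into arch_t (id, payload) values (id, payload)
select rowid as rid, id, payload
from src_t
where created_date < sysdate - 30;  -- hypothetical archival cutoff

-- delete exactly the rows whose ROWIDs were captured above
delete from src_t
where rowid in (select rid from archive_rowids);

commit;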
OK, maybe something like this...
The downside is that it can be slow for large tables.
The upside is that there is no dependence on date and time, so you can run it anytime and synchronize your archive with the live data.
create or replace procedure archive is
begin
  insert into archive_table
  ( select * from main_table
    minus
    select * from archive_table );
end;
I have a SQL query like:
insert into dedupunclear
select mdnnumber, poivalue
from deduporiginal a
where exists (select 1
              from deduporiginal
              where mdnnumber = a.mdnnumber
                and rowid < a.rowid)
   or mdnnumber is null;
There are 500K records in my deduporiginal table. I have put this query inside a function, but it takes around 3 hours to commit the records to the dedupunclear table.
Is there any alternative to resolve the performance issue?
When does this query commit records: at some interval, or after getting all the results from the select query?
This is how I did it the other day:
delete from your_table a
where rowid > (select min(rowid)
               from your_table b
               where nvl(a.id, 'x') = nvl(b.id, 'x'));
Instead of an insert into a dedupe table, I just deleted the rows directly from the staging table. For a table with 1 million rows, this query worked pretty well. I was worried that the nvl function would kill the index, but it worked well enough.
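If the NVL predicate ever does defeat a regular index, one option is a function-based index matching the expression (a sketch, using the placeholder names from above):
-- hypothetical function-based index matching the nvl() predicate
create index ix_dedupe_nvl_id on your_table (nvl(id, 'x'));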