Speed up updates on Oracle DB with a lot of records

I have to update a table which has around 93 million records. At the beginning the DB updated 10k records per 5 seconds; now, after around 60 million updated records, updating the next 10k records takes 30-60 s and I don't know why. I have to update columns which are null.
I use a loop with a commit every 10k records:
LOOP
UPDATE TABLE
SET DATE_COLUMN = v_hist_date
WHERE DATE_COLUMN IS NULL
AND ROWNUM <= c_commit_limit
AND NOT_REMOVED IS NULL;
EXIT WHEN SQL%ROWCOUNT = 0;
COMMIT;
END LOOP;
Do you have any ideas why it slows down so much, and how it might be possible to speed up this update?

Updates are queries too. You haven't posted an explain plan, but given you are filtering on columns which are null it seems probable that your statement is executing a Full Table Scan. That certainly fits the behaviour you describe.
What happens is this. In the first loop the FTS finds 10000 rows which fit the WHERE criteria almost immediately. Then you exit the loop and start again. This time the FTS reads the same blocks again, including the ones it updated in the previous iteration, before it finds the next 10000 rows it can update. And so on. Each loop takes longer because the full table scan has to read more of the table on every pass.
This is one of the penalties of randomly committing inside a loop. It may be too late for you now, but a better approach would be to track an indexed column such as a primary key. Using such a tracking key will allow an index scan to skip past the rows you have already visited.
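A minimal sketch of that idea, assuming a numeric primary key column named ID (the table, column and variable names below are placeholders, not taken from the question):
DECLARE
    c_batch_size CONSTANT PLS_INTEGER := 10000;
    v_hist_date  DATE := DATE '2015-01-01';  -- placeholder value
    v_last_id    NUMBER := 0;
    v_max_id     NUMBER;
BEGIN
    SELECT MAX(id) INTO v_max_id FROM my_table;
    -- walk the primary key range in fixed steps; each UPDATE only visits its own
    -- slice via an index range scan, so already-processed blocks are not re-read
    WHILE v_last_id < v_max_id LOOP
        UPDATE my_table
           SET date_column = v_hist_date
         WHERE id > v_last_id
           AND id <= v_last_id + c_batch_size
           AND date_column IS NULL
           AND not_removed IS NULL;
        COMMIT;
        v_last_id := v_last_id + c_batch_size;
    END LOOP;
END;
/
The batches are bounded by key value rather than row count, so they will be uneven if the IDs are sparse, but no iteration ever re-reads the rows a previous one already updated.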

Related

Insert procedure is working fast at first but slows down after n number of records

We were doing a bulk collect insert into a table in Oracle 19c. The logic is the following:
loop
fetch c bulk collect
  into v_row limit 1000; -- v_row is a table of source_table%rowtype
exit when (v_row.count) = 0;
for i in v_row.first .. v_row.last
loop
--do processing here, assign v_row(i) values to v_target_table%rowtype variable
--at the end we are extending a nested table and populating it with this v_target_table row
v_tar_tab.extend;
v_tar_tab(v_tar_tab.count) := v_target_table;
end loop;
end loop;
--insert with forall
forall i in v_tar_tab.first .. v_tar_tab.last
insert into target_table
values v_tar_tab (i);
v_tar_tab := t_tar_table(); -- is table of target_table%rowtype
commit;
The problem is that in this particular case the source table has 300,000 rows, and for the first 100,000 rows the insert works very fast, but after that the time for each 1000-row fetch keeps increasing, and the overall time for the remaining 200,000 rows is far too long compared to the time spent on the first 100,000 rows.
To see this, we added a counter variable, increased it by 1000 on each fetch, and logged the iteration number and counter value to our log table. After the 95th-100th iteration, where 100,000 rows have been fetched and processed, the process slows down.
There are no commits inside the loop, the target table is set to NOLOGGING, and its constraints and indexes are disabled before executing this insert procedure. I can't think of any reason why it works fast for the first n rows and slowly for the rest. Any ideas on what should be changed?
If it is important to note, the select statement in cursor c runs in parallel with a hint. I added the APPEND_VALUES hint to the insert statement, but it didn't change the overall behaviour or time.
Look at your session statistics related to undo records. The delta between when a query commenced and when you perform a fetch from it can have an impact on performance because we guarantee to return the records as they were at the moment the query commenced.
If the source table is undergoing transaction activity, then we need to undo those changes as part of the fetch process.
Video showing a demo of that cost here
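A quick way to see this in your own session is to watch the undo-related statistics between fetches, for example (statistic names as they appear in v$statname):
SELECT sn.name, st.value
  FROM v$mystat st
  JOIN v$statname sn ON sn.statistic# = st.statistic#
 WHERE sn.name IN ('data blocks consistent reads - undo records applied',
                   'consistent changes');
-- if these keep climbing while the loop runs, the cursor is spending its time
-- applying undo to rebuild read-consistent copies of the source blocks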

Amazon Redshift, getting slower in every update run

I am beginning with Amazon Redshift.
I have just loaded a big table: millions of rows and 171 fields. The data quality is poor; there are a lot of characters that must be removed.
I have prepared an update for every column; since Redshift stores data by column, I suppose it is faster to clean it column by column.
UPDATE MyTable SET Field1 = REPLACE(Field1, '~', '');
UPDATE MyTable SET Field2 = REPLACE(Field2, '~', '');
.
.
.
UPDATE MyTable set FieldN = Replace(FieldN, '~', '');
The first 'update' took 1 min. The second one took 1 min and 40 sec...
Every time I run one of the updates, it takes more time than the previous one. I have run 19 of them and the last one took almost 25 min. The time consumed by every 'update' increases one after another.
Another thing is that with the first update CPU utilization was minimal; now, with the last update, it is at 100%.
I have a 3-nodes cluster of dc1.large instances.
I have rebooted the cluster but the problem continues.
Please, I need some guidance to find the cause of this problem.
When you update a column, Redshift actually deletes all those rows and inserts new rows with the new value. So there is a lot of space that needs to be reclaimed, and you need to VACUUM your table after the update.
They also recommend that you run ANALYZE after each update to update statistics for the query planner.
http://docs.aws.amazon.com/redshift/latest/dg/r_UPDATE.html
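For example, after the updates have finished:
VACUUM MyTable;   -- reclaim the space left behind by the superseded row versions
ANALYZE MyTable;  -- refresh table statistics for the query planner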
A more optimal way might be:
Create another identical table.
Read N (say 10000) rows at a time from the first table, process them, and load them into the second table using S3 loading (instead of insert).
Delete the first table and rename the second table (sketched below).
If you are running into space issues, delete the N migrated rows from the first table after every iteration and run vacuum delete only <name_of_first_table>
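A rough sketch of the final swap step, with placeholder table names:
-- after the cleaned copy has been fully loaded and verified
DROP TABLE mytable;
ALTER TABLE mytable_clean RENAME TO mytable;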
References
s3 loading : http://docs.aws.amazon.com/redshift/latest/dg/tutorial-loading-run-copy.html
copy table from 's3://<your-bucket-name>/load/key_prefix' credentials 'aws_access_key_id=<Your-Access-Key-ID>;aws_secret_access_key=<Your-Secret-Access-Key>' options;

ORACLE db performance tuning

We are running into a performance issue and I need some suggestions (we are on Oracle 10g R2).
The situation is something like this:
1) It is a legacy system.
2) Some of the tables hold data for the last 10 years (meaning data has never been deleted since the first version was rolled out). Most of the OLTP tables now have around 30,000,000 - 40,000,000 rows.
3) Search operations on these tables take a flat 5-6 minutes. (A simple query like select count(0) from xxxxx where isActive='Y' takes around 6 minutes.) When we looked at the explain plan we found that an index scan is happening on the isActive column.
4) We have suggested archiving and purging the old data which is not needed, and the team is working towards it. Even if we delete 5 years of data we are still left with around 15,000,000 - 20,000,000 rows, which is itself very large, so we thought of partitioning these tables. But we found that users can search on most of the columns of these tables from the UI, which would defeat the very purpose of table partitioning.
So what are the steps which need to be taken to improve this situation?
First of all: question why you are issuing the query select count(0) from xxxxx where isactive = 'Y' in the first place. Nine out of ten times it is a lazy way to check for the existence of a record. If that's the case with you, just replace it with a query that selects 1 row (rownum = 1 and a first_rows hint).
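For example, an existence check along those lines could look like this:
SELECT /*+ FIRST_ROWS(1) */ 1
  FROM xxxxx
 WHERE isactive = 'Y'
   AND ROWNUM = 1;   -- stop at the first matching row instead of counting them all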
The number of rows you mention is nothing to be worried about. If your application doesn't perform well when the number of rows grows, then your system is not designed to scale. I'd investigate all queries that take too long using SQL*Trace or ASH and fix them.
By the way: nothing you mentioned justifies the term legacy, IMHO.
Regards,
Rob.
Just a few observations:
I'm guessing that the "isActive" column can have two values - 'Y' and 'N' (or perhaps 'Y', 'N', and NULL - although why in the name of Fred there wouldn't be a NOT NULL constraint on such a column escapes me). If this is the case, an index on this column would have very poor selectivity and you might be better off without it. Try dropping the index and re-running your query.
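A quick way to check that selectivity before deciding (the index name below is made up):
SELECT isactive, COUNT(*)
  FROM xxxxx
 GROUP BY isactive;
-- if 'Y' accounts for most of the rows, the index buys little for this predicate
DROP INDEX idx_xxxxx_isactive;   -- hypothetical index name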
@RobVanWijk's comment about the use of SELECT COUNT(*) is excellent. ONLY ask for a row count if you really need to have the count; if you don't need the count, I've found it's faster to do a direct probe (SELECT whatever FROM wherever WHERE somefield = somevalue) with an appropriate exception handler than it is to do a SELECT COUNT(*). In the case you cited, I think it would be better to do something like
DECLARE
    strIsActive            MY_TABLE.IS_ACTIVE%TYPE;
    bActive_records_found  BOOLEAN;
BEGIN
    SELECT IS_ACTIVE
      INTO strIsActive
      FROM MY_TABLE
     WHERE IS_ACTIVE = 'Y';
    bActive_records_found := TRUE;
EXCEPTION
    WHEN NO_DATA_FOUND THEN
        bActive_records_found := FALSE;
    WHEN TOO_MANY_ROWS THEN
        bActive_records_found := TRUE;
END;
As to partitioning - partitioning can be effective at reducing query times IF the column on which the table is partitioned is used in all queries. For example, if a table is partitioned on TRANSACTION_DATE, then for the partitioning to make a difference all queries against this table would have to have a TRANSACTION_DATE test in the WHERE clause. Otherwise the database will have to search each partition to satisfy the query, so I doubt any improvement would be noted.
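To illustrate, here is a sketch of a range-partitioned table (all names invented) where only queries that filter on TRANSACTION_DATE can prune partitions:
CREATE TABLE orders_part (
    order_id         NUMBER,
    transaction_date DATE,
    is_active        VARCHAR2(1)
)
PARTITION BY RANGE (transaction_date) (
    PARTITION p2008 VALUES LESS THAN (DATE '2009-01-01'),
    PARTITION p2009 VALUES LESS THAN (DATE '2010-01-01'),
    PARTITION pmax  VALUES LESS THAN (MAXVALUE)
);

-- prunes to the partitions covering 2009 onwards
SELECT COUNT(*) FROM orders_part
 WHERE transaction_date >= DATE '2009-06-01';

-- no TRANSACTION_DATE predicate: every partition has to be scanned
SELECT COUNT(*) FROM orders_part
 WHERE is_active = 'Y';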
Share and enjoy.

How to delete large data from Oracle 9i DB?

I have a table that is 5 GB, and I am trying to delete from it like below:
delete from tablename
where to_char(screatetime,'yyyy-mm-dd') <'2009-06-01'
But it has been running for a long time with no response. Meanwhile I tried to check whether anybody is blocking it with the query below:
select l1.sid, ' IS BLOCKING ', l2.sid
from v$lock l1, v$lock l2
where l1.block =1 and l2.request > 0
and l1.id1=l2.id1
and l1.id2=l2.id2
But I didn't find any blocking either.
How can I delete this large amount of data without any problems?
5GB is not a useful measurement of table size. The total number of rows matters. The number of rows you are going to delete as a proportion of the total matters. The average length of the row matters.
If the proportion of the rows to be deleted is tiny it may be worth your while creating an index on screatetime which you will drop afterwards. This may mean your entire operation takes longer, but crucially, it will reduce the time it takes for you to delete the rows.
On the other hand, if you are deleting a large chunk of rows you might find it better to:
Create a copy of the table using
create table t1_copy as select * from t1
where screatetime >= to_date('2009-06-01','yyyy-mm-dd')
Swap the tables using the rename command.
Re-apply constraints and indexes to the new T1.
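A sketch of the whole sequence (assuming no triggers or grants beyond the indexes and constraints mentioned):
create table t1_copy as
    select * from t1
     where screatetime >= to_date('2009-06-01','yyyy-mm-dd');

alter table t1      rename to t1_old;
alter table t1_copy rename to t1;

-- re-create the indexes, constraints and any grants on the new t1,
-- then drop t1_old once the result has been verified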
Another thing to bear in mind is that deletions eat more UNDO than other transactions, because they take more information to rollback. So if your records are long and/or numerous then your DBA may need to check the UNDO tablespace (or rollback segs if you're still using them).
Finally, have you done any investigation to see where the time is actually going? DELETE statements are just another query, and they can be tackled using the normal panoply of tuning tricks.
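For instance, you can look at the plan for the delete exactly as you would for a select (DBMS_XPLAN is available from 9iR2 onwards):
explain plan for
    delete from tablename
     where screatetime < to_date('2009-06-01','yyyy-mm-dd');

select * from table(dbms_xplan.display);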
Use a query condition to export necessary rows
Truncate table
Import rows
If there is an index on screatetime your query may not be using it. Change your statement so that your where clause can use the index.
delete from tablename where screatetime < to_date('2009-06-01','yyyy-mm-dd')
It runs MUCH faster when you lock the table first. Also change the where clause, as suggested by Rene.
LOCK TABLE tablename IN EXCLUSIVE MODE;
DELETE FROM tablename
where screatetime < to_date('2009-06-01','yyyy-mm-dd');
EDIT: If the table cannot be locked, because it is constantly accessed, you can choose the salami tactic to delete those rows:
BEGIN
LOOP
DELETE FROM tablename
WHERE screatetime < to_date('2009-06-01','yyyy-mm-dd')
AND ROWNUM<=10000;
EXIT WHEN SQL%ROWCOUNT=0;
COMMIT;
END LOOP;
END;
Overall, this will be slower, but it won't burst your rollback segment and you can see the progress in another session (i.e. the number of rows in tablename goes down). And if you have to kill it for some reason, the rollback won't take forever and you won't have lost all the work done so far.

Inserts are 4x slower if table has lots of record (400K) vs. if it's empty

(Database: Oracle 10G R2)
It takes 1 minute to insert 100,000 records into a table. But if the table already contains some records (400K), then it takes 4 minutes and 12 seconds; also CPU-wait jumps up and "Free Buffer Waits" become really high (from dbconsole).
Do you know what's happening here? Is this because of frequent table extents? The extent size for these tables is 1,048,576 bytes. I have a feeling the DB is trying to extend the table storage.
I am really confused about this. So any help would be great!
This is the insert statement:
begin
for i in 1 .. 100000 loop
insert into customer
(id, business_name, address1,
address2, city,
zip, state, country, fax,
phone, email
)
values (customer_seq.nextval, dbms_random.string ('A', 20), dbms_random.string ('A', 20),
dbms_random.string ('A', 20), dbms_random.string ('A', 20),
trunc (dbms_random.value (10000, 99999)), 'CA', 'US', '798-779-7987',
'798-779-7987', 'asdfasf#asfasf.com'
);
end loop;
end;
Here is the dstat output (CPU, IO, MEMORY, NET) for:
Empty Table inserts: http://pastebin.com/f40f50dbb
Table with 400K records: http://pastebin.com/f48d8ebc7
Output from v$buffer_pool_statistics
ID: 3
NAME: DEFAULT
BLOCK_SIZE: 8192
SET_MSIZE: 4446
CNUM_REPL: 4446
CNUM_WRITE: 0
CNUM_SET: 4446
BUF_GOT: 1407656
SUM_WRITE: 1244533
SUM_SCAN: 0
FREE_BUFFER_WAIT: 93314
WRITE_COMPLETE_WAIT: 832
BUFFER_BUSY_WAIT: 788
FREE_BUFFER_INSPECTED: 2141883
DIRTY_BUFFERS_INSPECTED: 1030570
DB_BLOCK_CHANGE: 44445969
DB_BLOCK_GETS: 44866836
CONSISTENT_GETS: 8195371
PHYSICAL_READS: 930646
PHYSICAL_WRITES: 1244533
UPDATE
I dropped the indexes off this table and performance improved drastically, even when inserting 100K rows into a 600K-record table (which took 47 seconds with no CPU wait - see dstat output http://pastebin.com/fbaccb10).
Not sure if this is the same in Oracle, but in SQL Server the first thing I'd check is how many indexes you have on the table. If there are a lot, the DB has to do a lot of work maintaining those indexes as records are inserted. It's more difficult to maintain indexes over 500k rows than over 100k.
The indices are some form of tree, which means the time to insert a record is going to be O(log n), where n is the size of the tree (≈ number of rows for the standard unique index).
The fastest way to insert them is going to be dropping/disabling the index during the insert and recreating it after, as you've already found.
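A hedged sketch of that, with a made-up index name (this works for non-unique indexes; unique indexes and indexes backing constraints need more care):
ALTER SESSION SET skip_unusable_indexes = TRUE;  -- the default from 10g, shown here for clarity
ALTER INDEX customer_name_idx UNUSABLE;          -- hypothetical index name

-- ... run the bulk insert here ...

ALTER INDEX customer_name_idx REBUILD;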
Even with indexes, 4 minutes to insert 100,000 records seems like a problem to me.
If this database has I/O problems, you haven't fixed them and they will appear again. I would recommend that you identify the root cause.
If you post the index DDL, I'll time it for a comparison.
I added indexes on id and business_name. Doing 10 iterations in a loop, the average time per 100,000 rows was 25 seconds. This was on my home PC/server all running on a single disk.
Another trick to improve performance is to turn on caching, or set the cache higher, on your sequence (customer_seq). This will allow Oracle to keep a batch of sequence values in memory instead of hitting the data dictionary for each insert.
Be careful with this one though. In some situations this will cause your sequence to have gaps between values.
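For example:
ALTER SEQUENCE customer_seq CACHE 1000;  -- keep 1000 values at a time in memory;
                                         -- unused cached values are lost (gaps) on flush or restart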
More information here:
Oracle/PLSQL: Sequences (Autonumber)
Sorted inserts always take longer the more entries there are in the table.
You don't say which columns are indexed. If you had indexes on fax, phone or email, you would have had a LOT of duplicates (ie every row).
Oracle 'pretends' to have non-unique indexes. In reality every index entry is unique with the rowid of the actual table row being the deciding factor. The rowid is made up of the file/block/record.
It is possible that, once you hit a certain number of records, the new ones were getting rowids which meant they had to be fitted into the middle of existing index blocks, with a lot of index re-writing going on.
If you supply full table and index creation statements, others would be able to reproduce the experience which would have allowed for more evidence based responses.
I think it has to do with extending the internal structure of the data file, as well as maintaining the database indexes for the added information - I believe the database arranges the data in a non-linear fashion that helps speed up data retrieval on selects.