I have an application that do like:
delete from tableA where columnA='somevalue' and rownum<1000
In cycle like:
while(deletedRows>0) {
begin tran
deletedRows = session ... "delete from tableA where columnA='somevalue'
and rownum<1000"....
commit tran
}
It runs few times (each deleting takes near 20 seconds) and after hungs for long time
Why? Does it possible to fix?
Thanks.
The reason why the deletes are run in a loop rather than as a single SQL statement is lack of rollback space. See this question for more info.
Every time the query scans the table from the beginning. So, it scans the zones where there are no rows to delete(columnA='somevalue'). They are more and more far away from the first block of the table.
If the table is big and there would be no columnA='somevalue' the query will take the time to verify all the row for your condition.
What you can do is to make an index on columnA. In this case the engine will know faster where are the rows with that condition(search on index is exponential time faster).
Another possibility, if you are in a concurent system, is that someone updated a row that you ar trying to delete, but doesn't commited the transaction, so the row is locked.
You probably run into many different issues. As you are saying that database hungs the main reason is that your database is hitting ORA-00257 Archiver error.
Every delete produces a redo vector, all redos are then downloaded into an archive log. When archivelog space is exahausted your session hang and remain stuck until someone frees the space.
Usually your DBA has a job that run an archivelog backup every hour (this might be any couple of hours, or every 5 mins, depending by the database workload, etc...) and after the backup has done all sessions go ahead correctly.
Depending by the database configuration, from the client point of view, you might not see the error but just have the behaviour described where you session waits until the space is freed.
In term of design, I agree with other users that a DELETE in a loop is not a good idea. It could be interesting to know why you are trying to do this loop instead a single DELETE statement.
Related
An Oracle stored procedure suddenly throws ORA-01555 while executing.
select
a,b
from table1 S into a_var,b_var
where s.abc=systedate
and requiedate between add_months(sysdate,-2) and sysdate
AND Currency= NVL(currency_CODe,USD)
group by S.actcount;
table1_invoice(1)=a_var;
table1_invoice(2)=b_var;
FORALL indx in 1..test.count SAVE EXCEPTIONS
insert into table2 values
table1_invoice(indx);
When the procedure was running and using table A, I executed an index re-build in parallel on the same table.
Once that completed, I executed gather stats on table A.
Does this things create error ORA-01555? Does rebuild index consume a rollback segment and the old snapshot of uncommitted data is removed?
I have pasted dummy code.
I execute index re-build in parallel on the same table.
This is your likely cause. ORA-1555 pertains to being able to give you a consistent view of the data. For example, using your dummy code as a template:
You open your cursor at 9am
You start fetching from that cursor at 9am and lets say the total execution of the query takes 60 seconds.
So lets say you at the 40 second mark of that fetch. Because (you) reading data does not block others from changing it, you might come across some data that has been recently changed (say 3 seconds ago) by someone else.
We can't give you THAT data, because we have to show you the data as it was at 9am, (when your query started).
So we find the transaction(s) that changed that data 3 seconds ago, and use the undo information those transactions wrote to reverse out the changes. We'll continue to do that until the data now looks like it did at 9am
Now we can use that (undone) data because it is consistent with the time you opened the cursor.
So where does ORA-1555 fit in? What if our query ran for (say) an hour? Now we might need to be undo-ing other transactions that ran nearly an hour ago. There is only so much space we reserve for the undo for (completed) transactions, because we need to free it up for new transactions as they come in. We throw away the old stuff because those transactions have committed. So revisiting the processing above, the following might happen:
You open your cursor at 9am
You start fetching from that cursor at 9am and lets say the total execution of the query takes 60 seconds.
So lets say you at the 40 second mark of that fetch. Because (you) reading data does not block others from changing it, you might come across some data that has been recently changed (say 3 seconds ago) by someone else.
We can't give you THAT data, because we have to show you the data as it was at 9am, (when your query started).
So we find the transaction(s) that changed that data 3 seconds ago and use the undo information those transactions wrote to reverse out the changes.
OH NO! That undo information has been discarded!!!
Now we're stuck, because we cannot give you the data as it was at 9am anymore because we can't take some changed data all the way back to 9am. The snapshot in time of the data you want is too old.
Hence "ORA-1555: Snapshot too old"
This is why the most common solution is just to retry your operation because now you are starting your query at a more recent time.
So you can see - the more activity going on against the database from OTHER sessions at the time of your query, the greater the risk of hitting a ORA-1555 because undo space is being consumed quickly and thus we might throw away the older stuff more rapidly.
I am writing some data loading code that pulls data from a large, slow table in an oracle database. I have read-only access to the data, and do not have the ability to change indexes or affect the speed of the query in any way.
My select statement takes 5 minutes to execute and returns around 300,000 rows. The system is inserting large batches of new records constantly, and I need to make sure I get every last one, so I need to save a timestamp for the last time I downloaded the data.
My question is: If my select statement is running for 5 minutes, and new rows get inserted while the select is running, will I receive the new rows or not in the query result?
My gut tells me that the answer is 'no', especially since a large portion of those 5 minutes is just the time spent on the data transfer from the database to the local environment, but I can't find any direct documentation on the scenario.
"If my select statement is running for 5 minutes, and new rows get inserted while the select is running, will I receive the new rows or not in the query result?"
No. Oracle enforces strict isolation levels and does not permit dirty reads.
The default isolation level is Read Committed. This means the result set you get after five minutes will be identical to the one you would have got if Oracle could have delivered you all the records in 0.0000001 seconds. Anything committed after you query started running will not be included in the results. That includes updates to the records as well as inserts.
Oracle does this by tracking changes to the table in the UNDO tablespace. Provided it can restrict the original image from that data your query will run to completion; if for any reason the undo information is overwritten your query will fail with the dreaded ORA-1555: Snapshot too old. That's right: Oracle would rather hurl an exception than provide us with an inconsistent result set.
Note that this consistency applies at the statement level. If we run the same query twice within the one transaction we may see two different results sets. If that is a problem (I think not in your case) we need to switch from Read Committed to Serialized isolation.
The Concepts Manual covers Concurrency and Consistency in great depth. Find out more.
So to answer your question, take the timestamp from the time you start the select. Specifically, take the max(created_ts) from the table before you kick off the query. This should protect you from the gap Alex mentions (if records are not committed the moment they are inserted there is the potential to lose records if you base the select on comparing with the system timestamp). Although doing this means you're issuing two queries in the same transaction which means you do need Serialized isolation after all!
The situation is simple, there is a table in oracle used as a "shared table" for data exchange. The table structure and number of records remains unchanged. In normal case, I continuously update data into this table and other process read this table for current data.
Strange thing is, when my process starts, the time consumption of each update statement execution is approximately 2 ms. And after a certain peroid of time(like 8 hours), the time consumption increased to 10 ~ 20 ms per statement. It makes the procedure quite slow.
the structure of table
and the update statement is like:
anaNum = anaList.size();
qry.prepare(tr("update YC set MEAVAL=:MEAVAL, QUALITY=:QUALITY, LASTUPDATE=:LASTUPDATE where YCID=:YCID"));
foreach(STbl_ANA ana, anaList)
{
qry.bindValue(":MEAVAL",ana.meaVal);
qry.bindValue(":QUALITY",ana.quality);
qry.bindValue(":LASTUPDATE",QDateTime::fromTime_t(ana.lastUpdate));
qry.bindValue(":YCID",ana.ycId);
if(!qry.exec())
{
qWarning() << QObject::tr("update yc failed, ")
<< qry.lastError().databaseText() << qry.lastError().driverText();
failedAnaList.append(ana);
}
}
the update statement using qt interface
There is many reasons which can cause orcle opreation slowd down, but I cannot find a clue to explain this.
I never start a transaction manually in qt code, which means the commit operation is executed every time after update statement.
The update frequency is about 200 records per second, but the number is dynamically changed by time. It maybe increase to 1000 in one time and drop to 10 in next time.
once the time consumption up to 10 ~ 20 ms per statement, it'll never dorp down. time consumption can be restored to 2ms only be restart oracle service.(it's useless to shutdown or restart any user process which visit orcle)
Please tell me how to solve it or at least what to be examined.
Good starting points is to check the AWR and ASH reports.
Comparing the reports in "good" and "bad" times you can spot the cause of the change. This can be for example a change of an execution plan or increase of wait events. One possible outcome is that only change you see is that the database is waiting more time on the client (i.e. the problem is not in the DB).
Anyway as diagnosed in other answer, the root cause of problems seems to be the update in a loop. If your update lists are long (say more that 10-100 entries) you can profit by updating the whole list in a single statement using MERGE.
build a collection from your list
cast the collection as TABLE
use this table in a MERGE statement to update the rows.
See here for details.
You can trace the session while it is running quickly and again later when it is running slowly. Use the sql trace functionality and tkprof to get a breakdown of where the update is spending its time in each case and see what has changed.
https://docs.oracle.com/cd/E25178_01/server.1111/e16638/sqltrace.htm#i4640
If you need help interpreting the results you can update your question or ask a new one.
Secondly, as a rule single record updates are not the best way to do updates in Oracle. Since you have many records to update already prepared before you prepare the query, look at execBatch.
https://doc.qt.io/qt-4.8/qsqlquery.html#execBatch
This will both execute the update faster and only issue a single commit.
I have a table which has around 180 million records and 40 indexes. A nightly program, loads data into this table but due to certain business conditions we can only delete and load data into this table. The nightly program will bring new records or updates to existing records in the table from the source system.We have limited window i.e about 6 hours to complete the extract from the source system, perform business transformations and finally load the data into this target table and be ready for users to consume the data in the morning. The issue which we are facing is that the delete from this table takes a lot of time mainly due to the 40 indexes on the table(an average of 70000 deletes per hour). I did some digging on the internet and see the below options
a) Drop or disable indexes before delete and then rebuild indexes: The program which loads data into the target table after delete and loading the data needs to perform quite a few updates for which the indexes are critical. And to rebuild 1 index it takes almost 1.5 hours due to the enormous amount of data in the table. So this approach is not feasible due to the time it takes to rebuild indexes and due to the limited time we have to get the data ready for the users
b) Use bulk delete: Currently the program deletes based on rowid and deletes records one by one as below
DELETE
FROM <table>
WHERE rowid = g_wpk_tab(ln_i);
g_wpk_tab is the collection which holds rowids to be deleted which is read by looping via FOR ALL and I do an intermediate commit every 50000 row deletes.
Tom of AskTom says in this discussion over here says that the bulk delete and row by row delete will take almost the same amount of time
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:5033906925164
So this wont be a feasible option as well
c)Regular Delete: Tom of AskTom suggests to use the regular delete and even that takes a long time probably due to the number of indexes on this table
d)CTAS: This approach is out of question because the program needs to recreate the table , create the 40 indexes and then proceed with the updates and I mentioned above an index will take atleast 1.5 hrs to create
If you could provide me any other suggestions I would really appreciate it.
UPDATE: As of now we have decided to go with the approach suggested by https://stackoverflow.com/users/409172/jonearles to archive instead of delete. Approach is to add a flag to the table to mark the records to be deleted as DELETE and then have a post delete program run during the day to delete off the records. This will ensure that the data is available for users at the right time. Since users consume via OBIEE we are planning to set content level filter on the table to not look at the archival column so that users needn't know about what to select and what to ignore.
Parallel DML alter session enable parallel dml;, delete /*+ parallel */ ...;, commit;. Sometimes it's that easy.
Parallel DDL alter index your_index rebuild nologging compress parallel;. NOLOGGING to reduce the amount of redo generated during the index rebuild. COMPRESS can significantly reduce the size of a non-unique index, which significantly reduces the rebuild time. PARALLEL can also make a huge difference in rebuild time if you have more than one CPU or more than one disk. If you're not already using these options, I wouldn't be surprised if using all of them together improves index rebuilds by an order of magnitude. And then 1.5 * 40 / 10 = 6 hours.
Re-evaluate your indexes Do you really need 40 indexes? It's entirely possible, but many indexes are only created because "indexes are magic". Make sure there's a legitimate reason behind each index. This can be very difficult to do, very few people document the reason for an index. Before you ask around, you may want to gather some information. Turn on index monitoring to see which indexes are really being used. And even if the index is used, see how it is used, perhaps through v$sql_plan. It's possible that an index is used for a specific statement but another index would have worked just as well.
Archive instead of delete Instead of deleting, just set a flag to mark a row as archived, invalid, deleted, etc. This will avoid the immediate overhead of index maintenance. Ignore the rows temporarily and let some other job delete them later. The large downside to this is that it affects any query on the table.
Upgrading is probably out of the question, but 12c has an interesting new feature called in-database archiving. It's a more transparent way of accomplishing the same thing.
Advice/suggestions needed for a bit of application design.
I have an application which uses 2 tables, one is a staging table, which many separate processes write to, once a 'group' of processes has finished, another job comes along a aggregates the results together into a final table, then deletes that 'group' from the staging table.
The problem that I'm having is that when the staging table is being cleared out, lots of redo is generated and I'm seeing a lot of 'log file sync' waits in the database. This is a shared database with many other applications and this is causing some issues.
When applying the aggregate, the rows are reduced to about 1 row in the final table for every 20 rows in the staging table.
I'm thinking of getting around this by rather than having a single 'staging' table, I will create a table for each 'group'. Once done, this table can just be dropped, which should result in much less redo.
I only have SE, so partitioned tables isn't an option. Also faster disks for the redo probably isn't an option in the short term either.
Is this a bad idea? Any better solutions to be offered?
Thanks.
Would it be possible to solve the problem by having your process do a logical delete (i.e. set a DELETE_FLAG column in the table to 'Y') and then having a nightly process that truncates the table (potentially writing any non-deleted rows to a separate table before the truncate and then copy them back after the table is truncated)?
Are you certain that the source of the log file sync waits is that your disks can't keep up with the I/O? That's certainly possible, of course, but there are other possible causes of excessive log file sync waits including excessive commits. There is an excellent article on tuning log file sync events on the Pythian blog.
The most common cause of excessive log file syncs is too frequent commits, which are often deliberately coded in a mistaken attempt to reduce system load due to locking. You should commit only when your business transaction is complete.
Loading each group into a separate table sounds like a fine plan to reduce redo. You can truncate individual group table following each aggregation.
Another (but I think probably worse) option is to create a new staging table with the groups that haven't been aggregated then drop the original and rename the new table to replace the staging table.
I prefer Justin's suggestion ("logical delete"), but another option to consider might be a partitioned table, if you have the EE licence. The aggregation process could drop a partition instead of deleting the rows.