I frequently get a 'snapshot too old' error when my workflow runs for more than 5 hours. My source is Oracle and my target is Teradata. Please help me solve this issue. Thanks in advance.
The best explanation of the ORA-01555 snapshot too old error that I've read is found in this AskTom thread.
Regards.
The snapshot too old error is more or less directly related to the running time of your queries (often the cursor of a FOR loop). So the best solution is to optimize your queries so they run faster.
As a short term solution you can try to increase the size of the UNDO log.
Update:
The UNDO log stores the previous version of a record before it's updated. It is used to roll back transactions and to retrieve older versions of records, so that long-running queries see a consistent snapshot of the data.
You'll probably need to dive into Oracle DB administration if you want to solve it by increasing the UNDO log. Basically, you run (as SYSDBA):
ALTER SYSTEM SET UNDO_RETENTION = 21600;
21600 is 6 hours in seconds.
However, Oracle will only keep 6 hours of old data if the UNDO log files are big enough, which depends on the size of the rollback segments and the amount of updates executed on the database.
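To get a feel for the numbers, here is a rough sizing sketch in Python. It follows the rule of thumb from Oracle's undo sizing guidance, undo space = UNDO_RETENTION (seconds) × undo blocks generated per second × block size. The 100 blocks/s rate and 8 KB block size below are made-up example inputs, not measurements; on a real system the undo generation rate would come from V$UNDOSTAT.

```python
# Rough undo tablespace sizing, following the rule of thumb
#   undo bytes = UNDO_RETENTION (s) * undo blocks per second * block size (bytes)
# The rate and block size used in the example are hypothetical values.


def hours_to_seconds(hours: int) -> int:
    """Convert an UNDO_RETENTION target in hours to seconds."""
    return hours * 3600


def undo_bytes_needed(retention_s: int, undo_blocks_per_s: int, block_size: int) -> int:
    """Undo space needed to keep retention_s seconds of before-images."""
    return retention_s * undo_blocks_per_s * block_size


if __name__ == "__main__":
    retention = hours_to_seconds(6)                   # 21600, as in the ALTER SYSTEM above
    needed = undo_bytes_needed(retention, 100, 8192)  # assumed 100 blocks/s, 8 KB blocks
    print(retention, needed, f"{needed / 2**30:.1f} GiB")
```

The required space grows linearly with both the retention target and the update rate, which is why reducing concurrent updates helps as much as raising the retention.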
So in addition to changing the undo retention time, you should also make sure that few concurrent updates are executed while your job is running. In particular, updates of the data your job is reading should be minimized.
If all else fails, increase the UNDO logs.
Related
The situation is simple: there is a table in Oracle used as a "shared table" for data exchange. The table structure and the number of records remain unchanged. Normally, I continuously update data in this table and another process reads it for current data.
The strange thing is that when my process starts, each update statement takes approximately 2 ms to execute. After a certain period of time (around 8 hours), this increases to 10 to 20 ms per statement, which makes the procedure quite slow.
The update statement looks like this (the table structure itself is not reproduced here):
anaNum = anaList.size();
qry.prepare(tr("update YC set MEAVAL=:MEAVAL, QUALITY=:QUALITY, LASTUPDATE=:LASTUPDATE where YCID=:YCID"));
foreach(STbl_ANA ana, anaList)
{
    qry.bindValue(":MEAVAL", ana.meaVal);
    qry.bindValue(":QUALITY", ana.quality);
    qry.bindValue(":LASTUPDATE", QDateTime::fromTime_t(ana.lastUpdate));
    qry.bindValue(":YCID", ana.ycId);
    if(!qry.exec())
    {
        qWarning() << QObject::tr("update yc failed, ")
                   << qry.lastError().databaseText() << qry.lastError().driverText();
        failedAnaList.append(ana);
    }
}
The update statement is executed through the Qt interface.
There are many reasons that can slow down Oracle operations, but I cannot find a clue that explains this one.
I never start a transaction manually in the Qt code, which means a commit is executed after every update statement.
The update frequency is about 200 records per second, but it varies over time: it may rise to 1000 at one moment and drop to 10 the next.
Once the time reaches 10 to 20 ms per statement, it never drops back down. It can only be restored to 2 ms by restarting the Oracle service (shutting down or restarting any of the user processes that access Oracle does not help).
Please tell me how to solve it, or at least what I should examine.
Good starting points are the AWR and ASH reports.
Comparing the reports from "good" and "bad" times, you can spot the cause of the change. This can be, for example, a change of an execution plan or an increase in wait events. One possible outcome is that the only change you see is that the database spends more time waiting on the client (i.e. the problem is not in the DB).
Anyway, as diagnosed in the other answer, the root cause of the problem seems to be the update in a loop. If your update lists are long (say, more than 10-100 entries), you can benefit from updating the whole list in a single statement using MERGE:
build a collection from your list
cast the collection as TABLE
use this table in a MERGE statement to update the rows.
See here for details.
You can trace the session while it is running quickly and again later when it is running slowly. Use the sql trace functionality and tkprof to get a breakdown of where the update is spending its time in each case and see what has changed.
https://docs.oracle.com/cd/E25178_01/server.1111/e16638/sqltrace.htm#i4640
If you need help interpreting the results you can update your question or ask a new one.
Second, as a rule, single-record updates are not the most efficient way to update data in Oracle. Since all the records to update are already prepared before you prepare the query, look at execBatch.
https://doc.qt.io/qt-4.8/qsqlquery.html#execBatch
This will both execute the update faster and only issue a single commit.
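As a hedged illustration of the batching idea, here is a Python sketch that uses the standard-library sqlite3 module as a stand-in for Oracle: all rows go through one prepared UPDATE via executemany() and are committed once. In Qt the analogous pattern is binding one QVariantList per placeholder and calling QSqlQuery::execBatch(). The toy table below only mimics the YC columns from the question; create_yc and batch_update are made-up helper names.

```python
import sqlite3


def create_yc(conn: sqlite3.Connection, n_rows: int) -> None:
    """Create a toy YC table (columns borrowed from the question) with n_rows rows."""
    conn.execute(
        "CREATE TABLE YC (YCID INTEGER PRIMARY KEY, MEAVAL REAL, "
        "QUALITY INTEGER, LASTUPDATE TEXT)"
    )
    conn.executemany(
        "INSERT INTO YC VALUES (?, 0.0, 0, '')", [(i,) for i in range(n_rows)]
    )
    conn.commit()


def batch_update(conn: sqlite3.Connection, rows) -> int:
    """Apply all (meaval, quality, lastupdate, ycid) tuples in one transaction."""
    with conn:  # commits once on success, rolls back if any row fails
        cur = conn.executemany(
            "UPDATE YC SET MEAVAL = ?, QUALITY = ?, LASTUPDATE = ? WHERE YCID = ?",
            rows,
        )
    return cur.rowcount  # for executemany, sqlite3 sums the per-row counts


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_yc(conn, 1000)
    updates = [(1.5, 1, "2024-01-01T00:00:00", i) for i in range(200)]
    print(batch_update(conn, updates))
```

One trade-off to keep in mind: a failure now rolls back the whole batch, whereas the original loop could collect individual failures in failedAnaList.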
I am running into ORA-01555: snapshot too old errors with Oracle 9i but am not running any updates with this application at all.
The error occurs after the application has been connected for some hours without any queries; then every query (which would otherwise return in under a second) comes back with an ORA-01555: snapshot too old: rollback segment number 6 with name "_SYSSMU6$" too small.
Could this be caused by the transaction isolation being set to TRANSACTION_SERIALIZABLE? Or by some other bug in the JDBC code? This could be caused by a bug in the jdbc-go driver, but everything I've read about this bug has led me to believe that it would not occur in scenarios where no DML statements are made.
Below is a very good insight into this error by Tom Kyte. The problem in your case may come from what is called 'delayed block cleanout', a case where selects generate redo. However, the root cause is almost certainly improperly sized rollback segments (though Tom also lists correlated causes: committing too frequently, a very large read after many updates, etc.).
Snapshot too old error (Tom Kyte)
When you run a query on an Oracle database the result will be what Oracle calls a "Read consistent snapshot".
What it means is that all the data items in the result will be represented with their values as of the time the query was started.
To achieve this the DBMS looks into the rollback segments to get the original value of items which have been updated since the start of the query.
The DBMS uses the rollback segment in a circular way and will eventually wrap around - overwriting the old data.
If your query needs data that is no longer available in the rollback segment you will get "snapshot too old".
This can happen if your query is running for a long time on data being concurrently updated.
You can prevent it by either extending your rollback segments or avoiding running the query concurrently with heavy updaters.
I also believe newer versions of Oracle provide better dynamic management of rollback segments than Oracle 9i does.
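The mechanism described above can be illustrated with a toy Python model (a simplification for intuition, not how Oracle is implemented internally): a fixed-size circular undo buffer holds before-images, and a reader reconstructs a row as of the change number at which its query started by walking back through them. When heavy update traffic wraps the buffer, the before-image the reader needs is gone and the read fails, which is the analogue of ORA-01555. The class and exception names are made up, and only updates to rows that existed when the snapshot was taken are modelled.

```python
from collections import deque
from dataclasses import dataclass


class SnapshotTooOld(Exception):
    """Toy stand-in for ORA-01555."""


@dataclass(frozen=True)
class UndoRecord:
    scn: int        # change number of the update that produced this before-image
    key: str        # row that was changed
    old_value: int  # value before the update
    old_scn: int    # change number at which old_value was written


class ToyDatabase:
    def __init__(self, undo_capacity: int) -> None:
        self.scn = 0
        self.rows = {}  # key -> (current value, scn of last change)
        # A deque with maxlen discards its oldest entry when full: a circular buffer.
        self.undo = deque(maxlen=undo_capacity)

    def update(self, key: str, value: int) -> None:
        self.scn += 1
        if key in self.rows:  # save the before-image of an existing row
            old_value, old_scn = self.rows[key]
            self.undo.append(UndoRecord(self.scn, key, old_value, old_scn))
        self.rows[key] = (value, self.scn)

    def read_as_of(self, key: str, snapshot_scn: int) -> int:
        """Return the row's value as it was at snapshot_scn (read consistency)."""
        value, scn = self.rows[key]
        while scn > snapshot_scn:  # row changed after the query started
            record = next(
                (r for r in self.undo if r.key == key and r.scn == scn), None
            )
            if record is None:  # before-image already overwritten
                raise SnapshotTooOld(f"undo for {key!r} at scn {scn} is gone")
            value, scn = record.old_value, record.old_scn
        return value


if __name__ == "__main__":
    db = ToyDatabase(undo_capacity=3)
    db.update("a", 1)
    db.update("b", 10)
    snapshot = db.scn        # a long-running query starts here
    db.update("a", 2)        # concurrent update; before-image kept in undo
    print(db.read_as_of("a", snapshot))  # consistent read still sees 1
    for i in range(5):       # heavy churn on another row wraps the buffer
        db.update("b", 100 + i)
    try:
        db.read_as_of("a", snapshot)
    except SnapshotTooOld as exc:
        print("snapshot too old:", exc)
```

Raising undo_capacity (bigger rollback segments) or reducing the churn between the start of the query and the read both make the failing read succeed, matching the two remedies above.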
I have a stored proc that does a very large update. Sometimes the job failed with error ORA-30036: unable to extend segment by 8 in undo tablespace 'undotbs2'.
But after a few hours, we reran the job and it completed successfully.
I checked and found that undotbs2 already has AUTOEXTENSIBLE set to YES and a size of 3 GB, so I guess the undo tablespace already has a pretty decent size and automatic space management turned on.
My question is, why does it complete successfully after we rerun it? Is it because there were other transactions using undotbs2 at the same time? For this error, Oracle mentions "An alternative is to wait until active transactions to commit."; does "active transactions" refer to other transactions/SQL that happened to run alongside the stored proc?
Oracle version is 11.2.0.1.0
Thank you
It looks like your UNDO tablespace has reached its MAXSIZE. This can happen if a lengthy transaction runs alongside other lengthy transactions.
The UNDO tablespace is used by Oracle to keep the information required to restore data if your transaction issues a ROLLBACK. This means its usage depends on how many transactions are active at any given moment, and on how much data each of them changes.
The resulting usage/size of the tablespace can - as you have experienced - be pretty random.
A solution might be to:
increase the MAXSIZE of the UNDO tablespace so it can handle the amount of information your lengthy transaction produces
modify your implementation so it issues COMMITs every now and then, so the UNDO information for your lengthy transaction can be freed.
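The second option, committing in chunks, can be sketched in Python with the standard-library sqlite3 module standing in for Oracle; the table big_table and its processed flag are hypothetical. Each iteration updates at most chunk_size rows and commits, so no single transaction accumulates the undo for the whole job. Committing too often has its own cost, though: as the ORA-01555 discussion above notes, very frequent commits are one of the correlated causes of snapshot too old errors for long-running readers.

```python
import sqlite3


def update_in_chunks(conn: sqlite3.Connection, chunk_size: int) -> int:
    """Set processed = 1 on every row, committing after each chunk.

    Returns the number of non-empty chunks committed. Because unfinished rows
    are found via the processed flag, a failed run can simply be restarted.
    """
    chunks = 0
    while True:
        cur = conn.execute(
            "UPDATE big_table SET processed = 1 WHERE rowid IN "
            "(SELECT rowid FROM big_table WHERE processed = 0 LIMIT ?)",
            (chunk_size,),
        )
        conn.commit()  # end this chunk's transaction before starting the next
        if cur.rowcount == 0:
            return chunks
        chunks += 1


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE big_table (id INTEGER PRIMARY KEY, processed INTEGER)")
    conn.executemany("INSERT INTO big_table VALUES (?, 0)", [(i,) for i in range(2500)])
    conn.commit()
    print(update_in_chunks(conn, 1000))  # 3 chunks: 1000 + 1000 + 500 rows
```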
I have an application that does something like:
delete from tableA where columnA='somevalue' and rownum<1000
in a loop like:
while (deletedRows > 0) {
    begin tran
    deletedRows = session ... "delete from tableA where columnA='somevalue' and rownum<1000" ...
    commit tran
}
It runs a few times (each delete takes about 20 seconds) and then hangs for a long time.
Why? Is it possible to fix?
Thanks.
The reason why the deletes are run in a loop rather than as a single SQL statement is lack of rollback space. See this question for more info.
Every time, the query scans the table from the beginning, so it rescans the areas where no rows matching columnA='somevalue' remain. The remaining matching rows are further and further away from the first block of the table.
If the table is big and contains no more rows with columnA='somevalue', the query will still spend time checking every row against your condition.
What you can do is create an index on columnA. The engine will then find the rows matching that condition much faster (the cost of an index lookup grows logarithmically with the table size, while a full scan grows linearly).
Another possibility, if you are in a concurrent system, is that someone updated a row that you are trying to delete but hasn't committed the transaction, so the row is locked.
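To see why the loop keeps slowing down, here is a toy Python cost model of the argument (an illustration, not a measurement of Oracle): the table is a list of rows in block order, each DELETE ... AND ROWNUM < 1000 pass scans from the first row until it has found its batch of 999 rows, and rows emptied by earlier passes still have to be read. The index variant is modelled as touching only matching rows. The function names are made up for the example.

```python
def full_scan_cost(matches: list[bool], batch: int) -> int:
    """Rows examined by the delete loop when every pass is a full scan from the start."""
    alive = list(matches)  # True = row still satisfies columnA = 'somevalue'
    examined = 0
    while True:
        deleted = 0
        for pos, is_match in enumerate(alive):
            examined += 1  # already-emptied rows are still read on every pass
            if is_match:
                alive[pos] = False
                deleted += 1
                if deleted == batch:  # ROWNUM limit reached: this pass stops
                    break
        if deleted == 0:  # final pass found nothing left: loop ends
            return examined


def index_cost(matches: list[bool], batch: int) -> int:
    """Rows examined when an index on columnA leads straight to matching rows."""
    remaining = sum(matches)
    examined = 0
    while True:
        taken = min(batch, remaining)
        examined += taken
        remaining -= taken
        if taken == 0:
            return examined


if __name__ == "__main__":
    table = [True] * 10_000  # 10,000 matching rows
    print(full_scan_cost(table, 999), index_cost(table, 999))
```

With 10,000 matching rows the full-scan loop examines 74,945 rows against 10,000 via the index, and the gap grows roughly quadratically with the number of matching rows.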
You are probably running into several different issues. Since you say the database hangs, the main reason is likely that your database is hitting ORA-00257, the archiver error.
Every delete produces redo, and all redo is then written to archive logs. When the archive log space is exhausted, your session hangs and stays stuck until someone frees the space.
Usually your DBA has a job that runs an archive log backup every hour (it might be every couple of hours, or every 5 minutes, depending on the database workload, etc.), and once the backup has completed, all sessions proceed normally.
Depending on the database configuration, from the client's point of view you might not see the error at all, just the behaviour described, where your session waits until the space is freed.
In terms of design, I agree with the other users that a DELETE in a loop is not a good idea. It would be interesting to know why you are running this loop instead of a single DELETE statement.
I need your opinion in this situation. I’ll try to explain the scenario. I have a Windows service that stores data in an Oracle database periodically. The table where this data is being stored is partitioned by date (Interval-Date Range Partitioning). The database also has a dbms_scheduler job that, among other operations, truncates and drops older partitions.
This approach has been working for some time, but recently I had an ORA-00054 error. After some investigation, the error was reproduced with the following steps:
1. Open one sqlplus session, disable auto-commit, and insert data into the partitioned table without committing the changes;
2. Open another sqlplus session and truncate/drop an old partition (DDL operations are automatically committed, if I’m not mistaken). We then get the ORA-00054 error.
There are some constraints worth mentioning:
I don’t have DBA access to the database;
This is a legacy application and a complete refactoring isn’t feasible.
So, in your opinion, is there any way of dropping these old partitions, without the risk of running into an ORA-00054 error and without the intervention of the DBA? I can just delete the data, but the number of empty partitions will grow everyday.
Many thanks in advance.
This error means somebody (or something) is working with the data in the partition you are trying to drop. That is, the lock is granted at the partition level. If nobody was using the partition your job could drop it.
Now you say this is a legacy app and you don't want to, or can't, refactor it. Fair enough. But there is clearly something not right if you have a process which is zapping data that some other process is using. I don't agree with #tbone's suggestion of just looping until the lock is released: you can't just get rid of data which somebody is using without establishing why they are still working with data that they apparently should not be using.
So, the first step is to find out what the locking session is doing. Why are they still amending this data your background job wants to retire? Here's a script which will help you establish which session has the lock.
Except that you "don't have DBA access to the database". Hmmm, that's a curly one. Basically this is not a problem which can be resolved without DBA access.
It seems like you have several issues to deal with. Unfortunately for you, they are political and architectural rather than technical, and there's not much we can do to help you further.
How about wrapping the truncate or drop in PL/SQL that tries the operation in a loop, waiting x seconds between tries, up to a maximum number of tries? Then use dbms_scheduler to call that procedure/function.
Maybe this can help. It seems to be the same issue as the one you describe.
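The retry idea is sketched below in Python rather than PL/SQL so it runs without a database. ResourceBusy is a hypothetical stand-in for ORA-00054, and operation is whatever callable issues the TRUNCATE/DROP; in PL/SQL the same shape would be a loop around EXECUTE IMMEDIATE with an exception handler bound to ORA-00054 (for example via PRAGMA EXCEPTION_INIT) and a sleep between attempts. The sleep function is injectable so the loop can be tested without actually waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ResourceBusy(Exception):
    """Hypothetical stand-in for ORA-00054: resource busy."""


def run_with_retries(
    operation: Callable[[], T],
    max_tries: int,
    wait_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation until it succeeds or max_tries attempts have failed."""
    for attempt in range(1, max_tries + 1):
        try:
            return operation()
        except ResourceBusy:
            if attempt == max_tries:
                raise  # give up and let the scheduler job report the failure
            sleep(wait_seconds)  # wait before the next attempt
    raise ValueError("max_tries must be at least 1")


if __name__ == "__main__":
    attempts = []

    def drop_old_partition() -> str:  # hypothetical: busy twice, then succeeds
        attempts.append(1)
        if len(attempts) < 3:
            raise ResourceBusy("ORA-00054")
        return "dropped"

    print(run_with_retries(drop_old_partition, max_tries=5, wait_seconds=0.1))
```

As the answer above argues, retrying only works around the lock; it does not answer why another session is still using data that is being retired.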
(ignore the comic sans, if you can) :)