ORA-30036 happens intermittently - oracle

I have a stored proc that does a very large update. Sometimes the job fails with the error ORA-30036: unable to extend segment by 8 in undo tablespace 'undotbs2'.
But after a few hours, we reran the job and it completed successfully.
I checked and found that undotbs2 already has AUTOEXTENSIBLE set to YES and its size is 3 GB, so I figure the undo tablespace is already a pretty decent size and already has automatic extension turned on.
My question is: why does it complete successfully when we rerun it? Is it because there were other transactions using undotbs2 at the same time? For this error, Oracle mentions "An alternative is to wait until active transactions to commit." Does "active transactions" refer to other transactions/SQL that happened to be running besides the stored proc?
Oracle version is 11.2.0.1.0
Thank you

It looks like your UNDO tablespace has reached its MAXSIZE. This can happen if a lengthy transaction is going on together with other lengthy transactions.
The UNDO tablespace is used by Oracle to keep the information required to restore data if your transaction issues a ROLLBACK. That means its usage depends on how many active transactions there are at any given moment, and on how much data each of them changes.
The resulting usage/size of the tablespace can - as you have experienced - be pretty random.
A solution might be to:
increase the MAXSIZE of the UNDO tablespace so it can handle the amount of undo your lengthy transaction produces (see the sketch after this list);
modify your implementation so it issues COMMITs every now and then, so the undo information for your lengthy transaction can be freed.
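A rough sketch of how to check and raise the limit (the tablespace name, datafile path and sizes below are assumptions for illustration, and this normally requires DBA privileges):

-- check the current size and autoextend limit of the undo datafiles
SELECT file_name,
       bytes/1024/1024    AS size_mb,
       autoextensible,
       maxbytes/1024/1024 AS max_mb
FROM   dba_data_files
WHERE  tablespace_name = 'UNDOTBS2';

-- raise the limit so a long transaction has room to grow
ALTER DATABASE DATAFILE '/u01/oradata/orcl/undotbs02.dbf'
  AUTOEXTEND ON NEXT 100M MAXSIZE 10G;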

Related

How to cause a "ORA-01555: snapshot too old error" without updates

I am running into ORA-01555: snapshot too old errors with Oracle 9i but am not running any updates with this application at all.
The error occurs after the application has been connected for some hours without issuing any queries; then every query (which would otherwise complete in under a second) comes back with ORA-01555: snapshot too old: rollback segment number 6 with name "_SYSSMU6$" too small.
Could this be caused by the transaction isolation level being set to TRANSACTION_SERIALIZABLE? Or some other bug in the JDBC code? It could be caused by a bug in the jdbc-go driver, but everything I've read about that bug leads me to believe it would not occur in scenarios where no DML statements are issued.
Read below for a very good insight into this error from Tom Kyte. The problem in your case may come from what is called 'delayed block cleanout', a case where SELECTs create redo. However, the root cause is almost certainly improperly sized rollback segments (though Tom adds correlated causes: committing too frequently, a very large read after many updates, etc.).
snapshot too old error (Tom Kyte)
When you run a query on an Oracle database the result will be what Oracle calls a "Read consistent snapshot".
What it means is that all the data items in the result will be represented with the value as of the time the query was started.
To achieve this the DBMS looks into the rollback segments to get the original value of items which have been updated since the start of the query.
The DBMS uses the rollback segment in a circular way and will eventually wrap around - overwriting the old data.
If your query needs data that is no longer available in the rollback segment you will get "snapshot too old".
This can happen if your query is running for a long time on data being concurrently updated.
You can prevent it either by extending your rollback segments or by avoiding running the query concurrently with heavy updaters.
I also believe newer versions of Oracle provide better dynamic management of rollback segments than Oracle 9i does.
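If you want a rough picture of how much undo history the instance is actually keeping (assuming you can query the V$ views), something like this helps:

SELECT begin_time,
       end_time,
       undoblks,      -- undo blocks consumed in the interval
       maxquerylen,   -- longest running query, in seconds
       ssolderrcnt    -- ORA-01555 errors raised in the interval
FROM   v$undostat
ORDER BY begin_time;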

What's the difference between Oracle Flashback Database and the Oracle (guaranteed) Restore Points?

My team is planning a very large set of updates to our apps soon, including some hefty DB updates (Oracle 11gR2). As I was writing scripts that would revert all the DB updates (as a rollback contingency) and researching potential Oracle features, I came across this Oracle documentation. I see that flashbacks use "flashback logs" to restore the DB to a specific state. I also see that the restore points use the system change number to bookmark the DB.
This SO question says flashback will "return a table to the state it was in 10 minutes ago", but does that mean the data will be reverted too? (We have lots of reference tables as well.)
Would either of these Oracle features be useful to undo our DB updates while maintaining the integrity of our production data? It's unclear to me what the two features do in practice and how they are different.
The main difference is that flashback rolls back changes, including changes made by others, across the whole table or database, to any point in time in the past within the range of your flashback settings. Rolling back to a restore point will only roll back what you changed in your transaction; changes made by others won't be affected.
When you create a guaranteed restore point, Oracle will keep enough flashback logs to flash the database back to that restore point.
Guaranteed restore points must be dropped manually using the DROP RESTORE POINT statement; they do not expire. If you do not drop them, the flash recovery area will grow indefinitely until the filesystem or disk group becomes full...
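As a rough sketch of that workflow (the restore point name is just an example, and FLASHBACK DATABASE requires the database to be mounted, not open):

-- before the risky change
CREATE RESTORE POINT before_app_upgrade GUARANTEE FLASHBACK DATABASE;

-- if the change has to be backed out
SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
FLASHBACK DATABASE TO RESTORE POINT before_app_upgrade;
ALTER DATABASE OPEN RESETLOGS;

-- afterwards, so the flashback logs can be reclaimed
DROP RESTORE POINT before_app_upgrade;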
Flashback database to restore point

Snapshot too old error

I am frequently getting a 'snapshot too old' error while running my workflow when it runs for more than 5 hours. My source is Oracle and my target is Teradata. Please help me solve this issue. Thanks in advance.
The best explanation of the ORA-01555 snapshot too old error that I've read is found in this AskTom thread
Regards.
The snapshot too old error is more or less directly related to the running time of your queries (often a cursor of a FOR loop). So the best solution is to optimize your queries so they run faster.
As a short term solution you can try to increase the size of the UNDO log.
Update:
The UNDO log stores the previous version of a record before it is updated. It is used to roll back transactions and to retrieve older versions of a record, providing consistent data snapshots for long-running queries.
You'll probably need to dive into Oracle DB administration if you want to solve it via increasing the UNDO log. Basically you do (as SYSDBA):
ALTER SYSTEM SET UNDO_RETENTION = 21600;
21600 is 6 hours in seconds.
However, Oracle will only keep 6 hours of old data if the UNDO log files are big enough, which depends on the size of the rollback segments and the amount of updates executed on the database.
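If honouring the retention period matters more to you than the risk of new transactions failing for lack of undo space, one further option (assuming UNDOTBS1 is your undo tablespace; check with your DBA first) is to put the tablespace into guaranteed retention mode:

ALTER TABLESPACE undotbs1 RETENTION GUARANTEE;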
So in addition to changing the undo retention time, you should also make sure that few concurrent updates are executed while your job is running. In particular, updates of the data your job is reading should be minimized.
If everything fails, increase the UNDO logs.

Does using NOLOGGING in Oracle break ACID? Specifically during a power outage

When using NOLOGGING in Oracle, say for inserting new records, will my database be able to recover gracefully from a power outage if it randomly went down during the insert?
Am I correct in stating that the UNDO logs will be used for such recoveries ... as opposed to the REDO logs, which would be used for recovery if the main datafiles were physically corrupted?
It seems to me, you're muddling some concepts together here.
First, let's talk about instance recovery. Instance recovery is what happens following a database crash, whether it was killed, the server went down, etc. On instance startup, Oracle will read data from the redo logs and roll forward, writing all pending changes to the datafiles. Next, it will read undo, determine which transactions were not committed, and use the data in undo to roll back any changes that had not committed up to the time of the crash. In this way, Oracle guarantees to have recovered up to the last committed transaction.
Now, as to direct loads and NOLOGGING. It's important to note that NOLOGGING is only valid for direct loads. This means that updates and deletes are never NOLOGGING, and that an INSERT is only NOLOGGING if you specify the APPEND hint.
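For illustration (the table names here are made up), a direct-path insert looks like this:

ALTER TABLE sales_staging NOLOGGING;

-- the APPEND hint makes this a direct load, so it honours the
-- table's NOLOGGING attribute
INSERT /*+ APPEND */ INTO sales_staging
SELECT * FROM sales_external;
COMMIT;

-- a conventional insert (no APPEND hint) is always fully logged,
-- regardless of the table's NOLOGGING attribute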
It's important to understand that when you do a direct load, you are literally "directly loading" data into the datafiles. So, no need to worry about issues around instance recovery, etc. When you do a NOLOGGING direct load, data is still written directly to the datafiles.
It goes something like this. You do a direct load (for now, let's set aside the issue of NOLOGGING), and data is loaded directly into the datafiles. The way that happens is that Oracle allocates storage from above the high water mark (HWM), and formats and loads those brand new blocks directly. When that block allocation is made, the data dictionary updates that describe the space allocation are written to and protected by redo. Then, when your transaction commits, the changes become permanent.
Now, in the event of an instance crash, either the transaction was committed (in which case the data is in the datafiles and the data dictionary reflects those new extents have been allocated), or it was not committed, and the table looks exactly like it did before the direct load began. So, again, data up to and including the last committed transaction is recovered.
Now, NOLOGGING. Whether a direct load is logged or not, is irrelevant for the purposes of instance recovery. It will only come into play in the event of media failure and media recovery.
If you have a media failure, you'll need to recover from backup. So, you'll restore the corrupted datafile and then apply redo, from archived redo logs, to "play back" the transactions that occurred from the time of the backup to the current point in time. As long as all the changes were logged, this is not a problem, as all the data is there in the redo logs. However, what will happen in the event of a media failure subsequent to a NOLOGGING direct load?
Well, when the redo is applied to the segments that were loaded with NOLOGGING, the required data is not in the redo. The data dictionary transactions that I mentioned, which created the new extents where data was loaded, are in the redo, but there is nothing to populate those blocks. So, the extents are allocated to the segment, but they are also marked as invalid. If/when you attempt to select from the table and hit those invalid blocks, you'll get ORA-26040 "data was loaded using the NOLOGGING option". This is Oracle letting you know you have data corruption caused by recovery through a NOLOGGING operation.
So, what to do? Well, first off, any time you load data with NOLOGGING, make sure you can re-run the load if necessary. That way, if you suffer an instance failure during the load, you can restart the load, and if you suffer a media failure between the time of the NOLOGGING load and the next backup, you can re-run the load.
Note that, in the event of a NOLOGGING direct load, you're only exposed to data loss until your next backup of the datafiles/tablespaces containing the segments that had the direct load. Once it's protected by backup, you're safe.
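One way to see which datafiles are exposed in this window (i.e. contain changes that redo cannot reproduce) and therefore need a fresh backup:

SELECT file#,
       unrecoverable_change#,
       unrecoverable_time
FROM   v$datafile
WHERE  unrecoverable_time IS NOT NULL;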
Hope this helps clarify the ideas around direct loads, NOLOGGING, instance recovery, and media recovery.
If you use NOLOGGING, you are saying you don't care about the data. NOLOGGING operations should be recoverable with procedures other than the regular database recovery procedures. Many times the recovery will happen without problems. The problem is when you have a power failure on the storage. In that case you might end up corrupting the online redo log that was active, and because of that also have problems with corrupt undo segments.
So, specifically in your case: I would not bet on it.
Yes, much of the recovery would be done by reading undo, and that might get stuck because of exactly the situation you described. That is one of the nastiest problems to recover from.
To be 100% ACID compliant, a DBMS needs to be serializable; this is very rare even amongst major vendors. To be serializable, read, write and range locks need to be released at the end of a transaction. There are no read locks in Oracle, so Oracle is not 100% ACID compliant.

Dropping a table partition while avoiding the error ORA-00054

I need your opinion in this situation. I’ll try to explain the scenario. I have a Windows service that stores data in an Oracle database periodically. The table where this data is being stored is partitioned by date (Interval-Date Range Partitioning). The database also has a dbms_scheduler job that, among other operations, truncates and drops older partitions.
This approach has been working for some time, but recently I had an ORA-00054 error. After some investigation, the error was reproduced with the following steps:
Open one sqlplus session, disable auto-commit, and insert data into the partitioned table, without committing the changes;
Open another sqlplus session and truncate/drop an old partition (DDL operations are automatically committed, if I'm not mistaken). We will then get the ORA-00054 error.
There are some constraints worth mentioning:
I don’t have DBA access to the database;
This is a legacy application and a complete refactoring isn't feasible.
So, in your opinion, is there any way of dropping these old partitions without the risk of running into an ORA-00054 error and without the intervention of the DBA? I can just delete the data, but the number of empty partitions will grow every day.
Many thanks in advance.
This error means somebody (or something) is working with the data in the partition you are trying to drop. That is, the lock is granted at the partition level. If nobody was using the partition your job could drop it.
Now you say this is a legacy app and you don't want to, or can't, refactor it. Fair enough. But there is clearly something not right if you have a process which is zapping data that some other process is using. I don't agree with #tbone's suggestion of just looping until the lock is released: you can't just get rid of data which somebody is using without establishing why they are still working with data that they apparently should not be using.
So, the first step is to find out what the locking session is doing. Why are they still amending this data your background job wants to retire? Here's a script which will help you establish which session has the lock.
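That script isn't reproduced here, but as a rough sketch, this is the kind of query it runs (it needs access to the V$ and DBA views, and the table name is just an example):

SELECT s.sid,
       s.serial#,
       s.username,
       s.program,
       o.object_name,
       o.subobject_name AS partition_name
FROM   v$locked_object l
JOIN   v$session       s ON s.sid = l.session_id
JOIN   dba_objects     o ON o.object_id = l.object_id
WHERE  o.object_name = 'MY_PARTITIONED_TABLE';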
Except that you "don't have DBA access to the database". Hmmm, that's a curly one. Basically this is not a problem which can be resolved without DBA access.
It seems like you have several issues to deal with. Unfortunately for you, they are political and architectural rather than technical, and there's not much we can do to help you further.
How about wrapping the truncate or drop in PL/SQL that tries the operation in a loop, waiting a few seconds between tries, up to a maximum number of tries, as sketched below? Then use dbms_scheduler to call that procedure/function.
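A rough sketch of that idea (the table and partition names are examples, and DBMS_LOCK.SLEEP needs an EXECUTE grant on DBMS_LOCK, which your DBA may have to provide):

DECLARE
  e_resource_busy EXCEPTION;
  PRAGMA EXCEPTION_INIT(e_resource_busy, -54);
  l_max_tries CONSTANT PLS_INTEGER := 10;
BEGIN
  FOR i IN 1 .. l_max_tries LOOP
    BEGIN
      EXECUTE IMMEDIATE
        'ALTER TABLE my_partitioned_table DROP PARTITION p_2015_01';
      EXIT;                       -- success, stop retrying
    EXCEPTION
      WHEN e_resource_busy THEN
        IF i = l_max_tries THEN
          RAISE;                  -- give up after the last attempt
        END IF;
        DBMS_LOCK.SLEEP(5);       -- wait five seconds, then retry
    END;
  END LOOP;
END;
/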
Maybe this can help. It seems to be the same issue as the one that you describe.
(ignore the comic sans, if you can) :)
