My app needs to recover automatically from failures. I test it as follows:
Start app
In the middle of processing, kill the application server host (shutdown -r -f)
On host reboot, application server restarts (as a windows service)
Application restarts
Application tries to process, but is blocked by an incomplete 2-phase commit transaction in the Oracle DB left over from the previous session.
Somewhere between 10 and 30 minutes later, the DB resolves the prior txn and processing continues OK.
I need it to continue processing faster than this. My DBA advises that I should prefix my statement with
ALTER SESSION ADVISE COMMIT;
But he can't give me guarantees or details about the potential for data loss in doing this.
Luckily, the statement in question simply updates a datetime value to SYSDATE every second or so, so if there were some data corruption it would last less than a second before being overwritten.
But, to my question: what exactly does the statement above do? How does Oracle resolve data synchronisation issues when it is used?
Can you clarify the role of the 'local' and 'remote' databases in your scenario?
Generally a multi-db transaction does the following:
1. Starts the transaction
2. Makes a change on one database
3. Makes a change on the other database
4. Gets the other database to 'promise to commit'
5. Commits locally
6. Gets the remote db to commit
In-doubt transactions happen if step 4 has completed and then something fails. The general practice is to get the remote database back up and confirm whether it committed. If so, step 5 goes ahead; if the remote component of the transaction can't be committed, the local component is rolled back.
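To make that concrete, here is a minimal sketch of how a DBA can inspect and manually resolve an in-doubt transaction on the Oracle side (this assumes access to the DBA views; the transaction ID is a placeholder):
-- List in-doubt distributed transactions; the ADVICE column holds
-- the hint recorded by ALTER SESSION ADVISE COMMIT / ROLLBACK.
SELECT local_tran_id, state, advice FROM dba_2pc_pending;
-- Force the outcome by hand ('1.23.456' is a placeholder
-- local_tran_id taken from the query above).
COMMIT FORCE '1.23.456';
-- or, if the remote side did not commit:
ROLLBACK FORCE '1.23.456';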
Your description seems to refer to an app server failure, which is a different kettle of fish. In your case, I think the scenario is as follows:
App server takes a connection and starts a transaction
App server dies without committing
App server restarts and makes a new database connection
App server starts a new transaction on the new connection
New transaction gets 'stuck' waiting for a lock held by the old connection/transaction
After 20 minutes, the dead connection is terminated and its transaction rolled back
New transaction then continues
In which case the solution is to kill off the old connection quicker, either with a shorter timeout (e.g. SQLNET.EXPIRE_TIME in the sqlnet.ora on the server) or with a manual ALTER SYSTEM KILL SESSION, as sketched below.
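A sketch of the manual route (the program filter and the sid,serial# pair are placeholders; adapt them to however you identify the dead connection):
-- Find the dead session that still holds the lock.
SELECT sid, serial#, status, program FROM v$session
WHERE program LIKE '%myapp%';  -- placeholder filter
-- Kill it so its transaction is rolled back immediately
-- ('123,45678' stands for the sid,serial# found above).
ALTER SYSTEM KILL SESSION '123,45678';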
Related
I have set up Data Guard on two separate servers (primary and standby).
All the steps have been completed and when I make a change in the primary database and commit, it is also applied to the standby server.
Now I want this to happen without committing the changes.
For example, if a record is inserted into a table in the primary database, that record should also be inserted into the corresponding standby table without any need to commit.
I have not found a solution.
Let's put the standby aside for a second. If you make a change on a database and commit it, that change is now there permanently. If you do not commit, it can be considered as either never having happened (i.e. you rolled it back) or not yet having happened (the transaction is still open).
Having a standby or not does not impact this fundamental premise.
On a heavily used DB2 table, accessed by distributed Java desktop applications via JDBC, I'm seeing the following scenario several times a day:
Client A wants to insert new records and gets an IX lock on the table, and X locks on each new row;
Other client(s) want(s) to perform a SELECT and are granted an IS lock on the table, but the application gets stuck;
Client A continues to work, but the INSERT and UPDATE queries are not committed, the locks are not released, and it keeps collecting X locks on each new row;
Client A exits and its work is not committed. The other clients finally get their SELECT result set.
It used to work well, and it still does most of the time, but the lock situations are getting more and more frequent.
Auto-commit is ON.
There are no exceptions thrown or errors detected in the logs.
DB2 9.5 / JDBC Driver 9.1 (JDBC 3 specification)
If the JDBC applications are not performing a COMMIT, then the locks will persist until a rollback or commit. If an application quits with uncommitted inserts, then a rollback will happen, for all recent versions of Db2. This is expected behaviour for Db2 on Linux/Unix/Windows.
If the JDBC application is failing to commit, then it is broken or misconfigured, so you must get to the root cause of that if you seek a permanent solution.
If the other clients wish to ignore the insert row-locks, then they should choose the correct isolation level, and you can configure Db2 to skip insert-locks; see the documentation for the DB2_SKIPINSERTED registry variable.
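A minimal sketch of that configuration (DB2_SKIPINSERTED is the registry variable named above; the table name is a placeholder, and the instance must be restarted for db2set to take effect):
db2set DB2_SKIPINSERTED=ON
db2stop
db2start
-- Readers running under CS or RS isolation then skip uncommitted
-- inserted rows instead of waiting on their X locks, e.g.:
SELECT * FROM mytable WITH CS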
It turns out that sometimes, and I don't know why, auto-commit gets switched off on a random single instance of the application.
The following validation seems to solve the problem (but not the root of it):
// Safety net: if auto-commit has somehow been switched off,
// commit explicitly so the locks are released.
if (!conn.getAutoCommit()) {
    conn.commit();
}
When my app tries committing many transactions after several minutes, I get the following exception:
could not commit JDBC transaction; nested exception is
java.sql.SQLException: JZ006: Caught IOException:
java.net.SocketTimeoutException: Read timed out...
I'm using Sybase with the JDBC 4 driver with Spring JDBC, and I found this link: http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc39001.0707/html/prjdbc0707/prjdbc070714.htm
Could I just use any of the following:
SESSION_TIMEOUT
DEFAULT_QUERY_TIMEOUT
INTERNAL_QUERY_TIMEOUT
One idea is to batch the transactions, but I have no time to develop that.
What options are there to avoid getting that error?
Check if your processes are blocking each other when they execute (or ask your DBA if you're not sure how to check). Depending upon the connection properties (specifically, autocommit being set to off), you may not actually be committing each transaction fully before the next one is attempted, and they may block each other if you're using a connection pool with multiple threads. Talk to your DBA and check the table's locking scheme: for example, if it's set to allpages locking, you will hold locks at the page rather than the row level. You can also check this yourself via sp_help (see the sketch below). Some more info regarding the various types of locking scheme can be found at http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.dc20021_1251/html/locking/X25549.htm (an old version, but still valid on current versions).
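A sketch of both checks in isql (the table name is a placeholder):
sp_help mytable
go
-- If it reports allpages locking, row-level locking usually
-- reduces contention:
alter table mytable lock datarows
go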
You can check for locks via sp_who, sp_lock, or against the system tables directly. For example, select spid, blocked from master..sysprocesses where blocked != 0 is a very simple query that returns each blocked process and its blocker; you can add more columns to it as required.
You should also ask your DBA to check that the transactions are optimal, since, for example, a table scan on an update could well lock out the whole table to other transactions and would lead to the timeout issues you're seeing here.
I often get the error below while running some sessions in Informatica PowerCenter. The session is supposed to insert/update some records in Oracle tables.
Username USER123 DB Error -1
Database driver Error...
Function Name: Logon
ORA-12537: TNS:connection closed
Database Error: Failed to connect to database using user [USER123] and connection string [ORCL123]
This is totally random. I have sometimes run the same sessions smoothly without a single hitch, but sometimes the error comes back again and again. Whenever it occurs, it persists for 5 minutes at most: if I restart the session immediately after a failure, it fails again, but if I wait 5 minutes and restart, it runs successfully. The only problem is that it comes back again in another half an hour or so.
Can somebody enlighten me to get a probable resolution for the error?
Do check the number of connections allowed on the Oracle instance: other users may be exceeding the limit intermittently, causing your Informatica user's connection to be refused.
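A quick way to check, assuming you can query the v$ views:
-- Compare current and peak usage against the configured limits.
SELECT resource_name, current_utilization, max_utilization, limit_value
FROM v$resource_limit
WHERE resource_name IN ('processes', 'sessions');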
I've got a load-balanced (not using Session state) ASP.Net 2.0 app on IIS5 running back to a single Oracle 10g server, using version 10.1.0.301 of the ODAC/ODP.Net drivers. After a long period of inactivity (a few hours), the application, seemingly randomly, will throw an Oracle exception:
Exception: ORA-03113: end-of-file on communication channel at
Oracle.DataAccess.Client.OracleException.HandleErrorHelper(Int32
errCode, OracleConnection conn, IntPtr opsErrCtx, OpoSqlValCtx*
pOpoSqlValCtx, Object src, String procedure) at
Oracle.DataAccess.Client.OracleCommand.ExecuteReader(Boolean requery,
Boolean fillRequest, CommandBehavior behavior) at
Oracle.DataAccess.Client.OracleCommand.System.Data.IDbCommand.ExecuteReader()
...Oracle portion of the stack ends here...
We are creating new connections on every request, have the open & close wrapped in a try/catch/finally to ensure proper connection closure, and the whole thing is wrapped in a using (OracleConnection yadayada) {...} block. This problem does not appear linked to the restart of the ASP.Net application after being spun down for inactivity.
We have yet to reproduce the problem ourselves. Thoughts, prayers, help?
More: Checked with IT, the firewall isn't set to kill connections between those servers.
ORA-03113: end-of-file on communication channel
is the database letting you know that the network connection is no more. This could be because:
1. A network issue: a faulty connection, or a firewall issue
2. The server process on the database that is servicing you died unexpectedly
For 1) (firewall), search tahiti.oracle.com for SQLNET.EXPIRE_TIME. This is a sqlnet.ora parameter that makes the server regularly send a network packet at a configurable interval; setting it will make the firewall believe that the connection is live.
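For example, in the server's sqlnet.ora (the 10-minute interval is illustrative; pick something below the firewall's idle timeout):
# Send a probe packet on idle connections every 10 minutes.
SQLNET.EXPIRE_TIME = 10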
For 1) (network) speak to your network admin (connection could be unreliable)
For 2) Check the alert.log for errors. If the server process failed there will be an error message. Also a trace file will have been written to enable support to identify the issue. The error message will reference the trace file.
Support issues can be raised at metalink.oracle.com with a suitable Customer Service Identifier (CSI)
Add Validate Connection=true to your connection string.
Look at this blog to find out more.
DETAILS:
After OracleConnection.Close(), the real database connection does not terminate; the connection object is put back into the connection pool, which ODP.NET uses implicitly. If you create a new connection, you get one from the pool, and if that connection is still open, the OracleConnection.Open() method does not really create a new one. If the underlying connection is broken (for any reason), you get a failure on the first select, update, insert or delete.
With Validate Connection=true, the real connection is validated in the Open() method.
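A hypothetical connection string with the option added (the other values are placeholders):
Data Source=ORCL;User Id=appuser;Password=secret;Validate Connection=true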
Check that there isn't a firewall that is ending the connection after a certain period of time (this was the cause of a similar problem we had).
end-of-file on communication channel:
One cause of this error is the database failing to write the log while it is in the process of opening.
Solution: check whether the database is running in ARCHIVELOG or NOARCHIVELOG mode.
To check, use:
select log_mode from v$database;
If it is in ARCHIVELOG mode, try changing it to NOARCHIVELOG using SQL*Plus:
startup mount
alter database noarchivelog;
alter database open;
If that works, check your flash recovery area: it is possibly full.
Then, after confirming that your flash recovery area has free space, you can switch the database back into ARCHIVELOG mode.
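To check the flash recovery area before doing that (these are standard v$ views; the new size is illustrative):
-- How full is the flash recovery area?
select name, space_limit, space_used from v$recovery_file_dest;
select file_type, percent_space_used from v$flash_recovery_area_usage;
-- Enlarge it if it is full:
alter system set db_recovery_file_dest_size = 10g;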
This error message can be thrown in the application logs when the actual issue is that the Oracle database server ran out of space.
After correcting the space issue, this particular error message disappeared.
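If the culprit was a full tablespace rather than the OS disk, a quick check is (dba_free_space is a standard dictionary view):
-- Free space remaining per tablespace, in MB.
select tablespace_name, round(sum(bytes)/1024/1024) as free_mb
from dba_free_space
group by tablespace_name;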
You could try this registry hack:
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters]
"DeadGWDetectDefault"=dword:00000001
"KeepAliveTime"=dword:00120000
If it works, just keep increasing the KeepAliveTime. (Note that dword values in .reg files are hexadecimal, so 00120000 above is roughly 19.7 minutes; for an exact 2 minutes, i.e. 120,000 ms, use dword:0001d4c0.)
The previously mentioned article is good, as far as it goes: http://forums.oracle.com/forums/thread.jspa?threadID=191750
If this is not something that runs frequently (don't do it on your home page), you can turn off connection pooling.
There is one other "gotcha" that is not mentioned in the article. If the first thing you try to do with the connection is call a stored procedure, ODP will HANG!!!! You will not get back an error condition to manage, just a full bore HANG! The only way to fix it is to turn OFF connection pooling. Once we did that, all issues went away.
Pooling is good in some situations, but at the cost of increased complexity around the first statement of every connection.
If the error handling approach is so good, why don't they make it an option for ODP to handle it for us????