How can I set the timeout on OCILogon2? - oracle

When the Oracle 10 databases are up and running fine, OCILogon2() will connect immediately. When the databases are turned off or inaccessible due to network issues - it will fail immediately.
However when our DBAs go into emergency maintenance and block incomming connections, it can take 5 to 10 minutes to timeout.
This is problematic for me since I've found that OCILogin2 isn't thread safe and we can only use it serially - and I connect to quite a few Oracle DBs. 3 blocked servers X 5-10 minutes = 15 to 30 minutes of lockup time
Does anyone know how to set the OCILogon2 connection timeout?
Thanks.

I'm currenty playing with OCI and it seems to me that it's impossible.
The only way I can think of is to use non-blocking mode. You'll need OCIServerAttach() and OCISessionBegin() instead of OCILogon() in this case. But when I tried this, OCISessionBegin() constantly returns OCI_ERROR with the following error code:
ORA-03123 operation would block
Cause: The attempted operation cannot complete now.
Action: Retry the operation later.
It looks strange and I don't yet know how to deal with it.
Possible workaround is to run your logon in another process, which you can kill after timeout...

We think we found the right file setting - but it's one of those problems where we have to wait until something rare and horrible occurs before we can verify it :-/
[sqlnet.ora]
SQLNET.OUTBOUND_CONNECT_TIMEOUT=60
From the Oracle docs..
http://download.oracle.com/docs/cd/B28359_01/network.111/b28317/sqlnet.htm#BIIFGFHI
5.2.35 SQLNET.OUTBOUND_ CONNECT _TIMEOUT
Purpose
Use the SQLNET.OUTBOUND_ CONNECT _TIMEOUT parameter to specify the time, in seconds, for a client to establish an Oracle Net connection to the database instance.
If an Oracle Net connection is not established in the time specified, the connect attempt is terminated. The client receives an ORA-12170: TNS:Connect timeout occurred error.
The outbound connect timeout interval is a superset of the TCP connect timeout interval, which specifies a limit on the time taken to establish a TCP connection. Additionally, the outbound connect timeout interval includes the time taken to be connected to an Oracle instance providing the requested service.
Without this parameter, a client connection request to the database server may block for the default TCP connect timeout duration (approximately 8 minutes on Linux) when the database server host system is unreachable.
The outbound connect timeout interval is only applicable for TCP, TCP with SSL, and IPC transport connections.
Default
None
Example
SQLNET.OUTBOUND_ CONNECT _TIMEOUT=10

Related

how to preserve database connection while thread is blocked

The database i'm integrated is configured as if a connection is idle(not being used for a while), then connection is dropped. Since im using spring batch in persistent configuration, there is always an active database connection on running threads.
One of my spring batch job is dependent to data from external web service which takes long time to execute. Thats why i already lose the database connection when i get the result.
I tried to use taskscheduler to register a heartbeat query(select 1 from dual) before the web request occurs, which executes the queryevery 5 minutes to keep the connection alive but even if the query executes periodically, i guess it executes the query on a seperate connecyion since it runs on another thread.
Does anyone have an alternative suggestion to keep the connection alive while on locked thread?
I use JPA's EntityManager for the haertbeat query
If you use Spring then you also use HikariCP. The recent JDBC standard defines method isValid() so you do not have to call SQL to check whether Connection is alive.
More over there is one more mechanism you can use. It is called TCP keepalive.
If you insert stanza ENABLE=BROKEN into your JDBC url Oracle JDBC drivers will enable TCP Keepalive feature on a TCP connection
jdbc:oracle:thin:#(DESCRIPTION=(ENABLE=BROKEN)(ADDRESS=(PROTOCOL=tcp)(PORT=1521)(HOST=myhost))(CONNECT_DATA=(SERVICE_NAME=orcl)))
Then it will be Linux kernel who will be sending keepalive probes over TCP connection even if your thread is blocked.
Beware: The delay for 1st probe and frequency is determined by Linux kernel parameters.
# cat /proc/sys/net/ipv4/tcp_keepalive_time
7200
# cat /proc/sys/net/ipv4/tcp_keepalive_intvl
75
# cat /proc/sys/net/ipv4/tcp_keepalive_probes
9
By default the 1st keep alive probe (TCP window carying 0 bytes) is sent after 2 hours.
While Cisco/Juniper usually cut off TCP connection after one hour.

Why RabbitMQ won't the connection stay open when not in use?

I have used http://github.com/streadway/amqp package in my application in order to handle connections to a remote RabbitMQ server. Everything is ok and works fine but when a connection is idle for a long period of time f.g 6 hours it gets closed. I check NotifyClose(make(chan *amqp.Error)) all time in my go routine and it returns :
Exception (501) Reason: "write tcp
192.168.133.53:55424->192.168.134.34:5672: write: broken pipe"
Why this error happens? (is there any problem in my code?)
How long a connection can be idle?
How to prevent this problem?
As Cosmic Ossifrage says, the error is saying your RabbitMQ client has disconnected.
There are so many things that could sit between your client and server that can/will drop dormant connections that it's not worth focusing on how long your connection can be dormant for. You want to set the requested heartbeat interval in your connection manager.
https://www.rabbitmq.com/heartbeats.html
I'm not familiar with the framework you're using but I see it has a defaultHeartbeat field in connection.go. You might need to experiment with the value to find the best balance is to stop the connection being killed but not hit the server too often with keep-alive traffic.

HornetQ client-failure-check-period

Suppose that after 30s (default client-failure-check-period) the client did not receive any packets from the server as a result of net connection problems.
Will the client now be disconnect from session/connection?
Suppose now I add this configration :
<retry-interval>1000</retry-interval>
<retry-interval-multiplier>1.5</retry-interval-multiplier>
<max-retry-interval>60000</max-retry-interval>
<reconnect-attempts>1000</reconnect-attempts>
What will happen now?
Will the client still get disconnected from session/connection but only after trying to reconnect 1000 times (until net is available again)? Or will it ignore the need to do disconnect?
Regarding your first question, and according to HornetQ documentation, that can be found under 17.2. Detecting failure from the client side:
As long as the client is receiving data from the server it will consider the connection to be still alive.
If the client does not receive any packets for client-failure-check-period milliseconds then it will consider the connection failed and will either initiate failover, or call any FailureListener instances (or ExceptionListener instances if you are using JMS) depending on how it has been configured.
Therefore the client will assume that the connection was in fact lost and start its failure processes.
For your second question, also according to the HornetQ documentation, that can be found under 34.3. Configuring reconnection/reattachment attributes:
reconnect-attempts. This optional parameter determines the total number of reconnect attempts to make before giving up and shutting down. A value of -1 signifies an unlimited number of attempts. The default value is 0.
So, yes, the connection will be dropped after 1000 attempts.

Connection pool opens more connections then maximum pool size

Hey I'm using Glassfish open source v4 and I'm having a weird problem.
I have defined a JDBC connection pool to Oracle 11g in the admin console and I've set :
Pool Settings
Initial and Minimum Pool Size: 500
Maximum Pool Size: 1000
Pool Resize Quantity: : 750
And I've created a specific user for this connection pool. Yet sometimes when I inspect opened connections in the database I see that there are more then 1000 (maximum I've seen was 1440)
When this happens any query attempts fail, sometimes with OutOfMemory exception, some show http thread interuptions and some don't show any logs at all, just takes a long time.
What I am wondering is how is it possible the Glassfish opens more connections then I've defined it to?
1t try to compare output from netstat on appl. server and db server side. You may have some "dangling" connections. Also try to find some documentation about DCD (Dead connection detection) in Oracle.
Few years ago I saw situations where Java application server thought that the connection is dead because it is not responding for few minutes. So this connection was put onto some dead connection list and a new connection was created.
There also can be some network issues - for example there is a FW between appl and db server.
When TCP connection is not active for one hour then it's cut over on one side but DB sever does not know about that.
The usual way how to investigate that is
compare output of both netstat(s) (appl./db)
identify dangling TCP connections
translate TCP connection onto Unix process id(PID) of Oracle session process
translate PID onto Oracle session (SID and SERIAL#)
kill the session on Oracle level (alter system kill session ...)

What is the difference between "ORA-12571: TNS packet writer failure" and "ORA-03135: connection lost contact"?

I am working in an environment where we get production issues from time to time related to Oracle connections. We use ODP.NET from ASP.NET applications, and we suspect the firewall closes connections that have been in the connection pool too long.
Sometimes we get an "ORA-12571: TNS packet writer failure" error, and sometimes we get "ORA-03135: connection lost contact."
I was wondering if someone has run into this and/or has an understanding of the difference between the 2 errors.
Using a mobile phone analogy:
ORA-12571 (Failure) Means call is dropped.
ORA-03135 (Connection Lost) Other party hung up.
My understanding is that 3135 occurs when a connection is lost. This doesn't tell you why the connection was lost, though. It may have been terminated by the server because the server failed to recieve a response to a probe for a certain amount of time, and assumed that the connection was dead. Or (I'm not sure about this) the exact reverse of that: the client failed to recieve a probe response from the server for a certain amount of time, so it assumed the connection was lost. The "certain amount of time" is cotrolled by SQLNET.EXPIRE_TIME=[minutes] in sqlnet.ora.
As for 12571, my (again vague) understanding is that there was a sudden failure to send a packet during communication with the server, and that this is typically caused by some software or hardware interfering with the connection (either by design, or by error). For instance, if you pull out your ethernet cable and then try to execute a query, you'll probably get this. Or if a firewall or anti-malware application decides to block the traffic.

Resources