What is the difference between "ORA-12571: TNS packet writer failure" and "ORA-03135: connection lost contact"? - oracle

I am working in an environment where we get production issues from time to time related to Oracle connections. We use ODP.NET from ASP.NET applications, and we suspect the firewall closes connections that have been in the connection pool too long.
Sometimes we get an "ORA-12571: TNS packet writer failure" error, and sometimes we get "ORA-03135: connection lost contact."
I was wondering if someone has run into this and/or has an understanding of the difference between the 2 errors.

Using a mobile phone analogy:
ORA-12571 (Failure) Means call is dropped.
ORA-03135 (Connection Lost) Other party hung up.

My understanding is that 3135 occurs when a connection is lost. This doesn't tell you why the connection was lost, though. It may have been terminated by the server because the server failed to recieve a response to a probe for a certain amount of time, and assumed that the connection was dead. Or (I'm not sure about this) the exact reverse of that: the client failed to recieve a probe response from the server for a certain amount of time, so it assumed the connection was lost. The "certain amount of time" is cotrolled by SQLNET.EXPIRE_TIME=[minutes] in sqlnet.ora.
As for 12571, my (again vague) understanding is that there was a sudden failure to send a packet during communication with the server, and that this is typically caused by some software or hardware interfering with the connection (either by design, or by error). For instance, if you pull out your ethernet cable and then try to execute a query, you'll probably get this. Or if a firewall or anti-malware application decides to block the traffic.

Related

What would happen if a process established multiple PostgreSQL connections and terminated without closing them?

I'm writing a DLL for a purchased software.
The software will perform multi-threaded calculations on certain tasks.
My job is to output the relative result into a database.
However, due to the limited support of the software, it is kind of difficult to do multi-threaded output of the data.
The key problem is that there is no info on the last execution of the DLL function.
Therefore, the database connection will not be closed.
So may I ask if I leave the connection open and terminate the process, what would be the potential problems?
My platform is winserver 2008, and PostgreSQL 10.
I don't understand the background information you are giving, but I can answer the question:
If a PostgreSQL client process dies without closing the database (and TCP) connection, the PostgreSQL server process (“backend process”) that servers this connection will not realize this immediately.
Of course, as soon as the server tries to communicate to the client, e.g. to return some results, TCP it will notice that the partner has gone away and will return an error.
However, often the backend process is idle, waiting for the client to send the next request. In this case, it would never notice that its partner has died. This could eventually cause max_connections to be exhausted with dead connections.
Because this is a common problem in networking, TCP provides the “keepalive” functionality: when a connection has been idle for a while (2 hours by default), the operating system will send a so-called “keepalive packet” and wait for a response from the other side. Sending keepalive packets is repeated several times (5 times by default) in short intervals (1 second by default), and if no response is received, the connection is closed by the operating system, the backend process receives an error message and terminates.
PostgreSQL provides parameters with which you can configure these settings on the server side: tcp_keepalives_idle, tcp_keepalives_count and tcp_keepalives_interval. If you set tcp_keepalives_idle to a shorter value, dead connections will be detected and removed faster, at the cost of some little added network traffic.

Why RabbitMQ won't the connection stay open when not in use?

I have used http://github.com/streadway/amqp package in my application in order to handle connections to a remote RabbitMQ server. Everything is ok and works fine but when a connection is idle for a long period of time f.g 6 hours it gets closed. I check NotifyClose(make(chan *amqp.Error)) all time in my go routine and it returns :
Exception (501) Reason: "write tcp
192.168.133.53:55424->192.168.134.34:5672: write: broken pipe"
Why this error happens? (is there any problem in my code?)
How long a connection can be idle?
How to prevent this problem?
As Cosmic Ossifrage says, the error is saying your RabbitMQ client has disconnected.
There are so many things that could sit between your client and server that can/will drop dormant connections that it's not worth focusing on how long your connection can be dormant for. You want to set the requested heartbeat interval in your connection manager.
https://www.rabbitmq.com/heartbeats.html
I'm not familiar with the framework you're using but I see it has a defaultHeartbeat field in connection.go. You might need to experiment with the value to find the best balance is to stop the connection being killed but not hit the server too often with keep-alive traffic.

socketException broken pipe upon upgrading httpclient jar version to 4.5.3

I am getting socket exception for broken pipe in my client side.
[write] I/O error: Connection has been shutdown: javax.net.ssl.SSLException: java.net.SocketException: Broken pipe (Write failed)
[LoggingManagedHttpClientConnection::shutdown] http-outgoing-278: Shutdown connection
1520546494584[20180308 23:01:34] [ConnectionHolder::abortConnection] Connection discarded
1520546494584[20180308 23:01:34] [BasicHttpClientConnectionManager::releaseConnection] Releasing connection [Not bound]
It seems that the upgradation of httpclient jar is causing issue.
Issue is not coming with httpclient-4.3.2
Exception is coming in every 2 minutes. Issue is intermittent at times.
after , send expect:100-continue ,conn.flush is throwing exception
client and server are Linux machine
client uses http jar to make request to server REST.
Please help me in debugging the issue
can httpjar cause such issue?
The persistent connections that are kept alive by the connection manager become stale. That is, the target server shuts down the connection on its end without HttpClient being able to react to that event, while the connection is being idle, thus rendering the connection half-closed or 'stale'
This is a general limitation of the blocking I/O in Java. There is simply no way of finding out whether or not the opposite endpoint has closed connection other than by attempting to read from the socket.
If a stale connection is used to transmit a request message the request execution usually fails in the write operation with SocketException and gets automatically retried.
Apache HttpClient works this problem around by employing the so stale connection check which is essentially a very brief read operation. However, the check can and often is disabled. In fact it is often advisable to have it disabled due to extra latency the check introduces.
The handling of stale connections was changed in version 4.4. Previously, the code would check every connection by default before re-using it. The code now only checks the connection if the elapsed time since the last use of the connection exceeds the timeout that has been set. The default timeout is set to 2000ms

Stale connection with Pheanstalk

I'm using beanstalkd to offload some work to other machines. The setup is a bit unusual, the server is on the internet (public ip) but the consumers are behind adsl lines on some peoples homes. So there is a linux server as client going out through a dynamic ip and connecting to the server to get a job. It's all PHP and I'm using pheanstalk library.
Everything runs smoothly for some time, but then the adsl changes the IP (every 24h hours the provider forces a disconnect-reconnect) the client just hangs, never to go out of "reserve".
I thought that putting a timeout on the reserve would help it, but it didn't. As it seems, the client issues a command and blocks, it never checks the timeout. It just issues a reserve-with-timeout (instead of a simple reserve) and it is the servers responsibility to return a TIME_OUT as the timeout occurs. The problem is, the connection is broken (but the TCP/IP doesn't know about that yet until any of the sides try to talk to the other side) and if the client blocked reading, it will never return.
The library seems to have support for some kind of timeouts locally (for example when trying to connect to server), but it does not seem to contemplate this scenario.
How could I detect the stale connection and force a reconnect? Is there some kind of keepalive on the protocol (and on the pheanstalk itself)?
Thanks!
You could try to close each connection right after the request is answered and reopen a new connection each time.
There is no close() function but you deleting the Pheanstaly Object with unset($pheanstalk) will close it.
This explanation is quite helpful:
Pheanstalk (PHP client for beanstalk) - how do connections work?
I haven't tried it yet, but I came up with the idea of connecting to the beanstalk server through an SSH tunnel. We can enable the ServerAliveCountMax and ServerAliveInterval options on the tunnel, so that a network or server failure will cause the tunnel to close. This should then cause the pheanstalk client to report an error.

Irregular socket errors (10054) on Windows application

I am working on a Windows (Microsoft Visual C++ 2005) application that uses several processes
running on different hosts in an intranet.
Processes communicate with each other using TCP/IP. Different processes can be on the
same host or on different hosts (i.e. the communication can be both within the same
host or between different hosts).
We have currently a bug that appears irregularly. The communication seems to work
for a while, then it stops working. Then it works again for some time.
When the communication does not work, we get an error (apparently while a process
was trying to send data). The call looks like this:
send(socket, (char *) data, (int) data_size, 0);
By inspecting the error code we get from
WSAGetLastError()
we see that it is an error 10054. Here is what I found in the Microsoft documentation
(see here):
WSAECONNRESET
10054
Connection reset by peer.
An existing connection was forcibly closed by the remote host. This normally
results if the peer application on the remote host is suddenly stopped, the
host is rebooted, the host or remote network interface is disabled, or the
remote host uses a hard close (see setsockopt for more information on the
SO_LINGER option on the remote socket). This error may also result if a
connection was broken due to keep-alive activity detecting a failure while
one or more operations are in progress. Operations that were in progress
fail with WSAENETRESET. Subsequent operations fail with WSAECONNRESET.
So, as far as I understand, the connection was interrupted by the receiving process.
In some cases this error is (AFAIK) correct: one process has terminated and
is therefore not reachable. In other cases both the sender and receiver are running
and logging activity, but they cannot communicate due to the above error (the error
is reported in the logs).
My questions.
What does the SO_LINGER option mean?
What is a keep-alive activity and how can it break a connection?
How is it possible to avoid this problem or recover from it?
Regarding the last question. The first solution we tried (actually, it is rather a
workaround) was resending the message when the error occurs. Unfortunately, the
same error occurs over and over again for a while (a few minutes). So this is not
a solution.
At the moment we do not understand if we have a software problem or a configuration
issue: maybe we should check something in the windows registry?
One hypothesis was that the OS runs out of ephemeral ports (in case connections are
closed but ports are not released because of TcpTimedWaitDelay), but by analyzing
this issue we think that there should be plenty of them: the problem occurs even
if messages are not sent too frequently between processes. However, we still are not
100% sure that we can exclude this: can ephemeral ports get lost in some way (???)
Another detail that might help is that sending and receiving occurs in each process
concurrently in separate threads: are there any shared data structures in the
TCP/IP libraries that might get corrupted?
What is also very strange is that the problem occurs irregularly: communication works
OK for a few minutes, then it does not work for a few minutes, then it works again.
Thank you for any ideas and suggestions.
EDIT
Thanks for the hints confirming that the only possible explanation was a connection closed error. By further analysis of the problem, we found out that the server-side process of the connection had crashed / had been terminated and had been restarted. So there was a new server process running and listening on the correct port, but the client had not detected this and was still trying to use the old connection. We now have a mechanism to detect such situations and reset the connection on the client side.
That error means that the connection was closed by the
remote site. So you cannot do anything on your programm except to accept that the connection is broken.
I was facing this problem for some days recently and found out that Adobe Acrobat Reader update was the culprit. As soon as you completely uninstall Adobe from the system everything returns back to normal.
I spent a long time debugging a 10054/10053 error in s3 pre-signed uploads
Turns out that the s3 server will reject pre-signed s3 uploads for the first 15 minutes of it's life.
So - If you're debugging s3 check it's not a new bucket.
If you're debugging something else - this is most likely a problem on the server side not client side.

Resources