Oracle JDBC intermittent connection reset SQLRecoverableException [duplicate] - oracle

I am getting the following error trying to read from a socket. I'm doing a readInt() on that InputStream, and I am getting this error. Perusing the documentation this suggests that the client part of the connection closed the connection. In this scenario, I am the server.
I have access to the client log files and it is not closing the connection, and in fact its log files suggest I am closing the connection. So does anybody have an idea why this is happening? What else to check for? Does this arise when there are local resources that are perhaps reaching thresholds?
I do note that I have the following line:
socket.setSoTimeout(10000);
just prior to the readInt(). There is a reason for this (long story), but just curious, are there circumstances under which this might lead to the indicated error? I have the server running in my IDE, and I happened to leave my IDE stuck on a breakpoint, and I then noticed the exact same errors begin appearing in my own logs in my IDE.
Anyway, just mentioning it, hopefully not a red herring. :-(

There are several possible causes.
The other end has deliberately reset the connection, in a way which I will not document here. It is rare, and generally incorrect, for application software to do this, but it is not unknown for commercial software.
More commonly, it is caused by writing to a connection that the other end has already closed normally. In other words an application protocol error.
It can also be caused by closing a socket when there is unread data in the socket receive buffer.
In Windows, 'software caused connection abort', which is not the same as 'connection reset', is caused by network problems sending from your end. There's a Microsoft knowledge base article about this.

Connection reset simply means that a TCP RST was received. This happens when your peer receives data that it can't process, and there can be various reasons for that.
The simplest is when you close the socket, and then write more data on the output stream. By closing the socket, you told your peer that you are done talking, and it can forget about your connection. When you send more data on that stream anyway, the peer rejects it with an RST to let you know it isn't listening.
In other cases, an intervening firewall or even the remote host itself might "forget" about your TCP connection. This could happen if you don't send any data for a long time (2 hours is a common time-out), or because the peer was rebooted and lost its information about active connections. Sending data on one of these defunct connections will cause a RST too.
Update in response to additional information:
Take a close look at your handling of the SocketTimeoutException. This exception is raised if the configured timeout is exceeded while blocked on a socket operation. The state of the socket itself is not changed when this exception is thrown, but if your exception handler closes the socket, and then tries to write to it, you'll be in a connection reset condition. setSoTimeout() is meant to give you a clean way to break out of a read() operation that might otherwise block forever, without doing dirty things like closing the socket from another thread.

Whenever I have had odd issues like this, I usually sit down with a tool like WireShark and look at the raw data being passed back and forth. You might be surprised where things are being disconnected, and you are only being notified when you try and read.

You should inspect full trace very carefully,
I've a server socket application and fixed a java.net.SocketException: Connection reset case.
In my case it happens while reading from a clientSocket Socket object which is closed its connection because of some reason. (Network lost,firewall or application crash or intended close)
Actually I was re-establishing connection when I got an error while reading from this Socket object.
Socket clientSocket = ServerSocket.accept();
is = new BufferedReader(new InputStreamReader(clientSocket.getInputStream()));
int readed = is.read(); // WHERE ERROR STARTS !!!
The interesting thing is for my JAVA Socket if a client connects to my ServerSocket and close its connection without sending anything is.read() is being called repeatedly.It seems because of being in an infinite while loop for reading from this socket you try to read from a closed connection.
If you use something like below for read operation;
while(true)
{
Receive();
}
Then you get a stackTrace something like below on and on
java.net.SocketException: Socket is closed
at java.net.ServerSocket.accept(ServerSocket.java:494)
What I did is just closing ServerSocket and renewing my connection and waiting for further incoming client connections
String Receive() throws Exception
{
try {
int readed = is.read();
....
}catch(Exception e)
{
tryReConnect();
logit(); //etc
}
//...
}
This reestablises my connection for unknown client socket losts
private void tryReConnect()
{
try
{
ServerSocket.close();
//empty my old lost connection and let it get by garbage col. immediately
clientSocket=null;
System.gc();
//Wait a new client Socket connection and address this to my local variable
clientSocket= ServerSocket.accept(); // Waiting for another Connection
System.out.println("Connection established...");
}catch (Exception e) {
String message="ReConnect not successful "+e.getMessage();
logit();//etc...
}
}
I couldn't find another way because as you see from below image you can't understand whether connection is lost or not without a try and catch ,because everything seems right . I got this snapshot while I was getting Connection reset continuously.

Embarrassing to say it, but when I had this problem, it was simply a mistake that I was closing the connection before I read all the data. In cases with small strings being returned, it worked, but that was probably due to the whole response was buffered, before I closed it.
In cases of longer amounts of text being returned, the exception was thrown, since more then a buffer was coming back.
You might check for this oversight. Remember opening a URL is like a file, be sure to close it (release the connection) once it has been fully read.

I had the same error. I found the solution for problem now. The problem was client program was finishing before server read the streams.

I had this problem with a SOA system written in Java. I was running both the client and the server on different physical machines and they worked fine for a long time, then those nasty connection resets appeared in the client log and there wasn't anything strange in the server log. Restarting both client and server didn't solve the problem. Finally we discovered that the heap on the server side was rather full so we increased the memory available to the JVM: problem solved! Note that there was no OutOfMemoryError in the log: memory was just scarce, not exhausted.

Check your server's Java version. Happened to me because my Weblogic 10.3.6 was on JDK 1.7.0_75 which was on TLSv1. The rest endpoint I was trying to consume was shutting down anything below TLSv1.2.
By default Weblogic was trying to negotiate the strongest shared protocol. See details here: Issues with setting https.protocols System Property for HTTPS connections.
I added verbose SSL logging to identify the supported TLS. This indicated TLSv1 was being used for the handshake.
-Djavax.net.debug=ssl:handshake:verbose:keymanager:trustmanager -Djava.security.debug=access:stack
I resolved this by pushing the feature out to our JDK8-compatible product, JDK8 defaults to TLSv1.2. For those restricted to JDK7, I also successfully tested a workaround for Java 7 by upgrading to TLSv1.2. I used this answer: How to enable TLS 1.2 in Java 7

I also had this problem with a Java program trying to send a command on a server via SSH. The problem was with the machine executing the Java code. It didn't have the permission to connect to the remote server. The write() method was doing alright, but the read() method was throwing a java.net.SocketException: Connection reset. I fixed this problem with adding the client SSH key to the remote server known keys.

In my case was DNS problem .
I put in host file the resolved IP and everything works fine.
Of course it is not a permanent solution put this give me time to fix the DNS problem.

In my experience, I often encounter the following situations;
If you work in a corporate company, contact the network and security team. Because in requests made to external services, it may be necessary to give permission for the relevant endpoint.
Another issue is that the SSL certificate may have expired on the server where your application is running.

I've seen this problem. In my case, there was an error caused by reusing the same ClientRequest object in an specific Java class. That project was using Jboss Resteasy.
Initially only one method was using/invoking the object ClientRequest (placed as global variable in the class) to do a request in an specific URL.
After that, another method was created to get data with another URL, reusing the same ClientRequest object, though.
The solution: in the same class was created another ClientRequest object and exclusively to not be reused.

In my case it was problem with TSL version. I was using Retrofit with OkHttp client and after update ALB on server side I should have to delete my config with connectionSpecs:
OkHttpClient.Builder clientBuilder = new OkHttpClient.Builder();
List<ConnectionSpec> connectionSpecs = new ArrayList<>();
connectionSpecs.add(ConnectionSpec.COMPATIBLE_TLS);
// clientBuilder.connectionSpecs(connectionSpecs);
So try to remove or add this config to use different TSL configurations.

I used to get the 'NotifyUtil::java.net.SocketException: Connection reset at java.net.SocketInputStream.read(SocketInputStream.java:...' message in the Apache Console of my Netbeans7.4 setup.
I tried many solutions to get away from it, what worked for me is enabling the TLS on Tomcat.
Here is how to:
Create a keystore file to store the server's private key and
self-signed certificate by executing the following command:
Windows:
"%JAVA_HOME%\bin\keytool" -genkey -alias tomcat -keyalg RSA
Unix:
$JAVA_HOME/bin/keytool -genkey -alias tomcat -keyalg RSA
and specify a password value of "changeit".
As per https://tomcat.apache.org/tomcat-7.0-doc/ssl-howto.html
(This will create a .keystore file in your localuser dir)
Then edit server.xml (uncomment and edit relevant lines) file (%CATALINA_HOME%apache-tomcat-7.0.41.0_base\conf\server.xml) to enable SSL and TLS protocol:
<Connector port="8443" protocol="HTTP/1.1" SSLEnabled="true"
maxThreads="150" scheme="https" secure="true"
clientAuth="false" sslProtocol="TLS" keystorePass="changeit" />
I hope this helps

Related

socketException broken pipe upon upgrading httpclient jar version to 4.5.3

I am getting socket exception for broken pipe in my client side.
[write] I/O error: Connection has been shutdown: javax.net.ssl.SSLException: java.net.SocketException: Broken pipe (Write failed)
[LoggingManagedHttpClientConnection::shutdown] http-outgoing-278: Shutdown connection
1520546494584[20180308 23:01:34] [ConnectionHolder::abortConnection] Connection discarded
1520546494584[20180308 23:01:34] [BasicHttpClientConnectionManager::releaseConnection] Releasing connection [Not bound]
It seems that the upgradation of httpclient jar is causing issue.
Issue is not coming with httpclient-4.3.2
Exception is coming in every 2 minutes. Issue is intermittent at times.
after , send expect:100-continue ,conn.flush is throwing exception
client and server are Linux machine
client uses http jar to make request to server REST.
Please help me in debugging the issue
can httpjar cause such issue?
The persistent connections that are kept alive by the connection manager become stale. That is, the target server shuts down the connection on its end without HttpClient being able to react to that event, while the connection is being idle, thus rendering the connection half-closed or 'stale'
This is a general limitation of the blocking I/O in Java. There is simply no way of finding out whether or not the opposite endpoint has closed connection other than by attempting to read from the socket.
If a stale connection is used to transmit a request message the request execution usually fails in the write operation with SocketException and gets automatically retried.
Apache HttpClient works this problem around by employing the so stale connection check which is essentially a very brief read operation. However, the check can and often is disabled. In fact it is often advisable to have it disabled due to extra latency the check introduces.
The handling of stale connections was changed in version 4.4. Previously, the code would check every connection by default before re-using it. The code now only checks the connection if the elapsed time since the last use of the connection exceeds the timeout that has been set. The default timeout is set to 2000ms

Postgresql: No connection could be made because the target machine actively refused it

Running Postgresql 9.5 on a windows server 2012 R2 in Azure
While running some loadtests on my application, I get errors on not being able to connect to the postgres server. In the logs of postgres I get the following message:
could not receive data from client: No connection could be made
because the target machine actively refused it.
This only happens when the loadtest goes to the next scenario, hitting a different part of the code. So new connections to the database are required. But after 10-20 seconds the rest of the scenario works flawlessly without hitting any other hiccups. So the problem seems to be the tcp connections. (My code retries a couple of times but it is not feasible to let it retry for 20 seconds)
I'm using the following settings in the config files
postgresql.conf
listen_addresses = '*'
max_connections = 500
shared_buffers = 1024MB
temp_buffers = 2MB
work_mem = 2MB
maintenance_work_mem = 128MB
pg_hba.conf
host all all 0.0.0.0/0 trust
host all all ::/0 trust
I know, I know.. It is not save to accept connections from everyone, but this is just for testing purposes and to make sure these settings are not blocking any connection. So this answer is void
I've been monitoring the number of connection on the server and under the load it is stable at 75. Postgres is using around 350mb of RAM. So given the config and the vm specs (7gb ram) there should be plenty of space to create more connections. However when the next scenario is spinning up the number of connections does not increase, it stays level and starts giving these log messages about no connection could be made.
What could be the problem here?
It does sound like this isn't really a Postgres problem (hence no changes in DB stats you're checking), rather that the traffic is being stopped by the server. Possibly because traffic on that port is saturated while handling your load testing queries?
It doesn't sound like you're hitting any of the Azure resource limits (including the database limits if that applies to your setup?), but without more detail on your load tests it's hard to say exactly what is needed.
Solutions from around the web and other SO answers suggest:
Disable TCP autotuning and tweak the TCP/IP registry keys on the server, e.g. set TcpAckFrequency - see this article for details
Make TCP setting adjustments (like WinsockListenBacklog) - which may be affected by whether connection pooling is in use or not - see this MS support article, which is for SQL Server 2005 but has some great tips on troubleshooting rejected TCP/IP connections (using Network Monitor, but applies to newer tools)
Faster request processing if you have enough control of the server - source
Disabling network proxying (in your load testing app): <defaultProxy> <proxy usesystemdefault="False"/> </defaultProxy> - source
Most possible reason is a Firewall/Anti-virus:
Software/Personal Firewall Settings
Multiple Software/Personal Firewalls
Anti-virus Software
LSP Layer
(Virtual) Router Firmware
Does your current Azure infrastructure contain Firewall or Anti-virus ?
Additionally on doing some additional searches, it looks like this is a standard Windows "connection refused" message, which suggests that PostgreSQL is trying to connect to something and being refused.
Also possible that one network element in your network - assuming that you are still connected to the server - will delay or drop somes DB login/authentication network packets (considered for example as a fake auth.replay) ...
You may also use a packet analyzer (like Wireshark) to record/inspect network flow when the error appear.
Regards
I was facing the same issue in my AspNet core application while I was trying to connect the Postgresql from my application. The error was thrown in the Program.cs file when I was calling the Migrate function.
public static void Main(string[] args) {
try {
var host = BuildWebHost(args);
using(var scope = host.Services.CreateScope()) {
// Migrate once after app is started.
scope.ServiceProvider.GetService <MyDatabaseContext>().Migrate();
}
host.Run();
}
catch(Exception e) {
//NLog: catch setup errors
_logger ? .Error(e, "Stopped program because of exception: ");
throw;
}
}
To fix this problem I did the following steps.
Check whether the Postgresql service is running by going to the services.msc
Tried to login to the pgAdmin with the user and password I provided in the database context
Everything was file, and as you know that 5432 is the default port of Postgresql and somehow I was using a different port in my application connection string, changing it to 5432 fixed this issue for me.
"ConnectionString": "User Id=postgres;Password=mypwd;Host=localhost;Port=5432;Database=mydb;"
I came across a similar issue whilst trying to beast my api, where I was seeing Npgsql.NpgsqlException No connection could be made because the target machine actively refused it..
However my issue was was down to the fact that I was re-creating my NpgsqlConnection for each query rather than re-using and keeping it alive.

Stale connection with Pheanstalk

I'm using beanstalkd to offload some work to other machines. The setup is a bit unusual, the server is on the internet (public ip) but the consumers are behind adsl lines on some peoples homes. So there is a linux server as client going out through a dynamic ip and connecting to the server to get a job. It's all PHP and I'm using pheanstalk library.
Everything runs smoothly for some time, but then the adsl changes the IP (every 24h hours the provider forces a disconnect-reconnect) the client just hangs, never to go out of "reserve".
I thought that putting a timeout on the reserve would help it, but it didn't. As it seems, the client issues a command and blocks, it never checks the timeout. It just issues a reserve-with-timeout (instead of a simple reserve) and it is the servers responsibility to return a TIME_OUT as the timeout occurs. The problem is, the connection is broken (but the TCP/IP doesn't know about that yet until any of the sides try to talk to the other side) and if the client blocked reading, it will never return.
The library seems to have support for some kind of timeouts locally (for example when trying to connect to server), but it does not seem to contemplate this scenario.
How could I detect the stale connection and force a reconnect? Is there some kind of keepalive on the protocol (and on the pheanstalk itself)?
Thanks!
You could try to close each connection right after the request is answered and reopen a new connection each time.
There is no close() function but you deleting the Pheanstaly Object with unset($pheanstalk) will close it.
This explanation is quite helpful:
Pheanstalk (PHP client for beanstalk) - how do connections work?
I haven't tried it yet, but I came up with the idea of connecting to the beanstalk server through an SSH tunnel. We can enable the ServerAliveCountMax and ServerAliveInterval options on the tunnel, so that a network or server failure will cause the tunnel to close. This should then cause the pheanstalk client to report an error.

Irregular socket errors (10054) on Windows application

I am working on a Windows (Microsoft Visual C++ 2005) application that uses several processes
running on different hosts in an intranet.
Processes communicate with each other using TCP/IP. Different processes can be on the
same host or on different hosts (i.e. the communication can be both within the same
host or between different hosts).
We have currently a bug that appears irregularly. The communication seems to work
for a while, then it stops working. Then it works again for some time.
When the communication does not work, we get an error (apparently while a process
was trying to send data). The call looks like this:
send(socket, (char *) data, (int) data_size, 0);
By inspecting the error code we get from
WSAGetLastError()
we see that it is an error 10054. Here is what I found in the Microsoft documentation
(see here):
WSAECONNRESET
10054
Connection reset by peer.
An existing connection was forcibly closed by the remote host. This normally
results if the peer application on the remote host is suddenly stopped, the
host is rebooted, the host or remote network interface is disabled, or the
remote host uses a hard close (see setsockopt for more information on the
SO_LINGER option on the remote socket). This error may also result if a
connection was broken due to keep-alive activity detecting a failure while
one or more operations are in progress. Operations that were in progress
fail with WSAENETRESET. Subsequent operations fail with WSAECONNRESET.
So, as far as I understand, the connection was interrupted by the receiving process.
In some cases this error is (AFAIK) correct: one process has terminated and
is therefore not reachable. In other cases both the sender and receiver are running
and logging activity, but they cannot communicate due to the above error (the error
is reported in the logs).
My questions.
What does the SO_LINGER option mean?
What is a keep-alive activity and how can it break a connection?
How is it possible to avoid this problem or recover from it?
Regarding the last question. The first solution we tried (actually, it is rather a
workaround) was resending the message when the error occurs. Unfortunately, the
same error occurs over and over again for a while (a few minutes). So this is not
a solution.
At the moment we do not understand if we have a software problem or a configuration
issue: maybe we should check something in the windows registry?
One hypothesis was that the OS runs out of ephemeral ports (in case connections are
closed but ports are not released because of TcpTimedWaitDelay), but by analyzing
this issue we think that there should be plenty of them: the problem occurs even
if messages are not sent too frequently between processes. However, we still are not
100% sure that we can exclude this: can ephemeral ports get lost in some way (???)
Another detail that might help is that sending and receiving occurs in each process
concurrently in separate threads: are there any shared data structures in the
TCP/IP libraries that might get corrupted?
What is also very strange is that the problem occurs irregularly: communication works
OK for a few minutes, then it does not work for a few minutes, then it works again.
Thank you for any ideas and suggestions.
EDIT
Thanks for the hints confirming that the only possible explanation was a connection closed error. By further analysis of the problem, we found out that the server-side process of the connection had crashed / had been terminated and had been restarted. So there was a new server process running and listening on the correct port, but the client had not detected this and was still trying to use the old connection. We now have a mechanism to detect such situations and reset the connection on the client side.
That error means that the connection was closed by the
remote site. So you cannot do anything on your programm except to accept that the connection is broken.
I was facing this problem for some days recently and found out that Adobe Acrobat Reader update was the culprit. As soon as you completely uninstall Adobe from the system everything returns back to normal.
I spent a long time debugging a 10054/10053 error in s3 pre-signed uploads
Turns out that the s3 server will reject pre-signed s3 uploads for the first 15 minutes of it's life.
So - If you're debugging s3 check it's not a new bucket.
If you're debugging something else - this is most likely a problem on the server side not client side.

Fault Tolerance JMS URL in Java

I am doing a JMS connection using Java. The command I am using to establish connection is
QueueConnectionFactory factory =
new com.tibco.tibjms.TibjmsQueueConnectionFactory(JMSserverUrl);
Where JMSServerUrl is the varible which stores my JMS URL.
Now the problem is that I need to add the fault tolerance URL i.e two different URL's. So can any one tell me how can I specify two URLs together in the above code sample such that if first URL is not accessible it should try connecting to the other URL.
Put all URLs in a single string with a comma between them.
new TibjmsQueueConnectionFactory("ssl://host01:20302,ssl://host02:20302");
Caution, I am a Tibco EMS newbie, but this seems to work, as evidenced by the error I can get ...
javax.jms.JMSSecurityException: Failed to connect to any server at:
ssl://host01:20302,ssl://host02:20302
[Error: Can not initialize SSL client: no trusted certificates are set:
url that returned this exception = SSL://host01:20302 ]
The .NET documentation for tibco(I know your using java) suggests that you can provide a comma delimited list of server URL's for messaging connections. Bear in mind that I don't have any real tibco experience, but this is a common way to handle initial connection fault tolerance(i.e. prior to establishing a connection and receiving information about the cluster, after which failover is typically handled by the connection). It may be worth a try. Another solution that I have seen to this problem is creating a virtual IP and handling fault tolerance at the Network Level.

Resources