Why are some WCF Named Pipe clients getting a TimeoutException - windows

I have a WCF named pipe service that lets two processes communicate on Windows.
I tested it on my machine and it worked. I built an installer for it, and it works for most users, but some users get a TimeoutException with this message:
"request operation sent to net.pipe://localhost/myService did not receive a reply within the configured timeout (00:00:10). The time allotted to this operation may have been a portion of a longer timeout. This may be because the service is still processing the operation or because the service was unable to send a reply message. Please consider increasing the operation timeout (by casting the channel/proxy to IContextChannel and setting the OperationTimeout property) and ensure that the service is able to connect to the client."
What can be the reason for this error? Is it a security issue? Where can I set security settings for named pipes?
Thanks

Some users may not receive a reply in time due to network problems, which may cause a timeout.
You can configure the timeouts in the configuration file by setting sendTimeout and receiveTimeout on the binding element.
Apply the binding configuration on both the service and the client side:
<netNamedPipeBinding>
  <!-- netNamedPipeBinding has no reliableSession element; the timeouts are set on the binding itself -->
  <binding sendTimeout="00:20:00" receiveTimeout="00:20:00" />
</netNamedPipeBinding>

Oracle JDBC intermittent connection reset SQLRecoverableException [duplicate]

I am getting the following error trying to read from a socket. I'm doing a readInt() on that InputStream, and I am getting this error. Perusing the documentation this suggests that the client part of the connection closed the connection. In this scenario, I am the server.
I have access to the client log files and it is not closing the connection, and in fact its log files suggest I am closing the connection. So does anybody have an idea why this is happening? What else to check for? Does this arise when there are local resources that are perhaps reaching thresholds?
I do note that I have the following line:
socket.setSoTimeout(10000);
just prior to the readInt(). There is a reason for this (long story), but just curious, are there circumstances under which this might lead to the indicated error? I have the server running in my IDE, and I happened to leave my IDE stuck on a breakpoint, and I then noticed the exact same errors begin appearing in my own logs in my IDE.
Anyway, just mentioning it, hopefully not a red herring. :-(
There are several possible causes.
The other end has deliberately reset the connection, in a way which I will not document here. It is rare, and generally incorrect, for application software to do this, but it is not unknown for commercial software.
More commonly, it is caused by writing to a connection that the other end has already closed normally. In other words an application protocol error.
It can also be caused by closing a socket when there is unread data in the socket receive buffer.
In Windows, 'software caused connection abort', which is not the same as 'connection reset', is caused by network problems sending from your end. There's a Microsoft knowledge base article about this.
Connection reset simply means that a TCP RST was received. This happens when your peer receives data that it can't process, and there can be various reasons for that.
The simplest is when you close the socket, and then write more data on the output stream. By closing the socket, you told your peer that you are done talking, and it can forget about your connection. When you send more data on that stream anyway, the peer rejects it with an RST to let you know it isn't listening.
In other cases, an intervening firewall or even the remote host itself might "forget" about your TCP connection. This could happen if you don't send any data for a long time (2 hours is a common time-out), or because the peer was rebooted and lost its information about active connections. Sending data on one of these defunct connections will cause a RST too.
Update in response to additional information:
Take a close look at your handling of the SocketTimeoutException. This exception is raised if the configured timeout is exceeded while blocked on a socket operation. The state of the socket itself is not changed when this exception is thrown, but if your exception handler closes the socket, and then tries to write to it, you'll be in a connection reset condition. setSoTimeout() is meant to give you a clean way to break out of a read() operation that might otherwise block forever, without doing dirty things like closing the socket from another thread.
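The point about the socket state surviving a timeout can be checked with a small loopback experiment. This is only a sketch: the class and method names (SoTimeoutDemo, socketSurvivesTimeout) and the 200 ms timeout are made up for illustration and are not from the question.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class, not part of the original question.
final class SoTimeoutDemo {

    /** Returns true if a read that follows a timed-out read still works on the same socket. */
    static boolean socketSurvivesTimeout() {
        try (ServerSocket listener = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(listener.getInetAddress(), listener.getLocalPort());
             Socket accepted = listener.accept()) {

            accepted.setSoTimeout(200); // short timeout, chosen only for the demo
            InputStream in = accepted.getInputStream();

            boolean timedOut = false;
            try {
                in.read(); // the client has sent nothing, so this blocks until the timeout
            } catch (SocketTimeoutException e) {
                timedOut = true; // the socket itself is untouched by the exception
            }

            // Now the client sends one byte; the same socket can still read it.
            client.getOutputStream().write(42);
            client.getOutputStream().flush();
            accepted.setSoTimeout(5000); // generous guard for the second read
            int value = in.read();

            return timedOut && value == 42 && !accepted.isClosed();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("socket usable after timeout: " + socketSurvivesTimeout());
    }
}
```

If the exception handler had closed the socket instead, the later read or write would fail, which matches the failure mode described above.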
Whenever I have had odd issues like this, I usually sit down with a tool like WireShark and look at the raw data being passed back and forth. You might be surprised where things are being disconnected, and you are only being notified when you try and read.
You should inspect the full trace very carefully.
I have a server socket application in which I fixed a java.net.SocketException: Connection reset case.
In my case it happened while reading from a clientSocket Socket object whose connection had been closed for some reason (network loss, firewall, application crash, or intended close).
Actually, I was re-establishing the connection whenever I got an error while reading from this Socket object.
Socket clientSocket = ServerSocket.accept();
is = new BufferedReader(new InputStreamReader(clientSocket.getInputStream()));
int readed = is.read(); // WHERE ERROR STARTS !!!
The interesting thing is that, with my Java socket, if a client connects to my ServerSocket and closes its connection without sending anything, is.read() gets called repeatedly. Because the read sits in an infinite while loop, you keep trying to read from a closed connection.
If you use something like the following for the read operation:
while (true)
{
    Receive();
}
then you get a stack trace like the one below, over and over:
java.net.SocketException: Socket is closed
at java.net.ServerSocket.accept(ServerSocket.java:494)
What I did was simply close the ServerSocket, renew my connection, and wait for further incoming client connections:
String Receive() throws Exception
{
    try {
        int readed = is.read();
        ....
    } catch (Exception e)
    {
        tryReConnect();
        logit(); //etc
    }
    //...
}
This re-establishes my connection when a client socket is lost for unknown reasons:
private void tryReConnect()
{
    try
    {
        ServerSocket.close();
        // drop my old lost connection and let it be garbage collected immediately
        clientSocket = null;
        System.gc();
        // wait for a new client Socket connection and assign it to my variable
        clientSocket = ServerSocket.accept(); // Waiting for another Connection
        System.out.println("Connection established...");
    } catch (Exception e) {
        String message = "ReConnect not successful " + e.getMessage();
        logit(); //etc...
    }
}
I couldn't find another way because you can't tell whether the connection is lost without a try and catch; everything seems right (the snapshot I took while I was getting Connection reset continuously showed nothing unusual).
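For comparison, a blocking read does report an orderly close by the peer: read() returns -1. The sketch below shows a read loop that stops on that value instead of spinning. The names EofAwareReader, drain and serveOneClient are hypothetical, and the loopback setup exists only to make the example self-contained. An abrupt reset would still surface as an exception rather than -1.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical helper, not the answer author's code.
final class EofAwareReader {

    /** Reads until the peer closes its side; returns the number of bytes received. */
    static int drain(InputStream in) throws IOException {
        int count = 0;
        while (in.read() != -1) { // -1 signals end of stream: the peer closed normally
            count++;
        }
        return count;
    }

    /** Accepts one loopback client that sends the payload and then closes. */
    static int serveOneClient(byte[] payload) {
        try (ServerSocket listener = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(listener.getInetAddress(), listener.getLocalPort());
             Socket accepted = listener.accept()) {
            client.getOutputStream().write(payload);
            client.close(); // orderly close; closing again at the end of the try block is harmless
            return drain(accepted.getInputStream());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("bytes before close: " + serveOneClient(new byte[] {1, 2, 3}));
        System.out.println("bytes from silent client: " + serveOneClient(new byte[0]));
    }
}
```

With a loop like this, a client that connects and disconnects without sending anything ends the loop once, after which the server can go back to accept() on the still-open listening socket.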
Embarrassing to say, but when I had this problem it was simply a mistake: I was closing the connection before I had read all the data. When small strings were returned it worked, but that was probably because the whole response was buffered before I closed it.
When longer amounts of text were returned, the exception was thrown, since more than a buffer's worth was coming back.
You might check for this oversight. Remember that opening a URL is like opening a file: be sure to close it (release the connection) once it has been fully read.
I had the same error and have now found the solution. The problem was that the client program was finishing before the server had read the streams.
I had this problem with a SOA system written in Java. I was running both the client and the server on different physical machines and they worked fine for a long time, then those nasty connection resets appeared in the client log and there wasn't anything strange in the server log. Restarting both client and server didn't solve the problem. Finally we discovered that the heap on the server side was rather full so we increased the memory available to the JVM: problem solved! Note that there was no OutOfMemoryError in the log: memory was just scarce, not exhausted.
Check your server's Java version. This happened to me because my WebLogic 10.3.6 was running on JDK 1.7.0_75, which was using TLSv1. The REST endpoint I was trying to consume was shutting down anything below TLSv1.2.
By default Weblogic was trying to negotiate the strongest shared protocol. See details here: Issues with setting https.protocols System Property for HTTPS connections.
I added verbose SSL logging to identify the supported TLS. This indicated TLSv1 was being used for the handshake.
-Djavax.net.debug=ssl:handshake:verbose:keymanager:trustmanager -Djava.security.debug=access:stack
I resolved this by pushing the feature out to our JDK8-compatible product, JDK8 defaults to TLSv1.2. For those restricted to JDK7, I also successfully tested a workaround for Java 7 by upgrading to TLSv1.2. I used this answer: How to enable TLS 1.2 in Java 7
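Before changing JVM flags, it can help to see which protocol versions the running JVM supports and which it enables by default. The following sketch uses only the standard javax.net.ssl API. The class name TlsProtocolCheck is made up for illustration, and the output depends on the JDK and its security configuration.

```java
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;
import javax.net.ssl.SSLContext;

// Hypothetical helper class for inspecting the JVM's TLS setup.
final class TlsProtocolCheck {

    private static SSLContext defaultContext() {
        try {
            return SSLContext.getDefault();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("No default SSLContext available", e);
        }
    }

    /** Protocol versions the default context enables without extra configuration. */
    static List<String> defaultProtocols() {
        return Arrays.asList(defaultContext().getDefaultSSLParameters().getProtocols());
    }

    /** Protocol versions the default context could enable if asked to. */
    static List<String> supportedProtocols() {
        return Arrays.asList(defaultContext().getSupportedSSLParameters().getProtocols());
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("enabled by default: " + defaultProtocols());
        System.out.println("supported:          " + supportedProtocols());
    }
}
```

If TLSv1.2 appears under "supported" but not under "enabled by default", the client has to opt in explicitly, which is the situation the Java 7 workaround addresses.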
I also had this problem with a Java program trying to send a command to a server via SSH. The problem was with the machine executing the Java code: it didn't have permission to connect to the remote server. The write() method was doing alright, but the read() method was throwing a java.net.SocketException: Connection reset. I fixed this problem by adding the client's SSH key to the remote server's authorized keys.
In my case it was a DNS problem.
I put the resolved IP in the hosts file and everything worked fine.
Of course it is not a permanent solution, but it gave me time to fix the DNS problem.
In my experience, I often encounter the following situations;
If you work in a corporate company, contact the network and security team. Because in requests made to external services, it may be necessary to give permission for the relevant endpoint.
Another issue is that the SSL certificate may have expired on the server where your application is running.
I've seen this problem. In my case, the error was caused by reusing the same ClientRequest object in a specific Java class. That project was using JBoss RESTEasy.
Initially only one method used the ClientRequest object (held as a class-level variable) to make a request to a specific URL.
Later, another method was created to fetch data from a different URL, reusing the same ClientRequest object.
The solution: create a second ClientRequest object in the same class, used exclusively by that method and not reused.
In my case it was a problem with the TLS version. I was using Retrofit with an OkHttp client, and after the ALB on the server side was updated I had to remove my connectionSpecs config:
OkHttpClient.Builder clientBuilder = new OkHttpClient.Builder();
List<ConnectionSpec> connectionSpecs = new ArrayList<>();
connectionSpecs.add(ConnectionSpec.COMPATIBLE_TLS);
// clientBuilder.connectionSpecs(connectionSpecs);
So try removing or adding this config to use different TLS configurations.
I used to get the 'NotifyUtil::java.net.SocketException: Connection reset at java.net.SocketInputStream.read(SocketInputStream.java:...' message in the Apache console of my NetBeans 7.4 setup.
I tried many solutions to get rid of it; what worked for me was enabling TLS on Tomcat.
Here is how:
Create a keystore file to store the server's private key and self-signed certificate by executing the following command:
Windows:
"%JAVA_HOME%\bin\keytool" -genkey -alias tomcat -keyalg RSA
Unix:
$JAVA_HOME/bin/keytool -genkey -alias tomcat -keyalg RSA
and specify a password value of "changeit".
As per https://tomcat.apache.org/tomcat-7.0-doc/ssl-howto.html
(This will create a .keystore file in your localuser dir)
Then edit server.xml (uncomment and edit relevant lines) file (%CATALINA_HOME%apache-tomcat-7.0.41.0_base\conf\server.xml) to enable SSL and TLS protocol:
<Connector port="8443" protocol="HTTP/1.1" SSLEnabled="true"
maxThreads="150" scheme="https" secure="true"
clientAuth="false" sslProtocol="TLS" keystorePass="changeit" />
I hope this helps

The fix client can receive incoming messages but cannot send outgoing heartbeat message

We have built a FIX client. It can receive incoming messages, but after a certain heartbeat has been sent, something stops the client from sending any further heartbeats or replying to the TestRequest message.
fix version: fix5.0
The same incident happened before; we have a tcpdump for one session from that time.
We deploy every FIX session to a separate k8s pod.
We suspected a CPU resource issue because the load average was high around the time of the issue, but adding more CPU cores did not solve it. We think the load average was high because of the FIX reconnection.
We suspected an I/O issue because we use AWS EFS, shared by 3 sessions, for logging and the message store, but the problem persisted after we used pod affinity to assign the 3 sessions to different nodes.
It's not a network issue either, since we could receive FIX messages and other sessions worked well at that time. We have also disabled SNAT in the k8s cluster.
We are using quickfixj 2.2.0 to create the FIX client. We have 3 sessions, deployed to k8s pods on separate nodes:
rate session, to get FX prices from the server
order session, to get transaction (execution report) messages from the server; we only send logon/heartbeat/logout messages to the server
backoffice session, to get market status
We use the Apache Camel quickfixj component to simplify our programming. It works well most of the time, but the 3 sessions keep reconnecting to the FIX servers about once a month, and mostly only 2 sessions are affected.
heartbeatInt = 30s
The fix event messages at client side
20201004-21:10:53.203 Already disconnected: Verifying message failed: quickfix.SessionException: Logon state is not valid for message (MsgType=1)
20201004-21:10:53.271 MINA session created: local=/172.28.65.164:44974, class org.apache.mina.transport.socket.nio.NioSocketSession, remote=/10.60.45.132:11050
20201004-21:10:53.537 Initiated logon request
20201004-21:10:53.643 Setting DefaultApplVerID (1137=9) from Logon
20201004-21:10:53.643 Logon contains ResetSeqNumFlag=Y, resetting sequence numbers to 1
20201004-21:10:53.643 Received logon
The fix incoming messages at client side
8=FIXT.1.1☺9=65☺35=0☺34=2513☺49=Quote1☺52=20201004-21:09:02.887☺56=TA_Quote1☺10=186☺
8=FIXT.1.1☺9=65☺35=0☺34=2514☺49=Quote1☺52=20201004-21:09:33.089☺56=TA_Quote1☺10=185☺
8=FIXT.1.1☺9=74☺35=1☺34=2515☺49=Quote1☺52=20201004-21:09:48.090☺56=TA_Quote1☺112=TEST☺10=203☺
----- 21:10:53.203 Already disconnected ----
8=FIXT.1.1☺9=87☺35=A☺34=1☺49=Quote1☺52=20201004-21:10:53.639☺56=TA_Quote1☺98=0☺108=30☺141=Y☺1137=9☺10=183☺
8=FIXT.1.1☺9=62☺35=0☺34=2☺49=Quote1☺52=20201004-21:11:23.887☺56=TA_Quote1☺10=026☺
The fix outgoing messages at client side
8=FIXT.1.1☺9=65☺35=0☺34=2513☺49=TA_Quote1☺52=20201004-21:09:02.884☺56=Quote1☺10=183☺
---- no heartbeat message around 21:09:32 ----
---- 21:10:53.203 Already disconnected ---
8=FIXT.1.1☺9=134☺35=A☺34=1☺49=TA_Quote1☺52=20201004-21:10:53.433☺56=Quote1☺98=0☺108=30☺141=Y☺553=xxxx☺554=xxxxx☺1137=9☺10=098☺
8=FIXT.1.1☺9=62☺35=0☺34=2☺49=TA_Quote1☺52=20201004-21:11:23.884☺56=Quote1☺10=023☺
8=FIXT.1.1☺9=62☺35=0☺34=3☺49=TA_Quote1☺52=20201004-21:11:53.884☺56=Quote1☺10=027☺
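Gaps like the missing heartbeat around 21:09:32 can be found mechanically by comparing SendingTime (tag 52) across consecutive outgoing messages. The sketch below is an illustration only: the class HeartbeatGapFinder and its methods are hypothetical, the field delimiter is passed in as a parameter (the wire format uses SOH, which the log above renders as ☺, and the sample here uses '|' for readability), and the 45-second threshold in main is an arbitrary tolerance above the 30-second heartbeat interval.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical log-analysis helper; it is not part of quickfixj.
final class HeartbeatGapFinder {

    private static final DateTimeFormatter SENDING_TIME =
            DateTimeFormatter.ofPattern("uuuuMMdd-HH:mm:ss.SSS");

    /** The outgoing messages from the log above, with '|' standing in for the delimiter. */
    static final List<String> SAMPLE_OUTGOING = Arrays.asList(
            "8=FIXT.1.1|9=65|35=0|34=2513|49=TA_Quote1|52=20201004-21:09:02.884|56=Quote1|10=183|",
            "8=FIXT.1.1|9=134|35=A|34=1|49=TA_Quote1|52=20201004-21:10:53.433|56=Quote1|98=0|108=30|141=Y|553=xxxx|554=xxxxx|1137=9|10=098|",
            "8=FIXT.1.1|9=62|35=0|34=2|49=TA_Quote1|52=20201004-21:11:23.884|56=Quote1|10=023|",
            "8=FIXT.1.1|9=62|35=0|34=3|49=TA_Quote1|52=20201004-21:11:53.884|56=Quote1|10=027|");

    /** Returns the value of the given tag in a raw FIX message, or null if the tag is absent. */
    static String tagValue(String message, char delimiter, int tag) {
        String prefix = tag + "=";
        for (String field : message.split(Pattern.quote(String.valueOf(delimiter)))) {
            if (field.startsWith(prefix)) {
                return field.substring(prefix.length());
            }
        }
        return null;
    }

    /** Returns every gap between consecutive SendingTime values that exceeds the limit. */
    static List<Duration> gapsLongerThan(List<String> messages, char delimiter, Duration limit) {
        List<Duration> gaps = new ArrayList<>();
        LocalDateTime previous = null;
        for (String message : messages) {
            String raw = tagValue(message, delimiter, 52);
            if (raw == null) {
                continue; // skip lines without a SendingTime, such as marker lines
            }
            LocalDateTime current = LocalDateTime.parse(raw, SENDING_TIME);
            if (previous != null) {
                Duration gap = Duration.between(previous, current);
                if (gap.compareTo(limit) > 0) {
                    gaps.add(gap);
                }
            }
            previous = current;
        }
        return gaps;
    }

    public static void main(String[] args) {
        for (Duration gap : gapsLongerThan(SAMPLE_OUTGOING, '|', Duration.ofSeconds(45))) {
            System.out.println("outgoing gap of " + gap.toMillis() + " ms");
        }
    }
}
```

Running the same check over the incoming log shows whether the counterparty kept sending during the window in which the client went silent.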
Thread dump taken when the TEST message from the server was received. BTW, the gist is from our development environment, which has the same deployment.
https://gist.github.com/hitxiang/345c8f699b4ad1271749e00b7517bef6
We had enabled the debug log in quickfixj, but it gave little information, only logs for messages received.
The sequence of events in chronological order:
20201101-23:56:02.742 Outgoing heartbeat should be sent at this time, Looks like it's sending, but hung at io writing - in Running state
20201101-23:56:18.651 test message from server side to trigger thread dump
20201101-22:57:45.654 server side began to close the connection
20201101-22:57:46.727 thread dump - right
20201101-23:57:48.363 logon message
20201101-22:58:56.515 thread dump - left
The right(2020-11-01T22:57:46.727Z): when it hangs, The left(2020-11-01T22:58:56.515Z): after reconnection
It looks like the storage we are using, AWS EFS, caused the issue.
But the feedback from AWS support is that nothing is wrong on the AWS EFS side.
Maybe it's a network issue between the k8s EC2 instance and AWS EFS.
First, we made logging asynchronous in all sessions, which made the disconnections happen less often.
Second, for the market session, we wrote the sequence files to local disk; the disconnections went away for that session.
Third, we finally replaced AWS EFS with AWS EBS (a persistent volume in k8s) for all sessions. It works great now.
BTW, AWS EBS is not highly available across zones, but that's better than FIX disconnections.

java.net.SocketException: Connection reset on reaching 3000 users in JMeter

All required changes have been done to respective files like:
stalecheck=true,
keepalive is checked from HTTP request defaults,
retrycount=1,
hc.parameters file changes,
Socket timeout is 240000
We still see "java.net.SocketException: Connection reset" in the response data, although I can see the valid requests being passed to the server.
The issue did not occur until we reached 3000 users; everything worked smoothly up to that point.
Connection reset can have many causes. Possible reasons are:
One of the server components is not able to handle the load, so it closes connections on its side
On the JMeter side, check that you are running in non-GUI mode and that neither the JMeter JVM nor the injector machine is overloaded, which could explain this. See:
https://jmeter.apache.org/usermanual/get-started.html#non_gui

socketException broken pipe upon upgrading httpclient jar version to 4.5.3

I am getting socket exception for broken pipe in my client side.
[write] I/O error: Connection has been shutdown: javax.net.ssl.SSLException: java.net.SocketException: Broken pipe (Write failed)
[LoggingManagedHttpClientConnection::shutdown] http-outgoing-278: Shutdown connection
1520546494584[20180308 23:01:34] [ConnectionHolder::abortConnection] Connection discarded
1520546494584[20180308 23:01:34] [BasicHttpClientConnectionManager::releaseConnection] Releasing connection [Not bound]
It seems that upgrading the httpclient jar is causing the issue.
The issue does not occur with httpclient-4.3.2.
The exception occurs every 2 minutes, though the issue is intermittent at times.
After sending Expect: 100-continue, conn.flush throws the exception.
The client and server are Linux machines.
The client uses the httpclient jar to make requests to the server's REST API.
Please help me debug the issue.
Can the httpclient jar cause such an issue?
The persistent connections that are kept alive by the connection manager become stale. That is, the target server shuts down the connection on its end while the connection is idle, without HttpClient being able to react to that event, thus rendering the connection half-closed or 'stale'.
This is a general limitation of the blocking I/O in Java. There is simply no way of finding out whether or not the opposite endpoint has closed connection other than by attempting to read from the socket.
If a stale connection is used to transmit a request message the request execution usually fails in the write operation with SocketException and gets automatically retried.
Apache HttpClient works around this problem by employing the so-called stale connection check, which is essentially a very brief read operation. However, the check can be, and often is, disabled. In fact it is often advisable to have it disabled due to the extra latency the check introduces.
The handling of stale connections was changed in version 4.4. Previously, the code would check every connection by default before re-using it. The code now only checks the connection if the elapsed time since the last use of the connection exceeds the timeout that has been set. The default timeout is 2000 ms.
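The blocking I/O limitation described above can be observed with plain java.net sockets, independent of HttpClient. In this sketch (the class StalePeerDemo and its members are hypothetical names), the peer closes its end while the local socket is idle. The local Socket object still reports itself as connected and not closed, and only a read reveals the close.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class; it does not use or imitate HttpClient internals.
final class StalePeerDemo {

    /** What the local side sees after the remote side has closed the connection. */
    static final class Observation {
        final boolean connected;
        final boolean closed;
        final int readResult;

        Observation(boolean connected, boolean closed, int readResult) {
            this.connected = connected;
            this.closed = closed;
            this.readResult = readResult;
        }
    }

    static Observation observeAfterPeerClose() {
        try (ServerSocket listener = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(listener.getInetAddress(), listener.getLocalPort())) {

            listener.accept().close(); // the peer closes its end without sending anything

            // These flags describe only local state, so they do not change.
            boolean connected = client.isConnected();
            boolean closed = client.isClosed();

            client.setSoTimeout(5000); // guard so the demo cannot block forever
            int readResult = client.getInputStream().read(); // end of stream once the close arrives

            return new Observation(connected, closed, readResult);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Observation o = observeAfterPeerClose();
        System.out.println("isConnected=" + o.connected
                + " isClosed=" + o.closed + " read()=" + o.readResult);
    }
}
```

A stale connection check is essentially this read attempt with a very short timeout, which is why it costs a little latency on every connection it is applied to.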

What is the difference between "ORA-12571: TNS packet writer failure" and "ORA-03135: connection lost contact"?

I am working in an environment where we get production issues from time to time related to Oracle connections. We use ODP.NET from ASP.NET applications, and we suspect the firewall closes connections that have been in the connection pool too long.
Sometimes we get an "ORA-12571: TNS packet writer failure" error, and sometimes we get "ORA-03135: connection lost contact."
I was wondering if someone has run into this and/or has an understanding of the difference between the 2 errors.
Using a mobile phone analogy:
ORA-12571 (Failure) Means call is dropped.
ORA-03135 (Connection Lost) Other party hung up.
My understanding is that 3135 occurs when a connection is lost. This doesn't tell you why the connection was lost, though. It may have been terminated by the server because the server failed to receive a response to a probe for a certain amount of time, and assumed that the connection was dead. Or (I'm not sure about this) the exact reverse of that: the client failed to receive a probe response from the server for a certain amount of time, so it assumed the connection was lost. The "certain amount of time" is controlled by SQLNET.EXPIRE_TIME=[minutes] in sqlnet.ora.
As for 12571, my (again vague) understanding is that there was a sudden failure to send a packet during communication with the server, and that this is typically caused by some software or hardware interfering with the connection (either by design, or by error). For instance, if you pull out your ethernet cable and then try to execute a query, you'll probably get this. Or if a firewall or anti-malware application decides to block the traffic.
