packet_write_wait: Connection to xxx.xxx.xxx.xxx: Broken pipe - shell

What does it mean when the terminal throws this error, and how can I solve it?
packet_write_wait: Connection to xxx.xxx.xxx.xxx: Broken pipe
It just started happening today, after working normally for years.
My terminal keeps disconnecting after a certain amount of time. I have already searched on Google, but most of the results are about "Write failed: Broken pipe.",
which I solved years ago. I only ran into this new, annoying problem today.

I experienced this problem as well and spent a few days trying to bisect it.
As others have noted, playing with the SSH keepalive parameters (ClientAliveInterval, ClientAliveCountMax, ServerAliveInterval and ServerAliveCountMax) or with kernel TCP parameters (TCPKeepAlive on/off) does not solve the problem.
After playing with USB to Ethernet drivers and tcpdump, I realized the issue was due to the kernel 4.8 I was using. I switched the source (sending side) to 4.4 LTS and the problem disappeared (rsync via ssh and scp were working nicely again). The destination side can remain on 4.8 if you want, in my use case this was working (tested).
On the technical side, we can narrow the issue down a little thanks to a Wireshark dump I made. It shows the TCP channel carrying the SSHv2 protocol being reset (the TCP RST flag set to 1), causing the connection to abort. I don't know the cause of that RST yet; I need to bisect from 4.8.1 to 4.8.11 to find it.
I'm not saying your problem is specifically due to kernel 4.8, but given the date you posted your question, chances are high that you are currently using a kernel more recent than 4.4.
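For what it's worth, a capture filter along these lines (the interface name here is only an example) shows just the segments with the RST flag set on the SSH port, which makes the reset described above easy to spot:
tcpdump -ni eth0 'port 22 and tcp[tcpflags] & tcp-rst != 0'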

If that is an ssh connection, then you might want to make sure you send a keepalive message to the server.

ServerAliveInterval seems to be the most common strategy to keep a connection alive. To prevent the broken pipe problem, here is the ssh config I use in my ~/.ssh/config file (the system-wide client config is /etc/ssh/ssh_config; the server-side counterpart is sshd_config):
Host myhostshortcut
HostName myhost.com
User barthelemy
ServerAliveInterval 60
ServerAliveCountMax 10
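On the server side, the corresponding sshd_config options (the ClientAlive* parameters mentioned above) would look roughly like this; with these values the server probes an idle client every 60 seconds and only drops the session after 10 unanswered probes:
ClientAliveInterval 60
ClientAliveCountMax 10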

Connect through another Wi-Fi network.
I don't know why or how it works, but it does.
The original poster sthapaun already mentioned this solution in a comment, but I want to add that the solution works for me, too.

Related

Enable TCP keepalive on port open by another program

On a Debian machine I'm using an OPC UA server, https://github.com/FreeOpcUa/opcua-asyncio. The server does not offer a way to enable TCP keepalive on the port it opens.
Basically, I want to know whether it's possible to start the server and then, in another script, enable TCP keepalive on that port.
I also found some other information from Redhat https://access.redhat.com/solutions/19029, and https://access.redhat.com/solutions/25773 (requires you to sign up to see the articles). But again I'm still lost as to what to do exactly.
I'll keep reading up on this, but so far I've spent about 10 hours trying to figure out whether it's even possible. So I thought I should ask for some help.
Any advice is welcome, thanks!
To operate on a socket that belongs to another process, the socket must be shared from that process (https://docs.python.org/3/library/socket.html#socket.socket.share) or duplicated.
It's easier to patch your server to enable keepalive.
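A rough sketch of the "patch your server" approach: once you can reach the server's listening or accepted socket object inside the server process, keepalive is a few setsockopt calls away. How you get hold of that socket depends on opcua-asyncio internals, so the helper below simply assumes the socket is passed in, and the option names are the Linux ones:

import socket

def enable_keepalive(sock, idle=60, interval=10, count=5):
    # Turn TCP keepalive on for this socket.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Start probing after `idle` seconds of silence ...
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    # ... send a probe every `interval` seconds ...
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    # ... and reset the connection after `count` unanswered probes.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)

Note that socket.share()/socket.fromshare() are Windows-only; on Debian you would have to pass the file descriptor between processes (e.g. over a Unix domain socket with SCM_RIGHTS), which is why patching the server is the simpler route.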

DNS resolution timeouts after upgrade to go1.5

After upgrading from go1.3 to go1.5.2, I have been experiencing DNS request timeouts in connections that go through the ResolveTCPAddr() API in the net library. I dug a little deeper to see what's going wrong in go1.5 because there was a mention of a DNS resolution implementation change in the go1.5 release notes.
https://golang.org/doc/go1.5#net
In go1.5, for DNS resolution of an unknown address, there is first a UDP dial and then a TCP dial to make the DNS request.
https://github.com/golang/go/blob/release-branch.go1.5/src/net/dnsclient_unix.go#L134
Sometimes the read on the UDP connection for a DNS request times out (which, it being UDP, could happen for any number of reasons) and my connection establishment slows down terribly (1 ms sometimes becomes 5 s) at random.
This made me force the netcgo flag at compile time to revert to the old behavior mentioned in the release notes, but I want to resolve this without having to force anything that isn't the default.
Is this a known issue? Did anybody else run into this? I also want to understand why it makes a UDP connection first, what it did earlier, and what changed for it to behave differently now.

Irregular socket errors (10054) on Windows application

I am working on a Windows (Microsoft Visual C++ 2005) application that uses several processes
running on different hosts in an intranet.
Processes communicate with each other using TCP/IP. Different processes can be on the
same host or on different hosts (i.e. the communication can happen either within one
host or between different hosts).
We have currently a bug that appears irregularly. The communication seems to work
for a while, then it stops working. Then it works again for some time.
When the communication does not work, we get an error (apparently while a process
was trying to send data). The call looks like this:
send(socket, (char *) data, (int) data_size, 0);
By inspecting the error code we get from
WSAGetLastError()
we see that it is error 10054. Here is what I found in the Microsoft documentation:
WSAECONNRESET
10054
Connection reset by peer.
An existing connection was forcibly closed by the remote host. This normally
results if the peer application on the remote host is suddenly stopped, the
host is rebooted, the host or remote network interface is disabled, or the
remote host uses a hard close (see setsockopt for more information on the
SO_LINGER option on the remote socket). This error may also result if a
connection was broken due to keep-alive activity detecting a failure while
one or more operations are in progress. Operations that were in progress
fail with WSAENETRESET. Subsequent operations fail with WSAECONNRESET.
So, as far as I understand, the connection was interrupted by the receiving process.
In some cases this error is (AFAIK) correct: one process has terminated and
is therefore not reachable. In other cases both the sender and receiver are running
and logging activity, but they cannot communicate due to the above error (the error
is reported in the logs).
My questions.
What does the SO_LINGER option mean?
What is a keep-alive activity and how can it break a connection?
How is it possible to avoid this problem or recover from it?
Regarding the last question. The first solution we tried (actually, it is rather a
workaround) was resending the message when the error occurs. Unfortunately, the
same error occurs over and over again for a while (a few minutes). So this is not
a solution.
At the moment we do not understand if we have a software problem or a configuration
issue: maybe we should check something in the windows registry?
One hypothesis was that the OS runs out of ephemeral ports (in case connections are
closed but ports are not released because of TcpTimedWaitDelay), but by analyzing
this issue we think that there should be plenty of them: the problem occurs even
if messages are not sent too frequently between processes. However, we still are not
100% sure that we can exclude this: can ephemeral ports get lost in some way (???)
Another detail that might help is that sending and receiving occurs in each process
concurrently in separate threads: are there any shared data structures in the
TCP/IP libraries that might get corrupted?
What is also very strange is that the problem occurs irregularly: communication works
OK for a few minutes, then it does not work for a few minutes, then it works again.
Thank you for any ideas and suggestions.
EDIT
Thanks for the hints confirming that the only possible explanation was a connection closed error. By further analysis of the problem, we found out that the server-side process of the connection had crashed / had been terminated and had been restarted. So there was a new server process running and listening on the correct port, but the client had not detected this and was still trying to use the old connection. We now have a mechanism to detect such situations and reset the connection on the client side.
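A minimal sketch of that detect-and-reconnect idea, in Python for brevity (the host, port and retry policy are made up for illustration): on a connection-reset error the client discards the dead socket and dials again instead of retrying the send on the old connection.

import errno
import socket
import time

def send_with_reconnect(conn, host, port, data, retries=3):
    # Hypothetical helper: sends `data`, reconnecting if the peer was restarted.
    for _ in range(retries):
        try:
            conn.sendall(data)
            return conn              # hand back a usable connection
        except OSError as exc:
            if exc.errno not in (errno.ECONNRESET, errno.EPIPE):
                raise                # some other failure: let it propagate
            conn.close()             # the old connection is dead
            time.sleep(1)            # give the restarted server a moment
            conn = socket.create_connection((host, port))
    raise ConnectionError("still failing after reconnect attempts")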
That error means that the connection was closed by the remote side, so you cannot do anything in your program except accept that the connection is broken.
I was facing this problem for some days recently and found out that an Adobe Acrobat Reader update was the culprit. As soon as you completely uninstall Adobe Acrobat Reader from the system, everything returns to normal.
I spent a long time debugging a 10054/10053 error in S3 pre-signed uploads.
It turns out that the S3 server will reject pre-signed S3 uploads for the first 15 minutes of a bucket's life.
So, if you're debugging S3, check that it's not a new bucket.
If you're debugging something else, this is most likely a problem on the server side, not the client side.

TCP: Address already in use exception - possible causes for client port? NO PORT EXHAUSTION

Stupid problem. I get these from a client connecting to a server. Sadly, the setup is complicated, which makes debugging complex, and we have run out of options.
The environment:
* Client/server system, both running on the same machine. The client is actually a service doing some database manipulation at specific times.
* The connection comes from C# going through OleDb to an EasySoft JDBC driver to a custom-written JDBC server that then hosts logic in C++. Yeah, complex - but the third-party supplier decided to expose the extension mechanisms for their server through a JDBC interface. Not a lot can be done here ;)
The Symptom:
At (ir)regular intervals we get an "Address already in use: connect" error from the JDBC driver. The errors seem to come from one particular service we run.
Now, I did read all the stuff about port exhaustion, which is why we have a little tool running that counts ports and their states every minute. Last time this happened, we had an astonishing 370 ports in use, with the count rising to about 900 AFTER the error. We already patched the registry (it is a Windows machine) to allow more than the standard 5000 client ports, but even then, we are far, far from that limit to start with.
Which is why I am asking here. Anyone have an idea what ELSE could cause this?
It is a Windows 2003 Server machine, 64 bit. The only other thing I can see that might cause it (though this functionality is supposedly disabled) is the Symantec Endpoint Protection installed on the server - being capable of acting as a firewall, it could possibly intercept network traffic. I don't want to open a can of worms by pointing to Symantec prematurely (if pointing to Symantec can ever be seen as such). So, anyone have an idea what else may be the cause?
Thanks
"Address already in use", aka WSAEADDRINUSE (10048), means that when the client socket prepared to connect to the server socket, it first tried to bind itself to a specific local IP/Port pair that was already in use by another socket, either an active one or one that has been closed but is still in the FD_WAIT state. This has nothing to do with the number of ports that are available.
I'm having the same issue on a Windows 2000 Server with a .Net application connecting to a SQL Server 7.0. There's like 10 servers with the same configuration and only one is showing this error several times a day. With a small test program I'm able to reproduce the error by just establishing a TCP connection on the SQL Server listening port. Running CurrPorts (http://www.nirsoft.net/utils/cports.html) shows there's still plenty of available ports in range 1024-5000.
I'm out of ideas and would like to know if you've found a solution since you've posted your question.
Edit: I finally found the solution: a worm (WORM_DOWNAD.A) was present on the server and was exhausting local ports without being noticed.

Sever/kill TCP connection in Windows

I would like to see how a program responds when its connection is severed. Aside from disabling the network card, is there a way to sever a TCP connection in Windows without killing the process, or the thread that owns the connection?
The closest thing that I've found to generating an OS error is to use something like TcpView to look at what sockets are open and sever them. I'm not sure exactly what it does to sever the connection, but it does close it in a way that an application can see.
TCPView by SysInternals lets you close a connection (and see all open connections).
One thing I've seen done is to have the network code written in such a way that a connection can be severed remotely. A product I once worked on was written that way. We even had a set of torture tests that would randomly break the connections. The product was meant to be transactional, and it was instructive to see how it behaved.
Of course, we then found a customer whose network was actually breaking connections all the time, and were very glad we'd tested so hard.
Why not just unplug the network cable?
