Stale connection with Pheanstalk - beanstalkd

I'm using beanstalkd to offload some work to other machines. The setup is a bit unusual, the server is on the internet (public ip) but the consumers are behind adsl lines on some peoples homes. So there is a linux server as client going out through a dynamic ip and connecting to the server to get a job. It's all PHP and I'm using pheanstalk library.
Everything runs smoothly for some time, but then the adsl changes the IP (every 24h hours the provider forces a disconnect-reconnect) the client just hangs, never to go out of "reserve".
I thought that putting a timeout on the reserve would help it, but it didn't. As it seems, the client issues a command and blocks, it never checks the timeout. It just issues a reserve-with-timeout (instead of a simple reserve) and it is the servers responsibility to return a TIME_OUT as the timeout occurs. The problem is, the connection is broken (but the TCP/IP doesn't know about that yet until any of the sides try to talk to the other side) and if the client blocked reading, it will never return.
The library seems to have support for some kind of timeouts locally (for example when trying to connect to server), but it does not seem to contemplate this scenario.
How could I detect the stale connection and force a reconnect? Is there some kind of keepalive on the protocol (and on the pheanstalk itself)?
Thanks!

You could try to close each connection right after the request is answered and reopen a new connection each time.
There is no close() function but you deleting the Pheanstaly Object with unset($pheanstalk) will close it.
This explanation is quite helpful:
Pheanstalk (PHP client for beanstalk) - how do connections work?

I haven't tried it yet, but I came up with the idea of connecting to the beanstalk server through an SSH tunnel. We can enable the ServerAliveCountMax and ServerAliveInterval options on the tunnel, so that a network or server failure will cause the tunnel to close. This should then cause the pheanstalk client to report an error.

Related

Why RabbitMQ won't the connection stay open when not in use?

I have used http://github.com/streadway/amqp package in my application in order to handle connections to a remote RabbitMQ server. Everything is ok and works fine but when a connection is idle for a long period of time f.g 6 hours it gets closed. I check NotifyClose(make(chan *amqp.Error)) all time in my go routine and it returns :
Exception (501) Reason: "write tcp
192.168.133.53:55424->192.168.134.34:5672: write: broken pipe"
Why this error happens? (is there any problem in my code?)
How long a connection can be idle?
How to prevent this problem?
As Cosmic Ossifrage says, the error is saying your RabbitMQ client has disconnected.
There are so many things that could sit between your client and server that can/will drop dormant connections that it's not worth focusing on how long your connection can be dormant for. You want to set the requested heartbeat interval in your connection manager.
https://www.rabbitmq.com/heartbeats.html
I'm not familiar with the framework you're using but I see it has a defaultHeartbeat field in connection.go. You might need to experiment with the value to find the best balance is to stop the connection being killed but not hit the server too often with keep-alive traffic.

Irregular socket errors (10054) on Windows application

I am working on a Windows (Microsoft Visual C++ 2005) application that uses several processes
running on different hosts in an intranet.
Processes communicate with each other using TCP/IP. Different processes can be on the
same host or on different hosts (i.e. the communication can be both within the same
host or between different hosts).
We have currently a bug that appears irregularly. The communication seems to work
for a while, then it stops working. Then it works again for some time.
When the communication does not work, we get an error (apparently while a process
was trying to send data). The call looks like this:
send(socket, (char *) data, (int) data_size, 0);
By inspecting the error code we get from
WSAGetLastError()
we see that it is an error 10054. Here is what I found in the Microsoft documentation
(see here):
WSAECONNRESET
10054
Connection reset by peer.
An existing connection was forcibly closed by the remote host. This normally
results if the peer application on the remote host is suddenly stopped, the
host is rebooted, the host or remote network interface is disabled, or the
remote host uses a hard close (see setsockopt for more information on the
SO_LINGER option on the remote socket). This error may also result if a
connection was broken due to keep-alive activity detecting a failure while
one or more operations are in progress. Operations that were in progress
fail with WSAENETRESET. Subsequent operations fail with WSAECONNRESET.
So, as far as I understand, the connection was interrupted by the receiving process.
In some cases this error is (AFAIK) correct: one process has terminated and
is therefore not reachable. In other cases both the sender and receiver are running
and logging activity, but they cannot communicate due to the above error (the error
is reported in the logs).
My questions.
What does the SO_LINGER option mean?
What is a keep-alive activity and how can it break a connection?
How is it possible to avoid this problem or recover from it?
Regarding the last question. The first solution we tried (actually, it is rather a
workaround) was resending the message when the error occurs. Unfortunately, the
same error occurs over and over again for a while (a few minutes). So this is not
a solution.
At the moment we do not understand if we have a software problem or a configuration
issue: maybe we should check something in the windows registry?
One hypothesis was that the OS runs out of ephemeral ports (in case connections are
closed but ports are not released because of TcpTimedWaitDelay), but by analyzing
this issue we think that there should be plenty of them: the problem occurs even
if messages are not sent too frequently between processes. However, we still are not
100% sure that we can exclude this: can ephemeral ports get lost in some way (???)
Another detail that might help is that sending and receiving occurs in each process
concurrently in separate threads: are there any shared data structures in the
TCP/IP libraries that might get corrupted?
What is also very strange is that the problem occurs irregularly: communication works
OK for a few minutes, then it does not work for a few minutes, then it works again.
Thank you for any ideas and suggestions.
EDIT
Thanks for the hints confirming that the only possible explanation was a connection closed error. By further analysis of the problem, we found out that the server-side process of the connection had crashed / had been terminated and had been restarted. So there was a new server process running and listening on the correct port, but the client had not detected this and was still trying to use the old connection. We now have a mechanism to detect such situations and reset the connection on the client side.
That error means that the connection was closed by the
remote site. So you cannot do anything on your programm except to accept that the connection is broken.
I was facing this problem for some days recently and found out that Adobe Acrobat Reader update was the culprit. As soon as you completely uninstall Adobe from the system everything returns back to normal.
I spent a long time debugging a 10054/10053 error in s3 pre-signed uploads
Turns out that the s3 server will reject pre-signed s3 uploads for the first 15 minutes of it's life.
So - If you're debugging s3 check it's not a new bucket.
If you're debugging something else - this is most likely a problem on the server side not client side.

X11 remote application timeout

I am in need of a way to decrease the timeout my X server has on remote applications. Currently X11 will keep an application on the display for a very long time (> 30min) after removing the Ethernet connection. I am needing to timeout within 10-30 seconds of loss of communication with the application.
I am running a standard Xorg server with no modification made to it. I have tried numerous methods for doing this. I have tried using the -to option on the X server but this does not seem to have any effect. I have also tried messing with the TCP properties using sysctl. I have set the tcp_keepalive_* properties to values which should give me the timeout needed but this also does not seem to have an effect on the timeout.
Also, the remote applications are not using SSH tunneling to connect to the server. It is an open sever on a secure connection so tunneling is not needed. The timeout mechanism must be done on the server side as I have no control over the applications.
Anyone have any ideas how to get the needed behavior from the X server?
The X server doesn't have client timeouts. Anything you see that looks like one is TCP's doing, not X's.
If you're lucky, the application you're talking to responds to the _NET_WM_PING protocol (most modern toolkits do this for you internally). If you can at least control the window manager you're using, you could modify it to send ping messages to all your running apps and blow them away with XKillClient if they don't respond promptly.

Socket connection rerouting

Most proxy servers perform the job of forwarding data to an appropriate "real" server. However, I am in the process of designing a distributed system in which when the "proxy" receives a TCP/IP socket connection, the remote system actually connects with a real server which the proxy nominates. All subsequent data flows from remote to the real server.
So is it possible to "forward" the socket connection request so that the remote system connects with the real server?
(I am assuming for the moment that nothing further can be done with the remote system. Ie the proxy can't respond to the connection by sending the IP address of the actual server and the remote connections with that. )
This will be under vanilla Windows (not Server), so can't use cunning stuff like TCPCP.
I assume your "remote system" is the one that initiates connection attempts, i.e. client of the proxy.
If I get this right: when the "remote system" wants to connect somewhere, you want the "proxy server" to decide where the connection will really go ("real server"). When the decision is made, you don't want to involve the proxy server any further - the data of the connection should not pass the proxy, but go directly between the "remote system" and the "real server".
Problem is, if you want the connection to be truly direct, the "remote system" must know the IP address of of the "real server", and vice versa.
(I am assuming for the moment that nothing further can be done with
the remote system. Ie the proxy can't respond to the connection by
sending the IP address of the actual server and the remote connections
with that. )
Like I said, not possible. Why is it a problem to have the "proxy" send back the actual IP address?
Is it security - you want to make sure the connection really goes where the proxy wanted? If that's the case, you don't have an option - you have to compromise. Either the proxy forwards all the data, and it knows where the data is going, or let the client connect itself, but you don't have control where it connects.
Most networking problems can be solved as long as you have complete control over the entire network. Here, for instance, you could involve routers on the path between the "remote system" and the "real client", to make sure the connection is direct and that it goes where the proxy wanted. But this is complex, and probably not an option in practice (since you may not have control over those routers).
A compromise may be to have several "relay servers" distributed around the network that will forward the connections instead of having the actual proxy server forward them. When a proxy makes a decision, it finds the best (closest) relay server, tells it about the connection, then orders the client to connect to the relay server, which makes sure the connection goes where the proxy intended it to go.
There might be a way of doing this but you need to use a Windows driver to achieve it. I've not tried this when the connection comes from an IP other than localhost, but it might work.
Take a look at NetFilter SDK. There's a trial version which is fully functional up to 100000 TCP and UDP connections. The other possibility is to write a Windows driver yourself, but this is non-trivial.
http://www.netfiltersdk.com
Basically it works as follows:
1) You create a class which inherits from NF_EventHandler. In there you can provide your own implementation of methods like tcpConnectRequest to allow you to redirect TCP connections somewhere else.
2) You initialize the library with a call to nf_init. This provides the link between the driver and your proxy, as you provide an instance of your NF_EventHandler implementation to it.
There are also some example programs for you to see the redirection happening. For example, to redirect a connection on port 80 from process id 214 to 127.0.0.0:8081, you can run:
TcpRedirector.exe -p 80 -pid 214 -r 127.0.0.1:8081
For your proxy, this would be used as follows:
1) Connect from your client application to the proxy.
2) The connection request is intercepted by NetFilterSDK (tcpConnectRequest) and the connection endpoint is modified to connect to the server the proxy chooses. This is the crucial bit because your connection is coming from outside and this is the part that may not work.
Sounds like routing problem, one layer lower than TCP/IP;
You're actually looking for ARP like proxy:
I'd say you need to manage ARP packets, chekcing the ARP requests:
CLIENT -> WHOIS PROXY.MAC
PROXY -> PROXY.IP is SERVER.IP
Then normal socket connection via TCP/IP from client to server.

TCP: Address already in use exception - possible causes for client port? NO PORT EXHAUSTION

stupid problem. I get those from a client connecting to a server. Sadly, the setup is complicated making debugging complex - and we run out of options.
The environment:
*Client/Server system, both running on the same machine. The client is actually a service doing some database manipulation at specific times.
* The cnonection comes from C# going through OleDb to an EasySoft JDBC driver to a custom written JDBC server that then hosts logic in C++. Yeah, compelx - but the third party supplier decided to expose the extension mechanisms for their server through a JDBC interface. Not a lot can be done here ;)
The Symptom:
At (ir)regular intervals we get a "Address already in use: connect" told from the JDBC driver. They seem to come from one particular service we run.
Now, I did read all the stuff about port exhaustion. This is why we have a little tool running now that counts ports and their states every minute. Last time this happened, we had an astonishing 370 ports in use, with the count rising to about 900 AFTER the error. We aleady patched the registry (it is a windows machine) to allow more than the 5000 client ports standard, but even then, we are far far from that limit to start with.
Which is why I am asking here. Ayneone an ide what ELSE could cause this?
It is a Windows 2003 Server machine, 64 bit. The only other thing I can see that may cause it (but this functionality is supposedly disabled) is Symantec Endpoint Protection that is installed on the server - and being capable of actinc as a firewall, it could possibly intercept network traffic. I dont want to open a can of worms by pointing to Symantec prematurely (if pointing to Symantec can ever be seen as such). So, anyone an idea what else may be the cause?
Thanks
"Address already in use", aka WSAEADDRINUSE (10048), means that when the client socket prepared to connect to the server socket, it first tried to bind itself to a specific local IP/Port pair that was already in use by another socket, either an active one or one that has been closed but is still in the FD_WAIT state. This has nothing to do with the number of ports that are available.
I'm having the same issue on a Windows 2000 Server with a .Net application connecting to a SQL Server 7.0. There's like 10 servers with the same configuration and only one is showing this error several times a day. With a small test program I'm able to reproduce the error by just establishing a TCP connection on the SQL Server listening port. Running CurrPorts (http://www.nirsoft.net/utils/cports.html) shows there's still plenty of available ports in range 1024-5000.
I'm out of ideas and would like to know if you've found a solution since you've posted your question.
Edit : I finally found the solution : a worm was present on the server (WORM_DOWNAD.A) and exhausted local ports without being noticed.

Resources