What is the standard way of handling TCP connection timeout issues in EventMachine?
The EventMachine::Connection.send_data method does not return a deferrable (future), so there is no way to check whether send_data failed because of a TCP connection timeout or for some other reason.
I have a Delphi application that uses Indy HTTP components (which use Windows sockets). From time to time I receive socket error #10060 (WSAETIMEDOUT - An attempt to connect timed out without establishing a connection) when the following Indy procedure executes:
CheckForSocketError(IdWinsock2.Connect(ASocket, @LAddr, SizeOf(LAddr)));
...
connect : TconnectProc;
...
TconnectProc = function ( const s: TSocket; const name: PSockAddr; const namelen: Integer): Integer; stdcall;
All of this is just a wrapper around the Windows connect function https://learn.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-connect, which returns the WSAETIMEDOUT error and message. So my question is: can this be a programming error at all? Even if the server runs on another computer and has trouble serving requests, the low-level connect should still complete normally. If the server cannot serve a GET/POST request, errors are expected, of course, but they should appear only during other socket calls, not in connect. Isn't that so?
I am trying to solve my problem https://serverfault.com/questions/973648/is-it-possible-that-unencrypted-traffic-can-cause-windows-socket-10057-10060?noredirect=1#comment1266907_973648 and am now investigating what is happening in my code.
My server-side code is very simple: it is just a TIdHttpServer component with the following event implemented (I provide only the event name here):
MyForm.IdHTTPServerCommandGet(AContext: TIdContext;
ARequestInfo: TIdHTTPRequestInfo; AResponseInfo: TIdHTTPResponseInfo);
So what can be wrong with my implementation that could lead to WSAETIMEDOUT on connect? Yes, my procedure can sometimes take a long time, but for years it returned the answer successfully and there were no communication errors. And I guess that the connect function may not even depend on (use or raise) the OnCommandGet event, so I have no control over how the server side handles the client's socket connect call?
This may be connected with TCP (not HTTP) keepalive: perhaps some Windows update reduced the client-side TCP keepalive settings, and this now manifests as this error.
Indy TCP clients, like TIdHTTP, have a public ConnectTimeout property, which is set to 0 (infinite) by default. If no timeout is specified, a hard-coded 2 minute timeout is used if the client's TIdTCPClient.Connect() method is called in the main UI thread and TIdAntiFreeze is active, otherwise no timeout is used.
If a timeout is used, Indy calls Winsock's connect() function in a worker thread and waits for that thread to terminate. If Indy's timeout elapses, the socket is closed to abort the connect(), and then EIdConnectTimeout is raised to the caller. If connect() exits before Indy's timeout elapses, an exception is raised to the caller only if connect() failed.
If no timeout is used, Indy calls Winsock's connect() directly, waits for it to exit of its own accord, and then raises an exception only if it failed.
So, the ONLY way you can get a WSAETIMEDOUT error from Indy when it is calling Winsock's connect() function is if Winsock itself timed out internally before Indy's own timeout elapsed. That does not necessarily indicate a problem in your code. It just means that the host you are trying to connect to is not reachable at that moment. If the server were reachable but could not accept your connection, you would get a different error, such as WSAECONNREFUSED.
If your server is behind a firewall or router, make sure it is not blocking connections from reaching your server. Try running a packet sniffer, such as Wireshark, on the server machine and make sure the 3-way TCP handshake from TIdHTTP is reaching the server machine correctly.
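The distinction drawn above between "timed out" (host not reachable) and "refused" (host reachable, nothing accepting) can be sketched outside Delphi as well. The following is a minimal Ruby illustration, not Indy's implementation; the helper name probe_connect is hypothetical, and it assumes Ruby's standard Socket.tcp with its connect_timeout option.

```ruby
require "socket"

# Hypothetical helper (not part of Indy or Winsock): classify the outcome
# of a single TCP connect attempt bounded by a caller-supplied timeout.
def probe_connect(host, port, timeout_s)
  # With a block, Socket.tcp closes the socket and returns the block's value.
  Socket.tcp(host, port, connect_timeout: timeout_s) { |_sock| :connected }
rescue Errno::ETIMEDOUT
  :timed_out   # no answer within the timeout: host or route unreachable
rescue Errno::ECONNREFUSED
  :refused     # host answered, but nothing is listening on that port
end
```

Calling probe_connect(host, port, 5) and logging the result on each failure would show whether the failures are genuine timeouts or refusals, which points at the network path rather than at the server's request handler.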
I'm using redigo for both regular commands as well as subscribing. Every few days I get this error which causes a panic.
dial tcp IP:6379: connect: connection timed out
I'm guessing there is some lag or a minor disturbance in the network that causes the connection to time out.
How can I avoid this? I'm OK with the program waiting a few seconds until the problem is resolved, rather than panicking.
Should I define timeouts for Dial, such as
DialReadTimeout
DialWriteTimeout
Use DialConnectTimeout to specify a timeout for dialing a network connection or DialNetDial for complete control over dialing a network connection.
The application supplied NetDial function can set timeouts, throttle connect attempts on failure, and more.
Panics related to a dial failure are probably due to a lack of error checking in the application.
DialWriteTimeout and DialReadTimeout are dial options for specifying the timeout when writing a command to the network connection and reading a reply from the network connection respectively. These options have no bearing on timeouts during connect.
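The advice above (bound the connect, check the error, and throttle retries instead of panicking) is language-independent. redigo itself is a Go library, so the following Ruby sketch only illustrates the pattern with the standard socket library; dial_with_retry and its parameters are hypothetical names, not redigo APIs.

```ruby
require "socket"

# Hypothetical helper: try to connect a few times, waiting a little longer
# after each failure, and return [socket, nil] or [nil, last_error]
# instead of letting the error crash the program.
def dial_with_retry(host, port, attempts: 3, connect_timeout: 2, wait: 0.5)
  tries = 0
  begin
    tries += 1
    [Socket.tcp(host, port, connect_timeout: connect_timeout), nil]
  rescue Errno::ETIMEDOUT, Errno::ECONNREFUSED,
         Errno::EHOSTUNREACH, Errno::ENETUNREACH => e
    return [nil, e] if tries >= attempts
    sleep(wait * tries) # simple linear backoff between attempts
    retry
  end
end
```

The caller checks the second element of the result and decides whether to keep waiting or report the failure, which is the error checking the answer says is missing when a dial failure ends in a panic.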
Server:
require 'socket'

s = TCPServer.open(6000)
loop do
  Thread.start(s.accept) do |client|
    # Keep receiving and handling messages from the client
    ...
  end
end
Clients:
server = TCPSocket.open(server_ip, 6000)
... # Send message if event, will keep TCP connection
Question:
Sometimes the network goes down or a client crashes. How does the server know whether the TCP connection is still alive? Is there a method or command to verify the connection?
Thanks
The most reliable way to verify the state of a TCP connection is to send a small probe message to the peer and check whether you get a response or an error. (A zero-length write puts nothing on the wire, so the probe needs at least one byte.) That tells you the current state of the connection.
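A minimal sketch of such a probe, assuming a made-up application protocol in which the peer answers a "PING" line with any line of its own. The name peer_alive? and the PING/PONG convention are illustrative assumptions, not something the code in the question defines.

```ruby
require "socket"

# Hypothetical helper: send a one-line probe and wait up to timeout_s
# seconds for any reply. Returns false on silence, EOF, or a socket error.
def peer_alive?(sock, timeout_s = 2)
  sock.write("PING\n")
  # IO.select returns nil if nothing became readable before the timeout.
  return false unless IO.select([sock], nil, nil, timeout_s)
  !sock.gets.nil? # gets returns nil at EOF, i.e. the peer closed
rescue SystemCallError, IOError
  false # e.g. Errno::EPIPE or Errno::ECONNRESET on a broken connection
end
```

The same check works in either direction: the server can call it on each client socket from a periodic task, provided the client is written to answer the probe.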
Does anyone know if you can set a timeout for a listening socket?
I know that you can use a timeout for a send/recv action with SO_RCVTIMEO and SO_SNDTIMEO (through setsockopt), but in my case I need to set that timeout for a socket in the listen state. If no connection is established within X time, I want to close the socket. Do you know of any socket option that does this?
Thank you.
Yes. Set SO_RCVTIMEO on the listening socket and the accept() call will time out.
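As an alternative that does not depend on how SO_RCVTIMEO interacts with accept() on a given platform or runtime, the wait can be bounded with select before accepting. This is a different technique from the socket option named above; the sketch below uses Ruby's IO.select, and accept_with_timeout is a hypothetical helper name.

```ruby
require "socket"

# Hypothetical helper: wait up to timeout_s seconds for a pending
# connection on a listening socket. Returns the accepted socket, or nil
# if none arrived in time.
def accept_with_timeout(server, timeout_s)
  return nil unless IO.select([server], nil, nil, timeout_s)
  # Non-blocking accept guards against the connection vanishing between
  # select and accept; it returns :wait_readable instead of blocking.
  client = server.accept_nonblock(exception: false)
  client == :wait_readable ? nil : client
end
```

When the helper returns nil, the caller can close the listening socket, which matches the "close after X time without a connection" behaviour asked for.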
What error is returned on AIX/Linux when a connection breaks down due to keepalive activity? Is it a unique error code that can be distinguished from other socket errors?
On Windows this can be either WSAECONNRESET or WSAENETRESET.
Is there a way to differentiate the error due to keepalive activity when WSAECONNRESET is returned?
WSAECONNRESET
10054
Connection reset by peer.
An existing connection was forcibly closed by the remote host. This normally results if the peer application on the remote host is suddenly stopped, the host is rebooted, the host or remote network interface is disabled, or the remote host uses a hard close (see setsockopt for more information on the SO_LINGER option on the remote socket). This error may also result if a connection was broken due to keep-alive activity detecting a failure while one or more operations are in progress. Operations that were in progress fail with WSAENETRESET. Subsequent operations fail with WSAECONNRESET.
Is there a way to differentiate the error due to keepalive activity when WSAECONNRESET is returned ?
No. The underlying condition is a 'connection reset' in all cases.