Connection timeout for WebSockets

I can't find any information that specifies the timeout for establishing a connection to a WebSocket server. In other words, how long can the state "CONNECTING" be active before it changes to "CLOSED"? When the target host doesn't exist, the state changes almost immediately; that much is easy to verify. But what happens if the DNS lookup takes longer or the server is busy?
The same question arises for the "CLOSING" state while the connection goes through the closing handshake. If the connection fails here, how long does it take until onClose() is called?
Are these two timeouts browser-specific? If so, does anyone know any concrete numbers?
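For reference, one workaround that avoids depending on the browser's internal limit is to enforce your own connect timeout on the client. A minimal sketch, assuming a 10-second budget and a placeholder endpoint URL:
// Sketch only: abort the handshake ourselves if CONNECTING lasts too long.
// URL and the 10 s budget are assumptions for illustration.
const ws = new WebSocket('wss://example.com/socket');
const connectTimer = setTimeout(() => {
  if (ws.readyState === WebSocket.CONNECTING) {
    ws.close(); // aborts the handshake; onclose then fires with wasClean === false
  }
}, 10000);
ws.onopen = () => clearTimeout(connectTimer);
ws.onclose = (event) => {
  clearTimeout(connectTimer);
  console.log('closed, wasClean =', event.wasClean);
};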

Related

Tracking 'TCP/IP failed to establish an outgoing connection' bug that happens rarely

We're seeing TCP/IP warnings and a few connection failures on our server, but they happen pretty rarely, about once a month. For now I've built a small monitoring application that tracks TIME_WAIT states in netstat and watches for anomalies on the system we are monitoring (a rough sketch of that monitor follows the error text below); this follows Microsoft's documented guidance on handling port exhaustion. Should I be tracking anything else? Is there a faster way to resolve this problem? I'm not sure where the bug originates, but this seems to be the only approach available.
The error in question:
TCP/IP failed to establish an outgoing connection because the selected
local endpoint was recently used to connect to the same remote
endpoint. This error typically occurs when outgoing connections are
opened and closed at a high rate, causing all available local ports to
be used and forcing TCP/IP to reuse a local port for an outgoing
connection. To minimize the risk of data corruption, the TCP/IP
standard requires a minimum time period to elapse between successive
connections from a given local endpoint to a given remote endpoint.
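For illustration, a stripped-down sketch of the kind of TIME_WAIT monitor described above (Node.js; the sampling interval and threshold are arbitrary assumptions, not documented limits):
// Count TIME_WAIT entries from netstat periodically and flag anomalies.
const { exec } = require('child_process');
function checkTimeWait() {
  exec('netstat -an', (err, stdout) => {
    if (err) return console.error('netstat failed:', err);
    const count = stdout.split('\n').filter(line => line.includes('TIME_WAIT')).length;
    console.log(new Date().toISOString(), 'TIME_WAIT sockets:', count);
    if (count > 3000) console.warn('Possible port exhaustion building up'); // assumed threshold
  });
}
setInterval(checkTimeWait, 60 * 1000); // sample once a minute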

Unable to connect to Elastic Search intermittently

I am trying to connect to Elasticsearch via the Jest client.
Sometimes the client is not able to connect to the Elasticsearch cluster.
Stack Trace :
org.apache.http.NoHttpResponseException: search-xxx-yyy.ap-southeast-1.es.amazonaws.com:443 failed to respond
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:143)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
The Elasticsearch cluster is on a public domain, so I don't understand why the client is unable to connect.
Also, the issue happens intermittently; if I retry the request, it sometimes connects.
Any help is appreciated. Thanks
When JestClient initiates the HTTP request, it calls read() on the socket and blocks. When that read returns -1, it means the server closed the connection before or while the client was waiting for the response.
Why it happens
There are two main causes of NoHttpResponseException:
1. The server end of the connection was closed before the client attempted to send a request down it.
2. The server end closes the connection in the middle of a request.
Stale Connection (connection closed before request)
Most often this is a stale connection. With persistent connections, a connection may sit unused in the connection pool for a while. If it is idle for longer than the server's or load balancer's HTTP keep-alive timeout, the server or load balancer will close it. The Jakarta client isn't structured to receive a notification of this happening (it doesn't use NIO), so the connection sits around in a half-closed state. The only way the client can detect this state is by reading from the socket. So when you send a request, the write succeeds because the socket is only half closed (writes succeed until you close your end), but then the read indicates the socket was closed, and the request fails.
Connection Closed Mid-Request
The other reason this might occur is that the connection was actually closed while the service was working on the request. Anything between your client and the service may close the connection, including load balancers, proxies, or the HTTP endpoint fronting your service. If your operations are long-running or you're transferring a lot of data, the window for something to go wrong is larger and the connection is more likely to be lost in the middle of the request. An example is a Java server process exiting after an OutOfMemoryError thrown while trying to return a large amount of data. You can verify whether this is the problem by looking at TCP dumps to see whether the connection is closed while the request is in flight. Also, failures of this type usually occur some time after the request is sent, whereas stale-connection failures always occur immediately when the request is made.
Diagnosing The Cause
NoHttpResponseException is usually a stale connection (according to problems I've observed and helped people with)
When the failure always occurs immediately after submitting the request, stale connection is almost certainly the problem
When failures occur some non-trivial amount of time after waiting for the response, then the connection wasn't stale when the request was made and the connection is being closed in the middle of the request
TCP dumps can be more conclusive: you can see whether the connection is closed before or during the request.
What can be done about it
Use a better client
Nonblocking HTTP clients exist that allow the caller to know when a connection is closed without having to try to read from the connection.
Retry failed requests
If your call is safe to retry (e.g. it's idempotent), this is a good option. It also covers all sorts of transient failures besides stale connection failures. NoHttpResponseException isn't necessarily a stale connection and it's possible that the service received the request, so you should take care to retry only when safe.
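As an illustration of the retry-only-when-safe idea (sketched here in JavaScript for brevity rather than the Java/Jest client from the question; the helper, attempt count, and delays are hypothetical):
// Retry an idempotent operation a few times with a small linear backoff.
async function retryIdempotent(operation, attempts = 3) {
  for (let i = 0; i < attempts; i++) {
    try {
      return await operation();
    } catch (err) {
      if (i === attempts - 1) throw err;                     // out of attempts
      await new Promise(r => setTimeout(r, 500 * (i + 1)));  // back off before retrying
    }
  }
}
// Usage: only wrap calls that are safe to repeat (e.g. read-only searches).
// retryIdempotent(() => client.execute(searchAction));      // hypothetical client call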

nodeJS being bombarded with reconnections after restart

We have a Node instance with about 2500 client socket connections. Everything runs fine until something happens to the service (a restart or a failover event in Azure); when the Node instance comes back up and all the socket connections try to reconnect, the service grinds to a halt and the log just shows repeated socket connects/disconnects. Even if we stop the service and start it again, the same thing happens. We currently send out a package to our on-premise servers to kill the users' Chrome sessions, and then everything works fine as users begin logging in again. The clients currently connect with 'forceNew' and force WebSockets only rather than the default long-polling-then-upgrade. Has anyone seen this or have any ideas?
In your socket.io client code, you can force the reconnects to be spread out in time more. The two configuration variables that appear to be most relevant here are:
reconnectionDelay
Determines how long socket.io will initially wait before attempting a reconnect (it should back off from there if the server is down for a while). You can increase this to make it less likely that all clients try to reconnect at the same time.
randomizationFactor
This is a number between 0 and 1.0 and defaults to 0.5. It determines how much the above delay is randomly modified, so that client reconnects are spread out rather than all happening at the same time. You can increase this value to increase the randomness of the reconnect timing.
See client doc here for more details.
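A sketch of what those options look like on the client; the concrete numbers are only examples of spreading reconnects out, not recommendations:
// Example socket.io client options (illustrative values, placeholder URL).
const io = require('socket.io-client');
const socket = io('https://example.com', {
  transports: ['websocket'],   // WebSocket only, as in the question
  reconnectionDelay: 5000,     // wait longer before the first reconnect attempt
  reconnectionDelayMax: 60000, // cap on the back-off between attempts
  randomizationFactor: 1.0     // maximum jitter so clients don't reconnect in lockstep
});
socket.on('connect', () => console.log('connected'));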
You may also want to look at your server configuration to see whether it is as scalable as possible under a moderate burst of incoming socket requests. While nobody expects a server to handle 2500 simultaneous connection attempts instantly, the server should be able to queue the connection requests and serve them as it gets time, rather than immediately failing every incoming connection it can't handle right away. The desirable middle ground is to hold some number of pending connections in a queue (usually controllable by server-side TCP configuration parameters) and only fail connections immediately once that queue gets too large; socket.io will then back off and try again a little later. Adjusting the variables above tells it to wait longer before retrying.
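In Node, one server-side knob of that kind is the listen backlog; a minimal sketch (the backlog value is an assumption, and the OS limit, e.g. net.core.somaxconn on Linux, still caps it):
// Raise the pending-connection backlog so bursts of reconnects queue up
// instead of being refused outright. A socket.io server would attach to
// this HTTP server as usual.
const http = require('http');
const server = http.createServer();
server.listen(3000, '0.0.0.0', 1024, () => {
  console.log('listening with a larger backlog');
});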
Also, I'm curious why you are using forceNew. That does not seem like it would help you. Forcing WebSockets only (no initial polling) is a good thing.

Websockets and uwsgi - detect broken connections client side?

I'm using uwsgi's websockets support and so far it's looking great: the server detects when the client disconnects, and the client detects when the server goes down. But I'm concerned this will not work in every case/browser.
In other frameworks, namely sockjs, the connection is monitored by sending regular messages that work as heartbeats/pings. But uwsgi sends PING/PONG frames (i.e. control frames, not regular messages) according to the WebSocket spec, so from the client side I have no way to know when the last ping was received from the server. So my question is this:
If the connection is dropped or blocked by some proxy, will browsers (i.e. Chrome, IE, Firefox, Opera) reliably detect that no PING was received from the server and signal the connection as down, or should I implement some additional ping/pong system so that the connection is detected as closed from the client side?
Thanks
You are totally right. There is no way from the client side to track or send ping/pongs. So if the connection drops, the server can detect this condition through the ping/pong, but the client is left hanging... until it tries to send something and the underlying TCP mechanism detects that the other side is not ACKnowledging its packets.
Therefore, if the client application expects to be "listening" most of the time, it may be convenient to implement a keep-alive system that works "both ways", as Stephen Cleary explains in the link you posted. But this keep-alive system would be part of your application layer, rather than part of the transport layer like ping/pongs.
For example, you can have a message like "{token:'whatever'}" that the server and client simply echo back and forth with a 5-second delay. The client should have a timer with a 10-second timeout that is stopped every time that message is received and restarted every time the message is echoed; if the timer fires, the connection can be considered dropped.
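A minimal client-side sketch of that echo scheme; the endpoint URL is a placeholder, and the server is assumed to echo the message back after roughly 5 seconds:
// Application-level heartbeat: if no echo arrives within 10 s, treat the
// connection as dropped. URL, message, and intervals are assumptions.
const ws = new WebSocket('wss://example.com/socket');
const HEARTBEAT = JSON.stringify({ token: 'whatever' });
let deadTimer = null;
function resetDeadTimer() {
  clearTimeout(deadTimer);
  deadTimer = setTimeout(() => {
    console.warn('No heartbeat for 10 s, considering the connection dropped');
    ws.close();
  }, 10000);
}
ws.onopen = () => { ws.send(HEARTBEAT); resetDeadTimer(); };
ws.onmessage = (event) => {
  if (event.data === HEARTBEAT) {
    resetDeadTimer();
    setTimeout(() => ws.send(HEARTBEAT), 5000); // echo back with a 5 s delay
  }
};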
Although browsers that implement the same RFC as uWSGI should reliably detect when the server closes the connection cleanly, they won't detect when the connection is interrupted midway (half-open connections). So, from what I understand, we should employ an extra mechanism like application-level pings.

WinSock best accept() practices

Imagine you have a server which can handle only one client at a time. The server uses WSAAsyncSelect to be notified of new connections. In this case, what is the best way of handling FD_ACCEPT messages:
A) Accept the connection attempt right away but queue the client until its turn?
B) Do not accept the next connection attempt until we are done serving the currently connected client?
What do you guys think is the most efficient?
Here I describe the cons that I'm aware for both options. Hopefully this might help you decide.
A)
Upon a new client connection, the client could send tons of data, filling your receive buffer and causing unnecessary packets to be transmitted (see this). If you don't plan to receive any data from the client, shut down receiving on that socket; then, if the client sends any data after that, the connection is reset. Moreover, if your protocol has strict rules, disconnect the client.
If the connection stays idle for too long, the system might disconnect it. To solve this, use setsockopt to set SO_KEEPALIVE on each client socket.
B)
If you don't accept the connection within a certain period (I guess the default is 60 seconds), it will time out. In a normal (or the most common) situation this indicates the server is overloaded and thus unable to answer in time. However, if the client is also designed by you, make the socket non-blocking, try to connect, and then manage the timeout as you wish.
Ask yourself: what do you want the user experience to be at the other end? Do you want them to be stuck? Do you want them to time out? Do you want them to get a polite message?
