Tracking a rare 'TCP/IP failed to establish an outgoing connection' bug - Windows

We're seeing TCP/IP warnings and a few connection failures on our server, but they happen rarely, roughly once a month. For now I've built a small monitoring application that tracks TIME_WAIT states in netstat and watches for anomalies on the system, following Microsoft's documentation on diagnosing port exhaustion. Should I be tracking anything else? Is there a faster way to pin this down? I'm not sure where the bug originates, and this seems to be the only approach available.
The error in question:
TCP/IP failed to establish an outgoing connection because the selected
local endpoint was recently used to connect to the same remote
endpoint. This error typically occurs when outgoing connections are
opened and closed at a high rate, causing all available local ports to
be used and forcing TCP/IP to reuse a local port for an outgoing
connection. To minimize the risk of data corruption, the TCP/IP
standard requires a minimum time period to elapse between successive
connections from a given local endpoint to a given remote endpoint.
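
For reference, a minimal sketch of the kind of TIME_WAIT monitor described above, assuming Node.js on Windows; the threshold and polling interval are arbitrary placeholders, not tuned values:

// time-wait-monitor.js - count TIME_WAIT sockets via netstat (Windows)
const { execFile } = require('child_process');

const THRESHOLD = 3000;        // alert level; tune to your ephemeral-port range
const INTERVAL_MS = 60 * 1000; // poll once a minute

function checkTimeWait() {
  execFile('netstat', ['-an', '-p', 'tcp'], (err, stdout) => {
    if (err) {
      console.error('netstat failed:', err.message);
      return;
    }
    const count = stdout
      .split(/\r?\n/)
      .filter((line) => line.includes('TIME_WAIT')).length;
    console.log(new Date().toISOString(), 'TIME_WAIT sockets:', count);
    if (count > THRESHOLD) {
      console.warn('Possible port exhaustion: TIME_WAIT count above threshold');
    }
  });
}

setInterval(checkTimeWait, INTERVAL_MS);
checkTimeWait();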

Related

nodeJS being bombarded with reconnections after restart

We have a node instance with about 2500 client socket connections. Everything runs fine until something happens to the service (a restart or a failover event in Azure); when the node instance comes back up and all the socket connections try to reconnect, the service grinds to a halt and the log shows nothing but repeated socket connects/disconnects. Even if we stop the service and start it again, the same thing happens. Currently we push a package to our on-premise servers to kill the users' Chrome sessions, and then everything works fine as users begin logging in again. The clients currently connect with 'forceNew' and force WebSockets only, rather than the default long-polling-then-upgrade. Has anyone seen this or have any ideas?
In your socket.io client code, you can force the reconnects to be spread out more over time. The two configuration variables that appear to be most relevant here are:
reconnectionDelay
Determines how long socket.io will initially wait before attempting a reconnect (it should back off from there if the server is down awhile). You can increase this to make it less likely they are all trying to reconnect at the same time.
randomizationFactor
This is a number between 0 and 1.0 and defaults to 0.5. It determines how much the above delay is randomly modified to try to make client reconnects be more random and not all at the same time. You can increase this value to increase the randomness of the reconnect timing.
See the socket.io client documentation for more details.
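For illustration, the two options might be set like this on the client; the URL and the numeric values are placeholders, but reconnectionDelay, reconnectionDelayMax and randomizationFactor are standard socket.io client options:

// socket.io client: spread reconnect attempts out in time
const socket = io('https://example.com', {
  transports: ['websocket'],   // WebSocket only, no long-polling upgrade
  reconnectionDelay: 5000,     // initial wait before the first reconnect attempt (ms)
  reconnectionDelayMax: 60000, // cap for the exponential back-off
  randomizationFactor: 0.8     // 0..1; higher means more jitter between clients
});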
You may also want to explore your server configuration to see whether it is as scalable as possible for moderate bursts of incoming socket requests. While nobody expects a server to accept 2500 simultaneous connections instantly, it should be able to queue connection requests and serve them as it gets time, rather than immediately failing any incoming connection it can't handle right away. There is a desirable middle ground: some number of connections held in a queue (usually controllable by server-side TCP configuration parameters), and when the queue grows too large, connections are failed immediately, at which point socket.io should back off and try again a little later. Adjusting the variables above tells it to wait longer before retrying.
Also, I'm curious why you are using forceNew. That does not seem like it would help you. Forcing webSockets only (no initial polling) is a good thing.
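On the accept-queue point above, a sketch of how the queue depth can at least be hinted from Node.js; the port and backlog value are placeholders, and the kernel may cap the backlog (e.g. via somaxconn on Linux):

const http = require('http');

const server = http.createServer();
// attach socket.io or other handlers here
// listen(port, host, backlog, callback): backlog hints how many pending
// TCP connections the OS should queue before refusing new ones
server.listen(3000, '0.0.0.0', 1024, () => {
  console.log('listening with a backlog hint of 1024');
});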

Socket.io data loss when Internet speed drops

I am using socket.io 1.4 and I want to know what happens in this scenario:
The client emits like this:
socket.emit('test', data);
The client does 3 emits to the server, but the Internet speed suddenly drops and those emits may not reach the server.
After a while the speed recovers, but what happens to the previously failed emits?
Will they be emitted again automatically?
How should I handle that?
Websockets use TCP, which is in general a reliable protocol. There is not exactly such a thing as "The internet speed dropped and I lost some messages." If some messages are lost they will be automatically retransmitted at the TCP level. If retransmission fails completely, the connection will be reset.
So what you are really asking is how socket.io handles this. The answer is that it has some reconnection logic of its own, and you may also want to monitor the connection in case it resets (hook up a listener for the disconnect event on the socket) if you want to take some extra action, like notifying the user.
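If individual messages must not be lost, one hedged approach is to combine socket.io acknowledgements with an application-level retry queue. This is only a sketch: the 'test' event name matches the question, 'socket' is assumed to be an already-connected client socket, and the server is assumed to call the ack callback for each message it receives:

// pending holds messages the server has not yet acknowledged
const pending = [];

function sendWithAck(data) {
  socket.emit('test', data, () => {
    // server invoked the ack callback: delivery confirmed, drop from queue
    const i = pending.indexOf(data);
    if (i !== -1) pending.splice(i, 1);
  });
}

function emitReliably(data) {
  pending.push(data);
  sendWithAck(data);
}

socket.on('disconnect', () => {
  console.warn('connection lost; unacknowledged messages will be retried');
});

socket.on('reconnect', () => {
  // re-send anything the server never acknowledged
  pending.forEach(sendWithAck);
});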

Relation between DISCINT and Keep Alive interval in WebSphere MQ server?

We had issues with a lot of applications connecting to the MQ server without properly disconnecting. Hence we introduced DISCINT on our server connection channels with a value of 1800 seconds, which we found ideal for our transactions. But our keep-alive interval is pretty high at 900 seconds. We would like to reduce it to under 300, as suggested by the mqconfig utility. Before doing that, I would like to know whether this will affect our disconnect interval value, whether it will override it, and whether it will cause more frequent disconnects, which would be a performance hit for us.
How do these two values work, and how are they related?
Thanks
TCP keepalive works below the application layer in the protocol stack, so it does not affect the disconnection of the channel configured by DISCINT.
However, lowering the value can result in more frequent disconnects if your network is unreliable, for example if it has intermittent periods (shorter than the current keepalive interval but longer than the new one) when packets are not flowing.
I think the main difference is that DISCINT disconnects a technically working channel that has not been used for a given period, while keepalive detects a TCP connection that is no longer working.
MQ also provides a means to detect dead connections at the application layer, configured by the heartbeat interval (HBINT).
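To make the layering concrete, here is a small Node.js sketch of TCP keepalive as seen from any socket API; the host and idle delay are placeholders (1414 is just the conventional MQ listener port), and the probe interval and retry count come from OS-level settings rather than the application:

const net = require('net');

const conn = net.connect(1414, 'mq.example.com', () => {
  // Ask the OS to start sending TCP keepalive probes once the connection
  // has been idle for 300 s. Only the idle delay is set here.
  conn.setKeepAlive(true, 300 * 1000);
});

conn.on('error', (err) => {
  // A dead peer eventually surfaces here (e.g. ETIMEDOUT) once
  // keepalive probes go unanswered.
  console.error('connection failed:', err.message);
});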
These may help:
http://www-01.ibm.com/support/knowledgecenter/SSFKSJ_7.5.0/com.ibm.mq.con.doc/q015650_.htm
http://www-01.ibm.com/support/knowledgecenter/SSFKSJ_7.5.0/com.ibm.mq.ref.con.doc/q081900_.htm
http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/overview.html
http://www-01.ibm.com/support/knowledgecenter/SSFKSJ_7.5.0/com.ibm.mq.ref.con.doc/q081860_.htm

Connecting timeout for WebSockets

I can't find any information that specifies the timeout for establishing a connection with a WebSocket server. That is: how long can the state "CONNECTING" be active before it changes to "CLOSED"? When the target host doesn't exist, the state changes almost immediately; that much is easy to verify. But what happens if the DNS lookup takes longer, or the server is busy?
The same question arises for the "CLOSING" state when the connection goes through the closing handshake procedure. If the connection fails here, how long does it take until onClose() is called?
Are these two timeouts browser specific? If so, does anyone know any concrete numbers?
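No concrete browser numbers to offer here, but one way to sidestep the browser-specific defaults entirely is to impose your own timeout on the CONNECTING state; a sketch, with an arbitrary 10-second limit:

// Wrap the handshake in an application-level timeout.
function connectWithTimeout(url, timeoutMs) {
  return new Promise((resolve, reject) => {
    const ws = new WebSocket(url);
    const timer = setTimeout(() => {
      ws.close(); // abort the attempt; onclose fires later
      reject(new Error('WebSocket connect timed out'));
    }, timeoutMs);

    ws.onopen = () => {
      clearTimeout(timer);
      resolve(ws);
    };
    ws.onerror = () => {
      clearTimeout(timer);
      reject(new Error('WebSocket connection failed'));
    };
  });
}

// Usage: give the handshake at most 10 seconds.
connectWithTimeout('wss://example.com/socket', 10000)
  .then((ws) => ws.send('hello'))
  .catch((err) => console.error(err.message));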

Performance Implication of Creating New TCP Connection Per Message

In the past, our server application was designed so that a client creates one TCP connection, keeps this connection established indefinitely and sends messages when needed. These messages may come in high volume bursts or with longer idle periods in between. We are switching to a different connection protocol now, where the client will create a new connection per message and then disconnect after sending.
I have a few questions:
I know there is some overhead in the three-way handshake to establish the connection, but is this overhead significant (CPU, memory, bandwidth, etc.)?
Is there any difference in latency between sending a message over an established TCP connection that has been idle for minutes and creating a new connection to send the same message?
Are there any other factors/considerations that I should be considering if I'm trying to determine the performance impact of switching to this new connection protocol compared to the old one?
Any help at all is greatly appreciated.
Opening and closing a lot of TCP sessions may impact connection tracking firewalls and load balancers, causing them to slow down or even fail and reject the connection. Some, like the Linux iptables conntrack, have moderate default values for the number of tracked connections.
The program might run out of available local port numbers if it cycles connections too quickly. A closed socket lingers in the TIME_WAIT state for a timeout period before the operating system reclaims it, and there is often an operating system timer to clean up these closed connections. If too many sockets are opened and closed too quickly, the operating system may not have had time to clean them up.
The handshake adds about an extra 80 bytes to your bandwidth cost. The TCP connection close also involves FIN or RST packets, although these flags may be combined with the data packet.
Latency in an established TCP session might be a tiny bit higher if the Nagle algorithm is turned on for the message sender. Nagle causes the OS to wait for more data before sending a partially filled packet. The TCP session that is being closed will flush all data. The same effect can be had in the open session by disabling Nagle with the TCP_NODELAY flag.
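As an illustration of that last point, disabling Nagle on a long-lived connection in Node.js; the host and port are placeholders:

const net = require('net');

const socket = net.connect(9000, 'server.example.com', () => {
  // TCP_NODELAY: send small writes immediately instead of waiting to
  // coalesce them into fuller packets (Nagle's algorithm)
  socket.setNoDelay(true);
  socket.write('short message\n');
});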
