Performance Implications of Creating a New TCP Connection Per Message

In the past, our server application was designed so that a client creates one TCP connection, keeps it established indefinitely, and sends messages as needed. These messages may come in high-volume bursts or with long idle periods in between. We are now switching to a different connection protocol, in which the client creates a new connection per message and disconnects after sending.
I have a few questions:
I know there is some overhead for the three-way handshake to establish the connection. But is this overhead significant (CPU, memory, bandwidth, etc.)?
Is there any difference in latency between sending a message over an established TCP connection that has been idle for minutes and creating a new connection to send the message?
Are there any other factors or considerations I should take into account when estimating the performance impact of switching to this new connection protocol compared to the old one?
Any help at all is greatly appreciated.

Opening and closing a lot of TCP sessions may impact connection-tracking firewalls and load balancers, causing them to slow down or even fail and reject connections. Some, like Linux iptables conntrack, have moderate default limits on the number of tracked connections.
The program might also run out of available local port numbers if it cycles through connections too quickly. A closed socket lingers in the TIME_WAIT state for a while before it is considered fully closed, and the operating system typically cleans these up on a timer. If too many sockets are opened too quickly, the operating system may not have had time to clean them up.
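A minimal Go sketch of the connect-per-message pattern under discussion (the loopback address and payload are placeholders); run in a tight enough loop against a local server, it will eventually hit ephemeral-port exhaustion because each closed socket lingers in TIME_WAIT:

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// Connect-per-message: every iteration pays a full three-way
	// handshake, and each Close leaves the socket in TIME_WAIT on
	// the closing side before its local port can be reused.
	for i := 0; i < 100000; i++ {
		conn, err := net.Dial("tcp", "127.0.0.1:9000") // placeholder server
		if err != nil {
			// A fast enough loop can fail here once the
			// ephemeral port range is tied up in TIME_WAIT.
			fmt.Printf("dial %d failed: %v\n", i, err)
			return
		}
		conn.Write([]byte("one message")) // error handling elided for brevity
		conn.Close()
	}
}
```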
The handshake adds roughly 80 extra bytes to your bandwidth cost. Closing the TCP connection also involves FIN or RST packets, although these flags may be piggybacked on a data packet.
Latency on an established TCP session might be a tiny bit higher if the Nagle algorithm is enabled on the message sender. Nagle causes the OS to wait for more data before sending a partially filled packet, whereas a TCP session that is being closed flushes all pending data. The same effect can be had on the open session by disabling Nagle with the TCP_NODELAY socket option.
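For illustration, a minimal Go sketch of disabling Nagle on a persistent connection (the address is a placeholder; note that Go already enables TCP_NODELAY by default, so the call is shown explicitly for clarity):

```go
package main

import (
	"log"
	"net"
)

func main() {
	// Persistent connection: pay the handshake once, then make sure
	// Nagle is off so small messages go out immediately instead of
	// waiting while the OS tries to fill a packet.
	conn, err := net.Dial("tcp", "example.com:9000") // placeholder address
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	if tcp, ok := conn.(*net.TCPConn); ok {
		tcp.SetNoDelay(true) // sets TCP_NODELAY (Go's default is already true)
	}
	conn.Write([]byte("low-latency message")) // error handling elided
}
```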

Related

ZeroMQ reliability on bad links

I am wondering how ZeroMQ behaves if messages are delivered over a bad-quality link, e.g. a very unstable, low-level serial connection that might drop individual bytes.
Of course the affected message will be lost in such a case, but will ZeroMQ be able to recover with the next message? Does it find the start of the next message again in any case?
Thank you!
Connection reliability is mostly the responsibility of the TCP protocol: if a socket believes it's connected, then the message is getting through. If packets are lost, TCP detects that and attempts to retransmit them. This all happens "for free" as far as ZMQ is concerned; any connection type using TCP will behave the same way.
When the TCP connection is lost entirely, which presumably could occur if the link is so unreliable that the message never gets through despite repeated attempts by TCP, ZMQ adds another, separate layer of reliability on top of that, allowing your application to reconnect.
What happens to the original or subsequent messages during this outage depends on the ZMQ socket type you've chosen: some socket types drop messages, others queue them. If the message was already in transit, it may be lost because the sending socket has relinquished control over it.
Generally, if you want absolute reliability of message delivery, you'll be writing that yourself in your application, with your own confirmations that messages have been received (a sketch of the idea follows below). In most cases something less than total reliability is needed, and you can rely on TCP and ZMQ to get the job mostly done. If you're so focused on performance that even TCP's reliability would slow you down too much, and you'd rather discard that data and move on, you'll need to use UDP. I've heard of people using UDP with ZMQ, but I haven't tried it and I don't believe it's fully supported across the board.
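As a concrete illustration of rolling your own confirmations, here is a hypothetical Go helper over a plain TCP connection (deliberately not ZMQ-specific; the one-byte ACK convention and the two-second deadline are assumptions made for the sketch):

```go
package reliable

import (
	"fmt"
	"net"
	"time"
)

// sendWithAck writes a message and then blocks, with a deadline, until
// the peer confirms receipt with a single ACK byte. Message framing
// and the retry policy remain the application's responsibility.
func sendWithAck(conn net.Conn, msg []byte) error {
	if _, err := conn.Write(msg); err != nil {
		return err
	}
	if err := conn.SetReadDeadline(time.Now().Add(2 * time.Second)); err != nil {
		return err
	}
	ack := make([]byte, 1)
	if _, err := conn.Read(ack); err != nil {
		return fmt.Errorf("no ack received, retry or reconnect: %w", err)
	}
	return nil
}
```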

Relation between DISCINT and Keep Alive interval in WebSphere MQ server?

We had issues with a lot of applications connecting to the MQ server without properly disconnecting. Hence we introduced DISCINT on our server connection channels with a value of 1800 seconds, which we found ideal for our transactions. But our KeepAlive interval is pretty high at 900 seconds. We would like to reduce it to less than 300, as suggested by the mqconfig utility. But before doing that, I would like to know whether this is going to affect our disconnect interval value, and whether it is going to override the disconnect interval and cause more frequent disconnects, which would be a performance hit for us.
How do these two values work, and how are they related?
Thanks
TCP KeepAlive works below the application layer in the protocol stack, so it does not affect the disconnection of the channel configured by DISCINT.
However, lowering the value can result in more frequent disconnects if your network is unreliable, for example if it has intermittent, very short periods (shorter than the current KeepAlive interval, but longer than the new one) when packets are not flowing.
I think the main difference is that DISCINT disconnects a technically working channel that has not been used for a given period, while KeepAlive is for detecting a TCP connection that is no longer working.
And MQ provides a means of detecting broken connections at the application layer too, configured by the heartbeat interval (HBINT).
These may help:
http://www-01.ibm.com/support/knowledgecenter/SSFKSJ_7.5.0/com.ibm.mq.con.doc/q015650_.htm
http://www-01.ibm.com/support/knowledgecenter/SSFKSJ_7.5.0/com.ibm.mq.ref.con.doc/q081900_.htm
http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/overview.html
http://www-01.ibm.com/support/knowledgecenter/SSFKSJ_7.5.0/com.ibm.mq.ref.con.doc/q081860_.htm

WebSphere MQ DISC vs KAINT on SVRCONN channels

We have a major problem with many of our applications making improper connections (SVRCONN) to the queue manager and not issuing MQDISC when the connection is no longer required. This causes a lot of idle, stale connections, prevents applications from making new connections, and produces CONNECTION BROKEN (2009) errors. We had been restricting application connections with the clientidle parameter in our Windows MQ at version 7.0.1.8, but having migrated to MQ v7.5.0.2 on Linux we are deciding on the best option available in the new version. There is no clientidle in the ini file for v7.5 anymore, but SVRCONN channels have DISCINT and KAINT. I have been going through the advantages and disadvantages of both for our scenario of applications connecting through SVRCONN channels and leaving connections open without issuing a disconnect. Which of these channel attributes is ideal for us? Any suggestions? Does either of them take precedence over the other?
First off, KAINT controls TCP functions, not MQ functions. For it to take effect, the TCP KeepAlive function must be enabled in the TCP stanza of qm.ini. Nothing wrong with that, but the native HBINT and DISCINT are more responsive than delegating to TCP. KAINT addresses the case where the OS hasn't recognized that a socket's remote partner is gone and cleaned up the socket: as long as the socket exists and MQ's channel is idle, MQ won't notice. When TCP cleans the socket up, MQ's exception callback routine sees it immediately and closes the channel.
Of the remaining two, DISCINT controls the interval after which MQ will terminate an idle but active socket, whereas HBINT controls the interval after which MQ will shut down an MCA attached to an orphaned socket. Ideally, you will have a modern MQ client and server so you can use both of these.
The DISCINT should be longer than the longest expected interval between messages if you want the channel to stay up during the production shift. So if a channel should see message traffic at least once every 5 minutes by design, then a DISCINT longer than 5 minutes is required to avoid channel restart time.
The HBINT actually flows a small heartbeat message over the channel, but it only does so if HBINT seconds have passed without a message. This catches the case where the socket is dead but TCP hasn't yet cleaned it up: HBINT allows MQ to discover this before the OS does and take care of it, including tearing down the socket.
In general, really low values for HBINT can cause a lot of unnecessary traffic. For example, HBINT(5) would flow a heartbeat in every five-second interval in which no other channel traffic passes. Chances are you don't need to terminate orphaned channels within 5 seconds of the loss of the socket, so a larger value is usually more useful. That said, HBINT(5) would cause zero extra traffic on a system with a sustained message rate of one per second, until the app died, in which case the orphaned socket would be killed pretty quickly. (A rough sketch of this heartbeat-on-idle pattern follows.)
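To make the heartbeat-on-idle behavior concrete, here is a rough Go sketch of the general pattern. It illustrates the idea only, not MQ's actual implementation; the channel-based API and the "HB" payload are invented for the example:

```go
package heartbeat

import (
	"net"
	"time"
)

// heartbeatLoop forwards application messages and, whenever hbint
// elapses with no traffic, flows a small heartbeat instead, mirroring
// the "only if HBINT seconds passed without a message" rule above.
func heartbeatLoop(conn net.Conn, hbint time.Duration, msgs <-chan []byte) {
	lastSent := time.Now()
	tick := time.NewTicker(hbint / 4) // check idleness a few times per interval
	defer tick.Stop()
	for {
		select {
		case m, ok := <-msgs:
			if !ok {
				return
			}
			if _, err := conn.Write(m); err != nil {
				return
			}
			lastSent = time.Now() // real traffic resets the idle clock
		case <-tick.C:
			if time.Since(lastSent) >= hbint {
				if _, err := conn.Write([]byte("HB")); err != nil {
					return // dead socket discovered by the heartbeat
				}
				lastSent = time.Now()
			}
		}
	}
}
```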
For more detail, please go to the SupportPacs page and look for Morag's "Keeping Channels Running" presentation.

TCP idle connection performance

How do the server and client keep the TCP connection open? Is it heavy on the system, CPU-wise and/or network-wise, even if the connection is idle?
I found a good article about it: http://www.tcpipguide.com/free/t_TCPConnectionManagementandProblemHandlingtheConnec-3.htm
It explains that when a TCP connection is idle, nothing happens on the network; when either side needs to send data again, it simply uses the already-established connection.
Some people think "keepalive" messages are necessary to limit the number of idle connections left open and to ensure that no broken connections are kept around.
Others think keepalive messages are a waste of resources and can cause accidental server disconnections. (A sketch of enabling keepalive per socket follows.)
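Whichever side of that debate you land on, keepalive is a per-socket choice. A minimal Go sketch, with a placeholder address and probe period:

```go
package main

import (
	"log"
	"net"
	"time"
)

func main() {
	conn, err := net.Dial("tcp", "example.com:443") // placeholder address
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	tcp := conn.(*net.TCPConn)
	tcp.SetKeepAlive(true)                   // probe the peer while idle
	tcp.SetKeepAlivePeriod(30 * time.Second) // placeholder probe interval
	// The otherwise-idle connection now generates a tiny probe
	// periodically, so a broken peer is detected instead of the
	// socket lingering open forever.
}
```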

Go(lang): about MaxIdleConnsPerHost in the http client's transport

If MaxIdleConnsPerHost is set to a high number, let's say 1000, the number of open connections will still depend on the other host, right? I mean, allowing 1000 idle connections to the same host will result in 1000 open connections as long as these are not closed by the other host?
So effectively, setting this value to a high number results in never closing connections, only waiting for the other host to do it? Am I interpreting this correctly?
Your understanding is correct. MaxIdleConnsPerHost restricts how many connections there are that are not actively serving requests but that the client has not yet closed.
Idle connections are useful for web browsers because they can keep reusing connections for subsequent HTTP requests to the same server. Idle connections have a cost for the server, though: they use kernel resources, and you may run up against per-process or kernel limits on the number of open connections, files, or handles, which may cause unexpected errors in your program, or even in other programs on the same machine.
As such, be careful when increasing MaxIdleConnsPerHost to a large number. It only makes sense to increase idle connections if you are seeing many requests to the same hosts in a short period (a brief example of tuning this follows).
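A short sketch of what tuning this looks like in practice; the numbers are illustrative, and IdleConnTimeout (available since Go 1.7) lets the client expire idle connections itself rather than waiting for the other host to close them:

```go
package main

import (
	"net/http"
	"time"
)

func main() {
	client := &http.Client{
		Transport: &http.Transport{
			MaxIdleConnsPerHost: 100,              // illustrative cap per host
			IdleConnTimeout:     90 * time.Second, // close idle conns ourselves
		},
	}
	resp, err := client.Get("https://example.com/") // placeholder URL
	if err == nil {
		resp.Body.Close() // returns the connection to the idle pool
	}
}
```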
