Why does TCP performance degrade with bidirectional traffic?

I've run a simple experiment on TCP transport performance, as follows:
There are two machines, A and B, each running Ubuntu 12.04 Server. I've installed "iperf" on both machines and use it to measure the transfer rate. A and B are connected through a 100Mbps link. The experiment goes like this:
I use iperf to send from A to B in TCP mode. The rate reported by iperf on both sides is 100Mbps and very stable.
I start another iperf process to send from B to A with the same settings. The rate reported on both sides is slightly lower, a stable 99Mbps, which is understandable.
I start one more iperf process, again sending from A to B, alongside the previous two flows. Now the weird thing happens: the rates of all three flows settle at a very stable 50Mbps on both sides.
I understand why flow 1 and flow 3 share one direction of the link and get 50Mbps each. But why is the backward flow, flow 2, also affected and also reduced to 50Mbps? Shouldn't the two directions of the link be regarded as independent links that don't interfere with each other?
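For reference, here is roughly how the three flows can be driven; a sketch assuming iperf 2.x with "iperf -s" already running on each machine ("hostA"/"hostB" are placeholders):

    # Sketch: launch flows 1 and 3 from machine A; flow 2 runs on machine B.
    import subprocess, time

    # Flow 1: A -> B, reports ~100Mbps when alone.
    flow1 = subprocess.Popen(["iperf", "-c", "hostB", "-t", "60", "-i", "1"])
    time.sleep(10)
    # Flow 2 is started on machine B with the mirror command:
    #   iperf -c hostA -t 60 -i 1
    time.sleep(10)
    # Flow 3: a second A -> B client; all three then settle at ~50Mbps.
    flow3 = subprocess.Popen(["iperf", "-c", "hostB", "-t", "60", "-i", "1"])
    flow1.wait()
    flow3.wait()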

Related

How to optimize ZeroMQ Performance On Windows (XP SP3)

I have two Windows XP SP3 machines in which I am trying to send 3k ZMQ messages from one to the other. These are both fairly modern systems (a dual quad-core Xeon with the 5100 chipset and a dual hex-core Xeon with the 5500 chipset) with server-grade Intel gigabit Ethernet cards.
The two machines are connected point to point without a switch or router in between.
With pcttcp for performance comparison I am able to send 70MB/s (56% utilization) via TCP from one machine to the other. With ZMQ PUSH/PULL I am only able to get ~28MB/s between the two.
With the sender and receiver on the same machine (the slower machine of the two) I am able to achieve a rate of 97MB/s. (220MB/s in the dual hex core)
The PUSH/PULL channel has a HWM set on both ends. It performs marginally better if the HWM sizes are set low (~150 messages) rather than to a larger value like 1024.
I tried 6000-byte jumbo frames and it got worse (pcttcp performed marginally better though, at 72MB/s).
I tried setting TcpWindowSize to a larger value but that seemed to make things worse as well; ZMQ liked the lower size, and pcttcp did not change. TcpWindowSize is now set to 32K.
Other parameters:
TcpAckFrequency = 1 // would not work without this.
Tcp1323Opts = 1
Receive Side Scaling enabled
How should I approach finding the bottleneck? What should I expect to achieve with TCP and ZMQ performance? The ZeroMQ web site's performance section details tests in which the throughput approaches that of TCP (95%+).
Any performance tips / wisdom (aside from use linux, ;-) ) would be greatly appreciated.
Thanks!!!
Another clue: if I set up multiple sender/receiver pairs between the two systems (same direction, different ports) I am able to achieve a higher aggregate rate (a total of ~42MB/s with three).
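For anyone reproducing this, a minimal pyzmq PUSH/PULL pair with the high-water mark set on both ends (port, message size, and hostname are placeholders; modern pyzmq exposes SNDHWM/RCVHWM where ZMQ 2.x had a single HWM option):

    # Sender side (PUSH), with a low HWM as described in the question.
    import zmq

    ctx = zmq.Context()
    push = ctx.socket(zmq.PUSH)
    push.setsockopt(zmq.SNDHWM, 150)   # blocks once 150 messages are queued
    push.bind("tcp://*:5555")

    msg = b"\x00" * 3000               # ~3 kB payload
    for _ in range(100000):
        push.send(msg)

    # Receiver side (PULL), run on the other machine:
    #   pull = ctx.socket(zmq.PULL)
    #   pull.setsockopt(zmq.RCVHWM, 150)
    #   pull.connect("tcp://sender-host:5555")
    #   while True:
    #       pull.recv()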
A quick google pulled this up http://comments.gmane.org/gmane.network.zeromq.devel/10089
The nugget out of that thread is TcpDelAckTicks:
"I got a huge increase of performance (2.4 seconds to 0.4 seconds) after setting TcpDelAckTicks registry value to the machine that does "apr_socket_accept()" -call in the server code. Client just sends request and waits for response in loop. There was no change in performance."
The reason I got there was because I was looking for something around MTU, thinking that it might be network related.
And then I found this http://lists.zeromq.org/pipermail/zeromq-dev/2010-November/007814.html, which has a number of performance tuning recommendations (though not specifically for XP). I won't summarise them here, as it would be an almost direct copy and paste (I'm not sure I can be more succinct).
I'm not sure this'll be helpful, but you might not have spotted them.

Measuring link quality between two machines

Are there any standard methods (libraries) for measuring the quality of a link/connection between two computers?
The results would be used to improve routing logic: if the condition of a connection is unacceptable, stop data transfers to that computer and initiate an alternative route for them. It looks like Skype has some of this functionality.
I was thinking of establishing several continuous test streams that can reveal bandwidth problems, plus some kind of ping-pong messaging logic to measure latency.
Link Reliability
I usually use a continuous traceroute (e.g. mtr) for isolating unreliable links; but for your purposes, you could start with average ping statistics as #recursive mentioned. Migrate to more complicated things (like a UDP/TCP echo protocol) if you find that ICMP is getting blocked too often by client firewalls in the path.
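A minimal sketch of such an echo probe (peer address, port, and sample count are placeholder assumptions; the far end must echo datagrams back with a trivial recvfrom/sendto loop):

    # UDP ping-pong latency probe: measures RTT and counts lost replies.
    import socket, time

    PEER = ("192.0.2.10", 7)                     # placeholder echo server
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(1.0)

    rtts = []
    for seq in range(20):
        t0 = time.perf_counter()
        sock.sendto(seq.to_bytes(4, "big"), PEER)
        try:
            sock.recvfrom(64)
            rtts.append(time.perf_counter() - t0)
        except socket.timeout:
            pass                                 # count as loss
    if rtts:
        print("avg RTT: %.1f ms, loss: %d/20"
              % (sum(rtts) / len(rtts) * 1e3, 20 - len(rtts)))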
Bandwidth / Delay Estimation
For bandwidth and delay estimation, yaz provides a low-bandwidth algorithm to estimate throughput / delay along the path; it uses two different endpoints for measurement, so your client and servers will need to coordinate their usage.
Sally Floyd maintains a pretty good list of bandwidth estimation tools that you may want to check out if yaz isn't what you are looking for.
Ping is good for testing latency, but not bandwidth.

Does the iperf tool measure throughput at the server side or the client side?

From the iperf man page:
iperf is a tool for performing network throughput measurements. It can test either TCP or UDP throughput. To perform an iperf test the user must establish both a server (to discard traffic) and a client (to generate traffic).
Basically you run an iperf server at one end and an iperf client at the other.
My question is: suppose there are machines A and B. If you run the iperf server on machine A and the client on machine B, you get number X; if you run the iperf server on machine B and the client on machine A, you get number Y.
X and Y denote throughput. Which machine's (A or B) throughput does X denote?
If you say it is not machine-specific and only denotes the throughput of the link, why do I observe different throughput when I swap the client and server (which I actually have)?
Thanks in advance.
When you swap the machines, you're swapping the operations, which are slightly different. In the case of TCP, sending is directly affected by the length of the buffers in the TCP/IP stack (since iperf is an application dumping data down into the stack), so what it measures is how fast it was able to dump the data into the TCP/IP stack; the smaller the buffers, the slower the operation gets. Receiving is different: most likely the application's CPU will be consuming the packets as they arrive, so the buffers will be empty most of the time.
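A sketch of that sender-side effect (host, port, and buffer sizes are assumptions; any TCP sink on the other end will do, e.g. "nc -l 5001 > /dev/null"):

    # The sender-side "throughput" is really the rate at which sendall()
    # drains into the kernel's send buffer; shrink SO_SNDBUF and the
    # reported number drops even though the wire is unchanged.
    import socket, time

    s = socket.socket()
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 32 * 1024)  # try 256K too
    s.connect(("192.0.2.10", 5001))          # placeholder sink address

    chunk = b"\x00" * 65536
    sent = 0
    t0 = time.perf_counter()
    while time.perf_counter() - t0 < 10.0:
        s.sendall(chunk)                     # blocks only while buffer is full
        sent += len(chunk)
    print("sender-side rate: %.1f Mbit/s" % (sent * 8 / 10.0 / 1e6))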

What is the best way to do "polling"?

This question is related to microcontroller programming, but anyone may suggest a good algorithm to handle this situation.
I have one central console and a set of remote sensors. The central console has a receiver, and each sensor has a transmitter operating on the same frequency, so we can only implement simplex communication.
Since the transmitters work on the same frequency, we cannot have two sensors sending data to the central console at the same time.
Now I want to program the sensors to perform some "polling". The central console should get some idea of which sensors exist (whether each sensor is responding or not).
I can imagine several ways:
Use the same interval between poll messages for each sensor and start the sensors at random times, so they will not transmit at the same time.
Use some round-robin mechanism: sensor 1 starts polling at 5 seconds, the second at 10 seconds, etc. A more formal version of method 1.
The maximum data transfer rate is around 4800 bps, so we need to consider that as well.
Can someone suggest a good way to resolve this with minimal usage of the communication link? Note that we can use different poll intervals for each sensor if necessary.
I presume what you describe is that the sensors and the central unit are connected to a bus that can deliver only one message at a time.
A normal way to handle this is collision detection. This is, as far as I know, how (classic) Ethernet operates. You try to send a message, then attempt to detect a collision. If you detect one, wait a random amount of time (to break ties) and then re-transmit, again with a collision check.
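A toy sketch of that send/backoff loop; send and collision_detected are hypothetical hooks into whatever radio or bus driver you have:

    import random, time

    def transmit_with_backoff(send, collision_detected, max_tries=8):
        """Send; on collision, wait a random (growing) time and retry."""
        for attempt in range(max_tries):
            send()
            if not collision_detected():
                return True
            # Random exponential backoff breaks ties between colliding nodes.
            time.sleep(random.uniform(0, 0.01 * 2 ** attempt))
        return False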
If you can't detect collisions, the different sensors could have polling intervals that are all distinct prime numbers. This would guarantee that every sensor has dedicated slots for successful polling. Of course there would still be collisions, but they wouldn't need to be detected. Here is an example with primes 5, 7 and 11:
----|----|----|----|----|----|----|----| (5)
------|------|------|------|------|----- (7)
----------|----------|----------|-:----- (11)
                                  * COLLISION (5 and 7 coincide at t = 35)
Notably, it doesn't matter whether the sensors start "in phase" or "out of phase".
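A quick simulation (tick length and horizon chosen arbitrarily) confirms the first collision lands at tick 35, as in the diagram:

    # Count collisions among sensors polling every 5, 7 and 11 ticks.
    periods = [5, 7, 11]
    for t in range(1, 121):
        talkers = [p for p in periods if t % p == 0]
        if len(talkers) > 1:
            print("tick %3d: collision between periods %s" % (t, talkers))
    # First collision: tick 35 (5 and 7); each period keeps dedicated slots.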
I think you need to look into a collision detection system (a la Ethernet). If you rely on time-based synchronization instead, you are relying on the clocks of the console and sensors never drifting out of sync. This might be OK if they are connected to an external, reliable time reference, or if you go to the expense of putting a battery-backed RTC on each one.
Consider using all or part of an existing protocol, unless protocol design is an end in itself - apart from saving time you reduce the probability that your protocol will have a race condition that causes rare irreproducible bugs.
A lot of protocols for this situation have the sensors keeping quiet until the master specifically asks them for the current value. This makes it easy to avoid collisions, and it makes it easy for the master to request retransmissions if it thinks it has missed a packet, or if it is more interested in keeping up to date with one sensor than with others. This may even give you better performance than a system based on collision detection, especially if commands from the master are much shorter than sensor responses. If you end up with something like Alohanet (see http://en.wikipedia.org/wiki/ALOHAnet#The_ALOHA_protocol) you will find that the tradeoff between not transmitting very often and having too many collisions forces you to use less than 50% of the available bandwidth.
Is it possible to assign a unique address to each sensor?
In that case you can implement a Master/Slave protocol (like Modbus or similar), with all devices sharing the same communication link:
Master is the only device which can initiate communication. It can poll each sensor separately (one by one), by broadcasting its address to all slaves.
Only the slave device which was addressed will reply.
If there is no response after a certain period of time (a timeout), the device is not available and the Master can poll the next device.
See also: List of automation protocols
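A sketch of what the master's poll loop could look like (the one-byte address frame, pyserial, port name, and timeout are assumptions for illustration, not part of the question):

    import serial  # pyserial

    SLAVES = [0x01, 0x02, 0x03]                  # assigned unique addresses
    bus = serial.Serial("/dev/ttyUSB0", baudrate=4800, timeout=0.2)

    while True:
        for addr in SLAVES:
            bus.write(bytes([addr]))             # poll: broadcast the address
            reply = bus.read(8)                  # only the addressed slave answers
            if reply:
                print("sensor 0x%02x -> %r" % (addr, reply))
            else:
                print("sensor 0x%02x: timeout (not available)" % addr)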
I worked with some Zigbee systems a few years back. It only had two sensors, so we just hard-coded them with different wait times and had them always respond to requests. However, we considered something along the lines of this:
Start out with an announcement from the console: 'Hey everyone, let's make a network!'
The nodes all attempt to respond with something like: 'I'm hardware address x, can I join?'
At first it's crazy, but with some random retry times the console eventually responds to every node: 'Yes, hardware address x, you can join. You are node #y and you will wait z milliseconds from the time you receive the request for data.'
Then it should be easy. Every time the console asks for data, the nodes respond in their turn. Assuming the transmission of all the data takes less time than the polling period, you're set. It's best not to acknowledge the messages: if the console fails to respond, the node will very likely try to retransmit just when another node is trying to send data, messing both of them up. Then it snowballs into complete failure...
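The per-node wait time mentioned above amounts to a simple slot schedule; a toy illustration (the slot width is an assumption and must exceed one full reply transmission at 4800 bps):

    SLOT_MS = 50                       # assumed slot width per node

    def reply_delay_ms(node_number):
        """Delay a node applies after hearing the console's data request."""
        return node_number * SLOT_MS

    for node in (1, 2, 3):
        print("node %d replies %d ms after the request" % (node, reply_delay_ms(node)))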

Cannot achieve full speed on Symmetrical Internet Connection

We are using a business Ethernet connection (3Mbit upload, 3Mbit download) and trying to understand issues with our measured bandwidth. When uploading a large file we sustain 340KB/s; downloading, we also sustain 340KB/s. However, when we run these transfers simultaneously, the two transfer speeds rise and fall erratically, with an average speed for both of around 250KB/s. We're using a Hatteras HN404 CPi, and we've bypassed the router (plugged a machine directly into the Hatteras and set the NIC to full duplex).
Is this expected? Should a max upload interfere with a max download on this type of Internet connection?
Are you sure the bottleneck is your connection?
Do you also see this behavior when the simultaneous upload and download are occurring on different systems, or only when one system is handling both the upload and download?
If the problem goes away when independent machines are doing the work, the bottleneck is likely closer to the hard drive.
This sounds expected from my experience with lower-end lines. On a home line, I've found that traffic shaping and changing buffer sizes can be a huge help.
TCP/IP without any unusual traffic shaping will favor the most aggressive traffic at the expense of everything else. In your case, this means the outgoing ACKs and other feedback for the download will be delayed or maybe even dropped. See if your HN404 supports class-based queuing or something similar, and try it out.
Yes, it is expected. This is symptomatic of any case in which you have a throttled or capped connection: if you saturate your uplink it will affect your downlink, and vice versa.
This is because your connection's rate-limiting impacts the TCP acknowledgement packets (ACKs) and disrupts the normal "balance" of how these packets flow.
This is very thoroughly described on this page about Cable Modem Troubleshooting Tips, although it is not limited to cable modems:
"If you saturate your cable modem's upload cap with an upload, the ACK packets of your download will have to queue up waiting for a gap between the congested upload data packets. So your ACKs will be delayed getting back to the remote download server, and it will therefore believe you are on a very slow link, and slow down the transmission of further data to you."
So how do you avoid this? The best way is to implement some sort of traffic-shaping or QoS (Quality of Service) on individual sessions to limit them to a maximum throughput based on a percentage of your total available bandwidth.
For example, on my home network I have it set up so that no outbound connection can utilize more than 67% (2/3) of my 192Kbps uplink. That means any single outbound session can only utilize 128Kbps, thereby protecting my downlink speed by preventing the uplink from becoming saturated.
In most cases you are able to perform this kind of traffic-shaping based on any available criteria such as source ip, destination ip, protocol, port, time of day, etc.
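The core of that kind of per-session cap is usually a token bucket; a minimal sketch using the 128Kbps figure from the example above (the burst size is an arbitrary assumption):

    import time

    RATE = 128_000 / 8.0               # 128 Kbps cap, in bytes per second
    BURST = 4096.0                     # bucket depth: allowed burst in bytes

    tokens = BURST
    last = time.perf_counter()

    def throttle(nbytes):
        """Block until nbytes may be sent without exceeding the cap."""
        global tokens, last
        while True:
            now = time.perf_counter()
            tokens = min(BURST, tokens + (now - last) * RATE)
            last = now
            if tokens >= nbytes:
                tokens -= nbytes
                return
            time.sleep((nbytes - tokens) / RATE)

    # Usage: call throttle(len(chunk)) before each socket send so the
    # session never exceeds the configured fraction of the uplink.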
It appears that I was wrong about the simultaneous transfer speeds. The 250KB/s speeds up and down were miscalculated by the transfer program (it seems to have been showing an inflated average speed). Apparently the business Ethernet (in this case an XO circuit provisioned by Speakeasy) only supports 3Mbit total, not 3Mbit in each direction (for 6Mbit total). So if I am transferring up and down at the same time, in theory I should only get 1.5Mbit each way, or 187.5KB/s at most (with zero overhead).
