In theory, the max TCP speed is min{rwnd, cwnd} / RTT, where cwnd is the congestion window size and rwnd is the receive window size. Assuming cwnd is big enough, it is then just rwnd/RTT.
Now, if the max window size is 65 Kbytes, I get (using these calculations from some site):
RTT 10 ms => TCP throughput = 52428000 bps = 52Mbps
RTT 20 ms => TCP throughput = 26214000 bps = 26Mbps
RTT 50 ms => TCP throughput = 10485600 bps = 10Mbps
RTT 100 ms => TCP throughput = 5242800 bps = 5.2Mbps
RTT 150 ms => TCP throughput = 3495200 bps = 3.5Mbps
RTT 200 ms => TCP throughput = 2621400 bps = 2.6Mbps
RTT 300 ms => TCP throughput = 1747600 bps = 1.7Mbps
RTT 500 ms => TCP throughput = 1048560 bps = 1Mbps
How accurate is this? I can download from a website (a direct download, not a torrent) at 5Mbps while having more than 200ms RTT, which puts me above the theoretical maximum. Why does this happen? Do browsers use more than one TCP connection per download?
Also, I would like to know where rwnd/RTT actually comes from. Since rwnd bytes can (and surely will) be more than one TCP segment, you would be sending far more than one segment at the start of each RTT, meaning one RTT won't be enough to send all the segments and receive ACKs for them, so rwnd/RTT seems pretty far from the real throughput.
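To see where the table's numbers come from, here is a small Python sketch of the rwnd/RTT cap, assuming the classic 65,535-byte window with no window scaling and an unconstrained cwnd:

```python
# Theoretical TCP throughput cap: min(rwnd, cwnd) / RTT.
# Assumes cwnd is large and rwnd = 65,535 bytes (the classic 16-bit
# window, no window scaling) -- a sketch reproducing the table above.

RWND_BYTES = 65_535

def max_throughput_bps(rwnd_bytes: int, rtt_seconds: float) -> float:
    """Window-limited throughput in bits per second."""
    return rwnd_bytes * 8 / rtt_seconds

for rtt_ms in (10, 20, 50, 100, 150, 200, 300, 500):
    bps = max_throughput_bps(RWND_BYTES, rtt_ms / 1000)
    print(f"RTT {rtt_ms:3d} ms -> {bps:>12,.0f} bps = {bps / 1e6:.1f} Mbps")
```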
The max window size is not 65Kbytes. The max window size is 65,535 window size units, which may or may not be bytes.
I'm not quite sure I follow your last question. What does the segment size have to do with anything? You can send whatever data you're sending using as many segments as you need.
Do I understand you correctly that you wonder how you can receive "faster than possible"?
The formula you state is correct. The window(s) and the RTT determine your bandwidth (there are other factors, but in most cases these are the important ones).
But I'm wondering about your numbers.
Ad 1) Are you sure about the RTT? This seems pretty high for regular downloads, unless they are transcontinental. Check the RTT by using ping (e.g. ping simtel.net, replacing the host name with the host name in question). You can use a more accurate ping utility like my hrping (http://www.cfos.de/ping) (for Windows).
Ad 2) Are you sure about the window size? 64k is pretty low today; all modern OSes try to negotiate more than that via RFC 1323 window scaling (http://en.wikipedia.org/wiki/TCP_window_scale_option). You can use SG TCP/IP Analyzer (http://www.speedguide.net/analyzer.php) to check your RWIN. Another great tool for checking your connection is Netalyzr (http://netalyzr.icsi.berkeley.edu).
I would be interested to see the measured figures.
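One way to check whether a 64k window can explain a measured rate is the bandwidth-delay product: the window needed to sustain a given rate at a given RTT. A small Python sketch (the 5 Mbps / 200 ms figures are taken from the question, not measured):

```python
# Bandwidth-delay product: bytes that must be in flight to sustain a
# given rate over a given RTT. A sketch; the inputs are illustrative.

def required_window_bytes(target_bps: float, rtt_seconds: float) -> float:
    """Bytes in flight needed to sustain target_bps at rtt_seconds."""
    return target_bps * rtt_seconds / 8

# Sustaining 5 Mbps at 200 ms RTT needs ~125 KB in flight -- roughly
# twice the classic 65,535-byte window, which is why window scaling
# (RFC 1323) or multiple parallel connections must be involved.
print(required_window_bytes(5e6, 0.200))  # 125000.0 bytes
```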
Is UDP sendto() faster with a non-blocking socket (if there is space in the send buffer)?
Similar to this question :
Faster WinSock sendto()
From my profiling, the send is network-bound with a blocking socket: for example, on a 100 mbit network both send about 38,461 datagrams of 256 bytes per second, which is about what the network allows. I was wondering if anyone has a preference between the two, speed-wise.
Sending from localhost to itself on 127.0.0.1, it seems to handle about 250k sends/s, which should be about 64 Mbyte/s on a 3 GHz PC.
Blocking (i.e., without FIONBIO set) seems about 2 times faster; with non-blocking set, it seems to drop to 32 Mbyte/s if I retry on EWOULDBLOCK.
I don't need to do any heavy-duty UDP broadcasting; I'm only wondering about the most efficient way, if anyone has any deep-seated "feelings"?
Also, could there be some sort of transmission moderation taking place in network card drivers? Is there a maximum number of datagrams sendable on a gigabit card; would it tolerate, for example, 100k sends/s, or moderate it somehow?
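For reference, the two send styles being compared can be sketched in Python (the Winsock calls map closely onto these; the target address and counts here are arbitrary placeholders, and nothing needs to be listening for a UDP sendto() to succeed):

```python
# Minimal sketch of blocking vs. non-blocking UDP sends. The address,
# payload size, and count are illustrative placeholders.
import socket
import errno

ADDR = ("127.0.0.1", 40000)   # hypothetical target, no listener needed
PAYLOAD = b"x" * 256
COUNT = 1000

def send_blocking(count: int) -> int:
    """Blocking sends: sendto() waits for buffer space itself."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sent = 0
    for _ in range(count):
        sock.sendto(PAYLOAD, ADDR)
        sent += 1
    sock.close()
    return sent

def send_nonblocking(count: int) -> int:
    """Non-blocking sends (FIONBIO equivalent): retry on EWOULDBLOCK."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setblocking(False)
    sent = 0
    while sent < count:
        try:
            sock.sendto(PAYLOAD, ADDR)
            sent += 1
        except OSError as e:
            if e.errno not in (errno.EWOULDBLOCK, errno.EAGAIN):
                raise
            # Buffer full: a real loop would select()/poll() here rather
            # than spin, which is one reason busy-retry measures slower.
    sock.close()
    return sent

print(send_blocking(COUNT), send_nonblocking(COUNT))
```

The busy-retry in the non-blocking variant burns CPU whenever the socket buffer is full, which is consistent with the roughly 2x slowdown observed.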
The goal is to determine performance metrics of a UDP protocol, specifically:
Minimal possible Theoretical RTT (round-trip time, ping)
Maximal possible Theoretical PPS of 1-byte-sized UDP Packets
Maximal possible Theoretical PPS of 64-byte-sized UDP Packets
Maximal and minimal possible theoretical jitter
This could and should be done without taking into account any slowdowns caused by software (like a side process using 99% CPU, or an inefficiently written test program) or by hardware (like a busy channel or an extremely long line).
How should I go with estimating these best-possible parameters on a "real system"?
PS. I'll offer a prototype of what I call "a real system".
Consider 2 PCs, PC1 and PC2. They both are equipped with:
modern fast processors (read: some average, typical socket-1151 i7 CPU), so processing speed and single-core performance are not an issue.
some typical DDR4-2400 memory.
average NICs (read: typical Realtek/Intel/Atheros chips, as typically embedded in motherboards), so there is no very special, complicated circuitry.
a couple of meters of four-pair Ethernet cable connecting their NICs, with an established gigabit link. So no internet, and no traffic between them other than what you generate.
no monitors
no other I/O devices
a single USB flash drive per PC, which boots the initramfs into RAM and is mounted afterward to store the program output once the test program finishes
the lightest possible software stack - probably BusyBox running on top of the latest Linux kernel, with all libs up to date. So virtually no software (read: "busyware") runs on them.
You run a server test program on PC1 and a client on PC2. After the program runs, the USB stick is mounted, results are dumped to a file, and the system then powers down. So, I've described an idealized situation; I can't imagine more "sterile" conditions for such an experiment.
For the PPS calculations, divide the throughput of the medium by the total size of the frames on the wire.
For IPv4:
Ethernet preamble, start-of-frame delimiter, and interframe gap: 7 + 1 + 12 = 20 bytes (not counted in the 64-byte minimum frame size).
Ethernet II header and FCS (CRC): 14 + 4 = 18 bytes.
IP header: 20 bytes.
UDP header: 8 bytes.
Total overhead: 46 bytes (frame padded to the 64-byte minimum if the payload is less than 18 bytes) + 20 bytes "more on the wire".
Payload (data):
1-byte payload: padded to 18 bytes so the frame reaches the 64-byte minimum, + wire overhead, totaling 84 bytes on the wire.
64-byte payload: 46 + 64 = 110 bytes, + 20 bytes of wire overhead = 130 bytes.
If the throughput of the medium is 125000000 bytes per second(1 Gb/s).
1-18 bytes of payload = 1.25e8 / 84 = max theoretical 1,488,095 PPS.
64-byte payload = 1.25e8 / 130 = max theoretical 961,538 PPS.
These calculations assume a constant stream: the network send buffers are filled constantly. This is not an issue given your modern hardware description. If this were 40/100 Gigabit Ethernet, CPU, bus speeds, and memory would all be factors.
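The overhead arithmetic above can be written out as a short Python sketch, using the 46-byte header overhead, 64-byte minimum frame, and 20 bytes of per-frame wire extra described above:

```python
# Wire-level PPS math for UDP/IPv4 over gigabit Ethernet. Per frame:
# 46 bytes of headers (Ethernet II 14 + FCS 4 + IP 20 + UDP 8), padding
# up to the 64-byte minimum frame, plus 20 bytes of preamble/SFD/IFG.

GIGABIT_BYTES_PER_S = 125_000_000
OVERHEAD = 14 + 4 + 20 + 8      # Ethernet II + FCS + IPv4 + UDP = 46
WIRE_EXTRA = 7 + 1 + 12         # preamble + SFD + interframe gap = 20
MIN_FRAME = 64

def wire_bytes(payload: int) -> int:
    """Bytes a single datagram occupies on the wire."""
    frame = max(OVERHEAD + payload, MIN_FRAME)
    return frame + WIRE_EXTRA

def max_pps(payload: int, bytes_per_s: int = GIGABIT_BYTES_PER_S) -> int:
    """Theoretical max packets per second at a constant stream."""
    return bytes_per_s // wire_bytes(payload)

print(max_pps(1))    # 84-byte wire frame
print(max_pps(64))   # 130-byte wire frame
```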
Ping RTT time:
To calculate the time it takes to transfer data through a medium divide the data transferred by the speed of the medium.
This is harder, since the ping data payload could be any size from 64 bytes up to the MTU (~1500 bytes). ping typically uses the minimum frame size: (64 bytes total frame size + 20 bytes wire overhead) * 2 = 168 bytes, for a network time of 0.001344 ms. Add to that the process response and reply time, estimated at between 0.35 and 0.9 ms combined. This value depends on too many internal CPU and OS factors: L1-L3 caching, branch prediction, the ring transitions required (0 to 3 and 3 to 0), the TCP/IP stack implementation, CRC calculations, interrupt processing, network card drivers, DMA, validation of data (skipped by most implementations)...
Max time should be < 1.25 ms, based on anecdotal evidence. (My best measurement was 0.6 ms on older hardware; I would expect a consistent average of 0.7 ms or less on the hardware as described.)
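The wire-time part of that estimate is easy to reproduce; the process/stack time is the dominant, unmodeled term:

```python
# Wire time for a minimal ping exchange: two 64-byte frames, each with
# 20 bytes of preamble/SFD/IFG, at gigabit speed. Stack/process time
# (~0.35-0.9 ms, per the estimate above) is not modeled here.

GIGABIT_BYTES_PER_S = 125_000_000

def ping_wire_time_ms(frame_bytes: int = 64, wire_extra: int = 20) -> float:
    on_wire = 2 * (frame_bytes + wire_extra)   # request + reply = 168 bytes
    return on_wire / GIGABIT_BYTES_PER_S * 1000

print(ping_wire_time_ms())  # 0.001344 ms
```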
Jitter:
The only inherent theoretical source of network jitter is the asynchronous nature of the transport, which is resolved by the preamble: max < 0.000064 ms (8 bytes at 1 Gb/s). If sync is not established in this time, the entire frame is lost. That is a possibility that needs to be taken into account, since UDP is best-effort delivery.
As evidenced by the description of RTT, the possible variance in CPU time executing identical code, as well as OS scheduling and drivers, makes this impossible to evaluate precisely.
If I had to estimate, I would design for a maximum of 1 ms jitter, with provisions for lost packets. It would be unwise to design a system intolerant of faults. Even in a "perfect scenario" as described, faults will occur (a nearby lightning strike induces spurious voltages on the wire), and UDP has no inherent method for tolerating lost packets.
I'm trying to simulate an emergency-braking application using Veins and analyze its performance. Research papers on 802.11p show that as beacon frequency and the number of vehicles increase, delay should increase considerably due to the protocol's MAC-layer delay (for 50 vehicles at 8 Hz, about 300 ms average delay).
But when simulating the application with Veins, the delay values do not differ much (they range from 1 ms to 4 ms). I've checked the MAC-layer functionality, and it appears that the channel is idle most of the time, so when a packet reaches the MAC layer the channel has already been idle for longer than the DIFS and the packet gets sent quickly. I tried increasing the packet size and reducing the bitrate; that increases the previous delay by a certain amount, but the drastic delay increase due to the backoff process cannot be seen.
Do you know why this happens?
As you use 802.11p, the default data rate on the control channel is 6 Mbit/s (source: ETSI EN 302 663).
6 Mbit/s = 750,000 bytes/s
Your beacons contain 500 bytes, so the transmission of one beacon takes about 0.0007 seconds. With about 50 cars in your multi-lane scenario, each sending beacons at, for example, a 10 Hz frequency, it takes about 0.35 s out of every second to transmit the 500 beacons.
In my opinion, these are too few cars to create the effect you mention, because the channel is idle for about 65% of the time.
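The channel-load arithmetic can be sketched as follows (PHY preamble and contention overhead are ignored, as in the estimate above, so this is only the payload airtime):

```python
# Back-of-envelope channel load: 500-byte beacons on the 6 Mbit/s
# 802.11p control channel, 50 cars beaconing at 10 Hz.

RATE_BPS = 6_000_000            # 802.11p control channel default rate

def airtime_s(beacon_bytes: int) -> float:
    """Seconds to transmit one beacon's payload."""
    return beacon_bytes * 8 / RATE_BPS

def busy_fraction(cars: int, hz: int, beacon_bytes: int) -> float:
    """Fraction of each second spent transmitting beacons."""
    return cars * hz * airtime_s(beacon_bytes)

print(airtime_s(500))             # ~0.00067 s per beacon
print(busy_fraction(50, 10, 500)) # ~0.33 -> channel idle ~2/3 of the time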
I am currently using io.netty.handler.traffic.ChannelTrafficShapingHandler and io.netty.handler.traffic.TrafficCounter to measure performance across a Netty client and server. I am consistently seeing a discrepancy between the value of Current Write on the server and Current Read on the client. How can I account for this difference, considering the Write/Read KB/s are close to matching all the time?
2014-10-28 16:57:50,099 [Timer-4] INFO PerfLogging 130 - Netty Traffic stats TrafficShaping with Write Limit: 0 Read Limit: 0 and Counter: Monitor ChannelTC431885482 Current Speed Read: 3049 KB/s, Write: 0 KB/s Current Read: 90847 KB Current Write: 0 KB
2014-10-28 16:57:42,230 [ServerStreamingLogging] DEBUG c.f.s.r.l.ServerStreamingLogger:115 - Traffic Statistics WKS226-39843-MTY6NDU6NTAvMDAwMDAw TrafficShaping with Write Limit: 0 Read Limit: 0 and Counter: Monitor ChannelTC385810078 Current Speed Read: 0 KB/s, Write: 3049 KB/s Current Read: 0 KB Current Write: 66837 KB
Is there some sort of compression between client and server?
I can see that my client-side value is approximately 3049 * 30 = 91,470 KB, where 30 is the number of seconds over which the cumulative figure is calculated.
Scott is right; there are also some fixes coming that take this into consideration.
Some explanation:
read is the actual read bandwidth and read-bytes count (since the system is not the origin of what it receives)
for write events, the system is the source and manages them, so there are 2 kinds of writes (and will be, in the next fix):
proposed writes, which are not yet sent but which, before the fix, are taken into account in the bandwidth (lastWriteThroughput) and in the current write count (currentWrittenBytes)
real writes, when they are effectively pushed to the wire
Currently the issue is that currentWrittenBytes can be higher than the real writes, since the writes are mostly scheduled for the future; they depend on the write speed of the handler, which is the source of the write events.
After the fix, we will be more precise about what is "proposed/scheduled" and what is really "sent":
proposed writes taken into consideration in lastWriteThroughput and currentWrittenBytes
real write operations taken into consideration in realWriteThroughput and realWrittenBytes when the writes occur on the wire (or at least in the pipeline)
Now there is a second element: if you set checkInterval to 30 s, this implies the following:
the bandwidth (the global average, and therefore the traffic control) is computed over those 30 s (read or write)
every 30 s the "small" counters (currentXxx) are reset to 0, while the cumulative counters are not; if you look at the cumulative counters, you should see that bytes received and bytes sent are almost the same, while the "small" counters reset to 0 every 30 s
The smaller the value of checkInterval, the more responsive the bandwidth computation, but don't make it too small, to prevent overly frequent resets and too much thread activity spent computing bandwidth. In general, the default of 1 s is quite efficient.
The difference you see could be because the sender's 30 s event is not "synchronized" with the receiver's 30 s event (and need not be). So, according to your numbers: when the receiver (read) resets its counters on its 30 s event, the writer resets its own counters about 8 s later, which accounts for the 24,010 KB difference.
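A quick sanity check of that explanation against the logged numbers: at the logged write rate, the gap between the two "current" counters corresponds to roughly 8 seconds of traffic, consistent with offset reset times.

```python
# Check: does the Current Read / Current Write gap match ~8 s of
# traffic at the logged rate? Values are taken from the logs above.

rate_kb_per_s = 3049          # Current Speed from both log lines
reader_kb = 90847             # Current Read on the client
writer_kb = 66837             # Current Write on the server

gap_kb = reader_kb - writer_kb
offset_s = gap_kb / rate_kb_per_s
print(gap_kb, round(offset_s, 1))  # 24010 KB, ~7.9 s of traffic
```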
I'm a new NS-3 user. I'm trying to find and verify the throughput of a TCP wireless network. When experimenting with "ht-wifi-network.cc" (http://www.nsnam.org/doxygen-release/ht-wifi-network_8cc_source.html) from the example files, I used the default settings (a UDP flow) and then tried a TCP flow. I noticed 2 things:
Throughput is very low compared with the data rate: UDP achieves 22.78 of 65 Mbps and TCP 11.73 of 65 Mbps. Is this how the result should look? I was expecting at least 30 Mbps out of 65 Mbps.
UDP throughput is almost twice the TCP throughput, but I expected TCP throughput to be higher.
Can somebody help and explain why? Thanks!