Measuring link quality between two machines - algorithm

Are there any standard methods (or libraries) for measuring the quality of a link/connection between two computers?
These results would be used to improve routing logic: if the connection quality is unacceptable, stop data transfers to that computer and route them along an alternative path. It looks like Skype has some of this functionality.
I was thinking of establishing several continuous test streams that could reveal bandwidth problems, plus some kind of ping-pong messaging logic to measure latency.

Link Reliability
I usually use a continuous traceroute (e.g. mtr) for isolating unreliable links; but for your purposes, you could start with average ping statistics, as #recursive mentioned. Migrate to more complicated things (like a UDP/TCP echo protocol) if you find that ICMP is getting blocked too often by client firewalls in the path.
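If ICMP does turn out to be blocked, rolling your own UDP echo is straightforward. Here is a minimal sketch of that ping-pong idea (the port number, probe count, and timeout are arbitrary choices for illustration, not part of any standard):

```python
# udp_echo.py -- a minimal UDP echo "ping-pong" latency/loss probe.
# Run "server" on one host and "client <server-ip>" on the other.
import socket, sys, time

PORT = 9999  # arbitrary port chosen for this sketch

def server():
    # Echo every datagram straight back to its sender.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", PORT))
    while True:
        data, addr = sock.recvfrom(2048)
        sock.sendto(data, addr)

def client(host, probes=20):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(2.0)            # treat 2 s of silence as a lost probe
    rtts = []
    for seq in range(probes):
        start = time.perf_counter()
        sock.sendto(str(seq).encode(), (host, PORT))
        try:
            sock.recvfrom(2048)
            rtts.append((time.perf_counter() - start) * 1000)
        except socket.timeout:
            pass                    # count as packet loss
    loss = 100 * (probes - len(rtts)) / probes
    if rtts:
        print(f"rtt avg {sum(rtts)/len(rtts):.2f} ms, loss {loss:.0f}%")

if __name__ == "__main__":
    server() if sys.argv[1] == "server" else client(sys.argv[2])
```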
Bandwidth / Delay Estimation
For bandwidth and delay estimation, yaz provides a low-bandwidth algorithm to estimate throughput / delay along the path; it uses two different endpoints for measurement, so your client and servers will need to coordinate their usage.
Sally Floyd maintains a pretty good list of bandwidth estimation tools that you may want to check out if yaz isn't what you are looking for.

Ping is good for testing latency, but not bandwidth.

Related

What algorithm does iperf3 use under the hood to measure bandwidth and latency between two endpoints in a network?

I am trying to learn iperf, a handy tool for measuring the bandwidth and latency between two endpoints on a network.
I am wondering which algorithm iperf/iperf3 uses under the hood to measure latency and bandwidth. I went through the iperf documentation but couldn't find the information.
Does anyone know?
iperf3 and similar bandwidth-testing tools all work in much the same way: they send a certain amount of data from one host to another and measure how long the transfer took. Divide the amount of data sent by the elapsed time and (with appropriate unit conversions) that's the bandwidth. The same measurement can be taken at the receiving end for similar (but possibly different) numbers. Sometimes the endpoints pace themselves, artificially limiting the speed at which data is sent.
These tools may differ in how they coordinate the start and end of tests, how the test parameters are communicated, or how pacing is done. But the basic test being run is the same.
If you really want to understand, in detail, what a particular tool is doing, you might need to actually read and understand the source code.
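To make the core idea concrete, here is a stripped-down sketch of the same send-and-time measurement these tools perform (the port, payload size, and chunk size are arbitrary choices; real tools like iperf3 add pacing, parallel streams, and a control channel on top):

```python
# throughput.py -- the core of an iperf-style test: send a known amount of
# data over TCP and divide bytes transferred by the elapsed time.
import socket, sys, time

PORT = 5001                      # arbitrary port for this sketch
TOTAL = 50 * 1024 * 1024         # 50 MiB test payload
CHUNK = 64 * 1024

def receiver():
    srv = socket.create_server(("0.0.0.0", PORT))
    conn, _ = srv.accept()
    received, start = 0, time.perf_counter()
    while True:
        data = conn.recv(CHUNK)
        if not data:
            break
        received += len(data)
    elapsed = time.perf_counter() - start
    print(f"received {received} bytes in {elapsed:.2f} s "
          f"= {received * 8 / elapsed / 1e6:.1f} Mbit/s")

def sender(host):
    conn = socket.create_connection((host, PORT))
    buf, sent = b"\x00" * CHUNK, 0
    start = time.perf_counter()
    while sent < TOTAL:
        conn.sendall(buf)
        sent += len(buf)
    conn.close()
    elapsed = time.perf_counter() - start
    print(f"sent {sent} bytes in {elapsed:.2f} s "
          f"= {sent * 8 / elapsed / 1e6:.1f} Mbit/s")

if __name__ == "__main__":
    receiver() if sys.argv[1] == "receiver" else sender(sys.argv[2])
```

Note that the sender and receiver can legitimately report different numbers: the sender measures how fast it could push data into its socket buffers, while the receiver measures what actually arrived over the wire.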

How to utilize all available bandwidth with real-time data?

How to measure the actual bandwidth between server and client, to decide how much real-time data to send?
My server sends real-time data to clients 30 times per second. If the server has too much data, it prioritises the chunks and throws away anything that doesn't fit into the available bandwidth, because that data will be invalidated on the next tick anyway. Data is sent over a reliable channel (20%) and an unreliable one (80%), both UDP-based, but if TCP as the reliable channel can provide any benefit, please let me know. The data is highly latency-sensitive. The server often (but not always!) has more data than available bandwidth. It's critical to send as much data as possible, but not more than the available bandwidth, to avoid packet drops or higher latency.
Server and client are custom applications, so they can implement any algorithm/protocol.
My main problem is how to keep track of the available bandwidth. Any statistical information about typical bandwidth jitter would also be helpful (the servers are in a cloud; the clients are home users, worldwide).
At the moment I'm thinking about how to utilize:

- latency info from the reliable channel: it should correlate with bandwidth, because growing latency can (!) mean that retransmissions caused by packet drops are involved, so the server must lower its data rate;
- the amount of data the client received on the unreliable channel during a time frame, especially when it is lower than the amount the server sent;
- the current latency compared to the lowest recorded one: if it is close to or below that minimum, bandwidth can be increased.

The problem is that this approach is too complicated and involves a lot of "heuristics", like what the step for increasing/decreasing bandwidth should be.
I'm looking for advice from people who have dealt with a similar problem in the past, or just any bright ideas.
The first symptom of trying to use more bandwidth than you actually have will be increased latency, as you fill up the buffers between the sender and whatever the bottleneck is. See https://en.wikipedia.org/wiki/Bufferbloat. My guess is that if you can successfully detect increased latency as you start to fill up the bandwidth and back off then you can avoid packet loss.
I wouldn't underestimate TCP - people have spent a lot of time tuning its congestion avoidance to get a reasonable amount of the available bandwidth while still being a good network citizen. It may not be easy to do better.
On the other hand, a lot will depend on the attitude of the intermediate nodes, which may treat UDP differently from TCP. You may find that under load they either prioritize or discard UDP. Also, some networks, especially those with satellite links, may use https://en.wikipedia.org/wiki/TCP_acceleration without you even knowing it. (This was a painful surprise for us: we relied on TCP connection failure and keep-alives to detect loss of connectivity, but the TCP accelerator in use maintained a connection to us, pretending to be the far end, even when connectivity to the far end had in fact been lost.)
After some research, I found that the problem has a name: congestion control, or congestion avoidance. It's quite a complicated topic and there is a lot of material about it. TCP's congestion control has evolved over time and is a really good one. Other protocols implement congestion control too, e.g. UDT or SCTP.
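To illustrate the basic shape of such an algorithm, here is a toy additive-increase/multiplicative-decrease (AIMD) controller driven by RTT and loss feedback, in the spirit of TCP's congestion control; the thresholds and step sizes below are made-up tuning knobs, not values from any RFC:

```python
# aimd.py -- toy additive-increase / multiplicative-decrease rate controller.
# Feed it one RTT sample per tick; it returns the send budget for the next tick.

class AimdRateController:
    def __init__(self, initial_rate=100_000, min_rate=10_000):
        self.rate = initial_rate        # bytes per tick
        self.min_rate = min_rate
        self.base_rtt = None            # lowest RTT observed so far

    def update(self, rtt_ms, loss_detected):
        # Track the minimum RTT as an estimate of the uncongested path delay.
        if self.base_rtt is None or rtt_ms < self.base_rtt:
            self.base_rtt = rtt_ms
        if loss_detected or rtt_ms > 1.5 * self.base_rtt:
            # Congestion signal (loss, or queues filling up): back off hard.
            self.rate = max(self.min_rate, self.rate // 2)
        else:
            # No congestion signal: probe for more bandwidth, gently.
            self.rate += 5_000
        return self.rate

# Example: latency creeping up on the fourth tick triggers a backoff.
ctl = AimdRateController()
for rtt, loss in [(40, False), (41, False), (43, False), (90, False), (42, False)]:
    print(rtt, "ms ->", ctl.update(rtt, loss), "bytes/tick")
```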

Measuring ZeroMQ performances on a network

This is probably a very naïve question, but I'm really a newbie in that stuff.
I'd like to test 0MQ performance (latency, throughput) with different communication patterns: REQ/REP, PUB/SUB, PUSH/PULL, ROUTER/DEALER and so on, and estimate how well, performance-wise, 0MQ would handle the various communication scenarios we encounter in our software.
When everything runs on the same machine, it is relatively easy to measure things and do basic statistics according to message size, etc. I know for sure when my messages are sent, and when they are received.
But how can I do measurements across the network without a common time reference (which is accurate enough, I mean)? Do I measure round trips (from machine A to machine B and back)? Is that a meaningful test?
ZeroMQ comes with performance-testing tools; look in the perf/ directory. E.g. to test throughput, run local_thr on one machine and remote_thr on the other. You can set message sizes and counts. Test with enough messages to get an accurate figure (the test should run for at least 5-10 seconds).
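If you prefer to script it yourself, measuring round trips (as you suggested) is indeed meaningful: both timestamps come from the same clock, so no cross-machine time synchronization is needed, and halving the round-trip time gives a rough one-way estimate. A minimal sketch using pyzmq (the port, message size, and iteration count are arbitrary):

```python
# rtt_req_rep.py -- round-trip latency over a ZeroMQ REQ/REP pair.
# Round trips need no synchronized clocks: both timestamps come from machine A.
import sys, time, zmq

PORT = 5555            # arbitrary port for this sketch
ROUNDTRIPS = 10_000
MSG = b"x" * 100       # message size under test

def echo_server():
    ctx = zmq.Context()
    rep = ctx.socket(zmq.REP)
    rep.bind(f"tcp://*:{PORT}")
    while True:
        rep.send(rep.recv())   # bounce every message straight back

def client(host):
    ctx = zmq.Context()
    req = ctx.socket(zmq.REQ)
    req.connect(f"tcp://{host}:{PORT}")
    start = time.perf_counter()
    for _ in range(ROUNDTRIPS):
        req.send(MSG)
        req.recv()
    elapsed = time.perf_counter() - start
    # Halve the round-trip figure for a rough one-way latency estimate.
    print(f"avg round trip: {elapsed / ROUNDTRIPS * 1e6:.1f} us")

if __name__ == "__main__":
    echo_server() if sys.argv[1] == "server" else client(sys.argv[2])
```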

Gauging a web browser's bandwidth

Is it possible to gauge a web browser's upload and/or download speed by monitoring normal HTTP requests? Ideally, a web application would be able to tell the speed of a client without any modifications and without client-side scripting like JavaScript/Java/Flash, so it would still work even if a client accessed the service with a library like curl. If this is possible, how? If it's not possible, why? How accurate can this method be?
(If it helps, assume PHP/Apache, but really this is a platform-independent question. Also, being able to gauge the upload speed is more important to me.)
Overview
You're asking for what is commonly called "passive" available-bandwidth (ABW) measurement along a path (as opposed to measuring a single link's ABW). There are a number of different techniques[1] that estimate bandwidth either by passive observation or by low-bandwidth "active" ABW probing. However, the most common algorithms used in production services are active ABW techniques; they observe packet streams between two different endpoints.
I'm most familiar with yaz, which sends packets from one side and measures variation in delay on the other side. The one-sided passive path ABW measurement techniques are considered more experimental; there aren't solid implementations of the algorithms AFAIK.
Discussion
The problem with the task you've asked about is that all non-intrusive[2] ABW measurement techniques rely on timing. Sadly, timing is a very tricky thing when working with HTTP:

- You have to deal with the reality of object caching (for instance, Akamai) and HTTP proxies (which terminate your TCP session prematurely and often spoof the web server's IP address to the client).
- You have to deal with web hosts that may get intermittently slammed.
- Finally, active ABW techniques rely on a structured packet stream (with respect to packet sizes and timing), unlike what you see in a standard HTTP transfer.
Summary
In summary, unless you set up a dedicated client/server/protocol just for ABW measurement, I think you'll be rather frustrated with the results. You can keep your ABW socket connections on TCP/80, but the tools I have seen won't use HTTP[3].
Editorial note: My original answer suggested that ABW with http was possible. On further reflection, I changed my mind.
END-NOTES:
---
1. See Sally Floyd's archive of end-to-end TCP/IP bandwidth-estimation tools.
2. The most common intrusive techniques (such as speedtest.net) use a Flash or Java applet in the browser to send and receive 3-5 parallel TCP streams to each endpoint for 20-30 seconds. Add up the streams' average throughput (not counting lost packets that require retransmission) over time, and you get that path's tx and rx ABW. This is obviously pretty disruptive to VoIP calls or any downloads in progress. Disruptive measurements are called bulk transfer capacity (BTC); see RFC 3148: A Framework for Defining Empirical Bulk Transfer Capacity Metrics. BTC measurements often use HTTP, but BTC doesn't seem to be what you're after.
3. That is good, since it removes the risk of in-line caching by denying HTTP caches an object to cache; note, though, that some tools (like yaz) are UDP-only.

Due to the way TCP connections adapt to the available bandwidth, no, this is not possible. Requests are small and typically fit within one or two packets. You need at least a dozen full-size packets to get even a coarse bandwidth estimate, since TCP first has to ramp up to the available bandwidth ("TCP slow start"), and you need to average out jitter effects. If you want any accuracy, you're probably talking about hundreds of packets. That's why upload-rate measurement scripts typically transfer several megabytes of data.
On the other hand, you might be able to estimate the round-trip delay from the three-way handshake and the timing of ACKs. But download speed has at least as much impact as upload speed.
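As a rough sketch of that handshake-timing idea: connect() returns once the SYN/SYN-ACK exchange completes, so timing it approximates one round trip (the hostname below is just a placeholder):

```python
# connect_rtt.py -- approximate RTT from TCP connection setup time.
# connect() completes after the SYN/SYN-ACK exchange, i.e. ~one round trip.
import socket, time

def handshake_rtt(host, port=80):
    start = time.perf_counter()
    socket.create_connection((host, port), timeout=5).close()
    return (time.perf_counter() - start) * 1000  # milliseconds

print(f"~{handshake_rtt('example.com'):.1f} ms round trip")
```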
There's no support in JavaScript or any browser component for measuring upload performance.
The only way I can think of is this: if you are uploading to a page/HTTP handler, the page receiving the incoming bytes can measure how many bytes it receives per second, then store that in an application-wide dictionary keyed by session ID.
From the browser you can then periodically poll the server for the value in that dictionary using the session ID, and show it to the user. This way you can tell what the upload speed is.
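A sketch of that idea as a plain WSGI handler (the header name, the in-memory dictionary, and the endpoint layout are all hypothetical; a production version would need locking, eviction, and a separate polling endpoint that reads the dictionary by session ID):

```python
# upload_meter.py -- measure upload speed server-side by timing the request body.
# Hypothetical sketch: names and the in-memory store are illustrative only.
import time
from wsgiref.simple_server import make_server

upload_speeds = {}   # session_id -> bytes per second (application-wide store)

def app(environ, start_response):
    session_id = environ.get("HTTP_X_SESSION_ID", "anonymous")
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = environ["wsgi.input"]

    # Read the upload in chunks, updating the running rate as bytes arrive.
    received, start = 0, time.perf_counter()
    while received < length:
        chunk = body.read(min(64 * 1024, length - received))
        if not chunk:
            break
        received += len(chunk)
        elapsed = time.perf_counter() - start
        if elapsed > 0:
            upload_speeds[session_id] = received / elapsed  # bytes/sec so far

    speed = upload_speeds.get(session_id, 0)
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [f"{speed:.0f} bytes/sec".encode()]

if __name__ == "__main__":
    make_server("", 8000, app).serve_forever()
```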
You can use AJAXOMeter, a JavaScript library which measures your upload and download speed. You can see a live demo here.
That is not feasible in general, as inbound and outbound bandwidth are frequently asymmetric. Different ISPs have significantly different ratios here, which can even vary with the time of day.

How can I estimate ethernet performance?

I need to think about the performance limitations of 100 Mbps Ethernet (including scenarios with up to ~100 endpoints on the same subnet) and I'm wondering how best to go about estimating the capacity of the network. Are there any rules of thumb for this?
The reason I ask is that I am working on some back-of-the-envelope calculations about performance limitations, so it doesn't need to be incredibly accurate. I just haven't been through this exercise before and was hoping to gain some insight from those who have. Mark Brackett's answer (as of 1/26) is along the lines of what I am looking for.
If you're using switches (and, honestly, who isn't these days?), then I've found 80% of capacity to be a reasonable estimate. Usually it's really about 90% because of TCP overhead, but 80% accounts for occasional retransmits.
If it's a single collision domain (hubs), then you'd probably be around 30% with moderate activity on those 100 nodes. But it would be pretty variable depending on the traffic generated. And anyone putting 100 nodes in a single collision domain these days would no doubt be shot, so I don't think you'll actually run into that in real life.
Edit: Note that these numbers are for a relatively healthy network, one that is generally defined as working. Extremely excessive broadcasts or other anomalous traffic patterns have been known to bring a network to its knees.
Use WANem
WANem is a Wide Area Network Emulator, meant to provide a real experience of a Wide Area Network/Internet during application development/testing over a LAN environment.
You can simulate any network scenario using it and then test your application's behaviour. It is open source and available on SourceForge.
Link: WANem - The Wide Area Network emulator
OPNET creates software for simulating network performance. I once used OPNET IT Guru Academic Edition; that application, or some other software from OPNET, may be of some help.
100 endpoints should not be an issue. If the network is properly configured (nothing special is required), the only real limit is bandwidth: Fast Ethernet (100 Mbps) should be able to transfer roughly 10-12 MB (megabytes) of payload per second, whether to one client or to many. If you are using hubs instead of switches, or half-duplex instead of full-duplex, then you should change that (this is the rule of thumb).
Working from the title of your post, "How can I estimate Ethernet performance?", see this wiki link: http://en.wikipedia.org/wiki/Ethernet_frame#Maximum_throughput
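The arithmetic behind that page is simple enough to reproduce. Assuming maximum-size frames, the per-frame overhead on the wire is preamble (8 B) + header (14 B) + FCS (4 B) + inter-frame gap (12 B):

```python
# ethernet_throughput.py -- theoretical max payload throughput of Fast Ethernet.
LINK_BPS = 100_000_000            # 100 Mbit/s line rate

PAYLOAD = 1500                    # max Ethernet payload (bytes)
ETH_OVERHEAD = 8 + 14 + 4 + 12    # preamble + header + FCS + inter-frame gap
FRAME_ON_WIRE = PAYLOAD + ETH_OVERHEAD  # 1538 bytes on the wire per frame

efficiency = PAYLOAD / FRAME_ON_WIRE
print(f"Ethernet payload efficiency: {efficiency:.1%}")            # ~97.5%
print(f"Max Ethernet payload: {LINK_BPS * efficiency / 8 / 1e6:.2f} MB/s")

# Subtract IP (20 B) and TCP (20 B) headers for application-level throughput:
tcp_eff = (PAYLOAD - 40) / FRAME_ON_WIRE
print(f"Max TCP payload: {LINK_BPS * tcp_eff / 8 / 1e6:.2f} MB/s")  # ~11.9 MB/s
```

That ~11.9 MB/s ceiling is consistent with the "roughly 10-12 MB per second" rule of thumb above, before accounting for retransmits and other real-world losses.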
