I have a PLC that sends UDP packets every 24ms. "Simultaneously" (i.e. within a matter of what should be a few tens or at most hundreds of microseconds), the same PLC triggers a camera to snap an image. There is a Windows 8.1 system that receives both the images and the UDP packets, and an application running on it that should be able to match each image with the UDP packet from the PLC.
Most of the time, there is a reasonably fixed latency between the two events as far as the Windows application is concerned - 20ms +/- 5ms. But sometimes the latency rises, and never really falls. Eventually it goes beyond the range of the matching buffer I have, and the two systems reset themselves, which always starts back off with "normal" levels of latency.
What puzzles me is the variability in this variable latency - that sometimes it will sit all day on 20ms +/- 5ms, but on other days it will regularly and rapidly increase, and our system resets itself disturbingly often.
What could be going on here? What can be done to fix it? Is Windows the likely source of the latency, or the PLC system?
I 99% suspect Windows, since the PLC is designed for real time response, and Windows isn't. Does this sound "normal" for Windows? If so, even if there are other processes contending for the network and/or other resources, why doesn't Windows ever seem to catch up - to rise in latency when contention occurs, but return to normal latency levels after the contention stops?
FYI: the Windows application calls SetPriorityClass( GetCurrentProcess(), REALTIME_PRIORITY_CLASS ) and each critical thread is started with AfxBeginThread( SomeThread, pSomeParam, THREAD_PRIORITY_TIME_CRITICAL ). There is as little as possible else running on the system, and the application only uses about 5% of the available Quad-core processor (with hyperthreading, so 8 effective processors). There is no use of SetThreadAffinityMask() although I am considering it.
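For concreteness, here is a minimal sketch of that setup (SomeThread and pSomeParam are the placeholder names from above; the SetThreadAffinityMask call is the one I am merely considering, with core 2 chosen arbitrarily):

    // Sketch of the priority setup described above (Win32/MFC).
    #include <afxwin.h>

    UINT SomeThread(LPVOID pParam)
    {
        // ... receive UDP packets / match images against them ...
        return 0;
    }

    void StartCriticalWork(LPVOID pSomeParam)
    {
        // Raise the whole process into the real-time priority class.
        ::SetPriorityClass(::GetCurrentProcess(), REALTIME_PRIORITY_CLASS);

        // Start the critical worker at time-critical thread priority.
        CWinThread* pThread = AfxBeginThread(SomeThread, pSomeParam,
                                             THREAD_PRIORITY_TIME_CRITICAL);

        // The call under consideration: pin the worker to one core
        // (core 2 here, an arbitrary choice) so it never migrates.
        if (pThread != NULL)
            ::SetThreadAffinityMask(pThread->m_hThread, 1 << 2);
    }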
So you have two devices, a PLC and a camera, which send data to the same PC using UDP.
I 90% suspect networking.
It's either a buffering/shaping mechanism in your switch/router (by the way, I hope your setup is isolated, i.e. you haven't just plugged your hardware into a busy corporate network), or the network stack in one of the devices, or maybe some custom retransmission mechanism in the PLC. Neither IP nor Ethernet was ever meant to guarantee low latency.
To verify, use Wireshark to view the network traffic.
For the best experiment, you can use another PC with three network cards.
Plug your three devices (Windows client, PLC, camera) into that PC, and configure a network bridge between the three cards. This way the second PC will act as an Ethernet switch, and you'll be able to use Wireshark to capture all the network traffic that goes through it.
The answer turned out to be a complex interaction between multiple factors, most of which don't convey any information useful to others... except as examples of "just because it seems to have been running fine for 12 months doesn't give you licence to assume everything was actually OK."
Critical to the issue was that the PLC was a Beckhoff device to which several I/O modules were attached. It turns out that the more of these modules are attached, the less capacity the PLC has to transmit UDP packets, despite having plenty of processor time and network bandwidth available. It looks like a resource contention issue of some kind, which we have not resolved - we have simply chosen to upgrade to a more powerful PLC device. That device is still subject to the same issue, but there it only occurs if you try to transmit roughly every 10ms rather than every 24ms.
The issue arose because our PLC application was operating right on the threshold of its UDP transmit capabilities. The PLC has to step its way through the states of a state machine to do the transmit. With a PLC cycle of 2ms, the fastest it could ever get through those states with the I/O modules we had attached turned out to be every 22ms (i.e. 11 PLC cycles).
Finally, what was at first assumed to be an insignificant and unrelated change on PLC startup tipped it over the edge and left it occasionally unable to keep up with the normal 24ms transmit cycle. So it would fall progressively further behind, giving the appearance of an ever-increasing latency.
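To illustrate why the cycle time bounds the transmit rate, here is a hypothetical sketch (the real Beckhoff state machine is more involved and isn't shown in this answer):

    // One state transition per 2 ms PLC cycle: a hypothetical 11-state
    // transmit sequence can never complete faster than 11 * 2 ms = 22 ms.
    enum TxState { IDLE = 0, /* ... 9 intermediate states ... */ DONE = 10 };

    void OnPlcCycle(TxState& s)   // called by the runtime every 2 ms
    {
        if (s == DONE)
            s = IDLE;                         // ready for the next packet
        else
            s = static_cast<TxState>(s + 1);  // advance exactly one state
    }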
I'm accepting @Soonts' answer because careful analysis of some Wireshark captures was the key to unlocking what was going on.
Related
I currently have a problem that some packets are getting dropped in my local network.
But just on one device.
[Screenshot: ping to the local router]
Here you can see a ping to my router. I only have this problem on my PC; my phone and laptops are completely fine.
I tried a network card and two WLAN USB sticks, all with the same problem.
Does somebody have a clue what could cause these problems?
* OS: Windows 10 21H2
* CPU usage: idling around 4-10%
* RAM usage: 40%
* Network usage: 0-1%
Your question is a bit broad - there are so many things that can disturb a network connection, from physical issues (e.g. cable defects, WiFi interference) to driver problems, CPU bottlenecks, etc. That being said, my tip would be a CPU bottleneck (an app using most or all of your CPU), but even that is by no means certain.
Take a look at your CPU usage with Task Manager or Process Explorer (from the Sysinternals package). They both also show network usage. If your machine shows excessive CPU usage (constantly over 30% with frequent peaks), then you might want to explore the reasons for that, and there can be many.
Using those same tools you can also try to identify apps that are possibly using a lot of network bandwidth.
Windows has a lot happening in the background, and those processes require resources (CPU, RAM, network, hard disk, etc.). Should any of those resources be limited, you can easily see issues like the ones you describe, since there is a certain interdependence between them: e.g. if you have many apps running with limited RAM, that leads to paging, and as the hard disk is slow, the CPU ends up busy shoveling data and can't keep up with the NIC's requests.
However, I am theorizing here. Supply some hard data (machine config, OS info, network info/config, task list, CPU usage, etc.) and we can continue.
There seem to be many methods for kernel-debugging Windows 10/7, including USB, network, and COM.
But which of them is the fastest? I have only used COM, and it seems really slow compared to debugging a local user-mode application. Is there any method that makes kernel debugging as fast as debugging user-mode apps, or close to it?
By "fast" I mean, for example, the time single steps take, or the time WinDbg needs to execute commands, because right now even the simplest commands sometimes take too long.
Also, what is the fastest method for Windows 7?
There are two factors at play: baud rate (data transfer rate) and response time (ping time). Which one matters more depends on the task you perform.
Creating a kernel crash dump with full memory will likely transfer a lot of data, so higher bandwidth is helpful.
On the other hand, small WinDbg commands like k or | involve only a small amount of data, but you typically send a command and wait for the answer. In that case, the response time has more effect.
For baud rates:
* A COM port is a serial port and can be configured from 75 baud up to 2 MBit/s.
* USB depends on the version and offers 12 MBit/s up to 10 GBit/s (USB 3.2 Gen 2).
* FireWire is available from 100 MBit/s to 3200 MBit/s.
* Network typically offers 10 MBit/s to 10 GBit/s. But of course, if you debug over the Internet, it won't be faster than your DSL or cable modem.
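To put those numbers in perspective, here is a quick back-of-the-envelope sketch (the 16 GB dump size is an assumed example, not from any particular machine):

    // Rough transfer-time estimate for a full memory dump over each link.
    #include <cstdio>

    int main()
    {
        const double dumpBits = 16.0 * 8e9;  // assumed 16 GB dump, in bits
        const struct { const char* name; double bitsPerSec; } links[] = {
            { "COM @ 115200 baud",    115200.0 },
            { "USB 2.0 (480 MBit/s)", 480e6    },
            { "100 MBit/s network",   100e6    },
            { "1 GBit/s network",     1e9      },
        };
        for (const auto& l : links)
            std::printf("%-22s ~%.0f s\n", l.name, dumpBits / l.bitsPerSec);
        return 0;
    }

That works out to roughly 13 days over a 115200-baud COM port versus a little over two minutes over gigabit Ethernet.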
For ping time:
* USB has a response time of less than 1 ms, but that may depend on how many devices you connect.
* A local full-duplex network also has a response time of less than 1 ms.
* Debugging over the Internet is pretty slow, with 20 ms up to 300 ms.
From an availability and cost standpoint, I would start with a 1 GBit/s network connection. If you don't have that yet, you can buy a cheap Gigabit USB adapter for 12 € or so.
which of them is the fastest?
As I hopefully explained well enough, that's a question which can only be answered when we know the exact situation.
I have only used COM and it seems to be really slow
Yes. It is.
right now even the simplest commands sometimes take too long
From a performance point of view, that's not something we can work with. To define performance requirements, we'd need to know a) how fast it is now and b) how fast would be acceptable for you.
What is the fastest method for windows 7?
I don't think the operating system matters much here.
1394 (FireWire) is the fastest one I've used on Windows 7. USB debugging is also possible, but you need to make sure the USB port (usually an onboard one) supports debugging - not all ports do. On Windows 10, KDNet is probably the fastest so far.
However, if you are debugging a virtual machine in VMware or VirtualBox, VirtualKD is even faster than any of the above physical connections, since it just copies data between the guest and the host. By the way, its implementation is very interesting.
All of the above are way faster than COM. You won't feel much difference between them unless you are generating a full memory dump, and even in that case none of them will cause you real pain.
I am writing a massive UDP network application.
Running traffic at 10 gigabits per second.
I have a very high "System Interrupts" CPU usage in task manager.
Reading about what this means, I see:
What Is the “System Interrupts” Process?
System Interrupts is an official part of Windows and, while it does appear as a process in Task Manager, it's not really a process in the traditional sense. Rather, it's an aggregate placeholder used to display the system resources used by all the hardware interrupts happening on your PC.
Most articles, however, say that a high value corresponds to failing hardware.
But since the "System Interrupts" entry correlates with high IRQ usage, maybe it should be high, considering my heavy UDP network usage.
Also, is all of this really happening on one CPU core? Or is it an aggregate of everything happening across all CPU cores?
If you have many individual datagrams being sent over UDP, it's certainly going to cause a lot of hardware interrupts and a lot of CPU usage. 10 Gb/s is certainly in the range of "lots of CPU" if your datagrams are relatively small.
Each CPU core handles its own hardware interrupts. You can see how spread out the load is across the cores on the Performance tab - the red line is kernel CPU time, which includes hardware interrupts and other low-level socket handling by the OS.
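If you want actual numbers per core rather than eyeballing the graph, here is a minimal sketch using the PDH counter API (the standard "% Interrupt Time" counter; error handling omitted for brevity):

    // Sample per-core "% Interrupt Time" with the PDH API.
    #include <windows.h>
    #include <pdh.h>
    #include <stdio.h>
    #include <stdlib.h>
    #pragma comment(lib, "pdh.lib")

    int main()
    {
        PDH_HQUERY query;
        PDH_HCOUNTER counter;
        PdhOpenQuery(NULL, 0, &query);
        // "*" expands to one instance per logical processor (plus _Total).
        PdhAddEnglishCounterW(query, L"\\Processor(*)\\% Interrupt Time",
                              0, &counter);
        PdhCollectQueryData(query);              // baseline sample
        Sleep(1000);
        PdhCollectQueryData(query);              // second sample, 1 s later

        DWORD bufSize = 0, itemCount = 0;        // query the required size
        PdhGetFormattedCounterArrayW(counter, PDH_FMT_DOUBLE,
                                     &bufSize, &itemCount, NULL);
        PDH_FMT_COUNTERVALUE_ITEM_W* items =
            (PDH_FMT_COUNTERVALUE_ITEM_W*)malloc(bufSize);
        PdhGetFormattedCounterArrayW(counter, PDH_FMT_DOUBLE,
                                     &bufSize, &itemCount, items);
        for (DWORD i = 0; i < itemCount; ++i)
            wprintf(L"CPU %-6ls %5.1f%% interrupt time\n", items[i].szName,
                    items[i].FmtValue.doubleValue);
        free(items);
        PdhCloseQuery(query);
        return 0;
    }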
I'm looking to replace a hardware mixer with software, to increase the flexibility of our system and reduce hardware complexity.
We have 4-8 server-class PCs (Windows 7) connected in a local LAN via gigabit Ethernet.
Each PC has a USB sound card which is connected to an 8-input mixer.
The output from the mixer is sent to speakers in a few places.
What I'd like to change:
* route the sound over the network instead
* each computer can thus listen to all the others and output its own "mix"
* fewer cables
* if possible, support for really cheap hardware (e.g. a Raspberry Pi or something)
There is no hard requirement on latency and such. Up to 100 ms is acceptable (i.e. way higher than your average quake ping...).
While I prefer open source, I'm also open to redistributable commercial solutions. For this to be economically viable, the license costs can't exceed 30-40 €/server (preferably less).
Grateful for all help!
(Please also share your experiences if possible, not just links.)
Related question:
https://stackoverflow.com/questions/4297102/how-to-create-a-virtual-sound-card-on-windows <- doesn't seem to interconnect over the network
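To make the topology I have in mind concrete, here's a rough sketch of the transport side (the multicast group and port are made-up values, and the sound-card capture side is omitted):

    // Sketch: send 10 ms PCM frames to a multicast group over the LAN.
    #include <winsock2.h>
    #include <ws2tcpip.h>
    #pragma comment(lib, "ws2_32.lib")

    int main()
    {
        WSADATA wsa;
        WSAStartup(MAKEWORD(2, 2), &wsa);

        SOCKET s = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
        sockaddr_in group = {};
        group.sin_family = AF_INET;
        group.sin_port = htons(5004);                      // made-up port
        inet_pton(AF_INET, "239.0.0.1", &group.sin_addr);  // made-up group

        // 10 ms of 16-bit stereo PCM at 44.1 kHz = 1764 bytes per frame,
        // comfortably below the Ethernet MTU, so no IP fragmentation.
        char frame[1764] = {};
        for (;;) {
            // ... fill `frame` from the sound card's capture buffer ...
            sendto(s, frame, sizeof(frame), 0,
                   (sockaddr*)&group, sizeof(group));
            Sleep(10);
        }
    }

Each receiver would join the group with IP_ADD_MEMBERSHIP and mix whatever frames arrive; a few frames of jitter buffering would stay well inside the 100 ms budget.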
The software in question is a native C++/MFC application that receives a large amount of data over UDP and then processes the data for display, sound output, and writing to disk among other things. I first encountered the problem when the application's CHM help document was launched from its help menu and then I clicked around the help document while gathering data from the hardware. To replicate this, an AutoHotkey script was used to rapidly click around in the help document while the application was running. As soon as any sound occurred on the system, I started getting errors.
If I have the sound card completely disabled, everything processes fine with no errors, though sound output is obviously disabled. However, if I have sound playing (in this application, a different application or even just the beep from a message box) I get thousands of dropped packets (we know this because each packet is timestamped). As a second test, I didn't use my application at all and just used Wireshark to monitor incoming packets from the hardware. Sure enough, whenever a sound played in Windows, we had dropped packets. In fact, sound doesn't even have to be actively playing to cause the error. If I simply create a buffer (using DirectSound8) and never start playing, I still get these errors.
This occurs on multiple PCs with multiple combinations of network cards (both fiber optic and RJ45) and sound cards (both integrated and separate cards). I've also tried different driver versions for each NIC and sound card. All tests have been on Windows 7 32-bit. Since my application uses DirectSound for audio, I've tried different CooperativeLevels (normal operation is DSSCL_PRIORITY) with no success.
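For anyone who wants to reproduce the "buffer created but never played" case, here is a minimal sketch (default device, arbitrary one-second buffer; the format values are just a plausible example):

    // Repro sketch: create a DirectSound buffer and never Play() it,
    // then watch for dropped UDP packets while the process is alive.
    #include <windows.h>
    #include <dsound.h>
    #pragma comment(lib, "dsound.lib")

    int main()
    {
        CoInitialize(NULL);
        IDirectSound8* pDS = NULL;
        DirectSoundCreate8(NULL, &pDS, NULL);  // default sound device
        pDS->SetCooperativeLevel(GetDesktopWindow(), DSSCL_PRIORITY);

        WAVEFORMATEX wfx = {};
        wfx.wFormatTag      = WAVE_FORMAT_PCM;
        wfx.nChannels       = 2;
        wfx.nSamplesPerSec  = 44100;
        wfx.wBitsPerSample  = 16;
        wfx.nBlockAlign     = wfx.nChannels * wfx.wBitsPerSample / 8;
        wfx.nAvgBytesPerSec = wfx.nSamplesPerSec * wfx.nBlockAlign;

        DSBUFFERDESC desc = {};
        desc.dwSize        = sizeof(desc);
        desc.dwBufferBytes = wfx.nAvgBytesPerSec;  // one second of audio
        desc.lpwfxFormat   = &wfx;

        IDirectSoundBuffer* pBuffer = NULL;
        pDS->CreateSoundBuffer(&desc, &pBuffer, NULL);  // never played
        Sleep(60000);  // observe incoming UDP traffic during this minute
        return 0;
    }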
At this point, I'm pretty convinced it has nothing to do with my application and was wondering if anyone had any idea what could be causing this problem before I started dealing with the hardware vendors and/or Microsoft.
It turns out that this behavior is by design. Windows Vista and later implemented something called the Multimedia Class Scheduler service (MMCSS) that is intended to make all multimedia playback as smooth as possible. Since multimedia playback relies on hardware interrupts to ensure smooth playback, any competing interrupts will cause problems. One of the major hardware interrupt sources is network traffic. Because of this, Microsoft decided to throttle the network traffic when a program was running under MMCSS.
I guess this was a big deal back in 2007 when Vista came out, but I missed it. There was an article by Mark Russinovich (thanks ypnos) describing MMCSS. It seems that my entire problem boiled down to this:
Because the standard Ethernet frame size is about 1500 bytes, a limit of 10,000 packets per second equals a maximum throughput of roughly 15MB/s. 100Mb networks can handle at most 12MB/s, so if your system is on a 100Mb network, you typically won't see any slowdown. However, if you have a 1Gb network infrastructure and both the sending system and your Vista receiving system have 1Gb network adapters, you'll see throughput drop to roughly 15%. Further, there's an unfortunate bug in the NDIS throttling code that magnifies throttling if you have multiple NICs. If you have a system with both wireless and wired adapters, for instance, NDIS will process at most 8000 packets per second, and with three adapters it will process a maximum of 6000 packets per second. 6000 packets per second equals 9MB/s, a limit that's visible even on 100Mb networks.
I haven't verified that the multiple adapter bug still exists in Windows 7 or Vista SP1, but it is something to look for if you are running into problems.
From the comments on Russinovich's post, I found that Vista SP1 introduced some registry settings that allow one to adjust how MMCSS affects Windows - specifically, the NetworkThrottlingIndex key.
The solution to my issue was to completely disable network throttling by setting the NetworkThrottlingIndex value under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Multimedia\SystemProfile to 0xFFFFFFFF and then rebooting. This completely disables the network throttling portion of MMCSS. I had tried simply upping the value to 70, but it didn't stop causing errors until I completely disabled it.
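For completeness, the registry change can also be applied programmatically; here is a minimal sketch (requires administrator rights, and the reboot is still needed afterwards):

    // Set NetworkThrottlingIndex to 0xFFFFFFFF to disable MMCSS
    // network throttling (reboot required for it to take effect).
    #include <windows.h>
    #include <stdio.h>

    int main()
    {
        HKEY hKey;
        const DWORD value = 0xFFFFFFFF;
        LONG rc = RegOpenKeyExW(HKEY_LOCAL_MACHINE,
            L"SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\\"
            L"Multimedia\\SystemProfile",
            0, KEY_SET_VALUE, &hKey);
        if (rc == ERROR_SUCCESS) {
            rc = RegSetValueExW(hKey, L"NetworkThrottlingIndex", 0, REG_DWORD,
                                (const BYTE*)&value, sizeof(value));
            RegCloseKey(hKey);
        }
        if (rc == ERROR_SUCCESS)
            printf("Done - now reboot.\n");
        else
            printf("Failed, error %ld\n", rc);
        return 0;
    }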
Thus far I have not seen any adverse effects on other multimedia applications (nor the video capture and audio output portions of my own application) from this change. I will report back here if that changes.
It is known that Microsoft built a weird anti-feature into the Windows Vista kernel that preventively degrades I/O performance to make sure that multimedia applications (Windows Media Player, DirectX) get 100% responsiveness. I don't know if that also means packet loss with UDP. Read this lame justification for the approach: http://blogs.technet.com/b/markrussinovich/archive/2007/08/27/1833290.aspx
One of the comments there summarizes this quite well: "Seems to me Microsoft tried to 'fix' something that wasn't broken."