I read this blog:
After conducting a bit of research I discovered that this is caused by the fact that the default Linux kernel runs at a 1000 Hz internal clock frequency and that VMware is unable to deliver the clock interrupts on time without losing some of them. This means that some clock interrupts are lost without notice to the Linux kernel, which assumes each interrupt marks 1/1000th of a second. So each clock interrupt that gets lost makes the clock fall behind by 1/1000th of a second.
Now, my question is, how does the hypervisor sync time internally if the hypervisor is capable of handling the clock interrupts?
Because, say (a scaled-up example, not real-world numbers): it's 19:10:22 on the host, but by the time that propagates to the guest it will already be 19:10:23 on the host.
I understand this is a hard problem, but I guess you need to slow down time from the VM's perspective. How is that achieved?
VMware timekeeping
The hypervisor does not synchronize the clocks. It is software running in the guest VM that keeps the clocks in sync.
From page 15 (with the explanation continuing on through page 19) of your linked PDF:
There are two main options available for guest operating system clock synchronization: VMware Tools periodic clock synchronization
or the native synchronization software that you would use with the guest operating system if you were running it directly on physical
hardware. Some examples of native synchronization software are Microsoft W32Time for Windows and NTP for Linux.
The VMware Tools clock sync tool just checks the guest clock against the host clock every so often (probably once per minute) and corrects the guest clock if it's wrong. If the guest clock is off by just a little bit the tool will speed up or slow down the guest clock until it has the correct time (using an API like SetSystemTimeAdjustment on Windows or adjtime on Unix). If you're wondering how the tool accesses the host's clock, there's just an API for it that the VMware tool knows how to use.
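On the Unix side, the "slew rather than step" idea looks roughly like the sketch below. This is not the actual VMware Tools code; get_host_clock_offset() is a made-up placeholder for whatever host-clock query the tool uses, and the 1-second threshold is arbitrary.

```c
/* Hedged sketch of periodic guest clock correction, in the spirit of the
 * VMware Tools sync loop described above. Needs root/CAP_SYS_TIME to
 * actually adjust the clock. */
#include <math.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical stand-in for "ask the hypervisor how far off we are";
 * returns host_time - guest_time in seconds. */
static double get_host_clock_offset(void) { return 0.0; /* placeholder */ }

int main(void) {
    for (;;) {
        double off = get_host_clock_offset();
        if (fabs(off) > 1.0) {
            /* Way off: step the clock outright, a one-time "big" correction. */
            struct timeval now;
            gettimeofday(&now, NULL);
            now.tv_sec += (time_t)off;
            settimeofday(&now, NULL);
        } else if (off != 0.0) {
            /* Slightly off: slew gradually so time never jumps or runs backwards. */
            long long usec = (long long)(off * 1e6);
            struct timeval delta = { .tv_sec  = usec / 1000000,
                                     .tv_usec = usec % 1000000 };
            adjtime(&delta, NULL);   /* kernel speeds up / slows down the clock */
        }
        sleep(60);                   /* "every so often", e.g. once per minute */
    }
}
```

On Windows the equivalent gradual correction would go through SetSystemTimeAdjustment, as mentioned above.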
Related
I'm testing the speed of different sorting methods for a CS class, and although our professor said we didn't need to be terribly precise, he still wants us to be careful not to run apps in the background while testing, not to switch to a different machine partway through, and not to do anything else that would throw off the speed of the sorts.
If I ran the tests in a VM, would the environment outside of the VM affect the speed? Would that help make the tests accurate without having to worry about changes in the apps I have open alongside the VM?
In short, yes.
In most scenarios, hosts share their resources with the VM. If you bog down/freeze/crash the host then the VM will be affected.
For those who have more robust servers with better resources, processes running on the host won't affect the VM as much, because with more resources on the host you can assign more RAM and more virtual processors to the VM so it runs smoothly.
For instance, let's say our host has 64 GB of RAM and a processor that has 4 cores and 8 threads (such as an Intel® Xeon® E3-1240 Processor).
We can tell VirtualBox, VMware or Hyper-V to assign 32GB of RAM and 4 virtual processors to the VM, essentially cutting the host's power by half.
With this in mind, any processes you run on the host will usually be separate from the VM, but if those processes freeze, crash, or cause a hard reboot on the host, the VM will be affected regardless of the RAM or virtual processors assigned to it.
In enterprise environments, a Hyper-V server should be used only for that purpose, and installing or running a lot of other roles on the host (such as DHCP, DNS, or a web server like IIS) is usually frowned upon.
So your professor is right to advise against running processes on the host while testing your VM.
I have a PLC that sends UDP packets every 24ms. "Simultaneously" (i.e. within a matter of what should be a few tens or at most hundreds of microseconds), the same PLC triggers a camera to snap an image. There is a Windows 8.1 system that receives both the images and the UDP packets, and an application running on it that should be able to match each image with the UDP packet from the PLC.
Most of the time, there is a reasonably fixed latency between the two events as far as the Windows application is concerned - 20ms +/- 5ms. But sometimes the latency rises, and never really falls. Eventually it goes beyond the range of the matching buffer I have, and the two systems reset themselves, which always starts back off with "normal" levels of latency.
What puzzles me is the variability in this variable latency - that sometimes it will sit all day on 20ms +/- 5ms, but on other days it will regularly and rapidly increase, and our system resets itself disturbingly often.
What could be going on here? What can be done to fix it? Is Windows the likely source of the latency, or the PLC system?
I 99% suspect Windows, since the PLC is designed for real time response, and Windows isn't. Does this sound "normal" for Windows? If so, even if there are other processes contending for the network and/or other resources, why doesn't Windows ever seem to catch up - to rise in latency when contention occurs, but return to normal latency levels after the contention stops?
FYI: the Windows application calls SetPriorityClass( GetCurrentProcess(), REALTIME_PRIORITY_CLASS ) and each critical thread is started with AfxBeginThread( SomeThread, pSomeParam, THREAD_PRIORITY_TIME_CRITICAL ). As little else as possible is running on the system, and the application only uses about 5% of the available quad-core processor (with hyperthreading, so 8 effective processors). There is no use of SetThreadAffinityMask(), although I am considering it.
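For reference, the priority setup amounts to something like the plain Win32 sketch below (using CreateThread/SetThreadPriority rather than the MFC AfxBeginThread wrapper the app actually uses; the thread body is a placeholder).

```c
/* Plain Win32 sketch of the priority setup described above. */
#include <windows.h>

static DWORD WINAPI ReceiverThread(LPVOID param) {
    /* ... recvfrom() loop matching UDP packets with camera images would go here ... */
    return 0;
}

int main(void) {
    /* Raise the whole process, then the critical thread, as in the question. */
    SetPriorityClass(GetCurrentProcess(), REALTIME_PRIORITY_CLASS);

    HANDLE h = CreateThread(NULL, 0, ReceiverThread, NULL, 0, NULL);
    SetThreadPriority(h, THREAD_PRIORITY_TIME_CRITICAL);

    /* Optional experiment mentioned above: pin the thread to one core.
     * SetThreadAffinityMask(h, 1);    // CPU 0 only */

    WaitForSingleObject(h, INFINITE);
    return 0;
}
```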
So, you have two devices, PLC and camera, which send data to the same PC using UDP.
I 90% suspect networking.
It could be just a buffering/shaping mechanism in your switch or router (by the way, I hope your setup is isolated, i.e. you have not just plugged your hardware into a busy corporate network), or the network stack in either of the devices, or maybe some custom retransmission mechanism in the PLC. Neither IP nor Ethernet was ever designed to guarantee low latency.
To verify, use Wireshark to view the network traffic.
For the best experiment, you can use another PC with three network cards.
Plug your three devices (Windows client, PLC, camera) into that PC and configure a network bridge between the three cards. That way the second PC will act as an Ethernet switch, and you'll be able to use Wireshark to capture all the network traffic that goes through it.
The answer turned out to be a complex interaction between multiple factors, most of which don't convey any information useful to others... except as examples of "just because it seems to have been running fine for 12 months doesn't give you licence to assume everything was actually OK."
Critical to the issue was that the PLC was a device from Beckhoff to which several I/O modules were attached. It turns out that the more of these modules are attached, the less ability the PLC has to transmit UDP packets, despite having plenty of processor time and network bandwidth available. It looks like a resource contention issue of some kind which we have not resolved - we have simply chosen to upgrade to a more powerful PLC device. That device is still subject to the same issue, but the issue occurs if you try to transmit at roughly every 10ms, not 24ms.
The issue arose because our PLC application was operating right on the threshold of its UDP transmit capabilities. The PLC has to step its way through states in a state machine to do the transmit. With a PLC cycle of 2ms, the fastest it could ever go through the states in the state machine with the I/O modules we had attached turned out to be every 22ms.
Finally, what was assumed at first to be an insignificant and unrelated change on PLC startup tipped it over the edge and left it occasionally unable to keep up with the normal 24ms transmit cycle. So it would fall progressively further behind, giving the appearance of an ever-increasing latency.
I'm accepting Soonts's answer because careful analysis of some Wireshark captures was the key to unlocking what was going on.
I'm looking to fuzz virtual drivers. I've read the other questions about this, but they don't really go anywhere. I'm basically looking to see if there's an obvious tool I've missed, and I want to know whether fuzzing IOCTLs from a Windows guest would work, or whether I need to write something lower level, e.g. using IN/OUT port instructions.
Are there any tools out there for fuzzing drivers in a Windows guest to hit the hypervisor, either Hyper-V or VMware?
There are a number of ways to exercise virtualization code.
First, of course, if you're on Windows, is the IOCTL interface.
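As a rough illustration of that first approach, an IOCTL fuzzer run from inside a Windows guest can be as simple as the sketch below. The device name and the IOCTL code range are placeholders you would replace for your actual target; a real fuzzer would also log every input before sending it, so a host-side crash or hang can be reproduced.

```c
/* Hedged sketch: random-buffer IOCTL fuzzing from a Windows guest.
 * The device name and IOCTL range below are placeholders, not real targets. */
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    HANDLE dev = CreateFileA("\\\\.\\SomeVirtualDevice",   /* placeholder device */
                             GENERIC_READ | GENERIC_WRITE, 0, NULL,
                             OPEN_EXISTING, 0, NULL);
    if (dev == INVALID_HANDLE_VALUE) {
        printf("open failed: %lu\n", GetLastError());
        return 1;
    }

    BYTE inBuf[4096], outBuf[4096];
    for (unsigned iter = 0; iter < 1000000; iter++) {
        DWORD code  = 0x220000 + (rand() % 0x1000);  /* placeholder IOCTL code range */
        DWORD inLen = rand() % sizeof(inBuf);
        for (DWORD i = 0; i < inLen; i++) inBuf[i] = (BYTE)rand();

        DWORD returned = 0;
        DeviceIoControl(dev, code, inBuf, inLen,
                        outBuf, sizeof(outBuf), &returned, NULL);
        /* A guest hang or a host-side crash here is exactly what the fuzzing
         * run is looking for, so persist the inputs before each call in real use. */
    }
    CloseHandle(dev);
    return 0;
}
```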
Then you should remember that all virtual devices are emulated in some way by some code in the guest OS and in the host OS. So, accessing input devices (keyboard and mouse), the video device, storage (disks), the network card, communication ports (serial, parallel), standard PC devices (PIC, PIT, RTC, DMA), the CPU's APIC, and so on will also exercise virtualization code.
It's also very important to remember that virtualization of the various PC devices (unless we're talking about synthetic devices working over the VMBUS in Windows) is done by intercepting, parsing and emulating/executing instructions that access device memory-mapped buffers and registers and I/O ports. This gives you yet another "interface" to pound on.
By using it you might uncover not only device-related bugs but also instruction-related bugs. If you're interested in the latter, you need to have a good understanding of how the x86 CPU works at the instruction level in various modes (real, virtual 8086, protected, 64-bit), how it handles interrupts and exceptions and you'll also need to know how to access those PC devices (how and at what memory addresses and I/O port numbers).
Btw, Windows won't let you directly access these things unless your code is running in the kernel. You may want to have a non-Windows guest VM for things like this just to avoid overprotective functionality of Windows. Look for edge cases, unusual instruction encodings (including invalid encodings) or unusual instructions for usual tasks (e.g. using FPU/MMX/SSE/etc or special protected-mode instructions (like SIDT) to access devices). Think and be naughty.
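For example, from a Linux guest you can reach legacy port I/O from user space with ioperm(); something like the sketch below (assuming an x86 guest, glibc, and root, and noting it will happily scramble the guest's own CMOS contents) hammers the host's emulation of the RTC/CMOS ports.

```c
/* Hedged sketch: poking legacy PC device ports from inside a Linux guest (run as root).
 * Ports 0x70/0x71 are the CMOS/RTC index+data pair; the values are random on purpose,
 * so expect the guest's CMOS settings to be trashed. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/io.h>    /* ioperm(), inb(), outb() -- x86 glibc only */

int main(void) {
    if (ioperm(0x70, 2, 1) != 0) {           /* request access to ports 0x70-0x71 */
        perror("ioperm");
        return 1;
    }
    for (int i = 0; i < 100000; i++) {
        outb((unsigned char)rand(), 0x70);   /* note glibc order: value, then port */
        outb((unsigned char)rand(), 0x71);
        (void)inb(0x71);                     /* each access is intercepted and emulated by the host */
    }
    return 0;
}
```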
Another thing to consider is race conditions and computational or I/O load. You may have some luck exploring in that direction too.
I'm new to KVM. Can someone explain the process when a guest handles an external interrupt or an emulated device interrupt?
Thanks
Amos
On the x86 architecture (Intel in this case), most interrupts cause a CPU VM exit, which means control of the CPU returns from the guest to the host.
So the process is (a minimal user-space sketch of this loop follows the list):
1. The CPU runs guest OS code in VMX non-root mode.
2. The CPU becomes aware of an incoming interrupt.
3. Control of the CPU returns to the host running in VMX root mode (a VM exit).
4. The host (KVM) handles the interrupt.
5. The host executes VMLAUNCH/VMRESUME to put the CPU back into VMX non-root mode and resume running guest code.
6. Repeat from step 1.
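The hardware exits above are handled inside the kvm kernel module; exits that need device emulation are forwarded to user space (normally QEMU) through the /dev/kvm API. A minimal sketch of that user-space run loop, not real QEMU code, looks like this (x86_64 Linux, error handling omitted):

```c
/* Hedged sketch of a tiny user-space VMM using /dev/kvm: it runs a few bytes of
 * real-mode guest code and observes the VM exits that KVM forwards to user space. */
#include <fcntl.h>
#include <linux/kvm.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

int main(void) {
    int kvm = open("/dev/kvm", O_RDWR);
    int vm  = ioctl(kvm, KVM_CREATE_VM, 0UL);

    /* Guest code: mov $0x3f8,%dx ; mov $'A',%al ; out %al,(%dx) ; hlt
     * The OUT and the HLT each force a VM exit. */
    const uint8_t code[] = { 0xba, 0xf8, 0x03, 0xb0, 'A', 0xee, 0xf4 };
    void *mem = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    memcpy(mem, code, sizeof(code));

    struct kvm_userspace_memory_region region = {
        .slot = 0, .guest_phys_addr = 0x1000,
        .memory_size = 0x1000, .userspace_addr = (uint64_t)(uintptr_t)mem,
    };
    ioctl(vm, KVM_SET_USER_MEMORY_REGION, &region);

    int vcpu = ioctl(vm, KVM_CREATE_VCPU, 0UL);
    struct kvm_run *run = mmap(NULL, ioctl(kvm, KVM_GET_VCPU_MMAP_SIZE, 0UL),
                               PROT_READ | PROT_WRITE, MAP_SHARED, vcpu, 0);

    struct kvm_sregs sregs;
    ioctl(vcpu, KVM_GET_SREGS, &sregs);
    sregs.cs.base = 0; sregs.cs.selector = 0;     /* flat real mode */
    ioctl(vcpu, KVM_SET_SREGS, &sregs);

    struct kvm_regs regs = { .rip = 0x1000, .rflags = 0x2 };
    ioctl(vcpu, KVM_SET_REGS, &regs);

    for (;;) {
        ioctl(vcpu, KVM_RUN, 0UL);        /* guest runs in VMX non-root mode...   */
        switch (run->exit_reason) {       /* ...until something forces an exit    */
        case KVM_EXIT_IO:                 /* port I/O trapped; emulate it here    */
            printf("guest wrote '%c' to port 0x%x\n",
                   *((char *)run + run->io.data_offset), run->io.port);
            break;
        case KVM_EXIT_HLT:                /* HLT ends the demo                    */
            puts("guest halted");
            return 0;
        default:
            printf("unhandled exit reason %u\n", run->exit_reason);
            return 1;
        }
    }
}
```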
If you are new to KVM, you should first read a few papers about how the KVM module works (I assume you know the basic idea of virtualization) and how it uses QEMU to do I/O emulation, etc.
I recommend you read these papers:
kvm: the Linux Virtual Machine Monitor: https://www.kernel.org/doc/mirror/ols2007v1.pdf#page=225
Kernel-based Virtual Machine Technology: http://www.fujitsu.com/downloads/MAG/vol47-3/paper18.pdf
KVM: Kernel-based Virtualization Driver: http://www.linuxinsight.com/files/kvm_whitepaper.pdf
These are papers written by the guys who started KVM (they are short and sweet :) ).
After this you should start looking at the KVM documentation in the kernel source tree, especially the file api.txt; it's very good.
Then I think you can jump into the source code to understand how things actually work.
Cheers
In Linux I am spawning a guest VM and loading another instance of Linux. The VM is spawned through KVM/libvirt/QEMU. The guest VM is seen as a process by the host kernel. Let's say that for some reason the guest VM's QEMU process doesn't get scheduled for some time: how does the kernel in the VM maintain time? Say I have a timer in my application in the guest VM. If the guest VM's QEMU process itself doesn't get scheduled, will it affect my timer expiry?
Some virtualization solutions have the VM clock(s) hooked to some host clock(s), so that the VM clock does not tick independently. In other cases, no such thing may occur (relying on an emulated interrupt clock for example), which then does lead to clock skew. The wall clock skew you can attempt to combat with ntpd, but for things like CLOCK_MONOTONIC, you will probably have to live with it.
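If you want to see whether anything is correcting the guest's wall clock (or just how far it drifts), you can compare CLOCK_REALTIME against CLOCK_MONOTONIC from inside the guest. A quick sketch:

```c
/* Hedged sketch: watching wall-clock vs. monotonic time inside a guest.
 * If the hypervisor, VM tools, or ntpd corrects CLOCK_REALTIME, the two deltas
 * diverge; any skew in CLOCK_MONOTONIC itself has to be lived with, as noted above. */
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static double diff(struct timespec a, struct timespec b) {
    return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
}

int main(void) {
    struct timespec r0, m0, r1, m1;
    clock_gettime(CLOCK_REALTIME,  &r0);
    clock_gettime(CLOCK_MONOTONIC, &m0);
    for (;;) {
        sleep(10);
        clock_gettime(CLOCK_REALTIME,  &r1);
        clock_gettime(CLOCK_MONOTONIC, &m1);
        /* If realtime advanced more or less than monotonic, something corrected the wall clock. */
        printf("realtime - monotonic drift: %+.6f s\n",
               diff(r0, r1) - diff(m0, m1));
    }
}
```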