Linux: How the guest VM's kernel maintains time - linux-kernel

On Linux I am spawning a guest VM that runs another instance of Linux. The VM is spawned through KVM/libvirt/QEMU, so the guest VM is seen as a process by the host kernel. Let's say that for some reason the guest VM's QEMU process doesn't get scheduled for some time: how does the kernel in the VM maintain time? And say I have a timer in an application inside the guest VM. If the guest's QEMU process itself doesn't get scheduled, will that affect my timer expiry?

Some virtualization solutions hook the VM's clock(s) to a host clock, so that the VM clock does not tick independently. In other cases no such coupling exists (the guest relies on an emulated interrupt timer, for example), and that does lead to clock skew. You can attempt to combat wall-clock skew with ntpd, but for things like CLOCK_MONOTONIC you will probably have to live with it.
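To see this from inside a guest, here is a small sketch (my own illustration, not from the original exchange) that compares how CLOCK_REALTIME and CLOCK_MONOTONIC advance across a nominal 10-second sleep. If ntpd or the hypervisor's sync tool is correcting the wall clock while the monotonic clock keeps its accumulated skew, the two deltas will diverge:

    /* clockcheck.c - compare wall-clock vs monotonic progress inside a guest.
       Build: cc -o clockcheck clockcheck.c */
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    static double ts(struct timespec t) { return t.tv_sec + t.tv_nsec / 1e9; }

    int main(void) {
        struct timespec r0, m0, r1, m1;
        clock_gettime(CLOCK_REALTIME, &r0);
        clock_gettime(CLOCK_MONOTONIC, &m0);
        sleep(10);                               /* nominally 10 s of guest time */
        clock_gettime(CLOCK_REALTIME, &r1);
        clock_gettime(CLOCK_MONOTONIC, &m1);
        /* If these deltas differ noticeably, something (ntpd, hypervisor sync)
           is adjusting the wall clock behind the monotonic clock's back. */
        printf("CLOCK_REALTIME  advanced %.6f s\n", ts(r1) - ts(r0));
        printf("CLOCK_MONOTONIC advanced %.6f s\n", ts(m1) - ts(m0));
        return 0;
    }

An application timer (setitimer, timerfd, and so on) is driven by these guest clocks, so if the QEMU process is descheduled long enough for ticks to be lost or caught up in a lump, the expiry can land late in host wall-clock terms.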

Related

Is a VM's speed affected by programs running outside of it?

I'm testing the speed of different sorting methods for a CS class, and although our professor said we didn't need to be terribly precise, he still wants us to be careful not to run apps in the background while testing, not to switch to a different machine partway through, and not to do anything else that would throw off the speed of the sorts.
If I ran the tests in a VM, would the environment outside of the VM affect the speed? Would that help make the tests accurate without having to worry about changes in the apps I have open alongside the VM?
In short, yes.
In most scenarios, hosts share their resources with the VM. If you bog down, freeze, or crash the host, then the VM will be affected.
For those who have more robust servers with better resources, processes running on the host won't affect the VM as much, because with more resources on the host you can assign more RAM and more virtual processors to the VM so it runs smoothly.
For instance, let's say our host has 64 GB of RAM and a processor with 4 cores and 8 threads (such as an Intel® Xeon® E3-1240 processor).
We can tell VirtualBox, VMware or Hyper-V to assign 32 GB of RAM and 4 virtual processors to the VM, essentially giving it half of the host's capacity.
With this in mind, any processes you run on the host will usually be separate from the VM, but if those processes freeze, crash or cause a hard reboot of the host, then the VM will be affected regardless of the RAM or virtual processors assigned.
In enterprise environments, a Hyper-V server should be used only for that purpose, and installing/running a lot of other roles on the host is usually frowned upon (such as DHCP, DNS, a web server (IIS), etc.).
So your professor is right to advise against running processes on the host while testing your VM.

Difference between KVM and LXC

What is the difference between KVM and Linux Containers (LXC)? To me it seems that LXC is also a way of creating multiple VMs within the same kernel, if we use both the "namespaces" and "control groups" features of the kernel.
Text from https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/7-Beta/html/Resource_Management_and_Linux_Containers_Guide/sec-Linux_Containers_Compared_to_KVM_Virtualization.html Copyright © 2014 Red Hat, Inc.:
Linux Containers Compared to KVM Virtualization

The main difference between the KVM virtualization and Linux Containers is that virtual machines require a separate kernel instance to run on, while containers can be deployed from the host operating system. This significantly reduces the complexity of container creation and maintenance. Also, the reduced overhead lets you create a large number of containers with faster startup and shutdown speeds. Both Linux Containers and KVM virtualization have certain advantages and drawbacks that influence the use cases in which these technologies are typically applied:

KVM virtualization

KVM virtualization lets you boot full operating systems of different kinds, even non-Linux systems. However, a complex setup is sometimes needed. Virtual machines are resource-intensive so you can run only a limited number of them on your host machine.

Running separate kernel instances generally means better separation and security. If one of the kernels terminates unexpectedly, it does not disable the whole system. On the other hand, this isolation makes it harder for virtual machines to communicate with the rest of the system, and therefore several interpretation mechanisms must be used.

Guest virtual machine is isolated from host changes, which lets you run different versions of the same application on the host and virtual machine. KVM also provides many useful features such as live migration. For more information on these capabilities, see Red Hat Enterprise Linux 7 Virtualization Deployment and Administration Guide.

Linux Containers:

The current version of Linux Containers is designed primarily to support isolation of one or more applications, with plans to implement full OS containers in the near future. You can create or destroy containers very easily and they are convenient to maintain.

System-wide changes are visible in each container. For example, if you upgrade an application on the host machine, this change will apply to all sandboxes that run instances of this application.

Since containers are lightweight, a large number of them can run simultaneously on a host machine. The theoretical maximum is 6000 containers and 12,000 bind mounts of root file system directories. Also, containers are faster to create and have low startup times.
LXC, or Linux Containers, are lightweight, portable, OS-level virtualization units which share the base operating system's kernel but at the same time act as isolated environments with their own filesystem, processes and TCP/IP stack. They can be compared to Solaris Zones or Jails on FreeBSD. As there is no virtualization overhead, they perform much better than virtual machines.
KVM represents the virtualization capabilities built into the Linux kernel itself. As already stated in the previous answers, it is a type-2 hypervisor, i.e. it does not run on bare metal.
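To make the container side concrete, here is a minimal sketch (my own illustration, not part of the answer above) of the namespace mechanism the question mentions: unshare the UTS namespace and change the hostname; the change is visible only to this process and its children, while the host keeps its own name. It needs root (CAP_SYS_ADMIN):

    /* uts_demo.c - isolate the hostname with a UTS namespace (sketch). */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        if (unshare(CLONE_NEWUTS) != 0) {        /* new UTS namespace for us */
            perror("unshare");
            return 1;
        }
        if (sethostname("container-demo", 14) != 0) {
            perror("sethostname");
            return 1;
        }
        char name[64];
        gethostname(name, sizeof name);
        /* Inside this process: "container-demo"; the host keeps its name. */
        printf("hostname in the new namespace: %s\n", name);
        return 0;
    }

Real LXC combines several such namespaces (PID, mount, network, UTS, IPC) with cgroups for resource limits, all on the one shared host kernel.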
This whitepaper explains the difference between a hypervisor and Linux containers, and also gives some history behind containers:
http://sp.parallels.com/fileadmin/media/hcap/pcs/documents/ParCloudStorage_Mini_WP_EN_042014.pdf
An excerpt from the paper:
a hypervisor works by having the host operating system emulate machine hardware and then bringing up other virtual machines (VMs) as guest operating systems on top of that hardware. This means that the communication between guest and host operating systems must follow a hardware paradigm (anything that can be done in hardware can be done by the host to the guest).

On the other hand, container virtualization (shown in figure 2) is virtualization at the operating system level, instead of the hardware level. So each of the guest operating systems shares the same kernel, and sometimes parts of the operating system, with the host. This enhanced sharing gives containers a great advantage in that they are leaner and smaller than hypervisor guests, simply because they're sharing much more of the pieces with the host. It also gives them the huge advantage that the guest kernel is much more efficient about sharing resources between containers, because it sees the containers as simply resources to be managed.
An example:

Container 1 and Container 2 open the same file; the host kernel opens the file and puts pages from it into the kernel page cache. These pages are then handed out to Container 1 and Container 2 as they are needed, and if both want to read the same position, they both get the same page.

In the case of VM1 and VM2 doing the same thing, the host opens the file (creating pages in the host page cache), but then each of the kernels in VM1 and VM2 does the same thing, meaning that if VM1 and VM2 read the same file, there are now three separate pages (one each in the page caches of the host, VM1 and VM2 kernels), simply because they cannot share the page in the same way a container can. This advanced sharing of containers means that the density (the number of containers or virtual machines you can run on the system) is up to three times higher in the container case than in the hypervisor case.
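As a rough sketch of that page-cache sharing (my own illustration, not from the whitepaper; /etc/os-release is just assumed to be a small, non-empty file present on the host), two unrelated opens and MAP_SHARED mappings of the same file are both served from the single copy in the host's page cache, which is exactly what two separate guest kernels cannot do:

    /* pagecache_share.c - two processes mapping the same file share one
       page-cache copy in the host kernel (sketch). */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static void read_mapped(const char *who) {
        int fd = open("/etc/os-release", O_RDONLY);    /* assumed non-empty */
        char *p = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0);
        /* The first reader faults the page into the host page cache; the
           second process is handed the very same cached page. */
        printf("%s sees first byte: %c\n", who, p[0]);
        munmap(p, 4096);
        close(fd);
    }

    int main(void) {
        read_mapped("parent");
        if (fork() == 0) {
            read_mapped("child");                      /* independent open + mmap */
            _exit(0);
        }
        wait(NULL);
        return 0;
    }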
Summary:
KVM is a hypervisor based on presenting virtual hardware to full guest operating systems. Containers, on the other hand, are based on a shared operating system and are much leaner. But this imposes a limitation on containers: they all use a single shared kernel, and hence you can't run Windows and Linux containers side by side on the same host.

Where is guest ring-3 code run in VM environment?

According to the white paper that VMware has published, binary translation technology is only used for kernel code (ring 0 code); ring 3 code is "directly executed" on the CPU hardware.
As I observed, no matter how many processes are running in the guest OS, there is always only one process visible in the host OS. So I assume all the guest ring 3 code runs in that single host process context (for VMware, it's vmware-vmx.exe).
So my question is: how do you execute so much ring 3 code natively in a single process? Considering that most Windows exe files don't contain relocation information, they cannot be executed at a different base address, and binary translation is not used for ring 3 code.
Thanks.
Let's talk about VMX, which is the design of Intel VT-x.
Intel VT-x introduces two new modes to solve this problem: VMX root mode and VMX non-root mode, which are used by the host and the guest respectively. Both modes have rings 0-3, which means the host and guest do not have to share the same ring levels.
When a hypervisor running in ring 0 of VMX root mode decides to transfer CPU control to a guest, it executes the VMLAUNCH instruction, which switches the CPU from VMX root mode to VMX non-root mode. Guest code, including its ring 3 code, then executes natively in VMX non-root mode. All of this is supported by Intel VT-x; no binary translation or instruction emulation is needed to run the guest.
Of course, VMX non-root mode has less privilege and power. For example, when guest ring 3 code encounters something it cannot handle, such as a physical device access, the CPU automatically detects this kind of restricted operation and transfers control back to the hypervisor in VMX root mode (a VM exit). After the hypervisor finishes the task, it performs a VM entry (VMLAUNCH or VMRESUME) again to resume running the guest.
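As a small aside (my own sketch, not part of the answer above), you can check whether the CPU advertises VT-x at all: CPUID leaf 1 reports VMX support in ECX bit 5. Whether VMX is actually usable also depends on firmware/MSR settings, which this does not check:

    /* vmxcheck.c - report whether CPUID advertises Intel VT-x (VMX). */
    #include <cpuid.h>
    #include <stdio.h>

    int main(void) {
        unsigned int eax, ebx, ecx, edx;
        if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
            puts("CPUID leaf 1 not supported");
            return 1;
        }
        /* ECX bit 5 of leaf 1 is the VMX feature flag. */
        puts((ecx & (1u << 5)) ? "VMX (Intel VT-x) reported by CPUID"
                               : "no VMX support reported");
        return 0;
    }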

How does a guest in KVM handle an external interrupt or an emulated device interrupt?

I'm new to KVM. Can someone explain the process when a guest handles an external interrupt or an emulated device interrupt?
Thanks
Amos
On the x86 architecture (Intel in this case), most interrupts cause a CPU VM exit, which means control of the CPU returns from the guest to the host.
So the process is:
1. The CPU is used by the guest OS in VMX non-root mode.
2. The CPU becomes aware of an incoming interrupt.
3. Control of the CPU returns to the host running in VMX root mode (a VM exit).
4. The host (KVM) handles the interrupt.
5. The host executes a VM entry (VMLAUNCH/VMRESUME) to put the CPU back into VMX non-root mode and resume running guest code.
6. Repeat from step 1.
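For illustration, here is a stripped-down sketch (my own, error handling omitted) of the userspace side of that exit/re-enter loop using the /dev/kvm API. Real external interrupts are normally handled entirely inside the kvm kernel module and re-entered without reaching userspace, but the enter/exit/handle/re-enter structure is the same. The tiny guest program here just writes one byte to a serial port and halts, which produces the two exits the loop handles:

    /* kvm_loop.c - minimal VM enter/exit loop using /dev/kvm (sketch only). */
    #include <fcntl.h>
    #include <linux/kvm.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void) {
        int kvm  = open("/dev/kvm", O_RDWR | O_CLOEXEC);
        int vmfd = ioctl(kvm, KVM_CREATE_VM, 0UL);

        /* Guest code: out 'A' to port 0x3f8, then hlt. */
        const uint8_t code[] = { 0xba, 0xf8, 0x03,   /* mov dx, 0x3f8 */
                                 0xb0, 'A',          /* mov al, 'A'   */
                                 0xee,               /* out dx, al    */
                                 0xf4 };             /* hlt           */
        void *mem = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE,
                         MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        memcpy(mem, code, sizeof(code));

        struct kvm_userspace_memory_region region = {
            .slot = 0, .guest_phys_addr = 0x1000,
            .memory_size = 0x1000, .userspace_addr = (uint64_t)mem,
        };
        ioctl(vmfd, KVM_SET_USER_MEMORY_REGION, &region);

        int vcpufd = ioctl(vmfd, KVM_CREATE_VCPU, 0UL);
        long mmap_size = ioctl(kvm, KVM_GET_VCPU_MMAP_SIZE, NULL);
        struct kvm_run *run = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE,
                                   MAP_SHARED, vcpufd, 0);

        struct kvm_sregs sregs;                      /* flat 16-bit real mode */
        ioctl(vcpufd, KVM_GET_SREGS, &sregs);
        sregs.cs.base = 0; sregs.cs.selector = 0;
        ioctl(vcpufd, KVM_SET_SREGS, &sregs);

        struct kvm_regs regs;
        memset(&regs, 0, sizeof(regs));
        regs.rip = 0x1000;                           /* start of guest code   */
        regs.rflags = 0x2;
        ioctl(vcpufd, KVM_SET_REGS, &regs);

        for (;;) {
            ioctl(vcpufd, KVM_RUN, NULL);            /* VM entry (step 5)     */
            switch (run->exit_reason) {              /* VM exit  (step 3)     */
            case KVM_EXIT_IO:                        /* guest touched a port  */
                putchar(*((char *)run + run->io.data_offset));
                break;                               /* handle, then re-enter */
            case KVM_EXIT_HLT:                       /* guest halted: done    */
                return 0;
            default:
                fprintf(stderr, "unhandled exit %d\n", run->exit_reason);
                return 1;
            }
        }
    }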
If you are new to KVM, you should first read a few papers about how the KVM module works and how it uses QEMU to do I/O emulation, etc. (I assume you know the basic idea of virtualization.)
I recommend you read these papers:
kvm: the Linux Virtual Machine Monitor: https://www.kernel.org/doc/mirror/ols2007v1.pdf#page=225
Kernel-based Virtual Machine Technology: http://www.fujitsu.com/downloads/MAG/vol47-3/paper18.pdf
KVM: Kernel-based Virtualization Driver: http://www.linuxinsight.com/files/kvm_whitepaper.pdf
These are papers written by the people who started KVM (they are short and sweet :) ).
After this you should start looking at the KVM documentation in the kernel source code, especially the file api.txt; it's very good.
Then I think you can jump into the source code to understand how things actually work.
Cheers

How does a Hypervisor synchronize time between Host and Guest VM?

I read this blog:
After conducting a bit of research I discovered that this is caused by the fact that the default Linux kernel runs at a 1000 Hz internal clock frequency and that VMware is unable to deliver the clock interrupts on time without losing them. This means that some clock interrupts are lost without notice to the Linux kernel, which assumes each interrupt marks 1/1000th of a second. So each clock interrupt that gets lost makes the clock fall behind by 1/1000th of a second.
Now, my question is, how does the hypervisor sync time internally if the hypervisor is capable of handling the clock interrupts?
Because, say (a scaled-up example, not real-world numbers), it's 19:10:22 on the host; by the time that value propagates to the guest, it will already be 19:10:23 on the host.
I understand this is a hard problem, but I guess you need to slow down time from the VM's perspective. How is that achieved?
VMWare timekeeping
The hypervisor does not synchronize the clocks. It is software running in the guest VM that keeps the clocks in sync.
From page 15 (with the explanation continuing on through page 19) of your linked PDF:
There are two main options available for guest operating system clock synchronization: VMware Tools periodic clock synchronization or the native synchronization software that you would use with the guest operating system if you were running it directly on physical hardware. Some examples of native synchronization software are Microsoft W32Time for Windows and NTP for Linux.
The VMware Tools clock sync tool just checks the guest clock against the host clock every so often (probably once per minute) and corrects the guest clock if it's wrong. If the guest clock is off by just a little bit the tool will speed up or slow down the guest clock until it has the correct time (using an API like SetSystemTimeAdjustment on Windows or adjtime on Unix). If you're wondering how the tool accesses the host's clock, there's just an API for it that the VMware tool knows how to use.
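For small offsets, a guest-side sync agent does essentially the following (a sketch of my own, with a made-up 250 ms offset): rather than stepping the clock, it asks the kernel to slew it gradually via adjtime(3), so time never jumps backwards for applications:

    /* slew.c - slew the clock by a small offset instead of stepping it.
       Needs privilege (CAP_SYS_TIME). */
    #define _DEFAULT_SOURCE
    #include <stdio.h>
    #include <sys/time.h>

    int main(void) {
        /* Pretend a comparison against the host/reference clock found the
           guest 250 ms behind (illustrative value only). */
        struct timeval offset = { .tv_sec = 0, .tv_usec = 250000 };
        struct timeval pending;

        if (adjtime(&offset, &pending) != 0) {
            perror("adjtime");
            return 1;
        }
        printf("slewing; previously pending adjustment: %ld.%06ld s\n",
               (long)pending.tv_sec, (long)pending.tv_usec);
        return 0;
    }

Offsets too large to slew sensibly are handled by stepping the clock outright (settimeofday and friends), which is why badly descheduled guests can see visible time jumps.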
