NUMA simulation on x86_64 machine - multiprocessing

I wanted to know if there is a way to perform experiments specific to NUMA architecture
on a x86_64 machine, by some sort of simulation tools.
I cam across some resource on creating fake NUMA nodes but couldn't figure out how exactly
to use them. Virtual machines are also fine, if there is a way.
Thanks,

Related

Low hardware simulation for performance profiling

I need to optimize the app I'm working on and I can't get reliable profiling data on my development machine. The app should run on low end ARM hardware on QNX, but from logistic reasons I don't have access to the final hardware for profiling.
I've tried to do profiling on my development machine, but as you can imagine everything is so fast that I can't pin point the slow parts. I've created a Linux virtual machine with reduced memory and CPU cores count, but they are still too fast compared to the final hardware.
Is it possible to reduce the CPU clock speed/ram speed/disk speed in a virtual machine to simulate low performance hardware or is there any other way to get relevant profiling data on my development machine?
Considering the app is processing several gigabytes of data I assume disk access is a major bottleneck and limiting disk speed might help
I can use any (as in most open source and commercially available) tool/approach that runs on Windows/Linux/MacOS on real or virtual machine.
This URL describes how to limit disk bandwidth on VirtualBox images. You could run a Linux VM on Virtualbox and use this method to limit disk access speeds, turn off Disk Caching using suggestions from this answer and profile your application. Alternatively you can download QNX SDP, which comes with the option of a prebuilt x86_64 Virtual Machine image that can be run using VMWare/Virtualbox/qemu
My previous experiences with QNX on armv7 and x86_64 suggest that the devb-sdmmc driver is possibly a bottleneck when working with a lot of big files being read from flash storage. devb-sdmmc and io-blk often require fine tuning of the drivers with proper cache, block, read-ahead size and other parameters helps improve disk access performance.

Difference between KVM and LXC

What is the difference between KVM and Linux Containers (LXCs)? To me it seems, that LXC is also a way of creating multiple VMs within the same kernel if we use both "namespaces" and "control groups" features of kernel.
Text from https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/7-Beta/html/Resource_Management_and_Linux_Containers_Guide/sec-Linux_Containers_Compared_to_KVM_Virtualization.html Copyright © 2014 Red Hat, Inc.:
Linux Containers Compared to KVM Virtualization
The main difference between the KVM virtualization and Linux
Containers is that virtual machines require a separate kernel instance
to run on, while containers can be deployed from the host operating
system. This significantly reduces the complexity of container
creation and maintenance. Also, the reduced overhead lets you create a
large number of containers with faster startup and shutdown speeds.
Both Linux Containers and KVM virtualization have certain advantages
and drawbacks that influence the use cases in which these technologies
are typically applied:
KVM virtualization
KVM virtualization lets you boot full operating systems of different
kinds, even non-Linux systems. However, a complex setup is sometimes
needed. Virtual machines are resource-intensive so you can run only a
limited number of them on your host machine.
Running separate kernel instances generally means better separation
and security. If one of the kernels terminates unexpectedly, it does
not disable the whole system. On the other hand, this isolation makes
it harder for virtual machines to communicate with the rest of the
system, and therefore several interpretation mechanisms must be used.
Guest virtual machine is isolated from host changes, which lets you
run different versions of the same application on the host and virtual
machine. KVM also provides many useful features such as live
migration. For more information on these capabilities, see Red Hat
Enterprise Linux 7 Virtualization Deployment and Administration Guide.
Linux Containers:
The current version of Linux Containers is designed primarily to
support isolation of one or more applications, with plans to implement
full OS containers in the near future. You can create or destroy
containers very easily and they are convenient to maintain.
System-wide changes are visible in each container. For example, if you
upgrade an application on the host machine, this change will apply to
all sandboxes that run instances of this application.
Since containers are lightweight, a large number of them can run
simultaneously on a host machine. The theoretical maximum is 6000
containers and 12,000 bind mounts of root file system directories.
Also, containers are faster to create and have low startup times.
source
LXC, or Linux Containers are the lightweight and portable OS based virtualization units which share the base operating system's kernel, but at same time act as an isolated environments with its own filesystem, processes and TCP/IP stack. They can be compared to Solaris Zones or Jails on FreeBSD. As there is no virtualization overhead they perform much better then virtual machines.
KVM represents the virtualization capabilities built in the own Linux kernel. As already stated in the previous answers, it's the hypervisor of type 2, i.e. it's not running on a bare metal.
This whitepaper gives the difference between the hypervisor and linux containers and also some history behind the containers
http://sp.parallels.com/fileadmin/media/hcap/pcs/documents/ParCloudStorage_Mini_WP_EN_042014.pdf
An excerpt from the paper:
a hypervisor works by having the host operating
system emulate machine
hardware and then bringing up
other virtual machines (VMs)
as guest operating systems on
top of that hardware. This
means that the communication
between guest and host
operating systems must follow
a hardware paradigm (anything
that can be done in hardware
can be done by the host to the
guest).
On the other hand,
container virtualization (shown
in figure 2), is virtualization at
the operating system level,
instead of the hardware level.
So each of the guest operating
systems shares the same kernel, and
sometimes parts of the operating system, with
the host. This enhanced sharing gives
containers a great advantage in that they are
leaner and smaller than hypervisor guests,
simply because they're sharing much more of
the pieces with the host. It also gives them the
huge advantage that the guest kernel is much
more efficient about sharing resources
between containers, because it sees the
containers as simply resources to be
managed.
An example:
Container 1 and Container 2 open the same file, the host kernel opens the file and
puts pages from it into the kernel page cache. These pages are then handed out to
Container 1 and Container 2 as they are needed, and if both want to read the same
position, they both get the same page.
In the case of VM1 and VM2 doing the same thing, the host opens the file (creating
pages in the host page cache) but then each of the kernels in VM1 and VM2 does
the same thing, meaning if VM1 and VM2 read the same file, there are now three
separate pages (one in the page caches of the host, VM1 and VM2 kernels) simply
because they cannot share the page in the same way a container can. This
advanced sharing of containers means that the density (number of containers of
Virtual Machines you can run on the system) is up to three times higher in the
container case as with the Hypervisor case.
Summary:
KVM is a Hypervisor based on emulating virtual hardware. Containers, on the other hand, are based on shared operating systems and is skinnier. But this poses a limitation on the containers that that we are using a single shared kernel and hence cant run Windows and Linux on the same shared hardware

Better performance from windows virtualboxes on ubuntu or from ubuntu virtualboxes on windows

I am planning to develop an automated test solution with multiple windows machines and multiple ubuntu machines that perform related/interdependent tasks. To start the project, I'd like to have one or two windows machines (virtual) and a few ubuntu machines (virtual) running on a single desktop. It seems likely that I will be pushing a single desktop to the limit here so I am trying to guess if I will have better luck if my host OS is ubuntu or if it is Windows 7. I would be able to use the host OS as one of the machines in my environment. The desktop is some sort of above average Dell, but nothing really impressive.
Does anyone have any insight here? I've worked mostly with VMWare in the past and my host was windows along with my VMs.
Note: VirtualBox is a type-2 hypervisor (it runs on the host OS, not on the hardware like a type-1 hypervisor) and tends to offer weaker performance than, for example, Hyper-V, ESX or XEN (type-1 hypervisors).
Therefore, if performance is a considerable concern, you may squeeze more juice out of Win8 or Windows Server 2012 box running, for example, Hyper-V. Further reading on this here and here (YMMV).
How your environment will run when hosted by a Windows vs. a Linux box is, frankly impossible to tell. I suggest you build your VM's and try dual-booting your machine in Windows and Linux and measuring your scenario. Be sure to have enough RAM in the host to allocate enough working RAM to each VM and enough IO throughput that your host doesn't end up dragging the perf of all VM's down if one VM saturates the machine's IO.
One last note of caution though: Don't completely trust fine-grained perf results measured on VM's - even the best hypervisors cannot truly replicate the perf' characteristics of code running on bare-metal. Treat your measurements as a guideline only.
Measure, then measure again. Measure again just to be sure ... and THEN tweak your config and re-measure, measure, measure ;)
My $0.02:
If its VirtualBox you are using I would go with Ubuntu for certain. I have an AMD 945 Phenom with 16GB of Ram with 12.04LTS 64bit . I can usually have 2 VM's running Windows and / or Ubuntu guests and never consume more than 7 GBs of RAM . If your running them in a testing solution you could expect to probably see 12 maybe 13 GBs of RAM, but the CPU might be your problem. My AMD Phenom runs great, but would be maxed out for sure. I use VMWare at work and on my Laptop and would recommend that if you were running a Windows Host. I also have VMWare on my Ubuntu host, but it just does not run as well as it does on Windows., at least for me.

using pci to interconnect motherboards

i've got a few old mobos and i was wondering whether it might be possible to create a pair of pci header cards with interconnect wires and write some software to drive the interconnect cards to allow one of the mobos to access the cpu and ram on the other? i'm sure it would be an arduous undertaking involving writing a device driver for the header boards and then writing an application to make use of the interconnect; perhaps a simple demo demonstrating a thread running on each processor and use of both sets of ram, perhaps creating a mini virtual machine that maps 2x3gb ram on 32 bit mobos to a single 6 gb address space. a microcontroller may be needed on each pci header card to act as a translator.
given that mobos almost always have multiple pci slots, i wonder if these interconnected card pairs could be used to daisychain mobos in a sort of high speed beowulf cluster.
i would use debian for each mobo and probably just an atmega128 for each card with a couple of ribbon cables for interconnecting.
pci is basically just an io bus, so i don't see why this shouldn't be possible (but it would be pretty hard going).
does anyone have any advice or has this sort of thing been done before?
Update:
Thanks Martin. What you say makes sense, and it would also seem that if it were possible that it would have already been done before.
Instead, would it be possible to indirectly control the slave cpu by booting it using a "pretend" bootable storage device (hard disk, usb stick, etc)? As long as the slave mobo thinks its being operated by an operating system on a real device it should work.
This could potentially extend to any interface (sata, ide, usb etc); if you connected two pcs together with a sata/ide/usb cable (plug one end of an ide ribbon into one mobo and the other into another mobo), that would be all the hardware you need. the key is in creating a new driver for that interface on the master pc, so instead of the master pc treating that interface as having a storage device on it, it would be driven as a dummy bootable hard disk for the slave computer. this would still be a pretty difficult job for me because i've never done device drivers before, but at least i wouldn't need a soldering iron (which would be much further beyond me). i might be able to take an open source ide driver for linux, study it, and then butcher it to create something that kindof acts in reverse (instead of getting data off it, an application puts data onto it for the slave machine to access like a hard disk). i could then take a basic linux kernel and try booting the slave computer from an application on the master computer (via the butchered master pc ide/sata/usb device driver). for safety, i would probably try to isolate my customised driver as much as possible by targeting an interface not being used for anything else on the master pc (the master pc might use all sata hard disks with the ide bus normally unused, so if i created a custom ide driver it might cause less problems with the host system - because it is sata driven).
Does anyone know if anything like this (faking a bootable hard disk from another pc) has ever been tried before? It would make a pretty cool hackaday on youtube, but also seriously it could add a new dimension to parallel computing if it proved promising.
The PCI bus can't take over the other CPU.
You could make an interconnect that can transfer data from a program on one machine to another. An ethernet card is the most common implementation but for high performance clusters there are faster direct connections like infiband.
Unfortunately PCI is more difficult to build cards for than the old ISA bus, you need surface mount controller chips and specific track layouts to match the impedance requirements of PCI.
Going faster than a few megabit/s involves understanding things like transmission lines and the characteristics of the connection cable.
would use debian for each mobo and probably just an atmega128 for each card with a couple of ribbon cables for interconnecting.
pci is basically just an io bus, so i don't see why this shouldn't be possible (but it would be pretty hard going).
LOL. PCI is an 32-Bit 33MHz Bus at minimum. So simply out of reach for an ATMEGA.
But your idea of:
a pair of pci header cards with interconnect wires and write some software to drive the interconnect cards to allow one of the mobos to access the cpu and ram on the other [...]
This is cheaply possible with just a pair of PCI Firewire (IEEE 1394) cards (and a Firewire cable). There is even a linux driver that allows remote debugging over firewire.

Linux kernel code segment memory page modifications

I'm trying to implement a "Semantic based memory sharing model" for Xen. As a part of my project, i'm trying to share kernel code pages across VMs. I' ve assumed that the code segments of linux kernels with similar version are 100% identical. But when i carry out some experiments using Virtual Machines running Debian Squeeze, i have found 3 memory pages are different in kernel code segment.
So my question is that, does the linux kernel modifies its code pages at runtime?
Yes, it can - for example, spinlocks can be dynamically patched out of the code if the kernel sees at runtime that it is running on a uniprocessor system. I do not know of an exhaustive list of such cases, you will need to inspect the code.
See the LWN article on SMP Alternatives for more information on one system that does runtime patching within the kernel.

Resources