kvm vs. vmware for kernel debugging / USB driver development - linux-kernel

I'm currently setting up vmware Server 2.0 for kernel debugging with gdb ( see this setup guide ) and someone asked me why not use kvm?
So I ask: kvm vs. vmware for kernel debugging / USB driver development
what are the pros and cons of each?

Driver development? are you working on a driver for a particular piece of hardware? if so, then you probably won't be able to use virtualization, because the virtualized instance won't have access to the new hardware.
For this you will need two machines, one running a remote debugger on the other.
*Edit: * Apparently you're developing a driver for a USB Device? this is one area in particular that a VM actually Can help. These days most VM's have the ability to delegate specific USB devices to a guest OS.
That said, this situation doesn't really offer any benefits over the remote debugger option, because you still need a way to inspect the state of the running or crashed OS, and VM's offer very little assistance in this regard. You might be able to replay saved states from just before a crash.
You might be able to get a bit of traction using UML, which would allow you to do local debugging as on a regular user process, which is a little bit less trouble.

Instead of answering the direct question I'll add another option... Depending on if the kernel in question is a Linux kernel, and what part(s) of it you are working on, you might find that UserModeLinux (included in the 2.6.x source, and available as patch sets for 2.4 and 2.2) may trump both of those options.
As it runs the kernel as a userland process under the host kernel it is easier to attach common debugging tools to. I believe it is very commonly used in the early stages of updates/additions to file-system related code. If you are developing/debugging modules that interact directly with hardware it may be much less use to you though.
Reference links: home,
other

I recently started building GNU Mach/HURD and found the combination of QEmu/KVM to work really quite well.. for the following reasons:
QEmu presents quite a clean environment
Networking has alot of options
I can easily mount the filesystem using a raw device file / loopback
Bottom line is, for kernel work I just want the minimum of functionality to boot and see the result. VMWare is much more for usable virtualization rather than down-and-dirty.
There is however no comparison to booting on a real machine with real hardware. The VM environment can seem like a safety blanket somtimes ... because even my toaster would know what a Realtek RTL8139C was.

If it is a "real hardware" device, of course, vmware will not emulate it, so you won't be able to debug the driver under it (nor will any other virtualisation software, unless you extend one to do so).
Device driver debugging can be done to some extent with a real hardware machine with a normal kernel - although there are obviously things you can't do - like set breakpoints.
It is still possible to attach a debugger to the kernel and inspect stuff. Moreover, traditional printf() debugging is quite possible (printk, anyone), and there are various features in the kernel which make debugging easier. It's possible to build the kernel with various debug options to try to detect pointer problems, memory leaks etc.
By default, the kernel even gives a nice-ish stack trace on the log when it encounters an OOPS or BUG condition (obviously this does not necessarily get written anywhere if the system hangs or crashes). Of course a pointer-out-of-range condition happening inside an interrupt is a recipe for disaster, but you could still get a stack trace on the screen immediately before the panic :)

Related

Driver debugging on a local machine

Why there is no GUI kernel debugger like SoftICE, which allows to debug kernel driver on a local machine nowadays? Why remote machine is obligatory for driver debugging in Windows 7 and higher?
An in-system kernel-mode debugger is an extremely complicated software because it must take care of many low-level kernel resources and operations. If kernel internals are changed in the next OS version, the debugger must be updated accordingly. Debugger developers must work together with kernel developers and have access to kernel source code. All that makes in-system debugger development complex and expensive.
And any kernel-mode debugging on the development system is a dangerous and inconvenient process. Even if no FS damages and/or other data losses occur due to a BSOD, booting a development system, starting all required applications to re-create convenient development environment is much longer process than rebooting a dumb target machine (hardware or virtual).
When hardware computers were expensive, there was no efficient remote debugging interface and there were no efficient virtual machine solutions, SoftICE was an acceptable tool. But in the last 15 years, remote kernel debugging in Windows had been greatly improved, so using WinDbg is much more convenient that using SoftICE, even though WinDbg has many flaws and bugs.

How does the ARM Linux kernel map console output to a hardware device on boot?

I'm currently struggling to determine how I can get an emulated environment via QEMU to correctly display output on the command line. I have an environment that displays perfectly well using the virt reference board, a cortex-a9CPU, and the 4.1 Linux kernel cross-compiled for ARM. However, if I swap out the 4.1 kernel for 2.6 or 3.1, suddenly I can no longer see console output.
While solving this issue is my main goal, I feel like I lack a critical understanding of how Linux and the hardware initially integrate before userspace configurations via boot scripts and whatnot have a chance to execute. I am aware of the device tree, and have a loose understanding of how it works. But the issue I ran into where a different kernel version broke console availability entirely confounds me. Can someone explain how Linux initially maps console output to a hardware device on the ARM architecture?
Thank you!
The answer depends quite a bit on which kernel version, what config options are set, what hardware, and also possibly on kernel command line arguments.
For modern kernels, the answer is that it looks in the device tree blob it is passed for descriptions of devices, some of which will be serial ports, and it initializes those. The kernel config or command line will specify which of those is to be used for the console. For earlier kernels, especially if you go all the way back to 2.6, use of device tree was less universal, and for some hardware the boot loader simply said "this is a versatile express board" (for instance) and the kernel had compiled-in data structures to tell it where the devices were for each board that it supported. As the transition to device tree progressed, boards were converted one by one, and sometimes a few devices at a time, so what exactly the situation was for any specific kernel version depends on which board you're using.
The other thing that I rather suspect you're running into is that if the kernel crashes early in bootup (ie before it finds the serial port at all) then it will never output anything. So if the kernel is just too early to support the "virt" board properly at all, or if your kernel config is missing something important, then the chances are good that it crashes in early boot without being able to print you a useful message. (Sometimes "earlycon" or "earlyprintk" kernel arguments can assist here, but not always.)

Immidiate and latent effects of modifying my own kernel binary at runtime?

I'm more of a web-developer and database guy, but severely inconvenient performance issues relating to kernel_task and temperature on my personal machine have made me interested in digging into the details of my Mac OS (I notices some processes would trigger long-lasting spikes in kernel-task, despite consistently low CPU temperature and newly re-imaged machine).
I am a root user on my own OSX machine. I can read /System/Library/Kernels/kernel. My understanding is this is "Mach/XNU" Kernel of this machine (although I don't know a lot about those, but I'm surprised that it's only 13Mb).
What happens if I modify or delete /System/Library/Kernels/kernel?
I imagine since it's at run-time, things might be okay until I try to reboot. If this is the case, would carefully modifying this file change the behavior of my OS, only effective on reboot, presuming it didn't cause a kernel panic? (is kernel-panic only a linux thing?)
What happens if I modify or delete /System/Library/Kernels/kernel?
First off, you'll need to disable SIP (system integrity protection) in order to be able to modify or edit this file, as it's protected even from the root user by default for security reasons.
If you delete it, your system will no longer boot. If you replace it with a different xnu kernel, that kernel will in theory boot next time, assuming it's sufficiently matched to both the installed device drivers and other kexts, and the OS userland.
Note that you don't need to delete/replace the kernel file to boot a different one, you can have more than one installed at a time. For details, see the documentation that comes with Apple's Kernel Debug Kits (KDKs) which you can download from the Apple Developer Downloads Area.
I imagine since it's at run-time, things might be okay until I try to reboot.
Yes, the kernel is loaded into memory by the bootloader early on during the boot process; the file isn't used past that, except for producing prelinked kernels when your device drivers change.
Finally, I feel like I should explain a little about what you actually seem to be trying to diagnose/fix:
but severely inconvenient performance issues relating to kernel_task and temperature on my personal machine have made me interested in digging into the details of my Mac OS
kernel_task runs more code than just the kernel core itself. Specifically, any kexts that are loaded (see kextstat command) - and there are a lot of those on a modern macOS system - are loaded into kernel space, meaning they are counted under kernel_task.
Long-running spikes of kernel CPU usage sound like they might be caused by file system self-maintenance, or volume encryption/decryption activity. They are almost certainly not basic programming errors in the xnu kernel itself. (Although I suppose stupid mistakes are easy to make.)
Another possible culprits are device drivers; especially GPU drivers are incredibly complex pieces of software, and of course are busy even if your system is seemingly idle.
The first step to dealing with this problem - if there indeed is one - would be to find out what the kernel is actually doing with those CPU cycles. So for that you'd want to do some profiling and/or tracing. Doing this on the running kernel most likely again requires SIP to be disabled. The Instruments.app that ships with Xcode is able to profile processes; I'm not sure if it's still possible to profile kernel_task with it, I think it at least used to be possible in earlier versions. Another possible option is DTrace. (there are entire books written on this topic)

windows driver development

I am new to windows driver development, so please bear with me if my question is being too stupid. Well, I am not sure why, as MSDN suggested and also the way I perceived, the host computer, e.g developing the driver, and the target computer, e.g debugging the driver, need to be two separate ones. why such separation? I did try to merge those two by deploying and debugging a driver on the host computer, in which I am developing a driver, and it seemed work with no objection from windows. Thanks.
PS. Source like this http://msdn.microsoft.com/en-us/library/windows/hardware/hh698272(v=vs.85).aspx got me think so.
Practically, when you are developing and testing a driver, in many situation you will get system crash (BSOD) and your system may not be bootable. In such situations your development + debugger environment is also gone/in-accessible.
Two separate machines are required for kernel debugging. You cannot debug self by obvious reasons (a debugger and a debuggee are in the same kernel and a deadlock appears). Of course, the target machine can be a virtual one.
When we develop a driver and test it the system will crash and a blue screen (called BSOD - blue screen of death)will show up. This is not the case like developing a User mode application and it crashed due to a memory error. Your driver will be running as a kernel mode application , If it crashes due to any illegal memory operation then the whole system is gone. It is not a simple issue to resolve , You need to log into safe mode and remove the driver from your system to recover it.
Due to this it is preferred to use a target machine mostly a VM on which the driver is installed and a host machine there we will be using a debugger to debug the driver.

Do I need two machines to develop IOKit Mac drivers?

I'm building an IOKit CFPlugin driver for OS X. I'll be working with network data coming in that will be translated to MIDI data. No hardware is involved other than the built-in Airport. I have experience with drivers on Windows machines and firmware but this is my first dip into doing it on the Mac. So far things are going pretty well, but the Apple documentation sez: "For safety reasons, you should not load your driver on your development machine."
I only have one Mac. I really don't want two Macs- sorry, Apple. Should I take this warning seriously? Are there things I need to know?
Thanks, Tom Jeffries
You could also consider running OS X inside a VM as your testbed. It would surely be much more convenient that having a separate boot volume.
The warning is rather poorly worded; what you should consider doing is using a separate boot volume (partition) for trying out your driver, since it's possible to arbitrarily hose your system with your driver.
If you're doing kernel development on any OS that isn't isolated from your main system (via a VM, alternate boot disk, etc.), you're crazy!
What may be a bigger issue is that you can't do any kernel debugging, because the only option for that is to use GDB on a remote OS X system. For this, you may want to consider running OS X in virtualization.
you DEFINITELY want to have some way to recover a fubar kext installation: a bootable external drive or something you can quickly restore from-- this is the main reason for Apple's warning against running in-development-kernel-extensions on your production machine.
Nicholas is right that in order to debug using gdb (the only way in kernel space) you do need two machines. I've never tried using a VM as Coxy suggests: but I guess it's feasible (assuming that you run your kext on the virtual machine and use the real host machine to run gdb).
My preferred method for tracing and debugging in the kernel is kprintf() routed to firewire (aka firewire kprintf (man fwkpfv) ). for this you do need two machines with firewire ports.
finally, being an old computer musician myself, I wonder why you want to program a MIDI synthesizer (or transformer) on the network stack level. my guess is that you would have a much more gratifying experience working in userland (where you can use floating point math...)
if you need some hints or tips, feel free to get in touch...
|K<
from the ADC Kernel Programming Guide
Kernel programming is a black art that
should be avoided if at all possible.
Fortunately, kernel programming is
usually unnecessary. You can write
most software entirely in user space.
Even most device drivers (FireWire and
USB, for example) can be written as
applications, rather than as kernel
code. A few low-level drivers must be
resident in the kernel's address
space, however, and this document
might be marginally useful if you are
writing drivers that fall into this
category.

Resources