Debugging an Operating System - debugging

I was going through some general stuff about operating systems and struck on a question. How will a developer debug when developing an operating system i.e. debug the OS itself? What tools are available to debug for the OS developer?

Debugging a kernel is hard, because you probably can't rely on the crashing machine to communicate what's going on. Furthermore, the codes which are wrong are probably in scary places like interrupt handlers.
There are four primary methods of debugging an operating system of which I'm aware:
Sanity checks, together with output to the screen.
Kernel panics on Linux (known as "Oops"es) are a great example of this. The Linux folks wrote a function that would print out what they could find out (including a stack trace) and then stop everything.
Even warnings are useful. Linux has guards set up for situations where you might accidentally go to sleep in an interrupt handler. The mutex_lock function, for instance, will check (in might_sleep) whether you're in an unsafe context and print a stack trace if you are.
Debuggers
Traditionally, under debugging, everything a computer does is output over a serial line to a stable test machine. With the advent of virtual machines, you can now wire one VM's execution serial line to another program on the same physical machine, which is super convenient. Naturally, however, this requires that your operating system publish what it is doing and wait for a debugger connection. KGDB (Linux) and WinDBG (Windows) are some such in-OS debuggers. VMWare supports this story explicitly.
More recently the VM developers out there have figured out how to debug a kernel without either a serial line or kernel extensions. VMWare has implemented this in their recent stuff.
The problem with debugging in an operating system is (in my mind) related to the Uncertainty principle. Interrupts (where most of your hard errors are sure to be) are asynchronous, frequent and nondeterministic. If your bug relates to the overlapping of two interrupts in a particular way, you will not expose it with a debugger; the bug probably won't even happen. That said, it might, and then a debugger might be useful.
Deterministic Replay
When you get a bug that only seems to appear in production, you wish you could record what happened and replay it, like a security camera. Thanks to a professor I knew at Illinois, you can now do this in a VMWare virtual machine. VMWare and related folks describe it all better than I can, and they provide what looks like good documentation.
Deterministic replay is brand new on the scene, so thus far I'm unaware of any particularly idiomatic uses. They say it should be particularly useful for security bugs, too.
Moving everything to User Space.
In the end, things are still more brittle in the kernel, so there's a tremendous development advantage to following the Nucleus (or Microkernel) design, where you shave the kernel-mode components to their bare minimum. For everything else, you can use the myriad of user-space dev tools out there, and you'll be much happier. FUSE, a user-space filesystem extension, is the canonical example of this.
I like this last idea, because it's like you wrote the program to be writeable. Cyclic, no?

In a bootstrap scenario (OS from scratch), you'd probably have to introduce remote debugging capabilities (memory dumping, logging, etc.) in the OS kernel early on, and use a separate machine. Or you could use a virtual machine/hypervisor.
Windows CE has a component called KITL - Kernel Independent Transport Layer. I guess the title speaks for itslf.

You can use a VM: eg. debug ring0 code with bochs/gdb
or Debugging NetBSD kernel with qemu
or a serial line with something like KDB.

printf logging
attach to process
serious unit tests
etc..

Remote debugging with kernel debuggers, which can also be done via virtualization.

Debugging an operating system is not for the faint of heart. Because the kernel is being debugged, your options would be quite limited. Copious amount of printf statements is one trick, and furthermore, it depends on really what 'operating system' is being debugged, we could be talking about
Filesystem
Drivers
Memory management
Raw Disk input/output
Screen input/output
Kernel
Again, it is a widely varying exercise as in the above, they all interact with one another. Even more complicated is the fact, supposing you were to debug the kernel, how would you do it if the runtime environment is not properly set (by that, I am talking about the kernel's responsibility for loading binary executables).
Some kernels may (not all of them have them) incorporate a simple debug monitor, in fact, if I rightly recall, in the book titled 'Developing your own 32bit Operating System' by Richard A Burgess, Sams publishing, he incorporated a debug monitor which displays various states of the CPU, registers and so on.
Again, take into account of the fact that the binary executables require a certain loading mechanism, for example a gdb equivalent, if the environment for loading binaries are not set up, then your options are quite limited.
By using copious amount of printf statements to display errors, logs etc to a separate terminal or to a file is the best line of debugging, it does sound a nightmare but it would be worth the effort to do so.
Hope this helps,
Best regards,
Tom.

Related

Tool displaying processor or core assignment for process in Windows?

In Windows 7, is there a tool that will allow me to see the cpu/core to which a process has been assigned for a recent timeslice under windows? I need to demonstrate that a particular application's process's threads can, and do, land on different processors/cores in a multi-processor/core environment with default scheduling behavior.
Intel VTune for Windows may be what you're looking for.
As for the point you're trying to demonstrate, the answer is almost certainly yes, but it will depend on what else is happening in the system. You can of course take control of which core(s) a thread runs on using the core affinity API routines, but you have to work really hard to beat the OSes own judgement.
Under Solaris there's DTrace, and Linux has a clone called FTrace. I've used FTrace and it does exactly what you want. It might be worth Googling around for an DTrace for Windows. The Windows Performance Toolkit might be just that.

Stepping through a TCP/IP stack

I was working as a QA engineer for a proprietary embedded operating system. They built their own ATN stack and stepping though it with a debugger was the most eye opening experience I have had with networking. Watching each layer of the stack build their part of the packet was amazing. Then finally being able to see the built packet on the wire had more meaning.
As an educator I would like share this experience with others. Does anyone know of a straight forward method stepping though a TCP/IP stack? Ideally I would like something easier than debugging a *BSD or Linux kernel, although if this is the only option then some tips and tricks for this process would be nice. A reference stack written in C/C++ that could be run in user mode with Visual Studio or Eclipse would be ideal.
This all depends on what you want to focus on. From your question, the thing you are most interested in is the data flow throughout the different layers (user-space stream -> voltage on the cable).
For this, I propose you use http://www.csse.uwa.edu.au/cnet/, which is a full network simulator. It allows you to step through all levels of the stack.
Real systems will always have a clear distinction between Layer3, Layer2 and Layer1 (Ethernet and CRC-checking firmware on chip, hardware MAC). You will have trouble getting into the OS and some implementation details will be messy and confusing for students. For Linux, you'll have to explain kernel infrastructure to make sense of the TCP/IP stack design.
If you are only interested in the TCP/IP part, I recommend you use an embedded TCP/IP stack like http://www.sics.se/~adam/lwip/ . You can incorporate this into a simple user-space program and fully construct the TCP/IP packet.
Please note that there are a lot of network communication aspects that you cannot address while stepping through the TCP/IP stack. There is still a MAC chip in between which regulates medium access, collisions etc. Below that, there is a PHY chip which translates everything into electric/optical signals, and there is even a protocol which handles communication between MAC and PHY. Also, you are not seeing all aspects related to queueing, concurrency, OS resource allocation ea. A full picture should include all of these aspects, which can only be seen in a network simulator.
I would run Minix in a virtual machine and debug that. It is perfect for this.
Minix is a full OS with TCP/IP stack so you have the code you need. However, unlike Linux/BSD its roots and design goal are to be a teaching tool, so it eschews a certain level of complexity in favor of being clear. In fact, this is the OS Linus Torvalds started hacking on when he started out with Linux :-)
You can run minix in an VM such as VirtualBox or VMware and debug it. There are instruction on the web site: http://www.minix3.org/
I personally learned TCP/IP stack using DOS and SoftICE (oops, leaked that I'm an old guy). Using DOS on a virtual machine and debug through a TCP/IP driver will be much simpler since your goal is to educate how TCP/IP works. Modern OS does a lot of optimization on network I/O and it's not easy to debug through.
http://www.crynwr.com/ has a bunch of open source packet drivers. Debugging with source code shall be a bit easier.
This not exactly what you are looking for but I hope this helps
1995 - TCP/IP Illustrated, Volume 2: The Implementation (with Gary R. Wright) - ISBN 0-201-63354-X
Just walk through the code side by side. Near stepping through experience. Mr Steven's explains key variables too. Just awesome. Note: Code may have changed since the book but still awesome.
Probably lwIP project is what you are looking for because it can be run without an operating system.
As for debugging Linux kernel, there is not very simple, but well-known way to do it. Use KGDB. Install debugging version of Linux kernel on virtual machine or on separate box. And remotely connect GDB to this machine. Probably you would like to use some GDB frontend instead of text-only interface. If you need more details on kernel debugging from more competent people, just add "linux" tag to the question.
I actually wrote a small subset of a TCP/IP stack in a 8051 once, it was a very enlightening experience.
I believe that the best way to learn something is by doing it. Once you finished your task, go and get feedback with other developers and compare your implementation with other existing ones.
My opinion might be biased here, but I think that doing this in a embedded platform is the best way to go. What you are trying to do is very low level, and a PC will just add more complexity into the problem. A embedded chip has no operational system to get in your way. Besides that, it is very satisfying to see a simple 8051 respond to ping requests and telnet calls.
They key is to start small, don't try to create a full TCP/IP stack all at once. Write the code to handle the MAC first, then IP, Ping, UDP and finally TCP.
I don't think that studying an existing implementation is a good ideia. TCP/IP implementations tend to be bloated with code that is unrelated with your goal.
I work in the TCP/IP industry. In BSD and variants, the function tcp_input() is an ideal starting point to explore the innards of TCP. Setting a breakpoint on this function and stepping through it on a live system can give a lot of enlightenment. If that is hard, you can simply browse through the source to get a broad feel of it:
http://fxr.watson.org/fxr/source/netinet/tcp_input.c
It will take time, many weeks at least, to understand the big picture. Quite exhilarating. :-)
You can run the NetBSD IP stack in userspace in Linux or other OS, with gdb or whatever see http://www.netbsd.org/docs/rump/ and https://github.com/anttikantee/buildrump.sh and then eg feed it to a tun/tap device so you can see whats on the wire.

How are operating systems debugged?

How are operating systems typically debugged? They cannot be stepped through with a debugger like simple console programs, and the build times are too large to repeatedly make small changes and recompile the whole thing.
They aren't debugged as a multi-gigabyte programs! :)
If you mean the individual user-mode components, they can mainly be debugged just like normal programs and libraries (because they are normal programs/libraries!).
For kernel-mode components, though, each OS has its own mechanism; here is some information regarding the way that we do kernel debugging in Windows. It can be done using the help of another machine connected to the machine you're debugging, via a serial port or something. I'm not familiar with the process itself, but that's the gist of how they work. (You need to set some boot loader options so that the system is ready for the debugger to be connected as early as possible.)
It depends on which part of the operating system you're talking about. When I worked at MSFT, I worked on the IE team. We debugged IE and the shell (Windows Explorer) in Visual Studio and stepped through them line by line all day long. Though, sometimes, it's easier to debug using a command line tool such as NTSD.
If, however, you want to debug anything in Kernel land such as the OS kernel or device drivers, which I suspect is really what you're asking, then you must use the Kernel debugger. For Windows that is a command line tool called kd, and generally you run the debugger on one machine and remotely debug the target.
There are a whole set of techniques throughout history from flashing lights on the console, to the use of hardware devices like an ICE, to more modern techniques utilizing fairly standard debuggers. One technique that is more common among OS developers then application developers is the analysis of a core dump. Look at something like mdb on solaris for ideas about how Solaris kernel developers do some of their debugging. Also tracing technologies are used. Anywhere from fairly straightforward logging packages to more modern techniques like dtrace.
Also note that the techniques used depend on the layer of software. Initial boot tends to be a fairly hard place to get your fingers into. But after that the environment of modern operation systems looks more and more like the application setting you are use to. In the end, it is all code :)

LPT control on Windows

I am into new project, which should use microcontroller. The easiest way to program it is using parallel port. But, there are few things I hope you can help me with. Oh, and the preferred language is C and platform Windows.
So, I studied LPT ports and Windows a bit, and from what I learned the most important is: Since Windows NT based systems, you cannot use instructions for direct port manipulation. This should be, because now programs are run in different privilege mode, which doesn't support the kind of instructions that are used by outport() function.
But at this point, I don´t understand a few things. First, I thought that Windows actually used privilege levels since first protected mode version, but that's the wrong assumption.
But more importantly, I thought that Windows has included functions for just about any hardware communication. I mean, anything you do in Windows these days, you just call windows functions which further call kernel services. I assumed that outport() doesn´t use any Windows function, and just makes the communication itself, which is prohibited now. But I am literally shocked that there is no system function to control parallel ports in modern Windows systems. At least that's what I read.
But even if I could get the control of parallel port, there comes my second problem.
For programming the controller, I need to follow special protocol, especially timing. But since Windows is multitasked, I worry about what if Scheduler switches to another app, and therefore when is the right time to switch signals on LPT, my program just will not be able to run.
Oh, by the way, I know I could use any 3rd-party apps, but I just like to be able to do it myself, or at least before I use some 3rd-party app, I want to know how it works. And yes, you can program some microcontrollers just by parallel port with some resistors, I know this for sure.
Thanks.
For windows you need to install a DLL which contains a driver to run at elevated privileges to get access to the HW ports.
You can find such a library at :
http://logix4u.net/Legacy_Ports/Parallel_Port/Inpout32.dll_for_Windows_98/2000/NT/XP.html
There are also some links to sample code.
I do not know which uController you are using, but I programmed in the past a variety of them and never had issues with timings, well for programming at least. The programming protocols are usually robust enough to deal with the jitter caused by multitasking. Just keep your clock edges and signa edges well separated and it should go fine.

Quick CPU ring mode protection question

I am very curious in messing up with HW. But my top level "messing" so far was linked or inline assembler in C program. If my understanding of CPU and ring mode is right, I cannot directly from user mode app access some low level CPU features, like disabling interrupts, or changing protected mode segments, so I must use system calls to do everything I want.
But, if I am right, drivers can run in ring mode 0. I actually don´t know much about drivers, but this is what I ask for. I just want to know, is learning how to write your own drivers and than call them the way I should go, to do what I wrote?
I know I could write whole new OS (at least to some point), but what I exactly want to do is acessing some low level features of HW from standart windows application. So, is driver the way to go?
Short answer: yes.
Long answer: Managing access to low-level hardware features is exactly the job of the OS kernel and if you only want access to a single feature there's no need to start your own OS from scratch. Most modern OSes, such as WIndows, Linux, or the BSDs, allow you to add code to the kernel through kernel modules.
When writing a kernel module (or device driver), you write code that is going to be executed inside the OS kernel and will thus be running in CPU ring 0. Great power comes with great responsibility, which in this case means that you should really know what you're doing as there will be no pre-configured OS interface to prevent you from doing the wrong things. You should therefore study the manuals of your hardware (e.g., Intel's x86 software developer's manuals, device specs, ...) as well as standard operating systems development literature (where you're also going to find plenty on the web -- OSDev, OSDever, OSR, Linux Device Drivers).
If you want to play with HW write some programs for 16-bit real-mode (or even with your own transition to protected-mode). There you have to deal with ASM, BIOS interrupts, segments, video memory and a lot of other low-level stuff.

Resources