I have a buggy kernel module which I am trying to fix. Basically, when this module is running, it causes other tasks to hang for more than 120 seconds. Since almost all the hung tasks are waiting for either mm->mmap_sem or some file-system lock (inode->i_mutex), I suspect the problem is that this module does not grab mmap_sem and the file-system-level locks (like inode->i_mutex) in the proper order, which could cause a deadlock. My module does not try to grab those locks directly, though, so I assume some function it calls grabs them. Now I am trying to figure out which function calls in my module are causing the problem.
However, I am having a hard time debugging it for the following reasons:
I don't know exactly which lock the hung task is trying to grab. I have the call trace of the hung task and know at what point it hangs. The kernel also gives me some information like:
"1 lock held by automount/3115:
0: (&type->i_mutex_dir_key#2){--..}, at: [] real_lookup+0x24/0xc5".
However, I want to know exactly which lock a task holds, and exactly which lock it is trying to acquire, in order to figure out the problem. Since the kernel doesn't provide the arguments of the function calls along with the call trace, I find this information difficult to obtain.
I am using gdb and VMware to debug this, which allows me to set breakpoints, step into a function, and so on. However, since which task will hang, and at what point, is non-deterministic, I don't really know where to set breakpoints and inspect. It would be great if I could somehow "attach" to the task which the kernel reported as blocked for more than 120 seconds and get some information about it.
So my questions are as follows:
Where can I get, along with the call trace, the arguments of the functions in the call trace, so I can figure out exactly which lock a task is trying to grab?
Is it possible to use gdb to somehow "attach" to a hung task inside the kernel? If not, is there some way for me to at least examine the data structure which represents that task? I am also having a hard time examining global data structures in the kernel: gdb always complains "Cannot access memory at address 0x3200" or something similar.
It would also be very helpful if I could print out, for every task in the kernel, which locks it is currently holding. Is there a way to do that?
Thank you very much!
Not answering your question directly, but hopefully this is more helpful: the Linux kernel has a built-in heavy-duty lock validator called lockdep. Turn it on and let it run. If you have a lock-ordering problem, it is likely to catch it and give you a detailed report.
See: http://www.mjmwired.net/kernel/Documentation/lockdep-design.txt
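For reference, lockdep is a compile-time feature. A minimal config sketch (found under "Kernel hacking" in menuconfig; exact option names may vary by kernel version):

```
CONFIG_PROVE_LOCKING=y      # the lockdep deadlock / lock-inversion detector
CONFIG_DEBUG_LOCK_ALLOC=y   # catch locks that are freed/reused while held
CONFIG_DEBUG_LOCKDEP=y      # optional self-checks for lockdep itself
CONFIG_LOCK_STAT=y          # optional: lock contention statistics
```

With lockdep compiled in (and assuming SysRq is enabled), `echo d > /proc/sysrq-trigger` dumps all locks currently held in the system together with their holders, which also addresses the "which locks does each task hold" part of the question.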
The kernel feature lockdep can help you in this regard. Check out my post on how to use it in your kernel: How to use lockdep feature in linux kernel for deadlock detection
Let me try.
1) Try KGDB
2) You mean a hung process?
http://www.ibm.com/developerworks/aix/library/au-unix-strace.html
3) Try the lsof package maybe.
I ran several processes on my work desktop for several days. This morning all these processes pretty much stopped working. After some debugging I found out that after executing time.Sleep, the execution flow would just get stuck there and never wake up. So while everyone on my team was freaking out, I just restarted my Windows 10 PC, and people thought it was a desperation reboot. I guess, luckily, the issue went away after the restart (shrugs).
I wonder if anyone has experienced this before or has any idea what may be the cause? I read in another post that time.Sleep basically schedules when execution resumes by computing the absolute time in the OS, but AFAIK the date/time settings never changed.
I realize this may be difficult to diagnose but I've never encountered this problem on non-Windows machines. Needless to say I hate Windows and am biased towards Unix, but I promise to give Windows a chance if someone can give me some reasonable explanations on this bug.
(This is not going to be an answer — for the reasons below — but rather a couple of hints.)
The question lacks crucial context.
Was the desktop put to sleep (or hibernated) and woken up — so you expected the processes to continue from where they left off?
Are you sure the relevant goroutines were stuck in time.Sleep and not something else?
The last question is of the most interest but it's unanswerable as is.
To make it so, you'd need to arm your long-running processes with some means of debugging.
The simplest approach which works in a crude way but without much fuss is to kill your process in an interesting way: send it the SIGQUIT signal and the Go runtime will crash the process — dumping the stacktraces of the active goroutines to the process' stderr.
(Of course, this implies you did not trap this signal in your process' code.)
Windows does not have signals, but Ctrl-Break should work like Ctrl-\ in a Unix terminal where it typically sends SIGQUIT to the foreground process.
This approach could be augmented by tweaking the GOTRACEBACK environment variable — to cite the docs:
The GOTRACEBACK variable controls the amount of output generated when
a Go program fails due to an unrecovered panic or an unexpected
runtime condition. By default, a failure prints a stack trace for the
current goroutine, eliding functions internal to the run-time system,
and then exits with exit code 2. The failure prints stack traces for
all goroutines if there is no current goroutine or the failure is
internal to the run-time. GOTRACEBACK=none omits the goroutine stack
traces entirely. GOTRACEBACK=single (the default) behaves as described
above. GOTRACEBACK=all adds stack traces for all user-created
goroutines. GOTRACEBACK=system is like “all” but adds stack frames for
run-time functions and shows goroutines created internally by the
run-time. GOTRACEBACK=crash is like “system” but crashes in an
operating system-specific manner instead of exiting. For example, on
Unix systems, the crash raises SIGABRT to trigger a core dump. For
historical reasons, the GOTRACEBACK settings 0, 1, and 2 are synonyms
for none, all, and system, respectively. The runtime/debug package's
SetTraceback function allows increasing the amount of output at run
time, but it cannot reduce the amount below that specified by the
environment variable. See
https://golang.org/pkg/runtime/debug/#SetTraceback.
So, if you run your process with GOTRACEBACK=crash, you would be able to collect not only the stack traces but also a dump file (on typical Linux-based systems these days this also requires running under ulimit -c unlimited).
Unfortunately, on Windows it's almost there but not yet; still something to keep an eye on.
A more hard-core approach is to make your process dump the stacks of its goroutines on demand, via some custom-implemented trigger; https://golang.org/pkg/runtime/ and https://golang.org/pkg/runtime/debug contain all the stuff required to do that.
You might look at how https://golang.org/pkg/net/http/pprof/ is implemented and/or just use it right away.
I'm doing this as a personal project: I want to make a visualizer for this data, but the first step is getting the data.
My current plan is to:
1) make my program debug the target process and single-step through it,
2) at each step, record the EIP from every thread's context within the target process,
3) construct the memory address the instruction uses from the context, and store it.
Is there an easier or built in way to do this?
Have a look at Intel PIN for dynamic binary instrumentation, i.e. running a hook for every load/store instruction.
Instead of actually single-stepping in a debugger (extremely slow), it does binary-to-binary JIT to add calls to your hooks.
https://software.intel.com/sites/landingpage/pintool/docs/81205/Pin/html/index.html
Honestly, the best way to do this is probably instrumentation, like Peter suggested, depending on your goals. Have you ever run a script that stepped through code in a debugger? Even automated, it's incredibly slow.

The only other alternative I see is page faults, which would also be incredibly slow but should still be faster than single-stepping. Basically, you make every page not in the currently executing section inaccessible. Any read/write access outside of the executing code will then trigger an exception, where you can log details and handle it. Of course, this has a lot of flaws: you can't detect reads/writes within the current page, it's still going to be slow, and it can get complicated (handling execution transfers across pages, multiple threads, etc.).

The final possible solution would be a timer interrupt that checks the accessed/dirty bits for each page. This would be incredibly fast, and although it would provide no specific addresses, it would give you an aggregate of pages written to and read from. I'm not entirely sure off the top of my head whether Windows already exposes that information, and I'm also not sure there is a reliable way to guarantee your timer fires before the kernel clears those bits.
Is there a straightforward way, with ready-at-hand tooling, to suspend a traced process' execution when certain syscalls are called with specific parameters? Specifically, I want to suspend program execution whenever
stat("/${SOME_PATH}")
or
readlink("/${SOME_PATH}")
are called. I aim to then attach a debugger, so that I can identify which of the hundreds of shared objects that are linked into the process is trying to access that specific path.
strace shows me the syscalls alright, and gdb does the rest. The question is how to bring them together. This could surely be solved with custom glue scripting, but I'd rather use a clean solution.
The problem at hand is a 3rd-party tool suite which is available only in binary form and whose distribution package completely violates the LSB/FHS (and good manners), placing shared objects all over the filesystem, some of which are loaded from unconfigurable paths. I'd like to identify which modules of the tool suite do this, and either patch the binaries or file an issue with the vendor.
This is the approach I use for similar situations when debugging on Windows. I think it should work for you too, although I have not tried it with gdb on Linux.
Once you have attached to your process, set a breakpoint on the system call in question, stat in your case.
Then add a condition based on esp to the breakpoint. Say you want to check for stat("/$te"): the value at [esp+4] is a pointer to the path string, here "/$te". Note that a raw comparison such as *(uint32_t*)($esp+4) == "/$te" compares pointer values rather than string contents, so it will not match; instead, use a string comparison. It seems you can call strcmp() in the breakpoint condition, as described here.
I think something similar to this should work for you too.
I'm writing a simple writable character device driver (2.6.32-358.el6.x86_64, under VirtualBox), and since it's not mature yet, it tends to crash/freeze (segfaults, infinite loops).
I'm testing it like this: $> echo "some data" > /dev/my_dev, and if a crash/freeze occurs, the whole system (the VirtualBox guest) freezes. I tried moving all the work to another kernel thread to avoid the system-wide freeze, but it doesn't help.
Is it possible to "isolate" such a crash/freeze, so that I'd be able to kill the process in whose context the kernel module runs?
The module runs in kernel context; that's why debugging it is difficult and why bugs can easily crash the system. An infinite loop is not really an issue, as it just slows the system down but doesn't cause a crash. Writing to the wrong memory region, however, is fatal.
If you are lucky, you will get a kernel oops before the freeze. If you test your code in one of the TTYs rather than the GUI, you might immediately see the oops (kernel BUG log) on the screen, which you can study and which may well be helpful.
In my experience, however, it's best to write and test the kernel-independent code in user space, probably with mock functions: test it heavily, run valgrind on it, and make sure it doesn't have bugs. Then use it in kernel space. You'd be surprised how much of a kernel module's code may in fact not need kernel context at all. Of course, this very much depends on the functionality of the module.
To actually debug code in kernel space, there are tools which I have never used, such as kgdb. What I do myself is usually a mixture of printks and binary search, for when the crash is so severe that no kernel oops is shown at all. First, I put printk calls (possibly with a delay after each) in different places to see which parts of the code are reached before the oops; tail -f /var/log/messages comes in handy. Then I do a binary search: disable half of the code and see if the crash still occurs. If not, the problem is possibly in the disabled half; if it does, the problem is surely in the remaining half. Repeat!
The ultimate way to write a bug-free kernel module is to write code that doesn't have bugs in the first place. Of course, this is rarely fully achievable, but if you write clean, undefined-behavior-free C, write very concise functions whose correctness is obvious, and pay attention to the boundaries of arrays, it's not that hard.
I am trying to understand whether we can add our own page fault / exception handlers, in kernel or user mode, and handle a fault we induced before giving control back to the kernel.
The task here is not to modify the existing kernel code (the do_page_fault function) but to add a user-defined handler which will be invoked when a page fault or an exception is triggered.
One can find tools like kprobes which provide hooks at the instruction level, but it looks like this will not serve my purpose.
It would be great if somebody could help me understand this or point me to good references.
From user space, you can define a signal handler for SIGSEGV, so your own function will be invoked whenever an invalid memory access is made. When combined with mprotect(), this lets a program manage its own virtual memory, all from user-space.
However, I get the impression that you're looking for a way to intercept all page faults (major, minor, and invalid) and invoke an arbitrary kernel function in response. I don't know a clean way to do this. When I needed this functionality in my own research projects, I ended up adding code to do_page_fault(). It works fine for me, but it's a hack. I would be very interested if someone knew of a clean way to do this (i.e., that could be used by a module on a vanilla kernel).
If you don't want to change the way the kernel handles these faults, and just want to add your own handling before it, then kprobes will serve your purpose. They are a little difficult to use, because you get the probed function's arguments in a structure containing the registers and on the stack, and you have to know where exactly the compiler put each of them. BUT, if you need this for specific functions (known when you create the probes), you can use jprobes (here is a nice example of how to use both), which take a handler with exactly the same signature as the probed function (so no fiddling with registers/stack).
You can dynamically load a kernel module and install jprobes on chosen functions without having to modify your kernel.
You can install a user-level pager with GNU libsigsegv. I haven't used it, but it seems to be just what you are looking for.
I do not think it would be possible. First of all, the page fault handler is a complex function which needs direct access to the virtual memory subsystem's structures.
Secondly, even if that were not an issue, in order to write a page fault handler in user space you would have to capture the fault, which by default forces a transfer to kernel space, so at the very least you would have to prevent that from happening.
To this end you would need a supervisor to keep track of all memory accesses, but you cannot guarantee that the supervisor code itself is already mapped and present in memory.