How to get thread traces in a running multithreaded C++ process? - debugging

What are the useful utilities for debugging multithreaded programs in conditions such as deadlock or livelock? I was wondering whether gcore gives the stack dump of all running threads in the process or just the main thread. Also, does gcore suspend or kill the running process? Any information on debugging multithreaded programs would be very useful.

gdb supports switching between threads to investigate the state of everything going on. Here is some more information.

gdb has some nice features for working with threads. One of my favorites is thread apply. This allows you to run the same command for multiple threads.
For example, if you wanted to get a backtrace of all threads, you can use this:
thread apply all where
To break this down, the command starts with thread apply.
Next is the list of threads. Here I used the keyword all to apply this to every thread in the process. You can also use a space separated list of the gdb thread ids (thread apply 1 2 3 command).
And finally comes the command to perform. I used where which shows you the call stack but you can use any command you want.
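The same command can also be run non-interactively against a live process. The following is only a sketch: it assumes gdb is installed and that you have permission to attach to the target, and the helper names `build_gdb_backtrace_command` and `dump_all_thread_backtraces` are made up for illustration. It relies on gdb's `-p`, `-batch` and `-ex` command-line options.

```python
import subprocess


def build_gdb_backtrace_command(pid):
    """Return the argv list for a one-shot 'backtrace of every thread' run."""
    return [
        "gdb",
        "-p", str(pid),                    # attach to the running process
        "-batch",                          # run the given commands, then exit
        "-ex", "thread apply all where",   # the command discussed above
    ]


def dump_all_thread_backtraces(pid):
    """Run gdb once and return whatever it printed (hypothetical helper)."""
    result = subprocess.run(
        build_gdb_backtrace_command(pid),
        capture_output=True,
        text=True,
        check=False,  # gdb may exit non-zero, e.g. if attaching is not permitted
    )
    return result.stdout
```

The process is paused while gdb is attached and collecting the stacks, so on a production system this is best kept to a single short run.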

As Carl stated, gdb supports threads. Using a UI for GDB (such as the one provided by Eclipse) makes this easier, but you can also get thread information on the command line by typing "info threads". This lists the threads and lets you switch between them by typing "thread 3" etc. Once you have switched to a thread, you can run a backtrace to see the current thread's stack, along with the other commands you're used to using with a single-threaded process.

Related

How to follow forks, but detach on exec in gdb

I'm trying to troubleshoot a hairy bug which involves corruption of a particular integer in memory. I can set a watchpoint and hope to capture a backtrace of whatever is changing this particular value.
Complicating matters, the bug only occurs in production, and only a few times per day. And the bug occurs in a Python webserver called gunicorn, which is a pre-fork server. The corruption happens in one of the worker children, not the master process.
Trouble is, gdb by default won't debug children produced by fork(). If configured to do so with set detach-on-fork off, then it might debug the worker processes, but it will also debug other subprocesses if one of the workers does a fork() and exec().
So is there a way to configure gdb to:
debug child processes produced by fork(), and
detach from a process when it does exec()?
Or perhaps there's some other approach to the issue of debugging the worker children of a pre-fork server?

How can you debug a process using gdb without pausing it?

I have a process that is already running, and I want to debug it with GDB. I've been using
gdb --pid $PID
However, when I do this, the process pauses. I'd like to attach to the process without pausing it, and look around in its memory while it's still running. Is this possible? Alternatively, is there a way to "fork" the process so that I can look at its memory, without stopping/pausing the process?
There's no way in gdb to attach without some sort of pause.
The Linux kernel provides some support for this via PTRACE_SEIZE, but gdb doesn't use this yet. There's a bug in bugzilla you can track: "Bug 15250 - use PTRACE_SEIZE and PTRACE_INTERRUPT".
Meanwhile you could try setting gdb into "observer mode". Then you could attach and use continue & to continue the process in the background. You may need to set various settings, like target-async, depending on the gdb version.
I am not totally certain if this will work. It is worth a try. Note that there is a window in which the program will be paused. This is unavoidable right now.

GDB: How to get execution history

I am quite new to the area of compilers. I'm using gcc and I want to get execution history of a program for a particular run i.e. only those statements which are actually executed in the last run.
Is it possible with gdb? I couldn't get relevant options in gdb which could output executed statements.
Or is there any other way of obtaining execution history?
Regards,
Nikhil.
Process Record may be what you're looking for. The link has a quick tutorial and an overview of the functionality.
From the linked wiki page:
Compile this program with -g, and load it into gdb, then do the
following:
(gdb) break main
(gdb) run
(gdb) record
This will turn on process recording, which will now record all subsequent instructions executed by the program being debugged.
Note that you can start process recording at any point (not just at
main). You may choose to start it later, or even earlier. The only
restriction is that your program has to be running (so you have to
type "run" before "record"). If you want to start recording from the
very first instruction of your program, you can do it like this:
(gdb) break _start
(gdb) run
(gdb) record
Hope this helps.
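You can use the set history save command to start recording history. This can be written into the ~/.gdbinit file. Note that this saves the history of commands typed at the gdb prompt, not the statements executed by the program. Look at the docs for more information.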

How can I find whether a process is in deadlock or is waiting for I/O

Asked by an Interviewer:
How can we find if an application has become non responsive due to a deadlock or due to wait on some IO?
Can anybody comment on a general way of doing this, or on whether various operating systems provide specific ways of doing it?
This is an OS related thing I believe so I am not tagging any language here.
EDIT: I would also like to know about the techniques and the APIs for doing this, so that I can run a monitoring program if I wish.
On Linux I would use sar -u 1. If the %iowait column is high, then the application is probably waiting for I/O.
On Windows you can attach WinDbg and then execute !analyze -v -hang which will work out which thread is waiting on I/O. (The only time I used this I got lucky and it was an open call which was waiting, so I got to find out the file name very quickly.)
The answer is that there are many possible designs and solutions.
If your application uses open() together with lockf() or flock() to lock a resource, then the next time another process (or the same process) attempts to flock() the same file, it will block.
If you request the lock with LOCK_NB (see "man -s 2 flock" on Ubuntu), the call is non-blocking, and if it returns with the EWOULDBLOCK error you can deduce that the file is locked.
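As a rough sketch of the non-blocking probe described above, assuming Linux and Python's standard fcntl module: Python reports EWOULDBLOCK from flock() as BlockingIOError. The function names `would_block` and `demo` are invented for this example.

```python
import fcntl
import os
import tempfile


def would_block(path):
    """Return True if an exclusive flock() on path cannot be taken right now."""
    fd = os.open(path, os.O_RDWR)
    try:
        try:
            # LOCK_NB makes flock() fail immediately instead of waiting.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:  # errno EWOULDBLOCK / EAGAIN
            return True
        fcntl.flock(fd, fcntl.LOCK_UN)  # we got the lock, so release it again
        return False
    finally:
        os.close(fd)


def demo():
    """Hold a lock through one descriptor and probe the file through another."""
    holder_fd, path = tempfile.mkstemp()
    try:
        before = would_block(path)
        fcntl.flock(holder_fd, fcntl.LOCK_EX)
        during = would_block(path)
        fcntl.flock(holder_fd, fcntl.LOCK_UN)
        after = would_block(path)
        return before, during, after
    finally:
        os.close(holder_fd)
        os.remove(path)
```

Note that the probe briefly takes the lock itself when the file is free, so it can momentarily make other non-blocking callers see the file as locked.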
To identify all the locked files in the OS, one way is to run "lsof" to see all the open files; then, from the filename, you can use fcntl() to identify the types of locks held.
Many alternative designs are possible: for example, the Oracle database has a concept called a waiter list, which lists all the waiters on existing locked records. Because of this sophisticated design, automatic deadlock detection is also possible.
http://www.dba-oracle.com/t_deadlock.htm
Other techniques are described in general OS courses:
http://lovingod.host.sk/tanenbaum/Recovery-from-Deadlock.html
On Linux you can attach gdb to a running process. It will stop the process at the point where it is running, and with bt you'll get the backtrace. You can also get the thread info of all running threads, switch between them and look at the backtrace of each using info threads; thread N; bt.
Another very useful tool under Linux is strace, which traces system calls; you can also attach it to running processes. The -c option shows you profiling information about the system calls made by the program.
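For the monitoring-program angle, a lighter-weight check than attaching a debugger is to read each thread's state from /proc. This is a sketch assuming Linux; `parse_state` and `thread_states` are illustrative names. A state of D (uninterruptible sleep) usually points to a wait on I/O, while threads blocked on a mutex normally show S (interruptible sleep), so a process whose threads all stay in S and consume no CPU is consistent with a deadlock but not proof of one.

```python
import os


def parse_state(stat_line):
    """Extract the one-letter state from a /proc/<pid>/task/<tid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so take what follows the LAST closing parenthesis.
    after_comm = stat_line[stat_line.rfind(")") + 1:]
    return after_comm.split()[0]


def thread_states(pid):
    """Map each thread id of pid to its current state letter (R, S, D, ...)."""
    task_dir = f"/proc/{pid}/task"
    states = {}
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as stat_file:
                states[int(tid)] = parse_state(stat_file.read())
        except (FileNotFoundError, ProcessLookupError):
            pass  # the thread exited while we were looking at it
    return states
```

Sampling `thread_states(pid)` a few times and then using gdb or strace on the threads that never change state narrows down where to look.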

How to pause / resume any external process under Windows?

I am looking for different ways to pause and resume programmatically a particular process via its process ID under Windows XP.
Process suspend/resume tool does it with SuspendThread / ResumeThread but warns about multi-threaded programs and deadlock problems.
PsSuspend looks okay, but I wonder if it does anything special about deadlocks or uses another method?
Preferred languages: C++ / Python
If you "debug the debugger" (for instance, using logger.exe to trace all API calls made by windbg.exe), it appears that the debugger uses SuspendThread()/ResumeThread() to suspend all of the threads in the process being debugged.
PsSuspend may use a different way of suspending processes (I'm not sure), but it is still possible to hang other processes: if the process you're suspending is holding a shared synchronization object that is needed by another process, you may block that other process from making any progress. If both programs are well-written, they should recover when you resume the one that you suspended, but not all programs are well-written. And if this causes your program that is doing the suspending to hang, then you have a deadlock.
I'm not sure if this does the job, but with Process Explorer from Microsoft Sysinternals you can suspend a process.
It's been said here: https://superuser.com/a/155263 and I found it there too.
There is also psutil for Python, which you can use like this:
>>> import psutil
>>> pid = 7012
>>> p = psutil.Process(pid)
>>> p.suspend()
>>> p.resume()
I tested http://www.codeproject.com/KB/threads/pausep.aspx on a few programs: it works fine.
PsSuspend and Pausep are two valid options.
So, after I found out about UniversalPauseButton, Googled for this ("windows SIGSTOP"), got this question as the first search result (thanks Ilia K., your comment did its job), and read the answers, I went back to check out the code.
Apparently, it uses the undocumented NT kernel and Win32 APIs _NtSuspendProcess, _NtResumeProcess and _HungWindowFromGhostWindow.
PsSuspend, the utility you mentioned and linked to, probably uses these APIs. I couldn't verify this: the source code isn't supplied, only executables and a EULA. You could probably figure it out by disassembling the binary, but that is against the EULA.
So, to answer your specific question, check out UniversalPauseButton's main.cpp. Basically, you call _NtSuspendProcess(ProcessHandle) and _NtResumeProcess(ProcessHandle), ProcessHandle being the handle of the process you want to pause or resume.
I think there is a good reason why there is no SuspendProcess() function in Windows. Having such a function opens the door for an unstable system. You shall not suspend a process unless you created that process yourself.
If you wrote that process yourself, you could use an event (see ::SetEvent() etc. in MSDN) or another kind of messaging to trigger a pause command in the process.

Resources