I'm trying to troubleshoot a hairy bug which involves corruption of a particular integer in memory. I can set a watchpoint and hope to capture a backtrace of whatever is changing this particular value.
Complicating matters, the bug only occurs in production, and only a few times per day. And the bug occurs in a Python webserver called gunicorn, which is a pre-fork server. The corruption happens in one of the worker children, not the master process.
Trouble is, gdb by default won't debug children produced by fork(). If configured to do so with set detach-on-fork off, then it might debug the worker processes, but it will also debug other subprocesses if one of the workers does a fork() and exec().
So is there a way to configure gdb to:
debug child processes produced by fork(), and
detach from a process when it does exec()?
Or perhaps there's some other approach to the issue of debugging the worker children of a pre-fork server?
Related
We develop a user-space process running on Linux 3.4.11 in an embedded MIPS system. The process creates multiple (>10) threads using pthreads. The process has a SIGSEGV signal handler which, among other things, generates a log message which goes to our log file. As part of this flow, it acquires a semaphore (bad, I know...).
During our testing the process appeared to hang. We're currently unable to build gdb for the target platform, so I wrote a CLI tool that uses ptrace to extract the register values and USER data using PTRACE_PEEKUSR.
What surprised me to see is that all of our threads were inside our crash handler, trying to acquire the semaphore. This (obviously?) indicates a deadlock on the semaphore, which means that a thread died while holding it. When I dug up the stack, it seemed that almost all of the threads (except one) were in a blocking call (recv, poll, sleep) when the signal handler started running. Manual stack reconstruction on MIPS is a pain so we have not fully done it yet. One thread appeared to be in the middle of a malloc call, which to me indicates that it crashed due to a heap corruption.
A couple of things are still unclear:
1) Assuming one thread crashed in malloc, why would all other threads be running the SIGSEGV handler? As I understand it, a SIGSEGV signal is delivered to the faulting thread, no? Does it mean that each and every one of our threads crashed?
2) Looking at the sigcontext struct for MIPS, it seems it does not contain the memory address which was accessed (badaddr). Is there another place that has it? I couldn't find it anywhere, but it seemed odd to me that it would not be available.
And of course, if anyone can suggest ways to continue the analysis, it would be appreciated!
Yes, it is likely that all of your threads crashed in turn, assuming that you have captured the thread state correctly.
siginfo_t has a si_addr member, which should give you the address of the fault. Whether your kernel fills that in is a different matter.
In-process crash handlers will always be unreliable. You should use an out-of-process handler, and set kernel.core_pattern to invoke it. In current kernels, it is not necessary to write the core file to disk; you can either read the core file from standard input, or just map the process memory of the zombie process (which is still available when the kernel invokes the crash handler).
I'm having some problems debugging an Android app which runs a string of memory-intensive operations on bitmaps. From Google's Debugging tips, I know that
The debugger and garbage collector are currently loosely integrated. The VM guarantees that any object the debugger is aware of is not garbage collected until after the debugger disconnects. This can result in a buildup of objects over time while the debugger is connected. For example, if the debugger sees a running thread, the associated Thread object is not garbage collected even after the thread terminates.
Unfortunately, this means that while my app runs fine in release mode, any memory-intensive thread running in debug mode will be ignored by the garbage collector and be kept around so more and more memory is used as more and more memory-intensive threads are created, resulting in the app crashing because it fails to allocate the required memory.
Is there any way to explicitly tell the garbage collector that these threads should be collected, or some other way around this issue?
I eventually solved this by spawning an AsyncTask rather than a Thread. It seems that AsyncTasks are cleared up more readily by the garbage collector and hence I was able to run the app in debug mode without issue.
An AsyncTask is the recommended way of spawning background operations. Any work to be done in the background should be placed in the task's doInBackground(Params...) method. AsyncTasks are usually intended to perform actions on the UI thread after completion, but you can avoid disturbing the UI thread by simply leaving the onPostExecute(Result)method empty or not present.
I am working on an MFC application that can (among other things) be used to shut Windows down. When doing this, Windows of course sends the WM_QUERYENDSESSION and WM_ENDSESSION to all applications, mine included. However, the problem is that my application, as part of some destructors, delete certain files (with CFile::Remove) that have been used during the execution. I have reason to believe that the destructors are called (but that is hard to know for certain) when the application is closed by Windows.
However, when Windows starts back up again, I do occasionally notice that the files that were supposed to be deleted are still present. This does not happen consistently, even when the execution of the program is identical (I have a script for testing this). This leads me to think that one of two things are happening: Either a) the destructors are not consistently being called, or b) the Remove function returns, but the file is not actually deleted before Windows is shut down.
The only work-around I have found so far is that if I get the system to wait with the shutdown for approximately 10 seconds after my program has stopped, then the files will be properly deleted. This leads me to believe that b) may be the case.
I hope someone is able to help me with this problem.
Regards
Mort
Once your program returns from WM_ENDSESSION, Windows can terminate it at any time:
If the session is being ended, this parameter is TRUE; the session can end any time after all applications have returned from processing this message.
If the session ends quickly, then it may end before your destructors run. You must do all your cleanup before returning from WM_ENDSESSION, because there is no guarantee that you will get a chance to do it afterwards.
The problem here is that some versions of Windows report back that file handling operations have been completed before they actually have. This isn't a problem unless shutdown is triggered as some operations, including file delete will be abandoned.
I would suggest that you cope with this by forcing your code to wait for a confirmed deletion of the files (have a process look for the files and raise an event when they've gone) before calling for system shutdown.
If the system is properly shut down (nut went sudden power loss or etc.) then all the cached data is flushed. In particular this includes flushing the global file descriptor table (or whatever it's called in your file system) which should commit the file deletion.
So the problem seems to be that the user-mode code doesn't call DeleteFile, or it failes (for whatever reason).
Note that there are several ways the application (process) may exit, whereas not always d'tors are called. There are automatic objects which are destroyed in the context of their callstack, plus there are global/static objects, which are initialized and destroyed by the CRT init/cleanup code.
Below is a short summary of ways to terminate the process, with the consequences:
All process threads exit conventionally (return from their procedure). The OS terminates the process that has no threads. All the d'tors are executed.
Some threads either exit via ExitThread or killed by TerminateThread. The automatic objects of those threads are not d'tructed.
Process exited by ExitProcess. Automatic objects are not destructed, global may be destructed (this happens in the CRT is used in a DLL)
Process is terminated by TerminateProcess. All d'tors are not called.
I suggest you check if the DeleteFile (or CFile::Remove that wraos it) is called indeed, and check also if it succeeds. For instance you may open the same file twice for whatever reason
My application creates a suspended process, gets process's information via VirtualQueryEx() ,but fails getting process's module information using EnumProcessModules().
The task above is completed ONLY if the process is NOT created suspended and a breakpoint is hit in the debugger(so the program runs, before the call is executed).
I'm trying to write a very decent disassembler and for that I would need to run a target process suspended, but EnumProcessModules() does not work on suspended processes.
Is there an alternative?
I dealt with something like this several years ago. If I remember right, what I ended up doing was creating the task suspended, then GetThreadContext, set its trap flag, SetThreadContext, resume the thread (which runs one instruction), then use EnumProcessModules.
Of course, there may be other ways to handle this, but at least if memory serves, that's what I came up with at the time and I seem to recall its working.
I am new to programming.
I know only Start debug before. Maybe start debug suit for some small application develop better.
I found Visual studio IDE provide another method of attach to process for using.
When & Why must I use the attach debugging?
Such as multi-threading application debugging. Client/Service application debugging. etc. Thank you.
Sometimes you need to debug a process started by another program.
For example you need a reliable solution and in order to protect against access violations, memory leaks and other barely recoverable stuff, you have a master program and several worker programs. The master program starts the worker program and passes parameters to it. How do you debug a worker program which is not intended to be started by anything except the master program?
You use "attach to process for that".
Typically you do it this way: insert a statement that blocks the worker program for some time - for example, call Sleep() for 15 seconds. Then you kindly ask the master program to start the worker program. When the worker program is started it blocks and you now have 15 seconds to attach to it.
This way you can debug almost any issues - problems at early startup stages, wrong parameters, etc, which you wouldn't reliably reproduce with "run with debugging".
Attaching to a process is useful if you don't want to debug right from starting the process. For example, debugging usually slows down execution, so it can be quicker to start the app, get it to a state where a bug appears, and then attach a debugger.
It's also useful if you already have an external means of launching the process that you don't want or can't to import into the IDE.
Start debugging from VS launches an instance of the VS webserver and attaches the debugger to it.
Attach to process allows you to attach to any process and debug it, usually you'd do this to your instance of w3wp.exe running your code in IIS
Attach to process is mostly used when you can't run the application from Visual Studio.
For example, if it's a service or if it is a process that has run for a long time and now you want to start debugging it.
Sometimes you also want to debug a remote process, not on your machine - and you can do that using attach to process.