Valgrind on a child process is not working properly: all the processes go down after 10 minutes - fork

I ran the command below:
valgrind --tool=memcheck --leak-check=full --track-origins=yes --verbose --trace-children=yes --trace-children-skip=.*/ton,.*/zip,.*/cntr,.*/sif,.*/cms,.*/man,.*/cer,.*/ltask,.*/ftask,.*/ice --log-file=/home/temp/valgrind.log ./bin/acloader
When I remove the --trace-children=yes option, everything works fine, but with this option all the processes go down within 10 minutes. What could be the cause, and what is the solution? I need to find the leak in one of the child processes.

Related

How to debug when bash hangs?

I have a script running under bash on ARM64 Linux. It hangs after running for a few hours. pstack shows nothing, and strace shows nothing either.
I cannot stop the process, even with kill -9.
What else can I do to debug it?
Any suggestions will be appreciated.

dlv hugo just hangs

I am trying to learn Hugo using a Go debugger called dlv. And I am pretty stuck. After:
go get -v github.com/gohugoio/hugo
cd $GOPATH/src/github.com/gohugoio/hugo
go build -gcflags="-N -l"
dlv exec ./hugo -- -s /path/to/the/projectdir
This hangs. Pressing Ctrl+C runs hugo as normal. As far as I can see, dlv debug not only produces the same behavior but the very same binary: the binary it produces, called debug, is identical to the hugo I built with go build -gcflags="-N -l".
dlv launches a number of child processes, which disappear after a while. The hugo process is visible via ps and pidof hugo, but strace -p `pidof hugo` reports strace: attach: ptrace(PTRACE_ATTACH, ...): No such process. Checking afterwards, it is still in the ps list with the same pid. I would guess this is because it is in the t state, as it is being traced.
How could I then watch Hugo running?
Aaaaand it's Windows Subsystem for Linux! I never thought that would make a difference, but after reading Jonah B's answer ("I am on fedora") I tried it on a Debian box and it worked. I am surprised because strace works fine on WSL (the GitHub instructions on filing a report actually include strace). I filed this bug.
Hmm, doesn't happen for me. dlv prompt appears right away. I am on fedora, have been using hugo regularly over the past week or so.
$ dlv exec ./hugo -- --cleanDestinationDir -s /path/to/blog/root/
Type 'help' for list of commands.
(dlv) c
                   | EN
+------------------+----+
  Pages            | 25
  Paginator pages  | 0
  Non-page files   | 0
  Static files     | 11
  Processed images | 0
  Aliases          | 0
  Sitemaps         | 1
  Cleaned          | 0
Total in 46 ms
Process 41032 has exited with status 0
$
Same experience here. However, it doesn't hang; it just takes a significant time to reach the dlv prompt.
Check your memory usage (for instance with mpstat or vmstat if you are on Linux). I have 16G of main memory and 16G of swap. Until the dlv prompt is reached, nearly all my memory and a significant amount of swap are consumed. During startup, any playing video or music stutters and the PC is practically unusable until dlv is ready.
Hugo is a pretty large app in that respect.

How to pause the execution of a program after 10 seconds and get a backtrace?

A legacy program most likely gets into an infinite loop on certain pathological inputs. I have >1000 such instances, however, I suspect that the vast majority of them trigger the same bug. Therefore, I would like to reduce the >1000 instances to the fundamentally different ones. The first step is to pause the application after, say, 10 seconds and collect the backtrace.
If I run:
gdb --batch --command=backtrace.txt --args ./legacy_program
with backtrace.txt
run
bt
and I hit Ctrl + C after 10 seconds in the same terminal I get exactly the backtrace I want.
Now, I would like to do that automatically. I have tried sending SIGINT (the expected equivalent of Ctrl + C) from another terminal, but I do not get the backtrace anymore. Here are some of my failed attempts, based on "GDB how to stop execution without a breakpoint?".
Neither of these has any effect:
pkill -SIGINT gdb
kill -SIGINT 5717
where 5717 is the pid of the only gdb running. Sending SIGINT to the legacy_program the same way does kill it but then I do not get the backtrace:
Program received signal SIGINT, Interrupt.
Quit
How can I programmatically pause the execution of the legacy_program after 10 seconds and get a backtrace?
This post was motivated by my frustration not being able to find an answer to this question here at StackOverflow.
Also note that
[it is not merely OK to ask and answer your own question, it is explicitly encouraged.](https://blog.stackoverflow.com/2011/07/its-ok-to-ask-and-answer-your-own-questions/)
Apparently, it is a known (bug) feature in gdb; see "GDB is not trapping SIGINT. Ctrl+C terminates program when should break gdb".
Try sending SIGSTOP instead from the other terminal:
pkill -STOP legacy_program
It works on my machine.
Note that you do not have to run the legacy_program in the debugger. Enable core dumps
ulimit -c unlimited
and send the program SIGTRAP to make it crash, then get the backtrace from the core dump. So, start the program:
./legacy_program
From another terminal:
pkill -TRAP legacy_program
The backtrace can be obtained like this:
gdb --batch -ex=bt ./legacy_program core
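With more than 1000 inputs to try, the steps above (start the program, wait, send SIGTRAP, read the core) can be driven from a script. The following is only a rough sketch under a few assumptions: the names signal_after and backtrace_command are made up for illustration, core dumps must already be enabled (ulimit -c unlimited) in the shell that launches the script, and the core file is assumed to be written as ./core, which depends on the system's core pattern.

```python
import signal
import subprocess


def signal_after(command, delay_seconds, sig=signal.SIGTRAP):
    """Start command; if it is still running after delay_seconds, send sig.

    Returns the value of Popen.wait(): on POSIX a negative value -N
    means the process was terminated by signal N.
    """
    proc = subprocess.Popen(command)
    try:
        proc.wait(timeout=delay_seconds)   # finished on its own in time
    except subprocess.TimeoutExpired:
        proc.send_signal(sig)              # probably looping: make it dump core
    return proc.wait()


def backtrace_command(program, core_file="core"):
    """Build the gdb invocation shown above (core file name is an assumption)."""
    return ["gdb", "--batch", "-ex=bt", program, core_file]
```

A driver would call signal_after(["./legacy_program", ...], 10) for each input and, whenever the result equals -signal.SIGTRAP, pass backtrace_command("./legacy_program") to subprocess.run with capture_output=True to save the backtrace for later comparison.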

LLDB Restart process without user input

I am trying to debug a concurrent program in LLDB and am getting a seg fault, but not on every execution. I would like to run my process over and over until it hits a seg fault. So far, I have the following:
b exit
breakpoint com add 1
Enter your debugger command(s). Type 'DONE' to end.
> run
> DONE
The part that I find annoying, is that when I get to the exit function and hit my breakpoint, when the run command gets executed, I get the following prompt from LLDB:
There is a running process, kill it and restart?: [Y/n]
I would like to automatically restart the process, without having to manually enter Y each time. Anyone know how to do this?
You could kill the previous instance by hand with kill - which doesn't prompt - then the run command won't prompt either.
Or:
(lldb) settings set auto-confirm 1
will give the default (capitalized) answer to all lldb queries.
Or if you have Xcode 6.x (or current TOT svn lldb) you could use the lldb driver's batch mode:
$ lldb --help
...
-b
--batch
Tells the debugger to running the commands from -s, -S, -o & -O,
and then quit. However if any run command stopped due to a signal
or crash, the debugger will return to the interactive prompt at the
place of the crash.
So for instance, you could script this in the shell, running:
lldb -b -o run
in a loop, and this will stop if the run ends in a crash rather than a normal exit. In some circumstances this might be easier to do.
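The "run it in a loop until it crashes" idea can also be tried without the debugger in the loop. The sketch below is not the lldb batch mode described above: it runs the program directly and stops at the first run that is killed by a signal, on the assumption that a core dump (if enabled) can then be loaded into lldb. The function name run_until_signal and the max_runs limit are invented for this example, and the negative-return-code convention it relies on is POSIX-only.

```python
import signal
import subprocess


def run_until_signal(command, max_runs=1000):
    """Run command repeatedly until one run is killed by a signal.

    Returns (run_number, signal) for the first such run, or None if all
    max_runs runs exited on their own.
    """
    for run_number in range(1, max_runs + 1):
        status = subprocess.call(command)
        if status < 0:                     # POSIX: -N means killed by signal N
            return run_number, signal.Signals(-status)
    return None
```

For a seg fault the second element of the result would be signal.SIGSEGV; a None result just means the crash did not show up within max_runs attempts.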

Can a standalone ruby script (windows and mac) reload and restart itself?

I have a master-workers architecture where the number of workers is growing on a weekly basis. I can no longer be expected to ssh or remote console into each machine to kill the worker, do a source control sync, and restart. I would like to be able to have the master place a message out on the network that tells each machine to sync and restart.
That's where I hit a roadblock. If I were using any sane platform, I could just do:
exec('ruby', __FILE__)
...and be done. However, I did the following test:
p Process.pid
sleep 1
exec('ruby', __FILE__)
...and on Windows, I get one ruby instance for each call to exec. None of them die until I hit ^C in the window in question. On every platform I tried, it executes the new version of the file each time, which I have verified by making simple edits to the test script while the test marched along.
The reason I'm printing the pid is to double-check the behavior I'm seeing. On Windows, I am getting a different pid with each execution, which I would expect, considering that I am seeing a new process in the task manager for each run. The Mac is behaving correctly: the pid is the same for every system call and I have verified with dtrace that each run is triggering a call to the execve syscall.
So, in short, is there a way to get a Windows ruby script to restart its execution so it will be running any code, including itself, that has changed during its execution? Please note that this is not a rails application, though it does use activerecord.
After trying a number of solutions (including the one submitted by Byron Whitlock, which ultimately put me onto the path to a satisfactory end) I settled upon:
IO.popen("start cmd /C ruby.exe #{$0} #{ARGV.join(' ')}")
sleep 5
I found that if I didn't sleep at all after the popen, and just exited, the spawn would frequently (>50% of the time) fail. This is not cross-platform obviously, so in order to have the same behavior on the mac:
IO.popen("xterm -e \"ruby blah blah blah\"&")
The classic way to restart a program is to write another one that does it for you. So you spawn a restart.exe <args> process, then die or exit; restart.exe waits until the calling script is no longer running, then starts the script again.
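One possible shape for such a restart helper is sketched here in Python rather than Ruby or a compiled restart.exe. It is a guess at a workable design, not a tested cross-platform solution: the helper is started with a pipe as its stdin and treats end-of-file on that pipe as the sign that the old script has exited. The names wait_then_restart and restart_helper.py are hypothetical, and the approach assumes no other process inherits the write end of the pipe.

```python
import subprocess
import sys


def wait_then_restart(parent_pipe, command):
    """Block until parent_pipe reaches EOF, then start command.

    EOF arrives once every process holding the write end has closed it,
    which the operating system does for the old script when it exits.
    """
    while parent_pipe.read(4096):
        pass  # ignore anything written; only EOF matters
    return subprocess.Popen(command)


# In the helper script (hypothetical name restart_helper.py):
#     wait_then_restart(sys.stdin.buffer, sys.argv[1:])
#
# In the script that wants to be restarted, just before it exits:
#     subprocess.Popen(
#         [sys.executable, "restart_helper.py", sys.executable, *sys.argv],
#         stdin=subprocess.PIPE,
#     )
```

Waiting on the pipe avoids polling for the old process id, which is awkward to do portably without extra libraries.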
