How do I record all child processes spawned by an Ant script over time? - bash

I inherited a legacy Ant-based build system and I'm trying to get a sense of its scope. I've observed multiple java and junit tasks with fork=yes, it makes liberal use of subant and similar tasks, and occasionally it just execs other processes.
I really don't want to search through 100s of scripts and reference documentation for every task to find possible-forking-behavior. I'd like to capture the child-process list while the build runs.
I managed to create a clean Vagrant + Puppet environment for builds, and I can run the full build like so:
$ cd /vagrant && $ANT_HOME/bin/ant
If I had to brute force something, I'd have a script kick off the build and capture child processes until the build completes:
#!/bin/bash
$ANT_HOME/bin/ant &
ant_pid=$!
# Poll once per second while the build is alive. This samples only direct
# children and can miss short-lived processes between polls.
while ps -p "$ant_pid" > /dev/null
do
    ps --ppid "$ant_pid" >> build_processes
    sleep 1
done

User Jayan recommended strace, specifically:
$ strace -f -e trace=fork ant
The -e trace=fork expression limits tracing to fork system calls; the -f flag follows child processes. From the man page description of -f:
Trace child processes as they are created by currently traced processes as a result of the fork(2) system call. The new process is attached to as soon as its pid is known (through the return value of fork(2) in the parent process). This means that such children may run uncontrolled for a while (especially in the case of a vfork(2)), until the parent is scheduled again to complete its (v)fork(2) call. If the parent process decides to wait(2) for a child that is currently being traced, it is suspended until an appropriate child process either terminates or incurs a signal that would cause it to terminate (as determined from the child's current signal disposition).
I can't find the trace=fork expression, but trace=process seems useful.
-e trace=process
Trace all system calls which involve process management. This is useful for watching the fork, wait, and exec steps of a process.
http://linuxcommand.org/man_pages/strace1.html
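To capture the whole build this way, something along these lines should work (the log file name is just an example):
strace -f -e trace=process -o ant_trace.log $ANT_HOME/bin/ant
grep execve ant_trace.log
The -o flag keeps the trace out of the build's own output; each execve line in the log is a command the build spawned.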

As Ant is a Java process, you can try Byteman. In a Byteman script you define rules that are triggered when the exec methods of java.lang.Runtime are invoked.
You attach Byteman to Ant using the ANT_OPTS environment variable.
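A minimal sketch of that approach (the rule file name and paths are assumptions, not from the original answer):
cat > trace-exec.btm <<'EOF'
RULE trace Runtime.exec
CLASS java.lang.Runtime
METHOD exec
AT ENTRY
IF true
DO traceln("Runtime.exec called with: " + java.util.Arrays.toString($*))
ENDRULE
EOF
export ANT_OPTS="-javaagent:$BYTEMAN_HOME/lib/byteman.jar=script:$PWD/trace-exec.btm"
$ANT_HOME/bin/ant
The rule fires on every overload of Runtime.exec and prints its arguments, while the build itself runs unchanged.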

Related

Linux - Child process to survive parent process tree kill

Motivation:
In a Java program, I'm setting a bash script to be executed on -XX:OnOutOfMemoryError. This script is responsible for uploading the heap-dump to HDFS. However, quite often only a part of the file gets uploaded.
I'm suspecting the JVM gets killed by the cluster manager before the upload script completes. My guess is the JVM receives a process group kill signal and takes the bash script, i.e. its child process, down too.
The Question:
Is there a way in unix to run a sub-process in such a way that it does not die when its parent receives a group kill signal?
You can use disown. Start the process in the background and then disown it; kill signals sent to the parent will no longer be propagated to the child.
Script would look something like:
./handler_script &
disown
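Or, if the script needs to keep a handle on the child (the variable name is just for illustration):
./handler_script &
handler_pid=$!
disown "$handler_pid"
echo "handler running as PID $handler_pid"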

What purpose does using exec in docker entrypoint scripts serve?

For example in the redis official image:
https://github.com/docker-library/redis/blob/master/2.8/docker-entrypoint.sh
#!/bin/bash
set -e
if [ "$1" = 'redis-server' ]; then
    chown -R redis .
    exec gosu redis "$@"
fi
exec "$@"
Why not just run the commands as usual without exec preceding them?
As @Peter Lyons says, using exec will replace the parent process, rather than have two processes running.
This is important in Docker for signals to be proxied correctly. For example, if Redis was started without exec, it will not receive a SIGTERM upon docker stop and will not get a chance to shut down cleanly. In some cases, this can lead to data loss or zombie processes.
If you do start child processes (i.e. don't use exec), the parent process becomes responsible for handling and forwarding signals as appropriate. This is one of the reasons it's best to use supervisord or similar when running multiple processes in a container, as it will forward signals appropriately.
Without exec, the parent shell process survives and waits for the child to exit. With exec, the child process replaces the parent process entirely so when there's nothing for the parent to do after forking the child, I would consider exec slightly more precise/correct/efficient. In the grand scheme of things, I think it's probably safe to classify it as a minor optimization.
without exec
parent shell starts
parent shell forks child
child runs
child exits
parent shell exits
with exec
parent shell starts
parent shell execs the child program, replacing itself
child program runs, taking over the shell's process
child exits
Think of it as an optimization like tail recursion.
If running another program is the final act of the shell script, there's not much of a need to have the shell run the program in a new process and wait for it. Using exec, the shell process replaces itself with the program.
In either case, the exit value of the shell script will be identical [1]. Whatever program originally called the shell script will see an exit value that is equal to the exit value of the exec'ed program (or 127 if the program cannot be found).
[1] modulo corner cases such as a program doing something different depending on the name of its parent.
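You can watch the replacement happen with a short test script (file and command names are placeholders):
#!/bin/bash
# with-exec.sh: after exec, sleep owns this shell's PID.
echo "shell PID: $$"
exec sleep 60
Run it in the background and inspect the PID it reports:
./with-exec.sh &
sleep 1
ps -o pid,comm -p $!
ps shows sleep, not bash, at the script's PID.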

How can I create a process in Bash that has zero overhead but which gives me a process ID

For those of you who know what you're talking about I apologise for butchering the way that I'm going to phrase this question. I know nothing about bash whatsoever. With that caveat out of the way, let me get out my cleaver...
I am building a Rails app that uses what's called a Procfile, which sets up any processes that need to be run in different environments,
e.g.
web: bundle exec unicorn -p $PORT -c ./config/unicorn.rb
redis: redis-server
worker: bundle exec sidekiq
proxylocal: bin/proxylocal_local
Each one of these lines specifies a process to be run. It also expects a pid to be returned after the process spins up. The syntax is
process_name: process_invokation_script
However the last process, proxylocal, only actually starts a process in development. In production it doesn't do anything.
Unfortunately that causes the Procfile to choke, as it needs a process ID returned. So is there some super-simple, zero-overhead process that I can spawn in that case just to keep the Procfile happy?
The sleep command does nothing for a specified period of time, with very low overhead. Give it an argument longer than your code will run.
For example
sleep 2147483647
does nothing for 2^31-1 seconds, just over 68 years. I picked that number because any reasonable implementation of sleep should be able to handle it.
In the unlikely event that that doesn't work (say, if you're on an old 16-bit system that can't sleep for more than 2^16-1 seconds), you can do a sleep in an infinite loop:
sh -c 'while : ; do sleep 30000 ; done'
This assumes that you need the process to run for a very long time; that depends on what your application needs to do with the process ID. If it's required to be unique as long as the application is running, you need something that will continue to run for a long time; if the process terminates, its PID can be re-used by another process.
If that's not a requirement, you can use sleep 0 or true, which will terminate immediately.
If you need to give the application a little time to get the process ID before the process terminates, something like sleep 10 or even sleep 1 might work, though determining just how long it needs to run can be tricky and error-prone.
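In Procfile terms, the long-sleep placeholder would look something like:
proxylocal: sleep 2147483647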
If Heroku isn't doing anything with proxylocal, I'm not sure why you'd even want this in your Procfile. I'm also a bit confused about whether you want to change the Procfile or what bin/proxylocal_local does, and how you would even do that.
That being said, if you are able to do anything you like for production, your script can just call cat: it will get a PID and then just sit waiting for the next command (which never comes).
For truly minimal overhead, you don't want to run any external commands. When the shell starts a command, it first forks itself, then the child shell execs the external command. If the forked child can run a builtin, you can skip the exec.
Start by creating a read-only fifo somewhere.
mkfifo foo
chmod 400 foo
Then, whenever you need a do-nothing process, just fork a shell which tries to read from the fifo. It's read-only, so no one can write to it, so all reads will block.
read < foo &
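Tying it back to the question, the placeholder's PID is then available as usual:
read < foo &
echo "placeholder PID: $!"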

Script which launches another application will bring it down on exit

I have a script which launches another application using nohup my_app &, but when the initial script dies the launched process also goes down. As per my understanding, since it has been run with nohup, that should not happen. The original script is also called with nohup.
What went wrong there?
A very reliable script that has been used successfully for years, and has always terminated after invoking a nohup, uses this construct:
nohup ${BinDir}/${Watcher} >${DataDir}/${Watcher}.nohup.out 2>&1 &
Perhaps the problem is that output is not being managed?
nohup does not mean that a (child) process is still running when the (parent) process is killed. nohup is used, for example, when you're connecting over ssh to a server and starting a process there. If you log out, the process will terminate (logging out sends the signal SIGHUP to the process, causing it to terminate); using nohup avoids this behaviour, and your process is still running after you have logged out.
If you need a program which keeps running in the background even after its parent process has terminated, try using daemons.
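One sketch of the daemon route is setsid, which starts the program in a new session and process group, out of reach of both terminal hang-ups and group kills aimed at the parent (a substitute suggestion, not from the original answer):
setsid ./my_app < /dev/null > my_app.log 2>&1 &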
It depends what my-app does: it might set its own signal mask. You probably know that nohup ignores the hang-up signal SIGHUP, and this is inherited by the target program. If that target program does its own signal handling then it might be setting SIGHUP to, for example, SIG_DFL, the default action (which is to die).
To check, run strace -f -o out or truss -f -o out on the command. This will give you all the kernel calls in the file called 'out'. You should be able to spot the signal mask being changed if it is.
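For example, to spot my_app resetting its SIGHUP handler on Linux:
strace -f -o out ./my_app
grep 'rt_sigaction(SIGHUP' out
A line setting the handler to SIG_DFL would confirm the suspicion.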

How to renice a PID dynamically?

On our development server, we have a bunch of shell script wrappers for Java JARs. Using the CRON scheduler, we do fire these scripts daily for different purposes.
For performance testing, we would like to renice a script's PID to a priority of 1 at runtime.
Right now, we do it from the command line or using top.
Is there a way to do that within the shell script itself without "doing harm" to the process as well as other processes?
This should work:
renice -n 1 $$
Afterwards the script itself will have a nice value of 1. This will also apply to all new children, although not to previously forked ones.
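In wrapper-script form (the JAR path is a placeholder):
#!/bin/bash
# Lower this wrapper's own priority first; the java child inherits the nice value.
renice -n 1 $$
java -jar /opt/app/example.jar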
