I've seen a few examples out there but have not been able to adapt them to my situation.
I have a script that calls a long running command, but I want to periodically (say every 1s) get the status of that call. For example:
#!/bin/bash
curl 'localhost:9200/my_index/_forcemerge?max_num_segments=2' &
while [ command is running ]; do
curl -XGET 'localhost:9200/_cat/shards/my_index?v&h=index,shard,prirep,segments.count'
sleep 1
done
echo "finished!"
Is it possible to get the status of the child process in this way?
Edit: Clarifying what I'm actually doing. It's actually two curl commands to an Elasticsearch cluster. The long running command merges data segments together, the "status" command will get the current segment count.
I think that the safest way of doing this is to save the process ID of the child process and then periodically check to see if this is still running:
#!/bin/bash
mycommand &
child_pid=$!
while kill -0 $child_pid >/dev/null 2>&1; do
echo "Child process is still running"
sleep 1
done
echo "Child process has finished"
The variable $! will hold the process ID of the last process started in the background.
The kill -0 does not send a signal to the process; it only makes kill return a zero exit status if the given process ID exists and belongs to the user executing kill.
One could come up with a solution using pgrep too, but that will probably be a bit more "unsafe" in the sense that care must be taken not to catch any similar running processes.
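Applied to the question's case, the loop body would run the status command instead of the echo. Here is a minimal sketch of that idea, assuming bash; poll_while_running is a hypothetical helper name, not something from the answer above:

```shell
#!/usr/bin/env bash
# poll_while_running PID INTERVAL CMD [ARGS...]
# (hypothetical helper) Runs CMD every INTERVAL seconds for as long as PID,
# which must be a child of this shell, is alive. Afterwards it reaps the
# child with wait and returns the child's exit status.
poll_while_running() {
    local pid=$1 interval=$2
    shift 2
    while kill -0 "$pid" 2>/dev/null; do
        "$@"
        sleep "$interval"
    done
    wait "$pid"
}

# Example with the commands from the question (host and index as given there):
#   curl 'localhost:9200/my_index/_forcemerge?max_num_segments=2' &
#   poll_while_running "$!" 1 \
#       curl -XGET 'localhost:9200/_cat/shards/my_index?v&h=index,shard,prirep,segments.count'
```

The final wait is there so the script can also see whether the long-running command succeeded.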
Context:
Users provide me with their custom scripts to run. These scripts can be of any sort, such as scripts that start multiple GUI programs or backend services. I have no control over how the scripts are written. These scripts can be of the blocking type, i.e. execution waits until all the child processes (programs that are run sequentially) exit:
#example of blocking script
echo "START"
first_program
second_program
echo "DONE"
or of the non-blocking type, i.e. ones that fork child processes in the background and exit, something like:
#example of non-blocking script
echo "START"
first_program &
second_program &
echo "DONE"
What am I trying to achieve?
User-provided scripts can be of either of the above two types or a mix of both. My job is to run the script, wait until all the processes started by it exit, and then shut down the node. If it's of the blocking type, the case is plain simple, i.e. get the PID of the script execution process and wait until ps -ef|grep -ef PID has no more entries. Non-blocking scripts are the ones giving me trouble.
Is there a way I can get a list of the PIDs of all the child processes spawned by the execution of a script? Any pointers or hints will be highly appreciated.
You can use wait to wait for all the background processes started by userscript to complete. Since wait only works on children of the current shell, you'll need to source their script instead of running it as a separate process.
( source userscript; wait )
Sourcing the script in an explicit subshell should simulate starting a new process closely enough. If not, you can also background the subshell, which forces a new process to be started, then wait for it to complete.
( source userscript; wait ) & wait
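As a small sketch of the sourcing approach, assuming bash: run_and_wait is a hypothetical wrapper name. Note two limits that follow from how wait works: it only covers background jobs the sourced script starts directly (not daemons that detach or grandchildren of other programs), and an exit inside the user script ends the subshell before wait is reached.

```shell
#!/usr/bin/env bash
# run_and_wait SCRIPT
# (hypothetical helper) Sources SCRIPT in a subshell, then blocks until
# every background job that the script started directly has exited.
# The subshell keeps the script's variables and `cd` calls away from us.
run_and_wait() {
    ( source "$1"; wait )
}

# Example:
#   run_and_wait ./userscript
#   echo "userscript and its background jobs are done"
```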
ps --ppid $PID will list all child processes of the process with $PID.
You can open a file descriptor that gets inherited by other processes, and then wait until it's no longer in use. This is a low overhead method that usually works fine, though it's possible for processes to work around it if they want:
foo=$(mktemp)
( flock -x 5000; theirscript; ) 5000> "$foo"
flock -x 0 < "$foo"
rm "$foo"
echo "The script and its subprocesses are done"
You can follow all invoked processes using ptrace, such as with strace. This is easier, but has some associated overhead and may not work when scripts invoke suid binaries:
strace -f -e none theirscript
You can use pgrep -P <parent_pid> to get a list of child processes. Example:
IFS=$'\n' read -ra CHILD_PROCS -d '' < <(exec pgrep -P "$1")
And to get the grand-children, simply do the same procedure on each child process.
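A recursive version of that procedure might look like the sketch below. It assumes pgrep is installed; list_descendants is a hypothetical name. If it is called on the current shell's own PID, the pgrep process itself can show up in the output, so it is best used on some other process.

```shell
#!/usr/bin/env bash
# list_descendants PID
# (hypothetical helper) Prints the PIDs of PID's children, grandchildren,
# and so on, one per line, by calling pgrep -P on each level in turn.
list_descendants() {
    local child
    for child in $(pgrep -P "$1"); do
        echo "$child"
        list_descendants "$child"
    done
}

# Example:
#   ./theirscript &
#   list_descendants "$!"
```

The result is only a snapshot: processes can start or exit between two pgrep calls, and orphaned processes that were re-parented are no longer found.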
Check out my blog Bash functions to list and kill or send signals to process trees.
You can use one of those functions to properly list all processes spawned under one process. Each has its own method or order of sending signals to the processes.
The only limitation is that the processes still have to be connected to the tree and not orphaned. If you could somehow find a way to group your processes, then that might be your solution.
To simply answer the question that was asked: you could store the process ID of each program you start in the same variable:
echo "START"
first_program &
child_process_ids+="$! "
second_program &
child_process_ids+="$! "
echo $child_process_ids
echo "DONE"
$child_process_ids would just be a space delimited string of process Ids. Now, this answers the question asked, however, what I would do would be a bit different. I would call each script from a for loop, store its process ID, then wait on each one in another for loop to finish and inspect each exit code individually. Using the same example, here's what it would look like.
echo "START"
scripts="first_program second_program"
for script in $scripts; do
#Call script and send to background
./$script &
#Store the script's processID that was just sent to the background
child_process_ids+="$! "
done
for child_process_id in $child_process_ids; do
#Pass each processId into the wait command to retrieve its exit
#code and store it in $rc
wait $child_process_id
rc=$?
#Inspect each processes exit code
if [ $rc -ne 0 ]; then
echo "$child_process_id failed with an exit code of $rc"
else
echo "$child_process_id was successful"
fi
done
I have written a bash script to carry out some tests on my system. The tests run in the background and in parallel. The tests can take a long time and sometimes I may wish to abort the tests part way through.
If I press Control+C, it aborts the parent script but leaves the various children running. I want to be able to hit Control+C (or quit some other way) and have all child processes running in the background killed. I have a bit of code that does the job if I'm running the background jobs directly from the terminal, but it doesn't work in my script.
I have a minimal working example.
I have tried using trap in combination with pgrep -P $$.
#!/bin/bash
trap 'kill -n 2 $(pgrep -P $$)' 2
sleep 10 &
wait
I was hoping that hitting Control+C (SIGINT) would kill everything that the script started, but it actually says:
./breakTest.sh: line 1: kill: (3220) - No such process
This number changes, but doesn't seem to apply to any running processes, so I don't know where it is coming from.
I guess if the contents of the trap command get evaluated where the trap command occurs then it might explain the outcome. The 3220 pid might be for pgrep itself.
I'd appreciate some insight here
Thanks
I have found a solution using pkill. This example also deals with many child processes.
#!/bin/bash
trap 'pkill -P $$' SIGINT SIGTERM
for i in {1..10}; do
sleep 10 &
done
wait
This appears to kill all the child processes elegantly. Though I don't properly understand what the issue was with my original code, apart from sending the correct signal.
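A possible explanation for the original failure, offered as a guess rather than a diagnosis: pgrep -P $$ runs inside a command substitution, so the pgrep process is itself a child of the script and appears in its own output, but it is gone by the time kill runs. In addition, bash starts background commands with SIGINT ignored when job control is off, as it is in a script, so signal 2 would not stop the sleep anyway. The sketch below sidesteps both by using the jobs builtin and the default SIGTERM; kill_background_jobs is a hypothetical name.

```shell
#!/usr/bin/env bash
# kill_background_jobs
# (hypothetical helper) Sends SIGTERM to every background job of this
# shell, then reaps them. `jobs` is a builtin, so listing the PIDs does
# not add a stray helper process to the list.
kill_background_jobs() {
    local pids
    pids=$(jobs -p)
    [ -n "$pids" ] && kill $pids 2>/dev/null
    wait 2>/dev/null
    return 0
}

# Clean up on Ctrl+C or on a plain `kill`, then exit with the usual
# 128+signal status.
trap 'kill_background_jobs; exit 130' INT
trap 'kill_background_jobs; exit 143' TERM

# Example body of the script:
#   for i in {1..10}; do sleep 10 & done
#   wait
```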
In bash, whenever you put & after a command, that command runs as a background job. Each background job gets a job number, which goes up by one for each new job in that shell session. You can use the jobs command to list the background jobs that are running. To work with a job, use % followed by the job number (this is called a job spec). The jobs command also accepts options, such as jobs -p to see the process IDs of all jobs, and jobs -p %JOB_SPEC to see the process ID of that particular job.
#!/usr/bin/env bash
trap 'kill -9 %1' 2
sleep 10 &
wait
or
#!/usr/bin/env bash
trap 'kill -9 $(jobs -p %1)' 2
sleep 10 &
wait
I implemented something like this a few years back; you can take a look at it: async bash.
You can try something like the following:
pkill -TERM -P <your_parent_id_here>
I can't figure out my bug on OSX. When I try to see when Curl is finished, the process remains loaded. I never see the CURL FINISHED message.
#!/bin/bash
curl -S -o example.com http://example.com/downloads/example.zip &
CURL_PID=$!
echo -e "CURL PID = $CURL_PID"
while :
do
sleep 1
if [ -n $(ps -p$CURL_PID -o pid=) ]; then
echo "CURL NOT FINISHED"
else
echo "CURL FINISHED"
break
fi
done
Note on OSX's version of Bash when I run this:
#!/bin/bash
PIDX=1
if [ -n $(ps -p$PIDX -o pid=) ]; then
echo "PROCESS 1 IS THERE"
else
echo "PROCESS 1 IS NOT THERE"
fi
...it says Process 1 is there. (Everyone has a PID 1, so this is just an example.) So, I know that my if statement is correct. No double quotes necessary on the if line.
Note that I can't use wait on the $CURL_PID because what you don't see here is that I am also using OSX's osascript command to show a dialog that says "Downloading...", which has a Cancel button on it and its own $DLG_PID. So I'm looping endlessly until either they cancel the dialog (meaning the process $DLG_PID points to is gone) or $CURL_PID is gone (meaning the download finally completed, so I can run kill $DLG_PID now).
On OSX, note I'm doing this as well before the curl statement.
osascript -e 'tell app "System Events" to display dialog "Downloading..." with title "My App Installer" buttons {"Cancel"}' &
So, if someone cancels the dialog, I kill the curl by PID and exit the infinite loop (and exit the bash script). If they don't cancel that dialog, and the curl finishes, then I kill the dialog by PID and exit the bash script.
Usually you'll use wait for that:
curl http://... &
do_something
wait
echo "CURL has finished"
The portable way for polling a backgrounded job is to use the kill builtin, and send the signal 0 to see if it's deliverable. kill -0 $pid (where $pid is the PID of a child process) will return zero if the child process is still running, and nonzero if it has already died. Note that this is safe and only safe (from PID recycling) for a child process (rather than some random process started elsewhere, with PID written to a PID file), for reasons outlined here:
Each UNIX process also has a parent process. This parent process is the process that started it, but can change to the init process if the parent process ends before the new process does. (That is, init will pick up orphaned processes.) Understanding this parent/child relationship is vital because it is the key to reliable process management in UNIX. A process's PID will NEVER be freed up for use after the process dies UNTIL the parent process waits for the PID to see whether it ended and retrieve its exit code. If the parent ends, the process is returned to init, which does this for you.
This is important for one major reason: if the parent process manages its child process, it can be absolutely certain that, even if the child process dies, no other new process can accidentally recycle the child process's PID until the parent process has waited for that PID and noticed the child died. This gives the parent process the guarantee that the PID it has for the child process will ALWAYS point to that child process, whether it is alive or a "zombie". Nobody else has that guarantee.
Of course, newer versions of OS X don't use init (in its place is launchd), but the principle is the same.
By the way, the whole page is worth a read: http://mywiki.wooledge.org/ProcessManagement.
In light of that, here's an example script that does what you want (it takes one URL argument — the URL to download). Bug me if something's unclear.
#!/usr/bin/env bash
osascript -e 'tell app "System Events" to display dialog "Downloading..." with title "Downloader" buttons {"Cancel"}' &>/dev/null &
dialog_pid=$!
curl -sSLO "$1" &
curl_pid=$!
timer=0
while kill -0 "$curl_pid" &>/dev/null; do
kill -0 "$dialog_pid" &>/dev/null || { echo "User cancelled download from dialog."; kill "$curl_pid" &>/dev/null; exit 1; }
sleep 1
(( timer++ ))
echo "Been downloading for $timer seconds..."
done
echo "Finished."
kill "$dialog_pid" &>/dev/null
wait &>/dev/null
Run it:
> ./download https://github.com/torvalds/linux/archive/v4.4-rc2.tar.gz
Been downloading for 1 seconds...
Been downloading for 2 seconds...
<omitted>
Been downloading for 38 seconds...
Finished.
Cancelling midway:
> ./download https://github.com/torvalds/linux/archive/v4.4-rc2.tar.gz
Been downloading for 1 seconds...
Been downloading for 2 seconds...
Been downloading for 3 seconds...
User cancelled download from dialog.
The ugly thing is that killing the PID of the osascript job doesn't dismiss the dialog box... Which I'm not in the position to solve because I absolutely dread AppleScript.
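As a side note on why the loop in the question never reported CURL FINISHED: when ps prints nothing, the unquoted $(...) expands to no words at all, and the test becomes [ -n ], which is true because the single argument "-n" is itself a non-empty string. A small sketch that isolates the quoting difference (both function names are hypothetical):

```shell
#!/usr/bin/env bash
# Same test twice; the only difference is the quoting of the argument.
# With an empty argument the unquoted form collapses to `[ -n ]`,
# which succeeds, while the quoted form `[ -n "" ]` fails.
is_nonempty_unquoted() { [ -n $1 ]; }
is_nonempty_quoted()   { [ -n "$1" ]; }

# Example:
#   is_nonempty_unquoted "" && echo "unquoted: looks non-empty"
#   is_nonempty_quoted ""   || echo "quoted: correctly empty"
```

So the double quotes on the if line do matter; the PID 1 experiment only exercised the case where ps prints something.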
For testing purposes I have this shell script
#!/bin/bash
echo $$
find / >/dev/null 2>&1
Running this from an interactive terminal, ctrl+c will terminate bash, and the find command.
$ ./test-k.sh
13227
<Ctrl+C>
$ ps -ef |grep find
$
Running it in the background, and killing the shell only will orphan the commands running in the script.
$ ./test-k.sh &
[1] 13231
13231
$ kill 13231
$ ps -ef |grep find
nos 13232 1 3 17:09 pts/5 00:00:00 find /
$
I want this shell script to terminate all its child processes when it exits regardless of how it's called. It'll eventually be started from a python and java application - and some form of cleanup is needed when the script exits - any options I should look into or any way to rewrite the script to clean itself up on exit?
I would do something like this:
#!/bin/bash
trap : SIGTERM SIGINT
echo $$
find / >/dev/null 2>&1 &
FIND_PID=$!
wait $FIND_PID
if [[ $? -gt 128 ]]
then
kill $FIND_PID
fi
Some explanation is in order, I guess. Out the gate, we need to change some of the default signal handling. The trap uses :, which is a no-op command, rather than an empty string, because passing an empty string causes the shell to ignore the signal instead of doing something about it (the opposite of what we want to do).
Then, the find command is run in the background (from the script's perspective) and we call the wait builtin for it to finish. Since we gave a real command to trap above, when a signal is handled, wait will exit with a status greater than 128. If the process waited for completes, wait will return the exit status of that process.
Last, if the wait returns that error status, we want to kill the child process. Luckily we saved its PID. The advantage of this approach is that you can log some error message or otherwise identify that a signal caused the script to exit.
As others have mentioned, putting kill -- -$$ as your argument to trap is another option if you don't care about leaving any information around post-exit.
For trap to work the way you want, you do need to pair it up with wait - the bash man page says "If bash is waiting for a command to complete and receives a signal for which a trap has been set, the trap will not be executed until the command completes." wait is the way around this hiccup.
You can extend it to more child processes if you want, as well. I didn't really exhaustively test this one out, but it seems to work here.
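One way the extension to several children might look, as an untested-in-production sketch assuming bash: wait_or_kill is a hypothetical name. It cannot tell a trapped signal apart from a child that itself exited with a status above 128, so treat that check as a heuristic.

```shell
#!/usr/bin/env bash
# A real command (":") makes bash run a handler instead of ignoring the
# signal, which lets `wait` return early with a status above 128.
trap : TERM INT

# wait_or_kill PID...
# (hypothetical helper) Waits for each PID in turn. If a wait comes back
# with a status above 128 (a trapped signal arrived, or that child died
# from a signal), kills every listed PID and returns that status.
wait_or_kill() {
    local pid status
    for pid in "$@"; do
        status=0
        wait "$pid" || status=$?
        if [ "$status" -gt 128 ]; then
            kill "$@" 2>/dev/null
            return "$status"
        fi
    done
    return 0
}

# Example:
#   find / >/dev/null 2>&1 & first=$!
#   find /usr >/dev/null 2>&1 & second=$!
#   wait_or_kill "$first" "$second" || echo "interrupted, children killed" >&2
```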
$ ./test-k.sh &
[1] 12810
12810
$ kill 12810
$ ps -ef | grep find
$
Was looking for an elegant solution to this issue and found the following solution elsewhere.
trap 'kill -HUP 0' EXIT
My own man pages say nothing about what 0 means, but from digging around, it seems to mean the current process group. Since the script gets its own process group, this ends up sending SIGHUP to all the script's children, foreground and background.
Send a signal to the group.
So instead of kill 13231 do:
kill -- -13231
If you're starting from python then have a look at http://www.pixelbeat.org/libs/subProcess.py, which shows how to mimic the shell in starting and killing a group.
@Patrick's answer almost did the trick, but it doesn't work if the parent process of your current shell is in the same group (it kills the parent too).
I found this to be better:
trap 'pkill -P $$' EXIT
See here for more info.
Just add a line like this to your script:
trap "kill $$" SIGINT
You might need to change 'SIGINT' to 'INT' on your setup, but this will basically kill your process and all child processes when you hit Ctrl-C.
The thing you would need to do is trap the kill signal, kill the find command and exit.