Bash - starting and killing processes - bash

I need some advice on a "simple" bash script.
I want to start around 500 instances of a program "myprog", and kill all of them after x number of seconds
In short, I have a loop that starts the program in background, and after sleep x (number of seconds) pkill is called with the program name.
My questions are:
1. How can I verify that all 500 instances are running after 10 seconds? With a ps and grep combination plus counting, or is there another way?
2. How can I get a count of how many processes pkill (or a similar kill function) actually killed, so that I know no processes terminated before the time limit?
3. How can I redirect the output of pkill (or similar kill functions) so that it doesn't print the killed-process information, to avoid 500 lines of ./initializeTest: line 250: 7566 Terminated ./$myprog? Redirecting to /dev/null didn't do the trick.

In bash there is the ulimit command that controls the resources of a (sub)shell.
This, for example, is guaranteed to use at most 10 seconds of cpu time and then die:
(ulimit -t 10; ./do_something)
That doesn't answer your question but hopefully it is helpful.
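Note that ulimit -t limits CPU time, not wall-clock time, so a program that mostly sleeps or waits on I/O can run far longer than the limit. A minimal sketch of the behaviour, using a busy loop as a stand-in for ./do_something and a 1-second limit to keep the demo short:

```shell
#!/usr/bin/env bash
# The subshell burns CPU until the kernel stops it for exceeding its CPU-time limit.
# The busy loop is only a stand-in for ./do_something.
( ulimit -t 1; while :; do :; done )
status=$?

# A status above 128 means the subshell was terminated by a signal.
echo "exit status: $status"
```

For a wall-clock limit, the sleep-then-kill approaches in the other answers are the better fit.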

1,2. Use pgrep. I don't remember off the top of my head whether pgrep has -c parameter, so you might need to pipe that to wc -l.
3: That output is produced by your shell's job control. I think if you run that as a script (not in an interactive shell), there shouldn't be such output. For an interactive shell, there are a number of ways to turn it off, but they are shell-dependent, so refer to your shell's manual.

Well my 2 cents :
ps and grep can do the job. I found that kill -0 $pid is better, by the way :) (it tells you if a process is running or not)
You can use ps/grep or kill -0. For your problem, I will start all processes in the background and get their pid with $!, store them in an array or a list, then use kill -0 to get the status of all the processes.
use &> or 2>&1 as it is probably written on stderr
my2c
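A minimal sketch of the PID-array idea, assuming bash. Here sleep 30 stands in for ./myprog, 5 stands in for 500, and count_alive is a hypothetical helper name. Reaping the children with wait while stderr is redirected is a commonly used way to hide the "Terminated" job messages.

```shell
#!/usr/bin/env bash
# Sketch: start N background jobs, count the live ones with kill -0, then kill them.
# 'sleep 30' is a stand-in for ./myprog; the count of 5 is illustrative.

count_alive() {
    # Print how many of the given PIDs still accept signal 0.
    local pid n=0
    for pid in "$@"; do
        kill -0 "$pid" 2>/dev/null && n=$((n + 1))
    done
    echo "$n"
}

pids=()
for ((i = 0; i < 5; i++)); do
    sleep 30 &
    pids+=("$!")
done

echo "running: $(count_alive "${pids[@]}")"

killed=0
for pid in "${pids[@]}"; do
    # kill succeeds only if the signal could be sent, i.e. the process still existed.
    kill "$pid" 2>/dev/null && killed=$((killed + 1))
done

# Reap the children; the redirect keeps the shell's job messages off the terminal.
wait "${pids[@]}" 2>/dev/null
echo "killed: $killed, still running: $(count_alive "${pids[@]}")"
```

Because the PIDs are recorded at start-up, the count of successful kill calls tells you how many instances were still there at the time limit, without depending on the program name.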

To make sure that each process gets its fair share of 10 seconds before it is killed, I would wrap each command in its own background job with its own sleep and kill.
run_with_tmout() {
    # Avoid the variable name TMOUT: bash gives it a special meaning.
    local cmd=$1 secs=$2
    "$cmd" &
    local pid=$!
    sleep "$secs"
    # The process may already have exited, so ignore errors from kill.
    kill "$pid" 2>/dev/null
}
for ((i = 0; i < 500; i++)); do
    run_with_tmout ./myprog 10 &
done
# wait for all child processes to end
wait && echo "all done"
For a more complete example, see this example from bashcookbook.com which first checks if the process is still running, then tries kill -s SIGTERM before resorting to SIGKILL.
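The check, SIGTERM, then SIGKILL sequence described above could look roughly like this. It is a sketch, not the bashcookbook.com code; terminate is a hypothetical helper name and the grace period is an assumed parameter.

```shell
#!/usr/bin/env bash
# Sketch: ask a process to exit with SIGTERM, escalate to SIGKILL after a grace period.
# terminate is a hypothetical helper; the grace period is in seconds (default 5).
terminate() {
    local pid=$1 grace=${2:-5} i
    kill -0 "$pid" 2>/dev/null || return 0      # already gone, nothing to do
    kill -s TERM "$pid" 2>/dev/null
    for ((i = 0; i < grace * 10; i++)); do
        kill -0 "$pid" 2>/dev/null || return 0  # exited after SIGTERM
        sleep 0.1
    done
    kill -s KILL "$pid" 2>/dev/null             # still alive: force it
}

# Demo with 'sleep 30' standing in for ./myprog.
sleep 30 &
pid=$!
terminate "$pid" 2
wait "$pid" 2>/dev/null
echo "alive after terminate: $(kill -0 "$pid" 2>/dev/null && echo yes || echo no)"
```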

I have been using something like the following to get a list of pids.
PS=$(ps ax | sed s/^' '*// | grep java | grep program_name | cut -d' ' -f1)
Then I use kill $PS to stop them.
#!/bin/bash
PS=$(ps ax | sed s/^' '*// | grep java | grep program_name | cut -d' ' -f1)
kill $PS

Related

How to kill a process group with kill in bash?

I have a script which is much more complicated but I managed to produce a short script that exhibits the same problem.
I create a process and make it a session leader and then send SIGINT to it. The kill builtin doesn't fail but the process doesn't get killed either (i.e. the default behaviour for SIGINT is to kill). I tried with kill -INT -pid (which should be equivalent to what I do currently) and the /bin/kill command but the behaviour is the same.
The script is as follows:
#!/bin/bash
# Run in a new session so that I don't have to kill the shell
setsid bash -c "sleep 50" &
procs=$(ps --ppid $$ -o pid,pgid,command | grep 'sleep' | head -1)
if [[ -z "$procs" ]]; then
echo "Couldn't find process group"
exit 1
fi
PID=$(echo $procs | cut -d ' ' -f 1)
pgid=$(echo $procs | cut -d ' ' -f 2)
if ! kill -n SIGINT $pgid; then
echo "kill failed"
fi
echo "done"
ps -P $pgid
My expectation is that the last ps command shouldn't report anything (as kill didn't report failure and hence the process should have died) but it does.
I am looking for an explanation of the above noted behaviour and how I can kill a process group (i.e. both the bash and the sleep it starts -- the setsid line above) running in a separate session.
I think you'll find that sleep ignores SIGINT. Take a look at the signals of your sleep command and see. On my Linux box I find:
SigIgn: 0000000000000006
The second bit from the right is set (6 = 4 + 2 + 0), and from the above link:
--> 2 = SIGINT
Try sending a HUP, and you'll find it does kill the sleep.
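To check this from a script rather than by eye, the mask can be decoded with shell arithmetic. A Linux-only sketch; mask_has_signal and sigign_mask are hypothetical helper names, and the mask value is the one quoted above.

```shell
#!/usr/bin/env bash
# Linux only: decode the SigIgn mask that /proc/PID/status reports.
# mask_has_signal and sigign_mask are hypothetical helper names.

mask_has_signal() {
    # $1 = hexadecimal mask, $2 = signal number; bit (number - 1) marks that signal.
    local mask=$1 signum=$2
    (( (16#$mask >> (signum - 1)) & 1 ))
}

sigign_mask() {
    # Print the SigIgn field of the given PID.
    local key value
    while read -r key value; do
        if [ "$key" = "SigIgn:" ]; then
            echo "$value"
            return 0
        fi
    done < "/proc/$1/status"
    return 1
}

# The mask quoted above, 0000000000000006, has the bits for signals 2 and 3 set.
if mask_has_signal 0000000000000006 "$(kill -l INT)"; then
    echo "SIGINT is ignored"
fi
```

For a live process, mask_has_signal "$(sigign_mask "$pid")" 2 performs the same check. Separately, to signal a whole process group rather than a single process, kill takes a negative group ID, as in kill -HUP -- -"$pgid".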

nonblocking wait ${myPid} in bash [duplicate]

Is there any builtin feature in Bash to wait for a process to finish?
The wait command only allows one to wait for child processes to finish.
I would like to know if there is any way to wait for any process to finish before proceeding in any script.
A mechanical way to do this is as follows but I would like to know if there is any builtin feature in Bash.
while ps -p `cat $PID_FILE` > /dev/null; do sleep 1; done
To wait for any process to finish
Linux (doesn't work on Alpine, where BusyBox tail doesn't support --pid):
tail --pid=$pid -f /dev/null
Darwin (requires that $pid has open files):
lsof -p $pid +r 1 &>/dev/null
With timeout (seconds)
Linux:
timeout $timeout tail --pid=$pid -f /dev/null
Darwin (requires that $pid has open files):
lsof -p $pid +r 1m%s -t | grep -qm1 $(date -v+${timeout}S +%s 2>/dev/null || echo INF)
There's no builtin. Use kill -0 in a loop for a workable solution:
anywait(){
    for pid in "$@"; do
        # kill -0 sends no signal; it only tests whether the PID can be signalled.
        while kill -0 "$pid" 2>/dev/null; do
            sleep 0.5
        done
    done
}
Or as a simpler oneliner for easy one time usage:
while kill -0 PIDS 2> /dev/null; do sleep 1; done;
As noted by several commentators, if you want to wait for processes that you do not have the privilege to send signals to, you have to find some other way to detect whether the process is running to replace the kill -0 $pid call. On Linux, test -d "/proc/$pid" works; on other systems you might have to use pgrep (if available) or something like ps | grep "^$pid ".
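Combining the two checks with a deadline might look like the sketch below. wait_for_pid and pid_exists are hypothetical helper names, and the 0.2-second poll interval is arbitrary.

```shell
#!/usr/bin/env bash
# Sketch: poll until a PID disappears or a deadline passes.
# wait_for_pid returns 0 once the process is gone and 1 on timeout.

pid_exists() {
    if [ -d /proc/self ]; then
        [ -d "/proc/$1" ]          # Linux procfs: works regardless of who owns the process
    else
        kill -0 "$1" 2>/dev/null   # elsewhere: only reliable for processes we may signal
    fi
}

wait_for_pid() {
    local pid=$1 timeout=${2:-10} deadline
    deadline=$((SECONDS + timeout))
    while pid_exists "$pid"; do
        (( SECONDS >= deadline )) && return 1
        sleep 0.2
    done
    return 0
}

# Demo: a short-lived background process.
sleep 1 &
wait_for_pid "$!" 5 && echo "process finished"
```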
I found "kill -0" does not work if the process is owned by root (or other), so I used pgrep and came up with:
while pgrep -u root process_name > /dev/null; do sleep 1; done
This would have the disadvantage of probably matching zombie processes.
This bash script loop ends if the process does not exist, or it's a zombie.
PID=<pid to watch>
while s=`ps -p $PID -o s=` && [[ "$s" && "$s" != 'Z' ]]; do
sleep 1
done
EDIT: The above script was given below by Rockallite. Thanks!
My original answer below works for Linux, relying on procfs, i.e. /proc/. I don't know its portability:
while [[ ( -d /proc/$PID ) && ( -z `grep zombie /proc/$PID/status` ) ]]; do
sleep 1
done
This is not limited to the shell: traditionally, operating systems themselves offered no system call for watching the termination of a non-child process.
FreeBSD and Solaris have the handy pwait(1) utility, which does exactly what you want.
I believe, other modern OSes also have the necessary system calls too (MacOS, for example, implements BSD's kqueue), but not all make it available from command-line.
From the bash manpage
wait [n ...]
Wait for each specified process and return its termination status
Each n may be a process ID or a job specification; if a
job spec is given, all processes in that job's pipeline are
waited for. If n is not given, all currently active child processes
are waited for, and the return status is zero. If n
specifies a non-existent process or job, the return status is
127. Otherwise, the return status is the exit status of the
last process or job waited for.
Okay, so it seems the answer is -- no, there is no built in tool.
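The two return statuses described in the manpage excerpt can be seen in a short sketch; PID 1 is used here only as an example of a process that is not a child of the script.

```shell
#!/usr/bin/env bash
# A child of this shell: wait returns the child's exit status.
( exit 3 ) &
wait "$!"
child_status=$?

# PID 1 is not a child of this shell, so wait cannot wait for it.
wait 1 2>/dev/null
other_status=$?

echo "child: $child_status, non-child: $other_status"
```

The second status is 127, which is why wait cannot be used on arbitrary processes.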
After setting /proc/sys/kernel/yama/ptrace_scope to 0, it is possible to use the strace program. Further switches can be used to make it silent, so that it really waits passively:
strace -qqe '' -p <PID>
All these solutions are tested in Ubuntu 14.04:
Solution 1 (by using ps command):
Just to add to Pierz's answer, I would suggest:
while ps axg | grep -vw grep | grep -w process_name > /dev/null; do sleep 1; done
In this case, grep -vw grep ensures that grep matches only process_name and not grep itself. It has the advantage of supporting the cases where the process_name is not at the end of a line at ps axg.
Solution 2 (by using top command and process name):
while [[ $(awk '$12=="process_name" {print $0}' <(top -n 1 -b)) ]]; do sleep 1; done
Replace process_name with the process name that appears in top -n 1 -b. Please keep the quotation marks.
To print the processes you are still waiting for on each check, you can run:
while : ; do p=$(awk '$12=="process_name" {print $0}' <(top -n 1 -b)); [[ $p ]] || break; echo $p; sleep 1; done
Solution 3 (by using top command and process ID):
while [[ $(awk '$1=="process_id" {print $0}' <(top -n 1 -b)) ]]; do sleep 1; done
Replace process_id with the process ID of your program.
Blocking solution
Use wait in a loop to wait for all processes to terminate (this works only for child processes of the current shell):
function anywait()
{
    for pid in "$@"
    do
        wait "$pid"
        echo "Process $pid terminated"
    done
    echo 'All processes terminated'
}
This function returns as soon as all processes have terminated. This is the most efficient solution.
Non-blocking solution
Use kill -0 in a loop to wait for all processes to terminate, with the option of doing something between checks:
function anywait_w_status()
{
    for pid in "$@"
    do
        while kill -0 "$pid" 2>/dev/null
        do
            echo "Process $pid still running..."
            sleep 1
        done
    done
    echo 'All processes terminated'
}
The reaction time is limited by the sleep interval, which is needed to prevent high CPU usage.
A realistic usage:
Wait for all processes to terminate, and inform the user about the PIDs that are still running.
function anywait_w_status2()
{
    while true
    do
        alive_pids=()
        for pid in "$@"
        do
            # Append as an array element, not as a string.
            kill -0 "$pid" 2>/dev/null \
                && alive_pids+=("$pid")
        done
        if [ ${#alive_pids[@]} -eq 0 ]
        then
            break
        fi
        echo "Process(es) still running... ${alive_pids[*]}"
        sleep 1
    done
    echo 'All processes terminated'
}
Notes
These functions take the PIDs as arguments, via "$@".
Had the same issue, I solved the issue killing the process and then waiting for each process to finish using the PROC filesystem:
while [ -e /proc/${pid} ]; do sleep 0.1; done
There is no builtin feature to wait for any process to finish.
You could send kill -0 to any PID found, so you don't get puzzled by zombies and stuff that will still be visible in ps (while still retrieving the PID list using ps).
If you need to both kill a process and wait for it to finish, this can be achieved with killall(1) (based on process names) or start-stop-daemon(8) (based on a pidfile).
To kill all processes matching someproc and wait for them to die:
killall someproc --wait # wait forever until matching processes die
timeout 10s killall someproc --wait # timeout after 10 seconds
(Unfortunately, there's no direct equivalent of --wait with kill for a specific pid).
To kill a process based on a pidfile /var/run/someproc.pid using signal SIGINT, while waiting for it to finish, with SIGKILL being sent after 20 seconds of timeout, use:
start-stop-daemon --stop --signal INT --retry 20 --pidfile /var/run/someproc.pid
Use inotifywait to monitor some file that gets closed, when your process terminates. Example (on Linux):
yourproc >logfile.log & disown
inotifywait -q -e close logfile.log
-e specifies the event to wait for, -q means minimal output only on termination. In this case it will be:
logfile.log CLOSE_WRITE,CLOSE
A single inotifywait command can be used to wait for multiple processes:
yourproc1 >logfile1.log & disown
yourproc2 >logfile2.log & disown
yourproc3 >logfile3.log & disown
inotifywait -q -e close logfile1.log logfile2.log logfile3.log
The output string of inotifywait will tell you which process terminated. This only works with 'real' files, not with something in /proc/.
Rauno Palosaari's solution for Timeout in Seconds Darwin, is an excellent workaround for a UNIX-like OS that does not have GNU tail (it is not specific to Darwin). But, depending on the age of the UNIX-like operating system, the command-line offered is more complex than necessary, and can fail:
lsof -p $pid +r 1m%s -t | grep -qm1 $(date -v+${timeout}S +%s 2>/dev/null || echo INF)
On at least one old UNIX, the lsof argument +r 1m%s fails (even for a superuser):
lsof: can't read kernel name list.
The m%s is an output format specification. A simpler post-processor does not require it. For example, the following command waits on PID 5959 for up to five seconds:
lsof -p 5959 +r 1 | awk '/^=/ { if (T++ >= 5) { exit 1 } }'
In this example, if PID 5959 exits of its own accord before the five seconds elapse, ${?} is 0. If not, ${?} is 1 after five seconds.
It may be worth expressly noting that in +r 1, the 1 is the poll interval (in seconds), so it may be changed to suit the situation.
On a system like OSX you might not have pgrep, so you can try this approach when looking for processes by name:
while ps axg | grep process_name$ > /dev/null; do sleep 1; done
The $ symbol at the end of the process name ensures that grep matches only process_name to the end of line in the ps output and not itself.

find and kill process in ksh script (linux) not working

I have been trying to find and kill any stale process left after the stop in a ksh script on a Linux machine, and it doesn't seem to work. It works from the command line but not in the script.
Here is the code:
echo "kill any process still running"
ps -ef | grep qpasa |grep -v grep | awk '{print $2}' |xargs kill
and here is the output from the script log
usage: kill [ -s signal | -p ] [ -a ] pid ...
kill -l [ signal ]
Can you please let me know what I am doing wrong here?
I think you call the script when no processes are running. Try kill without arguments and you get the same message.
You can redirect the error to /dev/null but I would try something else:
ps -ef | grep qpasa |grep -v grep | awk '{print $2}' | while read pid; do
echo "Killing ${pid}"
kill ${pid}
sleep 2
kill -9 ${pid} 2>/dev/null
done
The first kill gives qpasa the chance to stop in a controlled way: flush caches and close handles. Give qpasa 2 seconds for that.
When qpasa ignores the signal, kill it the hard way. Of course the process could have stopped already, so this time we want to ignore error messages.
When you have a lot of qpasa processes, you want to sleep only once rather than 2 seconds per process.
First loop through all processes with a friendly kill, wait 5 seconds, and then hard kill the processes you still find. When you make a function kill_qpasa_signal for the looping (using $1 as the kill signal), you can use
kill_qpasa_signal 15
sleep 5
kill_qpasa_signal 9
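The answer does not show kill_qpasa_signal itself. One possible shape, as a sketch: the helper below is hypothetical, is generalised to take the process name as its first argument, and matches the command name exactly (via ps -o comm=) instead of grepping the full ps -ef line, so that it cannot match its own pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical helper: send one signal to every process whose command name is exactly $1.
# $2 is the signal, given as a name or number (for example TERM or 15).
kill_by_name_signal() {
    local name=$1 sig=$2 pid
    ps -e -o pid= -o comm= |
        awk -v n="$name" '$2 == n { print $1 }' |
        while read -r pid; do
            # The process may already be gone, so ignore errors.
            kill "-$sig" "$pid" 2>/dev/null
        done
    return 0
}

# Intended use, mirroring the three lines above (left commented out on purpose):
#   kill_by_name_signal qpasa 15
#   sleep 5
#   kill_by_name_signal qpasa 9
```

When nothing matches, the loop body never runs and no kill usage message is printed, which also avoids the error from the original question.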

Quit less when pipe closes

As part of a bash script, I want to run a program repeatedly, and redirect the output to less. The program has an interactive element, so the goal is that when you exit the program via the window's X button, it is restarted via the script. This part works great, but when I use a pipe to less, the program does not automatically restart until I go to the console and press q. The relevant part of the script:
while :
do
program | less
done
I want to make less quit itself when the pipe closes, so that the program restarts without any user intervention. (That way it behaves just as if the pipe was not there, except while the program is running you can consult the console to view the output of the current run.)
Alternative solutions to this problem are also welcome.
Instead of exiting less, could you simply aggregate the output of each run of program?
while :
do
program
done | less
Having less exit when program exits would be at odds with one useful feature of less, which is that it can buffer the output of a program that exits before you finish reading its output.
UPDATE: Here's an attempt at using a background process to kill less when it is time. It assumes that the only program reading the output file is the less to kill.
while :
do
( program > /tmp/$$-program-output; kill $(lsof -Fp /tmp/$$-program-output | cut -c2-) ) &
less /tmp/$$-program-output
done
program writes its output to a file. Once it exits, the kill command uses lsof to find out what process is reading the file, then kills it. Note that there is a race condition; less needs to start before program exits. If that's a problem, it can probably be worked around, but I'll avoid cluttering the answer otherwise.
You may try to kill the process group program and less belong to instead of using kill and lsof.
#!/bin/bash
trap 'kill 0' EXIT
while :
do
# script command gives sh -c own process group id (only sh -c cmd gets killed, not entire script!)
# FreeBSD script command
script -q /dev/null sh -c '(trap "kill -HUP -- -$$" EXIT; echo hello; sleep 5; echo world) | less -E -c'
# GNU script command
#script -q -c 'sh -c "(trap \"kill -HUP -- -$$\" EXIT; echo hello; sleep 5; echo world) | less -E -c"' /dev/null
printf '\n%s\n\n' "you now may ctrl-c the program: $0" 1>&2
sleep 3
done
While I agree with chepner's suggestion, if you really want individual less instances, I think this item from the man page will help you:
-e or --quit-at-eof
Causes less to automatically exit the second time it reaches end-of-file. By default,
the only way to exit less is via the "q" command.
-E or --QUIT-AT-EOF
Causes less to automatically exit the first time it reaches end-of-file.
You would make this option visible to less through the LESS environment variable:
export LESS="-E"
while : ; do
program | less
done
IHTH
