How do I write a watchdog daemon in bash? - bash

I want a way to write a daemon in a shell script, which runs another application in a loop, restarting it if it dies.
When run using ./myscript.sh from an SSH session, it shall launch a new instance of the daemon, except if the daemon is already running.
When the SSH session ends, the daemon shall persist.
There shall be a parameter (./myscript -stop) that kills any existing daemon.
(Notes on edit - The original question specified that nohup and similar tools may not be used. This artificial requirement was an "XY question", and the accepted answer in fact uses all the tools the OP claimed were not possible to use.)

Based on clarifications in comments, what you actually want is a daemon process that keeps a child running, relaunching it whenever it exits. You want a way to type "./myscript.sh" in an ssh session and have the daemon started.
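#!/usr/bin/env bash
PIDFILE=~/.mydaemon.pid
if [ x"$1" = x-daemon ]; then
    # refuse to start if a daemon is already recorded
    if test -f "$PIDFILE"; then exit; fi
    echo $$ > "$PIDFILE"
    # always remove the pidfile on exit; on SIGTERM stop the child and leave the loop
    trap 'rm -f "$PIDFILE"' EXIT
    trap 'kill "$child" 2>/dev/null; exit' SIGTERM
    while true; do
        # launch your app here
        /usr/bin/server-or-whatever &
        child=$!
        wait "$child" # needed for trap to work
    done
elif [ x"$1" = x-stop ]; then
    kill "$(cat "$PIDFILE")"
else
    # background the daemon so the calling shell gets its prompt back
    nohup "$0" -daemon >/dev/null 2>&1 &
fi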
Run the script: it will launch the daemon process for you with nohup. The daemon process is a loop that watches for the child to exit, and relaunches it when it does.
To control the daemon, there's a -stop argument the script can take that will kill the daemon. Look at examples in your system's init scripts for more complete examples with better error checking.
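As an illustration of the "better error checking" mentioned above, the -stop branch might look something like this. It is only a sketch: stop_daemon is a made-up helper name, the five-second wait is an arbitrary choice, and it assumes the same pidfile convention as the script above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: stop the daemon recorded in a pidfile, with basic checks.
stop_daemon() {
    pidfile=$1
    if [ ! -f "$pidfile" ]; then
        echo "not running (no pidfile)" >&2
        return 1
    fi
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only tests whether the PID can be signalled.
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "stale pidfile, removing it" >&2
        rm -f "$pidfile"
        return 1
    fi
    kill "$pid"
    # Give the daemon up to about 5 seconds (an arbitrary choice) to exit.
    tries=0
    while kill -0 "$pid" 2>/dev/null && [ "$tries" -lt 5 ]; do
        sleep 1
        tries=$((tries + 1))
    done
    # Succeed only if the process is really gone.
    ! kill -0 "$pid" 2>/dev/null
}
```

It would be called as stop_daemon "$PIDFILE" in place of the bare kill. The return status tells the caller whether anything was actually stopped.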

The pid of the most recently "backgrounded" process is stored in $!
$ cat &
[1] 7057
$ echo $!
7057
I am unaware of a fork command in bash. Are you sure bash is the right tool for this job?

Related

Trying to close all child processes when I interrupt my bash script

I have written a bash script to carry out some tests on my system. The tests run in the background and in parallel. The tests can take a long time and sometimes I may wish to abort the tests part way through.
If I hit Control+C then it aborts the parent script, but leaves the various children running. I wish to make it so that I can hit Control+C (or otherwise quit) and kill all child processes running in the background. I have a bit of code that does the job if I'm running the background jobs directly from the terminal, but it doesn't work in my script.
I have a minimal working example.
I have tried using trap in combination with pgrep -P $$.
#!/bin/bash
trap 'kill -n 2 $(pgrep -P $$)' 2
sleep 10 &
wait
I was hoping that hitting control+c (SIGINT) would kill everything that the script started, but it actually says:
./breakTest.sh: line 1: kill: (3220) - No such process
This number changes, but doesn't seem to apply to any running processes, so I don't know where it is coming from.
I guess if the contents of the trap command get evaluated where the trap command occurs then it might explain the outcome. The 3220 pid might be for pgrep itself.
I'd appreciate some insight here
Thanks
I have found a solution using pkill. This example also deals with many child processes.
#!/bin/bash
trap 'pkill -P $$' SIGINT SIGTERM
for i in {1..10}; do
sleep 10 &
done
wait
This appears to kill all the child processes elegantly. Though I don't properly understand what the issue was with my original code, apart from sending the correct signal.
In bash, whenever you put & after a command, that command runs as a background job. Each job gets a job number, which goes up by one for each new job until you exit that terminal session, and you refer to a job with a jobspec: % followed by the job number (for example %1). You can use the jobs command to list the background jobs that are running. The jobs command also accepts options such as jobs -p to see the process IDs of all jobs, and jobs -p %JOB_SPEC to see the process ID of that particular job.
#!/usr/bin/env bash
trap 'kill -9 %1' 2
sleep 10 &
wait
or
#!/usr/bin/env bash
trap 'kill -9 $(jobs -p %1)' 2
sleep 10 &
wait
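Since jobs -p with no jobspec prints the PID of every job, the same idea stretches to any number of children. A rough sketch follows; cleanup is a hypothetical function name. It sends the default SIGTERM rather than SIGINT, because commands started with & while job control is off (the normal case in a script) ignore SIGINT.

```bash
#!/usr/bin/env bash
# Hypothetical cleanup handler: signal every background job this shell started.
cleanup() {
    pids=$(jobs -p)
    # $pids is left unquoted on purpose so each PID becomes its own argument.
    [ -n "$pids" ] && kill $pids 2>/dev/null
    return 0
}
# 130 = 128 + 2, the conventional exit status for "ended by SIGINT".
trap 'cleanup; exit 130' INT

for i in 1 2 3; do
    sleep 5 &
done
wait
```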
I implemented something like this a few years back; you can take a look at it: async bash
You can try something like the following:
pkill -TERM -P <your_parent_id_here>

Stop script when gnome session ends

In Start Script when Gnome Starts Up it was asked how to automatically start a script on gnome login. But how to automatically stop a long running script on logout, that was started on login? In my case there are two processes when I login twice. Interestingly the process started first does not reside under gnome-session anymore.
I would wrap the binary that gets executed in a simple bash script that saves the pid of the started process in a temporary file. If this file already exists it skips the start of the application. Since the file is saved in the /tmp directory everything gets deleted once you restart your computer.
#!/bin/bash
binary="git-cola"
temp_file="/tmp/my_${binary}_instance.pid"
if [[ -f ${temp_file} ]]
then
echo "PID exists"
else
exec ${binary} &
echo $! > ${temp_file}
fi
With a little more effort you can check if the pid of the process is still running and restart it on the login again (for example if the process crashed or the other user closed it).
I don't actually use Gnome, so I can't tell you whether there is a more elegant way to kill the process, such as a logout hook. But once you have the pid of the process saved, you can kill it with kill -9 PID. (See man kill for gentler ways to end the process.)
This might not solve stopping the process, but it does prevent it from starting twice.
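The "little more effort" for checking whether the saved pid is still running could look roughly like this. It is a sketch, not a tested Gnome solution: pid_is_alive and start_once are hypothetical names, and a recycled PID belonging to some unrelated process would still count as "alive".

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the pidfile names a live process.
pid_is_alive() {
    pidfile=$1
    [ -f "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    [ -n "$pid" ] || return 1
    # kill -0 delivers no signal; it only reports whether the PID exists
    # and may be signalled by this user.
    kill -0 "$pid" 2>/dev/null
}

# Hypothetical wrapper: start the command only when no live instance is recorded.
start_once() {
    pidfile=$1
    shift
    if pid_is_alive "$pidfile"; then
        echo "already running"
    else
        "$@" &
        echo $! > "$pidfile"
    fi
}
```

With the names from the script above, the call would be start_once "$temp_file" "$binary". A pidfile left behind by a crashed instance no longer blocks the next login.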

BASH script suspend/continue a process within script

In the bash script I am writing, I am trying to start a process (sleep) in the background and then suspend it. Finally, the process will be finished. For some reason, though, when I send the kill command with the stop signal, it just keeps running as if it received nothing. I can do this from the command line, but the bash script is not working as intended.
sleep 15&
pid=$!
kill -s STOP $pid
jobs
kill -s CONT $pid
You can make it work by enabling 'monitor mode' in your script: set -m
Please see why-cant-i-use-job-control-in-a-bash-script for further information
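Applied to the script from the question, that would be something like the following sketch. The sleep 1 pauses are my addition, to give the shell time to notice each state change before jobs is called, and the final kill is there only so the example does not wait out the full 15 seconds.

```bash
#!/usr/bin/env bash
set -m                 # monitor mode: turn on job control in this script
sleep 15 &
pid=$!
kill -s STOP "$pid"
sleep 1                # let the shell notice that the job has stopped
jobs                   # should now report the job as Stopped
kill -s CONT "$pid"
sleep 1
jobs                   # should report the job as Running again
kill "$pid"            # tidy up
```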

Preferred way to terminate `ssh -N` in background using bash?

I've started ssh -N <somehost> & in a bash script (to create a tunnel), and the connection persists after the script ends, and I see with ps that the ssh process has detached.
I am currently killing the background job with kill $(jobs -p), but is there a better way to do that?
Do you manually end your script?
If so: try to catch the QUIT signal (or others) inside your script (using the trap builtin command, I think), then kill ssh.
Otherwise: kill ssh at the end of your script.
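Both cases can be folded into one by trapping EXIT, which fires however the script ends. A minimal sketch, in which start_tunnel and tunnel_pid are hypothetical names:

```bash
#!/usr/bin/env bash
# Hypothetical helper: start a command in the background and make sure it is
# killed when this script exits, however the exit happens.
start_tunnel() {
    "$@" &
    tunnel_pid=$!
    # EXIT covers normal termination of the script.
    trap 'kill "$tunnel_pid" 2>/dev/null' EXIT
    # Turn INT/TERM/QUIT into an ordinary exit so the EXIT trap runs for them too.
    trap 'exit 1' INT TERM QUIT
}
```

In the script it would be used as start_tunnel ssh -N somehost, followed by whatever work needs the tunnel. No explicit kill is needed at the end.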

Terminate running commands when shell script is killed [duplicate]

This question already has answers here:
What's the best way to send a signal to all members of a process group?
For testing purposes I have this shell script
#!/bin/bash
echo $$
find / >/dev/null 2>&1
Running this from an interactive terminal, ctrl+c will terminate bash, and the find command.
$ ./test-k.sh
13227
<Ctrl+C>
$ ps -ef |grep find
$
Running it in the background, and killing the shell only will orphan the commands running in the script.
$ ./test-k.sh &
[1] 13231
13231
$ kill 13231
$ ps -ef |grep find
nos 13232 1 3 17:09 pts/5 00:00:00 find /
$
I want this shell script to terminate all its child processes when it exits regardless of how it's called. It'll eventually be started from a python and java application - and some form of cleanup is needed when the script exits - any options I should look into or any way to rewrite the script to clean itself up on exit?
I would do something like this:
#!/bin/bash
trap : SIGTERM SIGINT
echo $$
find / >/dev/null 2>&1 &
FIND_PID=$!
wait $FIND_PID
if [[ $? -gt 128 ]]
then
kill $FIND_PID
fi
Some explanation is in order, I guess. Out of the gate, we need to change some of the default signal handling. The trap uses : (a no-op command) rather than an empty string, because passing an empty string causes the shell to ignore the signal instead of doing something about it (the opposite of what we want to do).
Then, the find command is run in the background (from the script's perspective) and we call the wait builtin for it to finish. Since we gave a real command to trap above, when a signal is handled, wait will exit with a status greater than 128. If the process waited for completes, wait will return the exit status of that process.
Last, if the wait returns that error status, we want to kill the child process. Luckily we saved its PID. The advantage of this approach is that you can log some error message or otherwise identify that a signal caused the script to exit.
As others have mentioned, putting kill -- -$$ as your argument to trap is another option if you don't care about leaving any information around post-exit.
For trap to work the way you want, you do need to pair it up with wait - the bash man page says "If bash is waiting for a command to complete and receives a signal for which a trap has been set, the trap will not be executed until the command completes." wait is the way around this hiccup.
You can extend it to more child processes if you want, as well. I didn't really exhaustively test this one out, but it seems to work here.
$ ./test-k.sh &
[1] 12810
12810
$ kill 12810
$ ps -ef | grep find
$
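Extending this to more than one child might look like the sketch below. So that it can be run on its own, it signals itself after one second as a stand-in for an external kill. That self-signal, the self variable and the two sleep commands are assumptions made purely for the demonstration.

```bash
#!/usr/bin/env bash
trap : TERM INT            # a real (no-op) command, so a signal interrupts wait

sleep 30 & pid1=$!
sleep 30 & pid2=$!

# Stand-in for someone running `kill <script pid>` from outside.
self=$BASHPID
( sleep 1; kill -TERM "$self" ) &

wait "$pid1" "$pid2"
status=$?
if [ "$status" -gt 128 ]; then
    # wait was cut short by a trapped signal: status is 128 + signal number
    echo "interrupted by signal $((status - 128)), stopping children" >&2
    kill "$pid1" "$pid2" 2>/dev/null
fi
```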
Was looking for an elegant solution to this issue and found the following solution elsewhere.
trap 'kill -HUP 0' EXIT
My own man pages say nothing about what 0 means, but from digging around, it seems to mean the current process group. Since the script gets its own process group, this ends up sending SIGHUP to all the script's children, foreground and background.
Send a signal to the group.
So instead of kill 13231 do:
kill -- -13231
If you're starting from python then have a look at:
http://www.pixelbeat.org/libs/subProcess.py
which shows how to mimic the shell in starting
and killing a group
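To try the negative-PID form from a shell script instead, here is a small sketch. It uses set -m so that the background job gets a process group of its own, which a script run without job control would otherwise not give it. The inner bash -c command is just a stand-in for test-k.sh.

```bash
#!/usr/bin/env bash
set -m                     # job control on: each background job gets its own process group
bash -c 'sleep 100 & sleep 100 & wait' &
pgid=$!                    # the job leader's PID doubles as the process-group ID
sleep 1                    # let the inner shell start its children

kill -- "-$pgid"           # negative PID: signal every member of that group
wait "$pgid"
group_status=$?
echo "group leader exited with status $group_status"
```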
@Patrick's answer almost did the trick, but it doesn't work if the parent process of your current shell is in the same group (it kills the parent too).
I found this to be better:
trap 'pkill -P $$' EXIT
See here for more info.
Just add a line like this to your script:
trap "kill $$" SIGINT
You might need to change 'SIGINT' to 'INT' on your setup. Note that kill $$ signals only the script itself when you hit Ctrl-C; children in the foreground process group are ended by the SIGINT they receive from the terminal, not by this trap.
The thing you would need to do is trap the kill signal, kill the find command and exit.

Resources