Killing script with hardware control - bash

I have a script that controls an array of 12 V relays depending on certain parameters. For example, I am monitoring temperatures and pressures. If a temperature exceeds a certain value, a relay is pulled in to open vents and start fans. If the temperature drops to a certain value, the relays are released, the vents close, and the fans stop. The same goes for the pressures: a solenoid valve is opened and closed again depending on the pressure values.
All works fine and I am happy. The script (bash) is started at boot-up. However, sometimes the script dies mysteriously which leaves the relays in an "active" state.
Is there a way to ensure that the relays are reset to "not-active" or "unenergized" when the script dies?

Continuing from my comment, you can use trap to intercept any signal that would shut your script down (except SIGKILL and SIGSTOP, which cannot be caught) and run the commands required to reset the relays to the "not-active" or "unenergized" state before the process dies.
Using trap is quite easy. You simply set a trap at the top of your script listing the commands to be executed when a signal is caught. For simple commands you can do
trap 'command1; command2' SIGTERM SIGINT EXIT
to run command1 and command2 on receipt of any of the three listed signals. If you have a series of commands you need to execute, declare a function and then have trap execute the function on signal receipt, e.g.
cleanup () {
# any number of commands to run
}
trap cleanup SIGTERM SIGINT EXIT
See man 7 signal for additional information on standard signals. Consult man bash (or search on "using trap in bash") for additional information on trap.
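As a minimal sketch of how this could look for the relay script: set_relay is a hypothetical placeholder for whatever actually switches a relay, and the relay numbers 1 to 4 are an assumption. INT and TERM are turned into a normal exit here so that the EXIT trap runs the cleanup once; a handler that does not exit would let the script carry on after the signal.

```shell
#!/bin/bash
# Hypothetical helper: replace the body with whatever really drives a relay
# (a GPIO write, an I2C command, a vendor tool, ...).
set_relay() {
    echo "relay $1 -> $2"
}

# Release every relay. Assumes relays numbered 1..4; adjust to your hardware.
release_all_relays() {
    for relay in 1 2 3 4; do
        set_relay "$relay" off
    done
}

# The EXIT trap fires whenever the shell exits, including via "exit" in a handler.
trap release_all_relays EXIT
# Turn catchable termination signals into a normal exit (128 + signal number).
trap 'exit 130' INT
trap 'exit 143' TERM

# ... the monitoring loop would go here ...
```

This cannot help against SIGKILL, since no handler runs in that case.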

Related

Background process getting killed when its parent is terminated?

I have code that looks something like this
function doTheThing {
# a potentially infinite while loop...
}
# other stuff...
doTheThing &
trap "kill $!" SIGINT SIGTERM
Strangely, when I ctrl-C out of the parent process before the loop is done, I get a message that the process doesn't exist. Furthermore, if I get rid of the trap, I can't find the process with a ps -aF. It looks like the background process is getting killed when its parent is terminated, but my understanding was that wasn't supposed to happen. I just want to make sure that I can safely leave out the trap and not leave zombie processes everywhere.
The POSIX specification says that when you type the interrupt character (normally Control-C) the SIGINT is sent to the foreground process group. So as long as the background process is running in the same process group as the script that invoked it, it will receive the signal at the same time as the script process.
Shells generally use process groups to implement job control, and by default this is only enabled in interactive shells, not shells running scripts. There's no standard way to run a function in its own process group, but you could use setsid to run it in a new session, which is an even higher level of grouping than process groups. Then it wouldn't receive the interrupt.
You might still want to write a trap command that kills the function on EXIT, though.
doTheThing&
trap "kill $!" EXIT
since exiting the script doesn't automatically kill the rest of the process group.

Forwarding signals in bash script which is submitted on the cluster

I have a launch.sh script which I submit on the cluster with
bsub $settings < launch.sh
Simplified, the launch.sh bash script looks like this:
function trap_with_arg() {
func="$1" ; shift
for sig ; do
echo "$ES Installing trap for signal $sig"
trap "$func $sig" "$sig"
done
}
function signalHandler() {
# do stuff depending on what stage the script is in
}
# Setup the Trap
trap_with_arg signalHandler SIGINT SIGTERM SIGUSR1 SIGUSR2
./start.sh
mpirun process.sh
./end.sh
Where process.sh calls two binaries (as an example) as
./binaryA
./binaryB
My question is the following:
The cluster already sends SIGUSR1 (approx. 10min before SIGTERM) to the process (I think this is the bash shell running my launch.sh script).
At the moment I catch this signal in the launch.sh script and call a signal handler. The problem is that, as far as I know, the handler only gets executed after the running command has finished (e.g. mpirun process.sh or ./start.sh).
How can I forward these signals so that the commands/binaries exit gracefully, for example to process.sh? (As I have experienced, mpirun already forwards the signals it receives somehow. How does it do that?)
What is the proper way of forwarding signals, e.g. also to the binaries binaryA and binaryB?
I have no good idea how to do this. By running the commands in the background, or by creating a child process?
Thanks for some enlightenment :-)
From bash manual at http://www.gnu.org/software/bash/manual/html_node/Signals.html:
If Bash is waiting for a command to complete and receives a signal for which a trap has been set, the trap will not be executed until the command completes. When Bash is waiting for an asynchronous command via the wait builtin, the reception of a signal for which a trap has been set will cause the wait builtin to return immediately with an exit status greater than 128, immediately after which the trap is executed.
Thus, the solution seems to be to run the commands in the background and use wait:
something &
wait
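A rough sketch of how the background-plus-wait approach could be combined with forwarding is shown here. long_running is a hypothetical stand-in for something like mpirun process.sh, and the subshell that sends SIGUSR1 only simulates the cluster so that the example terminates on its own.

```shell
#!/bin/bash
# Hypothetical stand-in for a long-running command such as "mpirun process.sh".
long_running() {
    trap 'echo "child: got USR1, cleaning up"; exit 0' USR1
    while :; do sleep 1; done
}

forwarded=no
# Handler in the parent: note the signal and pass it on to the child.
trap 'forwarded=yes; kill -s USR1 "$child" 2>/dev/null' USR1

long_running &
child=$!

# Demo only: simulate the cluster sending SIGUSR1 to this script after 1 second.
( sleep 1; kill -s USR1 $$ ) &

# wait returns early (status > 128) when a trapped signal arrives,
# so keep waiting until the child has really gone.
while kill -0 "$child" 2>/dev/null; do
    wait "$child" || true
done
echo "parent: child finished, forwarded=$forwarded"
```

The child here is itself a shell loop, so it reacts only once its current sleep ends; a real binary would handle the signal according to its own handlers.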

shell script process termination issue

/bin/sh -version
GNU sh, version 1.14.7(1)
exitfn () {
# Restore signal handling for SIGINT
echo "exiting with trap" >> /tmp/logfile
rm -f /var/run/lockfile.pid # Growl at user,
exit # then exit script.
}
trap 'exitfn; exit' SIGINT SIGQUIT SIGTERM SIGKILL SIGHUP
The above is the function in my shell script.
I want it to be called under certain conditions, for example
when:
"kill -9" is issued against the PID of this script
"ctrl + z" is pressed while it is running in -x mode
the server reboots while the script is executing
In short, on any kind of interruption the script should perform some action,
e.g. rm -f /var/run/lockfile.pid
but the function above is not working properly; it works only when the terminal is closed or on "ctrl + c".
Kindly don't suggest to upgrade "bash / sh" version.
SIGKILL cannot be trapped by the trap command, or by any process. It is a guaranteed kill signal that, by its definition, cannot be trapped. Thus upgrading your sh/bash would not help anyway.
You can't trap kill -9. That's the whole point of it: to violently destroy processes that don't respond to other signals (there's a workaround for this, see below).
The server reboot should first deliver a signal to your script which should be caught with what you have.
As for CTRL-Z, that also delivers a signal, SIGTSTP (not SIGSTOP, which cannot be caught), so you may want to add that. It wouldn't normally be a reason to shut down your process, though, since it may then be put into the background and resumed (with bg).
As for what to do in those situations where your process dies without a catchable signal (like the -9 case), the program should check for that on startup.
By that, I mean lockfile.pid should store the actual PID of the process that created it (by using echo $$ >/var/run/myprog_lockfile.pid for example) and, if you try to start your program, it should check for the existence of that process.
If the process doesn't exist, or it exists but isn't the right one (based on name usually), your new process should delete the pidfile and carry on as if it was never there. If the old process both exists and is the right one, your new process should log a message and exit.
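The startup check described above might be sketched as follows. The pidfile path and the program name are placeholders, and comparing against the output of ps -o comm= is an assumption about how the script is started (the reported name can be the interpreter rather than the script).

```shell
#!/bin/sh
# Hypothetical pidfile location and program name; adjust both.
pidfile=${PIDFILE:-/tmp/myprog_lockfile.pid}
progname=${PROGNAME:-myprog}

# Succeeds only if the pidfile names a live process with the expected name.
already_running() {
    [ -f "$pidfile" ] || return 1
    oldpid=$(cat "$pidfile")
    # Reject empty or non-numeric contents.
    case $oldpid in ''|*[!0-9]*) return 1 ;; esac
    # kill -0 sends no signal; it only checks that the PID can be signalled.
    kill -0 "$oldpid" 2>/dev/null || return 1
    # ps -o comm= prints just the command name of that PID.
    [ "$(ps -p "$oldpid" -o comm= 2>/dev/null)" = "$progname" ]
}

if already_running; then
    echo "$progname is already running as PID $oldpid" >&2
    exit 1
fi

rm -f "$pidfile"        # stale or missing: carry on as if it was never there
echo $$ > "$pidfile"
```

Note that kill -0 also fails for a live process owned by another user, so this sketch assumes every instance runs under the same account.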

bash restart sub-process using trap SIGCHLD?

I've seen monitoring programs written either as scripts that periodically check process status using 'ps' or 'service status' (on Linux), or in C/C++ that fork and wait on the process...
I wonder if it is possible to use bash with trap and restart the sub-process when SIGCHLD is received?
I have tested a basic version on RedHat Linux with the following idea (and of course it didn't work...)
#!/bin/bash
set -o monitor # can someone explain this? discussion on Internet say this is needed
trap startProcess SIGCHLD
startProcess() {
/path/to/another/bash/script.sh & # the one to restart
while [ 1 ]
do
sleep 60
done
}
startProcess
The bash script being started just sleeps for a few seconds and exits for now.
Several issues observed:
When the shell starts in the foreground, SIGCHLD is handled only once. Does trap reset signal handling like signal()?
The script and its child seem to be immune to SIGINT, which means they cannot be stopped by ^C.
Since it could not be stopped, I closed the terminal. The script seems to have received HUP, and many zombie children were left.
When run in the background, the script caused the terminal to die.
... anyway, this does not work at all. I have to say I know too little about this topic.
Can someone suggest or give some working examples?
Are there scripts for such use?
How about using wait in bash, then?
Thanks
I can try to answer some of your questions but not all based on what I
know.
The line set -o monitor (or equivalently, set -m) turns on job
control, which is only on by default for interactive shells. This seems
to be required for SIGCHLD to be sent. However, job control is more of
an interactive feature and not really meant to be used in shell scripts
(see also this question).
Also keep in mind this is probably not what you intended to do
because once you enable job control, SIGCHLD will be sent for every
external command that exits (e.g. every time you run ls or grep or
anything, a SIGCHLD will fire when that command completes and your trap
will run).
I suspect the reason the SIGCHLD trap only appears to run once is
because your trap handler contains a foreground infinite loop, so your
script gets stuck in the trap handler. There doesn't seem to be a point
to that loop anyways, so you could simply remove it.
The script's "immunity" to SIGINT seems to be an effect of enabling
job control (the monitor part). My hunch is with job control turned on,
the sub-instance of bash that runs your script no longer terminates
itself in response to a SIGINT but instead passes the SIGINT through to
its foreground child process. In your script's case, the ^C (i.e. SIGINT)
simply acts like a continue statement in other programming languages,
since SIGINT will just kill the currently running sleep 60,
whereupon the while loop will immediately run a new sleep 60.
When I tried running your script and then killing it (from another
terminal), all I ended up with were two stray sleep processes.
Backgrounding that script also kills my shell for me, although
the behavior is not terribly consistent (sometimes it happens
immediately, other times not at all). It seems typing any keys other
than enter causes an EOF to get sent somehow. Even after the terminal
exits the script continues to run in the background. I have no idea
what is going on here.
Being more specific about what you want to accomplish would help. If
you just want a command to run continuously for the lifetime of your
script, you could run an infinite loop in the background, like
while true; do
some-command
echo some-command finished
echo restarting some-command ...
done &
Note the & after the done.
For other tasks, wait is probably a better idea than using job control
in a shell script. Again, it would depend on what exactly you are trying
to do.
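As a sketch of the wait-based alternative, the loop below restarts a worker each time it exits, without set -o monitor or a SIGCHLD trap. worker is a hypothetical stand-in for the script being supervised, and max_restarts is only there so the example terminates.

```shell
#!/bin/bash
# Hypothetical worker; stands in for /path/to/another/bash/script.sh.
worker() {
    sleep 1
}

max_restarts=2    # assumption for the demo; use "while true" to supervise forever
restarts=0
child=

# On INT/TERM, stop supervising and take the current worker down as well.
trap 'kill "$child" 2>/dev/null; exit 0' INT TERM

while [ "$restarts" -lt "$max_restarts" ]; do
    worker &
    child=$!
    # Block until this particular worker exits; $? is then its exit status.
    wait "$child"
    echo "worker $child exited with status $?"
    restarts=$((restarts + 1))
done
```

Because wait is interrupted by trapped signals, the INT/TERM handler runs promptly instead of waiting for the worker to finish.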

How to kill all children of the current shell on interrupt?

My scripts cdist-deploy-to and cdist-mass-deploy (from cdist configuration management) run interactively (i.e. are called by a user).
These scripts call a lot of scripts, which again call some scripts:
cdist-mass-deploy ...
cdist-deploy-to ...
cdist-explorer-run-global ...
cdist-dir ....
What I want is to exit / kill all scripts, as soon as cdist-mass-deploy is either stopped by control C (SIGINT) or killed with SIGTERM.
cdist-deploy-to can also be called interactively and should exhibit the same behaviour.
Using ps -ef... and co variants to find out all processes with the ppid looks like it could be quite unportable. Using $! does not work as in the deeper levels the children are no background processes.
I tried using the following code:
__cdist_kill_on_interrupt()
{
__cdist_tmp_removal
kill 0
exit 1
}
trap __cdist_kill_on_interrupt INT TERM
But this leads to ugly Terminated messages as well as to a segfault in the shells (dash, bash, zsh) and seems not to stop everything instantly anyway:
# cdist-mass-deploy -p ikq04.ethz.ch ikq05.ethz.ch
core: Waiting for cdist-deploy-to jobs to finish
^CTerminated
Terminated
Terminated
Terminated
Segmentation fault
So the question is, how to cleanly exit including all (sub-)children in a portable manner (bourne shell, no csh support needed)?
You don't need to handle ^C: it results in a signal being sent to the whole foreground process group, which kills all the processes that are not in the background. So you don't need to catch INT.
The only reason you get a Terminated when you kill them is that kill sends TERM by default, but that's reasonable if you are handling a TERM in the first place. You could use kill -INT 0 if you want to avoid the messages.
(responding with extra info)
If the child processes are run in the background, you can get their process ids just after you start them, using the $! special shell variable. Gather these together in a variable and just kill them all when you need to terminate.
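A small sketch of that bookkeeping, using only POSIX shell features, follows. The host names and the sleep commands are placeholders for the real background jobs.

```shell
#!/bin/sh
pids=""

# Kill every background job recorded in $pids, then reap them.
kill_children() {
    if [ -n "$pids" ]; then
        # Unquoted on purpose: $pids is a space-separated list of PIDs.
        kill $pids 2>/dev/null
    fi
    wait
}
trap 'kill_children; exit 1' INT TERM

# Hypothetical jobs, one per host; stand-ins for the real deploy commands.
for host in host1 host2 host3; do
    sleep 2 &
    pids="$pids $!"
done

# Wait for all jobs; a trapped INT or TERM interrupts this and runs the handler.
wait
```

This only reaches the direct background children; anything they start themselves has to be handled by their own traps.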
