I have a bash script server.sh which is maintained by an external source and ideally should not be modified. This script writes to stdout and stderr.
In fact, this server.sh itself is doing an exec tclsh immediately:
#!/bin/sh
# \
exec tclsh "$0" ${1+"$@"}
so in fact, it is just a wrapper around a Tcl script. I just mention this in case you think that this matters.
I need a Tcl script setup.tcl which is supposed to do some preparatory work, then invoke server.sh (in the background), then do some cleanup work (and display the PID of the background process), and terminate.
server.sh is supposed to continue running until explicitly killed.
setup.tcl is usually invoked manually, either from a Cygwin bash shell or from a Windows cmd shell. In the latter case, it is ensured that Cygwin's bash.exe is in the PATH.
The environment is Windows 7 and Cygwin. The Tcl is either Cygwin's (8.5) or ActiveState 8.4.
The first version (omitting error handling) went like this:
# setup.tcl:
# .... preparatory work goes here
set childpid [exec bash.exe server.sh &]
# .... clean up work goes here
puts $childpid
exit 0
While this works when started as ActiveState Tcl from a Windows CMD shell, it does not work in a pure Cygwin setup. The reason is that as soon as setup.tcl ends, a signal is sent to the child process and this is killed too.
Using nohup would not help here, because I want to see the output of server.sh as soon as it occurs.
My next idea was to create an intermediate bash script, mediator.sh, which uses disown -h to detach the child process and keep it from being killed:
#!/usr/bin/bash
# mediator.sh
server.sh &
child=$!
disown -h $child
and invoke mediator.sh from setup.tcl. But aside from the fact that I don't see an easy way to pass the child PID up to setup.tcl, the main problem is that it doesn't work either: While mediator.sh indeed keeps the child alive when called from the Cygwin command line directly, we have the same behaviour again (server.sh being killed when setup.tcl exits), when I call it via setup.tcl.
Does anybody know a solution for this?
You'll want to set a trap handler in your server script so you can handle/ignore certain signals.
For example, to ignore HUP signals, you can do something like the following:
#!/bin/bash
handle_signal() {
    echo "Ignoring HUP signal"
}
trap handle_signal SIGHUP
# Rest of code goes here
In the example case, if the script receives a HUP signal, it prints a message and continues as normal. It will still die on Ctrl-C, because that sends the INT signal, which is not handled.
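To try the handler without closing a terminal, here is a minimal sketch that starts a background subshell, sends it HUP by hand, and checks that it is still alive. The function name demo_hup, the temporary log file, and the sleep intervals are illustrative assumptions, not anything taken from server.sh.

```shell
#!/bin/sh
# Hypothetical demo: a background subshell that handles HUP instead of dying.
demo_hup() {
    local log pid
    log=$(mktemp)
    (
        trap 'echo "Ignoring HUP signal" >> "$log"' HUP
        # Short sleeps: the shell runs a trap only after the current
        # foreground command has finished.
        while :; do sleep 0.2; done
    ) &
    pid=$!
    sleep 0.5                  # give the subshell time to install its trap
    kill -HUP "$pid"
    sleep 0.5                  # give the handler time to run
    if kill -0 "$pid" 2>/dev/null; then
        hup_survived=yes
    else
        hup_survived=no
    fi
    hup_message=$(cat "$log")
    kill -TERM "$pid"          # TERM is not trapped, so this ends the demo
    wait "$pid" 2>/dev/null || true
    rm -f "$log"
}

demo_hup
echo "survived HUP: $hup_survived ($hup_message)"
```

Note that a shell started with HUP already ignored (for example under nohup) cannot install a handler for it, so the message would not appear in that case.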
The way I normally start a long-running shell script is
% (nohup ./script.sh </dev/null >script.log 2>&1 & )
The redirections close stdin, and reopen stdout and stderr; the nohup stops HUP reaching the process when the owning process exits (I realise that the 2>&1 is somewhat redundant, since the nohup does something like this anyway); and the backgrounding within the subshell is the double-fork which means that the ./script.sh process's parent has exited while it's still running, so it acquires the init process as its parent.
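For reference, the idiom described above can be wrapped in a small function. This is only a sketch under my own naming: start_detached and the temporary log file are hypothetical, and the sh -c payload stands in for ./script.sh.

```shell
#!/bin/sh
# Hypothetical helper wrapping the "(nohup cmd </dev/null >log 2>&1 &)" idiom.
start_detached() {
    local log=$1
    shift
    # The inner & backgrounds the command; the enclosing subshell then exits
    # at once, so the command is orphaned and re-parented (typically to init).
    ( nohup "$@" </dev/null >"$log" 2>&1 & )
}

demo_log=$(mktemp)
start_detached "$demo_log" sh -c 'echo started; sleep 1; echo finished'
sleep 2                        # the caller is free to do other work meanwhile
cat "$demo_log"
```

The function returns immediately; the detached command keeps running and its output accumulates in the log file.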
That doesn't completely work, however, because when I exit the shell from which I've invoked this (typically, of course, I'm doing this on a remote machine), it doesn't exit cleanly. I can do ^C to exit, and this is OK – the process does carry on in the background as intended. However I can't work out what is/isn't happening to require the ^C, and that's annoying me.
The actions above seem to tick most of the boxes in the unix FAQ (question 1.7), except that I'm not doing anything to detach this process from a controlling terminal, or to make it a session leader. The setsid(2) call exists on FreeBSD, but not the setsid command; nor, as far as I can see, is there an obvious substitute for that command. The same is true on macOS, of course.
So, the questions are:
Is there a differently-named caller of setsid on this platform, that I'm missing?
What, precisely, is happening when I exit the calling shell, that I'm killing with the ^C? Is there any way this could bite me?
Related questions (eg 1, 2) either answer a slightly different question, or assume the presence of the setsid command.
(This question has annoyed me for years, but because what I do here doesn't actually not work, I've never before got around to investigating, getting stumped, and asking about it).
On FreeBSD, out of the box, you could use daemon ("run detached from the controlling terminal"). Its -r option could be useful:
-r Supervise and restart the program after a one-second delay if it
has been terminated.
You could also try a supervisor, for example immortal is available for both platforms:
pkg install immortal # FreeBSD
brew install immortal # macOS
To daemonize your script and log (stdout/stderr) you could use:
immortal /path/to/your/script.sh -l /tmp/script.log
Or for more options, you could create a my-service.yml for example:
cmd: /path/to/script
cwd: /your/path
env:
    DEBUG: 1
    ENVIROMENT: production
log:
    file: /tmp/app.log
stderr:
    file: /tmp/app-error.log
And then run it with immortal -c my-service.yml
More examples can be found here: https://immortal.run/post/examples
If you just want to use nohup and save stdout and stderr to a file, you could add this to your script:
#!/bin/sh
exec 2>&1
...
Read more about exec 2>&1 in this answer: https://stackoverflow.com/a/13088401/1135424
Then simply call nohup /your/script.sh & and check the file nohup.out. From the man page:
FILES
nohup.out The output file of the nohup execution if standard
output is a terminal and if the current directory is writable.
$HOME/nohup.out The output file of the nohup execution if standard
output is a terminal and if the current directory is not writable.
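To illustrate what exec 2>&1 does inside a script, here is a small sketch. The function noisy_script is a hypothetical stand-in for the body of your script; it runs in a subshell so the redirection does not leak into the caller.

```shell
#!/bin/sh
# Hypothetical stand-in for the body of script.sh.
noisy_script() (
    exec 2>&1                  # from here on, stderr goes wherever stdout goes
    echo "normal output"
    echo "error output" >&2
)

# Only stdout is captured here, yet both lines arrive in the variable.
merged=$(noisy_script)
printf '%s\n' "$merged"
```

With that line in place, a single redirection of stdout (which is what nohup sets up when it writes nohup.out) is enough to collect both streams.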
I have a script (let's call it parent.sh) that makes two calls to a second script (child.sh) that runs a Java process. The child.sh scripts are run in the background by placing an & at the end of the line in parent.sh. However, when I run parent.sh, I need to press Ctrl+C to return to the terminal screen. What is the reason for this? Does it have something to do with the child.sh processes running under the parent.sh process, so that parent.sh doesn't die until its children do?
parent.sh
#!/bin/bash
child.sh param1a param2a &
child.sh param1b param2b &
exit 0
child.sh
#!/bin/bash
java com.test.Main
echo "Main Process Stopped" | mail -s "WARNING-Main Process is down." user@email.com
As you can see, I don't want to run the java process in the background because I want to send a mail out when the process dies. Doing it as above works fine from a functional standpoint, but I would like to know how I can get it to return to the terminal after executing parent.sh.
What I ended up doing was to change parent.sh to the following:
#!/bin/bash
child.sh param1a param2a > startup.log &
child.sh param1b param2b > startup2.log &
exit 0
I would not have come to this solution without your suggestions and root cause analysis of the issue. Thanks!
And apologies for my inaccurate comment. (There was no input, I answered from memory and I remembered incorrectly.)
The following link from the Linux Documentation Project suggests adding a wait after your mail command in child.sh:
http://tldp.org/LDP/abs/html/x9644.html
Summary of the above document
Within a script, running a command in the background with an ampersand (&)
may cause the script to hang until ENTER is hit. This seems to occur with
commands that write to stdout. It can be a major annoyance.
....
....
As Walter Brameld IV explains it:
As far as I can tell, such scripts don't actually hang. It just
seems that they do because the background command writes text to
the console after the prompt. The user gets the impression that
the prompt was never displayed. Here's the sequence of events:
Script launches background command.
Script exits.
Shell displays the prompt.
Background command continues running and writing text to the
console.
Background command finishes.
User doesn't see a prompt at the bottom of the output, thinks script
is hanging.
If you change child.sh to look like the following you shouldn't experience this annoyance:
#!/bin/bash
java com.test.Main
echo "Main Process Stopped" | mail -s "WARNING-Main Process is down." user@gmail.com
wait
Or as @SebastianStigler states in a comment to your question above:
Add a > /dev/null at the end of the line with mail. mail will otherwise try to start its interactive mode.
This will cause the mail command to write to /dev/null rather than stdout which should also stop this annoyance.
Hope this helps
The process was still linked to the controlling terminal because STDOUT needs somewhere to go. You solved that problem by redirecting to a file ( > startup.log ).
If you're not interested in the output, discard STDOUT completely ( >/dev/null ).
If you're not interested in errors, either, discard both ( &>/dev/null ).
If you want the processes to keep running even after you log out of your terminal, use nohup — that effectively disconnects them from what you are doing and leaves them to quietly run in the background until you reboot your machine (or otherwise kill them).
nohup child.sh param1a param2a &>/dev/null &
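One way to see why the redirection matters: anything that reads a script's output (a command substitution, a pipe, an ssh session) waits for end-of-file, and end-of-file only arrives once every process holding the write end, background children included, has closed it. A minimal sketch, with sleep 3 as a hypothetical stand-in for child.sh and time_run as an illustrative helper name:

```shell
#!/bin/sh
# Measure how long a caller that captures stdout is blocked by a script
# that starts a background child.
time_run() {
    start=$(date +%s)
    out=$(sh -c "$1")          # returns only when the pipe's last writer is gone
    elapsed=$(( $(date +%s) - start ))
}

# The child inherits the pipe as its stdout, so the caller waits for it.
time_run 'sleep 3 &'
held=$elapsed

# The child's stdout points elsewhere, so the caller returns at once.
time_run 'sleep 3 >/dev/null &'
released=$elapsed

echo "inherited stdout: ${held}s, redirected stdout: ${released}s"
```

Whether this is exactly what happened in the original terminal session is an assumption on my part, but it is the usual reason a redirect like > startup.log makes the caller return promptly.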
/bin/sh -version
GNU sh, version 1.14.7(1)
exitfn () {
    # Restore signal handling for SIGINT
    echo "exiting with trap" >> /tmp/logfile
    rm -f /var/run/lockfile.pid    # Growl at user,
    exit                           # then exit script.
}
trap 'exitfn; exit' SIGINT SIGQUIT SIGTERM SIGKILL SIGHUP
The above is my function in shell script.
I want to call it in some special conditions...like
when:
"kill -9" fires on pid of this script
"ctrl + z" press while it is running on -x mode
server reboots while script is executing ..
In short, with any kind of interrupt in script, should do some action
eg. rm -f /var/run/lockfile.pid
but my above function is not working properly; it works only for terminal close or "ctrl + c"
Kindly don't suggest to upgrade "bash / sh" version.
SIGKILL cannot be trapped by the trap command, or by any process. It is a guaranteed kill signal that, by its definition, cannot be trapped. Thus upgrading your sh/bash will not help anyway.
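The difference is easy to observe. In the sketch below, a background worker removes a lock file from a TERM trap; the same worker killed with -9 never gets the chance. The names start_worker, lock_a and lock_b are hypothetical, and the temporary files stand in for /var/run/lockfile.pid.

```shell
#!/bin/sh
# Hypothetical worker: removes its lock file from a TERM trap.
start_worker() {
    lock=$1
    (
        trap 'rm -f "$lock"; exit 0' TERM
        # Short sleeps so a pending trap runs promptly.
        while :; do sleep 0.2; done
    ) &
    worker_pid=$!
    sleep 0.5                  # let the worker install its trap
}

lock_a=$(mktemp)               # mktemp creates the files, standing in for locks
lock_b=$(mktemp)

start_worker "$lock_a"
kill -TERM "$worker_pid"       # catchable: the trap cleans up
wait "$worker_pid" 2>/dev/null || true
[ -e "$lock_a" ] && after_term="lock left behind" || after_term="lock removed"

start_worker "$lock_b"
kill -KILL "$worker_pid"       # not catchable: no trap ever runs
wait "$worker_pid" 2>/dev/null || true
[ -e "$lock_b" ] && after_kill="lock left behind" || after_kill="lock removed"

echo "after TERM: $after_term"
echo "after KILL: $after_kill"
rm -f "$lock_a" "$lock_b"
```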
You can't trap kill -9. That's the whole point of it: to violently destroy processes that don't respond to other signals (there's a workaround for this, see below).
The server reboot should first deliver a signal to your script which should be caught with what you have.
As for CTRL-Z, that also sends a signal, SIGTSTP (not SIGSTOP, which, like SIGKILL, cannot be caught), so you may want to add that. It wouldn't normally be a reason to shut down your process, though, since the process may then be put into the background and resumed (with bg).
As to what to do for those situations where your process dies without a catchable signal (like the -9 case), the program should check for that on startup.
By that, I mean lockfile.pid should store the actual PID of the process that created it (by using echo $$ >/var/run/myprog_lockfile.pid for example) and, if you try to start your program, it should check for the existence of that process.
If the process doesn't exist, or it exists but isn't the right one (based on name usually), your new process should delete the pidfile and carry on as if it was never there. If the old process both exists and is the right one, your new process should log a message and exit.
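A minimal sketch of that startup check follows. The function name acquire_pidfile and the temporary PID file are hypothetical, and for brevity it only tests whether the recorded PID is alive; it skips the process-name comparison described above.

```shell
#!/bin/sh
# Hypothetical helper: claim a PID file unless a live process already owns it.
# Returns 0 after writing our PID, or 1 if another instance seems to be running.
acquire_pidfile() {
    pidfile=$1
    if [ -f "$pidfile" ]; then
        oldpid=$(cat "$pidfile")
        # kill -0 sends no signal; it only tests whether the PID exists
        # (it also fails for processes owned by another user).
        if [ -n "$oldpid" ] && kill -0 "$oldpid" 2>/dev/null; then
            echo "already running as PID $oldpid" >&2
            return 1
        fi
        rm -f "$pidfile"       # stale file left behind by a killed process
    fi
    echo "$$" > "$pidfile"
}

pidfile=$(mktemp)
sleep 0.1 & dead=$!
wait "$dead"                   # $dead is now a PID that no longer exists
echo "$dead" > "$pidfile"

acquire_pidfile "$pidfile" && echo "stale file replaced"
acquire_pidfile "$pidfile" || echo "second start refused"
rm -f "$pidfile"
```

The check and the write are not atomic, so two instances starting at the same instant could both pass; for a script started by hand that is usually acceptable.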
I've seen monitoring programs either as scripts that periodically check process status using 'ps' or 'service status' (on Linux), or in C/C++ that fork and wait on the process...
I wonder if it is possible to use bash with trap and restart the sub-process when SIGCHLD is received?
I have tested a basic suite on RedHat Linux with following idea (and certainly it didn't work...)
#!/bin/bash
set -o monitor # can someone explain this? discussion on Internet say this is needed
trap startProcess SIGCHLD
startProcess() {
    /path/to/another/bash/script.sh & # the one to restart
    while [ 1 ]
    do
        sleep 60
    done
}
startProcess
For now, the bash script being started just sleeps for a few seconds and exits.
several issues observed:
When the shell starts in the foreground, SIGCHLD is handled only once. Does trap reset the signal handling, like signal() does?
The script and its child seem to be immune to SIGINT, which means they cannot be stopped by ^C.
Since they could not be stopped, I closed the terminal. The script seems to have received HUP, and many zombie children were left behind.
When run in the background, the script caused the terminal to die.
... anyway, this does not work at all. I have to say I know too little about this topic.
Can someone suggest or give some working examples?
Are there scripts for such use?
How about using wait in bash, then?
Thanks
I can try to answer some of your questions but not all based on what I
know.
The line set -o monitor (or equivalently, set -m) turns on job
control, which is only on by default for interactive shells. This seems
to be required for SIGCHLD to be sent. However, job control is more of
an interactive feature and not really meant to be used in shell scripts
(see also this question).
Also keep in mind this is probably not what you intended to do
because once you enable job control, SIGCHLD will be sent for every
external command that exits (e.g. every time you run ls or grep or
anything, a SIGCHLD will fire when that command completes and your trap
will run).
I suspect the reason the SIGCHLD trap only appears to run once is
because your trap handler contains a foreground infinite loop, so your
script gets stuck in the trap handler. There doesn't seem to be a point
to that loop anyways, so you could simply remove it.
The script's "immunity" to SIGINT seems to be an effect of enabling
job control (the monitor part). My hunch is with job control turned on,
the sub-instance of bash that runs your script no longer terminates
itself in response to a SIGINT but instead passes the SIGINT through to
its foreground child process. In your script, the ^C, i.e. SIGINT,
simply acts like a continue statement in other programming languages,
since SIGINT will just kill the currently running sleep 60,
whereupon the while loop will immediately run a new sleep 60.
When I tried running your script and then killing it (from another
terminal), all I ended up with were two stray sleep processes.
Backgrounding that script also kills my shell for me, although
the behavior is not terribly consistent (sometimes it happens
immediately, other times not at all). It seems typing any keys other
than enter causes an EOF to get sent somehow. Even after the terminal
exits the script continues to run in the background. I have no idea
what is going on here.
Being more specific about what you want to accomplish would help. If
you just want a command to run continuously for the lifetime of your
script, you could run an infinite loop in the background, like
while true; do
    some-command
    echo some-command finished
    echo restarting some-command ...
done &
Note the & after the done.
For other tasks, wait is probably a better idea than using job control
in a shell script. Again, it would depend on what exactly you are trying
to do.
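As a rough sketch of the wait-based approach: the function below starts a command, blocks in wait until it exits, and starts it again. The name supervise and the start limit are my own assumptions; the limit exists only so the example terminates, and a real monitor would loop indefinitely.

```shell
#!/bin/sh
# Hypothetical supervisor: run a command, wait for it, restart it,
# up to a fixed number of starts so that the sketch terminates.
supervise() {
    max_starts=$1
    shift
    starts=0
    while [ "$starts" -lt "$max_starts" ]; do
        "$@" &
        child=$!
        starts=$((starts + 1))
        # wait blocks until this child exits; no SIGCHLD trap and no
        # job control are needed.
        status=0
        wait "$child" || status=$?
        echo "child $child exited with status $status (start $starts of $max_starts)"
    done
}

supervise 3 sh -c 'sleep 0.2; exit 7'
```

Because the loop sits in wait rather than in a trap handler, ^C still reaches the script in the usual way.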