How do I write a bash script to restart a service if it dies? - bash

I have a program that runs as a daemon, using the C function fork(). It creates a new instance that runs in the background, and the main instance exits after that.
What would be the best option to check if the service is running? I'm considering:
Create a file with the process ID of the program and check with a script whether that PID is still running.
Use ps | grep to find the program in the running process list.
Thanks.

I think it would be better to manage your process with supervisord or another process control system.
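For reference, a minimal supervisord program section might look like this sketch. The program name and path are placeholders, and the --no-daemon flag is hypothetical; the important point is that supervisord expects the managed command to stay in the foreground, so the fork()-based self-daemonizing would have to be disabled:
[program:yourProgram]
; command must stay in the foreground (no fork/self-daemonizing)
command=/path/to/yourProgram --no-daemon
autorestart=true
startsecs=5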

Create a cron job that runs every few minutes (or whatever you're comfortable with) and does something like this:
/path/to/is_script_stopped.sh && /path/to/script.sh
Write is_script_stopped.sh using any of the methods you suggested (a sketch follows below). If your script is stopped, cron will start it again; if not, it won't.
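A minimal sketch of is_script_stopped.sh, assuming the pidfile approach from the question; the pidfile path is a placeholder, and the script exits 0 (success) exactly when the service is stopped, which is what the && chaining above needs:
#!/bin/sh
# Exit 0 (stopped) if the recorded PID no longer belongs to a
# running process; exit 1 (still running) otherwise.
PIDFILE=/var/run/yourProgram.pid    # assumption: the daemon writes this
[ -f "$PIDFILE" ] || exit 0                         # no pidfile: stopped
kill -0 "$(cat "$PIDFILE")" 2>/dev/null && exit 1   # still running
exit 0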

To the question you asked in the headline:
This simple endless loop will restart yourProgram as soon as it fails:
#!/bin/bash
for ((;;))
do
    yourProgram
done
If your program depends on a resource which might fail, it would be wise to insert a short pause, so that a program failing millions of times per second does not eat up all system resources:
#!/bin/bash
for ((;;))
do
    yourProgram
    sleep 1
done
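If yourProgram can also exit cleanly and you only want to restart it after failures, an until loop is a variant worth sketching: it stops looping on the first exit status of 0:
#!/bin/bash
# Restart yourProgram only while it keeps failing; stop once it
# exits cleanly with status 0.
until yourProgram
do
    echo "yourProgram crashed with status $?, restarting..." >&2
    sleep 1
done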
To the question from the body of your post:
What would be the best option to check if the service is running?
If your ps has a -C option (like the Linux ps), prefer it over a ps ax | grep combination:
ps -C yourProgram
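If you want that as a yes/no check inside a script, the exit status can do the work. A sketch, assuming the Linux (procps) ps, whose -C selection exits nonzero when nothing matches; pgrep -x yourProgram is an equivalent alternative:
#!/bin/sh
# Exits 0 if yourProgram is running, 1 otherwise.
if ps -C yourProgram >/dev/null
then
    echo "yourProgram is running"
else
    echo "yourProgram is not running" >&2
    exit 1
fi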

Related

Crontab closing background process?

I am testing a bash script I hope to run as a cron job to scan a download log and perform labor-intensive conversions on image files. In order to run several conversions at once, the first script loops through the download log and sends the filename to the second script, which I set to run as a background process using &.
The script pair works well, but when the process is complete, I must press the enter key to return to a command prompt. This is a non-issue when I am running a test, but I am not sure if this behavior has ramifications when run as a cron job.
Will this be an issue? If so, is there a way to close the "terminal" running the first script from the crontab?
Here's a truncated form of my code:
Script 1 (to be launched by crontab):
for i in file1 file2 file3 etc
do
    bash /path/to/convert.sh "$i" &
done
exit 0
Script 2 (convert.sh):
fileName=${1?no file given}
jpegName=$(echo "$fileName" | sed 's/tif/jpg/g')
convert "$fileName" "$jpegName"
exit 0
Thanks for any help/assurances you can give!
You don't need script 2; you can convert it to a function and put it inside script 1.
Another problem is that you are running convert.sh in an uncontrolled way: you cannot foresee how many background processes will be created, and this may lead to severe performance overhead. A sketch addressing both points follows below.
Finally, if you cannot end the process in a normal way, you may choose to terminate it from cron by issuing pkill script1.sh.
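A sketch of that combined script; the MAX_JOBS cap and the file list are placeholders:
#!/bin/bash
# convert.sh folded into a function, with a cap on how many
# conversions run in the background at once.
MAX_JOBS=4    # assumption: tune to your CPU/disk capacity

convert_file() {
    fileName=${1?no file given}
    jpegName=${fileName//tif/jpg}
    convert "$fileName" "$jpegName"
}

for i in file1 file2 file3    # placeholder file list
do
    convert_file "$i" &
    # throttle: pause while MAX_JOBS conversions are already running
    while [ "$(jobs -rp | wc -l)" -ge "$MAX_JOBS" ]
    do
        sleep 1
    done
done
wait    # let the remaining background conversions finish
exit 0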

How to run a command after a certain time while another command is running?

I have a bash script which will be running a main command, let's say for one hour. I would like to execute another command after a certain time since the main command has been started (at t_x). Something like this:
main starts -------> main ends
|
|
at time t_x, second command is executed
At the moment I have something like this:
mpirun main_command & sleep 1m && second_command
and the problem is that after the second command is executed, the main command is killed. Can anyone help me? Thanks!
The first command is failing to lock the console, as another process is also using it. You will need to redirect the standard I/O pipelines:
0<&- mpirun main_command >/dev/null 2>/dev/null
If this still does not work, use:
sh -c 'mpirun main_command' & sleep 1m; second_command
You can use ; instead of &&, unless you need a failing exit code when someone interrupts the sleep.
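A simpler pattern that avoids killing the main command, sketched here on the assumption that the two commands are otherwise independent: put the main command in the background yourself, remember its PID, run the timed command, then wait:
#!/bin/bash
# Run main_command in the background, fire second_command after a
# delay, then block until main_command finishes.
mpirun main_command &
main_pid=$!

sleep 1m          # t_x: the delay before the second command
second_command

wait "$main_pid"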

How can I tell if a script was run in the background and with nohup?

I've got a script that takes quite a long time to run, as it has to handle many thousands of files. I want to make this script as foolproof as possible. To this end, I want to check if the user ran the script using nohup and '&', e.g.
me#myHost:/home/me/bin $ nohup doAlotOfStuff.sh &
I want to make 100% sure the script was run with nohup and '&', because it's a very painful recovery process if the script dies in the middle for whatever reason.
How can I check those two key parameters inside the script itself? And if they are missing, how can I stop the script before it gets any farther, and complain to the user that they ran the script wrong? Better yet, is there a way I can force the script to run with nohup and &?
Edit: the server environment is AIX 7.1
The ps utility can get the process state. The process state code will contain the character + when the process is running in the foreground; the absence of + means it is running in the background.
However, it will be hard to tell whether a background script was invoked using nohup. It's also almost impossible to rely on the presence of nohup.out, as the output can be redirected elsewhere by the user at will.
There are two ways to accomplish what you want to do: either bail out and warn the user, or automatically restart the script in the background.
#!/bin/bash
mypid=$$
if [[ $(ps -o stat= -p "$mypid") =~ "+" ]]; then
    echo "Running in foreground."
    exec nohup "$0" "$@" &
    exit
fi
# the rest of the script
...
In this code, if the process has a state code +, it prints a warning and then restarts itself in the background. If the process was started in the background, it just proceeds to the rest of the code.
If you prefer to bail out and just warn the user, you can remove the exec line. Note that the exit is still needed here: since the exec line ends with &, the exec happens in a subshell, and the original foreground process keeps going until it exits on its own.
One good way to find out whether a script is logging to nohup.out is to echo a marker and then make sure you can read it back from the file. For example:
echo "complextag"
if ( $(cat nohup.out | grep "complextag" ) != "complextag" );then
# various commands complaining to the user, then exiting
fi
This works because if the script's stdout is going to nohup.out, where it should be going (or whatever output file you specified), then when you echo that phrase it should be appended to the file. If it doesn't appear there, the script was not run using nohup and you can scold the user, perhaps with a wall command on a temporary broadcast file. (If you want me to elaborate on that, I can.)
As for being run in the background, the same nohup.out check should tell you what you need: if the output is reaching the file rather than the terminal, the script isn't tied to a foreground session.
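As an additional portable check (not from the answers above): nohup redirects stdout away from the terminal, so the POSIX test -t can serve as a proxy on AIX or anywhere else. Note that it also triggers on plain redirection without nohup:
#!/bin/sh
# [ -t 1 ] succeeds only while stdout is a terminal; under nohup
# (or any other redirection) it fails.
if [ -t 1 ]; then
    echo "please run this as: nohup $0 &" >&2
    exit 1
fi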

shell: clean up leaked background processes which hang due to shared stdout/stderr

I need to run essentially arbitrary commands on a (remote) shell in ephemeral containers/VMs for a test execution engine. Sometimes these leak background processes which then cause the entire command to hang. This can be boiled down to this simple command:
$ sh -c 'sleep 30 & echo payload'
payload
$
Here the backgrounded sleep 30 plays the role of a leaked process (which in reality will be something like dbus-daemon) and the echo is the actual thing I want to run. The sleep 30 & echo payload should be considered as an atomic opaque example command here.
The above command is fine and returns immediately as the shell's and also sleep's stdout/stderr are a PTY. However, when capturing the output of the command to a pipe/file (a test runner wants to save everything into a log, after all), the whole command hangs:
$ sh -c 'sleep 30 & echo payload' | cat
payload
# ... does not return to the shell (until the sleep finishes)
Now, this could be fixed with some rather ridiculously complicated shell magic which determines the FDs of stdout/err from /proc/$$/fd/{1,2}, iterating over ls /proc/[0-9]*/fd/* and killing every process which also has the same stdout/stderr. But this involves a lot of brittle shell code and expensive shell string comparisons.
Is there a way to clean up these leaked background processes in a more elegant and simpler way? setsid does not help:
$ sh -c 'setsid -w sh -c "sleep 30 & echo payload"' | cat
payload
# hangs...
Note that process groups/sessions and killing them wholesale isn't sufficient as leaked processes (like dbus-daemon) often setsid themselves.
P.S. I can only assume POSIX shell or bash in these environments; no Python, Perl, etc.
Thank you in advance!
We had this problem with parallel tests in Launchpad. The simplest solution we had then - which worked well - was just to make sure that no processes share stdout/stdin/stderr (except ones where you actually want to hang if they haven't finished - e.g. the test workers themselves).
Hmm, having re-read this I cannot give you the solution you are after (use systemd to kill them). What we came up with is to simply ignore the processes but reliably not hang when the single process we were waiting for is done. Note that this is distinctly different from the pipes getting closed.
Another option, not perfect but useful, is to become a local reaper with prctl(2) and PR_SET_CHILD_SUBREAPER. This makes you the parent of all the processes that would otherwise reparent to init, and with that arrangement you can try to kill every process that has you as its ppid. It's terrible, but it's the next best thing to using cgroups.
But note that unless you are running this helper as root, practical testing might spawn some setuid thing that will lurk around and won't be killable. It's an annoying problem, really.
Use script -qfc instead of sh -c. script runs the command under a pseudo-terminal of its own, so a leaked background process holds that PTY open rather than your pipe, and the pipe closes as soon as script itself exits.
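For example, adapting the test case from the question (this assumes the util-linux script(1), where -q is quiet, -f flushes output, -c runs a command, and the trailing /dev/null discards the typescript file):
$ script -qfc 'sleep 30 & echo payload' /dev/null | cat
payload
$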

How to run shell script on VM indefinitely?

I have a VM that I want running indefinitely. The server is always running but I want the script to keep running after I log out. How would I go about doing so? Creating a cron job?
In general the following steps are sufficient to convince most Unix shells that the process you're launching should not depend on the continued existence of the shell:
run the command under nohup
run the command in the background
redirect all file descriptors that normally point to the terminal to other locations
So, if you want to run command-name, you should do it like so:
nohup command-name >/dev/null 2>/dev/null </dev/null &
This tells the process that will execute command-name to send all stdout and stderr to nowhere (instead of to your terminal) and also to read stdin from nowhere (instead of from your terminal). Of course if you actually have locations to write to/read from, you can certainly use those instead -- anything except the terminal is fine:
nohup command-name >outputFile 2>errorFile <inputFile &
See also the answer in Petur's comment, which discusses this issue a fair bit.
