Runit exits with error if process tells itself to go down - bash

I'm seeing some unexpected behavior with runit and not sure how to get it to do what I want without throwing an error during termination. I have a process that sometimes knows it should stop itself and not let itself be restarted (thus should call sv d on itself). This works if I never change the user but produces errors if I switch to a non-root user when running.
I'll use the same finish script for both examples:
#!/bin/bash -e
echo "downtest finished with exit code $1 and exit status $2"
The run script that works as expected (prints downtest finished with exit code 0 and exit status 0 to syslog):
#!/bin/bash -e
exec 2>&1
echo "running downtest"
sv d downtest
exit 0
The run script that doesn't work as expected (prints downtest finished with exit code -1 and exit status 15 to syslog):
#!/bin/bash -e
exec 2>&1
echo "running downtest"
chpst -u ubuntu sudo sv d downtest
exit 0
I get the same result if I use su ubuntu instead of chpst.
Any ideas on why I see this behavior and how to fix it so calling sudo sv d downtest results in a clean process exit rather than returning error status codes?

sv d sends a SIGTERM if the process is still running. SIGTERM is signal 15, which is exactly the exit status your finish script is reporting.
By contrast, to tell runsv that a running program should not be started again after it exits on its own (thus giving it the chance to exit cleanly), use sv o (once) instead.
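Applied to the run script from the question, that would look something like this (a sketch; the same ./supervise permission caveats apply if you run sv as a non-root user):
#!/bin/bash -e
exec 2>&1
echo "running downtest"
# mark the service "once": runsv will not restart it after this run exits
sv o downtest
exit 0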
Alternately, you can trap SIGTERM in your script when you're expecting it:
trap 'exit 0' TERM
If you want to make this conditional:
trap 'if [[ $ignore_sigterm ]]; then exit 0; fi' TERM
...and then run
ignore_sigterm=1
before triggering sv d.
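Putting those pieces together in the question's run script might look like this (a sketch; the actual work the script does is elided):
#!/bin/bash -e
exec 2>&1
ignore_sigterm=
trap 'if [[ $ignore_sigterm ]]; then exit 0; fi' TERM
echo "running downtest"
# ... do the actual work here ...
ignore_sigterm=1
sv d downtest # the SIGTERM this sends is now turned into a clean exit 0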

As a workaround, try running the command in a subshell: (chpst -u ubuntu sudo sv d downtest). That lets the script reach the final exit 0, which currently never runs because the script is killed first.
#!/bin/sh
exec 2>&1
echo "running downtest"
(sudo sv d downtest)
exit 0
Indeed, you don't need chpst -u ubuntu just to stop the process. If you want to stop or control the service as another user, you only need to adjust the permissions on the ./supervise directory; that is probably why you are getting the exit code -1.
From the runsv man page:
Two arguments are given to ./finish. The first one is ./run’s exit code, or -1 if ./run didn’t exit normally. The second one is the least significant byte of the exit status as determined by waitpid(2); for instance it is 0 if ./run exited normally, and the signal number if ./run was terminated by a signal. If runsv cannot start ./run for some reason, the exit code is 111 and the status is 0.
And from the faq:
Is it possible to allow a user other than root to control a service?
Using the sv program to control a service, or query its status information, only works as root. Is it possible to allow non-root users to control a service too?
Answer: Yes, you simply need to adjust file system permissions for the ./supervise/ subdirectory in the service directory. E.g.: to allow the user burdon to control the service dhcp, change to the dhcp service directory, and do
# chmod 755 ./supervise
# chown burdon ./supervise/ok ./supervise/control ./supervise/status
If you would like a full stop/start, you could remove the symlink to your service directory, but that means you will have to create it again when you want the service back up.
Just in case: because of this and other cases, I came up with immortal to simplify the stop/start/restart/retries of services without root privileges. It is fully based on daemontools & runit, just adapted to some new workflows.

Related

Break not exit loop when it's already executed in BASH

I am trying to make a bash script which will do a command when a condition is met.
#!/bin/bash
/bin/journalctl -f -u service_one.service | while read LOGLINE
do
[[ "${LOGLINE}" == *"Authenticated"* ]] && /bin/systemctl restart serviceone_recorder.service && echo "Service_one is ready, Restarting Recorder Service" && break
done
/bin/journalctl -f -u service_two.service | while read LOGLINEC
do
[[ "${LOGLINEC}" == *"Server:main: Started"* ]] && /bin/systemctl restart servicetwo_recorder.service && echo "Service two is ready, Restarting Recorder Service" && break
done
echo "Done!"
So the first loop works well and break does its job, but the second loop doesn't: it doesn't exit even after break is executed.
I have tried running the script with bash -x (for tracing) and I can see that break is executed:
+ read LOGLINEC
+ [[ Nov 02 22:00:03 debian9 AM[26336]: 2022-11-02 22:00:03.984:INFO:oejs.Server:main: Started #164179ms == *\S\e\r\v\e\r\:\m\a\i\n\:\ \S\t\a\r\t\e\d* ]]
+ /bin/systemctl restart tassta_recorder.service
+ echo 'Service two is ready, Restarting Recorder Service'
Service two is ready, Restarting Recorder Service
+ break
and it's stuck forever on that loop.
Maybe someone could help me?
As I suggested in the comments, the problem isn't that the break isn't working, it's that the /bin/journalctl ...| while read pipeline doesn't finish until both journalctl and the while loop exit. The while loop exits due to break, but journalctl keeps running until the next time it tries to write to the (now-closed) pipe, gets a SIGPIPE signal, and that makes it exit. This may take a while, and in the meantime, the overall script just hangs waiting for it.
I haven't adequately tested this, but I think you can mostly solve it by running journalctl in the background, like this:
{ /bin/journalctl -f -u service_two.service 2>/dev/null & } | while read LOGLINEC
...but this does have the problem that it will leave journalctl running in the background, possibly for quite a while. I did add a redirect for errors, so if it gets an error later it won't randomly show up while you're doing something else.
(Note: the usual solution to problems like this is to store the PID of the background process in a variable, and kill it afterward; but here it's created in a subshell, and its variables won't be available in the parent shell. I suppose you could store the PID in a temp file...)
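For completeness, here is one way that temp-file idea could look (a sketch, as lightly tested as the rest; pidfile is an arbitrary name):
pidfile=$(mktemp)
{ /bin/journalctl -f -u service_two.service 2>/dev/null &
  echo $! > "$pidfile"; } | while read LOGLINEC
do
    [[ "${LOGLINEC}" == *"Server:main: Started"* ]] && /bin/systemctl restart servicetwo_recorder.service && break
done
# the PID was written from the subshell, but the file is readable here
kill "$(cat "$pidfile")" 2>/dev/null
rm -f "$pidfile"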

syntax for identifying a failed service on Ubuntu

A server has nginx falling over frequently and needs a sudo service nginx restart executed.
A suggestion has been the following bash script:
service nginx status | grep 'active (running)' > /dev/null 2>&1
if [ $? != 0 ]
then
sudo service nginx restart > /dev/null
fi
Being thoroughly unversed in bash, there are two propositions that are opaque to me and require clarification:
> /dev/null 2>&1
and
[ $? != 0 ]
Because the response to service nginx status returns a clear statement:
Active: failed (Result ... and thus I would intuitively devise the if statement to focus on failed ...
Being thoroughly unversed in bash,
(It is probably time to do a bash scripting tutorial then. Note that this code will probably work with any POSIX compliant shell, not just bash. So an sh tutorial would do too.)
... there are two propositions that are opaque to me and require clarification:
> /dev/null 2>&1
That means "write stdout to /dev/null and write stderr (2) to the same place as stdout (1)". In short, throw away the output from grep.
and
[ $? != 0 ]
$? expands to the exit code of the last command, so this means "test if the last command exited with a non-zero exit code", i.e. if it failed.
In the case of a pipeline, the last command in the pipeline supplies the exit code. In this case, it will be the grep, which is specified to give a non-zero exit code if it doesn't find any matching lines.
Because the response to service nginx status returns a clear statement: Active: failed (Result ... and thus I would intuitively devise the if statement to focus on failed ...
Well, that doesn't take account of the possibility that service nginx status doesn't return any output for some reason. It is unlikely that will happen, but this version takes account of that. Also, the actual output of the systemd script for nginx status is most likely not specified. It might change and that would break this script.
Anyway ... there are many ways to implement something like this. This way works, and that's all that really matters.
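For example, a variant that avoids parsing the status text entirely could rely on systemctl's exit code (a sketch assuming a systemd-based Ubuntu; is-active exits non-zero when the unit is not running):
# --quiet suppresses output; the state is reported via the exit code alone
if ! systemctl is-active --quiet nginx; then
    sudo systemctl restart nginx > /dev/null
fi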
Alternatively, you can change the systemd unit to restart the service always or on-failure:
https://www.freedesktop.org/software/systemd/man/systemd.service.html
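For instance, a drop-in override could look like this (a sketch; the file path and values are illustrative):
# /etc/systemd/system/nginx.service.d/override.conf
[Service]
Restart=on-failure
RestartSec=5
After creating it, run sudo systemctl daemon-reload so systemd picks up the change.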

Check processes run by cronjob to avoid multiple execution

How do I stop a cron job from executing the same command multiple times? I tried to check for and kill existing processes, but it doesn't work with the code below: it keeps entering the else branch when it is supposed to detect that the script is already running and print "running". Any idea which part I got wrong?
#!/bin/sh
devPath=`ps aux | grep "[i]mport_shell_script"` | xargs
if [ ! -z "$devPath" -a "$devPath" != " " ]; then
echo "running"
exit
else
while true
do
sudo /usr/bin/php /var/www/html/xxx/import_from_datafile.php /dev/null 2>&1
sleep 5
done
fi
exit
cronjob:
*/2 * * * * root /bin/sh /var/www/html/xxx/import_shell_script.sh /dev/null 2>&1
I don't see the point of adding a cron job which then starts a loop that runs the job. Either use cron to run the job every minute, or use a daemon script to make sure your service is started and kept running.
To check whether your script is already running, you can use a lock directory (unless your daemon framework already does that for you):
LOCK=/tmp/script.lock # You may want a better name here
mkdir $LOCK || exit 1 # Exit with error if script is already running
trap "rmdir $LOCK" EXIT # Remove the lock when the script terminates
...normal code...
If your OS supports it, then /var/lock/script might be a better path.
Your next question is probably how to write a daemon. To answer that, I need to know what kind of Linux you're using and whether you have things like systemd, daemonize, etc.
Check for the presence of a file at the beginning of your script (for example /tmp/runonce-import_shell_script). If it exists, that means the same script is already running (or the previous run halted with an error).
You can also write a timestamp into that file so you can check when the script started (and perhaps decide to run it again after 24 hours even if the file is present).
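A minimal sketch of that approach (the file name and the 24-hour cutoff are just examples):
#!/bin/sh
LOCKFILE=/tmp/runonce-import_shell_script
if [ -f "$LOCKFILE" ]; then
    # refuse to run again unless the existing lock is older than 24 hours
    age=$(( $(date +%s) - $(cat "$LOCKFILE") ))
    if [ "$age" -lt 86400 ]; then
        echo "running"
        exit 0
    fi
fi
date +%s > "$LOCKFILE"   # record when this run started
# ... do the import work here ...
rm -f "$LOCKFILE"        # release the lock on a clean exit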

Hudson: "yes: standard output: Broken pipe"

I need to run a shell script in Hudson. That script needs an answer from the user. To give an automatic answer, I used the following command line:
yes | ./MyScript.sh
This works well in an Ubuntu terminal. But when I use the same command in the Hudson job, the script is automated and does all the needed work, but at the end I get these two lines of error:
yes: standard output: Broken pipe
yes: write error
And this causes the failure to my Hudson job.
How should I change my command line to work well in Hudson?
But how would you explain that I don't get this error while running the script locally, yet I do get it when running it remotely from a Hudson job?
When you run it in a terminal (locally), yes is killed by the SIGPIPE signal that is generated when it tries to write to the pipe after MyScript.sh has already exited.
Whatever runs the command (remotely) in Hudson traps that signal (sets its handler to SIG_IGN; you can test this by running the trap command and searching for SIGPIPE in the output) and doesn't restore the signal for new child processes (yes and whatever runs MyScript.sh, e.g., sh in your case). That leads to a write error (EPIPE) instead of the signal; yes detects the write error and reports it.
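You can check this from a shell step inside the job itself (a sketch; bash includes signals that were ignored at startup in its trap listing):
trap   # with no arguments, lists the current signal dispositions
# a line like the following in the output means SIGPIPE is ignored:
# trap -- '' SIGPIPE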
You can simply ignore the error message:
yes 2>/dev/null | ./MyScript.sh
You could also report a bug against the component that runs the pipeline. The bug is in not restoring SIGPIPE to the default handler after the child is forked; that is what programs expect when they are run in a terminal on POSIX systems. Though I don't know whether there is a standard way to do it for a Java-based program; the JVM probably raises an exception for every write error, so not dying on SIGPIPE is not a problem for a Java program.
It is common for daemons such as the Hudson process to ignore the SIGPIPE signal: you don't want your daemon to die just because a process you are communicating with dies, and you would check for write errors anyway.
Ordinary programs that are written to be run in a terminal do not check the status of every printf() for errors, but you do want them to die if programs down the pipeline die; e.g., if you run a source | sink pipeline, you usually want the source process to exit as soon as possible once sink exits.
The EPIPE write error is returned if the SIGPIPE signal is disabled (as it looks like in Hudson's case) or if a program does not die on receiving it (the yes program does not define any handler for SIGPIPE, so it should die on receiving the signal).
I don't want to ignore the error, I want to do the right command or fix to get rid of the error.
The only ways the yes process stops are being killed or encountering a write error. If the SIGPIPE signal is set to be ignored (by the parent) and no other signal kills the process, then yes receives a write error on ./MyScript.sh's exit. There are no other options if you use the yes program.
The SIGPIPE signal and the EPIPE error communicate exactly the same information: the pipe is broken. If SIGPIPE were enabled for the yes process, you simply wouldn't see the error. And nothing new happens just because you see it; it only means that ./MyScript.sh exited (successfully or unsuccessfully, it doesn't matter).
I had this error, and my problem with it is not that it outputs yes: standard output: Broken pipe but rather that it returns an error code.
Because I run my script with bash strict mode including -o pipefail, when yes "errors" it causes my script to error.
How to avoid an error
The way I avoided this is like so:
bash -c "yes || true" | my-script.sh
Are you trying to use the yes program to pipe to the script, or to echo yes to the script? If the process is running through Jenkins, add "; true" to the end of your shell command.
Since yes and ./MyScript.sh can each be run in an explicit subshell, it is possible to background the yes command, send its PID to the ./MyScript.sh subshell, and then implement a trap on EXIT there to terminate the yes command manually. (The trap on EXIT should always be implemented in the subshell of the last command of a piped command sequence.)
# avoid hangup or "broken pipe" error message when parent process set SIGPIPE to be ignored
# sleep 0 or cat /dev/null: do nothing but with external command (for a shell builtin command see: help :)
(
trap "" PIPE
( (sleep 0; exec yes) & echo ${!}; wait ${!} ) |
(
trap 'trap - EXIT; kill "$yespid"; exit 0' EXIT
yespid="$(head -n 1)"
head -n 10 # replacement for ./MyScript.sh
)
echo ${PIPESTATUS[*]}
)
If you want to exit the yes subshell with exit code 0 you can do this as well:
# avoid hangup or "broken pipe" error message when parent process set SIGPIPE to be ignored
# set exit code of yes subshell to 0
(
trap "" PIPE
(
trap 'trap - TERM; echo "kill from yes subshell ..." 1>&2; kill "${!}"; exit 0' TERM
subshell_pid="$(bash -c 'echo "$PPID"')"
(sleep 0; exec yes) & echo "${subshell_pid}"; wait ${!}
) |
(
trap 'trap - EXIT; kill -s TERM "$subshell_pid"; exit' EXIT
subshell_pid="$(head -n 1)"
head -n 10 # replacement for ./MyScript.sh
)
echo ${PIPESTATUS[*]}
)
Since the command yes runs in an infinite loop, I supposed that this might be the solution:
yes | head -1 | ./MyScript.sh # only one line of yes's output gets passed through
But, I got the same error.
We can redirect the error to /dev/null as suggested by J.F. Sebastian, or force the command to succeed like this:
yes | head -1 | ./MyScript.sh || yes
But these suggestions were less appreciated. So I had to create my own named pipe, as follows:
mkfifo /tmp/my_fifo # create the named pipe
exec 3<>/tmp/my_fifo # open the named pipe for reading and writing on file descriptor 3
echo "Y" >/tmp/my_fifo # write into the named pipe ("Y" plays the role of yes's output)
./MyScript.sh </tmp/my_fifo # read from the named pipe
rm /tmp/my_fifo # remove the named pipe
I'm hoping for more valuable solutions with better explanations.
Here is an explanation of file descriptors in Linux.
Thanks

Best Option for resumable script

I am writing a script that executes around 10 back-end processes in sequence, each depending on whether the previous process finished without errors.
Now assume the scenario in which, say, the 5th process failed and the script exited. I want to code it such that the next time the user runs it (after fixing the error that caused the script to exit last time), it resumes from the 5th process onwards, not from the 1st process again.
To be more specific, assume following is the script:
Script starts:
Process1
if [ $? -eq 0 ]; then
  Process2
  if [ $? -eq 0 ]; then
    Process3
    if [ $? -eq 0 ]; then
      ..
      ..
      if [ $? -eq 0 ]; then
        Process10
      else
        exit
      fi
    fi
  fi
fi
So the script exits whenever any one of the processes fails to complete with status 0. Again, if process5 fails and the user corrects the problem and restarts the script, it should start with process5, not process1; or at least the user should be offered a choice between resuming the script and starting it over from the beginning, i.e. from process1.
What are the possible ways to code this kind of script? Please bear in mind that I am not allowed to use a temporary database to store the status of each process.
I need to code in sh (shell script) in unix.
A simple solution would be to write stamp files:
#!/bin/sh
set -e # Automatically abort if any simple command fails
if ! test -f cmd1-stamp; then cmd1; fi
touch cmd1-stamp
if ! test -f cmd2-stamp; then cmd2; fi
touch cmd2-stamp
When the script executes, if cmd1-stamp exists, cmd1 is not executed; otherwise cmd1 is executed, and the script will abort if it fails. Note that it is very tempting to write test -f cmd1-stamp || cmd1, and this seems to work (in bash), but the shell specs state that the shell shall abort if the simple command that fails is not part of an AND or OR list, and I suspect this is (yet another) instance of bash not conforming to the spec. (Although the spec doesn't seem to say that the shell shall not abort if the failing command is part of an AND or OR list.)
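If you also want to give the user the question's choice between resuming and starting over, the same stamp files can simply be cleared up front (a sketch reusing the names above):
#!/bin/sh
set -e
if [ -f cmd1-stamp ]; then
    printf 'Previous run detected. Resume [r] or start over [s]? '
    read answer
    if [ "$answer" = s ]; then
        # removing the stamps makes every step run again from cmd1
        rm -f cmd1-stamp cmd2-stamp
    fi
fi
if ! test -f cmd1-stamp; then cmd1; fi
touch cmd1-stamp
if ! test -f cmd2-stamp; then cmd2; fi
touch cmd2-stamp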
