syntax for identifying a failed service on Ubuntu - bash

A server has nginx falling over frequently, and each time a sudo service nginx restart needs to be executed.
A suggestion has been the following bash script:
service nginx status | grep 'active (running)' > /dev/null 2>&1
if [ $? != 0 ]
then
sudo service nginx restart > /dev/null
fi
Being thoroughly unversed in bash, there are two propositions that are opaque to me and require clarification:
> /dev/null 2>&1
and
[ $? != 0 ]
Because the response to service nginx status returns a clear statement:
Active: failed (Result ... and thus I would intuitively devise the if statement to focus on failed ...

Being thoroughly unversed in bash,
(It is probably time to do a bash scripting tutorial then. Note that this code will probably work with any POSIX compliant shell, not just bash. So an sh tutorial would do too.)
... there are two propositions that are opaque to me and require clarification:
> /dev/null 2>&1
That means "write stdout to /dev/null and write stderr (2) to the same place as stdout (1)". In short, throw away the output from grep.
and
[ $? != 0 ]
$? expands to the exit code of the last command, so this means "test if the last command exited with a non-zero exit code", i.e. if it failed.
In the case of a pipeline, the last command in the pipeline supplies the exit code. In this case, it will be the grep, which is specified to give a non-zero exit code if it doesn't find any matching lines.
Because the response to service nginx status returns a clear statement: Active: failed (Result ... and thus I would intuitively devise the if statement to focus on failed ...
Well, that doesn't take account of the possibility that service nginx status doesn't return any output for some reason. It is unlikely that will happen, but this version takes account of that. Also, the actual output of the systemd script for nginx status is most likely not specified. It might change and that would break this script.
Anyway ... there are many ways to implement something like this. This way works, and that's all that really matters.
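As a minimal sketch of the two constructs explained above: if can test the exit status of the pipeline directly, which avoids the separate $? check, and grep -q suppresses grep's output without a redirection. Here fake_status is a hypothetical stand-in for service nginx status, and its output is invented for illustration, so the snippet runs without nginx installed.

```shell
#!/bin/sh
# Hypothetical stand-in for `service nginx status`; the output is illustrative only.
fake_status() {
    printf 'Active: failed\n'
}

# `if` tests the exit status of the pipeline, which is grep's exit status.
# grep -q prints nothing, so grep itself needs no redirection to /dev/null.
if fake_status 2>/dev/null | grep -q 'active (running)'; then
    state=running
else
    # The real script would run here: sudo service nginx restart > /dev/null
    state=needs-restart
fi
echo "$state"
```

Because grep finds no matching line, it exits non-zero and the else branch is taken.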

You can change the systemd unit so that the service is restarted always or on failure (the Restart= option):
https://www.freedesktop.org/software/systemd/man/systemd.service.html
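A rough sketch of what such a change could look like, written as a drop-in file. The unit name nginx.service and the 5-second delay are assumptions. The file is written to a temporary directory so the snippet is harmless to run; on a real host the drop-in would normally live in /etc/systemd/system/nginx.service.d/.

```shell
#!/bin/sh
# dropin_dir is a temporary directory for demonstration purposes only.
# Assumed real location: /etc/systemd/system/nginx.service.d
dropin_dir=$(mktemp -d)

# Restart= and RestartSec= are documented on the systemd.service man page.
cat > "$dropin_dir/override.conf" <<'EOF'
[Service]
Restart=on-failure
RestartSec=5
EOF

cat "$dropin_dir/override.conf"

# After installing the file for real, systemd has to reload its configuration:
#   sudo systemctl daemon-reload
#   sudo systemctl restart nginx
```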

Related

Shell exit codes inconsistent with simple command

I have an issue that should not be too hard to solve, I just can't figure what I'm doing wrong.
I need to test if a command is successful or not, and the command needs to be executed from a script. The command is:
curl 127.0.0.1:5000 &> /dev/null
Right now there is no server running, so it should always fail. And it does fail when I execute it from a command line. However when I run it from inside a shell script, it fails but the exit code is 0. What could the cause of that be?
Here is the script:
if curl 127.0.0.1:5000 &> /dev/null
then
echo "success"
exit 0
else
echo "failure"
exit 1
fi
And here is the output:
success
curl: (7) Failed to connect to 127.0.0.1 port 5000: Connection refused
However, it does work as expected if I remove the redirection. (I'm quite a beginner in shell code, but the redirection shouldn't also redirect the exit code, right? So I really don't know what this means.)
Here is the code without redirections, which works as expected (it reports a failure and has an exit code of 1):
if curl 127.0.0.1:5000
then
echo "success"
exit 0
else
echo "failure"
exit 1
fi
Does anyone have an idea?
Edit:
I was launching the script with sh script_name.sh in zsh. When I use zsh script_name.sh it now works normally. I still don't fully understand why but at least it works!
"&> /dev/null" is interpreted differently in the Bourne shell (sh): the "&" puts the command in the background, and "> /dev/null" is then parsed as a separate redirection with no command attached. You can test it with "sleep 100 &>/dev/null". Since the command was successfully put in the background, the result is a success, and the exit status of the backgrounded command is disregarded.
If you want it to work in Bourne shell (sh), use the traditional syntax ">/dev/null 2>&1", and it will work in newer shells as well, i.e. it is more compatible.
In a system where sh is linked to bash, it will work as is.
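A small sketch of the portable form, showing that the redirection leaves the exit status untouched. The probe helper and the nonexistent path are made up for the demonstration; ls stands in for curl so that no network access is needed.

```shell
#!/bin/sh
# probe runs a command with stdout and stderr discarded and returns
# that command's own exit status.
probe() {
    "$@" >/dev/null 2>&1
}

# ls fails on a path that does not exist, so the else branch is taken.
if probe ls /nonexistent-path-for-demo; then
    result=success
else
    result=failure
fi
echo "$result"
```

In the original script the same idea would read: if curl 127.0.0.1:5000 >/dev/null 2>&1.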

Loop through docker output until I find a String in bash

I am quite new to bash (barely any experience at all) and I need some help with a bash script.
I am using docker-compose to create multiple containers - for this example let's say 2 containers. The 2nd container will execute a bash command, but before that, I need to check that the 1st container is operational and fully configured. Instead of using a sleep command I want to create a bash script that will be located in the 2nd container and once executed do the following:
Execute a command and log the console output in a file
Read that file and check if a String is present. The command that I will execute in the previous step will take a few seconds (5 - 10 seconds) to complete and I need to read the file after it has finished executing. I suppose I can add a sleep to make sure the command has finished executing, or is there a better way to do this?
If the string is not present I want to execute the same command again until I find the String I am looking for
Once I find the string I am looking for I want to exit the loop and execute a different command
I found out how to do this in Java, but I need to do this in a bash script.
The docker-containers have alpine as an operating system, but I updated the Dockerfile to install bash.
I tried this solution, but it does not work.
#!/bin/bash
[command to be executed] > allout.txt 2>&1
until
tail -n 0 -F /path/to/file | \
while read LINE
do
if echo "$LINE" | grep -q $string
then
echo -e "$string found in the console output"
fi
done
do
echo "String is not present. Executing command again"
sleep 5
[command to be executed] > allout.txt 2>&1
done
echo -e "String is found"
In your docker-compose file make use of depends_on option.
depends_on will take care of startup and shutdown sequence of your multiple containers.
But it does not check whether a container is ready before moving to another container startup. To handle this scenario check this out.
As described in this link,
You can use tools such as wait-for-it, dockerize, or sh-compatible wait-for. These are small wrapper scripts which you can include in your application’s image to poll a given host and port until it’s accepting TCP connections.
OR
Alternatively, write your own wrapper script to perform a more application-specific health check.
In case you don't want to make use of above tools then check this out. Here they use a combination of HEALTHCHECK and service_healthy condition as shown here. For complete example check this.
Just:
while :; do
# 1. Execute a command and log the console output in a file
command > output.log
# TODO: handle errors, etc.
# 2. Read that file and check if a String is present.
if grep -q "searched_string" output.log; then
# Once I find the string I am looking for I want to exit the loop
break;
fi
# 3. If the string is not present I want to execute the same command again until I find the String I am looking for
# add ex. sleep 0.1 for the loop to delay a little bit, not to use 100% cpu
done
# ...and execute a different command
different_command
You can timeout a command with timeout.
Notes:
The colon (:) is a utility that returns a zero exit status, much like true. I prefer while : to while true; they mean the same.
The code presented should work in any posix shell.
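A variation on the loop above, sketched with a bounded number of attempts so that it cannot spin forever. The names retry_until_match, RETRY_DELAY and fake_probe are all hypothetical; fake_probe stands in for the real command and reports readiness on its third call.

```shell
#!/bin/sh
# retry_until_match PATTERN MAX_TRIES COMMAND [ARGS...]
# Reruns COMMAND until its output contains PATTERN or MAX_TRIES is reached.
retry_until_match() {
    pattern=$1
    max_tries=$2
    shift 2
    tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        tries=$((tries + 1))
        # grep reads the command's output (stdout and stderr) directly,
        # so no intermediate log file is needed.
        if "$@" 2>&1 | grep -q -e "$pattern"; then
            return 0
        fi
        sleep "${RETRY_DELAY:-1}"
    done
    return 1
}

# Hypothetical command: keeps a call counter in a file and is "ready" on call 3.
counter_file=$(mktemp)
echo 0 > "$counter_file"
fake_probe() {
    n=$(($(cat "$counter_file") + 1))
    echo "$n" > "$counter_file"
    if [ "$n" -ge 3 ]; then echo "service ready"; else echo "still starting"; fi
}

RETRY_DELAY=0
if retry_until_match 'ready' 5 fake_probe; then
    outcome="found after $tries tries"
else
    outcome="gave up"
fi
rm -f "$counter_file"
echo "$outcome"
```

The different command from the answer above would go in the success branch, and the failure branch is the place to report a timeout.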

Runit exits with error if process tells itself to go down

I'm seeing some unexpected behavior with runit and not sure how to get it to do what I want without throwing an error during termination. I have a process that sometimes knows it should stop itself and not let itself be restarted (thus should call sv d on itself). This works if I never change the user but produces errors if I switch to a non-root user when running.
I'll use the same finish script for both examples:
#!/bin/bash -e
echo "downtest finished with exit code $1 and exit status $2"
The run script that works as expected (prints downtest finished with exit code 0 and exit status 0 to syslog):
#!/bin/bash -e
exec 2>&1
echo "running downtest"
sv d downtest
exit 0
The run script that doesn't work as expected (prints downtest finished with exit code -1 and exit status 15 to syslog):
#!/bin/bash -e
exec 2>&1
echo "running downtest"
chpst -u ubuntu sudo sv d downtest
exit 0
I get the same result if I use su ubuntu instead of chpst.
Any ideas on why I see this behavior and how to fix it so calling sudo sv d downtest results in a clean process exit rather than returning error status codes?
sv d sends a SIGTERM if the process is still running. This is signal 15, hence the error being handled in the manner in question.
By contrast, to tell a running program not to start up again after it exits on its own (thus allowing that opportunity), use sv o (once) instead.
Alternately, you can trap SIGTERM in your script when you're expecting it:
trap 'exit 0' TERM
If you want to make this conditional:
trap 'if [[ $ignore_sigterm ]]; then exit 0; fi' TERM
...and then run
ignore_sigterm=1
before triggering sv d.
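To see the effect of the trap in isolation, here is a small sketch that does not need runit at all. A background subshell plays the role of the run script, and kill plays the role of sv d; both are stand-ins, and the sleep loop is a placeholder for the service's real work.

```shell
#!/bin/sh
# Background subshell standing in for the ./run script.
(
    trap 'exit 0' TERM
    # Placeholder for the real work of the service.
    while :; do sleep 1; done
) &
pid=$!

sleep 1                 # give the subshell time to install the trap
kill -TERM "$pid"       # what sv d does to a running service
wait "$pid"
status=$?
echo "exit status: $status"
```

With the trap in place the process ends with a normal exit code instead of being reported as killed by signal 15.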
As a workaround, try running the command in a subshell, i.e. (chpst -u ubuntu sudo sv d downtest). That should allow the final exit 0 to be reached; at the moment it is never called because the script exits before getting there.
#!/bin/sh
exec 2>&1
echo "running downtest"
(sudo sv d downtest)
exit 0
In fact, you don't need chpst -u ubuntu to stop the process. If you want to stop or control the service as another user, you just need to adjust the permissions of the ./supervise directory. That is probably why you are getting the exit code -1.
Checking the runsv man:
Two arguments are given to ./finish. The first one is ./run’s exit code, or -1 if ./run didn’t exit normally. The second one is the least significant byte of the exit status as determined by waitpid(2); for instance it is 0 if ./run exited normally, and the signal number if ./run was terminated by a signal. If runsv cannot start ./run for some reason, the exit code is 111 and the status is 0.
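Based only on the man page excerpt above, a finish script could interpret its two arguments roughly as follows. The function name describe_finish and the message wording are made up for this sketch.

```shell
#!/bin/sh
# describe_finish CODE STATUS
# CODE and STATUS are the two arguments runsv passes to ./finish.
describe_finish() {
    code=$1
    status=$2
    if [ "$code" = "-1" ]; then
        # ./run did not exit normally; STATUS then holds the signal number.
        echo "run was terminated by signal $status"
    elif [ "$code" = "111" ] && [ "$status" = "0" ]; then
        # Ambiguous: ./run may also have exited with 111 by itself.
        echo "exit code 111: runsv may have been unable to start run"
    else
        echo "run exited normally with code $code"
    fi
}

describe_finish -1 15
describe_finish 0 0
```

The first call corresponds to the failing case in the question (exit code -1, status 15, i.e. SIGTERM).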
And from the faq:
Is it possible to allow a user other than root to control a service
Using the sv program to control a service, or query its status informations, only works as root. Is it possible to allow non-root users to control a service too?
Answer: Yes, you simply need to adjust file system permissions for the ./supervise/ subdirectory in the service directory. E.g.: to allow the user burdon to control the service dhcp, change to the dhcp service directory, and do
# chmod 755 ./supervise
# chown burdon ./supervise/ok ./supervise/control ./supervise/status
If you would like to fully stop/start the service, you could remove the symlink of your run service, but that implies creating it again when you want the service up.
Just in case: because of this and other cases, I came up with immortal to simplify the stop/start/restart/retries of services without root privileges. It is fully based on daemontools & runit, just adapted to some new flows.

Exit status of a Command in Bash Scripting is always true

I'm trying to run a command ( gerrit query ) in bash and assign that to a variable.
I'm using this in a bash script file and I want to handle the case where the command throws an error (i.e. if the gerrit query fails).
For example:
var=`ssh -p $GERRIT_PORT_NUMBER $GERRIT_SERVER_NAME gerrit query --current-patch-set $PATCHSET_ID`
I do know that I can check the last exit status using $? in bash, but for the above case, the assignment to the variable over-rides the earlier exit status ( i.e the gerrit query failure status) and the above command never fails. It is always true.
Can you let me know if there is a way to handle the exit status of a command even when it is assigned to a variable in bash.
Update:
My assumption here was wrong: the assignment was not overriding the exit status, and Charles's example and explanation in his answer are correct.
The real reason for the exit status being overridden was I was piping the output of the above command to a sed script which was the culprit in overriding the exit status. I found the following which helped me to resolve the issue. https://unix.stackexchange.com/questions/14270/get-exit-status-of-process-thats-piped-to-another/73180#73180 Pipe output and capture exit status in Bash
Complete command that I was trying.
var=`ssh -p $GERRIT_PORT_NUMBER $GERRIT_SERVER_NAME gerrit query --current-patch-set $PATCHSET_ID | sed 's/message//'`
The assertion made in this question is untrue; assignments do not modify exit status. You can check this yourself:
var=$(false); echo $?
...will correctly emit 1.
That said, if an assignment is done in the context of a local, declare, or similar keyword, this may no longer hold true:
f() { local var=$(false); echo $?; }; f
...will emit 0, and is worked around by separating out the local from the assignment:
f() { local var; var=$(false); echo $?; }; f
...which correctly returns 1.
SSH itself also returns exit status correctly, as you can similarly test yourself:
ssh localhost false; echo $?
...correctly returns 1.
The reasonable conclusion, then, is that gerrit itself is failing to convey a non-successful exit status. This bug should be addressed through gerrit's support mechanisms, rather than as a bash question.
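For the pipeline mentioned in the question's update, one portable way to keep the status of the first command is to capture its output first and run sed afterwards. This is only a sketch: fake_query is a hypothetical stand-in for the ssh ... gerrit query call, and its output and exit code 3 are invented.

```shell
#!/bin/sh
# Hypothetical stand-in for the ssh/gerrit command: prints output, then fails.
fake_query() {
    echo "message: not found"
    return 3
}

# Piped form: $? is the status of sed, the last command in the pipeline.
piped=$(fake_query | sed 's/message//')
piped_status=$?

# Two-step form: the condition of `if` sees fake_query's own status.
if raw=$(fake_query); then
    raw_status=0
else
    raw_status=$?
fi
var=$(printf '%s\n' "$raw" | sed 's/message//')

echo "piped_status=$piped_status raw_status=$raw_status var=$var"
```

In bash specifically, set -o pipefail or the PIPESTATUS array are alternatives that keep the pipeline in one piece.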

Best Option for resumable script

I am writing a script that executes around 10 back-end processes in sequence, depending on if the previous process was executed without any errors.
Now let's assume a scenario in which, say, the 5th process failed and the script exited. I want to code it in such a way that the next time the user runs it (after fixing the error that made the script exit), they can resume from the 5th process onwards rather than starting again from the 1st.
To be more specific, assume following is the script:
Script Starts
Process1
if [ $? -eq 0 ] then
Process2
if [ $? -eq 0 ] then
Process3
if [ $? -eq 0 ] then
..
..
..
..
if [ $? -eq 0 ] then
Process10
else
exit
So here the script will exit anytime if any one of the process fails to complete with status 0. So again, if process5 fails, and user corrects the problem and restarts script, the script should start with process5 again and not process1 or at least there should be an option to user if he wants to resume the script or start it back from beginning i.e. process1.
What are the possible ways to code this kind of script? Also, please bear in mind that I am not allowed to use a temporary db where I could store the status of each process.
I need to code in sh (shell script) in unix.
A simple solution would be to write stamp files:
#!/bin/sh
set -e # Automatically abort if any simple command fails
if ! test -f cmd1-stamp; then cmd1; fi
touch cmd1-stamp
if ! test -f cmd2-stamp; then cmd2; fi
touch cmd2-stamp
When the script executes, cmd1 is skipped if cmd1-stamp exists; otherwise cmd1 is executed, and the script aborts if it fails, so the stamp is only written after a successful run. Note that it is very tempting to write test -f cmd1-stamp || cmd1 instead. This works as well: POSIX specifies that set -e is ignored for any command of an AND-OR list other than the last, so a missing stamp file (a failing test) does not abort the script, while a failing cmd1 still does.
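The stamp-file idea can also be wrapped in a small helper so each step is a single line. In this sketch run_step, STAMP_DIR and the step functions are hypothetical names; the steps stand in for Process1, Process2 and so on, and the two "runs" are simulated inside one script.

```shell
#!/bin/sh
# STAMP_DIR is a temporary directory here; a real script would use a fixed
# location of its own choosing so that stamps survive between runs.
STAMP_DIR=$(mktemp -d)
log="$STAMP_DIR/run.log"

# run_step NAME COMMAND [ARGS...]
# Skips COMMAND if NAME's stamp exists; writes the stamp only on success.
run_step() {
    stamp="$STAMP_DIR/$1.stamp"
    shift
    if [ -f "$stamp" ]; then
        return 0
    fi
    "$@" && touch "$stamp"
}

# Hypothetical steps standing in for the real processes.
step1() { echo step1 >> "$log"; }
step2_broken() { return 1; }
step2_fixed() { echo step2 >> "$log"; }

# First run: step1 succeeds, step2 fails, so the run stops there.
run_step step1 step1 && run_step step2 step2_broken || echo "first run stopped"

# Second run after the fix: step1 is skipped, step2 runs.
run_step step1 step1 && run_step step2 step2_fixed && echo "second run finished"

cat "$log"
```

To offer the user a restart from the beginning, the script could delete the stamps first, for example with rm -f "$STAMP_DIR"/*.stamp.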

Resources