Systemd unit, check status with external script

The short version is:
I have a systemd unit, and I want the return code of a script to be checked when I call:
systemctl status service.service
Long version: I had an LSB init script that did exactly that: when status was passed as a parameter, it called a script that checked the state of several processes, and based on that script's return code the init system correctly reported the state of the software.
Now, while adapting the script to systemd, I can't find out how to configure this behaviour.

Short answer
This is impossible in systemd. The systemctl status verb always does the same thing; it cannot be overridden per-unit with a custom action.
Long answer
You can write a foo-status.service unit file with Type=oneshot and ExecStart= pointing to your custom status script, and then run systemctl start foo-status. However, this only provides zero/nonzero information (any nonzero exit code is converted to 1).
To get the real exit code of your status script, run systemctl show -p ExecMainStatus foo-status. However, if you go this far, it is simpler to run your script directly.
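For illustration, a minimal sketch of such a unit (the unit name and script path are placeholders):
[Unit]
Description=Custom status check for foo

[Service]
Type=oneshot
ExecStart=/usr/local/bin/foo-status.sh
Running systemctl start foo-status then executes the script, and its exit code becomes the unit's ExecMainStatus.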

You can use:
systemctl show -p ExecMainStatus service.service | sed 's/ExecMainStatus=//g'
This will return the exit code of the service.
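On newer systemd versions, assuming your systemctl supports the --value option, the sed step can be dropped:
systemctl show --value -p ExecMainStatus service.service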

If you are in control of the code of the service you start/stop that way, then you can easily edit it to save the result in a file.
Otherwise, you can always add a wrapper that does that for you.
#!/bin/sh
# Run the real service (placeholder command line), then record its exit code.
/path/to/service and args here
echo $? >/run/service.result
Then your status can be accessed using the contents of that file:
STATUS=$(cat /run/service.result)
if [ "$STATUS" = 1 ]
then
    echo "An error occurred..."
fi
(Side note: /run/ is only writable by root, use /tmp/ if you are not root.)
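If the service runs under systemd, one possible wiring is to point the unit at the wrapper, so the exit code always lands in that file; a minimal sketch (the wrapper path is a placeholder):
[Service]
ExecStart=/usr/local/bin/service-wrapper.sh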

Related

Get exit code from a Makefile command and use it as a variable in Git hook

I am currently hooking into Git's pre-push hook to run PHP CS Fixer, but I'm looking for a less clunky way of doing it. I couldn't figure out how to take the exit code of the script run by the GNU Make command and pass it to the Git hook script.
File setup
pre-push:
#!/bin/bash
output=$(make run-cs-fixer)
exitCode="${output##*$'\n'}" # Gets last line printed from cs-fixer.sh
# Use exitCode to continue or halt push
# Exit code from cs-fixer.sh cannot be obtained here. Make seems to produce its own exit code.
Makefile:
run-cs-fixer:
	#-cd docker-compose exec -T workspace bash ./cs-fixer.sh
	# echo $? gives me the exit code from cs-fixer.sh, but it's not available outside of this area, even if exited with it.
cs-fixer.sh:
#!/bin/bash
./vendor/bin/php-cs-fixer fix...
exitCode=$?
echo $exitCode # This is the exit code I need
So, it goes: git push... > pre-push > Makefile > cs-fixer produces exit code that needs to be available in pre-push.
As you can see, I'm printing the exit code manually from cs-fixer.sh so that it can be extracted when the entire output of the script is captured in pre-push via output=$(make run-cs-fixer). I just couldn't figure out how to pass the exit code around naturally.
It seems like Make starts a new shell to run each command, which seems to be part of the problem, but I couldn't get .ONESHELL to work. I was able to confirm that I can echo the desired exit code from within the Make recipe (below run-cs-fixer:), but it was still unavailable in pre-push.
GNU make returns the status of its own work, per the man page. You cannot change the exit status of make (other than forcing a 0, 1 or 2, as per the man page).
To do what you want, you will need to capture the output of the make command, say via echo 0 or echo $$?, and use that output as the value you're looking for.
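A minimal sketch of that approach, following the file names from the question (the --silent flag suppresses make's own chatter so the echoed code is the last line of output):
Makefile:
run-cs-fixer:
	@./cs-fixer.sh; echo $$?
pre-push:
#!/bin/bash
output=$(make --silent run-cs-fixer)
exitCode="${output##*$'\n'}" # last line printed by the recipe
exit "$exitCode"
Because the recipe always ends with a successful echo, make itself exits 0 and the real exit code travels through stdout instead.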

Runit exits with error if process tells itself to go down

I'm seeing some unexpected behavior with runit and I'm not sure how to get it to do what I want without throwing an error during termination. I have a process that sometimes knows it should stop itself and not let itself be restarted (and thus should call sv d on itself). This works if I never change the user, but produces errors if I switch to a non-root user when running.
I'll use the same finish script for both examples:
#!/bin/bash -e
echo "downtest finished with exit code $1 and exit status $2"
The run script that works as expected (prints downtest finished with exit code 0 and exit status 0 to syslog):
#!/bin/bash -e
exec 2>&1
echo "running downtest"
sv d downtest
exit 0
The run script that doesn't work as expected (prints downtest finished with exit code -1 and exit status 15 to syslog):
#!/bin/bash -e
exec 2>&1
echo "running downtest"
chpst -u ubuntu sudo sv d downtest
exit 0
I get the same result if I use su ubuntu instead of chpst.
Any ideas on why I see this behavior and how to fix it so calling sudo sv d downtest results in a clean process exit rather than returning error status codes?
sv d sends a SIGTERM if the process is still running. SIGTERM is signal 15, hence the exit status 15 reported to your finish script.
By contrast, to tell a running program not to start up again after it exits on its own (thus giving it that opportunity), use sv o (once) instead.
Alternately, you can trap SIGTERM in your script when you're expecting it:
trap 'exit 0' TERM
If you want to make this conditional:
trap 'if [[ $ignore_sigterm ]]; then exit 0; fi' TERM
...and then run
ignore_sigterm=1
before triggering sv d.
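Putting the pieces together, a sketch of such a run script (names follow the question; this relies on bash delivering the pending TERM once the foreground sv command returns):
#!/bin/bash -e
exec 2>&1
trap 'if [[ $ignore_sigterm ]]; then exit 0; fi' TERM
echo "running downtest"
ignore_sigterm=1
sv d downtest
exit 0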
As a workaround, try running the command in a subshell, (sudo sv d downtest); that helps the final exit 0 get called, which currently does not happen because the script exits before reaching it.
#!/bin/sh
exec 2>&1
echo "running downtest"
(sudo sv d downtest)
exit 0
Indeed, for stopping the process you don't need chpst -u ubuntu: if you want to stop or control the service as another user, you just need to adjust the permissions of the ./supervise directory. That is probably why you are getting the exit code -1.
Checking the runsv man page:
Two arguments are given to ./finish. The first one is ./run’s exit code, or -1 if ./run didn’t exit normally. The second one is the least significant byte of the exit status as determined by waitpid(2); for instance it is 0 if ./run exited normally, and the signal number if ./run was terminated by a signal. If runsv cannot start ./run for some reason, the exit code is 111 and the status is 0.
And from the faq:
Is it possible to allow a user other than root to control a service
Using the sv program to control a service, or query its status information, only works as root. Is it possible to allow non-root users to control a service too?
Answer: Yes, you simply need to adjust file system permissions for the ./supervise/ subdirectory in the service directory. E.g.: to allow the user burdon to control the service dhcp, change to the dhcp service directory, and do
# chmod 755 ./supervise
# chown burdon ./supervise/ok ./supervise/control ./supervise/status
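After that, the named user can control the service without root, for example (assuming the conventional /etc/service layout):
$ sv status /etc/service/dhcp
$ sv down /etc/service/dhcp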
If you would like to fully stop/start the service, you could remove the symlink to your service directory, but that implies creating it again when you want the service back up.
Just in case, because of this and other cases, I came up with immortal to simplify the stop/start/restart/retries of services without root privileges; it is fully based on daemontools & runit, just adapted to some new flows.

Chef run sh script

I have a problem trying to run a shell script via Chef (with docker-provisioning).
This is how I try to execute my script:
bash 'shell_try' do
  user "root"
  run = "#{some_path_to_script}/my_script.sh some_params"
  code " #{run} > stdout.txt 2> stderr.txt"
end
(note that this script should run other scripts and processes and write logs)
There are no errors in the output, but when I log into the machine and run ps aux, the process isn't running.
I guess something is wrong with permissions (or environment variables), because when I try the same command manually, it works.
A bash resource just runs the provided script text directly. If you want to run a long-running process, you would generally set up an Upstart or systemd service and use the service resource to start it.
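For example, a minimal sketch of a systemd unit for the script (names and paths are placeholders), which a Chef service resource could then enable and start:
[Unit]
Description=my_script service

[Service]
ExecStart=/some/path/my_script.sh some_params
Restart=on-failure

[Install]
WantedBy=multi-user.target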
Finally found a solution (thanks to @coderanger):
Install supervisor:
Download the supervisor cookbook
Add:
include_recipe 'supervisor::default'
Add my service to supervisor:
supervisor_service "name" do
  action :enable
  #action :start
  command '/path/script.sh start'
end
Run the supervisor service
All done!
Please see the Chef documentation for your resource: https://docs.chef.io/resource_bash.html. The bash resource does not support a run attribute. The text of the code attribute is run as a bash script. The default action is to run the script unless told otherwise by the resource.
bash 'shell_try' do
  user "root"
  code " #{run} > stdout.txt 2> stderr.txt"
  action :run
end
The code attribute is written to a temporary file where it is then run using the attributes specified in the resource.
The line run = "#{some_path_to_script}/my_script.sh some_params" at this point does nothing.

capture exit code from a script flow

I need help with some scripts I'm writing.
Scenario:
Script A is executed by a scheduling process. This script takes the arguments passed to it, parses them in some way, and runs script B, feeding it those arguments;
Script B does sudo -u user ssh user@REMOTEMACHINE, runs some commands (on the remote machine) and finally runs script C (also on the remote machine). I am passing those commands using a here document. I'm also passing the previous arguments to this script.
This "flow" runs correctly and the job completes successfully.
My problems are:
Since this "flow" is ran by a scheduling process, I need to tell it if the job completed successfully or not. I'm doing this via exit codes, so what I want is to have a chain of exit codes, returning back from the last script to the first, in case of errors. I'm not able to perform this, because exit codes works correctly for the single scripts (I tried executing them singularly and look for the exit codes), but they are not sended back to the parent script. In my opinion, the problem is that ssh is getting the exit code from the child script, which in fact ended successfully, because there was no error executing it: it's the command inside of it that gone wrong.
While the process works correctly, I still get this line:
ssh: Could not resolve hostname : Name or service not known
But actually the script completes successfully.
I hope you understand what I wrote; I can post my scripts here if needed.
Thanks
O.
EDIT:
These are the scripts. There could be some problems with variable names because I renamed them quickly to upload the files.
Since I can't upload 3 files because of my low reputation, I merged them into a single file:
SCRIPT FILE
I managed to solve the problem.
I followed olivier's advice and used the escape character to make the variable expand on the remote machine.
I also implemented different exit codes based on where the error occurred.
Finally, I modified the first script as follows, after launching the second script via sudo -u:
EXITCODEOFTHESECONDSCRIPT=$?
if [ "$EXITCODEOFTHESECONDSCRIPT" -eq 0 ]
then
    echo ""
    echo "Export job took $SECONDS seconds."
    echo ""
    exit 0
else
    exit "$EXITCODEOFTHESECONDSCRIPT"
fi
This way I am able to exit the main script while MAINTAINING the exit code provided by the second script.
In fact, I found that the process worked well even in case of errors, but because I was running more commands after the second script failed (the echo command was enough), those produced other exit codes that overwrote the one I wanted.
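For reference, a minimal sketch of the underlying mechanism: ssh exits with the exit status of the remote command, so the remote side must itself exit nonzero for a failure to propagate (the host and command are placeholders):
#!/bin/bash
ssh user@REMOTEMACHINE 'bash -s' <<'EOF'
some_remote_command || exit 42 # propagate a distinct code on failure
EOF
echo "remote flow exited with $?" # prints 42 if the remote command failed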
Thanks to all !

Net::SSH::Shell::Process $DONTEVERUSETHIS

When using Net::SSH to run commands on a remote connection, it adds the following script to the end of each and every command:
DONTEVERUSETHIS=$?; echo #{manager.separator} $DONTEVERUSETHIS; echo \"exit $DONTEVERUSETHIS\"|sh
the output produced looks like:
DONTEVERUSETHIS=$?; echo 10e75e2821012645fa3a3cc08ec5de527a392af68db4c3cac63dac22d4de2a8708fcc176190817fe $DONTEVERUSETHIS; echo "exit $DONTEVERUSETHIS"|sh
Here's a link to the source code, Net::SSH::Shell::Process; look at the 'run' method.
Can anyone explain why this is always added?
It doesn't appear in the console output but plays hell with parsing ~/.bash_history
A quick look into the source repository reveals this commit:
keep the exitcode 1 available for the next command
In effect, this allows you to inspect the value of $? (i.e. the exit code of the previous command) in the next command.
TL;DR: It's the machine readable equivalent of a colored shell prompt. It's there to tell the library when the issued command has finished, and whether it was successful.
When running a command with Net::SSH (not ::Shell), here's what happens:
Connection is established
Command is sent
Output is received
The command exits, sshd returns the exit code and ends the connection.
This means that it's easy to:
Get the output: just read until sshd closes the connection.
Get the exit code: sshd returns it.
However, it means that each command is run in a separate session, so cd /tmp followed by pwd will return /home/youruser: these are two different sessions, and the former doesn't affect the latter.
The purpose of Net::SSH::Shell is instead to run multiple, individual commands in the same shell session:
Connection is established.
Commands are sent as a single, infinite, concatenated stream
Output is received as a single, infinite, concatenated stream
This leaves two open questions:
How do you know whether the command has finished or whether it's still processing?
How do you get the exit code now that sshd doesn't return it?
The way Net::SSH::Shell solves this is by modifying the command in the way you saw, to make it print a unique ID and exit code when done:
To get the command's output, read until a line with the unique ID is printed.
To get the exit code, read it from the same line.
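A plain-shell illustration of the same trick (the unique ID is shortened to MARKER here for readability):
false # some command whose status we care about
DONTEVERUSETHIS=$? # capture the status before anything else clobbers it
echo MARKER $DONTEVERUSETHIS # unique line: the command is done, its status follows
echo "exit $DONTEVERUSETHIS" | sh # re-raise the status so $? is intact for the next command
echo $? # prints 1, as if 'false' had just run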
