How to specify commands to be run before shutdown with NUT in case of power & network failure? - bash

I have a NAS with a UPS connected to it over USB.
I have installed the NUT client (only the client) on my Raspberry Pi to monitor the UPS.
I must plug the UPS into the NAS because it's the only device that recognizes my UPS (a generic Chinese model; unfortunately there is no other choice in my country).
There is no way to make it work when plugged directly into the Raspberry Pi.
As I've installed only the NUT client on my Raspberry Pi, /etc/nut was empty. I've created 3 files:
nut.conf
MODE=netclient
upsmon.conf
MONITOR login@nas_ip 1 admin password slave
SHUTDOWNCMD "/sbin/shutdown -h now"
NOTIFYFLAG ONBATT SYSLOG+WALL+EXEC
NOTIFYFLAG ONLINE SYSLOG+WALL+EXEC
NOTIFYCMD "/etc/nut/notifycmd"
the notifycmd script:
#!/bin/bash
#
# NUT NOTIFYCMD script
PATH=/sbin:/usr/sbin:/bin:/usr/bin:/usr/local/sbin:/usr/local/bin
trap "exit 0" SIGTERM
if [ "$NOTIFYTYPE" = "ONLINE" ]
then
    echo $0: power restored | wall
    # curl command to send me a sms
    # Cause all instances of this script to exit.
    killall -s SIGTERM `basename $0`
fi
if [ "$NOTIFYTYPE" = "ONBATT" ]
then
    echo $0: 40 minutes till system powers down... | wall
    #curl command to send me a sms
    # Loop with one second interval to allow SIGTERM reception.
    let "n = 2400"
    while [ $n -ne 0 ]
    do
        sleep 1
        let "n--"
    done
    echo $0: commencing shutdown | wall
    sleep 10
    upsmon -c fsd
fi
My UPS is powerful enough to wait 40 minutes before shutting down the NAS and the Raspberry Pi remotely.
I receive an SMS on power loss and when power is restored.
But there is a special case: if the Raspberry Pi loses its connection to the UPS while the UPS is on battery, the Raspberry Pi shuts down almost immediately. I think this is a protective behavior, but I'd like to insert a curl command that sends an SMS before shutdown, like "power loss and connection failure with UPS"
(I know the command itself, this is not the problem)
I know I must add something like "DEADTIME" or "NOTIFYFLAG NOCOMM" or "COMMBAD" in upsmon.conf, but I'm not sure which to choose...
And what do I have to add to my "notifycmd" file?
Thanks

You may raise the value of DEADTIME, if you feel that this would be good for your Raspberry Pi. From a recent upsmon.conf:
# upsmon requires a UPS to provide status information every few seconds
# (see POLLFREQ and POLLFREQALERT) to keep things updated. If the status
# fetch fails, the UPS is marked stale. If it stays stale for more than
# DEADTIME seconds, the UPS is marked dead.
#
# A dead UPS that was last known to be on battery is assumed to have gone
# to a low battery condition. This may force a shutdown if it is providing
# a critical amount of power to your system.
If you know for sure that the UPS has enough battery for your Raspberry Pi (even when you can't talk to it), you may raise DEADTIME to something like 120 (2 minutes) or even 300 (5 minutes). Just keep in mind that the value should be a multiple of POLLFREQ and POLLFREQALERT.
After this, upsmon will have some more time to run your NOTIFYCMD.
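The "multiple of POLLFREQ and POLLFREQALERT" rule can be checked with plain shell arithmetic before editing upsmon.conf. This is only a sketch: deadtime_ok is a hypothetical helper name, and the polling values of 5 seconds are example values, not read from any real configuration.

```shell
#!/bin/bash
# deadtime_ok DEADTIME POLLFREQ POLLFREQALERT
# Hypothetical helper: succeeds when DEADTIME is a whole multiple of both
# polling intervals (all values in seconds).
deadtime_ok() {
    local deadtime=$1 pollfreq=$2 pollfreqalert=$3
    [ $((deadtime % pollfreq)) -eq 0 ] && [ $((deadtime % pollfreqalert)) -eq 0 ]
}

# Example values only: DEADTIME 120 with both polling intervals set to 5.
deadtime_ok 120 5 5 && echo "DEADTIME 120 is a multiple of both intervals"
```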
Now, just add a case for COMMBAD:
COMMBAD is triggered as soon as communication with the UPS is lost
COMMOK is triggered as soon as communication is re-established
NOCOMM is triggered when the UPS has been unreachable for some time
Also, consider using the NOTIFYCMD just for notifying something, not doing the dirty job (e.g. shutting down the system).
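A notify-only NOTIFYCMD along these lines might look like the sketch below. It assumes that upsmon.conf also carries NOTIFYFLAG lines with EXEC for the communication events (shown as comments), and send_sms is a hypothetical placeholder for the curl command mentioned in the question.

```shell
#!/bin/bash
# Sketch of a notify-only NOTIFYCMD; upsmon sets NOTIFYTYPE in the environment.
# Assumed additions to upsmon.conf so that these events reach the script:
#   NOTIFYFLAG COMMBAD SYSLOG+WALL+EXEC
#   NOTIFYFLAG COMMOK  SYSLOG+WALL+EXEC
#   NOTIFYFLAG NOCOMM  SYSLOG+WALL+EXEC

# Hypothetical placeholder: replace the echo with the real curl call.
send_sms() {
    echo "SMS: $1"
}

# Map an upsmon event type to a notification; unknown types are ignored.
handle_event() {
    case "$1" in
        ONBATT)  send_sms "power loss, running on battery" ;;
        ONLINE)  send_sms "power restored" ;;
        COMMBAD) send_sms "connection to UPS lost" ;;
        COMMOK)  send_sms "connection to UPS restored" ;;
        NOCOMM)  send_sms "UPS still unreachable" ;;
        *)       : ;;
    esac
}

handle_event "${NOTIFYTYPE:-}"
```

Because the script only notifies, the shutdown itself stays with upsmon and SHUTDOWNCMD.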

Related

Get output of server crash from server re-spawning script

I currently have Homebridge set up on my raspberry pi. When the pi boots, it starts a script which attempts to keep homebridge alive. I originally took the script from this answer which walks you through the rather trivial process of creating such a script. However, I have slightly adapted the script and it now looks like this:
until "homebridge" -s /bin/sh pi; do
echo "Server homebridge crashed with exit code $?. Respawning.." >&2
echo "Looks like Homebridge just crashed, restarting it now..." | mail -s "Homebridge Crash" pi
rm -r /home/pi/.homebridge/accessories/cachedAccessories
sleep 1
done
It is virtually the same as the original script, except that it deletes a folder and waits a second before re-spawning. Furthermore, it sends mail to my user (pi) to let me know that the process has died and is re-spawning. This has been working perfectly for me, except that it offers no help with debugging: while I do get notified that the process has died, I am not shown the output of the process from when it died. It would be perfect if the mail could include, for example, the last 300 lines before the process exited, in order to aid with debugging following a crash.
What exactly would I need to add to the above script in order to receive a 'log' of what the homebridge output was just before it crashed in order to help with debugging?
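One possible approach, sketched here under assumptions, is to send the process output to a log file and mail the tail of that file after a crash. The names run_logged and crash_report and the log path are hypothetical, not part of Homebridge or the original script.

```shell
#!/bin/bash
LOG=/tmp/homebridge.log   # assumed log location

# Run a command with stdout and stderr redirected to a log file.
# The log is truncated on every start, so it only holds the latest run.
run_logged() {
    local log=$1
    shift
    "$@" >"$log" 2>&1
}

# Print the last N lines of the log (default 300).
crash_report() {
    local log=$1 lines=${2:-300}
    tail -n "$lines" "$log"
}

# Usage inside the respawn loop (not executed here):
#   until run_logged "$LOG" <the homebridge command>; do
#       crash_report "$LOG" | mail -s "Homebridge Crash" pi
#       sleep 1
#   done
```

run_logged returns the exit status of the command it ran, so it can replace the command directly in the until condition.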
Thank you in advance for your help,
Kind regards, Rocco

Shell script: How to loop run two programs?

I'm running an Ubuntu server to mine crypto. It's not a very stable coin yet and their main node gets disconnected sometimes. When this happens it crashes the program through fatal error.
At first I wrote a loop script so it would keep running after a crash and just try again after 15 seconds:
while true;
do ./miner <somecodetoconfiguretheminer> &&break;
sleep 15
done;
This works, but is inefficient. Sometimes the loop will keep running for 30 minutes until the main node is back up - which costs me 30 minutes of hashing power unused. So I want it to run a second miner for 15 minutes to mine another coin, then check the first miner again if its working yet.
So basically: Start -> Mine coin 1 -> if crash -> Mine coin 2 for 15 minutes -> go to Start
I tried the script below but the server just becomes unresponsive once the first miner disconnects:
while true;
do ./miner1 <somecodetoconfiguretheminer> &&break;
timeout 900 ./miner2
sleep 15
done;
I've read through several topics and questions on how &&break, timeout and while true work, but I can't figure out what I'm missing here.
Thanks in advance for the help!
A much simpler solution would be to run both of the programs all the time, and lower the priority of the less-preferred one. On Linux and similar systems, that is:
nice -n 10 ./miner2loop.sh &
./miner1loop.sh
Then the scripts can be similar to your first one.
Okay, so after trial and error, and some help, I found out that there is nothing wrong with my initial code. timeout appears to behave differently on my Linux instance when used in a terminal than in a bash script. In a terminal it behaves as it should: it counts down and then kills the process it started. In a bash script, however, it acts as if I had typed 'sleep', and simply stops after counting down.
Apparently this has to do with my Ubuntu instance (running on a VPS), even though I installed the latest version of coreutils and have everything up to date through apt-get update etc. This is the case for me on Digital Ocean as well as Google Compute.
The solution is to implement the timeout as a function within the bash script, as found in another thread on Stack Overflow. I named the function timeout2 so that it is not confused with the timeout command that does not work properly:
#!/bin/bash
# Executes command with a timeout
# Params:
# $1 timeout in seconds
# $2 command (passed as a single quoted string)
# Returns 1 if timed out 0 otherwise
timeout2() {
    time=$1
    # start the command in a subshell to avoid problem with pipes
    # (spawn accepts one command)
    command="/bin/sh -c \"$2\""
    expect -c "set echo \"-noecho\"; set timeout $time; spawn -noecho $command; expect timeout { exit 1 } eof { exit 0 }"
    if [ $? = 1 ] ; then
        echo "Timeout after ${time} seconds"
    fi
}
while true;
do
    ./miner1 <parameters for miner> && break;
    sleep 5
    timeout2 300 "./miner2 <parameters for miner>"
done;

checking per ssh if a specific program is still running, in parallel

I have several machines where I have a program running. Every 30 seconds or so I want to check if those programs are still running. I use the following command to do that.
ssh ${USER}@${HOSTS[i]} "bash -c 'if [[ -z \"\$(pgrep -u ${USER} program)\" ]]; then exit 1; else exit 0; fi'"
Now running this on >100 machines takes a long time and I want to speed that up by checking in parallel. I am aware of '&' and 'parallel', but I am unsure how to retrieve the return value (task completed or not).
The following lets all connections complete before starting any in the next batch, and thus can potentially wait for more than 30 seconds -- but should give you a good idea of how to do what you're looking for:
hosts=( host1 host2 host3 )
user=someuser
script="script you want to run on each remote host"
last_time=$(( SECONDS - 30 ))
while (( ( SECONDS - last_time ) >= 30 )) || \
      sleep $(( 30 - (SECONDS - last_time) )); do
    last_time=$SECONDS
    declare -A pids=( )
    for host in "${hosts[@]}"; do
        ssh "${user}@${host}" "$script" & pids[$!]="$host"
    done
    for pid in "${!pids[@]}"; do
        wait "$pid" || {
            echo "Failure monitoring host ${pids[$pid]} at time $SECONDS" >&2
        }
    done
done
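The core of the loop above, mapping each background PID to its host and reading the exit status through wait, can be tried locally without any SSH access. In this sketch check_host is a hypothetical stand-in for the ssh call, and the rule that names starting with "bad" fail is made up for the demonstration.

```shell
#!/bin/bash
# Hypothetical stand-in for the ssh check: hosts named bad* report failure.
check_host() {
    case "$1" in
        bad*) return 1 ;;
        *)    return 0 ;;
    esac
}

# Start one background check per host, remember which PID belongs to which
# host, then collect each exit status with wait and print the failed hosts.
failed_hosts() {
    local -A pids=()
    local host pid
    for host in "$@"; do
        check_host "$host" &
        pids[$!]=$host
    done
    for pid in "${!pids[@]}"; do
        wait "$pid" || echo "${pids[$pid]}"
    done
}

failed_hosts host1 bad2 host3   # prints: bad2
```

The associative array needs bash 4 or later, and the order in which several failures are printed is not defined.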
Now, bigger picture: Don't do that.
Almost every operating system has a process supervision framework. Ubuntu has Upstart; Fedora and CentOS 7 have systemd; MacOS X has launchd; runit, daemontools, and others can be installed anywhere (and are very, very easy to use -- look at the run scripts at http://smarden.org/runit/runscripts.html for examples).
Using these tools is the Right Way to monitor a process and ensure that it restarts whenever it exits: Unlike this (very high-overhead) solution they have almost no overhead at all, since they rely on the operating system notifying a process's parent when that process exits, rather than doing the work of polling for a process (and that only after all the overhead of connecting via SSH, negotiating a pair of session keys, starting a shell to run your script, etc, etc, etc).
Yes, this may be a small private project. Still, you're making extra complexity (and thus, extra bugs) for yourself -- and if you learn to use the tools to do this right, you'll know how to do things right when you have something that isn't a small private project.

How to wait in script until device is connected

I have a Sky wireless sensor node and a script which prints the output from the node.
sudo ./serialdump-linux -b115200 /dev/tmotesky1
If I start this script before my pc detects the node, I get the following error:
/dev/tmotesky1: No such file or directory
But if I wait for example 20 seconds, I miss the initial prints (which are important).
Is there a way to detect if the /dev/tmotesky1 exists?
Something like
while [ ! -e /dev/tmotesky1 ] ; do sleep 1; echo 'Waiting...'; done
Thanks in advance!
Your code indicates that you are using Linux where you can use the hotplugging mechanism.
On generic systems, you can write a udev rule (run udevadm monitor -e to see what happens when you attach the device) which starts e.g. a program or writes something into a pipe. When systemd is used, you can start a service (see man systemd.device).
On small/embedded systems it is possible to write a custom /sbin/hotplug program (set in /proc/sys/kernel/hotplug) instead of using udev.
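Where a hotplug rule is not an option, a polling fallback along the lines of the loop in the question could be sketched as follows. wait_for_path is a hypothetical helper name. It tests with -e because device nodes are not regular files, so -f would never succeed on them. With a one-second poll the first output of the node may still be missed, which is why the event-driven approaches above are preferable.

```shell
#!/bin/bash
# wait_for_path PATH [TIMEOUT_SECONDS]
# Hypothetical helper: polls once per second until PATH exists.
# Returns 0 when the path appears, 1 when the timeout (default 30 s) expires.
wait_for_path() {
    local path=$1 timeout=${2:-30} waited=0
    until [ -e "$path" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Usage (not executed here):
#   wait_for_path /dev/tmotesky1 60 && sudo ./serialdump-linux -b115200 /dev/tmotesky1
```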

Bash script to monitor ISDN connection

On a Ubuntu 10.04 Server I would like to do the following with a bash script:
Create a service that monitors the ISDN connection and if downtime exceeds 60 seconds forces reconnect.
My current solution looks something like this:
#!/usr/bin/bash
LOGFILE=/home/msw/router/ping-stats.txt
TIME="`date +%C%y%m%d%H%M`"
/sbin/ping -c 1 google.com > /dev/null 2>&1
if [ "$?" == "0" ]
then
STATUS=1
else
STATUS=0
fi
echo "$TIME $STATUS" >> $LOGFILE
Since bandwidth is precious on an ISDN connection, I would like to avoid the ping and replace it with a command that simply checks for the status of the network device. Is it possible to infer from that if the connection is "up"?
I would also like to implement the solution as a service that checks connectivity continually instead of checking periodically with a cronjob.
I would really appreciate it if somebody could push me in the right direction.
Thank you
For quick-and-dirty there's nm-tool. dbus-send can be a bit more precise, but needs knowledge of how D-Bus works with NetworkManager. If you want something persistent then you'll need to learn how to interact with D-Bus, but that may require using something a bit lower-level such as Python.
Is your ISDN provided by an internal adapter or via an Ethernet connection? If you have an external "modem" you'd need to query that using SNMP or its proprietary facility.
You can do your test this way, by the way:
if /sbin/ping -c 1 google.com > /dev/null 2>&1
then
...
Also, a single ping is a very small number of bytes. If you're only doing it a few times a minute you may never notice it.
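The "reconnect after 60 seconds of downtime" logic from the question can be separated from the actual check, so that the ping can later be swapped for a cheaper device query. This is a sketch under assumptions: monitor_link is a hypothetical helper, the reconnect command is not known from the question, and the ROUNDS parameter exists only so the loop can be stopped in a test.

```shell
#!/bin/bash
# monitor_link CHECK_CMD RECONNECT_CMD [MAX_DOWN] [INTERVAL] [ROUNDS]
# Runs CHECK_CMD every INTERVAL seconds. Once it has failed for MAX_DOWN
# seconds in a row, RECONNECT_CMD is run and the downtime counter resets.
# ROUNDS limits the number of checks; 0 means run forever.
monitor_link() {
    local check=$1 reconnect=$2 max_down=${3:-60} interval=${4:-10} rounds=${5:-0}
    local down=0 n=0
    while [ "$rounds" -eq 0 ] || [ "$n" -lt "$rounds" ]; do
        # The command strings are word-split on purpose so they can carry arguments.
        if $check >/dev/null 2>&1; then
            down=0
        else
            down=$((down + interval))
            if [ "$down" -ge "$max_down" ]; then
                $reconnect
                down=0
            fi
        fi
        n=$((n + 1))
        sleep "$interval"
    done
}

# Usage (not executed here; the reconnect command is a placeholder):
#   monitor_link "/sbin/ping -c 1 google.com" "<reconnect command>" 60 10
```

Downtime is counted in steps of INTERVAL, so the measured outage is approximate rather than exact.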
