Shell script: How to loop run two programs? - bash

I'm running an Ubuntu server to mine crypto. It's not a very stable coin yet and its main node gets disconnected sometimes. When this happens, the miner crashes with a fatal error.
At first I wrote a loop script so it would keep running after a crash and just try again after 15 seconds:
while true; do
    ./miner <somecodetoconfiguretheminer> && break
    sleep 15
done
This works, but is inefficient. Sometimes the loop keeps retrying for 30 minutes until the main node is back up - which costs me 30 minutes of unused hashing power. So I want it to run a second miner for 15 minutes to mine another coin, then check whether the first miner is working again.
So basically: Start -> Mine coin 1 -> if crash -> Mine coin 2 for 15 minutes -> go to Start
I tried the script below but the server just becomes unresponsive once the first miner disconnects:
while true; do
    ./miner1 <somecodetoconfiguretheminer> && break
    timeout 900 ./miner2
    sleep 15
done
I've read through several topics / questions on how && break works, how timeout works and how while true works, but I can't figure out what I'm missing here.
Thanks in advance for the help!

A much simpler solution would be to run both of the programs all the time, and lower the priority of the less-preferred one. On Linux and similar systems, that is:
nice -10 ./miner2loop.sh &
./miner1loop.sh
Then the scripts can be similar to your first one.
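For illustration only, miner1loop.sh could be essentially the loop from the question, and miner2loop.sh the same thing pointed at the second miner (binary names and options here are the question's placeholders, not real configuration):
#!/bin/bash
# miner1loop.sh -- restart miner1 whenever it crashes, stop if it exits cleanly
while true; do
    ./miner1 <somecodetoconfiguretheminer> && break
    sleep 15   # wait before retrying after a crash
done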

Okay, so after trial and error - and some help - I found out that there is nothing wrong with my initial code. timeout appears to behave differently on my Linux instance when used in a terminal than in a bash script. Used in a terminal it behaves as it should: it counts down and then kills the process it started. Used in a bash script, however, it acts as if I had typed 'sleep': it counts down and then simply stops, without killing anything.
Apparently this has to do with my Ubuntu instance (running on a VPS), even though I have the latest version of coreutils and everything else installed through apt-get update etc. This is the case for me on Digital Ocean as well as Google Compute.
The solution is to wrap the timeout behaviour in a function inside the bash script, as found in another thread on Stack Overflow. I named the function timeout2 so as not to collide with the timeout command that isn't working properly for me:
#!/bin/bash
# Executes a command with a timeout
# Params:
#   $1 timeout in seconds
#   $2 command
# Returns 1 if timed out, 0 otherwise
timeout2() {
    time=$1
    # start the command in a subshell to avoid problems with pipes
    # (spawn accepts one command)
    command="/bin/sh -c \"$2\""
    expect -c "set echo \"-noecho\"; set timeout $time; spawn -noecho $command; expect timeout { exit 1 } eof { exit 0 }"
    if [ $? = 1 ] ; then
        echo "Timeout after ${time} seconds"
    fi
}
while true; do
    ./miner1 <parameters for miner> && break
    sleep 5
    # pass the second miner and its parameters as a single quoted argument to timeout2
    timeout2 300 "./miner2 <parameters for miner>"
done

Related

How to exit command if command does not output anything for 10 seconds?

I want to ssh to the IP obtained in variable "a". This works fine if everything is OK.
But if my command "commandToGetIP" gets stuck in the network, or for any reason returns no output, my script hangs.
What I need is for "commandToGetIP" to wait only 10 seconds and then come back with some message:
a=$(commandToGetIP <DeviceID>)
ssh $a
The coreutils timeout command puts a time limit on another command. If the command takes too long, timeout kills it and exits with status 124. Using this information, I check that the exit code was not 124 before performing the ssh command. Otherwise, it does whatever you want it to do when commandToGetIP takes too long.
ip=$(timeout 10 commandToGetIP <DeviceID>)
result=$?
if [ "$result" != "124" ]; then
    ssh "$ip"
else
    echo "commandToGetIP timed out" >&2   # or whatever you want to do on a timeout
fi

checking per ssh if a specific program is still running, in parallel

I have several machines where I have a program running. Every 30 seconds or so I want to check if those programs are still running. I use the following command to do that.
ssh ${USER}@${HOSTS[i]} "bash -c 'if [[ -z \"\$(pgrep -u ${USER} program)\" ]]; then exit 1; else exit 0; fi'"
Now running this on >100 machines takes a long time and I want to speed that up by checking in parallel. I am aware of '&' and 'parallel', but I am unsure how to retrieve the return value (task completed or not).
The following lets all connections complete before starting any in the next batch, and thus can potentially wait for more than 30 seconds -- but should give you a good idea of how to do what you're looking for:
hosts=( host1 host2 host3 )
user=someuser
script="script you want to run on each remote host"
last_time=$(( SECONDS - 30 ))
while (( ( SECONDS - last_time ) >= 30 )) || \
      sleep $(( 30 - (SECONDS - last_time) )); do
    last_time=$SECONDS
    declare -A pids=( )
    for host in "${hosts[@]}"; do
        ssh "${user}@${host}" "$script" & pids[$!]="$host"
    done
    for pid in "${!pids[@]}"; do
        wait "$pid" || {
            echo "Failure monitoring host ${pids[$pid]} at time $SECONDS" >&2
        }
    done
done
Now, bigger picture: Don't do that.
Almost every operating system has a process supervision framework. Ubuntu has Upstart; Fedora and CentOS 7 have systemd; MacOS X has launchd; runit, daemontools, and others can be installed anywhere (and are very, very easy to use -- look at the run scripts at http://smarden.org/runit/runscripts.html for examples).
Using these tools is the Right Way to monitor a process and ensure that it restarts whenever it exits: unlike this (very high-overhead) solution they have almost no overhead at all, since they rely on the operating system notifying a process's parent when that process exits, rather than doing the work of polling for a process (and that only after all the overhead of connecting via SSH, negotiating a pair of session keys, starting a shell to run your script, etc, etc, etc).
Yes, this may be a small private project. Still, you're making extra complexity (and thus, extra bugs) for yourself -- and if you learn to use the tools to do this right, you'll know how to do things right when you have something that isn't a small private project.
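To make the bigger picture concrete, a runit service for something like the miner from the first question is nothing more than a directory containing an executable run script; the paths below are illustrative, not taken from any of the questions:
#!/bin/sh
# /etc/sv/miner1/run -- runsv re-executes this script whenever the process exits
cd /opt/miner1 || exit 1
exec ./miner1 <somecodetoconfiguretheminer>
Once the service directory is linked into the directory runsvdir watches, the process is restarted automatically, with no polling loop at all.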

Introduce timeout in a bash for-loop

I have a task that fits very well inside a bash for loop. The trouble is that a few of the iterations seem never to terminate. What I'm looking for is a way to introduce a timeout, so that if an iteration of the command hasn't terminated after e.g. two hours it is killed and the loop moves on to the next iteration.
Rough outline:
for somecondition; do
    while time-run(command) < 2h do
        continue command
    done
done
One (tedious) way is to start the process in the background, then start another background process that attempts to kill the first one after a fixed timeout.
timeout=7200  # two hours, in seconds
for somecondition; do
    command & command_pid=$!
    ( sleep $timeout & wait; kill $command_pid 2>/dev/null ) & sleep_pid=$!
    wait $command_pid
    kill $sleep_pid 2>/dev/null  # if command completes prior to the timeout
done
The wait command blocks until the original command completes, whether naturally or because it was killed after the sleep completes. The wait immediately after sleep is used in case the user tries to interrupt the process, since sleep ignores most signals, but wait is interruptible.
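If the coreutils timeout command discussed elsewhere on this page is available, the same effect needs far less plumbing; a sketch, with command standing in for whatever each iteration actually runs:
timeout=7200  # two hours, in seconds
for somecondition; do
    # timeout sends SIGTERM after $timeout seconds and exits with status 124 on a timeout
    timeout "$timeout" command || echo "iteration timed out or failed" >&2
done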
If I'm understanding your requirement properly, you have a process that needs to run, but you want to make sure that if it gets stuck it moves on, right? I don't know if this will fully help you out, but here is something I wrote a while back to do something similar (I've since improved this a bit, but I only have access to a gist at present; I'll update with the better version later).
#!/bin/bash
######################################################
# Program: logGen.sh
# Date Created: 22 Aug 2012
# Description: parses logs in real time into daily error files
# Date Updated: N/A
# Developer: @DarrellFX
######################################################

#Prefix for pid file
pidPrefix="logGen"

#Output directory
outDir="/opt/Redacted/logs/allerrors"

#Simple function to see if running on primary
checkPrime ()
{
    if /sbin/ifconfig eth0:0 | /bin/grep -wq inet; then isPrime=1; else isPrime=0; fi
}

#Function to kill previous instances of this script
killScript ()
{
    /usr/bin/find /var/run -name "${pidPrefix}.*.pid" | while read pidFile; do
        if [[ "${pidFile}" != "/var/run/${pidPrefix}.${$}.pid" ]]; then
            /bin/kill -- -$(/bin/cat ${pidFile})
            /bin/rm ${pidFile}
        fi
    done
}

#Check to see if primary
#If so, kill any previous instance and start log parsing
#If not, just kill leftover running processes
checkPrime
if [[ "${isPrime}" -eq 1 ]]; then
    echo "$$" > /var/run/${pidPrefix}.$$.pid
    killScript
    commands && commands && commands #Where the actual command to run goes.
else
    killScript
    exit 0
fi
I then set this script to run from cron every hour. Every time the script is run, it:
1. creates a lock file, named after a variable that identifies the script, containing the pid of that instance of the script;
2. calls the function killScript, which uses the find command to find all lock files for that version of the script (this lets more than one of these scripts be run from cron at once, for different tasks); for each file it finds, it kills the processes recorded in that lock file and removes the lock file (it automatically checks that it's not killing itself);
3. starts doing whatever it is I need to run without getting stuck (I've omitted that, as it's hideous bash string manipulation that I've since redone in Python).
If this doesn't get you squared away, let me know.
A few notes:
the checkPrime function is poorly done, and should either return a status, or just exit the script itself
there are better ways to create lock files and be safe about it, but this has worked for me thus far (famous last words)
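One of those better ways is to let flock manage the lock instead of hand-rolled pid files. The sketch below exits when another instance holds the lock, rather than killing it, so it inverts the behaviour of the script above (the lock path is illustrative):
#!/bin/bash
lockfile=/var/run/logGen.lock
exec 200>"$lockfile"            # open the lock file on file descriptor 200
if ! flock -n 200; then         # non-blocking: fail if another instance holds the lock
    echo "another instance is already running" >&2
    exit 1
fi
# ... the actual work goes here; the lock is released when the script exits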

How to switch a sequence of tasks to background?

I'm running two tests on a remote server; here is the command I used several hours ago:
% ./test1.sh; ./test2.sh
The two tests are supposed to run one after the other. If the second runs before the first completes, everything will be ruined and I'll have to restart the whole procedure.
The dilemma is that these two tasks take many hours to complete, and now that I want to log out of the server and wait for the result, I don't know how to switch both of them to the background... If I use Ctrl+Z, only the first task will be suspended; the second then starts, doing nothing useful while wiping out the current data.
Is it possible to switch both of them to the background, preserving their order? Really I should have put these two tasks in the same process group, like (./test1.sh; ./test2.sh) &, but sadly the first test has already been running for several hours and it would be a pity to restart the tests.
An option is to kill the second test before it starts, but is there any mechanism to cope with this?
First rename ./test2.sh to ./test3.sh. Then press Ctrl+Z, followed by bg and disown -h. Then save this script as test4.sh:
while :; do
    sleep 5
    pgrep -f test1.sh &> /dev/null
    if [ $? -ne 0 ]; then
        nohup ./test3.sh &
        break
    fi
done
Then run nohup ./test4.sh & and you can log out.
First, screen or tmux are your friends here, if you don't already work with them (they make remote machine work an order of magnitude easier).
To use conditional consecutive execution you can write:
./test1.sh && ./test2.sh
which will only execute test2.sh if test1.sh returns with 0 (conventionally meaning: no error). Example:
$ true && echo "first command was successful"
first command was successful
$ ! true && echo "ain't gonna happen"
More on control operators: http://www.humbug.in/docs/the-linux-training-book/ch08s01.html
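For future runs, the whole sequence can also be started detached from the terminal in the first place, so logging out is safe from the start; a sketch using the scripts from the question:
# run both tests in order in one background shell, immune to hangup
nohup bash -c './test1.sh && ./test2.sh' > tests.log 2>&1 &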

Un*x shell script: what is the correct way to run a script for at most x milliseconds?

I'm not a scripting expert and I was wondering what was an acceptable way to run a script for at most x milliseconds (and yet finish before x milliseconds if the script is done before the timeout).
I solved that problem using Bash in a way that I think is very hacky and I wonder if there's a better way to do it.
Basically I've got one shell script called sleep_kill.sh that takes a PID as the first argument and a timeout as its second argument and that does this:
sleep $2
kill -9 $1 2> /dev/null 1> /dev/null
So if the PID corresponds to a script that finishes before timing out, nothing is going to be killed (I take it that the OS won't have had time to reuse this PID for another [unrelated] process, since it 'cycles' through all the process IDs before starting to reuse them).
Anyway, then I call my script that may "hang" or time out:
command_that_may_hang.sh &
PID=$!
sleep_kill.sh $PID .3 &
wait $PID > /dev/null 2>&1
And I'll be waiting at most 300 ms for command_that_may_hang.sh. Yet if command_that_may_hang.sh took only 10 ms to execute, I won't be "stuck" for 300 ms.
It would be great if some shell expert could explain the drawbacks of this approach and what should be done instead.
Have a look at this script: http://www.pixelbeat.org/scripts/timeout
Note that timeouts of less than one second are pretty much nonsensical on most systems due to scheduling delays etc. Note also that newer coreutils has the timeout command included, with a resolution of 1 second.
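Where the installed coreutils timeout accepts fractional durations (recent GNU versions do), the sleep_kill.sh helper can be dropped entirely; a sketch using the script name from the question:
# kill command_that_may_hang.sh if it is still running after 0.3 seconds;
# timeout exits with status 124 when the limit is hit
timeout 0.3 ./command_that_may_hang.sh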
