Catching a specific error and re-trying script?

I have a bash script which runs a program to migrate some data. This fails around 30-40% of the time.
I want a way to retry the script when this particular error comes up, but I only want to try 3 times before failing.
The script outputs the following when it fails:
Error: The connection to the remote server has timed out, no changes have been committed. (#134 - scope: ajax_verify_connection_to_remote_site)
Edit: To be more specific....
migration.sh:
#!/bin/bash
various other scripts........
sudo a_broken_migration_program <Variables>
I want to retry broken_migration several times, ideally only when it fails with this specific error but if that's too complicated I will settle on retrying all errors.

To do this, just run your command in a loop:
# Try the command up to 3 times
counter=1
while [[ $counter -le 3 ]]; do
    yourcommand && break
    ((counter++))
done
If yourcommand succeeds, break exits the loop. If it fails, the counter is incremented and the loop runs again, until three attempts have been made.
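The same loop can be wrapped in a small reusable function. This is only a sketch: `retry` is a hypothetical helper name, and `false` and `true` stand in for a failing and a succeeding command.

```shell
#!/bin/bash
# retry MAX CMD [ARGS...]: run CMD until it succeeds or MAX attempts are used up.
# `retry` is a hypothetical helper, not a shell builtin.
retry() {
    local max=$1 attempt=1
    shift
    until "$@"; do
        if (( attempt >= max )); then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        (( attempt++ ))
    done
}

# `false` always fails, `true` always succeeds.
retry 3 false; failed_rc=$?
retry 3 true;  ok_rc=$?
echo "failing command: $failed_rc, succeeding command: $ok_rc"
```

In the question's script this would be called as something like `retry 3 sudo a_broken_migration_program <Variables>`.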
If you just want to retry on a specific error code, you could capture the error on failure, test the code, and increment:
# Retry up to 3 times, but only while the exit code is 255
counter=1
while [[ $counter -le 3 ]]
do
    # command to run
    ssh person@compthatdoesntexist
    rc=$?
    [[ $rc -eq 255 ]] && ((counter++)) || break
done
This example tries to ssh to a box that doesn't exist. We then capture the return code $? in variable $rc. If $rc is 255 ("ssh: Could not resolve hostname compthatdoesntexist: Name or service not known") then it increments the counter and loops. Any other exit code kicks us out of the loop.
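If the program does not use a distinct exit code for the timeout, one option is to capture its output and look for the error text instead. The sketch below assumes the message goes to stdout or stderr. `run_migration` is a hypothetical stand-in for `sudo a_broken_migration_program <Variables>` that fails twice with the timeout message and then succeeds, so the loop can be tried without the real program.

```shell
#!/bin/bash
# Hypothetical stand-in for `sudo a_broken_migration_program <Variables>`:
# fails with the timeout message on the first two calls, then succeeds.
# The call count lives in a temp file because $(...) runs in a subshell.
state_file=$(mktemp)
echo 0 > "$state_file"
run_migration() {
    local calls
    calls=$(( $(cat "$state_file") + 1 ))
    echo "$calls" > "$state_file"
    if (( calls < 3 )); then
        echo "Error: The connection to the remote server has timed out, no changes have been committed. (#134 - scope: ajax_verify_connection_to_remote_site)" >&2
        return 1
    fi
    echo "migration complete"
}

max_attempts=3
attempt=1
status=1
while (( attempt <= max_attempts )); do
    # Capture stdout and stderr together so the message can be inspected.
    output=$(run_migration 2>&1)
    status=$?
    printf '%s\n' "$output"
    if (( status == 0 )); then
        break    # success: stop retrying
    fi
    if [[ $output != *"connection to the remote server has timed out"* ]]; then
        break    # a different error: do not retry
    fi
    (( attempt++ ))
done
rm -f "$state_file"
# $status now holds the exit code of the last attempt.
```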

Related

Bash script not exiting once a process is running

How should I modify my bash script logic so it exits the while loop and exits the script itself once a process named custom_app is running on my local Ubuntu 18.04? I've tried using break and exit inside an if statement with no luck.
Once custom app is running from say...1st attempt then I quit the app, run_custom_app.sh lingers in the background and resumes retrying 2nd, 3rd, 4th, 5th time. It should be doing nothing at this point since app already ran successfully and user intentionally quit.
Below is run_custom_app.sh used to run my custom app triggered from a website button click.
Script logic
Check if custom_app process is running already. If so, don't run the commands in the while code block. Do nothing. Exit run_custom_app.sh.
While custom_app process is NOT running, retry up to 5 times.
Once custom_app process is running, stop while loop and exit run_custom_app.sh as well.
In cases where 5 run retries have been attempted but custom_app process is still not running, display a message to the user.
#!/bin/sh
RETRYCOUNT=0
PROCESS_RUNNING=`ps cax | grep custom_app`
# Try to connect until process is running. Retry up to 5 times. Wait 10 secs between each retry.
while [ ! "$PROCESS_RUNNING" ] && [ "$RETRYCOUNT" -le 5 ]; do
    RETRYCOUNT="`expr $RETRYCOUNT + 1`"
    commands
    sleep 10
    PROCESS_RUNNING=`ps cax | grep custom_app`
    if [ "$PROCESS_RUNNING" ]; then
        break
    fi
done
# Display an error message if not connected after 5 connection attempts
if [ ! "$PROCESS_RUNNING" ]; then
    echo "Failed to connect, please try again in about 2 minutes" # I need to modify this later so it opens a Terminal window displaying the echo statement, not yet sure how.
fi
I tested this code with VirtualBox standing in for your custom_app. My previous post used an until loop with pgrep instead of ps. As DavidC.Rankin suggested, pidof is more correct, but if you want to use ps then I suggest ps -C custom_app -o pid=
#!/bin/sh
retrycount=0
until my_app_pid=$(ps -C VirtualBox -o pid=); do ##: save the output of ps in a variable so we can check/test it for later.
    echo commands ##: Just echoed the command here not sure which commands you are using/running.
    if [ "$retrycount" -eq 4 ]; then ##: We started at 0 so the fifth count is 4
        break ##: exit the loop
    fi
    sleep 10
    retrycount=$((retrycount+1)) ##: increment by one using shell syntax without expr
done
if [ -n "$my_app_pid" ]; then ##: if $my_app_pid is not empty
    echo "app is running"
else
    echo "Failed to connect, please try again in about 2 minutes" >&2 ##: print the message to stderr
    exit 1 ##: exit with a failure which is not 0
fi
The my_app_pid=$(ps -C VirtualBox -o pid=) variable assignment has a useful exit status so we can use it.
Basically the until loop is just the opposite of the while loop.
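A different approach from the one above, offered only as a sketch: when the script itself launches the app, it can remember the PID from `$!` and probe it with `kill -0` instead of searching the process table. `start_app` is a hypothetical launcher, `sleep 30` stands in for custom_app, and the 1-second delay replaces the question's 10 seconds to keep the demo short. It assumes the app keeps running under the PID it was started with rather than daemonizing.

```shell
#!/bin/bash
# Hypothetical launcher: `sleep 30` stands in for custom_app.
start_app() {
    sleep 30 &
    app_pid=$!
}

max_tries=5
delay=1        # seconds between checks (the question uses 10)
tries=0
running=no

while (( tries < max_tries )); do
    tries=$(( tries + 1 ))
    start_app
    sleep "$delay"
    # kill -0 sends no signal; it only reports whether the PID still exists.
    if kill -0 "$app_pid" 2>/dev/null; then
        running=yes
        break
    fi
done

if [[ $running == yes ]]; then
    echo "app is running after $tries attempt(s)"
else
    echo "Failed to connect, please try again in about 2 minutes" >&2
fi

# Clean up the stand-in process so the demo leaves nothing behind.
kill "$app_pid" 2>/dev/null
wait "$app_pid" 2>/dev/null || true
```

Because the loop exits as soon as the PID is alive, the script does nothing further if the user later quits the app.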

Bash script: spawning multiple processes issues

I am writing a script to call a process 365 times, and the calls should run in 10 batches. This is what I wrote, but there are multiple issues:
1. The log message is not getting written to the log file; I see the error message in the err file.
2. There is a "Command not found" error I keep getting from the script for the process line.
3. Even if the command doesn't succeed, it doesn't print "fail" but prints "success".
#!/bin/bash
set -m
FAIL=0
for i in {1..10}
do
    waitPIDS=()
    j=$i
    while [ $j -lt 366 ]; do
        exec 1>logfile
        exec 2>errorfile
        `process $j &`
        waitPIDS[${#waitPIDS[@]}]=$!
        j=$[$j+1]
    done
    for jpid in "${waitPIDS[@]}"
    do
        echo $jpid
        wait $jpid
        if [[ $? != 0 ]] ; then
            echo "fail"
        else
            echo "success"
        fi
    done
done
What is wrong with it ?
thanks!
At the very least, this line:
`process $j &`
Shouldn't have any backticks in it. You probably just want:
process $j &
Besides that, you're overwriting your log files instead of appending to them; is that intended?
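Putting those fixes together, a batched version might look like the sketch below. It assumes the intent is to run at most 10 jobs at a time and to count failures. `process` here is a hypothetical stand-in that fails for every job number divisible by 50, and the log files are temporary files chosen only for the demo.

```shell
#!/bin/bash
# Hypothetical stand-in for the real `process` command:
# fails for every job number divisible by 50, succeeds otherwise.
process() { (( $1 % 50 != 0 )); }

total=365
batch_size=10
fail=0
logfile=$(mktemp)      # demo-only log locations
errorfile=$(mktemp)

for (( start = 1; start <= total; start += batch_size )); do
    pids=()
    for (( j = start; j < start + batch_size && j <= total; j++ )); do
        # No backticks: run the job directly in the background and append to the logs.
        process "$j" >>"$logfile" 2>>"$errorfile" &
        pids+=("$!")
    done
    # wait returns the exit status of the job it waited for.
    for pid in "${pids[@]}"; do
        if ! wait "$pid"; then
            fail=$(( fail + 1 ))
        fi
    done
done

echo "failed jobs: $fail"
rm -f "$logfile" "$errorfile"
```

Redirecting each job with `>>` keeps the script's own stdout free for the summary, which the `exec` redirections in the original would have swallowed.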

how to assign a value to variable and get the return value of output in single line

I have the line below in my script:
script_list=`ssh#hostip ls -A /directory 2>/dev/null`
Is there a way to use that in an if condition, so that I can either get the script_list variable assigned or handle the failure scenario in the else branch?
Thanks in advance
You can simply check the automatic variable $? in the next line:
script_list=$( ssh ... )
rc=$?
if [[ $rc -ne 0 ]]; then
...something is wrong...
fi
This works because, if ssh itself could be executed successfully, the exit code of ssh is the exit code of the command it ran remotely. Usually you don't care which part of the command chain failed; it's good enough to know that some part (the local ssh or the remote command) failed.
No problem, just do it. An assignment is perfectly fine as a command (by command I mean the thing which can come after an if).
if asdf=$(echo test1; exit 1); then
echo "SUCCESS1: $asdf"
fi
if asdf=$(echo test0; exit 0); then
echo "SUCCESS0: $asdf"
fi
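Applied to the question, the pattern might look like this sketch. `list_scripts` is a hypothetical stand-in for the ssh call so the example runs without a remote host, and `describe` is a hypothetical wrapper used only to show both outcomes.

```shell
#!/bin/bash
# Hypothetical stand-in for the remote `ls -A /directory` over ssh.
list_scripts() {
    if [[ $1 == ok ]]; then
        printf '%s\n' a.sh b.sh
    else
        return 2
    fi
}

describe() {
    # Declare first: `local script_list=$(...)` would report the status of
    # `local` itself and hide the command's exit status.
    local script_list
    if script_list=$(list_scripts "$1" 2>/dev/null); then
        echo "found: ${script_list//$'\n'/ }"
    else
        # In the else branch, $? is still the status of the failed assignment.
        echo "listing failed with status $?"
    fi
}

ok_msg=$(describe ok)
bad_msg=$(describe broken)
printf '%s\n%s\n' "$ok_msg" "$bad_msg"
```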

bash - killing a subprocess after a set timeout

I was hoping somebody would be able to help me with this.
I need a loop for a shell script that runs what is inside the loop for 15 seconds. So, for example:
if (true)
run command for 15 seconds
fi
kill PID
I am new to shell scripting, so I am lost with this.
Also, I am using a Debian install, if that makes any difference.
Any help is appreciated
Are you looking for the timeout command?
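A minimal sketch of that suggestion, assuming GNU coreutils `timeout` (the version Debian ships). `sleep 60` stands in for the real command, and the limit is 2 seconds instead of 15 to keep the demo short.

```shell
#!/bin/bash
# Run the command, but stop it if it is still running after the limit.
timeout 2 sleep 60
rc=$?

# GNU timeout exits with status 124 when the time limit was reached.
if (( rc == 124 )); then
    echo "command was stopped after the time limit"
else
    echo "command finished on its own with status $rc"
fi
```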
The following bash script might work for you. It stores the epoch time in a variable before the loop starts, then reads the current epoch time on every iteration and compares the two. As long as the difference is less than or equal to 15 seconds, your command keeps running. In the script below the command is 'echo "counting ${COUNTER}"'; change that line to whatever you are trying to accomplish. Once the difference exceeds 15 seconds the script exits, and this is the point where you need to issue your kill command. If an error does occur you should see "ERROR... YourScript.sh failed" in "YourLogFile" (set the log file to whatever you like).
NOTE: Whatever you run inside this loop may run many, many times within the 15-second period. Using the script below as a test, you will see that the echo command runs more than 50 times per second.
#!/bin/bash
LOOP="true"
INITIAL_TIME=$(date "+%s")
while [[ ${LOOP} == true ]]; do
    CURRENT_TIME=$(date "+%s")
    COUNTER=$(expr ${CURRENT_TIME} - ${INITIAL_TIME})
    if [[ ${COUNTER} -le "15" ]]; then
        echo "counting ${COUNTER}"
        # RUN YOUR COMMAND IN PLACE OF THE ABOVE echo COMMAND
    elif [[ ${COUNTER} -gt "15" ]]; then
        exit 0
        #INITIATE YOUR KILL COMMAND IN PLACE OF OR BEFORE THE exit
    else
        echo "ERROR... YourScript.sh failed" >> /YourLogFile
    fi
done
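If the goal is to run a single long-lived command once and kill it after the time is up, rather than re-running a short command in a loop, a background job plus kill is another option. This is a sketch only: `sleep 60` stands in for the real command, and the wait is 2 seconds instead of 15.

```shell
#!/bin/bash
# Start the command in the background and remember its PID.
sleep 60 &
pid=$!

# Let it run for the allowed time (15 in the question).
sleep 2

# Send SIGTERM; kill fails if the process has already exited.
if kill "$pid" 2>/dev/null; then
    killed=yes
else
    killed=no
fi

# Reap the job; a process ended by SIGTERM reports status 143 (128 + 15).
wait "$pid" 2>/dev/null
wait_rc=$?
echo "killed=$killed, exit status $wait_rc"
```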

Waiting for a command to return in a bash script

What I am trying to do:
My bash shell script contains a modprobe -a $modulename. Sometimes loading that module fails and the modprobe statement just gets stuck. It never returns and hence, my script is stuck too.
What I want to do is this: Call modprobe -a $modulename , wait for 20 secs and if the command does not return and script remains stuck for 20 secs, call that a failure and exit !
I am looking at possible options for that. I know timeout is one, which will allow me to timeout after certain time. So I am thinking :
timeout -t 10 modprobe -a $modulename
if [ "$?" -gt 0 ]; then
echo "error"
exit
fi
But the problem is $? can be > 0 , not just because of timeout, but because of an error while loading the module too and I want to handle the two cases differently.
Any ideas using timeout and without using timeout are welcome.
According to timeout(1), timeout exits with a specific code (124 in my case) if the command times out. It's highly unlikely that modprobe would exit with that code, so you could probably check specifically for that by changing your condition:
...
RET="$?"
if [[ "$RET" = "124" ]]; then
    echo timeout
    OTHER COMMAND
elif [[ "$RET" -gt 0 ]]; then
    echo error
    exit
fi
BTW, it is a very good practice to assign "$?" to a variable immediately after your command. You will avoid a lot of grief later...
If you really do need to make sure, you can check the modprobe source code to see what exit codes it produces, since apparently it was not deemed important enough to mention in its manual page...
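The check can be wrapped in a helper that reports the three cases separately. This sketch assumes GNU coreutils `timeout`, which takes the duration as its first argument. `run_with_limit` is a hypothetical name, and `sleep`, `sh -c 'exit 3'` and `true` stand in for a hung, a failing and a successful modprobe.

```shell
#!/bin/bash
# run_with_limit LIMIT CMD [ARGS...]: hypothetical helper that runs CMD under
# timeout and reports whether it succeeded, timed out, or failed on its own.
run_with_limit() {
    local limit=$1 rc
    shift
    timeout "$limit" "$@"
    rc=$?    # save the status immediately, before any other command overwrites it
    case $rc in
        0)   echo "ok" ;;
        124) echo "timeout" ;;
        *)   echo "error (exit status $rc)" ;;
    esac
    return "$rc"
}

timed_out=$(run_with_limit 1 sleep 5)        # stand-in for a hung command
failed=$(run_with_limit 5 sh -c 'exit 3')    # stand-in for a command that fails
worked=$(run_with_limit 5 true)              # stand-in for a command that succeeds
printf '%s\n' "$timed_out" "$failed" "$worked"
```

In the question's script the call would be something like `run_with_limit 20 modprobe -a "$modulename"`.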
Consider using "expect": you can set a timeout as well as run a different command depending on the outcome of modprobe.
Regards,
Andrew.
