I have a cron job that runs every hour. It accesses an XML feed. If the XML feed is unavailable (which seems to happen about once a day) it creates a "failure" file. This "failure" file has some metadata in it and is erased at the next hour, when the script runs again and the XML feed works again.
What I want is to make a 2nd cron job that runs a few minutes after the first one, looks into the directory for a "failure" file and, if it's there, retries the 1st cron job.
I know how to set up cron jobs, I just don't know how to make scripting conditional like that. Am I going about this in entirely the wrong way?
Possibly. Maybe what you'd be better off doing is having the original script sleep and retry a (limited) number of times.
sleep is a shell command and shells support looping, so it could look something like this:
for ((retry=0; retry<12; retry++)); do
    # try the thing (i.e. fetch the feed)
    if [[ -e my_xml_file ]]; then break; fi
    sleep 300
    # five minutes later...
done
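A more concrete version of that loop, as a sketch only (the curl call, feed URL, and file name here are placeholders, not from the original setup):
for ((retry=0; retry<12; retry++)); do
    # -f makes curl fail on HTTP errors, -sS keeps it quiet except for real errors
    curl -fsS -o my_xml_file 'http://example.com/feed.xml' && break
    sleep 300   # wait five minutes before the next attempt
done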
As the command to run, try:
/bin/bash -c 'test -e failurefile && retrycommand -someflag -etc'
This runs retrycommand only if failurefile exists.
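Put together as crontab entries, it might look like this (the paths and schedule are placeholders, assuming the main job runs on the hour):
0 * * * *  /path/to/fetch_feed.sh
10 * * * * /bin/bash -c 'test -e /path/to/failurefile && /path/to/fetch_feed.sh'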
Why not have your script touch a status file when it has successfully completed? Have it run every 5 minutes, and make the first check of the script be whether the status file is less than 60 minutes old: if it is young, quit; if it is old, fetch.
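A minimal sketch of that check, assuming a status file path and a stand-in fetch_feed command (both are placeholders):
#!/bin/bash
STATUS=/var/tmp/feed.success

# Quit if the status file exists and is younger than 60 minutes.
if [ -n "$(find "$STATUS" -mmin -60 2>/dev/null)" ]; then
    exit 0
fi

# Otherwise fetch, and record success only if the fetch worked.
if fetch_feed; then
    touch "$STATUS"
fi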
I agree with MarkusQ that you should retry in the original job instead of creating another job to watch the first job.
Take a look at this tool to make retrying easier: https://github.com/kadwanev/retry
You can easily wrap the original cron command in a retry, and the continued existence of the failure file would indicate that it failed even after the retries.
If someone needs a bash script that pings an endpoint (for example, to run scheduled API tasks via cron) and retries when the response status is bad:
#!/bin/bash
echo "Start pinch.sh script."
# run up to 5 times
for ((i=1; i<=5; i++))
do
    # request https://www.google.com with curl,
    # silently discard the response body,
    # and capture the response status code in a bash variable
    http_response=$(curl -o /dev/null -s -w "%{response_code}" https://www.google.com)
    # check for the expected code
    if [ "$http_response" != "200" ]
    then
        # request failed
        echo "The pinch failed. Sleeping for 5 minutes."
        # wait 300 seconds, then start another iteration
        sleep 300
    else
        # exit the loop
        echo "The pinch is OK. Finishing."
        break
    fi
done
exit 0
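To run this on a schedule, the crontab entry could be something like the following (the path, schedule, and log location are placeholders):
0 * * * * /path/to/pinch.sh >> /var/log/pinch.log 2>&1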
I am writing a crontab script which will run every 15 minutes on Saturdays. The idea is to validate whether an external API status is SUCCESS or not. If it succeeds, the cron job should not trigger again for the rest of the day.
Right now I am trying recursion, but I don't think that is the best solution.
Is there any other solution to achieve this? I am using a shell script to invoke the API.
Here is the existing snippet:
Cronjob:
*/15 * * * 6 validate.sh
script:
status='curl -X GET "api"'
if [[ $status == "SUCCEEDED" ]];then
trigger email
else sleep 180
./validate.sh
fi
Add another cron job so it removes the flag file on Friday evening, before the other job starts running:
59 23 * * 5 rm .succeeded.txt
Then change your script so it aborts if this file exists, and creates it when it succeeds.
#!/bin/bash
test -e .succeeded.txt && exit
if [[ $(curl -X GET "api") == "SUCCEEDED" ]]; then
    trigger email
    touch .succeeded.txt
fi
I tried to fix other errors in your script, too, but I had to guess many things. This assumes "SUCCEEDED" is the sole output from curl when the GET works.
Putting the command in a variable is a useless complication which makes your script longer and (very slightly) slower, but in addition, it creates problems of its own when the command contains embedded quotes; see e.g. http://mywiki.wooledge.org/BashFAQ/050
... But of course, presumably you wanted to actually run the command. Your attempt would merely check whether the literal string in the variable was equal to "SUCCEEDED", which of course it never would be.
Another problem was that you were spawning multiple validate.sh jobs, each of which would recurse and retry. You want one or the other, not both. I went with keeping your schedule and just trying once in each job.
I need to execute several calls to a C++ program that records frames from a videogame. I have about 1800 test games, and some of them work and some of them don't.
When they don't work, the console returns a Segmentation fault error, but when they do work, the program opens a window and plays the game, and at the same time it records every frame.
The problem is that when it does work, this process does not end until you close the game window.
I need to make a Bash script that will test every game I have and write the names of the ones that work in a text file and the names of the ones that don't work in another file.
For the moment I have tried this, using the timeout command:
count=0
# Run for every file in the ROMs folder
for filename in ../ROMs/*.bin; do
    # Increase the counter
    (( count++ ))
    # Run the command with a timeout to prevent it from running forever
    timeout 5 ./doc/examples/videoRecordingExample "$filename"
    # Check if execution succeeds/fails and print to a text file
    if [ $? == 0 ]; then
        echo "Game $count named $filename" >> successGames.txt
    else
        echo "Game $count named $filename" >> failedGames.txt
    fi
done
But it doesn't seem to be working, because it writes all the names into the same file. I believe this is because the condition inside the if refers to the timeout and not to the execution of the C++ program itself.
I then tried without the timeout, manually closing the window every time a game worked, and the result was as expected. I tried this with only 10 games, but when I test it with all 1800 I need it to be completely automatic.
So, is there any way of making this process automatic? Like some command to stop the execution and at the same time know whether it was successful or not?
Instead of
timeout 5 ./doc/examples/videoRecordingExample "$filename"
you could try this:
./doc/examples/videoRecordingExample "$filename" && sleep 5 && pkill videoRecordingExample
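Alternatively, keeping the timeout call, its exit status can distinguish the two outcomes inside the loop body. This is only a sketch, relying on GNU coreutils timeout returning 124 when it has to kill the command, and on a crash producing 128 plus the signal number (139 for SIGSEGV):
timeout 5 ./doc/examples/videoRecordingExample "$filename"
status=$?
if [ "$status" -eq 124 ] || [ "$status" -eq 0 ]; then
    # 124: timeout killed a game that was still running and recording
    # 0:   the program exited cleanly on its own
    echo "Game $count named $filename" >> successGames.txt
else
    # e.g. 139 (128 + SIGSEGV): the game crashed
    echo "Game $count named $filename" >> failedGames.txt
fi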
Swap the arguments in the timeout code. It should be:
timeout 5 "$filename" ./doc/examples/videoRecordingExample
Reason: the syntax for timeout is:
timeout [OPTION] DURATION COMMAND [ARG]...
So the COMMAND should be just after the DURATION. In the code above the presumably non-executable file videoRecordingExample would be the COMMAND, which probably returns an error every time.
script.sh
echo First!
sleep 5
echo Second!
sleep 5
echo Third!
another_script.rb
%x[./script.sh]
I want another_script.rb to print the output of script.sh as it happens. That means printing "First!", waiting five seconds, printing "Second!", waiting 5 seconds, and so on.
I've read through the different ways to run an external script in Ruby, but none seem to do this. How can I fulfill my requirements?
You can always execute this in Ruby:
system("sh", "script.sh")
Note that it's important to specify how to execute the script unless it has a proper #!/bin/sh shebang line and the execute bit enabled.
I need to submit multiple simulations to condor (a multi-client execution grid) using the shell, and since this may take a while, I decided to write a shell script to do it for me. I am very new to shell scripting and this is the result of what I did in one day:
for H in {0..50}
do
    for S in {0..10}
    do
        ./p32 -data ../data.txt -out ../result -position $S -group $H
        echo "> Ready to submit"
        condor_submit profile.sub
        echo "> Waiting 15 minutes for group $H Pos $S"
        for W in {1..15}
        do
            echo "Starting minute $W"
            sleep 60
        done
    done
    echo "Deleting data_3 to free up space"
    mkdir -p /tmp/data_3
    if [ $H -lt 10 ]
    then
        tar cfvz /tmp/data_3/group_000$H.tar.gz ../result/data_3/group_000$H
        rm -r ../result/data_3/group_000$H
    else
        tar cfvz /tmp/data_3/group_00$H.tar.gz ../result/data_3/group_00$H
        rm -r ../result/data_3/group_00$H
    fi
done
This script runs through 0..50 simulations and submits 0..10 different parameters to a program that generates a condor submission profile. Then I submit this profile and let it execute for 15 minutes (with a call being made every minute to ensure the SSH pipe doesn't break). Once the 15 minutes are up I compress the output to a volume with more space and erase the original files.
The reason I implemented this is that our condor system can only handle up to 10,000 submissions at once, and one submission (condor_submit profile.sub) executes 7000+ simulations.
Now my problem is with this line: when I checked this morning I (luckily) spotted that calling condor_submit profile.sub may fail if the network is too busy. The error is:
ERROR: Failed to connect to local queue manager
CEDAR:6001:Failed to connect to <IP_NUMBER:PORT_NUMBER>
This means that from time to time a whole iteration gets lost! How can I work around this? The only way I see is to use the shell to read in the last line(s) of terminal output and evaluate whether they match the expected response, i.e.:
7392 job(s) submitted to cluster CLUSTER_NUMBER.
But how would I read in the last line and go about checking for errors?
Any help is very much needed and appreciated.
Does condor_submit give a non-zero exit code when it fails? If so, you can try calling it like this:
while ! condor_submit profile.sub; do
sleep 5
done
which will retry the submission every 5 seconds until it succeeds.
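If you would rather not loop forever, a capped variant might look like this (the limit of 20 attempts is just an assumption):
attempts=0
until condor_submit profile.sub; do
    attempts=$((attempts + 1))
    if [ "$attempts" -ge 20 ]; then
        echo "condor_submit still failing after $attempts attempts, giving up" >&2
        break
    fi
    sleep 5
done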
I have a Java program that often stops due to errors, which are logged in a .log file. What would be a simple shell script to detect a particular text in the last/latest line, say
[INFO] Stream closed
and then run the following command
java -jar xyz.jar
This should keep happening forever (possibly every two minutes or so), because xyz.jar writes the log file.
The text "Stream closed" can appear many times in the log file; I just want it to take action when it appears in the last line.
How about
while true
do
    sleep 120
    # -F treats the pattern as a fixed string, so the [INFO] brackets are matched literally
    if tail -1 logfile | grep -qF "[INFO] Stream Closed"
    then
        java -jar xyz.jar &
    fi
done
There may be a condition where the tailed "Stream Closed" line is not really the last log entry and the process is still writing messages. We can avoid this by checking whether the process is alive or not: if the process has exited and the last log line is "Stream Closed", then we need to restart the application.
#!/bin/bash
java -jar xyz.jar &
PID=$!                 # PID of the background java process we just started
while true
do
    sleep 20
    # Restart only if the process has exited and the last log line says "Stream Closed"
    if ! kill -0 $PID 2>/dev/null && tail -1 logfile | grep -qF "Stream Closed"
    then
        java -jar xyz.jar &
        PID=$!
    fi
done
I would prefer to check whether the corresponding process is still running and restart the program if it is not; there might be other errors that cause the process to stop. You can use a cron job to perform such a check periodically (e.g. every minute).
Also, you might want to improve your java code so that it does not crash that often (if you have access to the code).
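A minimal sketch of such a check script, run from cron every minute (the process pattern and log path are assumptions):
#!/bin/bash
# Restart xyz.jar if no matching process is found.
if ! pgrep -f "java -jar xyz.jar" > /dev/null; then
    nohup java -jar xyz.jar >> /var/log/xyz.log 2>&1 &
fi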
I solved this using a watchdog script that checks directly (with grep) whether the program(s) are running. By calling the watchdog every minute (from cron under Ubuntu), I basically guarantee (the programs and environment are VERY stable) that no program will stay offline for more than 59 seconds.
This script checks a list of programs, using the names in an array, to see whether each one is running, and, if not, starts it.
#!/bin/bash
#
# watchdog
#
# Run as a cron job to keep an eye on what_to_monitor which should always
# be running. Restart what_to_monitor and send notification as needed.
#
# This needs to be run as root or a user that can start system services.
#
# Revisions: 0.1 (20100506), 0.2 (20100507)

# first prog to check
NAME[0]=soc_gt2
# 2nd
NAME[1]=soc_gt0
# 3rd, etc etc
NAME[2]=soc_gp00
# START=/usr/sbin/$NAME
NOTIFY=you@gmail.com
NOTIFYCC=you2@mail.com
GREP=/bin/grep
PS=/bin/ps
NOP=/bin/true
DATE=/bin/date
MAIL=/bin/mail
RM=/bin/rm

for nameTemp in "${NAME[@]}"; do
    $PS -ef | $GREP -v grep | $GREP $nameTemp >/dev/null 2>&1
    case "$?" in
        0)
            # It is running in this case so we do nothing.
            echo "$nameTemp is RUNNING OK. Relax."
            $NOP
            ;;
        1)
            echo "$nameTemp is NOT RUNNING. Starting $nameTemp and sending notices."
            START=/usr/sbin/$nameTemp
            $START 2>&1 >/dev/null &
            NOTICE=/tmp/watchdog.txt
            echo "$nameTemp was not running and was started on `$DATE`" > $NOTICE
            # $MAIL -n -s "watchdog notice" -c $NOTIFYCC $NOTIFY < $NOTICE
            $RM -f $NOTICE
            ;;
    esac
done
exit
I do not use log verification, though you could easily incorporate that into your own version (just swap the grep for a log check, for example).
If you run it from the command line (or PuTTY, if you are connected remotely), you will see what was working and what wasn't. I have been using it for months now without a hiccup. Just call it whenever you want to see what's working (regardless of it running under cron).
You could also place all your critical programs in one folder, do a directory listing and check that every file in that folder has a program running under the same name, or read a text file line by line, with every line corresponding to a program that is supposed to be running (see the sketch below), etc.
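A sketch of that last, read-a-file variant (the list file name and the /usr/sbin path are assumptions):
while IFS= read -r prog; do
    if ! pgrep -x "$prog" > /dev/null; then
        echo "$prog is NOT RUNNING. Starting it."
        "/usr/sbin/$prog" &
    fi
done < /etc/watchdog_programs.txt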
A good way is to use the awk command:
tail -f somelog.log | awk '/\[INFO\] Stream Closed/ { system("java -jar xyz.jar") }'
This continually monitors the log stream, and when the regular expression matches, it fires off whatever system command you have set, which can be anything you would type into a shell.
If you really want to be thorough, you can put that line into a .sh file and run that .sh file from a process-monitoring daemon like upstart to ensure that it never dies.
Nice and clean =D