LSF - automatic job rerun using sasbatch script

I am trying to create an automatic rerun mechanism by adding some code to the sasbatch script that runs after the sas command finishes. The general idea is to:
locate the log of the SAS process and the ID of the flow containing the current job,
check whether the log contains particular ORA-xxxxx errors for which we know the solution is simply to rerun the process,
if so, trigger the jrerun class from the LSF Platform Command Line Interface,
exit sasbatch, passing $rc to LSF.
The idea was implemented as:
# Define the paths used below
log_dir=/path/to/sas_logs_directory
out_log=/path/to/auto-rerun_log.txt
out_log2=/path/to/lsf_rerun_log.txt
if [ -n "${LSB_JOBNAME}" ]; then
  if [ ! -f "$out_log" ]; then
    touch "$out_log"
  fi
  # Get the flow runtime attributes (LSB_JOBNAME is colon-separated)
  IFS=: read -r flow_id username flow_name job_name <<< "${LSB_JOBNAME}"
  # Find the log of the current process (newest logs first)
  log_path=$(ls -t "$log_dir"/*.log | xargs grep -li "job:\s*$job_name" | grep -i "/${flow_name}_" | head -1)
  # Path to a text file whose lines are the ORA errors we look for
  conf_path=/path/to/error_list
  # Check the process' log against each error, one line at a time
  while read -r line; do
    # If the error is found in the log, try to rerun the flow
    if grep -q "$line" "$log_path"; then
      (nohup /path/to/rerun_script.sh "$flow_id" > "$out_log2" 2>&1) &
      disown
      break
    fi
  done < "$conf_path"
fi
rerun_script.sh is the script that calls the jrerun class after a sleep command, so that the parent script can exit with $rc in the meantime. It looks like:
sleep 10
/some/lsf/path/jrerun
The problem is that the job keeps running the whole time. In the LSF history I can see that jrerun was called before the job exited.
Furthermore, in $out_log2 I can see the message: <flow_id> has no starting or exit points.
Does anyone have an idea how I can pass the return code to LSF before jrerun is called? Or is there a simpler way to perform an automatic rerun of SAS jobs in Platform LSF?
I am using SAS 9.4 and Platform Process Manager 9.1

Or maybe some simpler way to perform an automatic rerun of SAS jobs in Platform LSF?
I'm not knowledgeable about the SAS part, but on the LSF side there are at least a couple of ways to requeue the job.
If you have control of the job script, you can use a special process exit value to requeue the job automatically.
https://www.ibm.com/support/knowledgecenter/en/SSWRJV_10.1.0/lsf_admin/job_requeue_about.html
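A minimal sketch of the job-script side of this, under stated assumptions: instead of calling jrerun itself, the wrapper checks the SAS log for the known errors and exits with a dedicated code that LSF has been configured to requeue. The exit value 99 and the helper name requeue_rc are hypothetical; the queue would need something like REQUEUE_EXIT_VALUES=99 in lsb.queues (or the job submitted with bsub -Q "99"), and whether this works well inside Process Manager flows is not something this answer confirms.

```bash
#!/usr/bin/env bash
# Hypothetical exit code; LSF must be told to requeue on it, e.g.
# REQUEUE_EXIT_VALUES=99 in lsb.queues or bsub -Q "99" at submission.
REQUEUE_RC=99

# requeue_rc LOG PATTERNS RC
# Prints REQUEUE_RC if LOG contains any line of the PATTERNS file (the known
# ORA-xxxxx errors, matched as fixed strings); otherwise prints RC unchanged.
# Note: a blank line in PATTERNS would match every log, so keep it free of them.
requeue_rc() {
  local log=$1 patterns=$2 rc=$3
  if [ -s "$patterns" ] && grep -qF -f "$patterns" -- "$log"; then
    echo "$REQUEUE_RC"
  else
    echo "$rc"
  fi
}

# At the end of sasbatch, after the SAS command has set $rc:
#   exit "$(requeue_rc "$log_path" "$conf_path" "$rc")"
```

Because the job exits normally and LSF does the requeue, nothing is left running in the background. A job that keeps hitting the same error would, however, be requeued each time, so some retry limit may be worth adding.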
If you have control outside of the job script, you can use brequeue -r to requeue a running job.
https://www.ibm.com/support/knowledgecenter/en/SSWRJV_10.1.0/lsf_command_ref/brequeue.1.html
Good Luck

I managed to get this working by using two additional configuration files. When my grep finds one of the errors, I add the flow_id to the flow_list.txt configuration file and modify a specially made trigger_file.txt.
I scheduled an additional flow, execute_rerun, in LSF, which is triggered when trigger_file.txt is modified. The execute_rerun flow reads the flow_list.txt configuration file line by line and calls the jrerun method on each flow.
This way I achieved an automatic rerun of the flows that fail due to particular errors.
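The two-file scheme described above could look roughly like the sketch below. The file names flow_list.txt and trigger_file.txt come from the post; the function names are hypothetical, and the jrerun call is passed in as a parameter because its real arguments depend on the Process Manager installation. De-duplicating with sort -u and moving the list aside before processing are choices made for this sketch, not something the post describes.

```bash
#!/usr/bin/env bash
# request_rerun FLOW_ID LIST TRIGGER
# Called from sasbatch when a known ORA error is found: record the flow ID
# and modify the trigger file, which is what fires the execute_rerun flow.
request_rerun() {
  local flow_id=$1 list=$2 trigger=$3
  echo "$flow_id" >> "$list"
  date > "$trigger"
}

# process_rerun_list LIST RERUN_CMD
# Body of the execute_rerun flow: call RERUN_CMD once per distinct flow ID,
# then drop the list. RERUN_CMD stands in for the jrerun invocation.
process_rerun_list() {
  local list=$1 rerun_cmd=$2 id
  [ -s "$list" ] || return 0
  local work="$list.processing"
  mv "$list" "$work"   # new requests go to a fresh list while we work
  sort -u "$work" | while read -r id; do
    [ -n "$id" ] && "$rerun_cmd" "$id"
  done
  rm -f "$work"
}
```

Since the rerun happens in a separate flow started by the file event, the failing job has already exited (and reported its $rc) by the time jrerun runs.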

Related

Loop through docker output until I find a string in bash

I am quite new to bash (barely any experience at all) and I need some help with a bash script.
I am using docker-compose to create multiple containers; for this example, let's say 2 containers. The 2nd container will execute a bash command, but before that, I need to check that the 1st container is operational and fully configured. Instead of using a sleep command, I want to create a bash script that will be located in the 2nd container and, once executed, does the following:
Execute a command and log the console output in a file
Read that file and check if a string is present. The command that I execute in the previous step takes a few (5 - 10) seconds to complete, and I need to read the file after it has finished executing. I suppose I can add sleep to make sure the command has finished executing, or is there a better way to do this?
If the string is not present, I want to execute the same command again until I find the string I am looking for
Once I find the string I am looking for, I want to exit the loop and execute a different command
I found out how to do this in Java, but I need to do this in a bash script.
The Docker containers use Alpine as the operating system, but I updated the Dockerfile to install bash.
I tried this solution, but it does not work.
#!/bin/bash
[command to be executed] > allout.txt 2>&1
until
tail -n 0 -F /path/to/file | \
while read LINE
do
if echo "$LINE" | grep -q $string
then
echo -e "$string found in the console output"
fi
done
do
echo "String is not present. Executing command again"
sleep 5
[command to be executed] > allout.txt 2>&1
done
echo -e "String is found"

In your docker-compose file, use the depends_on option.
depends_on takes care of the startup and shutdown order of your containers.
However, it does not check whether a container is ready before starting the next one. To handle this scenario, check this out.
As described in this link,
You can use tools such as wait-for-it, dockerize, or sh-compatible wait-for. These are small wrapper scripts which you can include in your application’s image to poll a given host and port until it’s accepting TCP connections.
OR
Alternatively, write your own wrapper script to perform a more application-specific health check.
If you don't want to use the above tools, then check this out. There they use a combination of HEALTHCHECK and the service_healthy condition, as shown here. For a complete example, check this.
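As a rough illustration of the "write your own wrapper" option, here is a sketch of a port-polling function in the spirit of wait-for-it (it is not that tool). It relies on bash's /dev/tcp redirection, so it needs bash rather than BusyBox sh; on Alpine it assumes bash is installed, as in the question. The host db and port 5432 in the usage comment are placeholders.

```bash
#!/usr/bin/env bash
# wait_for_port HOST PORT [TIMEOUT_SECONDS]
# Polls until HOST:PORT accepts a TCP connection. Returns 0 once it does,
# 1 if TIMEOUT_SECONDS (default 30) pass first. The deadline is only checked
# between attempts, so a connect to an unreachable host can overrun it.
wait_for_port() {
  local host=$1 port=$2 timeout=${3:-30}
  local deadline=$((SECONDS + timeout))
  while (( SECONDS < deadline )); do
    # Opening fd 3 in a subshell attempts the connection; the subshell's
    # exit closes it again.
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Usage in the 2nd container's entrypoint:
#   wait_for_port db 5432 60 && exec your_command
```

An open port only means something is listening; an application-specific check (an HTTP status, a query) is closer to "fully configured".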

Just:
while :; do
  # 1. Execute a command and log the console output (stdout and stderr) in a file
  command > output.log 2>&1
  # TODO: handle errors, etc.
  # 2. Read that file and check if a string is present.
  if grep -q "searched_string" output.log; then
    # Once I find the string I am looking for, I want to exit the loop
    break
  fi
  # 3. If the string is not present, execute the same command again until it is found.
  # Delay a little between attempts so the loop does not use 100% CPU
  sleep 1
done
# ...and execute a different command
different_command
You can limit how long the command may run with timeout.
Notes:
The colon (:) is a built-in that returns a zero exit status, much like true. I prefer while : to while true; they mean the same thing.
The code presented should work in any posix shell.
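A variant of the loop above that gives up after a fixed number of attempts, so a dependency that never becomes ready does not hang the container forever. The function name retry_until_match and its argument order are made up for this sketch; it sticks to POSIX sh, so its variables are global (prefixed rum_ to avoid clashes) rather than local.

```sh
#!/bin/sh
# retry_until_match LOG PATTERN MAX_TRIES DELAY CMD [ARGS...]
# Runs CMD with stdout and stderr captured in LOG and returns 0 as soon as
# LOG contains PATTERN (fixed string). Returns 1 after MAX_TRIES failed
# attempts, sleeping DELAY seconds between attempts.
retry_until_match() {
  rum_log=$1 rum_pattern=$2 rum_max=$3 rum_delay=$4
  shift 4
  rum_try=1
  while [ "$rum_try" -le "$rum_max" ]; do
    # CMD runs in the foreground, so LOG is complete when grep reads it;
    # no extra sleep is needed to wait for the command to finish.
    "$@" > "$rum_log" 2>&1
    if grep -qF -- "$rum_pattern" "$rum_log"; then
      return 0
    fi
    # Wait before the next attempt, but not after the last one
    [ "$rum_try" -lt "$rum_max" ] && sleep "$rum_delay"
    rum_try=$((rum_try + 1))
  done
  return 1
}

# Usage: up to 20 attempts, 5 s apart, each run capped at 30 s by timeout:
#   retry_until_match output.log "searched_string" 20 5 timeout 30 command && different_command
```

The `timeout SECONDS CMD` form shown matches GNU coreutils and current BusyBox; older BusyBox builds may expect `timeout -t SECONDS` instead.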

Whether a shell script can be executed if another instance of the same script is already running

I have a shell script that usually takes nearly 10 minutes for a single run, but I need to know: if another request to run the script comes in while an instance is already running, does the new request wait for the existing instance to complete, or is a new instance started?
I need a new instance to be started whenever a request for the same script is available.
How can I do it?
The shell script is a polling script that looks for a file in a directory and executes the file. Executing the file takes nearly 10 minutes or more, but if a new file arrives during execution, it also has to be executed simultaneously.
The shell script is below; how can I modify it to execute multiple requests?
#!/bin/bash
while [ 1 ]; do
newfiles=`find /afs/rch/usr8/fsptools/WWW/cgi-bin/upload/ -newer /afs/rch/usr$
touch /afs/rch/usr8/fsptools/WWW/cgi-bin/upload/.my_marker
if [ -n "$newfiles" ]; then
echo "found files $newfiles"
name2=`ls /afs/rch/usr8/fsptools/WWW/cgi-bin/upload/ -Art |tail -n 2 |head $
echo " $name2 "
mkdir -p -m 0755 /afs/rch/usr8/fsptools/WWW/dumpspace/$name2
name1="/afs/rch/usr8/fsptools/WWW/dumpspace/fipsdumputils/fipsdumputil -e -$
$name1
touch /afs/rch/usr8/fsptools/WWW/dumpspace/tempfiles/$name2
fi
sleep 5
done

When writing scripts like the one you describe, I take one of two approaches.
First, you can use a pid file to indicate that a second copy should not run. For example:
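#!/bin/sh
pidfile=/var/run/${0##*/}.pid
# Write the pid as a symlink; ln -s fails if the link already exists
if ! ln -s "pid=$$" "$pidfile"; then
  echo "Already running. Exiting." >&2
  exit 0
fi
# Remove the pid link if we exit normally or are terminated. The trap is set
# only after the link is ours, so a second copy never deletes the first one's.
trap 'rm -f "$pidfile"' 0
trap 'exit 1' 1 3 15
# Do your stuff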
I like using symlinks to store pid because writing a symlink is an atomic operation; two processes can't conflict with each other. You don't even need to check for the existence of the pid symlink, because a failure of ln clearly indicates that a pid cannot be set. That's either a permission or path problem, or it's due to the symlink already being there.
The second option is to make it possible .. nay, preferable .. not to block additional instances, and instead to configure whatever it is that this script does so that multiple servers can run at the same time on different queue entries. "Single-queue-single-server" is never as good as "single-queue-multi-server". Since you haven't included code in your question, I have no way to know whether this approach would be useful for you, but here's some explanatory meta bash:
#!/usr/bin/env bash
workdir=/var/tmp # Set a better $workdir than this.
a=( $(get_list_of_queue_ids) ) # A command? A function? Up to you.
for qid in "${a[@]}"; do
  # Set a "lock" for this item .. or don't, and move on.
  if ! ln -s "pid=$$" "$workdir/$qid.working"; then
    continue
  fi
  # Do your stuff with just this $qid.
  # ...
  # And finally, clean up after ourselves
  remove_qid_from_queue "$qid"
  rm "$workdir/$qid.working"
done
The effect of this is to transfer the idea of "one at a time" from the handler to the data. If you have a multi-CPU system, you probably have enough capacity to handle multiple queue entries at the same time.
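Applied to the polling script in the question, the same idea could look like this minimal sketch: each new file is claimed with a symlink lock and handed to a background job, so a 10-minute run no longer blocks files that arrive meanwhile. dispatch_new_files, claim_file, handle_dump and the lock directory are hypothetical names; the handler would wrap the fipsdumputil call from the original script.

```bash
#!/usr/bin/env bash
# claim_file FILE LOCKDIR: succeeds only for the first caller, because
# ln -s refuses to overwrite an existing symlink (the same trick as above).
claim_file() {
  ln -s "pid=$$" "$2/$(basename "$1").working" 2>/dev/null
}

# dispatch_new_files INBOX LOCKDIR HANDLER
# Starts HANDLER FILE in the background for every regular file in INBOX that
# has not been claimed yet, then returns immediately, so one long run does
# not delay files that arrive later. Locks are left in place so a file is
# never handled twice; clean them up together with the processed files.
dispatch_new_files() {
  local inbox=$1 lockdir=$2 handler=$3 f
  for f in "$inbox"/*; do
    [ -f "$f" ] || continue
    claim_file "$f" "$lockdir" || continue
    "$handler" "$f" &
  done
}

# Polling loop (paths and handler are placeholders):
#   while :; do
#     dispatch_new_files /path/to/upload /var/tmp/upload-locks handle_dump
#     sleep 5
#   done
```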

ghoti's answer shows some helpful techniques, if modifying the script is an option.
Generally speaking, for an existing script:
Unless you know with certainty that:
the script has no side effects other than to output to the terminal or to write to files with shell-instance specific names (such as incorporating $$, the current shell's PID, into filenames) or some other instance-specific location,
OR that the script was explicitly designed for parallel execution,
I would assume that you cannot safely run multiple copies of the script simultaneously.
It is not reasonable to expect the average shell script to be designed for concurrent use.

From the viewpoint of the operating system, several processes may of course execute the same program in parallel. No need to worry about this.
However, it is conceivable that a (careless) programmer wrote the program in such a way that it produces incorrect results when two copies are executed in parallel.

Check processes run by a cron job to avoid multiple executions

How do I prevent a cron job from executing the same command multiple times? I looked around and tried to check for (and kill) existing processes, but it doesn't work with the code below: it keeps entering the else branch when it is supposed to print "running". Any idea which part I did wrong?
#!/bin/sh
devPath=`ps aux | grep "[i]mport_shell_script"` | xargs
if [ ! -z "$devPath" -a "$devPath" != " " ]; then
echo "running"
exit
else
while true
do
sudo /usr/bin/php /var/www/html/xxx/import_from_datafile.php /dev/null 2>&1
sleep 5
done
fi
exit
cronjob:
*/2 * * * * root /bin/sh /var/www/html/xxx/import_shell_script.sh /dev/null 2>&1

I don't see the point of adding a cron job that then starts a loop that runs a job. Either use cron to run the job every minute, or use a daemon script to make sure your service is started and kept running.
To check whether your script is already running, you can use a lock directory (unless your daemon framework already does that for you):
LOCK=/tmp/script.lock # You may want a better name here
mkdir $LOCK || exit 1 # Exit with error if script is already running
trap "rmdir $LOCK" EXIT # Remove the lock when the script terminates
...normal code...
If your OS supports it, then /var/lock/script might be a better path.
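A sketch that wraps the same lock-directory idea around a single command, for a cron entry that runs the job directly instead of starting a loop. run_locked and the lock path are hypothetical names. Unlike the trap in the answer, it releases the lock only when the command returns, so a killed run leaves a stale lock behind.

```bash
#!/usr/bin/env bash
# run_locked LOCKDIR CMD [ARGS...]
# mkdir is atomic: it fails if LOCKDIR exists, i.e. another run holds the lock.
# Returns 1 without running CMD in that case; otherwise runs CMD, removes
# LOCKDIR and returns CMD's exit status.
run_locked() {
  local lock=$1 rc
  shift
  mkdir "$lock" 2>/dev/null || return 1
  "$@"
  rc=$?
  rmdir "$lock"
  return "$rc"
}

# Crontab entry (every 2 minutes, as in the question) calling a script that does:
#   run_locked /var/lock/import_shell_script /usr/bin/php /var/www/html/xxx/import_from_datafile.php
```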
Your next question is probably how to write a daemon. To answer that, I need to know what kind of Linux you're using and whether you have things like systemd, daemonize, etc.

Check for the presence of a file at the beginning of your script (for example /tmp/runonce-import_shell_script). If it exists, the same script is already running (or the previous run halted with an error).
You can also write a timestamp into that file so you can check since when the script has been running (and perhaps decide to run it again after 24 hours even if the file is present).
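The timestamp idea could be sketched like this, storing the start time as epoch seconds from date +%s. The 24-hour threshold is the answer's example; the function names are hypothetical. The check-then-write in try_lock is not atomic, which is usually tolerable for a job started every couple of minutes by cron, but not for truly simultaneous starts.

```sh
#!/bin/sh
# lock_is_stale FILE MAX_AGE
# True if FILE is missing, unreadable, holds no valid epoch time, or records
# a start time more than MAX_AGE seconds ago.
lock_is_stale() {
  started=$(cat "$1" 2>/dev/null) || return 0
  case $started in
    ''|*[!0-9]*) return 0 ;;
  esac
  [ $(( $(date +%s) - started )) -gt "$2" ]
}

# try_lock FILE MAX_AGE
# Takes the lock (writes the current epoch time into FILE) unless a fresh
# lock already exists.
try_lock() {
  if [ -e "$1" ] && ! lock_is_stale "$1" "$2"; then
    return 1
  fi
  date +%s > "$1"
}

# At the top of the cron script (24 h = 86400 s):
#   try_lock /tmp/runonce-import_shell_script 86400 || exit 0
#   trap 'rm -f /tmp/runonce-import_shell_script' EXIT
```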

Log of parallel computations: how do I prevent interleaved writes? lockfile or flock?

I see that it has been discussed several times how to prevent scripts from running concurrently, but I have not seen the topic of concurrent writes.
I am doing some parallel computation with xargs launching the commands for the actual computations. At the end of each computation I want that process to open a file and write its results there. I am having trouble because each process can access the log file at the same time, resulting in interleaved entries: one line from one run, another line from another run that finished at about the same time (which is likely to happen given the parallel nature of running with xargs).
So in practice, let's say that using xargs I run in parallel several instances of a script that reads:
#!/bin/bash
#### do something that takes some time
#### define content of the log
folder="<folder>"$PWD"</folder>\n"
datetag="<enddate>"`date`"</enddate>\n"
#### store log in XML ####
echo -e "<myrun>\n""$folder""$datetag""</myrun>" >> "$outputfile"
At present I get an output file with interleaved run logs like this:
<myrun>
<myrun>
<folder>./generations/test/run1</folder>
<folder>./generations/test/run2</folder>
<enddate>Sun Jul 6 11:17:58 CEST 2014</enddate>
</myrun>
<enddate>Sun Jul 6 11:17:58 CEST 2014</enddate>
</myrun>
Is there a way to give "exclusive access" to one instance of the script at a time, so that each script is writing its log without interference with the others?
I have seen flock and lockfile, but I am not sure which fits my case best, and I am looking for advice/suggestions.
Thanks,
Roberto

I will use traceroute as an example because it prints output slowly, but any other command would also work. Compare:
(echo 8.8.8.8;echo 8.8.4.4) | xargs -P6 -n1 traceroute > traceroute.xarg
to:
(echo 8.8.8.8;echo 8.8.4.4) | parallel traceroute > traceroute.para
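Make sure you install GNU Parallel and not another parallel, and that /etc/parallel/config is empty. By default GNU Parallel buffers each job's output and prints it only when that job finishes, so the output of different runs is not interleaved; with xargs -P the lines of concurrent runs can be mixed.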

I thinks this in the end does the job. The loop keeps going until this instance of the script can lock the log file for itself. Then writes and unlocks it.
The other instances of the script that are running in parallel and might be trying to write will find the lock ... or will be able to lock the file for themselves.
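# lockfile (from procmail) itself retries every second (-1) until it can
# create log.lock; the loop only guards against it giving up
while ! lockfile -1 log.lock; do
  sleep 1
done
echo "accessing file at $(date)"
echo -e "$logblock" >> log
rm -f log.lock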
Does anybody see any drawbacks in this type of solution?
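For comparison, a sketch using flock(1) from util-linux rather than lockfile: each writer opens the log in append mode on file descriptor 9, takes an exclusive lock on it, writes its whole block, and releases the lock when the descriptor closes at the end of the brace group. It assumes flock is installed (it ships with util-linux on most Linux systems but is not POSIX); append_block is a hypothetical name. Unlike a lock file, the kernel drops the lock automatically if a writer dies, so no stale lock can remain.

```bash
#!/usr/bin/env bash
# append_block LOGFILE TEXT
# Appends TEXT (plus a newline) to LOGFILE while holding an exclusive flock
# lock, so blocks written by concurrent runs cannot interleave.
append_block() {
  local log=$1 text=$2
  {
    flock -x 9 || return 1       # wait until no other writer holds the lock
    printf '%s\n' "$text" >&9    # write the whole block through the locked fd
  } 9>>"$log"                    # fd 9 is closed (and the lock released) here
}

# In the question's script:
#   block=$(printf '<myrun>\n<folder>%s</folder>\n<enddate>%s</enddate>\n</myrun>' "$PWD" "$(date)")
#   append_block "$outputfile" "$block"
```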

Shell script that continuously checks a text file for log data and then runs a program

I have a Java program that often stops due to errors, which are logged in a .log file. What would be a simple shell script to detect a particular text in the last (latest) line, say
[INFO] Stream closed
and then run the following command
java -jar xyz.jar
This should keep happening forever (possibly every two minutes or so), because xyz.jar writes the log file.
The text "Stream closed" can appear many times in the log file. I only want to take action when it appears in the last line.

How about
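while true; do
  sleep 120
  # -F matches literally; otherwise [INFO] is a bracket expression matching one character.
  # grep -q succeeds (exit 0) when the text is found, which is when we restart.
  if tail -n 1 logfile | grep -qF "[INFO] Stream closed"; then
    java -jar xyz.jar &
  fi
done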

There may be a situation where the last line seen by tail, "Stream Closed", is not really the last log entry and the process is still logging messages. We can avoid this by checking whether the process is alive: only if the process has exited and the last log line is "Stream Closed" do we need to restart the application.
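#!/bin/bash
java -jar xyz.jar &
PID=$!    # PID of the background java process
while true; do
  sleep 20
  # Still running: nothing to do
  kill -0 "$PID" 2>/dev/null && continue
  # Process exited: restart it if the last log line says the stream closed
  if tail -n 1 logfile | grep -qF "Stream Closed"; then
    java -jar xyz.jar &
    PID=$!
  fi
done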

I would prefer checking whether the corresponding process is still running and restarting the program when it is not; there might be other errors that cause the process to stop. You can use a cron job to perform such a check periodically (say, every minute).
Also, you might want to improve your java code so that it does not crash that often (if you have access to the code).
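One way to implement that check, sketched under the assumption that the program is started by the script itself so its PID can be recorded in a pid file; is_running, restart_if_dead and the pid-file path are hypothetical. kill -0 sends no signal and only tests whether the process exists and may be signalled, so a PID owned by another user would read as "not running".

```sh
#!/bin/sh
# is_running PIDFILE: true if PIDFILE holds the PID of a live process.
is_running() {
  pid=$(cat "$1" 2>/dev/null) || return 1
  [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# restart_if_dead PIDFILE CMD [ARGS...]
# Starts CMD in the background and records its PID, unless the recorded
# process is still alive.
restart_if_dead() {
  rid_pidfile=$1
  shift
  is_running "$rid_pidfile" && return 0
  "$@" &
  echo $! > "$rid_pidfile"
}

# Script run from cron every minute (pid file and log paths are placeholders);
# exec makes java keep the PID that was recorded:
#   restart_if_dead /var/tmp/xyz.pid sh -c 'exec java -jar xyz.jar >>xyz.out 2>&1'
```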

I solved this using a watchdog script that checks directly (with grep) whether the programs are running. By calling the watchdog every minute (from cron under Ubuntu), I basically guarantee (the programs and environment are VERY stable) that no program stays offline for more than 59 seconds.
This script checks a list of programs, taken by name from an array, sees whether each one is running, and starts it if it is not.
#!/bin/bash
#
# watchdog
#
# Run as a cron job to keep an eye on what_to_monitor which should always
# be running. Restart what_to_monitor and send notification as needed.
#
# This needs to be run as root or a user that can start system services.
#
# Revisions: 0.1 (20100506), 0.2 (20100507)
# first prog to check
NAME[0]=soc_gt2
# 2nd
NAME[1]=soc_gt0
# 3rd, etc etc
NAME[2]=soc_gp00
# START=/usr/sbin/$NAME
NOTIFY=you@gmail.com
NOTIFYCC=you2@mail.com
GREP=/bin/grep
PS=/bin/ps
NOP=/bin/true
DATE=/bin/date
MAIL=/bin/mail
RM=/bin/rm
for nameTemp in "${NAME[@]}"; do
$PS -ef|$GREP -v grep|$GREP $nameTemp >/dev/null 2>&1
case "$?" in
0)
# It is running in this case so we do nothing.
echo "$nameTemp is RUNNING OK. Relax."
$NOP
;;
1)
echo "$nameTemp is NOT RUNNING. Starting $nameTemp and sending notices."
START=/usr/sbin/$nameTemp
$START >/dev/null 2>&1 &
NOTICE=/tmp/watchdog.txt
echo "$nameTemp was not running and was started on `$DATE`" > $NOTICE
# $MAIL -n -s "watchdog notice" -c $NOTIFYCC $NOTIFY < $NOTICE
$RM -f $NOTICE
;;
esac
done
exit
I do not use the log verification, though you could easily incorporate it into your own version (for example, by changing the grep into a log check).
If you run it from the command line (or PuTTY, if you are connected remotely), you will see what is working and what isn't. I have been using it for months now without a hiccup. Just call it whenever you want to see what's working (regardless of it running under cron).
You could also place all your critical programs in one folder, list the directory, and check that every file in that folder has a running program with the same name. Or read a text file line by line, with every line corresponding to a program that is supposed to be running, etc.

A good way is to use the awk command:
tail -f somelog.log | awk '/\[INFO\] Stream Closed/ { system("java -jar xyz.jar") }'
This continually monitors the log stream, and when the regular expression matches, it fires off whatever system command you have set, which can be anything you would type into a shell. The brackets are escaped because an unescaped [INFO] is a bracket expression that matches a single character.
If you really wanna be good you can put that line into a .sh file and run that .sh file from a process monitoring daemon like upstart to ensure that it never dies.
Nice and clean =D
