Shell script that continuously checks a text file for log data and then runs a program - shell

I have a Java program that often stops due to errors, which are logged in a .log file. What would be a simple shell script that detects a particular text in the last/latest line, say
[INFO] Stream closed
and then runs the following command?
java -jar xyz.jar
This should keep happening forever (say, every two minutes or so), because xyz.jar writes the log file.
The text "Stream closed" can appear many times in the log file; I only want to take action when it appears in the last line.

How about
while true; do
    sleep 120
    # grep -F treats the pattern as a fixed string, so "[INFO]" is not a
    # character class; grep -q exits 0 when the pattern is found
    if tail -n 1 logfile | grep -qF "[INFO] Stream closed"; then
        java -jar xyz.jar &
    fi
done

There may be a condition where the tailed last line "Stream Closed" is not really the final line and the process is still logging messages. We can avoid that by also checking whether the process is alive. If the process has exited and the last line is "Stream Closed", then we need to restart the application.
#!/bin/bash
java -jar xyz.jar &
PID=$!    # $! is the PID of the last background job ($1 would be the script's first argument)
while true
do
    # While the process is still alive (kill -0 succeeds), just keep waiting.
    kill -0 "$PID" 2>/dev/null && sleep 20 && continue
    # The process has exited; restart it once the last log line confirms the shutdown.
    tail -1 logfile | grep -q "Stream Closed" || { sleep 20; continue; }
    java -jar xyz.jar &
    PID=$!
done

I would prefer checking whether the corresponding process is still running and restarting the program on that event; there might be other errors that cause the process to stop. You can use a cron job to perform such a check periodically (say, every minute).
Also, you might want to improve your Java code so that it does not crash that often (if you have access to the code).
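A minimal sketch of such a cron-based check (the script path, working directory, and log name are illustrative):
#!/bin/bash
# restart_xyz.sh - run from cron, e.g.: * * * * * /path/to/restart_xyz.sh
# pgrep -f matches against the full command line of running processes
if ! pgrep -f "java -jar xyz.jar" >/dev/null; then
    cd /path/to/app && java -jar xyz.jar >> app.log 2>&1 &
fi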

I solved this using a watchdog script that checks directly (with grep) whether the program(s) are running. By calling the watchdog every minute (from cron, under Ubuntu), I basically guarantee (programs and environment are VERY stable) that no program will stay offline for more than 59 seconds.
This script checks a list of programs, using the names stored in an array, to see whether each one is running, and starts it if not.
#!/bin/bash
#
# watchdog
#
# Run as a cron job to keep an eye on what_to_monitor which should always
# be running. Restart what_to_monitor and send notification as needed.
#
# This needs to be run as root or a user that can start system services.
#
# Revisions: 0.1 (20100506), 0.2 (20100507)
# first prog to check
NAME[0]=soc_gt2
# 2nd
NAME[1]=soc_gt0
# 3rd, etc etc
NAME[2]=soc_gp00
# START=/usr/sbin/$NAME
NOTIFY=you@gmail.com
NOTIFYCC=you2@mail.com
GREP=/bin/grep
PS=/bin/ps
NOP=/bin/true
DATE=/bin/date
MAIL=/bin/mail
RM=/bin/rm
for nameTemp in "${NAME[@]}"; do
    $PS -ef | $GREP -v grep | $GREP $nameTemp >/dev/null 2>&1
    case "$?" in
    0)
        # It is running in this case so we do nothing.
        echo "$nameTemp is RUNNING OK. Relax."
        $NOP
        ;;
    1)
        echo "$nameTemp is NOT RUNNING. Starting $nameTemp and sending notices."
        START=/usr/sbin/$nameTemp
        $START >/dev/null 2>&1 &
        NOTICE=/tmp/watchdog.txt
        echo "$nameTemp was not running and was started on `$DATE`" > $NOTICE
        # $MAIL -n -s "watchdog notice" -c $NOTIFYCC $NOTIFY < $NOTICE
        $RM -f $NOTICE
        ;;
    esac
done
exit
I do not use the log verification, though you could easily incorporate that into your own version (just swap the grep for a log check, for example).
If you run it from the command line (or PuTTY, if you are connected remotely), you will see what was working and what wasn't. I have been using it for months now without a hiccup; just call it whenever you want to see what's working (regardless of whether it is running under cron).
You could also place all your critical programs in one folder, list the directory, and check that every file in that folder has a program running under the same name; or read a text file line by line, with every line corresponding to a program that is supposed to be running; and so on. A sketch of the folder idea follows.
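A minimal sketch of the folder idea (the directory path is illustrative, and it assumes each file is named after the process it starts):
#!/bin/bash
# Every executable in /opt/critical is assumed to be named after its process
for prog in /opt/critical/*; do
    name=$(basename "$prog")
    # pgrep -x matches the process name exactly
    if ! pgrep -x "$name" >/dev/null; then
        echo "$name is NOT RUNNING. Starting it."
        "$prog" >/dev/null 2>&1 &
    fi
done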

A good way is to use the awk command:
tail -f somelog.log | awk '/\[INFO\] Stream Closed/ { system("java -jar xyz.jar") }'
This continually monitors the log stream, and when the regular expression matches (the brackets are escaped so that [INFO] is matched literally rather than as a character class), it fires off whatever system command you have set, which can be anything you would type into a shell.
If you really wanna be good you can put that line into a .sh file and run that .sh file from a process monitoring daemon like upstart to ensure that it never dies.
Nice and clean =D
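If you go that route, the wrapper could be as simple as this sketch (file names are illustrative; the & inside system() keeps the restart from blocking the watcher):
#!/bin/bash
# monitor.sh - restart xyz.jar whenever the log reports a closed stream.
# tail -F (capital F) keeps following the file across log rotation.
tail -F somelog.log | awk '/\[INFO\] Stream Closed/ { system("java -jar xyz.jar &") }'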

Related

Is there a way to know if the `script` command is running in the current bash session?

I'm attempting to add to my ~/.zshrc file a command to record the command-line inputs and outputs. I have the script command (https://www.tecmint.com/record-and-replay-linux-terminal-session-commands-using-script/) running as required, but I'm having an issue with it seemingly attempting to record my session infinitely. I believe this is because when you run the script command it starts a new shell session, which in turn sources the ~/.zshrc, which then tries to record the session, which restarts the session, and so on. This causes it to get into an infinite loop of attempting to record the session.
Attempt 1
Relevant piece of my ~/.zshrc
# RECORD COMMAND INPUT AND OUTPUT
LOG_PATH=/var/log/terminal/$(date +'%Y%m%d')
LOG_FILE=${LOG_PATH}/$(date +'%H%M%S').log
mkdir -p ${LOG_PATH}
script ${LOG_FILE}
This results in a session which constantly prints the following:
Script started, output file is /var/log/terminal/20200109/141849.log
Script started, output file is /var/log/terminal/20200109/141850.log
Script started, output file is /var/log/terminal/20200109/141851.log
Script started, output file is /var/log/terminal/20200109/141852.log
Script started, output file is /var/log/terminal/20200109/141853.log
Script started, output file is /var/log/terminal/20200109/141854.log
Script started, output file is /var/log/terminal/20200109/141855.log
... repeat infinitely ...
Attempt 2
One attempt at working around this was to check if a file already exists and then go on with recording (which works somewhat, but sometimes it creates 2 files, or the recording doesn't start, say if I open 2 command sessions in rapid succession).
Upgraded script
# RECORD COMMAND INPUT AND OUTPUT
# Check if the recording of a file for the given time has already started
# as it seems that once you start the script recording it re-starts the session
# which in turn re-runs this file which attempts to script again running into an infinite loop
LOG_PATH=/var/log/terminal/$(date +'%Y%m%d')
LOG_FILE=${LOG_PATH}/$(date +'%H%M%S').log
if [[ ! -f $LOG_FILE ]]; then
    mkdir -p ${LOG_PATH}
    script ${LOG_FILE}
fi
Again, this either yields no recording (in the case of opening 2 sessions quickly) or results in the recording being done twice:
Script started, output file is /var/log/terminal/20200109/141903.log
Script started, output file is /var/log/terminal/20200109/141904.log
Attempt 3
Another attempt was to check the shell history and see if the last command contained the word script. The problem with this attempt was that the session doesn't seem to have any history when starting, and hence it gave the following error (and then infinitely tried to start the recording session, like the first attempt):
# RECORD COMMAND INPUT AND OUTPUT
# Check what the last command was to make sure that it wasn't the script starting
# as it seems that once you start the script recording it re-starts the session
# which in turn re-runs this file which attempts to script again running into an infinite loop
LAST_HISTORY=$(history -1)
if [[ "$LAST_HISTORY" != *"script"* ]]; then
LOG_PATH=/var/log/terminal/$(date +'%Y%m%d')
LOG_FILE=${LOG_PATH}/$(date +'%H%M%S').log
mkdir -p ${LOG_PATH}
script ${LOG_FILE}
fi
Output
Last login: Thu Jan 9 14:27:30 on ttys009
Script started, output file is /var/log/terminal/20200109/142754.log
Script started, output file is /var/log/terminal/20200109/142755.log
omz_history:fc:13: no such event: 0
omz_history:fc:13: no events in that range
Script started, output file is /var/log/terminal/20200109/142755.log
omz_history:fc:13: no such event: 0
omz_history:fc:13: no events in that range
Script started, output file is /var/log/terminal/20200109/142755.log
Script started, output file is /var/log/terminal/20200109/142756.log
^C% danielcarmo@Daniels-MacBook-Pro-2 git %
Any suggestions or thoughts on this would be much appreciated.
Instead of trying to figure out whether or not script is running, simply leave a breadcrumb for yourself:
if [ -z "$BEING_LOGGED" ]
then
export BEING_LOGGED="yes"
LOG_PATH="/var/log/terminal/$(date +'%Y%m%d')"
LOG_FILE="${LOG_PATH}/$(date +'%H%M%S').log"
mkdir -p "${LOG_PATH}"
exec script "${LOG_FILE}"
fi
The first time the file is sourced, BEING_LOGGED will be unset. When script is invoked and the file is sourced again, it'll be set, and you can skip logging.
script adds the variable SCRIPT to the environment of the command it runs. You can check for that to decide whether script needs to run.
# RECORD COMMAND INPUT AND OUTPUT
log_path=/var/log/terminal/$(date +'%Y%m%d')
log_file=${log_path}/$(date +'%H%M%S').log
mkdir -p "${log_path}"
[ -z "$SCRIPT" ] && script "${log_file}"
The value of SCRIPT is the name of the file being logged to.

Loop through docker output until I find a string in bash

I am quite new to bash (barely any experience at all) and I need some help with a bash script.
I am using docker-compose to create multiple containers - for this example let's say 2 containers. The 2nd container will execute a bash command, but before that, I need to check that the 1st container is operational and fully configured. Instead of using a sleep command I want to create a bash script that will be located in the 2nd container and once executed do the following:
Execute a command and log the console output in a file
Read that file and check if a string is present. The command that I execute in the previous step takes a few seconds (5 - 10) to complete, and I need to read the file after it has finished executing. I suppose I can add a sleep to make sure the command has finished executing, or is there a better way to do this?
If the string is not present I want to execute the same command again until I find the String I am looking for
Once I find the string I am looking for I want to exit the loop and execute a different command
I found out how to do this in Java, but I need to do it in a bash script.
The docker containers use Alpine as the operating system, but I updated the Dockerfile to install bash.
I tried this solution, but it does not work.
#!/bin/bash
[command to be executed] > allout.txt 2>&1
until
    tail -n 0 -F /path/to/file | \
    while read LINE
    do
        if echo "$LINE" | grep -q $string
        then
            echo -e "$string found in the console output"
        fi
    done
do
    echo "String is not present. Executing command again"
    sleep 5
    [command to be executed] > allout.txt 2>&1
done
echo -e "String is found"
In your docker-compose file, make use of the depends_on option.
depends_on takes care of the startup and shutdown sequence of your multiple containers.
But it does not check whether a container is ready before moving on with the next container's startup. To handle this scenario, check this out.
As described in this link,
You can use tools such as wait-for-it, dockerize, or sh-compatible wait-for. These are small wrapper scripts which you can include in your application’s image to poll a given host and port until it’s accepting TCP connections.
OR
Alternatively, write your own wrapper script to perform a more application-specific health check.
In case you don't want to make use of the above tools, then check this out. There they use a combination of HEALTHCHECK and the service_healthy condition, as shown here. For a complete example, check this.
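For illustration, a minimal wait-for-style loop might look like this (the host, the port, and the availability of a netcat with -z support are assumptions):
#!/bin/sh
host=container1   # illustrative service name from docker-compose
port=8080         # illustrative port
# Poll until the first container accepts TCP connections
until nc -z "$host" "$port"; do
    echo "waiting for $host:$port..."
    sleep 2
done
echo "$host:$port is accepting connections"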
Just:
while :; do
    # 1. Execute a command and log the console output in a file
    command > output.log
    # TODO: handle errors, etc.
    # 2. Read that file and check if a String is present.
    if grep -q "searched_string" output.log; then
        # Once I find the string I am looking for I want to exit the loop
        break
    fi
    # 3. If the string is not present I want to execute the same command again until I find the String I am looking for
    # add ex. sleep 0.1 for the loop to delay a little bit, not to use 100% cpu
done
# ...and execute a different command
different_command
You can timeout a command with timeout.
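For example (the duration is illustrative; timeout exits with a non-zero status if the command is cut off):
# Give the command at most 30 seconds to produce the log
timeout 30s command > output.log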
Notes:
colon (:) is a utility that returns a zero exit status, much like true; I prefer while : over while true, but they mean the same.
The code presented should work in any POSIX shell.

LSF - automatic job rerun using sasbatch script

I am trying to create an auto-rerun mechanism by adding some code to the sasbatch script, to run after the sas command finishes. The general idea is to:
locate the log of the sas process and the id of the flow containing the current job,
check if the log contains particular ORA-xxxxx errors for which we know the solution is simply a rerun of the process,
if so, trigger the jrerun class from the LSF Platform Command Line Interface,
exit sasbatch, passing $rc to LSF
The idea was implemented as:
#define used paths
log_dir=/path/to/sas_logs_directory
out_log=/path/to/auto-rerun_log.txt
out_log2=/path/to/lsf_rerun_log.txt
if [ -n "${LSB_JOBNAME}" ]; then
    if [ ! -f "$out_log" ]; then
        touch $out_log
    fi
    #get flow runtime attributes
    IFS=: read -r flow_id username flow_name job_name <<< "${LSB_JOBNAME}"
    #find log of the current process
    log_path=$(ls -t $log_dir/*.log | xargs grep -li "job:\s*$job_name" | grep -i "/${flow_name}_" | head -1)
    #set path to txt file containing lines which represent the ORA errors we look for
    conf_path=/path/to/error_list
    #analyse the process' log line by line
    while read -r line;
    do
        #if an error is found in the log then try to rerun the flow
        if grep -q "$line" $log_path; then
            (nohup /path/to/rerun_script.sh $flow_id >$out_log2 2>&1) &
            disown
            break
        fi
    done < $conf_path
fi
fi
rerun_script is the script which calls the jrerun class after a sleep command, in order to let the parent script exit with $rc in the meantime. It looks like:
sleep 10
/some/lsf/path/jrerun
The problem is that the job keeps running the whole time. In the LSF history I can see that jrerun was called before the job exited.
Furthermore, in $out_log2 I can see the message: <flow_id> has no starting or exit points.
Does anyone have an idea how I can pass the return code to LSF before calling jrerun? Or maybe a simpler way to perform an automatic rerun of SAS jobs in Platform LSF?
I am using SAS 9.4 and Platform Process Manager 9.1
Or maybe a simpler way to perform an automatic rerun of SAS jobs in Platform LSF?
I'm not knowledgeable about the SAS part. But on the LSF side there's at least a couple of ways to requeue the job.
If you have control of the job script, you can use special process exit value to automatically requeue the job.
https://www.ibm.com/support/knowledgecenter/en/SSWRJV_10.1.0/lsf_admin/job_requeue_about.html
If you have control outside of the job script, you can use brequeue -r to requeue a running job.
https://www.ibm.com/support/knowledgecenter/en/SSWRJV_10.1.0/lsf_command_ref/brequeue.1.html
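A sketch of the first option (this assumes the queue defines REQUEUE_EXIT_VALUES=99 in lsb.queues; the value 99, the SAS invocation, and the log path are illustrative):
#!/bin/sh
# Wrapper around the SAS step submitted to LSF
sas my_job.sas
rc=$?
if [ $rc -ne 0 ] && grep -q "ORA-" /path/to/sas_log.log; then
    exit 99   # an exit value listed in REQUEUE_EXIT_VALUES makes LSF requeue the job
fi
exit $rc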
Good Luck
I managed to get this working by using two additional configuration files. When my grep finds one of the listed errors, I add the found flow_id to the flow_list.txt configuration file and modify a specially made trigger_file.txt.
I scheduled an additional flow, execute_rerun, in LSF, which is triggered whenever trigger_file.txt is modified. The execute_rerun flow reads the flow_list.txt configuration file line by line and calls the jrerun method on each flow.
This way I achieved an automatic rerun of the flows that fail due to particular errors.

supervisor or cat or bash doesn't see changed file

I have a supervisor 'program' (queue.sh) that runs a bash script that reads from a text file and runs another program if the text file is not empty:
while true; do
    # Read line 1
    line=$(head -n1 ./.queue)
    echo "$line"
    if [ "$line" != "" ]; then
        # Remove line 1
        sed -i '1d' ./.queue
        # Call the download script
        /usr/bin/env php download.php "$line"
    else
        echo "Queue empty"
    fi
    sleep 2
done
download.php takes a few minutes, and then the next line is run. If .queue contains 20 lines, queue.sh will be busy for a long time. When it's not busy, it checks .queue for content every 2 seconds.
My queue file currently has 3 lines. When I run queue.sh manually, I can see it working: it finds line 1, chops it off, and runs download.php. Supervisor doesn't! Even after restarting the job and the entire daemon, it just keeps saying Queue empty every 2 sec. After changing the message, and restarting the job, it now keeps saying Queue empty II every 2 sec. But .queue is not empty, so it should run the lines.
Is there some kind of super-persistent file cache in the supervisor context? I've restarted the job and the daemon. I've run free && sync && echo 3 > /proc/sys/vm/drop_caches && free (not inside the supervisor context!). I've run touch .queue. Nothing works. Why does supervisor's queue.sh insist .queue is empty?
The problem was the directory. Supervisor runs in / by default, not in the directory where the program executable happens to be. Supervisor has a directory config option, which fixed it. New config file:
[program:xx]
command=/bin/bash queue.sh
directory=/my/path/xx
And I had to restart the service (service supervisor restart), not just the program (supervisorctl restart xx).

Check processes run by cronjob to avoid multiple execution

How do I prevent a cron job from executing the same command multiple times at once? I tried checking for and killing existing processes, but it doesn't work with the code below: it keeps entering the else branch when it is supposed to report "running". Any idea which part I got wrong?
#!/bin/sh
devPath=`ps aux | grep "[i]mport_shell_script"` | xargs
if [ ! -z "$devPath" -a "$devPath" != " " ]; then
    echo "running"
    exit
else
    while true
    do
        sudo /usr/bin/php /var/www/html/xxx/import_from_datafile.php > /dev/null 2>&1
        sleep 5
    done
fi
exit
cronjob:
*/2 * * * * root /bin/sh /var/www/html/xxx/import_shell_script.sh > /dev/null 2>&1
I don't see the point of adding a cron job that then starts a loop that runs a job. Either use cron to run the job every minute, or use a daemon script to make sure your service is started and kept running.
To check whether your script is already running, you can use a lock directory (unless your daemon framework already does that for you):
LOCK=/tmp/script.lock       # You may want a better name here
mkdir "$LOCK" || exit 1     # Exit with an error if the script is already running
trap 'rmdir "$LOCK"' EXIT   # Remove the lock when the script terminates
...normal code...
If your OS supports it, then /var/lock/script might be a better path.
Your next question is probably how to write a daemon. To answer that, I need to know what kind of Linux you're using and whether you have things like systemd, daemonize, etc.
Check for the presence of a file at the beginning of your script (for example /tmp/runonce-import_shell_script). If it exists, that means the same script is already running (or the previous run halted with an error).
You can also add a timestamp to that file so you can check since when the script has been running (and maybe decide to run it again after 24h even if the file is present). A minimal sketch of this idea follows.
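A minimal sketch (the lock path and the 24-hour threshold are illustrative):
#!/bin/sh
LOCK=/tmp/runonce-import_shell_script
if [ -f "$LOCK" ]; then
    # Re-run anyway if the lock file is older than 24 hours (1440 minutes)
    if [ -n "$(find "$LOCK" -mmin +1440)" ]; then
        rm -f "$LOCK"
    else
        exit 0   # a previous instance is (probably) still running
    fi
fi
date +%s > "$LOCK"          # record when this run started
trap 'rm -f "$LOCK"' EXIT   # remove the lock on normal termination
# ...actual work here...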
