How to break shell script if a script it calls produces an error - bash

I'm currently debugging a shell script, which acts as a master-script in a data pipeline. In order to run the pipeline, you feed a bunch of arguments into the shell script. From there, the shell script sequentially calls 6 different scripts [4 in R, 2 in Python], writes out stuff to log files, and so on. Basically, my idea is to use this script to automate a data pipeline that takes a long time to run.
Right now, if any of the individual R or Python scripts break within the shell script, it just jumps to the next script that it's supposed to call. However, running script 03.py requires the data input to scripts 01.R and 02.R to be fully run and processed, otherwise 03 will produce erroneous output data which will then be written out and further processed in later scripts.
What I want to do is,
1. Break the overall shell script if there's an error in any of the R scripts
2. Output a message telling me where this error happened [line of individual R / python script]
Here's a sample of the master.sh shell script which calls the individual scripts.
#############
# STEP 2 : RUNNING SCRIPTS
#############
# A - 01.R
#################################################################
# log_file - this needs to be reassigned for every individual script
log_file=01.log
current_time=$(date)
echo "Current time: $current_time"
echo "Now running script 01. Log file output being written to $log_file_dir$log_file."
Rscript 01.R -f $input_file -s $sql_db > $log_file_dir$log_file
# current time/date
current_time=$(date)
echo "Current time: $current_time"
# B - 02.R
#################################################################
log_file=02.log
current_time=$(date)
echo "Current time: $current_time"
echo "Now running script 02. Log file output being written to $log_file_dir$log_file"
Rscript 02.R -f $input_file -s $sql_db > $log_file_dir$log_file
# PRINT OUT TIMINGS
current_time=$(date)
echo "Current time: $current_time"
This sequence is repeated throughout the master.sh script until script 06.R, after which it collates some data retrieved from output files and log files, and prints them to stdout.
Here's some sample output that gets printed by my current master.sh, which shows how the script just keeps moving even though 01.R has produced an error.
file: test-data/minisample.txt
There are a total of 101 elements in file.
Using the main database.
Writing log-files to this directory: log_files/minisample/.
Writing output-csv with classifications to output/minisample.csv.
Current time: Wed Nov 14 18:19:53 UTC 2018
Now running script 01. Log file output being written to log_files/minisample/01.log.
Loading required package: stringi
Loading required package: dplyr
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
Loading required package: RMySQL
Loading required package: DBI
Loading required package: methods
Loading required package: hms
Error: The following 2 arguments need to be provided:
-f <input file>.csv
-s <MySQL db name>
Execution halted
Current time: Wed Nov 14 18:19:54 UTC 2018
./master.sh: line 95: -1: substring expression < 0
./master.sh: line 100: -1: substring expression < 0
./master.sh: line 104: -1: substring expression < 0
Total time taken to run script 01.R:
Average time taken per user to run script 01.R:
Total time taken to run pipeline so far [01/06]:
Average time taken per user to run pipeline so far [01/06]:
Current time: Wed Nov 14 18:19:54 UTC 2018
Now running script 02. Log file output being written to log_files/minisample/02.log
Seeing as the R script 01.R produces an error, I want the script master.sh to stop. But how?
Any help would be greatly appreciated, thanks in advance!

As another user mentioned, simply running set -e will make your script terminate on first error. However, if you want more control, you can also check the exit status with ${?} or simply $? assuming your program gives an exit code of 0 on success, and non-zero otherwise.
#!/bin/bash
url=https://nosuchaddress1234.com/nosuchpage.html
error_file=errorFile.txt
wget ${url} 2> ${error_file}
exit_status=${?}
if [ ${exit_status} -ne 0 ]; then
    echo -n "wget ${url} "
    if [ ${exit_status} -eq 4 ]; then
        echo "- Network failure."
    elif [ ${exit_status} -eq 8 ]; then
        echo "- Server issued an error response."
    else
        echo "- Other error"
    fi
    echo "See ${error_file} for more details"
    exit ${exit_status};
fi
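Applied to the pipeline in the question, the same exit-status check could be wrapped in a small helper so that every step is handled the same way. This is only a sketch: run_step is a hypothetical name, and it assumes the called scripts exit with a non-zero status on error (Rscript does when execution is halted, and Python does on an uncaught exception). Redirecting stderr into the log as well means the error message from the R or Python script, including whatever line information it prints, ends up next to the normal output.

```shell
#!/bin/bash
# run_step is a hypothetical helper, not part of the original master.sh.
# Usage: run_step <log file> <command> [args...]
run_step() {
    local log_file=$1
    shift
    echo "Now running: $*. Log file output being written to $log_file."
    # Capture stdout and stderr so the called script's error message is logged.
    "$@" > "$log_file" 2>&1
    local status=$?
    if [ "$status" -ne 0 ]; then
        echo "ERROR: '$*' exited with status $status. Last lines of $log_file:" >&2
        tail -n 5 "$log_file" >&2
        exit "$status"
    fi
}

# In master.sh the calls might look like this (commented out here because
# they need the real scripts and variables):
# run_step "${log_file_dir}01.log" Rscript 01.R -f "$input_file" -s "$sql_db"
# run_step "${log_file_dir}02.log" Rscript 02.R -f "$input_file" -s "$sql_db"

# Stand-in demonstration: the second step fails, so the third never runs.
demo_log=$(mktemp)
demo_status=0
demo_output=$(
    run_step "$demo_log" true
    run_step "$demo_log" sh -c 'echo "Execution halted" >&2; exit 1'
    run_step "$demo_log" echo "never reached"
) || demo_status=$?
rm -f "$demo_log"
echo "$demo_output"
echo "pipeline exit status: $demo_status"
```

The demonstration runs inside $(...) only so that the exit does not end the shell you try it in; in master.sh itself, run_step would stop the whole script.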

I like to put some boilerplate at the top of most scripts like this -
trap 'echo >&2 "ERROR in $0 at line $LINENO, Aborting"; exit $LINENO;' ERR
set -u
While coding and debugging, I usually add
set -x
And a lot of trace "comments" with colons -
: this will parse its args but only show under set -x
Then the trick is to make sure any errors you know are ok are handled.
Conditionals consume the errors, so those are safe.
if grep foo nonexistantfile
then : do the success stuff
else : if you *want* a failout here, just call false
false here will abort # args don't matter :)
fi
By the same token, if you just want to catch and ignore a known possible error -
ls $mightNotExist ||: # || says "do on fail"; : is a builtin that does nothing and succeeds, like "true"
Just always check your likely errors. Then the only thing that will crash your script is an unexpected failure.

Related

Bash script - check how many times public IP changes

I am trying to create my first bash script. The goal of this script is to check at what rate my public IP changes. It is a fairly straightforward script. First it checks if the new address is different from the old one. If so, it should update the old one to the new one and print out the date along with the new IP address.
At this point I have created a simple script in order to accomplish this. But I have two main problems.
First, the script keeps on printing out the IP even though it hasn't changed and I have updated PREV_IP with CUR_IP.
My second problem is that I want the output to direct to a file instead of outputting it into the terminal.
The interval is currently set to 1 second for test purposes. This will change to a higher interval in the final product.
#!/bin/bash
while true
PREV_IP=00
do
CUR_IP=$(curl https://ipinfo.io/ip)
if [ $PREV_IP != "$CUR_IP" ]; then
PREV_IP=$CUR_IP
"$(date)"
echo "$CUR_IP"
sleep 1
fi
done
I also get a really weird output. I have edited my public IP to xx.xxx.xxx.xxx:
Sat 20 Mar 09:45:29 CET 2021
xx.xxx.xxx.xxx
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:--
while true
PREV_IP=00
do
is the reason you are seeing the IP on every iteration. It's the same as while true; PREV_IP=00; do. The exit status of the list true; PREV_IP=00 is the exit status of its last command, and the exit status of an assignment is 0 (success), so the loop will always execute. But PREV_IP is also reset to 00 before every iteration. This looks like a typo: you meant to set PREV_IP once, before the loop starts.
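To see this behaviour in isolation, here is a minimal sketch with made-up variable names (prev, seen, count). The first command in the condition list even fails, yet the loop body still runs, and prev is back at 00 every time:

```shell
#!/bin/bash
count=0
seen=""
while false        # the status of this command is ignored...
    prev=00        # ...the assignment is the last command in the list, so the status is 0
do
    seen+="$prev "                 # record the value the body sees
    prev=changed                   # overwritten again before the next iteration
    count=$((count + 1))
    [ "$count" -ge 3 ] && break    # explicit stop, otherwise this loops forever
done
echo "values seen by the loop body: $seen"
```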
"$(date)"
will try to execute the output of the date command as a command. So it will print:
$ "$(date)"
bash: sob, 20 mar 2021, 10:57:02 CET: command not found
And finally, to silence curl, read man curl first and then find out about -s. I use -sS so errors are also visible.
Do not use uppercase variables in your scripts. Prefer lower case variables. Check your scripts with http://shellcheck.net. Quote variable expansions.
I would sleep each loop. Your script could look like this:
#!/bin/bash
prev=""
while true; do
cur=$(curl -sS https://ipinfo.io/ip)
if [ "$prev" != "$cur" ]; then
prev="$cur"
echo "$(date) $cur"
fi
sleep 1
done
"... I want the output to direct to a file instead of outputting it into the terminal."
Then research how redirection works in shell and how to use it. The simplest would be to redirect echo output.
echo "$(date) $cur" >> "a_file.txt"
"The interval is currently set to 1 second for test purposes. This will change to a higher interval in the final product."
You are still limited by the time it takes to connect to https://ipinfo.io/ip. And from the ipinfo.io documentation:
"Free usage of our API is limited to 50,000 API requests per month."
And finally, I wrote a script, get_ip_external, in which I tried to use as many public services as I could find for getting the external IP address. You could take multiple public services that return your IPv4 address and pick one at random or round-robin, so that rate limiting doesn't kick in as fast.
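A rough sketch of the round-robin idea, not the get_ip_external script itself. Only https://ipinfo.io/ip comes from the question; the other two URLs are assumed alternatives that return the address as plain text and may have limits of their own. pick_service and watch_ip are hypothetical names, and the 60 second interval is just an example (about 43,200 requests in a 30-day month, spread over three services).

```shell
#!/bin/bash
# Only the first URL is from the question; the others are assumptions.
services=(
    "https://ipinfo.io/ip"
    "https://api.ipify.org"
    "https://ifconfig.me/ip"
)
next=0

# Sets the global variable "service" to the next URL in round-robin order.
# A global is used because $(pick_service) would run in a subshell and the
# updated value of "next" would be lost.
pick_service() {
    service=${services[next]}
    next=$(( (next + 1) % ${#services[@]} ))
}

# Hypothetical main loop; call "watch_ip" to start it.
watch_ip() {
    local prev="" cur
    while true; do
        pick_service
        # Skip this round if the request fails or returns nothing.
        if cur=$(curl -sS "$service") && [ -n "$cur" ] && [ "$prev" != "$cur" ]; then
            prev=$cur
            echo "$(date) $cur" >> "a_file.txt"
        fi
        sleep 60
    done
}

# Dry run of the rotation only (no network access needed):
picked=""
for _ in 1 2 3 4; do
    pick_service
    picked+="$service "
done
echo "$picked"
```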

Get file size not working in scheduled job

I have a bash script running on Ubuntu 18.04. I scheduled it using SYSTEMD timer.
#!/bin/bash
backupdb(){
/usr/bin/mysqldump -u backupuser -pbackuppassword --add-locks --extended-insert --hex-blob $1 > /opt/mysqlbackup/$1.sql
/bin/gzip -c /opt/mysqlbackup/$1.sql > /opt/mysqlbackup/$1-$(date +%A).sql.gz
rm -rf /opt/mysqlbackup/$1.sql
echo `date "+%h %d %H:%M:%S"`": " $1 "- Size:" `/usr/bin/stat -c%s "${1}-$(date +%A).sql.gz"` >> /opt/mysqlbackup/backupsql.log
}
# List of databases to backup
backupdb cardb
backupdb bikedb
When I run this script interactively, the backup log gets 2 entries:
Jun 16 20:15:03: cardb - Size: 200345
Jun 16 20:15:12: bikedb - Size: 150123
However, when this is run as a systemd timer service, the log still gets 2 entries but no file size is given in the log file. It is not 0; it's simply blank. The backup file, cardb.sql.gz, is created and is non-zero. I can unzip it and it does contain a valid SQL file.
I can't figure out why this is happening.
You need to specify the absolute path of your file.
Without the absolute path, you are assuming that the systemd timer runs your script from the same directory you tested it from. To remedy this, you can either use the absolute path or change directories before accessing your file.
echo `date "+%h %d %H:%M:%S"`": " $1 "- Size:" `/usr/bin/stat -c%s "/opt/mysqlbackup/${1}-$(date +%A).sql.gz"` >> /opt/mysqlbackup/backupsql.log
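You can reproduce the effect without systemd. The sketch below uses a temporary directory as a stand-in for /opt/mysqlbackup and a placeholder file instead of a real dump. It relies on the fact that systemd starts system services with / as the working directory unless WorkingDirectory= is set, and it assumes GNU stat, as in the question.

```shell
#!/bin/bash
backup_dir=$(mktemp -d)                  # stand-in for /opt/mysqlbackup
file_name="cardb-$(date +%A).sql.gz"
# Placeholder content (11 bytes), not a real gzip file.
printf 'dummy dump\n' > "$backup_dir/$file_name"

# Relative path, looked up from / as it would be under systemd:
# stat fails, so the captured size is empty.
relative_size=$(cd / && stat -c%s "$file_name" 2>/dev/null) || relative_size=""

# Absolute path: works regardless of the working directory.
absolute_size=$(stat -c%s "$backup_dir/$file_name")

echo "relative path size: '${relative_size}'"
echo "absolute path size: '${absolute_size}'"
rm -rf "$backup_dir"
```

The empty relative size corresponds to the blank field in your log: the backticks around stat expand to nothing, and the error message goes to stderr rather than into the log file.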

Does $argv behave the same on CentOS and RHEL systems?

I am trying to troubleshoot an old Tcl accounting script called GOTS (Grant Of The System). It creates a time-stamped log file entry for each user login and another for the logout. The problem is that it is not creating the second log file entry on logout. I think I have tracked down the area where it goes wrong, and I have attached it here. FYI, the log file exists, and the script does not exit with the error "GOTS was called incorrectly!!". It should be executing the if branch for [string match "$argv" "end_session"].
This software runs properly on RHEL 6.9 but fails as described on CentOS 7. I suspect that a system variable, or a difference in the $argv argument vector between the two systems, causes this behavior.
Am I correct in suspecting $argv and if not does anyone see the true problem?
How do I print or display the $argv values on logout?
# Find out if we're beginning or ending a session
if { [string match "$argv" "end_session"] } {
if { ![file writable $Log] } {
onErrorNotify "4 LOG"
}
set ifd [open $Log a]
puts $ifd "[clock format [clock seconds]]\t$Instrument\t$LogName\t$GroupName"
close $ifd
unset ifd
exit 0
} elseif { [string match "$argv" "begin_session"] == 0 } {
puts stderr "GOTS was called incorrectly!!"
exit -1
}
end_session is populated by the /etc/gdm/PostSession/Default file
#!/bin/sh
### Begin GOTS PostSession
# Do not run GOTS if root is logging out
if test "${USER}" == "root" ; then
exit 0
fi
/usr/local/lib/GOTS/gots end_session > /var/tmp/gots_postsession.log 2> /var/tmp/gots_postsession.log
exit 0
### End GOTS PostSession
This is the postsession log file:
Application initialization failed: couldn't connect to display ":1"
Error in startup script: invalid command name "option"
while executing
"option add *Font "-adobe-new century schoolbook-medium-r-*-*-*-140-*-*-*-*-*-*""
(file "/usr/local/lib/GOTS/gots" line 26)
After a lot of troubleshooting, we have determined that for whatever reason CentOS is not allowing part of the /etc/gdm/PostSession/Default file to execute:
fi
/usr/local/lib/GOTS/gots end_session
But it does update the PostSession.log file as it should. Does anyone have any idea what could be interfering with only part of PostSession/Default?
"Does anyone have any idea what could be interfering with PostSession/Default?"
Could it be that you are hitting Bug 851769?
That said, am I correct in stating that, as your investigation shows, this is not a Tcl-related issue or question anymore?
So it turns out that our script has certain elements that depend on the Xserver still running at logout in order to display some of the GUI error messages. This is from:
Gnome Configuration
"When a user terminates their session, GDM will run the PostSession script. Note that the Xserver will have been stopped by the time this script is run, so it should not be accessed.
Note that the PostSession script will be run even when the display fails to respond due to an I/O error or similar. Thus, there is no guarantee that X applications will work during script execution."
We are having to rewrite those error message callouts so they simply write the errors to a file instead of depending on the display. The errors are for things that should be there in the beginning anyway.

Running a python script within a bash file within a Yii project

I have a Yii project that allows importing files.
Within this project I call the following command to try and convert xls files to csv:
$file = fopen($model->importfile->tempname,'r');
$filetype = substr($model->importfile, strrpos($model->importfile, '.')+1);
if ($filetype === 'xls')
{
$tempxls = $model->importfile->tempname;
$outputArr = array();
exec(Yii::app()->basePath."/commands/xlstocsv.sh " . $tempxls, $outputArr);
PropertiesController::xlsToConsoleV7Format($tempxls, $log);
}
xlstocsv.sh:
#!/bin/bash
# Try to autodetect OOFFICE and OOOPYTHON.
OOFFICE=`ls /usr/bin/libreoffice /usr/lib/libreoffice/program/soffice /usr/bin/X11/libreoffice | head -n 1`
OOOPYTHON=`ls /usr/bin/python3 | head -n 1`
XLS='.xls'
CSV='.csv'
INPUT=$1$XLS
OUTPUT=$1$CSV
cp $1 $INPUT
if [ ! -x "$OOFFICE" ]
then
echo "Could not auto-detect OpenOffice.org binary"
exit
fi
if [ ! -x "$OOOPYTHON" ]
then
echo "Could not auto-detect OpenOffice.org Python"
exit
fi
echo "Detected OpenOffice.org binary: $OOFFICE"
echo "Detected OpenOffice.org python: $OOOPYTHON"
# Start OpenOffice.org in listening mode on TCP port 2002.
$OOFFICE "-accept=socket,host=localhost,port=2002;urp;StarOffice.ServiceManager" -norestore -nofirststartwizard -nologo -headless &
# Wait a few seconds to be sure it has started.
sleep 5s
# Convert as many documents as you want serially (but not concurrently).
# Substitute whichever documents you wish.
$OOOPYTHON /fullpath/DocumentConverter.py $INPUT $OUTPUT
# Close OpenOffice.org.
cp $OUTPUT $1
DocumentConverter.py:
This can be found here: https://github.com/mirkonasato/pyodconverter. It has been slightly modified to have correct syntax for python3.
Ok, the issue is, when running the php code from the terminal, it correctly creates the csv file from the excel file.
However, when running it from within the browser, it still runs the script and creates the output file, but it has not correctly converted it into csv.
It all works perfectly for every file I have thrown at it so far when running from console, but for some reason when running it from within a browser, it fails to convert the file properly.
Any ideas for what could be going wrong?
Thanks alejandro, permission errors seemed to be the issue. I also needed to move the .config/libreoffice folder into Apache's home directory.

if else statement incorrect output

I'm working on a custom Nagios script that will monitor cPanel to make sure it is running and give back a status depending on what it gets from an output of service cpanel status. This is what I have:
##############################################################################
# Constants
cpanelstate="running..."
ALERT_OK="OK - cPanel is running"
ALERT_CRITICAL="CRITICAL - cPanel is NOT running"
###############################################################################
cpanel=$(service cpanel status | head -1)
if [ "$cpanel" = "$cpanelstate" ]; then
echo $ALERT_OK
exit 0
else
echo $ALERT_CRITICAL
exit 2
fi
exit $exitstatus
When I run the script, this is the output I get:
root@shared01 [/home/mvelez]# /usr/local/nagios/libexec/check_cpanel
CRITICAL - cPanel is NOT running
When I run the script, cPanel IS RUNNING but this is the output I get. As a matter of fact, no matter what the status reports for cPanel this is the output that comes out. When I comment out the ELSE, ECHO and EXIT 2 statement:
#else
# echo $ALERT_CRITICAL
# exit 2
It gives back a blank output:
root@shared01 [/home/mvelez]# /usr/local/nagios/libexec/check_cpanel
root@shared01 [/home/mvelez]#
I'm not sure what I'm doing wrong, as I am very new to bash scripting and trying to learn as I go along. Thank you in advance for any and all help!
The code below should work, but you might need to run it with sudo, because 'service' might not be available for ordinary users.
#!/bin/bash
##############################################################################
# Constants
cpanelstate="running"
ALERT_OK="OK - cPanel is running"
ALERT_CRITICAL="CRITICAL - cPanel is NOT running"
###############################################################################
cpanel=$(service cpanel status | head -1)
echo CPANEL $cpanel
if [[ $cpanel == *$cpanelstate* ]]; then
echo $ALERT_OK
exit 0
else
echo $ALERT_CRITICAL
exit 2
fi
@Oleg Gryb's answer solves your problem, but as for why your original script didn't work:
[ "$cpanel" = "$cpanelstate" ] compared the full command output - e.g., cpsrvd (pid 10066) is running..., against a substring of the expected output, running... for equality, which will obviously fail.
The solution is to use bash's pattern matching, provided via the right-hand side of its [[ ... ]] conditional (bash's superior alternative to the [ ... ] conditional):
[[ "$cpanel" == *"$cpanelstate" ]]
* represents any sequence of characters, so that this conditional returns true, if $cpanel ends with $cpanelstate (note how * must be unquoted to be recognized as a special pattern char.)
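To make the difference concrete, here is a small sketch that runs both tests on the sample status line quoted above. The variables exact and suffix are only there to hold the results.

```shell
#!/bin/bash
cpanel="cpsrvd (pid 10066) is running..."   # sample output line from above
cpanelstate="running..."

# String equality: the whole line must be identical to "running...".
if [ "$cpanel" = "$cpanelstate" ]; then
    exact="match"
else
    exact="no match"
fi

# Pattern match: the unquoted * lets anything precede "running...".
if [[ "$cpanel" == *"$cpanelstate" ]]; then
    suffix="match"
else
    suffix="no match"
fi

echo "[ = ]  : $exact"
echo "[[ == ]]: $suffix"
```

Keeping "$cpanelstate" quoted on the right-hand side means its characters are matched literally and not interpreted as pattern characters.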

Resources