Getting the exit code of a Python script launched in a subshell with Bash - bash

I want to run a Bash script every minute (through a CRON entry) to launch a series of Python scripts in a granular (time-wise) fashion.
So far, this is the script I've made:
# set the current date
DATE=`date +%Y-%m-%d`
# set the current system time (HH:MM)
SYSTIME=`date +%H-%M`
# parse all .py script files in the 'daily' folder
for f in daily/*.py; do
if [[ -f $f ]]; then
# set the script name
SCRIPT=$(basename $f)
# get the script time
SCRTIME=`echo $SCRIPT | cut -c1-5`
# execute the script only if its intended execution time and the system time match
if [[ $SCRTIME == $SYSTIME ]]; then
# ensure the directory exists
install -v -m755 -d done/$DATE/failed
# execute the script
python3 evaluator.py $f > done/$DATE/daily-$SCRIPT.log &
# evaluate the result
if [ $? -eq 0 ]; then
# move the script to the 'done' folder
cp $f done/$DATE/daily-$SCRIPT
else
# log the failure
mv done/$DATE/daily-$SCRIPT.log done/$DATE/failed/
# move the script to the 'retry' folder for retrial
cp $f retry/daily-$SCRIPT
fi
fi
fi
done
Let's say we have the following files in a folder called daily/ (for daily execution):
daily/08-00-script-1.py
daily/08-00-script-2.py
daily/08-05-script-3.py
daily/09-20-script-4.py
The idea for granular execution is that a CRON task runs this script every minute. I fetch the system time and extract the execution time for each script and when the time matches between the system and the script file, it gets handed over to Python. So far, so good.
I know this script is not right in the sense that it gets a subshell for each script that's going to be executed but the following code is wrong as I know Bash automatically returns 0 on subshell invocation (if I did read correctly while searching on Google) and what I need is for each subshell to execute the code below so, if it fails, it gets sent to another folder (retry/) which is controlled by another Bash script running checks every 30 minutes for retrial (it's the same script as this one minus the checking part).
So, basically, I need to run this:
# evaluate the result
if [ $? -eq 0 ]; then
# move the script to the 'done' folder
cp $f done/$DATE/daily-$SCRIPT
else
# log the failure
mv done/$DATE/daily-$SCRIPT.log done/$DATE/failed/
# move the script to the 'retry' folder for retrial
cp $f retry/daily-$SCRIPT
fi
For every subshell-ed execution. How can I do this the right way?

Bash may return 0 for every sub-shell invocation, but if you wait for the result, then you will get the result (and I see no ampersand). If the python3 commands is relaying the exit code of the script, then your code will work. If your code does not catch an error, then it is the fault of python3 and you need to create error communication. Redirecting the output of stderr might be helpful, but first verify that your code does not work.

Related

In bash i'm building an update script, how to update the updater script

I am building a little script to update application files on a raspberry pi.
It will do the following:
Download a zip file of the application files
Unzip them
Copy each one to the right place and make it executable etc as needed.
The problem i'm having is that one of the files is updatescript.sh.
I've read that it is dangerous to update / change a bash script while it is executing. See Edit shell script while it's running
Is there a good way to achieve what I'm trying to do?
What you've read is badly overblown.
It's completely safe to overwrite a shell script in-place by mving a different file over it. When you do this, the old file handle is still valid, referring to the original unmodified file contents. What you can't safely do is edit the existing file in-place.
So, the below is fine (and is what all your OS-vendor update tools like RPM do in effect):
#!/usr/bin/env bash
tempfile=$(mktemp "$BASH_SOURCE".XXXXXX)
if curl https://example.com/whatever >"$tempfile" &&
curl https://example.com/whatever.sig >"$tempfile.sig" &&
gpgv "$tempfile.sig" "$tempfile"; then
chown --reference="$BASH_SOURCE" -- "$tempfile"
chmod --reference="$BASH_SOURCE" -- "$tempfile"
sync # force your filesystem to fully flush file contents to disk
mv -- "$tempfile" "$BASH_SOURCE" && rm -f -- "$tempfile.sig"
else
rm -f -- "$tempfile" "$tempfile.sig"
exit 1
fi
...whereas this is risky:
curl https://example.com/whatever >/usr/local/bin/whatever
So do the first, thing, not the second one: When downloading a new version of your script, write that to a different file, and only rename it over the original when the download succeeded. That's what you want to do anyhow to ensure atomicity.
(There are also some demonstrations of code-signing validation practices above because, well, you need them when building an updater. You wouldn't be trying to distribute code via an automated download without verifying a signature, right? Because that's how one simple breakin to your web server results in every single one of your customers being 0wned. The above expects the public side of your code-signing keys to be in ~/.gnupg/trustedkeys.gpg, but you can put trustedkeys.gpg in any directory and point to it with the environment variable GNUPGHOME).
Even if you don't write your update code safely, the risk is still trivial to mitigate. If you move the body of your script into a function, such that it has to be completely read before any part of it can be executed, then there's no part of the file that isn't already read at the time when execution begins.
#!/usr/bin/env bash
main() {
echo "Logic all goes here"
}; { main "$#"; exit; }
Because { main "$#"; exit; } is part of a compound command, the parser reads the exit before it starts executing the main, so it's guaranteed that no further source-file content will be read after main exits, even if some future bash release didn't handle input line-by-line in the first place.
Basically do something along:
shouldbe="/tmp/$(basename "$0")"
if [ "$0" != "$shouldbe" ]; then
cp "$0" "$shouldbe"
exec env REALPATH="$0" "$shouldbe" "$#"
fi
Check if you are running from a temporary directory
If you are not, copy yourself and rerun from the temporary directory
You can even pass some variables/state along, by using environmental variables or arguments. Then you can update yourself by using simple cp, as the old path isn't sourced (or even opened) anymore.
cp "new_script_version.sh" "$REALPATH"
The script simply looks like this:
#!/bin/bash
# we need to be run from /tmp directory
shouldbe="/tmp/$(basename "$0")"
if [ "$0" != "$shouldbe" ]; then
cp "$0" "$shouldbe"
exec env REALPATH="$0" "$shouldbe" "$#"
fi
echo "Updatting...."
echo "downloading zip files"
echo "unziping zip files..."
echo "Copying each zip files etc."
cp directory"new_updatescript.sh "$REALPATH"
echo "Update succedded"
Live/test version available at tutorialspoint.
One would also implement some flock locking to the scripts just in case.

Multiple bash script with different parameters

I have the following bash script, that I launch using the terminal.
dataset_dir='/home/super/datasets/Carpets_identification/data'
dest_dir='/home/super/datasets/Carpets_identification/augmented-data'
# if dest_dir does not exist -> create it
if [ ! -d ${dest_dir} ]; then
mkdir ${dest_dir}
fi
# for all folder of the dataset
for folder in ${dataset_dir}/*; do
curr_folder="${folder##*/}"
echo "Processing $curr_folder category"
# get all files
for item in ${folder}/*; do
# if the class dir in dest_dir does not exist -> create it
if [ ! -d ${dest_dir}/${curr_folder} ]; then
mkdir ${dest_dir}/${curr_folder}
fi
# for each file
if [ -f ${item} ]; then
# echo ${item}
filename=$(basename "$item")
extension="${filename##*.}"
filename=`readlink -e ${item}`
# get a certain number of patches
for i in {1..100}
do
python cropper.py ${filename} ${i} ${dest_dir}
done
fi
done
done
Given that it needs at least an hour to process all the files.
What happens if I change the '100' with '1000' in the last for loop and launch another instance of the same script?
Will the first process count to 1000 or will continue to count to 100?
I think the file will be readonly when a bash process executes it. But you can force the change. The already running process will count to its original value, 100.
You have to take care about the results. You are writing in the same output directory and have to expect side effects.
"When you make changes to your script, you make the changes on the disk(hard disk- the permanent storage); when you execute the script, the script is loaded to your memory(RAM).
(see https://askubuntu.com/questions/484111/can-i-modify-a-bash-script-sh-file-while-it-is-running )
BUT "You'll notice that the file is being read in at 8KB increments, so Bash and other shells will likely not load a file in its entirety, rather they read them in in blocks."
(see https://unix.stackexchange.com/questions/121013/how-does-linux-deal-with-shell-scripts )
So, in your case, all your script is loaded in the RAM memory by the script interpretor, and then executed. Meaning that if you change the value, then execute it again, the first instance will still have the "old" value.

Create a detailed self tracing log in bash

I know you can create a log of the output by typing in script nameOfLog.txt and exit in terminal before and after running the script, but I want to write it in the actual script so it creates a log automatically. There is a problem I'm having with the exec >>log_file 2>&1 line:
The code redirects the output to a log file and a user can no longer interact with it. How can I create a log where it just basically copies what is in the output?
And, is it possible to have it also automatically record the process of files that were copied? For example, if a file at /home/user/Deskop/file.sh was copied to /home/bckup, is it possible to have that printed in the log too or will I have to write that manually?
Is it also possible to record the amount of time it took to run the whole process and count the number of files and directories that were processed or am I going to have to write that manually too?
My future self appreciates all the help!
Here is my whole code:
#!/bin/bash
collect()
{
find "$directory" -name "*.sh" -print0 | xargs -0 cp -t ~/bckup #xargs handles files names with spaces. Also gives error of "cp: will not overwrite just-created" even if file didn't exist previously
}
echo "Starting log"
exec >>log_file 2>&1
timelimit=10
echo "Please enter the directory that you would like to collect.
If no input in 10 secs, default of /home will be selected"
read -t $timelimit directory
if [ ! -z "$directory" ] #if directory doesn't have a length of 0
then
echo -e "\nYou want to copy $directory." #-e is so the \n will work and it won't show up as part of the string
else
directory=/home/
echo "Time's up. Backup will be in $directory"
fi
if [ ! -d ~/bckup ]
then
echo "Directory does not exist, creating now"
mkdir ~/bckup
fi
collect
echo "Finished collecting"
exit 0
To answer the "how to just copy the output" question: use a program called tee and then a bit of exec magic explained here:
redirect COPY of stdout to log file from within bash script itself
Regarding the analytics (time needed, files accessed, etc) -- this is a bit harder. Some programs that can help you are time(1):
time - run programs and summarize system resource usage
and strace(1):
strace - trace system calls and signals
Check the man pages for more info. If you have control over the script it will be probably easier to do the logging yourself instead of parsing strace output.

Is this a valid self-update approach for a bash script?

I'm working on a script that has gotten so complex I want to include an easy option to update it to the most recent version. This is my approach:
set -o errexit
SELF=$(basename $0)
UPDATE_BASE=http://something
runSelfUpdate() {
echo "Performing self-update..."
# Download new version
wget --quiet --output-document=$0.tmp $UPDATE_BASE/$SELF
# Copy over modes from old version
OCTAL_MODE=$(stat -c '%a' $0)
chmod $OCTAL_MODE $0.tmp
# Overwrite old file with new
mv $0.tmp $0
exit 0
}
The script seems to work as intended, but I'm wondering if there might be caveats with this kind of approach. I just have a hard time believing that a script can overwrite itself without any repercussions.
To be more clear, I'm wondering, if, maybe, bash would read and execute the script line-by-line and after the mv, the exit 0 could be something else from the new script. I think I remember Windows behaving like that with .bat files.
Update: My original snippet did not include set -o errexit. To my understanding, that should keep me safe from issues caused by wget.
Also, in this case, UPDATE_BASE points to a location under version control (to ease concerns).
Result: Based on the input from these answers, I constructed this revised approach:
runSelfUpdate() {
echo "Performing self-update..."
# Download new version
echo -n "Downloading latest version..."
if ! wget --quiet --output-document="$0.tmp" $UPDATE_BASE/$SELF ; then
echo "Failed: Error while trying to wget new version!"
echo "File requested: $UPDATE_BASE/$SELF"
exit 1
fi
echo "Done."
# Copy over modes from old version
OCTAL_MODE=$(stat -c '%a' $SELF)
if ! chmod $OCTAL_MODE "$0.tmp" ; then
echo "Failed: Error while trying to set mode on $0.tmp."
exit 1
fi
# Spawn update script
cat > updateScript.sh << EOF
#!/bin/bash
# Overwrite old file with new
if mv "$0.tmp" "$0"; then
echo "Done. Update complete."
rm \$0
else
echo "Failed!"
fi
EOF
echo -n "Inserting update process..."
exec /bin/bash updateScript.sh
}
(At least it doesn't try to continue running after updating itself!)
The thing that makes me nervous about your approach is that you're overwriting the current script (mv $0.tmp $0) as it's running. There are a number of reasons why this will probably work, but I wouldn't bet large amounts that it's guaranteed to work in all circumstances. I don't know of anything in POSIX or any other standard that specifies how the shell processes a file that it's executing as a script.
Here's what's probably going to happen:
You execute the script. The kernel sees the #!/bin/sh line (you didn't show it, but I presume it's there) and invokes /bin/sh with the name of your script as an argument. The shell then uses fopen(), or perhaps open() to open your script, reads from it, and starts interpreting its contents as shell commands.
For a sufficiently small script, the shell probably just reads the whole thing into memory, either explicitly or as part of the buffering done by normal file I/O. For a larger script, it might read it in chunks as it's executing. But either way, it probably only opens the file once, and keeps it open as long as it's executing.
If you remove or rename a file, the actual file is not necessarily immediately erased from disk. If there's another hard link to it, or if some process has it open, the file continues to exist, even though it may no longer be possible for another process to open it under the same name, or at all. The file is not physically deleted until the last link (directory entry) that refers to it has been removed, and no processes have it open. (Even then, its contents won't immediately be erased, but that's going beyond what's relevant here.)
And furthermore, the mv command that clobbers the script file is immediately followed by exit 0.
BUT it's at least conceivable that the shell could close the file and then re-open it by name. I can't think of any good reason for it to do so, but I know of no absolute guarantee that it won't.
And some systems tend to do stricter file locking that most Unix systems do. On Windows, for example, I suspect that the mv command would fail because a process (the shell) has the file open. Your script might fail on Cygwin. (I haven't tried it.)
So what makes me nervous is not so much the small possibility that it could fail, but the long and tenuous line of reasoning that seems to demonstrate that it will probably succeed, and the very real possibility that there's something else I haven't thought of.
My suggestion: write a second script whose one and only job is to update the first. Put the runSelfUpdate() function, or equivalent code, into that script. In your original script, use exec to invoke the update script, so that the original script is no longer running when you update it. If you want to avoid the hassle of maintaining, distributing, and installing two separate scripts. you could have the original script create the update script with a unique in /tmp; that would also solve the problem of updating the update script. (I wouldn't worry about cleaning up the autogenerated update script in /tmp; that would just reopen the same can of worms.)
Yes, but ... I would recommend you keep a more layered version of your script's history, unless the remote host can also perform version-control with histories. That being said, to respond directly to the code you have posted, see the following comments ;-)
What happens to your system when wget has a hiccup, quietly overwrites part of your working script with only a partial or otherwise corrupt copy? Your next step does a mv $0.tmp $0 so you've lost your working version. (I hope you have it in version control on the remote!)
You can check to see if wget returns any error messages
if ! wget --quiet --output-document=$0.tmp $UPDATE_BASE/$SELF ; then
echo "error on wget on $UPDATE_BASE/$SELF"
exit 1
fi
Also, Rule-of-thumb tests will help, i.e.
if (( $(wc -c < $0.tmp) >= $(wc -c < $0) )); then
mv $0.tmp $0
fi
but are hardly foolproof.
If your $0 could windup with spaces in it, better to surround all references like "$0".
To be super-bullet proof, consider checking all command returns AND that Octal_Mode has a reasonable value
OCTAL_MODE=$(stat -c '%a' $0)
case ${OCTAL_MODE:--1} in
-[1] )
printf "Error : OCTAL_MODE was empty\n"
exit 1
;;
777|775|755 ) : nothing ;;
* )
printf "Error in OCTAL_MODEs, found value=${OCTAL_MODE}\n"
exit 1
;;
esac
if ! chmod $OCTAL_MODE $0.tmp ; then
echo "error on chmod $OCTAL_MODE %0.tmp from $UPDATE_BASE/$SELF, can't continue"
exit 1
fi
I hope this helps.
Very late answer here, but as I just solved this too, I thought it might help someone to post the approach:
#!/usr/bin/env bash
#
set -fb
readonly THISDIR=$(cd "$(dirname "$0")" ; pwd)
readonly MY_NAME=$(basename "$0")
readonly FILE_TO_FETCH_URL="https://your_url_to_downloadable_file_here"
readonly EXISTING_SHELL_SCRIPT="${THISDIR}/somescript.sh"
readonly EXECUTABLE_SHELL_SCRIPT="${THISDIR}/.somescript.sh"
function get_remote_file() {
readonly REQUEST_URL=$1
readonly OUTPUT_FILENAME=$2
readonly TEMP_FILE="${THISDIR}/tmp.file"
if [ -n "$(which wget)" ]; then
$(wget -O "${TEMP_FILE}" "$REQUEST_URL" 2>&1)
if [[ $? -eq 0 ]]; then
mv "${TEMP_FILE}" "${OUTPUT_FILENAME}"
chmod 755 "${OUTPUT_FILENAME}"
else
return 1
fi
fi
}
function clean_up() {
# clean up code (if required) that has to execute every time here
}
function self_clean_up() {
rm -f "${EXECUTABLE_SHELL_SCRIPT}"
}
function update_self_and_invoke() {
get_remote_file "${FILE_TO_FETCH_URL}" "${EXECUTABLE_SHELL_SCRIPT}"
if [ $? -ne 0 ]; then
cp "${EXISTING_SHELL_SCRIPT}" "${EXECUTABLE_SHELL_SCRIPT}"
fi
exec "${EXECUTABLE_SHELL_SCRIPT}" "$#"
}
function main() {
cp "${EXECUTABLE_SHELL_SCRIPT}" "${EXISTING_SHELL_SCRIPT}"
# your code here
}
if [[ $MY_NAME = \.* ]]; then
# invoke real main program
trap "clean_up; self_clean_up" EXIT
main "$#"
else
# update myself and invoke updated version
trap clean_up EXIT
update_self_and_invoke "$#"
fi

understanding a ksh script part of

Could someone help me understand the following piece of code which is deciding on the start and end dates to pick data out of a db.
# Get the current time as the stop time.
#
stoptime=`date +"%Y-%m-%d %H:00"`
if test $? -ne 0
then
echo "Failed to get the date"
rm -f $1/.optpamo.pid
exit 4
fi
#
# Read the lasttime file to get the start time
#
if test -f $1/optlasttime
then
starttime=`cat $1/optlasttime`
# if the length of the chain is zero
# (lasttime is empty) It is updated properly
# (and I wait for the following hour)
if test -z "$starttime"
then
echo "Empty file lasttime"
echo $stoptime > $1/optlasttime
rm -f $1/.optpamo.pid
exit 5
fi
else
# If lasttime does not exist I create, it with the present date
# and I wait for the following hour
echo "File lasttime does not exist"
echo $stoptime > $1/optlasttime
rm -f $1/.optpamo.pid
exit 6
fi
Thanks
The script checks to see if there's a non-empty file named optlasttime in the directory specified as an argument ($1). If so, the script exits successfully (status 0). If the file doesn't exist or is empty, the current hour formatted as 2010-01-07 14:00 is written to the file, another file named .optpamo.pid is deleted from the argument directory and the script exits unsuccessfully (status 5 or 6).
This script is obviously a utility being called by some outer process, to which you need to refer for full understanding.
1.) Sets stop time to current time
2.) Checks if file $1/optlasttime exists (where $1 is passed into the script)
a.) if $1/optlasttime exists it checks the contents of the file (which it is assumed that if it does have contents it is a timestamp)
b.) if $1/optlasttime does not exist it populates the $1/optlasttime file with the stoptime.
I copied and pasted a small snippet of this into a file I called test.ksh
stoptime=`date +"%Y-%m-%d %H:00"`
if test $? -ne 0
then
echo "Failed to get the date"
rm -f $1/.optpamo.pid
exit 4
fi
Then I ran it at the commandline, like so:
zhasper#berens:~$ ksh -x ./temp.ksh
+ date '+%Y-%m-%d %H:00'
+ stoptime='2010-01-08 18:00'
+ test 0 -ne 0
The -x flag to ksh makes it print out each commandline, in full, as it executes. Comparing what you see here with the snippet of shell script above should tell you something about how ksh is interpreting the file.
If you run this over the whole file you should get a good feel for what it's doing.
To learn more, you can read man ksh, or search for ksh scripting tutorial online.
Together, these three things should help you learn a lot more than us simply telling you what the script does.

Resources