Error in attempting to parallelize a task in a bash script - bash

I am trying to parallelize the task of rpw_gen_features in the following bash script:
#!/bin/bash
maxjobs=8
jobcounter=0
MYDIR="/home/rasoul/workspace/world_db/journal/for-training"
DIR=$1
FILES=`find $MYDIR/${DIR}/${DIR}_*.hpl -name *.hpl -type f -printf "%f\n" | sort -n -t _ -k 2`
for f in $FILES; do
    fileToProcess=$MYDIR/${DIR}/$f
    # construct .pfl file name
    filebasename="${f%.*}"
    fileToCheck=$MYDIR/${DIR}/$filebasename.pfl
    # check if the .pfl file is already generated
    if [ ! -f $fileToCheck ];
    then
        echo ../bin/rpw_gen_features -r $fileToProcess &
        jobcounter=jobcounter+1
    fi
    if [jobcounter -eq maxjobs]
        wait
        jobcounter=0
    fi
done
but it generates an error at runtime:
line 20: syntax error near unexpected token `fi'
I'm not an expert in bash programming, so please feel free to comment on the whole code.

I am curious why you don't just use GNU Parallel:
MYDIR="/home/rasoul/workspace/world_db/journal/for-training"
DIR=$1
find $MYDIR/${DIR}/${DIR}_*.hpl -name '*.hpl' -type f |
parallel '[ ! -f {.}.pfl ] && echo ../bin/rpw_gen_features -r {}'
Or even:
MYDIR="/home/rasoul/workspace/world_db/journal/for-training"
parallel '[ ! -f {.}.pfl ] && echo ../bin/rpw_gen_features -r {}' ::: $MYDIR/$1/$1_*.hpl
It seems to be way more readable, and it will automatically scale when you move from an 8-core to a 64-core machine.
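By default GNU Parallel runs one job per CPU core; if you specifically want to keep the original cap of 8 concurrent jobs, the -j option sets the job count (a sketch, still using echo as a dry run):
MYDIR="/home/rasoul/workspace/world_db/journal/for-training"
parallel -j8 '[ ! -f {.}.pfl ] && echo ../bin/rpw_gen_features -r {}' ::: $MYDIR/$1/$1_*.hpl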
Watch the intro video for a quick introduction:
https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial (man parallel_tutorial). Your command line will love you for it.

You are missing a then, spaces, and ${} around the variables:
if [jobcounter -eq maxjobs]
    wait
    jobcounter=0
fi
Should be
if [ ${jobcounter} -eq ${maxjobs} ]; then
    wait
    jobcounter=0
fi
Further, you need to double check your script as I can see many missing ${} for example:
jobcounter=jobcounter+1
Even if you use the variables correctly this still will not work:
jobcounter=${jobcounter}+1
Will yield:
0+1
0+1+1
0+1+1+1
And not what you expect. You need to use:
jobcounter=`expr $jobcounter + 1`
With newer versions of bash you should be able to do:
(( jobcounter++ ))
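Putting all of that together, the original loop could look like this (a sketch: it keeps the question's echo dry run, uses the newer-bash increment, and tidies the find call to search the directory with a -name pattern instead of expanding a glob):
#!/bin/bash
maxjobs=8
jobcounter=0
MYDIR="/home/rasoul/workspace/world_db/journal/for-training"
DIR=$1
FILES=$(find "$MYDIR/$DIR" -name "${DIR}_*.hpl" -type f -printf "%f\n" | sort -n -t _ -k 2)
for f in $FILES; do
    fileToProcess="$MYDIR/$DIR/$f"
    filebasename="${f%.*}"
    fileToCheck="$MYDIR/$DIR/$filebasename.pfl"
    # skip files whose .pfl output already exists
    if [ ! -f "$fileToCheck" ]; then
        echo ../bin/rpw_gen_features -r "$fileToProcess" &
        (( jobcounter++ ))
    fi
    # once maxjobs jobs are in flight, wait for the whole batch
    if [ "$jobcounter" -eq "$maxjobs" ]; then
        wait
        jobcounter=0
    fi
done
wait    # catch any jobs left over from the last partial batch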

BASH - 'exit 1' failed in loop inside another loop [duplicate]

This question already has answers here:
Exit bash script within while loop
(2 answers)
Closed last month.
The following code doesn't exit at the first exit 1 from the call of error_exit. What am I missing?
#!/bin/bash
THIS_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
JINJANG_DIR="$(cd "$THIS_DIR/../.." && pwd)"
DATAS_DIR="$THIS_DIR/datas"

error_exit() {
    echo ""
    echo "ERROR - Following command opens the file that has raised an error."
    echo ""
    echo " > open \"$1\""
    exit 1
}

cd "$DATAS_DIR"
find . -name 'datas.*' -type f | sort | while read -r datafile
do
    localdir="$(dirname $datafile)"
    echo " * Testing ''$localdir''."
    filename=$(basename "$datafile")
    ext=${filename##*.}
    if [ "$ext" == "py" ]
    then
        unsafe="-u"
    else
        unsafe=""
    fi
    datas="$DATAS_DIR/$datafile"
    find . -name 'template.*' -type f | sort | while read -r template
    do
        filename=$(basename "$template")
        ext=${filename##*.}
        template="$DATAS_DIR/$template"
        outputfound="$DATAS_DIR/$localdir/output_found.$ext"
        cd "$JINJANG_DIR"
        python -m src $UNSAFE "$DATA" "$TEMPLATE" "$OUTPUTFOUND" || error_exit "$localdir"
    done
    cd "$DATAS_DIR"
done
Here is the output I obtain.
ERROR - Following command opens the file that has raised an error.
> open "./html/no-param-1"
* Testing ''./html/no-param-2''.
ERROR - Following command opens the file that has raised an error.
> open "./html/no-param-2"
* Testing ''./latex/no-param-1''.
ERROR - Following command opens the file that has raised an error.
> open "./latex/no-param-1"
* Testing ''./latex/no-param-2''.
ERROR - Following command opens the file that has raised an error.
In my bash environment invoking exit in a subprocess does not abort the parent process, e.g.:
$ echo "1 2 3" | exit # does not exit my console but instead ...
$ # presents me with the command prompt
In your case you have the pipeline find | sort | while, so the python || error_exit is being called within a subprocess, which in turn means the exit 1 applies to the subprocess but not to the (parent) script.
One solution that ensures the (inner) while (and thus the exit 1) is not run in a subprocess:
while read -r template
do
    ... snip ...
    python ... || error_exit
    ... snip ...
done < <(find . -name 'template.*' -type f | sort)
NOTES:
I'd recommend getting used to this structure as it also addresses another common issue ...
values assigned to variables in a subprocess are not passed 'up' to the parent process (see the sketch after these notes)
subprocess behavior may differ in other shells
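A minimal sketch of the variable note above (printf stands in for any command that produces lines):
count=0
printf 'a\nb\n' | while read -r line; do count=$((count+1)); done
echo "$count"    # prints 0: the loop ran in a subshell, so the parent's count never changed

count=0
while read -r line; do count=$((count+1)); done < <(printf 'a\nb\n')
echo "$count"    # prints 2: process substitution keeps the loop in the parent shell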
Of course, this same issue applies to the parent/outer while loop so, if the objective is for the exit 1 to apply to the entire script then this same structure will need to be implemented for the parent/outer find | sort | while, too:
while read -r datafile
do
    ... snip ...
    while read -r template
    do
        ... snip ...
        python ... || error_exit
    done < <(find . -name 'template.*' -type f | sort)
    cd "$DATAS_DIR"
done < <(find . -name 'datas.*' -type f | sort)
Additional note copied from GordonDavisson's edit of this answer:
Note that the <( ) construct ("process substitution") is not available in all shells, or even in bash when it's in sh-compatibility mode (i.e. when it's invoked as sh or /bin/sh). So be sure to use an explicit bash shebang (like #!/bin/bash or #!/usr/bin/env bash) in your script, and don't override it by running the script with the sh command.

How to execute some commands on non zero exit code?

I am running the following command:
OLDIFS=$IFS
IFS=$'\n'
for i in $(find $HOME/test -maxdepth 1 -type f); do
    if [ $? -eq 0 ]; then
        telegram-upload --to 12345 --directories recursive --large-files split --caption '{file_name}' $i &&
            rm $i
    fi
done
IFS=$OLDIFS
If the telegram-upload command exits with a non-zero code I intend to do the following:
rm somefile && wget someurl && rerun the whole command
How do I go about doing something like this?
You can capture the exit code from the special parameter $?, which holds the exit status of the last command:
telegram-upload --to 12345 ...
exit_code=$?
From here, you can use the variable $exit_code as a regular variable, like if [ $exit_code -eq 0 ]; then ...
It's straightforward: run an endless loop, which you break out of once you succeed:
for i in $(find $HOME/test -maxdepth 1 -type f)
do
    while true
    do
        if telegram-upload --to 12345 --directories recursive --large-files split --caption '{file_name}' $i
        then
            break
        else
            rm somefile && wget someurl
        fi
    done
done
UPDATE: For completeness, I added the loop over the files as well. Note that your approach of looping over the files will fail if you have files whose names contain whitespace. However, this is already a problem in your own approach and not part of the question, so I won't discuss it here.
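If you do want a whitespace-safe variant, find -print0 with a null-delimited read works; here is a sketch (somefile and someurl are the question's placeholders, and until replaces the while true/break):
find "$HOME/test" -maxdepth 1 -type f -print0 |
while IFS= read -r -d '' i
do
    until telegram-upload --to 12345 --directories recursive --large-files split --caption '{file_name}' "$i"
    do
        # on failure: remove the file, re-download it, then the until condition reruns the upload
        rm somefile && wget someurl
    done
done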

File count in a folder not showing accurately

I am writing a shell script to check two things at one time. The first condition is to check for the existence of a specific file and the second condition is to confirm that there is only one file in that directory.
I am using the following code:
conf_file=ls -1 /opt/files/conf.json 2>/dev/null | wc -l
total_file=ls -1 /opt/files/* 2>/dev/null| wc -l
if [ $conf_file -eq 1 ] && [ $total_file -eq 1 ]
then
    echo "done"
else
    echo "Not Done"
fi
It is returning the following error
0
0
./ifexist.sh: 4: [: -eq: unexpected operator
Not Done
I am probably making a very silly mistake. Can anyone help me a little bit?
One of the reasons you should normally not parse ls is that you can get strange results when you have files with newlines. In your case that won't be an issue, because any file different from conf.json should make the test fail. However, you should make the code counting the files future-proof. You can use find for this.
Your code can be changed into
jsonfile="/opt/files/conf.json"
countfiles=$(find /opt/files -maxdepth 1 -type f -exec printf '.\n' \; | wc -l)
if [[ -f "${jsonfile}" ]] && (( "${countfiles}" == 1 )); then
    echo "Done"
else
    echo "Not Done"
fi
When you say this:
conf_file=ls -1 /opt/files/conf.json 2>/dev/null | wc -l
That assigns the value "ls" to the variable conf_file, and then tries to run a command called "-1" and pipe the result to wc. If you want to run a pipe sequence, you have to enclose it in $( ):
conf_file=$(ls -1 /opt/files/conf.json 2>/dev/null | wc -l)
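As an aside, the var=value command form itself is valid shell; it runs the command with the variable set only in its environment. A quick sketch of the mechanism:
x=hello env | grep '^x='     # prints x=hello: env saw the variable in its environment
echo "x is '${x:-unset}'"    # prints x is 'unset': the shell itself never set x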
Next, when combining clauses in the test command ([), do it inside the command:
if [ $conf_file -eq 1 -a $total_file -eq 1 ]
However, there are better ways to do this. You can check if a file exists with "-f", and you can just check whether the output of ls matches what you expect, without creating variables or running other commands:
if [ -f /opt/files/conf.json -a "$(ls /opt/files/conf.*)" = "/opt/files/conf.json" ]
However, it is not a friendly practice to prohibit other files. In many cases, people might want to leave backup or test copies (conf.json.bak or conf.json.test), and there's no reason for you to block that.
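If you drop the only-one-file requirement accordingly, the whole check reduces to the existence test (a sketch):
if [ -f /opt/files/conf.json ]
then
    echo "done"
else
    echo "Not Done"
fi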

Bash complete function - Separating completion parts with character other than space

I've written a bash completion script to essentially do file/directory completion, but using . as the separator instead of /. However, it's not behaving as I expect it to.
Before I dive further, does anyone know of any options for this, or something that's already been written that can do this? The motivation for this is to enable completion when calling python with the -m flag. It seems crazy that this doesn't exist yet, but I was unable to find anything relevant.
My issue is that bash doesn't recognize . as a separator for completion options, and won't show the next options until I add an additional space to the end of the current command.
Here are a few concrete examples, given this directory structure.
module/
    script1.py
    script2.py
For instance, when I use the ls command, it works like this
$ ls mo<TAB>
$ ls module/<TAB><TAB>
script1.py script2.py
However, with my function, it's working like this:
$ python -m mod<TAB>
$ python -m module.<TAB><TAB>
module.
So instead of showing the next entries, it just shows the finished string again. However, if I add a space, it then works, but I don't want it to include the space:
$ python -m mod<TAB>
$ python -m module. <TAB><TAB> # (note the space here after the dot)
script1 script2 # (Note, I'm intentionally removing the file extension here).
I'd like the completion to act just like the bottom example, except without being forced to include the space to get to the next set of options.
I've got about 50 tabs open and I've tried a bunch of recommendations, but nothing seems to solve this the way I'd like. There are a few other caveats here that would take a lot of time to go through, so I'm happy to expand on any other points if I've skipped something important. I've attached my code below; any help would be greatly appreciated. Thanks!
#!/bin/bash
_python_target() {
    local cur opts cur_path
    # Retrieving the current typed argument
    cur="${COMP_WORDS[COMP_CWORD]}"
    # Preparing an array to store available list for completions
    # COMPREPLY will be checked to suggest the list
    COMPREPLY=()
    # Here, we'll only handle the case of "-m"
    # Hence, the classic autocompletion is disabled
    # (ie COMPREPLY stays an empty array)
    if [[ "${COMP_WORDS[1]}" != "-m" ]]
    then
        return 0
    fi
    # add each path component to the current path to check for additional files
    cur_path=""
    for word in ${COMP_WORDS[@]:2:COMP_CWORD-2}; do
        path_component=$(echo ${word} | sed 's/\./\//g')
        cur_path="${cur_path}${path_component}"
    done
    cur_path="./${cur_path}"
    if [[ ! -f "$cur_path" && ! -d "$cur_path" ]]; then
        return 0
    fi
    # this is not very pretty, but it works. Open to comments on this too
    file_opts="$(find ${cur_path} -maxdepth 1 -name "*.py" -type f -print0 | xargs -0 basename -a | sed 's/\.[^.]*$//')"
    dir_opts="$(find ${cur_path} -maxdepth 1 ! -path ${cur_path} -type d -print0 | xargs -0 basename -a | xargs -I {} echo {}.)"
    opts="${file_opts} ${dir_opts}"
    # We store the whole list by invoking "compgen" and filling
    # COMPREPLY with its output content.
    COMPREPLY=($(compgen -W "$opts" -- "$cur"))
    [[ $COMPREPLY == *\. ]] && compopt -o nospace
}
complete -F _python_target python
Here's a draft example:
_python_target()
{
    local cmd=$1 cur=$2 pre=$3
    if [[ $pre != -m ]]; then
        return
    fi
    local cur_slash=${cur//./\/}
    local i arr arr2
    arr=( $( compgen -f "$cur_slash" ) )
    arr2=()
    for i in "${arr[@]}"; do
        if [[ -d $i ]]; then
            arr2+=( "$i/" )
        elif [[ $i == *.py ]]; then
            arr2+=( "${i%.py}" )
        fi
    done
    arr2=( "${arr2[@]//\//.}" )
    COMPREPLY=( $( compgen -W "${arr2[*]}" -- "$cur" ) )
}
complete -o nospace -F _python_target python
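To try the draft with the question's layout, something like this should work (a sketch; the completion-file name is hypothetical):
$ mkdir -p module && touch module/script1.py module/script2.py
$ source ./python_m_completion.bash   # hypothetical file containing the function above
$ python -m mod<TAB>                  # completes to: python -m module.
$ python -m module.<TAB><TAB>         # offers: module.script1  module.script2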
Try it with the python-2.7.18 source code directory.

Bash: Check if a directory contains only files with a specific suffix

I am trying to write a script that will check if a directory contains only
a specific kind of file (and/or folder) and will return 1 for false, 0 for true.
I.e.: I want to check if /my/dir/ contains only *.gz files and nothing else.
This is what I have so far, but it doesn't seem to be working as intended:
# Basic vars
readonly THIS_JOB=${0##*/}
readonly ARGS_NBR=1
declare dir_in=$1
dir_in=$1"/*.gz"
#echo $dir_in
files=$(shopt -s nullglob dotglob; echo ! $dir_in)
echo $files
if (( ${#files} ))
then
    echo "Success: Directory contains files."
    exit 0
else
    echo "Failure: Directory is empty (or does not exist or is a file)"
    exit 1
fi
I want to check if /my/dir/ contains only *.gz files and nothing else.
Use find instead of globbing. It's really easier to use find and to parse its output. Globs are fine for simple scripts, but once you want to process "all files in a directory" with some filtering, it's much easier (and safer) to use find:
find "$1" -mindepth 1 -maxdepth 1 \! -name '*.gz' -o \! -type f | wc -l | xargs test 0 -eq
This finds all "things" inside the directory that are not named *.gz or are not regular files (so mkdir a.gz is accounted for), counts them, and then tests whether their count is equal to 0. If the count is equal to 0, xargs test 0 -eq will return 0; if not, it will return a status between 1 and 125. You can handle the nonzero return status with a simple || return 1 if you wish.
You can remove xargs with a simple bash substitution and use the method from this thread for a little speedup, and get the test return value, which is 0 or 1:
[ 0 -eq "$(find "$1" -mindepth 1 -maxdepth 1 \( \! -name '*.gz' -o \! -type f \) -printf '.' | wc -c)" ]
Remember that the exit status of a script is the exit status of the last command executed, so you don't need anything else in your script if you wish; a shebang and this one-liner will suffice.
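For instance, wrapped as a function honoring the question's 0-for-true, 1-for-false contract (a sketch built on the corrected one-liner above; only_gz is a hypothetical name):
only_gz() {
    # succeed (0) only if $1 contains nothing but regular *.gz files
    [ 0 -eq "$(find "$1" -mindepth 1 -maxdepth 1 \( \! -name '*.gz' -o \! -type f \) -printf '.' | wc -c)" ]
}

only_gz /my/dir && echo "only .gz files" || echo "something else present"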
Using Bash's extglob, !(*.gz) and grep:
$ if grep -qs . path/!(*.gz) ; then echo yes ; else echo nope ; fi
man grep:
-q, --quiet, --silent
Quiet; do not write anything to standard output. Exit
immediately with zero status if any match is found, even if an
error was detected. Also see the -s or --no-messages option.
-s, --no-messages
Suppress error messages about nonexistent or unreadable files.
Since you are using bash, there is another setting you can use: GLOBIGNORE
#!/bin/bash
containsonly(){
    dir="$1"
    glob="$2"
    if [ ! -d "$dir" ]; then
        echo 1>&2 "Failure: directory does not exist"
        return 2
    fi
    local res=$(
        cd "$dir"
        GLOBIGNORE=$glob
        shopt -s nullglob dotglob
        echo *
    )
    if [ ${#res} = 0 ]; then
        echo 1>&2 "Success: directory contains no extra files"
        return 0
    else
        echo 1>&2 "Failure: directory contains extra files"
        return 1
    fi
}
# ...
containsonly myfolder '*.gz'
Some have suggested counting all files which do not match the globbing pattern *.gz. This might be quite inefficient depending on the number of files. For your job it is sufficient to find just one file which does not match your globbing pattern. Use the -quit action of find to exit after the first match:
if [ -z "$(find /usr/share/man/man1/* -not -name '*.gz' -print -quit)" ]
then echo only gz
fi
