Looping through files of specified extensions in bash - macos

I am trying to loop through files of a list of specified extensions with a bash script. I tried the solution given at Matching files with various extensions using for loop but it does not work as expected. The solution given was:
for file in "${arg}"/*.{txt,h,py}; do
Here is my version of it:
for f in "${arg}"/*.{epub,mobi,chm,rtf,lit,djvu}
do
echo "$f"
done
When I run this in a directory with an epub file in it, I get:
/*.epub
/*.mobi
/*.chm
/*.rtf
/*.lit
/*.djvu
So I tried changing the for statement:
for f in "${arg}"*.{epub,mobi,chm,rtf,lit,djvu}
Then I got:
089281098X.epub
*.mobi
*.chm
*.rtf
*.lit
*.djvu
I also get the same result with:
for f in *.{epub,mobi,chm,rtf,lit,djvu}
So it seems that the "${arg}" argument is unnecessary.
Although either of these statements finds the files with the specified extensions and can pass them to a program, I get read errors from the unexpanded patterns such as *.mobi.
I am running this on OS X Mountain Lion. I was aware that the default bash shell was outdated so I upgraded it from 3.2.48 to 4.2.45 using homebrew to see if this was the problem. That didn't help so I am wondering why I am getting these unexpected results. Is the given solution wrong or is the OS X bash shell somehow different from the *NIX version? Is there perhaps an alternate way to accomplish the same thing that might work better in the OS X bash shell?

This may be a BASH 4.2ism. It does not work in my BASH which is still 3.2. However, if you shopt -s extglob, you can use *(...) instead:
shopt -s extglob
for file in *.*(epub|mobi|chm|rtf|lit|djvu)
do
...
done
@David W.:
shopt -s extglob
for f in *.*(epub|mobi|chm|rtf|lit|djvu)
results in: 089281098X.epub
@kojiro:
arg=.
shopt -s nullglob
for f in "${arg}"/*.{epub,mobi,chm,rtf,lit,djvu}
results in: ./089281098X.epub
shopt -s nullglob
for f in "${arg}"*.{epub,mobi,chm,rtf,lit,djvu}
results in: 089281098X.epub
So all of these variations work, but I don't understand why. Can either of you explain what is going on with each variation and what ${arg} is doing? I would really like to understand this so I can increase my knowledge. Thanks for the help.
In mine:
for f in *.*(epub|mobi|chm|rtf|lit|djvu)
I didn't include ${arg}, which expands to the value of $arg. The *(...) matches zero or more occurrences of the patterns in the parentheses (here, the list of extensions), so it matches *.epub.
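Because *(list) means "zero or more occurrences", it is a little looser than it looks: it also accepts a name that simply ends in a dot, or one with a repeated extension. The @(list) operator matches exactly one occurrence and is the tighter choice for an extension list. A small sketch, assuming bash with extglob and a scratch directory of made-up file names:

```shell
#!/usr/bin/env bash
# extglob must be on before any line that uses *(...) or @(...) is parsed.
shopt -s extglob

orig_dir=$PWD
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
# Hypothetical sample files, created only for this demonstration.
touch a.epub b.txt c. d.epubepub

# *(list): zero or more occurrences of any alternative.
star=( *.*(epub|mobi|chm|rtf|lit|djvu) )
# @(list): exactly one occurrence of one alternative.
at=( *.@(epub|mobi|chm|rtf|lit|djvu) )

echo "*(...) matched: ${star[*]}"   # a.epub c. d.epubepub
echo "@(...) matched: ${at[*]}"     # a.epub

cd "$orig_dir" || exit 1
rm -r "$demo_dir"
```

In practice *(...) is usually harmless here, but @(...) says exactly what is meant.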
Kojiro's:
arg=.
shopt -s nullglob
for f in "${arg}"/*.{epub,mobi,chm,rtf,lit,djvu}
His pattern includes $arg and the slash. Since arg=., kojiro's results start with ./ because that is exactly what the pattern asks for.
It's like the difference between:
echo *
and
echo ./*
By the way, you could do this with the other expressions too:
echo *.*(epub|mobi|chm|rtf|lit|djvu)
The shell is doing all of the expansion for you. It really has nothing to do with the for statement itself.

A glob has to expand to an existing, found name, or it is left alone with the asterisk intact. If you have an empty directory, *.foo will expand to *.foo. (Unless you use the nullglob Bash extension.)
The problem with your code is that you start with an arg, $arg, which is apparently empty or undefined. So your glob, ${arg}/*.epub, becomes /*.epub, and since there are no files ending in ".epub" in the root directory, it is left unexpanded. It never looks in the current directory. For it to do that, you'd need to set arg=. first.
In your second example, ${arg}*.epub does expand, because $arg is empty and there is an .epub file in the current directory, but the other patterns match no files, so they are left unexpanded. As I hinted at before, one easy workaround is to enable nullglob with shopt -s nullglob. This is bash-specific, but it makes *.foo expand to nothing at all (not even an empty string) when there is no matching file. For a strict POSIX solution, you would have to filter out unexpanded globs using [ -f "$f" ]. (Then again, if you wanted POSIX, you couldn't use brace expansion either.)
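To see the three behaviours side by side, here is a small bash sketch. It creates a throwaway directory holding one file named after the question's example, points arg at it, and runs the same brace-expanded glob with default settings, with nullglob, and with the [ -f "$f" ] guard. The collect helper is hypothetical; it only records the basename of each word so the output does not depend on the temporary path.

```shell
#!/usr/bin/env bash
demo_dir=$(mktemp -d)
touch "$demo_dir/089281098X.epub"
arg=$demo_dir   # an empty arg would make the pattern search / instead

# Collect the basename of every word the brace-expanded glob produces.
# With guard=1, skip words that are not existing regular files.
collect() {
  local guard=$1 f names=()
  for f in "${arg}"/*.{epub,mobi,chm,rtf,lit,djvu}; do
    if [ "$guard" = 1 ] && [ ! -f "$f" ]; then
      continue
    fi
    names+=("${f##*/}")
  done
  echo "${names[*]}"
}

default_out=$(collect 0)    # unmatched patterns stay literal
shopt -s nullglob
nullglob_out=$(collect 0)   # unmatched patterns expand to nothing
shopt -u nullglob
guard_out=$(collect 1)      # default globbing, literal patterns filtered out

echo "default:  $default_out"    # 089281098X.epub *.mobi *.chm *.rtf *.lit *.djvu
echo "nullglob: $nullglob_out"   # 089281098X.epub
echo "guard:    $guard_out"      # 089281098X.epub

rm -r "$demo_dir"
```

The default line reproduces the question's symptom; both workarounds leave only the real file.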

To summarize, the best solutions are (the first being the most intuitive and elegant):
shopt -s extglob
for f in *.*(epub|mobi|chm|rtf|lit|djvu)
or, in keeping with the original solution given in the referenced thread (which was wrong as stated):
shopt -s nullglob
for f in "${arg}"*.{epub,mobi,chm,rtf,lit,djvu}

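This should do it (note that find also descends into subdirectories, and pairing -print0 with read -d '' keeps file names containing spaces intact):
while IFS= read -r -d '' file; do
echo "$file"
done < <(find . \( -name '*.epub' -o -name '*.mobi' -o -name '*.chm' -o -name '*.rtf' -o -name '*.lit' -o -name '*.djvu' \) -print0)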

Related

Cannot iterate associative array keys in PKGBUILD

I am working with a PKGBUILD file for the AUR. I have a lot of colors that need to be replaced in different files in the $pkgsrc directory and I wanted to use an associative array.
declare -A _BLACKISH_REPLACEMENTS
_BLACKISH_REPLACEMENTS['#242424']='#1C1C1C'
_BLACKISH_REPLACEMENTS['#333333']='#292929'
_BLACKISH_REPLACEMENTS['#999999']='#787878'
_BLACKISH_REPLACEMENTS['#555555']='#4C4C4C'
_BLACKISH_REPLACEMENTS['#373737']='#2E2E2E'
_BLACKISH_REPLACEMENTS['#434343']='#383838'
_BLACKISH_REPLACEMENTS['#3E3E3E']='#333333'
_BLACKISH_REPLACEMENTS['#383838']='#2E2E2E'
_BLACKISH_REPLACEMENTS['#313131']='#262626'
_BLACKISH_REPLACEMENTS['#101010']='#101010'
_BLACKISH_REPLACEMENTS['#3B3B3B']='#303030'
_BLACKISH_REPLACEMENTS['#2A2A2A']='#1F1F1F'
_BLACKISH_REPLACEMENTS['#656565']='#575757'
_BLACKISH_REPLACEMENTS['#767676']='#5E5E5E'
_BLACKISH_REPLACEMENTS['#868686']='#787878'
_BLACKISH_REPLACEMENTS['#636363']='#595959'
_BLACKISH_REPLACEMENTS['#696969']='#5E5E5E'
_BLACKISH_REPLACEMENTS['#707070']='#666666'
_BLACKISH_REPLACEMENTS['#767676']='#6B6B6B'
_BLACKISH_REPLACEMENTS['#C1C1C1']='#B8B8B8'
_BLACKISH_REPLACEMENTS['#C6C6C6']='#BDBDBD'
That seems like a fairly clean solution, otherwise I would have many variables and that is less than ideal. Now, I iterate over these with the syntax found in other SO posts:
_blackish_replace() (
shopt -s globstar
echo "${!_BLACKISH_REPLACEMENTS[@]}"
echo "${_BLACKISH_REPLACEMENTS[@]}"
for file in "$1"/**/*.scss; do
echo "Replacing colors in file: $file"
for color in "${!_BLACKISH_REPLACEMENTS[@]}"; do
echo "$color"
sed -i "s;$color;${_BLACKISH_REPLACEMENTS["$color"]};gI" "$file"
done
done
)
It looks good to me, and when this is run in a standalone script, it does indeed replace the correct matches in the correct files.
However, when run from makepkg, it fails silently, hence the four echo calls shown.
The first two echo calls print only empty lines. This leads me to believe the array is undefined?
The outer loop over the glob expansion does work, but echo "$color" is never reached; the inner loop iterates over nothing.
I thought makepkg might be using the system shell, but in that case running the code directly in my user shell, zsh, fails with event not found: _BLACKISH_REPLACEMENTS or something similar (from memory).
I asked in the Arch Linux Discord server if makepkg uses the locally available bash, and was assured it does. I am very confused.
It is probably a good idea to turn your array into a sed script before iterating the files:
#!/usr/bin/env bash
declare -A _BLACKISH_REPLACEMENTS=(
['#242424']='#1C1C1C'
['#333333']='#292929'
['#999999']='#787878'
['#555555']='#4C4C4C'
['#373737']='#2E2E2E'
['#434343']='#383838'
['#3E3E3E']='#333333'
['#383838']='#2E2E2E'
['#313131']='#262626'
['#101010']='#101010'
['#3B3B3B']='#303030'
['#2A2A2A']='#1F1F1F'
['#656565']='#575757'
['#767676']='#5E5E5E'
['#868686']='#787878'
['#636363']='#595959'
['#696969']='#5E5E5E'
['#707070']='#666666'
['#767676']='#6B6B6B'
['#C1C1C1']='#B8B8B8'
['#C6C6C6']='#BDBDBD'
)
sed_script=
for k in "${!_BLACKISH_REPLACEMENTS[@]}"; do
v="${_BLACKISH_REPLACEMENTS[$k]}"
sed_script+="s/$k/$v/g;"
done
shopt -s globstar nullglob
for file in "$1"/**/*.scss; do
sed -i.bak -e "$sed_script" "$file"
done
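One caveat with building the sed script from the keys of an associative array: bash does not guarantee any particular key order, and some of these replacements chain (for example #3E3E3E becomes #333333, which another rule turns into #292929). A minimal sketch, assuming an ordered list of "old new" pairs is acceptable in place of the associative array, keeps the order explicit. It uses only a subset of the colors, passes one -e per rule, and filters a made-up CSS line on stdout instead of editing files in place.

```shell
#!/usr/bin/env bash
# Ordered "old new" pairs (a subset of the question's colors). An indexed
# array preserves this order, unlike the keys of an associative array.
pairs=(
  '#333333 #292929'
  '#3E3E3E #333333'
  '#383838 #2E2E2E'
  '#434343 #383838'
)

# One -e per rule; sed applies multiple -e scripts in the order given.
sed_args=()
for pair in "${pairs[@]}"; do
  read -r old new <<<"$pair"
  sed_args+=(-e "s/$old/$new/g")
done

# Hypothetical input: #333333 is rewritten before #3E3E3E becomes #333333,
# so the new #333333 is not turned into #292929 as well.
result=$(printf '%s\n' 'a { color: #3E3E3E; border: #333333; }' | sed "${sed_args[@]}")
echo "$result"   # a { color: #333333; border: #292929; }
```

If the rules were applied in the opposite order, both colors would end up as #292929.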
Now as a more practical, POSIX-shell-friendly one-liner:
find ./ -type f -name '*.scss' -exec sed -i.bak -e 's/#242424/#1C1C1C/g;s/#696969/#5E5E5E/g;s/#555555/#4C4C4C/g;s/#767676/#6B6B6B/g;s/#868686/#787878/g;s/#383838/#2E2E2E/g;s/#636363/#595959/g;s/#101010/#101010/g;s/#373737/#2E2E2E/g;s/#C6C6C6/#BDBDBD/g;s/#313131/#262626/g;s/#333333/#292929/g;s/#C1C1C1/#B8B8B8/g;s/#707070/#666666/g;s/#434343/#383838/g;s/#3E3E3E/#333333/g;s/#3B3B3B/#303030/g;s/#999999/#787878/g;s/#656565/#575757/g;s/#2A2A2A/#1F1F1F/g;' {} \;
To clarify the point of all the above:
Since you are unsure which shell runs your makepkg, the safe route is to write the most portable shell code, sticking to POSIX shell grammar and common tools and options.
Rather than a somewhat over-engineered associative array, the sed replacement instructions can be laid out just as clearly as a plain string:
#!/usr/bin/env sh
# A plain string of sed replacement instructions
# is as compact and more portable than an associative array.
# It also saves from looping over each entry.
_BLACKISH_REPLACEMENTS='
s/#242424/#1C1C1C/g;
s/#696969/#5E5E5E/g;
s/#555555/#4C4C4C/g;
s/#767676/#6B6B6B/g;
s/#868686/#787878/g;
s/#383838/#2E2E2E/g;
s/#636363/#595959/g;
s/#101010/#101010/g;
s/#373737/#2E2E2E/g;
s/#C6C6C6/#BDBDBD/g;
s/#313131/#262626/g;
s/#333333/#292929/g;
s/#C1C1C1/#B8B8B8/g;
s/#707070/#666666/g;
s/#434343/#383838/g;
s/#3E3E3E/#333333/g;
s/#3B3B3B/#303030/g;
s/#999999/#787878/g;
s/#656565/#575757/g;
s/#2A2A2A/#1F1F1F/g;
'
_blackish_replace() {
# Instead of iterating a bash specific globstar,
# find -exec can replace it while sticking to the most
# genuine POSIX-shell grammar.
# find and sed are used with widely supported options; sed -i is not POSIX,
# but the attached-suffix form -i.bak works with both GNU and BSD sed.
find "$1" -type f -name '*.scss' -exec \
sed -i.bak -e "$_BLACKISH_REPLACEMENTS" {} \;
}

Using Zsh and QPDF to decrypt multiple PDFs

From this answer https://stackoverflow.com/a/59688271/7577919 I was able to decrypt multiple PDFs in place using this bash script:
temp=`ls`;
for each in $temp;
do qpdf --decrypt --replace-input $each;
done
However, I had initially attempted to do this in Zsh (as it's encouraged in MacOS 10.15 Catalina), but was unable. It gave an error of output: File name too long
What is the difference between the for loops in Bash and Zsh and how would I go about writing a proper Zsh script?
There is no difference in the for-loop, but in the way variables are expanded. Consider this program:
x='a b'
for v in $x
do
echo $v
done
In bash, $x would word-split into 2 arguments, and hence the loop would be executed twice, once for a and once for b. In zsh, $x would not undergo word-splitting and the loop would be executed once, for the value a b. This difference shows up everywhere a parameter is expanded.
In your case, the loop is executed once, with each holding the complete output of the ls statement.
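The bash side of that comparison can be checked with a short sketch that just counts loop iterations. In zsh, an unquoted $x behaves like the quoted form below unless the SH_WORD_SPLIT option is set (or splitting is requested explicitly with ${=x}).

```shell
#!/usr/bin/env bash
x='a b'

# Unquoted: bash splits $x on IFS (space, tab, newline), so the loop runs twice.
unquoted=0
for v in $x; do
  unquoted=$((unquoted + 1))
done

# Quoted: no word splitting, so the loop runs once with v='a b'.
quoted=0
for v in "$x"; do
  quoted=$((quoted + 1))
done

echo "unquoted: $unquoted, quoted: $quoted"   # unquoted: 2, quoted: 1
```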
Of course in your case, it would be simpler in zsh to write the loop as
for each in *(N)
but if you really need a variable, I would use an array:
temp=(*(N))
The N qualifier after the wildcard makes the pattern expand to nothing (an empty list) instead of producing a "no matches found" error if there are no files.
If you also want to catch the dot-files (similar to what a ls -A would do), use (ND) instead.
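For comparison, a rough bash counterpart of temp=(*(N)) uses the nullglob option and an array, which also keeps file names with spaces intact. This is a sketch only: the PDFs are empty placeholder files in a scratch directory, and the qpdf call is replaced by printf so it runs without qpdf installed.

```shell
#!/usr/bin/env bash
shopt -s nullglob   # like zsh's (N): no match expands to an empty list

orig_dir=$PWD
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch 'report one.pdf' scan.pdf   # placeholder files for the demo

temp=(*.pdf)
for each in "${temp[@]}"; do
  # Real use: qpdf --decrypt --replace-input "$each"
  printf 'would decrypt: %s\n' "$each"
done

cd "$orig_dir" || exit 1
rm -r "$demo_dir"
```

Quoting "${temp[@]}" is what makes bash behave like the zsh version here; an unquoted ${temp[@]} would split "report one.pdf" into two words.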
Since parsing the output of ls is discouraged, it is better not to use ls to get the PDF filenames. Instead, you can use find:
find . -name '*.pdf' -exec qpdf --decrypt --replace-input "{}" \;
To limit the search to PDFs in the current directory, add -maxdepth 1 to the find command.

bash script to check for folders containing two specific files recursively and print their path

I want to check recursively for two specific files say "hem" and "haw" and print the folders containing both the files.
find <top_folder> -name hem -o -name haw -print
or
cd <top_folder>
ls **/hem **/haw
Try this Shellcheck-clean code:
shopt -s globstar
for hempath in ./**/hem ; do
dir=${hempath%/*}
[[ -e $dir/haw ]] && printf '%s\n' "$dir"
done
See glob - Greg's Wiki for information about globstar and the ** pattern.
See Removing part of a string (BashFAQ/100 (How do I do string manipulation in bash?)) for an explanation of ${hempath%/*}.
The code uses ./**/hem instead of **/hem to ensure that all the matched paths start with ./ so it works even if the files are in the current directory (.).
See the accepted, and excellent, answer to Why is printf better than echo? to understand why printf is used instead of echo to print the directory path.
Note that support for globstar was introduced in Bash 4.0, and it was dangerous in versions prior to 4.3 because it used to follow symlinks, possibly leading to infinite recursion (and failure) or unwanted duplicates. See When does globstar descend into symlinked directories?.
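To check the behaviour of the globstar loop, including a hem/haw pair directly in the starting directory, it can be run against a small throwaway tree. The layout below is invented for the demo, and the sketch needs bash 4.0 or later for globstar (not the stock /bin/bash 3.2 on macOS).

```shell
#!/usr/bin/env bash
shopt -s globstar

orig_dir=$PWD
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
# Invented layout: . and a/b contain both files, c contains only hem.
mkdir -p a/b c
touch hem haw a/b/hem a/b/haw c/hem

found=$(
  for hempath in ./**/hem; do
    dir=${hempath%/*}
    [[ -e $dir/haw ]] && printf '%s\n' "$dir"
  done
)
printf '%s\n' "$found"   # prints ./a/b and then .

cd "$orig_dir" || exit 1
rm -r "$demo_dir"
```

The top-level match comes out as "." because ${hempath%/*} strips /hem from ./hem, which is why the ./ prefix on the pattern matters.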

Rename/move (mv) multiple files starting with name

I'm trying to rename multiple files that match a pattern in one directory.
Files:
stack_overflow_one.xml
stack_overflow_two.xml
stack_overflow_one.html
I would like to rename stack_overflow to heap_graph
heap_graph_one.xml
heap_graph_two.xml
heap_graph_one.html
I have tried the following:
Using rename:
rename stack_overflow heap_graph stack_overflow* # returns 'The syntax of the command is incorrect.'
Using for loop in Bash
# how can I write this in one line? I've tried putting it on one line, but that does not work either
for i in stack_overflow* do
mv "$i" "${i/stack_overflow/heap_graph}"
done
However, none of these are working.
What you have is a trivial syntax error in the for loop. The rest of your script should work fine without any problem.
for i in stack_overflow*; do
# ^^^ missing semi-colon
# The below condition to handle graceful loop termination when no files are found
[ -f "$i" ] || continue
mv "$i" "${i/stack_overflow/heap_graph}"
done
As noted by ghoti below, if you are in bash (the Bourne Again shell) rather than the POSIX Bourne shell (sh), for which the [ -f "$i" ] guard above is the portable approach, you can use a special globbing option so that you don't have to handle the case where the glob matches no files.
shopt -s nullglob
for i in stack_overflow*; do
mv "$i" "${i/stack_overflow/heap_graph}"
done
shopt -u nullglob
The -s option sets it and -u unsets it. See the GNU Bash manual for more on the shopt builtin.
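Note that ${i/stack_overflow/heap_graph} is itself a bash extension. A minimal sketch of a plain POSIX sh version, written as the one-liner the question asked about, uses the standard prefix-removal expansion ${i#stack_overflow} instead. The scratch directory from mktemp -d exists only for the demo (mktemp is widely available but not specified by POSIX).

```shell
#!/bin/sh
orig_dir=$PWD
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch stack_overflow_one.xml stack_overflow_two.xml stack_overflow_one.html

# One line: strip the "stack_overflow" prefix and prepend "heap_graph".
# The [ -e ] test skips the literal pattern when nothing matches.
for i in stack_overflow*; do [ -e "$i" ] || continue; mv -- "$i" "heap_graph${i#stack_overflow}"; done

result=$(printf '%s\n' *)
printf '%s\n' "$result"   # heap_graph_one.html, heap_graph_one.xml, heap_graph_two.xml, one per line

cd "$orig_dir" || exit 1
rm -r "$demo_dir"
```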
The semi-common utility mmv is useful for the "multi-move" case you've described.
$ mmv -n 'stack_overflow_*' 'heap_graph__#' # remove -n after testing
mv -- stack_overflow_one.html heap_graph_one.html
mv -- stack_overflow_one.xml heap_graph_one.xml
mv -- stack_overflow_two.xml heap_graph_two.xml
As you can see, it's just calling down to mv multiple times for the pattern(s) matched.
The * is a wildcard like Bash uses, but it's quote-escaped to be passed to mmv. The _# is the reference to the match, also escaped (though the docs suggest #1 would work instead).
This family of commands is also handy for copying (cp) and linking (ln).
If you happen to have Zsh (a common Bash upgrade/replacement), you've already got zmv (load it first with autoload -U zmv), which would similarly do the job with:
% zmv -n 'stack_overflow_(*)' 'heap_graph_$1'

Difference between using ls and find to loop over files in a bash script

I'm not sure I understand exactly why:
for f in `find . -name "strain_flame_00*.dat"`; do
echo $f
mybase=`basename $f .dat`
echo $mybase
done
works and:
for f in `ls strain_flame_00*.dat`; do
echo $f
mybase=`basename $f .dat`
echo $mybase
done
does not, i.e. the filename does not get stripped of the suffix. I think it's because what comes out of ls is formatted differently but I'm not sure. I even tried to put eval in front of ls...
The correct way to iterate over filenames here would be
for f in strain_flame_00*.dat; do
echo "$f"
mybase=$(basename "$f" .dat)
echo "$mybase"
done
Using for with a glob pattern, and then quoting all references to the filename is the safest way to use filenames that may have whitespace.
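As a quick check of that advice, the sketch below runs the glob loop in a scratch directory with made-up file names, one of which contains a space. It also computes ${f%.dat}, a POSIX parameter expansion that strips the suffix without starting a basename process; that is an alternative, not something the answer itself uses.

```shell
#!/usr/bin/env bash
orig_dir=$PWD
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch strain_flame_001.dat 'strain_flame_002 copy.dat'   # hypothetical data files

bases=() alts=()
for f in strain_flame_00*.dat; do
  [ -e "$f" ] || continue          # skip the literal pattern if nothing matched
  mybase=$(basename "$f" .dat)     # external command, as in the answer
  alt=${f%.dat}                    # same result via parameter expansion
  printf '%s | %s\n' "$mybase" "$alt"
  bases+=("$mybase")
  alts+=("$alt")
done

cd "$orig_dir" || exit 1
rm -r "$demo_dir"
```

Both columns print strain_flame_001 and strain_flame_002 copy, with the space preserved because every expansion of $f is quoted.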
First of all, never parse the output of the ls command.
If you MUST use ls and you DON'T know what ls alias is out there, then do this:
(
COLUMNS=
LANG=
NLSPATH=
GLOBIGNORE=
LS_COLORS=
TZ=
unalias ls 2>/dev/null
unset -f ls
for f in `ls -1 strain_flame_00*.dat`; do
echo $f
mybase=`basename $f .dat`
echo $mybase
done
)
It is wrapped in parentheses (a subshell) to protect the existing environment, aliases and shell variables.
Various environment variables are NUKED (as ls does look those up).
One unalias command (self-explanatory).
One unset command (again, protection against an unscrupulous, overriding 'ls' function).
Now, you can see why NOT to use the 'ls'.
Another difference that hasn't been mentioned yet is that find searches recursively by default, whereas ls does not (although both can be switched between recursive and non-recursive behaviour through options, and find can be told to recurse only up to a specified depth).
And, as others have mentioned, if it can be achieved by globbing, you should avoid using either.
