Function to grep files containing arbitrary number of strings (boolean and) - bash

I'm hoping to write a bash script to grep files matching several strings.
I've found the 'hard-wired' solution here: (Source: Find files containing multiple strings):
find . -type f -exec grep -l 'string1' {} \; | xargs grep -l 'string2' | xargs grep -l 'string3' | xargs grep -l 'string4'
Would return all files containing string1 && string2 && string3 && string4.
How can I convert this into a bash shell function fn that takes in an arbitrary number of arguments? I'd like fn string1 string2 string3 string4 ... to give me identical results. Presumably this would involve looping through arguments and piping results to successive commands xargs grep -l ${!i}

If your grep supports -P (PCRE) option, how about:
# grep files containing arbitrary number of strings
fn() {
local dir=$1 # directory to search
shift
local -a patterns=("$@") # list of target strings
local i # local variable
local pat="(?s)" # single mode makes dot match a newline
for i in "${patterns[@]}"; do
pat+="$(printf "(?=.*\\\b%s\\\b)" "$i")"
done
find "$dir" -type f -exec grep -zlP "$pat" {} \;
}
# example of usage
fn . string1 string2 string3
If the passed word list is word1 word2, it generates the regex pattern
(?s)(?=.*\bword1\b)(?=.*\bword2\b), which matches a file containing
both word1 and word2 in any order.
(?s) enables "single-line" mode, which makes a dot match any character, including
a newline.
-z option to grep sets the input record separator to a null character.
Then the whole file is treated as a single line.
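The pattern above inserts each argument into the regex unescaped, so a target such as a.b would also match axb. Here is a minimal sketch of one way around that, assuming PCRE's \Q...\E literal quoting and dropping the \b word boundaries. build_pattern is a hypothetical helper name, not part of the answer above:

```shell
# Hypothetical helper: build a PCRE that requires every argument to occur
# somewhere in the input, treating each argument as a literal string.
build_pattern() {
    local pat='(?s)' word           # (?s): let the dot match newlines too
    for word in "$@"; do
        pat+="(?=.*\\Q${word}\\E)"  # one lookahead per target string
    done
    printf '%s\n' "$pat"
}

# Print the pattern generated for two targets.
build_pattern 'a.b' 'word2'
```

The result could be passed to grep in the same way, e.g. grep -zlP "$(build_pattern string1 string2)" file. A target that itself contains \E would end the quoting early, so this is only a sketch.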
If grep -P is not available, here is an alternative using a loop:
fn() {
local dir=$1 # directory to search
shift
local -a patterns=("$@") # list of target strings
local i f fail # local variables
while IFS= read -rd "" f; do # loop over the files fed by "find"
fail=0 # flag to indicate match fails
for i in "${patterns[@]}"; do # loop over the target strings
grep -q "$i" "$f" || { fail=1; break; }
# if not matched, set the flag and exit the loop
done
(( fail == 0 )) && echo "$f" # if all matched, print the filename
done < <(find "$dir" -type f -print0)
}
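The chained xargs grep -l idea from the question can also be kept, by narrowing a list of candidate files once per string. The following is a rough sketch under some assumptions: grep_all is a hypothetical name, the targets are treated as fixed strings (-F), and GNU grep's -Z option (NUL after each file name) is available.

```shell
# Hypothetical helper: list files under DIR that contain every given string.
# Assumes GNU grep (-Z prints a NUL after each file name).
grep_all() {
    local dir=$1; shift
    local -a files=() next=()
    local pat f
    # Start from every regular file below the directory.
    while IFS= read -r -d '' f; do
        files+=("$f")
    done < <(find "$dir" -type f -print0)
    # Narrow the candidate list once per string (boolean AND).
    for pat in "$@"; do
        (( ${#files[@]} )) || break
        next=()
        while IFS= read -r -d '' f; do
            next+=("$f")
        done < <(grep -lZF -e "$pat" -- "${files[@]}")
        files=("${next[@]}")
    done
    if (( ${#files[@]} )); then
        printf '%s\n' "${files[@]}"
    fi
}
```

It would be called like the functions above, e.g. grep_all . string1 string2 string3. Because all candidate names are passed to a single grep call, a very large tree could exceed the command-line length limit; the per-file loop does not have that problem.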

Related

How to concatenate a list of folder paths within a variable that have spaces in them in shell script [duplicate]

I want to iterate over a list of files. This list is the result of a find command, so I came up with:
getlist() {
for f in $(find . -iname "foo*")
do
echo "File found: $f"
# do something useful
done
}
It's fine except if a file has spaces in its name:
$ ls
foo_bar_baz.txt
foo bar baz.txt
$ getlist
File found: foo_bar_baz.txt
File found: foo
File found: bar
File found: baz.txt
What can I do to avoid the split on spaces?
You could replace the word-based iteration with a line-based one:
find . -iname "foo*" | while read f
do
# ... loop body
done
There are several workable ways to accomplish this.
If you wanted to stick closely to your original version it could be done this way:
getlist() {
IFS=$'\n'
for file in $(find . -iname 'foo*') ; do
printf 'File found: %s\n' "$file"
done
}
This will still fail if file names have literal newlines in them, but spaces will not break it.
However, messing with IFS isn't necessary. Here's my preferred way to do this:
getlist() {
while IFS= read -d $'\0' -r file ; do
printf 'File found: %s\n' "$file"
done < <(find . -iname 'foo*' -print0)
}
If you find the < <(command) syntax unfamiliar you should read about process substitution. The advantage of this over for file in $(find ...) is that files with spaces, newlines and other characters are correctly handled. This works because find with -print0 will use a null (aka \0) as the terminator for each file name and, unlike newline, null is not a legal character in a file name.
The advantage to this over the nearly-equivalent version
getlist() {
find . -iname 'foo*' -print0 | while read -d $'\0' -r file ; do
printf 'File found: %s\n' "$file"
done
}
is that any variable assignment in the body of the while loop is preserved. If you pipe to while as above, the body of the while runs in a subshell, so its variable assignments are lost when the loop ends, which may not be what you want.
The advantage of the process substitution version over find ... -print0 | xargs -0 is minimal: The xargs version is fine if all you need is to print a line or perform a single operation on the file, but if you need to perform multiple steps the loop version is easier.
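To make the subshell point concrete, here is a small sketch that counts three NUL-terminated items both ways. The function names are made up for illustration, and it assumes bash's default settings (lastpipe not enabled):

```shell
# Count items with a pipe: the while loop runs in a subshell,
# so the increments never reach the function's own n.
count_with_pipe() {
    local n=0 item
    printf '%s\0' a b c | while IFS= read -r -d '' item; do
        n=$((n + 1))
    done
    echo "$n"
}

# Count items with process substitution: the loop runs in the
# current shell, so n keeps its final value.
count_with_procsub() {
    local n=0 item
    while IFS= read -r -d '' item; do
        n=$((n + 1))
    done < <(printf '%s\0' a b c)
    echo "$n"
}

echo "pipe: $(count_with_pipe)"                  # prints "pipe: 0"
echo "process substitution: $(count_with_procsub)"  # prints "process substitution: 3"
```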
EDIT: Here's a nice test script so you can get an idea of the difference between different attempts at solving this problem
#!/usr/bin/env bash
dir=/tmp/getlist.test/
mkdir -p "$dir"
cd "$dir"
touch 'file not starting foo' foo foobar barfoo 'foo with spaces' \
'foo with'$'\n'newline 'foo with trailing whitespace '
# while with process substitution, null terminated, empty IFS
getlist0() {
while IFS= read -d $'\0' -r file ; do
printf 'File found: '"'%s'"'\n' "$file"
done < <(find . -iname 'foo*' -print0)
}
# while with process substitution, null terminated, default IFS
getlist1() {
while read -d $'\0' -r file ; do
printf 'File found: '"'%s'"'\n' "$file"
done < <(find . -iname 'foo*' -print0)
}
# pipe to while, newline terminated
getlist2() {
find . -iname 'foo*' | while read -r file ; do
printf 'File found: '"'%s'"'\n' "$file"
done
}
# pipe to while, null terminated
getlist3() {
find . -iname 'foo*' -print0 | while read -d $'\0' -r file ; do
printf 'File found: '"'%s'"'\n' "$file"
done
}
# for loop over subshell results, newline terminated, default IFS
getlist4() {
for file in "$(find . -iname 'foo*')" ; do
printf 'File found: '"'%s'"'\n' "$file"
done
}
# for loop over subshell results, newline terminated, newline IFS
getlist5() {
IFS=$'\n'
for file in $(find . -iname 'foo*') ; do
printf 'File found: '"'%s'"'\n' "$file"
done
}
# see how they run
for n in {0..5} ; do
printf '\n\ngetlist%d:\n' $n
eval getlist$n
done
rm -rf "$dir"
There is also a very simple solution: rely on bash globbing
$ mkdir test
$ cd test
$ touch "stupid file1"
$ touch "stupid file2"
$ touch "stupid file 3"
$ ls
stupid file 3 stupid file1 stupid file2
$ for file in *; do echo "file: '${file}'"; done
file: 'stupid file 3'
file: 'stupid file1'
file: 'stupid file2'
Note that I am not sure this behavior is the default one but I don't see any special setting in my shopt so I would go and say that it should be "safe" (tested on osx and ubuntu).
find . -iname "foo*" -print0 | xargs -L1 -0 echo "File found:"
find . -name "fo*" -print0 | xargs -0 ls -l
See man xargs.
Since you aren't doing any other type of filtering with find, you can use the following as of bash 4.0:
shopt -s globstar
getlist() {
for f in **/foo*
do
echo "File found: $f"
# do something useful
done
}
The **/ will match zero or more directories, so the full pattern will match foo* in the current directory or any subdirectories.
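One detail the globstar version leaves open is what happens when nothing matches: without nullglob the loop runs once with the literal pattern. A possible sketch that handles this, with getlist_glob as a hypothetical name and the directory passed as an argument:

```shell
# Hypothetical variant: list foo* entries below DIR using only globbing.
# Requires bash 4.0+ for globstar.
getlist_glob() (
    # The ( ) body runs in a subshell, so cd and shopt stay local.
    cd -- "$1" || exit 1
    shopt -s globstar nullglob
    for f in **/foo*; do
        printf 'File found: %s\n' "$f"
    done
)
```

Calling getlist_glob . prints one line per match, relative to the directory, and prints nothing when there is no match. Unlike find -iname, the glob is case-sensitive and skips hidden directories by default.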
I really like for loops and array iteration, so I figure I will add this answer to the mix...
I also liked marchelbling's stupid file example. :)
$ mkdir test
$ cd test
$ touch "stupid file1"
$ touch "stupid file2"
$ touch "stupid file 3"
Inside the test directory:
readarray -t arr <<< "`ls -A1`"
This adds each file listing line into a bash array named arr with any trailing newline removed.
Let's say we want to give these files better names...
for i in ${!arr[@]}
do
newname=`echo "${arr[$i]}" | sed 's/stupid/smarter/; s/ \{1,\}/_/g'`;
mv "${arr[$i]}" "$newname"
done
${!arr[@]} expands to 0 1 2 so "${arr[$i]}" is the ith element of the array. The quotes around the variables are important to preserve the spaces.
The result is three renamed files:
$ ls -1
smarter_file1
smarter_file2
smarter_file_3
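The same rename can be done without parsing ls output, which breaks on names containing newlines. This is only a sketch: rename_smarter is a hypothetical name, and unlike the sed version it turns every single space into an underscore rather than collapsing runs of spaces.

```shell
# Hypothetical helper: rename every entry in DIR, replacing "stupid" with
# "smarter" and each space with an underscore, without parsing ls.
rename_smarter() (
    cd -- "$1" || exit 1          # subshell body: cd and shopt stay local
    shopt -s nullglob
    for f in *; do
        new=${f/stupid/smarter}   # first occurrence only, like sed without g
        new=${new// /_}           # every space becomes an underscore
        [[ $new == "$f" ]] || mv -n -- "$f" "$new"
    done
)
```

Inside the test directory it would be called as rename_smarter . and should give the same three names as above.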
find has an -exec argument that loops over the find results and executes an arbitrary command. For example:
find . -iname "foo*" -exec echo "File found: {}" \;
Here {} represents the found files, and wrapping it in "" allows for the resultant shell command to deal with spaces in the file name.
In many cases you can replace that last \; (which starts a new command) with a \+, which will put multiple files in the one command (not necessarily all of them at once though, see man find for more details).
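When more than one step per file is needed, a common pattern is to hand the names to a small inline shell script and use {} only as a separate argument. POSIX leaves it implementation-defined whether {} embedded in a longer argument such as "File found: {}" is replaced, so this form is more portable. A sketch, with getlist_exec as a hypothetical name:

```shell
# Hypothetical helper: print every foo* entry below DIR.
# find passes the names in batches as arguments to the inline sh script.
getlist_exec() {
    find "$1" -iname 'foo*' -exec sh -c '
        for f in "$@"; do
            printf "File found: %s\n" "$f"
        done
    ' sh {} +
}
```

The second sh fills $0 of the inline script, so the file names start at $1. Names with spaces or newlines arrive intact because they are never re-parsed by a shell.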
I recently had to deal with a similar case, and I built a FILES array to iterate over the filenames:
eval FILES=($(find . -iname "foo*" -printf '"%p" '))
The idea here is to surround each filename with double quotes, separate them with spaces and use the result to initialize the FILES array.
The use of eval is necessary to evaluate the double quotes in the find output correctly for the array initialization.
To iterate over the files, just do:
for f in "${FILES[@]}"; do
# Do something with $f
done
In some cases, for example if you just need to copy or move a list of files, you could pipe that list to awk as well.
Note the escaped quotes \" ... \" around the field $0 (each line of the list is one file name); they keep names with spaces intact.
find . -iname "foo*" | awk '{print "mv \""$0"\" ./MyDir2" | "sh" }'
Ok - my first post on Stack Overflow!
Though my problems with this have always been in csh not bash the solution I present will, I'm sure, work in both. The issue is with the shell's interpretation of the "ls" returns. We can remove "ls" from the problem by simply using the shell expansion of the * wildcard - but this gives a "no match" error if there are no files in the current (or specified folder) - to get around this we simply extend the expansion to include dot-files thus: * .* - this will always yield results since the files . and .. will always be present. So in csh we can use this construct ...
foreach file (* .*)
echo $file
end
if you want to filter out the standard dot-files then that is easy enough ...
foreach file (* .*)
if ("$file" == .) continue
if ("$file" == ..) continue
echo $file
end
The code in the first post on this thread would be written thus:-
getlist() {
for f in * .*
do
echo "File found: $f"
# do something useful
done
}
Hope this helps!
Another solution for the job.
The goal was to:
select/filter filenames recursively in directories
handle each name (whatever spaces the path contains)
#!/bin/bash -e
## #Trick in order handle File with space in their path...
OLD_IFS=${IFS}
IFS=$'\n'
files=($(find "${INPUT_DIR}" -type f -name "*.md"))
for filename in "${files[@]}"
do
# do your stuff
# ....
done
IFS=${OLD_IFS}

Bash Script to Prepend a Single Random Character to All Files In a Folder

I have an audio sample library with thousands of files. I would like to shuffle/randomize the order of these files. Can someone provide me with a bash script/line that would prepend a single random character to all files in a folder (including files in sub-folders). I do not want to prepend a random character to any of the folder names though.
Example:
Kickdrum73.wav
Kickdrum SUB.wav
Kick808.mp3
Renamed to:
f_Kickdrum73.wav
!_Kickdrum SUB.wav
4_Kick808.mp3
If possible, I would like to be able to run this script more than once, but on subsequent runs, it just changes the randomly prepended character instead of prepending a new one.
Some of my attempts:
find ~/Desktop/test -type f -print0 | xargs -0 -n1 bash -c 'mv "$0" "a${0}"'
find ~/Desktop/test/ -type f -exec mv -v {} $(cat a {}) \;
find ~/Desktop/test/ -type f -exec echo -e "Z\n$(cat !)" > !Hat 15.wav
for file in *; do
mv -v "$file" $RANDOM_"$file"
done
Note: I am running on macOS.
Latest attempt using code from mr. fixit:
find . -type f -maxdepth 999 -not -name ".*" |
cut -c 3- - |
while read F; do
randomCharacter="${F:2:1}"
if [ $randomCharacter == '_' ]; then
new="${F:1}"
else
new="_$F"
fi
fileName="`basename $new`"
newFilename="`jot -r -c $fileName 1 A Z`"
filePath="`dirname $new`"
newFilePath="$filePath$newFilename"
mv -v "$F" "$newFilePath"
done
Here's my first answer, enhanced to do sub-directories.
Put the following in file randomize
if [[ $# != 1 || ! -d "$1" ]]; then
echo "usage: $0 <path>"
else
find "$1" -type f -not -name ".*" |
while read F; do
FDIR=`dirname "$F"`
FNAME=`basename "$F"`
char2="${FNAME:1:1}"
if [ "$char2" == '_' ]; then
new="${FNAME:1}"
else
new="_$FNAME"
fi
new=`jot -r -w "%c$new" 1 A Z`
echo mv "$F" "${FDIR}/${new}"
done
fi
Set the permissions with chmod a+x randomize.
Then call it with randomize your/path.
It'll echo the commands required to rename everything, so you can examine them to ensure they'll work for you. If they look right, you can remove the echo from the 3rd to last line and rerun the script.
cd ~/Desktop/test, then
find . -type f -maxdepth 1 -not -name ".*" |
cut -c 3- - |
while read F; do
char2="${F:1:1}"
if [ "$char2" == '_' ]; then
new="${F:1}"
else
new="_$F"
fi
new=`jot -r -w "%c$new" 1 A Z`
mv "$F" "$new"
done
find . -type f -maxdepth 1 -not -name ".*" will get all the files in the current directory, but not the hidden files (names starting with '.')
cut -c 3- - will strip the first 2 chars from the name. find outputs paths, and the ./ gets in the way of processing prefixes.
while read VAR; do <stuff>; done is a way to deal with one line at a time
char2="${VAR:1:1}" sets a variable char2 to the 2nd character of the variable VAR (offsets are zero-based).
if - then - else sets new to the filename, either preceded by _ or with the previous random character stripped off.
jot -r -w "%c$new" 1 A Z tacks random 1 character from A-Z onto the beginning of new
mv old new renames the file
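jot ships with macOS but not with most Linux systems, so here is a sketch of the same prefix logic in plain bash. new_name is a hypothetical helper, and like the script above it assumes that any name whose second character is _ already carries a random prefix:

```shell
# Hypothetical helper: print NAME with a fresh random "<letter>_" prefix.
# Assumes a name whose second character is "_" already carries such a prefix.
new_name() {
    local name=$1 letters=ABCDEFGHIJKLMNOPQRSTUVWXYZ i
    # Drop an earlier prefix so reruns replace it instead of stacking.
    if [[ ${name:1:1} == _ ]]; then
        name=${name:2}
    fi
    i=$(( RANDOM % ${#letters} ))   # index 0..25 into the letter list
    printf '%s_%s\n' "${letters:i:1}" "$name"
}
```

Inside the loop it could be used as mv "$F" "$(new_name "$F")". A file that was originally named something like a_song.wav would be mistaken for an already prefixed file, which is a limitation of this naming scheme rather than of the helper.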
You can also do it all in bash and there are several ways to approach it. The first is simply creating an array of letters containing whatever letters you want to use as a prefix and then generating a random number to use to choose the element of the array, e.g.
#!/bin/bash
letters=({0..9} {A..Z} {a..z}) ## array with [0-9] [A-Z] [a-z]
for i in *; do
num=$(($RANDOM % 62)) ## generate index 0-61 (the array has 62 elements)
## remove echo to actually move file
echo "mv \"$i\" \"${letters[num]}_$i\"" ## move file
done
Example Use/Output
Currently the script only prints the changes it would make; to apply them, remove the echo "..." surrounding the mv command and un-escape the quotes:
$ bash ../randprefix.sh
mv "Kick808.mp3" "4_Kick808.mp3"
mv "Kickdrum SUB.wav" "h_Kickdrum SUB.wav"
mv "Kickdrum73.wav" "l_Kickdrum73.wav"
You can also do it by generating a random number for an ASCII character from 48 (character '0') through 126 (character '~'), excluding the backtick, then converting that number to the character and prefixing the filename with it, e.g.
#!/bin/bash
for i in *; do
num=$((($RANDOM % 79) + 48)) ## generate number for '0' - '~'
letter=$(printf "\\$(printf '%03o' "$num")") ## letter from number
while [ "$letter" = '`' ]; do ## exclude '`'
num=$((($RANDOM % 79) + 48)) ## generate number
letter=$(printf "\\$(printf '%03o' "$num")")
done
## remove echo to actually move file
echo "mv \"$i\" \"${letter}_$i\"" ## move file
done
(similar output, all punctuation other than backtick is possible)
In each case you will want to place the script in your path or call it from within the directory containing the files you want to rename. (You can split dirname and basename and join them back together to make the script callable with the directory to search passed as an argument; that is left to you.)

Update numbers in filenames

I have a set of filenames which are ordered numerically like:
13B12363_1B1_0.png
13B12363_1B1_1.png
13B12363_1B1_2.png
13B12363_1B1_3.png
13B12363_1B1_4.png
13B12363_1B1_5.png
13B12363_1B1_6.png
13B12363_1B1_7.png
13B12363_1B1_8.png
13B12363_1B1_9.png
13B12363_1B1_10.png
[...]
13B12363_1B1_495.png
13B12363_1B1_496.png
13B12363_1B1_497.png
13B12363_1B1_498.png
13B12363_1B1_499.png
After some postprocessing, I removed some files and I would like to update the ordering number and replace the actual number by its new position. Looking at this previous question I end up doing something like:
(1) ls -v | cat -n | while read n f; do mv -i $f ${f%%[0-9]+.png}_$n.png; done
However, this command does not recognize the "ordering number + png" and just appends the new number at the end of the filename. Something like 13B12363_1B1_10.png_9.png
On the other hand, if I do:
(2) ls -v * | cat -n | while read n f; do mv $f ${f%.*}_$n.png; done
The ordering number is added without issues. Like 13B12363_1B1_10_9.png
So, for (1) it seems I am not specifying the digit correctly but I am not able to find the correct syntax. So far I tried [0-9], [0-9]+, [[:digits:]] and [[:digits:]]+. Which should be the proper one?
Additionally, in (2) I am wondering how I should specify rename (CentOS version) to remove the numbers between the second and the third underscore. Here I have to say that I have some filenames like 20B12363_22_10_9.png, so I should somehow specify second and third underscore.
Using Bash's built-in regex matching ([[ =~ ]], which takes POSIX extended regular expressions) and a null-delimited list of files.
Tested with sample
#!/usr/bin/env bash
prename=$1
# Bash setting to return empty result if no match found
shopt -s nullglob
# Create a temporary directory to prevent file rename collisions
tmpdir=$(mktemp -d) || exit 1
# Add a trap to remove the temporary directory on EXIT
trap 'rmdir -- "$tmpdir"' EXIT
# Initialize file counter
n=0
# Generate null delimited list of files
printf -- %s\\0 "${prename}_"*'.png' |
# Sort the null delimited list on 3rd field numeric order with _ separator
sort --zero-terminated --field-separator=_ --key=3n |
# Iterate the null delimited list
while IFS= read -r -d '' f; do
# If Bash Regex match the file name AND
# file has a different sequence number
if [[ "$f" =~ (.*)_([0-9]+)\.png$ ]] && [[ ${BASH_REMATCH[2]} -ne $n ]]; then
# Use captured Regex match group 1 to rename file with incrementing counter
# and move it to the temporary folder to prevent rename collision with
# existing file
echo mv -- "$f" "$tmpdir/${BASH_REMATCH[1]}_$((n)).png"
fi
# Increment file counter
n=$((n+1))
done
# Move back the renamed files in place
mv --no-clobber -- "$tmpdir"/* ./
# $tmpdir removal is automatic on EXIT
# If something goes wrong, some files remain in it and it is not deleted
# so these can be dealt with manually
Remove the echo if the result matches your expectations.
Output from the sample
mv -- 13B12363_1B1_495.png /tmp/tmp.O2HmbyD7d5/13B12363_1B1_11.png
mv -- 13B12363_1B1_496.png /tmp/tmp.O2HmbyD7d5/13B12363_1B1_12.png
mv -- 13B12363_1B1_497.png /tmp/tmp.O2HmbyD7d5/13B12363_1B1_13.png
mv -- 13B12363_1B1_498.png /tmp/tmp.O2HmbyD7d5/13B12363_1B1_14.png
mv -- 13B12363_1B1_499.png /tmp/tmp.O2HmbyD7d5/13B12363_1B1_15.png
Do not parse ls.
read interprets \ and splits on IFS unless you use IFS= read -r; see the BashFAQ entry on how to read a stream line by line.
In the ${f%%pattern} expansion, the pattern is a glob, not a regex, and the rules differ: + means a literal +.
You could shopt -s extglob and then use ${f%%+([0-9]).png}. Or write a loop. Or match the _ too and do f=${f%%.png}; f="${f%_[0-9]*}_".
Or something along (untested):
find . -maxdepth 1 -mindepth 1 -type f -name '13B12363_1B1_*.png' |
sort -t_ -n -k3 |
sed 's/\(.*_\)[0-9][0-9]*\.png$/&\t\1/' |
{
n=1;
while IFS=$'\t' read -r from to; do
echo mv "$from" "$to$((n++)).png";
done;
}
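The two parameter-expansion suggestions in this answer can be tried directly on the sample names. A small sketch, with hypothetical function names:

```shell
shopt -s extglob   # enables +(...) in patterns; bash only

# Hypothetical helpers: strip the trailing sequence number and ".png".
strip_seq_extglob() {
    # Longest suffix made of one or more digits followed by ".png".
    printf '%s\n' "${1%%+([0-9]).png}"
}
strip_seq_plain() {
    local f=${1%.png}              # drop the extension
    printf '%s\n' "${f%_[0-9]*}_"  # drop the shortest "_<digit>..." suffix
}

strip_seq_extglob 13B12363_1B1_10.png    # prints 13B12363_1B1_
strip_seq_plain 20B12363_22_10_9.png     # prints 20B12363_22_10_
```

Either result can then be combined with the new counter, e.g. "$(strip_seq_plain "$f")$n.png".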
Another alternative, with perl:
perl -e 'while(<@ARGV>){$o=$_;s/\d+(?=\D*$)/$i++.".renamed"/e;die if -e $_;rename $o,$_}while(<*.renamed>){$o=$_;s/\.renamed$//;die if -e $_;rename $o,$_}' $(ls -v|sed -E "s/$|^/'/g"|paste -sd ' ' -)
This solution avoids rename collisions by first renaming the files with an extra ".renamed" extension and then removing that extension as the last step. There are also checks to detect rename collisions.
Anyways, please backup your data before trying :)
The perl script unrolled and explained:
while(<@ARGV>){ # loop through arguments.
# filenames are passed to "$_" variable
# save old file name
$o=$_;
# if not using variable, regex replacement (s///) uses topic variable ($_)
# e flag ==> evals the replacement
s/\d+(?=\D*$)/$i++.".renamed"/e; # works on $_
# Detect rename collision
die if -e $_;
rename $o,$_
}
while(<*.renamed>){
$o=$_;
s/\.renamed$//; # remove .renamed extension
die if -e $_;
rename $o,$_
}
The regex:
\d+ # one number or more
(?=\D*$) # followed by 0 or more non-numbers and end of string
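The same two-phase idea (rename to a temporary ".renamed" name first, strip it afterwards) can be sketched in bash. This is an illustration under assumptions, not a drop-in replacement: renumber is a hypothetical name, numbering starts at 0, and GNU sort is assumed for -z (NUL-delimited) and -V (version sort, so _10 comes after _9).

```shell
# Hypothetical helper: renumber NAME_<n>.png files in DIR to 0, 1, 2, ...
# Assumes GNU sort (-z, -V). Runs in a subshell so cd/shopt stay local.
renumber() (
    shopt -s nullglob
    cd -- "$1" || exit 1
    files=( *_[0-9]*.png )
    (( ${#files[@]} )) || exit 0
    n=0
    # Phase 1: move each file to its new number plus a ".renamed" suffix,
    # so no rename can overwrite a file that is still waiting its turn.
    while IFS= read -r -d '' f; do
        mv -- "$f" "${f%_*}_${n}.png.renamed" || exit 1
        n=$((n + 1))
    done < <(printf '%s\0' "${files[@]}" | sort -zV)
    # Phase 2: strip the temporary suffix.
    for f in *.png.renamed; do
        mv -n -- "$f" "${f%.renamed}"
    done
)
```

As with the other answers, it is worth trying this on a copy of the data first.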

How can I pass multiple variables that are space separated strings to a function in bash?

Consider this mockup:
function test() {
for line in $1
do
echo $line
done
for line2 in $2
do
echo $line2
done
}
# This will give me a list of IDs
list=$(find testfolder/ -type f -exec grep "ID" {} + | sed "s/^.*ID:\ //g")
list2=$(find testfolder2/ -type f -exec grep "ID" {} + | sed "s/^.*ID:\ //g")
# this will not work
test list1 list2
# this will work
for line in $list
do
echo $line
done
for line2 in $list2
do
echo $line2
done
The problem with this is that the variables $1 and $2 in the function, will be (of course) the first two IDs that were retrieved in list.
Is there a way to pass list and list2 to the function and use them as I would in a non function call?
The problem with shell scripting and file names is that the shell splits unquoted expansions into tokens at spaces, tabs and newlines. The characters used for splitting are stored in the global variable IFS, which is short for input field separator. File names, however, may contain spaces and newlines, so if you do not quote them correctly, as in your question, the shell splits them.
Problem
Create some files with space:
$ touch a\ {1..3}
If you use a globbing pattern to iterate the files, everything is fine:
$ for f in a\ *; do echo ►$f◄; done
►a 1◄
►a 2◄
►a 3◄
But when you use a sub-shell, which echoes the file names, they get mangled:
$ for f in $(echo a\ *); do echo ►$f◄; done
►a◄
►1◄
►a◄
►2◄
►a◄
►3◄
The same happens, when you use find:
$ for f in $(find . -name 'a *'); do echo ►$f◄; done
►./a◄
►2◄
►./a◄
►3◄
►./a◄
►1◄
Solution
The best way to read a list of files is to delimit them with a character that cannot occur in a file name: the null character $'\0'. The program find has a special action called -print0 to print each file name with a trailing null character. And the Bash builtin mapfile (with -d, available since Bash 4.4) can read a list delimited with null characters:
$ mapfile -d $'\0' list < <(find . -name 'a *' -print0)
Now you can write a function, which needs a list of file names.
$ inode() { for f in "$@"; do stat -c %i "$f"; done; }
And pass the list of file names correctly quoted to the function.
$ inode "${list[@]}"
2638642
2638644
2638641
And this works even with newlines in the file name.
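The inode example passes a single list. For the original question, two separate lists, one option is to pass the array names and bind them with namerefs. This is a sketch that assumes bash 4.3 or later; print_lists, first and second are made-up names, and the caller's arrays must not be called first or second themselves.

```shell
# Hypothetical helper: receives the NAMES of two arrays, not their contents.
print_lists() {
    local -n first=$1 second=$2   # namerefs (bash 4.3+)
    local item
    for item in "${first[@]}"; do
        printf 'first: %s\n' "$item"
    done
    for item in "${second[@]}"; do
        printf 'second: %s\n' "$item"
    done
}

list1=('id 1' 'id 2')
list2=('id 3')
print_lists list1 list2
```

Each element stays intact, including embedded spaces, because the arrays are never flattened into a single string.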

Looping over filtered find and performing an operation

I have a garbage dump of a bunch of Wordpress files and I'm trying to convert them all to Markdown.
The script I wrote is:
htmlDocs=($(find . -print | grep -i '.*[.]html'))
for html in "${htmlDocs[@]}"
do
P_MD=${html}.markdown
echo "${html} \> ${P_MD}"
pandoc --ignore-args -r html -w markdown < "${html}" | awk 'NR > 130' | sed '/<div class="site-info">/,$d' > "${P_MD}"
done
As far as I understand, the first line should be making an array of all html files in all subdirectories, then the for loop has a line to create a variable with the Markdown name (followed by a debugging echo), then the actual pandoc command to do the conversion.
One at a time, this command works.
However, when I try to execute it, OSX gives me:
$ ./pandoc_convert.command
./pandoc_convert.command: line 1: : No such file or directory
./pandoc_convert.command: line 1: : No such file or directory
o_0
Help?
There may be many reasons why the script fails, because the way you create the array is incorrect:
htmlDocs=($(find . -print | grep -i '.*[.]html'))
Arrays are assigned in the form: NAME=(VALUE1 VALUE2 ... ), where NAME is the name of the variable, VALUE1, VALUE2, and the rest are fields separated with characters that are present in the $IFS (input field separator) variable. Suppose you find a file name with spaces. Then the expression will create separate items in the array.
Another issue is that the expression doesn't handle globbing, i.e. file name generation based on the shell expansion of special characters such as *:
mkdir dir.html
touch \ *.html
touch a\ b\ c.html
a=($(find . -print | grep -i '.*[.]html'))
for html in "${a[@]}"; do echo ">>>${html}<<<"; done
Output
>>>./a<<<
>>>b<<<
>>>c.html<<<
>>>./<<<
>>>a b c.html<<<
>>>dir.html<<<
>>> *.html<<<
>>>./dir.html<<<
I know two ways to fix this behavior: 1) temporarily disable globbing, and 2) use the mapfile command.
Disabling Globbing
# Disable globbing, remember current -f flag value
[[ "$-" == *f* ]] || globbing_disabled=1
set -f
IFS=$'\n' a=($(find . -print | grep -i '.*[.]html'))
for html in "${a[@]}"; do echo ">>>${html}<<<"; done
# Restore globbing
test -n "$globbing_disabled" && set +f
Output
>>>./ .html<<<
>>>./a b c.html<<<
>>>./ *.html<<<
>>>./dir.html<<<
Using mapfile
The mapfile command was introduced in Bash 4. It reads lines from the standard input into an indexed array:
mapfile -t a < <(find . -print | grep -i '.*[.]html')
for html in "${a[@]}"; do echo ">>>${html}<<<"; done
The find Options
The find command selects all types of nodes, including directories. You should use the -type option, e.g. -type f for files.
If you want to filter the result set with a regular expression use -regex option, or -iregex for case-insensitive matching:
mapfile -t a < <(find . -type f -iregex '.*\.html$')
for html in "${a[@]}"; do echo ">>>${html}<<<"; done
Output
>>>./ .html<<<
>>>./a b c.html<<<
>>>./ *.html<<<
echo vs. printf
Finally, don't use echo in new software. Use printf instead:
mapfile -t a < <(find . -type f -iregex '.*\.html$')
for html in "${a[@]}"; do printf '>>>%s<<<\n' "$html"; done
Alternative Approach
However, I would rather pipe a loop with a read:
find . -type f -iregex '.*\.html$' | while IFS= read -r line
do
printf '>>>%s<<<\n' "$line"
done
In this example, the read command reads a line from the standard input and stores the value into line variable.
Although I like the mapfile feature, I find the code with the pipe more clear.
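Both the mapfile -t and the piped read variants are line-based, so a file name containing a newline would still be split. A NUL-delimited loop avoids that. The sketch below is one possible shape; for_each_html is a hypothetical helper, and it reads the file list on descriptor 3 so that the command being run keeps its own standard input.

```shell
# Hypothetical helper: run a command once per *.html file below DIR.
for_each_html() {
    local dir=$1 html
    shift
    # Read NUL-delimited names on fd 3 so the command keeps its own stdin.
    while IFS= read -r -d '' -u 3 html; do
        "$@" "$html"
    done 3< <(find "$dir" -type f -iname '*.html' -print0)
}

# Example: print each matching file name in the current directory tree.
for_each_html . printf '>>>%s<<<\n'
```

For the conversion in the question, the pandoc pipeline could be wrapped in a function (say, a hypothetical convert_one that takes the file name as $1) and run as for_each_html . convert_one.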
Try adding the bash shebang and set IFS to handle spaces in folders and filenames:
#!/bin/bash
SAVEIFS=$IFS
IFS=$(echo -en "\n\b")
htmlDocs=($(find . -print | grep -i '.*[.]html'))
for html in "${htmlDocs[@]}"
do
P_MD=${html}.markdown
echo "${html} \> ${P_MD}"
pandoc --ignore-args -r html -w markdown < "${html}" | awk 'NR > 130' | sed '/<div class="site-info">/,$d' > "${P_MD}"
done
IFS=$SAVEIFS

Resources