How can I use glob patterns that are stored in a text file as input for a file search?
I want to search all subdirectories for files that match glob patterns stored in a text file.
The file with the glob patterns looks like:
/Dongle/src/*.c
/App/files/**/*.xml
...
How can I use that file as input for the find command in bash?
What I tried so far is:
modelFile=$1
root="Project/"
regexes=$(cat $modelFile)
outFile="out.txt"
for re in $regexes; do
find $root -type f -regex $re > $outFile
done
But it does not match any files. It works if I use it like:
(...)
for re in $regexes; do
find $root -type f -regex "/App/files/**/*.xml" > $outFile
done
for re in $regexes; do
find $root -type f -regex "/Dongle/src/*.c " >> $outFile
done
I don't necessarily have to use find. Every other bash command would work as well.
The output should be every file that matches the glob patterns.
These are not regexes, these are glob patterns.
Furthermore, find is not really needed here: the patterns can be resolved by the shell directly,
provided your shell supports the globstar and nullglob options.
shopt -s globstar nullglob
while IFS= read -r glob; do
for path in $glob; do
echo "$path"
done
done < "$modelFile" > "$outFile"
Some other important issues with your original code:
Redirecting with > to the same file inside a loop overwrites the file on every iteration
Always enclose variables used as command-line arguments in double quotes
Process input line by line using a while loop instead of a for loop
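Putting those fixes together with the glob-based loop above, the corrected script might look like this (a sketch; it keeps the question's variable names and assumes, as the approach above does, that the patterns in the file carry the full paths to match, so the root variable is no longer needed):
#!/usr/bin/env bash
shopt -s globstar nullglob

modelFile=$1
outFile="out.txt"

while IFS= read -r glob; do
    [[ $glob ]] || continue       # skip empty lines
    for path in $glob; do         # unquoted on purpose: let the shell expand the glob
        printf '%s\n' "$path"
    done
done < "$modelFile" > "$outFile"  # redirect once, outside the loop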
I'm new to Bash scripting. I have a requirement to convert multiple input files in UTF-8 encoding to ISO 8859-1.
I am using the below command, which is working fine for the conversion part:
cd ${DIR_INPUT}/
for f in *.txt; do iconv -f UTF-8 -t ISO-8859-1 $f > ${DIR_LIST}/$f; done
However, when I don't have any text files in my input directory ($DIR_INPUT), it still creates an empty .txt file in my output directory ($DIR_LIST).
How can I prevent this from happening?
The empty file *.txt is being created in your output directory because, by default, bash expands an unmatched glob to the literal string you supplied. You can change this behaviour in a number of ways, but what you're probably looking for is shopt -s nullglob. Observe:
$ for i in a*; do echo "$i"; done
a*
$ shopt -s nullglob
$ for i in a*; do echo "$i"; done
$
You can find documentation about this in the bash man page under Pathname Expansion.
In your case, I'd probably rewrite this in this way:
shopt -s nullglob
for f in "$DIR_INPUT"/*.txt; do
iconv -f UTF-8 -t ISO-8859-1 "$f" > "${DIR_LIST}/${f##*/}"
done
This avoids the need for the initial cd, and uses parameter expansion to strip the path portion off $f for the output redirection. With nullglob set, the loop body simply never runs when no files match.
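For reference, ${f##*/} removes the longest prefix matching */, i.e. everything up to and including the last slash:
f=/path/to/file.txt
echo "${f##*/}"    # prints: file.txt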
As @ghoti pointed out, in the absence of files matching the wildcard expression a*, the expression itself becomes the result of pathname expansion. By default (when the nullglob option is unset), a* is expanded to, literally, a*.
You can set the nullglob option, of course. But then be aware that all subsequent pathname expansions will be affected, unless you unset the option after the loop.
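One way to keep the effect local (a sketch reusing the loop from the previous answer) is to run it in a subshell, so the option change dies with the subshell, or to save and restore the option state explicitly:
# subshell: the shopt change does not leak out
(
    shopt -s nullglob
    for f in "$DIR_INPUT"/*.txt; do
        iconv -f UTF-8 -t ISO-8859-1 "$f" > "${DIR_LIST}/${f##*/}"
    done
)

# or: save the current state and restore it afterwards
prev=$(shopt -p nullglob)   # prints the command that recreates the current setting
shopt -s nullglob
for f in "$DIR_INPUT"/*.txt; do
    iconv -f UTF-8 -t ISO-8859-1 "$f" > "${DIR_LIST}/${f##*/}"
done
eval "$prev"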
I would rather use the find command, which has a clear interface (and, in my opinion, is less prone to implicit conversions than Bash globbing). E.g.:
cmd='iconv --verbose -f UTF-8 -t ISO-8859-1 "$0" > "$1"/$(basename "$0")'
find "${DIR_INPUT}/" \
-mindepth 1 \
-maxdepth 1 \
-type f \
-name '*.txt' \
-exec sh -c "$cmd" {} "${DIR_LIST}" \;
In the example above, $0 and $1 are positional arguments for the file path and ${DIR_LIST} respectively. The command is invoked via standard shell (sh) because of the need to refer to the file path {} twice. Although most modern implementations of find may handle multiple occurrences of {} correctly, the POSIX specification states:
If more than one argument containing the two characters "{}" is present, the behavior is unspecified.
As in the for loop, the -name pattern *.txt evaluates as true if the basename of the current pathname matches the operand (*.txt) using pattern matching notation. But, unlike in the for loop, filename expansion does not apply, as this is a matching operation, not an expansion.
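As a side note: with -exec ... \; the shell is started once per file. If many files are involved, a batched variant (a sketch under the same assumptions) passes the output directory first and lets find append whole batches of file paths via {} +:
cmd='outdir=$1; shift; for f; do iconv --verbose -f UTF-8 -t ISO-8859-1 "$f" > "$outdir/$(basename "$f")"; done'
find "${DIR_INPUT}/" \
    -mindepth 1 \
    -maxdepth 1 \
    -type f \
    -name '*.txt' \
    -exec sh -c "$cmd" sh "${DIR_LIST}" {} +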
I want to use bash to remove all the files in a directory that aren't in an associative array of file extensions. (i.e. delete all the files in a directory that aren't image files, for example)
This question very clearly answers how to do this for a single file extension, but I'm not sure how to do it for a whole list.
currently I'm doing this
for f in $(find . -type f ! -name '*.png' -and ! -name '*.jpg' ); do rm "$f"; done
but it seems ugly to just add a massive list of "-and -name '*.aaa'" inside the parenthesis for every file type.
Is there a way to pass find an associative array like
declare -A allowedTypes=([*.png]=1 [*.jpg]=1 [*.gif]=1)
or will I just need to add a lot of "-and ! -name ___"?
Thanks!
Using find in the first place is not needed here. The shell globbing support in bash is sufficient for this requirement. Bash provides an extended glob option with which you can get the file names, under recursive paths, that don't end with the extensions you want to ignore.
The extended option is extglob, which needs to be set with shopt as below. Additionally you can use a couple more options: nullglob, with which an unmatched glob is swept away entirely, replaced with a set of zero words, and globstar, which allows recursing through all the directories.
shopt -s extglob nullglob globstar
Now all you need to do is form the glob expression to exclude files of type *.png, *.jpg and *.gif, which you can do as below. We use an array to store the glob results, because when the expansion is properly quoted, filenames with special characters remain intact.
fileList=(**/!(*.jpg|*.gif|*.png))
The ** recurses through the sub-folders, and !() is a negation that excludes any of the file extensions listed inside. Now, to print the actual files, just do
printf '%s\n' "${fileList[@]}"
If your intention is, for example, to remove all the files identified, you don't need to store the glob results in the array. The array approach is useful in scripts that need to reuse the glob results; for simply deleting the files, you can use the rm command directly.
First check that the files returned are as expected, and only once that is confirmed run rm on the expression. Use ls to see if the files are listed as expected
ls -1 -- **/!(*.jpg|*.gif|*.png)
and now after confirming the files to delete, do rm at your own risk.
rm -- **/!(*.jpg|*.gif|*.png)
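Two caveats worth noting before running that rm: by default globs skip hidden files (dotfiles), and **/!(...) also matches directories, which rm without -r will refuse to remove. A more defensive sketch:
shopt -s extglob nullglob globstar dotglob   # dotglob: match hidden files as well
for f in **/!(*.jpg|*.gif|*.png); do
    [[ -f $f ]] || continue   # the glob also matches directories; skip them
    rm -- "$f"
done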
Assumption: allowedTypes contains only trusted input and only valid suffixes.
The first snippet supports multi-level suffixes like tar.gz. It uses find, a regular expression and a list of allowed suffixes allowedTypes.
allowedTypes=(png gif jpg)
# keepTypes='png|gif|jpg'
keepTypes="$(echo "${allowedTypes[#]}" | tr ' ' '|')"
find . -type f -regextype awk ! -iregex '(.*)\.('"$keepTypes"')' -exec echo rm {} \;
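The echo makes this a dry run: it prints the rm commands instead of executing them. Once the output looks right, drop the echo to actually delete:
find . -type f -regextype awk ! -iregex '(.*)\.('"$keepTypes"')' -exec rm {} \;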
If you want to keep your associative array, then you could use the following snippet.
It needs additional work to support multi-level file suffixes.
declare -A allowedTypes=([*.png]=1 [*.jpg]=1 [*.gif]=1)
keepTypes="$(echo "${!allowedTypes[#]}" | tr ' ' '|' | tr -d '.*')"
It would be nice if there were a way to replace the separators with a built-in tool instead of tr, but I found none. ${allowedTypes[@]//\ /test} did not replace the whitespace between the items.
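For what it's worth, there is a pure-bash way to join array elements: inside double quotes, ${arr[*]} joins the elements with the first character of IFS, and doing it inside a command substitution keeps the IFS change local. A sketch:
allowedTypes=(png gif jpg)
keepTypes=$(IFS='|'; printf '%s' "${allowedTypes[*]}")
echo "$keepTypes"    # png|gif|jpg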
I have a list of newline-separated strings. I need to iterate through each line, and use the argument surrounded with wildcards. The end result will append the found files to another text file. Here's some of what I've tried so far:
cat < ${INPUT} | while read -r line; do find ${SEARCH_DIR} -name $(eval *"$line"*); done >> ${OUTPUT}
I've tried many variations of eval/$() etc, but I haven't found a way to get both of the asterisks to remain. Mostly, I get things that resemble *$itemFromList, but it's missing the second asterisk, resulting in the file not being found. I think this may have something to do with bash expansion, but I haven't had any luck with the resources I've found so far.
Basically, need to supply the -name parameter with something that looks like *$itemFromList*, because the file has words both before and after the value I'm searching for.
Any ideas?
Use double quotes to prevent the asterisk from being interpreted as an instruction to the shell rather than find.
-name "*$line*"
Thus:
while read -r line; do
line=${line%$'\r'} # strip trailing CRs if input file is in DOS format
find "$SEARCH_DIR" -name "*$line*"
done <"$INPUT" >>"$OUTPUT"
...or, better:
#!/usr/bin/env bash
## use lower-case variable names
input=$1
output=$2
args=( -false ) # for our future find command line, start with -false
while read -r line; do
line=${line%$'\r'} # strip trailing CR if present
[[ $line ]] || continue # skip empty lines
args+=( -o -name "*$line*" ) # add an OR clause matching if this line's substring exists
done <"$input"
# since our last command is find, use "exec" to let it replace the shell in memory
exec find "$SEARCH_DIR" '(' "${args[@]}" ')' -print >"$output"
Note:
The shebang specifying bash ensures that extended syntax, such as arrays, is available.
See BashFAQ #50 for a discussion of why an array is the correct structure to use to collect a list of command-line arguments.
See the fourth paragraph of http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html for the relevant POSIX specification on environment and shell variable naming conventions: All-caps names are used for variables with meaning to the shell itself, or to POSIX-specified tools; lowercase names are reserved for application use. That script you're writing? For purposes of the spec, it's an application.
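Assuming the script above is saved as find-matches.sh (a hypothetical name) and that SEARCH_DIR is still supplied via the environment, a run might look like:
SEARCH_DIR=/some/search/root ./find-matches.sh patterns.txt results.txt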
I am making a shell script that allows you to select a file from a directory using YAD. I am doing this:
list='';
exc='!'
for f in "$SHOTS_NOT_CONVERTED_DIR"/*;do
f=`basename $f`
list="${list}${exc}${f}"
done
The problem is that if there are no files in that directory, I end up with a selection with *.
What's the easiest, most elegant way to make this work in Bash?
The goal is to have an empty list if there are no files there.
* expansion is called a glob expression. The bash manual calls it filename expansion.
You need to set the nullglob option. Doing so gives you an empty result if the glob expression does not find files:
shopt -s nullglob
list='';
exc='!'
for f in "$SHOTS_NOT_CONVERTED_DIR"/*; do
# Btw, use $() instead of ``
f=$(basename "$f")
list="${list}${exc}${f}"
done
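With nullglob set, the glob expands to zero words when the directory is empty, so the loop body never runs and list stays empty, which is exactly the goal. A quick check (assuming an empty directory):
shopt -s nullglob
for f in /some/empty/dir/*; do echo "would process: $f"; done   # prints nothing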
This must be simple but I can't figure it out.
for filename in *[^\.foo].jpg
do
echo $filename
done
Instead of the matched filenames, echo shows the pattern:
+ echo '*[^.foo].jpg'
*[^.foo].jpg
The intention is to find all files ending in .jpg but not in .foo.jpg.
EDIT: Tried this as per (misunderstood) advice:
for filename in *[!".foo"].jpg
Still not there!
You actually can do this, with an extglob. To demonstrate, copy-and-paste the following code:
shopt -s extglob
cd "$(mktemp -d "${TMPDIR:-/tmp}/test.XXXXXX")" || exit
touch hello.txt hello.foo hello.foo.jpg hello.jpg
printf '%q\n' !(*.foo).jpg
Output should be:
hello.jpg
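Applied to the original loop, that becomes:
shopt -s extglob
for filename in !(*.foo).jpg; do
    echo "$filename"
done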
In bash, if a glob pattern has no matches, the pattern itself is returned. You can change this behavior with the nullglob shell option, which can be turned on like this:
shopt -s nullglob
This is described in the section titled Filename Expansion in the bash man page.
As to why it doesn't match: you simply don't have any files that match. This is possibly due to your use of ^, which isn't normally a valid glob metacharacter; as far as globbing is concerned, ^ can simply match a literal ^. Also, [...] probably doesn't do what you think it does: a bracket expression matches a single character, so [^\.foo] means "one character that is not ., f or o", not "not the string .foo".
For an explanation of valid glob meta-characters, see the Pattern Matching section of the bash man page.
You can't write a plain glob pattern (without extglob) that matches "all files ending in .jpg but not .foo.jpg". The easiest thing to do is glob over all jpg files (*.jpg) and then filter out the ones that end in .foo.jpg inside the loop body.
for filename in *.jpg
do
[[ $filename = *.foo.jpg ]] && continue
echo "$filename"
done