Testing for existing file with an extension fails with globbing [duplicate] - bash

This question already has answers here:
Test whether a glob has any matches in Bash
(22 answers)
Closed 4 years ago.
I'm trying to check if a file exists, but with a wildcard. Here is my example:
if [ -f "xorg-x11-fonts*" ]; then
printf "BLAH"
fi
I have also tried it without the double quotes.

For Bash scripts, the most direct and performant approach is:
if compgen -G "${PROJECT_DIR}/*.png" > /dev/null; then
echo "pattern exists!"
fi
This will work very speedily even in directories with millions of files and does not involve a new subshell.
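If the test is needed in more than one place, it can be wrapped in a tiny helper (a minimal sketch; glob_exists is just an illustrative name):
glob_exists() {
    # compgen -G prints the matches for a glob and returns non-zero when there are none
    compgen -G "$1" > /dev/null
}

if glob_exists "${PROJECT_DIR}/*.png"; then
    echo "pattern exists!"
fi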
The simplest should be to rely on ls return value (it returns non-zero when the files do not exist):
if ls /path/to/your/files* 1> /dev/null 2>&1; then
echo "files do exist"
else
echo "files do not exist"
fi
I redirected the ls output to make it completely silent.
Here is an optimization that also relies on glob expansion, but avoids the use of ls:
for f in /path/to/your/files*; do
## Check if the glob gets expanded to existing files.
## If not, f here will be exactly the pattern above
## and the exists test will evaluate to false.
[ -e "$f" ] && echo "files do exist" || echo "files do not exist"
## This is all we needed to know, so we can break after the first iteration
break
done
This is very similar to grok12's answer, but it avoids the unnecessary iteration through the whole list.

If your shell has a nullglob option and it's turned on, a wildcard pattern that matches no files will be removed from the command line altogether. This will make ls see no pathname arguments, list the contents of the current directory and succeed, which is wrong. GNU stat, which always fails if given no arguments or an argument naming a nonexistent file, would be more robust. Also, the &> redirection operator is a bashism.
if stat --printf='' /path/to/your/files* 2>/dev/null
then
echo found
else
echo not found
fi
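To see the pitfall in action (a quick demonstration, not part of the original answer): with nullglob enabled and nothing matching, ls receives no arguments at all, so it lists the current directory and reports success.
shopt -s nullglob
if ls /path/to/your/files* 1> /dev/null 2>&1; then
    echo "reported as existing even though nothing may have matched"
fi
shopt -u nullglob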
Better still is GNU find, which can handle a wildcard search internally and exit as soon as it finds one matching file, rather than waste time processing a potentially huge list of them expanded by the shell; this also avoids the risk that the shell might overflow its command line buffer.
if test -n "$(find /dir/to/search -maxdepth 1 -name 'files*' -print -quit)"
then
echo found
else
echo not found
fi
Non-GNU versions of find might not have the -maxdepth option used here to make find search only the /dir/to/search instead of the entire directory tree rooted there.
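On such systems a commonly used portable workaround is to prune everything below the starting directory (a sketch; the trailing /. on the search path is what lets the top directory itself escape the prune):
if test -n "$(find /dir/to/search/. ! -name . -prune -name 'files*' -print)"
then
    echo found
else
    echo not found
fi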

Use:
files=(xorg-x11-fonts*)
if [ -e "${files[0]}" ];
then
printf "BLAH"
fi

You can do the following:
set -- xorg-x11-fonts*
if [ -f "$1" ]; then
printf "BLAH"
fi
This works with sh and derivatives: KornShell and Bash. It doesn't create any subshell. The $(..) and `...` constructs used in other solutions create a subshell: they fork a process, and are therefore slower. Of course it works with several files, and this solution can be the fastest, or second to the fastest one.
It also works when there aren't any matches; there is no need for nullglob, as one of the commentators suggests. $1 will then contain the original pattern, and therefore the test -f "$1" won't succeed, because no file by that name exists.
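One caveat: at the top level of a script, set -- overwrites the script's own positional parameters. If that matters, the same trick can be wrapped in a small function, where set -- only changes the function's arguments (a sketch; any_match is just an illustrative name, and it assumes the pattern itself contains no whitespace):
any_match() {
    # Let the shell expand the pattern here; filenames produced by glob
    # expansion are not word-split, so names with spaces are still safe.
    set -- $1
    [ -f "$1" ]
}

if any_match "xorg-x11-fonts*"; then
    printf "BLAH"
fi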

for i in xorg-x11-fonts*; do
if [ -f "$i" ]; then printf "BLAH"; fi
done
This will work with multiple files and with white space in file names.

The solution:
files=$(ls xorg-x11-fonts* 2> /dev/null | wc -l)
if [ "$files" != "0" ]
then
echo "Exists"
else
echo "None found."
fi
> Exists

Use:
if [ "`echo xorg-x11-fonts*`" != "xorg-x11-fonts*" ]; then
printf "BLAH"
fi

The PowerShell way, which treats wildcards differently: you put the pattern in the quotes, like below:
If (Test-Path "./output/test-pdf-docx/Text-Book-Part-I*"){
Remove-Item -force -v -path ./output/test-pdf-docx/*.pdf
Remove-Item -force -v -path ./output/test-pdf-docx/*.docx
}
I think this is helpful because the concept of the original question covers "shells" in general not just Bash or Linux, and would apply to PowerShell users with the same question too.

The Bash code I use:
if ls /syslog/*.log > /dev/null 2>&1; then
echo "Log files are present in /syslog/;
fi

Strictly speaking, if you only want to print "Blah", here is the solution:
find . -maxdepth 1 -name 'xorg-x11-fonts*' -printf 'BLAH' -quit
Here is another way:
doesFirstFileExist(){
test -e "$1"
}
if doesFirstFileExist xorg-x11-fonts*
then printf "BLAH"
fi
But I think the most optimal is as follows, because it won't try to sort file names:
if [ -n "$(find . -maxdepth 1 -name 'xorg-x11-fonts*' -printf 1 -quit)" ]
then
printf "BLAH"
fi

Here's a solution for your specific problem that doesn't require for loops or external commands like ls, find and the like.
if [ "$(echo xorg-x11-fonts*)" != "xorg-x11-fonts*" ]; then
printf "BLAH"
fi
As you can see, it's just a tad more complicated than what you were hoping for. It relies on the fact that if the shell is not able to expand the glob, no files matching that glob exist, and echo will output the glob as is, which allows us to do a mere string comparison to check whether any of those files exist at all.
If we were to generalize the procedure, though, we should take into account the fact that files might contain spaces within their names and/or paths and that the glob char could rightfully expand to nothing (in your example, that would be the case of a file whose name is exactly xorg-x11-fonts).
This could be achieved by the following function, in bash.
function doesAnyFileExist {
local arg="$*"
local files=($arg)
[ ${#files[@]} -gt 1 ] || [ ${#files[@]} -eq 1 ] && [ -e "${files[0]}" ]
}
Going back to your example, it could be invoked like this.
if doesAnyFileExist "xorg-x11-fonts*"; then
printf "BLAH"
fi
Glob expansion should happen within the function itself for it to work properly, that's why I put the argument in quotes and that's what the first line in the function body is there for: so that any multiple arguments (which could be the result of a glob expansion outside the function, as well as a spurious parameter) would be coalesced into one. Another approach could be to raise an error if there's more than one argument, yet another could be to ignore all but the 1st argument.
The second line in the function body sets the files var to an array constituted by all the file names that the glob expanded to, one for each array element. It's fine if the file names contain spaces, each array element will contain the names as is, including the spaces.
The third line in the function body does two things:
It first checks whether there's more than one element in the array. If so, it means the glob surely got expanded to something (due to what we did on the 1st line), which in turn implies that at least one file matching the glob exists, which is all we wanted to know.
If at step 1 we discovered that we got fewer than 2 elements in the array, then we check whether we got exactly one and, if so, whether that one exists, the usual way. We need to do this extra check in order to account for function arguments without glob chars, in which case the array contains only one, unexpanded, element.
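If you don't mind toggling a shell option, a shorter bash-only variant of the same idea uses nullglob, so that an unmatched pattern expands to nothing (a sketch, not part of the original function; it saves and restores the option so the caller's settings are untouched):
doesAnyFileExistNullglob() {
    local saved
    saved=$(shopt -p nullglob)    # remember the current nullglob setting
    shopt -s nullglob
    local files=($1)              # an unmatched pattern now expands to nothing
    eval "$saved"                 # restore nullglob
    (( ${#files[@]} > 0 ))
}

if doesAnyFileExistNullglob "xorg-x11-fonts*"; then
    printf "BLAH"
fi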

I found a couple of neat solutions worth sharing. The first still suffers from the "this will break if there are too many matches" problem:
pat="yourpattern*" matches=($pat) ; [[ "$matches" != "$pat" ]] && echo "found"
(Recall that if you reference an array without a subscript, you get the first element of the array.)
If you have "shopt -s nullglob" in your script, you could simply do:
matches=(yourpattern*) ; [[ "$matches" ]] && echo "found"
Now, if it's possible to have a ton of files in a directory, you're pretty much stuck with using find:
find /path/to/dir -maxdepth 1 -type f -name 'yourpattern*' | grep -q '.' && echo 'found'

I use this:
filescount=`ls xorg-x11-fonts* | awk 'END { print NR }'`
if [ $filescount -gt 0 ]; then
blah
fi

Using new fancy shmancy features in KornShell, Bash, and Z shell (this example doesn't handle spaces in file names):
# Declare a regular array (-A will declare an associative array. Kewl!)
declare -a myarray=( /mydir/tmp*.txt )
array_length=${#myarray[@]}
# Not found if the first element of the array is the unexpanded string
# (ie, if it contains a "*")
if [[ ${myarray[0]} =~ [*] ]] ; then
echo "No files not found"
elif [ $array_length -eq 1 ] ; then
echo "File was found"
else
echo "Files were found"
fi
for myfile in ${myarray[@]}
do
echo "$myfile"
done
Yes, this does smell like Perl. I am glad I didn't step in it ;)

IMHO it's better to always use find when testing for files, globs or directories. The stumbling block in doing so is find's exit status: 0 if all paths were traversed successfully, >0 otherwise. Whether the expression you passed to find matched anything is not reflected in its exit code.
The following example tests if a directory has entries:
$ mkdir A
$ touch A/b
$ find A -maxdepth 0 -not -empty -print | head -n1 | grep -q . && echo 'not empty'
not empty
When A has no files grep fails:
$ rm A/b
$ find A -maxdepth 0 -not -empty -print | head -n1 | grep -q . || echo 'empty'
empty
When A does not exist grep fails again because find only prints to stderr:
$ rmdir A
$ find A -maxdepth 0 -not -empty -print | head -n1 | grep -q . && echo 'not empty' || echo 'empty'
find: 'A': No such file or directory
empty
Replace -not -empty by any other find expression, but be careful if you -exec a command that prints to stdout. You may want to grep for a more specific expression in such cases.
This approach works nicely in shell scripts. The original question was to look for the glob xorg-x11-fonts*:
if find . -maxdepth 1 -name 'xorg-x11-fonts*' -print | head -n1 | grep -q .
then
: the glob matched
else
: ...not
fi
Note that the else branch is reached if xorg-x11-fonts* did not match anything, or if find encountered an error. To distinguish the two cases, use $?.
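In bash, the PIPESTATUS array gives finer-grained access: it keeps the exit status of every command in the last pipeline (a sketch, not part of the original answer):
find . -maxdepth 1 -name 'xorg-x11-fonts*' -print | head -n1 | grep -q .
status=("${PIPESTATUS[@]}")          # copy it immediately: [0]=find, [1]=head, [2]=grep
if [ "${status[2]}" -eq 0 ]; then
    : the glob matched
elif [ "${status[0]}" -ne 0 ]; then
    : find itself failed
else
    : find ran cleanly but nothing matched
fi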

If there is a huge number of files in a network folder, using the wildcard is questionable (speed, or command-line argument overflow).
I ended up with:
if [ -n "$(find somedir/that_may_not_exist_yet -maxdepth 1 -name \*.ext -print -quit)" ] ; then
echo Such file exists
fi

if [ `ls path1/* path2/* 2> /dev/null | wc -l` -ne 0 ]; then echo ok; else echo no; fi

Try this
fileTarget="xorg-x11-fonts*"
filesFound=$(ls $fileTarget)
case ${filesFound} in
"" ) printf "NO files found for target=${fileTarget}\n" ;;
* ) printf "FileTarget Files found=${filesFound}\n" ;;
esac
Test
fileTarget="*.html" # Where I have some HTML documents in the current directory
FileTarget Files found=Baby21.html
baby22.html
charlie 22.html
charlie21.html
charlie22.html
charlie23.html
fileTarget="xorg-x11-fonts*"
NO files found for target=xorg-x11-fonts*
Note that this only works in the current directory, or where the variable fileTarget includes the path you want to inspect.

You can also cut the other files off and test just the first one:
if [ -e $( echo $1 | cut -d" " -f1 ) ] ; then
...
fi

Use:
if ls -l | grep -q 'xorg-x11-fonts.*' # grep needs a regex, not a shell glob
then
# do something
else
# do something else
fi

man test.
if [ -e file ]; then
...
fi
will work for directory and file.

Related

For this statement, why could I get the error "[: too many arguments"? I am sure I use the "[ ]" correctly in the shell [duplicate]


Add .old to files without .old in them, having trouble with which variable to use?

#!/bin/bash
for filenames in $( ls $1 )
do
echo $filenames | grep "\.old$"
if [ ! $filenames = 0 ]
then
$( mv "$1/$filenames" "$1/$filenames.old" )
fi
done
So I think most of the script works. It is intended to take the output of ls for the directory given as the first parameter, and search for any files with .old at the end. Any files that do not end in .old will then be renamed.
The script successfully renames the files, but it will add .old to a file already containing the extension. I am assuming that the if variable is wrong, but I cannot figure out which variable to use in this case.
The answer is in the key, but if anyone needs to do this, here is an even easier way:
#!/bin/bash
for filenames in $( ls $1 | grep -v "\.old$" )
do
$( mv "$1/$filenames" "$1/$filenames.old" )
done
Use find for this:
find /directory/here -type f ! -iname "*.old" -exec mv {} {}.old \;
Problems with the original approach:
for filenames in $( ls $1 ): never parse ls output.
Variables are not double quoted, as in if [ ! $filenames = 0 ]. This results in word splitting. Use "$filenames" unless you expect word splitting.
So the final script would be
#!/bin/bash
if [ -d "$1" ]
then
find "$1" -type f ! -iname "*.old" -exec mv {} {}.old \;
# use -maxdepth 1 with find if you don't wish to recursively check subdirectories
else
echo "Directory : $1 doesn't exist !"
fi
Usage
./script '/path/to/directory'
Don't use ls in scripts.
#!/bin/bash
for filename in "$1"/*
do
case $filename in *.old) continue;; esac
mv "$filename" "$filename.old"
done
I prefer case over if because it supports wildcard matching naturally and portably. (You could run this with /bin/sh just as well.) If you wanted to use if instead, that'd be
if echo "$filename" | grep -q '\.old$'; then
or more idiomatically, but recent shells only,
if [[ "$filename" == *.old ]]; then
You want to avoid calling additional utilities if simple shell builtins will do. Why? Each additional utility you call (grep, etc.) spawns and runs as a separate process of its own. (If you are spawning a process for every iteration in your loop, things will really slow down.) If the shell doesn't provide a feature, then sure, calling a utility is the right thing to do.
As mentioned above, shell globbing along with parameter expansion with substring removal provides a simple test for determining if a file has an .old extension. All you need is:
for i in "$1"/*; do
[ "${i##*.}" = "old" ] || mv "$i" "${i}.old"
done
(note: this will skip adding the .old extension to a single file named exactly 'old', but that can be handled separately if needed -- unlikely. Additionally, the solution with find is a fine approach as well)
I solved the problem, as I was misled by my instructor!
$? is the variable which holds the exit status of the most recently executed foreground pipeline (which here would be grep). The new code is unedited except for
if [ ! $? = 0 ]

find command with filename coming from bash printf builtin not working

I'm trying to write a script which lists the files in one directory and then searches for each of them, one by one, in another directory. To deal with spaces and special characters like "[" or "]", I'm using $(printf %q "$FILENAME") as input for the find command: find /directory/to/search -type f -name $(printf %q "$FILENAME").
It works like a charm for every filename except when the name contains multibyte (UTF-8) characters. In that case the output of printf is an ANSI-C quoted string, i.e.: $'file name with blank spaces and quoted characters in the form of \NNN\NNN', and that string is not expanded without the $'' quoting, so find searches for a file whose name includes the quoting itself: «$'filename'».
Is there an alternative solution in order to be able to pass to find any kind of filename?
My script is like follows (I know some lines can be deleted, like the "RESNAME="):
#!/bin/bash
if [ -d $1 ] && [ -d $2 ]; then
IFSS=$IFS
IFS=$'\n'
FILES=$(find $1 -type f )
for FILE in $FILES; do
BASEFILE=$(printf '%q' "$(basename "$FILE")")
RES=$(find $2 -type f -name "$BASEFILE" -print )
if [ ${#RES} -gt 1 ]; then
RESNAME=$(printf '%q' "$(basename "$RES")")
else
RESNAME=
fi
if [ "$RESNAME" != "$BASEFILE" ]; then
echo "FILE NOT FOUND: $FILE"
fi
done
else
echo "Directories do not exist"
fi
IFS=$IFSS
As an answer suggested, I've used associative arrays, but with no luck; maybe I'm not using the arrays correctly, but echoing them (${array[@]}) returns nothing. This is the script I've written:
#!/bin/bash
if [ -d "$1" ] && [ -d "$2" ]; then
declare -A files
find "$2" -type f -print0 | while read -r -d $'\0' FILE;
do
BN2="$(basename "$FILE")"
files["$BN2"]="$BN2"
done
echo "${files[#]}"
find "$1" -type f -print0 | while read -r -d $'\0' FILE;
do
BN1="$(basename "$FILE")"
if [ "${files["$BN1"]}" != "$BN1" ]; then
echo "File not found: "$BN1""
fi
done
fi
Don't use for loops. First, it is slower: your find has to complete before the rest of your program can run. Second, it is possible to overload the command line; the entire for command must fit in the command line buffer.
Most importantly of all, for sucks at handling funky file names. You're going through contortions trying to get around this. However:
find $1 -type f -print0 | while read -r -d $'\0' FILE
will work much better. It handles file names correctly -- even file names that contain \n characters. The -print0 tells find to separate file names with the NUL character. The while read -r -d $'\0' FILE will read each file name (separated by the NUL character) into $FILE.
If you put quotes around the file name in the find command, you don't have to worry about special characters in the file names.
Your script is running find once for each file found. If you have 100 files in your first directory, you're running find 100 times.
Do you know about associative (hash) arrays in BASH? You are probably better off using associative arrays. Run find on the first directory, and store those file names in an associative array.
Then, run find (again using the find | while read syntax) for your second directory. For each file you find in the second directory, see if you have a matching entry in your associative array. If you do, you know that file is in both directories.
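A minimal bash sketch of that idea (assuming bash 4+ for associative arrays, and the two directories in $1 and $2 as in your script). Note the < <(...) process substitution: unlike find | while, it keeps the loop in the current shell, so the array is still populated after the loop ends, which is also why the array in the piped version above comes out empty:
declare -A files_in_dir1
# Populate the hash from the first directory.
while IFS= read -r -d '' f; do
    files_in_dir1["${f##*/}"]=1
done < <(find "$1" -type f -print0)

# Check every file of the second directory against the hash.
while IFS= read -r -d '' f; do
    name=${f##*/}
    if [ -n "${files_in_dir1[$name]}" ]; then
        echo "File $name is in both directories"
    fi
done < <(find "$2" -type f -print0)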
Addendum
I've been looking at the find command. It appears there's no real way to prevent it from doing pattern matching except through a lot of work (like you were doing with printf). I've tried using -regex matching and using \Q and \E to remove the special meaning of pattern characters. I haven't been successful.
There comes a time that you need something a bit more powerful and flexible than shell to implement your script, and I believe this is the time.
Perl, Python, and Ruby are three fairly ubiquitous scripting languages found on almost all Unix systems and are available on other non-POSIX platforms (cough! ...Windows!... cough!).
Below is a Perl script that takes two directories, and searches them for matching files. It uses the find command once and uses associative arrays (called hashes in Perl). I key the hash to the name of my file. In the value portion of the hash, I store an array of the directories where I found this file.
I only need to run the find command once per directory. Once that is done, I can print out all the entries in the hash that contain more than one directory.
I know it's not shell, but this is one of the cases where you can spend a lot more time trying to figure out how to get the shell to do what you want than it's worth.
#! /usr/bin/env perl
use strict;
use warnings;
use feature qw(say);
use File::Find;
use constant DIRECTORIES => qw( dir1 dir2 );
my %files;
#
# Perl version of the find command. You give it a list of
# directories and a subroutine for filtering what you find.
# I am basically rejecting all non-file entries, then pushing
# them into my %files hash as an array.
#
find (
sub {
return unless -f;
$files{$_} = [] if not exists $files{$_};
push @{ $files{$_} }, $File::Find::dir;
}, DIRECTORIES
);
#
# All files are found and in %files hash. I can then go
# through all the entries in my hash, and look for ones
# with more than one directory in the array reference.
# IF there is more than one, the file is located in multiple
# directories, and I print them.
#
for my $file ( sort keys %files ) {
if ( @{ $files{$file} } > 1 ) {
say "File: $file: " . join ", ", @{ $files{$file} };
}
}
Try something like this:
find "$DIR1" -printf "%f\0" | xargs -0 -i find "$DIR2" -name \{\}
How about this one-liner?
find dir1 -type f -exec bash -c 'read < <(find dir2 -name "${1##*/}" -type f)' _ {} \; -printf "File %f is in dir2\n" -o -printf "File %f is not in dir2\n"
Absolutely 100% safe regarding files with funny symbols, newlines and spaces in their name.
How does it work?
find (the main one) will scan through directory dir1 and for each file (-type f) will execute
read < <(find dir2 -name "${1##*/}" -type f)
with argument the name of the current file given by the main find. This argument is at position $1. The ${1##*/} removes everything before the last / so that if $1 is path/to/found/file the find statement is:
find dir2 -name "file" -type f
This outputs something if file is found, otherwise has no output. That's what is read by the read bash command. read's exit status is true if it was able to read something, and false if there wasn't anything read (i.e., in case nothing is found). This exit status becomes bash's exit status which becomes -exec's status. If true, the next -printf statement is executed, and if false, the -o -printf part will be executed.
If your dirs are given in variables $dir1 and $dir2 do this, so as to be safe regarding spaces and funny symbols that could occur in $dir2:
find "$dir1" -type f -exec bash -c 'read < <(find "$0" -name "${1##*/}" -type f)' "$dir2" {} \; -printf "File %f is in $dir2\n" -o -printf "File %f is not in $dir2\n"
Regarding efficiency: this is of course not an efficient method at all! the inner find will be executed as many times as there are found files in dir1. This is terrible, especially if the directory tree under dir2 is deep and has many branches (you can rely a little bit on caching, but there are limits!).
Regarding usability: you have fine-grained control on how both find's work and on the output, and it's very easy to add many more tests.
So, hey, tell me how to compare files from two directories? Well, if you don't mind losing a little bit of control, this will be the shortest and most efficient answer:
diff dir1 dir2
Try it, you'll be amazed!
Since you are only using find for its recursive directory following, it will be easier to simply use the globstar option in bash. (You're using associative arrays, so your bash is new enough).
#!/bin/bash
shopt -s globstar
declare -A files
if [[ -d $1 && -d $2 ]]; then
for f in "$2"/**/*; do
[[ -f "$f" ]] || continue
BN2=$(basename "$f")
files["$BN2"]=$BN2
done
echo "${files[#]}"
for f in "$1"/**/*; do
[[ -f "$f" ]] || continue
BN1=$(basename $f)
if [[ ${files[$BN1]} != $BN1 ]]; then
echo "File not found: $BN1"
fi
done
fi
** will match zero or more directories, so $1/**/* will match all the files and directories in $1, all the files and directories in those directories, and so forth all the way down the tree.
If you want to use associative arrays, here's one possibility that will work well with files with all sorts of funny symbols in their names (this script has too much to just show the point, but it is usable as is – just remove the parts you don't want and adapt to your needs):
#!/bin/bash
die() {
printf "%s\n" "$#"
exit 1
}
[[ -n $1 ]] || die "Must give two arguments (none found)"
[[ -n $2 ]] || die "Must give two arguments (only one given)"
dir1=$1
dir2=$2
[[ -d $dir1 ]] || die "$dir1 is not a directory"
[[ -d $dir2 ]] || die "$dir2 is not a directory"
declare -A dir1files
declare -A dir2files
while IFS=$'\0' read -r -d '' file; do
dir1files[${file##*/}]=1
done < <(find "$dir1" -type f -print0)
while IFS=$'\0' read -r -d '' file; do
dir2files[${file##*/}]=1
done < <(find "$dir2" -type f -print0)
# Which files in dir1 are in dir2?
for i in "${!dir1files[#]}"; do
if [[ -n ${dir2files[$i]} ]]; then
printf "File %s is both in %s and in %s\n" "$i" "$dir1" "$dir2"
# Remove it from the dir2 hash
unset dir2files["$i"]
else
printf "File %s is in %s but not in %s\n" "$i" "$dir1" "$dir2"
fi
done
# Which files in dir2 are not in dir1?
# Since I unset them from dir2files hash table, the only keys remaining
# correspond to files in dir2 but not in dir1
if [[ -n "${!dir2files[#]}" ]]; then
printf "File %s is in %s but not in %s\n" "$dir2" "$dir1" "${!dir2files[#]}"
fi
Remark. The identification of files is only based on their filenames, not their contents.

Recursive Shell Script and file extensions issue

I have a problem with this script. The script is supposed to go through all the files and all sub-directories and sub-files (recursively). If a file ends with the extension .txt, I need to replace a char/word in the text with a new char/word and then copy the file into an existing directory. The first argument is the directory where I need to start the search, the second is the old char/word, the third is the new char/word, and the fourth is the directory to copy the files to. The script goes through the files but only does the replacement and copies the files from the original directory. Here is the script:
#!/bin/bash
funk(){
for file in `ls $1`
do
if [ -f $file ]
then
ext=${file##*.}
if [ "$ext" = "txt" ]
then
sed -i "s/$2/$3/g" $file
cp $file $4
fi
elif [ -d $file ]
then
funk $file $2 $3 $4
fi
done
}
if [ $# -lt 4 ]
then
echo "Need more arg"
exit 2;
fi
cw=$1
a=$2
b=$3
od=$4
funk $cw $a $b $od
You're using a lot of bad practices here: lack of quoting, parsing the output of ls... all this will break as soon as a filename contains a space or other funny symbol.
You don't need recursion if you either use bash's globstar optional behavior, or find.
Here's a possibility with the former, that will hopefully show you better practices:
#!/bin/bash
shopt -s globstar
shopt -s nullglob
funk() {
local search=${2//\//\\/}
local replace=${3//\//\\/}
for f in "$1"/**.txt; do
sed -i "s/$search/$replace/g" -- "$f"
cp -nvt "$4" -- "$f"
done
}
if (($#!=4)); then
echo >&2 "Need 4 arguments"
exit 1
fi
funk "$#"
The same function funk using find:
#!/bin/bash
funk() {
local search=${2//\//\\/}
local replace=${3//\//\\/}
find "$1" -name '*.txt' -type f -exec sed -i "s/$search/$replace/g" -- {} \; -exec cp -nvt "$4" -- {} \;
}
if (($#!=4)); then
echo >&2 "Need 4 arguments"
exit 1
fi
funk "$#"
In cp I'm using
the -n switch: no clobber, so as not to overwrite an existing file. Use it if your version of cp supports it, unless you actually want to overwrite files.
the -v switch: verbose, will show you the copied files (optional).
the -t switch: -t followed by a directory tells cp to copy into this directory. It's a very good thing to use cp this way: imagine that instead of giving an existing directory, you give an existing file: without this feature, that file would get overwritten several times (well, it would be if you omitted the -n option)! With this feature the existing file remains safe.
Also notice the use of --. If your cp and sed support it (it's the case for GNU sed and cp), always use it! It means end of options now. If you don't use it and a filename starts with a hyphen, it would confuse the command, which would try to interpret it as an option. With --, we're safe to pass a filename that may start with a hyphen.
Notice that in the search and replace patterns I replaced all slashes / by their escaped form \/ so as not to clash with the separator in sed if a slash happens to appear in the search or replace string.
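For instance, that substitution turns a pattern containing slashes into one sed can accept after the s/ delimiter (a quick illustration of the ${2//\//\\/} expansion used above):
search='path/with/slashes'
escaped=${search//\//\\/}      # every / becomes \/
printf '%s\n' "$escaped"       # prints: path\/with\/slashes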
Enjoy!
As pointed out, looping over find output is not a good idea. This answer also doesn't support slashes in the search & replace strings.
Check gniourf_gniourf's answer.
How about using find for that?
#!/bin/bash
funk () {
local dir=$1; shift
local search=$1; shift
local replace=$1; shift
local dest=$1; shift
mkdir -p "$dest"
for file in `find $dir -name '*.txt'`; do
sed -i "s/$search/$replace/g" "$file"
cp "$file" "$dest"
done
}
if [[ $# -lt 4 ]] ; then
echo "Need 4 arguments"
exit 2;
fi
funk "$#"
Note, though, that if you have files with the same name in different subdirectories, those will be overwritten. Is that an issue in your case?

bash testing a group of directories for existence

I have documents stored in a file system which includes "daily" directories, e.g. 20050610. In a bash script I want to list the files in a month's worth of these directories. So I'm running a find command: find <path>/200506* -type f >> jun2005.lst. I would like to check that this set of directories is not a null set before executing the find command. However, if I use if [ -d 200506* ] I get a "too many arguments" error. How can I get around this?
Your "too many arguments" error does not come from there being a huge number of files and exceeding the command line argument limit. It comes from having more than one or two directories that match the glob. Your glob "200506*" expands to something like "20050601 20050602 20050603..." and the -d test only expects one argument.
$ mkdir test
$ cd test
$ mkdir a1
$ [ -d a* ] # no error
$ mkdir a2
$ [ -d a* ]
-bash: [: a1: binary operator expected
$ mkdir a3
$ [ -d a* ]
-bash: [: too many arguments
The answer by zed_0xff is on the right track, but I'd use a different approach:
shopt -s nullglob
path='/path/to/dirs'
glob='200506*/'
outfile='jun2005.lst'
dirs=("$path"/$glob) # dirs is an array available to be iterated over if needed
if (( ${#dirs[@]} > 0 ))
then
echo "directories found"
# append may not be necessary here
find "$path"/$glob -type f >> "$outfile"
fi
The position of the quotes in "$path"/$glob versus "$path/$glob" is essential to this working.
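To see why (a quick illustration): only the part of the word outside the quotes is subject to glob expansion.
dirs=("$path"/$glob)    # glob outside the quotes: one array element per matching directory
dirs=("$path/$glob")    # glob inside the quotes: a single literal element ending in 200506*/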
Edit:
Corrections made to exclude files that match the glob (so only directories are included) and to handle the very unusual case of a directory named literally like the glob ("200506*").
prefix="/tmp/path"
glob="200611*"
n_dirs=$(find $prefix -maxdepth 1 -type d -wholename "$prefix/$glob" |wc -l)
if [[ $n_dirs -gt 0 ]];then
find $prefix -maxdepth 2 -type f -wholename "$prefix/$glob"
fi
S=$(echo 200506*)
if [ ${#S} -gt 7 ]; then
echo haz filez!
else
echo no filez
fi
not a very elegant one, but without any external tools/commands (if you don't think of "[" as an external one)
The clue is that if some files matched, the "S" variable will contain their names delimited with spaces. Otherwise it will contain the "200506*" string itself, whose length is exactly 7.
You could use ls like this:
if [ -n "$(ls -d */ 2>/dev/null | grep 200506)" ]; then
# There are directories with this pattern
fi
Because there is a limit on command-line length in most shells, anything like "$(ls -d */ | grep 200506)" or /path/200506* will run the risk of overflowing the limit. I'm not sure if substitutions and glob expansions count towards it in BASH, but I assume so. You would have to test it and check the bash docs and source to be sure.
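As a side note (not part of the original answer), the limit itself is easy to look up, and it applies to argument lists handed to an exec'd external command; expansions consumed by shell builtins like [ are bounded only by memory:
getconf ARG_MAX    # maximum combined size, in bytes, of arguments plus environment for exec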
The answer is in simplifying your question.
find <path>/200506* -type f -exec somescript '{}' \;
Where somescript is a shell script that does the test. Something like this perhaps:
#!/bin/sh
[ -d "$#" ] && echo "$#" >> june2005.lst
Passing june2005.lst to the script (advice: use an environment variable), and dealing with the possibility that 200506* may expand to too huge a set of paths, are left as an exercise for the OP ;)
Integrating the whole thing into a pipeline or adopting a more general scripting language would yield performance boosts by minimizing the number of shells spawned. Now that would be fun. Here is a hint for that: use -exec and another program (awk, perl, etc.) to do the directory test as part of a one-line filter, and keep the >> june2005.lst on the find command.
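A sketch of that hint (assuming GNU find for -maxdepth, and with <path> being the same placeholder as in the question): let find itself pick out the daily directories in one pass, so no per-file shell is spawned and the glob never needs a separate existence test:
find <path> -maxdepth 1 -type d -name '200506*' \
    -exec find {} -type f \; >> june2005.lst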
