Script fails with spaces in directory names - shell

I have a really easy question; I have found a bunch of similar questions answered, but none that solved this for me.
I have a shell script that goes through a directory and prints out the number of files and directories in each subdirectory, followed by the directory name.
However, it fails for directories with spaces in their names: it treats each word as a new argument. I have tried putting $dir in quotes, but that doesn't help, perhaps because it's already inside the echo quotes.
for dir in `find . -mindepth 1 -maxdepth 1 -type d`
do
echo -e "`ls -1 $dir | wc -l`\t$dir"
done
Thanks in advance for your help :)

Warning: Two of the three code samples below use bashisms. Please take care to use the correct one if you need POSIX sh rather than bash.
Don't do any of the things your current script is doing (see the notes at the end of this answer for why). If your real problem does involve using find, you can use it like so:
shopt -s nullglob
while IFS='' read -r -d '' dir; do
files=( "$dir"/* )
printf '%s\t%s\n' "${#files[#]}" "$dir"
done < <(find . -mindepth 1 -maxdepth 1 -type d -print0)
However, for iterating over only immediate subdirectories, you don't need find at all:
shopt -s nullglob
for dir in */; do
files=( "$dir"/* )
printf '%s\t%s\n' "${#files[#]}" "$dir"
done
If you're trying to do this in a way compatible with POSIX sh, you can try the following:
for dir in */; do
[ "$dir" = "*/" ] && continue
set -- "$dir"/*
[ "$#" -eq 1 ] && [ "$1" = "$dir/*" ] && continue
printf '%s\t%s\n' "$#" "$dir"
done
You shouldn't ever use ls in scripts: http://mywiki.wooledge.org/ParsingLs
You shouldn't ever use for to read lines: http://mywiki.wooledge.org/DontReadLinesWithFor
Use arrays and globs when counting files to do this safely, robustly, and without external commands: http://mywiki.wooledge.org/BashFAQ/004
Always NUL-terminate file lists coming out of find -- otherwise, filenames containing newlines (yes, they're legal in UNIX!) can cause a single name to be read as multiple files, or (in some find versions and usages) your "filename" to not match the real file's name. http://mywiki.wooledge.org/UsingFind
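To make that last point concrete, here is a quick, hedged demonstration (the demo directory and file names are made up for illustration):
mkdir -p demo && touch demo/$'bad\nname' demo/'two words'
# word-splitting find's output mangles both names:
for f in $(find demo -type f); do printf '[%s]\n' "$f"; done    # prints 4 fragments
# NUL-delimited output keeps each name intact:
find demo -type f -print0 | while IFS= read -r -d '' f; do printf '[%s]\n' "$f"; done    # prints 2 names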

Related

How to store a list of files; do a test; act on the list in bash?

I'm trying to do what has to be a pretty common workflow:
Use find to build a list of files that I want to act on
Make a test of that list (e.g. that it's not empty)
Send that list to a command
How can I do this?
FILES=$(find $DIR -type f)
[ -z "$FILES" ] && exit 1
cmd "$FILES"
The cmd command doesn't seem to understand that "$FILES" is a list of arguments for it.
Maybe you want to say something like:
declare -a FILES
ifs_bak="$IFS" # backup IFS
IFS=$'\n' # set IFS to "\n" to split the result of find on it
FILES=( $(find "$DIR" -type f) )
IFS="$ifs_bak" # restore IFS
[[ "${#FILES[#]}" -eq 0 ]] && exit 1
cmd "${FILES[#]}"
"$FILES" in your code is nothing but a single concatenated string of filenames (with spaces in between) and cmd will not accept that as a list of arguments. It will be easy to imagine what happens if you say: cmd "file1 file2 file3 ..".
You need to use an array instead. Then you are invoking as: cmd "file1" "file2" "file3" ...
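To make the difference concrete, here is a minimal sketch, using printf '%s\n' as a stand-in for cmd and made-up filenames, showing how the string and the array expand:
files_str="file 1.txt file2.txt"
printf '<%s>\n' "$files_str"          # one argument: <file 1.txt file2.txt>
files_arr=("file 1.txt" "file2.txt")
printf '<%s>\n' "${files_arr[@]}"     # two arguments: <file 1.txt> and <file2.txt>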
As @DavidC.Rankin said in a comment, the simple way to do this is with the find command's -exec primitive. This version will run the command once for each file:
find "$DIR" -type f -exec cmd {} \;
And this will run the command for groups of files:
find "$DIR" -type f -exec cmd {} +
In either case, if there are no files it will not run cmd. The + version might run the command more than once, if there are so many files that the list exceeds the maximum argument list size.
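If you want to see the difference in invocation behaviour for yourself, one hedged way is to substitute echo for cmd (the directory variable is the same placeholder as above):
find "$DIR" -type f -exec echo "one call per file:" {} \;
find "$DIR" -type f -exec echo "one call per batch:" {} +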
If you want more control, you can store a list of files as an array:
files=()
while IFS= read -r -d '' file; do
files+=("$file")
done < <(find "$DIR" -type f -print0)
[[ ${#files[@]} -eq 0 ]] && exit 1
cmd "${files[@]}"
Note that there are a lot of syntactic elements here -- brackets, braces, parentheses, quotes, etc. -- that are absolutely required for this to work right. By the way, the <( ) (process substitution) construct used to capture find's output is a bash-only feature, and isn't available even in bash when it's run under the name sh. So use a bash shebang (#!/bin/bash or #!/usr/bin/env bash), and don't override it by running the script with the sh command.
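The process-substitution point is worth spelling out: piping find into while runs the loop body in a subshell, so anything appended to the array there is lost when the pipeline ends. A minimal sketch of the difference, assuming the current directory as the search root:
files=()
find . -type f -print0 | while IFS= read -r -d '' f; do files+=("$f"); done
echo "${#files[@]}"    # prints 0 -- the additions happened in a subshell
files=()
while IFS= read -r -d '' f; do files+=("$f"); done < <(find . -type f -print0)
echo "${#files[@]}"    # prints the real count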

Using nested For Loop in Bash with IF statement

I have a directory containing a number of .cntl files - I am using a for loop to delete all the files, however I want to keep 2 of the .cntl files. This is a basic version of what I have so far
MY_DIR=/home/shell/
CNTL_FILE_LIST=`find ${MY_DIR}*.cntl -type f`
CNTL_EXCEPTION_LIST="/home/shell/test4.cntl /home/shell/test5.cntl"
I am having some syntax issues with my nested for loop below. I am trying to delete all cntl files in MY_DIR except test4.cntl and test5.cntl
for file in CNTL_FILE_LIST
do
for exception in CNTL_EXCEPTION_LIST
do
if [ "${file}" != ${exception} ]
rm $file
fi
done
done
Can anyone see what I am doing wrong?
In practice, you should let find itself do the work of excluding files, as described in the second part (using -not) of the answer by user unknown. That said, to demonstrate how one might safely use bash for this:
#!/usr/bin/env bash
case $BASH_VERSION in
''|[1-3].*) echo "ERROR: Bash 4.0 or newer required" >&2; exit 1;;
esac
# Use of lowercase names here is deliberate -- POSIX specifies all-caps names for variables
# ...meaningful to the operating system or shell; other names are available for application
# ...use; see http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html,
# fourth paragraph.
my_dir=/home/shell
# Using an associative array rather than a regular one allows O(1) lookup
declare -A cntl_exception_list
cntl_exception_list=(
["${my_dir}/test4.cntl"]=1
["${my_dir}/test5.cntl"]=1
)
while IFS= read -r -d '' file; do
[[ ${cntl_exception_list[$file]} ]] && continue
rm -f -- "$file"
done < <(find "$my_dir" -type f -print0)
Note:
declare -A creates an associative array. These can have arbitrary strings as keys; here, we use the filenames we want to match against as the keys.
Using NUL-delimited filenames (-print0) ensures that even names with whitespace or literal newlines are unambiguously represented.
See BashFAQ #1 for the syntax used for the while read loop.
Well, file4.cntl is != file5.cntl and therefore gets deleted when the two are compared; likewise, file5.cntl gets deleted when compared to file4.cntl.
MY_DIR=/home/shell/
CNTL_FILE_LIST=`find ${MY_DIR}*.cntl -type f`
CNTL_EXCEPTION_LIST="/home/shell/test4.cntl /home/shell/test5.cntl"
for file in CNTL_FILE_LIST
do
for exception in CNTL_EXCEPTION_LIST
do
if [ "${file}" != ${exception} ]
rm $file
fi
done
done
Instead use just find:
find ${MY_DIR} -maxdepth 1 -type f -name "*.cntl" -not -name "file4.cntl" -not -name "file5.cntl" -delete
But not every find supports -delete (GNU find does), and you have to decide whether -maxdepth 1 applies in your case.
Try first with -ls instead of -delete.
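In other words, a hedged dry run of the same command, listing what would be deleted instead of deleting it:
find ${MY_DIR} -maxdepth 1 -type f -name "*.cntl" -not -name "file4.cntl" -not -name "file5.cntl" -ls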
user unknown is right, so you should probably not do it this way.
Instead, you can remove $CNTL_EXCEPTION_LIST from $CNTL_FILE_LIST before doing the deletion.
for i in $CNTL_EXCEPTION_LIST
do
CNTL_FILE_LIST=${CNTL_FILE_LIST//$i/}
done
You can refer to man bash for this usage; just search for "Pattern substitution".
After this, $CNTL_FILE_LIST will NOT include the exceptions any more, and now you can safely delete the rest with rm $CNTL_FILE_LIST.
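A minimal illustration of that Pattern substitution, ${var//pattern/}, with made-up values (note the pattern is held in a variable, just as in the loop above, so slashes in it need no escaping):
list="/home/shell/test1.cntl /home/shell/test4.cntl /home/shell/test5.cntl"
i=/home/shell/test4.cntl
echo "${list//$i/}"    # -> "/home/shell/test1.cntl  /home/shell/test5.cntl"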

Add .old to files without .old in them, having trouble with which variable to use?

#!/bin/bash
for filenames in $( ls $1 )
do
echo $filenames | grep "\.old$"
if [ ! $filenames = 0 ]
then
$( mv "$1/$filenames" "$1/$filenames.old" )
fi
done
So I think most of the script works. It is intended to take the output of ls for a directory given as the first parameter, and search for any files with .old at the end. Any files that do not contain .old will then be renamed.
The script successfully renames the files, but it will add .old to a file already containing the extension. I am assuming that the if variable is wrong, but I cannot figure out which variable to use in this case.
The answer is in the key, but if anyone needs to do this, here is an even easier way:
#!/bin/bash
for filenames in $( ls $1 | grep -v "\.old$" )
do
$( mv "$1/$filenames" "$1/$filenames.old" )
done
Use `find` for this
find /directory/here -type f ! -iname "*.old" -exec mv {} {}.old \;
Problems with the original approach
for filenames in $( ls $1 ): never parse ls output (see http://mywiki.wooledge.org/ParsingLs).
Variables are not double quoted, say in if [ ! $filenames = 0 ]. This results in word-splitting. Use "$filenames" unless you expect word splitting.
So the final script would be
#!/bin/bash
if [ -d "$1" ]
then
find "$1" -type f ! -iname "*.old" -exec mv {} {}.old \;
# use -maxdepth 1 with find if you don't wish to recursively check subdirectories
else
echo "Directory : $1 doesn't exist !"
fi
Usage
./script '/path/to/directory'
Don't use ls in scripts.
#!/bin/bash
for filename in "$1"/*
do
case $filename in *.old) continue;; esac
mv "$filename" "$filename.old"
done
I prefer case over if because it supports wildcard matching naturally and portably. (You could run this with /bin/sh just as well.) If you wanted to use if instead, that'd be
if echo "$filename" | grep -q '\.old$'; then
or more idiomatically, but recent shells only,
if [[ "$filename" == *.old ]]; then
You want to avoid calling additional utilities if simple shell builtins will do. Why? Each additional utility you call (grep, etc.) spawns and runs as a separate process; if you spawn one for every iteration of your loop, things will really slow down. If the shell doesn't provide a feature, then sure, calling a utility is the right thing to do.
As mentioned above, shell globbing along with parameter expansion with substring removal provides a simple test for determining if a file has an .old extension. All you need is:
for i in "$1"/*; do
[ "${i##*.}" = "old" ] || mv "$i" "${i}.old"
done
(note: this will skip adding the .old extension to a file simply named 'old', but that can be handled separately if needed -- it is unlikely to come up. Additionally, the solution with find is a fine approach as well)
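If that edge case, or a path containing dots elsewhere, ever matters, a hedged variation is to test for the literal .old suffix via suffix removal rather than extracting the last extension:
for i in "$1"/*; do
# ${i%.old} differs from $i only when the name really ends in .old
[ "${i%.old}" != "$i" ] || mv "$i" "${i}.old"
done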
I solved the problem, as I was misled by my instructor!
$? is the variable that holds the exit status of the most recently executed foreground pipeline (which here is the grep). The new code is unedited except for
if [ ! $? = 0 ]

how many files find found?

I'm writing a script where I want to error out if the file I'm searching for exists in multiple locations, and tell the user the locations (the find results). So I've got a find like:
file_location=$(find $dir -name $file -print)
I'm thinking it should be simple to see if the file is found in multiple places, but I must not be matching what find uses to separate results with (seems like space sometimes, and a newline others). As such, rather than matching on that, I want to see if there are any characters after $file in $file_location.
I'm checking for
echo "$file_location" | grep -q "${file}."; then
and this still doesn't work. So I guess I don't care what I use, except I want to capture $file_location as a result of the find, and then check that. Can you suggest a good way?
Something like the following, if you want to avoid errors on EOLs (newlines in names) and such:
files=()
while IFS= read -d $'\0' -r match; do
files+=("$match")
done < <(find "$dir" -name "$file" -print0)
((${#files[@]} > 1)) && printf '%s\n' "${files[@]}"
Or in bash 4+
shopt -s globstar dotglob
files=("$dir"/**/"$file")
((${#files[@]} > 1)) && printf '%s\n' "${files[@]}"
found=$(find "$dir" -name "$file" -ls)
count=$(wc -l <<< "$found")
if [ "$count" -gt 1 ]
then
echo "I found more than one:"
echo "$found"
fi
For zero matches you will still get a count of 1, because of the non-obvious way the shell strips the trailing newline inside $( ): one line of output and zero lines of output both end up looking like one line. See xxd <<< "" for a demonstration of the newline that gets appended when the string is used as input again. A simple way to circumvent this is to add a fake newline at the beginning of the string, so an empty string can never occur: found=$(echo; find …), and then subtract one from the number of lines.
EDIT: I changed the usage of -printf "%p\n" in my answer to -ls which performs a proper quoting of newlines. Otherwise file names with newlines in them would mess up the counting.
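A sketch of that zero-match workaround, under the same assumptions as the snippet above (prepend one fake line, then subtract one from the count):
found=$(echo; find "$dir" -name "$file" -ls)
count=$(( $(wc -l <<< "$found") - 1 ))
if [ "$count" -gt 1 ]
then
echo "I found more than one:"
echo "${found#$'\n'}"    # strip the fake leading newline before printing
fi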
If you specify the full name in the find command, the matches on name will be unique. That is, if you say find -name "hello.txt", just files named hello.txt will be found.
What you can do is something like
find $dir -name $file -printf '.'
The -printf '.' prints one . for every match found. Then, to see how many files with this name were found, it is just a matter of counting the number of dots you got as output.
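Counting those dots is then a one-liner; a hedged sketch (wc -c counts the bytes, one per match; -printf is a GNU find extension):
count=$(find "$dir" -name "$file" -printf '.' | wc -c)
if [ "$count" -gt 1 ]
then
echo "$file was found in $count places"
fi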
No need for find here if you're running a new (4.0+) bash which can do recursive globbing itself; just load glob results directly into a shell array, and check its length:
shopt -s nullglob globstar # enable recursive globbing, and null results
file_locations=( "$dir"/**/"$file" )
echo "${#file_locations[#]} files named $file found under $dir; they are:"
printf ' %q\n' "${file_locations[#]}"
If you don't want to mess with nullglob, then:
shopt -s globstar # enable recursive globbing
file_locations=( "$dir"/**/"$file" )
# without nullglob, a failed match will return the glob expression itself
# to test for this, see if our first entry exists
if [[ -e ${file_locations[0]} ]]; then
echo "${#file_locations[@]} files named $file found under $dir; they are:"
printf ' %q\n' "${file_locations[@]}"
else
echo "No instances of $file found under $dir"
fi
You can still use an array to unambiguously read find results on old versions of bash; unlike more naive approaches, this will work even when file or directory names contain literal newlines:
file_locations=( )
while IFS= read -r -d '' filename; do
file_locations+=( "$filename" )
done < <(find "$dir" -type f -name "$file" -print0)
echo "${#file_locations[#]} files named $file found under $dir; they are:"
printf ' %q\n' "${file_locations[#]}"
I recommend using:
find . -name blong.txt -print0
Which tells find to join its output together with null \0 characters. Makes it easier to use awk with the -F flag or xargs with the -0 flag.
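If you only need the number of matches, one hedged way to count NUL-delimited results without being tripped up by newlines in names is:
count=$(find . -name blong.txt -print0 | tr -cd '\0' | wc -c)    # one NUL byte per match
echo "$count match(es)"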
Try:
N=0
for i in `find $dir -name $file -printf '. '`
do
N=$((N+1))
done
echo $N

find command with filename coming from bash printf builtin not working

I'm trying to write a script which lists files in a directory and then searches for each one, one by one, in another directory. For dealing with spaces and special characters like "[" or "]" I'm using $(printf %q "$FILENAME") as input for the find command: find /directory/to/search -type f -name $(printf %q "$FILENAME").
It works like a charm for every filename except in one case: when there are multibyte characters (UTF-8). In that case the output of printf is an ANSI-C quoted string, i.e.: $'file name with blank spaces and quoted characters in the form of \NNN\NNN', and that string is not expanded back out of the $'' quoting, so find searches for a file whose name literally includes the quoting: «$'filename'».
Is there an alternative solution in order to be able to pass to find any kind of filename?
My script is like follows (I know some lines can be deleted, like the "RESNAME="):
#!/bin/bash
if [ -d $1 ] && [ -d $2 ]; then
IFSS=$IFS
IFS=$'\n'
FILES=$(find $1 -type f )
for FILE in $FILES; do
BASEFILE=$(printf '%q' "$(basename "$FILE")")
RES=$(find $2 -type f -name "$BASEFILE" -print )
if [ ${#RES} -gt 1 ]; then
RESNAME=$(printf '%q' "$(basename "$RES")")
else
RESNAME=
fi
if [ "$RESNAME" != "$BASEFILE" ]; then
echo "FILE NOT FOUND: $FILE"
fi
done
else
echo "Directories do not exist"
fi
IFS=$IFSS
As an answer said, I've used associative arrays, but with no luck; maybe I'm not using the arrays correctly, but echoing the array (array[@]) returns nothing. This is the script I've written:
#!/bin/bash
if [ -d "$1" ] && [ -d "$2" ]; then
declare -A files
find "$2" -type f -print0 | while read -r -d $'\0' FILE;
do
BN2="$(basename "$FILE")"
files["$BN2"]="$BN2"
done
echo "${files[#]}"
find "$1" -type f -print0 | while read -r -d $'\0' FILE;
do
BN1="$(basename "$FILE")"
if [ "${files["$BN1"]}" != "$BN1" ]; then
echo "File not found: "$BN1""
fi
done
fi
Don't use for loops. First, it is slower: your find has to complete before the rest of your program can run. Second, it is possible to overload the command line -- the entire for command must fit in the command-line buffer.
Most importantly of all, for sucks at handling funky file names. You're going through contortions trying to get around this. However:
find $1 -type f -print0 | while read -r -d $'\0' FILE
will work much better. It handles file names -- even file names that contain \n characters. The -print0 tells find to separate file names with the NUL character. The while read -r -d $'\0' FILE will read each file name (separated by the NUL character) into $FILE.
If you put quotes around the file name in the find command, you don't have to worry about special characters in the file names.
Your script is running find once for each file found. If you have 100 files in your first directory, you're running find 100 times.
Do you know about associative (hash) arrays in BASH? You are probably better off using associative arrays. Run find on the first directory, and store those file names in an associative array.
Then, run find (again using the find | while read syntax) for your second directory. For each file you find in the second directory, see if you have a matching entry in your associative array. If you do, you know that file is in both arrays.
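Here is a hedged bash sketch of that associative-array approach (directory arguments assumed to be $1 and $2, as in the question), before moving on to the Perl version below:
declare -A in_first
while IFS= read -r -d '' f; do
in_first["${f##*/}"]=1
done < <(find "$1" -type f -print0)
while IFS= read -r -d '' f; do
name=${f##*/}
[[ ${in_first[$name]} ]] && echo "$name is present in both directories"
done < <(find "$2" -type f -print0)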
Addendum
I've been looking at the find command. It appears there's no real way to prevent it from using pattern matching except through a lot of work (like you were doing with printf). I've tried using -regex matching and using \Q and \E to remove the special meaning of pattern characters. I haven't been successful.
There comes a time that you need something a bit more powerful and flexible than shell to implement your script, and I believe this is the time.
Perl, Python, and Ruby are three fairly ubiquitous scripting languages found on almost all Unix systems and are available on other non-POSIX platforms (cough! ...Windows!... cough!).
Below is a Perl script that takes two directories, and searches them for matching files. It uses the find command once and uses associative arrays (called hashes in Perl). I key the hash to the name of my file. In the value portion of the hash, I store an array of the directories where I found this file.
I only need to run the find command once per directory. Once that is done, I can print out all the entries in the hash that contain more than one directory.
I know it's not shell, but this is one of the cases where you can spend a lot more time trying to figure out how to get shell to do what you want than it's worth.
#! /usr/bin/env perl
use strict;
use warnings;
use feature qw(say);
use File::Find;
use constant DIRECTORIES => qw( dir1 dir2 );
my %files;
#
# Perl version of the find command. You give it a list of
# directories and a subroutine for filtering what you find.
# I am basically rejecting all non-file entries, then pushing
# them into my %files hash as an array.
#
find (
sub {
return unless -f;
$files{$_} = [] if not exists $files{$_};
push @{ $files{$_} }, $File::Find::dir;
}, DIRECTORIES
);
#
# All files are found and in %files hash. I can then go
# through all the entries in my hash, and look for ones
# with more than one directory in the array reference.
# IF there is more than one, the file is located in multiple
# directories, and I print them.
#
for my $file ( sort keys %files ) {
if ( @{ $files{$file} } > 1 ) {
say "File: $file: " . join ", ", @{ $files{$file} };
}
}
Try something like this:
find "$DIR1" -printf "%f\0" | xargs -0 -i find "$DIR2" -name \{\}
How about this one-liner?
find dir1 -type f -exec bash -c 'read < <(find dir2 -name "${1##*/}" -type f)' _ {} \; -printf "File %f is in dir2\n" -o -printf "File %f is not in dir2\n"
Absolutely 100% safe regarding files with funny symbols, newlines and spaces in their name.
How does it work?
find (the main one) will scan through directory dir1 and for each file (-type f) will execute
read < <(find dir2 -name "${1##*/}" -type f)
with argument the name of the current file given by the main find. This argument is at position $1. The ${1##*/} removes everything before the last / so that if $1 is path/to/found/file the find statement is:
find dir2 -name "file" -type f
This outputs something if file is found, otherwise has no output. That's what is read by the read bash command. read's exit status is true if it was able to read something, and false if there wasn't anything read (i.e., in case nothing is found). This exit status becomes bash's exit status which becomes -exec's status. If true, the next -printf statement is executed, and if false, the -o -printf part will be executed.
If your dirs are given in variables $dir1 and $dir2 do this, so as to be safe regarding spaces and funny symbols that could occur in $dir2:
find "$dir1" -type f -exec bash -c 'read < <(find "$0" -name "${1##*/}" -type f)' "$dir2" {} \; -printf "File %f is in $dir2\n" -o -printf "File %f is not in $dir2\n"
Regarding efficiency: this is of course not an efficient method at all! the inner find will be executed as many times as there are found files in dir1. This is terrible, especially if the directory tree under dir2 is deep and has many branches (you can rely a little bit on caching, but there are limits!).
Regarding usability: you have fine-grained control on how both find's work and on the output, and it's very easy to add many more tests.
So, hey, tell me how to compare files from two directories? Well, if you agree on losing a little bit of control, this will be the shortest and most efficient answer:
diff dir1 dir2
Try it, you'll be amazed!
Since you are only using find for its recursive directory following, it will be easier to simply use the globstar option in bash. (You're using associative arrays, so your bash is new enough).
#!/bin/bash
shopt -s globstar
declare -A files
if [[ -d $1 && -d $2 ]]; then
for f in "$2"/**/*; do
[[ -f "$f" ]] || continue
BN2=$(basename "$f")
files["$BN2"]=$BN2
done
echo "${files[#]}"
for f in "$1"/**/*; do
[[ -f "$f" ]] || continue
BN1=$(basename "$f")
if [[ ${files[$BN1]} != $BN1 ]]; then
echo "File not found: $BN1"
fi
done
fi
** will match zero or more directories, so $1/**/* will match all the files and directories in $1, all the files and directories in those directories, and so forth all the way down the tree.
If you want to use associative arrays, here's one possibility that will work well with files with all sorts of funny symbols in their names (this script has too much to just show the point, but it is usable as is – just remove the parts you don't want and adapt to your needs):
#!/bin/bash
die() {
printf "%s\n" "$#"
exit 1
}
[[ -n $1 ]] || die "Must give two arguments (none found)"
[[ -n $2 ]] || die "Must give two arguments (only one given)"
dir1=$1
dir2=$2
[[ -d $dir1 ]] || die "$dir1 is not a directory"
[[ -d $dir2 ]] || die "$dir2 is not a directory"
declare -A dir1files
declare -A dir2files
while IFS=$'\0' read -r -d '' file; do
dir1files[${file##*/}]=1
done < <(find "$dir1" -type f -print0)
while IFS=$'\0' read -r -d '' file; do
dir2files[${file##*/}]=1
done < <(find "$dir2" -type f -print0)
# Which files in dir1 are in dir2?
for i in "${!dir1files[#]}"; do
if [[ -n ${dir2files[$i]} ]]; then
printf "File %s is both in %s and in %s\n" "$i" "$dir1" "$dir2"
# Remove it from dir2 hash
unset dir2files["$i"]
else
printf "File %s is in %s but not in %s\n" "$i" "$dir1" "$dir2"
fi
done
# Which files in dir2 are not in dir1?
# Since I unset them from dir2files hash table, the only keys remaining
# correspond to files in dir2 but not in dir1
if [[ -n "${!dir2files[#]}" ]]; then
printf "File %s is in %s but not in %s\n" "$dir2" "$dir1" "${!dir2files[#]}"
fi
Remark. The identification of files is only based on their filenames, not their contents.
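If contents should be compared as well (an assumption beyond the script above, which only looks at names), one hedged extension is to store the full path as the hash value instead of 1, and then run cmp on name matches:
declare -A dir1paths dir2paths
while IFS= read -r -d '' file; do dir1paths["${file##*/}"]=$file; done < <(find "$dir1" -type f -print0)
while IFS= read -r -d '' file; do dir2paths["${file##*/}"]=$file; done < <(find "$dir2" -type f -print0)
for i in "${!dir1paths[@]}"; do
[[ ${dir2paths[$i]+set} ]] || continue    # name not present in dir2
if cmp -s -- "${dir1paths[$i]}" "${dir2paths[$i]}"; then
printf "File %s has identical contents in %s and %s\n" "$i" "$dir1" "$dir2"
else
printf "File %s exists in both but its contents differ\n" "$i"
fi
done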
