Looping over directories in Bash

I have a fundamental question about how bash works, and a related practical question.
Fundamental question: suppose I am in a directory that has three subdirectories: a, b, and c.
Then the code
for dir in $(ls)
do
    echo $dir
done
spits out:
a b c
a b c
a b c
i.e., dir always stores a list of all of the files/directories in my cwd. My question is: why in the world would this be convenient? In my opinion it is far more useful and intuitive to have dir store one element at a time, i.e. I would want the output
a
b
c
Also, as per one of the answers, it is wrong to use for dir in $(ls); but when I use for dir in $(ls -l) I get even more copies of a b c (more than there are directories/files in the cwd). Why is that?
My second question is practical: how do I loop over all the directories (not files!) in my cwd that start with capital W? I started with
for dir in `ls -l W*`
but this fails because of a) the reason in question 1 and b) the fact that it doesn't exclude files. Suggestions appreciated.

Never ever parse the output of ls like this (Why you shouldn't parse the output of ls(1)).
Also, your syntax is wrong. You don't mean (), you mean $().
That being said, to loop over directories starting with W you would do (or use the find command instead, depending on your scenario):
for path in /my/path/W*; do
    [ -d "${path}" ] || continue  # if not a directory, skip
    dirname="$(basename "${path}")"
    do_stuff
done
As for the output you get from the evil ls-loop, it should not look like that. This is the expected output and demonstrates why you do not want to use ls in the first place:
$ find
.
./c
./a
./foo bar
./b
$ type ls
ls is hashed (/bin/ls)
$ for x in $(ls); do echo "${x}"; done
a
b
c
foo
bar
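As for the extra copies you saw with ls -l: the long listing emits a "total" line plus many whitespace-separated fields per entry (permissions, link count, owner, group, size, date, name), and the unquoted $(...) word-splits all of them, so the loop iterates over every field rather than just the names. A sketch of the kind of output to expect (abbreviated; your field values will differ):
$ for x in $(ls -l); do echo "${x}"; done
total
16
drwxr-xr-x
2
user
user
4096
...
a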

This should work:
shopt -s nullglob  # an empty directory will yield an empty list
for dir in ./*/; do
    echo "$dir"  # dir matches directories only because of the / after *
done
To recurse into subdirectories too, use globstar:
shopt -s globstar nullglob
for dir in ./**/; do
    echo "$dir"  # dir matches directories only because of the / after **
done
You can make @Adrian Frühwirth's method recursive into subdirectories by using globstar too:
shopt -s globstar
for dir in ./**; do
    [[ ! -d $dir ]] && continue  # if not a directory, skip
    echo "$dir"
done
From Bash Manual:
globstar
If set, the pattern ‘**’ used in a filename expansion context will
match all files and zero or more directories and subdirectories. If
the pattern is followed by a ‘/’, only directories and subdirectories
match.
nullglob
If set, Bash allows filename patterns which match no files to expand
to a null string, rather than themselves.
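To see concretely what nullglob changes, here is a minimal sketch (assuming /tmp/empty is an existing, empty directory):
shopt -u nullglob
for dir in /tmp/empty/*/; do echo "$dir"; done  # prints the literal pattern /tmp/empty/*/
shopt -s nullglob
for dir in /tmp/empty/*/; do echo "$dir"; done  # prints nothing; the loop body never runs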

Well, what you are seeing is not what you are expecting. The output you are seeing is not from the echo command, but from the dir command.
Try the following:
ls -1 | while read line; do
    if [ -d "$line" ]; then
        echo "$line"
    fi
done
for files in $(ls); do
    if [ -d "$files" ]; then
        echo "$files"
    fi
done

Related

How to list subdirectories with white space in their names in bash

I've found a few similar questions to this but surprisingly none of them work for me.
I have this written in a script:
for d in $(ls -d "$1"); do
    echo $d
done
$1 is the parent directory for which I wish to print out the list of subdirectories for, however, running this prints out, for example, a directory named "dir with spaces" as 3 words on separate lines.
You can use shell globbing instead of command substitution; globbing doesn't suffer from the word-splitting problem:
# include dotfiles, and don't iterate at all in an empty directory
shopt -s dotglob nullglob
for d in "$1"/*; do
    echo "$d"
done
Or you can resort to pretty common find ... -print0 | xargs -0 ... pattern.
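For reference, a minimal sketch of that NUL-delimited pattern, here with a while read loop in place of xargs so you can run arbitrary shell code per directory; it is safe for names containing spaces and even newlines:
find "$1" -mindepth 1 -maxdepth 1 -type d -print0 |
while IFS= read -r -d '' d; do
    echo "$d"
done
Note that the while body runs in a subshell because of the pipe, so variables set inside it won't be visible afterwards; done < <(find ...) avoids that if you need it.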

Add .old to files without .old in them, having trouble with which variable to use?

#!/bin/bash
for filenames in $( ls $1 )
do
    echo $filenames | grep "\.old$"
    if [ ! $filenames = 0 ]
    then
        $( mv "$1/$filenames" "$1/$filenames.old" )
    fi
done
So I think most of the script works. It is intended to take the output of ls for the directory given as the first parameter and search for any files ending in .old; any files that do not end in .old should then be renamed.
The script successfully renames the files, but it also adds .old to files that already have the extension. I am assuming the variable in the if test is wrong, but I cannot figure out which variable to use in this case.
The answer is in the key, but if anyone needs to do this, here is an even easier way:
#!/bin/bash
for filenames in $( ls $1 | grep -v "\.old$" )
do
    mv "$1/$filenames" "$1/$filenames.old"
done
Use find for this:
find /directory/here -type f ! -iname "*.old" -exec mv {} {}.old \;
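Note that substituting {} inside a larger word like {}.old works with GNU find but is not something POSIX guarantees; a sketch that avoids relying on it spawns a small shell per file:
find /directory/here -type f ! -iname "*.old" -exec sh -c 'mv "$1" "$1.old"' _ {} \;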
Problems with the original approach:
for filenames in $( ls $1 ): never parse ls output (see Why you shouldn't parse the output of ls(1), linked above).
Variables are not double quoted, as in if [ ! $filenames = 0 ]. This results in word splitting. Use "$filenames" unless you expect word splitting.
So the final script would be
#!/bin/bash
if [ -d "$1" ]
then
find "$1" -type f ! -iname "*.old" -exec mv {} {}.old \;
# use -maxdepth 1 with find if you don't wish to recursively check subdirectories
else
echo "Directory : $1 doesn't exist !"
fi
Usage
./script '/path/to/directory'
Don't use ls in scripts.
#!/bin/bash
for filename in "$1"/*
do
case $filename in *.old) continue;; esac
mv "$filename" "$filename.old"
done
I prefer case over if because it supports wildcard matching naturally and portably. (You could run this with /bin/sh just as well.) If you wanted to use if instead, that'd be
if echo "$filename" | grep -q '\.old$'; then
or more idiomatically, but recent shells only,
if [[ "$filename" == *.old ]]; then
You want to avoid calling additional utilities if simple shell builtins will do. Why? Each additional utility you call (grep, etc.) spawns and runs in a separate process of its own; if you spawn a new process for every iteration of your loop, things will really slow down. If the shell doesn't provide a feature, then sure, calling a utility is the right thing to do.
As mentioned above, shell globbing along with parameter expansion with substring removal provides a simple test for determining if a file has an .old extension. All you need is:
for i in "$1"/*; do
[ "${i##*.}" = "old" ] || mv "$i" "${i}.old"
done
(Note: this will skip adding the .old extension to a single file named old, but that edge case can be handled separately if needed; it is unlikely. The solution with find is a fine approach as well.)
I solved the problem; I was misled by my instructor!
$? is the variable that holds the exit status of the most recently executed foreground pipeline (which here is the grep). The new code is unedited except for
if [ ! $? = 0 ]
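For completeness, a sketch of the corrected script as described (unchanged apart from the test; it is still subject to the ls-parsing caveats raised in the answers above):
#!/bin/bash
for filenames in $( ls $1 )
do
    echo $filenames | grep "\.old$"
    if [ ! $? = 0 ]
    then
        mv "$1/$filenames" "$1/$filenames.old"
    fi
done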

Filenames with wildcards in variables

#!/bin/bash
outbound=/home/user/outbound/
putfile=DATA_FILE_PUT_*.CSV
cd $outbound
filecnt=0
for file in $putfile; do let filecnt=filecnt+1; done
echo "Filecount: " $filecnt
So this code works well when there are files located in the outbound directory. I can place files into the outbound path and as long as they match the putfile mask then the files are incremented as expected.
Where the problem comes in is if I run this while there are no files located in $outbound.
If there are zero files there $filecnt still returns a 1 but I'm looking to have it return a 0 if there are no files there.
Am I missing something simple?
Put set -x just below the #! line to watch what your script is doing.
If there is no matching file, then the wildcard is left unexpanded, and the loop runs once, with file having the value DATA_FILE_PUT_*.CSV.
To change that, set the nullglob option. Note that this only works in bash, not in sh.
shopt -s nullglob
putfile=DATA_FILE_PUT_*.CSV
for file in $putfile; do let filecnt=filecnt+1; done
Note that the putfile variable contains the wildcard pattern, not the list of file names. It might make more sense to put the list of matches in a variable instead. This needs to be an array variable, and you need to change the current directory first. The number of matching files is then the length of the array.
#!/bin/bash
shopt -s nullglob
outbound=/home/user/outbound/
cd "$outbound"
putfiles=(DATA_FILE_PUT_*.CSV)
echo "Filecount: " ${#putfiles}
If you need to iterate over the files, take care to protect the expansion of the array with double quotes, otherwise if a file name contains whitespace then it will be split over several words (and if a filename contains wildcard characters, they will be expanded).
#!/bin/bash
shopt -s nullglob
outbound=/home/user/outbound/
cd "$outbound"
putfiles=(DATA_FILE_PUT_*.CSV)
for file in "${putfiles[#]}"; do
echo "Processing $file"
done
You could test if file exists first
for file in $putfile; do
    if [ -f "$file" ]; then
        let filecnt=filecnt+1
    fi
done
Or look for your files with find
for file in $(find . -type f -name "$putfile"); do
    let filecnt=filecnt+1
done
or simply (fixed)
filecnt=$(find . -type f -name "$putfile" | wc -l); echo $filecnt
This is because when no matches are found, bash by default expands the wildcard DATA_FILE_PUT_*.CSV to the word DATA_FILE_PUT_*.CSV and therefore you end up with a count of 1.
To disable this behavior, use shopt -s nullglob
Not sure why you need a piece of code here. The following one-liner should do your job.
ls ${outbound}/${putfile} | wc -l
Or
find ${outbound} -maxdepth 1 -type f -name "${putfile}" | wc -l

How to get the number of files in a folder as a variable?

Using bash, how can one get the number of files in a folder, excluding directories, from a shell script without the interpreter complaining?
With the help of a friend, I've tried
$files=$(find ../ -maxdepth 1 -type f | sort -n)
$num=$("ls -l" | "grep ^-" | "wc -l")
which returns from the command line:
../1-prefix_blended_fused.jpg: No such file or directory
ls -l : command not found
grep ^-: command not found
wc -l: command not found
respectively. These commands work on the command line, but NOT with a bash script.
Given a folder filled with image files named like 1-pano.jpg, I want to grab all the images in the directory to get the largest-numbered file to tack onto the next image being processed.
Why the discrepancy?
The quotes are causing the error messages.
To get a count of files in the directory:
shopt -s nullglob
numfiles=(*)
numfiles=${#numfiles[@]}
which creates an array and then replaces it with the count of its elements. This will include files and directories, but not dotfiles or . or .. or other dotted directories.
Use nullglob so an empty directory gives a count of 0 instead of 1.
You can instead use find -type f or you can count the directories and subtract:
# continuing from above
numdirs=(*/)
numdirs=${#numdirs[@]}
(( numfiles -= numdirs ))
Also see "How can I find the latest (newest, earliest, oldest) file in a directory?"
You can have as many spaces as you want inside an execution block; they often aid readability. The only downside is that they make the file a little larger and may slow initial parsing (only) slightly. There are a few places that must have spaces (e.g. around [, [[, ], ]] and = in comparisons) and a few that must not (e.g. around = in an assignment).
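A quick sketch of those whitespace rules:
count=0                      # no spaces around = in an assignment
if [ "$count" = "0" ]; then  # spaces required around [, ], and = in a test
    echo "nothing counted"
fi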
ls -l | grep -v ^d | wc -l
One line.
How about:
count=$(find .. -maxdepth 1 -type f | wc -l)
echo $count
let count=count+1 # Increase by one, for the next file number
echo $count
Note that this solution is not efficient: it spawns subprocesses for the find and wc commands, but it should work.
file_num=$(ls -1 --file-type | grep -v '/$' | wc -l)
This is a bit more lightweight than a find command, and counts all files of the current directory.
The most straightforward, reliable way I can think of is using the find command to create a reliably countable output.
Counting characters output of find with wc:
find . -maxdepth 1 -type f -printf '.' | wc --chars
or string length of the find output:
a=$(find . -maxdepth 1 -type f -printf '.')
echo ${#a}
or using find output to populate an arithmetic expression:
echo $(($(find . -maxdepth 1 -type f -printf '+1')))
Simple efficient method:
#!/bin/bash
RES=$(find "${SOURCE}" -type f | wc -l)
Get rid of the quotes. The shell is treating them like one file, so it's looking for "ls -l".
Remove the quotes and you will be fine.
Expanding on the accepted answer (by Dennis W): when I tried this approach I got incorrect counts for dirs without subdirs in Bash 4.4.5.
The issue is that by default nullglob is not set in Bash, so numdirs=(*/) creates a 1-element array containing the glob pattern */. Likewise, I suspect numfiles=(*) would have 1 element for an empty folder.
Setting shopt -s nullglob resolves the issue for me. For an excellent discussion of why nullglob is not set by default in Bash, see the answer here: Why is nullglob not default?
Note: I would have commented on the answer directly but lack the reputation points.
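A quick sketch showing the difference the option makes (run in an empty scratch directory):
cd "$(mktemp -d)"  # empty temporary directory
shopt -u nullglob; numdirs=(*/); echo ${#numdirs[@]}  # prints 1 (the literal pattern */)
shopt -s nullglob; numdirs=(*/); echo ${#numdirs[@]}  # prints 0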
Here's one way you could do it as a function. Note: you can pass this example dirs (for a directory count), files (for a file count), or all (for a count of everything in a directory). It does not traverse the tree, as we aren't looking to do that.
function get_counts_dir() {
    # -- handle inputs (e.g. get_counts_dir "files" /path/to/folder)
    [[ -z "$1" ]] && type="files" || type="${1,,}"
    [[ -z "$2" ]] && dir="$(pwd)" || dir="$2"
    shopt -s nullglob
    prevdir=$(pwd)
    cd "${dir}" || return 1
    numfiles=(*)
    numfiles=${#numfiles[@]}
    numdirs=(*/)
    numdirs=${#numdirs[@]}
    # -- handle input types files/dirs/or both
    result=0
    case "${type}" in
        "files")
            result=$(( numfiles - numdirs ))
            ;;
        "dirs")
            result=${numdirs}
            ;;
        *)  # -- returns all files/dirs
            result=${numfiles}
            ;;
    esac
    cd "${prevdir}" || return 1
    shopt -u nullglob
    # -- return result --
    [[ -z ${result} ]] && echo 0 || echo ${result}
}
Examples of using the function:
folder="/home"
get_counts_dir "files" "${folder}"
get_counts_dir "dirs" "${folder}"
get_counts_dir "both" "${folder}"
This will print something like:
2
4
6
Short and sweet method which also ignores symlinked directories.
count=$(ls -l | grep ^- | wc -l)
or if you have a target:
count=$(ls -l /path/to/target | grep ^- | wc -l)

bash testing a group of directories for existence

I have documents stored in a file system which includes "daily" directories, e.g. 20050610. In a bash script I want to list the files in a month's worth of these directories, so I'm running a find command: find <path>/200506* -type f >> jun2005.lst. I would like to check that this set of directories is not a null set before executing the find command. However, if I use if [ -d 200506* ] I get a "too many arguments" error. How can I get around this?
Your "too many arguments" error does not come from there being a huge number of files and exceeding the command line argument limit. It comes from having more than one or two directories that match the glob. Your glob "200506*" expands to something like "20050601 20050602 20050603..." and the -d test only expects one argument.
$ mkdir test
$ cd test
$ mkdir a1
$ [ -d a* ] # no error
$ mkdir a2
$ [ -d a* ]
-bash: [: a1: binary operator expected
$ mkdir a3
$ [ -d a* ]
-bash: [: too many arguments
The answer by zed_0xff is on the right track, but I'd use a different approach:
shopt -s nullglob
path='/path/to/dirs'
glob='200506*/'
outfile='jun2005.lst'
dirs=("$path"/$glob) # dirs is an array available to be iterated over if needed
if (( ${#dirs[@]} > 0 ))
then
    echo "directories found"
    # append may not be necessary here
    find "$path"/$glob -type f >> "$outfile"
fi
The position of the quotes in "$path"/$glob versus "$path/$glob" is essential to this working.
Edit:
Corrections made to exclude files that match the glob (so only directories are included) and to handle the very unusual case of a directory named literally like the glob ("200506*").
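To illustrate the quoting point: inside double quotes the * is literal, so "$path/$glob" could only ever match a directory literally named 200506*, whereas "$path"/$glob protects the (possibly space-containing) prefix while leaving the glob free to expand. A sketch, assuming daily directories like 20050601 exist under the path:
path='/path/to/dirs'
glob='200506*/'
echo "$path"/$glob  # glob expands: /path/to/dirs/20050601/ /path/to/dirs/20050602/ ...
echo "$path/$glob"  # stays literal: /path/to/dirs/200506*/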
prefix="/tmp/path"
glob="200611*"
n_dirs=$(find $prefix -maxdepth 1 -type d -wholename "$prefix/$glob" | wc -l)
if [[ $n_dirs -gt 0 ]]; then
    find $prefix -maxdepth 2 -type f -wholename "$prefix/$glob"
fi
S=$(echo 200506*)
if [ "$S" != "200506*" ]; then
    echo haz filez!
else
    echo no filez
fi
Not a very elegant one, but it works without any external tools/commands (if you don't think of [ as an external one).
The clue is: if some files matched, the S variable will contain their names delimited with spaces; otherwise it will contain the "200506*" string itself.
You could use ls like this:
if [ -n "$(ls -d 200506*/ 2>/dev/null)" ]; then
    # There are directories with this pattern
fi
Because there is a limit on command-line length in most shells, anything like "$(ls -d 200506*/)" or /path/200506* will run the risk of overflowing the limit. I'm not sure whether substitutions and glob expansions count towards it in Bash, but I assume so. You would have to test it and check the bash docs and source to be sure.
The answer is in simplifying your question.
find <path>/200506* -type f -exec somescript '{}' \;
Where somescript is a shell script that does the test. Something like this perhaps:
#!/bin/sh
[ -d "$#" ] && echo "$#" >> june2005.lst
Passing the june2005.lst name to the script (advice: use an environment variable), and dealing with the possibility that 200506* may expand to too huge a file path, are left as an exercise for the OP ;)
Integrating the whole thing into a pipeline or adapting a more general scripting language would yield performance boosts by minimizing the number of shells spawned. Now that would be fun. Here is a hint for that: use -exec and another program (awk, perl, etc.) to do the directory test as part of a one-line filter, and keep the >> june2005.lst on the find command; a sketch follows.
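Reading that hint concretely, one sketch lets find do both the directory match and the file listing in a single pass, keeping the redirection on the find command (the <path> placeholder is from the question):
find <path> -maxdepth 2 -type f -path '*/200506*/*' >> june2005.lst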
