How to list subdirectories with white space in their names in bash

I've found a few similar questions to this but surprisingly none of them work for me.
I have this written in a script:
for d in $(ls -d "$1"); do
echo $d
done
$1 is the parent directory whose subdirectories I want to list. However, running this prints a directory named "dir with spaces", for example, as three words on separate lines.

You can use shell globbing instead of command substitution. Globbing does not suffer from the word-splitting problem:
# include dotfiles, and skip the loop entirely if nothing matches
shopt -s dotglob nullglob
# the trailing / restricts the glob to directories (each $d then ends in /)
for d in "$1"/*/; do
echo "$d"
done
Or you can resort to the common find ... -print0 | xargs -0 ... pattern.
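A sketch of the find approach for this question, assuming GNU or BSD find (-mindepth and -maxdepth are common extensions, not POSIX). Instead of xargs it feeds a while read -d '' loop, which is handy when you want to do more with each name than pass it to a single command. The list_subdirs name is hypothetical.

```shell
#!/bin/bash
# list_subdirs DIR: print each immediate subdirectory of DIR on its own line.
# (Hypothetical helper; -mindepth/-maxdepth are GNU/BSD find extensions.)
list_subdirs() {
  local d
  # -mindepth 1 skips DIR itself, -maxdepth 1 stops at direct children,
  # -print0 terminates every name with a NUL byte instead of a newline.
  find "$1" -mindepth 1 -maxdepth 1 -type d -print0 |
    while IFS= read -r -d '' d; do
      # read -d '' splits on NUL, so "dir with spaces" arrives as one value
      printf '%s\n' "$d"
    done
}
```

Usage: list_subdirs "$1". Unlike the glob version, the order of the output is whatever find returns, not sorted.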

Related

How to recursively traverse a directory tree and find only files?

I am working on an scp call to download a folder from a remote system. The downloaded folder has subfolders, and within these subfolders there are a bunch of files which I want to pass as arguments to a Python script like this:
scp -r researcher@192.168.150.4:SomeName/SomeNameElse/$folder_name/ $folder_name/
echo "File downloaded successfully"
echo "Running BD scanner"
for d in $folder_name/*; do
if [[ -d $d ]]; then
echo "It is a directory"
elif [[ -f $d ]]; then
echo "It is a file"
echo "Running the scanner :"
python bd_scanner_new.py /home/nsadmin/Some/bash_script_run_files/$d
else
echo "$d is invalid file"
exit 1
fi
done
I have added logic to detect directories and skip them. However, the script does not descend into those directories recursively.
Partial results below:
File downloaded successfully
Running BD scanner
It is a directory
It is a directory
It is a directory
Exiting
I want to improve this code so that it traverses all directories and picks up all files. Please help me with any suggestions.
You can use shopt -s globstar in Bash 4.0+:
#!/bin/bash
shopt -s globstar nullglob
cd _your_base_dir || exit 1
for file in **/*; do
# **/* also matches directories, so skip anything that is not a regular file
[[ -f $file ]] || continue
# quoting "$file" handles names with white space or other special characters
python bd_scanner_new.py "$file"
done
The Bash manual says this about globstar:
If set, the pattern ‘**’ used in a filename expansion context will
match all files and zero or more directories and subdirectories. If
the pattern is followed by a ‘/’, only directories and subdirectories
match.
More globstar discussion here: https://unix.stackexchange.com/questions/117826/bash-globstar-matching
Why go to the trouble of globbing for file matching when you can use find, which is meant for this? Feed its output to a while loop through process substitution (<()):
#!/bin/bash
while IFS= read -r -d '' file; do
# single filename is in $file
python bd_scanner_new.py "$file"
done < <(find "$folder_name" -type f -print0)
Here, find recursively searches for regular files from the given path down through every level of subdirectories. Filenames can contain blanks, tabs, spaces and even newlines, so to process them safely, find -print0 prints each filename unmodified and terminates it with a NUL byte, and read -d '' splits its input on that same delimiter.
As a side note, always double-quote variables in bash to prevent word splitting and glob expansion by the shell.
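To see why the NUL delimiter matters, here is a small sketch that counts regular files with the same find -print0 / read -d '' loop (count_files is a hypothetical name). A line-based count such as find ... | wc -l counts a filename containing a newline twice; this loop counts it once.

```shell
#!/bin/bash
# count_files DIR: print how many regular files exist anywhere under DIR.
# (Hypothetical helper for illustration.)
count_files() {
  local n=0 file
  # Process substitution keeps the loop in the current shell, so n survives it.
  while IFS= read -r -d '' file; do
    n=$((n + 1))
  done < <(find "$1" -type f -print0)
  printf '%s\n' "$n"
}
```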

Trying to rename certain file types within recursive directories

I have a bunch of files within a directory structure as such:
Dir
SubDir
File
File
Subdir
SubDir
File
File
File
Sorry for the messy formatting, but as you can see there are files at many different directory levels. All of these file names have a string of 7 digits prepended to them, like this: 1234567_filename.ext. I am trying to remove the digits and the underscore at the start of the filename.
Right now I am using this bash one-liner with mv and cut to rename the files:
for i in *; do mv "$i" "$(echo $i | cut -d_ -f2-10)"; done
I run this while cd'd into the directory. I would love to find a way to do this recursively, so that it renames only files, not folders. I have also used a foreach loop in the shell (outside of bash) for directories that contain a bunch of folders with files in them and no further subdirectories:
foreach$ set p=`echo $f | cut -d/ -f1`
foreach$ set n=`echo $f | cut -d/ -f2 | cut -d_ -f2-10`
foreach$ mv $f $p/$n
foreach$ end
But that only works when there are no other subdirectories within the folders.
Is there a loop or oneliner I can use to rename all files within the directories? I even tried using find but couldn't figure out how to incorporate cut into the code.
Any help is much appreciated.
With Perl's rename (the standalone command):
shopt -s globstar
rename -n 's|/[0-9]{7}_([^/]+$)|/$1|' **/*
If everything looks fine, remove -n.
globstar: If set, the pattern ** used in a pathname expansion context will
match all files and zero or more directories and subdirectories. If
the pattern is followed by a /, only directories and subdirectories
match.
bash does provide functions, and these can be recursive, but you don't need a recursive function for this job. You just need to enumerate all the files in the tree. The find command can do that, but turning on bash's globstar option and using a shell glob to do it is safer:
#!/bin/bash
shopt -s globstar
# enumerate all the files in the tree rooted at the current working directory
for f in **; do
# ignore directories
test -d "$f" && continue
# separate the base file name from the path
name=$(basename "$f")
dir=$(dirname "$f")
# perform the rename, using a pattern substitution on the name part
mv "$f" "${dir}/${name/#???????_/}"
done
Note that that does not verify that file names actually match the pattern you specified before performing the rename; I'm taking you at your word that they do. If such a check were wanted then it could certainly be added.
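If such a check is wanted, one way is to test the base name against a regular expression before renaming. A sketch, assuming bash 4+ for globstar; strip_prefix_tree is a hypothetical name, and like the loop above it works on the tree under the current directory.

```shell
#!/bin/bash
shopt -s globstar

# strip_prefix_tree (hypothetical helper): under the current directory, rename
# every regular file named like 1234567_rest to rest; leave everything else alone.
strip_prefix_tree() {
  local f name dir
  local re='^[0-9]{7}_(.+)$'   # exactly seven digits, then an underscore
  for f in **; do
    [[ -f $f ]] || continue    # skip directories (and the literal ** if empty)
    name=${f##*/}              # base name
    if [[ $f == */* ]]; then dir=${f%/*}; else dir=.; fi
    [[ $name =~ $re ]] || continue
    mv -- "$f" "$dir/${BASH_REMATCH[1]}"
  done
}
```

The glob is expanded once before the loop starts, so renaming files inside the loop does not change the list being iterated.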
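How about this small tweak to what you have already:
find . -type f -name '[0-9][0-9][0-9][0-9][0-9][0-9][0-9]_*' | while IFS= read -r i; do mv "$i" "$(dirname "$i")/$(basename "$i" | cut -d_ -f2-10)"; done
Basically this swaps the * for find . -type f (restricted to names that start with seven digits and an underscore). Reading the results line by line keeps names with spaces intact, and applying cut to the base name only keeps each file in its own directory.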
Should be possible to do this using find...
find -E . -type f \
-regex '.*/[0-9]{7}_.*\.txt' \
-exec sh -c 'f="${0##*/}"; mv -v "$0" "${0%/*}/${f#*_}"' {} \;
Your find options may be different; I'm doing this on FreeBSD (GNU find has no -E option and uses -regextype posix-extended before -regex instead). The idea here is:
-E instructs find to use extended regular expressions.
-type f causes only normal files (not directories or symlinks) to be found.
-regex ... matches the files you're looking for. You can make this more specific if you need to.
-exec ... \; runs a command, using {} (the file we've found) as an argument.
The command we're running uses parameter expansion, first to get the target directory (${0%/*}) and second to strip the numeric prefix from the filename. Note the temporary variable $f, which holds only the base name, so that ${f#*_} removes just the first underscore of the filename and leaves any extra underscores intact.
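The parameter expansions in that -exec command are easier to follow in isolation. A minimal POSIX sh sketch (new_name is a hypothetical helper) that applies them to a path of the form find prints, which always contains a /:

```shell
#!/bin/sh
# new_name PATH (hypothetical helper): print PATH with everything up to and
# including the first "_" removed from its base name.
# Assumes PATH contains a "/", as find's output always does.
new_name() {
  dir=${1%/*}      # remove the shortest "/*" suffix: the directory part
  base=${1##*/}    # remove the longest "*/" prefix: the base name
  printf '%s\n' "$dir/${base#*_}"   # ${base#*_} drops the shortest "*_" prefix
}
```

For example, new_name ./a_dir/1234567_my_file.txt prints ./a_dir/my_file.txt: the underscore in the directory name and the second underscore in the file name are both preserved.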
Note that this is NOT a bash command, though you can of course run it from the bash shell. If you want a bash solution that does not require use of an external tool like find, you may be able to do the following:
$ shopt -s extglob # use extended glob format
$ shopt -s globstar # recurse using "**"
$ for f in **/+([0-9])_*.txt; do f="./$f"; b="${f##*/}"; echo mv "$f" "${f%/*}/${b#*_}"; done
This uses the same logic as the find solution, but uses extglob for better filename matching and bash 4's globstar to recurse through subdirectories. The echo makes it a dry run; remove it to actually rename the files.
Hope these help.

Filenames with wildcards in variables

#!/bin/bash
outbound=/home/user/outbound/
putfile=DATA_FILE_PUT_*.CSV
cd $outbound
filecnt=0
for file in $putfile; do let filecnt=filecnt+1; done
echo "Filecount: " $filecnt
So this code works well when there are files in the outbound directory. I can place files into the outbound path and, as long as they match the putfile mask, the count is incremented as expected.
The problem comes when I run this while there are no files in $outbound.
If there are zero files there, $filecnt is still 1, but I want it to be 0.
Am I missing something simple?
Put set -x just below the #! line to watch what your script is doing.
If there is no matching file, then the wildcard is left unexpanded, and the loop runs once, with file having the value DATA_FILE_PUT_*.CSV.
To change that, set the nullglob option. Note that this only works in bash, not in sh.
shopt -s nullglob
putfile=DATA_FILE_PUT_*.CSV
for file in $putfile; do let filecnt=filecnt+1; done
Note that the putfile variable contains the wildcard pattern, not the list of file names. It might make more sense to put the list of matches in a variable instead. This needs to be an array variable, and you need to change the current directory first. The number of matching files is then the length of the array.
#!/bin/bash
shopt -s nullglob
outbound=/home/user/outbound/
cd "$outbound"
putfiles=(DATA_FILE_PUT_*.CSV)
echo "Filecount: " ${#putfiles[@]}
If you need to iterate over the files, take care to protect the expansion of the array with double quotes, otherwise if a file name contains whitespace then it will be split over several words (and if a filename contains wildcard characters, they will be expanded).
#!/bin/bash
shopt -s nullglob
outbound=/home/user/outbound/
cd "$outbound"
putfiles=(DATA_FILE_PUT_*.CSV)
for file in "${putfiles[@]}"; do
echo "Processing $file"
done
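If you need the count in several places, the nullglob-plus-array idea can be wrapped in a function. A sketch, with count_matches as a hypothetical name; its body runs in a subshell, so turning on nullglob does not leak into the caller. The pattern is expanded unquoted, so it should not contain spaces.

```shell
#!/bin/bash
# count_matches DIR PATTERN (hypothetical helper): print how many entries in
# DIR match the glob PATTERN, printing 0 rather than 1 when nothing matches.
count_matches() (
  # ( ) body: nullglob is only set inside this subshell
  shopt -s nullglob
  matches=("$1"/$2)   # $2 left unquoted on purpose so it is globbed
  printf '%s\n' "${#matches[@]}"
)
```

Usage: filecnt=$(count_matches /home/user/outbound 'DATA_FILE_PUT_*.CSV').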
You could test whether each file exists first:
for file in $putfile; do
if [ -f "$file" ] ; then
let filecnt=filecnt+1
fi
done
Or look for your files with find
for file in $(find . -type f -name "$putfile"); do
let filecnt=filecnt+1
done
Or simply:
filecnt=$(find . -type f -name "$putfile" | wc -l); echo $filecnt
This is because, when no matches are found, bash by default leaves the pattern DATA_FILE_PUT_*.CSV unexpanded, so the loop runs once with that literal word and you end up with a count of 1.
To disable this behavior, use shopt -s nullglob
Not sure why you need a loop here. The following one-liner should do the job:
ls ${outbound}/${putfile} | wc -l
Or
find ${outbound} -maxdepth 1 -type f -name "${putfile}" | wc -l

Looping over directories in Bash

I have a fundamental question about how bash works, and a related practical question.
Fundamental question: suppose I am in a directory that has three subdirectories: a, b, and c.
Then the code
for dir in $(ls)
do
echo $dir
done
spits out:
a b c
a b c
a b c
i.e. dir always stores a list of all of the files/directories in my cwd. My question is: why in the world would this be convenient? In my opinion it is far more useful and intuitive to have dir store one element at a time, i.e. I would want to have the output
a
b
c
Also, as per one of the answers - it is wrong to use for dir in $(ls), but when I use for dir in $(ls -l) I get even more copies of a b c (more than there are directories/files in the cwd). Why is that?
My second question is practical: how do I loop over all the directories (not files!) in my cwd that start with capital W? I started with
for dir in `ls -l W*`
but this fails because a) the reason in question 1 and b) because it doesn't exclude files. Suggestions appreciated.
Never ever parse the output of ls like this (Why you shouldn't parse the output of ls(1)).
Also, your syntax is wrong. You don't mean (), you mean $().
That being said, to loop over directories starting with W you would do (or use the find command instead, depending on your scenario):
for path in /my/path/W*; do
[ -d "${path}" ] || continue # if not a directory, skip
dirname="$(basename "${path}")"
do_stuff
done
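A self-contained variant of that loop, sketched as a hypothetical list_w_dirs function: the trailing / in the glob already restricts matches to directories, and the [ -d ] test only guards the case where nothing matches and the pattern is left unexpanded.

```shell
#!/bin/bash
# list_w_dirs DIR (hypothetical helper): print the names of subdirectories
# of DIR that start with a capital W, one per line.
list_w_dirs() {
  local path
  for path in "$1"/W*/; do
    [ -d "$path" ] || continue   # no match: the pattern stays literal
    path=${path%/}               # drop the trailing slash added by the glob
    printf '%s\n' "${path##*/}"  # keep only the last path component
  done
}
```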
As for the output you get from the evil ls-loop, it should not look like that. This is the expected output and demonstrates why you do not want to use ls in the first place:
$ find
.
./c
./a
./foo bar
./b
$ type ls
ls is hashed (/bin/ls)
$ for x in $(ls); do echo "${x}"; done
a
b
c
foo
bar
This should work:
shopt -s nullglob # empty directory will return empty list
for dir in ./*/;do
echo "$dir" # dir is directory only because of the / after *
done
To be recursive in subdirectories too, use globstar:
shopt -s globstar nullglob
for dir in ./**/;do
echo "$dir" # dir is directory only because of the / after **
done
You can make @Adrian Frühwirth's method recursive into subdirectories by using globstar too:
shopt -s globstar
for dir in ./**;do
[[ ! -d $dir ]] && continue # if not directory then skip
echo "$dir"
done
From the Bash manual:
globstar
If set, the pattern ‘**’ used in a filename expansion context will
match all files and zero or more directories and subdirectories. If
the pattern is followed by a ‘/’, only directories and subdirectories
match.
nullglob
If set, Bash allows filename patterns which match no files to expand
to a null string, rather than themselves.
Well, you know what you are seeing is not what you are expecting. The output you are seeing is not from the echo command, but from the dir command.
Try the following:
ls -1 | while IFS= read -r line; do
if [ -d "$line" ] ; then
echo "$line"
fi
done
for files in $(ls) ; do
if [ -d "$files" ] ; then
echo "$files"
fi
done

How can I use inverse or negative wildcards when pattern matching in a unix/linux shell?

Say I want to copy the contents of a directory excluding files and folders whose names contain the word 'Music'.
cp [exclude-matches] *Music* /target_directory
What should go in place of [exclude-matches] to accomplish this?
In Bash you can do it by enabling the extglob option, like this (replace ls with cp and add the target directory, of course)
~/foobar> shopt extglob
extglob off
~/foobar> ls
abar afoo bbar bfoo
~/foobar> ls !(b*)
-bash: !: event not found
~/foobar> shopt -s extglob # Enables extglob
~/foobar> ls !(b*)
abar afoo
~/foobar> ls !(a*)
bbar bfoo
~/foobar> ls !(*foo)
abar bbar
You can later disable extglob with
shopt -u extglob
The extglob shell option gives you more powerful pattern matching in the command line.
You turn it on with shopt -s extglob, and turn it off with shopt -u extglob.
In your example, you would initially do:
$ shopt -s extglob
$ cp !(*Music*) /target_directory
The full available extended globbing operators are (excerpt from man bash):
If the extglob shell option is enabled using the shopt builtin, several extended
pattern matching operators are recognized. A pattern-list is a list of one or more patterns separated by a |. Composite patterns may be formed using one or more of the following sub-patterns:
?(pattern-list)
Matches zero or one occurrence of the given patterns
*(pattern-list)
Matches zero or more occurrences of the given patterns
+(pattern-list)
Matches one or more occurrences of the given patterns
@(pattern-list)
Matches one of the given patterns
!(pattern-list)
Matches anything except one of the given patterns
So, for example, if you wanted to list all the files in the current directory that are not .c or .h files, you would do:
$ ls -d !(*@(.c|.h))
Of course, normal shell globbing works, so the last example could also be written as:
$ ls -d !(*.[ch])
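For the original cp question, a sketch wrapped in a function (copy_except_music is a hypothetical name). One detail worth knowing: bash parses !(...) when it reads the function body, so shopt -s extglob has to run on its own line before the function is defined; enabling it inside the function would be too late.

```shell
#!/bin/bash
# extglob must already be on when bash parses the function below.
shopt -s extglob

# copy_except_music SRC DEST (hypothetical helper): copy every entry of SRC
# whose name does not contain "Music" into DEST, recursing into directories.
copy_except_music() (
  # ( ) body runs in a subshell, so nullglob does not leak to the caller
  shopt -s nullglob
  files=("$1"/!(*Music*))
  if ((${#files[@]} > 0)); then
    cp -R -- "${files[@]}" "$2"/
  fi
)
```

Names starting with a dot are not matched unless dotglob is also set.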
Not in bash (that I know of), but:
cp `ls | grep -v Music` /target_directory
I know this is not exactly what you were looking for, but it will solve your example.
If you want to avoid the cost of -exec ... \; starting a new cp process for every file, you can let xargs batch the file names instead (note that xargs -i would again run one cp per file). Compare:
find foo -type f ! -name '*Music*' -exec cp {} bar \; # new process for each file
find foo -type f ! -name '*Music*' -print0 | xargs -0 cp -t bar/ # GNU cp: -t names the target first, so xargs can append many files per call
A trick I haven't seen on here yet that doesn't use extglob, find, or grep is to treat two file lists as sets and "diff" them using comm:
comm -23 <(ls) <(ls *Music*)
comm is preferable to diff here because its output has no extra cruft.
This returns all elements of set 1, ls, that are not also in set 2, ls *Music*. This requires both sets to be in sorted order to work properly. No problem for ls and glob expansion, but if you're using something like find, be sure to invoke sort.
comm -23 <(find . | sort) <(find . | grep -i '.jpg' | sort)
Potentially useful.
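A sketch of the same set-difference idea that prints the glob results directly instead of parsing ls (list_without_music is a hypothetical name). Since comm expects both inputs sorted in its own collation order, LC_ALL=C is set for both sorts and for comm. Because comm is line-based, names containing newlines would still confuse it.

```shell
#!/bin/bash
# list_without_music (hypothetical helper): print names in the current
# directory that do not contain "Music", as a set difference of two lists.
list_without_music() (
  shopt -s nullglob
  # printf '%s\n' prints each glob result on its own line; sort and comm
  # must agree on collation, hence LC_ALL=C for all three commands.
  LC_ALL=C comm -23 <(printf '%s\n' * | LC_ALL=C sort) \
                    <(printf '%s\n' *Music* | LC_ALL=C sort)
)
```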
You can also use a pretty simple for loop:
for f in `find . -not -name "*Music*"`
do
cp $f /target/dir
done
In bash, an alternative to shopt -s extglob is the GLOBIGNORE variable. It's not really better, but I find it easier to remember.
An example that may be what the original poster wanted:
GLOBIGNORE="*techno*"; cp *Music* /only_good_music/
When done, unset GLOBIGNORE to be able to rm *techno* in the source directory.
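Running the GLOBIGNORE assignment in a subshell avoids having to remember the unset. A sketch with a hypothetical list_except function; per the bash manual, setting GLOBIGNORE to a non-null value also makes globs match dotfiles (as if dotglob were on), which is one more reason to keep it scoped.

```shell
#!/bin/bash
# list_except INCLUDE EXCLUDE (hypothetical helper): print names in the current
# directory that match the glob INCLUDE but not the glob EXCLUDE.
list_except() (
  # subshell body: GLOBIGNORE and nullglob vanish when it exits
  GLOBIGNORE=$2
  shopt -s nullglob
  for f in $1; do   # $1 unquoted on purpose so it is expanded as a glob
    printf '%s\n' "$f"
  done
)
```

For example, list_except '*Music*' '*techno*' prints the Music files that are not techno; the same scoping works for a cp inside the subshell.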
My personal preference is to use grep and the while command. This allows one to write powerful yet readable scripts ensuring that you end up doing exactly what you want. Plus by using an echo command you can perform a dry run before carrying out the actual operation. For example:
ls | grep -v "Music" | while IFS= read -r filename
do
echo "$filename"
done
will print out the files that you will end up copying. If the list is correct, the next step is to simply replace the echo command with the copy command as follows:
ls | grep -v "Music" | while IFS= read -r filename
do
cp "$filename" /target_directory
done
One solution for this can be found with find.
$ mkdir foo bar
$ touch foo/a.txt foo/Music.txt
$ find foo -type f ! -name '*Music*' -exec cp {} bar \;
$ ls bar
a.txt
Find has quite a few options, you can get pretty specific on what you include and exclude.
Edit: Adam in the comments noted that this is recursive. find options mindepth and maxdepth can be useful in controlling this.
The following loops over all *.txt files in a directory, skipping those that begin with a digit.
This works in bash, dash, zsh and all other POSIX compatible shells.
for FILE in /some/dir/*.txt; do # for each *.txt file
case "${FILE##*/}" in # if file basename...
[0-9]*) continue ;; # starts with digit: skip
esac
## otherwise, do stuff with $FILE here
done
In line one, the pattern /some/dir/*.txt makes the for loop iterate over all files in /some/dir whose names end with .txt.
In line two, a case statement is used to weed out undesired files. The ${FILE##*/} expression strips any leading directory component from the filename (here /some/dir/) so that patterns match against only the basename of the file. (If you're only weeding out filenames based on suffixes, you can shorten this to $FILE instead.)
In line three, all files matching the [0-9]*) case pattern are skipped (the continue statement jumps to the next iteration of the for loop). If you want, you can do something more interesting here, e.g. skip all files that do not start with a lowercase letter (a-z) using [!a-z]*, or use multiple patterns to skip several kinds of filenames, e.g. [0-9]*|*.bak to skip both .bak files and files that start with a digit.
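A sketch of the multiple-pattern variant just described, in plain POSIX sh; filter_names is a hypothetical name, and it globs all entries rather than only *.txt so that the *.bak pattern has something to skip. The [ -e ] test handles the case where the glob matches nothing and is left as a literal pattern.

```shell
#!/bin/sh
# filter_names DIR (hypothetical helper): print the base names of entries in
# DIR, skipping names that start with a digit or end in .bak.
filter_names() {
  for file in "$1"/*; do
    [ -e "$file" ] || continue        # no match: the pattern stays literal
    case "${file##*/}" in
      [0-9]*|*.bak) continue ;;       # several patterns separated by |
    esac
    printf '%s\n' "${file##*/}"
  done
}
```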
In zsh, with the extendedglob option set (setopt extendedglob), this copies everything except an entry named exactly 'Music':
cp -a ^'Music' /target
and these two exclude names like *?Music and Music?* respectively:
cp -a ^*?'Music' /target
cp -a ^'Music'?* /target
