Loop through the directories to get the image files - bash

I have a directory which includes multiple sub-directories. I would like to go through the directories and subdirectories, find the jpg files, and resize them using the mogrify command. I would like to make it as dynamic as possible, which is why I wrote a script; $1 is the first argument that I pass when executing the bash script. After running the script, it gives me an error about 'mogrify can not read [@]%'. I guess something is very wrong with my code, and I am not experienced in bash. Can anyone tell me how to write this script dynamically so that it would be fast?
P.S.: the names of the jpg files are not in any special format, just a bunch of numbers.
for folder in $1/*
do
    for file in "$folder"/*
    do
        if [ -e "${file[@]%.jpg}" ]; then
            mogrify -resize 112x112! "${file[@]%.jpg}"
        fi
    done
done

If you're open to using find, then this becomes pretty easy:
#!/usr/bin/env bash
find "$1" \( -iname \*.jpg -o -iname \*.jpeg \) -print0 | while read -r -d $'\0' file; do
# base="${file##*/}" $base is the file name with all the directory stuff stripped off
# dir="${file%/*} $dir is the directory with the file name stripped off
mogify -resize '112x112!' "$file"
done
Put that in a file named mymog.bash then
$ chmod 755 mymog.bash
$ mymog.bash /some/dir
Notes:
! is special to bash, so putting it in single quotes makes it "unspecial", passing it along to the mogrify command unmolested.
The double quotes around $1 and $file are needed in case a directory or file name has spaces in it. If you had a directory named /Users/alice/my pictures and didn't use the quotes, mogrify would get one argument named /Users/alice/my and another one named pictures.
Make sure you use the \( and \) for find. That makes the whole condition ("match *.jpg" OR "match *.jpeg") apply to the action -print0.
I used find's -print0 action, which prints each matching file name as a null-terminated (zero-terminated) string. You can have filenames that have newline characters in the middle. This protects against that.
bash's built-in read command reads until a newline by default. I used the -d $'\0' to make it read each "line" or "record" (filename) up to the null (zero) character at its end. (Each ends with null because of the -print0.)
This solution (one of many) has two parts:
It uses the find utility to find (under the directory given) all files that end in .jpg or .jpeg, ignoring the case of the filenames. [So it will match .JPG or even .JpEg.]
It spits out one record for each file.
If you give it an absolute path like /some/dir, it will find /some/dir/a.jpg and /some/dir/sub1/sub2/sub3/b.jpg.
If you give it a relative path like ../../nearby/dir, it will find ../../nearby/dir/c.jpg and ../../nearby/dir/sub1/sub2/sub3/d.jpeg.
The find part ends with the first | on that line. After that, it is a bash while…do loop.
The variable file takes on the value of each record spit out by find.
The loop (everything between do and done) runs once for each value that file takes on.
The two lines that start with # are comments. They contain commands that are ignored (skipped). You can remove the # to have bash run those commands too. I included them as examples in case you needed the directory part or just the filename part of the record.
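For instance, here is a minimal sketch of how those two commented lines could be used inside the same loop (the echo is just illustrative):
find "$1" \( -iname \*.jpg -o -iname \*.jpeg \) -print0 | while read -r -d $'\0' file; do
    base="${file##*/}"   # just the file name, e.g. 12345.jpg
    dir="${file%/*}"     # just the directory part
    echo "resizing $base in $dir"
    mogrify -resize '112x112!' "$file"
done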

find "$1" -type f -name "*.jpg" -exec mogrify -resize 112x112! {} \;

Try find and a while read loop:
find "$1" -type f -name '*.jpg' -print | while read fname
do
....
done
If your filenames may contain special chars like line feeds, then use:
find "$1" -type f -name '*.jpg' -print0 | while IFS= read -r -d '' fname
do
....
done
There are many more options to tune the search.
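For example, a minimal sketch that also limits the depth, matches the extension case-insensitively, and only picks up files modified in the last 7 days (the -maxdepth and -mtime values are just illustrative):
find "$1" -maxdepth 2 -type f -iname '*.jpg' -mtime -7 -print0 | while IFS= read -r -d '' fname
do
    echo "$fname"
done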

Related

How to find files with specific extensions recursively using the for/in syntax? [duplicate]

x=$(find . -name "*.txt")
echo $x
If I run the above piece of code in a Bash shell, what I get is a string containing several file names separated by blanks, not a list.
Of course, I can further separate them by blanks to get a list, but I'm sure there is a better way to do it.
So what is the best way to loop through the results of a find command?
TL;DR: If you're just here for the most correct answer, you probably want my personal preference (see the bottom of this post):
# execute `process` once for each file
find . -name '*.txt' -exec process {} \;
If you have time, read through the rest to see several different ways and the problems with most of them.
The full answer:
The best way depends on what you want to do, but here are a few options. As long as no file or folder in the subtree has whitespace in its name, you can just loop over the files:
for i in $x; do # Not recommended, will break on whitespace
process "$i"
done
Marginally better, cut out the temporary variable x:
for i in $(find -name \*.txt); do # Not recommended, will break on whitespace
process "$i"
done
It is much better to glob when you can. White-space safe, for files in the current directory:
for i in *.txt; do # Whitespace-safe but not recursive.
process "$i"
done
By enabling the globstar option, you can glob all matching files in this directory and all subdirectories:
# Make sure globstar is enabled
shopt -s globstar
for i in **/*.txt; do # Whitespace-safe and recursive
process "$i"
done
In some cases, e.g. if the file names are already in a file, you may need to use read:
# IFS= makes sure it doesn't trim leading and trailing whitespace
# -r prevents interpretation of \ escapes.
while IFS= read -r line; do # Whitespace-safe EXCEPT newlines
process "$line"
done < filename
read can be used safely in combination with find by setting the delimiter appropriately:
find . -name '*.txt' -print0 |
while IFS= read -r -d '' line; do
process "$line"
done
For more complex searches, you will probably want to use find, either with its -exec option or with -print0 | xargs -0:
# execute `process` once for each file
find . -name \*.txt -exec process {} \;
# execute `process` once with all the files as arguments*:
find . -name \*.txt -exec process {} +
# using xargs*
find . -name \*.txt -print0 | xargs -0 process
# using xargs with arguments after each filename (implies one run per filename)
find . -name \*.txt -print0 | xargs -0 -I{} process {} argument
find can also cd into each file's directory before running a command by using -execdir instead of -exec, and can be made interactive (prompt before running the command for each file) using -ok instead of -exec (or -okdir instead of -execdir).
*: Technically, both find and xargs (by default) will run the command with as many arguments as they can fit on the command line, as many times as it takes to get through all the files. In practice, unless you have a very large number of files it won't matter, and if you exceed the length but need them all on the same command line, you're SOL; find a different way.
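As a small illustration of the interactive variants mentioned above, a minimal sketch (process is a stand-in for your command); find asks before each run:
# prompt before running `process` on each file, from within that file's directory
find . -name '*.txt' -okdir process {} \;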
Whatever you do, don't use a for loop:
# Don't do this
for file in $(find . -name "*.txt")
do
…code using "$file"
done
Three reasons:
For the for loop to even start, the find must run to completion.
If a file name has any whitespace (including space, tab or newline) in it, it will be treated as two separate names.
Although now unlikely, you can overrun your command line buffer. Imagine if your command line buffer holds 32KB, and your for loop returns 40KB of text. That last 8KB will be dropped right off your for loop and you'll never know it.
Always use a while read construct:
find . -name "*.txt" -print0 | while read -d $'\0' file
do
…code using "$file"
done
The loop will execute while the find command is executing. Plus, this command will work even if a file name is returned with whitespace in it. And, you won't overflow your command line buffer.
The -print0 will use the NULL as a file separator instead of a newline and the -d $'\0' will use NULL as the separator while reading.
find . -name "*.txt"|while read fname; do
echo "$fname"
done
Note: this method and the (second) method shown by bmargulies are safe to use with white space in the file/folder names.
In order to also have the - somewhat exotic - case of newlines in the file/folder names covered, you will have to resort to the -exec predicate of find like this:
find . -name '*.txt' -exec echo "{}" \;
The {} is the placeholder for the found item and the \; is used to terminate the -exec predicate.
And for the sake of completeness let me add another variant - you gotta love the *nix ways for their versatility:
find . -name '*.txt' -print0|xargs -0 -n 1 echo
This would separate the printed items with a \0 character that isn't allowed in any of the file systems in file or folder names, to my knowledge, and therefore should cover all bases. xargs picks them up one by one then ...
Filenames can include spaces and even control characters. Spaces are (default) delimiters for shell expansion in bash, and as a result of that, x=$(find . -name "*.txt") from the question is not recommended at all. If find gets a filename with spaces, e.g. "the file.txt", you will get 2 separate strings for processing if you process x in a loop. You can improve this by changing the delimiter (the bash IFS variable), e.g. to \r\n, but filenames can also include control characters - so this is not a (completely) safe method.
From my point of view, there are 2 recommended (and safe) patterns for processing files:
1. Use for loop & filename expansion:
for file in ./*.txt; do
[[ ! -e $file ]] && continue # continue, if file does not exist
# single filename is in $file
echo "$file"
# your code here
done
2. Use find-read-while & process substitution
while IFS= read -r -d '' file; do
# single filename is in $file
echo "$file"
# your code here
done < <(find . -name "*.txt" -print0)
Remarks
on Pattern 1:
bash returns the search pattern ("*.txt") if no matching file is found - so the extra line "continue, if file does not exist" is needed. see Bash Manual, Filename Expansion
shell option nullglob can be used to avoid this extra line.
"If the failglob shell option is set, and no matches are found, an error message is printed and the command is not executed." (from Bash Manual above)
shell option globstar: "If set, the pattern ‘**’ used in a filename expansion context will match all files and zero or more directories and subdirectories. If the pattern is followed by a ‘/’, only directories and subdirectories match." see Bash Manual, Shopt Builtin
other options for filename expansion: extglob, nocaseglob, dotglob & the shell variable GLOBIGNORE (a short sketch combining some of these options follows below)
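As a minimal sketch combining some of those options (globstar needs bash 4+; process is a stand-in for your command):
shopt -s nullglob dotglob globstar
for file in ./**/*.txt; do
    # with nullglob set, the loop body simply never runs if nothing matches,
    # so the "[[ ! -e $file ]] && continue" guard is unnecessary
    process "$file"
done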
on Pattern 2:
filenames can contain blanks, tabs, newlines and other control characters. To process filenames in a safe way, find with -print0 is used: each filename is printed with all control characters intact and terminated with NUL. See also the Gnu Findutils Manpage (Unsafe File Name Handling, Safe File Name Handling, unusual characters in filenames) and David A. Wheeler below for a detailed discussion of this topic.
There are some possible patterns to process find results in a while loop. Others (kevin, David W.) have shown how to do this using pipes:
files_found=1
find . -name "*.txt" -print0 |
while IFS= read -r -d '' file; do
# single filename in $file
echo "$file"
files_found=0 # not working example
# your code here
done
[[ $files_found -eq 0 ]] && echo "files found" || echo "no files found"
When you try this piece of code, you will see that it does not work: files_found is always "true" and the code will always echo "no files found". The reason is that each command of a pipeline is executed in a separate subshell, so the variable changed inside the loop (in that separate subshell) does not change the variable in the main shell script. This is why I recommend using process substitution as the "better", more useful, more general pattern. See I set variables in a loop that's in a pipeline. Why do they disappear... (from Greg's Bash FAQ) for a detailed discussion on this topic.
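For contrast, a minimal sketch of the same counter rewritten with process substitution (Pattern 2 above), where the assignment made inside the loop survives:
files_found=1
while IFS= read -r -d '' file; do
    echo "$file"
    files_found=0   # runs in the current shell, so the change is visible after the loop
done < <(find . -name "*.txt" -print0)
[[ $files_found -eq 0 ]] && echo "files found" || echo "no files found"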
Additional References & Sources:
Gnu Bash Manual, Pattern Matching
Filenames and Pathnames in Shell: How to do it Correctly, David A. Wheeler
Why you don't read lines with "for", Greg's Wiki
Why you shouldn't parse the output of ls(1), Greg's Wiki
Gnu Bash Manual, Process Substitution
(Updated to include @Socowi's excellent speed improvement)
With any $SHELL that supports it (dash/zsh/bash...):
find . -name "*.txt" -exec $SHELL -c '
for i in "$#" ; do
echo "$i"
done
' {} +
Done.
Original answer (shorter, but slower):
find . -name "*.txt" -exec $SHELL -c '
echo "$0"
' {} \;
If you can assume the file names don't contain newlines, you can read the output of find into a Bash array using the following command:
readarray -t x < <(find . -name '*.txt')
Note:
-t causes readarray to strip newlines.
It won't work if readarray is in a pipe, hence the process substitution.
readarray is available since Bash 4.
Bash 4.4 and up also supports the -d parameter for specifying the delimiter. Using the null character, instead of newline, to delimit the file names works also in the rare case that the file names contain newlines:
readarray -d '' x < <(find . -name '*.txt' -print0)
readarray can also be invoked as mapfile with the same options.
Reference: https://mywiki.wooledge.org/BashFAQ/005#Loading_lines_from_a_file_or_stream
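Either way, a short sketch of consuming the resulting array afterwards (process stands in for your command):
readarray -d '' x < <(find . -name '*.txt' -print0)
printf 'found %d files\n' "${#x[@]}"
for f in "${x[@]}"; do
    process "$f"
done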
# Doesn't handle whitespace
for x in `find . -name "*.txt" -print`; do
process_one $x
done
or
# Handles whitespace and newlines
find . -name "*.txt" -print0 | xargs -0 -n 1 process_one
I like to assign the output of find to a variable first and switch IFS to a newline, as follows:
FilesFound=$(find . -name "*.txt")
IFSbkp="$IFS"
IFS=$'\n'
counter=1;
for file in $FilesFound; do
echo "${counter}: ${file}"
let counter++;
done
IFS="$IFSbkp"
As commented by @Konrad Rudolph, this will not work with "new lines" in file names. I still think it is handy, as it covers most of the cases when you need to loop over command output.
As already posted in the top answer by Kevin, the best solution is to use a for loop with a bash glob, but as bash globbing is not recursive by default, this can be fixed by a recursive bash function:
#!/bin/bash
set -x
set -eu -o pipefail
all_files=();
function get_all_the_files()
{
    directory="$1";
    for item in "$directory"/* "$directory"/.[^.]*;
    do
        if [[ -d "$item" ]];
        then
            get_all_the_files "$item";
        else
            all_files+=("$item");
        fi;
    done;
}
get_all_the_files "/tmp";
for file_path in "${all_files[@]}"
do
    printf 'My file is "%s"\n' "$file_path";
done;
Related questions:
Bash loop through directory including hidden file
Recursively list files from a given directory in Bash
ls command: how can I get a recursive full-path listing, one line per file?
List files recursively in Linux CLI with path relative to the current directory
Recursively List all directories and files
bash script, create array of all files in a directory
How can I creates array that contains the names of all the files in a folder?
How to get the list of files in a directory in a shell script?
Based on other answers and a comment of @phk, using fd #3 (which still allows use of stdin inside the loop):
while IFS= read -r f <&3; do
echo "$f"
done 3< <(find . -iname "*filename*")
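A brief sketch of why keeping stdin free matters: the loop body can still talk to the terminal, for example to confirm each file (the rm is only illustrative):
while IFS= read -r f <&3; do
    read -p "delete $f? [y/N] " answer   # reads from stdin, not from find's output
    [[ $answer == y ]] && rm -- "$f"
done 3< <(find . -iname "*filename*")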
You can put the filenames returned by find into an array like this:
array=()
while IFS= read -r -d ''; do
array+=("$REPLY")
done < <(find . -name '*.txt' -print0)
Now you can just loop through the array to access individual items and do whatever you want with them.
Note: It's white space safe.
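For example, a minimal sketch of accessing the individual items by index afterwards:
for i in "${!array[@]}"; do
    printf '%d: %s\n' "$i" "${array[$i]}"
done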
You can store your find output in an array if you wish to use the output later, as:
array=($(find . -name "*.txt"))
Now to print each element on a new line, you can either use a for loop iterating over all the elements of the array, or you can use a printf statement.
for i in "${array[@]}"; do echo "$i"; done
or
printf '%s\n' "${array[@]}"
You can also use:
for file in "`find . -name "*.txt"`"; do echo "$file"; done
This will print each filename on a new line.
To only print the find output in list form, you can use either of the following:
find . -name "*.txt" -print 2>/dev/null
or
find . -name "*.txt" -print | grep -v 'Permission denied'
This will remove error messages and only give the filenames as output, one per line.
If you wish to do something with the filenames, storing them in an array is good; otherwise there is no need to consume that space and you can directly print the output from find.
I think using this piece of code (feeding the while loop from a here-string after done):
while read fname; do
echo "$fname"
done <<< "$(find . -name "*.txt")"
is better than this answer, because the while loop there is executed in a subshell (according to here), so if you use that answer, variable changes made inside the loop cannot be seen after the while loop, which is a problem if you want to modify variables inside the loop.
function loop_through(){
    length_="$(find . -name '*.txt' | wc -l)"
    length_="${length_#"${length_%%[![:space:]]*}"}"
    length_="${length_%"${length_##*[![:space:]]}"}"
    # brace expansion such as {1..$length_} does not expand variables, so use seq instead
    for i in $(seq 1 "$length_")
    do
        x=$(find . -name '*.txt' | sort | head -"$i" | tail -1)
        echo "$x"
    done
}
To grab the length of the list of files for the loop, I used the command "wc -l"; that count is stored in a variable.
Then I remove the leading and trailing white space from the variable so the for loop can read it.
find <path> -xdev -type f -name '*.txt' -exec ls -l {} \;
This will list the files and give details about attributes.
Another alternative is to not use bash, but call Python to do the heavy lifting. I resorted to this because bash solutions like my other answer were too slow.
With this solution, we build a bash array of files from inline Python script:
#!/bin/bash
set -eu -o pipefail
dsep=":" # directory_separator
base_directory=/tmp
all_files=()
all_files_string="$(python3 -c '#!/usr/bin/env python3
import os
import sys
dsep="'"$dsep"'"
base_directory="'"$base_directory"'"
def log(*args, **kwargs):
    print(*args, file=sys.stderr, **kwargs)
def check_invalid_characther(file_path):
    for thing in ("\\", "\n"):
        if thing in file_path:
            raise RuntimeError(f"It is not allowed {thing} on \"{file_path}\"!")
def absolute_path_to_relative(base_directory, file_path):
    relative_path = os.path.commonprefix( [ base_directory, file_path ] )
    relative_path = os.path.normpath( file_path.replace( relative_path, "" ) )
    # if you use Windows Python, it accepts / instead of \\
    # if you have \ on your files names, rename them or comment this
    relative_path = relative_path.replace("\\", "/")
    if relative_path.startswith( "/" ):
        relative_path = relative_path[1:]
    return relative_path
for directory, directories, files in os.walk(base_directory):
    for file in files:
        local_file_path = os.path.join(directory, file)
        local_file_name = absolute_path_to_relative(base_directory, local_file_path)
        log(f"local_file_name {local_file_name}.")
        check_invalid_characther(local_file_name)
        print(f"{base_directory}{dsep}{local_file_name}")
' | dos2unix)";
if [[ -n "$all_files_string" ]];
then
    readarray -t temp <<< "$all_files_string";
    all_files+=("${temp[@]}");
fi;
for item in "${all_files[@]}";
do
    OLD_IFS="$IFS"; IFS="$dsep";
    read -r base_directory local_file_name <<< "$item"; IFS="$OLD_IFS";
    printf 'item "%s", base_directory "%s", local_file_name "%s".\n' \
        "$item" \
        "$base_directory" \
        "$local_file_name";
done;
Related:
os.walk without hidden folders
How to do a recursive sub-folder search and return files in a list?
How to split a string into an array in Bash?
How about if you use grep instead of find?
ls | grep .txt$ > out.txt
Now you can read this file and the filenames are in the form of a list.

Loop through all files in a directory and subdirectories using Bash [duplicate]

This question already has answers here:
How to loop through a directory recursively to delete files with certain extensions
(16 answers)
Closed 4 years ago.
I know how to loop through all the files in a directory, for example:
for i in *
do
<some command>
done
But I would like to go through all the files in a directory, including (particularly!) all the ones in the subdirectories. Is there a simple way of doing this?
The find command is very useful for that kind of thing, provided you don't have white space or other special characters in the file names:
For example:
for i in $(find . -type f -print)
do
stuff
done
The command generates path names relative to the start of the search (the first parameter).
As pointed out, this will fail if your filenames contain spaces or some other characters.
You can also use the -exec option which avoids the problem with spaces in file names. It executes the given command for each file found. The braces are a placeholder for the filename:
find . -type f -exec command {} \;
find and xargs are great tools for recursively processing the contents of directories and sub-directories. For example
find . -type f -print0 | xargs -0 command
will run command on batches of files from the current directory and its sub-directories. The -print0 and -0 arguments avoid the usual problems with filenames that contain spaces, quotes or other metacharacters.
If command just takes one argument, you can limit the number of files passed to it with -L1.
find . -type f -print0 | xargs -0 -L1 command
And as suggested by alexgirao, xargs can also name arguments, using -I, which gives some flexibility if command takes options. -I implies -L1.
find . -type f -print0 | xargs -0 -Iarg command arg --option
recurse() {
    path=$1
    if [ -d "$path" ] ; then
        for i in "$path/"*
        do
            recurse "$i"
        done
    elif [ -f "$path" ] ; then
        do-something
    fi
}
Call recurse and pass the directory path from which you want to start as the first positional parameter.
Ex: recurse /path

Reading rsync source from file results in improper parsing of file names with white space

I wrote a simple script that searches through a specific directory, defined by the variable "SCOPE", producing a list of directories that were modified within the past 24 hours and printing them to a temp file. The first line of the file is deleted (to exclude the root level of the directory). Finally, it loops over the contents of the temp file and rsyncs each of the directories to the destination.
Problem
Directories that contain white space in their name do not rsync. The space causes everything before the whitespace and after the whitespace to be passed as individual arguments, and thus invalid filenames.
Observation: When I examine the contents of the temp file, each directory appears on a single line as expected. It appears the names only get split when they are read into rsync from the file.
How can I prevent the whitespace in the directory names from causing those directories to fail to rsync?
SCOPE="/base/directory"
DESTINATION="/volumes/destination/"
find "$SCOPE" -maxdepth 1 -type d -mtime 0 > /tmp/jobs.txt;
sed '1d' /tmp/jobs.txt > /tmp/tmpjobs.txt;
mv /tmp/tmpjobs.txt /tmp/jobs.txt;
for JOB in `cat /tmp/jobs.txt`; do
rsync -avvuh "$JOB" "$DESTINATION";
done
Replace
for JOB in `cat /tmp/jobs.txt`; do
rsync -avvuh "$JOB" "$DESTINATION";
done
by
while read -r JOB; do
rsync -avvuh "$JOB" "$DESTINATION"
done < /tmp/jobs.txt
You want the -0 option for the rsync end, and the -print0 option for find. There are a lot of utilities that have some variation of this, so it's an easy fix!
From the find(1) manpage on Linux:
-print0
True; print the full file name on the standard output, followed by a null character (instead
of the newline character that -print uses). This allows file names that contain newlines or
other types of white space to be correctly interpreted by programs that process the find output. This option corresponds to the -0 option of xargs.
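A minimal sketch of that combination, assuming GNU find (for -printf '%P') and an rsync recent enough to support --files-from/--from0; -mindepth 1 replaces the sed '1d' by skipping the top-level directory itself, %P prints each name relative to $SCOPE, and -r is spelled out because -a does not imply recursion when --files-from is in use:
find "$SCOPE" -mindepth 1 -maxdepth 1 -type d -mtime 0 -printf '%P\0' |
    rsync -avvuhr --from0 --files-from=- "$SCOPE" "$DESTINATION"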
If you don't need the tmp file, you can also use a "one line" command:
find "$SCOPE" -maxdepth 1 -mindepth 1 -type d -mtime 0 -exec rsync -avvuh {} "$DESTINATION" \;
-mindepth 1 # This replaces the sed '1d' (it skips the top-level directory itself)
-exec # This replaces the whole loop

Using bash I need to perform a find of 0 byte files but report on their existence before deletion

The history of this problem is:
I have millions of files and directories on a NAS system. I found a count of 1,095,601 empty (0 byte) files. These files used to have data but were destroyed by a predecessor not using the correct toolsets to migrate data between an XSAN and this Isilon NAS.
The files were media production data, like fonts, pdfs and image files. They are no longer useful beyond the history of their existence. Before I proceed to delete them, the production users need a record of which files used to exist, so when they browse a project folder, they can use the unaffected files but then refer to a text file in the same directory which records which files used to also be there, and thus provides a reason as to why certain reference files are broken.
So how do I find files across multiple directories and delete them but first output their filename to a text file which would be saved to each relevant path location?
I am thinking along the lines of:
for file in $(find . -type f -size 0); do
echo "$file" >> /PATH/TO/FOUND/FILE/PARENT/DIR/deletedFiles.txt -print0 |
xargs -0 rm ;
done
To delete each empty file while leaving behind a file called deletedFiles.txt which contains the names of the deleted files, try:
PATH=/bin:/usr/bin find . -empty -type f -execdir bash -c 'printf "%s\n" "$@" >>deletedFiles.txt' none {} + -delete
How it works
PATH=/bin:/usr/bin
This sets a temporary but secure path.
find .
This starts find looking in the current directory
-empty
This tells find to only look for empty files
-type f
This restricts find to looking for regular files.
-execdir bash -c 'printf "%s\n" "$@" >>deletedFiles.txt' none {} +
In each directory that contains an empty file, this adds the name of each empty file to the file deletedFiles.txt.
Notice the peculiar use of none in the command:
bash -c 'printf "%s\n" "$@" >>deletedFiles.txt' none {} +
When this command is run, bash will execute the string printf "%s\n" "$@" >>deletedFiles.txt, and the arguments that follow that string are assigned to the positional parameters: $0, $1, $2, etc. When we use "$@", it does not include $0; as usual, it expands to $1, $2, .... Thus, we add the placeholder none so that it is assigned to $0, which we will ignore, and the complete list of file names is assigned to "$@" (a tiny standalone demonstration follows after this breakdown).
-delete
This deletes each empty file.
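To see how the none placeholder and "$@" line up, here is a tiny standalone demonstration you can run directly; the arguments a b c stand in for the file names find would append:
bash -c 'echo "zeroth: $0"; printf "arg: %s\n" "$@"' none a b c
# zeroth: none
# arg: a
# arg: b
# arg: c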
Why not simply
find . -type f -size 0 -exec rm -v {} + |
sed -e 's%^removed .\./%%' -e 's/.$//' >deletedFiles.txt
If your find is too old to support -exec ... + you'll need to revert to -exec rm -v {} \; or refactor to
find . -type f -size 0 -print0 |
xargs -r -0 rm -v |
sed -e 's%^removed .\./%%' -e 's/.$//' >deletedFiles.txt
The brief sed script is to postprocess the output from rm -v which looks like
removed ‘./bar’
removed ‘./foo’
(with some funny quote characters around the file name) on my system. If you are fine with that output, of course, just omit the sed script from the pipeline.
If you know in advance which directories contain empty files, you can run the above snippet individually in those directories. Assuming you saved the snippet above as a script (with a proper shebang and execute permissions) named find-empty, you could simply use
for path in /path/to/first /path/to/second/directory /path/to/etc; do
cd "$path" && find-empty
done
This will only work if you have absolute paths (if not, you can run the body of the loop in a subshell by adding parentheses around it).
If you want to inspect all the directories in a tree, change the script to print to standard output instead (remove >deletedFiles.txt from the script) and try something like
find /path/to/tree -type d -exec sh -c '
t=$(mktemp -t find-emptyXXXXXXXX)
cd "$1" &&
find-empty | grep . >"$t" &&
mv "$t" deletedFiles.txt ||
rm "$t"' _ {} \;
This uses a temporary file so as to avoid updating the timestamp of directories which do not contain any empty files. The grep . is used purely for side effect; if any (non-empty) lines are printed, it will return success, whereas otherwise, it will report failure; this way, we know whether or not to move the temporary file to the target directory.
With prompting from @JonathanLeffler I have succeeded with the following:
#!/bin/bash
## call this script with: find . -type f -empty -exec handleEmpty.sh {} +
for file in "$#"
do
file2="$(basename "$file")"
echo "$file2" >> "$(dirname "$file")"/deletedFiles.txt
rm "$file"
done
This means I retain a trace of the removed files in a deletedFiles.txt flag file in each respective directory for the users to see when files are missing. That way, they can pursue going back to archive CD's to retrieve these deleted files, which are hopefully not 0 byte files.
Thanks to @John1024 for the suggestion of using the -empty test rather than -size.

