Remove OS X Apps via Shell Script - bash

I'm trying to create a script that will remove a list of applications in OS X.
My general thinking:
Applications are just directories in OS X
Those directories have a Contents sub-directory that contains an Info.plist that can be used to identify the app.
So the core logic would be
Walk the drive
IF the current folder has a subfolder named Contents that contains a file called Info.plist that contains certain text, delete the current folder.
I've been playing around with find -exec, but am open to other approaches.
This is what I have for find -exec that isn't quite working
find /Applications -maxdepth 3 -type f -name Info.plist -exec sh -c "grep '<string>Example</string>' | xargs dirname | xargs echo rm --" 2>/dev/null
I realize
defaults read $dir/Contents/Info CFBundleExecutable
is probably better than grep to extract the package name, but don't think that is why the above isn't working (no output from the test "echo rm" at all, and inserting a tee command to output to a file didn't do anything either).
The above line, if it worked, would go in a loop running for each app to be removed with a list of app names in a variable.
I am totally open to other approaches, especially if there's a more efficient way to do this.

It's saner to keep the logic in your parent shell, rather than shuffling it off to a subprocess:
#!/usr/bin/env bash
# works correctly with names echo doesn't handle -- ones containing spaces, backslashes, etc
log_command() { printf '%q ' "$@" && echo; }
while IFS= read -r -d '' plist; do
if grep -e '<string>Example</string>' "$plist"; then
log_command rm -r -- "${plist%/Contents/Info.plist}"
fi
done < <(find /Applications -maxdepth 3 -type f -name Info.plist -print0)
Note that <(...) -- process substitution -- is a bash-only feature, so it isn't guaranteed to work if your script is started with sh instead of bash.
This can also be easily modified to follow better practices, such as reading the plist with defaults instead of grep:
#!/usr/bin/env bash
log_command() { printf '%q ' "$@" && echo; }
while IFS= read -r -d '' plist; do
dir=${plist%/Contents/Info.plist} # the .app bundle directory itself
if [[ "$(defaults read "$dir"/Contents/Info CFBundleExecutable)" = example ]]; then
log_command rm -r -- "$dir"
fi
done < <(find /Applications -maxdepth 3 -type f -name Info.plist -print0)
However, if you really want find to be the parent process, you can do that:
find /Applications -maxdepth 3 -type f -name Info.plist -exec bash -c '
log_command() { printf "%q " "$@" && echo; }
for plist do
if grep -e "<string>Example</string>" "$plist"; then
log_command rm -r -- "${plist%/Contents/Info.plist}"
fi
done
' _ {} +
The use of -exec ... {} + passes as many results as possible to each copy of bash. The _ fills in $0, so the filenames that were found and placed on the command line are put in $1, $2, etc; this is what for loops over when not given an explicit list.
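To see how the _ placeholder and the bare for loop behave without touching /Applications, here is a minimal sketch. The argument list (alpha, "beta gamma") is made up for illustration and stands in for the paths find would append.

```shell
#!/usr/bin/env bash
# Sketch: show how `bash -c 'script' _ args...` assigns positional parameters.
# The arguments (alpha, "beta gamma") are hypothetical stand-ins for found paths.
out=$(bash -c '
  printf "arg0=%s\n" "$0"
  for item do                  # no "in ..." means: loop over "$@"
    printf "item=%s\n" "$item"
  done
' _ alpha "beta gamma")
printf '%s\n' "$out"
```

The _ lands in $0, so the loop only ever sees the real arguments, and an argument containing a space stays in one piece.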

Since you're limiting it to a depth of 3, and that's also practically the minimum depth, and the rest of the path is going to be highly constrained (it must be *.app/Contents/) you don't really need find. A simple glob pattern should suffice, and should also be more efficient (since it doesn't have to search e.g. Contents/Resources):
for plist in /Applications/*.app/Contents/Info.plist; do
if [ "$(defaults read "${plist%.plist}" CFBundleExecutable)" = Example ]; then
echo rm -R -- "${plist%/Contents/Info.plist}" # remove "echo" to actually do it
fi
done

Related

Recursively Rename Files and Directories with Bash on macOS

I'm writing a script that will perform some actions, and one of those actions is to find all occurrences of a string in both file names and directory names, and replace it with another string.
I have this so far
find . -name "*foo*" -type f -depth | while read file; do
newpath=${file//foo/bar}
mv "$file" "$newpath"
done
This works fine as long as the path to the file doesn't also contain foo, but that isn't guaranteed.
I feel like the way to approach this is to ONLY change the file names first, then go back through and change the directory names, but even then, if you have a structure that has more than one directory with foo in it, it will not work properly.
Is there a way to do this with built in macOS tools? (I say built-in, because this script is going to be distributed to some other folks in our organization and it can't rely on any packages to be installed).
Separate the path_name from the file_name, something like this:
#!/usr/bin/env bash
while read -r file; do
path_name="${file%/*}"; printf 'Path is %s\n' "$path_name"
file_name="${file#"$path_name"}"; printf 'Filename is %s\n' "$file_name"
newpath="$path_name${file_name//foo/bar}"
echo mv -v "$file" "$newpath"
done < <(find . -name "*foo*" -type f)
Have a look at basename and dirname as well.
The printf calls are just there to show which part is the path and which is the filename.
The script only replaces foo with bar in the file_name. The same can be done with the path_name, using the same syntax.
newpath="${path_name//bar/more}${file_name//foo/bar}"
That renames both the path_name and the file_name.
Renaming the path_name first and then the file_name, as in your idea, is also an option.
path_name="${file%/*}"
file_name="${file#"$path_name"}"
new_pathname="${path_name//bar/more}"
mv -v "$path_name" "$new_pathname"
new_filename="${file_name//foo/bar}"
mv -v "$new_pathname$file_name" "$new_pathname$new_filename"
No additional external tools or utilities are used, apart from the ones your script already uses.
Remove the echo if you're satisfied with the result/output.
You can use -execdir to run a command on just the filename (basename) in the relevant directory:
find . -depth -name '*foo*' -execdir bash -c 'mv -- "${1}" "${1//foo/bar}"' _ {} \;
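If it helps to see the "deepest first, last path component only" idea spelled out, here is a rough sketch that uses only find, mv and bash parameter expansion. The scratch tree under a temporary directory and the variable names (root, parent, name) are invented for the demo, so treat it as an illustration rather than a drop-in script.

```shell
#!/usr/bin/env bash
# Sketch: rename "foo" -> "bar" in file AND directory names, deepest entries first.
# The scratch tree below is hypothetical and exists only for the demo.
root=$(mktemp -d)
mkdir -p "$root/foo dir/sub foo"
touch "$root/foo dir/sub foo/my foo.txt"

# -depth lists a directory's contents before the directory itself, so a parent
# is only renamed after everything inside it has already been handled.
while IFS= read -r -d '' path; do
  parent=${path%/*}            # everything before the last slash
  name=${path##*/}             # last path component only
  mv -- "$path" "$parent/${name//foo/bar}"
done < <(find "$root" -depth -mindepth 1 -name '*foo*' -print0)

find "$root" -mindepth 1 | sort
```

Because only the last component is rewritten, a foo higher up in the path is left alone until find reaches that directory itself.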

How to find files with specific extensions recursively using the for/in syntax? [duplicate]

x=$(find . -name "*.txt")
echo $x
If I run the above piece of code in a Bash shell, what I get is a string containing several file names separated by blanks, not a list.
Of course, I can further separate them by blank to get a list, but I'm sure there is a better way to do it.
So what is the best way to loop through the results of a find command?
TL;DR: If you're just here for the most correct answer, you probably want my personal preference (see the bottom of this post):
# execute `process` once for each file
find . -name '*.txt' -exec process {} \;
If you have time, read through the rest to see several different ways and the problems with most of them.
The full answer:
The best way depends on what you want to do, but here are a few options. As long as no file or folder in the subtree has whitespace in its name, you can just loop over the files:
for i in $x; do # Not recommended, will break on whitespace
process "$i"
done
Marginally better, cut out the temporary variable x:
for i in $(find . -name \*.txt); do # Not recommended, will break on whitespace
process "$i"
done
It is much better to glob when you can. White-space safe, for files in the current directory:
for i in *.txt; do # Whitespace-safe but not recursive.
process "$i"
done
By enabling the globstar option, you can glob all matching files in this directory and all subdirectories:
# Make sure globstar is enabled
shopt -s globstar
for i in **/*.txt; do # Whitespace-safe and recursive
process "$i"
done
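A small self-contained sketch of the globstar loop, assuming bash 4 or newer (the bash 3.2 that macOS ships as /bin/bash does not have globstar). The scratch directory and file names are invented for the demo.

```shell
#!/usr/bin/env bash
# Sketch: recursive globbing with globstar (needs bash 4 or newer).
# The scratch directory and file names are hypothetical.
shopt -s globstar nullglob
root=$(mktemp -d)
mkdir -p "$root/a/b"
touch "$root/top.txt" "$root/a/mid file.txt" "$root/a/b/deep.txt" "$root/a/skip.log"

matches=()
for f in "$root"/**/*.txt; do  # ** also matches "zero directories", so top.txt is included
  matches+=("$f")
done
printf 'found %d file(s)\n' "${#matches[@]}"
```

nullglob is enabled here so that the loop body is skipped entirely when nothing matches.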
In some cases, e.g. if the file names are already in a file, you may need to use read:
# IFS= makes sure it doesn't trim leading and trailing whitespace
# -r prevents interpretation of \ escapes.
while IFS= read -r line; do # Whitespace-safe EXCEPT newlines
process "$line"
done < filename
read can be used safely in combination with find by setting the delimiter appropriately:
find . -name '*.txt' -print0 |
while IFS= read -r -d '' line; do
process "$line"
done
For more complex searches, you will probably want to use find, either with its -exec option or with -print0 | xargs -0:
# execute `process` once for each file
find . -name \*.txt -exec process {} \;
# execute `process` once with all the files as arguments*:
find . -name \*.txt -exec process {} +
# using xargs*
find . -name \*.txt -print0 | xargs -0 process
# using xargs with arguments after each filename (implies one run per filename)
find . -name \*.txt -print0 | xargs -0 -I{} process {} argument
find can also cd into each file's directory before running a command by using -execdir instead of -exec, and can be made interactive (prompt before running the command for each file) using -ok instead of -exec (or -okdir instead of -execdir).
*: Technically, both find and xargs (by default) will run the command with as many arguments as they can fit on the command line, as many times as it takes to get through all the files. In practice, unless you have a very large number of files it won't matter, and if you exceed the length but need them all on the same command line, you're out of luck and will have to find a different way.
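To make the difference between \; and + concrete, here is a rough sketch that counts invocations. The three scratch files are hypothetical, and sh -c 'echo "$#"' simply stands in for process.

```shell
#!/usr/bin/env bash
# Sketch: count how often find starts the command with `\;` versus `+`.
# The three scratch files are hypothetical.
root=$(mktemp -d)
touch "$root/a.txt" "$root/b.txt" "$root/c d.txt"

# One invocation per file: each run prints one line.
per_file=$(find "$root" -name '*.txt' -exec sh -c 'echo "$#"' _ {} \; | wc -l)

# Batched: a single run receives all the names and prints its argument count.
batched=$(find "$root" -name '*.txt' -exec sh -c 'echo "$#"' _ {} +)

echo "runs with \\;: $((per_file))"
echo "arguments seen by the single + run: $batched"
```

With only a handful of files the + form fits everything into one run; with very many files find would split them over several runs.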
Whatever you do, don't use a for loop:
# Don't do this
for file in $(find . -name "*.txt")
do
…code using "$file"
done
Three reasons:
For the for loop to even start, the find must run to completion.
If a file name has any whitespace (including space, tab or newline) in it, it will be treated as two separate names.
Although now unlikely, you can overrun your command line buffer. Imagine if your command line buffer holds 32KB, and your for loop returns 40KB of text. That last 8KB will be dropped right off your for loop and you'll never know it.
Always use a while read construct:
find . -name "*.txt" -print0 | while read -d $'\0' file
do
…code using "$file"
done
The loop will execute while the find command is executing. Plus, this command will work even if a file name is returned with whitespace in it. And, you won't overflow your command line buffer.
The -print0 makes find use the NUL character as the file separator instead of a newline, and -d $'\0' makes read use NUL as the delimiter while reading.
find . -name "*.txt"|while read fname; do
echo "$fname"
done
Note: this method and the (second) method shown by bmargulies are safe to use with white space in the file/folder names.
In order to also have the - somewhat exotic - case of newlines in the file/folder names covered, you will have to resort to the -exec predicate of find like this:
find . -name '*.txt' -exec echo "{}" \;
The {} is the placeholder for the found item and the \; is used to terminate the -exec predicate.
And for the sake of completeness let me add another variant - you gotta love the *nix ways for their versatility:
find . -name '*.txt' -print0|xargs -0 -n 1 echo
This would separate the printed items with a \0 character that isn't allowed in any of the file systems in file or folder names, to my knowledge, and therefore should cover all bases. xargs picks them up one by one then ...
Filenames can include spaces and even control characters. Spaces are (default) delimiters for shell expansion in bash and as a result of that x=$(find . -name "*.txt") from the question is not recommended at all. If find gets a filename with spaces e.g. "the file.txt" you will get 2 separated strings for processing, if you process x in a loop. You can improve this by changing delimiter (bash IFS Variable) e.g. to \r\n, but filenames can include control characters - so this is not a (completely) safe method.
From my point of view, there are 2 recommended (and safe) patterns for processing files:
1. Use for loop & filename expansion:
for file in ./*.txt; do
[[ ! -e $file ]] && continue # continue, if file does not exist
# single filename is in $file
echo "$file"
# your code here
done
2. Use find-read-while & process substitution
while IFS= read -r -d '' file; do
# single filename is in $file
echo "$file"
# your code here
done < <(find . -name "*.txt" -print0)
Remarks
on Pattern 1:
bash returns the search pattern ("*.txt") if no matching file is found - so the extra line "continue, if file does not exist" is needed. see Bash Manual, Filename Expansion
shell option nullglob can be used to avoid this extra line.
"If the failglob shell option is set, and no matches are found, an error message is printed and the command is not executed." (from Bash Manual above)
shell option globstar: "If set, the pattern ‘**’ used in a filename expansion context will match all files and zero or more directories and subdirectories. If the pattern is followed by a ‘/’, only directories and subdirectories match." see Bash Manual, Shopt Builtin
other options for filename expansion: extglob, nocaseglob, dotglob & shell variable GLOBIGNORE
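The remarks on Pattern 1 can be checked with a short sketch: it runs the same glob loop in an empty directory, once with the default settings and once with nullglob. The empty scratch directory is hypothetical, and the sketch assumes failglob is not set.

```shell
#!/usr/bin/env bash
# Sketch: what a glob loop does in a directory with no matches.
empty=$(mktemp -d)             # hypothetical empty scratch directory

shopt -u nullglob              # default behaviour
without=0
for f in "$empty"/*.txt; do without=$((without + 1)); done

shopt -s nullglob              # unmatched patterns expand to nothing
with=0
for f in "$empty"/*.txt; do with=$((with + 1)); done
shopt -u nullglob

echo "iterations without nullglob: $without"   # the literal pattern counts once
echo "iterations with nullglob: $with"
```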
on Pattern 2:
filenames can contain blanks, tabs, spaces, newlines, ... to process filenames in a safe way, find with -print0 is used: filename is printed with all control characters & terminated with NUL. see also Gnu Findutils Manpage, Unsafe File Name Handling, safe File Name Handling, unusual characters in filenames. See David A. Wheeler below for detailed discussion of this topic.
There are some possible patterns to process find results in a while loop. Others (kevin, David W.) have shown how to do this using pipes:
files_found=1
find . -name "*.txt" -print0 |
while IFS= read -r -d '' file; do
# single filename in $file
echo "$file"
files_found=0 # not working example
# your code here
done
[[ $files_found -eq 0 ]] && echo "files found" || echo "no files found"
When you try this piece of code, you will see that it does not work: files_found is never updated, and the code always echoes "no files found". The reason is that each command of a pipeline is executed in a separate subshell, so a variable changed inside the loop (separate subshell) is not changed in the main shell script. This is why I recommend using process substitution as the "better", more useful, more general pattern. See I set variables in a loop that's in a pipeline. Why do they disappear... (from Greg's Bash FAQ) for a detailed discussion on this topic.
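A side-by-side sketch of the two forms, assuming the lastpipe shell option is not enabled (the default). The scratch files and the counter names (piped, kept) are made up for the demo.

```shell
#!/usr/bin/env bash
# Sketch: a counter updated inside a piped loop is lost; with process
# substitution it survives. Scratch files are hypothetical.
root=$(mktemp -d)
touch "$root/one.txt" "$root/two.txt"

piped=0
find "$root" -name '*.txt' -print0 |
  while IFS= read -r -d '' file; do piped=$((piped + 1)); done

kept=0
while IFS= read -r -d '' file; do kept=$((kept + 1)); done \
  < <(find "$root" -name '*.txt' -print0)

echo "after pipe: $piped, after process substitution: $kept"
```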
Additional References & Sources:
Gnu Bash Manual, Pattern Matching
Filenames and Pathnames in Shell: How to do it Correctly, David A. Wheeler
Why you don't read lines with "for", Greg's Wiki
Why you shouldn't parse the output of ls(1), Greg's Wiki
Gnu Bash Manual, Process Substitution
(Updated to include @Socowi's excellent speed improvement)
With any $SHELL that supports it (dash/zsh/bash...):
find . -name "*.txt" -exec $SHELL -c '
for i in "$@" ; do
echo "$i"
done
' _ {} +
Done.
Original answer (shorter, but slower):
find . -name "*.txt" -exec $SHELL -c '
echo "$0"
' {} \;
If you can assume the file names don't contain newlines, you can read the output of find into a Bash array using the following command:
readarray -t x < <(find . -name '*.txt')
Note:
-t causes readarray to strip newlines.
It won't work if readarray is in a pipe, hence the process substitution.
readarray is available since Bash 4.
Bash 4.4 and up also supports the -d parameter for specifying the delimiter. Using the null character, instead of newline, to delimit the file names works also in the rare case that the file names contain newlines:
readarray -d '' x < <(find . -name '*.txt' -print0)
readarray can also be invoked as mapfile with the same options.
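A quick way to convince yourself that the NUL-delimited form copes with awkward names is a sketch like the following. It assumes bash 4.4 or newer, and the file names are invented to include a space and a newline.

```shell
#!/usr/bin/env bash
# Sketch: load find results into an array, NUL-delimited (bash 4.4+).
# The scratch files are hypothetical.
root=$(mktemp -d)
touch "$root/plain.txt" "$root/with space.txt" "$root/with"$'\n'"newline.txt"

readarray -d '' files < <(find "$root" -name '*.txt' -print0)
echo "array holds ${#files[@]} names"
```

Reading the same output line by line would report four entries here, because the newline inside one name splits it in two.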
Reference: https://mywiki.wooledge.org/BashFAQ/005#Loading_lines_from_a_file_or_stream
# Doesn't handle whitespace
for x in `find . -name "*.txt" -print`; do
process_one $x
done
or
# Handles whitespace and newlines
find . -name "*.txt" -print0 | xargs -0 -n 1 process_one
I like to use find, first assigning its output to a variable and switching IFS to a newline, as follows:
FilesFound=$(find . -name "*.txt")
IFSbkp="$IFS"
IFS=$'\n'
counter=1;
for file in $FilesFound; do
echo "${counter}: ${file}"
let counter++;
done
IFS="$IFSbkp"
As commented by @Konrad Rudolph, this will not work with newlines in file names. I still think it is handy, as it covers most of the cases when you need to loop over command output.
As already posted on the top answer by Kevin, the best solution is to use a for loop with bash glob, but as bash glob is not recursive by default, this can be fixed by a bash recursive function:
#!/bin/bash
set -x
set -eu -o pipefail
all_files=();
function get_all_the_files()
{
directory="$1";
for item in "$directory"/* "$directory"/.[^.]*;
do
if [[ -d "$item" ]];
then
get_all_the_files "$item";
else
all_files+=("$item");
fi;
done;
}
get_all_the_files "/tmp";
for file_path in "${all_files[@]}"
do
printf 'My file is "%s"\n' "$file_path";
done;
Related questions:
Bash loop through directory including hidden file
Recursively list files from a given directory in Bash
ls command: how can I get a recursive full-path listing, one line per file?
List files recursively in Linux CLI with path relative to the current directory
Recursively List all directories and files
bash script, create array of all files in a directory
How can I creates array that contains the names of all the files in a folder?
How to get the list of files in a directory in a shell script?
Based on other answers and a comment by @phk, using fd #3:
(which still allows to use stdin inside the loop)
while IFS= read -r f <&3; do
echo "$f"
done 3< <(find . -iname "*filename*")
You can put the filenames returned by find into an array like this:
array=()
while IFS= read -r -d ''; do
array+=("$REPLY")
done < <(find . -name '*.txt' -print0)
Now you can just loop through the array to access individual items and do whatever you want with them.
Note: It's white space safe.
You can store your find output in array if you wish to use the output later as:
array=($(find . -name "*.txt"))
Now, to print each element on a new line, you can either use a for loop iterating over all the elements of the array, or you can use a printf statement.
for i in "${array[@]}"; do echo "$i"; done
or
printf '%s\n' "${array[@]}"
You can also use:
for file in "`find . -name "*.txt"`"; do echo "$file"; done
This will print each filename in newline
To only print the find output in list form, you can use either of the following:
find . -name "*.txt" -print 2>/dev/null
or
find . -name "*.txt" -print | grep -v 'Permission denied'
This will remove error messages and only give the filename as output in new line.
If you wish to do something with the filenames, storing it in array is good, else there is no need to consume that space and you can directly print the output from find.
I think using this piece of code (feeding the command's output in after done):
while read fname; do
echo "$fname"
done <<< "$(find . -name "*.txt")"
is better than the piped version in this answer: there, the while loop is executed in a subshell, so variable changes made inside the loop cannot be seen after the loop ends.
function loop_through(){
length_="$(find . -name '*.txt' | wc -l)"
length_="${length_#"${length_%%[![:space:]]*}"}"
length_="${length_%"${length_##*[![:space:]]}"}"
for ((i = 1; i <= length_; i++)) # brace expansion {1..$n} does not expand variables
do
x=$(find . -name '*.txt' | sort | head -n "$i" | tail -n 1)
echo "$x"
done
}
To get the length of the list of files for the loop, I used the first command, "wc -l".
Its output is stored in a variable.
Then I strip the leading and trailing whitespace from the variable so the for loop can read it.
find <path> -xdev -type f -name '*.txt' -exec ls -l {} \;
This will list the files and give details about attributes.
Another alternative is to not use bash, but call Python to do the heavy lifting. I resorted to this because bash solutions such as my other answer were too slow.
With this solution, we build a bash array of files from inline Python script:
#!/bin/bash
set -eu -o pipefail
dsep=":" # directory_separator
base_directory=/tmp
all_files=()
all_files_string="$(python3 -c '#!/usr/bin/env python3
import os
import sys
dsep="'"$dsep"'"
base_directory="'"$base_directory"'"
def log(*args, **kwargs):
print(*args, file=sys.stderr, **kwargs)
def check_invalid_characther(file_path):
for thing in ("\\", "\n"):
if thing in file_path:
raise RuntimeError(f"It is not allowed {thing} on \"{file_path}\"!")
def absolute_path_to_relative(base_directory, file_path):
relative_path = os.path.commonprefix( [ base_directory, file_path ] )
relative_path = os.path.normpath( file_path.replace( relative_path, "" ) )
# if you use Windows Python, it accepts / instead of \\
# if you have \ on your files names, rename them or comment this
relative_path = relative_path.replace("\\", "/")
if relative_path.startswith( "/" ):
relative_path = relative_path[1:]
return relative_path
for directory, directories, files in os.walk(base_directory):
for file in files:
local_file_path = os.path.join(directory, file)
local_file_name = absolute_path_to_relative(base_directory, local_file_path)
log(f"local_file_name {local_file_name}.")
check_invalid_characther(local_file_name)
print(f"{base_directory}{dsep}{local_file_name}")
' | dos2unix)";
if [[ -n "$all_files_string" ]];
then
readarray -t temp <<< "$all_files_string";
all_files+=("${temp[@]}");
fi;
for item in "${all_files[@]}";
do
OLD_IFS="$IFS"; IFS="$dsep";
read -r base_directory local_file_name <<< "$item"; IFS="$OLD_IFS";
printf 'item "%s", base_directory "%s", local_file_name "%s".\n' \
"$item" \
"$base_directory" \
"$local_file_name";
done;
Related:
os.walk without hidden folders
How to do a recursive sub-folder search and return files in a list?
How to split a string into an array in Bash?
How about if you use grep instead of find?
ls | grep '\.txt$' > out.txt
Now you can read this file and the filenames are in the form of a list.

Execute bash function from find command

I have defined a function in bash, which checks if two files exists, compare if they are equal and delete one of them.
function remodup {
F=$1
G=${F/.mod/}
if [ -f "$F" ] && [ -f "$G" ]
then
cmp --silent "$F" "$G" && rm "$F" || echo "$G was modified"
fi
}
Then I want to call this function from a find command:
find $DIR -name "*.mod" -type f -exec remodup {} \;
I have also tried | xargs syntax. Both find and xargs report that remodup does not exist.
I can move the function into a separate bash script and call the script, but I don't want to copy that function into a path directory (yet), so I would either need to call the function script with an absolute path or always call the calling script from the same location.
(I probably can use fdupes for this particular task, but I would like to find a way to either
call a function from find command;
call one script from a relative path of another script; or
Use a ${F/.mod/} syntax (or other bash variable manipulation) for files found with a find command.)
You need to export the function first using:
export -f remodup
then use it as:
find $DIR -name "*.mod" -type f -exec bash -c 'remodup "$1"' - {} \;
You could manually loop over find's results.
while IFS= read -rd $'\0' file; do
remodup "$file"
done < <(find "$dir" -name "*.mod" -type f -print0)
-print0 and -d $'\0' use NUL as the delimiter, allowing for newlines in the file names. IFS= ensures spaces at the beginning of file names aren't stripped. -r disables backslash escapes. The sum total of all of these options is to allow as many special characters as possible in file names without mangling.
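The effect of IFS= and -r is easy to see on a single line of input. This is only a sketch: the sample line is made up, and a here-string is used instead of find to keep it short.

```shell
#!/usr/bin/env bash
# Sketch: what IFS= and -r protect against. The sample line is made up.
line='  leading spaces and a back\slash'

read plain <<< "$line"           # default: trims whitespace, eats the backslash
IFS= read -r exact <<< "$line"   # keeps the line as it was

printf 'plain=[%s]\nexact=[%s]\n' "$plain" "$exact"
```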
Given that you aren't using many features of find, you can use a pure bash solution instead to iterate over the desired files.
shopt -s globstar nullglob
for fname in ./"$DIR"/**/*.mod; do
[[ -f $fname ]] || continue
remodup "$fname" # pass the full path so remodup's -f tests can find the file
done
To throw in a third option:
find "$dir" -name "*.mod" -type f \
-exec bash -s -c "$(declare -f remodup)"$'\n'' for arg; do remodup "$arg"; done' _ {} +
This passes the function through the argv, as opposed to through the environment, and (by virtue of using {} + rather than {} \;) uses as few shell instances as possible.
I would use John Kugelman's answer as my first choice, and this as my second.
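For comparison, here is a minimal sketch of the two ways of handing a function to a child bash, through the environment with export -f and through the command string with declare -f. The functions shout and whisper are made up for the demo and stand in for remodup.

```shell
#!/usr/bin/env bash
# Sketch: two ways to make a shell function visible to a child bash.
# `shout` and `whisper` are hypothetical functions used only for this demo.
shout()   { printf '%s!\n' "$1"; }
whisper() { printf '(%s)\n' "$1"; }

# 1) Through the environment: the child bash imports exported functions.
export -f shout
via_env=$(bash -c 'shout "$1"' _ hello)

# 2) Through the command string: declare -f prints the function's source,
#    which is prepended to the script the child runs.
via_argv=$(bash -c "$(declare -f whisper)"$'\n''whisper "$1"' _ hello)

echo "$via_env $via_argv"
```

Neither variant works if the child is a different shell such as sh or dash, since both rely on bash understanding the function definition.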

find command with filename coming from bash printf builtin not working

I'm trying to write a script which lists the files in a directory and then searches for each file, one by one, in another directory. To deal with spaces and special characters like "[" or "]" I'm using $(printf %q "$FILENAME") as input for the find command: find /directory/to/search -type f -name $(printf %q "$FILENAME").
It works like a charm for every filename except in one case: when there are multibyte characters (UTF-8). In that case the output of printf is an externally quoted string, i.e.: $'file name with blank spaces and quoted characters in the form of \NNN\NNN', and that string is not expanded without the $'' quoting, so find searches for a file with a name that includes the quoting: «$'filename'».
Is there an alternative solution in order to be able to pass to find any kind of filename?
My script is as follows (I know some lines can be deleted, like the "RESNAME="):
#!/bin/bash
if [ -d $1 ] && [ -d $2 ]; then
IFSS=$IFS
IFS=$'\n'
FILES=$(find $1 -type f )
for FILE in $FILES; do
BASEFILE=$(printf '%q' "$(basename "$FILE")")
RES=$(find $2 -type f -name "$BASEFILE" -print )
if [ ${#RES} -gt 1 ]; then
RESNAME=$(printf '%q' "$(basename "$RES")")
else
RESNAME=
fi
if [ "$RESNAME" != "$BASEFILE" ]; then
echo "FILE NOT FOUND: $FILE"
fi
done
else
echo "Directories do not exist"
fi
IFS=$IFSS
As an answer suggested, I've tried associative arrays, but with no luck. Maybe I'm not using the arrays correctly, but echoing the array (array[@]) returns nothing. This is the script I've written:
#!/bin/bash
if [ -d "$1" ] && [ -d "$2" ]; then
declare -A files
find "$2" -type f -print0 | while read -r -d $'\0' FILE;
do
BN2="$(basename "$FILE")"
files["$BN2"]="$BN2"
done
echo "${files[@]}"
find "$1" -type f -print0 | while read -r -d $'\0' FILE;
do
BN1="$(basename "$FILE")"
if [ "${files["$BN1"]}" != "$BN1" ]; then
echo "File not found: "$BN1""
fi
done
fi
Don't use for loops. First, it is slower. Your find has to complete before the rest of your program can run. Second, it is possible to overload the command line. The entire for command must fit in the command line buffer.
Most importantly of all, for sucks at handling funky file names. You're running conniptions trying to get around this. However:
find $1 -type f -print0 | while read -r -d $'\0' FILE
will work much better. It handles file names -- even file names that contain \n characters. The -print0 tells find to separate file names with the NUL character. The while read -r -d $'\0' FILE will read each file name (separated by the NUL character) into $FILE.
If you put quotes around the file name in the find command, you don't have to worry about special characters in the file names.
Your script is running find once for each file found. If you have 100 files in your first directory, you're running find 100 times.
Do you know about associative (hash) arrays in BASH? You are probably better off using associative arrays. Run find on the first directory, and store those files names in an associative array.
Then, run find (again using the find | while read syntax) for your second directory. For each file you find in the second directory, see if you have a matching entry in your associative array. If you do, you know that file is in both arrays.
Addendum
I've been looking at the find command. It appears there's no real way to prevent it from using pattern matching except through a lot of work (like you were doing with printf). I've tried using the -regex matching and using \Q and \E to remove the special meaning of pattern characters. I haven't been successful.
There comes a time that you need something a bit more powerful and flexible than shell to implement your script, and I believe this is the time.
Perl, Python, and Ruby are three fairly ubiquitous scripting languages found on almost all Unix systems and are available on other non-POSIX platforms (cough! ...Windows!... cough!).
Below is a Perl script that takes two directories, and searches them for matching files. It uses the find command once and uses associative arrays (called hashes in Perl). I key the hash to the name of my file. In the value portion of the hash, I store an array of the directories where I found this file.
I only need to run the find command once per directory. Once that is done, I can print out all the entries in the hash that contain more than one directory.
I know it's not shell, but this is one of the cases where you can spend a lot more time trying to figure out how to get shell to do what you want than it's worth.
#! /usr/bin/env perl
use strict;
use warnings;
use feature qw(say);
use File::Find;
use constant DIRECTORIES => qw( dir1 dir2 );
my %files;
#
# Perl version of the find command. You give it a list of
# directories and a subroutine for filtering what you find.
# I am basically rejecting all non-file entries, then pushing
# them into my %files hash as an array.
#
find (
sub {
return unless -f;
$files{$_} = [] if not exists $files{$_};
push @{ $files{$_} }, $File::Find::dir;
}, DIRECTORIES
);
#
# All files are found and in %files hash. I can then go
# through all the entries in my hash, and look for ones
# with more than one directory in the array reference.
# IF there is more than one, the file is located in multiple
# directories, and I print them.
#
for my $file ( sort keys %files ) {
if ( @{ $files{$file} } > 1 ) {
say "File: $file: " . join ", ", @{ $files{$file} };
}
}
Try something like this:
find "$DIR1" -printf "%f\0" | xargs -0 -i find "$DIR2" -name \{\}
How about this one-liner?
find dir1 -type f -exec bash -c 'read < <(find dir2 -name "${1##*/}" -type f)' _ {} \; -printf "File %f is in dir2\n" -o -printf "File %f is not in dir2\n"
Absolutely 100% safe regarding files with funny symbols, newlines and spaces in their name.
How does it work?
find (the main one) will scan through directory dir1 and for each file (-type f) will execute
read < <(find dir2 -name "${1##*/}" -type f)
with argument the name of the current file given by the main find. This argument is at position $1. The ${1##*/} removes everything before the last / so that if $1 is path/to/found/file the find statement is:
find dir2 -name "file" -type f
This outputs something if file is found, otherwise has no output. That's what is read by the read bash command. read's exit status is true if it was able to read something, and false if there wasn't anything read (i.e., in case nothing is found). This exit status becomes bash's exit status which becomes -exec's status. If true, the next -printf statement is executed, and if false, the -o -printf part will be executed.
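The part that makes the one-liner work is read's exit status, which can be checked on its own. In this sketch echo and printf stand in for the inner find, and the path is made up.

```shell
#!/usr/bin/env bash
# Sketch: read succeeds only when it receives a line of input.
# echo/printf stand in for the inner find; "some/path" is hypothetical.
if read -r < <(echo "some/path"); then got_line=yes; else got_line=no; fi
if read -r < <(printf '');        then got_none=yes; else got_none=no; fi
echo "with output: $got_line, without output: $got_none"
```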
If your dirs are given in variables $dir1 and $dir2 do this, so as to be safe regarding spaces and funny symbols that could occur in $dir2:
find "$dir1" -type f -exec bash -c 'read < <(find "$0" -name "${1##*/}" -type f)' "$dir2" {} \; -printf "File %f is in $dir2\n" -o -printf "File %f is not in $dir2\n"
Regarding efficiency: this is of course not an efficient method at all! the inner find will be executed as many times as there are found files in dir1. This is terrible, especially if the directory tree under dir2 is deep and has many branches (you can rely a little bit on caching, but there are limits!).
Regarding usability: you have fine-grained control on how both find's work and on the output, and it's very easy to add many more tests.
So, hey, tell me how to compare files from two directories? Well, if you agree on losing a little bit of control, this will be the shortest and most efficient answer:
diff dir1 dir2
Try it, you'll be amazed!
Since you are only using find for its recursive directory following, it will be easier to simply use the globstar option in bash. (You're using associative arrays, so your bash is new enough).
#!/bin/bash
shopt -s globstar
declare -A files
if [[ -d $1 && -d $2 ]]; then
for f in "$2"/**/*; do
[[ -f "$f" ]] || continue
BN2=$(basename "$f")
files["$BN2"]=$BN2
done
echo "${files[@]}"
for f in "$1"/**/*; do
[[ -f "$f" ]] || continue
BN1=$(basename "$f")
if [[ ${files[$BN1]} != "$BN1" ]]; then
echo "File not found: $BN1"
fi
done
fi
** will match zero or more directories, so $1/**/* will match all the files and directories in $1, all the files and directories in those directories, and so forth all the way down the tree.
If you want to use associative arrays, here's one possibility that will work well with files with all sorts of funny symbols in their names (this script has too much to just show the point, but it is usable as is – just remove the parts you don't want and adapt to your needs):
#!/bin/bash
die() {
printf "%s\n" "$@"
exit 1
}
[[ -n $1 ]] || die "Must give two arguments (none found)"
[[ -n $2 ]] || die "Must give two arguments (only one given)"
dir1=$1
dir2=$2
[[ -d $dir1 ]] || die "$dir1 is not a directory"
[[ -d $dir2 ]] || die "$dir2 is not a directory"
declare -A dir1files
declare -A dir2files
while IFS=$'\0' read -r -d '' file; do
dir1files[${file##*/}]=1
done < <(find "$dir1" -type f -print0)
while IFS=$'\0' read -r -d '' file; do
dir2files[${file##*/}]=1
done < <(find "$dir2" -type f -print0)
# Which files in dir1 are in dir2?
for i in "${!dir1files[@]}"; do
if [[ -n ${dir2files[$i]} ]]; then
printf "File %s is both in %s and in %s\n" "$i" "$dir1" "$dir2"
# Remove it from the dir2 hash
unset dir2files["$i"]
else
printf "File %s is in %s but not in %s\n" "$i" "$dir1" "$dir2"
fi
done
# Which files in dir2 are not in dir1?
# Since I unset them from dir2files hash table, the only keys remaining
# correspond to files in dir2 but not in dir1
for i in "${!dir2files[@]}"; do
printf "File %s is in %s but not in %s\n" "$i" "$dir2" "$dir1"
done
Remark. The identification of files is only based on their filenames, not their contents.
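If contents matter as well, one possible extension is to remember the path of each name and compare the pair with cmp. This is only a sketch under stated assumptions: the function name same_name_same_content is made up, it needs bash 4+ for associative arrays, and it assumes file names are unique within the second directory (a later duplicate silently replaces an earlier one).

```shell
#!/usr/bin/env bash
# same_name_same_content DIR1 DIR2 (hypothetical helper)
# For every regular file in DIR1 whose name also occurs in DIR2,
# report whether the two files have identical contents.
same_name_same_content() {
  local dir1=$1 dir2=$2 file name
  local -A path_in_dir2=()
  while IFS= read -r -d '' file; do
    path_in_dir2[${file##*/}]=$file
  done < <(find "$dir2" -type f -print0)

  while IFS= read -r -d '' file; do
    name=${file##*/}
    [[ -n ${path_in_dir2[$name]:-} ]] || continue
    # cmp -s is silent and returns 0 only when the files are byte-identical.
    if cmp -s -- "$file" "${path_in_dir2[$name]}"; then
      printf 'identical: %s\n' "$name"
    else
      printf 'differs: %s\n' "$name"
    fi
  done < <(find "$dir1" -type f -print0)
}

# Demonstration on throwaway directories.
tmp=$(mktemp -d)
mkdir -p "$tmp/d1" "$tmp/d2/sub"
echo hello > "$tmp/d1/a.txt"; echo hello > "$tmp/d2/sub/a.txt"
echo one > "$tmp/d1/b.txt";   echo two > "$tmp/d2/b.txt"
result=$(same_name_same_content "$tmp/d1" "$tmp/d2")
printf '%s\n' "$result"
rm -rf "$tmp"
```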

How to loop through file names returned by find?

x=$(find . -name "*.txt")
echo $x
If I run the above piece of code in a Bash shell, what I get is a single string containing several file names separated by blanks, not a list.
Of course, I can further separate them by blank to get a list, but I'm sure there is a better way to do it.
So what is the best way to loop through the results of a find command?
TL;DR: If you're just here for the most correct answer, you probably want my personal preference (see the bottom of this post):
# execute `process` once for each file
find . -name '*.txt' -exec process {} \;
If you have time, read through the rest to see several different ways and the problems with most of them.
The full answer:
The best way depends on what you want to do, but here are a few options. As long as no file or folder in the subtree has whitespace in its name, you can just loop over the files:
for i in $x; do # Not recommended, will break on whitespace
process "$i"
done
Marginally better, cut out the temporary variable x:
for i in $(find -name \*.txt); do # Not recommended, will break on whitespace
process "$i"
done
It is much better to glob when you can. White-space safe, for files in the current directory:
for i in *.txt; do # Whitespace-safe but not recursive.
process "$i"
done
By enabling the globstar option, you can glob all matching files in this directory and all subdirectories:
# Make sure globstar is enabled
shopt -s globstar
for i in **/*.txt; do # Whitespace-safe and recursive
process "$i"
done
In some cases, e.g. if the file names are already in a file, you may need to use read:
# IFS= makes sure it doesn't trim leading and trailing whitespace
# -r prevents interpretation of \ escapes.
while IFS= read -r line; do # Whitespace-safe EXCEPT newlines
process "$line"
done < filename
read can be used safely in combination with find by setting the delimiter appropriately:
find . -name '*.txt' -print0 |
while IFS= read -r -d '' line; do
process "$line"
done
For more complex searches, you will probably want to use find, either with its -exec option or with -print0 | xargs -0:
# execute `process` once for each file
find . -name \*.txt -exec process {} \;
# execute `process` once with all the files as arguments*:
find . -name \*.txt -exec process {} +
# using xargs*
find . -name \*.txt -print0 | xargs -0 process
# using xargs with arguments after each filename (implies one run per filename)
find . -name \*.txt -print0 | xargs -0 -I{} process {} argument
find can also cd into each file's directory before running a command by using -execdir instead of -exec, and can be made interactive (prompt before running the command for each file) using -ok instead of -exec (or -okdir instead of -execdir).
*: Technically, both find and xargs (by default) will run the command with as many arguments as they can fit on the command line, as many times as it takes to get through all the files. In practice, unless you have a very large number of files it won't matter, and if you exceed the length but need them all on the same command line, you will have to find a different way.
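The difference between one run per file and one run per batch can be made visible by letting the command print how many arguments it received. A small sketch on three throwaway files (names invented); the "_" after the script only fills $0 so that all file names are counted in $#:

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
touch "$tmp/a.txt" "$tmp/b c.txt" "$tmp/d.txt"

# \; starts one sh per file, so each run sees exactly one argument.
per_file=$(find "$tmp" -name '*.txt' -exec sh -c 'echo "$#"' _ {} \;)

# + passes the whole (short) list to a single sh.
batched=$(find "$tmp" -name '*.txt' -exec sh -c 'echo "$#"' _ {} +)

# xargs -n 2 caps each run at two file names.
capped=$(find "$tmp" -name '*.txt' -print0 | xargs -0 -n 2 sh -c 'echo "$#"' _)

printf 'per file:\n%s\nbatched:\n%s\ncapped at 2:\n%s\n' "$per_file" "$batched" "$capped"
rm -rf "$tmp"
```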
Whatever you do, don't use a for loop:
# Don't do this
for file in $(find . -name "*.txt")
do
…code using "$file"
done
Three reasons:
For the for loop to even start, the find must run to completion.
If a file name has any whitespace (including space, tab or newline) in it, it will be treated as two separate names.
Although now unlikely, you can overrun your command line buffer. Imagine if your command line buffer holds 32KB, and your for loop returns 40KB of text. That last 8KB will be dropped right off your for loop and you'll never know it.
Always use a while read construct:
find . -name "*.txt" -print0 | while IFS= read -r -d $'\0' file
do
…code using "$file"
done
The loop will execute while the find command is executing. Plus, this command will work even if a file name is returned with whitespace in it. And, you won't overflow your command line buffer.
The -print0 makes find use the NUL character as the file separator instead of a newline, and the -d $'\0' makes read use NUL as the delimiter while reading.
find . -name "*.txt"|while read fname; do
echo "$fname"
done
Note: this method and the (second) method shown by bmargulies are safe to use with white space in the file/folder names.
In order to also have the - somewhat exotic - case of newlines in the file/folder names covered, you will have to resort to the -exec predicate of find like this:
find . -name '*.txt' -exec echo "{}" \;
The {} is the placeholder for the found item and the \; is used to terminate the -exec predicate.
And for the sake of completeness let me add another variant - you gotta love the *nix ways for their versatility:
find . -name '*.txt' -print0|xargs -0 -n 1 echo
This would separate the printed items with a \0 character that isn't allowed in any of the file systems in file or folder names, to my knowledge, and therefore should cover all bases. xargs picks them up one by one then ...
Filenames can include spaces and even control characters. Spaces are (default) delimiters for shell expansion in bash and as a result of that x=$(find . -name "*.txt") from the question is not recommended at all. If find gets a filename with spaces e.g. "the file.txt" you will get 2 separated strings for processing, if you process x in a loop. You can improve this by changing delimiter (bash IFS Variable) e.g. to \r\n, but filenames can include control characters - so this is not a (completely) safe method.
From my point of view, there are 2 recommended (and safe) patterns for processing files:
1. Use for loop & filename expansion:
for file in ./*.txt; do
[[ ! -e $file ]] && continue # continue, if file does not exist
# single filename is in $file
echo "$file"
# your code here
done
2. Use find-read-while & process substitution
while IFS= read -r -d '' file; do
# single filename is in $file
echo "$file"
# your code here
done < <(find . -name "*.txt" -print0)
Remarks
on Pattern 1:
bash returns the search pattern ("*.txt") if no matching file is found - so the extra line "continue, if file does not exist" is needed. see Bash Manual, Filename Expansion
shell option nullglob can be used to avoid this extra line.
"If the failglob shell option is set, and no matches are found, an error message is printed and the command is not executed." (from Bash Manual above)
shell option globstar: "If set, the pattern ‘**’ used in a filename expansion context will match all files and zero or more directories and subdirectories. If the pattern is followed by a ‘/’, only directories and subdirectories match." see Bash Manual, Shopt Builtin
other options for filename expansion: extglob, nocaseglob, dotglob & shell variable GLOBIGNORE
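The effect of nullglob on Pattern 1 is easy to check on an empty directory. A minimal sketch, assuming failglob is not set (it is explicitly switched off here); the array names are invented:

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)          # empty directory: it contains no *.txt files

shopt -u nullglob failglob
default_matches=("$tmp"/*.txt)     # the unmatched pattern is kept as a literal word
shopt -s nullglob
nullglob_matches=("$tmp"/*.txt)    # the unmatched pattern expands to nothing
shopt -u nullglob

echo "default: ${#default_matches[@]} word(s), nullglob: ${#nullglob_matches[@]} word(s)"
rmdir "$tmp"
```

With nullglob set, a for loop over the pattern simply runs zero times, so the existence test inside the loop is no longer needed.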
on Pattern 2:
filenames can contain blanks, tabs, spaces, newlines, ... to process filenames in a safe way, find with -print0 is used: filename is printed with all control characters & terminated with NUL. see also Gnu Findutils Manpage, Unsafe File Name Handling, safe File Name Handling, unusual characters in filenames. See David A. Wheeler below for detailed discussion of this topic.
There are some possible patterns to process find results in a while loop. Others (kevin, David W.) have shown how to do this using pipes:
files_found=1
find . -name "*.txt" -print0 |
while IFS= read -r -d '' file; do
# single filename in $file
echo "$file"
files_found=0 # not working example
# your code here
done
[[ $files_found -eq 0 ]] && echo "files found" || echo "no files found"
When you try this piece of code, you will see that it does not work: files_found always keeps its initial value of 1, and the code will always echo "no files found". The reason is that each command of a pipeline is executed in a separate subshell, so the variable changed inside the loop (separate subshell) does not change the variable in the main shell script. This is why I recommend using process substitution as the "better", more useful, more general pattern. See I set variables in a loop that's in a pipeline. Why do they disappear... (from Greg's Bash FAQ) for a detailed discussion on this topic.
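Both variants can be compared side by side with a counter. This sketch assumes bash with its default settings (in particular, the lastpipe option is not enabled); the variable names count_pipe and count_procsub are made up:

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
touch "$tmp/one.txt" "$tmp/two words.txt"

# Pipeline: the loop runs in a subshell, so the increments are lost.
count_pipe=0
find "$tmp" -name '*.txt' -print0 |
while IFS= read -r -d '' file; do
  count_pipe=$((count_pipe + 1))
done

# Process substitution: the loop runs in the current shell.
count_procsub=0
while IFS= read -r -d '' file; do
  count_procsub=$((count_procsub + 1))
done < <(find "$tmp" -name '*.txt' -print0)

echo "pipeline: $count_pipe, process substitution: $count_procsub"
rm -rf "$tmp"
```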
Additional References & Sources:
Gnu Bash Manual, Pattern Matching
Filenames and Pathnames in Shell: How to do it Correctly, David A. Wheeler
Why you don't read lines with "for", Greg's Wiki
Why you shouldn't parse the output of ls(1), Greg's Wiki
Gnu Bash Manual, Process Substitution
(Updated to include @Socowi's excellent speed improvement)
With any $SHELL that supports it (dash/zsh/bash...):
# "_" fills $0, so every file name ends up in "$@"
find . -name "*.txt" -exec $SHELL -c '
for i in "$@" ; do
echo "$i"
done
' _ {} +
Done.
Original answer (shorter, but slower):
find . -name "*.txt" -exec $SHELL -c '
echo "$0"
' {} \;
If you can assume the file names don't contain newlines, you can read the output of find into a Bash array using the following command:
readarray -t x < <(find . -name '*.txt')
Note:
-t causes readarray to strip newlines.
It won't work if readarray is in a pipe, hence the process substitution.
readarray is available since Bash 4.
Bash 4.4 and up also supports the -d parameter for specifying the delimiter. Using the null character, instead of newline, to delimit the file names works also in the rare case that the file names contain newlines:
readarray -d '' x < <(find . -name '*.txt' -print0)
readarray can also be invoked as mapfile with the same options.
Reference: https://mywiki.wooledge.org/BashFAQ/005#Loading_lines_from_a_file_or_stream
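A quick way to convince yourself that the NUL-delimited variant copes with awkward names is to try it on a throwaway directory. A sketch, assuming bash 4.4 or later and a file system that accepts newlines in file names (the names are invented):

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
touch "$tmp/plain.txt" "$tmp/with space.txt" "$tmp/with"$'\n'"newline.txt"

# One array element per file, regardless of spaces or newlines in the names.
readarray -d '' files < <(find "$tmp" -name '*.txt' -print0)

echo "found ${#files[@]} files"
for f in "${files[@]}"; do
  printf '%q\n' "${f##*/}"     # %q shows the name in an unambiguous, shell-quoted form
done
rm -rf "$tmp"
```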
# Doesn't handle whitespace
for x in `find . -name "*.txt" -print`; do
process_one $x
done
or
# Handles whitespace and newlines
find . -name "*.txt" -print0 | xargs -0 -n 1 process_one
I like to assign the find output to a variable first and switch IFS to a newline, as follows:
FilesFound=$(find . -name "*.txt")
IFSbkp="$IFS"
IFS=$'\n'
counter=1;
for file in $FilesFound; do
echo "${counter}: ${file}"
let counter++;
done
IFS="$IFSbkp"
As commented by @Konrad Rudolph, this will not work with newlines in file names. I still think it is handy, as it covers most of the cases where you need to loop over command output.
As already posted on the top answer by Kevin, the best solution is to use a for loop with bash glob, but as bash glob is not recursive by default, this can be fixed by a bash recursive function:
#!/bin/bash
set -x
set -eu -o pipefail
all_files=();
function get_all_the_files()
{
directory="$1";
for item in "$directory"/* "$directory"/.[^.]*;
do
if [[ -d "$item" ]];
then
get_all_the_files "$item";
else
all_files+=("$item");
fi;
done;
}
get_all_the_files "/tmp";
for file_path in "${all_files[@]}"
do
printf 'My file is "%s"\n' "$file_path";
done;
Related questions:
Bash loop through directory including hidden file
Recursively list files from a given directory in Bash
ls command: how can I get a recursive full-path listing, one line per file?
List files recursively in Linux CLI with path relative to the current directory
Recursively List all directories and files
bash script, create array of all files in a directory
How can I creates array that contains the names of all the files in a folder?
How to get the list of files in a directory in a shell script?
Based on other answers and a comment by @phk, using fd #3:
(which still allows using stdin inside the loop)
while IFS= read -r f <&3; do
echo "$f"
done 3< <(find . -iname "*filename*")
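The point of the extra file descriptor is that commands inside the loop can still read from stdin. A sketch of that, with a made-up function name label_files and invented file names; it assumes file names without newlines, since find's output is read line by line:

```shell
#!/usr/bin/env bash
# label_files DIR (hypothetical helper)
# Reads file names from fd 3 and one label per file from stdin.
label_files() {
  local dir=$1 f label
  while IFS= read -r f <&3; do
    IFS= read -r label || label='(none)'   # stdin is still free for this read
    printf '%s -> %s\n' "${f##*/}" "$label"
  done 3< <(find "$dir" -name '*.txt' | sort)
}

tmp=$(mktemp -d)
touch "$tmp/a.txt" "$tmp/b.txt"
# Two labels arrive on stdin while the file names arrive on fd 3.
result=$(printf 'first\nsecond\n' | label_files "$tmp")
printf '%s\n' "$result"
rm -rf "$tmp"
```

Had the loop read the file names from stdin, the inner read would have consumed file names instead of labels.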
You can put the filenames returned by find into an array like this:
array=()
while IFS= read -r -d ''; do
array+=("$REPLY")
done < <(find . -name '*.txt' -print0)
Now you can just loop through the array to access individual items and do whatever you want with them.
Note: It's white space safe.
I think this piece of code (feeding the command's output to the loop after done):
while read fname; do
echo "$fname"
done <<< "$(find . -name "*.txt")"
is better than this answer, because there the while loop is executed in a subshell (according to here), so variable changes made inside the loop cannot be seen after the loop ends.
You can store the find output in an array if you wish to use it later (note that this splits on whitespace, so it is only safe for file names without spaces):
array=($(find . -name "*.txt"))
To print each element on a new line, you can either use a for loop iterating over all the elements of the array, or use a printf statement.
for i in "${array[@]}"; do echo "$i"; done
or
printf '%s\n' "${array[@]}"
You can also use:
for file in "`find . -name "*.txt"`"; do echo "$file"; done
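This will print each file name on a new line, although the loop body runs only once: the quoted command substitution yields a single word that contains the whole find output.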
To only print the find output in list form, you can use either of the following:
find . -name "*.txt" -print 2>/dev/null
or
find . -name "*.txt" -print 2>&1 | grep -v 'Permission denied'
This will remove the error messages and output only the file names, one per line.
If you wish to do something with the filenames, storing it in array is good, else there is no need to consume that space and you can directly print the output from find.
function loop_through(){
length_="$(find . -name '*.txt' | wc -l)"
length_="${length_#"${length_%%[![:space:]]*}"}"
length_="${length_%"${length_##*[![:space:]]}"}"
# {1..$length_} is not expanded by bash, so use an arithmetic loop
for ((i = 1; i <= length_; i++))
do
x=$(find . -name '*.txt' | sort | head -n "$i" | tail -n 1)
echo "$x"
done
}
To get the number of files for the for loop, I used the first command, which ends in "wc -l".
The output of that command is stored in a variable.
Then I remove the leading and trailing whitespace from the variable so the for loop can read it.
find <path> -xdev -type f -name '*.txt' -exec ls -l {} \;
This will list the files and give details about attributes.
Another alternative is to not use bash, but call Python to do the heavy lifting. I resorted to this because bash solutions such as my other answer were too slow.
With this solution, we build a bash array of files from inline Python script:
#!/bin/bash
set -eu -o pipefail
dsep=":" # directory_separator
base_directory=/tmp
all_files=()
all_files_string="$(python3 -c '#!/usr/bin/env python3
import os
import sys
dsep="'"$dsep"'"
base_directory="'"$base_directory"'"
def log(*args, **kwargs):
print(*args, file=sys.stderr, **kwargs)
def check_invalid_characther(file_path):
for thing in ("\\", "\n"):
if thing in file_path:
raise RuntimeError(f"It is not allowed {thing} on \"{file_path}\"!")
def absolute_path_to_relative(base_directory, file_path):
relative_path = os.path.commonprefix( [ base_directory, file_path ] )
relative_path = os.path.normpath( file_path.replace( relative_path, "" ) )
# if you use Windows Python, it accepts / instead of \\
# if you have \ on your files names, rename them or comment this
relative_path = relative_path.replace("\\", "/")
if relative_path.startswith( "/" ):
relative_path = relative_path[1:]
return relative_path
for directory, directories, files in os.walk(base_directory):
for file in files:
local_file_path = os.path.join(directory, file)
local_file_name = absolute_path_to_relative(base_directory, local_file_path)
log(f"local_file_name {local_file_name}.")
check_invalid_characther(local_file_name)
print(f"{base_directory}{dsep}{local_file_name}")
' | dos2unix)";
if [[ -n "$all_files_string" ]];
then
readarray -t temp <<< "$all_files_string";
all_files+=("${temp[@]}");
fi;
for item in "${all_files[@]}";
do
OLD_IFS="$IFS"; IFS="$dsep";
read -r base_directory local_file_name <<< "$item"; IFS="$OLD_IFS";
printf 'item "%s", base_directory "%s", local_file_name "%s".\n' \
"$item" \
"$base_directory" \
"$local_file_name";
done;
Related:
os.walk without hidden folders
How to do a recursive sub-folder search and return files in a list?
How to split a string into an array in Bash?
How about if you use grep instead of find?
ls | grep '\.txt$' > out.txt
Now you can read this file and the filenames are in the form of a list.

Resources