bash: using find with multiple file types, provided as an array

In a bash function, I want to list all the files in a given folder which correspond to a given set of file types. In pseudo-code, I am imagining something like this:
getMatchingFiles() {
output=$1
directory=$2
shift 2
_types_=("$@")
file_array=find $directory -type f where-name-matches-item-in-_types_
# do other stuff with $file_array, such as trimming file names to
# just the basename with no extension
eval $output="${file_array[@]}"
}
dir=/path/to/folder
types=(ogg mp3)
getMatchingFiles result dir types
echo "${result[@]}"
For your amusement, here are the multiple workarounds, based on my current knowledge of bash, that I am using to achieve this. I have a problem with the way the function returns the array of files: the final command tries to execute each file, rather than to set the output parameter.
getMatchingFiles() {
local _output=$1
local _dir=$2
shift 2
local _type=("$@")
local _files=($_dir/$_type/*)
local -i ii=${#_files[@]}
local -a _filetypes
local _file _regex
case $_type in
audio )
_filetypes=(ogg mp3)
;;
images )
_filetypes=(jpg png)
;;
esac
_regex="^.*\.("
for _filetype in "${_filetypes[@]}"
do
_regex+=$_filetype"|"
done
_regex=${_regex:0:-1}
_regex+=")$"
for (( ; ii-- ; ))
do
_file=${_files[$ii]}
if ! [[ $_file =~ $_regex ]];then
unset _files[ii]
fi
done
echo "${_files[@]}"
# eval $_output="${_files[@]}" # tries to execute the files
}
dir=/path/to/parent
getMatchingFiles result $dir audio
echo "${result[@]}"

It is in fact possible to use a nameref (bash 4.3 or later) to reference an array. To store the output of find in an array specified by name, you can reference it like this:
#!/usr/bin/env bash
getMatchingFiles() {
local -n output=$1
local dir=$2
shift 2
local types=("$@")
local ext file
local -a find_ext
[[ ${#types[@]} -eq 0 ]] && return 1
for ext in "${types[@]}"; do
find_ext+=(-o -name "*.${ext}")
done
unset 'find_ext[0]'
output=()
while IFS= read -r -d $'\0' file; do
output+=("$file")
done < <(find "$dir" -type f \( "${find_ext[@]}" \) -print0)
}
dir=/some/path
getMatchingFiles result "$dir" mp3 txt
printf '%s\n' "${result[@]}"
getMatchingFiles other_result /some/other/path txt
printf '%s\n' "${other_result[@]}"
Don't pass your variable $dir by reference; pass it by value instead. That way you can pass a literal path as well.
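The question also mentions trimming the results down to basenames without extensions. A minimal sketch of that step, assuming a hypothetical helper named stripNames (not part of the answer above) that takes the name of an output array followed by the paths to process:

```bash
#!/usr/bin/env bash
# stripNames is a hypothetical helper for illustration only.
# Usage: stripNames OUTPUT_ARRAY_NAME path...
stripNames() {
  local -n _sn_out=$1       # nameref to the caller's array (bash 4.3+)
  shift
  local path base
  _sn_out=()
  for path in "$@"; do
    base=${path##*/}        # drop everything up to the last slash
    _sn_out+=("${base%.*}") # drop the final .extension, if any
  done
}

# Sample paths, made up for this sketch
stripNames names "/music/a b.ogg" "/music/sub/track.mp3"
printf '%s\n' "${names[@]}"
```

A name without a dot is left unchanged by `${base%.*}`, so the helper should be safe to run on mixed input.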

Update: namerefs can indeed be arrays (see PesaThe's answer)
Without spaces in file and directory names
I first assume you do not have spaces in your file and directory names. See the second part of this answer if you have spaces in your file and directory names.
In order to pass result, dir and types by name to your function, you need to use namerefs (local -n or declare -n, available in bash 4.3 and later).
Another difficulty is to build the find command based on the types you passed but this is not a major one. Pattern substitutions can do this. All in all, something like this should do about what you want:
#!/usr/bin/env bash
getMatchingFiles() {
local -n output=$1
local -n directory=$2
local -n _types_=$3
local filter
filter="${_types_[@]/#/ -o -name *.}"
filter="${filter# -o }"
output=( $( find "$directory" -type f \( $filter \) ) )
# do other stuff with $output, such as trimming file names to
# just the basename with no extension
}
declare dir
declare -a types
declare -a result=()
dir=/path/to/folder
types=(ogg mp3)
getMatchingFiles result dir types
for f in "${result[@]}"; do echo "$f"; done
With spaces in file and directory names (but not in file suffixes)
If you have spaces in your file and directory names, things are a bit more difficult because you must assign your array such that names are not split in words; one possibility to do this is to use \0 as file names separator, instead of a space, thanks to the -print0 option of find and the -d $'\0' option of read:
#!/usr/bin/env bash
getMatchingFiles() {
local -n output=$1
local -n directory=$2
local -n _types_=$3
local filter
filter="${_types_[@]/#/ -o -name *.}"
filter="${filter# -o }"
while IFS= read -r -d $'\0' file; do
output+=( "$file" )
done < <( find "$directory" -type f \( $filter \) -print0 )
# do other stuff with $output, such as trimming file names to
# just the basename with no extension
}
declare dir
declare -a types
declare -a result=()
dir=/path/to/folder
types=(ogg mp3)
getMatchingFiles result dir types
for f in "${result[@]}"; do echo "$f"; done
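As a possible alternative to the read loop, bash 4.4 and later can fill an array from NUL-delimited input with mapfile -d ''. The sketch below is not from the answer; it builds a throwaway directory with mktemp purely so that it runs on its own:

```bash
#!/usr/bin/env bash
# Demo directory with a file name containing a space (created just for this sketch).
demo=$(mktemp -d)
touch "$demo/a b.ogg" "$demo/c.mp3" "$demo/d.txt"

# mapfile -d '' splits on NUL bytes; -t drops the delimiter from each element.
mapfile -d '' -t found < <(find "$demo" -type f \( -name '*.ogg' -o -name '*.mp3' \) -print0)

printf 'matched %d files\n' "${#found[@]}"
rm -rf "$demo"
```

Each array element is one complete path, whatever whitespace it contains.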
With spaces in file and directory names, even in file suffixes
Well, you deserve what happens to you... Still possible but left as an exercise.

Supporting the original, unmodified calling convention, and correctly handling extensions with whitespace or glob characters:
#!/usr/bin/env bash
getMatchingFiles() {
declare -g -a "$1=()"
declare -n gMF_result="$1" # variables are namespaced to avoid conflicts w/ targets
declare -n gMF_dir="$2"
declare -n gMF_types="$3"
local gMF_args=( -false ) # empty type list not a special case
local gMF_type gMF_item
for gMF_type in "${gMF_types[@]}"; do
gMF_args+=( -o -name "*.$gMF_type" )
done
while IFS= read -r -d '' gMF_item; do
gMF_result+=( "$gMF_item" )
done < <(find "$gMF_dir" '(' "${gMF_args[@]}" ')' -print0)
}
dir=/path/to/folder
types=(ogg mp3)
getMatchingFiles result dir types
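To see what the function hands to find, the argument-building loop can be run on its own. This is a small sketch with made-up sample extensions; it prints each array element between angle brackets so that word boundaries are visible:

```bash
#!/usr/bin/env bash
types=(ogg "my ext")          # sample extensions, one containing a space
args=( -false )               # seed, so every later test can be prefixed with -o
for t in "${types[@]}"; do
  args+=( -o -name "*.$t" )   # one array element per find argument
done
printf '<%s> ' "${args[@]}"; echo
# prints: <-false> <-o> <-name> <*.ogg> <-o> <-name> <*.my ext>
```

Because each pattern is a single array element, an extension containing whitespace or glob characters reaches find intact.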

Related

How can I check if exists file with name according to "template" in the directory?

Given a variable named template, for example: template=*.txt.
How can I check whether files with names matching this template exist in the current directory?
For example, with the template above, I want to know whether there are files with the suffix .txt in the current directory.
I would do it like this with just built-ins:
templcheck () {
for f in * .*; do
[[ -f $f ]] && [[ $f = $1 ]] && return 0
done
return 1
}
This takes the template as an argument (must be quoted to prevent premature expansion) and returns success if there was a match in the current directory. This should work for any filenames, including those with spaces and newlines.
Usage would look like this:
$ ls
file1.txt 'has space1.txt' script.bash
$ templcheck '*.txt' && echo yes
yes
$ templcheck '*.md' && echo yes || echo no
no
To use with the template contained in a variable, that expansion has to be quoted as well:
templcheck "$template"
Use find:
: > found.txt # Ensure the file is empty
find . -prune -exec find -name "$template" \; > found.txt
if [ -s found.txt ]; then
echo "Matching files found"
else
echo "No matching files"
fi
Strictly speaking, you can't assume that found.txt contains exactly one file name per line; a filename with an embedded newline will look the same as two separate files. But this does guarantee that an empty file means no matching files.
If you want an accurate list of matching file names, you need to disable field splitting while keeping pathname expansion.
[[ -v IFS ]] && OLD_IFS=$IFS
IFS=
shopt -s nullglob
files=( $template )
[[ -v OLD_IFS ]] && IFS=$OLD_IFS
printf "Found: %s\n" "${files[@]}"
This requires several bash extensions (the nullglob option, arrays, and the -v operator for convenience of restoring IFS). Each element of the array is exactly one match.
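The same idea can be wrapped in a function so that IFS and nullglob are restored automatically. This is a sketch only; has_match is a hypothetical name, and unlike templcheck it also counts matching directories and follows the usual glob rule of skipping dotfiles:

```bash
#!/usr/bin/env bash
# has_match is a hypothetical helper name used only for this sketch.
# It succeeds if the glob pattern in $1 matches anything in the current directory.
has_match() {
  local IFS=                      # local: the caller's IFS is left untouched
  local restore
  local -a files
  restore=$(shopt -p nullglob)    # command that re-creates the caller's setting
  shopt -s nullglob
  files=( $1 )                    # unquoted on purpose: globbing, no word splitting
  eval "$restore"
  (( ${#files[@]} > 0 ))          # exit status 0 if anything matched
}

template='*.txt'
if has_match "$template"; then echo "match"; else echo "no match"; fi
```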

bash script not filtering

I'm hoping this is a simple question, since I've never done shell scripting before. I'm trying to filter certain files out of a list of results. While the script executes and prints out a list of files, it's not filtering out the ones I don't want. Thanks for any help you can provide!
#!/bin/bash
# Purpose: Identify all *md files in H2 repo where there is no audit date
#
#
#
# Example call: no_audits.sh
#
# If that call doesn't work, try ./no_audits.sh
#
# NOTE: Script assumes you are executing from within the scripts directory of
# your local H2 git repo.
#
# Process:
# 1) Go to H2 repo content directory (assumption is you are in the scripts dir)
# 2) Use for loop to go through all *md files in each content sub dir
# and list all file names and directories where audit date is null
#
#set counter
count=0
# Go to content directory and loop through all 'md' files in sub dirs
cd ../content
FILES=`find . -type f -name '*md' -print`
for f in $FILES
do
if [[ $f == "*all*" ]] || [[ $f == "*index*" ]] ;
then
# code to skip
echo " Skipping file: " $f
continue
else
# find audit_date in file metadata
adate=`grep audit_date $f`
# separate actual dates from rest of the grepped line
aadate=`echo $adate | awk -F\' '{print $2}'`
# if create date is null - proceed
if [[ -z "$aadate" ]] ;
then
# print a list of all files without audit dates
echo "Audit date: " $aadate " " $f;
count=$((count+1));
fi
fi
done
echo $count " files without audit dates "
First, to address the immediate issue:
[[ $f == "*all*" ]]
is only true if the exact contents of f is the string *all* -- with the wildcards as literal characters. If you want to check for a substring, then the asterisks shouldn't be quoted:
[[ $f = *all* ]]
...is a better-practice solution. (Note the use of = rather than == -- this isn't essential, but is a good habit to be in, as the POSIX test command is only specified to permit = as a string comparison operator; if one writes [ "$f" == foo ] by habit, one can get unexpected failures on platforms with a strictly compliant /bin/sh).
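The difference is easy to check in isolation. A minimal sketch, using a made-up sample path:

```bash
#!/usr/bin/env bash
f=./guides/all-pages.md   # sample path, made up for this sketch

# Quoted right-hand side: compared as a literal string
if [[ $f == "*all*" ]]; then quoted="match"; else quoted="no match"; fi
# Unquoted right-hand side: treated as a glob pattern
if [[ $f == *all* ]]; then unquoted="match"; else unquoted="no match"; fi

echo "quoted pattern:   $quoted"      # no match
echo "unquoted pattern: $unquoted"    # match
```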
That said, a ground-up implementation of this script intended to follow best practices might look more like the following:
#!/usr/bin/env bash
count=0
while IFS= read -r -d '' filename; do
aadate=$(awk -F"'" '/audit_date/ { print $2; exit; }' <"$filename")
if [[ -z $aadate ]]; then
(( ++count ))
printf 'File %q has no audit date\n' "$filename"
else
printf 'File %q has audit date %s\n' "$filename" "$aadate"
fi
done < <(find . -not '(' -name '*all*' -o -name '*index*' ')' -type f -name '*md' -print0)
echo "Found $count files without audit dates" >&2
Note:
An arbitrary list of filenames cannot be stored in a single bash string (because all characters that might otherwise be used to determine where the first name ends and the next name begins could be present in the name itself). Instead, read one NUL-delimited filename at a time -- emitted with find -print0, read with IFS= read -r -d ''; this is discussed in [BashFAQ #1].
Filtering out unwanted names can be done internal to find.
There's no need to preprocess input to awk using grep, as awk is capable of searching through input files itself.
< <(...) is used to avoid the behavior in BashFAQ #24, wherein content piped to a while loop causes variables set or modified within that loop to become unavailable after its exit.
printf '...%q...\n' "$name" is safer than echo "...$name..." when handling unknown filenames, as printf will emit printable content that accurately represents those names even if they contain unprintable characters or characters which, when emitted directly to a terminal, act to modify that terminal's configuration.
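The point about < <(...) can be illustrated with a short sketch. It assumes bash's default settings (in particular, the lastpipe option is off):

```bash
#!/usr/bin/env bash
count=0
printf 'a\nb\n' | while IFS= read -r line; do (( ++count )); done
piped=$count          # the loop ran in a subshell, so its increments are lost

count=0
while IFS= read -r line; do (( ++count )); done < <(printf 'a\nb\n')
substituted=$count    # the loop ran in the current shell

echo "after pipe: $piped, after process substitution: $substituted"
# prints: after pipe: 0, after process substitution: 2
```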
Nevermind, I found the answer here:
bash script to check file name begins with expected string
I tried various versions of the wildcard/filename and ended up with:
if [[ "$f" == *all.md ]] || [[ "$f" == *index.md ]] ;
The link above said not to put those in quotes, and removing the quotes did the trick!

How can I use multiple Bash arguments in loop dynamically without using long regex strings?

I have a directory with the following files:
file1.jpg
file2.jpg
file3.jpg
file1.png
file2.png
file3.png
I have a bash function named filelist and it looks like this:
filelist() {
if [ "$1" ]
then
shopt -s nullglob
for filelist in *."$@" ; do
echo "$filelist" >> created-file-list.txt;
done
echo "file created listing: " $@;
else
filelist=`find . -type f -name "*.*" -exec basename \{} \;`
echo "$filelist" >> created-file-list.txt
echo "file created listing: All Files";
fi
}
Goal: Be able to type as many arguments as I want for example filelist jpg png and create a file with a list of files of only the extensions I used as arguments. So if I type filelist jpg it would only show a list of files that have .jpg.
Currently: My code works great with one argument thanks to $@, but when I use both jpg and png it creates the following list
file1.jpg
file2.jpg
file3.jpg
png
It looks like my for loop is only running once and only using the first argument. My suspicion is I need to count how many arguments and run the loop on each one.
An obvious fix for this is to create a long regex check like (jpg|png|jpeg|html|css) and all of the different extensions one could ever think to type. This is not ideal because I want other people to be free to type their file extensions without breaking it if they type one that I don't have identified in my regex. Dynamic is key.
You can rewrite your function as shown below - just loop through each extension and append the list of matching files to the output file:
filelist() {
if [ $# -gt 0 ]; then
shopt -s nullglob
for ext in "$@"; do
printf '%s\n' *."$ext" >> created-file-list.txt
echo "created listing for extension $ext"
done
else
find . -type f -name "*.*" -exec basename \{} \; >> created-file-list.txt
echo "created listing for all files"
fi
}
And you can invoke your function as:
filelist jpg png
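To see why the original loop produced a stray png line, it may help to look at how "$@" expands when it is embedded in a larger word: only the first argument receives the prefix, and the remaining arguments become separate words. A small sketch, using the literal prefix file. instead of a glob so that the result does not depend on the directory contents:

```bash
#!/usr/bin/env bash
set -- jpg png                 # simulate calling: filelist jpg png
words=( file."$@" )            # same shape as *."$@", but without globbing
printf '<%s>\n' "${words[@]}"
# prints:
# <file.jpg>
# <png>
```

Looping over "$@" and building one pattern per extension, as in the rewritten function, avoids this.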
Try this
#!/bin/bash
while [ -n "$1" ]
do
echo "Current Parameter: $1 , Remaining $#"
#Pass $1 to some bash function or do whatever
shift
done
Using shift, you move the arguments one position to the left and get the next one by reading $1.
See man bash on what shift does.
shift [n]
The positional parameters from n+1 ... are renamed to $1 .... Parameters represented by the numbers $# down to $#-n+1 are unset. n must be a non-negative number less than or equal to $#. If n is 0, no parameters are changed. If n is not given, it is assumed to be 1. If n is greater than $#, the positional parameters are not changed. The return status is greater than zero if n is greater than $# or less than zero; otherwise 0.
Or you can iterate as follows
for this in "$@"
do
echo "Param = $this";
done
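One caveat with a loop condition of [ -n "$1" ] is that it stops at the first empty argument. A sketch of a variant that loops on the argument count instead; count_args is a hypothetical name used only here:

```bash
#!/usr/bin/env bash
# count_args is a hypothetical name for this sketch.
count_args() {
  local n=0
  while (( $# > 0 )); do   # test the argument count, not whether $1 is non-empty
    n=$(( n + 1 ))
    shift
  done
  echo "$n"
}

count_args jpg "" png      # prints 3, even though the second argument is empty
```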

find -iname not working in script

The following find command
find Work/Linux4/test/test/test_goal/spyglass_reports/clock-reset/Ac_coherency06/ -iname "Ac_coherency*.csv"
works fine when run in the shell.
But in the perl script it returns nothing.
#!/bin/bash
REPORT_DIR=$1
FIND_CMD=$2
echo "##";
echo $REPORT_DIR ;
echo $FIND_CMD ;
LIST_OF_CSV=$(find $REPORT_DIR $FIND_CMD)
echo $LIST_OF_CSV
if [ "X" == "X${LIST_OF_CSV}" ]; then
echo "No files Found for : '$FIND_CMD' in directory ";
echo " '$REPORT_DIR' " | sed -e 's;Work/.*/test_reports;Work/PLATFORM/test_reports;g';
echo;
exit 0;
fi
Output of script:
##
Work/$PLATFORM_SPECIES/test_reports/clock-reset/Ac_coherency06 -iname "Ac_coherency06*.csv"
No files Found for : '-iname "Ac_coherency06*.csv"' in directory 'Work/PLATFORM/test_reports/clock-reset/Ac_coherency06'
If you're allowing a list of find predicates to be passed, keep them in list form, one argument to find per argument to your script. As an example implemented in this manner:
#!/bin/bash
# read report_dir off the command line, and shift it from arguments
report_dir=$1; shift
# generate a version of report_dir for human consumption
re='Work/.*/test_reports'
replacement='Work/PLATFORM/test_reports'
if [[ $report_dir =~ $re ]]; then
report_dir_name=${report_dir//${BASH_REMATCH[0]}/$replacement}
else
report_dir_name=$report_dir
fi
# read results from find -- stored NUL-delimited -- into an array
# using NUL-delimited inputs ensure that even unusual filenames work correctly
declare -a list_of_csv
while IFS= read -r -d '' filename; do
list_of_csv+=( "$filename" )
done < <(find "$report_dir" '(' "$@" ')' -print0)
# Use the length of that array to determine whether we found contents
echo "Found ${#list_of_csv[@]} files" >&2
if (( ${#list_of_csv[@]} == 0 )); then
echo "No files found in $report_dir_name" >&2
fi
Here, shift consumes the first argument from your list, and "$@" refers to all the others that remain after that point. This means that the items you want to have passed as separate, individual arguments to find can (and must) be passed as separate, individual arguments to your script.
Thus, with usage yourscript "/path/to/report/dir" -name '*.txt', initially, $1 will be /path/to/report/dir, $2 will be -name, and $3 will be *.txt. However, after shift is run, $1 will be -name, and $2 will be *.txt; and "$@" will refer to both of those, each passed as a separate word.
For details on the use of a while read loop to read items off of a stream, see BashFAQ #001.
For details on the syntax used for bash-native string replacement, see BashFAQ #100 or http://wiki.bash-hackers.org/syntax/pe
For details on shell arrays, including ${#arrayname[@]} to check their length or "${arrayname[@]}" to expand to their contents, see BashFAQ #005.
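Why the single-string approach in the question finds nothing can be sketched as follows: when a string is expanded unquoted, the quote characters inside it are ordinary data, so find receives a pattern that includes literal double quotes. The variable names here are illustrative only:

```bash
#!/usr/bin/env bash
set -f                               # disable globbing so only word splitting is shown
find_cmd='-iname "Ac_*.csv"'         # one string, as in the question
from_string=( $find_cmd )            # word splitting keeps the quote characters
from_array=( -iname 'Ac_*.csv' )     # one array element per find argument
set +f

printf 'string: <%s>\n' "${from_string[@]}"
printf 'array:  <%s>\n' "${from_array[@]}"
# string: <-iname>
# string: <"Ac_*.csv">
# array:  <-iname>
# array:  <Ac_*.csv>
```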
If you have a command that is running well on the shell but not on your script, the first thing I would try would be to specify Bash on the command being called, see if this works:
bash -c 'find Work/Linux4/test/test/test_goal/spyglass_reports/clock-reset/Ac_coherency06/ -iname "Ac_coherency*.csv"'
Or even better:
/bin/bash -c 'find Work/Linux4/test/test/test_goal/spyglass_reports/clock-reset/Ac_coherency06/ -iname "Ac_coherency*.csv"'
You could also store the result on a variable or other data structure as needed, and pass it later to the script, for example:
ResultCommand="$(bash -c 'find Work/Linux4/test/test/test_goal/spyglass_reports/clock-reset/Ac_coherency06/ -iname "Ac_coherency*.csv"')"
Edit: this answer was edited more than once to fix possible issues.

Creating a which command in bash script

For an assignment, I'm supposed to create a script called my_which.sh that will "do the same thing as the Unix command, but do it using a for loop over an if." I am also not allowed to call which in my script.
I'm brand new to this, and have been reading tutorials, but I'm pretty confused on how to start. Doesn't which just list the path name of a command?
If so, how would I go about displaying the correct path name without calling which, and while using a for loop and an if statement?
For example, if I run my script, it will echo % and wait for input. But then how do I translate that to finding the directory? So it would look like this?
#!/bin/bash
path=(`echo $PATH`)
echo -n "% "
read ans
for i in $path
do
if [ -d $i ]; then
echo $i
fi
done
I would appreciate any help, or even any starting tutorials that can help me get started on this. I'm honestly very confused on how I should implement this.
Split your PATH variable safely. This is a general method to split a string at delimiters, that is 100% safe regarding any possible characters (including newlines):
IFS=: read -r -d '' -a paths < <(printf '%s:\0' "$PATH")
We artificially added : because if PATH ends with a trailing :, then it is understood that current directory should be in PATH. While this is dangerous and not recommended, we must also take it into account if we want to mimic which. Without this trailing colon, a PATH like /bin:/usr/bin: would be split into
declare -a paths='( [0]="/bin" [1]="/usr/bin" )'
whereas with this trailing colon the resulting array is:
declare -a paths='( [0]="/bin" [1]="/usr/bin" [2]="" )'
This is one detail that other answers miss. Of course, we'll do this only if PATH is set and non-empty.
With this split PATH, we'll use a for loop to check whether the argument can be found in each directory. Note that this should be done only if the argument doesn't contain a / character! This is also something other answers missed.
My version of which handles a single option, -a, that prints all matching pathnames of each argument. Otherwise, only the first match is printed. We'll have to take this into account too.
My version of which handles the following exit status:
0 if all specified commands are found and executable
1 if one or more specified commands is nonexistent or not executable
2 if an invalid option is specified
We'll handle that too.
I guess the following mimics rather faithfully the behavior of my which (and it's pure Bash):
#!/bin/bash
show_usage() {
printf 'Usage: %s [-a] args\n' "$0"
}
illegal_option() {
printf >&2 'Illegal option -%s\n' "$1"
show_usage
exit 2
}
check_arg() {
if [[ -f $1 && -x $1 ]]; then
printf '%s\n' "$1"
return 0
else
return 1
fi
}
# manage options
show_only_one=true
while (($#)); do
[[ $1 = -- ]] && { shift; break; }
[[ $1 = -?* ]] || break
opt=${1#-}
while [[ $opt ]]; do
case $opt in
(a*) show_only_one=false; opt=${opt#?} ;;
(*) illegal_option "${opt:0:1}" ;;
esac
done
shift
done
# If no arguments left or empty PATH, exit with return code 1
(($#)) || exit 1
[[ $PATH ]] || exit 1
# split path
IFS=: read -r -d '' -a paths < <(printf '%s:\0' "$PATH")
ret=0
# loop on arguments
for arg; do
# Check whether arg contains a slash
if [[ $arg = */* ]]; then
check_arg "$arg" || ret=1
else
this_ret=1
for p in "${paths[@]}"; do
if check_arg "${p:-.}/$arg"; then
this_ret=0
"$show_only_one" && break
fi
done
((this_ret==1)) && ret=1
fi
done
exit "$ret"
To test whether an argument is executable or not, I'm checking whether it's a regular file [1] that is executable, with:
[[ -f $arg && -x $arg ]]
I guess that's close to my which's behavior.
[1] As @mklement0 points out (thanks!), the -f test, when applied against a symbolic link, tests the type of the symlink's target.
#!/bin/bash
#Get the user's first argument to this script
exe_name=$1
#Set the field separator to ":" (this is what the PATH variable
# uses as its delimiter), then read the contents of the PATH
# into the array variable "paths" -- at the same time splitting
# the PATH by ":"
IFS=':' read -a paths <<< $PATH
#Iterate over each of the paths in the "paths" array
for e in ${paths[*]}
do
#Check for the $exe_name in this path
find $e -name $exe_name -maxdepth 1
done
This is similar to the accepted answer with the difference that it does not set the IFS and checks if the execute bits are set.
#!/bin/bash
for i in $(echo "$PATH" | tr ":" "\n")
do
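# -perm /111 (any execute bit set) is the current GNU find syntax; the older +111 form is rejected by recent GNU find
find "$i" -maxdepth 1 -name "$1" -perm /111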
done
Save this as my_which.sh (or some other name) and run it as ./my_which.sh java etc.
However if there is an "if" required:
#!/bin/bash
for i in $(echo "$PATH" | tr ":" "\n")
do
# this is a one-liner that works. However, the user requires an if statement
# find "$i" -maxdepth 1 -name "$1" -perm /111
cmd=$i/$1
if [[ ( -f "$cmd" || -L "$cmd" ) && -x "$cmd" ]]
then
echo "$cmd"
break
fi
done
You might want to take a look at this link to figure out the tests in the "if".
For a complete, rock-solid implementation, see gniourf_gniourf's answer.
Here's a more concise alternative that makes do with a single invocation of find [per name to investigate].
The OP later clarified that an if statement should be used in a loop, but the question is general enough to warrant considering other approaches.
A naïve implementation would even work as a one-liner, IF you're willing to make a few assumptions (the example uses 'ls' as the executable to locate):
find -L ${PATH//:/ } -maxdepth 1 -type f -perm -u=x -name 'ls' 2>/dev/null
The assumptions - which will hold in many, but not all situations - are:
$PATH must not contain entries that when used unquoted result in shell expansions (e.g., no embedded spaces that would result in word splitting, no characters such as * that would result in pathname expansion)
$PATH must not contain an empty entry (which must be interpreted as the current dir).
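The first assumption can be checked with a quick sketch. The value of p is a made-up PATH-like string with two entries, one of which contains a space:

```bash
#!/usr/bin/env bash
p='/usr/bin:/opt/my tools/bin'   # sample value: two entries, one containing a space
set -f                           # keep pathname expansion out of the picture
words=( ${p//:/ } )              # same unquoted expansion as the one-liner
set +f
printf '<%s>\n' "${words[@]}"
# prints:
# </usr/bin>
# </opt/my>
# <tools/bin>
```

Two entries turn into three arguments, so find would be asked to search directories that do not exist.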
Explanation:
-L tells find to investigate the targets of symlinks rather than the symlinks themselves - this ensures that symlinks to executable files are also recognized by -type f
${PATH//:/ } replaces all : chars. in $PATH with a space each, causing the result - due to being unquoted - to be passed as individual arguments split by spaces.
-maxdepth 1 instructs find to only look directly in each specified directory, not also in subdirectories
-type f matches only files, not directories.
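-perm -u=x matches only entries whose owner (u) execute (x) permission bit is set. Note that this inspects the owner's bit, which is not the same as testing whether the current user can execute the file.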
2>/dev/null suppresses error messages that may stem from non-existent directories in the $PATH or failed attempts to access files due to lack of permission.
Here's a more robust script version:
Note:
For brevity, only handles a single argument (and no options).
Does NOT handle the case where entries or result paths may contain embedded \n chars - however, this is extremely rare in practice and likely leads to bigger problems overall.
#!/bin/bash
# Assign argument to variable; error out, if none given.
name=${1:?Please specify an executable filename.}
# Robustly read individual $PATH entries into a bash array, splitting by ':'
# - The additional trailing ':' ensures that a trailing ':' in $PATH is
# properly recognized as an empty entry - see gniourf_gniourf's answer.
IFS=: read -r -a paths <<<"${PATH}:"
# Replace empty entries with '.' for use with `find`.
# (Empty entries imply '.' - this is legacy behavior mandated by POSIX).
for (( i = 0; i < "${#paths[@]}"; i++ )); do
[[ "${paths[i]}" == '' ]] && paths[i]='.'
done
# Invoke `find` with *all* directories and capture the 1st match, if any, in a variable.
# Simply remove `| head -n 1` to print *all* matches.
match=$(find -L "${paths[@]}" -maxdepth 1 -type f -perm -u=x -name "$name" 2>/dev/null |
head -n 1)
# Print result, if found, and exit with appropriate exit code.
if [[ -n $match ]]; then
printf '%s\n' "$match"
exit 0
else
exit 1
fi
