Using wildcard and regex in find - bash

I want to use find to list specific files using wildcard or regular expression.
For instance, I am currently using the following bash code to list all files starting with A.
nm="$2" # user passes wildcard "A*" in $nm.
find $dir -name $nm
But one problem is that I don't want Bash to expand the wildcard being passed by the user.
I constructed the following bash function
mse ()
{
local incl=()
local iarg=0 narg="$#"
IFSPREV="$IFS" # Save IFS (splits arguments on whitespace by default)
IFS=" |=" # Split arguments on " " and "="
set -- $* # Set positional parameters to command line arguments
IFS="$IFSPREV" # Set original IFS
while (( $# > 0 )); do
iarg=$(( iarg + 1 ))
case "$1" in
("--incl") incl+=("$2") ; echo "Test: $2" ; shift 2 ;;
esac
done
echo "\$#: $#"
echo "incl: ${incl[#]}"
}
Then I run the function in the terminal
mse -d /media/hag/hc1/a1-chaos/amvib/ --incl="A*"
which gives
incl: Admir
The result is wrong because it is picking up the directory name Admir that shows up it the current working directory.
And for passing regular expressions, how should the find command be constructed?

I suggest:
find "$dir" -name "$nm"
(Double) quotation marks prevent the expansion of wildcards.

To actually use regular expressions with find, pass the output of find to grep, for example:
find "$dir" | grep "$nm"
For example, I frequently use this variation:
find "$dir" | grep -P "$nm"
Here, GNU grep uses option:
-P : Use Perl regexes.
SEE ALSO:
grep manual
perlre - Perl regular expressions

Related

Identifying folder with name as largest number in the directory

there is a directory which contains folders named with numbers, i've to find the folder with largest number in that directory.
This is the script i've written to find that folder:
files='ls path/'
var=0
for file in $files
do
echo $file
tmp=$((file-"0"))
if [ $tmp -gt $var ]
then
var=$tmp
fi
done
echo $var
But it's not working. It gives below error after invoking the script using command sudo ./restore2.sh.
ls
path/
./restore2.sh: line 6: path/: syntax error: operand expected (error token is "/")
0
Try this:
#!/bin/bash
files=`ls path/`
var=0
for file in $files
do
echo $file
tmp=$((file-"0"))
if [ $tmp -gt $var ]
then
var=$tmp
fi
done
echo $var
there's a backtick here: ls path/ instead of single or double-quotes.
I've only corrected this statement and it worked. and notice to add #!/bin/bash at the top of the script. This will tell your system to run the script in a bash shell.
You're using single quotes instead of backticks files='ls path/'. It's trying to use it as a literal string instead of evaluating it.
Also, for that specific task, you can just do:
ls test | awk '{if($1 > largest){largest = $1}} END{print largest}'
To have it a bit simpler.
Use find instead:
find . -maxdepth 1 -type d -regextype "posix-extended" -regex "^.*[[:digit:]]+.*$" | sort -n | tail -1
Set the maxdepth to 1 to check for directories within this directory only and no deeper. Set the regular expression type to posix-extended and search for all directories that have one or more digits. Print the result and order through sort before taking the largest one with tail -1.
Does path/ have any files in it? It looks like it's empty.
You should be getting a completely different complaint...
You don't want the path info in the filename. Rather than strip it with ${file##*/}, just go there and use non-path'd names.
An adaptation using your own logic as its base -
cd /whatever/path/ # go where the files are
var=-1 # initialize comparator
for file in [0-9]* # each entry that starts with a digit
do [[ "$file" =~ [^0-9] ]] && continue # skip any file with nondigit contents
[[ -f "$file" ]] || continue # only process plain files
(( file > var )) && var=$file # remember largest seen
done
echo $var # report largest
If you are sure there will be no negative numbered filenames, this should do it.
If there can be valid negatives, then your initialization needs to be appropriately lower, and the exclusion of nondigits should include the minus sign, as well as the list of files to select.
Note that this doesn't parse ls and doesn't require piping through a sort or spawning any other processes -- it's all handled in the bash interpreter and should be pretty efficient.
If you are sure of your data, and know there aren't any negatives or files named just 0 or non-plain-file entries in the directory that match the [0-9]* pattern, you can simplify it to just
cd /whatever/path/ # go where the files are
for file in [0-9]*; do (( file > var )) && var=$file; done
echo $var # report largest
As an aside, if you wanted to preserve the "make a list first" logic, you should still NOT use ls. Use an array.
cd /wherever/your/files/are/
files=( [0-9]* )
for file in "${files[#]}"
do : ...

file_path counts spaces as end of filename [duplicate]

I am trying to save the result from find as arrays.
Here is my code:
#!/bin/bash
echo "input : "
read input
echo "searching file with this pattern '${input}' under present directory"
array=`find . -name ${input}`
len=${#array[*]}
echo "found : ${len}"
i=0
while [ $i -lt $len ]
do
echo ${array[$i]}
let i++
done
I get 2 .txt files under current directory.
So I expect '2' as result of ${len}. However, it prints 1.
The reason is that it takes all result of find as one elements.
How can I fix this?
P.S
I found several solutions on StackOverFlow about a similar problem. However, they are a little bit different so I can't apply in my case. I need to store the results in a variable before the loop. Thanks again.
Update 2020 for Linux Users:
If you have an up-to-date version of bash (4.4-alpha or better), as you probably do if you are on Linux, then you should be using Benjamin W.'s answer.
If you are on Mac OS, which —last I checked— still used bash 3.2, or are otherwise using an older bash, then continue on to the next section.
Answer for bash 4.3 or earlier
Here is one solution for getting the output of find into a bash array:
array=()
while IFS= read -r -d $'\0'; do
array+=("$REPLY")
done < <(find . -name "${input}" -print0)
This is tricky because, in general, file names can have spaces, new lines, and other script-hostile characters. The only way to use find and have the file names safely separated from each other is to use -print0 which prints the file names separated with a null character. This would not be much of an inconvenience if bash's readarray/mapfile functions supported null-separated strings but they don't. Bash's read does and that leads us to the loop above.
[This answer was originally written in 2014. If you have a recent version of bash, please see the update below.]
How it works
The first line creates an empty array: array=()
Every time that the read statement is executed, a null-separated file name is read from standard input. The -r option tells read to leave backslash characters alone. The -d $'\0' tells read that the input will be null-separated. Since we omit the name to read, the shell puts the input into the default name: REPLY.
The array+=("$REPLY") statement appends the new file name to the array array.
The final line combines redirection and command substitution to provide the output of find to the standard input of the while loop.
Why use process substitution?
If we didn't use process substitution, the loop could be written as:
array=()
find . -name "${input}" -print0 >tmpfile
while IFS= read -r -d $'\0'; do
array+=("$REPLY")
done <tmpfile
rm -f tmpfile
In the above the output of find is stored in a temporary file and that file is used as standard input to the while loop. The idea of process substitution is to make such temporary files unnecessary. So, instead of having the while loop get its stdin from tmpfile, we can have it get its stdin from <(find . -name ${input} -print0).
Process substitution is widely useful. In many places where a command wants to read from a file, you can specify process substitution, <(...), instead of a file name. There is an analogous form, >(...), that can be used in place of a file name where the command wants to write to the file.
Like arrays, process substitution is a feature of bash and other advanced shells. It is not part of the POSIX standard.
Alternative: lastpipe
If desired, lastpipe can be used instead of process substitution (hat tip: Caesar):
set +m
shopt -s lastpipe
array=()
find . -name "${input}" -print0 | while IFS= read -r -d $'\0'; do array+=("$REPLY"); done; declare -p array
shopt -s lastpipe tells bash to run the last command in the pipeline in the current shell (not the background). This way, the array remains in existence after the pipeline completes. Because lastpipe only takes effect if job control is turned off, we run set +m. (In a script, as opposed to the command line, job control is off by default.)
Additional notes
The following command creates a shell variable, not a shell array:
array=`find . -name "${input}"`
If you wanted to create an array, you would need to put parens around the output of find. So, naively, one could:
array=(`find . -name "${input}"`) # don't do this
The problem is that the shell performs word splitting on the results of find so that the elements of the array are not guaranteed to be what you want.
Update 2019
Starting with version 4.4-alpha, bash now supports a -d option so that the above loop is no longer necessary. Instead, one can use:
mapfile -d $'\0' array < <(find . -name "${input}" -print0)
For more information on this, please see (and upvote) Benjamin W.'s answer.
Bash 4.4 introduced a -d option to readarray/mapfile, so this can now be solved with
readarray -d '' array < <(find . -name "$input" -print0)
for a method that works with arbitrary filenames including blanks, newlines, and globbing characters. This requires that your find supports -print0, as for example GNU find does.
From the manual (omitting other options):
mapfile [-d delim] [array]
-d
The first character of delim is used to terminate each input line, rather than newline. If delim is the empty string, mapfile will terminate a line when it reads a NUL character.
And readarray is just a synonym of mapfile.
The following appears to work for both Bash and Z Shell on macOS.
#! /bin/sh
IFS=$'\n'
paths=($(find . -name "foo"))
unset IFS
printf "%s\n" "${paths[#]}"
If you are using bash 4 or later, you can replace your use of find with
shopt -s globstar nullglob
array=( **/*"$input"* )
The ** pattern enabled by globstar matches 0 or more directories, allowing the pattern to match to an arbitrary depth in the current directory. Without the nullglob option, the pattern (after parameter expansion) is treated literally, so with no matches you would have an array with a single string rather than an empty array.
Add the dotglob option to the first line as well if you want to traverse hidden directories (like .ssh) and match hidden files (like .bashrc) as well.
you can try something like
array=(`find . -type f | sort -r | head -2`) , and in order to print the array values , you can try something like echo "${array[*]}"
None of these solutions suited me because I didn't feel like learning readarray and mapfile. Here is what I came up with.
#!/bin/bash
echo "input : "
read input
echo "searching file with this pattern '${input}' under present directory"
# The only change is here. Append to array for each non-empty line.
array=()
while read line; do
[[ ! -z "$line" ]] && array+=("$line")
done; <<< $(find . -name ${input} -print)
len=${#array[#]}
echo "found : ${len}"
i=0
while [ $i -lt $len ]
do
echo ${array[$i]}
let i++
done
You could do like this:
#!/bin/bash
echo "input : "
read input
echo "searching file with this pattern '${input}' under present directory"
array=(`find . -name '*'${input}'*'`)
for i in "${array[#]}"
do :
echo $i
done
In bash, $(<any_shell_cmd>) helps to run a command and capture the output. Passing this to IFS with \n as delimiter helps to convert that to an array.
IFS='\n' read -r -a txt_files <<< $(find /path/to/dir -name "*.txt")

Assign files in a list when using the bash find command

Im trying to see if I can assign the output of the find command to a variable. In this case it would be a list and iterate one file at a time to evaluate the file.
Ive tried this:
#!/bin/bash
PATH=/Users/mike/test
LIST='find $PATH -name *.txt'
for newfiles in #LIST; do
#checksize
echo $newfiles
done
My output is:
#LIST
Im trying to do the same this as the glob command in perl in bash.
var = glob "PATH/*.txt";
Use $(command) to execute command and substitute its output in place of that construct.
list=$(find "$PATH" -name '*.txt')
And to access a variable, put $ before the name, not # (your perl experience is showing).
for newfiles in $list; do
echo "$newfiles"
done
However, it's dangerous to parse the output of find like this, because you'll get incorrect results if any of the filenames contain whitespace -- it will be treated as multiple names. It's better to pipe the output:
find "$PATH" -name '*.txt' | while read -r newfiles; do
echo "$newfiles"
done
Also, notice that you should quote any variables that you don't want to be split into multiple words if they contain whitespace.
And avoid using all-uppercase variable names. This is conventionally reserved for environment variables.
LIST=$(find $PATH -name *.txt)
for newfiles in $LIST; do
Beware that you will have issues if any of the files have whitespace in the names.
Assuming you are using bash 4 or later, don't use find at all here.
shopt -s globstar nullglob
list=( "$path"/**/*.txt )
for newfile in "${list[#]}"; do
echo "$newfile"
done

bash - passing a pattern including wildcards as a parameter to recursive function

I am still quite confused with the way variables expand. This is my code:
if [ "$2" ]; then
pattern="*$2*"
else
pattern=""
fi
function list(){
ls -lF $2 > output_file
for dir in `ls -d1 */`; do
list "$dir" $2
done
}
cd $1; path=`basename $PWD`
list "$path" $pattern
This script attempts to store some file information for the files contained in $1 whose names containt a string given in $2.
The main purpose is just learning, and the specific error I want to avoid is the one I get when wildcard characters stored in pattern are interpreted as a file name.
find, stat, and using the pattern without ls can get the desired output (And I'll be glad to learn the most elegant way. BUT the main question here is how to handle wildcard characters if you'd want to pass them as parameters.
Double quote the variable where the value shouldn't be expanded:
list "$dir" "$2"
# ...
list "$path" "$pattern"

List all the files with prefixes from a for loop using Bash

Here is a small[but complete] part of my bash script that finds and outputs all files in mydir if the have the prefix from a stored array. Strange thing I notice is that this script works perfectly if I take out the "-maxdepth 1 -name" from the script else it only gives me the files with the prefix of the first element in the array.
It would be of great help if someone explained this to me. Sorry in advance if there is some thing obviously silly that I'm doing. I'm relatively new to scripting.
#!/bin/sh
DIS_ARRAY=(A B C D)
echo "Array is : "
echo ${DIS_ARRAY[*]}
for dis in $DIS_ARRAY
do
IN_FILES=`find /mydir -maxdepth 1 -name "$dis*.xml"`
for file in $IN_FILES
do
echo $file
done
done
Output:
/mydir/Abc.xml
/mydir/Ab.xml
/mydir/Ac.xml
Expected Output:
/mydir/Abc.xml
/mydir/Ab.xml
/mydir/Ac.xml
/mydir/Bc.xml
/mydir/Cb.xml
/mydir/Dc.xml
The loop is broken either way. The reason why
IN_FILES=`find mydir -maxdepth 1 -name "$dis*.xml"`
works, whereas
IN_FILES=`find mydir "$dis*.xml"`
doesn't is because in the first one, you have specified -name. In the second one, find is listing all the files in mydir. If you change the second one to
IN_FILES=`find mydir -name "$dis*.xml"`
you will see that the loop isn't working.
As mentioned in the comments, the syntax that you are currently using $DIS_ARRAY will only give you the first element of the array.
Try changing your loop to this:
for dis in "${DIS_ARRAY[#]}"
The double quotes around the expansion aren't strictly necessary in your specific case, but required if the elements in your array contained spaces, as demonstrated in the following test:
#!/bin/bash
arr=("a a" "b b")
echo using '$arr'
for i in $arr; do echo $i; done
echo using '${arr[#]}'
for i in ${arr[#]}; do echo $i; done
echo using '"${arr[#]}"'
for i in "${arr[#]}"; do echo $i; done
output:
using $arr
a
a
using ${arr[#]}
a
a
b
b
using "${arr[#]}"
a a
b b
See this related question for further details.
#TomFenech's answer solves your problem, but let me suggest other improvements:
#!/usr/bin/env bash
DIS_ARRAY=(A B C D)
echo "Array is : "
echo ${DIS_ARRAY[*]}
for dis in "${DIS_ARRAY[#]}"
do
for file in "/mydir/$dis"*.xml
do
if [ -f "$file" ]; then
echo "$file"
fi
done
done
Your shebang line references sh, but your question is tagged bash - unless you need POSIX compliance, use a bash shebang line to take advantage of all that bash has to offer
To match files located directly in a given directory (i.e., if you don't need to traverse an entire subtree), use a glob (filename pattern) and rely on pathname expansion as in my code above - no need for find and command substitution.
Note that the wildcard char. * is UNquoted to ensure pathname expansion.
Caveat: if no matching files are found, the glob is left untouched (assuming the nullglob shell option is OFF, which it is by default), so the loop is entered once, with an invalid filename (the unexpanded glob) - hence the [ -f "$file" ] conditional to ensure that an actual match was found (as an aside: using bashisms, you could use [[ -f $file ]] instead).

Resources