Bash wrapping parts of a variable in quotes when expanded - bash

I'm trying to recursively find c and header files in a script, while avoiding globbing out any that exist in the current directory.
FILE_MATCH_LIST='"*.c","*.cc","*.cpp","*.h","*.hh","*.hpp"'
FILE_MATCH_REGEX=$(echo "$FILE_MATCH_LIST" | sed 's/,/ -o -name /g')
FILE_MATCH_REGEX="-name $FILE_MATCH_REGEX"
This does exactly what I want it to:
+ FILE_MATCH_REGEX='-name "*.c" -o -name "*.cc" -o -name "*.cpp" -o -name "*.h" -o -name "*.hh" -o -name "*.hpp"'
Now, if I call find with that string (in quotes), it maintains the leading and trailing quotes and breaks find:
files=$(find $root_dir "$FILE_MATCH_REGEX" | grep -v $GREP_IGNORE_LIST)
+ find [directory] '-name "*.c" -o -name "*.cc" -o -name "*.cpp" -o -name "*.h" -o -name "*.hh" -o -name "*.hpp"'
This results in an "unknown predicate" error from find, because the entire predicate is single quoted.
If I drop the quotes from the variable in the find command, I get a strange behavior:
files=$(find $root_dir $FILE_MATCH_REGEX | grep -v $GREP_IGNORE_LIST)
+ find [directory] -name '"*.c"' -o -name '"*.cc"' -o -name '"*.cpp"' -o -name '"*.h"' -o -name '"*.hh"' -o -name '"*.hpp"'
Where are these single quotes coming from? They exist if I echo that variable as well, but they aren't there in the command when I'm actually setting the $FILE_MATCH_REGEX (As seen at the beginning of the question).
This of course also breaks find, because it's looking for the actual double quoted string, instead of expanding the *.h etc.
How do I get these strings into find without all of these quoting woes?

Fleshing out the array answer:
#!/bin/bash
patterns=( '*.c' '*.cc' '*.h' '*.hh' )
find_args=( "-name" "${patterns[0]}" )
for (( i=1 ; i < ${#patterns[@]} ; i++ )) ; do
find_args+=( "-o" "-name" "${patterns[i]}" )
done
find [directory] "${find_args[@]}"
Also, to clear up the misconception around quotes, if you echo the last line the output might not be what you expect:
echo find [directory] "${find_args[@]}"
# outputs: find [directory] -name *.c -o -name *.cc -o -name *.h -o -name *.hh
Where are the quotes? Your shell removed them after it was done with them. Quotes are not find syntax, they are shell syntax that tell the shell how to interpret (or perhaps how NOT to interpret) your command line.
The reason for the strange behavior in your debug output is that the quotes in your data are literal quotes, not shell syntax quotes that get removed during command parsing. The debugger is just trying to point out the distinction.
Some useful resources on the Bash wiki:
BashParser explains how your command line gets parsed and executed
BashFAQ/050 explains why embedding quotes in your data isn't sufficient
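To make the distinction between literal quotes in data and the shell's own syntax quotes concrete, here is a minimal sketch. The variable `args` and the helper `show_args` are made up for illustration; the sketch only prints the words the shell produces from each kind of expansion.

```shell
#!/bin/bash
# Hypothetical variable mirroring the question's FILE_MATCH_REGEX.
args='-name "*.c" -o -name "*.h"'

# Print each argument on its own line, wrapped in <>.
show_args() { printf '<%s>\n' "$@"; }

set -f                          # keep the demo independent of files in the cwd
unquoted=$(show_args $args)     # word splitting only; the " characters stay in the data
set +f
quoted=$(show_args "$args")     # no splitting: the whole string is one argument

echo "$unquoted"
echo "$quoted"
```

The unquoted expansion yields five words, and the second one is `"*.c"` with the quote characters still attached, which is what find then tries to match literally.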

If you have GNU find - adjust to your liking:
#!/bin/bash
#FILE_MATCH_LIST='"*.c","*.cc","*.cpp","*.h","*.hh","*.hpp"'
FILE_MATCH_LIST='.*/.*\.(c|cc|cpp|h|hh|hpp)'
find . -type f -regextype posix-egrep -regex "${FILE_MATCH_LIST}"

Related

Bash: Find command with multiple -name variable [duplicate]

I have a find command that finds files with name matching multiple patterns mentioned against the -name parameter
find -L . \( -name "SystemOut*.log" -o -name "*.out" -o -name "*.log" -o -name "javacore*.*" \)
This finds required files successfully at the command line. What I am looking for is to use this command in a shell script and join this with a tar command to create a tar of all log files. So, in a script I do the following:
LIST="-name \"SystemOut*.log\" -o -name \"*.out\" -o -name \"*.log\" -o -name \"javacore*.*\" "
find -L . \( ${LIST} \)
This does not print files that I am looking for.
First, why does this script not behave like the command? Once it does, can I combine it with cpio or similar to create a tar in one shot?
Looks like find fails to match * in patterns from unquoted variables. This syntax works for me (using bash arrays):
LIST=( -name \*.tar.gz )
find . "${LIST[@]}"
Your example would become the following:
LIST=( -name SystemOut\*.log -o -name \*.out -o -name \*.log -o -name javacore\*.\* )
find -L . \( "${LIST[@]}" \)
eval "find -L . \( ${LIST} \)"
You could use an eval and xargs,
eval "find -L . \( $LIST \) " | xargs tar cf 1.tar
When you have a long list of file names you use, you may want to try the following syntax instead:
# List of file patterns
Pat=( "SystemOut*.log"
"*.out"
"*.log"
"javacore*.*" )
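# Loop through the file patterns and build the 'find' arguments as an array,
# so that no literal quote characters end up inside the patterns
find_expr=( -name "${Pat[0]}" )
for p in "${Pat[@]:1}"; do find_expr+=( -o -name "$p" ); done
find "$startdir" \( "${find_expr[@]}" \)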
That method constructs the argument sequentially using elements from a list, which tends to work better (at least in my recent experiences).
You can use find's -exec option to pass the results to an archiving program:
find -L . \( .. \) -exec tar -rf archive.tar {} \;
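If the goal is a single archive, one option that avoids both eval and one tar invocation per file is to feed tar a null-delimited file list. The following is a sketch that assumes GNU find and GNU tar (`-print0`, `--null`, `-T -`); the directory, archive path, and sample file names are placeholders created just for the demo.

```shell
#!/bin/bash
# Placeholder locations for the demo; substitute real paths.
log_dir=$(mktemp -d)
archive="$(mktemp -d)/logs.tar"
touch "$log_dir/SystemOut_1.log" "$log_dir/app.out" "$log_dir/notes.txt"

# Patterns live in an array, so each one stays a single, unglobbed word.
name_expr=( -name 'SystemOut*.log' -o -name '*.out' -o -name '*.log' -o -name 'javacore*.*' )

# -print0 and --null keep file names with spaces or newlines intact.
( cd "$log_dir" && find -L . -type f \( "${name_expr[@]}" \) -print0 |
    tar -cf "$archive" --null -T - )

# List what ended up in the archive.
tar -tf "$archive"
```

Keeping the archive outside the searched directory avoids find picking up the archive itself while it is being written.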
LIST="-name SystemOut*.log -o -name *.out -o -name *.log -o -name javacore*.*"
The wildcards are already quoted and you don't need to quote them again. Moreover, here
LIST="-name \"SystemOut*.log\""
the inner quotes are preserved and find will get them as a part of the argument.
Building -name list for find command
Here is a proper way to do this:
cmd=();for p in \*.{log,tmp,bak} .debug-\*;do [ "$cmd" ] && cmd+=(-o);cmd+=(-name "$p");done
Or
cmd=()
for p in \*.{log,tmp,bak,'Spaced FileName'} {.debug,.log}-\* ;do
[ "$cmd" ] && cmd+=(-o)
cmd+=(-name "$p")
done
You can dump your cmd array:
declare -p cmd
declare -a cmd=([0]="-name" [1]="*.log" [2]="-o" [3]="-name" [4]="*.tmp" [5]="-o"
[6]="-name" [7]="*.bak" [8]="-o" [9]="-name" [10]="*.Spaced FileName"
[11]="-o" [12]="-name" [13]=".debug-*" [14]="-o" [15]="-name" [16]=".log-*")
You can then run:
find [-L] [/path] \( "${cmd[@]}" \)
For example:
find \( "${cmd[@]}" \)
(Note: if no path is given, GNU find defaults to the current directory, .)
find /home/user/SomeDir \( "${cmd[@]}" \)
find -L /path -type f \( "${cmd[@]}" \)
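The same loop can be wrapped in a small function so it is reusable across scripts. This is only a sketch: `build_name_expr` and the array `name_expr` are hypothetical names, not part of find or bash. It checks the array length instead of `[ "$cmd" ]`, which only looks at element 0.

```shell
#!/bin/bash
# build_name_expr: hypothetical helper, not part of find.
# Fills the global array name_expr with: -name P1 -o -name P2 ...
build_name_expr() {
    name_expr=()
    local pattern
    for pattern in "$@"; do
        # Add -o only between tests, never before the first one.
        if (( ${#name_expr[@]} > 0 )); then
            name_expr+=( -o )
        fi
        name_expr+=( -name "$pattern" )
    done
}

build_name_expr '*.log' '*.tmp' '.debug-*'
declare -p name_expr
# Typical use: find /path -type f \( "${name_expr[@]}" \)
```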

How to escape special characters in a variable to provide commandline arguments in bash

I very often use find to search for files and symbols in a huge source tree. If I don't limit the directories and file types, it takes several minutes to search for a symbol in a file. (I already mounted the source tree on an SSD and that halved the search time.)
I have a few aliases to limit the directories that I want to search, e.g.:
alias findhg='find . -name .hg -prune -o'
alias findhgbld='find . \( -name .hg -o -name bld \) -prune -o'
alias findhgbldins='find . \( -name .hg -o -name bld -o -name install \) -prune -o'
I then also limit the file types as well, e.g.:
findhgbldins \( -name '*.cmake' -o -name '*.txt' -o -name '*.[hc]' -o -name '*.py' -o -name '*.cpp' \)
But sometimes I only want to check for symbols in cmake files:
findhgbldins \( -name '*.cmake' -o -name '*.txt' \) -exec egrep -H 'pattern' {} \;
I could make a whole bunch of aliases for all possible combinations, but it would be a lot easier if I could use variables to select the file types, e.g:
export SEARCHALL="\( -name '*.cmake' -o -name '*.txt' -o -name '*.[hc]' -o -name '*.py' -o -name '*.cpp' \)"
export SEARCHSRC="\( -name '*.[hc]' -o -name '*.cpp' \)"
and then call:
findhgbldins $SEARCHALL -exec egrep -H 'pattern' {} \;
I tried several variants of escaping \, (, * and ), but there was no combination that did work.
The only way I could make it to work, was to turn off globbing in Bash, i.e. set -f, before calling my 'find'-contraption and then turn globbing on again.
One alternative I came up with is to define a set of functions (with the same names as my aliases findhg, findhgbld, and findhgbldins), each taking a simple parameter that a case structure uses to select the file types I am looking for, something like:
findhg() {
case $1 in
'1' )
find <many file arguments> ;;
'2' )
find <other file arguments> ;;
...
esac
}
findhgbld() {
case $1 in
'1' )
find <many file arguments> ;;
'2' )
find <other file arguments> ;;
...
esac
}
etcetera
My question is: Is it at all possible to pass these types of arguments to a command as a variable ?
Or is there maybe a different way to achieve the same i.e. having a combination of a command (findhg, findhgbld,findhgbldins) and a single argument to create a large number of combinations for searching ?
It's not really possible to do what you want without unpleasantness. The basic problem is that when you expand a variable without double-quotes around it (e.g. findhgbldins $SEARCHALL), it does word splitting and glob expansion on the variable's value, but does not interpret quotes or escapes, so there's no way to embed something in the variable's value to suppress glob expansion (well, unless you use invalid glob patterns, but that'd keep find from matching them properly too). Putting double-quotes around it (findhgbldins "$SEARCHALL") suppresses glob expansion, but it also suppresses word splitting, which you need to let find interpret the expression properly. You can turn off glob expansion entirely (set -f, as you mentioned), but that turns it off for everything, not just this variable.
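The three behaviours described above (unquoted, double-quoted, and unquoted under set -f) can be observed directly. This sketch uses a scratch directory and a made-up variable `expr_str`; `show_args` is a hypothetical helper that just prints the words it receives.

```shell
#!/bin/bash
# Scratch directory with one file that the glob can match.
scratch=$(mktemp -d)
touch "$scratch/notes.txt"
cd "$scratch" || exit 1

expr_str='-name *.txt'                 # hypothetical variable for the demo
show_args() { printf '<%s>' "$@"; echo; }

plain=$(show_args $expr_str)           # word-split AND glob-expanded
quoted=$(show_args "$expr_str")        # neither split nor expanded
set -f
noglob=$(show_args $expr_str)          # word-split, but not glob-expanded
set +f

printf '%s\n' "$plain" "$quoted" "$noglob"
```

Only the last form hands find the two words it needs, `-name` and `*.txt`, which is why set -f appeared to be the only thing that worked with a plain string variable.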
One thing that would work (but would be annoying to use) would be to put the search options in arrays rather than plain variables, e.g.:
SEARCHALL=( \( -name '*.cmake' -o -name '*.txt' -o -name '*.[hc]' -o -name '*.py' -o -name '*.cpp' \) )
findhgbldins "${SEARCHALL[@]}" -exec egrep -H 'pattern' {} \;
but that's a lot of typing to use it (and you do need every quote, bracket, brace, etc to get the array to expand right). Not very helpful.
My preferred option would be to build a function that interprets its first argument as a list of file types to match (e.g. findhgbldins mct -exec egrep -H 'pattern' {} \; might find make/cmake, c/h, and text files). Something like this:
findhgbldins() {
filetypes=()
if [[ $# -ge 1 && "$1" != "-"* ]]; then # if we were passed a type list (not just a find primitive starting with "-")
typestr="$1"
while [[ "${#typestr}" -gt 0 ]]; do
case "${typestr:0:1}" in # this looks at the first char of typestr
c) filetypes+=(-o -name '*.[ch]');;
C) filetypes+=(-o -name '*.cpp');;
m) filetypes+=(-o -name '*.make' -o -name '*.cmake');;
p) filetypes+=(-o -name '*.py');;
t) filetypes+=(-o -name '*.txt');;
?) echo "Usage: ${FUNCNAME[0]} [cCmpt] [find options]" >&2
return 1 ;; # return, not exit, so an interactive shell is not closed
esac
typestr="${typestr:1}" # remove first character, so we can process the remainder
done
# Note: at this point filetypes will be something like '-o' -name '*.txt' -o -name '*.[ch]'
# To use it with find, we need to remove the first element (`-o`), and add parens
filetypes=( \( "${filetypes[@]:1}" \) )
shift # and get rid of $1, so it doesn't get passed to `find` later!
fi
# Run `find`
find . \( -name .hg -o -name bld -o -name install \) -prune -o "${filetypes[@]}" "$@"
}
...you could also use a similar approach to building a list of directories to prune, if you wanted to.
As I said, that'd be my preferred option. But there is a trick (and I do mean trick), if you really want to use the variable approach. It's called a magic alias, and it takes advantage of the fact that aliases are expanded before wildcards, but functions are processed afterward, and does something completely unnatural with the combination. Something like this:
alias findhgbldins='shopts="$SHELLOPTS"; set -f; noglob_helper find . \( -name .hg -o -name bld -o -name install \) -prune -o'
noglob_helper() {
"$@"
case "$shopts" in
*noglob*) ;;
*) set +f ;;
esac
unset shopts
}
export SEARCHALL="( -name *.cmake -o -name *.txt -o -name *.[hc] -o -name *.py -o -name *.cpp )"
Then if you run findhgbldins $SEARCHALL -exec egrep -H 'pattern' {} \;, it expands the alias, records the current shell options, turns off globbing, and passes the find command (including $SEARCHALL, word-split but not glob-expanded) to noglob_helper, which runs the find command with all options, then turns glob expansion back on (unless it was already disabled in the saved shell options) so it doesn't mess you up later. It's a complete hack, but it should actually work.

Using find on multiple file extensions in combination with grep

I am having problems using find and grep together in msys on Windows. However, I also tried the same command on a Linux machine and it behaved the same. Notwithstanding, the syntax below is for Windows in that the semicolon on the end of the command is not preceded by a backslash.
I am trying to write a find expression to find *.cpp and *.h files and pass the results to grep. If I run this alone, it successfully finds all the .cpp and .h files:
find . -name '*.cpp' -o -name '*.h'
But if I add in an exec grep expression like this:
find . -name '*.cpp' -o -name '*.h' -exec grep -l 'std::deque' {} ;
It only greps the .h files. If I switch the .h and .cpp order in the command, it only searches the .cpp files. Essentially, it appears to only grep the last file extension in the expression. What do I need to do to grep both .h and .cpp?
Since you're using -o, you will need to use parentheses around it:
find . \( -name '*.cpp' -o -name '*.h' \) -exec grep -l 'std::deque' {} \;
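The reason is operator precedence: the implicit "and" between adjacent tests binds more tightly than -o, so an action at the end attaches only to the last -name. A small sketch with -print in place of -exec shows the effect; the scratch directory and file names are made up for the demo.

```shell
#!/bin/bash
# Scratch directory with one file of each type.
src=$(mktemp -d)
touch "$src/a.cpp" "$src/b.h"

# Without parentheses this parses as:  -name '*.cpp'  -o  ( -name '*.h' -a -print )
ungrouped=$(find "$src" -name '*.cpp' -o -name '*.h' -print)

# With parentheses, -print applies to a match on either pattern.
grouped=$(find "$src" \( -name '*.cpp' -o -name '*.h' \) -print | sort)

printf 'ungrouped:\n%s\ngrouped:\n%s\n' "$ungrouped" "$grouped"
```

The ungrouped command prints only b.h, while the grouped one prints both files.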
Or.. you can ...
bash$> grep '/bin' `find . -name "*.pl" -o -name "*.sh"`
./a.sh:#!/bin/bash
./pop3.pl:#!/usr/bin/perl
./seek.pl:#!/usr/bin/perl -w
./move.sh:#!/bin/bash
bash$>
Above command greps 'bin' in ".sh" and ".pl" files. And it has found them !!

What's wrong with this bash code to replace words in files?

I wrote this code a few months ago and didn't touch it again. Now I picked it up to complete it. This is part of a larger script to find all files with specific extensions, find which ones have a certain word, and replace every instance of that word with another one.
In this excerpt, ARG4 is the directory it starts looking at (it keeps going recursively).
ARG2 is the word it looks for.
ARG3 is the word that replaces ARG2.
ARG4="$4"
find -P "$ARG4" -type f -name '*.h' -o -name '*.C' \
-o -name '*.cpp' -o -name "*.cc" \
-exec grep -l "$ARG2" {} \; | while read file; do
echo "$file"
sed -n -i -E "s/"$ARG2"/"$ARG3"/g" "$file"
done
Like I said it's been a while, but I've read the code and I think it's pretty understandable. I think the problem must be in the while loop. I googled more info about "while read ---" but I didn't find much.
EDIT 2: See my answer down below for the solution.
I discovered that find wasn't working properly. It turns out that it's because of -maxdepth 0 which I put there so that the search would only happen in the current directory. I took it out, but then the output of find was one single string with all of the file names. They needed to be separate entities so that the while loop could read each one. So I rewrote it:
files=(`find . -type f \( -name "*.h" -o -name "*.C" -o \
-name "*.cpp" -o -name "*.cc" \) \
-exec grep -l "$ARG1" {} \;`)
for i in "${files[@]}" ; do
echo "$i"
gsed -E -i "s/$ARG1/$ARG2/g" "$i"
done
I had to install GNU sed, the regular one just wouldn't accept the file names.
It's hard to say if this is the only issue, since you haven't said precisely what's wrong. However, your find command's -exec action is only being applied for *.cc files. If you want it to apply for any of those, it should look more like:
ARG4="$4"
find -P "$ARG4" -type f \( -name '*.h' -o -name '*.C' \
-o -name '*.cpp' -o -name "*.cc" \) \
-exec grep -l "$ARG2" {} \; | while IFS= read -r file; do
echo "$file"
# No -n here: with -n and no p flag, sed -i would write back an empty file
sed -i -E "s/$ARG2/$ARG3/g" "$file"
done
Note the added ( and ) for grouping to attach the action to the result of all of those.

for loop / unix2dos to clean a group of files with specific extension

I am trying to use unix2dos on a group of C++ source code files. Basically, unix2dos converts LF to CRLF.
I could simply do the following, and it does what I want :
#!/bin/sh
find . -type f \( -name "*.h" -o -name "*.cpp" \) -exec unix2dos {} \;
but I don't want the file to be modified if it has CRLF end of lines already.
That's why I have to modify the script.
#!/bin/sh
for i in `find . -type f \( -name "*.h" -o -name "*.cpp" \)`
do
LINE=`file $i | grep CRLF`
if [ $? -eq 1 ]
then
unix2dos $i
fi
done
The for loop seems a bit tricky to use since spaces are not being handled correctly. When the filename contains a space, the shell tries to apply unix2dos to each piece of the split string.
How do I solve the problem ?
You could use the following perl, which should leave CRLF files unchanged:
#!/bin/sh
find . -type f \( -name "*.h" -o -name "*.cpp" \) -exec perl -pi -e 's/([^\r])\n/$1\r\n/' {} \;
It will insert a CR before any LF that isn't preceded by a CR.
Simply replace your unix2dos command with the following (provided by putnamhill above):
`perl -wpi -e 's/([^\r])\n/$1\r\n/g' $1`;
Then do your previous find command :
#!/bin/sh
find . -type f \( -name "*.h" -o -name "*.cpp" \) -exec unix2dos {} \;
And you are all set.
You could use grep to check whether a file already contains a \r, and run unix2dos only when it does not, like this:
find . -type f \( -name "*.h" -o -name "*.cpp" \) -exec sh -c 'grep -q ^M "$1" || unix2dos "$1"' sh {} \;
... where you enter ^M by pressing Control-V and then Enter. (^M is the \r character.)
You shouldn't process find command's output in a for loop.
You need to quote your variables properly in shell.
Try this code instead:
#!/bin/sh
find . -type f \( -name "*.h" -o -name "*.cpp" \) | while IFS= read -r i
do
LINE=`file "$i" | grep -c CRLF`
if [ "$LINE" -eq 0 ]
then
unix2dos "$i"
fi
done
UPDATE: If you decide to use BASH then you can do this looping more efficiently. Consider following code:
#!/bin/bash
while IFS= read -r file
do
# Convert only files that do not already contain a CR at end of line
grep -q $'\r'"$" "$file" || unix2dos "$file"
done < <(find . -type f \( -name "*.h" -o -name "*.cpp" \))
The < <(...) syntax is called process substitution. It lets the while loop run in the current shell rather than in a subshell, so shell variables set inside the loop are still available afterwards, and it avoids forking a subshell for the loop.
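As an illustration of both points (the loop running in the current shell, and file names with spaces surviving intact), here is a sketch that collects the files needing conversion into an array instead of converting them. It assumes bash and a find that supports -print0; the scratch directory, file names, and the array `needs_conversion` are made up for the demo.

```shell
#!/bin/bash
# Scratch directory with one LF-only file (with a space in its name)
# and one file that already uses CRLF.
work_dir=$(mktemp -d)
printf 'int a;\n'   > "$work_dir/unix file.h"
printf 'int b;\r\n' > "$work_dir/dos.cpp"

needs_conversion=()
while IFS= read -r -d '' file; do
    # Keep only files that contain no carriage return at all.
    grep -q $'\r' "$file" || needs_conversion+=( "$file" )
done < <(find "$work_dir" -type f \( -name '*.h' -o -name '*.cpp' \) -print0)

# The array is still populated here because the loop ran in this shell.
printf 'would convert: %s\n' "${needs_conversion[@]}"
```

Replacing the array append with `unix2dos "$file"` would turn the dry run into the real conversion.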
Unix2dos will change LF to CRLF, but it will not change CRLF to CRCRLF. Any existing DOS line break will stay unchanged. So the simplest way to do what you want is:
unix2dos *.h *.cpp
best regards,
Erwin

Resources