Bash: Find command with multiple -name variable [duplicate] - bash

I have a find command that finds files with name matching multiple patterns mentioned against the -name parameter
find -L . \( -name "SystemOut*.log" -o -name "*.out" -o -name "*.log" -o -name "javacore*.*" \)
This finds required files successfully at the command line. What I am looking for is to use this command in a shell script and join this with a tar command to create a tar of all log files. So, in a script I do the following:
LIST="-name \"SystemOut*.log\" -o -name \"*.out\" -o -name \"*.log\" -o -name \"javacore*.*\" "
find -L . \( ${LIST} \)
This does not print files that I am looking for.
First, why is this script not functioning like the command? Once it does, can I combine it with cpio or similar to create a tar in one shot?

Looks like find fails to match * in patterns from unquoted variables. This syntax works for me (using bash arrays):
LIST=( -name \*.tar.gz )
find . "${LIST[@]}"
Your example would become the following:
LIST=( -name SystemOut\*.log -o -name \*.out -o -name \*.log -o -name javacore\*.\* )
find -L . \( "${LIST[@]}" \)
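To address the second half of the question (creating the tar in one shot), the array form can be fed straight into GNU tar via a null-delimited stream. This is a sketch using GNU tar's --null/-T - options; logs.tar is just an example archive name:

```shell
# Build the pattern list as a bash array so the wildcards survive
# unexpanded, then stream the null-delimited matches into GNU tar,
# which reads file names from stdin with --null -T -.
LIST=( -name 'SystemOut*.log' -o -name '*.out' -o -name '*.log' -o -name 'javacore*.*' )
find -L . \( "${LIST[@]}" \) -print0 | tar -cf logs.tar --null -T -
```

The -print0/--null pairing keeps file names with spaces or newlines intact, which the eval/xargs variants below do not.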

eval "find -L . \( ${LIST} \)"

You could use an eval and xargs,
eval "find -L . \( $LIST \) " | xargs tar cf 1.tar

When you have a long list of file patterns, you may want to try the following syntax instead:
# List of file patterns
Pat=( "SystemOut*.log"
"*.out"
"*.log"
"javacore*.*" )
# Loop through each file pattern and build a 'find' string
find $startdir \( -name $(printf -- $'\'%s\'' "${Pat[0]}") $(printf -- $'-o -name \'%s\' ' "${Pat[@]:1}") \)
That method constructs the argument sequentially using elements from a list, which tends to work better (at least in my recent experiences).
You can use find's -exec option to pass the results to an archiving program:
find -L . \( ... \) -exec tar -rf archive.tar {} \;

LIST="-name SystemOut*.log -o -name *.out -o -name *.log -o -name javacore*.*"
The wildcards are already quoted and you don't need to quote them again. Moreover, here
LIST="-name \"SystemOut*.log\""
the inner quotes are preserved and find will get them as a part of the argument.

Building -name list for find command
Here is a proper way to do this:
cmd=();for p in \*.{log,tmp,bak} .debug-\*;do [ "$cmd" ] && cmd+=(-o);cmd+=(-name "$p");done
Or
cmd=()
for p in \*.{log,tmp,bak,'Spaced FileName'} {.debug,.log}-\* ;do
[ "$cmd" ] && cmd+=(-o)
cmd+=(-name "$p")
done
You can dump your $cmd array:
declare -p cmd
declare -a cmd=([0]="-name" [1]="*.log" [2]="-o" [3]="-name" [4]="*.tmp" [5]="-o"
[6]="-name" [7]="*.bak" [8]="-o" [9]="-name" [10]="*.Spaced FileName"
[11]="-o" [12]="-name" [13]=".debug-*" [14]="-o" [15]="-name" [16]=".log-*")
Now you can run:
find [-L] [/path] \( "${cmd[@]}" \)
For example:
find \( "${cmd[@]}" \)
(Note: if no path is supplied, the current directory . is the default)
find /home/user/SomeDir \( "${cmd[@]}" \)
find -L /path -type f \( "${cmd[@]}" \)
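As a quick end-to-end check of the builder loop above, using throwaway file names in a scratch directory:

```shell
cd "$(mktemp -d)"                        # scratch directory for the demo
touch app.log notes.tmp data.bak keep.txt
cmd=()
for p in \*.{log,tmp,bak}; do
  if [ "${#cmd[@]}" -gt 0 ]; then cmd+=(-o); fi
  cmd+=(-name "$p")
done
find . \( "${cmd[@]}" \)                 # app.log, notes.tmp, data.bak; keep.txt is skipped
```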

Related

How to escape special characters in a variable to provide commandline arguments in bash

I very often use find to search for files and symbols in a huge source tree. If I don't limit the directories and file types, it takes several minutes to search for a symbol in a file. (I already mounted the source tree on an SSD and that halved the search time.)
I have a few aliases to limit the directories that I want to search, e.g.:
alias findhg='find . -name .hg -prune -o'
alias findhgbld='find . \( -name .hg -o -name bld \) -prune -o'
alias findhgbldins='find . \( -name .hg -o -name bld -o -name install \) -prune -o'
I then also limit the file types as well, e.g.:
findhgbldins \( -name '*.cmake' -o -name '*.txt' -o -name '*.[hc]' -o -name '*.py' -o -name '*.cpp' \)
But sometimes I only want to check for symbols in cmake files:
findhgbldins \( -name '*.cmake' -o -name '*.txt' \) -exec egrep -H 'pattern' \;
I could make a whole bunch of aliases for all possible combinations, but it would be a lot easier if I could use variables to select the file types, e.g:
export SEARCHALL="\( -name '*.cmake' -o -name '*.txt' -o -name '*.[hc]' -o -name '*.py' -o -name '*.cpp' \)"
export SEARCHSRC="\( -name '*.[hc]' -o -name '*.cpp' \)"
and then call:
findhgbldins $SEARCHALL -exec egrep -H 'pattern' \;
I tried several variants of escaping \, (, * and ), but there was no combination that did work.
The only way I could make it to work, was to turn off globbing in Bash, i.e. set -f, before calling my 'find'-contraption and then turn globbing on again.
One alternative I came up with is to define a set of functions (with the same names as my aliases findhg, findhgbldins, and findhgbldins), which take a simple parameter that is used in a case structure that selects the different file types I am looking for, something like:
findhg () {
case $1 in
'1' )
find <many file arguments> ;;
'2' )
find <other file arguments> ;;
...
esac
}
findhgbld () {
case $1 in
'1' )
find <many file arguments> ;;
'2' )
find <other file arguments> ;;
...
esac
}
etcetera
My question is: Is it at all possible to pass these types of arguments to a command as a variable ?
Or is there maybe a different way to achieve the same i.e. having a combination of a command (findhg, findhgbld,findhgbldins) and a single argument to create a large number of combinations for searching ?
It's not really possible to do what you want without unpleasantness. The basic problem is that when you expand a variable without double-quotes around it (e.g. findhgbldins $SEARCHALL), it does word splitting and glob expansion on the variable's value, but does not interpret quotes or escapes, so there's no way to embed something in the variable's value to suppress glob expansion (well, unless you use invalid glob patterns, but that'd keep find from matching them properly too). Putting double-quotes around it (findhgbldins "$SEARCHALL") suppresses glob expansion, but it also suppresses word splitting, which you need to let find interpret the expression properly. You can turn off glob expansion entirely (set -f, as you mentioned), but that turns it off for everything, not just this variable.
One thing that would work (but would be annoying to use) would be to put the search options in arrays rather than plain variables, e.g.:
SEARCHALL=( \( -name '*.cmake' -o -name '*.txt' -o -name '*.[hc]' -o -name '*.py' -o -name '*.cpp' \) )
findhgbldins "${SEARCHALL[@]}" -exec egrep -H 'pattern' \;
but that's a lot of typing to use it (and you do need every quote, bracket, brace, etc to get the array to expand right). Not very helpful.
My preferred option would be to build a function that interprets its first argument as a list of file types to match (e.g. findhgbldins mct -exec egrep -H 'pattern' \; might find make/cmake, c/h, and text files). Something like this:
findhgbldins() {
filetypes=()
if [[ $# -ge 1 && "$1" != "-"* ]]; then # if we were passed a type list (not just a find primitive starting with "-")
typestr="$1"
while [[ "${#typestr}" -gt 0 ]]; do
case "${typestr:0:1}" in # this looks at the first char of typestr
c) filetypes+=(-o -name '*.[ch]');;
C) filetypes+=(-o -name '*.cpp');;
m) filetypes+=(-o -name '*.make' -o -name '*.cmake');;
p) filetypes+=(-o -name '*.py');;
t) filetypes+=(-o -name '*.txt');;
?) echo "Usage: $0 [cCmpt] [find options]" >&2
return 1 ;;
esac
typestr="${typestr:1}" # remove first character, so we can process the remainder
done
# Note: at this point filetypes will be something like '-o' -name '*.txt' -o -name '*.[ch]'
# To use it with find, we need to remove the first element (`-o`), and add parens
filetypes=( \( "${filetypes[@]:1}" \) )
shift # and get rid of $1, so it doesn't get passed to `find` later!
fi
# Run `find`
find . \( -name .hg -o -name bld -o -name install \) -prune -o "${filetypes[@]}" "$@"
}
...you could also use a similar approach to building a list of directories to prune, if you wanted to.
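For the prune side, the same array-building trick works; the directory names here are only examples:

```shell
prunedirs=(.hg bld install)    # example directories to skip
pruneexpr=()
for d in "${prunedirs[@]}"; do
  if [ "${#pruneexpr[@]}" -gt 0 ]; then pruneexpr+=(-o); fi
  pruneexpr+=(-name "$d")
done
# Prune the listed directories, print everything else that is a file.
find . \( "${pruneexpr[@]}" \) -prune -o -type f -print
```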
As I said, that'd be my preferred option. But there is a trick (and I do mean trick), if you really want to use the variable approach. It's called a magic alias, and it takes advantage of the fact that aliases are expanded before wildcards, but functions are processed afterward, and does something completely unnatural with the combination. Something like this:
alias findhgbldins='shopts="$SHELLOPTS"; set -f; noglob_helper find . \( -name .hg -o -name bld -o -name install \) -prune -o'
noglob_helper() {
"$@"
case "$shopts" in
*noglob*) ;;
*) set +f ;;
esac
unset shopts
}
export SEARCHALL="( -name *.cmake -o -name *.txt -o -name *.[hc] -o -name *.py -o -name *.cpp )"
Then if you run findhgbldins $SEARCHALL -exec egrep -H 'pattern' \;, it expands the alias, records the current shell options, turns off globbing, and passes the find command (including $SEARCHALL, word-split but not glob-expanded) to noglob_helper, which runs the find command with all options, then turns glob expansion back on (if it wasn't disabled in the saved shell options) so it doesn't mess you up later. It's a complete hack, but it should actually work.

What's wrong with this bash code to replace words in files?

I wrote this code a few months ago and didn't touch it again. Now I picked it up to complete it. This is part of a larger script to find all files with specific extensions, find which ones have a certain word, and replace every instance of that word with another one.
In this excerpt, ARG4 is the directory it starts looking at (it keeps going recursively).
ARG2 is the word it looks for.
ARG3 is the word that replaces ARG2.
ARG4="$4"
find -P "$ARG4" -type f -name '*.h' -o -name '*.C' \
-o -name '*.cpp' -o -name "*.cc" \
-exec grep -l "$ARG2" {} \; | while read file; do
echo "$file"
sed -n -i -E "s/"$ARG2"/"$ARG3"/g" "$file"
done
Like I said it's been a while, but I've read the code and I think it's pretty understandable. I think the problem must be in the while loop. I googled more info about "while read ---" but I didn't find much.
EDIT 2: See my answer down below for the solution.
I discovered that find wasn't working properly. It turns out that it's because of -maxdepth 0 which I put there so that the search would only happen in the current directory. I took it out, but then the output of find was one single string with all of the file names. They needed to be separate entities so that the while loop could read each one. So I rewrote it:
files=(`find . -type f \( -name "*.h" -o -name "*.C" -o \
-name "*.cpp" -o -name "*.cc" \) \
-exec grep -l "$ARG1" {} \;`)
for i in "${files[@]}" ; do
echo $i
echo `gsed -E -i "s/$ARG1/$ARG2/g" ${i}`
done
I had to install GNU sed, the regular one just wouldn't accept the file names.
It's hard to say if this is the only issue, since you haven't said precisely what's wrong. However, your find command's -exec action is only being applied for *.cc files. If you want it to apply for any of those, it should look more like:
ARG4="$4"
find -P "$ARG4" -type f \( -name '*.h' -o -name '*.C' \
-o -name '*.cpp' -o -name "*.cc" \) \
-exec grep -l "$ARG2" {} \; | while read file; do
echo "$file"
sed -i -E "s/$ARG2/$ARG3/g" "$file"
done
Note the added ( and ) for grouping to attach the action to the result of all of those.
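The grouping matters because find's implicit -a (AND) binds tighter than -o, so without parentheses an action attaches only to the last -name. A minimal demonstration:

```shell
cd "$(mktemp -d)"
touch a.h b.cc
# Grouped: -print applies to both patterns.
find . \( -name '*.h' -o -name '*.cc' \) -print   # ./a.h and ./b.cc
# Ungrouped: parses as  -name '*.h'  OR  ( -name '*.cc' AND -print ),
# so only the *.cc file is printed.
find . -name '*.h' -o -name '*.cc' -print         # only ./b.cc
```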

find option available to omit leading './' in result

I think this is probably a pretty n00ber question but I just gotsta ask it.
When I run:
$ find . -maxdepth 1 -type f \( -name "*.mp3" -o -name "*.ogg" \)
and get:
./01.Adagio - Allegro Vivace.mp3
./03.Allegro Vivace.mp3
./02.Adagio.mp3
./04.Allegro Ma Non Troppo.mp3
why does find prepend a ./ to the file name? I am using this in a script:
fList=()
while read -r -d $'\0'; do
fList+=("$REPLY")
done < <(find . -type f \( -name "*.mp3" -o -name "*.ogg" \) -print0)
fConv "$fList" "$dBaseN"
and I have to use a bit of a hacky-sed-fix at the beginning of a for loop in function 'fConv', accessing the array elements, to remove the leading ./. Is there a find option that would simply omit the leading ./ in the first place?
The ./ at the beginning of the file is the path. The "." means current directory.
You can use "sed" to remove it.
find . -maxdepth 1 -type f \( -name "*.mp3" -o -name "*.ogg" \) | sed 's|^\./||'
I do not recommend doing this though, since find can search through multiple directories, how would you know if the file found is located in the current directory?
If you ask it to search under /tmp, the results will be of the form /tmp/file:
$ find /tmp
/tmp
/tmp/.X0-lock
/tmp/.com.google.Chrome.cUkZfY
If you ask it to search under . (like you do), the results will be of the form ./file:
$ find .
.
./Documents
./.xmodmap
If you ask it to search through foo.mp3 and bar.ogg, the results will be of the form foo.mp3 and bar.ogg:
$ find *.mp3 *.ogg
click.ogg
slide.ogg
splat.ogg
However, this is just the default. With GNU and other modern finds, you can modify how to print the result. To always print just the last element:
find /foo -printf '%f\0'
If the result is /foo/bar/baz.mp3, this will result in baz.mp3.
To print the path relative to the argument under which it's found, you can use:
find /foo -printf '%P\0'
For /foo/bar/baz.mp3, this will show bar/baz.mp3.
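Applied to the script in the question, the GNU find -printf '%P\0' form drops the leading ./ with no sed step at all (bash required for the process substitution):

```shell
# Fill the array with null-delimited names relative to the search root;
# %P omits the starting-point prefix, so there is no ./ to strip.
fList=()
while IFS= read -r -d '' f; do
  fList+=("$f")
done < <(find . -maxdepth 1 -type f \( -name '*.mp3' -o -name '*.ogg' \) -printf '%P\0')
printf '%s\n' "${fList[@]}"    # names come back without the ./ prefix
```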
However, you shouldn't be using find at all. This is a job for plain globs, as suggested by R Sahu.
shopt -s nullglob
files=(*.mp3 *.ogg)
echo "Converting ${files[*]}:"
fConv "${files[@]}"
find . -maxdepth 1 -type f \( -name "*.mp3" -o -name "*.ogg" \) -exec basename "{}" \;
Having said that, I think you can use a simpler approach:
for file in *.mp3 *.ogg
do
if [[ -f $file ]]; then
# Use the file
fi
done
If your -maxdepth is 1, you can simply use ls:
$ ls *.mp3 *.ogg
Of course, that will pick up any directory with a *.mp3 or *.ogg suffix, but you probably don't have such a directory anyway.
Another is to munge your results:
$ find . -maxdepth 1 -type f \( -name "*.mp3" -o -name "*.ogg" \) | sed 's#^\./##'
This will remove all ./ prefixes, but not touch other file names. Note the ^ anchor in the substitution command.

for loop / unix2dos to clean a group of files with specific extension

I am trying to use unix2dos on a group of C++ source code files. Basically, unix2dos converts LF to CRLF.
I could simply do the following, and it does what I want :
#!/bin/sh
find . -type f \( -name "*.h" -o -name "*.cpp" \) -exec unix2dos {} \;
but I don't want the file to be modified if it has CRLF end of lines already.
That's why I have to modify the script.
#!/bin/sh
for i in `find . -type f \( -name "*.h" -o -name "*.cpp" \)`
do
LINE=`file $i | grep CRLF`
if [ $? -eq 1 ]
then
unix2dos $i
fi
done
The for loop seems a bit tricky to use since spaces are not handled correctly. When a filename contains a space, the shell incorrectly applies unix2dos to each piece of the split string.
How do I solve the problem ?
You could use the following perl, which should leave CRLF files unchanged:
#!/bin/sh
find . -type f \( -name "*.h" -o -name "*.cpp" \) -exec perl -pi -e 's/([^\r])\n/$1\r\n/' "{}" \;
It will insert a CR before any LF that isn't preceded by a CR.
Simply replace unix2dos in your previous find command with the perl one-liner putnamhill provided above:
#!/bin/sh
find . -type f \( -name "*.h" -o -name "*.cpp" \) -exec perl -wpi -e 's/([^\r])\n/$1\r\n/g' {} \;
And you are all set.
You could check with grep whether a file already contains a \r and run unix2dos only when it does not, like this:
find . -type f \( -name "*.h" -o -name "*.cpp" \) -exec sh -c 'grep -q ^M "{}" || unix2dos "{}"' \;
... where you enter ^M by pressing Control-V and Enter. (^M is the \r character)
You shouldn't process find command's output in a for loop.
You need to quote your variables properly in shell.
Try this code instead:
#!/bin/sh
find . -type f \( -name "*.h" -o -name "*.cpp" \) | while read i
do
LINE=`file "$i" | grep -c CRLF`
if [ $LINE -eq 0 ]
then
unix2dos "$i"
fi
done
UPDATE: If you decide to use BASH then you can do this looping more efficiently. Consider following code:
#!/bin/bash
while read file
do
grep -q $'\r'"$" "$file" || unix2dos "$file"
done < <(find . -type f \( -name "*.h" -o -name "*.cpp" \))
The < <(...) syntax is called process substitution. It runs the above while loop in the current shell rather than a subshell, allowing you to set shell variables in the current shell process and saving the fork of a sub-shell.
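The difference is easy to see with a toy counter variable:

```shell
#!/bin/bash
count=0
while read -r line; do
  count=$((count + 1))
done < <(printf '%s\n' a b c)    # loop runs in the current shell
echo "$count"                    # 3

count=0
printf '%s\n' a b c | while read -r line; do
  count=$((count + 1))
done                             # loop ran in a pipeline subshell; increments are lost
echo "$count"                    # 0
```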
Unix2dos will change LF to CRLF, but it will not change CRLF to CRCRLF. Any existing DOS line break will stay unchanged. So the simplest way to do what you want is:
unix2dos *.h *.cpp
best regards,
Erwin

Fast recursive grepping of svn working copy [duplicate]

This question already has answers here:
Exclude .svn directories from grep [duplicate]
(11 answers)
Closed 6 years ago.
I need to search all cpp/h files in svn working copy for "foo", excluding svn's special folders completely. What is the exact command for GNU grep?
I use ack for this purpose, it's like grep but automatically knows how to exclude source control directories (among other useful things).
grep -ir --exclude-dir=.svn foo *
In the working directory will do.
Omit the 'i' if you want the search to be case sensitive.
If you want to check only .cpp and .h files use
grep -ir --include=\*.{cpp,h} --exclude-dir=.svn foo *
Going a little off-topic:
If you have a working copy with a lot of untracked files (i.e. not version-controlled) and you only want to search source controlled files, you can do
svn ls -R | xargs -d '\n' grep <string-to-search-for>
This is a RTFM. I typed 'man grep' and '/exclude' and got:
--exclude=GLOB
Skip files whose base name matches GLOB (using wildcard
matching). A file-name glob can use *, ?, and [...] as
wildcards, and \ to quote a wildcard or backslash character
literally.
--exclude-from=FILE
Skip files whose base name matches any of the file-name globs
read from FILE (using wildcard matching as described under
--exclude).
--exclude-dir=DIR
Exclude directories matching the pattern DIR from recursive
searches.
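Putting those options together for the cpp/h case in the question (the quoting keeps the shell from expanding the globs before grep sees them):

```shell
# Recursive, case-insensitive search for "foo" in .cpp/.h files,
# skipping svn's metadata directories.
grep -ir --include='*.cpp' --include='*.h' --exclude-dir=.svn foo .
```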
I wrote this script which I've added to my .bashrc. It automatically excludes SVN directories from grep, find and locate.
I use these bash aliases for grepping for content and files in svn trees... I find it faster and more pleasant to search from the commandline (and use vim for coding) rather than a GUI-based IDE:
s () {
local PATTERN=$1
local COLOR=$2
shift; shift;
local MOREFLAGS=$*
if ! test -n "$COLOR" ; then
# is stdout connected to terminal?
if test -t 1; then
COLOR=always
else
COLOR=none
fi
fi
find -L . \
-not \( -name .svn -a -prune \) \
-not \( -name templates_c -a -prune \) \
-not \( -name log -a -prune \) \
-not \( -name logs -a -prune \) \
-type f \
-not -name \*.swp \
-not -name \*.swo \
-not -name \*.obj \
-not -name \*.map \
-not -name access.log \
-not -name \*.gif \
-not -name \*.jpg \
-not -name \*.png \
-not -name \*.sql \
-not -name \*.js \
-exec grep -iIHn -E --color=${COLOR} ${MOREFLAGS} -e "${PATTERN}" \{\} \;
}
# s foo | less
sl () {
local PATTERN=$*
s "$PATTERN" always | less
}
# like s but only lists the files that match
smatch () {
local PATTERN=$1
s "$PATTERN" always -l
}
# recursive search (filenames) - find file
f () {
find -L . -not \( -name .svn -a -prune \) \( -type f -or -type d \) -name "$1"
}
