grep cannot read filenames after find on folders with spaces - macOS

Hi, after I find the files and enclose their names in double quotes with the following command:
FILES=$(find . -type f -not -path "./.git/*" -exec echo -n '"{}" ' \; | tr '\n' ' ')
I do a for loop to grep a certain word inside each file that matches find:
for f in $FILES; do grep -Eq '(GNU)' $f; done
but grep complains for each entry that it cannot find the file or directory:
grep: "./test/test.c": No such file or directory
whereas echo $FILES produces:
"./.DS_Store" "./.gitignore" "./add_license.sh" "./ads.add_lcs.log" "./lcs_gplv2" "./lcs_mit" "./LICENSE" "./new test/test.js" "./README.md" "./sxs.add_lcs.log" "./test/test.c" "./test/test.h" "./test/test.js" "./test/test.m" "./test/test.py" "./test/test.pyc"
EDIT
Found the answer here. Works perfectly!

The issue is that your $FILES variable contains filenames surrounded by literal " quotes.
But worse, find's -exec cmd {} \; executes cmd separately for each file, which can be inefficient. As mentioned by @TomFenech in the comments, you can use -exec cmd {} + to search as many files within a single cmd invocation as possible.
A better approach for recursive search is usually to let find output the filenames to search, and pipe its results to xargs so that grep searches as many files per invocation as possible. Use -print0 and -0 respectively to correctly support filenames with spaces and other separators, by splitting results on a null character instead - this way you don't need quotes at all, reducing the possibility of bugs.
Something like this:
find . -type f -not -path './.git/*' -print0 | xargs -0 egrep '(GNU)'
However, in your question you had grep -q in a loop, so I suspect you may be looking for an exit status (found/not found) for each file. If so, you could use -l instead of -q to make grep list the matching filenames, and then pipe/send that output to where you need the results.
find . -print0 | xargs -0 egrep -l pattern > matching_filenames
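If you do still want a per-file check in a loop, here is a minimal sketch that keeps the null-delimited safety (assuming bash, for read -d '' and process substitution):
while IFS= read -r -d '' f; do
    if grep -Eq '(GNU)' "$f"; then
        echo "match: $f"
    fi
done < <(find . -type f -not -path './.git/*' -print0)
No quoting gymnastics are needed; the null delimiter handles spaces in the names.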
Also note that grep -E (or egrep) uses extended regular expressions, which means parentheses create a regex group. If you want to search for files containing (GNU) (with the parentheses) use grep -F or fgrep instead, which treats the pattern as a string literal.

Related

Removing white spaces from files but not from directories throws an error

I'm trying to recursively rename some files whose parent folders contain spaces. I've tried the following command in an Ubuntu terminal:
find . -type f -name '* *' -print0 | xargs -0 rename 's/ //'
It gives the following error, referring to the folder names:
Can't rename ./FOLDER WITH SPACES/FOLDER1.1/SUBFOLDER1.1/FILE.01 A.jpg
./FOLDERWITH SPACES/FOLDER1.1/SUBFOLDER1.1/FILE.01 A.jpg: No such file
or directory
If I'm not mistaken, the fact that the folders have white spaces in them shouldn't affect the process, since the command uses the -type f flag.
What is passed to xargs is the full path of the file, not just the file name. So your s/ // substitute command also removes spaces from the directory part. And as the new directories (without spaces) don't exist, you get the error you see. The renaming, in your example, was:
./FOLDER WITH SPACES/FOLDER1.1/SUBFOLDER1.1/FILE.01 A.jpg ->
./FOLDERWITH SPACES/FOLDER1.1/SUBFOLDER1.1/FILE.01 A.jpg
And this is not possible if directories ./FOLDERWITH SPACES/FOLDER1.1/SUBFOLDER1.1 don't already exist.
Try with the -d option of rename:
find . -type f -name '* *' -print0 | xargs -0 rename -d 's/ //'
(the -d option only renames the filename component of the path.)
Note that you don't need xargs. You could use the -execdir action of find:
find . -type f -name '* *' -execdir rename 's/ //' {} +
And as the -execdir command is executed in the subdirectory containing the matched file, you don't need the -d option of rename any more. The -print0 action of find isn't needed either.
Last note: if you want to replace all spaces in the file names, not just the first one, do not forget to add the g flag: rename 's/ //g'.
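For example, a combined dry run (a sketch; rename -n only prints what would be renamed, without touching anything):
find . -type f -name '* *' -execdir rename -n 's/ //g' {} +
Drop the -n once the proposed renames look right.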
You're correct in that -type f -name '* *' only finds files with blanks in the name, but find prints the entire path including parent directories, so if you have
dir with blank/file with blank.txt
and you do rename 's/ //' on that string, you get
dirwith blank/file with blank.txt
because the first blank in the entire string was removed. And now the path has changed, invalidating previously found results.
You could
use a different incantation of rename to a) only apply to the part after the last / and b) replace multiple blanks:
find . -type f -name '* *' -print0 | xargs -0 rename -n 's| (?=[^/]*$)||g'
s| (?=[^/]*$)||g matches all blanks that are followed by characters other than / until the end of the string, where (?=...) is a look-ahead.[1] You can use rename -n to dry-run until everything looks right.
(with GNU find) use -execdir to operate relative to the directory where the file is found, and also use Bash parameter expansion instead of rename:
find \
-type f \
-name '* *' \
-execdir bash -c 'for f; do mv "$f" "${f//[[:blank:]]}"; done' _ {} +
This collects as many matches as possible and then calls the Bash command with all the matches; for f iterates over all positional parameters (i.e., each file), and the mv command removes all blanks. _ is a stand-in for $0 within bash -c and doesn't really do anything.
${f//[[:blank:]]} is a parameter expansion that removes all instances of [[:blank:]] from the string $f.
You can use echo mv until everything looks right.
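That is, the same command with echo placed in front of mv to preview the renames:
find -type f -name '* *' -execdir bash -c 'for f; do echo mv "$f" "${f//[[:blank:]]}"; done' _ {} +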
[1] There's an easier method to achieve the same using rename -d; see Renaud's answer.

Count code lines by extension and path pattern

I have a bash command that helps me count lines of code:
find . "(" -name "*.ext" ")" -print0 | xargs -0 wc -l
I'd also like to count code lines in specific directories matching some pattern (e.g., all the directories starting with the "#" symbol). However, it seems that the -name argument checks only the filename, not the full path.
So, I thought I can use grep to filter output, which contains full paths:
find . "(" -name "*.ext" ")" -print0 | grep "/#" | xargs -0 wc -l
But grep doesn't handle it:
Binary file (standard input) matches
I also tried removing -print0 from find and adding -a to grep:
find . "(" -name "*.ext" ")" | grep -a "/#" | xargs -0 wc -l
This way I get the file list filtered by path, but it also leads to a problem with xargs:
open: File name too long
How can I accomplish the desired result? Also, an explanation of how this works and why my last command fails would be greatly welcome.
Your logic is right, but the command usage is not quite right. When you run find . "(" -name "*.ext" ")" -print0 | grep "/#", you pass the whole result of the find command, NUL-delimited, as the content for grep to search, and grep doesn't like the type of data it sees.
grep usually decides whether its input is binary by inspecting the first few bytes of the stream (from a file or through stdin). Since the results from find arrive with NUL delimiters, grep cannot identify the input as text, considers it binary data, and reports "Binary file (standard input) matches".
You later bypassed that with the -a flag, which tells grep to treat the binary data as text. But you are now searching the entire NUL-delimited result for /#, and what gets printed is not just the entries matching your pattern but the whole original result from find, which is effectively one giant "line". The fix is to also tell grep to split its input and output on NUL characters by adding the -z flag:
find . -name "*.ext" -print0 | grep -az "/#" | xargs -0 wc -l
Alternatively, you can use the regex support built into find itself. Assuming you want to search for .ext files inside directories starting with #, you could do
find . -type f -regex ".*[#].*/.*ext" -print0 | xargs -0 wc -l
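Another option (a sketch, assuming your find supports the common -path test, which matches a pattern against the whole path) avoids regex syntax entirely:
find . -type f -path '*/#*/*.ext' -print0 | xargs -0 wc -l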
The -print0 option from the man page:
-print0
    True; print the full file name on the standard output, followed by a null character (instead of the newline character that -print uses).
So you ran grep on what is effectively a single line of output.
Also, find doesn't print absolute paths when you invoke it with the path . .
So, I would suggest:
find . "(" -name "*.ext" ")" -exec readlink -f {} \;|grep "/#"|xargs wc -l

how to grep a large number of files?

I am trying to grep 40k files in the current directory and I am getting this error:
for i in $(cat A01/genes.txt); do grep $i *.kaks; done > A01/A01.result.txt
-bash: /usr/bin/grep: Argument list too long
How does one normally grep thousands of files?
Thanks
Upendra
This makes David sad...
Everyone so far is wrong (except for anubhava).
Shell scripting is not like other programming languages, because much of the interpretation of a command line comes from the shell interpolating it before the command is actually executed.
Let's take something simple:
$ set -x
$ ls
+ ls
bar.txt foo.txt fubar.log
$ echo The text files are *.txt
+ echo The text files are bar.txt foo.txt
The text files are bar.txt foo.txt
$ set +x
$
The set -x allows you to see how the shell actually interpolates the glob and then passes the result to the command. The line prefixed with + shows the command as it is actually executed.
You can see that the echo command isn't interpreting the *. Instead, the shell grabs the * and replaces it with the names of the matching files. Then, and only then, does the echo command actually execute.
When you have 40K plus files, and you do grep *, you're expanding that * to the names of those 40,000 plus files before grep even has a chance to execute, and that's where the error message /usr/bin/grep: Argument list too long is coming from.
Fortunately, Unix has a way around this dilemma:
$ find . -name "*.kaks" -type f -maxdepth 1 | xargs grep -f A01/genes.txt
The find . -name "*.kaks" -type f -maxdepth 1 will find all of your *.kaks files, and the -maxdepth 1 will only include files in the current directory. The -type f makes sure you only pick up files and not directories.
The find command pipes the names of the files into xargs, and xargs appends those names to the grep -f A01/genes.txt command. However, xargs has a trick up its sleeve. It knows how long the command line buffer is, and will execute the grep when the command line buffer is full, then pass another batch of files to grep. This way, grep gets executed maybe three or ten times (depending upon the size of the command line buffer), and all of our files are processed.
Unfortunately, xargs uses whitespace as a separator for the file names. If your files contain spaces or tabs, you'll have trouble with xargs. Fortunately, there's another fix:
$ find . -name "*.kaks" -type f -maxdepth 1 -print0 | xargs -0 grep -f A01/genes.txt
The -print0 will cause find to print out the names of the files separated not by newlines but by the NUL character. The -0 parameter for xargs tells xargs that the file separator isn't whitespace but the NUL character. This fixes the issue.
You could also do this too:
$ find . -name "*.kaks" -type f -maxdepth 1 -exec grep -f A01/genes.txt {} \;
This executes grep once for each and every file found, unlike xargs, which runs grep as few times as possible by stuffing as many files as it can onto each command line. The advantage is that it avoids shell interference entirely. However, it may or may not be less efficient.
What would be interesting is to experiment and see which one is more efficient. You can use time to see:
$ time find . -name "*.kaks" -type f -maxdepth 1 -exec grep -f A01/genes.txt {} \;
This will execute the command and then tell you how long it took. Try it with the -exec and with xargs and see which is faster. Let us know what you find.
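For example, a rough side-by-side (a sketch; output is discarded so terminal printing doesn't dominate the timing):
time find . -name "*.kaks" -type f -maxdepth 1 -exec grep -f A01/genes.txt {} \; > /dev/null
time find . -name "*.kaks" -type f -maxdepth 1 -print0 | xargs -0 grep -f A01/genes.txt > /dev/null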
You can combine find with grep like this:
find . -maxdepth 1 -name '*.kaks' -exec grep -H -f A01/genes.txt '{}' \; > A01/A01.result.txt
you can use the recursive feature of grep:
for i in $(cat A01/genes.txt); do
    grep -r "$i" .
done > A01/A01.result.txt
though if you want to select only .kaks files:
for i in $(cat A01/genes.txt); do
    find . -iregex '.*\.kaks$' -exec grep "$i" {} \;
done > A01/A01.result.txt
Put another for loop inside your outer one:
for f in *.kaks; do
    grep -H "$i" "$f"
done
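In full, combined with your outer loop (a sketch; while read avoids re-splitting surprises from $(cat ...)):
while read -r i; do
    for f in *.kaks; do
        grep -H "$i" "$f"
    done
done < A01/genes.txt > A01/A01.result.txt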
By the way, are you interested in finding EVERY occurrence in each file, or merely whether the search string exists in there one or more times? If it is "good enough" to know the string occurs one or more times, you can pass -m 1 to grep and it will not bother reading/searching the rest of the file after finding the first match, which could potentially save lots of time.
The following solution has worked for me:
Problem:
grep -r "example\.com" *
-bash: /bin/grep: Argument list too long
Solution:
grep -r "example\.com" .
["In newer versions of grep you can omit the “.“, as the current directory is implied."]
Source:
Reinlick, J. https://www.saotn.org/bash-grep-through-large-number-files-argument-list-too-long/

execute shell command with variable

To execute find with some variables from a txt file I made this, but it doesn't work.
Is something wrong with the -exec statement?
#!/bin/bash
while read line; do
    echo tmp_name: $line
    for ST in 'service.getFile("'$line; do
        find ./compact/ -type f -exec grep -l $ST {} \;
    done
done < tmpNameList.txt
Try and quote $ST in your find command.
What's more:
since you operate from the current directory, ./ is not necessary;
you don't actually want regex behaviour (the . in the pattern would match any character, and I assume you meant a literal dot), so use fgrep instead (or grep -F), which matches the pattern as a fixed string. I.e.:
find compact/ -type f -exec fgrep -l "$ST" {} \;
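Applied to your script, that gives (a sketch of the corrected loop):
#!/bin/bash
while read -r line; do
    echo "tmp_name: $line"
    ST='service.getFile("'$line
    find compact/ -type f -exec fgrep -l "$ST" {} \;
done < tmpNameList.txt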
grep can read multiple patterns from a file (the -f option; combined with -F they are matched as fixed strings):
find ./compact/ -type f -exec grep -F -f patterns.txt {} +
where patterns.txt (each line prefixed with 'service.getFile(') is built with:
sed 's/^/service.getFile(/' tmpNameList.txt >patterns.txt

Find, grep, and execute - all in one?

This is the command I've been using for finding matches (queryString) in php files, in the current directory, with grep, case insensitive, and showing matching results in line:
find . -iname "*php" -exec grep -iH queryString {} \;
Is there a way to also pipe just the file name of the matches to another script?
I could probably run the -exec command twice, but that seems inefficient.
What I'd love to do on Mac OS X is then actually to "reveal" that file in the Finder. I think I can handle that part. If I had to give up the inline matches and just let grep show the file names, and then pipe that to a third script, that would be fine, too - I would settle.
But I'm actually not even sure how to pipe the output (the matched file names) to somewhere else...
Help! :)
Clarification
I'd like to reveal each of the files in a Finder window - so I'm probably not going to use the -q flag and stop at the first one.
I'm going to run this in the console; ideally I'd like to see the inline matches printed there, as well as being able to pipe them to another script, like osascript (AppleScript, to reveal them in the Finder). That's why I have been using -H - because I like to see both the file name and the match.
If I had to settle for just using -l so that the file names could more easily be piped to another script, that would be OK, too. But after looking at the reply below from @Charlie Martin, I think xargs could be helpful here in doing both at the same time, with a single find and a single grep command.
I did say bash, but I don't really mind if this needs to be run as /bin/sh instead - I don't know too much about the differences yet, but I do know there are some important ones.
Thank you all for the responses, I'm going to try some of them at the command line and see if I can get any of them to work and then I think I can choose the best answer. Leave a comment if you want me to clarify anything more.
Thanks again!
You bet. The usual thing is something like
$ find /path -name pattern -print | xargs command
So you might for example do
$ find . -name '*.[ch]' -print | xargs grep -H 'main'
(Quiz: why -H?)
You can carry this on further; for example, you might use
$ find . -name '*.[ch]' -print | xargs grep -H 'main' | cut -d ':' -f 1
to get the vector of file names for files that contain 'main', or
$ find . -name '*.[ch]' -print | xargs grep -H 'main' | cut -d ':' -f 1 |
xargs growlnotify -
to have each name become a Growl notification.
You could also do
$ grep pattern `find /path -name pattern`
or
$ grep pattern $(find /path -name pattern)
(in bash(1) at least these are equivalent) but you can run into limits on the length of a command line that way.
Update
To answer your questions:
(1) You can do anything in bash you can do in sh. The one thing I've mentioned that would be any different is the use of $(command) in place of using backticks around command, and that works in the version of sh on Macs. The csh, zsh, ash, and fish are different.
(2) I think merely doing $ open $(dirname arg) will open a Finder window on the containing directory.
It sounds like you want to open all *.php files that contain querystring from within a Terminal.app session.
You could do it this way:
find . -name '*.php' -exec grep -li 'querystring' {} \; | xargs open
With my setup, this opens MacVim with each file on a separate tab. YMMV.
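If you'd rather have Finder reveal each match than open it, the macOS open command has a -R (reveal) flag for exactly that (a sketch, reusing the same filter; assumes filenames without spaces or newlines):
find . -name '*.php' -exec grep -li 'querystring' {} \; | xargs open -R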
Replace -H with -l and you will get a list of those filenames that matched the pattern.
if you have bash 4 (with shopt -s globstar enabled), simply do
grep pattern /path/**/*.php
the ** operator is like
grep pattern `find -name \*.php -print`
find /home/aaronmcdaid/Code/ -name '*.cpp' -exec grep -q -iH boost {} \; -exec echo {} \;
The first change I made is to add -q to your grep command. This is "Exit immediately with zero status if any match is found".
The good news is that this speeds grep up when a file has many matching lines, since you don't care how many matches there are. But it means we need another -exec at the end to actually print the filenames when grep has been successful.
The grep result will be sent to stdout, so another -exec predicate is probably the best solution here.
Pipe to another script:
find . -iname "*.php" | myScript
File names will come into the stdin of myScript one line at a time.
You can also use xargs to form/execute commands to act on each file:
find . -iname "*.php" | xargs ls -l
act on files you find that match:
find . -iname "*.php" | xargs grep -l pattern | myScript
act on files that don't match the pattern:
find . -iname "*.php" | xargs grep -L pattern | myScript
In general, using multiple -execs with grep -q will be FAR faster than piping, since find implies a short-circuiting -a between each juxtaposed pair of expressions that isn't separated by an explicit operator. The main point here is that you want something to happen if grep matches something AND you want the matches printed. If the files are reasonably sized, then this should be faster (because grep -q exits after finding a single match):
find . -iname "*php" -exec grep -iq queryString {} \; -exec grep -iH queryString {} \; -exec otherprogram {} \;
If the files are particularly big, encapsulating it in a shell script may be faster than running multiple grep commands:
find . -iname "*php" -exec bash -c \
'out=$(grep -iH queryString "$1"); [[ -n $out ]] && echo "$out" && exit 0 || exit 1' \
bash {} \; -print
Also note that if the matches themselves are not needed, then
find . -iname "*php" -exec grep -iq queryString {} \; -exec otherprogram {} \;
will virtually always be faster than a piped solution like
find . -iname "*php" -print0 | xargs -0 grep -iH | ...
Additionally, you should really have -type f in all cases, unless you want to catch directories whose names end in php.
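For instance, the short-circuit version with -type f added (a sketch):
find . -type f -iname "*php" -exec grep -iq queryString {} \; -exec grep -iH queryString {} \;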
Regarding the question of which is faster: if you actually care about the minuscule time difference (perhaps because you are trying to save your processor some work), test by prefixing each command with time and see which one performs better.
