Count code lines by extension and path pattern - bash

I have a bash one-liner that helps me count lines of code:
find . "(" -name "*.ext" ")" -print0 | xargs -0 wc -l
I'd also like to count code lines in specific directories matching some pattern (e.g., all directories whose names start with the "#" symbol). However, it seems that the -name argument checks only the filename, not the full path.
So I thought I could use grep to filter the output, which contains full paths:
find . "(" -name "*.ext" ")" -print0 | grep "/#" | xargs -0 wc -l
But grep doesn't handle it:
Binary file (standard input) matches
I also tried removing -print0 from find and adding -a to grep:
find . "(" -name "*.ext" ")" | grep -a "/#" | xargs -0 wc -l
This way I get the file list filtered by path, but it also leads to a problem with xargs:
open: File name too long
How can I accomplish the desired result? An explanation of how it works, and why my last query fails, would also be greatly welcome.

Your logic is right, but the command usage is not quite right. When you run find . "(" -name "*.ext" ")" -print0 | grep "/#", you are passing find's NUL-delimited results to grep as the content to be searched, and grep does not like the look of that data.
grep decides whether its input is binary by inspecting the first few bytes of the stream (from a file or from stdin). Because the results from find are NUL-delimited, grep cannot identify the input as text, treats it as binary data, and reports "Binary file (standard input) matches" instead of printing the matching paths.
In your last attempt you dropped -print0, so find emits newline-separated paths, but xargs -0 still expects NUL-delimited input; it therefore sees the entire stream as one gigantic "file name", which is why you get "File name too long". The fix for the original pipeline is to tell grep to split its input, and terminate its output, on NUL characters as well, using the -z flag (keeping -a to suppress the binary-file heuristic):
find . -name "*.ext" -print0 | grep -az "/#" | xargs -0 wc -l
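A quick way to see what is going on is to feed grep a NUL-delimited stream by hand. This is a minimal sketch; the path names are made up, and GNU grep is assumed, since -a and -z are GNU extensions:

```shell
# Simulate find -print0 output: two paths separated by NUL bytes.
printf 'dir/a.ext\0dir/#src/b.ext\0' > stream.bin

# Without -z, grep sees one NUL-filled "line" and reports a binary match
# instead of printing the path:
grep '/#' stream.bin

# With -a -z, grep splits its input on NUL and emits NUL-terminated
# matches, which xargs -0 can consume safely; tr is only for display:
grep -az '/#' stream.bin | tr '\0' '\n'
```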
Alternatively, you can use the regex support built into find itself. Assuming you want to match files ending in .ext inside directories whose names start with #, you could do:
find . -type f -regex ".*[#].*/.*ext" -print0 | xargs -0 wc -l
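As a runnable sketch of the -regex route (the directory names are invented, GNU find is assumed, and the pattern below is slightly tighter than the one above, anchoring the # to the start of a directory component):

```shell
# Build a tiny tree: one '#'-prefixed directory and one normal one.
mkdir -p demo/'#src' demo/lib
printf 'line1\nline2\n' > demo/'#src'/a.ext
printf 'line1\n'        > demo/lib/b.ext

# -regex tests the WHOLE path (e.g. 'demo/#src/a.ext'), unlike -name,
# so it can match on directory components; only files under a '#'-prefixed
# directory are counted here:
find demo -type f -regex '.*/#[^/]*/.*\.ext' -print0 | xargs -0 wc -l
```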

The -print0 option from man page:
-print0
True; print the full file name on the standard output, followed by a null character (instead of the newline character that
-print uses).
So you ran grep on what was effectively a single line of output.
Also, find prints paths relative to the starting point, so running it with the path . will not give you absolute paths.
So, I would suggest:
find . "(" -name "*.ext" ")" -exec readlink -f {} \; | grep "/#" | xargs wc -l
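For illustration, readlink -f canonicalises a relative path into an absolute one (a minimal sketch with an invented file name; -f is a GNU coreutils option):

```shell
mkdir -p demo2/sub
touch demo2/sub/file.ext

# Prints the absolute, canonical path,
# e.g. /home/user/work/demo2/sub/file.ext
readlink -f demo2/sub/file.ext
```

Note that the suggested pipeline above still lets xargs split on whitespace, so it is best suited to trees without spaces in file names.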

Related

Workaround for xargs Argument list too long in grep

For every line of a file, I need to check whether that string (which may contain regular expressions) is found in another file.
The problem is that the files are big: the first is 24 MB and the second 115 MB. I first tried $(cat file1) as the argument to grep, but it complains about the size; now I'm trying xargs grep, with the same error.
If I do a simple string search works
find . -name records.txt | xargs grep "999987^00086"
999987^00086^14743^00061^4
but if I try to pass the whole file via cat as the argument, it fails:
find . -name records.txt | xargs grep "$(cat records_tofix.txt)"
-bash: /usr/bin/xargs Argument list too long on grep
Use the -f option:
grep -f records_tofix.txt
The file should contain the patterns, each on its own line.
find can execute commands directly; there is no reason to call xargs here. The + form of -exec doesn't invoke the command once per file, but fills the command line with as many file names as possible, just like xargs:
find . -name records.txt -exec grep -f records_tofix.txt -- {} +
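A small sketch of how -f behaves (the file names and patterns here are invented): each line of the pattern file is a separate pattern, and any input line matching any of them is printed.

```shell
# One pattern per line; by default these are basic regular expressions.
printf 'TODO\nFIXME\n' > patterns.txt

printf 'x = 1  # TODO: rename\nclean line\n# FIXME later\n' > code.txt

# Prints the first and third lines; "clean line" matches neither pattern.
grep -f patterns.txt code.txt
```

With records containing carets such as 999987^00086, adding -F is safer, so the caret is taken literally rather than as a regex anchor.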

grep cannot read filename after find folders with spaces

Hi, after I find the files and enclose their names in double quotes with the following command:
FILES=$(find . -type f -not -path "./.git/*" -exec echo -n '"{}" ' \; | tr '\n' ' ')
I do a for loop to grep for a certain word inside each file that find matched:
for f in $FILES; do grep -Eq '(GNU)' $f; done
but grep complains about each entry that it cannot find file or directory:
grep: "./test/test.c": No such file or directory
whereas echo $FILES produces:
"./.DS_Store" "./.gitignore" "./add_license.sh" "./ads.add_lcs.log" "./lcs_gplv2" "./lcs_mit" "./LICENSE" "./new test/test.js" "./README.md" "./sxs.add_lcs.log" "./test/test.c" "./test/test.h" "./test/test.js" "./test/test.m" "./test/test.py" "./test/test.pyc"
EDIT
Found the answer below; works perfectly!
The issue is that your array contains filenames surrounded by literal " quotes.
But worse, find's -exec cmd {} \; executes cmd separately for each file, which can be inefficient. As mentioned by @TomFenech in the comments, you can use -exec cmd {} + to search as many files as possible within a single cmd invocation.
A better approach for a recursive search is usually to let find output the filenames and pipe its results to xargs, so that grep searches as many files per invocation as possible. Use -print0 and -0 respectively to correctly support filenames with spaces and other separators, by splitting results on a null character instead; this way you don't need quotes at all, which reduces the possibility of bugs.
Something like this:
find . -type f -not -path './.git/*' -print0 | xargs -0 egrep '(GNU)'
However, in your question you had grep -q in a loop, so I suspect you may be looking for a per-file found/not-found status. If so, you could use -l instead of -q to make grep list matching filenames, and then pipe/send that output to where you need the results.
find . -print0 | xargs -0 egrep -l pattern > matching_filenames
Also note that grep -E (or egrep) uses extended regular expressions, which means parentheses create a regex group. If you want to search for files containing (GNU) (with the parentheses) use grep -F or fgrep instead, which treats the pattern as a string literal.
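The -E versus -F distinction is easy to check directly (the sample file here is invented):

```shell
printf 'GNU tools\n(GNU) project\n' > sample.txt

# -E: the parentheses form a regex group, so the bare word matches too.
grep -cE '(GNU)' sample.txt    # prints 2

# -F: the pattern is a literal string, so only the parenthesised
# form matches.
grep -cF '(GNU)' sample.txt    # prints 1
```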

How to grep query all files for two strings

Here we go:
I need to find PHP files which contain both a TODO statement and my name.
Both strings could be anywhere in the document, on any line and at any position.
How to grep query for my name:
find -name '*.php' -exec grep -in "fincken" {} +
output:
./some/file.php:51: ramon fincken
./somefile.php:2: rfincken
How to grep query for the TODOs
find -name '*.php' -exec grep -n "TODO" {} +
output:
./some/file.php:53: // TODO: foobar!
./some/otherfile.php:53: // TODO: foobar?
I need to combine both grep queries (or their results) so I am expecting this as result:
./some/file.php
I have tried operators in a single grep, but they either expected both strings on the same line in a particular order, or came up with all results (OR .. OR) instead of only the intersection (AND).
This line looks ugly, but it should give what you want:
find whatever... | xargs grep -il 'fincken' | xargs grep -il 'todo' | xargs grep -in -e 'todo' -e 'fincken'
The output would look like:
/foo/bar/file : 100:TODO
/foo/bar/file : 101:fincken
Only files containing both TODO and fincken will be listed.
Ask the first grep to return just the file name and then pipe to another grep:
find -name '*.php' -exec grep -li "fincken" {} + | xargs grep -l "TODO"
From man grep, -l (lowercase L) prints only the name of each matching file. This way, the find command returns a list of files that is then filtered by the xargs command.
Your output will be the list of files which contain both "fincken" and "TODO". You can of course pipe in more xargs grep -l stages if you want to require more words.
You can also use grep alone like this, with -R for a recursive search:
grep -Rl --include="*php" "TODO" * | xargs grep -il "fincken"
Note I moved the TODO grep to run first, because you use -i for "fincken" and case-insensitive matching is much slower. This way, the grep -i is only run on the already filtered results.
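A runnable sketch of this two-pass approach (the directory layout and file contents are invented):

```shell
mkdir -p demo4/sub
printf '<?php // TODO: tidy up (fincken)\n' > demo4/sub/both.php
printf '<?php // TODO only\n'               > demo4/todo.php

# Pass 1: -R recurses, --include restricts to *php, -l lists files
# containing TODO.
# Pass 2: keep only those files that also mention fincken
# (case-insensitive).
grep -Rl --include='*php' 'TODO' demo4 | xargs grep -il 'fincken'
# prints demo4/sub/both.php
```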
You can pipe the first grep through a second one, get the name of the file and skip repetitions:
find -name '*.php' -exec grep -in "fincken" {} + | grep TODO | cut -d: -f1 | uniq
People are making this more complicated than it needs to be. -exec uses the exit code of the command it runs as a test in find. So you can just do:
find -name '*.php' -exec grep -iq "fincken" {} \; -exec grep -iq "TODO" {} \; -print
which reaches the -print only if both -exec tests return 0.
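A minimal demonstration of -exec acting as an AND of tests (the files below are invented):

```shell
mkdir -p demo3
printf '// TODO: cleanup\n// by fincken\n' > demo3/both.php
printf '// TODO only\n'                    > demo3/todo.php
printf 'by fincken only\n'                 > demo3/name.php

# Each -exec is a predicate; -print is reached only when both
# greps succeed for the same file.
find demo3 -name '*.php' \
     -exec grep -iq 'fincken' {} \; \
     -exec grep -iq 'TODO' {} \; \
     -print
# prints demo3/both.php
```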

Better way to limit the unix command find by filename

I'm getting results from find for filenames that contain '~', .swp, etc. So I did the following, but is there a better way to do this? The '.*.js' -iname '*.js' part feels "redundant".
$ find ./ '.*.js' -iname '*.js' -print0 | xargs -0 grep -n ".*loginError.*"
find: `.*.js': No such file or directory
./js/signin.js:252: foo.loginError();
./js/signin.js:339:foo.loginError = function() {
./js/signin.js:340: foo.log("ui.loginError");
Try using
find . -name \*.js -print0 | xargs -0 grep -n ".*loginError.*"
That will find only files with the .js extension, and not those ending in ~ or .swp.
To do it all in one command without xargs, you could do it like this:
find . -name "*.js" -exec grep -n ".*loginError.*" /dev/null {} \;
The /dev/null piece makes grep think it's searching multiple files, so it outputs the filename correctly; otherwise it would just print the line number without telling you which file the match is in.
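The effect is easy to reproduce (the file name is invented; GNU grep also offers -H for the same purpose):

```shell
printf 'foo.loginError();\n' > one.js

# With a single file, grep omits the file name:
grep -n 'loginError' one.js              # prints 1:foo.loginError();

# /dev/null makes it a multi-file search, forcing the file-name prefix:
grep -n 'loginError' /dev/null one.js    # prints one.js:1:foo.loginError();
```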

Shell script string search where '#' is not in 1st character position

This string search was provided by Paul.R (much appreciated, Paul):
find dir -type f -print0 | xargs -0 grep -F -f strings.txt
Note: I am using the above search to perform a recursive directory search for hard-coded path names within shell scripts. However, due to the limitations of my Unix environment (Tru64), I am unable to use grep's -r switch, hence the solution provided above.
As an additional criterion, I would like to extend this search to exclude any matched line whose first leading character is "#" (the comment symbol).
Would appreciate any feedback.
Thanks...Evan
I'm assuming you're just trying to limit the results in the output from the command you posted. If that's so, then how about
find dir -type f -print0 | xargs -0 grep -F -f strings.txt | grep -v '^#'
The final piped command ignores all lines that match the regex ^# (i.e. lines beginning with the # character).
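A sketch of this comment filter on invented data:

```shell
printf '#!/bin/sh\n# old: /usr/local/bin\ncp /usr/local/bin/tool .\n' > script.sh
printf '/usr/local/bin\n' > strings.txt

# grep -F -f finds lines containing any fixed string from strings.txt;
# grep -v '^#' then drops the matches that start with the comment
# character.
grep -F -f strings.txt script.sh | grep -v '^#'
# prints cp /usr/local/bin/tool .
```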
This solution will not work if the paths and names of the files themselves are contained in strings.txt, but it might work in your situation.
find dir -type f -print0 | xargs -0 grep -v '^#' | grep -F -f strings.txt
