How do you grep results from 'find'? - bash

Trying to find a word/pattern contained within the resulting file names of the find command.
For instance, I have this command:
find . -name Gruntfile.js that returns several file names.
How do I grep within these for a word pattern?
Was thinking something along the lines of:
find . -name Gruntfile.js | grep -rnw -e 'purifycss'
However, this doesn't work.

Use the -exec {} + option to pass the list of filenames that are found as arguments to grep:
find -name Gruntfile.js -exec grep -nw 'purifycss' {} +
This is the safest and most efficient approach, as it doesn't break when the path to the file isn't "well-behaved" (e.g. contains a space). Like an approach using xargs, it also minimises the number of calls to grep by passing multiple filenames at once.
I have removed the -e and -r switches, as I don't think that they're useful to you here.
An excerpt from man find:
-exec command {} +
This variant of the -exec action runs the specified command on the selected files, but the command line is built by appending each selected file name at the end; the total number of invocations of the command will be much less than the number of matched files.
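For comparison, here is a hedged sketch of both -exec terminators, using the pattern from the question; the -H is a GNU/BSD grep option (an assumption here) that forces the file name to be printed even when grep receives a single file:
# one grep invocation per file; slower, and grep omits the file name unless you add -H
find . -type f -name Gruntfile.js -exec grep -nwH 'purifycss' {} \;
# as few grep invocations as possible (the approach above); usually much faster
find . -type f -name Gruntfile.js -exec grep -nwH 'purifycss' {} +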

While this doesn't strictly answer your question, provided you have globstar turned on (shopt -s globstar), you could filter the results in bash like this:
grep something **/Gruntfile.js
I was using religiously the approach used by Tom Fenech until I switched to zsh, which handles such things much better. Now all I do is:
grep text **/*(.)
which greps text through all regular files in current directory.
I believe this to be much cleaner syntax especially for day-to-day work in shell.
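If you want the zsh version to stay restricted to the file name from the original question, a small sketch (the (.) glob qualifier limits the match to regular files):
grep -nw 'purifycss' **/Gruntfile.js(.)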

When too many files exist for the * expansion to run:
$ grep -o 'xxmaj\|xxbos\|xxfld' train/* | wc -l
-bash: /bin/grep: Argument list too long
0
Then this code fixes the “too long” problem:
$ find junk -maxdepth 1 -type f | xargs grep -o 'TVDetails\|xxmaj\|xxbos\|xxfld'
junk/gum-.doc.out:TVDetails
junk/Zv0n.doc.out:TVDetails
$ find junk -maxdepth 1 -type f | xargs grep -o 'TVDetails\|xxmaj\|xxbos\|xxfld' | wc -l
2
It runs faster on my system, and maybe yours, when using the -P 0 option:
$ /usr/bin/time -f "%E Elapsed Real Time" find train -maxdepth 1 -type f | xargs -P 0 grep -o 'TVDetails\|xxmaj\|xxbos\|xxfld' | wc -l
0:02.45 Elapsed Real Time
358
$ /usr/bin/time -f "%E Elapsed Real Time" find train -maxdepth 1 -type f | xargs grep -o 'TVDetails\|xxmaj\|xxbos\|xxfld' | wc -l
0:11.96 Elapsed Real Time
358
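If the file names might contain spaces or other special characters, a null-delimited variant of the same pipeline is safer; this is a sketch assuming GNU find and xargs:
find train -maxdepth 1 -type f -print0 | xargs -0 -P 0 grep -o 'TVDetails\|xxmaj\|xxbos\|xxfld' | wc -l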
Hope this helps.


how to grep large number of files?

I am trying to grep 40k files in the current directory and I am getting this error.
for i in $(cat A01/genes.txt); do grep $i *.kaks; done > A01/A01.result.txt
-bash: /usr/bin/grep: Argument list too long
How does one normally grep thousands of files?
Thanks
Upendra
This makes David sad...
Everyone so far is wrong (except for anubhava).
Shell scripting is not like any other programming language because much of the interpretation of lines comes from the power of the shell interpolating them before the command is actually executed.
Let's take something simple:
$ set -x
$ ls
+ ls
bar.txt foo.txt fubar.log
$ echo The text files are *.txt
+ echo The text files are bar.txt foo.txt
The text files are bar.txt foo.txt
$ set +x
$
The set -x allows you to see how the shell actually interpolates the glob before passing the result to the command. The lines starting with + show the commands as they are actually executed.
You can see that the echo command isn't interpreting the *. Instead, the shell grabs the * and replaces it with the names of the matching files. Then, and only then, does the shell actually execute the echo command.
When you have 40K plus files, and you do grep *, you're expanding that * to the names of those 40,000 plus files before grep even has a chance to execute, and that's where the error message /usr/bin/grep: Argument list too long is coming from.
Fortunately, Unix has a way around this dilemma:
$ find . -name "*.kaks" -type f -maxdepth 1 | xargs grep -f A01/genes.txt
The find . -name "*.kaks" -type f -maxdepth 1 will find all of your *.kaks files, and the -maxdepth 1 will only include files in the current directory. The -type f makes sure you only pick up files and not a directory.
The find command pipes the names of the files into xargs, and xargs appends those names to the grep -f A01/genes.txt command. However, xargs has a trick up its sleeve. It knows how long the command line buffer is, and will execute the grep when the command line buffer is full, then pass another series of files to the grep. This way, grep gets executed maybe three or ten times (depending upon the size of the command line buffer), and all of your files are used.
Unfortunately, xargs uses whitespace as a separator for the file names. If your files contain spaces or tabs, you'll have trouble with xargs. Fortunately, there's another fix:
$ find . -name "*.kaks" -type f -maxdepth 1 -print0 | xargs -0 grep -f A01/genes.txt
The -print0 will cause find to print out the names of the files separated not by newlines, but by the NUL character. The -0 parameter for xargs tells xargs that the file separator isn't whitespace, but the NUL character. This fixes the issue.
You could also do this too:
$ find . -name "*.kaks" -type f -maxdepth 1 -exec grep -f A01/genes.txt {} \;
This will execute grep once for each and every file found, instead of letting xargs stuff as many file names as possible onto each grep command line. The advantage of this is that it avoids shell interference entirely. However, it may or may not be less efficient.
What would be interesting is to experiment and see which one is more efficient. You can use time to see:
$ time find . -name "*.kaks" -type f -maxdepth 1 -exec grep -f A01/genes.txt {} \;
This will execute the command and then tell you how long it took. Try it with the -exec and with xargs and see which is faster. Let us know what you find.
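For instance, a rough comparison might look like this (a sketch; bash's time keyword measures the whole pipeline, and output is discarded so that printing doesn't dominate the measurement):
time find . -maxdepth 1 -name "*.kaks" -type f -exec grep -f A01/genes.txt {} \; > /dev/null
time find . -maxdepth 1 -name "*.kaks" -type f -print0 | xargs -0 grep -f A01/genes.txt > /dev/null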
You can combine find with grep like this:
find . -maxdepth 1 -name '*.kaks' -exec grep -H -f A01/genes.txt '{}' \; > A01/A01.result.txt
you can use recursive feature of grep:
for i in $(cat A01/genes.txt); do
    grep -r "$i" .
done > A01/A01.result.txt
though if you want to select only kaks files:
for i in $(cat A01/genes.txt); do
    find . -iregex '.*\.kaks$' -exec grep "$i" {} \;
done > A01/A01.result.txt
Put another for loop inside your outer one:
for f in *.kaks; do
    grep -H "$i" "$f"
done
By the way, are you interested in finding EVERY occurrence in each file, or merely whether the search string exists in there one or more times? If it is "good enough" to know the string occurs one or more times, you can specify "-m 1" to grep and it will not bother reading/searching the rest of the file after finding the first match, which could potentially save lots of time.
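For instance, a sketch of the inner loop with that early-exit behaviour, assuming GNU or BSD grep for -m:
for f in *.kaks; do
    grep -m 1 -H "$i" "$f"   # stop reading each file after its first match for the current gene $i
done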
The following solution has worked for me:
Problem:
grep -r "example\.com" *
-bash: /bin/grep: Argument list too long
Solution:
grep -r "example\.com" .
["In newer versions of grep you can omit the “.“, as the current directory is implied."]
Source:
Reinlick, J. https://www.saotn.org/bash-grep-through-large-number-files-argument-list-too-long/
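If your grep is GNU grep, the recursive approach can also be restricted to the relevant files with --include, which sidesteps both find and the over-long argument list (a sketch, applied to the original question's files):
grep -r --include='*.kaks' -f A01/genes.txt . > A01/A01.result.txt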

xargs with multiple commands

In the current directory, I'd like to print the filename and contents in it.
I can print filenames or contents separately by
find . | grep "file_for_print" | xargs echo
find . | grep "file_for_print" | xargs cat
but what I want is printing them together like this:
file1
line1 inside file1
line2 inside file1
file2
line1 inside file2
line2 inside file2
I read xargs with multiple commands as argument
and tried
find . | grep "file_for_print" | xargs -I % sh -c 'echo; cat;'
but it doesn't work.
I'm not familiar with xargs, so don't know what exactly "-I % sh -c" means.
could anyone help me? thank you!
find . | grep "file_for_print" | xargs -I % sh -c 'echo %; cat %;' (OP was missing %s)
To start with, there is virtually no difference between:
find . | grep "file_for_print" | xargs echo
and
find . -name "file_for_print*"
except that the second one will not match filenames like this_is_not_the_file_for_print, and it will print the filenames one per line. It will also be a lot faster, because it doesn't need to generate and print the entire recursive directory structure just in order for grep to toss most of it away.
find . -name "file_for_print*"
is actually exactly the same as
find . -name "file_for_print*" -print
where the -print action prints each matched filename followed by a newline. If you don't provide find with any actions, it assumes you wanted -print. But it has more tricks up its sleeve than that. For example:
find . -name "file_for_print*" -exec cat {} \;
The -exec action causes find to execute the following command, up to the \;, replacing {} with each matching file name.
find does not limit itself to a single action. You can tell it to do however many you want. So:
find . -name "file_for_print*" -print -exec cat {} \;
will probably do pretty much what you want.
For lots more information on this very useful utility, type:
man find
or
info find
and read all about it.
Since it's not been said yet: -I % tells xargs to replace '%' with the arguments in the command you give it. The sh -c '...' just means run the commands '...' in a new shell.
So
xargs -I % sh -c 'echo %; cat %;'
will run echo [filename] followed by cat [filename] for every filename given to xargs. The echo and cat commands will be executed inside a different shell process but this usually doesn't matter. Your version didn't work because it was missing the % signs inside the command passed to xargs.
For what it's worth I would use this command to achieve the same thing:
find -name "*file_for_print*" | parallel 'echo {}; cat {};'
because it's simpler (parallel automatically uses {} as the substitution character and can take multiple commands by default).
In this specific case, each command is executed for each individual file anyway, so there's no advantage in using xargs. You may just append -exec twice to your 'find':
find . -name "*file_for_print*" -exec echo {} \; -exec cat {} \;
In this case, -print could be used instead of the first echo, as pointed out by rici, but this example shows the ability to execute two arbitrary commands with a single find.
What about writing your own bash function?
#!/bin/bash
myFunction() {
    while read -r file; do
        echo "$file"
        cat "$file"
    done
}
find . -name "file_for_print*" | myFunction

Find, grep, and execute - all in one?

This is the command I've been using for finding matches (queryString) in php files, in the current directory, with grep, case insensitive, and showing matching results in line:
find . -iname "*php" -exec grep -iH queryString {} \;
Is there a way to also pipe just the file name of the matches to another script?
I could probably run the -exec command twice, but that seems inefficient.
What I'd love to do on Mac OS X is then actually to "reveal" that file in the finder. I think I can handle that part. If I had to give up the inline matches and just let grep show the files names, and then pipe that to a third script, that would be fine, too - I would settle.
But I'm actually not even sure how to pipe the output (the matched file names) to somewhere else...
Help! :)
Clarification
I'd like to reveal each of the files in a Finder window - so I'm probably not going to use the -q flag and stop at the first one.
I'm going to run this in the console, ideally I'd like to see the inline matches printed out there, as well as being able to pipe them to another script, like oascript (applescript, to reveal them in the finder). That's why I have been using -H - because I like to see both the file name and the match.
If I had to settle for just using -l so that the file name could more easily be piped to another script, that would be OK, too. But I think after looking at the reply below from @Charlie Martin, that xargs could be helpful here in doing both at the same time with a single find and a single grep command.
I did say bash, but I don't really mind if this needs to be run as /bin/sh instead - I don't know too much about the differences yet, but I do know there are some important ones.
Thank you all for the responses, I'm going to try some of them at the command line and see if I can get any of them to work and then I think I can choose the best answer. Leave a comment if you want me to clarify anything more.
Thanks again!
You bet. The usual thing is something like
$ find /path -name pattern -print | xargs command
So you might for example do
$ find . -name '*.[ch]' -print | xargs grep -H 'main'
(Quiz: why -H?)
You can carry this on further; for example, you might use
$ find . -name '*.[ch]' -print | xargs grep -H 'main' | cut -d ':' -f 1
to get the vector of file names for files that contain 'main', or
$ find . -name '*.[ch]' -print | xargs grep -H 'main' | cut -d ':' -f 1 |
xargs growlnotify -
to have each name become a Growl notification.
You could also do
$ grep pattern `find /path -name pattern`
or
$ grep pattern $(find /path -name pattern)
(in bash(1) at least these are equivalent) but you can run into limits on the length of a command line that way.
Update
To answer your questions:
(1) You can do anything in bash you can do in sh. The one thing I've mentioned that would be any different is the use of $(command) in place of using backticks around command, and that works in the version of sh on Macs. The csh, zsh, ash, and fish are different.
(2) I think merely doing $ open $(dirname arg) will open a Finder window on the containing directory.
It sounds like you want to open all *.php files that contain querystring from within a Terminal.app session.
You could do it this way:
find . -name '*.php' -exec grep -li 'querystring' {} \; | xargs open
With my setup, this opens MacVim with each file on a separate tab. YMMV.
Replace -H with -l and you will get a list of those filenames that matched the pattern.
if you have bash4, simply do
grep pattern /path/**/*.php
the ** operator is like
grep pattern `find -name \*.php -print`
find /home/aaronmcdaid/Code/ -name '*.cpp' -exec grep -q -iH boost {} \; -exec echo {} \;
The first change I made is to add -q to your grep command. This is "Exit immediately with zero status if any match is found".
The good news is that this speeds up grep when a file has many matching lines; you don't care how many matches there are. But that means we need another -exec on the end to actually print the filenames when grep has been successful.
The grep result will be sent to stdout, so another -exec predicate is probably the best solution here.
Pipe to another script:
find . -iname "*.php" | myScript
File names will come into the stdin of myScript 1 line at a time.
You can also use xargs to form/execute commands to act on each file:
find . -iname "*.php" | xargs ls -l
act on files you find that match:
find . -iname "*.php" | xargs grep -l pattern | myScript
act on files that don't match the pattern:
find . -iname "*.php" | xargs grep -L pattern | myScript
In general, using multiple -execs with grep -q will be FAR faster than piping, since find implies a short-circuiting -a (AND) between each juxtaposed pair of expressions that isn't separated by an explicit operator. The main problem here is that you want something to happen if grep matches something AND for the matches to be printed. If the files are reasonably sized then this should be faster (because grep -q exits after finding a single match):
find . -iname "*php" -exec grep -iq queryString {} \; -exec grep -iH queryString {} \; -exec otherprogram {} \;
If the files are particularly big, encapsulating it in a shell script may be faster than running multiple grep commands:
find . -iname "*php" -exec bash -c \
'out=$(grep -iH queryString "$1"); [[ -n $out ]] && echo "$out" && exit 0 || exit 1' \
bash {} \; -print
Also note, if the matches are not particularly needed, then
find . -iname "*php" -exec grep -iq queryString {} \; -exec otherprogram {} \;
Will virtually always be faster than then a piped solution like
find . -iname "*php" -print0 | xargs -0 grep -iH | ...
Additionally, you should really have -type f in all cases, unless you want to catch *php directories
Regarding the question of which is faster: if you actually care about the minuscule time difference (which you might, if you are trying to save your processor some work), prefix each command with time and see which one performs better.
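For example, something along these lines (a sketch; bash's time keyword times the whole pipeline, and queryString stands in for your real pattern):
time find . -iname "*php" -type f -exec grep -iq queryString {} \; -exec grep -iH queryString {} \; > /dev/null
time find . -iname "*php" -type f -print0 | xargs -0 grep -iH queryString > /dev/null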

Why does the wc utility generate multiple lines with "total"?

I am using the wc utility in a shell script that I run from Cygwin, and I noticed that there is more than one line with "total" in its output.
The following function is used to count the number of lines in my source files:
count_curdir_src() {
find . '(' -name '*.vb' -o -name '*.cs' ')' \
-a '!' -iname '*.Designer.*' -a '!' -iname '.svn' -print0 | \
xargs -0 wc -l
}
But its output for a certain directory looks like this:
$ find . '(' -name '*.vb' -o -name '*.cs' ')' -a '!' -iname '*.Designer.*' -a '!' -iname '.svn' -print0 | xargs -0 wc -l
19 ./dirA/fileABC.cs
640 ./dirA/subdir1/fileDEF.cs
507 ./dirA/subdir1/fileGHI.cs
2596 ./dirA/subdir1/fileJKL.cs
(...many others...)
58 ./dirB/fileMNO.cs
36 ./dirB/subdir1/filePQR.cs
122200 total
6022 ./dirB/subdir2/subsubdir/fileSTU.cs
24 ./dirC/fileVWX.cs
(...)
36 ./dirZ/Properties/AssemblyInfo.cs
88 ./dirZ/fileYZ.cs
25236 total
It looks like wc resets somewhere in the process. It cannot be caused by space characters in filenames or directory names, because I use the -print0 option. And it only happens when I run it on my largest source tree.
So, is this a bug in wc, or in Cygwin? Or something else? The wc manpage says:
Print newline, word, and byte counts
for each FILE, and a total line if
more than one FILE is specified.
It doesn't mention anything about multiple total lines (intermediate total counts or something), so who's to blame here?
What's happening is that xargs is running wc multiple times. xargs by default batches as many arguments as it thinks it can into each invocation of the command it's supposed to run, but if there are too many files it will run the command multiple times on subsets of the files.
There are a couple ways I see to fix this. The first, which will break if you have too many files, is to skip xargs and use the shell. This may not work well on Cygwin, but would look like this:
wc -l $(find . '(' -name '*.vb' -o -name '*.cs' ')' \
-a '!' -iname '*.Designer.*' -a '!' -iname '.svn' )
and you also lose the print0 capabilities.
The other is to use an awk (or perl) script to process the output of your find/xargs combo, skip "total" lines, and sum up the total yourself.
You're calling wc multiple times - once for each "batch" of input arguments provided by xargs. You're getting one total per batch.
One alternative is to use a temporary file and the --files0-from option for wc:
$ find . '(' -name '*.vb' -o -name '*.cs' ')' -a '!' -iname '*.Designer.*' -a
'!' -iname '.svn' -print0 > files
$ wc --files0-from files
The command-line length is much more limited under cygwin than on a standard linux box, and xargs must split the input to respect those limits. You can check the limits with xargs --show-limits:
On cygwin:
$ xargs --show-limits < /dev/null
Your environment variables take up 4913 bytes
POSIX upper limit on argument length (this system): 25039
POSIX smallest allowable upper limit on argument length (all systems): 4096
Maximum length of command we could actually use: 20126
Size of command buffer we are actually using: 25039
On centos:
$ xargs --show-limits < /dev/null
Your environment variables take up 1816 bytes
POSIX upper limit on argument length (this system): 2617576
POSIX smallest allowable upper limit on argument length (all systems): 4096
Maximum length of command we could actually use: 2615760
Size of command buffer we are actually using: 131072
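A rough way to see how many batches (and therefore how many separate wc invocations and "total" lines) xargs will produce is to have it run echo instead of wc and count the resulting lines; this is only an approximation, since echo's command line is slightly shorter than wc's:
find . '(' -name '*.vb' -o -name '*.cs' ')' -a '!' -iname '*.Designer.*' -print0 | xargs -0 echo | wc -l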
And to build on @JonSkeet's answer, you don't need to create an additional file; you can pipe your find results directly to wc by passing - as the argument to --files0-from:
find . -name '*.vb' -print0 | wc -l --files0-from=-
To avoid generation of multiple lines with "total" counts when feeding the wc utility with an enormous number of file paths as command line arguments, you can use an intermediate xargs to cat the contents of files to the stdin of wc (see piping output of find to xargs wc gives unreasonable totals).
This is a workaround if your wc command does not have the --files0-from as mentioned by Xavier.
count_curdir_src() (
    export LC_ALL=C
    find . -name '*.vb' -print0 | xargs -0 -n 1000 cat | wc -l
)

How can I count all the lines of code in a directory recursively?

We've got a PHP application and want to count all the lines of code under a specific directory and its subdirectories.
We don't need to ignore comments, as we're just trying to get a rough idea.
wc -l *.php
That command works great for a given directory, but it ignores subdirectories. I was thinking the following command might work, but it is returning 74, which is definitely not the case...
find . -name '*.php' | wc -l
What's the correct syntax to feed in all the files from a directory recursively?
Try:
find . -name '*.php' | xargs wc -l
or (when file names include special characters such as spaces)
find . -name '*.php' | sed 's/.*/"&"/' | xargs wc -l
The SLOCCount tool may help as well.
It will give an accurate source lines of code count for whatever
hierarchy you point it at, as well as some additional stats.
Sorted output:
find . -name '*.php' | xargs wc -l | sort -nr
For another one-liner:
( find ./ -name '*.php' -print0 | xargs -0 cat ) | wc -l
It works on names with spaces and only outputs one number.
You can use the cloc utility, which is built for this exact purpose. It reports the number of lines in each language, together with how many of them are comments, etc. cloc is available on Linux, Mac and Windows.
Usage and output example:
$ cloc --exclude-lang=DTD,Lua,make,Python .
2570 text files.
2200 unique files.
8654 files ignored.
http://cloc.sourceforge.net v 1.53 T=8.0 s (202.4 files/s, 99198.6 lines/s)
-------------------------------------------------------------------------------
Language files blank comment code
-------------------------------------------------------------------------------
JavaScript 1506 77848 212000 366495
CSS 56 9671 20147 87695
HTML 51 1409 151 7480
XML 6 3088 1383 6222
-------------------------------------------------------------------------------
SUM: 1619 92016 233681 467892
-------------------------------------------------------------------------------
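Since the question is about PHP only, you can probably narrow the report down to that one language (a sketch, assuming a cloc version that supports --include-lang):
cloc --include-lang=PHP .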
If using a decently recent version of Bash (or ZSH), it's much simpler:
wc -l **/*.php
In the Bash shell this requires the globstar option to be set, otherwise the ** glob-operator is not recursive. To enable this setting, issue
shopt -s globstar
To make this permanent, add it to one of the initialization files (~/.bashrc, ~/.bash_profile etc.).
On Unix-like systems, there is a tool called cloc which provides code statistics.
I ran it on a random directory in our code base and it says:
59 text files.
56 unique files.
5 files ignored.
http://cloc.sourceforge.net v 1.53 T=0.5 s (108.0 files/s, 50180.0 lines/s)
-------------------------------------------------------------------------------
Language files blank comment code
-------------------------------------------------------------------------------
C 36 3060 1431 16359
C/C++ Header 16 689 393 3032
make 1 17 9 54
Teamcenter def 1 10 0 36
-------------------------------------------------------------------------------
SUM: 54 3776 1833 19481
-------------------------------------------------------------------------------
You didn't specify how many files there are or what the desired output should be.
This may be what you are looking for:
find . -name '*.php' | xargs wc -l
Yet another variation :)
$ find . -name '*.php' | xargs cat | wc -l
This will give the total sum, instead of file-by-file.
Note: the . after find is required on systems (such as BSD/macOS) where find does not default to the current directory.
Use find's -exec and awk. Here we go:
find . -type f -exec wc -l {} \; | awk '{ SUM += $0} END { print SUM }'
This snippet searches all files (-type f). To filter by file extension, use -name:
find . -name '*.py' -exec wc -l '{}' \; | awk '{ SUM += $0; } END { print SUM; }'
A more general and simple approach, in case you need to count files with several different extensions (say, native sources too):
wc $(find . -type f | egrep "\.(h|c|cpp|php|cc)$")
The tool Tokei displays statistics about code in a directory. Tokei will show the number of files, total lines within those files and code, comments, and blanks grouped by language. Tokei is also available on Mac, Linux, and Windows.
An example of the output of Tokei is as follows:
$ tokei
-------------------------------------------------------------------------------
Language Files Lines Code Comments Blanks
-------------------------------------------------------------------------------
CSS 2 12 12 0 0
JavaScript 1 435 404 0 31
JSON 3 178 178 0 0
Markdown 1 9 9 0 0
Rust 10 408 259 84 65
TOML 3 69 41 17 11
YAML 1 30 25 0 5
-------------------------------------------------------------------------------
Total 21 1141 928 101 112
-------------------------------------------------------------------------------
Tokei can be installed by following the instructions on the README file in the repository.
POSIX
Unlike most other answers here, these work on any POSIX system, for any number of files, and with any file names (except where noted).
Lines in each file:
find . -name '*.php' -type f -exec wc -l {} \;
# faster, but includes total at end if there are multiple files
find . -name '*.php' -type f -exec wc -l {} +
Lines in each file, sorted by file path
find . -name '*.php' -type f | sort | xargs -L1 wc -l
# for files with spaces or newlines, use the non-standard sort -z
find . -name '*.php' -type f -print0 | sort -z | xargs -0 -L1 wc -l
Lines in each file, sorted by number of lines, descending
find . -name '*.php' -type f -exec wc -l {} \; | sort -nr
# faster, but includes total at end if there are multiple files
find . -name '*.php' -type f -exec wc -l {} + | sort -nr
Total lines in all files
find . -name '*.php' -type f -exec cat {} + | wc -l
There is a little tool called sloccount to count the lines of code in a directory.
It should be noted that it does more than you want as it ignores empty lines/comments, groups the results per programming language and calculates some statistics.
You want a simple for loop:
total_count=0
for file in $(find . -name '*.php' -print)
do
    count=$(wc -l < "$file")
    let total_count+=count
done
echo "$total_count"
For source files only:
wc `find . -type f`
To filter, just use grep:
wc `find . -type f | grep '\.php$'`
A straightforward one that will be fast, will use all the search/filtering power of find, won't fail when there are too many files (argument list overflow), works fine with files with funny symbols in their names, doesn't use xargs, and won't launch a uselessly high number of external commands (thanks to + for find's -exec). Here you go:
find . -name '*.php' -type f -exec cat -- {} + | wc -l
None of the answers so far gets at the problem of filenames with spaces.
Additionally, all that use xargs are subject to fail if the total length of paths in the tree exceeds the shell environment size limit (defaults to a few megabytes in Linux).
Here is one that fixes these problems in a pretty direct manner. The subshell takes care of files with spaces. The awk totals the stream of individual file wc outputs, so it ought never to run out of space. It also restricts the exec to files only (skipping directories):
find . -type f -name '*.php' -exec bash -c 'wc -l "$0"' {} \; | awk '{s+=$1} END {print s}'
I know the question is tagged as bash, but it seems that the problem you're trying to solve is also PHP related.
Sebastian Bergmann wrote a tool called PHPLOC that does what you want and on top of that provides you with an overview of a project's complexity. This is an example of its report:
Size
Lines of Code (LOC) 29047
Comment Lines of Code (CLOC) 14022 (48.27%)
Non-Comment Lines of Code (NCLOC) 15025 (51.73%)
Logical Lines of Code (LLOC) 3484 (11.99%)
Classes 3314 (95.12%)
Average Class Length 29
Average Method Length 4
Functions 153 (4.39%)
Average Function Length 1
Not in classes or functions 17 (0.49%)
Complexity
Cyclomatic Complexity / LLOC 0.51
Cyclomatic Complexity / Number of Methods 3.37
As you can see, the information provided is a lot more useful from the perspective of a developer, because it can roughly tell you how complex a project is before you start working with it.
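A typical invocation is simply the directory to analyse (a sketch, assuming phploc is installed, for example as a PHAR or via Composer; ./src is just a placeholder for your source directory):
phploc ./src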
If you want to keep it simple, cut out the middleman and just call wc with all the filenames:
wc -l `find . -name "*.php"`
Or in the modern syntax:
wc -l $(find . -name "*.php")
This works as long as there are no spaces in any of the directory names or filenames. And as long as you don't have tens of thousands of files (modern shells support really long command lines). Your project has 74 files, so you've got plenty of room to grow.
wc -l? Better use grep -c ^
wc -l? Wrong!
The wc command counts newline characters, not lines! When the last line in the file does not end with a newline character, it will not be counted!
If you still want to count lines, use grep -c ^. Full example:
# This example prints the line count for every file found
total=0
# feed the loop with process substitution so $total survives the loop
# (piping find into while would run the loop in a subshell)
while read -r FILE; do
    # You see, use 'grep' instead of 'wc' for proper counting
    count=$(grep -c ^ < "$FILE")
    echo "$FILE has $count lines"
    let total=total+count # in bash; you can adapt this for another shell
done < <(find /path -type f -name "*.php")
echo "TOTAL LINES COUNTED: $total"
Finally, watch out for the wc -l trap (it counts newlines, not lines!).
Giving out the longest files first (ie. maybe these long files need some refactoring love?), and excluding some vendor directories:
find . -name '*.php' | xargs wc -l | sort -nr | egrep -v "libs|tmp|tests|vendor" | less
For Windows, an easy-and-quick tool is LocMetrics.
You can use a utility called codel (link). It's a simple Python module to count lines with colorful formatting.
Installation
pip install codel
Usage
To count lines of C++ files (with .cpp and .h extensions), use:
codel count -e .cpp .h
You can also ignore some files/folder with the .gitignore format:
codel count -e .py -i tests/**
It will ignore all the files in the tests/ folder.
The output is a colorful per-file summary grouped by extension. You can also shorten it with the -s flag, which hides the per-file information and shows only the totals for each extension.
If you want your results sorted by number of lines, you can just add | sort or | sort -r (-r for descending order) to the first answer, like so:
find . -name '*.php' | xargs wc -l | sort -r
If the files are too many, better to just look for the total line count.
find . -name '*.php' | xargs wc -l | grep -i ' total' | awk '{print $1}'
Very simply:
find /path -type f -name "*.php" | while read -r FILE
do
    count=$(wc -l < "$FILE")
    echo "$FILE has $count lines"
done
Something different:
wc -l `tree -if --noreport | grep -e'\.php$'`
This works out fine, but you need to have at least one *.php file in the current folder or one of its subfolders, or else wc stalls.
It’s very easy with Z shell (zsh) globs:
wc -l ./**/*.php
If you are using Bash, you just need to upgrade. There is absolutely no reason to use Bash.
On OS X at least, the find+xargs+wc commands listed in some of the other answers print "total" several times on large listings, and no complete total is given. I was able to get a single total for .c files using the following command:
find . -name '*.c' -print0 |xargs -0 wc -l|grep -v total|awk '{ sum += $1; } END { print "SUM: " sum; }'
If you need just the total number of lines in, let's say, your PHP files, you can use very simple one line command even under Windows if you have GnuWin32 installed. Like this:
cat `/gnuwin32/bin/find.exe . -name *.php` | wc -l
You need to specify exactly where find.exe is, otherwise the Windows-provided FIND.EXE (from the old DOS-like commands) will be executed, since it probably comes before GnuWin32 in the PATH environment variable and has different parameters and results.
Please note that in the command above you should use back-quotes, not single quotes.
While I like the scripts, I prefer this one as it also shows a per-file summary as well as a total:
wc -l `find . -name "*.php"`
