total size of group of files selected with 'find' - shell

For instance, I have a large filesystem that is filling up faster than I expected. So I look for what's being added:
find /rapidly_shrinking_drive/ -type f -mtime -1 -ls | less
And I find, well, lots of stuff. Thousands of files of six or seven types. I can single out a type and count them:
find /rapidly_shrinking_drive/ -name "*offender1*" -mtime -1 -ls | wc -l
but what I'd really like is to be able to get the total size on disk of these files:
find /rapidly_shrinking_drive/ -name "*offender1*" -mtime -1 | howmuchspace
I'm open to a Perl one-liner for this, if someone's got one, but I'm not going to use any solution that involves a multi-line script, or File::Find.

The command du tells you about disk usage. Example usage for your specific case:
find /rapidly_shrinking_drive/ -name "*offender1*" -mtime -1 -print0 | du --files0-from=- -hc | tail -n1
(Previously I wrote du -hs, but on my machine that appears to disregard find's input and instead summarises the size of the cwd.)

Darn, Stephan202 is right. I didn't think about du -s (summarize), so instead I used awk to total du's output. (Note that du won't read file names from a pipe, so the list has to reach it as arguments, e.g. via xargs:)
find /rapidly_shrinking_drive/ -name "*offender1*" -mtime -1 -print0 | xargs -0 du | awk '{total+=$1} END{print total}'
I like the other answer better though, and it's almost certainly more efficient.

With GNU find:
find /path -name "offender" -printf "%s\n" | awk '{t+=$1}END{print t}'
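If you'd rather have that total human-readable, GNU coreutils ships a numfmt tool that can format the byte count (a sketch, assuming a coreutils recent enough to include numfmt):
find /path -name "offender" -printf "%s\n" | awk '{t+=$1}END{print t}' | numfmt --to=iec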

I'd like to promote jason's comment above to the status of an answer, because I believe it's the most mnemonic (though not the most general, if you really need the file list to come from find):
$ du -hs *.nc
6.1M foo.nc
280K foo_region_N2O.nc
8.0K foo_region_PS.nc
844K foo_region_xyz.nc
844K foo_region_z.nc
37M ETOPO1_Ice_g_gmt4.grd_region_zS.nc
$ du -ch *.nc | tail -n 1
45M total
$ du -cb *.nc | tail -n 1
47033368 total

Recently I faced (almost) the same problem and came up with this solution.
find "$path" -type f -printf '%s '
It'll show file sizes in bytes; from man find:
-printf format
       True; print format on the standard output, interpreting `\' escapes and `%' directives. Field widths and precisions can be specified as with the `printf' C function. Please note that many of the fields are printed as %s rather than %d, and this may mean that flags don't work as you might expect. This also means that the `-' flag does work (it forces fields to be left-aligned). Unlike -print, -printf does not add a newline at the end of the string.
       ...
       %s     File's size in bytes.
       ...
And to get a total I used this:
echo $[ $(find "$path" -type f -printf %s+)0 ] # bytes
echo $[ ($(find "$path" -type f -printf %s+)0)/1024 ] # KiB
echo $[ ($(find "$path" -type f -printf %s+)0)/1024/1024 ] # MiB
echo $[ ($(find "$path" -type f -printf %s+)0)/1024/1024/1024 ] # GiB
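For what it's worth, $[ ... ] is deprecated bash syntax; the same trick written with the standard $(( ... )) arithmetic expansion would look like this (a sketch, assuming bash):
echo $(( $(find "$path" -type f -printf '%s+')0 ))                   # bytes
echo $(( ( $(find "$path" -type f -printf '%s+')0 ) / 1024 / 1024 )) # MiB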

I have tried all these commands, but no luck.
So I found this one, which gives me an answer:
find . -type f -mtime -30 -exec ls -l {} \; | awk '{ s+=$5 } END { print s }'
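If your find supports -exec ... +, a variant of the same idea batches many files into each ls call instead of starting one process per file, which should be noticeably faster on large trees:
find . -type f -mtime -30 -exec ls -l {} + | awk '{ s+=$5 } END { print s }'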

Since OP specifically said:
I'm open to a Perl one-liner for this, if someone's got one, but I'm
not going to use any solution that involves a multi-line script, or
File::Find.
...and there's none yet, here is the Perl one-liner:
find . -name "*offender1*" | perl -lne '$total += -s $_; END { print $total }'
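If some file names might contain newlines, a null-separated sketch of the same one-liner (still a single line, using find's -print0 and perl's -0 switch) would be:
find . -name "*offender1*" -print0 | perl -0ne 'chomp; $total += -s $_; END { print "$total\n" }'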

You could also use ls -l to list the sizes, then awk to extract and total the size column (note that ls won't read file names from a pipe, so they have to be passed as arguments, and the old "| sum" would have computed a checksum, not a total):
find /rapidly_shrinking_drive/ -name "*offender1*" -mtime -1 -print0 | xargs -0 ls -l | awk '{total+=$5} END{print total}'

Related

How to tell find command to escape space characters in file names?

I have a single-line find command which recursively checks and prints out the size, owner and name of files of a specific type created in a specific time frame. But in the result, the filename column is cut off at the first space character in the directory or file name.
Any idea how to fix this problem within this single command, without writing any loop in bash? Thanks!
here is the command:
find /path/to/dist -type f -iname "*.png" -newermt 2015-01-01 ! -newermt 2016-12-31 -ls | sort -k5 | sort -k5 | awk '{print $5"\t"$7"\t"$11}'
Try changing your awk command to this :
awk '{$1=$2=$3=$4=$6=$8=$9=$10="" ; print $0}'
So that the whole command becomes this :
find /path/to/dist -type f -iname "*.png" -newermt 2015-01-01 ! -newermt 2016-12-31 -ls | sort -k5 | awk '{$1=$2=$3=$4=$6=$8=$9=$10="" ; print $0}'
This leaves some extra spaces at the beginning of the line, hopefully it works for your purpose.
I have removed the second instance of sort, as it sorts on the same key as the first, which does not seem likely to do anything.
Well, thanks to Arno's input, the following line does the job. I used exec (-exec ls -lh {} \;) to make the size human-readable:
find /Path/To/Dest/ -type f -iname "*.pdf" -newermt 2015-01-01 ! -newermt 2016-12-31 -exec ls -lh {} \; | sed 's/\\ /!!!/g' | sort -k5 | awk '{gsub("!!!"," ",$9); print $3"\t"$5"\t"$9}'
I found the following solution: hide the spaces in the file names. I did it with sed, using an unlikely marker string ("!!!") in place of the escaped spaces ("\ "), then restoring them in the awk command. Here is the command I used for my tests:
find . -type f -iname "*.pdf" -newermt 2015-01-01 -ls |sed 's/\\ /!!!/g' | sort -k5 | awk '{gsub("!!!"," ",$11);print $5"\t"$7"\t"$11}'
The -print0 action of find is probably the starting point. From find's manual page:
-print0
       True; print the full file name on the standard output, followed by a null character (instead of the newline character that -print uses). This allows file names that contain newlines or other types of white space to be correctly interpreted by programs that process the find output. This option corresponds to the -0 option of xargs.
But find has the nice -printf action, which is even better:
find /path/to/dist -type f -iname "*.png" -newermt 2015-01-01 ! -newermt 2016-12-31 -printf "%u\t%s\t%p\n" | sort
probably does the job.
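And if you'd rather have the listing ordered by size than by owner, you can sort numerically on the second tab-separated field (a sketch; the $'\t' separator syntax assumes bash or another shell with ANSI-C quoting):
find /path/to/dist -type f -iname "*.png" -newermt 2015-01-01 ! -newermt 2016-12-31 -printf "%u\t%s\t%p\n" | sort -t $'\t' -k2,2n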

BASH script: list all files including subdirectories and sort them by date

I have a bash script:
for entry in "/home/pictures"/*
do
echo "ls -larth $entry"
done
I also want to list the files in subfolders, including their paths, and sort the results by date.
It must be a bash script, because some other software (Jenkins) will call it.
Try find.
find /home/pictures -type f -exec ls -l --full-time {} \; | sort -k 6
If there are no newlines in file names, use:
find /home/pictures -type f -printf '%T@ %p\n' | sort -n
If you don't want timestamps in the output, use:
find /home/pictures -type f -printf '%28T@ %p\n' | sort -n | cut -c30-
If there is a possibility of newlines in file names, and if you can make the program that consumes the output accept null-terminated records, you can use:
find /home/pictures -type f -printf '%T@,%p\0' | sort -nz
For no timestamps in output, use:
find /home/pictures -type f -printf '%28T@ %p\0' | sort -nz | cut -zc30-
P.S.
I have assumed that you want to sort by last modification time.
I found the solution to my question:
find . -name '*' -exec ls -larth {} +
(Note that ls sorts only within each batch of arguments find hands it; for very large trees the -printf | sort approaches above are more reliable.)

find folders with executable files

I wrote a script to find all folders that contain executable files. I was first looking for a one-liner command but couldn't find one (I especially tried to use sort -k -u). The script works fine, but my initial question remains: is there a one-liner command to do that?
#! /bin/bash
find "$1" -type d | while read -r Path
do
    X=$(ls -l "$Path" | grep '^-rwx' | wc -l)
    if ((X>0))
    then
        echo "$Path"
    fi
done
Using find:
find "$1" -type f -perm /111 -exec dirname {} \; | sort -u
This finds all files with any execute bit set (-perm /111 matches if user, group, or other has execute permission), then outputs only the directory name. To avoid duplicates, sort -u is used.
As pointed out by Paulo Almeida in the comments, this would also work:
find "$1" -type f -perm /111 -printf "%h\n" | sort -u
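And if directory names may contain newlines, a null-terminated sketch of the same command (this relies on GNU find and GNU sort, so it is not POSIX) would be:
find "$1" -type f -perm /111 -printf '%h\0' | sort -zu | tr '\0' '\n'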

Use find, wc, and sed to count lines

I was trying to use sed to count all the lines based on a particular extension.
find -name '*.m' -exec wc -l {} \; | sed ...
I was trying to do the following; how would I include sed in this line to get the total?
You may also get the nice formatting from wc with:
wc `find -name '*.m'`
Most of the answers here won't work well for a large number of files. Some will break if the list of file names is too long for a single command-line call, while others are inefficient because -exec starts a new process for every file. I believe a robust and efficient solution would be:
find . -type f -name "*.m" -print0 | xargs -0 cat | wc -l
Using cat in this way is fine, as its output is piped straight into wc so only a small amount of the files' content is kept in memory at once. If there are too many files for a single invocation of cat, cat will be called multiple times, but all the output will still be piped into a single wc process.
You can cat all files through a single wc instance to get the total number of lines:
find . -name '*.m' -exec cat {} \; | wc -l
On modern GNU systems, find has -print0 and wc has --files0-from; combined, they give a command that counts lines in the files, with a total at the end. Example:
find . -name '*.c' -type f -print0 | wc -l --files0-from=-
You could also use sed in place of wc for counting lines:
find . -name '*.m' -exec sed -n '$=' {} \;
where $ addresses the last line and = prints its line number, so this prints each file's line count
EDIT: you could also try something like sloccount
Hm, the solution with cat may be problematic if you have many files, especially big ones.
The second solution doesn't give a total, just lines per file, as I tested.
I'd prefer something like this:
find . -name '*.m' | xargs wc -l | tail -1
This does the job fast no matter how big the files are, but beware: if there are so many files that xargs splits them across several wc invocations, tail -1 only shows the total of the last batch.
sed is not the proper tool for counting. Use awk instead:
find . -name '*.m' -exec awk 'END {print NR}' {} +
Using + instead of \; makes find pass many files to each awk invocation (as xargs would); since NR accumulates across a batch, this prints one subtotal per batch rather than one count per file.
For big directories we should use:
find . -type f -name '*.m' -exec sed -n '$=' '{}' + 2>/dev/null | awk '{ total+=$1 }END{print total}'
# alternative using awk twice
find . -type f -name '*.m' -exec awk 'END {print NR}' '{}' + 2>/dev/null | awk '{ total+=$1 }END{print total}'

How can I count all the lines of code in a directory recursively?

We've got a PHP application and want to count all the lines of code under a specific directory and its subdirectories.
We don't need to ignore comments, as we're just trying to get a rough idea.
wc -l *.php
That command works great for a given directory, but it ignores subdirectories. I was thinking the following command might work, but it is returning 74, which is definitely not the case (it turns out to be the number of files, not lines)...
find . -name '*.php' | wc -l
What's the correct syntax to feed in all the files from a directory recursively?
Try:
find . -name '*.php' | xargs wc -l
or (when file names include special characters such as spaces)
find . -name '*.php' | sed 's/.*/"&"/' | xargs wc -l
The SLOCCount tool may help as well.
It will give an accurate source lines of code count for whatever
hierarchy you point it at, as well as some additional stats.
Sorted output:
find . -name '*.php' | xargs wc -l | sort -nr
For another one-liner:
( find ./ -name '*.php' -print0 | xargs -0 cat ) | wc -l
It works on names with spaces and only outputs one number.
You can use the cloc utility, which is built for this exact purpose. It reports the number of lines in each language, together with how many of them are comments, etc. cloc is available on Linux, Mac and Windows.
Usage and output example:
$ cloc --exclude-lang=DTD,Lua,make,Python .
    2570 text files.
    2200 unique files.
    8654 files ignored.

http://cloc.sourceforge.net v 1.53  T=8.0 s (202.4 files/s, 99198.6 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
JavaScript                    1506          77848         212000         366495
CSS                             56           9671          20147          87695
HTML                            51           1409            151           7480
XML                              6           3088           1383           6222
-------------------------------------------------------------------------------
SUM:                          1619          92016         233681         467892
-------------------------------------------------------------------------------
If using a decently recent version of Bash (or ZSH), it's much simpler:
wc -l **/*.php
In the Bash shell this requires the globstar option to be set, otherwise the ** glob-operator is not recursive. To enable this setting, issue
shopt -s globstar
To make this permanent, add it to one of the initialization files (~/.bashrc, ~/.bash_profile etc.).
On Unix-like systems, there is a tool called cloc which provides code statistics.
I ran it on a random directory in our code base; it says:
      59 text files.
      56 unique files.
       5 files ignored.

http://cloc.sourceforge.net v 1.53  T=0.5 s (108.0 files/s, 50180.0 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C                               36           3060           1431          16359
C/C++ Header                    16            689            393           3032
make                             1             17              9             54
Teamcenter def                   1             10              0             36
-------------------------------------------------------------------------------
SUM:                            54           3776           1833          19481
-------------------------------------------------------------------------------
You didn't specify how many files there are, or what the desired output is.
This may be what you are looking for:
find . -name '*.php' | xargs wc -l
Yet another variation :)
$ find . -name '*.php' | xargs cat | wc -l
This will give the total sum, instead of file-by-file.
Use find's -exec and awk. Here we go:
find . -type f -exec wc -l {} \; | awk '{ SUM += $1 } END { print SUM }'
This snippet counts lines in all regular files (-type f). To restrict the count to a file extension, use -name:
find . -name '*.py' -exec wc -l '{}' \; | awk '{ SUM += $1; } END { print SUM; }'
More generally, if you need to count lines across files with several different extensions (say, native sources too):
wc $(find . -type f | egrep '\.(h|c|cpp|php|cc)$')
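The same multi-extension filter can also be expressed inside find itself, which avoids the unquoted command substitution and copes with odd file names (a sketch using -o between the -name tests):
find . -type f \( -name '*.h' -o -name '*.c' -o -name '*.cpp' -o -name '*.php' -o -name '*.cc' \) -exec wc -l {} +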
The tool Tokei displays statistics about code in a directory. Tokei will show the number of files, total lines within those files and code, comments, and blanks grouped by language. Tokei is also available on Mac, Linux, and Windows.
An example of the output of Tokei is as follows:
$ tokei
-------------------------------------------------------------------------------
 Language            Files        Lines         Code     Comments       Blanks
-------------------------------------------------------------------------------
 CSS                     2           12           12            0            0
 JavaScript              1          435          404            0           31
 JSON                    3          178          178            0            0
 Markdown                1            9            9            0            0
 Rust                   10          408          259           84           65
 TOML                    3           69           41           17           11
 YAML                    1           30           25            0            5
-------------------------------------------------------------------------------
 Total                  21         1141          928          101          112
-------------------------------------------------------------------------------
Tokei can be installed by following the instructions on the README file in the repository.
POSIX
Unlike most other answers here, these work on any POSIX system, for any number of files, and with any file names (except where noted).
Lines in each file:
find . -name '*.php' -type f -exec wc -l {} \;
# faster, but includes total at end if there are multiple files
find . -name '*.php' -type f -exec wc -l {} +
Lines in each file, sorted by file path
find . -name '*.php' -type f | sort | xargs -L1 wc -l
# for files with spaces or newlines, use the non-standard sort -z
find . -name '*.php' -type f -print0 | sort -z | xargs -0 -L1 wc -l
Lines in each file, sorted by number of lines, descending
find . -name '*.php' -type f -exec wc -l {} \; | sort -nr
# faster, but includes total at end if there are multiple files
find . -name '*.php' -type f -exec wc -l {} + | sort -nr
Total lines in all files
find . -name '*.php' -type f -exec cat {} + | wc -l
There is a little tool called sloccount to count the lines of code in a directory.
It should be noted that it does more than you want as it ignores empty lines/comments, groups the results per programming language and calculates some statistics.
You want a simple for loop:
total_count=0
for file in $(find . -name '*.php' -print)
do
    count=$(wc -l < "$file")
    let total_count+=count
done
echo "$total_count"
(This still breaks on file names containing whitespace; the while read variants elsewhere on this page handle those better.)
For sources only:
wc `find`
To filter, just use grep:
wc `find | grep '\.php$'`
A straightforward one that will be fast, will use all the search/filtering power of find, won't fail when there are too many files (argument list too long), works fine with files with funny symbols in their name, avoids xargs, and doesn't launch a uselessly high number of external commands (thanks to + in find's -exec). Here you go:
find . -name '*.php' -type f -exec cat -- {} + | wc -l
None of the answers so far addresses the problem of filenames with spaces.
Additionally, those that use xargs can mislead: if the total length of paths exceeds the command-line length limit (a few megabytes on Linux), xargs splits the list across several wc runs, each of which prints its own total.
Here is one that fixes these problems in a pretty direct manner. The subshell takes care of files with spaces. The awk totals the stream of individual file wc outputs, so it ought never to run out of space. It also restricts the exec to files only (skipping directories):
find . -type f -name '*.php' -exec bash -c 'wc -l "$0"' {} \; | awk '{s+=$1} END {print s}'
I know the question is tagged as bash, but it seems that the problem you're trying to solve is also PHP-related.
Sebastian Bergmann wrote a tool called PHPLOC that does what you want, and on top of that it provides an overview of a project's complexity. This is an example of its report:
Size
  Lines of Code (LOC)                     29047
  Comment Lines of Code (CLOC)            14022 (48.27%)
  Non-Comment Lines of Code (NCLOC)       15025 (51.73%)
  Logical Lines of Code (LLOC)             3484 (11.99%)
    Classes                                3314 (95.12%)
      Average Class Length                   29
      Average Method Length                   4
    Functions                               153 (4.39%)
      Average Function Length                 1
    Not in classes or functions              17 (0.49%)

Complexity
  Cyclomatic Complexity / LLOC               0.51
  Cyclomatic Complexity / Number of Methods  3.37
As you can see, the information provided is a lot more useful from the perspective of a developer, because it can roughly tell you how complex a project is before you start working with it.
If you want to keep it simple, cut out the middleman and just call wc with all the filenames:
wc -l `find . -name "*.php"`
Or in the modern syntax:
wc -l $(find . -name "*.php")
This works as long as there are no spaces in any of the directory names or filenames. And as long as you don't have tens of thousands of files (modern shells support really long command lines). Your project has 74 files, so you've got plenty of room to grow.
wc -l? Better use grep -c ^
wc -l? Wrong!
The wc command counts newline characters, not lines! When the last line in the file does not end with a newline, it will not be counted!
If you still want to count lines, use grep -c ^. Full example:
# This example prints the line count for every file found.
# Note: the loop reads from process substitution rather than a pipe,
# so that $total is not lost in a subshell.
total=0
while read -r FILE; do
    # You see, use 'grep' instead of 'wc' for properly counting
    count=$(grep -c ^ < "$FILE")
    echo "$FILE has $count lines"
    let total=total+count  # bash arithmetic; adapt for other shells
done < <(find /path -type f -name "*.php")
echo TOTAL LINES COUNTED: $total
Finally, watch out for the wc -l trap: it counts newline characters, not lines!
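A quick way to see the difference, using a two-line file whose last line has no trailing newline:
printf 'line1\nline2' | wc -l       # prints 1: wc counts newline characters
printf 'line1\nline2' | grep -c ^   # prints 2: grep also counts the unterminated last line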
Giving the longest files first (i.e. maybe these long files need some refactoring love?), and excluding some vendor directories:
find . -name '*.php' | xargs wc -l | sort -nr | egrep -v "libs|tmp|tests|vendor" | less
For Windows, an easy-and-quick tool is LocMetrics.
You can use a utility called codel. It's a simple Python module to count lines with colorful formatting.
Installation
pip install codel
Usage
To count lines of C++ files (with .cpp and .h extensions), use:
codel count -e .cpp .h
You can also ignore some files/folders with the .gitignore format:
codel count -e .py -i tests/**
It will ignore all the files in the tests/ folder.
By default, codel prints information for each file. You can also shorten the output with the -s flag, which hides the per-file information and shows only a summary for each extension.
If you want your results sorted by number of lines, you can just add | sort -n or | sort -nr (-r for descending order) to the first answer, like so:
find . -name '*.php' | xargs wc -l | sort -nr
If there are too many files, it's better to just look for the total line count:
find . -name '*.php' | xargs wc -l | grep -i ' total' | awk '{print $1}'
Very simply:
find /path -type f -name "*.php" | while read -r FILE
do
    count=$(wc -l < "$FILE")
    echo "$FILE has $count lines"
done
Something different:
wc -l `tree -if --noreport | grep -e'\.php$'`
This works out fine, but you need to have at least one *.php file in the current folder or one of its subfolders, or else wc stalls (with no file arguments it waits for input on stdin).
It’s very easy with Z shell (zsh) globs:
wc -l ./**/*.php
If you are using Bash, you just need to upgrade. There is absolutely no reason to use Bash.
On OS X at least, the find+xargs+wc commands listed in some of the other answers print "total" several times on large listings, and no complete total is given. I was able to get a single total for .c files using the following command:
find . -name '*.c' -print0 |xargs -0 wc -l|grep -v total|awk '{ sum += $1; } END { print "SUM: " sum; }'
If you need just the total number of lines in, let's say, your PHP files, you can use a very simple one-line command even under Windows if you have GnuWin32 installed. Like this:
cat `/gnuwin32/bin/find.exe . -name *.php` | wc -l
You need to specify exactly where find.exe is; otherwise the Windows-provided FIND.EXE (from the old DOS-like commands) will be executed, since it probably comes before GnuWin32 in the PATH environment variable and has different parameters and results.
Please note that in the command above you should use back-quotes, not single quotes.
While I like the scripts, I prefer this one, as it shows a per-file summary as well as a total:
wc -l `find . -name "*.php"`
