I have an echo with two variables in it. I don't understand why it prints spaces before $NEXT even though there is just one space in the code.
NEXT=$(find "${DIR}" -type f -name "*.$ext" | sed "s/.*\/\.//g" | sed "s/.*\///g" |
sed -n '/.*\..*/p' | wc -l)
echo "Files .$ext: $NEXT"
Files .tar:        1
Your pipeline is not doing quite what you think it is:
NEXT=$(find "${DIR}" -type f -name "*.$ext" | sed "s/.*\/\.//g" | sed "s/.*\///g" |
sed -n '/.*\..*/p' | wc -l)
When you pipe to wc -l you are left with a number, but the exact format of that number depends on your distribution's build of wc. Generally, when input is piped or redirected to wc, the value is printed without leading whitespace, but there is no guarantee that your install of wc behaves that way (BSD wc, for example, pads the count). All you can do is test and see what you get, e.g.
ls "$HOME" | wc -l
If whitespace appears before the value, you have found your problem.
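If so, a simple fix (a sketch, assuming the count is always a plain integer) is to let arithmetic expansion strip the whitespace:
NEXT=$(find "${DIR}" -type f -name "*.$ext" | sed "s/.*\/\.//g" | sed "s/.*\///g" |
sed -n '/.*\..*/p' | wc -l)
NEXT=$((NEXT))   # arithmetic expansion discards surrounding whitespace
echo "Files .$ext: $NEXT"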
If that last line is your output, then it seems to come from something other than the code shown. When your output looks odd, try putting single quotes around each variable:
echo " Average file size .'$ext': '$AEXT'"
That way you will know whether the spaces (or tabs) are coming from the variables themselves or from the script.
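For example, if the variable itself carried padding from wc, the quotes would expose it, and you would see something like (hypothetical output):
 Average file size .'tar': '       1'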
Related
I want to count how many files in the current directory have the string "A" in the last line.
First solution: tail -n 1 * | grep \"A\" | wc -l
This works fine, but when there are more files it fails with bash: /usr/bin/tail: Argument list too long.
Is there a way to get around it?
Bonus points if I can also optionally get which files contains it.
EDIT: my folder contains 343729 files
EDIT2: #tso usefully pointed, in his comment, to the article "I'm getting 'Argument list too long'. How can I process a large list in chunks?".
RESULTS:
#tso solution for f in $(find . -type f); do tail -1 $f|grep \"A\"; done|wc -l takes about 20 minutes
#lars solution grep -P "\"A\"*\Z" -r . | wc -l takes about 20 minutes
#mklement0 solution printf '%s\0' * | xargs -0 sh -c 'tail -q -n 1 "$@" | grep \"A\"' - | wc -l takes about 10 minutes
#james solution (in the comments) for i in * ; do awk 'END{if(/a/)print FILENAME}' "$i" ; done takes about 25 minutes
#codeforester find . -type f -exec tail -n 1 -- {} + | grep -EB 1 '^[^=]+A' | grep -c '^==>' takes >20 minutes.
#mklement0's and #codeforester's solutions also have the advantage that if I want to change the grep pattern, the second run takes almost no time; I guess that's due to some sort of caching.
I've accepted #mklement0's answer as it seems to be the fastest, but I'd still like to mention #tso and #lars for their contributions, which are, to my mind, easier and more adaptable.
xargs is able to overcome the max. command-line length limitation by efficiently batching the invocations into as few calls as possible.
The shell's builtins, such as printf, are not subject to the max. command-line length.
Knowing this, you can use the following approach. It assumes that your xargs implementation supports the -0 option for NUL-terminated input, and that your tail implementation supports multiple file operands and the -q option for suppressing filename headers; both assumptions hold for the GNU (Linux) and BSD/macOS implementations of these utilities:
printf '%s\0' * | xargs -0 sh -c 'tail -q -n 1 "$@" | grep \"A\"' - | wc -l
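The trailing - deserves a note: with sh -c, the first argument after the command string becomes $0 rather than $1, so a placeholder is needed to keep the first filename out of $0. A quick illustration:
sh -c 'echo "0=$0 rest=$@"' - file1 file2   # prints: 0=- rest=file1 file2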
How about using find, tail, and grep this way? It is more efficient than looping through each file. Also, tail -1 reads just the last line of each file and is hence very I/O-efficient.
find . -maxdepth 1 -type f -exec tail -n 1 -- {} + | grep -EB 1 '^[^=]+A' | grep -c '^==>'
find will invoke tail -1 in batches, passing as many file names at a time as ARG_MAX allows
tail will print the last line of each of the files, prefixing each with a header of the form "==> file_name <=="
grep -EB 1 '^[^=]+A' will look for pattern A and fetch the previous line as well (it will exclude the file_name lines while looking for the match)
grep -c '^==>' will count the number of files with matching pattern
If you don't need to know the name of the files having a match, but just get the count of files, you could do this:
find . -maxdepth 1 -type f -exec tail -q -n 1 -- {} + | grep -c 'A'
Using GNU awk:
$ cat foo
b
a
$ cat bar
b
b
$ awk 'ENDFILE{if(/a/){c++; print FILENAME}}END{print c}' *
foo
1
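ENDFILE is a gawk extension; if your awk lacks it, a portable sketch is to carry each file's last line by hand (empty files simply produce no records and are skipped):
awk 'FNR==1 && NR>1 { if (last ~ /a/) { c++; print fn } }  # new file: judge the previous one
     { last = $0; fn = FILENAME }                          # remember current line and file
     END { if (last ~ /a/) { c++; print fn }; print c+0 }' *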
Try with find:
for f in $(find . -type f); do tail -1 "$f" | grep PATTERN; done | wc -l
If your grep supports the -P option, this might work:
grep -P "A\Z" -r . | wc -l
See man pcrepattern. In short:
\Z matches at the end of the subject, and also before a newline at the end of the subject
\z matches only at the very end of the subject
Try \Z and \z.
To see which files match, you would use only the grep part without the pipe to wc.
This will return the number of files:
grep -rlP "A\z" | wc -l
If you want to get the names then simply:
grep -rlP "A\Z"
I'm conducting a count of the files in a directory:
ls | wc -l
If the output is "17", I would like it to display as "017".
I have played with piping to printf, with little luck.
Any suggestions would be appreciated.
printf is the way to go to format numbers:
printf "There were %03d files\n" "$(ls | wc -l)"
ls | wc -l will tell you how many lines it encountered parsing the output of ls, which may not be the same as the number of (non-dot) filenames in the directory. What if a filename has a newline? One reliable way to get the number of files in a directory is
x=(*)
printf '%03d\n' "${#x[@]}"
But that will only work with a shell that supports arrays. If you want a POSIX compatible approach, use a shell function:
countargs() { printf '%03d\n' $#; }
countargs *
This works because when a glob expands, the shell keeps each member of the expansion as a separate word, regardless of the characters in the filenames. But when you pipe filenames, the command on the other side of the pipe sees only a stream of text, so it cannot do any special handling.
You could use sed.
ls | wc -l | sed 's/^17$/017/'
And this one handles all two-digit numbers:
ls | wc -l | sed '/^[0-9][0-9]$/s/.*/0&/'
I'm trying to list all PDF files under a given directory $1 (and its subdirectories), get the number of pages in each file and calculate two numbers using the pagecount. My script used to work, but only on filenames that don't contain spaces and only in one directory that is only filled with PDF files. I've modified it a bit already (using quotes around variables and such), but now I'm a bit stuck.
The problem I'm having is that, as it is now, the script only processes the first file found by find . -name '*.pdf'. How would I go about processing the rest?
#!/bin/bash
wd=`pwd`
pppl=0.03 #euro
pppnl=0.033 #euro
cd $1
for entry in "`find . -name '*.pdf'`"
do
filename="$(basename "$entry")"
pagecount=`pdfinfo "$filename" | grep Pages | sed 's/[^0-9]*//'`
pricel=`echo "$pagecount * $pppl" | bc`
pricenl=`echo "$pagecount * $pppnl" | bc`
echo -e "$filename\t\t$pagecount\t$pricel\t$pricenl"
done
cd "$wd"
The problem with using find in a for loop is that if you don't quote the command substitution, filenames with spaces get split, and if you do quote it, the entire result is treated as a single word, so the loop runs only once.
The workaround is to use a while loop instead, like this:
find . -name '*.pdf' -print0 | while IFS= read -r -d '' entry
do
....
done
Read this article for more discussion: http://mywiki.wooledge.org/ParsingLs
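Applied to your script, the loop might look like this (a sketch that keeps your pricing pipeline as-is; note that it passes the full path to pdfinfo so PDFs in subdirectories work too):
find . -name '*.pdf' -print0 | while IFS= read -r -d '' entry
do
    filename=$(basename "$entry")
    pagecount=$(pdfinfo "$entry" | grep Pages | sed 's/[^0-9]*//')
    pricel=$(echo "$pagecount * $pppl" | bc)
    pricenl=$(echo "$pagecount * $pppnl" | bc)
    echo -e "$filename\t\t$pagecount\t$pricel\t$pricenl"
done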
It's a bad idea to use word splitting. Use a while loop instead.
while read -r entry
do
filename=$(basename "$entry")
pagecount=$(pdfinfo "$filename" | grep Pages | sed 's/[^0-9]*//')
pricel=$(echo "$pagecount * $pppl" | bc)
pricenl=$(echo "$pagecount * $pppnl" | bc)
echo -e "$filename\t\t$pagecount\t$pricel\t$pricenl"
done < <(exec find . -name '*.pdf')
Also, prefer $() over backticks when possible. And you don't need to place quotes around variables or command substitutions when they are used in an assignment.
filename=$(basename "$entry")
could simply be written as
filename=${entry##*/}
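For example:
entry=./subdir/report.pdf
echo "${entry##*/}"   # prints: report.pdf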
I can't remember how to capture the result of an execution into a variable in a bash script.
Basically I have a folder full of backup files of the following format:
backup--my.hostname.com--1309565.tar.gz
I want to loop over a list of all files and pull the numeric part out of the filename and do something with it, so I'm doing this so far:
HOSTNAME=`hostname`
DIR="/backups/"
SUFFIX=".tar.gz"
PREFIX="backup--$HOSTNAME--"
TESTNUMBER=9999999999
#move into the backup dir
cd $DIR
#get a list of all backup files in there
FILES=$PREFIX*$SUFFIX
#Loop over the list
for F in $FILES
do
#rip the number from the filename
NUMBER=$F | sed s/$PREFIX//g | sed s/$SUFFIX//g
#compare the number with another number
if [ $NUMBER -lt $TESTNUMBER ]; then
#do something
fi
done
I know the "$F | sed s/$PREFIX//g | sed s/$SUFFIX//g" part rips the number correctly (though I appreciate there might be a better way of doing this), but I just can't remember how to get that result into NUMBER so I can reuse it in the if statement below.
Use the $(...) syntax (or ``).
NUMBER=$( echo $F | sed s/$PREFIX//g | sed s/$SUFFIX//g )
or
NUMBER=` echo $F | sed s/$PREFIX//g | sed s/$SUFFIX//g `
(I prefer the first one, since it is easier to see when multiple ones nest.)
Backticks if you want to be portable to older shells (sh):
NUMBER=`echo $F | sed s/$PREFIX//g | sed s/$SUFFIX//g`
Otherwise, use NUMBER=$(echo $F | sed s/$PREFIX//g | sed s/$SUFFIX//g). It's better and supports nesting more readily.
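As an aside, the stripping can be done without sed at all, using bash parameter expansion (a sketch using the values from the question):
PREFIX="backup--my.hostname.com--"
SUFFIX=".tar.gz"
F="backup--my.hostname.com--1309565.tar.gz"
NUMBER=${F#"$PREFIX"}        # strip the prefix
NUMBER=${NUMBER%"$SUFFIX"}   # strip the suffix
echo "$NUMBER"               # prints: 1309565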
In Bash, how do I count the number of non-blank lines of code in a project?
cat foo.c | sed '/^\s*$/d' | wc -l
And if you consider comments blank lines:
cat foo.pl | sed '/^\s*#/d;/^\s*$/d' | wc -l
Although, that's language dependent.
#!/bin/bash
find . -path './pma' -prune -o -path './blog' -prune -o -path './punbb' -prune -o -path './js/3rdparty' -prune -o -print | egrep '\.php|\.as|\.sql|\.css|\.js' | grep -v '\.svn' | xargs cat | sed '/^\s*$/d' | wc -l
The above will give you the total count of lines of code (blank lines removed) for a project (current folder and all subfolders recursively).
In the above, "./blog", "./punbb", "./js/3rdparty", and "./pma" are folders I blacklist because I didn't write the code in them. Also, .php, .as, .sql, .css, and .js are the extensions of the files being looked at; any files with a different extension are ignored.
There are many ways to do this, using common shell utilities.
My solution is:
grep -cve '^\s*$' <file>
This counts (-c) the lines in <file> that do not match (-v) the pattern (-e) '^\s*$': the beginning of a line, followed by zero or more whitespace characters, followed by the end of the line (i.e. no content other than whitespace).
An advantage of this method over methods that involve piping into wc, is that you can specify multiple files and get a separate count for each file:
$ grep -cve '^\s*$' *.hh
config.hh:36
exceptions.hh:48
layer.hh:52
main.hh:39
If you want to use something other than a shell script, try CLOC:
cloc counts blank lines, comment lines, and physical lines of source code in many programming languages. It is written entirely in Perl with no dependencies outside the standard distribution of Perl v5.6 and higher (code from some external modules is embedded within cloc) and so is quite portable.
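A typical invocation (assuming cloc is installed and on your PATH) is simply:
cloc .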
This command counts the number of non-blank lines:
cat fileName | grep -v '^$' | wc -l
The grep -v '^$' part filters out the blank lines.
'wc' counts lines, words, chars, so to count all lines (including blank ones) use:
wc *.py
To filter out the blank lines, you can use grep:
grep -v '^\s*$' *.py | wc
'-v' tells grep to output all lines except those that match
'^' is the start of a line
'\s*' is zero or more whitespace characters
'$' is the end of a line
*.py is my example for all the files you wish to count (all python files in current dir)
pipe output to wc. Off you go.
I'm answering my own (genuine) question, since I couldn't find a Stack Overflow entry that covered this.
cat file.txt | awk 'NF' | wc -l
cat 'filename' | grep '[^ ]' | wc -l
should do the trick just fine (note that it treats lines of spaces as blank, but a line containing only tabs would still be counted).
grep -cvE '(^\s*[/*])|(^\s*$)' foo
-c = count
-v = exclude
-E = extended regex
'(comment lines) OR (empty lines)'
where
^ = beginning of the line
\s = whitespace
* = zero or more of the preceding element
[/*] = either / or *
| = OR
$ = end of the line
I post this because the other options gave wrong answers for me. It worked with my Java source, where comment lines start with / or * (I put a * on every line of a multi-line comment).
awk '!/^[[:space:]]*$/ {++x} END {print x}' "$testfile"
Here's a Bash script that counts the lines of code in a project. It traverses a source tree recursively, and it excludes blank lines and single line comments that use "//".
# $excluded is a regex for paths to exclude from line counting
excluded="spec\|node_modules\|README\|lib\|docs\|csv\|XLS\|json\|png"
countLines(){
# $total is the total lines of code counted
total=0
# -mindepth excludes the current directory (".")
for file in `find . -mindepth 1 -name "*.*" |grep -v "$excluded"`; do
# First sed: only count lines of code that are not commented with //
# Second sed: don't count blank lines
# $numLines is the lines of code
numLines=`cat $file | sed '/\/\//d' | sed '/^\s*$/d' | wc -l`
# To exclude only blank lines and count comment lines, uncomment this:
#numLines=`cat $file | sed '/^\s*$/d' | wc -l`
total=$(($total + $numLines))
echo " " $numLines $file
done
echo " " $total in total
}
echo Source code files:
countLines
echo Unit tests:
cd spec
countLines
Here's what the output looks like for my project:
Source code files:
2 ./buildDocs.sh
24 ./countLines.sh
15 ./css/dashboard.css
53 ./data/un_population/provenance/preprocess.js
19 ./index.html
5 ./server/server.js
2 ./server/startServer.sh
24 ./SpecRunner.html
34 ./src/computeLayout.js
60 ./src/configDiff.js
18 ./src/dashboardMirror.js
37 ./src/dashboardScaffold.js
14 ./src/data.js
68 ./src/dummyVis.js
27 ./src/layout.js
28 ./src/links.js
5 ./src/main.js
52 ./src/processActions.js
86 ./src/timeline.js
73 ./src/udc.js
18 ./src/wire.js
664 in total
Unit tests:
230 ./ComputeLayoutSpec.js
134 ./ConfigDiffSpec.js
134 ./ProcessActionsSpec.js
84 ./UDCSpec.js
149 ./WireSpec.js
731 in total
Enjoy! --Curran
The neatest command is
grep -vc ^$ fileName
With the -c option, you don't even need wc -l.
It's kinda going to depend on the number of files you have in the project. In theory you could use
grep -c '.' <list of files>
where you can build the list of files using the find utility.
grep -c '.' `find -type f`
would give you a line count per file.
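If you want a single aggregate number instead of per-file counts, a sketch that also avoids the backtick expansion's trouble with odd filenames:
find . -type f -exec cat {} + | grep -c '.'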
Script to recursively count all non-blank lines with a certain file extension in the current directory:
#!/usr/bin/env bash
(
echo 0;
for ext in "$@"; do
for i in $(find . -name "*$ext"); do
sed '/^\s*$/d' "$i" | wc -l   ## skip blank lines
#cat $i | wc -l; ## count all lines
echo +;
done
done
echo p q;
) | dc;
Sample usage:
./countlines.sh .py .java .html
If you want the sum of all non-blank lines for all files of a given file extension throughout a project:
while read line
do grep -cve '^\s*$' "$line"
done < <(find "$1" -name "*.$2" -print) | awk '{s+=$1} END {print s}'
First arg is the project's base directory, second is the file extension. Sample usage:
./scriptname ~/Dropbox/project/src java
It's little more than a collection of previous solutions.
rgrep . | wc -l
gives the count of non-blank lines in the files under the current working directory.
grep -v '^\W*$' `find -type f` | grep -c '.' > /path/to/lineCountFile.txt
gives an aggregate count for all files in the current directory and its subdirectories.
HTH!
This gives the count of number of lines without counting the blank lines:
grep -v '^$' filename | wc -l | sed -e 's/ //g'
Try this one:
grep -cve '^$' -cve '^//' *.java
It's easy to memorize, and it also excludes blank lines and commented lines.
There's already a program for this on Linux called wc.
Just
wc -l *.c
and it gives you the total lines and the lines for each file.