Count how many words in file test.txt start with “tol”? - bash

I'm new to Linux shell. I know there are tools to do this thing, such as awk. But I'm wondering if I could do it using grep or wc or other commands? awk seems intimidating to me. Thanks.
I tried grep and wc, like this:
grep tol test.txt | wc -w
But grep will give me the whole line.
If I tried the following:
grep '^tol$*' test.txt | wc -w
It only counts the line begins with mol.
How can I grep the words starting with tol?

Something like that:
grep -o '\<tol[[:alpha:]]*\>' test.txt | wc -w
< - for beginning of the word,
> - the end of the word.
[[:alpha:]] - to avoid match of combinations like tol123 (You said you need only words).
-o - to show only matches, not the entire line.

You can do the same fairly simply with awk, e.g.
awk '{for(i=1;i<=NF;i++) $i~/^tol/ && n++} END {print n}'
Example
$ echo -e "tolerance topaz tolstoy\nbats toluene toledo" |
> awk '{for(i=1;i<=NF;i++) $i~/^tol/ && n++} END {print n}'
4

Another option is to translate all whitespace characters into linefeeds so that each word starts on a new line, then grep can count them itself:
echo -e "tolerance topaz\ttolstoy\nbats toluene toledo" | tr '[:space:]' '\n' | grep -c "^tol"
4
Or, if using a file called words.txt:
tr '[:space:]' '\n' < words.txt | grep -c "^tol"

Related

count all the lines in all folders in bash [duplicate]

wc -l file.txt
outputs number of lines and file name.
I need just the number itself (not the file name).
I can do this
wc -l file.txt | awk '{print $1}'
But maybe there is a better way?
Try this way:
wc -l < file.txt
cat file.txt | wc -l
According to the man page (for the BSD version, I don't have a GNU version to check):
If no files are specified, the standard input is used and no file
name is
displayed. The prompt will accept input until receiving EOF, or [^D] in
most environments.
To do this without the leading space, why not:
wc -l < file.txt | bc
Comparison of Techniques
I had a similar issue attempting to get a character count without the leading whitespace provided by wc, which led me to this page. After trying out the answers here, the following are the results from my personal testing on Mac (BSD Bash). Again, this is for character count; for line count you'd do wc -l. echo -n omits the trailing line break.
FOO="bar"
echo -n "$FOO" | wc -c # " 3" (x)
echo -n "$FOO" | wc -c | bc # "3" (√)
echo -n "$FOO" | wc -c | tr -d ' ' # "3" (√)
echo -n "$FOO" | wc -c | awk '{print $1}' # "3" (√)
echo -n "$FOO" | wc -c | cut -d ' ' -f1 # "" for -f < 8 (x)
echo -n "$FOO" | wc -c | cut -d ' ' -f8 # "3" (√)
echo -n "$FOO" | wc -c | perl -pe 's/^\s+//' # "3" (√)
echo -n "$FOO" | wc -c | grep -ch '^' # "1" (x)
echo $( printf '%s' "$FOO" | wc -c ) # "3" (√)
I wouldn't rely on the cut -f* method in general since it requires that you know the exact number of leading spaces that any given output may have. And the grep one works for counting lines, but not characters.
bc is the most concise, and awk and perl seem a bit overkill, but they should all be relatively fast and portable enough.
Also note that some of these can be adapted to trim surrounding whitespace from general strings, as well (along with echo `echo $FOO`, another neat trick).
How about
wc -l file.txt | cut -d' ' -f1
i.e. pipe the output of wc into cut (where delimiters are spaces and pick just the first field)
How about
grep -ch "^" file.txt
Obviously, there are a lot of solutions to this.
Here is another one though:
wc -l somefile | tr -d "[:alpha:][:blank:][:punct:]"
This only outputs the number of lines, but the trailing newline character (\n) is present, if you don't want that either, replace [:blank:] with [:space:].
Another way to strip the leading zeros without invoking an external command is to use Arithmetic expansion $((exp))
echo $(($(wc -l < file.txt)))
Best way would be first of all find all files in directory then use AWK NR (Number of Records Variable)
below is the command :
find <directory path> -type f | awk 'END{print NR}'
example : - find /tmp/ -type f | awk 'END{print NR}'
This works for me using the normal wc -l and sed to strip any char what is not a number.
wc -l big_file.log | sed -E "s/([a-z\-\_\.]|[[:space:]]*)//g"
# 9249133

Count only lines with given character in front (<)

I want to count the amount of lines in a CSV file.
But only those with "<" at the beginning.
Use grep with c:
grep -c '^<' <FILE>
Use the awk, Luke:
$ cat file
foo
< this foo
not this foo <
< another of those foos
$ awk '/^</{c++}END{print c}' file
2
You can get all the line containing '<' at the begining with grep '^<' and count remaining line with wc -l.
grep "^<" file.csv | wc -l
grep -E "^<" filename | wc -l
Use regular expressions with grep to check for < at the start of the line (^) and then pipe through wc -l to get number of lines.

Count the number of whitespaces in a file

File test
musically us
challenged a goat that day
spartacus was his name
ba ba ba blacksheep
grep -oic "[\s]*" test
grep -oic "[ ]*" test
grep -oic "[\t]*" test
grep -oic "[\n]*" test
All give me 4, when I expect 11
grep --version -> grep (BSD grep) 2.5.1-FreeBSD
Running this on OSX Sierra 10.12
Repeating spaces should not be counted as one space.
If you are open to tricks and alternatives you might like this one:
$ awk '{print --NF}' <(tr -d '\n' <file)
11
Above solution will count "whitespace" between words. As a result for a string of 'fifteen--> <--spaces' awk will measure 1, like grep.
If you need to count actual single spaces you can use this :
$ awk -F"[ ]" '{print --NF}' <<<"fifteen--> <--spaces"
15
$ awk -F"[ ]" '{print --NF}' <<<" 2 4 6 8 10"
10
$ awk -F"[ ]" '{print --NF}' <(tr -d '\n' <file)
11
One step forward, to count single spaces and tabs:
$ awk -F"[ ]|\t" '{print --NF}' <(echo -e " 2 4 6 8 10\t12 14")
13
tr is generally better for this (in most cases):
tr -d -C ' ' <file | wc -c
The grep solution relies on the fact that the output of grep -o is newline-separated — it will fail miserably for example in the following type of circumstance where there might be multiple spaces:
v='fifteen--> <--spaces'
echo "$v" | grep -o -E ' +' | wc -l
echo "$v" | tr -d -C ' ' | wc -c
grep only returns 1, when it should be 15.
EDIT: If you wanted to count multiple characters (eg. TAB and SPACE) you could use:
tr -dC $'[ \t]' <<< $'one \t' | wc -c
Just use awk:
$ awk -v RS=' ' 'END{print NR-1}' file
11
or if you want to handle empty files gracefully:
$ awk -v RS=' ' 'END{print NR - (NR?1:0)}' /dev/null
0
The -c option counts the number of lines that match, not individual matches. Use grep -o and then pipe to wc -l, which will count the number of lines.
grep -o ' ' test | wc -l

using cut command in bash [duplicate]

This question already has answers here:
Get just the integer from wc in bash
(19 answers)
Closed 8 years ago.
I want to get only the number of lines in a file:
so I do:
$wc -l countlines.py
9 countlines.py
I do not want the filename, so I tried
$wc -l countlines.py | cut -d ' ' -f1
but this just echo empty line.
I just want number 9 to be printed
Use stdin and you won't have issue with wc printing filename
wc -l < countlines.py
You can also use awk to count lines. (reference)
awk 'END { print NR }' countlines.py
where countlines.py is the file you want to count
If your file doesn't ends with a \n (new line) the wc -l gives a wrong result. Try it with the next simulated example:
echo "line1" > testfile #correct line with a \n at the end
echo -n "line2" >> testfile #added another line - but without the \n
the
$ wc -l < testfile
1
returns 1. (The wc counts the number of newlines (\n) in a file.)
Therefore, for counting lines (and not the \n characters) in a file, you should to use
grep -c '' testfile
e.g. find empty character in a file (this is true for every line) and count the occurences -c. For the above testfile it returns the correct 2.
Additionally, if you want count the non-empty lines, you can do it with
grep -c '.' file
Don't trust wc :)
Ps: one of the strangest use of wc is
grep 'pattern' file | wc -l
instead of
grep -c 'pattern' file
cut is being confused by the leading whitespace.
I'd use awk to print the 1st field here:
% wc -l countlines.py | awk '{ print $1 }'
As an alternative, wc won't print the file name if it is being piped input from stdin
$ cat countlines.py | wc -l
9
yet another way :
cnt=$(wc -l < countlines.py )
echo "total is $cnt "
Piping the file name into wc removes it from the output, then translate away the whitespace:
wc -l <countlines.py |tr -d ' '
Use awk like this:
wc -l countlines.py | awk {'print $1'}

How to get "wc -l" to print just the number of lines without file name?

wc -l file.txt
outputs number of lines and file name.
I need just the number itself (not the file name).
I can do this
wc -l file.txt | awk '{print $1}'
But maybe there is a better way?
Try this way:
wc -l < file.txt
cat file.txt | wc -l
According to the man page (for the BSD version, I don't have a GNU version to check):
If no files are specified, the standard input is used and no file
name is
displayed. The prompt will accept input until receiving EOF, or [^D] in
most environments.
To do this without the leading space, why not:
wc -l < file.txt | bc
Comparison of Techniques
I had a similar issue attempting to get a character count without the leading whitespace provided by wc, which led me to this page. After trying out the answers here, the following are the results from my personal testing on Mac (BSD Bash). Again, this is for character count; for line count you'd do wc -l. echo -n omits the trailing line break.
FOO="bar"
echo -n "$FOO" | wc -c # " 3" (x)
echo -n "$FOO" | wc -c | bc # "3" (√)
echo -n "$FOO" | wc -c | tr -d ' ' # "3" (√)
echo -n "$FOO" | wc -c | awk '{print $1}' # "3" (√)
echo -n "$FOO" | wc -c | cut -d ' ' -f1 # "" for -f < 8 (x)
echo -n "$FOO" | wc -c | cut -d ' ' -f8 # "3" (√)
echo -n "$FOO" | wc -c | perl -pe 's/^\s+//' # "3" (√)
echo -n "$FOO" | wc -c | grep -ch '^' # "1" (x)
echo $( printf '%s' "$FOO" | wc -c ) # "3" (√)
I wouldn't rely on the cut -f* method in general since it requires that you know the exact number of leading spaces that any given output may have. And the grep one works for counting lines, but not characters.
bc is the most concise, and awk and perl seem a bit overkill, but they should all be relatively fast and portable enough.
Also note that some of these can be adapted to trim surrounding whitespace from general strings, as well (along with echo `echo $FOO`, another neat trick).
How about
wc -l file.txt | cut -d' ' -f1
i.e. pipe the output of wc into cut (where delimiters are spaces and pick just the first field)
How about
grep -ch "^" file.txt
Obviously, there are a lot of solutions to this.
Here is another one though:
wc -l somefile | tr -d "[:alpha:][:blank:][:punct:]"
This only outputs the number of lines, but the trailing newline character (\n) is present, if you don't want that either, replace [:blank:] with [:space:].
Another way to strip the leading zeros without invoking an external command is to use Arithmetic expansion $((exp))
echo $(($(wc -l < file.txt)))
Best way would be first of all find all files in directory then use AWK NR (Number of Records Variable)
below is the command :
find <directory path> -type f | awk 'END{print NR}'
example : - find /tmp/ -type f | awk 'END{print NR}'
This works for me using the normal wc -l and sed to strip any char what is not a number.
wc -l big_file.log | sed -E "s/([a-z\-\_\.]|[[:space:]]*)//g"
# 9249133

Resources