In bash I want to echo some integer value which is the sum of various "grep | wc -l" combinations.
I have tried
echo $( (`grep string file.txt | wc -l`) + (`grep string2 file.txt | wc -l`))
I assumed the result of each of these greps is just an integer that bash would recognise, but it isn't. Where do I need to be explicit, and why?
Simplifying your inner commands (with seq to produce lines) but keeping the same parentheses, this does not work:
$ echo $( (`seq 5 | wc -l`) + (`seq 10 | wc -l`))
-bash: command substitution: line 1: syntax error near unexpected token `+'
-bash: command substitution: line 1: ` (`seq 5 | wc -l`) + (`seq 10 | wc -l`)'
Arithmetic expansion in Bash uses two opening parentheses next to each other, with no space between them:
$ echo $(( 1+2 ))
3
This works:
$ echo $((`seq 5 | wc -l` + `seq 10 | wc -l`))
15
As does the more modern version:
$ echo $(( $(seq 5 | wc -l) + $(seq 10 | wc -l) ))
15
So your parentheses are spaced incorrectly: with a space between them, $( ( starts a command substitution containing a subshell, not an arithmetic expansion.
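Applied to the original question, the same pattern might look like the following minimal sketch. It assumes a throwaway sample file created with mktemp (the file contents and the names tmpfile and total are made up for illustration) and uses grep -c, which prints the number of matching lines directly, in place of grep | wc -l.

```shell
#!/bin/bash
# Hypothetical sample file, created only for this illustration.
tmpfile=$(mktemp)
printf '%s\n' 'string1 here' 'string2 here' 'string1 and string2' 'neither' > "$tmpfile"

# $(( ... )) is arithmetic expansion; each $( ... ) inside it is a
# command substitution whose output is used as an operand.
total=$(( $(grep -c 'string1' "$tmpfile") + $(grep -c 'string2' "$tmpfile") ))
echo "$total"

rm -f "$tmpfile"
```

A line that contains both strings is counted once by each grep, so it contributes 2 to the sum.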
As a side note, you may be able to refactor the two greps into one that produces a single output, with something like the following (note that a line matching both patterns is then counted only once):
$ grep -E "string1|string2" file.txt | wc -l
As Charles Duffy suggests, command grouping, either in a subshell or in the current shell, lets you combine the output of two greps into a single stream. Then you need neither the echo nor the arithmetic.
Using seq as a simple model for the lines from two different processes, you can do:
$ (seq 10; seq 5) | wc -l # sub shell
15
$ { seq 5; seq 10; } | wc -l # same shell
15
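With real greps in place of seq, the grouping could be sketched as below. The sample file and the names tmpfile and total are assumptions for illustration only. The outer arithmetic expansion is there just to normalise any padding that some wc implementations add.

```shell
#!/bin/bash
# Hypothetical sample file, created only for this illustration.
tmpfile=$(mktemp)
printf '%s\n' 'string1 here' 'string2 here' 'string1 and string2' 'neither' > "$tmpfile"

# Both greps write into the same pipe, so wc -l counts the combined lines.
total=$(( $( { grep 'string1' "$tmpfile"; grep 'string2' "$tmpfile"; } | wc -l ) ))
echo "$total"

rm -f "$tmpfile"
```

As with the arithmetic version, a line matching both patterns is emitted by both greps and therefore counted twice.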
Finally, if you really need the sum of various "grep | wc -l" combinations, you might consider awk as a better replacement for grep + wc.
You can do:
awk '/string1/{c++} /string2/{c++} END{print c+0}' file.txt
awk can also handle far more complex combinations than you should attempt in Bash alone. Since it reads the file only once, it should perform much better if you have many different search strings.
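As a rough sketch of such a combination, the variant below keeps one counter per pattern and prints the individual counts together with their sum. The sample file and the names tmpfile, counts, a and b are hypothetical.

```shell
#!/bin/bash
# Hypothetical sample file, created only for this illustration.
tmpfile=$(mktemp)
printf '%s\n' 'string1 here' 'string2 here' 'string1 and string2' 'neither' > "$tmpfile"

# One pass over the file: a counts lines matching string1, b those matching string2.
# Adding 0 forces numeric output even if a counter was never incremented.
counts=$(awk '/string1/{a++} /string2/{b++} END{print a+0, b+0, a+b}' "$tmpfile")
echo "$counts"

rm -f "$tmpfile"
```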
I have a char variable called sign and a given string sub. I need to find out how many times this sign appears in the sub and cannot use grep.
For example:
sign = c
sub = mechanic cup cat
echo "$sub" | awk <code i am asking for> | wc -l
And the output should be 4 because c appears 4 times. What should be inside <>?
sign=c
sub='mechanic cup cat'
echo "$sub" |
awk -v sign="$sign" -F '' '{for (i=1;i<=NF;i++){if ($i==sign) cnt++}} END{print cnt}'
Edit:
Changes for the requirements in the comment:
Test if the length of sign is 1 (no = present). If true, change sign and sub to lowercase to ignore the case.
Use ${sign:0:1} to only pass the first character to awk.
sign=c
sub='mechanic Cup cat'
if [ "${#sign}" -eq 1 ]; then
sign=${sign,,}
sub=${sub,,}
fi
echo "$sub" |
awk -v sign="${sign:0:1}" -F '' '{for (i=1;i<=NF;i++){if ($i==sign) cnt++}} END{print cnt}'
A combination of Quasimodo's comment and Freddy's lower-case example:
$ sign=c
$ sub='mechanic Cup cat'
A tr + wc solution if ${sign} is a single character.
Count the number of times ${sign} shows up in ${sub}, ignoring case:
$ tr -cd [${sign,,}] <<< ${sub,,} | wc -c
4
Where:
${sign,,} & ${sub,,} - convert to all lowercase
tr -cd [...] - operates on the characters listed inside the brackets ([]); -d says to delete characters, while -c says to take the complement of the set, so -cd [${sign,,}] says to remove everything except the character stored in ${sign,,}
<<< .... - here string (allows passing a variable/string to tr on standard input)
wc -c - count the number of characters
NOTE: This only works if ${sign} contains a single character.
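A slightly more defensive sketch of the same tr + wc idea is shown below. It quotes the expansions, drops the brackets (which tr would otherwise treat as part of the set), and uses printf instead of a here string. It assumes bash 4 or later for ${var,,} and that sign is a single character with no special meaning to tr (for example, not a backslash or a hyphen); the variable name count is made up for illustration.

```shell
#!/bin/bash
sign=c
sub='mechanic Cup cat'

# Delete every character except the lowercase form of $sign, then count what is left.
count=$(printf '%s' "${sub,,}" | tr -cd "${sign,,}" | wc -c)
count=$((count))   # normalise: some wc implementations pad the number with spaces
echo "$count"
```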
A sed solution that should work regardless of the number of characters in ${sign}.
$ sub='mechanic Cup cat'
First we embed a new line character before each occurrence of ${sign,,}:
$ sign=c
$ sed "s/\(${sign,,}\)/\n\1/g" <<< ${sub,,}
me
chani
c
cup
cat
$ sign=cup
$ sed "s/\(${sign,,}\)/\n\1/g" <<< ${sub,,}
mechanic
cup cat
Where:
\(${sign,,}\) - find the pattern that matches ${sign} (all lowercase) and assign to position 1
\n\1 - place a newline (\n) in the stream just before our pattern in position 1
At this point we just want the lines that start with ${sign,,}, which is where tail +2 comes into play (i.e., display lines 2 through n; the portable spelling is tail -n +2):
$ sign=c
$ sed "s/\(${sign,,}\)/\n\1/g" <<< ${sub,,} | tail +2
chani
c
cup
cat
$ sign=cup
$ sed "s/\(${sign,,}\)/\n\1/g" <<< ${sub,,} | tail +2
cup cat
And now we pipe to wc -l to get a line count (ie, count the number of times ${sign} shows up in ${sub} - ignoring case):
$ sign=c
$ sed "s/\(${sign,,}\)/\n\1/g" <<< ${sub,,} | tail +2 | wc -l
4
$ sign=cup
$ sed "s/\(${sign,,}\)/\n\1/g" <<< ${sub,,} | tail +2 | wc -l
1
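For comparison, here is a sketch of a different technique that needs no external commands: bash parameter expansion rather than sed. It removes every occurrence of the search string and divides the length difference by the length of the search string. The function name count_occurrences is hypothetical. The sketch assumes bash 4 or later, a non-empty search string, and that non-overlapping occurrences are what you want to count.

```shell
#!/bin/bash
sub='mechanic Cup cat'

# count_occurrences HAYSTACK NEEDLE: case-insensitive count of NEEDLE in HAYSTACK.
count_occurrences() {
    local haystack=${1,,} needle=${2,,}
    # Quoting "$needle" makes it a literal string rather than a glob pattern.
    local stripped=${haystack//"$needle"/}
    echo $(( (${#haystack} - ${#stripped}) / ${#needle} ))
}

count_c=$(count_occurrences "$sub" c)
count_cup=$(count_occurrences "$sub" cup)
echo "$count_c $count_cup"
```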
Why does adding | wc -l alter the result as in the following?
tst:
#!/bin/bash
pgrep tst | wc -l
echo $(pgrep tst | wc -l)
echo $(pgrep tst) | wc -l
$ ./tst
1
2
1
and even
$ bash -x tst
+ wc -l
+ pgrep tst
0
++ pgrep tst
++ wc -l
+ echo 0
0
++ pgrep tst
+ echo
pgrep and subshells can have weird interactions, but in this case that's just a red herring; the actual cause is missing double-quotes around the command substitution:
$ cat tst2
#!/bin/bash
pgrep tst | wc -l
echo "$(pgrep tst | wc -l)"
echo "$(pgrep tst)" | wc -l
$ ./tst2
1
2
2
What's going on in the original script is that in the command
echo $(pgrep tst) | wc -l
pgrep prints two process IDs (the main shell running the script, and a subshell created to handle the echo part of the pipeline). It prints each one as a separate line, something like:
11730
11736
The command substitution captures that, but since it's not in double-quotes the newline between them gets converted to an argument break, so the whole thing becomes equivalent to:
echo 11730 11736 | wc -l
As a result, echo prints both IDs as a single line, and wc -l correctly reports that.
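The word-splitting effect can be reproduced without pgrep. In this minimal sketch, the hypothetical function two_lines stands in for pgrep and always prints two lines:

```shell
#!/bin/bash
# Stand-in for pgrep: prints two IDs, one per line.
two_lines() { printf '%s\n' 11730 11736; }

# Unquoted: the newline becomes an argument break, so echo prints a single line.
unquoted=$(echo $(two_lines) | wc -l)
# Quoted: the newline is preserved, so echo prints two lines.
quoted=$(echo "$(two_lines)" | wc -l)

echo "unquoted=$((unquoted)) quoted=$((quoted))"
```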
The command substitution induces an additional process that has tst in its name, which is included in the input to wc -l.
wc -l file.txt
outputs number of lines and file name.
I need just the number itself (not the file name).
I can do this
wc -l file.txt | awk '{print $1}'
But maybe there is a better way?
Try this way:
wc -l < file.txt
cat file.txt | wc -l
According to the man page (for the BSD version; I don't have a GNU version to check):
If no files are specified, the standard input is used and no file name is displayed. The prompt will accept input until receiving EOF, or [^D] in most environments.
To do this without the leading space, why not:
wc -l < file.txt | bc
Comparison of Techniques
I had a similar issue attempting to get a character count without the leading whitespace provided by wc, which led me to this page. After trying out the answers here, the following are the results from my personal testing on Mac (BSD Bash). Again, this is for character count; for line count you'd do wc -l. echo -n omits the trailing line break.
FOO="bar"
echo -n "$FOO" | wc -c # " 3" (x)
echo -n "$FOO" | wc -c | bc # "3" (√)
echo -n "$FOO" | wc -c | tr -d ' ' # "3" (√)
echo -n "$FOO" | wc -c | awk '{print $1}' # "3" (√)
echo -n "$FOO" | wc -c | cut -d ' ' -f1 # "" for -f < 8 (x)
echo -n "$FOO" | wc -c | cut -d ' ' -f8 # "3" (√)
echo -n "$FOO" | wc -c | perl -pe 's/^\s+//' # "3" (√)
echo -n "$FOO" | wc -c | grep -ch '^' # "1" (x)
echo $( printf '%s' "$FOO" | wc -c ) # "3" (√)
I wouldn't rely on the cut -f* method in general since it requires that you know the exact number of leading spaces that any given output may have. And the grep one works for counting lines, but not characters.
bc is the most concise, and awk and perl seem a bit overkill, but they should all be relatively fast and portable enough.
Also note that some of these can be adapted to trim surrounding whitespace from general strings, as well (along with echo `echo $FOO`, another neat trick).
How about
wc -l file.txt | cut -d' ' -f1
i.e. pipe the output of wc into cut (where delimiters are spaces and pick just the first field)
How about
grep -ch "^" file.txt
Obviously, there are a lot of solutions to this.
Here is another one though:
wc -l somefile | tr -d "[:alpha:][:blank:][:punct:]"
This outputs only the number of lines, but the trailing newline character (\n) is still present. If you don't want that either, replace [:blank:] with [:space:]. Note that any digits in the file name would survive as well.
Another way to strip the leading spaces without invoking an additional external command is to use arithmetic expansion, $((exp)):
echo $(($(wc -l < file.txt)))
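A self-contained sketch of this, assuming a hypothetical three-line temporary file (the names tmpfile and lines are made up for illustration):

```shell
#!/bin/bash
# Hypothetical sample file with three lines.
tmpfile=$(mktemp)
printf 'a\nb\nc\n' > "$tmpfile"

# Reading from standard input suppresses the file name;
# the arithmetic expansion drops any padding around the number.
lines=$(( $(wc -l < "$tmpfile") ))
echo "[$lines]"

rm -f "$tmpfile"
```

The brackets in the echo are only there to make any stray whitespace visible.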
To count the files in a directory instead, first find all the files, then use awk's NR (number of records) variable.
The command is:
find <directory path> -type f | awk 'END{print NR}'
Example: find /tmp/ -type f | awk 'END{print NR}'
This works for me using the normal wc -l and sed to strip any character that is not a digit.
wc -l big_file.log | sed -E "s/([a-z\-\_\.]|[[:space:]]*)//g"
# 9249133
I'm trying to count the occurrences of a pattern in a variable, but it only returns 1. Here is what I'm trying to do:
x="HELLO|THIS|IS|TEST"
echo $x | grep -c "|"
Expected result: 3
Actual Result: 1
Do you know why it is returning 1 instead of 3?
Thanks.
grep -c counts lines, not matches within a line.
You can use awk to get a count:
x="HELLO|THIS|IS|TEST"
echo "$x" | awk -F '|' '{print NF-1}'
3
Alternatively you can use tr and wc:
echo "$x" | tr -dc '|' | wc -c
3
$ echo "$x" | grep -o '|' | grep -c .
3
grep -c does not count the number of matches. It counts the number of lines that match. By using grep -o, we put the matches on separate lines.
This approach works just as well with multiple lines:
$ cat file
hello|this|is
a|test
$ grep -o '|' file | grep -c .
3
The grep manual says:
grep, egrep, fgrep - print lines matching a pattern
and for the -c flag:
instead print a count of matching lines for each input file
and there is just one line that matches.
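The difference between counting lines and counting matches can be seen side by side in this small sketch. It assumes a grep that supports -o (both GNU and BSD grep do) and uses -F so that | is taken literally; the variable names line_count and match_count are made up for illustration.

```shell
#!/bin/bash
x="HELLO|THIS|IS|TEST"

# -c counts matching lines: the single input line matches, so the count is 1.
line_count=$(printf '%s\n' "$x" | grep -cF '|')
# -o prints each match on its own line; counting those lines gives the matches.
match_count=$(printf '%s\n' "$x" | grep -oF '|' | grep -c .)

echo "lines=$((line_count)) matches=$((match_count))"
```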
You don't need grep for this.
pipe_only=${x//[^|]} # remove everything except | from the value of x
echo "${#pipe_only}" # output the length of pipe_only
Try this:
$ x="HELLO|THIS|IS|TEST"; echo -n "$x" | sed 's/[^|]//g' | wc -c
3
With only one pipe with perl:
echo "$x" |
perl -lne 'print scalar(() = /\|/g)'
I have lines that look like these
value: "15"
value: "20"
value: "3"
I am getting this as input pipe after grepping
... | grep value:
What I need is a simple bash script that takes this pipe and produce me the sum
15 + 20 + 3
So my command will be:
... | grep value: | calculate_sum_value > /tmp/sum.txt
sum.txt should contain a single number which is the sum.
How can I do this with bash? I have no experience with bash at all.
You could try awk. Something like this should work
... | grep value: | awk '{sum+=$2}END{print sum}'
And you could possibly avoid grep altogether like this
.... | awk '/^value:/{sum+=$2}END{print sum}'
Update:
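Because the values are quoted, $2 is "15" including the quotes, which awk treats as 0 in arithmetic. You can make the " character the field separator with the -F option, so that $2 is the bare number.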
... | awk -F\" '/^value:/{sum+=$2}END{print sum}'
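To see the effect of the field separator, this sketch runs both variants on the sample lines from the question. printf stands in for the elided upstream pipeline, and the variable names naive and total are hypothetical.

```shell
#!/bin/bash
# Stand-in for the "... | grep value:" part of the pipeline.
sample() { printf '%s\n' 'value: "15"' 'value: "20"' 'value: "3"'; }

# Default whitespace splitting: $2 is "15" with its quotes, which is numerically 0.
naive=$(sample | awk '{sum += $2} END {print sum + 0}')
# With " as the separator, $2 is the bare number between the quotes.
total=$(sample | awk -F'"' '/^value:/ {sum += $2} END {print sum + 0}')

echo "naive=$naive total=$total"
```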
My first try was to grab the stuff on the right of the colon and let bash sum it:
$ sum=0
$ cat sample.txt | while IFS=: read key value; do ((sum += value)); done
bash: ((: "15": syntax error: operand expected (error token is ""15"")
bash: ((: "20": syntax error: operand expected (error token is ""20"")
bash: ((: "3": syntax error: operand expected (error token is ""3"")
0
So, have to remove the quotes. Fine, use a fancy Perl regex to extract the first set of digits to the right of the colon:
$ cat sample.txt | grep -oP ':\D+\K\d+'
15
20
3
OK, onwards:
$ cat sample.txt | grep -oP ':\D+\K\d+' | while read n; do ((sum+=n)); done; echo $sum
0
Huh? Oh yeah, running while in a pipeline puts the modifications to sum in a subshell, not in the current shell. Well, do the echo in the subshell too:
$ cat sample.txt | grep -oP ':\D+\K\d+' | { while read n; do ((sum+=n)); done; echo $sum; }
38
That's better, but still the value is not in the current shell. Let's try something trickier
$ set -- $(cat sample.txt | grep -oP ':\D+\K\d+')
$ sum=$(IFS=+; bc <<< "$*")
$ echo $sum
38
And yes, UUOC, but it's a placeholder for whatever the OP's pipeline was.
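Another way to keep the sum in the current shell is process substitution, a different technique from the set -- and bc approach above. The loop then runs in the current shell because it is not part of a pipeline. The sketch below is bash-specific and assumes each line carries exactly one plain decimal number without leading zeros. printf again stands in for the real pipeline, and it strips non-digits with parameter expansion instead of grep -P.

```shell
#!/bin/bash
sum=0
while IFS= read -r line; do
    n=${line//[^0-9]/}           # keep only the digits of the line
    sum=$(( sum + ${n:-0} ))     # treat a line without digits as 0
done < <(printf '%s\n' 'value: "15"' 'value: "20"' 'value: "3"')

# The loop ran in the current shell, so sum is still set here.
echo "$sum"
```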