How to print the number of occurrences of a word in a file in Unix - shell

This is my shell script task.
Given a directory and a word, search the directory and print the absolute path of the file that has the maximum number of occurrences of the word, and also print the number of occurrences.
I have written the following script:
#!/bin/bash
if [[ -n $(find / -type d -name "$1" 2> /dev/null) ]]
then
    echo "Directory exists"
    x=$(find / -type d -name "$1" 2> /dev/null)
    echo "$x"
    cd "$x"
    y=$(find . -type f | xargs grep -c "$2" | grep -v ":0" | grep -o '[^/]*$' | sort -t: -k2 -n -r)
    echo "$y"
else
    echo "Directory does not exist"
fi
I run it as: scriptname directoryname word
Output:
/somedirectory/vtb/wordsearch : 4
/foo/bar: 3
Is there any option to replace xargs grep -c $2? grep -c prints the number of lines that contain the word, but I need the exact number of occurrences of the word in the files in the given directory.

Using grep's -c count feature:
grep -c "SEARCH" /path/to/files* | sort -r -t : -k 2 | head -n 1
The grep command will output each file in a /path/name:count format, the sort will numerically (-n) sort by the 2nd (-k 2) field as delimited by a colon (-t :) in reverse order (-r). We then use head to keep the first result (-n 1).
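Note that -c still counts matching lines rather than total occurrences. If you need the latter, a rough sketch (assuming GNU grep is available; word and dir are placeholder values) could combine grep -o with wc -l per file:
#!/bin/bash
# Sketch only: print each file under a directory with its total occurrence
# count of a word, highest count first. "word" and "dir" are placeholders.
word="foo"
dir="/path/to/files"

find "$dir" -type f | while IFS= read -r f; do
    count=$(grep -o -w "$word" "$f" | wc -l)   # one line per match, then count the lines
    printf '%s : %s\n' "$f" "$count"
done | sort -t: -k2 -n -r | head -n 1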

Try this:
grep -o -w 'foo' bar.txt | wc -w
or, recursively across a directory:
grep -r -o -w 'word' /path/to/dir/ | wc -w

grep -Fwor "$word" "$dir" | sed "s/:${word}\$//" | sort | uniq -c | sort -n | tail -1
Here -F matches fixed strings, -w matches whole words, -o prints each match on its own line, and -r recurses through the directory; sed strips the trailing :word so only the file path remains, sort | uniq -c counts the matches per file, and sort -n | tail -1 keeps the file with the most matches.
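If you also want the output in the OP's path : count order, one possible follow-up (a sketch that assumes the file paths contain no spaces) is to reorder the fields with awk:
grep -Fwor "$word" "$dir" \
    | sed "s/:${word}\$//" \
    | sort | uniq -c \
    | sort -n | tail -1 \
    | awk '{print $2 " : " $1}'   # uniq -c prints "count path"; swap to "path : count"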

Related

Counting the number of lines of many files (only .h, .c and .py files) in a directory using bash

I'm asked to write a bash script that counts the number of lines in files (but only C files (.h and .c) and Python files (.py)) that are grouped in a single directory. I've already tried this code, but my calculation is always wrong:
let "sum = 0"
let "sum = sum + $(wc -l $1/*.c | tail --lines=1 | tr -dc '0-9')"
let "sum = sum + $(wc -l $1/*.h | tail --lines=1 | tr -dc '0-9')"
let "sum = sum + $(wc -l $1/*.py | tail --lines=1 | tr -dc '0-9')"
echo $sum >> manifest.txt
I must write the total in the "manifest.txt" file and the argument of my script is the path to the directory that contains the files.
If someone has another technique to compute this, I'd be very grateful.
Thank you!
You could also use a loop to aggregate the counts:
extensions=("*.h" "*.c" "*.py")
sum=0
for ext in "${extensions[@]}"; do
    # tail -n 1 picks the "total" line when the glob matches several files
    count=$(wc -l ${1}/${ext} | tail -n 1 | awk '{ print $1 }')
    sum=$((sum+count))
done
echo "${sum}"
Version 1: step by step
#!/bin/bash
echo "Counting the total number of lines for all .c .h .py files in $1"
sum=0
num_py=$(wc -l $1/*.py | tail -1 | tr -dc '0-9')
num_c=$(wc -l $1/*.c | tail -1 | tr -dc '0-9')
num_h=$(wc -l $1/*.h | tail -1 | tr -dc '0-9')
sum=$(($num_py + $num_c + $num_h))
echo $sum >> manifest.txt
Version 2: concise
#!/bin/bash
echo "Counting the total number of lines for all .c .h .py files in $1"
echo "$(( $(wc -l $1/*.py | tail -1 | tr -dc '0-9') + $(wc -l $1/*.c | tail -1 | tr -dc '0-9') + $(wc -l $1/*.h | tail -1 | tr -dc '0-9') ))" >> manifest.txt
Version 3: loop over your desired files
#!/bin/bash
echo "Counting the total number of lines for all .c .h .py files in $1"
sum=0
for sfile in $1/*.{c,h,py}; do
sum=$(($sum+$(wc -l $sfile|tail -1|tr -dc '0-9')))
done
echo $sum >> manifest.txt
This is how arithmetic works in bash: var=$((EXPR))
For example: sum=$((sum + result))
Note there is no space around the = and no $ before the variable being assigned; inside $(( )) the $ before variable names is optional.
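A minimal demonstration (the values are arbitrary):
sum=0
result=5
sum=$((sum + result))   # inside $(( )) the $ before variable names is optional
echo "$sum"             # prints 5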
This is the script that I use (with minor modifications):
files=( $(find . -mindepth 1 -maxdepth 1 -type f \( -iname "*.h" -o -iname "*.c" -o -iname "*.py" \)) )
declare -i total=0
for file in "${files[@]}"; do
    lines="$(wc -l < "$file")"
    echo -e "${lines}\t${file}"
    total+="$lines"
done
echo -e "\n$total\ttotal"
Here is my version.
#!/usr/bin/env bash
shopt -s extglob nullglob
files=( "$1"/*.@(c|h|py) )
shopt -u extglob nullglob
sum=0
while IFS= read -rd '' file_name; do
    count=$(wc -l < "$file_name")
    ((sum+=count))
done < <(printf '%s\0' "${files[@]}")
echo "$sum" > manifest.txt
Needs some error checking, like if the argument is a directory or if it even exists at all, and so on.
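A minimal sketch of such checks, to place at the top of the script (assuming the directory is passed as the first argument):
if [[ $# -ne 1 ]]; then
    echo "usage: $0 <directory>" >&2
    exit 1
fi
if [[ ! -d $1 ]]; then
    echo "error: '$1' is not a directory" >&2
    exit 1
fi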

Why does "... >> out | sort -n -o out" not actually run sort?

As an exercise, I should find all .c files starting from my home directory, count the lines of each file, and store the sorted output in sorted_statistics.txt, using find, wc, cut and sort.
I found this command to work
find /home/user/ -type f -name "*.c" 2> /dev/null -exec wc -l {} \; | cut -f 1 -d " " | sort -n -o sorted_statistics.txt
but I can't understand why
find /home/user/ -type f -name "*.c" 2> /dev/null -exec wc -l {} \; | cut -f 1 -d " " >> sorted_statistics.txt | sort -n sorted_statistics.txt
stops just before the sort command.
Just out of curiosity, why is that?
You were appending everything to sorted_statistics.txt (so nothing was left on standard output to flow through the pipe) and then trying to use that nonexistent piped output for sort. Here is a corrected version of your command:
find /home/user/ -type f -name "*.c" 2> /dev/null -exec wc -l {} \; | cut -f 1 -d " " >> tmp.txt && sort -n tmp.txt > sorted_statistics.txt
Regards!
This part of the command makes no sense:
cut -f 1 -d " " >> sorted_statistics.txt | sort ...
because the output of cut is appended to the file sorted_statistics.txt and no output at all goes to the sort command. You will probably want to use tee:
cut -f 1 -d " " | tee -a sorted_statistics.txt | sort ...
The tee command sends its input to a file and also to the standard output. It is like a Tee junction in a pipeline.
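Plugged back into the OP's full command, a sketch could look like this (the unsorted counts are appended to the file by tee while the sorted counts still reach standard output):
find /home/user/ -type f -name "*.c" 2> /dev/null -exec wc -l {} \; \
    | cut -f 1 -d " " \
    | tee -a sorted_statistics.txt \
    | sort -n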

Count of matching word, pattern or value from unix korn shell scripting is returning just 1 as count

I'm trying to count the occurrences of a matching pattern in a variable, but it only returns 1 as the count. Here is what I'm trying to do:
x="HELLO|THIS|IS|TEST"
echo $x | grep -c "|"
Expected result: 3
Actual Result: 1
Do you know why it is returning 1 instead of 3?
Thanks.
grep -c counts lines, not matches within a line.
You can use awk to get a count:
x="HELLO|THIS|IS|TEST"
echo "$x" | awk -F '|' '{print NF-1}'
3
Alternatively you can use tr and wc:
echo "$x" | tr -dc '|' | wc -c
3
$ echo "$x" | grep -o '|' | grep -c .
3
grep -c does not count the number of matches. It counts the number of lines that match. By using grep -o, we put the matches on separate lines.
This approach works just as well with multiple lines:
$ cat file
hello|this|is
a|test
$ grep -o '|' file | grep -c .
3
The grep manual says:
grep, egrep, fgrep - print lines matching a pattern
and for the -c flag:
instead print a count of matching lines for each input file
and there is just one line that matches.
You don't need grep for this.
pipe_only=${x//[^|]} # remove everything except | from the value of x
echo "${#pipe_only}" # output the length of pipe_only
Try this:
$ x="HELLO|THIS|IS|TEST"; echo -n "$x" | sed 's/[^|]//g' | wc -c
3
With only one pipe, using perl:
echo "$x" |
perl -lne 'print scalar(() = /\|/g)'
The list assignment () = /\|/g collects every match of | in the line, and scalar() on that assignment yields the number of matches.

How to get "wc -l" to print just the number of lines without file name?

wc -l file.txt
outputs number of lines and file name.
I need just the number itself (not the file name).
I can do this
wc -l file.txt | awk '{print $1}'
But maybe there is a better way?
Try this way:
wc -l < file.txt
cat file.txt | wc -l
According to the man page (for the BSD version, I don't have a GNU version to check):
If no files are specified, the standard input is used and no file name is displayed. The prompt will accept input until receiving EOF, or [^D] in most environments.
To do this without the leading space, why not:
wc -l < file.txt | bc
Comparison of Techniques
I had a similar issue attempting to get a character count without the leading whitespace provided by wc, which led me to this page. After trying out the answers here, the following are the results from my personal testing on Mac (BSD Bash). Again, this is for character count; for line count you'd do wc -l. echo -n omits the trailing line break.
FOO="bar"
echo -n "$FOO" | wc -c # " 3" (x)
echo -n "$FOO" | wc -c | bc # "3" (√)
echo -n "$FOO" | wc -c | tr -d ' ' # "3" (√)
echo -n "$FOO" | wc -c | awk '{print $1}' # "3" (√)
echo -n "$FOO" | wc -c | cut -d ' ' -f1 # "" for -f < 8 (x)
echo -n "$FOO" | wc -c | cut -d ' ' -f8 # "3" (√)
echo -n "$FOO" | wc -c | perl -pe 's/^\s+//' # "3" (√)
echo -n "$FOO" | wc -c | grep -ch '^' # "1" (x)
echo $( printf '%s' "$FOO" | wc -c ) # "3" (√)
I wouldn't rely on the cut -f* method in general since it requires that you know the exact number of leading spaces that any given output may have. And the grep one works for counting lines, but not characters.
bc is the most concise, and awk and perl seem a bit overkill, but they should all be relatively fast and portable enough.
Also note that some of these can be adapted to trim surrounding whitespace from general strings, as well (along with echo `echo $FOO`, another neat trick).
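As a small illustration of that trimming trick (FOO is just a throwaway variable):
FOO="   padded value   "
trimmed=$(echo $FOO)     # unquoted: word splitting drops the outer whitespace (and collapses inner runs)
echo "[$trimmed]"        # [padded value]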
How about
wc -l file.txt | cut -d' ' -f1
i.e. pipe the output of wc into cut (where delimiters are spaces and pick just the first field)
How about
grep -ch "^" file.txt
Obviously, there are a lot of solutions to this.
Here is another one though:
wc -l somefile | tr -d "[:alpha:][:blank:][:punct:]"
This only outputs the number of lines, but the trailing newline character (\n) is present, if you don't want that either, replace [:blank:] with [:space:].
Another way to strip the leading whitespace without invoking an external command is to use arithmetic expansion $((exp)):
echo $(($(wc -l < file.txt)))
The best way would be to first find all files in the directory and then use awk's NR (number of records) variable.
Below is the command:
find <directory path> -type f | awk 'END{print NR}'
Example: find /tmp/ -type f | awk 'END{print NR}'
This works for me, using the normal wc -l and sed to strip any character that is not a number.
wc -l big_file.log | sed -E "s/([a-z\-\_\.]|[[:space:]]*)//g"
# 9249133

Resources