Count mutiple occurences of a word on the same line using grep - shell

Here I made a small script that take input from user searching some pattern from a file and displays required no of lines from that file where the pattern is found. Although this code is searching the pattern line wise due to standard grep practice. I mean if the pattern occurs twice on the same line, i want the output to print twice. Hope I make some sense.
#!/bin/sh
cat /dev/null>copy.txt
echo "Please enter the sentence you want to search:"
read "inputVar"
echo "Please enter the name of the file in which you want to search:"
read "inputFileName"
echo "Please enter the number of lines you want to copy:"
read "inputLineNumber"
[[-z "$inputLineNumber"]] || inputLineNumber=20
cat /dev/null > copy.txt
for N in `grep -n $inputVar $inputFileName | cut -d ":" -f1`
do
LIMIT=`expr $N + $inputLineNumber`
sed -n $N,${LIMIT}p $inputFileName >> copy.txt
echo "-----------------------" >> copy.txt
done
cat copy.txt

As I understood, the task is to count number of pattern occurrences in line. It can be done like so:
count=$((`echo "$line" | sed -e "s|$pattern|\n|g" | wc -l` - 1))
Suppose you have one file to read. Then, code will be following:
#!/bin/bash
file=$1
pattern="an."
#reading file line by line
cat -n $file | while read input
do
#storing line to $tmp
tmp=`echo $input | grep "$pattern"`
#counting occurrences count
count=$((`echo "$tmp" | sed -e "s|$pattern|\n|g" | wc -l` - 1))
#printing $tmp line $count times
for i in `seq 1 $count`
do
echo $tmp
done
done
I checked this for pattern "an." and input:
I pass here an example of many 'an' letters
an
ananas
an-an-as
Output is:
$ ./test.sh input
1 I pass here an example of many 'an' letters
1 I pass here an example of many 'an' letters
1 I pass here an example of many 'an' letters
3 ananas
4 an-an-as
4 an-an-as
Adapt this to your needs.

How about using awk?
Assume the pattern you are searching for is in variable $pattern and the file you are checking is $file
The
count=`awk 'BEGIN{n=0}{n+=split($0,a,"'$pattern'")-1}END {print n}' $file`
or for a line
count=`echo $line | awk '{n=split($0,a,"'$pattern'")-1;print n}`

Related

Counting grep results

I'm new to bash so I'm finding trouble doing something very basic.
Through playing with various scripts I found out that the following script prints the lines that contain the "word"
for file in*; do
cat $file | grep "word"
done
doing the following:
for file in*; do
cat $file | grep "word" | wc -l
done
had a result of printing in every iteration how many times did the "word" appeared on file.
How can I implement a counter for all those appearances and in the
end just echo the counter?
I used a counter that way but it appeared 0.
let x+=cat $filename | grep "word"
You can pipe the entire loop to wc -l.
for file in *; do
cat $file | grep "word"
done | wc -l
This is a useless use of cat. How about:
for file in *; do
grep "word" "$file"
done | wc -l
Actually, the entire loop is unnecessary if you pass all the file names to grep at once.
grep "word" * | wc -l
Note that if word shows up more than once on the same line these solutions will only count the entire line once. If you want to count same-line occurrences separately you can use -o to print each match on a separate line:
grep -o "word" * | wc -l
The oneliner in John's answer is the way to go. Just to satisfy your curiosity:
sum=0
for f in *; do
x="$(grep 'word' "$f" | wc -l)"
echo "x: $x"
(( sum += x ))
done
echo "sum: $sum"
If the line containing the grep and wc does not yield a number you are SOL. That is why you should stick to the other solution or do a pure bash implementation with things like read, 'case and *word*)' or if [[ "$line" =~ "$re_containing_word" ]]; then ...

Passing empty strings to grep command

I have this script where I ask for 4 patterns and then use those in a grep command. That is, I want to see if a line matches any of the patterns.
echo -n "Enter pattern1"
read pat1
echo -n "Enter pattern2"
read pat2
echo -n "Enter pattern3"
read pat3
echo -n "Enter pattern4"
read pat4
cat somefile.txt | grep $pat1 | grep $pat2 | grep $pat3 | grep $pat4
The problem I'm running into is that if the user doesn't supply one of the patterns (which I want to allow) the grep command doesn't work.
So, is there a way to have grep ignore one of the patterns if it's returned empty?
Your code has lots of problems:
Code duplication
Interactive asking for potentially unused information
using echo -n is not portable
useless use of cat
Here is what I wrote that is closer to what you should use instead:
i=1
printf %s "Enter pattern $i: "
read -r input
while [[ $input ]]; do
pattern+=(-e "$input")
let i++
printf %s "Enter pattern $i (Enter or Ctrl+D to stop entering patterns): "
read -r input
done
echo
grep "${pattern[#]}" somefile.txt
EDIT: This does not answer OP's question, this searches for multiple patterns with OR instead of AND...
Here is a working AND solution (it will stop prompting for patterns on the first empty one or after the 4th one):
pattern=
for i in {1..4}; do
printf %s "Enter pattern $i: "
read -r input
[[ $input ]] || break
pattern="${pattern:+"$pattern && "}/${input//\//\\/}/"
done
echo # skip a line
awk "$pattern" somefile.txt
Here are some links from which you can learn how to program in bash:
Bash Guide
Bash FAQ

Bash : How to check in a file if there are any word duplicates

I have a file with 6 character words in every line and I want to check if there are any duplicate words. I did the following but something isn't right:
#!/bin/bash
while read line
do
name=$line
d=$( grep '$name' chain.txt | wc -w )
if [ $d -gt '1' ]; then
echo $d $name
fi
done <$1
Assuming each word is on a new line, you can achieve this without looping:
$ cat chain.txt | sort | uniq -c | grep -v " 1 " | cut -c9-
You can use awk for that:
awk -F'\n' 'found[$1] {print}; {found[$1]++}' chain.txt
Set the field separator to newline, so that we look at the whole line. Then, if the line already exists in the array found, print the line. Finally, add the line to the found array.
Note: If a line will only be suppressed once, so if the same line appears, say, 6 times, it will be printed 5 times.

Correctly count number of lines a bash variable

I need to count the number of lines of a given variable. For example I need to find how many lines VAR has, where VAR=$(git log -n 10 --format="%s").
I tried with echo "$VAR" | wc -l), which indeed works, but if VAR is empty, is prints 1, which is wrong. Is there a workaround for this? Something better than using an if clause to check whether the variable is empty...(maybe add a line and subtract 1 from the returned value?).
The wc counts the number of newline chars. You can use grep -c '^' for counting lines.
You can see the difference with:
#!/bin/bash
count_it() {
echo "Variablie contains $2: ==>$1<=="
echo -n 'grep:'; echo -n "$1" | grep -c '^'
echo -n 'wc :'; echo -n "$1" | wc -l
echo
}
VAR=''
count_it "$VAR" "empty variable"
VAR='one line'
count_it "$VAR" "one line without \n at the end"
VAR='line1
'
count_it "$VAR" "one line with \n at the end"
VAR='line1
line2'
count_it "$VAR" "two lines without \n at the end"
VAR='line1
line2
'
count_it "$VAR" "two lines with \n at the end"
what produces:
Variablie contains empty variable: ==><==
grep:0
wc : 0
Variablie contains one line without \n at the end: ==>one line<==
grep:1
wc : 0
Variablie contains one line with \n at the end: ==>line1
<==
grep:1
wc : 1
Variablie contains two lines without \n at the end: ==>line1
line2<==
grep:2
wc : 1
Variablie contains two lines with \n at the end: ==>line1
line2
<==
grep:2
wc : 2
You can always write it conditionally:
[ -n "$VAR" ] && echo "$VAR" | wc -l || echo 0
This will check whether $VAR has contents and act accordingly.
For a pure bash solution: instead of putting the output of the git command into a variable (which, arguably, is ugly), put it in an array, one line per field:
mapfile -t ary < <(git log -n 10 --format="%s")
Then you only need to count the number of fields in the array ary:
echo "${#ary[#]}"
This design will also make your life simpler if, e.g., you need to retrieve the 5th commit message:
echo "${ary[4]}"
try:
echo "$VAR" | grep ^ | wc -l

Shell script: count the copies of each line from a txt

I would like to count the copies of each line in a txt file and I have tried so many things until know, but none worked well. In my case the text has just a word in each line.
This was my last try
echo -n 'enter file for edit: '
read file
for line in $file ; do
echo 'grep -w $line $file'
done; <$file
For example:
input file
a
a
a
c
c
Output file
a 3
c 2
Thanks in advance.
$ sort < $file | uniq -c | awk '{print $2 " " $1}'

Resources