I'm trying to create a for loop that counts the number of keywords in each file within a directory separately.
For example, if a directory has 2 files, I would like the for loop to say
foo.txt = found 4 keywords
bar.txt = found 3 keywords
So far I have written the below, but it gives the total number of keywords across all files instead of a count for each separate file.
The result I get is
7 keywords found
Instead of the desired output above
Here is what I came up with
for i in *; do egrep 'The version is out of date|Fixed in|Identified the following' /path/to/directory* | wc -l ; done
Just use grep's -c count option:
grep -c <pattern> /path/to/directory/*
You'll get something like:
bar.txt:2
foo.txt:1
Note this will count lines matched, not individual patterns matched. So if a pattern appears twice on a line, it will only count once.
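If you want output worded exactly like in the question, a minimal sketch built on the same idea (the directory path and the three patterns are taken from the question):

for f in /path/to/directory/*; do
    # count matching lines in this file, then report it in the asker's wording
    n=$(grep -c -E 'The version is out of date|Fixed in|Identified the following' "$f")
    echo "$(basename "$f") = found $n keywords"
done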
Related
I have lots of files in each year directory,
and each file contains long sentences like this, for example:
home/2001/2001ab.txt
the AAAS kill every one not but me and you and etc
the A1CF maybe color of full fill zombie
home/2002/2002ab.txt
we maybe know some how what
home/2003/2003ab.txt
Mr, Miss boston, whatever
aaas will will will long long
In the home directory I have home/reference.txt (a file listing words):
A1BG
A1CF
A2M
AAAS
I'd like to count how many of the words in reference.txt appear in every single year file.
This is my code, which I run inside each year directory
(home/2001/, home/2002/, home/2003/):
# awk
function search () {
awk -v pattern="$1" '$0 ~ pattern {print}' *.txt > $1
}
# load custom.txt
for i in $(cat reference.txt)
do
search $i
done
# word count
wc -l * > line-count.txt
This is my result:
home/2001/A1BG
$cat A1BG
0
home/2001/A1CF
$cat A1CF
1
home/2001/A2M
$cat A2M
0
home/2001/AAAS
$cat AAAS
1
home/2001/line-count.txt
$cat line-count.txt
2021ab.txt 2
A1BG
A1CF 1
A2M 0
AAAS 1
The resulting line-count.txt file has all the information I want,
but I have to repeat this work manually:
cd into a directory,
run my code,
and then cd into the next directory.
I have around 500 directories and files, so it is not easy.
The second problem is that it wastes a bunch of files:
it creates lots of files and takes too much time.
Because of this, at first I wanted to use the grep command,
but I don't know how to use a file containing a list of words instead of a single word;
that is why I used awk.
How can I do this more simply?
at first I wanted to use the grep command, but I don't know how to use a file
containing a list of words instead of a single word
You might use the --file=FILE option for that purpose; the selected file should hold one pattern per line.
How can I do this more simply?
You might use the --count option to avoid the need for wc -l. Consider the following simple example: let file.txt content be
123
456
789
and file1.txt content be
abc123
def456
and file2.txt content be
ghi789
xyz000
and file3.txt content be
xyz000
xyz000
then
grep --count --file=file.txt file1.txt file2.txt file3.txt
gives output
file1.txt:2
file2.txt:1
file3.txt:0
Observe that no files are created and a file without matches still appears in the output. Disclaimer: this solution assumes file.txt does not contain characters with special meaning for GNU grep; if that does not hold, do not use this solution.
(tested in GNU grep 3.4)
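To cover all the year directories without cd-ing into each one, a rough sketch along the same lines (it assumes the home/<year>/*.txt layout and home/reference.txt from the question; --with-filename keeps the file name in the output even when a directory holds only one .txt file):

for dir in home/*/; do
    # per file: how many lines match any pattern listed in reference.txt
    grep --with-filename --count --file=home/reference.txt "$dir"*.txt
done

Adding --word-regexp would restrict matches to whole words, if that is what is wanted.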
I am trying to count how many files have words with the pattern [Gg]reen.
#!/bin/bash
for File in `ls ./`
do
cat ./$File | egrep '[Gg]reen' | sed -n '$='
done
When I do this I get this output:
1
1
3
1
1
So I want to count those lines to get 5 in total. I tried using wc -l after the sed, but it didn't work; it counted the lines of all the files together. I tried redirecting with >file.txt, but it didn't write anything to it. And when I use >> instead it writes, but every time I execute the script it appends the lines again.
Since, according to your question, you want to know how many files contain a pattern, you are interested in the number of files, not the number of pattern occurrences.
For instance,
grep -l '[Gg]reen' * | wc -l
would print the number of files that contain green or Green somewhere as a substring.
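If instead you wanted the sum of the per-file line counts your loop prints (i.e. the total number of matching lines across all files), a small sketch:

grep '[Gg]reen' ./* | wc -l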
I am trying to execute this in Unix. So let's say, for example, I have five files named after dates, and each of those files contains thousands of numerical values (six- to ten-digit numbers). Now, let's say I also have a bunch of numerical values and I want to know which value belongs to which file. I am doing it the hard way, as below, but how do I put all my values in a file and just loop from there?
FILES:
20170101
20170102
20170103
20170104
20170105
Code:
for i in 5555555 67554363 564324323 23454657 666577878 345576867; do
echo $i; grep -l $i 201701*;
done
Or, why loop at all? If you have a file containing all your numbers (say numbers.txt), you can find which date file each one is in, and on what line, with a simple
grep -nH -w -f numbers.txt 201701*
The -f option tells grep to use the values contained in the file numbers.txt as the patterns to search for in each of the files matching 201701*. The -n and -H options list the line number and filename associated with each match, respectively. And, as Ed points out below, the -w option ensures grep only selects lines containing the whole word sought.
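Each match is printed as filename:line-number:matching-line, so with the sample values above you would see lines like 20170103:412:5555555 (the line number here is made up for illustration).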
You can also do it with a while loop, reading from the file, if you create it as @Barmar suggested:
while read -r i; do
...
done < numbers.txt
Put the values in a file numbers.txt and do:
for i in $(cat numbers.txt); do
...
done
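Either way, a minimal sketch with the body from the original hard-coded loop dropped in (numbers.txt is assumed to hold one value per line):

while read -r i; do
    echo "$i"
    grep -l -w "$i" 201701*
done < numbers.txt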
Stack Overflow already has some great posts about counting occurrences of a string (e.g. "foo"), like this one: count all occurrences of string in lots of files with grep. However, I've been unable to find an answer to a slightly more involved variant.
Let's say I want to count how many instances of "foo:[*whatever*]*whatever else*" exist in a folder; I'd do:
grep -or 'foo:[(.*)]' * | wc -l
and I'd get back "55" (or whatever the count is). But what if I have a file like:
foo:bar abcd
foo:baz efgh
not relevant line
foo:bar xyz
and I want to count how many instances of foo:bar there are vs. how many of foo:baz, etc.? In other words, I'd like output that's something like:
bar 2
baz 1
I assume there's some way to chain greps, or use a different command from wc, but I have no idea what it is ... any shell scripting experts out there have any suggestions?
P.S. I realize that if I knew the set of possible sub-strings (i.e. if I knew there were only "foo:bar" and "foo:baz") this would be simpler, but unfortunately the set of things that can come after "foo:" is unknown.
You could use sort and uniq -c:
$ grep -rohE 'foo:[^ ]+' * | sort | uniq -c
2 foo:bar
1 foo:baz
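If you want the exact bar 2 / baz 1 layout from the question (prefix stripped, name first), one possible refinement of the same pipeline:

grep -rohE 'foo:[^ ]+' * | sed 's/^foo://' | sort | uniq -c | awk '{ print $2, $1 }'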
How do I find the most frequently occurring first variable of a line in all the files under a directory (where subdirectories should also be checked)?
I want to look at the first variable of every line in my files (all the files in lots of folders under the current directory) and find which first variable appears the most times.
I am trying to use awk like this:
awk -f : { print $1} FILENAME
EDIT:
I will explain the purpose:
I have a server and i want to filter his logs cause I have a certain IP which repeat every day 100 times the first var in line is the ip
I want the find what is the ip which repeats problem : i have two servers therefore checking this will not be effiant by checking one log for 100 times I hope that this script will help me find out what is the IP that repeats ...
You should rewrite your question to make it clearer. I understood that you want to know which first lines are most common across a set of files. For that, I'd use this:
head -qn 1 * | sort | uniq -c | sort -nr
head prints the first line of every file in the current directory. -q stops it from printing each file's name as well; -n lets you specify the number of lines.
sort orders them so that identical first lines end up next to each other.
uniq -c counts the occurrences, that is, the number of repeated lines in each block produced by the previous sort.
sort -nr orders them with the most popular coming first. -n sorts numerically; -r reverses the default ascending order.
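Given the edit (finding the IP that shows up most often as the first field of the log lines, across files in subdirectories too), a possible sketch along the same lines:

find . -type f -exec awk '{ print $1 }' {} + | sort | uniq -c | sort -nr | head -1

This prints the count and the most frequent first field (the IP) across every file under the current directory.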
Not sure if this helps; the question is not very clear.
See if something like this can help.
find . -type f -name "*.*" -exec head -1 {} \; 2>/dev/null | awk -F':' 'BEGIN {max=0;}{if($2>max){max=$2;}}END{print max;}'
find - finds all regular files (-type f) with any name and extension (*.*) from the current directory downwards and gets the first line of each of those files.
awk - sets the field separator to : (-F':') and, before processing the first line, the BEGIN block sets max to 0.
For each line it takes the second field after the : ($2) and checks whether $2 is greater than the current max; if it is, that field becomes the new max.
At the end of processing all the lines (the first lines of all the files under the current directory), the END block prints the max value.
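For example (with hypothetical first lines), if the files started with a:5, b:12, and c:7, the pipeline would print 12.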