Count files matching a pattern with grep - Windows

I am on a Windows server and have installed Grep for Windows. I need to count the number of file names that match (or do not match) a specific pattern. I don't need all the filenames listed out; I just need a total count of how many matched. The tree structure I will be searching is fairly large, so I'd like to conserve as much processing as possible.
I'm not very familiar with grep, but it looks like I can use the -l option to search for file names matching a given pattern. So, for example, I could use
$grep -l -r this *.doc*
to search for all MS Word files in the current folder and all child folders. This would then return a listing of all those files. I don't want the listing, I just want a count of how many it found. Is this possible with grep... or another tool?
Thanks!

On Linux you would use
grep -l -r this .doc | wc -l
to count the printed lines, one per matching file.
Note that -r with .doc does not actually search all Word files; use --include "*doc" with . as the search path instead.
And if you do not have wc, you can use grep again to count the lines:
grep -l -r --include "*doc" this . | grep -c .
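If you only need to count file names that match a pattern (rather than files whose contents match), find alone can do it without opening any files. A minimal sketch, assuming a find port is installed alongside Grep for Windows:
find . -type f -name "*.doc*" | grep -c .
Negating the test with ! -name "*.doc*" counts the files that do not match instead.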

Related

How can I iterate over a list of source files and locate those files on my disk drive? I'm using FD and RIPGREP

I have a very long list of files stored in a text file (missing-files.txt) that I want to locate on my drive. These files are scattered across different folders on my drive. I want to get the closest available match that can be found.
missing-files.txt
wp-content/uploads/2019/07/apple.jpg
wp-content/uploads/2019/08/apricots.jpg
wp-content/uploads/2019/10/avocado.jpg
wp-content/uploads/2020/04/banana.jpg
wp-content/uploads/2020/07/blackberries.jpg
wp-content/uploads/2020/08/blackcurrant.jpg
wp-content/uploads/2021/06/blueberries.jpg
wp-content/uploads/2021/01/breadfruit.jpg
wp-content/uploads/2021/02/cantaloupe.jpg
wp-content/uploads/2021/03/carambola.jpg
....
Here's my working bash code:
while read p; do
  file="${p##*/}"
  /usr/local/bin/fd "${file}" | /usr/local/bin/rg "${p}" | /usr/bin/head -n 1 >> collected-results.txt
done < missing-files.txt
What's happening in my bash code:
I iterate over my list of files
I use FD (https://github.com/sharkdp/fd) to locate those files on my drive
I then pipe the result to RIPGREP (https://github.com/BurntSushi/ripgrep) to filter it down to the closest match. The match I'm looking for should have the same file and folder structure. I limit it to one result.
Finally, I store it in another text file where I can later evaluate the list for the next step
Where I need help:
Is this the most efficient way to do this? I have over 2,000 files that I need to locate. I'm open to other solutions; this is just something I devised.
For some reason my code broke; it stopped returning results to "collected-results.txt". My guess is that it broke somewhere in the second pipe, right after the FD command. I haven't set up any condition for when it encounters an error or can't find a file, so it's hard for me to determine.
Additional Information:
I'm using a Mac running Catalina
Clearly this is not my area of expertise
"Missing" sounds like they do not exist where expected.
What makes you think they would be somewhere else?
If they are, I'd put the filenames in a list.txt file with just enough of a pattern to pick them out of the output of find.
$: cat list.txt
/apple.jpg$
/apricots.jpg$
/avocado.jpg$
/banana.jpg$
/blackberries.jpg$
/blackcurrant.jpg$
/blueberries.jpg$
/breadfruit.jpg$
/cantaloupe.jpg$
/carambola.jpg$
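That list could be generated from missing-files.txt itself. A sketch, assuming every entry contains at least one slash:
$: sed 's|.*/|/|; s|$|$|' missing-files.txt > list.txt
It strips everything up to the last slash and anchors each basename with a trailing $.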
Then search the whole machine, which is gonna take a bit...
$: find / | grep -f list.txt
/tmp/apricots.jpg
/tmp/blackberries.jpg
/tmp/breadfruit.jpg
/tmp/carambola.jpg
Or if you want those longer partial paths,
$: find / | grep -f missing-files.txt
That should show you the actual paths to wherever those files exist IF they do exist on the system.
From the way I understand it, you want to find all files that could match the directory structure:
path/to/file
So it should return something like "/full/path/to/file" and "/another/full/path/to/file"
Using a single find command you can get a list of all files that match these criteria, searching the whole disk in one go:
$ find -regex pattern
The idea is now to build pattern, which we can do from the file missing_files.txt. The pattern should look something like .*/\(file1\|file2\|...\|filen\). We can use the following sed command to do so:
$ sed ':a;N;$!ba;s/\n/\|/g' missing_files.txt
So now we can do exactly what you did, but a bit quicker, in the following way:
pattern="$(sed ':a;N;$!ba;s/\n/\|/g' missing_files.txt)"
pattern=".*/\($pattern\)"
find -regex "$pattern" > file_list.txt
In order to find the files, you can now do something like:
grep -F -f missing_files.txt file_list.txt
This will return all the matching cases. If you just want the first match for each missing file, you can use awk:
awk '(NR==FNR){a[$0]++;next}{for(i in a) if (!(i in b)) if ($0 ~ i) {print; b[i]}}' missing_files.txt file_list.txt
Is this the most efficient way to do this?
I/O is usually the biggest bottleneck, and you are running fd once per file, one at a time. Instead, run a single search that finds all the files at once. In the shell you would do:
find . -type f '(' -name "first name" -o -name "other name" -o .... ')'
How can I iterate over a list of source files and locate those files on my disk drive?
Use -path to match the full path. First build the arguments then call find.
findargs=()
# Read bashfaq/001
while IFS= read -r patt; do
  # I think */ should match anything in front.
  findargs+=(-o -path "*/$patt")
done < <(
  # TODO: escape glob better, not tested
  # see https://pubs.opengroup.org/onlinepubs/009604499/utilities/xcu_chap02.html#tag_02_13
  sed 's/[?*[]/\\&/g' missing-files.txt
)
# remove leading -o
unset 'findargs[0]'
find / -type f '(' "${findargs[@]}" ')'
Topics to research: var=() - bash arrays, < <(...) shell redirection with process substitution and when to use it (bashfaq/024), glob (and see man 7 glob) and man find.

find/grep to list files that contain a specific string

I have a root directory that I need to run a find and/or grep command on to return a list of files that contain a specific string.
Here's an example of the file and directory set up. In reality, this root directory contains a lot of subdirectories that each have a lot of subdirectories and files, but this example, I hope, gets my point across.
From root, I need to go through each of the child directories, specifically into subdir/, and look through file.html for the string "example:". If a result is found, I'd like it to print the full path to file.html, such as website_two/subdir/file.html.
I figured limiting the search to subdir/file.html would greatly increase the speed of this operation.
I'm not too knowledgeable about find and grep, but I have tried the following with no luck, and I honestly don't know how to troubleshoot it.
find . -name "file.html" -exec grep -HI "example:" {} \;
EDIT: I understand this may be marked as a duplicate, but I think my question is more along the lines of how I can tell the command to only search a specific file in a specific path, looping through all the directories under root.
find ./ -type f -iname file.html -exec grep -l "example:" {} +
or
grep -Rl "example:" ./ | grep -iE "file\.html?$"
will do the trick.
Quote from GNU Grep 2.25 man page:
-R, --dereference-recursive
Read all files under each directory, recursively. Follow all symbolic links, unlike -r.
-l, --files-with-matches
Suppress normal output; instead print the name of each input file from which output would normally have been printed. The scanning will stop on the first match.
-i, --ignore-case
Ignore case distinctions in both the PATTERN and the input files.
-E, --extended-regexp
Interpret PATTERN as an extended regular expression.
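Since the question only cares about subdir/file.html one level down, a -path test can narrow the search so grep never opens anything else. A sketch, assuming the layout described in the question:
find . -path "*/subdir/file.html" -exec grep -l "example:" {} +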

How to loop through all files in a directory and find if a value exists in those files using a shell script

I have a directory PAYMENT. Inside it I have some text files.
xyx.txt
agfh.txt
hjul.txt
I need to go through all these files and find how many entries in each file contain text like BPR.
If it's one I need to get an alert. For example, if xyx.txt contains only one BPR entry, I need to get an alert.
You do not need to loop; something like this should do it:
grep -l "BPR" /your/path/PAYMENT/*
grep is a tool to find lines matching a pattern in files.
-l shows which files have that string.
"BPR" is the string you are looking for.
/your/path/PAYMENT/* means that it will grep through all the files in that dir.
If you want to search only within specific kinds of files or inside certain directories, say so, because the command would vary a little.
Update
Based on your new requests:
I need to go through all these files and find how many entries in each file contain text like BPR.
If it's one I need to get an alert. For example, if xyx.txt contains only one BPR entry, I need to get an alert.
grep -c is your friend (more or less). So what you can do is:
if [ "$(grep -c a_certain_file)" -eq 1 ]; then
echo "mai dei mai dei"
fi
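Putting it together, a minimal sketch that checks every file in the directory (the path and the alert message are placeholders):
for f in /your/path/PAYMENT/*.txt; do
  if [ "$(grep -c "BPR" "$f")" -eq 1 ]; then
    echo "alert: $f contains exactly one BPR entry"
  fi
done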
I need to go through all these files and find whether there is an entry like 'BPR'
If you are looking for a command to find file names that contain BPR then use:
echo /path/to/PAYMENT/*BPR*.txt
If you are looking for a command to find files whose contents contain the text BPR, then use:
grep -l "BPR" /path/to/PAYMENT/*.txt
Give this a try:
grep -o BPR {path}/* | wc -l
where {path} is the location of all the files. It'll give you JUST the number of occurrences of the string "BPR".
Also, FYI - this has also been talked about here: count all occurrences of string in lots of files with grep

Bash script to search for the words from a file across the whole server

Does anyone have an idea for a script that takes some words from a file and then uses find, grep, or fgrep to search the whole server?
I was thinking of something like this:
fgrep 'cat /root/my-wordfile' /home//
but my idea is not so great. I have tried to find something similar, but so far no success.
My script should search the server's files for some words, but there are more than 10 words I want to check, so a file with the words would be better than searching for every word with a separate command.
If you have saved your patterns in the file /root/my-wordfile, then you can use the following to find "all" the files under /home
grep -rf /root/my-wordfile /home
-r searches recursively, and -f says the patterns are in /root/my-wordfile (one per line)
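For illustration, a hypothetical word file; grep -f treats each line as a separate pattern, and the words here are placeholders:
$: cat /root/my-wordfile
error
timeout
failed
Any file under /home containing at least one of those words will be reported.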
Or if you want to search files with specific extension or names, you could use find as below:
For example, if you want to search all *.c files:
find /home -name "*.c" -exec grep -Hnf /root/my-wordfile {} \;
Here -H would list the file name, -n would display line number.

grep for string in all files in directories with certain names

I have csv files in directories with this structure:
20090120/
20090121/
20090122/
etc...
I want to grep for a certain string in all of the csv files in these directories, but only for January 2009, e.g. 200901*/*.csv
Is there a bash command line argument that can do this?
Yes. grep "a certain string" 200901*/*.csv.
You need something like:
grep NEEDLE 200901*/*.csv
(assuming your search string is NEEDLE of course - just change it to whatever you're actually looking for).
The bash shell is quite capable of expanding multi-level paths and file names.
That is, of course, limited to the CSV files one directory down. If you want to search entire subtrees, you'll have to use the slightly more complicated (and adaptable) find command.
Though, assuming you can set a limit on the depth, you could get away with something like (for three levels):
grep NEEDLE 200901*/*.csv 200901*/*/*.csv 200901*/*/*/*.csv
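For reference, a sketch of that find-based variant, searching any depth under the January 2009 directories (assuming GNU find and grep):
find 200901*/ -name "*.csv" -exec grep -l "NEEDLE" {} +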
Try this:
grep -lr "Pattern" 200901*/*.csv
Try a combination of find to match the file name pattern and grep to find the string:
find . -name "*.csv" -print -exec grep -n "NEEDLE" {} \; | grep -B1 "NEEDLE"
The -print before -exec emits each file name, and the final grep -B1 then shows each matching line together with the line before it, which is the file name for a file's first match.
