How to grep only matched files with count? - bash

I have many text files and I want to list all files that contain foo.
grep -rc will display both matched and unmatched files, but I only want to see matched files. So I came up with this ugly pipeline:
grep -rc "foo" | grep -v ":0$"
The pipeline drops auto coloring.
Can I get the same result without a pipeline?

No, GNU grep doesn't provide such a feature. But you can tell grep to always color the output, then filter out lines ending in :<some ANSI escape codes>0 like this:
grep --color=always -rc 'foo' | grep -v $':\x1B\[m\x1B\[K0$'
($'...' syntax is a bash extension though.)
Below is a screenshot of the result on my terminal.
See ANSI-C Quoting, ANSI escape code.
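As a rough sketch of the filtering step, the helper below (the name drop_zero_counts is made up for this example) removes zero-count lines whether or not they are colored. It assumes the colored separator is followed by exactly the escape sequence used in the answer above; a different GREP_COLORS setting or grep version may emit something else, in which case the pattern needs adjusting. The input is simulated with printf so the example does not depend on a particular grep's color output.

```shell
# Hypothetical helper: drop "file:0" lines from `grep -c` output read on stdin,
# whether the lines are plain or colored.
drop_zero_counts() {
  esc=$(printf '\033')   # the ESC byte, written \x1B in the answer
  grep -v -e ':0$' -e ":${esc}\[m${esc}\[K0\$"
}

# Simulated uncolored `grep -rc` output: b.txt has no matches, c.txt has ten.
plain=$(printf 'a.txt:2\nb.txt:0\nc.txt:10\n' | drop_zero_counts)

# Simulated colored output, using the escape sequence assumed in the answer.
colored=$(printf 'a.txt:\033[m\033[K2\nb.txt:\033[m\033[K0\n' | drop_zero_counts)

printf '%s\n' "$plain" "$colored"
```

In real use this would be something like grep --color=always -rc 'foo' | drop_zero_counts. Note that c.txt:10 is kept, because the pattern requires the colon to sit directly before the final 0.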

Related

What is the best way to process multiple lines and extract a large list of specific strings? [duplicate]

I'm after a grep-type tool to search for purely literal strings. I'm looking for the occurrence of a line of a log file, as part of a line in a separate log file. The search text can contain all sorts of regex special characters, e.g., []().*^$-\.
Is there a Unix search utility which would not use regex, but just search for literal occurrences of a string?
You can use grep for that, with the -F option.
-F, --fixed-strings PATTERN is a set of newline-separated fixed strings
That's either fgrep or grep -F which will not do regular expressions. fgrep is identical to grep -F but I prefer to not have to worry about the arguments, being intrinsically lazy :-)
grep -> grep
fgrep -> grep -F (fixed)
egrep -> grep -E (extended)
rgrep -> grep -r (recursive, on platforms that support it).
Pass -F to grep.
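To illustrate the difference, here is a small sketch with a made-up log file and search string. The same text is used once as a regular expression and once as a fixed string with -F.

```shell
# Throwaway log file; the contents are invented for this example.
log=$(mktemp)
printf '%s\n' 'value [a.b]* found' 'value aab found' > "$log"

needle='[a.b]*'

# As a regex, [a.b]* can match the empty string, so every line matches.
regex_hits=$(grep -c -- "$needle" "$log")

# With -F the brackets, dot and star are plain characters.
fixed_hits=$(grep -cF -- "$needle" "$log")

printf 'regex: %s, fixed: %s\n' "$regex_hits" "$fixed_hits"
rm -f "$log"
```

The -- is there so that a search string starting with a dash is not taken for an option.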
You can also use awk, which can compare fields against a fixed string and gives you programming capabilities as well, e.g.:
awk '{for(i=1;i<=NF;i++) if($i == "mystring") {print "do data manipulation here"} }' file
cat list.txt
one:hello:world
two:2:nothello
three:3:kudos
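grep --color=always -F "hello
three" list.txt
output
one:hello:world
two:2:nothello
three:3:kudos
(two:2:nothello is printed as well, because -F matches substrings, not whole words.)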
I really like the -P flag available in GNU grep for selective ignoring of special characters.
It makes grep -P "^some_prefix\Q[literal]\E$" possible
From the grep manual:
-P, --perl-regexp
Interpret PATTERNS as Perl-compatible regular
expressions (PCREs). This option is experimental when
combined with the -z (--null-data) option, and grep -P may
warn of unimplemented features.

Grep multiple strings from text file

Okay, so I have a text file containing multiple strings, for example:
Hello123
Halo123
Gracias
Thank you
...
I want grep to use these strings to find lines containing any of them in other files within a directory.
Example of a text file being grepped:
123-example-Halo123
321-example-Gracias-com-no
321-example-match
So in this instance the output should be:
123-example-Halo123
321-example-Gracias-com-no
With GNU grep:
grep -f file1 file2
-f FILE: Obtain patterns from FILE, one per line.
Output:
123-example-Halo123
321-example-Gracias-com-no
You should probably look at the manpage for grep to get a better understanding of what options are supported by the grep utility. However, there are a number of ways to achieve what you're trying to accomplish. Here's one approach:
grep -e "Hello123" -e "Halo123" -e "Gracias" -e "Thank you" list_of_files_to_search
However, since your search strings are already in a separate file, you would probably want to use this approach:
grep -f patternFile list_of_files_to_search
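A quick sketch with the sample data from the question shows that the two forms select the same lines. The temporary directory and the data.txt file name are made up for the example.

```shell
# Recreate the question's sample data in a throwaway directory.
work=$(mktemp -d)
printf '%s\n' 'Hello123' 'Halo123' 'Gracias' 'Thank you' > "$work/patternFile"
printf '%s\n' '123-example-Halo123' '321-example-Gracias-com-no' \
  '321-example-match' > "$work/data.txt"

# One -e per pattern on the command line...
from_flags=$(grep -e 'Hello123' -e 'Halo123' -e 'Gracias' -e 'Thank you' "$work/data.txt")

# ...or the same patterns read from a file, one per line.
from_file=$(grep -f "$work/patternFile" "$work/data.txt")

printf '%s\n' "$from_file"
rm -rf "$work"
```

One thing to watch for: an empty line in the pattern file is an empty pattern, which matches every input line.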
I can think of two possible solutions for your question:
Use multiple regular expressions - a regular expression for each word you want to find, for example:
grep -e Hello123 -e Halo123 file_to_search.txt
Use a single regular expression with an "or" operator. Using Perl regular expressions, it will look like the following:
grep -P "Hello123|Halo123" file_to_search.txt
EDIT:
As you mentioned in your comment, you want to use a list of words to find from a file and search in a full directory.
You can manipulate the words-to-find file to look like -e flags concatenation:
cat words_to_find.txt | sed 's/^/-e "/;s/$/"/' | tr '\n' ' '
This will return something like -e "Hello123" -e "Halo123" -e "Gracias" -e "Thank you", which you can then pass to grep using xargs:
cat words_to_find.txt | sed 's/^/-e "/;s/$/"/' | tr '\n' ' ' | xargs grep dir_to_search/*
As you can see, the last command also searches in all of the files in the directory.
SECOND EDIT: as PesaThe mentioned, the following command does this in a much simpler and more elegant way:
grep -f words_to_find.txt dir_to_search/*
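Here is a minimal sketch of the directory-wide form, using invented file names (one.log, two.log) under a temporary directory. When grep is given more than one file, it prefixes each hit with the name of the file it came from. -F is added on the assumption that the words are meant literally.

```shell
# Throwaway layout: a word list plus a directory with two files to search.
work=$(mktemp -d)
mkdir "$work/dir_to_search"
printf '%s\n' 'Halo123' 'Gracias' > "$work/words_to_find.txt"
printf '%s\n' '123-example-Halo123' '321-example-match' > "$work/dir_to_search/one.log"
printf '%s\n' '321-example-Gracias-com-no' > "$work/dir_to_search/two.log"

# Search every file in the directory for any of the listed words.
hits=$(cd "$work" && grep -F -f words_to_find.txt dir_to_search/*)

printf '%s\n' "$hits"
rm -rf "$work"
```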

Grep with a variable

I would like to grep a number of documents by using a set of search terms and to specify the number of characters after match. Here is what I tried
grep -F -o -P "$(<search.txt).{0,4}" foo.txt
but I get the message 'grep: conflicting matchers specified' because -F and '-oP' cannot be combined. It does not work with '-E' either.
-F and -P are conflicting options, simple as that. The first means that the patterns are fixed strings, the second means that the patterns are Perl-compatible regular expressions. Perhaps you meant to use -f instead, which reads patterns from a file or a process substitution.
If you want to match any of the patterns in your file, followed by up to 4 characters, you could use something like this
grep -oP -f <(awk '{print $0 ".{0,4}"}' search.txt) file
This appends .{0,4} to each pattern line before grep reads it.
Alternatively, a more portable and concise version would be this:
sed 's/$/.{0,4}/' search.txt | grep -f - -oP file
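If your grep -P refuses more than one pattern (GNU grep has historically rejected multi-line pattern lists with -P), a variant with -E may be enough, since .{0,4} is also valid in extended regular expressions. The sketch below uses made-up contents for search.txt and foo.txt and writes the generated patterns to a temporary file. It assumes the search terms contain no regex special characters.

```shell
# Invented sample data for the two files named in the question.
work=$(mktemp -d)
printf '%s\n' 'alpha' 'beta' > "$work/search.txt"
printf '%s\n' 'xx alpha1234567' 'beta-9' 'gamma' > "$work/foo.txt"

# Append ".{0,4}" to every search term: the term plus up to 4 more characters.
sed 's/$/.{0,4}/' "$work/search.txt" > "$work/patterns.txt"

# -o prints only the matched part, -E selects extended regular expressions.
hits=$(grep -oE -f "$work/patterns.txt" "$work/foo.txt")

printf '%s\n' "$hits"
rm -rf "$work"
```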

Passing filepaths containing spaces with xargs

I'm trying to use xargs to pass the contents of a variable containing zero or more filepaths separated by newlines to another command and have been having inconsistent success.
My input is the output of this:
newHTK=`grep -Fxv -f $TMPFILE /Users/foo/.htk`
Which generates the aforementioned list of filenames. Here's where things go wrong (or sometimes inexplicably right):
echo "$newHTK" | xargs -L 1 xattr -w com.apple.metadata:kMDItemFinderComment htk
The intention is to use each line in $newHTK as a filename argument for xattr. What usually happens is xattr splits the input at the spaces. I think I might need to escape the filenames coming out of the echo command or somehow enclose them in double quotation marks (Any advice on an easy way to do this would be appreciated). But if that's the case why did it work for some of the files?
You can use the xargs -I flag to do this (if your xargs has it; I don't know how portable it is).
grep -Fxv -f $TMPFILE /Users/foo/.htk | xargs -I % xattr -w com.apple.metadata:kMDItemFinderComment htk %
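The difference between the two xargs modes can be sketched with printf standing in for xattr, so it runs anywhere. The file names are invented. With -L 1 each input line is still split on blanks, which is why names without spaces happened to work. With -I each whole line replaces the placeholder as a single argument.

```shell
# -L 1: one command per line, but the line is still split on blanks.
split=$(printf '%s\n' 'my file.txt' 'other.txt' | xargs -L 1 printf '[%s]\n')

# -I {}: the whole line is substituted for {} as one argument.
# {} is used as the placeholder here because % also appears in the printf format.
whole=$(printf '%s\n' 'my file.txt' 'other.txt' | xargs -I {} printf '[%s]\n' {})

printf 'with -L 1:\n%s\nwith -I:\n%s\n' "$split" "$whole"
```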

Use lines in a file as filenames for grep?

I have a file which contains filenames (and the full path to them) and I want to search for a word within all of them.
some pseudo-code to explain:
grep keyword <all files specified in files.txt>
or
cat files.txt > grep keyword
cat files.txt | grep keyword
The problem is that I can only get grep to search the filenames, not the contents of the actual files.
cat files.txt | xargs grep keyword
or
grep keyword `cat files.txt`
or (equivalent to previous but harder to mis-read)
grep keyword $(cat files.txt)
should do the trick.
Pitfalls:
If files.txt contains file names with spaces, either solution will malfunction, because "This is a filename.txt" will be interpreted as four files, "This", "is", "a", and "filename.txt". A good reason why you shouldn't have spaces in your filenames, ever.
There are ways around this, but none of them is trivial. (find ... -print0 / xargs -0 is one of them.)
The second (cat) version can result in a very long command line (which might fail when exceeding the limits of your environment). The first (xargs) version handles long input automatically; xargs offers several options to control the details.
Both of the answers from DevSolar work (tested on Linux Ubuntu), but the xargs version is preferable if there may be many files, since it will avoid running into command line length limits.
so:
cat files.txt | xargs grep keyword
is the way to go
tr '\n' '\0' <files.txt | LANG=C xargs -r0 grep -F keyword
tr will delimit names with the NUL character so that spaces are not significant (note the corresponding -0 option to xargs).
xargs -r will start a single grep process for a "large" number of files, but not start any grep process if there are no files.
LANG=C means use quick routines for matching, rather than slow locale ones.
grep -F means use quick string matching rather than slow regular expression matching.
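A small sketch of the NUL-delimited approach, with a file name that contains a space. The directory and file names are invented, -l is added so that only the matching file names are printed, and -r is left out because not every xargs has it.

```shell
# Throwaway files: one name contains a space, only that file has the keyword.
work=$(mktemp -d)
printf 'keyword here\n' > "$work/has space.txt"
printf 'nothing\n' > "$work/plain.txt"
printf '%s\n' "$work/has space.txt" "$work/plain.txt" > "$work/files.txt"

# Turn newlines into NULs so xargs -0 treats each line as exactly one file name.
hits=$(tr '\n' '\0' < "$work/files.txt" | xargs -0 grep -lF keyword)

printf '%s\n' "$hits"
rm -rf "$work"
```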
bash, ksh & zsh version:
grep keyword $(<files.txt)
It has been a long time since I last wrote a bash shell script, but you could store the result of the first grep (the one finding all the filenames) in an array and iterate over it, issuing further grep commands.
A good starting point should be the bash scripting guide.
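The iteration idea could look roughly like the sketch below. It swaps the array for a plain read loop over files.txt, which works in any POSIX shell and keeps names with spaces intact. The file names are made up for the example.

```shell
# Throwaway files listed one per line in files.txt.
work=$(mktemp -d)
printf 'keyword here\n' > "$work/has space.txt"
printf 'nothing\n' > "$work/plain.txt"
printf '%s\n' "$work/has space.txt" "$work/plain.txt" > "$work/files.txt"

# Read one file name per line and run grep on each; IFS= and -r keep the
# line exactly as written, including spaces and backslashes.
hits=$(while IFS= read -r path; do
  [ -n "$path" ] || continue            # skip blank lines
  grep -lF -- keyword "$path" || true   # a file without a match is not an error
done < "$work/files.txt")

printf '%s\n' "$hits"
rm -rf "$work"
```

This starts one grep per file, so for very long lists the xargs form is likely to be faster.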

Resources