silver_searcher (ag) with multiple search expressions?

Does silver_searcher support specifying multiple search expressions something like -e in grep?
I could not find any option in the document/help.

You might want to combine searches with Boolean operators:
AND: ag -l pattern1 | xargs ag -l pattern2 | xargs ag 'pattern1|pattern2'
Add -d '\n' to xargs to handle spaces in filenames.
OR: ag 'pattern1|pattern2'
NOT: ag -v 'pattern'
From manual:
-l --files-with-matches:
Only print the names of files containing matches, not the matching lines. An empty query will print all files that would be searched.
-v --invert-match: Match every line not containing the specified pattern.
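The AND chain above can be tried on throwaway files. This is only a sketch: grep -l stands in for ag -l so that it runs where ag is not installed (the chaining works the same way), it assumes GNU xargs for -d, and the directory and file names are made up for illustration.

```shell
# Sketch: AND search across files whose names contain spaces.
# Assumption: grep -l is used as a stand-in for ag -l; file names are hypothetical.
dir=$(mktemp -d)
printf 'pattern1\npattern2\n' > "$dir/both words.txt"
printf 'pattern1 only\n'      > "$dir/first only.txt"
printf 'pattern2 only\n'      > "$dir/second only.txt"

# Stage 1 lists files containing pattern1; stage 2 keeps those that also
# contain pattern2. -d '\n' (GNU xargs) treats each input line as one argument,
# so a space inside a file name does not split it.
and_files=$(grep -l 'pattern1' "$dir"/* | xargs -d '\n' grep -l 'pattern2')
echo "$and_files"

# OR (alternation) and NOT (inverted match), for comparison.
or_count=$(cat "$dir"/* | grep -c -E 'pattern1|pattern2')
not_lines=$(grep -v 'pattern1' "$dir/both words.txt")
echo "$or_count"
echo "$not_lines"

rm -rf "$dir"
```

Without -d '\n', xargs would split "both words.txt" into two arguments and the second stage would report missing files.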

According to the documentation, it doesn't support multiple search patterns. That said, it does support being driven by GNU parallel, so you can fire off multiple instances of ag for a multi-search (printf is used here because not every shell's echo expands \n):
printf 'foo\nbar\nbaz\n' | parallel 'ag --parallel --color "{}" *'
The output using the --parallel switch will be filename, linenumber and match. If that's too fancy, you can always use the OR operator in your pattern search:
ag --color "foo|bar|baz" *

Yes, you can search for multiple patterns by separating each pattern with a vertical line character (|):
ag 'pattern1|pattern2'

Related

What is the best way to process multiple lines and extract a large list of specific strings? [duplicate]

I'm after a grep-type tool to search for purely literal strings. I'm looking for the occurrence of a line of a log file, as part of a line in a separate log file. The search text can contain all sorts of regex special characters, e.g., []().*^$-\.
Is there a Unix search utility which would not use regex, but just search for literal occurrences of a string?

You can use grep for that, with the -F option.
-F, --fixed-strings PATTERN is a set of newline-separated fixed strings

That's either fgrep or grep -F which will not do regular expressions. fgrep is identical to grep -F but I prefer to not have to worry about the arguments, being intrinsically lazy :-)
grep -> grep
fgrep -> grep -F (fixed)
egrep -> grep -E (extended)
rgrep -> grep -r (recursive, on platforms that support it).

Pass -F to grep.
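To see the difference -F makes, here is a small sketch on made-up log lines (the file contents and the needle are assumptions chosen for illustration). The same search text is tried once as a regex and once as a fixed string.

```shell
# Sketch: the same search text as a regex and as a literal (-F).
# The two log lines and the needle are hypothetical sample data.
log=$(mktemp)
printf '%s\n' 'cost 1.50 [net]' 'cost 1x50 net' > "$log"
needle='1.50 [net]'

# As a regex, "." matches any character and "[net]" matches ONE of n, e, t,
# so the pattern hits the second line and misses the literal text.
regex_hit=$(grep -- "$needle" "$log")

# With -F every character of the needle is taken literally.
fixed_hit=$(grep -F -- "$needle" "$log")

echo "regex: $regex_hit"
echo "fixed: $fixed_hit"
rm -f "$log"
```

The -- guards against a needle that happens to start with a dash.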

You can also use awk, which can compare against fixed strings and gives you programming capabilities as well, e.g.:
awk '{for(i=1;i<=NF;i++) if($i == "mystring") {print "do data manipulation here"} }' file

cat list.txt
one:hello:world
two:2:nothello
three:3:kudos
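grep --color=always -w -F "hello
three" list.txt
output
one:hello:world
three:3:kudos
-w restricts matches to whole words; without it, hello would also match nothello on the second line.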

I really like the -P flag available in GNU grep for selective ignoring of special characters.
It makes grep -P "^some_prefix\Q[literal]\E$" possible
from grep manual
-P, --perl-regexp
Interpret PATTERNS as Perl-compatible regular
expressions (PCREs). This option is experimental when
combined with the -z (--null-data) option, and grep -P may
warn of unimplemented features.
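A quick way to check what \Q...\E buys you, as a sketch on made-up input; it assumes a grep built with -P support, and the three sample lines are hypothetical.

```shell
# Sketch: \Q...\E makes part of a PCRE literal (requires grep -P support).
# The sample lines are hypothetical.
f=$(mktemp)
printf '%s\n' 'some_prefix[literal]' 'some_prefixl' 'xsome_prefix[literal]' > "$f"

# Quoted: the brackets are literal text, and ^ still anchors the prefix.
quoted=$(grep -P '^some_prefix\Q[literal]\E$' "$f")

# Unquoted: [literal] is a character class matching a single letter.
unquoted=$(grep -P '^some_prefix[literal]$' "$f")

echo "quoted:   $quoted"
echo "unquoted: $unquoted"
rm -f "$f"
```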

How to extract codes using the grep command?

I have a file with below input lines.
John|1|R|Category is not found for local configuration/code/123.NNN and customer 113
TOM|2|R|Category is not found for local configuration/code/123.NNN and customer 114
PETER|3|R|Category is not found for local configuration/code/456.1 and customer 115
I need to extract only the code that follows configuration/code/ (for example, 123.NNN) using the grep command.
I tried the below command and didn't get the proper result. Getting the extra 2 unwanted characters in the output. Please suggest if there is any other way to achieve this through grep command.
find ./ -type f -name <FileName> -exec cut -f 4 -d'|' {} + |
grep -o 'Category is not found for local configuration/code/...\....' |
grep -o '...\....' | sort | uniq
Current Output:
123.NNN
456.1 a
Expected output:
123.NNN
456.1

You can use another grep regular expression.
find ./ -type f -name f -exec cut -f 4 -d'|' {} + |
grep -o 'Category is not found for local configuration/code/...\.[^ ]*' |
grep -o '...\..*' | sort | uniq
. matches any character, [^ ]* matches any sequence of characters until the first space
Output:
123.NNN
456.1

Your regex specifies a fixed character width for strings of variable width. Based on your examples, something like
[0-9]\+\.[A-Z0-9]\+
would seem like a better regex. However, we could probably also simplify this by merging the cut and multiple grep commands into a single Awk script.
find etc etc -exec awk -F '|' '
$4 ~ /Category is not found for local configuration\/code\/[0-9]{3}\.[0-9A-Z]/ {
split($4, a, /\/code\//);
split(a[2], b); print b[1] }' {} + |
sort -u
The two split operations are just a cheap way to pick out the text between /code/ and the next whitespace character; we have already established by way of the regex match that the string after /code/ matches the pattern we're after.
Notice also how sort has a -u option which allows you to replace (trivial cases of) uniq.
The regex variant supported by Awk is slightly different from that supported by POSIX grep: the backslashed \+ in grep's BRE dialect is plain + in the dialect called ERE, which is [more or less] supported by Awk and by grep -E. If you have grep -P, you can use a third variant which has a convenient feature:
find etc etc -exec grep -oP '^([^|]*[|]){3}[^|]*Category is not found for local configuration/code/\K[0-9]{3}\.[0-9A-Z]+' {} + |
sort -u
The \K says "match up through here, but forget everything before this" and so only prints the part after this token.
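The three dialects mentioned above can be compared side by side on the question's sample lines. This is a sketch that assumes GNU grep (the \+ in the BRE form is a GNU extension, and -P needs PCRE support); the temporary file is just a stand-in for the real input.

```shell
# Sketch: BRE, ERE and PCRE forms of the same extraction (assumes GNU grep).
f=$(mktemp)
cat > "$f" <<'EOF'
John|1|R|Category is not found for local configuration/code/123.NNN and customer 113
TOM|2|R|Category is not found for local configuration/code/123.NNN and customer 114
PETER|3|R|Category is not found for local configuration/code/456.1 and customer 115
EOF

# BRE: "one or more" is written \+ (GNU extension).
bre=$(grep -o '[0-9]\+\.[A-Z0-9]\+' "$f" | sort -u)

# ERE: the same quantifier is a plain +.
ere=$(grep -oE '[0-9]+\.[A-Z0-9]+' "$f" | sort -u)

# PCRE: \K drops everything matched so far, so only the code is printed.
pcre=$(grep -oP '/code/\K[0-9]+\.[A-Z0-9]+' "$f" | sort -u)

printf '%s\n' "$bre"
rm -f "$f"
```

All three variables should end up holding the same two codes; the PCRE form is the only one that ties the match to the /code/ prefix.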

With sed:
sed -E -n 's#.*code/(.*)\s+and.*#\1#p' file.txt | uniq
Output:
123.NNN
456.1

I'd use the -P option:
grep -oP '/code/\K\S+' file | sort -u
You want to extract the non-whitespace characters following /code/

An awk using match():
$ awk 'match($0,/[0-9]+\.[A-Z0-9]+/)&&++a[(b=substr($0,RSTART,RLENGTH))]==1{print b}' file
Output:
123.NNN
456.1
Pretty printed for slightly better readability:
$ awk '
match($0,/[0-9]+\.[A-Z0-9]+/) && ++a[(b=substr($0,RSTART,RLENGTH))]==1 {
print b
}' file

It's not possible just using grep. You should use AWK instead:
awk '{split($7, ar, "/"); print ar[3]}' FILE
Explanation:
The split function splits on a string, here $7, the 7th field, placing the result in an array ar, and using the string / as delimiter.
Then prints the 3rd field of the array.
Note:
I am assuming that all of your input looks like the samples you have given us, i.e.:
aaa|b|c|ddd is not found for local configuration/code/111.nnn and customer nnn
Where aaa and ddd will not contain whitespace.
I also assume you really do have a file FILE containing those lines. It's a bit unclear.
Input:
▶ cat FILE
John|1|R|Category is not found for local configuration/code/123.NNN and customer 113
TOM|2|R|Category is not found for local configuration/code/123.NNN and customer 114
PETER|3|R|Category is not found for local configuration/code/456.1 and customer 115
Output:
▶ awk '{split($7, ar, "/"); print ar[3]}' FILE
123.NNN
123.NNN
456.1

Single sed can do the filtering.
(The pattern can be further generalized as suggested by others if that is an option. But be careful not to oversimplify it, or it may match unexpected inputs.)
sed -nE 's#(\S+\s+){6}configuration/code/(\S+)\s.*#\2#p' input.txt
To replace your exact command,
find ./ -type f -name <Filename> -exec cat {} \; | sed -nE 's#(\S+\s+){6}configuration/code/(\S+)\s.*#\2#p' | sort | uniq

Simple substitutions on individual lines is the job sed is best suited for. This will work using any sed in any shell on any UNIX box:
$ cat file
John|1|R|Category is not found for local configuration/code/123.NNN and customer 113
TOM|2|R|Category is not found for local configuration/code/123.NNN and customer 114
PETER|3|R|Category is not found for local configuration/code/456.1 and customer 115
$ sed -n 's:.*Category is not found for local configuration/code/\([^ ]*\).*:\1:p' file | sort -u
123.NNN
456.1

Extract number with special characters from a file using sed and grep command

I am trying to extract the number surrounded by square brackets when the line also contains a given word placed after the number. For example:
The file contains
xxxx [098] yyyy zzzz
I need to search for yyyy and, if it matches in the line, extract the 098 itself.
I am trying with
sed 's/.*\[\([^]]*\)\].*/\1/g' str.txt
for extracting the number without pattern matching.
and am using
sed -nr 's/.*( |^)([0-9]+) yyyy.*/\2/p' str.txt
to match the pattern and get the number that is placed before that match. But I couldn't merge these two commands. I am confused by the error
sed: -e expression #1, char 26: unknown option to `s'
I think this happened because of extra / characters being treated as delimiters.

You always need conditional print logic in cases where you decide to print on a condition. With the -n option and the p flag, you can make sed print the matched group only if the substitution was successful.
So combining your attempts you need something like
sed -n 's/.*\[\([^]]*\)\][[:space:]]yyyy.*/\1/p'
which won't print for any other case other than yyyy after the [..] string.
But parsing a space-delimited file is quite easy if you decide to use awk, in which case your result could simply be written as
awk '$3 == "yyyy" { gsub(/[][]/,"",$2); print $2 }'
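To make the effect of -n and p visible, here is a sketch that runs the commands from this answer on two made-up lines, one with yyyy after the bracketed number and one without (the second line is an assumption added for contrast).

```shell
# Sketch: conditional printing with sed -n ... p versus plain sed.
# The second input line is hypothetical, added to show a non-matching case.
f=$(mktemp)
printf '%s\n' 'xxxx [098] yyyy zzzz' 'xxxx [123] qqqq zzzz' > "$f"

# -n suppresses automatic printing; the p flag prints only lines where
# the substitution actually happened.
with_n=$(sed -n 's/.*\[\([^]]*\)\][[:space:]]yyyy.*/\1/p' "$f")

# Without -n/p, non-matching lines pass through unchanged.
without_n=$(sed 's/.*\[\([^]]*\)\][[:space:]]yyyy.*/\1/' "$f")

# awk: test the third field, strip the brackets from the second.
awk_out=$(awk '$3 == "yyyy" { gsub(/[][]/, "", $2); print $2 }' "$f")

echo "$with_n"
echo "$without_n"
echo "$awk_out"
rm -f "$f"
```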

As you have tagged grep, another option if you can use gnu grep is to make use of the -P option Perl-compatible regular expression and use lookarounds:
grep -Po "(?<=\[)\d+(?=\] yyyy)" str.txt
That will give you 098

If the data is in a file named d (tested with GNU sed):
sed -E 's/.*xxxx\s*\[(098)\]\s*yyyy.*/\1/' d

Grep multiple strings from text file

Okay so I have a textfile containing multiple strings, example of this -
Hello123
Halo123
Gracias
Thank you
...
I want grep to use these strings to find lines with matching strings/keywords from other files within a directory
example of text files being grepped -
123-example-Halo123
321-example-Gracias-com-no
321-example-match
so in this instance the output should be
123-example-Halo123
321-example-Gracias-com-no

With GNU grep:
grep -f file1 file2
-f FILE: Obtain patterns from FILE, one per line.
Output:
123-example-Halo123
321-example-Gracias-com-no

You should probably look at the manpage for grep to get a better understanding of what options are supported by the grep utility. However, there are a number of ways to achieve what you're trying to accomplish. Here's one approach:
grep -e "Hello123" -e "Halo123" -e "Gracias" -e "Thank you" list_of_files_to_search
However, since your search strings are already in a separate file, you would probably want to use this approach:
grep -f patternFile list_of_files_to_search
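A runnable sketch of the -f approach on the question's sample data follows. The file names are made up, and -F is added on the assumption that the keywords are meant as literal strings rather than regular expressions.

```shell
# Sketch: read keywords from a file with -f and match them literally with -F.
# File names are hypothetical; -F is an assumption about the asker's intent.
dir=$(mktemp -d)
printf '%s\n' 'Hello123' 'Halo123' 'Gracias' 'Thank you' > "$dir/patterns.txt"
printf '%s\n' '123-example-Halo123' '321-example-Gracias-com-no' \
              '321-example-match' > "$dir/data.txt"

# One pattern per line; a line matches if it contains any of them.
# Note: a blank line in the pattern file would match every input line.
matches=$(grep -F -f "$dir/patterns.txt" "$dir/data.txt")

echo "$matches"
rm -rf "$dir"
```

When more than one file is searched, grep prefixes each match with its file name; -h suppresses that.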

I can think of two possible solutions for your question:
Use multiple regular expressions - a regular expression for each word you want to find, for example:
grep -e Hello123 -e Halo123 file_to_search.txt
Use a single regular expression with an "or" operator. Using Perl regular expressions, it will look like the following:
grep -P "Hello123|Halo123" file_to_search.txt
EDIT:
As you mentioned in your comment, you want to use a list of words to find from a file and search in a full directory.
You can manipulate the words-to-find file to look like -e flags concatenation:
cat words_to_find.txt | sed 's/^/-e "/;s/$/"/' | tr '\n' ' '
This will return something like -e "Hello123" -e "Halo123" -e "Gracias" -e "Thank you", which you can then pass to grep using xargs:
cat words_to_find.txt | sed 's/^/-e "/;s/$/"/' | tr '\n' ' ' | xargs grep dir_to_search/*
As you can see, the last command also searches in all of the files in the directory.
SECOND EDIT: as PesaThe mentioned, the following command does this in a much simpler and more elegant way:
grep -f words_to_find.txt dir_to_search/*

Grep with a variable

I would like to grep a number of documents by using a set of search terms and to specify the number of characters after match. Here is what I tried
grep -F -o -P "$(<search.txt).{0,4}" foo.txt
but I get the message 'grep: conflicting matchers specified' because -F and '-oP' cannot be combined. It does not work with '-E' either.

-F and -P are conflicting options, simple as that. The first means that the patterns are fixed strings, the second means that the patterns are Perl-compatible regular expressions. Perhaps you meant to use -f instead, which reads patterns from a file or a process substitution.
If you want to match any of the patterns in your file, followed by 4 characters, you could use something like this
grep -oP -f <(awk '{print $0 ".{4}"}' search.txt) file
This dynamically adds the pattern to each line in the file.
Alternatively, a more portable and concise version would be this:
sed 's/$/.{0,4}/' search.txt | grep -f - -oP file
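As a sketch of the same idea on throwaway data: here -E is swapped in for -P, on the assumption that the search terms contain no regex metacharacters, because some grep versions refuse -P when more than one pattern is supplied. The file contents are made up.

```shell
# Sketch: each search term plus up to 4 following characters.
# Assumptions: terms are plain words; -E is used instead of -P; data is hypothetical.
dir=$(mktemp -d)
printf '%s\n' 'alpha' 'beta' > "$dir/search.txt"
printf '%s\n' 'xx alpha12345 yy' 'beta9' 'nothing here' > "$dir/foo.txt"

# sed appends .{0,4} to every term; grep reads the resulting patterns
# from standard input via "-f -" and prints only the matched text (-o).
hits=$(sed 's/$/.{0,4}/' "$dir/search.txt" | grep -oE -f - "$dir/foo.txt")

echo "$hits"
rm -rf "$dir"
```

If the terms can contain characters such as . or [, they would need escaping before the suffix is appended.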
