Why did I get different answers when I changed grep to egrep in the latter half of each [duplicate] - bash

This question already has answers here:
Difference between egrep and grep
(6 answers)
Closed 6 years ago.
$ egrep "^COMP[29]041" enrolments | grep "|F$" | wc -l
24
$ egrep "^COMP[29]041" enrolments | egrep "|F$" | wc -l
166
$
The content of file enrolments:
COMP2041|4836917|Ruld, Ruld |3978/2|M
COMP2041|4850109|Rvyiparzal, Ilbvuy |3979/3|M
COMP2041|2858836|Rzild, Fia Held |3730/4|M
COMP2041|4823158|Sheld, Yild |3978/2|M
COMP2041|4818044|Sheo, Sheo |3978/2|M
COMP2041|4818497|Sheo, Xa |3978/2|M
COMP9041|4899688|Shild, Ge |8680/2|M
COMP2041|4869506|Shild, Yild |3645/2|M
COMP9041|4897426|Shild, Yild |8680/2|M
COMP9041|4368551|Sho, Wuld |8684 |M
COMP2041|4339940|Shuld, Puaxail Baili |3978/3|F
COMP2041|4330093|Veh, Yeold-He |3711/3|M
COMP2041|2230267|Vikil, Ivrha |3978/3|F
COMP2041|4312663|Viy Chiobhova, Jiozrigh |3978/1|M
.......
The question is why I got different answers when I changed grep to egrep in the latter half of each.
What are the differences between grep and egrep?

In egrep (or, preferably, grep -E), the | is a metacharacter, whereas in plain grep it is a plain (non-meta) character.
The |F$ term in egrep looks for an empty string or F at the end of line; it finds an empty string on every line.
The same term in grep looks for a |F at the end of line. To look for that with egrep, you'd need to escape the metacharacter with a backslash: grep -E '\|F$' enrolments.
In short, the plain grep command understands Basic Regular Expressions (BRE). The egrep or 'extended grep' command understands Extended Regular Expressions (ERE). Some versions of grep (such as GNU grep) can be compiled to recognize Perl-Compatible Regular Expressions (PCRE).
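The difference can be reproduced on a tiny input. The following is a minimal sketch, assuming GNU grep; the three sample lines are hypothetical and only mimic the format of the enrolments file.

```shell
# Hypothetical three-line sample in the same format as the enrolments file.
sample=$(printf '%s\n' \
  'COMP2041|1111111|Doe, Jane |3978/3|F' \
  'COMP2041|2222222|Roe, John |3978/2|M' \
  'COMP9041|3333333|Poe, Finn |8680/2|M')

# BRE: | is a literal character, so only lines ending in "|F" are counted.
bre_count=$(printf '%s\n' "$sample" | grep -c '|F$')

# ERE: | is alternation; the empty left-hand branch matches every line.
ere_count=$(printf '%s\n' "$sample" | grep -Ec '|F$')

# ERE with the bar escaped behaves like the BRE version.
ere_escaped_count=$(printf '%s\n' "$sample" | grep -Ec '\|F$')

echo "BRE: $bre_count  ERE: $ere_count  ERE escaped: $ere_escaped_count"
```

With GNU grep the three counts should differ in the same way as the 24 versus 166 in the question: the unescaped ERE pattern counts every line. Other grep implementations may reject an empty alternative instead of matching it.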

Related

sed combine two search and replace [duplicate]

This question already has answers here:
Combining two sed commands
(2 answers)
Closed 1 year ago.
I am currently making a command that grabs information from iwconfig, greps a certain line, cuts a portion and then runs two sed search-and-replace commands so I can pipe its output elsewhere. The command currently is as follows:
iwconfig wlan0 | grep ESSID | cut -c32-50 | sed 's/ //g' | sed 's/"//g'
The output comes out as intended, removing whitespace and "'s, but I am wondering if there is a way to condense my search and replace into a single command, preferably with an and / or operator. Is there a way to do this? And how would the sed command be written if so? Thanks!
You haven't shown what iwconfig produces in your case, but, on my system, the following successfully extracts the ESSID:
iwconfig wlan0 | sed -n 's/.*ESSID://p'
If there really are spaces and quotes that need to be removed, then try:
iwconfig wlan0 | sed -n 's/[ "]//g; s/.*ESSID://p'
How it works
-n
This tells sed not to print any line unless we explicitly ask it to.
s/[ "]//g
This removes spaces and double-quotes.
s/.*ESSID://p
This removes everything up to and including ESSID:. If a substitution is made, meaning that this line contains ESSID:, then print it.
Example
$ echo '"something" ESSID:"my id"' | sed -n 's/[ "]//g; s/.*ESSID://p'
myid
regexp1\|regexp2
Matches either regexp1 or regexp2. Use parentheses to use complex alternative regular expressions. The matching process tries each alternative in turn, from left to right, and the first one that succeeds is used. It is a GNU extension.
sed 's/ \|"//g'
should work
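For comparison, here is a small sketch of three ways to fold the two substitutions into one sed call. The input string is a hypothetical stand-in for the cut-down iwconfig line, since no real output was shown.

```shell
# Hypothetical input standing in for the cut-down iwconfig line.
input=' "my id"  '

# Two expressions in one sed invocation, each introduced by -e.
with_e=$(printf '%s\n' "$input" | sed -e 's/ //g' -e 's/"//g')

# The same two expressions separated by a semicolon.
with_semicolon=$(printf '%s\n' "$input" | sed 's/ //g; s/"//g')

# A single bracket expression, which does not rely on the GNU \| extension.
with_bracket=$(printf '%s\n' "$input" | sed 's/[ "]//g')

printf '%s\n' "$with_e" "$with_semicolon" "$with_bracket"
```

All three should produce the same result; the bracket expression is the most portable choice when the alternatives are single characters.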
With GNU awk for gensub():
iwconfig wlan0 | awk '/ESSID/{print gensub(/[ "]/,"","g",substr($0,32,19))}'
There MAY be a simpler method but without sample input/output (i.e. output from iwconfig and what you want the script to output) I'm not going to guess...

Why do you have to escape | and + in grep between apostrophes?

I was under the impression that within single quotes, e.g. 'pattern', bash special characters are not interpolated, so one need only escape single quotes themselves.
Why then does echo "123" | grep '[0-9]+' output nothing, whereas echo "123" | grep '[0-9]\+' (plus sign escaped) output 123? (Likewise, echo "123" | grep '3|4' outputs nothing unless you escape the |.)
This is under bash 4.1.2 and grep 2.6.3 on CentOS 6.5.
grep uses Basic Regular Expressions (BRE) by default, like sed and vi. In BRE, + and | are ordinary characters; GNU grep treats them as operators only when they are preceded by a backslash, which is tedious.
You probably want Extended Regular Expressions, so use egrep or grep -E (depending on the version in use). Check your man grep.
See also the GNU documentation for a full list of the characters involved.
Most languages use Extended Regular Expressions (EREs) these days, and they are much easier to use. Basic Regular Expressions (BREs) are really a throw-back.
That is down to the regular expression syntax that grep uses by default. If you select a different one, for example Perl-compatible regular expressions with -P, it works:
$ echo "123" | grep '[0-9]+'
$ echo "123" | grep -P '[0-9]+'
123
$ echo "123" | grep '3|4'
$ echo "123" | grep -P '3|4'
123
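The same checks can be run with -E, which is more widely available than -P. This is a small sketch that counts matching lines with -c; the `|| true` is only there so that a zero count does not abort a script running under `set -e`.

```shell
# BRE (plain grep): + and | are literal, so neither pattern matches "123".
bre_plus=$(echo "123" | grep -c '[0-9]+' || true)
bre_bar=$(echo "123" | grep -c '3|4' || true)

# ERE (grep -E): + means "one or more" and | means alternation.
ere_plus=$(echo "123" | grep -Ec '[0-9]+' || true)
ere_bar=$(echo "123" | grep -Ec '3|4' || true)

# Portable BRE spelling of "one or more digits": an interval expression.
bre_interval=$(echo "123" | grep -c '[0-9]\{1,\}' || true)

echo "BRE +: $bre_plus, BRE |: $bre_bar, ERE +: $ere_plus, ERE |: $ere_bar, BRE interval: $bre_interval"
```

The single quotes only stop the shell from touching the pattern; what + and | mean is decided afterwards by grep's own regular expression syntax.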

How can I read a file and display only the relevant lines using grep? [duplicate]

This question already has answers here:
grep egrep multiple-strings
(4 answers)
Closed 8 years ago.
I'm trying to read /var/log/messages in order to identify a problem with the pacemakerd.
The problem is that the log is full of notifications from xinetd and nrpe, so the only way I know is:
# tail -n 2000 /var/log/messages |grep -v xinetd | grep -v nrpe |less
So my question is whether there's a way to exclude both xinetd and nrpe with -v in the same grep?
Thanks in advance
You can use
grep -v "xinetd\|nrpe"
Sure, you can use first_pattern|second_pattern together with the -E option of grep:
tail -n 2000 /var/log/messages | grep -Ev "xinetd|nrpe"
From man grep:
-E, --extended-regexp
Interpret PATTERN as an extended regular expression (ERE, see below).
(-E is specified by POSIX.)
Example
$ cat a
hello this is me
bye this is me
and that's all
$ grep -Ev "hello|bye" a
and that's all
grep -v "xinetd\|nrpe"
is correct and sufficient with GNU grep, which accepts \| as alternation in basic regular expressions. Options like -E or egrep are not necessary there.
More variations:
grep -v "^xinetd\|nrpe" # exclude lines starting with xinetd, and any line containing nrpe
grep -v "xinetd$\|nrpe" # exclude lines ending with xinetd, and any line containing nrpe
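Another option that avoids alternation altogether is to give grep one -e per pattern. The sketch below uses hypothetical log lines in place of /var/log/messages.

```shell
# Hypothetical log lines standing in for /var/log/messages.
log=$(printf '%s\n' \
  'Jan 1 00:00:01 host xinetd[123]: START: nrpe' \
  'Jan 1 00:00:02 host nrpe[456]: Host is not allowed' \
  'Jan 1 00:00:03 host pacemakerd[789]: notice: Starting')

# One -e per pattern: with -v, a line is dropped if it matches any of them.
filtered=$(printf '%s\n' "$log" | grep -v -e xinetd -e nrpe)

# Equivalent form using ERE alternation.
filtered_ere=$(printf '%s\n' "$log" | grep -Ev 'xinetd|nrpe')

printf '%s\n' "$filtered"
```

Only the pacemakerd line should remain in both cases. The -e form works the same way in basic and extended mode, so it does not depend on how | is interpreted.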

How to grep multiple variables in bash

I need to grep multiple strings, but I don't know the exact number of strings.
My code is :
s2=( $(echo $1 | awk -F"," '{ for (i=1; i<=NF ; i++) {print $i} }') )
for pattern in "${s2[@]}"; do
ssh -q host tail -f /some/path |
grep -w -i --line-buffered "$pattern" > some_file 2>/dev/null &
done
Now, the code is not doing what it's supposed to do. For example, if I run ./script s1,s2,s3,s4,.....
it prints all lines that contain s1,s2,s3....
The script is supposed to do something like grep "$s1" | grep "$s2" | grep "$s3" ....
grep doesn't have an option to match all of a set of patterns. So the best solution is to use another tool, such as awk (or your choice of scripting languages, but awk will work fine).
Note, however, that awk and grep have subtly different regular expression implementations. It's not clear from the question whether the target strings are literal strings or regular expression patterns, and if the latter, what the expectations are. However, since the argument comes delimited with commas, I'm assuming that the pieces are simple strings and should not be interpreted as patterns.
If you want the strings to be interpreted as patterns, you can change index to match in the following little program:
ssh -q host tail -f /some/path |
awk -v STRINGS="$1" -v IGNORECASE=1 \
'BEGIN{split(STRINGS,strings,/,/)}
{for(i in strings)if(!index($0,strings[i]))next}
{print;fflush()}'
Note:
IGNORECASE is only available in GNU awk; in (most) other implementations, it will do nothing. It seems that case-insensitive matching is what you want, based on the fact that you used -i in your grep invocation.
fflush() is also an extension, although it works with both gawk and mawk. In POSIX awk, fflush requires an argument; if you were using POSIX awk, you'd be better off printing to stderr.
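If you would rather stay with grep, the chain grep "$s1" | grep "$s2" | ... can be built for any number of strings with a small recursive function. This is only a sketch for finite input: match_all is a hypothetical helper name, the patterns are passed as separate arguments rather than as one comma-separated string, and for a tail -f stream each grep would additionally need --line-buffered.

```shell
# match_all PATTERN...: print the lines of stdin that contain every PATTERN.
# Patterns are treated as fixed strings (-F) and matched case-insensitively (-i).
# Hypothetical helper for illustration.
match_all() {
  if [ "$#" -eq 0 ]; then
    # No patterns left: pass the surviving lines through unchanged.
    cat
  else
    # Filter on the first pattern, then recurse on the remaining ones.
    grep -iF -- "$1" | { shift; match_all "$@"; }
  fi
}

# Usage: keep only the lines that contain both "alpha" and "beta".
lines=$(printf '%s\n' 'alpha beta gamma' 'alpha gamma' 'Beta ALPHA' | match_all alpha beta)
printf '%s\n' "$lines"
```

Each recursion level adds one more grep to the pipeline, so a line survives only if it matches all of the strings. Add -w to the grep call if whole-word matching is wanted, as in the question.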
You can use extended grep
egrep "$s1|$s2|$s3" fileName
If you don't know how many patterns you need to grep, but you have all of them in an array called s, you can use
egrep $(sed 's/ /|/g' <<< "${s[@]}") fileName
This creates a herestring with all elements of the array, sed replaces the field separator of bash (space) with | and if we feed that to egrep we grep all strings that are in the array s.
test.sh:
#!/bin/bash -x
a=" $@"
grep ${a// / -e } .bashrc
It works like this:
$ ./test.sh 1 2 3
+ a=' 1 2 3'
+ grep -e 1 -e 2 -e 3 .bashrc
(here is lots of text that fits all the arguments)

grep match exact substring ignoring regex syntax [duplicate]

This question already has answers here:
Grep for literal strings
(6 answers)
Closed 6 years ago.
Is there some way to make grep match an exact string, and not parse it as a regex? Or is there some tool to escape a string properly for grep?
$ version=10.4
$ echo "10.4" | grep $version
10.4
$ echo "1034" | grep $version # shouldn't match
1034
Use grep -F or fgrep.
$ echo "1034" | grep -F $version # shouldn't match
$ echo "10.4" | grep -F $version
10.4
See man page:
-F, --fixed-strings
Interpret PATTERN as a list of fixed strings, separated
by newlines, any of which is to be matched.
I was looking for the term "literal match" or "fixed string".
(See also Using grep with a complex string and How can grep interpret literally a string that contains an asterisk and is fed to grep through a variable?)
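A short sketch of the difference, reusing the version variable from the question. Quoting the variable and putting -- before it are extra precautions, so that the shell does not split the value and grep does not mistake it for an option.

```shell
version=10.4

# Regex match: the dot matches any character, so "1034" is counted as well.
regex_hits=$(printf '%s\n' 10.4 1034 | grep -c -- "$version")

# Fixed-string match: only the literal text "10.4" is counted.
fixed_hits=$(printf '%s\n' 10.4 1034 | grep -cF -- "$version")

# -x additionally requires the whole line to equal the string,
# so "110.45" is rejected even though it contains "10.4".
whole_line_hits=$(printf '%s\n' 10.4 110.45 | grep -cFx -- "$version")

echo "regex: $regex_hits  fixed: $fixed_hits  whole line: $whole_line_hits"
```

Note that -F alone still matches substrings; combine it with -x or -w when the string must stand on its own.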
