grep -v *string* and grep -v string creating wildly different results - terminal

grep -v mystring myfile.txt
returns ~300KB
grep -v *mystring* myfile.txt
returns ~7GB
....what am I doing wrong here?

Your pattern is not what grep actually receives. By default grep takes a regular expression as its argument, along with the command-line flags. What you typed, *mystring*, is an unquoted shell glob, which the shell expands into the set of filenames in the current directory containing the string mystring before grep ever runs. So, assuming you have filenames containing mystring, your grep command becomes something like
grep -v mystring1 foomystring2 foomystring3 myfile.txt
which could produce unexpected results depending on the contents of those files. The right way is to quote the pattern so the shell leaves it alone; if you want the explicit wildcard, use the regular-expression greedy quantifier .* instead of the glob *:
grep -v '.*mystring.*' myfile.txt
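A quick way to see what the shell is actually handing to grep is to echo the pattern with and without quotes (the filenames below are hypothetical, assuming the current directory contains files whose names include mystring):
$ echo *mystring*
foomystring2 foomystring3 mystring1
$ echo '*mystring*'
*mystring*
With the quotes, the glob is never expanded and grep receives the literal pattern.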

Related

grep for exact word in a file containing "."

I have a file named "TestGrep" that contains content as shown below
#!/bin/bash
/ParentFolder/a #email1.com
/ParentFolder/b #email2.com
/ParentFolder/.a #email1.com
/ParentFolder/.b #email2.com
/ParentFolder/ #email3.com
I am using the below grep command
grep -Fw "/ParentFolder/" TestGrep
The output is
/ParentFolder/.a #email1.com
/ParentFolder/.b #email2.com
/ParentFolder/ #email3.com
It is somehow ignoring the dots in the TestGrep file.
I want the output to be shown as below
/ParentFolder/ #email3.com
How can I write a grep command that checks for an exact string match and returns the expected output?
Try the following, using the -E option of grep:
grep -E '/ParentFolder/\s+' TestGrep
From man grep about -E option of grep:
-E, --extended-regexp
Interpret PATTERN as an extended regular expression
\s+ matches one or more whitespace characters; \s is a GNU grep extension, and [[:space:]]+ is the portable equivalent.
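Run against the sample TestGrep file from the question, this prints only the line where the path is immediately followed by whitespace (assuming GNU grep, which supports \s in -E patterns):
$ grep -E '/ParentFolder/\s+' TestGrep
/ParentFolder/ #email3.com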

combine grep and grep -v search together

I am trying to combine grep and grep -v search together.
The output should display all lines ending with .xml, but exclude lines starting with $.
Here are the commands I have tried; none worked:
grep *.xml file1.txt | grep -v '$' file1.txt > output
grep *.xml | grep -v '$' file1.txt > output
grep *.xml grep -v '$' file1.txt > output
grep *.xml '$' file1.txt > output
To match a $ at the start of a line, anchor it to the start of the line with ^. Also, $ by itself matches the end of the line (it's a special character, just like ^), and * will not do what you think it does (it works differently in regular expressions compared to in shell globbing patterns). So,
grep -v '^\$'
will filter out all lines starting with a $.
You can do either
grep '\.xml$' file1.txt | grep -v '^\$'
or
grep '^[^$].*\.xml$' file1.txt
to find all lines in the file file1.txt that do not start with $ but that end with .xml.
Notice that I also escape the dot in .xml as that otherwise matches any character, and that the second command combines both criteria by using a character range ([ ... ]) containing all characters except $ (the .* matches any number of any characters).
The single quotes are necessary so that the shell won't interpret the regular expression as a shell globbing pattern.
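As a quick illustration (the contents of file1.txt here are made up for the example):
$ cat file1.txt
$HOME/build/a.xml
config/b.xml
notes.txt
$ grep '\.xml$' file1.txt | grep -v '^\$'
config/b.xml
The first grep keeps the two .xml lines, and the second drops the one that starts with $.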
You can also use the cat command to feed the contents to grep (redirecting the output to a file if you wish), and then use a regular expression to filter; in this case, lines starting with the $ symbol match '^[$]'.
So you can use the command cat *.xml | grep -v '^[$]'.

grep not finding ".*" string values

I have a file temp.txt as below.
a.*,super
I want to grep .* to check whether the value is present in the file or not.
Command used:
grep -i ".*" temp.txt
returns nothing
This is because grep considers the pattern as a regular expression.
To make grep interpret it as a literal, use -F.
grep -F ".*" temp.txt
Also note that -i is not needed here, because there is no case distinction to take into account (it is used, for example, to make grep match AB, aB, Ab and ab when you run grep -i "ab").
As man grep says:
-F, --fixed-strings
Interpret PATTERN as a list of fixed strings, separated by newlines,
any of which is to be matched. (-F is specified by POSIX.)
-i, --ignore-case
Ignore case distinctions in both the PATTERN and the input files. (-i
is specified by POSIX.)
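With the sample temp.txt from the question, the fixed-string search finds the line containing the literal .*:
$ grep -F ".*" temp.txt
a.*,super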
Using awk
awk '/\.\*/' file
or fgrep
fgrep ".*" file
Both ., * have special meaning in regular expression. Escape them to match literally.
$ cat temp.txt
a.*,super
$ grep "\.\*" temp.txt
a.*,super
$ echo $?
0
$ grep "there-is-no-such-string" temp.txt
$ echo $?
1
-i is not needed because there are no alphabetic characters in the pattern.

Finding a whole string using grep

I'm trying to find whole strings using grep. I am familiar with the -w flag, but it gives me a hard time since it treats a dot as a delimiter.
For example, I have a file named "a.txt" and a directory named "a" in some directory; this is what happens:
> ls | grep -w a
a
a.txt
What I want it to find is only "a" and that's it.
How can I do that?
If you want a single a on a line, use
grep '^a$'
If you only take whitespace as the delimiter, use
grep '\([[:space:]]\|^\)a\([[:space:]]\|$\)'
(i.e. whitespace or beginning of the line, a, whitespace or end of the line).
Use the -x option of grep:
ls | grep -x a
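-x (--line-regexp) selects only matches that cover the whole line, so with the directory listing from the question:
$ ls | grep -x a
a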
A simpler approach would be:
grep '^[[:space:]]*a[[:space:]]*$'
Something friendlier with variables is awk, which does not interpret the input pattern as a regex:
awk -v v="$var" '{ sub(/^[[:space:]]*/, ""); sub(/[[:space:]]*$/, ""); }; $0 == v;'
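For example, with the same directory listing, and $var holding the exact name to look for (a sketch; the variable name is arbitrary):
$ var=a
$ ls | awk -v v="$var" '{ sub(/^[[:space:]]*/, ""); sub(/[[:space:]]*$/, ""); }; $0 == v;'
a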

bash grep newline

Hi, I need to extract from the file:
first
second
third
using the grep command, the following line:
second
third
How should the grep command look like?
Instead of grep, you can use pcregrep which supports multiline patterns
pcregrep -M 'second\nthird' file
-M allows the pattern to match more than one line.
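With the three-line file from the question (saved here as file), this prints both matched lines, assuming pcregrep is installed:
$ pcregrep -M 'second\nthird' file
second
third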
Your question title, "bash grep newline", implies that you want to match on the second\nthird sequence of characters, i.e. something containing a newline within it.
Since grep works on lines, and these are two different lines, you would not be able to match it this way.
So, I'd split it into several tasks:
you match the line that contains "second" and output the line that has matched and the subsequent line:
grep -A 1 "second" testfile
you translate every other newline into the sequence that is guaranteed not to occur in the input. I think the simplest way to do that would be using perl:
perl -npe '$x=1-$x; s/\n/##UnUsedSequence##/ if $x;'
you do a grep on these lines, this time searching for string ##UnUsedSequence##third:
grep "##UnUsedSequence##third"
you unwrap the unused sequences back into the newlines, sed might be the simplest:
sed -e 's/##UnUsedSequence##/\n/'
So the resulting pipe command to do what you want would look like:
grep -A 1 "second" testfile | perl -npe '$x=1-$x; s/\n/##UnUsedSequence##/ if $x;' | grep "##UnUsedSequence##third" | sed -e 's/##UnUsedSequence##/\n/'
Not the most elegant by far, but should work. I'm curious to know of better approaches, though - there should be some.
I don't think grep is the way to go on this.
If you just want to strip the first line from any file (to generalize your question), I would use sed instead.
sed '1d' INPUT_FILE_NAME
This will send the contents of the file to standard output with the first line deleted.
Then you can redirect the standard output to another file to capture the results.
sed '1d' INPUT_FILE_NAME > OUTPUT_FILE_NAME
That should do it.
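With the three-line file from the question saved as file, for example:
$ sed '1d' file
second
third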
If you have to use grep and just don't want to display the line with first on it, then try this:
grep -v first INPUT_FILE_NAME
By passing the -v switch, you are telling grep to show you everything but the expression that you are passing. In effect show me everything but the line(s) with first in them.
However, the downside is that a file with multiple lines containing first will have all of those lines removed as well, which may not be the behavior you are expecting.
To shunt the results into a new file, try this:
grep -v first INPUT_FILE_NAME > OUTPUT_FILE_NAME
Hope this helps.
I don't really understand what you want to match. I would not use grep, but one of the following:
tail -2 file # to get last two lines
tail -n +2 file # to get all but the first line
sed -e '2,3p;d' file # to get lines from second to third
(not sure how standard it is, it works in GNU tools for sure)
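With the three-line file from the question, the sed version, for example, gives:
$ sed -e '2,3p;d' file
second
third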
So you just don't want the line containing "first"? -v inverts the grep results.
$ echo -e "first\nsecond\nthird\n" | grep -v first
second
third
Line? Or lines?
Try
grep -E -e '(second|third)' filename
Edit: grep is line oriented. You're going to have to use Perl, sed or awk to perform the pattern match across lines.
BTW, -E tells grep that the regexp is an extended RE.
grep -A1 "second" | grep -B1 "third" works nicely, and if you have multiple matches it will even get rid of the original -- match delimiter
grep -E '(second|third)' /path/to/file
egrep -w 'second|third' /path/to/file
you could use
$ grep -1 third filename
This will print the matching line together with one line before and one after it. Since "third" is on the last line, you get the last two lines.
I like notnoop's answer, but building on AndrewY's answer (which is better for those without pcregrep, but way too complicated), you can just do:
RESULT=`grep -A1 -s -m1 '^\s*second\s*$' file | grep -s -B1 -m1 '^\s*third\s*$'`
grep -v '^first' filename
Where the -v flag inverts the match.
