grep for a specific pattern in a file? - shell

I have a file textFile.txt
abc_efg#qwe.asd
abc_aer#
#avret
afd_wer_asd#qweasd.zxcasd
wqe_a#qwea.cae
qwe.caer
I want to grep for these specific lines:
abc_efg#qwe.asd
afd_wer_asd#qweasd.zxcasd
wqe_a#qwea.cae
That is, the ones of the form
[a-z]_[a-z]#[a-z].[a-z]
but the part before the # can contain any number of "_" characters.
So far this is what I have:
grep "[a-z]_[a-z]#[a-z].[a-z]" textFile.txt
But I got only one line as the output.
wqe_a#qwea.cae
Could I know a better way to do this ? :)

You can simply add the _ inside the bracket expression, as [a-z_], so the new command is:
grep "[a-z_]#[a-z].[a-z]" textFile.txt
Or, if you want to require a letter before that character, you can use:
grep "[a-z][a-z_]#[a-z].[a-z]" textFile.txt

I would suggest keeping it simple by checking that exactly one # is present in each line, with at least one character on either side of it:
grep -E '^[^#]+#[^#]+$' file
abc_efg#qwe.asd
afd_wer_asd#qweasd.zxcasd
wqe_a#qwea.cae

The following selects lines that have at least one underscore followed by letters before the # and one or more letters followed by a literal period after the #:
$ grep '_[a-z]\+#[a-z]\+\.' textFile.txt
abc_efg#qwe.asd
afd_wer_asd#qweasd.zxcasd
wqe_a#qwea.cae
Notes
An unescaped period matches any character. If you want to match a literal period, it must be escaped like \.
Thus, #[a-z].[a-z] matches a #, followed by a letter, followed by anything at all, followed by a letter.
[a-z] matches a single letter. Thus _[a-z]# would match only if there was exactly one letter between the underscore and the #. To match one or more letters, use [a-z]\+.
#[a-z]\+\. will match a #, followed by one or more letters, followed by a literal period character.
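The effect of escaping the period can be checked with two invented sample lines (they are not from the question's file), one with a real period after the # part and one with an X in that position. This is only a sketch; the variable names are made up for illustration.

```shell
# Two invented sample lines: the first has a literal period, the second an "X".
sample=$(printf 'abc_efg#qwe.asd\nabc_efg#qweXasd\n')

# Unescaped period: "." matches any character, so both lines are selected.
unescaped=$(printf '%s\n' "$sample" | grep -c '#[a-z]*.[a-z]')

# Escaped period: "\." matches only a literal period, so only the first line is selected.
escaped=$(printf '%s\n' "$sample" | grep '#[a-z]*\.[a-z]')

printf 'unescaped pattern matched %s lines\n' "$unescaped"
printf 'escaped pattern matched: %s\n' "$escaped"
```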

When you do [a-z] it only matches one character of that set. That's why you are only getting wqe_a#qwea.cae back from your grep call because there is only one character between the _ and the #.
To match more than one character, you can use + or *. + means one or more of the set and * any number of that set. As well, an unescaped . means any character.
So something like:
grep "[a-z]\+_[a-z]\+#[a-z]\+\.[a-z]\+" textFile.txt would work for this. There are shorter, less specific ways of doing this as well (that other answers have shown).
Note the backslashes before the + signs and the period (\+ is a GNU extension to basic regular expressions).
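As an alternative to backslash-escaping each +, the same idea can be written with grep -E (extended regular expressions) and anchored to the whole line. The sketch below recreates the question's file in a temporary location; the group (_[a-z]+)+ is my own reading of "any number of _" as "one or more underscore-separated parts", which is an assumption.

```shell
# Recreate the question's sample file in a temporary location.
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
abc_efg#qwe.asd
abc_aer#
#avret
afd_wer_asd#qweasd.zxcasd
wqe_a#qwea.cae
qwe.caer
EOF

# -E enables extended regular expressions, where + and ( ) need no backslashes.
# ^ and $ anchor the pattern to the whole line.
# [a-z]+(_[a-z]+)+ : letters, then one or more "_letters" groups before the "#".
# [a-z]+\.[a-z]+   : letters, a literal period, then letters after the "#".
matches=$(grep -E '^[a-z]+(_[a-z]+)+#[a-z]+\.[a-z]+$' "$tmpfile")
printf '%s\n' "$matches"

rm -f "$tmpfile"
```

Anchoring with ^ and $ means a line is selected only if the whole line has this shape, which is stricter than the unanchored patterns in the answers.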

This regex should extract email-like strings from a text file:
grep -E -o "\b[A-Za-z0-9._%+-]+#[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b" file
abc_efg#qwe.asd
afd_wer_asd#qweasd.zxcasd
wqe_a#qwea.cae
This greps for patterns of the form text#text.some_more_text.

Related

grep for a variable content with a dot

I found many similar questions about my issue, but I still haven't found the right one for me.
I need to grep for the content of a variable plus a dot, but it doesn't work when I escape the dot after the variable. For example:
The file content is
item.
newitem.
My variable content is item. and I want to grep for the exact word, so I must use -w and not -F, but with this command I can't get the correct output:
cat file | grep -w "$variable\."
Do you have suggestions please?
Hi, I have to correct my scenario. My file contains some FQDNs and, for some reason, I have to look for hostname. including the dot.
Unfortunately, grep -wF doesn't work:
My file is
hostname1.domain.com
hostname2.domain.com
and the command
cat file | grep -wF hostname1.
doesn't show any output. I have to find another solution and I'm not sure that grep can help.
If $variable contains item., you're searching for item.\. which is not what you want. In fact, you want -F which interprets the pattern literally, not as a regular expression.
var=item.
echo $'item.\nnewitem.' | grep -F "$var"
Try:
grep "\b$word\."
\b: word boundary
\.: a literal dot
The following awk solution may also help:
awk -v var="item." '$0==var' Input_file
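The whole-line comparison that the awk answer performs can also be sketched with grep itself by combining -F (fixed string) with -x (match the entire line). The sample input below reuses the two lines from the question.

```shell
var='item.'

# -F treats the pattern as a fixed string, so the dot is literal;
# -x requires the whole line to equal the pattern.
exact=$(printf 'item.\nnewitem.\n' | grep -Fx "$var")

# Without -x, the fixed string is also found inside "newitem.".
loose=$(printf 'item.\nnewitem.\n' | grep -cF "$var")

printf 'exact match: %s\n' "$exact"
printf 'lines matched without -x: %s\n' "$loose"
```

This only helps when the variable is expected to be the entire line, as in the first version of the question; it does not cover the later FQDN scenario.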
You are dereferencing the variable and appending \. to it, which results in calling
cat file | grep -w "item.\.".
Since grep accepts files as parameters, calling grep "item\." file should do.
From man grep:
-w, --word-regexp
Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent
character. Similarly, it must be either at the end of the line or followed by a non-word constituent character. Word-constituent characters are letters, digits, and the underscore.
and
The Backslash Character and Special Expressions
The symbols \< and \> respectively match the empty string at the beginning and end of a word. The symbol \b matches the empty string at the edge of a word, and \B matches the empty string
provided it's not at the edge of a word. The symbol \w is a synonym for [_[:alnum:]] and \W is a synonym for [^_[:alnum:]].
Because the last character of the pattern is a ., -w requires the match to be followed by a non-word character (anything other than [A-Za-z0-9_]). Here the next character is d, so the match is rejected.
grep '\<hostname1\.'
should work, as \< only ensures that the previous character is not a word constituent.
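The difference between -w and \< can be seen on a single line from the question's file. This is a sketch assuming a grep that supports \< (it is a GNU extension that several other implementations also accept); the variable names are invented.

```shell
line='hostname1.domain.com'

# -w rejects the match: "hostname1." is followed by "d", a word character.
# "|| true" keeps the script going when grep finds nothing and exits with status 1.
with_w=$(printf '%s\n' "$line" | grep -cwF 'hostname1.' || true)

# \< constrains only the start of the match, so the trailing "d" no longer matters.
with_anchor=$(printf '%s\n' "$line" | grep '\<hostname1\.' || true)

printf 'lines matched with -wF: %s\n' "$with_w"
printf 'line matched with \\<: %s\n' "$with_anchor"
```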
You can dynamically construct the search pattern and then call grep
rexp='^hostname1\.'
grep "$rexp" file.txt
The single quotes tell bash not to interpret special characters in the variable. Double quotes tell bash to allow replacing $rexp with its value. The caret ( ^ ) in the expression tells grep to look for lines starting with 'hostname1.'
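When the hostname arrives in a variable rather than being typed into the pattern, its regex metacharacters can be escaped first. The following is a minimal sketch, assuming basic regular expressions; the variable names and the extra hostname10 line are made up for illustration.

```shell
host='hostname1.'   # hypothetical value that would normally come from elsewhere

# Put a backslash in front of characters that are special in a basic
# regular expression: ] [ \ . * ^ $
escaped=$(printf '%s\n' "$host" | sed 's/[][\.*^$]/\\&/g')

# Anchor the escaped text to the start of the line.
found=$(printf 'hostname1.domain.com\nhostname10.domain.com\nhostname2.domain.com\n' | grep "^$escaped")

printf 'escaped pattern: %s\n' "$escaped"
printf 'matched: %s\n' "$found"
```

Without the escaping step, the trailing dot would match any character, so hostname10.domain.com would be selected as well.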

How to do command "grep -oP" in a line that contains special Characters?

How can I grep a line that contains special characters?
For example, I have a file containing this text:
ISA^G00^G ^G00^G ^G12^G14147844480 ^GZZ^G001165208 ^G160601^G1903^GU^G00401^G600038486^G0^GP^G>~GS^GTX^G14147844480^G001165208^G20160601^G1903^G600038486^GX^G004010VICS~ST^G864^G384860001~BMG^G00^G^G04~MIT^G000000591^GKohl's AS2 Certificate Change June 21, 2016~N1^GFR^GKOHL'S DEPARTMENT STORES~PER^GIC^GEDIMIO#kohls.com^GTE^G262-703-7334~MSG^GAttention Kohl's AS2 trading partners, Kohl's will be changing.
I would like to grep the text after the MSG segment,
using this command:
grep -oP 'MSG.\K[\w\s\d]*' < filename
Expected Result :
Attention Kohl's AS2 trading partners, Kohl's will be changing.
Actual Result:
Attention Kohl
How can I do it?
Your pattern:
grep -Po 'MSG.\K[\w\s\d]*'
is matching just Attention Kohl because you have a single quote after that which will not be matched by any of the \w, \s, \d tokens.
You also have , and . within your desired portion, so you need to match those too. Also, \d is actually a subset of \w so no need for explicit \d.
So you can do:
grep -Po 'MSG.\K[\w\s,.'\'']*'
Or if you just want to match till the end:
grep -Po 'MSG.\K.*'
Do you just want to grep everything after MSG? That would be simpler with other methods.
Also I see there are multiple ^G characters in your file, adjacent to MSG word as well. Not sure if you want to exclude those while grep'ing.
Back to your given regex - you can add \W which will match a non-word character and give you desired result.
grep -oP 'MSG.\K[\w\s\d\W]*' filename
Also, there is no need to use the < operator with grep here.
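One of the "other methods" for taking everything after MSG is a sed substitution, which avoids the need for grep -P. The sketch below uses a shortened stand-in for the record and assumes that the ^G shown in the question is the single BEL control character (written \a in printf); both are assumptions.

```shell
# Build a shortened stand-in for the record; \a is the BEL control
# character that terminals usually display as ^G.
record=$(printf "PER\aIC~MSG\aAttention Kohl's AS2 trading partners, Kohl's will be changing.")

# Delete everything up to and including "MSG" plus the one separator
# character after it; -n with the p flag prints only lines where that worked.
msg=$(printf '%s\n' "$record" | sed -n 's/.*MSG.//p')

printf '%s\n' "$msg"
```

Because .* is greedy, this keeps the text after the last MSG on the line, which is the same as the first one when the segment occurs only once.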

terminal extract words ending with ".abc" from file

I want to do the following through the terminal. I have a file with many lines, each line containing a whole sentence. Some lines are empty. I want to read the file and extract all words that end with .abc. I want to do this through the terminal. How might I do that?
grep can be very useful here:
$ cat input
.abc
.abdadf
assadf.abc
adsfas.abcadf
asdf.abc
$ grep -o '\b[^\.]*\.abc\b' input
assadf.abc
asdf.abc
What it does
-o prints only the part of the line that matches the given regex
\b[^\.]*\.abc\b regex matches any word which ends with .abc
\b word boundary
[^\.] anything other than a .
* matches zero or more
\.abc\b matches .abc followed by word boundary \b
Note
If the word can contain more than one . then modify the regex as
\b.*\.abc\b
where .* would match anything including .
To find all the words that end with .abc:
grep -oP '\S*\.abc(?=\s|$)' file
\S* Zero or more non-space characters.
(?=\s|$) Positive lookahead asserts that the match must be followed by a space or by the end of the line.
Try awk, among various other possibilities (this prints whole lines that end with .abc):
awk '/\.abc$/' file
You can also use the sed command.
sed -n '/\.abc$/ p' file

Print all characters up to a matching pattern from a file

Maybe a silly question, but I have a text file and need to display everything up to the first pattern match, which is a '/'. (No line contains blank spaces.)
Example.txt:
somename/for/example/
something/as/another/example
thisfile/dir/dir/example
Preferred output:
somename
something
thisfile
I know this grep code will display everything after a matching pattern:
grep -o '/[^\n]*' '/my/file.txt'
So is there any way to do the complete opposite, maybe rm everything after matching pattern or invert to display my preferred output?
Thanks.
If you're calling an external command like grep, you can get the results you require with the sed command, i.e.
echo "something/as/another/example" | sed 's:/.*::'
something
Instead of focusing on what you want to keep, think about what you want to remove, in this case everything after the first '/' char. This is what this sed command does.
The leading s means substitute, and :/.*: is the pattern to match, with /.* meaning match the first / char and all characters after that. The second half of the sed command is the replacement. With ::, this means replace with nothing.
The traditional idiom for sed is to use s/str/rep/, using / chars to delimit the search from the replacement, but you can use any character you want after the initial s (substitute) command.
Some seds expect the / char, and want a special indication that the following character is the sub/replace delimiter. So if s:/.*:: doesn't work, then s\:/.*:: should work.
IHTH.
You can use a much simpler regexp:
/[^/]*/
The forward slash after the caret is the character you're matching up to.
Assuming filename as "file.txt"
cat file.txt | cut -d "/" -f 1
Here, we are cutting the input line with "/" as the delimiter (-d "/"). Then we select the first field (-f 1).
You just need to include starting anchor ^ and also the / in a negated character class.
grep -o '^[^/]*' file
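If the lines are already being processed in a shell loop, the same result can be sketched without any external command by using parameter expansion. This is an alternative to the answers above, not one of them; the function name first_part is made up for illustration.

```shell
# Print the part of each input line that precedes the first "/".
first_part() {
  while IFS= read -r line; do
    # ${line%%/*} removes the longest suffix matching "/*",
    # i.e. everything from the first slash onward.
    printf '%s\n' "${line%%/*}"
  done
}

first_parts=$(printf 'somename/for/example/\nsomething/as/another/example\nthisfile/dir/dir/example\n' | first_part)
printf '%s\n' "$first_parts"
```

For large files, the sed, cut, or grep one-liners are likely to be faster than a shell loop.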

searching specific word in shell script

I have a problem. Please give me a solution.
I have to run the command given below, which lists all the files that contain the given string "abcde1234".
find /path/to/dir/ * | xargs grep abcde1234
But it also displays the files that contain the string "abcde1234567". I need only the files that contain the word "abcde1234". What modification do I need to make to the command?
When I need something like that, I use the \< and \> which mean word boundary. Like this:
grep '\<abcde1234\>'
The symbols \< and \> respectively match the empty string at the beginning and end of a word.
But that's me. The correct way might be to use the -w switch instead (which I tend to forget about):
-w, --word-regexp
Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character. Similarly, it
must be either at the end of the line or followed by a non-word constituent character. Word-constituent characters are letters, digits, and the underscore.
One more thing: instead of find + xargs you can use just find with -exec. Or actually just grep with -r:
grep -w -r abcde1234 /path/to/dir/
$ grep abcde1234 *
This greps for the string abcde1234 in the files of the current directory and prints each matching line prefixed with the name of the file that contains it.
Ex:
abc.log: abcde1234 found
Hi, I got the answer for this. By appending $ (the end-of-line anchor) to the word to be searched, it will display only the files that contain that word at the end of a line.
The command will be like this.
find /path/to/dir/ * | xargs grep abcde1234$
Thanks.
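The difference between the -w approach and the trailing $ can be checked on a throwaway directory. This is a sketch with invented file names and contents; -w is not part of POSIX grep but is supported by GNU and BSD grep.

```shell
# Create a temporary directory with two invented files.
dir=$(mktemp -d)
printf 'abcde1234 found\n' > "$dir/exact.log"
printf 'abcde1234567\n' > "$dir/longer.log"

# -w matches whole words only; -l lists file names instead of matching lines.
# The final sed strips the directory part so only the file name remains.
word_hits=$(find "$dir" -type f -exec grep -lw 'abcde1234' {} + | sed 's:.*/::')

# A trailing $ anchors the match to the end of the line, so exact.log is
# missed here because " found" follows the word on that line.
anchor_hits=$(find "$dir" -type f -exec grep -l 'abcde1234$' {} + | sed 's:.*/::')

printf 'files matched with -w: %s\n' "$word_hits"
printf 'files matched with $: %s\n' "$anchor_hits"

rm -rf "$dir"
```

The $ form works only when the word always ends its line, while -w (or \< and \>) also handles the word appearing in the middle of a line.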

Resources