fgrep Match string patterns with a file - macos

I used fgrep -f to extract the lines of genedictionary.txt that match the patterns (IDs) listed in a pattern file (id.txt).
My command, fgrep -f id.txt genedictionary.txt > result.txt, does not produce any results.
How should I modify this command to get the result I want?
My pattern file (id.txt) looks like this (one ID per line):
P04083
P50995
Q9UJ72
P13747
A23444
The other file, against which I need to match these patterns, looks like this:
ANXA1_HUMAN#SWISSPROT|P04083#SWISSPROT|ANXA1:ANXA1|
ANX10_HUMAN#SWISSPROT|Q9UJ72#SWISSPROT|ANXA10:ANXA10|
ANX11_HUMAN#SWISSPROT|P50995#SWISSPROT|ANXA11:ANXA11|
ANX13_HUMAN#SWISSPROT|P27216#SWISSPROT|ANXA13:ANXA13|
HLF_HUMAN#SWISSPROT|Q16534#SWISSPROT|HLF:HLF|
Output should be
ANXA1_HUMAN#SWISSPROT|P04083#SWISSPROT|ANXA1:ANXA1|
ANX10_HUMAN#SWISSPROT|Q9UJ72#SWISSPROT|ANXA10:ANXA10|
ANX11_HUMAN#SWISSPROT|P50995#SWISSPROT|ANXA11:ANXA11|

The first pattern P04083 has a space at the end, so it doesn't match anything in genedictionary.txt.
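A minimal sketch of the cleanup, assuming that trailing whitespace (or Windows-style carriage returns) in id.txt is the only problem. The scratch directory and the file name id.clean.txt are made up for illustration:

```shell
# Work in a scratch directory so no real files are overwritten.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Sample data from the question; the first ID carries a trailing space.
printf '%s\n' 'P04083 ' 'P50995' 'Q9UJ72' 'P13747' 'A23444' > id.txt
printf '%s\n' \
  'ANXA1_HUMAN#SWISSPROT|P04083#SWISSPROT|ANXA1:ANXA1|' \
  'ANX10_HUMAN#SWISSPROT|Q9UJ72#SWISSPROT|ANXA10:ANXA10|' \
  'ANX11_HUMAN#SWISSPROT|P50995#SWISSPROT|ANXA11:ANXA11|' \
  'ANX13_HUMAN#SWISSPROT|P27216#SWISSPROT|ANXA13:ANXA13|' \
  'HLF_HUMAN#SWISSPROT|Q16534#SWISSPROT|HLF:HLF|' > genedictionary.txt

# Strip trailing whitespace (spaces, tabs, carriage returns) from every ID
# and drop empty lines, since an empty fixed-string pattern matches every line.
sed -e 's/[[:space:]]*$//' -e '/^$/d' id.txt > id.clean.txt

# grep -F is equivalent to fgrep.
grep -F -f id.clean.txt genedictionary.txt > result.txt
cat result.txt
```

To check whether stray whitespace is present in the first place, cat -e id.txt marks each line end with a $, so a trailing space shows up as a gap before the $.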

Related

Extracting all but a certain sequence of characters in Bash

In bash I need to extract a certain sequence of letters and numbers from a filename. In the example below I need to extract just the S??E?? section of the filenames. This must work with both upper/lowercase.
my.show.s01e02.h264.aac.subs.mkv
great.s03e12.h264.Dolby.mkv
what.a.fab.title.S05E11.Atmos.h265.subs.eng.mp4
Expected output would be:
s01e02
s03e12
S05E11
I've been trying to do this with SED but can't get it to work. This is what I have tried, without success:
sed 's/.*s[0-9][0-9]e[0-9][0-9].*//'
Many thanks for any help.
With sed we can match the desired string in a capture group, and use the I suffix for case-insensitive matching, to accomplish the desired result.
For the sake of this answer I'm assuming the filenames are in a file:
$ cat fnames
my.show.s01e02.h264.aac.subs.mkv
great.s03e12.h264.Dolby.mkv
what.a.fab.title.S05E11.Atmos.h265.subs.eng.mp4
One sed solution:
$ sed -E 's/.*\.(s[0-9][0-9]e[0-9][0-9])\..*/\1/I' fnames
s01e02
s03e12
S05E11
Where:
-E - enable extended regex support
\.(s[0-9][0-9]e[0-9][0-9])\. - match s??e?? with a pair of literal periods as bookends; the s??e?? (wrapped in parens) will be stored in capture group #1
\1 - print out capture group #1
/I - use case-insensitive matching
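The I flag is not specified by POSIX, so it may be missing from some sed implementations. A sketch of a variant that spells out both cases in bracket expressions instead, assuming the same fnames file; the -n option with the p flag is an addition here, so that lines without a match print nothing:

```shell
# Recreate the sample file in a scratch directory.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf '%s\n' \
  'my.show.s01e02.h264.aac.subs.mkv' \
  'great.s03e12.h264.Dolby.mkv' \
  'what.a.fab.title.S05E11.Atmos.h265.subs.eng.mp4' > fnames

# [sS] and [eE] replace the I flag; -n plus the trailing p prints
# only the lines on which the substitution succeeded.
episodes=$(sed -n -E 's/.*\.([sS][0-9][0-9][eE][0-9][0-9])\..*/\1/p' fnames)
printf '%s\n' "$episodes"
```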
I think your pattern is OK. With grep -o you get only the matched part of each line instead of the whole matching line. So
grep -Eio 'S[0-9]{2}E[0-9]{2}'
solves your problem. The -E option is needed so that {2} is read as a repetition count rather than as literal braces, and -i makes the match case-insensitive. Only digits are accepted after the S and the E. Maybe you can put it in an if, so that filenames without a match are reported as malformed.
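A sketch of such a check, using grep's exit status in an if. The helper name extract_episode and the third filename (which has no episode tag) are made up for illustration:

```shell
# extract_episode is a hypothetical helper name, not part of any tool.
# It prints the episode tag and returns grep's exit status.
extract_episode() {
  printf '%s\n' "$1" | grep -Eio 's[0-9]{2}e[0-9]{2}'
}

for name in \
  'my.show.s01e02.h264.aac.subs.mkv' \
  'what.a.fab.title.S05E11.Atmos.h265.subs.eng.mp4' \
  'no.episode.tag.mkv'
do
  # grep exits non-zero when nothing matches, so the assignment can drive the if.
  if episode=$(extract_episode "$name"); then
    printf '%s\n' "$episode"
  else
    printf 'no episode tag in: %s\n' "$name" >&2
  fi
done
```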
Suppose you have those file names:
$ ls -1
great.s03e12.h264.Dolby.mkv
my.show.s01e02.h264.aac.subs.mkv
what.a.fab.title.S05E11.Atmos.h265.subs.eng.mp4
You can extract the substring this way:
$ printf "%s\n" * | sed -E 's/^.*([sS][0-9][0-9][eE][0-9][0-9]).*/\1/'
Or with grep:
$ printf "%s\n" *.m* | grep -o '[sS][0-9][0-9][eE][0-9][0-9]'
Either prints:
s03e12
s01e02
S05E11
You could use that same sed or grep on a file (with filenames in it) as well.

Grepping for exact string while ignoring regex for dot character

So here's my issue. I need to develop a small bash script that can grep a file containing account names (let's call it file.txt). The contents would be something like this:
accounttest
account2
account
accountbtest
account.test
Matching an exact line SHOULD be easy but apparently it's really not.
I tried:
grep "^account$" file.txt
The output is:
account
So in this situation the output is OK, only "account" is displayed.
But if I try:
grep "^account.test$" file.txt
The output is:
accountbtest
account.test
So the next obvious solution that comes to mind, in order to stop interpreting the dot character as "any character", is using fgrep, right?
fgrep account.test file.txt
The output, as expected, is correct this time:
account.test
But what if I try now:
fgrep account file.txt
Output:
accounttest
account2
account
accountbtest
account.test
This time the output is completely wrong, because I can't use the beginning/end line characters with fgrep.
So my question is, how can I properly grep a whole line, including the beginning and end of line special characters, while also matching exactly the "." character?
EDIT: Please note that I do know that the "." character needs to be escaped, but in my situation, escaping is not an option, because of further processing that needs to be done to the account name, which would make things too complicated.
The . is a special character in regex notation which needs to be escaped to match it as a literal string when passing to grep, so do
grep "^account\.test$" file.txt
Or, if you cannot modify the search string, use the -F flag to make grep treat it as a literal string without any regex processing, together with -x to match whole lines only:
grep -Fx 'account.test' file.txt
From man grep
-F, --fixed-strings
Interpret PATTERN as a list of fixed strings (instead of regular expressions), separated by newlines, any of which is to be matched.
-x, --line-regexp
Select only those matches that exactly match the whole line. For a regular expression pattern, this is like parenthesizing the pattern and then surrounding it with ^ and $.
fgrep is the same as grep -F. grep also has the -x option which matches against whole lines only. You can combine these to get what you want:
grep -Fx account.test file.txt
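Because -F needs no escaping, the account name can be passed straight from a variable. A small sketch with the sample file.txt from the question; the variable name account is just for illustration:

```shell
# Recreate the sample file in a scratch directory.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf '%s\n' accounttest account2 account accountbtest account.test > file.txt

# -F: treat the name literally; -x: match whole lines only;
# -e keeps a name starting with "-" from being parsed as an option.
for account in 'account' 'account.test'; do
  grep -Fx -e "$account" file.txt
done
```

Each iteration prints only the line that is exactly equal to the account name.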

grep ignore if the word searched begin/end with a specific character

Here is my problem: I use grep to find a string in multiple files.
Let's say I am looking for the word "balloon". grep returns lines containing balloon, like "Here is a balloon", "loginxxballoonx", "balloon123" etc. This is not a problem except for one case: I want to ignore the line if it finds "/balloon/".
How can I look for every "balloon" string in multiple files, but ignore those with / before and after (ignore "/balloon/")?
EDIT: To be more precise, my search strings are stored in a file. I use grep -f mytokenfile to search for every string stored in my "mytokenfile" file. For example, "mytokenfile" looks like this:
balloon
avion
car
bus
I would like to get all the lines containing these strings, with or without prefixes/suffixes, except if the prefix and suffix are "/".
This should work by using a negated bracket expression, [^/]:
grep '[^/]balloon[^/]' ballonfile
Edit:
But this doesn't work if 'balloon' sits at the very start or end of a line, because each [^/] has to match an actual character.
Use the following approach (considering that a line may contain multiple occurrences of the search keyword, such as loginxxballoonx, some text /balloon/ text):
grep '[^/]balloon[^/]' testfile | grep -v '/balloon/'
-v (--invert-match)
Invert the sense of matching, to select non-matching lines. (-v is specified by POSIX.)
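An alternative sketch that also keeps lines where the keyword sits at the very start or end of a line: accept a line if any occurrence of a keyword lacks a / on at least one side. It builds one extended regex from mytokenfile, assuming the tokens are plain words with no regex metacharacters and the file has no blank lines. The contents of testfile are made up for illustration:

```shell
# Recreate the token file and a small test file in a scratch directory.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf '%s\n' balloon avion car bus > mytokenfile
printf '%s\n' \
  'Here is a balloon' \
  'loginxxballoonx' \
  'path /balloon/ only' \
  'balloon123' \
  '/avion/' \
  'the bus stops' > testfile

# For each token T, emit "(^|[^/])T|T([^/]|$)": T at line start or after a
# non-slash, or T at line end or before a non-slash. Join the lines with "|".
pattern=$(sed 's#.*#(^|[^/])&|&([^/]|$)#' mytokenfile | paste -s -d '|' -)

matches=$(grep -E "$pattern" testfile)
printf '%s\n' "$matches"
```

Only the lines whose sole keyword occurrence is wrapped in slashes on both sides ('path /balloon/ only' and '/avion/') are left out.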

how to grep the following

I have an input file
RAKESH_ONE
RAKESH-TWO
RAKESH123
RAKESHTHREE
/RAKESH/
FIVERAKESH
456RAKESH
WELCOME123
This is RAKESH
I would like to get the output
RAKESH_ONE
RAKESH-TWO
/RAKESH/
This is RAKESH
I want to print the line matching the pattern RAKESH. If the pattern is prefixed or suffixed with alphanumeric we should avoid it.
([^a-zA-Z0-9]+|^)RAKESH([^a-zA-Z0-9]+|$)
This will match patterns on the lines without alphanumeric prefixes or suffixes. It will not match the whole line, but if used with grep or sed you can output just the lines you need.
UPDATE
As requested, here's the full grep command. Use the -E option to use extended regex:
grep -E "([^a-zA-Z0-9]+|^)RAKESH([^a-zA-Z0-9]+|$)" file.txt
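The command can be checked against the sample input from the question as sketched below. The comparison with grep -w is an aside: -w counts the underscore as a word character, so it would skip RAKESH_ONE, which is one reason to spell out the character classes here:

```shell
# Recreate the sample input in a scratch directory.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf '%s\n' RAKESH_ONE RAKESH-TWO RAKESH123 RAKESHTHREE /RAKESH/ \
  FIVERAKESH 456RAKESH WELCOME123 'This is RAKESH' > file.txt

# Single quotes keep the shell from interpreting "$" and "|" in the pattern.
wanted=$(grep -E '([^a-zA-Z0-9]+|^)RAKESH([^a-zA-Z0-9]+|$)' file.txt)
printf '%s\n' "$wanted"

# For comparison: whole-word matching treats "_" as part of a word.
word_only=$(grep -w 'RAKESH' file.txt)
```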

Exclude lines from output based on patterns in file

I have a couple of IP addresses in a text file. I have another file in which each line contains an IP along with some other data. Samples below,
pattern.txt (the IPs to exclude) :
A.B.C.D
E.F.G.H
I.J.K.L
target.txt (the main list) :
server1,L.M.N.O,user1
server2,A.B.C.D,user2
server3,P.Q.R.S,user3
Now I need to create a rule (preferably a one-liner) which lists only those lines in "target.txt", whose IP addresses are NOT present in "pattern.txt". The required output is as below,
server1,L.M.N.O,user1
server3,P.Q.R.S,user3
I tried using this cat target.txt | grep -f pattern.txt. This doesn't get the job done though: it simply highlights the IPs I want to exclude, but doesn't actually exclude them from the output. What am I doing wrong?
You can say:
grep -F -v -f pattern.txt target.txt
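Supplying -F makes grep interpret the patterns as fixed strings, so a . in the IP addresses won't match an arbitrary character. The -v option inverts the match, so lines containing any of the listed IPs are dropped instead of printed.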
-F, --fixed-strings
Interpret PATTERN as a list of fixed strings, separated by newlines, any of which is to be matched. (-F is specified by POSIX.)
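One caveat: -F still matches substrings, so a pattern such as 192.0.2.1 would also exclude a line containing 192.0.2.10. A sketch of an exact comparison on the second comma-separated field with awk, assuming pattern.txt is non-empty and the IP is always field 2. The addresses below are documentation placeholders, not the asker's data:

```shell
# Build placeholder input files in a scratch directory.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf '%s\n' 192.0.2.1 198.51.100.9 > pattern.txt
printf '%s\n' \
  'server1,192.0.2.10,user1' \
  'server2,192.0.2.1,user2' \
  'server3,203.0.113.7,user3' > target.txt

# First file: remember each IP to exclude.
# Second file: print the lines whose field 2 is not in that set.
kept=$(awk -F, 'NR == FNR { skip[$1]; next } !($2 in skip)' pattern.txt target.txt)
printf '%s\n' "$kept"
```

Here server1 is kept because its address is compared as a whole field, whereas grep -F -v -f pattern.txt target.txt would drop it along with server2.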
