Exclude lines from output based on patterns in file - bash

I have a couple of IP addresses in a text file. I have another file in which each line contains an IP along with some other data. Samples below,
pattern.txt (the IPs to exclude) :
A.B.C.D
E.F.G.H
I.J.K.L
target.txt (the main list) :
server1,L.M.N.O,user1
server2,A.B.C.D,user2
server3,P.Q.R.S,user3
Now I need to create a rule (preferably a one-liner) which lists only those lines in "target.txt", whose IP addresses are NOT present in "pattern.txt". The required output is as below,
server1,L.M.N.O,user1
server3,P.Q.R.S,user3
I tried using this cat target.txt | grep -f pattern.txt. This doesn't get the job done though: it simply highlights the IPs I want to exclude, but doesn't actually exclude them from the output. What am I doing wrong?

You can say:
grep -F -v -f pattern.txt target.txt
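Supplying -F makes grep interpret the patterns as fixed strings, so the . in an IP address no longer matches an arbitrary character. The -v option inverts the match, so only the lines that contain none of the patterns are printed, which is the step missing from the original attempt.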
-F, --fixed-strings
Interpret PATTERN as a list of fixed strings, separated by newlines, any of which is to be matched. (-F is specified by POSIX.)
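To make the effect of -F and -v concrete, here is a minimal sketch. The dotted-quad addresses and the scratch directory are invented for the demonstration and are not from the question; the third target line is deliberately written without dots to show what a regex "." would match.

```shell
# Work in a scratch directory so nothing real is overwritten.
tmp=$(mktemp -d)

# Hypothetical sample data standing in for pattern.txt and target.txt.
printf '%s\n' '10.0.0.1' '10.0.0.2' '10.0.0.3' > "$tmp/pattern.txt"
printf '%s\n' \
  'server1,10.0.0.9,user1' \
  'server2,10.0.0.1,user2' \
  'server3,10a0b0c1,user3' > "$tmp/target.txt"

# -F: fixed strings, -v: invert the match, -f: read patterns from a file.
fixed=$(grep -F -v -f "$tmp/pattern.txt" "$tmp/target.txt")

# Without -F each "." matches any character, so server3 is dropped as well.
regex=$(grep -v -f "$tmp/pattern.txt" "$tmp/target.txt")

printf 'with -F:\n%s\n\nwithout -F:\n%s\n' "$fixed" "$regex"
rm -rf "$tmp"
```

Note that -F still matches substrings, so a pattern such as 10.0.0.1 would also exclude a line containing 10.0.0.11. Adding -w is one way to tighten this if it matters for your data.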

Related

Checking if strings in one file occur in another set of files, list those that don't

I have a large number of files, and I need to check whether they contain each of the strings listed, one per line, in another file, and report the strings that do not occur. The file of strings is a list of VLANs, and the files to be checked are the periodic outputs of 'show mac-address' from our core switches, saved as a bunch of txt files. I am using a Linux Bash shell.
I can cover the matching using grep easily enough with...
cat *.txt > [MAC-File] && fgrep -of [VLAN-File] [MAC-File] | sort -h | uniq -c
This gives me a list of the VLANs that match and the number of lines in the txt files that do. That's progress, but what I need is to find the VLANs that don't have MAC addresses seen in them, so I need to turn the logic around. My searching tells me grep doesn't have an opposite condition to -o, so I need to find an alternative. This is to be applied against 3 very large LANs, each with hundreds of VLANs in them, and I don't want to input the results into an Excel spreadsheet!
Please note the files I am checking against have more data per line than just the VLAN number, so comparing lines does not work.
The first file with the strings to be looked for (or not for!) is in the format..
100
103
230
Note I have space before and after each number to make them unique so they only match the second column of the large data file I am checking which is in the format
6c4b-904b-0c5c 230 Learned BAGG103 Y
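Since you use fgrep, which is equivalent to grep -F, we know the entries in the pattern file are fixed strings. To find which patterns did not match, use the following method:
$ grep -oFf pattern_file search_file | grep -vFf - pattern_file
The first grep prints only the matched patterns; the second reads them from standard input (-f -) and prints the pattern_file lines that match none of them. Do not add -o to the second grep: with GNU grep, combining -v and -o prints nothing, because the selected (non-matching) lines have no matched part to show.
In the OP's case, this becomes:
$ grep -oFf [VLAN-File] [MAC-File] | grep -vFf - [VLAN-File]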
You can also do this with awk in a single go:
$ awk '(NR==FNR){a[$0];next}($2 in a){a[$2]++}END{for(i in a) if (a[i]==0) print i}' [VLAN-File] [MAC-File]
The above works for exact matches, so no need to have the extra spaces. If you want to keep the extra spaces, it is a bit more tricky:
$ awk '(NR==FNR){a[$0];next}
{for(i in a) if (index($0, i)) a[i]++}
END{for(i in a) if (a[i]==0) print i}' [VLAN-File] [MAC-File]
All of the above print the VLAN-File entries that do not appear in the MAC-File.
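The exact-match awk approach can be tried on a small sample. This is a sketch only: the file names vlans.txt and macs.txt, the second MAC line, and the array name seen are made up for illustration, and the VLAN list is assumed to hold bare numbers without the padding spaces.

```shell
tmp=$(mktemp -d)

# Hypothetical VLAN list (one bare number per line) and switch output.
printf '%s\n' 100 103 230 > "$tmp/vlans.txt"
printf '%s\n' \
  '6c4b-904b-0c5c 230 Learned BAGG103 Y' \
  '6c4b-904b-0c5d 100 Learned BAGG100 Y' > "$tmp/macs.txt"

# First file: remember every VLAN with a zero counter.
# Second file: count VLANs that appear in column 2.
# END: print the VLANs whose counter is still zero.
unseen=$(awk 'NR==FNR { seen[$1] = 0; next }
              $2 in seen { seen[$2]++ }
              END { for (v in seen) if (seen[v] == 0) print v }' \
         "$tmp/vlans.txt" "$tmp/macs.txt")

echo "$unseen"
rm -rf "$tmp"
```

Because the comparison is made on the second field only, the "103" inside BAGG103 does not count as a sighting of VLAN 103, which is why the padding spaces become unnecessary.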

grep -Ff producing invalid output

I'm using
code -
grep -Ff list.txt C:/data/*.txt > found.txt
but it keeps producing invalid results: lines that don't contain the emails I supplied.
list.txt contains -
email#email.com
customer#email.com
imadmin#gmail.com
newcustomer#email.com
helloworld#yes.com
and so on.. email to match on each line,
search files contain -
user1:phonenumber1:email#email.com:last-active:recent
user2:phonennumber2:customer#email.com:last-active:inactive
user3:phonenumber3:blablarandom#bla.com:last-active:never
then another may contain -
blublublu email#email.com phonenumber subscribed
nanananana customer#email.com phonenumber unsubscribed
useruser noemailinput#noemail.com phonenumber pending
so what I'm trying to do is present grep with a list of emails/list of strings " list.txt " and to then search the directory provided for matches of each string and output the entire line that contains each match.
example of output in this case would be -
user1:phonenumber1:email#email.com:last-active:recent
user2:phonennumber2:customer#email.com:last-active:inactive
blublublu email#email.com phonenumber subscribed
nanananana customer#email.com phonenumber unsubscribed
yet it wouldn't output the other two lines -
user3:phonenumber3:blablarandom#bla.com:last-active:never
useruser noemailinput#noemail.com phonenumber pending
because no string is within that line.
The file list.txt probably contains empty lines or some of the separators. When I added : to list.txt, all the lines from the first sample started to match. Similarly, adding a space made all the lines from the second sample match. Adding # causes the same symptoms.
Try running grep -oFf ... (if your grep supports -o) to see the exact matching parts. If there are empty lines in list.txt, the number of matches will be less than the number of matches without -o. Try searching the output of -o for extremely short outputs to check for suspicious strings. You can also examine the shortest lines in list.txt.
while IFS= read -r line ; do printf '%s %s\n' "${#line}" "$line" ; done < list.txt | sort -nk1,1
I think your file list.txt may have blank lines in it, causing it to match every line in the files specified with C:/data/*.txt. To fix this, you can either manually delete every empty line or run the command sed -i '/^$/d' list.txt, where the -i flag edits the file in place.
The issue may also be related to DOS carriage returns. Try running cat -v list.txt and check whether the lines are followed by ^M:
email#email.com^M
customer#email.com^M
If this is the case you will need to amend the file using either dos2unix or tr -d '\r' < list.txt > output.txt.
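Both suggested fixes can be combined into one cleaning step before the search. The sketch below assumes a list saved with CRLF line endings and a stray blank line; the names list.clean and data.txt are hypothetical.

```shell
tmp=$(mktemp -d)

# A list with CRLF endings and a blank line, as a Windows editor might save it.
printf 'email#email.com\r\n\r\ncustomer#email.com\r\n' > "$tmp/list.txt"
printf '%s\n' \
  'user1:phonenumber1:email#email.com:last-active:recent' \
  'user3:phonenumber3:blablarandom#bla.com:last-active:never' > "$tmp/data.txt"

# Strip carriage returns, then drop lines that are empty or whitespace-only.
tr -d '\r' < "$tmp/list.txt" | grep -v '^[[:space:]]*$' > "$tmp/list.clean"

# Search with the cleaned list; only lines holding a listed email remain.
matches=$(grep -Ff "$tmp/list.clean" "$tmp/data.txt")

echo "$matches"
rm -rf "$tmp"
```

Writing to a separate file rather than editing list.txt in place keeps the original available for comparison.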

grep ignore if the word searched begin/end with a specific character

Here is my problem: I use grep to find a string in multiple files.
Let's say I am looking for the word "balloon". grep returns lines containing balloon like "Here is a balloon", "loginxxballoonx", "balloon123" etc. This is not a problem except for one case: I want to ignore the line if it finds "/balloon/".
How can I look for every "balloon" string in multiple files, but ignore those with / before and after (ignore "/balloon/")?
EDIT: To clarify my problem a bit more: my strings to search for are stored in a file. I use grep -f mytokenfile to search for every string stored in my "mytokenfile" file. For example, my file "mytokenfile" looks like this:
balloon
avion
car
bus
I would like to get all the lines containing these strings, with or without prefixes/suffixes, except if the prefix and suffix are "/".
This should work using a negated bracket expression, [^/]:
grep '[^/]balloon[^/]' ballonfile
Edit:
But this doesn't work if there is a 'balloon' not prefixed or suffixed by any other characters.
Use the following approach (considering that a line may contain multiple occurrences of the search keyword, such as loginxxballoonx, some text /balloon/ text):
cat testfile | grep '[^/]balloon[^/]' | grep -v '/balloon/'
-v (--invert-match)
Invert the sense of matching, to select non-matching lines. (-v is specified by POSIX.)
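A single extended regex may also cover the cases noted above, including a keyword at the very start or end of a line. This is a sketch rather than either answerer's method: it keeps a line when at least one occurrence is not wrapped in slashes on both sides. The sample lines are invented, and the tokens are assumed to contain no regex metacharacters.

```shell
tmp=$(mktemp -d)
printf '%s\n' \
  'Here is a balloon' \
  'balloon123' \
  'loginxxballoonx' \
  'path /balloon/ only' \
  'loginxxballoonx and /balloon/' \
  'un avion' \
  '/avion/' > "$tmp/testfile"

# Keep a line if some "balloon" is preceded by line start or a non-slash,
# or is followed by a non-slash or line end.
kept=$(grep -E '(^|[^/])balloon|balloon([^/]|$)' "$tmp/testfile")

# Same idea for every token in a file: sed wraps each token in the two
# alternatives, and grep reads the generated patterns from stdin (-f -).
printf '%s\n' balloon avion > "$tmp/mytokenfile"
kept_all=$(sed 's#.*#(^|[^/])&|&([^/]|$)#' "$tmp/mytokenfile" |
           grep -E -f - "$tmp/testfile")

printf '%s\n---\n%s\n' "$kept" "$kept_all"
rm -rf "$tmp"
```

Unlike the two-stage pipeline with grep -v, this keeps a line such as "loginxxballoonx and /balloon/", since one of its occurrences qualifies.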

how to grep the following

I have an input file
RAKESH_ONE
RAKESH-TWO
RAKESH123
RAKESHTHREE
/RAKESH/
FIVERAKESH
456RAKESH
WELCOME123
This is RAKESH
I would like to get the output
RAKESH_ONE
RAKESH-TWO
/RAKESH/
This is RAKESH
I want to print the line matching the pattern RAKESH. If the pattern is prefixed or suffixed with alphanumeric we should avoid it.
([^a-zA-Z0-9]+|^)RAKESH([^a-zA-Z0-9]+|$)
This will match patterns on the lines without alphanumeric prefixes or suffixes. It will not match the whole line, but if used with grep or sed you can output just the lines you need.
UPDATE
As requested, here's the full grep command. Use the -E option to use extended regex:
grep -E '([^a-zA-Z0-9]+|^)RAKESH([^a-zA-Z0-9]+|$)' file.txt
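The pattern can be checked against the input from the question. The sketch below uses a slightly shortened form of the regex (without the + quantifiers, which are not needed for line selection) and compares it with grep -w, which treats the underscore as a word character and therefore gives a different result.

```shell
tmp=$(mktemp -d)
printf '%s\n' RAKESH_ONE RAKESH-TWO RAKESH123 RAKESHTHREE /RAKESH/ \
  FIVERAKESH 456RAKESH WELCOME123 'This is RAKESH' > "$tmp/file.txt"

# RAKESH must be bounded by a non-alphanumeric character or a line edge.
out=$(grep -E '(^|[^a-zA-Z0-9])RAKESH([^a-zA-Z0-9]|$)' "$tmp/file.txt")

# For comparison: -w also counts "_" as part of a word, so RAKESH_ONE is lost.
word=$(grep -w 'RAKESH' "$tmp/file.txt")

printf 'regex:\n%s\n\n-w:\n%s\n' "$out" "$word"
rm -rf "$tmp"
```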

grep exact pattern from a file in bash

I have the following IP addresses in a file
3.3.3.1
3.3.3.11
3.3.3.111
I am using this file as input file to another program. In that program it will grep each IP address. But when I grep the contents I am getting some wrong outputs.
like
cat testfile | grep -o 3.3.3.1
but I am getting output like
3.3.3.1
3.3.3.1
3.3.3.1
I just want to get the exact output. How can I do that with grep?
Use the following command:
grep -owF "3.3.3.1" testfile
-o returns only the match, not the whole line. -w matches whole words only, meaning the match must be delimited by non-word characters such as <space>, <tab>, a comma, a semicolon, or the start or end of the line. It prevents grep from matching 3.3.3.1 inside 3.3.3.111.
-F greps for fixed strings instead of patterns. This prevents the . in the IP address to be interpreted as any char, meaning grep will not match 3a3b3c1 (or something like this).
To match whole words only, use grep -ow 3.3.3.1 testfile
UPDATE: Use the solution provided by hek2mgl as it is more robust.
You may use anchors.
grep '^3\.3\.3\.1$' file
Since grep uses regular expressions by default, you need to escape the dots to make grep match a literal dot character.
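The difference between the approaches can be seen on the three addresses from the question. This is a small sketch; the -Fx variant (fixed string, whole line) is an addition not mentioned in the answers, offered as an alternative to escaping the dots by hand.

```shell
tmp=$(mktemp -d)
printf '%s\n' 3.3.3.1 3.3.3.11 3.3.3.111 > "$tmp/testfile"

# Plain pattern: matches as a substring, so all three lines are counted.
loose=$(grep -c '3.3.3.1' "$tmp/testfile")

# -w -F: whole-word, fixed-string match; only the exact address is printed.
word=$(grep -owF '3.3.3.1' "$tmp/testfile")

# -x -F: the fixed string must match the entire line.
line=$(grep -Fx '3.3.3.1' "$tmp/testfile")

printf 'loose count: %s\nword match: %s\nline match: %s\n' "$loose" "$word" "$line"
rm -rf "$tmp"
```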
