print last occurrence of each unique line by IP in file - bash

I need to parse a log file so that entries like this:
Jul 23 17:38:06 192.168.1.100 638 "this message will always be the same"
Jul 23 17:56:11 192.168.1.100 648 "this message will always be the same."
Jul 23 18:14:17 192.168.1.101 "this message will always be the same."
Jul 23 18:58:17 192.168.1.101 "this message will always be the same."
Look like this:
Jul 23 17:56:11 192.168.1.100 648 "this message will always be the same."
Jul 23 18:58:17 192.168.1.101 "this message will always be the same."
Basically what I am doing is taking a file that has duplicate IP addresses but with different timestamps, and finding the last occurrence (or most recent by time) of each IP address, and printing that to the screen or directing it into another file.
What I have tried:
I have written a bash script that I thought would allow me to do this but it is not working.
#!/bin/bash
/bin/grep 'common pattern to all lines' /var/log/file | awk '{print $4}' | sort -u > /home/user/iplist
while IFS='' read -r line || [[ -n "$line" ]]; do
echo "$line"
done < "/home/user/iplist"
awk '/'$line'/ {a=$0}END{print a} ' /var/log/logfile
The script runs and outputs each IP address, but it prints the whole line only for the last one.
For example:
192.168.100.101
192.168.100.102
192.168.100.103
Jul 23 20:20:55 192.168.100.104 "this message will always be the same."
The first command in the script takes all unique occurrences of an IP and sends them to a file. The while loop assigns each line to the "$line" variable, which I thought would then be passed to awk, which would search the actual file and print the last occurrence of each IP. How can I get this to work, either with a script or perhaps an awk one-liner?

$ tac file | awk '!seen[$4]++' | tac
Jul 23 17:56:11 192.168.1.100 648 "this message will always be the same."
Jul 23 18:58:17 192.168.1.101 "this message will always be the same."

You can use this awk command:
awk 'NF{a[$4]=$0} NF && !seen[$4]++{ips[++numIps]=$4} END {
for (i=1;i<=numIps;i++) print a[ips[i]] }' file
Jul 23 17:56:11 192.168.1.100 648 "this message will always be the same."
Jul 23 18:58:17 192.168.1.101 "this message will always be the same."
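For comparison, the loop from the question can also be made to work. There, the awk call sits after `done`, so it runs only once, after the loop has finished; by then $line is typically empty, so the pattern matches every line and awk prints the last line of the log. A minimal sketch that moves the lookup inside the loop follows. last_line_per_ip is a hypothetical helper name, and the sketch assumes the IP is always the fourth whitespace-separated field.

```shell
#!/bin/bash
# Sketch: print the last line seen for each distinct IP (field 4).
# last_line_per_ip is a hypothetical helper; the log path is passed as $1.
last_line_per_ip() {
    local logfile=$1 ip
    # Collect the unique IPs from the 4th field, skipping empty lines.
    awk 'NF {print $4}' "$logfile" | sort -u |
    while IFS= read -r ip; do
        # Compare the field as a plain string (dots are not wildcards)
        # and remember the most recent matching line.
        awk -v ip="$ip" '$4 == ip {last = $0} END {if (last != "") print last}' "$logfile"
    done
}
```

This reads the log once per IP, so the single-pass answers above are likely faster on large files. The output is ordered by IP (from sort -u) rather than by position in the log.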

Related

When comparing two list of IP addresses with bash, duplicate lines are printed

I am trying to write a function that compares values in 2 files.
I have a logfile from which I have extracted unique IP addresses.
I have another file that contains a list of "bad domains".
The objective is to print the list of IP addresses and also compare with the bad domains list and if a match is found, we need to prefix "bad address".
Conceptually, I can save the IP result to a file, and the domain result to a variable, use the while read loop on the IP file, a for loop on the domain variable and use grep to see if a pattern is found. If yes, add the prefix, else print normally. Yes, seems a bit time consuming but that is the idea.
list=`dig +short -f dns.blacklist.txt`
awk '{ print $1 }' $logfile | sort | uniq -c | sort -nr | awk '{print $2 "\t" $1}' >> response
while read -r listed
do
for x in $list
do
if [ "$(echo $listed | grep -F $x )" ]; then
echo $listed "*bad domain!*"
else
#echo $listed
fi
done
done < response | uniq
It does find the bad IP and adds the prefix, problem is, it creates a duplicate.
What it should be
213.64.237.230 2438
213.64.225.123 1202 *bad domain!*
213.64.141.89 731
213.64.214.124 480
.
.
.
What it shows
213.64.237.230 2438
213.64.225.123 1202
213.64.225.123 1202 *bad domain!*
213.64.225.123 1202
213.64.141.89 731
213.64.214.124 480
.
.
.
I fail to see why a duplicate is being made. If I remove the else condition and move the normal print to after the if, it still prints the IP below when it shouldn't.
Do note that the results are being piped to the uniq command.
I need a hint on where I am wrong and how I can mitigate this error.
Since I did not have access to your logfile or your dns.blacklist.txt file, I created dummy files using some of the IP addresses you listed. I refactored your loops and your if check and was able to solve your duplicate output issue.
#!/bin/bash
# used instead of your list=`dig +short -f dns.blacklist.txt`
bad=/tmp/bad.txt
# used instead of your logfile parsing/sorting/ etc...
all=/tmp/all.txt
# read blacklisted ips into a bash variable
list=$(<"${bad}")
# for each ip parsed from your logfile, check whether it appears
# (as a fixed string, so dots are not regex wildcards) in the
# blacklisted ips; if it does, flag it as a bad domain
while read -r listed
do
if grep -qF -- "${listed}" <<<"${list}"; then
echo "${listed} *bad domain!*"
else
echo "${listed}"
fi
done <"${all}"
Contents of bad.txt:
213.64.225.123 1202
Contents of all.txt:
213.64.237.230 2438
213.64.225.123 1202
213.64.141.89 731
213.64.214.124 480
Example output:
$ ./script.sh
213.64.237.230 2438
213.64.225.123 1202 *bad domain!*
213.64.141.89 731
213.64.214.124 480
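As for why the duplicates appeared: with the else branch printing, the inner for loop in the question emits one line per blacklist entry for every log line, and uniq only collapses adjacent identical lines, so a flagged line in the middle leaves unflagged copies on both sides. An alternative sketch does the lookup in a single awk pass by loading the blacklisted addresses into an array first. flag_bad_ips is a hypothetical name; it assumes the first file holds one address per line and the second holds the IP and count columns.

```shell
#!/bin/bash
# Sketch: flag lines of COUNTS whose first field appears in BADLIST.
# Usage (hypothetical helper): flag_bad_ips BADLIST COUNTS
flag_bad_ips() {
    awk '
        # First file: remember every blacklisted address.
        FILENAME == ARGV[1] { if (NF) bad[$1]; next }
        # Second file: print each line once, with a suffix when flagged.
        {
            suffix = ($1 in bad) ? " *bad domain!*" : ""
            print $0 suffix
        }
    ' "$1" "$2"
}
```

Each input line is printed exactly once, so no uniq step is needed afterwards.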

Printing the same contiguous lines only once using shell/awk

I have an input as below:
Sep 9 09:22:11
Hello
Hello
Sep 9 10:23:11
Hello
Hello
Hello
Sep 10 11:23:11
I expect the output as below: (the same contiguous lines are replaced by only one line)
Sep 9 09:22:11
Hello
Sep 9 10:23:11
Hello
Sep 10 11:23:11
Could anyone help me solve this quickly using shell or awk?
Using awk you can do this:
awk '$0 != prev; {prev=$0}' file
Sep 9 09:22:11
Hello
Sep 9 10:23:11
Hello
Sep 10 11:23:11
Command breakdown:
$0 != prev; # print the current line if it differs from the previous one
{prev=$0} # store the current line in a variable called prev
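One edge case of the `$0 != prev` idiom: prev starts out empty, so an empty first line compares equal to it and is dropped. A small variant that always prints the first line is sketched below; squeeze_repeats is a hypothetical wrapper name.

```shell
#!/bin/bash
# Sketch: collapse runs of identical adjacent lines, keeping line 1 even
# when it is empty. Reads the files given as arguments, or stdin if none.
squeeze_repeats() {
    awk 'NR == 1 || $0 != prev; { prev = $0 }' "$@"
}
```

For example, `squeeze_repeats file` behaves like the command above on the sample input, since its first line is not empty.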
To remove repeats of lines, use uniq:
uniq File
With your sample input, for example:
$ uniq File
Sep 9 09:22:11
Hello
Sep 9 10:23:11
Hello
Sep 10 11:23:11
Although its name may imply that uniq concerns itself with unique lines, it does not: it looks for adjacent repeated lines and, by default, removes the repeats.
Just because you asked for shell too, though the given answers are all better solutions -
last=''
while IFS= read -r line
do if [[ "$line" == "$last" ]]
then continue
else echo "$line"
last="$line"
fi
done < infile
This is simple, clear, and likely slower than either awk or uniq.

Collect info from multiple lines

I need to extract certain info from multiple lines (5 lines every transaction) and make the output as csv file. These lines are coming from a maillog wherein every transaction has its own transaction id. Here's one sample transaction:
Nov 17 00:15:19 server01 sm-mta[14107]: tAGGFJla014107: from=<sender#domain>, size=2447, class=0, nrcpts=1, msgid=<201511161615.tAGGFJla014107#server01>, proto=ESMTP, daemon=MTA, tls_verify=NONE, auth=NONE, relay=[100.24.134.19]
Nov 17 00:15:19 server01 flow-control[6033]: tAGGFJla014107 accepted
Nov 17 00:15:19 server01 MM: [Jilter Processor 21 - Async Jilter Worker 9 - 127.0.0.1:51698-tAGGFJla014107] INFO user.log - virus.McAfee: CLEAN - Declaration for Shared Parental Leave Allocation System
Nov 17 00:15:19 server01 MM: [Jilter Processor 21 - Async Jilter Worker 9 - 127.0.0.1:51698-tAGGFJla014107] INFO user.log - mtaqid=tAGGFJla014107, msgid=<201511161615.tAGGFJla014107#server01>, from=<sender#domain>, size=2488, to=<recipient#domain>, relay=[100.24.134.19], disposition=Deliver
Nov 17 00:15:20 server01 sm-mta[14240]: tAGGFJla014107: to=<recipient#domain>, delay=00:00:01, xdelay=00:00:01, mailer=smtp, pri=122447, relay=relayserver.domain. [100.91.20.1], dsn=2.0.0, stat=Sent (tAGGFJlR021747 Message accepted for delivery)
What I tried is, I made these 5 lines into 1 line and used awk to parse each column - unfortunately, the column count is not uniform.
I'm looking into getting the date/time (line 1, columns 1-3), sender, recipient, and subject (line 3, words after "CLEAN -" to the end of line)
Preferably sed or awk in bash.
Thanks!
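Explanation: file is your file.
The script starts with id and block empty. On the first line, id takes the value of field 6 with its trailing colon removed, and block is set to that line. Each following line that contains id is appended to block. When a line does not contain id, block is printed and both block and id are reset from that line; the END rule prints the final block.
awk '{ if (id == "" || index($0, id) == 0) { if (block != "") print block; id = $6; sub(/:$/, "", id); block = $0 } else block = block " " $0 } END { if (block != "") print block }' file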
Then you're going to have to process each line of the output.
There are many ways to approach this. Here is one example calling a simple script and passing the log filename as the first argument. It will parse the requested data and save the data separated into individual variables. It simply prints the results at the end.
#!/bin/bash
[ -r "$1" ] || { ## validate input file readable
printf "error: invalid argument, file not readable '%s'\n" "$1"
exit 1
}
while read -r line; do
## set date from line containing from/sender
if grep -q -o 'from=<' <<<"$line" &>/dev/null; then
dt=$(cut -c -15 <<<"$line")
from=$(grep -o 'from=<[a-zA-Z0-9]*#[a-zA-Z0-9]*>' <<<"$line")
sender=${from##*<}
sender=${sender%>*}
fi
## search each line for CLEAN
if grep -q -o 'CLEAN.*$' <<<"$line" &>/dev/null; then
subject=$(grep -o 'CLEAN.*$' <<<"$line")
subject="${subject#*CLEAN - }"
fi
## search line for to
if grep -q -o 'to=<' <<<"$line" &>/dev/null; then
to=$(grep -o 'to=<[a-zA-Z0-9]*#[a-zA-Z0-9]*>' <<<"$line")
to=${to##*<}
to=${to%>*}
fi
done < "$1"
printf " date : %s\n from : %s\n to : %s\n subject: \"%s\"\n" \
"$dt" "$sender" "$to" "$subject"
Input
$ cat dat/mail.log
Nov 17 00:15:19 server01 sm-mta[14107]: tAGGFJla014107: from=<sender#domain>, size=2447, class=0, nrcpts=1, msgid=<201511161615.tAGGFJla014107#server01>, proto=ESMTP, daemon=MTA, tls_verify=NONE, auth=NONE, relay=[100.24.134.19]
Nov 17 00:15:19 server01 flow-control[6033]: tAGGFJla014107 accepted
Nov 17 00:15:19 server01 MM: [Jilter Processor 21 - Async Jilter Worker 9 - 127.0.0.1:51698-tAGGFJla014107] INFO user.log - virus.McAfee: CLEAN - Declaration for Shared Parental Leave Allocation System
Nov 17 00:15:19 server01 MM: [Jilter Processor 21 - Async Jilter Worker 9 - 127.0.0.1:51698-tAGGFJla014107] INFO user.log - mtaqid=tAGGFJla014107, msgid=<201511161615.tAGGFJla014107#server01>, from=<sender#domain>, size=2488, to=<recipient#domain>, relay=[100.24.134.19], disposition=Deliver
Nov 17 00:15:20 server01 sm-mta[14240]: tAGGFJla014107: to=<recipient#domain>, delay=00:00:01, xdelay=00:00:01, mailer=smtp, pri=122447, relay=relayserver.domain. [100.91.20.1], dsn=2.0.0, stat=Sent (tAGGFJlR021747 Message accepted for delivery)
Output
$ bash parsemail.sh dat/mail.log
date : Nov 17 00:15:19
from : sender#domain
to : recipient#domain
subject: "Declaration for Shared Parental Leave Allocation System"
Note: if your from/sender is not always going to be in the first line, you can simply move those lines out from under the test clause. Let me know if you have any questions.
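Since the question asks for CSV and a real maillog holds many transactions, here is a rough awk sketch that keys the collected fields on the transaction id and prints one row per delivery line. mail_csv is a hypothetical name, and the sketch assumes the layout of the sample: sm-mta lines carry the id in field 6 and from=/to= in field 7, and the scanner line carries the id as "-<id>]" before "CLEAN - ". Only the first to= line per id produces a row.

```shell
#!/bin/bash
# Sketch: emit date,sender,recipient,"subject" per transaction id.
# mail_csv is a hypothetical helper; the log path is passed as $1.
mail_csv() {
    awk '
    # sm-mta "from" line: remember date and sender under the id.
    $5 ~ /^sm-mta/ && $7 ~ /^from=</ {
        id = $6; sub(/:$/, "", id)
        date[id] = $1 " " $2 " " $3
        s = $7; gsub(/^from=<|>,?$/, "", s); sender[id] = s
        next
    }
    # Scanner line: the subject is everything after "CLEAN - ".
    (p = index($0, "CLEAN - ")) {
        for (id in date)
            if (index($0, "-" id "]")) subject[id] = substr($0, p + 8)
        next
    }
    # sm-mta "to" line: print the CSV row and forget the id.
    $5 ~ /^sm-mta/ && $7 ~ /^to=</ {
        id = $6; sub(/:$/, "", id)
        if (!(id in date)) next
        r = $7; gsub(/^to=<|>,?$/, "", r)
        subj = subject[id]; gsub(/"/, "\"\"", subj)   # escape quotes for CSV
        printf "%s,%s,%s,\"%s\"\n", date[id], sender[id], r, subj
        delete date[id]; delete sender[id]; delete subject[id]
    }' "$1"
}
```

Run as `mail_csv dat/mail.log > out.csv`. Senders or recipients that contain commas would need quoting as well; that is left out here.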

BASH grep with multiple parameters + n lines after one of the matches

I have a bunch of text as output from a command. I need to display only specific matching lines, plus some additional lines after the match "Message:" (the message text is obviously longer than one line).
What I tried was:
grep -e 'Subject:' -e 'Date:' -A50 -e 'Message:'
but it included 50 lines after EACH match, and I need to apply that to only a single pattern. How would I do that?
Code with the output command:
(<...> | telnet <mailserver> 110 | grep -e 'Subject:' -e 'Date:' -A50 -e 'Message:'
Part of the telnet output:
Date: Tue, 10 Sep 2013 16
Message-ID: <00fb01ceae25$
MIME-Version: 1.0
Content-Type: multipart/alternative;
boundary="----=_NextPart_000_00FC_01CEAE3E.DE32CE40"
X-Mailer: Microsoft Office Outlook 12.0
Thread-Index: Ac6uJWYdA3lUzs1cT8....
Content-Language: lt
X-Mailman-Approved-At: Tue, 10 Sep 2013 16:0 ....
Subject: ...
X-BeenThere: ...
Precedence: list
Try the following:
... | telnet ... > <file>
grep -e 'Subject:' -e 'Date:' <file>; grep -A50 -e 'Message:' <file>
You will need to dump the output to a file first.
This can be done with awk as well, without the need for dumping output to a file.
... | telnet ... | awk '/Date:/ {print}; /Subject:/ {print}; /Message:/ {c=50} c && c--'
With grep this would be hard to do. It is better to use awk:
awk '/Subject:|Date:/; /Message:/ {print; for (l = 0; l < 50 && (getline) > 0; l++) print}'
Here awk prints the Message: line and the 50 lines below it, and only the matching line itself for the other patterns.
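The counter approach can be wrapped in a small function so the line count is a parameter and lines inside the message window are never printed twice. This is only a sketch: show_headers_and_message is a hypothetical name, and anchoring the patterns at the start of the line is an assumption about the telnet output.

```shell
#!/bin/bash
# Sketch: print Subject:/Date: lines, plus N lines starting at "Message:".
# Reads stdin; $1 is the number of lines to print from the Message: line on.
show_headers_and_message() {
    awk -v n="$1" '
        /^Message:/        { c = n }              # start the countdown
        c > 0              { print; c--; next }   # inside the window
        /^(Subject|Date):/ { print }              # single header lines
    '
}
```

Usage would look like `... | telnet ... | show_headers_and_message 50`.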

bash and telnet to test an email

I'm trying to find out whether an email address is valid.
I've accomplished this by using telnet, see below:
$ telnet mail.example.com 25
Trying 0.0.0.0...
Connected to mail.example.com.
Escape character is '^]'.
220 mail.example.com Mon, 14 Jan 2013 19:01:44 +0000
helo email.com
250 mail.example.com Hello email.com [0.0.0.0]
mail from:blake#email.com
250 OK
rcpt to:gfdgsdfhgsfd#example.com
550 Unknown user
With this 550 response I know that the address is not valid on the mail server. If it were valid, I would get a response like the one below:
250 2.1.5 OK
How would I automate this in a shell script? So far I have the below:
#!/bin/bash
host=`dig mx +short $1 | cut -d ' ' -f2 | head -1`
telnet $host 25
Thanks!
Try doing this:
[[ $4 ]] || {
printf 'Usage\n\t%s <domain> <email> <from_email> <rcpt_email>\n' "$0"
exit 1
}
{
sleep 1
echo "helo $2"
sleep 0.5
echo "mail from:<$3>"
sleep 0.5
echo "rcpt to:<$4>"
echo
} | telnet $1 25 |
grep -q "Unknown user" &&
echo "Invalid email" ||
echo "Valid email"
Usage :
./script.sh domain email from_email rcpt_email
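Matching the literal text "Unknown user" depends on the server's wording. A slightly more general sketch looks at the reply code instead: in SMTP, 2xx means the command was accepted, 4xx is a temporary failure and 5xx a permanent one. The function below is hypothetical (rcpt_verdict is not part of the answer above) and assumes the transcript holds exactly one final reply per command, in the order greeting, helo, mail from, rcpt to.

```shell
#!/bin/bash
# Sketch: classify the RCPT reply in an SMTP transcript read from stdin.
# Assumes the 4th final reply line ("NNN text") answers "rcpt to".
rcpt_verdict() {
    awk '
        /^[0-9][0-9][0-9] / { n++ }      # final reply lines only, not "250-"
        n == 4 && !done {
            done = 1
            class = substr($0, 1, 1)
            if (class == "2") print "valid"
            else if (class == "5") print "invalid"
            else print "unknown"
        }
        END { if (!done) print "unknown" }
    '
}
```

It could replace the grep stage, for example `{ ...; } | telnet "$1" 25 | rcpt_verdict`.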
You could always enter your commands into a plain text file, line after line, just as if you typed them on the command line. Then you can use something like
cat commands.txt | telnet mail.example.com 25 | grep -i '550 Unknown User'
Since you will probably need to treat this text file as a template (I am assuming you will want to parameterize the e-mail address), you may need to insert a call to awk to take the output of 'cat commands.txt' and insert your e-mail address.
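Instead of editing a template file, the command lines can also be generated directly from parameters. This is a minimal sketch; smtp_commands is a hypothetical name, and the exact commands mirror the telnet session from the question.

```shell
#!/bin/bash
# Sketch: print the SMTP commands for one recipient check.
# $1: HELO name, $2: sender address, $3: recipient address
smtp_commands() {
    printf 'helo %s\n' "$1"
    printf 'mail from:<%s>\n' "$2"
    printf 'rcpt to:<%s>\n' "$3"
    printf 'quit\n'
}
```

The output can then be piped as before, for example `smtp_commands email.com blake#email.com someone#example.com | telnet mail.example.com 25`. A server may close the connection before slow replies arrive, so pauses between commands, as in the answer above, may still be needed.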
Variables to change:
BODY="open relay smtp test"
SMTP_SRV="server_ip"
SMTP_PORT="25"
RCPT="name#domain"
SRC="name#domain"
Then run in bash:
/bin/nc "${SMTP_SRV}" "${SMTP_PORT}" << EOL
ehlo example_domain.com
mail from:${SRC}
RCPT to:${RCPT}
data
From:${SRC}
To:${RCPT}
subject: Telnet test
${BODY}
.
quit
EOL

Resources