Collect info from multiple lines - bash

I need to extract certain info from multiple lines (5 lines per transaction) and write the output as a CSV file. These lines come from a maillog in which every transaction has its own transaction id. Here's one sample transaction:
Nov 17 00:15:19 server01 sm-mta[14107]: tAGGFJla014107: from=<sender#domain>, size=2447, class=0, nrcpts=1, msgid=<201511161615.tAGGFJla014107#server01>, proto=ESMTP, daemon=MTA, tls_verify=NONE, auth=NONE, relay=[100.24.134.19]
Nov 17 00:15:19 server01 flow-control[6033]: tAGGFJla014107 accepted
Nov 17 00:15:19 server01 MM: [Jilter Processor 21 - Async Jilter Worker 9 - 127.0.0.1:51698-tAGGFJla014107] INFO user.log - virus.McAfee: CLEAN - Declaration for Shared Parental Leave Allocation System
Nov 17 00:15:19 server01 MM: [Jilter Processor 21 - Async Jilter Worker 9 - 127.0.0.1:51698-tAGGFJla014107] INFO user.log - mtaqid=tAGGFJla014107, msgid=<201511161615.tAGGFJla014107#server01>, from=<sender#domain>, size=2488, to=<recipient#domain>, relay=[100.24.134.19], disposition=Deliver
Nov 17 00:15:20 server01 sm-mta[14240]: tAGGFJla014107: to=<recipient#domain>, delay=00:00:01, xdelay=00:00:01, mailer=smtp, pri=122447, relay=relayserver.domain. [100.91.20.1], dsn=2.0.0, stat=Sent (tAGGFJlR021747 Message accepted for delivery)
What I tried is, I made these 5 lines into 1 line and used awk to parse each column - unfortunately, the column count is not uniform.
I'm looking into getting the date/time (line 1, columns 1-3), sender, recipient, and subject (line 3, words after "CLEAN -" to the end of line)
Preferably sed or awk in bash.
Thanks!

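Explanation: file is your file.
The script starts with id and block empty. On the first line, id takes the value of field 6 (the transaction id, with its trailing colon stripped) and block takes the line itself. After that, every line that contains id is appended to block. When a line doesn't contain id, block is printed and block and id are reinitialized from that line. The END rule prints the last block.
awk '{if (id!="" && index($0,id)) block=block " " $0; else {if (block!="") print block; block=$0; id=$6; sub(/:$/,"",id)}} END{if (block!="") print block}' file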
Then you're going to have to process each line of the output.
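As a rough sketch of that second step, the fields can also be collected per transaction id straight from the log, without joining the lines first. This assumes the exact line layout shown in the question (sm-mta lines with the id in field 6, and the id just before "] INFO" on the virus-scan line). The function name maillog_to_csv and the helper names angle, q and txid are made up for illustration.

```bash
#!/bin/bash
# maillog_to_csv FILE: print one CSV row (date, sender, recipient, subject)
# for every sm-mta delivery line, keyed by transaction id.
maillog_to_csv() {
    awk '
    # Return the text between "key=<" and the next ">" on the line, or "".
    function angle(line, key,    p, rest) {
        p = index(line, key "=<")
        if (p == 0) return ""
        rest = substr(line, p + length(key) + 2)
        return substr(rest, 1, index(rest, ">") - 1)
    }
    # Quote a CSV field, doubling any embedded double quotes.
    function q(s) { gsub(/"/, "\"\"", s); return "\"" s "\"" }
    # Strip the trailing colon from field 6 to get the transaction id.
    function txid(   id) { id = $6; sub(/:$/, "", id); return id }

    # "<id>: from=<...>" line: remember date/time and sender.
    $5 ~ /^sm-mta/ && $7 ~ /^from=</ {
        id = txid(); date[id] = $1 " " $2 " " $3; from[id] = angle($0, "from")
    }
    # Virus-scan line: the id sits just before "] INFO", the subject after "CLEAN - ".
    / CLEAN - / && match($0, /-[A-Za-z0-9]+\] INFO/) {
        id = substr($0, RSTART + 1, RLENGTH - 7)
        subject[id] = substr($0, index($0, " CLEAN - ") + 9)
    }
    # "<id>: to=<...>" line: the transaction is complete, emit the row.
    $5 ~ /^sm-mta/ && $7 ~ /^to=</ {
        id = txid()
        print q(date[id]) "," q(from[id]) "," q(angle($0, "to")) "," q(subject[id])
    }
    ' "$1"
}
```

Called as maillog_to_csv maillog > transactions.csv, this prints one row per delivery line, so a message with several recipients would produce several rows. Because the fields are keyed by id, interleaved transactions should not get mixed up.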

There are many ways to approach this. Here is one example calling a simple script and passing the log filename as the first argument. It will parse the requested data and save the data separated into individual variables. It simply prints the results at the end.
#!/bin/bash
[ -r "$1" ] || { ## validate input file readable
    printf "error: invalid argument, file not readable '%s'\n" "$1"
    exit 1
}
while read -r line; do
    ## set date from line containing from/sender
    if grep -q -o 'from=<' <<<"$line" &>/dev/null; then
        dt=$(cut -c -15 <<<"$line")
        from=$(grep -o 'from=<[a-zA-Z0-9]*#[a-zA-Z0-9]*>' <<<"$line")
        sender=${from##*<}
        sender=${sender%>*}
    fi
    ## search each line for CLEAN
    if grep -q -o 'CLEAN.*$' <<<"$line" &>/dev/null; then
        subject=$(grep -o 'CLEAN.*$' <<<"$line")
        subject="${subject#*CLEAN - }"
    fi
    ## search line for to
    if grep -q -o 'to=<' <<<"$line" &>/dev/null; then
        to=$(grep -o 'to=<[a-zA-Z0-9]*#[a-zA-Z0-9]*>' <<<"$line")
        to=${to##*<}
        to=${to%>*}
    fi
done < "$1"
printf " date : %s\n from : %s\n to : %s\n subject: \"%s\"\n" \
    "$dt" "$sender" "$to" "$subject"
Input
$ cat dat/mail.log
Nov 17 00:15:19 server01 sm-mta[14107]: tAGGFJla014107: from=<sender#domain>, size=2447, class=0, nrcpts=1, msgid=<201511161615.tAGGFJla014107#server01>, proto=ESMTP, daemon=MTA, tls_verify=NONE, auth=NONE, relay=[100.24.134.19]
Nov 17 00:15:19 server01 flow-control[6033]: tAGGFJla014107 accepted
Nov 17 00:15:19 server01 MM: [Jilter Processor 21 - Async Jilter Worker 9 - 127.0.0.1:51698-tAGGFJla014107] INFO user.log - virus.McAfee: CLEAN - Declaration for Shared Parental Leave Allocation System
Nov 17 00:15:19 server01 MM: [Jilter Processor 21 - Async Jilter Worker 9 - 127.0.0.1:51698-tAGGFJla014107] INFO user.log - mtaqid=tAGGFJla014107, msgid=<201511161615.tAGGFJla014107#server01>, from=<sender#domain>, size=2488, to=<recipient#domain>, relay=[100.24.134.19], disposition=Deliver
Nov 17 00:15:20 server01 sm-mta[14240]: tAGGFJla014107: to=<recipient#domain>, delay=00:00:01, xdelay=00:00:01, mailer=smtp, pri=122447, relay=relayserver.domain. [100.91.20.1], dsn=2.0.0, stat=Sent (tAGGFJlR021747 Message accepted for delivery)
Output
$ bash parsemail.sh dat/mail.log
date : Nov 17 00:15:19
from : sender#domain
to : recipient#domain
subject: "Declaration for Shared Parental Leave Allocation System"
Note: if your from/sender is not always going to be in the first line, you can simply move those lines out from under the test clause. Let me know if you have any questions.

Related

How can I change command and option about 'date' command in bash?

I want to convert this bash command to shell script.
BASH
Input:
date --date="Wed Aug 25 22:37:44 +0900 2021" +"%s"
Output:
1629898664
SHELL
tmp.sh:
function time(a, b, c, d, e) { return date --date="a b c d +0900 e" +"%s" }
{print time($1, $2, $3, $4, $5}
timeline:
Wed Aug 25 22:37:44 2021
Command:
awk -f tmp.sh timeline
Output:
awk: tmp.sh:1: function cvtTime(w) { return date --date="Thu May 14 23:40:52 +0900 2020" +"%s" }
awk: tmp.sh:1: ^ syntax error
What if the timeline file has multiple lines? Like:
Wed Aug 25 22:37:44 2021 JACK
Wed Aug 26 22:37:44 2021 EMILY
Wed Aug 27 22:37:44 2021 SAM
I tried:
#!/bin/bash
while read -r line; do
date --date="${1} ${2} ${3} ${4} +0900 ${5}" +"%s"
done
Want:
1629898664 JACK
1629985064 EMILY
1630071464 SAM
But it doesn't work :(
It seems that you want a shell script that is invoked with five command line parameters:
1. A weekday (in a three-letter format)
2. A month (in a three-letter format)
3. Day-of-month
4. A time expression (HH:MM:SS)
5. A year (four digits)
(Note that 1. is redundant, it is implied by 2., 3., and 5.)
Hence a somewhat minimal shell script would look something like this:
#!/bin/bash
date --date="${1} ${2} ${3} ${4} +0900 ${5}" +"%s"
Of course, this can be greatly improved, e.g., by adding sanity checks for the passed parameters.
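A minimal sketch of such a sanity check, assuming GNU date (the --date option is not portable) and the fixed +0900 offset from the question. The function name to_epoch is hypothetical.

```bash
#!/bin/bash
# to_epoch WEEKDAY MONTH DAY HH:MM:SS YEAR: print seconds since the epoch.
to_epoch() {
    # Refuse to run unless exactly five parameters were passed.
    if [ "$#" -ne 5 ]; then
        echo "usage: to_epoch WEEKDAY MONTH DAY HH:MM:SS YEAR" >&2
        return 2
    fi
    # GNU date exits non-zero if it cannot parse the string,
    # and that status becomes the status of the function.
    date --date="$1 $2 $3 $4 +0900 $5" +"%s"
}
```

For example, to_epoch Wed Aug 25 22:37:44 2021 prints 1629898664, the same value as the date command in the question.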
In case you want to store the date information in a file so that you can pass a single filename parameter to the script instead (allowing for multiple such lines), the following variation will do:
#!/bin/bash
while read -r -a i; do
    echo "$(date --date="${i[0]} ${i[1]} ${i[2]} ${i[3]} +0900 ${i[4]}" +"%s") ${i[5]}"
done < "${1}"
Note, however, that this version expects an additional name parameter after the date information in each line.
In any event, no need for awk here.

Prepend timestamp in shell script which has embedded redirection

I have a Unix shell script like the one below. I want to prepend a timestamp to every line of out.log. The general solution was to create another script, preappend.sh, and run it like this:
(./a.sh 2>&1 ) | ./preappend.sh > out.log
However, the original shell script contains the line exec 2>out.log (commented out below for my earlier testing). In real life this line is not commented out. Could someone show me how to prepend the timestamp in out.log when there is an exec 2> in place?
benny
------ my script a.sh ---------
#!/bin/sh
#exec 2>out.log
set -x
echo 'hello world'
sleep 2
echo 'you rocks'
------end---------
---- preappend.sh ---
#!/bin/bash
while read line ; do
echo "$(date '+%Y%m%d %H:%M:%S'): ${line}"
done
-------end------------
Does this address the problem?
origScript.sh 2>&1 | awk '{ print strftime(), $0 }'
We can do a small test to check if this is working -
while [ 1 ]
do
date
(>&2 echo "error")
sleep 1
done 2>&1 | awk '{ print strftime(), $0 }'
It returns something like this:
Tue Sep 20 19:11:43 UTC 2016 Tue Sep 20 19:11:43 UTC 2016
Tue Sep 20 19:11:43 UTC 2016 error
Tue Sep 20 19:11:44 UTC 2016 Tue Sep 20 19:11:44 UTC 2016
Tue Sep 20 19:11:44 UTC 2016 error
...
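If the exec 2> line has to stay inside the script, one possible approach is to point it at a process substitution instead of at the file, so that stderr passes through the timestamping loop before it reaches out.log. This is only a sketch: process substitution is a bash feature, so the script's #!/bin/sh line would have to become #!/bin/bash. The function names stamp and a_sh_demo are made up here, and the function body stands in for the contents of a.sh.

```bash
#!/bin/bash
# Prepend a timestamp to every line read from stdin (same format as preappend.sh).
stamp() {
    while IFS= read -r line; do
        printf '%s: %s\n' "$(date '+%Y%m%d %H:%M:%S')" "$line"
    done
}

# a_sh_demo LOGFILE: stand-in for a.sh, run in a subshell so the
# redirection and set -x do not leak into the calling shell.
a_sh_demo() {
    (
        exec 2> >(stamp > "$1")    # replaces: exec 2>out.log
        set -x
        echo 'hello world'
        echo 'you rocks'
    )
}
```

The stamping loop runs asynchronously, so the last lines may reach the log file a moment after the script itself has finished.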

print last occurrence of each unique line by IP in file

I need to parse a log file so that the following entries like this:
Jul 23 17:38:06 192.168.1.100 638 "this message will always be the same"
Jul 23 17:56:11 192.168.1.100 648 "this message will always be the same."
Jul 23 18:14:17 192.168.1.101 "this message will always be the same."
Jul 23 18:58:17 192.168.1.101 "this message will always be the same."
Look like this:
Jul 23 17:56:11 192.168.1.100 648 "this message will always be the same."
Jul 23 18:58:17 192.168.1.101 "this message will always be the same."
Basically what I am doing is taking a file that has duplicate IP addresses but with different timestamps, and finding the last occurrence (or most recent by time) of each IP address, and printing that to the screen or directing it into another file.
What I have tried:
I have written a bash script that I thought would allow me to do this but it is not working.
#!/bin/bash
/bin/grep 'common pattern to all lines' /var/log/file | awk '{print $4}' | sort -u > /home/user/iplist
while IFS='' read -r line || [[ -n "$line" ]]; do
echo "$line"
done < "/home/user/iplist"
awk '/'$line'/ {a=$0}END{print a} ' /var/log/logfile
The script runs and outputs each IP address, but it does not print the whole line except for the last one.
ex..
192.168.100.101
192.168.100.102
192.168.100.103
Jul 23 20:20:55 192.168.100.104 "this message will always be the same."
The first command in the script takes all unique occurrences of an IP and sends that to a file. The while loop assigns a "$line" variable to each line which is then passed to awk which I thought would take each IP then search the actual file and print out the last occurrence of each one. How can I get this to work, either with a script or perhaps an awk one liner?
$ tac file | awk '!seen[$4]++' | tac
Jul 23 17:56:11 192.168.1.100 648 "this message will always be the same."
Jul 23 18:58:17 192.168.1.101 "this message will always be the same."
You can use this awk command:
awk 'NF{a[$4]=$0} NF && !seen[$4]++{ips[++numIps]=$4} END {
for (i=1;i<=numIps;i++) print a[ips[i]] }' file
Jul 23 17:56:11 192.168.1.100 648 "this message will always be the same."
Jul 23 18:58:17 192.168.1.101 "this message will always be the same."
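For completeness, the script in the question prints only one full line because awk sits after the done, so it runs once, after the loop has finished. A sketch of the loop-based version with awk moved inside the loop might look like this. The function name last_per_ip is hypothetical, and it assumes the IP is always in column 4.

```bash
#!/bin/bash
# last_per_ip LOGFILE: for each distinct IP in column 4,
# print the last line of LOGFILE that carries it.
last_per_ip() {
    local log ip
    log=$1
    awk '{ print $4 }' "$log" | sort -u |
    while IFS= read -r ip; do
        # An exact comparison on field 4 avoids treating the dots as regex wildcards.
        awk -v ip="$ip" '$4 == ip { last = $0 } END { if (last != "") print last }' "$log"
    done
}
```

This reads the log once per IP and prints the results in sorted IP order, so the single-pass one-liners above are the better choice for large files.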

Execute bash command for all incoming mails (Postfix)

I want to execute a command on the body of every incoming postfix mail.
sed ':a;N;$!ba;s/=\n//g' /path-to/message-file | sed 's/</\n\</g' | sed -r '/'"$(sed -r 's/\\/\\\\/g;s/\//\\\//g;s/\^/\\^/g;s/\[/\\[/g;s/'\''/'\'"\\\\"\'\''/g;s/\]/\\]/g;s/\*/\\*/g;s/\$/\\$/g;s/\./\\./g' whitelist | paste -s -d '|')"'/! s/http/httx/g'
I think it could be possible with Postfix After-Queue Content Filter, but I don't know how to do it...
EDIT:
afterqueue.sh
#!/bin/sh
# Simple shell-based filter. It is meant to be invoked as follows:
# /path/to/script -f sender recipients...
# Localize these. The -G option does nothing before Postfix 2.3.
INSPECT_DIR=/var/spool/filter
SENDMAIL="/usr/sbin/sendmail -G -i" # NEVER NEVER NEVER use "-t" here.
# Exit codes from <sysexits.h>
EX_TEMPFAIL=75
EX_UNAVAILABLE=69
# Clean up when done or when aborting.
trap "rm -f in.$$" 0 1 2 3 15
# Start processing.
cd $INSPECT_DIR || {
echo $INSPECT_DIR does not exist; exit $EX_TEMPFAIL; }
cat >in.$$ || {
echo Cannot save mail to file; exit $EX_TEMPFAIL; }
# Specify your content filter here.
sh /path/to/remove_links.sh <in.$$
$SENDMAIL "$@" <in.$$
exit $?
remove_links.sh
#!/bin/bash
sed ':a;N;$!ba;s/=\n//g' $1 | sed 's/</\n\</g' | sed -r '/'"$(sed -r 's/\\/\\\\/g;s/\//\\\//g;s/\^/\\^/g;s/\[/\\[/g;s/'\''/'\'"\\\\"\'\''/g;s/\]/\\]/g;s/\*/\\*/g;s/\$/\\$/g;s/\./\\./g' /path/to/whitelist | paste -s -d '|')"'/! s/http/httx/g'
It is working, if I call it by hand, but if I add it to the /etc/postfix/master.cf like this:
# =============================================================
# service type private unpriv chroot wakeup maxproc command
# (yes) (yes) (yes) (never) (100)
# =============================================================
filter unix - n n - 10 pipe
flags=Rq user=filter null_sender=
argv=/path/to/afterqueue.sh -f ${sender} -- ${recipient}
there are no changes in the mail.
I get the following syslog:
Apr 13 15:14:08 rs211184 postfix/qmgr[7492]: 3FFDF23CB5F: from=<test#gmail.com>, size=4358, nrcpt=1 (queue active)
Apr 13 15:14:08 rs211184 postfix/pipe[7504]: 116E523CA8C: to=<example#example.de>, relay=filter, delay=0.2, delays=0.16/0/0/0.04, dsn=2.0.0, status=sent (delivered via filter service)
Apr 13 15:14:08 rs211184 postfix/qmgr[7492]: 116E523CA8C: removed
Apr 13 15:14:08 rs211184 postfix-local[7522]: postfix-local: from=test#gmail.com, to=example#example.de, dirname=/var/qmail/mailnames
Apr 13 15:14:08 rs211184 postfix/pipe[7521]: 3FFDF23CB5F: to=<dsehlhoff#lcdev1.de>, relay=plesk_virtual, delay=0.02, delays=0.01/0/0/0.01, dsn=2.0.0, status=sent (delivered via plesk_virtual service)
Apr 13 15:14:08 rs211184 postfix/qmgr[7492]: 3FFDF23CB5F: removed
You seem to expect the message in a file, and oddly a static file name, but that's not how it works. The message arrives on standard input. Minimally, just remove /path/to/message-file -- but really, piping sed to sed is very often a mistake; you should refactor this to a single sed script (or Awk, or Python, or what have you).
sed -e ':a;N;$!ba;s/=\n//g' -e 's/</\n\</g' |
# This is too convoluted, really!
sed -r '/'"$(sed -r 's/\\/\\\\/g;s/\//\\\//g;s/\^/\\^/g;s/\[/\\[/g;s/'\''/'\'"\\\\"\'\''/g;s/\]/\\]/g;s/\*/\\*/g;s/\$/\\$/g;s/\./\\./g' whitelist |
paste -s -d '|')"'/! s/http/httx/g'
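Note also that afterqueue.sh, as posted, runs the filter but then hands the untouched in.$$ to sendmail, so the filter's output is never used. A sketch of the missing step, under the assumption that the filter reads the message on stdin and writes the modified message to stdout: capture that output in a second file and re-inject that one. The function name filter_and_send and its calling convention are invented for this example, and the error handling is reduced to a single exit status.

```bash
#!/bin/bash
# filter_and_send FILTER SENDMAIL [sendmail args...]
# Reads a message on stdin, runs it through FILTER, and feeds the
# filtered copy (not the original) to SENDMAIL with the given args.
filter_and_send() {
    local filter sendmail orig filtered status
    filter=$1
    sendmail=$2
    shift 2
    orig=$(mktemp) && filtered=$(mktemp) || return 75    # EX_TEMPFAIL
    cat > "$orig" &&
        "$filter" < "$orig" > "$filtered" &&
        "$sendmail" "$@" < "$filtered"
    status=$?
    rm -f "$orig" "$filtered"
    return "$status"
}
```

In afterqueue.sh itself the equivalent change would be to redirect the output of remove_links.sh into a second temporary file and pass that file, instead of in.$$, to $SENDMAIL.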

Merging CSV files : Appending instead of merging

So basically I want to merge a couple of CSV files. I'm using the following script to do that:
paste -d , *.csv > final.txt
This has worked for me in the past, but this time it doesn't. It places the data side by side instead of one below the other. For instance, two files that contain records in the following format
CreatedAt ID
Mon Jul 07 20:43:47 +0000 2014 4.86249E+17
Mon Jul 07 19:58:29 +0000 2014 4.86238E+17
Mon Jul 07 19:42:33 +0000 2014 4.86234E+17
When merged give
CreatedAt ID CreatedAt ID
Mon Jul 07 20:43:47 +0000 2014 4.86249E+17 Mon Jul 07 18:25:53 +0000 2014 4.86215E+17
Mon Jul 07 19:58:29 +0000 2014 4.86238E+17 Mon Jul 07 17:19:18 +0000 2014 4.86198E+17
Mon Jul 07 19:42:33 +0000 2014 4.86234E+17 Mon Jul 07 15:45:13 +0000 2014 4.86174E+17
Mon Jul 07 15:34:13 +0000 2014 4.86176E+17
Would anyone know what the reason behind this is? Or what I can do to force the records to be merged one below the other?
Assuming that all the CSV files have the same format and all start with the same header, you can write a small script like the following to append all the files into one, keeping the header only once.
#!/bin/bash
OutFileName="./X.csv"                             # Fix the output name (same ./ prefix as the glob below)
i=0                                               # Reset a counter
for filename in ./*.csv; do
    if [ "$filename" != "$OutFileName" ]; then    # Avoid recursion
        if [[ $i -eq 0 ]]; then
            head -n 1 "$filename" > "$OutFileName"    # Copy header if it is the first file
        fi
        tail -n +2 "$filename" >> "$OutFileName"      # Append from the 2nd line each file
        i=$(( i + 1 ))                                # Increase the counter
    fi
done
Notes:
The head -1 or head -n 1 command prints the first line of a file (the head).
The tail -n +2 command prints the tail of a file starting from line number 2 (+2).
The test [ ... ] is used to exclude the output file from the input list.
The output file is rewritten each time.
The command cat a.csv b.csv > X.csv can be used to simply append a.csv and b.csv into a single file (but the header is then copied twice).
The paste command pastes the files side by side. If the lines of a file contain white spaces, you can get the output that you reported above.
The -d , option tells paste to separate the pasted fields with a comma, but this is not the separator used in the files you reported above.
The cat command instead concatenates files and prints them to standard output, which means it writes one file after the other.
Refer to man head or man tail for the syntax of the individual options (some versions allow head -1, others require head -n 1)...
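The difference between paste and cat described in these notes can be seen on two tiny files. The file names and contents below are invented for the demonstration. The last command is one more way (not taken from the answers here) to keep only the first header, using awk's per-file line counter FNR.

```bash
#!/bin/bash
dir=$(mktemp -d)
printf 'CreatedAt,ID\nMon,1\n' > "$dir/a.csv"
printf 'CreatedAt,ID\nTue,2\n' > "$dir/b.csv"

# paste joins line N of every file into one output line (side by side).
side_by_side=$(paste -d , "$dir/a.csv" "$dir/b.csv")

# cat writes the files one after the other, repeating the header.
stacked=$(cat "$dir/a.csv" "$dir/b.csv")

# awk: keep line 1 of the first file only, plus every non-header line.
merged=$(awk 'FNR > 1 || NR == 1' "$dir/a.csv" "$dir/b.csv")

printf '%s\n\n%s\n\n%s\n' "$side_by_side" "$stacked" "$merged"
rm -r "$dir"
```

Here paste yields two lines (CreatedAt,ID,CreatedAt,ID and Mon,1,Tue,2), cat yields four lines with the header repeated, and the awk command yields the header followed by the two data rows.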
A simpler alternative, saved as combine_csv.sh:
#!/bin/bash
{ head -n 1 "$1" && tail -q -n +2 "$@"; }
can be used like this:
pattern="my*filenames*.csv"
combine_csv.sh ${pattern} > result.csv
Thank you so much @wahwahwah.
I used your script to make a nautilus-action, but it works correctly only with these changes:
#!/bin/bash
for last; do true; done                               # Set last to the final argument
OutFileName="$last/RESULT_$(date +"%d-%m-%Y").csv"    # Fix the output name
i=0                                                   # Reset a counter
for filename in "$last"/*.csv; do
    if [ "$filename" != "$OutFileName" ]; then        # Avoid recursion
        if [[ $i -eq 0 ]]; then
            head -n 1 "$filename" > "$OutFileName"    # Copy header if it is the first file
        fi
        tail -n +2 "$filename" >> "$OutFileName"      # Append from the 2nd line each file
        i=$(( i + 1 ))                                # Increase the counter
    fi
done
