Is there an effective & fast way to catch two matches in a log file? - bash

I would like to get some ideas.
My situation: there are tons of logs on my Linux server; they are big and they also have tons of things in them. I would like to catch ONLY the logins with a timestamp and ONLY the email address from the log and collect them into a .txt file.
An example log:
[...]
2019-07-21 03:13:06.939 login
[things not needed between the two]
(mail=>example#mail.com< method=>email< cmd=>login<)
[...]
An example output:
************** 2019-07-21 **************
2019-07-21 03:13:06.939 login
example#mail.com
2019-07-21 06:22:19.424 login
example#mail.com
2019-07-21 12:10:23.665 login
example#mail.com
2019-07-21 14:26:19.068 login
example#mail.com
************** 2019-07-22 **************
2019-07-22 08:01:50.157 login
example#mail.com
2019-07-22 08:12:35.504 login
example#mail.com
2019-07-22 09:10:35.416 login
example#mail.com
To achieve this I am using this right now:
for i in $(ls); do echo "" && printf "************** " && cat $i | head -c 10 && printf " **************\n"; while read line; do echo $line | grep "login"; echo "$line" | grep -h -o -P '(?<=mail=>).*?(?=<)'; done < $i; done >> ../logins.txt
The for loop goes through the files, cat $i | head -c 10 gets the date (because that is the first thing in every log), and the while loop reads the file line by line and greps login and ONLY the mail address (the text between "mail=>" and "<"). At the end everything is output to logins.txt.
While this works, I find it very, very slow because it executes a lot of commands. (And we are talking about 2 years of logs here.) It also looks really dirty.
I really think that there is an effective way to do this, but I don't really see what that would be.

With awk, use -F to select the mail account:
sep='************************'
awk -v sep="$sep" -F '(mail=>|<)' '
    FNR==1     { printf("%s %s %s\n", sep, substr($0,1,10), sep) }  # substr is 1-based: first 10 chars are the date
    /mail=>/   { print $2 }
    /login *$/ { print }
' *
When you have additional requirements and want to use a loop, consider
for f in *; do
    sed -nr '
        1s/(.{10}).*/********* \1 **********/p;
        /login *$/p;
        s/.*mail=>([^<]*).*/\1/p
    ' "${f}"
done
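To collect everything into one file like the original command did, redirect the whole loop (the same loop, just with a redirect; ../logins.txt is the OP's target file):
for f in *; do
    sed -nr '
        1s/(.{10}).*/********* \1 **********/p;
        /login *$/p;
        s/.*mail=>([^<]*).*/\1/p
    ' "${f}"
done > ../logins.txt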

awk would do a nice job of this. You can tell it to print the line only when the line matches a particular regex. Something like:
awk '$0~/[0-9]{4}-[0-9]{2}-[0-9]{2}|\(mail=>/{print $0}' * > output.log
Updated: Noticed you just want the email. In that case, two blocks will suffice. In the second block we split by the characters < or > and then retrieve the email from index 2 of the resulting array.
awk '$1~/^[0-9]{4}-[0-9]{2}-[0-9]{2}/{print $0}$1~/^\(mail=>/{split($1,a,"[<>]");print a[2]}' * > output.log
This awk says:
If the first field (where the field is delimited by awk's default of a space character) of the row we are reading starts with a date of format nnnn-nn-nn: $1~/^[0-9]{4}-[0-9]{2}-[0-9]{2}/
Then print the entire line {print $0}
If the first field of the row we are reading starts with the characters (mail=>: $1~/^\(mail=>/
Then split the first field by either characters < or > into an array named a: split($1,a,"[<>]")
Then print the second item in the array (index 2, since split fills the array starting at index 1): print a[2]
For all of the files in this current directory: *
Instead of printing to the command line, send the output to a file: > output.log
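For the sample log at the top of the question, the whole command behaves like this (a quick demo on a hypothetical two-line file; the {4}-style interval regexes need gawk or another awk with ERE interval support):
printf '2019-07-21 03:13:06.939 login\n(mail=>example#mail.com< method=>email< cmd=>login<)\n' > sample.log
awk '$1~/^[0-9]{4}-[0-9]{2}-[0-9]{2}/{print $0}$1~/^\(mail=>/{split($1,a,"[<>]");print a[2]}' sample.log
# 2019-07-21 03:13:06.939 login
# example#mail.com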

Related

How to send shell script output in a tabular form and send the mail

I have a shell script which gives a few lines as output. Below is the output I am getting from the shell script. The flow is: first it checks whether we have a given file; if we have it, it should give me the file name and modified date, and if we don't, it should give me the file name and "Not Found", in tabular form, and send the result by email. It should also add a header to the output.
CMC_daily_File.xlsx Not Found
CareOneHMA.xlsx Jun 11
Output
File Name Modified Date
CMC_daily_File.xlsx Not Found
CareOneHMA.xlsx Jun 11
UPDATE
sample of script
#!/bin/bash
if [ -e /saddwsgnas/radsfftor/coffe/COE_daily_File.xlsx ]; then
cd /sasgnas/radstor/coe/
ls -la COE_daily_File.xlsx | awk '{print $9, $6"_"$7}'
else
echo "CMC_COE_daily_File.xlsx Not_Found"
fi
Output
CMC_COE_daily_File.xlsx Jun_11
I thought I might offer you some options with a slightly modified script. I use the stat command to obtain the file modification time in a more expansive format, as well as specifying an arbitrary, pre-defined spacer character to divide the column data. That way, you can focus on displaying the content in its original, untampered form. This also allows the formatted reporting of filenames which contain spaces without affecting the logic for formatting/aligning columns. The column command is told about that spacer character, and it will adjust the width of each column to its widest content. (I only wish that it also allowed you to specify a column divider character to be printed, but that is not part of its features/functions.)
I also added the extra AWK action, on the chance that you might be interested in making the results stand out more.
#!/bin/sh
#QUESTION: https://stackoverflow.com/questions/74571967/how-to-send-shell-script-output-in-a-tablular-form-and-send-the-mail
SPACER="|"
SOURCE_DIR="/saddwsgnas/radsfftor/coe"
SOURCE_DIR="."
{
    printf "File Name${SPACER}Modified Date\n"
    #for file in COE_daily_File.xlsx
    for file in test_55.sh awkReportXmlTagMissingPropertyFieldAssignment.sh test_54.sh
    do
        if [ -e "${SOURCE_DIR}/${file}" ]; then
            cd "${SOURCE_DIR}"
            #ls -la "${file}" | awk '{print $9, $6"_"$7}'
            echo "${file}${SPACER}"$(stat --format "%y" "${file}" | cut -f1 -d\. | awk '{ print $1, $2 }' )
        else
            echo "${file}${SPACER}Not Found"
        fi
    done
} | column -x -t -s "|" |
awk '{
    ### Refer to:
    #   https://man7.org/linux/man-pages/man4/console_codes.4.html
    #   https://www.ecma-international.org/publications-and-standards/standards/ecma-48/
    if( NR == 1 ){
        printf("\033[93;3m%s\033[0m\n", $0) ;
    }else{
        print $0 ;
    } ;
}'
Without that last awk command, the output session for that script was as follows:
ericthered@OasisMega1:/0__WORK$ ./test_55.sh
File Name Modified Date
test_55.sh 2022-11-27 14:07:15
awkReportXmlTagMissingPropertyFieldAssignment.sh 2022-11-05 21:28:00
test_54.sh 2022-11-27 00:11:34
ericthered@OasisMega1:/0__WORK$
With that last awk command you get the same table, but with the header row highlighted in bright-yellow italics by the SGR escape codes.
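The highlighting trick in isolation, for reference (assuming a terminal that honours ECMA-48 SGR sequences: 93 selects bright yellow, 3 italics, 0 resets):
printf '\033[93;3m%s\033[0m\n' 'File Name    Modified Date'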

How to extract phone number and Pin from each text line

Sample Text from the log file
2021/08/29 10:25:37 20210202GL1 Message Params [userid:user1] [timestamp:20210829] [from:TEST] [to:0214736848] [text:You requested for Pin reset. Your Customer ID: 0214736848 and PIN: 4581]
2021/08/27 00:03:18 20210202GL2 Message Params [userid:user1] [timestamp:20210827] [from:TEST] [to:0214736457] [text:You requested for Pin reset. Your Customer ID: 0214736457 and PIN: 6193]
2021/08/27 10:25:16 Thank you for joining our service; Your ID is 0214736849 and PIN is 5949
Other wording and formatting can change but ID and PIN don't change
Expected output for each line:
0214736848#4581
0214736457#6193
0214736849#5949
Below is what I have tried out using bash, though I am currently only able to extract all the numeric values:
while read p; do
NUM=''
counter=1;
text=$(echo "$p" | grep -o -E '[0-9]+')
for line in $text
do
if [ "$counter" -eq 1 ] #if is equal to 1
then
NUM+="$line" #concatenate string
else
NUM+="#$line" #concatenate string
fi
let counter++ #Increment counter
done
printf "$NUM\n"
done < logfile.log
Current output, though not the expected one:
2021#08#29#00#03#18#20210202#2#1#20210826#0214736457#0214736457#6193
2021#08#27#10#25#37#20210202#1#1#20210825#0214736848#0214736848#4581
2021#08#27#10#25#16#0214736849#5949
Another variation using gawk and 2 capture groups, matching 1 or more digits per group:
awk '
match($0, /ID: ([0-9]+) and PIN: ([0-9]+)/, m) {
print m[1]"#"m[2]
}
' file
Output
0214736848#4581
0214736457#6193
For the updated question, you could match either : or is if you want a more precise match, and the capture group values will then be m[2] and m[4].
awk '
match($0, /ID(:| is) ([0-9]+) and PIN(:| is) ([0-9]+)/, m) {
print m[2]"#"m[4]
}
' file
Output
0214736848#4581
0214736457#6193
0214736849#5949
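Note that the three-argument match() used above is gawk-specific. With a strictly POSIX awk, the same extraction can be sketched with plain match() plus split() (shown here for the original ID:/PIN: wording):
awk '
match($0, /ID: [0-9]+ and PIN: [0-9]+/) {
    # m[1] is empty because the matched text starts with non-digits
    split(substr($0, RSTART, RLENGTH), m, "[^0-9]+")
    print m[2] "#" m[3]
}' file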
Using sed capture groups you can do:
sed 's/.* Your Customer ID: \([0-9]*\) and PIN: \([0-9]*\).*/\1#\2/g' file.txt
With your shown samples, please try the following awk code; you could simply do it using different field separators. A simple explanation would be: make Customer ID:, and PIN:, or ]$ the field separators, and then, keeping them in mind, print only the 2nd and 3rd fields joined with #, as per the output required by the OP.
awk -v FS='Customer ID: | and PIN: |]$' '{print $2"#"$3}' Input_file
With bash and a regex:
while IFS='] ' read -r line; do
[[ "$line" =~ ID:\ ([^\ ]+).*PIN:\ ([^\ ]+)] ]]
echo "${BASH_REMATCH[1]}#${BASH_REMATCH[2]}"
done <file
Output:
0214736848#4581
0214736457#6193
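If some input lines may not match, it is safer to guard the echo so a stale BASH_REMATCH from a previous line is not reprinted (a defensive variant of the same loop):
while IFS='] ' read -r line; do
    if [[ "$line" =~ ID:\ ([^\ ]+).*PIN:\ ([^\ ]+)] ]]; then
        echo "${BASH_REMATCH[1]}#${BASH_REMATCH[2]}"
    fi
done <file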
Given the updated input in your question then using any sed in any shell on every Unix box:
$ sed 's/.* ID[: ][^0-9]*\([0-9]*\).* PIN[: ][^0-9]*\([0-9]*\).*/\1#\2/' file
0214736848#4581
0214736457#6193
0214736849#5949
Original answer:
Using any awk in any shell on every Unix box:
$ awk -v OFS='#' '{print $18, $21+0}' file
0214736848#4581
0214736457#6193

Alternating output in a bash for loop from two greps

I'm trying to search through files and extract two pieces of relevant information every time they appear in the file. The code I currently have:
#!/bin/bash
echo "Utilized reads from ustacks output" > reads.txt
str1="utilized reads:"
str2="Parsing"
for file in /home/desaixmg/novogene/stacks/sample01/conda_ustacks.o*; do
reads=$(grep "$str1" "$file" | cut -d ':' -f 3)
samples=$(grep "$str2" "$file" | cut -d '/' -f 8)
echo $samples $reads >> reads.txt
done
It is doing this for each line of the file (the files have varying numbers of instances of these phrases) and gives me the output as one row per file:
PopA_15.fq 1081264
PopA_16.fq PopA_17.fq 1008416 554791
PopA_18.fq PopA_20.fq PopA_21.fq 604610 531227 595129
...
I want it to match each instance (i.e. the 1st instance of both greps next to each other):
PopA_15.fq 1081264
PopA_16.fq 1008416
PopA_17.fq 554791
PopA_18.fq 604610
PopA_20.fq 531227
PopA_21.fq 595129
...
How do I do this? Thank you
Considering that your Input_file is the same as the sample shown, and that the number of columns on each line is even, with PopA values in the first half and digit values in the second, the following awk may help you with the same.
awk '{for(i=1;i<=(NF/2);i++){print $i,$((NF/2)+i)}}' Input_file
Output will be as follows.
PopA_15.fq 1081264
PopA_16.fq 1008416
PopA_17.fq 554791
PopA_18.fq 604610
PopA_20.fq 531227
PopA_21.fq 595129
In case you want to pass the output of a command to the awk command, you could do it like your_command | awk '...'; there is no need to add Input_file to the above awk command.
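For instance, the same program reading piped input (cat Input_file stands in here for any upstream command):
cat Input_file | awk '{for(i=1;i<=(NF/2);i++){print $i,$((NF/2)+i)}}'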
This is what ended up working for me...any tips for more efficient code are definitely welcome
#!/bin/bash
echo "Utilized reads from ustacks output" > reads.txt
str1="utilized reads:"
str2="Parsing"
for file in /home/desaixmg/novogene/stacks/sample01/conda_ustacks.o*; do
reads=$(grep "$str1" "$file" | cut -d ':' -f 3)
samples=$(grep "$str2" "$file" | cut -d '/' -f 8)
paste <(echo "$samples" | column -t) <(echo "$reads" | column -t) >> reads.txt
done
This provides the desired output described above.
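The fix works because paste, fed two streams via process substitution, lines them up row by row. A minimal demo:
paste <(printf 'a\nb\nc\n') <(printf '1\n2\n3\n')
# a    1
# b    2
# c    3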

Compare a file's contents with command output, then execute command and append file

1. File
A file /etc/ssh/ipblock contains lines that look like this:
2012-01-01 12:00 192.0.2.201
2012-01-01 14:15 198.51.100.123
2012-02-15 09:45 192.0.2.15
2012-03-12 21:45 192.0.2.14
2012-04-25 00:15 203.0.113.243
2. Command
The output of the command iptables -nL somechain looks like this:
Chain somechain (2 references)
target prot opt source destination
DROP all -- 172.18.1.4 anywhere
DROP all -- 198.51.100.123 anywhere
DROP all -- 172.20.4.16 anywhere
DROP all -- 192.0.2.125 anywhere
DROP all -- 172.21.1.2 anywhere
3. The task at hand
First I would like to get a list A of IP addresses that are present in the iptables chain (field 4) but not in the file.
Then I would like to get a list B of IP addresses that are present in the file but not in the iptables chain.
IP addresses in list A should then be appended to the file in the same style (date, time, IP)
IP addresses in list B should then be added to the iptables chain with
iptables -A somechain -d IP -j DROP
4. Background
I was hoping to expand my awk-fu so I have been trying to get this to work with an awk script that can be executed without arguments. But I failed.
I know I can get the output from commands with the getline command so I was able to get the time and date that way. And I also know that one can read a file using getline foo < file. But I have only had many failed attempts to combine this all into a working awk script.
I realise that I could get this to work with another programming language or a shell script. But can this be done with an awk script that can be run without arguments?
I think this is almost exactly what you were looking for. Does the job, all in one file, code I guess is pretty much self-explanatory...
Easily adaptable, extendable...
USAGE:
./foo.awk CHAIN ip.file
foo.awk:
#!/usr/bin/awk -f
BEGIN {
    CHAIN = ARGV[1]
    IPBLOCKFILE = ARGV[2]
    while((getline < IPBLOCKFILE) > 0) {
        IPBLOCK[$3] = 1
    }
    command = "iptables -nL " CHAIN
    command | getline    # skip the "Chain ..." header line
    command | getline    # skip the column-title line
    while((command | getline) > 0) {
        IPTABLES[$4] = 1
    }
    close(command)
    print "not in IPBLOCK (will be appended):"
    command = "date +'%Y-%m-%d %H:%M'"
    command | getline DATE
    close(command)
    for(ip in IPTABLES) {
        if(!IPBLOCK[ip]) {
            print ip
            print DATE, ip >> IPBLOCKFILE
        }
    }
    print "not in IPTABLES (will be appended):"
    # command = "echo iptables -A " CHAIN " -s "    # use for testing
    command = "iptables -A " CHAIN " -s "
    for(ip in IPBLOCK) {
        if(!IPTABLES[ip]) {
            print ip
            system(command ip " -j DROP")
        }
    }
    exit
}
Doing 1&3:
comm -13 <(awk '{print $3}' /etc/ssh/ipblock | sort) <(iptables -nL somechain | awk '/\./{print $4}' | sort) | xargs -n 1 echo `date '+%Y-%m-%d %H:%M'` >> /etc/ssh/ipblock
Doing 2&4:
comm -23 <(awk '{print $3}' /etc/ssh/ipblock | sort) <(iptables -nL somechain | awk '/\./{print $4}' | sort) | xargs -I IP iptables -A somechain -d IP -j DROP
The command is constructed from the following building blocks:
Bash's process substitution feature: it is somewhat similar to a pipe, but is often used when a program requires two or more input files in its arguments/options. Bash creates a fifo file which basically "contains" the output of a given command. In our case that output is a list of IP addresses.
The output of each awk script is passed to the comm program, and both awk scripts are pretty simple: they just print an IP address. In the first case all IPs are contained in the third column (hence $3); in the second case they are in the fourth column, but it is necessary to get rid of the column header (the "destination" string), so the simple regex /\./ is used: it filters out every string that doesn't contain a dot.
comm requires both inputs to be sorted, thus the output of awk is sorted using sort.
Now the comm program receives both lists of IP addresses. When no options are given, it prints three columns: lines unique to FILE1, lines unique to FILE2, and lines common to both. Passing -23 yields only lines unique to FILE1; similarly, passing -13 makes it output lines unique to FILE2.
xargs acts as a "foreach" loop in bash, executing a given command per input line (thanks to -n 1, or per replaced token with -I). The iptables invocation is the desired one from the task; the echo one isn't complicated either: it just makes date output the current time in the proper format, so each appended line matches the file's "date time IP" style.
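comm's column behaviour is easy to see on a toy pair of sorted lists:
comm <(printf '1\n2\n3\n') <(printf '2\n3\n4\n')      # three columns: unique to 1st, unique to 2nd, common
comm -13 <(printf '1\n2\n3\n') <(printf '2\n3\n4\n')  # prints only: 4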

How to convert HHMMSS to HH:MM:SS Unix?

I tried to convert HHMMSS to HH:MM:SS and I am able to convert it successfully, but my script takes 2 hours to complete because of the file size. Is there any better (faster) way to complete this task?
Data File
data.txt
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,,,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,,071600,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,072200,072200,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TAB,072600,072600,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,073200,073200,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,073500,073500,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,MRO,073700,073700,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,CPT,073900,073900,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,074400,,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,,,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,,090200,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,090900,090900,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,091500,091500,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TAB,091900,091900,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,092500,092500,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,092900,092900,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,MRO,093200,093200,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,CPT,093500,093500,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,094500,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,CPT,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,MRO,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TAB,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,,170100,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,CPT,170400,170400,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,MRO,170700,170700,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,171000,171000,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,171500,171500,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TAB,171900,171900,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,172500,172500,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,172900,172900,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,173500,173500,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,174100,,
My code: script.sh
#!/bin/bash
awk -F"," '{print $5}' Data.txt > tmp.txt # print the 5th comma-separated field (the number) of every line into tmp.txt
sort tmp.txt | uniq -d > Uniqe_number.txt # numbers occurring more than once are stored in Uniqe_number.txt
rm tmp.txt # removes the tmp file
while read line; do
echo $line
cat Data.txt | grep ",$line," > Numbers/All/$line.txt # grep the number and create one file per number
awk -F"," '{print $5","$4","$7","$8","$9","$10","$11}' Numbers/All/$line.txt > Numbers/All/tmp_$line.txt
mv Numbers/All/tmp_$line.txt Numbers/Final/Final_$line.txt
done < Uniqe_number.txt
ls Numbers/Final > files.txt
dos2unix files.txt
bash time_replace.sh
When you execute the above script it will call the time_replace.sh script.
My Code for time_replace.sh
#!/bin/bash
for i in `cat files.txt`
do
while read aline
do
TimeDep=`echo $aline | awk -F"," '{print $6}'`
#echo $TimeDep
finalTimeDep=`echo $TimeDep | awk '{for(i=1;i<=length($0);i+=2){printf("%s:",substr($0,i,2))}}'|awk '{sub(/:$/,"")};1'`
#echo $finalTimeDep
##########
TimeAri=`echo $aline | awk -F"," '{print $7}'`
#echo $TimeAri
finalTimeAri=`echo $TimeAri | awk '{for(i=1;i<=length($0);i+=2){printf("%s:",substr($0,i,2))}}'|awk '{sub(/:$/,"")};1'`
#echo $finalTimeAri
sed -i 's/',$TimeDep'/',$finalTimeDep'/g' Numbers/Final/$i
sed -i 's/',$TimeAri'/',$finalTimeAri'/g' Numbers/Final/$i
############################
done < Numbers/Final/$i
done
Any better solution?
Appreciate any help.
Thanks
Sri
If there's a large quantity of files, then the pipelines are probably what will impact performance more than anything else; although processes can be cheap, if you're doing a huge amount of processing then cutting down the number of times you pass data through a pipeline can reap dividends.
So you're probably going to be better off writing the entire script in awk (or perl). For example, awk can send output to an arbitrary file, so the while loop in your first script could be replaced with an awk script that does this. You also don't need to use a temporary file.
I assume the sorting is just for tracking progress easily as you know how many numbers there are. But if you don't care for the sorting, you can simply do this:
#!/bin/sh
awk -F ',' '
{
    # the redirection target must be a string expression in awk
    print $5","$4","$7","$8","$9","$10","$11 > ("Numbers/Final/Final_" $5 ".txt")
}' datafile.txt
ls Numbers/Final > files.txt
Alternatively, if you need to sort you can do sort -t, -k5,5 -k4,4 -k10,10 (or whichever fields your sort keys actually need to be).
As for formatting the datetime, awk also supports user-defined functions, so you could actually have an awk script that looks like this. This would replace both of your scripts above whilst retaining the same functionality (at least, as far as I can make out with a quick analysis)... (Note! Untested, so it may contain vague syntax errors):
#!/usr/bin/awk -f
BEGIN {
    FS = ","
    OFS = ","    # so a rebuilt record (see below) stays comma-separated
}
function formattime (t)
{
    if (t == "") return t    # leave empty time fields empty
    return substr(t,1,2)":"substr(t,3,2)":"substr(t,5,2)
}
{
    print $5","$4","$7","$8","$9","formattime($10)","formattime($11) > ("Numbers/Final/Final_" $5 ".txt")
}
which you can save, chmod 700, and call directly as:
./dostuff.awk filename
Other awk options include changing fields in situ, so if you want to keep the entire original file but with formatted datetimes, you can do a modification of the above (this is why OFS="," is set in the BEGIN block: rebuilding the record uses OFS). Change the print block to:
{
    $10 = formattime($10)
    $11 = formattime($11)
    print $0
}
If this doesn't do everything you need it to, hopefully it gives some ideas that will help the code.
It's not clear what all your sorting and uniq-ing is for. I'm assuming your data file has only one entry per line, and you need to change the 10th and 11th comma-separated fields from HHMMSS to HH:MM:SS.
while IFS=, read -ra line ; do
    echo -n "${line[0]},${line[1]},${line[2]},${line[3]},"
    echo -n "${line[4]},${line[5]},${line[6]},${line[7]},"
    echo -n "${line[8]},"
    if [ -n "${line[9]}" ]; then
        echo -n "${line[9]:0:2}:${line[9]:2:2}:${line[9]:4:2}"
    fi
    echo -n ,
    if [ -n "${line[10]}" ]; then
        echo -n "${line[10]:0:2}:${line[10]:2:2}:${line[10]:4:2}"
    fi
    echo ","
done < data.txt
The operative part is the ${variable:offset:length} construct that lets you extract substrings out of a variable.
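For example, splitting a six-digit HHMMSS value into pairs:
t=073500
echo "${t:0:2}:${t:2:2}:${t:4:2}"    # prints 07:35:00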
In Perl, that's close to child's play:
#!/usr/bin/env perl
use strict;
use warnings;
use English( -no_match_vars );
local($OFS) = ",";
while (<>)
{
    my(@F) = split /,/;
    $F[9]  =~ s/(\d\d)(\d\d)(\d\d)/$1:$2:$3/ if defined $F[9];
    $F[10] =~ s/(\d\d)(\d\d)(\d\d)/$1:$2:$3/ if defined $F[10];
    print @F;
}
If you don't want to use English, you can write local($,) = ","; instead; it controls the output field separator, choosing to use a comma. The code reads each line in the file, splits it up on the commas, takes fields 9 and 10 (counting from zero, i.e. the two time fields), and (if they're defined) inserts colons between the pairs of digits. I'm sure a 'Code Golf' solution could be made a lot shorter, but this is semi-legible if you know any Perl.
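Roughly the same transformation is available as a one-liner (a sketch, assuming the only six-digit values bounded by commas are the two time columns):
perl -pe 's/,(\d\d)(\d\d)(\d\d)(?=,)/,$1:$2:$3/g' data.txt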
This will be quicker by far than the script, not least because it doesn't have to sort anything, but also because all the processing is done in a single process in a single pass through the file. Running multiple processes per line of input, as in your code, is a performance disaster when the files are big.
The output on the sample data you gave is:
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,,,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,,07:16:00,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,07:22:00,07:22:00,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TAB,07:26:00,07:26:00,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,07:32:00,07:32:00,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,07:35:00,07:35:00,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,MRO,07:37:00,07:37:00,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,CPT,07:39:00,07:39:00,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,07:44:00,,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,,,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,,09:02:00,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,09:09:00,09:09:00,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,09:15:00,09:15:00,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TAB,09:19:00,09:19:00,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,09:25:00,09:25:00,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,09:29:00,09:29:00,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,MRO,09:32:00,09:32:00,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,CPT,09:35:00,09:35:00,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,09:45:00,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,CPT,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,MRO,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TAB,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,,17:01:00,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,CPT,17:04:00,17:04:00,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,MRO,17:07:00,17:07:00,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,17:10:00,17:10:00,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,17:15:00,17:15:00,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TAB,17:19:00,17:19:00,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,17:25:00,17:25:00,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,17:29:00,17:29:00,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,17:35:00,17:35:00,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,17:41:00,,
