Get package name and corr. data from file - bash

I've been banging my head against this lately, trying to parse dumpsys output.
Here is the output:
NotificationRecord(0x4297d448: pkg=com.android.systemui user=UserHandle{0} id=273 tag=null score=0: Notification(pri=0 icon=7f020148 contentView=com.android.systemui/0x1090069 vibrate=null sound=null defaults=0x0 flags=0x2 when=0 ledARGB=0x0 contentIntent=N deleteIntent=N contentTitle=6 contentText=15 tickerText=6 kind=[null]))
uid=10012 userId=0
icon=0x7f020148 / com.android.systemui:drawable/stat_sys_no_sim
pri=0 score=0
contentIntent=null
deleteIntent=null
tickerText=No SIM
contentView=android.widget.RemoteViews@429c1f58
defaults=0x00000000 flags=0x00000002
sound=null
vibrate=null
led=0x00000000 onMs=0 offMs=0
extras={
android.title=No SIM
android.subText=null
android.showChronometer=false
android.icon=2130837832
android.text=Insert SIM card
android.progress=0
android.progressMax=0
android.showWhen=true
android.infoText=null
android.progressIndeterminate=false
android.scoreModified=false
}
NotificationRecord(0x427e1878: pkg=jackpal.androidterm user=UserHandle{0} id=1 tag=null score=0: Notification(pri=0 icon=7f02000d contentView=jackpal.androidterm/0x1090069 vibrate=null sound=null defaults=0x0 flags=0x62 when=1456782124817 ledARGB=0x0 contentIntent=Y deleteIntent=N contentTitle=17 contentText=27 tickerText=27 kind=[null]))
uid=10094 userId=0
icon=0x7f02000d / jackpal.androidterm:drawable/ic_stat_service_notification_icon
pri=0 score=0
contentIntent=PendingIntent{42754f78: PendingIntentRecord{42802aa0 jackpal.androidterm startActivity}}
deleteIntent=null
tickerText=Terminal session is running
contentView=android.widget.RemoteViews@4279b510
defaults=0x00000000 flags=0x00000062
sound=null
vibrate=null
led=0x00000000 onMs=0 offMs=0
extras={
android.title=Terminal Emulator
android.subText=null
android.showChronometer=false
android.icon=2130837517
android.text=Terminal session is running
android.progress=0
android.progressMax=0
android.showWhen=true
android.infoText=null
android.progressIndeterminate=false
android.scoreModified=false
}
NotificationRecord(0x429381f8: pkg=com.droidsail.dsapp2sd user=UserHandle{0} id=128 tag=null score=0: Notification(pri=0 icon=7f020000 contentView=com.droidsail.dsapp2sd/0x1090069 vibrate=null sound=null defaults=0x0 flags=0x10 when=1456786729004 ledARGB=0x0 contentIntent=Y deleteIntent=N contentTitle=13 contentText=35 tickerText=35 kind=[null]))
uid=10107 userId=0
icon=0x7f020000 / com.droidsail.dsapp2sd:drawable/appicon
pri=0 score=0
contentIntent=PendingIntent{42955a60: PendingIntentRecord{4286db18 com.droidsail.dsapp2sd startActivity}}
deleteIntent=null
tickerText=Detected new app can be moved to SD
contentView=android.widget.RemoteViews@42a891a8
defaults=0x00000000 flags=0x00000010
sound=null
vibrate=null
led=0x00000000 onMs=0 offMs=0
extras={
android.title=New app to SD
android.subText=null
android.showChronometer=false
android.icon=2130837504
android.text=Detected new app can be moved to SD
android.progress=0
android.progressMax=0
android.showWhen=true
android.infoText=null
android.progressIndeterminate=false
android.scoreModified=false
}
NotificationRecord(0x423708b0: pkg=android user=UserHandle{-1} id=17041135 tag=null score=0: Notification(pri=0 icon=1080399 contentView=android/0x1090069 vibrate=null sound=null defaults=0x0 flags=0x1002 when=0 ledARGB=0x0 contentIntent=Y deleteIntent=N contentTitle=19 contentText=17 tickerText=N kind=[android.system.imeswitcher]))
uid=1000 userId=-1
icon=0x1080399 / android:drawable/ic_notification_ime_default
pri=0 score=0
contentIntent=PendingIntent{425a8960: PendingIntentRecord{426f84b0 android broadcastIntent}}
deleteIntent=null
tickerText=null
contentView=android.widget.RemoteViews@428846b8
defaults=0x00000000 flags=0x00001002
sound=null
vibrate=null
led=0x00000000 onMs=0 offMs=0
extras={
android.title=Choose input method
android.subText=null
android.showChronometer=false
android.icon=17302425
android.text=Hacker's Keyboard
android.progress=0
android.progressMax=0
android.showWhen=true
android.infoText=null
android.progressIndeterminate=false
android.scoreModified=false
}
I want to get the package name and the corresponding extras={}
for each of them.
For example:
pkg:com.android.systemui
extras={
.....
}
So far I've tried:
dumpsys notification | awk '/pkg=/,/\n}/'
But without any success.
I'm a newbie to awk, and if possible I want to do it with awk or perl. Of course, any other tool like sed or grep is fine by me too; I just want to parse it somehow.
Can anyone help me?

If you have GNU awk, try the following:
awk -v RS='(^|\n)NotificationRecord\\([^=]+=' \
'NF { print "pkg:" $1; print gensub(/^.*\n\s*(extras=\{[^}]+\}).*$/, "\\1", 1) }' file
-v RS='(^|\n)NotificationRecord\\([^=]+=' breaks the input into records by lines starting with NotificationRecord( up to and including the following = char.
In effect, that means you get records starting with the package names (com.android.systemui, ...).
NF is a condition that only executes the following block if it evaluates to nonzero; NF is the count of fields in the record, so as long as at least 1 field is present, the block is evaluated - in effect, this skips the implied empty record before the very first line.
print "pkg:" $1 prints the package name, prefixed with literal pkg:.
gensub(/^.*\n\s*(extras=\{[^}]+\}).*$/, "\\1", 1) matches the entire record and replaces it with the extras property captured via a capture group, effectively returning the extras property only.
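If GNU awk is not available (gensub() is a GNU extension, and a multi-character RS is not guaranteed by POSIX), a flag-based approach should work in any POSIX awk. The following is only a minimal sketch: extract_pkg_extras is a hypothetical helper name, and it assumes the record header line carries a whitespace-delimited pkg=... token and that each extras block ends at the first line that starts with }.

```shell
# Hypothetical helper: reads dumpsys-style text on stdin and prints
# "pkg:<name>" followed by the extras={ ... } block of each record.
extract_pkg_extras() {
  awk '
    /^[ \t]*NotificationRecord[(]/ {
      # The package name is the field that starts with "pkg=".
      for (i = 1; i <= NF; i++)
        if ($i ~ /^pkg=/) print "pkg:" substr($i, 5)
    }
    /^[ \t]*extras=[{]/ { in_extras = 1 }   # block starts here
    in_extras {
      sub(/^[ \t]+/, "")                     # drop leading indentation
      print
    }
    /^[ \t]*[}]/ { in_extras = 0 }           # block ends at the closing brace
  '
}
```

Usage would then be: dumpsys notification | extract_pkg_extras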

I would suggest perl over awk, because you'll be storing whether you're inside the extras=... block in a variable:
dumpsys notification | perl -lne '
print $1 if /^Notif.*?: pkg=(\S+)/;
$in_extras = 0 if /^  \}/;
print if $in_extras;
$in_extras = 1 if /^  extras=\{/'
If you also want the pkg: prefix and the extras={ and } lines in the output, a slight modification does it:
dumpsys notification | perl -lne '
print "pkg: $1" if /^Notif.*?: pkg=(\S+)/;
$in_extras = 1 if /^  extras=\{/;
print if $in_extras;
$in_extras = 0 if /^  \}/;'

Sed version:
dumpsys notification |\
sed -n 's/.*pkg=\([^ ]*\).*/pkg:\1/p;/^  extras={$/,/^  }$/s/^  //p'
I'm assuming you always have two spaces in front of extras={ and } and you also want to remove these spaces.

Related

Store variables from lines in a text file using awk and cut in a for loop

I have a tab separated text file, call it input.txt
cat input.txt
Begin Annotation Diff End Begin,End
6436687 >ENST00000422706.5|ENSG00000100342.21|OTTHUMG00000030427.9|-|APOL1-205|APOL1|2901|protein_coding| 50 6436736 6436687,6436736
6436737 >ENST00000426053.5|ENSG00000100342.21|OTTHUMG00000030427.9|-|APOL1-206|APOL1|2808|protein_coding| 48 6436784 6436737,6436784
6436785 >ENST00000319136.8|ENSG00000100342.21|OTTHUMG00000030427.9|OTTHUMT00000075315.5|APOL1-201|APOL1|3000|protein_coding| 51 6436835 6436785,6436835
6436836 >ENST00000422471.5|ENSG00000100342.21|OTTHUMG00000030427.9|OTTHUMT00000319151.1|APOL1-204|APOL1|561|nonsense_mediated_decay| 11 6436846 6436836,6436846
6436847 >ENST00000475519.5|ENSG00000100342.21|OTTHUMG00000030427.9|OTTHUMT00000319153.1|APOL1-212|APOL1|600|retained_intron| 11 6436857 6436847,6436857
6436858 >ENST00000438034.5|ENSG00000100342.21|OTTHUMG00000030427.9|OTTHUMT00000319152.2|APOL1-210|APOL1|566|protein_coding| 11 6436868 6436858,6436868
6436869 >ENST00000439680.5|ENSG00000100342.21|OTTHUMG00000030427.9|OTTHUMT00000319252.1|APOL1-211|APOL1|531|nonsense_mediated_decay| 10 6436878 6436869,6436878
6436879 >ENST00000427990.5|ENSG00000100342.21|OTTHUMG00000030427.9|OTTHUMT00000319154.2|APOL1-207|APOL1|624|protein_coding| 12 6436890 6436879,6436890
6436891 >ENST00000397278.8|ENSG00000100342.21|OTTHUMG00000030427.9|OTTHUMT00000319100.4|APOL1-202|APOL1|2795|protein_coding| 48 6436938 6436891,6436938
6436939 >ENST00000397279.8|ENSG00000100342.21|OTTHUMG00000030427.9|-|APOL1-203|APOL1|1564|protein_coding| 28 6436966 6436939,6436966
6436967 >ENST00000433768.5|ENSG00000100342.21|OTTHUMG00000030427.9|OTTHUMT00000319253.2|APOL1-209|APOL1|541|protein_coding| 11 6436977 6436967,6436977
6436978 >ENST00000431184.1|ENSG00000100342.21|OTTHUMG00000030427.9|OTTHUMT00000319254.1|APOL1-208|APOL1|550|nonsense_mediated_decay| 11 6436988 6436978,6436988
Using the information in input.txt, I want to obtain information from a file called Other_File.fa. This is an annotation file filled with ENST#'s (transcript IDs) and sequences of A's, T's, C's, and G's. I want to store each sequence in a file called Output.log (see example below) and the command used to retrieve the text in a file called Input.log (see example below).
So far I have tried to do this with awk and cut in a for loop. This is the code I have tried.
for line in `awk -F "\\t" 'NR != 1 {print substr($2,2,17)"#"$5}' input.txt`
do
transcript=`cut -d "#" -f 1 $line`
range=`cut -d "#" -f 2 $line` #Range is the string location in Other_File.fa
echo "Our transcript is ${transcript} and our range is ${range}" >> Input.log
sed -n '${range}' Other_File.fa >> Output.log
done
Here is an example of the 11 lines between ENST00000433768.5 and ENST00000431184.1 in Other_File.fa.
grep -A 11 ENST00000433768.5 Other_File.fa
>ENST00000433768.5|ENSG00000100342.21|OTTHUMG00000030427.9|OTTHUMT00000319253.2|APOL1-209|APOL1|541|protein_coding|
ATCCACACAGCTCAGAACAGCTGGATCTTGCTCAGTCTCTGCCAGGGGAAGATTCCTTGG
AGGAGCACACTGTCTCAACCCCTCTTTTCCTGCTCAAGGAGGAGGCCCTGCAGCGACATG
GAGGGAGCTGCTTTGCTGAGAGTCTCTGTCCTCTGCATCTGGATGAGTGCACTTTTCCTT
GGTGTGGGAGTGAGGGCAGAGGAAGCTGGAGCGAGGGTGCAACAAAACGTTCCAAGTGGG
ACAGATACTGGAGATCCTCAAAGTAAGCCCCTCGGTGACTGGGCTGCTGGCACCATGGAC
CCAGGCCCAGCTGGGTCCAGAGGTGACAGTGGAGAGCCGTGTACCCTGAGACCAGCCTGC
AGAGGACAGAGGCAACATGGAGGTGCCTCAAGGATCAGTGCTGAGGGTCCCGCCCCCATG
CCCCGTCGAAGAACCCCCTCCACTGCCCATCTGAGAGTGCCCAAGACCAGCAGGAGGAAT
CTCCTTTGCATGAGAGCAGTATCTTTATTGAGGATGCCATTAAGTATTTCAAGGAAAAAG
T
>ENST00000431184.1|ENSG00000100342.21|OTTHUMG00000030427.9|OTTHUMT00000319254.1|APOL1-208|APOL1|550|nonsense_mediated_decay|
The range value in input.txt for this transcript is 6436967,6436977. In my file Input.log for this transcript I hope to get
Our transcript is ENST00000433768.5 and our range is 6436967,6436977
And in Output.log for this transcript I hope to get
>ENST00000433768.5|ENSG00000100342.21|OTTHUMG00000030427.9|OTTHUMT00000319253.2|APOL1-209|APOL1|541|protein_coding|
ATCCACACAGCTCAGAACAGCTGGATCTTGCTCAGTCTCTGCCAGGGGAAGATTCCTTGG
AGGAGCACACTGTCTCAACCCCTCTTTTCCTGCTCAAGGAGGAGGCCCTGCAGCGACATG
GAGGGAGCTGCTTTGCTGAGAGTCTCTGTCCTCTGCATCTGGATGAGTGCACTTTTCCTT
GGTGTGGGAGTGAGGGCAGAGGAAGCTGGAGCGAGGGTGCAACAAAACGTTCCAAGTGGG
ACAGATACTGGAGATCCTCAAAGTAAGCCCCTCGGTGACTGGGCTGCTGGCACCATGGAC
CCAGGCCCAGCTGGGTCCAGAGGTGACAGTGGAGAGCCGTGTACCCTGAGACCAGCCTGC
AGAGGACAGAGGCAACATGGAGGTGCCTCAAGGATCAGTGCTGAGGGTCCCGCCCCCATG
CCCCGTCGAAGAACCCCCTCCACTGCCCATCTGAGAGTGCCCAAGACCAGCAGGAGGAAT
CTCCTTTGCATGAGAGCAGTATCTTTATTGAGGATGCCATTAAGTATTTCAAGGAAAAAG
T
But I am getting the following error, and I am unsure as to why or how to fix it.
cut: ENST00000433768.5#6436967,6436977: No such file or directory
cut: ENST00000433768.5#6436967,6436977: No such file or directory
Our transcript is and our range is
My thought was each line from the awk would be read as a string then cut could split the string along the "#" symbol I have added, but it is reading each line as a file and throwing an error when it can't locate the file in my directory.
Thanks.
EDIT2: This is a generic solution that compares the two files (input.txt and Other_File.fa): whenever a transcript ID from the first file is found on a line of the second, it prints the line range recorded for that ID, wherever that range lies. For example, if the ID is found on line 300 but its range is 1 to 20, lines 1 to 20 are still printed. Note that this calls system(), which in turn runs sed (much as you were using the range within sed). There are other ways, such as loading the whole Input_file into an array and printing from it, but I am going with this one here. Fair warning: this has not been tested on very large files.
awk -F'[>| ]' '
FNR==NR{
arr[$2]=$NF
next
}
($2 in arr){
split(arr[$2],lineNum,",")
print arr[$2]
start=lineNum[1]
end=lineNum[2]
print "sed -n \047" start","end"p \047 " FILENAME
system("sed -n \047" start","end"p\047 " FILENAME)
start=end=0
}
' file1 FS="[>|]" other_file.fa
EDIT: With the OP's edited samples, please try the following to print lines based on the other file. It assumes that the range values always refer to lines after the line on which the ID is found (e.g. the ID is found on line 3 and the range is 4 to 10).
awk -F'[>| ]' '
FNR==NR{
arr[$2]=$NF
next
}
($2 in arr){
split(arr[$2],lineNum," ")
start=lineNum[1]
end=lineNum[2]
}
FNR>=start && FNR<=end{
print
if(FNR==end){
start=end=0
}
}
' file1 FS="[>|]" other_file.fa
You do not need a for loop that calls awk again for each line. Since you only have to print the values, this can be done in a single awk. Written and tested with your shown samples.
awk -F'[>| ]' 'FNR>1{print "Our transcript is:"$3" and our range is:"$NF}' Input_file
NOTE: This prints the transcript and range values for each line of your Input_file. If you want to perform further operations on those values, please mention it.
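As for the error itself: cut treats its operand as a file name, so cut -d "#" -f 1 $line tries to open a file called ENST00000433768.5#6436967,6436977 instead of splitting that string. Either feed the string on stdin (printf '%s\n' "$line" | cut -d '#' -f 1) or let the shell split it with parameter expansion. A minimal sketch of the latter, where split_pair is a hypothetical helper name and the ID#START,END layout is the one produced by the awk command in the question:

```shell
# split_pair is a hypothetical helper: it splits "ID#START,END" on the first "#".
split_pair() {
  pair=$1
  transcript=${pair%%#*}   # everything before the first "#"
  range=${pair#*#}         # everything after the first "#"
  printf 'Our transcript is %s and our range is %s\n' "$transcript" "$range"
}

split_pair 'ENST00000433768.5#6436967,6436977'
```

Note also that sed -n '${range}' cannot work as written: single quotes stop the shell from expanding ${range}, and sed needs a command after the address, so it would have to be sed -n "${range}p" Other_File.fa.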

Error "awk: too many output files 10" when splitting SSL certificates

I'm trying to split a file that contains multiple SSL certificates with AWK but is showing an error message:
awk: too many output files 10
Command that I'm using is the following:
cat ${SSL_CERTIFICATES_PATH} | awk '/BEGIN/ { i++; } /BEGIN/, /END/ { print > i ".extracted.crt" }'
Error Message:
awk: too many output files 10
record number 735
Do you know how could I solve this issue?
You have to close() each file once you are done writing to it:
awk '/BEGIN/ {f=i++".extracted.crt"}/BEGIN/,/END/{print > f;if(/END/)close(f)}'
The best solution, as suggested by Ed Morton, avoids range expressions altogether:
awk '/BEGIN/{f=(++i)".extracted.crt"} f{print>f} /END/{close(f);f=""}'
Here is sample (not certificate)
Input
$ cat file
BEGIN
1
END
BEGIN
2
END
BEGIN
3
END
Execution
$ awk '/BEGIN/{f=i++".extracted.crt"}/BEGIN/,/END/{print > f;if(/END/)close(f)}' file
$ awk '/BEGIN/{f=(++i)".extracted.crt"} f{print>f} /END/{close(f);f=""}' file
Output files (as produced by the first command; the second command uses ++i, so its files are numbered from 1)
$ ls *.crt
0.extracted.crt 1.extracted.crt 2.extracted.crt
File contents of each
$ for i in *.crt; do echo $i; cat $i; done
0.extracted.crt
BEGIN
1
END
1.extracted.crt
BEGIN
2
END
2.extracted.crt
BEGIN
3
END
We have to close the current file each time the value of i is incremented, so try the following and let me know if it helps.
awk '/BEGIN/ {close(i".extracted.crt");i++} /BEGIN/, /END/ { print > i".extracted.crt" }' ${SSL_CERTIFICATES_PATH}
EDIT: Xavier, I checked with a friend who has SUN 5, and the following worked without any error. You can substitute your own variable for the file argument.
/usr/xpg4/bin/awk '/BEGIN/ {close(i".extracted.crt");i++} /BEGIN/, /END/ { print > i".extracted.crt" }' *.crt
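For background, awk keeps every file opened with print > name open until close() is called or the program exits, and some awk implementations cap the number of simultaneously open files, which is what the error message reports. The following is a minimal sketch of the close-as-you-go idea wrapped in a function. split_blocks and its two parameters are hypothetical names, and the output directory is assumed to exist already.

```shell
# split_blocks INPUT OUTDIR  (hypothetical helper and parameter names)
# Writes each BEGIN..END block of INPUT to OUTDIR/<n>.extracted.crt,
# keeping at most one output file open at any time.
split_blocks() {
  awk -v dir="$2" '
    /BEGIN/ {
      if (out != "") close(out)            # safety net if an END line was missing
      out = dir "/" (++n) ".extracted.crt"
    }
    out != "" { print > out }              # write only while inside a block
    /END/ {
      if (out != "") close(out)            # release the file handle right away
      out = ""
    }
  ' "$1"
}
```

Because each file is closed as soon as its END line has been written, the number of blocks in the input should no longer matter.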

Find and replace URL with content from URL

Background info:
I've got an XML file that my supplier uploads each night with new products and updated stock counts etc.
But they've stitched me up and they don't have a Description in the XML file, they have a link to their site which has the description in raw text.
What I need is a script that loops through the document I download from them and replaces each URL with the content at that URL.
For example, if i have
<DescriptionLink>http://www.leadersystems.com.au/DataFeed/ProductDetails/AT-CHARGERSTATION-45</DescriptionLink>
I want it to end up as
<DescriptionLink>Astrotek USB Charging Station Charger Hub 3 Port 5V 4A with 1.5m Power Cable White for iPhone Samsung iPad Tablet GPS</DescriptionLink>
I've tried a few things but I'm not very proficient with scripting or loops.
So far i've got:
#!/bin/bash
LINKGET=`awk -F '|' '{ print $2 }' products-daily.txt`
wget -O products-daily.txt http://www.suppliers-site-url.com
sed 's/<DescriptionLink>*/<DescriptionLink>$(wget -S -O- $LINKGET/g' products-daily.txt
But again, i'm not sure how this all really works so it's been trial and error.
Any help is appreciated!!!
Updated to include example URL.
You'll want something like this (using GNU awk for the 3rd arg to match()):
$ cat tst.awk
{
head = ""
tail = encode($0)
while ( match(tail,/^([^{]*[{])([^}]+)(.*)/,a) ) {
desc = ""
cmd = "curl -s \047" a[2] "\047"
while ( (cmd | getline line) > 0 ) {
desc = (desc=="" ? "" : desc ORS) line
}
close(cmd)
head = head decode(a[1]) desc
tail = a[3]
}
print head decode(tail)
}
function encode(str) {
gsub(/#/,"#A",str)
gsub(/{/,"#B",str)
gsub(/}/,"#C",str)
gsub(/<DescriptionLink>/,"{",str)
gsub(/<\/DescriptionLink>/,"}",str)
return str
}
function decode(str) {
gsub(/}/,"</DescriptionLink>",str)
gsub(/{/,"<DescriptionLink>",str)
gsub(/#C/,"}",str)
gsub(/#B/,"{",str)
gsub(/#A/,"#",str)
return str
}
$ awk -f tst.awk file
<DescriptionLink>Astrotek USB Charging Station Charger Hub 3 Port 5V 4A with 1.5m Power Cable White for iPhone Samsung iPad Tablet GPS</DescriptionLink>
See https://stackoverflow.com/a/40512703/1745001 for info on what the encode/decode functions are doing and why.
Note that this is one of the rare cases where use of getline is appropriate. If you're ever considering using getline in future, make sure you read and fully understand all of the caveats and use cases discussed at http://awk.freeshell.org/AllAboutGetline first.
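If the three-argument match() of GNU awk is not available, the same substitution can be sketched with index() and substr() in POSIX awk. This is a simplified illustration rather than a replacement for the script above: replace_links is a hypothetical helper name, the fetch command is passed in as an optional second argument (defaulting to curl -s) so the logic can be tried without network access, each opening and closing tag pair is assumed to sit on one line, and a URL containing a single quote would break the shell quoting.

```shell
# replace_links FILE [FETCH_CMD]  (hypothetical helper; FETCH_CMD defaults to "curl -s")
replace_links() {
  awk -v fetch="${2:-curl -s}" '
    BEGIN { otag = "<DescriptionLink>"; ctag = "</DescriptionLink>" }
    {
      out = ""; rest = $0
      while ((s = index(rest, otag)) > 0) {
        out  = out substr(rest, 1, s + length(otag) - 1)  # copy through the opening tag
        rest = substr(rest, s + length(otag))
        e = index(rest, ctag)
        if (e == 0) break                                  # unterminated tag: leave as is
        url  = substr(rest, 1, e - 1)
        rest = substr(rest, e)                             # keep the closing tag
        cmd  = fetch " \047" url "\047"
        desc = ""
        while ((cmd | getline line) > 0)
          desc = (desc == "" ? "" : desc " ") line         # join multi-line output with spaces
        close(cmd)
        out = out desc
      }
      print out rest
    }
  ' "$1"
}
```

For a dry run, something like replace_links products-daily.txt 'echo fetched' shows which URLs would be requested without contacting the supplier's site.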

Extract text and evaluate in bash

I need some help getting a script up and running. Basically I have some data that comes from a command output and want to select some of it and evaluate
Example data is
JSnow <jsnow@email.com> John Snow spotted 30/1/2015
BBaggins <bbaggins@email.com> Bilbo Baggins spotted 20/03/2015
Batman <batman@email.com> Batman spotted 09/09/2015
So far I have something along the lines of
# Define date to check
check=$(date -d "-90 days" "+%Y/%m/%d")
# Return user name
for user in $(command | awk '{print $1}')
do
# Return last logon date
$lastdate=(command | awk '{for(i=1;i<=NF;i++) if ($i==spotted) $(i+1)}')
# Evaluation date again current -90days
if $lastdate < $check; then
printf "$user not logged on for ages"
fi
done
I have a couple of problems, not least the fact that whilst I can get information from places I don't know how to go about getting it all together!! I'm also guessing my date evaluation will be more complicated but at this point that's another problem and just there to give a better idea of my intentions. If anyone can explain the logical steps needed to achieve my goal as well as propose a solution that would be great. Thanks
Every time you write a loop in shell just to manipulate text you have the wrong approach (see, for example, https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice). The general purpose text manipulation tool that comes on every UNIX installation is awk. This uses GNU awk for time functions:
$ cat tst.awk
BEGIN { check = systime() - (90 * 24 * 60 * 60) }
{
user = $1
date = gensub(/([0-9]+)\/([0-9]+)\/([0-9]+)/,"\\3 \\2 \\1 0 0 0",1,$NF)
secs = mktime(date)
if (secs < check) {
printf "%s not logged in for ages\n", user
}
}
$ cat file
JSnow <jsnow@email.com> John Snow spotted 30/1/2015
BBaggins <bbaggins@email.com> Bilbo Baggins spotted 20/03/2015
Batman <batman@email.com> Batman spotted 09/09/2015
$ cat file | awk -f tst.awk
JSnow not logged in for ages
BBaggins not logged in for ages
Batman not logged in for ages
Replace cat file with command.
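Where gawk's mktime() and gensub() are not available, one possible alternative is to turn each D/M/YYYY date into a YYYYMMDD number and compare it with a cutoff in the same form. A minimal sketch under those assumptions: stale_users is a hypothetical helper name, the date is assumed to be the last field of each line, and the cutoff is supplied by the caller.

```shell
# stale_users CUTOFF  (hypothetical helper; CUTOFF is a date written as YYYYMMDD)
# Reads lines on stdin and reports users whose last-seen date is before CUTOFF.
stale_users() {
  awk -v cutoff="$1" '
    {
      n = split($NF, d, "/")                       # d[1]=day, d[2]=month, d[3]=year
      if (n != 3) next                             # skip lines without a D/M/YYYY date
      stamp = d[3] * 10000 + d[2] * 100 + d[1]     # e.g. 30/1/2015 becomes 20150130
      if (stamp < cutoff + 0)
        printf "%s not logged in for ages\n", $1
    }
  '
}
```

With GNU date, as used in the question, the call could look like: command | stale_users "$(date -d '-90 days' +%Y%m%d)"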

Optimal way to parse a log file

I have a log file that looks something like this:
Client connected with ID 8127641241
< multiple lines of unimportant log here>
Client not responding
Total duration: 154.23583
Sent: 14
Received: 9732
Client lost
Client connected with ID 2521598735
< multiple lines of unimportant log here>
Client not responding
Total duration: 12.33792
Sent: 2874
Received: 1244
Client lost
The log contains lots of these blocks starting with Client connected with ID 1234 and ending with Client lost. They are never mixed up (only 1 client at a time).
How would I parse this file and generate statistics like this:
I'm mainly asking about the parsing process, not the formatting.
I guess I could loop over all the lines, set a flag when finding a Client connected line and save the ID in a variable. Then grep the lines, save the values until I find the Client lost line. Is this a good approach? Is there a better one?
Here's a quick way using awk:
awk 'BEGIN { print "ID Duration Sent Received" } /^(Client connected|Total duration:|Sent:)/ { printf "%s ", $NF } /^Received:/ { print $NF }' file | column -t
Results:
ID Duration Sent Received
8127641241 154.23583 14 9732
2521598735 12.33792 2874 1244
A solution in perl
#!/usr/bin/perl
use warnings;
use strict;
print "\tID\tDuration\tSent\tReceived\n";
while (<>) {
chomp;
if (/Client connected with ID (\d+)/) {
print "$1\t";
}
if (/Total duration: ([\d\.]+)/) {
print "$1\t";
}
if (/Sent: (\d+)/) {
print "$1\t";
}
if (/Received: (\d+)/) {
print "$1\n";
}
}
Sample output:
ID Duration Sent Received
8127641241 154.23583 14 9732
2521598735 12.33792 2874 1244
If you're sure that the logfile can't have errors, and if the fields are always in the same order, you can use something like the following:
#!/bin/bash
ids=()
declare -A duration
declare -A sent
declare -A received
while read _ _ _ _ id; do
ids+=( "$id" )
read _ _ duration[$id]
read _ sent[$id]
read _ received[$id]
done < <(grep '\(^Client connected with ID\|^Total duration:\|^Sent:\|^Received:\)' logfile)
# printing the data out, for control purposes only
for id in "${ids[@]}"; do
printf "ID=%s\n\tDuration=%s\n\tSent=%s\n\tReceived=%s\n" "$id" "${duration[$id]}" "${sent[$id]}" "${received[$id]}"
done
Output is:
$ ./parsefile
ID=8127641241
Duration=154.23583
Sent=14
Received=9732
ID=2521598735
Duration=12.33792
Sent=2874
Received=1244
but the data is stored in the corresponding associative arrays. It's fairly efficient. It would probably be slightly more efficient in another programming language (e.g., perl), but since you only tagged your post with bash, sed and grep, I guess I fully answered your question.
Explanation: grep only filters the lines we're interested in, and bash only reads the fields we're interested in, assuming they always come in the same order. The script should be easy to understand and modify to your needs.
awk:
awk 'BEGIN{print "ID Duration Sent Received"}/with ID/&&!f{f=1}f&&/Client lost/{print a[1],a[2],a[3],a[4];f=0}f{for(i=1;i<=NF;i++){
if($i=="ID")a[1]=$(i+1)
if($i=="duration:")a[2]=$(i+1)
if($i=="Sent:")a[3]=$(i+1)
if($i=="Received:")a[4]=$(i+1)
}}' log
If there is always an empty line between your data blocks, the awk script above can be simplified to:
awk -vRS="" 'BEGIN{print "ID Duration Sent Received"}
{for(i=1;i<=NF;i++){
if($i=="ID")a[1]=$(i+1)
if($i=="duration:")a[2]=$(i+1)
if($i=="Sent:")a[3]=$(i+1)
if($i=="Received:")a[4]=$(i+1)
}print a[1],a[2],a[3],a[4];}' log
output:
ID Duration Sent Received
8127641241 154.23583 14 9732
2521598735 12.33792 2874 1244
For better-aligned output, pipe the result through column -t, and you get:
ID Duration Sent Received
8127641241 154.23583 14 9732
2521598735 12.33792 2874 1244
Use Paragraph Mode to Slurp Files
Using Perl or AWK, you can slurp in records using a special paragraph mode that uses blank lines between records as a separator. In Perl, use -00 to use paragraph mode; in AWK, you set the RS variable to the empty string (e.g. "") to do the same thing. Then you can parse fields within each record.
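As a rough illustration of paragraph mode in awk, the sketch below treats each blank-line-separated block as one record and each line as one field. It assumes the log really does have a blank line between client blocks (the sample in the question may not). summarize_paragraphs is a hypothetical helper name, and a missing value is printed as "-".

```shell
# summarize_paragraphs [FILE...]  (hypothetical helper; reads stdin if no file is given)
summarize_paragraphs() {
  awk -v RS= -F'\n' '
    BEGIN { print "ID Duration Sent Received" }
    {
      id = dur = sent = recv = "-"               # reset for every record
      for (i = 1; i <= NF; i++) {                # each field is one line of the block
        n = split($i, w, " ")                    # the value is the last word on the line
        if      ($i ~ /^Client connected with ID /) id   = w[n]
        else if ($i ~ /^Total duration: /)          dur  = w[n]
        else if ($i ~ /^Sent: /)                    sent = w[n]
        else if ($i ~ /^Received: /)                recv = w[n]
      }
      print id, dur, sent, recv
    }
  ' "$@"
}
```

Resetting the four variables for every record keeps a value from one client from leaking into the next when a line is missing.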
Use Line-Oriented Statements
Alternatively, you can use a shell while-loop to read each line at a time, and then use grep or sed to parse each line. You may even be able to use a case statement, depending on the complexity of your parsing.
For example, assuming every record contains the same 4 matching lines, you could do something like this:
while read -r line; do
case $line in
"Client connected with ID "*|"Total duration: "*|"Sent: "*|"Received: "*) echo "${line##* }" ;;
esac
done < /tmp/foo | xargs -n4 | sed 's/ /\t/g'
The loop would yield (tab-separated, with GNU sed):
8127641241	154.23583	14	9732
2521598735	12.33792	2874	1244
You can certainly play with the formatting, and add header lines, and so forth. The point is that you have to know your data.
AWK, Perl, or even Ruby are better options for parsing record-oriented formats, but the shell is certainly an option if your needs are basic.
A short snippet of Perl:
perl -ne '
BEGIN {print "ID Duration Sent Received\n";}
print "$1 " if /(?:ID|duration:|Sent:|Received:) (.+)$/;
print "\n" if /^Client lost/;
' filename | column -t
awk -v RS= -F'\n' '
BEGIN{ printf "%15s%15s%15s%15s\n","ID","Duration","Sent","Received" }
{
for (i=1;i<=NF;i++) {
n = split($i,f,/ /)
if ( $i ~ /^(Client connected|Total duration:|Sent:|Received:)/ ) {
printf "%15s",f[n]
}
}
print ""
}' file

Resources