I have a requirement to read certain parameters from a log file and then update a database with them. I am trying to achieve the first part, i.e. reading from the log file using awk commands in a shell script.
The log file may consist of lines like the ones below (or more):
[2018-05-22T11:35:17,857] [RQST: rqst_3ADE-5439-598D-1B8B | TB: 9000042] - [588455375] - INFO - com.test.webapp.services.functions.TestTransactionService - Line 769 - requestType="TESTING",partnerName="Test Merchant 123",testId="123456",lob="TEST1_TO_TEST2",tranType="TEST1",paymentType="P2M",amount="110.00",currency="840",processor="CBN",network="TestSend",responseCode="00", acctNumLastFour="0087",binCountry="USA",binCurr="USD"
[2018-05-22T11:35:17,857] [RQST: rqst_2AEF-2339-598D-1B8B | TB: 9000043] - [588455376] - INFO - com.test.webapp.services.functions.TestTransactionService - Line 770 - requestType="TESTING",partnerName="Test Merchant 234",testId="234567",lob="TEST2_TO_TEST3",tranType="TEST2",paymentType="P2M",amount="120.00",currency="850",processor="CBN",network="TestSend",responseCode="00", acctNumLastFour="0087",binCountry="USA",binCurr="USD"
[2018-05-22T11:35:17,857] [RQST: rqst_4EDA-4539-598D-1B8B | TB: 9000044] - [588455377] - INFO - com.test.webapp.services.functions.TestTransactionService - Line 771 - requestType="TESTING",partnerName="Test Merchant 345",testId="345678",lob="TEST3_TO_TEST4",tranType="TEST3",paymentType="P2M",amount="130.00",currency="860",processor="CBN",network="TestSend",responseCode="00", acctNumLastFour="0087",binCountry="USA",binCurr="USD"
I need to filter on processor and paymentType and retrieve the values of amount, currency, network and responseCode into variables in a shell script, which will then be inserted into an Oracle DB table.
I am new to shell scripting and AWK and unable to wrap my head around this. I have tried using
awk '/amount/{print}' testAPI.log
however, it returns all rows that contain amount.
Since you didn't specify the expected output, here is a template you can tailor to your needs:
$ awk -F' - ' '{n=split($NF,a,",");
                for(i=1;i<=n;i++) {split(a[i],b,"="); kv[b[1]]=b[2]}}
      kv["processor"]=="\"CBN\"" &&
      kv["paymentType"]=="\"P2M\""{print kv["amount"],kv["currency"]}' file
"110.00" "840"
"120.00" "850"
"130.00" "860"
You can trim the double quotes as well, but I'm not sure it's needed as is.
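If you do want the quotes stripped and the values in shell variables for the Oracle insert, one option is to remove them with gsub() and read the output line by line. A rough sketch on the same field-layout assumption, with do_oracle_insert as a placeholder name for whatever insert mechanism you end up using (e.g. a sqlplus heredoc):
awk -F' - ' '{n=split($NF,a,",");
              for(i=1;i<=n;i++) {split(a[i],b,"="); gsub(/"/,"",b[2]); kv[b[1]]=b[2]}}
     kv["processor"]=="CBN" &&
     kv["paymentType"]=="P2M" {print kv["amount"], kv["currency"], kv["network"], kv["responseCode"]}' file |
while read -r amount currency network responseCode; do
    echo "amount=$amount currency=$currency network=$network responseCode=$responseCode"
    # do_oracle_insert "$amount" "$currency" "$network" "$responseCode"   # your DB insert step goes here
done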
I tried with the three entries in the question; the command below gives you the output you want.
It checks whether $5 is paymentType="P2M" and whether $8 has the value processor="CBN", which is basically the filter you were looking for; substitute the filters you need.
grep -i "\[RQST: rqst" testAccelAPI.log | cut -d ' ' -f 19 | awk -F, '{ if($5=="paymentType=\"P2M\"" && $8=="processor=\"CBN\"") print $5 "=" $6 "=" $7 "=" $8 "=" $9 "=" $10}' | cut -d= -f 4,6,8,9 | tr = " "
I am using a bash script to extract some information from log files located within a directory and save the summary in a separate file.
At the bottom of each log file, there is a table like:
mode | affinity | dist from best mode
| (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
1 -6.961 0 0
2 -6.797 2.908 4.673
3 -6.639 27.93 30.19
4 -6.204 2.949 6.422
5 -6.111 24.92 28.55
6 -6.058 2.836 7.608
7 -5.986 6.448 10.53
8 -5.95 19.32 23.99
9 -5.927 27.63 30.04
10 -5.916 27.17 31.29
11 -5.895 25.88 30.23
12 -5.835 26.24 30.36
From this I need to take only the value from the second column of the first line (-6.961) and put it together with the name of the log as one string in a new ranking_${output}.log:
log_name -6.961
So for 5 processed logs it should be something like:
# ranking_${output}.log
log_name1 -X.XXX
log_name2 -X.XXX
log_name3 -X.XXX
log_name4 -X.XXX
log_name5 -X.XXX
Here is a simple bash workflow, which takes ALL THE LINES from the ranking table and saves them together with the name of the LOG file:
#!/bin/bash
home="$PWD"
# folder containing all *.log files
results="${home}/results"
# loop over each log file and append its name + the whole ranking table
for log in "${results}"/*.log; do
    log_name=$(basename "$log" .log)
    echo "$log_name" >> "${results}/ranking_${output}.log"
    tail -n 12 "$log" >> "${results}/ranking_${output}.log"
done
Could you suggest an AWK routine that would select only the top value located on the first line of each table?
This is an AWK example that I had used for another format, which does not work here:
awk -F', *' 'FNR==2 {f=FILENAME;
sub(/.*\//,"",f);
sub(/_.*/ ,"",f);
printf("%s: %s\n", f, $5) }' ${results}/*.log >> ${results}/ranking_${output}.log
With awk: if the first column contains 1, print the filename and the second column to the file output:
awk '$1=="1"{print FILENAME, $2}' *.log > output
Update to remove path and suffix (.log):
awk '$1=="1"{sub(/.*\//,"",FILENAME); sub(/\.log/,"",FILENAME); print FILENAME, $2}'
I have a tab-separated text file, call it input.txt
cat input.txt
Begin Annotation Diff End Begin,End
6436687 >ENST00000422706.5|ENSG00000100342.21|OTTHUMG00000030427.9|-|APOL1-205|APOL1|2901|protein_coding| 50 6436736 6436687,6436736
6436737 >ENST00000426053.5|ENSG00000100342.21|OTTHUMG00000030427.9|-|APOL1-206|APOL1|2808|protein_coding| 48 6436784 6436737,6436784
6436785 >ENST00000319136.8|ENSG00000100342.21|OTTHUMG00000030427.9|OTTHUMT00000075315.5|APOL1-201|APOL1|3000|protein_coding| 51 6436835 6436785,6436835
6436836 >ENST00000422471.5|ENSG00000100342.21|OTTHUMG00000030427.9|OTTHUMT00000319151.1|APOL1-204|APOL1|561|nonsense_mediated_decay| 11 6436846 6436836,6436846
6436847 >ENST00000475519.5|ENSG00000100342.21|OTTHUMG00000030427.9|OTTHUMT00000319153.1|APOL1-212|APOL1|600|retained_intron| 11 6436857 6436847,6436857
6436858 >ENST00000438034.5|ENSG00000100342.21|OTTHUMG00000030427.9|OTTHUMT00000319152.2|APOL1-210|APOL1|566|protein_coding| 11 6436868 6436858,6436868
6436869 >ENST00000439680.5|ENSG00000100342.21|OTTHUMG00000030427.9|OTTHUMT00000319252.1|APOL1-211|APOL1|531|nonsense_mediated_decay| 10 6436878 6436869,6436878
6436879 >ENST00000427990.5|ENSG00000100342.21|OTTHUMG00000030427.9|OTTHUMT00000319154.2|APOL1-207|APOL1|624|protein_coding| 12 6436890 6436879,6436890
6436891 >ENST00000397278.8|ENSG00000100342.21|OTTHUMG00000030427.9|OTTHUMT00000319100.4|APOL1-202|APOL1|2795|protein_coding| 48 6436938 6436891,6436938
6436939 >ENST00000397279.8|ENSG00000100342.21|OTTHUMG00000030427.9|-|APOL1-203|APOL1|1564|protein_coding| 28 6436966 6436939,6436966
6436967 >ENST00000433768.5|ENSG00000100342.21|OTTHUMG00000030427.9|OTTHUMT00000319253.2|APOL1-209|APOL1|541|protein_coding| 11 6436977 6436967,6436977
6436978 >ENST00000431184.1|ENSG00000100342.21|OTTHUMG00000030427.9|OTTHUMT00000319254.1|APOL1-208|APOL1|550|nonsense_mediated_decay| 11 6436988 6436978,6436988
Using the information in input.txt I want to obtain information from a file called Other_File.fa. This file is an annotation file filled with ENST#'s (transcript IDs) and sequences of A's, T's, C's, and G's. I want to store the sequence in a file called Output.log (see example below), and I want to store the command used to retrieve the text in a file called Input.log (see example below).
So far I have tried to do this using awk and cut in a for loop. This is the code I have tried:
for line in `awk -F "\\t" 'NR != 1 {print substr($2,2,17)"#"$5}' input.txt`
do
transcript=`cut -d "#" -f 1 $line`
range=`cut -d "#" -f 2 $line` #Range is the string location in Other_File.fa
echo "Our transcript is ${transcript} and our range is ${range}" >> Input.log
sed -n '${range}' Other_File.fa >> Output.log
done
Here is an example of the 11 lines between ENST00000433768.5 and ENST00000431184.1 in Other_File.fa.
grep -A 11 ENST00000433768.5 Other_File.fa
>ENST00000433768.5|ENSG00000100342.21|OTTHUMG00000030427.9|OTTHUMT00000319253.2|APOL1-209|APOL1|541|protein_coding|
ATCCACACAGCTCAGAACAGCTGGATCTTGCTCAGTCTCTGCCAGGGGAAGATTCCTTGG
AGGAGCACACTGTCTCAACCCCTCTTTTCCTGCTCAAGGAGGAGGCCCTGCAGCGACATG
GAGGGAGCTGCTTTGCTGAGAGTCTCTGTCCTCTGCATCTGGATGAGTGCACTTTTCCTT
GGTGTGGGAGTGAGGGCAGAGGAAGCTGGAGCGAGGGTGCAACAAAACGTTCCAAGTGGG
ACAGATACTGGAGATCCTCAAAGTAAGCCCCTCGGTGACTGGGCTGCTGGCACCATGGAC
CCAGGCCCAGCTGGGTCCAGAGGTGACAGTGGAGAGCCGTGTACCCTGAGACCAGCCTGC
AGAGGACAGAGGCAACATGGAGGTGCCTCAAGGATCAGTGCTGAGGGTCCCGCCCCCATG
CCCCGTCGAAGAACCCCCTCCACTGCCCATCTGAGAGTGCCCAAGACCAGCAGGAGGAAT
CTCCTTTGCATGAGAGCAGTATCTTTATTGAGGATGCCATTAAGTATTTCAAGGAAAAAG
T
>ENST00000431184.1|ENSG00000100342.21|OTTHUMG00000030427.9|OTTHUMT00000319254.1|APOL1-208|APOL1|550|nonsense_mediated_decay|
The range value in input.txt for this transcript is 6436967,6436977. In my file Input.log for this transcript I hope to get
Our transcript is ENST00000433768.5 and our range is 6436967,6436977
And in Output.log for this transcript I hope to get
>ENST00000433768.5|ENSG00000100342.21|OTTHUMG00000030427.9|OTTHUMT00000319253.2|APOL1-209|APOL1|541|protein_coding|
ATCCACACAGCTCAGAACAGCTGGATCTTGCTCAGTCTCTGCCAGGGGAAGATTCCTTGG
AGGAGCACACTGTCTCAACCCCTCTTTTCCTGCTCAAGGAGGAGGCCCTGCAGCGACATG
GAGGGAGCTGCTTTGCTGAGAGTCTCTGTCCTCTGCATCTGGATGAGTGCACTTTTCCTT
GGTGTGGGAGTGAGGGCAGAGGAAGCTGGAGCGAGGGTGCAACAAAACGTTCCAAGTGGG
ACAGATACTGGAGATCCTCAAAGTAAGCCCCTCGGTGACTGGGCTGCTGGCACCATGGAC
CCAGGCCCAGCTGGGTCCAGAGGTGACAGTGGAGAGCCGTGTACCCTGAGACCAGCCTGC
AGAGGACAGAGGCAACATGGAGGTGCCTCAAGGATCAGTGCTGAGGGTCCCGCCCCCATG
CCCCGTCGAAGAACCCCCTCCACTGCCCATCTGAGAGTGCCCAAGACCAGCAGGAGGAAT
CTCCTTTGCATGAGAGCAGTATCTTTATTGAGGATGCCATTAAGTATTTCAAGGAAAAAG
T
But I am getting the following error, and I am unsure as to why or how to fix it.
cut: ENST00000433768.5#6436967,6436977: No such file or directory
cut: ENST00000433768.5#6436967,6436977: No such file or directory
Our transcript is and our range is
My thought was that each line from the awk would be read as a string, and then cut could split the string on the "#" symbol I added; but cut is treating each line as a file name and throwing an error when it can't locate that file in my directory.
Thanks.
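For what it's worth, the immediate error can be avoided without changing the overall structure: cut expects file names, but you can split the string in bash itself with parameter expansion (or feed it to cut with a here-string). A rough sketch of the corrected loop, untested against your real Other_File.fa and still assuming the range is meant to be a sed line range as in your original command:
while IFS= read -r line; do
    transcript=${line%%#*}    # everything before the first '#'
    range=${line#*#}          # everything after the first '#'
    echo "Our transcript is ${transcript} and our range is ${range}" >> Input.log
    sed -n "${range}p" Other_File.fa >> Output.log
done < <(awk -F '\t' 'NR != 1 {print substr($2,2,17)"#"$5}' input.txt)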
EDIT2: This is a generic solution which compares the two files (input.txt and other_file.fa) and prints whichever range is found, on whatever line it is found. E.g., if the range numbers are found on line 300 but the range says to print lines 1 to 20, it will work in that case too. Also note that this calls the system command, which in turn calls sed (like you were using the range within sed); there are other ways too, such as loading the whole Input_file into an array and then printing, but I am going with this one here. Fair warning: this is not tested with huge files.
awk -F'[>| ]' '
FNR==NR{
arr[$2]=$NF
next
}
($2 in arr){
split(arr[$2],lineNum,",")
print arr[$2]
start=lineNum[1]
end=lineNum[2]
print "sed -n \047" start","end"p \047 " FILENAME
system("sed -n \047" start","end"p\047 " FILENAME)
start=end=0
}
' file1 FS="[>|]" other_file.fa
EDIT: With OP's edited samples, please try the following to print lines based on the other file. It assumes that the lines to print always come after the line on which the range values are found (e.g., the range values are found on the 3rd line and the range is lines 4 to 10).
awk -F'[>| ]' '
FNR==NR{
arr[$2]=$NF
next
}
($2 in arr){
split(arr[$2],lineNum," ")
start=lineNum[1]
end=lineNum[2]
}
FNR>=start && FNR<=end{
print
if(FNR==end){
start=end=0
}
}
' file1 FS="[>|]" other_file.fa
You need not do this with a for loop and then call the awk program once for each line. This can be done in a single awk, considering that you only have to print them. Written and tested with your shown samples.
awk -F'[>| ]' 'FNR>1{print "Our transcript is:"$3" and our range is:"$NF}' Input_file
NOTE: This will print the transcript and range values for each line of your Input_file; in case you want to perform some further operation with those values, please do mention it.
I have a simple question, but I'm not that good with bash. I'm using a command line to get the number of queries and cached queries my Pi-Hole makes to Unbound (recursive DNS), and I want to display the cached queries as a % of total queries. Here is the line to get the total queries:
sudo unbound-control stats_noreset | awk -F '=' '/total.num.queries/ {print $NF}'
This gives me, for example, 1500, and I want to divide the number from the following line by it:
sudo unbound-control stats_noreset | awk -F '=' '/total.num.cachehits/ {print $NF}'
This gives me, for example, 500, and I want to display it as:
500 33.3%
with one line of code only, not with variables.
Thanks a lot! I've been trying for days.
Edit: as asked, the full output of sudo unbound-control stats_noreset is:
sudo unbound-control stats_noreset
thread0.num.queries=1294
thread0.num.queries_ip_ratelimited=0
thread0.num.cachehits=327
thread0.num.cachemiss=967
thread0.num.prefetch=134
thread0.num.zero_ttl=0
thread0.num.recursivereplies=967
thread0.requestlist.avg=0.334242
thread0.requestlist.max=9
thread0.requestlist.overwritten=0
thread0.requestlist.exceeded=0
thread0.requestlist.current.all=0
thread0.requestlist.current.user=0
thread0.recursion.time.avg=0.080698
thread0.recursion.time.median=0.0325689
thread0.tcpusage=0
thread1.num.queries=1309
thread1.num.queries_ip_ratelimited=0
thread1.num.cachehits=342
thread1.num.cachemiss=967
thread1.num.prefetch=132
thread1.num.zero_ttl=0
thread1.num.recursivereplies=967
thread1.requestlist.avg=0.374886
thread1.requestlist.max=9
thread1.requestlist.overwritten=0
thread1.requestlist.exceeded=0
thread1.requestlist.current.all=0
thread1.requestlist.current.user=0
thread1.recursion.time.avg=0.075309
thread1.recursion.time.median=0.0322503
thread1.tcpusage=0
thread2.num.queries=1338
thread2.num.queries_ip_ratelimited=0
thread2.num.cachehits=336
thread2.num.cachemiss=1002
thread2.num.prefetch=156
thread2.num.zero_ttl=0
thread2.num.recursivereplies=1002
thread2.requestlist.avg=0.360104
thread2.requestlist.max=9
thread2.requestlist.overwritten=0
thread2.requestlist.exceeded=0
thread2.requestlist.current.all=0
thread2.requestlist.current.user=0
thread2.recursion.time.avg=0.073632
thread2.recursion.time.median=0.031425
thread2.tcpusage=0
thread3.num.queries=1258
thread3.num.queries_ip_ratelimited=0
thread3.num.cachehits=339
thread3.num.cachemiss=919
thread3.num.prefetch=127
thread3.num.zero_ttl=0
thread3.num.recursivereplies=919
thread3.requestlist.avg=0.315488
thread3.requestlist.max=9
thread3.requestlist.overwritten=0
thread3.requestlist.exceeded=0
thread3.requestlist.current.all=0
thread3.requestlist.current.user=0
thread3.recursion.time.avg=0.073834
thread3.recursion.time.median=0.0308651
thread3.tcpusage=0
total.num.queries=5199
total.num.queries_ip_ratelimited=0
total.num.cachehits=1344
total.num.cachemiss=3855
total.num.prefetch=549
total.num.zero_ttl=0
total.num.recursivereplies=3855
total.requestlist.avg=0.34673
total.requestlist.max=9
total.requestlist.overwritten=0
total.requestlist.exceeded=0
total.requestlist.current.all=0
total.requestlist.current.user=0
total.recursion.time.avg=0.075873
total.recursion.time.median=0.0317773
total.tcpusage=0
time.now=1613041718.040611
time.up=14305.501526
time.elapsed=14305.501526
thread0, thread1, etc. are the cores, but my interest is only in the total.
I assume when you say without variables, you mean without variables in the shell.
With this in mind, you can use awk variables to store the intermediate result:
sudo unbound-control stats_noreset | awk -F '=' '$1 == "total.num.queries" {queries=$NF} $1 == "total.num.cachehits" {hits=$NF}END{print hits, hits/queries*100"%"}'
or in a more readable multi-line format:
sudo unbound-control stats_noreset |
awk -F '=' '$1 == "total.num.queries" { queries = $NF }
$1 == "total.num.cachehits" { hits = $NF }
END { print hits, hits / queries * 100 "%" }'
The output is:
1344 25.8511%
If you need only a single decimal place in the output, you can use printf, like
END { printf "%d %.1f%%\n", hits, hits / queries * 100 }
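Putting it together, the complete one-liner with one decimal place would look roughly like this (same field layout as above; untested against your live Unbound instance):
sudo unbound-control stats_noreset |
awk -F '=' '$1 == "total.num.queries"   { queries = $NF }
            $1 == "total.num.cachehits" { hits    = $NF }
            END { printf "%d %.1f%%\n", hits, hits / queries * 100 }'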
I have a log file that contains some lines I need to grab:
Jul 2 06:42:00 myhostname error proc[12345]: 01310001:3: event code xxxx Slow transactions attack detected - account id: (20), number of dropped slow transactions: (3)
Jul 2 06:51:00 myhostname error proc[12345]: 01310001:3: event code xxxx Slow transactions attack detected - account id: (20), number of dropped slow transactions: (2)
The account id (xx) maps to the name of an object, which I am able to gather through a mysql query.
The following command (which is for sure not optimized at all, but working) gives me the number of matching lines per account id:
grep "Slow transactions" logfile| awk '{print $18}' | awk -F '[^0-9]+' '{OFS=" ";for(i=1; i<=NF; i++) if ($i != "") print($i)}' | sort | uniq -c
14 20
The output (14 20) means the account id 20 was observed 14 times (14 lines in the logfile).
Then I also have the number of dropped slow transactions: (2) part.
This gives the real number of dropped transactions that was logged. In other words, a log entry could mean 1 or more dropped transactions.
I do have a small command to count the number of dropped transactions:
grep "Slow transactions" logfile | awk '{print $24}' | sed 's/(//g' | sed 's/)//g' | awk '{s+=$1} END {print s}'
73
That means 73 transactions were dropped.
These two work, but when it comes to merging them I am stuck. I really don't see how to combine them; I am pretty sure awk can do it (and probably in a better way than I did), but I would appreciate it if any expert from the community could give me some guidance.
update
Since the above was too easy for some of our awk experts on SO, I introduce an optional feature :)
As previously mentioned, I can convert an account ID into a name by issuing a mysql query. So the idea is now to include the ID => name conversion in the awk command.
The MySQL query looks like this (XX being the account ID):
mysql -Bs -u root -p$(perl -MF5::GenUtils -e "print get_mysql_password.qq{\n}") -e "SELECT name FROM myTABLE where account_id= 'XX'"
I found the post below, which deals with passing command output into awk, but I'm facing syntax errors...
How can I pass variables from awk to a shell command?
This uses parentheses as the field separator, so it's easier to grab the account number and the number of dropped slow transactions.
awk -F '[()]' '
/Slow transactions/ {
acct[$2]++
dropped[$2] += $4
}
END {
PROCINFO["sorted_in"] = "#ind_num_asc" # https://www.gnu.org/software/gawk/manual/html_node/Controlling-Scanning.html
for (acctnum in acct)
print acctnum, acct[acctnum], dropped[acctnum]
}
' logfile
Given your sample input, this outputs
20 2 5
GNU awk is required for the "sorted_in" method of controlling array traversal order by index.
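For the optional ID => name feature, one way is to keep the awk summary as is and resolve each account id afterwards in the shell, reusing the mysql one-liner from the question. A rough sketch (the query and the perl password helper are taken verbatim from your post, so adjust to your environment; the ordering of the ids is not preserved here):
awk -F '[()]' '
  /Slow transactions/ { acct[$2]++; dropped[$2] += $4 }
  END { for (a in acct) print a, acct[a], dropped[a] }
' logfile |
while read -r id count drops; do
    name=$(mysql -Bs -u root -p$(perl -MF5::GenUtils -e "print get_mysql_password.qq{\n}") \
           -e "SELECT name FROM myTABLE where account_id= '$id'")
    echo "$name $count $drops"
done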
I am writing a script in BASH that needs to check through log files for ERROR entries. I plan to run this hourly as a cron job, so I only want it to return ERROR-type entries that occurred within the last hour (all server times are GMT). I establish the following variables:
# Log file directory
LOGPATH="/path/to/logs/"
# Current date and time
CURDATE=`date +%Y-%m-%d`
CURTIME=`date +%H:%M:%S`
# Old date and time
OLDDATE=`date +%Y-%m-%d -d "1 hour ago"`
OLDTIME=`date +%H:%M:%S -d "1 hour ago"`
All log files adhere to the file name format ktYEAR-MONTH-DAY.root.log.txt, where YEAR/MONTH/DAY are replaced with the date on which the entries are recorded. So, for instance, today's log file would be kt2011-08-15.root.log.txt. An example entry of the contents is:
2011-08-15 | 19:30:02 | ERROR | 18333 | 337 | n/a | dms | default | error | XMLRPC Lucene - addDocument - Reason: Failed to parse XML-RPC request: An invalid XML character (Unicode: 0xb) was found in the element content of the document.
The columns of interest are the 1st, 2nd, and 3rd (the value may be "INFO", "DEBUG", etc., but I am only interested when the value is "ERROR"), and the last column, which is the body of the log message.
What I am trying to accomplish is having this BASH script parse through the file(s) that have entries spanning the last hour of activity (as defined in the 1st and 2nd columns), and if the 3rd column contains the string "ERROR", then display the right-most column's contents. My confusion comes when trying to determine how to parse through the log file(s) based on $CURTIME and $OLDTIME, made worse when midnight comes and I then have to search through the previous day's log file. I would prefer not to do a blanket grep-style search through all the log files, as the quantity and size can be excessive, but if that's how it has to be done, then so be it.
awk -F ' \\| ' -v "d=$(date -d "1 hour ago" -u +%Y-%m-%d#%H:%M:%S)" '$3 == "ERROR" && $1"#"$2 > d'
This is as simple as doing string comparison in awk. When you pass midnight, simply add the $OLDDATE file to the search:
if [ "$CURDATE" != "$OLDDATE" ]; then
cat "kt$OLDDATE.root.log.txt" "kt$CURDATE.root.log.txt"
else
cat "kt$CURDATE.root.log.txt"
fi | awk -F "|" -v olddate=$OLDDATE -v oldtime=$OLDTIME -v curdate=$CURDATE 'BEGIN{olddate=olddate " "; curdate = curdate " "; oldtime = " " oldtime " "}
$1 == olddate && $2 >= oldtime && $3 == " ERROR "{print $0}
$1 > olddate && $3 == " ERROR "{print $0}'
Can be combined with glenn's solution to be much shorter.
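For reference, a combined sketch along those lines, using the $LOGPATH/$CURDATE/$OLDDATE variables from the question and glenn's timestamp comparison, and printing only the message column as requested (untested; it assumes the log timestamps really are GMT, hence date -u):
files=( "${LOGPATH}kt${CURDATE}.root.log.txt" )
if [ "$CURDATE" != "$OLDDATE" ]; then
    files+=( "${LOGPATH}kt${OLDDATE}.root.log.txt" )
fi
awk -F ' \\| ' -v "d=$(date -u -d '1 hour ago' +%Y-%m-%d#%H:%M:%S)" \
    '$3 == "ERROR" && $1"#"$2 > d {print $NF}' "${files[@]}"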