extract unique value from a log4j log file - bash

I'm having trouble extracting only a matching string, OPR*, from a log4j file.
I can get this value from two different sources inside my log file:
2012-01-26 03:06:45,428 INFO [NP_OSS] OSSBSSGWIMPL6000|**OPR20120126120537008893**|GenServiceDeactivationResponse :: processRequestGenServiceDeactivationResponse() ::
or:
2012-01-26 03:06:45,411 INFO [NP_OSS] MESSAGE_DATA = <?xml version="1.0" encoding="UTF-8" standalone="yes"?><ns2:ServiceDeactivationResponse xmlns:ns2="urn:ngn:foo"><MessageHeader><MessageTimeStamp>20120126031123</MessageTimeStamp>**<OperatorTrxID>OPR20120126120537008893</OperatorTrxID>**</MessageHeader></ns2:ServiceDeactivationResponse>
I need to extract only the value OPR*
I'm guessing it's much easier to extract it from the first one, since it doesn't involve parsing XML.
Thanks a lot in advance for your help!

Maybe I didn't understand the OP's question well, but why can't a simple grep command do the job?
like
grep -Po 'OPR\d+'
The output for both lines is the same:
OPR20120126120537008893
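Note that -P (PCRE) is a GNU grep extension. If your grep lacks it, plain -o with a POSIX character class gives the same result here (a sketch):
grep -o 'OPR[0-9]*' logfile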

$ echo "$line" | grep OPR | sed -e "s/^.*OPR\([0-9]*\).*$/\1/"
Edit:
After reading your comment:
$ echo "$line" | grep OPR | sed -e "s/^.*\(OPR[0-9]*\).*$/\1/" | head -1
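If you're scanning the whole file rather than a single $line, the grep (and head) can be dropped by letting sed print only on a successful substitution (a sketch; prints one match per matching line):
sed -n 's/^.*\(OPR[0-9]*\).*$/\1/p' logfile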

awk: Setting up Field Separators
awk -v FS="[<>]" '{print $13}' logfile
perl: Using positive lookbehind and lookahead
perl -pne 's/.*(?<=\<OperatorTrxID\>)([A-Z0-9]+)(?=\<\/OperatorTrxID\>).*/$1/' logfile
Test:
[jaypal:~/Temp] cat logfile
2012-01-26 03:06:45,411 INFO [NP_OSS] MESSAGE_DATA = <?xml version="1.0" encoding="UTF-8" standalone="yes"?><ns2:ServiceDeactivationResponse xmlns:ns2="urn:ngn:foo"><MessageHeader><MessageTimeStamp>20120126031123</MessageTimeStamp><OperatorTrxID>OPR20120126120537008893</OperatorTrxID></MessageHeader></ns2:ServiceDeactivationResponse>
[jaypal:~/Temp] awk -v FS="[<>]" '{print $13}' logfile
OPR20120126120537008893
[jaypal:~/Temp] perl -pne 's/.*(?<=\<OperatorTrxID\>)([A-Z0-9]+)(?=\<\/OperatorTrxID\>).*/$1/' logfile
OPR20120126120537008893
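One caveat: {print $13} counts on the tags always landing in the same position. A hedged variant that locates the value by tag name rather than by field number (same one-line layout assumed):
awk -v FS="[<>]" '{ for (i = 1; i <= NF; i++) if ($i == "OperatorTrxID") print $(i + 1) }' logfile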

Related

Fetch value from xml element which is enclosed with CDATA through awk

I am using an awk command to fetch values from XML elements, using the command below:
$(awk -F "[><]" '/'$tag_name'/{print $3}' $FILE_NAME | sort | uniq)
Here:
FILE_NAME: the XML file.
tag_name: the name of the XML element whose value we need.
Sample XML
<item>
<tag1>test</tag1>
<tag2><![CDATA[cdata_test]]></tag2>
</item>
One of the tags in the XML contains CDATA, and for that one the script is not working as expected: when I try to print the value, it prints blank.
Instead of using a general-purpose tool like awk, which is not aware of XML specificities, I suggest you use xmlstarlet to select the nodes you want. For instance:
xmlstarlet select -t -v '//tag1' -n input.xml
will give as result:
test
Issuing:
xmlstarlet select -t -v '//tag2' -n input.xml
gives as output:
cdata_test
If you don't need the newline at the end of the returned string, just remove the -n from the options of the xmlstarlet command.
Keep it simple.
As xmlstarlet is not installed on my machine, I used sed prior to my awk command as follows, and that works for me (note the [ characters of the CDATA marker must be escaped in the sed regex):
$(sed -e 's/<!\[CDATA\[//g; s/]]>//g' ${FILE_NAME} | awk -F "[><]" '/'$tag_name'/{print $3}' | sort | uniq)
Also, if anybody has any other solution, that is welcome too.
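For reference, the sed pre-pass can also be folded into the awk itself by stripping the CDATA markers before the fields are used (a sketch; the tag name is passed in via -v, and your sort | uniq goes on the end as before):
awk -F "[><]" -v tag="tag2" '
  { gsub(/<!\[CDATA\[|\]\]>/, "") }   # strip CDATA markers; modifying $0 re-splits the fields
  $0 ~ "<" tag ">" { print $3 }       # the value is the 3rd [><]-delimited field
' "$FILE_NAME" | sort | uniq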

grep serial numbers not starting with specific prefix

I have this file (serials.txt) containing serial numbers:
S/N:175-1915011190
S/N:244-1920023447
S/N:335-1920101144
S/N:244-1920101149
Using grep or a similar tool, I want to select all serials NOT starting with '244'.
I'm able to select all the '244' with grep -Eo '244-[0-9]*' serials.txt but I want the opposite.
Something like grep -Eo '(^244)-[0-9]*' serials.txt
The output should be (without S/N:)
175-1915011190
335-1920101144
The following awk may help:
awk '!/S\/N:244/' Input_file
EDIT: The above code outputs the complete line. If you need only the part from the serial number to the end of the line, the following may help:
awk -F':' '!/S\/N:244/{print $2}' Input_file
EDIT2: Adding a sed solution here too:
sed -n '/:244/d;s/.*://;p' Input_file
The -v option on grep would be helpful here, and then cut to remove the leading cruft:
grep -v ':244-' serials.txt | cut -c5-
Here you go, without S/N:
grep -v ':244' serials.txt | cut -d':' -f2
Inverted grep for :244, then cut with delimiter : to show field 2.
awk -F':' '$2!~/^244/{print $2}' file
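And if your grep supports PCRE, a negative lookahead plus \K does it in a single command, with the S/N: prefix already removed (a sketch; GNU grep assumed):
grep -Po 'S/N:\K(?!244)[0-9-]+' serials.txt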

Parse file by splitting string in file and get desired output using single command

I'm using bash to look into a file and parse the results. Can someone tell me how to use cut/awk to split the string and get the desired output with a single command? I can get there with individual cut commands and concatenation (two commands, shown below), but I want to do it with a single command instead.
test.log:
1/98 | (PASSED) com.yahoo.qa.java.projects.stackoverview.questions.Password_01() | 21:20:20
Tried code:
str1=`cat test.log | tail -1 | cut -d '|' -f 1`
str2=`cat test.log | tail -1 | cut -d '|' -f 2 | sed -e 's/com.yahoo.qa.java.projects./''/g'`
str3="${str1} | ${str2}"
Expected:
1/98 | (PASSED) stackoverview.questions.Password_01
Since this is a simple substitution on an individual line it's better suited to sed than awk and not at all appropriate for cut:
$ sed 's/\(.*| [^ ]* \)com\.yahoo\.qa\.java\.projects\.\([^(]*\).*/\1\2/' file
1/98 | (PASSED) stackoverview.questions.Password_01
The following single awk may help:
awk 'END{sub(/com\.yahoo\.qa\.java\.projects\./,"",$4);print $1,$2,$3,$4}' Input_file
OR, for all kinds of awk, the following may help too (as per Ed Morton's suggestions):
awk '{value=$0} END{split(value, a, " "); sub(/com\.yahoo\.qa\.java\.projects\./, "", a[4]); print a[1], a[2], a[3], a[4]}' Input_file
Using awk
$ awk -F "com[.]yahoo[.]qa[.]java[.]projects[.]" 'sub(/\(\).*/,"",$2)' file
1/98 | (PASSED) stackoverview.questions.Password_01
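If external tools are to be avoided entirely, plain bash parameter expansion can also get there (a sketch, assuming the last line of test.log is the one of interest):
line=$(tail -n 1 test.log)
line=${line/com.yahoo.qa.java.projects./}   # drop the package prefix
echo "${line%%()*}"                         # drop everything from "()" onward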

Extract field from xml file

xml file:
<head>
<head2>
<dict type="abc" file="/path/to/file1"></dict>
<dict type="xyz" file="/path/to/file2"></dict>
</head2>
</head>
I need to extract the list of files from this. So the output would be
/path/to/file1
/path/to/file2
So far, I've managed to come up with the following.
grep "<dict*file=" /path/to/xml.file | awk '{print $3}' | awk -F= '{print $NF}'
Quick and dirty, based on your sample, not on all XML possibilities:
# sed, a bit safer
sed -e '/<head>/,/<\/head>/!d' -e '/.*[[:blank:]]file="\([^"]*\)".*/!d' -e 's//\1/' YourFile
# sed, brute force
sed -n 's/.*[[:blank:]]file="\([^"]*\)".*/\1/p' YourFile
# awk, quick and unsafe, based on your sample
awk -F 'file="|">' '/<head>/{h=1} /\/head>/{h=0} h && /[[:blank:]]file/ { print $2 }' YourFile
Now, I don't recommend this kind of extraction on XML unless you really know your source's format and content; extra fields, escaped quotes, and tag-like strings inside content are a big cause of failure and unexpected results. Do it only when no more appropriate tools are available.
Now, to use your own script:
#grep "<dict*file=" /path/to/xml.file | awk '{print $3}' | awk -F= '{print $NF}'
awk '! /<dict.*file=/ {next} {$0=$3;FS="\"";$0=$0;print $2;FS=OFS}' YourFile
There's no need for a grep with awk; use the starting pattern filter /<dict.*file=/ instead.
The second awk (for using a different separator) can be done inside the same script by changing FS, but because a new FS only takes effect at the next field-splitting (the next line, by default), you can force a re-evaluation of the current content with $0=$0, as done here.
Use an xmllint solution with --xpath as //head/head2/dict/@file:
xmllint --xpath "//head/head2/dict/@file" input-xml | awk 'BEGIN{FS="file="}{printf "%s\n%s\n", gensub(/"/,"","g",$2), gensub(/"/,"","g",$3)}'
/path/to/file1
/path/to/file2
Unfortunately I couldn't provide pure xmllint logic, because I thought applying
xmllint --xpath "string(//head/head2/dict/@file)" input-xml
would return the file attributes from both nodes, but it was returning only the first instance.
So I coupled my logic with GNU awk to extract the required values. Doing
xmllint --xpath "//head/head2/dict/@file" input-xml
returns values as
file="/path/to/file1" file="/path/to/file2"
On the above output, setting the field separator to file= and removing the double quotes using the gensub() function solved the requirement.
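If a pure xmllint run is still wanted, one workaround for string() returning only the first node is to count the matches and index them one by one (a sketch):
n=$(xmllint --xpath 'count(//head/head2/dict/@file)' input-xml)
for i in $(seq 1 "$n"); do
    xmllint --xpath "string(//head/head2/dict[$i]/@file)" input-xml
    echo
done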
Also a PE [perl everywhere :)] solution:
perl -MXML::LibXML -E 'say $_->to_literal for XML::LibXML->load_xml(location=>q{file.xml})->findnodes(q{/head/head2/dict/@file})'
it prints
/path/to/file1
/path/to/file2
For the above you need to have the XML::LibXML module installed.
With xmlstarlet it would be:
xmlstarlet sel -t -v "//head/head2/dict/@file" -n input.xml
This command:
awk -F'[=" ">]' '{print $12}' file
will produce:
/path/to/file1
/path/to/file2

Printing a substring from log

I have a log line of the following format:
2016-08-04 19:12:02,537 INFO ...<Thread-4> - Got a message [......|clientTradeId=xxxxxxx|timeInForce=xxxx|.....TradeResponseMessage]
I need to extract all lines with the 'Got a message' key phrase, and then print out just the 'clientTradeId=xxxxxxx' part of the resulting shortlist.
How do I achieve this with scripting (grep and cut? or is there a better option)?
Assuming the data is in the file data.log:
grep -F "Got a message" data.log | grep -Po "clientTradeId=[^| ]+"
Using cut:
grep -F "Got a message" data.log | cut -f2 -d'|'
UPDATED COMMAND, thanks @BenjaminW:
sed -rn 's/.*(clientTradeId=[0-9]*).*/\1/p' file
I haven't used sed much, but I have used regex. Looking at the documentation for sed:
sed -En 's/.*(clientTradeId=[0-9]*).*/\1/p' file
What this does is run sed over the file and, using the regex, replace each matching line with just the part you wanted, then print it (I hope).
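For completeness, the filtering and the extraction also fit in one sed call, using BRE syntax so no -r/-E is needed (a sketch):
sed -n '/Got a message/ s/.*\(clientTradeId=[^|]*\).*/\1/p' data.log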
