How to extract values from XML in the body using XPath in Camel

I have an HTTP response set as my exchange body, and I need to extract some values from that XML. The best approach seems to be camel-xpath. The value I need is at the root-tag level: in the XML below, for example, I want to extract the value of attribute1.
<rootTag attribute1="value1">
<child1/>
</rootTag>
I saw some examples that use namespaces, but I don't think namespaces apply here. If they do, how would I use them? Can't I extract the value directly from the body of the exchange?

You can extract your attribute to the message header:
.setHeader("MyHeader").xpath("/rootTag/@attribute1", String.class)
or put the attribute into the body:
.setBody().xpath("/rootTag/@attribute1", String.class)
You do not need namespaces here.
And @Gilles Quenot is certainly right about the XPath expression.

This is the Xpath you need :
string(/rootTag/@attribute1)
Tests in a shell:
xmlstarlet:
$ xmlstarlet sel -t -v '/rootTag/@attribute1' file
xmllint:
$ xmllint --xpath 'string(/rootTag/@attribute1)' file
xidel:
$ xidel -se '/rootTag/@attribute1' file
Output:
value1
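To check the expression outside a Camel route, the same XPath can be evaluated with the JDK's built-in javax.xml.xpath API. This is a minimal sketch, not Camel's own implementation, and the class and method names are hypothetical. It also shows that a document declaring no namespace needs no NamespaceContext, and that string(...) and the bare attribute path give the same text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class RootAttributeXPath {

    // Sample document from the question.
    static final String SAMPLE = "<rootTag attribute1=\"value1\">\n<child1/>\n</rootTag>";

    // Parses the XML string and returns the XPath result converted to a string.
    static String extract(String xml, String expression) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // The document declares no namespace, so unprefixed names such as
        // rootTag match without registering a NamespaceContext.
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // evaluate(String, Object) converts the result to a string, the same
        // conversion the XPath string() function applies.
        return xpath.evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(extract(SAMPLE, "string(/rootTag/@attribute1)")); // value1
        System.out.println(extract(SAMPLE, "/rootTag/@attribute1"));         // value1
        // A missing attribute gives an empty string, not an exception.
        System.out.println("[" + extract(SAMPLE, "string(/rootTag/@missing)") + "]"); // []
    }
}
```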

Related

How to convert xml response into json response with bash

I run a curl command and the response is in XML format. Is there a way to convert the response to JSON? I need to extract the value of one of the tags.
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE SiteConfidenceApi PUBLIC "SiteConfidence Api Version current" "url">
<SiteConfidenceApi Version="current">
<Request></Request>
<Response Status="Ok" Code="200" Message="Success.">
<ApiKey Lifetime="3600">value</ApiKey>
</Response>
</SiteConfidenceApi>
I assume you want to extract "value" from the element "ApiKey". This can be done with recent versions of xmllint (depending on your Linux distribution, it ships in libxml2-utils or in libxml2 itself).
XML=$(curl "$URL")
echo "$XML" | xmllint --xpath '//ApiKey/text()' -
Instead of //ApiKey/text() you could use an even more specific XPath expression such as /SiteConfidenceApi/Response/ApiKey/text().
If you saved the curl output to a file, pass the filename to xmllint instead of "-" (which reads from stdin).
--xpath was added to libxml2 with version 2.7.7 in 2010.
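If the response is handled in Java rather than in bash, the same XPath works with the JDK's javax.xml.xpath API, and no JSON conversion is needed. A minimal sketch, assuming the response body is already in a String; the class name is hypothetical. Because the DOCTYPE names an external DTD ("url"), the sketch installs an EntityResolver that returns an empty document so the parser does not try to fetch it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ApiKeyXPath {

    // Response body from the question, including its DOCTYPE.
    static final String SAMPLE =
            "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n"
            + "<!DOCTYPE SiteConfidenceApi PUBLIC \"SiteConfidence Api Version current\" \"url\">\n"
            + "<SiteConfidenceApi Version=\"current\">\n"
            + "<Request></Request>\n"
            + "<Response Status=\"Ok\" Code=\"200\" Message=\"Success.\">\n"
            + "<ApiKey Lifetime=\"3600\">value</ApiKey>\n"
            + "</Response>\n"
            + "</SiteConfidenceApi>";

    static String evaluate(String xml, String expression) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Resolve every external entity (here, the DTD "url") to an empty
        // document so parsing does not depend on fetching it.
        builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(evaluate(SAMPLE, "/SiteConfidenceApi/Response/ApiKey/text()")); // value
        System.out.println(evaluate(SAMPLE, "string(//ApiKey/@Lifetime)"));               // 3600
    }
}
```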

xmllint to parse an HTML file

I'm trying to extract the text between specific tags in various HTML files on a Mac. I'm looking for the first <H1> heading in the body. Example:
<BODY>
<H1>Dublin</H1>
I believe using regular expressions for this is an anti-pattern, so I used xmllint and XPath instead.
xmllint --nowarning --xpath '/HTML/BODY/H1[0]'
The problem is that some of the HTML files contain badly formed tags, so I get errors along the lines of
parser error : Opening and ending tag mismatch: UL line 261 and LI
</LI>
I can't just redirect with 2>/dev/null, because then I lose those files altogether. Is there any way to use an XPath expression here and tell the parser to relax if the XML isn't perfect, and just give me the value of the first H1 heading?
Try the --html option. Without it, xmllint parses your document as XML, which is a lot stricter than HTML. Also note that XPath indices are 1-based and that HTML tag names are converted to lowercase when parsing. The command
xmllint --html --xpath '/html/body/h1[1]' - <<EOF
<BODY>
<H1>Dublin</H1>
EOF
prints
<h1>Dublin</h1>
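The 1-based indexing point can be checked with the JDK's javax.xml.xpath API. Note that the JDK's DocumentBuilder is a strict XML parser, so this sketch is no substitute for --html on malformed files; it only uses a well-formed, lowercase stand-in document. The second heading ("Cork") and the class name are hypothetical, added only to make the index visible.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathIndexDemo {

    // Well-formed, lowercase stand-in for the HTML in the question.
    static final String SAMPLE = "<html><body><h1>Dublin</h1><h1>Cork</h1></body></html>";

    static String evaluate(String xml, String expression) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        // Positions start at 1: [1] is the first h1.
        System.out.println(evaluate(SAMPLE, "/html/body/h1[1]")); // Dublin
        // [0] matches no node at all, so the string result is empty.
        System.out.println("[" + evaluate(SAMPLE, "/html/body/h1[0]") + "]"); // []
        System.out.println(evaluate(SAMPLE, "/html/body/h1[2]")); // Cork
    }
}
```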

Bash: Retrieve value inside an element tag

I'm stuck developing within a bash script (policies...).
I'm having a hard time finding a way to retrieve a value stored in an attribute of the XML element itself. I've tried multiple ways and would appreciate any suggestions.
Bash only (otherwise I could solve it myself).
Example:
<Timestamp q="2016-09-26T10:03:53Z"/>
Do not confuse this with
<Timestamp>
2016-09-26T10:03:53Z
</Timestamp>
Cheers.
Use an XML aware tool like xmllint:
xmllint --xpath 'string(//Timestamp/@q)' file.xml
Solved it by:
grep -o 'Timestamp .............'
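The two forms in the question differ only in where the timestamp lives: an attribute (@q) versus the element's text content. A minimal sketch using the JDK's javax.xml.xpath API (class name hypothetical) evaluates both expressions against both forms, showing that each expression only finds the value in its own form.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class TimestampXPath {

    // The two forms shown in the question.
    static final String ATTRIBUTE_FORM = "<Timestamp q=\"2016-09-26T10:03:53Z\"/>";
    static final String ELEMENT_FORM = "<Timestamp>\n2016-09-26T10:03:53Z\n</Timestamp>";

    static String evaluate(String xml, String expression) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        // Attribute value: select the attribute node with @.
        System.out.println(evaluate(ATTRIBUTE_FORM, "string(//Timestamp/@q)"));
        // Text content: normalize-space() trims the surrounding newlines.
        System.out.println(evaluate(ELEMENT_FORM, "normalize-space(//Timestamp)"));
        // Each expression finds nothing in the other form.
        System.out.println("[" + evaluate(ATTRIBUTE_FORM, "normalize-space(//Timestamp)") + "]"); // []
        System.out.println("[" + evaluate(ELEMENT_FORM, "string(//Timestamp/@q)") + "]");         // []
    }
}
```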

Search in a webpage using bash

I am trying to retrieve a webpage, search it for a pattern, extract that value and do some calculations with it. My problem is that I can't figure out how to search for the pattern in a given string.
Let's say I retrieve a page like this:
content=$(curl -L http://google.com)
Now I want to search for the value I'm interested in, which sits inside an HTML tag:
<div class="digits">123,456,789</div>
I tried to find it using sed. My attempt looked like this:
n=$(echo "$content"|sed '<div class=\"digits\">(\\d\\d,\\d\\d\\d,\\d\\d\\d)</div>')
I want to pull that value every 10 minutes or so, save it, and estimate when 124,xxx,xxx will be reached.
I don't really know how to save those values yet, but I think I can figure that out on my own. I'm more interested in how to retrieve that substring, since I always get an error because of the "<".
I hope someone is able and willing to help me :)
Better to use a proper parser with XPath:
xmllint --html --xpath '//*[@class="digits"]' http://domain.tld/
But it seems that the example URL you gave in the comments doesn't contain this class name. You can check this by first running:
curl -Ls url | grep -oP '<div\s+class="digits">\K[^<]+'
It's best to use a proper parser as @sputnick suggested.
Or you can try something like this:
curl -L url | perl -ne 'print "$1\n" if /<div class="digits">([\d,]+)<\/div>/'
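Once the value is extracted, the thousands separators have to go before any arithmetic. The sketch below uses the JDK's javax.xml.xpath API on an assumed well-formed XHTML-like snippet (a real page would still need xmllint --html or an HTML parser), and treats 124,000,000 as a hypothetical target; the class name is also hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DigitsXPath {

    // Well-formed stand-in for the page fragment in the question.
    static final String SAMPLE = "<html><body><div class=\"digits\">123,456,789</div></body></html>";

    // Finds the element with class="digits" and parses its text as a number.
    static long readDigits(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        String text = XPathFactory.newInstance().newXPath()
                .evaluate("normalize-space(//*[@class='digits'])", doc);
        // Strip the thousands separators before parsing.
        return Long.parseLong(text.replace(",", ""));
    }

    public static void main(String[] args) throws Exception {
        long current = readDigits(SAMPLE);
        long target = 124_000_000L; // hypothetical target value
        System.out.println(current);          // 123456789
        System.out.println(target - current); // 543211
    }
}
```

Storing the parsed long together with a timestamp at each poll would then give the rate needed for the estimate.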

Batch to read partial xml file

I'm trying to make a batch file to pull data out of a file and set it as a variable.
The tricky part is that I need to read an XML file, and I only need the data between the quotes of the following line...
narrative="I only need this text here"
The text in that line can also contain spaces, brackets, slashes, dashes and colons.
SAMPLE XML FILE :
<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<cadcall>
<call callnumber="123456" jurisdiction="abcd" department="dept 1" complaint="cost" priority="M" calltencode="" callername="Persons Name" phonenumber="Cell Number HERE" narrative="[10/02/2012 14:56:27 : pos9 : PERSON] Fairly long narrative here describing issue of dispatch, but sometimes these can be short." alarmtype="" ocanumber="0000000000" disposition1="TRAN" />
</cadcall>
The proper tool for this is xmllint from libxml2. Please provide a more complete XML example and I will show you how to use an XPath query on your XML.
EDIT :
Here is a solution using XPath:
xmllint --xpath 'string(/cadcall/call/@narrative)' file.xml
Without seeing the complete input, and based only on your example line, grep works for you:
kent$ echo 'narrative="I only need this text here"'|grep -Po '(?<=narrative=")[^"]*'
I only need this text here
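For comparison, both approaches can be reproduced in Java: the XPath query via javax.xml.xpath and the lookbehind regex via java.util.regex. A minimal sketch on a trimmed copy of the sample (only three attributes kept); the class name is hypothetical.

```java
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NarrativeExtractor {

    // Trimmed copy of the sample file from the question.
    static final String SAMPLE =
            "<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"yes\" ?>\n"
            + "<cadcall>\n"
            + "<call callnumber=\"123456\" narrative=\"[10/02/2012 14:56:27 : pos9 : PERSON] "
            + "Fairly long narrative here describing issue of dispatch, but sometimes these can be short.\" "
            + "disposition1=\"TRAN\" />\n"
            + "</cadcall>";

    // XPath: the parser decodes the attribute value for us.
    static String viaXPath(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return XPathFactory.newInstance().newXPath()
                .evaluate("string(/cadcall/call/@narrative)", doc);
    }

    // Regex: same lookbehind as the grep -Po answer; returns null if absent.
    static String viaRegex(String text) {
        Matcher m = Pattern.compile("(?<=narrative=\")[^\"]*").matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(viaXPath(SAMPLE));
        System.out.println(viaRegex(SAMPLE));
    }
}
```

The two agree on this sample, but the XPath route is safer in general: the parser decodes entities such as &quot; or &amp; inside the attribute, while the regex returns the raw, still-escaped text.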
