XMLStarlet doesn't select xpath query correctly

I have the following XML:
<?xml version='1.0' encoding='UTF-8'?>
<ListBucketResult xmlns='http://doc.s3.amazonaws.com/2006-03-01'>
<Name>chromedriver</Name>
<Prefix></Prefix>
<Marker></Marker>
<IsTruncated>false</IsTruncated>
<Contents>
<Key>2.0/chromedriver_linux32.zip</Key>
<Generation>1380149859530000</Generation>
<MetaGeneration>4</MetaGeneration>
<LastModified>2013-09-25T22:57:39.349Z</LastModified>
<ETag>"c0d96102715c4916b872f91f5bf9b12c"</ETag>
<Size>7262134</Size>
</Contents>
<Contents>
<Key>2.0/chromedriver_linux64.zip</Key>
<Generation>1380149860664000</Generation>
<MetaGeneration>4</MetaGeneration>
<LastModified>2013-09-25T22:57:40.449Z</LastModified>
<ETag>"858ebaf47e13dce7600191ed59974c09"</ETag>
<Size>7433593</Size>
</Contents>
...
</ListBucketResult>
I tried to select only the Key nodes with this command:
xmlstarlet sel -T -t -m '/ListBucketResult/Contents/Key' -v '.' -n file.xml
I tried several variations, but none of them returned any value.
I also ran el to see the structure:
xmlstarlet el file.xml
ListBucketResult
ListBucketResult/Name
ListBucketResult/Prefix
ListBucketResult/Marker
ListBucketResult/IsTruncated
ListBucketResult/Contents
ListBucketResult/Contents/Key
ListBucketResult/Contents/Generation
ListBucketResult/Contents/MetaGeneration
ListBucketResult/Contents/LastModified
ListBucketResult/Contents/ETag
ListBucketResult/Contents/Size
I don't know what is wrong.

Your XML elements are bound to the namespace http://doc.s3.amazonaws.com/2006-03-01, but your XPath does not reference any namespace (it uses no namespace prefix). It therefore looks for elements in no namespace and finds nothing.
You need to declare that namespace with a namespace-prefix using the -N switch, and use the namespace-prefix in your XPath:
xmlstarlet sel -N s3="http://doc.s3.amazonaws.com/2006-03-01" -T -t -m '/s3:ListBucketResult/s3:Contents/s3:Key' -v '.' -n file.xml
Reference:
http://xmlstar.sourceforge.net/doc/UG/ch05s01.html
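
The same namespace behaviour can be reproduced outside xmlstarlet. The following is a minimal sketch using Python's standard xml.etree.ElementTree rather than xmlstarlet itself; the XML is a trimmed copy of the listing from the question, and the prefix s3 is an arbitrary choice, just as it is with -N.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the listing from the question (assumption: two Contents entries).
XML = """<ListBucketResult xmlns='http://doc.s3.amazonaws.com/2006-03-01'>
  <Name>chromedriver</Name>
  <Contents><Key>2.0/chromedriver_linux32.zip</Key></Contents>
  <Contents><Key>2.0/chromedriver_linux64.zip</Key></Contents>
</ListBucketResult>"""

root = ET.fromstring(XML)

# The prefix is only a local alias; what has to match is the namespace URI.
NS = {"s3": "http://doc.s3.amazonaws.com/2006-03-01"}


def keys_without_namespace(root):
    # Unprefixed names refer to elements in no namespace, so nothing matches.
    return [k.text for k in root.findall("Contents/Key")]


def keys_with_namespace(root):
    # Prefixed names are resolved through the NS mapping and match the elements.
    return [k.text for k in root.findall("s3:Contents/s3:Key", NS)]


print(keys_without_namespace(root))
print(keys_with_namespace(root))
```

The unprefixed query comes back empty for the same reason the original xmlstarlet command printed nothing.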

Related

XmlStarlet Querying XML

I have this XML document. Could you help me extract the values of all item elements using XMLStarlet in a shell script?
<transfer-matrix.xml>
<transfers>
<rows>
<item>
<item>Hungary</item>
<item>Kharkov-KIPT-LCG2</item>
<item>9882899680</item>
<item>4</item>
<item>1</item>
</item>
<item>
<item>Spain</item>
<item>Kharkov-KIPT-LCG2</item>
<item>32945102817</item>
<item>12</item>
<item>2</item>
</item>
<item>
<item>Finland</item>
<item>Kharkov-KIPT-LCG2</item>
<item>10737418240</item>
<item>4</item>
<item>0</item>
</item>
<item>...</item>
<item>...</item>
<item>...</item>
</rows>
<key>...</key>
</transfers>
<params>...</params>
</transfer-matrix.xml>
I'm trying to extract the items this way:
outcome=`xml sel -T -t -m /transfer-matrix.xml/transfers/rows/item -s D:N:- "@item" -v "concat(@item,'|',item,'|',item,'|',item,'|',item,'|',item)" -n /usr/share/dashboard/xml/transfers-country.xml`
My output is:
|Hungary|Hungary|Hungary|Hungary|Hungary |Spain|Spain|Spain|Spain|Spain |Finland|Finland|Finland|Finland|Finland
I need format like this
|Hungary|Kharkov-KIPT-LCG2|9882899680|4|1
|Spain|Kharkov-KIPT-LCG2|32945102817|12|2
|Finland|Kharkov-KIPT-LCG2|10737418240|4|0
I would be grateful for the help.
You need to specify which item element you want by position, and add a newline character at the end, like this:
OUTPUT=$(xmlstarlet sel -T -t -m /transfer-matrix.xml/transfers/rows/item -s D:N:- "@item" -v "concat(@item,'|',item[1],'|',item[2],'|',item[3],'|',item[4],'|',item[5],'\n')" transfers-country.xml)
And then you can get the desired result via echo -e:
$ echo -e "$OUTPUT"
|Hungary|Kharkov-KIPT-LCG2|9882899680|4|1
|Spain|Kharkov-KIPT-LCG2|32945102817|12|2
|Finland|Kharkov-KIPT-LCG2|10737418240|4|0
Edit: As npostavs points out, it would be much better to use the -n flag instead:
$ xmlstarlet sel -T -t -m /transfer-matrix.xml/transfers/rows/item -s D:N:- "@item" -v "concat(@item,'|',item[1],'|',item[2],'|',item[3],'|',item[4],'|',item[5])" -n transfers-country.xml
|Hungary|Kharkov-KIPT-LCG2|9882899680|4|1
|Spain|Kharkov-KIPT-LCG2|32945102817|12|2
|Finland|Kharkov-KIPT-LCG2|10737418240|4|0
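
The reason item repeated the first value is that, in XPath 1.0, converting a node-set to a string uses only its first node, so each position has to be addressed explicitly. As a cross-check, here is a minimal sketch of the same row formatting in Python's standard xml.etree.ElementTree (not xmlstarlet). The XML is a shortened copy of the question's data, format_rows is a hypothetical helper name, and the sort step (-s) is left out.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the question's document (assumption: three rows only).
XML = """<transfer-matrix.xml>
<transfers>
<rows>
<item><item>Hungary</item><item>Kharkov-KIPT-LCG2</item><item>9882899680</item><item>4</item><item>1</item></item>
<item><item>Spain</item><item>Kharkov-KIPT-LCG2</item><item>32945102817</item><item>12</item><item>2</item></item>
<item><item>Finland</item><item>Kharkov-KIPT-LCG2</item><item>10737418240</item><item>4</item><item>0</item></item>
</rows>
</transfers>
</transfer-matrix.xml>"""


def format_rows(xml_text):
    """Return one '|'-separated line per outer <item> row."""
    root = ET.fromstring(xml_text)
    lines = []
    for row in root.findall("transfers/rows/item"):
        # Each inner <item> is taken by position, like item[1] .. item[5].
        cells = [cell.text or "" for cell in row.findall("item")]
        # The leading '|' mirrors the empty @item attribute in the original output.
        lines.append("|" + "|".join(cells))
    return lines


for line in format_rows(XML):
    print(line)
```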

how can a BPEL variable be put into a shell variable

A BPEL process creates an XML document (structured according to a certain XSD file), and I want to parse that BPEL variable with xmllint or xmlstarlet from a Unix shell command line. Is that possible at all?
How can I put the BPEL variable into a shell variable so that I can parse it with xmllint, for instance?
INPUT:
<?xml version="1.0"?>
<ns:ItemList xmlns:ns="http:///blabla">
<GenericItem>
<ns2:LocalItem xmlns:ns2="http:///blabla">
<ItemSource> </ItemSource>
<ConcItemSource>
<name></name>
<requirements/>
<strategy/>
</ConcItemSource>
<dataFormat/>
<directory></directory>
<file/>
</ns2:LocalItem>
</GenericItem>
<GenericItem>
<ns2:LocalItem xmlns:ns2="http:///blabla">
<ItemSource>
</ItemSource>
<ConcItemSource>
<name></name>
<requirements/>
<strategy/>
</ConcItemSource>
<dataFormat/>
<directory></directory>
<file/>
</ns2:LocalItem>
</GenericItem>
</ns:ItemList>
Using xmlstarlet :
$ cat bpel.xml
<?xml version="1.0"?>
<ns:ItemList xmlns:ns="http:///blabla">
<GenericItem>
<ns2:LocalItem xmlns:ns2="http:///blabla">
<ItemSource> </ItemSource>
<ConcItemSource>
<name></name>
<requirements/>
<strategy/>
</ConcItemSource>
<dataFormat/>
<directory>d1</directory>
<file/>
</ns2:LocalItem>
</GenericItem>
<GenericItem>
<ns2:LocalItem xmlns:ns2="http:///blabla">
<ItemSource>
</ItemSource>
<ConcItemSource>
<name></name>
<requirements/>
<strategy/>
</ConcItemSource>
<dataFormat/>
<directory>d2</directory>
<file/>
</ns2:LocalItem>
</GenericItem>
</ns:ItemList>
command line :
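$ # (//directory)[1] is the first match in the whole document; //directory[1] would match the first <directory> under every parent
$ dir1=$(xmlstarlet sel -t -v '(//directory)[1]/text()' bpel.xml)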
$ echo "$dir1"
d1
Using a for loop :
$ count=$(xmlstarlet sel -t -v 'count(//directory)' bpel.xml)
$ for ((i=1; i<=count; i++)); do
xmlstarlet sel -t -v "(//directory)[$i]/text()" -n bpel.xml >> newfile
done
But you can simply do:
$ xmlstarlet sel -t -v "//directory/text()" bpel.xml >> newfile
xmlstarlet from STDIN :
command_producing_xml | xmlstarlet sel -t -v "//directory/text()" -
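
One subtlety with positional predicates: in XPath, //directory[1] means "every directory that is the first directory child of its parent", whereas (//directory)[1] means "the first match in the whole document". In this document each directory sits under its own LocalItem, so the two differ. The sketch below illustrates this with Python's standard xml.etree.ElementTree (not xmlstarlet), which applies the same per-parent rule; it does not support the parenthesised form, so plain list indexing stands in for it. The XML is a shortened copy of bpel.xml.

```python
import xml.etree.ElementTree as ET

# Shortened copy of bpel.xml (assumption: only the <directory> children kept).
XML = """<ns:ItemList xmlns:ns="http:///blabla">
  <GenericItem>
    <ns2:LocalItem xmlns:ns2="http:///blabla">
      <directory>d1</directory>
    </ns2:LocalItem>
  </GenericItem>
  <GenericItem>
    <ns2:LocalItem xmlns:ns2="http:///blabla">
      <directory>d2</directory>
    </ns2:LocalItem>
  </GenericItem>
</ns:ItemList>"""

root = ET.fromstring(XML)

# [1] is evaluated per parent: both elements are "first" under their own parent.
per_parent_first = [d.text for d in root.findall(".//directory[1]")]

# [2] is also per parent: no parent has a second <directory>, so nothing matches.
per_parent_second = [d.text for d in root.findall(".//directory[2]")]

# Document-order indexing, the equivalent of (//directory)[1] and (//directory)[2].
all_dirs = [d.text for d in root.findall(".//directory")]
document_first, document_second = all_dirs[0], all_dirs[1]

print(per_parent_first, per_parent_second, document_first, document_second)
```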

Why does xmlstarlet say there's no 'ends-with' function?

I'm using xmlstarlet to extract changeSet nodes from a liquibase XML changelog where the viewName ends with "v".
However, xmlstarlet is complaining that the ends-with XPATH function does not exist:
$ xmlstarlet sel -N x="http://www.liquibase.org/xml/ns/dbchangelog" -t -m \
"/x:databaseChangeLog/x:changeSet[x:createView[ends-with(@viewName, 'v')]]" \
-c . public.db.changelog.xml
xmlXPathCompOpEval: function ends-with not found
Unregistered function
Stack usage errror
xmlXPathCompiledEval: 3 objects left on the stack.
runtime error: element for-each
Failed to evaluate the 'select' expression.
None of the XPaths matched; to match a node in the default namespace
use '_' as the prefix (see section 5.1 in the manual).
For instance, use /_:node instead of /node
The XML looks a bit like this:
<databaseChangeLog xmlns="http://www.liquibase.org/xml/ns/dbchangelog" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-3.1.xsd">
<changeSet id="1391529990457-3">
<createView viewName="myviewnamev"><!-- view definition here --></createView>
</changeSet>
<changeSet id="1391529990457-4">
<createView viewName="anotherviewname"><!-- view definition here --></createView>
</changeSet>
</databaseChangeLog>
I do know that the XPATH expression is otherwise correct, because if I change the selection criteria to x:createView[@viewName="myviewnamev"] then it correctly selects only that changeLog entry.
How do I get xmlstarlet to correctly use ends-with? Or, is there an alternative way to accomplish what I want to do?
xmlstarlet only supports XPath 1.0, which does not offer an ends-with($string, $token) function. You need to use substring, string-length and string comparison to construct your own using this pattern:
substring($string, string-length($string) - string-length($token) + 1) = $token
Applied to your query, it should look like this (I "precomputed" the string length):
/x:databaseChangeLog/x:changeSet[x:createView[
substring(@viewName, string-length(@viewName)) = 'v']
]
Alternatively, you might want to look for a more powerful XPath 2.0/XQuery engine.
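
To see why the substring pattern is equivalent to ends-with, the arithmetic can be checked outside XPath. This is a minimal sketch in Python, not xmlstarlet: xpath_substring is a hypothetical helper that models only the two-argument XPath 1.0 substring() with integer positions, which are 1-based.

```python
def xpath_substring(s, start):
    """Model of two-argument XPath 1.0 substring(): keep characters whose
    1-based position is >= start (integer start only)."""
    return s[max(start, 1) - 1:]


def ends_with(s, token):
    """The pattern: substring($s, string-length($s) - string-length($t) + 1) = $t."""
    return xpath_substring(s, len(s) - len(token) + 1) == token


# With a one-character token the start position is simply string-length($s),
# which is the "precomputed" form used in the query.
for name in ("myviewnamev", "anotherviewname"):
    print(name, ends_with(name, "v"))
```

For a token of length n the start position becomes string-length($string) - n + 1, so a two-character suffix would use string-length(...) - 1.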
$ xml sel -t -c "//_:changeSet[_:createView[str:split(@viewName,'')[last()]='v']]" -n file.xml
$ xml sel -t -c "//_:changeSet[_:createView[str:tokenize(@viewName,'')[last()]='v']]" -n file.xml
$ xml sel -t -c "//_:changeSet[_:createView[substring(@viewName,string-length(@viewName),1)='v']]" -n file.xml
Building on @jens-erat's answer ...
[substring(@viewName, string-length(@viewName)) = 'v']
can be made more DRY by putting the predicate on the @viewName attribute:
[@viewName[substring(., string-length(.)) = 'v']]
resulting in:
$ xmlstarlet sel -N x="http://www.liquibase.org/xml/ns/dbchangelog" -t -m \
"/x:databaseChangeLog/x:changeSet[x:createView
[@viewName[substring(., string-length(.)) = 'v']]
]" -c . public.db.changelog.xml

bash XHTML parsing using xpath

I'm writing a small script to learn how to parse an XHTML web page. The following command:
cat q?s=goog.xhtml | xpath '//span[@id="yfs_l10_goog"]'
returns:
Found 2 nodes:
-- NODE --
<span id="yfs_l10_goog">624.50</span>-- NODE --
<span id="yfs_l10_goog">624.50</span>
Two questions:
How do I need to write my command to extract only the value 624.50?
What do I need to do to extract it only once?
source page I'm parsing: http://finance.yahoo.com/q?s=goog
Edit 2:
Give this a try:
xpath -q -e '//span[@id="yfs_l10_goog"][1]/text()'
Edit:
Pipe your output through:
sed -n '/span/{s/<span[^<]*>\([^<]*\)<.*/\1/;p;q}'
Original answer:
Using xmlstarlet:
echo -e '<foo><span id="yfs_l10_goog">624.50</span>\n<bar>xyz</bar><span id="yfs_l10_goog">555.50</span>\n<span id="yfs_l10_goog">123.50</span></foo>' |
xmlstarlet sel -t -v "//span[@id='yfs_l10_goog']"
Result of query:
624.50
Result of echo:
<foo><span id="yfs_l10_goog">624.50</span>
<bar>xyz</bar><span id="yfs_l10_goog">555.50</span>
<span id="yfs_l10_goog">123.50</span></foo>
Result of xml fo:
<?xml version="1.0"?>
<foo>
<span id="yfs_l10_goog">624.50</span>
<bar>xyz</bar>
<span id="yfs_l10_goog">555.50</span>
<span id="yfs_l10_goog">123.50</span>
</foo>
Other queries:
$ echo -e '...' | xmlstarlet sel -t -v "//span[@id='yfs_l10_goog'][1]"
624.50
$ echo -e '...' | xmlstarlet sel -t -v "//span[@id='yfs_l10_goog'][3]"
123.50
$ echo -e '...' | xmlstarlet sel -t -v "//span[@id='yfs_l10_goog'][last()]"
123.50
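
For comparison, the attribute filter can also be sketched in Python's standard xml.etree.ElementTree (not xmlstarlet or the xpath tool). It runs on the small sample document from the answer, not on the real page; an actual XHTML page would additionally need to be well-formed and would normally place its elements in the XHTML namespace, which this sketch does not handle.

```python
import xml.etree.ElementTree as ET

# The sample document from the answer above, not the real finance page.
XML = """<foo><span id="yfs_l10_goog">624.50</span>
<bar>xyz</bar><span id="yfs_l10_goog">555.50</span>
<span id="yfs_l10_goog">123.50</span></foo>"""

root = ET.fromstring(XML)
QUERY = ".//span[@id='yfs_l10_goog']"

# find() returns only the first matching element, so the value appears once.
first_value = root.find(QUERY).text

# findall() returns every match in document order; the last one plays the role of [last()].
all_values = [span.text for span in root.findall(QUERY)]
last_value = all_values[-1]

print(first_value, last_value)
```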

Iterate through XML with xmlstarlet

I have the following XML:
<?xml version="1.0" encoding="UTF-8"?>
<test-report>
<testsuite>
<test name="RegisterConnection1Tests">
<testcase name="testRregisterConnection001"></testcase>
<testcase name="testRegisterConnection002"></testcase>
</test>
<test name="RegisterConnection2Tests">
<testcase name="testRregisterConnection001"></testcase>
<testcase name="testRegisterConnection002"></testcase>
</test>
</testsuite>
</test-report>
And I want the output:
RegisterConnection1Tests,testRregisterConnection001
RegisterConnection1Tests,testRregisterConnection002
RegisterConnection2Tests,testRregisterConnection001
RegisterConnection2Tests,testRregisterConnection002
I'm confused about how to show the children as I expected. I managed to get this command to work:
xmlstarlet sel -t -m 'test-report/testsuite/test' -v '@name' -v '//testcase/@name' -n $1
though it only outputs:
RegisterConnection1TeststestRregisterConnection001
RegisterConnection2TeststestRregisterConnection001
To add the missing comma, you can add another -v "','".
In your second column you are selecting with an absolute XPath expression that starts from the document root, not from the element matched by the template; the double slashes are wrong. Since you want one line per testcase, I would iterate over the testcase elements and prepend the name attribute of the parent element, like this:
xmlstarlet sel -t -m 'test-report/testsuite/test/testcase' -v '../@name' -v "','" -v '@name' -n "$1"
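
The "one line per testcase, with the parent's name in front" idea can be sketched in Python's standard xml.etree.ElementTree as well (not xmlstarlet). ElementTree elements do not keep a reference to their parent, so this sketch uses nested loops instead of the ../@name step; name_pairs is a hypothetical helper name, and the XML is copied from the question as written.

```python
import xml.etree.ElementTree as ET

# Copied from the question, including the attribute spellings used there.
XML = """<test-report>
<testsuite>
<test name="RegisterConnection1Tests">
<testcase name="testRregisterConnection001"></testcase>
<testcase name="testRegisterConnection002"></testcase>
</test>
<test name="RegisterConnection2Tests">
<testcase name="testRregisterConnection001"></testcase>
<testcase name="testRegisterConnection002"></testcase>
</test>
</testsuite>
</test-report>"""


def name_pairs(xml_text):
    """Return 'testName,testcaseName' for every testcase, in document order."""
    root = ET.fromstring(xml_text)
    lines = []
    for test in root.findall("testsuite/test"):
        # Searching relative to the current <test> keeps each testcase with
        # its own parent, unlike the absolute //testcase/@name.
        for case in test.findall("testcase"):
            lines.append(f"{test.get('name')},{case.get('name')}")
    return lines


for line in name_pairs(XML):
    print(line)
```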

Resources