XmlStarlet Querying XML - bash

I have this XML document. Could you help me extract the values of all item elements using XMLStarlet in a shell script?
<transfer-matrix.xml>
<transfers>
<rows>
<item>
<item>Hungary</item>
<item>Kharkov-KIPT-LCG2</item>
<item>9882899680</item>
<item>4</item>
<item>1</item>
</item>
<item>
<item>Spain</item>
<item>Kharkov-KIPT-LCG2</item>
<item>32945102817</item>
<item>12</item>
<item>2</item>
</item>
<item>
<item>Finland</item>
<item>Kharkov-KIPT-LCG2</item>
<item>10737418240</item>
<item>4</item>
<item>0</item>
</item>
<item>...</item>
<item>...</item>
<item>...</item>
</rows>
<key>...</key>
</transfers>
<params>...</params>
</transfer-matrix.xml>
I'm trying to extract the items like this:
outcome=`xml sel -T -t -m /transfer-matrix.xml/transfers/rows/item -s D:N:- "@item" -v "concat(@item,'|',item,'|',item,'|',item,'|',item,'|',item)" -n /usr/share/dashboard/xml/transfers-country.xml`
My output is:
|Hungary|Hungary|Hungary|Hungary|Hungary |Spain|Spain|Spain|Spain|Spain |Finland|Finland|Finland|Finland|Finland
I need the output in this format:
|Hungary|Kharkov-KIPT-LCG2|9882899680|4|1
|Spain|Kharkov-KIPT-LCG2|32945102817|12|2
|Finland|Kharkov-KIPT-LCG2|10737418240|4|0
I would be grateful for any help.

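You need to specify which child element you want. In XPath 1.0, a node-set converted to a string yields the string value of its first node, so a bare item always gives the first child. Index each child explicitly and add a newline character at the end, like this:
OUTPUT=$(xmlstarlet sel -T -t -m /transfer-matrix.xml/transfers/rows/item -s D:N:- "@item" -v "concat(@item,'|',item[1],'|',item[2],'|',item[3],'|',item[4],'|',item[5],'\n')" transfers-country.xml)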
And then you can get the desired result via echo -e:
$ echo -e "$OUTPUT"
|Hungary|Kharkov-KIPT-LCG2|9882899680|4|1
|Spain|Kharkov-KIPT-LCG2|32945102817|12|2
|Finland|Kharkov-KIPT-LCG2|10737418240|4|0
Edit: As npostavs points out, it is much better to use the -n flag instead:
$ xmlstarlet sel -T -t -m /transfer-matrix.xml/transfers/rows/item -s D:N:- "@item" -v "concat(@item,'|',item[1],'|',item[2],'|',item[3],'|',item[4],'|',item[5])" -n transfers-country.xml
|Hungary|Kharkov-KIPT-LCG2|9882899680|4|1
|Spain|Kharkov-KIPT-LCG2|32945102817|12|2
|Finland|Kharkov-KIPT-LCG2|10737418240|4|0

Related

XMLStarlet doesn't select xpath query correctly

I have the following XML
<?xml version='1.0' encoding='UTF-8'?>
<ListBucketResult xmlns='http://doc.s3.amazonaws.com/2006-03-01'>
<Name>chromedriver</Name>
<Prefix></Prefix>
<Marker></Marker>
<IsTruncated>false</IsTruncated>
<Contents>
<Key>2.0/chromedriver_linux32.zip</Key>
<Generation>1380149859530000</Generation>
<MetaGeneration>4</MetaGeneration>
<LastModified>2013-09-25T22:57:39.349Z</LastModified>
<ETag>"c0d96102715c4916b872f91f5bf9b12c"</ETag>
<Size>7262134</Size>
</Contents>
<Contents>
<Key>2.0/chromedriver_linux64.zip</Key>
<Generation>1380149860664000</Generation>
<MetaGeneration>4</MetaGeneration>
<LastModified>2013-09-25T22:57:40.449Z</LastModified>
<ETag>"858ebaf47e13dce7600191ed59974c09"</ETag>
<Size>7433593</Size>
</Contents>
...
</ListBucketResult>
I tried to select only the Key nodes with this command:
xmlstarlet sel -T -t -m '/ListBucketResult/Contents/Key' -v '.' -n file.xml
I tried several variations, but none of them returned any value.
I also ran el to see the structure:
xmlstarlet el file.xml
ListBucketResult
ListBucketResult/Name
ListBucketResult/Prefix
ListBucketResult/Marker
ListBucketResult/IsTruncated
ListBucketResult/Contents
ListBucketResult/Contents/Key
ListBucketResult/Contents/Generation
ListBucketResult/Contents/MetaGeneration
ListBucketResult/Contents/LastModified
ListBucketResult/Contents/ETag
ListBucketResult/Contents/Size
I don't know what is wrong.
Your XML elements are bound to the namespace http://doc.s3.amazonaws.com/2006-03-01, but your XPath is not referencing any namespaces (not using a namespace-prefix). So, it is attempting to reference elements in the "no namespace" and finding nothing.
You need to declare that namespace with a namespace-prefix using the -N switch, and use the namespace-prefix in your XPath:
xmlstarlet sel -N s3="http://doc.s3.amazonaws.com/2006-03-01" -T -t -m '/s3:ListBucketResult/s3:Contents/s3:Key' -v '.' -n file.xml
Reference:
http://xmlstar.sourceforge.net/doc/UG/ch05s01.html

bash+xmlstarlet: How can one index into a list, or populate an array?

I'm trying to select a single node using xmlstarlet from the following example XML:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="key.xsl" ?>
<tables>
<tableset>
<table name="table1">
<row>
<fld name="fileName">
<strval><![CDATA[/my/XYZ/file1]]></strval>
</fld>
<fld name="fileName">
<strval><![CDATA[/my/XYZ/file2]]></strval>
</fld>
<fld name="fileName">
<strval><![CDATA[/my/other/XYZ/file3]]></strval>
</fld>
<fld name="worksBecauseUnique">
<strval><![CDATA[/XYZ/unique]]></strval>
</fld>
</row>
</table>
</tableset>
</tables>
I'm trying to build an associative array in bash... How can I select a single node, or iterate over multiple nodes using xmlstarlet?
So far I have tried something like the following, which does not work:
xmlstarlet sel -t -v "//tables/tableset/table/row/fld[@name=\"fileName\"]/strval[0]" xmlfile.xml
I was hoping to get "/my/XYZ/file1", but nothing is returned.
Answering the first part of your question, there's a simple mistake you're making:
strval[0]
needs to be
strval[1]
...to select the first instance, as XPath arrays are 1-indexed, not 0-indexed.
Now, when you want to select the second match inside your whole document, not inside the parent fld, that looks a bit different:
(//tables/tableset/table/row/fld[@name="fileName"]/strval)[2]
Now on to populating a shell array. Since your content here doesn't contain newlines:
query='//tables/tableset/table/row/fld[@name="fileName"]/strval'
fileNames=( )
while IFS= read -r entry; do
  fileNames+=( "$entry" )
done < <(xmlstarlet sel -t -v "$query" -n xmlfile.xml)
# print results
printf 'Extracted filename: %q\n' "${fileNames[@]}"
You aren't giving enough detail to set up an associative array (how do you want to establish the keys?), so I'm doing this as a simple indexed one.
On the other hand, if we were to make some assumptions -- that you wanted to set up your associative array to map from the @name key to the strval value, and that you wanted to use newlines to separate multiple values when given for the same key -- then that might look like this:
match='//tables/tableset/table/row/fld[@name][strval]'
key_query='./@name'
value_query='./strval'
declare -A content=( )
# xmlstarlet prints the key and the value on alternating lines
while IFS= read -r key && IFS= read -r value; do
  if [[ ${content[$key]} ]]; then
    # appending to existing value
    content[$key]+=$'\n'"$value"
  else
    # first value for this key
    content[$key]="$value"
  fi
done < <(xmlstarlet sel \
  -t -m "$match" \
  -v "$key_query" -n \
  -v "$value_query" -n xmlfile.xml)
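To try the key/value loop without an XML file at hand, the sketch below swaps the xmlstarlet call for a hypothetical emit_pairs function that prints the same alternating key and value lines the command would produce for the sample document. It needs bash 4 or later for associative arrays.

```shell
#!/usr/bin/env bash
# Stand-in for the xmlstarlet call (an assumption for this demo):
# prints key and value on alternating lines.
emit_pairs() {
  printf '%s\n' \
    fileName /my/XYZ/file1 \
    fileName /my/XYZ/file2 \
    fileName /my/other/XYZ/file3 \
    worksBecauseUnique /XYZ/unique
}

declare -A content=( )
while IFS= read -r key && IFS= read -r value; do
  if [[ -n ${content[$key]+set} ]]; then
    # key seen before: append on a new line
    content[$key]+=$'\n'"$value"
  else
    # first value for this key
    content[$key]=$value
  fi
done < <(emit_pairs)

# Walk the array; split multi-valued entries back into lines.
for key in "${!content[@]}"; do
  printf '%s:\n' "$key"
  while IFS= read -r value; do
    printf '  %s\n' "$value"
  done <<<"${content[$key]}"
done
```

The iteration order of an associative array is unspecified, so the keys may print in either order.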

how can a BPEL variable be put into a shell variable

A BPEL process creates an XML document whose structure is defined by an XSD file, and I want to parse that BPEL variable with xmllint or xmlstarlet from a Unix shell command line. Is that possible at all?
How can I put the BPEL variable into a shell variable so that I can parse it with xmllint, for instance?
INPUT:
<?xml version="1.0"?>
<ns:ItemList xmlns:ns="http:///blabla">
<GenericItem>
<ns2:LocalItem xmlns:ns2="http:///blabla">
<ItemSource> </ItemSource>
<ConcItemSource>
<name></name>
<requirements/>
<strategy/>
</ConcItemSource>
<dataFormat/>
<directory></directory>
<file/>
</ns2:LocalItem>
</GenericItem>
<GenericItem>
<ns2:LocalItem xmlns:ns2="http:///blabla">
<ItemSource>
</ItemSource>
<ConcItemSource>
<name></name>
<requirements/>
<strategy/>
</ConcItemSource>
<dataFormat/>
<directory></directory>
<file/>
</ns2:LocalItem>
</GenericItem>
</ns:ItemList>
Using xmlstarlet:
$ cat bpel.xml
<?xml version="1.0"?>
<ns:ItemList xmlns:ns="http:///blabla">
<GenericItem>
<ns2:LocalItem xmlns:ns2="http:///blabla">
<ItemSource> </ItemSource>
<ConcItemSource>
<name></name>
<requirements/>
<strategy/>
</ConcItemSource>
<dataFormat/>
<directory>d1</directory>
<file/>
</ns2:LocalItem>
</GenericItem>
<GenericItem>
<ns2:LocalItem xmlns:ns2="http:///blabla">
<ItemSource>
</ItemSource>
<ConcItemSource>
<name></name>
<requirements/>
<strategy/>
</ConcItemSource>
<dataFormat/>
<directory>d2</directory>
<file/>
</ns2:LocalItem>
</GenericItem>
</ns:ItemList>
Command line:
$ dir1=$(xmlstarlet sel -t -v '(//directory)[1]/text()' bpel.xml)
$ echo "$dir1"
d1
Using a for loop:
$ count=$(xmlstarlet sel -t -v 'count(//directory)' bpel.xml)
$ for ((i=1; i<=count; i++)); do
xmlstarlet sel -t -v "(//directory)[$i]/text()" -n bpel.xml >> newfile
done
But you can simply do:
$ xmlstarlet sel -t -v "//directory/text()" -n bpel.xml >> newfile
xmlstarlet from STDIN:
command_producing_xml | xmlstarlet sel -t -v "//directory/text()" -

Why does xmlstarlet say there's no 'ends-with' function?

I'm using xmlstarlet to extract changeSet nodes from a liquibase XML changelog where the viewName ends with "v".
However, xmlstarlet is complaining that the ends-with XPath function does not exist:
$ xmlstarlet sel -N x="http://www.liquibase.org/xml/ns/dbchangelog" -t -m \
"/x:databaseChangeLog/x:changeSet[x:createView[ends-with(@viewName, 'v')]]" \
-c . public.db.changelog.xml
xmlXPathCompOpEval: function ends-with not found
Unregistered function
Stack usage errror
xmlXPathCompiledEval: 3 objects left on the stack.
runtime error: element for-each
Failed to evaluate the 'select' expression.
None of the XPaths matched; to match a node in the default namespace
use '_' as the prefix (see section 5.1 in the manual).
For instance, use /_:node instead of /node
The XML looks a bit like this:
<databaseChangeLog xmlns="http://www.liquibase.org/xml/ns/dbchangelog" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-3.1.xsd">
<changeSet id="1391529990457-3">
<createView viewName="myviewnamev"><!-- view definition here --></createView>
</changeSet>
<changeSet id="1391529990457-4">
<createView viewName="anotherviewname"><!-- view definition here --></createView>
</changeSet>
</databaseChangeLog>
I do know that the XPath expression is otherwise correct, because if I change the selection criteria to x:createView[@viewName="myviewnamev"] then it correctly selects only that changeSet entry.
How do I get xmlstarlet to correctly use ends-with? Or, is there an alternative way to accomplish what I want to do?
xmlstarlet only supports XPath 1.0, which does not offer an ends-with($string, $token) function. You need to use substring, string-length and a string comparison to construct your own, using this pattern:
substring($string, string-length($string) - string-length($token) + 1) = $token
Applied to your query, it should look like this (I "precomputed" the string length):
/x:databaseChangeLog/x:changeSet[x:createView[
substring(@viewName, string-length(@viewName)) = 'v']
]
Alternatively, you might want to look for a more powerful XPath 2.0/XQuery engine.
$ xml sel -t -c "//_:changeSet[_:createView[str:split(@viewName,'')[last()]='v']]" -n file.xml
$ xml sel -t -c "//_:changeSet[_:createView[str:tokenize(@viewName,'')[last()]='v']]" -n file.xml
$ xml sel -t -c "//_:changeSet[_:createView[substring(@viewName,string-length(@viewName),1)='v']]" -n file.xml
Building on @jens-erat's answer ...
[substring(@viewName, string-length(@viewName)) = 'v']
can be made more DRY by putting the predicate on the @viewName attribute:
[@viewName[substring(., string-length(.)) = 'v']]
resulting in:
$ xmlstarlet sel -N x="http://www.liquibase.org/xml/ns/dbchangelog" -t -m \
"/x:databaseChangeLog/x:changeSet[x:createView
[@viewName[substring(., string-length(.)) = 'v']]
]" -c . public.db.changelog.xml

bash XHTML parsing using xpath

I'm writing a small script to learn how to parse an XHTML web page. The following command:
cat q?s=goog.xhtml | xpath '//span[@id="yfs_l10_goog"]'
returns:
Found 2 nodes:
-- NODE --
<span id="yfs_l10_goog">624.50</span>-- NODE --
<span id="yfs_l10_goog">624.50</span>
How do I:
write my command so that it extracts only the value 624.50?
extract it only once?
source page I'm parsing: http://finance.yahoo.com/q?s=goog
Edit 2:
Give this a try:
xpath -q -e '//span[@id="yfs_l10_goog"][1]/text()'
Edit:
Pipe your output through:
sed -n '/span/{s/<span[^<]*>\([^<]*\)<.*/\1/;p;q}'
Original answer:
Using xmlstarlet:
echo -e '<foo><span id="yfs_l10_goog">624.50</span>\n<bar>xyz</bar><span id="yfs_l10_goog">555.50</span>\n<span id="yfs_l10_goog">123.50</span></foo>' |
xmlstarlet sel -t -v "//span[@id='yfs_l10_goog']"
Result of query:
624.50
Result of echo:
<foo><span id="yfs_l10_goog">624.50</span>
<bar>xyz</bar><span id="yfs_l10_goog">555.50</span>
<span id="yfs_l10_goog">123.50</span></foo>
Result of xml fo:
<?xml version="1.0"?>
<foo>
<span id="yfs_l10_goog">624.50</span>
<bar>xyz</bar>
<span id="yfs_l10_goog">555.50</span>
<span id="yfs_l10_goog">123.50</span>
</foo>
Other queries:
$ echo -e '...' | xmlstarlet sel -t -v "//span[@id='yfs_l10_goog'][1]"
624.50
$ echo -e '...' | xmlstarlet sel -t -v "//span[@id='yfs_l10_goog'][3]"
123.50
$ echo -e '...' | xmlstarlet sel -t -v "//span[@id='yfs_l10_goog'][last()]"
123.50

Resources