A BPEL process creates an XML document (its structure is defined by an XSD file), and I want to parse that BPEL variable with xmllint or xmlstarlet from a Unix shell command line. Is that possible at all?
How can I put the BPEL variable into a shell variable so that I can parse it with xmllint, for instance?
INPUT:
<?xml version="1.0"?>
<ns:ItemList xmlns:ns="http:///blabla">
<GenericItem>
<ns2:LocalItem xmlns:ns2="http:///blabla">
<ItemSource> </ItemSource>
<ConcItemSource>
<name></name>
<requirements/>
<strategy/>
</ConcItemSource>
<dataFormat/>
<directory></directory>
<file/>
</ns2:LocalItem>
</GenericItem>
<GenericItem>
<ns2:LocalItem xmlns:ns2="http:///blabla">
<ItemSource>
</ItemSource>
<ConcItemSource>
<name></name>
<requirements/>
<strategy/>
</ConcItemSource>
<dataFormat/>
<directory></directory>
<file/>
</ns2:LocalItem>
</GenericItem>
</ns:ItemList>
Using xmlstarlet :
$ cat bpel.xml
<?xml version="1.0"?>
<ns:ItemList xmlns:ns="http:///blabla">
<GenericItem>
<ns2:LocalItem xmlns:ns2="http:///blabla">
<ItemSource> </ItemSource>
<ConcItemSource>
<name></name>
<requirements/>
<strategy/>
</ConcItemSource>
<dataFormat/>
<directory>d1</directory>
<file/>
</ns2:LocalItem>
</GenericItem>
<GenericItem>
<ns2:LocalItem xmlns:ns2="http:///blabla">
<ItemSource>
</ItemSource>
<ConcItemSource>
<name></name>
<requirements/>
<strategy/>
</ConcItemSource>
<dataFormat/>
<directory>d2</directory>
<file/>
</ns2:LocalItem>
</GenericItem>
</ns:ItemList>
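Command line (the parentheses make [1] select the first directory element in the whole document, rather than the first one within each parent):
$ dir1=$(xmlstarlet sel -t -v '(//directory)[1]/text()' bpel.xml)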
$ echo "$dir1"
d1
Using a for loop:
$ count=$(xmlstarlet sel -t -v 'count(//directory)' bpel.xml)
$ for ((i=1; i<=count; i++)); do
xmlstarlet sel -t -v "(//directory)[$i]/text()" -n bpel.xml >> newfile
done
But you can simply do:
$ xmlstarlet sel -t -v "//directory/text()" bpel.xml >> newfile
xmlstarlet from STDIN:
command_producing_xml | xmlstarlet sel -t -v "//directory/text()" -
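To cover the "shell variable" part of the question: once the XML text is held in a shell variable, it can be piped to xmlstarlet on standard input. This is only a minimal sketch, assuming bash; the variable name xml_doc and the simplified inline sample (without the namespace declarations) are made up for illustration, and how the document gets out of the BPEL engine is not covered here.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the document produced by the BPEL process.
xml_doc='<?xml version="1.0"?>
<ItemList>
<GenericItem><LocalItem><directory>d1</directory></LocalItem></GenericItem>
<GenericItem><LocalItem><directory>d2</directory></LocalItem></GenericItem>
</ItemList>'

# Pipe the variable to xmlstarlet; with no file argument it reads stdin.
# -m iterates over every match, -v . prints its value, -n adds a newline.
dirs=$(printf '%s\n' "$xml_doc" | xmlstarlet sel -t -m '//directory' -v . -n)
printf '%s\n' "$dirs"
```

Quoting "$xml_doc" matters: without the quotes the shell would collapse the whitespace inside the document before xmlstarlet sees it.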
I have the following XML
<?xml version='1.0' encoding='UTF-8'?>
<ListBucketResult xmlns='http://doc.s3.amazonaws.com/2006-03-01'>
<Name>chromedriver</Name>
<Prefix></Prefix>
<Marker></Marker>
<IsTruncated>false</IsTruncated>
<Contents>
<Key>2.0/chromedriver_linux32.zip</Key>
<Generation>1380149859530000</Generation>
<MetaGeneration>4</MetaGeneration>
<LastModified>2013-09-25T22:57:39.349Z</LastModified>
<ETag>"c0d96102715c4916b872f91f5bf9b12c"</ETag>
<Size>7262134</Size>
</Contents>
<Contents>
<Key>2.0/chromedriver_linux64.zip</Key>
<Generation>1380149860664000</Generation>
<MetaGeneration>4</MetaGeneration>
<LastModified>2013-09-25T22:57:40.449Z</LastModified>
<ETag>"858ebaf47e13dce7600191ed59974c09"</ETag>
<Size>7433593</Size>
</Contents>
...
</ListBucketResult>
I tried to select only the Key nodes with this command:
xmlstarlet sel -T -t -m '/ListBucketResult/Contents/Key' -v '.' -n file.xml
I tried several other commands too, but none of them returns any value.
I also tried el to see the structure:
xmlstarlet el file.xml
ListBucketResult
ListBucketResult/Name
ListBucketResult/Prefix
ListBucketResult/Marker
ListBucketResult/IsTruncated
ListBucketResult/Contents
ListBucketResult/Contents/Key
ListBucketResult/Contents/Generation
ListBucketResult/Contents/MetaGeneration
ListBucketResult/Contents/LastModified
ListBucketResult/Contents/ETag
ListBucketResult/Contents/Size
I don't know what is incorrect
Your XML elements are bound to the namespace http://doc.s3.amazonaws.com/2006-03-01, but your XPath is not referencing any namespaces (not using a namespace-prefix). So, it is attempting to reference elements in the "no namespace" and finding nothing.
You need to declare that namespace with a namespace-prefix using the -N switch, and use the namespace-prefix in your XPath:
xmlstarlet sel -N s3="http://doc.s3.amazonaws.com/2006-03-01" -T -t -m '/s3:ListBucketResult/s3:Contents/s3:Key' -v '.' -n file.xml
Reference:
http://xmlstar.sourceforge.net/doc/UG/ch05s01.html
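If declaring a prefix is inconvenient, a namespace-agnostic XPath based on the standard local-name() function is another option. The following is a sketch under assumptions: the inline sample is a shortened, made-up copy of the listing from the question, and matching on local names ignores the namespace entirely, so it can also match same-named elements from other namespaces.

```shell
#!/usr/bin/env bash
# Shortened, hypothetical copy of the S3 listing from the question.
xml='<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult xmlns="http://doc.s3.amazonaws.com/2006-03-01">
<Contents><Key>2.0/chromedriver_linux32.zip</Key></Contents>
<Contents><Key>2.0/chromedriver_linux64.zip</Key></Contents>
</ListBucketResult>'

# local-name() compares the element name without its namespace,
# so no -N declaration is needed.
keys=$(printf '%s\n' "$xml" | xmlstarlet sel -T -t -m "//*[local-name()='Key']" -v . -n)
printf '%s\n' "$keys"
```

The -N form from the answer is stricter and is usually preferable when the namespace URI is known.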
I made an array of the names of the files that match a pattern:
lista=($(grep -El "<LastVisitedURL>.+</LastVisitedURL>.*<FavoriteTopic>0</FavoriteTopic>" *))
Now I want to delete from the file index.xml every TopicKey element whose FileName attribute is one of the file names in the array.
for e in ${lista[*]}
do
sed '/\<TopicKey FileName=\"$e\"\>.*\<\/TopicKey\>/d' index.xml
done
The complete script is:
#! /bin/bash
#search xml files watched and no favorites.
lista=($(grep -El "<LastVisitedURL>.+</LastVisitedURL>.*<FavoriteTopic>0</FavoriteTopic>" *))
#declare -p lista
for e in ${lista[*]}
do
sed '/<TopicKey FileName=\"$e\">.*<\/TopicKey>/d' index.xml
done
The regex pattern doesn't work, and even if it did, using sed's -i option to edit index.xml in place would rewrite the index file as many times as there are file names in the array, which is bad.
Any suggestions?
Here is an example using xmlstarlet in a shell:
% cat file.xml
<?xml version="1.0"?>
<root>
<foobar>aaa</foobar>
<LastVisitedURL>http://foo.bar/?a=1</LastVisitedURL>
<LastVisitedURL>http://foo.bar/?a=2</LastVisitedURL>
<LastVisitedURL>http://foo.bar/?a=3</LastVisitedURL>
</root>
Then, the command line:
% xmlstarlet edit --delete '//LastVisitedURL' file.xml
<?xml version="1.0"?>
<root>
<foobar>aaa</foobar>
</root>
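Applied to the original problem, the same edit subcommand can delete every listed TopicKey element in a single pass instead of once per file name. This is a rough sketch, assuming bash; the layout of index.xml is not shown in the question, so the Index root element and the sample entries below are invented, and file names containing a single quote would break the generated XPath.

```shell
#!/usr/bin/env bash
# Hypothetical index file; the real structure of index.xml is an assumption.
index='<?xml version="1.0"?>
<Index>
  <TopicKey FileName="a.xml">A</TopicKey>
  <TopicKey FileName="b.xml">B</TopicKey>
  <TopicKey FileName="c.xml">C</TopicKey>
</Index>'

# Stand-in for the array produced by grep -El in the question.
lista=(a.xml c.xml)

# Collect one -d (delete) option per file name ...
args=()
for e in "${lista[@]}"; do
  args+=(-d "//TopicKey[@FileName='$e']")
done

# ... then run xmlstarlet once; with no file argument it reads stdin.
result=$(printf '%s\n' "$index" | xmlstarlet ed "${args[@]}")
printf '%s\n' "$result"
```

With a real file, the last command would take index.xml as its final argument; xmlstarlet ed also has an -L (--inplace) option for editing the file directly, which is worth trying on a copy first.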
I have this XML document. Could you please help me extract the values of all the item elements using XMLStarlet in a shell script?
<transfer-matrix.xml>
<transfers>
<rows>
<item>
<item>Hungary</item>
<item>Kharkov-KIPT-LCG2</item>
<item>9882899680</item>
<item>4</item>
<item>1</item>
</item>
<item>
<item>Spain</item>
<item>Kharkov-KIPT-LCG2</item>
<item>32945102817</item>
<item>12</item>
<item>2</item>
</item>
<item>
<item>Finland</item>
<item>Kharkov-KIPT-LCG2</item>
<item>10737418240</item>
<item>4</item>
<item>0</item>
</item>
<item>...</item>
<item>...</item>
<item>...</item>
</rows>
<key>...</key>
</transfers>
<params>...</params>
</transfer-matrix.xml>
I'm trying to extract the items this way:
outcome=`xml sel -T -t -m /transfer-matrix.xml/transfers/rows/item -s D:N:- "@item" -v "concat(@item,'|',item,'|',item,'|',item,'|',item,'|',item)" -n /usr/share/dashboard/xml/transfers-country.xml`
My output is:
|Hungary|Hungary|Hungary|Hungary|Hungary |Spain|Spain|Spain|Spain|Spain |Finland|Finland|Finland|Finland|Finland
I need format like this
|Hungary|Kharkov-KIPT-LCG2|9882899680|4|1
|Spain|Kharkov-KIPT-LCG2|32945102817|12|2
|Finland|Kharkov-KIPT-LCG2|10737418240|4|0
I would be grateful for the help
You need to specify which item element you want by position, and add a newline character at the end, like this:
OUTPUT=$(xmlstarlet sel -T -t -m /transfer-matrix.xml/transfers/rows/item -s D:N:- "@item" -v "concat(@item,'|',item[1],'|',item[2],'|',item[3],'|',item[4],'|',item[5],'\n')" transfers-country.xml)
And then you can get the desired result via echo -e:
$ echo -e "$OUTPUT"
|Hungary|Kharkov-KIPT-LCG2|9882899680|4|1
|Spain|Kharkov-KIPT-LCG2|32945102817|12|2
|Finland|Kharkov-KIPT-LCG2|10737418240|4|0
Edit: As npostavs points out, it is much better to use the -n flag (placed after -v) instead:
$ xmlstarlet sel -T -t -m /transfer-matrix.xml/transfers/rows/item -s D:N:- "@item" -v "concat(@item,'|',item[1],'|',item[2],'|',item[3],'|',item[4],'|',item[5])" -n transfers-country.xml
|Hungary|Kharkov-KIPT-LCG2|9882899680|4|1
|Spain|Kharkov-KIPT-LCG2|32945102817|12|2
|Finland|Kharkov-KIPT-LCG2|10737418240|4|0
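Once the output is in this pipe-separated form, it can be split back into fields in the shell. A small sketch, assuming bash; the field names (country, site, size, and so on) are guesses, since the question does not say what each column means, and the sample text is copied from the output above.

```shell
#!/usr/bin/env bash
# Sample copied from the output above; in practice this would be the
# result of the xmlstarlet command.
OUTPUT='|Hungary|Kharkov-KIPT-LCG2|9882899680|4|1
|Spain|Kharkov-KIPT-LCG2|32945102817|12|2
|Finland|Kharkov-KIPT-LCG2|10737418240|4|0'

total=0
# Each line starts with "|", so the first field is empty ("lead").
while IFS='|' read -r lead country site size col4 col5; do
  printf '%s -> %s: %s\n' "$country" "$site" "$size"
  total=$((total + size))
done <<< "$OUTPUT"

echo "sum of third column: $total"
```

The here-string keeps the loop in the current shell, so total is still set after the loop ends; piping into while would run the loop in a subshell and lose it.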
In a shell script, I have an XML file, p.xml, shown below, and I want to parse it and get its values into two arrays. I am trying to use xmllint, but could not get the desired data.
<?xml version="1.0" encoding="UTF-8"?>
<Share_Collection>
<Share id="data/Backup" resource-id="data/Backup" resource-type="SimpleShare" share-name="Backup" protocols="cifs,afp"/>
<Share id="data/Documents" resource-id="data/Documents" resource-type="SimpleShare" share-name="Documents" protocols="cifs,afp"/>
<Share id="data/Music" resource-id="data/Music" resource-type="SimpleShare" share-name="Music" protocols="cifs,afp"/>
<Share id="data/OwnCloud" resource-id="data/OwnCloud" resource-type="SimpleShare" share-name="OwnCloud" protocols="cifs,afp"/>
<Share id="data/Pictures" resource-id="data/Pictures" resource-type="SimpleShare" share-name="Pictures" protocols="cifs,afp"/>
<Share id="data/Videos" resource-id="data/Videos" resource-type="SimpleShare" share-name="Videos" protocols="cifs,afp"/>
</Share_Collection>
I want to get one array containing all the share ids and one array containing the share names. So the two arrays would look like this:
share-ids-array = ["data/Backup", "data/Documents", "data/Music", "data/OwnCloud", "data/Pictures", "data/Videos"]
share-names-array = ["Backup", "Documents", "Music", "OwnCloud", "Pictures", "Videos"]
I started as follows:
xmllint --xpath '//Share/@id' p.xml
xmllint --xpath '//Share/@share-name' p.xml
that gives me
id="data/Backup"
id="data/Documents" id="data/Music" id="data/OwnCloud" id="data/Pictures" id="data/Videos"
Any help to build those two arrays will be appreciated.
Here is one solution with grep (and tr)...sed or awk are other alternatives. By the way, you cannot use hyphens in variable names in bash.
share_ids=($( xmllint --xpath '//Share/@id' p.xml | grep -Po '".*?"' | tr -d \" ))
share_names=($( xmllint --xpath '//Share/@share-name' p.xml | grep -Po '".*?"' | tr -d \" ))
Example:
$ echo "${share_names[@]}"
Backup Documents Music OwnCloud Pictures Videos
Using xmlstarlet is probably better, though:
share_names=($( xmlstarlet sel -T -t -m '//Share/@share-name' -v '.' -n p.xml ))
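The array=( $(...) ) form splits on any whitespace, so a share name containing a space would end up as two array elements. A possible alternative, sketched here under the assumption of bash 4 or later (for mapfile), reads one value per line instead. The sample below is trimmed, and the "My Music" entry is invented to show the difference.

```shell
#!/usr/bin/env bash
# Trimmed, hypothetical sample; "My Music" is made up to include a space.
xml='<?xml version="1.0" encoding="UTF-8"?>
<Share_Collection>
<Share id="data/Backup" share-name="Backup"/>
<Share id="data/My Music" share-name="My Music"/>
</Share_Collection>'

# mapfile -t stores each output line as one array element.
mapfile -t share_ids < <(printf '%s\n' "$xml" |
  xmlstarlet sel -T -t -m '//Share' -v '@id' -n)
mapfile -t share_names < <(printf '%s\n' "$xml" |
  xmlstarlet sel -T -t -m '//Share' -v '@share-name' -n)

printf '%s\n' "${#share_names[@]} names, second is: ${share_names[1]}"
```

With the real file, p.xml would be passed as the last argument to xmlstarlet instead of piping the sample in.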
I'm writing a small script to learn how to parse an XHTML web page. The following command:
cat q?s=goog.xhtml | xpath '//span[@id="yfs_l10_goog"]'
returns:
Found 2 nodes:
-- NODE --
<span id="yfs_l10_goog">624.50</span>-- NODE --
<span id="yfs_l10_goog">624.50</span>
How do I need to write my command in order to extract only the value 624.50?
What do I need to do to extract it only once?
source page I'm parsing: http://finance.yahoo.com/q?s=goog
Edit 2:
Give this a try:
xpath -q -e '(//span[@id="yfs_l10_goog"])[1]/text()'
Edit:
Pipe your output through:
sed -n '/span/{s/<span[^<]*>\([^<]*\)<.*/\1/;p;q}'
Original answer:
Using xmlstarlet:
echo -e '<foo><span id="yfs_l10_goog">624.50</span>\n<bar>xyz</bar><span id="yfs_l10_goog">555.50</span>\n<span id="yfs_l10_goog">123.50</span></foo>' |
xmlstarlet sel -t -v "//span[@id='yfs_l10_goog']"
Result of query:
624.50
Result of echo:
<foo><span id="yfs_l10_goog">624.50</span>
<bar>xyz</bar><span id="yfs_l10_goog">555.50</span>
<span id="yfs_l10_goog">123.50</span></foo>
Result of xml fo:
<?xml version="1.0"?>
<foo>
<span id="yfs_l10_goog">624.50</span>
<bar>xyz</bar>
<span id="yfs_l10_goog">555.50</span>
<span id="yfs_l10_goog">123.50</span>
</foo>
Other queries:
$ echo -e '...' | xmlstarlet sel -t -v "//span[@id='yfs_l10_goog'][1]"
624.50
$ echo -e '...' | xmlstarlet sel -t -v "//span[@id='yfs_l10_goog'][3]"
123.50
$ echo -e '...' | xmlstarlet sel -t -v "//span[@id='yfs_l10_goog'][last()]"
123.50
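For completeness, roughly the same extraction can be tried with xmllint, which is often installed alongside libxml2. This is a sketch, assuming an xmllint build that supports --xpath; the inline sample is made up and well-formed, whereas a real web page may need xmllint's --html parser and may not behave the same way.

```shell
#!/usr/bin/env bash
# Made-up, well-formed sample with the span repeated, as in the question.
page='<foo><span id="yfs_l10_goog">624.50</span><bar>xyz</bar><span id="yfs_l10_goog">624.50</span></foo>'

# (...)[1] picks the first match in the whole document and string()
# returns just its text; "-" tells xmllint to read standard input.
price=$(printf '%s\n' "$page" |
  xmllint --xpath 'string((//span[@id="yfs_l10_goog"])[1])' -)
echo "$price"
```

Wrapping the expression in string() avoids the "-- NODE --" style decoration, because the result is a plain string rather than a node set.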