I have the following XML
<?xml version='1.0' encoding='UTF-8'?>
<ListBucketResult xmlns='http://doc.s3.amazonaws.com/2006-03-01'>
<Name>chromedriver</Name>
<Prefix></Prefix>
<Marker></Marker>
<IsTruncated>false</IsTruncated>
<Contents>
<Key>2.0/chromedriver_linux32.zip</Key>
<Generation>1380149859530000</Generation>
<MetaGeneration>4</MetaGeneration>
<LastModified>2013-09-25T22:57:39.349Z</LastModified>
<ETag>"c0d96102715c4916b872f91f5bf9b12c"</ETag>
<Size>7262134</Size>
</Contents>
<Contents>
<Key>2.0/chromedriver_linux64.zip</Key>
<Generation>1380149860664000</Generation>
<MetaGeneration>4</MetaGeneration>
<LastModified>2013-09-25T22:57:40.449Z</LastModified>
<ETag>"858ebaf47e13dce7600191ed59974c09"</ETag>
<Size>7433593</Size>
</Contents>
...
</ListBucketResult>
I tried to select only the Key nodes with this command:
xmlstarlet sel -T -t -m '/ListBucketResult/Contents/Key' -v '.' -n file.xml
I tried several commands, but none of them returned any value.
I also tried el to see the structure:
xmlstarlet el file.xml
ListBucketResult
ListBucketResult/Name
ListBucketResult/Prefix
ListBucketResult/Marker
ListBucketResult/IsTruncated
ListBucketResult/Contents
ListBucketResult/Contents/Key
ListBucketResult/Contents/Generation
ListBucketResult/Contents/MetaGeneration
ListBucketResult/Contents/LastModified
ListBucketResult/Contents/ETag
ListBucketResult/Contents/Size
I don't know what is incorrect
Your XML elements are bound to the namespace http://doc.s3.amazonaws.com/2006-03-01, but your XPath is not referencing any namespaces (not using a namespace-prefix). So, it is attempting to reference elements in the "no namespace" and finding nothing.
You need to declare that namespace with a namespace-prefix using the -N switch, and use the namespace-prefix in your XPath:
xmlstarlet sel -N s3="http://doc.s3.amazonaws.com/2006-03-01" -T -t -m '/s3:ListBucketResult/s3:Contents/s3:Key' -v '.' -n file.xml
Reference:
http://xmlstar.sourceforge.net/doc/UG/ch05s01.html
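To see the effect of the default namespace for yourself, here is a small sketch. It assumes xmllint (from libxml2) is installed, and it uses a shortened copy of the XML in a temporary file. Since xmllint's --xpath option has no switch for registering a prefix, the sketch matches on local-name() instead, which is an alternative technique and not the -N approach shown above.

```shell
# Sketch only: assumes xmllint is installed; the temp file is illustrative.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<?xml version='1.0' encoding='UTF-8'?>
<ListBucketResult xmlns='http://doc.s3.amazonaws.com/2006-03-01'>
  <Contents><Key>2.0/chromedriver_linux32.zip</Key></Contents>
  <Contents><Key>2.0/chromedriver_linux64.zip</Key></Contents>
</ListBucketResult>
EOF

if command -v xmllint >/dev/null 2>&1; then
  # Unprefixed names are looked up in "no namespace", so nothing matches.
  plain=$(xmllint --xpath 'count(/ListBucketResult/Contents/Key)' "$tmp")
  # local-name() compares only the local part, whatever the namespace is.
  by_local=$(xmllint --xpath 'count(//*[local-name()="Key"])' "$tmp")
  echo "plain=$plain local-name=$by_local"
fi
rm -f "$tmp"
```

The first count comes back as 0 and the second as 2, which is the same "finding nothing" behaviour as the original xmlstarlet command.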
I want to query the names of all the persons in the test.xml below.
<body>
<person name="abc"></person>
<person name="def"></person>
<person name="ghi"></person>
</body>
basic query
This has the problem of including "name", which I don't want.
$ xmllint --xpath '//body/person/@name' test.xml
name="abc"
name="def"
name="ghi"
string function
Using the string function, I only get one result.
$ xmllint --xpath 'string(//body/person/@name)' test.xml
abc
sed and grep
This works but looks needlessly complicated to me.
xmllint --xpath '//body/person/@name' test.xml | grep -o '"\([^"]*\)"' | sed 's|"||g'
abc
def
ghi
Question
Is it possible to get multiple values without the attribute name and without using another tool like grep?
I don't know about xmllint, but xmlstarlet can do it:
xmlstarlet sel -t -v 'body/person/@name' test.xml
Output:
abc
def
ghi
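If only xmllint is available, one possible workaround (a sketch, not something from the answer above) is to count the person elements and call string() once per position. It assumes xmllint is installed; the temporary file stands in for test.xml. It is slower than xmlstarlet because it starts one process per element.

```shell
# Sketch only: assumes xmllint is installed; the temp file stands in for test.xml.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<body>
<person name="abc"></person>
<person name="def"></person>
<person name="ghi"></person>
</body>
EOF

names=""
if command -v xmllint >/dev/null 2>&1; then
  n=$(xmllint --xpath 'count(/body/person)' "$tmp")
  i=1
  while [ "$i" -le "$n" ]; do
    # string() yields the attribute value only, without the name="..." wrapper.
    name=$(xmllint --xpath "string(/body/person[$i]/@name)" "$tmp")
    names="$names$name
"
    i=$((i + 1))
  done
  printf '%s' "$names"
fi
rm -f "$tmp"
```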
I want to read an XML file and set its values into variables.
For example:
qhr2400.xml
<XML>
<OPERATION type="1">
<TABLENAME>TABLE</TABLENAME>
<ROWSET>
<ROW>
<CLLI>518</CLLI>
<COLLECTION_DATE>06/04/20 00:45:00</COLLECTION_DATE>
<SS7RT>99</SS7RT>
<AQPRT_1>84</AQPRT_1>
<L7RMSUOCT_01>80</L7RMSUOCT_01>
<L7RMSUOCT_02>80</L7RMSUOCT_02>
</ROW>
</ROWSET>
</OPERATION>
</XML>
I want its value in a variable like $CLLI =518, $COLLECTION_DATE = 06/04/20 00:45:00, SS7RT = 99..
so that I can use these values further to write an insert query.
Basically I want to load this .xml data into a database table.
This is what I tried:
read_xml.sh
awk 'NF==1 && (/ +<[a-zA-Z]+>/ || /^<[a-zA-Z]+>/ || / +<\/[a-zA-Z]+>/){
next
}
{
sub(/^ +/,"")
gsub(/\"|<|>/,"",$0);
sub(/\/.*/,"");
if($0){
print
}
}
' qhr2400.xml
Output
OPERATION type=1
CLLI5018
COLLECTION_DATE06
SS7RT99
AQPRT_184
L7RMSUOCT_0180
L7RMSUOCT_0280
Any help is appreciated.
Thanks!
Don't parse XML/HTML with regex, use a proper XML/HTML parser and a powerful xpath query.
Theory:
According to formal language theory, XML/HTML can't be parsed with regular expressions, which are equivalent to finite state machines. Because XML/HTML is hierarchical (elements nest to arbitrary depth), you need at least a pushdown automaton, for example a parser built from an LALR grammar with a tool like YACC.
Check this thread too, why-its-not-possible-to-use-regex-to-parse-html-xml
realLife©®™ everyday tool in a shell :
You can use one of the following :
xmllint: often installed by default with libxml2, xpath1
xmlstarlet: can edit, select, transform... Not installed by default, xpath1
xpath: installed via perl's module XML::XPath, xpath1
xidel: xpath3
saxon-lint: my own project, wrapper over @Michael Kay's Saxon-HE Java library, xpath3
or you can use high level languages and proper libs, I think of :
python's lxml (from lxml import etree)
perl's XML::LibXML, XML::XPath, XML::Twig::XPath, HTML::TreeBuilder::XPath
ruby nokogiri, check this example
php DOMXpath, check this example
Check: Using regular expressions with HTML tags
For this, you need an XML parser and an xpath query in your shell, see:
$ xidel -se '//CLLI/text()' file.xml
Once your XML is fixed (opening/closing tag mismatch: TABLENANE/TABLENAME):
xmllint --xpath '//CLLI/text()' file
This command is installed with libxml2 and is far from exotic, because it's installed by default on many Linux distros.
Output
518
So now, you can retrieve all wanted values in shell variables, one example:
$ collectiondate=$(xidel -se '//COLLECTION_DATE/text()' file)
$ echo "$collectiondate"
But, please, don't use awk nor regex to parse XML.
There are other tools, check:
How to execute XPath one-liners from shell?
Check too: Using regular expressions with HTML tags (same thing for XML)
Going further
declare -A arr
for i in CLLI COLLECTION_DATE SS7RT; do
read arr[$i] < <(xmllint --xpath "//$i/text()" file.xml)
done
Now you have an associative array with CLLI COLLECTION_DATE SS7RT keys:
Keys:
printf '%s\n' "${!arr[@]}"
CLLI
SS7RT
COLLECTION_DATE
Values:
$ printf '%s\n' "${arr[@]}"
518
99
06/04/20 00:45:00
for COLLECTION_DATE:
$ echo "${arr[COLLECTION_DATE]}"
06/04/20 00:45:00
It's possible to feed a numeric array in one line too:
readarray a < <(xidel -se '//*[self::CLLI or self::COLLECTION_DATE or self::SS7RT]/text()' file.xml)
I want its value in a variable like $CLLI =518, $COLLECTION_DATE = 06/04/20 00:45:00, SS7RT = 99.. so that I can use these values further to write an insert query.
I'm going to interpret this as: you want every child node of the "ROW" node, and its value, exported as a variable.
As "Gilles Quenot" already mentioned, please don't parse xml with regex. I'd suggest you give xidel a try.
You could do it manually and call xidel for each and every node...
CLLI=$(xidel -s qhr2400.xml -e '//CLLI')
COLLECTION_DATE=$(xidel -s qhr2400.xml -e '//COLLECTION_DATE')
[...]
...but xidel itself can also export variables, multiple at once even:
#multiple queries, multiple declarations:
xidel -s qhr2400.xml -e 'CLLI:=//CLLI' -e 'COLLECTION_DATE:=//COLLECTION_DATE' -e '[...]' --output-format=bash
#or one query, multiple declarations:
xidel -s qhr2400.xml -e 'CLLI:=//CLLI,COLLECTION_DATE:=//COLLECTION_DATE,[...]' --output-format=bash
CLLI='518'
COLLECTION_DATE='06/04/20 00:45:00'
[...]
The output is just a set of strings. To actually set these variables you have to use Bash's eval built-in command:
eval "$(xidel -s qhr2400.xml -e 'CLLI:=//CLLI,COLLECTION_DATE:=//COLLECTION_DATE,[...]' --output-format=bash)"
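A minimal sketch of what eval does with that kind of output. To keep it runnable without xidel, the hypothetical function fake_xidel_output just replays two of the lines shown above; in real use it would be the xidel command itself.

```shell
# Hypothetical stand-in for:
#   xidel -s qhr2400.xml -e '...' --output-format=bash
# It only replays the kind of output shown above.
fake_xidel_output() {
  printf '%s\n' "CLLI='518'" "COLLECTION_DATE='06/04/20 00:45:00'"
}

# Without eval the lines are only text; nothing has been assigned yet.
fake_xidel_output

# Quoting the command substitution keeps the newlines between assignments.
eval "$(fake_xidel_output)"
echo "CLLI=$CLLI"
echo "COLLECTION_DATE=$COLLECTION_DATE"
```

Keep in mind that eval runs whatever text it is given, so this pattern relies on the tool quoting the values properly.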
And finally, to do it fully automatically for every child node of the "ROW" node:
xidel -s qhr2400.xml -e '//ROW/*/name()'
CLLI
COLLECTION_DATE
SS7RT
AQPRT_1
L7RMSUOCT_01
L7RMSUOCT_02
xidel -s qhr2400.xml -e 'for $x in //ROW/*/name() return eval(x"//ROW/{$x}")'
518
06/04/20 00:45:00
99
84
80
80
xidel -s qhr2400.xml -e 'for $x in //ROW/*/name() return eval(x"{$x}:=//ROW/{$x}")[0]' --output-format=bash
CLLI='518'
COLLECTION_DATE='06/04/20 00:45:00'
SS7RT='99'
AQPRT_1='84'
L7RMSUOCT_01='80'
L7RMSUOCT_02='80'
eval "$(xidel -s qhr2400.xml -e 'for $x in //ROW/*/name() return eval(x"{$x}:=//ROW/{$x}")[0]' --output-format=bash)"
Another approach is to use XSLT (XSL Transformation)
Here is a fixed and indented version of the OP's XML file:
$ cat demo.xml
<XML>
<OPERATION type="1">
<TABLENAME>TABLE</TABLENAME>
<ROWSET>
<ROW>
<CLLI>518</CLLI>
<COLLECTION_DATE>06/04/20 00:45:00</COLLECTION_DATE>
<SS7RT>99</SS7RT>
<AQPRT_1>84</AQPRT_1>
<L7RMSUOCT_01>80</L7RMSUOCT_01>
<L7RMSUOCT_02>80</L7RMSUOCT_02>
</ROW>
</ROWSET>
</OPERATION>
</XML>
This is the stylesheet I will use:
$ cat demo.xsl
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text" encoding="utf-8" />
<xsl:strip-space elements="*"/>
<xsl:template match="ROW">
<xsl:text>CLLI="</xsl:text><xsl:value-of select="CLLI"/><xsl:text>" </xsl:text>
<xsl:text>COLLECTION_DATE="</xsl:text><xsl:value-of select="COLLECTION_DATE"/><xsl:text>" </xsl:text>
<xsl:text>SS7RT="</xsl:text><xsl:value-of select="SS7RT"/><xsl:text>" </xsl:text>
<xsl:text>AQPRT_1="</xsl:text><xsl:value-of select="AQPRT_1"/><xsl:text>" </xsl:text>
<xsl:text>L7RMSUOCT_01="</xsl:text><xsl:value-of select="L7RMSUOCT_01"/><xsl:text>" </xsl:text>
<xsl:text>L7RMSUOCT_02="</xsl:text><xsl:value-of select="L7RMSUOCT_02"/><xsl:text>" </xsl:text>
</xsl:template>
<xsl:template match="text()"/>
</xsl:stylesheet>
Here is a simple shell script which uses xsltproc to transform demo.xml into text suitable as input to eval, in order to create shell variables for the required element values.
$ cat demo.sh
#!/bin/bash
eval "$(xsltproc demo.xsl demo.xml)"
echo "CLLI: $CLLI"
echo "COLLECTION_DATE: $COLLECTION_DATE"
echo "SS7RT: $SS7RT"
echo "AQPRT_1: $AQPRT_1"
echo "L7RMSUOCT_01: $L7RMSUOCT_01"
echo "L7RMSUOCT_02: $L7RMSUOCT_02"
Run the script:
$ ./demo.sh
CLLI: 518
COLLECTION_DATE: 06/04/20 00:45:00
SS7RT: 99
AQPRT_1: 84
L7RMSUOCT_01: 80
L7RMSUOCT_02: 80
$
read_xml.sh
gawk '
BEGIN {
FS="<|>"
}
// {
{
if($3 ~ /[0-9]/) { vars[$2] = $3; next }
}
}
END {
print vars["CLLI"]
print vars["SS7RT"]
print vars["COLLECTION_DATE"]
# etc...
}
' qhr2400.xml
result:
518
99
06/04/20 00:45:00
Of course, instead of printing in END, you can use the values in the vars array for something else.
Rejecting AWK as an XML or HTML parser is unreasonable. AWK is great as a parser for any file, including damaged XML files. Using AWK requires more thought, but in return you don't need to install any exotic software. It is true that an XML file can be laid out so that AWK reads some lines incorrectly, but the same can be said about XML analysis tools.
EDIT:
To handle a layout problem in the XML file, namely a field split across several lines:
file qhr2400.xml contains:
<CLLI>
518
</CLLI>
instead of
<CLLI>518</CLLI>
call:
cat qhr2400.xml |tr -d '\n' |sed 's/ *//g' |sed 's/</\n</g' |awk -f readxml.awk
readxml.awk is now:
BEGIN {
FS="<|>"
}
// {
{
if($3 ~ /[0-9]/) { vars[$2] = $3; next }
}
}
END {
print vars["CLLI"]
print vars["SS7RT"]
print vars["COLLECTION_DATE"]
# etc...
}
the result is correct
EDIT2
For some time, there has been a worrying fashion for adding complexity instead of simplifying the environment. Using a ready-made additional tool is usually a quick solution and may tempt you with its simplicity. Unfortunately, it is not always possible to install a huge Perl, Python or Ruby environment, e.g. on an embedded system with 32 MB of flash; it is not always possible to compile even a smaller tool for your processor architecture; and company policy can rightly prohibit adding anything to the standard set. A standard tool also makes sense for one-time processing of a file. AWK, sed and tr are usually available, and then they are the only rescue. Also, parsing an XML file does not always mean extracting key-value pairs; it can be something completely different, e.g.
"ROW> <CLLI> 518 </CLLI> <COLLECTION", which makes ready-made XPath-based analysis tools useless. AWK is a programming language written specifically for parsing text files in a practically unlimited way, especially when combined with standard Unix tools.
However, if you have little experience, it is better to rely on ready-made solutions where possible.
I'm trying to select a single node using xmlstarlet from the following example XML:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="key.xsl" ?>
<tables>
<tableset>
<table name="table1">
<row>
<fld name="fileName">
<strval><![CDATA[/my/XYZ/file1]]></strval>
</fld>
<fld name="fileName">
<strval><![CDATA[/my/XYZ/file2]]></strval>
</fld>
<fld name="fileName">
<strval><![CDATA[/my/other/XYZ/file3]]></strval>
</fld>
<fld name="worksBecauseUnique">
<strval><![CDATA[/XYZ/unique]]></strval>
</fld>
</row>
</table>
</tableset>
</tables>
I'm trying to build an associative array in bash... How can I select a single node, or iterate over multiple nodes using xmlstarlet?
I'm trying something like the following so far which is not working:
xmlstarlet sel -t -v "//tables/tableset/table/row/fld[@name=\"fileName\"]/strval[0]" xmlfile.xml
Hoping to get "/my/XYZ/file1" however this is not working.
Answering the first part of your question, there's a simple mistake you're making:
strval[0]
needs to be
strval[1]
...to select the first instance, as XPath arrays are 1-indexed, not 0-indexed.
Now, when you want to select the second match inside your whole document, not inside the parent fld, that looks a bit different:
(//tables/tableset/table/row/fld[@name="fileName"]/strval)[2]
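The difference between the two forms is easy to check by counting matches. This is only a sketch: it assumes xmllint is installed (any XPath 1.0 tool should behave the same way) and uses a cut-down copy of the document in a temporary file.

```shell
# Sketch only: assumes xmllint is installed; the temp file is illustrative.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<row>
  <fld name="fileName"><strval>/my/XYZ/file1</strval></fld>
  <fld name="fileName"><strval>/my/XYZ/file2</strval></fld>
</row>
EOF

if command -v xmllint >/dev/null 2>&1; then
  # [0] never matches, because position() starts at 1.
  zero=$(xmllint --xpath 'count(//fld/strval[0])' "$tmp")
  # strval[1] is evaluated per parent fld, so every fld contributes one node.
  per_parent=$(xmllint --xpath 'count(//fld/strval[1])' "$tmp")
  # Parentheses make the index apply to the whole result set instead.
  second=$(xmllint --xpath 'string((//fld/strval)[2])' "$tmp")
  echo "zero=$zero per_parent=$per_parent second=$second"
fi
rm -f "$tmp"
```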
Now on to populating a shell array. Since your content here doesn't contain newlines:
query='//tables/tableset/table/row/fld[@name="fileName"]/strval'
fileNames=( )
while IFS= read -r entry; do
fileNames+=( "$entry" )
done < <(xmlstarlet sel -t -v "$query" -n xmlfile.xml)
# print results
printf 'Extracted filename: %q\n' "${fileNames[@]}"
You aren't giving enough detail to set up an associative array (how do you want to establish the keys?), so I'm doing this as a simple indexed one.
On the other hand, if we make some assumptions (that you want your associative array to map from the @name key to the strval value, and that you want newlines to separate multiple values given for the same key), then it might look like this:
match='//tables/tableset/table/row/fld[@name][strval]'
key_query='./@name'
value_query='./strval'
declare -A content=( )
while IFS= read -r key && IFS= read -r value; do
if [[ ${content[$key]} ]]; then
# appending to existing value
content[$key]+=$'\n'"$value"
else
# first value for this key
content[$key]="$value"
fi
done < <(xmlstarlet sel \
-t -m "$match" \
-v "$key_query" -n \
-v "$value_query" -n xmlfile.xml)
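Here is a runnable sketch of the append-or-create logic on its own. It needs bash 4 or later for associative arrays. The hypothetical function sample_pairs stands in for the xmlstarlet call and prints alternating key and value lines, mirroring the XML in the question.

```shell
#!/usr/bin/env bash
# Requires bash 4+ for associative arrays.
# Hypothetical stand-in for the xmlstarlet call: alternating key/value lines.
sample_pairs() {
  printf '%s\n' \
    fileName /my/XYZ/file1 \
    fileName /my/XYZ/file2 \
    worksBecauseUnique /XYZ/unique
}

declare -A content=( )
while IFS= read -r key && IFS= read -r value; do
  if [[ ${content[$key]+set} ]]; then
    # key seen before: append on a new line
    content[$key]+=$'\n'"$value"
  else
    # first value for this key
    content[$key]=$value
  fi
done < <(sample_pairs)

# %q makes the embedded newline visible in the output
for key in "${!content[@]}"; do
  printf '%s => %q\n' "$key" "${content[$key]}"
done
```

The `${content[$key]+set}` test is a design choice in this sketch: it checks whether the key exists, so an empty first value is still treated as "already seen".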
I have this XML document. Could you possibly help me extract the values of all the items, using XMLStarlet, in a shell script?
<transfer-matrix.xml>
<transfers>
<rows>
<item>
<item>Hungary</item>
<item>Kharkov-KIPT-LCG2</item>
<item>9882899680</item>
<item>4</item>
<item>1</item>
</item>
<item>
<item>Spain</item>
<item>Kharkov-KIPT-LCG2</item>
<item>32945102817</item>
<item>12</item>
<item>2</item>
</item>
<item>
<item>Finland</item>
<item>Kharkov-KIPT-LCG2</item>
<item>10737418240</item>
<item>4</item>
<item>0</item>
</item>
<item>...</item>
<item>...</item>
<item>...</item>
</rows>
<key>...</key>
</transfers>
<params>...</params>
</transfer-matrix.xml>
I'm trying to extract the items this way:
outcome=`xml sel -T -t -m /transfer-matrix.xml/transfers/rows/item -s D:N:- "@item" -v "concat(@item,'|',item,'|',item,'|',item,'|',item,'|',item)" -n /usr/share/dashboard/xml/transfers-country.xml`
My output is:
|Hungary|Hungary|Hungary|Hungary|Hungary
|Spain|Spain|Spain|Spain|Spain
|Finland|Finland|Finland|Finland|Finland
I need format like this
|Hungary|Kharkov-KIPT-LCG2|9882899680|4|1
|Spain|Kharkov-KIPT-LCG2|32945102817|12|2
|Finland|Kharkov-KIPT-LCG2|10737418240|4|0
I would be grateful for the help
You need to specify which element you want and add a newline character at the end, like this:
OUTPUT=$(xmlstarlet sel -T -t -m /transfer-matrix.xml/transfers/rows/item -s D:N:- "@item" -v "concat(@item,'|',item[1],'|',item[2],'|',item[3],'|',item[4],'|',item[5],'\n')" transfers-country.xml)
And then you can get the desired result via echo -e:
$ echo -e "$OUTPUT"
|Hungary|Kharkov-KIPT-LCG2|9882899680|4|1
|Spain|Kharkov-KIPT-LCG2|32945102817|12|2
|Finland|Kharkov-KIPT-LCG2|10737418240|4|0
Edit: As npostavs points out, it would be much better to use the -n flag instead:
$ xmlstarlet sel -T -t -m /transfer-matrix.xml/transfers/rows/item -s D:N:- "@item" -v "concat(@item,'|',item[1],'|',item[2],'|',item[3],'|',item[4],'|',item[5])" -n transfers-country.xml
|Hungary|Kharkov-KIPT-LCG2|9882899680|4|1
|Spain|Kharkov-KIPT-LCG2|32945102817|12|2
|Finland|Kharkov-KIPT-LCG2|10737418240|4|0
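The reason the unindexed version repeats the first value is that, in XPath 1.0, converting a node-set to a string uses only the first node in document order. A small sketch, assuming xmllint is installed and using a cut-down document with a hypothetical rows root element:

```shell
# Sketch only: assumes xmllint is installed; the temp file is illustrative.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<rows>
  <item><item>Hungary</item><item>Kharkov-KIPT-LCG2</item><item>4</item></item>
</rows>
EOF

if command -v xmllint >/dev/null 2>&1; then
  # Each bare "item" node-set collapses to its first node when stringified.
  repeated=$(xmllint --xpath 'concat(/rows/item/item, "|", /rows/item/item)' "$tmp")
  # Positional predicates pick a different child each time.
  indexed=$(xmllint --xpath 'concat(/rows/item/item[1], "|", /rows/item/item[2])' "$tmp")
  echo "$repeated"
  echo "$indexed"
fi
rm -f "$tmp"
```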