Retrieve values of multiple attributes of an XML element using shell script

I have an XML file in the following format:
<products>
<product name="A" version="1" location="tmp"/>
<product name="B" version="1.2" location="tmp"/>
<product name="C" version="2" location="tmp"/>
</products>
I need the values of the name and version attributes for each product element. Below is the desired output:
Product Version
A 1
B 1.2
C 2
The command below gives me only the product names:
echo 'cat //products/product/@name' | xmllint --shell envinfo.xml | awk -F\" 'NR % 2 == 0 { print $2 }'
Output
A
B
C
Is there any way to get multiple attribute values from each element?
Thanks in advance.
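One way to do this with xmllint alone is sketched below, assuming libxml2's xmllint is installed and every `product` element carries both attributes. It counts the `product` elements, then asks for each attribute of the i-th element with XPath `string()`, which returns the bare value instead of `name="A"`. The function name `print_products` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the name and version of every <product> element, one per line.
# Assumes xmllint (libxml2) is installed; print_products is a hypothetical name.
print_products() {
    local file=$1 count i name version
    # count() evaluates to a number, which xmllint prints as plain text
    count=$(xmllint --xpath 'count(/products/product)' "$file") || return 1
    printf 'Product Version\n'
    for ((i = 1; i <= count; i++)); do
        # string(...) yields the bare attribute value, without name="..."
        name=$(xmllint --xpath "string(/products/product[$i]/@name)" "$file")
        version=$(xmllint --xpath "string(/products/product[$i]/@version)" "$file")
        printf '%s %s\n' "$name" "$version"
    done
}

# Usage: print_products envinfo.xml
```

This runs xmllint once per attribute per element, which is fine for small files; for large files a single XSLT or xmlstarlet pass is cheaper.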

Related

Is it possible to use sed instead of Grep -oP to extract a word? [duplicate]

Sometimes I need to quickly extract some arbitrary data from XML files to put into a CSV format. What are your best practices for doing this in the Unix terminal? I would love some code examples; for instance, how would you solve the following problem?
Example XML input:
<root>
<myel name="Foo" />
<myel name="Bar" />
</root>
My desired CSV output:
Foo,
Bar,
Peter's answer is correct, but it outputs a trailing line feed.
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text"/>
<xsl:template match="root">
<xsl:for-each select="myel">
<xsl:value-of select="@name"/>
<xsl:text>,</xsl:text>
<xsl:if test="not(position() = last())">
<xsl:text>
</xsl:text>
</xsl:if>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
Just run e.g.
xsltproc stylesheet.xsl source.xml
to generate the CSV results into standard output.
Use a command-line XSLT processor such as xsltproc, Saxon or Xalan to parse the XML and generate CSV. Here's an example stylesheet for your case:
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="root">
<xsl:apply-templates select="myel"/>
</xsl:template>
<xsl:template match="myel">
<xsl:for-each select="@*">
<xsl:value-of select="."/>
<xsl:value-of select="','"/>
</xsl:for-each>
<xsl:text>
</xsl:text>
</xsl:template>
</xsl:stylesheet>
If you just want the name attributes of any element, here is a quick but incomplete solution (assuming your example text is in the file example):
grep "name" example | cut -d'"' -f2 | xargs -I{} echo "{},"
XMLStarlet is a command-line toolkit to query/edit/check/transform
XML documents (for more information, see XMLStarlet Command Line XML Toolkit).
No files to write; just pipe your file to xmlstarlet and apply an XPath filter.
cat file.xml | xml sel -t -m 'xpathExpression' -v 'elemName' -o 'literal' -v 'elname' -n
-m  match expression
-v  value
-o  literal string
-n  newline
So for your XPath, the match expression would be //myel/@name (printed with -v '.'),
which would provide the two attribute values.
Very handy tool.
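Filled in for the example XML, the template above might look like the sketch below, assuming the binary is installed as `xmlstarlet` (the answer above calls it `xml`, which is its name on some installations). `-m` loops over each `myel`, `-v '@name'` prints the attribute relative to the matched element, `-o ','` appends the literal comma, `-n` ends the line, and `-T` selects plain-text output. The wrapper name `names_to_csv` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: emit "name," for every <myel> element, one per line.
# Assumes xmlstarlet is on PATH; names_to_csv is a hypothetical helper name.
names_to_csv() {
    # -T text output, -t starts a template, -m loops over matches,
    # -v prints an XPath value, -o prints a literal, -n prints a newline
    xmlstarlet sel -T -t -m '//myel' -v '@name' -o ',' -n "$1"
}

# Usage: names_to_csv file.xml
```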
Here's a little Ruby script that does exactly what your question asks (pull an attribute called 'name' out of elements called 'myel'). It should be easy to generalize.
#!/usr/bin/ruby -w
require 'rexml/document'
xml = REXML::Document.new(File.open(ARGV[0].to_s))
xml.elements.each("//myel") { |el| puts "#{el.attributes['name']}," if el.attributes['name'] }
Using xidel:
xidel -s input.xml -e '//myel/concat(@name,",")'
Answering the original question, assuming the XML file is "test.xml" and contains:
<root>
<myel name="Foo" />
<myel name="Bar" />
</root>
tr -s '"' ' ' < test.xml | awk '/<myel/ {printf "%s,\n", $3}'
Your test file is in test.xml.
sed -n 's/^\s*<myel\s*name="\([^"]*\)".*$/\1,/p' test.xml
It has its pitfalls; for example, if it is not guaranteed that each myel is on its own line, you have to "normalize" the XML file first (so that each myel is on a separate line).
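One way to do that normalization, assuming xmllint is available, is to pretty-print the file with `xmllint --format`, which puts each element of element-only content on its own line, and then apply the same kind of `sed` expression. POSIX `[[:space:]]` classes replace the GNU-only `\s`. It still assumes `name` is the first attribute of `myel`. The function name `myel_names` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print "name," for each <myel>, even when several share one line.
# Assumes xmllint (libxml2) is installed; myel_names is a hypothetical name.
myel_names() {
    # xmllint --format re-indents the document so each element starts a line;
    # sed then extracts the name attribute from each <myel ...> line
    xmllint --format "$1" |
        sed -n 's/^[[:space:]]*<myel[[:space:]][[:space:]]*name="\([^"]*\)".*$/\1,/p'
}

# Usage: myel_names test.xml
```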
yq can be used for XML parsing.
It is a lightweight and portable command-line YAML processor and can also deal with XML.
The syntax is similar to jq's.
Input
<root>
<myel name="Foo" />
<myel name="Bar">
<mysubel>stairway to heaven</mysubel>
</myel>
</root>
usage example 1
yq e '.root.myel.0.+name' "$INPUT" (version >= 4.30: yq e '.root.myel.0.+@name' "$INPUT")
Foo
usage example 2
yq has a nice built-in feature to make XML easily grep-able:
yq --input-format xml --output-format props $INPUT
root.myel.0.+name = Foo
root.myel.1.+name = Bar
root.myel.1.mysubel = stairway to heaven
usage example 3
yq can also convert an XML input into JSON or YAML
yq --input-format xml --output-format json $INPUT
{
  "root": {
    "myel": [
      {
        "+name": "Foo"
      },
      {
        "+name": "Bar",
        "mysubel": "stairway to heaven"
      }
    ]
  }
}
yq --input-format xml "$INPUT" (YAML is the default output format)
root:
  myel:
    - +name: Foo
    - +name: Bar
      mysubel: stairway to heaven

Unable to associate or group each set of XML attributes in a bash script

I have XML in the following format, with multiple occurrences of the same elements (name, code and format).
<?xml version="1.0" encoding="UTF-8"?>
<config>
<input>
<pattern>
<name>ABC</name>
<code>1234</code>
<format>txt</format>
</pattern>
</input>
<input>
<pattern>
<name>XYZ</name>
<code>7799</code>
<format>csv</format>
</pattern>
</input>
</config>
I want to parse each of these patterns, construct strings like ABC-1234-txt, XYZ-7799-csv, etc., and add them to an array. The idea here is to group each pattern by constructing a string that will be used further on.
I have tried the command below but am unable to maintain the grouping:
awk -F'</?name>|</?code>|</?format>' ' { print $2 } ' sample.xml
It simply prints the available values of these elements from the XML. As I am not an expert in bash, can anyone suggest how to group each pattern into a string in the format mentioned above?
With bash and xmlstarlet:
mapfile -t array < <(
xmlstarlet select \
--text --template --match '//config/input/pattern' \
--value-of "concat(name,'-',code,'-',format)" -n file.xml
)
declare -p array
Output:
declare -a array=([0]="ABC-1234-txt" [1]="XYZ-7799-csv")
See: help mapfile and xmlstarlet select
With XSLT:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" omit-xml-declaration="yes" indent="no"/>
<xsl:strip-space elements="*"/>
<xsl:template match="pattern">
<xsl:value-of select="concat(name,'-',code,'-',format,'&#10;')"/>
</xsl:template>
</xsl:stylesheet>
Apply the transform via xsltproc:
$ xsltproc example.xslt sample.xml
ABC-1234-txt
XYZ-7799-csv
Populate an array with the XSLT output:
$ declare -a my_array
$ my_array=($(xsltproc example.xslt sample.xml))
$ echo "${my_array[@]}"
ABC-1234-txt XYZ-7799-csv
$ echo "${my_array[1]}"
XYZ-7799-csv
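Because `my_array=($(...))` relies on word splitting and pathname expansion, a value containing a space or a `*` would be split or expanded. A sketch that stores exactly one array element per output line, assuming bash 4 or later (for `mapfile`) and xsltproc; the function name `load_patterns` is hypothetical and it fills a global `my_array`.

```shell
#!/usr/bin/env bash
# Sketch: fill an array with one element per line of xsltproc output.
# Assumes bash >= 4 (mapfile) and xsltproc; load_patterns is a hypothetical name.
load_patterns() {
    local stylesheet=$1 input=$2
    # -t strips the trailing newline from each line read;
    # my_array is not declared local, so it is set globally
    mapfile -t my_array < <(xsltproc "$stylesheet" "$input")
}

# Usage:
#   load_patterns example.xslt sample.xml
#   printf '%s\n' "${my_array[@]}"
```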

grep result gets separated by spaces when saving it to a variable

I need your help on this. I have a simple XML file that goes like this:
<Entity ID="12345" Record="1">
<Info>
<Type>Individual</Type>
<Name>Test</Name>
</Info>
<Entity Record="2">
<Info>
<Type>Individual</Type>
<Name>Test2</Name>
</Info>
What I want to do is grep the attributes and their values for each Entity node.
This is my code:
entities=($(grep -oP '(?<=<Entity ).*(?=>)' "abc.xml"))
for j in ${!entities[*]}
do
echo "${entities[$j]}"
((count++))
done
echo "Total Count: $count"
Output:
ID="12345"
Record="1"
Record="2"
Total Count: 3
However, my desired result is supposed to be:
ID="12345" Record="1"
Record="2"
Total Count: 2
When I save the grep result to a variable, it somehow gets split wherever there is a space. I wonder if anyone could help me with this. Thank you in advance.
I would strongly suggest using an XML parser; for example, you could use xmlstarlet.
Now, assuming this is your valid XML file:
<?xml version="1.0" encoding="utf-8"?>
<foo>
<Entity ID="12345" Record="1">
<Info>
<Type>Individual</Type>
<Name>Test</Name>
</Info>
</Entity>
<Entity ID="123456" Record="1">
<Info>
<Type>Individual</Type>
<Name>Test</Name>
</Info>
</Entity>
</foo>
To extract the fields, something like this could be a starting point:
xmlstarlet sel -T -t -m //Entity -o ID= -v "@ID" -o " Record=" -v "@Record" -n your.xml
This will print:
ID=12345 Record=1
ID=123456 Record=1
To count the number of elements:
xmlstarlet sel -t -c "count(//Entity)" your.xml
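These are just the basics, but I hope they give you an idea.
Your IFS is the problem: the default IFS splits on spaces, tabs and newlines, so set it to a newline only while filling the array: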
#!/bin/bash
ifs_ini="$IFS"
IFS=$'\n'
entities=( $(grep -oP '(?<=<Entity ).*(?=>)' "abc.xml") )
for j in ${!entities[@]}; do
echo "${entities[$j]}"
((count++))
done
echo "Total Count: $count"
IFS="$ifs_ini"
output:
ID="12345" Record="1"
Record="2"
Total Count: 2
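An alternative sketch that avoids changing `IFS` globally, assuming bash 4 or later: `mapfile -t` stores each line of grep's output as one array element, and the array length replaces the manual counter. It keeps the question's GNU `grep -P` pattern, so it inherits that pattern's limits (at most one `Entity` start tag per line). The function name `list_entities` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: one array element per <Entity ...> line, without touching IFS.
# Assumes bash >= 4 and GNU grep with -P; list_entities is a hypothetical name.
list_entities() {
    local -a entities
    # each output line of grep becomes exactly one array element
    mapfile -t entities < <(grep -oP '(?<=<Entity ).*(?=>)' "$1")
    if ((${#entities[@]} > 0)); then
        printf '%s\n' "${entities[@]}"
    fi
    echo "Total Count: ${#entities[@]}"
}

# Usage: list_entities abc.xml
```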

How to parse xml using xmllint and store in arrays

In a shell script, I have an XML file, p.xml, as follows, and I want to parse it and get the values into two arrays. I am trying to use xmllint but could not get the desired data.
<?xml version="1.0" encoding="UTF-8"?>
<Share_Collection>
<Share id="data/Backup" resource-id="data/Backup" resource-type="SimpleShare" share-name="Backup" protocols="cifs,afp"/>
<Share id="data/Documents" resource-id="data/Documents" resource-type="SimpleShare" share-name="Documents" protocols="cifs,afp"/>
<Share id="data/Music" resource-id="data/Music" resource-type="SimpleShare" share-name="Music" protocols="cifs,afp"/>
<Share id="data/OwnCloud" resource-id="data/OwnCloud" resource-type="SimpleShare" share-name="OwnCloud" protocols="cifs,afp"/>
<Share id="data/Pictures" resource-id="data/Pictures" resource-type="SimpleShare" share-name="Pictures" protocols="cifs,afp"/>
<Share id="data/Videos" resource-id="data/Videos" resource-type="SimpleShare" share-name="Videos" protocols="cifs,afp"/>
</Share_Collection>
I want one array with all share ids and one array containing the share names, so the two arrays would look like this:
share-ids-array = ["data/Backup", "data/Documents", "data/Music", "data/OwnCloud", "data/Pictures", "data/Videos"]
share-names-array = ["Backup", "Documents", "Music", "OwnCloud", "Pictures", "Videos"]
I started as follows:
xmllint --xpath '//Share/@id' p.xml
xmllint --xpath '//Share/@share-name' p.xml
That gives me:
id="data/Backup"
id="data/Documents" id="data/Music" id="data/OwnCloud" id="data/Pictures" id="data/Videos"
Any help to build those two arrays will be appreciated.
Here is one solution with grep (and tr); sed or awk are other alternatives. By the way, you cannot use hyphens in variable names in bash.
share_ids=($( xmllint --xpath '//Share/@id' p.xml | grep -Po '".*?"' | tr -d \" ))
share_names=($( xmllint --xpath '//Share/@share-name' p.xml | grep -Po '".*?"' | tr -d \" ))
Example:
$ echo "${share_names[@]}"
Backup Documents Music OwnCloud Pictures Videos
Using xmlstarlet is probably better, though:
share_names=($( xmlstarlet sel -T -t -m '//Share/@share-name' -v '.' -n p.xml ))
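A sketch that fills both arrays by matching the same `//Share` elements, assuming bash 4 or later and xmlstarlet. Printing one attribute value per line per `Share` keeps the two arrays index-aligned (a missing attribute yields an empty element rather than shifting later entries), and `mapfile -t` avoids the word splitting of `($(...))`, which would break a share name containing a space. The function name `load_shares` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: read share ids and share names into two index-aligned arrays.
# Assumes bash >= 4 and xmlstarlet; load_shares is a hypothetical name.
load_shares() {
    local file=$1
    # Both queries walk the same //Share elements in document order,
    # so share_ids[i] and share_names[i] describe the same share
    mapfile -t share_ids < <(xmlstarlet sel -T -t -m '//Share' -v '@id' -n "$file")
    mapfile -t share_names < <(xmlstarlet sel -T -t -m '//Share' -v '@share-name' -n "$file")
}

# Usage:
#   load_shares p.xml
#   echo "${share_ids[2]} -> ${share_names[2]}"   # data/Music -> Music
```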

Iterate through XML with xmlstarlet

I have the following XML:
<?xml version="1.0" encoding="UTF-8"?>
<test-report>
<testsuite>
<test name="RegisterConnection1Tests">
<testcase name="testRregisterConnection001"></testcase>
<testcase name="testRegisterConnection002"></testcase>
</test>
<test name="RegisterConnection2Tests">
<testcase name="testRregisterConnection001"></testcase>
<testcase name="testRegisterConnection002"></testcase>
</test>
</testsuite>
</test-report>
And I want the output:
RegisterConnection1Tests,testRregisterConnection001
RegisterConnection1Tests,testRegisterConnection002
RegisterConnection2Tests,testRregisterConnection001
RegisterConnection2Tests,testRegisterConnection002
I'm confused about how to show the children. I expected
xmlstarlet sel -t -m 'test-report/testsuite/test' -v '@name' -v '//testcase/@name' -n $1
to work, but it only outputs:
RegisterConnection1TeststestRregisterConnection001
RegisterConnection2TeststestRregisterConnection001
To add the missing comma, you can add another -v "','".
In your second column, you are selecting with an absolute XPath expression starting from the root element rather than from the element matched by the template; the double slashes are wrong. Since you want one line per testcase, I would iterate over the testcase elements and then add the name attribute of the parent element, like this:
xmlstarlet sel -t -m 'test-report/testsuite/test/testcase' -v '../@name' -v "','" -v '@name' -n "$1"
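The same result can be sketched with a single XPath `concat()` per matched testcase, assuming xmlstarlet is installed; `-T` selects plain-text output so nothing is XML-escaped. The function name `list_testcases` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: one "test,testcase" line per <testcase>, built with XPath concat().
# Assumes xmlstarlet is on PATH; list_testcases is a hypothetical name.
list_testcases() {
    # ../@name is the parent <test>'s name; @name is the testcase's own name
    xmlstarlet sel -T -t -m '/test-report/testsuite/test/testcase' \
        -v "concat(../@name, ',', @name)" -n "$1"
}

# Usage: list_testcases report.xml
```
<test name="RegisterConnection2Tests"><testcase name="testRregisterConnection001"></testcase><testcase name="testRegisterConnection002"></testcase></test>
</testsuite></test-report>
EOF
expected='RegisterConnection1Tests,testRregisterConnection001
RegisterConnection1Tests,testRegisterConnection002
RegisterConnection2Tests,testRregisterConnection001
RegisterConnection2Tests,testRegisterConnection002'
[ "$(list_testcases "$f")" = "$expected" ] || exit 1
rm -f "$f"
</test>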
