How to get attribute value by another attribute with xmllint - xpath

I have an XML document like this:
<items>
<item id="1" name="CP_09550"/>
<item id="2" name="CP_09551"/>
<item id="3" name="CP_09552"/>
</items>
How can I get the id value of the item with a given name attribute (for example CP_09550) using xmllint?
Thanks

To fetch the value, wrap the XPath expression into a string(...) or number(...) function call:
xmllint --xpath 'string(/items/item[@name="CP_09550"]/@id)' test.xml
This will return exactly 1, so no need to further process the output in a script.
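For comparison, here is a minimal sketch of the same lookup in Python, assuming the standard library's xml.etree.ElementTree is acceptable instead of xmllint. The function name id_for_name is made up for illustration; the [@name='...'] predicate is part of the XPath subset that ElementTree supports.

```python
import xml.etree.ElementTree as ET

# Sample document from the question.
XML = """<items>
<item id="1" name="CP_09550"/>
<item id="2" name="CP_09551"/>
<item id="3" name="CP_09552"/>
</items>"""


def id_for_name(xml_text, name):
    """Return the id of the first <item> whose name attribute matches, or None.

    Hypothetical helper; assumes `name` contains no single quote.
    """
    root = ET.fromstring(xml_text)  # root is the <items> element
    item = root.find(f"item[@name='{name}']")
    return item.get("id") if item is not None else None


if __name__ == "__main__":
    print(id_for_name(XML, "CP_09550"))
```

Like the string(...) wrapper, this yields the bare attribute value rather than id="1".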

This XPath extracts the wanted ID:
/items/item[@name='CP_09550']/@id
If I execute this in xmllint from the prompt I need to escape the quotes:
xmllint --xpath /items/item[@name=\'CP_09550\']/@id test.xml

Related

Is it possible to use sed instead of Grep -oP to extract a word? [duplicate]

Sometimes I need to quickly extract some arbitrary data from XML files to put into a CSV format. What's your best practices for doing this in the Unix terminal? I would love some code examples, so for instance how can I get the following problem solved?
Example XML input:
<root>
<myel name="Foo" />
<myel name="Bar" />
</root>
My desired CSV output:
Foo,
Bar,
Peter's answer is correct, but it outputs a trailing line feed.
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text"/>
<xsl:template match="root">
<xsl:for-each select="myel">
<xsl:value-of select="@name"/>
<xsl:text>,</xsl:text>
<xsl:if test="not(position() = last())">
<xsl:text>
</xsl:text>
</xsl:if>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
Just run e.g.
xsltproc stylesheet.xsl source.xml
to generate the CSV results into standard output.
Use a command-line XSLT processor such as xsltproc, saxon or xalan to parse the XML and generate CSV. For your case, the stylesheet would be:
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="root">
<xsl:apply-templates select="myel"/>
</xsl:template>
<xsl:template match="myel">
<xsl:for-each select="@*">
<xsl:value-of select="."/>
<xsl:value-of select="','"/>
</xsl:for-each>
<xsl:text>
</xsl:text>
</xsl:template>
</xsl:stylesheet>
If you just want the name attributes of any element, here is a quick but incomplete solution.
(Your example text is in the file example)
grep "name" example | cut -d"\"" -f2 | xargs -I{} echo "{},"
XMLStarlet is a command line toolkit to query/edit/check/transform
XML documents (for more information, see XMLStarlet Command Line XML Toolkit)
No files to write, just pipe your file to xmlstarlet and apply an xpath filter.
cat file.xml | xml sel -t -m 'xpathExpression' -v 'elemName' -o 'literal' -v 'elname' -n
-m expression to match
-v value to print
-o literal string to output
-n newline
So for your case the XPath expression would be //myel/@name
which would provide the two attribute values.
Very handy tool.
Here's a little Ruby script that does exactly what your question asks (pull an attribute called 'name' out of elements called 'myel'). It should be easy to generalize.
#!/usr/bin/ruby -w
require 'rexml/document'
xml = REXML::Document.new(File.open(ARGV[0].to_s))
xml.elements.each("//myel") { |el| puts "#{el.attributes['name']}," if el.attributes['name'] }
Using xidel:
xidel -s input.xml -e '//myel/concat(@name,",")'
Answering the original question, assuming the XML file is "test.xml" and contains:
<root>
<myel name="Foo" />
<myel name="Bar" />
</root>
tr -s "\"" " " < test.xml | awk '{printf "%s,\n", $3}'
Your test file is in test.xml.
sed -n 's/^\s*<myel\s*name="\([^"]*\)".*$/\1,/p' test.xml
It has its pitfalls; for example if it is not strictly given that each myel is on one line you have to "normalize" the XML file first (so each myel is on a separate line).
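To illustrate how a real XML parser sidesteps that pitfall, here is a small sketch in Python using the standard library's xml.etree.ElementTree (an assumption; none of the answers above use Python). The helper name names_as_csv_lines is hypothetical, and the sample deliberately splits one myel element across two lines.

```python
import xml.etree.ElementTree as ET

# Same data as the question, but the first <myel> spans two lines,
# which would defeat the line-oriented grep/sed/awk approaches.
XML = """<root>
<myel
    name="Foo" />
<myel name="Bar" />
</root>"""


def names_as_csv_lines(xml_text):
    """Return one 'value,' string per <myel> element that has a name attribute."""
    root = ET.fromstring(xml_text)
    return [
        f"{el.get('name')},"
        for el in root.iter("myel")  # every <myel>, in document order
        if el.get("name") is not None
    ]


if __name__ == "__main__":
    print("\n".join(names_as_csv_lines(XML)))
```

Because the parser works on elements rather than lines, no normalization step is needed first.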
yq can be used for XML parsing.
It is a lightweight and portable command-line YAML processor and can also deal with XML.
The syntax is similar to jq
Input
<root>
<myel name="Foo" />
<myel name="Bar">
<mysubel>stairway to heaven</mysubel>
</myel>
</root>
usage example 1
yq e '.root.myel.0.+name' $INPUT (version >= 4.30: yq e '.root.myel.0.+@name' $INPUT)
Foo
usage example 2
yq has a nice builtin feature to make XML easily grep-able
yq --input-format xml --output-format props $INPUT
root.myel.0.+name = Foo
root.myel.1.+name = Bar
root.myel.1.mysubel = stairway to heaven
usage example 3
yq can also convert an XML input into JSON or YAML
yq --input-format xml --output-format json $INPUT
{
"root": {
"myel": [
{
"+name": "Foo"
},
{
"+name": "Bar",
"mysubel": "stairway to heaven"
}
]
}
}
yq --input-format xml $FILE (YAML is the default format)
root:
myel:
- +name: Foo
- +name: Bar
mysubel: stairway to heaven

How to get attribute values of multiple nodes in xpath with just xmllint?

I want to query the names of all the persons in the test.xml below.
<body>
<person name="abc"></person>
<person name="def"></person>
<person name="ghi"></person>
</body>
basic query
This has the problem of including "name", which I don't want.
$ xmllint --xpath '//body/person/@name' test.xml
name="abc"
name="def"
name="ghi"
string function
Using the string function, I only get one result.
$ xmllint --xpath 'string(//body/person/@name)' test.xml
abc
sed and grep
This works but looks needlessly complicated to me.
xmllint --xpath '//body/person/@name' test.xml | grep -o '"\([^"]*\)"' | sed 's|"||g'
abc
def
ghi
Question
Is it possible to get multiple values without the attribute name and without using another tool like grep?
I don't know about xmllint, but xmlstarlet can do it:
xmlstarlet sel -t -v 'body/person/@name' test.xml
Output:
abc
def
ghi
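If a scripting language is an option, the same list can be produced without any post-processing. The following is only a sketch, assuming Python's standard xml.etree.ElementTree; person_names is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Sample document from the question.
XML = """<body>
<person name="abc"></person>
<person name="def"></person>
<person name="ghi"></person>
</body>"""


def person_names(xml_text):
    """Return the name attribute of every <person> child of the root element."""
    root = ET.fromstring(xml_text)  # root is <body>
    # findall returns all matches; XPath's string() only converts the
    # first node of a node-set, which is why the question saw just "abc".
    return [p.get("name") for p in root.findall("person")]


if __name__ == "__main__":
    for name in person_names(XML):
        print(name)
```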

Combine results from multiple xpaths in xmllint

I'd like to return the results from multiple --xpath phrases. I have tried
xmllint --xpath '//ItemGroup/Content/@Include' --xpath '//ItemGroup/None/@Include' --xpath '//Compile/@Include' -
This returns only the result of the last xpath.
I have tried the same thing using concat() however that returns only one match from each xpath:
xmllint --xpath "concat(concat(//ItemGroup/None/@Include,' ', //ItemGroup/Content/@Include), ' ', //ItemGroup/Compile/@Include)" -
You should use the union operator | like this:
xmllint --xpath '//ItemGroup/Content/@Include | //ItemGroup/None/@Include | //Compile/@Include' input.xml
With a sample XML like
<?xml version="1.0" encoding="UTF-8"?>
<class>
<ItemGroup>
<Content Include="First Item"> aaa </Content>
<None Include="Third Item"> bbb </None>
<Content Include="Second Item"> aaa </Content>
</ItemGroup>
<parent>
<Compile Include="Compiling is great"> aaa </Compile>
<sub2> bbb </sub2>
</parent>
</class>
the output is:
Include="First Item" Include="Third Item" Include="Second Item" Include="Compiling is great"
This works with XPath 1.0.
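As a rough illustration of what the union does, here is a sketch in Python. It assumes the standard library's xml.etree.ElementTree, whose limited XPath subset has no | operator, so the union is emulated by collecting the matches of each path and then walking the tree once to restore document order. The names PATHS and union_includes are hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample document from the answer (XML declaration omitted).
XML = """<class>
<ItemGroup>
<Content Include="First Item"> aaa </Content>
<None Include="Third Item"> bbb </None>
<Content Include="Second Item"> aaa </Content>
</ItemGroup>
<parent>
<Compile Include="Compiling is great"> aaa </Compile>
<sub2> bbb </sub2>
</parent>
</class>"""

# ElementTree equivalents of the three element paths in the union.
PATHS = (".//ItemGroup/Content", ".//ItemGroup/None", ".//Compile")


def union_includes(xml_text, paths=PATHS):
    """Return the Include attribute of every element matched by any path."""
    root = ET.fromstring(xml_text)
    # Set semantics: an element matched by several paths is counted once.
    matched = {id(el) for path in paths for el in root.findall(path)}
    # A single pass over the tree yields the results in document order.
    return [
        el.get("Include")
        for el in root.iter()
        if id(el) in matched and el.get("Include") is not None
    ]


if __name__ == "__main__":
    print("\n".join(union_includes(XML)))
```

The values come back in the same order as the xmllint output above, but without the Include="..." wrapper.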

Look for more than one value using xmllint

I need to retrieve more than one value from several XML blocks inside an XML file. How can I use xmllint to do this?
I noticed this solution (xml_grep get attribute from element) and tried to extend it. Unfortunately without any luck so far.
xmllint --xpath 'string(//identity/@name @placeofbirth @photo)' file.xml
Example XML file:
<eid>
<identity>
<name>Menten</name>
<firstname>Kasper</firstname>
<middlenames>Marie J</middlenames>
<nationality>Belg</nationality>
<placeofbirth>Sint-Truiden</placeofbirth>
<photo>base64-string</photo>
</identity>
<identity>
<name>Herbal</name>
<firstname>Jane</firstname>
<middlenames>Helena</middlenames>
<nationality>Frans</nationality>
<placeofbirth>Paris</placeofbirth>
<photo>notavailable</photo>
</identity>
</eid>
Output wanted
Kasper, Sint-Truiden, base64-string
Jane, Paris, notavailable
One way to do that is
# Read xml into variable
xmlStr=$(cat test.xml)
# Count identity nodes
nodeCount=$(echo "$xmlStr" | xmllint --xpath "count(//identity)" -)
# Iterate the nodeset by index
for i in $(seq 1 $nodeCount);do
echo "$xmlStr" | xmllint --xpath "concat((//identity)[$i]/name,', ',(//identity)[$i]/placeofbirth, ', ', (//identity)[$i]/photo)" - ; echo
done
Result:
Menten, Sint-Truiden, base64-string
Herbal, Paris, notavailable
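The loop above starts xmllint once per identity node. As an alternative sketch, assuming Python and its standard xml.etree.ElementTree are available, the document can be parsed once and each record joined in a single pass. The helper identity_rows and its fields parameter are invented for this example, and the sample is trimmed to the fields that matter.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the sample file from the question.
XML = """<eid>
<identity>
<name>Menten</name>
<firstname>Kasper</firstname>
<placeofbirth>Sint-Truiden</placeofbirth>
<photo>base64-string</photo>
</identity>
<identity>
<name>Herbal</name>
<firstname>Jane</firstname>
<placeofbirth>Paris</placeofbirth>
<photo>notavailable</photo>
</identity>
</eid>"""


def identity_rows(xml_text, fields=("name", "placeofbirth", "photo")):
    """Return one comma-separated string per <identity> element."""
    root = ET.fromstring(xml_text)
    rows = []
    for identity in root.iter("identity"):
        # findtext gives the text of the first matching child element,
        # or the default when that child is absent.
        values = [identity.findtext(field, default="") for field in fields]
        rows.append(", ".join(values))
    return rows


if __name__ == "__main__":
    print("\n".join(identity_rows(XML)))
```

With the default fields this reproduces the result shown above; passing fields=("firstname", "placeofbirth", "photo") gives the first names that the question's wanted output lists.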

xmllint only selects first element

I would like to extract the text of every <id> element with xmllint into a text file.
<rss>
<channel>
<item>
<id>111</id>
<description>text 1</description>
</item>
<item>
<id>222</id>
<description>text 2</description>
</item>
<item>
<id>333</id>
<description>text 3</description>
</item>
</channel>
</rss>
Each element should be on a separate line in the text file, like this:
111
222
333
I'm already getting stuck at selecting all elements. For some reason my xmllint command only returns the first element.
xmllint test.xml --xpath "string(//id)"
I've tried so many variations of that --xpath statement, but can't seem to figure it out.
(I don't know if it is relevant, but I'm using xmllint on OS X)
xmllint --shell test.xml <<< 'cat //id/text()' > out.txt
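If xmllint is not a hard requirement, a short Python sketch can produce the one-value-per-line output directly. It assumes the standard library's xml.etree.ElementTree; id_texts is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Sample feed from the question.
XML = """<rss>
<channel>
<item>
<id>111</id>
<description>text 1</description>
</item>
<item>
<id>222</id>
<description>text 2</description>
</item>
<item>
<id>333</id>
<description>text 3</description>
</item>
</channel>
</rss>"""


def id_texts(xml_text):
    """Return the text content of every <id> element, in document order."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter("id")]


if __name__ == "__main__":
    # Redirect stdout to a file (for example out.txt) to get the text file.
    print("\n".join(id_texts(XML)))
```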

Resources