Xpath attribute and text - xpath

I am studying for an exam and can't quite figure out what I am doing wrong here.
I have this XML:
<?xml version="1.0"?>
<schema xmlns=""
xmlns:xsi="link-2"
xsi:schemeLocation="link-3">
<wm-stats>
<wm jahr="2014">
<teilnehmer platz="1">Deutschland</teilnehmer>
<teilnehmer platz="2">Argentinien</teilnehmer>
<teilnehmer platz="3">Niederlande</teilnehmer>
</wm>
<wm jahr="2010">
<teilnehmer platz="1">Spanien</teilnehmer>
<teilnehmer platz="2">Holland</teilnehmer>
<teilnehmer platz="3">Deutschland</teilnehmer>
</wm>
<wm jahr="2006">
<teilnehmer platz="1">Italien</teilnehmer>
<teilnehmer platz="2">Frankreich</teilnehmer>
<teilnehmer platz="3">Deutschland</teilnehmer>
</wm>
<record name="Rekordtorschütze">
<person> Miroslav Klose </person> hat in Brasilien ...
</record>
<record name="Rekordweltmeisterschaften">
<ort> Brasilien </ort> ist mit 5 Weltmeistersiegen ...
</record>
</wm-stats>
</schema>
I now need to find all the years in which Holland took part in the championship. I know that I have to look for something like this: //wm[@jahr]/teilnehmer[text()="Holland"]
But how do I get the value of jahr now? The correct node to locate would be jahr 2010.

Go the other way around:
//wm[teilnehmer = "Holland"]/@jahr
But your approach is salvageable, too:
//wm[@jahr]/teilnehmer[. = "Holland"]/../@jahr
* Note that [@jahr] is actually superfluous in this expression.
You can always navigate upwards (and sideways) in XPath.
Have a look at this comprehensive image explaining the various XPath axes available for navigation: https://our.umbraco.org/wiki/reference/xslt/xpath-axes-and-their-shortcuts/
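For readers who want to try the "other way around" idea outside an XPath tool, here is a minimal sketch using Python's standard xml.etree.ElementTree. It assumes a trimmed copy of the question's document, and the helper name years_with is made up for illustration. ElementTree only implements a subset of XPath and cannot return attribute nodes, so the sketch selects the matching wm elements and then reads jahr from each one.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's document (only some <wm> elements are kept).
XML = """<wm-stats>
  <wm jahr="2014">
    <teilnehmer platz="1">Deutschland</teilnehmer>
    <teilnehmer platz="2">Argentinien</teilnehmer>
  </wm>
  <wm jahr="2010">
    <teilnehmer platz="1">Spanien</teilnehmer>
    <teilnehmer platz="2">Holland</teilnehmer>
  </wm>
</wm-stats>"""


def years_with(root, team):
    """Return the jahr attribute of every <wm> that has `team` as a teilnehmer."""
    # Equivalent in spirit to //wm[teilnehmer = "Holland"]/@jahr:
    # select the <wm> elements first, then read the attribute with .get().
    matches = root.findall(f".//wm[teilnehmer='{team}']")
    return [wm.get("jahr") for wm in matches]


root = ET.fromstring(XML)
print(years_with(root, "Holland"))  # ['2010']
```

A full XPath engine would let you evaluate the /@jahr step directly; the two-step form here is only a workaround for ElementTree's limited syntax.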

Related

XPATH get default value when node is empty or not present

I have 3 types of data
<results>
<place>
<key>place</key>
<value>1</value>
</place>
</results>
OR
<results>
<place>
<key>place</key> // notice the missing value
</place>
</results>
OR
<results>
</results>
So my sample data will be like
<event>
<results>
<place>
<key>place</key>
<value>1</value>
</place>
<some additional data here>
</results>
</event>
<event>
<results>
<place>
<key>place</key>
</place>
<some additional data here>
</results>
</event>
<event>
<results>
<some additional data here>
</results>
</event>
I need an XPath expression that returns the <value> of <place> when it is present, and a default value when <value> is empty or missing. <place> itself can also be missing, as in my third sample.
The output I expect here is 1, <default-value>, <default-value>.
An XPath 2.0 solution will work as well. I have tried scouring Stack Overflow and Google but couldn't find anything.
Use:
//results/concat(place/value, for $r in . return 'default-value'[not($r/place/value)])
XSLT - based verification:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:sequence select=
"//results/concat(place/value, for $r in . return 'default-value'[not($r/place/value)])"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the provided (and completed) XML document:
<t>
<event>
<results>
<place>
<key>place</key>
<value>1</value>
</place>
<x/>
</results>
</event>
<event>
<results>
<place>
<key>place</key>
</place>
<y/>
</results>
</event>
<event>
<results>
<z/>
</results>
</event>
</t>
the XPath expression is evaluated and its results are copied to the output:
1 default-value default-value
I finally got it working after a lot of trial and error.
{xpath::/events/event/(results//(place|rank)/value/string(), '')[1]}
The trick was to go one level up, i.e. to <results> in my case, and then use the (value if present, default-value)[1] XPath notation.
Earlier, I was unsuccessfully trying this:
{xpath::/events/event/results//((place|rank)/value/string(), '')[1]}
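The same "value if present, otherwise a default" logic can be sketched in Python with the standard xml.etree.ElementTree module. This is only an illustration under assumptions: the sample is a condensed version of the three cases from the question, and place_values is a hypothetical helper name. It iterates per results element, which mirrors the "go one level up" trick described above.

```python
import xml.etree.ElementTree as ET

# Condensed version of the three cases: value present, value missing, place missing.
XML = """<t>
  <event><results><place><key>place</key><value>1</value></place></results></event>
  <event><results><place><key>place</key></place></results></event>
  <event><results/></event>
</t>"""


def place_values(root, default="default-value"):
    """Return one entry per <results>: place/value text, or `default`."""
    values = []
    for results in root.iter("results"):
        # findtext returns None when nothing matches and "" for an empty
        # element; both cases fall back to the default here.
        text = results.findtext("place/value")
        values.append(text.strip() if text and text.strip() else default)
    return values


root = ET.fromstring(XML)
print(place_values(root))  # ['1', 'default-value', 'default-value']
```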

XPath results based on two nodes

I have XML that has a lot of duplicated values. I'd like to select all the rows with a specific section ("sec") and section tag ("sec_tag"), but I can't seem to get the XPath correct.
Here's a small snippet of the XML:
<root>
<record>
<sec>5</sec>
<sec_tag>919</sec_tag>
<nested_tag>
<info>Info</info>
<types>
<type>1</type>
<type>2</type>
<type>3</type>
</types>
</nested_tag>
<flags>00000000</flags>
</record>
<record>
<sec>5</sec>
<sec_tag>930</sec_tag>
<nested_tag>
<info>Info</info>
<types>
<type>1</type>
<type>2</type>
<type>3</type>
</types>
</nested_tag>
<flags>00000000</flags>
</record>
<record>
<sec>7</sec>
<sec_tag>919</sec_tag>
<nested_tag>
<info>Info</info>
<types>
<type>1</type>
<type>2</type>
<type>3</type>
</types>
</nested_tag>
<flags>00000000</flags>
</record>
</root>
I want the node that has <sec>5</sec> and <sec_tag>919</sec_tag>.
I tried something like this:
//sec[text(), "5"] and //sec_tag[text(), "919"]
Obviously that's not the correct syntax there, I just need to find the correct XPath expression.
You can use the following XPath expression to return the record elements whose child sec equals 5 and whose child sec_tag equals 919:
//record[sec = 5 and sec_tag = 919]
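As a rough way to check this kind of two-condition filter, the sketch below uses Python's standard xml.etree.ElementTree on a shortened version of the sample (the nested_tag content is omitted). ElementTree's XPath subset has no "and" operator, so the sketch chains two predicates instead, which has the same effect; find_records is a name invented for this example.

```python
import xml.etree.ElementTree as ET

# Shortened version of the question's sample; nested_tag is left out.
XML = """<root>
  <record><sec>5</sec><sec_tag>919</sec_tag><flags>00000000</flags></record>
  <record><sec>5</sec><sec_tag>930</sec_tag><flags>00000000</flags></record>
  <record><sec>7</sec><sec_tag>919</sec_tag><flags>00000000</flags></record>
</root>"""


def find_records(root, sec, sec_tag):
    """Return <record> elements whose sec and sec_tag children both match."""
    # Chained predicates filter one after another, so together they act
    # like "sec = ... and sec_tag = ..." in full XPath.
    return root.findall(f"./record[sec='{sec}'][sec_tag='{sec_tag}']")


root = ET.fromstring(XML)
matches = find_records(root, 5, 919)
print(len(matches))  # 1
```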

xpath expression to compare and evaluate value based on condition

<School>
<Child_One>
<Subject>
<name>computers</name>
<marks>55</marks>
<name>mathematics</name>
<marks>44</marks>
</Subject>
<Child_One>
<Child_Two>
<name>computers</name>
<marks>66</marks>
<name>mathematics</name>
<marks>77</marks>
</Child_Two>
</School>
Can anybody help me find the name of the subject in which Child_One got the highest marks?
Thanks
First of all, a few formatting things:
Your XML is not well-formed. Every start tag needs a matching end tag.
I believe the Subject element should look different than posted.
When posting input XML, don't use backticks; indent the XML with 4 spaces so that it is formatted properly on Stack Overflow.
I changed the input XML to this:
<?xml version="1.0" encoding="UTF-8"?>
<School>
<Child_One>
<Subject>
<name>computers</name>
<marks>55</marks>
</Subject>
<Subject>
<name>mathematics</name>
<marks>44</marks>
</Subject>
</Child_One>
<Child_Two>
<Subject>
<name>computers</name>
<marks>66</marks>
</Subject>
<Subject>
<name>mathematics</name>
<marks>77</marks>
</Subject>
</Child_Two>
</School>
With XPath 2.0 you can use the following to find the max value:
/School/Child_One/Subject[marks = max(/School/Child_One/Subject/marks)]/name
With XPath 1.0 you can use the following (replace < with > to find the minimum instead):
/School/Child_One/Subject[not(marks < /School/Child_One/Subject/marks)][1]/name
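To sanity-check the result of either expression, the lookup can be reproduced in a few lines of Python with the standard xml.etree.ElementTree module. This is a sketch under assumptions: it uses the corrected XML from this answer, compares marks as integers, and best_subject is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# The corrected input from the answer, condensed onto fewer lines.
XML = """<School>
  <Child_One>
    <Subject><name>computers</name><marks>55</marks></Subject>
    <Subject><name>mathematics</name><marks>44</marks></Subject>
  </Child_One>
  <Child_Two>
    <Subject><name>computers</name><marks>66</marks></Subject>
    <Subject><name>mathematics</name><marks>77</marks></Subject>
  </Child_Two>
</School>"""


def best_subject(root, child):
    """Return the name of the Subject with the highest marks under `child`."""
    subjects = root.findall(f"./{child}/Subject")
    # Compare marks numerically; on a tie, max() keeps the first one,
    # which matches the [1] in the XPath 1.0 expression.
    best = max(subjects, key=lambda s: int(s.findtext("marks")))
    return best.findtext("name")


root = ET.fromstring(XML)
print(best_subject(root, "Child_One"))  # computers
```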

Replacing xml tags in BASH

I have a large collection of xml documents with a wide array of different tags in them. I need to change all tags of the form <foo> and turn them into tags of the form <field name="foo"> in a way that will also ignore the attributes of a given tag. That is, a tag of the form <foo id="bar"> should also be changed to the tag <field name="foo">.
In order for this transformation to work, I also need to distinguish between <foo> and </foo>, as </foo> must go to </field>.
I have played around with sed in a bash script, but to no avail.
Although sed is not ideal for this task (see comments; further reading: regular, context-free grammar and xml), it can be pressed into service. Try this one-liner:
sed -e 's/<\([^>\/\ ]*\)[^>]*>/<field name=\"\1\">/g' -e 's/<field name=\"\">/<\/field>/g' file
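The first expression rewrites every tag as <field name="firstWord">, where firstWord is the tag name up to the first space, slash or >. For an end tag the captured name is empty, so it becomes <field name="">. The second expression then turns every <field name=""> into </field>.
This solution prints everything to standard output. If you want to edit the file in place instead, try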
sed -i -e 's/<\([^>\/\ ]*\)[^>]*>/<field name=\"\1\">/g' -e 's/<field name=\"\">/<\/field>/g' file
This turns
<html>
<person>
but <person name="bob"> and <person name="tom"> would both become
</person>
into
<field name="html">
<field name="person">
but <field name="person"> and <field name="person"> would both become
</field>
Sed is the wrong tool for the job - a simple XSL Transform can do this much more reliably:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="foo">
<field name="foo">
<xsl:apply-templates/>
</field>
</xsl:template>
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()" />
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Note that unlike sed, it can handle short empty elements, newlines within tags (e.g. as produced by some tools), and just about anything that's well-formed XML. Here's my test file:
<?xml version="1.0"?>
<doc>
<section>
<foo>Plain foo, simple content</foo>
</section>
<foo attr="0">Foo with attr, with content
<bar/>
<foo attr="shorttag"/>
</foo>
<foo
attr="1"
>multiline</foo
>
<![CDATA[We mustn't transform <foo> in here!]]>
</doc>
which is transformed by the above (using xsltproc 16970175.xslt 16970175.xml) to:
<?xml version="1.0"?>
<doc>
<section>
<field name="foo">Plain foo, simple content</field>
</section>
<field name="foo">Foo with attr, with content
<bar/>
<field name="foo"/>
</field>
<field name="foo">multiline</field>
We mustn't transform <foo> in here!
</doc>
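If an XSLT processor is not at hand, a parser-based rename can also be sketched in Python with the standard xml.etree.ElementTree module. Note the assumptions: unlike the stylesheet above, which only matches foo, this sketch renames every element (as the question asks) and drops all original attributes; to_fields is a name made up for this example. ElementTree's default parser also discards comments, and CDATA sections come back as ordinary escaped text.

```python
import xml.etree.ElementTree as ET


def to_fields(xml_text):
    """Rename every element <x ...> to <field name="x">, dropping attributes."""
    root = ET.fromstring(xml_text)
    for el in root.iter():
        name = el.tag          # remember the original tag name
        el.attrib.clear()      # the question wants old attributes ignored
        el.tag = "field"
        el.set("name", name)
    return ET.tostring(root, encoding="unicode")


print(to_fields('<doc><foo id="bar">text</foo></doc>'))
# <field name="doc"><field name="foo">text</field></field>
```

Because the document is parsed rather than pattern-matched, end tags, multi-line tags and short empty elements need no special handling.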

traversing ruby map issues

I'm pulling the following XML from mediawiki API
<?xml version="1.0"?>
<api>
<query>
<pages>
<page pageid="309311" ns="0" title="Chenonetta jubata">
<images>
<im ns="6" title="File:Australian Wood Duck.jpg" />
<im ns="6" title="File:Australian Wood Duck Female.JPG" />
<im ns="6" title="File:Australian Wood Duck Male.JPG" />
...
</images>
</page>
</pages>
</query>
</api>
and reading it into a Ruby map using XmlSimple. The data I'm really trying to get is the image names from the images section, but when I attempt to go past the query level with
x= result['query']['pages']
puts x
I'm getting the following error:
in `[]': can't convert String into Integer (TypeError)
What am I doing wrong?
Thanks,
m
In the end I used Nokogiri, which allows XPath notation for traversing the XML tree.
e.g.
licenseinfo = results3.xpath("//api/query/pages/page/categories/cl/@title")
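For comparison, the same path-based traversal applied to the image titles from the question can be sketched in Python with the standard xml.etree.ElementTree module. The sample below keeps only the three im elements shown in the question (the elided ones are left out), and image_titles is a hypothetical helper name. ElementTree cannot select attribute nodes, so the title is read from each matched element.

```python
import xml.etree.ElementTree as ET

# Only the <im> elements visible in the question are included here.
XML = """<api>
  <query>
    <pages>
      <page pageid="309311" ns="0" title="Chenonetta jubata">
        <images>
          <im ns="6" title="File:Australian Wood Duck.jpg" />
          <im ns="6" title="File:Australian Wood Duck Female.JPG" />
          <im ns="6" title="File:Australian Wood Duck Male.JPG" />
        </images>
      </page>
    </pages>
  </query>
</api>"""


def image_titles(root):
    """Return the title attribute of every <im> under query/pages/page/images."""
    # Roughly /api/query/pages/page/images/im/@title, split into an element
    # path plus an attribute lookup.
    return [im.get("title") for im in root.findall("./query/pages/page/images/im")]


root = ET.fromstring(XML)
print(image_titles(root))
```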

Resources