traversing ruby map issues - ruby

I'm pulling the following XML from the MediaWiki API:
<?xml version="1.0"?>
<api>
<query>
<pages>
<page pageid="309311" ns="0" title="Chenonetta jubata">
<images>
<im ns="6" title="File:Australian Wood Duck.jpg" />
<im ns="6" title="File:Australian Wood Duck Female.JPG" />
<im ns="6" title="File:Australian Wood Duck Male.JPG" />
...
</images>
</page>
</pages>
</query>
</api>
and reading it into a Ruby hash using XmlSimple. What I'm really trying to get is the image names from the images section, but when I attempt to go past the query level with
x= result['query']['pages']
puts x
I'm getting the following error:
in `[]': can't convert String into Integer (TypeError)
What am I doing wrong?
Thanks,
m

In the end I used Nokogiri, which lets you traverse the XML tree with XPath notation, e.g.
licenseinfo = results3.xpath("//api/query/pages/page/categories/cl/@title")
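For the image titles asked about in the question, the same XPath idea can be sketched as follows. This uses REXML from the standard Ruby distribution instead of Nokogiri so that it runs without extra gems, and it inlines a trimmed copy of the response (the `...` line is dropped); both are assumptions made for illustration. As for the original TypeError, a likely cause is XmlSimple's default ForceArray setting, which wraps child elements in arrays, so result['query'] would be an Array that needs an integer index first. The post does not show the options used, so treat that as a guess.

```ruby
require 'rexml/document'

# Trimmed copy of the API response from the question.
xml = <<~XML
  <?xml version="1.0"?>
  <api>
    <query>
      <pages>
        <page pageid="309311" ns="0" title="Chenonetta jubata">
          <images>
            <im ns="6" title="File:Australian Wood Duck.jpg" />
            <im ns="6" title="File:Australian Wood Duck Female.JPG" />
            <im ns="6" title="File:Australian Wood Duck Male.JPG" />
          </images>
        </page>
      </pages>
    </query>
  </api>
XML

doc = REXML::Document.new(xml)

# Select every <im> element under <images>, then read its title attribute.
image_elements = REXML::XPath.match(doc, '/api/query/pages/page/images/im')
image_titles = image_elements.map { |im| im.attributes['title'] }

puts image_titles
```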

Related

XPath results based on two nodes

I have XML that has a lot of duplicated values. I'd like to select all the rows with a specific section ("sec") and section tag ("sec_tag"), but I can't seem to get the XPath correct.
Here's a small snippet of the XML:
<root>
<record>
<sec>5</sec>
<sec_tag>919</sec_tag>
<nested_tag>
<info>Info</info>
<types>
<type>1</type>
<type>2</type>
<type>3</type>
</types>
</nested_tag>
<flags>00000000</flags>
</record>
<record>
<sec>5</sec>
<sec_tag>930</sec_tag>
<nested_tag>
<info>Info</info>
<types>
<type>1</type>
<type>2</type>
<type>3</type>
</types>
</nested_tag>
<flags>00000000</flags>
</record>
<record>
<sec>7</sec>
<sec_tag>919</sec_tag>
<nested_tag>
<info>Info</info>
<types>
<type>1</type>
<type>2</type>
<type>3</type>
</types>
</nested_tag>
<flags>00000000</flags>
</record>
</root>
I want the node that has <sec>5</sec> and <sec_tag>919</sec_tag>.
I tried something like this:
//sec[text(), "5"] and //sec_tag[text(), "919"]
Obviously that's not the correct syntax there, I just need to find the correct XPath expression.
The following XPath expression returns the record elements whose child sec equals 5 and whose child sec_tag equals 919:
//record[sec = 5 and sec_tag = 919]
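As a quick check, the expression can be run against a shortened copy of the sample data. This sketch uses Ruby's REXML, which is an assumption since the question does not name a language or library, and it drops the nested_tag and flags children for brevity.

```ruby
require 'rexml/document'

# Shortened sample: only the two children that the predicate tests.
xml = <<~XML
  <root>
    <record><sec>5</sec><sec_tag>919</sec_tag></record>
    <record><sec>5</sec><sec_tag>930</sec_tag></record>
    <record><sec>7</sec><sec_tag>919</sec_tag></record>
  </root>
XML

doc = REXML::Document.new(xml)

# Both conditions sit inside one predicate, so they are tested on the same <record>.
matches = REXML::XPath.match(doc, '//record[sec = 5 and sec_tag = 919]')

matches.each do |record|
  puts "sec=#{record.elements['sec'].text} sec_tag=#{record.elements['sec_tag'].text}"
end
```

Putting both tests in a single predicate is what ties them to one record; two separate //sec and //sec_tag paths, as in the attempt above, would be evaluated independently of each other.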

xpath expression to compare and evaluate value based on condition

<School>
<Child_One>
<Subject>
<name>computers</name>
<marks>55</marks>
<name>mathematics</name>
<marks>44</marks>
</Subject>
<Child_One>
<Child_Two>
<name>computers</name>
<marks>66</marks>
<name>mathematics</name>
<marks>77</marks>
</Child_Two>
</School>
Can anybody help me find the name of the subject in which Child_One got the highest marks?
Thanks
First of all, a few formatting points:
Your XML is not well-formed. Every start tag needs a matching end tag.
I believe the Subject element should look different than posted.
When posting input XML, don't use backticks; indent the XML with 4 spaces so that it formats well on Stack Overflow.
I changed the input XML to this:
<?xml version="1.0" encoding="UTF-8"?>
<School>
<Child_One>
<Subject>
<name>computers</name>
<marks>55</marks>
</Subject>
<Subject>
<name>mathematics</name>
<marks>44</marks>
</Subject>
</Child_One>
<Child_Two>
<Subject>
<name>computers</name>
<marks>66</marks>
</Subject>
<Subject>
<name>mathematics</name>
<marks>77</marks>
</Subject>
</Child_Two>
</School>
With XPath 2.0 you can use the following to find the max value:
/School/Child_One/Subject[marks = max(/School/Child_One/Subject/marks)]/name
With XPath 1.0 you can use the following (replace < with > to find minimum):
/School/Child_One/Subject[not(marks < /School/Child_One/Subject/marks)][1]/name
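If neither expression is convenient in a given toolchain, the comparison can also be done in the host language instead of in XPath. The sketch below is one such alternative, not the answer's method: it uses Ruby's REXML (an assumed choice), selects Child_One's Subject elements with a plain path, and picks the one with the largest marks using max_by.

```ruby
require 'rexml/document'

# The corrected input XML from the answer, compacted.
xml = <<~XML
  <School>
    <Child_One>
      <Subject><name>computers</name><marks>55</marks></Subject>
      <Subject><name>mathematics</name><marks>44</marks></Subject>
    </Child_One>
    <Child_Two>
      <Subject><name>computers</name><marks>66</marks></Subject>
      <Subject><name>mathematics</name><marks>77</marks></Subject>
    </Child_Two>
  </School>
XML

doc = REXML::Document.new(xml)

# Plain path selection; the "highest marks" logic lives in Ruby.
subjects = REXML::XPath.match(doc, '/School/Child_One/Subject')
best = subjects.max_by { |subject| subject.elements['marks'].text.to_i }
best_name = best.elements['name'].text

puts best_name
```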

XPath in Nokogiri returning empty array [] whereas I am expecting to have results

I am trying to parse XML files using Nokogiri, Ruby and XPath. I usually don't encounter any problem but with the following I can't make any xpath request:
doc = Nokogiri::HTML(open("myfile.xml"))
doc.xpath("//Meta").count
# result ==> 0
doc.xpath("//Meta")
# result ==> []
doc.xpath(".").count
# result ==> 1
Here is a simplified version of my XML file:
<Answer xmlns="test:com.test.search" context="hf%3D10%26target%3Dst0" last="0" estimated="false" nmatches="1" nslices="0" nhits="1" start="0">
<time>
...
</time>
<promoted>
...
</promoted>
<hits>
<Hit url="http://www.test.com/" source="test" collapsed="false" preferred="false" score="1254772" sort="0" mask="272" contentFp="4294967295" did="1287" slice="1">
<groups>
...
</groups>
<metas>
<Meta name="enligne">
<MetaString name="value">
</MetaString>
</Meta>
<Meta name="language">
<MetaString name="value">
fr
</MetaString>
</Meta>
<Meta name="text">
<MetaText name="value">
<TextSeg highlighted="false" highlightClass="0">
La
</TextSeg>
</MetaText>
</Meta>
</metas>
</Hit>
</hits>
<keywords>
...
</keywords>
<groups>
...
</groups>
How can I get all children of <Hit> from this XML?
Include the namespace information when calling xpath:
doc.xpath("//x:Meta", "x" => "test:com.test.search")
Alternatively, call the remove_namespaces! method on the document and then query without prefixes.
This is one of the most frequently asked XPath questions; search for "XPath default namespace".
If there is no way to register a namespace for the default namespace and use the registered prefix (say "x" in //x:Meta), then use:
//*[name() = 'Meta' and namespace-uri() = 'test:com.test.search']
If it is known that Meta can only belong to the default namespace, then the above can be shortened to:
//*[name() = 'Meta']

REXML fails to select from attribute. Bug or incorrect XPath?

I am trying to select an element from an SVG document by a specific attribute.
I set up a simple example:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<svg xmlns:svg="http://www.w3.org/2000/svg" xmlns="http://www.w3.org/2000/svg">
<g id='1'>
<path id='2' type='A'/>
<rect id='3' type='B'/>
</g>
</svg>
Now I use the following syntax to retrieve the path element by its attribute "type":
require 'rexml/document'
include REXML
xmlfile = File.new "xml_as_specified_above.svg"
xmldoc = Document.new(xmlfile)
XPath.match( xmldoc.root, "//path[@type]" )
The syntax is taken directly from http://www.w3schools.com/xpath/xpath_syntax.asp.
I would expect this expression to select the path element, but this is the result:
>> XPath.match( xmldoc.root, "//path[@type]" )
=> []
So, what is the correct XPath syntax to address the path element by its attribute?
Or is there a bug in REXML (using 3.1.7.3)?
Bonus points for also retrieving the "rect" element.
It looks like an older version of REXML is being picked up that doesn't support the full XPath spec.
Try checking the output of puts REXML::VERSION to ensure that 3.1.7.3 is displayed.
You need to take the default namespace into account. With XPath 1.0 you need to bind a prefix (e.g. svg) to the namespace URI http://www.w3.org/2000/svg and then use a path like //svg:path[@type]. How you bind a prefix to a URI for XPath evaluation depends on the XPath API you use. I am afraid I don't know how that is done with your Ruby API; if you can't find a method or property in the API documentation yourself, maybe someone else will come along later and tell us.
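For REXML, the prefix binding can be passed as the third argument of XPath.match, a hash that maps prefixes to namespace URIs. A minimal sketch, with the SVG inlined as a string rather than read from the asker's file (the variable names are illustrative):

```ruby
require 'rexml/document'

svg = <<~SVG
  <?xml version="1.0" encoding="UTF-8" standalone="no"?>
  <svg xmlns:svg="http://www.w3.org/2000/svg" xmlns="http://www.w3.org/2000/svg">
    <g id='1'>
      <path id='2' type='A'/>
      <rect id='3' type='B'/>
    </g>
  </svg>
SVG

doc = REXML::Document.new(svg)

# Bind the prefix "svg" to the SVG namespace URI for these queries.
namespaces = { 'svg' => 'http://www.w3.org/2000/svg' }

# Elements named path (in the SVG namespace) that carry a type attribute.
paths = REXML::XPath.match(doc.root, '//svg:path[@type]', namespaces)

# The rect element, selected by the value of its type attribute.
rects = REXML::XPath.match(doc.root, "//svg:rect[@type='B']", namespaces)

path_ids = paths.map { |element| element.attributes['id'] }
rect_ids = rects.map { |element| element.attributes['id'] }

puts path_ids.inspect
puts rect_ids.inspect
```

The prefix used in the query only has to match a key in the hash; it does not need to be the same prefix the document itself declares.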
Many of us use Nokogiri these days instead of REXML or Hpricot, another early Ruby XML parser.
Nokogiri supports both XPath and CSS accessors, so you can use familiar HTML-style paths to get at nodes:
require 'nokogiri'
svg = %q{<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<svg xmlns:svg="http://www.w3.org/2000/svg" xmlns="http://www.w3.org/2000/svg">
<g id='1'>
<path id='2' type='A'/>
<rect id='3' type='B'/>
</g>
</svg>
}
doc = Nokogiri::XML(svg)
puts doc.search('//svg:path[@type]')
puts doc.search('svg|path[@type]')
puts doc.search('path[@type]')
puts doc.search('//svg:rect')
puts doc.search('//svg:rect[@type]')
puts doc.search('//svg:rect[@type="B"]')
puts doc.search('svg|rect')
puts doc.search('rect')
# >> <path id="2" type="A"/>
# >> <path id="2" type="A"/>
# >> <path id="2" type="A"/>
# >> <rect id="3" type="B"/>
# >> <rect id="3" type="B"/>
# >> <rect id="3" type="B"/>
# >> <rect id="3" type="B"/>
# >> <rect id="3" type="B"/>
The first path is XPath with the namespace. The second is CSS with a namespace. The third is CSS without namespaces. Nokogiri, being friendly to humans, lets us deal with, or dispense with, namespaces in a couple of ways, assuming we are aware of why namespaces are good.
This is the most frequently asked XPath question: the default namespace issue.
Solution:
Instead of:
//path[@type]
use
//svg:path[@type]

How to reference an XML attribute using XPath?

My XML:
<root>
<cars>
<makes>
<honda year="1995">
<model />
<!-- ... -->
</honda>
<honda year="2000">
<!-- ... -->
</honda>
</makes>
</cars>
</root>
I need a XPath that will get me all models for <honda> with year 1995.
so:
/root/cars/makes/honda
But how to reference an attribute?
"I need a XPath that will get me all models for <honda> with year 1995."
That would be:
/root/cars/makes/honda[@year = '1995']/model
Try /root/cars/makes/honda/@year
UPDATE: reading your question again:
/root/cars/makes/honda[@year = '1995']
Bottom line: use the @ character to reference XML attributes.
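Both forms can be tried on the sample document. The sketch below uses Ruby's REXML, an assumed choice since the question does not name a language. The predicate form filters elements by attribute value, while a path that ends in the attribute selects the attribute nodes themselves.

```ruby
require 'rexml/document'

# Sample document from the question, with the elided comments removed.
xml = <<~XML
  <root>
    <cars>
      <makes>
        <honda year="1995">
          <model />
        </honda>
        <honda year="2000">
        </honda>
      </makes>
    </cars>
  </root>
XML

doc = REXML::Document.new(xml)

# Predicate on the attribute: <model> children of the 1995 honda only.
models = REXML::XPath.match(doc, "/root/cars/makes/honda[@year = '1995']/model")

# Path ending in the attribute: the year attributes themselves.
year_attributes = REXML::XPath.match(doc, '/root/cars/makes/honda/@year')
years = year_attributes.map(&:value)

puts models.size
puts years.inspect
```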
