Ruby Nokogiri XML Parsing for NodeSet - ruby

I'm having an issue with parsing some XML using Nokogiri in Ruby 2.6.5. I've checked here and other posts, but I still can't seem to get the Nokogiri bit to take. I've tried different nodes, all with the same result of thing being NilClass.
require 'nokogiri'
xml_str = <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:gx="http://www.google.com/kml/ext/2.2" xmlns:kml="http://www.opengis.net/kml/2.2">
<Document>
<description><![CDATA[powered by WordPress & MapsMarker.com]]></description>
<open>0</open>
<Style id="bar"><IconStyle><Icon><href>http://epic-curiousity.com/wp-content/uploads/leaflet-maps-marker-icons/bar.png</href></Icon></IconStyle></Style>
<name>Epic Curiousity</name>
<Placemark id="marker-35">
<styleUrl>#bar</styleUrl>
<name>Brasserie de Rochefort</name>
<TimeStamp><when>2014-06-13T07:06:01-08:00</when></TimeStamp>
<atom:author><atom:name>epiccuri</atom:name></atom:author>
<atom:link rel="related" href="http://epic-curiousity.com" />
<description><![CDATA[The Brasserie de Rochefort is located inside the Abbey of Notre-Dame de Saint-Rémy in Rochefort.  They're a trappist brewery and produce three very-fine beers:<ul><li>Trappistes Rochefort 6 - Dubbel 7.5% ABV</li><li>Trappistes Rochefort 8 - Belgian Strong Dark Ale 9.2% ABV</li><li>Trappistes Rochefort 10 - Quadrupel (Quad) 11.30% ABV</li></ul>You can find the brewery's homepage here: http://www.trappistes-rochefort.com/<br /><br />Their BeerAdvocate page is located here: http://www.beeradvocate.com/beer/profile/207/]]></description>
<address><![CDATA[Brasserie Scaillet, Rue de la Griotte, Rochefort, Belgium]]></address>
<Point>
<coordinates>5.199692,50.175346</coordinates>
</Point>
</Placemark>
</Document>
</kml>
EOF
doc = Nokogiri::XML(xml_str)
puts doc.class # => Nokogiri::XML::Document
thing = doc.at_xpath("Document")
puts thing.class # ==> NilClass
Anybody know why this isn't being recognized as a nodeset? I've tried with this as well with the same results:
doc = Nokogiri::XML.parse(xml_str)

This happens because the XML declares namespaces, and those need to be included in the XPath query:
document = doc.at_xpath('//xmlns:Document')
document.class
=> Nokogiri::XML::Element
document_name = doc.at_xpath('//xmlns:Document/xmlns:name')
document_name.class
=> Nokogiri::XML::Element
document_name.content
=> "Epic Curiousity"
To get a name in atom:name:
atom_name = doc.at_xpath('//atom:name')
atom_name.content
=> "epiccuri"

According to the docs, you can also use CSS selectors to find what you want:
puts doc.at_css("Document")
# prints the whole Document node
puts doc.css("name")
# <name>Epic Curiousity</name>
# <name>Brasserie de Rochefort</name>
puts doc.css("Placemark name")
# <name>Brasserie de Rochefort</name>
puts doc.css("Document/name")
# <name>Epic Curiousity</name>

Related

Output array of tag contents using REXML?

This has been asked before in "REXML - How to extract a single element" but the answer doesn't work. Apparently, the text method is no longer available.
I have an XML file:
<?xml version="1.0" encoding="UTF-8"?>
<ice_cream>
<flavor>Vanilla</flavor>
</ice_cream>
and I can place its contents into an array using REXML:
flavors = xml_file.get_elements('//flavor')
I get an array:
puts flavors[0]
Which returns:
<flavor>Vanilla</flavor>
Instead, I want:
Vanilla
I've tried:
flavors = xml_file.get_elements('//flavor').text
But, I get:
NoMethodError: undefined method `text' for #<Array:0x007fa7a3b94220>
What's the correct way to accomplish this? I'm open to using other libraries, too.
Use Nokogiri. Your code will thank you.
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<?xml version="1.0" encoding="UTF-8"?>
<ice_cream>
<flavor>Vanilla</flavor>
</ice_cream>
EOT
doc.search('flavor') # => [#<Nokogiri::XML::Element:0x3feb8182fc60 name="flavor" children=[#<Nokogiri::XML::Text:0x3feb8182fa44 "Vanilla">]>]
doc.search('flavor').map(&:text) # => ["Vanilla"]
search finds all nodes, as a NodeSet, that match the CSS selector 'flavor'.
search('flavor').map(&:text) walks the NodeSet and applies (map) the text method to each Node, returning an array with the text content of each node.
If your XML is actually something more complex:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<?xml version="1.0" encoding="UTF-8"?>
<ice_cream>
<flavor>Vanilla</flavor>
<flavor>Chocolate</flavor>
<flavor>Strawberry</flavor>
</ice_cream>
EOT
doc.search('flavor') # => [#<Nokogiri::XML::Element:0x3fcc2a577afc name="flavor" children=[#<Nokogiri::XML::Text:0x3fcc2a5778e0 "Vanilla">]>, #<Nokogiri::XML::Element:0x3fcc2a5776c4 name="flavor" children=[#<Nokogiri::XML::Text:0x3fcc2a5774bc "Chocolate">]>, #<Nokogiri::XML::Element:0x3fcc2a5772b4 name="flavor" children=[#<Nokogiri::XML::Text:0x3fcc2a572c78 "Strawberry">]>]
doc.search('flavor').map(&:text) # => ["Vanilla", "Chocolate", "Strawberry"]
Or:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<?xml version="1.0" encoding="UTF-8"?>
<ice_creams>
<ice_cream>
<flavor>Vanilla</flavor>
</ice_cream>
<ice_cream>
<flavor>Chocolate</flavor>
</ice_cream>
<ice_cream>
<flavor>Strawberry</flavor>
</ice_cream>
</ice_creams>
EOT
ice_cream = doc.search('ice_cream') # => [#<Nokogiri::XML::Element:0x3fe6a91f6b00 name="ice_cream" children=[#<Nokogiri::XML::Text:0x3fe6a91f68f8 "\n ">, #<Nokogiri::XML::Element:0x3fe6a91f681c name="flavor" children=[#<Nokogiri::XML::Text:0x3fe6a91f6600 "Vanilla">]>, #<Nokogiri::XML::Text:0x3fe6a91f63f8 "\n ">]>, #<Nokogiri::XML::Element:0x3fe6a91f1de4 name="ice_cream" children=[#<Nokogiri::XML::Text:0x3fe6a91f1bdc "\n ">, #<Nokogiri::XML::Element:0x3fe6a91f1ac4 name="flavor" children=[#<Nokogiri::XML::Text:0x3fe6a91f1880 "Chocolate">]>, #<Nokogiri::XML::Text:0x3fe6a91f1678 "\n ">]>, #<Nokogiri::XML::Element:0x3fe6a91f13f8 name="ice_cream" children=[#<Nokogiri::XML::Text:0x3fe6a91f1074 "\n ">, #<Nokogiri::XML::Element:0x3fe6a91f0e80 name="flavor" children=[#<Nokogiri::XML::Text:0x3fe6a91f0a98 "Strawberry">]>, #<Nokogiri::XML::Text:0x3fe6a91f0840 "\n ">]>]
ice_cream.search('flavor').map(&:text) # => ["Vanilla", "Chocolate", "Strawberry"]
For searching, Nokogiri supports using both CSS and XPath selectors, and allows you to use either in the methods, if you want. search accepts both CSS and XPath, and has corollaries of css and xpath for the CSS or XPath specific methods. at returns a single Node and is similar to search('some_node').first and has at_css and at_xpath respectively.
Here is the code :
require 'rexml/document'
doc = <<-xml
<?xml version="1.0" encoding="UTF-8"?>
<ice_cream>
<flavor>Vanilla</flavor>
</ice_cream>
xml
xml_doc = REXML::Document.new(doc)
xml_doc.get_elements('//flavor').class # => Array
xml_doc.get_elements('//flavor')[0].class # => REXML::Element
xml_doc.get_elements('//flavor')[0].text # => "Vanilla"
Actually xml_doc.get_elements('//flavor') will give you the collection of REXML::Element objects. You then need to iterate through the collection and call the method #text on the REXML::Element object to get the text.
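A minimal sketch of that iteration, staying with REXML. The method name flavor_names is made up for this illustration.

```ruby
require 'rexml/document'

# Hypothetical helper: collects the text of every <flavor> element.
# get_elements returns an Array of REXML::Element, so map can call
# #text on each element in turn.
def flavor_names(xml)
  REXML::Document.new(xml).get_elements('//flavor').map(&:text)
end
```

This returns an array of strings rather than elements, which is what the question was trying to get by calling text on the array itself.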

Can't find element in cloned document

I am using Nokogiri (1.5.9 - java) in JRuby ( 1.6.7.2 ) to copy an XML template and edit it. I'm having problems finding elements in the cloned document.
lblock = doc.xpath(".//lblock[@blockName='WINDOW_LIST']").first
lblock.children = new_children # kind of NodeSet or Node
copy_doc = doc.dup( 1 ) # or dup(0)
lblock = copy_doc.xpath(".//lblock[@blockName='WINDOW_LIST']").first # nil
When I print the copy with to_s or to_xml, the lblock is there with new_children.
Where is my mistake?
I can't duplicate the problem:
require 'nokogiri'
new_children = Nokogiri::XML::DocumentFragment.parse('<foo>bar</foo>')
doc = Nokogiri::XML(<<EOF)
<xml>
<lblock blockName="WINDOW_LIST" />
</xml>
EOF
lblock = doc.xpath(".//lblock[@blockName='WINDOW_LIST']").first
lblock.children = new_children # kind of NodeSet or Node
copy_doc = doc.dup(1) # or dup(0)
lblock = copy_doc.xpath(".//lblock[@blockName='WINDOW_LIST']").first # nil in the question; found here
puts lblock.to_xml
puts
puts doc.to_xml
Running that outputs:
<lblock blockName="WINDOW_LIST">
<foo>bar</foo>
</lblock>
<?xml version="1.0"?>
<xml>
<lblock blockName="WINDOW_LIST"><foo>bar</foo></lblock>
</xml>
That said, here's code that is cleaned up to show you some simpler ways to write it:
require 'nokogiri'
new_children = '<foo>bar</foo>'
doc = Nokogiri::XML(<<EOF)
<xml>
<lblock blockName="WINDOW_LIST" />
</xml>
EOF
lblock = doc.at_xpath('//lblock')
lblock.children = new_children
copy_doc = doc.dup(1)
lblock = copy_doc.at_css('lblock')
puts lblock.to_xml
puts
puts doc.to_xml
Running it produces the same output:
<lblock blockName="WINDOW_LIST">
<foo>bar</foo>
</lblock>
<?xml version="1.0"?>
<xml>
<lblock blockName="WINDOW_LIST"><foo>bar</foo></lblock>
</xml>
Dissecting the code:
lblock = doc.at_xpath('//lblock')
lblock = copy_doc.at_css('lblock')
These use two different ways of finding the same thing. In this case, because the sample XML was simple, I used the at_* methods, which return the first matching node. at_xpath and at_css work with XPath and CSS respectively. The generic at tries to figure out whether the string is CSS or XPath, and normally gets it right, though I have seen it fooled.
lblock.children = new_children
In this case, new_children is a String. Nokogiri is smart enough to know it should convert the string into an XML fragment before using it. This makes it very easy to modify XML or HTML documents with strings, instead of having to create DocumentFragments.

Search node in xml by using Nokogiri xpath (with xml namespace)

I have found Nokogiri quite powerful for dealing with XML, but I ran into a special case.
I am trying to find a node in an XML file like this:
<?xml version="1.0" encoding="utf-8" ?>
<ConfigurationSection>
<Configuration xmlns="clr-namespace:Newproject.Framework.Server.Store.Configuration;assembly=Newproject.Framework.Server" >
<Configuration.Store>SqlServer</Configuration.Store>
<Configuration.Engine>Staging</Configuration.Engine>
</Configuration>
</ConfigurationSection>
When I do a
xml = File.new(webconfig,"r")
doc = Nokogiri::XML(xml.read)
nodes = doc.search("//Configuration.Store")
xml.close
I get an empty NodeSet. Am I missing something? I have also tried
nodes = doc.search("//Configuration\.Store")
still no luck.
Updated: I have attached the whole xml file
Updated the xml again: My mistake, it does have a namespace
EDIT #2: Solution now includes #parse_with_namespace
You can find a number of Nokogiri methods pertaining to namespaces in the Nokogiri::XML::Node documentation.
# encoding: UTF-8
require 'rspec'
require 'nokogiri'
XML = <<XML
<?xml version="1.0" encoding="utf-8" ?>
<ConfigurationSection>
<Configuration xmlns="clr-namespace:Newproject.Framework.Server.Store.Configuration;assembly=Newproject.Framework.Server" >
<Configuration.Store>SqlServer</Configuration.Store>
<Configuration.Engine>Staging</Configuration.Engine>
</Configuration>
</ConfigurationSection>
XML
class ConfigParser
def parse(xml)
doc = Nokogiri::XML(xml).remove_namespaces!
configuration = doc.at('/ConfigurationSection/Configuration')
store = configuration.at("./Configuration.Store").text
engine = configuration.at("./Configuration.Engine").text
{store: store, engine: engine}
end
def parse_with_namespace(xml)
doc = Nokogiri::XML(xml)
configuration = doc.at('/ConfigurationSection/xmlns:Configuration', 'xmlns' => 'clr-namespace:Newproject.Framework.Server.Store.Configuration;assembly=Newproject.Framework.Server')
store = configuration.at("./xmlns:Configuration.Store", 'xmlns' => 'clr-namespace:Newproject.Framework.Server.Store.Configuration;assembly=Newproject.Framework.Server').text
engine = configuration.at("./xmlns:Configuration.Engine", 'xmlns' => 'clr-namespace:Newproject.Framework.Server.Store.Configuration;assembly=Newproject.Framework.Server').text
{store: store, engine: engine}
end
end
describe ConfigParser do
before(:each) do
@parsed = subject.parse XML
@parsed_with_ns = subject.parse_with_namespace XML
end
it "should be able to parse the Configuration Store" do
@parsed[:store].should eq "SqlServer"
end
it "should be able to parse the Configuration Engine" do
@parsed[:engine].should eq "Staging"
end
it "should be able to parse the Configuration Store with namespace" do
@parsed_with_ns[:store].should eq "SqlServer"
end
it "should be able to parse the Configuration Engine with namespace" do
@parsed_with_ns[:engine].should eq "Staging"
end
end
end
require 'nokogiri'
XML = "<Configuration>
<Configuration.Store>SqlServer</Configuration.Store>
<Configuration.Engine>Staging</Configuration.Engine>
</Configuration>"
p Nokogiri::VERSION, Nokogiri.XML(XML).search('//Configuration.Store')
#=> "1.5.0"
#=> [#<Nokogiri::XML::Element:0x8103f0f8 name="Configuration.Store" children=[#<Nokogiri::XML::Text:0x81037524 "SqlServer">]>]
p RUBY_DESCRIPTION
#=> "ruby 1.9.2p180 (2011-02-18 revision 30909) [x86_64-darwin10.7.0]"

How to navigate an XML object in Ruby

I have a regular xml object created from a response of a web service.
I need to get some specific values from some specific keys... for example:
<tag>
<tag2>
<tag3>
<needThisValue>3</needThisValue>
<tag4>
<needThisValue2>some text</needThisValue2>
</tag4>
</tag3>
</tag2>
</tag>
How can I get <needThisValue> and <needThisValue2> in Ruby?
I'm a big fan of Nokogiri:
xml = <<EOT
<tag>
<tag2>
<tag3>
<needThisValue>3</needThisValue>
<tag4>
<needThisValue2>some text</needThisValue2>
</tag4>
</tag3>
</tag2>
</tag>
EOT
This creates a document for parsing:
require 'nokogiri'
doc = Nokogiri::XML(xml)
Use at to find the first node matching the accessor:
doc.at('needThisValue2').class # => Nokogiri::XML::Element
Or search to find all nodes matching the accessor as a NodeSet, which acts like an Array:
doc.search('needThisValue2').class # => Nokogiri::XML::NodeSet
doc.search('needThisValue2')[0].class # => Nokogiri::XML::Element
This uses a CSS accessor to locate the first instance of each node:
doc.at('needThisValue').text # => "3"
doc.at('needThisValue2').text # => "some text"
Again with the NodeSet using CSS:
doc.search('needThisValue')[0].text # => "3"
doc.search('needThisValue2')[0].text # => "some text"
You can use XPath accessors instead of CSS if you want:
doc.at('//needThisValue').text # => "3"
doc.search('//needThisValue2').first.text # => "some text"
Go through the tutorials to get a jumpstart. It's very powerful and quite easy to use.
require "rexml/document"
include REXML
doc = Document.new string
puts XPath.first(doc, "//tag/tag2/tag3/needThisValue").text
puts XPath.first(doc, "//tag/tag2/tag3/tag4/needThisValue2").text
Try this Nokogiri tutorial.
You'll need to install nokogiri gem.
Good luck.
Check out the Nokogiri gem. You can read some tutorials for it. It's fast and simple.

How do I validate specific attributes in XML using Ruby's REXML?

I'm trying to read some XML I've retrieved from a web service, and validate a specific attribute within the XML.
This is the XML up to the tag that I need to validate:
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
<s:Body>
<QueryResponse xmlns="http://tempuri.org/">
<QueryResult xmlns:a="http://schemas.datacontract.org/2004/07/Entity"
xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<a:Navigation i:nil="true" />
<a:SearchResult>
<a:EntityList>
<a:BaseEntity i:type="a:Product">
<a:ExtractDateTime>1290398428</a:ExtractDateTime>
<a:ExtractDateTimeFormatted>11/22/2010
04:00:28</a:ExtractDateTimeFormatted>
Here's the code I have thus far using REXML in Ruby:
require 'xmlsimple'
require 'rexml/document'
require 'rexml/streamlistener'
include REXML
class Listener
include StreamListener
xmlfile = File.new("rbxml_CS_Query.xml")
xmldoc = Document.new(xmlfile)
# Now get the root element
root = xmldoc.root
puts root.attributes["a:EntityList"]
# This will output the date/time of the query response
xmldoc.elements.each("a:BaseEntity"){
|e| puts e.attributes["a:ExtractDateTimeFormatted"]
}
end
I need to validate that ExtractDateTimeFormatted is there and has a valid value for that attribute. Any help is greatly appreciated. :)
Reading from local xml file.
File.open('temp.xml', 'w') { |f|
f.puts request
f.close
}
xml = File.read('temp.xml')
doc = Nokogiri::XML::Reader(xml)
extract_date_time_formatted = doc.at(
'//a:ExtractDateTimeFormatted',
'a' => 'http://schemas.datacontract.org/2004/07/Entity'
).inner_text
show = DateTime.strptime(extract_date_time_formatted, '%m/%d/%Y')
puts show
When I run this code I get an error: "undefined method 'at' for # on line 21
Are you tied to REXML or can you switch to Nokogiri? I highly recommend Nokogiri over the other Ruby XML parsers.
I had to add enough XML tags to make the sample validate.
require 'date'
require 'nokogiri'
xml = %q{<?xml version="1.0"?>
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
<s:Body>
<QueryResponse xmlns="http://tempuri.org/">
<QueryResult xmlns:a="http://schemas.datacontract.org/2004/07/Entity" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<a:Navigation i:nil="true"/>
<a:SearchResult>
<a:EntityList>
<a:BaseEntity i:type="a:Product">
<a:ExtractDateTime>1290398428</a:ExtractDateTime>
<a:ExtractDateTimeFormatted>11/22/2010</a:ExtractDateTimeFormatted>
</a:BaseEntity>
</a:EntityList>
</a:SearchResult>
</QueryResult>
</QueryResponse>
</s:Body>
</s:Envelope>
}
doc = Nokogiri::XML(xml)
extract_date_time_formatted = doc.at(
'//a:ExtractDateTimeFormatted',
'a' => 'http://schemas.datacontract.org/2004/07/Entity'
).inner_text
puts DateTime.strptime(extract_date_time_formatted, '%m/%d/%Y')
# >> 2010-11-22T00:00:00+00:00
There are a couple of things going on that could make this harder to handle than a simple XML file.
The XML is using namespaces. They are useful, but you have to tell the parser how to handle them. That is why I had to add the second parameter to the at() accessor.
The date value is in a format that is often ambiguous. For many days of the year it is hard to tell whether it is mm/dd/yyyy or dd/mm/yyyy. Here in the U.S. we assume it's the first, but Europe is the second. The DateTime parser tries to figure it out but often gets it wrong, especially when it thinks it's supposed to be dealing with a month 22. So, rather than let it guess, I told it to use mm/dd/yyyy format. If a date doesn't match that format, or the date's values are out of range Ruby will raise an exception, so you'll need to code for that.
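A minimal sketch of coding for that exception, so that a missing or malformed value can be reported instead of crashing. The method name parse_us_date and the pattern constant US_DATE are made up for this illustration. It uses Date rather than DateTime because only the date part is parsed here, and it assumes Ruby 2.4 or later for match?.

```ruby
require 'date'

# Shape check for mm/dd/yyyy; range checking is left to Date.strptime.
US_DATE = %r{\A\d{1,2}/\d{1,2}/\d{4}\z}

# Hypothetical helper: returns a Date for a valid mm/dd/yyyy string,
# or nil when the value is missing, malformed, or out of range.
def parse_us_date(text)
  value = text.to_s.strip
  return nil unless value.match?(US_DATE)

  Date.strptime(value, '%m/%d/%Y')
rescue ArgumentError
  # Raised for out-of-range values such as month 22.
  nil
end
```

A nil result then means the ExtractDateTimeFormatted value failed validation, whether the element text was empty or the date was not a real mm/dd/yyyy date.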
This is an example of how to retrieve and parse XML on the fly:
require 'nokogiri'
require 'open-uri'
doc = Nokogiri::XML(URI.open('http://java.sun.com/developer/earlyAccess/xml/examples/samples/book-order.xml')) # URI.open, since Kernel#open no longer accepts URLs as of Ruby 3.0
puts doc.class
puts doc.to_xml
And an example of how to read a local XML file and parse it:
require 'nokogiri'
doc = Nokogiri::XML(File.read('test.xml'))
puts doc.to_xml
# >> <?xml version="1.0"?>
# >> <root xmlns:foo="bar">
# >> <bar xmlns:hello="world"/>
# >> </root>

Resources