Search for a node in XML using Nokogiri XPath (with an XML namespace) - ruby

I have found Nokogiri quite powerful for dealing with XML, but I have run into a special case.
I am trying to find a node in an XML file like this:
<?xml version="1.0" encoding="utf-8" ?>
<ConfigurationSection>
<Configuration xmlns="clr-namespace:Newproject.Framework.Server.Store.Configuration;assembly=Newproject.Framework.Server" >
<Configuration.Store>SqlServer</Configuration.Store>
<Configuration.Engine>Staging</Configuration.Engine>
</Configuration>
</ConfigurationSection>
When I run
xml = File.new(webconfig,"r")
doc = Nokogiri::XML(xml.read)
nodes = doc.search("//Configuration.Store")
xml.close
I get an empty node set. Am I missing something? I have also tried
nodes = doc.search("//Configuration\.Store")
but still no luck.
Update: I have attached the whole XML file.
Update 2: My mistake, the XML does have a namespace.

EDIT #2: Solution now includes #parse_with_namespace
You can find a number of Nokogiri methods pertaining to namespaces in the Nokogiri::XML::Node documentation.
# encoding: UTF-8
require 'rspec'
require 'nokogiri'
XML = <<XML
<?xml version="1.0" encoding="utf-8" ?>
<ConfigurationSection>
<Configuration xmlns="clr-namespace:Newproject.Framework.Server.Store.Configuration;assembly=Newproject.Framework.Server" >
<Configuration.Store>SqlServer</Configuration.Store>
<Configuration.Engine>Staging</Configuration.Engine>
</Configuration>
</ConfigurationSection>
XML
class ConfigParser
def parse(xml)
doc = Nokogiri::XML(xml).remove_namespaces!
configuration = doc.at('/ConfigurationSection/Configuration')
store = configuration.at("./Configuration.Store").text
engine = configuration.at("./Configuration.Engine").text
{store: store, engine: engine}
end
def parse_with_namespace(xml)
doc = Nokogiri::XML(xml)
configuration = doc.at('/ConfigurationSection/xmlns:Configuration', 'xmlns' => 'clr-namespace:Newproject.Framework.Server.Store.Configuration;assembly=Newproject.Framework.Server')
store = configuration.at("./xmlns:Configuration.Store", 'xmlns' => 'clr-namespace:Newproject.Framework.Server.Store.Configuration;assembly=Newproject.Framework.Server').text
engine = configuration.at("./xmlns:Configuration.Engine", 'xmlns' => 'clr-namespace:Newproject.Framework.Server.Store.Configuration;assembly=Newproject.Framework.Server').text
{store: store, engine: engine}
end
end
describe ConfigParser do
before(:each) do
@parsed = subject.parse XML
@parsed_with_ns = subject.parse_with_namespace XML
end
it "should be able to parse the Configuration Store" do
@parsed[:store].should eq "SqlServer"
end
it "should be able to parse the Configuration Engine" do
@parsed[:engine].should eq "Staging"
end
it "should be able to parse the Configuration Store with namespace" do
@parsed_with_ns[:store].should eq "SqlServer"
end
it "should be able to parse the Configuration Engine with namespace" do
@parsed_with_ns[:engine].should eq "Staging"
end
end

require 'nokogiri'
XML = "<Configuration>
<Configuration.Store>SqlServer</Configuration.Store>
<Configuration.Engine>Staging</Configuration.Engine>
</Configuration>"
p Nokogiri::VERSION, Nokogiri.XML(XML).search('//Configuration.Store')
#=> "1.5.0"
#=> [#<Nokogiri::XML::Element:0x8103f0f8 name="Configuration.Store" children=[#<Nokogiri::XML::Text:0x81037524 "SqlServer">]>]
p RUBY_DESCRIPTION
#=> "ruby 1.9.2p180 (2011-02-18 revision 30909) [x86_64-darwin10.7.0]"

Related

Building blank XML tags with Nokogiri?

I'm trying to build up an XML document using Nokogiri. Everything is pretty standard so far; most of my code just looks something like:
builder = Nokogiri::XML::Builder.new do |xml|
...
xml.Tag1(object.attribute_1)
xml.Tag2(object.attribute_2)
xml.Tag3(object.attribute_3)
xml.Tag4(nil)
end
builder.to_xml
However, that results in a tag like <Tag4/> instead of <Tag4></Tag4>, which is what my end user has specified that the output needs to be.
How do I tell Nokogiri to put full tags around a nil value?
SaveOptions::NO_EMPTY_TAGS will get you what you want.
require 'nokogiri'
builder = Nokogiri::XML::Builder.new do |xml|
xml.blah(nil)
end
puts 'broken:'
puts builder.to_xml
puts 'fixed:'
puts builder.to_xml(save_with: Nokogiri::XML::Node::SaveOptions::NO_EMPTY_TAGS)
output:
(511)-> ruby derp.rb
broken:
<?xml version="1.0"?>
<blah/>
fixed:
<?xml version="1.0"?>
<blah></blah>

Nokogiri XSLT tagging document as XML type when using JSON

I am using Nokogiri to transform an XML document to JSON. The code is straightforward:
@document = Nokogiri::XML(entry.data)
xslt = Nokogiri::XSLT(File.read("#{File.dirname(__FILE__)}/../../xslt/my.xslt"))
transform = xslt.transform(@document)
entry in this case is a Mongoid based model and data is an XML blob attribute stored as a string on MongoDB.
When I dump the contents of transform, the JSON is there. The problem is, Nokogiri is tagging the top of the document with:
<?xml version="1.0"?>
What's the correct way of addressing that?
Try the #apply_to method, as shown below:
require 'nokogiri'
doc = Nokogiri::XML('<?xml version="1.0"?><root />')
xslt = Nokogiri::XSLT("<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'/>")
puts xslt.transform(doc)
puts "######"
puts xslt.apply_to(doc)
# >> <?xml version="1.0"?>
# >> ######
# >>

Validating DTD-String with Nokogiri

I am switching from LibXML to Nokogiri. I have a method in my code that checks whether an XML document conforms to a DTD. The DTD is read from a database (as a string).
This is an example within an irb session
require 'xml'
doc = LibXML::XML::Document.string('<foo bar="baz" />') #=> <?xml version="1.0" encoding="UTF-8"?>
dtd = LibXML::XML::Dtd.new('<!ELEMENT foo EMPTY><!ATTLIST foo bar ID #REQUIRED>') #=> #<LibXML::XML::Dtd:0x000000026f53b8>
doc.validate dtd #=> true
As I understand it, Nokogiri::XML::Document#validate can only check against a DTD contained within the document. How would I achieve the same result with Nokogiri?
I think what you need is internal_subset:
require 'nokogiri'
doc = Nokogiri::HTML("<!DOCTYPE html>")
# then you can get the info you want
doc.internal_subset # Nokogiri::XML::DTD
# for example you can get name, system_id, external_id, etc
doc.internal_subset.name
doc.internal_subset.system_id
doc.internal_subset.external_id
Here is a full doc of Nokogiri::XML::DTD.
Thanks

Create one XML file that joins many others

I am trying to create one XML file from a list of XML files.
Here is an example list of XML files:
java.xml :
<JavaDetails>
<SomeList> ... </SomeList>
....
</JavaDetails>
c.xml
<CDetails>
<SomeList> ... </SomeList>
....
</CDetails>
I want to create a Programming.xml from the XML files above.
It should look like:
<programming>
<Java>
<JavaDetails>
<SomeList> ... </SomeList>
....
</JavaDetails>
</Java>
<C>
<CDetails>
<SomeList> ... </SomeList>
....
</CDetails>
</C>
</programming>
I am currently looking into Nokogiri for this because performance is a major factor. What I am not sure about is how to create nodes for the output XML. Any help with Ruby code using Nokogiri is much appreciated.
To create a new XML file with a specific root, it can be as simple as:
doc = Nokogiri.XML("<programming/>")
One way to add a child node to that document:
java = doc.root.add_child('<Java/>').first
To read in another XML file from disk and append it:
java_details = Nokogiri.XML( IO.read('java.xml') )
java << java_details.root
Thus, if you have an array of filenames and you want to construct wrapping elements from each based on the name:
require 'nokogiri'
files = %w[ java.xml c.xml ]
doc = Nokogiri.XML('<programming/>')
files.each do |filename|
wrap_name = File.basename(filename,'.*').capitalize
wrapper = doc.root.add_child("<#{wrap_name} />").first
wrapper << Nokogiri.XML(IO.read(filename)).root
end
puts doc
Alternatively, if you want to use the Builder interface of Nokogiri:
builder = Nokogiri::XML::Builder.new do |xml|
xml.programming do
files.each do |filename|
wrap_name = File.basename(filename,'.*').capitalize
xml.send(wrap_name) do
xml.parent << Nokogiri.XML(IO.read(filename)).root
end
end
end
end
puts builder.to_xml
To install it:
gem install nokogiri
Here's the syntax:
require 'nokogiri'
builder = Nokogiri::XML::Builder.new do |xml|
xml.programming {
xml.Java {
xml.JavaDetails {
xml.SomeList 'List item'
}
}
}
end
The result can be retrieved with to_xml:
builder.to_xml
HTH!

How do I validate specific attributes in XML using Ruby's REXML?

I'm trying to read some XML I've retrieved from a web service, and validate a specific attribute within the XML.
This is the XML up to the tag that I need to validate:
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
<s:Body>
<QueryResponse xmlns="http://tempuri.org/">
<QueryResult xmlns:a="http://schemas.datacontract.org/2004/07/Entity"
xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<a:Navigation i:nil="true" />
<a:SearchResult>
<a:EntityList>
<a:BaseEntity i:type="a:Product">
<a:ExtractDateTime>1290398428</a:ExtractDateTime>
<a:ExtractDateTimeFormatted>11/22/2010
04:00:28</a:ExtractDateTimeFormatted>
Here's the code I have thus far using REXML in Ruby:
require 'xmlsimple'
require 'rexml/document'
require 'rexml/streamlistener'
include REXML
class Listener
include StreamListener
xmlfile = File.new("rbxml_CS_Query.xml")
xmldoc = Document.new(xmlfile)
# Now get the root element
root = xmldoc.root
puts root.attributes["a:EntityList"]
# This will output the date/time of the query response
xmldoc.elements.each("a:BaseEntity"){
|e| puts e.attributes["a:ExtractDateTimeFormatted"]
}
end
I need to validate that ExtractDateTimeFormatted is there and has a valid value for that attribute. Any help is greatly appreciated. :)
Reading from local xml file.
File.open('temp.xml', 'w') { |f|
f.puts request
f.close
}
xml = File.read('temp.xml')
doc = Nokogiri::XML::Reader(xml)
extract_date_time_formatted = doc.at(
'//a:ExtractDateTimeFormatted',
'a' => 'http://schemas.datacontract.org/2004/07/Entity'
).inner_text
show = DateTime.strptime(extract_date_time_formatted, '%m/%d/%Y')
puts show
When I run this code I get an error: "undefined method 'at' for #" on line 21.
Are you tied to REXML or can you switch to Nokogiri? I highly recommend Nokogiri over the other Ruby XML parsers.
I had to add enough XML tags to make the sample validate.
require 'date'
require 'nokogiri'
xml = %q{<?xml version="1.0"?>
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
<s:Body>
<QueryResponse xmlns="http://tempuri.org/">
<QueryResult xmlns:a="http://schemas.datacontract.org/2004/07/Entity" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<a:Navigation i:nil="true"/>
<a:SearchResult>
<a:EntityList>
<a:BaseEntity i:type="a:Product">
<a:ExtractDateTime>1290398428</a:ExtractDateTime>
<a:ExtractDateTimeFormatted>11/22/2010</a:ExtractDateTimeFormatted>
</a:BaseEntity>
</a:EntityList>
</a:SearchResult>
</QueryResult>
</QueryResponse>
</s:Body>
</s:Envelope>
}
doc = Nokogiri::XML(xml)
extract_date_time_formatted = doc.at(
'//a:ExtractDateTimeFormatted',
'a' => 'http://schemas.datacontract.org/2004/07/Entity'
).inner_text
puts DateTime.strptime(extract_date_time_formatted, '%m/%d/%Y')
# >> 2010-11-22T00:00:00+00:00
There are a couple of things going on that make this harder to handle than a simple XML file.
The XML uses namespaces. They are useful, but you have to tell the parser how to handle them. That is why I had to add the second parameter to the at() accessor.
The date value is in an ambiguous format. For many days of the year it is hard to tell whether a value is mm/dd/yyyy or dd/mm/yyyy. In the U.S. the first is assumed, while most of Europe uses the second. The DateTime parser tries to guess but often gets it wrong, especially when it ends up treating 22 as a month. So, rather than let it guess, I told it to use the mm/dd/yyyy format. If a date doesn't match that format, or its values are out of range, Ruby will raise an exception, so you'll need to handle that.
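Handling that exception could look roughly like the following. This is a sketch only: the method name parse_extract_date and the choice to return nil for bad input are assumptions made for illustration. DateTime.strptime raises ArgumentError (or its subclass Date::Error on newer Rubies) when the text does not match the format or the values are out of range.

```ruby
require 'date'

# Hypothetical helper: returns a DateTime for a valid mm/dd/yyyy string, nil otherwise.
def parse_extract_date(text)
  DateTime.strptime(text.to_s.strip, '%m/%d/%Y')
rescue ArgumentError
  # Covers Date::Error as well, since it inherits from ArgumentError.
  nil
end

good = parse_extract_date('11/22/2010')
bad  = parse_extract_date('22/11/2010') # 22 is not a valid month

puts good
puts bad.inspect
```

Returning nil keeps the caller simple; if a missing or malformed date should stop processing instead, re-raising with a more descriptive message would be the alternative.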
This is an example of how to retrieve and parse XML on the fly:
require 'nokogiri'
require 'open-uri'
doc = Nokogiri::XML(URI.open('http://java.sun.com/developer/earlyAccess/xml/examples/samples/book-order.xml'))
puts doc.class
puts doc.to_xml
And an example of how to read a local XML file and parse it:
require 'nokogiri'
doc = Nokogiri::XML(File.read('test.xml'))
puts doc.to_xml
# >> <?xml version="1.0"?>
# >> <root xmlns:foo="bar">
# >> <bar xmlns:hello="world"/>
# >> </root>
