Schema Validation using Nokogiri - ruby

I am trying to validate an XML document against a dozen or so schemas using Nokogiri. Currently I have a root schema document that imports all the other schemas, and I validate against that.
Can I point to each schema file from the XML file itself, and have Nokogiri look in the XML file for the schemas to validate against?

The proper way to reference multiple schemata against which to validate an XML file is with the schemaLocation attribute:
<?xml version="1.0"?>
<foo xmlns="http://bar.com/foo"
     xmlns:bz="http://biz.biz/"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://bar.com/foo http://www.bar.com/schemas/foo.xsd
                         http://biz.biz/ http://biz.biz/xml/ns/bz.xsd">
For each namespace in your document you list a pair of whitespace-delimited values: the namespace URI followed by a 'hint' as to where to find the schema for that namespace. If you provide a full URI for each hint, then you can process this with Nokogiri as follows:
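require 'nokogiri'
require 'open-uri'

XSI_NS = 'http://www.w3.org/2001/XMLSchema-instance'

doc = Nokogiri::XML(my_xml)

# xsi:schemaLocation holds whitespace-separated "namespace-URI schema-URI" pairs
location = doc.root.attribute_with_ns('schemaLocation', XSI_NS).value
schemata_by_ns = Hash[location.scan(/(\S+)\s+(\S+)/)]

schemata_by_ns.each do |ns, xsd_uri|
  # URI.open comes from open-uri; in Ruby 3, plain Kernel#open no longer fetches URLs
  xsd = Nokogiri::XML::Schema(URI.open(xsd_uri))
  xsd.validate(doc).each do |error|
    puts error.message
  end
end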
Disclaimer: I have never attempted to validate a single XML document using multiple namespaced schemata with Nokogiri before. As such, I have no direct experience to guarantee that the above validation will work. The validation code is based solely on Nokogiri's schema validation documentation.
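Because xsi:schemaLocation is just a whitespace-separated list of (namespace, hint) pairs, the pairing step can be checked in plain Ruby without Nokogiri. This is a minimal sketch; `schema_locations` is a hypothetical helper, not part of any library. Unlike the `scan` regex, which silently drops a trailing unpaired token, it rejects a value with an odd number of tokens.

```ruby
# Hypothetical helper: turn an xsi:schemaLocation value into { namespace => schema hint }.
def schema_locations(value)
  tokens = value.split # splits on any run of whitespace, including newlines
  unless tokens.length.even?
    raise ArgumentError, "schemaLocation must contain namespace/location pairs"
  end
  tokens.each_slice(2).to_h
end

value = "http://bar.com/foo http://www.bar.com/schemas/foo.xsd
         http://biz.biz/ http://biz.biz/xml/ns/bz.xsd"

schema_locations(value).each do |ns, hint|
  puts "#{ns} -> #{hint}"
end
```

The resulting hash has the same shape as `schemata_by_ns` in the answer, so it could be used as a drop-in replacement for the `scan` line.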

Related

How to use "doc" tag in Nokogiri to build an XML document

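I have to build an XML document that contains a <doc> tag. The Builder lets me use any custom tag name except "doc", but I need "doc". How can I fix this?
You can add an underscore to the name to prevent it being seen as an existing method: Nokogiri::XML::Builder already defines a doc method (it returns the document being built), so xml.doc never reaches the method_missing hook that generates tags. See the section “Special Tags” in the Nokogiri Builder docs.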
Something like:
require 'nokogiri'

builder = Nokogiri::XML::Builder.new do |xml|
  # Note the underscore here:
  xml.doc_ "A doc tag"
end
puts builder.to_xml
This example produces the following XML (the underscore isn’t included in the tag name):
<?xml version="1.0"?>
<doc>A doc tag</doc>

Build Metadata File (txt file) containing JSON

I am building a command-line app that will generate metadata files, amongst other things. I have a series of values that I want included, and I would like to format those values as JSON and then write the result to a .txt file.
The complicated part (to me at least) is that some of the values are dynamic (i.e. they may change every time a file is created), while other parts of the JSON need to stay static. Is there any sort of templating that may help with this (e.g. a JSON ERB template)?
If I were to use a JSON ERB template, how would I write the populated result to a .txt file, given that this is not a Rails app and I therefore would not be calling a view?
Thank you in advance for any help.
It seems like two things could be helpful to you, but your question is pretty open-ended.
First, if your JSON templates are complex (a mix of static and dynamic parts), I suggest you look at a tool like RABL:
https://github.com/nesquena/rabl
There is a railscast on RABL here:
http://railscasts.com/episodes/322-rabl
RABL lets you create templates for generating custom JSON output.
Regarding writing to a file, you may or may not need to call the controller first. But the flow would be something like:
# sample_controller.rb
require 'json'

def get_sample
  @x = {:a => "apple", :b => "baker"}
  render json: @x
end
You can call the controller and get the rendered json.
z = get_sample
File.open(yourfile, 'w') { |file| file.write(z) }
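For the non-Rails case in the question, plain ERB from Ruby's standard library can render a JSON template without any controller or view. This is a minimal sketch, not the answer's RABL approach; the field names, `render_metadata`, and `metadata.txt` are hypothetical placeholders.

```ruby
require 'erb'
require 'json'
require 'time'

# Static fields are written literally in the template; dynamic values are
# interpolated with <%= %> and passed through #to_json so they come out
# correctly quoted and escaped. Field names here are placeholders.
TEMPLATE = <<~ERB
  {
    "generator": "my-cli",
    "format_version": 1,
    "created_at": <%= created_at.to_json %>,
    "source_file": <%= source_file.to_json %>
  }
ERB

# Render outside Rails: ERB#result_with_hash (Ruby 2.5+) exposes the hash keys
# as local variables inside the template.
def render_metadata(created_at:, source_file:)
  ERB.new(TEMPLATE).result_with_hash(created_at: created_at, source_file: source_file)
end

json = render_metadata(created_at: Time.now.utc.iso8601, source_file: "input.csv")

# A .txt extension is just part of the path; the file content is still JSON.
File.write("metadata.txt", json)
```

If the template stays this simple, building a Hash of static values, merging in the dynamic ones, and calling JSON.pretty_generate avoids writing JSON syntax by hand at all.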

XML Namespace issue with Nokogiri

I have the following XML:
<body>
<hello xmlns='http://...'>
<world>yes</world>
</hello>
</body>
When I load that into a Nokogiri XML document, and call document.at_css "world", I receive nil back. But when I remove the namespace for hello, it works perfectly. I know I can call document.remove_namespaces!, but why is it that it will not work with the namespace?
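Because the xmlns='http://...' declaration on <hello> is a default namespace: <hello> and every unprefixed element inside it, including <world>, belong to that namespace, and an unprefixed selector such as "world" only matches elements that are in no namespace. Nokogiri requires you to register the XML namespaces you are querying within (read more about XML Namespaces). You can still query the element if you bind a prefix to its namespace URI when calling at_css and use that prefix in the selector. To see the exact usage, check out the css method documentation. It should end up looking something like this:
document.at_css "ns|world", 'ns' => 'namespace URI'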

Ruby Builder Gem - dynamically set node name

I'm currently using the Builder gem for Ruby to generate XML representations for resources in my application. The XML representation has multiple child nodes that are always structured the same, but the top-level node has a different name depending on the value of a boolean property of the resource. Is there any way I can generate builder nodes dynamically? Something like this (tried this already, doesn't work):
if resource.attr
  top_level_node = :ForFlowBased
else
  top_level_node = :ForNonSeamlessOffload
end
builder = Builder::XmlMarkup.new
builder.send(top_level_node) do |top_level_node|
  # ...
end
That code will generate a node <send:ForFlowBased>. Similarly, if I call builder.(top_level_node) (Ruby shorthand for builder.call), the XML generated is <call:ForFlowBased>. I'm looking to dynamically send builder the method I want to invoke on it, without adding send or call to the XML as well.
Use tag!, which takes the tag name as an argument instead of as a method name:
builder.tag!(top_level_node) do |node|
  # build the child nodes on `node` here
end

HTML Entity problems using Nokogiri::XML.fragment

It seems that all entities are dropped when using:
tags = "<p>test umlauts &ouml;</p>"
Nokogiri::XML.fragment(tags)
Result:
<p>test umlauts </p>
The above method calls Nokogiri::XML::DocumentFragment.parse(tags), and that method calls
Nokogiri::XML::DocumentFragment.new(XML::Document.new, tags).
According to the Nokogiri documentation, this code is then executed:
def initialize document, tags=nil
  if tags
    parser = if self.kind_of?(Nokogiri::HTML::DocumentFragment)
      HTML::SAX::Parser.new(FragmentHandler.new(self, tags))
    else
      XML::SAX::Parser.new(FragmentHandler.new(self, tags))
    end
    parser.parse(tags)
  end
end
I think we are dealing with the XML::SAX::Parser and its FragmentHandler. Digging through the code gives no hint: which parameters do I have to set to get the correct result?
&ouml; is not a predefined entity in XML. If you want to allow the HTML entity references in XHTML, you'd need to use a parser that reads the external DTD in the doctype. This is a lot of effort; you may prefer to just use the HTML parser if you have HTML-compatible XHTML with entity references.
