How do I validate specific attributes in XML using Ruby's REXML? - ruby

I'm trying to read some XML I've retrieved from a web service, and validate a specific attribute within the XML.
This is the XML up to the tag that I need to validate:
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
<s:Body>
<QueryResponse xmlns="http://tempuri.org/">
<QueryResult xmlns:a="http://schemas.datacontract.org/2004/07/Entity"
xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<a:Navigation i:nil="true" />
<a:SearchResult>
<a:EntityList>
<a:BaseEntity i:type="a:Product">
<a:ExtractDateTime>1290398428</a:ExtractDateTime>
<a:ExtractDateTimeFormatted>11/22/2010
04:00:28</a:ExtractDateTimeFormatted>
Here's the code I have thus far using REXML in Ruby:
require 'xmlsimple'
require 'rexml/document'
require 'rexml/streamlistener'
include REXML
class Listener
include StreamListener
xmlfile = File.new("rbxml_CS_Query.xml")
xmldoc = Document.new(xmlfile)
# Now get the root element
root = xmldoc.root
puts root.attributes["a:EntityList"]
# This will output the date/time of the query response
xmldoc.elements.each("a:BaseEntity"){
|e| puts e.attributes["a:ExtractDateTimeFormatted"]
}
end
I need to validate that ExtractDateTimeFormatted is there and has a valid value for that attribute. Any help is greatly appreciated. :)
Reading from local xml file.
File.open('temp.xml', 'w') { |f|
f.puts request
f.close
}
xml = File.read('temp.xml')
doc = Nokogiri::XML::Reader(xml)
extract_date_time_formatted = doc.at(
'//a:ExtractDateTimeFormatted',
'a' => 'http://schemas.datacontract.org/2004/07/Entity'
).inner_text
show = DateTime.strptime(extract_date_time_formatted, '%m/%d/%Y')
puts show
When I run this code I get an error: "undefined method 'at' for # on line 21

Are you tied to REXML or can you switch to Nokogiri? I highly recommend Nokogiri over the other Ruby XML parsers.
I had to add enough XML tags to make the sample validate.
require 'date'
require 'nokogiri'
xml = %q{<?xml version="1.0"?>
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
<s:Body>
<QueryResponse xmlns="http://tempuri.org/">
<QueryResult xmlns:a="http://schemas.datacontract.org/2004/07/Entity" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<a:Navigation i:nil="true"/>
<a:SearchResult>
<a:EntityList>
<a:BaseEntity i:type="a:Product">
<a:ExtractDateTime>1290398428</a:ExtractDateTime>
<a:ExtractDateTimeFormatted>11/22/2010</a:ExtractDateTimeFormatted>
</a:BaseEntity>
</a:EntityList>
</a:SearchResult>
</QueryResult>
</QueryResponse>
</s:Body>
</s:Envelope>
}
doc = Nokogiri::XML(xml)
extract_date_time_formatted = doc.at(
'//a:ExtractDateTimeFormatted',
'a' => 'http://schemas.datacontract.org/2004/07/Entity'
).inner_text
puts DateTime.strptime(extract_date_time_formatted, '%m/%d/%Y')
# >> 2010-11-22T00:00:00+00:00
There's a couple things going on that could make this harder to handle than a simple XML file.
The XML is using namespaces. They are useful, but you have to tell the parser how to handle them. That is why I had to add the second parameter to the at() accessor.
The date value is in a format that is often ambiguous. For many days of the year it is hard to tell whether it is mm/dd/yyyy or dd/mm/yyyy. Here in the U.S. we assume it's the first, but Europe is the second. The DateTime parser tries to figure it out but often gets it wrong, especially when it thinks it's supposed to be dealing with a month 22. So, rather than let it guess, I told it to use mm/dd/yyyy format. If a date doesn't match that format, or the date's values are out of range Ruby will raise an exception, so you'll need to code for that.
This is an example of how to retrieve and parse XML on the fly:
require 'nokogiri'
require 'open-uri'
doc = Nokogiri::XML(open('http://java.sun.com/developer/earlyAccess/xml/examples/samples/book-order.xml'))
puts doc.class
puts doc.to_xml
And an example of how to read a local XML file and parse it:
require 'nokogiri'
doc = Nokogiri::XML(File.read('test.xml'))
puts doc.to_xml
# >> <?xml version="1.0"?>
# >> <root xmlns:foo="bar">
# >> <bar xmlns:hello="world"/>
# >> </root>

Related

Nokogiri XSLT tagging document as XML type when using JSON

I am using Nokogiri to transform an XML document to JSON. The code is straight forward:
#document = Nokogiri::XML(entry.data)
xslt = Nokogiri::XSLT(File.read("#{File.dirname(__FILE__)}/../../xslt/my.xslt"))
transform = xslt.transform(#document)
entry in this case is a Mongoid based model and data is an XML blob attribute stored as a string on MongoDB.
When I dump the contents of transform, the JSON is there. The problem is, Nokogiri is tagging the top of the document with:
<?xml version="1.0"?>
What's the correct way of addressing that?
Try the #apply_to method as below(Source):
require 'nokogiri'
doc = Nokogiri::XML('<?xml version="1.0"><root />')
xslt = Nokogiri::XSLT("<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'/>")
puts xslt.transform(doc)
puts "######"
puts xslt.apply_to(doc)
# >> <?xml version="1.0"?>
# >> ######
# >>

Validating DTD-String with Nokogiri

I am switching from LibXML to Nokogiri. I have a method in my code to check if an xml document matches an Dtd. The Dtd is read from a database (as string).
This is an example within an irb session
require 'xml'
doc = LibXML::XML::Document.string('<foo bar="baz" />') #=> <?xml version="1.0" encoding="UTF-8"?>
dtd = LibXML::XML::Dtd.new('<!ELEMENT foo EMPTY><!ATTLIST foo bar ID #REQUIRED>') #=> #<LibXML::XML::Dtd:0x000000026f53b8>
doc.validate dtd #=> true
As I understand #validate of Nokogiri::XML::Document it is only possible to check DTDs within the Document. How would I do this to archive the same result?
I think what you need is internal_subset:
require 'nokogiri'
doc = Nokogiri::HTML("<!DOCTYPE html>")
# then you can get the info you want
doc.internal_subset # Nokogiri::XML::DTD
# for example you can get name, system_id, external_id, etc
doc.internal_subset.name
doc.internal_subset.system_id
doc.internal_subset.external_id
Here is a full doc of Nokogiri::XML::DTD.
Thanks

converting from xml name-values into simple hash

I don't know what name this goes by and that's been complicating my search.
My data file OX.session.xml is in the (old?) form
<?xml version="1.0" encoding="utf-8"?>
<CAppLogin xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://oxbranch.optionsxpress.com">
<SessionID>FE5E27A056944FBFBEF047F2B99E0BF6</SessionID>
<AccountNum>8228-5500</AccountNum>
<AccountID>967454</AccountID>
</CAppLogin>
What is that XML data format called exactly?
Anyway, all I want is to end up with one hash in my Ruby code like so:
CAppLogin = { :SessionID => "FE5E27A056944FBFBEF047F2B99E0BF6", :AccountNum => "8228-5500", etc. } # Doesn't have to be called CAppLogin as in the file, may be fixed
What might be shortest, most built-in Ruby way to automate that hash read, in a way I can update the SessionID value and store it easily back into the file for later program runs?
I've played around with YAML, REXML but would rather not yet print my (bad) example trials.
There are a few libraries you can use in Ruby to do this.
Ruby toolbox has some good coverage of a few of them:
https://www.ruby-toolbox.com/categories/xml_mapping
I use XMLSimple, just require the gem then load in your xml file using xml_in:
require 'xmlsimple'
hash = XmlSimple.xml_in('session.xml')
If you're in a Rails environment, you can just use Active Support:
require 'active_support'
session = Hash.from_xml('session.xml')
Using Nokogiri to parse the XML with namespaces:
require 'nokogiri'
dom = Nokogiri::XML(File.read('OX.session.xml'))
node = dom.xpath('ox:CAppLogin',
'ox' => "http://oxbranch.optionsxpress.com").first
hash = node.element_children.each_with_object(Hash.new) do |e, h|
h[e.name.to_sym] = e.content
end
puts hash.inspect
# {:SessionID=>"FE5E27A056944FBFBEF047F2B99E0BF6",
# :AccountNum=>"8228-5500", :AccountID=>"967454"}
If you know that the CAppLogin is the root element, you can simplify a bit:
require 'nokogiri'
dom = Nokogiri::XML(File.read('OX.session.xml'))
hash = dom.root.element_children.each_with_object(Hash.new) do |e, h|
h[e.name.to_sym] = e.content
end
puts hash.inspect
# {:SessionID=>"FE5E27A056944FBFBEF047F2B99E0BF6",
# :AccountNum=>"8228-5500", :AccountID=>"967454"}

Is there a Nokogiri example code for parsing Acrobat XFDF?

I am looking for a ruby code snippet that shows use of Nokogiri to parse Acrobat XFDF data.
It's no different than parsing any other XML:
require 'nokogiri'
xfdf = '<?xml version="1.0" encoding="UTF-8"?>
<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">
  <f href="Demo PDF Form.pdf"/>
  <fields>
    <field name="Date of Birth">
      <value>01-01-1960</value>
    </field>
    <field name="Your Name">
      <value>Mr. Customer</value>
    </field>
  </fields>
  <ids original="FEBDB19E0CD32274C16CE13DCF244AD2" modified="5BE74DD4F607B7409DC03D600E466E12"/>
</xfdf>
'
doc = Nokogiri::XML(xfdf)
doc.at('//xmlns:f')['href'] # => "Demo PDF Form.pdf"
doc.at('//xmlns:field[#name="Date of Birth"]').text # => "\n      01-01-1960\n    "
doc.at('//xmlns:field[#name="Your Name"]').text # => "\n      Mr. Customer\n    "
It uses a XML namespace, so you have to honor that in the xpaths, or deal with it by telling Nokogiri to ignore them, but this is common in XML.
You can use [nguyen][1] gem to do parsing job
xfdf = Nguyen::Xfdf.new(:key => 'value', :other_key => 'other value')
# use to_xfdf if you just want the XFDF data, without writing it to a file puts
xfdf.to_xfdf
# write xfdf file
xfdf.save_to('path/to/file.xfdf')

Nokogiri and XML Formatting When Inserting Tags

I'd like to use Nokogiri to insert nodes into an XML document. Nokogiri uses the Nokogiri::XML::Builder class to insert or create new XML.
If I create XML using the new method, I'm able to create nice, formatted XML:
builder = Nokogiri::XML::Builder.new do |xml|
xml.product {
xml.test "hi"
}
end
puts builder
outputs the following:
<?xml version="1.0"?>
<product>
<test>hi</test>
</product>
That's great, but what I want to do is add the above XML to an existing document, not create a new document. According to the Nokogiri documentation, this can be done by using the Builder's with method, like so:
builder = Nokogiri::XML::Builder.with(document.at('products')) do |xml|
xml.product {
xml.test "hi"
}
end
puts builder
When I do this, however, the XML all gets put into a single line with no indentation. It looks like this:
<products><product><test>hi</test></product></products>
Am I missing something to get it to format correctly?
Found the answer in the Nokogiri mailing list:
In XML, whitespace can be considered
meaningful. If you parse a document
that contains whitespace nodes,
libxml2 will assume that whitespace
nodes are meaningful and will not
insert them for you.
You can tell libxml2 that whitespace
is not meaningful by passing the
"noblanks" flag to the parser. To
demonstrate, here is an example that
reproduces your error, then does what
you want:
require 'nokogiri'
def build_from node
builder = Nokogiri::XML::Builder.with(node) do|xml|
xml.hello do
xml.world
end
end
end
xml = DATA.read
doc = Nokogiri::XML(xml)
puts build_from(doc.at('bar')).to_xml
doc = Nokogiri::XML(xml) { |x| x.noblanks }
puts build_from(doc.at('bar')).to_xml
Output:
<root>
<foo>
<bar>
<baz />
</bar>
</foo>
</root>

Resources