How to access attributes using Nokogiri - ruby

I have a simple task of accessing the values of some attributes. This is a simple script that uses Nokogiri::XML::Builder to create a simple XML doc.
require 'nokogiri'
builder = Nokogiri::XML::Builder.new(:encoding => 'UTF-8') do |xml|
  xml.Placement(:messageId => "392847-039820-938777", :system => "MOD", :version => "2.0") {
    xml.objects {
      xml.object(:myattribute => "99", :anotherattrib => "333")
      xml.nextobject_ '9387toot'
      xml.Entertainment "Last Man Standing"
    }
  }
end
puts builder.to_xml
puts builder.root.attributes["messageId"]
The results are:
<?xml version="1.0" encoding="UTF-8"?>
<Placement messageId="392847-039820-938777" version="2.0" system="MOD">
  <objects>
    <object anotherattrib="333" myattribute="99"/>
    <nextobject>9387toot</nextobject>
    <Entertainment>Last Man Standing</Entertainment>
  </objects>
</Placement>
C:/Ruby/lib/ruby/gems/1.8/gems/nokogiri-1.4.2-x86-mingw32/lib/nokogiri/xml/document.rb:178:in `add_child': Document already has a root node (RuntimeError)
from C:/Ruby/lib/ruby/gems/1.8/gems/nokogiri-1.4.2-x86-mingw32/lib/nokogiri/xml/node.rb:455:in `parent='
from C:/Ruby/lib/ruby/gems/1.8/gems/nokogiri-1.4.2-x86-mingw32/lib/nokogiri/xml/builder.rb:358:in `insert'
from C:/Ruby/lib/ruby/gems/1.8/gems/nokogiri-1.4.2-x86-mingw32/lib/nokogiri/xml/builder.rb:350:in `method_missing'
from C:/Documents and Settings/etrojan/workspace/Lads/tryXPATH2.rb:15
The XML that is generated looks fine. However, my attempts to access attributes cause an error to be generated:
Document already has a root node
I don't understand why puts would cause this error.

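Using Nokogiri::XML::Reader works for your example, but probably isn't the full answer you are looking for. (Note that Builder has no root or attributes method: it turns unknown method calls into new elements via method_missing, so builder.root tries to add a second top-level element, which is what raises the error.)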
reader = Nokogiri::XML::Reader(builder.to_xml)
reader.read #Moves to next node in document
reader.attribute("messageId")
Note that if you issue reader.read again and then try reader.attribute("messageId"), the result will be nil, since the new current node does not have this attribute.
What you probably want to do is use Nokogiri::XML::Document if you want to search an XML document by attribute.
doc = Nokogiri::XML(builder.to_xml)
elems = doc.xpath("//*[@messageId]") # get all elements with a 'messageId' attribute
elems[0].attr('messageId') # gets the attribute's value from the first element

Here is a slightly more succinct way to access attributes using Nokogiri (assuming you already have your parsed document stored in a variable called xml, as covered by @atomicules' answer):
xml.xpath("//Placement").attr("messageId")

Related

Nokogiri check XML root/file validity

Is there a simple method/way to check if a Nokogiri XML file has a proper root, like xml.valid? A way to check if the XML file contains specific content is very welcome as well.
I'm thinking of something like xml.valid? or xml.has_valid_root?. Thanks!
How are you going to determine what is a proper root?
<foo></foo>
has a proper root:
require 'nokogiri'
xml = '<foo></foo>'
doc = Nokogiri::XML(xml)
doc.root # => #<Nokogiri::XML::Element:0x3fd3a9471b7c name="foo">
Nokogiri has no way of determining that something else should have been the root. You might be able to test if you have foreknowledge of what the root node's name should be:
doc_root_ok = (doc.root.name == 'foo')
doc_root_ok # => true
You can see if the document parsed was well-formed (not needing any fixup), by looking at errors:
doc.errors # => []
If Nokogiri had to fix up the document in order to parse it, errors will return a list of the syntax errors it recovered from during parsing:
xml = '<foo><bar><bar></foo>'
doc = Nokogiri::XML(xml)
doc.errors # => [#<Nokogiri::XML::SyntaxError: Opening and ending tag mismatch: bar line 1 and foo>, #<Nokogiri::XML::SyntaxError: Premature end of data in tag bar line 1>, #<Nokogiri::XML::SyntaxError: Premature end of data in tag foo line 1>]
A common and useful pattern is
doc = Nokogiri::XML(xml) do |config|
  config.strict
end
This will throw a wobbly (raise a Nokogiri::XML::SyntaxError) if the document is not well formed. I like to do this in order to prevent Nokogiri from being too kind to my XML.

Nokogiri::XML::Builder: Need to use the string "send" as element name

I am writing an application to generate XML files as input to SipP.
One tag frequently used by SipP is 'send'
The problem is, when I use nokogiri to build the xml for me
builder = Nokogiri::XML::Builder.new do |xml|
  xml.send "Some Content"
end
I get this
<?xml version="1.0"?>
<Some Content/>
The same happens when I do this:
builder = Nokogiri::XML::Builder.new do |xml|
  xml.send(:'send', "Some Content")
end
I can't spell 'SEND' in capital letters, because SipP won't understand it that way.
Any ideas how to force nokogiri to create an element with the name 'send'?
Thank you
From the docs:
The builder works by taking advantage of method_missing. Unfortunately
some methods are defined in ruby that are difficult or dangerous to
remove. You may want to create tags with the name “type”, “class”, and
“id” for example. In that case, you can use an underscore to
disambiguate your tag name from the method call.
So check the following:
irb(main):007:0> Nokogiri::XML::Builder.new { |xml| xml.send_ "foo" }.to_xml
=> "<?xml version=\"1.0\"?>\n<send>foo</send>\n"

Calling super's method (add namespace to Nokogiri XML document)

I have an XML document which is missing some namespace declaration. I know I can define it when I use doc.xpath() method, like the following:
doc.xpath('//dc:title', 'dc' => 'http://purl.org/dc/elements/1.1/')
However I would like to add it once since I have a lot of xpath calls.
I found out that Nokogiri::XML::Document inherits from Nokogiri::XML::Node, and the Node class has an add_namespace() method. However, I can't call it: Ruby says it is undefined.
Is this because Ruby does not allow calling a parent class's methods? Is there a way around this?
EDIT
I add the following console example:
> c = Nokogiri.XML(doc_text)
> c.class
=> Nokogiri::XML::Document
> c.add_namespace('a','b')
NoMethodError: undefined method `add_namespace' for #<Nokogiri::XML::Document:0x007fea4ee22c60>
And here is the API document about Nokogiri::XML class
EDIT again:
The original doc was valid xml like this:
<root xmlns:ra="...">
<item>
<title/>
<ra:price/>
</item>
<item>...
</root>
However, there are too many items, and I have to create one object for each of them, serialize it, and save it in the database. So for each object I took the item node, turned it into a string, and saved it in the object.
Now, after I revive the object from the DB and try to parse the item node again, I run into this namespace issue.
While Nokogiri::XML::Document does inherit from Nokogiri::XML::Node, some methods are explicitly removed at the document level, including add_namespace:
https://github.com/tenderlove/nokogiri/blob/master/lib/nokogiri/xml/document.rb#L203
As @pguardiario notes, you want to add namespaces to the root element, not the document.
However, doing this after parsing the document is too late. Nokogiri has already created the nodes, discarding the namespaces:
require 'nokogiri'
xml = "<r><a:b/></r>"
doc = Nokogiri.XML(xml)
p doc.at('b').namespace
#=> nil
doc.root.add_namespace 'a', 'foo'
puts doc
#=> <?xml version="1.0"?>
#=> <r xmlns:a="foo">
#=> <b/>
#=> </r>
You'll need to fix your source XML as a string before parsing with Nokogiri. (Unless there's some way with the SAX parser to add the namespace when you hit the first node, before moving on.)

How to generate XML file using Ruby and Builder::XmlMarkup templates?

As you all know, with Rails it is possible to use Builder::XmlMarkup templates to provide an HTTP response in XML format instead of HTML (with the respond_to command). My problem is that I would like to use the Builder::XmlMarkup templating system not with Rails but with Ruby only (i.e. a standalone program that generates/outputs an XML file from an XML template). The question is then twofold:
How do I tell the Ruby program which template I want to use? and
How do I tell the Builder class which XML file to write the output to?
There is already a similar answer to that in Stackoverflow (How do I generate XML from XMLBuilder using a .xml.builder file?), but I am afraid it is only valid for Rails.
Here's a simple example showing the basics:
require 'builder'
@received_data = {:books => [{ :author => "John Doe", :title => "Doeisms" }, { :author => "Jane Doe", :title => "Doeisms II" }]}
@output = ""
xml = Builder::XmlMarkup.new(:target => @output, :indent => 1)
xml.instruct!
xml.books do
  @received_data[:books].each do |book|
    xml.book do
      xml.title book[:title]
      xml.author book[:author]
    end
  end
end
The @output object will contain your xml markup:
<?xml version="1.0" encoding="UTF-8"?>
<books>
<book>
<title>Doeisms</title>
<author>John Doe</author>
</book>
<book>
<title>Doeisms II</title>
<author>Jane Doe</author>
</book>
</books>
The Builder docs at github.com provide more examples and links to more documentation.
To select a specific template, you could pass arguments to your program for this decision.
Anyway, I prefer to use libxml-ruby to parse and build XML documents, but that's a matter of taste.
I used Tilt to do (the first part of) this. It's really easy:
require 'builder'
require 'tilt'
template = Tilt.new('templates/foo.builder')
output = template.render
That will get you a string representation of your xml. At that point you can write it out to disk yourself.
