Nokogiri::XML::Builder: Need to use the string "send" as element name - ruby

I am writing an application to generate XML files as input to SipP.
One tag frequently used by SipP is 'send'
The problem is, when I use nokogiri to build the xml for me
builder = Nokogiri::XML::Builder.new do |xml|
xml.send "Some Content"
end
I get this
<?xml version="1.0"?>
<Some Content/>
The same happens when I do this:
builder = Nokogiri::XML::Builder.new do |xml|
xml.send(:'send', "Some Content")
end
I can't spell 'SEND' in capital letters, because SipP won't understand it that way.
Any ideas how to force nokogiri to create an element with the name 'send'?
Thank you

From the docs:
The builder works by taking advantage of method_missing. Unfortunately
some methods are defined in ruby that are difficult or dangerous to
remove. You may want to create tags with the name “type”, “class”, and
“id” for example. In that case, you can use an underscore to
disambiguate your tag name from the method call.
So check the following:
irb(main):007:0> Nokogiri::XML::Builder.new { |xml| xml.send_ "foo" }.to_xml
=> "<?xml version=\"1.0\"?>\n<send>foo</send>\n"
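To see why xml.send "Some Content" produces a tag named after its argument, here is a minimal sketch of the method_missing technique in plain Ruby. TinyBuilder is a hypothetical stand-in, not Nokogiri's implementation; it only assumes that undefined method names become tags and that one trailing underscore is stripped.

```ruby
# Hypothetical mini-builder for illustration only (not Nokogiri's code).
class TinyBuilder
  attr_reader :out

  def initialize
    @out = String.new
  end

  # Any undefined method name becomes a tag; one trailing underscore is dropped.
  def method_missing(name, content = nil)
    tag = name.to_s.sub(/_\z/, "")
    @out << "<#{tag}>#{content}</#{tag}>"
    self
  end

  def respond_to_missing?(_name, _include_private = false)
    true
  end
end

b = TinyBuilder.new
b.foo "bar"              # foo is undefined, so method_missing builds a <foo> tag
b.send_ "baz"            # send_ is undefined too, so it becomes a <send> tag
b.send "Some Content"    # Object#send exists: it calls a method NAMED "Some Content"
puts b.out
```

The last call never reaches method_missing under the name send, because every Ruby object already has a send method. That is the same reason the underscore form is needed in the real builder.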

Related

Nokogiri - Checking if the value of an xpath exists and is blank or not in Ruby

I have an XML file, and before I process it I need to make sure that a certain element exists and is not blank.
Here is the code I have:
CSV.open("#{csv_dir}/products.csv","w",{:force_quotes => true}) do |out|
out << headers
Dir.glob("#{xml_dir}/*.xml").each do |xml_file|
gdsn_doc = GDSNDoc.new(xml_file)
logger.info("Processing xml file #{xml_file}")
:x
@desc_exists = @gdsn_doc.xpath("//productData/description")
if !@desc_exists.empty?
row = []
headers.each do |col|
row << product[col]
end
out << row
end
end
end
The following code is not working to find the "description" element and to check whether it is blank or not:
@desc_exists = @gdsn_doc.xpath("//productData/description")
if !@desc_exists.empty?
Here is a sample of the XML file:
<productData>
<description>Chocolate biscuits </description>
</productData>
This is how I have defined the class and Nokogiri:
class GDSNDoc
def initialize(xml_file)
@doc = File.open(xml_file) {|f| Nokogiri::XML(f)}
@doc.remove_namespaces!
The code had to be moved up to an earlier stage, where Nokogiri was initialised. It doesn't get runtime errors, but it does let XML files with blank descriptions get through and it shouldn't.
class GDSNDoc
def initialize(xml_file)
@doc = File.open(xml_file) {|f| Nokogiri::XML(f)}
@doc.remove_namespaces!
desc_exists = @doc.xpath("//productData/descriptions")
if !desc_exists.empty?
You are creating your instance like this:
gdsn_doc = GDSNDoc.new(xml_file)
then use it like this:
@desc_exists = @gdsn_doc.xpath("//productData/description")
@gdsn_doc and gdsn_doc are two different things in Ruby: the first is an instance variable, the second is a local variable. Try just using the version without the @:
@desc_exists = gdsn_doc.xpath("//productData/description")
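The difference can be reproduced without Nokogiri. This is a small sketch in which a plain string stands in for the parsed document; the variable names ivar_value and error_message are made up for the illustration.

```ruby
gdsn_doc = "parsed document"   # local variable, assigned

# @gdsn_doc is an instance variable that was never assigned, so it is nil.
ivar_value = @gdsn_doc

begin
  # Calling xpath on nil is what raises the NoMethodError.
  @gdsn_doc.xpath("//productData/description")
rescue NoMethodError => e
  error_message = e.message
end

p gdsn_doc
p ivar_value
puts error_message
```

Ruby does not complain when an unassigned instance variable is read; it silently evaluates to nil, which is why the mistake only surfaces at the xpath call.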
The basic test is to use:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<productData>
<description>Chocolate biscuits </description>
</productData>
EOT
# using XPath selectors...
doc.xpath('//productData/description').to_html # => "<description>Chocolate biscuits </description>"
doc.xpath('//description').to_html # => "<description>Chocolate biscuits </description>"
xpath works fine when the document is parsed correctly.
I get an error "undefined method 'xpath' for nil:NilClass (NoMethodError)"
Usually this means you didn't parse the document correctly. In your case it's because you're not using the right variable:
gdsn_doc = GDSNDoc.new(xml_file)
...
@desc_exists = @gdsn_doc.xpath("//productData/description")
Note that gdsn_doc is not the same as @gdsn_doc. The latter doesn't appear to have been initialized.
@doc = File.open(xml_file) {|f| Nokogiri::XML(f)}
While that should work, it's idiomatic to write it as:
@doc = Nokogiri::XML(File.read(xml_file))
File.open(...) do ... end is preferred if you're processing inside the block and want Ruby to automatically close the file. That isn't necessary when you're simply reading the content and passing it to something else for processing, hence the use of File.read(...), which slurps the file. (Slurping isn't necessarily a good practice because it can have scalability problems, but for reasonably sized XML/HTML it's OK because it's easier to use DOM-based parsing than SAX.)
If Nokogiri doesn't raise an exception, it was able to parse the content; however, that still doesn't mean the content was valid. It's a good idea to check
@doc.errors
to see whether Nokogiri/libXML had to do some fix-ups on the content just to be able to parse it. Fixing the markup can change the DOM from what you expect, making it impossible to find a tag based on your assumptions for the selector. You could use xmllint or one of the XML validators to check, but Nokogiri will still have to be happy.
Nokogiri includes a command-line version nokogiri that accepts a URL to the document you want to parse:
nokogiri http://example.com
It'll open IRB with the content loaded and ready for you to poke at it. It's very convenient when debugging and testing. It's also a decent way to make sure the content actually exists if you're dealing with HTML containing DHTML that loads parts of the page dynamically.

Create non-self-closed empty tag with Nokogiri

When I try to create an XML document with Nokogiri::XML::Builder:
builder = Nokogiri::XML::Builder.new do |xml|
xml.my_tag({key: :value})
end
I get the following XML tag:
<my_tag key="value"/>
It is self-closed, but I need the full form:
<my_tag key="value"></my_tag>
When I pass a value inside the node (or even a space):
xml.my_tag("content", key: :value)
xml.my_tag(" ", key: :value)
It generates the full tag:
<my_tag key="value">content</my_tag>
<my_tag key="value"> </my_tag>
But if I pass either an empty string or nil, or even an empty block:
xml.my_tag("", key: :value)
It generates a self-closed tag:
<my_tag key="value"/>
I believe there should be some attribute or something else that helps me but simple Googling didn't find the answer.
I found a possible solution in "Building blank XML tags with Nokogiri?" but it saves all tags as non-self-closed.
You can use Nokogiri's NO_EMPTY_TAGS save option. (XML calls self-closing tags empty-element tags.)
builder = Nokogiri::XML::Builder.new do |xml|
xml.my_tag({key: :value})
end
puts builder.to_xml(save_with: Nokogiri::XML::Node::SaveOptions::NO_EMPTY_TAGS)
<?xml version="1.0"?>
<my_tag key="value"></my_tag>
Each of the options is represented in a bit, so you can mix and match the ones you want. For example, setting NO_EMPTY_TAGS by itself will leave your XML on one line without spacing or indentation. If you still want it formatted for humans, you can bitwise or (|) it with the FORMAT option.
builder = Nokogiri::XML::Builder.new do |xml|
xml.my_tag({key: :value}) do |my_tag|
my_tag.nested({another: :value})
end
end
puts builder.to_xml(
save_with: Nokogiri::XML::Node::SaveOptions::NO_EMPTY_TAGS
)
puts
puts builder.to_xml(
save_with: Nokogiri::XML::Node::SaveOptions::NO_EMPTY_TAGS |
Nokogiri::XML::Node::SaveOptions::FORMAT
)
<?xml version="1.0"?>
<my_tag key="value"><nested another="value"></nested></my_tag>
<?xml version="1.0"?>
<my_tag key="value">
<nested another="value"></nested>
</my_tag>
There are also a handful of DEFAULT_* options at the end of the list that already combine options into common uses.
Your update mentions "it saves all tags as non-self-closed", as if perhaps you only want this single tag instance to be non-self-closed, and the rest to self close. Nokogiri won't produce an inconsistent document like that, but if you must, you can concatenate some XML strings together that you built with different options.

Nokogiri to_xml without carriage returns

I'm currently using the Nokogiri::XML::Builder class to construct an XML document, then calling .to_xml on it. The resulting string always contains a bunch of spaces, linefeeds and carriage returns in between the nodes, and I can't for the life of me figure out how to get rid of them. Here's an example:
b = Nokogiri::XML::Builder.new do |xml|
xml.root do
xml.text("Value")
end
end
b.to_xml
This results in the following:
<?xml version="1.0"?>
<root>Value</root>
What I want is this (notice the missing newline):
<?xml version="1.0"?><root>Value</root>
How can this be done? Thanks in advance!
Builder#to_xml by default outputs formatted (i.e. indented) XML. You can use the Nokogiri::XML::Node::SaveOptions to get an almost unformatted result.
b = Nokogiri::XML::Builder.new do |xml|
xml.root do
xml.foo do
xml.text("Value")
end
end
end
b.to_xml
# => "<?xml version=\"1.0\"?>\n<root>\n <foo>Value</foo>\n</root>\n"
b.to_xml(:save_with => Nokogiri::XML::Node::SaveOptions::AS_XML)
# => "<?xml version=\"1.0\"?>\n<root><foo>Value</foo></root>\n"
Now you could just get rid of the XML header (which is optional anyway) and remove the last newline:
b.to_xml(:save_with => Nokogiri::XML::Node::SaveOptions::AS_XML | Nokogiri::XML::Node::SaveOptions::NO_DECLARATION).strip
# => "<root><foo>Value</foo></root>"
Just removing all newlines in the XML is probably a bad idea as newlines can actually be significant (e.g. in <pre> blocks of XHTML). If that is not the case for you (and you are really sure of that) you could just do it.
This is not something that Nokogiri is designed to do. The closest you can get is to serialize the root of the document with no newlines or indentation, and then add the PI yourself (if you really need it):
require 'nokogiri'
b = Nokogiri::XML::Builder.new{ |xml| xml.root{ xml.foo "Value" } }
p b.to_xml
#=> "<?xml version=\"1.0\"?>\n<root>\n <foo>Value</foo>\n</root>\n"
p b.doc.serialize(save_with:0)
#=> "<?xml version=\"1.0\"?>\n<root><foo>Value</foo></root>\n"
flat_root = b.doc.root.serialize(save_with:0)
p flat_root
#=> "<root><foo>Value</foo></root>"
puts %Q{<?xml version="1.0"?>#{flat_root}}
#=> <?xml version="1.0"?><root><foo>Value</foo></root>
Alternatively, you could simply cheat and do:
puts b.doc.serialize(save_with:0).sub("\n","")
#=> <?xml version="1.0"?><root><foo>Value</foo></root>
Note the usage of sub instead of gsub to only replace the first known-present newline.
b.to_xml returns a string. You just need to replace the first instance of \n in the string.
require 'nokogiri'
b = Nokogiri::XML::Builder.new do |xml|
xml.root do
xml.text("Value")
end
end
b.to_xml.sub("\n",'')
Probably easier than trying to overload the method.

Nokogiri and XML Formatting When Inserting Tags

I'd like to use Nokogiri to insert nodes into an XML document. Nokogiri uses the Nokogiri::XML::Builder class to insert or create new XML.
If I create XML using the new method, I'm able to create nice, formatted XML:
builder = Nokogiri::XML::Builder.new do |xml|
xml.product {
xml.test "hi"
}
end
puts builder
outputs the following:
<?xml version="1.0"?>
<product>
<test>hi</test>
</product>
That's great, but what I want to do is add the above XML to an existing document, not create a new document. According to the Nokogiri documentation, this can be done by using the Builder's with method, like so:
builder = Nokogiri::XML::Builder.with(document.at('products')) do |xml|
xml.product {
xml.test "hi"
}
end
puts builder
When I do this, however, the XML all gets put into a single line with no indentation. It looks like this:
<products><product><test>hi</test></product></products>
Am I missing something to get it to format correctly?
Found the answer in the Nokogiri mailing list:
In XML, whitespace can be considered
meaningful. If you parse a document
that contains whitespace nodes,
libxml2 will assume that whitespace
nodes are meaningful and will not
insert them for you.
You can tell libxml2 that whitespace
is not meaningful by passing the
"noblanks" flag to the parser. To
demonstrate, here is an example that
reproduces your error, then does what
you want:
require 'nokogiri'
def build_from node
builder = Nokogiri::XML::Builder.with(node) do|xml|
xml.hello do
xml.world
end
end
end
xml = DATA.read
doc = Nokogiri::XML(xml)
puts build_from(doc.at('bar')).to_xml
doc = Nokogiri::XML(xml) { |x| x.noblanks }
puts build_from(doc.at('bar')).to_xml
Output:
<root>
<foo>
<bar>
<baz />
</bar>
</foo>
</root>

How to access attributes using Nokogiri

I have a simple task of accessing the values of some attributes. This is a simple script that uses Nokogiri::XML::Builder to create a simple XML doc.
require 'nokogiri'
builder = Nokogiri::XML::Builder.new(:encoding => 'UTF-8') do |xml|
xml.Placement(:messageId => "392847-039820-938777", :system => "MOD", :version => "2.0") {
xml.objects {
xml.object(:myattribute => "99", :anotherattrib => "333")
xml.nextobject_ '9387toot'
xml.Entertainment "Last Man Standing"
}
}
end
puts builder.to_xml
puts builder.root.attributes["messageId"]
The results are:
<?xml version="1.0" encoding="UTF-8"?>
<Placement messageId="392847-039820-938777" version="2.0" system="MOD">
<objects>
<object anotherattrib="333" myattribute="99"/>
<nextobject>9387toot</nextobject>
<Entertainment>Last Man Standing</Entertainment>
</objects>
</Placement>
C:/Ruby/lib/ruby/gems/1.8/gems/nokogiri-1.4.2-x86-mingw32/lib/nokogiri/xml/document.rb:178:in `add_child': Document already has a root node (RuntimeError)
from C:/Ruby/lib/ruby/gems/1.8/gems/nokogiri-1.4.2-x86-mingw32/lib/nokogiri/xml/node.rb:455:in `parent='
from C:/Ruby/lib/ruby/gems/1.8/gems/nokogiri-1.4.2-x86-mingw32/lib/nokogiri/xml/builder.rb:358:in `insert'
from C:/Ruby/lib/ruby/gems/1.8/gems/nokogiri-1.4.2-x86-mingw32/lib/nokogiri/xml/builder.rb:350:in `method_missing'
from C:/Documents and Settings/etrojan/workspace/Lads/tryXPATH2.rb:15
The XML that is generated looks fine. However, my attempts to access attributes cause an error to be generated:
Document already has a root node
I don't understand why puts would cause this error.
Using Nokogiri::XML::Reader works for your example, but probably isn't the full answer you are looking for (Note that there is no attributes method for Builder).
reader = Nokogiri::XML::Reader(builder.to_xml)
reader.read #Moves to next node in document
reader.attribute("messageId")
Note that if you issued reader.read again and then tried reader.attribute("messageId") the result will be nil since the current node will not have this attribute.
What you probably want to do is use Nokogiri::XML::Document if you want to search an XML document by attribute.
doc = Nokogiri::XML(builder.to_xml)
elems = doc.xpath("//*[@messageId]") #get all elements with an attribute of 'messageId'
elems[0].attr('messageId') #gets value of attribute of first elem
Here is a slightly more succinct way to access attributes using Nokogiri (assuming you already have your xml stored in a variable called xml, as covered by @atomicules' answer):
xml.xpath("//Placement").attr("messageId")
