Nokogiri to_xml without carriage returns - ruby

I'm currently using the Nokogiri::XML::Builder class to construct an XML document, then calling .to_xml on it. The resulting string always contains a bunch of spaces, linefeeds and carriage returns in between the nodes, and I can't for the life of me figure out how to get rid of them. Here's an example:
b = Nokogiri::XML::Builder.new do |xml|
xml.root do
xml.text("Value")
end
end
b.to_xml
This results in the following:
<?xml version="1.0"?>
<root>Value</root>
What I want is this (notice the missing newline):
<?xml version="1.0"?><root>Value</root>
How can this be done? Thanks in advance!

Builder#to_xml by default outputs formatted (i.e. indented) XML. You can use the Nokogiri::XML::Node::SaveOptions to get an almost unformatted result.
b = Nokogiri::XML::Builder.new do |xml|
xml.root do
xml.foo do
xml.text("Value")
end
end
end
b.to_xml
# => "<?xml version=\"1.0\"?>\n<root>\n <foo>Value</foo>\n</root>\n"
b.to_xml(:save_with => Nokogiri::XML::Node::SaveOptions::AS_XML)
# => "<?xml version=\"1.0\"?>\n<root><foo>Value</foo></root>\n"
Now you could either just get rid of the XML header (which is optional anyway) and remove the last newline
b.to_xml(:save_with => Nokogiri::XML::Node::SaveOptions::AS_XML | Nokogiri::XML::Node::SaveOptions::NO_DECLARATION).strip
# => "<root><foo>Value</foo></root>"
Just removing all newlines in the XML is probably a bad idea as newlines can actually be significant (e.g. in <pre> blocks of XHTML). If that is not the case for you (and you are really sure of that) you could just do it.

This is not something that Nokogiri is designed to do. The closest you can get is to serialize the root of the document with no newlines or indentation, and then add the PI yourself (if you really need it):
require 'nokogiri'
b = Nokogiri::XML::Builder.new{ |xml| xml.root{ xml.foo "Value" } }
p b.to_xml
#=> "<?xml version=\"1.0\"?>\n<root>\n <foo>Value</foo>\n</root>\n"
p b.doc.serialize(save_with:0)
#=> "<?xml version=\"1.0\"?>\n<root><foo>Value</foo></root>\n"
flat_root = b.doc.root.serialize(save_with:0)
p flat_root
#=> "<root><foo>Value</foo></root>"
puts %Q{<?xml version="1.0"?>#{flat_root}}
#=> <?xml version="1.0"?><root><foo>Value</foo></root>
Alternatively, you could simply cheat and do:
puts b.doc.serialize(save_with:0).sub("\n","")
#=> <?xml version="1.0"?><root><foo>Value</foo></root>
Note the usage of sub instead of gsub to only replace the first known-present newline.

b.to_xml returns a string. You just need to replace the first instance of \n in the string.
require 'nokogiri'
b = Nokogiri::XML::Builder.new do |xml|
xml.root do
xml.text("Value")
end
end
b.to_xml.sub("\n",'')
Probably easier than trying to overload the method.

Related

Adding a XML Element to a Nokogiri::XML::Builder document

How can I add a Nokogiri::XML::Element to a XML document that is being created with Nokogiri::XML::Buider?
My current solution is to serialize the element and use the << method to have the Builder reinterpret it.
orig_doc = Nokogiri::XML('<root xmlns="foobar"><a>test</a></root>')
node = orig_doc.at('/*/*[1]')
puts Nokogiri::XML::Builder.new do |doc|
doc.another {
# FIXME: this is the round-trip I would like to avoid
xml_text = node.to_xml(:skip_instruct => true).to_s
doc << xml_text
doc.second("hi")
}
end.to_xml
# The expected result is
#
# <another>
# <a xmlns="foobar">test</a>
# <second>hi</second>
# </another>
However the Nokogiri::XML::Element is a quite big node (in the order of kilobytes and thousands of nodes) and this code is in the hot path. Profiling shows that the serialization/parsing round trip is very expensive.
How can I instruct the Nokogiri Builder to add the existing XML element node in the "current" position?
Without using a private method you can get a handle on the current parent element using the parent method of the Builder instance. Then you can append an element to that (even from another document). For example:
require 'nokogiri'
doc1 = Nokogiri.XML('<r><a>success!</a></r>')
a = doc1.at('a')
# note that `xml` is not a Nokogiri::XML::Document,
# but rather a Nokogiri::XML::Builder instance.
doc2 = Nokogiri::XML::Builder.new do |xml|
xml.some do
xml.more do
xml.parent << a
end
end
end.doc
puts doc2
#=> <?xml version="1.0"?>
#=> <some>
#=> <more>
#=> <a>success!</a>
#=> </more>
#=> </some>
After looking at the Nokogiri source I have found this fragile solution: using the protected #insert(node) method.
The code, modified to use that private method looks like this:
doc.another {
xml_text = node.to_xml(:skip_instruct => true).to_s
doc.send('insert', xml_text) # <= use `#insert` instead of `<<`
doc.second("hi")
}

Nokogiri Builder: Replace RegEx match with XML

While using Nokogiri::XML::Builder I need to be able to generate a node that also replaces a regex match on the text with some other XML.
Currently I'm able to add additional XML inside the node. Here's an example;
def xml
Nokogiri::XML::Builder.new do |xml|
xml.chapter {
xml.para {
xml.parent.add_child("Testing[1] footnote paragraph.")
add_footnotes(xml, 'An Entry')
}
}
end.to_xml
end
# further child nodes WILL be added to footnote
def add_footnotes(xml, text)
xml.footnote text
end
which produces;
<chapter>
<para>Testing[1] footnote paragraph.<footnote>An Entry</footnote></para>
</chapter>
But I need to be able to run a regex replace on the reference [1], replacing it with the <footnote> XML, producing output like the following;
<chapter>
<para>Testing<footnote>An Entry</footnote> footnote paragraph.</para>
</chapter>
I'm making the assumption here that the add_footnotes method would receive the reference match (e.g. as $1), which would be used to pull the appropriate footnote from a collection.
That method would also be adding additional child nodes, such as the following;
<footnote>
<para>Words.</para>
<para>More words.</para>
</footnote>
Can anyone help?
Here's a spin on your code that shows how to generate the output. You'll need to refit it to your own code....
require 'nokogiri'
FOOTNOTES = {
'1' => 'An Entry'
}
child_text = "Testing[1] footnote paragraph."
pre_footnote, footnote_id, post_footnote = /^(.+)\[(\d+)\](.+)/.match(child_text).captures
doc = Nokogiri::XML::Builder.new do |xml|
xml.chapter {
xml.para {
xml.text(pre_footnote)
xml.footnote FOOTNOTES[footnote_id]
xml.text(post_footnote)
}
}
end
puts doc.to_xml
Which outputs:
<?xml version="1.0"?>
<chapter>
<para>Testing<footnote>An Entry</footnote> footnote paragraph.</para>
</chapter>
The trick is you have to grab the text preceding and following your target so you can insert those as text nodes. Then you can figure out what needs to be added. For clarity in your code you should preprocess all the text, get your variables figured out, then fall into the XML generator. Don't try to do any calculations inside the Builder block, instead just reference variables. Think of Builder like a view in an MVC-type application if that helps.
FOOTNOTES could actually be a database lookup, a hash or some other data container.
You should also look at the << method, which lets you inject XML source, so you could pre-build the footnote XML, then loop over an array containing the various footnotes and inject them. Often it's easier to pre-process, then use gsub to treat things like [1] as placeholders. See "gsub(pattern, hash) → new_str" in the documentation, along with this example:
'hello'.gsub(/[eo]/, 'e' => 3, 'o' => '*') #=> "h3ll*"
For instance:
require 'nokogiri'
text = 'this is[1] text and[2] text'
footnotes = {
'[1]' => 'some',
'[2]' => 'more'
}
footnotes.keys.each do |k|
v = footnotes[k]
footnotes[k] = "<footnote>#{ v }</footnote>"
end
replacement_xml = text.gsub(/\[\d+\]/, footnotes) # => "this is<footnote>some</footnote> text and<footnote>more</footnote> text"
doc = Nokogiri::XML::Builder.new do |xml|
xml.chapter {
xml.para { xml.<<(replacement_xml) }
}
end
puts doc.to_xml
# >> <?xml version="1.0"?>
# >> <chapter>
# >> <para>this is<footnote>some</footnote> text and<footnote>more</footnote> text</para>
# >> </chapter>
I can try as below :
require 'nokogiri'
def xml
Nokogiri::XML::Builder.new do |xml|
xml.chapter {
xml.para {
xml.parent.add_child("Testing[1] footnote paragraph.")
add_footnotes(xml, 'add text',"[1]")
}
}
end.to_xml
end
def add_footnotes(xml, text,ref)
string = xml.parent.child.content
xml.parent.child.content = ""
string.partition(ref).each do |txt|
next xml.text(txt) if txt != ref
xml.footnote text
end
end
puts xml
# >> <?xml version="1.0"?>
# >> <chapter>
# >> <para>Testing<footnote>add text</footnote> footnote paragraph.</para>
# >> </chapter>

How to add XML string to Nokogiri Builder

I have an existing Nokogiri builder and some xml nodes in a string from a different source. How can I add this string to my builder?
str = "<options><cc>true</cc></options>"
xml = Nokogiri::XML::Builder.new do |q|
q.query do |f|
f.name "awesome"
f.filter str
end
end
This escapes str into something like:
xml.to_xml
=> "<?xml version=\"1.0\"?>\n<query>\n <name>awesome</name>\n <filter><options><cc>true</cc></options></filter>\n</query>\n"
I have found many, many similar things, including nesting builders and using the << operator, but nothing works to insert a full xml node tree into a builder block.
How can I make that string into real nodes?
What problems did you find using <<? This works for me:
xml = Nokogiri::XML::Builder.new do |q|
q.query do |f|
f.name "awesome"
f << str
end
end
and avoids using the private insert method.
And, as usual, I found the answer shortly after posting...
xml = Nokogiri::XML::Builder.new do |q|
q.query do |f|
f.name "awesome"
f.__send__ :insert, Nokogiri::XML::DocumentFragment.parse( str )
end
end.to_xml
Gives you
=> "<?xml version=\"1.0\"?>\n<query>\n <name>awesome</name>\n <options>\n <cc>true</cc>\n </options>\n</query>\n"
EDIT: This way worked for me when << failed for some unknown reason. However, as others have pointed out it works by directly accessing the :insert method which was intended to be protected. Consider it both "bad practice" and a last resort.

Nokogiri::XML::Builder: Need to use the string "send" as element name

I am writing an application to generate XML files as input to SipP.
One tag frequently used by SipP is 'send'
The problem is, when I use nokogiri to build the xml for me
builder = Nokogiri::XML::Builder.new do |xml|
xml.send "Some Content"
end
I get this
<?xml version="1.0"?>
<Some Content/>
The same happens when I do this:
builder = Nokogiri::XML::Builder.new do |xml|
xml.send(:'send', "Some Content")
end
I can't spell 'SEND' in capital letters, because SipP won't understand it that way.
Any ideas how to force nokogiri to create an element with the name 'send'?
Thank you
From the docs:
The builder works by taking advantage of method_missing. Unfortunately
some methods are defined in ruby that are difficult or dangerous to
remove. You may want to create tags with the name “type”, “class”, and
“id” for example. In that case, you can use an underscore to
disambiguate your tag name from the method call.
So check the following:
irb(main):007:0> Nokogiri::XML::Builder.new { |xml| xml.send_ "foo" }.to_xml
=> "<?xml version=\"1.0\"?>\n<send>foo</send>\n"

How to replace every occurrence of a pattern in a string using Ruby?

I have an XML file which is too big. To make it smaller, I want to replace all tags and attribute names with shorter versions of the same thing.
So, I implemented this:
string.gsub!(/<(\w+) /) do |match|
case match
when 'Image' then 'Img'
when 'Text' then 'Txt'
end
end
puts string
which deletes all opening tags but does not do much else.
What am I doing wrong here?
Here's another way:
class String
def minimize_tags!
{"image" => "img", "text" => "txt"}.each do |from,to|
gsub!(/<#{from}\b/i,"<#{to}")
gsub!(/<\/#{from}>/i,"<\/#{to}>")
end
self
end
end
This will probably be a little easier to maintain, since the replacement patterns are all in one place. And on strings of any significant size, it may be a lot faster than Kevin's way. I did a quick speed test of these two methods using the HTML source of this stackoverflow page itself as the test string, and my way was about 6x faster...
Here's the beauty of using a parser such as Nokogiri:
This lets you manipulate selected tags (nodes) and their attributes:
require 'nokogiri'
xml = <<EOT
<xml>
<Image ImagePath="path/to/image">image comment</Image>
<Text TextFont="courier" TextSize="9">this is the text</Text>
</xml>
EOT
doc = Nokogiri::XML(xml)
doc.search('Image').each do |n|
n.name = 'img'
n.attributes['ImagePath'].name = 'path'
end
doc.search('Text').each do |n|
n.name = 'txt'
n.attributes['TextFont'].name = 'font'
n.attributes['TextSize'].name = 'size'
end
print doc.to_xml
# >> <?xml version="1.0"?>
# >> <xml>
# >> <img path="path/to/image">image comment</img>
# >> <txt font="courier" size="9">this is the text</txt>
# >> </xml>
If you need to iterate through every node, maybe to do a universal transformation on the tag-name, you can use doc.search('*').each. That would be slower than searching for individual tags, but might result in less code if you need to change every tag.
The nice thing about using a parser is it'll work even if the layout of the XML changes since it doesn't care about whitespace, and will work even if attribute order changes, making your code more robust.
Try this:
string.gsub!(/(<\/?)(\w+)/) do |match|
tag_mark = $1
case $2
when /^image$/i
"#{tag_mark}Img"
when /^text$/i
"#{tag_mark}Txt"
else
match
end
end

Resources