Adding an XML Element to a Nokogiri::XML::Builder document - ruby

How can I add a Nokogiri::XML::Element to an XML document that is being created with Nokogiri::XML::Builder?
My current solution is to serialize the element and use the << method to have the Builder reinterpret it.
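require 'nokogiri'
orig_doc = Nokogiri::XML('<root xmlns="foobar"><a>test</a></root>')
node = orig_doc.at('/*/*[1]')
# Assign the builder first: with `puts Nokogiri::XML::Builder.new do ... end`
# the do/end block would bind to puts instead of Builder.new.
builder = Nokogiri::XML::Builder.new do |doc|
  doc.another {
    # FIXME: this is the round-trip I would like to avoid
    xml_text = node.to_xml(:skip_instruct => true).to_s
    doc << xml_text
    doc.second("hi")
  }
end
puts builder.to_xml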
# The expected result is
#
# <another>
# <a xmlns="foobar">test</a>
# <second>hi</second>
# </another>
However, the Nokogiri::XML::Element is quite large (on the order of kilobytes and thousands of nodes) and this code is in the hot path. Profiling shows that the serialization/parsing round trip is very expensive.
How can I instruct the Nokogiri Builder to add the existing XML element node in the "current" position?

Without using a private method you can get a handle on the current parent element using the parent method of the Builder instance. Then you can append an element to that (even from another document). For example:
require 'nokogiri'
doc1 = Nokogiri.XML('<r><a>success!</a></r>')
a = doc1.at('a')
# note that `xml` is not a Nokogiri::XML::Document,
# but rather a Nokogiri::XML::Builder instance.
doc2 = Nokogiri::XML::Builder.new do |xml|
xml.some do
xml.more do
xml.parent << a
end
end
end.doc
puts doc2
#=> <?xml version="1.0"?>
#=> <some>
#=> <more>
#=> <a>success!</a>
#=> </more>
#=> </some>

After looking at the Nokogiri source I found this fragile solution: calling the private #insert(node) method through send.
The code, modified to use that private method, looks like this:
doc.another {
  # pass the node itself, so nothing is serialized and reparsed
  doc.send('insert', node) # <= use `#insert` instead of `<<`
  doc.second("hi")
}

Related

How to convert partial XML to hash in Ruby

I have a string which has plain text and extra spaces and carriage returns then XML-like tags followed by XML tags:
string = "hi there.
<SET-TOPIC> INITIATE </SET-TOPIC>
<SETPROFILE>
<KEY>name</KEY>
<VALUE>Joe</VALUE>
</SETPROFILE>
<SETPROFILE>
<KEY>email</KEY>
<VALUE>Email#hi.com</VALUE>
</SETPROFILE>
<GET-RELATIONS>
<COLLECTION>goals</COLLECTION>
<VALUE>walk upstairs</VALUE>
</GET-RELATIONS>
So what do you think?
Is it true?
"
I want to parse this the way Nori, Nokogiri, or Ox convert XML to a hash.
My goal is to be able to easily pull out the top level tags as keys and then know all the elements, something like:
Keys = ['SETPROFILE', 'SETPROFILE', 'SET-TOPIC', 'GET-OBJECT']
Values[0] = [{name => Joe}, {email => email#hi.com}]
Values[3] = [{collection => goals}, {value => walk up}]
I have seen several functions like that for true XML but all of mine are partial.
I started going down this line of thinking:
parsed = doc.search('*').each_with_object({}) do |n, h|
(h[n.name] ||= []) << n.text
end
I'd probably do something along these lines if I wanted the keys and values variables:
require 'nokogiri'
string = "hi there.
<SET-TOPIC> INITIATE </SET-TOPIC>
<SETPROFILE>
<KEY>name</KEY>
<VALUE>Joe</VALUE>
</SETPROFILE>
<SETPROFILE>
<KEY>email</KEY>
<VALUE>Email#hi.com</VALUE>
</SETPROFILE>
<GET-RELATIONS>
<COLLECTION>goals</COLLECTION>
<VALUE>walk upstairs</VALUE>
</GET-RELATIONS>
So what do you think?
Is it true?
"
doc = Nokogiri::XML('<root>' + string + '</root>', nil, nil, Nokogiri::XML::ParseOptions::NOBLANKS)
nodes = doc.root.children.reject { |n| n.is_a?(Nokogiri::XML::Text) }.map { |node|
[
node.name, node.children.map { |c|
[c.name, c.content]
}.to_h
]
}
nodes
# => [["SET-TOPIC", {"text"=>" INITIATE "}],
# ["SETPROFILE", {"KEY"=>"name", "VALUE"=>"Joe"}],
# ["SETPROFILE", {"KEY"=>"email", "VALUE"=>"Email#hi.com"}],
# ["GET-RELATIONS", {"COLLECTION"=>"goals", "VALUE"=>"walk upstairs"}]]
From nodes it's possible to grab the rest of the detail:
keys = nodes.map(&:first)
# => ["SET-TOPIC", "SETPROFILE", "SETPROFILE", "GET-RELATIONS"]
values = nodes.map(&:last)
# => [{"text"=>" INITIATE "},
# {"KEY"=>"name", "VALUE"=>"Joe"},
# {"KEY"=>"email", "VALUE"=>"Email#hi.com"},
# {"COLLECTION"=>"goals", "VALUE"=>"walk upstairs"}]
values[0] # => {"text"=>" INITIATE "}
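Because SETPROFILE appears twice, a plain hash keyed by tag name would keep only one of the two entries. Here is a small sketch, assuming the nodes array has the shape shown above (it is repeated as a literal so the snippet runs on its own); it groups the field hashes by tag name. The name grouped is made up for this illustration.

```ruby
# Literal copy of the `nodes` structure produced above
nodes = [
  ['SET-TOPIC', { 'text' => ' INITIATE ' }],
  ['SETPROFILE', { 'KEY' => 'name', 'VALUE' => 'Joe' }],
  ['SETPROFILE', { 'KEY' => 'email', 'VALUE' => 'Email#hi.com' }],
  ['GET-RELATIONS', { 'COLLECTION' => 'goals', 'VALUE' => 'walk upstairs' }]
]

# Collect every field hash under its tag name, keeping duplicates
grouped = nodes.each_with_object({}) do |(name, fields), acc|
  (acc[name] ||= []) << fields
end

grouped.each { |name, entries| puts "#{name}: #{entries.size} entries" }
```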
If you'd rather, it's possible to pre-process the DOM and remove the top-level text:
doc.root.children.select { |n| n.is_a?(Nokogiri::XML::Text) }.each(&:remove)
doc.to_xml
# => "<root><SET-TOPIC> INITIATE </SET-TOPIC><SETPROFILE><KEY>name</KEY><VALUE>Joe</VALUE></SETPROFILE><SETPROFILE><KEY>email</KEY><VALUE>Email#hi.com</VALUE></SETPROFILE><GET-RELATIONS><COLLECTION>goals</COLLECTION><VALUE>walk upstairs</VALUE></GET-RELATIONS></root>\n"
That makes it easier to work with the XML.
Wrap the string content in a node and you can parse that with Nokogiri. The text outside the XML segments will become text nodes inside the new wrapper node.
str = "hi there. .... Is it true?"
doc = Nokogiri::XML("<wrapper>#{str}</wrapper>")
segments = doc.xpath('/*/SETPROFILE')
Now you can use "Convert a Nokogiri document to a Ruby Hash" to convert the segments into a hash.
However, if the plain text contains characters that must be escaped according to the XML spec, you'll need to find those and escape them yourself.
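As a rough illustration of that escaping step, here is a sketch that handles only one case: bare ampersands that are not already part of a predefined or numeric entity. The helper name escape_bare_ampersands is hypothetical, and stray < characters in the plain text are not handled, since telling them apart from real tags needs knowledge of your input.

```ruby
# Hypothetical helper: escapes "&" unless it already starts one of the five
# predefined XML entities or a numeric character reference.
def escape_bare_ampersands(text)
  text.gsub(/&(?!(?:amp|lt|gt|quot|apos|#\d+|#x\h+);)/, '&amp;')
end

raw = 'Tom & Jerry &amp; co &#169; <KEY>x</KEY>'
escaped = escape_bare_ampersands(raw)

# The escaped text can then be wrapped and handed to the parser
wrapped = "<wrapper>#{escaped}</wrapper>"
puts wrapped
```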

Nokogiri Builder: Replace RegEx match with XML

While using Nokogiri::XML::Builder I need to be able to generate a node that also replaces a regex match on the text with some other XML.
Currently I'm able to add additional XML inside the node. Here's an example:
def xml
Nokogiri::XML::Builder.new do |xml|
xml.chapter {
xml.para {
xml.parent.add_child("Testing[1] footnote paragraph.")
add_footnotes(xml, 'An Entry')
}
}
end.to_xml
end
# further child nodes WILL be added to footnote
def add_footnotes(xml, text)
xml.footnote text
end
which produces:
<chapter>
<para>Testing[1] footnote paragraph.<footnote>An Entry</footnote></para>
</chapter>
But I need to be able to run a regex replace on the reference [1], replacing it with the <footnote> XML, producing output like the following:
<chapter>
<para>Testing<footnote>An Entry</footnote> footnote paragraph.</para>
</chapter>
I'm making the assumption here that the add_footnotes method would receive the reference match (e.g. as $1), which would be used to pull the appropriate footnote from a collection.
That method would also be adding additional child nodes, such as the following:
<footnote>
<para>Words.</para>
<para>More words.</para>
</footnote>
Can anyone help?
Here's a spin on your code that shows how to generate the output. You'll need to adapt it to your own code.
require 'nokogiri'
FOOTNOTES = {
'1' => 'An Entry'
}
child_text = "Testing[1] footnote paragraph."
pre_footnote, footnote_id, post_footnote = /^(.+)\[(\d+)\](.+)/.match(child_text).captures
doc = Nokogiri::XML::Builder.new do |xml|
xml.chapter {
xml.para {
xml.text(pre_footnote)
xml.footnote FOOTNOTES[footnote_id]
xml.text(post_footnote)
}
}
end
puts doc.to_xml
Which outputs:
<?xml version="1.0"?>
<chapter>
<para>Testing<footnote>An Entry</footnote> footnote paragraph.</para>
</chapter>
The trick is you have to grab the text preceding and following your target so you can insert those as text nodes. Then you can figure out what needs to be added. For clarity in your code you should preprocess all the text, get your variables figured out, then fall into the XML generator. Don't try to do any calculations inside the Builder block, instead just reference variables. Think of Builder like a view in an MVC-type application if that helps.
FOOTNOTES could actually be a database lookup, a hash or some other data container.
You should also look at the << method, which lets you inject XML source, so you could pre-build the footnote XML, then loop over an array containing the various footnotes and inject them. Often it's easier to pre-process, then use gsub to treat things like [1] as placeholders. See "gsub(pattern, hash) → new_str" in the documentation, along with this example:
'hello'.gsub(/[eo]/, 'e' => 3, 'o' => '*') #=> "h3ll*"
For instance:
require 'nokogiri'
text = 'this is[1] text and[2] text'
footnotes = {
'[1]' => 'some',
'[2]' => 'more'
}
footnotes.keys.each do |k|
v = footnotes[k]
footnotes[k] = "<footnote>#{ v }</footnote>"
end
replacement_xml = text.gsub(/\[\d+\]/, footnotes) # => "this is<footnote>some</footnote> text and<footnote>more</footnote> text"
doc = Nokogiri::XML::Builder.new do |xml|
xml.chapter {
xml.para { xml.<<(replacement_xml) }
}
end
puts doc.to_xml
# >> <?xml version="1.0"?>
# >> <chapter>
# >> <para>this is<footnote>some</footnote> text and<footnote>more</footnote> text</para>
# >> </chapter>
You can try the following:
require 'nokogiri'
def xml
Nokogiri::XML::Builder.new do |xml|
xml.chapter {
xml.para {
xml.parent.add_child("Testing[1] footnote paragraph.")
add_footnotes(xml, 'add text',"[1]")
}
}
end.to_xml
end
def add_footnotes(xml, text,ref)
string = xml.parent.child.content
xml.parent.child.content = ""
string.partition(ref).each do |txt|
next xml.text(txt) if txt != ref
xml.footnote text
end
end
puts xml
# >> <?xml version="1.0"?>
# >> <chapter>
# >> <para>Testing<footnote>add text</footnote> footnote paragraph.</para>
# >> </chapter>

Create one XML file that joins many others

I am trying to create one XML file from a list of XML files.
Here is an example list:
java.xml :
<JavaDetails>
<SomeList> ... </SomeList>
....
</JavaDetails>
c.xml
<CDetails>
<SomeList> ... </SomeList>
....
</CDetails>
I want to create a Programming.xml from the above XML files.
It should look like:
<programming>
<Java>
<JavaDetails>
<SomeList> ... </SomeList>
....
</JavaDetails>
</Java>
<C>
<CDetails>
<SomeList> ... </SomeList>
....
</CDetails>
</C>
</programming>
I am currently looking into Nokogiri for this because performance is a major factor. What I am not sure about is how to create nodes for the output XML. Any code help in Ruby using Nokogiri is much appreciated.
To create a new XML file with a specific root, it can be as simple as:
doc = Nokogiri.XML("<programming/>")
One way to add a child node to that document:
java = doc.root.add_child('<Java/>').first
To read in another XML file from disk and append it:
java_details = Nokogiri.XML(IO.read('java.xml'))
java << java_details.root
Thus, if you have an array of filenames and you want to construct wrapping elements from each based on the name:
require 'nokogiri'
files = %w[ java.xml c.xml ]
doc = Nokogiri.XML('<programming/>')
files.each do |filename|
wrap_name = File.basename(filename,'.*').capitalize
wrapper = doc.root.add_child("<#{wrap_name} />").first
wrapper << Nokogiri.XML(IO.read(filename)).root
end
puts doc
Alternatively, if you want to use the Builder interface of Nokogiri:
builder = Nokogiri::XML::Builder.new do |xml|
xml.programming do
files.each do |filename|
wrap_name = File.basename(filename,'.*').capitalize
xml.send(wrap_name) do
xml.parent << Nokogiri.XML(IO.read(filename)).root
end
end
end
end
puts builder.to_xml
To install it:
gem install nokogiri
Here's the syntax:
require 'nokogiri'
builder = Nokogiri::XML::Builder.new do |xml|
xml.programming {
xml.Java {
xml.JavaDetails {
xml.SomeList 'List item'
}
}
}
end
The result can be retrieved with to_xml:
builder.to_xml
HTH!

Nokogiri to_xml without carriage returns

I'm currently using the Nokogiri::XML::Builder class to construct an XML document, then calling .to_xml on it. The resulting string always contains a bunch of spaces, linefeeds and carriage returns in between the nodes, and I can't for the life of me figure out how to get rid of them. Here's an example:
b = Nokogiri::XML::Builder.new do |xml|
xml.root do
xml.text("Value")
end
end
b.to_xml
This results in the following:
<?xml version="1.0"?>
<root>Value</root>
What I want is this (notice the missing newline):
<?xml version="1.0"?><root>Value</root>
How can this be done? Thanks in advance!
Builder#to_xml by default outputs formatted (i.e. indented) XML. You can use the Nokogiri::XML::Node::SaveOptions to get an almost unformatted result.
b = Nokogiri::XML::Builder.new do |xml|
xml.root do
xml.foo do
xml.text("Value")
end
end
end
b.to_xml
# => "<?xml version=\"1.0\"?>\n<root>\n <foo>Value</foo>\n</root>\n"
b.to_xml(:save_with => Nokogiri::XML::Node::SaveOptions::AS_XML)
# => "<?xml version=\"1.0\"?>\n<root><foo>Value</foo></root>\n"
Now you could get rid of the XML declaration (which is optional anyway) and strip the trailing newline:
b.to_xml(:save_with => Nokogiri::XML::Node::SaveOptions::AS_XML | Nokogiri::XML::Node::SaveOptions::NO_DECLARATION).strip
# => "<root><foo>Value</foo></root>"
Just removing all newlines in the XML is probably a bad idea as newlines can actually be significant (e.g. in <pre> blocks of XHTML). If that is not the case for you (and you are really sure of that) you could just do it.
This is not something that Nokogiri is designed to do. The closest you can get is to serialize the root of the document with no newlines or indentation, and then add the PI yourself (if you really need it):
require 'nokogiri'
b = Nokogiri::XML::Builder.new{ |xml| xml.root{ xml.foo "Value" } }
p b.to_xml
#=> "<?xml version=\"1.0\"?>\n<root>\n <foo>Value</foo>\n</root>\n"
p b.doc.serialize(save_with:0)
#=> "<?xml version=\"1.0\"?>\n<root><foo>Value</foo></root>\n"
flat_root = b.doc.root.serialize(save_with:0)
p flat_root
#=> "<root><foo>Value</foo></root>"
puts %Q{<?xml version="1.0"?>#{flat_root}}
#=> <?xml version="1.0"?><root><foo>Value</foo></root>
Alternatively, you could simply cheat and do:
puts b.doc.serialize(save_with:0).sub("\n","")
#=> <?xml version="1.0"?><root><foo>Value</foo></root>
Note the usage of sub instead of gsub to only replace the first known-present newline.
b.to_xml returns a string. You just need to replace the first instance of \n in the string.
require 'nokogiri'
b = Nokogiri::XML::Builder.new do |xml|
xml.root do
xml.text("Value")
end
end
b.to_xml.sub("\n",'')
Probably easier than trying to overload the method.

REXML: Equivalent of javascript-DOM's .innerHTML=

Is there a way to pass a string to an REXML::Element in such a way that the string will be parsed as XML, and the elements so found inserted into the target?
You can extend the REXML::Element class to include innerHTML as shown below.
require "rexml/element"
class REXML::Element
def innerHTML=(xml)
require "rexml/document"
self.to_a.each do |e|
self.delete e
end
d = REXML::Document.new "<root>#{xml}</root>"
d.root.to_a.each do |e|
case e
when REXML::Text
self.add_text e
when REXML::Element
self.add_element e
else
puts "ERROR"
end
end
xml
end
def innerHTML
ret = ''
self.to_a.each do |e|
ret += e.to_s
end
ret
end
end
You can then use innerHTML as you would in JavaScript (more or less).
require "rexml/document"
doc = REXML::Document.new "<xml><alice><b>bob</b><chuck>ch<u>u</u>ck</chuck></alice><alice/></xml>"
c = doc.root.get_elements('//chuck').first
t = c.innerHTML
c.innerHTML = "#{t}<david>#{t}</david>"
c = doc.root.get_elements('//alice').last
c.innerHTML = "<david>#{t}</david>"
doc.write( $stdout, 2 )
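If you would rather not reopen REXML::Element, the same idea can be sketched as a standalone helper. The method name replace_children and the wrapper element name are hypothetical. As in the code above, the markup is parsed inside a temporary root element and the resulting children are moved into the target.

```ruby
require 'rexml/document'

# Hypothetical helper: replaces the children of element with the nodes
# parsed from markup, without monkey patching REXML::Element.
def replace_children(element, markup)
  element.to_a.each { |child| element.delete(child) }
  fragment = REXML::Document.new("<wrapper>#{markup}</wrapper>")
  # to_a returns a copy, so moving children while iterating is safe
  fragment.root.to_a.each { |child| element.add(child) }
  element
end

doc = REXML::Document.new('<xml><alice><old/></alice></xml>')
target = doc.root.elements['alice']
replace_children(target, 'hi <b>bob</b>')
puts doc
```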
It would help if you could provide an example to further illustrate exactly what you had in mind.
With JS innerHTML you can insert text or HTML in one shot and changes are immediately displayed in the HTML document. The only way I know how to do this in REXML is with separate steps for inserting content/elements and saving/reloading the document.
To modify the text of a specific REXML Element you can use the text=() method.
#e represents a REXML Element
e.text = "blah"
If you want to insert another element you have to use the add_element() method.
#e represents a REXML Element
b = e.add_element('blah') # adds <blah/> to e and returns the new element
b.text = 'some text' # sets the text of <blah>
Then of course save the XML document with the changes. See ruby-doc.org/REXML/Element.
Calling text() with no argument returns the element's first text node as a string.

Resources