Create one XML file that joins many others - ruby

I am trying to create a single XML file from a list of XML files.
Here is an example list:
java.xml :
<JavaDetails>
<SomeList> ... </SomeList>
....
</JavaDetails>
c.xml
<CDetails>
<SomeList> ... </SomeList>
....
</CDetails>
I want to create a Programming.xml from the above files.
It should look like this:
<programming>
<Java>
<JavaDetails>
<SomeList> ... </SomeList>
....
</JavaDetails>
</Java>
<C>
<CDetails>
<SomeList> ... </SomeList>
....
</CDetails>
</C>
</programming>
I am currently looking into Nokogiri for this because performance is a major factor. What I am not sure about is how to create nodes for the output XML. Any code help in Ruby using Nokogiri is much appreciated.

To create a new XML file with a specific root, it can be as simple as:
doc = Nokogiri.XML("<programming/>")
One way to add a child node to that document:
java = doc.root.add_child('<Java/>').first
To read in another XML file from disk and append it:
java_details = Nokogiri.XML( IO.read('java.xml') )
java << java_details.root
Thus, if you have an array of filenames and you want to construct wrapping elements from each based on the name:
require 'nokogiri'
files = %w[ java.xml c.xml ]
doc = Nokogiri.XML('<programming/>')
files.each do |filename|
wrap_name = File.basename(filename,'.*').capitalize
wrapper = doc.root.add_child("<#{wrap_name} />").first
wrapper << Nokogiri.XML(IO.read(filename)).root
end
puts doc
Alternatively, if you want to use the Builder interface of Nokogiri:
builder = Nokogiri::XML::Builder.new do |xml|
xml.programming do
files.each do |filename|
wrap_name = File.basename(filename,'.*').capitalize
xml.send(wrap_name) do
xml.parent << Nokogiri.XML(IO.read(filename)).root
end
end
end
end
puts builder.to_xml

To install it:
gem install nokogiri
Here's the syntax:
require 'nokogiri'
builder = Nokogiri::XML::Builder.new do |xml|
xml.programming {
xml.Java {
xml.JavaDetails {
xml.SomeList 'List item'
}
}
}
end
The result can be retrieved with to_xml:
builder.to_xml
HTH!

Related

how to use nokogiri to parse xml file for specific values?

I have an xml file from which I need to extract all values that contain https://www.example.com/a/b:
<xml>
<url><loc>https://www.example.com/a/b</loc></url>
<url><loc>https://www.example.com/b/c</loc></url>
<url><loc>https://www.example.com/a/b/c</loc></url>
<url><loc>https://www.example.com/c/d</loc></url>
</xml>
Given the above, this should return two results. I've opened the file and parsed it with Nokogiri, but I do not understand how to access the values of the //loc key.
require 'nokogiri'
require 'open-uri'
doc = File.open('./sitemap-en.xml') { |f| Nokogiri::XML(f) }
puts doc.xpath('//loc')
The above code puts the entire xml file, but I want it pared down so that I get everything under the /a/b subdirectories. How can I do this?
Both of the following solutions assume the following:
require 'nokogiri'
xml = <<-XML
<xml>
<url><loc>https://www.example.com/a/b</loc></url>
<url><loc>https://www.example.com/b/c</loc></url>
<url><loc>https://www.example.com/a/b/c</loc></url>
<url><loc>https://www.example.com/c/d</loc></url>
</xml>
XML
doc = Nokogiri::XML(xml)
To return a list of all loc elements, select only those whose inner text begins with https://www.example.com/a/b, and print the URL text:
elements = doc.xpath("//loc")
filtered_elements = elements.select do |element|
element.text.start_with? 'https://www.example.com/a/b'
end
filtered_elements.each do |element|
puts element.text
end
To capture a list of loc elements whose inner text contains the string https://www.example.com/a/b and print each URL:
elements = doc.xpath("//loc[contains(text(), 'https://www.example.com/a/b')]")
elements.each do |element|
puts element.text
end
To quickly print the URLs using a slightly modified version of the previous XPath query:
puts doc.xpath("//loc[contains(text(), 'https://www.example.com/a/b')]/text()")

Adding an XML Element to a Nokogiri::XML::Builder document

How can I add a Nokogiri::XML::Element to an XML document that is being created with Nokogiri::XML::Builder?
My current solution is to serialize the element and use the << method to have the Builder reinterpret it.
orig_doc = Nokogiri::XML('<root xmlns="foobar"><a>test</a></root>')
node = orig_doc.at('/*/*[1]')
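# assign the builder first: in `puts Builder.new do ... end`,
# the do/end block would bind to `puts` rather than to `new`
builder = Nokogiri::XML::Builder.new do |doc|
doc.another {
# FIXME: this is the round-trip I would like to avoid
xml_text = node.to_xml(:skip_instruct => true).to_s
doc << xml_text
doc.second("hi")
}
end
puts builder.to_xml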
# The expected result is
#
# <another>
# <a xmlns="foobar">test</a>
# <second>hi</second>
# </another>
However the Nokogiri::XML::Element is a quite big node (in the order of kilobytes and thousands of nodes) and this code is in the hot path. Profiling shows that the serialization/parsing round trip is very expensive.
How can I instruct the Nokogiri Builder to add the existing XML element node in the "current" position?
Without using a private method you can get a handle on the current parent element using the parent method of the Builder instance. Then you can append an element to that (even from another document). For example:
require 'nokogiri'
doc1 = Nokogiri.XML('<r><a>success!</a></r>')
a = doc1.at('a')
# note that `xml` is not a Nokogiri::XML::Document,
# but rather a Nokogiri::XML::Builder instance.
doc2 = Nokogiri::XML::Builder.new do |xml|
xml.some do
xml.more do
xml.parent << a
end
end
end.doc
puts doc2
#=> <?xml version="1.0"?>
#=> <some>
#=> <more>
#=> <a>success!</a>
#=> </more>
#=> </some>
After looking at the Nokogiri source I have found this fragile solution: using the private #insert(node) method.
The code, modified to use that private method, looks like this:
doc.another {
# pass the node itself, so there is no serialize/parse round trip
doc.send('insert', node) # <= use `#insert` instead of `<<`
doc.second("hi")
}

Nokogiri Builder: Replace RegEx match with XML

While using Nokogiri::XML::Builder I need to be able to generate a node that also replaces a regex match on the text with some other XML.
Currently I'm able to add additional XML inside the node. Here's an example:
def xml
Nokogiri::XML::Builder.new do |xml|
xml.chapter {
xml.para {
xml.parent.add_child("Testing[1] footnote paragraph.")
add_footnotes(xml, 'An Entry')
}
}
end.to_xml
end
# further child nodes WILL be added to footnote
def add_footnotes(xml, text)
xml.footnote text
end
which produces:
<chapter>
<para>Testing[1] footnote paragraph.<footnote>An Entry</footnote></para>
</chapter>
But I need to be able to run a regex replace on the reference [1], replacing it with the <footnote> XML, producing output like the following:
<chapter>
<para>Testing<footnote>An Entry</footnote> footnote paragraph.</para>
</chapter>
I'm assuming here that the add_footnotes method would receive the reference match (e.g. as $1), which would be used to pull the appropriate footnote from a collection.
That method would also add further child nodes, such as the following:
<footnote>
<para>Words.</para>
<para>More words.</para>
</footnote>
Can anyone help?
Here's a spin on your code that shows how to generate the output. You'll need to refit it to your own code....
require 'nokogiri'
FOOTNOTES = {
'1' => 'An Entry'
}
child_text = "Testing[1] footnote paragraph."
pre_footnote, footnote_id, post_footnote = /^(.+)\[(\d+)\](.+)/.match(child_text).captures
doc = Nokogiri::XML::Builder.new do |xml|
xml.chapter {
xml.para {
xml.text(pre_footnote)
xml.footnote FOOTNOTES[footnote_id]
xml.text(post_footnote)
}
}
end
puts doc.to_xml
Which outputs:
<?xml version="1.0"?>
<chapter>
<para>Testing<footnote>An Entry</footnote> footnote paragraph.</para>
</chapter>
The trick is you have to grab the text preceding and following your target so you can insert those as text nodes. Then you can figure out what needs to be added. For clarity in your code you should preprocess all the text, get your variables figured out, then fall into the XML generator. Don't try to do any calculations inside the Builder block, instead just reference variables. Think of Builder like a view in an MVC-type application if that helps.
FOOTNOTES could actually be a database lookup, a hash or some other data container.
You should also look at the << method, which lets you inject XML source, so you could pre-build the footnote XML, then loop over an array containing the various footnotes and inject them. Often it's easier to pre-process, then use gsub to treat things like [1] as placeholders. See "gsub(pattern, hash) → new_str" in the documentation, along with this example:
'hello'.gsub(/[eo]/, 'e' => 3, 'o' => '*') #=> "h3ll*"
For instance:
require 'nokogiri'
text = 'this is[1] text and[2] text'
footnotes = {
'[1]' => 'some',
'[2]' => 'more'
}
footnotes.keys.each do |k|
v = footnotes[k]
footnotes[k] = "<footnote>#{ v }</footnote>"
end
replacement_xml = text.gsub(/\[\d+\]/, footnotes) # => "this is<footnote>some</footnote> text and<footnote>more</footnote> text"
doc = Nokogiri::XML::Builder.new do |xml|
xml.chapter {
xml.para { xml.<<(replacement_xml) }
}
end
puts doc.to_xml
# >> <?xml version="1.0"?>
# >> <chapter>
# >> <para>this is<footnote>some</footnote> text and<footnote>more</footnote> text</para>
# >> </chapter>
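Because `<<` injects raw XML source, footnote text containing characters such as & or < would need escaping before it is spliced in. A minimal sketch of that preprocessing step in plain Ruby, using the block form of gsub; `inject_footnotes` and the sample notes are hypothetical, and unknown references are simply left in place:

```ruby
# Hypothetical helper: replaces [n] markers with <footnote> markup,
# escaping the footnote text so it is safe to inject as XML source.
def inject_footnotes(text, notes)
  text.gsub(/\[(\d+)\]/) do |marker|
    note = notes[Regexp.last_match(1)]
    # String#encode(xml: :text) escapes &, < and >.
    note ? "<footnote>#{note.encode(xml: :text)}</footnote>" : marker
  end
end

notes = { '1' => 'R&D', '2' => 'more' }
replacement_xml = inject_footnotes('this is[1] text and[2] text, see[9]', notes)

puts replacement_xml
```

The resulting string can then be passed to `xml.<<` inside the Builder block as in the example above. The surrounding paragraph text is not escaped by this helper, so that would need the same treatment if it can contain markup characters.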
You could try it as follows:
require 'nokogiri'
def xml
Nokogiri::XML::Builder.new do |xml|
xml.chapter {
xml.para {
xml.parent.add_child("Testing[1] footnote paragraph.")
add_footnotes(xml, 'add text',"[1]")
}
}
end.to_xml
end
def add_footnotes(xml, text,ref)
string = xml.parent.child.content
xml.parent.child.content = ""
string.partition(ref).each do |txt|
next xml.text(txt) if txt != ref
xml.footnote text
end
end
puts xml
# >> <?xml version="1.0"?>
# >> <chapter>
# >> <para>Testing<footnote>add text</footnote> footnote paragraph.</para>
# >> </chapter>
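String#partition splits only at the first occurrence of the reference, so the approach above handles one footnote per call. For text with several references, String#split with a capture group keeps the markers in the result. A small plain-Ruby sketch of that idea; the `[:ref, ...]` / `[:text, ...]` tagging is just an illustrative convention:

```ruby
text = 'this is[1] text and[2] text'

# partition stops at the first match and leaves "[2]" in the tail.
first_only = text.partition('[1]')

# A capture group in the pattern makes split keep the separators.
pieces = text.split(/(\[\d+\])/)

# Tag each piece so a Builder loop can emit text nodes or footnotes.
tagged = pieces.map do |piece|
  if (m = piece.match(/\A\[(\d+)\]\z/))
    [:ref, m[1]]
  else
    [:text, piece]
  end
end

p first_only
p tagged
```

Inside a Builder block, each `[:text, ...]` entry could then go through `xml.text` and each `[:ref, ...]` entry through `xml.footnote`.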

Extend existing XML by Nokogiri

I'm trying to extend an existing XML file by adding a new node. I load the XML, which contains a lot of products, add a new one, and save it.
I'm using Nokogiri and Ruby 1.9.3.
This is the best I have come up with:
builder = Nokogiri::XML::Builder.new do
root do
load_xml = Nokogiri::XML(IO.read("test.xml"))
parent.add_child(load_xml.root)
data do
name "Name"
end
end
end
file = File.open("test.xml",'w')
file.puts builder.to_xml
file.close
Nokogiri::XML::Builder is really meant for creating new XML documents, not for editing existing ones.
Also, your code loads the XML and puts it into a new root node (root), then appends a new child (the data node) to that root. Is this really the desired behaviour?
Normally you would add a node like this:
doc = Nokogiri::XML(IO.read("test.xml"))
name_node = Nokogiri::XML::Node.new("name",doc)
name_node.content = "Name"
data_node = Nokogiri::XML::Node.new("data",doc)
data_node.add_child(name_node)
doc.root.add_child(data_node)
file = File.open("test.xml",'w')
file.puts doc.to_xml
file.close
This does not create a new root node, because that seems a little peculiar to me.
You might also want to read the Nokogiri documentation; it is fairly extensive.
Alternatively, you can use Nokogiri::XML::Builder to create everything from data downward (including data itself) and then attach it to the parsed document. Here is an example of this combined approach:
builder = Nokogiri::XML::Builder.new do
data do
name "Name"
end
end
doc = Nokogiri::XML(IO.read("test.xml"))
doc.root.add_child builder.doc.root
file = File.open("test.xml",'w')
file.puts doc.to_xml
file.close

Nokogiri and XML Formatting When Inserting Tags

I'd like to use Nokogiri to insert nodes into an XML document. Nokogiri uses the Nokogiri::XML::Builder class to insert or create new XML.
If I create XML using the new method, I'm able to create nice, formatted XML:
builder = Nokogiri::XML::Builder.new do |xml|
xml.product {
xml.test "hi"
}
end
puts builder
outputs the following:
<?xml version="1.0"?>
<product>
<test>hi</test>
</product>
That's great, but what I want to do is add the above XML to an existing document, not create a new document. According to the Nokogiri documentation, this can be done by using the Builder's with method, like so:
builder = Nokogiri::XML::Builder.with(document.at('products')) do |xml|
xml.product {
xml.test "hi"
}
end
puts builder
When I do this, however, the XML all gets put into a single line with no indentation. It looks like this:
<products><product><test>hi</test></product></products>
Am I missing something to get it to format correctly?
Found the answer in the Nokogiri mailing list:
In XML, whitespace can be considered meaningful. If you parse a document that contains whitespace nodes, libxml2 will assume that whitespace nodes are meaningful and will not insert them for you.
You can tell libxml2 that whitespace is not meaningful by passing the "noblanks" flag to the parser. To demonstrate, here is an example that reproduces your error, then does what you want:
require 'nokogiri'
def build_from node
builder = Nokogiri::XML::Builder.with(node) do |xml|
xml.hello do
xml.world
end
end
end
xml = DATA.read
doc = Nokogiri::XML(xml)
puts build_from(doc.at('bar')).to_xml
doc = Nokogiri::XML(xml) { |x| x.noblanks }
puts build_from(doc.at('bar')).to_xml
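The script reads its input through DATA, so it needs an __END__ marker at the end of the file, followed by the document to parse: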
<root>
<foo>
<bar>
<baz />
</bar>
</foo>
</root>
