How to add XML string to Nokogiri Builder - ruby

I have an existing Nokogiri builder and some xml nodes in a string from a different source. How can I add this string to my builder?
str = "<options><cc>true</cc></options>"
xml = Nokogiri::XML::Builder.new do |q|
  q.query do |f|
    f.name "awesome"
    f.filter str
  end
end
This escapes str into something like:
xml.to_xml
=> "<?xml version=\"1.0\"?>\n<query>\n  <name>awesome</name>\n  <filter>&lt;options&gt;&lt;cc&gt;true&lt;/cc&gt;&lt;/options&gt;</filter>\n</query>\n"
I have found many, many similar things, including nesting builders and using the << operator, but nothing works to insert a full xml node tree into a builder block.
How can I make that string into real nodes?

What problems did you find using <<? This works for me:
xml = Nokogiri::XML::Builder.new do |q|
  q.query do |f|
    f.name "awesome"
    f << str
  end
end
and avoids using the private insert method.

And, as usual, I found the answer shortly after posting...
xml = Nokogiri::XML::Builder.new do |q|
  q.query do |f|
    f.name "awesome"
    f.__send__ :insert, Nokogiri::XML::DocumentFragment.parse( str )
  end
end.to_xml
Gives you
=> "<?xml version=\"1.0\"?>\n<query>\n  <name>awesome</name>\n  <options>\n    <cc>true</cc>\n  </options>\n</query>\n"
EDIT: This way worked for me when << failed for some unknown reason. However, as others have pointed out, it works by calling the insert method directly, which is meant to be private. Consider it both "bad practice" and a last resort.

Related

Nokogiri Builder: Replace RegEx match with XML

While using Nokogiri::XML::Builder I need to be able to generate a node that also replaces a regex match on the text with some other XML.
Currently I'm able to add additional XML inside the node. Here's an example;
def xml
  Nokogiri::XML::Builder.new do |xml|
    xml.chapter {
      xml.para {
        xml.parent.add_child("Testing[1] footnote paragraph.")
        add_footnotes(xml, 'An Entry')
      }
    }
  end.to_xml
end
# further child nodes WILL be added to footnote
def add_footnotes(xml, text)
  xml.footnote text
end
which produces;
<chapter>
  <para>Testing[1] footnote paragraph.<footnote>An Entry</footnote></para>
</chapter>
But I need to be able to run a regex replace on the reference [1], replacing it with the <footnote> XML, producing output like the following;
<chapter>
  <para>Testing<footnote>An Entry</footnote> footnote paragraph.</para>
</chapter>
I'm making the assumption here that the add_footnotes method would receive the reference match (e.g. as $1), which would be used to pull the appropriate footnote from a collection.
That method would also be adding additional child nodes, such as the following;
<footnote>
  <para>Words.</para>
  <para>More words.</para>
</footnote>
Can anyone help?
Here's a spin on your code that shows how to generate the output. You'll need to refit it to your own code....
require 'nokogiri'
FOOTNOTES = {
  '1' => 'An Entry'
}
child_text = "Testing[1] footnote paragraph."
pre_footnote, footnote_id, post_footnote = /^(.+)\[(\d+)\](.+)/.match(child_text).captures
doc = Nokogiri::XML::Builder.new do |xml|
  xml.chapter {
    xml.para {
      xml.text(pre_footnote)
      xml.footnote FOOTNOTES[footnote_id]
      xml.text(post_footnote)
    }
  }
end
puts doc.to_xml
Which outputs:
<?xml version="1.0"?>
<chapter>
  <para>Testing<footnote>An Entry</footnote> footnote paragraph.</para>
</chapter>
The trick is you have to grab the text preceding and following your target so you can insert those as text nodes. Then you can figure out what needs to be added. For clarity in your code you should preprocess all the text, get your variables figured out, then fall into the XML generator. Don't try to do any calculations inside the Builder block, instead just reference variables. Think of Builder like a view in an MVC-type application if that helps.
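As a sketch of that preprocessing step for text with more than one marker: String#split with a capture group returns the surrounding text and the captured ids alternately. The footnotes hash and the [:text, ...] / [:footnote, ...] pairs are hypothetical names chosen for illustration. The Builder block would then only need to loop over segments.

```ruby
footnotes = { '1' => 'An Entry', '2' => 'Another Entry' }
text = 'Testing[1] footnote[2] paragraph.'

# With a capture group in the pattern, split keeps the captured ids,
# so text pieces land at even indexes and footnote ids at odd indexes.
parts = text.split(/\[(\d+)\]/)

segments = parts.each_with_index.map do |part, i|
  i.odd? ? [:footnote, footnotes.fetch(part)] : [:text, part]
end

segments.each { |kind, value| puts "#{kind}: #{value.inspect}" }
```

Using fetch rather than [] means a reference to a missing footnote raises a KeyError instead of silently producing an empty element.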
FOOTNOTES could actually be a database lookup, a hash or some other data container.
You should also look at the << method, which lets you inject XML source, so you could pre-build the footnote XML, then loop over an array containing the various footnotes and inject them. Often it's easier to pre-process, then use gsub to treat things like [1] as placeholders. See "gsub(pattern, hash) → new_str" in the documentation, along with this example:
'hello'.gsub(/[eo]/, 'e' => 3, 'o' => '*') #=> "h3ll*"
For instance:
require 'nokogiri'
text = 'this is[1] text and[2] text'
footnotes = {
  '[1]' => 'some',
  '[2]' => 'more'
}
footnotes.keys.each do |k|
  v = footnotes[k]
  footnotes[k] = "<footnote>#{ v }</footnote>"
end
replacement_xml = text.gsub(/\[\d+\]/, footnotes) # => "this is<footnote>some</footnote> text and<footnote>more</footnote> text"
doc = Nokogiri::XML::Builder.new do |xml|
  xml.chapter {
    xml.para { xml.<<(replacement_xml) }
  }
end
puts doc.to_xml
# >> <?xml version="1.0"?>
# >> <chapter>
# >>   <para>this is<footnote>some</footnote> text and<footnote>more</footnote> text</para>
# >> </chapter>
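Because << parses its argument as markup, footnote text containing & or < would need escaping before it is wrapped in tags. A minimal sketch, assuming made-up footnote values and using Ruby's built-in String#encode with the xml: :text option:

```ruby
text = 'this is[1] text and[2] text'
footnotes = {
  '[1]' => 'R&D notes',
  '[2]' => 'a < b'
}

# Escape each footnote's text first, then wrap it in the element.
markup = footnotes.transform_values do |v|
  "<footnote>#{v.encode(xml: :text)}</footnote>"
end

# gsub with a hash looks up each matched placeholder and substitutes its value.
replacement_xml = text.gsub(/\[\d+\]/, markup)
puts replacement_xml
```

The same caveat applies to the surrounding text: if it can contain markup characters, it would need escaping too before the placeholders are substituted.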
You could try it as below:
require 'nokogiri'
def xml
  Nokogiri::XML::Builder.new do |xml|
    xml.chapter {
      xml.para {
        xml.parent.add_child("Testing[1] footnote paragraph.")
        add_footnotes(xml, 'add text', "[1]")
      }
    }
  end.to_xml
end
def add_footnotes(xml, text, ref)
  # Take the text already added to the current node, clear it,
  # then re-emit it with the footnote in place of the reference.
  string = xml.parent.child.content
  xml.parent.child.content = ""
  string.partition(ref).each do |txt|
    next xml.text(txt) if txt != ref
    xml.footnote text
  end
end
puts xml
# >> <?xml version="1.0"?>
# >> <chapter>
# >>   <para>Testing<footnote>add text</footnote> footnote paragraph.</para>
# >> </chapter>

Parsing XML with Ruby

I'm way new to working with XML but just had a need dropped in my lap. I have been given an unusual (to me) XML format. There are colons within the tags.
<THING1:things type="Container">
  <PART1:Id type="Property">1234</PART1:Id>
  <PART1:Name type="Property">The Name</PART1:Name>
</THING1:things>
It is a large file and there is much more to it than this but I hope this format will be familiar to someone. Does anyone know a way to approach an XML document of this sort?
I'd rather not just write a brute-force way of parsing the text but I can't seem to make any headway with REXML or Hpricot and I suspect it is due to these unusual tags.
my ruby code:
require 'hpricot'
xml = File.open( "myfile.xml" )
doc = Hpricot::XML( xml )
(doc/:things).each do |thg|
  [ 'Id', 'Name' ].each do |el|
    puts "#{el}: #{thg.at(el).innerHTML}"
  end
end
...which is just lifted from: http://railstips.org/blog/archives/2006/12/09/parsing-xml-with-hpricot/
And I figured I would be able to figure some stuff out from here but this code returns nothing. It doesn't error. It just returns.
As @pguardiario mentioned, Nokogiri is the de facto XML and HTML parsing library. If you wanted to print out the Id and Name values in your example, here is how you would do it:
require 'nokogiri'
xml_str = <<EOF
<THING1:things type="Container">
<PART1:Id type="Property">1234</PART1:Id>
<PART1:Name type="Property">The Name</PART1:Name>
</THING1:things>
EOF
doc = Nokogiri::XML(xml_str)
thing = doc.at_xpath('//things')
puts "ID = " + thing.at_xpath('//Id').content
puts "Name = " + thing.at_xpath('//Name').content
A few notes:
at_xpath is for matching one thing. If you know you have multiple items, you want to use xpath instead.
Depending on your document, namespaces can be problematic, so calling doc.remove_namespaces! can help (see this answer for a brief discussion).
You can use the css methods instead of xpath if you're more comfortable with those.
Definitely play around with this in irb or pry to investigate methods.
Resources
Parsing an HTML/XML document
Getting started with Nokogiri
Update
To handle multiple items, you need a root element, and you need to remove the // in the xpath query.
require 'nokogiri'
xml_str = <<EOF
<root>
<THING1:things type="Container">
<PART1:Id type="Property">1234</PART1:Id>
<PART1:Name type="Property">The Name1</PART1:Name>
</THING1:things>
<THING2:things type="Container">
<PART2:Id type="Property">2234</PART2:Id>
<PART2:Name type="Property">The Name2</PART2:Name>
</THING2:things>
</root>
EOF
doc = Nokogiri::XML(xml_str)
doc.xpath('//things').each do |thing|
  puts "ID = " + thing.at_xpath('Id').content
  puts "Name = " + thing.at_xpath('Name').content
end
This will give you:
ID = 1234
Name = The Name1
ID = 2234
Name = The Name2
If you are more familiar with CSS selectors, you can use this nearly identical bit of code:
doc.css('things').each do |thing|
  puts "ID = " + thing.at_css('Id').content
  puts "Name = " + thing.at_css('Name').content
end
In a Rails environment, Hash is extended by ActiveSupport, so you can take advantage of the from_xml method:
xml = File.open("myfile.xml")
data = Hash.from_xml(xml)

Nokogiri returns XML tags as well as data

XML Data:
<configs>
  <config>
    <name>XP</name>
    <browser>IE</browser>
    <browser>FF</browser>
    <browser>Chrome</browser>
  </config>
</configs>
I'm new to Ruby, Nokogiri, and programming in general. I'm trying to write a QA tool to help with automation.
Ruby code:
doc = Nokogiri::XML(open("configs.xml"))
configs = doc.xpath("//configs/config").map do |i|
  {'name' => i.xpath('name').to_s, 'browsers' => i.xpath('browser').to_s}
end
configs.each do |i|
  puts i['name']
  puts i['browsers']
end
This does what I want it to, it returns the data, but includes the XML tags. Is there a way to strip them that I'm just not finding?
Use .text to get text node data:
:name => i.xpath('name').text
.to_s is the string representation of an XML node, which is more than you're looking for.
However, the rest of your code is a bit broken if you're expecting individual browser entries.
As-is it'll smash the text data together into a single blob. Instead join them together, etc, for example:
configs = doc.xpath("//configs/config").collect do |cfg|
  browsers = cfg.xpath('browser').collect { |b| b.text }.join(', ')
  { name: cfg.xpath('name').text, browsers: browsers }
end
configs.each do |i|
  puts i[:name]
  puts i[:browsers]
end
You may want a blob of "IEFFChrome", in which case never mind.

Nokogiri to_xml without carriage returns

I'm currently using the Nokogiri::XML::Builder class to construct an XML document, then calling .to_xml on it. The resulting string always contains a bunch of spaces, linefeeds and carriage returns in between the nodes, and I can't for the life of me figure out how to get rid of them. Here's an example:
b = Nokogiri::XML::Builder.new do |xml|
  xml.root do
    xml.text("Value")
  end
end
b.to_xml
This results in the following:
<?xml version="1.0"?>
<root>Value</root>
What I want is this (notice the missing newline):
<?xml version="1.0"?><root>Value</root>
How can this be done? Thanks in advance!
Builder#to_xml by default outputs formatted (i.e. indented) XML. You can use the Nokogiri::XML::Node::SaveOptions to get an almost unformatted result.
b = Nokogiri::XML::Builder.new do |xml|
  xml.root do
    xml.foo do
      xml.text("Value")
    end
  end
end
b.to_xml
# => "<?xml version=\"1.0\"?>\n<root>\n  <foo>Value</foo>\n</root>\n"
b.to_xml(:save_with => Nokogiri::XML::Node::SaveOptions::AS_XML)
# => "<?xml version=\"1.0\"?>\n<root><foo>Value</foo></root>\n"
Now you could just get rid of the XML header (which is optional anyway) and remove the last newline:
b.to_xml(:save_with => Nokogiri::XML::Node::SaveOptions::AS_XML | Nokogiri::XML::Node::SaveOptions::NO_DECLARATION).strip
# => "<root><foo>Value</foo></root>"
Just removing all newlines in the XML is probably a bad idea as newlines can actually be significant (e.g. in <pre> blocks of XHTML). If that is not the case for you (and you are really sure of that) you could just do it.
This is not something that Nokogiri is designed to do. The closest you can get is to serialize the root of the document with no newlines or indentation, and then add the PI yourself (if you really need it):
require 'nokogiri'
b = Nokogiri::XML::Builder.new{ |xml| xml.root{ xml.foo "Value" } }
p b.to_xml
#=> "<?xml version=\"1.0\"?>\n<root>\n  <foo>Value</foo>\n</root>\n"
p b.doc.serialize(save_with:0)
#=> "<?xml version=\"1.0\"?>\n<root><foo>Value</foo></root>\n"
flat_root = b.doc.root.serialize(save_with:0)
p flat_root
#=> "<root><foo>Value</foo></root>"
puts %Q{<?xml version="1.0"?>#{flat_root}}
#=> <?xml version="1.0"?><root><foo>Value</foo></root>
Alternatively, you could simply cheat and do:
puts b.doc.serialize(save_with:0).sub("\n","")
#=> <?xml version="1.0"?><root><foo>Value</foo></root>
Note the usage of sub instead of gsub to only replace the first known-present newline.
b.to_xml returns a string. You just need to replace the first instance of \n in the string.
require 'nokogiri'
b = Nokogiri::XML::Builder.new do |xml|
  xml.root do
    xml.text("Value")
  end
end
b.to_xml.sub("\n",'')
Probably easier than trying to overload the method.

Manipulating XML files in ruby with XmlSimple

I've got a complex XML file, and I want to extract a content of a specific tag from it.
I use a ruby script with XmlSimple gem. I retrieve an XML file with HTTP request, then strip all the unnecessary tags and pull out necessary info. That's the script itself:
data = XmlSimple.xml_in(response.body)
hash_1 = Hash[*data['results']]
def find_value(hash, value)
  hash.each do |key, val|
    if val[0].kind_of? Hash then
      find_value(val[0], value)
    else
      if key.to_s.eql? value
        puts val
      end
    end
  end
end
hash_1['book'].each do |arg|
  find_value(arg, "title")
  puts("\n")
end
The problem is that when I replace puts val with return val and then call the method with puts find_value(arg, "title"), I get the whole contents of hash_1['book'] on the screen.
How can I correct the find_value method?
A "complex XML file" and XmlSimple don't mix. Your task would be solved a lot easier with Nokogiri, and be faster as well:
require 'nokogiri'
doc = Nokogiri::XML(response.body)
puts doc.xpath('//book/title/text()')
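If you do stay with nested hashes, the likely reason for the original symptom is that Hash#each returns its receiver, and the value returned by the recursive call was discarded, so the method ended up returning the hash it was given. A minimal sketch of a version that returns the first match instead; the book hash is a hypothetical stand-in shaped like typical XmlSimple output, where child elements are wrapped in arrays:

```ruby
# Depth-first search through nested hashes and arrays.
# Returns the value stored under the first key equal to `wanted`, or nil.
def find_value(node, wanted)
  case node
  when Hash
    node.each do |key, val|
      return val if key.to_s == wanted
      found = find_value(val, wanted)
      return found unless found.nil?
    end
  when Array
    node.each do |item|
      found = find_value(item, wanted)
      return found unless found.nil?
    end
  end
  nil
end

# Hypothetical sample data, not taken from the original response.
book = { 'info' => [{ 'title' => ['Dune'], 'author' => ['Frank Herbert'] }] }

p find_value(book, 'title')
p find_value(book, 'isbn')
```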
