Ruby: How do I get attribute values from XML with Nokogiri?

How do I get the value of the message attribute ("ready to use")?
<?xml version="1.0" encoding="UTF-8"?>
<response status="ok" permission_level="admin" message="ready to use" cached="0">
<title>kit</title>
</response>
Thanks

require 'rubygems'
require 'nokogiri'
string = %Q{
<?xml version="1.0" encoding="UTF-8"?>
<response status="ok" permission_level="admin" message="ready to use" cached="0">
<title>kit</title>
</response>
}
doc = Nokogiri::XML(string)
doc.css("response").each do |response_node|
  puts response_node["message"]
end
Save and run this Ruby file, and you will get the result:
#=> ready to use

You subscript the node with the attribute name.
require 'nokogiri'
require 'open-uri'
doc = Nokogiri::HTML(URI.open('http://google.com'))
doc.css('img:first').first['alt']
=> "Google"

Related

How to compact existing XML using Nokogiri

I'm trying to compact an existing XML file using Nokogiri. I have the following demo code:
#!/usr/bin/env ruby
require 'nokogiri'
doc = Nokogiri.XML <<-XML.strip
<?xml version="1.0" encoding="UTF-8"?>
<root>
<foo>
<bar>test</bar>
</foo>
</root>
XML
doc.write_xml_to($stdout, indent: 0)
I expected to see:
<?xml version="1.0" encoding="UTF-8"?>
<root><foo><bar>test</bar></foo></root>
but instead I saw:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<foo>
<bar>test</bar>
</foo>
</root>
I tried:
doc.write_to($stdout, indent: 0, save_with: Nokogiri::XML::Node::SaveOptions::AS_XML)
but that doesn't work either.
How can I remove the ignorable whitespaces?
You can tell Nokogiri to ignore empty text nodes and then to output without indentation:
require 'nokogiri'
xml = <<EOT
<?xml version="1.0" encoding="UTF-8"?>
<root>
<foo>
<bar>test</bar>
</foo>
</root>
EOT
doc = Nokogiri::XML(xml) { |opts|
  # strict turns off error recovery; noblanks drops whitespace-only text nodes
  opts.strict.noblanks
}
doc.to_xml(:indent_text => '', :indent => 0)
# => "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" +
# "<root>\n" +
# "<foo>\n" +
# "<bar>test</bar>\n" +
# "</foo>\n" +
# "</root>\n"
Okay, I'll answer my own question.
Nokogiri does not remove the white spaces because Nokogiri doesn't know if the white spaces are ignorable or not (no DTD, no schema), so it keeps all the whitespace-only text as text nodes. I should remove them manually before writing the XML doc to the IO device.
#!/usr/bin/env ruby
require 'bundler'
Bundler.require :default
doc = Nokogiri.XML <<-XML.strip
<?xml version="1.0" encoding="UTF-8"?>
<root>
<foo>
<bar>test</bar>
</foo>
</root>
XML
# remove ignorable white spaces
doc.xpath('//text()').each do |node|
  node.content = '' if node.text =~ /\A\s+\z/m
end
doc.write_xml_to($stdout, indent: 0)
This is big progress for me, but I still haven't reached my goal, because the XML file I'm working on has inline self-closing tags, and the whitespace-only text nodes between those tags should not be compacted. I'm now trying to figure out a way to handle this corner case.

Output array of tag contents using REXML?

This has been asked before in "REXML - How to extract a single element" but the answer doesn't work. Apparently, the text method is no longer available.
I have an XML file:
<?xml version="1.0" encoding="UTF-8"?>
<ice_cream>
<flavor>Vanilla</flavor>
</ice_cream>
and I can place its contents into an array using REXML:
flavors = xml_file.get_elements('//flavor')
I get an array:
puts flavors[0]
Which returns:
<flavor>Vanilla</flavor>
Instead, I want:
Vanilla
I've tried:
flavors = xml_file.get_elements('//flavor').text
But, I get:
NoMethodError: undefined method `text' for #<Array:0x007fa7a3b94220>
What's the correct way to accomplish this? I'm open to using other libraries, too.
Use Nokogiri. Your code will thank you.
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<?xml version="1.0" encoding="UTF-8"?>
<ice_cream>
<flavor>Vanilla</flavor>
</ice_cream>
EOT
doc.search('flavor') # => [#<Nokogiri::XML::Element:0x3feb8182fc60 name="flavor" children=[#<Nokogiri::XML::Text:0x3feb8182fa44 "Vanilla">]>]
doc.search('flavor').map(&:text) # => ["Vanilla"]
search finds all nodes that match the CSS selector 'flavor' and returns them as a NodeSet.
search('flavor').map(&:text) walks the NodeSet and calls the text method on each Node (map), returning an array with each node's text content.
If your XML is actually something more complex:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<?xml version="1.0" encoding="UTF-8"?>
<ice_cream>
<flavor>Vanilla</flavor>
<flavor>Chocolate</flavor>
<flavor>Strawberry</flavor>
</ice_cream>
EOT
doc.search('flavor') # => [#<Nokogiri::XML::Element:0x3fcc2a577afc name="flavor" children=[#<Nokogiri::XML::Text:0x3fcc2a5778e0 "Vanilla">]>, #<Nokogiri::XML::Element:0x3fcc2a5776c4 name="flavor" children=[#<Nokogiri::XML::Text:0x3fcc2a5774bc "Chocolate">]>, #<Nokogiri::XML::Element:0x3fcc2a5772b4 name="flavor" children=[#<Nokogiri::XML::Text:0x3fcc2a572c78 "Strawberry">]>]
doc.search('flavor').map(&:text) # => ["Vanilla", "Chocolate", "Strawberry"]
Or:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<?xml version="1.0" encoding="UTF-8"?>
<ice_creams>
<ice_cream>
<flavor>Vanilla</flavor>
</ice_cream>
<ice_cream>
<flavor>Chocolate</flavor>
</ice_cream>
<ice_cream>
<flavor>Strawberry</flavor>
</ice_cream>
</ice_creams>
EOT
ice_cream = doc.search('ice_cream') # => [#<Nokogiri::XML::Element:0x3fe6a91f6b00 name="ice_cream" children=[#<Nokogiri::XML::Text:0x3fe6a91f68f8 "\n ">, #<Nokogiri::XML::Element:0x3fe6a91f681c name="flavor" children=[#<Nokogiri::XML::Text:0x3fe6a91f6600 "Vanilla">]>, #<Nokogiri::XML::Text:0x3fe6a91f63f8 "\n ">]>, #<Nokogiri::XML::Element:0x3fe6a91f1de4 name="ice_cream" children=[#<Nokogiri::XML::Text:0x3fe6a91f1bdc "\n ">, #<Nokogiri::XML::Element:0x3fe6a91f1ac4 name="flavor" children=[#<Nokogiri::XML::Text:0x3fe6a91f1880 "Chocolate">]>, #<Nokogiri::XML::Text:0x3fe6a91f1678 "\n ">]>, #<Nokogiri::XML::Element:0x3fe6a91f13f8 name="ice_cream" children=[#<Nokogiri::XML::Text:0x3fe6a91f1074 "\n ">, #<Nokogiri::XML::Element:0x3fe6a91f0e80 name="flavor" children=[#<Nokogiri::XML::Text:0x3fe6a91f0a98 "Strawberry">]>, #<Nokogiri::XML::Text:0x3fe6a91f0840 "\n ">]>]
ice_cream.search('flavor').map(&:text) # => ["Vanilla", "Chocolate", "Strawberry"]
For searching, Nokogiri supports both CSS and XPath selectors. search accepts either kind, while css and xpath are its CSS-only and XPath-only counterparts. at returns a single Node, much like search('some_node').first, and has at_css and at_xpath as its CSS-only and XPath-only counterparts.
Here is the code:
require 'rexml/document'
doc = <<-xml
<?xml version="1.0" encoding="UTF-8"?>
<ice_cream>
<flavor>Vanilla</flavor>
</ice_cream>
xml
xml_doc = REXML::Document.new(doc)
xml_doc.get_elements('//flavor').class # => Array
xml_doc.get_elements('//flavor')[0].class # => REXML::Element
xml_doc.get_elements('//flavor')[0].text # => "Vanilla"
Actually xml_doc.get_elements('//flavor') will give you the collection of REXML::Element objects. You then need to iterate through the collection and call the method #text on the REXML::Element object to get the text.
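The iteration described above can be sketched with map, which is one straightforward way to turn the Array of elements into an Array of strings. A second flavor is added here only to show that it works for more than one element.

```ruby
require 'rexml/document'

xml = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <ice_cream>
    <flavor>Vanilla</flavor>
    <flavor>Chocolate</flavor>
  </ice_cream>
XML

xml_doc = REXML::Document.new(xml)

# get_elements returns an Array of REXML::Element; #text is defined on
# each element, not on the Array, so map over the collection.
flavors = xml_doc.get_elements('//flavor').map(&:text)

# REXML::XPath.match is an equivalent way to collect the same elements.
flavors_via_xpath = REXML::XPath.match(xml_doc, '//flavor').map(&:text)

p flavors
```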

How do I Transform an .xml file to an instance of a ruby array?

I have the following xml file:
/my_file.xml
<?xml version="1.0" encoding="utf-8" ?>
<words>
<w>my_word</w>
<w>second_word</w>
</words>
How can I do the following using Ruby:
Load
Parse
Transform an xml file to an instance of a ruby array:
words = ["my_word","second_word"]
With the Nokogiri gem...
require 'rubygems'
require 'nokogiri'
xml = '<?xml version="1.0" encoding="utf-8" ?>
<words>
<w>my_word</w>
<w>second_word</w>
</words>'
doc = Nokogiri::XML(xml)
words = doc.xpath("//w").map {|x| x.text}

rexml and nokogiri XML parsing

Can someone please explain why the Nokogiri and REXML outputs differ in the code below?
require 'rubygems'
require 'nokogiri'
require 'rexml/document'
xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>
<yml>
<a>TM and © 2009</a>
</yml>"
puts 'nokogiri'
doc = Nokogiri::XML(xml)
puts doc.to_s, "\n"
puts 'rexml'
doc = REXML::Document.new(xml)
puts doc.to_s
outputs:
nokogiri
<?xml version="1.0" encoding="ISO-8859-1"?>
<yml>
<a>TM and ? 2009</a>
</yml>
rexml
<?xml version='1.0' encoding='ISO-8859-1'?>
<yml>
<a>TM and © 2009</a>
</yml>
Sure. Nokogiri is converting the text using ISO-8859-1, whereas REXML is just outputting what you put in. If you change the XML to UTF-8 encoding, then you'll get:
nokogiri:
<?xml version="1.0" encoding="utf-8"?>
<yml>
<a>TM and © 2009</a>
</yml>
rexml:
<?xml version='1.0' encoding='UTF-8'?>
<yml>
<a>TM and © 2009</a>
</yml>

How do I get Nokogiri to add the right XML encoding?

I have created an XML doc with Nokogiri: Nokogiri::XML::Document
The header of my file is <?xml version="1.0"?> but I'd expect to have <?xml version="1.0" encoding="UTF-8"?>. Is there an option I could use so that the encoding appears?
Are you using Nokogiri XML Builder? You can pass an encoding option to the new() method:
new(options = {})
Create a new Builder object. options are sent to the top level Document that is being built.
Building a document with a particular encoding for example:
Nokogiri::XML::Builder.new(:encoding => 'UTF-8') do |xml|
...
end
Also this page says you can do the following (when not using Builder):
doc = Nokogiri.XML('<foo><bar /></foo>', nil, 'EUC-JP')
Presumably you could change 'EUC-JP' to 'UTF-8'.
When parsing the doc you can set the encoding like this:
doc = Nokogiri::XML::Document.parse(xml_input, nil, "UTF-8")
For me that returns
<?xml version="1.0" encoding="UTF-8"?>
If you're not using Nokogiri::XML::Builder but rather creating a document object directly, you can just set the encoding with Document#encoding=:
doc = Nokogiri::XML::Document.new
# => #<Nokogiri::XML::Document:0x1180 name="document">
puts doc.to_s
# <?xml version="1.0"?>
doc.encoding = 'UTF-8'
puts doc.to_s
# <?xml version="1.0" encoding="UTF-8"?>
