How can I remove an element in Ox? - ruby

How can I remove an element when parsing XML with Ox?
Ox has an append method - (Object) <<(node) but doesn't seem to have a - (Element) remove method. Nokogiri has a remove function, does Ox have an equivalent?
http://www.ohler.com/ox/Ox/Element.html

Consider this document:
doc = Ox::Document.new(:version => '1.0')
top = Ox::Element.new('top')
top[:name] = 'sample'
doc << top
Now you can observe:
doc.nodes.class => Array
So doc.nodes is just a regular Ruby Array, which means all of the Array and Enumerable methods are available on it.
To delete the element we've created above, you can do this:
doc.nodes.delete top
Or an index-based removal if that's what you need:
doc.nodes.delete_at 0
Hope this helps

Related

How do I create a child element within a Nokogiri node?

I’m using Rails 4.2.7 with Nokogiri. I’m having trouble creating a child node. I have the following code
general = doc.xpath("//lomimscc:general")
description = Nokogiri::XML::Node.new "lomimscc:description", doc
string = Nokogiri::XML::Node.new "lomimscc:string", doc
string.content = scenario.abstract
string['language'] = 'en'
description << string
general << description
I want the “description” element to be a child element of the “general” element (and similarly I want the “string” element to be a child of the “description” element). However what is happening is that the description element is appearing as a sibling of the general element. How do I make the element appear as a child instead of a sibling?
The tutorials show how to do this in "Creating new nodes", but the simple example is:
require 'nokogiri'
doc = Nokogiri::XML('<root/>')
doc.at('root').add_child('<foo/>')
doc.to_xml # => "<?xml version=\"1.0\"?>\n<root>\n <foo/>\n</root>\n"
Nokogiri makes it easy to build nodes using a string that contains the markup or nodes you want to add.
You should be able to build upon this easily.
This is also noted throughout the Node documentation any place you see "node_or_tags".
When I changed
general = doc.xpath("//lomimscc:general")
to
general = doc.xpath("//lomimscc:general").first
then everything worked as far as creating child nodes.

Ruby: Extract and operate on partially extracted Nokogiri objects

require 'nokogiri'
xml = DATA.read
xml_nokogiri = Nokogiri::XML.parse xml
widgets = xml_nokogiri.xpath("//Widget")
dates = widgets.map { |widget| widget.xpath("//DateAdded").text }
puts dates
__END__
<Widgets>
<Widget>
<Price>42</Price>
<DateAdded>04/22/1989</DateAdded>
</Widget>
<Widget>
<Price>29</Price>
<DateAdded>02/05/2015</DateAdded>
</Widget>
</Widgets>
Notes:
This is a contrived example I cooked up, as it's very inconvenient to post the actual code because of too many dependencies. This version is readily testable via copy/paste.
widgets is a Nokogiri::XML::NodeSet object containing two Nokogiri::XML::Element objects, each of which is the XML fragment corresponding to a Widget tag.
I intend to operate on each of those fragments with xpath again, but an xpath query that starts with // seems to search from the ROOT of the XML again, not from the individual fragment.
Any idea why that is? I was expecting dates to hold only the value from each fragment.
EDIT: Assume that the tags have such a complicated structure that relative addressing (like using xpath("DateAdded")) is not practical.
Use .//DateAdded for a relative XPath that matches any DateAdded node nested under the current node, or plain DateAdded (no leading slashes) to match only immediate children:
- dates = widgets.map { |widget| widget.xpath("//DateAdded").text }
# for immediate children use 'DateAdded'
+ dates = widgets.map { |widget| widget.xpath("DateAdded").text }
# for nested elements use './/DateAdded'
+ dates = widgets.map { |widget| widget.xpath(".//DateAdded").text }
#⇒ [
# [0] "04/22/1989",
# [1] "02/05/2015"
#]

Any string to XML in Ruby

I am trying to convert an arbitrary string (which is built in XML format) into XML, so I can apply the "to_hash" function to it.
This is what I have:
model = live_requests[3]
parser = XML::Parser.string(model)
model_xml = parser.parse
puts model.to_hash
Now why am I getting an error when 'model_xml' should be an XML file?
I am using LibXML by the way.
http://libxml.rubyforge.org/rdoc/index.html
Libxml does not support the to_hash method. If you are looking for a way to do this that doesn't require traversing XML nodes and building the hash manually, you should take a look at Nori.
# Nori 2.x and later are instance-based; Nori 1.x also accepted Nori.parse(...)
Nori.new.parse("<tag>This is the contents</tag>")
# => { 'tag' => 'This is the contents' }
If you want to learn how to traverse Libxml's node trees take a look at the answer to this question.

Extracting HTML5 data attributes from a tag

I want to extract all the HTML5 data attributes from a tag, just like this jQuery plugin.
For example, given:
<span data-age="50" data-location="London" class="highlight">Joe Bloggs</span>
I want to get a hash like:
{ 'data-age' => '50', 'data-location' => 'London' }
I was originally hoping to use a wildcard as part of my CSS selector, e.g.
Nokogiri(html).css('span[@data-*]').size
but it seems that isn't supported.
Option 1: Grab all data elements
If all you need is to list all the page's data elements, here's a one-liner:
Hash[doc.xpath("//span/@*[starts-with(name(), 'data-')]").map{|e| [e.name,e.value]}]
Output:
{"data-age"=>"50", "data-location"=>"London"}
Option 2: Group results by tag
If you want to group your results by tag (perhaps you need to do additional processing on each tag), you can do the following:
tags = []
datasets = "@*[starts-with(name(), 'data-')]"
# If you want any element, replace "span" with "*"
doc.xpath("//span[#{datasets}]").each do |tag|
  tags << Hash[tag.xpath(datasets).map{|a| [a.name,a.value]}]
end
Then tags is an array of hashes, one per matching tag, each holding that tag's data attributes.
Option 3: Behavior like the jQuery datasets plugin
If you'd prefer the plugin-like approach, the following will give you a dataset method on every Nokogiri node.
module Nokogiri
  module XML
    class Node
      # Returns a hash of this node's data-* attributes
      def dataset
        Hash[self.xpath("@*[starts-with(name(), 'data-')]").map{|a| [a.name,a.value]}]
      end
    end
  end
end
Then you can find the dataset for a single element:
doc.at_css("span").dataset
Or get the dataset for a group of elements:
doc.css("span").map(&:dataset)
Example:
The following is the behavior of the dataset method above. Given the following lines in the HTML:
<span data-age="50" data-location="London" class="highlight">Joe Bloggs</span>
<span data-age="40" data-location="Oxford" class="highlight">Jim Foggs</span>
The output would be:
[
{"data-location"=>"London", "data-age"=>"50"},
{"data-location"=>"Oxford", "data-age"=>"40"}
]
You can do this with a bit of xpath:
doc = Nokogiri.HTML(html)
data_attrs = doc.xpath "//span/@*[starts-with(name(), 'data-')]"
This gets all the attributes of span elements that start with 'data-'. (You might want to do this in two steps: first get all the elements you're interested in, then extract the data attributes from each in turn.)
Continuing the example (using the span in your question):
hash = data_attrs.each_with_object({}) do |n, hsh|
hsh[n.name] = n.value
end
puts hash
produces:
{"data-age"=>"50", "data-location"=>"London"}
Try looping through element.attributes while ignoring any attribute that does not start with data-.
The Node#css docs mention a way to attach a custom pseudo-selector. This might look like the following for selecting nodes with attributes starting with 'data-':
Nokogiri(html).css('span:regex_attrs("^data-.*")', Class.new {
  def regex_attrs node_set, regex
    node_set.find_all { |node| node.attributes.keys.any? {|k| k =~ /#{regex}/ } }
  end
}.new)

does ruby have an elegant way to say array2 = some_lookup_method(array1)

I have an array short_code[] that contains an array of short product identifiers such as ["11111", "2222", "33333"]
I want to create a copy of the array that contains the corresponding 'long code' data:
long_code[i] = my_lookup_long_code(short_code[i])
While simple iteration is easy, I'm wondering, as a relative Ruby newbie, what the 'Ruby way' is to create an array by simply applying method() to every element in the original array.
You can use the map method, which returns a new array with the results of your code block:
long_code = short_code.map{ |code| my_lookup_long_code(code) }
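For a runnable illustration, here is a minimal sketch with a stand-in lookup. The LONG_CODES table and its values are hypothetical, as is the assumption that an unknown short code yields nil; the real my_lookup_long_code would consult whatever data source you have.

```ruby
# Hypothetical lookup table standing in for the real data source.
LONG_CODES = {
  '11111' => 'LONG-11111',
  '2222'  => 'LONG-2222',
  '33333' => 'LONG-33333'
}.freeze

# Assumed behavior: return the long code, or nil when the short code is unknown.
def my_lookup_long_code(short_code)
  LONG_CODES[short_code]
end

short_code = %w[11111 2222 33333]

# Block form, as in the answer above.
long_code = short_code.map { |code| my_lookup_long_code(code) }

# Equivalent shorthand: pass the method itself as the block.
same_result = short_code.map(&method(:my_lookup_long_code))

puts long_code.inspect
puts long_code == same_result
```

map leaves short_code untouched; map! would replace the elements in place if a separate copy is not needed.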
