converting nokogiri xml node into ruby hash - ruby

I have XML like this:
<parentNode>
<amount>12.0</amount><authIdCode>999999</authIdCode><currency>USD</currency>
</parentNode>
How can I get all the nodes inside parentNode into a hash, something like the one below?
{amount: "12", authIdCode: "999999", currency: "USD"}
Yes, I could search for individual keys using Nokogiri, but is it possible to get all keys and values inside parentNode dynamically and turn them into a hash?
Thank you.
Note: Hash.from_xml won't work, as I am not using Rails.

Using Hash[]:
Hash[doc.search('parentNode/*').map{|n| [n.name, n.text]}]
#=> {"amount"=>"12.0", "authIdCode"=>"999999", "currency"=>"USD"}

Here is a working sample:
require 'nokogiri'
xml = <<-EOS
<parentNode>
  <amount>12.0</amount>
  <authIdCode>999999</authIdCode>
  <currency>USD</currency>
</parentNode>
EOS
document = Nokogiri::XML(xml)
# Collect every element child of parentNode into a hash of name => text
hash = document.xpath("//parentNode/*").each_with_object({}) do |node, result|
  result[node.name] = node.text
end
p hash # => {"amount"=>"12.0", "authIdCode"=>"999999", "currency"=>"USD"}
It finds all the children of parentNode and uses each child's name as the key and its text content as the value.

Related

Create a Ruby Hash out of an xml string with the 'ox' gem

I am currently trying to create a hash out of an XML document with the help of the ox gem.
Input xml:
<?xml version="1.0"?>
<expense>
<payee>starbucks</payee>
<amount>5.75</amount>
<date>2017-06-10</date>
</expense>
with the following ruby/ox code:
doc = Ox.parse(xml)
plist = doc.root.nodes
I get the following output:
=> [#<Ox::Element:0x00007f80d985a668 @value="payee", @attributes={}, @nodes=["starbucks"]>, #<Ox::Element:0x00007f80d9839198 @value="amount", @attributes={}, @nodes=["5.75"]>, #<Ox::Element:0x00007f80d9028788 @value="date", @attributes={}, @nodes=["2017-06-10"]>]
The output I want is a hash in the format:
{'payee' => 'Starbucks',
'amount' => 5.75,
'date' => '2017-06-10'}
to save in my SQLite database. How can I transform the array of objects into a hash like the one above?
Any help is highly appreciated.
The docs suggest you can use the following:
require 'ox'
xml = %{
<top name="sample">
<middle name="second">
<bottom name="third">Rock bottom</bottom>
</middle>
</top>
}
puts Ox.load(xml, mode: :hash)
puts Ox.load(xml, mode: :hash_no_attrs)
#{:top=>[{:name=>"sample"}, {:middle=>[{:name=>"second"}, {:bottom=>[{:name=>"third"}, "Rock bottom"]}]}]}
#{:top=>{:middle=>{:bottom=>"Rock bottom"}}}
I'm not sure that's exactly what you're looking for though.
Otherwise, it really depends on the methods available on the Ox::Element instances in the array.
From the docs, it looks like there are two handy methods here: you can use [] and text.
Therefore, I'd use reduce to coerce the array into the hash format you're looking for, using something like the following:
ox_nodes = [#<Ox::Element:0x00007f80d985a668 @value="payee", @attributes={}, @nodes=["starbucks"]>, #<Ox::Element:0x00007f80d9839198 @value="amount", @attributes={}, @nodes=["5.75"]>, #<Ox::Element:0x00007f80d9028788 @value="date", @attributes={}, @nodes=["2017-06-10"]>]
ox_nodes.reduce({}) do |hash, node|
  hash[node['@value']] = node.text
  hash
end
I'm not sure whether node['@value'] will work, so you might need to experiment with that - otherwise perhaps node.instance_variable_get('@value') would do it.
node.text does the following, which sounds about right:
Returns the first String in the elements nodes array or nil if there is no String node.
N.B. I prefer to tidy the reduce block a little using tap, something like the following:
ox_nodes.reduce({}) do |hash, node|
  hash.tap { |h| h[node['@value']] = node.text }
end
Hope that helps - let me know how you get on!
I found the answer to the question in my last comment by myself:
def create_xml(expense)
  Ox.default_options = { with_xml: false }
  doc = Ox::Document.new(:version => '1.0')
  expense.each do |key, value|
    e = Ox::Element.new(key)
    e << value
    doc << e
  end
  Ox.dump(doc)
end
The next question would be: how can I transform the value of the amount key from a string to an integer before saving it to the database?
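On that follow-up: the sample amount 5.75 has a fractional part, so a plain integer conversion would drop the cents. A minimal sketch of two common options, assuming the hash has already been built with string values; the expense hash below is hand-written sample data rather than real Ox output, and the variable names are illustrative:

```ruby
require 'bigdecimal'

expense = { 'payee' => 'starbucks', 'amount' => '5.75', 'date' => '2017-06-10' }

# Option 1: keep the exact decimal value.
# BigDecimal() raises ArgumentError if the string is not numeric.
amount = BigDecimal(expense['amount'])

# Option 2: store an integer number of cents instead.
amount_cents = (amount * 100).to_i

puts amount.to_s('F')  # 5.75
puts amount_cents      # 575
```

Storing cents as an integer avoids floating-point rounding in the database; which option fits depends on the column type you chose.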

Parsing a simple XML-like string with adjacent nodes

I'm using the engtagger gem to classify a sentence according to its parts of speech. The output I get is as follows:
puts text
# => "<nnp>My</nnp> <nn>name</nn> <vbz>is</vbz> <nnp>Max</nnp>"
I would have expected the gem to give me an array, but I guess I'll have to coerce this into an array myself.
What I'm eventually trying to get is a nested array something like this:
[["My", "nnp"], ["name", "nn"], ["is", "vbz"], ["Max", "nnp"]]
However I'm not really sure how to approach this with Nokogiri (or another parser library). Here's what I've tried:
(byebug) doc = Nokogiri::XML(text)
#<Nokogiri::XML::Document:0x3fd400286e78 name="document" children=[#<Nokogiri::XML::Element:0x3fd400286900 name="nnp" children=[#<Nokogiri::XML::Text:0x3fd400286464 "My">]>]>
(byebug) Nokogiri.parse(text)
#<Nokogiri::XML::Document:0x3fd40028cd50 name="document" children=[#<Nokogiri::XML::Element:0x3fd40028c7d8 name="nnp" children=[#<Nokogiri::XML::Text:0x3fd40028c378 "My">]>]>
So I've tried two different Nokogiri methods, but both are only showing the first node. How can I get the rest of the adjacent nodes as well?
Alternatively, how can I get the engtagger call to return an array? In the docs, I didn't find an example of how to return an array with all tags, only arrays with one specific kind of tag.
The main thing is that well-formed XML must have a single root node. You received only the very first node because it was treated as the root (that is, the topmost) node, and once it was closed, Nokogiri considered the XML document to have ended.
Nokogiri::XML("<root>#{text}</root>").
children.first. # get root node
children.map { |e| [e.text, e.name] }. # map to what’s needed
reject { |e| e.last == 'text' } # filter out garbage
That filtering might be more semantically correct:
Nokogiri::XML("<root>#{text}</root>").
children.first.
children.reject { |e| Nokogiri::XML::Text === e }.
map { |e| [e.text, e.name] }
The problem is you're parsing the fragment incorrectly:
require 'nokogiri'
doc = Nokogiri::XML.fragment("<nnp>My</nnp> <nn>name</nn> <vbz>is</vbz> <nnp>Max</nnp>")
doc.to_xml # => "<nnp>My</nnp> <nn>name</nn> <vbz>is</vbz> <nnp>Max</nnp>"
Nokogiri wants valid XML, but you can get it to accept partial XML chunks using fragment.
At that point you're able to do:
doc.children.each_with_object([]){ |n, a| a << [n.text, n.name] unless n.text? }
# => [["My", "nnp"], ["name", "nn"], ["is", "vbz"], ["Max", "nnp"]]
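If pulling in an XML parser feels heavy for this flat, predictable tag format, a regular expression can produce the same pairs. This is a swapped-in alternative to the Nokogiri approaches, and only a sketch: it assumes tags are never nested, carry no attributes, and that the words contain no escaped entities.

```ruby
text = "<nnp>My</nnp> <nn>name</nn> <vbz>is</vbz> <nnp>Max</nnp>"

# Capture the tag name and the enclosed word; \1 requires the closing
# tag to match the opening one. scan returns [tag, word] pairs, so swap them.
pairs = text.scan(%r{<(\w+)>(.*?)</\1>}).map { |tag, word| [word, tag] }

p pairs
# => [["My", "nnp"], ["name", "nn"], ["is", "vbz"], ["Max", "nnp"]]
```

If the tagger output ever includes nested tags or entities, the parser-based versions are the safer choice.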

How to convert partial XML to hash in Ruby

I have a string which has plain text and extra spaces and carriage returns then XML-like tags followed by XML tags:
String = "hi there.
<SET-TOPIC> INITIATE </SET-TOPIC>
<SETPROFILE>
<KEY>name</KEY>
<VALUE>Joe</VALUE>
</SETPROFILE>
<SETPROFILE>
<KEY>email</KEY>
<VALUE>Email@hi.com</VALUE>
</SETPROFILE>
<GET-RELATIONS>
<COLLECTION>goals</COLLECTION>
<VALUE>walk upstairs</VALUE>
</GET-RELATIONS>
So what do you think?
Is it true?
"
I want to parse this the way Nori, Nokogiri, or Ox do when they convert XML to a hash.
My goal is to be able to easily pull out the top level tags as keys and then know all the elements, something like:
Keys = ['SETPROFILE', 'SETPROFILE', 'SET-TOPIC', 'GET-OBJECT']
Values[0] = [{name => Joe}, {email => email@hi.com}]
Values[3] = [{collection => goals}, {value => walk up}]
I have seen several functions like that for true XML but all of mine are partial.
I started going down this line of thinking:
parsed = doc.search('*').each_with_object({}) do |n, h|
(h[n.name] ||= []) << n.text
end
I'd probably do something along these lines if I wanted the keys and values variables:
require 'nokogiri'
string = "hi there.
<SET-TOPIC> INITIATE </SET-TOPIC>
<SETPROFILE>
<KEY>name</KEY>
<VALUE>Joe</VALUE>
</SETPROFILE>
<SETPROFILE>
<KEY>email</KEY>
<VALUE>Email@hi.com</VALUE>
</SETPROFILE>
<GET-RELATIONS>
<COLLECTION>goals</COLLECTION>
<VALUE>walk upstairs</VALUE>
</GET-RELATIONS>
So what do you think?
Is it true?
"
doc = Nokogiri::XML('<root>' + string + '</root>', nil, nil, Nokogiri::XML::ParseOptions::NOBLANKS)
nodes = doc.root.children.reject { |n| n.is_a?(Nokogiri::XML::Text) }.map { |node|
  [
    node.name,
    node.children.map { |c| [c.name, c.content] }.to_h
  ]
}
nodes
# => [["SET-TOPIC", {"text"=>" INITIATE "}],
#     ["SETPROFILE", {"KEY"=>"name", "VALUE"=>"Joe"}],
#     ["SETPROFILE", {"KEY"=>"email", "VALUE"=>"Email@hi.com"}],
#     ["GET-RELATIONS", {"COLLECTION"=>"goals", "VALUE"=>"walk upstairs"}]]
From nodes it's possible to grab the rest of the detail:
keys = nodes.map(&:first)
# => ["SET-TOPIC", "SETPROFILE", "SETPROFILE", "GET-RELATIONS"]
values = nodes.map(&:last)
# => [{"text"=>" INITIATE "},
# {"KEY"=>"name", "VALUE"=>"Joe"},
# {"KEY"=>"email", "VALUE"=>"Email@hi.com"},
# {"COLLECTION"=>"goals", "VALUE"=>"walk upstairs"}]
values[0] # => {"text"=>" INITIATE "}
If you'd rather, it's possible to pre-process the DOM and remove the top-level text:
doc.root.children.select { |n| n.is_a?(Nokogiri::XML::Text) }.map(&:remove)
doc.to_xml
# => "<root><SET-TOPIC> INITIATE </SET-TOPIC><SETPROFILE><KEY>name</KEY><VALUE>Joe</VALUE></SETPROFILE><SETPROFILE><KEY>email</KEY><VALUE>Email@hi.com</VALUE></SETPROFILE><GET-RELATIONS><COLLECTION>goals</COLLECTION><VALUE>walk upstairs</VALUE></GET-RELATIONS></root>\n"
That makes it easier to work with the XML.
Wrap the string content in a node and you can parse that with Nokogiri. The text outside the XML segments will become text nodes inside the new node.
str = "hi there. .... Is it true?"
doc = Nokogiri::XML("<wrapper>#{str}</wrapper>")
segments = doc.xpath('/*/SETPROFILE')
Now you can use "Convert a Nokogiri document to a Ruby Hash" to convert the segments into a hash.
However, if the plain text contains characters that must be escaped in XML (such as & or <), you'll need to find and escape them yourself.
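As a rough illustration of that escaping step, here is a minimal sketch that handles only one case: bare ampersands in the surrounding text. The sample string and the regular expression are assumptions for illustration; a stray < in the plain text is a harder problem and is not handled here.

```ruby
str = "Tom & Jerry said hi. <KEY>name</KEY> already &amp; escaped"

# Replace only ampersands that do not already start an entity
# such as &amp; (named), &#38; (decimal) or &#x26; (hexadecimal).
escaped = str.gsub(/&(?!(?:[A-Za-z]+|#\d+|#x\h+);)/, '&amp;')

puts escaped
# Tom &amp; Jerry said hi. <KEY>name</KEY> already &amp; escaped
```

The escaped string can then be wrapped in a root element and parsed as described above.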

Adding a XML Element to a Nokogiri::XML::Builder document

How can I add a Nokogiri::XML::Element to an XML document that is being created with Nokogiri::XML::Builder?
My current solution is to serialize the element and use the << method to have the Builder reinterpret it.
orig_doc = Nokogiri::XML('<root xmlns="foobar"><a>test</a></root>')
node = orig_doc.at('/*/*[1]')
# Assign the builder first: in `puts Builder.new do ... end` the do-block would bind to puts
builder = Nokogiri::XML::Builder.new do |doc|
  doc.another {
    # FIXME: this is the round-trip I would like to avoid
    xml_text = node.to_xml(:skip_instruct => true).to_s
    doc << xml_text
    doc.second("hi")
  }
end
puts builder.to_xml
# The expected result is
#
# <another>
# <a xmlns="foobar">test</a>
# <second>hi</second>
# </another>
However the Nokogiri::XML::Element is a quite big node (in the order of kilobytes and thousands of nodes) and this code is in the hot path. Profiling shows that the serialization/parsing round trip is very expensive.
How can I instruct the Nokogiri Builder to add the existing XML element node in the "current" position?
Without using a private method you can get a handle on the current parent element using the parent method of the Builder instance. Then you can append an element to that (even from another document). For example:
require 'nokogiri'
doc1 = Nokogiri.XML('<r><a>success!</a></r>')
a = doc1.at('a')
# note that `xml` is not a Nokogiri::XML::Document,
# but rather a Nokogiri::XML::Builder instance.
doc2 = Nokogiri::XML::Builder.new do |xml|
  xml.some do
    xml.more do
      xml.parent << a
    end
  end
end.doc
puts doc2
#=> <?xml version="1.0"?>
#=> <some>
#=>   <more>
#=>     <a>success!</a>
#=>   </more>
#=> </some>
After looking at the Nokogiri source I have found this fragile solution: using the protected #insert(node) method.
The code, modified to use that private method looks like this:
doc.another {
  xml_text = node.to_xml(:skip_instruct => true).to_s
  doc.send('insert', xml_text) # <= use `#insert` instead of `<<`
  doc.second("hi")
}

How to get values in XML data using Nokogiri?

I'm using Nokogiri to parse XML data that I'm getting from the roar engine after I create a user. The XML looks like below:
<roar tick="135098427907">
<facebook>
<create_oauth status="ok">
<auth_token>14802206136746256007</auth_token>
<player_id>8957881063899628798</player_id>
</create_oauth>
</facebook>
</roar>
I'm totally new to Nokogiri. How do I get the value of status, the auth_token and player_id?
str = "<roar ......"
doc = Nokogiri.XML(str)
puts doc.xpath('//create_oauth/@status') # => ok
puts doc.xpath('//auth_token').text # => 148....
# player_id works the same way as auth_token
It is also a good idea to learn some XPath, for example from w3schools.
How about this
h1 = Nokogiri::XML.parse %{
<roar tick="135098427907">
<facebook>
<create_oauth status="ok">
<auth_token>14802206136746256007</auth_token>
<player_id>8957881063899628798</player_id>
</create_oauth>
</facebook>
</roar>
}
h1.xpath("//facebook/create_oauth/auth_token").text()
h1.xpath("//facebook/create_oauth/player_id").text()
You can use the Nori gem. It's an XML-to-hash converter, and in Ruby hashes are very convenient to access.
require 'nori'
Nori.parser = :nokogiri
xml = "<roar tick='135098427907'>
<facebook>
<create_oauth status='ok'>
<auth_token>14802206136746256007</auth_token>
<player_id>8957881063899628798</player_id>
</create_oauth>
</facebook>
</roar>"
hash = Nori.parse(xml)
create_oauth = hash["roar"]["facebook"]["create_oauth"]
puts create_oauth["auth_token"] # 14802206136746256007
puts create_oauth["@status"] # ok
puts create_oauth["player_id"] # 8957881063899628798
