Parsing XML to hash with Nori and Nokogiri with undesired result - ruby

I am attempting to convert an XML document to a Ruby hash using Nori, but instead of getting the collection directly under the root element, an extra node containing the collection is returned. This is what I am doing:
@xml = content_for(:layout)
@hash = Nori.new(:parser => :nokogiri, :advanced_typecasting => false).parse(@xml)
or
@hash = Hash.from_xml(@xml)
Where the content of @xml is:
<bundles>
  <bundle>
    <id>6073</id>
    <name>Bundle-1</name>
    <status>1</status>
    <bundle_type>
      <id>6713</id>
      <name>BundleType-1</name>
    </bundle_type>
    <begin_at nil="true"></begin_at>
    <end_at nil="true"></end_at>
    <updated_at>2013-03-21T23:02:32Z</updated_at>
    <created_at>2013-03-21T23:02:32Z</created_at>
  </bundle>
  <bundle>
    <id>6074</id>
    <name>Bundle-2</name>
    <status>1</status>
    <bundle_type>
      <id>6714</id>
      <name>BundleType-2</name>
    </bundle_type>
    <begin_at nil="true"></begin_at>
    <end_at nil="true"></end_at>
    <updated_at>2013-03-21T23:02:32Z</updated_at>
    <created_at>2013-03-21T23:02:32Z</created_at>
  </bundle>
</bundles>
The parser returns @hash in the format:
{"bundles"=>{"bundle"=>[{"id"=>"6073", "name"=>"Bundle-1", "status"=>"1", "bundle_type"=>{"id"=>"6713", "name"=>"BundleType-1"}, "begin_at"=>nil, "end_at"=>nil, "updated_at"=>"2013-03-21T23:02:32Z", "created_at"=>"2013-03-21T23:02:32Z"}, {"id"=>"6074", "name"=>"Bundle-2", "status"=>"1", "bundle_type"=>{"id"=>"6714", "name"=>"BundleType-2"}, "begin_at"=>nil, "end_at"=>nil, "updated_at"=>"2013-03-21T23:02:32Z", "created_at"=>"2013-03-21T23:02:32Z"}]}}
Instead I would like to get:
{"bundles"=>[{"id"=>"6073", "name"=>"Bundle-1", "status"=>"1", "bundle_type"=>{"id"=>"6713", "name"=>"BundleType-1"}, "begin_at"=>nil, "end_at"=>nil, "updated_at"=>"2013-03-21T23:02:32Z", "created_at"=>"2013-03-21T23:02:32Z"}, {"id"=>"6074", "name"=>"Bundle-2", "status"=>"1", "bundle_type"=>{"id"=>"6714", "name"=>"BundleType-2"}, "begin_at"=>nil, "end_at"=>nil, "updated_at"=>"2013-03-21T23:02:32Z", "created_at"=>"2013-03-21T23:02:32Z"}]}
The point is that I control the XML, and it is always formed similarly to the example above.
My question is also related to: Does RABL's JSON output not conform to standard? Can it?

Imagine an XML document that consists only of a list of identical tags, e.g.
<shoppinglist>
  <item>apple</item>
  <item>banana</item>
  <item>cherry</item>
  <item>pear</item>
</shoppinglist>
When you convert this into a hash, it is quite straightforward to access the items with e.g. hash['shoppinglist']['item'][0]. But what would you expect in this case? Just an array? By your logic the items should now be accessible with hash['shoppinglist'][0], but what if you have different elements inside the container, e.g.
<shoppinglist>
  <date>2013-01-01</date>
  <item>apple</item>
  <item>banana</item>
  <item>cherry</item>
  <item>pear</item>
</shoppinglist>
How would you now access the items? And how would you access the date? The problem is that the conversion to a hash has to work in the general case.
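For illustration, a typical XML-to-hash converter (Nori and Hash.from_xml both behave roughly this way; the exact shape here is an assumption, not verified output) has little choice but to return something like:
{"shoppinglist"=>{"date"=>"2013-01-01", "item"=>["apple", "banana", "cherry", "pear"]}}
The repeated <item> tags are grouped into an array under the "item" key, while the single <date> stays a plain value; there is no sensible place left for a bare array at the "shoppinglist" level.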
Although I do not know Nori well, I am fairly sure what you are asking for is not built in, simply because it makes no sense in the general case. As an alternative, you can still pull the bundle array up one level yourself:
@hash['bundles'] = @hash['bundles']['bundle']
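Or, spelled out end to end, a minimal sketch (assuming @xml holds the XML from the question):
require 'nori'

@hash = Nori.new(:parser => :nokogiri, :advanced_typecasting => false).parse(@xml)

# Pull the repeated <bundle> elements up one level. The extra handling guards
# the single-<bundle> case, where the parser yields a plain Hash rather than an Array.
bundles = @hash['bundles'] && @hash['bundles']['bundle']
@hash['bundles'] = bundles.is_a?(Array) ? bundles : [bundles].compact
After this, @hash['bundles'] is the array of bundle hashes you were after.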

The general solution to your problem is not very pretty.
I created a special object that I named ArrayHash. It has the property that if it has only one key, and the value stored under that key is an array, it also exposes integer keys for the elements of that array.
So where a normal Ruby Hash would look like
{"bundle"=>["0", "1", "A", "B"]}
an ArrayHash would look like this:
{"bundle"=>["0", "1", "A", "B"], 0=>"0", 1=>"1", 2=>"A", 3=>"B"}
Since the extra keys are Fixnums, this hash can be indexed just like the array
[ "0", "1", "A", "B" ]
except that it also has a "bundle" entry, so its size is 5.
Below is the code that forces Nori to use this special hash.
require 'nori'

class Nori
  class ArrayHash < Hash
    # When the hash has a single key whose value is an array, allow
    # integer indexing to reach straight into that array.
    def [](a)
      if a.is_a? Fixnum and self.size == 1
        key = self.keys[0]
        self[key][a]
      else
        super
      end
    end

    def inspect
      if self.size == 1 and self.to_a[0][1].class == Array
        p = Hash[self.to_a]
        self.values[0].each.with_index do |v, i|
          p[i] = v
        end
        p.inspect
      else
        super
      end
    end
  end
end

class Nori
  class XMLUtilityNode
    alias :old_to_hash :to_hash

    # Wrap every hash produced by Nori in an ArrayHash.
    def to_hash
      ret = old_to_hash
      raise if ret.size != 1
      raise unless ret.class == Hash
      a = ret.to_a[0]
      k, v = a.first, a.last
      if v.class == Hash
        v = ArrayHash[ v.to_a ]
      end
      ret = ArrayHash[ k, v ]
      ret
    end
  end
end
h = Nori.new(:parser => :nokogiri, :advanced_typecasting => false).parse(<<EOF)
<top>
<aundles>
<bundle>0</bundle>
<bundle>1</bundle>
<bundle>A</bundle>
<bundle>B</bundle>
</aundles>
<bundles>
<nundle>A</nundle>
<bundle>A</bundle>
<bundle>B</bundle>
</bundles>
</top>
EOF
puts "#{h['top']['aundles'][0]} == #{ h['top']['aundles']['bundle'][0]}"
puts "#{h['top']['aundles'][1]} == #{ h['top']['aundles']['bundle'][1]}"
puts "#{h['top']['aundles'][2]} == #{ h['top']['aundles']['bundle'][2]}"
puts "#{h['top']['aundles'][3]} == #{ h['top']['aundles']['bundle'][3]}"
puts h.inspect
The output is then
0 == 0
1 == 1
A == A
B == B
{"top"=>{"aundles"=>{"bundle"=>["0", "1", "A", "B"], 0=>"0", 1=>"1", 2=>"A", 3=>"B"}, "bundles"=>{"nundle"=>"A", "bundle"=>["A", "B"]}}}

Related

Ruby array of hashes values to string

I have an array of hashes (#1) that looks like this:
data = [{"username"=>"Luck", "mail"=>"root#localhost.net", "active"=>0}]
that I am trying to compare with following array of hashes (#2):
test = [{"username"=>"Luck", "mail"=>"root#localhost.net", "active"=>"0"}]
where #1 I obtained from database by mysql2 (what actually is in the database)
and #2 from my cucumber scenario (what I minimally expect ot be there).
By definition #2 must be a subset of #1 so I follow with this code:
data = data.to_set
test = test.to_set
assert test.subset?(data)
The problem is that in the data array the value of "active" is NOT a string: in data it is a Fixnum, while in test it is a String.
I need a solution that will work even for more than one hash in the array (as the database can return more than one row of results); that is why I convert to sets and use subset?.
From other questions I got:
data.each do |obj|
  obj.map do |k, v|
    {k => v.to_s}
  end
end
However it does not work for me. Any ideas?
Assumptions you can make:
All the keys in data will always be Strings.
All the keys in test will always be Strings, and will always be identical to those in data.
All the values in test will always be Strings.
Here are a couple of approaches that should do it, assuming I understand the question correctly.
#1: convert the hash values to strings
def stringify_hash_values(h)
  h.each_with_object({}) { |(k, v), out| out[k] = v.to_s }
end

def sorta_subset?(data, test)
  (test.map { |h| stringify_hash_values(h) } -
    data.map { |h| stringify_hash_values(h) }).empty?
end

data = [{"username"=>"Luck", "mail"=>"root@localhost.net", "active"=>0}]
test = [{"username"=>"Luck", "mail"=>"root@localhost.net", "active"=>"0"}]

sorta_subset?(data, test) #=> true
#2: see if the keys are the same and the values, converted to strings, are equal
require 'set'

def hashes_sorta_equal?(h, g)
  hk = h.keys
  (hk.to_set == g.keys.to_set) &&
    (h.values_at(*hk).map(&:to_s) == g.values_at(*hk).map(&:to_s))
end

def sorta_subset?(data, test)
  test.all? { |h| data.any? { |g| hashes_sorta_equal?(g, h) } }
end

sorta_subset?(data, test) #=> true
Don't ask me why it works, but I found a solution:
data.map! do |obj|
  obj.each do |k, v|
    obj[k] = "#{v}"
  end
end
I think it has to do with which methods mutate the receiver in place and which return a modified copy: here obj[k] = "#{v}" writes back into each hash, whereas the earlier attempt built new one-pair hashes inside map and discarded them.
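For instance, a non-mutating sketch (not from the original answer) that leaves the mysql2 rows untouched:
data = data.map { |h| h.each_with_object({}) { |(k, v), out| out[k] = v.to_s } }
# Each hash is rebuilt with stringified values; the originals are left as they were,
# unlike the map!/each version above, which writes back into them.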

Ruby search for super nested key from json response

I have a terribly nested JSON response.
[[{:test=>[{:id=>1, :b=>{id: '2'}}]}]]
There are more arrays than that, but you get the idea.
Is there a way to recursively search through and find all the items that have a key I need?
I tried using this function extract_list() but it doesn't handle arrays well.
def nested_find(obj, needed_keys)
  return {} unless obj.is_a?(Array) || obj.is_a?(Hash)
  obj.inject({}) do |hash, val|
    if val.is_a?(Hash) && (tmp = needed_keys & val.keys).length > 0
      tmp.each { |key| hash[key] = val[key] }
    elsif val.is_a?(Array)
      hash.merge!(obj.map { |v| nested_find(v, needed_keys) }.reduce(:merge))
    end
    hash
  end
end
Example
needed_keys = [:id, :another_key]
nested_find([ ['test', [{id:1}], [[another_key: 5]]]], needed_keys)
# {:id=>1, :another_key=>5}
The following is not what I'd suggest, but just to give a brief alternative to the other solutions provided:
2.1.1 :001 > obj = [[{:test=>[{:id=>1, :b=>{id: '2'}}]}]]
=> [[{:test=>[{:id=>1, :b=>{:id=>"2"}}]}]]
2.1.1 :002 > key = :id
=> :id
2.1.1 :003 > obj.inspect.scan(/#{key.inspect}=>([^,}]*)[,}]/).flatten.map {|s| eval s}
=> [1, "2"]
Note: the use of eval here is just for the example. It would fail or produce incorrect results on anything whose inspect value is not eval-able back to the same instance, and it can execute malicious code.
You'll need to write your own recursive handler. Assuming that you've already converted your JSON to a Ruby data structure (via JSON.load or whatnot):
def deep_find_value_with_key(data, desired_key)
  case data
  when Array
    data.each do |value|
      if found = deep_find_value_with_key(value, desired_key)
        return found
      end
    end
  when Hash
    if data.key?(desired_key)
      return data[desired_key]
    else
      data.each do |key, val|
        if found = deep_find_value_with_key(val, desired_key)
          return found
        end
      end
    end
  end
  return nil
end
The general idea is that given a data structure, you check it for the key (if it's a hash) and return the matching value if found. Otherwise, you iterate it (if it's an Array or Hash) and perform the same check on each of its children.
This will find the value for the first occurrence of the given key, or nil if the key doesn't exist in the tree. If you need to find all instances then it's slightly different - you basically need to pass an array that will accumulate the values:
def deep_find_value_with_key(data, desired_key, hits = [])
  case data
  when Array
    data.each do |value|
      deep_find_value_with_key(value, desired_key, hits)
    end
  when Hash
    if data.key?(desired_key)
      hits << data[desired_key]
    else
      data.each do |key, val|
        deep_find_value_with_key(val, desired_key, hits)
      end
    end
  end
  return hits
end

Ruby hash with multiple keys pointing to the same value

I am looking for a way to have, I would say synonym keys in the hash.
I want multiple keys to point to the same value, so I can read/write a value through any of these keys.
As an example, it should work like this (let's say :foo and :bar are synonyms):
hash[:foo] = "foo"
hash[:bar] = "bar"
puts hash[:foo] # => "bar"
Update 1
Let me add a couple of details. The main reason I need these synonyms is that I receive keys from an external source which I cannot control, and multiple keys can actually be associated with the same value.
Rethink Your Data Structure
Depending on how you want to access your data, you can make either the keys or the values synonyms by making them an array. Either way, you'll need to do more work to parse the synonyms than the definitional word they share.
Keys as Definitions
For example, you could use the keys as the definition for your synonyms.
# Create your synonyms.
hash = {}
hash['foo'] = %w[foo bar]
hash
# => {"foo"=>["foo", "bar"]}
# Update the "definition" of your synonyms.
hash['baz'] = hash.delete('foo')
hash
# => {"baz"=>["foo", "bar"]}
Values as Definitions
You could also invert this structure and make your keys arrays of synonyms instead. For example:
hash = {["foo", "bar"]=>"foo"}
hash[hash.rassoc('foo').first] = 'baz'
=> {["foo", "bar"]=>"baz"}
You could subclass hash and override [] and []=.
class AliasedHash < Hash
  def initialize(*args)
    super
    @aliases = {}
  end

  def alias(from, to)
    @aliases[from] = to
    self
  end

  def [](key)
    super(alias_of(key))
  end

  def []=(key, value)
    super(alias_of(key), value)
  end

  private

  def alias_of(key)
    @aliases.fetch(key, key)
  end
end

ah = AliasedHash.new.alias(:bar, :foo)
ah[:foo] = 123
ah[:bar] # => 123
ah[:bar] = 456
ah[:foo] # => 456
What you can do is completely possible as long as you assign the same object to both keys.
variable_a = 'a'
hash = {foo: variable_a, bar: variable_a}
puts hash[:foo] #=> 'a'
hash[:bar].succ!
puts hash[:foo] #=> 'b'
This works because hash[:foo] and hash[:bar] both refer to the same instance of the letter "a" via variable_a. It wouldn't work, however, if you used the assignment hash = {foo: 'a', bar: 'a'}, because in that case :foo and :bar would refer to two different String instances.
The answer to your original post is:
hash[:foo] = hash[:bar]
and
hash[:foo].__id__ == hash[:bar].__id__
will hold true as long as the value is a reference type (String, Array, ...).
The answer to your Update 1 could be:
input.reduce({ :k => {}, :v => {} }) { |t, (k, v)|
  t[:k][t[:v][v] || k] = v
  t[:v][v] = k
  t
}[:k]
where «input» is an abstract enumerator (or array) of your input data as it comes [key, value]+, «:k» your result, and «:v» an inverted hash that serves the purpose of finding a key if its value is already present.
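A quick usage sketch (the input is made up) showing how repeated values collapse onto the first key seen:
input = [[:foo, "a"], [:bar, "a"], [:baz, "b"]]
input.reduce({ :k => {}, :v => {} }) { |t, (k, v)|
  t[:k][t[:v][v] || k] = v
  t[:v][v] = k
  t
}[:k]
# => {:foo=>"a", :baz=>"b"}  (:bar folds into :foo because they share the value "a")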

How to declare a two-dimensional array in Ruby

I want a two-dimensional array in Ruby that I can access, for example, like this:
if @array[x][y] == "1" then @array[x][y] = "0"
The problem is: I don't know the initial sizes of the array dimensions, and I grow the array (with the << operator).
How do I declare it as an instance variable, so I get no error like this?
undefined method `[]' for nil:NilClass (NoMethodError)
QUESTION UPDATED:
@array = Array.new {Array.new}
now works for me, so the comment from Matt below is correct!
I just found out that the reason I received the error was that I iterated over the array like this:
for i in 0..@array.length
  for j in 0..@array[0].length
    @array[i][j] ...
which was obviously wrong and produced the error. It has to be like this:
for i in 0..@array.length-1
  for j in 0..@array[0].length-1
    @array[i][j] ...
A simple implementation of a sparse two-dimensional array using nested Hashes:
class SparseArray
  attr_reader :hash

  def initialize
    @hash = {}
  end

  def [](key)
    hash[key] ||= {}
  end

  def rows
    hash.length
  end
  alias_method :length, :rows
end
Usage:
sparse_array = SparseArray.new
sparse_array[1][2] = 3
sparse_array[1][2] #=> 3
p sparse_array.hash
#=> {1=>{2=>3}}
#
# dimensions
#
sparse_array.length #=> 1
sparse_array.rows #=> 1
sparse_array[0].length #=> 0
sparse_array[1].length #=> 1
Matt's comment on your question is totally correct. However, based on the fact that you've tagged this "conways-game-of-life", it looks like you are trying to initialize a two-dimensional array and then use it in calculations for the game. If you want to do this in Ruby, one way would be:
a = Array.new(my_x_size) { |i| Array.new(my_y_size) { |j| 0 } }
which will create a my_x_size-by-my_y_size array filled with zeros.
What this code does is create a new Array of your x size, then initialize each element of that array to be another Array of your y size, and initialize each element of each inner array with 0.
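For example (sizes made up), the check from the question then works without a NoMethodError:
@array = Array.new(3) { Array.new(4) { 0 } }
@array[1][2] = "1"
@array[1][2] = "0" if @array[1][2] == "1"
@array # => [[0, 0, 0, 0], [0, 0, "0", 0], [0, 0, 0, 0]]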
Ruby's Array doesn't give you this functionality.
Either you manually do it:
(@array[x] ||= [])[y] = 42
Or you use hashes:
@hash = Hash.new { |h, k| h[k] = [] }
@hash[42][3] = 42
@hash # => {42 => [nil, nil, nil, 42]}

Creating an md5 hash of a number, string, array, or hash in Ruby

I need to create a signature string for a variable in Ruby, where the variable can be a number, a string, a hash, or an array. The hash values and array elements can also be any of these types.
This string will be used to compare the values in a database (Mongo, in this case).
My first thought was to create an MD5 hash of a JSON encoded value, like so: (body is the variable referred to above)
def createsig(body)
Digest::MD5.hexdigest(JSON.generate(body))
end
This nearly works, but JSON.generate does not encode the keys of a hash in the same order each time, so createsig({:a=>'a',:b=>'b'}) does not always equal createsig({:b=>'b',:a=>'a'}).
What is the best way to create a signature string to fit this need?
Note: for the detail-oriented among us, I know that you can't JSON.generate() a number or a string. In these cases, I would just call MD5.hexdigest() directly.
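In code, that fallback might look like this (a sketch of the idea just described, not a tested implementation; it still inherits the key-ordering problem that the answers below address):
require 'digest/md5'
require 'json'

def createsig(body)
  case body
  when Hash, Array then Digest::MD5.hexdigest(JSON.generate(body))
  else                  Digest::MD5.hexdigest(body.to_s)
  end
end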
I coded up the following pretty quickly and don't have time to really test it here at work, but it ought to do the job. Let me know if you find any issues with it and I'll take a look.
This should properly flatten out and sort the arrays and hashes, and you'd need some pretty strange-looking strings for there to be any collisions.
require 'digest/md5'

def createsig(body)
  Digest::MD5.hexdigest(sigflat(body))
end

def sigflat(body)
  if body.class == Hash
    arr = []
    body.each do |key, value|
      arr << "#{sigflat key}=>#{sigflat value}"
    end
    body = arr
  end
  if body.class == Array
    str = ''
    body.map! do |value|
      sigflat value
    end.sort!.each do |value|
      str << value
    end
    body = str # use the sorted, concatenated pieces as the flat representation
  end
  if body.class != String
    body = body.to_s << body.class.to_s
  end
  body
end
> sigflat({:a => {:b => 'b', :c => 'c'}, :d => 'd'}) == sigflat({:d => 'd', :a => {:c => 'c', :b => 'b'}})
=> true
If you could only get a string representation of body and not have the Ruby 1.8 hash come back with different orders from one time to the other, you could reliably hash that string representation. Let's get our hands dirty with some monkey patches:
require 'digest/md5'

class Object
  def md5key
    to_s
  end
end

class Array
  def md5key
    map(&:md5key).join
  end
end

class Hash
  def md5key
    sort.map(&:md5key).join
  end
end
Now any object (of the types mentioned in the question) responds to md5key by returning a reliable key to use for creating a checksum, so:
def createsig(o)
  Digest::MD5.hexdigest(o.md5key)
end
Example:
body = [
  {
    'bar' => [
      345,
      "baz",
    ],
    'qux' => 7,
  },
  "foo",
  123,
]

p body.md5key     # => "bar345bazqux7foo123"
p createsig(body) # => "3a92036374de88118faf19483fe2572e"
Note: This hash representation does not encode the structure, only the concatenation of the values. Therefore ["a", "b", "c"] will hash the same as ["abc"].
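If that matters for your data, one way to harden it (building on the Object#md5key patch above; this is a sketch, not part of the original answer) is to add delimiters and container markers:
class Array
  def md5key
    "[" + map(&:md5key).join("|") + "]"
  end
end

class Hash
  def md5key
    "{" + sort.map(&:md5key).join("|") + "}"
  end
end

# ["a", "b", "c"].md5key # => "[a|b|c]"
# ["abc"].md5key         # => "[abc]"
Collisions are still possible if the values themselves contain the delimiters, but the structure is no longer thrown away entirely.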
Here's my solution. I walk the data structure and build up a list of pieces that get joined into a single string. In order to ensure that the class types seen affect the hash, I inject a single unicode character that encodes basic type information along the way. (For example, we want ["1", "2", "3"].objsum != [1,2,3].objsum)
I did this as a refinement on Object; it's easily ported to a monkey patch. To use it, just require the file and run "using ObjSum" (a usage sketch follows the module).
require 'digest/md5'
require 'set'

module ObjSum
  refine Object do
    def objsum
      parts = []
      queue = [self]
      while queue.size > 0
        item = queue.shift
        if item.kind_of?(Hash)
          parts << "\000"
          item.keys.sort.each do |k|
            queue << k
            queue << item[k]
          end
        elsif item.kind_of?(Set)
          parts << "\001"
          item.to_a.sort.each { |i| queue << i }
        elsif item.kind_of?(Enumerable)
          parts << "\002"
          item.each { |i| queue << i }
        elsif item.kind_of?(Fixnum)
          parts << "\003"
          parts << item.to_s
        elsif item.kind_of?(Float)
          parts << "\004"
          parts << item.to_s
        else
          parts << item.to_s
        end
      end
      Digest::MD5.hexdigest(parts.join)
    end
  end
end
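Usage might look like this (the file name is assumed, and the results are what I would expect on a Ruby where Fixnum still exists, since the module checks for it):
require_relative 'obj_sum' # wherever the module above is saved

using ObjSum

{ :a => 1, :b => [2, 3] }.objsum == { :b => [2, 3], :a => 1 }.objsum # => true
["1", "2", "3"].objsum == [1, 2, 3].objsum                           # => false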
Just my 2 cents:
module Ext
  module Hash
    module InstanceMethods
      # Return a string suitable for generating content signature.
      # Signature image does not depend on order of keys.
      #
      #   {:a => 1, :b => 2}.signature_image == {:b => 2, :a => 1}.signature_image # => true
      #   {{:a => 1, :b => 2} => 3}.signature_image == {{:b => 2, :a => 1} => 3}.signature_image # => true
      #   etc.
      #
      # NOTE: Signature images of identical content generated under different versions of Ruby are NOT GUARANTEED to be identical.
      def signature_image
        # Store normalized key-value pairs here.
        ar = []
        each do |k, v|
          ar << [
            k.is_a?(::Hash) ? k.signature_image : [k.class.to_s, k.inspect].join(":"),
            v.is_a?(::Hash) ? v.signature_image : [v.class.to_s, v.inspect].join(":"),
          ]
        end
        ar.sort.inspect
      end
    end
  end
end

class Hash #:nodoc:
  include Ext::Hash::InstanceMethods
end
These days there is a formally defined method for canonicalizing JSON, for exactly this reason: https://datatracker.ietf.org/doc/html/draft-rundgren-json-canonicalization-scheme-16
There is a ruby implementation here: https://github.com/dryruby/json-canonicalization
Depending on your needs, you could call ary.inspect or ary.to_yaml, even.
