How to flatten a hash, making each key a unique value? - ruby

I want to take a hash with nested hashes and arrays and flatten it out into a single hash with unique values. I keep trying to approach this from different angles, but then I make it way more complex than it needs to be and get myself lost in what's happening.
Example Source Hash:
{
"Name" => "Kim Kones",
"License Number" => "54321",
"Details" => {
"Name" => "Kones, Kim",
"Licenses" => [
{
"License Type" => "PT",
"License Number" => "54321"
},
{
"License Type" => "Temp",
"License Number" => "T123"
},
{
"License Type" => "AP",
"License Number" => "A666",
"Expiration Date" => "12/31/2020"
}
]
}
}
Example Desired Hash:
{
"Name" => "Kim Kones",
"License Number" => "54321",
"Details_Name" => "Kones, Kim",
"Details_Licenses_1_License Type" => "PT",
"Details_Licenses_1_License Number" => "54321",
"Details_Licenses_2_License Type" => "Temp",
"Details_Licenses_2_License Number" => "T123",
"Details_Licenses_3_License Type" => "AP",
"Details_Licenses_3_License Number" => "A666",
"Details_Licenses_3_Expiration Date" => "12/31/2020"
}
For what it's worth, here's my most recent attempt before giving up.
def flattify(hashy)
temp = {}
hashy.each do |key, val|
if val.is_a? String
temp["#{key}"] = val
elsif val.is_a? Hash
temp.merge(rename val, key, "")
elsif val.is_a? Array
temp["#{key}"] = enumerate val, key
else
end
print "=> #{temp}\n"
end
return temp
end
def rename (hashy, str, n)
temp = {}
hashy.each do |key, val|
if val.is_a? String
temp["#{key}#{n}"] = val
elsif val.is_a? Hash
val.each do |k, v|
temp["#{key}_#{k}#{n}"] = v
end
elsif val.is_a? Array
temp["#{key}"] = enumerate val, key
else
end
end
return flattify temp
end
def enumerate (ary, str)
temp = {}
i = 1
ary.each do |x|
temp["#{str}#{i}"] = x
i += 1
end
return flattify temp
end

Interesting question!
Theory
Here's a recursive method to parse your data.
It keeps track of which keys and indices it has found.
It appends them in a tmp array.
Once a leaf object has been found, it gets written in a hash as value, with a joined tmp as key.
This small hash then gets recursively merged back to the main hash.
Code
def recursive_parsing(object, tmp = [])
case object
when Array
object.each.with_index(1).with_object({}) do |(element, i), result|
result.merge! recursive_parsing(element, tmp + [i])
end
when Hash
object.each_with_object({}) do |(key, value), result|
result.merge! recursive_parsing(value, tmp + [key])
end
else
{ tmp.join('_') => object }
end
end
As an example:
require 'pp'
pp recursive_parsing(data)
# {"Name"=>"Kim Kones",
# "License Number"=>"54321",
# "Details_Name"=>"Kones, Kim",
# "Details_Licenses_1_License Type"=>"PT",
# "Details_Licenses_1_License Number"=>"54321",
# "Details_Licenses_2_License Type"=>"Temp",
# "Details_Licenses_2_License Number"=>"T123",
# "Details_Licenses_3_License Type"=>"AP",
# "Details_Licenses_3_License Number"=>"A666",
# "Details_Licenses_3_Expiration Date"=>"12/31/2020"}
Debugging
Here's a modified version with old-school debugging. It might help you understand what's going on:
def recursive_parsing(object, tmp = [], indent="")
puts "#{indent}Parsing #{object.inspect}, with tmp=#{tmp.inspect}"
result = case object
when Array
puts "#{indent} It's an array! Let's parse every element:"
object.each_with_object({}).with_index(1) do |(element, result), i|
result.merge! recursive_parsing(element, tmp + [i], indent + " ")
end
when Hash
puts "#{indent} It's a hash! Let's parse every key,value pair:"
object.each_with_object({}) do |(key, value), result|
result.merge! recursive_parsing(value, tmp + [key], indent + " ")
end
else
puts "#{indent} It's a leaf! Let's return a hash"
{ tmp.join('_') => object }
end
puts "#{indent} Returning #{result.inspect}\n"
result
end
When called with recursive_parsing([{a: 'foo', b: 'bar'}, {c: 'baz'}]), it displays:
Parsing [{:a=>"foo", :b=>"bar"}, {:c=>"baz"}], with tmp=[]
It's an array! Let's parse every element:
Parsing {:a=>"foo", :b=>"bar"}, with tmp=[1]
It's a hash! Let's parse every key,value pair:
Parsing "foo", with tmp=[1, :a]
It's a leaf! Let's return a hash
Returning {"1_a"=>"foo"}
Parsing "bar", with tmp=[1, :b]
It's a leaf! Let's return a hash
Returning {"1_b"=>"bar"}
Returning {"1_a"=>"foo", "1_b"=>"bar"}
Parsing {:c=>"baz"}, with tmp=[2]
It's a hash! Let's parse every key,value pair:
Parsing "baz", with tmp=[2, :c]
It's a leaf! Let's return a hash
Returning {"2_c"=>"baz"}
Returning {"2_c"=>"baz"}
Returning {"1_a"=>"foo", "1_b"=>"bar", "2_c"=>"baz"}

Unlike the others, I have no love for each_with_object :-). But I do like passing a single result hash around so I don't have to merge and remerge hashes over and over again.
def flattify(value, result = {}, path = [])
case value
when Array
value.each.with_index(1) do |v, i|
flattify(v, result, path + [i])
end
when Hash
value.each do |k, v|
flattify(v, result, path + [k])
end
else
result[path.join("_")] = value
end
result
end
(Some details adopted from Eric, see comments)

Non-recursive approach, using BFS with an array as a queue. I keep the key-value pairs where the value isn't an array/hash, and push array/hash contents to the queue (with combined keys). Turning arrays into hashes (["a", "b"] ↦ {1=>"a", 2=>"b"}) as that felt neat.
def flattify(hash)
(q = hash.to_a).select { |key, value|
value = (1..value.size).zip(value).to_h if value.is_a? Array
!value.is_a?(Hash) || !value.each { |k, v| q << ["#{key}_#{k}", v] }
}.to_h
end
One thing I like about it is the nice combination of keys as "#{key}_#{k}". In my other solution, I could've also used a string path = '' and extended that with path + "_" + k, but that would've caused a leading underscore that I'd have to avoid or trim with extra code.

Related

How to solve the "retrieve_values" problem

I'm working on this problem:
Write a method retrieve_values that takes in two hashes and a key. The method should return an array containing the values from the two hashes that correspond with the given key.
def retrieve_values(hash1, hash2, key)
end
dog1 = {"name"=>"Fido", "color"=>"brown"}
dog2 = {"name"=>"Spot", "color"=> "white"}
print retrieve_values(dog1, dog2, "name") #=> ["Fido", "Spot"]
puts
print retrieve_values(dog1, dog2, "color") #=> ["brown", "white"]
puts
I came up with a working solution:
def retrieve_values(hash1, hash2, key)
arr = []
hash1.each { |key| } && hash2.each { |key| }
if key == "name"
arr << hash1["name"] && arr << hash2["name"]
elsif key == "color"
arr << hash1["color"] && arr << hash2["color"]
end
return arr
end
I then looked at the 'official' solution:
def retrieve_values(hash1, hash2, key)
val1 = hash1[key]
val2 = hash2[key]
return [val1, val2]
end
What is wrong with my code? Or is it an acceptable "different" approach?
Line with hash1.each { |key| } && hash2.each { |key| } just does nothing it is not needed even in your solution.
This part a bit difficult to read arr << hash1["name"] && arr << hash2["name"]. It mutates the array two times in one line, this kind of style could lead to bugs.
Also, your code sticks only to two keys name and color:
dog1 = {"name"=>"Fido", "color"=>"brown", "age" => 1}
dog2 = {"name"=>"Spot", "color"=> "white", "age" => 2}
> retrieve_values(dog1, dog2, "age")
=> []
The official solution will return [1, 2].
You don't need here to explicitly use return keyword, any block of code returns the last evaluated expression. But it is a matter of style guide.
It is possible to simplify even the official solution:
def retrieve_values(hash1, hash2, key)
[hash1[key], hash2[key]]
end

How to compare and print matching hash values in Ruby?

I have two hashes like below,
h1 = {"a" => 1, "b" => 2, "c" => 3}
h2 = {"a" => 2, "b" => 2, "d" => 3}
I want to iterate over hash1 and hash2 and find matching keys and their values and print it on console.
Example here it should return output "b" => 2 .its not working with below code,
h1.each do |key1, value1|
h2.each do |key2, value2|
if ((h2.include? key1) && (h2.include? value1))
puts "matching h2 key #{h2[key2]}and h1 key #{h1[key1]}"
else
puts " don not match h2 key #{h2[key2]}and h1 key #{h1[key1]}"
end
end
end
I am from basically C++ and Java background and its very easy to do using for loops and iterators, but using Ruby, it is very difficult.
h1.merge(h2) { |k,o,n| puts "#{k}=>#{o}" if o == n }
"b" => 2
This uses the form of Hash#merge that employs a bock to determine the values of keys that are present in both hashes being merged. See the doc for details.
The second loop is not required.
h1.each do |key1, value1|
if (h2.include? key1) and (h2[key1] == value1)
puts "Match #{key1} with value #{value1}"
else
puts "#{key1} does not match"
end
end
You might write something like
h1.each do |k,v|
if h2[k] == v
puts "matched key = #{k} and value = #{v}"
else
puts "NOT matched key = #{k} and value = #{v}"
end
end
Output
NOT matched key = a and value = 1
matched key = b and value = 2
NOT matched key = c and value = 3
If the result is the main objective, select can work too:
h1.select{|k,v| h2[k] == v }

Safely assign value to nested hash using Hash#dig or Lonely operator(&.)

h = {
data: {
user: {
value: "John Doe"
}
}
}
To assign value to the nested hash, we can use
h[:data][:user][:value] = "Bob"
However if any part in the middle is missing, it will cause error.
Something like
h.dig(:data, :user, :value) = "Bob"
won't work, since there's no Hash#dig= available yet.
To safely assign value, we can do
h.dig(:data, :user)&.[]=(:value, "Bob") # or equivalently
h.dig(:data, :user)&.store(:value, "Bob")
But is there better way to do that?
It's not without its caveats (and doesn't work if you're receiving the hash from elsewhere), but a common solution is this:
hash = Hash.new {|h,k| h[k] = h.class.new(&h.default_proc) }
hash[:data][:user][:value] = "Bob"
p hash
# => { :data => { :user => { :value => "Bob" } } }
And building on #rellampec's answer, ones that does not throw errors:
def dig_set(obj, keys, value)
key = keys.first
if keys.length == 1
obj[key] = value
else
obj[key] = {} unless obj[key]
dig_set(obj[key], keys.slice(1..-1), value)
end
end
obj = {d: 'hey'}
dig_set(obj, [:a, :b, :c], 'val')
obj #=> {d: 'hey', a: {b: {c: 'val'}}}
interesting one:
def dig_set(obj, keys, value)
if keys.length == 1
obj[keys.first] = value
else
dig_set(obj[keys.first], keys.slice(1..-1), value)
end
end
will raise an exception anyways if there's no [] or []= methods.
I found a simple solution to set the value of a nested hash, even if a parent key is missing, even if the hash already exists. Given:
x = { gojira: { guitar: { joe: 'charvel' } } }
Suppose you wanted to include mario's drum to result in:
x = { gojira: { guitar: { joe: 'charvel' }, drum: { mario: 'tama' } } }
I ended up monkey-patching Hash:
class Hash
# ensures nested hash from keys, and sets final key to value
# keys: Array of Symbol|String
# value: any
def nested_set(keys, value)
raise "DEBUG: nested_set keys must be an Array" unless keys.is_a?(Array)
final_key = keys.pop
return unless valid_key?(final_key)
position = self
for key in keys
return unless valid_key?(key)
position[key] = {} unless position[key].is_a?(Hash)
position = position[key]
end
position[final_key] = value
end
private
# returns true if key is valid
def valid_key?(key)
return true if key.is_a?(Symbol) || key.is_a?(String)
raise "DEBUG: nested_set invalid key: #{key} (#{key.class})"
end
end
usage:
x.nested_set([:instrument, :drum, :mario], 'tama')
usage for your example:
h.nested_set([:data, :user, :value], 'Bob')
any caveats i missed? any better way to write the code without sacrificing readability?
Searching for an answer to a similar question I developmentally stumbled upon an interface similar to #niels-kristian's answer, but wanted to also support a namespace definition parameter, like an xpath.
def deep_merge(memo, source)
# From: http://www.ruby-forum.com/topic/142809
# Author: Stefan Rusterholz
merger = proc { |key, v1, v2| Hash === v1 && Hash === v2 ? v1.merge(v2, &merger) : v2 }
memo.merge!(source, &merger)
end
# Like Hash#dig, but for setting a value at an xpath
def bury(memo, xpath, value, delimiter=%r{\.})
xpath = xpath.split(delimiter) if xpath.respond_to?(:split)
xpath.map!{|x|x.to_s.to_sym}.push(value)
deep_merge(memo, xpath.reverse.inject { |memo, field| {field.to_sym => memo} })
end
Nested hashes are sort of like xpaths, and the opposite of dig is bury.
irb(main):014:0> memo = {:test=>"value"}
=> {:test=>"value"}
irb(main):015:0> bury(memo, 'test.this.long.path', 'value')
=> {:test=>{:this=>{:long=>{:path=>"value"}}}}
irb(main):016:0> bury(memo, [:test, 'this', 2, 4.0], 'value')
=> {:test=>{:this=>{:long=>{:path=>"value"}, :"2"=>{:"4.0"=>"value"}}}}
irb(main):017:0> bury(memo, 'test.this.long.path.even.longer', 'value')
=> {:test=>{:this=>{:long=>{:path=>{:even=>{:longer=>"value"}}}, :"2"=>{:"4.0"=>"value"}}}}
irb(main):018:0> bury(memo, 'test.this.long.other.even.longer', 'other')
=> {:test=>{:this=>{:long=>{:path=>{:even=>{:longer=>"value"}}, :other=>{:even=>{:longer=>"other"}}}, :"2"=>{:"4.0"=>"value"}}}}
A more ruby-helper-like version of #niels-kristian answer
You can use it like:
a = {}
a.bury!([:a, :b], "foo")
a # => {:a => { :b => "foo" }}
class Hash
def bury!(keys, value)
key = keys.first
if keys.length == 1
self[key] = value
else
self[key] = {} unless self[key]
self[key].bury!(keys.slice(1..-1), value)
end
self
end
end

Ruby map a hash and extract the keys with values if nested or not

I’m using the PaperTrail gem to track changes in my Job model. Each time I change a record, the object is serialized and stored inside a changeset field on my Version model.
To display the values that have changed, I’m deserializing the object and iterating through it using map. I need help creating a method that extracts all the keys and values from this hash, regardless of whether they’re nested or not. The nesting will never be more than one level deep.
Here is my current code:
# deserializing the object into a hash
json_string = v.changeset.to_json
serialized_json = JSON.parse(json_string)
# current method to extract keys with values, but doesn’t pick up nested ones
serialized_json.map { |k,v|
puts k.titleize
puts v[1].present? ? v[1] : '<empty>'
}
# serialized_json (1st output)
{
"name" => [
[0] nil,
[1] "Epping Forest"
],
"job_address" => [
[0] nil,
[1] "51 Epping Forest, Essex, ES1 SAD"
]
}
# serialized_json (2nd output)
{
"quote" => [
[0] {
"quote" => {
"url" => nil
}
},
[1] {
"quote" => {
"url" => "/tmp/uploads/1433181671-22986-9554/file.pdf"
}
}
]
}
The values I’d like to extract from my method are:
"Name"
"Epping Forest"
"Quote Url"
"/tmp/uploads/1433181671-22986-9554/file.pdf"
In your below code, I assume you're putting [0] and [1] as index numbers to help the readers of your code, as it is not valid Ruby syntax.
"name" => [
[0] nil,
[1] "Epping Forest"
],
As for extracting 1 level deep keys and values, since it's a bit unclear to me how you would like your output formatted, I'ma ssuming a 1-D array is fine...in which case, something like this would give you the keys and values:
serialized_json hash w/valid Ruby syntax
{
"name" => [
nil,
"Epping Forest"
],
"job_address" => [
nil,
"51 Epping Forest, Essex, ES1 SAD"
]
}
Then:
serialized_json.map { |k, v| [k, v] }.flatten.reject { |x| x.nil? }
outputs:
["name", "Epping Forest", "job_address", "51 Epping Forest, Essex, ES1 SAD"]
I hope this helps :)
#str = ""
def hash_checks(hash)
hash.each do |k,v|
if v.is_a?(Array)
v.each_with_index do |element, index|
if element.is_a?(Hash)
#str = element[index] + #str if hash_check(element)
else
#str << k.titleize + "\n" + element + "\n" unless element.nil?
end
end
elsif v.is_a?(Hash)
#str << k.titleize if hash_check(v)
end
end
puts #str
end
def hash_check(hash)
add_tag = false
hash.each do |k,v|
if v.is_a?(Hash)
#str = k.titleize + " " + #str if hash_check(v)
else
#str << k.titleize + "\n" + v + "\n" unless v.nil?
add_tag = true
end
end
add_tag
end
I've excluded the '<empty>' value for nil because the data you would want to extract is the one that matters.
There's an inconsistency on the data you want though, since on your first serialized json, there are 2 non-nil values, but you just want one.
This method includes both, and runs even for multi-level nests :)

Can't convert symbol to integer from hash table

Edit: The issue is being unable to get the quantity of arrays within the hash, so it can be, x = amount of arrays. so it can be used as function.each_index{|x| code }
Trying to use the index of the amount of rows as a way of repeating an action X amount of times depending on how much data is pulled from a CSV file.
Terminal issued
=> Can't convert symbol to integer (TypeError)
Complete error:
=> ~/home/tests/Product.rb:30:in '[]' can't convert symbol into integer (TypeError) from ~home/tests/Product.rub:30:in 'getNumbRel'
from test.rb:36:in '<main>'
the function is that is performing the action is:
def getNumRel
if defined? #releaseHashTable
return #releaseHashTable[:releasename].length
else
#releaseHashTable = readReleaseCSV()
return #releaseHashTable[:releasename].length
end
end
The csv data pull is just a hash of arrays, nothing snazzy.
def readReleaseCSV()
$log.info("Method "+"#{self.class.name}"+"."+"#{__method__}"+" has started")
$log.debug("reading product csv file")
# Create a Hash where the default is an empty Array
result = Array.new
csvPath = "#{File.dirname(__FILE__)}"+"/../../data/addingProdRelProjIterTestSuite/releaseCSVdata.csv"
CSV.foreach(csvPath, :headers => true, :header_converters => :symbol) do |row|
row.each do |column, value|
if "#{column}" == "prodid"
proHash = Hash.new { |h, k| h[k] = [ ] }
proHash['relid'] << row[:relid]
proHash['releasename'] << row[:releasename]
proHash['inheritcomponents'] << row[:inheritcomponents]
productId = Integer(value)
if result[productId] == nil
result[productId] = Array.new
end
result[productId][result[productId].length] = proHash
end
end
end
$log.info("Method "+"#{self.class.name}"+"."+"#{__method__}"+" has finished")
#productReleaseArr = result
end
Sorry, couldn't resist, cleaned up your method.
# empty brackets unnecessary, no uppercase in method names
def read_release_csv
# you don't need + here
$log.info("Method #{self.class.name}.#{__method__} has started")
$log.debug("reading product csv file")
# you're returning this array. It is not a hash. [] is preferred over Array.new
result = []
csvPath = "#{File.dirname(__FILE__)}/../../data/addingProdRelProjIterTestSuite/releaseCSVdata.csv"
CSV.foreach(csvPath, :headers => true, :header_converters => :symbol) do |row|
row.each do |column, value|
# to_s is preferred
if column.to_s == "prodid"
proHash = Hash.new { |h, k| h[k] = [ ] }
proHash['relid'] << row[:relid]
proHash['releasename'] << row[:releasename]
proHash['inheritcomponents'] << row[:inheritcomponents]
# to_i is preferred
productId = value.to_i
# this notation is preferred
result[productId] ||= []
# this is identical to what you did and more readable
result[productId] << proHash
end
end
end
$log.info("Method #{self.class.name}.#{__method__} has finished")
#productReleaseArr = result
end
You haven't given much to go on, but it appears that #releaseHashTable contains an Array, not a Hash.
Update: Based on the implementation you posted, you can see that productId is an integer and that the return value of readReleaseCSV() is an array.
In order to get the releasename you want, you have to do this:
#releaseHashTable[productId][n][:releasename]
where productId and n are integers. Either you'll have to specify them specifically, or (if you don't know n) you'll have to introduce a loop to collect all the releasenames for all the products of a particular productId.
This is what Mark Thomas meant:
> a = [1,2,3] # => [1, 2, 3]
> a[:sym]
TypeError: can't convert Symbol into Integer
# here starts the backstrace
from (irb):2:in `[]'
from (irb):2
An Array is only accessible by an index like so a[1] this fetches the second element from the array
Your return a an array and thats why your code fails:
#....
result = Array.new
#....
#productReleaseArr = result
# and then later on you call
#releaseHashTable = readReleaseCSV()
#releaseHashTable[:releasename] # which gives you TypeError: can't convert Symbol into Integer

Resources