Iterate through array of hashes and merge [closed] - ruby

I have an array of hashes like so:
[{:a=>"a", :b=>"b", :c=>"c", :d=>"d"},
{:a=>"a", :b=>nil, :c=>"notc", :d=>"d"}]
I want to iterate over the array and merge the hashes where a specific key is the same, such as :a. If the value for a key is the same in the next hash, ignore it, but if the value is different, collect the values in an array. The result would look something like this:
{:a=>"a", :b=>"b", :c=>["c","notc"], :d=>"d"}
I think I have to do a for loop through the array of hashes and then use the merge! method, but I'm not sure where to start.

I would also use Hash#merge! (aka update), like this (letting a denote the name of your array of hashes):
a.each_with_object({}) do |g,h|
h.update(g) do |_,o,n|
case o
when Array
o.include?(n) ? o : o + [n]
else
o.eql?(n) ? o : [o,n]
end
end
end
#=> {:a=>"a", :b=>["b", nil], :c=>["c", "notc"], :d=>"d"}
When o is an array, if you don't want to merge nil values, change the following line to:
(o.include?(n) || n.nil?) ? o : o + [n]
The steps:
a = [{:a=>"a", :b=>"b", :c=>"c", :d=>"d"},
{:a=>"a", :b=>nil, :c=>"notc", :d=>"d"},
{:a=>"aa", :b=>"b", :c=>"cc", :d=>"d"},
]
enum = a.each_with_object({})
#-> #<Enumerator: [{:a=>"a", :b=>"b", :c=>"c", :d=>"d"},
# {:a=>"a", :b=>nil, :c=>"notc", :d=>"d"},
# {:a=>"aa", :b=>"b", :c=>"cc", :d=>"d"}]
# :each_with_object({})>
We can see the values of the enumerator (which will be passed into the block) by converting it to an array:
enum.to_a
#=> [[{:a=>"a", :b=>"b", :c=>"c", :d=>"d"}, {}],
# [{:a=>"a", :b=>nil, :c=>"notc", :d=>"d"}, {}],
# [{:a=>"aa", :b=>"b", :c=>"cc", :d=>"d"}, {}]]
Pass in the first value and assign it to the block variables:
g,h = enum.next
#=> [{:a=>"a", :b=>"b", :c=>"c", :d=>"d"}, {}]
g #=> {:a=>"a", :b=>"b", :c=>"c", :d=>"d"}
h #=> {}
update's block is used for determining the values of keys that are present in both hashes being merged. As h is presently empty ({}), the block is not used for this merge:
h.update(g)
#=> {:a=>"a", :b=>"b", :c=>"c", :d=>"d"}
The new value of h is returned.
Now pass the second value of enum into the block:
g,h = enum.next
#=> [{:a=>"a", :b=>nil, :c=>"notc", :d=>"d"},
# {:a=>"a", :b=>"b", :c=>"c", :d=>"d"}]
g #=> {:a=>"a", :b=>nil, :c=>"notc", :d=>"d"}
h #=> {:a=>"a", :b=>"b", :c=>"c", :d=>"d"}
and execute:
h.update(g)
When :a=>"a" from g is to be merged, update sees that h contains the same key, :a. It therefore defers to the block to determine the value for :a in the merged hash. It passes the following values to the block:
k = :a
o = "a"
n = "a"
where k is the key, o (for "old") is the value of k in h and n (for "new") is the value of k in g. (We're not using k in the block, so I've named the block variable _ to signify that.) In the case statement, o is not an array, so:
o.eql?(n) ? o : [o,n]
#=> "a".eql?("a") ? "a" : ["a","a"]
#=> "a"
is returned to update to be the value for :a. That is, the value is not changed.
When the key is :b:
k = :b
o = "b"
n = nil
Again, o is not an array, so again we execute:
o.eql?(n) ? o : [o,n]
#=> ["b", nil]
but this time an array is returned. The remaining calculations for merging the second element of enum proceed similarly. After the merge:
h #=> {:a=>"a", :b=>["b", nil], :c=>["c", "notc"], :d=>"d"}
When :c=>"cc" in the third element of enum is merged, the following values are passed to update's block:
k = :c
o = ["c", "notc"]
n = "cc"
Since o is an array, we execute the following line of the case statement:
o.include?(n) ? o : o + [n]
#=> ["c", "notc", "cc"]
and the value of :c is assigned that value. The remaining calculations are performed similarly.
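To run the whole thing in one go, here is a minimal sketch that wraps the approach above in a method. The method name merge_collecting and the skip_nil keyword are my own additions for illustration, not part of the answer. With skip_nil: true an incoming nil is ignored whether or not the old value is already an array.

```ruby
# Hypothetical wrapper: the name and the skip_nil option are assumptions.
def merge_collecting(hashes, skip_nil: false)
  hashes.each_with_object({}) do |g, h|
    h.update(g) do |_, o, n|
      # Keep the old value untouched when the new one is nil and nils are skipped.
      next o if skip_nil && n.nil?

      case o
      when Array
        o.include?(n) ? o : o + [n]
      else
        o.eql?(n) ? o : [o, n]
      end
    end
  end
end

a = [{:a=>"a", :b=>"b", :c=>"c", :d=>"d"},
     {:a=>"a", :b=>nil, :c=>"notc", :d=>"d"},
     {:a=>"aa", :b=>"b", :c=>"cc", :d=>"d"}]

with_nil    = merge_collecting(a)
without_nil = merge_collecting(a, skip_nil: true)
# with_nil    is {:a=>["a", "aa"], :b=>["b", nil], :c=>["c", "notc", "cc"], :d=>"d"}
# without_nil is {:a=>["a", "aa"], :b=>"b",        :c=>["c", "notc", "cc"], :d=>"d"}
```

Note that this sketch only skips a nil that arrives as the new value; a nil that is already stored as the old value is still kept.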

I would do something like this:
array = [{ :a=>"a", :b=>"b", :c=>"c", :d=>"d" },
{ :a=>"a", :b=>nil, :c=>"notc", :d=>"d" }]
result = array.reduce({}) do |memo, hash|
memo.merge(hash) do |_, left, right|
combined = Array(left).push(right).uniq.compact
combined.size > 1 ? combined : combined.first
end
end
puts result
#=> { :a=>"a", :b=>"b", :c=>["c", "notc"], :d=>"d" }
Array(left) will ensure that the value from the one hash is an array. push(right) adds the value from the other hash to that array. uniq.compact removes duplicates and nil values.
The combined.size > 1 ? combined : combined.first line returns just the element if the array holds only one element.
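To see what each piece does in isolation, here is a small sketch. The lambda named combine is only a stand-in I introduced for the block body above; it is not part of the answer.

```ruby
# Kernel#Array wraps a single value, leaves an array as it is,
# and turns nil into an empty array.
wrapped   = Array("c")             # ["c"]
unchanged = Array(["c", "notc"])   # ["c", "notc"]
from_nil  = Array(nil)             # []

# Hypothetical stand-in for the merge block body.
combine = lambda do |left, right|
  combined = Array(left).push(right).uniq.compact
  combined.size > 1 ? combined : combined.first
end

same_value  = combine.call("a", "a")                # "a"
nil_dropped = combine.call("b", nil)                # "b"
two_values  = combine.call("c", "notc")             # ["c", "notc"]
grown       = combine.call(["c", "notc"], "cc")     # ["c", "notc", "cc"]
```

One thing to keep in mind: Array(left) returns the very same array when left is already an array, so push modifies that array in place.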

Assuming hs = [{:a=>"a", :b=>"b", :c=>"c", :d=>"d"}, {:a=>"a", :b=>nil, :c=>"notc", :d=>"d"}], you can do that with a (admittedly, terribly dense) one-liner:
Hash[hs.map { |h| h.map { |k,v| [k, v] } }.sort_by(&:first).reduce { |left,right| left.zip(right) }.map { |a| [a.first.first, a.map(&:last).compact.uniq] }]
To unpack what's going on here:
First we use map to convert the array of hashes into an array of arrays of pairs, and then use sort_by to sort the array so that all of the keys are 'lined up'.
[{:a=>"a", :b=>"b", :c=>"c", :d=>"d"}, {:a=>"a", :b=>nil, :c=>"notc", :d=>"d"}] becomes
[[[:a, "a"], [:b, "b"], [:c, "c"], [:d, "d"]],
[[:a, "a"], [:b, nil], [:c, "notc"], [:d, "d"]]]
That's this part:
hs.map { |h| h.map { |k,v| [k, v] } }.sort_by(&:first).
Then we use reduce to zip all the arrays together. zip takes two arrays, such as [1,2,3] and [:a,:b,:c], and outputs an array like [[1,:a],[2,:b],[3,:c]]:
reduce { |left,right| left.zip(right) }.
At this point we've grouped all the data together by key, and need to de-duplicate all the copies of the key, and we can remove the nils and uniq the values:
map { |a| [a.first.first, a.map(&:last).compact.uniq] }]
Here's a sample from a pry session:
[31] pry(main)> Hash[hs.map { |h| h.map { |k,v| [k, v] } }.sort_by(&:first).reduce { |left,right| left.zip(right) }.map { |a| [a.first.first, a.map(&:last).compact.uniq] }]
=> {:a=>["a"], :b=>["b"], :c=>["c", "notc"], :d=>["d"]}
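If the one-liner is too dense, a possibly easier-to-read alternative (a different technique from the zip approach above, and assuming Ruby 2.4+ for transform_values) is to flatten everything into key/value pairs and group them by key. It produces the same shape of result, with every value wrapped in an array.

```ruby
hs = [{:a=>"a", :b=>"b", :c=>"c", :d=>"d"},
      {:a=>"a", :b=>nil, :c=>"notc", :d=>"d"}]

grouped = hs.flat_map(&:to_a)      # one flat list of [key, value] pairs
            .group_by(&:first)     # key => [[key, value], ...]
            .transform_values { |pairs| pairs.map(&:last).compact.uniq }
# grouped is {:a=>["a"], :b=>["b"], :c=>["c", "notc"], :d=>["d"]}
```

Because the pairs are grouped by key rather than by position, this sketch does not depend on the hashes listing their keys in the same order.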

Related

How to print only the keys which have values?

I have a hash with keys and values as follows:
hash = {"lili" => [] , "john" => ["a", "b"], "andrew" => [], "megh" => ["b","e"]}
As we can see some of the keys have values as empty arrays.
Some keys have array values where there are actual values in the array.
I want to loop over the hash and generate a new hash that includes only those keys which have values in their arrays (not the ones which have empty arrays). How can I do that?
The title says:
PRINT only the keys, but reading the post you are trying to generate a hash subset given a hash.
The solution for the TITLE of the POST:
hash.each {|k,v| p k unless v.empty? }
If you want to generate a new hash subset given the original hash:
hash.reject { |k,v| v.nil? || v.empty? }
If you want to PRINT the subset generated given the original hash:
hash.reject { |k,v| v.nil? || v.empty? }.each { |k,v| p k }
You could use reject to filter out those elements in your hash where the value is an empty array or nil, and then iterate to print their content:
{:lili=>[], :john=>[:a, :b], :andrew=>[], :megh=>[:b, :e], :other=>nil}
.reject { (_2 || []).empty? }
.each { puts "#{_1} has the value(s) #{_2}" }
that prints
john has the value(s) [:a, :b]
megh has the value(s) [:b, :e]
Unsure of your desired output or exact data structure, but you can simply use reject along with empty? to remove any hash value that contains an empty array:
hash = {"lili" => [] , "john" => ["a", "b"], "andrew" => [], "megh" => ["b","e"]}
hash.reject {|k, v| v.empty?}
#=> {"john"=>["a", "b"], "megh"=>["b", "e"]}
It should however be noted that this approach will not work if any hash values are nil. To address that situation, I would recommend either using compact to remove any hash elements with nil values prior to using reject OR by testing for either nil? or empty? (and in that order):
hash = {"lili" => [] , "john" => ["a", "b"], "andrew" => [], "megh" => ["b","e"], "other" => nil}
hash.reject {|k, v| v.empty?}
#=> Evaluation Error: Ruby NoMethodError: undefined method `empty?' for nil:NilClass
hash.compact.reject {|k, v| v.empty?}
#=> {"john"=>["a", "b"], "megh"=>["b", "e"]}
hash.reject {|k, v| v.empty? or v.nil?}
#=> Evaluation Error: Ruby NoMethodError: undefined method `empty?' for nil:NilClass
hash.reject {|k, v| v.nil? or v.empty?}
#=> {"john"=>["a", "b"], "megh"=>["b", "e"]}
hash.reject {|k, v| v.empty? || v.nil?}
#=> Evaluation Error: Ruby NoMethodError: undefined method `empty?' for nil:NilClass
hash.reject {|k, v| v.nil? || v.empty?}
#=> {"john"=>["a", "b"], "megh"=>["b", "e"]}
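Another way to sidestep the ordering issue, sketched here on the assumption that every value is either an array or nil, is to call to_a on the value first. nil.to_a returns an empty array and an array returns itself, so a single empty? check covers both cases.

```ruby
hash = {"lili" => [], "john" => ["a", "b"], "andrew" => [],
        "megh" => ["b", "e"], "other" => nil}

# nil.to_a is [], so nil and [] are both rejected by the same test.
non_empty = hash.reject { |_, v| v.to_a.empty? }
non_empty.each_key { |k| puts k }
# non_empty is {"john"=>["a", "b"], "megh"=>["b", "e"]}
```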

Reduce an array of nested hashes to a flattened array of key/value pairs

Given the following data structure:
[
[:created_at, "07/28/2017"],
[:valid_record, "true"],
[:cs_details, { gender: 'm', race: 'w', language: nil } ],
[:co_details, { description: 'possess', extra: { a: 'a', b: 'b', c: 'c'} } ]
]
I want an array of arrays of key/value pairs:
[
[:created_at, "07/28/2017"],
[:valid_record, "true"],
[:gender, 'm'],
[:race, 'w'],
[:description, "possess"],
[:a, "a"],
[:b, "b"],
[:c, "c"]
]
Problem is I don't know how to flatten those hashes. flatten doesn't do anything:
arr.map(&:flatten)
=> [[:created_at, "07/28/2017"], [:valid_record, "true"], [:cs_details, {:gender=>"m", :race=>"w", :language=>nil}], [:co_details, {:description=>"possess", :extra=>{:a=>"a", :b=>"b", :c=>"c"}}]]
So I know flat_map won't help either. I cannot even turn those hashes to arrays using to_a:
arr.map(&:to_a)
=> [[:created_at, "07/28/2017"], [:valid_record, "true"], [:cs_details, {:gender=>"m", :race=>"w", :language=>nil}], [:co_details, {:description=>"possess", :extra=>{:a=>"a", :b=>"b", :c=>"c"}}]]
The problem with the above methods is that they only work on the top level, and these hashes are nested in arrays. So I tried reduce and then invoked flat_map on the result:
arr.reduce([]) do |acc, (k,v)|
if v.is_a?(Hash)
acc << v.map(&:to_a)
else
acc << [k,v]
end
acc
end.flat_map(&:to_a)
=> [:created_at, "07/28/2017", :valid_record, "true", [:gender, "m"], [:race, "w"], [:language, nil], [:description, "possess"], [:extra, {:a=>"a", :b=>"b", :c=>"c"}]]
Not quite there, but closer. Any suggestions?
▶ flattener = ->(k, v) do
▷ case v
▷ when Enumerable then v.flat_map(&flattener)
▷ when NilClass then []
▷ else [k, v]
▷ end
▷ end
#⇒ #<Proc:0x000000032169e0#(pry):26 (lambda)>
▶ input.flat_map(&flattener).each_slice(2).to_a
#⇒ [
# [:created_at, "07/28/2017"],
# [:valid_record, "true"],
# [:gender, "m"],
# [:race, "w"],
# [:description, "possess"],
# [:a, "a"],
# [:b, "b"],
# [:c, "c"]
# ]
I think you would benefit from writing a helper function that will be called on each item in the array. So that the results are uniform, we will make sure that this function always returns an array of arrays. In other words, an array containing one or more "entries," depending on whether the thing at index 1 is a hash.
def extract_entries((k,v))
if v.is_a? Hash
v.to_a
else
[[k, v]]
end
end
Trying it out:
require 'pp'
pp data.map {|item| extract_entries(item)}
Output:
[[[:created_at, "07/28/2017"]],
[[:valid_record, "true"]],
[[:gender, "m"], [:race, "w"], [:language, nil]],
[[:description, "possess"], [:extra, {:a=>"a", :b=>"b", :c=>"c"}]]]
Now, we can flatten by one level to reach your desired format:
pp data.map {|item| extract_entries(item)}.flatten(1)
Output:
[[:created_at, "07/28/2017"],
[:valid_record, "true"],
[:gender, "m"],
[:race, "w"],
[:language, nil],
[:description, "possess"],
[:extra, {:a=>"a", :b=>"b", :c=>"c"}]]
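That output still contains the nested :extra hash and the nil :language entry, which the question's desired result does not. As a rough sketch, the helper could call itself for nested hashes and the nil entries could be rejected afterwards. The name extract_entries_deep is my own, chosen for illustration.

```ruby
# Hypothetical recursive variant of extract_entries.
def extract_entries_deep((k, v))
  return [[k, v]] unless v.is_a?(Hash)

  # Each hash entry is itself a [key, value] pair, so recurse on it.
  v.flat_map { |pair| extract_entries_deep(pair) }
end

data = [
  [:created_at, "07/28/2017"],
  [:valid_record, "true"],
  [:cs_details, { gender: 'm', race: 'w', language: nil }],
  [:co_details, { description: 'possess', extra: { a: 'a', b: 'b', c: 'c' } }]
]

flat = data.flat_map { |item| extract_entries_deep(item) }
           .reject { |_, v| v.nil? }   # drop entries such as [:language, nil]
```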
def to_flat(arr)
arr.flat_map { |k,v| v.is_a?(Hash) ? to_flat(v.compact) : [[k, v]] }
end
test
arr = [ [:created_at, "07/28/2017"],
[:valid_record, "true"],
[:cs_details, { gender: 'm', race: 'w', language: nil } ],
[:co_details, { description: 'possess', extra: { a: 'a', b: 'b', c: 'c'} } ] ]
to_flat(arr)
#=> [[:created_at, "07/28/2017"], [:valid_record, "true"], [:gender, "m"],
# [:race, "w"], [:description, "possess"], [:a, "a"], [:b, "b"], [:c, "c"]]

How do I create a hash from this array?

I have an array that looks like this:
["value1=3", "value2=4", "value3=5"]
I'd like to end up with a hash like:
H['value1'] = 3
H['value2'] = 4
H['value3'] = 5
There's some parsing involved and I was hoping to get pointed in the right direction.
ary = ["value1=3", "value2=4", "value3=5"]
H = Hash[ary.map {|s| s.split('=') }]
This, however, will leave all the values as strings ('5' instead of the integer 5). If you are sure they are all integers:
H = Hash[ary.map {|s| key, value = s.split('='); [key, value.to_i] }]
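If you are not completely sure the values are integers, a stricter sketch is possible. It assumes Ruby 2.6+ (for Array#to_h with a block) and uses Kernel#Integer, which raises an error on input like "abc" instead of quietly returning 0 the way to_i does. The variable name result is just for illustration.

```ruby
ary = ["value1=3", "value2=4", "value3=5"]

result = ary.to_h do |s|
  key, value = s.split('=', 2)   # split on the first '=' only
  [key, Integer(value)]          # raises ArgumentError for non-numeric strings
end
# result is {"value1"=>3, "value2"=>4, "value3"=>5}
```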
I'd do as @BroiSatse suggests, but here's another way that uses a regex:
ary = ["value1=3", "value2=4", "value3=5"]
ary.join.scan(/([a-z]+\d+)=(\d+)/).map { |k,v| [k,v.to_i] }.to_h
=> {"value1"=>3, "value2"=>4, "value3"=>5}
Here's what's happening:
str = ary.join
#=> "value1=3value2=4value3=5"
a = str.scan(/([a-z]+\d+)=(\d+)/)
#=> [["value1", "3"], ["value2", "4"], ["value3", "5"]]
b = a.map { |k,v| [k,v.to_i] }
#=> [["value1", 3], ["value2", 4], ["value3", 5]]
b.to_h
#=> {"value1"=>3, "value2"=>4, "value3"=>5}
For Ruby versions < 2.1 (which lack Array#to_h), the last line must be replaced with
Hash[b]
#=> {"value1"=>3, "value2"=>4, "value3"=>5}

dup gives different results when hash is one vs. two dimensions

dup is shallow copy, so when doing this:
h = {one: {a:'a', b: 'b'}}
h_copy = h.dup
h_copy[:one][:b] = 'new b'
Now h and h_copy are the same: {:one=>{:a=>"a", :b=>"new b"}}
Yes, that's right.
But when h is a one dimension hash:
h = {a:'a', b: 'b'}
h_copy = h.dup
h_copy[:b] = 'new b'
h still is: {a:'a', b: 'b'}
h_copy is {a:'a', b: 'new b'}
Why?
You can think of your two-dimensional hash as a container which contains another hash container. So you have 2 containers.
When you call dup on h, it returns a copy of your outermost container, but the inner containers are not copied; that is what a shallow copy does. After dup you have 3 containers: h_copy is the new third container, whose :one key just points to h's inner container.
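The container picture can be checked with equal?, which tests object identity. If you actually need the inner hash copied as well, one common option (my suggestion, not something from the question) is a Marshal round trip; it only works for objects that Marshal can serialize.

```ruby
h = { one: { a: 'a', b: 'b' } }

shallow = h.dup
outer_shared = shallow.equal?(h)              # false: a new outer container
inner_shared = shallow[:one].equal?(h[:one])  # true: same inner container

# Deep copy via Marshal, so the inner hash is copied too.
deep = Marshal.load(Marshal.dump(h))
deep[:one][:b] = 'new b'

original_b = h[:one][:b]   # still 'b', the original is untouched
```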
As you said, dup is shallow copy.
It appears you want both h_copy and h to refer to the same object.
Then simply do h_copy = h (i.e. no dup).
h = {a:'a', b: 'b'}
h_copy = h
h_copy[:b] = 'new b'
h #=> {a:'a', b: 'new b'}
So after 1 hour of brainstorming, I have come to this conclusion: in both cases dup copies only the outer hash, so the keys of the copy initially refer to the same objects (same object_id) as in the original. In the multi-dimensional hash, h_copy[:one] is still the same inner hash as h[:one], so a change made through h_copy[:one] shows up in h too. In the single-dimensional hash, assigning h_clone[:b] = "New B" makes that key of the copy refer to a new object with a new object_id, while h is left unchanged.
Look at the following code
h = { :a => "a", :b => "b" } # => {:a=>"a", :b=>"b"}
h_clone = h.dup #=> {:a=>"a", :b=>"b"}
h.object_id #=> 73436330
h_clone.object_id #=> 73295920
h[:a].object_id #=> 73436400
h_clone[:a].object_id #=> 73436400
h[:b].object_id #=> 73436380
h_clone[:b].object_id #=> 73436380
h_clone[:b] = "New B" #=> "New B"
h_clone[:b].object_id #=> 74385280
h.object_id #=> 73436330
h_clone.object_id #=> 73295920
h[:a].object_id #=> 73436400
h_clone[:a].object_id #=> 73436400
Look at the following code for the multidimensional hash
h = { :one => { :a => "a", :b => "b" } } #=> {:one=>{:a=>"a", :b=>"b"}}
h_copy = h.dup #=> {:one=>{:a=>"a", :b=>"b"}}
h_copy.object_id #=> 80410620
h.object_id #=> 80552610
h[:one].object_id #=> 80552620
h_copy[:one].object_id #=> 80552620
h[:one][:a].object_id #=> 80552740
h_copy[:one][:a].object_id #=> 80552740
h[:one][:b].object_id #=> 80552700
h_copy[:one][:b].object_id #=> 80552700
h_copy[:one][:b] = "New B" #=> "New B"
h_copy #=> {:one=>{:a=>"a", :b=>"New B"}}
h #=> {:one=>{:a=>"a", :b=>"New B"}}
h.object_id #=> 80552610
h_copy.object_id #=> 80410620
h[:one].object_id #=> 80552620
h_copy[:one].object_id #=> 80552620
h[:one][:b].object_id #=> 81558770
h_copy[:one][:b].object_id #=> 81558770
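The same sharing can be seen in a one-dimensional hash if a value is modified in place rather than reassigned. This is a small sketch of my own; it uses "b".dup only to make sure the string is not frozen.

```ruby
h = { a: "a".dup, b: "b".dup }
h_copy = h.dup

shared = h_copy[:b].equal?(h[:b])   # true: both keys refer to one string

h_copy[:b] << "!"                   # modifies the shared string in place
after_mutation = h[:b]              # "b!", visible through h as well

h_copy[:b] = "new b"                # reassignment: only the copy's key moves
after_assignment = h[:b]            # still "b!"
```

So the difference between the two cases in the question is mutation versus reassignment, not the number of dimensions.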

Hash invert in Ruby?

I've got a hash of the format:
{key1 => [a, b, c], key2 => [d, e, f]}
and I want to end up with:
{ a => key1, b => key1, c => key1, d => key2 ... }
What's the easiest way of achieving this?
I'm using Ruby on Rails.
UPDATE
OK I managed to extract the real object from the server log, it is being pushed via AJAX.
Parameters: {"status"=>{"1"=>["1", "14"], "2"=>["7", "12", "8", "13"]}}
hash = {:key1 => ["a", "b", "c"], :key2 => ["d", "e", "f"]}
first variant
hash.map{|k, v| v.map{|f| {f => k}}}.flatten
#=> [{"a"=>:key1}, {"b"=>:key1}, {"c"=>:key1}, {"d"=>:key2}, {"e"=>:key2}, {"f"=>:key2}]
or
hash.inject({}){|h, (k,v)| v.map{|f| h[f] = k}; h}
#=> {"a"=>:key1, "b"=>:key1, "c"=>:key1, "d"=>:key2, "e"=>:key2, "f"=>:key2}
UPD
ok, your hash is:
hash = {"status"=>{"1"=>["1", "14"], "2"=>["7", "12", "8", "13"]}}
hash["status"].inject({}){|h, (k,v)| v.map{|f| h[f] = k}; h}
#=> {"12"=>"2", "7"=>"2", "13"=>"2", "8"=>"2", "14"=>"1", "1"=>"1"}
Lots of other good answers. Just wanted to toss this one in too for Ruby 2.0 and 1.9.3:
hash = {apple: [1, 14], orange: [7, 12, 8, 13]}
Hash[hash.flat_map{ |k, v| v.map{ |i| [i, k] } }]
# => {1=>:apple, 14=>:apple, 7=>:orange, 12=>:orange, 8=>:orange, 13=>:orange}
This is leveraging: Hash::[] and Enumerable#flat_map
Also in these new versions there is Enumerable::each_with_object which is very similar to Enumerable::inject/Enumerable::reduce:
hash.each_with_object(Hash.new){ |(k, v), inverse|
v.each{ |e| inverse[e] = k }
}
Performing a quick benchmark (Ruby 2.0.0p0; 2012 Macbook Air) using an original hash with 100 keys, each with 100 distinct values:
Hash::[] w/ Enumerable#flat_map
155.7 (±9.0%) i/s - 780 in 5.066286s
Enumerable#each_with_object w/ Enumerable#each
199.7 (±21.0%) i/s - 940 in 5.068926s
Shows that the each_with_object variant is faster for that data set.
Ok, let's guess. You say you have an array but I agree with Benoit that what you probably have is a hash. A functional approach:
h = {:key1 => ["a", "b", "c"], :key2 => ["d", "e", "f"]}
h.map { |k, vs| Hash[vs.map { |v| [v, k] }] }.inject(:merge)
#=> {"a"=>:key1, "b"=>:key1, "c"=>:key1, "d"=>:key2, "e"=>:key2, "f"=>:key2}
Also:
h.map { |k, vs| Hash[vs.product([k])] }.inject(:merge)
#=> {"a"=>:key1, "b"=>:key1, "c"=>:key1, "d"=>:key2, "e"=>:key2, "f"=>:key2}
In the case where a value corresponds to more than one key, like "c" in this example...
{ :key1 => ["a", "b", "c"], :key2 => ["c", "d", "e"]}
...some of the other answers will not give the expected result. We will need the reversed hash to store the keys in arrays, like so:
{ "a" => [:key1], "b" => [:key1], "c" => [:key1, :key2], "d" => [:key2], "e" => [:key2] }
This should do the trick:
reverse = {}
hash.each{ |k,vs|
vs.each{ |v|
reverse[v] ||= []
reverse[v] << k
}
}
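The same idea can be written a little more compactly, as a sketch, with a hash whose default block creates the empty array on first access. The variable names here are only for illustration.

```ruby
hash = { :key1 => ["a", "b", "c"], :key2 => ["c", "d", "e"] }

# The default block stores a fresh array the first time a key is looked up.
reverse = hash.each_with_object(Hash.new { |h, k| h[k] = [] }) do |(key, values), acc|
  values.each { |v| acc[v] << key }
end
# reverse is {"a"=>[:key1], "b"=>[:key1], "c"=>[:key1, :key2], "d"=>[:key2], "e"=>[:key2]}
```

Be aware that the default block stays attached, so looking up a missing key later adds an empty array for it. Setting reverse.default_proc = nil afterwards avoids that.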
This was my use case, and I would have defined my problem much the same way as the OP (in fact, a search for a similar phrase got me here), so I suspect this answer may help other searchers.
If you're looking to reverse a hash formatted like this, the following may help you:
a = {:key1 => ["a", "b", "c"], :key2 => ["d", "e", "f"]}
a.inject({}) do |memo, (key, values)|
values.each {|value| memo[value] = key }
memo
end
this returns:
{"a"=>:key1, "b"=>:key1, "c"=>:key1, "d"=>:key2, "e"=>:key2, "f"=>:key2}
new_hash={}
hash = {"key1" => ['a', 'b', 'c'], "key2" => ['d','e','f']}
hash.each_pair{|key, val|val.each{|v| new_hash[v] = key }}
This gives
new_hash # {"a"=>"key1", "b"=>"key1", "c"=>"key1", "d"=>"key2", "e"=>"key2", "f"=>"key2"}
If you want to correctly deal with duplicate values, then you should use the Hash#inverse
from Facets of Ruby
Hash#inverse preserves duplicate values,
e.g. it ensures that hash.inverse.inverse == hash
either:
use Hash#inverse from here: http://www.unixgods.org/Ruby/invert_hash.html
use Hash#inverse from FacetsOfRuby library 'facets'
usage like this:
require 'facets'
h = {:key1 => [:a, :b, :c], :key2 => [:d, :e, :f]}
=> {:key1=>[:a, :b, :c], :key2=>[:d, :e, :f]}
h.inverse
=> {:a=>:key1, :b=>:key1, :c=>:key1, :d=>:key2, :e=>:key2, :f=>:key2}
The code looks like this:
# this doesn't look quite as elegant as the other solutions here,
# but if you call inverse twice, it will preserve the elements of the original hash
# true inversion of Ruby Hash / preserves all elements in original hash
# e.g. hash.inverse.inverse ~ h
class Hash
def inverse
i = Hash.new
self.each_pair{ |k,v|
if (v.class == Array)
v.each{ |x|
i[x] = i.has_key?(x) ? [k,i[x]].flatten : k
}
else
i[v] = i.has_key?(v) ? [k,i[v]].flatten : k
end
}
return i
end
end
h = {:key1 => [:a, :b, :c], :key2 => [:d, :e, :f]}
=> {:key1=>[:a, :b, :c], :key2=>[:d, :e, :f]}
h.inverse
=> {:a=>:key1, :b=>:key1, :c=>:key1, :d=>:key2, :e=>:key2, :f=>:key2}
One way to achieve what you're looking for:
arr = [{["k1"] => ["a", "b", "c"]}, {["k2"] => ["d", "e", "f"]}]
results_arr = []
arr.each do |hsh|
hsh.values.flatten.each do |val|
results_arr << { [val] => hsh.keys.first }
end
end
Result: [{["a"]=>["k1"]}, {["b"]=>["k1"]}, {["c"]=>["k1"]}, {["d"]=>["k2"]}, {["e"]=>["k2"]}, {["f"]=>["k2"]}]
