Merging hash values in an array of hashes based on key - ruby

I have an array of hashes similar to this:
[
{"student": "a","scores": [{"subject": "math","quantity": 10},{"subject": "english", "quantity": 5}]},
{"student": "b", "scores": [{"subject": "math","quantity": 1 }, {"subject": "english","quantity": 2 } ]},
{"student": "a", "scores": [ { "subject": "math", "quantity": 2},{"subject": "science", "quantity": 5 } ] }
]
Is there a simpler way of getting the output similar to this except looping through the array and finding a duplicate and then combining them?
[
{"student": "a","scores": [{"subject": "math","quantity": 12},{"subject": "english", "quantity": 5},{"subject": "science", "quantity": 5 } ]},
{"student": "b", "scores": [{"subject": "math","quantity": 1 }, {"subject": "english","quantity": 2 } ]}
]
Rules for merging duplicate objects:
Students are merged on matching "value" (e.g. student "a", student "b")
Students scores on identical subjects are added (e.g. student a's math scores 2 and 10 become 12 when merged)

Is there a simpler way of getting the output similar to this except looping through the array and finding a duplicate and then combining them?
Not that I know of. If you explain where this data is coming from, the answer may be different, but based only on the Array of Hash objects I think you will have to iterate and combine.
While it is not elegant you could use a solution like this
arr = [
{"student"=> "a","scores"=> [{"subject"=> "math","quantity"=> 10},{"subject"=> "english", "quantity"=> 5}]},
{"student"=> "b", "scores"=> [{"subject"=> "math","quantity"=> 1 }, {"subject"=> "english","quantity"=> 2 } ]},
{"student"=> "a", "scores"=> [ { "subject"=> "math", "quantity"=> 2},{"subject"=> "science", "quantity"=> 5 } ] }
]
#Group the array by student
arr.group_by{|student| student["student"]}.map do |student_name,student_values|
{"student" => student_name,
#combine all the scores and group by subject
"scores" => student_values.map{|student| student["scores"]}.flatten.group_by{|score| score["subject"]}.map do |subject,subject_values|
{"subject" => subject,
#combine all the quantities into an array and reduce using `+`
"quantity" => subject_values.map{|h| h["quantity"]}.reduce(:+)
}
end
}
end
#=> [
{"student"=>"a", "scores"=>[
{"subject"=>"math", "quantity"=>12},
{"subject"=>"english", "quantity"=>5},
{"subject"=>"science", "quantity"=>5}]},
{"student"=>"b", "scores"=>[
{"subject"=>"math", "quantity"=>1},
{"subject"=>"english", "quantity"=>2}]}
]
I know that you specified your expected result but I wanted to point out that making the output simpler makes the code simpler.
arr.map(&:dup).group_by{|a| a.delete("student")}.each_with_object({}) do |(student, scores),record|
record[student] = scores.map(&:values).flatten.map(&:values).each_with_object(Hash.new(0)) do |(subject,score),obj|
obj[subject] += score
obj
end
record
end
#=>{"a"=>{"math"=>12, "english"=>5, "science"=>5}, "b"=>{"math"=>1, "english"=>2}}
With this structure getting the students is as easy as calling .keys and the scores would be equally as simple. I am thinking something like
above_result.each do |student,scores|
  puts student
  scores.each do |subject,score|
    puts " #{subject.capitalize}: #{score}"
  end
end
The console output would be
a
 Math: 12
 English: 5
 Science: 5
b
 Math: 1
 English: 2
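For comparison, here is a minimal single-pass sketch that builds the same simplified hash without group_by. It assumes the string-keyed arr from earlier in this answer; the name totals is just illustrative.

```ruby
# Assumed input: the same string-keyed array used earlier in this answer.
arr = [
  { "student" => "a", "scores" => [{ "subject" => "math", "quantity" => 10 },
                                   { "subject" => "english", "quantity" => 5 }] },
  { "student" => "b", "scores" => [{ "subject" => "math", "quantity" => 1 },
                                   { "subject" => "english", "quantity" => 2 }] },
  { "student" => "a", "scores" => [{ "subject" => "math", "quantity" => 2 },
                                   { "subject" => "science", "quantity" => 5 }] }
]

# The outer hash creates a zero-defaulted inner hash the first time a student is seen.
totals = Hash.new { |hash, student| hash[student] = Hash.new(0) }

arr.each do |record|
  record["scores"].each do |score|
    # Add this quantity to the running total for the student's subject.
    totals[record["student"]][score["subject"]] += score["quantity"]
  end
end

totals
#=> {"a"=>{"math"=>12, "english"=>5, "science"=>5}, "b"=>{"math"=>1, "english"=>2}}
```

Whether this reads better than the group_by chain is a matter of taste; it simply avoids the intermediate arrays.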

There are two common ways of aggregating values in such instances. The first is to employ the method Enumerable#group_by, as @engineersmnky has done in his answer. The second is to build a hash using the form of the method Hash#update (a.k.a. merge!) that uses a block to resolve the values of keys which are present in both of the hashes being merged. My solution uses the latter approach, not because I prefer it to group_by, but just to show you a different way it can be done. (Had engineersmnky used update, I would have gone with group_by.)
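As a small illustration of the block form of Hash#update on its own (the hashes totals and incoming are made up for this sketch):

```ruby
totals   = { "math" => 10, "english" => 5 }
incoming = { "math" => 2, "science" => 5 }

# The block runs only for keys present in both hashes ("math" here);
# keys found in just one of the hashes are kept or copied unchanged.
totals.update(incoming) { |_subject, old_value, new_value| old_value + new_value }

totals #=> {"math"=>12, "english"=>5, "science"=>5}
```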
Your problem is complicated somewhat by the particular data structure you are using. I found that the solution could be simplified and made easier to follow by first converting the data to a different structure, updating the scores, then converting the result back to your data structure. You may want to consider changing the data structure (if that's an option for you). I've addressed that issue in the "Discussion" section.
Code
def combine_scores(arr)
reconstruct(update_scores(simplify(arr)))
end
def simplify(arr)
arr.map do |h|
hash = Hash[h[:scores].map { |g| g.values }]
hash.default = 0
{ h[:student]=> hash }
end
end
def update_scores(arr)
arr.each_with_object({}) do |g,h|
h.update(g) do |_, h_scores, g_scores|
g_scores.each { |subject,score| h_scores[subject] += score }
h_scores
end
end
end
def reconstruct(h)
h.map { |k,v| { student: k, scores: v.map { |subject, score|
{ subject: subject, quantity: score } } } }
end
Example
arr = [
{ student: "a", scores: [{ subject: "math", quantity: 10 },
{ subject: "english", quantity: 5 }] },
{ student: "b", scores: [{ subject: "math", quantity: 1 },
{ subject: "english", quantity: 2 } ] },
{ student: "a", scores: [{ subject: "math", quantity: 2 },
{ subject: "science", quantity: 5 } ] }]
combine_scores(arr)
#=> [{ :student=>"a",
# :scores=>[{ :subject=>"math", :quantity=>12 },
# { :subject=>"english", :quantity=> 5 },
# { :subject=>"science", :quantity=> 5 }] },
# { :student=>"b",
# :scores=>[{ :subject=>"math", :quantity=> 1 },
# { :subject=>"english", :quantity=> 2 }] }]
Explanation
First consider the two intermediate calculations:
a = simplify(arr)
#=> [{ "a"=>{ "math"=>10, "english"=>5 } },
# { "b"=>{ "math"=> 1, "english"=>2 } },
# { "a"=>{ "math"=> 2, "science"=>5 } }]
h = update_scores(a)
#=> {"a"=>{"math"=>12, "english"=>5, "science"=>5},
# "b"=>{"math"=> 1, "english"=>2}}
Then
reconstruct(h)
returns the result shown above.
+ simplify
arr.map do |h|
hash = Hash[h[:scores].map { |g| g.values }]
hash.default = 0
{ h[:student]=> hash }
end
This maps each hash into a simpler one. For example, the first element of arr:
h = { student: "a", scores: [{ subject: "math", quantity: 10 },
{ subject: "english", quantity: 5 }] }
is mapped to:
{ "a"=>Hash[[{ subject: "math", quantity: 10 },
{ subject: "english", quantity: 5 }].map { |g| g.values }] }
#=> { "a"=>Hash[[["math", 10], ["english", 5]]] }
#=> { "a"=>{"math"=>10, "english"=>5}}
Setting the default value of each hash to zero simplifies the update step, which follows.
+ update_scores
For the array of hashes a that is returned by simplify, we compute:
a.each_with_object({}) do |g,h|
h.update(g) do |_, h_scores, g_scores|
g_scores.each { |subject,score| h_scores[subject] += score }
h_scores
end
end
Each element of a (a hash) is merged into an initially-empty hash, h. As update (same as merge!) is used for the merge, h is modified. If both hashes share the same key (a student, e.g., "a"), the block adds each of g's subject scores to the scores already held in h; otherwise the student's student=>scores pair is simply added to h.
Notice that if h_scores does not have the key subject, then:
h_scores[subject] += score
#=> h_scores[subject] = h_scores[subject] + score
#=> h_scores[subject] = 0 + score (because the default value is zero)
#=> h_scores[subject] = score
That is, the key-value pair from g_scores is merely added to h_scores.
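The effect of the zero default can be seen in isolation with a short sketch (the hash scores and the local variable names are made up for illustration):

```ruby
scores = { "math" => 10 }
scores.default = 0

# Reading a missing key returns the default but does not store the key.
missing_value = scores["science"]        # 0
stored_before = scores.key?("science")   # false

# += reads the default (0), adds the score, and then stores the key.
scores["science"] += 5

scores #=> {"math"=>10, "science"=>5}
```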
I've replaced the block variable representing the subject with a placeholder _, to reduce the chance of errors and to inform the reader that it is not used in the block.
+ reconstruct
The final step is to convert the hash returned by update_scores back to the original data structure, which is straightforward.
Discussion
If you are able to change the data structure, and it meets your requirements, you may wish to consider changing it to that produced by update_scores:
h = { "a"=>{ math: 10, english: 5 }, "b"=>{ math: 1, english: 2 } }
Then to update the scores with:
g = { "a"=>{ math: 2, science: 5 }, "b"=>{ english: 3 }, "c"=>{ science: 4 } }
you would merely do the following:
h.merge(g) { |_,oh,nh| oh.merge(nh) { |_,ohv,nhv| ohv+nhv } }
#=> { "a"=>{ :math=>12, :english=>5, :science=>5 },
# "b"=>{ :math=> 1, :english=>5 },
# "c"=>{ :science=>4 } }

Related

Transform array of nested Hashes into flat array of non-nested hashes

I want to transform the given array into result array:
given = [{
"foo_v1_4" => [{
"derivate_version" => 0,
"layers" => {
"tlayer" => {
"baz" => {
"three" => 0.65
},
"bazbar" => {
"three" => 0.65
}
}
}
}]
}]
# the value of key :one is first hash key (foo_v1_4) plus underscore (_) plus derivate_version (0)
result = [{
one: 'foo_v1_4_0',
tlayer: 'baz',
three: '0.6'
},
{
one: 'foo_v1_4_0',
tlayer: 'bazbar',
three: '0.6'
}
]
What I tried:
given.each do |el |
el.each do |derivat |
derivat.each do |d |
d.each do |layer |
layer.each do |l |
derivat = "#{d}_#{l['derivate_version']}"
puts derivat
end
end
end
end
end
I'm struggling at iterating through "layers" hash, the amount of elements in layers is equal to the amount of elements in result array.
It helps to format the objects so we can better see their structures:
given = [
{
"foo_v1_4" => [
{ "derivate_version" => 0,
"layers" => {
"tlayer" => {
"baz" => { "three" => 0.65 },
"bazbar" => { "three" => 0.65 }
}
}
}
]
}
]
result = [
{
one: 'foo_v1_4_0',
tlayer: 'baz',
three: '0.6'
},
{
one: 'foo_v1_4_0',
tlayer: 'bazbar',
three: '0.6'
}
]
We can begin by writing the structure of result:
result = [
{
one:
tlayer:
three:
},
{
one:
tlayer:
three:
}
]
We see that
given = [ { "foo_v1_4" => <array> } ]
The value of the key :one in each hash of result is therefore taken from the first key of the first element of given:
one_val = given[0].keys[0]
#=> "foo_v1_4"
result = [
{
one: one_val,
tlayer:
three:
},
{
one: one_val,
tlayer:
three:
}
]
All the remaining objects of interest are contained in the hash
h = given[0]["foo_v1_4"][0]["layers"]["tlayer"]
#=> {
# "baz"=>{ "three"=>0.65 },
# "bazbar"=>{ "three"=>0.65 }
# }
so it is convenient to define it. We see that:
h.keys[0]
#=> "baz"
h.keys[1]
#=> "bazbar"
h["bazbar"]["three"]
#=> 0.65
Note that it generally is not good practice to assume that hash keys are ordered in a particular way.
We may now complete the construction of result,
v = h["bazbar"]["three"].truncate(1)
#=> 0.6
result = [
{
one: one_val,
tlayer: h.keys[0],
three: v
},
{ one: one_val,
tlayer: h.keys[1],
three: v
}
]
#=> [
# { :one=>"foo_v1_4", :tlayer=>"baz", :three=>0.6 },
# { :one=>"foo_v1_4", :tlayer=>"bazbar", :three=>0.6 }
# ]
The creation of the temporary objects one_val, h, and v improves time- and space-efficiency, makes the calculations easier to test and improves the readability of the code.
Try the below:
result = []
given.each do |level1|
level1.each do |key, derivate_versions|
derivate_versions.each do |layers|
# iterate over the elements under tlayer
layers.dig('layers', 'tlayer').each do |tlayer_key, tlayer_value|
sub_result = {}
# key - foo_v1_4, layers['derivate_version'] - 0 => 'foo_v1_4_0'
sub_result[:one] = key + '_' + layers['derivate_version'].to_s
# tlayer_key - baz, bazbar
sub_result[:tlayer] = tlayer_key
# tlayer_value - { "three" => 0.65 }
sub_result[:three] = tlayer_value['three']
result << sub_result
end
end
end
end
The value of result will be:
2.6.3 :084 > p result
[{:one=>"foo_v1_4_0", :tlayer=>"baz", :three=>0.65}, {:one=>"foo_v1_4_0", :tlayer=>"bazbar", :three=>0.65}]
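A more compact variant of the same traversal can be sketched with flat_map. This version assumes, based on the expected result in the question, that :three should be the value truncated to one decimal place and rendered as a string; drop the truncate(1).to_s call if the raw float is wanted.

```ruby
given = [{
  "foo_v1_4" => [{
    "derivate_version" => 0,
    "layers" => {
      "tlayer" => {
        "baz"    => { "three" => 0.65 },
        "bazbar" => { "three" => 0.65 }
      }
    }
  }]
}]

result = given.flat_map do |group|
  group.flat_map do |name, derivates|
    derivates.flat_map do |derivate|
      # One output hash per entry under "tlayer".
      derivate.dig("layers", "tlayer").map do |tlayer, values|
        {
          one:    "#{name}_#{derivate['derivate_version']}",
          tlayer: tlayer,
          three:  values["three"].truncate(1).to_s  # assumption: '0.6' as in the question
        }
      end
    end
  end
end

result
#=> [{:one=>"foo_v1_4_0", :tlayer=>"baz", :three=>"0.6"},
#    {:one=>"foo_v1_4_0", :tlayer=>"bazbar", :three=>"0.6"}]
```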

Is there any way to check if hashes in an array contains similar key value pairs in ruby?

For example, I have
array = [ {name: 'robert', nationality: 'asian', age: 10},
{name: 'robert', nationality: 'asian', age: 5},
{name: 'sira', nationality: 'african', age: 15} ]
I want to get the result as
array = [ {name: 'robert', nationality: 'asian', age: 15},
{name: 'sira', nationality: 'african', age: 15} ]
since there are 2 Robert's with the same nationality.
Any help would be much appreciated.
I have tried Array.uniq! {|e| e[:name] && e[:nationality] } but I want to add both numbers in the two hashes which is 10 + 5
P.S: Array can have n number of hashes.
I would start with something like this:
array = [
{ name: 'robert', nationality: 'asian', age: 10 },
{ name: 'robert', nationality: 'asian', age: 5 },
{ name: 'sira', nationality: 'african', age: 15 }
]
array.group_by { |e| e.values_at(:name, :nationality) }
.map { |_, vs| vs.first.merge(age: vs.sum { |v| v[:age] }) }
#=> [
# {
# :name => "robert",
# :nationality => "asian",
# :age => 15
# }, {
# :name => "sira",
# :nationality => "african",
# :age => 15
# }
# ]
Let's take a look at what you want to accomplish and go from there. You have a list of some objects, and you want to merge certain objects together if they have the same nationality and name. So we have a key by which we will merge. Let's put that in programming terms.
key = proc { |x| [x[:name], x[:nationality]] }
We've defined a procedure which takes a hash and returns its "key" value. If this procedure returns the same value (according to eql?) for two hashes, then those two hashes need to be merged together. Now, what do we mean by "merge"? You want to add the ages together, so let's write a merge function.
merge = proc { |x, y| x.dup.tap { |x1| x1[:age] += y[:age] } }
If we have two values x and y such that key[x] and key[y] are the same, we want to merge them by making a copy of x and adding y's age to it. That's exactly what this procedure does. Now that we have our building blocks, we can write the algorithm.
We want to produce an array at the end, after merging using the key procedure we've written. Fortunately, Ruby has a handy function called each_with_object which will do something very nice for us. The method each_with_object will execute its block for each element of the array, passing in a predetermined value as the other argument. This will come in handy here.
result = array.each_with_object({}) do |x, hsh|
# ...
end.values
Since we're using keys and values to do the merge, the most efficient way to do this is going to be with a hash. Hence, we pass in an empty hash as the extra object, which we'll modify to accumulate the merge results. At the end, we don't care about the keys anymore, so we write .values to get just the objects themselves. Now for the final pieces.
if hsh.include? key[x]
hsh[ key[x] ] = merge.call hsh[ key[x] ], x
else
hsh[ key[x] ] = x
end
Let's break this down. If the hash already includes key[x], which is the key for the object x that we're looking at, then we want to merge x with the value that is currently at key[x]. This is where we add the ages together. This approach only works if the merge function is what mathematicians call a semigroup, which is a fancy way of saying that the operation is associative. You don't need to worry too much about that; addition is a very good example of a semigroup, so it works here.
Anyway, if the key doesn't exist in the hash, we want to put the current value in the hash at the key position. The resulting hash from merging is returned, and then we can get the values out of it to get the result you wanted.
key = proc { |x| [x[:name], x[:nationality]] }
merge = proc { |x, y| x.dup.tap { |x1| x1[:age] += y[:age] } }
result = array.each_with_object({}) do |x, hsh|
if hsh.include? key[x]
hsh[ key[x] ] = merge.call hsh[ key[x] ], x
else
hsh[ key[x] ] = x
end
end.values
Now, my complexity theory is a bit rusty, but if Ruby implements its hash type efficiently (which I'm fairly certain it does), then this merge algorithm is O(n), which means it will take a linear amount of time to finish, given the problem size as input.
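The associativity property mentioned above can be checked with a quick sketch. The three sample hashes a, b and c are made up for illustration; merge is the same proc as above.

```ruby
merge = proc { |x, y| x.dup.tap { |x1| x1[:age] += y[:age] } }

a = { name: 'robert', nationality: 'asian', age: 10 }
b = { name: 'robert', nationality: 'asian', age: 5 }
c = { name: 'robert', nationality: 'asian', age: 7 }

# Grouping the merges differently gives the same result.
left  = merge.call(merge.call(a, b), c)
right = merge.call(a, merge.call(b, c))

left == right #=> true
left[:age]    #=> 22
a[:age]       #=> 10 (the inputs are left untouched because merge works on a copy)
```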
array.each_with_object(Hash.new(0)) { |g,h| h[[g[:name], g[:nationality]]] += g[:age] }.
map { |(name, nationality),age| { name:name, nationality:nationality, age:age } }
#=> [{ :name=>"robert", :nationality=>"asian", :age=>15 },
#    { :name=>"sira", :nationality=>"african", :age=>15 }]
The two steps are as follows.
a = array.each_with_object(Hash.new(0)) { |g,h| h[[g[:name], g[:nationality]]] += g[:age] }
#=> { ["robert", "asian"]=>15, ["sira", "african"]=>15 }
This uses the class method Hash::new to create a hash with a default value of zero (represented by the block variable h). Once this hash has been obtained it is a simple matter to construct the desired array:
a.map { |(name, nationality),age| { name:name, nationality:nationality, age:age } }

Split values into separate Arrays based on keys in a Hash?

I have something like this
[
{
"key": "55ffee8b6a617960010e0000",
"doc_count": 1
},
{
"key": "55fff0376a61794e190f0000",
"doc_count": 1
},
{
"key": "55fff0dd6a61794e191f0000",
"doc_count": 1
}
]
I want to separate :key values and :doc_count values into separate arrays like
["55ffee8b6a617960010e0000", "55fff0376a61794e190f0000", "55fff0dd6a61794e191f0000"]
and like [1,1,1]. How to achieve this?
You can use transpose here:
keys, doc_counts = array_of_hashes.map(&:values).transpose
As D-side points out this relies on the ordering of the keys being the same for each hash. If you cannot ensure this (for instance your data is being created via an API) you would have to perform the additional step of sorting the hash's keys. Sorting places doc_count before key, so the order of the variables on the left-hand side is swapped. That would look something like:
doc_counts, keys = array_of_hashes.map{|h| Hash[h.sort].values }.transpose
In either case you'll end up with something like:
keys # => ["55ffee8b6a617960010e0000", "55fff0376a61794e190f0000", "55fff0dd6a61794e191f0000"]
doc_counts # => [1, 1, 1]
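Another way to avoid depending on key order, sketched here under the assumption that the hashes use the symbol keys :key and :doc_count (which is what the literal in the question produces in Ruby), is to ask for the values by name with Hash#values_at:

```ruby
array_of_hashes = [
  { key: "55ffee8b6a617960010e0000", doc_count: 1 },
  { doc_count: 1, key: "55fff0376a61794e190f0000" },  # keys deliberately in a different order
  { key: "55fff0dd6a61794e191f0000", doc_count: 1 }
]

# values_at returns the values in the order the keys are requested,
# regardless of the order in which each hash stores them.
keys, doc_counts = array_of_hashes.map { |h| h.values_at(:key, :doc_count) }.transpose

keys       #=> ["55ffee8b6a617960010e0000", "55fff0376a61794e190f0000", "55fff0dd6a61794e191f0000"]
doc_counts #=> [1, 1, 1]
```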
You can use some of these
a = [
{
"key" => "55ffee8b6a617960010e0000",
"doc_count" => 1
},
{
"key" => "55fff0376a61794e190f0000",
"doc_count" => 1
},
{
"key" => "55fff0dd6a61794e191f0000",
"doc_count" => 1
}
]
1.
hash = Hash[a.map { |h| [h["key"], h["doc_count"]] }]
hash.keys
hash.values
2.
exp = Hash.new { |k, v| k[v] = [] }
a.map { |h| h.each { |k, v| exp[k] << v } }
3.
hash = a.each_with_object({}) { |arr_h, h| h[arr_h["key"]] = arr_h["doc_count"] }
hash.keys
hash.values
You could iterate and assign it to new arrays doc_counts and keys.
array = [{"key"=>"55ffee8b6a617960010e0000", "doc_count"=>1}, {"key"=>"55fff0376a61794e190f0000", "doc_count"=>1}, {"key"=>"55fff0dd6a61794e191f0000", "doc_count"=>1}]
doc_counts, keys = [],[]
array.each do |a|
doc_counts << a["doc_count"]
keys << a["key"]
end
Result
>> doc_counts
=> [1, 1, 1]
>> keys
=> ["55ffee8b6a617960010e0000", "55fff0376a61794e190f0000", "55fff0dd6a61794e191f0000"]
Or
doc_counts = []
keys = array.map do |a|
doc_counts << a["doc_count"]
a["key"]
end

Simulate join between Hashes

I have two Arrays of Hashes which simulate two tables in a database, with one key in the first hash referencing a separately-named key in the second hash, example below:
cars = [ { id: 1, color: 'red', owner_id: 1 }, { id: 2, color: 'black', owner_id: 1 } ]
owners = [ { id: 1, name: 'Alice' }, { id: 2, name: 'Bob' } ]
I'd like to try to accomplish a "join" on these two hashes, resulting in a new Array of Hashes, so that the keys and values from owners will be merged into any of the cars hashes where the cars' :owner_id matches an owner's :id. So in the above example, the result would look like this:
[ { id: 1, color: 'red', owner_id: 1, name: 'Alice' }, { id: 2, color: 'black', owner_id: 1, name: 'Alice' } ]
Anyone have any thoughts on how I could achieve this? Thank you!
[EDIT] Updated to clarify that I would like the results to be placed in a new Array of Hashes, rather than mutating either of the original Arrays.
def join(referers, referees, on_referer, on_referee)
referers.map do |record|
referees.find do |referee_record|
record[on_referer] == referee_record[on_referee]
end.merge(record)
end
end
cars = [ { id: 1, color: 'red', owner_id: 1 }, { id: 2, color: 'black', owner_id: 1 } ]
owners = [ { id: 1, name: 'Alice' }, { id: 2, name: 'Bob' } ]
join(cars, owners, :owner_id, :id)
# => [{:id=>1, :name=>"Alice", :color=>"red", :owner_id=>1},
# {:id=>2, :name=>"Alice", :color=>"black", :owner_id=>1}]
Edit: I just noticed that it is the key :owner_id in cars that is to be matched with the :id in owners. I assumed the key :id in cars was to be matched. I will leave my answer as is, considering that the modification is trivial and that it may be easier to follow if the match is to be on the same key names.
Assuming that:
you want to modify (mutate) cars; and
for each element h of owners there is an element g of cars for which h[:id] == g[:id],
it's just
owners.each { |h| cars.find { |g| g[:id] == h[:id] }.update(h) }
cars #=> [{:id=>1, :color=>"red", :owner_id=>1, :name=>"Alice"},
# {:id=>2, :color=>"black", :owner_id=>1, :name=>"Bob"}]
On the other hand, if:
you do not wish to mutate cars or
for a given element h of owners there may be no element g of cars for which h[:id]==g[:id] or
you just want to improve efficiency,
you could first create a hash for cars or owners whose keys are values of :id.
Suppose:
owners = [ { id: 3, name: 'Alice' }, { id: 2, name: 'Bob' } ]
We could create a hash for owners:
owners_by_id = owners.each_with_object({}) { |g,h| h.update(g[:id]=>g) }
#=> {3=>{:id=>3, :name=>"Alice"}, 2=>{:id=>2, :name=>"Bob"}}
and then write:
cars.map do |h|
g = {}.merge(h)
id = g[:id]
g.update(owners_by_id[id]) if owners_by_id.key?(id)
g
end
#=> [{:id=>1, :color=>"red", :owner_id=>1},
# {:id=>2, :color=>"black", :owner_id=>1, :name=>"Bob"}]
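The same lookup-hash idea can be adapted to match on :owner_id, as the question asks. This is only a sketch: the third car (with no matching owner) and the name joined are added for illustration, and the choice to keep the car's own :id on the shared key is an assumption.

```ruby
cars = [
  { id: 1, color: 'red',   owner_id: 1 },
  { id: 2, color: 'black', owner_id: 1 },
  { id: 3, color: 'blue',  owner_id: 9 }   # hypothetical car whose owner is missing
]
owners = [{ id: 1, name: 'Alice' }, { id: 2, name: 'Bob' }]

# Index the owners by :id once so each lookup is a hash access.
owners_by_id = owners.each_with_object({}) { |owner, index| index[owner[:id]] = owner }

joined = cars.map do |car|
  owner = owners_by_id.fetch(car[:owner_id], {})
  # merge returns a new hash; the car comes last so its own :id wins.
  owner.merge(car)
end

joined
#=> [{:id=>1, :name=>"Alice", :color=>"red", :owner_id=>1},
#    {:id=>2, :name=>"Alice", :color=>"black", :owner_id=>1},
#    {:id=>3, :color=>"blue", :owner_id=>9}]
```

Neither cars nor owners is mutated, since Hash#merge builds a new hash for every element.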
Assuming that the hashes at the same position in the arrays correspond:
[cars, owners].transpose.map{|h1, h2| h1.merge(h2)}
Otherwise, your example is bad.

How to limit an array of similar hashes to those that have more than one of the same key:value pair (details inside)

I have an array like this
arr = [ { name: "Josh", grade: 90 }, {name: "Josh", grade: 70 },
{ name: "Kevin", grade: 100 }, { name: "Kevin", grade: 95 },
{ name: "Ben", grade: 90 }, { name: "Rod", grade: 90 },
{ name: "Rod", grade: 70 }, { name: "Jack", grade: 60 } ]
I would like Ben and Jack to be removed since they only have one record in this array. What would be the most elegant way to get this done? I could manually go through it and check, but is there a better way? Like the opposite of
arr.uniq! { |person| person[:name] }
arr.reject! { |x| arr.count { |y| y[:name] == x[:name] } == 1 }
An O(n) solution:
count_hash = {}
arr.each { |x| count_hash[x[:name]] ||= 0; count_hash[x[:name]] += 1 }
arr.reject! { |x| count_hash[x[:name]] == 1 }
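On Ruby 2.7 or later, the counting step can also be sketched with Enumerable#tally. This variant uses select and returns a new array (named repeated here for illustration) instead of mutating arr.

```ruby
arr = [ { name: "Josh", grade: 90 }, { name: "Josh", grade: 70 },
        { name: "Kevin", grade: 100 }, { name: "Kevin", grade: 95 },
        { name: "Ben", grade: 90 }, { name: "Rod", grade: 90 },
        { name: "Rod", grade: 70 }, { name: "Jack", grade: 60 } ]

# Count the occurrences of each name in one pass (requires Ruby 2.7+).
counts = arr.map { |h| h[:name] }.tally

# Keep only the records whose name appears more than once.
repeated = arr.select { |h| counts[h[:name]] > 1 }

repeated.map { |h| h[:name] }
#=> ["Josh", "Josh", "Kevin", "Kevin", "Rod", "Rod"]
```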
Here are three more ways that might be of some interest, though I prefer Robert's solution.
Each of the following returns:
#=> [{:name=>"Josh" , :grade=> 90}, {:name=>"Josh" , :grade=>70},
# {:name=>"Kevin", :grade=>100}, {:name=>"Kevin", :grade=>95},
# {:name=>"Rod" , :grade=> 90}, {:name=>"Rod" , :grade=>70}]
#1
Use the well-worn but dependable Enumerable#group_by to aggregate by name, Hash#values to extract the values then reject those that appear but once:
arr.group_by { |h| h[:name] }.values.reject { |a| a.size == 1 }.flatten
#2
Use the class method Hash::new with a default of zero to identify names with multiple entries, then select for those:
multiples = arr.each_with_object(Hash.new(0)) { |h,g| g[h[:name]] += 1 }
.reject { |_,v| v == 1 } #=> {"Josh"=>2, "Kevin"=>2, "Rod"=>2}
arr.select { |h| multiples.key?(h[:name]) }
#3
Use the form of Hash#update (aka Hash#merge!) that takes a block to determine names that appear only once, then reject for those:
singles = arr.each_with_object({}) { |h,g|
g.update({ h[:name] => 1 }) { |_,o,n| o+n } }
.select { |_,v| v == 1 } #=> {"Ben"=>1, "Jack"=>1}
arr.reject { |h| singles.key?(h[:name]) }

Resources