Merging/adding hashes in array with same key - ruby
I have an array of hashes:
array = [
{"points": 0, "block": 3},
{"points": 25, "block": 8},
{"points": 65, "block": 4}
]
I need to merge the hashes. I need the output to be:
{"points": 90, "block": 15}
You could merge the hashes together, adding the values that are in both hashes:
result = array.reduce do |memo, next_hash|
memo.merge(next_hash) do |key, memo_value, next_hash_value|
memo_value + next_hash_value
end
end
result # => {:points=>90, :block=>15}
If your real hash has keys whose values don't respond well to +, the block also receives the key, so you can add a case statement that handles particular keys differently.
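As a rough sketch of that case statement idea, the example below assumes a hypothetical :name key whose string value should be kept rather than summed. The records data and the key names are made up for illustration.

```ruby
# Hypothetical data: :name holds a string, so it is kept instead of summed.
records = [
  { name: "first", points: 10 },
  { name: "second", points: 5 }
]

merged = records.reduce do |memo, next_hash|
  memo.merge(next_hash) do |key, memo_value, next_value|
    case key
    when :name then memo_value            # keep the first name seen
    else memo_value + next_value          # sum everything else
    end
  end
end
```

Here merged ends up with the name from the first hash and the summed points.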
If you have the array as you mentioned in this structure:
array = [
{"points": 0, "block": 3},
{"points": 25, "block": 8},
{"points": 65, "block": 4}
]
You can use the following code to achieve your goal:
result = {
points: array.map{ |item| item[:points] }.inject(:+),
block: array.map{ |item| item[:block] }.inject(:+)
}
You will get this result:
{:points=>90, :block=>15}
Note: This iterates over the array twice. I'm trying to figure out a better way to iterate once and still have the same elegant, easy-to-read code.
If you want to do it more generically (more keys than :points and :block), then you can use this code:
array = [
{"points": 0, "block": 3},
{"points": 25, "block": 8},
{"points": 65, "block": 4}
]
keys = [:points, :block] # or you can make it generic with array.first.keys
result = keys.map do |key|
[key, array.map{ |item| item.fetch(key, 0) }.inject(:+)]
end.to_h
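If the hashes do not all share the same keys, array.first.keys can miss some of them. A possible variation, assuming Ruby 2.6 or later (for to_h with a block) and made-up sample rows, collects the union of keys first:

```ruby
# Hypothetical rows where the first hash lacks the :block key.
rows = [{ points: 1 }, { points: 2, block: 3 }]

# Union of all keys that appear in any hash.
keys = rows.flat_map(&:keys).uniq

# Sum each key, treating a missing key as 0.
totals = keys.to_h { |key| [key, rows.sum { |row| row.fetch(key, 0) }] }
```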
You can create a method like the one below to get the result:
def process_array(array)
points = array.map{|h| h[:points]}
block = array.map{|h| h[:block]}
result = {}
result['points'] = points.inject{ |sum, x| sum + x }
result['block'] = block.inject{ |sum, x| sum + x }
result
end
Calling the method with the array as input gives the expected result:
[54] pry(main)> process_array(array)
=> {"points"=>90, "block"=>15}
You can also use the Enumerable method each_with_object, passing a hash as the object.
result = array.each_with_object(Hash.new(0)) {|e, h| h[:points] += e[:points]; h[:block] += e[:block] }
# => {:points=>90, :block=>15}
Hash.new(0) means initialise the hash to default value 0 for any keys, for example:
h = Hash.new(0)
h[:whathever_key] # => 0
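One detail worth knowing about Hash.new(0): reading a missing key returns the default but does not store the key. The short sketch below uses a made-up counts hash to show the difference between reading and assigning.

```ruby
counts = Hash.new(0)

default_read = counts[:missing]      # returns the default, 0
stored = counts.key?(:missing)       # false: reading did not add the key

counts[:seen] += 1                   # assignment does store the key
```

After this runs, counts contains only the :seen key.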
I was interested in how the reduce method introduced by "Simple Lime" worked and also how it would benchmark against simple iteration over the array and over the keys of each hash.
Here is the code of the "iteration" approach:
Hash.new(0).tap do |result|
array.each do |hash|
hash.each do |key, val|
result[key] = result[key] + val
end
end
end
I was surprised that the "iteration" code performed 3 times better than the reduce approach.
Here is the benchmark code https://gist.github.com/landovsky/6a1b29cbf13d0cf81bad12b6ba472416
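For readers who cannot open the gist, here is a minimal sketch of how such a comparison could be set up with the standard Benchmark module. It is not the code from the gist: the sample data, the lambda names, and the array size are assumptions, and timings will vary by machine and Ruby version.

```ruby
require "benchmark"

# Hypothetical sample data: 10,000 small hashes with the same two keys.
sample = Array.new(10_000) { |i| { points: i, block: 1 } }

# The reduce/merge approach.
reduce_sum = lambda do |hashes|
  hashes.reduce { |memo, h| memo.merge(h) { |_key, a, b| a + b } }
end

# The plain iteration approach.
iterate_sum = lambda do |hashes|
  Hash.new(0).tap do |result|
    hashes.each { |h| h.each { |key, val| result[key] += val } }
  end
end

reduce_time  = Benchmark.realtime { reduce_sum.call(sample) }
iterate_time = Benchmark.realtime { iterate_sum.call(sample) }
puts format("reduce: %.4fs, iteration: %.4fs", reduce_time, iterate_time)
```

One plausible reason for a gap is that merge allocates a new hash on every step, while the iteration version updates a single hash in place.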
Related
How to convert a three-line Ruby method into one
I have a simple method that iterates through an array and returns a duplicate. (Or duplicates) def find_dup(array) duplicate = 0 array.each { |element| duplicate = element if array.count(element) > 1} duplicate end It works, but I'd like to express this more elegantly. The reason it is three lines is that the variable "duplicate", which the method must return, is not visible to the method if I introduce it inside the block, i.e, def find_dup(array) array.each { |element| duplicate = element if array.count(element) > 1} duplicate end I've tried a few ways to define "duplicate" as the result of a block, but to no avail. Any thoughts?
It's a little too much to do cleanly in a one-liner, but this is a more efficient solution.
def find_dups(arr)
  counts = Hash.new { |hash, key| hash[key] = 0 }
  arr.each_with_object(counts) do |x, memo|
    memo[x] += 1
  end.select { |key, val| val > 1 }.keys
end
The Hash.new call instantiates a hash where the default value is 0. each_with_object modifies this hash to track the count of each element in arr, then select keeps only the elements whose count is greater than one. The benefit of this approach over a solution using Array#include? or Array#count is that it scans the array only once, so it runs in O(N) time instead of O(N^2).
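On Ruby 2.7 or later, the counting step can also be written with Enumerable#tally, which builds the same element-to-count hash in a single pass. This is a sketch of that variation; the method name find_dups_with_tally is made up here.

```ruby
# Counts every element once, then keeps the elements seen more than once.
def find_dups_with_tally(arr)
  arr.tally.select { |_element, count| count > 1 }.keys
end

dups = find_dups_with_tally([1, 2, 3, 4, 3, 4, 3])
```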
Your method is only finding the last duplicate in the array. If you want all the duplicates, I would do something like this: def find_dups(arr) dups = Hash.new { |h, k| h[k] = 0 } arr.each { |el| dups[el] += 1 } dups.select { |k, v| v > 1 }.keys end
If what you really want is a one-liner that isn't concerned with big-O complexity and only returns the last duplicate in the array, I would do this: def find_last_dup(arr) arr.reverse_each { |el| return el if arr.count(el) > 1 } end
You can do this in one line, and it flows a bit nicer. Note that this finds the first duplicate, whereas your code returns the last one; I'm not sure whether that is part of your requirement.
def find_dup(array)
  array.group_by { |value| value }.find { |_, groups| groups.count > 1 }.first
end
Also note that fitting things on one line doesn't necessarily make them better. I find the code more readable split over more lines, but that's just my opinion.
def find_dup(array)
  array.group_by { |value| value }
       .find { |_, groups| groups.count > 1 }
       .first
end
Just want to add one more approach to the mix. def find_last_dup(arr) arr.reverse_each.detect { |x| arr.count(x) > 1 } end Alternatively, you can get linear time complexity in two lines. def find_last_dup(arr) freq = arr.each_with_object(Hash.new(0)) { |x, obj| obj[x] += 1 } arr.reverse_each.detect { |x| freq[x] > 1 } end For the sake of argument, the latter approach can be reduced to one line as well, but this would be unidiomatic and confusing. def find_last_dup(arr) arr.each_with_object(Hash.new(0)) { |x, obj| obj[x] += 1 } .tap do |freq| return arr.reverse_each.detect { |x| freq[x] > 1 } end end
Given: > a => [8, 5, 6, 6, 5, 8, 6, 1, 9, 7, 2, 10, 7, 7, 3, 4] You can group the dups together: > a.uniq.each_with_object(Hash.new(0)) {|e, h| c=a.count(e); h[e]=c if c>1} => {8=>2, 5=>2, 6=>3, 7=>3} Or, > a.group_by{ |e| e}.select{|k,v| v if v.length>1} => {8=>[8, 8], 5=>[5, 5], 6=>[6, 6, 6], 7=>[7, 7, 7]} In each case, the order of the result is based on the order of the elements in a that have dups. If you just want the first: > a.group_by{ |e| e}.select{|k,v| v if v.length>1}.first => [8, [8, 8]] Or last: > a.group_by{ |e| e}.select{|k,v| v if v.length>1}.to_a.last => [7, [7, 7, 7]] If you want to 'fast forward' to the first value that has a dup, you can use drop_while: > b=[1,2,3,4,5,4,5,6] > b.drop_while {|e| b.count(e)==1 }[0] => 4 Or the last: > b.reverse.drop_while {|e| b.count(e)==1 }[0] => 5
def find_duplicates(array)
  remaining = array.dup
  # Delete the first occurrence of each distinct element from the copy.
  array.uniq.each { |element| remaining.delete_at(remaining.index(element)) }
  remaining.uniq
end
The find_duplicates method above copies the input array and deletes the first occurrence of each distinct element from the copy, leaving only the remaining occurrences of the duplicated elements; uniq then collapses those to one entry each. The input array is not modified.
Example:
array = [1, 2, 3, 4, 3, 4, 3]
=> [1, 2, 3, 4, 3, 4, 3]
find_duplicates(array)
=> [3, 4]
Ruby Array default value?
I have hundreds of arrays that I am normalizing for a CSV.
[
  ["foo", "tom", nil, 1, 4, "cheese"],
  ["foo", "tom", "fluffy", nil, 4],
  ["foo", "tom", "fluffy", 1, nil],
  ...
]
Currently, to make them all equal length, I find the max length and set the element at that index:
rows.each { |row| row[max_index] ||= nil }
This is cool because it extends the array to the new length. Now, instead of padding with a bunch of nils at the end, I need to append COLUMN_N, where N is the index (1-based).
table_rows.each do |row|
  last_index = row.length - 1
  (last_index..max_index).to_a.each { |index| row[index] ||= "COLUMN_#{index+1}" }
end
This seems like an awkward way to have a default value that is a function of the index.
You can't choose a default value for the elements that the []= method fills in. But you can easily do something like this, as long as there are no other nils that you want to keep:
row.each_with_index.map { |item, index| item.nil? ? "column_#{index}" : item }
To get a default value instead of nil you can use fetch: row = ["foo", "tom", "fluffy", 1, 4] row.fetch(7) { |i| "COLUMN_#{i + 1}" } => "COLUMN_8" But it won't fill the array for you. Also see: Can I create an array in Ruby with default values?
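Building on fetch with a block, one way to actually fill the row is to construct a new array of the target width. The sketch below is only an illustration: the helper name pad_row and the width argument are assumptions, and it presumes width is at least the length of the longest row (a shorter width would truncate).

```ruby
# Returns a copy of row padded to width; positions past the end of row
# get "COLUMN_N" (1-based). Existing elements, including nils, are kept.
def pad_row(row, width)
  Array.new(width) { |i| row.fetch(i) { "COLUMN_#{i + 1}" } }
end

padded = pad_row(["foo", "tom", nil], 5)
```

Because fetch only calls its block for out-of-range indexes, a nil that already sits inside the row is left alone.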
This seems like it could work for you. class Array def push_with_default(item, index, &block) new_arr = Array.new([self.size + 1, index].max, &block) self[index] = item self.map!.with_index { |n, i| n.nil? ? new_arr[i] : n } end end >> array = [1,2,5,9] [ [0] 1, [1] 2, [2] 5, [3] 9 ] >> array.push_with_default(2, 10) { |i| "column_#{i}" } [ [ 0] 1, [ 1] 2, [ 2] 5, [ 3] 9, [ 4] "column_4", [ 5] "column_5", [ 6] "column_6", [ 7] "column_7", [ 8] "column_8", [ 9] "column_9", [10] 2 ] I don't believe a method like this exists on Array already though.
Sort hash by key which is a string
Assuming I get back a string: "27,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,12,17,17,41,17,17,17,17,17,17,17,17,17,17,17,17,17,26,26,26,26,26,26,26,26,26,29,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,40,48,28,28,28,28,28,28,28,28,28,28,28,28,28,28,29,29,29,29,29,29,29,29,29,29,29,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,34,34,34,34,34,34,36,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,36,36,36,36,36,36,36,36,36,36,36,36,36,36,36,36,36,36,36,36,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,40,40,40,40,40,40,40,40,41,41,41,41,41,41,41,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,43,43,43,43,43,43,43,43,43,43,43,43,43,44,44,44,44,48,49,29,41,6,30,11,29,29,36,29,29,36,29,43,1,29,29,29,1,41" I turn that into an array by calling str.split(',') Then turning it into a hash by calling arr.compact.inject(Hash.new(0)) { |h, e| h[e] += 1 ; h } I would get back a hash that looks like {"1"=>2, "6"=>1, "39"=>23, "36"=>23, "34"=>39, "32"=>31, "30"=>18, "3"=>8, "2"=>10, "28"=>36, "29"=>21, "26"=>41, "27"=>48, "49"=>1, "44"=>4, "43"=>14, "42"=>34, "48"=>2, "40"=>9, "41"=>10, "11"=>1, "17"=>15, "12"=>1} However, I'd like to sort that hash by key. I've tried the solutions listed here. I believe my problem is related to the fact they keys are strings. The closest I got was using Hash[h.sort_by{|k,v| k.to_i}]
Hashes shouldn't be treated as a sorted data structure. They have other advantages and use case as to return their values sequentially. As Mladen Jablanović already pointed out a array of tuples might be the better data structure when you need a sorted key/value pair. But in current versions of Ruby there actually exists a certain order in which key/value pairs are returned when you call for example each on a hash and that is the order of insertion. Using this behavior you can just build a new hash and insert all key/value pairs into that new hash in the order you want them to be. But keep in mind that the order will break when you add more entries later on. string = "27,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,12,17,17,41,17,17,17,17,17,17,17,17,17,17,17,17,17,26,26,26,26,26,26,26,26,26,29,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,40,48,28,28,28,28,28,28,28,28,28,28,28,28,28,28,29,29,29,29,29,29,29,29,29,29,29,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,34,34,34,34,34,34,36,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,36,36,36,36,36,36,36,36,36,36,36,36,36,36,36,36,36,36,36,36,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,40,40,40,40,40,40,40,40,41,41,41,41,41,41,41,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,43,43,43,43,43,43,43,43,43,43,43,43,43,44,44,44,44,48,49,29,41,6,30,11,29,29,36,29,29,36,29,43,1,29,29,29,1,41" sorted_number_count_tupels = string.split(','). group_by(&:itself). map { |k, v| [k, v.size] }. 
sort_by { |(k, v)| k.to_i } #=> [["1",2],["2",10],["3",8],["6",1],["11",1],["12",1],["17",15],["26",41],["27",48],["28",36],["29",21],["30",18],["32",31],["34",39],["36",23],["39",23],["40",9],["41",10],["42",34],["43",14],["44",4],["48",2],["49",1]] sorted_number_count_hash = sorted_number_count_tupels.to_h #=> { "1" => 2, "2" => 10, "3" => 8, "6" => 1, "11" => 1, "12" => 1, "17" => 15, "26" => 41, "27" => 48, "28" => 36, "29" => 21, "30" => 18, "32" => 31, "34" => 39, "36" => 23, "39" => 23, "40" => 9, "41" => 10, "42" => 34, "43" => 14, "44" => 4, "48" => 2, "49" => 1}
Suppose you started with str = "27,2,2,2,41,26,26,26,48,48,41,6,11,1,41" and created the following hash h = str.split(',').inject(Hash.new(0)) { |h, e| h[e] += 1 ; h } #=> {"27"=>1, "2"=>3, "41"=>3, "26"=>3, "48"=>2, "6"=>1, "11"=>1, "1"=>1} I removed compact because the array str.split(',') contains only (possibly empty) strings, no nils. Before continuing, you may want to change this last step to h = str.split(/\s*,\s*/).each_with_object(Hash.new(0)) { |e,h| h[e] += 1 } #=> {"27"=>1, "2"=>3, "41"=>3, "26"=>3, "48"=>2, "6"=>1, "11"=>1, "1"=>1} Splitting on the regex allows for the possibility of one or more spaces before or after each comma, and Enumerable#each_with_object avoids the need for that pesky ; h. (Notice the block variables are reversed.) Then h.sort_by { |k,_| k.to_i }.to_h #=> {"1"=>1, "2"=>3, "6"=>1, "11"=>1, "26"=>3, "27"=>1, "41"=>3, "48"=>2} creates a new hash that contains h's key-value pairs sorted by the integer representations of the keys. See Hash#sort_by. Notice we've created two hashes. Here's a way to do that by modifying h in place. h.keys.sort_by(&:to_i).each { |k| h[k] = h.delete(k) } #=> ["1", "2", "6", "11", "26", "27", "41", "48"] (each always returns the receiver) h #=> {"1"=>1, "2"=>3, "6"=>1, "11"=>1, "26"=>3, "27"=>1, "41"=>3, "48"=>2} Lastly, another alternative is to sort str.split(',') before creating the hash. str.split(',').sort_by(&:to_i).each_with_object(Hash.new(0)) { |e,h| h[e] += 1 } #=> {"1"=>1, "2"=>3, "6"=>1, "11"=>1, "26"=>3, "27"=>1, "41"=>3, "48"=>2}
Notes compact String#split cannot return a nil element. compact won't be useful, here. split might return an empty string, though : p "1,,2,3".split(',') # ["1", "", "2", "3"] p "1,,2,3".split(',').compact # ["1", "", "2", "3"] p "1,,2,3".split(',').reject(&:empty?) # ["1", "2", "3"] inject If you have to use two statements inside inject block, each_with_object might be a better idea : arr.compact.inject(Hash.new(0)) { |h, e| h[e] += 1 ; h } can be rewritten : arr.compact.each_with_object(Hash.new(0)) { |e, h| h[e] += 1 } Hash or Array? If you need to sort results, an Array of pairs might be more suitable than a Hash. String or Integer? If you accept to have an integer as key, it might make your code easier to write. Refactoring Here's a possibility to rewrite your code : str.split(',') .reject(&:empty?) .map(&:to_i) .group_by(&:itself) .map { |k, v| [k, v.size] } .sort It outputs : [[1, 2], [2, 10], [3, 8], [6, 1], [11, 1], [12, 1], [17, 15], [26, 41], [27, 48], [28, 36], [29, 21], [30, 18], [32, 31], [34, 39], [36, 23], [39, 23], [40, 9], [41, 10], [42, 34], [43, 14], [44, 4], [48, 2], [49, 1]] If you really want a Hash, you can add .to_h : {1=>2, 2=>10, 3=>8, 6=>1, 11=>1, 12=>1, 17=>15, 26=>41, 27=>48, 28=>36, 29=>21, 30=>18, 32=>31, 34=>39, 36=>23, 39=>23, 40=>9, 41=>10, 42=>34, 43=>14, 44=>4, 48=>2, 49=>1}
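You can assign the result of arr.compact.inject(Hash.new(0)) { |h, e| h[e] += 1 ; h } to a variable and sort its keys:
num = arr.compact.inject(Hash.new(0)) { |h, e| h[e] += 1 ; h }
num.keys.sort
That returns the keys in sorted order; it does not reorder the hash itself. Because the keys are strings, sort compares them lexicographically ("11" comes before "2"), so use num.keys.sort_by(&:to_i) for numeric order.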
A Ruby hash keeps the order in which keys were added. If the array is small enough to sort, I would just change str.split(',') to str.split(',').sort_by(&:to_i) so that the values, and therefore also your hash, come out sorted.
Subtracting Values in two identical hashes in ruby
I have two hashes that are identical in structure... hash1 = {:total=>{:gold=>100, :dark=>500}, :defensive=>{:gold=>100, :dark=>500}} hash2 = {:total=>{:gold=>20, :dark=>200}, :defensive=>{:gold=>20, :dark=>200}} I want to subtract and return the following result... hash1 - hash2 => {:total=>{:gold=>80, :dark=>300}, :defensive=>{:gold=>80, :dark=>300}} Maybe this type of operation is not recommended. I'd appreciate that feedback as well. :-)
I would just do: hash1 = {:total=>{:gold=>100, :dark=>500}, :defensive=>{:gold=>100, :dark=>500}} hash2 = {:total=>{:gold=>20, :dark=>200}, :defensive=>{:gold=>20, :dark=>200}} hash1.merge(hash2) { |_, l, r| l.merge(r) { |_, x, y| x - y } } #=> {:total=>{:gold=>80, :dark=>300}, :defensive=>{:gold=>80, :dark=>300}}
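The nested merge above handles exactly two levels. A possible generalization, sketched here under the assumption that both hashes have the same shape and numeric leaves, recurses whenever both values are hashes. The name deep_diff and the sample data are made up for illustration.

```ruby
# Subtracts right from left, descending into nested hashes.
def deep_diff(left, right)
  left.merge(right) do |_key, l, r|
    l.is_a?(Hash) && r.is_a?(Hash) ? deep_diff(l, r) : l - r
  end
end

a = { total: { gold: 350, dark: 500 },
      defensive: { next: { gold: 300, dark: 500 }, last: { gold: 150, dark: 300 } } }
b = { total: { gold: 300, dark: 100 },
      defensive: { next: { gold: 100, dark: 200 }, last: { gold: 100, dark: 200 } } }

difference = deep_diff(a, b)
```

Note that merge only calls the block for keys present in both hashes, so a key that exists on one side only is copied through unchanged.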
You could use recursion:
def diff(f, g)
  f.each_with_object({}) do |(k, v), h|
    h[k] = case v
           when Integer then v - g[k]
           else diff(v, g[k])
           end
  end
end
diff hash1, hash2
#=> {:total=> {:gold=>80, :dark=>300},
#    :defensive=>{:gold=>80, :dark=>300}}
(On Ruby versions before 2.4, use Fixnum in place of Integer.)
@under_gongor pointed out that this works for parallel, nested hashes. Here's an example:
hash1 = {:total=>{:gold=>350, :dark=>500}, :defensive=>{:next=>{:gold=>300, :dark=>500}, :last=>{:gold=>150, :dark=>300}}}
hash2 = {:total=>{:gold=>300, :dark=>100}, :defensive=>{:next=>{:gold=>100, :dark=>200}, :last=>{:gold=>100, :dark=>200}}}
diff hash1, hash2
#=> {:total=>{:gold=> 50, :dark=>400},
#    :defensive=>{:next=>{:gold=>200, :dark=>300},
#                 :last=>{:gold=> 50, :dark=>100}}}
hash1 = {:total=>{:gold=>100, :dark=>500}, :defensive=>{:gold=>100, :dark=>500}} hash2 = {:total=>{:gold=>20, :dark=>200}, :defensive=>{:gold=>20, :dark=>200}} {}.tap do |hash| hash1.each do |key, subhash1| subhash2 = hash2[key] hash[key] ||= {} subhash1.each do |k, val1| val2 = subhash2[k] hash[key][k] = val1 - val2 end end end Output is: {:total=>{:gold=>80, :dark=>300}, :defensive=>{:gold=>80, :dark=>300}}
Here's another approach. It may not be preferred for the present problem, but it illustrates a general technique that is sometimes useful.
Step 1: Extract the outer and inner keys:
okeys, ikeys = hash1.keys, hash1.values.first.keys
#=> [[:total, :defensive], [:gold, :dark]]
Step 2: Extract the numerical values and compute differences:
a = [hash1,hash2].
  map { |h| h.values.map { |g| g.values_at(*ikeys) } }.
  transpose.
  map(&:transpose).
  map { |pairs| pairs.map { |x, y| x - y } }
#=> [[80, 300], [80, 300]]
Step 3: Construct the output hash:
okeys.zip(a.map { |b| ikeys.zip(b).to_h }).to_h
#=> {:total=>{:gold=>80, :dark=>300}, :defensive=>{:gold=>80, :dark=>300}}
One could combine Steps 2 and 3 by substituting out a in Step 3. ikeys and okeys could also be substituted out, to make it a one-liner, but I would not advocate that.
Explanation for Step 2
Step 2 may appear a bit complex, but it's really not if you go through the operations one at a time.
Extract the numerical values, using Hash#values_at to ensure correct ordering:
b = [hash1,hash2].map { |h| h.values.map { |g| g.values_at(*ikeys) } }
#=> [[[100, 500], [100, 500]], [[20, 200], [20, 200]]]
Manipulate the array until it is in the proper form for calculating differences:
c = b.transpose
#=> [[[100, 500], [20, 200]], [[100, 500], [20, 200]]]
d = c.map(&:transpose)
#=> [[[100, 20], [500, 200]], [[100, 20], [500, 200]]]
Compute differences. Each innermost pair holds the hash1 value followed by the hash2 value, so subtract within each pair (reduce(:-) on the outer arrays would perform an array difference instead):
a = d.map { |pairs| pairs.map { |x, y| x - y } }
#=> [[80, 300], [80, 300]]
Iterating over hash of arrays
I have the following:
@products = {
  2 => [
    #<Review id: 9, answer01: 3, score: 67, style_id: 2, consumer_id: 2, branch_id: 2, business_id: 2>
  ],
  15 => [
    #<Review id: 10, answer01: 3, score: 67, style_id: 2, consumer_id: 2, branch_id: 2, business_id: 2>,
    #<Review id: 11, answer01: 3, score: 67, style_id: 2, consumer_id: 2, branch_id: 2, business_id: 2>
  ]
}
I want to average the scores for all reviews associated with each product's hash key. How can I do this?
To iterate over a hash: hash = {} hash.each_pair do |key,value| #code end To iterate over an array: arr=[] arr.each do |x| #code end So iterating over a hash of arrays (let's say we're iterating over each array in each point in the hash) would be done like so: hash = {} hash.each_pair do |key,val| hash[key].each do |x| #your code, for example adding into count and total inside program scope end end
Yes, just use map to make an array of the scores for each product and then take the average of the array.
average_scores = {}
@products.each_pair do |key, product|
  scores = product.map { |p| p.score }
  sum = scores.inject(:+) # If you are using Rails, you can also use scores.sum
  average = sum.to_f / scores.size
  average_scores[key] = average
end
Thanks for the answer Shingetsu, I will certainly upvote it. I accidentally figured the answer out myself.
trimmed_hash = @products.sort.map{|k, v| [k, v.map{|a| a.score}]}
trimmed_hash.map{|k, v| [k, v.inject(:+).to_f/v.length]}
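On Ruby 2.4 or later, the same averaging can be sketched with Hash#transform_values and Enumerable#sum, which keeps the result as a hash keyed by product. The Review struct below is a hypothetical stand-in for the real model, and the scores are invented so the averages differ.

```ruby
# Hypothetical stand-in for the Review model; only id and score are modeled.
Review = Struct.new(:id, :score)

products = {
  2  => [Review.new(9, 67)],
  15 => [Review.new(10, 60), Review.new(11, 70)]
}

# Average score per product key; assumes every product has at least one review.
average_scores = products.transform_values do |reviews|
  reviews.sum(&:score).to_f / reviews.size
end
```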