Delete element from nested hash in Ruby - ruby

I have a two dimensional hashes in Ruby.
h = { "a" => {"v1" => 0, "v2" => 1}, "c" => {"v1" => 2, "v2" => 3} }
I would like to delete those elements from the hash, where value 1 (v1) is 0, for example, so my result would be:
{ "c" => {"v1" => 2, "v2" => 3} }
I wanted to achieve this by iterating trough the hash with delete_if, but I'm not sure how to handle the nested parts with it.

Is that what you're looking for?
h.delete_if { |_, v| v['v1'].zero? }
#=> {"c" => {"v1" => 2, "v2" => 3}}
As #TomLord says, it also may be variant, when v1 can be not defined or equal to nil, in this case, it would be better to use v['v1'] == 0

You can use Hash#value? in your block to check if any of the values in the nested hashes equal 0:
hash.delete_if { |k,v| v.value? 0 } #=> { "c" => { "v1" => 2, "v2" => 3 } }

Related

How to find the largest value of a hash in an array of hashes

In my array, I'm trying to retrieve the key with the largest value of "value_2", so in this case, "B":
myArray = [
"A" => {
"value_1" => 30,
"value_2" => 240
},
"B" => {
"value_1" => 40,
"value_2" => 250
},
"C" => {
"value_1" => 18,
"value_2" => 60
}
]
myArray.each do |array_hash|
array_hash.each do |key, value|
if value["value_2"] == array_hash.values.max
puts key
end
end
end
I get the error:
"comparison of Hash with Hash failed (ArgumentError)".
What am I missing?
Though equivalent, the array given in the question is generally written:
arr = [{ "A" => { "value_1" => 30, "value_2" => 240 } },
{ "B" => { "value_1" => 40, "value_2" => 250 } },
{ "C" => { "value_1" => 18, "value_2" => 60 } }]
We can find the desired key as follows:
arr.max_by { |h| h.values.first["value_2"] }.keys.first
#=> "B"
See Enumerable#max_by. The steps are:
g = arr.max_by { |h| h.values.first["value_2"] }
#=> {"B"=>{"value_1"=>40, "value_2"=>250}}
a = g.keys
#=> ["B"]
a.first
#=> "B"
In calculating g, for
h = arr[0]
#=> {"A"=>{"value_1"=>30, "value_2"=>240}}
the block calculation is
a = h.values
#=> [{"value_1"=>30, "value_2"=>240}]
b = a.first
#=> {"value_1"=>30, "value_2"=>240}
b["value_2"]
#=> 240
Suppose now arr is as follows:
arr << { "D" => { "value_1" => 23, "value_2" => 250 } }
#=> [{"A"=>{"value_1"=>30, "value_2"=>240}},
# {"B"=>{"value_1"=>40, "value_2"=>250}},
# {"C"=>{"value_1"=>18, "value_2"=>60}},
# {"D"=>{"value_1"=>23, "value_2"=>250}}]
and we wish to return an array of all keys for which the value of "value_2" is maximum (["B", "D"]). We can obtain that as follows.
max_val = arr.map { |h| h.values.first["value_2"] }.max
#=> 250
arr.select { |h| h.values.first["value_2"] == max_val }.flat_map(&:keys)
#=> ["B", "D"]
flat_map(&:keys) is shorthand for:
flat_map { |h| h.keys }
which returns the same array as:
map { |h| h.keys.first }
See Enumerable#flat_map.
Code
p myArray.pop.max_by{|k,v|v["value_2"]}.first
Output
"B"
I'd use:
my_array = [
"A" => {
"value_1" => 30,
"value_2" => 240
},
"B" => {
"value_1" => 40,
"value_2" => 250
},
"C" => {
"value_1" => 18,
"value_2" => 60
}
]
h = Hash[*my_array]
# => {"A"=>{"value_1"=>30, "value_2"=>240},
# "B"=>{"value_1"=>40, "value_2"=>250},
# "C"=>{"value_1"=>18, "value_2"=>60}}
k = h.max_by { |k, v| v['value_2'] }.first # => "B"
Hash[*my_array] takes the array of hashes and turns it into a single hash. Then max_by will iterate each key/value pair, returning an array containing the key value "B" and the sub-hash, making it easy to grab the key using first:
k = h.max_by { |k, v| v['value_2'] } # => ["B", {"value_1"=>40, "value_2"=>250}]
I guess the idea of your solution is looping through each hash element and compare the found minimum value with hash["value_2"].
But you are getting an error at
if value["value_2"] == array_hash.values.max
Because the array_hash.values is still a hash
{"A"=>{"value_1"=>30, "value_2"=>240}}.values.max
#=> {"value_1"=>30, "value_2"=>240}
It should be like this:
max = nil
max_key = ""
myArray.each do |array_hash|
array_hash.each do |key, value|
if max.nil? || value.values.max > max
max = value.values.max
max_key = key
end
end
end
# max_key #=> "B"
Another solution:
myArray.map{ |h| h.transform_values{ |v| v["value_2"] } }.max_by{ |k| k.values }.keys.first
You asked "What am I missing?".
I think you are missing a proper understanding of the data structures that you are using. I suggest that you try printing the data structures and take a careful look at the results.
The simplest way is p myArray which gives:
[{"A"=>{"value_1"=>30, "value_2"=>240}, "B"=>{"value_1"=>40, "value_2"=>250}, "C"=>{"value_1"=>18, "value_2"=>60}}]
You can get prettier results using pp:
require 'pp'
pp myArray
yields:
[{"A"=>{"value_1"=>30, "value_2"=>240},
"B"=>{"value_1"=>40, "value_2"=>250},
"C"=>{"value_1"=>18, "value_2"=>60}}]
This helps you to see that myArray has only one element, a Hash.
You could also look at the expression array_hash.values.max inside the loop:
myArray.each do |array_hash|
p array_hash.values
end
gives:
[{"value_1"=>30, "value_2"=>240}, {"value_1"=>40, "value_2"=>250}, {"value_1"=>18, "value_2"=>60}]
Not what you expected? :-)
Given this, what would you expect to be returned by array_hash.values.max in the above loop?
Use p and/or pp liberally in your ruby code to help understand what's going on.

Most performant way to group/summarise two hashes?

I have two hashes with some data that I need to aggregate. The first one is a mapping of which ids (id_1, id_2, id_3, id_4) belong under what category (a, b, c):
hash_1 = {'a' => ['id_1','id_2'], 'b' => ['id_3'], 'c' => ['id_4']}
The second hash holds values of how many events happened per id for a given date (date_1, date_2, date_3):
hash_2 = {
'id_1' => {'date_1' => 5, 'date_2' => 6, 'date_3' => 8},
'id_2' => {'date_1' => 0, 'date_3' => 6},
'id_3' => {'date_1' => 0, 'date_2' => nil, 'date_3' => 1},
'id_4' => {'date_1' => 10, 'date_2' => 1}
}
What I want is to get the total event per category (a,b,c). For the above example, the result would look something like:
hash_3 = {'a' => (5+6+8+0+6), 'b' => (0+0+1), 'c' => (10+1)}
My problem is, that there are about 5000 categories, each pointing to typically 1 to 3 ids, and each ID having event counts for 30 dates or more. So this takes quite a bit of computation. What will be the most performant (time effective) way to do this grouping in Ruby?
update
This is what I tried so far (took like 6-8 seconds!, horribly slow):
def total_clicks_per_category
{}.tap do |res|
hash_1.each do |cat, ids|
res[cat] = total_event_per_ids(ids)
end
end
end
def total_event_per_ids(ids)
ids.reduce(0) do |memo, id|
events = hash_2.fetch(id, {})
memo + (events.values.reduce(:+) || 0)
end
end
P.S. I’m using Ruby 2.3.
I'm writing this on a phone so I cannot test right now, but it looks OK.
g = hash_2.each_with_object({}) { |(k,v),g| g[k] = v.values.compact.sum }
hash_3 = hash_1.each_with_object({}) { |(k,v),h| h[k] = g.values_at(*v).sum }
First, create an intermediate hash that holds the sum of hash_2:
hash_4 = hash_2.map{|k, v| [k, v.values.inject(:+)]}.to_h
# => {"id_1"=>19, "id_2"=>6, "id_3"=>1, "id_4"=>11}
Then do the final summation:
hash_3 = hash_1.map{|k, v| [k, v.map{|k| hash_4[k]}.inject(:+)]}.to_h
# => {"a"=>25, "b"=>1, "c"=>11}
Theory
5000*3*30 isn't that many. Ruby probably will need a second at most for this kind of job.
Hash lookup is fast by default, you won't be able to optimize much.
You could pre-calculate hash_2_sum, though :
hash_2_sum = {
'id_1' => 5+6+8,
'id_2' => 0+6,
'id_3' => 0+0+1,
'id_4' => 10+1
}
A loop on hash1 with hash_2_sum lookup, and you're done.
Code
Your example has been updated with some nil values. You need to remove them with compact, and make sure the sum is 0 when no element is found with inject(0, :+):
hash_1 = {'a' => ['id_1','id_2'], 'b' => ['id_3'], 'c' => ['id_4']}
hash_2 = {
'id_1' => { 'date_1' => 5, 'date_2' => 6, 'date_3' => 8 },
'id_2' => { 'date_1' => 0, 'date_3' => 6 },
'id_3' => { 'date_1' => 0, 'date_2' => nil, 'date_3' => 1 },
'id_4' => { 'date_1' => 10, 'date_2' => 1 }
}
hash_2_sum = hash_2.each_with_object({}) do |(key, dates), sum|
sum[key] = dates.values.compact.inject(0, :+)
end
hash_3 = hash_1.each_with_object({}) do |(key, ids), sum|
sum[key] = hash_2_sum.values_at(*ids).inject(0, :+)
end
# {"a"=>25, "b"=>1, "c"=>11}
Note
{}.tap do |res|
hash_1.each do |cat, ids|
res[cat] = total_event_per_ids(ids)
end
end
isn't very readable IMHO.
You can either use each_with_object or Array#to_h :
result = [1, 2, 3].each_with_object({}) do |i, hash|
hash[i] = i * i
end
#=> {1=>1, 2=>4, 3=>9}
result = [1, 2, 3].map { |i| [i, i * i] }.to_h
#=> {1=>1, 2=>4, 3=>9}

Merging keys and values from hash in Ruby

I have a hash with the number of occurrences of cities from a model and I want to merge nil or blank keys and values into one and rename the new single key
for example
#raw_data = {nil => 2, "" => 2, "Berlin" => 1, "Paris" => 1}
# to
#corrected_data = {"undefined" => 4, "Berlin" => 1, "Paris" => 1}
I have looked into the merge method from the Rubydoc but it only merges if two keys are similar, but in this case nil and "" are not the same.
I could also create a new hash and add if nil or blank statements and add the values to a new undefined key, but it seems tedious.
I'd do :
#raw_data = {nil => 2, "" => 2, "Berlin" => 1, "Paris" => 1}
key_to_merge = [nil, ""]
#correct_data = #raw_data.each_with_object(Hash.new(0)) do |(k,v), out_hash|
next out_hash['undefined'] += v if key_to_merge.include? k
out_hash[k] = v
end
#correct_data # => {"undefined"=>4, "Berlin"=>1, "Paris"=>1}
Ruby cannot automatically determine what do you want to do with values represented by two "similar" keys. You have to define the mapping yourself. One approach would be:
def undefined?(key)
# write your definition of undefined here
key.nil? || key.length == 0
end
def merge_value(existing_value, new_value)
return new_value if existing_value.nil?
# your example seemed to add values
existing_value + new_value
end
def corrected_key(key)
return "undefined" if undefined?(key)
key
end
#raw_data = {nil => 2, "" => 2, "Berlin" => 1, "Paris" => 1}
#raw_data.reduce({}) do |acc, h|
k, v = h
acc[corrected_key(k)] = merge_value(acc[corrected_key(k)], v)
acc
end
# {"undefined" => 4, "Berlin" => 1, "Paris" => 1}

Ruby: Link two arrays of objects by attribute value

I'm pretty new in Ruby programming. In Ruby there are plenty ways to write elegant code. Is there any elegant way to link two arrays with objects of the same type by attribute value?
It's hard to explain. Let's look at the next example:
a = [ { :id => 1, :value => 1 }, { :id => 2, :value => 2 }, { :id => 3, :value => 3 } ]
b = [ { :id => 1, :value => 2 }, { :id => 3, :value => 4 } ]
c = link a, b
# Result structure after linkage.
c = {
"1" => {
:a => { :id => 1, :value => 1 },
:b => { :id => 1, :value => 1 }
},
"3" => {
:a => { :id => 3, :value => 3 },
:b => { :id => 3, :value => 4 }
}
}
So the basic idea is to get pairs of objects from different arrays by their common ID and construct a hash, which will give this pair by ID.
Thanks in advance.
If you want to take an adventure through Enumerable, you could say this:
(a.map { |h| [:a, h] } + b.map { |h| [:b, h] })
.group_by { |_, h| h[:id] }
.select { |_, a| a.length == 2 }
.inject({}) { |h, (n, v)| h.update(n => Hash[v]) }
And if you really want the keys to be strings, say n.to_s => Hash[v] instead of n => Hash[v].
The logic works like this:
We need to know where everything comes from we decorate the little hashes with :a and :b symbols to track their origins.
Then add the decorated arrays together into one list so that...
group_by can group things into almost-the-final-format.
Then find the groups of size two since those groups contain the entries that appeared in both a and b. Groups of size one only appeared in one of a or b so we throw those away.
Then a little injection to rearrange things into their final format. Note that the arrays we built in (1) just somehow happen to be in the format that Hash[] is looking for.
If you wanted to do this in a link method then you'd need to say things like:
link :a => a, :b => b
so that the method will know what to call a and b. This hypothetical link method also easily generalizes to more arrays:
def link(input)
input.map { |k, v| v.map { |h| [k, h] } }
.inject(:+)
.group_by { |_, h| h[:id] }
.select { |_, a| a.length == input.length }
.inject({}) { |h, (n, v)| h.update(n => Hash[v]) }
end
link :a => [...], :b => [...], :c => [...]
I assume that, for any two elements h1 and h2 of a (or of b), h1[:id] != h2[:id].
I would do this:
def convert(arr) Hash[arr.map {|h| [h[:id], h]}] end
ah, bh = convert(a), convert(b)
c = ah.keys.each_with_object({}) {|k,h|h[k]={a: ah[k], b: bh[k]} if bh.key?(k)}
# => {1=>{:a=>{:id=>1, :value=>1}, :b=>{:id=>1, :value=>2}},
# 3=>{:a=>{:id=>3, :value=>3}, :b=>{:id=>3, :value=>4}}}
Note that:
ah = convert(a)
# => {1=>{:id=>1, :value=>1}, 2=>{:id=>2, :value=>2}, 3=>{:id=>3, :value=>3}}
bh = convert(b)
# => {1=>{:id=>1, :value=>2}, 3=>{:id=>3, :value=>4}}
Here's a second approach. I don't like it as well, but it represents a different way of looking at the problem.
def sort_by_id(a) a.sort_by {|h| h[:id]} end
c = Hash[*sort_by_id(a.select {|ha| b.find {|hb| hb[:id] == ha[:id]}})
.zip(sort_by_id(b))
.map {|ha,hb| [ha[:id], {a: ha, b: hb}]}
.flatten]
Here's what's happening. The first step is to select only the elements ha of a for which there is an element hb of b for which ha[:id] = hb[id]. Then we sort both (what's left of) a and b on h[:id], zip them together and then make the hash c.
r1 = a.select {|ha| b.find {|hb| hb[:id] == ha[:id]}}
# => [{:id=>1, :value=>1}, {:id=>3, :value=>3}]
r2 = sort_by_id(r1)
# => [{:id=>1, :value=>1}, {:id=>3, :value=>3}]
r3 = sort_by_id(b)
# => [{:id=>1, :value=>2}, {:id=>3, :value=>4}]
r4 = r2.zip(r3)
# => [[{:id=>1, :value=>1}, {:id=>1, :value=>2}],
# [{:id=>3, :value=>3}, {:id=>3, :value=>4}]]
r5 = r4.map {|ha,hb| [ha[:id], {a: ha, b: hb}]}
# => [[1, {:a=>{:id=>1, :value=>1}, :b=>{:id=>1, :value=>2}}],
# [3, {:a=>{:id=>3, :value=>3}, :b=>{:id=>3, :value=>4}}]]
r6 = r5.flatten
# => [1, {:a=>{:id=>1, :value=>1}, :b=>{:id=>1, :value=>2}},
# 3, {:a=>{:id=>3, :value=>3}, :b=>{:id=>3, :value=>4}}]
c = Hash[*r6]
# => {1=>{:a=>{:id=>1, :value=>1}, :b=>{:id=>1, :value=>2}},
# 3=>{:a=>{:id=>3, :value=>3}, :b=>{:id=>3, :value=>4}}}
Ok, I've found the answer by myself. Here is a quite short line of code, which should do the trick:
Hash[a.product(b)
.select { |pair| pair[0][:id] == pair[1][:id] }
.map { |pair| [pair[0][:id], { :a => pair[0], :b => pair[1] }] }]
The product method gives us all possible pairs, then we filter them by equal IDs of pair elements. And then we map pairs to the special form, which will produce a Hash we are looking for.
So Hash[["key1", "value1"], ["key2", "value2"]] returns { "key1" => "value1", "key2" => "value2" }. And I use this to get the answer on my question.
Thanks.
P.S.: you can use pair.first instead of pair[0] and pair.last instead of pair[1] for better readability.
UPDATE
As Cary pointed out, it is better to replace |pair| with |ha, hb| to avoid these ugly indices:
Hash[a.product(b)
.select { |ha, hb| ha[:id] == hb[:id] }
.map { |ha, hb| [ha[:id], { :a => ha, :b => hb }] }]

Joining an array of keys to a hash with key value pairs like excel vlookup

I've got an unsorted array of keys like this:
keys = ["ccc", "ddd", "ggg", "aaa", "bbb"]
and a hash
hash = {"ddd" => 4, "aaa" => 1, "bbb" => 2, "eee" => 5, "fff" => 6}
I'd like to join these two data structures to return a hash in the original order of keys to the first keys:
{"ccc" => nil, "ddd" => 4, "ggg" => nil, "aaa" => 1, "bbb" => 2}
Items NOT in the hash (like "ggg") should return nil.
This is analogous to the "v-lookup" function in excel.
this is in ruby. Thanks!
Cryptic:
Hash[keys.zip(hash.values_at *keys)]
Or a bit longer, a bit less cryptic:
keys.map.with_object({}) {|key, memo| memo[key] = hash[key]}

Resources