Group by hash in Ruby - ruby

My goal is to take a hash of names and numbers, for example:
hash = {
"Matt" => 30,
"Dave" => 50,
"Alex" => 60
}
and to group them by whether they achieved a "passing" score. I'd like the results to be passed as an array into two separate keys, say :pass and :fail like this:
hash = { "pass" => ["Alex", 60], "fail" => [["Matt", 30], ["Dave", 50]] }
I know the group_by method is what I need, but am not sure as to how I would pass the values into the new keys.
The passing grade should be decided by the user. For this example, you could use 45.

You can do something like this:
PASSING_GRADE = 45
hash.group_by {|_, v| v >= PASSING_GRADE ? 'pass' : 'fail'}
Here is the result:
{"fail"=>[["Matt", 30]], "pass"=>[["Dave", 50], ["Alex", 60]]}
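Since the passing grade is meant to come from the user, the same group_by call could be wrapped in a method. This is only a minimal sketch: the method name group_by_grade and the merge into empty defaults are assumptions added for illustration. The defaults make sure both keys exist even when nobody passes or nobody fails, which plain group_by does not guarantee.

```ruby
# Hypothetical helper: groups name/score pairs by a caller-supplied threshold.
def group_by_grade(scores, passing_grade)
  grouped = scores.group_by { |_, score| score >= passing_grade ? "pass" : "fail" }
  # group_by omits a key when no pair maps to it, so merge into empty defaults.
  { "pass" => [], "fail" => [] }.merge(grouped)
end

scores = { "Matt" => 30, "Dave" => 50, "Alex" => 60 }
p group_by_grade(scores, 45)
#=> {"pass"=>[["Dave", 50], ["Alex", 60]], "fail"=>[["Matt", 30]]}
p group_by_grade(scores, 10)
#=> {"pass"=>[["Matt", 30], ["Dave", 50], ["Alex", 60]], "fail"=>[]}
```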

You could simply do something like this:
sorted = {"pass" => [], "fail" => []}
hash.each do |name, grade|
if grade >= PASSING_GRADE
sorted["pass"] << [name, grade]
else
sorted["fail"] << [name, grade]
end
end

Here are two ways:
#1
a = hash.to_a
{ "pass" => a.select { |_,v| v > 50 }, "fail" => a.reject { |_,v| v > 50 } }
#=> {"pass"=>[["Alex", 60]], "fail"=>[["Matt", 30], ["Dave", 50]]}
#2
[:pass, :fail].zip(hash.to_a.partition { |_,v| v > 50 }).to_h
#=> {:pass=>[["Alex", 60]], :fail=>[["Matt", 30], ["Dave", 50]]}
These both give the return values as arrays of tuples, which you wanted for "fail" but not for "pass". Wouldn't that make it a pain to work with? Is it not better for both to be arrays of tuples?
Consider returning hashes for values:
{"pass" => hash.select { |_,v| v > 50 }, "fail" => hash.reject {|_,v| v > 50 }}
#=> {"pass"=>{"Alex"=>60}, "fail"=>{"Matt"=>30, "Dave"=>50}}
Would that not be more convenient?

Related

How to find the largest value of a hash in an array of hashes

In my array, I'm trying to retrieve the key with the largest value of "value_2", so in this case, "B":
myArray = [
"A" => {
"value_1" => 30,
"value_2" => 240
},
"B" => {
"value_1" => 40,
"value_2" => 250
},
"C" => {
"value_1" => 18,
"value_2" => 60
}
]
myArray.each do |array_hash|
array_hash.each do |key, value|
if value["value_2"] == array_hash.values.max
puts key
end
end
end
I get the error:
"comparison of Hash with Hash failed (ArgumentError)".
What am I missing?
The array in the question contains a single hash with three keys. If an array of single-key hashes was intended, it would be written:
arr = [{ "A" => { "value_1" => 30, "value_2" => 240 } },
{ "B" => { "value_1" => 40, "value_2" => 250 } },
{ "C" => { "value_1" => 18, "value_2" => 60 } }]
We can find the desired key as follows:
arr.max_by { |h| h.values.first["value_2"] }.keys.first
#=> "B"
See Enumerable#max_by. The steps are:
g = arr.max_by { |h| h.values.first["value_2"] }
#=> {"B"=>{"value_1"=>40, "value_2"=>250}}
a = g.keys
#=> ["B"]
a.first
#=> "B"
In calculating g, for
h = arr[0]
#=> {"A"=>{"value_1"=>30, "value_2"=>240}}
the block calculation is
a = h.values
#=> [{"value_1"=>30, "value_2"=>240}]
b = a.first
#=> {"value_1"=>30, "value_2"=>240}
b["value_2"]
#=> 240
Suppose now arr is as follows:
arr << { "D" => { "value_1" => 23, "value_2" => 250 } }
#=> [{"A"=>{"value_1"=>30, "value_2"=>240}},
# {"B"=>{"value_1"=>40, "value_2"=>250}},
# {"C"=>{"value_1"=>18, "value_2"=>60}},
# {"D"=>{"value_1"=>23, "value_2"=>250}}]
and we wish to return an array of all keys for which the value of "value_2" is maximum (["B", "D"]). We can obtain that as follows.
max_val = arr.map { |h| h.values.first["value_2"] }.max
#=> 250
arr.select { |h| h.values.first["value_2"] == max_val }.flat_map(&:keys)
#=> ["B", "D"]
flat_map(&:keys) is shorthand for:
flat_map { |h| h.keys }
which returns the same array as:
map { |h| h.keys.first }
See Enumerable#flat_map.
Code
p myArray.first.max_by{|k,v|v["value_2"]}.first
Output
"B"
I'd use:
my_array = [
"A" => {
"value_1" => 30,
"value_2" => 240
},
"B" => {
"value_1" => 40,
"value_2" => 250
},
"C" => {
"value_1" => 18,
"value_2" => 60
}
]
h = Hash[*my_array]
# => {"A"=>{"value_1"=>30, "value_2"=>240},
# "B"=>{"value_1"=>40, "value_2"=>250},
# "C"=>{"value_1"=>18, "value_2"=>60}}
k = h.max_by { |k, v| v['value_2'] }.first # => "B"
Hash[*my_array] takes the array of hashes and turns it into a single hash. Then max_by will iterate each key/value pair, returning an array containing the key value "B" and the sub-hash, making it easy to grab the key using first:
k = h.max_by { |k, v| v['value_2'] } # => ["B", {"value_1"=>40, "value_2"=>250}]
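I guess the idea of your solution is to loop through each hash element and compare the maximum value found so far with value["value_2"].
But you are getting an error at
if value["value_2"] == array_hash.values.max
because array_hash.values is an array of hashes, and max cannot compare one hash with another. With a single element no comparison takes place, so max simply returns that hash: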
{"A"=>{"value_1"=>30, "value_2"=>240}}.values.max
#=> {"value_1"=>30, "value_2"=>240}
It should be like this:
max = nil
max_key = ""
myArray.each do |array_hash|
  array_hash.each do |key, value|
    # Compare only "value_2", which is the field the question asks about.
    if max.nil? || value["value_2"] > max
      max = value["value_2"]
      max_key = key
    end
  end
end
# max_key #=> "B"
Another solution, which works when each element of the array is a hash with a single key:
myArray.map{ |h| h.transform_values{ |v| v["value_2"] } }.max_by{ |k| k.values }.keys.first
You asked "What am I missing?".
I think you are missing a proper understanding of the data structures that you are using. I suggest that you try printing the data structures and take a careful look at the results.
The simplest way is p myArray which gives:
[{"A"=>{"value_1"=>30, "value_2"=>240}, "B"=>{"value_1"=>40, "value_2"=>250}, "C"=>{"value_1"=>18, "value_2"=>60}}]
You can get prettier results using pp:
require 'pp'
pp myArray
yields:
[{"A"=>{"value_1"=>30, "value_2"=>240},
"B"=>{"value_1"=>40, "value_2"=>250},
"C"=>{"value_1"=>18, "value_2"=>60}}]
This helps you to see that myArray has only one element, a Hash.
You could also look at the expression array_hash.values.max inside the loop:
myArray.each do |array_hash|
p array_hash.values
end
gives:
[{"value_1"=>30, "value_2"=>240}, {"value_1"=>40, "value_2"=>250}, {"value_1"=>18, "value_2"=>60}]
Not what you expected? :-)
Given this, what would you expect to be returned by array_hash.values.max in the above loop?
Use p and/or pp liberally in your ruby code to help understand what's going on.
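To make the point concrete, here is a small sketch that reproduces the error and then finds the key. The helper name key_with_max and the merge step are assumptions made for illustration. Merging first means the same code works whether the array holds one big hash (as in the question) or several single-key hashes.

```ruby
my_array = [
  { "A" => { "value_1" => 30, "value_2" => 240 },
    "B" => { "value_1" => 40, "value_2" => 250 },
    "C" => { "value_1" => 18, "value_2" => 60 } }
]

# Calling max on an array of several hashes fails, because two hashes
# cannot be ordered with <=>.
begin
  my_array.first.values.max
rescue ArgumentError => e
  puts e.message
end

# Hypothetical helper: collapse all hashes into one, then compare by one field.
def key_with_max(array_of_hashes, field)
  merged = array_of_hashes.reduce({}, :merge)
  key, _ = merged.max_by { |_, v| v[field] }
  key
end

p key_with_max(my_array, "value_2") #=> "B"
```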

Most performant way to group/summarise two hashes?

I have two hashes with some data that I need to aggregate. The first one is a mapping of which ids (id_1, id_2, id_3, id_4) belong under what category (a, b, c):
hash_1 = {'a' => ['id_1','id_2'], 'b' => ['id_3'], 'c' => ['id_4']}
The second hash holds values of how many events happened per id for a given date (date_1, date_2, date_3):
hash_2 = {
'id_1' => {'date_1' => 5, 'date_2' => 6, 'date_3' => 8},
'id_2' => {'date_1' => 0, 'date_3' => 6},
'id_3' => {'date_1' => 0, 'date_2' => nil, 'date_3' => 1},
'id_4' => {'date_1' => 10, 'date_2' => 1}
}
What I want is to get the total event per category (a,b,c). For the above example, the result would look something like:
hash_3 = {'a' => (5+6+8+0+6), 'b' => (0+0+1), 'c' => (10+1)}
My problem is, that there are about 5000 categories, each pointing to typically 1 to 3 ids, and each ID having event counts for 30 dates or more. So this takes quite a bit of computation. What will be the most performant (time effective) way to do this grouping in Ruby?
update
This is what I have tried so far (it took 6-8 seconds, which is horribly slow):
def total_clicks_per_category
{}.tap do |res|
hash_1.each do |cat, ids|
res[cat] = total_event_per_ids(ids)
end
end
end
def total_event_per_ids(ids)
ids.reduce(0) do |memo, id|
events = hash_2.fetch(id, {})
memo + (events.values.reduce(:+) || 0)
end
end
P.S. I’m using Ruby 2.3.
I'm writing this on a phone so I cannot test it right now, but it looks OK. Note that Array#sum requires Ruby 2.4 or later; on Ruby 2.3, replace sum with inject(0, :+).
g = hash_2.each_with_object({}) { |(k,v),g| g[k] = v.values.compact.sum }
hash_3 = hash_1.each_with_object({}) { |(k,v),h| h[k] = g.values_at(*v).sum }
First, create an intermediate hash that holds the sum of hash_2:
hash_4 = hash_2.map{|k, v| [k, v.values.compact.inject(0, :+)]}.to_h
# => {"id_1"=>19, "id_2"=>6, "id_3"=>1, "id_4"=>11}
Then do the final summation:
hash_3 = hash_1.map{|k, v| [k, v.map{|k| hash_4[k]}.inject(:+)]}.to_h
# => {"a"=>25, "b"=>1, "c"=>11}
Theory
5000*3*30 isn't that many. Ruby probably will need a second at most for this kind of job.
Hash lookup is fast by default, you won't be able to optimize much.
You could pre-calculate hash_2_sum, though :
hash_2_sum = {
'id_1' => 5+6+8,
'id_2' => 0+6,
'id_3' => 0+0+1,
'id_4' => 10+1
}
A loop on hash_1 with hash_2_sum lookup, and you're done.
Code
Your example has been updated with some nil values. You need to remove them with compact, and make sure the sum is 0 when no element is found with inject(0, :+):
hash_1 = {'a' => ['id_1','id_2'], 'b' => ['id_3'], 'c' => ['id_4']}
hash_2 = {
'id_1' => { 'date_1' => 5, 'date_2' => 6, 'date_3' => 8 },
'id_2' => { 'date_1' => 0, 'date_3' => 6 },
'id_3' => { 'date_1' => 0, 'date_2' => nil, 'date_3' => 1 },
'id_4' => { 'date_1' => 10, 'date_2' => 1 }
}
hash_2_sum = hash_2.each_with_object({}) do |(key, dates), sum|
sum[key] = dates.values.compact.inject(0, :+)
end
hash_3 = hash_1.each_with_object({}) do |(key, ids), sum|
sum[key] = hash_2_sum.values_at(*ids).inject(0, :+)
end
# {"a"=>25, "b"=>1, "c"=>11}
Note
{}.tap do |res|
hash_1.each do |cat, ids|
res[cat] = total_event_per_ids(ids)
end
end
isn't very readable IMHO.
You can either use each_with_object or Array#to_h :
result = [1, 2, 3].each_with_object({}) do |i, hash|
hash[i] = i * i
end
#=> {1=>1, 2=>4, 3=>9}
result = [1, 2, 3].map { |i| [i, i * i] }.to_h
#=> {1=>1, 2=>4, 3=>9}
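To check how long the two-pass approach takes on data of roughly the size described in the question, something like the following could be used. It is a rough sketch under assumptions: the synthetic data, the key names and the method name summarise are made up for illustration, and the timing will differ from machine to machine. It uses inject rather than sum so that it also runs on Ruby 2.3.

```ruby
require 'benchmark'

# Synthetic data (sizes taken from the question): 5000 categories,
# 3 ids per category, 30 dates per id, with nil for odd-numbered dates.
hash_1 = {}
hash_2 = {}
5000.times do |c|
  ids = Array.new(3) { |i| "id_#{c}_#{i}" }
  hash_1["cat_#{c}"] = ids
  ids.each do |id|
    hash_2[id] = (1..30).each_with_object({}) do |d, h|
      h["date_#{d}"] = d.even? ? d : nil
    end
  end
end

# Hypothetical helper: sum each id once, then sum the ids of each category.
def summarise(categories, events)
  sums = events.each_with_object({}) do |(id, dates), acc|
    acc[id] = dates.values.compact.inject(0, :+)
  end
  categories.each_with_object({}) do |(cat, ids), acc|
    # compact drops ids that have no entry in events
    acc[cat] = sums.values_at(*ids).compact.inject(0, :+)
  end
end

result = nil
elapsed = Benchmark.realtime { result = summarise(hash_1, hash_2) }
puts "summarised #{result.size} categories in #{elapsed.round(3)}s"
```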

How to get the next hash element from hash?

I have this hash:
HASH = {
'x' => { :amount => 0 },
'c' => { :amount => 5 },
'q' => { :amount => 10 },
'y' => { :amount => 20 },
'n' => { :amount => 50 }
}
How can I get the key with the next highest amount from the hash?
For example, if I supply x, it should return c. If there is no higher amount, then the key with the lowest amount should be returned. That means when I supply n, then x would be returned.
Can anybody help?
I'd use something like this:
def next_higher(key)
amount = HASH[key][:amount]
sorted = HASH.sort_by { |_, v| v[:amount] }
sorted.find(sorted.method(:first)) { |_, v| v[:amount] > amount }.first
end
next_higher "x" #=> "c"
next_higher "n" #=> "x"
I'd do something like this:
def find_next_by_amount(hash, key)
sorted = hash.sort_by { |_, v| v[:amount] }
index_of_next = sorted.index { |k, _| k == key }.next
sorted.fetch(index_of_next, sorted.first).first
end
find_next_by_amount(HASH, 'x')
# => "c"
find_next_by_amount(HASH, 'n')
# => "x"
Something like this:
def next_key(key)
  amount = HASH[key][:amount]
  kv_pairs = HASH.select { |_, v| v[:amount] > amount }
  # Wrap around to the lowest amount when nothing is higher.
  (kv_pairs.empty? ? HASH : kv_pairs).min_by { |_, v| v[:amount] }.first
end
I'm curious: why would you want something like that? Maybe there is a better solution to the underlying task.
EDIT: Realized that the hash isn't necessarily sorted by amount, so I adapted the code for unsorted hashes.
One way:
A = HASH.sort_by { |_,h| h[:amount] }.map(&:first)
#=> ['x', 'c', 'q', 'y', 'n']
(If HASH's keys are already in the correct order, this is just A = HASH.keys.)
def next_one(x)
A[(A.index(x)+1)%A.size]
end
next_one 'x' #=> 'c'
next_one 'q' #=> 'y'
next_one 'n' #=> 'x'
Alternatively, you could create a hash instead of a method:
e = A.cycle
#=> #<Enumerator: ["x", "c", "q", "y", "n"]:cycle>
g = A.size.times.with_object({}) { |_,g| g.update(e.next=>e.peek) }
#=> {"x"=>"c", "c"=>"q", "q"=>"y", "y"=>"n", "n"=>"x"}
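The same lookup table can also be built without an enumerator by pairing each key with its successor in a rotated copy of the sorted keys. This is a sketch only; the helper name successor_map is hypothetical and not part of any answer above.

```ruby
amounts = {
  'x' => { :amount => 0 },
  'c' => { :amount => 5 },
  'q' => { :amount => 10 },
  'y' => { :amount => 20 },
  'n' => { :amount => 50 }
}

# Hypothetical helper: maps every key to the key with the next highest amount,
# wrapping from the highest back to the lowest.
def successor_map(hash)
  keys = hash.sort_by { |_, v| v[:amount] }.map(&:first)
  # rotate shifts the keys left by one, so zip pairs each key with the next one
  keys.zip(keys.rotate).to_h
end

p successor_map(amounts)
#=> {"x"=>"c", "c"=>"q", "q"=>"y", "y"=>"n", "n"=>"x"}
```

Building the table once and looking keys up afterwards avoids re-sorting on every call.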

Create array of objects from hash keys and values

I have a collection of product codes in an array: @codes. I then check to see how many instances of each product I have:
@popular = Hash.new(0)
@codes.each do |v|
  @popular[v] += 1
end
This produces a hash like { code1 => 5, code2 => 12}. What I really need is a nice array of the form:
[ {:code => code1, :frequency => 5}, {:code => code2, :frequency => 12} ]
How do I build an array like that from the hashes I'm producing? Alternatively, is there a more direct route? The objects in question are ActiveModel objects with :code as an attribute. Thanks in advance!
@popular.map { |k, v| { code: k, frequency: v } }
This will produce an array of Hashes. If you need an array of models, replace the inner {...} with an appropriate constructor.
Change your code to
@codes.each_with_object([]) do |code, a|
  if h = a.find{|h| h[:code] == code}
    h[:frequency] += 1
  else
    # The first occurrence counts as 1.
    a.push(code: code, frequency: 1)
  end
end
For speed:
@codes.group_by{|e| e}.map{|k, v| {code: k, frequency: v.length}}
Not the most efficient, but this is another way:
def counts(codes)
codes.uniq.map { |e| { code: e, frequency: codes.count(e) } }
end
codes = %w{code5 code12 code5 code3 code5 code12 code7}
#=> ["code5", "code12", "code5", "code3", "code5", "code12", "code7"]
counts(codes)
#=> [{:code=>"code5", :frequency=>3}, {:code=>"code12", :frequency=>2},
# {:code=>"code3", :frequency=>1}, {:code=>"code7" , :frequency=>1}]
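For comparison, here is a single-pass count followed by the reshaping step, written as one method. This is a sketch; the method name code_frequencies is made up for illustration. It avoids rescanning the array for every unique code, which is what makes the uniq/count version above slower on long lists.

```ruby
# Hypothetical helper: one pass to count, one pass to reshape.
def code_frequencies(codes)
  counts = codes.each_with_object(Hash.new(0)) { |code, h| h[code] += 1 }
  counts.map { |code, n| { code: code, frequency: n } }
end

codes = %w[code5 code12 code5 code3 code5 code12 code7]
p code_frequencies(codes)
#=> [{:code=>"code5", :frequency=>3}, {:code=>"code12", :frequency=>2},
#    {:code=>"code3", :frequency=>1}, {:code=>"code7", :frequency=>1}]
```

On Ruby 2.7 or later, codes.tally returns the same counts hash directly.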

Best way to merge key value pairs in a hash based on number of values for that key in Ruby

I have a hash of arrays in Ruby:
@people = { "a" => ["john", "mark", "tony"], "b"=> ["tom","tim"],
"c" =>["jane"], "others"=>["rob", "ryan"] }
I would like to merge all key/value pairs whose array has fewer than 3 items. They should be merged into the key called "others", giving roughly this result:
@people = { "a" => ["john", "mark", "tony"],
"others"=> ["rob", "ryan", "tom", "tim", "jane"] }
The following code is problematic because a hash cannot contain duplicate keys:
@people = Hash[@people.map{|k,v| v.count<3 ? ["others",v] : [k,v]} ]
What's the best way to solve this elegantly?
You almost have it, the problem is, as you notice, that you can't build the Hash's key/value pairs on the fly because of duplicates. One way around the problem is to start out with the skeleton of what you're trying to build:
@people = @people.each_with_object({ 'others' => [ ] }) do |(k,v), h|
  if(v.length >= 3)
    h[k] = v
  else
    h['others'] += v
  end
end
Or, if you don't like each_with_object, you could:
h = { 'others' => [ ] }
@people.each do |k, v|
  # as above
end
@people = h
Or you could use pretty much the same structure with inject (taking care, as usual, to return the right thing from the block).
There are certainly other ways to do this but these approaches are pretty clear and easy to understand; IMO clarity should be your first goal.
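The inject variant mentioned above might look like the following. Treat it as a sketch: the method name merge_small_groups, the min_size parameter and the explicit check for the "others" key are assumptions added here. The check keeps an existing "others" entry from overwriting what has already been collected if it happens to hold min_size or more names.

```ruby
# Hypothetical helper: folds every group smaller than min_size into "others".
def merge_small_groups(people, min_size = 3)
  people.inject({ "others" => [] }) do |memo, (key, names)|
    if key != "others" && names.length >= min_size
      memo[key] = names
    else
      memo["others"] += names
    end
    memo # inject passes the block's return value on as the next accumulator
  end
end

people = { "a" => ["john", "mark", "tony"], "b" => ["tom", "tim"],
           "c" => ["jane"], "others" => ["rob", "ryan"] }
p merge_small_groups(people)
#=> {"others"=>["tom", "tim", "jane", "rob", "ryan"], "a"=>["john", "mark", "tony"]}
```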
try:
>> @people = { "a" => ["john", "mark", "tony"], "b"=> ["tom","tim"],
"c" =>["jane"], "others"=>["rob", "ryan"] }
>> @new_people = {"others" => []}
>> @people.each_pair {|k,v| (v.size >= 3 && k!="others") ? @new_people.merge!(k=>v) : @new_people['others']+= v}
>> @new_people
=> {"others"=>["rob", "ryan", "jane", "tom", "tim"], "a"=>["john", "mark", "tony"]}
Hash[ @people.group_by { |k,v| v.size < 3 ? 'others' : k }.
map { |k,v| [k, v.flat_map(&:last)] } ]
=> {"a"=>["john", "mark", "tony"],
"others"=>["tom", "tim", "jane", "rob", "ryan"]}
What about this:
> three_or_more, others = @people.partition {|(key, values)| values.size >= 3 }
> Hash[three_or_more]
# => {"a"=>["john", "mark", "tony"]}
> Hash["others" => others.map {|o| o.last}.flatten]
# => {"others"=>["tom", "tim", "jane", "rob", "ryan"]}
@people[:others] = []
@people.each do |k, v|
  @people[:others] |= @people.delete(k) if v.size < 3
end
@people.inject({}) do |m, (k, v)|
  m[i = v.size >= 3 ? k : 'others'] = m[i].to_a + v
  m
end
