Ignoring nil parameters in array select - ruby

Example
The user has made a search and has specified an age but not a number of children, i.e. wants to find people with any number of children, including zero.
array = []
obj1 = {:name => "Steve", :age => 32, :children => 2}
obj2 = {:name => "Dave", :age => 37, :children => 4}
obj3 = {:name => "Barry", :age => 40, :children => 0}
array << obj1
array << obj2
array << obj3
puts array
## replicating incoming search parameters
params = {}
params[:age] = 35
params[:children] = nil
matching_params = array.select{|person| person[:age] > params[:age] && person[:children] > params[:children]}
If I run this code, I will get an error that it can't compare a number with nil.
Workaround
If I change the children part to && person[:children] > (params[:children] || -1) then this will show all people over 35. For this situation, it does what's required.
Problem
Imagine, however, a world where it's possible to have a negative number of children...I would have to change -1 to minus infinity. Is there some way to just exclude any search criteria which have nil values?

Check each parameter for nil explicitly, so that a nil criterion always passes:
array = [
{:name => "Steve", :age => 32, :children => 2},
{:name => "Dave", :age => 37, :children => 4},
{:name => "Barry", :age => 40, :children => 0}
]
params = {age: 35, children: nil}
matching_params =
array.select do |person|
(params[:age].nil? || person[:age] > params[:age]) &&
(params[:children].nil? || person[:children] > params[:children])
end
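If more search fields are added later, the same idea can be generalised so each field does not need its own nil check. The following is only a sketch: `select_above` is a hypothetical helper name, and it assumes every non-nil entry in params is a strict minimum for the matching key.

```ruby
# Hypothetical helper (not from the question): keep the people whose values
# exceed every non-nil minimum given in params.
def select_above(people, params)
  # Drop the criteria the user left blank, so they are never compared.
  active = params.reject { |_key, minimum| minimum.nil? }
  people.select do |person|
    active.all? { |key, minimum| person[key] > minimum }
  end
end

people = [
  { name: "Steve", age: 32, children: 2 },
  { name: "Dave",  age: 37, children: 4 },
  { name: "Barry", age: 40, children: 0 }
]

over_35 = select_above(people, { age: 35, children: nil })
names = over_35.map { |person| person[:name] }
p names
```

Because nil criteria are removed before any comparison, no sentinel such as -1 is needed, and negative values would be handled the same way as positive ones.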

Code
def select_by_age_and_children(arr, params)
arr.select do |p|
p[:age] > (params[:age] || -1) &&
p[:children] > (params[:children] || -1)
end
end
Examples
array = [
{:name => "Steve", :age => 32, :children => 2},
{:name => "Dave", :age => 37, :children => 4},
{:name => "Barry", :age => 40, :children => 0}
]
params = { age: 35, children: 1 }
select_by_age_and_children(array, params)
#=> [{:name=>"Dave", :age=>37, :children=>4}]
params = { age: 35, children: nil }
select_by_age_and_children(array, params)
#=> [{:name=>"Dave", :age=>37, :children=>4},
# {:name=>"Barry", :age=>40, :children=>0}]
params = { age: 0, children: 1 }
select_by_age_and_children(array, params)
#=> [{:name=>"Steve", :age=>32, :children=>2},
# {:name=>"Dave", :age=>37, :children=>4}]
params = { age: nil, children: nil }
select_by_age_and_children(array, params)
#=> [{:name=>"Steve", :age=>32, :children=>2},
# {:name=>"Dave", :age=>37, :children=>4},
# {:name=>"Barry", :age=>40, :children=>0}]
Alternative design
Consider writing the method as follows:
def select_by_age_and_children(arr, params)
arr.select { |p| p[:age] >= params[:age] && p[:children] >= params[:children] }
end
and then instead of calling it with, say:
params = { age: 35, children: nil }
call it with:
params = { age: 36, children: 0 }
As well as simplifying the code this avoids the need for the reader to understand what is meant by a nil value in params.

Related

How to find the largest value of a hash in an array of hashes

In my array, I'm trying to retrieve the key with the largest value of "value_2", so in this case, "B":
myArray = [
"A" => {
"value_1" => 30,
"value_2" => 240
},
"B" => {
"value_1" => 40,
"value_2" => 250
},
"C" => {
"value_1" => 18,
"value_2" => 60
}
]
myArray.each do |array_hash|
array_hash.each do |key, value|
if value["value_2"] == array_hash.values.max
puts key
end
end
end
I get the error:
"comparison of Hash with Hash failed (ArgumentError)".
What am I missing?
The array in the question actually contains a single hash with three keys. Assuming the intent was an array of three single-key hashes, it would be written:
arr = [{ "A" => { "value_1" => 30, "value_2" => 240 } },
{ "B" => { "value_1" => 40, "value_2" => 250 } },
{ "C" => { "value_1" => 18, "value_2" => 60 } }]
We can find the desired key as follows:
arr.max_by { |h| h.values.first["value_2"] }.keys.first
#=> "B"
See Enumerable#max_by. The steps are:
g = arr.max_by { |h| h.values.first["value_2"] }
#=> {"B"=>{"value_1"=>40, "value_2"=>250}}
a = g.keys
#=> ["B"]
a.first
#=> "B"
In calculating g, for
h = arr[0]
#=> {"A"=>{"value_1"=>30, "value_2"=>240}}
the block calculation is
a = h.values
#=> [{"value_1"=>30, "value_2"=>240}]
b = a.first
#=> {"value_1"=>30, "value_2"=>240}
b["value_2"]
#=> 240
Suppose now arr is as follows:
arr << { "D" => { "value_1" => 23, "value_2" => 250 } }
#=> [{"A"=>{"value_1"=>30, "value_2"=>240}},
# {"B"=>{"value_1"=>40, "value_2"=>250}},
# {"C"=>{"value_1"=>18, "value_2"=>60}},
# {"D"=>{"value_1"=>23, "value_2"=>250}}]
and we wish to return an array of all keys for which the value of "value_2" is maximum (["B", "D"]). We can obtain that as follows.
max_val = arr.map { |h| h.values.first["value_2"] }.max
#=> 250
arr.select { |h| h.values.first["value_2"] == max_val }.flat_map(&:keys)
#=> ["B", "D"]
flat_map(&:keys) is shorthand for:
flat_map { |h| h.keys }
which returns the same array as:
map { |h| h.keys.first }
See Enumerable#flat_map.
Code
p myArray.pop.max_by{|k,v|v["value_2"]}.first
Output
"B"
I'd use:
my_array = [
"A" => {
"value_1" => 30,
"value_2" => 240
},
"B" => {
"value_1" => 40,
"value_2" => 250
},
"C" => {
"value_1" => 18,
"value_2" => 60
}
]
h = Hash[*my_array]
# => {"A"=>{"value_1"=>30, "value_2"=>240},
# "B"=>{"value_1"=>40, "value_2"=>250},
# "C"=>{"value_1"=>18, "value_2"=>60}}
k = h.max_by { |k, v| v['value_2'] }.first # => "B"
Hash[*my_array] takes the array of hashes and turns it into a single hash. Then max_by will iterate each key/value pair, returning an array containing the key value "B" and the sub-hash, making it easy to grab the key using first:
k = h.max_by { |k, v| v['value_2'] } # => ["B", {"value_1"=>40, "value_2"=>250}]
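I guess the idea of your solution is to loop through each hash element and compare the maximum value found with value["value_2"].
But you are getting an error at
if value["value_2"] == array_hash.values.max
because array_hash.values is an array of hashes, and max cannot compare one hash with another. With a single element no comparison is needed, so max simply returns that hash: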
{"A"=>{"value_1"=>30, "value_2"=>240}}.values.max
#=> {"value_1"=>30, "value_2"=>240}
It should be like this:
max = nil
max_key = ""
myArray.each do |array_hash|
array_hash.each do |key, value|
if max.nil? || value["value_2"] > max
max = value["value_2"]
max_key = key
end
end
end
# max_key #=> "B"
Another solution:
myArray.map{ |h| h.transform_values{ |v| v["value_2"] } }.flat_map(&:to_a).max_by(&:last).first
You asked "What am I missing?".
I think you are missing a proper understanding of the data structures that you are using. I suggest that you try printing the data structures and take a careful look at the results.
The simplest way is p myArray which gives:
[{"A"=>{"value_1"=>30, "value_2"=>240}, "B"=>{"value_1"=>40, "value_2"=>250}, "C"=>{"value_1"=>18, "value_2"=>60}}]
You can get prettier results using pp:
require 'pp'
pp myArray
yields:
[{"A"=>{"value_1"=>30, "value_2"=>240},
"B"=>{"value_1"=>40, "value_2"=>250},
"C"=>{"value_1"=>18, "value_2"=>60}}]
This helps you to see that myArray has only one element, a Hash.
You could also look at the expression array_hash.values.max inside the loop:
myArray.each do |array_hash|
p array_hash.values
end
gives:
[{"value_1"=>30, "value_2"=>240}, {"value_1"=>40, "value_2"=>250}, {"value_1"=>18, "value_2"=>60}]
Not what you expected? :-)
Given this, what would you expect to be returned by array_hash.values.max in the above loop?
Use p and/or pp liberally in your ruby code to help understand what's going on.
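Once the printed structure shows that the array holds a single hash, the lookup can work on that hash directly. This is a small sketch under that assumption; the variable names are illustrative and not from the question.

```ruby
# One array element: the bare key => value pairs form a single implicit hash.
my_array = [
  "A" => { "value_1" => 30, "value_2" => 240 },
  "B" => { "value_1" => 40, "value_2" => 250 },
  "C" => { "value_1" => 18, "value_2" => 60 }
]

element_count = my_array.size   # number of elements in the outer array
inner = my_array.first          # the single hash holding "A", "B" and "C"

# max_by yields [key, value] pairs; compare on "value_2" and keep the key.
best_key, _best_value = inner.max_by { |_key, value| value["value_2"] }

p element_count
p best_key
```

Comparing on value["value_2"] rather than on the inner hashes themselves is what avoids the "comparison of Hash with Hash failed" error.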

Most performant way to group/summarise two hashes?

I have two hashes with some data that I need to aggregate. The first one is a mapping of which ids (id_1, id_2, id_3, id_4) belong under what category (a, b, c):
hash_1 = {'a' => ['id_1','id_2'], 'b' => ['id_3'], 'c' => ['id_4']}
The second hash holds values of how many events happened per id for a given date (date_1, date_2, date_3):
hash_2 = {
'id_1' => {'date_1' => 5, 'date_2' => 6, 'date_3' => 8},
'id_2' => {'date_1' => 0, 'date_3' => 6},
'id_3' => {'date_1' => 0, 'date_2' => nil, 'date_3' => 1},
'id_4' => {'date_1' => 10, 'date_2' => 1}
}
What I want is to get the total event per category (a,b,c). For the above example, the result would look something like:
hash_3 = {'a' => (5+6+8+0+6), 'b' => (0+0+1), 'c' => (10+1)}
My problem is, that there are about 5000 categories, each pointing to typically 1 to 3 ids, and each ID having event counts for 30 dates or more. So this takes quite a bit of computation. What will be the most performant (time effective) way to do this grouping in Ruby?
update
This is what I tried so far (took like 6-8 seconds!, horribly slow):
def total_clicks_per_category
{}.tap do |res|
hash_1.each do |cat, ids|
res[cat] = total_event_per_ids(ids)
end
end
end
def total_event_per_ids(ids)
ids.reduce(0) do |memo, id|
events = hash_2.fetch(id, {})
memo + (events.values.reduce(:+) || 0)
end
end
P.S. I’m using Ruby 2.3.
I'm writing this on a phone so I cannot test right now, but it looks OK. Note that Array#sum needs Ruby 2.4 or later; on Ruby 2.3, use inject(0, :+) instead.
g = hash_2.each_with_object({}) { |(k,v),g| g[k] = v.values.compact.sum }
hash_3 = hash_1.each_with_object({}) { |(k,v),h| h[k] = g.values_at(*v).sum }
First, create an intermediate hash that holds the sum of hash_2:
hash_4 = hash_2.map{|k, v| [k, v.values.compact.inject(:+)]}.to_h
# => {"id_1"=>19, "id_2"=>6, "id_3"=>1, "id_4"=>11}
Then do the final summation:
hash_3 = hash_1.map{|k, v| [k, v.map{|k| hash_4[k]}.inject(:+)]}.to_h
# => {"a"=>25, "b"=>1, "c"=>11}
Theory
5000*3*30 isn't that many. Ruby probably will need a second at most for this kind of job.
Hash lookup is fast by default, you won't be able to optimize much.
You could pre-calculate hash_2_sum, though :
hash_2_sum = {
'id_1' => 5+6+8,
'id_2' => 0+6,
'id_3' => 0+0+1,
'id_4' => 10+1
}
A loop on hash_1 with hash_2_sum lookup, and you're done.
Code
Your example has been updated with some nil values. You need to remove them with compact, and make sure the sum is 0 when no element is found with inject(0, :+):
hash_1 = {'a' => ['id_1','id_2'], 'b' => ['id_3'], 'c' => ['id_4']}
hash_2 = {
'id_1' => { 'date_1' => 5, 'date_2' => 6, 'date_3' => 8 },
'id_2' => { 'date_1' => 0, 'date_3' => 6 },
'id_3' => { 'date_1' => 0, 'date_2' => nil, 'date_3' => 1 },
'id_4' => { 'date_1' => 10, 'date_2' => 1 }
}
hash_2_sum = hash_2.each_with_object({}) do |(key, dates), sum|
sum[key] = dates.values.compact.inject(0, :+)
end
hash_3 = hash_1.each_with_object({}) do |(key, ids), sum|
sum[key] = hash_2_sum.values_at(*ids).inject(0, :+)
end
# {"a"=>25, "b"=>1, "c"=>11}
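To check the size estimate from the Theory section on your own machine, something like the following can be used. It is only a sketch: the data is synthetic, its shape (5000 categories, 3 ids each, 30 dates per id, half of them nil) is an assumption based on the numbers in the question, and the timing will vary by machine.

```ruby
require "benchmark"

# Synthetic data (assumed shape): 5000 categories, 3 ids each, 30 dates per id.
hash_1 = {}
hash_2 = {}
5000.times do |c|
  ids = Array.new(3) { |i| "id_#{c}_#{i}" }
  hash_1["cat_#{c}"] = ids
  ids.each do |id|
    # Even-numbered dates count one event, odd-numbered dates are nil.
    hash_2[id] = Hash[(1..30).map { |d| ["date_#{d}", d.even? ? 1 : nil] }]
  end
end

hash_3 = nil
elapsed = Benchmark.realtime do
  # Step 1: total events per id, ignoring nil counts.
  sums = hash_2.each_with_object({}) do |(id, dates), acc|
    acc[id] = dates.values.compact.inject(0, :+)
  end
  # Step 2: total events per category via lookups in the per-id sums.
  hash_3 = hash_1.each_with_object({}) do |(cat, ids), acc|
    acc[cat] = sums.values_at(*ids).inject(0, :+)
  end
end

puts format("%.3f s", elapsed)
```

If this runs quickly but the real code still takes several seconds, the time is probably being spent somewhere other than the summation itself, for example in building the hashes.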
Note
{}.tap do |res|
hash_1.each do |cat, ids|
res[cat] = total_event_per_ids(ids)
end
end
isn't very readable IMHO.
You can either use each_with_object or Array#to_h :
result = [1, 2, 3].each_with_object({}) do |i, hash|
hash[i] = i * i
end
#=> {1=>1, 2=>4, 3=>9}
result = [1, 2, 3].map { |i| [i, i * i] }.to_h
#=> {1=>1, 2=>4, 3=>9}

Averaging values across multiple hashes

EDIT I am accepting @CarySwoveland's answer because he got the closest on the first try, accounting for the most scenarios, and outputting the data into a hash so that you don't need to rely on order. Many honorable mentions though! Be sure to check out @ArupRakshit's answer as well if you want your output in an array!
I have an array of hashes like:
@my_hashes = [{"key1" => "10", "key2" => "5"...},{"key1" => "", "key2" => "9"...},{"key1" => "6", "key2" => "4"...}]
and I want an average for each key across the array. ie. 8.0,6.0...
Note that the hashes all have the exact same keys, in order, even if the value for the key is blank. Right now this works:
<%= @my_hashes[0].keys.each do |key| %>
<% sum = 0 %>
<% count = 0 %>
<% @my_hashes.each do |hash| %>
<% sum += hash[key].to_f %>
<% count += if hash[key].blank? then 0 else 1 end %>
<% end %>
<%= (sum/count) %>
<% end %>
but I feel like there may be a better way... any thoughts?
Do as below
@my_hashes = [{"key1" => "10", "key2" => "5"},{"key1" => "", "key2" => "9"},{"key1" => "6", "key2" => "4"}]
ar = @my_hashes[0].keys.map do |k|
a = @my_hashes.map { |h| h[k].to_f unless h[k].blank? }.compact
a.inject(:+)/a.size unless a.empty? #Accounting for "key1" => nil or "key1" => ""
end
ar # => [8.0, 6.0]
Another way:
@my_hashes = [ {"key1"=>"10", "key2"=>"5"},
{"key1"=> "", "key2"=>"9"},
{"key1"=> "6", "key2"=>"4"} ]
def avg(arr) arr.any? ? arr.reduce(:+)/arr.size.to_f : 0.0 end
(@my_hashes.each_with_object(Hash.new { |h,k| h[k]=[] }) {
|mh,h| mh.keys.each { |k| h[k] << mh[k].to_f unless mh[k].empty? } })
.each_with_object({}) { |(k,v),h| h[k] = avg(v) }
# => {"key1"=>8.0, "key2"=>6.0}
The object created by the first each_with_object is a hash whose default value is an empty array. That hash is represented by the block variable h. This means that if h[k] << mh[k].to_f is to be executed when h.key?(k) => false, h[k] = [] is executed first.
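The default-block behaviour described above can be seen in isolation with a short sketch. The variable names here are illustrative only.

```ruby
# A hash whose default block stores a fresh empty array under any missing key.
grouped = Hash.new { |hash, key| hash[key] = [] }

grouped["key1"] << 10.0   # "key1" is missing: the block stores [] first, then 10.0 is appended
grouped["key1"] << 6.0    # "key1" now exists, so the block is not called again
grouped["key2"] << 5.0

# Contrast: Hash.new([]) shares one default array and never stores it under a key.
shared = Hash.new([])
shared["x"] << 1

p grouped
p shared.keys
```

This is why the block form is the one to use when collecting values per key: each key gets its own array, and the key is actually added to the hash.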
One could alternatively drop the avg method and create a temporary variable before computing the averages:
h = @my_hashes.each_with_object(Hash.new { |hash,k| hash[k]=[] }) { |mh,acc|
mh.keys.each { |k| acc[k] << mh[k].to_f unless mh[k].empty? } }
h.each_with_object({}) { |(k,v),avgs|
avgs[k] = v.any? ? v.reduce(:+)/v.size.to_f : 0.0 }
I think I found a quite elegant solution.
Here is a sample array:
a = [
{:a => 2, :b => 10},
{:a => 4, :b => 20},
{:a => 2, :b => 10},
{:a => 8, :b => 40},
]
And the solution:
class Array
def average
self.reduce(:+) / self.size.to_f
end
end
r = a[0].keys.map do |key|
[key, a.map { |hash| hash[key] }.average]
end
puts Hash[*r.flatten]
Try this
@my_hashes = [{"key1" => "10", "key2" => "5"},{"key1" => "", "key2" => "9"},{"key1" => "6", "key2" => "4"}]
average_values = @my_hashes.map(&:values).transpose.map { |arr|
arr.map(&:to_f).inject(:+) / arr.size
}
with_keys = Hash[@my_hashes.first.keys.zip(average_values)]
average_values # => [5.333333333333333, 6.0]
with_keys # => {"key1"=>5.333333333333333, "key2"=>6.0}
If you want to exclude empty values from the average, change average_values to reject them first:
average_values = @my_hashes.map(&:values).transpose.map { |arr|
arr.reject!(&:empty?)
arr.map(&:to_f).inject(:+) / arr.size
}
average_values # => [8.0, 6.0]
No super clean solution, but I would write:
a = [
{:a => 2, :b => 10},
{:a => 4, :b => 20},
{:a => 2, :b => 10},
{:a => 8, :b => 40},
]
grouped = a.flat_map(&:to_a).group_by{|x,|x}
grouped.keys.each do |key|
len = grouped[key].size
grouped[key] = 1.0 * grouped[key].map(&:last).inject(:+) / len
end

How do I create a diff of hashes with a correction factor?

I want to compare hashes inside an array:
h_array = [
{:name => "John", :age => 23, :eye_color => "blue"},
{:name => "John", :age => 22, :eye_color => "green"},
{:name => "John", :age => 22, :eye_color => "black"}
]
get_diff(h_array, correct_factor = 2)
# should return [{:eye_color => "blue"}, {:eye_color => "green"}, {:eye_color => "black"}]
get_diff(h_array, correct_factor = 3)
# should return
# [[{:age => 23}, {:age => 22}, {:age => 22}],
# [{:eye_color => "blue"}, {:eye_color => "green"}, {:eye_color => "black"}]]
I want to diff the hashes contained in the h_array. It looks like a recursive call/method because the h_array can have multiple hashes but with the same number of keys and values. How can I implement the get_diff method?
def get_diff h_array, correct_factor
h_array.first.keys.reject{|k|
h_array.map{|h| h[k]}.sort.chunk{|e| e}.map{|_,e| e.size}.max >= correct_factor
}.map{|k|
h_array.map{|hash| hash.select{|key,_| k == key}}
}
end
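The method above comes without an explanation, so here is one possible reading of it as a sketch. It assumes a key counts as "the same" when some value occurs at least correct_factor times across the hashes. `get_diff_sketch` is a hypothetical name, and it uses group_by where the original uses sort and chunk.

```ruby
# Hypothetical restatement of get_diff, written out step by step.
def get_diff_sketch(h_array, correct_factor)
  # Keys that do NOT have any value repeated correct_factor times or more.
  differing = h_array.first.keys.reject do |key|
    counts = h_array.group_by { |h| h[key] }.values.map(&:size)
    counts.max >= correct_factor
  end
  # For each differing key, collect a one-pair hash from every input hash.
  differing.map { |key| h_array.map { |h| { key => h[key] } } }
end

h_array = [
  { name: "John", age: 23, eye_color: "blue" },
  { name: "John", age: 22, eye_color: "green" },
  { name: "John", age: 22, eye_color: "black" }
]

diff_2 = get_diff_sketch(h_array, 2)
diff_3 = get_diff_sketch(h_array, 3)
p diff_2
p diff_3
```

With a factor of 2 only :eye_color differs, because :age has a value that occurs twice. With a factor of 3 both :age and :eye_color differ. The result is always an array of arrays, one inner array per differing key, so for a factor of 2 it is nested one level deeper than the flat array shown in the question.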
class Array
def find_ndups # also returns the number of items
uniq.map { |v| diff = (self.size - (self-[v]).size); (diff > 1) ? [v, diff] : nil}.compact
end
end
h_array = [
{:name => "John", :age => 22, :eye_color => "blue", :hair => "black"},
{:name => "John", :age => 33, :eye_color => "orange", :hair => "green"},
{:name => "John", :age => 22, :eye_color => "black", :hair => "yello"}
]
def get_diff(h_array, correct_factor)
temp = h_array.inject([]){|result, element| result << element.to_a}
master_array = []
unmatched_arr = []
matched_arr = []
temp = temp.transpose
temp.each_with_index do |arr, index|
ee = arr.find_ndups
if ee.length == 0
unmatched_arr << temp[index].inject([]){|result, arr| result << {arr.first => arr.last} }
elsif ee[0][1] < correct_factor
unmatched_arr << temp[index].inject([]){|result, arr| result << {arr.first => arr.last} }
else # the duplicate count is at least correct_factor
matched_arr << temp[index].inject([]){|result, arr| result << {arr.first => arr.last} }
end
end
return [matched_arr, unmatched_arr]
end
puts get_diff(h_array, 2).inspect
hope it helps
I also found the ActiveSupport::CoreExtensions::Hash::Diff module.
ActiveSupport 2.3.2 and 2.3.4 have a built-in Hash::Diff module that returns a hash representing the difference between two hashes.

How do I add values from two different arrays of hashes together?

I have two arrays of hashes. The keys for the hashes are different:
player_scores1 = [{:first_name=>"Bruce", :score => 43, :time => 50},
{:first_name=>"Clark", :score => 45, :minutes => 20}]
player_scores2 = [{:last_name=>"Wayne", :points => 13, :time => 40},
{:last_name=>"Kent", :points => 3, :minutes => 20}]
I'd like to create a new array of hashes which adds up :score and :points together and assign it to a key called :score. I'd also like to combine the :first_name and :last_name and assign it to a key called :full_name. I want to discard any other keys.
This would result in this array:
all_players = [{:full_name => "Bruce Wayne", :score => 56},
{:full_name => "Clark Kent", :score => 48}]
Is there an elegant way to do this?
Something like this:
player_scores1.zip(player_scores2).map { |a,b|
{
:full_name => a[:first_name]+' '+b[:last_name],
:score => a[:score]+b[:points]
}
}
The code you're looking for is:
final = []
player_scores1.each_index do |index|
entry_1 = player_scores1[index]
entry_2 = player_scores2[index]
score = entry_1[:score] + entry_2[:points]
final << {:full_name => "#{entry_1[:first_name]} #{entry_2[:last_name]}", :score => score }
end
Any suggestions on tightening this up would be much appreciated!
This works. I don't know if that's elegant enough though.
player_scores1 = [{:first_name=>"Bruce", :score => 43, :time => 50},
{:first_name=>"Clark", :score => 45, :minutes => 20}]
player_scores2 = [{:last_name=>"Wayne", :points => 13, :time => 40},
{:last_name=>"Kent", :points => 3, :minutes => 20}]
p (0...[player_scores1.length, player_scores2.length].min).map {|i| {
:full_name => player_scores1[i][:first_name] + " " + player_scores2[i][:last_name],
:score => player_scores1[i][:score] + player_scores2[i][:points]
}}
This uses zip with a block to loop over the paired hashes, joining the names and summing the scores:
all_players = []
player_scores1.zip(player_scores2) { |a, b|
all_players << {
:full_name => a[:first_name] + ' ' + b[:last_name],
:score => a[:score] + b[:points]
}
}
all_players # => [{:full_name=>"Bruce Wayne", :score=>56}, {:full_name=>"Clark Kent", :score=>48}]
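One thing to keep in mind with zip is that it pads with nil when the second array is shorter than the first, so b[:last_name] would raise an error for the unmatched entries. A sketch of one way to guard against that, using deliberately uneven sample data that is not from the question:

```ruby
player_scores1 = [{ first_name: "Bruce", score: 43 },
                  { first_name: "Clark", score: 45 }]
player_scores2 = [{ last_name: "Wayne", points: 13 }]   # deliberately shorter

# zip pairs elements by position; the missing partner for "Clark" becomes nil.
pairs = player_scores1.zip(player_scores2)

# Keep only the pairs where both sides are present.
complete = pairs.reject { |_a, b| b.nil? }

all_players = complete.map do |a, b|
  { full_name: "#{a[:first_name]} #{b[:last_name]}", score: a[:score] + b[:points] }
end
p all_players
```

Whether unmatched players should be dropped, as here, or reported as an error depends on the data. This sketch simply skips them.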
