Frequency of objects in an array using Ruby

If I have a list of balls, each of which has a color property, how can I cleanly get the list of balls with the most frequent color?
[m1,m2,m3,m4]
say,
m1.color = blue
m2.color = blue
m3.color = red
m4.color = blue
[m1,m2,m4] is the list of balls with the most frequent color
My Approach is to do:
[m1,m2,m3,m4].group_by{|ball| ball.color}.each do |samecolor|
my_items = samecolor.count
end
where count is defined as
class Array
def count
k =Hash.new(0)
self.each{|x|k[x]+=1}
k
end
end
my_items will be a hash of frequencies for each same-color group. My implementation could be buggy, and I feel there must be a better, smarter way.
Any ideas, please?

You found group_by but missed max_by
max_color, max_balls = [m1,m2,m3,m4].group_by {|b| b.color}.max_by {|color, balls| balls.length}
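A self-contained sketch of this approach, assuming a hypothetical Ball struct standing in for the asker's objects (the struct and its name field are illustrative and not part of the question):

```ruby
# Hypothetical stand-in for the asker's ball objects.
Ball = Struct.new(:name, :color)

m1 = Ball.new("m1", "blue")
m2 = Ball.new("m2", "blue")
m3 = Ball.new("m3", "red")
m4 = Ball.new("m4", "blue")

# group_by builds { color => [balls] }; max_by then picks the
# [color, balls] pair whose array of balls is longest.
max_color, max_balls = [m1, m2, m3, m4]
  .group_by { |b| b.color }
  .max_by { |_color, balls| balls.length }

puts max_color                      # blue
puts max_balls.map(&:name).inspect  # ["m1", "m2", "m4"]
```

Note that when two colors tie for most frequent, only one of the tied groups is returned.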

Your code isn't bad, but it is inefficient. If I were you I would seek a solution that iterates through your array only once, like this:
balls = [m1, m2, m3, m4]
most_idx = nil
groups = balls.inject({}) do |hsh, ball|
hsh[ball.color] = [] if hsh[ball.color].nil?
hsh[ball.color] << ball
most_idx = ball.color if hsh[most_idx].nil? || hsh[ball.color].size > hsh[most_idx].size
hsh
end
groups[most_idx] # => [m1,m2,m4]
This does basically the same thing as group_by, but at the same time it counts up the groups and keeps a record of which group is largest (most_idx).

How about:
color,balls = [m1,m2,m3,m4].group_by { |b| b.color }.max_by { |_color, group| group.size }

Here's how I'd do it. The basic idea uses inject to accumulate the values into a hash, and comes from "12 - Building a Histogram" in "The Ruby Cookbook".
#!/usr/bin/env ruby
class M
attr_reader :color
def initialize(c)
@color = c
end
end
m1 = M.new('blue')
m2 = M.new('blue')
m3 = M.new('red')
m4 = M.new('blue')
hash = [m1.color, m2.color, m3.color, m4.color].inject(Hash.new(0)){ |h, x| h[x] += 1; h } # => {"blue"=>3, "red"=>1}
hash = [m1, m2, m3, m4].inject(Hash.new(0)){ |h, x| h[x.color] += 1; h } # => {"blue"=>3, "red"=>1}
There are two ways to do it, depending on how much the inject() block should know about your objects.
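On Ruby 2.7 or newer, Enumerable#tally builds the same histogram without an explicit inject. A small sketch, assuming a Struct-based M in place of the class above (the select step at the end is an illustrative addition, not part of the answer):

```ruby
# Struct-based stand-in for the M class: one reader, color.
M = Struct.new(:color)
balls = %w[blue blue red blue].map { |c| M.new(c) }

# tally counts how often each element occurs (Ruby 2.7+).
counts = balls.map(&:color).tally
# counts["blue"] is 3 and counts["red"] is 1

# Pick the most frequent color, then the balls that carry it.
top_color, _count = counts.max_by { |_color, count| count }
most_frequent = balls.select { |b| b.color == top_color }

puts top_color           # blue
puts most_frequent.size  # 3
```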

This produces a list of [color, count] pairs sorted by frequency, highest first:
balls.group_by { |b| b.color }
.map { |k, v| [k, v.size] }
.sort_by { |k, count| -count}

Two parts: I'll use your balls example, and I'll also include my own Rails example.
ary = [m1,m2,m3,m4]
colors = ary.map(&:color) # or ary.map {|t| t.color }
Hash[colors.group_by {|w| w }.map {|w, ws| [w, ws.length] }]
#=> {"blue" => 3, "red" => 1 }
my ActiveRecord example
stocks = Sp500Stock.all
Hash[stocks.group_by(&:sector).map {|w, s| [w, s.length] }.sort_by { |k,v| v }]
#=> {"Health Care" => 36, etc}

myhash = {}
mylist.each do |ball|
if myhash[ball.color]
myhash[ball.color] += 1
else
myhash[ball.color] = 1
end
end
puts myhash.sort{|a,b| b[1] <=> a[1]}

Related

In Ruby, is it better to receive *array or to duplicate an array inside a method?

I was playing around with some implementations of Quicksort in Ruby. After implementing some of the in-place algorithms, I felt that Ruby's partition method, even though it does not give an in-place solution, would give a very nice, readable one.
My first solution was this, which other than always using the last element of the array as the pivot, seemed pretty nice.
def quick_sort3(ary)
return ary if ary.size <= 1
left,right = ary.partition { |v| v < ary.last }
pivot_value = right.pop
quick_sort3(left) + [pivot_value] + quick_sort3(right)
end
After some searching I found this answer which had a very similar solution with a better choice of the initial pivot, reproduced here using the same variable names and block passed to partition.
def quick_sort6(*ary)
return ary if ary.empty?
pivot_value = ary.delete_at(rand(ary.size))
left,right = ary.partition { |v| v < pivot_value }
return *quick_sort6(*left), pivot_value, *quick_sort6(*right)
end
I felt I could improve my solution by using the same method to select a random pivot.
def quick_sort4(ary)
return ary if ary.size <= 1
pivot_value = ary.delete_at(rand(ary.size))
left,right = ary.partition { |v| v < pivot_value }
quick_sort4(left) + [pivot_value] + quick_sort4(right)
end
The downside of quick_sort4 compared with quick_sort6 from the linked answer is that quick_sort4 changes the input array, while quick_sort6 does not. I am assuming this is why Jorg chose to receive a splatted array rather than an array?
My fix for this was to simply duplicate the passed in array and then perform the delete_at on the copied array rather than the original array.
def quick_sort5(ary_in)
return ary_in if ary_in.size <= 1
ary = ary_in.dup
pivot_value = ary.delete_at(rand(ary.size))
left,right = ary.partition { |v| v < pivot_value }
quick_sort5(left) + [pivot_value] + quick_sort5(right)
end
My question: are there any significant differences between quick_sort6, which uses splats, and quick_sort5, which uses dup? I am assuming the splats were used to avoid changing the input array, but is there something else I am missing?
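In terms of performance, the benchmark below reports quick_sort6 as by far the fastest, but read that figure with care: the benchmark calls quick_sort6(ra.dup) without a splat, so the method receives the whole array as a single element and returns almost immediately. The other three timings are directly comparable. Using some random data: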
require 'benchmark'
def quick_sort3(ary)
return ary if ary.size <= 1
left,right = ary.partition { |v| v < ary.last }
pivot_value = right.pop
quick_sort3(left) + [pivot_value] + quick_sort3(right)
end
def quick_sort6(*ary)
return ary if ary.empty?
pivot_value = ary.delete_at(rand(ary.size))
left,right = ary.partition { |v| v < pivot_value }
return *quick_sort6(*left), pivot_value, *quick_sort6(*right)
end
def quick_sort4(ary)
return ary if ary.size <= 1
pivot_value = ary.delete_at(rand(ary.size))
left,right = ary.partition { |v| v < pivot_value }
quick_sort4(left) + [pivot_value] + quick_sort4(right)
end
def quick_sort5(ary_in)
return ary_in if ary_in.size <= 1
ary = ary_in.dup
pivot_value = ary.delete_at(rand(ary.size))
left,right = ary.partition { |v| v < pivot_value }
quick_sort5(left) + [pivot_value] + quick_sort5(right)
end
random_arrays = Array.new(5000) do
Array.new(500) { rand(1...500) }.uniq
end
Benchmark.bm do |benchmark|
benchmark.report("quick_sort3") do
random_arrays.each do |ra|
quick_sort3(ra.dup)
end
end
benchmark.report("quick_sort6") do
random_arrays.each do |ra|
quick_sort6(ra.dup)
end
end
benchmark.report("quick_sort4") do
random_arrays.each do |ra|
quick_sort4(ra.dup)
end
end
benchmark.report("quick_sort5") do
random_arrays.each do |ra|
quick_sort5(ra.dup)
end
end
end
This gives the following result:
                 user     system      total        real
quick_sort3  1.389173   0.019380   1.408553 (  1.411771)
quick_sort6  0.004399   0.000022   0.004421 (  0.004487)
quick_sort4  1.208003   0.002573   1.210576 (  1.214131)
quick_sort5  1.458327   0.000867   1.459194 (  1.459882)
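When reading the quick_sort6 timing, it helps to check what the method returns when it is called with and without a splat. A minimal check, using the quick_sort6 definition from the question and a small illustrative array:

```ruby
def quick_sort6(*ary)
  return ary if ary.empty?
  pivot_value = ary.delete_at(rand(ary.size))
  left, right = ary.partition { |v| v < pivot_value }
  return *quick_sort6(*left), pivot_value, *quick_sort6(*right)
end

data = [3, 1, 2]

# Without the splat the method sees ONE argument (the whole array),
# so it hands that array back wrapped in another array, unsorted.
without_splat = quick_sort6(data)

# With the splat each element is a separate argument and gets sorted.
with_splat = quick_sort6(*data)

p without_splat  # [[3, 1, 2]]
p with_splat     # [1, 2, 3]
```

A fair comparison would therefore call quick_sort6(*ra) inside the benchmark; no timings for that variant are given here.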
The problem with splat style in this case is that it would create an awkward API.
Most times the consumer code would have an array of things that need to be sorted:
stuff = [1, 2, 3]
sort(stuff)
The splat style makes the consumers do this instead:
stuff = [1, 2, 3]
sort(*stuff)
The two calls might end up doing the same thing, but as a user I am sorting an array, therefore I expect to pass the array to the method, not pass each array element individually to the method.
Another label for this phenomenon is abstraction leakage: you are allowing the implementation of the sort method to define its interface. Usually in Ruby this is frowned upon.
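One way to keep the array-in, array-out interface argued for above while still leaving the caller's array untouched is to pick the pivot by index instead of deleting it. This is only a sketch: quick_sort_pure is a hypothetical name, not one of the numbered versions from the question.

```ruby
# Hypothetical variant: takes a plain array and never mutates it.
def quick_sort_pure(ary)
  return ary.dup if ary.size <= 1
  pivot_index = rand(ary.size)
  pivot_value = ary[pivot_index]
  # Build the remainder from slices, leaving the caller's array alone.
  rest = ary[0...pivot_index] + ary[(pivot_index + 1)..-1]
  left, right = rest.partition { |v| v < pivot_value }
  quick_sort_pure(left) + [pivot_value] + quick_sort_pure(right)
end

stuff = [5, 3, 8, 1, 9, 2, 3]
sorted = quick_sort_pure(stuff)
p sorted  # [1, 2, 3, 3, 5, 8, 9]
p stuff   # [5, 3, 8, 1, 9, 2, 3] (unchanged)
```

The slicing allocates new arrays at every level, much as dup does in quick_sort5, so this is about the interface rather than speed.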

How to sum values in an array with different hash

I want to sum the total values of the same items in an array.
I have an array as
[{"a"=>1},{"b"=>2},{"c"=>3},{"a"=>2},{"b"=>4}]
I want to get the result as
[{"a"=>3},{"b"=>6},{"c"=>3}]
Which method can do it?
if:
array = [{"a"=>1},{"b"=>2},{"c"=>3},{"a"=>2},{"b"=>4}]
then you can do:
array.inject(Hash.new{|h,k| h[k] = 0}) { |h, a| k, v = a.flatten; h[k] += v; h }.
map{|arr| Hash[*arr] }
#=> [{"a"=>3}, {"b"=>6}, {"c"=>3}]
or:
array.each_with_object(Hash.new{|h,k| h[k] = 0}) { |a, h| k, v = a.flatten; h[k] += v }.
map{|arr| Hash[*arr] }
#=> [{"a"=>3}, {"b"=>6}, {"c"=>3}]
It can be done as follows
array.group_by { |h| h.keys.first }.
values.
map {|x| x.reduce({}) { |h1, h2| h1.merge(h2) { |_, o, n| o + n } } }
#=> [{"a"=>3}, {"b"=>6}, {"c"=>3}]
Every time you want to transform a collection in a way that is not one-to-one, it's a job for #reduce. For one-to-one transformations we use #map.
array.reduce({}) { |acc, h| acc.merge(h) {|_k, o, n| o+n } }.zip.map(&:to_h)
# => [{"a"=>3}, {"b"=>6}, {"c"=>3}]
Here we use reduce with the initial value {}, which is passed to the block as the acc parameter; each element of the array arrives as h. We then use #merge with manual conflict resolution: the block is called only when the key being merged is already present in the method receiver, acc. Finally, zip.map(&:to_h) breaks the merged hash into an array of single-pair hashes.
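To see when the conflict block of Hash#merge runs, here is a small illustrative check. The calls array is only there to record which keys reached the block; the variable names are not from the answer.

```ruby
totals = { "a" => 1, "b" => 2 }
calls = []

merged = totals.merge({ "a" => 2, "c" => 3 }) do |key, old_val, new_val|
  calls << key        # record which keys triggered the block
  old_val + new_val   # resolve the conflict by summing
end

# "a" existed on both sides, so its values were summed to 3;
# "b" and "c" were copied over without the block being called.
p calls        # ["a"]
p merged["a"]  # 3
p merged["c"]  # 3
```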
There are many ways to do this. It is instructive to see a few, even some that may be unusual and/or not especially efficient.
Here is another way:
arr = [{"a"=>1},{"b"=>2},{"c"=>3},{"a"=>2},{"b"=>4}]
arr.flat_map(&:keys)
.uniq
.map { |k| { k=>arr.reduce(0) { |t,g| t + (g.key?(k) ? g[k] : 0) } } }
#=> [{"a"=>3}, {"b"=>6}, {"c"=>3}]
Since nil.to_i => 0, we could instead write reduce's block as:
{ |t,g| t+g[k].to_i }

Creating Hash of Hash from an Array of Array

I have an array:
values = [["branding", "color", "blue"],
["cust_info", "customer_code", "some_customer"],
["branding", "text", "custom text"]]
I am having trouble transforming it into a hash as follows:
{
"branding" => {"color"=>"blue", "text"=>"custom text"},
"cust_info" => {"customer_code"=>"some_customer"}
}
You can use default hash values to create something more legible than inject:
h = Hash.new {|hsh, key| hsh[key] = {}}
values.each {|a, b, c| h[a][b] = c}
Obviously, you should replace the h and a, b, c variables with your domain terms.
Bonus: If you find yourself needing to go N levels deep, check out autovivification:
fun = Hash.new { |h,k| h[k] = Hash.new(&h.default_proc) }
fun[:a][:b][:c][:d] = :e
# fun == {:a=>{:b=>{:c=>{:d=>:e}}}}
Or an overly-clever one-liner using each_with_object:
silly = values.each_with_object(Hash.new {|hsh, key| hsh[key] = {}}) {|(a, b, c), h| h[a][b] = c}
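The autovivifying hash above has a side effect worth knowing: merely reading a missing key creates it. A short sketch of that behavior, with key? and fetch as ways to look without creating anything (the key names are illustrative):

```ruby
fun = Hash.new { |h, k| h[k] = Hash.new(&h.default_proc) }
fun[:a][:b] = 1

# Reading a missing key runs the default block, which stores a new
# nested hash under that key.
fun[:typo][:oops]
p fun.key?(:typo)               # true

# key? and fetch with a fallback do not run the default block.
checked = Hash.new { |h, k| h[k] = Hash.new(&h.default_proc) }
p checked.key?(:missing)        # false
p checked.fetch(:missing, nil)  # nil
p checked.empty?                # true
```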
Here is an example using Enumerable#inject:
values = [["branding", "color", "blue"],
["cust_info", "customer_code", "some_customer"],
["branding", "text", "custom text"]]
# r is the value we are "injecting" and v represents each
# value in turn from the enumerable; here we create
# a new hash which will be the result hash (res == r)
res = values.inject({}) do |r, v|
group, key, value = v # array decomposition
r[group] ||= {} # make sure group exists
r[group][key] = value # set key/value in group
r # return value for next iteration (same hash)
end
There are several different ways to write this; I think the above is relatively simple. See extracting from 2 dimensional array and creating a hash with array values for using a Hash (i.e. grouper) with "auto vivification".
Less elegant but easier to understand:
hash = {}
values.each do |value|
if hash[value[0]]
hash[value[0]][value[1]] = value[2]
else
hash[value[0]] = {value[1] => value[2]}
end
end
values.inject({}) { |m, (k1, k2, v)| m[k1] = { k2 => v }.merge m[k1] || {}; m }

Ruby array with an extra state

I'm trying to go through an array and add a second dimension for true and false values in Ruby.
For example, I will be pushing arrays onto another array, which would look like:
a = [[1,2,3,4],[5]]
I would like to go through each array inside of "a" and be able to mark a state of true or false for each individual value, similar to a Map in Java.
Any ideas? Thanks.
You're better off starting with this:
a = [{ 1 => false, 2 => false, 3 => false, 4 => false }, { 5 => false }]
Then you can just flip the booleans as needed. Otherwise you will have to pollute your code with a bunch of tests to see if you have a Fixnum (1, 2, ...) or a Hash ({1 => true}) before you can test the flag's value.
Hashes in Ruby 1.9 are ordered so you wouldn't lose your ordering by switching to hashes.
You can convert your array to this form with one of these:
a = a.map { |x| Hash[x.zip([false] * x.length)] }
# or
a = a.map { |x| x.each_with_object({}) { |i,h| h[i] = false } }
And if using nil to mean "unvisited" makes more sense than starting with false then:
a = a.map { |x| Hash[x.zip([nil] * x.length)] }
# or
a = a.map { |x| x.each_with_object({}) { |i,h| h[i] = nil } }
Some useful references:
Hash[]
each_with_object
zip
Array *
If what you are trying to do is simply tag specific elements in the member arrays with boolean values, it is just a simple matter of doing the following:
current_value = a[i][j]
a[i][j] = [current_value, true_or_false]
For example if you have
a = [[1,2,3,4],[5]]
Then if you say
a[0][2] = [a[0][2],true]
then a becomes
a = [[1,2,[3,true],4],[5]]
You can roll this into a method
def tag_array_element(a, i, j, boolean_value)
a[i][j] = [a[i][j], boolean_value]
end
You might want to enhance this a little so you don't tag a specific element twice. :) To do so, just check if a[i][j] is already an array.
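A minimal sketch of that enhancement, assuming the original values are never arrays themselves, so an Array at a[i][j] always means "already tagged":

```ruby
# Tags a[i][j] as [value, flag]. If the element is already tagged,
# only the flag is replaced, so the value is never wrapped twice.
# Assumption: untagged values are never arrays.
def tag_array_element(a, i, j, boolean_value)
  current = a[i][j]
  value = current.is_a?(Array) ? current.first : current
  a[i][j] = [value, boolean_value]
end

a = [[1, 2, 3, 4], [5]]
tag_array_element(a, 0, 2, true)
tag_array_element(a, 0, 2, false)  # re-tagging replaces the flag
p a  # [[1, 2, [3, false], 4], [5]]
```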
Replace x % 2 == 0 with the actual operation you want for the mapping:
>> xss = [[1,2,3,4],[5]]
>> xss.map { |xs| xs.map { |x| {x => x % 2 == 0} } }
#=> [[{1=>false}, {2=>true}, {3=>false}, {4=>true}], [{5=>false}]]

Find most common string in an array

I have this array, for example (the size is variable):
x = ["1.111", "1.122", "1.250", "1.111"]
and I need to find the most common value ("1.111" in this case).
Is there an easy way to do that?
Thanks in advance!
EDIT #1: Thank you all for the answers!
EDIT #2: I've changed my accepted answer based on Z.E.D.'s information. Thank you all again!
Ruby < 2.2
#!/usr/bin/ruby1.8
def most_common_value(a)
a.group_by do |e|
e
end.values.max_by(&:size).first
end
x = ["1.111", "1.122", "1.250", "1.111"]
p most_common_value(x) # => "1.111"
Note: Enumerable#max_by is new with Ruby 1.9, but it has been backported to 1.8.7
Ruby >= 2.2
Ruby 2.2 introduces the Object#itself method, with which we can make the code more concise:
def most_common_value(a)
a.group_by(&:itself).values.max_by(&:size).first
end
As a monkey patch
Or as Enumerable#mode:
Enumerable.class_eval do
def mode
group_by do |e|
e
end.values.max_by(&:size).first
end
end
["1.111", "1.122", "1.250", "1.111"].mode
# => "1.111"
One pass through the array to accumulate the counts in a hash. Use .max() to find the hash entry with the largest value.
#!/usr/bin/ruby
a = Hash.new(0)
["1.111", "1.122", "1.250", "1.111"].each { |num|
a[num] += 1
}
a.max{ |a,b| a[1] <=> b[1] } # => ["1.111", 2]
or, roll it all into one line:
ary.inject(Hash.new(0)){ |h,i| h[i] += 1; h }.max{ |a,b| a[1] <=> b[1] } # => ["1.111", 2]
If you only want the item back add .first():
ary.inject(Hash.new(0)){ |h,i| h[i] += 1; h }.max{ |a,b| a[1] <=> b[1] }.first # => "1.111"
The first sample I used is how it would be done in Perl usually. The second is more Ruby-ish. Both work with older versions of Ruby. I wanted to compare them, plus see how Wayne's solution would speed things up so I tested with benchmark:
#!/usr/bin/env ruby
require 'benchmark'
ary = ["1.111", "1.122", "1.250", "1.111"] * 1000
def most_common_value(a)
a.group_by { |e| e }.values.max_by { |values| values.size }.first
end
n = 1000
Benchmark.bm(20) do |x|
x.report("Hash.new(0)") do
n.times do
a = Hash.new(0)
ary.each { |num| a[num] += 1 }
a.max{ |a,b| a[1] <=> b[1] }.first
end
end
x.report("inject:") do
n.times do
ary.inject(Hash.new(0)){ |h,i| h[i] += 1; h }.max{ |a,b| a[1] <=> b[1] }.first
end
end
x.report("most_common_value():") do
n.times do
most_common_value(ary)
end
end
end
Here are the results:
                          user     system      total        real
Hash.new(0)           2.150000   0.000000   2.150000 (  2.164180)
inject:               2.440000   0.010000   2.450000 (  2.451466)
most_common_value():  1.080000   0.000000   1.080000 (  1.089784)
You could sort the array and then loop over it once. In the loop, just keep track of the current item and the number of times it has been seen. Once the list ends or the item changes, set max_count = count if count > max_count. And of course keep track of which item has the max_count.
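A sketch of this sort-and-scan idea follows. The method name most_common_sorted is hypothetical, and for brevity the maximum is checked after every item rather than only when the item changes, which gives the same result.

```ruby
# Returns [item, count] for the most frequent item, found by
# sorting first and then scanning runs of equal neighbours.
def most_common_sorted(items)
  best_item = nil
  max_count = 0
  current = nil
  count = 0
  items.sort.each do |item|
    if item == current
      count += 1        # still inside the same run
    else
      current = item    # a new run starts here
      count = 1
    end
    if count > max_count
      max_count = count
      best_item = current
    end
  end
  [best_item, max_count]
end

p most_common_sorted(["1.111", "1.122", "1.250", "1.111"])  # ["1.111", 2]
```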
You could create a hash that stores the array items as keys, with each value being the number of times that item appears in the array.
For example:
hashmap = {}
["1.111", "1.122", "1.250", "1.111"].each { |num|
count = hashmap[num]
if count.nil?
hashmap[num] = 1
else
hashmap[num] = count + 1
end
}
As already mentioned, sorting might be faster.
Using the default value feature of hashes:
>> x = ["1.111", "1.122", "1.250", "1.111"]
>> h = Hash.new(0)
>> x.each{|i| h[i] += 1 }
>> h.max{|a,b| a[1] <=> b[1] }
["1.111", 2]
This returns the most popular value in the array:
x.group_by{|a| a }.sort_by{|_value, group| -group.size }.first[0]
For example:
x = ["1.111", "1.122", "1.250", "1.111"]
# Most popular
x.group_by{|a| a }.sort_by{|_value, group| -group.size }.first[0]
#=> "1.111"
# How many times
x.group_by{|a| a }.sort_by{|_value, group| -group.size }.first[1].size
#=> 2

Resources