Finding the mode of an array in Ruby

When creating a method to find the mode of an array, I see people iterating over the array through a hash with default value 0:
def mode(array)
hash = Hash.new(0)
array.each do |i|
hash[i]+=1
end
end
or
freq = arr.inject(Hash.new(0)) { |h,v| h[v] += 1; h }
Can someone explain the following part of the block?
hash[i] = hash[i] + 1 or h[v] = h[v] + 1
How does the iterator know to add +1 to each unique key of the hash? For example:
arr = [1,1,1,2,3]
freq = arr.inject(Hash.new(0)) { |h,v| h[v] += 1; h }
#=> {1=>3, 2=>1, 3=>1}
If someone can explain how to find the mode of an array, I would be grateful.

In your first example, you need the method to return the hash that is created, or do some manipulation of the hash to compute the mode. Let's try it, just returning the hash (so I've added hash as the last line):
def hash_for_mode(array)
hash = Hash.new(0)
array.each do |i|
hash[i]+=1
end
hash
end
array = [1,3,1,4,3]
hash_for_mode(array) #=> {1=>2, 3=>2, 4=>1}
With hash_for_mode you can easily compute the mode.
By defining the hash h = Hash.new(0), we are telling Ruby that the default value is zero. By that, we mean that if a calculation depends on h[k] when k is not a key of h, h[k] evaluates to the default value; the lookup alone does not store anything in h.
Consider, for example, when the first value of array (1 in my example) is passed into the block and assigned to the block variable i. hash does not have a key 1. (It has no keys yet.) hash[1] += 1 is shorthand for hash[1] = hash[1] + 1, so Ruby will replace hash[1] on the right side of the equality with the default value, zero, resulting in hash[1] => 1.
When the third value of array (another 1) is passed into the block, hash[1] already exists (and equals 1) so we just add one to give it a new value 2.
In case you were wondering, if we have:
hash = Hash.new(0)
hash[1] += 1
hash #=> {1=>1}
hash[2] #=> 0
hash #=> {1=>1}
That is, merely referencing a key that is not in the hash (here hash[2]) returns the default value but does not add a key-value pair to the hash.
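To make the step-by-step description above concrete, here is a minimal sketch that counts the same array while printing what each step reads and writes. The method name count_with_trace is made up for illustration; it spells out hash[i] += 1 as the read followed by the write.

```ruby
# Hypothetical helper: counts occurrences and prints each read/write step.
def count_with_trace(array)
  hash = Hash.new(0)
  array.each do |i|
    before = hash[i]       # the default 0 the first time i is seen; nothing is stored yet
    hash[i] = before + 1   # this assignment is what actually creates or updates the key
    puts "saw #{i}: #{before} -> #{hash[i]}"
  end
  hash
end

counts = count_with_trace([1, 3, 1, 4, 3])
p counts.key?(2)   # prints false: key 2 was never assigned
p counts[2]        # prints 0: the default value is returned
p counts.key?(2)   # prints false: reading did not add the key
```

If you ever need to tell "missing" apart from "stored as zero", key? (or fetch) is the tool, because [] answers with the default either way.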
Another common way to do the same thing is:
def hash_for_mode(array)
array.each_with_object({}) { |i,hash| hash[i] = (hash[i] || 0) + 1 }
end
hash_for_mode(array) #=> {1=>2, 3=>2, 4=>1}
This relies on the fact that:
(hash[i] || 0) #=> hash[i] if hash already has a key i
(hash[i] || 0) #=> 0 if hash does not have a key i, so hash[i]=>nil
(This requires that your hash does not contain any pairs k=>nil.)
Also, notice that rather than having the first statement:
hash = {}
and the last statement:
hash
I've used the method Enumerable#each_with_object, which returns the object passed to it (here, the hash). This is preferred here to using Enumerable#inject (a.k.a. reduce) because you don't need to return hash to the iterator (no ; h needed).
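As a rough side-by-side sketch of that difference (the three method names are invented for this example): inject carries the block's return value into the next iteration, so the block has to end with the hash, while each_with_object always passes the same object. Note that the block arguments come in the opposite order.

```ruby
# inject: the block's return value becomes the accumulator, hence the trailing "; h".
def counts_with_inject(array)
  array.inject(Hash.new(0)) { |h, v| h[v] += 1; h }
end

# each_with_object: the same hash is passed every time; argument order is (item, object).
def counts_with_each_with_object(array)
  array.each_with_object(Hash.new(0)) { |v, h| h[v] += 1 }
end

# Forgetting "; h" makes the accumulator an Integer after the first step,
# and Integer has no []= method, so the second step raises NoMethodError.
def counts_with_forgetful_inject(array)
  array.inject(Hash.new(0)) { |h, v| h[v] += 1 }
rescue NoMethodError => e
  "failed: #{e.class}"
end

array = [1, 3, 1, 4, 3]
p counts_with_inject(array) == counts_with_each_with_object(array)  # prints true
p counts_with_forgetful_inject(array)                               # prints "failed: NoMethodError"
```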

array = [1,3,1,4,3]
array.group_by(&:itself).transform_values(&:count)
# => {1=>2, 3=>2, 4=>1}
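None of the snippets so far takes the last step from the counts to the mode itself. A minimal sketch, assuming you want every value that ties for the highest count returned as an array (the method name modes is my own choice):

```ruby
# Hypothetical helper: returns all values that occur most often.
def modes(array)
  counts = array.each_with_object(Hash.new(0)) { |item, h| h[item] += 1 }
  highest = counts.values.max                          # nil for an empty array
  counts.select { |_, count| count == highest }.keys   # keep only the top counts
end

p modes([1, 1, 1, 2, 3])   # prints [1]
p modes([1, 3, 1, 4, 3])   # prints [1, 3]
p modes([])                # prints []
```

If you only ever want a single value, counts.max_by { |_, count| count } would pick one of the tied entries, but which one is then an implementation detail you probably should not rely on.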

Related

Selecting certain keys in a Hash

I have a Hash h and want to have an Array of those keys, where their values fulfil a certain condition. My naive approach is:
result = h.select { |_, val| val.fulfils_condition? }.keys
This works, of course, but looks unnecessarily inefficient to me, as it requires an intermediate Hash to be constructed, from which then the result Array is calculated.
Of course I could do an explicit loop:
result = []
h.each do
|key, val|
result << key if val.fulfils_condition?
end
but this I consider ugly due to the explicit handling of result. I also was contemplating this one:
result = h.reduce([]) do
|memo, pair|
memo << pair.first if pair.last.fulfils_condition?
memo
end
but this is not really more readable, and requires the construction of the intermediate pair arrays, each holding a key-value-pair.
Is there an alternative approach, which is compact, and does not need to calculate a temporary Hash?
Given:
h = {}; (1..100).each {|v| h[v.to_s] = v }
You can use the memory_profiler gem to measure allocations. Use something like MemoryProfiler.report { code }.total_allocated.
If memory allocations are really at a premium here, your approach of preallocating the result and then enumerating with #each is what you want. The reason for this is that Ruby optimizes Hash#each for blocks with an arity of 2, so that a new array isn't constructed per loop. The only allocation in this approach is the results array.
MemoryProfiler.report { r = [] ; h.each {|k, v| r << k if v > 1 } }.total_allocated
# => 1
Using #reduce, OTOH, results in an allocation per loop because you break the arity rule:
MemoryProfiler.report { h.reduce([]) {|agg, (k, v)| agg << k if v > 1 ; agg } }.total_allocated
# => 101
If you want something more "self-contained" and are willing to sacrifice an extra allocation, you'll want to use #each_key (which, called without a block, does create an intermediate enumerator over the keys) and then index into the hash to test each value.
h.each_key.select {|k| h[k] > 1 }
Simple is best I think. How about this?
h.keys.select { |k| k.meets_condition }
I love @ChrisHeald's solution
h.each_key.filter_map { |k| k if condition(h[k]) }
I would have gone with a more verbose
h.each_with_object([]) { |(k, v), arr| arr << k if condition(v) }
or even
h.map { |k, v| k if condition(v) }.compact
which are clearly constructing more than needed, but are still quite clear.
Riffing off Peter Camilleri's proposal, I think the following does what you want:
h = {:a => 1, :b => -1, :c => 2, :d => -2}
h.keys.select { |k| h[k] < 0 } # [:b, :d]
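To compare the variants from this thread on one input, here is a small sketch. keys_where is a hypothetical helper name, and filter_map assumes Ruby 2.7 or later; all three lines should produce the same keys for this hash.

```ruby
# Hypothetical helper: collect the keys whose values satisfy the given block,
# without building an intermediate Hash.
def keys_where(hash)
  hash.each_with_object([]) { |(key, value), keys| keys << key if yield(value) }
end

h = { a: 1, b: -1, c: 2, d: -2 }

p keys_where(h) { |v| v < 0 }          # prints [:b, :d]
p h.select { |_, v| v < 0 }.keys       # same keys, via an intermediate Hash
p h.filter_map { |k, v| k if v < 0 }   # same keys, Ruby 2.7+
```

One caveat with the filter_map form: it drops falsy results, so a key that is itself nil or false would be lost even when its value passes the test.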

Ruby - Return duplicates in an array using hashes, is this efficient?

I have solved the problem using normal loops and now using hashes, however I am not confident I used the hashes as well as I could have. Here is my code:
# 1-100 whats duplicated
def whats_duplicated?(array)
temp = Hash.new
output = Hash.new
# Write the input array numbers to a hash table and count them
array.each do |element|
if temp[element] >= 1
temp[element] += 1
else
temp[element] = 1
end
end
# Another hash, of only the numbers who appeared 2 or more times
temp.each do |hash, count|
if count > 1
output[hash] = count
end
end
# Return our sorted and formatted list as a string for screen
output.sort.inspect
end
### Main
# array_1 is an array 1-100 with duplicate numbers
array_1 = []
for i in 0..99
array_1[i] = i+1
end
# seed 10 random indexes which will likely be duplicates
for i in 0..9
array_1[rand(0..99)] = rand(1..100)
end
# print to screen the duplicated numbers & their count
puts whats_duplicated?(array_1)
My question is really what to improve? This is a learning exercise for myself, I am practising some of the typical brain-teasers you may get in an interview and while I can do this easily using loops, I want to learn an efficient use of hashes. I re-did the problem using hashes hoping for efficiency but looking at my code I think it isn't the best it could be. Thanks to anyone who takes an interest in this!
The easiest way to find duplicates in Ruby is to group the elements, and then count how many are in each group:
def whats_duplicated?(array)
array.group_by { |x| x }.select { |_, xs| xs.length > 1 }.keys
end
whats_duplicated?([1,2,3,3,4,5,3,2])
# => [2, 3]
def whats_duplicated?(array)
array.each_with_object(Hash.new(0)) { |val, hsh| hsh[val] += 1 }.select { |k,v| v > 1 }.keys
end
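The original method also reported how often each duplicate occurred. A minimal sketch that keeps the counts, under the assumption that a Hash of value => count is an acceptable return value (duplicate_counts is a made-up name):

```ruby
# Hypothetical helper: returns a Hash of only the values seen more than once,
# mapped to the number of times each was seen.
def duplicate_counts(array)
  counts = array.each_with_object(Hash.new(0)) { |item, h| h[item] += 1 }
  counts.select { |_, count| count > 1 }
end

dupes = duplicate_counts([1, 2, 3, 3, 4, 5, 3, 2])
p dupes.keys   # prints [2, 3]
p dupes[3]     # prints 3
```

Returning the Hash rather than an inspected String leaves formatting to the caller, which is usually easier to test.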
I would do it this way:
def duplicates(array)
counts = Hash.new { |h,k| h[k] = 0 }
array.each do |number|
counts[number] += 1
end
counts.select { |k,v| v > 1 }.keys
end
array = [1,2,3,4,4,5,6,6,7,8,8,9]
p duplicates(array)
# => [4, 6, 8]
Some comments about your code: the condition if temp[element] >= 1 is not correct. Because temp is created with Hash.new, a missing key returns nil, and nil >= 1 raises a NoMethodError the first time an element is seen. You should at least fix it to:
if temp[element] # check if element exists in hash
temp[element] += 1 # if it does increment
else
temp[element] = 1 # otherwise init hash at that position with `1`
end
Furthermore I recommend not to use the for x in foo syntax. Use foo.each do |x| instead. Hint: I like to ask in interviews about the difference between both versions.
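One difference that commonly comes up (not necessarily the only one the answer has in mind) is scoping: for does not open a new scope, so its loop variable is still visible afterwards, while a block parameter of each is local to the block. A small sketch, with a method name invented for the demonstration:

```ruby
# Hypothetical demo method: reports what is still visible after each kind of loop.
def loop_variable_visibility
  for i in 0..2
    # nothing to do; we only care about i afterwards
  end

  (0..2).each { |j| }   # j exists only inside the block

  {
    for_variable: i,                            # still set to the last value, 2
    each_variable_defined: !defined?(j).nil?    # false: j is unknown out here
  }
end

result = loop_variable_visibility
p result[:for_variable]            # prints 2
p result[:each_variable_defined]   # prints false
```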

Hashes vs Lambdas

I found two examples that looked close to each other for finding Fibonacci numbers:
Lambda
fibonacci = ->(x){ x < 2 ? x : fibonacci[x-1] + fibonacci[x-2] }
fibonacci[6] # => 8
Hash
fibonacci = Hash.new{ |h,x| h[x] = x < 2 ? x : h[x-1] + h[x-2] }
fibonacci[6] # => 8
I used both hashes and lambdas in ruby before, but not like this. This is more of a way of storing a function:
if x < 2
x
else
fibonacci[x-1] + fibonacci[x-2]
end
Can you explain in detail how this is working? Is this using recursion?
What are the differences between hashes like this and lambdas?
Yes, it is using recursion. If we look at the code in the {} brackets we can figure out the answer. Let's start by looking at the hash. The argument passed to Hash.new is the default value: the value that is returned when a key does not already exist in the hash. If a block is given instead, the block is called to compute the value for a missing key.
hash = Hash.new
p hash['new_value'] #=> nil
default_value_hash = Hash.new(0)
puts default_value_hash['new_value'] #=> 0
hash_with_block = Hash.new{|h,x| x}
puts hash_with_block['new_value'] #=> 'new_value'
So when we declare
fibonacci = Hash.new{ |h,x| h[x] = x < 2 ? x : h[x-1] + h[x-2] }
we are basically saying: create a new hash with a default block. If we ask for a number (x) smaller than two, just return the input (x). Otherwise, give us the sum of the hash values stored under the keys x-1 and x-2. This is basically the Fibonacci algorithm. If x-1 and x-2 do not exist yet, the same block runs again for them, until it reaches the two base inputs, 0 and 1.
The difference between the two approaches is that the hash saves the values (in a hash...). This can be a huge advantage in some cases. Every time the lambda is called it needs to recalculate the values for all numbers below the called value.
# Let's create a counter to keep track of the number of time the lambda is called.
# Please do not use global variables in real code. I am just lazy here.
$lambda_counter = 0
fibonacci_lambda = ->(x){
$lambda_counter += 1
x < 2 ? x : fibonacci_lambda[x-1] + fibonacci_lambda[x-2]
}
p (1..20).map{|x| fibonacci_lambda[x]}
# => [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765]
p $lambda_counter # => 57290
# lambda called 57290 times!
$hash_counter = 0
fibonacci_hash = Hash.new{ |h,x|
$hash_counter += 1
h[x] = x < 2 ? x : h[x-1] + h[x-2]
}
p (1..20).map{|x| fibonacci_hash[x]}
# => [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765]
p $hash_counter # => 21
# Only called 21 times!
The reason for the big difference in calls is the nature of recursion. The lambda does not store its values and when the value for 10 is calculated it recalculates the value for 3 more than 20 times. In the hash this value can be stored and saved for later.
In the first case, you are defining a lambda that calls itself recursively.
In the case of the hash, the values will also be computed recursively, but they are stored and then looked up to give the result.
Lambda
fibonacci = ->(x){ x < 2 ? x : fibonacci[x-1] + fibonacci[x-2] }
fibonacci[6]
fibonacci # => <Proc:0x2d026a0@(irb):5 (lambda)>
Hash
fibonacci = Hash.new{ |h,x| h[x] = x < 2 ? x : h[x-1] + h[x-2] }
fibonacci[6]
fibonacci # => {1=>1, 0=>0, 2=>1, 3=>2, 4=>3, 5=>5, 6=>8}
In one case, you are not leaving any footprint in memory, whereas the hash will continue to keep the computed value. So it depends on what you need.
If you need to access fibonacci[6] one more time, the lambda will recompute the result, whereas the hash will give you the result immediately without redoing the calculations.
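The two approaches are not mutually exclusive. As a sketch of a middle ground (this is my own combination, not something from the question), a lambda can close over an ordinary Hash and use it as a cache. The names memoized_fibonacci, fib and cache are made up for the example.

```ruby
# Hypothetical builder: returns a caching lambda together with its cache,
# so the stored values can be inspected afterwards.
def memoized_fibonacci
  cache = {}
  fib = ->(x) do
    # fetch returns the stored value, or runs the block once for a missing key
    cache.fetch(x) { cache[x] = x < 2 ? x : fib[x - 1] + fib[x - 2] }
  end
  [fib, cache]
end

fib, cache = memoized_fibonacci
p fib[20]      # prints 6765
p cache.size   # prints 21 (one entry for each of 0..20)
```

You keep the call syntax of a lambda and get the stored values of the hash, at the cost of the cache staying in memory for as long as the lambda is reachable.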
What are the differences between hashes like this and lambdas?
lambdas and hashes have nothing in common. Your question is like asking:
What are the differences between methods and arrays?
It's just that Hashes can specify a default value for a non-existent key:
h = Hash.new(10)
h["a"] = 2
puts h["a"]
puts h["b"]
--output:--
2
10
Hashes also provide a way to dynamically specify the default value: you can provide a block. Here is an example:
h = Hash.new do |h, key|
h[key] = key.length
end
puts h['hello']
puts h['hi']
p h
--output:--
5
2
{"hello"=>5, "hi"=>2}
When you access a non-existent key, the block is called, and the block can do whatever you want. So someone cleverly figured out that you could create a hash and specify a default value that calculated fibonacci numbers. Here is how it works:
h = Hash.new do |h, key|
if key < 2
h[key] = key
else
h[key] = h[key-1] + h[key-2]
end
end
That creates the Hash h, which is a Hash with no keys or values. If you then write:
puts h[3]
...3 is a non-existent key, so the block is called with the args h and 3. The else clause in the block executes, which gives you:
h[3-1] + h[3-2]
or:
h[2] + h[1]
But to evaluate that statement, ruby has to first evaluate h[2]. But when ruby looks up h[2] in the hash, the key 2 is a non-existent key, so the block is called with the args h and 2, giving you:
(h[2-1] + h[2-2]) + h[1]
or:
(h[1] + h[0]) + h[1]
To evaluate that statement, ruby first has to evaluate the first h[1], and when ruby tries to look up h[1] in the hash, 1 is a non existent key, so the block is called with the args h and 1. This time the if branch executes, causing this to happen:
h[1] = 1
and 1 is returned as the value of h[1], giving you this:
(1 + h[0]) + h[1]
Then ruby looks up h[0], and because 0 is a non-existent key, the block is called with the args h and 0, and the if clause executes and does this:
h[0] = 0
and 0 is returned as the value of h[0], giving you this:
(1 + 0) + h[1]
Then ruby looks up h[1] in the hash, and this time the key 1 exists, and it has a value of 1, giving you:
(1 + 0) + 1
And that is equal to 2, so h[3] is set equal to 2. After calling h[3], you get this output:
puts h[3]
p h
--output:--
2
{1=>1, 0=>0, 2=>1, 3=>2}
As you can see, the previous calculations are all cached in the hash, which means that those calculations don't have to be performed again for other fibonacci numbers.
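If you would rather watch that walk-through happen than follow it on paper, a small sketch like the following records every key the default block is asked for. trace_fibonacci_hash is a hypothetical name; the block is the same as above, with the if/else collapsed into a ternary.

```ruby
# Hypothetical helper: builds the Fibonacci hash, looks up n, and returns
# the keys the default block was called with, plus the hash itself.
def trace_fibonacci_hash(n)
  requested = []
  h = Hash.new do |hash, key|
    requested << key   # the block only runs for keys that are still missing
    hash[key] = key < 2 ? key : hash[key - 1] + hash[key - 2]
  end
  h[n]
  [requested, h]
end

requested, h = trace_fibonacci_hash(3)
p requested   # prints [3, 2, 1, 0]
p h.keys      # prints [1, 0, 2, 3]
p h[3]        # prints 2
```

The block is requested for 3 first but finishes it last, which is why the keys end up stored in the order 1, 0, 2, 3. The second lookup of key 1 never reaches the block, so 1 appears only once in the list.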

Rank hash keys based on value

I'm trying to determine a rank for each key in a hash against the other keys based on its value. The value is numeric. Ranks can be repeated (i.e. 3 keys can tie for first place). This works, but is ugly.
standings.sort_by {|k, v| v}.reverse!
prev_k = nil
standings.each_with_index do |(k, v), i|
if i == 0
k.rank = 1
elsif v == standings[prev_k]
k.rank = prev_k.rank
else
k.rank = prev_k.rank + 1
end
prev_k = k
end
Give this a try:
ranks = Hash[standings.values.sort.uniq.reverse.each_with_index.to_a]
standings.each { |k, v| k.rank = ranks[v] + 1 }
I'm not sure it's any prettier, but it's a bit more compact, carries fewer loop variables, and has no conditionals.
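Here is the same idea as a self-contained sketch. It assumes plain keys and returns a new Hash of key => rank instead of calling k.rank= on the keys, since the class of the keys is not shown in the question. The name dense_ranks and the sample data are invented.

```ruby
# Hypothetical helper: the highest score gets rank 1, ties share a rank,
# and the next distinct score gets the next rank with no gaps.
def dense_ranks(standings)
  rank_of_score = standings.values.uniq.sort.reverse
                           .each_with_index.map { |score, i| [score, i + 1] }.to_h
  standings.transform_values { |score| rank_of_score[score] }
end

standings = { "ann" => 10, "bob" => 7, "cat" => 10, "dan" => 3 }
ranks = dense_ranks(standings)
p ranks["ann"]   # prints 1
p ranks["cat"]   # prints 1 (tied with ann)
p ranks["bob"]   # prints 2
p ranks["dan"]   # prints 3
```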

Ruby longest word in array

I built this method to find the longest word in an array, but I'm wondering if there's a better way to have done it. I'm pretty new to Ruby, and just did this as an exercise for learning the inject method.
It returns either the longest word in an array, or an array of the equal longest words.
class Array
def longest_word
# Convert array elements to strings in the event that they're not.
test_array = self.collect { |e| e.to_s }
test_array.inject() do |word, comparison|
if word.kind_of?(Array) then
if word[0].length == comparison.length then
word << comparison
else
word[0].length > comparison.length ? word : comparison
end
else
# If words are equal, they are pushed into an array
if word.length == comparison.length then
the_words = Array.new
the_words << word
the_words << comparison
else
word.length > comparison.length ? word : comparison
end
end
end
end
end
I would do
class Array
def longest_word
group_by(&:size).max.last
end
end
Ruby has a standard method for returning an element in a list with the maximum of a value.
anArray.max{|a, b| a.length <=> b.length}
or you can use the max_by method
anArray.max_by(&:length)
to get all the elements with the maximum length
max_length = anArray.max_by(&:length).length
all_with_max_length = anArray.find_all{|x| x.length == max_length}
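Wrapped up as a method, that two-step approach might look like the sketch below. longest_words is a name I made up, and the to_s conversion mirrors what the question does for non-string elements.

```ruby
# Hypothetical helper: returns every element (as a String) that has the maximum length.
def longest_words(words)
  strings = words.map(&:to_s)
  return [] if strings.empty?               # max would be nil for an empty list
  max_length = strings.map(&:length).max
  strings.select { |word| word.length == max_length }
end

p longest_words(["mouse", "cat", "bird", "bear", "moose"])   # prints ["mouse", "moose"]
p longest_words([:a, 123, "xy"])                             # prints ["123"]
p longest_words([])                                          # prints []
```

Always returning an array, even for a single winner, spares the caller from checking whether it got a String or an Array back.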
Here's one using inject (doesn't work for an empty array):
words.inject(['']){|a,w|
case w.length <=> a.last.length
when -1
a
when 0
a << w
when 1
[w]
end
}
which can be shortened to
words.inject(['']){|a,w|
[a + [w], [w], a][w.length <=> a.last.length]
}
for those who like golf.
A two liner:
vc = ['asd','s','1234','1235'].sort{|a,b| b.size <=> a.size}
vc.delete_if{|a| a.size < vc.first.size}
#Output
["1235", "1234"]
Or, if you want to use inject, this uses your idea but is shorter.
test_array.inject{ |ret,word|
ret = [ret] unless ret.kind_of?(Array)
ret << word if word.size == ret.first.size
ret = [word] if word.size > ret.first.size
ret
}
module Enumerable
def longest_word
(strings = map(&:to_s)).
zip(strings.map(&:length)).
inject([[''],0]) {|(wws, ll), (w, l)|
case l <=> ll
when -1 then [wws, ll]
when 1 then [[w], l]
else [wws + [w], ll]
end
}.first
end
end
This method only depends on generic Enumerable methods, there's nothing Array specific about it, therefore we can pull it up into the Enumerable module, where it will also be available for Sets or Enumerators, not just Arrays.
This solution uses the inject method to group the strings by length in a hash, then picks the group with the greatest length.
animals = ["mouse", "cat", "bird", "bear", "moose"]
animals.inject(Hash.new{|h,k| h[k] = []}) { |acc, e| acc[e.size] << e; acc }.sort.last[1]
This returns:
["mouse", "moose"]

Resources