Find most common string in an array - ruby

I have this array, for example (the size is variable):
x = ["1.111", "1.122", "1.250", "1.111"]
and I need to find the most common value ("1.111" in this case).
Is there an easy way to do that?
Thanks in advance!
EDIT #1: Thank you all for the answers!
EDIT #2: I've changed my accepted answer based on Z.E.D.'s information. Thank you all again!

Ruby < 2.2
#!/usr/bin/ruby1.8
def most_common_value(a)
a.group_by do |e|
e
end.values.max_by(&:size).first
end
x = ["1.111", "1.122", "1.250", "1.111"]
p most_common_value(x) # => "1.111"
Note: Enumerable#max_by is new in Ruby 1.9, but it has been backported to 1.8.7.
Ruby >= 2.2
Ruby 2.2 introduces the Object#itself method, with which we can make the code more concise:
def most_common_value(a)
a.group_by(&:itself).values.max_by(&:size).first
end
As a monkey patch
Or as Enumerable#mode:
Enumerable.class_eval do
def mode
group_by do |e|
e
end.values.max_by(&:size).first
end
end
["1.111", "1.122", "1.250", "1.111"].mode
# => "1.111"

One pass through the array to accumulate the counts in a hash. Use .max() to find the hash entry with the largest value.
#!/usr/bin/ruby
a = Hash.new(0)
["1.111", "1.122", "1.250", "1.111"].each { |num|
a[num] += 1
}
a.max{ |a,b| a[1] <=> b[1] } # => ["1.111", 2]
or, roll it all into one line:
ary.inject(Hash.new(0)){ |h,i| h[i] += 1; h }.max{ |a,b| a[1] <=> b[1] } # => ["1.111", 2]
If you only want the item back add .first():
ary.inject(Hash.new(0)){ |h,i| h[i] += 1; h }.max{ |a,b| a[1] <=> b[1] }.first # => "1.111"
The first sample I used is how it would be done in Perl usually. The second is more Ruby-ish. Both work with older versions of Ruby. I wanted to compare them, plus see how Wayne's solution would speed things up so I tested with benchmark:
#!/usr/bin/env ruby
require 'benchmark'
ary = ["1.111", "1.122", "1.250", "1.111"] * 1000
def most_common_value(a)
a.group_by { |e| e }.values.max_by { |values| values.size }.first
end
n = 1000
Benchmark.bm(20) do |x|
x.report("Hash.new(0)") do
n.times do
a = Hash.new(0)
ary.each { |num| a[num] += 1 }
a.max{ |a,b| a[1] <=> b[1] }.first
end
end
x.report("inject:") do
n.times do
ary.inject(Hash.new(0)){ |h,i| h[i] += 1; h }.max{ |a,b| a[1] <=> b[1] }.first
end
end
x.report("most_common_value():") do
n.times do
most_common_value(ary)
end
end
end
Here are the results:
                          user     system      total        real
Hash.new(0)           2.150000   0.000000   2.150000 (  2.164180)
inject:               2.440000   0.010000   2.450000 (  2.451466)
most_common_value():  1.080000   0.000000   1.080000 (  1.089784)

You could sort the array and then loop over it once. In the loop, just keep track of the current item and the number of times it has been seen. Once the list ends or the item changes, set max_count = count if count > max_count. And of course keep track of which item has the max_count.
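A minimal sketch of the sort-then-scan idea described above, assuming the elements are mutually comparable so that sort works. The method name most_common_by_sorting is made up for illustration, and for simplicity it checks the running count after every element rather than only when the item changes.

```ruby
# Hypothetical helper: find the most frequent element by sorting first,
# so that equal elements end up next to each other.
def most_common_by_sorting(values)
  best_item = nil
  best_count = 0
  current = nil
  count = 0

  values.sort.each do |item|
    if item == current
      count += 1          # same run as the previous element
    else
      current = item      # a new run starts here
      count = 1
    end

    # Remember the longest run seen so far.
    if count > best_count
      best_count = count
      best_item = item
    end
  end

  best_item
end

x = ["1.111", "1.122", "1.250", "1.111"]
p most_common_by_sorting(x) # => "1.111"
```

If several values tie for the highest count, this sketch returns the one that comes first in sorted order.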

You could create a hashmap that stores the array items as keys with their values being the number of times that element appears in the array.
In Ruby, that looks like:
counts = {}
["1.111", "1.122", "1.250", "1.111"].each { |num|
  count = counts[num]
  if count.nil?
    counts[num] = 1           # first time this item is seen
  else
    counts[num] = count + 1   # seen before, bump the count
  end
}
As already mentioned, sorting might be faster.

Using the default value feature of hashes:
>> x = ["1.111", "1.122", "1.250", "1.111"]
>> h = Hash.new(0)
>> x.each{|i| h[i] += 1 }
>> h.max{|a,b| a[1] <=> b[1] }
["1.111", 2]
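If you can assume Ruby 2.7 or later (not something any of the answers here rely on), Enumerable#tally builds the same counts hash in one call. A small sketch using the array from the question:

```ruby
x = ["1.111", "1.122", "1.250", "1.111"]

# tally returns a hash of element => number of occurrences (Ruby 2.7+).
counts = x.tally

# Pick the [value, count] pair with the largest count.
value, count = counts.max_by { |_, n| n }

p value # => "1.111"
p count # => 2
```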

This returns the most popular value in the array:
x.group_by{|a| a }.sort_by{|_, v| -v.size }.first[0]
For example:
x = ["1.111", "1.122", "1.250", "1.111"]
# Most popular
x.group_by{|a| a }.sort_by{|_, v| -v.size }.first[0]
#=> "1.111"
# How many times
x.group_by{|a| a }.sort_by{|_, v| -v.size }.first[1].size
#=> 2

Related

How can I improve the performance of this small Ruby function?

I am currently doing a Ruby challenge and get the error "Terminated due to timeout"
for some test cases where the string input is very long (10,000+ characters).
How can I improve my code?
Ruby challenge description
You are given a string containing characters A and B only. Your task is to change it into a string such that there are no matching adjacent characters. To do this, you are allowed to delete zero or more characters in the string.
Your task is to find the minimum number of required deletions.
For example, given the string s = AABAAB, remove an A at positions 0 and 3 to make s = ABAB in 2 deletions.
My function
def alternatingCharacters(s)
counter = 0
s.chars.each_with_index { |char, idx| counter += 1 if s.chars[idx + 1] == char }
return counter
end
Thank you!

This could be a faster way to get the count:
str.size - str.chars.chunk_while{ |a, b| a == b }.to_a.size
The second part uses String#chars method in conjunction with Enumerable#chunk_while.
This way the second part groups in subarrays:
'aababbabbaab'.chars.chunk_while{ |a, b| a == b}.to_a
#=> [["a", "a"], ["b"], ["a"], ["b", "b"], ["a"], ["b", "b"], ["a", "a"], ["b"]]

Trivial if you can use squeeze:
str.length - str.squeeze.length
Otherwise, you could try a regular expression that matches those A (or B) that are preceded by another A (or B):
str.enum_for(:scan, /(?<=A)A|(?<=B)B/).count
Using enum_for avoids the creation of the intermediate array.
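As a quick sanity check, here is a sketch that wraps both approaches in methods and runs them on the example from the question. The method names are made up for illustration.

```ruby
# Count deletions by comparing the length before and after collapsing runs.
def deletions_by_squeeze(str)
  str.length - str.squeeze.length
end

# Count the characters that directly follow an identical character.
def deletions_by_scan(str)
  str.enum_for(:scan, /(?<=A)A|(?<=B)B/).count
end

p deletions_by_squeeze("AABAAB") # => 2
p deletions_by_scan("AABAAB")    # => 2
```

Both count every character that repeats its left neighbour, which is exactly the number of deletions the challenge asks for.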

The main issue with:
s.chars.each_with_index { |char, idx| counter += 1 if s.chars[idx + 1] == char }
is that you don't save chars into a variable. s.chars splits the string into an array of characters. The first s.chars call outside the loop is fine. However, there is no reason to do this again for each character in s. This means that if you have a string of 10,000 characters, you'll instantiate 10,001 arrays of size 10,000.
Re-using the characters array will give you a huge performance boost:
require 'benchmark'
s = ''
options = %w[A B]
10_000.times { s << options.sample }
Benchmark.bm do |x|
x.report do
counter = 0
s.chars.each_with_index { |char, idx| counter += 1 if s.chars[idx + 1] == char }
# create a character array for each iteration ^
end
x.report do
counter = 0
chars = s.chars # <- only create a character array once
chars.each_with_index { |char, idx| counter += 1 if chars[idx + 1] == char }
end
end
user system total real
8.279767 0.000001 8.279768 ( 8.279655)
0.002188 0.000003 0.002191 ( 0.002191)
You could also use enumerator methods like each_cons and count to simplify the code. This doesn't change performance much, but it makes the code a lot more readable.
Benchmark.bm do |x|
x.report do
counter = 0
chars = s.chars
chars.each_with_index { |char, idx| counter += 1 if chars[idx + 1] == char }
end
x.report do
s.each_char.each_cons(2).count { |a, b| a == b }
# ^ using each_char instead of chars to avoid
# instantiating a character array
end
end
user system total real
0.002923 0.000000 0.002923 ( 0.002920)
0.003995 0.000000 0.003995 ( 0.003994)

How do you count the amount of duplicate characters in a string ruby

I have a string
s = "chineedne"
I am trying to create a function that can count the number of duplicate characters in my string or any string.
I tried:
s.each_char.map { |c| c.find.count { |c| s.count(c) > 1 }}
#=> NoMethodError: undefined method `find' for "c":String

Possible solution:
string = "chineedne"
string.chars.uniq.count { |char| string.count(char) > 1 }
#=> 2
or without uniq method to count total amount of duplicated characters:
string = "chineedne"
string.chars.count { |char| string.count(char) > 1 }
#=> 5
To avoid O(N**2) complexity, you can also use group_by to build a hash that maps each character to an array of all its occurrences in the string, and then use this hash to get whatever data you want:
duplicates = string.chars.group_by { |char| char }.select { |key, value| value.size > 1 }
# or, for Ruby >= 2.2: string.chars.group_by(&:itself).select { |key, value| value.size > 1 }
then:
> duplicates.keys.size # .keys => ['n', 'e']
#=> 2
and
> duplicates.values.flatten.size # .values.flatten => ["n", "n", "e", "e", "e"]
#=> 5

You can simply count your chars:
chars_frequency = str.each_char
.with_object(Hash.new(0)) {|c, m| m[c]+=1}
=> {"c"=>1, "h"=>1, "i"=>1, "n"=>2, "e"=>3, "d"=>1}
Then just count:
chars_frequency.count { |k, v| v > 1 }
=> 2
Or (if you want to count total amount):
chars_frequency.inject(0) {|r, (k, v)| v > 1 ? r + v : r }
=> 5

I don't know too much about Ruby, but I think something like this should work:
yourstring = "chineedne"
count = 0
"abcdefghijklmnopqrstuvwxyz".split("").each do |c|
  if yourstring.scan(c).count > 1
    count = count + 1
  end
end
The count variable holds the number of duplicate characters.
https://stackoverflow.com/a/12428037/6371926

string = "chineedne"
# split the string into an array of characters,
# downcasing first so the count is case-insensitive
arr = string.downcase.chars
# take the unique characters and count how many of them
# appear in the array more than once
arr.uniq.count {|n| arr.count(n) > 1}

How can I remove the first 3 duplicate values and return an array with the remaining values?

Given these arrays, how do I remove three occurrences of a value while keeping the fourth or fifth in the array?
[1,5,1,1,1] # => [1,5]
[3,3,3,2,3] # => [3,2]
[3,4,5,3,3] # => [4,5]
[1,1,1,1,1] # => [1,1]
[1,2,2,4,5] # => [1,2,2,4,5]
Here's what I've tried:
array = [1,5,1,1,1]
top3 = array.select { |x| array.count(x) >= 3 }[0..2]
last2 = array - top3
This strategy (and similar) only seem to work when there are three duplicates but not four or five. Are there elegant solutions to this problem?
UPDATE: Thank you for your amazing answers. As a beginning rubyist I learned a lot just from analyzing each response. My question came from a Ruby Koan challenge for a dice program. Here's my complete solution implemented with Abdo's suggestion. I'm sure there are more efficient ways to implement the program :)
def score(dice)
a,b,c,d,e = dice
array = [a,b,c,d,e]
total = 0
triples = array.select {|x| array.count(x) >= 3}[0..2]
singles = array.group_by{|i| i}.values.map{ |a|
a.length > 2 ? a[0, a.length - 3] : a
}.inject([], :+)
# Calculate values for triples
# 1 * 3 = 1000pts
# 2 * 3 = 200pts
# 3 * 3 = 300pts
# 4 * 3 = 400pts
# 5 * 3 = 500pts
# 6 * 3 = 600pts
case triples[0]
when 1 then total += triples[0]*1000
when (2..6) then total += triples[0]*100
end
# Calculate values for singles:
# 1s = 100pts each
# 5s = 50pts each
total += singles.count(1) * 100
total += singles.count(5) * 50
return total
end
puts score([5,1,1, 5, 6]) # 300 points
puts score([]) # 0 points
puts score([1,1,1,5,1]) # 1150 points
puts score([2,3,4,6,2]) # 0 points
puts score([3,4,5,3,3]) # 350 points
puts score([1,5,1,2,4]) # 250 points

array = [1,5,1,1,1]
occurrence = {}
array.select do|a|
if(array.count(a) >= 3)
occurrence[a] ||= []
occurrence[a] << a
occurrence[a].count > 3
else
true
end
end
PS: This solution preserves the order of the elements in the original array

Here's a faster solution when the size of the array is large:
(I avoid using count because it would loop through the array in an inner loop)
arr.inject({}) {
|h, i| h[i] ||= 0; h[i] += 1; h
}.collect_concat {|k,v| [k] * (v > 2 ? v - 3 : v) }
Here's the fruity comparison to the other working solutions:
arr = 1000.times.collect { rand(100) }.shuffle
require 'fruity'
compare do
vimsha {
occurrence = {};
arr.select do|a|
if(arr.count(a) > 3)
occurrence[a] ||= []
occurrence[a] << a
occurrence[a].count > 3
else
true
end
end
}
caryswoveland {
arr.uniq.reduce([]) {|a,e| a + [e]*((cnt=arr.count(e)) > 2 ? cnt-3 : cnt)}
}
aruprakshit {
num_to_del = arr.find { |e| arr.count(e) >= 3 }
if !num_to_del.nil?
3.times do
ind = arr.index { |e| e == num_to_del }
arr.delete_at(ind)
end
end
arr
}
# edited as suggested by @CarySwoveland
abdo {
arr.each_with_object(Hash.new(0)) {|i,h| h[i] += 1
}.collect_concat { |k,v| [k] * (v > 2 ? v - 3 : v) }
}
broisatse {
arr.group_by{|i| i}.values.map{ |a|
a.length > 2 ? a[0, a.length - 3] : a
}.inject([], :+)
}
end
Here's the comparison result:
Running each test 64 times. Test will take about 48 seconds.
broisatse is faster than abdo by 30.000000000000004% ± 10.0%
abdo is faster than aruprakshit by 4x ± 1.0 (results differ: ...)
aruprakshit is similar to caryswoveland (results differ: ...)
caryswoveland is similar to vimsha (results differ: ...)
Note: I took @aruprakshit's code outside the method so we don't waste time in the method call itself.
When the array's size is increased further:
arr = 1000.times.collect { rand(1000) }.shuffle
we get:
abdo is faster than broisatse by 3x ± 1.0
broisatse is faster than aruprakshit by 6x ± 10.0
aruprakshit is faster than caryswoveland by 2x ± 1.0
caryswoveland is similar to vimsha

Another way, assuming order need not be preserved (which is consistent with a comment by the asker):
array = [1,2,4,1,2,1,2,1,1,4]
array.uniq.reduce([]) {|a,e| a + [e]*((cnt=array.count(e)) > 2 ? cnt-3 : cnt)}
#=> [1, 1, 4, 4]

Try something like:
a.group_by{|i| i}.values.map{|a| a[0, a.length % 3]}.inject([], :+)
This will remove all triplets from the array. If you want to remove only the first triplet, then do:
a.group_by{|i| i}.values.map{|a| a.length > 2 ? a[0, a.length - 3] : a }.inject([], :+)
Note: This might mess up the order of the array:
[1,2,1,2,3] #=> [1,1,2,2,3]
Let me know if you need to keep the order and, if so, which elements need to be removed if there are more than three, e.g. what should [1,1,2,1,1] return: [1,2] or [2,1]?

x.group_by{|i| i }.values.select{|a| a.size >= 3 }.each{|a| c=[3,a.size].min; x.delete_if{|e| a[0]==e && (c-=1)>=0 } }
It will remove the first [3,a.size].min occurrences of a[0] from the input x where a is, for example, [1,1,1,1] for x = [1,2,1,1,1]
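For comparison, here is a sketch that does the same thing without mutating the input and while keeping the original order. The method name drop_first_three is hypothetical, and it assumes "remove the first three occurrences of every value that appears at least three times" is the intended rule.

```ruby
# Hypothetical helper: return a new array with the first three occurrences
# of each value removed, but only for values that occur three or more times.
def drop_first_three(values)
  # First pass: how often does each value occur in total?
  totals = Hash.new(0)
  values.each { |v| totals[v] += 1 }

  # Second pass: skip the first three occurrences of each frequent value.
  seen = Hash.new(0)
  values.reject do |v|
    next false if totals[v] < 3   # not a triple, always keep
    seen[v] += 1
    seen[v] <= 3                  # drop occurrences 1..3, keep the rest
  end
end

p drop_first_three([1, 5, 1, 1, 1]) # => [5, 1]
p drop_first_three([3, 4, 5, 3, 3]) # => [4, 5]
p drop_first_three([1, 2, 2, 4, 5]) # => [1, 2, 2, 4, 5]
```

Counting in a separate first pass avoids calling count inside the loop, so the whole thing stays linear in the size of the array.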

I'd do it as below:
def del_first_three(a)
num_to_del = a.find { |e| a.count(e) >= 3 }
return a if num_to_del.nil?
3.times do
ind = a.index { |e| e == num_to_del }
a.delete_at(ind)
end
a
end
del_first_three([3,4,5,3,3]) # => [4, 5]
del_first_three([1,5,1,1,1]) # => [5, 1]
del_first_three([1,2,2,4,5]) # => [1, 2, 2, 4, 5]

Which Ruby statement is more efficient?

I have a hash table:
hash = Hash.new(0)
hash[:key] = hash[:key] + 1 # Line 1
hash[:key] += 1 # Line 2
Line 1 and Line 2 do the same thing. It looks like line 1 needs to query the hash by key twice while line 2 does so only once. Is that true? Or are they actually the same?

I created a Ruby script to benchmark it:
require 'benchmark'
def my_case1()
  @hash[:key] = @hash[:key] + 1
end
def my_case2()
  @hash[:key] += 1
end
n = 10000000
Benchmark.bm do |test|
  test.report("case 1") {
    @hash = Hash.new(1)
    @hash[:key] = 0
    n.times do; my_case1(); end
  }
  test.report("case 2") {
    @hash = Hash.new(1)
    @hash[:key] = 0
    n.times do; my_case2(); end
  }
end
Here is the result
user system total real
case 1 3.620000 0.080000 3.700000 ( 4.253319)
case 2 3.560000 0.080000 3.640000 ( 4.178699)
It looks like hash[:key] += 1 is slightly better.

@sza beat me to it :)
Here is my example irb session:
> require 'benchmark'
=> true
> n = 10000000
=> 10000000
> Benchmark.bm do |x|
> hash = Hash.new(0)
> x.report("Case 1:") { n.times do; hash[:key] = hash[:key] + 1; end }
> hash = Hash.new(0)
> x.report("Case 2:") { n.times do; hash[:key] += 1; end }
> end
user system total real
Case 1: 1.070000 0.000000 1.070000 ( 1.071366)
Case 2: 1.040000 0.000000 1.040000 ( 1.043644)

The Ruby Language Specification spells out the algorithm for evaluating abbreviated indexing assignment expressions quite clearly. It is something like this:
primary_expression[indexing_argument_list] ω= expression
# ω can be any operator, in this example, it is +
is (roughly) evaluated like
o = primary_expression
*l = indexing_argument_list
v = o.[](*l)
w = expression
l << (v ω w)
o.[]=(*l)
In particular, you can see that both the getter and the setter are called exactly once.
You can also see that by looking at the informal desugaring:
hash[:key] += 1
# is syntactic sugar for
hash[:key] = hash[:key] + 1
# which is syntactic sugar for
hash.[]=(:key, hash.[](:key).+(1))
Again, you see that both the setter and the getter are called exactly once.
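One way to check this yourself is to count the calls. The sketch below uses a made-up CountingHash subclass (not part of Ruby) that records how often [] and []= are invoked:

```ruby
# Hypothetical helper class: a Hash that counts reads and writes.
class CountingHash < Hash
  attr_reader :reads, :writes

  def initialize(*args)
    super          # forwards the default value to Hash#initialize
    @reads = 0
    @writes = 0
  end

  def [](key)
    @reads += 1
    super
  end

  def []=(key, value)
    @writes += 1
    super
  end
end

h = CountingHash.new(0)

h[:key] += 1
p [h.reads, h.writes] # => [1, 1]

h[:key] = h[:key] + 1
p [h.reads, h.writes] # => [2, 2]
```

Each form adds exactly one read and one write, which matches the desugaring shown above.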

The second one is the customary way of doing it. It is more efficient.

frequency of objects in an array using Ruby

If I had a list of balls, each of which has a color property, how can I cleanly get the list of balls with the most frequent color?
[m1,m2,m3,m4]
say,
m1.color = blue
m2.color = blue
m3.color = red
m4.color = blue
[m1,m2,m4] is the list of balls with the most frequent color
My Approach is to do:
[m1,m2,m3,m4].group_by{|ball| ball.color}.each do |samecolor|
my_items = samecolor.count
end
where count is defined as
class Array
def count
k =Hash.new(0)
self.each{|x|k[x]+=1}
k
end
end
my_items will be a hash of frequencies for each same-color group. My implementation could be buggy, and I feel there must be a better and smarter way.
Any ideas, please?

You found group_by but missed max_by
max_color, max_balls = [m1,m2,m3,m4].group_by {|b| b.color}.max_by {|color, balls| balls.length}
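To make that runnable on its own, here is a small sketch. The Ball struct and its name field are stand-ins for the asker's objects; only the color attribute comes from the question.

```ruby
# Hypothetical stand-in for the asker's ball objects.
Ball = Struct.new(:name, :color)

m1 = Ball.new("m1", "blue")
m2 = Ball.new("m2", "blue")
m3 = Ball.new("m3", "red")
m4 = Ball.new("m4", "blue")

# group_by builds { color => [balls] }; max_by picks the pair
# whose array of balls is the longest.
max_color, max_balls = [m1, m2, m3, m4]
  .group_by { |b| b.color }
  .max_by { |_color, balls| balls.length }

p max_color                # => "blue"
p max_balls.map(&:name)    # => ["m1", "m2", "m4"]
```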

Your code isn't bad, but it is inefficient. If I were you I would seek a solution that iterates through your array only once, like this:
balls = [m1, m2, m3, m4]
most_idx = nil
groups = balls.inject({}) do |hsh, ball|
hsh[ball.color] = [] if hsh[ball.color].nil?
hsh[ball.color] << ball
most_idx = ball.color if hsh[most_idx].nil? || hsh[ball.color].size > hsh[most_idx].size
hsh
end
groups[most_idx] # => [m1,m2,m4]
This does basically the same thing as group_by, but at the same time it counts up the groups and keeps a record of which group is largest (most_idx).

How about:
color, balls = [m1,m2,m3,m4].group_by { |b| b.color }.max_by { |_, group| group.size }

Here's how I'd do it. The basic idea uses inject to accumulate the values into a hash, and comes from "12 - Building a Histogram" in "The Ruby Cookbook".
#!/usr/bin/env ruby
class M
attr_reader :color
def initialize(c)
@color = c
end
end
m1 = M.new('blue')
m2 = M.new('blue')
m3 = M.new('red')
m4 = M.new('blue')
hash = [m1.color, m2.color, m3.color, m4.color].inject(Hash.new(0)){ |h, x| h[x] += 1; h } # => {"blue"=>3, "red"=>1}
hash = [m1, m2, m3, m4].inject(Hash.new(0)){ |h, x| h[x.color] += 1; h } # => {"blue"=>3, "red"=>1}
There are two different ways to do it, depending on how much knowledge you want the inject() to know about your objects.

This produces a list of [color, count] pairs, sorted by frequency in descending order:
balls.group_by { |b| b.color }
.map { |k, v| [k, v.size] }
.sort_by { |k, count| -count}

Two parts: I'll use your balls example, but will also include my own Rails example.
ary = [m1,m2,m3,m4]
colors = ary.map(&:color) # or ary.map {|t| t.color }
Hash[colors.group_by {|w| w }.map {|w, ws| [w, ws.length] }]
#=> {"blue" => 3, "red" => 1 }
my ActiveRecord example
stocks = Sp500Stock.all
Hash[stocks.group_by(&:sector).map {|w, s| [w, s.length] }].sort_by { |k,v| v }
#=> {"Health Care" => 36, etc]

myhash = {}
mylist.each do |ball|
if myhash[ball.color]
myhash[ball.color] += 1
else
myhash[ball.color] = 1
end
end
puts myhash.sort{|a,b| b[1] <=> a[1]}
