I currently have this method for turning the characters of a string into hash keys, with each character's indices as the value:
def char_indices(str)
array = str.split(//)
hash = array.each_with_object({}).with_index {|(el, h), i| h[el] = []<<i}
p hash
end
which returns
{"m"=>[0], "i"=>[10], "s"=>[6], "p"=>[9]}
{"c"=>[0], "l"=>[1], "a"=>[2], "s"=>[4], "r"=>[5], "o"=>[7], "m"=>[8]}
but I need
char_indices('mississippi') # => {"m"=>[0], "i"=>[1, 4, 7, 10], "s"=>[2, 3, 5, 6], "p"=>[8, 9]}
char_indices('classroom') # => {"c"=>[0], "l"=>[1], "a"=>[2], "s"=>[3, 4], "r"=>[5], "o"=>[6, 7], "m"=>[8]}
The problem is that my value gets replaced each time, so I only end up with the last index.
How can I add each recurring location to the value in a Ruby-like fashion?
The problem is h[el] = []<<i. Instead you can use conditional assignment to ensure that you're working with an array:
def char_indices(string)
string.each_char
.with_index
.with_object({}) do |(char, index), hash|
hash[char] ||= []
hash[char] << index
end
end
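To make the difference concrete, here is a minimal sketch that runs both variants side by side. The method names last_index_only and all_indices are hypothetical and exist only for this comparison.

```ruby
# Hypothetical helper names, used only to compare the two behaviours.
def last_index_only(str)
  str.each_char.with_index.with_object({}) do |(char, index), hash|
    hash[char] = [] << index        # builds a fresh one-element array every time
  end
end

def all_indices(str)
  str.each_char.with_index.with_object({}) do |(char, index), hash|
    (hash[char] ||= []) << index    # creates the array once, then reuses it
  end
end

p last_index_only('mississippi')['i']  # [10]
p all_indices('mississippi')['i']      # [1, 4, 7, 10]
```

The first variant keeps only the last index because every assignment replaces the previous array.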
def char_indices(str)
str.each_char.
with_index.
with_object({}) do |(c,i),h|
(h[c] ||= []) << i
end
end
char_indices('mississippi')
#=> {"m"=>[0], "i"=>[1, 4, 7, 10], "s"=>[2, 3, 5, 6], "p"=>[8, 9]}
char_indices('classroom')
#=> {"c"=>[0], "l"=>[1], "a"=>[2], "s"=>[3, 4], "r"=>[5], "o"=>[6, 7], "m"=>[8]}
Notice how I've written the block variables: |(c,i),h|. This makes use of Ruby's array decomposition rules. See also this article on the subject.
The method is equivalent to the following.
def char_indices(str)
h = {}
str.each_char.
with_index do |c,i|
h[c] = [] unless h.key?(c)
h[c] << i
end
h
end
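As a small illustration of the |(c,i),h| block variables used earlier, the sketch below pulls one element out of the chained enumerator and decomposes it by hand. The variable names enum and first are assumptions made for this example.

```ruby
enum = 'abc'.each_char.with_index.with_object({})

# Each element is a two-element array: a [char, index] pair and the hash.
first = enum.next
p first.size       # 2
p first[0]         # ["a", 0]

# The parentheses tell Ruby to unpack the inner pair as well.
(c, i), h = first
p c                # "a"
p i                # 0
p h.empty?         # true
```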
One could alternatively write the method like this:
def char_indices(str)
str.each_char.
with_index.
with_object(Hash.new { |h,k| h[k] = [] }) { |(c,i),h| h[c] << i }
end
This uses the form of Hash::new that takes a block and no argument. If
h = Hash.new { |h,k| h[k] = [] }
then, possibly after keys have been added, whenever h[k] is evaluated and h.key?(k) #=> false, h[k] = [] is executed. Here, when
h[c] << i
is encountered, if h.key?(c) #=> false, h[c] = [] is executed, after which h[c] << i is executed, resulting in h[c] #=> [i].
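The block form matters here. A minimal sketch of the difference between Hash.new with a block and Hash.new([]), which hands every missing key the same array object and never stores the key (the variable names are assumptions for this example):

```ruby
# Block form: a new array is created and stored for each missing key.
with_block = Hash.new { |hash, key| hash[key] = [] }
with_block['a'] << 1
with_block['b'] << 2
p with_block.keys     # ["a", "b"]
p with_block['a']     # [1]

# Argument form: one shared default array, and no key is ever stored.
shared = Hash.new([])
shared['a'] << 1
shared['b'] << 2
p shared.keys         # []
p shared['anything']  # [1, 2]
```

For collecting indices per character, the block form is the one that behaves as intended.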
I want to be able to find the index of all occurrences of a substring in a larger string using Ruby. E.g.: all "in" in "Einstein"
str = "Einstein"
str.index("in") #returns only 1
str.scan("in") #returns ["in","in"]
#desired output would be [1, 6]
The standard hack is:
indices = "Einstein".enum_for(:scan, /(?=in)/).map do
Regexp.last_match.offset(0).first
end
#=> [1, 6]
def indices_of_matches(str, target)
sz = target.size
(0..str.size-sz).select { |i| str[i,sz] == target }
end
indices_of_matches('Einstein', 'in')
#=> [1, 6]
indices_of_matches('nnnn', 'nn')
#=> [0, 1, 2]
The second example reflects an assumption I made about the treatment of overlapping strings. If overlapping strings are not to be considered (i.e., [0, 2] is the desired return value in the second example), this answer is obviously inappropriate.
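If both behaviours are needed, one possible sketch uses String#index with a start offset and makes the overlap rule an option. The method name substring_indices and the overlap: keyword are hypothetical, not part of the answer above.

```ruby
# Hypothetical helper: collects the start index of every occurrence of target.
# overlap: true  advances one character after each hit (overlapping matches).
# overlap: false skips past the whole match (non-overlapping matches).
def substring_indices(str, target, overlap: true)
  return [] if target.empty?   # guard: an empty target would never advance

  result = []
  pos = 0
  while (found = str.index(target, pos))
    result << found
    pos = found + (overlap ? 1 : target.size)
  end
  result
end

p substring_indices('Einstein', 'in')             # [1, 6]
p substring_indices('nnnn', 'nn')                 # [0, 1, 2]
p substring_indices('nnnn', 'nn', overlap: false) # [0, 2]
```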
This is a more verbose solution which brings the advantage of not relying on a global value:
def indices(string, regex)
  Enumerator.new do |yielder|
    # Start from 0 on every enumeration, so the enumerator can be reused.
    position = 0
    while match = regex.match(string, position)
      yielder << match.begin(0)
      position = match.end(0)
    end
  end
end
p indices("Einstein", /in/).to_a
# [1, 6]
It returns an Enumerator, so you could also use it lazily or just take the first n indices.
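A short sketch of that usage, assuming a variant of indices in which position is initialised inside the enumerator block so that each enumeration starts from the beginning. The sample string is made up, and a regex that can match the empty string would loop forever here, so this assumes non-empty matches.

```ruby
def indices(string, regex)
  Enumerator.new do |yielder|
    position = 0
    while (match = regex.match(string, position))
      yielder << match.begin(0)
      position = match.end(0)
    end
  end
end

enum = indices('Einstein in Berlin', /in/)

p enum.first(2)                          # [1, 6], the search stops after two hits
p enum.lazy.map { |i| i * 10 }.first(1)  # [10]
p enum.to_a                              # [1, 6, 9, 16]
```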
Also, if you might need more information than just the indices, you could return an Enumerator of MatchData and extract the indices:
def matches(string, regex)
  Enumerator.new do |yielder|
    # Start from 0 on every enumeration, so the enumerator can be reused.
    position = 0
    while match = regex.match(string, position)
      yielder << match
      position = match.end(0)
    end
  end
end
p matches("Einstein", /in/).map{ |match| match.begin(0) }
# [1, 6]
To get the behaviour described by @Cary, you could replace the last line in the block with position = match.begin(0) + 1.
#Recursive Function
def indexes string, sub_string, start=0
index = string[start..-1].index(sub_string)
return [] unless index
[index+start] + indexes(string,sub_string,index+start+1)
end
#For better Usage I would open String class
class String
def indexes sub_string,start=0
index = self[start..-1].index(sub_string)
return [] unless index
[index+start] + indexes(sub_string,index+start+1)
end
end
With this in place, we can call: "Einstein".indexes("in") #=> [1, 6]
This is the question prompt:
Write a method that takes in a string. Your method should return the most common letter in the array, and a count of how many times it appears.
I'm not entirely sure where to go with what I have so far.
def most_common_letter(string)
arr1 = string.chars
arr2 = arr1.max_by(&:count)
end
I suggest you use a counting hash:
str = "The quick brown dog jumped over the lazy fox."
str.downcase.gsub(/[^a-z]/,'').
each_char.
with_object(Hash.new(0)) { |c,h| h[c] += 1 }.
max_by(&:last)
#=> ["e",4]
Hash::new with an argument of zero creates an empty hash whose default value is zero.
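A minimal sketch of what that default value means in practice (the key 'x' is arbitrary):

```ruby
counts = Hash.new(0)

p counts['x']        # 0, reading a missing key returns the default
p counts.key?('x')   # false, the read above did not store anything

counts['x'] += 1     # expands to counts['x'] = counts['x'] + 1
counts['x'] += 1
p counts['x']        # 2
p counts.key?('x')   # true
```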
The steps:
s = str.downcase.gsub(/[^a-z]/,'')
#=> "thequickbrowndogjumpedoverthelazyfox"
enum0 = s.each_char
#=> #<Enumerator: "thequickbrowndogjumpedoverthelazyfox":each_char>
enum1 = enum0.with_object(Hash.new(0))
#=> #<Enumerator: #<Enumerator:
# "thequickbrowndogjumpedoverthelazyfox":each_char>:with_object({})>
You can think of enum1 as a "compound" enumerator. (Study the return value above.)
Let's see the elements of enum1:
enum1.to_a
#=> [["t", {}], ["h", {}], ["e", {}], ["q", {}],..., ["x", {}]]
The first element of enum1 (["t", {}]) is passed to the block by String#each_char and assigned to the block variables:
c,h = enum1.next
#=> ["t", {}]
c #=> "t"
h #=> {}
The block calculation is then performed:
h[c] += 1
#=> h[c] = h[c] + 1
#=> h["t"] = h["t"] + 1
#=> h["t"] = 0 + 1 #=> 1
h #=> {"t"=>1}
Ruby expands h[c] += 1 to h[c] = h[c] + 1, which is h["t"] = h["t"] + 1. As h #=> {}, h has no key "t", so h["t"] on the right side of the equal sign is replaced by the hash's default value, 0. The next time c #=> "t", h["t"] = h["t"] + 1 will reduce to h["t"] = 1 + 1 #=> 2 (i.e., the default value will not be used, as h now has a key "t").
The next value of enum1 is then passed into the block and the block calculation is performed:
c,h = enum1.next
#=> ["h", {"t"=>1}]
h[c] += 1
#=> 1
h #=> {"t"=>1, "h"=>1}
The remaining elements of enum1 are processed similarly.
A simple way to do that, without worrying about checking empty letters:
letter, count = ('a'..'z')
.map {|letter| [letter, string.count(letter)] }
.max_by(&:last)
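Wrapped into the method from the question, this could look like the sketch below. The downcase call is an assumption (the snippet above counts lowercase letters only), and for an input without any letters it returns ["a", 0]. String#count treats its argument as a set of characters, which is fine here because each argument is a single letter.

```ruby
def most_common_letter(string)
  text = string.downcase
  ('a'..'z')
    .map { |letter| [letter, text.count(letter)] }
    .max_by(&:last)
end

letter, count = most_common_letter('Banana')
p letter  # "a"
p count   # 3
```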
Here is another way of doing what you want:
str = 'aaaabbbbcd'
h = str.each_char.with_object(Hash.new(0)) { |c,h| h[c] += 1 }
max = h.values.max
output_hash = Hash[h.select { |k, v| v == max}]
puts "most_frequent_value: #{max}"
puts "most frequent character(s): #{output_hash.keys}"
def most_common_letter(string)
string.downcase.split('').group_by(&:itself).map { |k, v| [k, v.size] }.max_by(&:last)
end
Edit:
Using hash:
def most_common_letter(string)
chars = {}
most_common = nil
most_common_count = 0
string.downcase.gsub(/[^a-z]/, '').each_char do |c|
count = (chars[c] = (chars[c] || 0) + 1)
if count > most_common_count
most_common = c
most_common_count = count
end
end
[most_common, most_common_count]
end
I'd like to mention a solution with Enumerable#tally, introduced in Ruby 2.7.0:
str =<<-END
Tallies the collection, i.e., counts the occurrences of each element. Returns a hash with the elements of the collection as keys and the corresponding counts as values.
END
str.scan(/[a-z]/).tally.max_by(&:last)
#=> ["e", 22]
Where:
str.scan(/[a-z]/).tally
#=> {"a"=>8, "l"=>9, "i"=>6, "e"=>22, "s"=>12, "t"=>13, "h"=>9, "c"=>11, "o"=>11, "n"=>11, "u"=>5, "r"=>5, "f"=>2, "m"=>2, "w"=>1, "k"=>1, "y"=>1, "d"=>2, "p"=>1, "g"=>1, "v"=>1}
char, count = string.split('').
group_by(&:downcase).
map { |k, v| [k, v.size] }.
max_by { |_, v| v }
I am new to Ruby and trying to write a method that will return an array of the most common word(s) in a string. If there is one word with a high count, that word should be returned. If there are two words tied for the high count, both should be returned in an array.
The problem is that when I pass through the 2nd string, the code only counts "words" twice instead of three times. When the 3rd string is passed through, it returns "it" with a count of 2, which makes no sense, as "it" should have a count of 1.
def most_common(string)
counts = {}
words = string.downcase.tr(",.?!",'').split(' ')
words.uniq.each do |word|
counts[word] = 0
end
words.each do |word|
counts[word] = string.scan(word).count
end
max_quantity = counts.values.max
max_words = counts.select { |k, v| v == max_quantity }.keys
puts max_words
end
most_common('a short list of words with some words') #['words']
most_common('Words in a short, short words, lists of words!') #['words']
most_common('a short list of words with some short words in it') #['words', 'short']
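The problem is the way you count instances of each word. string.scan(word) counts substring matches, and "it" is a substring of "with", so it is counted twice. Likewise, scan runs against the original string rather than the downcased one, so the capitalized "Words" is never matched.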
[1] pry(main)> 'with some words in it'.scan('it')
=> ["it", "it"]
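Both effects can be reproduced in isolation. The sketch below compares substring counting with two whole-word alternatives; the variable names are assumptions for this example.

```ruby
text = 'with some words in it'

p text.scan('it').size        # 2, the substring match also hits "with"
p text.scan(/\bit\b/).size    # 1, whole-word match only
p text.split.count('it')      # 1, counting the split words directly

original = 'Words in a short, short words, lists of words!'
p original.scan('words').size            # 2, the capitalized "Words" is missed
p original.downcase.scan('words').size   # 3
```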
There is an easier way, though: you can count the occurrences of each word using an each_with_object call, like so:
counts = words.each_with_object(Hash.new(0)) { |e, h| h[e] += 1 }
This goes through each entry in the array and adds 1 to the value for each word's entry in the hash.
So the following should work for you:
def most_common(string)
words = string.downcase.tr(",.?!",'').split(' ')
counts = words.each_with_object(Hash.new(0)) { |e, h| h[e] += 1 }
max_quantity = counts.values.max
counts.select { |k, v| v == max_quantity }.keys
end
p most_common('a short list of words with some words') #['words']
p most_common('Words in a short, short words, lists of words!') #['words']
p most_common('a short list of words with some short words in it') #['short', 'words']
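On Ruby 2.7 or later, the counting step could also be written with Enumerable#tally. This is only a sketch of the same approach; the method name most_common_tally is hypothetical.

```ruby
# Assumes Ruby >= 2.7 for Enumerable#tally.
def most_common_tally(string)
  counts = string.downcase.tr(',.?!', '').split.tally
  max_quantity = counts.values.max
  counts.select { |_word, count| count == max_quantity }.keys
end

p most_common_tally('a short list of words with some words')             # ["words"]
p most_common_tally('Words in a short, short words, lists of words!')    # ["words"]
p most_common_tally('a short list of words with some short words in it') # ["short", "words"]
```

Tied words come back in the order in which they first appear in the string.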
As Nick has answered your question, I will just suggest another way this can be done. As "high count" is vague, I suggest you return a hash with downcased words and their respective counts. Since Ruby 1.9, hashes retain the order that key-value pairs have been entered, so we may want to make use of that and return the hash with key-value pairs ordered in decreasing order of values.
Code
def words_by_count(str)
str.gsub(/./) do |c|
case c
when /\w/ then c.downcase
when /\s/ then c
else ''
end
end.split
.group_by {|w| w}
.map {|k,v| [k,v.size]}
.sort_by(&:last)
.reverse
.to_h
end
words_by_count('Words in a short, short words, lists of words!')
The method Array#to_h was introduced in Ruby 2.1. For earlier Ruby versions, one must use:
Hash[str.gsub(/./)... .reverse]
Example
words_by_count('a short list of words with some words')
#=> {"words"=>2, "of"=>1, "some"=>1, "with"=>1,
# "list"=>1, "short"=>1, "a"=>1}
words_by_count('Words in a short, short words, lists of words!')
#=> {"words"=>3, "short"=>2, "lists"=>1, "a"=>1, "in"=>1, "of"=>1}
words_by_count('a short list of words with some short words in it')
#=> {"words"=>2, "short"=>2, "it"=>1, "with"=>1,
# "some"=>1, "of"=>1, "list"=>1, "in"=>1, "a"=>1}
Explanation
Here is what's happening in the second example, where:
str = 'Words in a short, short words, lists of words!'
str.gsub(/./) do |c|... matches each character in the string and sends it to the block to decide what to do with it. As you see, word characters are downcased, whitespace is left alone and everything else is removed (replaced by an empty string).
s = str.gsub(/./) do |c|
case c
when /\w/ then c.downcase
when /\s/ then c
else ''
end
end
#=> "words in a short short words lists of words"
This is followed by
a = s.split
#=> ["words", "in", "a", "short", "short", "words", "lists", "of", "words"]
h = a.group_by {|w| w}
#=> {"words"=>["words", "words", "words"], "in"=>["in"], "a"=>["a"],
# "short"=>["short", "short"], "lists"=>["lists"], "of"=>["of"]}
b = h.map {|k,v| [k,v.size]}
#=> [["words", 3], ["in", 1], ["a", 1], ["short", 2], ["lists", 1], ["of", 1]]
c = b.sort_by(&:last)
#=> [["of", 1], ["in", 1], ["a", 1], ["lists", 1], ["short", 2], ["words", 3]]
d = c.reverse
#=> [["words", 3], ["short", 2], ["lists", 1], ["a", 1], ["in", 1], ["of", 1]]
d.to_h # or Hash[d]
#=> {"words"=>3, "short"=>2, "lists"=>1, "a"=>1, "in"=>1, "of"=>1}
Note that c = b.sort_by(&:last), d = c.reverse can be replaced by:
d = b.sort_by { |_,k| -k }
#=> [["words", 3], ["short", 2], ["a", 1], ["in", 1], ["lists", 1], ["of", 1]]
but sort followed by reverse is generally faster.
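Note that sort_by is not guaranteed to be stable, so the order of words with equal counts may differ between these variants. If a deterministic order is wanted, one option is to add the word itself as a tiebreaker. A sketch, using the array b from above:

```ruby
b = [['words', 3], ['in', 1], ['a', 1], ['short', 2], ['lists', 1], ['of', 1]]

# Sort by descending count, then alphabetically by word.
ordered = b.sort_by { |word, count| [-count, word] }

p ordered.map(&:first)  # ["words", "short", "a", "in", "lists", "of"]
p ordered.to_h.keys.first  # "words"
```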
def count_words string
word_list = Hash.new(0)
words = string.downcase.delete(',.?!').split
words.map { |word| word_list[word] += 1 }
word_list
end
def most_common_words string
hash = count_words string
max_value = hash.values.max
hash.select { |k, v| v == max_value }.keys
end
most_common_words 'a short list of words with some words'
#=> ["words"]
most_common_words 'Words in a short, short words, lists of words!'
#=> ["words"]
most_common_words 'a short list of words with some short words in it'
#=> ["short", "words"]
Assuming string is a string containing multiple words.
words = string.split(/[.!?,\s]/)
words.sort_by{|x|words.count(x)}
Here we split the string into an array of words, then sort that array by how often each word occurs in it. The most common words will appear at the end.
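A runnable sketch of this idea, with three assumptions that are not in the snippet above: the string is downcased, the split pattern uses + so that a run such as ", " does not produce empty strings, and uniq is applied so that each word appears once in the ranking.

```ruby
string = 'a short list of words with some words'

words  = string.downcase.split(/[.!?,\s]+/)
ranked = words.uniq.sort_by { |word| words.count(word) }

p ranked.last                 # "words"
p words.count(ranked.last)    # 2
```

Array#count rescans the array for every word, so this is simple rather than fast.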
The same thing can be done in the following way too:
def most_common(string)
counts = Hash.new 0
string.downcase.tr(",.?!",'').split(' ').each{|word| counts[word] += 1}
# For "Words in a short, short words, lists of words!"
# counts ---> {"words"=>3, "in"=>1, "a"=>1, "short"=>2, "lists"=>1, "of"=>1}
max_value = counts.values.max
#max_value ---> 3
return counts.select{|key , value| value == max_value}
#returns ---> {"words"=>3}
end
This is just a shorter solution, which you might want to use. Hope it helps :)
This is the kind of question programmers love, isn't it :) How about a functional approach?
# returns array of words after removing certain English punctuations
def english_words(str)
str.downcase.delete(',.?!').split
end
# returns hash mapping element to count
def element_counts(ary)
ary.group_by { |e| e }.inject({}) { |a, e| a.merge(e[0] => e[1].size) }
end
def most_common(ary)
ary.empty? ? nil :
element_counts(ary)
.group_by { |k, v| v }
.sort
.last[1]
.map(&:first)
end
most_common(english_words('a short list of words with some short words in it'))
#=> ["short", "words"]
def firstRepeatedWord(string)
h_data = Hash.new(0)
string.split(" ").each{|x| h_data[x] +=1}
h_data.key(h_data.values.max)
end
def common(string)
counts=Hash.new(0)
words=string.downcase.delete('.,!?').split(" ")
words.each {|k| counts[k]+=1}
p counts.max_by(&:last)
end
In Ruby, given an array of elements, what is the easiest way to return the indices of the elements that are not identical?
array = ['a','b','a','a','a','c'] #=> [1,5]
Expanded question:
Assuming that the identity threshold is based on the most frequent element in the array.
array = ['a','c','a','a','a','d','d'] #=> [1,5,6]
For an array with two equally frequent elements, return the indices of either of the 2 elements. e.g.
array = ['a','a','a','b','b','b'] #=>[0,1,2] or #=> [3,4,5]
Answer edited after question edit:
def idx_by_th(arr)
idx = []
occur = arr.inject(Hash.new(0)) { |k,v| k[v] += 1; k }
th = arr.sort_by { |v| occur[v] }.last
arr.each_index {|i| idx << i if arr[i]!=th}
idx
end
idx_by_th ['a','b','a','a','a','c'] # => [1, 5]
idx_by_th ['a','c','a','a','a','d','d'] # => [1, 5, 6]
idx_by_th ['a','a','a','b','b','b'] # => [0, 1, 2]
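For comparison, a sketch of the same idea on Ruby 2.7 or later using Enumerable#tally. The method name indices_not_most_frequent is hypothetical, and which element wins a tie is treated as unspecified, as the question allows.

```ruby
# Assumes Ruby >= 2.7 for Enumerable#tally.
def indices_not_most_frequent(arr)
  return [] if arr.empty?

  # max_by yields [element, count] pairs; keep only the element.
  most_frequent, = arr.tally.max_by { |_element, count| count }
  arr.each_index.reject { |i| arr[i] == most_frequent }
end

p indices_not_most_frequent(['a', 'b', 'a', 'a', 'a', 'c'])       # [1, 5]
p indices_not_most_frequent(['a', 'c', 'a', 'a', 'a', 'd', 'd'])  # [1, 5, 6]
```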
These answers are valid for the first version of the question:
ruby < 1.8.7
def get_uniq_idx(arr)
test=[]; idx=[]
arr.each_index do |i|
idx << i if !(arr[i+1..arr.length-1] + test).include?(arr[i])
test << arr[i]
end
return idx
end
puts get_uniq_idx(['a','b','a','a','a','c']).inspect # => [1, 5]
ruby >= 1.8.7:
idxs=[]
array.each_index {|i| idxs<<i if !(array.count(array[i]) > 1)}
puts idxs.inspect # => [1, 5]
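On Ruby 2.7 or later, the counts can be computed once up front instead of calling count for every element. A minimal sketch of that variation (the variable names are assumptions):

```ruby
# Assumes Ruby >= 2.7 for Enumerable#tally.
array  = ['a', 'b', 'a', 'a', 'a', 'c']
counts = array.tally

# Keep the indices of elements that occur exactly once.
unique_indices = array.each_index.select { |i| counts[array[i]] == 1 }
p unique_indices  # [1, 5]
```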
It's not quite clear what you're looking for, but is something like this what you want?
array = ['a','b','a','a','a','c']
array.uniq.inject([]) do |arr, elem|
if array.count(elem) == 1
arr << array.index(elem)
end
arr
end
# => [1,5]