How to find word with the greatest number of repeated letters - ruby

My goal is to find the word with the greatest number of repeated letters in a given string. For example, "aabcc ddeeteefef iijjfff" would return "ddeeteefef" because "e" is repeated five times in this word, which is more than any other repeated character.
So far this is what I've got, but it has many problems and is not complete:
def LetterCountI(str)
  s = str.split(" ")
  i = 0
  result = []
  t = s[i].scan(/((.)\2+)/).map(&:max)
  u = t.max { |a, b| a.length <=> b.length }
  return u.split(//).count
end
The code I have only finds consecutive patterns: if the pattern is interrupted (as in "aabaaa"), it counts "a" three times instead of five.
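To see the difference concretely, here is a small sketch (the variable names are illustrative) that compares the longest run of consecutive identical characters, which is what a backreference regex like the one above measures, with the total number of times each character occurs anywhere in the word:

```ruby
word = "aabaaa"

# Longest run of consecutive identical characters: runs are "aa", "b", "aaa"
longest_run = word.scan(/((.)\2*)/).map { |run, _char| run.length }.max

# Total occurrences of each distinct character, wherever it appears
max_total = word.chars.uniq.map { |char| word.count(char) }.max

puts longest_run # prints 3
puts max_total   # prints 5
```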

str.scan(/\w+/).max_by{ |w| w.chars.group_by(&:to_s).values.map(&:size).max }
scan(/\w+/) — create an array of all sequences of 'word' characters
max_by{ … } — find the word that gives the largest value inside this block
chars — split the string into characters
group_by(&:to_s) — create a hash mapping each character to an array of all the occurrences
values — just get all the arrays of the occurrences
map(&:size) — convert each array to the number of characters in that array
max — find the largest count and use it as the value for max_by to compare
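As an illustration of those intermediate steps, this sketch runs them on a single word from the example and keeps each intermediate value in a variable:

```ruby
word = "ddeeteefef"

# Each character mapped to an array of its occurrences, in order of first appearance:
# groups == { "d" => ["d", "d"], "e" => ["e"] * 5, "t" => ["t"], "f" => ["f", "f"] }
groups = word.chars.group_by(&:to_s)

sizes = groups.values.map(&:size) # [2, 5, 1, 2]
p sizes.max                       # prints 5
```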
Edit: Written less compactly:
str.scan(/\w+/).max_by do |word|
  word.chars
      .group_by{ |char| char }
      .map{ |char,array| array.size }
      .max
end
Written less functionally and with less Ruby-isms (to make it look more like "other" languages):
words_by_most_repeated = []
str.split(" ").each do |word|
  count_by_char = {} # hash mapping character to count of occurrences
  word.chars.each do |char|
    count_by_char[ char ] = 0 unless count_by_char[ char ]
    count_by_char[ char ] += 1
  end
  maximum_count = 0
  count_by_char.each do |char,count|
    if count > maximum_count then
      maximum_count = count
    end
  end
  words_by_most_repeated[ maximum_count ] = word
end
most_repeated = words_by_most_repeated.last
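On Ruby 2.7 or later, Enumerable#tally builds the character-to-count hash directly, so the group_by/size step (or the manual counting loop) can be shortened. A sketch, assuming Ruby 2.7+; most_repeated_word is a hypothetical method name:

```ruby
# Requires Ruby 2.7+ for Enumerable#tally; most_repeated_word is an illustrative name
def most_repeated_word(str)
  # tally returns e.g. { "d" => 2, "e" => 5, "t" => 1, "f" => 2 } for "ddeeteefef"
  str.split.max_by { |word| word.chars.tally.values.max }
end

p most_repeated_word("aabcc ddeeteefef iijjfff") # prints "ddeeteefef"
```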

I'd do it as below:
s = "aabcc ddeeteefef iijjfff"
# intermediate calculation that's happening in the final code
s.split(" ").map { |w| w.chars.max_by { |e| w.count(e) } }
# => ["a", "e", "f"] # getting the max count character from each word
s.split(" ").map { |w| w.count(w.chars.max_by { |e| w.count(e) }) }
# => [2, 5, 3] # getting the max count character's count from each word
# final code
s.split(" ").max_by { |w| w.count(w.chars.max_by { |e| w.count(e) }) }
# => "ddeeteefef"
Update:
In this benchmark, the each_with_object version (arup_v1) beats the group_by version, and the String#count version (arup_v2) is faster still.
require 'benchmark'
s = "aabcc ddeeteefef iijjfff"
def phrogz(s)
  s.scan(/\w+/).max_by{ |word| word.chars.group_by(&:to_s).values.map(&:size).max }
end
def arup_v1(s)
  max_string = s.split.max_by do |w|
    h = w.chars.each_with_object(Hash.new(0)) do |e,hsh|
      hsh[e] += 1
    end
    h.values.max
  end
end
def arup_v2(s)
  s.split.max_by { |w| w.count(w.chars.max_by { |e| w.count(e) }) }
end
n = 100_000
Benchmark.bm do |x|
  x.report("Phrogz:") { n.times {|i| phrogz s } }
  x.report("arup_v2:"){ n.times {|i| arup_v2 s } }
  x.report("arup_v1:"){ n.times {|i| arup_v1 s } }
end
Output:
               user     system      total        real
Phrogz:    1.981000   0.000000   1.981000 (  1.979198)
arup_v2:   0.874000   0.000000   0.874000 (  0.878088)
arup_v1:   1.684000   0.000000   1.684000 (  1.685168)

Similar to sawa's answer:
"aabcc ddeeteefef iijjfff".split.max_by{|w| w.length - w.chars.uniq.length}
# => "ddeeteefef"
In Ruby 2.x, this works as-is because String#chars returns an array. In earlier versions of Ruby, String#chars returns an enumerator, so you need to add .to_a before applying uniq. I did my testing in Ruby 2.0 and overlooked this until Stephens pointed it out.
I believe this is valid, since the question asks for the "greatest number of repeated letters in a given string" rather than the greatest number of repeats of a single letter.
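The two readings of the question can pick different words. A small sketch with made-up input words showing where they disagree:

```ruby
words = %w[aabbcc aaab] # illustrative input

# Reading 1: total repeated letters = length minus number of distinct letters
by_total_repeats = words.max_by { |w| w.length - w.chars.uniq.length }

# Reading 2: highest count of any single letter
by_single_letter = words.max_by { |w| w.chars.map { |c| w.count(c) }.max }

p by_total_repeats # prints "aabbcc" (6 - 3 = 3 beats 4 - 2 = 2)
p by_single_letter # prints "aaab" (three a's beat two of anything)
```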

"aabcc ddeeteefef iijjfff"
.split.max_by{|w| w.chars.sort.chunk{|e| e}.map{|e| e.last.length}.max}
# => "ddeeteefef"

Related

Ruby merge duplicates in string

If I have a string like this:
str = <<END
7312357006,1.121
3214058234,3456
7312357006,1234
1324958723,232.1
3214058234,43.2
3214173443,234.1
6134513494,23.2
7312357006,11.1
END
If a number in the first value shows up again, I want to add their second values together, so the final string would look like this:
7312357006,1246.221
3214058234,3499.2
1324958723,232.1
3214173443,234.1
6134513494,23.2
If the final output is an array that's fine too.
There are lots of ways to do this in Ruby. One particularly terse way is to use String#scan:
str = <<END
7312357006,1.121
3214058234,3456
7312357006,1234
1324958723,232.1
3214058234,43.2
3214173443,234.1
6134513494,23.2
7312357006,11.1
END
data = Hash.new(0)
str.scan(/(\d+),([\d.]+)/) {|k,v| data[k] += v.to_f }
p data
# => { "7312357006" => 1246.221,
# "3214058234" => 3499.2,
# "1324958723" => 232.1,
# "3214173443" => 234.1,
# "6134513494" => 23.2 }
This uses the regular expression /(\d+),([\d.]+)/ to extract the two values from each line. The block is called with each pair as arguments, which are then merged into the hash.
This could also be written as a single expression using each_with_object:
data = str.scan(/(\d+),([\d.]+)/)
.each_with_object(Hash.new(0)) {|(k,v), hsh| hsh[k] += v.to_f }
# => (same as above)
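In the each_with_object version, scan is called without a block, so it returns every captured pair as an array, which each_with_object then walks over, destructuring each pair with (k,v). On the first three input lines, that intermediate array looks like this:

```ruby
sample = "7312357006,1.121\n3214058234,3456\n7312357006,1234\n"

pairs = sample.scan(/(\d+),([\d.]+)/)
p pairs
# prints [["7312357006", "1.121"], ["3214058234", "3456"], ["7312357006", "1234"]]
```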
There are likewise many ways to print the result, but here are a couple I like:
puts data.map {|kv| kv.join(",") }.join("\n")
# => 7312357006,1246.221
# 3214058234,3499.2
# 1324958723,232.1
# 3214173443,234.1
# 6134513494,23.2
# or:
puts data.map {|k,v| "#{k},#{v}\n" }.join
# => (same as above)
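Summing with to_f makes every total a binary floating-point number, so longer inputs may pick up small rounding errors. If exact decimal totals matter (an assumption about your use case), one option is String#to_r, which parses a decimal string into an exact Rational. This sketch accumulates Rationals and converts to Float only for printing:

```ruby
str = <<END
7312357006,1.121
3214058234,3456
7312357006,1234
1324958723,232.1
3214058234,43.2
3214173443,234.1
6134513494,23.2
7312357006,11.1
END

# 0r is a Rational zero; "1.121".to_r is exactly 1121/1000
totals = Hash.new(0r)
str.scan(/(\d+),([\d.]+)/) { |k, v| totals[k] += v.to_r }

# Convert to Float only when formatting the output
output = totals.map { |k, v| "#{k},#{v.to_f}" }.join("\n")
puts output
```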
Edit: Although I don't recommend either of these for the sake of readability, here's more just for kicks (requires Ruby 2.4+):
data = str.lines.group_by {|s| s.slice!(/(\d+),/); $1 }
.transform_values {|a| a.sum(&:to_f) }
...or, going straight to a string:
puts str.lines.group_by {|s| s.slice!(/(\d+),/); $1 }
.map {|k,vs| "#{k},#{vs.sum(&:to_f)}\n" }.join
You could achieve this using each_with_object, as below:
str = "7312357006,1.121
3214058234,3456
7312357006,1234
1324958723,232.1
3214058234,43.2
3214173443,234.1
6134513494,23.2
7312357006,11.1"
# convert the string into nested pairs of floats
# to briefly summarise the steps: split entries by newline, strip whitespace, split by comma, convert to floats
arr = str.split("\n").map(&:strip).map { |el| el.split(",").map(&:to_f) }
result = arr.each_with_object(Hash.new(0)) do |el, hash|
hash[el.first] += el.last
end
# => {7312357006.0=>1246.221, 3214058234.0=>3499.2, 1324958723.0=>232.1, 3214173443.0=>234.1, 6134513494.0=>23.2}
# You can then call `to_a` on result if you want:
result.to_a
# => [[7312357006.0, 1246.221], [3214058234.0, 3499.2], [1324958723.0, 232.1], [3214173443.0, 234.1], [6134513494.0, 23.2]]
each_with_object iterates through each pair of data, giving the block access to an accumulator (in this case, the hash). With this approach, we add each entry to the hash and sum the totals for keys that appear more than once.
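The accumulator pattern can be seen in isolation in this sketch (the data is made up for illustration): each_with_object passes the same hash to every iteration and returns that hash at the end, and the parenthesized block parameter (key, value) destructures each pair.

```ruby
pairs = [["a", 1.0], ["b", 2.0], ["a", 3.0]] # illustrative data

totals = pairs.each_with_object(Hash.new(0)) do |(key, value), acc|
  acc[key] += value # acc is the same Hash object on every iteration
end

# totals == { "a" => 4.0, "b" => 2.0 }
p totals
```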
Hope that helps - let me know if you've any questions.
def combine(str)
  str.each_line.with_object(Hash.new(0)) do |s,h|
    k,v = s.split(',')
    h.update(k=>v.to_f) { |k,o,n| o+n }
  end.reduce('') { |s,kv_pair| s << "%s,%g\n" % kv_pair }
end
puts combine str
# Output:
# 7312357006,1246.22
# 3214058234,3499.2
# 1324958723,232.1
# 3214173443,234.1
# 6134513494,23.2
Notes:
using String#each_line is preferable to str.split("\n") because the former returns an enumerator whereas the latter builds a temporary array. Each element generated by the enumerator is a line of str that (unlike the elements of str.split("\n")) ends with a newline character, but that is of no concern here, since to_f ignores the trailing newline.
see Hash::new, specifically when a default value (here 0) is used. If a hash has been defined as h = Hash.new(0) and h does not have a key k, h[k] returns the default value, zero (h is not changed). When Ruby encounters the expression h[k] += 1, the first thing she does is expand it to h[k] = h[k] + 1. If h has been defined with a default value of zero and h does not have a key k, h[k] on the right of the equals sign (syntactic sugar¹ for h.[](k)) returns zero.
see Hash#update (aka merge!). h.update(k=>v.to_f) is syntactic sugar for h.update({ k=>v.to_f }).
see Kernel#sprintf for explanations of the formatting directives %s and %g.
the receiver of the expression reduce('') { |s,kv_pair| s << "%s,%g\n" % kv_pair } (in the penultimate line of the method) is the following hash:
{"7312357006"=>1246.221, "3214058234"=>3499.2, "1324958723"=>232.1,
"3214173443"=>234.1, "6134513494"=>23.2}
¹ Syntactic sugar is a shortcut allowed by Ruby.
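The two Hash behaviours the notes rely on can be checked in a few lines: reading a missing key from a Hash.new(0) returns the default without storing it, and Hash#update with a block only calls the block when a key already exists.

```ruby
h = Hash.new(0)
p h["missing"]      # prints 0: the default value is returned...
p h.key?("missing") # prints false: ...but no key is added
h["x"] += 1         # expands to h["x"] = h["x"] + 1, i.e. 0 + 1
# h == { "x" => 1 }

# With a block, Hash#update (merge!) resolves key collisions:
# the block receives the key, the old value and the new value.
g = { "k" => 1.5 }
g.update("k" => 2.0) { |_key, old_value, new_value| old_value + new_value }
g.update("j" => 4.0) { |_key, old_value, new_value| old_value + new_value } # no collision, block not called
# g == { "k" => 3.5, "j" => 4.0 }
```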
I implemented this solution because using a hash was giving me issues:
# s is the input string (called str in the question)
d = []
s.split("\n").each do |line|
  x = 0
  q = 0
  dup = false
  line.split(",").each do |data|
    if x == 0 and d.include? data then dup = true ; q = d.index(data) elsif x == 0 then d << data end
    if x == 1 and dup == false then d << data end
    if x == 1 and dup == true then d[q+1] = '%.2f' % (d[q+1].to_f + data.to_f) end
    if x == 2 and dup == false then d << data end
    x += 1
  end
end
x = 0
s = ""
d.each do |val|
  if x == 0
    s << "#{val},"  # first column: the key
    x = 1
  else
    s << "#{val}\n" # second column: the (summed) value, then end the row
    x = 0
  end
end
puts(s)

String compressor (Ruby)

Here is my Ruby code for word compression.
For any given word (e.g. abbbcca), the compressed output should be in the format "letter+repetition" (for the above example, the output is a1b3c2a1).
I'm close to done, but my result isn't in the expected format. My code counts every occurrence of each letter in string.chars.each, so the output is a2b3c2a2.
Any help?
def string_compressor(string)
  new_string = []
  puts string.squeeze
  string.squeeze.chars.each { |s|
    count = 0
    string.chars.each { |w|
      if [s] == [w]
        count += 1
      end
    }
    new_string << "#{s}#{count}"
    puts "#{new_string}"
  }
  if new_string.length > string.length
    return string
  elsif new_string.length < string.length
    return new_string
  else "Equal"
  end
end
string_compressor("abbbcca")
'abbbcca'.chars.chunk{|c| c}.map{|c, a| [c, a.size]}.flatten.join
Adapted from a similar question.
Similar:
'abbbcca'.chars.chunk{|c| c}.map{|c, a| "#{c}#{a.size}"}.join
See the Enumerable#chunk documentation.
You can use a regular expression for that.
'abbbcca'.gsub(/(.)\1*/) { |m| "%s%d" % [m[0], m.size] }
#=> "a1b3c2a1"
The regular expression reads, "match any character, capturing it in group 1. Then match the contents of capture group 1 zero or more times".
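To see what that regular expression matches, String#gsub can be called with a pattern and no block, in which case it returns an enumerator over the matches. This sketch uses that to list the runs before formatting them:

```ruby
runs = 'abbbcca'.gsub(/(.)\1*/).to_a
p runs # prints ["a", "bbb", "cc", "a"]

compressed = runs.map { |run| "#{run[0]}#{run.size}" }.join
p compressed # prints "a1b3c2a1"
```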
As you said, your code counts every letter in the string, not just the ones grouped next to one another.
Here's a modified version:
def display_count(count)
  if count == 1
    ""
  else
    count.to_s
  end
end
def string_compressor(string)
  new_string = ''
  last_char = nil
  count = 0
  string.chars.each do |char|
    if char == last_char
      count += 1
    else
      new_string << "#{last_char}#{display_count(count)}" if last_char
      last_char = char
      count = 1
    end
  end
  new_string << "#{last_char}#{display_count(count)}" if last_char
  new_string
end
p string_compressor('abbbcca') #=> "ab3c2a"
p string_compressor('aaaabbb') #=> "a4b3"
p string_compressor('aabb') #=> "a2b2"
p string_compressor('abc') #=> "abc"
Note that because display_count omits counts of 1, new_string can never be longer than string. It also probably isn't a good idea to return "Equal" as a supposedly compressed string.
To decompress the string:
def string_decompressor(string)
  string.gsub(/([a-z])(\d+)/i){$1*$2.to_i}
end
p string_decompressor("a5b11") #=> "aaaaabbbbbbbbbbb"
p string_decompressor("ab3c2a") #=> "abbbcca"
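Since the two methods are meant to be inverses, a round-trip check on sample inputs is a cheap test. This sketch uses a chunk-based stand-in for the compressor (same output format, counts of 1 omitted; compress is a hypothetical name) so it runs on its own, plus the decompressor from the answer above, copied unchanged. It assumes the input contains letters only, since digits in the original text would be misread by the decompressor's regular expression.

```ruby
# Stand-in compressor with the same output format as string_compressor above
def compress(string)
  string.chars.chunk { |c| c }.map { |c, run| run.size == 1 ? c : "#{c}#{run.size}" }.join
end

# Decompressor from the answer above
def string_decompressor(string)
  string.gsub(/([a-z])(\d+)/i){$1*$2.to_i}
end

%w[abbbcca aaaabbb aabb abc aaaaaaaaaab].each do |word|
  round_trip = string_decompressor(compress(word))
  puts "#{word} -> #{compress(word)} -> #{round_trip}"
end
```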

Check whether a string contains all the characters of another string in Ruby

Let's say I have a string, like string = "aasmflathesorcerersnstonedksaottersapldrrysaahf". If you haven't noticed, you can find all the letters of the phrase "harry potter and the sorcerers stone" in there (minus the spaces).
I need to check whether string contains all the characters of another string.
string.include?("sorcerer") #=> true
string.include?("harrypotterandtheasorcerersstone") #=> false, even though it contains all the letters to spell harrypotterandthesorcerersstone
include? does not work on a shuffled string.
How can I check if a string contains all the elements of another string?
Sets and array intersection don't account for repeated chars, but a histogram / frequency counter does:
require 'facets'
s1 = "aasmflathesorcerersnstonedksaottersapldrrysaahf"
s2 = "harrypotterandtheasorcerersstone"
freq1 = s1.chars.frequency
freq2 = s2.chars.frequency
freq2.all? { |char2, count2| freq1[char2] >= count2 }
#=> true
Write your own Array#frequency if you don't want the facets dependency.
class Array
  def frequency
    Hash.new(0).tap { |counts| each { |v| counts[v] += 1 } }
  end
end
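Since Ruby 2.7, core Enumerable#tally returns the same kind of element-to-count hash, which is another way to avoid the facets dependency. A sketch; contains_letters? is a hypothetical name, and Hash#fetch with a default of 0 handles characters that are missing from the first string entirely:

```ruby
# Requires Ruby 2.7+ for Enumerable#tally
def contains_letters?(haystack, needle)
  available = haystack.chars.tally
  needle.chars.tally.all? { |char, needed| available.fetch(char, 0) >= needed }
end

s1 = "aasmflathesorcerersnstonedksaottersapldrrysaahf"
p contains_letters?(s1, "harrypotterandtheasorcerersstone") # prints true
p contains_letters?(s1, "hogwarts")                          # prints false (no "g" or "w")
```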
I presume that if the string to be checked is "sorcerer", string must include, for example, three "r"'s. If so you could use the method Array#difference, which I've proposed be added to the Ruby core.
class Array
  def difference(other)
    h = other.each_with_object(Hash.new(0)) { |e,counts| counts[e] += 1 }
    reject { |e| h[e] > 0 && h[e] -= 1 }
  end
end
str = "aasmflathesorcerersnstonedksaottersapldrrysaahf"
target = "sorcerer"
target.chars.difference(str.chars).empty?
#=> true
target = "harrypotterandtheasorcerersstone"
target.chars.difference(str.chars).empty?
#=> true
If the characters of target must not only be in str, but must be in the same order, we could write:
target = "sorcerer"
r = Regexp.new(target.chars.join(".*"))
#=> /s.*o.*r.*c.*e.*r.*e.*r/
str =~ r
#=> 2 (truthy)
(or !!(str =~ r) #=> true)
target = "harrypotterandtheasorcerersstone"
r = Regexp.new(target.chars.join(".*"))
#=> /h.*a.*r.*r.*y.* ... o.*n.*e/
str =~ r
#=> nil
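Because each .* can backtrack, a pattern built this way may get slow for long targets that almost match. An alternative sketch (a swapped-in technique, not part of the answer above; subsequence? is a hypothetical name) does a single left-to-right pass with String#index, greedily taking the earliest possible position for each character, which is sufficient for an in-order check:

```ruby
# Returns true if every character of target appears in str, in the same order
def subsequence?(str, target)
  position = 0
  target.each_char do |char|
    found = str.index(char, position) # earliest occurrence at or after position
    return false unless found
    position = found + 1
  end
  true
end

str = "aasmflathesorcerersnstonedksaottersapldrrysaahf"
p subsequence?(str, "sorcerer")                         # prints true
p subsequence?(str, "harrypotterandtheasorcerersstone") # prints false
```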
A different albeit not necessarily better solution using sorted character arrays and sub-strings:
Given your two strings...
subject = "aasmflathesorcerersnstonedksaottersapldrrysaahf"
search = "harrypotterandthesorcerersstone"
You can sort your subject string using .chars.sort.join...
subject = subject.chars.sort.join # => "aaaaaaacddeeeeeffhhkllmnnoooprrrrrrssssssstttty"
And then produce a list of substrings to search for:
search = search.chars.group_by(&:itself).values.map(&:join)
# => ["hh", "aa", "rrrrrr", "y", "p", "ooo", "tttt", "eeeee", "nn", "d", "sss", "c"]
You could alternatively produce the same substrings (in sorted order) using this method:
search = search.chars.sort.join.scan(/((.)\2*)/).map(&:first)
And then simply check whether every search sub-string appears within the sorted subject string:
search.all? { |c| subject[c] }
Create a 2 dimensional array out of your string letter bank, to associate the count of letters to each letter.
Create a 2 dimensional array out of the harry potter string in the same way.
Loop through both and do comparisons.
I have no experience in Ruby but this is how I would start to tackle it in the language I know most, which is Java.
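In Ruby, those steps might look like this sketch, which uses arrays of [letter, count] pairs as the 2-dimensional arrays. letter_counts and enough_letters? are hypothetical names, and passing single characters to String#count assumes ordinary letters, since some characters (such as "^" and "-") can have special meaning in its argument.

```ruby
# Build a 2-D array of [letter, count] pairs, one row per distinct letter
def letter_counts(text)
  text.chars.uniq.map { |letter| [letter, text.count(letter)] }
end

# Loop over the phrase's rows and compare each against the letter bank's rows
def enough_letters?(bank, phrase)
  bank_counts = letter_counts(bank)
  letter_counts(phrase).all? do |letter, needed|
    row = bank_counts.find { |bank_letter, _count| bank_letter == letter }
    !row.nil? && row[1] >= needed
  end
end

bank = "aasmflathesorcerersnstonedksaottersapldrrysaahf"
p enough_letters?(bank, "harrypotterandthesorcerersstone") # prints true
p enough_letters?(bank, "yyy")                             # prints false (only one "y")
```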

Return array with the longest strings only

I need all 3 tests to print true. The method should return only the longest strings in the array.
# Tests
p longest(['tres', 'pez', 'alerta', 'cuatro', 'tesla', 'tropas', 'siete']) == ["alerta", "cuatro", "tropas"]
p longest(['gato', 'perro', 'elefante', 'jirafa']) == ["elefante"]
p longest(['verde', 'rojo', 'negro', 'morado']) == ["morado"]
Okay, let's break it down...
First: find out how long the longest word is:
words = ['tres', 'pez', 'alerta', 'cuatro', 'tesla', 'tropas', 'siete']
words.map(&:length).max
#=> 6
Second select all words with that length:
words.select { |w| w.length == 6 }
#=> ["alerta", "cuatro", "tropas"]
Combine that to a method:
def longest(words)
  max_length = words.map(&:length).max
  words.select { |w| w.length == max_length }
end
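For comparison, another way is to group the words by length and take the group with the largest length. longest_by_group is a hypothetical name, and unlike the method above it returns nil rather than [] for an empty array:

```ruby
def longest_by_group(words)
  # Hash#max compares [length, words] pairs; lengths are unique keys, so the largest wins
  words.group_by(&:length).max&.last
end

p longest_by_group(['tres', 'pez', 'alerta', 'cuatro', 'tesla', 'tropas', 'siete']) == ["alerta", "cuatro", "tropas"]
p longest_by_group(['gato', 'perro', 'elefante', 'jirafa']) == ["elefante"]
p longest_by_group(['verde', 'rojo', 'negro', 'morado']) == ["morado"]
```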

Counting the number of times two letters appear together

I am trying to make a Ruby program that counts the number of times two letters appear together. This is what is written in the file I'm reading:
hola
chau
And this is what I'm trying to get:
ho;ol;la;ch;ha;au;
1;1;1;1;1;1;
I can't get it to work properly. This is my code so far:
file = File.read(gets.chomp)
todo = file.scan(/[a-z][a-z]/).each_with_object(Hash.new(0)) {
  |a, b| b[a] += 1
}
keys = ''
values = ''
todo.each_key {
  |key| keys += key + ';'
}
todo.each_value {
  |value| values += value.to_s + ';'
}
puts keys
puts values
This is the result I'm getting:
ho;la;ch;au;
1;1;1;1;
Why am I not getting every combination of characters? What should I add to my regex so it counts every combination of characters?
Because the character pairs overlap, you need to use a lookahead to capture the overlapping pairs:
(?=([a-z][a-z]))
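The effect of the lookahead can be checked on the question's input. A plain match consumes both letters, so the next match starts after them; a zero-width lookahead consumes nothing, so scan advances one character at a time while the capture group still records each pair:

```ruby
text = "hola\nchau\n"

# Without a lookahead, each match consumes two letters, so pairs cannot overlap
p text.scan(/[a-z][a-z]/) # prints ["ho", "la", "ch", "au"]

# With the lookahead, every overlapping pair is captured
pairs = text.scan(/(?=([a-z][a-z]))/).flatten
p pairs # prints ["ho", "ol", "la", "ch", "ha", "au"]
```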
This is one way.
def char_pairs(str)
  str.split(/\s+/).flat_map { |w| w.chars.each_cons(2).map(&:join) }
                  .each_with_object({}) { |e,h| h[e] = (h[e] ||= 0 ) + 1 }
end
char_pairs("hello jello")
#=> {"he"=>1, "el"=>2, "ll"=>2, "lo"=>2, "je"=>1}
char_pairs("hello yellow jello")
#=> {"he"=>1, "el"=>3, "ll"=>3, "lo"=>3, "ye"=>1, "ow"=>1, "je"=>1}
Having the hash, it is an easy matter to convert it to any output format you desire.
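For example, the two semicolon-terminated lines the question asks for can be built from such a hash with keys and values. To run on its own, this sketch builds the counts with a hypothetical helper, pair_counts, using the same each_cons idea:

```ruby
# Count adjacent letter pairs within each word (pair_counts is an illustrative name)
def pair_counts(text)
  text.split.flat_map { |word| word.chars.each_cons(2).map(&:join) }
      .each_with_object(Hash.new(0)) { |pair, counts| counts[pair] += 1 }
end

counts = pair_counts("hola\nchau\n")
keys_line   = counts.keys.map { |pair| "#{pair};" }.join
values_line = counts.values.map { |count| "#{count};" }.join

puts keys_line   # prints ho;ol;la;ch;ha;au;
puts values_line # prints 1;1;1;1;1;1;
```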

Resources