Consecutive letter frequency - ruby

I am trying to write code to determine consecutive frequency of letters within a string.
For example:
"aabbcbb" => ["a",2],["b",2],["c", 1], ["b", 2]
My code gives me the first letter frequency but doesn't move on to the next.
def encrypt(str)
array = []
count = 0
str.each_char do |letter|
if array.empty?
array << letter
count += 1
elsif array.last == letter
count += 1
else
return [array, count]
array = []
end
end
end

p "aabbcbb".chars.chunk{|c| c}.map{|c, a| [c, a.size]}
# => [["a", 2], ["b", 2], ["c", 1], ["b", 2]]

"aabbcbb".chars.slice_when(&:!=).map{|a| [a.first, a.length]}
# => [["a", 2], ["b", 2], ["c", 1], ["b", 2]]

There's a simple regular expression-based solution involving back-references:
"aabbbcbb".scan(/((.)\2*)/).map { |m,c| [c, m.length] }
# => [["a", 2], ["b", 3], ["c", 1], ["b", 2]]
I would still prefer the chunk method for clarity. I initially assumed it would also be more efficient, but out of curiosity I wrote a quick benchmark, and scan turned out to be a little more than four times faster than chunk.map. I'd still use chunk.map for clarity unless you're actually doing this hundreds of thousands of times:
require 'benchmark'
N = 10000
data = ('a'..'z').map { |c| c * 10 }.join("")
Benchmark.bm do |bm|
bm.report do
N.times { data.chars.chunk{ |c| c }.map { |c, a| [c, a.size] } }
end
bm.report do
N.times { data.scan(/((.)\2*)/).map { |m,c| [c, m.size] } }
end
end
       user     system      total        real
   0.800000   0.010000   0.810000 (  0.803824)
   0.190000   0.000000   0.190000 (  0.192915)

You need to build up an array of results, rather than simply stopping at the first one:
def consecutive_frequencies(str)
str.each_char.reduce([]) do |frequencies_arr, char|
if frequencies_arr.last && frequencies_arr.last[0] == char
frequencies_arr.last[1] += 1
else
frequencies_arr << [char, 1]
end
frequencies_arr
end
end

@steenslag gave the answer I would have given, so I'll try something different.
"aabbcbb".each_char.with_object([]) { |c,a| (a.any? && c == a.last.first) ?
a.last[-1] += 1 : a << [c, 1] }
#=> [["a", 2], ["b", 2], ["c", 1], ["b", 2]]

def encrypt(str)
count = 0
array = []
str.each_char do |letter|
if array.empty?
array << letter
count += 1
elsif array.last == letter
count += 1
else
puts "[#{array}, #{count}]"
array.clear
count = 0
array << letter
count += 1
end
end
puts "[#{array}, #{count}]"
end

There are several errors in your implementation. I would use a hash (rather than an array), something like the following. Note that this counts the total occurrences of each letter, not consecutive runs:
def encrypt(str)
hash = {}
str.each_char do |letter|
if hash.key?(letter)
hash[letter] += 1
else
hash[letter] = 1
end
end
return hash
end
puts encrypt("aabbcbb")

Related

Can someone explain how inject(Hash.new(0)) { |total, bigram| total[bigram] += 1; total }.sort_by { |_key, value| value }.reverse.to_h works?

alphabet = ["A","B","C","D","E","F","G","H","I","J",
"K","L","M","N","O","P","Q","R","S","T",
"U","V","W","X","Y","Z"," ",".",",",";",
"-","'"
]
file = File.read("vt_00.txt")
i = 0
while i < alphabet.count do
single_char = alphabet[i]
single_char_count = file.count(single_char)
print "#{alphabet[i]} = #{single_char_count} "
j = 0
while j < alphabet.count do
two_chars = alphabet[i] + alphabet[j]
two_chars_count = file.scan(two_chars).count
if two_chars_count > 10 && two_chars_count < 15
print "#{two_chars} = #{two_chars_count} "
end
k = 0
while k < alphabet.count do
three_chars = alphabet[i] + alphabet[j] + alphabet[k]
three_chars_count = file.scan(three_chars).count
if three_chars_count > 10 && three_chars_count < 15
print "#{three_chars} = #{three_chars_count} "
end
k += 1
end
j += 1
end
i += 1
end
I had code like the above, but then I found a solution using each_cons. Can you explain how it works?
I don't understand the .inject part.
count = string.each_cons(1).inject(Hash.new(0)) { |total, bigram| total[bigram] += 1; total }.sort_by { |_key, value| value }.reverse.to_h
A more elaborate way to write it would be:
total = Hash.new(0)
string.each_cons(1).each{|bigram| total[bigram] += 1}
inject lets you pass in a starting value (here Hash.new(0); the default of 0 means we can safely use the += operator), and whatever the block returns is passed into the next iteration. So in this case we have to explicitly return the hash (total) to be able to update it in the next step.
A simple example is adding all values of an array:
[1,4,5,23,2,66,123].inject(0){|sum, value| sum + value}
We start with 0. In the first iteration we compute 0 + 1, and the result is then injected into the next iteration.
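To watch the accumulator being handed from one iteration to the next, here is a minimal sketch. The helper name count_chars is hypothetical (it is not from the question), and it uses each_char because String does not define each_cons.

```ruby
# Hypothetical helper for illustration: counts characters with inject
# and prints the accumulator after each step.
def count_chars(string)
  string.each_char.inject(Hash.new(0)) do |total, char|
    total[char] += 1
    puts "after #{char.inspect}: #{total}"
    total # the block's return value becomes `total` in the next iteration
  end
end

counts = count_chars("abca")
```

If the block ended with total[char] += 1 instead of total, the next iteration would receive an Integer rather than the hash, which is why the explicit return value matters.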
Note: in your original code, instead of using while loops and maintaining counters, you could more easily iterate over the arrays as follows:
alphabet.each do |single_char|
single_char_count = file.count(single_char)
print "#{single_char} = #{single_char_count} "
alphabet.each do |second_char|
two_chars = single_char + second_char
# do something with two_chars
alphabet.each do |third_char|
three_chars = single_char + second_char + third_char
# do something with three_chars
end
end
end
I am guessing it depends on the size of the file whether iterating over all each_cons (1-2-3) or using file.scan will be more efficient.
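As a rough sketch of the each_cons approach, assuming the text is first split into characters: the helper name ngram_counts and the sample string are hypothetical, standing in for the contents of the file.

```ruby
# Hypothetical helper: counts every run of n adjacent characters in text.
def ngram_counts(text, n)
  text.chars.each_cons(n).each_with_object(Hash.new(0)) do |gram, counts|
    counts[gram.join] += 1
  end
end

sample = "ABABC" # stand-in for File.read("vt_00.txt")
bigrams = ngram_counts(sample, 2)
trigrams = ngram_counts(sample, 3)
```

One difference to keep in mind: each_cons counts overlapping occurrences, while String#scan only finds non-overlapping matches, so the two approaches can disagree on text such as "AAA".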
The question
You wish to know how the following works:
g = Hash.new(0)
count = str.each_char.inject(g) do |h, s|
h[s] += 1
h
end.sort_by { |_key, value| value }.reverse.to_h
str.each_cons(1) does not work because the class String, of which str is an instance, does not have an instance method each_cons. There is a method Enumerable#each_cons, but the class String does not include that module, so strings do not respond to that method:
String.included_modules
#=> [Comparable, Kernel]
String#each_char does make sense here, as it returns an enumerator that generates each character of the string. I therefore assume that each_char was meant where each_cons(1) was written.
I have changed the variable names to something more generic, and have moved
g = Hash.new(0)
to a separate line.
An example
Suppose str is as follows:
str = "The Cat and the Hat"
Examine steps performed
Let's break the calculation into pieces:
g = Hash.new(0)
#=> {}
h = str.each_char.inject(g) do |h,s|
h[s] += 1
h
end
#=> {"T"=>1, "h"=>2, "e"=>2, " "=>4, "C"=>1,
# "a"=>3, "t"=>3, "n"=>1, "d"=>1, "H"=>1}
a = h.sort_by { |_key, value| value }
#=> [["T", 1], ["C", 1], ["n", 1], ["d", 1], ["H", 1],
# ["h", 2], ["e", 2], ["a", 3], ["t", 3], [" ", 4]]
b = a.reverse
#=> [[" ", 4], ["t", 3], ["a", 3], ["e", 2], ["h", 2],
# ["H", 1], ["d", 1], ["n", 1], ["C", 1], ["T", 1]]
count = b.to_h
#=> {" "=>4, "t"=>3, "a"=>3, "e"=>2, "h"=>2,
# "H"=>1, "d"=>1, "n"=>1, "C"=>1, "T"=>1}
The calculations of a, b and count are straightforward, so let's consider them first.
Calculation of a
Like all Enumerable methods, Enumerable#sort_by requires that its receiver responds to the method each. Here sort_by's receiver is a hash so h must respond to Hash#each. Indeed, sort_by's first operation is to convert h to an enumerator by sending it the method Hash#each:
enum = h.each
#=> #<Enumerator: {"T"=>1, "h"=>2, "e"=>2, " "=>4, "C"=>1, "a"=>3,
# "t"=>3, "n"=>1, "d"=>1, "H"=>1}:each>
We can see the values that are generated by this enumerator by repeatedly sending it the method Enumerator#next:
enum.next #=> ["T", 1]
enum.next #=> ["h", 2]
enum.next #=> ["e", 2]
...
enum.next #=> ["H", 1]
enum.next #=> StopIteration (iteration reached an end)
It is seen that enum generates a sequence of the hash's key-value pairs. Therefore,
h.sort_by { |_key, value| value }
is equivalent to
[["T", 1], ["h", 2], ["e", 2],..., ["H", 1]].sort_by { |_key, value| value }
which explains why a equals the array shown above.
Calculation of b
This calculation could not be more straightforward. Note that we could save a step by replacing b = h.sort_by { |_key, value| value }.reverse with
b = h.sort_by { |_key, value| -value }
#=> [[" ", 4], ["a", 3], ["t", 3], ["h", 2], ["e", 2],
# ["T", 1], ["C", 1], ["n", 1], ["d", 1], ["H", 1]]
This sorts the key-value pairs of h in decreasing order of value, as before, though ties are ordered somewhat differently.
Calculation of count
This is a straightforward application of the method Array#to_h.
Calculation of h
The first step in this calculation is to use the method Hash::new to create an empty hash with a default value of zero:
h = Hash.new(0)
#=> {}
This simply causes h[k] to return the default value of zero when h does not have a key k. For example, since h now has no keys:
h['cat']
#=> 0
If we now set
h['cat'] = 3
then
h['cat']
#=> 3
as the default value no longer applies. A hash h created this way is often called a counting hash. Ruby's first step in parsing the expression h[s] += 1 is to expand it to:
h[s] = h[s] + 1
If h does not have a key s the expression reduces to
h[s] = 0 + 1
because h[s] on the right of the equals sign (the method Hash#[], as contrasted with the method Hash#[]= on the left) returns the default value of zero. If the string were "aaa", the following calculations would be made:
h['a'] = h['a'] + 1 => 0 + 1 => 1
h['a'] = h['a'] + 1 => 1 + 1 => 2
h['a'] = h['a'] + 1 => 2 + 1 => 3
In the first step, h['a'] on the right returns the default value of zero. In the second and third steps h already has the key 'a', so the current value of h['a'] is returned instead.
Enumerable#inject (a.k.a reduce) can be used here but the calculation of h is more commonly written as follows:
h = str.each_char.each_with_object(Hash.new(0)) { |s,h| h[s] += 1 }
#=> {"T"=>1, "h"=>2, "e"=>2, " "=>4, "C"=>1,
# "a"=>3, "t"=>3, "n"=>1, "d"=>1, "H"=>1}
See Enumerable#each_with_object.

Compressing a given String problem in Ruby

I have been trying to solve this problem for a while now. I'm supposed to take a given string like "aaabbc" and compress it into a new string that replaces each run of a repeated letter with its count followed by the letter, so it would output "3a2bc".
So far I have managed to print it out, except that it counts all instances of a letter, and I'm not sure how to get rid of the repeats:
def compress_str(str)
new_str = []
word = str.split("")
word.each do |char|
count = 0
word.each do |ele|
if ele == char
count += 1
end
end
if count > 1
new_str << count
new_str << char
else
new_str << char
end
end
return new_str.join("")
end
Example output:
Any suggestions on how I'm supposed to get rid of them?
Using Enumerable#chunk might be a good fit for your needs.
uncompressed = %w(aaabbcddaaaaa aaabb 111ddttttabaaacc)
uncompressed.each do |str|
puts str.chars.chunk{|e| e}.map {|e| "#{e[1].length}#{e[0]}"}.join
end
>>> 3a2b1c2d5a
>>> 3a2b
>>> 312d4t1a1b3a2c
You can add another check inside the map block to omit the 1 before a single element and print it as is.
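A minimal sketch of that extra check, reusing the method name compress_str from the question (the body is an illustration, not the original poster's code):

```ruby
# Run-length compress a string, leaving single characters as they are.
def compress_str(str)
  str.chars.chunk { |c| c }.map do |char, run|
    # Only prefix the count when the run is longer than one character.
    run.length > 1 ? "#{run.length}#{char}" : char
  end.join
end

puts compress_str("aaabbc") # prints 3a2bc
```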
You could use String#chars (1), then Enumerable#chunk_while (2), then Enumerable#flat_map (3) to get the desired format, and finally Array#join:
str = "aaabbcaa"
str.chars.chunk_while { |x, y| x == y }.flat_map { |e| [(e.size unless e.size == 1), e.first] }.join
#=> "3a2bc2a"
Step by step
# (1)
str.chars#.to_a
#=> ["a", "a", "a", "b", "b", "c", "a", "a"]
so
# (2)
str.chars.chunk_while { |x, y| x == y }#.to_a
#=> [["a", "a", "a"], ["b", "b"], ["c"], ["a", "a"]]
then
# (3)
str.chars.chunk_while { |x, y| x == y }.flat_map { |e| [(e.size unless e.size == 1),e.first] }
#=> [3, "a", 2, "b", nil, "c", 2, "a"]
String#scan can also be handy here.
uncompressed = %w(aaabbcddaaaaa aaabb 111ddttttabaaacc)
uncompressed.map { |w| w.scan(/(.)(\1*)/).map(&:join) }
#⇒ [["aaa", "bb", "c", "dd", "aaaaa"],
# ["aaa", "bb"],
# ["111", "dd", "tttt", "a", "b", "aaa", "cc"]]
And to get the desired outcome.
uncompressed.map do |w|
w.scan(/(.)(\1*)/).map(&:join).map do |l|
"#{l.length}#{l[0]}"
end.join
end
#⇒ ["3a2b1c2d5a", "3a2b", "312d4t1a1b3a2c"]

Ruby Counting chars in a sequence not using regex

Need help with this code on counting chars in a sequence.
This is what I want:
word("aaabbcbbaaa") == [["a", 3], ["b", 2], ["c", 1], ["b", 2], ["a", 3]]
word("aaaaaaaaaa") == [["a", 10]]
word("") == []
Here is my code:
def word(str)
words=str.split("")
count = Hash.new(0)
words.map {|char| count[char] +=1 }
return count
end
I got word("aaabbcbbaaa") => [["a", 6], ["b", 4], ["c", 1]], which is not what I want. I want to count each sequence. I prefer a non-regex solution. Thanks.
Split string by chars, then group chunks by char, then count chars in chunks:
def word str
str
.chars
.chunk{ |e| e }
.map{|(e,ar)| [e, ar.length] }
end
p word "aaabbcbbaaa"
p word("aaaaaaaaaa")
p word ""
Result:
[["a", 3], ["b", 2], ["c", 1], ["b", 2], ["a", 3]]
[["a", 10]]
[]
If you don't want to use a regex, you may just have to do something like:
def word(str)
return [] if str.empty? # otherwise an empty string would yield [[nil, 0]]
last, n, result = str.chars.first, 0, []
str.chars.each do |char|
if char != last
result << [last, n]
last, n = char, 1
else
n += 1
end
end
result << [last, n]
end
I'd like to use some higher-order function to make this more concise, but there's no appropriate one in the Ruby standard library. Enumerable#partition almost does it, but not quite.
I'd do the following. Note that each_char is a newer method (Ruby 1.9?) that might not be available on your version, so stick with words=str.split("") in that case.
def word(str)
return [] if str.length == 0
seq_count = []
last_char = nil
count = 0
str.each_char do |char|
if last_char == char
count += 1
else
seq_count << [last_char, count] unless last_char.nil?
count = 1
end
last_char = char
end
seq_count << [last_char, count]
end
[52] pry(main)> word("hello")
=> [["h", 1], ["e", 1], ["l", 2], ["o", 1]]
[54] pry(main)> word("aaabbcbbaaa")
=> [["a", 3], ["b", 2], ["c", 1], ["b", 2], ["a", 3]]
[57] pry(main)> word("")
=> []
Another non-regexp-version.
x = "aaabbcbbaaa"
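def word(str)
str = str.dup # chomp! below mutates the string, so work on a copy
str.squeeze.reverse.chars.each_with_object([]) do |char, list|
count = 0
count += 1 until str.chomp!(char).nil?
list.unshift([char, count]) # runs are consumed from the end, so prepend
end
end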
p word(x) #=> [["a", 3], ["b", 2], ["c", 1], ["b", 2], ["a", 3]]
If the world were without regex and chunk:
def word(str)
a = str.chars
b = []
loop do
return b if a.empty?
c = a.slice_before {|e| e != a.first}.first
b << [c.first, c.size]
a = a[c.size..-1]
end
end
word "aaabbcbbaaa" # => [["a", 3], ["b", 2], ["c", 1], ["b", 2], ["a", 3]]
word "aaa" # => [["a",3]]
word "" # => []
Here's another way. Initially I tried to find a solution that didn't require conversion of the string to an array of its characters. I couldn't come up with anything decent until I saw @hirolau's answer, which I modified:
def word(str)
list = []
char = str[-1]
loop do
return list if str.empty?
count = 0
count += 1 until str.chomp!(char).nil?
list.unshift [char, count]
char = str[-1]
end
end
You can use this pattern with scan:
"aaabbcbbaaa".scan(/((.)\2*)/)
and then take the length of group 1 (the whole run) for each match.
Example:
"aaabbcbbaaaa".scan(/((.)\2*)/).map do |x,y| [y, x.length] end

Ruby String Encode Consecutive Letter Frequency

I want to encode a string in Ruby so that the output is a list of pairs that I can later decode. Each pair should contain the next distinct letter in the string and the number of consecutive repeats.
E.g., if I encode "aaabbcbbaaa", the output should be
[["a", 3], ["b", 2], ["c", 1], ["b", 2], ["a", 3]]
Here is the code:
def encode( s )
b = 0
e = s.length - 1
ret = []
while ( s <= e )
m = s.match( /(\w)\1*/ )
l = m[0][0]
n = m[0].length
ret << [l, n]
end
ret
end
"aaabbcbbaaa".chars.chunk{|i| i}.map{|m,n| [m,n.count(m)]}
#=> [["a", 3], ["b", 2], ["c", 1], ["b", 2], ["a", 3]]
"aaabbcbbaaa".scan(/((.)\2*)/).map{|s, c| [c, s.length]}
You could also do this procedurally.
def group_consecutive(input)
groups = []
input.each_char do |c|
if groups.empty? || groups.last[0] != c
groups << [c, 1]
else
groups.last[1] += 1
end
end
groups
end
'aaabbcbbaaa'.scan(/((.)\2*)/).map {|e| [e[1], e[0].size]}
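Since the question mentions decoding, here is a hedged sketch of the round trip. The decode method is hypothetical (the question does not show one), and encode here simply reuses the chunk approach from the answers above.

```ruby
# Encode a string as [letter, run_length] pairs.
def encode(s)
  s.chars.chunk { |c| c }.map { |char, run| [char, run.length] }
end

# Hypothetical inverse: rebuild the string by repeating each letter.
def decode(pairs)
  pairs.map { |char, count| char * count }.join
end

pairs = encode("aaabbcbbaaa")
restored = decode(pairs)
```

Because each pair keeps the run in its original position, decode(encode(s)) should give back s for any string.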

How to count duplicates in Ruby Arrays

How do you count duplicates in a Ruby array?
For example, if my array had three a's, how could I count that?
Another version: build a hash with a key for each element in your array and the count of that element as its value:
a = [ 1, 2, 3, 3, 4, 3]
h = Hash.new(0)
a.each { | v | h.store(v, h[v]+1) }
# h = { 3=>3, 2=>1, 1=>1, 4=>1 }
Given:
arr = [ 1, 2, 3, 2, 4, 5, 3]
My favourite way of counting elements is:
counts = arr.group_by{|i| i}.map{|k,v| [k, v.count] }
# => [[1, 1], [2, 2], [3, 2], [4, 1], [5, 1]]
If you need a hash instead of an array:
Hash[*counts.flatten]
# => {1=>1, 2=>2, 3=>2, 4=>1, 5=>1}
This will yield the duplicate elements as a hash with the number of occurrences for each duplicate item. Let the code speak:
#!/usr/bin/env ruby
class Array
# monkey-patched version
def dup_hash
inject(Hash.new(0)) { |h,e| h[e] += 1; h }.select {
|k,v| v > 1 }.inject({}) { |r, e| r[e.first] = e.last; r }
end
end
# unmonkeyed
def dup_hash(ary)
ary.inject(Hash.new(0)) { |h,e| h[e] += 1; h }.select {
|_k,v| v > 1 }.inject({}) { |r, e| r[e.first] = e.last; r }
end
p dup_hash([1, 2, "a", "a", 4, "a", 2, 1])
# {"a"=>3, 1=>2, 2=>2}
p [1, 2, "Thanks", "You're welcome", "Thanks",
"You're welcome", "Thanks", "You're welcome"].dup_hash
# {"You're welcome"=>3, "Thanks"=>3}
Simple.
arr = [2,3,4,3,2,67,2]
repeats = arr.length - arr.uniq.length
puts repeats
arr = %w( a b c d c b a )
# => ["a", "b", "c", "d", "c", "b", "a"]
arr.count('a')
# => 2
Another way to count array duplicates is:
arr= [2,2,3,3,2,4,2]
arr.group_by{|x| x}.map{|k,v| [k,v.count] }
result is
[[2, 4], [3, 2], [4, 1]]
requires 1.8.7+ for group_by
ary = %w{a b c d a e f g a h i b}
ary.group_by{|elem| elem}.select{|key,val| val.length > 1}.map{|key,val| key}
# => ["a", "b"]
with 1.9+ this can be slightly simplified because Hash#select will return a hash.
ary.group_by{|elem| elem}.select{|key,val| val.length > 1}.keys
# => ["a", "b"]
To count instances of a single element use inject
array.inject(0){|count,elem| elem == value ? count+1 : count}
arr = [1, 2, "a", "a", 4, "a", 2, 1]
arr.group_by(&:itself).transform_values(&:size)
#=> {1=>2, 2=>2, "a"=>3, 4=>1}
Ruby >= 2.7 solution here:
A new method .tally has been added.
Tallies the collection, i.e., counts the occurrences of each element. Returns a hash with the elements of the collection as keys and the corresponding counts as values.
So now, you will be able to do:
["a", "b", "c", "b"].tally #=> {"a"=>1, "b"=>2, "c"=>1}
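If only the elements that actually repeat are of interest, tally can be combined with Hash#select. This is a small sketch assuming Ruby 2.7 or later; the variable names are illustrative.

```ruby
arr = ["a", "b", "c", "b", "a", "a"]

counts = arr.tally
# Keep only the entries that occur more than once.
duplicates = counts.select { |_elem, n| n > 1 }
```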
What about a grep?
arr = [1, 2, "Thanks", "You're welcome", "Thanks", "You're welcome", "Thanks", "You're welcome"]
arr.grep('Thanks').size # => 3
It's easy:
words = ["aa","bb","cc","bb","bb","cc"]
A simple one-line solution is:
words.each_with_object(Hash.new(0)) { |word,counts| counts[word] += 1 }
It works for me.
Thanks!!
I don't think there's a built-in method. If all you need is the total count of duplicates, you could take a.length - a.uniq.length. If you're looking for the count of a single particular element, try
a.select {|e| e == my_element}.length.
Improving @Kim's answer:
arr = [1, 2, "a", "a", 4, "a", 2, 1]
Hash.new(0).tap { |h| arr.each { |v| h[v] += 1 } }
# => {1=>2, 2=>2, "a"=>3, 4=>1}
Ruby code to get the repeated elements in the array:
numbers = [1,2,3,1,2,0,8,9,0,1,2,3]
seen = [] # elements encountered so far
similar = numbers.each_with_object([]) do |n, dups|
dups << n if seen.include?(n)
seen << n
end
print "similar --> ", similar
Another way to do it is to use each_with_object:
a = [ 1, 2, 3, 3, 4, 3]
hash = a.each_with_object({}) {|v, h|
h[v] ||= 0
h[v] += 1
}
# hash = { 3=>3, 2=>1, 1=>1, 4=>1 }
This way, looking up a non-existent key such as hash[5] will return nil, rather than 0 as with Kim's solution.
I've used reduce/inject for this in the past, like the following
array = [1,5,4,3,1,5,6,8,8,8,9]
array.reduce(Hash.new(0)) {|counts, el| counts[el]+=1; counts}
produces
=> {1=>2, 5=>2, 4=>1, 3=>1, 6=>1, 8=>3, 9=>1}

Resources