Ruby String Encode Consecutive Letter Frequency - ruby

I want to encode a string in Ruby as a list of pairs so that I can decode it later. Each pair should contain the next distinct letter in the string and the number of consecutive repeats.
For example, if I encode "aaabbcbbaaa", the output should be
[["a", 3], ["b", 2], ["c", 1], ["b", 2], ["a", 3]]
Here is the code.
def encode( s )
b = 0
e = s.length - 1
ret = []
while ( s <= e )
m = s.match( /(\w)\1*/ )
l = m[0][0]
n = m[0].length
ret << [l, n]
end
ret
end

"aaabbcbbaaa".chars.chunk{|i| i}.map{|m,n| [m,n.count(m)]}
#=> [["a", 3], ["b", 2], ["c", 1], ["b", 2], ["a", 3]]

"aaabbcbbaaa".scan(/((.)\2*)/).map{|s, c| [c, s.length]}

You could also do this procedurally.
def group_consecutive(input)
groups = []
input.each_char do |c|
if groups.empty? || groups.last[0] != c
groups << [c, 1]
else
groups.last[1] += 1
end
end
groups
end

'aaabbcbbaaa'.scan(/((.)\2*)/).map {|e| [e[1], e[0].size]}
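Since the question mentions decoding, here is a minimal sketch of the inverse operation. The method name decode is hypothetical (it does not appear in the question); it assumes the input is an array of [letter, count] pairs like the ones produced above.

```ruby
# Hypothetical helper: rebuilds the original string from [letter, count] pairs.
def decode(pairs)
  # "a" * 3 gives "aaa"; joining the expanded runs restores the string.
  pairs.map { |letter, count| letter * count }.join
end

encoded = "aaabbcbbaaa".chars.chunk { |c| c }.map { |c, run| [c, run.size] }
decoded = decode(encoded)
#=> "aaabbcbbaaa"
```

Because each pair records one consecutive run, the round trip is lossless for any string.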

Related

Do something in middle of recursive function, then return as needed

How do I do something in the middle of a recursion, and return as needed? In other words, maybe no more recursion is needed because I have found a "solution" in which case to save resources, the recursion can stop.
For example, let's say I have a working permute method that does this
permute([["a","b"],[1,2]])
>>> [["a", 1], ["a", 2], ["b", 1], ["b", 2]]
Rather than have the method generate all 4 possibilities, if one meets my requirements, I'd like it to stop. For example, let's say I'm searching for ["a",2], then the method can stop after it creates the second possibility.
This is my current permute method that is working
def permute(arr)
if arr.length == 1
return arr.first
else
first = arr.shift
return first.product(permute(arr)).uniq
end
end
I feel like I need to inject a do block somewhere with something like the below, but not sure how/where...
if result_of_permutation_currently == ["a",2]
return ...
else
# continuing the permutations
end
You could write your method as follows.
def partial_product(arr, last_element)
  @a = []
  @last_element = last_element
  recurse(arr)
  @a
end
def recurse(arr, element = [])
  first, *rest = arr
  if rest.empty?
    first.each do |e|
      el = element + [e]
      @a << el
      return true if el == @last_element
    end
  else
    first.each do |e|
      rv = recurse(rest, element + [e])
      return true if rv
    end
  end
  false
end
arr = [["a","b"], [1,2,3], ["cat","dog"]]
partial_product(arr, ["b",2,"dog"])
#=> [["a", 1, "cat"], ["a", 1, "dog"], ["a", 2, "cat"],
# ["a", 2, "dog"], ["a", 3, "cat"], ["a", 3, "dog"],
# ["b", 1, "cat"], ["b", 1, "dog"], ["b", 2, "cat"],
# ["b", 2, "dog"]]
partial_product(arr, ["a",1,"dog"])
#=> [["a", 1, "cat"], ["a", 1, "dog"]]
partial_product(arr, ["b",2,"pig"])
#=> [["a", 1, "cat"], ["a", 1, "dog"], ["a", 2, "cat"],
# ["a", 2, "dog"], ["a", 3, "cat"], ["a", 3, "dog"],
# ["b", 1, "cat"], ["b", 1, "dog"], ["b", 2, "cat"],
# ["b", 2, "dog"], ["b", 3, "cat"], ["b", 3, "dog"]]
If you prefer to avoid using instance variables, you could carry a and last_element as arguments in recurse, but there would be inefficiencies by doing so, particularly in terms of memory use.
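A minimal sketch of that alternative, with the accumulator and the target passed as arguments instead of held in instance variables. The names partial_product_args and recurse_args are hypothetical, chosen only so they do not clash with the methods above.

```ruby
# Hypothetical variant: no instance variables, state is passed explicitly.
def partial_product_args(arr, last_element)
  acc = []
  recurse_args(arr, last_element, acc)
  acc
end

# Returns true as soon as last_element has been appended to acc,
# which stops every enclosing loop.
def recurse_args(arr, last_element, acc, element = [])
  first, *rest = arr
  first.each do |e|
    el = element + [e]
    if rest.empty?
      acc << el
      return true if el == last_element
    elsif recurse_args(rest, last_element, acc, el)
      return true
    end
  end
  false
end

arr = [["a", "b"], [1, 2, 3], ["cat", "dog"]]
partial_product_args(arr, ["a", 1, "dog"])
#=> [["a", 1, "cat"], ["a", 1, "dog"]]
```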
Here are two ways that could be done without using recursion.
Use each to generate elements of the desired array until the target pair is reached
def permute(arr1, arr2, last_pair = [])
arr1.each_with_object([]) do |e1,a|
arr2.each do |e2|
a << [e1, e2]
break a if [e1, e2] == last_pair
end
end
end
permute(["a","b"],[1,2],["b", 1])
#=> [["a", 1], ["a", 2], ["b", 1]]
permute(["a","b"],[1,2],["b", 99])
#=> [["a", 1], ["a", 2], ["b", 1], ["b", 2]]
permute(["a","b"],[1,2])
#=> [["a", 1], ["a", 2], ["b", 1], ["b", 2]]
permute(["a","b"],[],["b", 1])
#=> []
permute([],[1,2],["b", 1])
#=> []
permute([],[],["b", 1])
#=> []
Map a sequence of the indices of the desired array
def permute(arr1, arr2, last_pair = [])
n1 = arr1.size
n2 = arr2.size
idx1 = arr1.index(last_pair.first)
idx2 = idx1.nil? ? nil : arr2.index(last_pair.last)
return arr1.product(arr2) if idx2.nil?
0.step(to: idx1*n2+idx2).
map {|i| [arr1[(i % (n1*n2))/n2], arr2[i % n2]]}
end
permute(["a","b"],[1,2],["b", 1])
See Numeric#step
idx1*n2 + idx2, the index of the last element of the array to be returned (so the array has idx1*n2 + idx2 + 1 elements), is computed as follows.
last_pair = ["b", 1]
n2 = arr2.size
#=> 2
idx1 = arr1.index(last_pair.first)
#=> 1
idx2 = idx1.nil? ? nil : arr2.index(last_pair.last)
#=> 0
idx1*n2 + idx2
#=> 2
The element at index i of the array returned is:
n1 = arr1.size
#=> 2
[arr1[(i % (n1*n2))/n2], arr2[i % n2]]
#=> [["a","b"][(i % (2*2))/2], [1,2][i % 2]]
For i = 1 this is
[["a","b"][(1 % 4)/2], [1,2][1 % 2]]
#=> [["a","b"][0], [1,2][1]]
#=> ["a", 2]
For i = 2 this is
[["a","b"][(2 % 4)/2], [1,2][2 % 2]]
#=> [["a","b"][1], [1,2][0]]
#=> ["b", 1]
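The same index arithmetic can be sketched with Integer#divmod, which returns the quotient and remainder in one call. This is only an illustration of the mapping from i to a pair, under the assumption that i stays below arr1.size * arr2.size (so the extra modulo used above is not needed).

```ruby
arr1 = ["a", "b"]
arr2 = [1, 2]
n2 = arr2.size

# i.divmod(n2) returns [i / n2, i % n2]: the index into arr1 and the index into arr2.
pairs = (0..2).map do |i|
  row, col = i.divmod(n2)
  [arr1[row], arr2[col]]
end
#=> [["a", 1], ["a", 2], ["b", 1]]
```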
Note that we cannot write
arr1.lazy.product(arr2).first(idx1*n2+idx2+1)
because arr1.lazy returns a lazy enumerator (arr1.lazy
#=> #<Enumerator::Lazy: ["a", "b"]>) but Array#product requires its receiver to be an array. It's for that reason that some Rubyists would like to see product made an Enumerable method (with a lazy version), but don't hold your breath.
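One possible workaround is to build the pairs lazily with Enumerator::Lazy#flat_map, so that only as many pairs are generated as are requested. This is a sketch, not a drop-in replacement for Array#product; the helper name lazy_product is hypothetical.

```ruby
# Hypothetical helper: a lazily evaluated Cartesian product of two arrays.
def lazy_product(arr1, arr2)
  # The block returns an array of pairs; lazy flat_map yields them one by one.
  arr1.lazy.flat_map { |e1| arr2.map { |e2| [e1, e2] } }
end

arr1 = ["a", "b"]
arr2 = [1, 2]
idx1 = arr1.index("b")
idx2 = arr2.index(1)

# Take everything up to and including the target pair ["b", 1].
result = lazy_product(arr1, arr2).first(idx1 * arr2.size + idx2 + 1)
#=> [["a", 1], ["a", 2], ["b", 1]]
```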

Can someone explain how inject(Hash.new(0)) { |total, bigram| total[bigram] += 1; total }.sort_by { |_key, value| value }.reverse.to_h works?

alphabet = ["A","B","C","D","E","F","G","H","I","J",
"K","L","M","N","O","P","Q","R","S","T",
"U","V","W","X","Y","Z"," ",".",",",";",
"-","'"
]
file = File.read("vt_00.txt")
i = 0
while i < alphabet.count do
single_char = alphabet[i]
single_char_count = file.count(single_char)
print "#{alphabet[i]} = #{single_char_count} "
j = 0
while j < alphabet.count do
two_chars = alphabet[i] + alphabet[j]
two_chars_count = file.scan(two_chars).count
if two_chars_count > 10 && two_chars_count < 15
print "#{two_chars} = #{two_chars_count} "
end
k = 0
while k < alphabet.count do
three_chars = alphabet[i] + alphabet[j] + alphabet[k]
three_chars_count = file.scan(three_chars).count
if three_chars_count > 10 && three_chars_count < 15
print "#{three_chars} = #{three_chars_count} "
end
k += 1
end
j += 1
end
i += 1
end
My code looked like the above. Then I found a solution that uses each_cons. Can you explain how it works?
I don't understand the .inject part.
count = string.each_cons(1).inject(Hash.new(0)) { |total, bigram| total[bigram] += 1; total }.sort_by { |_key, value| value }.reverse.to_h
A more elaborate way to write it would be:
total = Hash.new(0)
string.each_cons(1).each{|bigram| total[bigram] += 1}
inject lets you supply a start value (here Hash.new(0); the default of 0 means we can safely use the += operator), and whatever the block returns is passed into the next iteration. So in this case we have to explicitly return the hash (total) so that it can be updated in the next step.
A simple example is adding all values of an array:
[1,4,5,23,2,66,123].inject(0){|sum, value| sum + value}
We start with 0; in the first iteration we compute 0 + 1, and the result is then injected into the next iteration.
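To make the two uses of inject concrete, here is a small sketch with made-up sample data (the letters array is an assumption for illustration only). In both cases the block's return value becomes the first block argument on the next pass.

```ruby
# Summing: the block returns the new running total.
sum = [1, 4, 5, 23, 2, 66, 123].inject(0) { |acc, value| acc + value }
#=> 224

# Counting: the block must return the hash itself, not the result of +=.
letters = %w[a b a c a]
counts = letters.inject(Hash.new(0)) do |total, letter|
  total[letter] += 1
  total
end
#=> {"a"=>3, "b"=>1, "c"=>1}
```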
Note: in your original code, instead of using while loops and maintaining counters, you could more easily iterate over the arrays as follows:
alphabet.each do |single_char|
  single_char_count = file.count(single_char)
  print "#{single_char} = #{single_char_count} "
  alphabet.each do |second_char|
    two_chars = single_char + second_char
    # do something with two_chars
    alphabet.each do |third_char|
      three_chars = single_char + second_char + third_char
      # do something with three_chars
    end
  end
end
I am guessing it depends on the size of the file whether iterating over all each_cons (1-2-3) or using file.scan will be more efficient.
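For comparison, a sketch of the each_cons approach applied to the characters of a string. The method name ngram_counts and the sample text are hypothetical. Note one difference in behavior: each_cons counts overlapping occurrences, whereas String#scan with a plain string pattern counts non-overlapping matches, so the two approaches can disagree.

```ruby
# Hypothetical helper: counts every run of n consecutive characters in text.
def ngram_counts(text, n)
  text.each_char.each_cons(n).each_with_object(Hash.new(0)) do |chars, counts|
    counts[chars.join] += 1
  end
end

bigrams  = ngram_counts("ABABA", 2)
#=> {"AB"=>2, "BA"=>2}
trigrams = ngram_counts("ABABA", 3)
#=> {"ABA"=>2, "BAB"=>1}

# scan does not count the second, overlapping "ABA".
scanned = "ABABA".scan("ABA").count
#=> 1
```

This makes a single pass over the text per n, rather than one scan per candidate combination of alphabet characters.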
The question
You wish to know how the following works:
g = Hash.new(0)
count = str.each_char.inject(g) do |h, s|
h[s] += 1
h
end.sort_by { |_key, value| value }.reverse.to_h
str.each_cons(1) does not work because the class String, of which str is an instance, does not have an instance method each_cons. There is a method Enumerable#each_cons, but the class String does not include that module, so strings do not respond to that method:
String.included_modules
#=> [Comparable, Kernel]
String#each_char does make sense here, as it returns an enumerator that generates each character of the string. I therefore assume that each_char was meant where each_cons(1) was written.
I have changed the variable names to something more generic, and have moved
g = Hash.new(0)
to a separate line.
An example
Suppose str is as follows:
str = "The Cat and the Hat"
Examine steps performed
Let's break the calculation into pieces:
g = Hash.new(0)
#=> {}
h = str.each_char.inject(g) do |h,s|
h[s] += 1
h
end
#=> {"T"=>1, "h"=>2, "e"=>2, " "=>4, "C"=>1,
# "a"=>3, "t"=>3, "n"=>1, "d"=>1, "H"=>1}
a = h.sort_by { |_key, value| value }
#=> [["T", 1], ["C", 1], ["n", 1], ["d", 1], ["H", 1],
# ["h", 2], ["e", 2], ["a", 3], ["t", 3], [" ", 4]]
b = a.reverse
#=> [[" ", 4], ["t", 3], ["a", 3], ["e", 2], ["h", 2],
# ["H", 1], ["d", 1], ["n", 1], ["C", 1], ["T", 1]]
count = b.to_h
#=> {" "=>4, "t"=>3, "a"=>3, "e"=>2, "h"=>2,
# "H"=>1, "d"=>1, "n"=>1, "C"=>1, "T"=>1}
The calculations of a, b and count are straightforward, so let's consider them first.
Calculation of a
Like all Enumerable methods, Enumerable#sort_by requires that its receiver responds to the method each. Here sort_by's receiver is a hash so h must respond to Hash#each. Indeed, sort_by's first operation is to convert h to an enumerator by sending it the method Hash#each:
enum = h.each
#=> #<Enumerator: {"T"=>1, "h"=>2, "e"=>2, " "=>4, "C"=>1, "a"=>3,
# "t"=>3, "n"=>1, "d"=>1, "H"=>1}:each>
We can see the values that are generated by this enumerator by repeatedly sending it the method Enumerator#next:
enum.next #=> ["T", 1]
enum.next #=> ["h", 2]
enum.next #=> ["e", 2]
...
enum.next #=> ["H", 1]
enum.next #=> StopIteration (iteration reached an end)
It is seen that enum generates a sequence of the hash's key-value pairs. Therefore,
h.sort_by { |_key, value| value }
is equivalent to
[["T", 1], ["h", 2], ["e", 2],..., ["H", 1]].sort_by { |_key, value| value }
which explains why a equals the array shown above.
Calculation of b
This calculation could not be more straightforward. Note that we could save a step by replacing b = h.sort_by { |_key, value| value }.reverse with
b = h.sort_by { |_key, value| -value }
#=> [[" ", 4], ["a", 3], ["t", 3], ["h", 2], ["e", 2],
# ["T", 1], ["C", 1], ["n", 1], ["d", 1], ["H", 1]]
This sorts the key-value pairs of h in decreasing order of value, as before, though ties are ordered somewhat differently.
Calculation of count
This is a straightforward application of the method Array#to_h.
Calculation of h
The first step in this calculation is to use the method Hash::new to create an empty hash with a default value of zero:
h = Hash.new(0)
#=> {}
This simply causes h[k] to return the default value of zero when h does not have a key k. For example, since h now has no keys:
h['cat']
#=> 0
If we now set
h['cat'] = 3
then
h['cat']
#=> 3
as the default value no longer applies. A hash h created this way is often called a counting hash. Ruby's first step in parsing the expression h[s] += 1 is to expand it to:
h[s] = h[s] + 1
If h does not have a key s the expression reduces to
h[s] = 0 + 1
because h[s] on the right of the equals sign (the method Hash#[], as contrasted with the method Hash#[]= on the left) returns the default value of zero. If the string were "aaa", the following calculations would be made:
h['a'] = h['a'] + 1 => 0 + 1 => 1
h['a'] = h['a'] + 1 => 1 + 1 => 2
h['a'] = h['a'] + 1 => 2 + 1 => 3
h['a'] on the right returns the default value of zero in the first step. In the second and third steps h already has the key 'a', so the current value of h['a'] is returned instead.
Enumerable#inject (a.k.a reduce) can be used here but the calculation of h is more commonly written as follows:
h = str.each_char.each_with_object(Hash.new(0)) { |s,h| h[s] += 1 }
#=> {"T"=>1, "h"=>2, "e"=>2, " "=>4, "C"=>1,
# "a"=>3, "t"=>3, "n"=>1, "d"=>1, "H"=>1}
See Enumerable#each_with_object.
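On Ruby 2.7 or later, Enumerable#tally builds the same counting hash directly. The following sketch assumes that Ruby version and combines tally with the sort_by { -value } variant discussed earlier.

```ruby
str = "The Cat and the Hat"

# tally (Ruby 2.7+) counts how many times each element occurs.
counts = str.each_char.tally
#=> {"T"=>1, "h"=>2, "e"=>2, " "=>4, "C"=>1,
#    "a"=>3, "t"=>3, "n"=>1, "d"=>1, "H"=>1}

# Sort by descending count; the order among ties is not guaranteed.
sorted = counts.sort_by { |_char, n| -n }.to_h
sorted.first
#=> [" ", 4]
```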

Consecutive letter frequency

I am trying to write code to determine consecutive frequency of letters within a string.
For example:
"aabbcbb" => ["a",2],["b",2],["c", 1], ["b", 2]
My code gives me the first letter frequency but doesn't move on to the next.
def encrypt(str)
array = []
count = 0
str.each_char do |letter|
if array.empty?
array << letter
count += 1
elsif array.last == letter
count += 1
else
return [array, count]
array = []
end
end
end
p "aabbcbb".chars.chunk{|c| c}.map{|c, a| [c, a.size]}
# => [["a", 2], ["b", 2], ["c", 1], ["b", 2]]
"aabbcbb".chars.slice_when(&:!=).map{|a| [a.first, a.length]}
# => [["a", 2], ["b", 2], ["c", 1], ["b", 2]]
There's a simple regular expression-based solution involving back-references:
"aabbbcbb".scan(/((.)\2*)/).map { |m,c| [c, m.length] }
# => [["a", 2], ["b", 3], ["c", 1], ["b", 2]]
But I would prefer the chunk method for clarity (and almost certainly efficiency).
Actually out of curiosity, I wrote a quick benchmark and scan is a little more than four times faster than chunk.map, but I'd still use chunk.map for clarity unless you're actually doing this hundreds of thousands of times:
require 'benchmark'
N = 10000
data = ('a'..'z').map { |c| c * 10 }.join("")
Benchmark.bm do |bm|
bm.report do
N.times { data.chars.chunk{ |c| c }.map { |c, a| [c, a.size] } }
end
bm.report do
N.times { data.scan(/((.)\2*)/).map { |m,c| [c, m.size] } }
end
end
user system total real
0.800000 0.010000 0.810000 ( 0.803824)
0.190000 0.000000 0.190000 ( 0.192915)
You need to build up an array of results, rather than simply stopping at the first one:
def consecutive_frequencies(str)
str.each_char.reduce([]) do |frequencies_arr, char|
if frequencies_arr.last && frequencies_arr.last[0] == char
frequencies_arr.last[1] += 1
else
frequencies_arr << [char, 1]
end
frequencies_arr
end
end
@steenslag gave the answer I would have given, so I'll try something different.
"aabbcbb".each_char.with_object([]) { |c,a| (a.any? && c == a.last.first) ?
a.last[-1] += 1 : a << [c, 1] }
#=> [["a", 2], ["b", 2], ["c", 1], ["b", 2]]
def encrypt(str)
count = 0
array = []
str.chars do |letter|
if array.empty?
array << letter
count += 1
elsif array.last == letter
count += 1
else
puts "[#{array}, #{count}]"
array.clear
count = 0
array << letter
count += 1
end
end
puts "[#{array}, #{count}]"
end
There are several errors with your implementation, I would try with a hash (rather than an array) and use something like this:
def encrypt(str)
count = 0
hash = {}
str.each_char do |letter|
if hash.key?(letter)
hash[letter] += 1
else
hash[letter] = 1
end
end
return hash
end
puts encrypt("aabbcbb")

Ruby Counting chars in a sequence not using regex

Need help with this code on counting chars in a sequence.
This is what I want:
word("aaabbcbbaaa") == [["a", 3], ["b", 2], ["c", 1], ["b", 2], ["a", 3]]
word("aaaaaaaaaa") == [["a", 10]]
word("") == []
Here is my code:
def word(str)
words=str.split("")
count = Hash.new(0)
words.map {|char| count[char] +=1 }
return count
end
I got word("aaabbcbbaaa") => [["a", 6], ["b", 4], ["c", 1]], which is not what I want. I want to count each consecutive run separately. I would prefer a non-regex solution. Thanks.
Split string by chars, then group chunks by char, then count chars in chunks:
def word str
str
.chars
.chunk{ |e| e }
.map{|(e,ar)| [e, ar.length] }
end
p word "aaabbcbbaaa"
p word("aaaaaaaaaa")
p word ""
Result:
[["a", 3], ["b", 2], ["c", 1], ["b", 2], ["a", 3]]
[["a", 10]]
[]
If you don't want to use a regex, you may just have to do something like:
def word(str)
  return [] if str.empty?  # without this guard, "" would yield [[nil, 0]]
  last, n, result = str.chars.first, 0, []
  str.chars.each do |char|
    if char != last
      result << [last, n]
      last, n = char, 1
    else
      n += 1
    end
  end
  result << [last, n]
end
I'd like to use some higher-order function to make this more concise, but there's no appropriate one in the Ruby standard library. Enumerable#partition almost does it, but not quite.
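For readers on a newer Ruby, one candidate for such a higher-order function is Enumerable#chunk_while (available from Ruby 2.3), which groups adjacent elements for as long as the block returns true. The sketch below assumes that version; the method name word_runs is hypothetical.

```ruby
# Hypothetical non-regex variant built on Enumerable#chunk_while (Ruby 2.3+).
def word_runs(str)
  str.each_char
     .chunk_while { |a, b| a == b }          # keep extending a run while neighbours match
     .map { |run| [run.first, run.size] }
end

word_runs("aaabbcbbaaa")
#=> [["a", 3], ["b", 2], ["c", 1], ["b", 2], ["a", 3]]
word_runs("")
#=> []
```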
I'd do the following. Note that each_char is a newer method (Ruby 1.9?) that might not be available on your version, so stick with words=str.split("") in that case.
def word(str)
return [] if str.length == 0
seq_count = []
last_char = nil
count = 0
str.each_char do |char|
if last_char == char
count += 1
else
seq_count << [last_char, count] unless last_char.nil?
count = 1
end
last_char = char
end
seq_count << [last_char, count]
end
[52] pry(main)> word("hello")
=> [["h", 1], ["e", 1], ["l", 2], ["o", 1]]
[54] pry(main)> word("aaabbcbbaaa")
=> [["a", 3], ["b", 2], ["c", 1], ["b", 2], ["a", 3]]
[57] pry(main)> word("")
=> []
Another non-regexp-version.
x = "aaabbcbbaaa"
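def word(str)
  str = str.dup  # chomp! mutates its receiver, so work on a copy
  str.squeeze.reverse.chars.each_with_object([]) do |char, list|
    count = 0
    count += 1 until str.chomp!(char).nil?
    list.unshift [char, count]  # runs are read back to front, so prepend
  end
end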
p word(x) #=> [["a", 3], ["b", 2], ["c", 1], ["b", 2], ["a", 3]]
If the world were without regex and chunk:
def word(str)
a = str.chars
b = []
loop do
return b if a.empty?
c = a.slice_before {|e| e != a.first}.first
b << [c.first, c.size]
a = a[c.size..-1]
end
end
word "aaabbcbbaaa" # => [["a", 3], ["b", 2], ["c", 1], ["b", 2], ["a", 3]]
word "aaa" # => [["a",3]]
word "" # => []
Here's another way. Initially I tried to find a solution that didn't require conversion of the string to an array of its characters. I couldn't come up with anything decent until I saw @hirolau's answer, which I modified:
def word(str)
list = []
char = str[-1]
loop do
return list if str.empty?
count = 0
count += 1 until str.chomp!(char).nil?
list.unshift [char, count]
char = str[-1]
end
end
You can use this pattern with scan:
"aaabbcbbaaa".scan(/((.)\2*)/)
and then count the number of characters in each group-1 match.
Example:
"aaabbcbbaaaa".scan(/((.)\2*)/).map do |x,y| [y, x.length] end

Count consecutives

I need to write a method that does the following
consecutive_count("aaabbcbbaaa") == [["a", 3], ["b", 2], ["c", 1], ["b", 2], ["a", 3]]
I have working code, but it looks ugly and I'm looking for a better solution. Please advise.
Here is my code:
def consecutive_count(str)
el = str[0]; count = 0; result = []
str.split("").each do |l|
if (el != l)
result << [el, count]
count = 1
el = l
else
count +=1
end
end
result << [el, count] if !el.nil?
return result
end
Here is one way :
s = "aaabbcbbaaa"
s.chars.chunk{|e| e }.map{|item,ary| [item,ary.size]}
# => [["a", 3], ["b", 2], ["c", 1], ["b", 2], ["a", 3]]
"aaabbcbbaaa".scan(/(?<s>(?<c>.)\k<c>*)/).map{|s, c| [c, s.length]}
# => [["a", 3], ["b", 2], ["c", 1], ["b", 2], ["a", 3]]
or
"aaabbcbbaaa".scan(/((.)\2*)/).map{|s, c| [c, s.length]}
# => [["a", 3], ["b", 2], ["c", 1], ["b", 2], ["a", 3]]
A solution which does not involve regex magic (although those are a bit shorter and probably faster) is this:
str.each_char.each_with_object([]) do |char, result|
if (result.last || [])[0] == char
result.last[1] += 1
else
result << [char, 1]
end
end
Depending on your level of understanding, it might convey your intent more clearly, which could help when you have to debug it six months from now :)
Regexp solution:
my_s = "aaabbcbbaaa"
p my_s.scan(/(.)(\1*)/).map{|x,y| [x, y.size + 1]}
#=> [["a", 3], ["b", 2], ["c", 1], ["b", 2], ["a", 3]]
or
a, result = "aaabbcbbaaa", []
result << a.slice!(/(\w)\1*/) until a.empty?
and then map the result with counts.
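A sketch of that last step. Two details here are my own adjustments rather than part of the answer above: the pattern uses . with the m flag instead of \w so that every character (including spaces and newlines) is consumed and the loop always terminates, and the string is duplicated because slice! modifies its receiver.

```ruby
a = "aaabbcbbaaa".dup
runs = []

# slice! removes and returns the first run of identical characters.
runs << a.slice!(/(.)\1*/m) until a.empty?
#=> runs is now ["aaa", "bb", "c", "bb", "aaa"]

# Map each run to [letter, length].
pairs = runs.map { |run| [run[0], run.size] }
#=> [["a", 3], ["b", 2], ["c", 1], ["b", 2], ["a", 3]]
```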
You can try:
def consecutive_count(str)
  str = str.dup               # work on a copy; the loop below consumes the string
  result = []
  until str.empty?
    char = str[0]
    count = 0
    while str.start_with?(char)  # start_with? is plain Ruby; starts_with? needs ActiveSupport
      count += 1
      str[0] = ""
    end
    result << [char, count]
  end
  result
end

Resources