Can someone explain how inject(Hash.new(0)) { |total, bigram| total[bigram] += 1; total }.sort_by { |_key, value| value }.reverse.to_h works? - ruby

alphabet = ["A","B","C","D","E","F","G","H","I","J",
"K","L","M","N","O","P","Q","R","S","T",
"U","V","W","X","Y","Z"," ",".",",",";",
"-","'"
]
file = File.read("vt_00.txt")
i = 0
while i < alphabet.count do
single_char = alphabet[i]
single_char_count = file.count(single_char)
print "#{alphabet[i]} = #{single_char_count} "
j = 0
while j < alphabet.count do
two_chars = alphabet[i] + alphabet[j]
two_chars_count = file.scan(two_chars).count
if two_chars_count > 10 && two_chars_count < 15
print "#{two_chars} = #{two_chars_count} "
end
k = 0
while k < alphabet.count do
three_chars = alphabet[i] + alphabet[j] + alphabet[k]
three_chars_count = file.scan(three_chars).count
if three_chars_count > 10 && three_chars_count < 15
print "#{three_chars} = #{three_chars_count} "
end
k += 1
end
j += 1
end
i += 1
end
My code looked like the above. Then I found a solution that uses each_cons. Can you explain how it works?
I don't understand the .inject part.
count = string.each_cons(1).inject(Hash.new(0)) { |total, bigram| total[bigram] += 1; total }.sort_by { |_key, value| value }.reverse.to_h

A more elaborate way to write it would be:
total = Hash.new(0)
string.each_cons(1).each{|bigram| total[bigram] += 1}
inject lets you pass in a start value (here Hash.new(0); the default of 0 means we can safely use the += operator on keys that do not exist yet), and whatever the block returns is passed as the accumulator to the next iteration. So in this case we have to explicitly return the hash (total) at the end of the block to be able to update it in the next step.
A simple example is adding all values of an array:
[1,4,5,23,2,66,123].inject(0){|sum, value| sum + value}
We start with 0; the first iteration computes 0 + 1, and that result is then passed as sum into the next iteration.
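To make the hand-off between iterations visible, here is a minimal sketch that records what the block receives each time. The steps array and the name acc are only there for illustration.

```ruby
# Record the accumulator and the element seen on every iteration.
steps = []
sum = [1, 4, 5].inject(0) do |acc, value|
  steps << [acc, value]  # what the block was given this time
  acc + value            # the return value becomes acc next time
end

steps #=> [[0, 1], [1, 4], [5, 5]]
sum   #=> 10
```

The same hand-off is why the counting version has to end its block with the hash: the block's last expression is what the next iteration receives.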
Note: in your original code, instead of using while loops and maintaining counters, you could more easily iterate over the arrays as follows:
alphabet.each do |single_char|
  single_char_count = file.count(single_char)
  print "#{single_char} = #{single_char_count} "
  alphabet.each do |second_char|
    two_chars = single_char + second_char
    # do something with two_chars
    alphabet.each do |third_char|
      three_chars = single_char + second_char + third_char
      # do something with three_chars
    end
  end
end
I am guessing it depends on the size of the file whether iterating over all each_cons (1-2-3) or using file.scan will be more efficient.
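For comparison, a single-pass version based on each_cons might look like the sketch below. The helper name ngram_counts and the sample string are assumptions for illustration; in the original code the text would come from File.read("vt_00.txt").

```ruby
# Hypothetical helper: count every run of n consecutive characters in text.
def ngram_counts(text, n)
  text.each_char.each_cons(n).each_with_object(Hash.new(0)) do |chars, counts|
    counts[chars.join] += 1  # chars is an array such as ["A", "B"]
  end
end

sample = "ABABA"  # stand-in for the file contents
singles = ngram_counts(sample, 1) #=> {"A"=>3, "B"=>2}
pairs   = ngram_counts(sample, 2) #=> {"AB"=>2, "BA"=>2}
triples = ngram_counts(sample, 3) #=> {"ABA"=>2, "BAB"=>1}
```

Note that the two approaches do not always agree: String#scan counts non-overlapping matches, so "ABABA".scan("ABA").count is 1, while each_cons sees both overlapping occurrences.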

The question
You wish to know how the following works:
g = Hash.new(0)
count = str.each_char.inject(g) do |h, s|
h[s] += 1
h
end.sort_by { |_key, value| value }.reverse.to_h
str.each_cons(1) does not work because the class String, of which str is an instance, does not have an instance method each_cons. There is a method Enumerable#each_cons, but the class String does not include that module, so strings do not respond to that method:
String.included_modules
#=> [Comparable, Kernel]
String#each_char does make sense here, as it returns an enumerator that generates each character of the string. I therefore assume that each_char was meant where each_cons(1) was written.
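This is easy to check in irb. The short sketch below (with an arbitrary sample string) also shows that each_cons becomes available once the string has been turned into an enumerator of characters:

```ruby
str = "The Cat"

on_string = str.respond_to?(:each_cons)           #=> false
on_enum   = str.each_char.respond_to?(:each_cons) #=> true

# Consecutive pairs of characters, joined back into two-letter strings.
pairs = str.each_char.each_cons(2).map(&:join)
#=> ["Th", "he", "e ", " C", "Ca", "at"]
```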
I have changed the variable names to something more generic, and have moved
g = Hash.new(0)
to a separate line.
An example
Suppose str is as follows:
str = "The Cat and the Hat"
Examine steps performed
Let's break the calculation into pieces:
g = Hash.new(0)
#=> {}
h = str.each_char.inject(g) do |h,s|
h[s] += 1
h
end
#=> {"T"=>1, "h"=>2, "e"=>2, " "=>4, "C"=>1,
# "a"=>3, "t"=>3, "n"=>1, "d"=>1, "H"=>1}
a = h.sort_by { |_key, value| value }
#=> [["T", 1], ["C", 1], ["n", 1], ["d", 1], ["H", 1],
# ["h", 2], ["e", 2], ["a", 3], ["t", 3], [" ", 4]]
b = a.reverse
#=> [[" ", 4], ["t", 3], ["a", 3], ["e", 2], ["h", 2],
# ["H", 1], ["d", 1], ["n", 1], ["C", 1], ["T", 1]]
count = b.to_h
#=> {" "=>4, "t"=>3, "a"=>3, "e"=>2, "h"=>2,
# "H"=>1, "d"=>1, "n"=>1, "C"=>1, "T"=>1}
The calculations of a, b and count are straightforward, so let's consider them first.
Calculation of a
Like all Enumerable methods, Enumerable#sort_by requires that its receiver responds to the method each. Here sort_by's receiver is a hash so h must respond to Hash#each. Indeed, sort_by's first operation is to convert h to an enumerator by sending it the method Hash#each:
enum = h.each
#=> #<Enumerator: {"T"=>1, "h"=>2, "e"=>2, " "=>4, "C"=>1, "a"=>3,
# "t"=>3, "n"=>1, "d"=>1, "H"=>1}:each>
We can see the values that are generated by this enumerator by repeatedly sending it the method Enumerator#next:
enum.next #=> ["T", 1]
enum.next #=> ["h", 2]
enum.next #=> ["e", 2]
...
enum.next #=> ["H", 1]
enum.next #=> StopIteration (iteration reached an end)
It is seen that enum generates a sequence of the hash's key-value pairs. Therefore,
h.sort_by { |_key, value| value }
is equivalent to
[["T", 1], ["h", 2], ["e", 2],..., ["H", 1]].sort_by { |_key, value| value }
which explains why a equals the array shown above.
Calculation of b
This calculation could not be more straightforward. Note that we could save a step by replacing b = h.sort_by { |_key, value| value }.reverse with
b = h.sort_by { |_key, value| -value }
#=> [[" ", 4], ["a", 3], ["t", 3], ["h", 2], ["e", 2],
# ["T", 1], ["C", 1], ["n", 1], ["d", 1], ["H", 1]]
This sorts the key-value pairs of h in decreasing order of value, as before, though ties are ordered somewhat differently.
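If a reproducible order for ties matters (sort_by is not guaranteed to be stable), one option is to sort on a composite key. This is only a sketch of that idea, breaking ties by the character itself:

```ruby
str = "The Cat and the Hat"
h = str.each_char.each_with_object(Hash.new(0)) { |s, acc| acc[s] += 1 }

# Highest count first; equal counts are ordered by the key.
count = h.sort_by { |key, value| [-value, key] }.to_h
#=> {" "=>4, "a"=>3, "t"=>3, "e"=>2, "h"=>2,
#    "C"=>1, "H"=>1, "T"=>1, "d"=>1, "n"=>1}
```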
Calculation of count
This is a straightforward application of the method Array#to_h.
Calculation of h
The first step in this calculation is to use the method Hash::new to create an empty hash with a default value of zero:
h = Hash.new(0)
#=> {}
This simply causes h[k] to return the default value of zero when h does not have a key k. For example, since h now has no keys:
h['cat']
#=> 0
If we now set
h['cat'] = 3
then
h['cat']
#=> 3
as the default value no longer applies. A hash h created this way is often called a counting hash. Ruby's first step in parsing the expression h[s] += 1 is to expand it to:
h[s] = h[s] + 1
If h does not have a key s the expression reduces to
h[s] = 0 + 1
because h[s] on the right of the equals sign (the method Hash#[], as contrasted with the method Hash#[]= on the left) returns the default value of zero. If the string were "aaa", the following calculations would be made:
h['a'] = h['a'] + 1 => 0 + 1 => 1
h['a'] = h['a'] + 1 => 1 + 1 => 2
h['a'] = h['a'] + 1 => 2 + 1 => 3
h['a'] on the right returns the default value of zero in the first step, but since h then has the key 'a' in the second and third steps the current values of h['a'] are returned after the first step.
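One detail worth seeing for yourself: reading a missing key returns the default but does not store anything in the hash. A key only appears once Hash#[]= is called, as in this small sketch:

```ruby
h = Hash.new(0)

before     = h['cat']       #=> 0 (the default; nothing is stored)
has_key    = h.key?('cat')  #=> false
size_before = h.size        #=> 0

h['cat'] += 1               # expands to h['cat'] = h['cat'] + 1
after = h['cat']            #=> 1
size_after = h.size         #=> 1
```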
Enumerable#inject (a.k.a reduce) can be used here but the calculation of h is more commonly written as follows:
h = str.each_char.each_with_object(Hash.new(0)) { |s,h| h[s] += 1 }
#=> {"T"=>1, "h"=>2, "e"=>2, " "=>4, "C"=>1,
# "a"=>3, "t"=>3, "n"=>1, "d"=>1, "H"=>1}
See Enumerable#each_with_object.
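As a quick side-by-side (a sketch with a short sample string), the two forms give the same hash. The differences are the order of the block variables and that inject needs the hash as the block's last expression, whereas each_with_object ignores the block's return value:

```ruby
str = "aab"

# inject: accumulator first, and the block must return it.
via_inject = str.each_char.inject(Hash.new(0)) { |h, s| h[s] += 1; h }

# each_with_object: element first, the object is passed along automatically.
via_object = str.each_char.each_with_object(Hash.new(0)) { |s, h| h[s] += 1 }

via_inject #=> {"a"=>2, "b"=>1}
via_object #=> {"a"=>2, "b"=>1}
```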

Related

Sum of same values in ruby hash

Hello everybody.
I have a hash, for example:
{-2=>"a", -1=>"c", 1=>"a", 3=>"a", 49=>"a", -43=>"ab", 5=>"ab"}
There can be equal values. My task is to sum keys where values are equal. Result:
{51=>"a", -1=>"c", -38=>"ab"}
How can I do this?
hash.group_by{|key,val| val}
Gives awful result.
hash = {-2=>"a", -1=>"c", 1=>"a", 3=>"a", 49=>"a", -43=>"ab", 5=>"ab"}
hash.reduce({}) do |memo, (k,v)|
memo[v] ||= 0
memo[v] += k
memo
end.invert
# => {51=>"a", -1=>"c", -38=>"ab"}
reduce - lets you build up a new value by iterating over the values of a collection, in this case hash. See the docs for more.
invert - swaps the keys and values of a hash. See the docs for more.
Other ways to do this:
hash.reduce(Hash.new(0)) { |memo, (k,v)| memo[v] += k; memo }.invert
h = {-2=>"a", -1=>"c", 1=>"a", 3=>"a", 49=>"a", -43=>"ab", 5=>"ab"}
then
h.group_by(&:last).each_with_object({}) { |(k,v),h| h[v.map(&:first).sum] = k }
#=> {51=>"a", -1=>"c", -38=>"ab"}
but that would be crazy as it relies on the sums being unique. (Recall that hashes have unique keys.) Suppose
h = {-54=>"a", -1=>"c", 1=>"a", 3=>"a", 49=>"a", -43=>"ab", 5=>"ab"}
then
h.group_by(&:last).each_with_object({}) { |(k,v),h| h[v.map(&:first).sum] = k }
#=> {-1=>"c", -38=>"ab"}
as -1=>"a" is overwritten by -1=>"c". I doubt that this is wanted.
It would be better to save the contents of h in an array:
a = [[-2, "a"], [-1, "c"], [-1, "a"], [49, "a"], [-43, "ab"], [5, "ab"]]
(as it permits duplicate values of the integers--here -1) and then compute
a.group_by(&:last).each_with_object({}) { |(e,ar),h| h[e] = ar.map(&:first).sum }
#=> {"a"=>46, "c"=>-1, "ab"=>-38}
Note that (for the original value of h)
h.group_by(&:last)
#=> {"a"=>[[-2, "a"], [1, "a"], [3, "a"], [49, "a"]],
# "c"=>[[-1, "c"]], "ab"=>[[-43, "ab"], [5, "ab"]]}
and v.map(&:first).sum could be replaced with
v.reduce(0) { |t,(n,_)| t+n }
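If the input really has to stay a hash, one way to sidestep the collision is to key the result by the string and keep the sum as the value, that is, to skip the final swap. A sketch using the colliding example from above (the name sums is just illustrative):

```ruby
h = {-54=>"a", -1=>"c", 1=>"a", 3=>"a", 49=>"a", -43=>"ab", 5=>"ab"}

# Keyed by the string, so two groups with the same sum cannot overwrite
# each other.
sums = h.each_with_object(Hash.new(0)) { |(k, v), acc| acc[v] += k }
#=> {"a"=>-1, "c"=>-1, "ab"=>-38}
```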

Ruby read a file, create a hash with a key and data, then sort alphabetically

I have a file like below, which I need to put into a hash:
GHIThree, Line, Number
DEFNumber, Two, Line
ABCLine, Number, One
What I need to do is take the first 3 characters and turn them into a key, with the rest of the line as the value.
So when I print the hash it should look something like this:
Keys Values
ABC Line, Number, One
DEF Number, Two, Line
GHI Three, Line, Number
Here is what I've got; it's a little all over the place, but here it is:
lines = File.open("homework02.txt").read.split
fHash = {}
lines.each do |line|
next if line == ""
fHash[line[0..2]] = line[3..-1]
end
f = File.open("homework02.txt")
fHash = {}
loop do
x = f.gets
break unless x
fHash[x[0..2]] = x[3..-1]
end
puts fHash
f.close
fHash = fHash.to_a.sort.to_h
Try in this way:
require 'csv'

result = {}
CSV.foreach('file.csv', skip_blanks: true) do |row|
result[row[0].slice!(0..2)] = row
end
result.sort.to_h
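For reference, here is a self-contained sketch of the whole task without CSV. The heredoc stands in for the contents of homework02.txt, and the variable names are illustrative. It reads line by line because String#split with no argument splits on any whitespace, which would break each line apart at its spaces.

```ruby
# Sample data standing in for File.read("homework02.txt").
text = <<~DATA
  GHIThree, Line, Number
  DEFNumber, Two, Line

  ABCLine, Number, One
DATA

table = text.each_line.each_with_object({}) do |line, h|
  line = line.chomp
  next if line.empty?           # skip blank lines
  h[line[0, 3]] = line[3..-1]   # first 3 characters => rest of the line
end

sorted = table.sort.to_h
#=> {"ABC"=>"Line, Number, One", "DEF"=>"Number, Two, Line",
#    "GHI"=>"Three, Line, Number"}

sorted.each { |key, value| puts "#{key} #{value}" }
```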
Another way:
fHash = { 4=>1, 1=>1, 3=>2 }
fHash.keys.sort.each_with_object({}) { |k,h| h[k]=fHash[k] }
#=> {1=>1, 3=>2, 4=>1}
The steps:
a = fHash.keys
#=> [4, 1, 3]
b = a.sort
#=> [1, 3, 4]
enum = b.each_with_object({})
#=> #<Enumerator: [1, 3, 4]:each_with_object({})>
We can view the contents of this enumerator by converting it to an array:
enum.to_a
#=> [[1, {}], [3, {}], [4, {}]]
You will see that the value of the hash will change as elements of enum are passed to the block.
The block variables k and h are assigned to the first element of enum:
k,h = enum.next
#=> [1, {}]
k #=> 1
h #=> {}
and the block calculation is performed:
h[k]=fHash[k]
#=> h[1] = { 4=>1, 1=>1, 3=>2 }[1]
# h[1] = 1
h #=> {1=>1}
The second element of enum is passed to the block and the operations are repeated:
k,h = enum.next
#=> [3, {1=>1}]
h[k]=fHash[k]
#=> h[3] = { 4=>1, 1=>1, 3=>2 }[3]
# h[3] = 2
h #=> {1=>1, 3=>2}
The third and last element of enum is passed to the block:
k,h = enum.next
#=> [4, {1=>1, 3=>2}]
h[k]=fHash[k]
#=> h[4] = { 4=>1, 1=>1, 3=>2 }[4]
# h[4] = 1
h #=> {1=>1, 3=>2, 4=>1}

How to write a method that finds the most common letter in a string?

This is the question prompt:
Write a method that takes in a string. Your method should return the most common letter in the array, and a count of how many times it appears.
I'm not entirely sure where to go with what I have so far.
def most_common_letter(string)
arr1 = string.chars
arr2 = arr1.max_by(&:count)
end
I suggest you use a counting hash:
str = "The quick brown dog jumped over the lazy fox."
str.downcase.gsub(/[^a-z]/,'').
each_char.
with_object(Hash.new(0)) { |c,h| h[c] += 1 }.
max_by(&:last)
#=> ["e",4]
Hash::new with an argument of zero creates an empty hash whose default value is zero.
The steps:
s = str.downcase.gsub(/[^a-z]/,'')
#=> "thequickbrowndogjumpedoverthelazyfox"
enum0 = s.each_char
#=> #<Enumerator: "thequickbrowndogjumpedoverthelazyfox":each_char>
enum1 = enum0.with_object(Hash.new(0))
#=> #<Enumerator: #<Enumerator:
# "thequickbrowndogjumpedoverthelazyfox":each_char>:with_object({})>
You can think of enum1 as a "compound" enumerator. (Study the return value above.)
Let's see the elements of enum1:
enum1.to_a
#=> [["t", {}], ["h", {}], ["e", {}], ["q", {}],..., ["x", {}]]
The first element of enum1 (["t", {}]) is passed to the block by String#each_char and assigned to the block variables:
c,h = enum1.next
#=> ["t", {}]
c #=> "t"
h #=> {}
The block calculation is then performed:
h[c] += 1
#=> h[c] = h[c] + 1
#=> h["t"] = h["t"] + 1
#=> h["t"] = 0 + 1 #=> 1
h #=> {"t"=>1}
Ruby expands h[c] += 1 to h[c] = h[c] + 1, which is h["t"] = h["t"] + 1. As h #=> {}, h has no key "t", so h["t"] on the right side of the equal sign is replaced by the hash's default value, 0. The next time c #=> "t", h["t"] = h["t"] + 1 will reduce to h["t"] = 1 + 1 #=> 2 (i.e., the default value will not be used, as h now has a key "t").
The next value of enum1 is then passed into the block and the block calculation is performed:
c,h = enum1.next
#=> ["h", {"t"=>1}]
h[c] += 1
#=> 1
h #=> {"t"=>1, "h"=>1}
The remaining elements of enum1 are processed similarly.
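Wrapped up as the method the question asks for, the steps above could look like this sketch. Returning nil for a string without letters is an assumption on my part, not something the prompt specifies.

```ruby
def most_common_letter(string)
  string.downcase.gsub(/[^a-z]/, '')
        .each_char
        .with_object(Hash.new(0)) { |c, h| h[c] += 1 }
        .max_by(&:last)  # [letter, count] pair, or nil for an empty hash
end

result = most_common_letter("The quick brown dog jumped over the lazy fox.")
#=> ["e", 4]
empty = most_common_letter("123")
#=> nil
```

In this sentence "o" also occurs four times; max_by keeps the first maximum it meets, and "e" enters the hash first.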
A simple way to do that, without worrying about checking empty letters:
letter, count = ('a'..'z')
.map {|letter| [letter, string.count(letter)] }
.max_by(&:last)
Here is another way of doing what you want:
str = 'aaaabbbbcd'
h = str.each_char.with_object(Hash.new(0)) { |c,h| h[c] += 1 }
max = h.values.max
output_hash = Hash[h.select { |k, v| v == max}]
puts "most_frequent_value: #{max}"
puts "most frequent character(s): #{output_hash.keys}"
def most_common_letter(string)
string.downcase.split('').group_by(&:itself).map { |k, v| [k, v.size] }.max_by(&:last)
end
Edit:
Using hash:
def most_common_letter(string)
chars = {}
most_common = nil
most_common_count = 0
string.downcase.gsub(/[^a-z]/, '').each_char do |c|
count = (chars[c] = (chars[c] || 0) + 1)
if count > most_common_count
most_common = c
most_common_count = count
end
end
[most_common, most_common_count]
end
I'd like to mention a solution with Enumerable#tally, introduced in Ruby 2.7.0:
str =<<-END
Tallies the collection, i.e., counts the occurrences of each element. Returns a hash with the elements of the collection as keys and the corresponding counts as values.
END
str.scan(/[a-z]/).tally.max_by(&:last)
#=> ["e", 22]
Where:
str.scan(/[a-z]/).tally
#=> {"a"=>8, "l"=>9, "i"=>6, "e"=>22, "s"=>12, "t"=>13, "h"=>9, "c"=>11, "o"=>11, "n"=>11, "u"=>5, "r"=>5, "f"=>2, "m"=>2, "w"=>1, "k"=>1, "y"=>1, "d"=>2, "p"=>1, "g"=>1, "v"=>1}
char, count = string.split('').
group_by(&:downcase).
map { |k, v| [k, v.size] }.
max_by { |_, v| v }

Consecutive letter frequency

I am trying to write code to determine consecutive frequency of letters within a string.
For example:
"aabbcbb" => ["a",2],["b",2],["c", 1], ["b", 2]
My code gives me the first letter frequency but doesn't move on to the next.
def encrypt(str)
array = []
count = 0
str.each_char do |letter|
if array.empty?
array << letter
count += 1
elsif array.last == letter
count += 1
else
return [array, count]
array = []
end
end
end
p "aabbcbb".chars.chunk{|c| c}.map{|c, a| [c, a.size]}
# => [["a", 2], ["b", 2], ["c", 1], ["b", 2]]
"aabbcbb".chars.slice_when(&:!=).map{|a| [a.first, a.length]}
# => [["a", 2], ["b", 2], ["c", 1], ["b", 2]]
There's a simple regular expression-based solution involving back-references:
"aabbbcbb".scan(/((.)\2*)/).map { |m,c| [c, m.length] }
# => [["a", 2], ["b", 3], ["c", 1], ["b", 2]]
But I would prefer the chunk method for clarity (and almost certainly efficiency).
Actually out of curiosity, I wrote a quick benchmark and scan is a little more than four times faster than chunk.map, but I'd still use chunk.map for clarity unless you're actually doing this hundreds of thousands of times:
require 'benchmark'
N = 10000
data = ('a'..'z').map { |c| c * 10 }.join("")
Benchmark.bm do |bm|
bm.report do
N.times { data.chars.chunk{ |c| c }.map { |c, a| [c, a.size] } }
end
bm.report do
N.times { data.scan(/((.)\2*)/).map { |m,c| [c, m.size] } }
end
end
user system total real
0.800000 0.010000 0.810000 ( 0.803824)
0.190000 0.000000 0.190000 ( 0.192915)
You need to build up an array of results, rather than simply stopping at the first one:
def consecutive_frequencies(str)
str.each_char.reduce([]) do |frequencies_arr, char|
if frequencies_arr.last && frequencies_arr.last[0] == char
frequencies_arr.last[1] += 1
else
frequencies_arr << [char, 1]
end
frequencies_arr
end
end
@steenslag gave the answer I would have given, so I'll try something different.
"aabbcbb".each_char.with_object([]) { |c,a| (a.any? && c == a.last.first) ?
a.last[-1] += 1 : a << [c, 1] }
#=> [["a", 2], ["b", 2], ["c", 1], ["b", 2]]
def encrypt(str)
count = 0
array = []
str.chars do |letter|
if array.empty?
array << letter
count += 1
elsif array.last == letter
count += 1
else
puts "[#{array}, #{count}]"
array.clear
count = 0
array << letter
count += 1
end
end
puts "[#{array}, #{count}]"
end
There are several errors in your implementation. I would try a hash (rather than an array) and use something like this:
def encrypt(str)
count = 0
hash = {}
str.each_char do |letter|
if hash.key?(letter)
hash[letter] += 1
else
hash[letter] = 1
end
end
return hash
end
puts encrypt("aabbcbb")
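One caveat with a hash keyed by letter: it merges separate runs of the same letter, so it yields total counts rather than consecutive ones. The sketch below contrasts the two on the question's input; chunk_while is used here only as one possible way to keep the runs apart.

```ruby
str = "aabbcbb"

# Total occurrences per letter: the two runs of "b" are merged.
totals = str.each_char.each_with_object(Hash.new(0)) { |c, h| h[c] += 1 }
#=> {"a"=>2, "b"=>4, "c"=>1}

# Consecutive runs: a new chunk starts whenever the letter changes.
runs = str.each_char.chunk_while { |a, b| a == b }
          .map { |run| [run.first, run.size] }
#=> [["a", 2], ["b", 2], ["c", 1], ["b", 2]]
```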

How to count duplicates in Ruby Arrays

How do you count duplicates in a ruby array?
For example, if my array had three a's, how could I count that?
Another version of a hash with a key for each element in your array and value for the count of each element
a = [ 1, 2, 3, 3, 4, 3]
h = Hash.new(0)
a.each { | v | h.store(v, h[v]+1) }
# h = { 3=>3, 2=>1, 1=>1, 4=>1 }
Given:
arr = [ 1, 2, 3, 2, 4, 5, 3]
My favourite way of counting elements is:
counts = arr.group_by{|i| i}.map{|k,v| [k, v.count] }
# => [[1, 1], [2, 2], [3, 2], [4, 1], [5, 1]]
If you need a hash instead of an array:
Hash[*counts.flatten]
# => {1=>1, 2=>2, 3=>2, 4=>1, 5=>1}
This will yield the duplicate elements as a hash with the number of occurrences for each duplicate item. Let the code speak:
#!/usr/bin/env ruby
class Array
# monkey-patched version
def dup_hash
inject(Hash.new(0)) { |h,e| h[e] += 1; h }.select {
|k,v| v > 1 }.inject({}) { |r, e| r[e.first] = e.last; r }
end
end
# unmonkey'd
def dup_hash(ary)
ary.inject(Hash.new(0)) { |h,e| h[e] += 1; h }.select {
|_k,v| v > 1 }.inject({}) { |r, e| r[e.first] = e.last; r }
end
p dup_hash([1, 2, "a", "a", 4, "a", 2, 1])
# {"a"=>3, 1=>2, 2=>2}
p [1, 2, "Thanks", "You're welcome", "Thanks",
"You're welcome", "Thanks", "You're welcome"].dup_hash
# {"You're welcome"=>3, "Thanks"=>3}
Simple.
arr = [2,3,4,3,2,67,2]
repeats = arr.length - arr.uniq.length
puts repeats
arr = %w( a b c d c b a )
# => ["a", "b", "c", "d", "c", "b", "a"]
arr.count('a')
# => 2
Another way to count array duplicates is:
arr= [2,2,3,3,2,4,2]
arr.group_by{|x| x}.map{|k,v| [k,v.count] }
result is
[[2, 4], [3, 2], [4, 1]]
requires 1.8.7+ for group_by
ary = %w{a b c d a e f g a h i b}
ary.group_by{|elem| elem}.select{|key,val| val.length > 1}.map{|key,val| key}
# => ["a", "b"]
with 1.9+ this can be slightly simplified because Hash#select will return a hash.
ary.group_by{|elem| elem}.select{|key,val| val.length > 1}.keys
# => ["a", "b"]
To count instances of a single element use inject
array.inject(0){|count,elem| elem == value ? count+1 : count}
arr = [1, 2, "a", "a", 4, "a", 2, 1]
arr.group_by(&:itself).transform_values(&:size)
#=> {1=>2, 2=>2, "a"=>3, 4=>1}
Ruby >= 2.7 solution here:
A new method .tally has been added.
Tallies the collection, i.e., counts the occurrences of each element. Returns a hash with the elements of the collection as keys and the corresponding counts as values.
So now, you will be able to do:
["a", "b", "c", "b"].tally #=> {"a"=>1, "b"=>2, "c"=>1}
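To get only the duplicates out of the tally, a select on the counts should be enough. This sketch also falls back to a counting hash on Rubies older than 2.7; the fallback is my assumption about what you might want there.

```ruby
arr = [1, 2, "a", "a", 4, "a", 2, 1]

# Use tally where available, otherwise build the same hash by hand.
counts = if arr.respond_to?(:tally)
           arr.tally
         else
           arr.each_with_object(Hash.new(0)) { |e, h| h[e] += 1 }
         end

dups = counts.select { |_elem, n| n > 1 }
#=> {1=>2, 2=>2, "a"=>3}
```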
What about a grep?
arr = [1, 2, "Thanks", "You're welcome", "Thanks", "You're welcome", "Thanks", "You're welcome"]
arr.grep('Thanks').size # => 3
It's easy:
words = ["aa","bb","cc","bb","bb","cc"]
One line simple solution is:
words.each_with_object(Hash.new(0)) { |word,counts| counts[word] += 1 }
It works for me.
Thanks!!
I don't think there's a built-in method. If all you need is the total count of duplicates, you could take a.length - a.uniq.length. If you're looking for the count of a single particular element, try
a.select {|e| e == my_element}.length.
Improving @Kim's answer:
arr = [1, 2, "a", "a", 4, "a", 2, 1]
Hash.new(0).tap { |h| arr.each { |v| h[v] += 1 } }
# => {1=>2, 2=>2, "a"=>3, 4=>1}
Ruby code to get the repeated elements in the array:
numbers = [1,2,3,1,2,0,8,9,0,1,2,3]
seen = [] # elements encountered so far
similar = numbers.each_with_object([]) do |n, dups|
  dups << n if seen.include?(n)
  seen << n
end
print "similar --> ", similar
Another way to do it is to use each_with_object:
a = [ 1, 2, 3, 3, 4, 3]
hash = a.each_with_object({}) {|v, h|
h[v] ||= 0
h[v] += 1
}
# hash = { 3=>3, 2=>1, 1=>1, 4=>1 }
This way, looking up a non-existing key such as hash[5] will return nil, instead of the 0 that Kim's solution returns.
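A small sketch of that difference: both hashes hold the same counts, and only the lookup of a missing key differs. Hash#fetch with a fallback is shown as one way to get 0 from the plain hash when you want it.

```ruby
a = [1, 2, 3, 3, 4, 3]

plain = a.each_with_object({}) { |v, h| h[v] ||= 0; h[v] += 1 }
defaulted = a.each_with_object(Hash.new(0)) { |v, h| h[v] += 1 }

same_counts    = (plain == defaulted) #=> true (== ignores the default)
plain_miss     = plain[5]             #=> nil
defaulted_miss = defaulted[5]         #=> 0
fetched        = plain.fetch(5, 0)    #=> 0
```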
I've used reduce/inject for this in the past, like the following
array = [1,5,4,3,1,5,6,8,8,8,9]
array.reduce(Hash.new(0)) {|counts, el| counts[el]+=1; counts}
produces
=> {1=>2, 5=>2, 4=>1, 3=>1, 6=>1, 8=>3, 9=>1}

Resources