Counting frequency of symbols - ruby

So I have the following code which counts the frequency of each letter in a string (or in this specific instance from a file):
def letter_frequency(file)
  letters = 'a' .. 'z'
  File.read(file) .
    split(//) .
    group_by {|letter| letter.downcase} .
    select {|key, val| letters.include? key} .
    collect {|key, val| [key, val.length]}
end
letter_frequency(ARGV[0]).sort_by {|key, val| -val}.each {|pair| p pair}
Which works great, but I would like to know if there is some way in Ruby to do something similar that catches all the different possible symbols, i.e. spaces, commas, periods, and everything in between. To put it more simply: is there something like 'a' .. 'z' that holds all the symbols? Hope that makes sense.

You won't need a range when you're trying to count every possible character, because then you are counting over the whole domain of characters. You should only create a range when you specifically need a subset of that domain.
This is probably a faster implementation that counts all characters in the file:
def char_frequency(file_name)
  ret_val = Hash.new(0)
  File.open(file_name) {|file| file.each_char {|char| ret_val[char] += 1 } }
  ret_val
end
p char_frequency("1003v-mm") #=> {"\r"=>56, "\n"=>56, " "=>2516, "\xC9"=>2, ...
For reference I used this test file.
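If the text is already in a string, here is a minimal sketch of the same idea, assuming Ruby 2.7 or later (where Enumerable#tally exists). each_char yields every character, including spaces, punctuation and newlines, and tally counts them. The method name char_tally is hypothetical.

```ruby
# Count every character in a string; assumes Ruby 2.7+ for Enumerable#tally.
def char_tally(text)
  # each_char yields every character (letters, spaces, punctuation, newlines);
  # tally returns a Hash mapping each distinct character to its count.
  text.each_char.tally
end

p char_tally("Hi, there.")
```

For a file, File.read(file_name).each_char.tally should give the same counts as char_frequency, at the cost of reading the whole file into memory first.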

It may not use much Ruby magic with ranges, but a simple way is to build a character counter that iterates over each character in a string and counts the totals:
class CharacterCounter
  def initialize(text)
    @characters = text.split("")
  end
  def character_frequency
    character_counter = {}
    @characters.each do |char|
      character_counter[char] ||= 0
      character_counter[char] += 1
    end
    character_counter
  end
  def unique_characters
    character_frequency.map {|key, value| key}
  end
  def frequency_of(character)
    character_frequency[character] || 0
  end
end
counter = CharacterCounter.new("this is a test")
counter.character_frequency # => {"t"=>3, "h"=>1, "i"=>2, "s"=>3, " "=>3, "a"=>1, "e"=>1}
counter.unique_characters # => ["t", "h", "i", "s", " ", "a", "e"]
counter.frequency_of 't' # => 3
counter.frequency_of 'z' # => 0
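As written, character_frequency rebuilds the hash on every call, so unique_characters and frequency_of each rescan the whole text. A minimal variation (not the original answer's code) counts once in the constructor and relies on Hash.new(0) so missing characters read as 0:

```ruby
# Variation on CharacterCounter that counts the characters once, up front.
class CharacterCounter
  attr_reader :character_frequency

  def initialize(text)
    # Hash.new(0): reading a missing key returns 0, so no ||= guard is needed.
    @character_frequency = Hash.new(0)
    text.each_char { |char| @character_frequency[char] += 1 }
  end

  def unique_characters
    @character_frequency.keys
  end

  def frequency_of(character)
    # Returns the default 0 for unseen characters without inserting them.
    @character_frequency[character]
  end
end

counter = CharacterCounter.new("this is a test")
p counter.frequency_of("t")
```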

Related

How to use gsub with a file in Ruby?

Hey, I have a little problem. I have a string array, text_word, and I want to replace some letters using the rules in my file transform.txt. My file looks like this:
/t/ 3
/$/ 1
/a/ !
But when I use gsub, I get an Enumerator back. Does anyone know how to fix this?
text_transform = Array.new
new_words = Array.new
File.open("transform.txt", "r") do |fi|
  fi.each_line do |words|
    text_transform << words.chomp
  end
end
text_transform.each do |transform|
  text_word.each do |words|
    new_words << words.gsub(transform)
  end
end
See String#gsub:
If the second argument is a Hash, and the matched text is one of its
keys, the corresponding value is the replacement string.
You can also use IO::readlines:
File.readlines('transform.txt', chomp: true).map { |word| word.gsub(/[t$a]/, 't' => 3, '$' => 1, 'a' => '!') }
gsub returns an Enumerator when you provide just one argument (the pattern) and no block. If you want to replace, just add the replacement string:
pry(main)> 'this is my string'.gsub(/i/, '1')
"th1s 1s my str1ng"
You need to refactor your code:
text_transform = Array.new
new_words = Array.new
File.open("transform.txt", "r") do |fi|
  fi.each_line do |words|
    text_transform << words.chomp.strip.split # "/t/ 3" -> ["/t/", "3"]
  end
end
text_transform.each do |pattern, replacement| # pattern = "/t/", replacement = "3"
  text_word.each do |words|
    # A String pattern is matched literally, so drop the surrounding slashes ("/t/" -> "t")
    new_words << words.gsub(pattern[1...-1], replacement)
  end
end
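The loops above push one output word per (rule, word) pair, each with only a single rule applied. If the goal is instead to apply every rule from transform.txt to each word, here is a minimal sketch. It assumes the file format shown in the question (/pattern/ replacement, one rule per line) and treats each pattern as a literal string. The helper names parse_rules and apply_rules are hypothetical.

```ruby
# Turn lines like "/t/ 3" into [literal_pattern, replacement] pairs.
def parse_rules(lines)
  lines.map do |line|
    pattern, replacement = line.strip.split
    [pattern[1...-1], replacement] # drop the surrounding slashes
  end
end

# Apply every rule, in file order, to each word.
def apply_rules(words, rules)
  words.map do |word|
    rules.reduce(word) { |acc, (pattern, replacement)| acc.gsub(pattern, replacement) }
  end
end

# In real use the lines would come from File.readlines("transform.txt").
rules = parse_rules(["/t/ 3", "/$/ 1", "/a/ !"])
p apply_rules(["test", "data$"], rules)
```

Because gsub treats a String pattern literally, a rule like /$/ replaces an actual dollar sign rather than matching end-of-line.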

How to fix incorrect character counting code

I have a question about mysterious 'e' characters appearing in my counts hash.
My initial approach was clunky and inelegant:
def letter_count(str)
  counts = {}
  words = str.split(" ")
  words.each do |word|
    letters = word.split("")
    letters.each do |letter|
      if counts.include?(letter)
        counts[letter] += 1
      else
        counts[letter] = 1
      end
    end
  end
  counts
end
This approach worked, but I wanted to make it a little more readable, so I abbreviated it to:
def letter_count(str)
  counts = Hash.new(0)
  str.split("").each{|letter| counts[letter] += 1 unless letter == ""}
  counts
end
This is where I encountered the issue, and fixed it by using:
str.split("").each{|letter| counts[letter] += 1 unless letter == " "} # added a space.
I don't understand why empty spaces were being represented by the letter 'e' or being counted at all.
I can't duplicate the problem:
def letter_count(str)
  counts = Hash.new(0)
  str.split("").each{|letter| counts[letter] += 1 unless letter == ""}
  counts
end
letter_count('a cat') # => {"a"=>2, " "=>1, "c"=>1, "t"=>1}
"Empty spaces"? There's no such thing. A space is not empty; it's considered blank, but not empty:
' '.empty? # => false
Loading the ActiveSupport extension:
require 'active_support/core_ext/object/blank'
' '.blank? # => true
Spaces are valid characters, which is why they're being counted. You have to exclude them explicitly if you don't want them counted.
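A minimal sketch of excluding them, assuming you want to skip all whitespace (spaces, tabs, newlines) rather than only " ". It uses String#match? (Ruby 2.4+) with /\s/; the method name is hypothetical.

```ruby
# Count characters, skipping any whitespace character.
def letter_count_without_whitespace(str)
  counts = Hash.new(0)
  str.each_char do |char|
    next if char.match?(/\s/) # skip spaces, tabs and newlines
    counts[char] += 1
  end
  counts
end

p letter_count_without_whitespace("a cat\n")
```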
For reference, here's how I'd do it:
def letter_count(str)
  str.chars.each_with_object(Hash.new(0)) { |l, h| h[l] += 1 }
end
letter_count('a cat') # => {"a"=>2, " "=>1, "c"=>1, "t"=>1}
A messier way would be:
def letter_count(str)
  str.chars.group_by { |c| c }.map { |char, chars| [char, chars.count] }.to_h
end
Breaking that down:
def letter_count(str)
  str.chars # => ["a", " ", "c", "a", "t"]
    .group_by { |c| c } # => {"a"=>["a", "a"], " "=>[" "], "c"=>["c"], "t"=>["t"]}
    .map { |char, chars| [char, chars.count] } # => [["a", 2], [" ", 1], ["c", 1], ["t", 1]]
    .to_h # => {"a"=>2, " "=>1, "c"=>1, "t"=>1}
end
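On Ruby 2.4 or later, the map/to_h step of that chain can be replaced by Hash#transform_values, and group_by(&:itself) (Object#itself, Ruby 2.2+) replaces the identity block. A sketch of that variant:

```ruby
def letter_count(str)
  # Group identical characters, then replace each group with its size.
  str.chars.group_by(&:itself).transform_values(&:count)
end

p letter_count('a cat')
```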
Ruby already has String#each_char which you could use.
def char_count(string)
  counts = Hash.new(0)
  string.each_char do |char|
    counts[char] += 1
  end
  counts
end
puts char_count("Basset hounds got long ears").inspect
# {"B"=>1, "a"=>2, "s"=>4, "e"=>2, "t"=>2, " "=>4, "h"=>1,
# "o"=>3, "u"=>1, "n"=>2, "d"=>1, "g"=>2, "l"=>1, "r"=>1}
As for why you're getting the wrong characters, are you sure you're passing in the string you think you are?

Build list of substrings created by separating a string by a match

I have a string:
"a_b_c_d_e"
I would like to build a list of substrings that result from removing everything after a single "_" from the string. The resulting list would look like:
['a_b_c_d', 'a_b_c', 'a_b', 'a']
What is the most rubyish way to achieve this?
s = "a_b_c_d_e"
a = []
s.scan("_"){a << $`} #`
a # => ["a", "a_b", "a_b_c", "a_b_c_d"]
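The scan approach collects the prefixes shortest first, so it needs a final .reverse to match the requested order. As an alternative sketch, String#rindex can repeatedly locate the last underscore and cut the string there. This yields the prefixes longest first and also works when segments are longer than one character. The method name prefixes and its sep parameter are illustrative.

```ruby
# Return every prefix that ends just before an occurrence of sep, longest first.
def prefixes(str, sep = "_")
  result = []
  while (i = str.rindex(sep))
    str = str[0...i] # cut at the last separator (rebinds the local, caller's string untouched)
    result << str
  end
  result
end

p prefixes("a_b_c_d_e")
```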
You can split the string on the underscore character into an Array. Then discard the last element of the array and collect the remaining elements in another array joined by underscores. Like this:
str = "a_b_c_d_e"
str_ary = str.split("_") # will yield ["a","b","c","d","e"]
str_ary.pop # throw out the last element in str_ary
result_ary = [] # an empty array where you will collect your results
until str_ary.empty?
  result_ary << str_ary.join("_") # collect the remaining elements of str_ary joined by underscores
  str_ary.pop
end
# result_ary = ["a_b_c_d","a_b_c","a_b","a"]
Hope this helps.
I'm not sure about “most rubyish”, but my solutions would be:
str = 'a_b_c_d_e'
(items = str.split('_')).map.with_index do |_, i|
  items.take(i + 1).join('_')
end.reverse.drop(1) # drop(1) removes the full string 'a_b_c_d_e'
########################################################
(items = str.split('_')).size.pred.downto(1).map do |e|
  items.take(e).join('_')
end
########################################################
str.split('_').inject([]) do |memo, l|
  memo << [memo.last, l].compact.join('_')
end.reverse.drop(1) # drop(1) removes the full string 'a_b_c_d_e'
########################################################
([items]*items.size).map.with_index(&:take).map do |e|
  e.join('_')
end.reject(&:empty?).reverse
My fave:
([str]*str.count('_')).map.with_index do |s, i|
  s[/\A([^_]+_){#{i + 1}}/][0...-1]
end.reverse
Ruby ships with a module for abbreviation, Abbrev (as of Ruby 3.4 it is distributed as a bundled gem rather than a default gem).
require "abbrev"
p ["a_b_c_d_e".tr("_","")].abbrev.keys[1..-1].map{|a| a.chars*"_"}
# => ["a_b_c_d", "a_b_c", "a_b", "a"]
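It works on an Array of words, just one in this case. Most of the work is removing and re-inserting the underscores, which is also why this trick only works when every segment between underscores is a single character.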

letter count in a string, Ruby

A silly question: I have a piece of code that counts letter appearances in a string, treating lowercase and uppercase letters as one. But it returns the hash keys in lowercase. How can I make the hash keys uppercase letters instead? And is there an easy way to put each key on its own line? Thanks in advance!
string.downcase.scan(/\w/).inject(Hash.new(0)) {|h, c| h[c] += 1;h}
Use upcase instead of downcase:
> string = "HellO hElLo"
=> "HellO hElLo"
> string.upcase.scan(/\w/).inject(Hash.new(0)) {|h, c| h[c] += 1;h}
=> {"H"=>2, "E"=>2, "L"=>4, "O"=>2}
Use upcase first if you want the letters in uppercase.
Use each_with_object instead of inject. inject uses the block's return value as the accumulator for the next iteration, so the block has to end by returning the hash (the ;h above). each_with_object passes the same object to every iteration and returns it when done.
string = "Hello hElLo"
hash = string.upcase.scan(/\w/).each_with_object(Hash.new(0)) do |char, counts|
  counts[char] += 1
end
puts hash
# => {"H"=>2, "E"=>2, "L"=>4, "O"=>2}
To output individual letters and their count on a line each, iterate the hash:
hash.each do |key, value|
  puts "#{key} => #{value}"
end
# H => 2
# E => 2
# L => 4
# O => 2
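Note that /\w/ also matches digits and the underscore, so a string like "a1_" would count "1" and "_" as well. A sketch, assuming only letters should be counted, uses the POSIX bracket class [[:alpha:]] and builds the one-pair-per-line output as a single string. The method names are hypothetical.

```ruby
# Count letters only, case-insensitively, with uppercase keys.
def upcase_letter_count(string)
  # [[:alpha:]] matches letters only (no digits, no underscore)
  string.upcase.scan(/[[:alpha:]]/).each_with_object(Hash.new(0)) do |char, counts|
    counts[char] += 1
  end
end

# One "KEY => count" pair per line.
def format_counts(counts)
  counts.map { |key, value| "#{key} => #{value}" }.join("\n")
end

puts format_counts(upcase_letter_count("Hello_2 hElLo"))
```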

frequency of a letter in a string

When trying to find the frequency of letters in 'fantastic', I am having trouble understanding the given solution:
def letter_count(str)
  counts = {}
  str.each_char do |char|
    next if char == " "
    counts[char] = 0 unless counts.include?(char)
    counts[char] += 1
  end
  counts
end
I tried deconstructing it and when I created the following piece of code I expected it to do the exact same thing. However it gives me a different result.
blah = {}
x = 'fantastic'
x.each_char do |char|
  next if char == " "
  blah[char] = 0
  unless
  blah.include?(char)
  blah[char] += 1
  end
  blah
end
The first piece of code gives me the following
puts letter_count('fantastic')
>
{"f"=>1, "a"=>2, "n"=>1, "t"=>2, "s"=>1, "i"=>1, "c"=>1}
Why does the second piece of code give me
puts blah
>
{"f"=>0, "a"=>0, "n"=>0, "t"=>0, "s"=>0, "i"=>0, "c"=>0}
Can someone break down the pieces of code and tell me what the underlying difference is. I think once I understand this I'll be able to really understand the first piece of code. Additionally if you want to explain a bit about the first piece of code to help me out that'd be great as well.
You can't split this line...
counts[char] = 0 unless counts.include?(char)
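... over multiple lines the way you did. The trailing (modifier) conditional only applies to the statement on its own line. In your version, Ruby reads blah[char] = 0 as a complete, unconditional statement, and the unless on the following lines becomes an ordinary unless / end block wrapping blah[char] += 1. Because the key was just set, blah.include?(char) is always true, so the increment never runs and every count stays at 0.
If you wanted to split it over multiple lines, you would have to convert it to the traditional if / end (in this case unless / end) format.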
unless counts.include?(char)
  counts[char] = 0
end
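To make the difference concrete, here is a small sketch contrasting the one-line modifier with how Ruby actually reads the multi-line version from the question (an unconditional assignment followed by an unless block around the increment). The method names modifier_form and split_form are made up for the comparison.

```ruby
# The original one-liner: the assignment only happens for unseen characters.
def modifier_form(str)
  counts = {}
  str.each_char do |char|
    counts[char] = 0 unless counts.include?(char)
    counts[char] += 1
  end
  counts
end

# How the question's multi-line version is parsed.
def split_form(str)
  counts = {}
  str.each_char do |char|
    counts[char] = 0 # runs every time, resetting the count
    unless counts.include?(char)
      counts[char] += 1 # never runs: the key was just set
    end
  end
  counts
end

p modifier_form('fantastic')
p split_form('fantastic')
```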
Here's the explanation of the code...
# we define a method letter_count that accepts one argument, str
def letter_count(str)
  # we create an empty hash
  counts = {}
  # we loop through all the characters in the string... we will refer to each character as char
  str.each_char do |char|
    # we skip space characters (we go on and process the next character)
    next if char == " "
    # if there is no hash entry for the current character, we initialise
    # the count for that character to zero
    counts[char] = 0 unless counts.include?(char)
    # we increase the count for the current character by 1
    counts[char] += 1
  # we end the each_char loop
  end
  # we make sure the hash of counts is returned at the end of this method
  counts
# end of the method
end
Now that @Steve has answered your question and you have accepted his answer, perhaps I can suggest another way to count the letters. This is just one of many approaches that could be taken.
Code
def letter_count(str)
  str.downcase.each_char.with_object({}) { |c,h|
    (h[c] = h.fetch(c,0) + 1) if c =~ /[a-z]/ }
end
Example
letter_count('Fantastic')
#=> {"f"=>1, "a"=>2, "n"=>1, "t"=>2, "s"=>1, "i"=>1, "c"=>1}
Explanation
Here is what's happening.
str = 'Fantastic'
We use String#downcase so that, for example, 'f' and 'F' are treated as the same character for purposes of counting. (If you don't want that, simply remove .downcase.) Let
s = str.downcase #=> "fantastic"
In
s.each_char.with_object({}) { |c,h| (h[c] = h.fetch(c,0) + 1) if c =~ /[a-z]/ }
the enumerator returned by String#each_char is chained to Enumerator#with_object. This creates a compound enumerator:
enum = s.each_char.with_object({})
#=> #<Enumerator: #<Enumerator: "fantastic":each_char>:with_object({})>
We can view what the enumerator will pass to the block by converting it to an array:
enum.to_a
#=> [["f", {}], ["a", {}], ["n", {}], ["t", {}], ["a", {}],
# ["s", {}], ["t", {}], ["i", {}], ["c", {}]]
(Actually, the hash is empty only when 'f' is passed; with_object passes the same hash object with every element, so later elements see it as updated by the earlier ones.) The hash created by with_object is referenced by the block variable h.
The first element enum passes to the block is the string 'f'. The block variable c is assigned that value, so the expression in the block:
(h[c] = h.fetch(c,0) + 1) if c =~ /[a-z]/
evaluates to:
(h['f'] = h.fetch('f',0) + 1) if 'f' =~ /[a-z]/
Now
c =~ /[a-z]/
returns the match position (here 0, which is truthy) if c is a lowercase letter, and nil otherwise. Here
'f' =~ /[a-z]/ #=> 0
so the condition holds and we evaluate the expression
h[c] = h.fetch(c,0) + 1
h.fetch(c,0) returns h[c] if h has a key c; else it returns the value of Hash#fetch's second parameter, which here is zero. (fetch can also take a block.)
Since h is now empty, it becomes
h['f'] = 0 + 1 #=> 1
The enumerator each_char then passes 'a', 'n' and 't' to the block, resulting in the hash becoming
h = {'f'=>1, 'a'=>1, 'n'=>1, 't'=>1 }
The next character passed in is a second 'a'. As h already has a key 'a',
h[c] = h.fetch(c,0) + 1
evaluates to
h['a'] = h['a'] + 1 # i.e. 1 + 1 #=> 2
The remainder of the string is processed the same way.
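The walkthrough relies on Hash#fetch with a default argument. A short sketch of the fetch behaviours involved: default argument, block, and KeyError when neither is given. Note that fetch never adds keys to the hash.

```ruby
h = { 'f' => 1 }

p h.fetch('f', 0)                 # key present: returns 1
p h.fetch('a', 0)                 # key missing: returns the default argument, 0
p h.fetch('a') { |key| key.size } # key missing: returns the block's value ('a'.size, i.e. 1)

begin
  h.fetch('a')                    # key missing and no default: raises KeyError
rescue KeyError => e
  puts "KeyError: #{e.message}"
end

p h                               # still only 'f': fetch does not insert keys
```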
