letter count in a string, Ruby - ruby

A silly question. I have a piece of code which counts letter appearances in a string, treating lowercase and uppercase letters as one. But it returns the hash keys in lowercase. How can I make the hash keys uppercase instead? Also, is there an easy way to print each key on its own line? Thank you in advance!
downcase.scan(/\w/).inject(Hash.new(0)) {|h, c| h[c] += 1;h}

Use upcase instead of downcase:
> string = "HellO hElLo"
=> "HellO hElLo"
> string.upcase.scan(/\w/).inject(Hash.new(0)) {|h, c| h[c] += 1;h}
=> {"H"=>2, "E"=>2, "L"=>4, "O"=>2}

Use upcase first if you want the letters in uppercase.
Use each_with_object instead of inject. inject passes the block's return value on to the next iteration, so you have to return the hash explicitly at the end of the block. each_with_object always returns the initial object.
string = "Hello hElLo"
hash = string.upcase.scan(/\w/).each_with_object(Hash.new(0)) do |char, hash|
  hash[char] += 1
end
puts hash
# => {"H"=>2, "E"=>2, "L"=>4, "O"=>2}
To output individual letters and their count on a line each, iterate the hash:
hash.each do |key, value|
  puts "#{key} => #{value}"
end
# H => 2
# E => 2
# L => 4
# O => 2
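As a hedged aside: on Ruby 2.7 or later, the counting step can be handed to Enumerable#tally, which builds the same kind of hash without inject or each_with_object. The sketch below assumes that Ruby version, and letter_counts is a made-up helper name. As in the answers above, \w also matches digits and underscores.

```ruby
# Minimal sketch, assuming Ruby 2.7+ (Enumerable#tally).
# letter_counts is a hypothetical helper name, not from the original answers.
def letter_counts(string)
  # upcase first so "h" and "H" land on the same key
  string.upcase.scan(/\w/).tally
end

counts = letter_counts("Hello hElLo")

# One "LETTER => count" pair per line
counts.each { |letter, count| puts "#{letter} => #{count}" }
```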

Related

Ruby merge duplicates in string

If I have a string like this
str =<<END
7312357006,1.121
3214058234,3456
7312357006,1234
1324958723,232.1
3214058234,43.2
3214173443,234.1
6134513494,23.2
7312357006,11.1
END
If a number in the first column shows up again, I want to add the corresponding second values together. So the final string would look like this:
7312357006,1246.221
3214058234,3499.2
1324958723,232.1
3214173443,234.1
6134513494,23.2
If the final output is an array that's fine too.
There are lots of ways to do this in Ruby. One particularly terse way is to use String#scan:
str = <<END
7312357006,1.121
3214058234,3456
7312357006,1234
1324958723,232.1
3214058234,43.2
3214173443,234.1
6134513494,23.2
7312357006,11.1
END
data = Hash.new(0)
str.scan(/(\d+),([\d.]+)/) {|k,v| data[k] += v.to_f }
p data
# => { "7312357006" => 1246.221,
# "3214058234" => 3499.2,
# "1324958723" => 232.1,
# "3214173443" => 234.1,
# "6134513494" => 23.2 }
This uses the regular expression /(\d+),([\d.]+)/ to extract the two values from each line. The block is called with each pair as arguments, which are then merged into the hash.
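To make the two calling styles of String#scan concrete, here is a small sketch on a two-line sample (the sample string and the variable names are chosen for illustration only). Without a block, scan returns the captured pairs; with a block, each pair is yielded in turn.

```ruby
# Without a block, String#scan returns one [key, value] array per match.
sample = "7312357006,1.121\n3214058234,3456\n"
pairs = sample.scan(/(\d+),([\d.]+)/)
p pairs

# With a block, each captured pair is yielded instead.
# Hash.new(0) supplies 0 for keys that have not been seen yet.
totals = Hash.new(0)
sample.scan(/(\d+),([\d.]+)/) { |key, value| totals[key] += value.to_f }
p totals
```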
This could also be written as a single expression using each_with_object:
data = str.scan(/(\d+),([\d.]+)/)
.each_with_object(Hash.new(0)) {|(k,v), hsh| hsh[k] += v.to_f }
# => (same as above)
There are likewise many ways to print the result, but here are a couple I like:
puts data.map {|kv| kv.join(",") }.join("\n")
# => 7312357006,1246.221
# 3214058234,3499.2
# 1324958723,232.1
# 3214173443,234.1
# 6134513494,23.2
# or:
puts data.map {|k,v| "#{k},#{v}\n" }.join
# => (same as above)
You can see all of these in action on repl.it.
Edit: Although I don't recommend either of these for the sake of readability, here's more just for kicks (requires Ruby 2.4+):
data = str.lines.group_by {|s| s.slice!(/(\d+),/); $1 }
.transform_values {|a| a.sum(&:to_f) }
...or, going straight to a string:
puts str.lines.group_by {|s| s.slice!(/(\d+),/); $1 }
.map {|k,vs| "#{k},#{vs.sum(&:to_f)}\n" }.join
Since repl.it is stuck on Ruby 2.3: Try it online!
You could achieve this using each_with_object, as below:
str = "7312357006,1.121
3214058234,3456
7312357006,1234
1324958723,232.1
3214058234,43.2
3214173443,234.1
6134513494,23.2
7312357006,11.1"
# convert the string into nested pairs of floats
# to briefly summarise the steps: split entries by newline, strip whitespace, split by comma, convert to floats
arr = str.split("\n").map(&:strip).map { |el| el.split(",").map(&:to_f) }
result = arr.each_with_object(Hash.new(0)) do |el, hash|
  hash[el.first] += el.last
end
# => {7312357006.0=>1246.221, 3214058234.0=>3499.2, 1324958723.0=>232.1, 3214173443.0=>234.1, 6134513494.0=>23.2}
# You can then call `to_a` on result if you want:
result.to_a
# => [[7312357006.0, 1246.221], [3214058234.0, 3499.2], [1324958723.0, 232.1], [3214173443.0, 234.1], [6134513494.0, 23.2]]
each_with_object iterates through each pair of data, giving the block access to an accumulator (in this case the hash). By following this approach, we can add each entry to the hash, and add together the totals if a key appears more than once.
Hope that helps - let me know if you've any questions.
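One side effect of converting both columns with to_f is that the identifiers become floats (7312357006.0). A possible variation, sketched here under the assumption that the first column should stay a string, converts only the second column. sum_by_key is an illustrative name, and the shortened input is made up from the question's data.

```ruby
# Sketch: keep the first column as a String key, sum the second column as Float.
# sum_by_key is a hypothetical helper name.
def sum_by_key(str)
  str.each_line.with_object(Hash.new(0)) do |line, totals|
    key, value = line.strip.split(",")
    next if key.nil? || value.nil? # skip blank or malformed lines
    totals[key] += value.to_f
  end
end

str = "7312357006,1.121\n3214058234,3456\n7312357006,1234\n3214058234,43.2\n"
result = sum_by_key(str)

# Rounding only for display, since Float sums can carry tiny rounding errors
result.each { |key, total| puts "#{key},#{total.round(3)}" }
```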
def combine(str)
  str.each_line.with_object(Hash.new(0)) do |s,h|
    k,v = s.split(',')
    h.update(k=>v.to_f) { |k,o,n| o+n }
  end.reduce('') { |s,kv_pair| s << "%s,%g\n" % kv_pair }
end
puts combine str
7312357006,1246.22
3214058234,3499.2
1324958723,232.1
3214173443,234.1
6134513494,23.2
Notes:
using String#each_line is preferable to str.split("\n") as the former returns an enumerator whereas the latter returns a temporary array. Each element generated by the enumerator is a line of str that (unlike the elements of str.split("\n")) ends with a newline character, but that is of no concern.
see Hash::new, specifically when a default value (here 0) is used. If a hash has been defined h = Hash.new(0) and h does not have a key k, h[k] returns the default value, zero (h is not changed). When Ruby encounters the expression h[k] += 1, the first thing she does is expand it to h[k] = h[k] + 1. If h has been defined with a default value of zero, and h does not have a key k, h[k] on the right of the equality (syntactic sugar [1] for h.[](k)) returns zero.
see Hash#update (aka merge!). h.update(k=>v.to_f) is syntactic sugar for h.update({ k=>v.to_f }).
see Kernel#sprintf for explanations of the formatting directives %s and %g.
the receiver for the expression reduce('') { |s,kv_pair| s << "%s,%g\n" % kv_pair } (in the penultimate line) is the following hash:
{"7312357006"=>1246.221, "3214058234"=>3499.2, "1324958723"=>232.1,
"3214173443"=>234.1, "6134513494"=>23.2}
[1] Syntactic sugar is a shortcut allowed by Ruby.
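The notes above can be checked with a short sketch. The keys "k" and "j" and the values are arbitrary, chosen only to show the default value, the block form of Hash#update, and the %g directive.

```ruby
h = Hash.new(0)

missing = h["k"]         # returns the default, 0
still_empty = h.empty?   # true: reading a missing key does not store it

h["k"] += 1              # same as h["k"] = h["k"] + 1, so "k" is now stored with 1

# Hash#update (alias merge!) calls the block only for keys present in both hashes.
h.update("k" => 2.5) { |_key, old, new| old + new }
h.update("j" => 4.0) { |_key, old, new| old + new } # "j" is new, so the block is not called

# %g keeps six significant digits by default, which is why 1246.221 prints as 1246.22
line = "%s,%g\n" % ["k", 1246.221]

p h
print line
```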
I implemented this solution because using a hash was giving me issues:
d = []
s.split("\n").each do |line|
  x = 0
  q = 0
  dup = false
  line.split(",").each do |data|
    if x == 0 and d.include? data then dup = true ; q = d.index(data) elsif x == 0 then d << data end
    if x == 1 and dup == false then d << data end
    if x == 1 and dup == true then d[q+1] = "#{'%.2f' % (d[q+1].to_f + data.to_f).to_s}" end
    if x == 2 and dup == false then d << data end
    x += 1
  end
end
# d is a flat array of alternating keys and totals; print them pairwise
s = ""
d.each_slice(2) do |key, val|
  s << "#{key},#{val}\n"
end
puts(s)

`Hash#[]` gives TypeError after ordering the hash

With the following code:
text = "hello dog hello"
words = text.split(" ")
frequencies = Hash.new(0)
words.each { |word| frequencies[word] += 1 }
I get:
frequencies # => {"hello" => 2, "dog" => 1}
frequencies["test"] # => 0
If I add the following two lines after the first code above:
frequencies = frequencies.sort_by {|a, b| b }
frequencies.reverse!
then do frequencies["test"], I get this error:
in `[]': no implicit conversion of String into Integer (TypeError)
I guess something's happening to frequencies, but I can't understand what. I also tried puts frequencies["test"].to_s without any luck. How can I make my program print 0? "test" does not exist as a key after printing the ordered hash.
Enumerable#sort_by returns an Array. Arrays are indexed by Integers, but you index it using a String on line 9.
You need to convert the Array back to a Hash, for example using the Array#to_h method:
frequencies = frequencies.sort_by(&:last).reverse.to_h
Note: this has nothing to do with printing. The error message clearly tells you that the error is on line 9, after printing and it tells you that the error is in your call to the [] method.
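A small sketch of what happens to the type, using the data from the question (the variable names sorted, error and as_hash are illustrative). It also shows that the Hash produced by to_h is a plain one, so a lookup of a missing key yields nil unless a fallback is given.

```ruby
frequencies = Hash.new(0)
"hello dog hello".split.each { |word| frequencies[word] += 1 }

# sort_by on a Hash yields an Array of [key, value] pairs
sorted = frequencies.sort_by { |_word, count| -count }
p sorted.class
p sorted

# Indexing the Array with a String reproduces the error from the question.
error = begin
  sorted["test"]
rescue TypeError => e
  e
end
p error.class

# to_h gives a Hash again, but without the default of 0.
as_hash = sorted.to_h
p as_hash["test"]          # nil
p as_hash.fetch("test", 0) # 0, using an explicit fallback
```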
This happens because you call .sort_by, which returns an array of arrays.
The following code converts it back to a hash and sets the default value back to 0:
text = "hello dog hello"
words = text.split(" ")
frequencies = words.each_with_object(Hash.new(0)) { |word, o| o[word] += 1 }
frequencies = frequencies.sort_by{ |_, v| v }.reverse.to_h
frequencies.default = 0
p frequencies #=> {"hello"=>2, "dog"=>1}
p frequencies["test"] #=> 0

Why do two identical strings have the same object_id in Ruby?

As you may know, in Ruby two identical strings do not have the same object_id, while two identical symbols do. For instance:
irb(main):001:0> :george.object_id == :george.object_id
=> true
irb(main):002:0> "george".object_id == "george".object_id
=> false
However, my code below shows two strings with the same value "one" having the same object_id.
class MyArray < Array
  def ==(x)
    comparison = Array.new()
    x.each_with_index{|item, i| comparison.push(item.object_id.equal?(self[i].object_id))}
    if comparison.include?(false) then
      false
    else
      true
    end
  end
end
class MyHash < Hash
  def ==(x)
    y = Hash[self.sort]
    puts y.class
    puts y
    x = Hash[x.sort]
    puts x.class
    puts x
    puts "______"
    xkeys = MyArray.new(x.keys)
    puts xkeys.class
    puts xkeys.to_s
    puts xkeys.object_id
    puts xkeys[0].class
    puts xkeys[0]
    puts xkeys[0].object_id
    puts "______"
    xvals = MyArray.new(x.values)
    puts "______"
    selfkeys = MyArray.new(y.keys)
    puts selfkeys.class
    puts selfkeys.to_s
    puts selfkeys.object_id
    puts selfkeys[0].class
    puts selfkeys[0]
    puts selfkeys[0].object_id
    puts "______"
    selfvals = MyArray.new(y.values)
    puts xkeys.==(selfkeys)
    puts xvals.==(selfvals)
  end
end
a1 = MyHash[{"one" => 1, "two" => 2}]
b1 = MyHash[{"one" => 1, "two" => 2}]
puts a1.==(b1)
And I get:
Hash
{"one"=>1, "two"=>2}
Hash
{"one"=>1, "two"=>2}
______
MyArray
["one", "two"]
21638020
String
one
21641920
______
______
MyArray
["one", "two"]
21637580
String
one
21641920
______
true
true
As you can see from the result, two String objects with the same value "one" have the same object_id 21641920, while they are supposed to have different IDs. Can anyone give me some hints or tell me how I can get different IDs in this case?
Best Regards.
When a String object is used as a key in a Hash, the hash will duplicate and freeze the string internally and will use that copy as its key.
Reference: Hash#store.
As of Ruby 2.2, strings used as keys in hash literals are frozen and de-duplicated: the same string will be reused.
This is a performance optimisation: not allocating many copies of the same string means there are fewer objects to allocate and fewer to garbage collect.
Another way to see frozen string literals in action:
"foo".freeze.object_id == "foo".freeze.object_id
This returns true in Ruby 2.1 and later.
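The copy-and-freeze behaviour for String keys can be observed directly. This is a minimal sketch with illustrative variable names; String.new is used so that the key starts out unfrozen regardless of any frozen_string_literal setting.

```ruby
key = String.new("one") # String.new returns an unfrozen string
h = {}
h[key] = 1

stored = h.keys.first
p key.frozen?        # false: the caller's string is left alone
p stored.frozen?     # true: the hash keeps a frozen copy
p stored.equal?(key) # false: different objects with equal content
p stored == key      # true
```

Because the hash holds its own frozen copy, the object_id of a key read back from the hash says nothing about the string that was originally passed in.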

Counting frequency of symbols

So I have the following code which counts the frequency of each letter in a string (or in this specific instance from a file):
def letter_frequency(file)
  letters = 'a' .. 'z'
  File.read(file) .
    split(//) .
    group_by {|letter| letter.downcase} .
    select {|key, val| letters.include? key} .
    collect {|key, val| [key, val.length]}
end
letter_frequency(ARGV[0]).sort_by {|key, val| -val}.each {|pair| p pair}
Which works great, but I would like to know whether there is some way in Ruby to do something similar that catches all the other possible symbols, i.e. spaces, commas, periods, and everything in between. To put it more simply, is there something similar to 'a' .. 'z' that holds all the symbols? Hope that makes sense.
You won't need a range when you're trying to count every possible character, because the set of all possible characters is the whole domain. You should only create a range when you specifically need a subset of that domain.
This is probably a faster implementation that counts all characters in the file:
def char_frequency(file_name)
  ret_val = Hash.new(0)
  File.open(file_name) {|file| file.each_char {|char| ret_val[char] += 1 } }
  ret_val
end
p char_frequency("1003v-mm") #=> {"\r"=>56, "\n"=>56, " "=>2516, "\xC9"=>2, ...
For reference I used this test file.
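If only the punctuation, whitespace and other non-alphanumeric characters are of interest, the same counting loop can filter with a POSIX bracket expression instead of a range. This is a sketch on an in-memory string rather than a file; symbol_frequency is a hypothetical name, and String#match? assumes Ruby 2.4 or later.

```ruby
# Sketch: count only characters that are not letters or digits.
# symbol_frequency is a hypothetical helper name.
def symbol_frequency(text)
  text.each_char.with_object(Hash.new(0)) do |char, counts|
    counts[char] += 1 unless char.match?(/[[:alnum:]]/)
  end
end

counts = symbol_frequency("Hi, there. How are you, Bob?")

# Most frequent symbol first, one [char, count] pair per line
counts.sort_by { |_char, count| -count }.each { |pair| p pair }
```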
It may not use much Ruby magic with Ranges but a simple way is to build a character counter that iterates over each character in a string and counts the totals:
class CharacterCounter
  def initialize(text)
    @characters = text.split("")
  end
  def character_frequency
    character_counter = {}
    @characters.each do |char|
      character_counter[char] ||= 0
      character_counter[char] += 1
    end
    character_counter
  end
  def unique_characters
    character_frequency.map {|key, value| key}
  end
  def frequency_of(character)
    character_frequency[character] || 0
  end
end
counter = CharacterCounter.new("this is a test")
counter.character_frequency # => {"t"=>3, "h"=>1, "i"=>2, "s"=>3, " "=>3, "a"=>1, "e"=>1}
counter.unique_characters # => ["t", "h", "i", "s", " ", "a", "e"]
counter.frequency_of 't' # => 3
counter.frequency_of 'z' # => 0

Display subset of array

Say I want to puts the alphabet. So I can do something like:
alphabet = ('a'..'z')
alphabet.map do |a|
puts a
end
What I want to do now is exclude the vowels.
alphabet = ('a'..'z')
vowels = ['a','e','i','o','u']
alphabet.map do |a|
puts a unless a == vowels
end
I am trying to avoid this:
alphabet = ('a'..'z')
alphabet.map do |a|
puts a unless a == 'a'
puts a unless a == 'e'
puts a unless a == 'i'
puts a unless a == 'o'
puts a unless a == 'u'
end
How do I syntactically implement the second example so that it works properly?
A Range can be expanded into an Array. Then you can subtract another array.
chars = ('a'..'z').to_a - %w( a e i o u )
chars.each do |a|
  puts a
end
As a side note, don't use #map unless you really need to. Use #each if you don't care about the return value.
You don't want equality, you want inclusion:
puts a unless vowels.include? a
Also, you're using map (same as collect) which will actually return the results of the puts statements. Unless you actually need that, use each. Or find the letters that match the condition and use that collection to print the results later.
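The last suggestion, collecting the matching letters first and printing them later, might look like the following sketch (consonants is an illustrative variable name):

```ruby
alphabet = ('a'..'z')
vowels = %w[a e i o u]

# Build the collection first; the Range itself is left untouched.
consonants = alphabet.reject { |letter| vowels.include?(letter) }

# Print later, one letter per line
puts consonants
```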
Use the Array#include? method:
puts a unless vowels.include? a
Source: http://rubydoc.info/stdlib/core/1.9.2/Array#include%3F-instance_method
You can even get rid of the loop. This preserves the original alphabet.
alphabet = ('a'..'z')
puts (alphabet.to_a - %w(a e i o u)).join("\n")
Enumerable#grep would work, too:
('a'..'z').grep(/[^aeiou]/) { |a| puts a }
Or simply
puts ('a'..'z').grep(/[^aeiou]/)
