How to get words frequency in efficient way with ruby?

Sample input:
"I was 09809 home -- Yes! yes! You was"
and output:
{ 'yes' => 2, 'was' => 2, 'i' => 1, 'home' => 1, 'you' => 1 }
My code that does not work:
def get_words_f(myStr)
myStr=myStr.downcase.scan(/\w/).to_s;
h = Hash.new(0)
myStr.split.each do |w|
h[w] += 1
end
return h.to_a;
end
print get_words_f('I was 09809 home -- Yes! yes! You was');

This works but I am kinda new to Ruby too. There might be a better solution.
def count_words(string)
words = string.split(' ')
frequency = Hash.new(0)
words.each { |word| frequency[word.downcase] += 1 }
return frequency
end
Instead of .split(' '), you could also use .scan(/\w+/). There is a trade-off: .scan(/\w+/) separates aren and t in "aren't", while .split(' ') keeps the word whole but also leaves punctuation attached, so "Yes!" is counted as "yes!".
Output of your example with .split(' '):
print count_words('I was 09809 home -- Yes! yes! You was');
#{"i"=>1, "was"=>2, "09809"=>1, "home"=>1, "--"=>1, "yes!"=>2, "you"=>1}
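To make the difference between the two tokenizers concrete, here is a small sketch on a made-up sentence (the sample text and variable names are assumptions, not from the question). A third pattern, /[\w']+/, keeps apostrophes inside words while still dropping other punctuation.

```ruby
text = "They aren't here, are they?"

# Whitespace only: punctuation stays attached to the words.
by_split = text.split(' ')
# Word characters only: the apostrophe breaks "aren't" in two.
by_word_chars = text.scan(/\w+/)
# Word characters plus apostrophes: contractions stay whole.
by_word_and_apostrophe = text.scan(/[\w']+/)

p by_split
p by_word_chars
p by_word_and_apostrophe
```

Which one is right depends on whether contractions or trailing punctuation matter more for your input.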

def count_words(string)
string.scan(/\w+/).reduce(Hash.new(0)){|res,w| res[w.downcase]+=1;res}
end
Second variant:
def count_words(string)
string.scan(/\w+/).each_with_object(Hash.new(0)){|w,h| h[w.downcase]+=1}
end
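On Ruby 2.7 or newer, Enumerable#tally can replace the manual Hash.new(0) bookkeeping. The sketch below is one possible way to get the output asked for in the question; the method name word_frequencies and the choice of pattern (letters with optional inner apostrophes, so "09809" is skipped) are assumptions of this example.

```ruby
# Hypothetical helper; requires Ruby 2.7+ for Enumerable#tally.
def word_frequencies(text)
  text.downcase
      .scan(/[a-z]+(?:'[a-z]+)*/)                 # letters only, so digits and "--" are ignored
      .tally                                      # word => number of occurrences
      .sort_by { |word, count| [-count, word] }   # highest count first, then alphabetical
      .to_h
end

p word_frequencies("I was 09809 home -- Yes! yes! You was")
```

Sorting by [-count, word] makes the order deterministic when several words share the same count.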

def count_words(string)
Hash[
string.scan(/[a-zA-Z]+/)
.group_by{|word| word.downcase}
.map{|word, words|[word, words.size]}
]
end
puts count_words 'I was 09809 home -- Yes! yes! You was'

This code will ask you for input and then find the word frequency for you:
puts "enter some text man"
text = gets.chomp
words = text.split(" ")
frequencies = Hash.new(0)
words.each { |word| frequencies[word.downcase] += 1 }
frequencies = frequencies.sort_by {|a, b| b}
frequencies.reverse!
frequencies.each do |word, frequency|
puts word + " " + frequency.to_s
end

This works, and ignores the numbers:
def get_words(my_str)
my_str = my_str.scan(/\w+/)
h = Hash.new(0)
my_str.each do |s|
s = s.downcase
if s !~ /^[0-9]*\.?[0-9]+$/
h[s] += 1
end
end
return h
end
print get_words('I was there 1000 !')
puts # prints a newline; '\n' in single quotes would print a literal backslash and n

You can look at my code that splits the text into words. The basic code would look as follows:
sentence = "Ala ma kota za 5zł i 10$."
splitter = SRX::Polish::WordSplitter.new(sentence)
histogram = Hash.new(0)
splitter.each do |word,type|
histogram[word.downcase] += 1 if type == :word
end
p histogram
Be careful if you want to work with languages other than English: in Ruby 1.9, downcase won't work as you expect for letters such as 'Ł'.
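For comparison, String#downcase applies Unicode case mapping from Ruby 2.4 onward, so on a newer interpreter a plain-Ruby sketch along these lines may be enough. The method name unicode_word_counts and the sample sentence are assumptions of this example, and it does not use the SRX splitter shown above.

```ruby
# Hypothetical helper; assumes Ruby 2.4+ and UTF-8 source/strings.
def unicode_word_counts(text)
  counts = Hash.new(0)
  # \p{L}+ matches runs of letters in any script, so digits and "$" are skipped.
  text.scan(/\p{L}+/) { |word| counts[word.downcase] += 1 }
  counts
end

p unicode_word_counts("Łódź łódź ma kota za 5zł")
```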

class String
def frequency
self.scan(/[a-zA-Z]+/).each.with_object(Hash.new(0)) do |word, hash|
hash[word.downcase] += 1
end
end
end
puts "I was 09809 home -- Yes! yes! You was".frequency

Related

How can I refactor my word frequency method?

This is my method word_frequency.
def frequencies(text)
words = text.split
the_frequencies = Hash.new(0)
words.each do |word|
the_frequencies[word] += 1
end
return the_frequencies
end
def most_common_words(file_name, stop_words_file_name, number_of_word)
# TODO: return hash of occurrences of the number_of_word most frequent words
opened_file_string = File.open(file_name.to_s).read.downcase.strip.split.join(" ").gsub(/[^a-zA-Z \'$]/, "").gsub(/'s/, "").split
opened_stop_file_string = File.open(stop_words_file_name.to_s).read.downcase.strip.split.join(" ").gsub(/[^a-zA-Z \']/, "").gsub(/'s/, "").split
# declare the file_name and stop-word variables
filtered_array = opened_file_string.reject { |n| opened_stop_file_string.include? n }
the_frequencies = Hash.new(0)
filtered_array.each do |word|
the_frequencies[word] += 1
end
store = the_frequencies.sort_by { |_key, value| value }.reverse[0..number_of_word - 1].to_h
store
end
It works well, but I think it could be better. RuboCop says my lines are too long, and I agree, but this is the best I could do. Can someone explain how to improve it?

It would help to decompose the big parts into smaller methods. most_common_words still looks fragile; if you explain what you're trying to do, we can see what else can be done there.
You could also reuse frequencies and, judging by the pattern in the method arguments, an OOP approach would fit better here.
def join_file(file_name)
File.open(file_name).read.downcase.strip.split.join(' ')
end
def frequencies(text)
text.split.each_with_object(Hash.new(0)) { |word, hash| hash[word] += 1 }
end
def opened_file_string(file_name)
join_file(file_name).gsub(/[^a-zA-Z \'$]/, '').gsub(/'s/, '').split
end
def opened_stop_file_string(file_name)
# memoized so the stop-word file is read only once
@opened_stop_file_string ||= join_file(file_name).gsub(/[^a-zA-Z \']/, '').gsub(/'s/, '').split
end
def in_stop_file_string?(file_name, word)
opened_stop_file_string(file_name).include?(word)
end
def filtered_array(file_name, stop_words_file_name)
opened_file_string(file_name).reject do |word|
in_stop_file_string?(stop_words_file_name, word)
end
end
def frequencies_in_filtered_array(file_name, stop_words_file_name)
# frequencies expects a string, so re-join the filtered words first
frequencies(filtered_array(file_name, stop_words_file_name).join(' ')).sort_by { |_, value| value }
end
def most_common_words(file_name, stop_words_file_name, number_of_word)
frequencies_in_filtered_array(file_name.to_s, stop_words_file_name.to_s).reverse[0...number_of_word].to_h
end
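The OOP approach mentioned above could look roughly like the sketch below. The class name WordFrequency and its method names are hypothetical, and it takes the text as strings (with a from_files convenience constructor) so it can be tried without real files. The normalization follows the question's own gsub chain.

```ruby
require 'set'

# Hypothetical class; names are illustrative, not from the question.
class WordFrequency
  def initialize(text, stop_words_text = '')
    @words = normalize(text)
    @stop_words = normalize(stop_words_text).to_set # Set gives fast include? checks
  end

  # Convenience constructor for the file-based usage in the question.
  def self.from_files(file_name, stop_words_file_name)
    new(File.read(file_name.to_s), File.read(stop_words_file_name.to_s))
  end

  # Returns a hash of the `limit` most frequent non-stop words.
  def most_common(limit)
    counts = Hash.new(0)
    @words.each { |word| counts[word] += 1 unless @stop_words.include?(word) }
    counts.sort_by { |word, count| [-count, word] }.first(limit).to_h
  end

  private

  # Lowercase, squeeze whitespace, drop punctuation and possessive 's.
  def normalize(text)
    text.downcase.split.join(' ')
        .gsub(/[^a-z ']/, '')
        .gsub(/'s/, '')
        .split
  end
end

stats = WordFrequency.new("The cat and the dog. The cat's hat!", "the and")
p stats.most_common(2)
```

Keeping the stop words in a Set avoids scanning an array for every word, and each line stays short enough for RuboCop's default line-length limit.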

This is a bit cleaner; it uses multiline method chaining, among other things.
def frequencies(text)
words = text.split
the_frequencies = Hash.new(0)
words.each do |word|
the_frequencies[word] += 1
end
the_frequencies
end
def pre_process_file(file_name)
File.open(file_name.to_s)
.read.downcase.strip.split.join(" ")
.gsub(/[^a-zA-Z \'$]/, "")
.gsub(/'s/, "")
.split
end
def most_common_words(file_name, stop_words_file_name, number_of_word)
# TODO: return hash of occurrences of the number_of_word most frequent words
opened_file_string = pre_process_file(file_name)
opened_stop_file_string = pre_process_file(stop_words_file_name)
# declare the file_name and stop-word variables
filtered_array = opened_file_string
.reject { |n| opened_stop_file_string.include? n }
the_frequencies = Hash.new(0)
filtered_array.each { |word| the_frequencies[word] += 1 }
the_frequencies
.sort_by { |_k, value| value }
.reverse[0..number_of_word - 1]
.to_h
end

Finding the most occurring character/letter in a string

Trying to get the most occurring letter in a string.
So far:
puts "give me a string"
words = gets.chomp.split
counts = Hash.new(0)
words.each do |word|
counts[word] += 1
end
Does not run further than asking for a string. What am I doing wrong?

If you're running this in irb, then the computer may think that the ruby code you're typing in is the text to analyse:
irb(main):001:0> puts "give me a string"
give me a string
=> nil
irb(main):002:0> words = gets.chomp.split
counts = Hash.new(0)
words.each do |word|
counts[word] += 1
end=> ["counts", "=", "Hash.new(0)"]
irb(main):003:0> words.each do |word|
irb(main):004:1* counts[word] += 1
irb(main):005:1> end
NameError: undefined local variable or method `counts' for main:Object
from (irb):4:in `block in irb_binding'
from (irb):3:in `each'
from (irb):3
from /Users/agrimm/.rbenv/versions/2.2.1/bin/irb:11:in `<main>'
irb(main):006:0>
If you wrap it in a block of some sort, you won't get that confusion:
begin
puts "give me a string"
words = gets.chomp.split
counts = Hash.new(0)
words.each do |word|
counts[word] += 1
end
counts
end
gives
irb(main):001:0> begin
irb(main):002:1* puts "give me a string"
irb(main):003:1> words = gets.chomp.split
irb(main):004:1> counts = Hash.new(0)
irb(main):005:1> words.each do |word|
irb(main):006:2* counts[word] += 1
irb(main):007:2> end
irb(main):008:1> counts
irb(main):009:1> end
give me a string
foo bar
=> {"foo"=>1, "bar"=>1}
Then you can work on the fact that split by itself isn't what you want. :)
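The point about split is that it produces words, while the question asks about letters. A minimal sketch of the difference, with a hypothetical helper most_common_letter that counts only the letters a to z:

```ruby
# Hypothetical helper: returns [letter, count] for the most frequent letter,
# or nil when the text contains no letters.
def most_common_letter(text)
  counts = Hash.new(0)
  text.downcase.each_char { |char| counts[char] += 1 if char =~ /[a-z]/ }
  counts.max_by { |_letter, count| count }
end

p "foo bar".split # splits into words
p "foo bar".chars # splits into characters, including the space
p most_common_letter("aa bbb cccc ddddd")
```

When several letters tie for the highest count, max_by returns only one of them.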

This should work:
puts "give me a string"
result = gets.chomp.split(//).reduce(Hash.new(0)) { |h, v| h.store(v, h[v] + 1); h }.max_by{|k,v| v}
puts result.to_s
Output:
#Alan ➜ test rvm:(ruby-2.2#europa) ruby test.rb
give me a string
aa bbb cccc ddddd
["d", 5]
Or in irb:
:008 > 'This is some random string'.split(//).reduce(Hash.new(0)) { |h, v| h.store(v, h[v] + 1); h }.max_by{|k,v| v}
=> ["s", 4]

Rather than getting a count word by word, you can process the whole string immediately.
str = gets.chomp
hash = Hash.new(0)
str.each_char do |c|
hash[c] += 1 unless c == " " #used to filter the space
end
After getting the number of letters, you can then find the letter with highest count with
max = hash.values.max
Then match it to the key in the hash and you're done :)
puts hash.select{ |key| hash[key] == max }
Or to simplify the above methods
hash.max_by{ |key,value| value }
The compact form of this is :
hash = Hash.new(0)
gets.chomp.each_char { |c| hash[c] += 1 unless c == " " }
puts hash.max_by{ |key,value| value }

This returns the highest occurring character within a given string:
puts "give me a string"
characters = gets.chomp.split("").reject { |c| c == " " }
counts = Hash.new(0)
characters.each { |character| counts[character] += 1 }
print counts.max_by { |k, v| v }

Using one Array to search a second array for frequency in ruby

I have an Array-1 say
arr1 =['s','a','sd','few','asdw','a','sdfeg']
And a second Array
arr2 = ['s','a','d','f','w']
I want to take arr1 and count how often each letter in arr2 occurs in it, giving a result like
[s=> 4, a=> 2, d => 3] and so on.
This is as far as I've muddled along. Nothing below works; these are just my thoughts on it:
hashy = Hash.new
print "give me a sentance "
sentance = gets.chomp.downcase.delete!(' ')
bing = sentance.split(//)
#how = sentance.gsub!(/[^a-z)]/, "") #Remove nil result
#chop = how.to_s.split(//).uniq
#hashy << bing.each{|e| how[e] }
#puts how.any? {|e| bing.count(e)}
#puts how, chop
bing.each {|v| hashy.store(v, hashy[v]+1 )}
puts bing
Thank you for your time.

I assumed that you want to count all letters in the sentence you put in, and not array 1. Assuming that, here's my take on it:
hashy = Hash.new()
['s','a','d','f','w'].each {|item| hashy[item.to_sym] = 0}
puts "give me a sentence"
sentence = gets.chomp.downcase.delete(' ') # delete! would return nil if there were no spaces to remove
sentence_array = []
sentence.each_char do |l|
sentence_array.push(l)
end
hashy.each do |key, value|
puts "this is key: #{key} and value #{hashy[key]}"
sentence_array.each do |letter|
puts "letter: #{letter}"
if letter.to_sym == key
puts "letter #{letter} equals key #{key}"
value = value + 1
hashy[key] = value
puts "value is now #{value}"
end
end
end
puts hashy
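If the counts are meant to come from arr1 itself rather than from typed input, a shorter sketch is possible. It assumes the goal is to count, for each letter in arr2, how many times that letter appears anywhere in the strings of arr1; the variable names letters and counts are illustrative.

```ruby
arr1 = ['s', 'a', 'sd', 'few', 'asdw', 'a', 'sdfeg']
arr2 = ['s', 'a', 'd', 'f', 'w']

# Flatten arr1 into one list of single characters.
letters = arr1.join.chars

# Build a hash with one entry per letter in arr2, in arr2's order.
counts = arr2.each_with_object({}) do |letter, result|
  result[letter] = letters.count(letter)
end

p counts
```

Letters that appear in arr1 but not in arr2 (here e and g) are simply left out of the result.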

Word Count returns an array (of arrays of the form [word, count]) representing the frequency of each word

str = 'put returns between paragraph put returns between paragraph put returns between paragraph'
def word_count(string)
resut= []
return result = string.split.inject(Hash.new(0)) { |h,v| h[v] += 1; h }
end
def parse_word(word)
word.gsub!(/[^a-zA-Z0-9]/, " ")
word.downcase!
#yoo= word
end
result =word_count(str)
print result, "\n\n"
res2 = result.select { |pair| pair[1] > 1 } # error occurs here
The output I am getting:
{"put"=>3, "returns"=>3, "between"=>3, "paragraph"=>3}
The output I need:
{"put"=>3, "returns"=>3, "between"=>3, "paragraph"=>3}
and
put: 3
returns: 3
between: 3
The main problem is that we were given the code below to do this, but I can't understand it. Can anyone explain what it does and modify it so that it works?
The following processes the first paragraph of put returns ... Note that ss is an array of those words that occur at least twice in this paragraph.
nect = ss.select { |p| p[1] > 1 }
nect .sort.each do |key, count|
puts "#{key}: #{count}"
end
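A likely cause of the error is that word_count returns a Hash, and Hash#select passes the key and the value to the block as two separate arguments. With a single |pair| parameter the block receives only the key (a String), so pair[1] is one character of the word and comparing it with 1 fails. A minimal sketch of a working version, keeping the question's str and method name but using illustrative variable names otherwise:

```ruby
str = 'put returns between paragraph put returns between paragraph put returns between paragraph'

# Counts each whitespace-separated word.
def word_count(string)
  string.split.each_with_object(Hash.new(0)) { |word, counts| counts[word] += 1 }
end

result = word_count(str)
print result, "\n\n"

# Take key and value as two block parameters when filtering a Hash.
repeated = result.select { |_word, count| count > 1 }

# Sorting a Hash yields [word, count] pairs ordered by word.
repeated.sort.each do |word, count|
  puts "#{word}: #{count}"
end
```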

module WordCount
def self.word_count(s)
count_frequency(words_from_string(s))
end
def self.word_count_from_file(filename)
s = File.open(filename) { |file| file.read }
word_count(s)
end
def self.words_from_string(s)
s.downcase.scan(/[\w']+/)
end
def self.count_frequency(words)
counts = Hash.new(0)
for word in words
counts[word] += 1
end
# counts.to_a.sort {|a,b| b[1] <=> a[1]}
# sort by decreasing count, then lexicographically
counts.to_a.sort do |a,b|
[b[1],a[0]] <=> [a[1],b[0]]
end
end
end
def word_count(s)
WordCount.word_count(s)
end

Ruby getting the longest word of a sentence

I'm trying to create a method named longest_word that takes a sentence as an argument and returns the longest word of the sentence.
My code is:
def longest_word(str)
words = str.split(' ')
longest_str = []
return longest_str.max
end

The shortest way is to use Enumerable's max_by:
def longest(string)
string.split(" ").max_by(&:length)
end

Using regexp will allow you to take into consideration punctuation marks.
s = "lorem ipsum, loremmm ipsummm? loremm ipsumm...."
first longest word:
s.split(/[^\w]+/).max_by(&:length)
# => "loremmm"
# or using scan
s.scan(/\b\w+\b/).max_by(&:length)
# => "loremmm"
Also you may be interested in getting all longest words:
s.scan(/\b\w+\b/).group_by(&:length).sort.last.last
# => ["loremmm", "ipsummm"]

It depends on how you want to split the string. If you are happy with splitting on a single space, then this works:
def longest(source)
arr = source.split(" ")
arr.sort! { |a, b| b.length <=> a.length }
arr[0]
end
Otherwise, use a regular expression to handle whitespace and punctuation.
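One way to do that is to scan for words instead of splitting on separators. The sketch below is only one possible pattern; the method name longest_word_ignoring_punctuation is hypothetical, and the regex treats a word as letters with optional inner apostrophes.

```ruby
# Hypothetical helper: returns the first longest word, or nil for input without letters.
def longest_word_ignoring_punctuation(sentence)
  # [[:alpha:]] matches letters only, so commas, dots and digits never join a word.
  sentence.scan(/[[:alpha:]]+(?:'[[:alpha:]]+)*/).max_by(&:length)
end

p longest_word_ignoring_punctuation("Hi, my name is Mesut. There is longestword here!")
```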

def longest_word(sentence)
longest_word = ""
words = sentence.split(" ")
words.each do |word|
longest_word = word unless word.length < longest_word.length
end
longest_word
end
That's a simple way to approach it. You could also strip the punctuation using a gsub method.

Functional style version
str.split(' ').reduce { |r, w| w.length > r.length ? w : r }
Another solution using max
str.split(' ').max { |a, b| a.length <=> b.length }

sort_by! and reverse!
def longest_word(sentence)
longw = sentence.split(" ")
longw.sort_by!(&:length).reverse!
p longw[0]
end
longest_word("once upon a time long ago a very longword")

If you truly want to do it in the Ruby way it would be:
def longest(sentence)
sentence.split(' ').sort! { |a, b| b.length <=> a.length }[0]
end

This strips the extra characters from the words before comparing lengths:
sen.gsub(/[^0-9a-z ]/i, '').split(" ").max_by(&:length)

Find Longest word in a string
sentence = "Hi, my name is Mesut. There is longestword here!"
def longest_word(string)
long = ""
string.split(" ").each do |sent|
if sent.length >= long.length
long = sent
end
end
return long
end
p longest_word(sentence)
