Unique frequency of occurrence - ruby

For a project for class, we are supposed to take a published paper and create an algorithm to create a list of all words in the unit of text while excluding the stop words. I am trying to produce a list of all unique words (in the entire text) along with their frequency of occurrence. This is the algorithm that I created for one line of the text:
x = l[125] #Selecting specific line in the text
p = Array.new() # Assign new array to variable p
p = x.split # Split the array
for i in (0...p.length)
if(p[i] != "the" and p[i] != "to" and p[i] != "union" and p[i] != "political")
print p[i] + " "
end
end
puts
The output of this program is one sentence (from line 125) excluding the stop words. Should I use bubble sort? How would I modify it to sort strings of equal length (or is that irrelevant)?

I'd say you have a good start, considering you are new to Ruby. You asked if you should use a bubble sort. I guess you're thinking of grouping multiple occurrences of a word, then going through the array to count them. That would work, but there are a couple of other approaches that are easier and more 'Ruby-like'. (By that I mean they make use of powerful features of the language and at the same time are more natural.)
Let's focus on counting the unique words in a single line. Once you can do that, you should be able to easily generalize that for multiple lines.
First Method: Use a Hash
The first approach is to use a hash. h = {} creates a new empty one. The hash's keys will be words and its values will be the number of times each word is present in the line. For example, if the word "cat" appears 9 times, we will have h["cat"] = 9, just what you need. To construct this hash, we check whether each word w in the line is already in the hash. It is in the hash if
h[w] != nil
If it is, we increment the word count:
h[w] = h[w] + 1
or just
h[w] += 1
If it's not in the hash, we add the word to the hash like this:
h[w] = 1
That means we can do this:
if h[w]
h[w] += 1
else
h[w] = 1
end
Note that here if h[w] is the same as if h[w] != nil.
Actually, we can use a trick to make this even easier. If we create the hash like this:
h = Hash.new(0)
then looking up a key that is not in the hash returns a default value of zero (instead of nil). That way we don't have to check if the word is already in the hash; we simply write
h[w] += 1
If w is not in the hash, h[w] returns the default 0, and += 1 then stores h[w] = 0 + 1, which adds w to the hash with a count of 1. Cool, eh?
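A small sketch of how the default value behaves, in case it helps (the variable names are only for illustration). Reading a missing key returns the default without storing anything; it is the += assignment that actually creates the entry.

```ruby
h = Hash.new(0)

missing = h["cat"]        # reading a missing key returns the default, 0
stored  = h.key?("cat")   # false: the read alone did not add the key

h["cat"] += 1             # same as h["cat"] = h["cat"] + 1, which stores 1
h["cat"] += 1             # now an ordinary increment

p missing    # 0
p stored     # false
p h["cat"]   # 2
```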
Let's put all this together. Suppose
line = "the quick brown fox jumped over the lazy brown fox"
We convert this string to an array with the String#split method:
arr = line.split
# => ["the", "quick", "brown", "fox", "jumped",
#     "over", "the", "lazy", "brown", "fox"]
then
h = Hash.new(0)
arr.each {|w| h[w] += 1}
h # => {"the"=>2, "quick"=>1, "brown"=>2, "fox"=>2, "jumped"=>1, "over"=>1, "lazy"=>1}
We're done!
Second Method: use the Enumerable#group_by method
Whenever you want to group elements of an array, hash or other collection, the group_by method should come to mind.
To apply group_by to the quick, brown fox array, we provide a block that contains the grouping criterion, which in this case is simply the words themselves. This produces a hash:
g = arr.group_by {|e| e}
# => {"the"=>["the", "the"], "quick"=>["quick"], "brown"=>["brown", "brown"],
#     "fox"=>["fox", "fox"], "jumped"=>["jumped"], "over"=>["over"], "lazy"=>["lazy"]}
The next thing to do is convert the hash values to the number of occurrences of the word (e.g., convert ["the", "the"] to 2). To do this, we can create a new empty hash h, and add hash pairs to it:
h = {}
g.each {|k,v| h[k] = v.size}
h # => {"the"=>2, "quick"=>1, "brown"=>2, "fox"=>2, "jumped"=>1, "over"=>1, "lazy"=>1}
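For what it's worth, the group-then-count steps can be collapsed into one chain. This is a minimal sketch that assumes Ruby 2.7 or newer for Array#tally (and 2.4 or newer for Hash#transform_values); counts_a and counts_b are just illustrative names.

```ruby
arr = "the quick brown fox jumped over the lazy brown fox".split

# group_by followed by taking the size of each group, in one chain
counts_a = arr.group_by { |w| w }.transform_values(&:size)

# Array#tally (Ruby 2.7+) does the grouping and counting in a single call
counts_b = arr.tally

p counts_a == counts_b   # true
p counts_b["fox"]        # 2
p counts_b["lazy"]       # 1
```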
One More Thing
You have this code snippet:
if(p[i] != "the" and p[i] != "to" and p[i] != "union" and p[i] != "political")
print p[i] + " "
end
Here are a couple of ways you could make this a little cleaner, both using the hash h above.
First Way
skip_words = %w[the to union political] # => ["the", "to", "union", "political"]
h.each {|k,v| (print k + ' ') unless skip_words.include?(k)}
Second Way
h.each do |k,v|
case k
when "the", "to", "union", "political"
next
else
puts "The word '#{k}' appears #{v} times."
end
end
Edit to address your comment. Try this:
p = "The quick brown fox jumped over the quick grey fox".split
freqs = Hash.new(0)
p.each {|w| freqs[w] += 1}
sorted_freqs = freqs.sort_by {|k,v| -v}
sorted_freqs.each {|word, freq| puts word+' '+freq.to_s}
=>
quick 2
fox 2
jumped 1
The 1
brown 1
over 1
the 1
grey 1
Normally, you would not sort a hash; rather you'd first convert it to an array:
sorted_freqs = freqs.to_a.sort_by {|k,v| v}.reverse
or
sorted_freqs = freqs.to_a.sort_by {|k,v| -v}
Now sorted_freqs is an array, rather than a hash. The last line stays the same. In general, it's best not to rely on a hash's order. In fact, before Ruby 1.9, hashes were not ordered. If order is important, use an array or convert the hash to an array.
Having said that, you can sort smallest-to-largest on the hash values, or (as I have done), sort largest-to-smallest on the negative of the hash values. Note that there is no Enumerable#reverse or Hash#reverse. Alternatively (always many ways to skin a cat with Ruby), you could sort on v and then use Enumerable#reverse_each:
sorted_freqs.reverse_each {|word, freq| puts word+' '+freq.to_s}
Lastly, you could eliminate the temporary variable sorted_freqs (needed because there is no Enumerable#sort_by! method), by chaining the last two statements:
freqs.sort_by {|k,v| -v}.each {|word, freq| puts word+' '+freq.to_s}
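Putting the pieces together for more than one line might look roughly like this. It is only a sketch: the sample lines are made up, downcasing (so "The" and "the" count as one word) is an assumption about what you want, and ties are broken alphabetically because sort_by is not guaranteed to be stable.

```ruby
# Hypothetical sample text; in the question this would be the lines of the paper.
lines = [
  "The union wanted to negotiate",
  "the political climate made the union negotiate slowly"
]
stop_words = %w[the to union political]

freqs = Hash.new(0)
lines.each do |line|
  line.downcase.split.each do |word|
    freqs[word] += 1 unless stop_words.include?(word)
  end
end

# Sort by descending count, then alphabetically so equal counts have a fixed order.
sorted = freqs.sort_by { |word, count| [-count, word] }
sorted.each { |word, count| puts "#{word} #{count}" }
# negotiate 2
# climate 1
# made 1
# slowly 1
# wanted 1
```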

You should really look into Ruby's Enumerable module. You very seldom write for x in y in Ruby.
word_list = ["the", "to", "union", "political"]
l[125].split.each do |word|
print word + " " unless word_list.include?(word)
end
In order to count, sort and all that stuff look into the group_by method and perhaps the sort_by method of arrays.

Related

Having trouble adding new elements to my hash (Ruby)

new to Ruby, new to coding in general...
I'm trying to add new elements into my hash, incrementing the value when necessary. So I used Hash.new(0) and I'm trying to add new values using the "+=" symbol, but when I do this I get an error message -
"/tmp/file.rb:6:in `+': String can't be coerced into Integer (TypeError)
from /tmp/file.rb:6:in `block in stockList'
from /tmp/file.rb:3:in `each'
from /tmp/file.rb:3:in `each_with_index'
from /tmp/file.rb:3:in `stockList'
from /tmp/file.rb:24:in `<main>'
"
Here's my code:
def stockList(stock, cat)
hash = Hash.new(0)
stock.each_with_index do |word, i|
if cat.include?(word[i])
char = word[i]
hash[char] += num(word)
end
end
new_arr = []
hash.each do |k, v|
new_arr.push(k,v)
end
return new_arr
end
def num(word)
nums = "1234567890"
word.each_char.with_index do |char, i|
if nums.include?(char)
return word[i..-1]
end
end
end
puts stockList(["ABAR 200", "CDXE 500", "BKWR 250", "BTSQ 890", "DRTY 600"], ["A", "B"])
Does anyone know why this is happening?
It's a codewars challenge -- I'm basically given two arrays and am meant to return a string that adds the numbers associated with the word that starts with the letter(s) listed in the second array.
For this input I'm meant to return " (A : 200) - (B : 1140) "
Your immediate problem is that num(word) returns a string, and a string can't be added to a number in the line hash[char] += num(word). You can convert the string representation of a numeric value using .to_i or .to_f, as appropriate for the problem.
For the overall problem I think you've added too much complexity. The structure of the problem is:
Create a storage object to tally up the results.
For each string containing a stock and its associated numeric value (price? quantity?), split the string into its two tokens.
If the first character of the stock name is one of the target values,
update the corresponding tally. This will require conversion from string to integer.
Return the final tallies.
One minor improvement is to use a Set for the target values. That reduces the work for checking inclusion from O(number of targets) to O(1). With only two targets, the improvement is negligible, but would be useful if the list of stocks and targets increase beyond small test-case problems.
I've done some renaming to hopefully make things clearer by being more descriptive. Without further ado, here it is in Ruby:
require 'set'
def get_tallies(stocks, prefixes)
targets = Set.new(prefixes) # to speed up .include? check below
tally = Hash.new(0)
stocks.each do |line|
name, amount = line.split(/ +/) # one or more spaces is token delimiter
tally[name[0]] += amount.to_i if targets.include?(name[0]) # note conversion to int
end
tally
end
stock_list = ["ABAR 200", "CDXE 500", "BKWR 250", "BTSQ 890", "DRTY 600"]
prefixes = ["A", "B"]
p get_tallies(stock_list, prefixes)
which prints
{"A"=>200, "B"=>1140}
but that can be formatted however you like.
The particular issue triggering this error is that your num(word) returns a String (the part of the word starting at its first digit, e.g. "200"), not an Integer.
But you actually don't need this function: this...
word.delete('^0-9').to_i
... gives you back the word with all non-digit characters stripped, cast to integer.
Note that without to_i you'll still receive the "String can't be coerced into Integer" error: Ruby is not as forgiving as JavaScript, and tries to protect you from results that might surprise you.
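A quick sketch of both points, using one of the inputs from the question (the variable names here are only illustrative):

```ruby
word = "ABAR 200"

digits = word.delete('^0-9')   # keeps only the characters 0-9
amount = digits.to_i           # explicit conversion to Integer

# Adding the String to an Integer raises instead of silently converting.
error_class = begin
  0 + digits
  nil
rescue TypeError => e
  e.class
end

p digits        # "200"
p amount        # 200
p error_class   # TypeError
```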
It's a codewars challenge -- I'm basically given two arrays and am
meant to return a string that adds the numbers associated with the
word that starts with the letter(s) listed in the second array.
For this input I'm meant to return " (A : 200) - (B : 1140) "
This is one way to get there:
def stockList(stock, cat)
hash = Hash.new(0)
stock.each do |word|
letter = word[0]
if cat.include?(letter)
hash[letter] += word.delete('^0-9').to_i
end
end
hash.map { |k, v| "#{k}: #{v}" }
end
Besides type casting, there's another difference here: always choosing the initial letter of the word. With your code...
stock.each_with_index do |word, i|
if cat.include?(word[i])
char = word[i]
... you actually took the 1st letter of the 1st ticker, the 2nd letter of the 2nd ticker and so on. Don't use indexes unless your results depend on them.
stock = ["ABAR 200", "CDXE 500", "BKWR 250", "BTSQ 890", "DRTY 600"]
cat = ["A", "B"]
I concur with your decision to create a hash h with the form of Hash::new that takes an argument (the "default value") which h[k] returns when h does not have a key k. As a first step we can write:
h = stock.each_with_object(Hash.new(0)) { |s,h| h[s[0]] += s[/\d+/].to_i }
#=> {"A"=>200, "C"=>500, "B"=>1140, "D"=>600}
Then Hash#slice can be used to extract the desired key-value pairs:
h = h.slice(*cat)
#=> {"A"=>200, "B"=>1140}
At this point you have all the information you need to display the result any way you like. For example,
" " << h.map { |k,v| "(#{k} : #{v})" }.join(" - ") << " "
#=> " (A : 200) - (B : 1140) "
If h before h.slice(*cat) is large relative to h.slice(*cat) you can reduce memory requirements and probably speed things somewhat by writing the following.
require 'set'
cat_set = cat.to_set
#=> #<Set: {"A", "B"}>
h = stock.each_with_object(Hash.new(0)) do |s,h|
h[s[0]] += s[/\d+/].to_i if cat_set.include?(s[0])
end
#=> {"A"=>200, "B"=>1140}

Check whether a string contains all the characters of another string in Ruby

Let's say I have a string, like string= "aasmflathesorcerersnstonedksaottersapldrrysaahf". If you haven't noticed, you can find the phrase "harry potter and the sorcerers stone" in there (minus the space).
I need to check whether string contains all the characters of another string, in any order.
string.include? ("sorcerer") #=> true
string.include? ("harrypotterandtheasorcerersstone") #=> false, even though it contains all the letters to spell harrypotterandthesorcerersstone
Include does not work on shuffled string.
How can I check if a string contains all the elements of another string?
Sets and array intersection don't account for repeated chars, but a histogram / frequency counter does:
require 'facets'
s1 = "aasmflathesorcerersnstonedksaottersapldrrysaahf"
s2 = "harrypotterandtheasorcerersstone"
freq1 = s1.chars.frequency
freq2 = s2.chars.frequency
freq2.all? { |char2, count2| freq1[char2] >= count2 }
#=> true
Write your own Array#frequency if you don't want the facets dependency.
class Array
def frequency
Hash.new(0).tap { |counts| each { |v| counts[v] += 1 } }
end
end
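If you'd rather not reopen Array at all, the same frequency comparison can be sketched with Array#tally. This assumes Ruby 2.7 or newer, and contains_all_chars? is a hypothetical helper name, not something from facets.

```ruby
# Hypothetical helper; assumes Ruby 2.7+ for Array#tally.
def contains_all_chars?(haystack, needle)
  available = haystack.chars.tally
  # tally returns a plain hash, so use fetch with a default of 0 for absent chars
  needle.chars.tally.all? { |char, needed| available.fetch(char, 0) >= needed }
end

s1 = "aasmflathesorcerersnstonedksaottersapldrrysaahf"

p contains_all_chars?(s1, "harrypotterandthesorcerersstone")   # true
p contains_all_chars?(s1, "yy")                                # false: s1 has only one "y"
```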
I presume that if the string to be checked is "sorcerer", string must include, for example, three "r"'s. If so you could use the method Array#difference, which I've proposed be added to the Ruby core.
class Array
def difference(other)
h = other.each_with_object(Hash.new(0)) { |e,h| h[e] += 1 }
reject { |e| h[e] > 0 && h[e] -= 1 }
end
end
str = "aasmflathesorcerersnstonedksaottersapldrrysaahf"
target = "sorcerer"
target.chars.difference(str.chars).empty?
#=> true
target = "harrypotterandtheasorcerersstone"
target.chars.difference(str.chars).empty?
#=> true
If the characters of target must not only be in str, but must be in the same order, we could write:
target = "sorcerer"
r = Regexp.new "#{ target.chars.join "\.*" }"
#=> /s.*o.*r.*c.*e.*r.*e.*r/
str =~ r
#=> 2 (truthy)
(or !!(str =~ r) #=> true)
target = "harrypotterandtheasorcerersstone"
r = Regexp.new "#{ target.chars.join "\.*" }"
#=> /h.*a.*r.*r.*y.* ... o.*n.*e/
str =~ r
#=> nil
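One caveat with building the pattern this way: if target can contain regex metacharacters (a literal "." for example), each character should be escaped first. A sketch under that assumption, with subsequence? as a hypothetical method name and String#match? requiring Ruby 2.4 or newer:

```ruby
# Hypothetical helper: true if the characters of target appear in str in order.
# Note that "." does not match newlines by default, so this assumes single-line input.
def subsequence?(str, target)
  pattern = target.chars.map { |c| Regexp.escape(c) }.join(".*")
  str.match?(Regexp.new(pattern))
end

str = "aasmflathesorcerersnstonedksaottersapldrrysaahf"

p subsequence?(str, "sorcerer")                           # true
p subsequence?(str, "harrypotterandtheasorcerersstone")   # false
p subsequence?("abc", "a.c")                              # false: no literal "." in "abc"
```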
A different albeit not necessarily better solution using sorted character arrays and sub-strings:
Given your two strings...
subject = "aasmflathesorcerersnstonedksaottersapldrrysaahf"
search = "harrypotterandthesorcerersstone"
You can sort your subject string using .chars.sort.join...
subject = subject.chars.sort.join # => "aaaaaaacddeeeeeffhhkllmnnoooprrrrrrssssssstttty"
And then produce a list of substrings to search for:
search = search.chars.group_by(&:itself).values.map(&:join)
# => ["hh", "aa", "rrrrrr", "y", "p", "ooo", "tttt", "eeeee", "nn", "d", "sss", "c"]
You could alternatively produce the same set of substrings using this method
search = search.chars.sort.join.scan(/((.)\2*)/).map(&:first)
And then simply check whether every search sub-string appears within the sorted subject string:
search.all? { |c| subject[c] }
Create a 2 dimensional array out of your string letter bank, to associate the count of letters to each letter.
Create a 2 dimensional array out of the harry potter string in the same way.
Loop through both and do comparisons.
I have no experience in Ruby but this is how I would start to tackle it in the language I know most, which is Java.

Find if all letters in a string are unique

I need to know if all letters in a string are unique. For a string to be unique, a letter can only appear once. If all letters in a string are distinct, the string is unique. If one letter appears multiple times, the string is not unique.
"Cwm fjord veg balks nth pyx quiz."
# => All 26 letters are used only once. This is unique
"This is a string"
# => Not unique, i and s are used more than once
"two"
# => unique, each letter is shown only once
I tried writing a function that determines whether or not a string is unique.
def unique_characters(string)
for i in ('a'..'z')
if string.count(i) > 1
puts "This string is unique"
else
puts "This string is not unique"
end
end
end
unique_characters("String")
I receive the output
"This string is unique" 26 times.
Edit:
I would like to humbly apologize for including an incorrect example in my OP. I did some research, trying to find pangrams, and assumed that they would only contain 26 letters. I would also like to thank you guys for pointing out my error. After that, I went on wikipedia to find a perfect pangram (I wrongly thought the others were perfect).
Here is the link for reference purposes
http://en.wikipedia.org/wiki/List_of_pangrams#Perfect_pangrams_in_English_.2826_letters.29
Once again, my apologies.
s = "The quick brown fox jumps over the lazy dog."
.downcase
("a".."z").all?{|c| s.count(c) <= 1}
# => false
Another way to do it is:
s = "The quick brown fox jumps over the lazy dog."
(s.downcase !~ /([a-z]).*\1/)
# => false
I would solve this in two steps: 1) extract the letters 2) check if there are duplicates:
letters = string.scan(/[a-z]/i) # use string.downcase.scan(/[a-z]/) instead to ignore case
letters.length == letters.uniq.length
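Wrapped in a method and checked against the strings from the question, that could look like the sketch below. The method name is made up, and it assumes case should be ignored.

```ruby
# Hypothetical method name; treats upper and lower case as the same letter.
def all_letters_unique?(string)
  letters = string.downcase.scan(/[a-z]/)     # 1) extract the letters
  letters.length == letters.uniq.length       # 2) check for duplicates
end

p all_letters_unique?("Cwm fjord veg balks nth pyx quiz.")   # true
p all_letters_unique?("This is a string")                    # false
p all_letters_unique?("two")                                 # true
p all_letters_unique?("Two tubs")                            # false
```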
Here is a method that does not convert the string to an array:
def dupless?(str)
str.downcase.each_char.with_object('') { |c,s|
c =~ /[a-z]/ && s.include?(c) ? (return false) : s << c }
true
end
dupless?("Cwm fjord veg balks nth pyx quiz.") #=> true
dupless?("This is a string.") #=> false
dupless?("two") #=> true
dupless?("Two tubs") #=> false
If you want to actually keep track of the duplicate characters:
def is_unique?(string)
# Remove whitespaces
string = string.gsub(/\s+/, "")
# Build a hash counting all occurrences of each character
h = Hash.new { |hash, key| hash[key] = 0 }
string.chars.each { |c| h[c] += 1 }
# An array containing all the repetitions
res = h.keep_if {|k, c| c > 1}.keys
if res.size == 0
puts "All #{string.size} characters are used only once. This is unique"
else
puts "Not unique #{res.join(', ')} are used more than once"
end
end
is_unique?("This is a string") # Not unique i, s are used more than once
is_unique?("two") # All 3 characters are used only once. This is unique
To check if a string is unique or not, you can try out this:
string_input.downcase.gsub(/[^a-z]/, '').split("").sort.join('') == ('a' .. 'z').to_a.join('')
This will return true, if all the characters in your string are unique and if they include all the 26 characters.
def has_uniq_letters?(str)
letters = str.gsub(/[^A-Za-z]/, '').chars
letters == letters.uniq
end
If this doesn't have to be case sensitive,
def has_uniq_letters?(str)
letters = str.downcase.gsub(/[^a-z]/, '').chars
letters == letters.uniq
end
In your example, you mentioned you wanted additional information about your string (list of unique characters, etc), so this example may also be useful to you.
# s = "Cwm fjord veg balks nth pyx quiz."
s = "This is a test string."
totals = Hash.new(0)
s.downcase.each_char { |c| totals[c] += 1 if ('a'..'z').cover?(c) }
duplicates, uniques = totals.partition { |k, v| v > 1 }
duplicates, uniques = Hash[duplicates], Hash[uniques]
# duplicates = {"t"=>4, "i"=>3, "s"=>4}
# uniques = {"h"=>1, "a"=>1, "e"=>1, "r"=>1, "n"=>1, "g"=>1}

Build an array of descending match counts?

I have a hash where the keys are book titles and the values are an array of words in the book.
I want to write a method where, if I enter a word, I can search through the hash to find which array has the highest frequency of the word. Then I want to return an array of the book titles in order of most matches.
The method should return an array in descending order of highest amount of occurrences of the searched word.
This is what I have so far:
def search(query)
books_names = @book_info.keys
book_info = {}
@result.each do |key,value|
contents = @result[key]
if contents.include?(query)
book_info[:key] += 1
end
end
end
If book_info is your hash and input_str is the string you want to search in book_info, the following will return you a hash in the order of frequency of input_str in the text:
Hash[book_info.sort_by{|k, v| v.count(input_str)}.reverse]
If you want output to be an array of book names, remove Hash and take out the first elements:
book_info.sort_by{|k, v| v.count(input_str)}.reverse.map(&:first)
This is a more compact version (but a little slower), replacing reverse with a negated sort key:
book_info.sort_by{|k, v| -v.count(input_str)}.map(&:first)
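As a rough sketch of how this could sit inside a search method: the sample data below is invented, and dropping books with zero matches is an assumption on my part, since the question doesn't say what should happen to them.

```ruby
# Hypothetical sample data: titles mapped to arrays of words.
book_info = {
  "Book A" => %w[whale sea ship whale],
  "Book B" => %w[war peace war war],
  "Book C" => %w[sea whale war]
}

def search(book_info, query)
  book_info
    .select  { |_title, words| words.include?(query) }   # assumption: skip books with no match
    .sort_by { |_title, words| -words.count(query) }     # most matches first
    .map(&:first)                                        # keep only the titles
end

p search(book_info, "war")     # ["Book B", "Book C"]
p search(book_info, "whale")   # ["Book A", "Book C"]
```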
You may want to consider creating a Book class. Here's a book class that will index the words into a word_count hash for quick sorting.
class Book
attr_accessor :title, :words
attr_reader :word_count
@books = []
class << self
attr_accessor :books
def top(word)
@books.sort_by{|b| b.word_count[word.downcase]}.reverse
end
end
def initialize
self.class.books << self
@word_count = Hash.new { |h,k| h[k] = 0}
end
def words=(str)
str.gsub(/[^\w\s]/,"").downcase.split.each do |word|
word_count[word] += 1
end
end
def to_s
title
end
end
Use it like so:
a = Book.new
a.title = "War and Peace"
a.words = "WELL, PRINCE, Genoa and Lucca are now no more than private estates of the Bonaparte family."
b = Book.new
b.title = "Moby Dick"
b.words = "Call me Ishmael. Some years ago - never mind how long precisely - having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world."
puts Book.top("ago")
result:
Moby Dick
War and Peace
Here is one way to build a hash whose keys are words and whose values are arrays of hashes with keys :title and :count, the hashes ordered by decreasing value of count.
Code
I am assuming we will start with a hash books, whose keys are titles and whose values are all the text in the book in one string.
def word_count_hash(books)
word_and_count_by_title = books.each_with_object({}) { |(title,words),h|
h[title] = words.scan(/\w+/)
.map(&:downcase)
.each_with_object({}) { |w,g| g[w] = (g[w] || 0)+1 } }
title_and_count_by_word = word_and_count_by_title
.each_with_object({}) { |(title,words),g| words.each { |w,count|
g.update({w =>[{title: title, count: count}]}){|_,oarr,narr|oarr+narr}}}
title_and_count_by_word.each_value { |arr| arr.sort_by! { |h| -h[:count] } }
title_and_count_by_word
end
Example
books = {}
books["Grapes of Wrath"] =
<<_
To the red country and part of the gray country of Oklahoma, the last rains
came gently, and they did not cut the scarred earth. The plows crossed and
recrossed the rivulet marks. The last rains lifted the corn quickly and
scattered weed colonies and grass along the sides of the roads so that the
gray country and the dark red country began to disappear under a green cover.
_
books["Tale of Two Cities"] =
<<_
It was the best of times, it was the worst of times, it was the age of wisdom,
it was the age of foolishness, it was the epoch of belief, it was the epoch of
incredulity, it was the season of Light, it was the season of Darkness, it was
the spring of hope, it was the winter of despair, we had everything before us,
we had nothing before us, we were all going direct to Heaven, we were all
going direct the other way
_
books["Moby Dick"] =
<<_
Call me Ishmael. Some years ago - never mind how long precisely - having little
or no money in my purse, and nothing particular to interest me on shore, I
thought I would sail about a little and see the watery part of the world. It is
a way I have of driving off the spleen and regulating the circulation. Whenever
I find myself growing grim about the mouth; whenever it is a damp, drizzly
November in my soul; whenever I find myself involuntarily pausing before coffin
warehouses
_
Construct the hash:
title_and_count_by_word = word_count_hash(books)
and then look up words:
title_and_count_by_word["the"]
#=> [{:title=>"Grapes of Wrath", :count=>12},
# {:title=>"Tale of Two Cities", :count=>11},
# {:title=>"Moby Dick", :count=>5}]
title_and_count_by_word["to"]
#=> [{:title=>"Grapes of Wrath", :count=>2},
# {:title=>"Tale of Two Cities", :count=>1},
# {:title=>"Moby Dick", :count=>1}]
Note the words being looked up must be entered in (or converted to) lower case.
Explanation
Construct the first hash:
word_and_count_by_title = books.each_with_object({}) { |(title,words),h|
h[title] = words.scan(/\w+/)
.map(&:downcase)
.each_with_object({}) { |w,g| g[w] = (g[w] || 0)+1 } }
#=> {"Grapes of Wrath"=>
# {"to"=>2, "the"=>12, "red"=>2, "country"=>4, "and"=>6, "part"=>1,
# ...
# "disappear"=>1, "under"=>1, "a"=>1, "green"=>1, "cover"=>1},
# "Tale of Two Cities"=>
# {"it"=>10, "was"=>10, "the"=>11, "best"=>1, "of"=>10, "times"=>2,
# ...
# "going"=>2, "direct"=>2, "to"=>1, "heaven"=>1, "other"=>1, "way"=>1},
# "Moby Dick"=>
# {"call"=>1, "me"=>2, "ishmael"=>1, "some"=>1, "years"=>1, "ago"=>1,
# ...
# "pausing"=>1, "before"=>1, "coffin"=>1, "warehouses"=>1}}
To see what's happening here, consider the first element of books that Enumerable#each_with_object passes into the block. The two block variables are assigned the following values:
title
#=> "Grapes of Wrath"
words
#=> "To the red country and part of the gray country of Oklahoma, the
# last rains came gently,\nand they did not cut the scarred earth.
# ...
# the dark red country began to disappear\nunder a green cover.\n"
each_with_object has created a hash represented by the block variable h, which is initially empty.
First construct an array of words and convert each to lower-case.
q = words.scan(/\w+/).map(&:downcase)
#=> ["to", "the", "red", "country", "and", "part", "of", "the", "gray",
# ...
# "began", "to", "disappear", "under", "a", "green", "cover"]
We may now create a hash that contains a count of each word for the title "Grapes of Wrath":
h[title] = q.each_with_object({}) { |w,g| g[w] = (g[w] || 0) + 1 }
#=> {"to"=>2, "the"=>12, "red"=>2, "country"=>4, "and"=>6, "part"=>1,
# ...
# "disappear"=>1, "under"=>1, "a"=>1, "green"=>1, "cover"=>1}
Note the expression
g[w] = (g[w] || 0) + 1
If the hash g already has a key for the word w, this expression is equivalent to
g[w] = g[w] + 1
On the other hand, if g does not have this key (word) (in which case g[w] => nil), then the expression is equivalent to
g[w] = 0 + 1
The same calculations are then performed for each of the other two books.
We can now construct the second hash.
title_and_count_by_word =
word_and_count_by_title.each_with_object({}) { |(title,words),g|
words.each { |w,count| g.update({ w => [{title: title, count: count}]}) \
{ |_, oarr, narr| oarr + narr } } }
#=> {"to" => [{:title=>"Grapes of Wrath", :count=>2},
# {:title=>"Tale of Two Cities", :count=>1},
# {:title=>"Moby Dick", :count=>1}],
# "the" => [{:title=>"Grapes of Wrath", :count=>12},
# {:title=>"Tale of Two Cities", :count=>11},
# {:title=>"Moby Dick", :count=>5}],
# ...
# "warehouses"=> [{:title=>"Moby Dick", :count=>1}]}
(Note that this operation does not order the hashes for each word by :count, even though that may appear to be the case in this output fragment. The hashes are sorted in the next and final step.)
The main operation here that requires explanation is Hash#update (aka Hash#merge!). We are building a hash denoted by the block variable g, which initially is empty. The keys of this hash are words, the values are arrays of hashes with keys :title and :count. Whenever the hash being merged has a key (word) that is already a key of g, the block
{ |_, oarr, narr| oarr + narr }
is called to determine the value for the key in the merged hash. The block variables here are the key (word) (which we have replaced with an underscore because it will not be used), the old array of hashes and the new array of hashes to be merged (which contains just one hash). We simply append the new hash to the existing array of hashes.
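To see the block form of Hash#update in isolation, here is a tiny sketch with made-up titles and counts. The block only runs when the key already exists in g; otherwise the new value is stored as-is.

```ruby
g = {}
combine = ->(_key, old_arr, new_arr) { old_arr + new_arr }

# "the" is new to g, so the block is not called and the array is stored directly.
g.update("the" => [{ title: "Book A", count: 12 }], &combine)
# "the" already exists, so the block concatenates the old and new arrays.
g.update("the" => [{ title: "Book B", count: 5 }], &combine)
# "sea" is new again.
g.update("sea" => [{ title: "Book B", count: 1 }], &combine)

p g.keys                              # ["the", "sea"]
p g["the"].map { |h| h[:title] }      # ["Book A", "Book B"]
p g["sea"].size                       # 1
```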
Lastly we sort the values of the hash (which are arrays of hashes) on decreasing value of :count.
title_and_count_by_word.each_value { |arr| arr.sort_by! { |h| -h[:count] } }
title_and_count_by_word
#=> {"to"=>
# [{:title=>"Grapes of Wrath", :count=>2},
# {:title=>"Tale of Two Cities", :count=>1},
# {:title=>"Moby Dick", :count=>1}],
# "the"=>
# [{:title=>"Grapes of Wrath", :count=>12},
# {:title=>"Tale of Two Cities", :count=>11},
# {:title=>"Moby Dick", :count=>5}],
# ...
# "warehouses"=>[{:title=>"Moby Dick", :count=>1}]}

How to make sure certain elements not get into arrays in Ruby

I have an array, let's say
array1 = ["abc", "a", "wxyz", "ab",......]
How do I make sure neither for example "a" (any 1 character), "ab" (any 2 characters), "abc" (any 3 characters), nor words like "that", "this", "what" etc nor any of the foul words are saved in array1?
This removes elements with less than 4 characters and words like this, that, what from array1 (if I got it right):
array1.reject! do |el|
el.length < 4 || ['this', 'that', 'what'].include?(el)
end
This changes array1 in place. If you use reject (without the !), it returns the filtered result and leaves array1 unchanged.
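A short sketch of the difference, with a made-up array. One thing worth knowing: reject! returns nil when it removes nothing, so it is safer not to chain calls onto its return value.

```ruby
stop_words = %w[this that what]
array1 = %w[abc a wxyz ab this hello what ruby]

# reject returns a new array and leaves array1 untouched
kept = array1.reject { |el| el.length < 4 || stop_words.include?(el) }
p kept           # ["wxyz", "hello", "ruby"]
p array1.size    # 8

# reject! filters array1 in place
array1.reject! { |el| el.length < 4 || stop_words.include?(el) }
p array1         # ["wxyz", "hello", "ruby"]

# a second pass removes nothing, so reject! returns nil
second_pass = array1.reject! { |el| el.length < 4 }
p second_pass    # nil
```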
You can open and add a new interface to the Array class which will disallow certain words. Example:
class Array
def add(ele)
unless rejects.include?(ele)
self.push ele
end
end
def rejects
['this', 'that', 'what']
end
end
arr = []
arr.add "one"
puts arr
arr.add "this"
puts arr
arr.add "aslam"
puts arr
Output would be:
one
one
one
aslam
And notice the word "this" was not added.
You could create a stop list. Using a hash for this would be more efficient than an array, as lookup time is constant with a hash. With an array the lookup time is proportional to the number of elements in the array. If you are going to check for stop words a lot, I suggest using a hash that contains all the stop words. Using your code, you could do the following:
badwords_a = ["abc", "a", "wxyz", "ab"] # Your array of bad words
badwords_h = {} # Initialize an empty hash
badwords_a.each{|word| badwords_h[word] = nil} # Fill the hash
goodwords = []
words_to_process = ["abc","a","Foo","Bar"] # a list of words you want to process
words_to_process.each do |word| # Process new words
goodwords << word unless badwords_h.key?(word) # Add the word if it did not match the bad list
end
puts goodwords.join(", ")
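The same idea can be sketched with a Set from the standard library, which gives the same constant-time lookup without the dummy nil values. The variable names here are illustrative, not from the code above.

```ruby
require 'set'

bad_words = Set.new(["abc", "a", "wxyz", "ab"])      # the stop list
words_to_process = ["abc", "a", "Foo", "Bar"]

# Keep only the words that are not in the stop list.
good_words = words_to_process.reject { |word| bad_words.include?(word) }

puts good_words.join(", ")   # Foo, Bar
```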
