ruby searching array for keywords - ruby

I am parsing a large CSV file in a Ruby script and need to find the closest match for a title from some search keys. There may be one or more search keys, and they may not match the title exactly (they only need to be close), as in the example below:
search_keys = ["big", "bear"]
I have a large array of data to search through, but I only want to search the title column:
array = [
["id", "title", "code", "description"],
["1", "once upon a time", "3241", "a classic story"],
["2", "a big bad wolf", "4235", "a little scary"],
["3", "three big bears", "2626", "a heart warmer"]
]
In this case I would want it to return the row ["3", "three big bears", "2626", "a heart warmer"], as this is the closest match to my search keys.
Are there any helpers, libraries, or gems I can use? Has anyone done this before?

My concern is that this task would be better handled by a search engine at the database level or similar; fetching the data into the app and searching across columns and rows there is likely to be expensive. For now, though, here is a plain, simple approach:
array = [
["id", "title", "code", "description"],
["1", "once upon a time", "3241", "a classic story"],
["2", "a big bad wolf", "4235", "a little scary"],
["3", "three big bears", "2626", "a heart warmer"]
]
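h = {}
search_keys = ["big", "bear"]
array[1..-1].each do |rec|
  rec_id = rec[0].to_i
  search_keys.each do |key|
    # count one hit per search key found in the title column
    h[rec_id] = h.fetch(rec_id, 0) + 1 if rec[1].include?(key)
  end
end
# pick the id with the highest hit count (nil if nothing matched)
closest = h.keys.first
h.each do |rec_id, count|
  closest = rec_id if h[closest] < count
end
# look the row up by id rather than relying on the id matching the array index
array.find { |rec| rec[0].to_i == closest } # => ["3", "three big bears", "2626", "a heart warmer"]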

I think you can do it yourself; there is no need for a gem.
This may be close to what you need: search the array for the keys and assign a rank to each matching row.
result = []
array.each do |ar|
rank = 0
search_keys.each do |key|
if ar[1].include?(key)
rank += 1
end
end
if rank > 0
result << [rank, ar]
end
end
This code could be written more compactly, but I wanted to show you the details.

This works. It finds and returns an array of matched* rows as result.
*matched rows = rows where the id, title, code or description matches ANY of the provided search_keys, including partial matches such as 'bear' in 'bears'.
result = []
array.each do |a|
a.each do |i|
search_keys.each do |k|
result << a if i.include?(k)
end
end
end
result.uniq!

You could probably write it in a more succinct way...
array = [
["id", "title", "code", "description"],
["1", "once upon a time", "3241", "a classic story"],
["2", "a big bad wolf", "4235", "a little scary"],
["3", "three big bears", "2626", "a heart warmer"]
]
search_keys = ["big", "bear"]
def sift(records, target_field, search_keys)
# find target_field index
target_field_index = nil
records.first.each_with_index do |e, i|
if e == target_field
target_field_index = i
break
end
end
if target_field_index.nil?
raise "Target field was not found"
end
# sums up which records have a match and how many keys they match
# key => val = record => number of keys matched
counter = Hash.new(0) # each new hash key is init'd with value of 0
records.each do |record| # look at all our given records
search_keys.each do |key| # check each search key on the field
if record[target_field_index].include?(key)
counter[record] += 1 # found a key, init to 0 if required and increment count
end
end
end
# find the result with the most search key matches
top_result = counter.to_a.reduce do |top, record|
if record[1] > top[1] # [0] = record, [1] = key hit count
top = record # set to new top
end
top # continue with reduce
end.first # only care about the record (not the key hit count)
end
puts "Top result: #{sift array, 'title', search_keys}"
# => Top result: ["3", "three big bears", "2626", "a heart warmer"]
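As a sketch of what a more succinct version might look like: Array#index can locate the column and Enumerable#max_by can pick the row with the most key hits. The name sift_compact is made up for this example. Unlike the version above, it skips the header row and returns nil when no key matches, which is an assumption about the desired behaviour.

```ruby
# Hypothetical compact variant of sift; the method name is an assumption.
def sift_compact(records, target_field, search_keys)
  header, *rows = records
  col = header.index(target_field)
  raise ArgumentError, "Target field was not found" if col.nil?

  # Number of search keys that occur as substrings of the target column.
  hits = ->(row) { search_keys.count { |key| row[col].include?(key) } }

  best = rows.max_by { |row| hits.call(row) }
  best if best && hits.call(best) > 0
end

array = [
  ["id", "title", "code", "description"],
  ["1", "once upon a time", "3241", "a classic story"],
  ["2", "a big bad wolf", "4235", "a little scary"],
  ["3", "three big bears", "2626", "a heart warmer"]
]
search_keys = ["big", "bear"]

p sift_compact(array, "title", search_keys)
# prints ["3", "three big bears", "2626", "a heart warmer"]
```

Ties are resolved by max_by, which returns one of the top rows; if all tied rows are needed, grouping by hit count would be the next step.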

Here is my one-line shot
p array.find_all {|a|a.join.scan(/#{search_keys.join("|")}/).length==search_keys.length}
=>[["3", "three big bears", "2626", "a heart warmer"]]
To get all the rows in order of number of matches:
p array.drop(1).sort_by {|a|a.join.scan(/#{search_keys.join("|")}/).length}.reverse
Does anyone know how to adapt the last solution so that rows containing none of the keys are dropped, while keeping it as concise as it is?
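One possible way to do that, offered as a sketch: compute the match count once per row, reject the rows whose count is zero, then sort by descending count. The method name ranked_rows is made up for this example. Two choices here differ from the one-liners above and are assumptions: Regexp.union is used so that keys containing regex metacharacters are escaped, and the columns are joined with a space so a match cannot span two columns.

```ruby
# Hypothetical helper: rows that match at least one key, best match first.
def ranked_rows(rows, search_keys)
  pattern = Regexp.union(search_keys)
  rows.map { |row| [row.join(" ").scan(pattern).length, row] }
      .reject { |count, _row| count.zero? }
      .sort_by { |count, _row| -count }
      .map { |_count, row| row }
end

array = [
  ["id", "title", "code", "description"],
  ["1", "once upon a time", "3241", "a classic story"],
  ["2", "a big bad wolf", "4235", "a little scary"],
  ["3", "three big bears", "2626", "a heart warmer"]
]
search_keys = ["big", "bear"]

p ranked_rows(array.drop(1), search_keys)
# prints the "three big bears" row first, then the "a big bad wolf" row
```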

Related

Unscrambling a string given the number of splits and words that the sentence can be comprised of

I'm working on a problem in which I'm given a string that has been scrambled. The scrambling works like this:
An original string is chopped into substrings at random positions and a random number of times.
Each substring is then moved around randomly to form a new string.
I'm also given a dictionary of words that are possible words in the string.
Finally, I'm given the number of splits that were made in the string.
The example I was given is this:
dictionary = ["world", "hello"]
scrambled_string = "rldhello wo"
splits = 1
The expected output of my program would be the original string, in this case:
"hello world"
Suppose the initial string
"hello my name is Sean"
with
splits = 2
yields
["hel", "lo my name ", "is Sean"]
and those three pieces are shuffled to form the following array:
["lo my name ", "hel", "is Sean"]
and then the elements of this array are joined to form:
scrambled = "lo my name helis Sean"
Also suppose:
dictionary = ["hello", "Sean", "the", "name", "of", "my", "cat", "is", "Sugar"]
First convert dictionary to a set to speed lookups.
require 'set'
dict_set = dictionary.to_set
#=> #<Set: {"hello", "Sean", "the", "name", "of", "my", "cat", "is", "Sugar"}>
Next I will create a helper method.
def indices_to_ranges(indices, last_index)
[-1, *indices, last_index].each_cons(2).map { |i,j| i+1..j }
end
Suppose we split scrambled twice (because splits #=> 2), specifically after the 'y' and the 'h':
indices = [scrambled.index('y'), scrambled.index('h')]
#=> [4, 11]
Inside indices_to_ranges, -1 is prepended to indices and last_index (here scrambled.size-1) is appended, so that each consecutive pair of values marks the bounds of one range.
We may then use indices_to_ranges to convert these indices to ranges of indices of characters in scrambled:
ranges = indices_to_ranges(indices, scrambled.size-1)
#=> [0..4, 5..11, 12..20]
a = ranges.map { |r| scrambled[r] }
#=> ["lo my", " name h", "elis Sean"]
We could of course combine these two steps:
a = indices_to_ranges(indices, scrambled.size-1).map { |r| scrambled[r] }
#=> ["lo my", " name h", "elis Sean"]
Next I will permute the values of a. For each permutation I will join the elements to form a string, then split the string on single spaces to form an array of words. If all of those words are in the dictionary we may claim success and are finished. Otherwise, a different indices array is constructed and we try again, continuing until we succeed or all possible indices arrays have been considered. We can put all this in the following method.
def unscramble(scrambled, dict_set, splits)
last_index = scrambled.size-1
(0..scrambled.size-2).to_a.combination(splits).each do |indices|
indices_to_ranges(indices, last_index).
map { |r| scrambled[r] }.
permutation.each do |arr|
next if arr[0][0] == ' ' || arr[-1][-1] == ' '
words = arr.join.split(' ')
return words if words.all? { |word| dict_set.include?(word) }
end
end
end
Let's try it.
original string: "hello my name is Sean"
scrambled = "lo my name helis Sean"
splits = 4
unscramble(scrambled, dict_set, splits)
#=> ["my", "name", "hello", "is", "Sean"]
See Array#combination and Array#permutation.
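To check the method against the example from the question ("rldhello wo" with one split), the two methods above can be run as a self-contained script. This is a sketch that repeats them unchanged apart from one assumption: an explicit nil is returned when no arrangement works, since otherwise the method returns the value of the outer each.

```ruby
require 'set'

def indices_to_ranges(indices, last_index)
  [-1, *indices, last_index].each_cons(2).map { |i, j| i + 1..j }
end

def unscramble(scrambled, dict_set, splits)
  last_index = scrambled.size - 1
  # Try every way of cutting the string after `splits` distinct positions.
  (0..scrambled.size - 2).to_a.combination(splits).each do |indices|
    pieces = indices_to_ranges(indices, last_index).map { |r| scrambled[r] }
    # Try every ordering of the resulting pieces.
    pieces.permutation.each do |arr|
      next if arr[0][0] == ' ' || arr[-1][-1] == ' '
      words = arr.join.split(' ')
      return words if words.all? { |word| dict_set.include?(word) }
    end
  end
  nil # assumption: signal "no solution found" explicitly
end

dict_set = ["world", "hello"].to_set
words = unscramble("rldhello wo", dict_set, 1)
puts words.join(" ")
# prints: hello world
```

The search is exhaustive, so its cost grows quickly with the string length and the number of splits; it is fine for short inputs like these.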
bonkers answer (not quite perfect yet ... trouble with single chars):
#
# spaces appear to be important!
@check = {}
@ordered = []
def previous_words(word)
  @check.select { |y, z| z[:previous] == word }.map do |nw, z|
    @ordered << nw
    previous_words(nw)
  end
end
def in_word(dictionary, string)
  # check each word in the dictionary to see if the string is contained in one of them
  dictionary.each do |word|
    if word.include?(string)
      return word
    end
  end
  return nil
end
letters = scrambled.split("")
previous = nil
substr = ""
letters.each do |l|
  if in_word(dictionary, substr + l)
    substr += l
  elsif (l == " ")
    word = in_word(dictionary, substr)
    @check[word] = {found: 1}
    @check[word][:previous] = previous if previous
    substr = ""
    previous = word
  else
    word = in_word(dictionary, substr)
    @check[word] = {found: 1}
    @check[word][:previous] = previous if previous
    substr = l
    previous = nil
  end
end
word = in_word(dictionary, substr)
@check[word] = {found: 1}
@check[word][:previous] = previous if previous
@check.select { |y, z| z[:previous].nil? }.map do |w, z|
  @ordered << w
  previous_words(w)
end
pp @ordered
output:
dictionary = ["world", "hello"]
scrambled = "rldhello wo"
... my code here ...
2.5.8 :817 > @ordered
=> ["hello", "world"]
dictionary = ["hello", "my", "name", "is", "Sean"]
scrambled = "me is Shelleano my na"
... my code here ...
2.5.8 :879 > @ordered
=> ["Sean", "hello", "my", "name", "is"]

Compare string against array and extract array elements present in ruby

I have the following string:
str = "This is a string"
What I want to do is compare it with this array:
a = ["this", "is", "something"]
The result should be an array with "this" and "is" because both are present in the array and in the given string. "something" is not present in the string so it shouldn't appear. How can I do this?
One way to do this:
str = "This is a string"
a = ["this","is","something"]
str.downcase.split & a
# => ["this", "is"]
I am assuming that the array a will always contain lowercase elements.
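If that assumption does not hold, or the string contains punctuation, both sides can be normalized before comparing. The following is a minimal sketch; the method name common_words and the sample inputs are made up for illustration.

```ruby
# Hypothetical helper: returns the candidates that occur as whole words in
# str, ignoring case and punctuation. Candidates keep their original casing.
def common_words(str, candidates)
  words_in_str = str.downcase.scan(/\w+/)
  candidates.select { |c| words_in_str.include?(c.downcase) }
end

p common_words("This is a string.", ["This", "IS", "something"])
# prints ["This", "IS"]
```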
There are always many ways to do this sort of thing:
str = "this is the example string"
words_to_compare = ["dogs", "ducks", "seagulls", "the"]
words_to_compare.select{|word| word =~ Regexp.union(str.split) }
#=> ["the"]
Your question has an XY problem smell to it. Usually when we want to find what words exist the next thing we want to know is how many times they exist. Frequency counts are all over the internet and Stack Overflow. This is a minor modification to such a thing:
str = "This is a string"
a = ["this", "is", "something"]
a_hash = a.each_with_object({}) { |i, h| h[i] = 0 } # => {"this"=>0, "is"=>0, "something"=>0}
That defined a_hash with the keys being the words to be counted.
str.downcase.split.each{ |k| a_hash[k] += 1 if a_hash.key?(k) }
a_hash # => {"this"=>1, "is"=>1, "something"=>0}
a_hash now contains the counts of the word occurrences. if a_hash.key?(k) is the main difference we'd see compared to a regular word-count as it's only allowing word-counts to occur for the words in a.
a_hash.keys.select{ |k| a_hash[k] > 0 } # => ["this", "is"]
It's easy to find the words that were in common because the counter is > 0.
This is a very common problem in text processing so it's good knowing how it works and how to bend it to your will.

Calling specific element from array not returning (Ruby)

I can't tell what's wrong with my code:
def morse_code(str)
string = []
string.push(str.split(' '))
puts string
puts string[2]
end
What I'm expecting is if I use "what is the dog" for str, I would get the following results:
=> ["what", "is", "the", "dog"]
=> "the"
But what I get instead is nil. If I do string[0], it just gives me the entire string again. Does the .split function not break them up into different elements? If anyone could help, that would be great. Thank you for taking the time to read this.
Your code should be:
def morse_code(str)
string = []
string.push(*str.split(' '))
puts string
p string[2]
end
morse_code("what is the dog" )
# >> what
# >> is
# >> the
# >> dog
# >> "the"
str.split(' ') returns ["what", "is", "the", "dog"], and you are pushing this array object onto the array string. string therefore becomes [["what", "is", "the", "dog"]], an array of size 1, so accessing any index other than 0 (such as 1 or 2) returns nil. You can debug this using p (which calls #inspect on the array), but not puts.
def morse_code(str)
string = []
string.push(str.split(' '))
p string
end
morse_code("what is the dog" )
# >> [["what", "is", "the", "dog"]]
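The difference between pushing an array and pushing its elements can be seen side by side in the sketch below. The variable names are made up for illustration; Array#concat is shown as an alternative to the splat that has the same effect here.

```ruby
words = "what is the dog".split(' ')

nested = []
nested.push(words)    # pushes the array itself as a single element

flat = []
flat.push(*words)     # the splat pushes each word as its own element

joined = []
joined.concat(words)  # concat appends the elements of words

p nested.size  # prints 1
p nested[2]    # prints nil
p flat[2]      # prints "the"
p joined[2]    # prints "the"
```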
With an Array, puts works completely differently from p. I don't always find the MRI source easy to read, so I sometimes look at the Rubinius code instead. Look at how they define IO::puts, which behaves the same as MRI's. Now look at the specs for that code:
it "flattens a nested array before writing it" do
  @io.should_receive(:write).with("1")
  @io.should_receive(:write).with("2")
  @io.should_receive(:write).with("3")
  @io.should_receive(:write).with("\n").exactly(3).times
  @io.puts([1, 2, [3]]).should == nil
end
it "writes nothing for an empty array" do
  x = []
  @io.should_receive(:write).exactly(0).times
  @io.puts(x).should == nil
end
it "writes [...] for a recursive array arg" do
  x = []
  x << 2 << x
  @io.should_receive(:write).with("2")
  @io.should_receive(:write).with("[...]")
  @io.should_receive(:write).with("\n").exactly(2).times
  @io.puts(x).should == nil
end
We can now be sure that IO::puts or Kernel::puts behaves with arrays just the way the Rubinius people implemented it. You can also take a look at the MRI code; I just found the MRI test for it, shown below:
def test_puts_recursive_array
a = ["foo"]
a << a
pipe(proc do |w|
w.puts a
w.close
end, proc do |r|
assert_equal("foo\n[...]\n", r.read)
end)
end
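The flattening behaviour can also be observed directly by writing to a StringIO instead of the terminal. This is a small sketch with made-up variable names; it only shows what the two methods emit for the nested array from the question.

```ruby
require 'stringio'

nested = [["what", "is", "the", "dog"]]

out = StringIO.new
out.puts(nested)   # puts flattens the nested array, one element per line

p out.string       # prints "what\nis\nthe\ndog\n"
puts nested.inspect  # prints [["what", "is", "the", "dog"]], which is what p shows
```

This is why puts made the array look like a flat list of words even though it held a single nested element.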

Build an array of descending match counts?

I have a hash where the keys are book titles and the values are an array of words in the book.
I want to write a method where, if I enter a word, I can search through the hash to find which array has the highest frequency of the word. Then I want to return an array of the book titles in order of most matches.
The method should return an array in descending order of highest amount of occurrences of the searched word.
This is what I have so far:
def search(query)
  books_names = @book_info.keys
  book_info = {}
  @result.each do |key,value|
    contents = @result[key]
    if contents.include?(query)
      book_info[:key] += 1
    end
  end
end
If book_info is your hash and input_str is the string you want to search in book_info, the following will return you a hash in the order of frequency of input_str in the text:
Hash[book_info.sort_by{|k, v| v.count(input_str)}.reverse]
If you want output to be an array of book names, remove Hash and take out the first elements:
book_info.sort_by{|k, v| v.count(input_str)}.reverse.map(&:first)
This is a more compact version (though a little slower), replacing reverse with a negated sort key:
book_info.sort_by{|k, v| -v.count(input_str)}.map(&:first)
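As a small worked example of the last expression, here is a sketch with invented data; the titles, word lists and the method name titles_by_frequency are all made up for illustration.

```ruby
# Hypothetical sample data: title => array of words in the book.
book_info = {
  "Book A" => %w[whale sea whale ship],
  "Book B" => %w[war peace],
  "Book C" => %w[whale storm]
}

# Titles ordered by how often `word` occurs in each book, most frequent first.
def titles_by_frequency(book_info, word)
  book_info.sort_by { |_title, words| -words.count(word) }.map(&:first)
end

p titles_by_frequency(book_info, "whale")
# prints ["Book A", "Book C", "Book B"]
```

Books that never contain the word still appear at the end of the list; if they should be dropped, a reject on a zero count before sorting would do it.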
You may want to consider creating a Book class. Here's a book class that will index the words into a word_count hash for quick sorting.
class Book
  attr_accessor :title, :words
  attr_reader :word_count
  @books = []
  class << self
    attr_accessor :books
    def top(word)
      @books.sort_by{|b| b.word_count[word.downcase]}.reverse
    end
  end
  def initialize
    self.class.books << self
    @word_count = Hash.new { |h,k| h[k] = 0}
  end
  def words=(str)
    str.gsub(/[^\w\s]/,"").downcase.split.each do |word|
      word_count[word] += 1
    end
  end
  def to_s
    title
  end
end
Use it like so:
a = Book.new
a.title = "War and Peace"
a.words = "WELL, PRINCE, Genoa and Lucca are now no more than private estates of the Bonaparte family."
b = Book.new
b.title = "Moby Dick"
b.words = "Call me Ishmael. Some years ago - never mind how long precisely - having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world."
puts Book.top("ago")
result:
Moby Dick
War and Peace
Here is one way to build a hash whose keys are words and whose values are arrays of hashes with keys :title and :count, the hashes ordered by decreasing value of count.
Code
I am assuming we will start with a hash books, whose keys are titles and whose values are all the text in the book in one string.
def word_count_hash(books)
word_and_count_by_title = books.each_with_object({}) { |(title,words),h|
h[title] = words.scan(/\w+/)
.map(&:downcase)
.each_with_object({}) { |w,g| g[w] = (g[w] || 0)+1 } }
title_and_count_by_word = word_and_count_by_title
.each_with_object({}) { |(title,words),g| words.each { |w,count|
g.update({w =>[{title: title, count: count}]}){|_,oarr,narr|oarr+narr}}}
title_and_count_by_word.keys.each { |w| title_and_count_by_word[w].sort_by! { |h| -h[:count] } }
title_and_count_by_word
end
Example
books = {}
books["Grapes of Wrath"] =
<<_
To the red country and part of the gray country of Oklahoma, the last rains
came gently, and they did not cut the scarred earth. The plows crossed and
recrossed the rivulet marks. The last rains lifted the corn quickly and
scattered weed colonies and grass along the sides of the roads so that the
gray country and the dark red country began to disappear under a green cover.
_
books["Tale of Two Cities"] =
<<_
It was the best of times, it was the worst of times, it was the age of wisdom,
it was the age of foolishness, it was the epoch of belief, it was the epoch of
incredulity, it was the season of Light, it was the season of Darkness, it was
the spring of hope, it was the winter of despair, we had everything before us,
we had nothing before us, we were all going direct to Heaven, we were all
going direct the other way
_
books["Moby Dick"] =
<<_
Call me Ishmael. Some years ago - never mind how long precisely - having little
or no money in my purse, and nothing particular to interest me on shore, I
thought I would sail about a little and see the watery part of the world. It is
a way I have of driving off the spleen and regulating the circulation. Whenever
I find myself growing grim about the mouth; whenever it is a damp, drizzly
November in my soul; whenever I find myself involuntarily pausing before coffin
warehouses
_
Construct the hash:
title_and_count_by_word = word_count_hash(books)
and then look up words:
title_and_count_by_word["the"]
#=> [{:title=>"Grapes of Wrath", :count=>12},
# {:title=>"Tale of Two Cities", :count=>11},
# {:title=>"Moby Dick", :count=>5}]
title_and_count_by_word["to"]
#=> [{:title=>"Grapes of Wrath", :count=>2},
# {:title=>"Tale of Two Cities", :count=>1},
# {:title=>"Moby Dick", :count=>1}]
Note the words being looked up must be entered in (or converted to) lower case.
Explanation
Construct the first hash:
word_and_count_by_title = books.each_with_object({}) { |(title,words),h|
h[title] = words.scan(/\w+/)
.map(&:downcase)
.each_with_object({}) { |w,g| g[w] = (g[w] || 0)+1 } }
#=> {"Grapes of Wrath"=>
# {"to"=>2, "the"=>12, "red"=>2, "country"=>4, "and"=>6, "part"=>1,
# ...
# "disappear"=>1, "under"=>1, "a"=>1, "green"=>1, "cover"=>1},
# "Tale of Two Cities"=>
# {"it"=>10, "was"=>10, "the"=>11, "best"=>1, "of"=>10, "times"=>2,
# ...
# "going"=>2, "direct"=>2, "to"=>1, "heaven"=>1, "other"=>1, "way"=>1},
# "Moby Dick"=>
# {"call"=>1, "me"=>2, "ishmael"=>1, "some"=>1, "years"=>1, "ago"=>1,
# ...
# "pausing"=>1, "before"=>1, "coffin"=>1, "warehouses"=>1}}
To see what's happening here, consider the first element of books that Enumerable#each_with_object passes into the block. The two block variables are assigned the following values:
title
#=> "Grapes of Wrath"
words
#=> "To the red country and part of the gray country of Oklahoma, the
# last rains came gently,\nand they did not cut the scarred earth.
# ...
# the dark red country began to disappear\nunder a green cover.\n"
each_with_object has created a hash represented by the block variable h, which is initially empty.
First construct an array of words and convert each to lower-case.
q = words.scan(/\w+/).map(&:downcase)
#=> ["to", "the", "red", "country", "and", "part", "of", "the", "gray",
# ...
# "began", "to", "disappear", "under", "a", "green", "cover"]
We may now create a hash that contains a count of each word for the title "Grapes of Wrath":
h[title] = q.each_with_object({}) { |w,g| g[w] = (g[w] || 0) + 1 }
#=> {"to"=>2, "the"=>12, "red"=>2, "country"=>4, "and"=>6, "part"=>1,
# ...
# "disappear"=>1, "under"=>1, "a"=>1, "green"=>1, "cover"=>1}
Note the expression
g[w] = (g[w] || 0) + 1
If the hash g already has a key for the word w, this expression is equivalent to
g[w] = g[w] + 1
On the other hand, if g does not have this key (word), in which case g[w] #=> nil, then the expression is equivalent to
g[w] = 0 + 1
The same calculations are then performed for each of the other two books.
We can now construct the second hash.
title_and_count_by_word =
word_and_count_by_title.each_with_object({}) { |(title,words),g|
words.each { |w,count| g.update({ w => [{title: title, count: count}]}) \
{ |_, oarr, narr| oarr + narr } } }
#=> {"to" => [{:title=>"Grapes of Wrath", :count=>2},
# {:title=>"Tale of Two Cities", :count=>1},
# {:title=>"Moby Dick", :count=>1}],
# "the" => [{:title=>"Grapes of Wrath", :count=>12},
# {:title=>"Tale of Two Cities", :count=>11},
# {:title=>"Moby Dick", :count=>5}],
# ...
# "warehouses"=> [{:title=>"Moby Dick", :count=>1}]}
(Note that this operation does not order the hashes for each word by :count, even though that may appear to be the case in this output fragment. The hashes are sorted in the next and final step.)
The main operation here that requires explanation is Hash#update (aka Hash#merge!). We are building a hash denoted by the block variable g, which initially is empty. The keys of this hash are words, the values are arrays of hashes with keys :title and :count. Whenever the hash being merged has a key (word) that is already a key of g, the block
{ |_, oarr, narr| oarr + narr }
is called to determine the value for the key in the merged hash. The block variables here are the key (word) (which we have replaced with an underscore because it will not be used), the old array of hashes and the new array of hashes to be merged (which contains just one hash). We simply append the new array to the old one.
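The behaviour of the block form of Hash#update can be seen in isolation in the sketch below. The titles "A" and "B" and the counts are placeholders invented for the example.

```ruby
g = {}
combine = ->(_word, old_arr, new_arr) { old_arr + new_arr }

# First time "the" is seen: the key is new, so the block is not called.
g.update({ "the" => [{ title: "A", count: 12 }] }, &combine)
# Second time: the key exists, so the block combines old and new arrays.
g.update({ "the" => [{ title: "B", count: 11 }] }, &combine)
# A different word simply adds a new key.
g.update({ "to" => [{ title: "A", count: 2 }] }, &combine)

p g["the"].map { |h| h[:title] }  # prints ["A", "B"]
p g.keys                          # prints ["the", "to"]
```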
Lastly we sort the values of the hash (which are arrays of hashes) on decreasing value of :count.
title_and_count_by_word.keys.each { |w| title_and_count_by_word[w].sort_by! { |h| -h[:count] } }
title_and_count_by_word
#=> {"to"=>
# [{:title=>"Grapes of Wrath", :count=>2},
# {:title=>"Tale of Two Cities", :count=>1},
# {:title=>"Moby Dick", :count=>1}],
# "the"=>
# [{:title=>"Grapes of Wrath", :count=>12},
# {:title=>"Tale of Two Cities", :count=>11},
# {:title=>"Moby Dick", :count=>5}],
# ...
# "warehouses"=>[{:title=>"Moby Dick", :count=>1}]}

Unique frequency of occurrence

For a project for class, we are supposed to take a published paper and create an algorithm to create a list of all words in the unit of text while excluding the stop words. I am trying to produce a list of all unique words (in the entire text) along with their frequency of occurrence. This is the algorithm that I created for one line of the text:
x = l[125] #Selecting specific line in the text
p = Array.new() # Assign new array to variable p
p = x.split # Split the array
for i in (0...p.length)
if(p[i] != "the" and p[i] != "to" and p[i] != "union" and p[i] != "political")
print p[i] + " "
end
end
puts
The output of this program is one sentence (from line 125) excluding the stop words. Should I use bubble sort? How would I modify it to sort strings of equal length (or is that irrelevant)?
I'd say you have a good start, considering you are new to Ruby. You asked if you should use a bubble sort. I guess you're thinking of grouping multiple occurrences of a word, then go through the array to count them. That would work, but there are a couple of other approaches that are easier and more 'Ruby-like'. (By that I mean they make use of powerful features of the language and at the same time are more natural.)
Let's focus on counting the unique words in a single line. Once you can do that, you should be able to easily generalize that for multiple lines.
First Method: Use a Hash
The first approach is to use a hash. h = {} creates a new empty one. The hash's keys will be words and its values will be the number of times each word is present in the line. For example, if the word "cat" appears 9 times, we will have h["cat"] = 9, just what you need. To construct this hash, we check whether each word w in the line is already in the hash. It is in the hash if
h[w] != nil
If it is, we increment the word count:
h[w] = h[w] + 1
or just
h[w] += 1
If it's not in the hash, we add the word to the hash like this:
h[w] = 1
That means we can do this:
if h[w]
h[w] += 1
else
h[w] = 1
end
Note that here if h[w] is the same as if h[w] != nil.
Actually, we can use a trick to make this even easier. If we create the hash like this:
h = Hash.new(0)
then reading any key that is not in the hash returns the default value of zero. That way we don't have to check if the word is already in the hash; we simply write
h[w] += 1
If w is not in the hash, h[w] += 1 is evaluated as h[w] = h[w] + 1: the lookup returns the default 0, and the assignment stores 1 under w. Cool, eh?
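The following sketch shows step by step what reading and then incrementing a missing key does when the hash has a zero default. The variable names are made up for illustration.

```ruby
h = Hash.new(0)

before = h["cat"]                         # reading a missing key gives the default
stored_after_read = h.key?("cat")         # reading alone stores nothing

h["cat"] += 1                             # shorthand for h["cat"] = h["cat"] + 1
stored_after_increment = h.key?("cat")    # the assignment stored the key

p before                   # prints 0
p stored_after_read        # prints false
p stored_after_increment   # prints true
p h["cat"]                 # prints 1
```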
Let's put all this together. Suppose
line = "the quick brown fox jumped over the lazy brown fox"
We convert this string to an array with the String#split method:
arr = line.split
# => ["the", "quick", "brown", "fox", "jumped",
#     "over", "the", "lazy", "brown", "fox"]
then
h = Hash.new(0)
arr.each {|w| h[w] += 1}
h # => {"the"=>2, "quick"=>1, "brown"=>2, "fox"=>2, "jumped"=>1, "over"=>1, "lazy"=>1}
We're done!
Second Method: use the Enumerable#group_by method
Whenever you want to group elements of an array, hash or other collection, the group_by method should come to mind.
To apply group_by to the quick, brown fox array, we provide a block that contains the grouping criterion, which in this case is simply the words themselves. This produces a hash:
g = arr.group_by {|e| e}
# => {"the"=>["the", "the"], "quick"=>["quick"], "brown"=>["brown", "brown"], \
# "fox"=>["fox", "fox"], "jumped"=>["jumped"], "over"=>["over"], "lazy"=>["lazy"]}
The next thing to do is convert the hash values to the number of occurrences of the word (e.g., convert ["the", "the"] to 2). To do this, we can create a new empty hash h, and add hash pairs to it:
h = {}
g.each {|k,v| h[k] = v.size}
h # => {"the"=>2, "quick"=>1, "brown"=>2, "fox"=>2, "jumped"=>1, "over"=>1, "lazy"=>1}
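On newer Rubies the second step can be shortened. The sketch below assumes Ruby 2.4 or later for Hash#transform_values and Ruby 2.7 or later for Enumerable#tally; neither is used in the answer above, so treat them as optional alternatives.

```ruby
arr = %w[the quick brown fox jumped over the lazy brown fox]

# group_by followed by transform_values (assumes Ruby 2.4+)
counts_grouped = arr.group_by { |e| e }.transform_values(&:size)

# tally does the grouping and counting in one call (assumes Ruby 2.7+)
counts_tallied = arr.tally

p counts_grouped == counts_tallied  # prints true
p counts_tallied["fox"]             # prints 2
```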
One More Thing
You have this code snippet:
if(p[i] != "the" and p[i] != "to" and p[i] != "union" and p[i] != "political")
print p[i] + " "
end
Here are a couple of ways you could make this a little cleaner, both using the hash h above.
First Way
skip_words = %w[the to union political] # => ["the", "to", "union", "political"]
h.each {|k,v| (print k + ' ') unless skip_words.include?(k)}
Second Way
h.each do |k,v|
case k
when "the", "to", "union", "political"
next
else
puts "The word '#{k}' appears #{v} times."
end
end
Edit to address your comment. Try this:
p = "The quick brown fox jumped over the quick grey fox".split
freqs = Hash.new(0)
p.each {|w| freqs[w] += 1}
sorted_freqs = freqs.sort_by {|k,v| -v}
sorted_freqs.each {|word, freq| puts word+' '+freq.to_s}
=>
quick 2
fox 2
jumped 1
The 1
brown 1
over 1
the 1
grey 1
Normally, you would not sort a hash; rather you'd first convert it to an array:
sorted_freqs = freqs.to_a.sort_by {|x,y| y}.reverse
or
sorted_freqs = freqs.to_a.sort_by {|x,y| -y}
Now sorted_freqs is an array, rather than a hash. The last line stays the same. In general, it's best not to rely on a hash's order. In fact, before Ruby 1.9, hashes were not ordered. If order is important, use an array or convert a hash to an array.
Having said that, you can sort smallest-to-largest on the hash values, or (as I have done), sort largest-to-smallest on the negative of the hash values. Note that there is no Enumerable#reverse or Hash#reverse. Alternatively (always many ways to skin a cat with Ruby), you could sort on v and then use Enumerable#reverse_each:
sorted_freqs.reverse_each {|word, freq| puts word+' '+freq.to_s}
Lastly, you could eliminate the temporary variable sorted_freqs (needed because there is no Enumerable#sort_by! method), by chaining the last two statements:
freqs.sort_by {|k,v| -v}.each {|word, freq| puts word+' '+freq.to_s}
You should really look into Ruby's Enumerable module. You very seldom write for x in y in Ruby.
word_list = ["the", "to", "union", "political"]
l[125].split.each do |word|
print word + " " unless word_list.include?(word)
end
In order to count, sort and all that stuff look into the group_by method and perhaps the sort_by method of arrays.
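Putting the pieces together, a sketch of the whole task (split the text, drop stop words, count, sort) might look like this. The method name word_frequencies, the sample sentence, the lowercasing and the alphabetical tie-break are assumptions made for the example; the stop-word list is the one from the question.

```ruby
STOP_WORDS = %w[the to union political].freeze

# Returns [word, count] pairs, most frequent first, ties in alphabetical order.
def word_frequencies(text, stop_words = STOP_WORDS)
  text.downcase.scan(/[a-z']+/)
      .reject { |w| stop_words.include?(w) }
      .each_with_object(Hash.new(0)) { |w, counts| counts[w] += 1 }
      .sort_by { |word, count| [-count, word] }
end

text = "The union voted to extend the treaty. The treaty passed."
word_frequencies(text).each { |word, count| puts "#{word} #{count}" }
# prints:
# treaty 2
# extend 1
# passed 1
# voted 1
```

To process a whole file rather than one line, the same method can be given the full text, or the per-line hashes can be merged with Hash#merge and a block that adds the counts.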
