Appending multiple values to one key in an empty hash - ruby

I'm trying to find the string in an array that has the most matches to dictionary words in a file. I store the score (matches) as the key of a hash and the corresponding matching strings as the value to the key. For example:
The string "XXBUTTYATCATYSSX" has three substring word matches. The score for this string would be 3. The string and score are stored in the scores hash as:
scores = { 3 => "XXBUTTYATCATYSSX" }
The string "YOUKKYUHISJFXPOP" also has three matches. This should be stored in the hash as:
scores = { 3 => ["XXBUTTYATCATYSSX", "YOUKKYUHISJFXPOP"] }
scores = { }
#scores = Hash.new { |hash, key| hash[key] = [] }
File.open("#{File.dirname(__FILE__)}/dictionary.txt","r") do |file|
#going to a string in the array
strArray.each_index do |str|
score = 0
match = strArray[str]
#going to a line in the dictionary file
file.each_line do |line|
dictWord = line.strip!.upcase
if match.include? dictWord
score += 1
end
end
#the key in the scores hash equals the score (amount of matches)
#the values in the scores hash are the matched strings that have the score of the key
#scores[score] << match
scores.merge!("#{score}" => match)
end
edit:
I've revised the code above. Now it will not enter into file.each_line do |line| after the first loop
Please help.

A File object can't be read twice in a row without rewinding. That is, if you read the entire file once with each_line and then try to do it again, the second pass won't do anything because the read position is already at the end of the file. To read the file again, rewind it with file.rewind before you try to read from it.
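A minimal sketch of the rewind behaviour described above. It uses StringIO as a stand-in for the dictionary file (an assumption, so the example runs without a file on disk); the three words are made up for illustration.

```ruby
require 'stringio'

# StringIO stands in for the dictionary file here (an assumption made so the
# example is self-contained); it tracks a read position the same way.
dictionary = StringIO.new("cat\nbut\nyou\n")

first_pass  = dictionary.each_line.map(&:strip)  # reads to the end of the stream
second_pass = dictionary.each_line.map(&:strip)  # already at EOF, nothing left

dictionary.rewind                                # move back to the start
third_pass = dictionary.each_line.map(&:strip)

p first_pass   # ["cat", "but", "you"]
p second_pass  # []
p third_pass   # ["cat", "but", "you"]
```

If the dictionary is small, reading it once into an array before the loop avoids the need to rewind at all.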
The second problem is that you're trying to add to an array that doesn't exist. For example:
scores = {}
scores[3] #=> nil
scores[3] << 'ASDASDASD' # raises NoMethodError (nil has no << method)
You need to create an array for each score before you can add words to it. One way to do this would be to check if the key exists before using it, like this:
scores = {}
if scores[3].nil?
scores[3] = []
end
scores[3] << 'word' # this will work
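Another way to get the same effect, sketched here, is the block form of Hash.new that the question already has commented out: the block runs on the first access to a missing key and stores a fresh array under it. The string "ZZZCATZZZ" is a made-up third entry for illustration.

```ruby
# The block is called for a missing key and stores a new array for it,
# so << always has an array to append to.
scores = Hash.new { |hash, key| hash[key] = [] }

scores[3] << "XXBUTTYATCATYSSX"
scores[3] << "YOUKKYUHISJFXPOP"
scores[1] << "ZZZCATZZZ"          # hypothetical extra string

best = scores[scores.keys.max]    # strings with the highest score
p best  # ["XXBUTTYATCATYSSX", "YOUKKYUHISJFXPOP"]
```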

Straight to the code:
scores = Hash.new
File.open("#{File.dirname(__FILE__)}/dictionary.txt", "r") do |file|
  strings.each do |string|
    score = 0
    file.each do |line|
      # strip (not strip!) because strip! returns nil when nothing was removed;
      # include? does a literal substring check rather than a regex match
      score += 1 if string.include?(line.strip.upcase)
    end
    # store the score with a new array unless that score is already a key
    scores.store(score, []) unless scores.has_key?(score)
    scores[score] << string
    # rewind to read the dictionary from the first line on the next iteration
    file.rewind
  end
end
strings is your array of strings to compare with the dictionary,
e.g. strings = ["XXBUTTYYOUATCATYSSX", "YOUKKYUHISJFXPOP"]

Related

Having trouble adding new elements to my hash (Ruby)

new to Ruby, new to coding in general...
I'm trying to add new elements into my hash, incrementing the value when necessary. So I used Hash.new(0) and I'm trying to add new values using the "+=" operator, but when I do this I get an error message -
"/tmp/file.rb:6:in `+': String can't be coerced into Integer (TypeError)
from /tmp/file.rb:6:in `block in stockList'
from /tmp/file.rb:3:in `each'
from /tmp/file.rb:3:in `each_with_index'
from /tmp/file.rb:3:in `stockList'
from /tmp/file.rb:24:in `<main>'
"
Here's my code:
def stockList(stock, cat)
hash = Hash.new(0)
stock.each_with_index do |word, i|
if cat.include?(word[i])
char = word[i]
hash[char] += num(word)
end
end
new_arr = []
hash.each do |k, v|
new_arr.push(k,v)
end
return new_arr
end
def num(word)
nums = "1234567890"
word.each_char.with_index do |char, i|
if nums.include?(char)
return word[i..-1]
end
end
end
puts stockList(["ABAR 200", "CDXE 500", "BKWR 250", "BTSQ 890", "DRTY 600"], ["A", "B"])
Does anyone know why this is happening?
It's a codewars challenge -- I'm basically given two arrays and am meant to return a string that adds the numbers associated with the word that starts with the letter(s) listed in the second array.
For this input I'm meant to return " (A : 200) - (B : 1140) "
Your immediate problem is that num(word) returns a string, and a string can't be added to a number in the line hash[char] += num(word). You can convert the string representation of a numeric value using .to_i or .to_f, as appropriate for the problem.
For the overall problem I think you've added too much complexity. The structure of the problem is:
Create a storage object to tally up the results.
For each string containing a stock and its associated numeric value (price? quantity?), split the string into its two tokens.
If the first character of the stock name is one of the target values,
update the corresponding tally. This will require conversion from string to integer.
Return the final tallies.
One minor improvement is to use a Set for the target values. That reduces the work for checking inclusion from O(number of targets) to O(1). With only two targets, the improvement is negligible, but would be useful if the list of stocks and targets increase beyond small test-case problems.
I've done some renaming to hopefully make things clearer by being more descriptive. Without further ado, here it is in Ruby:
require 'set'
def get_tallies(stocks, prefixes)
targets = Set.new(prefixes) # to speed up .include? check below
tally = Hash.new(0)
stocks.each do |line|
name, amount = line.split(/ +/) # one or more spaces is token delimiter
tally[name[0]] += amount.to_i if targets.include?(name[0]) # note conversion to int
end
tally
end
stock_list = ["ABAR 200", "CDXE 500", "BKWR 250", "BTSQ 890", "DRTY 600"]
prefixes = ["A", "B"]
p get_tallies(stock_list, prefixes)
which prints
{"A"=>200, "B"=>1140}
but that can be formatted however you like.
The particular issue triggering this error is that your num(word) method returns a String (the part of the word from its first digit onward, e.g. "200"), not an Integer.
But you actually don't need this function: this...
word.delete('^0-9').to_i
... gives you back the word with all non-digit characters stripped, cast to integer.
Note that without to_i you'll still receive the "String can't be coerced into Integer" error: Ruby is not as forgiving as JavaScript, and tries to protect you from results that might surprise you.
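A small sketch of the point above, using one of the tickers from the question. String#delete with the negated set '^0-9' removes every character that is not a digit, and the result is still a String until to_i is called.

```ruby
word = "BKWR 250"

digits = word.delete('^0-9')  # "250" (still a String)
amount = digits.to_i          # 250 (an Integer)

tally = Hash.new(0)
tally["B"] += amount          # Integer + Integer works

message = nil
begin
  tally["B"] += digits        # Integer + String raises TypeError
rescue TypeError => e
  message = e.message
end

p tally["B"]  # 250
puts message
```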
It's a codewars challenge -- I'm basically given two arrays and am
meant to return a string that adds the numbers associated with the
word that starts with the letter(s) listed in the second array.
For this input I'm meant to return " (A : 200) - (B : 1140) "
This is one way to get there:
def stockList(stock, cat)
hash = Hash.new(0)
stock.each do |word|
letter = word[0]
if cat.include?(letter)
hash[letter] += word.delete('^0-9').to_i
end
end
hash.map { |k, v| "#{k}: #{v}" }
end
Besides type casting, there's another difference here: always choosing the initial letter of the word. With your code...
stock.each_with_index do |word, i|
if cat.include?(word[i])
char = word[i]
... you actually took the 1st letter of the 1st ticker, the 2nd letter of the 2nd ticker and so on. Don't use indexes unless your results depend on them.
stock = ["ABAR 200", "CDXE 500", "BKWR 250", "BTSQ 890", "DRTY 600"]
cat = ["A", "B"]
I concur with your decision to create a hash h with the form of Hash::new that takes an argument (the "default value") which h[k] returns when h does not have a key k. As a first step we can write:
h = stock.each_with_object(Hash.new(0)) { |s,h| h[s[0]] += s[/\d+/].to_i }
#=> {"A"=>200, "C"=>500, "B"=>1140, "D"=>600}
Then Hash#slice can be used to extract the desired key-value pairs:
h = h.slice(*cat)
#=> {"A"=>200, "B"=>1140}
At this point you have all the information you need to display the result any way you like. For example,
" " << h.map { |k,v| "(#{k} : #{v})" }.join(" - ") << " "
#=> " (A : 200) - (B : 1140) "
If h before h.slice(*cat) is large relative to h.slice(*cat) you can reduce memory requirements and probably speed things somewhat by writing the following.
require 'set'
cat_set = cat.to_set
#=> #<Set: {"A", "B"}>
h = stock.each_with_object(Hash.new(0)) do |s,h|
h[s[0]] += s[/\d+/].to_i if cat_set.include?(s[0])
end
#=> {"A"=>200, "B"=>1140}

Comparing values of one hash to many hashes to get inverse document frequency in ruby

I'm trying to find the inverse document frequency for a categorization algorithm and am having trouble getting it the way that my code is structured (with nested hashes), and generally comparing one hash to many hashes.
My training code looks like this so far:
def train!
  @data = {}
  @all_books.each do |category, books|
    @data[category] = {
      words: 0,
      books: 0,
      freq: Hash.new(0)
    }
    books.each do |filename, tokens|
      @data[category][:words] += tokens.count
      @data[category][:books] += 1
      tokens.each do |token|
        @data[category][:freq][token] += 1
      end
    end
    @data[category][:freq].map { |k, v| v = (v / @data[category][:freq].values.max) }
  end
end
Basically, I have a hash with 4 categories (subject to change), and for each have word count, book count, and a frequency hash which shows term frequency for the category. How do I get the frequency of individual words from one category compared against the frequency of the words shown in all categories? I know how to do the comparison for one set of hash keys against another, but am not sure how to loop through a nested hash to get the frequency of terms against all other terms, if that makes sense.
Edit to include predicted outcome -
I'd like to return a hash of nested hashes (one for each category) that shows the word as the key, and the number of other categories in which it appears as the value, i.e. {:category1 => {:word => 3, :other => 2, :third => 1}, :category2 => {:another => 1, ...}}. Alternatively, an array of category names as the value, instead of the number of categories, would also work.
I've tried creating a new hash as follows, but it's turning up empty:
def train!
  @data = {}
  @all_words = Hash.new([]) #new hash for all words, default value is empty array
  @all_books.each do |category, books|
    @data[category] = {
      words: 0,
      books: 0,
      freq: Hash.new(0)
    }
    books.each do |filename, tokens|
      @data[category][:words] += tokens.count
      @data[category][:books] += 1
      tokens.each do |token|
        @data[category][:freq][token] += 1
        @all_words[token] << category #should insert category name if the word appears, right?
      end
    end
    @data[category][:freq].map { |k, v| v = (v / @data[category][:freq].values.max) }
  end
end
If someone can help me figure out why the @all_words hash is empty when the code is run, I may be able to get the rest.
I haven't gone through it all, but you certainly have an error:
@all_words[token] << category #should insert category name if the word appears, right?
Nope. @all_words[token] returns the default empty array, but it does not create a new slot holding that array, as you're assuming. The << then appends to the single shared default object, so the statement never adds a key to the @all_words hash.
Try these 2 changes and see if it helps:
@all_words = {} # ditch the default value
...
(@all_words[token] ||= []) << category # lazy-init the array, and append
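To make the difference concrete, here is a small sketch comparing Hash.new([]) with the block form of Hash.new. The keys and category symbols are made up for illustration.

```ruby
# Default-value form: every missing key returns the SAME array object,
# and nothing is ever stored in the hash.
shared = Hash.new([])
shared[:whale] << :fiction
shared[:atom]  << :science

p shared.empty?      # true
p shared[:anything]  # [:fiction, :science]

# Block form: a new array is created and stored for each missing key.
per_key = Hash.new { |hash, key| hash[key] = [] }
per_key[:whale] << :fiction
per_key[:atom]  << :science

p per_key.keys       # [:whale, :atom]
p per_key[:whale]    # [:fiction]
```

The block form is an alternative to the ||= approach; both end up storing one array per key.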

Ruby grocery list program

I am currently learning Ruby and I'm trying to write a simple Ruby grocery_list method. Here are the instructions:
We want to write a program to help keep track of a grocery list. It takes a grocery item (like "eggs") as an argument, and returns the grocery list (that is, the item names with the quantities of each item). If you pass the same argument twice, it should increment the quantity.
def grocery_list(item)
array = []
quantity = 1
array.each {|x| quantity += x }
array << "#{quantity}" + " #{item}"
end
puts grocery_list("eggs", "eggs")
so I'm trying to figure out here how to return "2 eggs" by passing eggs twice
To help you count the different items you can use a Hash. A Hash is similar to an Array, but with Strings instead of Integers as an index:
a = Array.new
a[0] = "this"
a[1] = "that"
h = Hash.new
h["sonja"] = "asecret"
h["brad"] = "beer"
In this example the Hash might be used for storing passwords for users. But for your
example you need a hash for counting. Calling grocery_list("eggs", "beer", "milk", "eggs")
should lead to the following commands being executed:
h = Hash.new(0) # empty hash {} created, 0 will be default value
h["eggs"] += 1 # h is now {"eggs"=>1}
h["beer"] += 1 # {"eggs"=>1, "beer"=>1}
h["milk"] += 1 # {"eggs"=>1, "beer"=>1, "milk"=>1}
h["eggs"] += 1 # {"eggs"=>2, "beer"=>1, "milk"=>1}
You can work through all the keys and values of a Hash with the each-loop:
h.each{|key, value| .... }
and build up the string we need as a result, adding
the number of items if needed, and the name of the item.
Inside the loop we always add a comma and a blank at the end.
This is not needed for the last element, so after the
loop is done we are left with
"2 eggs, beer, milk, "
To get rid of the last comma and blank we can use chop!, which "chops off"
one character at the end of a string:
output.chop!.chop!
One more thing is needed to get the complete implementation of your grocery_list:
you specified that the function should be called like so:
puts grocery_list("eggs", "beer", "milk","eggs")
So the grocery_list function does not know how many arguments it's getting. We can handle
this by specifying one argument with a star in front, then this argument will
be an array containing all the arguments:
def grocery_list(*items)
# items is an array
end
So here it is: I did your homework for you and implemented grocery_list.
I hope you actually go to the trouble of understanding the implementation,
and don't just copy-and-paste it.
def grocery_list(*items)
hash = Hash.new(0)
items.each {|x| hash[x] += 1}
output = ""
hash.each do |item,number|
if number > 1 then
output += "#{number} "
end
output += "#{item}, "
end
output.chop!.chop!
return output
end
puts grocery_list("eggs", "beer", "milk","eggs")
# output: 2 eggs, beer, milk
def grocery_list(*item)
item.group_by{|i| i}
end
p grocery_list("eggs", "eggs","meat")
#=> {"eggs"=>["eggs", "eggs"], "meat"=>["meat"]}
def grocery_list(*item)
item.group_by{|i| i}.flat_map{|k,v| [k,v.length]}
end
p grocery_list("eggs", "eggs","meat")
#=>["eggs", 2, "meat", 1]
def grocery_list(*item)
Hash[*item.group_by{|i| i}.flat_map{|k,v| [k,v.length]}]
end
grocery_list("eggs", "eggs","meat")
#=> {"eggs"=>2, "meat"=>1}
grocery_list("eggs", "eggs","meat","apple","apple","apple")
#=> {"eggs"=>2, "meat"=>1, "apple"=>3}
or as @Lee said:
def grocery_list(*item)
item.each_with_object(Hash.new(0)) {|a, h| h[a] += 1 }
end
grocery_list("eggs", "eggs","meat","apple","apple","apple")
#=> {"eggs"=>2, "meat"=>1, "apple"=>3}
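For completeness, a shorter sketch that assumes Ruby 2.7 or later, where Enumerable#tally does the same counting in one call. On older versions the each_with_object form is the one to use.

```ruby
# Assumes Ruby 2.7+, where Enumerable#tally is available.
def grocery_list(*items)
  items.tally  # counts occurrences of each element
end

counts = grocery_list("eggs", "eggs", "meat", "apple", "apple", "apple")
p counts["eggs"]   # 2
p counts["apple"]  # 3
```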
Use a Hash Instead of an Array
When you want an easy way to count things, you can use a hash key to hold the name of the thing you want to count, and the value of that key is the quantity. For example:
#!/usr/bin/env ruby
class GroceryList
  attr_reader :list
  def initialize
    # Specify hash with default quantity of zero.
    @list = Hash.new(0)
  end
  # Increment the quantity of each item in the @list, using the name of the item
  # as a hash key.
  def add_to_list(*items)
    items.each { |item| @list[item] += 1 }
    @list
  end
end
if $0 == __FILE__
  groceries = GroceryList.new
  groceries.add_to_list('eggs', 'eggs')
  puts 'Grocery list correctly contains 2 eggs.' if groceries.list['eggs'] == 2
end
Here's a more verbose, but perhaps more readable, solution to your challenge.
def grocery_list(*items) # Notice the asterisk in front of items. It means "put all the arguments into an array called items"
my_grocery_hash = {} # Creates an empty hash
items.each do |item| # Loops over the argument array and passes each argument into the loop as item.
if my_grocery_hash[item].nil? # Returns true if the item is not yet a key in the hash...
my_grocery_hash[item] = 1 # Adds the key and sets the value to 1.
else
my_grocery_hash[item] = my_grocery_hash[item] + 1 # Increments the value by one.
end
end
my_grocery_hash # Returns a hash object with the grocery name as the key and the number of occurrences as the value.
end
This will create an empty hash (called dictionaries or maps in other languages) where each grocery is added as a key with the value set to one. In case the same grocery appears multiple times as a parameter to your method, the value is incremented.
If you want to build a text string and return that instead of the hash object, you can do this after the iteration:
grocery_list_string = "" # Creates an empty string
my_grocery_hash.each do |key, value| # Loops over the hash object and passes two local variables into the loop with the current entry. Key being the name of the grocery and value being the amount.
grocery_list_string << "#{value} units of #{key}\n" # Appends the grocery_list_string. Uses string interpolation, so #{value} becomes 3 and #{key} becomes eggs. The remaining \n is a newline character.
end
return grocery_list_string # Explicitly declares the return value. You can omit return.
Updated answer to comment:
If you use the first method without adding the hash iteration you will get a hash object back which can be used to look up the amount like this.
my_hash_with_grocery_count = grocery_list("Lemonade", "Milk", "Eggs", "Lemonade", "Lemonade")
my_hash_with_grocery_count["Milk"]
--> 1
my_hash_with_grocery_count["Lemonade"]
--> 3
Enumerable#each_with_object can be useful for things like this:
def list_to_hash(*items)
items.each_with_object(Hash.new(0)) { |item, list| list[item] += 1 }
end
def hash_to_grocery_list_string(hash)
hash.each_with_object([]) do |(item, number), result|
result << (number > 1 ? "#{number} #{item}" : item)
end.join(', ')
end
def grocery_list(*items)
hash_to_grocery_list_string(list_to_hash(*items))
end
p grocery_list('eggs', 'eggs', 'bread', 'milk', 'eggs')
# => "3 eggs, bread, milk"
It iterates an array or hash to enable building another object in a convenient way. The list_to_hash method uses it to build a hash from the items array (the splat operator converts the method arguments to an array); the hash is created so that each value is initialized to 0. The hash_to_grocery_list_string method uses it to build an array of strings that is joined to a comma-separated string.

array modified in the loop causes bug with the output ruby

I have a code that places anagrams into an array of arrays. (which contain anagrams)
but somewhere i made a bug and the first values do not output as arrays but just as strings
I am using the << operator to push one array into the other
the code is not that complicated but i cannot find a bug
def combine_anagrams(words)
indexes = []
anagrams = []
words.each{|word|
if(word.is_a? String )
first_word = word.downcase.chars.sort.join
words.each{|second_word|
if(second_word.is_a? String)
if(first_word == second_word.downcase.chars.sort.join)
indexes << words.index(second_word)
end
end
}
indexes.each{|index| anagrams << words[index] }
words.reject!.with_index {|el, idx| indexes.include?(idx)}
words << anagrams # i replaced words with an array all_anagrams
indexes = []
anagrams = []
end
}
return words
end
puts combine_anagrams([ 'cars','for', 'potatoes', 'racs', 'four','scar', 'creams', 'scream'] ).inspect
outputs
["for", "four", ["cars", "racs", "scar"], ["potatoes"], ["creams", "scream"]]
if i switch the order of "cars" and "for" in the input i get
["cars", "racs", "scar", ["for"], ["potatoes"], ["four"], ["creams", "scream"]]
What's going on here?
Sorry for the messy code, I'm just beginning to learn Ruby.
I created an additional variable all_anagrams = [] to store the array of all anagrams.
When I output the array to the screen I get all the values except "for" and "four"; for some reason those never get sent to all_anagrams,
probably because I shorten the array while I am in the loop and those values get skipped over?
However, I don't know how to deal with this problem.
the output of all_anagrams is
[["cars", "racs", "scar"], ["potatoes"], ["creams", "scream"]]
What you need is to introduce a new array to store the anagrams before you blank it; let's call it valid_anagrams. Right now you're pushing them into words. And as Fredrick pointed out, you're modifying words while iterating over it. That's not good; to avoid it, keep a clone of words called words_clone and reject items from that instead. The following code should work:
def combine_anagrams(words)
indexes, anagrams, valid_anagrams = [], [], []
words_clone = words.clone # creating a clone of words
words.each do |word|
if(word.is_a? String )
first_word = word.downcase.chars.sort.join
words.each do |second_word|
if(second_word.is_a? String)
if(first_word == second_word.downcase.chars.sort.join)
indexes << words.index(second_word)
end
end
end
indexes.each{|index| anagrams << words[index] }
# reject from words_cloned instead of words
words_clone.reject!.with_index {|el, idx| indexes.include?(idx)}
# insert anagrams into valid_anagrams array. In your code you inserted it in words array
valid_anagrams << anagrams unless valid_anagrams.include?(anagrams)
indexes, anagrams = [], []
end
end
# return valid_anagrams array
return valid_anagrams
end
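As an alternative sketch (not the approach used in the answer above), grouping by each word's sorted letters avoids tracking indexes and never modifies the array being iterated. It assumes every element is a String.

```ruby
# Words that are anagrams of each other share the same sorted-letter key,
# so group_by collects them into one array per key.
def combine_anagrams(words)
  words.group_by { |word| word.downcase.chars.sort.join }.values
end

result = combine_anagrams(%w[cars for potatoes racs four scar creams scream])
p result
# [["cars", "racs", "scar"], ["for"], ["potatoes"], ["four"], ["creams", "scream"]]
```

Because group_by builds a new hash, the input array is left untouched.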

Ruby: sorting 2d array and output similar field value to files

I have array which I read from excel (using ParseExcel) using the following code:
workbook = Spreadsheet::ParseExcel.parse("test.xls")
rows = workbook.worksheet(1).map() { |r| r }.compact
grid = rows.map() { |r| r.map() { |c| c.to_s('latin1') unless c.nil?}.compact rescue nil }
grid.sort_by { |k| k[2]}
test.xls has lots of rows and 6 columns. The code above sorts by column 3.
I would like to output the rows in the array "grid" to many text files like this:
- After sorting, I want to print all the rows where column 3 has the same value into one file, and so on, with a different file for each other value in column 3.
Hope I explain this right. Thanks for any help/tips.
ps.
I search through most posting on this site but could not find any solution.
Instead of using your code above, I made a test 100-row array, each row containing a 6-element array.
You pass in the array, and the column number you want matched, and this method prints into separate files rows that have the same nth element.
Since I used integers, I used the nth element of each row as the filename. You could use a counter, or the md5 of the element, or something like that, if your nth element does not make a good filename.
a = []
100.times do
b = []
6.times do
b.push rand(10)
end
a.push(b)
end
def print_files(a, column)
h = Hash.new
a.each do |element|
h[element[column]] ? (h[element[column]] = h[element[column]].push(element)) : (h[element[column]] = [element])
end
h.each do |k, v|
File.open("output/" + k.to_s, 'w') do |f|
v.each do |line|
f.puts line.join(", ")
end
end
end
end
print_files(a, 2)
Here is the same code using brace blocks ({ ... }) instead of do .. end:
a = Array.new
100.times{b = Array.new;6.times{b.push rand(10)};a.push(b)}
def print_files(a, column)
h = Hash.new
a.each{|element| h[element[column]] ? (h[element[column]] = h[element[column]].push(element)) : (h[element[column]] = [element])}
h.map{|k, v| File.open("output/" + k.to_s, 'w'){|f| v.map{|line| f.puts line.join(", ")}}}
end
print_files(a, 2)
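The grouping step can also be sketched with Enumerable#group_by. The write_groups name, the ".txt" suffix, and the three sample rows are assumptions for illustration, and the example writes into a temporary directory so it does not depend on an existing output/ folder.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: writes one file per distinct value in the given
# column and returns the list of values it wrote files for.
def write_groups(rows, column, dir)
  FileUtils.mkdir_p(dir)
  groups = rows.group_by { |row| row[column] }
  groups.each do |value, group|
    File.open(File.join(dir, "#{value}.txt"), 'w') do |f|
      group.each { |row| f.puts row.join(", ") }
    end
  end
  groups.keys
end

# Made-up sample rows; column index 2 is the grouping column.
rows = [
  %w[1 ann red],
  %w[2 bob blue],
  %w[3 cy red]
]

Dir.mktmpdir do |dir|
  written = write_groups(rows, 2, dir)
  p written                                  # ["red", "blue"]
  puts File.read(File.join(dir, "red.txt"))  # the two rows whose column is "red"
end
```

With group_by there is no need to sort first, since rows with equal values end up in the same group regardless of their order.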
