Adding a hash to a hash or symbol on the fly in Ruby

I would like to know how to add a hash to a hash on the fly
and increment the counters inside it.
words_to_scan.scan(/\w+|\?|\.|!|\,/).select do |aword|
if words_from_file.has_key?(aword.to_sym)
words_from_file[aword.to_sym]['pop'] += 1
else
words_from_file[aword.to_sym]['pop'] = 1
end
end
I am trying to create something like:
words_from_file = {:the => {'pop' => 3, 'positions' => [1,6,10]}}

words_from_file = {}
words_to_scan.scan(/\w+|\?|\.|!|\,/).each do |aword| # each, not select: we only iterate
  words_from_file[aword.to_sym] ||= {} # declare hash if was not already declared
  words_from_file[aword.to_sym]['pop'] ||= 0 # set pop if was not already set
  words_from_file[aword.to_sym]['pop'] += 1 # increment
end

The default_proc of a Hash runs whenever a key is not found. It receives the hash and the missing key. Here it creates a new Hash as the value for the new key:
words_from_file = {}
words_from_file.default_proc = Proc.new { |h, k| h[k] = {'pop' => 0, 'positions' => []} }
words_to_scan.scan(/\w+|\?|\.|!|\,/).each do |aword|
  words_from_file[aword.to_sym]['pop'] += 1
end
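The structure the question asks for, including the positions array, can be filled in a single pass. This is a minimal sketch, assuming positions are 1-based token indexes; the sample string is made up, since the question does not show its input:

```ruby
# Hypothetical sample input; the original question does not show its text.
words_to_scan = "the cat saw the dog. the end!"

# Every missing key gets its own fresh inner hash, stored on first access.
words_from_file = Hash.new { |hash, key| hash[key] = { 'pop' => 0, 'positions' => [] } }

words_to_scan.scan(/\w+|\?|\.|!|,/).each_with_index do |aword, index|
  entry = words_from_file[aword.to_sym]
  entry['pop'] += 1
  entry['positions'] << index + 1 # assumed 1-based position of the token
end

p words_from_file[:the]
```

With this input, :the ends up with a pop of 3 and positions [1, 4, 7].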

Related

Working with Hashes that have a default value

I am learning to code with Ruby. I am learning about hashes and I don't understand this code: count = Hash.new(0). It says that the 0 is a default value, but when I run it in irb it gives me an empty hash {}. If 0 is a default value, why can't I see something like count = {0=>0}? Or is the zero an accumulator that doesn't go into the keys or values? Thanks
0 will be the fallback if you try to access a key in the hash that doesn't exist.
For example:
count = Hash.new
count['key'] # => nil
vs
count = Hash.new(0)
count['key'] # => 0
To expand on the answer from @jeremy-ramos and the comment from @mu-is-too-short.
There are two common gotchas with defaulting hash values in this way.
1. Accidentally shared references.
Ruby uses the exact same object in memory that you pass in as the default value for every missed key.
For an immutable object (like 0), there is no problem. However you might want to write code like:
hash = Hash.new([])
hash[key] << value
or
hash = Hash.new({})
hash[key][second_key] = value
This will not do what you'd expect. Instead of hash[unknown_key] returning a new, empty array or hash it will return the exact same array/hash object for every key.
so doing:
hash = Hash.new([])
hash[key1] << value1
hash[key2] << value2
results in a hash where key1 and key2 both point to the same array object containing [value1, value2]
See related question here
Solution
To solve this you can create a hash with a default block argument instead (which is called whenever a missing key is accessed and lets you assign a value to the missed key)
hash = Hash.new{|h, key| h[key] = [] }
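The difference between the two forms can be checked directly with equal?, which compares object identity rather than contents. A small sketch (the variable names are illustrative):

```ruby
shared = Hash.new([])
shared[:a] << 1
shared[:b] << 2

# Both lookups hand back the one default array, and no key was ever stored.
same_object = shared[:a].equal?(shared[:b])
shared_keys = shared.keys

# The block form builds and stores a new array for each missing key.
fresh = Hash.new { |h, key| h[key] = [] }
fresh[:a] << 1
fresh[:b] << 2

puts same_object                 # true
puts shared_keys.empty?          # true
puts fresh[:a].equal?(fresh[:b]) # false
```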
2. Assignment of missed keys with default values
When you access a missing key that returns the default value, you might expect that the hash will now contain that key with the value returned. It does not. Ruby does not modify the hash, it simply returns the default value. So, for example:
hash = Hash.new(0) # => {}
hash.keys.empty? # => true
hash[:foo] # => 0
hash[:foo] == 0 # => true
hash # => {}
hash.keys.empty? # => true
Solution
This confusion is also addressed by the block approach, where the key's value can be set explicitly.
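A sketch of that block approach next to the plain default. Hash#fetch with a fallback is included as a third option for reading without storing; the key :foo is arbitrary:

```ruby
plain = Hash.new(0)
plain[:foo]   # returns 0, stores nothing

storing = Hash.new { |h, key| h[key] = 0 }
storing[:foo] # returns 0 and stores :foo => 0

puts plain.key?(:foo)   # false
puts storing.key?(:foo) # true

# fetch ignores the hash's default and uses the fallback given at the call.
empty = {}
puts empty.fetch(:foo, 0) # 0
```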
The Hash.new docs are not very clear on this. I hope that the example below clarifies the difference and one of the frequent uses of Hash.new(0).
The first chunk of code uses Hash.new(0). The hash has a default value of 0, and when new keys are encountered, their value is 0. This method can be used to count the characters in the array.
The second chunk of code fails, because the default value for the key (when not assigned) is nil. This value cannot be used in addition (when counting), and generates an error.
count = Hash.new(0)
puts "count=#{count}"
# count={}
%w[a b b c c c].each do |char|
count[char] += 1
end
puts "count=#{count}"
# count={"a"=>1, "b"=>2, "c"=>3}
count = Hash.new
puts "count=#{count}"
%w[a b b c c c].each do |char|
count[char] += 1
# Fails: in `block in <main>': undefined method `+' for nil:NilClass (NoMethodError)
end
puts "count=#{count}"
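Two other ways to write the same count, as a sketch: Hash#fetch with an explicit fallback works on a plain hash with no default, and Enumerable#tally (available from Ruby 2.7) does the whole job in one call.

```ruby
chars = %w[a b b c c c]

# Plain hash: supply the fallback at the point of use.
count = {}
chars.each { |char| count[char] = count.fetch(char, 0) + 1 }

# Ruby 2.7+: tally builds the same hash directly.
tallied = chars.tally

puts count == tallied # true
```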
SEE ALSO:
What's the difference between "Hash.new(0)" and "{}"
TL;DR: When you initialize a hash using Hash.new, you can set up a default value or a default proc (what is returned when the given key does not exist).
To understand this magic, you first need to know that Ruby hashes have default values. To access the default value you can use the Hash#default method.
This default value is, by default :), nil:
hash = {}
hash.default # => nil
hash[:key] # => nil
You can set default value with Hash#default=
hash = {}
hash.default = :some_value
hash[:key] # => :some_value
Very important note: it is dangerous to use a mutable object as the default because of side effects like this:
hash = {}
hash.default = []
hash[:key] # => []
hash[:other_key] << :some_item # will mutate default value
hash[:key] # => [:some_item]
hash.default # => [:some_item]
hash # => {}
To avoid this you can use Hash#default_proc and Hash#default_proc= methods
hash = {}
hash.default_proc # => nil
hash.default_proc = proc { [] }
hash[:key] # => []
hash[:other_key] << :some_item # will not mutate default value
hash[:other_key] # => [] # a fresh array, because the key was never stored
hash[:other_key] = [:symbol]
hash[:other_key] << :some_item
hash[:other_key] # => [:symbol, :some_item]
hash[:key] # => [] # still empty array as default
Setting default cancels default_proc and vice versa
hash = {}
hash.default = :default
hash.default_proc = proc { :default_proc }
hash[:key] # => :default_proc
hash.default = :default
hash[:key] # => :default
hash.default_proc # => nil
Going back to Hash.new
When you pass an argument to this method, you initialize the default value
hash = Hash.new(0)
hash.default # => 0
hash.default_proc # => nil
When you pass a block to this method, you initialize the default proc
hash = Hash.new { 0 }
hash.default # => nil
hash[:key] # => 0

Ruby dot notation to nested hash keys

What's the best way of converting a dot notation path (or even an array of strings) into a nested hash key-value? Example: I need to convert the path 'foo.bar.baz' with the value 'qux' into this:
{
'foo' => {
'bar' => {
'baz' => 'qux'
}
}
}
I've done this in PHP, but I managed that by creating a key in the array and then setting a tmp variable to that array key's value by reference so any changes would also take place in the array.
Try this
f = "root/sub-1/sub-2/file"
f.split("/").reverse.inject{|a,n| {n=>a}} #=>{"root"=>{"sub-1"=>{"sub-2"=>"file"}}}
I'd probably use recursion. For example:
def hasherizer(arr, value)
if arr.empty?
value
else
{}.tap do |hash|
hash[arr.shift] = hasherizer(arr, value)
end
end
end
This results in:
> hasherizer 'foo.bar.baz'.split('.'), 'qux'
=> {"foo"=>{"bar"=>{"baz"=>"qux"}}}
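One thing to be aware of is that hasherizer empties the array it is given, because it calls shift on it. A sketch of the same recursion that leaves its argument untouched (nest is a hypothetical name):

```ruby
# Same idea as hasherizer, but the keys array is not modified.
def nest(keys, value)
  return value if keys.empty?

  first, *rest = keys
  { first => nest(rest, value) }
end

keys = 'foo.bar.baz'.split('.')
nested = nest(keys, 'qux')
p nested
p keys # ["foo", "bar", "baz"]
```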
I like the method below, which operates on the hash itself when it is defined on Hash (or on your own hash class). It'll create new hash keys or reuse/append to existing keys in a hash to add or update the value.
# set a new or existing nested key's value by a dotted-string key
def dotkey_set(dottedkey, value, deep_hash = self)
keys = dottedkey.to_s.split('.')
first = keys.first
if keys.length == 1
deep_hash[first] = value
else
# in the case that we are creating a hash from a dotted key, we'll assign a default
deep_hash[first] = (deep_hash[first] || {})
dotkey_set(keys.slice(1..-1).join('.'), value, deep_hash[first])
end
end
Usage:
hash = {}
hash.dotkey_set('how.are.you', 'good')
# => "good"
hash
# => {"how"=>{"are"=>{"you"=>"good"}}}
hash.dotkey_set('how.goes.it', 'fine')
# => "fine"
hash
# => {"how"=>{"are"=>{"you"=>"good"}, "goes"=>{"it"=>"fine"}}}
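For hash.dotkey_set to work as in the usage above, the method has to be defined inside class Hash or a subclass. Where reopening Hash is not wanted, the same behaviour can be sketched as a standalone method; set_dotted is a hypothetical name, and the sketch assumes every intermediate value is either missing or already a hash:

```ruby
# Walks (and creates) the intermediate hashes, then assigns the final key.
def set_dotted(hash, dotted_key, value)
  *parents, last = dotted_key.to_s.split('.')
  target = parents.reduce(hash) { |node, key| node[key] ||= {} }
  target[last] = value
end

settings = {}
set_dotted(settings, 'how.are.you', 'good')
set_dotted(settings, 'how.goes.it', 'fine')
p settings
```

Like dotkey_set, it returns the assigned value and keeps sibling keys that already exist.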
I did something similar when I wrote an HTTP server that had to move all the parameters passed in the request into a multiple value hash which might contain arrays or strings or hashes...
You can look at the code for the Plezi server and framework... although the code over there deals with values surrounded with []...
It could possibly be adjusted like so:
def add_param_to_hash param_name, param_value, target_hash = {}
begin
a = target_hash
p = param_name.split(/[\/\.]/)
val = param_value
# the following, somewhat complex line, runs through the existing (?) tree, making sure to preserve existing values and add values where needed.
p.each_index { |i| p[i].strip! ; n = p[i].match(/^[0-9]+$/) ? p[i].to_i : p[i].to_sym ; p[i+1] ? [ ( a[n] ||= ( p[i+1].empty? ? [] : {} ) ), ( a = a[n]) ] : ( a.is_a?(Hash) ? (a[n] ? (a[n].is_a?(Array) ? (a[n] << val) : a[n] = [a[n], val] ) : (a[n] = val) ) : (a << val) ) }
rescue Exception => e
warn "(Silent): parameters parse error for #{param_name} ... maybe conflicts with a different set?"
target_hash[param_name] = param_value
end
end
This should preserve existing values while adding new values if they exist.
The long line looks something like this when broken down:
def add_param_to_hash param_name, param_value, target_hash = {}
  begin
    # a will hold the object to which we add.
    # As we walk the tree we change `a`. We start at the root...
    a = target_hash
    p = param_name.split(/[\/\.]/)
    val = param_value
    # the following loop runs through the existing (?) tree, making sure to preserve existing values and add values where needed.
    p.each_index do |i|
      p[i].strip!
      # converts the current key string to either numbers or symbols... you might want to replace this with: n=p[i]
      n = p[i].match(/^[0-9]+$/) ? p[i].to_i : p[i].to_sym
      if p[i+1]
        a[n] ||= ( p[i+1].empty? ? [] : {} ) # is the new object we'll add to
        a = a[n] # move to the next branch.
      else
        if a.is_a?(Hash)
          if a[n]
            if a[n].is_a?(Array)
              a[n] << val # append to the existing array (a is a Hash here and has no <<)
            else
              a[n] = [a[n], val]
            end
          else
            a[n] = val
          end
        else
          a << val
        end
      end
    end
  rescue Exception => e
    warn "(Silent): parameters parse error for #{param_name} ... maybe conflicts with a different set?"
    target_hash[param_name] = param_value
  end
end
Brrr... Looking at the code like this, I wonder what I was thinking...

Difference between Ruby's .push and <<

Here's an example with push:
@connections = Hash.new []
@connections[1] = @connections[1].push(2)
puts @connections # => {1=>[2]}
Here's an example with <<
@connections = Hash.new []
@connections[1] << 2
puts @connections # => {}
For some reason the output (@connections) is different, but why? I'm guessing it has something to do with the Ruby object model?
Perhaps a new array object [] is being created each time, but not saved? But why?
The difference in your code isn't about << vs. push, it's about the fact that you re-assign in one case and don't in the other. The following two pieces of code are equivalent:
@connections = Hash.new []
@connections[1] = @connections[1].push(2)
puts @connections # => {1=>[2]}
@connections = Hash.new []
@connections[1] = (@connections[1] << 2)
puts @connections # => {1=>[2]}
As are these two:
@connections = Hash.new []
@connections[1].push(2)
puts @connections # => {}
@connections = Hash.new []
@connections[1] << 2
puts @connections # => {}
The reason that re-assignment makes a difference here is that accessing a default value does not automatically add an entry for it to the hash. That is, if you have h = Hash.new(0) and then you do p h[0], you'll print 0, but the value of h will still be {} (not {0 => 0}) because the 0 is not added to the hash. If you do h[0] += 1, this will call the []= method on the hash and actually add an entry for 0 to it, so h becomes {0 => 1}.
So when you do @connections[1] << 2 in your code, you get the default array and perform << on it, but you don't store anything in @connections, so it stays {}. When you do @connections[i] = @connections[i].push(2) or @connections[i] = (@connections[i] << 2), you're calling []=, so the entry gets added to the hash.
However you should note that the hash will return a reference to the same array each time, so even if you do add the entry to the hash, it will likely still not behave as you expect once you add more than one entry (since all entries refer to the same array):
@connections = Hash.new []
@connections[1] = @connections[1].push(2)
@connections[2] = @connections[2].push(42)
puts @connections # => {1 => [2, 42], 2 => [2, 42]}
What you really want is a hash that returns a reference to a new array each time that a new key is accessed and that automatically adds an entry for the new array when that happens. To do that you can use the block form of Hash.new like this:
@connections = Hash.new do |h, k|
  h[k] = []
end
@connections[1].push(2)
@connections[2].push(42)
puts @connections # => {1 => [2], 2 => [42]}
Note that when you write
h = Hash.new { |this_hash, non_existent_key| this_hash[non_existent_key] = [] }
...Ruby will execute the block whenever you try to look up a key that doesn't exist, and then return the block's return value. A block is like a def in that all variables inside it (including the parameter variables) are created anew every time the block is called. In addition, note that [] is an Array constructor, and each time it is called, it creates a new array.
A block returns the result of the last statement that was executed in the block, which is the assignment statement:
this_hash[non_existent_key] = []
And an assignment statement returns the right hand side, which will be a reference to the same Array that was assigned to the key in the hash, so any changes to the returned Array will change the Array in the hash.
On the other hand, when you write:
Hash.new([])
The [] constructor creates a new, empty Array; and that Array becomes the argument for Hash.new(). There is no block for ruby to call every time you look up a non existent key, so ruby just returns that one Array as the value for ALL non-existent keys--and very importantly nothing is done to the hash.

Can't convert symbol to integer from hash table

Edit: The issue is that I am unable to get the number of arrays within the hash (x = number of arrays), so that it can be used as function.each_index{|x| code }
I am trying to use the index of the number of rows as a way of repeating an action X times, depending on how much data is pulled from a CSV file.
Terminal issued
=> Can't convert symbol to integer (TypeError)
Complete error:
=> ~/home/tests/Product.rb:30:in '[]' can't convert symbol into integer (TypeError) from ~home/tests/Product.rub:30:in 'getNumbRel'
from test.rb:36:in '<main>'
The function that performs the action is:
def getNumRel
  if defined? @releaseHashTable
    return @releaseHashTable[:releasename].length
  else
    @releaseHashTable = readReleaseCSV()
    return @releaseHashTable[:releasename].length
  end
end
The csv data pull is just a hash of arrays, nothing snazzy.
def readReleaseCSV()
$log.info("Method "+"#{self.class.name}"+"."+"#{__method__}"+" has started")
$log.debug("reading product csv file")
# Create a Hash where the default is an empty Array
result = Array.new
csvPath = "#{File.dirname(__FILE__)}"+"/../../data/addingProdRelProjIterTestSuite/releaseCSVdata.csv"
CSV.foreach(csvPath, :headers => true, :header_converters => :symbol) do |row|
row.each do |column, value|
if "#{column}" == "prodid"
proHash = Hash.new { |h, k| h[k] = [ ] }
proHash['relid'] << row[:relid]
proHash['releasename'] << row[:releasename]
proHash['inheritcomponents'] << row[:inheritcomponents]
productId = Integer(value)
if result[productId] == nil
result[productId] = Array.new
end
result[productId][result[productId].length] = proHash
end
end
end
$log.info("Method "+"#{self.class.name}"+"."+"#{__method__}"+" has finished")
@productReleaseArr = result
end
Sorry, couldn't resist, cleaned up your method.
# empty brackets unnecessary, no uppercase in method names
def read_release_csv
# you don't need + here
$log.info("Method #{self.class.name}.#{__method__} has started")
$log.debug("reading product csv file")
# you're returning this array. It is not a hash. [] is preferred over Array.new
result = []
csvPath = "#{File.dirname(__FILE__)}/../../data/addingProdRelProjIterTestSuite/releaseCSVdata.csv"
CSV.foreach(csvPath, :headers => true, :header_converters => :symbol) do |row|
row.each do |column, value|
# to_s is preferred
if column.to_s == "prodid"
proHash = Hash.new { |h, k| h[k] = [ ] }
proHash['relid'] << row[:relid]
proHash['releasename'] << row[:releasename]
proHash['inheritcomponents'] << row[:inheritcomponents]
# to_i is preferred
productId = value.to_i
# this notation is preferred
result[productId] ||= []
# this is identical to what you did and more readable
result[productId] << proHash
end
end
end
$log.info("Method #{self.class.name}.#{__method__} has finished")
@productReleaseArr = result
end
You haven't given much to go on, but it appears that @releaseHashTable contains an Array, not a Hash.
Update: Based on the implementation you posted, you can see that productId is an integer and that the return value of readReleaseCSV() is an array.
In order to get the releasename you want, you have to do this:
@releaseHashTable[productId][n]['releasename']
where productId and n are integers (note that the posted method builds each hash with string keys such as 'releasename', not symbols). Either you'll have to specify them explicitly, or (if you don't know n) you'll have to introduce a loop to collect all the releasenames for all the products of a particular productId.
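Such a loop could look like the sketch below. The data is hypothetical and only mirrors the shape the posted method builds (an Array indexed by product id, each slot holding an Array of hashes with string keys); release_names is an illustrative name, not part of the original code:

```ruby
# Hypothetical data in the shape read_release_csv builds.
release_table = []
release_table[2] = [
  { 'relid' => ['10'], 'releasename' => ['alpha'], 'inheritcomponents' => ['y'] },
  { 'relid' => ['11'], 'releasename' => ['beta'],  'inheritcomponents' => ['n'] }
]

def release_names(table, product_id)
  # Array(nil) is [], so an unknown product id yields an empty list.
  Array(table[product_id]).flat_map { |release| release['releasename'] }
end

names = release_names(release_table, 2)
p names                           # ["alpha", "beta"]
p release_names(release_table, 5) # []
```

The number of releases for a product is then names.length.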
This is what Mark Thomas meant:
> a = [1,2,3] # => [1, 2, 3]
> a[:sym]
TypeError: can't convert Symbol into Integer
# here starts the backtrace
from (irb):2:in `[]'
from (irb):2
An Array is only accessible by an integer index, like a[1], which fetches the second element of the array.
You return an array, and that's why your code fails:
#....
result = Array.new
#....
@productReleaseArr = result
# and then later on you call
@releaseHashTable = readReleaseCSV()
@releaseHashTable[:releasename] # which gives you TypeError: can't convert Symbol into Integer

String to array to multidimensional hash in ruby

I don't really know if the title is correct, but the question is quite simple:
I have a value and a key.
The key is as follows:
"one.two.three"
Now, how can I set this hash:
params['one']['two']['three'] = value
You can try to do it with this code:
keys = "one.two.three".split '.' # => ["one", "two", "three"]
params = {}; value = 1; i = 0; # i is an index of processed keys array element
keys.reduce(params) { |hash, key|
hash[key] = if (i += 1) == keys.length
value # assign value to the last key in keys array
else
hash[key] || {} # initialize hash if it is not initialized yet (won't lose already initialized hashes)
end
}
puts params # {"one"=>{"two"=>{"three"=>1}}}
Use recursion:
def make_hash(keys)
keys.empty? ? 1 : { keys.shift => make_hash(keys) }
end
puts make_hash("one.two.three".split '.')
# => {"one"=>{"two"=>{"three"=>1}}}
You can use the inject method:
key = "one.two.three"
value = 5
arr = key.split(".").reverse
arr[1..-1].inject({arr[0] => value}){ |memo, i| {i => memo} }
# => {"one"=>{"two"=>{"three"=>5}}}
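Another option, sketched here as an alternative rather than a recommendation, is an auto-vivifying hash: a default block that creates nested hashes sharing the same block, so that params['one']['two']['three'] = value works as written in the question. The variable names are illustrative:

```ruby
# Each missing key creates a nested hash that reuses the same default block.
params = Hash.new { |hash, key| hash[key] = Hash.new(&hash.default_proc) }

value = 5
*parents, last = "one.two.three".split('.')

# Looking up each parent key creates the intermediate hashes on the way down.
leaf_parent = parents.reduce(params) { |node, key| node[key] }
leaf_parent[last] = value

p params
```

The trade-off is that merely reading a missing key also creates it, so key? or fetch is the safer way to test for presence in such a hash.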

Resources