Automatic nested hash keys with ability to append value - ruby

I have an arbitrary list of file names I'd like to sort into a hash. I'd like to do it like this:
## Example file name 'hello.world.random_hex"
file_name_list.each do |file|
name_array = file.split('.')
files[name_array[0].to_sym][name_array[1].to_sym] << file
end
Those keys may not exist and I'd like for them to be automatically created with a default value of [] so the << works as expected. The final files hash would look like:
{ :hello => { :world => [ "hello.world.random_hex", "hello.world.other_random_hex" ] } }
How can I initialize files to accomplish this?

If there are always two levels of keys like this, you can do it using the block form of Hash.new:
files = Hash.new {|k,v| k[v] = Hash.new {|k,v| k[v] = [] }}
(On the other hand, if the keys can be nested to an arbitrary depth, this is much harder because the Hash can't know whether the value for a nonexistent key should be a Hash or an Array at the time it is accessed.)

Related

Creating a ruby nested hash with array as inner value

I am trying to create a nested hash where the inner values are arrays. For example
{"monday"=>{"morning"=>["John", "Katie", "Dave"],"afternoon"=>["Anne", "Charlie"]},
"tuesday"=>{"morning"=>["Joe"],"afternoon"=>["Chris","Tim","Melissa"]}}
I tried
h = Hash.new( |hash, key| hash[key] = Hash.new([]) }
When I try
h["monday"]["morning"].append("Ben")
and look at h, I get
{"monday" => {}}
rather than
{"monday" => {"morning"=>["Ben"]}}
I'm pretty new to Ruby, any suggestions for getting the functionality I want?
Close, you'll have to initialise a new hash as the value of the initial key, and set an Array as the value of the nested hash:
h = Hash.new { |hash, key| hash[key] = Hash.new { |k, v| k[v] = Array.new } }
h["monday"]["morning"] << "Ben"
{"monday"=>{"morning"=>["Ben"]}}
This way you will not have to initialise an array every time you want to push a value. The key will be as you set in the initial parameter, the second parameter will create a nested hash where the value will be an array you can push to with '<<'. Is this a solution to use in live code? No, it’s not very readable but explains a way of constructing data objects to fit your needs.
Refactored for Explicitness
While it's possible to create a nested initializer using the Hash#new block syntax, it's not really very readable and (as you've seen) it can be hard to debug. It may therefore be more useful to construct your nested hash in steps that you can inspect and debug as you go.
In addition, you already know ahead of time what your keys will be: the days of the week, and morning/afternoon shifts. For this use case, you might as well construct those upfront rather than relying on default values.
Consider the following:
require 'date'
# initialize your hash with a literal
schedule = {}
# use constant from Date module to initialize your
# lowercase keys
Date::DAYNAMES.each do |day|
# create keys with empty arrays for each shift
schedule[day.downcase] = {
"morning" => [],
"afternoon" => [],
}
end
This seems more explicit and readable to me, but that's admittedly subjective. Meanwhile, calling pp schedule will show you the new data structure:
{"sunday"=>{"morning"=>[], "afternoon"=>[]},
"monday"=>{"morning"=>[], "afternoon"=>[]},
"tuesday"=>{"morning"=>[], "afternoon"=>[]},
"wednesday"=>{"morning"=>[], "afternoon"=>[]},
"thursday"=>{"morning"=>[], "afternoon"=>[]},
"friday"=>{"morning"=>[], "afternoon"=>[]},
"saturday"=>{"morning"=>[], "afternoon"=>[]}}
The new data structure can then have its nested array values assigned as you currently expect:
schedule["monday"]["morning"].append("Ben")
#=> ["Ben"]
As a further refinement, you could append to your nested arrays in a way that ensures you don't duplicate names within a scheduled shift. For example:
schedule["monday"]["morning"].<<("Ben").uniq!
schedule["monday"]
#=> {"morning"=>["Ben"], "afternoon"=>[]}
There are many ways to create the hash. One simple way is as follows.
days = [:monday, :tuesday]
day_parts = [:morning, :afternoon]
h = days.each_with_object({}) do |d,h|
h[d] = day_parts.each_with_object({}) { |dp,g| g[dp] = [] }
end
#=> {:monday=>{:morning=>[], :afternoon=>[]},
# :tuesday=>{:morning=>[], :afternoon=>[]}}
Populating the hash will of course depend on the format of the data. For example, if the data were as follows:
people = { "John" =>[:monday, :morning],
"Katie" =>[:monday, :morning],
"Dave" =>[:monday, :morning],
"Anne" =>[:monday, :afternoon],
"Charlie"=>[:monday, :afternoon],
"Joe" =>[:tuesday, :morning],
"Chris" =>[:tuesday, :afternoon],
"Tim" =>[:tuesday, :afternoon],
"Melissa"=>[:tuesday, :afternoon]}
we could build the hash as follows.
people.each { |name,(day,day_part)| h[day][day_part] << name }
#=> {
# :monday=>{
# :morning=>["John", "Katie", "Dave"],
# :afternoon=>["Anne", "Charlie"]
# },
# :tuesday=>{
# :morning=>["Joe"],
# :afternoon=>["Chris", "Tim", "Melissa"]
# }
# }
As per your above-asked question
h = Hash.new{ |hash, key| hash[key] = Hash.new([]) }
you tried
h["monday"]["morning"].append("Ben")
instead you should first initialize that with an array & then you can use array functions like append
h["monday"]["morning"] = []
h["monday"]["morning"].append("Ben")
This would work fine & you will get the desired results.

How to "split and group" an array of objects based on one of their properties

Context and Code Examples
I have an Array with instances of a class called TimesheetEntry.
Here is the constructor for TimesheetEntry:
def initialize(parameters = {})
#date = parameters.fetch(:date)
#project_id = parameters.fetch(:project_id)
#article_id = parameters.fetch(:article_id)
#hours = parameters.fetch(:hours)
#comment = parameters.fetch(:comment)
end
I create an array of TimesheetEntry objects with data from a .csv file:
timesheet_entries = []
CSV.parse(source_file, csv_parse_options).each do |row|
timesheet_entries.push(TimesheetEntry.new(
:date => Date.parse(row['Date']),
:project_id => row['Project'].to_i,
:article_id => row['Article'].to_i,
:hours => row['Hours'].gsub(',', '.').to_f,
:comment => row['Comment'].to_s.empty? ? "N/A" : row['Comment']
))
end
I also have a Set of Hash containing two elements, created like this:
all_timesheets = Set.new []
timesheet_entries.each do |entry|
all_timesheets << { 'date' => entry.date, 'entries' => [] }
end
Now, I want to populate the Array inside of that Hash with TimesheetEntries.
Each Hash array must contain only TimesheetEntries of one specific date.
I have done that like this:
timesheet_entries.each do |entry|
all_timesheets.each do |timesheet|
if entry.date == timesheet['date']
timesheet['entries'].push entry
end
end
end
While this approach gets the job done, it's not very efficient (I'm fairly new to this).
Question
What would be a more efficient way of achieving the same end result? In essence, I want to "split" the Array of TimesheetEntry objects, "grouping" objects with the same date.
You can fix the performance problem by replacing the Set with a Hash, which is a dictionary-like data structure.
This means that your inner loop all_timesheets.each do |timesheet| ... if entry.date ... will simply be replaced by a more efficient hash lookup: all_timesheets[entry.date].
Also, there's no need to create the keys in advance and then populate the date groups. These can both be done in one go:
all_timesheets = {}
timesheet_entries.each do |entry|
all_timesheets[entry.date] ||= [] # create the key if it's not already there
all_timesheets[entry.date] << entry
end
A nice thing about hashes is that you can customize their behavior when a non-existing key is encountered. You can use the constructor that takes a block to specify what happens in this case. Let's tell our hash to automatically add new keys and initialize them with an empty array. This allows us to drop the all_timesheets[entry.date] ||= [] line from the above code:
all_timesheets = Hash.new { |hash, key| hash[key] = [] }
timesheet_entries.each do |entry|
all_timesheets[entry.date] << entry
end
There is, however, an even more concise way of achieving this grouping, using the Enumerable#group_by method:
all_timesheets = timesheet_entries.group_by { |e| e.date }
And, of course, there's a way to make this even more concise, using yet another trick:
all_timesheets = timesheet_entries.group_by(&:date)

Ruby's optimized implementation of Histogram/Aggregator

i'm about to write my own but i was wondering if there are any gems/libs that i can use as aggregator/histogram
my goal would be to sum up values based on a matching key:
["fish","2"]
["fish","40"]
["meat","56"]
["meat","1"]
Should sum op the values per unique key and return ["fish","42"] and ["meat","57"]
.The files i have to aggregate are relatively large, about 4gb text files made of tsv key/value pair
.My goal is to try not to use temporary files in order not to take too much space on the machine, so i was wondering if something similar already optimized already exists, i have found a jeb on github named 'histogram' but it does not really contain the functionalities i need
Thx
You can use a Hash with a default value of 0 to do the counting, then in the end you could convert it to Array to yield the format you want, though I think you might just want to keep using the Hash instead.
data = [
["fish","2"],
["fish","40"],
["meat","56"],
["meat","1"]
]
hist = data.each_with_object(Hash.new(0)) do |(k,v), h|
h[k] += v.to_i
end
hist # => {"fish"=>42, "meat"=>57}
hist.to_a # => [["fish", 42], ["meat", 57]]
# To get String values, "42" instead of 42, etc:
hist.map { |k,v| [k, v.to_s] } # => [["fish", "42"], ["meat", "57"]]
Since you stated you had to read the data from a file, here is the above when applied to a file. The input.txt file contents are as follows for this example:
fish,2
fish,40
meat,56
meat,1
Then, to create the same output as before by reading it line by line:
file = File.open('input.txt')
hist = file.each_with_object(Hash.new(0)) do |line, h|
key, value = line.split(',')
h[key] += value.to_i
end
file.close

In Ruby, how can I recursivly populate a Mongo database using nested arrays as input?

I have been using Ruby for a while, but this is my first time doing anything with a database. I've been playing around with MongoDB for a bit and, at this point, I've begun to try and populate a simple database.
Here is my problem. I have a text file containing data in a particular format. When I read that file in, the data is stored in nested arrays like so:
dataFile = ["sectionName", ["key1", "value1"], ["key2", "value2", ["key3", ["value3A", "value3B"]]]
The format will always be that the first value of the array is a string and each subsequent value is an array. Each array is formatted in as a key/value pair. However, the value can be a string, an array of two strings, or a series of arrays that have their own key/value array pairs. I don't know any details about the data file before I read it in, just that it conforms to these rules.
Now, here is my problem. I want to read this into to a Mongo database preserving this basic structure. So, for instance, if I were to do this by hand, it would look like this:
newDB = mongo_client.db("newDB")
newCollection = newDB["dataFile1"]
doc = {"section_name" => "sectionName", "key1" => "value1", "key2" => "value2", "key3" => ["value3A", "value3B"]}
ID = newCollection.insert(doc)
I know there has to be an easy way to do this. So far, I've been trying various recursive functions to parse the data out, turn it into mongo commands and try to populate my database. But it just feels clunky, like there is a better way. Any insight into this problem would be appreciated.
The value that you gave for the variable dataFile isn't a valid array, because it is missing an closing square bracket.
If we made the definition of dataFile a valid line of ruby code, the following code would yield the hash that you described. It uses map.with_index to visit each element of the array and transforms this array into a new array of key/value hashes. This transformed array of hashes is flatted and converted into single hash using the inject method.
dataFile = ["sectionName", ["key1", "value1"], ["key2", "value2", ["key3", ["value3A", "value3B"]]]]
puts dataFile.map.with_index {
|e, ix|
case ix
when 0
{ "section_name" => e }
else
list = []
list.push( { e[0] => e[1] } )
if( e.length > 2 )
list.push(
e[2..e.length-1].map {|p|
{ p[0] => p[1] }
}
)
end
list
end
}.flatten.inject({ }) {
|accum, e|
key = e.keys.first
accum[ key ] = e[ key ]
accum
}.inspect
The output looks like:
{"section_name"=>"sectionName", "key1"=>"value1", "key2"=>"value2", "key3"=>["value3A", "value3B"]}
For input that looked like this:
["sectionName", ["key1", "value1"], ["key2", "value2", ["key3", ["value3A", "value3B"]], ["key4", ["value4A", "value4B"]]], ["key5", ["value5A", "value5B"]]]
We would see:
{"section_name"=>"sectionName", "key1"=>"value1", "key2"=>"value2", "key3"=>["value3A", "value3B"], "key4"=>["value4A", "value4B"], "key5"=>["value5A", "value5B"]}
Note the arrays "key3" and "key4", which is what I consider as being called a series of arrays. If the structure has array of arrays of unknown depth then we would need a different implementation - maybe use an array to keep track of the position as the program walks through this arbitrarily nested array of arrays.
In the following test, please find two solutions.
The first converts to a nested Hash which is what I think that you want without flattening the input data.
The second stores the key-value pairs exactly as given from the input.
I've chosen to fix missing closing square bracket by preserving key values pairs.
The major message here is that while the top-level data structure for MongoDB is a document mapped to a Ruby Hash
that by definition has key-value structure, the values can be any shape including nested arrays or hashes.
So I hope that test examples cover the range, showing that you can match storage in MongoDB to fit your needs.
test.rb
require 'mongo'
require 'test/unit'
require 'pp'
class MyTest < Test::Unit::TestCase
def setup
#coll = Mongo::MongoClient.new['test']['test']
#coll.remove
#dataFile = ["sectionName", ["key1", "value1"], ["key2", "value2"], ["key3", ["value3A", "value3B"]]]
#key, *#value = #dataFile
end
test "nested array data as hash value" do
input_doc = {#key => Hash[*#value.flatten(1)]}
#coll.insert(input_doc)
fetched_doc = #coll.find.first
assert_equal(input_doc[#key], fetched_doc[#key])
puts "#{name} fetched hash value doc:"
pp fetched_doc
end
test "nested array data as array value" do
input_doc = {#key => #value}
#coll.insert(input_doc)
fetched_doc = #coll.find.first
assert_equal(input_doc[#key], fetched_doc[#key])
puts "#{name} fetched array doc:"
pp fetched_doc
end
end
ruby test.rb
$ ruby test.rb
Loaded suite test
Started
test: nested array data as array value(MyTest) fetched array doc:
{"_id"=>BSON::ObjectId('5357d4ac7f11ba0678000001'),
"sectionName"=>
[["key1", "value1"], ["key2", "value2"], ["key3", ["value3A", "value3B"]]]}
.test: nested array data as hash value(MyTest) fetched hash value doc:
{"_id"=>BSON::ObjectId('5357d4ac7f11ba0678000002'),
"sectionName"=>
{"key1"=>"value1", "key2"=>"value2", "key3"=>["value3A", "value3B"]}}
.
Finished in 0.009493 seconds.
2 tests, 2 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
100% passed
210.68 tests/s, 210.68 assertions/s

how to name an object reference (handle) dynamically in ruby

So I have a class like this:
def Word
end
and im looping thru an array like this
array.each do |value|
end
And inside that loop I want to instantiate an object, with a handle of the var
value = Word.new
Im sure there is an easy way to do this - I just dont know what it is!
Thanks!
To assign things to a dynamic variable name, you need to use something like eval:
array.each do |value|
eval "#{value} = Word.new"
end
but check this is what you want - you should avoid using eval to solve things that really require different data structures, since it's hard to debug errors created with eval, and can easily cause undesired behaviour. For example, what you might really want is a hash of words and associated objects, for example
words = {}
array.each do |value|
words[value] = Word.new
end
which won't pollute your namespace with tons of Word objects.
Depending on the data structure you want to work with, you could also do this:
# will give you an array:
words = array.map { |value| Word.new(value) }
# will give you a hash (as in Peter's example)
words = array.inject({}) { |hash, value| hash.merge value => Word.new }
# same as above, but more efficient, using monkey-lib (gem install monkey-lib)
words = array.construct_hash { |value| [value, Word.new ] }

Resources