Say I have a CSV file with 4 fields,
ID,name,pay,age
and about 32,000 records.
What's the best way to stick this into a hash in Ruby?
In other words, an example record would look like:
{:rec1 => {:id=>"00001", :name => "Bob", :pay => 150, :age => 95 } }
Thanks for the help!
You can use the Excelsior rubygem for this:
csv = ...
result = Hash.new
counter = 1
Excelsior::Reader.rows(csv) do |row|
row_hash = result[("rec#{counter}".intern)] = Hash.new
row.each do |col_name, col_val|
row_hash[col_name.intern] = col_val
end
counter += 1
end
# do something with result...
Typically we'd want to use an :id field for the Hash key, since it'd be the same as a primary key in a database table:
{"00001" => {:name => "Bob", :pay => 150, :age => 95 } }
The following builds a hash in that shape:
require 'ap'
# Pretend this is CSV data...
csv = [
%w[ id name pay age ],
%w[ 1 bob 150 95 ],
%w[ 2 fred 151 90 ],
%w[ 3 sam 140 85 ],
%w[ 31999 jane 150 95 ]
]
# pull headers from the first record
headers = csv.shift
# drop the first header, which is the ID. We'll use it as the key so we won't need a name for it.
headers.shift
# loop over the remaining records, adding them to a hash
data = csv.inject({}) { |h, row| h[row.shift.rjust(5, '0')] = Hash[headers.zip(row)]; h }
ap data
# >> {
# >> "00001" => {
# >> "name" => "bob",
# >> "pay" => "150",
# >> "age" => "95"
# >> },
# >> "00002" => {
# >> "name" => "fred",
# >> "pay" => "151",
# >> "age" => "90"
# >> },
# >> "00003" => {
# >> "name" => "sam",
# >> "pay" => "140",
# >> "age" => "85"
# >> },
# >> "31999" => {
# >> "name" => "jane",
# >> "pay" => "150",
# >> "age" => "95"
# >> }
# >> }
Check out the Ruby Gem smarter_csv, which parses CSV-files and returns array(s) of hashes for the rows in the CSV-file. It can also do chunking, to more efficiently deal with large CSV-files, so you can pass the chunks to parallel Resque workers or mass-create records with Mongoid or MongoMapper.
It comes with plenty of useful options; see the documentation on GitHub.
require 'smarter_csv'
filename = '/tmp/input.csv'
array = SmarterCSV.process(filename)
=>
[ {:id=> 1, :name => "Bob", :pay => 150, :age => 95 } ,
...
]
See also:
https://github.com/tilo/smarter_csv
http://www.unixgods.org/~tilo/Ruby/process_csv_as_hashes.html
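require 'csv'
# Build the hash from an array of [key, value] pairs; splatting 32,000 records into arguments is fragile.
Hash[CSV.read(filename, :headers => true).each_with_index.map { |r, i| ["rec#{i+1}", r.to_hash] }]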
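For comparison, the CSV library that ships with Ruby can build the ID-keyed hash directly. This is only a sketch, assuming the header row is `ID,name,pay,age` as in the question; the helper name `csv_to_hash` and the sample data are made up for illustration.

```ruby
require 'csv'

# Hypothetical helper: turn CSV text into { "00001" => { name: ..., pay: ..., age: ... } }.
def csv_to_hash(csv_text)
  # header_converters: :symbol turns the "ID" header into :id, "name" into :name, etc.
  table = CSV.parse(csv_text, headers: true, header_converters: :symbol)
  table.each_with_object({}) do |row, result|
    fields = row.to_h          # one record as a plain Hash with symbol keys
    id = fields.delete(:id)    # the ID becomes the outer key
    result[id.rjust(5, '0')] = fields
  end
end

# Made-up sample data in the question's column layout.
sample = "ID,name,pay,age\n1,Bob,150,95\n2,Fred,151,90\n"
records = csv_to_hash(sample)
puts records.inspect
```

All values stay strings here. Passing `converters: :numeric` to `CSV.parse` would turn pay and age into numbers, but it would also convert the ID, so the `rjust` call would then need a `to_s` first.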
I am trying to define a function that can print any hash in a tree format. It should do something like this:
From
{"parent1"=>
{"child1" => { "grandchild1" => 1,
"grandchild2" => 2},
"child2" => { "grandchild3" => 3,
"grandchild4" => 4}}
}
To
parent1:
    child1:
        grandchild1:1
        grandchild2:2
    child2:
        grandchild3:3
        grandchild4:4
And this is my code so far:
def readprop(foo)
level = ''
if foo.is_a?(Hash)
foo.each_key {|key| if foo[key].nil? == false
puts level + key + ":"
level += " "
readprop(foo[key])
end
}
else
puts level + foo
level = level[0,level.length - 2]
end
end
and it will give me a bad format like this:
parent1:
child1:
grandchild1:
1
grandchild2:
2
child2:
grandchild3:
3
grandchild4:
4
You are almost there. One way to solve it is to make level a parameter of the recursive function. Here x is the hash from the question.
Simple recursive version:
def print_hash(h,spaces=4,level=0)
  h.each do |key,val|
    format = "#{' '*spaces*level}#{key}: "
    if val.is_a? Hash
      puts format
      print_hash(val,spaces,level+1)
    else
      puts format + val.to_s
    end
  end
end
print_hash(x)
#parent1:
#    child1:
#        grandchild1: 1
#        grandchild2: 2
#    child2:
#        grandchild3: 3
#        grandchild4: 4
In this case you could also convert it to YAML (as mentioned in a comment above)
require 'yaml'
puts x.to_yaml
#---
#parent1:
#  child1:
#    grandchild1: 1
#    grandchild2: 2
#  child2:
#    grandchild3: 3
#    grandchild4: 4
I would use recursion, but there is another way that might be of interest to some. Below I've used a "pretty printer", awesome-print, to do part of the formatting (the indentation in particular), saving the result to a string, and then applied a couple of gsub's to the string to massage the results into the desired format.
Suppose your hash were as follows:
h = { "parent1"=>
{ "child1" => { "grandchild11" => 1,
"grandchild12" => { "great grandchild121" => 3 } },
"child2" => { "grandchild21" => { "great grandchild211" =>
{ "great great grandchild2111" => 4 } },
"grandchild22" => 2 }
}
}
We could then do the following.
require 'awesome_print'
puts str = h.awesome_inspect(indent: -5, index: false, plain: true).
gsub(/^\s*(?:{|},?)\s*\n|[\"{}]/, '').
gsub(/\s*=>\s/, ':')
prints
parent1:
child1:
grandchild11:1,
grandchild12:
great grandchild121:3
child2:
grandchild21:
great grandchild211:
great great grandchild2111:4
grandchild22:2
The steps:
str = h.awesome_inspect(indent: -5, index: false, plain: true)
puts str prints
{
"parent1" => {
"child1" => {
"grandchild11" => 1,
"grandchild12" => {
"great grandchild121" => 3
}
},
"child2" => {
"grandchild21" => {
"great grandchild211" => {
"great great grandchild2111" => 4
}
},
"grandchild22" => 2
}
}
}
s1 = str.gsub(/^\s*(?:{|},?)\s*\n|[\"{}]/, '')
puts s1 prints
parent1 =>
child1 =>
grandchild11 => 1,
grandchild12 =>
great grandchild121 => 3
child2 =>
grandchild21 =>
great grandchild211 =>
great great grandchild2111 => 4
grandchild22 => 2
s2 = s1.gsub(/\s*=>\s/, ':')
puts s2 prints the result above.
Not exactly what you require but I will submit this answer as I think you may find it useful:
require 'yaml'
hash = {"parent1"=> {"child1" => { "grandchild1" => 1,"grandchild2" => 2},
"child2" => { "grandchild3" => 3,"grandchild4" => 4}}}
puts hash.to_yaml
prints:
---
parent1:
  child1:
    grandchild1: 1
    grandchild2: 2
  child2:
    grandchild3: 3
    grandchild4: 4
See Ruby Recursive Tree
Suppose we have
#$ mkdir -p foo/bar
#$ mkdir -p baz/boo/bee
#$ mkdir -p baz/goo
We can get
{
"baz"=>{
"boo"=>{
"bee"=>{}},
"goo"=>{}},
"foo"=>{
"bar"=>{}}}
Here's a way to build that Hash from the directory tree on disk:
Dir.glob('**/*'). # get all files below current dir
select{|f|
File.directory?(f) # only directories we need
}.map{|path|
path.split '/' # split to parts
}.inject({}){|acc, path| # start with empty hash
path.inject(acc) do |acc2,dir| # for each path part, create a child of current node
acc2[dir] ||= {} # and pass it as new current node
end
acc
}
Thanks to Mladen Jablanović in the other answer for this concept.
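The same nested-inject idea can be tried without touching the filesystem. The sketch below assumes a hard-coded list of paths (a stand-in for what `Dir.glob` plus the directory filter would return) and walks each one down from the root hash.

```ruby
# Hypothetical input: directory paths as they might come back from Dir.glob.
paths = ["foo", "foo/bar", "baz", "baz/boo", "baz/boo/bee", "baz/goo"]

tree = paths.each_with_object({}) do |path, root|
  # Walk down from the root, creating each missing level on the way.
  # The block returns the child hash, which becomes the node for the next part.
  path.split('/').inject(root) { |node, dir| node[dir] ||= {} }
end

p tree
```

Because `node[dir] ||= {}` returns the existing child when there is one, the order of the paths does not matter and shared prefixes are merged.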
I want to compare two XML files where one is input and the other is output. I am converting both into a hash.
My idea is to get all the keys from the input XML converted to hash, and search each key in both the input and output hashes for their respective key/value pairs.
I have a hash:
{
"requisition_header" => {
"requested_by" => {"login" => "coupasupport"},
"department" => {"name" => "Marketing"},
"ship_to_address" => {"name" => "Address_1431693296"},
"justification" => nil,
"attachments" => [],
"requisition_lines" => [
{
"description" => "Cleaning Services for Building A",
"line_num" => 1,
"need_by_date" => 2010-09-23 07:00:00 UTC,
"source_part_num" => nil,
"supp_aux_part_num" => nil,
"unit_price" => #<BigDecimal:a60520c,'0.3E4',9(18)>,
"supplier" => {"name" => "amazon.com"},
"account" => {
"code" => "SF-Marketing-Indirect",
"account_type" => {"name" => "Ace Corporate"}
},
"currency" => {"code" => "USD"},
"payment_term" => {"code" => "Net 30"},
"shipping_term" => {"code" => "Standard"},
"commodity" => {"name" => "Marketing-Services"}
}
]
}
}
It is nested, so the values cannot be reached directly.
I want a way to generate direct access to each value in the hash.
For example:
requisition_header.requested_by.login
will access "coupasupport".
requisition_header.department.name
will access "Marketing".
requisition_header.requisition_lines[0].description
will access "Cleaning Services for Building A".
requisition_header.requisition_lines[0].line_num
will access "1".
requisition_header.requisition_lines[0].need_by_date
will access "2010-09-23 07:00:00 UTC".
Each key built can be used to search for the value directly inside the hash.
That could be done with the following method, which translates the nested hash into nested OpenStructs:
require 'ostruct'
def deep_structify(hash)
result = {}
hash.each do |key, value|
result[key] = value.is_a?(Hash) ? deep_structify(value) : value
end if hash
OpenStruct.new(result)
end
require 'bigdecimal'
# The time and decimal values are written as constructor calls so the literal is valid Ruby.
hash = {"requisition_header"=>{"requested_by"=>{"login"=>"coupasupport"}, "department"=>{"name"=>"Marketing"}, "ship_to_address"=>{"name"=>"Address_1431693296"}, "justification"=>nil, "attachments"=>[], "requisition_lines"=>[{"description"=>"Cleaning Services for Building A", "line_num"=>1, "need_by_date"=>Time.utc(2010, 9, 23, 7, 0, 0), "source_part_num"=>nil, "supp_aux_part_num"=>nil, "unit_price"=>BigDecimal("0.3E4"), "supplier"=>{"name"=>"amazon.com"}, "account"=>{"code"=>"SF-Marketing-Indirect", "account_type"=>{"name"=>"Ace Corporate"}}, "currency"=>{"code"=>"USD"}, "payment_term"=>{"code"=>"Net 30"}, "shipping_term"=>{"code"=>"Standard"}, "commodity"=>{"name"=>"Marketing-Services"}}]}}
struct = deep_structify(hash)
struct.requisition_header.department.name
#=> "Marketing"
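Note that deep_structify only descends into hashes, so the hashes inside the requisition_lines array stay plain hashes and `requisition_lines[0].description` would not work. A possible variant that also walks arrays is sketched below; the name `deep_structify_all` and the trimmed sample hash are made up for illustration.

```ruby
require 'ostruct'

# Hypothetical variant of deep_structify that also descends into arrays.
def deep_structify_all(value)
  case value
  when Hash
    OpenStruct.new(value.transform_values { |v| deep_structify_all(v) })
  when Array
    value.map { |v| deep_structify_all(v) }
  else
    value # leave strings, numbers, nil, etc. untouched
  end
end

# Trimmed-down sample in the shape of the question's hash.
sample = {
  "requisition_header" => {
    "department" => { "name" => "Marketing" },
    "requisition_lines" => [
      { "description" => "Cleaning Services for Building A", "line_num" => 1 }
    ]
  }
}

struct = deep_structify_all(sample)
puts struct.requisition_header.requisition_lines[0].description
```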
You can also do it by subclassing OpenStruct and overriding initialize:
require 'ostruct'
class DeepStruct < OpenStruct
  # Relies on OpenStruct internals (@table, new_ostruct_member), which may differ between Ruby versions.
  def initialize(hash=nil)
    @table = {}
    @hash_table = {}
    if hash
      hash.each do |k,v|
        @table[k.to_sym] = (v.is_a?(Hash) ? self.class.new(v) : v)
        @hash_table[k.to_sym] = v
        new_ostruct_member(k)
      end
    end
  end

  def to_h
    @hash_table
  end
end
Now you can do:
require 'deep_struct'
require 'bigdecimal'
# The time and decimal values are written as constructor calls so the literal is valid Ruby.
hash = {"requisition_header"=>{"requested_by"=>{"login"=>"coupasupport"}, "department"=>{"name"=>"Marketing"}, "ship_to_address"=>{"name"=>"Address_1431693296"}, "justification"=>nil, "attachments"=>[], "requisition_lines"=>[{"description"=>"Cleaning Services for Building A", "line_num"=>1, "need_by_date"=>Time.utc(2010, 9, 23, 7, 0, 0), "source_part_num"=>nil, "supp_aux_part_num"=>nil, "unit_price"=>BigDecimal("0.3E4"), "supplier"=>{"name"=>"amazon.com"}, "account"=>{"code"=>"SF-Marketing-Indirect", "account_type"=>{"name"=>"Ace Corporate"}}, "currency"=>{"code"=>"USD"}, "payment_term"=>{"code"=>"Net 30"}, "shipping_term"=>{"code"=>"Standard"}, "commodity"=>{"name"=>"Marketing-Services"}}]}}
mystruct = DeepStruct.new hash
mystruct.requisition_header.requested_by.login # => coupasupport
mystruct.requisition_header.to_h # => {"requested_by"=>{"login"=>"coupasupport"}
You could use BasicObject#method_missing:
Code
class Hash
def method_missing(key,*args)
(args.empty? && key?(key)) ? self[key] : super
end
end
Example
hash = { animals: {
pets: { dog: "Diva", cat: "Boots", python: "Stretch" },
farm: { pig: "Porky", chicken: "Little", sheep: "Baa" }
},
finishes: {
tinted: { stain: "Millers", paint: "Oxford" },
clear: { lacquer: "Target", varnish: "Topcoat" }
}
}
hash.finishes.tinted.stain
#=> "Millers"
hash.animals.pets.cat
#=> "Boots"
hash.animals.pets
#=> {:dog=>"Diva", :cat=>"Boots", :python=>"Stretch"}
hash.animals
#=> {:pets=>{:dog=>"Diva", :cat=>"Boots", :python=>"Stretch"},
# :farm=>{:pig=>"Porky", :chicken=>"Little", :sheep=>"Baa"}}
Reader challenge
There is a potential "gotcha" with this approach. I leave it to the reader to identify it. My example contains a clue. (Mind you, there may be other problems I haven't thought of.)
Why does the second output show only one element of the array? Is it still an Array, or is it already a Hash?
def printArray(arr)
arr.each { | j |
k, v = j.first
printf("%s %s %s \n", k, v, j)
}
end
print "Array 1\n"
printArray( [
{kk: { 'k1' => 'v1' }},
{kk: { 'k2' => 'v2' }},
{kk: { 'k3' => 'v3' }},
])
print "Array 2\n"
printArray( [
kk: { 'k1' => 'v1' },
kk: { 'k2' => 'v2' },
kk: { 'k3' => 'v3' },
])
exit
# Output:
#
# Array 1
# kk {"k1"=>"v1"} {:kk=>{"k1"=>"v1"}}
# kk {"k2"=>"v2"} {:kk=>{"k2"=>"v2"}}
# kk {"k3"=>"v3"} {:kk=>{"k3"=>"v3"}}
# Array 2
# kk {"k3"=>"v3"} {:kk=>{"k3"=>"v3"}}
Ruby interpreted the second example as an array with a single hash as its element (the curly braces are implied). It is equivalent to this:
[{ kk: { 'k1' => 'v1' }, kk: { 'k2' => 'v2' }, kk: { 'k3' => 'v3' }}]
Only the last 'kk' is shown because hashes can't have duplicate keys; only the last one sticks.
If you want an array with multiple hashes as elements, you need to use the syntax from your first example.
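The difference is easy to check by counting elements. This small sketch uses distinct keys (an assumption made here, to avoid the duplicate-key overwrite) so that both forms keep all their data:

```ruby
implicit = [a: 1, b: 2]          # braces implied: ONE hash with two keys
explicit = [{ a: 1 }, { b: 2 }]  # two separate one-key hashes

puts implicit.length       # 1
puts explicit.length       # 2
puts implicit.first.class  # Hash
puts implicit.first.keys.inspect
```

So the outer object is still an Array in both cases; only the number of hashes inside it differs.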
More cases in which Ruby implies the start of a hash:
# Only argument on method calls
def only_arg(obj)
puts obj.class
end
only_arg(bar: "baz") # => Hash
# Which is equivalent to:
only_arg({bar: "baz"}) # => Hash
# Last argument on method calls
def last_arg(ignored, obj)
puts obj.class
end
last_arg("ignored", bar: "baz") # => Hash
# Which is equivalent to:
last_arg("ignored", { bar: "baz" }) # => Hash
# Last element on an array
def last_on_array(arr)
puts arr.last.class
end
last_on_array(["something", "something", bar: "baz"]) # => Hash
# Which is equivalent to:
last_on_array(["something", "something", { bar: "baz" }]) # => Hash
I'm writing an API parser at the moment, and I'm working on formatting the data nicely.
So far, I have the following code:
data.each {|season| episodes[season["no"].to_i] = season["episode"].group_by{|i| i["seasonnum"].to_i}}
However, the only issue with this is that the output comes out like this:
8 => {
1 => [
[0] {
"epnum" => "150",
"seasonnum" => "01",
"prodnum" => "3X7802",
"airdate" => "2012-10-03",
"link" => "http://www.tvrage.com/Supernatural/episodes/1065195189",
"title" => "We Need to Talk About Kevin"
}
],
2 => [
[0] {
"epnum" => "151",
"seasonnum" => "02",
"prodnum" => "3X7803",
"airdate" => "2012-10-10",
"link" => "http://www.tvrage.com/Supernatural/episodes/1065217045",
"title" => "What's Up, Tiger Mommy?"
}
]
}
So there's a redundant array in each value of the secondary hash. How would I remove this array and just have the inside hash? So, for example I want:
8 => {
1 => {
"epnum" => "150",
"seasonnum" => "01",
"prodnum" => "3X7802",
"airdate" => "2012-10-03",
"link" => "http://www.tvrage.com/Supernatural/episodes/1065195189",
"title" => "We Need to Talk About Kevin"
}
,
etc.
EDIT: Here's the full file:
require 'httparty'
require 'awesome_print'
require 'debugger'
require 'active_support'
episodes = Hash.new{ [] }
response = HTTParty.get('http://services.tvrage.com/feeds/episode_list.php?sid=5410')
data = response.parsed_response['Show']['Episodelist']["Season"]
data.each { |season|
episodes[season["no"].to_i] = season["episode"].group_by{ |i|
i["seasonnum"].to_i
}
}
ap episodes
Input data: http://services.tvrage.com/feeds/episode_list.php?sid=5410
Wild guess:
data.each { |season|
episodes[season["no"].to_i] = season["episode"].group_by{ |i|
i["seasonnum"].to_i
}.first
}
It looks like you're using group_by (an array of entries per key) when you really want index_by (one entry per key). Note that index_by comes from ActiveSupport, not core Ruby.
data.each {|season| episodes[season["no"].to_i] = season["episode"].index_by {|i| i["seasonnum"].to_i}}
NOTE: If more than one episode can share the same seasonnum, you should keep group_by and accept an array of values here. If you're just building a convenient one-to-one lookup of episodes, index_by is what you want.
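If ActiveSupport is not available, the same one-entry-per-key lookup can be built with core Ruby. The sketch below uses made-up sample rows in the shape shown in the question and compares it with group_by:

```ruby
# Made-up sample rows in the shape shown in the question.
episodes = [
  { "seasonnum" => "01", "title" => "We Need to Talk About Kevin" },
  { "seasonnum" => "02", "title" => "What's Up, Tiger Mommy?" }
]

# group_by: every value is an array, even when it holds a single episode.
grouped = episodes.group_by { |e| e["seasonnum"].to_i }

# Plain-Ruby equivalent of index_by: every value is the episode hash itself.
indexed = episodes.each_with_object({}) do |e, by_num|
  by_num[e["seasonnum"].to_i] = e
end

puts grouped[1].class     # Array
puts indexed[1].class     # Hash
puts indexed[2]["title"]
```

As with index_by, a later episode with the same seasonnum would silently replace an earlier one.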
Why does the value in my data_dummy hash increase? I'd like to use it to initialize another hash with zero values!
fau[f.label][:hash] = data_dummy # ==>{"name 1" => 0, "name 2" => 0} but in the second loop it contains data from the first loop e.g. {"name 1" => 2, "name 2" => 0}
When I use a hash literal instead of the variable data_dummy, the code works as expected.
fau[f.label][:hash] = {"name 1" => 0, "name 2" => 0}
I can't do that because the 'name X' keys change.
That's strange to me!
complete code
fau = {}
series = []
labels = [{:value => 0, :text => ''}]
data_dummy = {}
source.each do |c|
data_dummy[c.name] = 0
end
i = 0
data_dummy.each do |k,v|
i += 1
labels.push({:value => i, :text => k})
end
source.each do |s|
logger.debug "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
logger.debug "Source: '#{s.name}'|'#{fault_labels[s.fault_id.to_s].label}' => #{s.cnt}"
con_name = s.name #TODO: Cut name (remove location like left,right, ...)
f = fault_labels[s.fault_id.to_s]
unless fau.has_key?(f.label)
# init faults-hash 'fau'
fau[f.label] = {:total => 0, :hash => {}, :color => f.color, :name => f.label} #, :data => []
# add all connector_names as keys with value = 0
logger.debug "init :hash with #{data_dummy}" # ==>{"name 1" => 0, "name 2" => 0} but in the second loop it contains data from the first loop e.g. {"name 1" => 2, "name 2" => 0}
fau[f.label][:hash] = data_dummy
# this way the number of incidents are all in the same order for each fault (first dimension key)
# and all get at least a value of 0
end
logger.debug "Count up fau['#{f.label}'][:total] = #{fau[f.label][:total]} + #{s.cnt} (where connector '#{s.name}' and fault '#{f.label}')"
logger.debug "Count up fau['#{f.label}'][:hash]['#{con_name}'] = #{fau[f.label][:hash][con_name]} + #{s.cnt}"
fau[f.label][:total] += s.cnt
fau[f.label][:hash][con_name] += s.cnt
logger.debug "result :hash with #{fau[f.label][:hash].inspect}}"
end
Because Ruby variables hold references to objects, an assignment such as hash2 = hash1 copies only the reference, not the hash. Modifying the hash through hash2 also changes what hash1 sees, since both names are just aliases for the same object.
You want to use the clone method instead.
hash2 = hash1.clone
See also How do I copy a hash in Ruby?
Note that even this creates only a shallow copy. If you have a nested hash (such as myhash = {"key1" => "value1", "key2" => {"key2a" => "value2a"}}), you will have to make a deep copy. According to Wayne Conrad's answer to the question above, the way to do that is:
def deep_copy(o)
Marshal.load(Marshal.dump(o))
end
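To see the difference between the two kinds of copy, the sketch below mutates the nested hash from the example above after copying it both ways. The variable names are only illustrative.

```ruby
def deep_copy(o)
  Marshal.load(Marshal.dump(o))
end

original = { "key1" => "value1", "key2" => { "key2a" => "value2a" } }

shallow = original.clone      # new outer hash, same inner hash object
deep    = deep_copy(original) # new outer hash and new inner hash

# Mutate the nested hash through the original.
original["key2"]["key2a"] = "changed"

puts shallow["key2"]["key2a"]  # "changed": the inner hash is shared
puts deep["key2"]["key2a"]     # "value2a": the inner hash was copied too
```

Marshal only works for objects it can serialize, so this approach is not suitable for structures that contain procs, IO objects, or hashes with a default block.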
If you want to make a copy of the hash, you need to use the dup method:
foo = {"name 1" => 0, "name 2" => 0}
bar = foo
foo["name 2"] += 1
foo
=> {"name 2"=>1, "name 1"=>0}
bar
=> {"name 2"=>1, "name 1"=>0}
baz = foo.dup
foo["name 2"] += 1
foo
=> {"name 2"=>2, "name 1"=>0}
baz
=> {"name 2"=>1, "name 1"=>0}
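Applied to the question, that means giving each fault its own copy of data_dummy. The following is a simplified sketch with made-up input (an array of label, connector name, and count triples standing in for source and fault_labels); a shallow dup is enough here because the values are integers.

```ruby
# Hypothetical stand-ins for the question's data.
data_dummy = { "name 1" => 0, "name 2" => 0 }
counts = [
  ["fault A", "name 1", 2],
  ["fault B", "name 1", 5],
  ["fault A", "name 2", 1]
]

fau = {}
counts.each do |label, con_name, cnt|
  # dup gives every fault its own counter hash instead of sharing data_dummy
  fau[label] ||= { total: 0, hash: data_dummy.dup }
  fau[label][:total] += cnt
  fau[label][:hash][con_name] += cnt
end

p fau["fault A"][:hash]
p fau["fault B"][:hash]
p data_dummy  # still all zeros
```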