How to output a hash to a CSV line - ruby

I'd like to map a hash to a CSV line.
I have a couple of objects in a hash:
person1 = {'first_name' => 'John', 'second_name' => 'Doe', 'favorite_color' => 'blue', 'favorite_band' => 'Backstreet Boys'}
person2 = {'first_name' => 'Susan', 'favorite_color' => 'green', 'second_name' => 'Smith'}
I want to transform this into a CSV file with the keys as columns and the values for each row.
I can easily create the headers by creating a CSV::Row like this:
h = CSV::Row.new(all_keys_as_array,[],true)
I cannot rely on the order and more important, not all values are filled all the time.
But when I now try to add rows to the table via << and array, the mapping of the headers is ignored. It has to be in the right order.
To demonstrate this, I wrote this little script:
require 'csv'
person1 = {'first_name' => 'John', 'second_name' => 'Doe', 'favorite_color' => 'blue', 'favorite_band' => 'Backstreet Boys'}
person2 = {'first_name' => 'Susan', 'favorite_color' => 'green', 'second_name' => 'Smith'}
persons = [person1, person2]
all_keys_as_array = %w{first_name second_name favorite_color favorite_band}
h = CSV::Row.new(all_keys_as_array,[],true)
t = CSV::Table.new([h])
persons.each do |p|
r = CSV::Row.new([],[],false)
p.each do |k, v|
r << {k => v}
end
t << r
end
puts t.to_csv
I would expect this output:
first_name,last_name,favorite_color,favorite_band
John,Doe,blue,Backstreet Boys
Susan,Smith,green,
Instead the values in the order as they appear. So the output is this:
first_name,second_name,favorite_color,favorite_band
John,Doe,blue,Backstreet Boys
Susan,green,Smith
The strangest part is, when I do a lookup via ['key'] I get the correct values:
puts "favorite_bands: #{t['favorite_band']}"
> favorite_bands: [nil, "Backstreet Boys", nil]
So is there any way to write to a CSV file as I expect?

You can just iterate over the column names
persons.each do |person|
r = CSV::Row.new([],[],false)
all_keys_as_array.each do |key|
r << person[key]
end
t << r
end

The current suggestions both work, so thank you for this.
However I stumbled upon a more elegant solution by using CSV::Row#fields.
Then I can just convert the CSV row to the correct array before adding it to the table:
t << r.fields(*all_keys_as_array)

You may be able to just use #to_csv and dispense with all the CSV::Row and CSV::Table stuff:
headers = %w{first_name second_name favorite_color favorite_band} # fyi if you have commas in here you'll get messed up
people = []
people << {'first_name' => 'John', 'second_name' => 'Doe', 'favorite_color' => 'blue', 'favorite_band' => 'Backstreet Boys'}
people << {'first_name' => 'Susan', 'favorite_color' => 'green', 'second_name' => 'Smith'}
require 'csv'
File.open('output.csv', 'w') do |f|
f.puts headers.to_csv
people.each do |person|
f.puts headers.map { |h| person[h] }.to_csv
end
end

Related

Most performant way to group/summarise two hashes?

I have two hashes with some data that I need to aggregate. The first one is a mapping of which ids (id_1, id_2, id_3, id_4) belong under what category (a, b, c):
hash_1 = {'a' => ['id_1','id_2'], 'b' => ['id_3'], 'c' => ['id_4']}
The second hash holds values of how many events happened per id for a given date (date_1, date_2, date_3):
hash_2 = {
'id_1' => {'date_1' => 5, 'date_2' => 6, 'date_3' => 8},
'id_2' => {'date_1' => 0, 'date_3' => 6},
'id_3' => {'date_1' => 0, 'date_2' => nil, 'date_3' => 1},
'id_4' => {'date_1' => 10, 'date_2' => 1}
}
What I want is to get the total event per category (a,b,c). For the above example, the result would look something like:
hash_3 = {'a' => (5+6+8+0+6), 'b' => (0+0+1), 'c' => (10+1)}
My problem is, that there are about 5000 categories, each pointing to typically 1 to 3 ids, and each ID having event counts for 30 dates or more. So this takes quite a bit of computation. What will be the most performant (time effective) way to do this grouping in Ruby?
update
This is what I tried so far (took like 6-8 seconds!, horribly slow):
def total_clicks_per_category
{}.tap do |res|
hash_1.each do |cat, ids|
res[cat] = total_event_per_ids(ids)
end
end
end
def total_event_per_ids(ids)
ids.reduce(0) do |memo, id|
events = hash_2.fetch(id, {})
memo + (events.values.reduce(:+) || 0)
end
end
P.S. I’m using Ruby 2.3.
I'm writing this on a phone so I cannot test right now, but it looks OK.
g = hash_2.each_with_object({}) { |(k,v),g| g[k] = v.values.compact.sum }
hash_3 = hash_1.each_with_object({}) { |(k,v),h| h[k] = g.values_at(*v).sum }
First, create an intermediate hash that holds the sum of hash_2:
hash_4 = hash_2.map{|k, v| [k, v.values.inject(:+)]}.to_h
# => {"id_1"=>19, "id_2"=>6, "id_3"=>1, "id_4"=>11}
Then do the final summation:
hash_3 = hash_1.map{|k, v| [k, v.map{|k| hash_4[k]}.inject(:+)]}.to_h
# => {"a"=>25, "b"=>1, "c"=>11}
Theory
5000*3*30 isn't that many. Ruby probably will need a second at most for this kind of job.
Hash lookup is fast by default, you won't be able to optimize much.
You could pre-calculate hash_2_sum, though :
hash_2_sum = {
'id_1' => 5+6+8,
'id_2' => 0+6,
'id_3' => 0+0+1,
'id_4' => 10+1
}
A loop on hash1 with hash_2_sum lookup, and you're done.
Code
Your example has been updated with some nil values. You need to remove them with compact, and make sure the sum is 0 when no element is found with inject(0, :+):
hash_1 = {'a' => ['id_1','id_2'], 'b' => ['id_3'], 'c' => ['id_4']}
hash_2 = {
'id_1' => { 'date_1' => 5, 'date_2' => 6, 'date_3' => 8 },
'id_2' => { 'date_1' => 0, 'date_3' => 6 },
'id_3' => { 'date_1' => 0, 'date_2' => nil, 'date_3' => 1 },
'id_4' => { 'date_1' => 10, 'date_2' => 1 }
}
hash_2_sum = hash_2.each_with_object({}) do |(key, dates), sum|
sum[key] = dates.values.compact.inject(0, :+)
end
hash_3 = hash_1.each_with_object({}) do |(key, ids), sum|
sum[key] = hash_2_sum.values_at(*ids).inject(0, :+)
end
# {"a"=>25, "b"=>1, "c"=>11}
Note
{}.tap do |res|
hash_1.each do |cat, ids|
res[cat] = total_event_per_ids(ids)
end
end
isn't very readable IMHO.
You can either use each_with_object or Array#to_h :
result = [1, 2, 3].each_with_object({}) do |i, hash|
hash[i] = i * i
end
#=> {1=>1, 2=>4, 3=>9}
result = [1, 2, 3].map { |i| [i, i * i] }.to_h
#=> {1=>1, 2=>4, 3=>9}

Ruby converting an array to a hash

I have the following string and I would like to convert it to a hash printing the below result
string = "Cow, Bill, Phone, Flour"
hash = string.split(",")
>> {:animal => "Cow", :person: "Bill", :gadget => "Phone",
:grocery => "Flour"}
hash = Hash[[:animal, :person, :gadget, :grocery].zip(string.split(/,\s*/))]
The answer by #Max is quite nice. You might understand it better as:
def string_to_hash(str)
values = str.split(/,\s*/)
names = [:animal, :person, :gadget, :grocery]
Hash[names.zip(values)]
end
Here is a less sophisticated approach:
def string_to_hash(str)
parts = str.split(/,\s*/)
Hash[
:animal => parts[0],
:person => parts[1],
:gadget => parts[2],
:grocery => parts[3],
]
end

Merge hashes based on particular key/value pair in ruby

I am trying to merge an array of hashes based on a particular key/value pair.
array = [ {:id => '1', :value => '2'}, {:id => '1', :value => '5'} ]
I would want the output to be
{:id => '1', :value => '7'}
As patru stated, in sql terms this would be equivalent to:
SELECT SUM(value) FROM Hashes GROUP BY id
In other words, I have an array of hashes that contains records. I would like to obtain the sum of a particular field, but the sum would grouped by key/value pairs. In other words, if my selection criteria is :id as in the example above, then it would seperate the hashes into groups where the id was the same and the sum the other keys.
I apologize for any confusion due to the typo earlier.
Edit: The question has been clarified since I first posted my answer. As a result, I have revised my answer substantially.
Here are two "standard" ways of addressing this problem. Both use Enumerable#select to first extract the elements from the array (hashes) that contain the given key/value pair.
#1
The first method uses Hash#merge! to sequentially merge each array element (hashes) into a hash that is initially empty.
Code
def doit(arr, target_key, target_value)
qualified = arr.select {|h|h.key?(target_key) && h[target_key]==target_value}
return nil if qualified.empty?
qualified.each_with_object({}) {|h,g|
g.merge!(h) {|k,gv,hv| k == target_key ? gv : (gv.to_i + hv.to_i).to_s}}
end
Example
arr = [{:id => '1', :value => '2'}, {:id => '2', :value => '3'},
{:id => '1', :chips => '4'}, {:zd => '1', :value => '8'},
{:cat => '2', :value => '3'}, {:id => '1', :value => '5'}]
doit(arr, :id, '1')
#=> {:id=>"1", :value=>"7", :chips=>"4"}
Explanation
The key here is to use the version of Hash#merge! that uses a block to determine the value for each key/value pair whose key appears in both of the hashes being merged. The two values for that key are represented above by the block variables hv and gv. We simply want to add them together. Note that g is the (initially empty) hash object created by each_with_object, and returned by doit.
target_key = :id
target_value = '1'
qualified = arr.select {|h|h.key?(target_key) && h[target_key]==target_value}
#=> [{:id=>"1", :value=>"2"},{:id=>"1", :chips=>"4"},{:id=>"1", :value=>"5"}]
qualified.empty?
#=> false
qualified.each_with_object({}) {|h,g|
g.merge!(h) {|k,gv,hv| k == target_key ? gv : (gv.to_i + hv.to_i).to_s}}
#=> {:id=>"1", :value=>"7", :chips=>"4"}
#2
The other common way to do this kind of calculation is to use Enumerable#flat_map, followed by Enumerable#group_by.
Code
def doit(arr, target_key, target_value)
qualified = arr.select {|h|h.key?(target_key) && h[target_key]==target_value}
return nil if qualified.empty?
qualified.flat_map(&:to_a)
.group_by(&:first)
.values.map { |a| a.first.first == target_key ? a.first :
[a.first.first, a.reduce(0) {|tot,s| tot + s.last}]}.to_h
end
Explanation
This may look complex, but it's not so bad if you break it down into steps. Here's what's happening. (The calculation of qualified is the same as in #1.)
target_key = :id
target_value = '1'
c = qualified.flat_map(&:to_a)
#=> [[:id,"1"],[:value,"2"],[:id,"1"],[:chips,"4"],[:id,"1"],[:value,"5"]]
d = c.group_by(&:first)
#=> {:id=>[[:id, "1"], [:id, "1"], [:id, "1"]],
# :value=>[[:value, "2"], [:value, "5"]],
# :chips=>[[:chips, "4"]]}
e = d.values
#=> [[[:id, "1"], [:id, "1"], [:id, "1"]],
# [[:value, "2"], [:value, "5"]],
# [[:chips, "4"]]]
f = e.map { |a| a.first.first == target_key ? a.first :
[a.first.first, a.reduce(0) {|tot,s| tot + s.last}] }
#=> [[:id, "1"], [:value, "7"], [:chips, "4"]]
f.to_h => {:id=>"1", :value=>"7", :chips=>"4"}
#=> {:id=>"1", :value=>"7", :chips=>"4"}
Comment
You may wish to consider makin the values in the hashes integers and exclude the target_key/target_value pairs from qualified:
arr = [{:id => 1, :value => 2}, {:id => 2, :value => 3},
{:id => 1, :chips => 4}, {:zd => 1, :value => 8},
{:cat => 2, :value => 3}, {:id => 1, :value => 5}]
target_key = :id
target_value = 1
qualified = arr.select { |h| h.key?(target_key) && h[target_key]==target_value}
.each { |h| h.delete(target_key) }
#=> [{:value=>2}, {:chips=>4}, {:value=>5}]
return nil if qualified.empty?
Then either
qualified.each_with_object({}) {|h,g| g.merge!(h) { |k,gv,hv| gv + hv } }
#=> {:value=>7, :chips=>4}
or
qualified.flat_map(&:to_a)
.group_by(&:first)
.values
.map { |a| [a.first.first, a.reduce(0) {|tot,s| tot + s.last}] }.to_h
#=> {:value=>7, :chips=>4}

I'm using the ruby gem "roo" to read xlsx file, how to return the content of one column as an array?

require 'gchart'
require 'rubygems'
require 'roo'
oo = Excelx.new("datav.xlsx")
oo.default_sheet = oo.sheets.first
2.upto(47) do |line|
data_a = [oo.cell(line,'B')]
data_b = [oo.cell(line,'E')]
chart_a = Gchart.new( :type => 'line',
:title => "A",
:theme => :keynote,
:width => 600,
:height => 500,
:data => data_a,
:line_colors => 'e0440e',
:axis_with_labels => ['x', 'y'],
:axis_range => [[0,50,20], [0,3000,500]],
:filename => "tmp/chart_a.png")
chart_b = Gchart.new( :type => 'line',
:title => "B",
:theme => :keynote,
:width => 600,
:height => 500,
:data => data_b,
:line_colors => 'e62ae5',
:axis_with_labels => ['x', 'y'],
:axis_range => [[0,50,20], [0,3000,500]],
:filename => "tmp/chart_b.png")
# Record file in filesystem
chart_a.file
chart_b.file
end
This will get every cell's content of column B and E to be the argument :data alone. How to return it as an array? If roo can't return array, then is there any else gem do this?
there is a column method that returns values of a given column as an array. Calling oo.column(2) should return you values for column B. oo.column('B') might work also. haven't tested it.
I needed the row back as a hash to be compatible with the logic I used for FasterCSV. This will give you a hash of the first row as the key and current line as the value.
def row_from_excel(s, line)
row = {}
s.first_column.upto(s.last_column) do |col|
cell_name = s.cell(1, col)
logger.debug "************* #{col} => #{cell_name} => #{s.cell(line, col)}"
row[cell_name] = s.cell(line, col)
end
row
end
s = Excelx.new(path_to_file) # or Excel.new(path_to_file)
2.upto(s.last_row) do |line|
row = row_from_excel(s, line)
end

How to save a hash into a CSV

I am new in ruby so please forgive the noobishness.
I have a CSV with two columns. One for animal name and one for animal type.
I have a hash with all the keys being animal names and the values being animal type. I would like to write the hash to the CSV without using fasterCSV. I have thought of several ideas what would be easiest.. here is the basic layout.
require "csv"
def write_file
h = { 'dog' => 'canine', 'cat' => 'feline', 'donkey' => 'asinine' }
CSV.open("data.csv", "wb") do |csv|
csv << [???????????]
end
end
When I opened the file to read from it I opened it File.open("blabla.csv", headers: true)
Would it be possible to write back to the file the same way?
If you want column headers and you have multiple hashes:
require 'csv'
hashes = [{'a' => 'aaaa', 'b' => 'bbbb'}]
column_names = hashes.first.keys
s=CSV.generate do |csv|
csv << column_names
hashes.each do |x|
csv << x.values
end
end
File.write('the_file.csv', s)
(tested on Ruby 1.9.3-p429)
Try this:
require 'csv'
h = { 'dog' => 'canine', 'cat' => 'feline', 'donkey' => 'asinine' }
CSV.open("data.csv", "wb") {|csv| h.to_a.each {|elem| csv << elem} }
Will result:
1.9.2-p290:~$ cat data.csv
dog,canine
cat,feline
donkey,asinine
I think the simplest solution to your original question:
def write_file
h = { 'dog' => 'canine', 'cat' => 'feline', 'donkey' => 'asinine' }
CSV.open("data.csv", "w", headers: h.keys) do |csv|
csv << h.values
end
end
With multiple hashes that all share the same keys:
def write_file
hashes = [ { 'dog' => 'canine', 'cat' => 'feline', 'donkey' => 'asinine' },
{ 'dog' => 'rover', 'cat' => 'kitty', 'donkey' => 'ass' } ]
CSV.open("data.csv", "w", headers: hashes.first.keys) do |csv|
hashes.each do |h|
csv << h.values
end
end
end
CSV can take a hash in any order, exclude elements, and omit a params not in the HEADERS
require "csv"
HEADERS = [
'dog',
'cat',
'donkey'
]
def write_file
CSV.open("data.csv", "wb", :headers => HEADERS, :write_headers => true) do |csv|
csv << { 'dog' => 'canine', 'cat' => 'feline', 'donkey' => 'asinine' }
csv << { 'dog' => 'canine'}
csv << { 'cat' => 'feline', 'dog' => 'canine', 'donkey' => 'asinine' }
csv << { 'dog' => 'canine', 'cat' => 'feline', 'donkey' => 'asinine', 'header not provided in the options to #open' => 'not included in output' }
end
end
write_file # =>
# dog,cat,donkey
# canine,feline,asinine
# canine,,
# canine,feline,asinine
# canine,feline,asinine
This makes working with the CSV class more flexible and readable.
I tried the solutions here but got an incorrect result (values in wrong columns) since my source is a LDIF file that not always has all the values for a key. I ended up using the following.
First, when building up the hash I remember the keys in a separate array which I extend with the keys that are not allready there.
# building up the array of hashes
File.read(ARGV[0]).each_line do |lijn|
case
when lijn[0..2] == "dn:" # new record
record = {}
when lijn.chomp == '' # end record
if record['telephonenumber'] # valid record ?
hashes << record
keys = keys.concat(record.keys).uniq
end
when ...
end
end
The important line here is keys = keys.concat(record.keys).uniq which extends the array of keys when new keys (headers) are found.
Now the most important: converting our hashes to a CSV
CSV.open("export.csv", "w", {headers: keys, col_sep: ";"}) do |row|
row << keys # add the headers
hashes.each do |hash|
row << hash # the whole hash, not just the array of values
end
end
[BEWARE] All the answers in this thread are assuming that the order of the keys defined in the hash will be constant amongst all rows.
To prevent problems (that I am facing right now) where some values are assigned to the wrong keys in the csv (Ex:)
hahes = [
{:cola => "hello", :colb => "bye"},
{:colb => "bye", :cola => "hello"}
]
producing the following table using the code from the majority (including best answer) of the answers on this thread:
cola | colb
-------------
hello | bye
-------------
bye | hello
You should do this instead:
require "csv"
csv_rows = [
{:cola => "hello", :colb => "bye"},
{:colb => "bye", :cola => "hello"}
]
column_names = csv_rows.first.keys
s=CSV.generate do |csv|
csv << column_names
csv_rows.each do |row|
csv << column_names.map{|column_name| row[column_name]} #To be explicit
end
end
Try this:
require 'csv'
data = { 'one' => '1', 'two' => '2', 'three' => '3' }
CSV.open("data.csv", "a+") do |csv|
csv << data.keys
csv << data.values
end
Lets we have a hash,
hash_1 = {1=>{:rev=>400, :d_odr=>3}, 2=>{:rev=>4003, :d_price=>300}}
The above hash_1 having keys as some id 1,2,.. and values to those are again hash with some keys as (:rev, :d_odr, :d_price).
Suppose we want a CSV file with headers,
headers = ['Designer_id','Revenue','Discount_price','Impression','Designer ODR']
Then make a new array for each value of hash_1 and insert it in CSV file,
CSV.open("design_performance_data_temp.csv", "w") do |csv|
csv << headers
csv_data = []
result.each do |design_data|
csv_data << design_data.first
csv_data << design_data.second[:rev] || 0
csv_data << design_data.second[:d_price] || 0
csv_data << design_data.second[:imp] || 0
csv_data << design_data.second[:d_odr] || 0
csv << csv_data
csv_data = []
end
end
Now you are having design_performance_data_temp.csv file saved in your corresponding directory.
Above code can further be optimized.

Resources