Loading data from a CSV with header in Ruby

I want to load a CSV file and create objects based on the data. The file has the following structure:
product code;product name;product price;units
RTF0145;Mixer;659;15
GTF4895;PC;9999;25
While loading the data I want to skip the first row with headers, but I have trouble using the {:headers => true} option: the method does nothing and no error is raised.
def Store.load_data(file, separator, headers = true)
  begin
    @items = []
    CSV.open(file, "r", {:col_sep => separator}, {:headers => headers}) do |csv|
      csv.each do |product|
        @items << Store.new(product["product code"], product["product name"], product["price"], product["units"])
      end
    end
  rescue
  end
end
I call the method like this:
Store.load_data("products.csv", ";")
If I use it without the headers argument everything works as expected:
def Store.load_data(file, separator, headers = true)
  begin
    @items = []
    CSV.foreach(file, { :col_sep => separator }) do |row|
      @items << Store.new(row[0], row[1], row[2], row[3]) unless row[0] == "product code"
    end
  rescue
  end
end

The signature for the CSV.open method is:
CSV.open(filename, mode = "rb", options = Hash.new)
Therefore calling it like this:
CSV.open(file, "r", { ... }, { ... })
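...is incorrect, and Ruby should raise an exception when you do that. The empty rescue clause in your method most likely swallows that exception, which would explain why nothing seems to happen.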
The correct way to call CSV.open with a Hash is either:
CSV.open(file, "r", { :a => b, :c => d, :e => f })
Or:
CSV.open(file, "r", :a => b, :c => d, :e => f)
So, to fix your problem the solution should be to change this:
CSV.open(file, "r", { :col_sep => separator }, { :headers => headers })
To this:
CSV.open(file, "r", { :col_sep => separator, :headers => headers })
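As a rough, self-contained sketch of the fixed call, the snippet below passes both options in a single options list and reads rows by header name. The Product struct and the load_products method are hypothetical stand-ins for the asker's Store class, and the sample file is generated on the fly. Note that the column in the file is called "product price", so looking up "price" as in the question would return nil.

```ruby
require "csv"
require "tempfile"

# Hypothetical stand-in for the asker's Store class.
Product = Struct.new(:code, :name, :price, :units)

# Pass col_sep and headers together in one options list. With headers: true
# each row can be indexed by column name and the header line is not yielded.
def load_products(path, separator)
  CSV.foreach(path, col_sep: separator, headers: true).map do |row|
    Product.new(row["product code"], row["product name"],
                row["product price"].to_i, row["units"].to_i)
  end
end

# Sample data copied from the question, written to a temporary file.
sample = Tempfile.new(["products", ".csv"])
sample.write("product code;product name;product price;units\n" \
             "RTF0145;Mixer;659;15\n" \
             "GTF4895;PC;9999;25\n")
sample.close

products = load_products(sample.path, ";")
products.each { |product| puts product.to_a.join(" | ") }
```

Leaving out the blanket rescue while debugging makes any argument error visible immediately.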

Related

Access to merged cells using Ruby-Roo

According to the example below, the value is stored only in A1; the other cells return nil.
How is it possible to get A1's value from the other merged cells, or simply to check the range of the A1 cell?
Here is my take: if all merged fields are the same as in the previous row, then the non-merged fields should become an array.
require 'roo'
xlsx = Roo::Excelx.new(__dir__ + "/output.xlsx", { expand_merged_ranges: true })
parsed = xlsx.sheet(0).parse(headers: true).drop(1)
parsed_merged = []
.tap do |parsed_merged|
parsed.each do |x|
if parsed_merged.empty?
parsed_merged << {
"field_non_merged1" => x["field_non_merged1"],
"field_merged1" => [x["field_merged1"]],
"field_merged2" => [x["field_merged2"]],
"field_merged3" => [x["field_merged3"]],
"field_merged4" => [x["field_merged4"]],
"field_non_merged2" => x["field_non_merged2"],
"field_non_merged3" => x["field_non_merged3"],
}
else
field_merged1_is_same_as_prev = x["field_non_merged1"] == parsed_merged.last["field_non_merged1"]
field_merged2_is_same_as_prev = x["field_non_merged2"] == parsed_merged.last["field_non_merged2"]
field_merged3_is_same_as_prev = x["field_non_merged3"] == parsed_merged.last["field_non_merged3"]
merged_rows_are_all_same_as_prev = field_merged1_is_same_as_prev && field_merged2_is_same_as_prev && field_merged3_is_same_as_prev
if merged_rows_are_all_same_as_prev
parsed_merged.last["field_merged1"].push x["field_merged1"]
parsed_merged.last["field_merged2"].push x["field_merged2"]
parsed_merged.last["field_merged3"].push x["field_merged3"]
parsed_merged.last["field_merged4"].push x["field_merged4"]
else
parsed_merged << {
"field_non_merged1" => x["field_non_merged1"],
"field_merged1" => [x["field_merged1"]],
"field_merged2" => [x["field_merged2"]],
"field_merged3" => [x["field_merged3"]],
"field_merged4" => [x["field_merged4"]],
"field_non_merged2" => x["field_non_merged2"],
"field_non_merged3" => x["field_non_merged3"],
}
end
end
end
end
.map do |x|
{
"field_non_merged1" => x["field_non_merged1"],
"field_merged1" => x["field_merged1"].compact.uniq,
"field_merged2" => x["field_merged2"].compact.uniq,
"field_merged3" => x["field_merged3"].compact.uniq,
"field_merged4" => x["field_merged4"].compact.uniq,
"field_non_merged2" => x["field_non_merged2"],
"field_non_merged3" => x["field_non_merged3"],
}
end
This is not possible without first assigning the value to all the cells of the range. This is the case even in Excel VBA.
See this sample:
require 'axlsx'
package = Axlsx::Package.new
wb = package.workbook
wb.add_worksheet(:name => "Basic Worksheet") do |sheet|
  sheet.add_row ["Val", nil]
  sheet.add_row [nil, nil]
  merged = sheet.merge_cells('A1:B2')
  p sheet.rows[0].cells[0].value # "Val"
  p sheet.rows[0].cells[1].value # nil
  sheet[*merged].each{|cell|cell.value = sheet[*merged].first.value}
  p sheet.rows[0].cells[0].value # "Val"
  p sheet.rows[0].cells[1].value # "Val"
end
package.serialize('./simple.xlsx')
Please add a sample yourself next time so that we see which gem you used, which code, error etc.
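The idea of copying the anchor value into every cell of a merged range can also be sketched without any spreadsheet gem. The snippet below is only an illustration: it assumes the sheet has already been read into a nested array, and fill_merged_range with its zero-based row and column ranges is a hypothetical helper, not part of Roo or Axlsx.

```ruby
# Copy the top-left (anchor) value of a merged range into every cell of it.
# grid is an array of rows; rows and cols are zero-based index ranges.
def fill_merged_range(grid, rows, cols)
  anchor = grid[rows.first][cols.first]
  rows.each do |r|
    cols.each { |c| grid[r][c] = anchor }
  end
  grid
end

# A1:B2 is merged, column C is not.
grid = [["Val", nil, "x"],
        [nil,   nil, "y"]]
fill_merged_range(grid, 0..1, 0..1)
p grid
```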

Ruby Read and Write CSV with Quotes

I'd like to read in a csv row, update one field then output the row again with quotes.
Row Example Input => "Joe", "Blow", "joe@blow.com"
Desired Row Example Output => "Joe", "Blow", "xxxx@xxxx.xxx"
My script below outputs => Joe, Blow, xxxx@xxxx.xxx
It loses the double quotes which I want to retain.
I've tried various options but no joy so far .. any tips?
Many thanks!
require 'csv'
CSV.foreach('transactions.csv',
            :quote_char => '"',
            :col_sep => ",",
            :headers => true,
            :header_converters => :symbol) do |row|
  row[:customer_email] = 'xxxx@xxxx.xxx'
  puts row
end
Quotes in CSV fields are usually unnecessary unless the field itself contains a delimiter or a newline character. But you can force the CSV output to always use quotes. For that, you need to set :force_quotes => true:
CSV.foreach('transactions.csv',
:quote_char=>'"',
:col_sep =>",",
:headers => true,
:force_quotes => true,
:header_converters => :symbol ) do |row|
You can manually add them to all your items
Hash[row.map { |k,v| [k,"\"#{v}\""] }]
(edited because I forgot you had a hash and not an array)
Thanks Justin L.
Built on your solution and ended up with this.
I get the feeling Ruby has something more elegant but this does what I need:
require 'csv'
CSV.foreach('trans.csv',
            :quote_char => '"',
            :col_sep => ",",
            :headers => true,
            :header_converters => :symbol) do |row|
  row[:customer_email] = 'xxxx@xxxx.xxx'
  row = Hash[row.map { |k,v| [k,"\"#{v}\""] }]
  new_row = ""
  row.each_with_index do |(k, v), i|
    new_row += v.to_s
    if i != row.length - 1
      new_row += ','
    end
  end
  puts new_row
end
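A possibly more compact variant, offered as a sketch rather than a definitive answer: force_quotes is an output option, so it can be handed to CSV::Row#to_csv when the row is written back out. The input string and its first two column names are made up for illustration, and the fields are written without spaces after the commas so that the parser accepts the quoting.

```ruby
require "csv"

# Hypothetical input; only the customer_email column name comes from the question.
input = "first_name,last_name,customer_email\n" \
        "\"Joe\",\"Blow\",\"joe@blow.com\"\n"

quoted_lines = []
CSV.parse(input, headers: true, header_converters: :symbol) do |row|
  row[:customer_email] = "xxxx@xxxx.xxx"
  # force_quotes wraps every field in the quote character on output.
  quoted_lines << row.to_csv(force_quotes: true)
end

puts quoted_lines
```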

Create a new Ruby CSV object with headers in a single csv.new() line

I'm trying to create a new CSV object with only the header row in it, but the headers are not set until I call read():
[32] pry(main)> c = CSV.new("Keyword,Index,Page,Index in Page,Type,Title,URL", :headers => :first_row, :write_headers => true, :return_headers => true)
=> <#CSV io_type:StringIO encoding:UTF-8 lineno:0 col_sep:"," row_sep:"\n" quote_char:"\"" headers:true>
[33] pry(main)> c.headers
=> true
[34] pry(main)> c.read
=> #<CSV::Table mode:col_or_row row_count:1>
[35] pry(main)> c.headers
=> ["Keyword", "Index", "Page", "Index in Page", "Type", "Title", "URL"]
Why is that? Why can't I get a properly working CSV object with my single CSV.new line?
As the documentation will tell you, it treats the string as if it were the contents of a file (i.e. a StringIO), so you still have to read the string just as you would any other IO source.
If you want to set the headers explicitly, you pass an array as the :headers parameter.
There does not appear to be a way to do this in one call, but you can easily remedy that with a custom method of your own:
Given:
def new_csv(headers, data)
  csv = CSV.new(data, headers: headers, write_headers: true, return_headers: true)
  csv.read
  csv
end
You can use it as:
csv = new_csv("Header 1, Header 2", "abc,def")
=> <#CSV io_type:StringIO encoding:UTF-8 lineno:1 col_sep:"," row_sep:"\n" quote_char:"\"" headers:["abc", "def"]>
csv.headers
=> ["Header 1", "Header 2"]
Hope that helps.
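To illustrate the point about passing an array as :headers, here is a minimal sketch with made-up data. It reads the rows first and only then inspects the headers, since what CSV#headers reports before the first read may differ between versions of the library.

```ruby
require "csv"

# Explicit headers; the data string contains only data rows.
headers = ["Keyword", "Index", "Page"]
csv = CSV.new("ruby,1,3\n", headers: headers)

table = csv.read          # CSV::Table holding the parsed rows
p table.headers           # the header names supplied above
p table[0]["Keyword"]     # first row, looked up by header name
```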

How to save a hash into a CSV

I am new to Ruby, so please forgive the noobishness.
I have a CSV with two columns. One for animal name and one for animal type.
I have a hash with all the keys being animal names and the values being animal types. I would like to write the hash to the CSV without using FasterCSV. I have thought of several ideas, but I am not sure which would be easiest. Here is the basic layout:
require "csv"
def write_file
  h = { 'dog' => 'canine', 'cat' => 'feline', 'donkey' => 'asinine' }
  CSV.open("data.csv", "wb") do |csv|
    csv << [???????????]
  end
end
When I opened the file to read from it, I used File.open("blabla.csv", headers: true).
Would it be possible to write back to the file the same way?
If you want column headers and you have multiple hashes:
require 'csv'
hashes = [{'a' => 'aaaa', 'b' => 'bbbb'}]
column_names = hashes.first.keys
s = CSV.generate do |csv|
  csv << column_names
  hashes.each do |x|
    csv << x.values
  end
end
File.write('the_file.csv', s)
(tested on Ruby 1.9.3-p429)
Try this:
require 'csv'
h = { 'dog' => 'canine', 'cat' => 'feline', 'donkey' => 'asinine' }
CSV.open("data.csv", "wb") {|csv| h.to_a.each {|elem| csv << elem} }
Will result:
1.9.2-p290:~$ cat data.csv
dog,canine
cat,feline
donkey,asinine
I think the simplest solution to your original question:
def write_file
  h = { 'dog' => 'canine', 'cat' => 'feline', 'donkey' => 'asinine' }
  CSV.open("data.csv", "w", headers: h.keys) do |csv|
    csv << h.values
  end
end
With multiple hashes that all share the same keys:
def write_file
  hashes = [ { 'dog' => 'canine', 'cat' => 'feline', 'donkey' => 'asinine' },
             { 'dog' => 'rover', 'cat' => 'kitty', 'donkey' => 'ass' } ]
  CSV.open("data.csv", "w", headers: hashes.first.keys) do |csv|
    hashes.each do |h|
      csv << h.values
    end
  end
end
When headers are set, CSV can take a hash with its keys in any order, leave missing elements blank, and omit any keys that are not in HEADERS:
require "csv"
HEADERS = [
  'dog',
  'cat',
  'donkey'
]
def write_file
  CSV.open("data.csv", "wb", :headers => HEADERS, :write_headers => true) do |csv|
    csv << { 'dog' => 'canine', 'cat' => 'feline', 'donkey' => 'asinine' }
    csv << { 'dog' => 'canine'}
    csv << { 'cat' => 'feline', 'dog' => 'canine', 'donkey' => 'asinine' }
    csv << { 'dog' => 'canine', 'cat' => 'feline', 'donkey' => 'asinine', 'header not provided in the options to #open' => 'not included in output' }
  end
end
write_file # =>
# dog,cat,donkey
# canine,feline,asinine
# canine,,
# canine,feline,asinine
# canine,feline,asinine
This makes working with the CSV class more flexible and readable.
I tried the solutions here but got an incorrect result (values in the wrong columns), since my source is an LDIF file that does not always have all the values for a key. I ended up using the following.
First, when building up the hash I remember the keys in a separate array, which I extend with the keys that are not already there.
# building up the array of hashes
File.read(ARGV[0]).each_line do |lijn|
  case
  when lijn[0..2] == "dn:" # new record
    record = {}
  when lijn.chomp == '' # end record
    if record['telephonenumber'] # valid record ?
      hashes << record
      keys = keys.concat(record.keys).uniq
    end
  when ...
  end
end
The important line here is keys = keys.concat(record.keys).uniq which extends the array of keys when new keys (headers) are found.
Now the most important part: converting our hashes to a CSV.
CSV.open("export.csv", "w", headers: keys, col_sep: ";") do |csv|
  csv << keys # add the headers
  hashes.each do |hash|
    csv << hash # the whole hash, not just the array of values
  end
end
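The approach can be tried out end to end with the sketch below. The two records are invented for illustration, and the union of keys is collected in one pass with flat_map instead of inside an LDIF parser. It uses write_headers: true rather than appending the keys by hand, which is an alternative to the code above, not what the answer itself does.

```ruby
require "csv"

# Hypothetical records with different key sets and a different key order.
hashes = [
  { "cn" => "Ann", "telephonenumber" => "123" },
  { "telephonenumber" => "456", "mail" => "bob@example.com", "cn" => "Bob" }
]

# Union of all keys, in order of first appearance.
keys = hashes.flat_map(&:keys).uniq

# With headers set, each hash is written in header order and
# missing keys become empty fields.
output = CSV.generate(headers: keys, write_headers: true, col_sep: ";") do |csv|
  hashes.each { |hash| csv << hash }
end

puts output
```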
[BEWARE] All the answers in this thread assume that the order of the keys defined in the hash is the same across all rows.
Otherwise some values end up under the wrong keys in the CSV (a problem I am facing right now). For example:
hashes = [
  {:cola => "hello", :colb => "bye"},
  {:colb => "bye", :cola => "hello"}
]
produces the following table when using the code from the majority of the answers in this thread (including the best answer):
cola | colb
-------------
hello | bye
-------------
bye | hello
You should do this instead:
require "csv"
csv_rows = [
  {:cola => "hello", :colb => "bye"},
  {:colb => "bye", :cola => "hello"}
]
column_names = csv_rows.first.keys
s = CSV.generate do |csv|
  csv << column_names
  csv_rows.each do |row|
    csv << column_names.map{|column_name| row[column_name]} #To be explicit
  end
end
Try this:
require 'csv'
data = { 'one' => '1', 'two' => '2', 'three' => '3' }
CSV.open("data.csv", "a+") do |csv|
  csv << data.keys
  csv << data.values
end
Suppose we have a hash:
hash_1 = {1=>{:rev=>400, :d_odr=>3}, 2=>{:rev=>4003, :d_price=>300}}
The keys of hash_1 are ids (1, 2, ...) and each value is itself a hash with some of the keys :rev, :d_odr and :d_price.
Suppose we want a CSV file with these headers:
headers = ['Designer_id','Revenue','Discount_price','Impression','Designer ODR']
Then build an array for each entry of hash_1 and append it to the CSV file:
CSV.open("design_performance_data_temp.csv", "w") do |csv|
  csv << headers
  hash_1.each do |designer_id, data|
    # `<<` binds tighter than `||`, so the default must be parenthesized;
    # otherwise nil is appended and the 0 is never used.
    csv_data = [designer_id]
    csv_data << (data[:rev] || 0)
    csv_data << (data[:d_price] || 0)
    csv_data << (data[:imp] || 0)
    csv_data << (data[:d_odr] || 0)
    csv << csv_data
  end
end
You now have the file design_performance_data_temp.csv saved in the current directory.
The code above can be optimized further.

Replacing text in one CSV column using FasterCSV

Being relatively new to Ruby, I am trying to figure out how to do the following using FasterCSV:
Open a CSV file, pick a column by its header, in this column only replace all occurrences of string x with y, write out the new file to STDOUT.
The following code almost works:
filename = ARGV[0]
csv = FCSV.read(filename, :headers => true, :header_converters => :symbol, :return_headers => true, :encoding => 'u')
mycol = csv[:mycol]
# construct a mycol_new by iterating over mycol and doing some string replacement
puts csv[:mycol][0] # produces "MyCol" as expected
puts mycol_new[0] # produces "MyCol" as expected
csv[:mycol] = mycol_new
puts csv[:mycol][0] # produces "mycol" while "MyCol" is expected
csv.each do |r|
puts r.to_csv(:force_quotes => true)
end
The only problem is that there is a header conversion where I do not expect it. If the header of the chosen column is "MyCol" before the substitution of the column in the csv table, it is "mycol" afterwards (see comments in the code). Why does this happen? And how can I avoid it? Thanks.
There are a couple of things you can change in the initialization line that will help. Change:
csv = FCSV.read(filename, :headers => true, :return_headers => true, :encoding => 'u')
to:
csv = FCSV.read(filename, :headers => true, :encoding => 'u')
I'm using CSV, which is FasterCSV only it's part of Ruby 1.9. This will create a CSV file in the current directory called "temp.csv" with a modified 'FName' field:
require 'csv'
data = "ID,FName,LName\n1,mickey,mouse\n2,minnie,mouse\n3,donald,duck\n"
# read and parse the data
csv_in = CSV.new(data, :headers => true)
# open the temp file
CSV.open('./temp.csv', 'w') do |csv_out|
  # output the headers embedded in the object, then rewind to the start of the list
  csv_out << csv_in.first.headers
  csv_in.rewind
  # loop over the rows
  csv_in.each do |row|
    # munge the first name
    if (row['FName']['mi'])
      row['FName'] = row['FName'][1 .. -1] << '-' << row['FName'][0] << 'ay'
    end
    # output the record
    csv_out << row.fields
  end
end
The output looks like:
ID,FName,LName
1,ickey-may,mouse
2,innie-may,mouse
3,donald,duck
It is possible to manipulate the desired column directly in the FasterCSV object instead of creating a new column and then trying to replace the old one with the new one.
csv = FCSV.read(filename, :headers => true, :header_converters => :symbol, :return_headers => true, :encoding => 'u')
mycol = csv[:my_col]
# each element is the cell value of that column; gsub! edits it in place
mycol.each do |row|
  row.gsub!(/\s*;\s*/,"///") unless row.nil? # or any other substitution
end
csv.each do |r|
  puts r.to_csv(:force_quotes => true)
end
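For readers on a Ruby version where FasterCSV has become the standard CSV library, the same idea can be sketched as follows. The sample data is invented, and no header converter is used, so the header keeps its original spelling "MyCol".

```ruby
require "csv"

# Hypothetical input with one column that needs a substitution.
data = "ID,MyCol\n1,a ; b\n2,c\n"
table = CSV.parse(data, headers: true)

# Replace the separator inside one column only; &. skips empty (nil) cells.
table.each do |row|
  row["MyCol"] = row["MyCol"]&.gsub(/\s*;\s*/, "///")
end

# Write the whole table, headers included, with every field quoted.
result = table.to_csv(force_quotes: true)
puts result
```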
