Check if CSV header exists - ruby

I have an Importer class in my Rails application with a method that imports a CSV file:
def import
  CSV.foreach(file.path, headers: true, encoding: "iso-8859-1") do |row|
    mail = row["email"] || row["Email"] || row["e-mail"] || row["E-mail"] || row["mail"] || row["Mail"]
  end
end
I set the variable mail to use inside the loop, and I try to cover the different names the mail column might have. But I have no idea how to stop the loop, while keeping the code DRY, when the CSV has no column with any of those headers.
EDIT:
def import
  header = nil
  headers = CSV.open(file.path, encoding: "iso-8859-1") { |csv| csv.first }
  headers.each { |e| header = e if e.downcase.gsub('-', '') =~ /^(|e)mail$/ }
  if header != nil
    CSV.foreach(file.path, headers: true, encoding: "iso-8859-1") do |row|
      mail = row[header]
    end
  end
end
Solution to the problem

This should get you started. You'll need to change the regexp to match all of your cases.
def import
  CSV.foreach(file.path, headers: true, encoding: "iso-8859-1") do |row|
    if row.headers.none? { |e| e =~ /email/i }
      raise "freak out"
    end
  end
end
I would also consider setting a variable such as has_email_headers once and checking that instead, since every row has the same headers and there is no need to scan them on each row.
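A minimal sketch of that idea, checking the header line once before looping. It assumes the accepted spellings are email, e-mail and mail in any letter case; the method names find_email_header and import_emails and the EMAIL_HEADERS list are hypothetical:

```ruby
require 'csv'

# Assumed set of accepted header spellings, compared case-insensitively
EMAIL_HEADERS = %w[email e-mail mail].freeze

# Reads only the first line of the file and returns the matching header, or nil.
def find_email_header(path, encoding: 'iso-8859-1')
  header_line = CSV.open(path, encoding: encoding, &:readline) || []
  header_line.compact.find { |h| EMAIL_HEADERS.include?(h.strip.downcase) }
end

# Collects the email column, raising once up front if no such column exists.
def import_emails(path, encoding: 'iso-8859-1')
  header = find_email_header(path, encoding: encoding)
  raise ArgumentError, "no email column in #{path}" unless header

  emails = []
  CSV.foreach(path, headers: true, encoding: encoding) { |row| emails << row[header] }
  emails
end
```

Returning the matched header name (rather than a boolean) also removes the chain of row[...] lookups, since the loop can use row[header] directly.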

According to the CSV documentation for Ruby 2.5.0, you can also pass return_headers: true and then check header_row? later in the loop. Here's an example:
require 'csv'
data = CSV.read("your.csv", headers: true, return_headers: true)
data.each do |row|
  p "yes header!" if row.header_row?
end

One could also use the header_converters: [:downcase, :symbol] option so that there are fewer values to check (matching becomes case-insensitive), such as [:email, :mail]:
CSV.foreach(file.path, headers: true, header_converters: [:downcase, :symbol], encoding: "iso-8859-1") do |row|
  # any? (not all?): one matching header is enough
  puts 'You are missing the "email" header!' unless [:email, :mail].any? { |header| row.headers.include?(header) }
  # refine/refactor as necessary...
  # do rest of function...
end
Documentation on :header_converters.
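For example, the built-in :symbol header converter lowercases names, drops punctuation such as "-" and turns spaces into underscores, so E-mail and email both become :email. The sketch below (emails_from and EMAIL_KEYS are hypothetical names) checks the converted headers once, after the first shift, instead of on every row:

```ruby
require 'csv'

# Header keys accepted after the :symbol converter has normalized them (assumption)
EMAIL_KEYS = %i[email mail].freeze

# Reads one row to learn the headers, checks them once, then streams the rest.
def emails_from(io)
  csv = CSV.new(io, headers: true, header_converters: :symbol)
  first = csv.shift # reading the first data row also parses the header line
  key = EMAIL_KEYS.find { |k| Array(csv.headers).include?(k) }
  raise ArgumentError, "no email column in #{csv.headers.inspect}" unless key
  return [] unless first

  # CSV#map continues from the current position, i.e. after the first row
  [first[key]] + csv.map { |row| row[key] }
end
```

For a file on disk, pass File.open(path) as io (or use CSV.open with the same options).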

Related

How to "observe" a stream in Ruby's CSV module?

I am writing a class that takes a CSV file, transforms it, and then writes the new data out.
module Transformer
  class Base
    attr_reader :file
    def initialize(file)
      @file = file
    end
    def original_data(&block)
      opts = { headers: true }
      CSV.open(file, 'rb', opts, &block)
    end
    def transformer
      # complex manipulations here like modifying columns, picking only certain
      # columns to put into new_data, etc but simplified to `+10` to keep
      # example concise
      ->(row) { new_data << row['some_header'] + 10 }
    end
    def transformed_data
      self.original_data(self.transformer)
    end
    def write_new_data
      CSV.open('new_file.csv', 'wb', opts) do |new_data|
        transformed_data
      end
    end
  end
end
What I'd like to be able to do is:
Look at the transformed data without writing it out (so I can test that it transforms the data correctly, and I don't need to write it to file right away: maybe I want to do more manipulation before writing it out)
Don't slurp the whole file at once, so it works regardless of the size of the original data
Have this as a base class with an empty transformer so that instances only need to implement their own transformers but the behavior for reading and writing is given by the base class.
But obviously the above doesn't work because I don't really have a reference to new_data in transformer.
How could I achieve this elegantly?
I can recommend one of two approaches, depending on your needs and personal taste.
I have intentionally distilled the code to just its bare minimum (without your wrapping class), for clarity.
1. Simple read-modify-write loop
Since you do not want to slurp the file, use CSV.foreach. For example, for a quick debugging session, do:
CSV.foreach "source.csv", headers: true do |row|
  row["name"] = row["name"].upcase
  row["new column"] = "new value"
  p row
end
And if you wish to write to file during that same iteration:
require 'csv'
csv_options = { headers: true }
# Open the target file for writing
CSV.open("target.csv", "wb") do |target|
  # Add a header
  target << %w[new header column names]
  # Iterate over the source CSV rows
  CSV.foreach "source.csv", **csv_options do |row|
    # Mutate and add columns
    row["name"] = row["name"].upcase
    row["new column"] = "new value"
    # Push the new row to the target file
    target << row
  end
end
2. Using CSV::Converters
There is built-in functionality that might be helpful, CSV::Converters (see the :converters option in the CSV.new documentation):
require 'csv'
# Register a converter in the options hash
csv_options = { headers: true, converters: [:stripper] }
# Define a converter
CSV::Converters[:stripper] = lambda do |value, field|
  value ? value.to_s.strip : value
end
CSV.open("target.csv", "wb") do |target|
  # same as above
  CSV.foreach "source.csv", **csv_options do |row|
    # same as above - input data will already be converted
    # you can do additional things here if needed
  end
end
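To see a named converter in action, here is a small sketch that parses a string instead of a file (parse_stripped is a hypothetical helper). A one-argument lambda is called with just the field value; header names go through the separate header_converters option:

```ruby
require 'csv'

# Register a named field converter; the name :stripper is arbitrary.
# A one-argument lambda is called with just the field value.
CSV::Converters[:stripper] = ->(value) { value.is_a?(String) ? value.strip : value }

# Parses CSV text, stripping whitespace around every field and header name.
def parse_stripped(text)
  CSV.parse(text, headers: true,
                  converters: [:stripper],
                  header_converters: ->(header) { header.strip })
end

table = parse_stripped("name , city\n  Ann ,  Oslo \n")
table.first['city'] # => "Oslo"
```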
3. Separate input and output from your converter classes
Based on your comment, and since you want to minimize I/O and iterations, perhaps extracting the read/write operations from the responsibility of the transformers might be of interest. Something like this.
require 'csv'
class NameCapitalizer
  def self.call(row)
    row["name"] = row["name"].upcase
  end
end
class EmailRemover
  def self.call(row)
    row.delete 'email'
  end
end
csv_options = { headers: true }
converters = [NameCapitalizer, EmailRemover]
CSV.open("target.csv", "wb") do |target|
  CSV.foreach "source.csv", **csv_options do |row|
    converters.each { |c| c.call row }
    target << row
  end
end
Note that the above code still does not write a header row, so any header changes made by the converters are lost. You will probably have to keep the last row (after all transformations) and prepend its #headers to the output CSV.
There are probably plenty of other ways to do it, but the CSV class in Ruby does not have the cleanest interface, so I try to keep code that deals with it as simple as I can.
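If you want to inspect the transformed rows before (or without) writing them, one more option is to expose the transformed stream as an Enumerator. This is a sketch only: transform, transformed_rows, write and the AddTen subclass are hypothetical names. It reads lazily with CSV.foreach and writes the header line from the first transformed row, so changed headers are kept:

```ruby
require 'csv'

module Transformer
  # Sketch of a base class: reads lazily, transforms per row, writes on demand.
  class Base
    def initialize(path)
      @path = path
    end

    # Subclasses override this hook; the default leaves each row unchanged.
    def transform(row)
      row
    end

    # Yields transformed rows one at a time; without a block returns an Enumerator.
    def transformed_rows
      return enum_for(:transformed_rows) unless block_given?

      CSV.foreach(@path, headers: true) { |row| yield transform(row) }
    end

    # Writes the transformed rows, taking the header line from the first transformed row.
    def write(target_path)
      CSV.open(target_path, 'w') do |out|
        transformed_rows.each_with_index do |row, index|
          out << row.headers if index.zero?
          out << row
        end
      end
    end
  end
end

# Hypothetical subclass mirroring the question's "+10" example.
class AddTen < Transformer::Base
  def transform(row)
    row['some_header'] = (row['some_header'].to_i + 10).to_s
    row
  end
end
```

Since transformed_rows returns an Enumerator, something like AddTen.new("source.csv").transformed_rows.first(5) previews a few rows without parsing the rest of the file. Note that write produces an empty file when the source has no data rows.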

How to access block parameters using Object.send

I'm trying to run the following code:
class RentLimit < ActiveRecord::Base
  def self.load_data
    rows = CSV.open("csvs/income_limits_2011_to_2015.csv").read
    rows.shift
    rows.each do |county, yr, date, _50pct_1br, _50pct_2br, _50pct_3br, _50pct_4br, _60pct_1br, _60pct_2br, _60pct_3br, _60pct_4br|
      [50, 60].each do |ami|
        [1, 2, 3, 4].each do |br|
          r = new
          r.county = county
          r.state = "SC"
          r.year = yr
          r.effective_date = Date.parse(date)
          r.pct_ami = ami
          r.br = br
          r.max_rent = self.send("_#{ami}pct_#{br}br".to_sym)
          r.save
        end # of brs
      end # of amis
    end # of rows
  end
end
but am getting this error message when trying to run it:
NoMethodError: undefined method `_50pct_1br' for #<Class:0x007fe942ce3b18>
The send method isn't able to access those block parameters inside of the scope. Is there any way to give access to block parameters to send? If not, how else might I dynamically access block parameters?
How do I use send or its equivalent to access block parameters in Ruby?
This is much easier if you tell CSV.open what your column names are. It looks like your CSV file might have a header row that you're skipping with rows.shift; if so, don't skip it, and use the headers: true option instead. Then you can access each field by name with row["field_name"] or, in your case, row["_#{ami}pct_#{br}br"]:
CSV_PATH = "csvs/income_limits_2011_to_2015.csv"
DEFAULT_STATE = "SC"
def self.load_data
  CSV.open(CSV_PATH, 'r', headers: true) do |csv|
    csv.each do |row|
      # ami and br still need their own loops, as in the original code
      [50, 60].each do |ami|
        [1, 2, 3, 4].each do |br|
          create(
            county: row["county"],
            state: DEFAULT_STATE,
            year: row["yr"],
            effective_date: Date.parse(row["date"]),
            pct_ami: ami,
            br: br,
            max_rent: row["_#{ami}pct_#{br}br"]
          )
        end
      end
    end
  end
end
Note that I used CSV.open with a block to ensure that the file is closed after it's been read, which your original code wasn't doing. I also used create instead of new followed by save, since the latter is needlessly verbose.
If you're skipping the first row for some other reason, or you want to use field names other than those in the header row, you can set the options return_headers: false, headers: names, where names is an array of names, e.g.:
CSV_HEADERS = %w[
  county yr date _50pct_1br _50pct_2br _50pct_3br _50pct_4br
  _60pct_1br _60pct_2br _60pct_3br _60pct_4br
].freeze
def self.load_data
  CSV.open(CSV_PATH, 'r', return_headers: false, headers: CSV_HEADERS) do |csv|
    # ...
  end
end
Finally, since some of your attributes are the same for every object created (state), or for every object built from the same row, I'd move those out of the inner loops:
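def self.load_data
  base_attrs = { state: DEFAULT_STATE }
  CSV.open(CSV_PATH, 'r', headers: true) do |csv|
    csv.each do |row|
      # attributes shared by every record built from this row
      row_attrs = base_attrs.merge(
        county: row["county"],
        year: row["yr"],
        effective_date: Date.parse(row["date"])
      )
      [50, 60].each do |ami|
        [1, 2, 3, 4].each do |br|
          create(row_attrs.merge(pct_ami: ami, br: br, max_rent: row["_#{ami}pct_#{br}br"]))
        end
      end
    end
  end
end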
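As for the literal question: send calls methods, so it cannot see block parameters, which are local variables. If you really need to read one by a computed name, Binding#local_variable_get can do it. A minimal sketch (max_rents is a hypothetical name; the rows are plain arrays as in the original code, trimmed to two bedroom sizes):

```ruby
# Binding#local_variable_get reads a local variable (including a block
# parameter from an enclosing block) by name; send only looks up methods.
def max_rents(rows)
  rows.flat_map do |county, yr, date, _50pct_1br, _50pct_2br, _60pct_1br, _60pct_2br|
    [50, 60].product([1, 2]).map do |ami, br|
      binding.local_variable_get(:"_#{ami}pct_#{br}br")
    end
  end
end

max_rents([["York", "2015", "1/1/2015", "a", "b", "c", "d"]]) # => ["a", "b", "c", "d"]
```

Reading fields by header name, as above, is usually clearer; this is mainly useful when the values already arrive as block parameters.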

How to remove a row from a CSV with Ruby

Given the following CSV file, how would you remove all rows that contain the word 'true' in the column 'foo'?
Date,foo,bar
2014/10/31,true,derp
2014/10/31,false,derp
I have a working solution; however, it requires making a second CSV object, csv_no_foo:
@csv = CSV.read(@csvfile, headers: true) # http://bit.ly/1mSlqfA
@headers = CSV.open(@csvfile, 'r', :headers => true).read.headers
# Make a new CSV
@csv_no_foo = CSV.new(@headers)
@csv.each do |row|
  # puts row[5]
  if row[@headersHash['foo']] == 'false'
    @csv_no_foo.add_row(row)
  else
    puts "not pushing row #{row}"
  end
end
Ideally, I would just remove the offending row from the CSV like so:
...
if row[@headersHash['foo']] == 'false'
  @csv.delete(true) # Doesn't work
...
Looking at the Ruby documentation, it looks like the Row class has a delete_if method, but I'm confused about the syntax it requires. Is there a way to remove the row without making a new CSV object?
http://ruby-doc.org/stdlib-1.9.2/libdoc/csv/rdoc/CSV/Row.html#method-i-each
You should be able to use CSV::Table#delete_if, but it needs a CSV::Table object. CSV.table always gives you one, while CSV.read returns an Array of Arrays unless you pass headers: true. Be aware that CSV.table also converts the headers to symbols (and numeric-looking fields to numbers).
require 'csv'
table = CSV.table(@csvfile)
table.delete_if do |row|
  row[:foo] == 'true'
end
File.open(@csvfile, 'w') do |f|
  f.write(table.to_csv)
end
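If you would rather keep the original header names (CSV.table turns them into lowercase symbols), CSV.parse with headers: true also gives a CSV::Table that supports delete_if. A sketch on the sample data from the question (drop_true_foo is a hypothetical name):

```ruby
require 'csv'

# Removes every row whose "foo" column is the string "true" and returns the CSV text.
def drop_true_foo(csv_text)
  table = CSV.parse(csv_text, headers: true) # CSV::Table with the original String headers
  table.delete_if { |row| row['foo'] == 'true' }
  table.to_csv
end

sample = "Date,foo,bar\n2014/10/31,true,derp\n2014/10/31,false,derp\n"
drop_true_foo(sample) # => "Date,foo,bar\n2014/10/31,false,derp\n"
```

To rewrite the file in place, something like File.write(path, drop_true_foo(File.read(path))) would do.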
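You might also filter the rows in plain Ruby with Enumerable#reject (without a header converter the headers stay Strings, so the column is row['foo'], not row[:foo]):
require 'csv'
csv = CSV.parse(File.read(@csvfile), col_sep: ",", headers: true)
kept_rows = csv.reject { |row| row['foo'] == 'true' }
Hope it helps.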

Removing whitespaces in a CSV file

I have a string with extra whitespace:
First,Last,Email ,Mobile Phone ,Company,Title ,Street,City,State,Zip,Country, Birthday,Gender ,Contact Type
I want to parse this line and remove the whitespace.
My code looks like:
namespace :db do
  task :populate_contacts_csv => :environment do
    require 'csv'
    csv_text = File.read('file_upload_example.csv')
    csv = CSV.parse(csv_text, :headers => true)
    csv.each do |row|
      puts "First Name: #{row['First']} \nLast Name: #{row['Last']} \nEmail: #{row['Email']}"
    end
  end
end
@prices = CSV.parse(IO.read('prices.csv'), :headers => true,
                    :header_converters => lambda { |f| f.strip },
                    :converters => lambda { |f| f ? f.strip : nil })
The nil test is included in the field converter but not in the header converter, on the assumption that headers are never nil while data fields might be, and nil doesn't have a strip method. I'm really surprised that, AFAIK, :strip is not a predefined converter!
You can strip each row's keys and values first:
csv.each do |unstripped_row|
  row = {}
  # v may be nil for an empty field, hence the safe navigation operator
  unstripped_row.each { |k, v| row[k.strip] = v&.strip }
  puts "First Name: #{row['First']} \nLast Name: #{row['Last']} \nEmail: #{row['Email']}"
end
Edited to strip hash keys too
CSV supports "converters" for the headers and fields, which let you get inside the data before it's passed to your each loop.
Writing a sample CSV file:
csv = "First,Last,Email ,Mobile Phone ,Company,Title ,Street,City,State,Zip,Country, Birthday,Gender ,Contact Type
first,last,email ,mobile phone ,company,title ,street,city,state,zip,country, birthday,gender ,contact type
"
File.write('file_upload_example.csv', csv)
Here's how I'd do it:
require 'csv'
csv = CSV.open('file_upload_example.csv', :headers => true)
[:convert, :header_convert].each { |c| csv.send(c) { |f| f.strip } }
csv.each do |row|
  puts "First Name: #{row['First']} \nLast Name: #{row['Last']} \nEmail: #{row['Email']}"
end
Which outputs:
First Name: first
Last Name: last
Email: email
The converters simply strip leading and trailing whitespace from each header and each field as they're read from the file.
Also, as a programming design choice, don't read your file into memory using:
csv_text = File.read('file_upload_example.csv')
Then parse it:
csv = CSV.parse(csv_text, :headers => true)
Then loop over it:
csv.each do |row|
Ruby's IO system supports "enumerating" over a file, line by line. Once my code calls CSV.open, the file is open for reading, and each reads it one line at a time. The entire file never has to be in memory at once (slurping isn't scalable, though with the memory in newer machines it's becoming more reasonable), and if you benchmark it you'll probably find that iterating with each is about as fast as reading the whole file, parsing it, and then iterating over the parsed result.
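A sketch that combines the two suggestions, streaming the file with CSV.foreach (which also closes it when iteration ends) while converters strip headers and fields. each_contact and STRIP are hypothetical names, and the String check plays the role of the nil test mentioned above:

```ruby
require 'csv'

# Strips Strings and leaves anything else (e.g. nil for an empty field) untouched.
STRIP = ->(value) { value.is_a?(String) ? value.strip : value }

# Streams the file row by row with stripped headers and fields.
# Without a block, CSV.foreach (and so this method) returns an Enumerator.
def each_contact(path, &block)
  CSV.foreach(path, headers: true, header_converters: STRIP, converters: STRIP, &block)
end
```

Usage would look like each_contact('file_upload_example.csv') { |row| puts row['Email'] }.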

Ruby - remove columns from csv file and convert to pipe delimited txt file

I'm trying to take a CSV file, strip a few columns, and then output a pipe delimited text file.
Here's my code, which almost works. The only problem is the CSV.generate block is adding double quotes around the whole thing, as well as a random comma with double quotes around it where the line break is.
require 'csv'
original = CSV.read('original.csv', { headers: true, return_headers: true })
original.delete('Column header 1')
original.delete('Column header 2')
original.delete('Column header 3')
csv_string = CSV.generate do |csv|
  csv << original
end
pipe_string = csv_string.tr(",","|")
File.open('output.txt', 'w+') do |f|
  f.write(pipe_string)
end
Is there a better way to do this? Any help is appreciated.
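Try letting CSV write the pipe-delimited rows itself with col_sep: '|', instead of generating comma-separated text and replacing the commas. (csv << original pushes the whole table as a single row, so each CSV::Row ends up as one quoted field; that is where the extra quotes and commas come from.)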
require 'csv'
original = CSV.read('original.csv', headers: true, return_headers: true)
original.delete('Column header 1')
original.delete('Column header 2')
original.delete('Column header 3')
# return_headers: true keeps the header row in the table, so it is written first
CSV.open('output.txt', 'w', col_sep: '|') do |csv|
  original.each do |row|
    csv << row
  end
end
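The same approach can be checked without touching files by generating the output into a string. This sketch (to_pipe_delimited is a hypothetical name) drops columns with CSV::Table#delete and writes with col_sep: '|'; because CSV handles the quoting, a comma inside a field is kept as-is instead of being turned into a pipe as tr would do:

```ruby
require 'csv'

# Drops the named columns and returns pipe-delimited text (header line included).
def to_pipe_delimited(csv_text, drop:)
  table = CSV.parse(csv_text, headers: true)
  drop.each { |name| table.delete(name) } # a String argument deletes a column by header
  CSV.generate(col_sep: '|') do |out|
    out << table.headers
    table.each { |row| out << row }
  end
end

to_pipe_delimited("a,b,c\n\"x,y\",2,3\n", drop: ['b']) # => "a|c\nx,y|3\n"
```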

Resources