Filtering into a new CSV with headers in Ruby? - ruby

I have a CSV with a basic list of people, their genders, and ages, and corresponding headers:
"First Name","Age","Gender"
"Adam",31,"Male"
"Bruce",36,"Male"
"Lawrence",34,"Male"
"James",32,"Male"
"Elyse",30,"Female"
"Matt",32,"Male"
I'd like to open this CSV in Ruby, go through line by line, and append all male members to a new CSV with the same headers, and save this CSV to a new file.
My code right now (which is not working):
require 'csv'
file = 'cast.csv'
new_cast = CSV.new(:headers => CSV.read(file, :headers => :true).headers)
CSV.foreach(file, :headers => :true, :header_converters => :symbol) do |row|
if row[:gender] == 'Male'
new_cast.add_row(row)
end
end
File.open('new_cast.csv', 'w') do |f|
f.write(new_cast)
end
The error message I am receiving:
/usr/local/Cellar/ruby/2.3.0/lib/ruby/2.3.0/csv.rb:1692:in `<<': undefined method `<<' for {:headers=>["First Name", "Age", "Gender"]}:Hash (NoMethodError)
Did you mean? <
from csv.rb:8:in `block in <main>'
from /usr/local/Cellar/ruby/2.3.0/lib/ruby/2.3.0/csv.rb:1748:in `each'
from /usr/local/Cellar/ruby/2.3.0/lib/ruby/2.3.0/csv.rb:1131:in `block in foreach'
from /usr/local/Cellar/ruby/2.3.0/lib/ruby/2.3.0/csv.rb:1282:in `open'
from /usr/local/Cellar/ruby/2.3.0/lib/ruby/2.3.0/csv.rb:1130:in `foreach'
from csv.rb:6:in `<main>'
So, it seems like I'm doing something pretty wrong. What would be the simplest way to do this?

CSV.new takes a "string or IO object" as its first argument, and an optional hash of options as its second, per the docs. Your call passes only the hash, so CSV treats the hash as the IO object and fails when add_row tries to call << on it.
So the error is actually caused by this line:
new_cast = CSV.new(:headers => CSV.read(file, :headers => :true).headers)
which should be
new_cast = CSV.new("", :headers => CSV.read(file, :headers => true).headers)
Note the empty string as the first argument. Also prefer the boolean true over the symbol :true; the symbol only works because any value other than nil and false is truthy.
But even with that, nothing is written to the new file yet. For that, I think you want to set write_headers on the new CSV, then rewind it before copying its rows into the output file.
require 'csv'
file = 'cast.csv'
new_cast = CSV.new("", :headers => CSV.read(file, :headers => true).headers, :write_headers => true)
CSV.foreach(file, :headers => true, :header_converters => :symbol) do |row|
if row[:gender] == 'Male'
new_cast.add_row(row)
end
end
CSV.open('new_cast.csv', 'w') do |csv|
new_cast.rewind
new_cast.each {|row| csv << row}
end
Hope that helps!
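A simpler route, sketched here under a few assumptions, is to skip the in-memory CSV object and write straight to the output file with CSV.open, passing the original headers together with write_headers. The method name filter_by_gender and its parameters are made up for illustration; the "Gender" column name comes from the sample data in the question.

```ruby
require 'csv'

# Hypothetical helper: copies every row whose "Gender" column equals
# `gender` from input_path to output_path, keeping the header row.
def filter_by_gender(input_path, output_path, gender)
  # Read the whole input; fine for small files like the one in the question.
  table = CSV.read(input_path, headers: true)

  # write_headers plus an explicit headers array emits the header line once.
  CSV.open(output_path, 'w', write_headers: true, headers: table.headers) do |out|
    table.each do |row|
      out << row.fields if row['Gender'] == gender
    end
  end
end
```

With the file from the question this would be called as filter_by_gender('cast.csv', 'new_cast.csv', 'Male'). The output is not force-quoted, so fields are only quoted where CSV requires it.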

Related

Parsing remote csv file : No such file or directory @ rb_sysopen

I am trying to parse a CSV file hosted remotely. I use Rails 6 and Active Storage. The file is stored on the ImportJob model. Its URL can be accessed this way:
ImportJob.last.csv_file.url
The file does exist and is downloadable: http://res.cloudinary.com/dockcyr0z/raw/upload/rghn3zi2190nmc28qwbtr24apqxe.csv
However when trying to parse it
CSV.foreach(url, headers: true, header_converters: :symbol, col_sep: ';') do |row|
puts row
end
I'm getting Errno::ENOENT: No such file or directory @ rb_sysopen - http://res.cloudinary.com/dockcyr0z/raw/upload/rghn3zi2190nmc28qwbtr24apqxe.csv
The same thing happens if I try to open the file first: open(url)
Why am I getting this error? How can I parse this remote CSV file?
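CSV.foreach expects the path of a local file, so the URL is treated as a filename and the open fails. Require open-uri, open the URL with URI.parse, and change CSV.foreach to CSV.parse: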
CSV.parse(URI.parse(url).read, headers: true, header_converters: :symbol, col_sep: ';') do |row|
puts row
end
# output
{
:first_name => "Souper",
:last_name => "Man",
:email => "dageismar+learner233@gmail.com",
:role => "CEO",
:tags => "sales,marketing",
:avatar_url => "http://res.cloudinary.com/dockcyr0z/image/upload/x3f65o5mepbdhi4fwvww99gjqr7p"
}
{
:first_name => "Gentil",
:last_name => "Keum",
:email => "dageismar+learner234@gmail.com",
:role => "CEO",
:tags => "sales,marketing",
:avatar_url => "http://res.cloudinary.com/dockcyr0z/image/upload/x3f65o5mepbdhi4fwvww99gjqr7p"
}
Update:
Or, as Stefan suggests, just use URI.open(url) instead of URI.parse(url).read.
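Putting the pieces together, a minimal sketch might separate the download from the parsing so the parsing can be checked without a network connection. The method names parse_rows and fetch_rows are hypothetical, and fetch_rows assumes the URL is publicly readable.

```ruby
require 'csv'
require 'open-uri'

# Parses semicolon-separated CSV text into an array of hashes with symbol keys.
def parse_rows(csv_text)
  CSV.parse(csv_text, headers: true, header_converters: :symbol, col_sep: ';').map(&:to_h)
end

# Downloads the body at url (assumed to be publicly readable) and parses it.
def fetch_rows(url)
  parse_rows(URI.open(url).read)
end
```

In the Rails case from the question this would be called as fetch_rows(ImportJob.last.csv_file.url).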

Ruby Read and Write CSV with Quotes

I'd like to read in a csv row, update one field then output the row again with quotes.
Row Example Input => "Joe", "Blow", "joe@blow.com"
Desired Row Example Output => "Joe", "Blow", "xxxx@xxxx.xxx"
My script below outputs => Joe, Blow, xxxx@xxxx.xxx
It loses the double quotes which I want to retain.
I've tried various options but no joy so far .. any tips?
Many thanks!
require 'csv'
CSV.foreach('transactions.csv',
:quote_char=>'"',
:col_sep =>",",
:headers => true,
:header_converters => :symbol ) do |row|
row[:customer_email] = 'xxxx@xxxx.xxx'
puts row
end
Quotes in CSV fields are usually unnecessary, unless the field itself contains a delimiter, a quote character, or a newline. But you can force the output to always use quotes with :force_quotes => true. It is an output option, so pass it where the row is turned back into text:
CSV.foreach('transactions.csv',
:quote_char=>'"',
:col_sep =>",",
:headers => true,
:header_converters => :symbol ) do |row|
row[:customer_email] = 'xxxx@xxxx.xxx'
puts row.to_csv(:force_quotes => true)
end
You can manually add them to all your items:
Hash[row.map { |k,v| [k,"\"#{v}\""] }]
(edited because I forgot you had a hash and not an array)
Thanks Justin L.
Built on your solution and ended up with this.
I get the feeling Ruby has something more elegant but this does what I need:
require 'csv'
CSV.foreach('trans.csv',
:quote_char=>'"',
:col_sep =>",",
:headers => true,
:header_converters => :symbol ) do |row|
row[:customer_email] = 'xxxx@xxxx.xxx'
row = Hash[row.map { |k,v| [k,"\"#{v}\""] }]
new_row = ""
row.each_with_index do | (k, v) ,i|
new_row += v.to_s
if i != row.length - 1
new_row += ','
end
end
puts new_row
end
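If the whole file should come out quoted, header line included, one option is to let CSV do the quoting through CSV.generate with force_quotes instead of building the line by hand. This is only a sketch: mask_column and its parameters are hypothetical names, and it works on a string rather than on transactions.csv so that it stays self-contained.

```ruby
require 'csv'

# Hypothetical helper: replaces every value in `column` with `mask`
# and returns the CSV text with all fields quoted.
def mask_column(csv_text, column, mask)
  table = CSV.parse(csv_text, headers: true)

  CSV.generate(force_quotes: true) do |out|
    out << table.headers          # header line, quoted like the data
    table.each do |row|
      row[column] = mask
      out << row.fields
    end
  end
end
```

For a file, the same block body should work inside CSV.open('out.csv', 'w', force_quotes: true), where out.csv is a placeholder name.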

Ruby CSV input value format

I'm using ruby CSV module to read in a csv file.
One of the values in the CSV file has the format XXX_XXXXX, where each X is a digit. I treat this value as a string, but the CSV module reads these values in as XXXXXXXX, as numbers, which I do not want.
Options I am currently using
f = CSV.read('file.csv', {:headers => true, :header_converters => :symbol, :converters => :all} )
Is there a way to tell CSV to not do that?
f = CSV.read('file.csv', :headers => true, :header_converters => :symbol)
Leave out the :converters => :all; among other things, it tries to convert every numeric-looking string to a number, and Ruby's Integer() accepts underscores as digit separators, so "123_45678" becomes 12345678.
The :converters => :all option causes this. Try the following:
require "csv"
CSV.parse(DATA, :col_sep => ",", :headers => true, :converters => :all).each do |row|
puts row["numfield"]
end
__END__
textfield,datetimefield,numfield
foo,2008-07-01 17:50:55.004688,123_45678
bar,2008-07-02 17:50:55.004688,234_56789
# gives
# 12345678
# 23456789
and
CSV.parse(DATA, :col_sep => ",", :headers => true).each do |row|
puts row["numfield"]
end
__END__
textfield,datetimefield,numfield
foo,2008-07-01 17:50:55.004688,123_45678
bar,2008-07-02 17:50:55.004688,234_56789
# gives
# 123_45678
# 234_56789
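If the other columns should still be converted, a custom converter can skip selected columns by header. The sketch below is one way to do that, not something taken from the answers above: parse_keeping_text and text_columns are hypothetical names, and the converter only handles integers.

```ruby
require 'csv'

# Hypothetical helper: converts integer-looking fields to Integer,
# except in the columns listed in text_columns, which stay strings.
def parse_keeping_text(csv_text, text_columns)
  selective = lambda do |field, info|
    # info.header is the header of the column the field belongs to.
    next field if text_columns.include?(info.header)

    begin
      Integer(field, 10)
    rescue ArgumentError, TypeError
      field # not an integer, leave it untouched
    end
  end

  CSV.parse(csv_text, headers: true, converters: [selective])
end
```

Calling parse_keeping_text(text, ['numfield']) would leave numfield as "123_45678" while a plain count column still becomes an Integer.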

Create a new Ruby CSV object with headers in a single csv.new() line

I'm trying to create a new CSV object with only the header row in it, but the headers are not set until I call read():
[32] pry(main)> c = CSV.new("Keyword,Index,Page,Index in Page,Type,Title,URL", :headers => :first_row, :write_headers => true, :return_headers => true)
=> <#CSV io_type:StringIO encoding:UTF-8 lineno:0 col_sep:"," row_sep:"\n" quote_char:"\"" headers:true>
[33] pry(main)> c.headers
=> true
[34] pry(main)> c.read
=> #<CSV::Table mode:col_or_row row_count:1>
[35] pry(main)> c.headers
=> ["Keyword", "Index", "Page", "Index in Page", "Type", "Title", "URL"]
Why is that? Why can't I get a properly working CSV object with my single CSV.new line?
As the documentation will tell you, CSV.new treats the string as if it were the contents of a file (i.e. a StringIO), so you still have to read from it, just as you would with any other IO source, before the header row is parsed.
If you want to set the headers explicitly, pass an array as the :headers option.
There does not appear to be a way to do this in one call but you can easily remedy that with a custom method of your own:
Given:
def new_csv(headers, data)
csv = CSV.new(data, headers: headers, write_headers: true, return_headers: true)
csv.read
csv
end
You can use it as:
csv = new_csv("Header 1,Header 2", "abc,def")
=> <#CSV io_type:StringIO encoding:UTF-8 lineno:1 col_sep:"," row_sep:"\n" quote_char:"\"" headers:["abc", "def"]>
csv.headers
=> ["Header 1", "Header 2"]
Hope that helps.
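If a CSV::Table is acceptable instead of a CSV object, CSV.parse does the read in the same call, so the headers are available straight away. A small sketch, with header_only_table as a made-up name:

```ruby
require 'csv'

# Hypothetical helper: parses a single header line into a CSV::Table.
# return_headers keeps the header row in the table, so the table is not
# empty and #headers can read the names from that row.
def header_only_table(header_line)
  CSV.parse(header_line, headers: true, return_headers: true)
end

table = header_only_table("Keyword,Index,Page,Index in Page,Type,Title,URL")
puts table.headers.inspect
```

Note that this returns a table rather than a CSV instance, so it suits inspecting headers more than streaming further rows.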

Replacing text in one CSV column using FasterCSV

Being relatively new to Ruby, I am trying to figure out how to do the following using FasterCSV:
Open a CSV file, pick a column by its header, in this column only replace all occurrences of string x with y, write out the new file to STDOUT.
The following code almost works:
filename = ARGV[0]
csv = FCSV.read(filename, :headers => true, :header_converters => :symbol, :return_headers => true, :encoding => 'u')
mycol = csv[:mycol]
# construct a mycol_new by iterating over mycol and doing some string replacement
puts csv[:mycol][0] # produces "MyCol" as expected
puts mycol_new[0] # produces "MyCol" as expected
csv[:mycol] = mycol_new
puts csv[:mycol][0] # produces "mycol" while "MyCol" is expected
csv.each do |r|
puts r.to_csv(:force_quotes => true)
end
The only problem is that there is a header conversion where I do not expect it. If the header of the chosen column is "MyCol" before the substitution of the columns in the csv table it is "mycol" afterwards (see comments in the code). Why does this happen? And how to avoid it? Thanks.
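There are a couple of things you can change in the initialization line that will help. With :return_headers => true the header row is kept in the table as an ordinary row, and assigning a whole column appears to overwrite that row's cell with the column key, here the converted symbol :mycol. Change: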
csv = FCSV.read(filename, :headers => true, :return_headers => true, :encoding => 'u')
to:
csv = FCSV.read(filename, :headers => true, :encoding => 'u')
I'm using CSV, which is FasterCSV under a new name; it has been part of the standard library since Ruby 1.9. The script below creates a CSV file called "temp.csv" in the current directory with a modified 'FName' field:
require 'csv'
data = "ID,FName,LName\n1,mickey,mouse\n2,minnie,mouse\n3,donald,duck\n"
# read and parse the data
csv_in = CSV.new(data, :headers => true)
# open the temp file
CSV.open('./temp.csv', 'w') do |csv_out|
# output the headers embedded in the object, then rewind to the start of the list
csv_out << csv_in.first.headers
csv_in.rewind
# loop over the rows
csv_in.each do |row|
# munge the first name
if (row['FName']['mi'])
row['FName'] = row['FName'][1 .. -1] << '-' << row['FName'][0] << 'ay'
end
# output the record
csv_out << row.fields
end
end
The output looks like:
ID,FName,LName
1,ickey-may,mouse
2,innie-may,mouse
3,donald,duck
It is possible to manipulate the desired column directly in the FasterCSV object instead of creating a new column and then trying to replace the old one with the new one.
csv = FCSV.read(filename, :headers => true, :header_converters => :symbol, :return_headers => true, :encoding => 'u')
mycol = csv[:mycol] # the :symbol converter turns the "MyCol" header into :mycol
mycol.each do |value|
value.gsub!(/\s*;\s*/,"///") unless value.nil? # or any other substitution
end
csv.each do |r|
puts r.to_csv(:force_quotes => true)
end
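With the CSV library that ships with current Ruby, the same task can be sketched without header converters at all, which sidesteps the renamed header. The method name replace_in_column and its parameters are hypothetical, and the input is taken as a string to keep the example self-contained.

```ruby
require 'csv'

# Hypothetical helper: replaces `pattern` with `replacement` in one column,
# chosen by its header, and returns the resulting CSV text.
def replace_in_column(csv_text, column, pattern, replacement)
  table = CSV.parse(csv_text, headers: true)

  table.each do |row|
    value = row[column]
    # Skip empty cells, which are parsed as nil.
    row[column] = value.gsub(pattern, replacement) unless value.nil?
  end

  table.to_csv # header line first, then the modified rows
end
```

To mirror the question, read the file with File.read(ARGV[0]) and print the result with puts, which sends it to STDOUT.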
