Ruby CSV parsing string with escaped quotes

I have a line in my CSV file that has some escaped quotes:
173,"Yukihiro \"The Ruby Guy\" Matsumoto","Japan"
When I try to parse it with the Ruby CSV parser:
require 'csv'
CSV.foreach('my.csv', headers: true, header_converters: :symbol) do |row|
puts row
end
I get this error:
.../1.9.3-p327/lib/ruby/1.9.1/csv.rb:1914:in `block (2 levels) in shift': Missing or stray quote in line 122 (CSV::MalformedCSVError)
How can I get around this error?

The \" is typical Unix whereas Ruby CSV expects ""
To parse it:
require 'csv'
text = File.read('test.csv').gsub(/\\"/,'""')
CSV.parse(text, headers: true, header_converters: :symbol) do |row|
puts row
end
Note: if your CSV file is very large, reading the entire file into memory like this uses a lot of RAM. Consider reading the file one line at a time, as in the sketch after these notes.
Note: if your CSV file may have escaped backslashes in front of the escaped quotes (that is, \\ preceding the \"), use Andrew Grimm's suggestion below instead:
gsub(/(?<!\\)\\"/,'""')

CSV supports "converters", which we can normally use to massage the content of a field before it's passed back to our code. For instance, that can be used to strip extra spaces on all fields in a row.
Unfortunately, the converters fire off after the line is split into fields, and it's during that step that CSV is getting mad about the embedded quotes, so we have to get between the "line read" step, and the "parse the line into fields" step.
This is my sample CSV file:
ID,Name,Country
173,"Yukihiro \"The Ruby Guy\" Matsumoto","Japan"
Preserving your CSV.foreach method, this is my example code for parsing it without CSV getting mad:
require 'csv'
require 'pp'
header = []
File.foreach('test.csv') do |csv_line|
row = CSV.parse(csv_line.gsub('\"', '""')).first
if header.empty?
header = row.map(&:to_sym)
next
end
row = Hash[header.zip(row)]
pp row
puts row[:Name]
end
And the resulting hash and name value:
{:ID=>"173", :Name=>"Yukihiro \"The Ruby Guy\" Matsumoto", :Country=>"Japan"}
Yukihiro "The Ruby Guy" Matsumoto
I assumed you wanted a hash back because you specified the :headers flag:
CSV.foreach('my.csv', headers: true, header_converters: :symbol) do |row|

Open the file in MS Excel and save it as MS-DOS Comma Separated (.csv).

Related

Ruby CSV: How can I read a tab-delimited file?

CSV.open(name, "r").each do |row|
puts row
end
And I get the following error:
CSV::MalformedCSVError Unquoted fields do not allow \r or \n
The file is a tab-delimited .txt file; I made it that way on purpose. I had a .csv file, opened it in Excel, and saved it as tab-delimited .txt. So it is tab-delimited.
Shouldn't CSV.open be able to read tab-delimited files?
Try specifying the field delimiter like this:
CSV.open("name", "r", { :col_sep => "\t" }).each do |row|
puts row
end
And remember to require 'csv' and read the DOCS
By default CSV uses a comma as the separator; this comes from the fact that CSV stands for 'Comma-Separated Values'.
If you want a different separator (in this case tabs) you need to make it explicit.
Example:
p CSV.new("aaa\tbbb\tccc\nddd\teee", col_sep: "\t").read
Relevant documentation: http://ruby-doc.org/stdlib-2.1.0/libdoc/csv/rdoc/CSV.html#new
As an alternative to CSV, you can also use smarter_csv, like this:
require 'smarter_csv'
data = SmarterCSV.process(filename, col_sep: "\t")
If you use smarter_csv >= 1.4.2, you can also do this:
require 'smarter_csv'
data = SmarterCSV.process(filename, col_sep: :auto)
SmarterCSV returns an array of hashes and can also do batch processing, as sketched below.
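For example, a sketch of batch processing with smarter_csv (the file name is an assumption here; chunk_size is a documented smarter_csv option):
require 'smarter_csv'

# Each chunk handed to the block is an array of row hashes with symbolized keys.
SmarterCSV.process('data.tsv', col_sep: "\t", chunk_size: 100) do |chunk|
  chunk.each { |row| puts row.inspect }
end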

Ruby equivalent to Python's DictWriter?

I have a Ruby script that goes through a CSV, determines some information, and then puts out a resulting CSV file. In Python, I'm able to open both my source file and my results file with DictReader and DictWriter respectively and write rows as dictionaries, where keys are the file header values. It doesn't appear that there is a manageable way to do this in Ruby, but I'm hoping somebody can point me to a better solution than storing all of my result hashes in an array and writing them after the fact.
The standard library "CSV" gives rows hash-like behavior when headers are enabled.
require 'csv'
CSV.open("file.csv", "wb") do |csv_out|
CSV.foreach("test.csv", headers: true) do |row|
row["header2"].upcase! # hashlike behaviour
row["new_header"] = 12 # add a new column
csv_out << row
end
end
(test.csv has a header1, a header2 and some random comma separated string lines.)
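If you want something even closer to DictWriter, where you declare the field names up front and then append plain hashes, here is a sketch under assumed file and column names:
require 'csv'

headers = %w[header1 header2 new_header]

# write_headers: true emits the header row first, like DictWriter's writeheader
CSV.open('out.csv', 'wb', write_headers: true, headers: headers) do |csv_out|
  rows = [
    { 'header1' => 'a', 'header2' => 'B', 'new_header' => 12 },
    { 'header1' => 'c', 'header2' => 'D', 'new_header' => 34 },
  ]
  rows.each do |row|
    csv_out << row.values_at(*headers)   # values in header order, like writerow
  end
end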

CSV.generate and converters?

I'm trying to create a converter to remove newline characters from CSV output.
I've got:
nonewline=lambda do |s|
s.gsub(/(\r?\n)+/,' ')
end
I've verified that this works properly IF I load a variable and then run something like:
csv=CSV(variable,:converters=>[nonewline])
However, I'm attempting to use this code to update a bunch of preexisting code using CSV.generate, and it does not appear to work at all.
CSV.generate(:converters=>[nonewline]) do |csv|
csv << ["hello\ngoodbye"]
end
returns:
"\"hello\ngoodbye\"\n"
I've tried quite a few things as well as trying other examples I've found online, and it appears as though :converters has no effect when used with CSV.generate.
Is this correct, or is there something I'm missing?
You need to write your converter as below:
CSV::Converters[:nonewline] = lambda do |s|
s.gsub(/(\r?\n)+/,' ')
end
Then do :
CSV.generate(:converters => [:nonewline]) do |csv|
csv << ["hello\ngoodbye"]
end
Read the documentation on Converters.
Okay, I left the part above in to show you how to write custom CSV converters; the way you wrote yours is incorrect.
Read the documentation of CSV::generate
This method wraps a String you provide, or an empty default String, in a CSV object which is passed to the provided block. You can use the block to append CSV rows to the String and when the block exits, the final String will be returned.
After reading the docs, it is quite clear that this method is for writing CSV, not for reading it. The converter options (like :converters and :header_converters) are applied when you are reading a CSV file, but not when you are writing one.
Let me show you 2 examples to illustrate this more clearly.
require 'csv'
string = <<_
foo,bar
baz,quack
_
File.write('a',string)
CSV::Converters[:upcase] = lambda do |s|
s.upcase
end
I am reading from a CSV file, so the :converters option is applied:
CSV.open('a','r',:converters => :upcase) do |csv|
puts csv.read
end
output
# >> FOO
# >> BAR
# >> BAZ
# >> QUACK
Now I am writing into the CSV file, so the :converters option is not applied:
CSV.open('a','w',:converters => :upcase) do |csv|
csv << ['dog','cat']
end
CSV.read('a') # => [["dog", "cat"]]
Attempting to remove newlines using :converters did not work.
I had to override the << method from csv.rb, adding the following code to it:
# Change all CR/NL's into one space
row.map! { |element|
if element.is_a?(String)
element.gsub(/(\r?\n)+/,' ')
else
element
end
}
Placed right before
output = row.map(&#quote).join(#col_sep) + #row_sep # quote and separate
at line 21.
I would think this would be a good patch to CSV, as newlines will always produce bad CSV output.
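If you'd rather not patch csv.rb itself, a less invasive sketch (my own variation, not the poster's patch) is to clean each row just before appending it:
require 'csv'

strip_newlines = ->(field) do
  field.is_a?(String) ? field.gsub(/(\r?\n)+/, ' ') : field
end

output = CSV.generate do |csv|
  csv << ["hello\ngoodbye", 42].map(&strip_newlines)   # newlines become spaces
end
output # => "hello goodbye,42\n"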

Removing whitespace in a CSV file

I have a string with extra whitespace:
First,Last,Email ,Mobile Phone ,Company,Title ,Street,City,State,Zip,Country, Birthday,Gender ,Contact Type
I want to parse this line and remove the whitespaces.
My code looks like:
namespace :db do
task :populate_contacts_csv => :environment do
require 'csv'
csv_text = File.read('file_upload_example.csv')
csv = CSV.parse(csv_text, :headers => true)
csv.each do |row|
puts "First Name: #{row['First']} \nLast Name: #{row['Last']} \nEmail: #{row['Email']}"
end
end
end
@prices = CSV.parse(IO.read('prices.csv'), :headers => true,
  :header_converters => lambda {|f| f.strip},
  :converters => lambda {|f| f ? f.strip : nil})
The nil test is in the field converter but not the header converter, on the assumption that headers are never nil while the data might be, and nil doesn't have a strip method. I'm really surprised that, AFAIK, :strip is not a pre-defined converter!
You can strip your hash first:
csv.each do |unstriped_row|
row = {}
unstriped_row.each{|k, v| row[k.strip] = v.strip}
puts "First Name: #{row['First']} \nLast Name: #{row['Last']} \nEmail: #{row['Email']}"
end
Edited to strip hash keys too
CSV supports "converters" for the headers and fields, which let you get inside the data before it's passed to your each loop.
Writing a sample CSV file:
csv = "First,Last,Email ,Mobile Phone ,Company,Title ,Street,City,State,Zip,Country, Birthday,Gender ,Contact Type
first,last,email ,mobile phone ,company,title ,street,city,state,zip,country, birthday,gender ,contact type
"
File.write('file_upload_example.csv', csv)
Here's how I'd do it:
require 'csv'
csv = CSV.open('file_upload_example.csv', :headers => true)
[:convert, :header_convert].each { |c| csv.send(c) { |f| f.strip } }
csv.each do |row|
puts "First Name: #{row['First']} \nLast Name: #{row['Last']} \nEmail: #{row['Email']}"
end
Which outputs:
First Name: 'first'
Last Name: 'last'
Email: 'email'
The converters simply strip leading and trailing whitespace from each header and each field as they're read from the file.
Also, as a programming design choice, don't read your file into memory using:
csv_text = File.read('file_upload_example.csv')
Then parse it:
csv = CSV.parse(csv_text, :headers => true)
Then loop over it:
csv.each do |row|
Ruby's IO system supports "enumerating" over a file, line by line. Once my code does CSV.open, the file is readable and each reads one line at a time. The entire file doesn't need to be in memory at once, which wouldn't scale well (though on new machines it's becoming a lot more reasonable), and, if you test, you'll find that reading a file using each is extremely fast, probably as fast as reading it, parsing it, then iterating over the parsed content.
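For comparison, CSV.foreach gives the same streaming behaviour in a single call, and the lambda-converter idea from the earlier answer can strip whitespace as each row is read (a sketch using the same sample file):
require 'csv'

strip = lambda { |f| f.respond_to?(:strip) ? f.strip : f }

# Streams the file row by row; the lambdas strip headers and fields on the fly.
CSV.foreach('file_upload_example.csv',
            headers: true,
            header_converters: strip,
            converters: strip) do |row|
  puts "First Name: #{row['First']}"
end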

Parse CSV file with header fields as attributes for each row

I would like to parse a CSV file so that each row is treated like an object with the header-row being the names of the attributes in the object. I could write this, but I'm sure its already out there.
Here is my CSV input:
"foo","bar","baz"
1,2,3
"blah",7,"blam"
4,5,6
The code would look something like this:
CSV.open('my_file.csv','r') do |csv_obj|
puts csv_obj.foo #prints 1 the 1st time, "blah" 2nd time, etc
puts csv.bar #prints 2 the first time, 7 the 2nd time, etc
end
With Ruby's CSV module I believe I can only access the fields by index. I think the above code would be a bit more readable. Any ideas?
Using Ruby 1.9 and above, you can get an indexable object:
CSV.foreach('my_file.csv', :headers => true) do |row|
puts row['foo'] # prints 1 the 1st time, "blah" 2nd time, etc
puts row['bar'] # prints 2 the first time, 7 the 2nd time, etc
end
It's not dot syntax but it is much nicer to work with than numeric indexes.
As an aside, for Ruby 1.8.x FasterCSV is what you need to use the above syntax.
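If you really do want the dot syntax from the question, one small sketch (not part of this answer, and it trades away some speed) wraps each row in an OpenStruct; row.to_h needs Ruby 2.2+, use to_hash on older versions:
require 'csv'
require 'ostruct'

CSV.foreach('my_file.csv', headers: true, header_converters: :symbol) do |row|
  record = OpenStruct.new(row.to_h)   # header symbols become reader methods
  puts record.foo   # prints 1 the 1st time, "blah" the 2nd time, etc.
  puts record.bar
end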
Here is an example of the symbolic syntax using Ruby 1.9. In the examples below, the code reads a CSV file named data.csv from the Rails db directory.
The :headers => true option treats the first row as a header instead of a data row. The :header_converters => :symbol option then converts each cell in the header row into a Ruby symbol.
CSV.foreach("#{Rails.root}/db/data.csv", {:headers => true, :header_converters => :symbol}) do |row|
puts "#{row[:foo]},#{row[:bar]},#{row[:baz]}"
end
In Ruby 1.8:
require 'fastercsv'
CSV.foreach("#{Rails.root}/db/data.csv", {:headers => true, :header_converters => :symbol}) do |row|
puts "#{row[:foo]},#{row[:bar]},#{row[:baz]}"
end
Based on the CSV provided by Poul (the Stack Overflow asker), the output from the example code above will be:
1,2,3
blah,7,blam
4,5,6
Depending on the characters used in the headers of the CSV file, it may be necessary to output the headers in order to see how CSV (FasterCSV) converted the string headers to symbols. You can output the array of headers from within the CSV.foreach.
row.headers
Easy to get a hash in Ruby 2.3:
CSV.foreach('my_file.csv', headers: true, header_converters: :symbol) do |row|
puts row.to_h[:foo]
puts row.to_h[:bar]
end
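Since to_h builds a new hash on every call, it is worth converting once per row if you read several fields (a small sketch, same file assumed):
require 'csv'

CSV.foreach('my_file.csv', headers: true, header_converters: :symbol) do |row|
  h = row.to_h   # convert once, then index the plain hash
  puts h[:foo]
  puts h[:bar]
end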
Although I am pretty late to the discussion, a few months ago I started a "CSV to object mapper" at https://github.com/vicentereig/virgola.
Given your CSV contents, mapping them to an array of FooBar objects is pretty straightforward:
"foo","bar","baz"
1,2,3
"blah",7,"blam"
4,5,6
require 'virgola'
class FooBar
include Virgola
attribute :foo
attribute :bar
attribute :baz
end
csv = <<CSV
"foo","bar","baz"
1,2,3
"blah",7,"blam"
4,5,6
CSV
foo_bars = FooBar.parse(csv).all
foo_bars.each { |foo_bar| puts foo_bar.foo, foo_bar.bar, foo_bar.baz }
Since I hit this question with some frequency:
array_of_hashmaps = CSV.read("path/to/file.csv", headers: true)
puts array_of_hashmaps.first["foo"] # 1
This is the non-block version, when you want to slurp the whole file.
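Strictly speaking, CSV.read with headers: true returns a CSV::Table rather than an array of plain hashes; if you want real hashes, a small follow-up sketch:
require 'csv'

table  = CSV.read('path/to/file.csv', headers: true)   # CSV::Table
hashes = table.map(&:to_h)                              # array of {"foo"=>"1", ...} hashes
puts hashes.first['foo']   # 1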
