Ruby CSV - Illegal quoting in line 1. CSV::MalformedCSVError

I have a problem reading a CSV file. The file comes from Windows, so I suspect there are some encoding issues. My code looks like this:
CSV.open(path, 'w', headers: :first_row, col_sep: ';', row_sep: "\r\n", encoding: 'utf-8') do |csv|
CSV.parse(open(doc.file.url), headers: :first_row, col_sep: ';', quote_char: "\"", row_sep: "\r\n", encoding: 'utf-8').each_with_index do |line, index|
csv << line.headers if index == 0
# do something with row
csv << line
end
end
I have to open an existing file and fill in some of its columns, so I create a new file. The existing file is stored on Dropbox, so I have to use the open method.
The problem is that I get an error in this line:
CSV.parse(open(doc.file.url), headers: :first_row, col_sep: ';', quote_char: "\"", row_sep: "\r\n", encoding: 'utf-8').each_with_index do |line, index|
The error is:
Illegal quoting in line 1. CSV::MalformedCSVError
I checked, and it seems there are no BOM characters in the file (though I am not sure I checked correctly). The problem seems to be the quote character. The exception is thrown for every line in the file.
This is the file that causes me problems: https://dl.dropboxusercontent.com/u/3900955/geo_bez_adresu_10_do_testow_small.csv
I tried different approaches from StackOverflow but nothing helps, for example I changed my code into this:
CSV.open(path, 'w', headers: :first_row, col_sep: ';', row_sep: "\r\n", encoding: 'utf-8') do |csv|
open(doc.file.url) do |f|
f.each_line do |line|
CSV.parse(line, 'r:bom|utf-8') do |row|
csv << row
end
end
end
end
but it doesn't help. I will be grateful for any help with parsing this file.
======= edit =========
When I save the same file on Windows with the encoding "ANSI as UTF-8" (in Notepad++), I can parse it correctly. From the discussion What is "ANSI as UTF-8" and how can I make fputcsv() generate UTF-8 w/BOM?, it seems that the original file has a BOM. How can I check in Ruby whether my file has a BOM, and how can I parse a CSV file that has one?

CSV.parse() requires a string as its first argument, but you're passing a File object instead. What happens is that parse() ends up parsing the value of (file object).to_s instead, and that causes the error.
Update
To read file with BOM you can have this:
CSV.new(File.open('file.csv', 'r:bom|utf-8'), col_sep: ';').each do |row|
...
end
Reference: https://stackoverflow.com/a/7780559/445221
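For the first part of the question (how to check for a BOM), a minimal sketch, assuming a local file path: compare the first three bytes of the file with the UTF-8 BOM bytes EF BB BF. The helper name utf8_bom? is hypothetical, not part of the CSV library; the last line shows that the 'bom|utf-8' open mode strips the mark while reading.

```ruby
require 'tempfile'

# The UTF-8 byte order mark, as raw bytes.
UTF8_BOM_BYTES = "\xEF\xBB\xBF".b

# Hypothetical helper: true when the file at +path+ starts with a UTF-8 BOM.
def utf8_bom?(path)
  File.binread(path, 3) == UTF8_BOM_BYTES
end

Tempfile.create(['bom_check', '.csv']) do |f|
  f.binmode
  f.write("\xEF\xBB\xBFa;b\r\n".b)
  f.flush
  puts utf8_bom?(f.path)                    # prints true
  # Opening with 'bom|utf-8' drops the BOM before the data is returned.
  p File.read(f.path, mode: 'rb:bom|utf-8') # prints "a;b\r\n"
end
```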

I didn't find any way to read a remote file directly if it contains a BOM. So I use Tempfile to create a temporary file and then call CSV.open with 'r:bom|utf-8':
require 'csv'
require 'open-uri'
require 'tempfile'
doc = Document.find(doc_id)
path = "#{Rails.root.join('tmp')}/#{doc.name.split('.').first}_#{Time.now.to_i}.csv"
file = Tempfile.new(["#{doc.name.split('.').first}_#{Time.now.to_i}", '.csv'])
file.binmode
# URI.open (Ruby 2.5+); Kernel#open no longer opens URLs as of Ruby 3.0
file << URI.open(doc.file.url).read
file.close
CSV.open(path, 'w', headers: :first_row, col_sep: ';', row_sep: "\r\n", encoding: 'utf-8') do |csv|
# 'r:bom|utf-8' strips the BOM before CSV sees the first field
CSV.open(file.path, 'r:bom|utf-8', headers: :first_row, col_sep: ';', quote_char: "\"", row_sep: "\r\n").each_with_index do |line, index|
# do something with each row
end
end
Now, it seems to parse the file.
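If writing a temporary file is undesirable, one alternative is to strip the BOM from the downloaded string before handing it to CSV.parse. A minimal sketch, assuming the whole file fits in memory; parse_csv_without_bom is a hypothetical helper, and the sample bytes stand in for what URI.open(doc.file.url).read would return. The second part reproduces the question's error: the BOM sits in front of the first quote, so the first field looks like an unquoted value containing a quote.

```ruby
require 'csv'

BOM = "\uFEFF" # U+FEFF, encoded in UTF-8 as the bytes EF BB BF

# Hypothetical helper: label the raw bytes as UTF-8, drop a leading BOM,
# then parse as usual.
def parse_csv_without_bom(raw, **options)
  text = raw.dup.force_encoding(Encoding::UTF_8)
  CSV.parse(text.delete_prefix(BOM), **options)
end

# Sample data standing in for a downloaded file that starts with a BOM.
raw = "\xEF\xBB\xBF\"name\";\"city\"\r\n\"Jan\";\"Warszawa\"\r\n".b

table = parse_csv_without_bom(raw, col_sep: ';', headers: true)
puts table.first['city'] # prints Warszawa

# Without stripping the BOM, parsing fails with CSV::MalformedCSVError.
begin
  CSV.parse(raw.dup.force_encoding(Encoding::UTF_8), col_sep: ';')
rescue CSV::MalformedCSVError => e
  puts e.message
end
```

String#delete_prefix needs Ruby 2.5 or later; on older versions, text.sub(/\A\uFEFF/, '') does the same job.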

Related

How to encode csv file in Roo (Rails) : invalid byte sequence in UTF-8

I am trying to upload a CSV file, but I get an "invalid byte sequence in UTF-8" error. I am using the 'roo' gem.
My code is like this :
def upload_results_csv file
spreadsheet = MyFileUtil.open_file(file)
header = spreadsheet.row(1) # THIS LINE RAISES THE ERROR
(2..spreadsheet.last_row).each do |i|
row = Hash[[header, spreadsheet.row(i)].transpose]
...
...
end
class MyFileUtil
def self.open_file(file)
case File.extname(file.original_filename)
when ".csv" then
Roo::Csv.new(file.path,csv_options: {encoding: Encoding::UTF_8})
when ".xls" then
Roo::Excel.new(file.path, nil, :ignore)
when ".xlsx" then
Roo::Excelx.new(file.path, nil, :ignore)
else
raise "Unknown file type: #{file.original_filename}"
end
end
end
I don't know how to set the encoding of the CSV file. Please help!
To safely convert a string to UTF-8, you can do:
str.encode('utf-8', 'binary', invalid: :replace, undef: :replace, replace: '')
Also see this blog post.
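Because the source encoding is given as 'binary', that call treats every byte above 0x7F as undefined, so valid multibyte characters are removed along with the broken ones. If the input is mostly valid UTF-8, String#scrub (Ruby 2.1+) may be a better fit, since it removes only the invalid bytes. A small comparison on a made-up string:

```ruby
# A UTF-8 string with one valid accented character and one stray byte (\xFF).
dirty = "caf\xC3\xA9 \xFF"

# Converting from BINARY: every non-ASCII byte is "undefined" and replaced with ''.
via_binary = dirty.encode('utf-8', 'binary', invalid: :replace, undef: :replace, replace: '')

# String#scrub: only the invalid byte is replaced with ''.
via_scrub = dirty.scrub('')

p via_binary # prints "caf "
p via_scrub  # prints "café "
```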
Since the roo gem only takes file names as a constructor argument, not plain IO objects, the only solution I can think of is to write a sanitized version to a tempfile and pass that to roo, along the lines of:
require 'tempfile'
def upload_results_csv file
tmpfile = Tempfile.new(file.path)
tmpfile.write(File.read(file.path).encode('utf-8', 'binary', invalid: :replace, undef: :replace, replace: ''))
tmpfile.rewind # rewinding also flushes the written data, so roo can read the file by its path
spreadsheet = MyFileUtil.open_file(tmpfile, file.original_filename)
header = spreadsheet.row(1) # this line raised the error before the input was sanitized
# ...
ensure
tmpfile&.close # tmpfile is nil if Tempfile.new itself failed
tmpfile&.unlink
end
You need to alter MyFileUtil as well, because the original filename needs to be passed down:
class MyFileUtil
def self.open_file(file, original_filename)
case File.extname(original_filename)
when ".csv" then
Roo::Csv.new(file.path,csv_options: {encoding: Encoding::UTF_8})
when ".xls" then
Roo::Excel.new(file.path, nil, :ignore)
when ".xlsx" then
Roo::Excelx.new(file.path, nil, :ignore)
else
raise "Unknown file type: #{original_filename}"
end
end
end

Read a csv in ruby with UTF-8 literal

I have this CSV file (output of the file command):
$ file data.csv
data.csv: ASCII text
This file has ~10000 lines with some UTF-8 literal chars.
For example:
1388357672.209253000,48:a2:2d:78:84:10,\xe5\x87\xb6\xe5\xb7\xb4\xe5\xb7\xb4\xe8\x87\xad\xe7\x98\xaa\xe7\x98\xaa\xe7\x9a\x84\xe6\x80\xaa\xe5\x85\xbd\xe5\x87\xba
I iterate over this file in Ruby and save every line to my PostgreSQL database:
File.open(filename, "r").each_line do |line|
CSV.parse(line, encoding: 'UTF-8') do |row|
# Save to PostgreSQL
end
end
Now the problem is that the literal escape text (\xe5\x87...) is saved in the database instead of the actual UTF-8 string. I can convert every line with echo -e "line", but that takes a long time. Is there a way for Ruby to do this?
Try this:
CSV.parse(line, encoding: 'UTF-8') do |row|
row = row.map do |elem|
elem.gsub(/\\x../) {|s| [s[2..-1].hex].pack("C")}.force_encoding("UTF-8")
end
# Save to PostgreSQL
end
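The same idea as a standalone helper, in a minimal sketch: decode_hex_escapes is a hypothetical name, and it assumes every \xNN sequence in a field is meant as one raw byte of UTF-8 text. Working on a binary copy (String#b) keeps the byte substitution free of encoding-compatibility errors; the result is relabelled as UTF-8 at the end.

```ruby
# Hypothetical helper: turn literal "\xNN" escape sequences (backslash, x,
# two hex digits) into the bytes they describe, then label the result UTF-8.
def decode_hex_escapes(text)
  decoded = text.b.gsub(/\\x(\h{2})/) { Regexp.last_match(1).hex.chr }
  decoded.force_encoding(Encoding::UTF_8)
end

field = '\xe5\x87\xb6\xe5\xb7\xb4' # single quotes: the backslashes are literal text
puts decode_hex_escapes(field)     # prints 凶巴
```

Fields without escapes (such as the MAC address column) pass through unchanged; calling valid_encoding? on the result is a cheap way to catch escapes that do not form valid UTF-8.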
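Just put each cell in double quotes. In a Ruby double-quoted string literal (for example in irb), the \x escapes are interpreted as bytes; the CSV parser itself does not interpret them inside quoted fields: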
"\xe5\x87\xb6\xe5\xb7\xb4\xe5\xb7\xb4\xe8\x87\xad\xe7\x98\xaa\xe7\x98\xaa\xe7\x9a\x84\xe6\x80\xaa\xe5\x85\xbd\xe5\x87\xba"
=> "凶巴巴臭瘪瘪的怪兽出"

Ruby CSV parsing string with escaped quotes

I have a line in my CSV file that has some escaped quotes:
173,"Yukihiro \"The Ruby Guy\" Matsumoto","Japan"
When I try to parse it with the Ruby CSV parser:
require 'csv'
CSV.foreach('my.csv', headers: true, header_converters: :symbol) do |row|
puts row
end
I get this error:
.../1.9.3-p327/lib/ruby/1.9.1/csv.rb:1914:in `block (2 levels) in shift': Missing or stray quote in line 122 (CSV::MalformedCSVError)
How can I get around this error?
The \" is typical Unix whereas Ruby CSV expects ""
To parse it:
require 'csv'
text = File.read('test.csv').gsub(/\\"/,'""')
CSV.parse(text, headers: true, header_converters: :symbol) do |row|
puts row
end
Note: if your CSV file is very large, reading the entire file into memory uses a lot of RAM. Consider reading the file one line at a time.
Note: if your CSV file may contain escaped backslashes in front of quotes (\\"), use Andrew Grimm's suggestion below:
gsub(/(?<!\\)\\"/,'""')
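A small sketch of the two substitutions side by side (double_escaped_quotes is a hypothetical name): the lookbehind version turns \" into "" so that CSV.parse_line accepts the sample line, and it leaves an escaped backslash followed by a quote (\\") alone, whereas the simpler /\\"/ rewrites that case too.

```ruby
require 'csv'

# Replace each backslash-escaped quote (\") with a doubled quote (""),
# skipping quotes whose preceding backslash is itself escaped (\\").
def double_escaped_quotes(text)
  text.gsub(/(?<!\\)\\"/, '""')
end

line = '173,"Yukihiro \"The Ruby Guy\" Matsumoto","Japan"'
p CSV.parse_line(double_escaped_quotes(line))
# prints ["173", "Yukihiro \"The Ruby Guy\" Matsumoto", "Japan"]

tricky = 'x\\\\"' # the characters x, \, \, "
p double_escaped_quotes(tricky) # unchanged by the lookbehind version
p tricky.gsub(/\\"/, '""')      # the simple version rewrites the final \"
```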
CSV supports "converters", which we can normally use to massage the content of a field before it's passed back to our code. For instance, that can be used to strip extra spaces on all fields in a row.
Unfortunately, the converters fire off after the line is split into fields, and it's during that step that CSV is getting mad about the embedded quotes, so we have to get between the "line read" step, and the "parse the line into fields" step.
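To illustrate the ordering, a small sketch (the strip lambda is only an example converter): the converter tidies fields that were split successfully, but a line with backslash-escaped quotes fails while it is being split, before any converter runs.

```ruby
require 'csv'

# A field converter: a lambda that receives each field after the line is split.
strip = ->(field) { field.strip }

# Converters clean up fields the parser has already produced...
p CSV.parse_line(' a , b ', converters: [strip]) # prints ["a", "b"]

# ...but they never see a line the parser rejects while splitting it.
begin
  CSV.parse_line('1,"a \"b\" c"', converters: [strip])
rescue CSV::MalformedCSVError => e
  puts "parser error: #{e.message}"
end
```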
This is my sample CSV file:
ID,Name,Country
173,"Yukihiro \"The Ruby Guy\" Matsumoto","Japan"
Staying close to your CSV.foreach approach, here is my example code, which parses it without CSV getting mad:
require 'csv'
require 'pp'
header = []
File.foreach('test.csv') do |csv_line|
row = CSV.parse(csv_line.gsub('\"', '""')).first
if header.empty?
header = row.map(&:to_sym)
next
end
row = Hash[header.zip(row)]
pp row
puts row[:Name]
end
And the resulting hash and name value:
{:ID=>"173", :Name=>"Yukihiro \"The Ruby Guy\" Matsumoto", :Country=>"Japan"}
Yukihiro "The Ruby Guy" Matsumoto
I assumed you wanted a hash back because you specified the :headers flag:
CSV.foreach('my.csv', headers: true, header_converters: :symbol) do |row|
Open the file in MS Excel and save it as MS-DOS Comma Separated (.csv).

Ruby CSV, using square brackets as row separators

I'm trying to use square brackets '[]' as a row separator in a CSV file. I must use this exact format for this project (output needs to match LEDES98 law invoicing format exactly).
I'm trying to do this:
CSV.open('output.txt', 'w', col_sep: '|', row_sep: '[]') do |csv|
#Do Stuff
end
But Ruby won't take row_sep: '[]' and throws this error:
lib/ruby/1.9.1/csv.rb:2309:in `initialize': empty char-class: /[]\z/ (RegexpError)
I've tried escaping the characters with /'s, using double quotes, etc, but nothing has worked yet. What's the way to do this?
The problem is in CSV#encode_re: the parameter row_sep: "|[]\n" is converted to a Regexp.
What you can do is redefine this method:
class CSV
def encode_re(*chunks)
encode_str(*chunks)
end
end
CSV.open('output.txt', 'w', col_sep: '|', row_sep: "|[]\n") do |csv|
csv << [1,2,3]
csv << [4,5,6]
end
The result is:
1|2|3|[]
4|5|6|[]
I found no side effects, but I don't feel comfortable redefining CSV, so I would recommend creating a new CSV variant:
#Class to create LEDES98
class LEDES_CSV < CSV
def encode_re(*chunks)
encode_str(*chunks)
end
end
LEDES_CSV.open('output.txt', 'w', col_sep: '|', row_sep: "|[]\n") do |csv|
csv << [1,2,3]
csv << [4,5,6]
end
Then you can use the 'original' CSV and for LEDES-files you can use the LEDES_CSV.
Given an input string of the form
s = "[cat][dog][horsey\nhorse]"
you could use something like
s.scan(/\[(.*?)\]/m).flatten
which would return ["cat", "dog", "horsey\nhorse"], and then process that with the CSV module.
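Carrying that through to the CSV step, a minimal sketch: it assumes the bracketed input form used in this answer, with | between fields inside each record (the sample input is made up). Each captured record is split with CSV.parse_line and col_sep: '|'.

```ruby
require 'csv'

# Sample input in the bracketed form above, with "|" between fields.
input = "[1|2|3][4|5|6]"

# Pull out the text between each pair of brackets, then split each record.
records = input.scan(/\[(.*?)\]/m).flatten
rows = records.map { |record| CSV.parse_line(record, col_sep: '|') }
p rows # prints [["1", "2", "3"], ["4", "5", "6"]]
```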
I just tried
require 'csv'
#Create LEDES98
CSV.open('output.txt', 'w', col_sep: '|', row_sep: '[]') do |csv|
csv << [1,2,3]
csv << [4,5,6]
end
and I got
1|2|3[]4|5|6[]
Which csv/ruby-version do you use? My CSV::VERSION is 2.4.7, my ruby version is 1.9.2p290 (2011-07-09) [i386-mingw32].
Another remark:
If I look at the example files at http://www.ledes.org/, you need additional newlines. I would recommend using:
require 'csv'
#Create LEDES98
CSV.open('output.txt', 'w', col_sep: '|', row_sep: "[]\n") do |csv|
csv << [1,2,3,nil]
csv << [4,5,6,nil]
end
Result:
1|2|3|[]
4|5|6|[]
The additional nils give you the last | before the [].
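If subclassing or patching CSV feels too invasive, another option for writing is to skip CSV's row_sep handling and build the lines directly. A minimal sketch that reproduces the result above; ledes_lines is a hypothetical helper, and because nothing is quoted or escaped it assumes no field contains | or [], which should be checked against the LEDES98 specification.

```ruby
# Hypothetical helper: join each row's fields with "|" and end every line
# with "|[]\n". nil becomes an empty field via to_s.
def ledes_lines(rows)
  rows.map { |fields| fields.map(&:to_s).join('|') + "|[]\n" }.join
end

print ledes_lines([[1, 2, 3], [4, 5, 6]])
# 1|2|3|[]
# 4|5|6|[]
```

The string can then be written with File.write('output.txt', ...).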
I tested on another computer with ruby 1.9.3p194 (2012-04-20) [i386-mingw32] and got the same error.
I researched a bit and was able to isolate the problem:
p "[]" #[]
p "\[\]" #[] <--- Problem
p "\\[\\]" #\\[\\]
You can't escape the [ this way. If you escape it once, Ruby produces a plain [ (without the backslash). If you escape it twice, you escape only the \, not the bracket.
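A short sketch of what is going on, based on the error message in the question (/[]\z/): an unescaped "[]" interpolated into a regex opens an empty character class. Regexp.escape produces the backslashes a regex needs to match "[]" literally; whether a given CSV version applies this to row_sep is not shown here.

```ruby
# "\[" is not a recognized escape in a double-quoted string, so Ruby keeps just "[".
p "\[\]" == "[]" # prints true

# Used unescaped in a regex, "[]" is an empty character class.
begin
  Regexp.new('[]\z')
rescue RegexpError => e
  puts e.message # mentions "empty char-class"
end

# Regexp.escape adds the backslashes needed to match "[]" literally.
escaped = Regexp.escape('[]')
p escaped                           # prints "\\[\\]"
p "1|2|3|[]".match?(/#{escaped}\z/) # prints true
```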

Ruby - remove columns from csv file and convert to pipe delimited txt file

I'm trying to take a CSV file, strip a few columns, and then output a pipe delimited text file.
Here's my code, which almost works. The only problem is that the CSV.generate block adds double quotes around the whole thing, as well as a stray comma wrapped in double quotes where the line break is.
require 'csv'
original = CSV.read('original.csv', { headers: true, return_headers: true })
original.delete('Column header 1')
original.delete('Column header 2')
original.delete('Column header 3')
csv_string = CSV.generate do |csv|
csv << original
end
pipe_string = csv_string.tr(",","|")
File.open('output.txt', 'w+') do |f|
f.write(pipe_string)
end
Is there a better way to do this? Any help is appreciated.
Try this:
require 'csv'
original = CSV.read('original.csv', headers: true, return_headers: true) # keyword arguments; a positional hash raises ArgumentError on Ruby 3
original.delete('Column header 1')
original.delete('Column header 2')
original.delete('Column header 3')
CSV.open('output.txt', 'w', col_sep: '|') do |csv|
original.each do |row|
csv << row
end
end
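A variation on the answer above as a testable sketch that works on strings instead of files (to_pipe_delimited and the sample data are hypothetical): parse with headers, delete the unwanted columns from the CSV::Table, then let CSV.generate write the remaining headers and fields with col_sep: '|'. In practice, csv_text could come from File.read('original.csv') and the result could be written to output.txt.

```ruby
require 'csv'

# Hypothetical helper: drop the named columns and re-emit the rest with "|"
# as the column separator.
def to_pipe_delimited(csv_text, drop:)
  table = CSV.parse(csv_text, headers: true)
  drop.each { |header| table.delete(header) } # a String argument deletes a column
  CSV.generate(col_sep: '|') do |out|
    out << table.headers
    table.each { |row| out << row.fields }
  end
end

print to_pipe_delimited("a,b,c\n1,2,3\n", drop: ['b'])
# a|c
# 1|3
```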
