Ruby: How to generate CSV files that have Excel-friendly encoding

I am generating CSV files that need to be opened and reviewed in Excel once they have been generated. It seems that Excel requires a different encoding than UTF-8.
Here is my config and generation code:
csv_config = {col_sep: ";",
row_sep: "\n",
encoding: Encoding::UTF_8
}
csv_string = CSV.generate(**csv_config) do |csv| # ** passes the hash as keyword options (needed on Ruby 3+)
csv << ["Text a", "Text b", "Text æ", "Text ø", "Text å"]
end
When opening this in Excel, the special characters are not being displayed properly:
Text a Text b Text æ Text ø Text å
Any idea how to ensure proper encoding?

Excel understands a UTF-8 CSV if it starts with a BOM (byte order mark). You can add one in either of these ways:
Use CSV.generate
# the first argument of CSV.generate is the initial string that rows are appended to
csv_string = CSV.generate("\uFEFF") do |csv|
csv << ["Text a", "Text b", "Text æ", "Text ø", "Text å"]
end
Use CSV.open
filename = "/tmp/example.csv"
# Default output encoding is UTF-8
CSV.open(filename, "w") do |csv|
csv.to_io.write "\uFEFF" # use CSV#to_io to write BOM directly
csv << ["Text a", "Text b", "Text æ", "Text ø", "Text å"]
end
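For reference, here is a minimal sketch of the CSV.generate variant with a check that the BOM really ends up at the start of the output. The helper name `excel_csv` and the `BOM` constant are made up for illustration, and the `;` separator is just taken from the question. The `.dup` is only there so the seed string is not frozen when frozen string literals are enabled.

```ruby
require "csv"

BOM = "\uFEFF" # U+FEFF, which encodes to the bytes EF BB BF in UTF-8

# Hypothetical helper: builds a semicolon-separated CSV string that starts with a BOM.
def excel_csv(rows)
  # .dup hands CSV.generate an unfrozen seed string to append to
  CSV.generate(BOM.dup, col_sep: ";", row_sep: "\n") do |csv|
    rows.each { |row| csv << row }
  end
end

csv_string = excel_csv([["Text a", "Text æ", "Text ø", "Text å"]])

# Show the first three bytes in hex to confirm the BOM is in place
first_bytes = csv_string.bytes.first(3).map { |b| format("%02X", b) }
puts first_bytes.join(" ")
```

If the three bytes are missing, something between generation and delivery (for example a later re-encoding step) has stripped them.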

The top-voted answer from @joaofraga worked for me, but I found an alternative solution that also worked, with no UTF-8 to ISO-8859-1 transcoding required.
From what I've read, Excel can indeed handle UTF-8, but for some reason it doesn't recognize it by default. If you add a BOM to the beginning of the CSV data, this seems to make Excel realise that the file is UTF-8.
So, if you have a CSV like so:
csv_string = CSV.generate(**csv_config) do |csv| # ** passes the hash as keyword options (needed on Ruby 3+)
csv << ["Text a", "Text b", "Text æ", "Text ø", "Text å"]
end
just prepend a BOM (U+FEFF, which is three bytes in UTF-8) like so:
"\uFEFF" + csv_string
In my case, my controller is sending the CSV as a file, so this is what my controller looks like:
def show
respond_to do |format|
format.csv do
# add BOM to force Excel to realise this file is encoded in UTF-8, so it respects special characters
send_data "\uFEFF" + csv_string, type: :csv, filename: "csv.csv"
end
end
end
I should note that UTF-8 itself does not require or recommend a BOM at all, but as I mentioned, adding it in this case seemed to nudge Excel into realising that the file was indeed UTF-8.

You should switch the encoding to ISO-8859-1 as follows:
CSV.generate(encoding: 'ISO-8859-1') { |csv| csv << ["Text á", "Text é", "Text æ"] }
For your context, you can do this:
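config = {
col_sep: ';',
row_sep: "\n", # rows end with a newline; only columns are separated by ';'
encoding: 'ISO-8859-1'
}
# ** passes the hash as keyword options (needed on Ruby 3+)
CSV.generate(**config) { |csv| csv << ["Text á", "Text é", "Text æ"] }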
I had the same issue and that encoding fixed it.
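A related sketch, using a different technique from the answer above: generate the CSV as UTF-8 and transcode the finished string with String#encode. This makes it explicit what happens to characters that ISO-8859-1 cannot represent. The sample row and the `?` replacement are assumptions for illustration only.

```ruby
require "csv"

# Generate as UTF-8 first, then transcode the finished string.
utf8_csv = CSV.generate(col_sep: ";", row_sep: "\n") do |csv|
  csv << ["Text á", "Text æ", "Text Ω"]
end

# "Ω" has no ISO-8859-1 code point, so a strict encode would raise
# Encoding::UndefinedConversionError. undef: :replace substitutes "?" instead.
latin1_csv = utf8_csv.encode(Encoding::ISO_8859_1, undef: :replace, replace: "?")

puts latin1_csv.encoding
puts latin1_csv.bytesize # one byte per character after transcoding
```

Whether silently replacing characters is acceptable depends on the data; leaving out `undef: :replace` makes the conversion fail loudly instead.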

config = {
encoding: 'ISO-8859-1'
}
# ** passes the hash as keyword options (needed on Ruby 3+)
CSV.generate(**config) { |csv| csv << ["Text á", "Text é", "Text æ"] }

With https://github.com/gtd/csv_builder, I had to:
In the controller action:
@output_encoding = 'UTF-8'
send_data "\uFEFF" + render_to_string(), type: :csv, filename: @filename
Atop the csv.csvbuilder template:
faster_csv.to_io.write("\uFEFF")
I don't know why I had to add the BOM twice, but it did not work with either one on its own.

Related

How do I generate a CSV with ANSI encoding using Ruby?

Currently I'm generating a CSV with UTF-8 encoding from my Administrate UI. But the Swedish letters "åäö" are not shown correctly in Excel or in the label printer programs (P-touch Editor 5.4 & Dymo Connect) I'm using.
After talking to their support I've been told the CSV needs to be ANSI encoded. How do I do that?
My code:
def to_csv
attributes = %w{full_name street_address postal_code city}
CSV.generate(headers: true, col_sep: ",") do |csv|
csv << attributes
orders.all.each do |order|
csv << attributes.map{ |attr| order.address.send(attr) }
end
end
end
By default, CSV uses Encoding.default_external as its encoding, which is most likely UTF-8.
In your case you have to override it, but first you need to know which ANSI encoding you actually need, since "ANSI" does not name one specific encoding. (What is ANSI format?)
Most likely you can use Windows-1252 or ISO-8859-1.
Then you can set the external encoding of the CSV string like this:
CSV.generate(headers: true, col_sep: ",", encoding: Encoding::ISO_8859_1)
CSV.generate(headers: true, col_sep: ",", encoding: Encoding::WINDOWS_1252)
Strings work, too:
CSV.generate(headers: true, col_sep: ",", encoding: 'ISO-8859-1')
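To help choose between the two candidates, here is a small sketch using String#encode (not the CSV `encoding:` option shown above). It shows that "åäö" get the same bytes in both encodings, while a character such as "€" exists only in Windows-1252. The sample address values are invented for illustration.

```ruby
require "csv"

swedish = "åäö"

latin1 = swedish.encode(Encoding::ISO_8859_1)
cp1252 = swedish.encode(Encoding::WINDOWS_1252)
same_bytes = latin1.bytes == cp1252.bytes # both are E5 E4 F6

# The euro sign is where the two encodings differ.
euro_cp1252 = "€".encode(Encoding::WINDOWS_1252).bytes

euro_in_latin1 =
  begin
    "€".encode(Encoding::ISO_8859_1)
    true
  rescue Encoding::UndefinedConversionError
    false
  end

# Transcoding one generated line to Windows-1252 (sample data is made up)
ansi_line = CSV.generate_line(["Åsa Öberg", "Malmö"]).encode(Encoding::WINDOWS_1252)

puts same_bytes
puts euro_in_latin1
puts ansi_line.bytesize
```

So for plain Swedish text either encoding produces identical output, and Windows-1252 is the safer guess if the data may also contain characters like "€".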

CSV writing Ruby Encoding Error

I am trying to write a UTF-8 character to a CSV file with Ruby's csv library, and I get an error:
csv ruby write problem ASCII-8BIT (Encoding::CompatibilityError)
#create csv file
CSV.open(CSV_file,"wb",) do |csv|
csv << First_line
rows.each do |r|
csv << r.generate_array
end
end
That's the code where UTF-8 conflicts with ASCII-8BIT.
Example text that fails:
demás
Here is an example of CSV writing and reading with UTF-8:
fn="/tmp/f.csv"
require "csv"
d1=DATA.read.split(/\n/).map {|e| e.split}
CSV.open(fn, "w:utf-8") do |row|
d1.each { |dr| row << dr }
end
d2=[]
CSV.foreach(fn) do |row|
d2 << row
end
puts d1==d2
# true
__END__
privé face à face à un tête-à-tête
Face to face with one-on-one
demás
Without a more detailed example from you, I cannot help further.
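One common way to hit this error, sketched below, is mixing a UTF-8 string with a string that holds the same kind of bytes but is tagged ASCII-8BIT (for example data read from a binary source). This is only an assumption about the asker's data, not a diagnosis; the variable names are illustrative.

```ruby
require "csv"

utf8   = "demás"              # UTF-8 string literal
binary = "dem\xC3\xA1s".b     # same bytes, but tagged ASCII-8BIT

# Joining two non-ASCII strings with different encodings raises.
error =
  begin
    utf8 + binary
    nil
  rescue Encoding::CompatibilityError => e
    e
  end

# If the bytes really are UTF-8, re-tag them instead of transcoding.
repaired = binary.dup.force_encoding(Encoding::UTF_8)

line = CSV.generate_line([utf8, repaired])
puts error.class
puts line
```

force_encoding only changes the label on the bytes, so it is the right tool only when the bytes are already valid UTF-8; `repaired.valid_encoding?` can confirm that.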

Ruby CSV - Illegal quoting in line 1. CSV::MalformedCSVError

I have a problem reading from a CSV file. The file comes from Windows, so I suppose there are some encoding issues. My code looks like this:
CSV.open(path, 'w', headers: :first_row, col_sep: ';', row_sep: "\r\n", encoding: 'utf-8') do |csv|
CSV.parse(open(doc.file.url), headers: :first_row, col_sep: ';', quote_char: "\"", row_sep: "\r\n", encoding: 'utf-8').each_with_index do |line, index|
csv << line.headers if index == 0
# do something with row
csv << line
end
end
I have to open an existing file and fill in some of its columns, so I just create a new file. The existing file is stored on Dropbox, so I have to use the open method.
The problem is that I get an error in this line:
CSV.parse(open(doc.file.url), headers: :first_row, col_sep: ';', quote_char: "\"", row_sep: "\r\n", encoding: 'utf-8').each_with_index do |line, index|
The error is:
Illegal quoting in line 1. CSV::MalformedCSVError
I checked, and it seems I don't have BOM characters in the file (I'm not sure I checked it correctly). The problem seems to be in the quote character. The exception is thrown for every line in the file.
This is the file that causes me problems: https://dl.dropboxusercontent.com/u/3900955/geo_bez_adresu_10_do_testow_small.csv
I tried different approaches from StackOverflow but nothing helps, for example I changed my code into this:
CSV.open(path, 'w', headers: :first_row, col_sep: ';', row_sep: "\r\n", encoding: 'utf-8') do |csv|
open(doc.file.url) do |f|
f.each_line do |line|
CSV.parse(line, 'r:bom|utf-8') do |row|
csv << row
end
end
end
end
but it doesn't help. I will be grateful for any help with parsing this file.
======= edit =========
When I save the same file on Windows with the encoding "ANSI as UTF-8" (in Notepad++), I can parse the file correctly. From this discussion What is "ANSI as UTF-8" and how can I make fputcsv() generate UTF-8 w/BOM?, it seems like I have a BOM in the original file. How can I check in Ruby whether my file has a BOM, and how can I parse a CSV file with a BOM?
CSV.parse() requires a string as its first argument, but you're passing a File object instead. As a result, parse() ends up parsing the value of (file object).to_s, which causes the error.
Update
To read file with BOM you can have this:
CSV.new(File.open('file.csv', 'r:bom|utf-8'), col_sep: ';').each do |row|
...
end
Reference: https://stackoverflow.com/a/7780559/445221
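As for checking whether a file has a BOM, a minimal sketch is to compare its first three bytes with EF BB BF. The helper name `utf8_bom?` and the demo file contents are made up for illustration; the demo file merely stands in for the Windows export.

```ruby
require "csv"
require "tempfile"

UTF8_BOM = "\xEF\xBB\xBF".b

# Hypothetical helper: true when the file starts with the UTF-8 BOM bytes.
def utf8_bom?(path)
  File.open(path, "rb") { |f| f.read(3) } == UTF8_BOM
end

# Demo file standing in for the Windows export (BOM, ';' separator, CRLF rows)
sample = Tempfile.new(["bom_demo", ".csv"])
sample.close
File.binwrite(sample.path, "\uFEFFid;name\r\n1;Łódź\r\n")

has_bom = utf8_bom?(sample.path)

# 'r:bom|utf-8' makes Ruby skip the BOM before CSV sees the data
rows = CSV.open(sample.path, "r:bom|utf-8", col_sep: ";") { |csv| csv.read }

puts has_bom
p rows
```

Without the `bom|` prefix the first header would carry the invisible U+FEFF character, which is easy to miss when comparing header names.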
I didn't find any way to read directly from a remote file if it contains a BOM. So I use Tempfile to create a temporary file, and then I call CSV.open with 'r:bom|utf-8':
doc = Document.find(doc_id)
path = "#{Rails.root.join('tmp')}/#{doc.name.split('.').first}_#{Time.now.to_i}.csv"
file = Tempfile.new(["#{doc.name.split('.').first}_#{Time.now.to_i}", '.csv'])
file.binmode
file << open(doc.file.url).read
file.close
CSV.open(path, 'w', headers: :first_row, col_sep: ';', row_sep: "\r\n", encoding: 'utf-8') do |csv|
CSV.open(file.path, 'r:bom|utf-8', headers: :first_row, col_sep: ';', quote_char: "\"", row_sep: "\r\n").each_with_index do |line, index|
# do something
end
end
Now, it seems to parse the file.
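If the temporary file is only there to get rid of the BOM, an alternative sketch is to strip it from the downloaded string and pass that string to CSV.parse. The `body` value below is a stand-in for whatever `open(doc.file.url).read` returns, and it assumes the data really is UTF-8.

```ruby
require "csv"

# Stand-in for the downloaded body; .b mimics bytes read in binary mode
body = "\uFEFFid;name\r\n1;Łódź\r\n".b

# Tag the bytes as UTF-8, then drop a leading BOM if there is one
text = body.force_encoding(Encoding::UTF_8).delete_prefix("\uFEFF")

table = CSV.parse(text, headers: true, col_sep: ";")

p table.headers
puts table[0]["name"]
```

delete_prefix leaves the string untouched when no BOM is present, so the same code handles both kinds of file.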

Read a csv in ruby with UTF-8 literal

I have this CSV file:
file data.csv:
data.csv: ASCII text
This file has ~10000 lines with some UTF-8 literal chars.
For example:
1388357672.209253000,48:a2:2d:78:84:10,\xe5\x87\xb6\xe5\xb7\xb4\xe5\xb7\xb4\xe8\x87\xad\xe7\x98\xaa\xe7\x98\xaa\xe7\x9a\x84\xe6\x80\xaa\xe5\x85\xbd\xe5\x87\xba
I iterate over this file in Ruby and save every line in my postgresql db
File.open(filename, "r").each_line do |line|
CSV.parse(line, encoding: 'UTF-8') do |row|
# Save to PostgreSQL
end
end
My problem now is that the UTF-8 literal string is saved in the db and not the correct UTF-8 string. I can convert every line with echo -e "line", but this takes a lot of time. Is there a way for Ruby to do this task?
Try this:
CSV.parse(line, encoding: 'UTF-8') do |row|
row = row.map do |elem|
elem.gsub(/\\x../) {|s| [s[2..-1].hex].pack("C")}.force_encoding("UTF-8")
end
# Save to PostgreSQL
end
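A self-contained variant of the same idea, as a sketch: it only matches real hex digits, passes nil cells through, and works on a binary copy of each cell. The helper name `unescape_hex_bytes` is made up, and the sample line is a shortened version of the one in the question.

```ruby
require "csv"

# Hypothetical helper: turns literal "\xHH" sequences into the bytes they name
# and tags the result as UTF-8.
def unescape_hex_bytes(text)
  return text if text.nil?

  bytes = text.b.gsub(/\\x(\h\h)/) { Regexp.last_match(1).hex.chr }
  bytes.force_encoding(Encoding::UTF_8)
end

# Single quotes keep the backslashes literal, as they are in the file
line = '1388357672.209253000,48:a2:2d:78:84:10,\xe5\x87\xb6\xe5\xb7\xb4\xe5\xb7\xb4'

row = CSV.parse_line(line).map { |cell| unescape_hex_bytes(cell) }
decoded = row.last

puts decoded
puts decoded.valid_encoding?
```

Checking `valid_encoding?` before saving is a cheap way to catch lines whose escaped bytes do not form valid UTF-8.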
Just put each cell in double quotes:
"\xe5\x87\xb6\xe5\xb7\xb4\xe5\xb7\xb4\xe8\x87\xad\xe7\x98\xaa\xe7\x98\xaa\xe7\x9a\x84\xe6\x80\xaa\xe5\x85\xbd\xe5\x87\xba"
=> "凶巴巴臭瘪瘪的怪兽出"

Problems with FasterCSV (Ruby)

I'm writing a program that creates a CSV file, and I have a problem right at the beginning.
So, my code is
def create_csv
destfile = Rails.root.join("public", "reports", "statistic_csv#{id}.csv")
csv_string = FasterCSV.generate do |out|
out << ["row", "of", "CSV", "data"]
end
FasterCSV.open(destfile, "w") do |csv|
csv << csv_string
end
end
I thought I would get 4 columns in the output file, something like row|of|csv|data. But what I get is "row,of,CSV,data" in one cell, A1. How can I solve the problem? Thanks in advance!
PS. I use ruby 1.8.7 and FasterCSV 1.5.5
You are encoding the CSV string twice. This should work:
def create_csv
destfile = Rails.root.join("public", "reports", "statistic_csv#{id}.csv")
FasterCSV.open(destfile, "wb") do |csv|
csv << ["row", "of", "CSV", "data"]
end
end
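The double encoding can be reproduced with the standard csv library (the successor to FasterCSV in Ruby 1.9+), which is used here as an assumption because FasterCSV itself only targets Ruby 1.8. This is a sketch of the effect, not the asker's exact code path.

```ruby
require "csv"

fields = ["row", "of", "CSV", "data"]

# Encoding the fields once gives one line with four columns.
once = CSV.generate_line(fields)

# Encoding the finished line again treats it as a single field,
# so it gets quoted and ends up in one cell.
twice = CSV.generate_line([once.chomp])

puts once
puts twice
p CSV.parse_line(once)
p CSV.parse_line(twice)
```

Parsing the second line back yields a single field, which is what shows up in cell A1.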
You can also specify a custom column separator:
FasterCSV.open(destfile, "wb", { :col_sep => "|" }) do |csv|
# ...
end
I presume you're opening this in Excel. Excel may not be detecting the file as a CSV file. Try importing the data into an Excel workbook instead of opening the file directly in Excel.
