Currently I'm generating a CSV with UTF-8 encoding from my Administrate UI, but the Swedish letters "åäö" are not shown correctly in Excel or in the label printer programs I'm using (P-touch Editor 5.4 and DYMO Connect).
After talking to their support I've been told the CSV needs to be ANSI encoded. How do I do that?
My code:
def to_csv
  attributes = %w{full_name street_address postal_code city}

  CSV.generate(headers: true, col_sep: ",") do |csv|
    csv << attributes

    orders.all.each do |order|
      csv << attributes.map { |attr| order.address.send(attr) }
    end
  end
end
By default CSV uses Encoding.default_external as its encoding; most likely this is UTF-8.
In your case you have to override it, but first you need to know which ANSI encoding you actually need. (See: What is ANSI format?)
Most likely you can use Windows-1252 or ISO-8859-1.
Then you can set the external encoding of the CSV string like this:
CSV.generate(headers: true, col_sep: ",", encoding: Encoding::ISO_8859_1)
CSV.generate(headers: true, col_sep: ",", encoding: Encoding::WINDOWS_1252)
Strings work, too:
CSV.generate(headers: true, col_sep: ",", encoding: 'ISO-8859-1')
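Note that CSV itself will not transcode the values you append, so if the model attributes come back as UTF-8 strings, convert them explicitly. A minimal sketch of the original to_csv, assuming every character in your data has a Windows-1252 mapping (which holds for "åäö"):

require 'csv'

def to_csv
  attributes = %w{full_name street_address postal_code city}

  CSV.generate(headers: true, col_sep: ",", encoding: Encoding::WINDOWS_1252) do |csv|
    csv << attributes
    orders.all.each do |order|
      # Encoding::UndefinedConversionError is raised here if a value
      # contains a character with no Windows-1252 equivalent
      csv << attributes.map { |attr| order.address.send(attr).to_s.encode(Encoding::WINDOWS_1252) }
    end
  end
end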
Related
Ruby 2.6.3.
I have been trying to parse a StringIO object into a CSV instance with the bom|utf-8 encoding, so that the BOM character (undesired) is stripped and the content is encoded to UTF-8:
require 'csv'
CSV_READ_OPTIONS = { headers: true, encoding: 'bom|utf-8' }.freeze
content = StringIO.new("\xEF\xBB\xBFid\n123")
first_row = CSV.parse(content, CSV_READ_OPTIONS).first
first_row.headers.first.include?("\xEF\xBB\xBF") # This returns true
Apparently the bom|utf-8 encoding does not work for StringIO objects, but I found that it does work for files, for instance:
require 'csv'
CSV_READ_OPTIONS = { headers: true, encoding: 'bom|utf-8' }.freeze
# File content is: "\xEF\xBB\xBFid\n12"
first_row = CSV.read('bom_content.csv', CSV_READ_OPTIONS).first
first_row.headers.first.include?("\xEF\xBB\xBF") # This returns false
Considering that I need to work with StringIO directly, why does CSV ignore the bom|utf-8 encoding? Is there any way to remove the BOM character from the StringIO instance?
Thank you!
Ruby 2.7 added the set_encoding_by_bom method to IO. This method consumes the byte order mark and sets the external encoding accordingly.
require 'csv'
require 'stringio'
CSV_READ_OPTIONS = { headers: true }.freeze
content = StringIO.new("\xEF\xBB\xBFid\n123")
content.set_encoding_by_bom
first_row = CSV.parse(content, CSV_READ_OPTIONS).first
first_row.headers.first.include?("\xEF\xBB\xBF")
#=> false
Ruby doesn't like BOMs. It only handles them when reading a file, never anywhere else, and even then it only reads them so that it can get rid of them. If you want a BOM for your string, or a BOM when writing a file, you have to handle it manually.
There are probably gems for doing this, though it's easy to do yourself:
if string[0...3] == "\xef\xbb\xbf"
  string = string[3..-1].force_encoding('UTF-8')
elsif string[0...2] == "\xff\xfe"
  string = string[2..-1].force_encoding('UTF-16LE')
# etc. for the remaining BOMs (UTF-16BE, UTF-32LE, UTF-32BE)
end
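Going the other way, writing a BOM, is the same kind of manual work. A minimal sketch for UTF-8 output (out.csv and csv_string are placeholders):

# Prepend the UTF-8 BOM by hand; Ruby will not add it for you
File.open('out.csv', 'w:UTF-8') do |f|
  f.write("\uFEFF")
  f.write(csv_string)
end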
I found out that forcing the encoding to UTF-8 on the StringIO string and removing the BOM to build a new StringIO worked:
require 'csv'
CSV_READ_OPTIONS = { headers: true }.freeze
content = StringIO.new("\xEF\xBB\xBFid\n123")
csv_file = StringIO.new(content.string.force_encoding('utf-8').sub("\xEF\xBB\xBF", ''))
first_row = CSV.parse(csv_file, CSV_READ_OPTIONS).first
first_row.headers.first.include?("\xEF\xBB\xBF") # => false
The encoding option is no longer needed. It may not be the best option memory-wise, but it works.
I am generating CSV files that need to be opened and reviewed in Excel once they have been generated. It seems that Excel requires a different encoding than UTF-8.
Here is my config and generation code:
csv_config = { col_sep: ";",
               row_sep: "\n",
               encoding: Encoding::UTF_8 }

csv_string = CSV.generate(csv_config) do |csv|
  csv << ["Text a", "Text b", "Text æ", "Text ø", "Text å"]
end
When opening this in Excel, the special characters are not displayed properly:
Text a Text b Text Ã¦ Text Ã¸ Text Ã¥
Any idea how to ensure proper encoding?
Excel understands UTF-8 CSV if it has a BOM. That can be done like this:
Use CSV.generate
# CSV.generate appends to the string you pass in, so seed it with a BOM
csv_string = CSV.generate("\uFEFF") do |csv|
  csv << ["Text a", "Text b", "Text æ", "Text ø", "Text å"]
end
Use CSV.open
filename = "/tmp/example.csv"

# Default output encoding is UTF-8
CSV.open(filename, "w") do |csv|
  csv.to_io.write "\uFEFF" # use CSV#to_io to write the BOM directly
  csv << ["Text a", "Text b", "Text æ", "Text ø", "Text å"]
end
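To confirm that the BOM is written but stays invisible to Ruby, the file can be read back with the bom|utf-8 mode discussed above, which consumes the marker again:

CSV.read(filename, headers: true, encoding: 'bom|utf-8').headers
#=> ["Text a", "Text b", "Text æ", "Text ø", "Text å"]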
The top-voted answer from @joaofraga worked for me, but I found an alternative solution that also worked, with no UTF-8 to ISO-8859-1 transcoding required.
From what I've read, Excel can indeed handle UTF-8, but for some reason it doesn't recognize it by default. If you add a BOM to the beginning of the CSV data, this seems to make Excel realise that the file is UTF-8.
So, if you have a CSV like so:
csv_string = CSV.generate(csv_config) do |csv|
  csv << ["Text a", "Text b", "Text æ", "Text ø", "Text å"]
end
just add a BOM byte like so:
"\uFEFF" + csv_string
In my case, my controller is sending the CSV as a file, so this is what my controller looks like:
def show
  respond_to do |format|
    format.csv do
      # Add a BOM to force Excel to realise this file is encoded in UTF-8,
      # so it respects the special characters
      send_data "\uFEFF" + csv_string, type: :csv, filename: "csv.csv"
    end
  end
end
I should note that UTF-8 itself does not require or recommend a BOM at all, but as I mentioned, adding it in this case seemed to nudge Excel into realising that the file was indeed UTF-8.
You should switch the encoding to ISO-8859-1 as follows:
CSV.generate(encoding: 'ISO-8859-1') { |csv| csv << ["Text á", "Text é", "Text æ"] }
For your context, you can do this:

config = {
  col_sep: ';',
  row_sep: "\n", # row_sep must differ from col_sep for the output to be parseable
  encoding: 'ISO-8859-1'
}

CSV.generate(config) { |csv| csv << ["Text á", "Text é", "Text æ"] }
I had the same issue, and this encoding fixed it:

config = {
  encoding: 'ISO-8859-1'
}

CSV.generate(config) { |csv| csv << ["Text á", "Text é", "Text æ"] }
With https://github.com/gtd/csv_builder, I had to do both of the following.
In the controller action:
@output_encoding = 'UTF-8'
send_data "\uFEFF" + render_to_string(), type: :csv, filename: @filename
Atop the csv.csvbuilder template:
faster_csv.to_io.write("\uFEFF")
I don't know why I had to add the BOM twice, but it did not work with either one on its own.
I have this CSV file:
$ file data.csv
data.csv: ASCII text
The file has ~10000 lines with some UTF-8 literal escape sequences in it. For example:
1388357672.209253000,48:a2:2d:78:84:10,\xe5\x87\xb6\xe5\xb7\xb4\xe5\xb7\xb4\xe8\x87\xad\xe7\x98\xaa\xe7\x98\xaa\xe7\x9a\x84\xe6\x80\xaa\xe5\x85\xbd\xe5\x87\xba
I iterate over this file in Ruby and save every line to my PostgreSQL database:
File.open(filename, "r").each_line do |line|
  CSV.parse(line, encoding: 'UTF-8') do |row|
    # Save to PostgreSQL
  end
end
The problem is that the literal string is saved to the database rather than the decoded UTF-8 string. I can convert every line with echo -e "line", but that takes much time. Is there a way Ruby can do this task?
Try this:
CSV.parse(line, encoding: 'UTF-8') do |row|
  row = row.map do |elem|
    # Replace each literal \xNN sequence with the byte it names,
    # then tag the result as UTF-8
    elem.gsub(/\\x../) { |s| [s[2..-1].hex].pack("C") }.force_encoding("UTF-8")
  end
  # Save to PostgreSQL
end
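To see the substitution at work on a single cell from the sample line:

elem = '\xe5\x87\xb6' # single-quoted, so the backslashes are literal, as in the file
elem.gsub(/\\x../) { |s| [s[2..-1].hex].pack("C") }.force_encoding("UTF-8")
#=> "凶"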
Just put each cell in double quotes:
"\xe5\x87\xb6\xe5\xb7\xb4\xe5\xb7\xb4\xe8\x87\xad\xe7\x98\xaa\xe7\x98\xaa\xe7\x9a\x84\xe6\x80\xaa\xe5\x85\xbd\xe5\x87\xba"
=> "凶巴巴臭瘪瘪的怪兽出"
I'm using Ruby 1.9.2.
I'm trying to parse a CSV file that contains some French words (e.g. spécifié) and place the contents in a MySQL database.
When I read the lines from the CSV file,
file_contents = CSV.read("csvfile.csv", col_sep: "$")
The elements come back as Strings that are ASCII-8BIT encoded (spécifié becomes sp\xE9cifi\xE9), and strings like "spécifié" are then NOT properly saved into my MySQL database.
Yehuda Katz says that ASCII-8BIT is really "binary" data, meaning that CSV has no idea what the appropriate encoding is.
So, if I try to make CSV force the encoding like this:
file_contents = CSV.read("csvfile.csv", col_sep: "$", encoding: "UTF-8")
I get the following error
ArgumentError: invalid byte sequence in UTF-8:
If I go back to my original ASCII-8BIT encoded Strings and examine the String that my CSV read as ASCII-8BIT, it looks like this "Non sp\xE9cifi\xE9" instead of "Non spécifié".
I can't convert "Non sp\xE9cifi\xE9" to "Non spécifié" by doing this
"Non sp\xE9cifi\xE9".encode("UTF-8")
because I get this error:
Encoding::UndefinedConversionError: "\xE9" from ASCII-8BIT to UTF-8,
which Katz indicated would happen because ASCII-8BIT isn't really a proper String "encoding".
Questions:
Can I get CSV to read my file in the appropriate encoding? If so, how?
How do I convert an ASCII-8BIT string to UTF-8 for proper storage in MySQL?
deceze is right, that is ISO8859-1 (AKA Latin-1) encoded text. Try this:
file_contents = CSV.read("csvfile.csv", col_sep: "$", encoding: "ISO8859-1")
And if that doesn't work, you can use Iconv to fix up the individual strings with something like this:
require 'iconv'
utf8_string = Iconv.iconv('utf-8', 'iso8859-1', latin1_string).first
If latin1_string is "Non sp\xE9cifi\xE9", then utf8_string will be "Non spécifié". Also, Iconv.iconv can unmangle whole arrays at a time:
utf8_strings = Iconv.iconv('utf-8', 'iso8859-1', *latin1_strings)
With newer Rubies, you can do things like this:
utf8_string = latin1_string.force_encoding('iso-8859-1').encode('utf-8')
where latin1_string thinks it is in ASCII-8BIT but is really in ISO-8859-1.
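For example, with the string from the question:

"Non sp\xE9cifi\xE9".force_encoding('iso-8859-1').encode('utf-8')
#=> "Non spécifié"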
With Ruby >= 1.9 you can use
file_contents = CSV.read("csvfile.csv", col_sep: "$", encoding: "ISO8859-1:utf-8")
The ISO8859-1:utf-8 means: the CSV file is ISO8859-1-encoded, but the content is converted to UTF-8 as it is read.
If you prefer more verbose code, you can use:
file_contents = CSV.read("csvfile.csv", col_sep: "$",
                         external_encoding: "ISO8859-1",
                         internal_encoding: "utf-8")
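Either way, the parsed cells come back tagged as UTF-8, which you can verify (assuming the first cell is non-empty):

file_contents.first.first.encoding
#=> #<Encoding:UTF-8>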
I had been dealing with this issue for a while, and none of the other solutions worked for me. What did the trick was to store the problematic string in a binary file, then read the file back normally and use that string to feed the CSV module:
tempfile = Tempfile.new("conflictive_string")
tempfile.binmode
tempfile.write(conflictive_string)
tempfile.close
cleaned_string = File.read(tempfile.path)
File.delete(tempfile.path)
csv = CSV.new(cleaned_string)
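What the round trip really does is re-tag the raw bytes with the default external encoding, so the same effect can likely be had in memory without touching the disk; a sketch, assuming your data actually is in the default external encoding:

# Hypothetical in-memory equivalent of the tempfile round trip
cleaned_string = conflictive_string.dup.force_encoding(Encoding.default_external)
csv = CSV.new(cleaned_string)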
I have a Mac VBA script making a request to a Ruby Sinatra web app. The text passed from Excel contains characters such as é. Ruby (version 1.9.2) chokes on these characters, as Excel is not sending them as UTF-8.
# encoding: utf-8
require 'rubygems'
require 'sinatra'
require "sinatra/reloader" if development?

configure do
  class << Sinatra::Base
    def options(path, opts = {}, &block)
      route 'OPTIONS', path, opts, &block
    end
  end
  Sinatra::Delegator.delegate :options
end

options '/' do
  response.headers["Access-Control-Allow-Origin"] = "*"
  response.headers["Access-Control-Allow-Methods"] = "POST"
  halt 200
end

post '/fetch' do
  chars = []
  params['excel_input'].valid_encoding? # returns false
  params['excel_input']
end
My Excel VBA:
Sub FetchAddress()
    For Each oDest In Selection
        With ActiveSheet.QueryTables.Add(Connection:="URL;http://localhost:4567/fetch", Destination:=oDest)
            .PostText = "excel_input=" & oDest.Offset(0, -1).Value
            .RefreshStyle = xlOverwriteCells
            .SaveData = True
            .Refresh
        End With
    Next
End Sub
The character é comes out the other end as Ž.
It looks like the text coming from Excel is encoded as Windows-1252 (http://en.wikipedia.org/wiki/Windows-1252).
The byte representation of the character is 142 (which is Ž in Windows-1252).
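You can reproduce that interpretation in irb:

# Byte 142 (0x8E), read as Windows-1252 and transcoded to UTF-8
[142].pack('C').force_encoding('Windows-1252').encode('UTF-8')
#=> "Ž"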
iconv can convert the input to UTF-8; it converts text from one character encoding to another. So something like this should work:
require "iconv"
...
post '/fetch' do
  excel_input = Iconv.conv("UTF-8", "WINDOWS-1252", params['excel_input'])
  ...
end
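On current Rubies, where Iconv is no longer part of the standard library, String#encode can do the same conversion; a sketch under the same Windows-1252 assumption:

post '/fetch' do
  # Re-tag the raw bytes as Windows-1252, then transcode to UTF-8
  excel_input = params['excel_input'].force_encoding('Windows-1252').encode('UTF-8')
  excel_input
end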
You can also look at https://github.com/jmhodges/rchardet: it can autodetect the charset, and then you can convert the input to UTF-8.
Ruby 1.9 Encodings: A Primer and the Solution for Rails by Yehuda Katz is a good read if you have some time; it goes into depth about encodings and how to convert between them.