Can't read file charset utf-16le except puts in ruby

I need to read an external file in ruby.
Running file -i locally shows
text/plain; charset=utf-16le
I open it with Ruby's CSV using the separator '\t', and a row shows as:
<CSV::Row "\xFF\xFEC\x00a\x00n\x00d\x00i\x00d\x00a\x00t\x00e\x00 \x00n\x00u\...
row.to_s produces \x000\x000\x000\x001\x00\t\x00E\x00D\x00O
Running puts row shows the data correctly:
0001 EDOARDO A...
(the values also show legibly in vim and LibreOffice Calc)
Any suggestions on how to get the data in Ruby? I've tried various combinations of opening the CSV with external_encoding: 'utf-16le', internal_encoding: "utf-8" etc., but puts is the only thing that gives legible values.
The CSV object also reports its encoding as ASCII-8BIT:
<#CSV io_type:StringIO encoding:ASCII-8BIT lineno:0 col_sep:"\\t" row_sep:"\n" quote_char:"\"" headers:true>
The file itself was produced as an XLS file. I have uploaded an edited version here (edited in gvim).

This is working fine for me:
require 'csv'
CSV.foreach("file.xls", encoding: "UTF-16LE:UTF-8", col_sep: "\t") do |row|
puts row.inspect
end
this will produce the following output:
["Candidate number", "First name", "Last name", "Date of birth", "Preparation centre", "Result", "Score", "Reading and Writing", "Listening", "Speaking", "Result enquiry", "Raised on", "Raised by", "Enquiry status", "Withdrawn on", "Withdrawn by", nil]
["0001", "EDOARDO", "AGNEW", "20/01/2001", "Fondazione Istituto Massimo", "RY5-G8-Y2", "-", nil, nil, nil, "-", "00000000", nil, nil, "00000000", nil, nil]
As you can see, each row is an array of strings, one per column in the document.
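If the bytes arrive as a binary (ASCII-8BIT) string rather than from a file opened with the right encoding, as the StringIO in the question suggests, one option is to transcode them by hand before parsing. The following is only a sketch under that assumption: the sample data is made up, and the names raw, text and table are illustrative rather than taken from the posts above.

```ruby
require 'csv'

# Simulate raw bytes as they might arrive from a binary download:
# a UTF-16LE byte-order mark followed by tab-separated text.
raw = "\uFEFFCandidate number\tFirst name\n0001\tEDOARDO\n"
      .encode('UTF-16LE')
      .force_encoding(Encoding::ASCII_8BIT)

# Reinterpret the bytes as UTF-16LE, transcode to UTF-8, and drop the BOM.
text = raw.dup.force_encoding('UTF-16LE').encode('UTF-8').delete_prefix("\uFEFF")

table = CSV.parse(text, col_sep: "\t", headers: true)
table.each { |row| puts row.to_h.inspect }
```

The BOM is removed explicitly here because otherwise it would end up as the first character of the first header.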

The issue was that I was reading from a Paperclip attachment, which needed to have the encoding set (overridden) before saving.
Adding s3_headers in the model worked:
has_attached_file :attachment, s3_headers: lambda { |attachment|
{
'content-Type' => 'text/csv; charset=utf-16le'
}
}
Thanks to Julien for tipping me off that the issue was related to the Paperclip attachment (that solution works for reading the file directly).

Related

Ruby: How to generate CSV files that has Excel-friendly encoding

I am generating CSV files that need to be opened and reviewed in Excel once they have been generated. It seems that Excel requires a different encoding than UTF-8.
Here is my config and generation code:
csv_config = {col_sep: ";",
row_sep: "\n",
encoding: Encoding::UTF_8
}
csv_string = CSV.generate(csv_config) do |csv|
csv << ["Text a", "Text b", "Text æ", "Text ø", "Text å"]
end
When opening this in Excel, the special characters are not being displayed properly:
Text a Text b Text æ Text ø Text å
Any idea how to ensure proper encoding?
Excel understands UTF-8 CSV if it has BOM. That can be done like:
Use CSV.generate
# the argument of CSV.generate is default string
csv_string = CSV.generate("\uFEFF") do |csv|
csv << ["Text a", "Text b", "Text æ", "Text ø", "Text å"]
end
Use CSV.open
filename = "/tmp/example.csv"
# Default output encoding is UTF-8
CSV.open(filename, "w") do |csv|
csv.to_io.write "\uFEFF" # use CSV#to_io to write BOM directly
csv << ["Text a", "Text b", "Text æ", "Text ø", "Text å"]
end
The top voted answer from @joaofraga worked for me, but I found an alternative solution that also worked, with no UTF-8 to ISO-8859-1 transcoding required.
From what I've read, Excel can indeed handle UTF-8, but for some reason it doesn't recognize it by default. If you add a BOM to the beginning of the CSV data, this seems to cause Excel to realise that the file is UTF-8.
So, if you have a CSV like so:
csv_string = CSV.generate(csv_config) do |csv|
csv << ["Text a", "Text b", "Text æ", "Text ø", "Text å"]
end
just prepend a BOM like so:
"\uFEFF" + csv_string
In my case, my controller is sending the CSV as a file, so this is what my controller looks like:
def show
respond_to do |format|
format.csv do
# add BOM to force Excel to realise this file is encoded in UTF-8, so it respects special characters
send_data "\uFEFF" + csv_string, type: :csv, filename: "csv.csv"
end
end
end
I should note that UTF-8 itself does not require or recommend a BOM at all, but as I mentioned, adding it in this case seemed to nudge Excel into realising that the file was indeed UTF-8.
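To check that the BOM really ends up at the start of the generated data, you can inspect the first bytes. This is a small sketch, assuming the standard csv library; the variable name bom_bytes is only for illustration.

```ruby
require 'csv'

# Start the output with a UTF-8 byte-order mark. The .dup hands
# CSV.generate a mutable string to append the rows to.
csv_string = CSV.generate("\uFEFF".dup, col_sep: ";") do |csv|
  csv << ["Text a", "Text æ"]
end

# U+FEFF encoded as UTF-8 is the three bytes EF BB BF.
bom_bytes = csv_string.bytes.first(3)
puts bom_bytes.map { |b| format("%02X", b) }.join(" ")  # prints "EF BB BF"
```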
You should switch the encoding to ISO-8859-1 as follows:
CSV.generate(encoding: 'ISO-8859-1') { |csv| csv << ["Text á", "Text é", "Text æ"] }
For your context, you can do this:
config = {
col_sep: ';',
row_sep: "\n",
encoding: 'ISO-8859-1'
}
CSV.generate(config) { |csv| csv << ["Text á", "Text é", "Text æ"] }
I had the same issue and that encoding fixed it.
config = {
encoding: 'ISO-8859-1'
}
CSV.generate(config) { |csv| csv << ["Text á", "Text é", "Text æ"] }
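One caveat with this approach: ISO-8859-1 only covers the Latin-1 repertoire, so it suits the characters in these examples but not arbitrary Unicode text. The sketch below uses plain String#encode to illustrate that; the euro sign is just an example of a character outside Latin-1, and conversion_error is an illustrative name.

```ruby
# Characters such as æ exist in ISO-8859-1 and become a single byte each.
latin1 = "Text æ".encode("ISO-8859-1")
puts latin1.bytesize  # prints 6

# A character outside Latin-1 cannot be converted and raises an error.
conversion_error = nil
begin
  "Text €".encode("ISO-8859-1")
rescue Encoding::UndefinedConversionError => e
  conversion_error = e
  puts "cannot convert: #{e.message}"
end
```

If your data may contain such characters, the BOM approach with UTF-8 avoids the problem.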
With https://github.com/gtd/csv_builder, I had to:
In the controller action:
@output_encoding = 'UTF-8'
send_data "\uFEFF" + render_to_string(), type: :csv, filename: @filename
Atop the csv.csvbuilder template:
faster_csv.to_io.write("\uFEFF")
I don't know why I had to add the BOM twice, but it did not work with either one on its own.

Writing to a file via Tempfile in Ruby

I have the background job below, which writes to a csv file and emails it out. I am using the Tempfile class so the file is removed after I email it to the user. Currently, when I look at the csv file I am producing, the results look like the following:
["Client Application" "Final Price" "Tax" "Credit" "Base Price" "Billed At" "Order Guid" "Method of Payment Guid" "Method of Payment Type"]
["web" nil nil nil nil nil nil "k32k313k1j3" "credit card"]
Please ignore the data. The issue is that the rows are written to the file in Ruby's inspect format, so the "" and [] characters are not removed.
Please see the code below:
class ReportJob
@queue = :report_job
def self.perform(client_app_id, current_user_id)
user = User.find(current_user_id)
client_application = Application.find(client_app_id)
transactions = client_application.transactions
file = Tempfile.open(["#{Rails.root}/tmp/", ".csv"]) do |csv|
begin
csv << ["Application", "Price", "Tax", "Credit", "Base Price", "Billed At", "Order ID", "Payment ID", "Payment Type"]
transactions.each do |transaction|
csv << "\n"
csv << [application.name, transaction.price, transaction.tax, transaction.credit, transaction.base_price, transaction.billed_at, transaction.order_id, transaction.payment_id, transaction.payment_type]
end
ensure
ReportMailer.send_rev_report(user.email, csv).deliver
csv.close(unlink_now=false)
end
end
end
end
Would this be an issue with using the Tempfile class instead of the CSV class? Or is there something I could do to change the way it is being written to the file?
Adding the code for reading the csv file in the mailer. I am currently getting a TypeError that says "can't convert CSV into String".
class ReportMailer < ActionMailer::Base
default :from => "test@gmail.com"
def send_rev_report(email, file)
attachments['report.csv'] = File.read("#{::Rails.root.join('tmp', file)}")
mail(:to => email, :subject => "Attached is your report")
end
end
end
The issue is that you're not actually writing csv data to the file. You're sending arrays to the filehandle. I believe you need something like:
Tempfile.open(....) do |fh|
csv = CSV.new(fh, ...)
<rest of your code>
end
to properly set up the CSV output filtering.
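Filled out, that idea might look like the following sketch. It assumes the standard csv and tempfile libraries; the header, the sample rows and the contents variable are made up for illustration and stand in for the transactions in the question.

```ruby
require 'csv'
require 'tempfile'

header = ["Application", "Price", "Tax"]
rows = [["web", 10.0, 0.8], ["mobile", nil, nil]]  # made-up sample data

contents = nil
Tempfile.open(["report", ".csv"]) do |fh|
  csv = CSV.new(fh)               # wrap the file handle so rows are formatted as CSV
  csv << header
  rows.each { |row| csv << row }  # no manual "\n": CSV adds the row separator
  fh.flush                        # push buffered data to disk before reading back
  contents = File.read(fh.path)
end

puts contents
```

Note that File.read needs a path, so a mailer would be handed something like fh.path rather than the CSV object itself, which may be what the "can't convert CSV into String" error is about.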
Here's how I did it.
patient_payments = PatientPayment.all
Tempfile.new(['patient_payments', '.csv']).tap do |file|
CSV.open(file, 'wb') do |csv|
csv << patient_payments.first.class.attribute_names
patient_payments.each do |patient_payment|
csv << patient_payment.attributes.values
end
end
end
I prefer to do
tempfile = Tempfile.new(....)
# CSV.new does not take a block; append rows to the object it returns
csv = CSV.new(tempfile, ...)
<rest of your code>
Try this (note that CSV::Writer belongs to the Ruby 1.8 CSV library and was removed in Ruby 1.9):
Tempfile.open(["#{Rails.root}/tmp/", ".csv"]) do |outfile|
CSV::Writer.generate(outfile) do |csv|
csv << ["Application", "Price", "Tax", "Credit", "Base Price", "Billed At", "Order ID", "Payment ID", "Payment Type"]
#...
end
end

Removing whitespaces in a CSV file

I have a string with extra whitespace:
First,Last,Email ,Mobile Phone ,Company,Title ,Street,City,State,Zip,Country, Birthday,Gender ,Contact Type
I want to parse this line and remove the whitespaces.
My code looks like:
namespace :db do
task :populate_contacts_csv => :environment do
require 'csv'
csv_text = File.read('file_upload_example.csv')
csv = CSV.parse(csv_text, :headers => true)
csv.each do |row|
puts "First Name: #{row['First']} \nLast Name: #{row['Last']} \nEmail: #{row['Email']}"
end
end
end
@prices = CSV.parse(IO.read('prices.csv'), :headers=>true,
:header_converters=> lambda {|f| f.strip},
:converters=> lambda {|f| f ? f.strip : nil})
The nil test is in the field converter but not in the header converter, on the assumption that headers are never nil while data fields might be, and nil doesn't have a strip method. I'm really surprised that, AFAIK, :strip is not a pre-defined converter!
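Here is a self-contained sketch of the same converter idea, run against an in-memory string instead of prices.csv. The sample data and the strip lambda name are made up for illustration; the safe navigation operator (&.) needs Ruby 2.3 or later.

```ruby
require 'csv'

# Made-up data with stray whitespace around headers and fields.
data = "First,Last,Email ,Mobile Phone \nAda , Lovelace,ada@example.com ,\n"

# Strip surrounding whitespace, leaving nil untouched.
strip = ->(field) { field&.strip }

table = CSV.parse(data, headers: true,
                        header_converters: strip,
                        converters: strip)
row = table.first
puts "First Name: #{row['First']} \nLast Name: #{row['Last']} \nEmail: #{row['Email']}"
```

The empty last field comes back as nil, so row['Mobile Phone'] is nil rather than an empty string.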
You can strip your hash first:
csv.each do |unstriped_row|
row = {}
unstriped_row.each{|k, v| row[k.strip] = v.strip}
puts "First Name: #{row['First']} \nLast Name: #{row['Last']} \nEmail: #{row['Email']}"
end
Edited to strip hash keys too
CSV supports "converters" for the headers and fields, which let you get inside the data before it's passed to your each loop.
Writing a sample CSV file:
csv = "First,Last,Email ,Mobile Phone ,Company,Title ,Street,City,State,Zip,Country, Birthday,Gender ,Contact Type
first,last,email ,mobile phone ,company,title ,street,city,state,zip,country, birthday,gender ,contact type
"
File.write('file_upload_example.csv', csv)
Here's how I'd do it:
require 'csv'
csv = CSV.open('file_upload_example.csv', :headers => true)
[:convert, :header_convert].each { |c| csv.send(c) { |f| f.strip } }
csv.each do |row|
puts "First Name: #{row['First']} \nLast Name: #{row['Last']} \nEmail: #{row['Email']}"
end
Which outputs:
First Name: 'first'
Last Name: 'last'
Email: 'email'
The converters simply strip leading and trailing whitespace from each header and each field as they're read from the file.
Also, as a programming design choice, don't read your file into memory using:
csv_text = File.read('file_upload_example.csv')
Then parse it:
csv = CSV.parse(csv_text, :headers => true)
Then loop over it:
csv.each do |row|
Ruby's IO system supports "enumerating" over a file, line by line. Once my code does CSV.open, the file is readable and each reads one line at a time. The entire file doesn't need to be in memory at once; loading it all isn't scalable (though on new machines it's becoming a lot more reasonable). If you test, you'll find that reading a file using each is extremely fast, probably as fast as reading it all, parsing it, and then iterating over the parsed result.
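CSV.foreach is another way to get the same row-at-a-time behaviour. The sketch below writes a small made-up file to a temporary location first so that it can run on its own; the file contents and the first_names variable are illustrative.

```ruby
require 'csv'
require 'tempfile'

first_names = []
Tempfile.create(["contacts", ".csv"]) do |f|
  f.write("First,Last\nada,lovelace\nalan,turing\n")
  f.flush

  # CSV.foreach reads and yields one row at a time instead of
  # loading the whole file into memory first.
  CSV.foreach(f.path, headers: true) do |row|
    first_names << row["First"]
  end
end

puts first_names.inspect
```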

Ruby CSV parsing string with escaped quotes

I have a line in my CSV file that has some escaped quotes:
173,"Yukihiro \"The Ruby Guy\" Matsumoto","Japan"
When I try to parse it with the Ruby CSV parser:
require 'csv'
CSV.foreach('my.csv', headers: true, header_converters: :symbol) do |row|
puts row
end
I get this error:
.../1.9.3-p327/lib/ruby/1.9.1/csv.rb:1914:in `block (2 levels) in shift': Missing or stray quote in line 122 (CSV::MalformedCSVError)
How can I get around this error?
The \" is typical Unix whereas Ruby CSV expects ""
To parse it:
require 'csv'
text = File.read('test.csv').gsub(/\\"/,'""')
CSV.parse(text, headers: true, header_converters: :symbol) do |row|
puts row
end
Note: if your CSV file is very large, it uses a lot of RAM to read the entire file. Consider reading the file one line at a time.
Note: if your CSV file may have backslashes in front of backslashes, Andrew Grimm's suggestion of a negative lookbehind helps:
gsub(/(?<!\\)\\"/,'""')
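As a quick check of that substitution on the single line from the question, here is a minimal sketch. It uses CSV.parse_line on an in-memory string; the variable names line, fixed and fields are only for illustration.

```ruby
require 'csv'

# %q keeps the backslashes literal, so this is the line exactly as in the file.
line = %q(173,"Yukihiro \"The Ruby Guy\" Matsumoto","Japan")

# Turn backslash-escaped quotes into doubled quotes, skipping any \"
# that is itself preceded by a backslash.
fixed = line.gsub(/(?<!\\)\\"/, '""')

fields = CSV.parse_line(fixed)
puts fields[1]  # prints: Yukihiro "The Ruby Guy" Matsumoto
```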
CSV supports "converters", which we can normally use to massage the content of a field before it's passed back to our code. For instance, that can be used to strip extra spaces on all fields in a row.
Unfortunately, the converters fire off after the line is split into fields, and it's during that step that CSV is getting mad about the embedded quotes, so we have to get between the "line read" step, and the "parse the line into fields" step.
This is my sample CSV file:
ID,Name,Country
173,"Yukihiro \"The Ruby Guy\" Matsumoto","Japan"
Preserving your CSV.foreach method, this is my example code for parsing it without CSV getting mad:
require 'csv'
require 'pp'
header = []
File.foreach('test.csv') do |csv_line|
row = CSV.parse(csv_line.gsub('\"', '""')).first
if header.empty?
header = row.map(&:to_sym)
next
end
row = Hash[header.zip(row)]
pp row
puts row[:Name]
end
And the resulting hash and name value:
{:ID=>"173", :Name=>"Yukihiro \"The Ruby Guy\" Matsumoto", :Country=>"Japan"}
Yukihiro "The Ruby Guy" Matsumoto
I assumed you were wanting a hash back because you specified the :headers flag:
CSV.foreach('my.csv', headers: true, header_converters: :symbol) do |row|
Open the file in MSExcel and save as MS-DOS Comma Separated(.csv)

Ruby 1.9.2 export a CSV string without generating a file

I just can't get the 'To a String' example under 'Writing' in the documentation to work at all.
ruby -v returns:
ruby 1.9.2p290 (2011-07-09 revision 32553) [x86_64-darwin10.8.0]
The example from the documentation that I can't get working is here:
csv_string = CSV.generate do |csv|
csv << ["row", "of", "CSV", "data"]
csv << ["another", "row"]
end
The error I get is:
wrong number of arguments (0 for 1)
So it seems like I am missing an argument. The documentation states:
This method wraps a String you provide, or an empty default String
But when I pass in an empty string, it gives me the following error:
No such file or directory -
I am not looking to generate a csv file, I just wanted to create a string of csv that I send as text to the user.
Here is code I know works against Ruby 1.9.2 with Rails 3.0.1
def export_csv(filename, header, rows)
require 'csv'
file = CSV.generate do |csv|
csv << header if not header.blank?
rows.map {|row| csv << row}
end
send_data file, :type => 'text/csv; charset=iso-8859-1; header=present', :disposition => "attachment;filename=#{filename}.csv"
end
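Outside Rails, the documentation example can be tried on its own as below. This is a sketch that assumes the CSV constant is the one from the standard library (loaded with require 'csv'); if another library defines its own CSV class, the errors quoted in the question could plausibly come from that, though the thread does not confirm the cause.

```ruby
require 'csv'

# Build the CSV text in memory; no file is created.
csv_string = CSV.generate do |csv|
  csv << ["row", "of", "CSV", "data"]
  csv << ["another", "row"]
end

puts csv_string
```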
