Ruby CSV: How can I read a tab-delimited file?

CSV.open(name, "r").each do |row|
  puts row
end
And I get the following error:
CSV::MalformedCSVError Unquoted fields do not allow \r or \n
The file is a tab-delimited .txt file that I created specifically for this: I took a .csv file, opened it in Excel, and saved it as tab-delimited .txt. So it really is tab-delimited.
Shouldn't CSV.open be able to read tab-delimited files?

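Try specifying the field delimiter like this:
CSV.open(name, "r", col_sep: "\t").each do |row|
  puts row
end
And remember to require 'csv' and read the docs. (Pass the options as keyword arguments as shown: from Ruby 3.0 on, wrapping them in an explicit { :col_sep => "\t" } hash as a third positional argument raises an ArgumentError.)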
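A minimal end-to-end sketch of the fix, assuming a small hypothetical tab-delimited file created on the fly with Tempfile (standing in for the file exported from Excel). CSV.read returns every row as an array of fields:

```ruby
require 'csv'
require 'tempfile'

# Hypothetical stand-in for the tab-delimited .txt file exported from Excel.
file = Tempfile.new(['export', '.txt'])
file.write("name\tcity\nAlice\tParis\nBob\tOslo\n")
file.close

# col_sep: "\t" makes CSV split each line on tabs instead of commas.
rows = CSV.read(file.path, col_sep: "\t")
rows.each { |row| puts row.inspect }
```

With a real file you would pass its path instead of file.path; adding headers: true would turn each row into a CSV::Row keyed by the first line.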

By default, CSV uses the comma as its separator; after all, CSV stands for "Comma-Separated Values".
If you want a different separator (in this case, tabs), you need to specify it explicitly with the col_sep option.
Example:
p CSV.new("aaa\tbbb\tccc\nddd\teee", col_sep: "\t").read
Relevant documentation: http://ruby-doc.org/stdlib-2.1.0/libdoc/csv/rdoc/CSV.html#new
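To see why the separator matters, here is a small sketch that parses the same sample string with and without col_sep. With the default comma separator, each tab-separated line comes back as a single field:

```ruby
require 'csv'

data = "aaa\tbbb\tccc\nddd\teee"

# Default separator is ",", so the tabs stay inside one field per line.
comma_rows = CSV.parse(data)

# With col_sep: "\t", each tab starts a new field.
tab_rows = CSV.parse(data, col_sep: "\t")

p comma_rows # => [["aaa\tbbb\tccc"], ["ddd\teee"]]
p tab_rows   # => [["aaa", "bbb", "ccc"], ["ddd", "eee"]]
```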

As an alternative to CSV, you can also use the smarter_csv gem like this:
require 'smarter_csv'
data = SmarterCSV.process(filename, col_sep: "\t")
If you use smarter_csv >= 1.4.2, you can also do this:
require 'smarter_csv'
data = SmarterCSV.process(filename, col_sep: :auto)
SmarterCSV returns an array of hashes and can also do batch processing.
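If you would rather not add a gem, a roughly comparable array of hashes can be built with the standard library alone. This is a stdlib sketch, not SmarterCSV's API: it uses headers: true and header_converters: :symbol so each row becomes a hash keyed by symbolized headers, and it does none of SmarterCSV's batching or extra cleanup. The sample data is hypothetical:

```ruby
require 'csv'

tsv = "First Name\tCity\nAlice\tParis\nBob\tOslo\n"

# headers: true treats the first line as column names;
# header_converters: :symbol downcases them and turns them into symbols
# (for example "First Name" becomes :first_name).
table = CSV.parse(tsv, col_sep: "\t", headers: true, header_converters: :symbol)

# Each CSV::Row can be converted to a plain Hash.
data = table.map(&:to_h)
data.each { |h| p h }
```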

Related

Ruby equivalent to Python's DictWriter?

I have a Ruby script that goes through a CSV, determines some information, and then puts out a resulting CSV file. In Python, I'm able to open both my source file and my results file with DictReader and DictWriter respectively and write rows as dictionaries, where keys are the file header values. It doesn't appear that there is a manageable way to do this in Ruby, but I'm hoping somebody can point me to a better solution than storing all of my result hashes in an array and writing them after the fact.
The standard library "CSV" gives rows hash-like behavior when headers are enabled.
require 'csv'
CSV.open("file.csv", "wb") do |csv_out|
  CSV.foreach("test.csv", headers: true) do |row|
    row["header2"].upcase! # hashlike behaviour
    row["new_header"] = 12 # add a new column
    csv_out << row
  end
end
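(test.csv has the headers header1 and header2, followed by some lines of random comma-separated strings. Note that only the data rows are written to file.csv; the header line itself is not copied.)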
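For the writing side, CSV can also accept rows as hashes once the output's headers are set up front, which is close to Python's DictWriter.writerow. A sketch with hypothetical column names, writing to a string with CSV.generate (CSV.open accepts the same headers: and write_headers: options when writing to a file):

```ruby
require 'csv'

# Hypothetical output columns, in the order they should appear.
headers = ["name", "score"]

# headers: fixes the column order; write_headers: true emits the header line.
# Each Hash is written by looking up its values in header order.
output = CSV.generate(headers: headers, write_headers: true) do |csv|
  csv << { "name" => "Alice", "score" => 90 }
  csv << { "score" => 75, "name" => "Bob" } # key order does not matter
end

puts output
```

Because each row is written as soon as it is appended, there is no need to collect all result hashes in an array first.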

Using Excel to log results in Ruby Watir: how to keep values in different cells using puts

I am new to Ruby Watir and need your help.
I am using the following code to log my script results into an Excel sheet:
File.open('c:\log.txt', 'w') do |file|
  file.puts("TEST PASSED" + "#{Time.now}")
end
Here, "TEST PASSED" and the time are displayed together in a single cell.
I want to display them in different cells.
Please suggest a solution.
Thanks in advance!
You are logging to a file called log.txt, which is a plain text file. If you want Excel to show your values in separate cells, you need a delimited format; the easiest ones to write are .csv and .tsv, which stand for comma-separated values and tab-separated values. You could then write in a few different ways. You could write as you were, with:
File.open('c:\log.tsv', 'w') do |file|
  file.puts("TEST PASSED\t" + "#{Time.now}")
end
for a TSV (note that the file doesn't need to be named .tsv), or
File.open('c:\log.csv', 'w') do |file|
  file.puts("TEST PASSED," + "#{Time.now}")
end
for a CSV.
Or you could use the standard csv library, like so:
require 'csv'
CSV.open('c:\log.csv', "wb") do |csv|
  csv << ["TEST PASSED", "#{Time.now}"]
end
which you can adapt for TSVs:
CSV.open('c:\log.tsv', "wb", col_sep: "\t") do |csv|
  csv << ["TEST PASSED", "#{Time.now}"]
end
(Use single quotes or forward slashes for these Windows paths: in a double-quoted string, "c:\log.csv" silently loses its backslash.)
The easiest solution would probably be to log the results in a CSV (comma separated values) file. These can be written like a text file, as well as read by Excel as a grid.
For a CSV file, each line represents a row in the table. Within each line, the column values (or cells) are separated by commas. For example, the following will create 2 rows with 3 columns:
a1,b1,c1
a2,b2,c2
For your logging, you could:
Create a log.csv instead of log.txt
Output the values as comma separated
The code would be:
File.open('c:\log.csv', 'w') do |file|
  file.puts("TEST PASSED,#{Time.now}")
end
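One caveat with building the line by hand: if a logged message ever contains a comma, it spills into an extra cell. A sketch comparing hand-joined lines with the csv library, which quotes such fields automatically; the log entries here are hypothetical:

```ruby
require 'csv'

# Hypothetical log entries; the second message contains a comma.
entries = [
  ["TEST PASSED", "2024-01-01 10:00:00"],
  ["TEST FAILED, timeout", "2024-01-01 10:05:00"]
]

# Joining by hand: the comma inside the message creates an extra column.
by_hand = entries.map { |fields| fields.join(",") }.join("\n") + "\n"

# The csv library wraps fields that contain the separator in quotes.
via_csv = CSV.generate { |csv| entries.each { |fields| csv << fields } }

puts by_hand
puts via_csv
```

Reading via_csv back gives the original two columns per row, while the second hand-built line parses into three.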

How do I write a TSV file scraper, where "if line contains x, then save"?

I want to open a TSV (tab-separated-value) file, and save specific rows to a new CSV (comma-separated-value) file.
If the row contains 'NLD' in a field with the header 'Actor1Code', I want to save the row to a CSV; if not, I want to iterate to the next row. This is what I have so far, but apparently that is not enough:
require 'csv'
CSV.open("path/to.csv", "wb") do |csv| #csv to save to
  CSV.open('data.txt', 'r', '\t').each do |row| #csv to scrape
    if row['Actor1Code'] == 'NLD'
      csv << row
    else
    end
  end
end
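Are you sure that you're calling CSV.open correctly? The documentation indicates that options are passed as a hash of keyword arguments, not as a bare third argument:
CSV.open('data.txt', 'r', col_sep: "\t", headers: true)
The error you're seeing is probably the result of the string '\t' being passed where CSV expects that options hash. Note also that '\t' in single quotes is a literal backslash followed by t rather than a tab, and that row['Actor1Code'] only works with headers: true; without it, each row is a plain Array indexed by position.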
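Putting those fixes together, here is a minimal sketch of the whole scraper, assuming the TSV's first line holds the headers. It uses a temporary directory and a small hypothetical data.txt so it can run on its own; with real files you would pass your own paths:

```ruby
require 'csv'
require 'tmpdir'

result = nil

Dir.mktmpdir do |dir|
  input  = File.join(dir, 'data.txt')     # hypothetical TSV to scrape
  output = File.join(dir, 'filtered.csv') # CSV to save matching rows to
  File.write(input, "Actor1Code\tEvent\nNLD\tA\nUSA\tB\nNLD\tC\n")

  CSV.open(output, 'w') do |csv_out|
    wrote_header = false
    # headers: true lets us look fields up by name, e.g. row['Actor1Code'].
    CSV.foreach(input, col_sep: "\t", headers: true) do |row|
      next unless row['Actor1Code'] == 'NLD' # skip non-matching rows
      unless wrote_header
        csv_out << row.headers # copy the header line once
        wrote_header = true
      end
      csv_out << row # a CSV::Row is written as its fields, comma-separated
    end
  end

  result = File.read(output)
end

puts result
```

In this sketch the header line is only written once a matching row is found, so an input with no NLD rows produces an empty output file.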

Ruby CSV parsing string with escaped quotes

I have a line in my CSV file that has some escaped quotes:
173,"Yukihiro \"The Ruby Guy\" Matsumoto","Japan"
When I try to parse it with the Ruby CSV parser:
require 'csv'
CSV.foreach('my.csv', headers: true, header_converters: :symbol) do |row|
  puts row
end
I get this error:
.../1.9.3-p327/lib/ruby/1.9.1/csv.rb:1914:in `block (2 levels) in shift': Missing or stray quote in line 122 (CSV::MalformedCSVError)
How can I get around this error?
The \" escape is a typical Unix convention, whereas Ruby's CSV expects a quote inside a quoted field to be doubled ("").
To parse it, replace the escapes before handing the text to CSV:
require 'csv'
text = File.read('test.csv').gsub(/\\"/,'""')
CSV.parse(text, headers: true, header_converters: :symbol) do |row|
  puts row
end
Note: if your CSV file is very large, reading the entire file at once uses a lot of RAM. Consider reading the file one line at a time.
Note: if your CSV file may contain backslashes in front of backslashes, use Andrew Grimm's suggestion below, which only replaces \" when the backslash is not itself preceded by another backslash:
gsub(/(?<!\\)\\"/,'""')
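A self-contained sketch of that approach, using the sample line from the question in a string instead of a file, together with the lookbehind version of the substitution:

```ruby
require 'csv'

# Sample data from the question, with backslash-escaped quotes.
# The quoted heredoc ('CSV_TEXT') keeps the backslashes literally.
raw = <<~'CSV_TEXT'
  ID,Name,Country
  173,"Yukihiro \"The Ruby Guy\" Matsumoto","Japan"
CSV_TEXT

# Parsing the raw text fails because \" is not valid CSV quoting.
raw_error = begin
  CSV.parse(raw)
  nil
rescue CSV::MalformedCSVError => e
  e
end

# Replace \" (when not preceded by another backslash) with "".
fixed = raw.gsub(/(?<!\\)\\"/, '""')

rows = CSV.parse(fixed, headers: true, header_converters: :symbol)
puts rows.first[:name]
```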
CSV supports "converters", which we can normally use to massage the content of a field before it's passed back to our code. For instance, that can be used to strip extra spaces on all fields in a row.
Unfortunately, the converters fire off after the line is split into fields, and it's during that step that CSV is getting mad about the embedded quotes, so we have to get between the "line read" step, and the "parse the line into fields" step.
This is my sample CSV file:
ID,Name,Country
173,"Yukihiro \"The Ruby Guy\" Matsumoto","Japan"
Preserving your CSV.foreach method, this is my example code for parsing it without CSV getting mad:
require 'csv'
require 'pp'
header = []
File.foreach('test.csv') do |csv_line|
  row = CSV.parse(csv_line.gsub('\"', '""')).first
  if header.empty?
    header = row.map(&:to_sym)
    next
  end
  row = Hash[header.zip(row)]
  pp row
  puts row[:Name]
end
And the resulting hash and name value:
{:ID=>"173", :Name=>"Yukihiro \"The Ruby Guy\" Matsumoto", :Country=>"Japan"}
Yukihiro "The Ruby Guy" Matsumoto
I assumed you were wanting a hash back because you specified the :headers flag:
CSV.foreach('my.csv', headers: true, header_converters: :symbol) do |row|
Alternatively, open the file in MS Excel and save it as MS-DOS Comma Separated (.csv).

What is the proper way to import a CSV in Ruby?

There are so many inherent problems with CSV: 1) your columns can't contain commas, so you have to enclose them in quotes (""), and once you enclose them in quotes, you then have to escape the quotes within a sentence by using \".
What is the easiest way to parse a CSV file? I switched to semicolon-separated files, but those are troublesome when working in Excel, so now I'm back to CSV files.
Have you tried http://ruby-doc.org/stdlib-1.9.2/libdoc/csv/rdoc/CSV.html?
You can also look at http://fastercsv.rubyforge.org/
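In practice the standard-library CSV handles the quoting described in the question for you, in both directions: when writing, it quotes fields that contain commas or quotes and doubles embedded quotes (rather than backslash-escaping them), and when reading, it undoes that. A small sketch with hypothetical field values:

```ruby
require 'csv'

fields = ['Smith, John', 'He said "hi"', 'plain']

# generate_line quotes fields that need it and doubles embedded quotes.
line = CSV.generate_line(fields)
puts line # => "Smith, John","He said ""hi""",plain

# parse_line reverses the process.
back = CSV.parse_line(line)
p back == fields # => true
```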
Check out FasterCSV from James Edward Gray II (it replaced the old library as Ruby's standard CSV in Ruby 1.9).
"FasterCSV is CSV, but faster, smaller, and cleaner."
Here is a rough example using CSV in Ruby:
require 'csv'
class DataLoader
  def self.import_csv
    Dir.glob("/imports/*.csv").each do |csv_file|
      # headers: true yields each data row as a CSV::Row; the header line is skipped
      CSV.foreach(csv_file, col_sep: ",", headers: true, return_headers: false, quote_char: '"') do |data_row|
        field_one = data_row[0] # value in the first column
        field_two = data_row[1] # value in the second column
        # do some work
      end
    end
  end
end
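A variant of the loader that can be tried without an /imports directory: here the directory is a hypothetical parameter, each file is read with headers: true, and the rows are returned as hashes instead of being processed in place. The sample files are made up for the demonstration:

```ruby
require 'csv'
require 'tmpdir'

class DataLoader
  # Hypothetical signature: takes the directory to scan instead of "/imports".
  def self.import_csv(dir)
    Dir.glob(File.join(dir, '*.csv')).sort.flat_map do |csv_file|
      # CSV.read with headers: true returns a CSV::Table of CSV::Row objects.
      CSV.read(csv_file, headers: true).map(&:to_h)
    end
  end
end

records = nil
Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'a.csv'), "id,name\n1,Alice\n")
  File.write(File.join(dir, 'b.csv'), "id,name\n2,\"Smith, Bob\"\n")
  records = DataLoader.import_csv(dir)
end

records.each { |r| p r }
```

Note that the quoted "Smith, Bob" comes back as a single field, with no manual escaping needed.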
