I have string values which I am writing to a CSV file as an array:
output = "This is a, ruby output"
CSV.open("output/abc.csv", "a+") do |csv|
csv << [output]
end
When I check my file abc.csv, the row added has quotation marks (") at the start and end of the field. How can I get rid of them?
File output:
"This is a, ruby output"
So far I've tried tr and slice before saving to the CSV, but it seems the writing itself is adding the quotes.
If you get rid of the quotes then your output is no longer CSV. The CSV class can be instructed to use a different delimiter, and it will only quote a field when necessary (when the field contains the delimiter, the quote character, or a newline). For example:
require 'csv'
output = "This is a, ruby output"
File.open("output/abc.csv", "a+") do |io|
csv = CSV.new(io, col_sep: '^')
csv << [output, "the end"]
end
Output:
This is a, ruby output^the end
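If the file is read back later, the parser must be given the same separator. A minimal sketch, assuming the file written above:
require 'csv'

# Read the file back with the same non-default separator:
CSV.foreach("output/abc.csv", col_sep: '^') do |row|
  p row # => ["This is a, ruby output", "the end"]
end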
Related
I generate CSV data as follows:
require 'csv'
CSV.generate do |csv|
  csv << ['a', 123, 1.5, '0123']
end
This returns this result:
"a,123,1.5,0123\n"
The problem is that Excel will interpret 0123 as an integer. On the other hand, with force_quotes: true, the values 123 and 1.5 won't be interpreted as numbers anymore.
How can I quote only strings, not numbers, to get the following result?
"\"a\",123,1.5,\"0123\"\n"
require 'csv'
test = CSV.generate do |csv|
  csv << ['a', 123, 1.5, '0123'].map { |e| e.is_a?(String) ? "\"#{e}\"" : e }
end
puts test # => """a""",123,1.5,"""0123"""
This output is valid CSV and gets imported into spreadsheets as:
A1: "a"
B1: 123
C1: 1.5
D1: "0123"
Is this what you are looking for?
Edit:
In case it wasn't obvious, what I'm doing here is checking each value before I pass it into the csv. If it's a string, surround it with quotes (the \" is the escaped quote) and then let CSV escape them however it needs to in order to be valid.
Regardless of the method you are using to pass things into your csv object, you should be able to do the same check and modify the strings that way.
Alternatively if you have access to the source data, add the surrounding quotes there.
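For instance, a minimal sketch pulling that check into a reusable helper (quote_strings is a hypothetical name):
require 'csv'

# Wrap only String values in literal quotes; numbers pass through
# so spreadsheets keep treating them as numeric.
quote_strings = ->(row) { row.map { |e| e.is_a?(String) ? "\"#{e}\"" : e } }

test = CSV.generate do |csv|
  csv << quote_strings.call(['a', 123, 1.5, '0123'])
end
puts test # => """a""",123,1.5,"""0123"""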
The below, with a hardcoded file path, works:
File.open('/cm/Chef/instnst.loc', 'w') do |f2|
  # use "\n" for two lines of text
  f2.puts "inventory_loc=#{orainsloc}\ninst_group=dbas"
end
But when I change it to read from a JSON parameter, it errors. What am I doing wrong?
File.open('#{node['installParams']['InstallFilesLocation']}/instnst.loc', 'w') do |f2|
  # use "\n" for two lines of text
  f2.puts "inventory_loc=#{orainsloc}\ninst_group=dbas"
end
#{} interpolation has to be inside double quotes, like "#{node['installParams']['InstallFilesLocation']}/instnst.loc". Inside single quotes, the #{...} is treated as literal text rather than being evaluated.
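The corrected call would look like this (assuming node['installParams']['InstallFilesLocation'] resolves to an existing directory and orainsloc is defined as in the original):
# Double quotes let Ruby evaluate the #{} expression:
File.open("#{node['installParams']['InstallFilesLocation']}/instnst.loc", 'w') do |f2|
  f2.puts "inventory_loc=#{orainsloc}\ninst_group=dbas"
end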
I am retrieving a large hash of results from a database query and writing them to a CSV file. The code block below takes the results and creates the CSV. With the quote_char: option it replaces the quotes with NULL characters, which I need in order to properly create the tab-delimited file.
However, the NULL characters are getting converted into "" when they are loaded into their destination, so I would like to remove them. If I leave out quote_char:, every field is double-quoted, which causes the same result.
How can I remove the NULL characters?
begin
  CSV.open("#{file_path}file.tab", "wb", col_sep: "\t", quote_char: "\0") do |csv|
    csv << ["Key", "channel"]
    series_1_results.each_hash do |series_1|
      csv << ["#{series_1['key']}", "#{series_1['channel']}"]
    end
  end
end
As stated in the CSV documentation, you have to set quote_char to some character, and this character will always be used to quote empty fields.
It seems the only solution in this case is to remove the used quote_chars from the created CSV file. You can do it like this:
quoted_file = File.read("#{file_path}file.tab")
unquoted_file = quoted_file.gsub("\0", "")
File.open("#{file_path}unquoted_file.tab", "w") { |file| file.puts unquoted_file }
I assume here that NULLs are the only quoted fields. If that's not the case, use the default quote_char: '"' and gsub(',"",', ''), which should handle almost all possible cases of fields containing special characters.
But as you note the results of your query are large, it might be more practical to prepare the file yourself and avoid processing the output twice. You could simply write:
File.open("#{file_path}"'unquoted_file.tab',"w") do |file|
csv.puts ["Key","channel"]
series_1_results.each_hash do |series_1|
csv.puts ["#{series_1['key']},#{series_1['channel']}"]
end
end
Once more, you might need to handle fields with special characters.
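A minimal sketch of such handling, assuming it is acceptable to replace embedded tabs and newlines with spaces (sanitize is a hypothetical helper name):
# Replace characters that would break an unquoted tab-delimited row.
def sanitize(value)
  value.to_s.gsub(/[\t\r\n]/, ' ')
end

sanitize("multi\nline\tvalue") # => "multi line value"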
From the Ruby CSV docs, setting force_quotes: false in the options seems to work.
CSV.open("#{file_path}"'file.tab', "wb", { col_sep: "\t", force_quotes: false }) do |csv|
The above does the trick. I'd suggest against setting quote_char to \0 since that doesn't work as expected.
There is one thing to note, though. If a field is an empty string (""), it will force the quote_char to be printed into the CSV, but strangely a nil value does not. So if at all you're expecting empty strings in the data, convert them to nil before writing (maybe using the ActiveSupport presence method or anything similar).
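A minimal sketch of that conversion, without ActiveSupport (the row contents are made up for illustration):
require 'csv'

row = ["key1", ""] # the empty string would otherwise be quoted

CSV.open("file.tab", "wb", col_sep: "\t", force_quotes: false) do |csv|
  csv << row.map { |v| v.to_s.empty? ? nil : v }
end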
First, a tab-separated file is "TSV", vs. a comma-separated file which is "CSV".
Wrapping quotes around fields is necessary any time there could be an occurrence of the field delimiter inside a field.
For instance, how are you going to embed this string in a tab-delimited file?
Foo\tbar
The \t is the representation of an embedded Tab.
The same problem occurs when writing a CSV file with a field containing commas. The field has to be wrapped in double-quotes to delimit the field itself.
If your input contains any data that needs to be escaped (such as the column separator or the quote character), then you do need to quote your data. Otherwise it cannot be parsed correctly later.
CSV.open('test.csv', 'wb', col_sep: "\t") do |csv|
  csv << ["test", "'test'", '"test"', nil, "test\ttest"]
end
puts open('test.csv').read
#test 'test' """test""" "test test"
The CSV class won't quote anything unnecessarily (as you can see above). So I'm not sure why you're saying all your fields are being quoted. It could be somehow force_quotes is getting set to true somewhere.
If you're absolutely certain your data will never contain \t or ", then the default quote_char (") should work just fine. Otherwise, if you want to avoid quoting anything, you'll need to pick another quote character that you're absolutely certain won't occur in your data.
CSV.open('test.csv', 'wb', col_sep: "\t", quote_char: "|") do |csv|
  csv << ["test", "'test'", nil, '"test"']
end
puts open('test.csv').read
#test 'test' "test"
The incoming data file(s) contain malformed CSV data such as non-escaped quotes, as well as (valid) CSV data such as fields containing new lines. If a CSV format error is detected I would like to use an alternative routine on that data.
With the following sample code (abbreviated for simplicity)
FasterCSV.open(file) { |csv|
  row = true
  while row
    begin
      row = csv.shift
      break unless row
      # Do things with the good rows here...
    rescue FasterCSV::MalformedCSVError => e
      # Do things with the bad rows here...
      next
    end
  end
}
The MalformedCSVError is raised in the csv.shift method. How can I access the data that caused the error from the rescue clause?
require 'csv' # CSV in Ruby 1.9.2 is identical to FasterCSV

# File.open('test.txt', 'r').each do |line|
DATA.each do |line|
  begin
    CSV.parse(line) do |row|
      p row # handle row
    end
  rescue CSV::MalformedCSVError => er
    puts er.message
    puts "This one: #{line}"
    # and continue
  end
end
# Output:
# Unclosed quoted field on line 1.
# This one: 1,"aaa
# Illegal quoting on line 1.
# This one: aaa",valid
# Unclosed quoted field on line 1.
# This one: 2,"bbb
# ["bbb", "invalid"]
# ["3", "ccc", "valid"]
__END__
1,"aaa
aaa",valid
2,"bbb
bbb,invalid
3,ccc,valid
Just feed the file line by line to FasterCSV and rescue the error.
This is going to be really difficult. Some things that make FasterCSV, well, faster, make this particularly hard. Here's my best suggestion: FasterCSV can wrap an IO object. What you could do, then, is to make your own subclass of File (itself a subclass of IO) that "holds onto" the result of the last gets. Then when FasterCSV raises an exception you can ask your special File object for the last line. Something like this:
class MyFile < File
  attr_accessor :last_gets

  def gets(*args)
    line = super
    # Accumulate everything read for the current row so it can be
    # inspected after a parse error.
    (@last_gets ||= '') << $/ << line if line
    line
  end
end
# then...
file = MyFile.open(filename, 'r')
csv = FasterCSV.new(file)

row = true
while row
  begin
    break unless row = csv.shift
    # do things with the good row here...
  rescue FasterCSV::MalformedCSVError => e
    bad_row = file.last_gets
    # do something with bad_row here...
    next
  ensure
    file.last_gets = '' # nuke the @last_gets "buffer"
  end
end
Kinda neat, right? BUT! there are caveats, of course:
I'm not sure how much of a performance hit you take when you add an extra step to every gets call. It might be an issue if you need to parse multi-million-line files in a timely fashion.
This might or might not fail if your CSV file contains newline characters inside quoted fields. The reason is described in the source: basically, if a quoted value contains a newline then shift has to do additional gets calls to get the entire row. There could be a clever way around this limitation but it's not coming to me right now. If you're sure your file doesn't have any newline characters within quoted fields then this shouldn't be a worry for you, though.
Your other option would be to read the file using gets and pass each line in turn to FasterCSV.parse_line, but I'm pretty sure in so doing you'd squander any performance advantage gained from using FasterCSV.
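A minimal sketch of that fallback, assuming no quoted field spans multiple lines:
require 'fastercsv'

File.foreach(filename) do |line|
  begin
    row = FasterCSV.parse_line(line)
    # do things with the good row here...
  rescue FasterCSV::MalformedCSVError
    # the offending raw line is still available here...
  end
end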
I used Jordan's file subclassing approach to fix the problem with my input data before CSV ever tries to parse it. In my case, I had a file that used \" to escape quotes, instead of the "" that CSV expects. Hence,
class MyFile < File
  def gets(*args)
    line = super
    unless line.nil?
      line.gsub!('\\"', '""') # fix the \" that would otherwise cause a parse error
    end
    line
  end
end

infile = MyFile.open(filename)
incsv = CSV.new(infile)
while row = incsv.shift
  # process each row here
end
This allowed me to parse the non-standard CSV file. Ruby's CSV implementation is very strict and often has trouble with the many variants of the CSV format.
What's the best (most efficient) way to parse a tab-delimited file in Ruby?
The Ruby CSV library lets you specify the field delimiter. Ruby 1.9 uses FasterCSV. Something like this would work:
require "csv"
parsed_file = CSV.read("path-to-file.csv", col_sep: "\t")
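If the file has a header row, the same call can hand back rows keyed by column name; a minimal sketch ("some_column" is a hypothetical header name):
require "csv"

CSV.foreach("path-to-file.csv", col_sep: "\t", headers: true) do |row|
  puts row["some_column"]
end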
The rules for TSV are actually a bit different from CSV. The main difference is that CSV has provisions for embedding a comma inside a field by quoting the field and escaping quotes within it. I wrote a quick example to show how the simple response fails:
require 'csv'

line = "boogie\ttime\tis \"now\""

begin
  line = CSV.parse_line(line, col_sep: "\t")
  puts "parsed correctly"
rescue CSV::MalformedCSVError
  puts "failed to parse line"
end

begin
  line = CSV.parse_line(line, col_sep: "\t", quote_char: "Ƃ")
  puts "parsed correctly with random quote char"
rescue CSV::MalformedCSVError
  puts "failed to parse line with random quote char"
end

# Output:
# failed to parse line
# parsed correctly with random quote char
If you want to use the CSV library you could use a random quote character that you don't expect to see in your file (the example shows this), but you could also use a simpler approach like the StrictTsv class shown below to get the same effect without having to worry about field quotations.
# The main parse method is mostly borrowed from a tweet by @JEG2
class StrictTsv
  attr_reader :filepath

  def initialize(filepath)
    @filepath = filepath
  end

  def parse
    open(filepath) do |f|
      headers = f.gets.strip.split("\t")
      f.each do |line|
        fields = Hash[headers.zip(line.split("\t"))]
        yield fields
      end
    end
  end
end
# Example usage
tsv = StrictTsv.new("your_file.tsv")
tsv.parse do |row|
  puts row['named field']
end
The choice of using the CSV library or something more strict just depends on who is sending you the file and whether they are expecting to adhere to the strict TSV standard.
Details about the TSV standard can be found at http://en.wikipedia.org/wiki/Tab-separated_values
There are actually two different kinds of TSV files.
1. TSV files that are actually CSV files with the delimiter set to Tab. This is something you'll get when you e.g. save an Excel spreadsheet as "UTF-16 Unicode Text". Such files use CSV quoting rules, which means that fields may contain tabs and newlines as long as they are quoted, and literal double quotes are written twice. The easiest way to parse everything correctly is to use the csv gem:
require 'csv'
parsed = CSV.read("file.tsv", col_sep: "\t")
2. TSV files conforming to the IANA standard. Tabs and newlines are not allowed as field values, and there is no quoting whatsoever. This is something you will get when you e.g. select a whole Excel spreadsheet and paste it into a text file (beware: it will get messed up if some cells do contain tabs or newlines). Such TSV files can be easily parsed line by line with a simple line.rstrip.split("\t", -1) (note the -1, which prevents split from removing empty trailing fields; see the sketch below). If you want to use the csv gem, simply set quote_char to nil:
require 'csv'
parsed = CSV.read("file.tsv", col_sep: "\t", quote_char: nil)
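A minimal sketch of that line-by-line approach, using chomp instead of rstrip so that trailing empty fields survive the split:
rows = File.foreach("file.tsv").map do |line|
  line.chomp.split("\t", -1) # -1 keeps empty trailing fields
end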
I like mmmries answer. HOWEVER, I hate the way that Ruby's split strips any empty values off the end, and the trailing newline isn't stripped from each line either.
Also, I had a file with potential newlines within a field. So, I rewrote his parse as follows:
def parse
  open(filepath) do |f|
    headers = f.gets.strip.split("\t")
    f.each do |line|
      myline = line
      while myline.scan(/\t/).count != headers.count - 1
        myline += f.gets
      end
      fields = Hash[headers.zip(myline.chomp.split("\t", headers.count))]
      yield fields
    end
  end
end
This concatenates any lines as necessary to get a full line of data, and always returns the full set of data (without potential nil entries at the end).