Only string (not number) quoting when generating CSV - ruby

I generate CSV data as follows:
require 'csv'
CSV.generate do |csv|
csv << ['a', 123, 1.5, '0123']
end
This returns this result:
"a,123,1.5,0123\n"
The problem is that Excel will interpret 0123 as integer. On the other side, when using force_quotes: true, the values 123 and 1.5 won't be interpreted as numbers anymore.
How can I quote only strings, not numbers, to get the following result?
"\"a\",123,1.5,\"0123\"\n"

require 'csv'
test = CSV.generate do |csv|
csv << ['a', 123, 1.5, '0123'].map{|e|e.class == String ? "\"#{e}\"" : e}
end
puts test # => """a""",123,1.5,"""0123"""
This output is valid CSV and gets imported into spreadsheets as:
A1: "a"
B1: 123
C1: 1.5
D1: "0123"
Is this what you are looking for?
Edit:
In case it wasn't obvious, what I'm doing here is checking each value before I pass it into the csv. If it's a string, surround it with quotes (the \" is the escaped quote) and then let CSV escape them however it needs to in order to be valid.
Regardless of the method you are using to pass things into your csv object, you should be able to do the same check and modify the strings that way.
Alternatively if you have access to the source data, add the surrounding quotes there.

Related

How to separate two data in one cell on csv by ruby

I want to change CSV file content:
itemId,url,name,type
1|urlA|nameA|typeA
2|urlB|nameB|typeB
3|urlC,urlD|nameC|typeC
4|urlE|nameE|typeE
into an array:
[itemId,url,name,type]
[1,urlA,nameA,typeA]
[2,urlB,nameB,typeB]
[**3**,**urlC**,nameC,typeC]
[**3**,**urlD**,nameC,typeC]
[4,urlE,nameE,typeE]
Could anybody teach me how to do it?
Finally, I'm going to DL url files(.jpg)
The header row has a different separator than the data. That's a problem. You need to change the header row to use | instead of ,. Then:
require 'csv'
require 'pp'
array = Array.new
CSV.foreach("test.csv", col_sep: '|', headers: true) do |row|
if row['url'][/,/]
row['url'].split(',').each do |url|
row['url'] = url
array.push row.to_h.values
end
else
array.push row.to_h.values
end
end
pp array
=> [["1", "urlA", "nameA", "typeA"],
["2", "urlB", "nameB", "typeB"],
["3", "urlC", "nameC", "typeC"],
["3", "urlD", "nameC", "typeC"],
["4", "urlE", "nameE", "typeE"]]
You'll need to test the fifth column to see how the line should be parsed. If you see a fifth element (row[4]) output the line twice replacing the url column
array = Array.new
CSV.foreach("test.csv") do |row|
if row[4]
array << [row[0..1], row[3..4]].flatten
array << [[row[0]], row[2..4]].flatten
else
array << row
end
end
p array
In your example you had asterisks but I'm assuming that was just to emphasise the lines for which you want special handling. If you do want asterisks, you can modify the two array shovel commands appropriately.

Writing to csv is adding quotes

I have string values which I am writing to csv file in the form of array as -
output = "This is a, ruby output"
CSV.open("output/abc.csv", "a+") do |csv|
csv << [output]
end
When I check my file abc.csv the row added has quotation marks (") at the start and end of the field. How can I get rid of it?
File output as ---
"This is a, ruby output"
So far I've tried tr or slice before saving to csv, but it seems writing is causing it.
If you get rid of the quotes then your output is no longer CSV. The CSV class can be instructed to use a different delimiter and will only quote if that delimiter is included in the input. For example:
require 'csv'
output = "This is a, ruby output"
File.open("output/abc.csv", "a+") do |io|
csv = CSV.new(io, col_sep: '^')
csv << [output, "the end"]
end
Output:
This is a, ruby output^the end

How to convert a whitespace string to CSV string in Ruby?

How can I convert a whitespace-delimited string to CSV string in Ruby? Is there a built-in method that could be used to achieve this?
Code:
#stores = current_user.channels
puts #stores
Current Output:
TMSUS TMSCA
Expected Output:
TMSUS,TMSCA
There is a CSV library in Ruby Here
require 'csv'
stores = 'TMSUS THSCA'
stores.split(' ').to_csv
Don't use gsub to do this. If you had a string with a comma in it, it would break your CSV. The CSV library does escaping for you.
You could use the CSV library:
require 'csv'
string = 'TMSUS THSCA'
CSV.generate do |csv|
csv << string.split
end
# => "TMSUS,THSCA\n"
The advantage to using the CSV library is it properly escapes and quotes values which might require that.

Ruby CSV.open need to remove quotes and null characters

I am retrieving a large hash of results from a database query and writing them to a csv file. The code block below takes the results and creates the CSV. With the quote_char: option it will replace the quotes with NULL characters which I need to properly create the tab-delimited file.
However, the NULL characters are getting converted into "" when they are loaded into their destination so I would like to remove those. If I leave out quote_char: every field is double quoted which causes the same result.
How can I remove the NULL characters?
begin
CSV.open("#{file_path}"'file.tab', "wb", Options = {col_sep: "\t", quote_char: "\0"}) do |csv|
csv << ["Key","channel"]
series_1_results.each_hash do |series_1|
csv << ["#{series_1['key']}","#{series_1['channel']}"]
end
end
end
As it is stated in the csv documentation you have to the set quote_char to some character, and this character will always be used to quote empty fields.
It seems the only solution in this case is to remove used quote_chars from the created csv file. You can do it like this:
quotedFile = File.read("#{file_path}"'file.tab')
unquotedFile = quotedFile.gsub("\0", "")
File.open("#{file_path}"'unquoted_file.tab',"w") { |file| file.puts replace }
I assume here that NULL's are the only escaped fields. If that's not the case use default quote_char: '"' and gsub(',"",', '') which should handle almost all possible cases of fields containing special characters.
But as you note that the results of your query are large it might be more practical to prepare the csv file on your own and avoid processing the outputs twice. You could simply write:
File.open("#{file_path}"'unquoted_file.tab',"w") do |file|
csv.puts ["Key","channel"]
series_1_results.each_hash do |series_1|
csv.puts ["#{series_1['key']},#{series_1['channel']}"]
end
end
Once more, you might need to handle fields with special characters.
From the Ruby CSV docs, setting force_quotes: false in the options seems to work.
CSV.open("#{file_path}"'file.tab', "wb", { col_sep: "\t", force_quotes: false }) do |csv|
The above does the trick. I'd suggest against setting quote_char to \0 since that doesn't work as expected.
There is one thing to note though. If the field is an empty string "" - it will force the quote_char to be printed into the CSV. But strangely a nil value does not. I'd suggest that if at all you're expecting empty strings in the data, you somehow convert them into nil when writing to the CSV (maybe using the ActiveSupport presence method or anything similar).
First, a tab-separated file is "TSV", vs. a comma-separated file which is "CSV".
Wrapping quotes around fields is necessary any time there could be an occurrence of the field delimiter inside a field.
For instance, how are you going to embed this string in a tab-delimited file?
Foo\tbar
The \t is the representation of an embedded Tab.
The same problem occurs when writing a CSV file with a field containing commas. The field has to be wrapped in double-quotes to delimit the field itself.
If your input contains any data that needs to be escaped (such as the column separator or the quote character), then you do need to quote your data. Otherwise it cannot be parsed correctly later.
CSV.open('test.csv', 'wb', col_sep: "\t") do |csv|
csv << ["test", "'test'", '"test"', nil, "test\ttest"]
end
puts open('test.csv').read
#test 'test' """test""" "test test"
The CSV class won't quote anything unnecessarily (as you can see above). So I'm not sure why you're saying all your fields are being quoted. It could be somehow force_quotes is getting set to true somewhere.
If you're absolutely certain your data will never contain \t or ", then the default quote_char (") should work just fine. Otherwise, if you want to avoid quoting anything, you'll need to pick another quote character that you're absolutely certain won't occur in your data.
CSV.open('test.csv', 'wb', col_sep: "\t", quote_char: "|") do |csv|
csv << ["test", "'test'", nil, '"test"']
end
puts open('test.csv').read
#test 'test' "test"

What's the best way to parse a tab-delimited file in Ruby?

What's the best (most efficient) way to parse a tab-delimited file in Ruby?
The Ruby CSV library lets you specify the field delimiter. Ruby 1.9 uses FasterCSV. Something like this would work:
require "csv"
parsed_file = CSV.read("path-to-file.csv", col_sep: "\t")
The rules for TSV are actually a bit different from CSV. The main difference is that CSV has provisions for sticking a comma inside a field and then using quotation characters and escaping quotes inside a field. I wrote a quick example to show how the simple response fails:
require 'csv'
line = 'boogie\ttime\tis "now"'
begin
line = CSV.parse_line(line, col_sep: "\t")
puts "parsed correctly"
rescue CSV::MalformedCSVError
puts "failed to parse line"
end
begin
line = CSV.parse_line(line, col_sep: "\t", quote_char: "Ƃ")
puts "parsed correctly with random quote char"
rescue CSV::MalformedCSVError
puts "failed to parse line with random quote char"
end
#Output:
# failed to parse line
# parsed correctly with random quote char
If you want to use the CSV library you could used a random quote character that you don't expect to see if your file (the example shows this), but you could also use a simpler methodology like the StrictTsv class shown below to get the same effect without having to worry about field quotations.
# The main parse method is mostly borrowed from a tweet by #JEG2
class StrictTsv
attr_reader :filepath
def initialize(filepath)
#filepath = filepath
end
def parse
open(filepath) do |f|
headers = f.gets.strip.split("\t")
f.each do |line|
fields = Hash[headers.zip(line.split("\t"))]
yield fields
end
end
end
end
# Example Usage
tsv = Vendor::StrictTsv.new("your_file.tsv")
tsv.parse do |row|
puts row['named field']
end
The choice of using the CSV library or something more strict just depends on who is sending you the file and whether they are expecting to adhere to the strict TSV standard.
Details about the TSV standard can be found at http://en.wikipedia.org/wiki/Tab-separated_values
There are actually two different kinds of TSV files.
TSV files that are actually CSV files with a delimiter set to Tab. This is something you'll get when you e.g. save an Excel spreadsheet as "UTF-16 Unicode Text". Such files use CSV quoting rules, which means that fields may contain tabs and newlines, as long as they are quoted, and literal double quotes are written twice. The easiest way to parse everything correctly is to use the csv gem:
use 'csv'
parsed = CSV.read("file.tsv", col_sep: "\t")
TSV files conforming to the IANA standard. Tabs and newlines are not allowed as field values, and there is no quoting whatsoever. This is something you will get when you e.g. select a whole Excel spreadsheet and paste it into a text file (beware: it will get messed up if some cells do contain tabs or newlines). Such TSV files can be easily parsed line-by-line with a simple line.rstrip.split("\t", -1) (note -1, which prevents split from removing empty trailing fields). If you want to use the csv gem, simply set quote_char to nil:
use 'csv'
parsed = CSV.read("file.tsv", col_sep: "\t", quote_char: nil)
I like mmmries answer. HOWEVER, I hate the way that ruby strips off any empty values off of the end of a split. It isn't stripping off the newline at the end of the lines, either.
Also, I had a file with potential newlines within a field. So, I rewrote his 'parse' as follows:
def parse
open(filepath) do |f|
headers = f.gets.strip.split("\t")
f.each do |line|
myline=line
while myline.scan(/\t/).count != headers.count-1
myline+=f.gets
end
fields = Hash[headers.zip(myline.chomp.split("\t",headers.count))]
yield fields
end
end
end
This concatenates any lines as necessary to get a full line of data, and always returns the full set of data (without potential nil entries at the end).

Resources