How to convert a whitespace string to CSV string in Ruby? - ruby

How can I convert a whitespace-delimited string to CSV string in Ruby? Is there a built-in method that could be used to achieve this?
Code:
@stores = current_user.channels
puts @stores
Current Output:
TMSUS TMSCA
Expected Output:
TMSUS,TMSCA

Ruby ships with a CSV library in its standard library:
require 'csv'
stores = 'TMSUS TMSCA'
stores.split(' ').to_csv
# => "TMSUS,TMSCA\n"
Don't use gsub for this. If one of the values contained a comma, a plain substitution would produce broken CSV. The CSV library handles the escaping for you.
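To illustrate the escaping point, here is a minimal sketch comparing plain string joining with Array#to_csv. The values are made up for the example; the second one contains a comma on purpose.

```ruby
require 'csv'

# Hypothetical values; the second one contains a comma.
fields = ['TMSUS', 'Acme, Inc.']

naive = fields.join(',')  # plain joining, no escaping
safe  = fields.to_csv     # Array#to_csv becomes available after require 'csv'

puts naive
puts safe
```

Parsing the naive string back gives three fields instead of two, while the to_csv output round-trips to the original array.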

You could use the CSV library:
require 'csv'
string = 'TMSUS TMSCA'
CSV.generate do |csv|
  csv << string.split
end
# => "TMSUS,TMSCA\n"
The advantage of using the CSV library is that it properly quotes and escapes any values that require it.
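If you need exactly the string from the question, without the trailing newline, one option is CSV.generate_line followed by chomp. This is only a sketch and assumes a single row of output:

```ruby
require 'csv'

string = 'TMSUS TMSCA'

# generate_line builds one CSV row and ends it with the row separator;
# chomp removes that trailing newline.
line = CSV.generate_line(string.split).chomp

puts line
```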

Related

Writing to csv is adding quotes

I have string values that I write to a CSV file as an array, like this:
output = "This is a, ruby output"
CSV.open("output/abc.csv", "a+") do |csv|
csv << [output]
end
When I check abc.csv, the added row has quotation marks (") at the start and end of the field. How can I get rid of them?
File output:
"This is a, ruby output"
So far I've tried tr and slice before saving to CSV, but the quotes seem to be added during writing.
If you get rid of the quotes then your output is no longer CSV. The CSV class can be instructed to use a different delimiter and will only quote if that delimiter is included in the input. For example:
require 'csv'
output = "This is a, ruby output"
File.open("output/abc.csv", "a+") do |io|
csv = CSV.new(io, col_sep: '^')
csv << [output, "the end"]
end
Output:
This is a, ruby output^the end
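The same behaviour can be seen without touching a file. The sketch below uses CSV.generate_line with the same col_sep; the second row is a made-up value that contains the caret, to show that quoting comes back as soon as the delimiter appears inside a field.

```ruby
require 'csv'

# The comma is no longer special, so this field is written unquoted.
plain = CSV.generate_line(["This is a, ruby output", "the end"], col_sep: "^")

# Hypothetical field containing the delimiter: it gets quoted again.
quoted = CSV.generate_line(["contains ^ caret", "the end"], col_sep: "^")

puts plain
puts quoted
```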

Only string (not number) quoting when generating CSV

I generate CSV data as follows:
require 'csv'
CSV.generate do |csv|
csv << ['a', 123, 1.5, '0123']
end
This returns this result:
"a,123,1.5,0123\n"
The problem is that Excel will interpret 0123 as an integer. On the other hand, with force_quotes: true, the values 123 and 1.5 are no longer interpreted as numbers.
How can I quote only strings, not numbers, to get the following result?
"\"a\",123,1.5,\"0123\"\n"
require 'csv'
test = CSV.generate do |csv|
csv << ['a', 123, 1.5, '0123'].map{|e|e.class == String ? "\"#{e}\"" : e}
end
puts test # => """a""",123,1.5,"""0123"""
This output is valid CSV and gets imported into spreadsheets as:
A1: "a"
B1: 123
C1: 1.5
D1: "0123"
Is this what you are looking for?
Edit:
In case it wasn't obvious, what I'm doing here is checking each value before I pass it into the csv. If it's a string, surround it with quotes (the \" is the escaped quote) and then let CSV escape them however it needs to in order to be valid.
Regardless of the method you are using to pass things into your csv object, you should be able to do the same check and modify the strings that way.
Alternatively if you have access to the source data, add the surrounding quotes there.
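If the goal is literally the output from the question ("a",123,1.5,"0123"), another option is to build the line by hand instead of going through the CSV writer. This is a rough sketch, not a CSV feature: quote_strings_only is a hypothetical helper, and it assumes non-string cells never need quoting.

```ruby
require 'csv'

# Hypothetical helper: wraps String cells in quotes (doubling any embedded
# quotes) and writes every other value with to_s, unquoted.
def quote_strings_only(row)
  cells = row.map do |value|
    if value.is_a?(String)
      '"' + value.gsub('"', '""') + '"'
    else
      value.to_s
    end
  end
  cells.join(',') + "\n"
end

line = quote_strings_only(['a', 123, 1.5, '0123'])
puts line

# The result is still valid CSV and can be parsed back.
parsed = CSV.parse_line(line)
```

Whether a spreadsheet then keeps 0123 as text depends on the importing application, so check that with your target tool.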

Unicode string from CSV: replace \xHex symbols with unicode values

I read some Unicode data from a CSV file using standard Ruby 1.9 csv library like this:
def read_csv(file_name, value)
CSV.foreach(file_name) do |row|
if row[0] == value
return row[1]
end
end
end
I get a string back, and the Unicode characters look okay in the debugger:
Invitación
But if I print it with puts (or compare it with another string), it looks like this:
Invitaci\xC3\xB3n
How can I convert those hex sequences to the actual characters? Or am I reading the CSV file incorrectly?
I found the answer myself.
Just change the line
CSV.foreach(file_name) do |row|
to
CSV.foreach(file_name, encoding: "UTF-8") do |row|
and it works flawlessly.
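For reference, a self-contained sketch of the fixed method. The temporary file and its single row are made up for the example; only the encoding: option comes from the fix above.

```ruby
require 'csv'
require 'tempfile'

# Same lookup as in the question, with the encoding passed explicitly.
def read_csv(file_name, value)
  CSV.foreach(file_name, encoding: "UTF-8") do |row|
    return row[1] if row[0] == value
  end
  nil # no matching row
end

# Hypothetical sample file with one UTF-8 row.
file = Tempfile.new(['sample', '.csv'])
file.write("invite,Invitación\n")
file.close

result = read_csv(file.path, "invite")
puts result
puts result.encoding
```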

Store regex matches in ruby?

I'm parsing a file with Ruby to change the data formatting. I created a regex with three match groups that I want to temporarily store in variables. I'm having trouble storing the matches, as everything comes back nil.
Here is what I have so far from what I've read.
regex = '^"(\bhttps?://[-\w+&@#/%?=~_|$!:,.;]*[\w+&@#/%=~_|$])","(\w+|[\w._%+-]+@[\w.-]+\.[a-zA-Z]{2,4})","(\w{1,30})'
begin
file = File.new("testfile.csv", "r")
while (line = file.gets)
puts line
match_array = line.scan(/regex/)
puts $&
end
file.close
end
Here is some sample data that I'm using for testing.
"https://mail.google.com","Master","password1","","https://mail.google.com","",""
"https://login.sf.org","monster@gmail.com","password2","https://login.sf.org","","ctl00$ctl00$ctl00$body$body$wacCenterStage$standardLogin$tbxUsername","ctl00$ctl00$ctl00$body$body$wacCenterStage$standardLogin$tbxPassword"
"http://www.facebook.com","Beast","12345678","https://login.facebook.com","","email","pass"
"http://www.own3d.tv","Earth","passWOrd3","http://www.own3d.tv","","user_name","user_password"
Thank you,
LF4
This won't work:
match_array = line.scan(/regex/)
That uses the literal text "regex" as your regular expression, not the contents of your regex variable. You can either put the big ugly regex right into your scan or create a Regexp instance:
regex = Regexp.new('^"(\bhttps?://[-\w+&@#/%?=~_|$!:,.;]*[\w+&@#/%=~_|$])","(\w+|[\w._%+-]+@[\w.-]+\.[a-zA-Z]{2,4})","(\w{1,30})')
# ...
match_array = line.scan(regex)
You should also probably use a CSV library (one ships with Ruby) to parse CSV files, and then apply a regular expression to each column. You'll run into fewer quoting and escaping issues that way.
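A minimal sketch of that approach, using the first sample line from the question. The variable names and the simple URL check are just for illustration:

```ruby
require 'csv'

line = '"https://mail.google.com","Master","password1","","https://mail.google.com","",""'

# Let the CSV parser deal with the quoting, then keep the first three columns.
url, user, password = CSV.parse_line(line).first(3)

# Apply a (much smaller) regex to one column at a time.
url_ok = url.match?(%r{\Ahttps?://})

puts url, user, password
```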

What's the best way to parse a tab-delimited file in Ruby?

What's the best (most efficient) way to parse a tab-delimited file in Ruby?
The Ruby CSV library lets you specify the field delimiter. Since Ruby 1.9, the standard csv library is based on FasterCSV. Something like this would work:
require "csv"
parsed_file = CSV.read("path-to-file.csv", col_sep: "\t")
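The same option works on in-memory strings, which makes it easy to try out. In this sketch the data is made up, and headers: true is an extra option (not in the answer above) that turns each row into a header-keyed record:

```ruby
require 'csv'

# Hypothetical tab-delimited data with a header row.
data = "name\tcity\nAda\tLondon\nLinus\tHelsinki\n"

rows = CSV.parse(data, col_sep: "\t", headers: true).map(&:to_h)

rows.each { |row| puts "#{row['name']} lives in #{row['city']}" }
```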
The rules for TSV are actually a bit different from CSV. The main difference is that CSV has provisions for sticking a comma inside a field and then using quotation characters and escaping quotes inside a field. I wrote a quick example to show how the simple response fails:
require 'csv'
line = "boogie\ttime\tis \"now\"" # double-quoted so \t is a real tab
begin
line = CSV.parse_line(line, col_sep: "\t")
puts "parsed correctly"
rescue CSV::MalformedCSVError
puts "failed to parse line"
end
begin
line = CSV.parse_line(line, col_sep: "\t", quote_char: "Ƃ")
puts "parsed correctly with random quote char"
rescue CSV::MalformedCSVError
puts "failed to parse line with random quote char"
end
#Output:
# failed to parse line
# parsed correctly with random quote char
If you want to use the CSV library, you could use a random quote character that you don't expect to see in your file (the example shows this), but you could also use a simpler approach like the StrictTsv class shown below to get the same effect without having to worry about field quotations.
# The main parse method is mostly borrowed from a tweet by @JEG2
class StrictTsv
  attr_reader :filepath

  def initialize(filepath)
    @filepath = filepath
  end

  # Yields each data row as a Hash keyed by the header names.
  def parse
    File.open(filepath) do |f|
      headers = f.gets.strip.split("\t")
      f.each do |line|
        fields = Hash[headers.zip(line.split("\t"))]
        yield fields
      end
    end
  end
end
# Example Usage
tsv = StrictTsv.new("your_file.tsv")
tsv.parse do |row|
  puts row['named field']
end
The choice of using the CSV library or something more strict just depends on who is sending you the file and whether they are expecting to adhere to the strict TSV standard.
Details about the TSV standard can be found at http://en.wikipedia.org/wiki/Tab-separated_values
There are actually two different kinds of TSV files.
TSV files that are actually CSV files with a delimiter set to Tab. This is something you'll get when you e.g. save an Excel spreadsheet as "UTF-16 Unicode Text". Such files use CSV quoting rules, which means that fields may contain tabs and newlines, as long as they are quoted, and literal double quotes are written twice. The easiest way to parse everything correctly is to use the csv gem:
require 'csv'
parsed = CSV.read("file.tsv", col_sep: "\t")
TSV files conforming to the IANA standard. Tabs and newlines are not allowed as field values, and there is no quoting whatsoever. This is something you will get when you e.g. select a whole Excel spreadsheet and paste it into a text file (beware: it will get messed up if some cells do contain tabs or newlines). Such TSV files can be easily parsed line-by-line with a simple line.chomp.split("\t", -1) (note the -1, which prevents split from removing empty trailing fields; use chomp rather than rstrip, because rstrip would also strip trailing tabs). If you want to use the csv gem, simply set quote_char to nil:
require 'csv'
parsed = CSV.read("file.tsv", col_sep: "\t", quote_char: nil)
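A quick sketch of what the split limit changes, using a made-up line whose last two fields are empty. It also shows why chomp is the safer way to drop the newline here:

```ruby
# Hypothetical line: four fields, the last two empty.
line = "a\tb\t\t\n"

default_split = line.chomp.split("\t")       # trailing empty fields are dropped
keep_empty    = line.chomp.split("\t", -1)   # -1 keeps them
rstripped     = line.rstrip.split("\t", -1)  # rstrip already removed the trailing tabs

p default_split
p keep_empty
p rstripped
```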
I like mmmries's answer. However, I dislike the way Ruby's split drops any empty values at the end of a line, and the original parse also doesn't strip the newline at the end of each line.
I also had a file with potential newlines within a field, so I rewrote his parse method as follows:
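def parse
  File.open(filepath) do |f|
    headers = f.gets.strip.split("\t")
    f.each do |line|
      myline = line
      # Keep appending physical lines until the record has enough tabs
      # for every column, stopping if the file runs out.
      while myline.count("\t") < headers.count - 1 && (more = f.gets)
        myline += more
      end
      fields = Hash[headers.zip(myline.chomp.split("\t", headers.count))]
      yield fields
    end
  end
end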
This concatenates any lines as necessary to get a full line of data, and always returns the full set of data (without potential nil entries at the end).

Resources