I use Ruby to generate a CSV file. One of my values is a string that contains double quotes, for example ="000123".
This is my code:
require 'csv'
csv = CSV.generate do |csv|
  csv << ["=\"000123\""]
end
However, it generates a string with additional double quotes:
2.4.0 :005 > puts csv
"=""000123"""
The result I expect is ="000123". Does anyone know the reason, and how to solve this?
That's actually how CSV escapes double quotes:
If double-quotes are used to enclose fields, then a double-quote
appearing inside a field must be escaped by preceding it with
another double quote. For example:
"aaa","b""bb","ccc"
And because your field has double quotes inside it, the entire field must be quoted:
Each field may or may not be enclosed in double quotes (however
some programs, such as Microsoft Excel, do not use double quotes
at all). If fields are not enclosed with double quotes, then
double quotes may not appear inside the fields.
If you really want to disable that behaviour though, just set quote_char to something else (whatever reads the file must then use the same quote_char):
puts CSV.generate(quote_char: "'") { |csv| csv << ["=\"000123\""] }
# ="000123"
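To see that the doubled quotes are only the on-disk escaping and not a change to the data, you can round-trip the value. This is a minimal sketch using the standard library's CSV.generate_line and CSV.parse_line; the variable names are only for illustration.

```ruby
require 'csv'

value = '="000123"'

# Generate one row; CSV doubles the embedded quotes and wraps the field.
line = CSV.generate_line([value])

# Parsing the line reverses the escaping and yields the original value.
round_trip = CSV.parse_line(line).first

puts line        # "=""000123"""
puts round_trip  # ="000123"
```

Any RFC 4180 style reader should undo the escaping the same way, so the extra quotes never reach the consumer of the data.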
Is there a way to convert a hash, possibly nested:
{:event=>"subscribe", :channel=>"data_channel", :parameters=>{:api_key=>"XXX", :sign=>"YYY"}}
into a string in specified format as below?
"{'event':'subscribe', 'channel':'data_channel', 'parameters': {'api_key':'XXX', 'sign':'YYY'}}"
EDIT
The format resembles JSON, but it is not actually JSON because of the single quotes.
Make JSON, then fix it up:
require 'json'
hash = {:event=>"subscribe", :channel=>"data_channel",
        :parameters=>{:api_key=>"XXX", :sign=>%q{Miles "Chief" O'Brien}}}
puts hash.to_json.gsub(/"((?:\\[\\"]|[^\\"])*)"/) { |x|
  %Q{'#{$1.gsub(/'|\\"/, ?' => %q{\'}, %q{\\"} => ?")}'}
}
# => {'event':'subscribe','channel':'data_channel',
# 'parameters':{'api_key':'XXX','sign':'Miles "Chief" O\'Brien'}}
EDIT: The first regex says: match a double quote, then a sequence of either escaped double quotes/backslashes, or non-double-quote/backslash characters, then a double quote again. This makes sure we only find strings, and not accidental half-strings like "Miles \". For each such string, we surround the bit that was inside the double quotes ($1) with single quotes, and run a sub-replacement on it that will find escaped double quotes and unescaped single quotes, unescape the former and escape the latter.
Also, sorry about wonky highlighting, seems StackOverflow syntax highlighter can't deal with alternate forms of Ruby quoting, but they're so convenient when you're working with quote characters...
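If the regex feels fragile, an alternative is to skip JSON entirely and build the string recursively. Note that this is a different technique from the one above, sketched under the assumption that the hash only contains hashes, arrays, strings, symbols, numbers, booleans and nil. single_quote is a hypothetical helper name, not a library method.

```ruby
# Hypothetical helper: renders a nested structure in the single-quoted,
# JSON-like format from the question.
def single_quote(value)
  case value
  when Hash
    pairs = value.map { |k, v| "#{single_quote(k.to_s)}:#{single_quote(v)}" }
    "{#{pairs.join(',')}}"
  when Array
    "[#{value.map { |v| single_quote(v) }.join(',')}]"
  when String, Symbol
    # Escape backslashes first, then apostrophes; double quotes stay as-is.
    escaped = value.to_s.gsub("\\") { "\\\\" }.gsub("'") { "\\'" }
    "'#{escaped}'"
  when nil
    "null"
  else
    value.to_s # numbers, true, false
  end
end

hash = { event: "subscribe", channel: "data_channel",
         parameters: { api_key: "XXX", sign: %q{Miles "Chief" O'Brien} } }

result = single_quote(hash)
puts result
```

The gsub calls use the block form so that the backslashes in the replacement are taken literally rather than being interpreted as backreferences.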
Your desired output looks like JSON. Try:
require 'json'
JSON.dump(hash)
=> "{\"event\":\"subscribe\",\"channel\":\"data_channel\",\"parameters\":{\"api_key\":\"XXX\",\"sign\":\"YYY\"}}"
To have single quotes you can try something like:
JSON.dump(hash).gsub('"', '\'')
It returns:
{'event':'subscribe','channel':'data_channel','parameters':{'api_key':'XXX','sign':'YYY'}}
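One caveat with the plain gsub: it treats every double quote the same and knows nothing about apostrophes inside values. A small sketch of where that breaks down, using a value with an apostrophe (O'Brien is an illustrative value, not part of the question's data):

```ruby
require 'json'

hash = { event: "subscribe", parameters: { sign: "O'Brien" } }

# Blindly swap every double quote for a single quote.
naive = JSON.dump(hash).gsub('"', "'")

puts naive
# {'event':'subscribe','parameters':{'sign':'O'Brien'}}
```

The apostrophe in the value is now indistinguishable from a closing quote, so this shortcut is only safe when you know the values contain neither kind of quote.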
I have this string:
str = "no,\"contact_last_name\",\"token\""
=> "no,\"contact_last_name\",\"token\""
I want to remove the escaped double quote characters (\"). I use gsub:
result = str.gsub('\\"','')
=> "no,\"contact_last_name\",\"token\""
It appears that gsub has not replaced the escaped double quotes in the string.
Why am I trying to do this? I have this csv file:
no,"contact_last_name","token",company,urbanization,sec-"property_address","property_address",city-state-zip,ase,oel,presorttrayid,presortdate,imbno,encodedimbno,fca,"property_city","property_state","property_zip"
1,MARIE A JEANTY,1083123,,,,17 SW 6TH AVE,DANIA BEACH FL 33004-3260,Electronic Service Requested,,T00215,12/14/2016,00-314-901373799-105112-33004-3260-17,TATTTADTATTDDDTTFDDFATFTDDDTTFADTTDFAAADDATDAATTFDTDFTTAFFTTATFFF,017,DANIA BEACH,FL, 33004-3260
When I try to open it with CSV, I get the following error:
CSV.foreach(path, headers: true) do |row|
end
CSV::MalformedCSVError: Illegal quoting in line 1.
Once I removed those double quoted strings in the first row (the header), the error went away. So I am trying to remove those double quoted strings before I run it through CSV:
file = File.open "file.csv"
contents = file.read
"no,\"contact_last_name\",\"token\" ... "
contents.gsub!('\\"','')
So again, my question is: why is gsub not removing the specified characters? Note that this actually does work:
contents.gsub /"/, ""
as if the string is ignoring the \ character.
There is no escaped double quote in this string:
"no,\"contact_last_name\",\"token\""
The interpreter recognizes the text above as a string because it is enclosed in double quotes. And because of the same reason, the double quotes embedded in the string must be escaped; otherwise they signal the end of the string.
The enclosing double quote characters are part of the language, not part of the string. The backslash (\) escape is likewise the language's way of putting characters that otherwise have special meaning (double quotes, for example) inside a string.
The actual string stored in the str variable is:
no,"contact_last_name","token"
You can check this for yourself if you tell the interpreter to put the string on screen (puts str).
To answer the question in the title: all your attempts to substitute the escaped characters were in vain simply because the string doesn't contain the character sequence you tried to find and replace.
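You can verify this directly in irb. The following is a small sketch that inspects the string from the question; the variable names unchanged and stripped are mine.

```ruby
str = "no,\"contact_last_name\",\"token\""

puts str                  # no,"contact_last_name","token"
puts str.include?("\\")   # false: no backslash is stored in the string
puts str.count('"')       # 4 literal double quotes

# '\\"' is the two-character sequence backslash + quote, which never occurs.
unchanged = str.gsub('\\"', '')

# '"' is a single double quote, which does occur.
stripped = str.gsub('"', '')
puts stripped             # no,contact_last_name,token
```

The backslashes only show up in the inspect output that irb prints after =>, which is why they look like part of the data.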
And the actual problem is that the CSV file is malformed. The 6th value on the first row (sec-"property_address") doesn't follow the format of a correctly encoded CSV file.
It should read either sec-property_address or "sec-property_address"; i.e. the value should be either not enclosed in quotes at all or completely enclosed in quotes. Having it partially enclosed in quotes confuses Ruby's CSV parser.
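If you cannot get the file fixed at the source, one workaround is to repair only the header line before parsing. This is a rough sketch with a shortened, inline stand-in for the file contents; it assumes that no header name contains a comma, so dropping every quote from the first line is harmless.

```ruby
require 'csv'

# Inline sample standing in for the file contents (shortened from the question).
contents = <<~DATA
  no,"contact_last_name",sec-"property_address",city
  1,MARIE A JEANTY,,DANIA BEACH
DATA

# Split off the header line and remove the quotes from it only.
header, body = contents.split("\n", 2)
cleaned = header.delete('"') + "\n" + body

table = CSV.parse(cleaned, headers: true)
puts table.headers.inspect
puts table[0]["contact_last_name"]  # MARIE A JEANTY
```

The data rows are left untouched, so any legitimately quoted values further down the file still parse as usual.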
The string looks fine; you're misreading what you're seeing. Meditate on this:
"no,\"contact_last_name\",\"token\"" # => "no,\"contact_last_name\",\"token\""
'no,"contact_last_name","token"' # => "no,\"contact_last_name\",\"token\""
%q[no,"contact_last_name","token"] # => "no,\"contact_last_name\",\"token\""
%Q#no,"contact_last_name","token"# # => "no,\"contact_last_name\",\"token\""
When looking at a string that is delimited by double-quotes, it's necessary to escape certain characters, such as embedded double-quotes. Ruby, along with many other languages, has multiple ways of defining a string to remove that need.
Following is my code:
md5 = Digest::MD5.new
md5 << "!##$"
Then comes the error:
SyntaxError: (irb):46: unterminated string meets end of file
What is wrong? And how can I calculate the md5 hash of the string "!##$"?
The hash # sign in double quoted strings is used for variable and expression substitution. In this case, you are substituting the value of the global variable $" into the string, but you are not closing the string. The syntactically correct way of expressing that would be
"!##$"" # Note the extra closing quotes
However, it seems that you actually don't want to do variable substitution anyway, in which case you should always use single quoted strings:
'!##$'
Seems like you need to quote #:
> puts "!#\#$"
!##$
Your problem is that the string is in double quotes ("), so it is interpreted. It has a hash (#) inside, so Ruby tries to do expression substitution. Put the string in single quotes instead:
md5 << '!##$'
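Putting the answers together, here is a short sketch that computes the digest both ways and checks that the single-quoted and the escaped double-quoted literals are the same string. It uses Digest::MD5.hexdigest from the standard library; the variable names are only illustrative.

```ruby
require 'digest/md5'

single  = '!##$'    # single quotes: no interpolation at all
escaped = "!#\#$"   # double quotes with the second hash sign escaped

digest = Digest::MD5.hexdigest(single)

puts single == escaped  # true
puts digest             # 32 hexadecimal characters
```

hexdigest is a one-shot shortcut; the incremental form from the question (Digest::MD5.new followed by <<) works just as well once the literal is quoted correctly.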
I have a malformed .csv file, caused by some extra \n characters, e.g.:
Name,Comment
"Peter","Good morning"
"Paul","How are you
"
"Mary","Fine"
The 2nd row ends with an unwanted, extra \n.
How can I remove all trailing \ns that are directly followed by a closing double quote " (assume the whole file is already read into a string)?
Don't read the whole thing into a string; use the standard CSV parser (in the standard library since 1.9) to read it. If you have that in, say, pancakes.csv, then:
require 'csv'
data = CSV.open('pancakes.csv').map { |r| r.map(&:strip) }
# or
data = CSV.open('pancakes.csv').map { |r| r.map(&:chomp) }
Then you'll have this in data:
[
["Name", "Comment"],
["Peter", "Good morning"],
["Paul", "How are you"],
["Mary", "Fine"]
]
So you can get your data all clean and nicely parsed quite simply. And if you just need to clean up the CSV for some other program that can't handle embedded newlines, then you can use CSV to write it back out again.
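The write-back step could look roughly like this. It is a sketch that parses an inline copy of the sample data instead of pancakes.csv, and it assumes that turning nil fields into empty strings (via to_s) is acceptable.

```ruby
require 'csv'

raw = <<~CSV_TEXT
  Name,Comment
  "Peter","Good morning"
  "Paul","How are you
  "
  "Mary","Fine"
CSV_TEXT

# Parse, then strip stray whitespace (including the embedded newline).
rows = CSV.parse(raw).map { |row| row.map { |field| field.to_s.strip } }

# Write the cleaned rows back out as CSV text.
cleaned = CSV.generate do |csv|
  rows.each { |row| csv << row }
end

puts cleaned
```

CSV only adds quotes where a field needs them, so the cleaned output here has none; pass force_quotes: true to CSV.generate if the consuming program expects every field quoted.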
You don't need a Regexp for that. It's basically any double-quote on its own line:
csv_string.gsub("\n\"\n", "\"\n")
Why don't you just add a trailing double quote for lines which don't end in a double quote, and remove empty lines (lines that only have a double quote)?
I am using FasterCSV.
When I do this:
title = "\"" + some_title + "\""
My file looks like:
"""some title """, 23, 22
I want:
"some title", 23,22
My guess would be that FasterCSV is adding the extra quotes to escape the quotes in your input string.
So if your input string is [Hello, CSV], FasterCSV would have to enclose it within double quotes so that CSV parsing isn't disrupted by the comma. Ditto for double quotes, which have significance in CSV.
I'd say try sending the string without the quotes and let FasterCSV decide when it needs the double quotes, OR use single quotes like Jacob suggests.
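To illustrate, here is a sketch using Ruby's standard CSV library, which replaced FasterCSV in Ruby 1.9 and kept its interface. I am assuming your rows look like the one in the question; force_quotes is the option to reach for if you want every field wrapped in quotes.

```ruby
require 'csv'

# No manual quotes: the library quotes a field only when it has to.
plain      = CSV.generate_line(["some title", 23, 22])
with_comma = CSV.generate_line(["Hello, CSV", 23, 22])

# force_quotes wraps every field, numbers included.
forced     = CSV.generate_line(["some title", 23, 22], force_quotes: true)

puts plain       # some title,23,22
puts with_comma  # "Hello, CSV",23,22
puts forced      # "some title","23","22"
```

Adding the quotes yourself makes them part of the data, which is why they come back doubled and wrapped in a second pair.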