Problem
In a source file, I have a large number of strings. Some with interpolation, some with special symbols and some with neither.
I am trying to work out if I can replace the single quotes with double quotes whilst converting escaped single quote characters. I would then run this conversion on one or more source code files.
Example - Code
Imagine the following code:
def myfunc(var, var2 = 'abc')
s = 'something'
puts 'a simple string'
puts 'string with an escaped quote \' in it'
x = "nasty #{interpolated}" + s + ' and single quote combo'
puts "my #{var}"
end
Example - Result
I would like to turn it into this:
def myfunc(var, var2 = "abc")
s = "something"
puts "a simple string"
puts "string with an escaped quote ' in it"
x = "nasty #{interpolated}" + s + " and single quote combo"
puts "my #{var}"
end
If anyone has any ideas I'd be very grateful!
You want the negative lookbehind operator, (?<!...):
REGEX
(?<!\\)'
DEMO
http://regex101.com/r/rN5eE6
EXPLANATION
You want to replace any single quote not preceded by a backslash.
Don't forget to do a find and replace of all \' with '
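As a rough sketch of those two steps in Ruby (the method name naive_single_to_double is made up for illustration, and this assumes the line contains no single quotes inside double-quoted strings, comments, heredocs or %q literals):

```ruby
# Naive two-step conversion of one line of source code.
# Assumption: every single quote on the line delimits a single-quoted string.
def naive_single_to_double(line)
  line
    .gsub(/(?<!\\)'/, '"')  # step 1: every unescaped ' becomes "
    .gsub("\\'") { "'" }    # step 2: every escaped \' becomes a plain '
end

puts naive_single_to_double(%q{puts 'string with an escaped quote \' in it'})
```

The block form of gsub is used in step 2 because \' has a special meaning inside a replacement string.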
THERE IS MORE
For this use case, even though it is a simple one, a Ruby parser would do a better job.
As Peter Hamilton pointed out, although replacing single-quoted strings with double-quoted equivalents might seem like an easy task at first, even that cannot be done easily, if at all, with regexen, mainly because single quotes can show up in the "wrong places", such as within double-quoted strings, %q literal string constructs, heredocs, comments...
x = 'puts "foo"'
y = %/puts 'foo'/ # TODO: Replace "x = %/puts 'foo'/" with "x = %#puts 'bar'#"
But the correct solution, in this case, is much easier than the other way around (double quoted to single quoted), and actually partially attainable:
require 'ripper'
require 'sorcerer' # gem install sorcerer if necessary
my_source = <<-source
x = 'puts "foo"'
y = "puts 'bar'"
source
sexp = Ripper::SexpBuilder.new( my_source ).parse
double_quoted_source = Sorcerer.source sexp
#=> "x = \"puts \"foo\"\"; y = \"puts 'bar'\""
The reason why I say "partially attainable" is because, as you can see by yourself,
puts double_quoted_source
#=> x = "puts "foo""; y = "puts 'bar'"
Sorcerer forgets to escape double quotes inside the formerly single-quoted string. Feel free to submit a patch to sorcerer's author Jim Weirich that would fix the problem.
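If you would rather not depend on sorcerer, here is a minimal sketch that works on Ripper's token stream instead of the sexp. It is not what the answer above does; single_to_double is a hypothetical helper, and I am assuming that joining the tokens from Ripper.lex reproduces the source, which holds for simple code like this (I have not tried it with heredocs). It rewrites only '...' literals and lets String#inspect take care of the escaping:

```ruby
require 'ripper'

# Rewrite every '...' literal as an equivalent "..." literal.
# Everything else (double-quoted strings, %q, comments) is copied unchanged.
def single_to_double(source)
  out = +''
  buffer = nil # raw content collected while inside a '...' literal
  Ripper.lex(source).each do |_pos, type, token|
    if buffer
      if type == :on_tstring_end || type == :on_label_end
        # Inside single quotes only \\ and \' are escapes; undo them,
        # then let inspect produce a correctly escaped "..." literal.
        value = buffer.gsub(/\\(['\\])/) { $1 }
        out << value.inspect
        out << ':' if type == :on_label_end # quoted hash key such as 'a': 1
        buffer = nil
      else
        buffer << token
      end
    elsif type == :on_tstring_beg && token == "'"
      buffer = +''
    else
      out << token
    end
  end
  out
end

my_source = <<~'SOURCE'
  x = 'puts "foo"'
  y = "puts 'bar'"
SOURCE

puts single_to_double(my_source)
```

Note that inspect also escapes things like #{ and turns a literal newline into \n, so the rewritten literal keeps its value but may not look exactly like the original.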
Related
Why might you use ''' instead of """, as in Learn Ruby the Hard Way, Chapter 10 Study Drills?
There are no triple quotes in Ruby.
Two String literals which are juxtaposed are parsed as a single String literal. So,
'Hello' 'World'
#=> "HelloWorld"
is the same as
'HelloWorld'
#=> "HelloWorld"
And
'' 'Hello' ''
#=> "Hello"
is the same as
'''Hello'''
#=> "Hello"
is the same as
'Hello'
#=> "Hello"
Since adding an empty string literal does not change the result, you can add as many empty strings as you want:
""""""""""""'''''Hello'''''''''
#=> "Hello"
There are no special rules for triple single quotes vs. triple double quotes, because there are no triple quotes. The rules are simply the same as for quotes.
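A quick way to convince yourself (just a sketch you can paste into a file or irb):

```ruby
# '''Hello''' is really '' 'Hello' '' : three adjacent literals glued together.
triple = '''Hello'''
plain  = 'Hello'

puts triple == plain   # true
puts triple.length     # 5, the empty strings add nothing
```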
I assume the author confused Ruby and Python, because a triple-quote will not work in Ruby the way the author thought it would. It'll just work like three separate strings ('' '' '').
For multi-line strings one could use:
%q{
your text
goes here
}
=> "\n your text\n goes here\n "
or %Q{} if you need string interpolation inside.
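Heredocs are another option for multi-line text. A small sketch, assuming Ruby 2.3 or later for the squiggly <<~ form (the variable names are only for illustration):

```ruby
name = "world"

# <<~ strips the common leading indentation; this form interpolates.
text = <<~EOS
  Hello, #{name}
  second line
EOS

# Quoting the terminator with single quotes turns interpolation off.
raw = <<~'EOS'
  Hello, #{name}
EOS

p text
p raw
```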
Triple-quotes ''' are the same as single quotes ' in that they don't interpolate any #{} sequences, escape characters (like "\n"), etc.
Triple-double-quotes (ugh) """ are the same as double-quotes " in that they do interpolation and escape sequences.
This is further down on the same page you linked.
The triple-quoted versions """ ''' allow for multi-line strings... as do the singly-quoted ' and ", so I don't know why both are available.
In Ruby """ supports interpolation, ''' does not.
Rubyists use triple quotes for multi-line strings (similar to 'heredocs').
You could just as easily use one of these characters.
Just like normal strings the double quotes will allow you to use variables inside of your strings (also known as 'interpolation').
Save this to a file called multiline_example.rb and run it:
interpolation = "(but this one can use interpolation)"
single = '''
This is a multi-line string.
'''
double = """
This is also a multi-line string #{interpolation}.
"""
puts single
puts double
This is the output:
$ ruby multiline_example.rb
This is a multi-line string.
This is also a multi-line string (but this one can use interpolation).
$
Now try it the other way around:
nope = "(this will never get shown)"
single = '''
This is a multi-line string #{nope}.
'''
double = """
This is also a multi-line string.
"""
puts single
puts double
You'll get this output:
$ ruby multiline_example.rb
This is a multi-line string #{nope}.
This is also a multi-line string.
$
Note that in both examples you got some extra newlines in your output. That's because multi-line strings keep any newlines inside them, including the one right after the opening quotes. puts only appends a newline of its own when the string doesn't already end with one.
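To see exactly which newlines end up in the string, inspect it with p instead of puts (a small sketch):

```ruby
single = '''
This is a multi-line string.
'''

# p shows the escaped form, so the leading and trailing newlines are visible.
p single
```

The value is the same as what you get from a plain single-quoted string spanning the same lines.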
I've recently been coding in Ruby and have come from Python, where single and double quotes made no difference to how the code worked as far as I know.
I moved to Ruby to see how it worked, and to investigate the similarities between Ruby and Python.
I was using single-quoted strings once and noticed this:
hello = 'hello'
x = '#{hello} world!'
puts x
It returned '#{hello} world!' rather than 'hello world!'.
After noticing this I tried double quotes and the problem was fixed. Now I'm not sure why that is.
Do single and double quotes change this or is it because of my editor (Sublime Text 3)? I'm also using Ruby version 2.0, in case it works differently in previous versions.
In Ruby, double quotes are interpolated, meaning the code in #{} is evaluated as Ruby. Single quotes are treated as literals (meaning the code isn't evaluated).
var = "hello"
"#{var} world" #=> "hello world"
'#{var} world' #=> "#{var} world"
For some extra-special magic, Ruby also offers another way to create strings:
%Q() # behaves like double quotes
%q() # behaves like single quotes
For example:
%Q(#{var} world) #=> "hello world"
%q(#{var} world) #=> "#{var} world"
You should read the Literals section of the official Ruby documentation.
It is very concise, so you need to read carefully. But it explains the difference between double-quoted and single-quoted strings, and how they are equivalent to %Q/.../ and %q/.../ respectively.
If you enclose a Ruby string in single quotes, you can't use interpolation. That's how Ruby works.
Single-quoted strings don't process escape sequences (apart from \\ and \') and they don't do string interpolation.
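A few one-liners that show what does and doesn't get processed (just a sketch, the strings are arbitrary):

```ruby
puts 'a\nb'.length         # 4: the backslash and the n stay as two characters
puts "a\nb".length         # 3: \n is a single newline character
puts 'It\'s'               # \' is one of the two escapes single quotes know
puts 'back\\slash'.length  # 10: \\ collapses to one backslash
```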
For a better understanding, take a look at String concatenation vs. interpolation
To answer your question, you have to use "" when you want to do string interpolation:
name = 'world'
puts "Hello #{name}" # => "Hello world"
Using escape sequence:
puts 'Hello\nworld' # prints Hello\nworld
puts "Hello\nworld" # prints Hello and world on two separate lines
Ruby supports single-quoted strings, which can be used like this:
>> 'foo'
=> "foo"
>> 'foo' + 'bar'
=> "foobar"
In the above example, the two types of strings are identical: we can use double quotes in place of single quotes and get the same output.
You ran into a problem with interpolation because Ruby does not interpolate into single-quoted strings. Here is one more example to illustrate:
>> '#{foo} bar'
=> "\#{foo} bar"
Here you can see that the return value is displayed as a double-quoted string, which requires a backslash to escape special characters such as #.
Single-quoted strings are often useful because they are truly literal.
In the string interpolation concept, the essential difference between using single or double quotes is that double quotes allow for escape sequences while single quotes do not.
Let's take an example:
name = "Mike"
puts "Hello #{name} \n How are you?"
The above Ruby code will replace the variable name, written inside #{}, with its value, which is Mike. It will also print the string How are you? on a separate line, since we placed an escape sequence there.
Output:
Hello Mike
How are you?
If you do the same with single quotes, it will treat the entire string as plain text and print it as is, including the escape sequence.
name = 'Mike'
puts 'Hello #{name} \n How are you?'
Output:
Hello #{name} \n How are you?
AKA How do I find an unescaped character sequence with regex?
Given an environment set up with:
@secret = "OH NO!"
$secret = "OH NO!"
@@secret = "OH NO!"
and given a string read in from a file that looks like this:
some_str = '"\"#{:NOT&&:very}\" bad. \u262E\n#@secret \\#$secret \\\\#@@secret"'
I want to evaluate this as a Ruby string, but without interpolation. Thus, the result should be:
puts safe_eval(some_str)
#=> "#{:NOT&&:very}" bad. ☮
#=> #@secret #$secret \#@@secret
By contrast, the eval-only solution produces
puts eval(some_str)
#=> "very" bad. ☮
#=> OH NO! #$secret \OH NO!
At first I tried:
def safe_eval(str)
eval str.gsub(/#(?=[{@$])/,'\\#')
end
but this fails in the malicious middle case above, producing:
#=> "#{:NOT&&:very}" bad. ☮
#=> #@secret \OH NO! \#@@secret
You can do this via regex by ensuring that there are an even number of backslashes before the character you want to escape:
def safe_eval(str)
eval str.gsub( /([^\\](?:\\\\)*)#(?=[{@$])/, '\1\#' )
end
…which says:
Find a character that is not a backslash [^\\]
followed by two backslashes (?:\\\\)
repeated zero or more times *
followed by a literal # character
and ensure that after that you can see either a {, @, or $ character.
and replace that with
the non-backslash-maybe-followed-by-even-number-of-backslashes
and then a backslash and then a #
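To check the even-number-of-backslashes logic without calling eval at all, here is a small sketch of just the substitution step. The constant and method names are made up for illustration, and the lookahead lists the three characters that can start an interpolation after # in a double-quoted string ({, @ and $):

```ruby
# A # that starts an interpolation and is preceded by an even number
# (possibly zero) of backslashes, i.e. one that is not already escaped.
UNESCAPED_HASH = /([^\\](?:\\\\)*)#(?=[{@$])/

def escape_interpolation(str)
  # Keep the captured prefix and put a backslash in front of the #.
  str.gsub(UNESCAPED_HASH) { "#{$1}\\#" }
end

puts escape_interpolation('a #{x}')      # gets escaped
puts escape_interpolation('a \#{x}')     # already escaped, left alone
puts escape_interpolation('a \\\\#{x}')  # two backslashes then #: gets escaped
```

One thing to watch: because [^\\] has to match a character, a # at the very start of the string is never matched. That doesn't matter for the input above, which starts with a double quote.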
How about not using eval at all? As per this comment in chat, all that's necessary are escaping quotes, newlines, and unicode characters. Here's my solution:
ESCAPE_TABLE = {
/\\n/ => "\n",
/\\"/ => "\"",
}
def expand_escapes(str)
str = str.dup
ESCAPE_TABLE.each {|k, v| str.gsub!(k, v)}
#Deal with Unicode
str.gsub!(/\\u([0-9A-Fa-f]{4})/) {|m| [m[2..5].hex].pack("U") }
str
end
When called on your string the result is (in your variable environment):
"\"\"\#{:NOT&&:very}\" bad. ☮\n\#@secret \\\#$secret \\\\\#@@secret\""
Although I would have preferred not to have to treat unicode specially, it is the only way to do it without eval.
Problem
In a source file, I have a large number of strings. Some with interpolation, some with special symbols and some with neither.
I am trying to work out if I can replace the simple strings' double quotes with single quotes whilst leaving double quotes for the interpolated and special symbol strings. I would then run this conversion on one or more source code files.
I imagine there is probably a nice regex for this, but I can't quite formulate it.
Example - Code
Imagine the following code:
def myfunc(var, var2 = "abc")
s = "something"
puts "a simple string"
puts "string with a single ' quote"
puts "string with a newline \n"
puts "my #{var}"
end
Example - Result
I would like to turn it into this:
def myfunc(var, var2 = 'abc')
s = 'something'
puts 'a simple string'
puts "string with a single ' quote"
puts "string with a newline \n"
puts "my #{var}"
end
If anyone has any ideas I'd be very grateful!
Assuming that you can read your string from your file by yourself into an array strings:
strings = [ "\"a simple string\"",
"\"string with a single ' quote\"",
"\"string with a newline \n\"",
"\"my \#{var}\"" ]
then we would eval them to see how they behave:
$SAFE = 4
single_quoted_when_possible = strings.map { |double_quoted|
begin
string = eval( double_quoted ) # this string, as Ruby sees it
raise unless string.is_a? String
raise unless '"' + string + '"' == double_quoted
rescue
raise "Array element is not a string!"
end
begin
raise unless eval( "'#{string}'" ) == string
"'#{string}'"
rescue
double_quoted
end
}
That SAFE level 4 is just voodoo, an acknowledgement from me that we are doing something dangerous. I do not know to what extent it actually protects against all dangers. (Ruby 2.1 and later no longer support safe level 4, so on a modern Ruby that line either raises or does nothing useful.)
In your particular case, you can create a Regexp heuristic, relying on the hope that nobody will write "evil" strings in your code, such as /= *(".+") *$/ or /\w+ *\(* *(".+") *\)* *$/. That heuristic would extract some string suspects, to which you could further apply the method I wrote above. But I would still have a human look at each replacement, and run tests on the resulting code afterwards.
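An alternative to both eval and a hand-written regex is to let Ruby's own lexer find the string literals. This is only a sketch, not the method described above: double_to_single is a hypothetical helper, and it deliberately leaves a string alone whenever it is in doubt (any backslash, any single quote, any interpolation):

```ruby
require 'ripper'

# Turn "..." into '...' only when the literal is a plain run of text.
def double_to_single(source)
  tokens = Ripper.lex(source).map { |_pos, type, text| [type, text] }
  out = +''
  i = 0
  while i < tokens.size
    a, b, c = tokens[i, 3]
    simple = a == [:on_tstring_beg, '"'] &&
             b && b[0] == :on_tstring_content &&
             c == [:on_tstring_end, '"'] &&
             b[1] !~ /[\\']/ # no escapes and no single quotes inside
    if simple
      out << "'" << b[1] << "'"
      i += 3
    else
      out << a[1] # copy the token through unchanged
      i += 1
    end
  end
  out
end

code = <<~'RUBY'
  s = "something"
  puts "string with a single ' quote"
  puts "string with a newline \n"
  puts "my #{var}"
RUBY

puts double_to_single(code)
```

A string with interpolation never matches because the lexer puts an embedded-expression token between the content and the closing quote. You should still run your tests on the result.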
Does anyone know of a Ruby gem (or built-in, or native syntax, for that matter) that operates on the outer quote marks of strings?
I find myself writing methods like this over and over again:
remove_outer_quotes_if_quoted( myString, chars ) -> aString
add_outer_quotes_unless_quoted( myString, char ) -> aString
The first tests myString to see if its beginning and ending characters match any one character in chars. If so, it returns the string with quotes removed. Otherwise it returns it unchanged. chars defaults to a list of quote mark characters.
The second tests myString to see if it already begins and ends with char. If so, it returns the string unchanged. If not, it returns the string with char tacked on before and after, and any embedded occurrence of char is escaped with a backslash. char defaults to the first in a default list of characters.
(My hand-cobbled methods don't have such verbose names, of course.)
I've looked around for similar methods in the public repos but can't find anything like this. Am I the only one that needs to do this a lot? If not, how does everyone else do this?
If you do it a lot, you may want to add a method to String:
class String
def strip_quotes
gsub(/\A['"]+|['"]+\Z/, "")
end
end
Then you can just call string.strip_quotes.
Adding quotes is similar:
class String
def add_quotes
%Q/"#{strip_quotes}"/
end
end
This is called as string.add_quotes and uses strip_quotes before adding double quotes.
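If you want something closer to the two methods described in the question (only strip a matching pair, and escape embedded quotes when adding), a rough sketch could look like this. The method names come from the question; QUOTE_CHARS and the exact defaults are my assumptions:

```ruby
QUOTE_CHARS = %w[" '].freeze

# Remove the outer quotes only if the first and last characters are the
# same quote character; otherwise return the string unchanged.
def remove_outer_quotes_if_quoted(str, chars = QUOTE_CHARS)
  return str unless str.length >= 2 && str[0] == str[-1] && chars.include?(str[0])
  str[1...-1]
end

# Wrap the string in char unless it is already wrapped in it,
# escaping any embedded occurrence of char with a backslash.
def add_outer_quotes_unless_quoted(str, char = QUOTE_CHARS.first)
  return str if str.length >= 2 && str.start_with?(char) && str.end_with?(char)
  escaped = str.gsub(char) { "\\#{char}" }
  "#{char}#{escaped}#{char}"
end

puts remove_outer_quotes_if_quoted('"abc"')
puts add_outer_quotes_unless_quoted('say "hi"')
```

Unlike strip_quotes above, this leaves mismatched quotes such as 'abc" alone.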
This might 'splain how to remove and add them:
str1 = %["We're not in Kansas anymore."]
str2 = %['He said, "Time flies like an arrow, Fruit flies like a banana."']
puts str1
puts str2
puts
puts str1.sub(/\A['"]/, '').sub(/['"]\z/, '')
puts str2.sub(/\A['"]/, '').sub(/['"]\z/, '')
puts
str3 = "foo"
str4 = 'bar'
[str1, str2, str3, str4].each do |str|
puts (str[/\A['"]/] && str[/['"]\z/]) ? str : %Q{"#{str}"}
end
The original two lines:
# >> "We're not in Kansas anymore."
# >> 'He said, "Time flies like an arrow, Fruit flies like a banana."'
Stripping quotes:
# >> We're not in Kansas anymore.
# >> He said, "Time flies like an arrow, Fruit flies like a banana."
Adding quotes when needed:
# >> "We're not in Kansas anymore."
# >> 'He said, "Time flies like an arrow, Fruit flies like a banana."'
# >> "foo"
# >> "bar"
I would use value = value[1...-1] if value[0] == value[-1] && %w[' "].include?(value[0]). In short, this code checks whether the first and last characters of the string are the same and removes them if they are a single or double quote. More quote types can be added to the list as needed.
%w["adadasd" 'asdasdasd' 'asdasdasd"].each do |value|
puts 'Original value: ' + value
value = value[1...-1] if value[0] == value[-1] && %w[' "].include?(value[0])
puts 'Processed value: ' + value
end
The example above will print the following:
Original value: "adadasd"
Processed value: adadasd
Original value: 'asdasdasd'
Processed value: asdasdasd
Original value: 'asdasdasd"
Processed value: 'asdasdasd"