Problem
In a source file, I have a large number of strings. Some have interpolation, some have special symbols, and some have neither.
I am trying to work out if I can replace the simple strings' double quotes with single quotes whilst leaving double quotes for the interpolated and special symbol strings. I would then run this conversion on one or more source code files.
I imagine there is probably a nice regex for this, but I can't quite formulate it.
Example - Code
Imagine the following code:
def myfunc(var, var2 = "abc")
s = "something"
puts "a simple string"
puts "string with a single ' quote"
puts "string with a newline \n"
puts "my #{var}"
end
Example - Result
I would like to turn it into this:
def myfunc(var, var2 = 'abc')
s = 'something'
puts 'a simple string'
puts "string with a single ' quote"
puts "string with a newline \n"
puts "my #{var}"
end
If anyone has any ideas I'd be very grateful!
Assuming that you can read your string from your file by yourself into an array strings:
strings = [ "\"a simple string\"",
"\"string with a single ' quote\"",
"\"string with a newline \n\"",
"\"my \#{var}\"" ]
then we would eval them to see how they behave:
$SAFE = 4
single_quoted_when_possible = strings.map { |double_quoted|
begin
string = eval( double_quoted ) # this string, as Ruby sees it
raise unless string.is_a? String
raise unless '"' + string + '"' == double_quoted
rescue
raise "Array element is not a string!"
end
begin
raise unless eval( "'#{string}'" ) == string
"'#{string}'"
rescue
double_quoted
end
}
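And that $SAFE = 4 is just voodoo, an acknowledgement from me that we are doing something dangerous. I do not know to what extent it actually protects against all dangers. (Level 4 was removed in Ruby 2.1, and $SAFE lost its special meaning entirely in Ruby 3.0, so on a current Ruby it offers no protection.)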
In your particular case, you can create a Regexp heuristic, relying on the hope that nobody has written "evil" strings in your code, such as /= *(".+") *$/ or /\w+ *\(* *(".+") *\)* *$/. That heuristic would extract some string suspects, to which you could then apply the method described above. But I would still have a human look at each replacement, and run the tests on the resulting code afterwards.
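A regex-only variant of such a heuristic can be sketched without eval. This is only a sketch under assumptions: every double-quoted literal sits on one line, no quotes are nested inside interpolation, and no double quote appears in a comment or a single-quoted string. The names DQ_LITERAL and single_quote_simple_strings are made up for illustration. It matches each whole double-quoted literal and rewrites it only when the body contains no backslash, no # and no single quote.

```ruby
# Hypothetical helper, not part of the answer above.
# Matches one complete double-quoted literal; a backslash and the
# character after it are consumed together so \" does not end the match.
DQ_LITERAL = /"((?:\\.|[^"\\])*)"/

def single_quote_simple_strings(source)
  source.gsub(DQ_LITERAL) do
    literal = Regexp.last_match(0)
    body = Regexp.last_match(1)
    # Keep the double quotes when the body has an escape, a possible
    # interpolation, or a single quote.
    body.match?(/[\\#']/) ? literal : "'#{body}'"
  end
end

puts single_quote_simple_strings('puts "a simple string"')
puts single_quote_simple_strings('puts "my #{var}"')
```

Matching the whole literal first, rather than only "simple" bodies, keeps the opening and closing quotes paired on lines that contain several strings. A human review of the result is still advisable.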
Related
Here's a small part of my code, pretty self explanatory, it copies all characters to temp from input and skips spaces.
input = gets.to_s.chomp
temp=String.new
for i in 0..input.length-1
if (input[i]==" ")
next
else
temp[i]=input[i]
end
end
puts "#{temp},END"
gets
However, when I tested it with the input 'hello world', I expected helloworld but got:
8:in '[]=':index 6 out of string(IndexError)
so the problem seems to start when it skips the space.
Keep in mind that I don't get any errors if I enter a string that doesn't contain a space.
Whenever string manipulation is required, it may be desirable to convert the string to an array of its parts, manipulate those parts and then join them back into a string, but more often than not it is simpler to just operate on the string itself, mainly using methods from the class String. Here you could Kernel#puts the following.
"%s END" % gets.delete(" \n")
#=> "helloworld END"
String#delete removes both spaces and the return character ("\n") that Kernel#gets tacks onto the end of the string that is entered. A variant of this is "%s END" % gets.chomp.delete(" ").
Another way would be to puts
"%s END" % gets.gsub(/\s/, '')
#=> "helloworld END"
The regular expression /\s/ causes String#gsub to remove all whitespace, which includes both spaces (and tabs) that are entered and the "\n" that gets tacks on to the end of the string.
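The difference between the two calls shows up only when the input contains whitespace other than spaces. A small sketch, using a fixed string as a stand-in for what gets would return (the variable names are illustrative):

```ruby
# Stand-in for a line read by gets: it contains a tab, spaces and the
# trailing newline.
line = "hello\tbig world \n"

only_spaces_and_newline = line.delete(" \n") # the tab survives
all_whitespace = line.gsub(/\s/, '')         # the tab is removed too

puts only_spaces_and_newline.inspect
puts all_whitespace.inspect
```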
I guess your error comes from the mismatch between the length of 'hello world' and the length of temp once you start "rejecting" whitespace. For each space skipped, temp ends up one character shorter than the part of input already processed, so the index i eventually points past the end of temp.
You can instead assign input[i], when it isn't a space, to temp at position temp.size; that way no index is skipped.
It could be temp[temp.size] = ..., or you could just append to temp with +=.
for i in 0...input.size
if input[i] == ' '
next
else
temp[temp.size] = input[i]
end
end
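To make the failure concrete, here is a small sketch of how String#[]= treats indexes at and beyond the end of a string, followed by the same loop written with an append. The variable names other than temp and input are illustrative.

```ruby
probe = +'hello'  # unary plus gives a mutable string of length 5
probe[5] = 'w'    # an index equal to the length appends

begin
  probe[7] = 'o'  # an index beyond the length (now 6) raises
rescue IndexError => e
  puts "IndexError: #{e.message}"
end

# Appending never depends on an index, so skipped characters cannot
# leave a gap.
input = 'hello world'
temp = +''
for i in 0...input.size
  next if input[i] == ' '
  temp << input[i]
end
puts temp
```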
Note you can replace the for loop with each (the Ruby way):
input = 'hello world'
temp = ''
(0...input.size).each do |index|
input[index] == ' ' ? next : temp[temp.size] = input[index]
end
# helloworld
If you want to skip all white spaces from your input and print the output, you can do so with a one-liner:
puts "#{gets.chomp.split.join}, END"
In Ruby, you rarely need to write loops with the for construct, unlike in more traditional languages such as Java.
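As an illustration of loop-free alternatives, the following sketch removes spaces from a fixed string in three ways. The variable names are illustrative; note that split.join drops every kind of whitespace, not only spaces.

```ruby
input = "hello big\tworld"

without_spaces = input.delete(' ')                         # spaces only
via_chars = input.each_char.reject { |ch| ch == ' ' }.join # same result
no_whitespace = input.split.join                           # tabs go as well

puts without_spaces.inspect
puts via_chars.inspect
puts no_whitespace.inspect
```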
I was trying to write a single string across multiple lines because it was too long, and I arrived at this solution, which I think looks best, but I couldn't find anything about it in the Ruby documentation:
my_original = 'what I originally had ' +
'across multiple lines'
# executes to: "what I originally had across multiple lines"
new_style = 'new format to '\
'span multiple lines'
# executes to: "new format to span multiple lines"
However, I saw a lot of ways this could be done, and I was wondering whether they use concatenation or interpolation; all I could find was this. Performance is not particularly important in this case, but knowing what goes on under the covers can't hurt. So I was wondering about the differences between these.
my_original = 'what I originally had ' +
'across multiple lines'
# executes to: "what I originally had across multiple lines"
s_one = 'I assume this is a more '
s_two = 'verbose version of the '
s_three = 'first example.'
my_string = s_one + s_two + s_three
# executes to: "I assume this is a more verbose version of the first example."
my_first_solution = 'that breaks whenever'
my_first_solution << 'ruby 3.0 might be released'
# executes to:
style_i_used = 'which can span multiple '\
'lines without having '\
'extra white space'
# executes to: "which can span multiple lines without having extra white space"
another_string = <<-HEREDOC
when you don't mind
really wonky indentation
or having extra spaces.
HEREDOC
# executes to:"when you don't mind \nreally wonky indentation \n or having extra spaces\n and new lines. \n"
# "Is this just a one line string with no special characters?# executes to:
another_string = <<~HEREDOC
I assume this is the same as
the non squiggly version with
stripping between lines.
HEREDOC
# executes to: "I assume this is the same as \nthe non squiggly version with \nstripping between lines.\n"
#Edit
another_one =
"I did not originally include; however,
this one also adds a bunch of extra
white space and new lines."
# executes to:"I did not originally include; however,\n this one also adds a bunch of extra \n white space and new lines."
This raises a NoMethodError:
style_i_used = 'a format to' /
'span multiple lines'
/ is a method call, and String does not define that method. \, on the other hand, "escapes" the newline, so the two adjacent literals are treated as one string.
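A short sketch of both behaviours side by side (the variable names joined, raised and broken are illustrative):

```ruby
# Backslash at the end of the line: the two adjacent literals become
# a single string.
joined = 'a format to ' \
         'span multiple lines'
puts joined

# Forward slash: Ruby tries to call String#/, which does not exist.
raised = false
begin
  broken = 'a format to ' /
           'span multiple lines'
rescue NoMethodError => e
  raised = true
  puts "NoMethodError: #{e.message}"
end
```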
These won't give an error, but are bad:
my_original = 'what I originally had' +
'across multiple lines'
s_one = 'I assume this is the same as '
s_two = 'the first example with the '
s_three = 'code being more verbose'
my_string = s_one + s_two + s_three
my_first_solution = 'that breaks whenever'
my_first_solution << 'ruby 3.0 might be released'
+ and << are method calls. I don't think the Ruby parser is smart enough to see that these can all be compile-time strings so these strings are actually being constructed during run-time with object allocation and everything. That is entirely unnecessary.
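That + and << are ordinary methods can be seen directly. The sketch below does not measure what the parser does or does not fold at compile time; it only shows that both are method calls with different effects on the receiver. All variable names are illustrative.

```ruby
explicit = 'a'.+('b')     # the operator written as a plain method call
puts explicit

a = +'compile'
same_object = a           # second reference to the same String
a << '-time'              # String#<< mutates the receiver in place
puts same_object          # the change is visible through both references

b = a + '?'               # String#+ builds and returns a new String
puts b.equal?(a)
```

Because << mutates its receiver, it also raises on a frozen string, whereas + does not.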
The other kinds have different usage/results, so you use whichever is appropriate.
str = "a"\
"b" # result is "ab"
str = "a\
b" # also "ab"
str = "a
b" # result is "a\nb"
str = <<END
'"a'"
'"b'"
END
# result is " '\"a'\"\n '\"b'\""
str = <<~END
'"a'"
'"b'"
END
# result is "'\"a'\"\n'\"b'\""
another_one =
"what I originally had
across multiple lines"
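The difference between the two heredoc forms is easiest to see with indented bodies. A minimal sketch (the names dash and squiggly are illustrative):

```ruby
# <<- only allows the closing identifier to be indented;
# the body keeps its leading whitespace.
dash = <<-TEXT
    two levels in
  TEXT

# <<~ also strips the smallest common indentation from the body.
squiggly = <<~TEXT
    two levels in
      three levels in
  TEXT

p dash
p squiggly
```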
AKA How do I find an unescaped character sequence with regex?
Given an environment set up with:
@secret = "OH NO!"
$secret = "OH NO!"
@@secret = "OH NO!"
and given string read in from a file that looks like this:
some_str = '"\"#{:NOT&&:very}\" bad. \u262E\n#@secret \\#$secret \\\\#@@secret"'
I want to evaluate this as a Ruby string, but without interpolation. Thus, the result should be:
puts safe_eval(some_str)
#=> "#{:NOT&&:very}" bad. ☮
#=> #@secret #$secret \#@@secret
By contrast, the eval-only solution produces
puts eval(some_str)
#=> "very" bad. ☮
#=> OH NO! #$secret \OH NO!
At first I tried:
def safe_eval(str)
eval str.gsub(/#(?=[{@$])/,'\\#')
end
but this fails in the malicious middle case above, producing:
#=> "#{:NOT&&:very}" bad. ☮
#=> #@secret \OH NO! \#@@secret
You can do this via regex by ensuring that there are an even number of backslashes before the character you want to escape:
def safe_eval(str)
eval str.gsub( /([^\\](?:\\\\)*)#(?=[{@$])/, '\1\#' )
end
…which says:
Find a character that is not a backslash [^\\]
followed by two backslashes (?:\\\\)
repeated zero or more times *
followed by a literal # character
and ensure that after that you can see either a {, @, or $ character.
and replace that with
the non-backslash-maybe-followed-by-even-number-of-backslashes
and then a backslash and then a #
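The even-number-of-backslashes rule can be tried on small inputs. In this sketch the sigils after # are taken to be {, @ and $ (the forms Ruby interpolates), the replacement is written in block form so that no backslash processing applies to it, and the names UNESCAPED_INTERPOLATION and escape_interpolation are made up for illustration.

```ruby
# A `#` preceded by an even number (possibly zero) of backslashes and
# followed by `{`, `@` or `$`.
UNESCAPED_INTERPOLATION = /([^\\](?:\\\\)*)#(?=[{@$])/

def escape_interpolation(str)
  str.gsub(UNESCAPED_INTERPOLATION) { "#{Regexp.last_match(1)}\\#" }
end

puts escape_interpolation('a #{x}')    # no backslash: gets escaped
puts escape_interpolation('a \#$s')    # already escaped: left alone
puts escape_interpolation('a \\\\#@s') # two backslashes: gets escaped
```

One limitation worth knowing: because [^\\] must match a real character, a # at the very start of the string is not escaped. That does not matter for input that begins with a quote character, as in the question.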
How about not using eval at all? As per this comment in chat, all that's necessary are escaping quotes, newlines, and unicode characters. Here's my solution:
ESCAPE_TABLE = {
/\\n/ => "\n",
/\\"/ => "\"",
}
def expand_escapes(str)
str = str.dup
ESCAPE_TABLE.each {|k, v| str.gsub!(k, v)}
#Deal with Unicode
str.gsub!(/\\u([0-9A-Fa-f]{4})/) {|m| [m[2..5].hex].pack("U") }
str
end
When called on your string the result is (in your variable environment):
"\"\"\#{:NOT&&:very}\" bad. ☮\n\#@secret \\\#$secret \\\\\#@@secret\""
Although I would have preferred not to have to treat unicode specially, it is the only way to do it without eval.
Problem
In a source file, I have a large number of strings. Some have interpolation, some have special symbols, and some have neither.
I am trying to work out if I can replace the single quotes with double quotes whilst converting escaped single quote characters. I would then run this conversion on one or more source code files.
Example - Code
Imagine the following code:
def myfunc(var, var2 = 'abc')
s = 'something'
puts 'a simple string'
puts 'string with an escaped quote \' in it'
x = "nasty #{interpolated}" + s + ' and single quote combo'
puts "my #{var}"
end
Example - Result
I would like to turn it into this:
def myfunc(var, var2 = "abc")
s = "something"
puts "a simple string"
puts "string with an escaped quote ' in it"
x = "nasty #{interpolated}" + s + " and single quote combo"
puts "my #{var}"
end
If anyone has any ideas I'd be very grateful!
You want the negative lookbehind operator, (?<!...):
REGEX
(?<!\\)'
DEMO
http://regex101.com/r/rN5eE6
EXPLANATION
You want to replace any single quote not preceded by a backslash.
Don't forget to do a find and replace of all \' with '
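Put together, the two steps might look like the sketch below. It is deliberately naive: it assumes the line contains only single-quoted literals, so an apostrophe inside a double-quoted string or a comment would be converted as well. The names UNESCAPED_SINGLE_QUOTE and naive_double_quote are made up for illustration.

```ruby
# A single quote that is not directly preceded by a backslash.
UNESCAPED_SINGLE_QUOTE = /(?<!\\)'/

def naive_double_quote(line)
  # Step 1: swap the delimiters.
  swapped = line.gsub(UNESCAPED_SINGLE_QUOTE, '"')
  # Step 2: the escaped quotes no longer need their backslash.
  swapped.gsub("\\'", "'")
end

puts naive_double_quote("puts 'a simple string'")
```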
THERE IS MORE
For this use case, even if it's a simple use case, a ruby parser would perform better.
As Peter Hamilton pointed out, although replacing single-quoted strings with double-quoted equivalents might seem like an easy task at first, even that cannot be done easily, if at all, with regexen, mainly thanks to the possibility of single quotes in the "wrong places", such as within double-quoted strings, %q literal string constructs, heredocs, comments...
x = 'puts "foo"'
y = %/puts 'foo'/ # TODO: Replace "x = %/puts 'foo'/" with "x = %#puts 'bar'#"
But the correct solution, in this case, is much easier than the other way around (double quoted to single quoted), and actually partially attainable:
require 'ripper'
require 'sorcerer' # gem install sorcerer if necessary
my_source = <<-source
x = 'puts "foo"'
y = "puts 'bar'"
source
sexp = Ripper::SexpBuilder.new( my_source ).parse
double_quoted_source = Sorcerer.source sexp
#=> "x = \"puts \"foo\"\"; y = \"puts 'bar'\""
The reason why I say "partially attainable" is because, as you can see by yourself,
puts double_quoted_source
#=> x = "puts "foo""; y = "puts 'bar'"
Sorcerer forgets to escape double quotes inside formerly single-quoted strings. Feel free to submit a patch to Sorcerer's author, Jim Weirich, that would fix the problem.
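A different route that needs only the standard library is to work on the token stream from Ripper.lex instead of regenerating source from the s-expression. The sketch below is not the Sorcerer approach; it is an alternative under assumptions: the source is valid Ruby, and only "simple" single-quoted literals are rewritten (those whose body, once \' is unescaped, contains no double quote, # or backslash). The name double_quote_simple_strings is made up for illustration.

```ruby
require 'ripper'

# Hypothetical helper: rewrites simple single-quoted literals with
# double quotes and copies every other token through unchanged.
def double_quote_simple_strings(source)
  # Ripper.lex yields [[line, col], type, text, state]; keep type and text.
  tokens = Ripper.lex(source).map { |_pos, type, text, _state| [type, text] }
  out = +''
  until tokens.empty?
    type, text = tokens.shift
    unless type == :on_tstring_beg && text == "'"
      out << text
      next
    end
    # Gather the raw body of the single-quoted literal.
    body = +''
    body << tokens.shift[1] while tokens.first && tokens.first[0] == :on_tstring_content
    closing = tokens.empty? ? '' : tokens.shift[1]
    unescaped = body.gsub("\\'", "'")
    if closing == "'" && !unescaped.match?(/["#\\]/)
      out << '"' << unescaped << '"'
    else
      out << "'" << body << closing # leave anything risky untouched
    end
  end
  out
end

puts double_quote_simple_strings(%q(x = 'puts "foo"' + 'bar'))
```

Because the lexer classifies comments, double-quoted strings, %q literals and heredocs as their own token types, single quotes in those places are never touched.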
Does anyone know of a Ruby gem (or built-in, or native syntax, for that matter) that operates on the outer quote marks of strings?
I find myself writing methods like this over and over again:
remove_outer_quotes_if_quoted( myString, chars ) -> aString
add_outer_quotes_unless_quoted( myString, char ) -> aString
The first tests myString to see if its beginning and ending characters match any one character in chars. If so, it returns the string with quotes removed. Otherwise it returns it unchanged. chars defaults to a list of quote mark characters.
The second tests myString to see if it already begins and ends with char. If so, it returns the string unchanged. If not, it returns the string with char tacked on before and after, and any embedded occurrence of char is escaped with backslash. char defaults to the first in a default list of characters.
(My hand-cobbled methods don't have such verbose names, of course.)
I've looked around for similar methods in the public repos but can't find anything like this. Am I the only one that needs to do this a lot? If not, how does everyone else do this?
If you do it a lot, you may want to add a method to String:
class String
def strip_quotes
gsub(/\A['"]+|['"]+\Z/, "")
end
end
Then you can just call string.strip_quotes.
Adding quotes is similar:
class String
def add_quotes
%Q/"#{strip_quotes}"/
end
end
This is called as string.add_quotes and uses strip_quotes before adding double quotes.
This might 'splain how to remove and add them:
str1 = %["We're not in Kansas anymore."]
str2 = %['He said, "Time flies like an arrow, Fruit flies like a banana."']
puts str1
puts str2
puts
puts str1.sub(/\A['"]/, '').sub(/['"]\z/, '')
puts str2.sub(/\A['"]/, '').sub(/['"]\z/, '')
puts
str3 = "foo"
str4 = 'bar'
[str1, str2, str3, str4].each do |str|
puts (str[/\A['"]/] && str[/['"]\z/]) ? str : %Q{"#{str}"}
end
The original two lines:
# >> "We're not in Kansas anymore."
# >> 'He said, "Time flies like an arrow, Fruit flies like a banana."'
Stripping quotes:
# >> We're not in Kansas anymore.
# >> He said, "Time flies like an arrow, Fruit flies like a banana."
Adding quotes when needed:
# >> "We're not in Kansas anymore."
# >> 'He said, "Time flies like an arrow, Fruit flies like a banana."'
# >> "foo"
# >> "bar"
I would use value = value[1...-1] if value[0] == value[-1] && %w[' "].include?(value[0]). In short, this checks whether the first and last characters of the string are the same and removes them if they are single or double quotes. More quote types can be added to the list as needed.
%w["adadasd" 'asdasdasd' 'asdasdasd"].each do |value|
puts 'Original value: ' + value
value = value[1...-1] if value[0] == value[-1] && %w[' "].include?(value[0])
puts 'Processed value: ' + value
end
The example above will print the following:
Original value: "adadasd"
Processed value: adadasd
Original value: 'asdasdasd'
Processed value: asdasdasd
Original value: 'asdasdasd"
Processed value: 'asdasdasd"
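For completeness, here is a sketch of the two methods as the question describes them, written as plain functions rather than String extensions. The method names come from the question; the constant QUOTE_CHARS, the default arguments and the length check are assumptions made for this sketch.

```ruby
# Assumed default list of quote characters.
QUOTE_CHARS = ['"', "'"].freeze

# Removes the outer quotes only when both ends carry the same quote character.
def remove_outer_quotes_if_quoted(str, chars = QUOTE_CHARS)
  return str unless str.length >= 2 && str[0] == str[-1] && chars.include?(str[0])

  str[1...-1]
end

# Wraps the string in char unless it is already wrapped, escaping any
# embedded occurrence of char with a backslash.
def add_outer_quotes_unless_quoted(str, char = QUOTE_CHARS.first)
  return str if str.length >= 2 && str.start_with?(char) && str.end_with?(char)

  escaped = str.gsub(char) { "\\#{char}" }
  "#{char}#{escaped}#{char}"
end

puts remove_outer_quotes_if_quoted(%q('asdasdasd'))
puts add_outer_quotes_unless_quoted('say "hi"')
```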