I'm trying to replace single quotes (') with escaped single quotes (\') in a string in Ruby 1.9.3 and 1.8.7.
The exact problem string is "Are you sure you want to delete '%#'". This string should become "Are you sure you want to delete \'%#\'"
Using .gsub!(/\'/,"\\'") leads to the following string: "Are you sure you want to delete %#'%#".
Any ideas on what's going on?
String#gsub in the form gsub(exp, replacement) has odd quirks affecting the replacement string, which sometimes require a lot of escaping backslashes. Ruby users are frequently directed to use the block form instead:
str.gsub(/'/){ "\\'" }
If you want to do away with escaping altogether, consider using an alternate string literal form:
str.gsub(/'/){ %q(\') }
Once you get used to seeing these types of literals, using them to avoid escape sequences can make your code much more readable.
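As a quick check of the two block forms above, here is a minimal sketch using the string from the question. The variable names are only illustrative.

```ruby
str = "Are you sure you want to delete '%#'"

# Block form: the block's return value is inserted verbatim, so one
# escaped backslash in the double-quoted literal is enough.
escaped_block = str.gsub(/'/) { "\\'" }

# %q(...) behaves like a single-quoted literal with custom delimiters;
# \' inside it stays as the two characters backslash + quote.
escaped_percent_q = str.gsub(/'/) { %q(\') }

puts escaped_block      # Are you sure you want to delete \'%#\'
puts escaped_percent_q  # Are you sure you want to delete \'%#\'
```

Both calls give the same result because the block's return value is not scanned for backreference sequences.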
\' in a substitution replacement string means "The portion of the original string after the match". So str.gsub!(/\'/,"\\'") replaces the ' character with everything after it - which is what you've noticed.
You need to further escape the backslash in the replacement. .gsub(/'/,"\\\\'") works in my irb console:
irb(main):059:0> puts a.gsub(/'/,"\\\\'")
Are you sure you want to delete \'%#\'
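To make the special meaning of \' in a replacement string more concrete, here is a small sketch on a toy string (the sample value "abc" is just an assumption for illustration), followed by the string from the question.

```ruby
sample = "abc"

# Special sequences recognized inside a gsub replacement string:
pre_match   = sample.gsub(/b/, "[\\`]")  # \` inserts the text before the match
post_match  = sample.gsub(/b/, "[\\']")  # \' inserts the text after the match
whole_match = sample.gsub(/b/, "[\\&]")  # \& inserts the match itself
literal     = sample.gsub(/b/, "\\\\'")  # \\ is a literal backslash, then a plain '

puts pre_match    # a[a]c
puts post_match   # a[c]c
puts whole_match  # a[b]c
puts literal      # a\'c

# The same effect on the string from the question:
question = "Are you sure you want to delete '%#'"
surprising = question.gsub(/'/, "\\'")
puts surprising   # Are you sure you want to delete %#'%#
```

Each ' is replaced by whatever follows it in the original string, which is why the output looks scrambled.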
You need to escape the backslash. What about this?
"Are you sure you want to delete '%#'".gsub(/(?=')/, "\\")
# => "Are you sure you want to delete \\'%#\\'"
The above should be what you want. Your expected result is written incorrectly: in inspected output such as irb's, a single literal backslash is always displayed as \\, so there is no way to see a lone backslash there.
Does anyone know how to provide a single backslash as the replacement value in Ruby's gsub method? I thought using double backslashes for the replacement value would result in a single backslash but it results in two backslashes.
Example: "a\\b".gsub("\\", "\\")
Result: a\\b
I also get the same result using a block:
Example: "a\\b".gsub("\\"){"\\"}
Result: a\\b
Obviously I can't use a single backslash for the replacement value since that would just serve to escape the quote that follows it. I've also tried using single (as opposed to double) quotes around the replacement value but still get two backslashes in the result.
EDIT: Thanks to the commenters I now realize my confusion was with how the Rails console reports the result of the operation (i.e. a\\b). Although the strings 'a\b' and 'a\\b' appear to be different, they both have the same length:
'a\b'.length #=> 3
'a\\b'.length #=> 3
You can represent a single backslash by either "\\" or '\\'. Try this in irb, where
"\\".size
correctly outputs 1, showing that you indeed have only one character in this string, not 2 as you think. You can also do a
puts "\\"
Similarly, your example
puts("a\\b".gsub("\\", "\\"))
correctly prints
a\b
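The difference between what the console displays and what the string actually contains can be checked with a short sketch like this one (variable names are illustrative):

```ruby
s = "a\\b"           # three characters: a, backslash, b

puts s.length        # 3
puts s               # a\b      (the actual contents)
puts s.inspect       # "a\\b"   (the escaped form that irb and the Rails console display)

same = (s == 'a\b')  # true: in single quotes \b is not an escape
one  = "\\".size     # 1
puts same
puts one
```

The doubled backslash only exists in the inspected representation, not in the string itself.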
I don't quite understand what the point is of escaping a single backslash when you have a string in single quotes in Ruby. Why does Ruby treat backslashes 'differently'?
Backslashes are an escape character, so if you were to write '\', Ruby would think you were trying to escape the '.
Otherwise, if it treated single-character strings differently and you wanted to write ', you would have to use double quotes, which would quickly get harder to maintain because you would need to remember which quotes to use when.
If your question is actually "What is the point of the language design requiring us to escape a backslash in single (as opposed to double) quotes", then that is to allow single quotes to appear within a string literal written with single quotes. In order to do that, there must be an escape character for single quotes, which is the backslash, and then, the escape character itself needs to be escaped.
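A minimal sketch of the only two escapes that single-quoted literals recognize, with made-up sample values:

```ruby
with_quote     = 'It\'s'   # \' is the only way to put a ' inside '...'
with_backslash = 'a\\b'    # \\ is one backslash
same_backslash = 'a\b'     # \b is not an escape here, so the backslash is kept
not_a_newline  = 'a\nb'    # backslash followed by n, not a newline
trailing       = 'C:\\'    # a backslash at the very end has to be written as \\

puts with_quote             # It's
puts with_backslash.length  # 3
puts same_backslash.length  # 3
puts not_a_newline.length   # 4
puts trailing.length        # 3
```

The last line shows where the escaping becomes unavoidable: without \\, the final backslash would escape the closing quote.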
How do I escape a backslash before a captured group?
Example:
"foo+bar".gsub(/(\+)/, '\\\1')
What I expect (and want):
foo\+bar
what I unfortunately get:
foo\\1bar
How do I escape here correctly?
As others have said, you need to escape everything in that string twice. So in your case the solution is to use '\\\\\1' or '\\\\\\1'. But since you asked why, I'll try to explain that part.
The reason is that the replacement sequence is being parsed twice--once by Ruby and once by the underlying regular expression engine, for which \1 is its own escape sequence. (It's probably easier to understand with double-quoted strings, since single quotes introduce an ambiguity where '\\1' and '\1' are equivalent but '\' and '\\' are not.)
So for example, a simple replacement here with a captured group and a double quoted string would be:
"foo+bar".gsub(/(\+)/, "\\1") #=> "foo+bar"
This passes the string \1 to the regexp engine, which it understands as a reference to a capture group. In double-quoted Ruby string literals, "\1" means something else entirely (ASCII character 1).
What we actually want in this case is for the regexp engine to receive \\\1. It also understands \ as an escape character, so \\1 is not sufficient and will simply evaluate to the literal output \1. So, we need \\\1 in the regexp engine, but to get to that point we need to also make it past Ruby's string literal parser.
To do that, we take our desired regexp input and double every backslash again to get through Ruby's string literal parser. \\\1 therefore requires "\\\\\\1". In the case of single quotes one slash can be omitted as \1 is not a valid escape sequence in single quotes and is treated literally.
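The two parsing stages can be observed separately. The sketch below first prints what the replacement literal looks like after Ruby's parser is done with it, then what gsub makes of it; the block form at the end is an alternative I am adding for comparison, not part of the original answer.

```ruby
replacement = '\\\\\1'      # single-quoted literal with five backslashes

puts replacement            # \\\1   (what the regexp engine receives)
puts replacement.length     # 4
same_literal = (replacement == "\\\\\\1")  # the double-quoted spelling needs six
puts same_literal           # true

result = "foo+bar".gsub(/(\+)/, replacement)
puts result                 # foo\+bar

# Block form: no second parsing stage, so one escaped backslash is enough.
block_result = "foo+bar".gsub(/(\+)/) { "\\#{$1}" }
puts block_result           # foo\+bar
```

In the block form the capture group is read from $1 instead of \1, since the return value is inserted as-is.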
Addendum
One of the reasons this problem is usually hidden is thanks to the use of /.+/ style regexp quotes, which Ruby treats in a special way to avoid the need to double escape everything. (Of course, this doesn't apply to gsub replacement strings.) But you can still see it in action if you use a string literal instead of a regexp literal in Regexp.new:
Regexp.new("\.").match("a") #=> #<MatchData "a">
Regexp.new("\\.").match("a") #=> nil
As you can see, we had to double-escape the . for it to be understood as a literal . by the regexp engine, since "." and "\." both evaluate to . in double-quoted strings, but we need the engine itself to receive \..
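One way to see what the engine actually received is to inspect Regexp#source. This is only a sketch; Regexp.escape is shown as a convenience that was not mentioned above.

```ruby
loose  = Regexp.new("\.")    # the string literal is just ".", so this matches any character
strict = Regexp.new("\\.")   # the string literal is \. so this matches a literal dot

puts loose.source            # .
puts strict.source           # \.

same_as_literal = (strict == /\./)
puts same_as_literal         # true

puts loose.match("a").inspect   # #<MatchData "a">
puts strict.match("a").inspect  # nil

# Regexp.escape builds the escaped string for you.
escaped = Regexp.escape(".")
puts escaped                 # \.
```

Comparing the two source values makes the effect of Ruby's string-literal pass visible before the regexp engine gets involved.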
This happens due to double string escaping. You should use 5 backslashes in this case.
"foo+bar".gsub(/([+])/, '\\\\\1')
Adding \ two more times escapes this properly.
irb(main):011:0> puts "foo+bar".gsub(/(\+)/, '\\\\\1')
foo\+bar
=> nil
I have this string:
"some text\nandsomemore"
I need to remove the "\n" from it. I've tried
"some text\nandsomemore".gsub('\n','')
but it doesn't work. How do I do it? Thanks for reading.
You need to use "\n" not '\n' in your gsub. The different quote marks behave differently.
Double quotes " allow character expansion and expression interpolation, i.e. they let you use escaped control characters like \n to represent their true value (in this case, newline) and allow the use of #{expression}, so you can weave variables and pretty much any Ruby expression you like into the text.
Single quotes ', on the other hand, treat the string almost literally: apart from \\ and \', there is no expansion, replacement, or interpolation.
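Here is a short sketch of why the single-quoted pattern in the question never matches, using the string from the question:

```ruby
text = "some text\nandsomemore"

puts '\n'.length   # 2  (a backslash and the letter n)
puts "\n".length   # 1  (a real newline character)

unchanged = text.gsub('\n', '')   # looks for backslash + n, finds nothing
cleaned   = text.gsub("\n", '')   # looks for a newline and removes it

puts unchanged == text   # true
puts cleaned             # some textandsomemore
```

When gsub is given a string pattern rather than a regexp, it matches that string literally, so the two-character '\n' is never found.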
In this particular case, it's better to use either the .delete or .tr String method to delete the newlines.
If you want or don't mind having all the leading and trailing whitespace from your string removed you can use the strip method.
" hello ".strip #=> "hello"
"\tgoodbye\r\n".strip #=> "goodbye"
Edit: The original title for this question was different. My answer is for the original question.
When you want to remove a string, rather than replace it you can use String#delete (or its mutator equivalent String#delete!), e.g.:
x = "foo\nfoo"
x.delete!("\n")
x now equals "foofoo"
In this specific case String#delete is more readable than gsub since you are not actually replacing the string with anything.
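For comparison, here is a rough sketch of how delete behaves next to the other approaches mentioned in this thread. The sample text has an extra trailing newline added purely to show the difference with strip.

```ruby
text = "some text\nandsomemore\n"

deleted  = text.delete("\n")     # removes every newline
via_tr   = text.tr("\n", "")     # tr with an empty replacement also deletes
via_gsub = text.gsub("\n", "")   # same result, but reads like a replacement
stripped = text.strip            # only trims whitespace at the two ends

puts deleted            # some textandsomemore
puts deleted == via_tr  # true
puts stripped.inspect   # "some text\nandsomemore"

# delete! mutates the receiver and returns nil when nothing was removed.
copy = text.dup
copy.delete!("\n")
no_change = "abc".dup.delete!("\n")
puts copy               # some textandsomemore
puts no_change.inspect  # nil
```

The nil return from delete! is worth keeping in mind if you chain calls on its result.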
You don't need a regex for this. Use tr:
"some text\nandsomemore".tr("\n","")
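Use the chomp or strip methods from Ruby. Note that both only work at the ends of the string, so they will not remove a newline in the middle:
"abcd\n".chomp #=> "abcd"
"abcd\n".strip #=> "abcd"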
Which style of Ruby string quoting do you favour? Up until now I've always used 'single quotes' unless the string contains certain escape sequences or interpolation, in which case I obviously have to use "double quotes".
However, is there really any reason not to just use double quoted strings everywhere?
Don't use double quotes if you have to escape them. And don't fall into the "single vs double quotes" trap. Ruby has excellent support for arbitrary delimiters for string literals:
Mirror of Site - https://web.archive.org/web/20160310224440/http://rors.org/2008/10/26/dont-escape-in-strings
Original Site - http://rors.org/2008/10/26/dont-escape-in-strings
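In case the links go stale, here is a minimal sketch of the arbitrary-delimiter literals being referred to. The sample sentences and the name variable are made up for illustration.

```ruby
name = "world"

# %q behaves like single quotes: no interpolation, and no need to escape ' or ".
plain = %q(He said "it's #{name}")

# %Q behaves like double quotes: interpolation works, quotes still need no escaping.
interpolated = %Q(He said "it's #{name}")

# Other delimiters work too, and balanced pairs may nest.
bracketed = %q[a (nested) "pair"]

puts plain         # He said "it's #{name}"
puts interpolated  # He said "it's world"
puts bracketed     # a (nested) "pair"
```

Picking a delimiter that does not occur in the text is what removes the need for backslashes.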
I always use single quotes unless I need interpolation.
Why? It looks nicer. When you have a ton of stuff on the screen, lots of single quotes give you less "visual clutter" than lots of double quotes.
I'd like to note that this isn't something I deliberately decided to do, just something that I've 'evolved' over time in trying to achieve nicer looking code.
Occasionally I'll use %q or %Q if I need in-line quotes. I've only ever used heredocs maybe once or twice.
Like many programmers, I try to be as specific as is practical. This means that I try to make the compiler do as little work as possible by having my code as simple as possible. So for strings, I use the simplest method that suffices for my needs for that string.
<<END
For strings containing multiple newlines,
particularly when the string is going to
be output to the screen (and thus formatting
matters), I use heredocs.
END
%q[Because I strongly dislike backslash quoting when unnecessary, I use %Q or %q
for strings containing ' or " characters (usually with square braces, because they
happen to be the easiest to type and least likely to appear in the text inside).]
"For strings needing interpretation, I use %s."%['double quotes']
'For the most common case, needing none of the above, I use single quotes.'
My first simple test of the quality of syntax highlighting provided by a program is to see how well it handles all methods of quoting.
I use single quotes unless I need interpolation. The argument about it being troublesome to change later when you need interpolation swings in the other direction, too: You have to change from double to single when you found that there was a # or a \ in your string that caused an escape you didn't intend.
The advantage of defaulting to single quotes is that, in a codebase which adopts this convention, the quote type acts as a visual cue as to whether to expect interpolated expressions or not. This is even more pronounced when your editor or IDE highlights the two string types differently.
I use %{.....} syntax for multi-line strings.
I usually use double quotes unless I specifically need to disable escaping/interpolation.
I see arguments for both:
For using mostly double quotes:
The GitHub Ruby style guide advocates always using double quotes.
It's easier to search for a string foobar by searching for "foobar" if you were consistent with quoting. However, I'm not. So I search for ['"]foobar['"] turning on regexps.
For using some combination of single and double quotes:
Know if you need to look for string interpolation.
Might be slightly faster (although so slight it wasn't enough to affect the GitHub style guide).
I used to use single quotes until I knew I needed interpolation. Then I found that I was wasting a lot of time when I'd go back and have to change some single-quotes to double-quotes. Performance testing showed no measurable speed impact of using double-quotes, so I advocate always using double-quotes.
The only exception is when using sub/gsub with back-references in the replacement string. Then you should use single quotes, since it's simpler.
mystring.gsub( /(fo+)bar/, '\1baz' )
mystring.gsub( /(fo+)bar/, "\\1baz" )
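The two calls above are equivalent, which a quick sketch can confirm. The sample value of mystring is an assumption, and the third call shows what happens if the backslash is not doubled inside double quotes.

```ruby
mystring = "foobar fooobar"

single = mystring.gsub(/(fo+)bar/, '\1baz')   # backreference, written simply
double = mystring.gsub(/(fo+)bar/, "\\1baz")  # same thing, backslash doubled

# With only one backslash in double quotes, "\1" is the character with code 1,
# so the capture group is lost.
broken = mystring.gsub(/(fo+)bar/, "\1baz")

puts single            # foobaz fooobaz
puts single == double  # true
puts broken.inspect
```

This is the main case where single quotes save you a level of escaping.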
I use single quotes unless I need interpolation, or the string contains single quotes.
However, I just learned the arbitrary delimiter trick from Dejan's answer, and I think it's great. =)
Single quotes preserve the characters inside them, whereas double quotes evaluate and parse them. See the following example:
"Welcome #{@user.name} to App!"
Results:
Welcome Bhojendra to App!
But,
'Welcome #{@user.name} to App!'
Results:
Welcome #{@user.name} to App!
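A self-contained version of the same comparison, using a local variable as a hypothetical stand-in for the user object's name:

```ruby
user_name = "Bhojendra"   # hypothetical stand-in for the user's name

double_quoted = "Welcome #{user_name} to App!"   # interpolated
single_quoted = 'Welcome #{user_name} to App!'   # kept literally

puts double_quoted   # Welcome Bhojendra to App!
puts single_quoted   # Welcome #{user_name} to App!
```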