Backslash + captured group within Ruby regular expression - ruby

How do I escape a backslash before a captured group?
Example:
"foo+bar".gsub(/(\+)/, '\\\1')
What I expect (and want):
foo\+bar
what I unfortunately get:
foo\\1bar
How do I escape here correctly?

As others have said, you need to escape everything in that string twice. So in your case the solution is to use '\\\\\1' or '\\\\\\1'. But since you asked why, I'll try to explain that part.
The reason is that the replacement string is parsed twice: once by Ruby's string literal parser and once by the underlying regular expression engine, for which \1 is its own escape sequence. (It's probably easier to understand with double-quoted strings, since single quotes introduce an ambiguity where '\\1' and '\1' are equivalent but '\' and '\\' are not.)
So for example, a simple replacement here with a captured group and a double quoted string would be:
"foo+bar".gsub(/(\+)/, "\\1") #=> "foo+bar"
This passes the string \1 to the regexp engine, which it understands as a reference to a capture group. In Ruby string literals, "\1" means something else entirely (ASCII character 1).
What we actually want in this case is for the regexp engine to receive \\\1. It also understands \ as an escape character, so \\1 is not sufficient and will simply evaluate to the literal output \1. So, we need \\\1 in the regexp engine, but to get to that point we need to also make it past Ruby's string literal parser.
To do that, we take our desired regexp input and double every backslash again to get through Ruby's string literal parser. \\\1 therefore requires "\\\\\\1". In the case of single quotes one slash can be omitted as \1 is not a valid escape sequence in single quotes and is treated literally.
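To make the two parsing stages concrete, here is a minimal sketch that prints the characters the regexp engine receives for each literal, followed by what gsub returns. The helper name escape_plus is hypothetical and exists only for this illustration.

```ruby
# Hypothetical helper (not from the original answer): run the question's
# substitution with a given replacement string.
def escape_plus(replacement)
  "foo+bar".gsub(/(\+)/, replacement)
end

# Each entry is a Ruby literal; puts shows the characters the engine receives.
["\\1", "\\\\1", "\\\\\\1", '\\\\\1'].each do |replacement|
  puts "engine receives #{replacement}  ->  #{escape_plus(replacement)}"
end
# engine receives \1  ->  foo+bar
# engine receives \\1  ->  foo\1bar
# engine receives \\\1  ->  foo\+bar
# engine receives \\\1  ->  foo\+bar
```

The last two lines are identical because the six-backslash double-quoted literal and the five-backslash single-quoted literal produce the same four characters.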
Addendum
One of the reasons this problem is usually hidden is thanks to the use of /.+/ style regexp quotes, which Ruby treats in a special way to avoid the need to double escape everything. (Of course, this doesn't apply to gsub replacement strings.) But you can still see it in action if you use a string literal instead of a regexp literal in Regexp.new:
Regexp.new("\.").match("a") #=> #<MatchData "a">
Regexp.new("\\.").match("a") #=> nil
As you can see, we had to double-escape the . for it to be understood as a literal . by the regexp engine, since "." and "\." both evaluate to . in double-quoted strings, but we need the engine itself to receive \..
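As a rough sketch of the alternatives, the following builds the same literal-dot pattern four ways and checks that each one hands \. to the engine. The method name literal_dot_patterns is made up for this example; Regexp.escape is the standard library call that adds the escaping for you.

```ruby
# Four ways to build a pattern that matches only a literal dot.
def literal_dot_patterns
  [
    Regexp.new("\\."),               # double quotes: backslash doubled
    Regexp.new('\.'),                # single quotes: backslash kept as-is
    Regexp.new(Regexp.escape(".")),  # let Ruby add the escape
    /\./                             # regexp literal: no doubling needed
  ]
end

literal_dot_patterns.each do |re|
  p [re.source, re.match?("a"), re.match?(".")]
end
# Each line prints ["\\.", false, true]
```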

This happens because the replacement string is escaped twice. You should use five backslashes in this case.
"foo+bar".gsub(/([+])/, '\\\\\1')

Adding two more backslashes escapes this properly.
irb(main):011:0> puts "foo+bar".gsub(/(\+)/, '\\\\\1')
foo\+bar
=> nil

Related

Ruby method gsub with string '+'

I've found an interesting thing in Ruby. Does anybody know why it behaves this way?
I tried '+'.gsub!('+', '\+') and expected "\\+" but got "" (an empty string).
gsub is implemented, after some indirection, as rb_sub_str_bang in C, which calls rb_reg_regsub.
Now, gsub is supposed to allow the replacement string to contain backreferences. That is, if you pass a regular expression as the first argument and that regex defines a capture group, then your replacement string can include \1 to indicate that that capture group should be placed at that position.
That behavior evidently still happens if you pass an ordinary, non-regex string as the pattern. Your verbatim string obviously won't have any capture groups, so it's a bit silly in this case. But trying to replace, for instance, + with \1 in the string + will give the empty string, since \1 says to go get the first capture group, which doesn't exist and hence is vacuously "".
Now, you might be thinking: + isn't a number. And you'd be right. You're replacing + with \+. Besides numbered groups, several other backreferences are allowed in your replacement string. I couldn't find any official documentation where these are written down, but the source code is clear enough. To summarize the code:
Digits \1 through \9 refer to numbered capture groups.
\k<...> refers to a named capture group, with the name in angled brackets.
\0 or \& refer to the whole substring that was matched, so (\0) as a replacement string would enclose the match in parentheses.
\` (a backslash followed by a backtick) refers to the entire string up to the match.
\' refers to the entire string following the match.
\+ refers to the final capture group, i.e. the one with the highest number.
\\ is a literal backslash.
(Most of these are based on Perl variables of a similar name)
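The list above can be exercised with a short sketch. The helper demo and the sample string "abc-def" are invented for illustration; the replacement strings are mostly single-quoted so that Ruby passes the backslashes through to the substitution code unchanged.

```ruby
# Hypothetical helper: substitute the match of /(b)(c)/ in "abc-def".
def demo(replacement)
  "abc-def".sub(/(b)(c)/, replacement)
end

puts demo('[\1]')    # a[b]-def     first capture group
puts demo('[\2\1]')  # a[cb]-def    both groups, swapped
puts demo('[\0]')    # a[bc]-def    whole match (same as \&)
puts demo('[\`]')    # a[a]-def     text before the match
puts demo("[\\']")   # a[-def]-def  text after the match
puts demo('[\+]')    # a[c]-def     last capture group
puts demo('[\\\\]')  # a[\]-def     literal backslash

# Named group, referenced with \k<name>.
puts "abc-def".sub(/(?<x>b)c/, '[\k<x>]')  # a[b]-def
```

The \' case is written with double quotes because inside single quotes \' is Ruby's own escape for a quote character.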
So, in your examples,
\+ as the replacement string says "take the last capture group". There is no capture group, so you get the empty string.
\- is not a valid backreference, so it's replaced verbatim.
\ok is, likewise, not a backreference, so it's replaced verbatim.
In \\+, Ruby eats the first backslash sequence, so the actual string at runtime is \+, equivalent to the first example.
For \\\+, Ruby processes the first backslash sequence, so we get \\+ by the time the replacement function sees it. \\ is a literal backslash, and + is no longer part of an escape sequence, so we get \+.
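The five cases just described can be checked with a small sketch. The helper replace_plus is a hypothetical name; it simply replaces + in the one-character string "+" with each candidate replacement.

```ruby
# Hypothetical helper: replace "+" in the string "+" with the given replacement.
def replace_plus(replacement)
  "+".gsub("+", replacement)
end

p replace_plus('\+')    # ""      last capture group, but none exists
p replace_plus('\-')    # "\\-"   not a backreference, kept verbatim
p replace_plus('\ok')   # "\\ok"  likewise kept verbatim
p replace_plus('\\+')   # ""      the same two characters as '\+'
p replace_plus('\\\+')  # "\\+"   literal backslash, then +
```

The comments show inspect output, so "\\+" stands for the two characters \+.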

Single backslash for Ruby gsub replacement value?

Does anyone know how to provide a single backslash as the replacement value in Ruby's gsub method? I thought using double backslashes for the replacement value would result in a single backslash but it results in two backslashes.
Example: "a&bsol;b".gsub("&bsol;", "\\")
Result: a\\b
I also get the same result using a block:
Example: "a&bsol;b".gsub("&bsol;"){"\\"}
Result: a\\b
Obviously I can't use a single backslash for the replacement value since that would just serve to escape the quote that follows it. I've also tried using single (as opposed to double) quotes around the replacement value but still get two backslashes in the result.
EDIT: Thanks to the commenters I now realize my confusion was with how the Rails console reports the result of the operation (i.e. a\\b). Although the strings 'a\b' and 'a\\b' appear to be different, they both have the same length:
'a\b'.length #=> 3
'a\\b'.length #=> 3
You can represent a single backslash by either "\\" or '\\'. Try this in irb, where
"\\".size
correctly outputs 1, showing that you indeed have only one character in this string, not 2 as you think. You can also do a
puts "\\"
Similarly, your example
puts("a&bsol;b".gsub("&bsol;", "\\"))
correctly prints
a\b
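The difference between what the console displays and what the string contains can be seen in a short sketch. The method name single_backslash_result is made up for this example.

```ruby
# Hypothetical wrapper around the substitution from the question.
def single_backslash_result
  "a&bsol;b".gsub("&bsol;", "\\")
end

result = single_backslash_result
p result         # prints "a\\b" (inspect form, as the console shows it)
puts result      # prints a\b    (the actual characters)
p result.length  # prints 3
p result.chars   # prints ["a", "\\", "b"]
```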

How to replace single quotes with escaped single quotes in ruby

I'm trying to replace single quotes (') with escaped single quotes (\') in a string in ruby 1.9.3 and 1.8.7.
The exact problem string is "Are you sure you want to delete '%#'". This string should become "Are you sure you want to delete \'%#\'"
Using .gsub!(/\'/,"\\'") leads to the following string "Are you sure you want to delete %#'%#".
Any ideas on what's going on?
String#gsub in the form gsub(exp,replacement) has odd quirks affecting the replacement string which sometimes require lots of escaping slashes. Ruby users are frequently directed to use the block form instead:
str.gsub(/'/){ "\\'" }
If you want to do away with escaping altogether, consider using an alternate string literal form:
str.gsub(/'/){ %q(\') }
Once you get used to seeing these types of literals, using them to avoid escape sequences can make your code much more readable.
\' in a substitution replacement string means "The portion of the original string after the match". So str.gsub!(/\'/,"\\'") replaces the ' character with everything after it - which is what you've noticed.
You need to further escape the backslash in the replacement. .gsub(/'/,"\\\\'") works in my irb console:
irb(main):059:0> puts a.gsub(/'/,"\\\\'")
Are you sure you want to delete \'%#\'
You need to escape the backslash. What about this?
"Are you sure you want to delete '%#'".gsub(/(?=')/, "\\")
# => "Are you sure you want to delete \\'%#\\'"
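The above should be what you want. The expected result as written in the question is misleading: in inspect output (and in a double-quoted literal) a literal backslash always appears as \\, so a single backslash is only visible when the string is printed, for example with puts.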
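For comparison, here is a minimal sketch that runs the pitfall and the three fixes suggested in the answers side by side. The constant SAMPLE_MESSAGE and the method names are hypothetical, chosen only for this illustration.

```ruby
SAMPLE_MESSAGE = "Are you sure you want to delete '%#'"

# Pitfall: the substitution code receives \' and inserts the text after the match.
def post_match_pitfall(str)
  str.gsub(/'/, "\\'")
end

# Fix 1: four backslashes, so the substitution code receives \\' (backslash, quote).
def escape_with_string(str)
  str.gsub(/'/, "\\\\'")
end

# Fix 2: block form, whose return value is inserted without backreference parsing.
def escape_with_block(str)
  str.gsub(/'/) { "\\'" }
end

# Fix 3: zero-width lookahead, inserting a backslash before each quote.
def escape_with_lookahead(str)
  str.gsub(/(?=')/, "\\")
end

puts post_match_pitfall(SAMPLE_MESSAGE)     # Are you sure you want to delete %#'%#
puts escape_with_string(SAMPLE_MESSAGE)     # Are you sure you want to delete \'%#\'
puts escape_with_block(SAMPLE_MESSAGE)      # Are you sure you want to delete \'%#\'
puts escape_with_lookahead(SAMPLE_MESSAGE)  # Are you sure you want to delete \'%#\'
```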

ruby rspec and strings comparison

I'm not a Ruby expert and maybe this will seem a silly question, but I'm too curious about an oddity (I think) I've found in the RSpec matcher called match.
match takes either a string or a regex as input. Example:
"test".should match "test" #=> will pass
"test".should match /test/ #=> will pass
The strangeness begins when you insert special regex characters in the input string:
"*test*".should match "*test*" #=> will fail throwing a regex exception
This means (I thought) that input strings are interpreted as regexes, so I should escape special regex characters to make it work:
"*test*".should match "\*test\*" #=> will fail with same exception
"*test*".should match /\*test\*/ #=> will pass
From this basic test, I understand that match treats input strings as regular expressions but does not allow you to escape special regex characters.
Am I right? Isn't this peculiar behavior? I mean, it's either a string or a regex!
EDIT AFTER ANSWER:
Following DigitalRoss's (correct) answer, the following tests passed:
"*test*".should match "\\*test\\*" #=> pass
"*test*".should match '\*test\*' #=> pass
"*test*".should match /\*test\*/ #=> pass
What you are seeing is the different interpretation of backslash-escaped characters in String vs Regexp. In a soft (") quoted string, \* becomes a *, but /\*/ is really a backslash followed by a star.
If you use hard quotes (') for the String objects or double the backslash characters (only for the Strings, though) then your tests should produce the same results.
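The same effect can be reproduced without RSpec. The sketch below assumes, as the question suggests, that the string ends up being compiled into a Regexp; the helper compiles? is a hypothetical name. Regexp.escape is a standard way to avoid writing the backslashes by hand.

```ruby
# Hypothetical helper: does the string compile as a regular expression?
def compiles?(source)
  Regexp.new(source)
  true
rescue RegexpError
  false
end

p "\*test\*" == "*test*"    # true: double quotes drop the backslashes
p compiles?("\*test\*")     # false: the engine sees *test*
p compiles?('\*test\*')     # true: single quotes keep the backslashes
p compiles?("\\*test\\*")   # true: doubled backslashes survive
p Regexp.escape("*test*")   # prints "\\*test\\*"
p Regexp.new(Regexp.escape("*test*")).match?("*test*")  # true
```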

How can I remove the string "\n" from within a Ruby string?

I have this string:
"some text\nandsomemore"
I need to remove the "\n" from it. I've tried
"some text\nandsomemore".gsub('\n','')
but it doesn't work. How do I do it? Thanks for reading.
You need to use "\n" not '\n' in your gsub. The different quote marks behave differently.
Double quotes " allow character expansion and expression interpolation, i.e. they let you use escaped control characters like \n to represent their true value (in this case, a newline) and allow the use of #{expression}, so you can weave variables and pretty much any Ruby expression you like into the text.
Single quotes ', on the other hand, treat the string almost literally: there is no expansion or interpolation, and the only escapes recognized are \\ and \'.
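A quick sketch of the difference, using a hypothetical method quote_demo that builds the same text with both kinds of quotes:

```ruby
# Hypothetical helper: the same source text in double and single quotes.
def quote_demo(name)
  { double: "hi\n#{name}", single: 'hi\n#{name}' }
end

demo = quote_demo("world")
p demo[:double].length  # 8: "hi", one newline character, "world"
p demo[:single].length  # 11: every character is taken literally
puts demo[:single]      # hi\n#{name}
```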
In this particular case, it's better to use either the .delete or .tr String method to delete the newlines.
See here for more info
If you want or don't mind having all the leading and trailing whitespace from your string removed you can use the strip method.
" hello ".strip #=> "hello"
"\tgoodbye\r\n".strip #=> "goodbye"
as mentioned here.
edit The original title for this question was different. My answer is for the original question.
When you want to remove a string, rather than replace it you can use String#delete (or its mutator equivalent String#delete!), e.g.:
x = "foo\nfoo"
x.delete!("\n")
x now equals "foofoo"
In this specific case String#delete is more readable than gsub since you are not actually replacing the string with anything.
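To compare the approaches mentioned so far on the string from the question, here is a minimal sketch. The method name newline_removal and the hash keys are invented for this illustration.

```ruby
# Hypothetical helper: apply each suggested approach to the same text.
def newline_removal(text)
  {
    gsub_double: text.gsub("\n", ""),  # "\n" is a real newline
    gsub_single: text.gsub('\n', ''),  # '\n' is a backslash followed by n
    delete: text.delete("\n"),
    tr: text.tr("\n", ""),
    strip: text.strip                  # only touches leading/trailing whitespace
  }
end

newline_removal("some text\nandsomemore").each do |name, value|
  puts "#{name}: #{value.inspect}"
end
# gsub_double: "some textandsomemore"
# gsub_single: "some text\nandsomemore"
# delete: "some textandsomemore"
# tr: "some textandsomemore"
# strip: "some text\nandsomemore"
```

Only the variants that receive a real newline character and operate on the whole string remove the \n in the middle.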
You don't need a regex for this. Use tr:
"some text\nandsomemore".tr("\n","")
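Use the chomp or strip methods from Ruby (note that chomp only removes a trailing line ending and strip only removes leading and trailing whitespace):
"abcd\n".chomp #=> "abcd"
"abcd\n".strip #=> "abcd"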

Resources