Ruby - Simple way to escape regex characters - ruby

I have an expression like this:
s.gsub! /[\?\/\\]/, ''
In fact, the list of forbidden characters is longer than that. Some require escaping, some don't. Is there a way I could just put the characters in some literal structure (?/\) and say "remove all these from the string". I'm aware of Regexp.quote but not sure how to use it in this context.

You can interpolate a statement inside a Ruby regular expression, exactly like you do for strings.
/#{...}/
In your case
s.gsub! /[#{Regexp.escape('?/\')}]/, ''

Related

Escape whitespace characters in Ruby

I have a string:
"hello\n\nsomeletters\t\nmoreletters\n"
What I want:
"hello\\n\\nsomeletters\\t\\nmoreletters\\n"
How to do it?
I know a gsub way. But it sounds very simple and seems to be a common problem therefore I am sure that Ruby Gods have already sent us a solution.
There are different possibilities. The closest to what you want would be Regexp#escape:
Regexp.escape "hello\n\nsomeletters\t\nmoreletters\n"
#⇒ "hello\\n\\nsomeletters\\t\\nmoreletters\\n"
But be aware it will escape some other symbols having a special meaning in regular expressions.
Also, we have Shellwords#escape, which is probably not what you want here.
For escaping backslashes only there is no dedicated method because this operation basically has a little sense and it is not worth it to call it instead of:
"hello\n\nsomeletters\t\nmoreletters\n".gsub(
/\n|\t/, {"\n" => "\\n", "\t" => "\\t"}
)
Please note, there are no slash characters in the initial string, hence you are to match all the expected sequences.

delete matched characters using regex in ruby

I need to write a regex for the following text:
"How can you restate your point (something like: \"<font>First</font>\") as a clear topic?"
that keeps whatever is between the
\" \"
characters (in this case <font>First</font>
I came up with this:
/"How can you restate your point \(something like: |\) as a clear topic\?"/
but how do I get ruby to remove the unwanted surrounding text and only return <font>First</font>?
lookbehind, lookahead and making what is greedy, lazy.
str[/(?<=\").+?(?=\")/] #=> "<font>First</font>"
If you have strings just like that, you can .split and get the first:
> str.split(/"/)[1]
=> "<font>First</font>"
You certainly can use a regular expression, but you don't need to:
str = "How can you restate (like: \"<font>First</font>\") as a clear topic?"
str[str.index('"')+1...str.rindex('"')]
#=> "<font>First</font>"
or, for those like me who never use three dots:
str[str.index('"')+1..str.rindex('"')-1]

Backslash + captured group within Ruby regular expression

How do I excape a backslash before a captured group?
Example:
"foo+bar".gsub(/(\+)/, '\\\1')
What I expect (and want):
foo\+bar
what I unfortunately get:
foo\\1bar
How do I escape here correctly?
As others have said, you need to escape everything in that string twice. So in your case the solution is to use '\\\\\1' or '\\\\\\1'. But since you asked why, I'll try to explain that part.
The reason is that replacement sequence is being parsed twice--once by Ruby and once by the underlying regular expression engine, for whom \1 is its own escape sequence. (It's probably easier to understand with double-quoted strings, since single quotes introduce an ambiguity where '\\1' and '\1' are equivalent but '\' and '\\' are not.)
So for example, a simple replacement here with a captured group and a double quoted string would be:
"foo+bar".gsub(/(\+)/, "\\1") #=> "foo+bar"
This passes the string \1 to the regexp engine, which it understands as a reference to a capture group. In Ruby string literals, "\1" means something else entirely (ASCII character 1).
What we actually want in this case is for the regexp engine to receive \\\1. It also understands \ as an escape character, so \\1 is not sufficient and will simply evaluate to the literal output \1. So, we need \\\1 in the regexp engine, but to get to that point we need to also make it past Ruby's string literal parser.
To do that, we take our desired regexp input and double every backslash again to get through Ruby's string literal parser. \\\1 therefore requires "\\\\\\1". In the case of single quotes one slash can be omitted as \1 is not a valid escape sequence in single quotes and is treated literally.
Addendum
One of the reasons this problem is usually hidden is thanks to the use of /.+/ style regexp quotes, which Ruby treats in a special way to avoid the need to double escape everything. (Of course, this doesn't apply to gsub replacement strings.) But you can still see it in action if you use a string literal instead of a regexp literal in Regexp.new:
Regexp.new("\.").match("a") #=> #<MatchData "a">
Regexp.new("\\.").match("a") #=> nil
As you can see, we had to double-escape the . for it to be understood as a literal . by the regexp engine, since "." and "\." both evaluate to . in double-quoted strings, but we need the engine itself to receive \..
This happens due to a double string escaping. You should use 5 slashes in this case.
"foo+bar".gsub(/([+])/, '\\\\\1')
Adding \ two more times escapes this properly.
irb(main):011:0> puts "foo+bar".gsub(/(\+)/, '\\\\\1')
foo\+bar
=> nil

Matching braces in ruby with a character in front

I have read quite a few posts here for matching nested braces in Ruby using Regexp. However I cannot adapt it to my situation and I am stuck. The Ruby 1.9 book uses the following to match a set of nested braces
/\A(?<brace_expression>{([^{}]|\g<brace_expression>)*})\Z/x
I am trying to alter this in three ways. 1. I want to use parentheses instead of braces, 2. I want a character in front (such as a hash symbol), and 3. I want to match anywhere in the string, not just beginning and end. Here is what I have so far.
/(#(?<brace_expression>\(([^\(\)]|\g<brace_expression>)*\)))/x
Any help in getting the right expression would be appreciated.
Using the regex modifier x enables comments in the regex. So the # in your regex is interpreted as a comment character and the rest of the regex is ignored. You'll need to either escape the # or remove the x modifier.
Btw: There's no need to escape the parentheses inside [].

Another way instead of escaping regex patterns?

Usually when my regex patterns look like this:
http://www.microsoft.com/
Then i have to escape it like this:
string.match(/http:\/\/www\.microsoft\.com\//)
Is there another way instead of escaping it like that?
I want to be able to just use it like this http://www.microsoft.com, cause I don't want to escape all the special characters in all my patterns.
Regexp.new(Regexp.quote('http://www.microsoft.com/'))
Regexp.quote simply escapes any characters that have special regexp meaning; it takes and returns a string. Note that . is also special. After quoting, you can append to the regexp as needed before passing to the constructor. A simple example:
Regexp.new(Regexp.quote('http://www.microsoft.com/') + '(.*)')
This adds a capturing group for the rest of the path.
You can also use arbitrary delimiters in Ruby for regular expressions by using %r and defining a character before the regular expression, for example:
%r!http://www.microsoft.com/!
Regexp.quote or Regexp.escape can be used to automatically escape things for you:
https://ruby-doc.org/core/Regexp.html#method-c-escape
The result can be passed to Regexp.new to create a Regexp object, and then you can call the object's .match method and pass it the string to match against (the opposite order from string.match(/regex/)).
You can simply use single quotes for escaping.
string.match('http://www.microsoft.com/')
you can also use %q{} if you need single quotes in the text itself. If you need to have variables extrapolated inside the string, then use %Q{}. That's equivalent to double quotes ".
If the string contains regex expressions (eg: .*?()[]^$) that you want extrapolated, use // or %r{}
For convenience I just define
def regexcape(s)
Regexp.new(Regexp.escape(s))
end

Resources