This works:
str = "California (LA) rocks"
match_string = "rocks"
str.match(match_string) # => #<MatchData "rocks">
Why does this not work?
match_string = "(LA) rocks"
str.match(match_string) # => nil
You must escape the parentheses for match to work; otherwise they will be interpreted as part of the regex pattern (i.e. as a capturing group), not as part of the string to be matched.
To escape the parentheses, use a backslash (\):
match_string = '\(LA\) rocks'
str.match(match_string)
#=> #<MatchData "(LA) rocks">
Notice the use of single quotes (') instead of double quotes ("); if you want to use double quotes, you need to double each backslash:
match_string = "\\(LA\\) rocks"
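When the text to match comes from a variable, escaping by hand is error-prone. A minimal sketch using Ruby's built-in Regexp.escape, which escapes every regex metacharacter (and spaces) for you:

```ruby
str = "California (LA) rocks"

# Regexp.escape backslash-escapes regex metacharacters (and spaces),
# so "(" and ")" are matched literally instead of forming a group.
pattern = Regexp.escape("(LA) rocks")
puts pattern          # prints \(LA\)\ rocks

m = str.match(pattern)
puts m[0]             # prints (LA) rocks
```

If you only need a yes/no substring check, str.include?("(LA) rocks") avoids regexes entirely.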
Because the argument of match is converted into a regex. In particular, the parentheses in "(LA) rocks" are interpreted as metacharacters, not as literal parentheses. In fact, the following matches:
"California LA rocks".match("(LA) rocks")
# => #<MatchData "LA rocks" 1:"LA">
Related
I'm splitting a search result string so I can use Rails' highlight helper to highlight the terms. In some cases, the same search term contains both exact-match phrases and single words, and I'm trying to write a regex that handles both in a single pass.
search_term = 'pizza cheese "ham and pineapple" pepperoni'
search_term.split(/\W+/)
=> ["pizza", "cheese", "ham", "and", "pineapple", "pepperoni"]
search_term.split(/(?=\")\W+/)
=> ["pizza cheese ", "ham and pineapple", "pepperoni"]
I can get ham and pineapple on its own (without the unwanted quotes), and I can easily split all the words, but is there some regex that will return an array like:
search_term.split(🤷♂️)
=> ["pizza", "cheese", "ham and pineapple", "pepperoni"]
Yes:
/"[^"]*?"|\w+/
https://regex101.com/r/fzHI4g/2
This is not done as a split: match either a phrase in quotes or a single word, so each one is a separate match.
£ cat pizza
pizza "a and b" pie
£ ruby -ne 'print $_.scan(/"[^"]*?"|\w+/)' pizza
["pizza", "\"a and b\"", "pie"]
£
So search_term.scan(/"[^"]*?"|\w+/) returns the array you want, except that the quoted phrase keeps its surrounding quotes.
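One way to drop those quotes is to trim each element after scanning. This is a sketch assuming Ruby 2.5+ (for String#delete_prefix and #delete_suffix):

```ruby
search_term = 'pizza cheese "ham and pineapple" pepperoni'

# Match either a quoted phrase (quotes included) or a single word.
tokens = search_term.scan(/"[^"]*?"|\w+/)
p tokens   # ["pizza", "cheese", "\"ham and pineapple\"", "pepperoni"]

# Strip one leading and one trailing double quote from each token
# (words without quotes are returned unchanged).
terms = tokens.map { |t| t.delete_prefix('"').delete_suffix('"') }
p terms    # ["pizza", "cheese", "ham and pineapple", "pepperoni"]
```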
To exclude the quotes you need:
/(?<=")\w[^"]*?(?=")|\w+/
This puts the quotes in lookarounds, which assert that the matched text has a quote before it (lookbehind) and a quote after it (lookahead), rather than including the quotes in the match.
Note that because the last regex doesn't consume the quotes, it relies on the character after an opening quote being a word character to tell opening quotes from closing quotes, so a phrase such as " a bear" (starting with a space) won't be matched. This can be solved with capture groups, but if it's an issue, as I said in the comments, I would recommend using the regex at the top of the answer and then trimming the quotes off each array element.
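A sketch of the capture-group variant mentioned above. With groups in the pattern, String#scan returns one array of captures per match, so each match yields either the quoted text or the bare word:

```ruby
search_term = 'pizza " a bear" cheese "ham and pineapple" pepperoni'

# Group 1 captures the text between quotes; group 2 captures a bare word.
# For each match exactly one group participates, the other is nil.
terms = search_term.scan(/"([^"]*)"|(\w+)/).map { |quoted, word| quoted || word }
p terms  # ["pizza", " a bear", "cheese", "ham and pineapple", "pepperoni"]
```

Because the quotes are consumed, a phrase with leading whitespace such as " a bear" is still recognized.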
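Alternatively, this free-spacing regex also rejects quoted text that starts or ends with whitespace:
r = /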
(?<=\") # match a double quote in a positive lookbehind
(?!\s) # next char cannot be a whitespace, negative lookahead
[^"]+ # match one or more characters other than double-quote
(?<!\s) # previous char cannot be a whitespace, negative lookbehind
(?=\") # match a double quote in a positive lookahead
| # or
\w+ # match one or more word characters
/x # free-spacing regex definition mode
str = 'pizza "ham and pineapple" mushroom pepperoni "sausage and anchovies"'
str.scan r
#=> ["pizza", "ham and pineapple", "mushroom", "pepperoni",
# "sausage and anchovies"]
s = "#main= 'quotes'"
s.gsub "'", "\\'" # => "#main= quotes'quotes"
This seems wrong; I expect to get "#main= \\'quotes\\'".
When I don't use an escape character, it works as expected:
s.gsub "'", "*" # => "#main= *quotes*"
So there must be something to do with escaping.
Using Ruby 1.9.2p290.
I need to replace single quotes with back-slash and a quote.
Even more inconsistencies:
"\\'".length # => 2
"\\*".length # => 2
# As expected
"'".gsub("'", "\\*").length # => 2
"'a'".gsub("'", "\\*") # => "\\*a\\*" (length==5)
# WTF next:
"'".gsub("'", "\\'").length # => 0
# Doubling the content?
"'a'".gsub("'", "\\'") # => "a'a" (length==3)
What is going on here?
You're getting tripped up by the special meaning of \' in a regular expression replacement string:
\0, \1, \2, ... \9, \&, \`, \', \+
Substitutes the value matched by the nth grouped subexpression, or by the entire match, pre- or postmatch, or the highest group.
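A small demonstration of these sequences on a toy string (the string "abc" is just an example), showing why \' in the replacement inserts the post-match rather than a quote:

```ruby
s = "abc"

# In a gsub replacement *string*, these backslash sequences are special:
p s.gsub("b", "<\\&>")    # \& = whole match -> "a<b>c"
p s.gsub("b", "<\\`>")    # \` = pre-match   -> "a<a>c"
p s.gsub("b", "<\\'>")    # \' = post-match  -> "a<c>c"
p s.gsub(/(b)/, "<\\1>")  # \1 = group 1     -> "a<b>c"
```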
So when you write "\\'", the double-quoted literal turns \\ into a single backslash, leaving \' in the replacement string, and gsub interprets \' as "the string to the right of the last successful match." To replace single quotes with escaped single quotes, you need one more level of escaping to get past the special meaning of \':
s.gsub("'", "\\\\'")
Or avoid the toothpicks and use the block form:
s.gsub("'") { |m| '\\' + m }
You would run into similar issues if you were trying to escape backticks, a plus sign, or even a single digit.
The overall lesson here is to prefer the block form of gsub for anything but the most trivial of substitutions.
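A sketch of the block form applied to the other troublesome characters mentioned above; the character class ['`+] and the sample string are just examples:

```ruby
s = "it's `cmd` + 1"

# The block's return value is inserted literally; gsub does not
# interpret \', \` or \+ in it, so one escaped backslash suffices.
escaped = s.gsub(/['`+]/) { |m| "\\" + m }
puts escaped   # it\'s \`cmd\` \+ 1
```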
s = "#main = 'quotes'"
s.gsub "'", "\\\\'"
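Since "\\" in a double-quoted string is a single backslash, and gsub also treats a backslash in the replacement as special, you need four backslashes in the source to get one literal backslash in the result (which inspect displays as \\).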
You need to escape the \ as well:
s.gsub "'", "\\\\'"
Output:
"#main= \\'quotes\\'"
A good explanation found on an outside forum:
The key point to understand IMHO is that a backslash is special in
replacement strings. So, whenever one wants to have a literal
backslash in a replacement string one needs to escape it and hence
have [two] backslashes. Coincidentally a backslash is also special in a
string (even in a single quoted string). So you need two levels of
escaping, makes 2 * 2 = 4 backslashes on the screen for one literal
replacement backslash.
source
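The "2 * 2 = 4" arithmetic from the quote above can be checked directly; a minimal sketch using the string from the question:

```ruby
s = "#main= 'quotes'"

# Level 1: the double-quoted literal "\\\\'" is the 3-character string \\'
replacement = "\\\\'"
puts replacement.length          # 3

# Level 2: gsub turns \\ in the replacement into one literal backslash.
result = s.gsub("'", replacement)
puts result                      # #main= \'quotes\'
puts result.length - s.length    # 2 (one backslash added per quote)
```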
Let's say I have a string:
str = "12345\56789"
How can I split the above string into two words?
["12345","56789"]
str = "12345/56789"
print str.split('/') # => ["12345", "56789"]
Edit: With the change to a backslash, it should be:
str = '12345\56789'
print str.split('\\') # => ["12345", "56789"]
You need the double backslash to avoid escaping the closing quote mark.
Regexp.quote returns a copy of the string with special characters escaped; each backslash becomes two backslashes, so the result can still be split with '\\'.
So one solution is: Regexp.quote('00050\00050').split('\\')[0]
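A sketch of what that one-liner produces step by step. Note that for a single-quoted literal like this one, splitting the raw string directly gives the same first element:

```ruby
raw = '00050\00050'            # single quotes keep the backslash
quoted = Regexp.quote(raw)     # the backslash is escaped, so it now appears twice
p quoted.count("\\")           # 2

# Splitting on one backslash leaves an empty string between the two parts,
# which is why the answer takes [0].
p quoted.split('\\')           # ["00050", "", "00050"]
p raw.split('\\')              # ["00050", "00050"]
```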
I am struggling to write a Ruby regexp that matches words which start with 2 or 3 letters, followed by a backslash (\), followed by 7 or 8 letters or digits. The expression I use is:
p "BFD\082BBSA".match %r{\A[a-zA-Z]{2,3}\/[a-zA-Z0-9]{7,8}\z}
But each time this code returns nil. What am I doing wrong?
Try this:
'BFD\082BBSA'.match %r{\A[a-zA-Z]{2,3}\\[a-zA-Z0-9]{7,8}\z}
# => #<MatchData "BFD\\082BBSA">
#or
"BFD\\082BBSA".match %r{\A[a-zA-Z]{2,3}\\[a-zA-Z0-9]{7,8}\z}
# => #<MatchData "BFD\\082BBSA">
See also: Backslashes in single-quoted strings vs. double-quoted strings in Ruby?
The problem is that your string contains no backslash at all, just a null character (in a double-quoted string, \0 is an octal escape for NUL):
"BFD\082BBSA"
# => "BFD\u000082BBSA"
So you just have to escape the backslash in the string:
"BFD\\082BBSA"
# => "BFD\\082BBSA"
Moreover, as others pointed out, \/ will match a forward slash, so you have to change \/ into \\:
"BFD\\082BBSA".match(/\A[a-z]{2,3}\\[a-z0-9]{7,8}\z/i)
# => #<MatchData "BFD\\082BBSA">
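A sketch that runs the corrected pattern against the different ways of writing the input, to show which literals actually contain a backslash (the last sample is a hypothetical forward-slash variant):

```ruby
pattern = /\A[a-z]{2,3}\\[a-z0-9]{7,8}\z/i

samples = [
  'BFD\082BBSA',   # single quotes: the backslash is kept
  "BFD\\082BBSA",  # double quotes with an escaped backslash: same string as above
  "BFD\082BBSA",   # double quotes: \0 becomes a NUL character, no backslash
  'BFD/082BBSA',   # forward slash instead of backslash
]

# match? (Ruby 2.4+) returns true/false without building MatchData.
samples.each { |s| puts "#{s.inspect} => #{pattern.match?(s)}" }
```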
You want to match a backslash, but your regex matches a forward slash. Change the regex to:
[a-zA-Z]{2,3}\\[a-zA-Z0-9]{7,8}
Note the \\ instead of \/. Check the regex at work here.
I'm trying to build a regex from a string object, which happens to be stored in a variable.
The problem I'm facing is that escape sequences such as "\d" in the string don't make it into the resulting regex:
Regexp.new("\d") # => /d/
If I use single quotes, though, it works flawlessly:
Regexp.new('\d') # => /\d/
But since my string is stored in a variable, I always get the double-quoted string.
Is there a way to turn a double-quoted string into a single-quoted string, so that I can use it in the Regexp constructor?
(I'd like to use the string interpolation feature of the double quotes)
For example:
email_pattern = "/[a-z]*\.com"
whole_pattern = "to: #{email_pattern}"
Regexp.new(whole_pattern)
For better readability, I'd like to avoid escaping escape characters, as in:
"\\d"
The problem is that you end up with completely different strings depending on whether you use single or double quotes:
"\d".chars.to_a
#=> ["d"]
'\d'.chars.to_a
#=> ["\\", "d"]
So when you use double quotes, the single \ is consumed when the literal is parsed and cannot be recovered. For example:
"\d" == "d"
#=> true
So you can never know what the string contained before the escape was processed. As @FrankSchmitt suggested, use a double backslash or stick with single quotes. There's no other way.
There's an option, though. You can define your regex parts as regexes themselves, instead of strings. They behave exactly as expected:
regex1 = /\d/
#=> /\d/
regex2 = /foobar/
#=> /foobar/
Then, you can build your final regex with #{}-style interpolation, instead of building the regex source from strings:
regex3 = /#{regex1} #{regex2}/
#=> /(?-mix:\d) (?-mix:foobar)/
Reflecting your example this would translate to:
email_regex = /[a-z]*\.com/
whole_regex = /to: #{email_regex}/
#=> /to: (?-mix:[a-z]*\.com)/
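A quick check that the interpolated regex keeps the escaped dot as a literal dot (the test strings are just examples):

```ruby
email_regex = /[a-z]*\.com/
whole_regex = /to: #{email_regex}/

p whole_regex.source                  # "to: (?-mix:[a-z]*\\.com)"
p whole_regex.match?("to: foo.com")   # true
p whole_regex.match?("to: fooXcom")   # false: \. only matches a literal dot
```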
You may also find Regexp.escape useful (see the docs).
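A sketch of where Regexp.escape helps: when one part of the pattern is a plain string that should be matched literally. The domain value here is hypothetical, standing in for user input or configuration:

```ruby
# Hypothetical value that would come from user input or config.
domain = "example.com"

# Escape the literal part so its "." only matches a dot, then interpolate.
regex = /to: [a-z]+@#{Regexp.escape(domain)}\z/
p regex.match?("to: bob@example.com")   # true
p regex.match?("to: bob@exampleXcom")   # false

# Without escaping, "." is a wildcard and the second string matches too.
unescaped = /to: [a-z]+@#{domain}\z/
p unescaped.match?("to: bob@exampleXcom")   # true
```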
If you run into further escaping problems (with the slashes), you can also use the alternative Regexp literal syntax with %r{<your regex here>}, in which you do not need to escape the / character. For example:
%r{/}
#=> /\//
There's no getting around escaping the backslash \ with \\, though.
Either create your string with single quotes:
s = '\d'
r = Regexp.new(s)
or quote the backslash:
s = "\\d"
r = Regexp.new(s)
Both produce the regex /\d/.
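A minimal check that the two spellings build the same pattern, and that the plain double-quoted "\d" does not:

```ruby
single = Regexp.new('\d')    # single quotes: the string is backslash + d
double = Regexp.new("\\d")   # double quotes: \\ is one backslash, so the same string

p single.source == double.source   # true, both are \d
p single.match?("7")               # true
p Regexp.new("\d").source          # "d": the backslash was dropped by the literal
```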