I am struggling to write a Ruby regexp that will match all words which: starts with 2 or 3 letters, then have backslash (\) and then have 7 or 8 letters and digits. The expression I use is like this:
p "BFD\082BBSA".match %r{\A[a-zA-Z]{2,3}\/[a-zA-Z0-9]{7,8}\z}
But each time this code returns nil. What am I doing wrong?
Try as below :
'BFD\082BBSA'.match %r{\A[a-zA-Z]{2,3}\\[a-zA-Z0-9]{7,8}\z}
# => #<MatchData "BFD\\082BBSA">
#or
"BFD\\082BBSA".match %r{\A[a-zA-Z]{2,3}\\[a-zA-Z0-9]{7,8}\z}
# => #<MatchData "BFD\\082BBSA">
Read this also - Backslashes in Single quoted strings vs. Double quoted strings in Ruby?
The problem is that you actually have no backslash in your string, just a null Unicode character:
"BFD\082BBSA"
# => "BFD\u000082BBSA"
So you just have to escape the backslash in the string:
"BFD\\082BBSA"
# => "BFD\\082BBSA"
Moreover, as others pointed out, \/ will match a forward slash, so you have to change \/ into \\:
"BFD\\082BBSA".match(/\A[a-z]{2,3}\\[a-z0-9]{7,8}\z/i)
# => #<MatchData "BFD\\082BBSA">
You wanted to match the backward slash, but you are matching forward slash. Please change the RegEx to
[a-zA-Z]{2,3}\\[a-zA-Z0-9]{7,8}
Note the \\ instead of \/. Check the RegEx at work, here
Related
Im trying to match double back slashes in a string, but only when there is 2 and not 3 so I can swap out the 2 for 3.
I know that \\{2} will match double back slash except it will also match the first 2 slashes when 3 are present.
For example in the string
{"files":{"windows": {"%windir%\\\System32\\drivers\\etc\\lmhosts.sam":{"ignore":{"id":32}},"%windir%\\System32\\\drivers\\etc":{"ignore":{"id":32}},"%windir%\\System32\\drivers\\etc\\hosts":{"ignore":{"id":32}}}}}
There are multiple double slashes that I wish to match and replace, but there are also a few triple slashes which I wish to leave alone.
So, my question, how do match the double slash when it does not sit adjacent to another slash?
Heres a Regex101 link to toy with.
https://regex101.com/r/kWIscW/1
Also, doing this in Ruby.
How about:
\b\\{2}\b
To define that you \\ are the only one characters evaluated
Another possibility to is looking behind and look ahead, however, not sure your regex engine supports it:
(?<=[^\\])\\{2}(?=[^\\])
r = /
(?<!\\) # do not match a backslash, negative lookbehind
\\\\ # match two backslashes
(?!\\) # do not match a backslash, negative lookahead
/x # free-spacing regex definition mode
str = "\\\\\ are two backslashes and here are three \\\\\\ of 'em"
puts str
# \\ are two backslashes and here are three \\\ of 'em
str.scan(r)
#=> ["\\\\"]
Note that s = "\\\\\ " is two backslashes followed by an escaped space.
s.size
#=> 3
s[0].ord
#=> 92
92.chr
#=> "\\"
s[1].ord
#=> 92
s[2].ord
#=> 32
Let's first address backslashes in string literals,
"\\" is one backslash
"\\\\" are two backslashes
"\\\\\\" are three backslashes
Why? Backslash is the escape sequence in string literals, eg "\n" is a linebreak, and hence a backslash must be escaped with a backslash to encode one backslash.
Now, try this
string = "\\\\aaa\\\\bbb\\\\\\ccc"
string.gsub(/\\+/) { |match| match.size == 2 ? '/' : match }
# => "/aaa/bbb\\\\\\ccc"
How does this work?
/\\+/ matches any sequence of backslashes
match.size == 2 filters those that have length 2
And then we just replace those
s = "#main= 'quotes'
s.gsub "'", "\\'" # => "#main= quotes'quotes"
This seems to be wrong, I expect to get "#main= \\'quotes\\'"
when I don't use escape char, then it works as expected.
s.gsub "'", "*" # => "#main= *quotes*"
So there must be something to do with escaping.
Using ruby 1.9.2p290
I need to replace single quotes with back-slash and a quote.
Even more inconsistencies:
"\\'".length # => 2
"\\*".length # => 2
# As expected
"'".gsub("'", "\\*").length # => 2
"'a'".gsub("'", "\\*") # => "\\*a\\*" (length==5)
# WTF next:
"'".gsub("'", "\\'").length # => 0
# Doubling the content?
"'a'".gsub("'", "\\'") # => "a'a" (length==3)
What is going on here?
You're getting tripped up by the specialness of \' inside a regular expression replacement string:
\0, \1, \2, ... \9, \&, \`, \', \+
Substitutes the value matched by the nth grouped subexpression, or by the entire match, pre- or postmatch, or the highest group.
So when you say "\\'", the double \\ becomes just a single backslash and the result is \' but that means "The string to the right of the last successful match." If you want to replace single quotes with escaped single quotes, you need to escape more to get past the specialness of \':
s.gsub("'", "\\\\'")
Or avoid the toothpicks and use the block form:
s.gsub("'") { |m| '\\' + m }
You would run into similar issues if you were trying to escape backticks, a plus sign, or even a single digit.
The overall lesson here is to prefer the block form of gsub for anything but the most trivial of substitutions.
s = "#main = 'quotes'
s.gsub "'", "\\\\'"
Since \it's \\equivalent if you want to get a double backslash you have to put four of ones.
You need to escape the \ as well:
s.gsub "'", "\\\\'"
Outputs
"#main= \\'quotes\\'"
A good explanation found on an outside forum:
The key point to understand IMHO is that a backslash is special in
replacement strings. So, whenever one wants to have a literal
backslash in a replacement string one needs to escape it and hence
have [two] backslashes. Coincidentally a backslash is also special in a
string (even in a single quoted string). So you need two levels of
escaping, makes 2 * 2 = 4 backslashes on the screen for one literal
replacement backslash.
source
Why, in Ruby, do the first two regexes fail to match while the third matches?
str = 'ID: 4'
regex1 = /^(?<=ID: )\d+/
regex2 = /\A(?<=ID: )\d+/
regex3 = /(?<=ID: )\d+/
str.match(regex1) # => nil
str.match(regex2) #=> nil
str.match(regex3) #=> #<MatchData "4">
The only difference is the ^ or \A characters, which match the beginning of a line and beginning of the string, respectively. It seems both should be matched by str.
The look-behind pattern (?<=ID: ) matches a position in the string that is preceded by «ID: ».
The anchors ^ and \A match a position at the beginning of the line or string.
So the pattern \A(?<=ID: ) asks that both match together, i.e. that the beginning of the string is preceded by «ID: ». Not gonna happen!
Both of these would work fine if you put the anchor inside of the lookbehind:
regex1 = /(?<=^ID: )\d+/
regex2 = /(?<=\AID: )\d+/
If the anchors are outside of the lookbehind then you are saying "from the start of the string, are the previous characters ID:". This will always fail because there won't be any characters before the start of the string.
Look-ahead and look-behind are non-capturing/zero-length, so the first two expressions don't match.
The first expression, for instance, amounts to another way of writing: /^\d+/ (it's conditioned on \d+ not being preceded by a space, but that's not possible since there cannot be anything before ^ anyway).
In the third expression, the lookbehind can occur anywhere and specifically occurs in the zero-width space before the 4. You can see that only the 4 is matched.
With ^ or \A, the zero-width space at the beginning of the string must match the lookbehind, which is impossible.
In regex1, which is /^(?<=ID: )\d+/, there has to be a beginning of a line that is preceded by ID:. The string in question does not have such point.
In regex2, which is /\A(?<=ID: )\d+/, there has to be a beginning of a string that is preceded by ID:. There is no string that has such point.
In regex3, which is /(?<=ID: )\d+/, there has to be a point of string that is preceded by ID: and is followed by \d+. There is such point in the string.
Look-behind doesn't change position of the match.
/(?<=ID: )\d+/ is actually matched at the digit:
ID: 4
^
Is there any practical difference between a regexp using an escape character versus one using the literal character? I.e. are there any situations where matching with them will return different results?
Example in Ruby:
literal = Regexp.new("\t")
=> / /
escaped = Regexp.new("\\t")
=> /\t/
# They're different...
literal == escaped
=> false
# ...but they seem to match the same:
"Hello\tWorld".match(literal)
=> #<MatchData "\t">
"Hello\tWorld".match(escaped)
=> #<MatchData "\t">
No, not in the case of \t (or \n).
But it won't work in most other cases (e.g., escape sequences that either don't have a 1:1 equivalent in string escapes like \s or where the meaning differs like \b), so it's generally a good idea to use the escaped versions (or construct the regex using /.../ in the first place).
Suppose I said £ character as dangerous, and I want to be able to protect and to unprotect any string. And vice versa.
Example 1:
"Foobar £ foobar foobar foobar." # => dangerous string
"Foobar \£ foobar foobar foobar." # => protected string
Example 2:
"Foobar £ foobar £££££££foobar foobar." # => dangerous string
"Foobar \£ foobar \£\£\£\£\£\£\£foobar foobar." # => protected string
Example 3:
"Foobar \£ foobar \\£££££££foobar foobar." # => dangerous string
"Foobar \£ foobar \\\£\£\£\£\£\£\£foobar foobar." # => protected string
Is there an easy way, with Ruby, to escape (and unescape) a given character (such as £ in my example) from a string?
Edit: here is an explication about the behavior of this question.
First of all, thanks for your answers. I have a Rails app with a Tweet model having a content field. Example of tweet:
tweet = Tweet.create(content: "Hello #bob")
Inside the model, there's a serialization process that converte the string like this:
dump('Hello #bob') # => '["Hello £", 42]'
# ... where 42 is the id of bob username
Then, I'm able to deserialize and display its tweet like this:
load('["Hello £", 42]') # => 'Hello #bob'
In the same way, it's also possible to do so with more than one username:
dump('Hello #bob and #joe!') # => '["Hello £ and £!", 42, 185]'
load('["Hello £ and £!", 42, 185]') # => 'Hello #bob and #joe!'
That's the goal :)
But this find-and-replace could be hard to perform with something like:
tweet = Tweet.create(content: "£ Hello #bob")
'cause here we also have to escape £ char. And I think your solution is good for this. So the result become:
dump('£ Hello #bob') # => '["\£ Hello £", 42]'
load('["\£ Hello £", 42]') # => '£ Hello #bob'
Just perfect. <3 <3
Now, if there is this:
tweet = Tweet.create(content: "\£ Hello #bob")
I think we first should escape every \, and then escape every £, like:
dump('\£ Hello #bob') # => '["\\£ Hello £", 42]'
load('["\\£ Hello £", 42]') # => '£ Hello #bob'
However... how can we do in this case:
tweet = Tweet.create(content: "\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\£ Hello #bob")
...where tweet.content.gsub(/(?<!\\)(?=(?:\\\\)*£)/, "\\") seems not working.
Hopefully your version of ruby supports lookbehinds. If it doesn't my solution will not work for you.
Escape characters :
str = str.gsub(/(?<!\\)(?=(?:\\\\)*£)/, "\\")
Un-escape characters :
str = str.gsub(/(?<!\\)((?:\\\\)*)\\£/, "\1£")
Both regexes will work regardless of the amount of backslashes. They are complementing each other.
Escape explanation :
"
(?<! # Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind)
\\ # Match the character “\” literally
)
(?= # Assert that the regex below can be matched, starting at this position (positive lookahead)
(?: # Match the regular expression below
\\ # Match the character “\” literally
\\ # Match the character “\” literally
)* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
£ # Match the character “£” literally
)
"
Not that I am matching a certain position. No text is consumed at all. When I pinpoint the position I want I insert a \.
Explanation of unescape :
"
(?<! # Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind)
\\ # Match the character “\” literally
)
( # Match the regular expression below and capture its match into backreference number 1
(?: # Match the regular expression below
\\ # Match the character “\” literally
\\ # Match the character “\” literally
)* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
)
\\ # Match the character “\” literally
£ # Match the character “£” literally
"
Here I am saving all the backslashes minus one and and I replace this number of backslashes with the special character. Tricky stuff :)
If you are using Ruby 1.9, which has lookbehind, then FailedDev's answer should work quite well. If you are using Ruby 1.8, which does not have lookbehind (I think), a different approach may work. Give this a try:
text.gsub!(/(\\.)|£)/m) do
if ($1 != nil) # If escaped anything
"$1" # replace with self.
else # Otherwise escape the
"\\£" # unescaped £.
end
end
Note that I am not a Ruby programmer and this snippet is untested (in particular I'm not sure if the: if ($1 != nil) statement usage is correct - it may need to be: if ($1 != "") or if ($1)), but I do know that this general technique (using code in place of a simple replacement string) works. I recently used this same technique for my JavaScript solution to a similar question which was looking to find unescaped asterisks.
I'm not sure if this is what you want, but I think you can do a simple find-and-replace:
str = str.gsub("£", "\\£") # to escape
str = str.gsub("\\£", "£") # to unescape
Note that I changed \ to \\ because you have to escape the backslash in a double-quoted string.
Edit: I think what you want is a regex that matches an odd number of backslashes:
str = str.gsub(/(^|[^\\])((?:\\\\)*)\\£/, "\\1\\2£")
That does the following transformations
"£" #=> "£"
"\\£" #=> "£"
"\\\\£" #=> "\\\\£"
"\\\\\\£" #=> "\\\\£"