Regex match number of backslashes but not more than x number

I'm trying to match double backslashes in a string, but only when there are exactly 2 and not 3, so I can swap the 2 for 3.
I know that \\{2} will match a double backslash, but it will also match the first 2 backslashes when 3 are present.
For example in the string
{"files":{"windows": {"%windir%\\\System32\\drivers\\etc\\lmhosts.sam":{"ignore":{"id":32}},"%windir%\\System32\\\drivers\\etc":{"ignore":{"id":32}},"%windir%\\System32\\drivers\\etc\\hosts":{"ignore":{"id":32}}}}}
There are multiple double backslashes that I wish to match and replace, but there are also a few triple backslashes which I wish to leave alone.
So, my question: how do I match a double backslash when it does not sit adjacent to another backslash?
Here's a Regex101 link to toy with.
https://regex101.com/r/kWIscW/1
Also, doing this in Ruby.

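How about:
\b\\{2}\b
This requires that the two backslashes are the only backslashes matched. Note that \b only matches between a word character and a non-word character, so this works only where the backslashes have word characters on both sides (it would not match after the % in %windir%\\System32).
Another possibility is to use a lookbehind and a lookahead, although I'm not sure your regex engine supports them:
(?<=[^\\])\\{2}(?=[^\\])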
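To compare the two ideas, here is a minimal sketch. The sample path is made up for illustration and built by concatenation so the number of backslashes is unambiguous; the lookaround pattern uses the negative forms, which also work at the start and end of a string.

```ruby
bs = "\\" # a single backslash

# Hypothetical sample: two double runs and one triple run.
path = "%windir%" + bs * 2 + "System32" + bs * 2 + "drivers" + bs * 3 + "etc"

# Word-boundary version: misses the run that follows '%'.
boundary_hits = path.scan(/\b\\{2}\b/).size

# Negative lookaround version: finds every run of exactly two.
lookaround_hits = path.scan(/(?<!\\)\\{2}(?!\\)/).size

puts path
puts "word boundary: #{boundary_hits}, lookaround: #{lookaround_hits}"
```

The word-boundary pattern only finds the run between "System32" and "drivers", while the lookaround pattern finds both double runs and skips the triple one.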

r = /
  (?<!\\)  # not preceded by a backslash (negative lookbehind)
  \\\\     # match two backslashes
  (?!\\)   # not followed by a backslash (negative lookahead)
/x         # free-spacing regex definition mode
str = "\\\\\ are two backslashes and here are three \\\\\\ of 'em"
puts str
# \\ are two backslashes and here are three \\\ of 'em
str.scan(r)
#=> ["\\\\"]
Note that s = "\\\\\ " is two backslashes followed by an escaped space.
s.size
#=> 3
s[0].ord
#=> 92
92.chr
#=> "\\"
s[1].ord
#=> 92
s[2].ord
#=> 32

Let's first address backslashes in string literals,
"\\" is one backslash
"\\\\" are two backslashes
"\\\\\\" are three backslashes
Why? The backslash is the escape character in string literals, e.g. "\n" is a line break, and hence a backslash must be escaped with a backslash to encode one backslash.
Now, try this
string = "\\\\aaa\\\\bbb\\\\\\ccc"
string.gsub(/\\+/) { |match| match.size == 2 ? '/' : match }
# => "/aaa/bbb\\\\\\ccc"
How does this work?
/\\+/ matches any sequence of backslashes
match.size == 2 filters those that have length 2
And then we just replace those; every other run is returned unchanged
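The original question asked to turn 2 backslashes into 3. A minimal sketch of that, assuming the negative lookaround pattern from the earlier answer; the helper name triple_double_backslashes and the sample input are made up for illustration:

```ruby
# Hypothetical helper (not from the thread): replace runs of exactly two
# backslashes with three, leaving longer runs untouched.
def triple_double_backslashes(text)
  # The block form is used so the replacement string is taken literally.
  text.gsub(/(?<!\\)\\\\(?!\\)/) { "\\" * 3 }
end

bs = "\\" # a single backslash
input = "etc" + bs * 2 + "hosts" + bs * 3 + "sam"
result = triple_double_backslashes(input)

puts input
puts result
```

Using the block form sidesteps the extra level of escaping that a replacement string would need.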

Related

Why does a substring with indexes [0, 1] return two characters, including the escape backslash?

I have:
long_string # => "\nIt was the best of times,\nIt was the worst of times.\n"
I get:
long_string[0,1] # => "\n"
I am curious why I get two characters rather than merely "\" as in other cases.
Is this how escaped characters are treated in substrings and beyond?

From documentation of String#[]
str[start, length] → new_str or nil
If passed a start index and a length, returns a substring containing length characters starting at the start index
For example
"Hello"[0, 1] #=> "H"
'Hello'[0, 1] #=> "H"
But there is difference between single quotes and double quotes.
Double quotes allow for many escape sequences, e.g. "\n", "\t", "\s", "\r" and others. Each of these is one character, not two.
"\n" is just one (newline) character. But '\n' contains two characters (backslash and letter).
"\n".size #=> 1
'\n'.size #=> 2
Compare the different behavior of double quotes and single quotes when you try to return one character starting from zero index
"\n"[0, 1] #=> "\n"
'\n'[0, 1] #=> "\\"
As is clear from the above, "\\" is just one character (a backslash); the other backslash is only there to escape it.
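A short sketch that puts the two quoting styles side by side; the variable names and the shortened sample text are illustrative only:

```ruby
double_quoted = "\nIt was"  # starts with one newline character
single_quoted = '\nIt was'  # starts with a backslash, then the letter n

double_first = double_quoted[0, 1] # start 0, length 1
single_first = single_quoted[0, 1]
single_pair  = single_quoted[0, 2] # start 0, length 2

p double_quoted.size
p single_quoted.size
p double_first
p single_first
p single_pair
```

In both cases [0, 1] returns exactly one character; it only looks like two in the double-quoted case because p and inspect display the newline as an escape sequence.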

Solved: flexible quotes, as above, store the string as "\nIt was the best of times,\nIt was the worst of times.\n" (in double quotes). Double-quoted strings interpret escape sequences, whereas single-quoted strings do not.
E.g.
string = "\n"
string.size == 1 # double quotes
string = '\n'
string.size == 2 # single quotes

[0, 1] means start index 0 and length 1, so it returns one character: character 0. [0, 0] returns an empty string. To get characters 0 and 1, use [0, 2] or the range [0..1].

ruby - weird duplication with backtick in gsub [duplicate]

s = "#main= 'quotes'"
s.gsub "'", "\\'" # => "#main= quotes'quotes"
This seems to be wrong; I expected to get "#main= \\'quotes\\'".
When I don't use an escape character, it works as expected.
s.gsub "'", "*" # => "#main= *quotes*"
So it must have something to do with escaping.
Using ruby 1.9.2p290
I need to replace single quotes with a backslash and a quote.
Even more inconsistencies:
"\\'".length # => 2
"\\*".length # => 2
# As expected
"'".gsub("'", "\\*").length # => 2
"'a'".gsub("'", "\\*") # => "\\*a\\*" (length==5)
# WTF next:
"'".gsub("'", "\\'").length # => 0
# Doubling the content?
"'a'".gsub("'", "\\'") # => "a'a" (length==3)
What is going on here?

You're getting tripped up by the specialness of \' inside a regular expression replacement string:
\0, \1, \2, ... \9, \&, \`, \', \+
Substitutes the value matched by the nth grouped subexpression, or by the entire match, pre- or postmatch, or the highest group.
So when you say "\\'", the double \\ becomes just a single backslash and the result is \' but that means "The string to the right of the last successful match." If you want to replace single quotes with escaped single quotes, you need to escape more to get past the specialness of \':
s.gsub("'", "\\\\'")
Or avoid the toothpicks and use the block form:
s.gsub("'") { |m| '\\' + m }
You would run into similar issues if you were trying to escape backticks, a plus sign, or even a single digit.
The overall lesson here is to prefer the block form of gsub for anything but the most trivial of substitutions.
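A minimal sketch comparing the three variants discussed above on the string from the question; the variable names are illustrative:

```ruby
s = "#main= 'quotes'"

# Replacement string \' is expanded to the text after the match.
post_match = s.gsub("'", "\\'")

# Four backslashes in the literal leave one literal backslash in the output.
escaped = s.gsub("'", "\\\\'")

# Block form: the block's return value is inserted as-is.
block = s.gsub("'") { |m| "\\" + m }

puts post_match
puts escaped
puts block
```

The last two lines print the same text, which is why the block form is usually the easier one to get right.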

s = "#main = 'quotes'"
s.gsub "'", "\\\\'"
Since \\ in a string literal is equivalent to a single \, you have to write four of them to get a double backslash.

You need to escape the \ as well:
s.gsub "'", "\\\\'"
Outputs
"#main= \\'quotes\\'"
A good explanation found on an outside forum:
The key point to understand IMHO is that a backslash is special in
replacement strings. So, whenever one wants to have a literal
backslash in a replacement string one needs to escape it and hence
have [two] backslashes. Coincidentally a backslash is also special in a
string (even in a single quoted string). So you need two levels of
escaping, makes 2 * 2 = 4 backslashes on the screen for one literal
replacement backslash.
source

Match a word with backslash

I am struggling to write a Ruby regexp that will match all words which start with 2 or 3 letters, then have a backslash (\), and then have 7 or 8 letters and digits. The expression I use is like this:
p "BFD\082BBSA".match %r{\A[a-zA-Z]{2,3}\/[a-zA-Z0-9]{7,8}\z}
But each time this code returns nil. What am I doing wrong?

Try the following:
'BFD\082BBSA'.match %r{\A[a-zA-Z]{2,3}\\[a-zA-Z0-9]{7,8}\z}
# => #<MatchData "BFD\\082BBSA">
#or
"BFD\\082BBSA".match %r{\A[a-zA-Z]{2,3}\\[a-zA-Z0-9]{7,8}\z}
# => #<MatchData "BFD\\082BBSA">
Read this also - Backslashes in Single quoted strings vs. Double quoted strings in Ruby?

The problem is that you actually have no backslash in your string, just a NUL character, because "\0" in a double-quoted string is an escape sequence:
"BFD\082BBSA"
# => "BFD\u000082BBSA"
So you just have to escape the backslash in the string:
"BFD\\082BBSA"
# => "BFD\\082BBSA"
Moreover, as others pointed out, \/ will match a forward slash, so you have to change \/ into \\:
"BFD\\082BBSA".match(/\A[a-z]{2,3}\\[a-z0-9]{7,8}\z/i)
# => #<MatchData "BFD\\082BBSA">
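The three spellings of the literal can be checked side by side. This is only a sketch; the variable names are made up, and the pattern is the one from the answer above:

```ruby
pattern = /\A[a-z]{2,3}\\[a-z0-9]{7,8}\z/i

unescaped     = "BFD\082BBSA"   # "\0" becomes a NUL character, no backslash
single_quoted = 'BFD\082BBSA'   # backslash kept literally
escaped       = "BFD\\082BBSA"  # escaped backslash in double quotes

p unescaped.size
p single_quoted.size
p single_quoted == escaped
p pattern.match?(unescaped)
p pattern.match?(escaped)
```

The unescaped literal is one character shorter and contains no backslash, so no backslash pattern can match it.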

You wanted to match the backslash, but you are matching a forward slash. Please change the RegEx to
[a-zA-Z]{2,3}\\[a-zA-Z0-9]{7,8}
Note the \\ instead of \/.

Remove hex escape from string

I have the following hex as a string: "\xfe\xff". I'd like to convert this to "feff". How do I do this?
The closest I got was "\xfe\xff".inspect.gsub("\\x", ""), which returns "\"FEFF\"".

"\xfe\xff".unpack("H*").first
# => "feff"
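As a hedged sketch, the unpack approach can be compared with a byte-by-byte conversion. The call to String#b (binary copy) and the second sample with a small byte value are additions for illustration, not part of the answer:

```ruby
raw = "\xfe\xff".b # two bytes, binary encoding

via_unpack = raw.unpack("H*").first

# Byte-by-byte alternative; '%02x' pads values below 0x10 to two digits.
via_bytes = raw.each_byte.map { |byte| format('%02x', byte) }.join

# Illustrative second sample containing a small byte value.
small = "\x0a\xff".b
small_hex = small.unpack("H*").first

puts via_unpack
puts via_bytes
puts small_hex
```

Both approaches work on the bytes of the string, so they never need to look for a literal \x in it.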

You are dealing with what's called an escape sequence in your double quoted string. The most common escape sequence in a double quoted string is "\n", but ruby allows you to use other escape sequences in strings too. Your string, "\xfe\xff", contains two hex escape sequences, which are of the form:
\xNN
Escape sequences represent ONE character. When ruby processes the string, it notices the "\" and converts the whole hex escape sequence to one character. After ruby processes the string, there is no \x left anywhere in the string. Therefore, looking for a \x in the string is fruitless--it doesn't exist. The same is true for the characters 'f' and 'e' found in your escape sequences: they do not exist in the string after ruby processes the string.
Note that ruby processes hex escape sequences in double quoted strings only, so the type of string--double or single quoted--is entirely relevant. In a single quoted string, the series of characters '\xfe' is four characters long because there is no such thing as a hex escape sequence in a single quoted string:
str = "\xfe"
puts str.length #=>1
str = '\xfe'
puts str.length #=>4
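One way to see the difference is to list the bytes of each literal. A small sketch, with illustrative variable names:

```ruby
double_quoted = "\xfe" # hex escape processed: a single byte
single_quoted = '\xfe' # no escape processing: four ordinary characters

double_bytes = double_quoted.bytes
single_bytes = single_quoted.bytes

p double_bytes # the one byte 0xFE
p single_bytes # the byte values of '\', 'x', 'f', 'e'
p single_bytes.map(&:chr)
```

The double-quoted string holds only the byte value 254, so the characters 'x', 'f' and 'e' are nowhere in it.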
Regexes behave like double quoted strings, so it is possible to use an entire escape sequence in a regex:
/\xfe/
When ruby processes the regex, then just like with a double quoted string, ruby converts the hex escape sequence to a single character. That allows you to search for the single character in a string containing the same hex escape sequence:
if "abc\xfe" =~ /\xfe/
If you pretend for a minute that the character ruby converts the escape sequence "\xfe" to is the character 'z', then that if statement is equivalent to:
if "abcz" =~ /z/
It's important to realize that the regex is not searching the string for a '\' followed by an 'x' followed by an 'f' followed by an 'e'. Those characters do not exist in the string.
The inspect() method allows you to see the escape sequences in a string by nullifying the escape sequences, like this:
str = "\\xfe\\xff"
puts str
--output:--
\xfe\xff
In a double quoted string, "\\" represents a literal backslash, while an escape sequence begins with only one slash.
Once you've nullified the escape sequences, then you can match the literal characters, like the two character sequence '\x'. But it's easier to just pick out the parts you want rather than matching the parts you don't want:
str = "\xfe\xff"
str = str.inspect #=> "\"\\xFE\\xFF\""
result = ""
str.scan(/x(..)/) do |groups_arr|
  result << groups_arr[0]
end
puts result.downcase
--output:--
feff
Here it is with gsub:
str = "\xfe\xff"
str = str.inspect #=>"\"\\xFE\\xFF\""
str.gsub!(/
"? #An optional quote mark
\\ #A literal '\'
x #An 'x'
(..) #Any two characters, captured in group 1
"? #An optional quote mark
/xm) do
Regexp.last_match(1)
end
puts str.downcase
--output:--
feff
Remember, a regex acts like a double quoted string, so to specify a literal \ in a regex, you have to write \\. However, in a regex you don't have to worry about a " being mistaken for the end of the regex, so you don't need to escape it, like you do in a double quoted string.
Just for fun:
str = "\xfe\xff"
result = ""
str.each_byte do |int_code|
  result << sprintf('%02x', int_code) # zero-pad so bytes below 0x10 keep two digits
end
p result
--output:--
"feff"

Why are you calling inspect? That's what adds the extra quotes.
Also, putting that in double quotes means the \x is interpreted as an escape sequence. Put it in single quotes and everything should be good.
'\xfe\xff'.gsub("\\x","")
=> "feff"

Lookbehind with the ^ character in a Ruby regex

Why, in Ruby, do the first two regexes fail to match while the third matches?
str = 'ID: 4'
regex1 = /^(?<=ID: )\d+/
regex2 = /\A(?<=ID: )\d+/
regex3 = /(?<=ID: )\d+/
str.match(regex1) # => nil
str.match(regex2) #=> nil
str.match(regex3) #=> #<MatchData "4">
The only difference is the ^ or \A characters, which match the beginning of a line and beginning of the string, respectively. It seems both should be matched by str.

The look-behind pattern (?<=ID: ) matches a position in the string that is preceded by «ID: ».
The anchors ^ and \A match a position at the beginning of the line or string.
So the pattern \A(?<=ID: ) asks that both match together, i.e. that the beginning of the string is preceded by «ID: ». Not gonna happen!

Both of these would work fine if you put the anchor inside of the lookbehind:
regex1 = /(?<=^ID: )\d+/
regex2 = /(?<=\AID: )\d+/
If the anchors are outside of the lookbehind then you are saying "from the start of the string, are the previous characters ID:". This will always fail because there won't be any characters before the start of the string.
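A minimal sketch of both placements, using the string from the question; the variable names are illustrative:

```ruby
str = 'ID: 4'

# Anchor outside the lookbehind: the match would have to start at the
# beginning of the line/string and still be preceded by "ID: ".
outside_caret = str.match(/^(?<=ID: )\d+/)
outside_a     = str.match(/\A(?<=ID: )\d+/)

# Anchor inside the lookbehind: "ID: " itself must start the line/string.
inside_caret = str.match(/(?<=^ID: )\d+/)
inside_a     = str.match(/(?<=\AID: )\d+/)

p outside_caret
p outside_a
p inside_caret
p inside_a
p inside_a.begin(0)
```

With the anchor inside, the match still consists of the digit only and starts at index 4; the "ID: " prefix is checked but not consumed.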

Look-ahead and look-behind are zero-length assertions, so the first two expressions don't match.
The first expression, for instance, requires \d+ to start at the beginning of a line and, at that same position, to be preceded by "ID: ". That's not possible, since nothing on the line can come before ^.

In the third expression, the lookbehind can occur anywhere and specifically occurs in the zero-width space before the 4. You can see that only the 4 is matched.
With ^ or \A, the zero-width space at the beginning of the string must match the lookbehind, which is impossible.

In regex1, which is /^(?<=ID: )\d+/, there has to be a beginning of a line that is preceded by ID:. The string in question does not have such a point.
In regex2, which is /\A(?<=ID: )\d+/, there has to be a beginning of a string that is preceded by ID:. No string has such a point.
In regex3, which is /(?<=ID: )\d+/, there has to be a point in the string that is preceded by ID: and followed by \d+. There is such a point in the string.

A look-behind doesn't change the position of the match.
/(?<=ID: )\d+/ is actually matched at the digit:
ID: 4
    ^
