How to get three consecutive backslashes in Regexp - ruby

I want to create a regular expression such as:
/\\\s*\\\s*$/
I am trying it in this way:
Regexp.new('\\\s*\\\s*$') # => /\\s*\\s*$/
What am I doing wrong?

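In a regexp, \\ matches a single backslash, because the backslash is the escape character. A single-quoted string also collapses each \\ into one backslash, so every backslash you want in the pattern has to be written twice in the string: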
rgx = Regexp.new('\\\\\\s*\\\\\\s*$')
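As @Cary Swoveland suggested, a {3} quantifier is another way to write repeated backslashes, but it describes a different pattern: three literal backslashes followed by zero or more s characters. The backslash in front of the quantifier must itself be escaped, so the string needs four of them:
rgx = Regexp.new('\\\\{3}s*\\\\{3}s*$')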
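A quick way to see what a single-quoted string really hands to Regexp.new is to print the source of the result. The sketch below only checks the attempt from the question and the six-backslash version; the constant names are made up for illustration.

```ruby
# The asker's attempt: '\\\s' is read as an escaped backslash plus "\s",
# so only two backslashes reach the pattern before each "s".
ATTEMPT = Regexp.new('\\\s*\\\s*$')

# Six backslashes in the string literal leave three in the pattern.
FIXED = Regexp.new('\\\\\\s*\\\\\\s*$')

puts ATTEMPT.source           # prints \\s*\\s*$
puts FIXED.source             # prints \\\s*\\\s*$
puts FIXED == /\\\s*\\\s*$/   # prints true
```

Comparing against a regexp literal, as in the last line, avoids having to reason about a second layer of string escaping.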

Using literal notation avoids some confusion. This compiles to what you said you want:
/\\\s*\\\s*$/
Though, to be clear, this still matches a single backslash, optional whitespace, another single backslash, and more optional whitespace. Backslashes are shown escaped when you inspect a regexp.
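To make that concrete, here is a small sketch of what the literal accepts. The test strings are written with double quotes, where "\\" is one backslash; the constant name is only for illustration.

```ruby
LITERAL = /\\\s*\\\s*$/

puts LITERAL.inspect            # prints /\\\s*\\\s*$/
puts LITERAL.match?("\\ \\ ")   # true: backslash, space, backslash, space
puts LITERAL.match?("\\\\")     # true: the whitespace is optional
puts LITERAL.match?("\\")       # false: only one backslash
```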

Related

Ruby - convert single backslash unicode to double backslash

I have a string that looks something like '\\u001E\\u001Csome_text\u001F' - where the first two characters are escaped with two backslashes and the last one only has one backslash.
I want to convert that string so that all the unicode literals have two backslashes in them, so the output would look like '\\u001E\\u001Csome_text\\u001F'. What ways could I go about doing this?
If the goal is to find instances of '\u' not preceded by a '\' character, then a negative lookbehind should suit the need to match:
(?<!\\)\\u
If you want to match the whole character, then you can also use a positive lookahead to verify the following characters:
(?<!\\)\\u(?=[0-9A-Z]{4})
Both of these will return the instances of '\u', which you can replace with '\\u'
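As a rough sketch, the second pattern can be plugged into gsub like this. The method name is hypothetical, and the block form is used so that the replacement is inserted as-is instead of being scanned for backslash sequences.

```ruby
# Hypothetical helper: doubles the backslash of any \uXXXX that has only one.
def double_unicode_escapes(text)
  # '\\\\u' in single quotes is two backslashes followed by "u".
  text.gsub(/(?<!\\)\\u(?=[0-9A-Z]{4})/) { '\\\\u' }
end

sample = '\\\\u001Esome_text\u001F'
puts sample                          # prints \\u001Esome_text\u001F
puts double_unicode_escapes(sample)  # prints \\u001Esome_text\\u001F
```

Escapes that already have two backslashes are left alone: the first backslash is not followed by u, and the second one fails the lookbehind.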

GSUB and Forward Slash usage in Ruby

I often see the gsub function being called with the pattern parameter enclosed in forward slashes. For example:
>> phrase = "*** and *** ran to the ###."
>> phrase.gsub(/\*\*\*/, "WOOF")
=> "WOOF and WOOF ran to the ###."
I thought maybe it had something to do with escaping asterisks, but using single quotes and double quotes works just as well:
>> phrase = "*** and *** ran to the ###."
>> phrase.gsub('***', "WOOF")
=> "WOOF and WOOF ran to the ###."
>> phrase.gsub("***", "WOOF")
=> "WOOF and WOOF ran to the ###."
Is it just convention to use forward slash? What am I missing?
Use forward slashes if you need to use regular expressions.
If you use a string argument with gsub, it will just do a plain character match.
In your example, you need backslashes to escape the asterisks when using a regular expression, because an asterisk has a special meaning in regex (match the preceding element zero or more times). They are not necessary when using a string, because the characters are matched exactly.
In your example, you probably don't need to use a regular expression, since it is a simple pattern. However, if you wanted to match *** only when it was at the beginning of a line (e.g. the first bunch in your example), then you would want to use a regex. Note that in Ruby ^ anchors at the start of every line; use \A to anchor at the start of the whole string. For example:
phrase.gsub(/^\*{3}/, "WOOF")
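Putting the three variants side by side may help. This is just a sketch using the phrase from the question; Regexp.escape is shown as one way to build an escaped pattern from a plain string.

```ruby
PHRASE = "*** and *** ran to the ###."

puts PHRASE.gsub('***', 'WOOF')      # WOOF and WOOF ran to the ###.
puts PHRASE.gsub(/\*\*\*/, 'WOOF')   # WOOF and WOOF ran to the ###.
puts PHRASE.gsub(/^\*{3}/, 'WOOF')   # WOOF and *** ran to the ###.

# Regexp.escape adds the backslashes for you.
puts Regexp.escape('***')            # \*\*\*
```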
For more information on regular expressions, see: http://www.regular-expressions.info/.
For more information on using regular expressions in Ruby, see: http://ruby-doc.org/core-2.2.0/Regexp.html.
To play with regular expressions as they work in Ruby, try: http://rubular.com/.
What you are missing is in the documentation:
The pattern is typically a Regexp; if given as a String, any regular expression metacharacters it contains will be interpreted literally, e.g. '\d' will match a backslash followed by 'd', instead of a digit.
http://ruby-doc.org/core-2.1.4/String.html#method-i-gsub
In other words, you can give a string or a regular expression. Regular expressions can be delimited several ways:
Regexps are created using the /.../ and %r{...} literals, and by the Regexp::new constructor.
http://ruby-doc.org/core-2.2.2/Regexp.html
The benefit of %r and its alternate delimiters is that you can usually find a delimiter that doesn't collide with characters in the pattern, so you avoid having to escape them.
* has to be escaped because it has special meaning in a regex, but in a string it does not.
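A pattern full of forward slashes shows the difference. The path and constant names below are invented for the example.

```ruby
PATH = '/usr/local/bin/ruby'

# With /.../ every slash in the pattern has to be escaped.
SLASHES = /\/usr\/local\//
# With %r{...} the slashes can be written as they are.
BRACES = %r{/usr/local/}

puts SLASHES.match?(PATH)   # true
puts BRACES.match?(PATH)    # true
```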

split string by spaces properly accounting for quotes and backslashes (ruby)

I want to split a string (an untrusted foreign line, like an exim_mainlog line) by spaces, but not by spaces that are inside double quotes. A quote escaped by a backslash, like \", should not end the quoted part, and a backslash that is itself escaped, like \\, should not escape what follows. I would like to avoid slowly parsing the string by hand with an FSM.
Example line:
U=mailnull T="test \"quote\" and wild blackslash\\" P=esmtps
Should be split into:
["U=mailnull", "T=\"test \\\"quote\\\" and wild blackslash\\\"", "P=esmtps"]
(Btw, I think Ruby should have a method for this kind of split, sigh.)
I think I found a simple enough solution: input.scan(/(?:"(?:\\.|[^"])*"|[^" ])+/)
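Here is that scan call applied to the example line, as a sketch. The helper name is hypothetical, and the input is written in single quotes so that \" stays a backslash plus a quote and \\\\ becomes two backslashes.

```ruby
TOKEN = /(?:"(?:\\.|[^"])*"|[^" ])+/

# Hypothetical helper wrapping the scan call from the post.
def split_fields(line)
  line.scan(TOKEN)
end

line = 'U=mailnull T="test \"quote\" and wild blackslash\\\\" P=esmtps'
puts split_fields(line)
# U=mailnull
# T="test \"quote\" and wild blackslash\\"
# P=esmtps
```

Because [^"] also accepts a backslash, an unterminated quoted part can still be matched after backtracking; using [^"\\] there would be a stricter variant if that matters for your input.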

ruby regex about escape a escape

I am trying to write a regex in Ruby to test a string such as:
"GET \"anything/here.txt\""
The point is that anything can appear inside the outer double quotes, but every double quote inside them must be escaped by a backslash (otherwise it doesn't match). So for example
"GET "anything/here.txt""
this will not be a proper line.
I tried many ways to write the regex but it doesn't work. Can anyone help me with this? Thank you.
You can use positive lookbehind:
/\A"((?<=\\)"|[^"])*"\z/
This does exactly what you asked for: "if a double quote appears inside the outer double quotes without a backslash prefixed, it doesn't match."
Some comments:
\A,\z: These match only at the beginning and end of the string. So the pattern has to match against the whole string, not a part of it.
(?<=): This is the syntax for positive lookbehind; it asserts that a pattern must match directly before the current position. So (?<=\\)" matches "a double quote which is preceded by a backslash".
[^"]: This matches "any character which is not a double quote".
One point about this regex is that it will also match an inner double quote which is preceded by two backslashes. If that is a problem, post a comment and I'll fix it.
If your version of Ruby doesn't have lookbehind, you could do something like:
/\A"(\\.|[^"\\])*"\z/
Note that unlike the first regexp, this one does not count a double backslash as escaping a quote (rather, the first backslash escapes the second one), so "\\"" will not match.
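The difference between the two patterns shows up on exactly that input. A small sketch, with invented constant names and the test strings in single quotes so the backslashes are literal:

```ruby
LOOKBEHIND = /\A"((?<=\\)"|[^"])*"\z/
CONSUMING  = /\A"(\\.|[^"\\])*"\z/

good   = '"GET \"anything/here.txt\""'   # inner quotes escaped
bad    = '"GET "anything/here.txt""'     # inner quotes not escaped
tricky = '"\\\\""'                       # quote, two backslashes, two quotes

puts LOOKBEHIND.match?(good)     # true
puts CONSUMING.match?(good)      # true
puts LOOKBEHIND.match?(bad)      # false
puts CONSUMING.match?(bad)       # false
puts LOOKBEHIND.match?(tricky)   # true
puts CONSUMING.match?(tricky)    # false
```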
This works:
/"(?<method>[A-Z]*)\s*\\\"(?<file>[^\\"]*)\\""/
See it on Rubular.
Edit:
"(?<method>[A-Z]*)\s(?<content>(\\\"|[a-z\/\.]*)*)"
See it here.
Edit 2: without (? ...) sequence (for Ruby 1.8.6):
"([A-Z]*)\s((\\\"|[a-z\/\.]*)*)"
Rubular here.
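For the first pattern in this answer, the named groups can be read back from the match data. This is only a sketch; the helper name and the hash it returns are assumptions made for the example.

```ruby
REQUEST = /"(?<method>[A-Z]*)\s*\\\"(?<file>[^\\"]*)\\""/

# Hypothetical helper: returns the captures, or nil when the line doesn't match.
def parse_request(line)
  m = REQUEST.match(line)
  m && { method: m[:method], file: m[:file] }
end

request = parse_request('"GET \"anything/here.txt\""')
puts request[:method]   # GET
puts request[:file]     # anything/here.txt

p parse_request('"GET "anything/here.txt""')   # nil
```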
Tested this on Rubular successfully:
\"GET \\\".*\\\"\"
Breakdown:
\" - Escape the " for the regex string, meaning the literal character "
GET - Assuming you just want GET, then this is explicit
\\" - Escape \ and " to get the literal string \"
.* - 0 or more of any character other than \n
\\"\" - Escapes for the literal \""
I'm not sure a regex is really your best tool here, but if you insist on using one, I recommend thinking of the string as a sequence of tokens: a quote, then a series of things that are either \\, \" or anything that isn't a quote, then a closing quote at the end. So this:
^"(\\\\|\\"|[^"])*"$
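Written as a Ruby literal, the token-based pattern can be checked against the two lines from the question. The constant name is made up for this sketch.

```ruby
TOKENS = /^"(\\\\|\\"|[^"])*"$/

puts TOKENS.match?('"GET \"anything/here.txt\""')   # true
puts TOKENS.match?('"GET "anything/here.txt""')     # false
```

In Ruby, ^ and $ anchor at line boundaries, so \A and \z would be the choice if the whole string must match.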

How to make Ruby ignore backslash in strings?

Is there some way in Ruby that I can avoid having to put double-backslash in Ruby strings (like what can be done in C#):
For example, in C# we can prefix a string with @ and then the backslash in the string does not need to be escaped:
@"C:\Windows, C:\ABC"
Without @ we would need to escape the backslash:
"C:\\Windows, C:\\ABC"
Is there something similar in Ruby?
Use single quotes
my_string = 'C:\Windows'
See more in the Strings section here
You can also use %q, which behaves like single quotes. The doubled backslash below is only how the string is displayed when inspected:
%q{C:\Windows} # => "C:\\Windows"
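A short sketch comparing the three spellings, using the paths from the question (the constant names are invented):

```ruby
SINGLE  = 'C:\Windows, C:\ABC'
DOUBLE  = "C:\\Windows, C:\\ABC"
PERCENT = %q{C:\Windows, C:\ABC}

puts SINGLE == DOUBLE    # true
puts SINGLE == PERCENT   # true
puts SINGLE              # C:\Windows, C:\ABC
puts SINGLE.inspect      # "C:\\Windows, C:\\ABC"
```

One caveat: a backslash directly before the closing single quote still has to be doubled, because \' is read as an escaped quote.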
