ruby rspec and strings comparison - ruby

I'm not a ruby expert and may be this will seem a silly question...but I'm too courious about an oddity (I think) I've found in RSpec matcher called match.
You know match takes in input a string or a regex. Example:
"test".should match "test" #=> will pass
"test".should match /test/ #=> will pass
The strange begins when you insert special regex characters in the input string:
"*test*".should match "*test*" #=> will fail throwing a regex exception
This means (I thought) that input strings are interpreted as regex, then I should escape special regex characters to make it works:
"*test*".should match "\*test\*" #=> will fail with same exception
"*test*".should match /\*test\*/ #=> will pass
From this basic test, I understand that match treats input strings as regular expressions but it does not allow you to escape special regex characters.
Am I true? Is not this a singular behavior? I mean, it's a string or a regex!
EDIT AFTER ANSWER:
Following DigitalRoss (right) answer the following tests passed:
"*test*".should match "\\*test\\*" #=> pass
"*test*".should match '\*test\*' #=> pass
"*test*".should match /\*test\*/ #=> pass

What you are seeing is the different interpretation of backslash-escaped characters in String vs Regexp. In a soft (") quoted string, \* becomes a *, but /\*/ is really a backslash followed by a star.
If you use hard quotes (') for the String objects or double the backslash characters (only for the Strings, though) then your tests should produce the same results.

Related

Single backslash for Ruby gsub replacement value?

Does anyone know how to provide a single backslash as the replacement value in Ruby's gsub method? I thought using double backslashes for the replacement value would result in a single backslash but it results in two backslashes.
Example: "a\b".gsub("\", "\\")
Result: a\\b
I also get the same result using a block:
Example: "a\b".gsub("\"){"\\"}
Result: a\\b
Obviously I can't use a single backslash for the replacement value since that would just serve to escape the quote that follows it. I've also tried using single (as opposed to double) quotes around the replacement value but still get two backslashes in the result.
EDIT: Thanks to the commenters I now realize my confusion was with how the Rails console reports the result of the operation (i.e. a\\b). Although the strings 'a\b' and 'a\\b' appear to be different, they both have the same length:
'a\b'.length (3)
'a\\b'.length (3)
You can represent a single backslash by either "\\" or '\\'. Try this in irb, where
"\\".size
correctly outputs 1, showing that you indeed have only one character in this string, not 2 as you think. You can also do a
puts "\\"
Similarily, your example
puts("a\b".gsub("\", "\\"))
correctly prints
a\b

Working with Ruby class: Capitalizing a string

I'm trying to get my head around how to work with Classes in Ruby and would really appreciate some insight on this area. Currently, I've got a rather simple task to convert a string with the start of each word capitalized. For example:
Not Jaden-Cased: "How can mirrors be real if our eyes aren't real"
Jaden-Cased: "How Can Mirrors Be Real If Our Eyes Aren't Real"
This is my code currently:
class String
def toJadenCase
split
capitalize
end
end
#=> usual case: split.map(&:capitalize).join(' ')
Output:
Expected: "The Moment That Truth Is Organized It Becomes A Lie.",
instead got: "The moment that truth is organized it becomes a lie."
I suggest you not pollute the core String class with the addition of an instance method. Instead, just add an argument to the method to hold the string. You can do that as follows, by downcasing the string then using gsub with a regular expression.
def to_jaden_case(str)
str.downcase.gsub(/(?<=\A| )[a-z]/) { |c| c.upcase }
end
to_jaden_case "The moMent That trUth is organized, it becomes a lie."
#=> "The Moment That Truth Is Organized, It Becomes A Lie."
Ruby's regex engine performs the following operations.
(?<=\A| ) : use a positive lookbehind to assert that the following match
is immediately preceded by the start of the string or a space
[a-z] : match a lowercase letter
(?<=\A| ) can be replaced with the negative lookbehind (?<![^ ]), which asserts that the match is not preceded by a character other than a space.
Notice that by using String#gsub with a regular expression (unlike the split-process-join dance), extra spaces are preserved.
When spaces are to be matched by a regular expression one often sees whitespaces (\s) matched instead. Here, for example, /(?<=\A|\s)[a-z]/ works fine, but sometimes matching whitespaces leads to problems, mainly because they also match newlines (\n) (as well as spaces, tabs and a few other characters). My advice is to match space characters if spaces are to be matched. If tabs are to be matched as well, use a character class ([ \t]).
Try:
def toJadenCase
self.split.map(&:capitalize).join(' ')
end

What is the difference between these three alternative ways to write Ruby regular expressions?

I want to match the path "/". I've tried the following alternatives, and the first two do match, but I don't know why the third doesn't:
/\A\/\z/.match("/") # <MatchData "/">
"/\A\/\z/".match("/") # <MatchData "/">
Regexp.new("/\A\/\z/").match("/") # nil
What's going on here? Why are they different?
The first snippet is the only correct one.
The second example is... misleading. That string literal "/\A\/\z/" is, obviously, not a regex. It's a string. Strings have #match method which converts its argument to a regexp (if not already one) and match against it. So, in this example, it's '/' that is the regular expression, and it matches a forward slash found in the other string.
The third line is completely broken: don't need the surrounding slashes there, they are part of regex literal, which you didn't use. Also use single quoted strings, not double quoted (which try to interpret escape sequences like \A)
Regexp.new('\A/\z').match("/") # => #<MatchData "/">
And, of course, none of the above is needed if you just want to check if a string consists of only one forward slash. Just use the equality check in this case.
s == '/'

Ruby Regexp: difference between new and union with a single regexp

I have simplified the examples. Say I have a string containing the code for a regex. I would like the regex to match a literal dot and thus I want it to be:
\.
So I create the following Ruby string:
"\\."
However when I use it with Regexp.union to create my regex, I get this:
irb(main):017:0> Regexp.union("\\.")
=> /\\\./
That will match a slash followed by a dot, not just a single dot. Compare the previous result to this:
irb(main):018:0> Regexp.new("\\.")
=> /\./
which gives the Regexp I want but without the needed union.
Could you explain why Ruby acts like that and how to make the correct union of regexes ? The context of utilization is that of importing JSON strings describing regexes and union-ing them in Ruby.
Passing a string to Regexp.union is designed to match that string literally. There is no need to escape it, Regexp.escape is already called internally.
Regexp.union(".")
#=> /\./
If you want to pass regular expressions to Regexp.union, don't use strings:
Regexp.union(Regexp.new("\\."))
#=> /\./
\\. is where you went wrong I think, if you want to match a . you should just use the first one \. Now you have a \ and \. and the first one is escaped.
To be safe just use the standard regex provided by Ruby which would be Regexp.new /\./ in your case
If you want to use union just use Regexp.union "." which should return /\./
From the ruby regex class:
Regexp.union("a+b*c") #=> /a\+b\*c/

Backslash + captured group within Ruby regular expression

How do I excape a backslash before a captured group?
Example:
"foo+bar".gsub(/(\+)/, '\\\1')
What I expect (and want):
foo\+bar
what I unfortunately get:
foo\\1bar
How do I escape here correctly?
As others have said, you need to escape everything in that string twice. So in your case the solution is to use '\\\\\1' or '\\\\\\1'. But since you asked why, I'll try to explain that part.
The reason is that replacement sequence is being parsed twice--once by Ruby and once by the underlying regular expression engine, for whom \1 is its own escape sequence. (It's probably easier to understand with double-quoted strings, since single quotes introduce an ambiguity where '\\1' and '\1' are equivalent but '\' and '\\' are not.)
So for example, a simple replacement here with a captured group and a double quoted string would be:
"foo+bar".gsub(/(\+)/, "\\1") #=> "foo+bar"
This passes the string \1 to the regexp engine, which it understands as a reference to a capture group. In Ruby string literals, "\1" means something else entirely (ASCII character 1).
What we actually want in this case is for the regexp engine to receive \\\1. It also understands \ as an escape character, so \\1 is not sufficient and will simply evaluate to the literal output \1. So, we need \\\1 in the regexp engine, but to get to that point we need to also make it past Ruby's string literal parser.
To do that, we take our desired regexp input and double every backslash again to get through Ruby's string literal parser. \\\1 therefore requires "\\\\\\1". In the case of single quotes one slash can be omitted as \1 is not a valid escape sequence in single quotes and is treated literally.
Addendum
One of the reasons this problem is usually hidden is thanks to the use of /.+/ style regexp quotes, which Ruby treats in a special way to avoid the need to double escape everything. (Of course, this doesn't apply to gsub replacement strings.) But you can still see it in action if you use a string literal instead of a regexp literal in Regexp.new:
Regexp.new("\.").match("a") #=> #<MatchData "a">
Regexp.new("\\.").match("a") #=> nil
As you can see, we had to double-escape the . for it to be understood as a literal . by the regexp engine, since "." and "\." both evaluate to . in double-quoted strings, but we need the engine itself to receive \..
This happens due to a double string escaping. You should use 5 slashes in this case.
"foo+bar".gsub(/([+])/, '\\\\\1')
Adding \ two more times escapes this properly.
irb(main):011:0> puts "foo+bar".gsub(/(\+)/, '\\\\\1')
foo\+bar
=> nil

Resources