Why does Ruby not define Symbol#=~ (regex match operator)?

Ruby doesn't automatically stringify symbols when performing a regex match on them, which is easy to do when you have variables containing symbols and you forget that you need to call #to_s on them before trying a regex match:
>> :this =~ /./
=> false
>> :this =~ :this
=> false
>> :this =~ /:this/
=> false
It turns out that :=~ is defined in Object, Ruby 1.8's primordial class:
http://rubybrain.com/api/ruby-1.8.7/doc/index.html?a=M000308&name==~
Of course, the implementation just returns false, leaving it up to subclasses like String and Regexp to provide meaningful implementations.
So why doesn't Symbol provide something like the following?
def =~(pattern)
  self.to_s =~ pattern
end
Any Ruby linguists out there know?

I don't know the reason why it was decided that 1.8 should behave this way, but 1.9 changed in that regard:
>> RUBY_VERSION #=> "1.9.2"
>> :this =~ /./ #=> 0
>> :this =~ /is/ #=> 2
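On Ruby 1.9 and later, Symbol#=~ behaves as if the symbol had been converted with to_s first. A minimal sketch (the variable names are only illustrative):

```ruby
# Symbol#=~ matches against the symbol's name (Ruby 1.9+).
sym = :this

hit  = sym =~ /is/       # index of the first match
miss = sym =~ /xyz/      # nil when nothing matches
same = sym.to_s =~ /is/  # explicit conversion, which also works on 1.8

puts hit.inspect, miss.inspect, same.inspect
```

Calling to_s explicitly is still the portable choice if the code may run on 1.8.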

Related

How to convert a backslash hexadecimal string to a binary string in Ruby? [duplicate]

Does Ruby have any built-in method for escaping and unescaping strings? In the past, I've used regular expressions; however, it occurs to me that Ruby probably does such conversions internally all the time. Perhaps this functionality is exposed somewhere.
So far I've come up with these functions. They work, but they seem a bit hacky:
def escape(s)
  s.inspect[1..-2]
end
def unescape(s)
  eval %Q{"#{s}"}
end
Is there a better way?
Ruby 2.5 added String#undump as a complement to String#dump:
$ irb
irb(main):001:0> dumped_newline = "\n".dump
=> "\"\\n\""
irb(main):002:0> undumped_newline = dumped_newline.undump
=> "\n"
With it:
def escape(s)
  s.dump[1..-2]
end
def unescape(s)
  "\"#{s}\"".undump
end
$ irb
irb(main):001:0> escape("\n \" \\")
=> "\\n \\\" \\\\"
irb(main):002:0> unescape("\\n \\\" \\\\")
=> "\n \" \\"
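As a rough check of the dump/undump pair, the sketch below round-trips a string containing a tab, a quote and a non-ASCII character. The try_unescape helper is hypothetical (it is not part of the answer); it only illustrates that undump raises on input that is not a valid dumped string.

```ruby
# escape/unescape built on String#dump and String#undump (Ruby 2.5+).
def escape(s)
  s.dump[1..-2]
end

def unescape(s)
  "\"#{s}\"".undump
end

# Hypothetical helper: return nil instead of raising on malformed input.
def try_unescape(s)
  unescape(s)
rescue RuntimeError
  nil
end

original   = "tab\there, quote \" and caf\u00e9"
round_trip = unescape(escape(original))
broken     = try_unescape('abc\\') # a lone trailing backslash is not a valid dump

puts escape(original)
puts round_trip == original
puts broken.inspect
```

Whether to rescue or let the exception propagate depends on where the input comes from; the rescue here is just one option.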
There are a bunch of escaping methods; here are some of them:
# Regexp escaping
>> Regexp.escape('\*?{}.')
=> \\\*\?\{\}\.
# URL escaping (note: URI.escape was removed in Ruby 3.0)
>> URI.escape("test=100%")
=> "test=100%25"
>> CGI.escape("test=100%")
=> "test%3D100%25"
So it really depends on the problem you need to solve. But I would avoid using inspect for escaping.
Update: there is also String#dump, which is closely related to inspect and looks like what you need:
>> "\n\t".dump
=> "\"\\n\\t\""
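Which method fits depends on the target syntax. The following sketch puts a few standard-library escapers side by side; the sample strings are made up for illustration, and URI.encode_www_form_component is shown as one possible stand-in for the older URI.escape.

```ruby
require 'cgi'
require 'uri'

# Regexp.escape: make a literal string safe to embed in a pattern.
literal = '1.5*2'
pattern = Regexp.new(Regexp.escape(literal))
matches_literal = pattern.match?('price: 1.5*2')
matches_other   = pattern.match?('1x5552') # would match if "." and "*" were left unescaped

# CGI.escape / CGI.unescape: percent-encoding for query-string values.
cgi_escaped   = CGI.escape('test=100%')
cgi_roundtrip = CGI.unescape(cgi_escaped)

# URI.encode_www_form_component: form encoding (a space becomes "+").
form_escaped   = URI.encode_www_form_component('a b&c')
form_roundtrip = URI.decode_www_form_component(form_escaped)

puts matches_literal, matches_other, cgi_escaped, form_escaped
```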
Caleb's function was the nearest thing to the reverse of String#inspect I was able to find; however, it contained two bugs:
\\ was not handled correctly.
\x.. retained the backslash.
I fixed the above bugs and this is the updated version:
UNESCAPES = {
  'a' => "\x07", 'b' => "\x08", 't' => "\x09",
  'n' => "\x0a", 'v' => "\x0b", 'f' => "\x0c",
  'r' => "\x0d", 'e' => "\x1b", "\\\\" => "\x5c",
  "\"" => "\x22", "'" => "\x27"
}
def unescape(str)
  # Unescape all the things
  str.gsub(/\\(?:([#{UNESCAPES.keys.join}])|u([\da-fA-F]{4}))|\\0?x([\da-fA-F]{2})/) {
    if $1
      # the backslash key is stored doubled for the character class, so handle it directly
      if $1 == '\\' then '\\' else UNESCAPES[$1] end
    elsif $2 # unescape \u0000 unicode
      [$2.hex].pack('U*')
    elsif $3 # unescape \0xff or \xff
      [$3].pack('H2')
    end
  }
end
# To test it
while (line = STDIN.gets) # stop at end of input instead of passing nil to unescape
  puts unescape(line)
end
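For a non-interactive check, here is a reduced sketch of the same table-driven gsub technique. It is not the function above: SIMPLE_ESCAPES, ESCAPE_PATTERN and unescape_sketch are hypothetical names, and only a handful of escapes are covered.

```ruby
# Reduced lookup table: escape letter => replacement character.
SIMPLE_ESCAPES = {
  'n' => "\n", 't' => "\t", 'r' => "\r",
  '\\' => '\\', '"' => '"', "'" => "'"
}.freeze

# One alternation per escape family: simple, \uXXXX, \xXX.
ESCAPE_PATTERN = /\\(?:([ntr\\"'])|u(\h{4})|x(\h{2}))/

def unescape_sketch(str)
  str.gsub(ESCAPE_PATTERN) do
    if $1
      SIMPLE_ESCAPES[$1]
    elsif $2
      [$2.hex].pack('U')   # code point to a UTF-8 character
    else
      [$3].pack('H2')      # two hex digits to a single raw byte
    end
  end
end

puts unescape_sketch('a\\nb')
puts unescape_sketch('\\x41')
```

Because the alternation consumes an escaped backslash as a unit, the input backslash, backslash, n comes out as a literal backslash followed by n rather than a newline.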
Update: I no longer agree with my own answer, but I'd prefer not to delete it: I suspect that others may go down this wrong path, and there has already been a lot of discussion of this answer and its alternatives, so I think it still contributes to the conversation. Please don't use this answer in real code.
If you don't want to use eval, but are willing to use the YAML module, you can use it instead:
require 'yaml'
def unescape(s)
  YAML.load(%Q(---\n"#{s}"\n))
end
The advantage to YAML over eval is that it is presumably safer. cane disallows all usage of eval. I've seen recommendations to use $SAFE along with eval, but that is not available via JRuby currently.
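A small sketch of the YAML idea, using YAML.safe_load rather than YAML.load as an assumption on my part (it restricts what can be deserialized). The yaml_unescape name is hypothetical. YAML's double-quoted escape set is not identical to Ruby's, so this only covers escapes the two share.

```ruby
require 'yaml'

# Wrap the text in a YAML double-quoted scalar and let the parser decode it.
def yaml_unescape(s)
  YAML.safe_load(%Q("#{s}"))
end

tabbed  = yaml_unescape('a\\tb')          # backslash-t becomes a tab
two_ln  = yaml_unescape('line1\\nline2')  # backslash-n becomes a newline

puts tabbed.inspect
puts two_ln.inspect
```

Input that contains an unescaped double quote would end the scalar early, so this approach needs validated or pre-escaped input.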
For what it is worth, Python does have native support for unescaping backslashes.
Ruby's inspect can help:
"a\nb".inspect
=> "\"a\\nb\""
Normally if we print a string with an embedded line-feed, we'd get:
puts "a\nb"
a
b
If we print the inspected version:
puts "a\nb".inspect
"a\nb"
Assign the inspected version to a variable and you'll have the escaped version of the string.
To undo the escaping, eval the string:
puts eval("a\nb".inspect)
a
b
I don't really like doing it this way. It's more of a curiosity than something I'd do in practice.
YAML's ::unescape doesn't seem to escape quote characters, e.g. ' and ". I'm guessing this is by design, but it makes me sad.
You definitely do not want to use eval on arbitrary or client-supplied data.
This is what I use. Handles everything I've seen and doesn't introduce any dependencies.
UNESCAPES = {
  'a' => "\x07", 'b' => "\x08", 't' => "\x09",
  'n' => "\x0a", 'v' => "\x0b", 'f' => "\x0c",
  'r' => "\x0d", 'e' => "\x1b", "\\\\" => "\x5c",
  "\"" => "\x22", "'" => "\x27"
}
def unescape(str)
  # Unescape all the things
  str.gsub(/\\(?:([#{UNESCAPES.keys.join}])|u([\da-fA-F]{4}))|\\0?x([\da-fA-F]{2})/) {
    if $1
      if $1 == '\\' then '\\' else UNESCAPES[$1] end
    elsif $2 # unescape \u0000 unicode
      [$2.hex].pack('U*')
    elsif $3 # unescape \0xff or \xff
      [$3].pack('H2')
    end
  }
end
I suspect that Shellwords.escape will do what you're looking for
https://ruby-doc.org/stdlib-1.9.3/libdoc/shellwords/rdoc/Shellwords.html#method-c-shellescape
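Note that Shellwords escapes for a Bourne-style shell command line, which is a different job from turning control characters into backslash sequences. A minimal sketch, with a made-up filename:

```ruby
require 'shellwords'

filename = "my file's name.txt"

# Backslash-escape every character the shell would treat specially.
escaped = Shellwords.escape(filename)

# Shellwords.split parses the escaped form back into its words.
words = Shellwords.split(escaped)

puts escaped
puts words.inspect
```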

Overriding the =~ operator of Regexp in a subclass Subregex results in weird behaviour when executing "example" =~ subregex

Given the following example in Ruby 2.0.0:
class Regexp
  def self.build
    # Note: in a double-quoted string "\d" is just "d"; '-?[\d_]+' would keep the backslash
    NumRegexp.new("-?[\d_]+")
  end
end
class NumRegexp < Regexp
  def match(value)
    'hi two'
  end
  def =~(value)
    'hi there'
  end
end
var_ex = Regexp.build
var_ex =~ '12' # => "hi there", as expected
'12' =~ var_ex # => nil, why? "hi there" or "hi two" was expected
According to the documentation of Ruby of the =~ operator for the class String:
str =~ obj → fixnum or nil
"If obj is a Regexp, use it as a pattern to match against str, and returns the position the match starts, or nil if there is no match. Otherwise, invokes obj.=~, passing str as an argument. The default =~ in Object returns nil."
http://www.ruby-doc.org/core-2.0.0/String.html#method-i-3D-7E
It is a fact that the variable var_ex is an object of class NumRegexp, hence, it is not a Regexp object. Therefore, it should invoke the method obj.=~ passing the string as an argument, as indicated in the documentation and returning "hi there".
In another case, maybe as NumRegexp is a subclass of Regexp it could be considered a Regexp type. Then, "If obj is a Regexp use it as a pattern to match against str". It should return "hi two" in that case.
What is wrong in my reasoning? What do I have to do to achieve the desired functionality?
I've found that the expression:
var_ex =~ '12'
isn't the same as:
'12' =~ var_ex
It seems that String has no method that calls back to the Regexp subclass's #=~ method. This is a bug that has already been reported and is expected to be solved in 2.2.0. So you have to declare it explicitly:
class String
  alias :__system_match :=~
  def =~ regex
    regex.is_a?( Regexp ) && ( regex =~ self ) || __system_match( regex )
  end
end
'12' =~ /-?[\d_]+/
# => 0
This is a possible and acceptable solution using monkey patching, but it has some problems to take into account:
The problem with this is that we have now polluted the namespace with a superfluous __system_match method. This method will show up in our documentation, it will show up in code completion in our IDEs, it will show up during reflection. Also, it still can be called, but presumably we monkey patched it, because we didn't like its behavior in the first place, so we might not want other people to call it.
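The reason is that you are calling the =~ method on a string, not on your NumRegexp object. You need to tell String how to behave. Prepend a module so that super still reaches the original String#=~:
module NumRegexpMatch
  def =~(reg)
    return reg =~ self if reg.is_a?(NumRegexp)
    super # the original String#=~
  end
end
class String
  prepend NumRegexpMatch
end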
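To check the behaviour end to end, here is a self-contained sketch. The module name NumRegexpDispatch is hypothetical, the pattern is written with single quotes so that \d survives, and String.prepend as a public method assumes Ruby 2.1 or later.

```ruby
class NumRegexp < Regexp
  def =~(value)
    'hi there'
  end
end

# Hypothetical module: prepended so that `super` reaches the original String#=~.
module NumRegexpDispatch
  def =~(pattern)
    return pattern =~ self if pattern.is_a?(NumRegexp)
    super
  end
end

num = NumRegexp.new('-?[\d_]+')

before = ('12' =~ num)   # String#=~ as shipped, before any patching

String.prepend(NumRegexpDispatch)

after = ('12' =~ num)    # now routed to NumRegexp#=~
plain = ('12' =~ /2/)    # ordinary regexps still go through String#=~

puts before.inspect, after.inspect, plain.inspect
```

One caveat to keep in mind: special variables such as $~ and $1 are frame-local, so after a patch like this they are likely to be set inside the wrapper method rather than in the caller. That is a reason to treat any override of String#=~ with caution.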

Does Ruby regular expression have a not match operator like "!~" in Perl?

I just want to know whether Ruby regex has a not-match operator just like !~ in Perl. I feel it's inconvenient to use (?!xxx) or (?<!xxxx) because you cannot use regex patterns in the xxx part.
Yes: !~ works just fine – you probably thought it wouldn’t because it’s missing from the documentation page of Regexp. Nevertheless, it works:
irb(main):001:0> 'x' !~ /x/
=> false
irb(main):002:0> 'x' !~ /y/
=> true
AFAIK (?!xxx) is supported:
2.1.5 :021 > 'abc1234' =~ /^abc/
=> 0
2.1.5 :022 > 'def1234' =~ /^abc/
=> nil
2.1.5 :023 > 'abc1234' =~ /^(?!abc)/
=> nil
2.1.5 :024 > 'def1234' =~ /^(?!abc)/
=> 0
Back in Perl, 'foobar' !~ /bar/ was perfectly perlish to test that the string doesn't contain "bar".
In Ruby, particularly with a modern style guide, I think a more explicit solution is more conventional and easy to understand:
input = 'foobar'
do_something unless input.match?(/bar/)
needs_bar = !input.match?(/bar/)
That said, I think it would be spiffy if there was a .no_match? method.
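The options mentioned in these answers can be compared side by side. A minimal sketch with made-up sample data; String#match? assumes Ruby 2.4+ and Enumerable#grep_v assumes Ruby 2.3+.

```ruby
input = 'foobar'

lacks_bar = input !~ /bar/          # operator form, as in Perl
lacks_baz = input !~ /baz/
explicit  = !input.match?(/baz/)    # explicit negation of match?

# For collections, grep_v keeps the elements that do NOT match.
words      = %w[foo bar baz]
without_ba = words.grep_v(/ba/)

puts lacks_bar, lacks_baz, explicit, without_ba.inspect
```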

simplest way to check for just spaces in ruby

So I know that in Ruby x.nil? will test whether x is nil.
What is the simplest way to test if x equals ' ', or '  ' (two spaces), or '   ' (three spaces), etc.?
Basically, I'm wondering what the best way is to test whether a variable is all whitespace.
If you are using Rails, you can simply use:
x.blank?
This is safe to call when x is nil, and returns true if x is nil or all whitespace.
If you aren't using Rails you can get it from the activesupport gem. Install with gem install activesupport. In your file either require 'active_support/core_ext' to get all Active Support extensions to the base classes, or require 'active_support/core_ext/string' to get just the extensions to the String class. Either way, the blank? method will be available after the require.
"best" depends on the context, but here is a simple way.
some_string.strip.empty?
s =~ /\A\s*\Z/
Regex solution. Here's a short ruby regex tutorial.
If x is all whitespace, then x.strip will be the empty string. So you can do:
if not x.nil? and x.strip.empty? then
  puts "It's all whitespace!"
end
Alternatively, using a regular expression, x =~ /\S/ will return nil if and only if x contains no non-whitespace characters:
if not (x.nil? or x =~ /\S/) then
  puts "It's all whitespace!"
end
a = " "
a.each_byte do |x|
  if x == 32
    puts "space"
  end
end
Based on your comment I think you can extend the String class and define a spaces? method as follows:
$ irb
>> s = " "
=> " "
>> s.spaces?
NoMethodError: undefined method `spaces?' for " ":String
from (irb):2
>> class String
>> def spaces?
>> x = self =~ /\A\s+\z/
>> x == 0
>> end
>> end
=> nil
>> s.spaces?
=> true
>> s = ""
=> ""
>> s.spaces?
=> false
>>
s.include?(" ")
Examples:
s = "A B C D"
s.include?(" ") #=> true
s = "ABCD"
s.include?(" ") #=> false
Yet another :) string.each_char.all? { |c| c == ' ' }
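The answers above test slightly different things (spaces only versus any whitespace, and whether nil or the empty string counts). A sketch that makes the two readings explicit; both helper names are hypothetical, and String#match? assumes Ruby 2.4+.

```ruby
# True only for a non-empty string made up entirely of space characters.
def only_spaces?(x)
  x.is_a?(String) && x.match?(/\A +\z/)
end

# True for nil, the empty string, or a string of any whitespace
# (similar in spirit to Active Support's blank? for strings).
def blank_string?(x)
  x.nil? || x.strip.empty?
end

puts only_spaces?('   ')     # spaces only
puts only_spaces?('')        # empty string is not "just spaces"
puts blank_string?("\t\n ")  # mixed whitespace
puts blank_string?(' a ')    # contains a non-whitespace character
```

Anchoring with \A and \z rather than ^ and $ matters here, because ^ and $ match at line boundaries inside a multi-line string.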
