[[:punct:]] matches differently in irb and rails test [duplicate] - ruby

This question already has an answer here:
Regex "punct" character class matches different characters depending on Ruby version
(1 answer)
Closed 5 years ago.
[[:punct:]] doesn't match any punctuation when it's called by a rails model test. Using the following code
test "punctuation matched by [[:punct:]]" do
punct_match = /\A[[:punct:]]+\Z/.match('[\]\[!"#$%&\'()*+,./:;<=>?#\^_`{|}~-]')
puts punct_match
puts punct_match.class
end
this outputs a non-printable character and NilClass.
However, if I execute the same statement
punct_match = /\A[[:punct:]]+\Z/.match('[\]\[!"#$%&\'()*+,./:;<=>?#\^_`{|}~-]')
in irb matches correctly and outputs
[\]\[!"#$%&'()*+,./:;<=>?#\^_`{|}~-]
=> nil
What am I missing?
Ruby version => 2.2.4,
Rails version => 4.2.6

The behaviour of /[[:punct:]]/ changed slightly in ruby version 2.4.0.
This bug was raised in the ruby issues, which links back to this (much older) issue in Onigmo - the regexp engine used since Ruby 2.0+.
The short answer is, these characters were not matched by /[[:punct:]]/ in ruby versions <2.4.0, and are now matched:
$+<=>^`|~
You must be running irb in a newer ruby version than this rails application, which is why it matches there.
On a separate note, you should alter your code slightly to:
/\A[[:punct:]]+\z/.match('[]!"#$%&\'()*+,./:;<=>?#^_`{|}~-]')
Use \z, not \Z. There is a slight difference: \Z will also match a new line at the end of the string.
You have unnecessary back-slashes in the string, such as '\^'
You have repeated a [ character: '[\]\['

Related

Ruby: magic comments "frozen_string_literal: true" vs "immutable: string"

In ruby one can freeze all constant strings in a file via two different magic comments at the beginning of a file:
# frozen_string_literal: true
and
# -*- immutable: string -*-
I have no idea what the differences are.
Are there any?
The 1st syntax is the magic comment for Ruby 2.3+ versions to freeze string literals, otherwise you have to use the String method like this:
'hello world!'.freeze
The 2nd syntax is not implemented in Ruby, however it is the way that variables are specified for files in the Emacs text editor.
For example, the following comment in Emacs would declare that the file is a Ruby file and needs Ruby syntax highlighting, and that the variable immutable is set to the value string.
# -*- mode: ruby; immutable: string -*-
After searching around, it looks like that does nothing and is not used by any Ruby syntax highlighting mode.
So you do not need the 2nd syntax.
Digging for anything on the 2nd version, it looks like they had the same intention but the 2nd magic comment syntax does not to appear to have been adopted as of Ruby 2.1.0.
See https://github.com/ruby/ruby/pull/487
The first version # frozen_string_literal: true was adopted in Ruby 2.3.0
I tried the latter version in a few versions of ruby but didn't work. I would guess it should not be used or trusted to work in any version of >= 2.3 but probably no versions support it. In fact, I was not able to find any reference to that version in the open source code on github searching that syntax
https://github.com/ruby/ruby/search?q=immutable%3A+string&unscoped_q=immutable%3A+string

Regex error in Ruby 1.8.7 but not 2.0?

In Ruby 1.8.7 the following regex warning: nested repeat operator + and * was replaced with '*'.
^(\w+\.\w+)\|(\w+\.\w+)\n+*$
It does work in Ruby 2.0 though?
http://rubular.com/r/nRUSP5LNZA
A nested operator works, but is warned because it is useless. \n+* means:
Zero or more repeatition of
One or more repeatition of
\n
which is equivalent to a more simple expression \n*, which means:
Zero or more repeatition of
\n
There is no reason to use \n+*. Ruby regex engine was replaced in Ruby 1.9 and in Ruby 2.0, and if there are any differences, then it is simply that the newer engine does not check for warnings as the older one did.

Regular expression "empty range in char class error"

I got a regex in my code, which is to match pattern of url and threw error:
/^(http|https):\/\/([\w-]+\.)+[\w-]+([\w- .\/?%&=]*)?$/
The error was "empty range in char class error". I found the cause of that is in ([\w- .\/?%&=]*)? part. Ruby seems to recognize - in \w- . as an operator for range instead of a literal -. After adding escape to the dash, the problem was solved.
But the original regular expression ran well on my co-workers' machines. We use the same version of osx, rails and ruby: Ruby version is ruby 1.9.3p194, rails is 3.1.6 and osx is 10.7.5. And after we deployed code to our Heroku server, everything worked fine too. Why did only my environment have error regarding this regex? What is the mechanism of Ruby regex interpreting?
I can replicate this error on Ruby 1.9.3p194 (2012-04-20 revision 35410) [i686-linux], installed on Ubuntu 12.04.1 LTS using rvm 1.13.4. However, this should not be a version-specific error. In fact, I'm surprised it worked on the other machines at all.
A a simpler demonstration that fails just as well:
"abcd" =~ /[\w- ]/
This is because [\w- ] is interpreted as "a range beginning with any word character up to space (or blank)", rather than a character class containing a word, a hyphen, or a space, which is what you had intended.
Per Ruby's regular expression documentation:
Within a character class the hyphen (-) is a metacharacter denoting an inclusive range of characters. [abcd] is equivalent to [a-d]. A range can be followed by another range, so [abcdwxyz] is equivalent to [a-dw-z]. The order in which ranges or individual characters appear inside a character class is irrelevant.
As you saw, prepending a backslash escaped the hyphen, thus changing the nature of the regexp from a range to a character class, removing the error. However, escaping the hyphen in the middle of character class is not recommended, since it's easy to confuse the intended meaning of the hyphen in such cases. As m.buettner pointed out, always place hyphens either at the beginning or the end of a character class:
"abcd" =~ /[-\w ]/

Ruby 1.9 -Ku, mem_cache_store and invalid multibyte escape error

Originally this bug was posted here: https://rails.lighthouseapp.com/projects/8994/tickets/5713-ruby-19-ku-incompatible-with-mem_cache_store
And now, as we've run into the same issue, I'll copy here a question from that issue, hoping someone have an answer already:
When Ruby 1.9 is started in unicode mode (-Ku), mem_cache_store.rb fails to parse:
/usr/local/ruby19/bin/ruby -Ku /usr/local/ruby-1.9.2-p0/lib/ruby/gems/1.9.1/gems/
activesupport-3.0.0/lib/active_support/cache/mem_cache_store.rb
/usr/local/ruby-1.9.2-p0/lib/ruby/gems/1.9.1/gems/activesupport-3.0.0/lib/active_support/
cache/mem_cache_store.rb:32: invalid multibyte escape: /[\x00-\x20%\x7F-\xFF]/
Our case is practically identical: when you set config.action_controller.cache_store to :mem_cache_store, and try to run tests, console, or server, you recieve this in return:
/Users/%username%/.rvm/gems/ruby-1.9.2-p0/gems/activesupport-3.0.1/lib/active_support/
cache/mem_cache_store.rb:32: invalid multibyte escape: /[\x00-\x20%\x7F-\xFF]/
Any ideas how this can be avoided?..
Ruby 1.9 in unicode mode will attempt to interpret the regular expression as unicode. To avoid this you need to pass the regular expression option "n" for "no encoding":
ESCAPE_KEY_CHARS = /[\x00-\x20%\x7F-\xFF]/n
Now we have our raw 8-bit encoding (the only thing Ruby 1.8 speaks) as intended:
ruby-1.9.2-p136 :001 > ESCAPE_KEY_CHARS = /[\x00-\x20%\x7F-\xFF]/n.encoding
=> # <Encoding:ASCII-8BIT>
Hopefully the Rails teams fixes this, for now you have to edit the file.

Shellwords.shellescape implementation for Ruby 1.8

While the build of 1.8.7 I have seems to have a backported version of Shellwords::shellescape, I know that method is a 1.9 feature and definitely isn't supported in earlier versions of 1.8. Does anyone know where I can find, either in Gem form or just as a snippet, a robust standalone implementation of Bourne-shell command escaping for Ruby?
You might as well just copy what you want from shellwords.rb in the trunk of Ruby's subversion repository (which is GPLv2'd):
def shellescape(str)
# An empty argument will be skipped, so return empty quotes.
return "''" if str.empty?
str = str.dup
# Process as a single byte sequence because not all shell
# implementations are multibyte aware.
str.gsub!(/([^A-Za-z0-9_\-.,:\/#\n])/n, "\\\\\\1")
# A LF cannot be escaped with a backslash because a backslash + LF
# combo is regarded as line continuation and simply ignored.
str.gsub!(/\n/, "'\n'")
return str
end
I wound up going with the Escape gem, which has the additional feature of using quotes by default, and only backslash-escaping when necessary.

Resources