Ruby 2.0 regex and Cyrillic

Before ruby 2.0, regex worked this way:
/\A[a-zа-я\d]+\z/i =~ 'привет' # => 0
/\A[a-z\p{Cyrillic}\d]+\z/i =~ 'привет' # => 0
I updated to Ruby 2.0, and it has a bug:
/\A[a-zа-я\d]+\z/i =~ 'привет' # => nil
/\A[a-z\p{Cyrillic}\d]+\z/i =~ 'привет' # => nil
How can I deal with this problem? Without \d in the character class, it works correctly:
/\A[a-zа-я]+\z/i =~ 'привет' # => 0

This bug looks similar to, and may be related to, a bug that I asked about before. I reported it to ruby trunk, and it has been accepted as a bug. Hopefully, it will be fixed.

The bug is still present in both 2.0.0 previews but seems to be fixed in ruby-head:
⮀ rvm use ruby-2.0.0-preview2
Using /home/am/.rvm/gems/ruby-2.0.0-preview2
⮀ irb
2.0.0dev :001 > regex = /\A[a-zа-я\d]+\z/i ; regex =~ 'привет'
# ⇒ nil
⮀ rvm use ruby-2.0.0-preview1
Using /home/am/.rvm/gems/ruby-2.0.0-preview1
⮀ irb
2.0.0dev :001 > regex = /\A[a-zа-я\d]+\z/i ; regex =~ 'привет'
# ⇒ nil
⮀ rvm use ruby-head
Using /home/am/.rvm/gems/ruby-head
⮀ irb
irb(main):001:0> regex = /\A[a-zа-я\d]+\z/i ; regex =~ 'привет'
# ⇒ 0
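
To check whether a particular interpreter is affected, the patterns from the question can be run as a small script. This is only a sketch: PATTERNS, RESULTS and the labels are names made up here, and a result of 0 for every pattern assumes a Ruby build in which the bug is fixed.

```ruby
# Hypothetical self-check: run the question's patterns on the current Ruby.
PATTERNS = {
  "literal range with \\d"    => /\A[a-zа-я\d]+\z/i,
  "\\p{Cyrillic} with \\d"    => /\A[a-z\p{Cyrillic}\d]+\z/i,
  "literal range without \\d" => /\A[a-zа-я]+\z/i
}

# Map each label to the match position (nil means the pattern did not match).
RESULTS = PATTERNS.each_with_object({}) do |(label, regex), results|
  results[label] = regex =~ 'привет'
end

RESULTS.each do |label, position|
  status = position.nil? ? "no match (affected)" : "match at #{position}"
  puts "#{RUBY_VERSION}: #{label}: #{status}"
end
```

On an affected build the first two lines should report no match while the third still matches, mirroring the results in the question.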

Related

How do I match something that is not a letter or a number or a space?

I'm using Ruby 2.4. How do I match something that is not a letter or a number or a space? I tried
2.4.0 :004 > str = "-"
=> "-"
2.4.0 :005 > str =~ /[^[:alnum:]]*/
=> 0
2.4.0 :006 > str = " "
=> " "
2.4.0 :007 > str =~ /[^[:alnum:]]*/
=> 0
but as you can see it is still matching a space.
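Your /[^[:alnum:]]*/ pattern matches 0 or more characters other than alphanumeric ones. A space is one of those characters, so it is matched; and because * also allows an empty match, the pattern succeeds at position 0 on any string at all.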
To match 1 or more chars other than alphanumeric and whitespace, you can use
/[^[:alnum:][:space:]]+/
Use the negated bracket expression with the relevant POSIX character classes inside.
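
A quick way to see the difference is to run both patterns over a few inputs. This is a minimal sketch; the constant name NON_ALNUM_NON_SPACE and the sample strings are chosen here for illustration.

```ruby
# Matches one or more characters that are neither alphanumeric nor whitespace.
NON_ALNUM_NON_SPACE = /[^[:alnum:][:space:]]+/

["-", " ", "a b", "a-b"].each do |str|
  loose  = str =~ /[^[:alnum:]]*/     # can match the empty string, so never nil
  strict = str =~ NON_ALNUM_NON_SPACE # nil unless such a character exists
  p [str, loose, strict]
end
```

The loose pattern reports 0 for every input, while the strict one returns nil for " " and "a b" and the index of the dash for the other two.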

How to check ARGF is empty or not in Ruby

I want to use ARGF like this:
# file.rb
if ARGF.???
  puts ARGF.read
else
  puts "no redirect."
end
$ echo "Hello world" | ruby file.rb
Hello world
$ ruby file.rb
no redirect.
I need to do this without waiting for user input. I tried eof? and closed?, but neither helps. Any ideas?
NOTE: I misunderstood ARGF. Please see the comments below.
Basically you'd examine #filename. One way to do this is:
if ARGF.filename != "-"
  puts ARGF.read
else
  puts "no redirect."
end
And this is the more complete form:
#!/usr/bin/env ruby
if ARGF.filename != "-" or (not STDIN.tty? and not STDIN.closed?)
  puts ARGF.read
else
  puts "No redirect."
end
Another:
#!/usr/bin/env ruby
if not STDIN.tty? and not STDIN.closed?
  puts STDIN.read
else
  puts "No redirect."
end
There might be a better way, but I needed to read the contents of files passed as arguments as well as file contents redirected to stdin.
my_executable
#!/usr/bin/env ruby
puts ARGF.pos.zero?
Then
$ my_executable file1.txt # passed as argument
#=> true
$ my_executable < file1.txt # redirected to stdin
#=> true
$ my_executable
#=> false
So I took all three currently suggested solutions:
p (not STDIN.tty? and not STDIN.closed?)
p ARGF.filename
p ARGF.pos
and saw that none of them actually works:
$ ruby temp.rb
false
"-"
36471287
$ ruby temp.rb temp.rb
false
"temp.rb"
0
$ echo 123 | ruby temp.rb
true
"-"
temp.rb:3:in `pos': Illegal seek # rb_io_tell - <STDIN> (Errno::ESPIPE)
from temp.rb:3:in `<main>'
because, for the result to tell you whether ARGF.read can be called, you want to get false/true/true for these three cases.
So I suppose you have to combine them:
!STDIN.tty? && !STDIN.closed? || ARGF.filename != ?-
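
The combined condition can be wrapped in a small predicate so it is easy to try out. This is a sketch only: input_available? is a hypothetical helper name, and taking the stream and file name as parameters is an assumption made here so the check can be exercised without a real terminal.

```ruby
# Hypothetical helper: true when a file name was given on the command line,
# or when stdin is a pipe/redirect rather than a terminal.
def input_available?(stdin = STDIN, filename = ARGF.filename)
  (!stdin.tty? && !stdin.closed?) || filename != "-"
end
```

A script would then call ARGF.read only when input_available? is true and print "no redirect." otherwise. Note that a non-terminal stdin that never reaches end of file would still make ARGF.read wait.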

Substituting the value of variable inside backtick operator

How can I substitute the value of a variable inside the backtick operator?
script_dir = File.expand_path File.dirname(__FILE__)
p `ruby -e p "$script_dir"` # this does not work
In Ruby, unlike Perl, the dollar sign indicates a global variable, not a plain regular variable to expand in a string. In a string, you need to use the #{} construct:
p `ruby -e "p '#{script_dir}'"` # the inner quotes make the path a string literal for the child Ruby
An example:
irb(main):011:0> str = '\'howdy\''
=> "'howdy'"
irb(main):012:0> `ruby -e "p #{str}"`
=> "\"howdy\"\n"
Ruby string interpolation works inside the backtick operator, but the shell quotes have to wrap the whole -e script:
p `ruby -e "p '#{script_dir}'"`
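
Interpolating a value straight into the command breaks as soon as it contains quotes or spaces. One hedged alternative is to pass the value as a shell-escaped argument and let the child read it from ARGV. The helper name inspect_in_child is made up for this sketch; RbConfig.ruby and String#shellescape come from the standard rbconfig and shellwords libraries, and a POSIX shell is assumed.

```ruby
require "rbconfig"
require "shellwords"

# Hypothetical helper: ask a child Ruby to print the inspected form of value.
# The value travels as a separate, shell-escaped argument, not as Ruby source.
def inspect_in_child(value)
  `#{RbConfig.ruby.shellescape} -e 'p ARGV[0]' #{value.shellescape}`
end

script_dir = File.expand_path(File.dirname(__FILE__))
print inspect_in_child(script_dir)
```

Because the child never parses the value as code, a directory name containing a single quote or a space is passed through unchanged.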

Is there a bug in Ruby lookbehind assertions (1.9/2.0)?

Why doesn't the regex (?<=fo).* match foo (whereas (?<=f).* does)?
"foo" =~ /(?<=f).*/m => 1
"foo" =~ /(?<=fo).*/m => nil
This only seems to happen with singleline mode turned on (dot matches newline); without it, everything is OK:
"foo" =~ /(?<=f).*/ => 1
"foo" =~ /(?<=fo).*/ => 2
Tested on Ruby 1.9.3 and 2.0.0.
EDIT: Some more observations:
Adding an end-of-line anchor doesn't change anything:
"foo" =~ /(?<=fo).*$/m => nil
But together with a lazy quantifier, it "works":
"foo" =~ /(?<=fo).*?$/m => 2
EDIT: And some more observations:
.+ works, as does its equivalent {1,}, but only in Ruby 1.9 (that seems to be the only behavioral difference between the two versions in this scenario):
"foo" =~ /(?<=fo).+/m => 2
"foo" =~ /(?<=fo).{1,}/ => 2
In Ruby 2.0:
"foo" =~ /(?<=fo).+/m => nil
"foo" =~ /(?<=fo).{1,}/m => nil
.{0,} is busted (in both 1.9 and 2.0):
"foo" =~ /(?<=fo).{0,}/m => nil
But {n,m} works in both:
"foo" =~ /(?<=fo).{0,1}/m => 2
"foo" =~ /(?<=fo).{0,2}/m => 2
"foo" =~ /(?<=fo).{0,999}/m => 2
"foo" =~ /(?<=fo).{1,999}/m => 2
This has been officially classified as a bug and subsequently fixed, together with another problem concerning \Z anchors in multiline strings.
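
To confirm that a given interpreter includes the fix, the failing patterns from the question can be collected into a small regression check. This is a sketch: LOOKBEHIND_CASES and FAILURES are names invented here, and the expectation that every pattern matches at index 2 assumes a Ruby build with the fix applied.

```ruby
# Every pattern should match "foo" at index 2, right after the "fo" lookbehind.
LOOKBEHIND_CASES = [
  /(?<=fo).*/m,
  /(?<=fo).*$/m,
  /(?<=fo).+/m,
  /(?<=fo).{1,}/m,
  /(?<=fo).{0,}/m,
  /(?<=fo).{0,999}/m
]

# Collect the patterns that do not behave as expected on this interpreter.
FAILURES = LOOKBEHIND_CASES.reject { |regex| ("foo" =~ regex) == 2 }

if FAILURES.empty?
  puts "#{RUBY_VERSION}: all lookbehind patterns match at 2"
else
  puts "#{RUBY_VERSION}: affected patterns: #{FAILURES.map(&:source).join(', ')}"
end
```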

Match newline `\n` in ruby regex

I'm trying to understand why the following returns false (edit: I should have said it outputs 0):
puts "a\nb" =~ Regexp.new(Regexp.escape("a\nb"), Regexp::MULTILINE | Regexp::EXTENDED)
Perhaps someone could explain.
I am trying to generate a Regexp from a multi-line String that will match the String.
Thanks in advance
puts always returns nil.
Your code works fine, albeit verbosely. =~ returns the position of the match, which is 0.
You could also use:
"a\nb" =~ /a\sb/m
or
"a\nb" =~ /a\nb/m
Note: The m option isn't necessary in this example but demonstrates how it would be used without Regexp.new.
puts probably caused this:
1.9.3-194 (main):0 > puts ("a\nb" =~ Regexp.new(Regexp.escape("a\nb"), Regexp::MULTILINE | Regexp::EXTENDED) )
0
=> nil
1.9.3-194 (main):0 > "a\nb" =~ Regexp.new(Regexp.escape("a\nb"), Regexp::MULTILINE | Regexp::EXTENDED)
=> 0
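
For the stated goal of generating a Regexp from a multi-line String that matches that String, Regexp.escape alone is enough, because it rewrites the newline as the two-character sequence \n and puts a backslash before spaces. A minimal sketch follows; literal_regexp is a hypothetical helper name, and anchoring with \A and \z is an assumption made here to require a full match.

```ruby
# Hypothetical helper: build a regexp that matches text literally and in full.
def literal_regexp(text)
  /\A#{Regexp.escape(text)}\z/
end

text = "a\nb c"
p Regexp.escape(text)           # the escaped pattern source
p literal_regexp(text) =~ text  # position of the match
```

Neither the MULTILINE nor the EXTENDED flag is needed for this, since the escaped pattern contains no dot and no bare whitespace.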
