How are named capture groups used in RE2 regexps? - ruby

On this page http://swtch.com/~rsc/regexp/regexp3.html it says that RE2 supports named expressions.
RE2 supports Python-style named captures (?P<name>expr), but not the
alternate syntaxes (?<name>expr) and (?'name'expr) used by .NET and
Perl.
ruby-1.9.2-p180 :003 > r = RE2::Regexp.compile("(?P<foo>.+) bla")
#=> #<RE2::Regexp /(?P<foo>.+) bla/>
ruby-1.9.2-p180 :006 > r = r.match("lalal bla")
#=> #<RE2::MatchData "lalal bla" 1:"lalal">
ruby-1.9.2-p180 :009 > r[1] #=> "lalal"
ruby-1.9.2-p180 :010 > r[:foo]
TypeError: can't convert Symbol into Integer
ruby-1.9.2-p180 :011 > r["foo"]
TypeError: can't convert String into Integer
But I'm not able to access the match with the name, so it seems like a useless implementation. Am I missing something?

Looking at your code output, it seems that you are using the Ruby re2 gem which I maintain.
As of the latest release (0.2.0), the gem does not support the underlying C++ re2 library's named capturing groups. The error you are seeing is due to the fact that any non-integer argument passed to MatchData#[] will simply be forwarded onto the default Array#[]. You can confirm this in an irb session like so:
irb(main):001:0> a = [1, 2, 3]
=> [1, 2, 3]
irb(main):002:0> a["bob"]
TypeError: can't convert String into Integer
from (irb):2:in `[]'
from (irb):2
from /Users/mudge/.rbenv/versions/1.9.2-p290/bin/irb:12:in `<main>'
irb(main):003:0> a[:bob]
TypeError: can't convert Symbol into Integer
from (irb):3:in `[]'
from (irb):3
from /Users/mudge/.rbenv/versions/1.9.2-p290/bin/irb:12:in `<main>'
I will endeavour to add the ability to reference captures by name as soon as possible and update this answer once a release has been made.
Update: I just released version 0.3.0 which now supports named groups like so:
irb(main):001:0> r = RE2::Regexp.compile("(?P<foo>.+) bla")
=> #<RE2::Regexp /(?P<foo>.+) bla/>
irb(main):002:0> r = r.match("lalal bla")
=> #<RE2::MatchData "lalal bla" 1:"lalal">
irb(main):003:0> r[1]
=> "lalal"
irb(main):004:0> r[:foo]
=> "lalal"
irb(main):005:0> r["foo"]
=> "lalal"
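For anyone curious what a lookup like this has to do, here is a minimal sketch of dispatching on the argument type. NamedMatches and its name table are hypothetical stand-ins for illustration, not the gem's actual implementation:

```ruby
# Hypothetical stand-in for a match result; NOT the re2 gem's real class.
class NamedMatches
  def initialize(captures, names)
    @captures = captures   # e.g. ["lalal bla", "lalal"]
    @names = names         # group name => capture index, e.g. { "foo" => 1 }
  end

  def [](key)
    case key
    when Integer
      # Integers index the captures directly, as Array#[] would.
      @captures[key]
    when String, Symbol
      # Names are translated to an index first; unknown names give nil.
      index = @names[key.to_s]
      index && @captures[index]
    else
      raise TypeError, "no implicit conversion of #{key.class} into Integer"
    end
  end
end

m = NamedMatches.new(["lalal bla", "lalal"], { "foo" => 1 })
p m[1]        # "lalal"
p m[:foo]     # "lalal"
p m["foo"]    # "lalal"
p m[:missing] # nil
```

Without the String/Symbol branch, every non-integer key falls through to the array lookup, which is where the TypeError in the question came from.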

Related

Different behavior of strings and symbols?

I was learning Ruby recently from the koans and I noticed one thing about symbols and string objects. When I assigned the same symbol to two different variables, I found that the object_ids were the same.
2.1.1 :017 > symbol1 = :a
=> :a
2.1.1 :018 > symbol2 = :a
=> :a
2.1.1 :019 > symbol1.object_id
=> 361768
2.1.1 :020 > symbol2.object_id
=> 361768
Seeing this, I thought the same should be true for strings and integers too. But when I did the same with strings, the object ids ended up being different.
2.1.1 :021 > string1 = "test"
=> "test"
2.1.1 :022 > string2 = "test"
=> "test"
2.1.1 :023 > string1.object_id
=> 13977640
2.1.1 :024 > string2.object_id
=> 13932280
Why is the behavior of symbols and strings different?
You can think of symbols as self-referential interned strings - that is, only one copy of a given symbol will ever exist. The same is true of some other objects, such as Fixnum instances, booleans, and nil. They are not garbage collected (as of Ruby 2.1; dynamically created symbols can be collected from Ruby 2.2 onward), are not duplicable, and are not mutable.
Strings, on the other hand, are garbage collected, are duplicable, and are mutable. Every time you declare a string, a new object is allocated.
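A quick way to check this without comparing object_ids by eye is equal?, which tests object identity. This is a small sketch that assumes string literals are not frozen (no frozen_string_literal magic comment or command-line flag); with frozen literals the second result can differ:

```ruby
symbol1 = :a
symbol2 = :a
string1 = "test"
string2 = "test"

# equal? compares identity (same object), == compares contents.
p symbol1.equal?(symbol2)   # true: both names refer to the one :a symbol
p string1.equal?(string2)   # false, assuming string literals are not frozen
p string1 == string2        # true: different objects, same contents

# Converting equal strings to symbols lands on the same symbol again.
p string1.to_sym.equal?(string2.to_sym)   # true
```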

Encoding and decoding ruby symbols

I discovered this behavior of multi_json ruby gem:
2.1.0 :001 > require 'multi_json'
=> true
2.1.0 :002 > sym = :symbol
=> :symbol
2.1.0 :003 > sym.class
=> Symbol
2.1.0 :004 > res = MultiJson.load MultiJson.dump(sym)
=> "symbol"
2.1.0 :005 > res.class
=> String
Is this an appropriate way to store ruby symbols? Does JSON provide some way to distinguish :symbol from "string"?
Nope is the simple answer. Most of the time it only really matters for hash keys, and there's a cheat for those: symbolize_keys! (from ActiveSupport). Bottom line is that JSON does not understand symbols, just strings.
Since you are using MultiJson, you can also ask MultiJson to do this for you...
MultiJson.load('{"abc":"def"}', :symbolize_keys => true)
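The same round trip can be seen with Ruby's bundled json library, used here instead of MultiJson so the sketch runs with no extra gems. symbolize_names is the standard library's option for turning keys back into symbols; note that it only affects keys, so symbol values still come back as strings:

```ruby
require "json"

original = { kind: :symbol, label: "string" }

encoded = JSON.generate(original)
puts encoded                       # {"kind":"symbol","label":"string"}

# Default parse: keys and values are all plain strings.
decoded = JSON.parse(encoded)
p decoded["kind"].class            # String

# symbolize_names restores symbol keys, but values stay strings.
symbolized = JSON.parse(encoded, symbolize_names: true)
p symbolized.keys                  # [:kind, :label]
p symbolized[:kind].class          # String
```

If symbol values really need to survive, the type information has to be stored explicitly in the data, because nothing in the JSON text records it.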

Sequel gem increment

I am trying to use the Ruby Sequel gem for DB operations.
I am stuck for incrementing and decrementing values.
The doc says that this should work, even though it seems very strange for me to be able to add a number and a symbol.
2.0.0-p247 :019 > require 'sequel'
=> true
2.0.0-p247 :020 > s = Sequel.connect('sqlite://db.sqlite')
=> #<Sequel::SQLite::Database: "sqlite://db.sqlite">
2.0.0-p247 :021 > s[:query_volume].update_sql(:queries => 3)
=> "UPDATE `query_volume` SET `queries` = 3"
2.0.0-p247 :022 > s[:query_volume].update_sql(:queries => :queries + 3)
NoMethodError: undefined method `+' for :queries:Symbol
from (irb):21
from /Users/avandra/.rvm/rubies/ruby-2.0.0-p247/bin/irb:16:in `<main>'
But as you can see, it raises an undefined method error on the :queries symbol, which kind of confirms why it seemed strange to me.
I tried using curly braces, but that gives another error:
2.0.0-p247 :023 > s[:query_volume].update_sql{:queries => :queries + 3}
SyntaxError: (irb):23: syntax error, unexpected =>, expecting '}'
s[:query_volume].update_sql{:queries => :queries + 3}
^
from /Users/avandra/.rvm/rubies/ruby-2.0.0-p247/bin/irb:16:in `<main>'
And using
2.0.0-p247 :033 > s[:query_volume].update_sql{queries = queries + 3}
=> "UPDATE `query_volume` SET "
just gives a badly formatted SQL...
Could anyone shed some light on how this can be done?
You should use Sequel.expr for that:
s[:query_volume].update_sql(:queries => Sequel.expr(3) + :queries)
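As far as I know, the documented :queries + 3 form relies on Sequel's optional core_extensions extension, which adds operators to Symbol; without it a plain Symbol has no + method. To show why wrapping one operand helps, here is a toy sketch of the idea: the wrapper's + builds SQL text instead of doing arithmetic. ToyExpr and toy_update_sql are hypothetical names for illustration and are not part of Sequel:

```ruby
# Toy expression node: records SQL text instead of computing a value.
# Hypothetical illustration only; Sequel's real expression classes differ.
class ToyExpr
  attr_reader :sql

  def initialize(sql)
    @sql = sql
  end

  # Wrap plain Ruby values so they can take part in an expression.
  def self.wrap(value)
    case value
    when ToyExpr then value
    when Symbol  then new("`#{value}`")   # symbols stand for column names
    when Integer then new(value.to_s)
    else raise ArgumentError, "unsupported value: #{value.inspect}"
    end
  end

  # Build a larger expression rather than adding numbers.
  def +(other)
    ToyExpr.new("(#{sql} + #{ToyExpr.wrap(other).sql})")
  end
end

# Render an UPDATE statement from a hash of column => value or expression.
def toy_update_sql(table, assignments)
  set = assignments.map { |col, val| "`#{col}` = #{ToyExpr.wrap(val).sql}" }
  "UPDATE `#{table}` SET #{set.join(', ')}"
end

puts toy_update_sql(:query_volume, queries: ToyExpr.wrap(3) + :queries)
# UPDATE `query_volume` SET `queries` = (3 + `queries`)
```

Because the left operand is the wrapper, Ruby calls the wrapper's + and the symbol is only ever an argument, so Symbol never needs a + of its own.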

What does ! mean at the end of a Ruby method definition? [duplicate]

I'm trying to learn Ruby by reading code, but I bumped into the following situation, which I cannot find in any of my tutorials/cheatsheets.
def foo!
# do bar
return bar
end
What is the point of "!" in a method definition?
Ruby doesn't treat the ! as a special character at the end of a method name. By convention, methods ending in ! have some sort of side-effect or other issue that the method author is trying to draw attention to. Examples are methods that do in-place changes, or might throw an exception, or proceed with an action despite warnings.
For example, here's how String#upcase! compares to String#upcase:
1.9.3p392 :004 > foo = "whatever"
=> "whatever"
1.9.3p392 :005 > foo.upcase
=> "WHATEVER"
1.9.3p392 :006 > foo
=> "whatever"
1.9.3p392 :007 > foo.upcase!
=> "WHATEVER"
1.9.3p392 :008 > foo
=> "WHATEVER"
ActiveRecord makes extensive use of bang-methods for things like save!, which raises an exception on failure (vs save, which returns true/false but doesn't raise an exception).
It's a "heads up!" flag, but there's nothing that enforces this. You could end all your methods in !, if you wanted to confuse and/or scare people.
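To make the save vs save! distinction concrete, here is a small sketch of the same naming convention on a made-up class. ToyRecord and its InvalidError are hypothetical and only mimic the pattern; this is not how ActiveRecord is implemented:

```ruby
# Hypothetical record class; it only mimics the save / save! naming
# convention and is not ActiveRecord.
class ToyRecord
  class InvalidError < StandardError; end

  def initialize(name)
    @name = name
  end

  def valid?
    !@name.to_s.empty?
  end

  # Plain version: reports failure through the return value.
  def save
    valid?
  end

  # Bang version: same work, but failure raises instead of returning false.
  def save!
    save || raise(InvalidError, "name can't be blank")
  end
end

p ToyRecord.new("bob").save    # true
p ToyRecord.new("").save       # false

begin
  ToyRecord.new("").save!
rescue ToyRecord::InvalidError => e
  puts "raised: #{e.message}"  # raised: name can't be blank
end
```

Nothing in the language ties the two methods together; the ! is just the last character of the second method's name.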
A method whose name ends in ! is called a "bang" method. By convention in Ruby, it changes the receiver.
You can define a ! version that works like a non-bang method, but that would mislead other programmers if they didn't look at your method definition.
Many built-in bang methods in turn return nil when no changes were made to the receiver.
Examples without ! - You can see that the source string has not been changed:
str = "hello"
p str.delete("l") #=> "heo"
p str #=> "hello"
Examples with ! - You can see that the source string has been changed:
str = "hello"
p str.delete!("l") #=> "heo"
p str #=> "heo"
NOTE: Some non-bang methods also change the receiver object:
str = "hello"
p str.concat(" world") #=> "hello world"
p str #=> "hello world"
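The nil return value is easy to trip over when chaining, so here is a short sketch of it using only core String methods. The strings are dup'd so the example also works where string literals are frozen:

```ruby
str = "hello".dup            # dup so the string is mutable

p str.delete!("l")           # "heo" (changed, returns the receiver)
p str.delete!("l")           # nil   (nothing left to delete)
p str                        # "heo"

# Chaining on a bang method can therefore blow up on nil:
result = "HELLO".dup.upcase! # already upper case, so nothing changes
p result.nil?                # true

# The non-bang versions always return a string, so they chain safely.
p "HELLO".upcase.delete("L") # "HEO"
```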
! is not part of the method definition syntax; it is a convention used when declaring a method that will change the object.
1.9.3-p194 :004 > a="hello "
=> "hello "
1.9.3-p194 :005 > a.strip
=> "hello"
1.9.3-p194 :006 > a
=> "hello "
1.9.3-p194 :007 > a.strip!
=> "hello"
1.9.3-p194 :008 > a
=> "hello"

Why was the object_id for true and nil changed in ruby2.0?

I came across this ruby object_id allocation question some time back and then read this awesome article which talks about VALUE and explains why the object_ids of true, nil and false are the way they are. I was toying with object_id in Ruby 2.0 when I noticed an apparent change to the object_ids of true and nil.
forbidden:~$ ruby -v
ruby 2.0.0p0 (2013-02-24 revision 39474) [x86_64-linux]
forbidden:~$
forbidden:~$ irb
irb(main):001:0> true.object_id
=> 20
irb(main):002:0> false.object_id
=> 0
irb(main):003:0> nil.object_id
=> 8
irb(main):004:0> exit
forbidden:~$
forbidden:~$ rvm use 1.9.3
Using /home/forbidden/.rvm/gems/ruby-1.9.3-p392
forbidden:~$ ruby -v
ruby 1.9.3p392 (2013-02-22 revision 39386) [x86_64-linux]
forbidden:~$
forbidden:~$ irb
irb(main):001:0> true.object_id
=> 2
irb(main):002:0> false.object_id
=> 0
irb(main):003:0> nil.object_id
=> 4
tl;dr: The values for true and nil were 2 and 4 respectively in 1.9.3 and 1.8.7, but have been changed to 20 and 8 in Ruby 2.0.0, even though the id of false remains the same (0) and the ids for Fixnums keep the same old 2n+1 pattern.
Also, the way Fixnum and Bignum are implemented is still the same in 2.0.0 as the example given in the above mentioned article also runs just the same way it used to:
irb(main):001:0>
irb(main):002:0* ((2**62)).class
=> Bignum
irb(main):003:0> ((2**62)-1).class
=> Fixnum
irb(main):004:0>
What's the reason behind this object_id change?
Why was this change made? How is this going to help developers?
A look at the Ruby source where these values are defined suggests that this has something to do with “flonums” (also see the commit where this was introduced). A search for “flonum” came up with a message on the Ruby mailing list discussing it.
This is a technique for speeding up floating point calculations on 64-bit machines by using immediate values for some floating point values, similar to using Fixnums for integers. The pattern for Flonums is ...xxxx xx10 (i.e. the last two bits are 10, where for Fixnums the last bit is 1). The object_ids of other immediate values have been changed to accommodate this change.
You can see this change by looking at the object_ids of floats in Ruby 1.9.3 and 2.0.0.
In 1.9.3 different floats with the same value are different objects:
1.9.3p385 :001 > s = 10.234
=> 10.234
1.9.3p385 :002 > t = 10.234
=> 10.234
1.9.3p385 :003 > s.object_id
=> 2160496240
1.9.3p385 :004 > t.object_id
=> 2160508080
In 2.0.0 they are the same:
2.0.0p0 :001 > s = 10.234
=> 10.234
2.0.0p0 :002 > t = 10.234
=> 10.234
2.0.0p0 :003 > s.object_id
=> 82118635605473626
2.0.0p0 :004 > t.object_id
=> 82118635605473626
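The bit patterns can be checked with plain integer arithmetic on the ids quoted in this thread. The helper below is a hypothetical sketch that assumes the tagging scheme described above (64-bit CRuby 2.0 or later with flonums); it only inspects numbers and does not depend on the running interpreter's own ids:

```ruby
# Classify an object_id by its low bits, following the patterns described
# above. Assumes the 64-bit CRuby 2.0+ scheme; other Rubies may differ.
def immediate_kind(id)
  if (id & 0b1) == 0b1
    :fixnum   # ...xxx1, where id == 2n + 1
  elsif (id & 0b11) == 0b10
    :flonum   # ...xx10
  else
    :other    # everything else, e.g. nil, true and false
  end
end

p immediate_kind(2 * 5 + 1)          # :fixnum (the id of 5 under the 2n+1 rule)
p immediate_kind(82118635605473626)  # :flonum (10.234 in the 2.0.0 session)
p immediate_kind(20)                 # :other  (true in 2.0.0)
p immediate_kind(8)                  # :other  (nil in 2.0.0)
p immediate_kind(0)                  # :other  (false in both versions)

# true's old id from 1.9.3 ends in the bits 10, the same as a flonum tag.
p immediate_kind(2)                  # :flonum
```

The last line shows the clash: under the new scheme the old id of true would be indistinguishable from a flonum, which fits the explanation that the other immediates were moved to make room.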
