Are Ruby regular expressions Perl or POSIX-compatible? - ruby

Are Ruby regular expressions PCREs and/or POSIX-compatible basic/extended regexes?

Are Ruby regular expressions PCREs and/or POSIX basic/extended regexes?
No, they are Ruby Regexps.

Related

How to use lookbehind regexp in go?

I'm trying to convert this ruby regex to go.
GROUP_CALL = /^(?<i1>[ \t]*)group\(?[ \t]*(?<grps>#{SYMBOLS})[ \t]*\)?[ \t]+do[ \t]*?\n(?<blk>.*?)\n^\k<i1>end[ \t]*$/m
I converted it to
groupCall := regexp.MustCompile("^(?P<i1>[ \\t]*)group\\(?[ \\t]*(?P<grps>(?::\\w+|:\"[^\"#]+\"|:'[^']+')([ \\t]*,[ \\t]*(?::\\w+|:\"[^\"#]+\"|:'[^']+'))*)[ \\t]*\\)?[ \\t]+do[ \\t]*?\\n(?P<blk>.*?)\\n^\\k<i1>end[ \\t]*$/s")
but when run I get this error
error parsing regexp: invalid escape sequence: \k
There's no mention of \k in the go docs, is there no equivalent in go?
Lookbehinds aren't supported in Go, and neither are backreferences such as \k, like #stribizhev mentioned.
Regular Expression 2 (RE2) Syntax Reference:
https://github.com/google/re2/wiki/Syntax
The syntax of the regular expressions accepted is the same general
syntax used by Perl, Python, and other languages. More precisely, it
is the syntax accepted by RE2 and described at
//code.google.com/p/re2/wiki/Syntax, except for \C. --GoLang Docs
(ref: https://golang.org/pkg/regexp/)
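For context, here is a minimal Ruby sketch of what the \k<i1> backreference in the original pattern is doing, since that is the part with no RE2 counterpart. The pattern, the constant name BLOCK, and the sample text are simplified stand-ins invented for illustration, not the asker's real GROUP_CALL.

```ruby
# Hypothetical, simplified stand-in for GROUP_CALL: capture the indentation
# in front of "group do" and require the same indentation in front of "end".
BLOCK = /^(?<indent>[ \t]*)group do\n(?<body>.*?)\n^\k<indent>end[ \t]*$/m

# The inner "end" is indented by four spaces, the outer one by two.
source = "  group do\n    gem 'a'\n    end\n  end\n"
match = BLOCK.match(source)

indent = match[:indent]
body = match[:body]

puts indent.inspect
puts body.inspect
```

The lazy body skips the more deeply indented "end" because \k<indent> has to be followed directly by "end", so the match closes on the line whose indentation equals the captured one. A port to Go would probably have to capture the indentation and do that comparison in ordinary code.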

Regex error in Ruby 1.8.7 but not 2.0?

In Ruby 1.8.7 the following regex produces a warning: nested repeat operator + and * was replaced with '*'.
^(\w+\.\w+)\|(\w+\.\w+)\n+*$
It does work in Ruby 2.0 though?
http://rubular.com/r/nRUSP5LNZA
A nested operator works, but it triggers a warning because it is useless. \n+* means:
Zero or more repetitions of
One or more repetitions of
\n
which is equivalent to the simpler expression \n*, which means:
Zero or more repetitions of
\n
There is no reason to use \n+*. Ruby's regex engine was replaced in Ruby 1.9 and again in Ruby 2.0, so if there is any difference, it is probably just that the newer engine does not report this warning the way the older one did.
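The equivalence can be checked with a small sketch like the one below, run on a modern Ruby. The sample strings and variable names are made up for illustration; the patterns are built with Regexp.new so that the nested quantifier is compiled at run time.

```ruby
# Pattern from the question, with the nested quantifier \n+*.
nested = Regexp.new('^(\w+\.\w+)\|(\w+\.\w+)\n+*$')
# Same pattern with the redundant + removed.
simple = Regexp.new('^(\w+\.\w+)\|(\w+\.\w+)\n*$')

# Illustrative inputs: no newline, one newline, several, and a non-match.
samples = ["a.b|c.d", "a.b|c.d\n", "a.b|c.d\n\n\n", "a.b|c"]

# For each sample, record whether each pattern matched.
results = samples.map { |s| [!(s =~ nested).nil?, !(s =~ simple).nil?] }

results.each_with_index do |(n, s), i|
  puts "#{samples[i].inspect}: nested=#{n} simple=#{s}"
end
```

Both patterns should agree on every sample, which is why the + can simply be dropped.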

Regular Expressions metacharacters, \< and \>, no longer supported in latest Ruby version?

When I investigated this in irb, I found that the metacharacters \< and \> return nil when I expected a value. In the cheatsheet I'm using, these metacharacters are called "start-of-word" and "end-of-word" respectively. But don't they function the same as "word boundaries"?
It seems to hold true for the examples in "Mastering Regular Expressions" by Friedl.
irb(main):001:0> "this cat is fat" =~ /\bcat\b/
=> 5
irb(main):002:0> "this cat is fat" =~ /\<cat\>/
=> nil
irb(main):003:0> "cat" =~ /\<cat\>/
=> nil
That's entirely possible. As of Ruby 1.9, Ruby switched to Oniguruma for regular expression parsing. It's possible that prior to 1.9, \< and \> were valid.
However, in researching this, I found that they are listed as a specifically GNU addition to the regex language.
Playing with it in Rubular, which supports running a regex through several different Ruby implementations, I couldn't get \< or \> to work in any version. \b seems to be the more standard way of specifying a word boundary.
Ruby has always used \b for word boundaries, just like Perl and the other Perl-derived flavors (JavaScript, .NET, Python, etc.). It's only GNU tools like egrep that use \< and \>.
Is that the AddedBytes cheat sheet you're using? It contains several errors, that being one of them. Elsewhere it says < and > are metacharacters and you have to use \< and \> to match them literally; how's that for a Catch-22?
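A short sketch of the behaviour described in these answers, assuming a modern Ruby: \< and \> are treated as escaped literal angle brackets, and \b is what marks a word boundary. The last pattern, which combines \b with lookarounds to tell start-of-word from end-of-word, is an illustrative emulation and not something taken from the answers.

```ruby
text = "this cat is fat"

# \b is the word boundary in Ruby.
word_boundary = text =~ /\bcat\b/

# \< and \> are just escaped literal characters, so this finds nothing here...
gnu_style = text =~ /\<cat\>/

# ...but it does match a string that really contains the angle brackets.
literal_angle = "<cat>" =~ /\<cat\>/

# Emulating start-of-word and end-of-word with \b plus lookarounds.
start_end = text =~ /\b(?=\w)cat\b(?<=\w)/

p word_boundary, gnu_style, literal_angle, start_end
```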

Match unicode text with Ruby 1.8.7

I have a regex that is used for matching Unicode strings and works well with all versions of Ruby newer than 1.8.7:
/[\p{L}\p{Space}]+/u
How can this be achieved with Ruby 1.8.7?
Unicode properties were added to Ruby in version 1.9, so in older versions you have to use POSIX classes like [:space:] or [:alpha:].
See POSIX Bracket Expressions for more details.
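As a rough comparison, the sketch below runs both forms on a modern Ruby, where POSIX bracket expressions are Unicode-aware. Note that a POSIX class has to sit inside a bracket expression, as in [[:alpha:]]. The sample string is made up, and how the POSIX form treats non-ASCII letters on 1.8.7 itself is not verified here.

```ruby
# Sample text with non-ASCII letters, written with escapes to stay
# independent of the source file encoding.
text = "h\u00e9llo w\u00f6rld 42"

# Unicode property form from the question (Ruby 1.9 and later).
with_properties = text[/[\p{L}\p{Space}]+/u]

# POSIX bracket form suggested in the answer.
with_posix = text[/[[:alpha:][:space:]]+/]

puts with_properties.inspect
puts with_posix.inspect
```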

Ruby: hexadecimal in regular expressions

I need to match an md5 checksum in a regular expression in a Ruby (actually Rails) program. I found out somewhere that I can match hexadecimal strings with the \h sequence, but I can't find the link anymore.
I'm using that sequence and my code is working in Ruby 1.9.2. I can make it work even under plain IRB (so it's not a Rails extension).
ruby-1.9.2-p180 :007 > "123abcdf" =~ /^\h+$/; $~
=> #<MatchData "123abcdf">
ruby-1.9.2-p180 :008 > "123abcdfg" =~ /^\h+$/; $~
=> nil
However, my IDE marks that expression as wrong and I can't find any reference which cites that sequence.
Is the \h sequence legal in Ruby Regex under any environment/version or should I trust my ide and replace it with something like [abcdef\d]?
Yes, it is. See the official documentation for regular expressions in Ruby.
Note that \h will match uppercase letters too, so it's actually equivalent to [a-fA-F\d]
According to this, \h is part of Oniguruma, which I believe is standard in Ruby 1.9.
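A minimal sketch of using \h for the MD5 case, assuming Ruby 1.9 or later. The constant name MD5_HEX, the 32-digit length check, and the choice of \A and \z (which anchor the whole string instead of a single line) are illustrative decisions, not part of the question.

```ruby
require 'digest'

# An MD5 hex digest is 32 hexadecimal digits; \h matches 0-9, a-f and A-F.
MD5_HEX = /\A\h{32}\z/

checksum = Digest::MD5.hexdigest("hello")

lower_ok = !(checksum =~ MD5_HEX).nil?
upper_ok = !(checksum.upcase =~ MD5_HEX).nil?

# "g" is not a hex digit, so this string is rejected, as in the question.
rejected = ("123abcdfg" =~ /\A\h+\z/).nil?

# \h and the explicit class [a-fA-F\d] agree on these samples.
same = %w[123abcdf 123ABCDF 123abcdfg].all? do |s|
  (s =~ /\A\h+\z/).nil? == (s =~ /\A[a-fA-F\d]+\z/).nil?
end

p lower_ok, upper_ok, rejected, same
```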

Resources