How to embed regular expressions in other regular expressions in Ruby

I have a string:
'A Foo'
and want to find "Foo" in it.
I have a regular expression:
/foo/
that I'm embedding into another case-insensitive regular expression, so I can build the pattern in steps:
foo_regex = /foo/
pattern = /A #{ foo_regex }/i
But it won't match correctly:
'A Foo' =~ pattern # => nil
If I embed the text directly into the pattern it works:
'A Foo' =~ /A foo/i # => 0
What's wrong?

On the surface, embedding a pattern inside another pattern looks like it should simply work, but that rests on a wrong assumption: that patterns in Ruby are just strings. Using:
foo_regex = /foo/
creates a Regexp object:
/foo/.class # => Regexp
As such it has knowledge of the optional flags used to create it:
( /foo/ ).options # => 0
( /foo/i ).options # => 1
( /foo/x ).options # => 2
( /foo/ix ).options # => 3
( /foo/m ).options # => 4
( /foo/im ).options # => 5
( /foo/mx ).options # => 6
( /foo/imx ).options # => 7
or, if you like binary:
'%04b' % ( /foo/ ).options # => "0000"
'%04b' % ( /foo/i ).options # => "0001"
'%04b' % ( /foo/x ).options # => "0010"
'%04b' % ( /foo/xi ).options # => "0011"
'%04b' % ( /foo/m ).options # => "0100"
'%04b' % ( /foo/mi ).options # => "0101"
'%04b' % ( /foo/mx ).options # => "0110"
'%04b' % ( /foo/mxi ).options # => "0111"
and remembers those whenever the Regexp is used, whether as a standalone pattern or if embedded in another.
You can see this in action if we look to see what the pattern looks like after embedding:
/#{ /foo/ }/ # => /(?-mix:foo)/
/#{ /foo/i }/ # => /(?i-mx:foo)/
?-mix: and ?i-mx: are how those options are represented in an embedded-pattern.
According to the Regexp documentation for Options:
i, m, and x can also be applied on the subexpression level with the (?on-off) construct, which enables options on, and disables options off for the expression enclosed by the parentheses.
So, Regexp is remembering those options, even inside the outer pattern, causing the overall pattern to fail the match:
pattern = /A #{ foo_regex }/i # => /A (?-mix:foo)/i
'A Foo' =~ pattern # => nil
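The (?on-off) construct quoted above can also be written by hand, which may make its effect easier to see. A minimal sketch; the variable names and sample strings are made up for illustration:

```ruby
# Turn case-insensitivity on for one group inside a case-sensitive pattern.
inner_ci = /A (?i:foo)/
inner_ci =~ 'A FOO' # => 0
inner_ci =~ 'a foo' # => nil, the leading "A" is still case-sensitive

# Turn it off for one group inside a case-insensitive pattern.
inner_cs = /A (?-i:foo)/i
inner_cs =~ 'a foo' # => 0
inner_cs =~ 'A Foo' # => nil, "foo" must be lowercase inside the group
```

The second pattern has the same shape as the one produced by interpolating /foo/ into /A .../i, which is why 'A Foo' fails to match there.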
You could give every sub-expression the same flags as the surrounding pattern, but that quickly becomes convoluted and messy:
foo_regex = /foo/i
pattern = /A #{ foo_regex }/i # => /A (?i-mx:foo)/i
'A Foo' =~ pattern # => 0
Instead we have the source method which returns the text of a pattern:
/#{ /foo/.source }/ # => /foo/
/#{ /foo/i.source }/ # => /foo/
The problem with the embedded pattern remembering the options also appears when using other Regexp methods, such as union:
/#{ Regexp.union(%w[a b]) }/ # => /(?-mix:a|b)/
and again, source can help:
/#{ Regexp.union(%w[a b]).source }/ # => /a|b/
Knowing all that:
foo_regex = /foo/
pattern = /#{ foo_regex.source }/i # => /foo/i
'A Foo' =~ pattern # => 2
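One caveat when using source: it drops the (?-mix:...) group along with the flags, so an alternation in a sub-pattern is no longer contained. A hedged sketch of wrapping each piece in a non-capturing group; the names article, noun, naive and grouped are invented for this example:

```ruby
article = /a|an|the/
noun    = /foo|bar/

# Without grouping, the alternations merge into one: \ba | an | the foo | bar\b
naive = /\b#{article.source} #{noun.source}\b/i
naive =~ 'bar' # => 0, "bar" alone matches the last alternative

# With (?:...) around each source, the pieces stay separate.
grouped = /\b(?:#{article.source}) (?:#{noun.source})\b/i
grouped =~ 'bar'         # => nil
grouped =~ 'See The Bar' # => 4
```

The outer /i flag now applies to every piece, and the grouping that to_s used to provide is restored explicitly.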

"what's wrong?"
Your assumption on how a Regexp is interpolated is wrong.
Interpolation via #{...} is done by calling to_s on the interpolated object:
d = Date.new(2017, 9, 8)
#=> #<Date: 2017-09-08 ((2458005j,0s,0n),+0s,2299161j)>
d.to_s
#=> "2017-09-08"
"today is #{d}!"
#=> "today is 2017-09-08!"
and not just for string literals, but also for regular expression literals:
/today is #{d}!/
#=> /today is 2017-09-08!/
In your example, the object-to-be-interpolated is a Regexp:
foo_regex = /foo/
And Regexp#to_s returns:
[...] the regular expression and its options using the (?opts:source) notation.
foo_regex.to_s
#=> "(?-mix:foo)"
Therefore:
/A #{foo_regex}/i
#=> /A (?-mix:foo)/i
Just like:
"A #{foo_regex}"
#=> "A (?-mix:foo)"
In other words: because of the way Regexp#to_s is implemented, you can interpolate patterns without losing their flags. It's a feature, not a bug.
If Regexp#to_s returned just the source (without options), it would work the way you expect:
def foo_regex.to_s
  source
end
/A #{foo_regex}/i
#=> /A foo/i
The above code is just for demonstration purposes; don't do that. On Ruby 3.0 and later it raises a FrozenError anyway, because Regexp literals are frozen.
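To see why keeping the flags can be useful, here is a small sketch comparing to_s, source and inspect, then interpolating a case-insensitive piece into a case-sensitive pattern. The keyword pattern and the sample strings are assumptions made for illustration:

```ruby
keyword = /select|from/i

keyword.to_s    # => "(?i-mx:select|from)"
keyword.source  # => "select|from"
keyword.inspect # => "/select|from/i"

# The outer pattern has no /i flag, but the interpolated keyword keeps its own.
pattern = /#{keyword} [a-z_]+/
pattern.source          # => "(?i-mx:select|from) [a-z_]+"
pattern =~ 'SELECT name' # => 0
pattern =~ 'SELECT NAME' # => nil, the column name must be lowercase
```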

Related

Ruby - regex that matches string to pattern and detects unwanted occurrences [duplicate]

How do I test a string against a regex such that it returns true only if the whole string matches (not a substring)?
e.g.:
test( /ee/ , "street" ) #=> returns false
test( /ee/ , "ee" ) #=> returns true!
Thank you.

You can match the beginning of the string with \A and the end with \Z (or \z, which does not allow a trailing newline). In Ruby, ^ and $ match the beginning and end of each line, not just of the whole string:
>> "a\na" =~ /^a$/
=> 0
>> "a\na" =~ /\Aa\Z/
=> nil
>> "a\na" =~ /\Aa\na\Z/
=> 0
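The test( regex, string ) call from the question could be sketched along these lines. The helper name whole_match? is hypothetical, and the sketch uses \z so that a trailing newline does not count as part of a match:

```ruby
# Hypothetical helper: true only if the regex matches the entire string.
def whole_match?(regex, string)
  # Keep only the i, x and m flags of the original pattern.
  flags = regex.options & (Regexp::IGNORECASE | Regexp::EXTENDED | Regexp::MULTILINE)
  # Wrap the source in a non-capturing group so alternations stay contained.
  anchored = Regexp.new("\\A(?:#{regex.source})\\z", flags)
  !(anchored =~ string).nil?
end

whole_match?(/ee/, "street") # => false
whole_match?(/ee/, "ee")     # => true
whole_match?(/ee/, "ee\n")   # => false
"ee\n" =~ /\Aee\Z/           # => 0, \Z tolerates one trailing newline
```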

This seems to work for me, although it looks ugly (there is probably a more attractive way to do it):
!(string =~ /^ee$/).nil?
Of course everything inside // above can be any regex you want.
Example:
>> string = "street"
=> "street"
>> !(string =~ /^ee$/).nil?
=> false
>> string = "ee"
=> "ee"
>> !(string =~ /^ee$/).nil?
=> true
Note: Tested in Rails console with ruby (1.8.7) and rails (3.1.1)

So, what you are asking is how to test whether the two strings are equal, right? Just use string equality! This passes every single one of the examples that both you and Tomas cited:
'ee' == 'street' # => false
'ee' == 'ee' # => true
"a\na" == 'a' # => false
"a\na" == "a\na" # => true

Ruby regex to extract match_group value?

I have two questions about regex.
The match string is:
"FuR6UcUiduzPyenxCSzZbDXTge3f3t9ufA, user_email=admin#example.com"
When extracting the user_email value, my regexp is:
\s+(?<email_from_header>\S+)
and the match group value is:
(space)user_email=admin#example.com"
What do I use to omit the first (space) char and the last " quote?
When extracting the token, my regex is:
AUTH-TOKEN\s+(?<auth_token>\S+)
and the match group value is:
FuR6UcUiduzPyenxCSzZbDXTge3f3t9ufA,
What do I use to delete that last trailing comma ,?

For the email value, your regex would be:
\s+\K(?<email_from_header>[^"]*)
\K discards everything matched before it from the reported match, and the negated character class [^"]* matches zero or more characters that are not a double quote.
For the token, your regex would be:
AUTH-TOKEN\s+(?<auth_token>[^,]*)
[^,]* matches zero or more characters that are not a comma.
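Here is how those two patterns might behave in Ruby. The variables value and header are assumptions: value is the sample string from the question with its trailing quote, and header simply prefixes it with AUTH-TOKEN, since the question does not show the full input:

```ruby
# Assumed input: the question's sample value, including the trailing quote.
value  = 'FuR6UcUiduzPyenxCSzZbDXTge3f3t9ufA, user_email=admin#example.com"'
# Assumed shape of the full header line.
header = 'AUTH-TOKEN ' + value

email = value.match(/\s+\K(?<email_from_header>[^"]*)/)
email[:email_from_header] # => "user_email=admin#example.com"
email[0]                  # => "user_email=admin#example.com", \K dropped the space

token = header.match(/AUTH-TOKEN\s+(?<auth_token>[^,]*)/)
token[:auth_token] # => "FuR6UcUiduzPyenxCSzZbDXTge3f3t9ufA"
```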

If your string has embedded double-quotes:
str = '"FuR6UcUiduzPyenxCSzZbDXTge3f3t9ufA, user_email=admin#example.com"'
str # => "\"FuR6UcUiduzPyenxCSzZbDXTge3f3t9ufA, user_email=admin#example.com\""
str[/^"(.+),/, 1] # => "FuR6UcUiduzPyenxCSzZbDXTge3f3t9ufA"
str[/^"(.+?),/, 1] # => "FuR6UcUiduzPyenxCSzZbDXTge3f3t9ufA"
str[/^"([^,]+),/, 1] # => "FuR6UcUiduzPyenxCSzZbDXTge3f3t9ufA"
str[/(user_email=.+)"/, 1] # => "user_email=admin#example.com"
str[/(user_email=[^"]+)"/, 1] # => "user_email=admin#example.com"
str[/user_email=([^"]+)"/, 1] # => "admin#example.com"
match = str.match(/(?<user_email>user_email=(?<addr>.+))"/)
match # => #<MatchData "user_email=admin#example.com\"" user_email:"user_email=admin#example.com" addr:"admin#example.com">
match['user_email'] # => "user_email=admin#example.com"
match['addr'] # => "admin#example.com"
If it doesn't:
str = 'FuR6UcUiduzPyenxCSzZbDXTge3f3t9ufA, user_email=admin#example.com'
str # => "FuR6UcUiduzPyenxCSzZbDXTge3f3t9ufA, user_email=admin#example.com"
str[/^(.+),/, 1] # => "FuR6UcUiduzPyenxCSzZbDXTge3f3t9ufA"
str[/^(.+?),/, 1] # => "FuR6UcUiduzPyenxCSzZbDXTge3f3t9ufA"
str[/^([^,]+),/, 1] # => "FuR6UcUiduzPyenxCSzZbDXTge3f3t9ufA"
str[/(user_email=.+)/, 1] # => "user_email=admin#example.com"
str[/(user_email=(.+))/, 2] # => "admin#example.com"
str[/user_email=(.+)/, 1] # => "admin#example.com"
Or, having more regex fun:
match = str.match(/(?<user_email>user_email=(?<addr>.+))/)
match # => #<MatchData "user_email=admin#example.com" user_email:"user_email=admin#example.com" addr:"admin#example.com">
match['user_email'] # => "user_email=admin#example.com"
match['addr'] # => "admin#example.com"
Regular expressions are a very rich language, and you can usually write the same thing in many ways. The problem then becomes maintaining the pattern as the program "matures". I recommend starting simply and expanding the pattern as needs dictate. Don't start with a complex pattern and hope it works right away; it usually won't.

Regular expression for only 2 letters

I need to create regular expression for 2 and only 2 letters. I understood it has to be the following /[a-z]{2}/i, but it matches any string with 2 or more letters. Here is what I get:
my_reg_exp = /[a-z]{2}/i
my_reg_exp.match('aa') # => #<MatchData "aa">
my_reg_exp.match('AA') # => #<MatchData "AA">
my_reg_exp.match('a') # => nil
my_reg_exp.match('aaa') # => #<MatchData "aa">
Any suggestion?

You can add the anchors like this:
my_reg_exp = /^[a-z]{2}$/i
Test:
my_reg_exp.match('aaa')
#=> nil
my_reg_exp.match('aa')
#=> #<MatchData "aa">
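Note that ^ and $ anchor to lines rather than to the whole string, so a multi-line input can still slip through. A short sketch of the difference, using \A and \z instead; the variable names and the sample string are made up:

```ruby
two_letters_per_line = /^[a-z]{2}$/i
two_letters_whole    = /\A[a-z]{2}\z/i

multiline = "aa\nbbb"
two_letters_per_line =~ multiline # => 0, the first line alone satisfies ^...$
two_letters_whole    =~ multiline # => nil
two_letters_whole    =~ 'aa'      # => 0
```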

Hao's solution isn't locale-sensitive: [a-z] matches only ASCII letters. If that matters for your use case, use the POSIX bracket expression [[:alpha:]]:
/\A[[:alpha:]]{2}\z/
2.0.0-p451 :005 > 'aba' =~ /\A[[:alpha:]]{2}\Z/
=> nil
2.0.0-p451 :006 > 'ab' =~ /\A[[:alpha:]]{2}\Z/
=> 0
2.0.0-p451 :007 > 'xy' =~ /\A[[:alpha:]]{2}\Z/
=> 0
2.0.0-p451 :008 > 'zxy' =~ /\A[[:alpha:]]{2}\Z/
=> nil
Per usual, if you need further assistance, leave a comment.
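The difference only shows up with non-ASCII letters, which the transcript above does not exercise. A minimal sketch, assuming UTF-8 strings; the sample text is invented:

```ruby
accented = "\u00e9\u00e8" # two accented letters, "e" with acute and "e" with grave

ascii_only = /\A[a-z]{2}\z/i
any_letter = /\A[[:alpha:]]{2}\z/

ascii_only =~ accented # => nil
any_letter =~ accented # => 0
any_letter =~ 'a1'     # => nil, digits are not letters
```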

You can use /\b[a-z]{2}\b/i to match a two-letter word. \b matches a word boundary.
This means you can scan a string to find all occurrences:
'Foo is a bar'.scan(/\b[a-z]{2}\b/i) #=> ["is"]
Or find the first match in a string using:
'a bc def'[/\b[a-z]{2}\b/i] # => "bc"

Trim a trailing .0

I have an Excel column containing part numbers. Here is a sample
As you can see, it can be many different datatypes: Float, Int, and String. I am using roo gem to read the file. The problem is that roo interprets integer cells as Float, adding a trailing zero to them (16431 => 16431.0). I want to trim this trailing zero. I cannot use to_i because it will trim all the trailing numbers of the cells that require a decimal in them (the first row in the above example) and will cut everything after a string char in the String rows (the last row in the above example).
Currently, I have a method that checks the last two characters of the cell and trims them if they are ".0":
def trim(row)
  if row[0].to_s[-2..-1] == ".0"
    row[0] = row[0].to_s[0..-3]
  end
end
This works, but it feels terrible and hacky. What is the proper way of getting my Excel file contents into a Ruby data structure?

def trim num
  i, f = num.to_i, num.to_f
  i == f ? i : f
end
trim(2.5) # => 2.5
trim(23) # => 23
or, from string:
def convert x
  Float(x)
  i, f = x.to_i, x.to_f
  i == f ? i : f
rescue ArgumentError
  x
end
convert("fjf") # => "fjf"
convert("2.5") # => 2.5
convert("23") # => 23
convert("2.0") # => 2
convert("1.00") # => 1
convert("1.10") # => 1.1

For those using Rails, ActionView has the number_with_precision method that takes a strip_insignificant_zeros: true argument to handle this.
number_with_precision(13.00, precision: 2, strip_insignificant_zeros: true)
# => "13"
number_with_precision(13.25, precision: 2, strip_insignificant_zeros: true)
# => "13.25"
See the number_with_precision documentation for more information.

This should cover your needs in most cases: some_value.gsub(/(\.)0+$/, '').
It removes a decimal point that is followed only by zeroes at the end of the string. Otherwise, it leaves the string alone, so a value such as 123.560 keeps its trailing zero.
It is also cheap, as it is entirely string-based and requires no floating point or integer conversions, assuming your input value is already a string:
Loading development environment (Rails 3.2.19)
irb(main):001:0> '123.0'.gsub(/(\.)0+$/, '')
=> "123"
irb(main):002:0> '123.000'.gsub(/(\.)0+$/, '')
=> "123"
irb(main):003:0> '123.560'.gsub(/(\.)0+$/, '')
=> "123.560"
irb(main):004:0> '123.'.gsub(/(\.)0+$/, '')
=> "123."
irb(main):005:0> '123'.gsub(/(\.)0+$/, '')
=> "123"
irb(main):006:0> '100'.gsub(/(\.)0+$/, '')
=> "100"
irb(main):007:0> '127.0.0.1'.gsub(/(\.)0+$/, '')
=> "127.0.0.1"
irb(main):008:0> '123xzy45'.gsub(/(\.)0+$/, '')
=> "123xzy45"
irb(main):009:0> '123xzy45.0'.gsub(/(\.)0+$/, '')
=> "123xzy45"
irb(main):010:0> 'Bobby McGee'.gsub(/(\.)0+$/, '')
=> "Bobby McGee"
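Applied to spreadsheet cells, the same idea could be wrapped in a small helper. This is only a sketch: the method name strip_point_zero is hypothetical, it calls to_s first so Float cells work too, and it uses sub with \z because there is at most one match at the end of the string:

```ruby
# Hypothetical helper: drop a trailing ".0", ".00", ... from a cell value.
def strip_point_zero(value)
  value.to_s.sub(/\.0+\z/, '')
end

strip_point_zero(16431.0)     # => "16431"
strip_point_zero(12.5)        # => "12.5"
strip_point_zero('AB-12.0')   # => "AB-12"
strip_point_zero('127.0.0.1') # => "127.0.0.1"
```

The result is always a String, so this fits best when part numbers are stored as text.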

roo returns numeric values as type :float, so convert only the Float cells that have no fractional part:
def convert_cell(cell)
  if cell.is_a?(Float)
    i = cell.to_i
    cell == i.to_f ? i : cell
  else
    cell
  end
end
convert_cell("foobar") # => "foobar"
convert_cell(123) # => 123
convert_cell(123.4) # => 123.4
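To run this over a whole sheet, the conversion can be mapped across every row. A minimal sketch: convert_cell is repeated in compact form so the block is self-contained, and the rows array is made-up data standing in for whatever the spreadsheet reader returns:

```ruby
def convert_cell(cell)
  return cell unless cell.is_a?(Float)
  i = cell.to_i
  cell == i ? i : cell # keep 1.5 as a Float, turn 16431.0 into 16431
end

# Made-up rows; real data would come from the spreadsheet.
rows = [[1.5, 'widget'], [16431.0, 'gear'], ['AB-12', 'bolt']]

cleaned = rows.map { |row| row.map { |cell| convert_cell(cell) } }
cleaned.map(&:first) # => [1.5, 16431, "AB-12"]
```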
