Regexp repetition defined by backreference

In Ruby (whose regex engine is Onigmo, not PCRE), is it possible to use a backreference to a captured decimal value as a repetition count?
/^(\d+),.{\1}/.match('4,abcdefgh') # Should match '4,abcd'
The above code just returns nil (finds no matches).

You can use String#to_i, which parses the number at the start of the string, and interpolate that number into the pattern before the regex is built:
str = '4,abcdefgh'
str.match(/^(\d+),.{#{str.to_i}}/) # => #<MatchData "4,abcd" 1:"4">
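The interpolation trick works because the count is known before the regex is compiled. The same idea can be wrapped in a small helper that first reads the leading digits and then builds a second regex from them. This is a minimal sketch; `match_counted` is a hypothetical name, not a library method.

```ruby
# Hypothetical helper: read the leading count, then build a second regex from it.
def match_counted(str)
  count = str[/\A\d+/] or return nil        # e.g. "4" from "4,abcdefgh"
  str.match(/\A#{count},.{#{count.to_i}}/)  # becomes /\A4,.{4}/ for that input
end

p match_counted('4,abcdefgh')  # => #<MatchData "4,abcd">
p match_counted('12,abc')      # => nil (fewer than 12 characters follow the comma)
```

Note that `.` does not match a newline in Ruby unless the regex has the `/m` flag, so counted data containing newlines would need that flag.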

No, you can't do that with regular expressions. If the range of decimal values is limited, however, you could build a regular expression containing all possible combinations, something like:
'1abcde2abcde3abcde4abcde'.scan(/1.{1}|2.{2}|3.{3}|4.{4}/)
#=> ["1a", "2ab", "3abc", "4abcd"]
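Rather than typing every alternative by hand, the alternation can be generated. This sketch assumes single-digit counts (1 to 9); with multi-digit counts the alternation order matters, because input such as "12..." could be matched by `1.{1}` first.

```ruby
# Build /1.{1}|2.{2}|...|9.{9}/ programmatically (assumes counts 1..9 only).
counts  = 1..9
pattern = Regexp.new(counts.map { |n| "#{n}.{#{n}}" }.join('|'))

p pattern
# => /1.{1}|2.{2}|3.{3}|4.{4}|5.{5}|6.{6}|7.{7}|8.{8}|9.{9}/
p '1abcde2abcde3abcde4abcde'.scan(pattern)
# => ["1a", "2ab", "3abc", "4abcd"]
```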

You could use one regular expression to capture the count prefix and then slice the string:
str = '4,abcdefgh'
str =~ /\A(\d+,)/
str[0,$1.size+$1.to_i]
#=> "4,abcd"
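The read-the-count-then-take-that-many idea extends to a string made of several count-prefixed fields. A minimal sketch using Ruby's standard `strscan` library; `read_counted_fields` and the "4,abcd3,xyz" input format are hypothetical, chosen only to illustrate the technique.

```ruby
require 'strscan'

# Hypothetical helper: split a string of count-prefixed fields, e.g. "4,abcd3,xyz".
# Each step scans the count with one regex, then consumes exactly that many
# characters with a second regex built from the count.
def read_counted_fields(input)
  ss = StringScanner.new(input)
  fields = []
  until ss.eos?
    prefix = ss.scan(/\d+,/) or raise(ArgumentError, "expected a count at byte offset #{ss.pos}")
    n = prefix.to_i # "4,".to_i => 4
    field = ss.scan(/.{#{n}}/m) or raise(ArgumentError, "field shorter than #{n} characters")
    fields << field
  end
  fields
end

p read_counted_fields('4,abcd3,xyz') # => ["abcd", "xyz"]
p read_counted_fields('2,a,1,b')     # => ["a,", "b"] (commas inside a field are fine)
```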

Related

Simple regex - ignoring certain characters

I'm trying to use the match method with a regex argument to select a valid phone number, which I define as any string containing ten digits.
For example:
9347584987 is valid,
(456)322-3456 is valid,
(324)5688890 is valid.
But
(340)HelloWorld is NOT valid and
456748 is NOT valid.
So far, I'm able to use \d{10} to select the example string of 10 digit characters in a row, but I'm not sure how to ignore characters such as '-', '(' or ')' in the middle of the sequence.
What kind of Regex could I use here?

Given:
nums=['9347584987','(456)322-3456','(324)5688890','(340)HelloWorld', '456748 is NOT valid']
You can split on non-digit characters and rejoin the pieces, which removes every non-digit:
> nums.map {|s| s.split(/\D/).join}
["9347584987", "4563223456", "3245688890", "340", "456748"]
Then filter on the length:
> nums.map {|s| s.split(/\D/).join}.select {|s| s.length==10}
["9347584987", "4563223456", "3245688890"]
Or, you can grab a run of characters that look like part of a phone number by using a regex that matches digits and common delimiters:
> nums.map {|s| s[/[\d\-()]+/]}
["9347584987", "(456)322-3456", "(324)5688890", "(340)", "456748"]
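And then process that list as above.
The two approaches give different results for a string like this:
> '123 is NOT a valid area code for 456-7890'[/[\d\-()]+/]
=> "123" # only three digits, so the length filter rejects it
vs
> '123 is NOT a valid area code for 456-7890'.split(/\D/).join
=> "1234567890" # ten digits, so the length filter accepts it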
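Both strategies from this answer can be written as predicates and compared side by side. In this sketch `gsub(/\D/, '')` does the same job as `split(/\D/).join`, and the method names `ten_digits_overall?` and `ten_digits_in_first_run?` are hypothetical.

```ruby
nums = ['9347584987', '(456)322-3456', '(324)5688890', '(340)HelloWorld', '456748 is NOT valid']

# Strategy 1: strip every non-digit, then require exactly ten digits.
def ten_digits_overall?(s)
  s.gsub(/\D/, '').length == 10
end

# Strategy 2: take the first run of digits and ( ) - characters,
# then count only the digits inside that run.
def ten_digits_in_first_run?(s)
  run = s[/[\d\-()]+/] or return false
  run.count('0-9') == 10
end

p nums.select { |s| ten_digits_overall?(s) }
# => ["9347584987", "(456)322-3456", "(324)5688890"]

tricky = '123 is NOT a valid area code for 456-7890'
p ten_digits_overall?(tricky)      # => true
p ten_digits_in_first_run?(tricky) # => false
```

Which behaviour is right depends on whether digits scattered through free text should count as a phone number.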

I suggest using one regular expression for each valid pattern rather than constructing a single regex. Separate patterns are easier to test, debug, and maintain. If, for example, "123-456-7890" or "123-456-7890 x231" were in future deemed valid numbers, one need only add a single, simple regex for each to the array VALID_PATTERNS below.
VALID_PATTERNS = [/\A\d{10}\z/, /\A\(\d{3}\)\d{3}-\d{4}\z/, /\A\(\d{3}\)\d{7}\z/]
def valid?(str)
  VALID_PATTERNS.any? { |r| str.match?(r) }
end
ph_nbrs = %w| 9347584987 (456)322-3456 (324)5688890 (340)HelloWorld 456748 |
ph_nbrs.each { |s| puts "#{s.ljust(15)} \#=> #{valid?(s)}" }
9347584987      #=> true
(456)322-3456   #=> true
(324)5688890    #=> true
(340)HelloWorld #=> false
456748          #=> false
String#match? made its debut in Ruby v2.4. There are many alternatives, including str.match(r) and str =~ r.
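To illustrate the maintenance argument, here is a sketch of the array after adding one more format. The fourth pattern, for "123-456-7890", is a hypothetical addition taken from the answer's own example, and the constant is spelled VALID_PATTERNS here.

```ruby
# The answer's three patterns, plus one hypothetical entry for "123-456-7890".
VALID_PATTERNS = [
  /\A\d{10}\z/,               # 9347584987
  /\A\(\d{3}\)\d{3}-\d{4}\z/, # (456)322-3456
  /\A\(\d{3}\)\d{7}\z/,       # (324)5688890
  /\A\d{3}-\d{3}-\d{4}\z/     # 123-456-7890 (hypothetical addition)
].freeze

def valid?(str)
  VALID_PATTERNS.any? { |r| r.match?(str) }
end

p valid?('123-456-7890')    # => true
p valid?('(340)HelloWorld') # => false
```

Each new format is one self-contained line, so it can be tested in isolation without touching the other patterns.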

"9347584987" =~ /(?:\d.*){10}/ #=> 0
"(456)322-3456" =~ /(?:\d.*){10}/ #=> 1
"(324)5688890" =~ /(?:\d.*){10}/ #=> 1
"(340)HelloWorld" =~ /(?:\d.*){10}/ #=> nil
"456748" =~ /(?:\d.*){10}/ #=> nil
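The unanchored pattern above only checks that enough digits appear somewhere after the match position, so a string with extra digits also matches. A minimal sketch of an anchored variant requiring exactly ten digits (ten is taken from the question's examples). Because `\d` and `\D` never overlap, the engine has only one way to split the string, so it does not backtrack heavily. Like the original, it still accepts letters between digits, e.g. "12a34567890".

```ruby
# Anchored at both ends: exactly ten digits, with any non-digit characters
# allowed before, between, or after them.
EXACTLY_TEN = /\A\D*(?:\d\D*){10}\z/

samples = %w[9347584987 (456)322-3456 (324)5688890 (340)HelloWorld 456748 12345678901]
p samples.select { |s| s.match?(EXACTLY_TEN) }
# => ["9347584987", "(456)322-3456", "(324)5688890"]
```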

Pattern:
^\(?\d{3}\)?\d{3}-?\d{4}$ # this makes the expected symbols optional
The following pattern ensures that an opening ( at the start of the string is followed by three digits and then a closing ), so a lone parenthesis is rejected:
^(\(\d{3}\)|\d{3})\d{3}-?\d{4}$
On principle, though, I agree with melpomene in advising that you remove all non-digit characters, test for a length of 10, then store/handle the phone numbers in a single, reliable, basic format.
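The two patterns in this answer can be compared directly in Ruby. In this sketch `\A` and `\z` are swapped in for `^` and `$`, because in Ruby `^` and `$` match at line boundaries rather than only at the ends of the string; the strict pattern also uses a non-capturing group. The last two samples are made-up inputs with a lone parenthesis: the loose pattern accepts them, the strict one rejects them.

```ruby
loose  = /\A\(?\d{3}\)?\d{3}-?\d{4}\z/          # both parentheses optional, independently
strict = /\A(?:\(\d{3}\)|\d{3})\d{3}-?\d{4}\z/  # either "(ddd)" or "ddd", never half

%w[9347584987 (456)322-3456 (324)5688890 (4563223456 456)3223456].each do |s|
  puts format('%-15s loose=%-5s strict=%s', s, s.match?(loose), s.match?(strict))
end
```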

Can I use positive lookbehind to return a match in Ruby?

Suppose that I want to find all words in a given string that start with b and end with ing. However, I only want to return the portion of the word that precedes the ing. Thus, if the word is bailing, I should only match and return bail.
The following Ruby regex will certainly match:
\bb[a-zA-Z]*ing\b
but it doesn't return just the "bail" portion. Can I use some kind of lookahead or lookbehind assertion? If not, what is a good way to do this in Ruby?

words = "booster bailings balling failing"
words.scan /(?<=\b)b\w*?(?=ing\b)/
#⇒ ["ball"]
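In this answer it is the lookahead `(?=ing\b)` that keeps "ing" out of the result; the lookbehind `(?<=\b)` is redundant, because `\b` is already a zero-width assertion. A small sketch confirming that both forms return the same result on the answer's input:

```ruby
words = "booster bailings balling failing"

# \b consumes no characters, so wrapping it in (?<=...) changes nothing.
with_lookbehind = words.scan(/(?<=\b)b\w*?(?=ing\b)/)
plain_boundary  = words.scan(/\bb\w*?(?=ing\b)/)

p with_lookbehind                   # => ["ball"]
p with_lookbehind == plain_boundary # => true
```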

Here are two ways to extract the desired information.
str = "blathering fumbling blinging bérgering blings"
str.scan(/\bb[[:alpha:]]*(?=ing\b)/)
#=> ["blather", "bling", "bérger"]
str.scan(/\b(b[[:alpha:]]*)ing\b/).flatten
#=> ["blather", "bling", "bérger"]
whereas the ASCII-only class [a-zA-Z] misses the accented word:
str.scan(/\bb[a-zA-Z]*(?=ing\b)/)
#=> ["blather", "bling"]
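If you need the full match object for each hit (for example its position), String#scan with a block sets `$~` to the current MatchData before each yield. A minimal sketch on the answer's string; the `hits` array is only for illustration.

```ruby
str = "blathering fumbling blinging bérgering blings"
re  = /\bb[[:alpha:]]*(?=ing\b)/

hits = []
str.scan(re) do
  m = $~                     # MatchData for the current match
  hits << [m[0], m.begin(0)] # matched stem and its character offset
end

p hits # => [["blather", 0], ["bling", 20], ["bérger", 29]]
```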

Why does /[<>]/ not return both angle brackets with String#match?

I expect this example to match the two characters < and >:
a = "<1acf457f477b41d4a363e89a1d0a6e57#Open-Xchange>"
a.match /[<>]/
# => #<MatchData "<">
It matches only the first character. Why?

#match returns only the first match (as a MatchData object), as you have seen; #scan returns all matches.
>> a="<1acf457f477b41d4a363e89a1d0a6e57#Open-Xchange>"
=> "<1acf457f477b41d4a363e89a1d0a6e57#Open-Xchange>"
>> a.scan /[<>]/
=> ["<", ">"]

Problem
You are misunderstanding your expression. /[<>]/ means:
Match a single character from the character class, which may be either < or >.
Ruby is correctly giving you exactly what you've asked for in your pattern.
Solution
If you're expecting the entire string between the two characters, you need a different pattern. For example:
"<1acf457f477b41d4a363e89a1d0a6e57#Open-Xchange>".match /<.*?>/
#=> #<MatchData "<1acf457f477b41d4a363e89a1d0a6e57#Open-Xchange>">
Alternatively, if you just want to match all the instances of < or > in your string, then you should use String#scan with a character class or alternation. In this particular case, the results will be identical either way. For example:
"<1acf457f477b41d4a363e89a1d0a6e57#Open-Xchange>".scan /<|>/
#=> ["<", ">"]
"<1acf457f477b41d4a363e89a1d0a6e57#Open-Xchange>".scan /[<>]/
#=> ["<", ">"]
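To see why #match stops after one hit: it searches once and returns one MatchData. Its optional second argument is the position to start searching from, so a loop can resume after each match. This is a sketch of what #scan does for you, not a recommended replacement for it.

```ruby
a = "<1acf457f477b41d4a363e89a1d0a6e57#Open-Xchange>"

found = []
pos = 0
while (m = a.match(/[<>]/, pos)) # search again starting at pos
  found << m[0]
  pos = m.end(0)                 # resume just after the previous match
end

p found                   # => ["<", ">"]
p found == a.scan(/[<>]/) # => true
```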

How can I specify terms of a match in a regular expression but only capture part of the match?

I want to match any abc so long as it does not follow x.
I would be inclined to use the regular expression /[^x]abc/, but then if I write dabc it matches dabc, while I want it to only match the abc part.
How can I use something like [^x] to qualify the start of the regular expression without it counting in my match?

Use Negative Lookbehind
You can use a negative lookbehind, and then access the last match through the special variable $&. For example:
domatch = 'foo abc'
nomatch = 'xabc'
pattern = /(?<!x)abc/
domatch.match pattern
$&
#=> "abc"
nomatch.match pattern
$&
#=> nil
See the Regexp class for details.

Adding () creates a capturing group that you can reference by index:
a = 'dabc'
p a.match(/[^x](abc)/)[1]
# prints "abc"

Use String#[] with the regex and, as a second argument, the index of the capture group. In this case there is only one group, so 1 returns it:
my_string = 'dabc'
my_string[/[^x](abc)/, 1] # => 'abc'
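The lookbehind and capture-group answers differ when abc is at the very start of the string: `[^x]` must consume one character, while `(?<!x)` only asserts that the previous character (if any) is not x. A short comparison on three sample inputs:

```ruby
inputs = ['dabc', 'xabc', 'abc']

lookbehind = inputs.map { |s| s[/(?<!x)abc/] }    # "not preceded by x", consumes nothing
capture    = inputs.map { |s| s[/[^x](abc)/, 1] } # needs one non-x character before abc

p lookbehind # => ["abc", nil, "abc"]
p capture    # => ["abc", nil, nil]
```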

String containment

Is there a way to check in Ruby whether the string "1:/2" is contained within a larger string str, besides iterating over all positions of str?

You can use the String#include? method:
str = "wdadwada1:/2wwedaw"
# => "wdadwada1:/2wwedaw"
str.include? "1:/2"
# => true

A regular expression will do that.
s =~ /1:\/2/
This will return either nil if s does not contain the string, or the integer position if it does. Since nil is falsy and an integer is truthy, you can use this expression in an if statement:
if s =~ /1:\/2/
  # s contains "1:/2"
end
The regular expression is normally delimited by /, which is why the slash within it is escaped as \/.
It is possible to use a different delimiter to avoid having to escape the /:
s =~ %r"1:/2"
You could use other characters than " with this syntax, if you want.
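The answers in this thread can be checked against the same input. String#include? answers yes/no, String#index and `=~` return the position, and Regexp.escape turns an arbitrary needle into a literal pattern when you do want a regex.

```ruby
str    = "wdadwada1:/2wwedaw"
needle = "1:/2"

p str.include?(needle) # => true
p str.index(needle)    # => 8
p str =~ /1:\/2/       # => 8
p str =~ %r{1:/2}      # => 8 (no escaping needed with %r{})
p str.match?(Regexp.new(Regexp.escape(needle))) # => true
```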

The simplest and most straightforward approach is to ask the string whether it contains the substring, using String#[]:
"...the string 1:/2 is contained..."['1:/2']
# => "1:/2"
!!"...the string 1:/2 is contained..."['1:/2']
# => true
The String#[] documentation has the full details; look at the last two examples there.

