Suppose I have the following test string:
I want to match:
123
456
I want to use a regex to capture 123 and 456.
I'm trying this:
I want to match:(?:\n\s\s(\d*))*
but it only captures the last group. Any ideas?
I want to match:\K|\G(?!^)(?:\n\s\s(\d+))
You can use \G for this, as the regex engine only remembers the last value captured by a group. See the demo.
https://regex101.com/r/pG1kU1/13
You can use String#scan
s = <<-STR
123
456
STR
s.scan /\d+/ #=> ["123", "456"]
s.scan /(\d+)/ #=> [["123"], ["456"]]
Once you match the part I want to match: in the string, your next matching point will be somewhere after that. This means that you can only match once with that regex, given the format of your test string.
Unfortunately, you cannot get all the captured values of a capture group.
A good explanation is here, where it says:
The Returned Value for a Given Group is the Last One Captured
Since a capture group with a quantifier holds on to its number, what value does the engine return when you inspect the group? All engines return the last value captured. For instance, if you match the string A_B_C_D_ with ([A-Z]_)+, when you inspect the match, Group 1 will be D_. With the exception of the .NET engine, all intermediate values are lost. In essence, Group 1 gets overwritten each time its pattern is matched.
So, if you know that there will be at most two lines to capture, you may try something like this:
I want to match:(?:\n\s\s(\d*))?(?:\n\s\s(\d*))?
i.e. manually repeat the capture group for each line.
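A minimal Ruby sketch of the behaviour described above, assuming the test string from the question (the variable names are only illustrative). It shows that a repeated group keeps just its last capture, and one possible workaround when the number of lines is not known: capture the whole indented block first, then scan inside it.

```ruby
text = "I want to match:\n  123\n  456"

# A quantified group is overwritten on every repetition, so only the last value survives.
last_only = text[/I want to match:(?:\n\s\s(\d+))*/, 1]

# Workaround: capture the whole block of indented lines, then scan it for numbers.
block = text[/I want to match:((?:\n\s\s\d+)*)/, 1]
numbers = block.scan(/\d+/)

p last_only # prints "456"
p numbers   # prints ["123", "456"]
```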
You can capture your example string like this: \n*\s*(\d+)
I am trying to write a regular expression to get the value in between parentheses. I expect a value without parentheses. For example, given:
value = "John sinu.s(14)"
I expected to get 14.
I tried the following:
value[/\(.*?\)/]
but it gives the result (14). Please help me.
You may do that using
value[/\((.*?)\)/, 1]
or
value[/\(([^()]*)\)/, 1]
Use a capturing group and a second argument to extract just the group value.
Note that \((.*?)\) will also match a substring that itself contains a ( character, whereas the second pattern only matches text between parentheses that contains neither ( nor ), since [^()] is a negated character class that matches any character except ( and ).
See the Ruby demo online.
From the Ruby docs:
str[regexp, capture] → new_str or nil
If a Regexp is supplied, the matching portion of the string is returned. If a capture follows the regular expression, which may be a capture group index or name, that component of the MatchData is returned instead.
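As a small illustration of the documented str[regexp, capture] form, here is a sketch using the string from the question. The group name num and the variable names are made up for the example.

```ruby
value = "John sinu.s(14)"

by_index = value[/\((\d+)\)/, 1]            # capture group selected by number
by_name  = value[/\((?<num>\d+)\)/, "num"]  # capture group selected by name
no_match = value[/\[(\d+)\]/, 1]            # nil, because there are no square brackets

p by_index, by_name, no_match
```

Getting nil back for a failed match is handy, since it avoids calling [] on a nil MatchData.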
In case you need to extract multiple occurrences, use String#scan:
value = "John sinu.s(14) and Jack(156)"
p value.scan(/\(([^()]*)\)/).flatten
# => ["14", "156"]
See another Ruby demo.
Another option is to use lookarounds, which assert the parentheses without making them part of the match:
value[/(?<=\().*(?=\))/]
(?<=\() - positive lookbehind: makes sure there is a ( before the match, but doesn't include it
(?=\)) - positive lookahead: makes sure a ) follows the match, but doesn't include it
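One thing to watch with the lookaround version is that .* is greedy. A quick sketch, assuming a made-up string with two parenthesised values, of how the greedy, lazy and negated-class variants differ:

```ruby
value = "John(14) and Jack(156)"

greedy = value[/(?<=\().*(?=\))/]    # .* runs on to the last ")"
lazy   = value[/(?<=\().*?(?=\))/]   # .*? stops before the first ")"
all    = value.scan(/(?<=\()[^()]*(?=\))/)

p greedy # prints "14) and Jack(156"
p lazy   # prints "14"
p all    # prints ["14", "156"]
```

With a single pair of parentheses, as in the question, all three give the same result.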
You can use
/(?<=\()[^)]+/
which selects the string inside the parentheses, without the parentheses themselves.
The only thing you need is the "positive lookbehind" feature.
Follow this link for more info about lookbehind in special groups.
I don't know if it is supported in Ruby.
Try using this regular expression
/\((.*?)\)/
\( will match your opening parenthesis in the string
(.*?) creates a capturing group
\) will match your closing parenthesis
Do you wish to extract the string between the parentheses or do that using a regular expression? You specify the latter in the question but it's conceivable your question is really the former and you are assuming that a regular expression must be used.
If you just want the value, without any restriction on the method used to obtain it, you could do that quite simply using String#index and String#rindex.
s = "John sinu.s(14)"
s[s.index('(')+1 .. s.rindex(')')-1]
#=> "14"
I am trying to fix a bit of regex I have for a chatops bot for lita. I have the following regex:
/^(?:how\s+do\s+I\s+you\s+get\s+far\s+is\s+it\s+from\s+)?(.+)\s+to\s+(.+)/i
This is supposed to capture the words before and after 'to', with optional words in front that can form questions like: How do I get from x to y, how far from x to y, how far is it from x to y.
expected output:
match 1 : "x"
match 2 : "y"
For the most part my optional words work as expected. But when I pull my response matches, I get the words leading up to the first capture group included.
So, how far is it from sfo to lax should return:
sfo and lax.
But instead returns:
how far is it from sfo and lax
Your glitch is that the first chunk of your regex doesn't make sense.
To choose from multiple options, use this syntax:
(a|b|c)
What I think you're trying to do is this:
/^(?:(?:how|do|I|you|get|far|is|it|from)\s+)*(.+)\s+to\s+(.+)/i
The regexp says to skip all the words in the multiple options, regardless of order.
If you want to preserve word order, you can use regexps such as this pseudocode:
… how (can|do|will) (I|you|we) (get|go|travel) from …
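Here is a rough Ruby sketch of both ideas. The first pattern is the one given above; the second is a hypothetical fleshed-out version of the pseudocode that enforces word order, so treat its word lists as placeholders.

```ruby
# Skips any of the listed filler words, in any order.
any_order = /^(?:(?:how|do|I|you|get|far|is|it|from)\s+)*(.+)\s+to\s+(.+)/i

# Hypothetical stricter variant: the filler words must appear in this order.
ordered = /^how\s+(?:can|do|will)\s+(?:I|you|we)\s+(?:get|go|travel)\s+from\s+(.+)\s+to\s+(.+)/i

a = any_order.match("how far is it from sfo to lax").captures
b = ordered.match("How do I get from sfo to lax").captures
c = ordered.match("how far is it from sfo to lax")

p a # prints ["sfo", "lax"]
p b # prints ["sfo", "lax"]
p c # prints nil, since "far is it" does not fit the ordered pattern
```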
When you want to match words, \w is the most natural pattern to use (e.g., it is used in word-count tools).
Capturing one word before and one word after a "to" can be done with the regex (\w+\s+to\s+\w+).
To return them as two different groups, you can use (\w+)\s+to\s+(\w+).
Have a look at the demo.
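A short sketch of the two-group version in Ruby, using the sentence from the question plus a made-up second sentence to show the limitation: \w+ only ever grabs a single word on each side of "to".

```ruby
pattern = /(\w+)\s+to\s+(\w+)/

origin, destination = "how far is it from sfo to lax".match(pattern).captures
multi_word = "how far from new york to los angeles".match(pattern).captures

p [origin, destination] # prints ["sfo", "lax"]
p multi_word            # prints ["york", "los"], multi-word names are cut short
```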
I'm not quite sure I understand how non-capturing groups work. I am looking for a regex to produce this result: 5.214. I thought the regex below would work, but it is replacing everything including the non-capture groups. How can I write a regex to only replace the capture groups?
"5,214".gsub(/(?:\d)(,)(?:\d)/, '.')
# => ".14"
My desired result:
"5,214".gsub(some_regex)
#=> "5.214"
Non-capturing groups still consume the text they match.
Use
"5,214".gsub(/(\d+)(,)(\d+)/, '\1.\3')
or
"5,214".gsub(/(?<=\d)(,)(?=\d)/, '.')
You can't. gsub replaces the entire match; it does not do anything with the captured groups. It will not make any difference whether the groups are captured or not.
In order to achieve the result, you need to use lookbehind and lookahead.
"5,214".gsub(/(?<=\d),(?=\d)/, '.')
It is also possible to use Regexp.last_match (also available via $~) in the block version to get access to the full MatchData:
"5,214".gsub(/(\d),(\d)/) { |_|
  match = Regexp.last_match
  "#{match[1]}.#{match[2]}"
}
This scales better to more involved use-cases.
Nota bene, from the Ruby docs:
the ::last_match is local to the thread and method scope of the method that did the pattern match.
gsub replaces the entire match that the regular expression engine produces. Wrapping part of the pattern in a capturing or non-capturing group does not exclude it from that match. However, you could use \K together with a lookaround assertion, which does not "consume" any characters of the string.
"5,214".gsub(/\d\K,(?=\d)/, '.')
Explanation: The \K escape sequence resets the starting point of the reported match and any previously consumed characters are no longer included. That being said, we then look for and match the comma, and the Positive Lookahead asserts that a digit follows.
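To make the difference concrete, here is a small side-by-side sketch (the variable names are only illustrative). The non-capturing version replaces the digits too, because they are part of the match, while the lookaround and \K versions leave them in place.

```ruby
consumed   = "5,214".gsub(/(?:\d),(?:\d)/, '.')   # digits belong to the match and get replaced
lookaround = "5,214".gsub(/(?<=\d),(?=\d)/, '.')  # digits are only asserted
keep_out   = "5,214".gsub(/\d\K,(?=\d)/, '.')     # \K drops the leading digit from the match

p consumed   # prints ".14"
p lookaround # prints "5.214"
p keep_out   # prints "5.214"
```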
I know nothing about Ruby.
But from what I see in the tutorial,
gsub means replace, so
the pattern should be /(?<=\d),(?=\d)/ to just replace the comma with a dot,
or use captures, /(\d+),(\d+)/, and replace the match with '\1.\2'.
You can easily reference capture groups in the replacement string (second argument) like so:
"5,214".gsub(/(\d+)(,)(\d+)/, '\1.\3')
#=> "5.214"
\0 will be replaced by the whole matched string.
\1 will be replaced by the first capturing group.
\2 will be replaced by the second capturing group etc.
You could rewrite the example above using a non-capturing group for the , char.
"5,214".gsub(/(\d+)(?:,)(\d+)/, '\1.\2')
#=> "5.214"
As you can see, the part after the comma is now the second capturing group, since we defined the middle group as non-capturing.
Although it's kind of pointless in this case: you can just omit the capturing group for , altogether.
"5,214".gsub(/(\d+),(\d+)/, '\1.\2')
#=> "5.214"
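One caveat worth checking, sketched here with a made-up input that has two commas: because (\d+) consumes the digits on both sides, the digits after the first comma are no longer available to start the next match, so only the first comma is replaced. A lookaround-based pattern does not have that problem.

```ruby
with_groups     = "1,234,567".gsub(/(\d+),(\d+)/, '\1.\2')
with_lookaround = "1,234,567".gsub(/(?<=\d),(?=\d)/, '.')

p with_groups     # prints "1.234,567"
p with_lookaround # prints "1.234.567"
```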
You don't need regexp to achieve what you need:
'1,200.00'.tr('.','!').tr(',','.').tr('!', ',')
Periods become bangs (1,200!00)
Commas become periods (1.200!00)
Bangs become commas (1.200,00)
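As a possible simplification, String#tr maps characters position by position, so the swap can also be sketched as a single call with no placeholder character:

```ruby
# "." maps to "," and "," maps to "." in one pass.
swapped = '1,200.00'.tr('.,', ',.')

p swapped # prints "1.200,00"
```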
In Ruby, I want to have a regex match either of two expressions with a single group in the result. I want the following results:
regex = /you tell me/
regex.match(%|My name is "Peter"|)[1]
=> "Peter"
regex.match(%|My name is 'Peter'|)[1]
=> "Peter"
Note that I want the 1st group to refer to just Peter with no quotes, and I want there to be exactly one group matched in either case. Just as an example, this would match the first case (only):
/^My name is "([^"]*)"$/
I'd like something similar to that. I happen to be using this for cucumber testing.
This regex might work for you
['"](\w+)['"]
It matches exactly one group. But it also allows unbalanced quotes, like 'Peter"
If you want to match only balanced quotes, then you can't do it with a single group (I'm afraid).
Anyhow, here's my take:
('|")(\w+)\1
It matches two groups and "Peter" is in the second one.
http://rubular.com/r/C78X0wwGej
(?=['"](\w+)['"])(?:"\1"|'\1')
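A quick way to try the lookahead pattern above in Ruby, assuming the sample strings from the question plus one made-up string with mismatched quotes. The lookahead captures the name into group 1, and the alternation then insists that the quotes on both sides are the same.

```ruby
regex = /(?=['"](\w+)['"])(?:"\1"|'\1')/

double = regex.match(%|My name is "Peter"|)[1]
single = regex.match(%|My name is 'Peter'|)[1]
mixed  = regex.match(%|My name is 'Peter"|)

p double # prints "Peter"
p single # prints "Peter"
p mixed  # prints nil, the quotes are unbalanced
```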
I am getting completely different results from String#scan and several regex testers...
I am just trying to grab the domain from the string, it is the last word.
The regex in question:
/([a-zA-Z0-9\-]*\.)*\w{1,4}$/
The string (1 single line, verified in Ruby's runtime btw)
str = 'Show more results from software.informer.com'
It works fine in the regex testers, but in Ruby...
irb(main):050:0> str.scan /([a-zA-Z0-9\-]*\.)*\w{1,4}$/
=> [["informer."]]
I would think that I would get a match on software.informer.com, which is my goal.
Your regex is correct, the result has to do with the way String#scan behaves. From the official documentation:
"If the pattern contains groups, each individual result is itself an array containing one entry per group."
Basically, if you put parentheses around the whole regex, the first element of each array in your results will be what you expect.
It does not look as if you expect more than one result (especially as the regex is anchored). In that case there is no reason to use scan.
'Show more results from software.informer.com'[ /([a-zA-Z0-9\-]*\.)*\w{1,4}$/ ]
#=> "software.informer.com"
If you do need to use scan (in which case you obviously need to remove the anchor), you can use (?:) to create non-capturing groups.
'foo.bar.baz lala software.informer.com'.scan( /(?:[a-zA-Z0-9\-]*\.)*\w{1,4}/ )
#=> ["foo.bar.baz", "lala", "software.informer.com"]
You are getting a match on software.informer.com. Check the value of $&. The return value of scan is an array of the captured groups. Add capturing parentheses around the suffix, and you'll get com as part of the return value from scan as well.
The regex testers and Ruby are not disagreeing about the fundamental issue (the regex itself). Rather, their interfaces are differing in what they are emphasizing. When you run scan in irb, the first thing you'll see is the return value from scan (an Array of the captured subpatterns), which is not the same thing as the matched text. Regex testers are most likely oriented toward displaying the matched text.
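To see the difference between the matched text and what scan returns, here is a small sketch using the string and pattern from the question (the variable names are illustrative):

```ruby
str = 'Show more results from software.informer.com'

grouped       = str.scan(/([a-zA-Z0-9\-]*\.)*\w{1,4}$/)     # one entry per group: last capture only
non_capturing = str.scan(/(?:[a-zA-Z0-9\-]*\.)*\w{1,4}$/)   # no groups: the matched text itself
wrapped       = str.scan(/((?:[a-zA-Z0-9\-]*\.)*\w{1,4})$/) # one group around the whole pattern
whole_match   = str.match(/([a-zA-Z0-9\-]*\.)*\w{1,4}$/)[0] # same text that $& would hold

p grouped       # prints [["informer."]]
p non_capturing # prints ["software.informer.com"]
p wrapped       # prints [["software.informer.com"]]
p whole_match   # prints "software.informer.com"
```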
How about doing this:
/([a-zA-Z0-9\-]*\.*\w{1,4})$/
This returns
informer.com
On your test string.
http://rubular.com/regexes/13670