In Perl, you can do this:
(?x)
(?(DEFINE)
(?<animal>dog|cat)
)
(?&animal)
In Ruby (Oniguruma engine), it seems that the (?(DEFINE... syntax is not supported. Also, (?&... becomes \g. So, you can do this:
(?x)
(?<animal>dog|cat)
\g<animal>
But of course, this is not equivalent to the Perl example I gave above, because the first (?<animal>dog|cat) is not ignored, since there isn't anything like (?(DEFINE....
If I want to define a large regex with a bunch of named subroutines, what I could once do in Perl can't be done this way.
It does seem that I could hack together a pretty awkward solution by doing something like this:
(?x)
(?:^$DEFINE
(?<animal>dog|cat)
){0}
\g<animal>
But, that is pretty hackish. Is there a better way to do this? Does Oniguruma support a way to define named subroutines without having to try to "match" them first?
Alternatively, if there is a way to get true PCRE to work in Ruby, with ?(DEFINE... and (?&... I'd take that too.
Thanks!
You don't need such a complicated hack. Writing:
(?x)
(?<animal>dog|cat){0}
(?<color>red|green|blue){0}
...
your main pattern here
does exactly the same.
Putting all group definitions inside (?:^$DEFINE ... ){0} is only cosmetic.
Note that a group with the quantifier {0} isn't tried at all (the quantifier is taken into account first). Since the named group is still defined this way, one can conclude that it isn't really a hack, but the way to do it with Oniguruma.
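As a minimal sketch of the {0} idiom in a Ruby regexp literal: the names animal and color come from the example above, while the main pattern and the sample strings are made up for illustration.

```ruby
# Subroutine definitions: the {0} quantifier means these groups are
# never matched where they are written, only where they are called.
phrase = /
  (?<animal> dog | cat ){0}
  (?<color>  red | green | blue ){0}

  # Main pattern (hypothetical): a color, whitespace, then an animal.
  \A \g<color> \s+ \g<animal> \z
/x

puts phrase.match?("red dog")    # true
puts phrase.match?("purple dog") # false
puts phrase.match?("dog")        # false
```

The last line shows that the definitions themselves do not take part in the match: "dog" alone fails because only the main pattern is tried.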
My company uses FreeBSD, and therefore FreeBSD's flavor of make.
A few of our in-house ports include something like this (where BRANCH is something that came from an SVN URL, either 'trunk' or a branch name like 'branches/1.2.3').
PORTVERSION= ${BRANCH:C,^branches/,,}
The Variable modifiers section of make(1) documents the :C colon-c modifier as
:C/pattern/replacement/[1gW]
Am I looking at the right documentation? ^branches/ looks like a regex pattern to me, but it looks like the actual code uses , instead of / as a separator. Did I skip documentation explaining that?
The documentation says:
:C/pattern/replacement/[1gW]
The :C modifier is just like the :S modifier except that the old and new strings, instead of being simple strings, are an extended regular expression (see regex(3)) string pattern and an ed(1)-style string replacement.
and in :S:
Any character may be used as a delimiter for the parts of the modifier string.
As @MadScientist pointed out, it's quite common to use a different delimiter, especially when / is a part of pattern or replacement string, like in your case. Otherwise it would require escaping and would look like ${BRANCH:C/^branches\///} which seems less readable.
I'm trying to take out the content inside the parentheses. For example, if the string is "(blah blah) This is stack(over)flow", I want to just take out "(blah blah)" but leave "(over)" alone. I'm trying
/\A\(.*\)/
but it returns "(blah blah) This is stack(over)", and I'm not sure why it's returning that.
Easiest fix:
/\A\(.*?\)/
Normally, * will try to match as much as it possibly can, so it'll match all the way to the last ) in the line. This is called "greedy" matching. Putting ? after +/*/? makes them non-greedy, and they'll match the shortest possible string.
But note that this won't work for nested parentheses. That's rather more complicated. Given your example, I assume this is for a pretty simple ad-hoc format where nesting isn't a concern.
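A small sketch of the difference, using the string from the question. The variable names are only for illustration, and the trailing \s* in the last call is an added assumption to also drop the space after the parenthesized prefix.

```ruby
s = "(blah blah) This is stack(over)flow"

greedy = s[/\A\(.*\)/]    # .* runs to the last ")" in the string
lazy   = s[/\A\(.*?\)/]   # .*? stops at the first ")"

# Remove the leading parenthesized part and any whitespace after it.
rest = s.sub(/\A\(.*?\)\s*/, "")

puts greedy # (blah blah) This is stack(over)
puts lazy   # (blah blah)
puts rest   # This is stack(over)flow
```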
I have a regexp of the form:
/(something complex and boring)?(something complex and interesting)/
I'm interested in the contents of the second parenthesis; the first ones are there only to ensure a correct match (since the boring part might or might not be present but if it is, I'll match it by accident with the regexp for the interesting part).
So I can access the second match using $2. However, for uniformity with other regexps I'm using, I want $1 to somehow contain the contents of the second parenthesis. Is it possible?
Use a non-capturing group:
r = /(?:ab)?(cd)/
This is a general regexp feature, not one specific to Ruby. Use /(?:something complex and boring)?(something complex and interesting)/ (note the ?:) to achieve this.
By the way, in Ruby 1.9, you can do /(something complex and boring)?(?<interesting>something complex and interesting)/ and access the group with $~[:interesting] ;)
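A short sketch of both suggestions, with "ab" and "cd" standing in for the boring and interesting parts (the same placeholders as in the answer above). The variable names are illustrative.

```ruby
pattern = /(?:ab)?(cd)/

with_prefix    = pattern.match("abcd")
without_prefix = pattern.match("cd")

# The non-capturing group does not get a number, so (cd) is group 1
# whether or not the optional prefix is present.
puts with_prefix[1]              # cd
puts without_prefix[1]           # cd
puts with_prefix.captures.length # 1

# Named-group variant: look the capture up by name instead of number.
named = /(ab)?(?<interesting>cd)/.match("abcd")
puts named[:interesting]         # cd
```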
Yup, use the ?: syntax:
/(?:something complex and boring)?(something complex and interesting)/
I'm not a Ruby developer, but I know other regex flavors, so I bet you can use a non-capturing group:
/(?:something complex and boring)?(something complex and interesting)/
There is only one capturing group, hence $1
HTH
Not really, no. But you can use a named group for uniformity, like this:
/(?<group1>something complex and boring)?(?<group2>something complex and interesting)/
You can change the names (the text in the angle brackets) for the uniformity that you want to achieve. You can then access the groups like this:
string.match(/(?<group1>something complex and boring)?(?<group2>something complex and interesting)/) do |m|
  # Do something with the match; m['group1'] and m['group2'] access the groups
end
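For illustration, here is the block form with short placeholder patterns ("ab" and "cd" are assumptions standing in for the real sub-patterns). When the optional group does not take part in the match, its entry is nil.

```ruby
pattern = /(?<group1>ab)?(?<group2>cd)/

# String#match with a block yields the MatchData and returns the
# block's value when the pattern matches.
result = "cd".match(pattern) do |m|
  [m[:group1], m[:group2]]
end

p result # [nil, "cd"]
```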
I need to remove all leading and trailing non-numeric characters. This is what I came up with. Is there a better implementation?
puts s.gsub(/^\D+/,'').gsub(/\D+$/,'')
Instead of eliminating what you don't want, it's often clearer to select what you do want (using parentheses). Also, this only requires one regex evaluation:
s.match(/^\D*(.*?)\D*$/)[1]
Or, this convenient shorthand:
s[/^\D*(.*?)\D*$/, 1]
Perhaps a single #gsub(/(^\D+)|(\D+$)/, '')
Also, when in doubt Rubular it.
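A quick side-by-side of the three approaches on one sample string. The input is made up, and it is a single line, so ^ and $ behave like start and end of string here.

```ruby
s = "abc123def456ghi"

# Original approach: two substitutions.
two_passes = s.gsub(/^\D+/, "").gsub(/\D+$/, "")

# Capture what you want instead of deleting what you don't.
captured = s[/^\D*(.*?)\D*$/, 1]

# Single substitution with an alternation.
one_pass = s.gsub(/(^\D+)|(\D+$)/, "")

puts two_passes # 123def456
puts captured   # 123def456
puts one_pass   # 123def456
```

Note that non-digits between the first and last digit are kept in all three cases, since only the leading and trailing runs are removed.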
Alright, this one's interesting. I have a solution, but I don't like it.
The goal is to be able to find a set of lines that start with 3 periods - not an individual line, mind you, but a collection of all the lines in a row that match. For example, here's some matches (each match is separated by a blank line):
...
...hello
...
...hello
...world
...
...wazzup?
...
My solution is as follows:
^\.\.\..*(\n\.\.\..*)*$
It matches all those, so it's what I'm using for now - however, it looks kinda silly to repeat the \.\.\..* pattern. Is there a simpler way?
Please test your regex before submitting it, rather than submit what "should work." For example, I tried the following first:
(^\.\.\..*$)+
which only returned individual lines, even though in my mind it looks like it would do the trick - I guess I just don't understand regex internals. (And no, I didn't need to set any flags to get ^ and $ to match line boundaries, since I'm implementing this in Ruby.)
So I'm not totally sure there's a good answer, but one would be much appreciated - thanks in advance!
In most regex implementations you can shorten \.\.\. using \.{3} so your solution would turn into \.{3}.*(\n\.{3}.*)*.
What you already have is already simple and understandable. Keep in mind that more "clever" RegExps may very well be slower and undoubtedly less readable.
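A sketch of the shortened pattern used with String#scan in Ruby. The sample text is made up, and the group is written as non-capturing, (?:...), because scan returns the captured groups rather than the whole match when the pattern contains capturing groups.

```ruby
text = "...\n\nfoo\n...hello\n...world\nbar\n...wazzup?\n"

# Each match is a run of consecutive lines that start with three periods.
runs = text.scan(/^\.{3}.*(?:\n\.{3}.*)*$/)

p runs # ["...", "...hello\n...world", "...wazzup?"]
```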
Assuming lines are terminated by a \n:
((^|\n)\.{3}[^\n]*)+
I am not familiar with Ruby, so depending on how it returns matches you might need to use non-capturing groups:
((?:(?:^|\n)\.{3}[^\n]*)+)
^([.]{3}.*$\n?)+
This doesn't really need $ in there.
You are pretty close to a solution with (^\.\.\..*$)+, but because the + modifier is on the outside of the group, it is getting overwritten each time and you are only left with the last line. Try wrapping it in an outer group: ((^\.\.\..*$)+) and looking at the first submatch and ignoring the inner one.
Combined with the other suggestion: ((^\.{3}.*$)+)