Regexp in Ruby - can I use parentheses without grouping?

I have a regexp of the form:
/(something complex and boring)?(something complex and interesting)/
I'm interested in the contents of the second pair of parentheses; the first pair is there only to ensure a correct match (the boring part may or may not be present, but if it is, I would otherwise match it by accident with the regexp for the interesting part).
So I can access the second match using $2. However, for uniformity with the other regexps I'm using, I want $1 to contain the contents of the second pair of parentheses. Is that possible?

Use a non-capturing group:
r = /(?:ab)?(cd)/
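A quick way to see the effect is to compare the captures with and without the optional prefix. This is a minimal sketch; the sample strings are made up for illustration.

```ruby
# Sample pattern from the answer above: optional "ab", then the captured "cd".
r = /(?:ab)?(cd)/

with_prefix    = r.match("abcd")
without_prefix = r.match("cd")

# The (?:...) group takes part in the match but is not numbered,
# so the interesting part is group 1 in both cases.
puts with_prefix[1]
puts without_prefix[1]
puts with_prefix.captures.length
```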

This is a general regexp feature, not something specific to Ruby. Use /(?:something complex and boring)?(something complex and interesting)/ (note the ?:) to achieve this.
By the way, in Ruby 1.9, you can do /(something complex and boring)?(?<interesting>something complex and interesting)/ and access the group with $~[:interesting] ;)
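As a rough illustration of the named-group variant, the sketch below uses a made-up pattern (an optional "boring-" prefix followed by a word); the pattern and the sample strings are assumptions, not part of the original question.

```ruby
# Hypothetical pattern: an optional "boring-" prefix, then a named capture.
re = /(?:boring-)?(?<interesting>\w+)/

m = re.match("boring-stuff")
puts m[:interesting]

# After a successful =~, the same information is available through $~.
if "stuff" =~ re
  puts $~[:interesting]
end
```

Accessing the group by name means the code does not depend on how many groups come before it.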

Yup, use the ?: syntax:
/(?:something complex and boring)?(something complex and interesting)/

I'm not a Ruby developer, but I know other regex flavors, so I bet you can use a non-capturing group:
/(?:something complex and boring)?(something complex and interesting)/
There is only one capturing group, hence $1
HTH

Not really, no. But you can use a named group for uniformity, like this:
/(?<group1>something complex and boring)?(?<group2>something complex and interesting)/
You can change the names (the text in the angle brackets) for the uniformity that you want to achieve. You can then access the groups like this:
string.match(/(?<group1>something complex and boring)?(?<group2>something complex and interesting)/) do |m|
  # Do something with the match; m['group2'] accesses the interesting group
end

Related

Named subroutines in Oniguruma regex engine?

In Perl, you can do this:
(?x)
(?(DEFINE)
(?<animal>dog|cat)
)
(?&animal)
In Ruby (Oniguruma engine), it seems that the (?(DEFINE... syntax is not supported. Also, (?&... becomes \g. So, you can do this:
(?x)
(?<animal>dog|cat)
\g<animal>
But of course, this is not equivalent to the Perl example I gave above, because the first (?<animal>dog|cat) is not ignored, since there isn't anything like (?(DEFINE....
If I want to define a large regex with a bunch of named subroutines, what I could once do in Perl can't be done this way.
It does seem that I could hack together a pretty awkward solution by doing something like this:
(?x)
(?:^$DEFINE
(?<animal>dog|cat)
){0}
\g<animal>
But, that is pretty hackish. Is there a better way to do this? Does Oniguruma support a way to define named subroutines without having to try to "match" them first?
Alternatively, if there is a way to get true PCRE to work in Ruby, with ?(DEFINE... and (?&... I'd take that too.
Thanks!
You don't need such a complicated hack. Writing:
(?x)
(?<animal>dog|cat){0}
(?<color>red|green|blue){0}
...
your main pattern here
does exactly the same.
Putting all group definitions inside (?:^$DEFINE ... ){0} is only cosmetic.
Note that a group with the quantifier {0} isn't tried at all (the quantifier is taken into account first). Since the named group is still defined this way, one can conclude that it isn't really a hack, but the way to do it with Oniguruma.
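The {0} idiom could look like this in a Ruby regexp literal. This is only a sketch: the constant name COLORED_ANIMAL, the main pattern, and the test strings are invented for illustration.

```ruby
# Hypothetical example: define two named groups without matching them,
# then call them from the main pattern with \g<name>.
COLORED_ANIMAL = /
  (?<animal>dog|cat){0}          # definition only, never matched at this position
  (?<color>red|green|blue){0}    # definition only, never matched at this position
  \A \g<color> \s \g<animal> \z  # main pattern calls the named groups
/x

puts COLORED_ANIMAL.match("red dog") ? "match" : "no match"
puts COLORED_ANIMAL.match("purple dog") ? "match" : "no match"
```

The /x flag lets the definitions sit on their own lines with comments, which keeps a larger set of subroutines readable.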

Changing "word" to "Word" using a RegEx like [A-Z]([a-z]*)\b

The title sums up my conundrum pretty well. I've been searching around the net for a while, and being new to Ruby and Regular Expressions as a whole, I'm stuck trying to figure out how to alter the case of a single word string using a RegEx "filter" such as [A-Z]([a-z]*)\b.
Basically I want the flow to be
input: woRD
filter: [A-Z]([a-z]*)\b
output: Word
I already have the words filtered into a list, so I don't need to match words; I only need to filter the case of the word using a RegEx filter.
I do not want to use standard capitalization methods, I want this to be done using Regular Expressions.
You can use
"woRD".downcase.capitalize
Ruby provides predefined methods for this kind of functionality. Try to use them instead of a regex; it saves coding time!
Well, for some reason you want to use regexps. Here you go:
# prepare hashes for gsub
to_down = (to_upper = Hash[('a'..'z').zip('A'..'Z')]).invert
# convert to downcase
downcased = 'woRD'.gsub(/[A-Z]/, to_down)
# ⇒ 'word'
titlecased = downcased.gsub(/^\w/, to_upper)
# ⇒ 'Word'
Hope it helps. Note the usage of String#gsub(re, hash) method.
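Another regex-driven variant is the block form of String#sub, where the regexp picks out the pieces and the block rebuilds the word. This is a sketch under the assumption that the input is a single ASCII word. It still calls upcase and downcase inside the block, because Ruby's replacement strings do not offer Perl-style \U or \L escapes.

```ruby
word = "woRD"

# Group 1 is the first letter, group 2 is the rest of the word.
titlecased = word.sub(/\A([A-Za-z])([A-Za-z]*)\z/) do
  "#{$1.upcase}#{$2.downcase}"
end

puts titlecased
```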
You can't use a regex alone to alter the string the way you want.
Please read this topic carefully: How to change case of letters in string using regex in Ruby.
The best way to solve your problem is to use:
"woRD".downcase.capitalize
or
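name_of_your_variable.downcase!
name_of_your_variable.capitalize!
if you want to alter the string in your variable in place, without assigning it to another variable. Call the two methods separately rather than chaining them: downcase! returns nil when the string is already lowercase, so downcase!.capitalize! can raise NoMethodError.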
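One detail worth checking before relying on the bang methods: they return nil when they make no change. The sketch below, with invented sample values, shows that return value and an in-place sequence that does not depend on it.

```ruby
already_lower = "word".dup
result = already_lower.downcase!   # nothing to change here
p result

name = "woRD".dup
name.downcase!     # modifies name in place; return value ignored
name.capitalize!   # modifies name in place; return value ignored
puts name
```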

Content Inside Parenthesis Regular Expression Ruby

I'm trying to take out the content inside the parentheses. For example, if the string is "(blah blah) This is stack(over)flow", I want to just take out "(blah blah)" but leave "(over)" alone. I'm trying
/\A\(.*\)/
but it returns "(blah blah) This is stack(over)", and I'm not sure why it's returning that.
Easiest fix:
/\A\(.*?\)/
Normally, * will try to match as much as it possibly can, so it'll match all the way to the last ) in the line. This is called "greedy" matching. Putting ? after +/*/? makes them non-greedy, and they'll match the shortest possible string.
But note that this won't work for nested parentheses. That's rather more complicated. Given your example, I assume this is for a pretty simple ad-hoc format where nesting isn't a concern.
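The difference is easy to check on the string from the question. The following is a small sketch; the negated character class [^)]* is an additional alternative that is not part of the answer above, and like the non-greedy version it does not handle nesting.

```ruby
str = "(blah blah) This is stack(over)flow"

greedy  = str[/\A\(.*\)/]      # runs to the last ")" in the string
lazy    = str[/\A\(.*?\)/]     # stops at the first ")"
negated = str[/\A\([^)]*\)/]   # alternative: anything except ")" inside

# Removing the leading parenthesized part and the whitespace after it.
remainder = str.sub(/\A\(.*?\)\s*/, "")

puts greedy
puts lazy
puts negated
puts remainder
```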

Regex can this be achieved

Am I being too ambitious, or is there a way to do this:
add a string if it is not present,
and
remove the same string if it is present?
I want to do all of this using a regex and avoid the if/else statement.
Here is an example.
I have the string
"admin,artist,location_manager,event_manager"
Can the substring location_manager be added or removed according to the conditions above?
Basically, I'm looking to avoid the if/else statement and do all of this purely in regex:
"admin,artist,location_manager,event_manager".test(/some_regex/)
The some_regex would remove location_manager from the string if present, and add it otherwise.
Am I being overambitious?
You will need to use some sort of logic.
str += ',location_manager' unless str.gsub!(/location_manager,/,'')
I'm assuming that if it's not present, you append it to the end of the string.
Regex will not actually add or remove anything in any language that I am aware of. It is simply used to match. You must use some other language construct (a regex based replacement function for example) to achieve this functionality. It would probably help to mention your specific language so as to get help from those users.
Here's one kinda off-the-wall solution. It doesn't use regexes, but it also doesn't use any if/else statements either. It's more academic than production-worthy.
Assumptions: Your string is a comma-separated list of titles, and that these are a unique set (no duplicates), and that order doesn't matter:
require 'set'
str = "admin,artist,location_manager,event_manager"
titles = Set.new(str.split(','))
#=> #<Set: {"admin", "artist", "location_manager", "event_manager"}>
titles_to_toggle = ["location_manager"]
#=> ["location_manager"]
titles ^= titles_to_toggle
#=> #<Set: {"admin", "artist", "event_manager"}>
titles ^= titles_to_toggle
#=> #<Set: {"location_manager", "admin", "artist", "event_manager"}>
titles.to_a.join(",")
#=> "location_manager,admin,artist,event_manager"
All this assumes that you're using a string as a kind of set. If so, you should probably just use a set. If not, and you actually need string-manipulation functions to operate on it, there's probably no way around using if-else or a variant such as the ternary operator, unless, or Bergi's answer.
Also worth noting regarding regex as a solution: Make sure you consider the edge cases. If 'location_manager' is in the middle of the string, will you remove the extraneous comma? Will you handle removing commas correctly if it's at the beginning or the end of the string? Will you correctly add commas when it's added? For these reasons treating a set as a set or array instead of a string makes more sense.
No. Regex can only match/test whether "a string" is present (or not). Then, the function you've used can do something based on that result, for example replace can remove a match.
Yet, you want to do two actions (each can be done with regex), remove if present and add if not. You can't execute them sequentially, because they overlap - you need to execute either the one or the other. This is where if-else structures (or ternary operators) come into play, and they are required if there is no library/native function that contains them to do exactly this job. I doubt there is one in Ruby.
If you want to avoid the if-else statement (for one-liners or expressions), you can use the ternary operator. Or, you can use a lambda expression returning the correct value:
# kind of pseudo code
string.replace(/location,?|$/, function($0) return $0 ? "" : ",location" )
This matches the string "location" (with optional comma) or the string end, and replaces that with nothing if a match was found or the string ",location" otherwise. I'm sure you can adapt this to Ruby.
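A possible Ruby adaptation of that idea uses the block form of String#sub, which replaces only the first match. This is a sketch, not a complete solution: the helper name toggle_title is invented, \z is used in place of $ so that only the very end of the string counts, and the block still contains a ternary.

```ruby
# Hypothetical helper: remove `title` if present, otherwise append it.
def toggle_title(str, title = "location_manager")
  # Either the title with an optional trailing comma, or the end of the string.
  pattern = /#{Regexp.escape(title)},?|\z/
  # An empty match means only the end of the string matched, so the title was absent.
  str.sub(pattern) { |m| m.empty? ? ",#{title}" : "" }
end

puts toggle_title("admin,artist,location_manager,event_manager")
puts toggle_title("admin,artist,event_manager")
```

Edge cases are not covered here: removing the title when it is the last entry leaves a trailing comma, adding to an empty string yields a leading comma, and a longer title that contains location_manager as a substring would also match.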
Removing something that matches a pattern is really easy:
(admin,?|artist,?|location_manager,?|event_manager,?)
Then choose the string to replace the match with (in your case an empty string) and pass everything to the replace method.
The other operation you suggested is more difficult to achieve with regex only. Maybe someone knows a better answer.

Optimal Regular Expression: match sets of lines starting with

Alright, this one's interesting. I have a solution, but I don't like it.
The goal is to be able to find a set of lines that start with 3 periods - not an individual line, mind you, but a collection of all the lines in a row that match. For example, here's some matches (each match is separated by a blank line):
...
...hello
...
...hello
...world
...
...wazzup?
...
My solution is as follows:
^\.\.\..*(\n\.\.\..*)*$
It matches all those, so it's what I'm using for now - however, it looks kinda silly to repeat the \.\.\..* pattern. Is there a simpler way?
Please test your regex before submitting it, rather than submit what "should work." For example, I tried the following first:
(^\.\.\..*$)+
which only returned individual lines, even though in my mind it looks like it would do the trick - I guess I just don't understand regex internals. (And no, I didn't need to set any flags to get ^ and $ to match line boundaries, since I'm implementing this in Ruby.)
So I'm not totally sure there's a good answer, but one would be much appreciated - thanks in advance!
In most regex implementations you can shorten \.\.\. using \.{3} so your solution would turn into \.{3}.*(\n\.{3}.*)*.
What you already have is already simple and understandable. Keep in mind that more "clever" RegExps may very well be slower and undoubtedly less readable.
Assuming lines are terminated by a \n:
((^|\n)\.{3}[^\n]*)+
I am not familiar with Ruby, so depending on how it returns matches you might need to "nonmatch" groups:
((?:(?:^|\n)\.{3}[^\n]*)+)
^([.]{3}.*$\n?)+
This doesn't really need $ in there.
You are pretty close to a solution with (^\.\.\..*$)+, but because the + modifier is on the outside of the group, it is getting overwritten each time and you are only left with the last line. Try wrapping it in an outer group: ((^\.\.\..*$)+) and looking at the first submatch and ignoring the inner one.
Combined with the other suggestion: ((^\.{3}.*$)+)
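To compare these suggestions on real input, a small Ruby check along the following lines may help. It is only a sketch: the sample text is invented, and the patterns use a non-capturing group so that String#scan returns whole matches instead of group contents. In the first pattern each repetition consumes the trailing newline. A repetition that stops at $ without consuming it leaves the next ^ unable to match, which appears to be why patterns of the form (^\.\.\..*$)+ return single lines.

```ruby
text = "...\n...hello\n\nnot a match\n...\n...hello\n...world\n"

# Each repetition matches one whole line, including its newline if present.
runs = text.scan(/(?:^\.{3}.*\n?)+/)

# For comparison: stopping at $ leaves the newline unconsumed,
# so the group cannot repeat onto the next line.
single_lines = text.scan(/(?:^\.{3}.*$)+/)

runs.each { |run| p run }
p single_lines.length
```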
