Ruby string does not match expression

I have this Ruby regular expression:
(a|bc)(d?|e)*
When I use Rubular to test strings against this expression, some strings don't match the way I expect, and I don't understand why.
For example, with the string "ade" it matches "ad" but not the "e". Can anyone help?

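The second part of the regular expression you entered, (d?|e)*, is the problem. Putting the ? on the d says "match d 0 or 1 times". When you run through the string ade, the regex matches a, then d, and then d? succeeds again by matching d 0 times, that is, the empty string. Because the first alternative succeeded, the e alternative is never tried, and the repetition ends because that iteration consumed nothing. If you instead changed it to (a|bc)(d|e)*, it would match ade, and seem to have the semantics that you're looking for.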
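The difference can be checked in irb with a small sketch like the one below. The method names are made up for this illustration; the two patterns are the ones discussed above.

```ruby
# Pattern from the question: d? can match the empty string.
def match_original(string)
  string[/(a|bc)(d?|e)*/]
end

# Suggested fix: every alternative has to consume a character.
def match_fixed(string)
  string[/(a|bc)(d|e)*/]
end

p match_original("ade") # "ad", as reported in the question
p match_fixed("ade")    # "ade"
```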

d? can match the empty string, so it always succeeds, and e is "short-circuited" by the alternation: the second alternative is never tried.
I don't know why you put a question mark there. Just use
(a|bc)(d|e)*
and it will be fine.

Related

Use regex to find a certain instance of two characters

I am not sure how to approach this problem using regex.
Write a method that takes a string in and returns true if the letter "z" appears within three letters after an "a". You may assume that the string contains only lowercase letters.
Any insight would be greatly appreciated.
Basically, you're being asked to match any of the following patterns:
'a**z'
'a*z'
'az'
where * is a lowercase letter, a-z. In natural language (ok, English) that can be stated as "An 'a' followed by 0, 1, or 2 lowercase letters, followed by a 'z'." Regex-wise, that can be expressed as
/a[a-z]{0,2}z/
I'm not a rubyist at all, so there may or may not be some sort of Ruby specific tweaks that need to be made to that, but that should be the basic gist of it.
# Returns true if a "z" appears within three letters after an "a".
def foo(s)
  s.match?(/a\w{,2}z/)
end
This is the site I like to use for Ruby regex validation: rubular.com. That being said, if you want to hit on a z that is 0 to 2 characters after an a, and the text is lowercase, I would use this regex: a[a-z]{0,2}z. This pattern matches all three scenarios, and it only hits when the z follows an a.

Precedence of Ruby regular expressions?

I am reviewing regular expressions and cannot understand why a regular expression won't match a given string, specifically:
regex = /(ab*)+(bc)?/
mystring = "abbc"
The match matches "abb" but leaves the c off. I tested this using Rubular and in IRB and don't understand why the regex doesn't match the entire string. I thought that (ab*)+ would match "ab" and then (bc)? would match "bc".
Am I missing something in terms of precedence for regular expression operations?
Regular expressions try to match the first part of the regular expression as much as possible by default, and they do not backtrack to try to make larger sections match if they don't have to. Since you make (bc) optional, the (ab*) can match as much as it wants (the non-zero repetition after it doesn't have much to do) and doesn't try backtracking to try other matching alternatives.
If you want the whole string to be matched (which will force some backtracking in this case) make sure you anchor both ends of the string:
regex = /^(ab*)+(bc)?$/
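To see the backtracking at work, here is a small sketch that compares the two patterns on "abbc". The method names are invented for the example.

```ruby
# Unanchored: the greedy (ab*)+ takes "abb", then (bc)? matches nothing.
def unanchored_match(string)
  string[/(ab*)+(bc)?/]
end

# Anchored: the match has to reach the end of the line, so (ab*)+
# gives back one "b" and (bc)? can match "bc".
def anchored_captures(string)
  match = /^(ab*)+(bc)?$/.match(string)
  match && match.captures
end

p unanchored_match("abbc")  # "abb"
p anchored_captures("abbc") # ["ab", "bc"]
```

Note that in Ruby ^ and $ anchor at line boundaries. If the input may contain newlines and the whole string has to match, \A and \z are the string anchors.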
The parentheses define two groups in your regex.
The first one matches abb, because (ab*) means an a followed by zero or more b. You have two b, so the match is abb. That leaves only c in your string, so it doesn't match the second group, which requires bc.

How to match a word that is not preceded by "=" using Regex?

I would like to extract symbols from Fortran codes in Ruby. The symbols would have the following patterns (NOTE: the variable type and attribute parts have been filtered out):
a = b, c(2) ! Match result should be "a" and "c"
d(3) = [1,2, & ! Match result should be "d"
3]
The regex that I have tried is ((?<!=)\w+(?![^\[]*\]+)(?=( |,|\(|$))), which uses lookarounds. But due to the restrictions on lookbehind, I cannot match "= *" to exclude b.
I used Rubular for testing.
Thanks in advance!
To make your regex work, you can first remove any whitespace that follows the =
.gsub(/=\s+/, '=').scan(/((?<!=)\w+(?![^\[]*\]+)(?=( |,|\(|$)))/)
One easy thing you could do is split the line in two (at the '=') and only do your regex on the left operand.
That way you don't have to write any complex regex.
My advice would be to separate your regex into 2 expressions. Regex needn't always be a one-liner.
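A minimal sketch of the two-step idea follows. It is a variant rather than exactly what the answers describe: since the expected output for the first sample includes c, it first splits the line at commas that are not nested inside () or [], and then takes the leading name of each piece, which is the part to the left of any = or (. Both method names are hypothetical, and continuation lines (the trailing &) are assumed to have been joined beforehand.

```ruby
# Step 1: split a declaration list at commas that are not nested
# inside parentheses or brackets.
def split_top_level(line)
  parts = [+""]
  depth = 0
  line.each_char do |ch|
    depth += 1 if "([".include?(ch)
    depth -= 1 if ")]".include?(ch)
    if ch == "," && depth.zero?
      parts << +""
    else
      parts.last << ch
    end
  end
  parts
end

# Step 2: the declared symbol is the first word of each piece.
def declared_symbols(line)
  split_top_level(line).map { |entity| entity[/\A\s*(\w+)/, 1] }.compact
end

p declared_symbols("a = b, c(2)")     # ["a", "c"]
p declared_symbols("d(3) = [1,2, 3]") # ["d"]
```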

How to match only some characters with a regex?

I want to check if a string matches only some characters with a regex.
For example, I would like to match only a, b, or c.
So, "aaacb" would pass, but "aaauccb" would not (because of the u).
I have tried this way:
/[a|b|c]+/
but it does not work, because the failing example passes.
You need to make sure that your string consists only of those characters by anchoring the regex to the beginning and end of the string:
/^[abc]+$/
You also mixed up two concepts: alternation (which would be (a|b|c)) and character classes (which would be [abc]). In this case they are equivalent. Your version, [a|b|c], would also allow | as a character.
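As a rough illustration, the sketch below wraps both patterns in methods (the method names are invented for the example). It uses \A and \z instead of ^ and $, on the assumption that the whole string should be checked rather than a single line.

```ruby
# Strict check: the whole string must consist of a, b, or c.
def only_abc?(string)
  string.match?(/\A[abc]+\z/)
end

# Pattern from the question: unanchored, and | is part of the class.
def loose_match?(string)
  string.match?(/[a|b|c]+/)
end

p only_abc?("aaacb")      # true
p only_abc?("aaauccb")    # false
p loose_match?("aaauccb") # true, because "aaa" alone satisfies it
p loose_match?("|")       # true, because | is inside the brackets
```

With ^ and $, a multi-line string such as "abc\nxyz" would still pass, because those anchors match at line boundaries in Ruby.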
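/[^abc]/
is an example copied literally from Rubular. It matches any single character except a, b, or c.
Try [abc]+
It will match one or more a, b, or c characters, but it still needs the anchors to reject strings that contain anything else.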

Ruby regex, is there a way to only match literal matches?

I'm trying to parse using a case/when statement with regex in it. I'm having some trouble with the match as it will give me a match even if it's not a literal match.
Example:
If I input ($45, x), I get back: "address mode: indirect, x -> value: 45" from this regex:
/[(][$][1-9a-fA-F]{1,2}\s*,\s*[xX]\s*[)]/
Now, if I input ($45, p), I get a match for this regex:
/[$][1-9a-fA-F]{2,4}/
Which is understandable, but I would like the match to look only for literal matches. If there are extra characters that do not exactly match the regex, I want the match function to return false.
Is there another function like match(), or an extra argument that can be given to match(), to get this behavior?
From your question, it is a little unclear what you are after. Your second regex is matching on the substring
$45
If you want to avoid this, use the anchors ^ and $ to ensure the entire string is matched. Something like:
^\(\$[1-9A-Za-z]+,\s*[xX]\s*\)$
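In a case/when statement, the anchoring could look like the sketch below. It reuses the two patterns from the question and wraps them in \A and \z, which anchor the whole string in Ruby. The method name and the returned symbols are assumptions made for the example; in particular, :absolute is only a placeholder label for the second pattern.

```ruby
# Classifies an operand string; any extra character makes the match fail.
def address_mode(input)
  case input
  when /\A\(\$[1-9a-fA-F]{1,2}\s*,\s*[xX]\s*\)\z/
    :indirect_x
  when /\A\$[1-9a-fA-F]{2,4}\z/
    :absolute # placeholder name, not taken from the question
  else
    :unknown
  end
end

p address_mode("($45, x)") # :indirect_x
p address_mode("($45, p)") # :unknown
p address_mode("$45")      # :absolute
```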
