How to match only some characters with a regex? - ruby

I want to check if a string matches only some characters with a regex.
For example, I would like to match only a, b, or c.
So, "aaacb" would pass, but "aaauccb" would not (because of the u).
I have tried this way:
/[a|b|c]+/
but it does not work, because the failing example passes.

You need to make sure that your string consists only of those characters by anchoring the regex to the beginning and end of the string:
/^[abc]+$/
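In Ruby, ^ and $ anchor to the start and end of a line, so if the string may contain newlines use \A and \z instead: /\A[abc]+\z/.
You also mixed up two concepts: alternation (which would be (a|b|c)) and character classes (which would be [abc]). In this case they are equivalent. Your version would also allow | as a character.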
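A minimal Ruby sketch of the points above. The helper name only_abc? is made up for illustration, and \A/\z are used as the strict string anchors:

```ruby
# Hypothetical helper: true when the whole string consists of a, b or c.
def only_abc?(str)
  # \A and \z anchor to the start and end of the *string*;
  # ^ and $ would only anchor to the start and end of a *line*.
  str.match?(/\A[abc]+\z/)
end

puts only_abc?("aaacb")             # true
puts only_abc?("aaauccb")           # false, the u is rejected
puts "aaauccb".match?(/[abc]+/)     # true: unanchored, "aaa" is enough
puts "a|b".match?(/\A[a|b|c]+\z/)   # true: | is a literal inside [...]
puts "ab\nuu".match?(/^[abc]+$/)    # true: ^...$ only checked the first line
```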

/[^abc]/
This example is copied literally from Rubular. It matches any single character except a, b, or c, so a string passes when this regex finds no match at all.
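In Ruby the negated class is used by inverting the test: the string is accepted when /[^abc]/ finds nothing. A small sketch (the method name is only illustrative); note that, unlike /\A[abc]+\z/, this also accepts an empty string:

```ruby
# Hypothetical helper: accept the string when no character outside a/b/c occurs.
def only_abc_negated?(str)
  !str.match?(/[^abc]/)
end

puts only_abc_negated?("aaacb")    # true
puts only_abc_negated?("aaauccb")  # false, [^abc] finds the u
puts only_abc_negated?("")         # true, nothing to reject
```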

Try [abc]+
It will match one or more a, b or c characters (anchor it to require that the whole string consists of them).

Related

Regex to match none, one, the other or both in any order

I'm trying to find an appropriate expression to match a C++ integer suffix, which, according to cppreference, is:
integer-suffix, if provided, may contain one or both of the following (if both are provided, they may appear in any order):
unsigned-suffix (the character u or the character U)
long-suffix (the character l or the character L) or the long-long-suffix (the character sequence ll or the character sequence LL) (since C++11)
As of now, the best pattern I was able to write is
/u?(ll|l)?u?/i
But this will match uu, which isn't allowed by the standard. Is there a better regex?
Edit:
In the lexer I'm currently working on, we parse integers as follows (C rules, C++ rules are similar):
rule /\d+[lu]*/i, Num::Integer
rule /0[0-7]+[lu]*/i, Num::Oct
rule /\d+[lu]*/i, Num::Integer
As you can see, the suffix pattern matches a lot more than what the standard defines. My goal is to rewrite this as:
isuffix = /u?(ll|l)?u?/i
rule /\d+#{isuffix}/i, Num::Integer
rule /0[0-7]+#{isuffix}/i, Num::Oct
rule /\d+#{isuffix}/i, Num::Integer
Pure Ruby... you know:
%w(u ul ull l ll lu llu).include? suffix.downcase
But if you insist:
/u?ll?|l?l?u/i
The first part handles a u before the l's and requires at least one l.
The second part handles a u after the l's and requires the u.
If you want to include an empty suffix as a possibility, you can add optional matching for these characters as well.
Note that this relies on the lexer failing if anything is left over after the suffix.
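A small sketch that checks both approaches side by side. The method names are hypothetical, and the \A...\z anchors stand in for "the lexer fails on leftovers". The list includes lu, which the standard allows:

```ruby
# Valid C++ integer suffixes, compared case-insensitively
# (caveat: downcasing also accepts mixed-case "lL", which the standard forbids).
VALID_SUFFIXES = %w[u ul ull l ll lu llu].freeze

def valid_suffix_list?(suffix)
  VALID_SUFFIXES.include?(suffix.downcase)
end

# The regex from the answer, anchored so that leftover characters
# (e.g. the second u in "uu") make the whole check fail.
def valid_suffix_regex?(suffix)
  suffix.match?(/\A(?:u?ll?|l?l?u)\z/i)
end

%w[u UL ull lu LLU uu ulu lul ulll].each do |s|
  puts "#{s}: list=#{valid_suffix_list?(s)} regex=#{valid_suffix_regex?(s)}"
end
```

Both checks agree on every sample; an empty suffix is rejected by both, which matches the note about adding optional matching if you want to allow it.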
Updated answer
If you're looking for a suffix such that /\d('?\d)*#{suffix}/ matches decimal integers, you can use:
suffix = /(ul?l?|ll?u?)?\b/i
Note that it also matches 1 in l1 and 11 in c++11, because there's no lookbehind before \d.
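To see the updated suffix in context, here is a quick Ruby check. The constant names SUFFIX and INTEGER and the sample strings are only for illustration:

```ruby
# Suffix regex from the answer. Interpolating a Regexp object embeds it as
# (?i-mx:...), so its case-insensitive flag is kept inside INTEGER.
SUFFIX  = /(ul?l?|ll?u?)?\b/i
INTEGER = /\d('?\d)*#{SUFFIX}/

%w[42 42u 42UL 1'000ll 42lu 42uu l1].each do |s|
  m = INTEGER.match(s)
  puts "#{s}: #{m ? m[0] : 'no match'}"
end
```

The last sample shows the caveat above: in l1 only the 1 is matched, while 42uu is rejected because no split of it ends at a word boundary.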
Old answers
This will find a non-empty suffix anywhere in the string:
/(?<![a-z])(ul?l?|ll?u?)\b/i
It means:
u, ul, ull or
l, ll, lu or llu
Followed by a word boundary and preceded by anything but another letter.
Other answers without boundaries match "uu" for example.
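A quick Ruby sketch of that search regex applied to a made-up line of source code; since the regex has one capture group, String#scan returns one array per match, flattened here:

```ruby
# Old answer's regex: a suffix not preceded by a letter, ending at a word boundary.
SUFFIX_ANYWHERE = /(?<![a-z])(ul?l?|ll?u?)\b/i

code = "x = 10ul + 3LL + 7uu;"   # hypothetical input
p code.scan(SUFFIX_ANYWHERE).flatten  # ["ul", "LL"]
```

The 7uu is skipped: the first u is not followed by a word boundary, and the second u is preceded by a letter.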
If your string is just the suffix and you want to check that it is valid:
/^(ul?l?|ll?u?)?$/i
Force failing using negative lookahead.
For example:
/(?!u(ll|l)?u)u?(ll|l)?u?/i
or
/(?!ul*u)u?l{0,2}u?/i
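One thing to keep in mind with the lookahead version: without anchors the engine simply retries one position later and still finds a match inside "uu". A sketch with the first regex anchored (the constant name is illustrative):

```ruby
# Lookahead version from the answer, anchored so the whole suffix is checked.
LOOKAHEAD_SUFFIX = /\A(?!u(ll|l)?u)u?(ll|l)?u?\z/i

%w[u ul ull l ll lu llu uu ulu ullu].each do |s|
  puts "#{s}: #{s.match?(LOOKAHEAD_SUFFIX)}"
end

# An empty suffix is accepted too, since every part is optional.
puts "".match?(LOOKAHEAD_SUFFIX)         # true

# Unanchored, the lookahead only blocks position 0:
p "uu"[/(?!u(ll|l)?u)u?(ll|l)?u?/i]      # "u"
```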
My 2 cents, for whatever it's worth: sometimes it just pays to be explicit and not try to be too fancy, and I think this is one of those times. Here's my regex:
/(?<=\d)(u|ul|ull|l|lu|ll|llu)(?=([^ul]|$))/i
Well the idea was simple...
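A sketch of how the explicit regex behaves on a few made-up literals. The first capture is the suffix; the second belongs to the lookahead and is not needed here:

```ruby
# Explicit alternation from the answer: the suffix must follow a digit and be
# followed by something other than u/l (case-insensitive) or by end of line.
EXPLICIT = /(?<=\d)(u|ul|ull|l|lu|ll|llu)(?=([^ul]|$))/i

%w[42 42u 42ull 42LU 42uu 42lul].each do |s|
  m = EXPLICIT.match(s)
  puts "#{s}: #{m ? m[1] : 'no suffix'}"
end
```

Because the lookahead refuses a following u or l, shorter alternatives such as u in 42ull are rejected and the engine moves on to the longer ull.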

Matching only a single standalone letter

I'm trying to write a regular expression that matches only a single standalone letter, such as a, C, f or G, but NOT abc or de, for instance.
I tried [a-zA-z], but all of the above match.
What should I do in this case?
^[a-zA-Z]$
Add the ^ and $ anchors to limit the match to just one character.
or
(?:^|(?<=[^a-zA-Z]))[a-zA-Z](?=[^a-zA-Z]|$)
There are several ways to do this, depending on your content. This could work:
[^a-zA-Z][a-zA-Z][^a-zA-Z]
Or there's a regex token for that, the word boundary \b:
\b[a-zA-Z]\b
which is more useful since it allows matches at the start and end of a line.
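A short Ruby comparison of the patterns above on the sample text from the question. It shows why the plain [^a-zA-Z] version is weaker: it consumes the neighbouring characters, so it misses a (no character before it) and f (its left comma was already used by the C match):

```ruby
text = 'a,C,f,G, but, NOT abc de'

# Consumes the separators on both sides of the letter.
p text.scan(/[^a-zA-Z][a-zA-Z][^a-zA-Z]/)                  # [",C,", ",G,"]

# Lookaround version: neighbours are checked but not consumed.
p text.scan(/(?:^|(?<=[^a-zA-Z]))[a-zA-Z](?=[^a-zA-Z]|$)/) # ["a", "C", "f", "G"]

# Word-boundary version.
p text.scan(/\b[a-zA-Z]\b/)                                # ["a", "C", "f", "G"]

# Anchored version: tests a whole single-line string instead of searching.
p ["a", "abc"].map { |s| s.match?(/^[a-zA-Z]$/) }          # [true, false]
```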
Your regex [a-zA-z] matches not only letters but also matches [, ], \, ^, _ and `. Moreover, it has no anchors and thus will match both a and t in at.
You can make use of the POSIX bracket expression alpha to match a single letter substring together with a word boundary \b:
puts 'a,C,f,G, but, NOT abc de'.scan(/\b[[:alpha:]]\b/)
Output:
a
C
f
G

Precedence of Ruby regular expressions?

I am reviewing regular expressions and cannot understand why a regular expression won't match a given string, specifically:
regex = /(ab*)+(bc)?/
mystring = "abbc"
The match matches "abb" but leaves the c off. I tested this using Rubular and in IRB and don't understand why the regex doesn't match the entire string. I thought that (ab*)+ would match "ab" and then (bc)? would match "bc".
Am I missing something in terms of precedence for regular expression operations?
Regular expressions try to match each part of the pattern as much as possible by default (greedy matching), and the engine returns the first successful match instead of backtracking to look for a longer one. Since you made (bc) optional, (ab*) can match as much as it wants (the + repetition after it doesn't change anything here): b* takes both b's, (bc)? then matches the empty string, and the match "abb" succeeds without any backtracking.
If you want the whole string to be matched (which will force some backtracking in this case) make sure you anchor both ends of the string:
regex = /^(ab*)+(bc)?$/
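The difference is easy to see by looking at the capture groups. This sketch uses \A and \z, which behave like ^ and $ here because the string has no newlines:

```ruby
unanchored = /(ab*)+(bc)?/.match("abbc")
anchored   = /\A(ab*)+(bc)?\z/.match("abbc")

# Greedy b* takes "bb"; the optional (bc) matches nothing and the
# engine stops at its first successful match.
p unanchored.to_a  # ["abb", "abb", nil]

# The end anchor forces the engine to give back one b so that (bc) can match.
p anchored.to_a    # ["abbc", "ab", "bc"]
```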
The parentheses in your regex create two capture groups; they don't mean that the string must contain two matches.
The first group matches abb because (ab*) means an a followed by zero or more b's, and you have two b's. After that only c is left, so the optional second group (bc) matches nothing.

Regex negation with non-greedy match in Ruby

I have this string:
ibeu
I'm trying to have a regex that will do the following: (most of which works)
Find the syllables in a word, and insert some letters before the first vowel in each syllable. Because of the challenges this presents, the string above has already been modified before being run through the hack of a lexer that I wrote for it.
I have this regex that I'm trying to use on the above string:
word.match(/(.*?)([^aeo]?u)(.*?)/)
Given the constraints of the regex (mainly that the u must not be preceded by a, e, or o), I expect nothing to match it, but I end up with the following:
#<MatchData "ibeu" 1:"ibe" 2:"u">
I'm sure this is something stupid, but I can't figure it out.
You probably want negative look-behind:
(.*?)(?<![aeo])u
It matches any u that is not preceded by a, e or o. I am not sure what you actually want to do, so I just kept the (.*?) in front like in your current regex. The (.*?) at the end will always match an empty string, so it is redundant and can be removed.
Your current solution fails since [^aeo] is made optional. It can't simply be fixed by removing ?, since it seems you also want to match u when it is at the beginning of the string or when there are multiple u in a row.
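A quick Ruby check of the lookbehind version on the question's string and two made-up ones, including a u at the very start, where the lookbehind has nothing to look at and therefore succeeds:

```ruby
# A u that is not immediately preceded by a, e or o (negative lookbehind).
SYLLABLE_U = /(.*?)(?<![aeo])u/

p "ibeu".match(SYLLABLE_U)            # nil: the only u follows e
p "ibiu".match(SYLLABLE_U)&.captures  # ["ibi"]
p "uu".match(SYLLABLE_U)&.captures    # [""]: a u at the start also counts
```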

Ruby string does not match expression

I have this Ruby regular expression:
(a|bc)(d?|e)*
When I use Rubular to test strings against this expression, some of them don't match and I don't understand why.
For example, with the string "ade" it matches "ad" but not the "e". Can anyone help?
The second part of the regular expression, (d?|e)*, is the problem. The ? on the d means "match d zero or one times". When the regex runs through the string ade, it matches a, then d, then d zero times; that empty repetition ends the loop, and since the match has already succeeded, e is never tried. If you change it to (a|bc)(d|e)*, it matches ade and seems to have the semantics you're looking for.
The alternation is tried left to right, so d?, which can always match an empty string, "short circuits" the e alternative, and the match ends as early as possible.
I don't know why you put a question mark there. Just use
(a|bc)(d|e)*
and it will be fine.
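A two-line Ruby check of both versions on the string from the question, using String#[] to get the matched text:

```ruby
p "ade"[/(a|bc)(d?|e)*/]  # "ad": the empty d? iteration stops the loop
p "ade"[/(a|bc)(d|e)*/]   # "ade": every iteration consumes a character
```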
