How to use regular expressions to match numbers with some exceptions - ruby

I need to match numbers in groups of 5, from 1 to 5, with the following exceptions:
Numbers can't include zeros
Numbers can't be like 11111, 22222 and so on.
Numbers can't be like 12345 or 54321
Some examples of valid numbers:
14252, 45121, 43412, 51321 ...
So far I got an expression to group the numbers and do not allow zeros.
/[1-5]{5}/
But I'm having trouble handling the second and third exceptions. I tried, unsuccessfully, to use a negative lookahead to disallow a match when the digits form a repeated pattern.
?!11111|?!22222
I'm trying with this expression:
((?!11111)[1-5]{5}?)
How can I write regular expressions to not match certain patterns?
I will eventually change it to not match any other sequence of numbers.

First off, you don't have to cram everything into one regex. Regexes are already complicated; if you can do the job with several regexes, that will often make things much simpler and allow for more flexible code. For example, you can customize the error message based on which condition failed. Usually you only need to fold multiple regexes together for performance reasons, and there are tools to do that automatically.
So far I got an expression to group the numbers and do not allow zeros.
/[1-5]{5}/
Careful, you have to anchor at both ends, or else it will accept any string that contains a run of five digits from 1 to 5.
/\A[1-5]{5}\z/
Numbers can't be like 11111, 22222 and so on.
Use a capture within the regex to accomplish this. Capture the first number, then see if there's four more. () to capture and \1 to refer to what was captured.
/\A([1-5])\1{4}\z/
Numbers can't be like 12345 or 54321
/\A(?:12345|54321)\z/
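A minimal sketch of how the three separate checks might be combined so that each failed condition gets its own message. The constant names, the validate method and the message wording are made up for illustration. Note that the second and third patterns match the strings to reject, not the ones to accept.

```ruby
# Hypothetical names; the three patterns are the ones given above.
FIVE_DIGITS = /\A[1-5]{5}\z/          # exactly five digits, each 1-5
ALL_SAME    = /\A([1-5])\1{4}\z/      # 11111, 22222, ...
SEQUENCE    = /\A(?:12345|54321)\z/   # the two forbidden runs

# Returns nil when str is valid, otherwise a message naming the failed rule.
def validate(str)
  return 'must be five digits, each from 1 to 5' unless str =~ FIVE_DIGITS
  return 'digits must not all be the same' if str =~ ALL_SAME
  return 'must not be 12345 or 54321' if str =~ SEQUENCE
  nil
end

p validate('14252')   # valid input, so nil
p validate('22222')   # fails the "all the same" rule
p validate('54321')   # fails the sequence rule
p validate('12043')   # contains a zero
```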

Here's a solution that does not use a regular expression. I understand we are to determine if: a) the string contains five characters; b) each character equals '1', '2', '3', '4' or '5'; c) the string contains at least two different characters; and d) the string is neither '12345' nor '54321'. We can do that as follows.
def is_ok?(str)
  str.size == 5 &&                                 # five characters
    (str.chars - ['1','2','3','4','5']).empty? &&  # only the digits '1'-'5'
    str.squeeze.size > 1 &&                        # not all the same character
    str != '12345' &&                              # not an increasing sequence
    str != '54321'                                 # not a decreasing sequence
end
is_ok? '12543' #=> true
is_ok? '12043' #=> false
is_ok? '12643' #=> false
is_ok? '22222' #=> false
is_ok? '12345' #=> false
is_ok? '54321' #=> false

You have the right idea using negative lookaheads, just the syntax was a little off. This works for me:
\A(?!11111|22222|33333|44444|55555|12345|54321)[1-5]{5}\z

How about this?
^(?!([1-5])\1{4})(?!54321)(?!12345)[1-5]{5}$
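One way to gain confidence in the two single-regex answers is to run both over every five-character string made of the digits 1 to 5 and compare what they accept. This is only a sketch; the constant names are invented here. It also shows where the two patterns differ: in Ruby, ^ and $ anchor at line boundaries, while \A and \z anchor at the ends of the whole string.

```ruby
# Hypothetical constant names; the patterns are copied from the two answers.
LIST_STYLE    = /\A(?!11111|22222|33333|44444|55555|12345|54321)[1-5]{5}\z/
BACKREF_STYLE = /^(?!([1-5])\1{4})(?!54321)(?!12345)[1-5]{5}$/

# Every five-character string over the digits 1-5 (5**5 = 3125 of them).
CANDIDATES = %w[1 2 3 4 5].repeated_permutation(5).map(&:join)

BY_LIST    = CANDIDATES.select { |s| s =~ LIST_STYLE }
BY_BACKREF = CANDIDATES.select { |s| s =~ BACKREF_STYLE }

puts BY_LIST == BY_BACKREF   # true: both accept the same strings
puts BY_LIST.size            # 3118 = 3125 minus the 7 excluded strings

# Multi-line input: only the ^...$ version accepts a valid first line.
multiline = "14252\nanything"
p multiline =~ LIST_STYLE      # nil
p multiline =~ BACKREF_STYLE   # 0
```

If the input can contain newlines, the \A...\z anchors are the safer choice.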

Related

In ruby, how do I use string.scan(/regex/) method for numbers from 1 to 12?

That's what I am doing:
c.scan(/[1-9]|1[0-2]/)
For some reason, it returns only numbers from 1 to 9, ignoring the second part. I tried experimenting a little bit, it seems that the method will search for 10-12 only if 1 is excluded from [1-9] part, e.g., c.scan(/[2-9]|1[0-2]/) will do. What is the reason?
P.S. I know that this method lacks lookbehinds and will search for numbers and "part of numbers" as well
Change the order of your patterns and add word boundaries if necessary.
c.scan(/\b(?:1[0-2]|[1-9])\b/)
The alternative before | is tried first, so here 1[0-2] gets the chance to match 10 to 12 before [1-9] is tried; only when it fails does the alternative after | match the remaining numbers from 1 to 9. Note that without boundaries this would also match the 9 in 59. So I suggest putting the pattern inside a capturing or non-capturing group and adding a word boundary \b (which matches between a word character and a non-word character) before and after that group.
| matches left to right, and the first part of the right side (1) is always matched by the left side. Reverse them:
c.scan(/1[0-2]|[1-9]/)
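To see the effect of the alternation order, the three patterns discussed above can be run against the same input. The sample string is made up for illustration.

```ruby
SAMPLE = '3 11 59 12 7'   # made-up input

# [1-9] is tried first, so 11 and 12 are split into single digits.
p SAMPLE.scan(/[1-9]|1[0-2]/)
#=> ["3", "1", "1", "5", "9", "1", "2", "7"]

# The two-digit alternative is tried first, but the digits of 59 still match.
p SAMPLE.scan(/1[0-2]|[1-9]/)
#=> ["3", "11", "5", "9", "12", "7"]

# Word boundaries also reject the digits inside 59.
p SAMPLE.scan(/\b(?:1[0-2]|[1-9])\b/)
#=> ["3", "11", "12", "7"]
```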
Here's another way you might consider extracting numbers between 1 and 12 (assuming that's what you want to do):
c = '14 0 11x 15 003 y12'
c.scan(/\d+/).map(&:to_i).select { |n| (1..12).cover?(n) }
#=> [11, 3, 12]
I've returned an array of integers, rather than strings, thinking that probably would be more useful, but if you want strings:
c.scan(/\d+/).map { |s| s.to_i.to_s }
.select { |s| ['10', '11', '12', *'1'..'9'].include?(s) }
#=> ["11", "3", "12"]
I see several advantages to this approach, versus using a single regex:
it's easy to understand;
the regex is simple;
it's easy to modify if the permissible values change; and
it can be broken into three pieces to facilitate testing.

Regex to capture string into ruby method params

I'm looking for a regex to capture these example strings:
first_paramenter, first_hash_key: 'class1 class2', second_hash_key: true
first_argument, single_hash_key: 'class1 class2'
first_argument_without_second_argument
The pattern rules are:
The string must start with a word (the first parameter): /^(\w+)/
The second parameter is optional
If the second parameter is provided, there must be one comma after the first parameter
The second argument is a hash, with keys and values. Values can be true, false or a string enclosed in quotes
The hash keys must start with a letter
I'm using this regex, but it only matches the second example:
^(\w+),(\s[a-z]{1}[a-z_]+:\s'?[\w\s]+'?,?)$
I'd go with something like:
^(\w+)(?:, ([a-z]\w+): ('[^']*')(?:, ([a-z]\w+): (\w+))?)?
Here's a Rubular example of it.
(?:...) create non-capturing groups which we can easily test for existence using ?. That makes it easy to test for optional chunks.
([a-z]\w+) is an easy way to say "it must start with a letter" while allowing normal alpha, digits and "_".
As far as testing for "Values can be true, false or an string enclosed by quotes", I'd do that in code after capturing. It's way too easy to create a complex pattern, and then be unable to maintain it later. It's better to use simple ones, then look to see whether you got what you expected, than to try to enforce it inside the regex.
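A rough sketch of the "capture first, validate in code" idea, using the pattern from this answer. The constant PARAMS, the method parse_params and the shape of its return value are assumptions made for illustration, not part of the original answer.

```ruby
# The pattern suggested above; PARAMS and parse_params are hypothetical names.
PARAMS = /^(\w+)(?:, ([a-z]\w+): ('[^']*')(?:, ([a-z]\w+): (\w+))?)?/

# Returns [first_parameter, options_hash], or nil if the values are not acceptable.
def parse_params(str)
  m = PARAMS.match(str)
  return nil unless m

  name, key1, val1, key2, val2 = m.captures
  # The regex only checks that the second value is a word; enforce true/false here.
  return nil if val2 && !%w[true false].include?(val2)

  options = {}
  options[key1] = val1[1..-2] if key1          # strip the surrounding quotes
  options[key2] = (val2 == 'true') if key2     # convert to a real boolean
  [name, options]
end

p parse_params("first_argument, single_hash_key: 'class1 class2'")
p parse_params("first_argument_without_second_argument")
p parse_params("first_argument, some_key: 'x', other_key: maybe")
```

Because the pattern is not anchored at the end, any trailing text it cannot match is silently ignored; a stricter version would need to check for that as well.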
In the third example, your regex returns 5 matches. It would be better if it returned only one. Is that possible?
I'm not sure what you're asking. This will return a single capture for each, but why you'd want that makes no sense to me if you're capturing parameters to send to a method:
/^(\w+(?:, [a-z]\w+: '[^']*'(?:, [a-z]\w+: \w+)?)?)/
http://rubular.com/r/GLVuSOieI6
There is frequently a choice to be made between attacking an entire string with a single regex or breaking the string up with one or more String methods, and then going after each piece separately. The latter approach often makes debugging and testing easier, and may also make the code intelligible to mere mortals. It's always a judgement call, of course, but I think this problem lends itself well to the divide and conquer approach. This is how I'd do it.
Code
def match?(str)
  a = str.split(',')
  return false unless a.shift.strip =~ /^\w+$/
  a.each do |s|
    return false unless ((key_val = s.split(':')).size == 2) &&
      key_val.first.strip =~ /^[a-z]\w*$/ &&
      key_val.last.strip =~ /^(\'.*?\'|true|false)$/
  end
  true
end
Examples
match?("first_paramenter, first_hash_key: 'class1 class2',
second_hash_key: true")
#=>true
match?("first_argument, single_hash_key: 'class1 class2'")
#=>true
match?("first_argument_without_second_argument")
#=>true
match?("first_parameter, first_hash_key: 7")
#=>false
match?("dogs and cats, first_hash_key: 'class1 class2'")
#=>false
match?("first_paramenter, first_hash_key: 'class1 class2',
second_hash_key: :true")
#=>false
You've got the basic idea, but you have a few small mistakes in there:
/^(\w+)(,\s[a-z][a-z_]+:\s('[^']*'|true|false))*$/
explained:
/^(\w+)               # starts with a word
(
  ,\s                 # the comma goes _inside_ the parens since it's optional
  [a-z][a-z_]+:\s     # {1} is completely redundant
  (                   # use | in a capture group to allow different possible values
    '[^']*' |         # note that '? doesn't make sure that the quotes always match
    true |
    false
  )
)*$/x                 # can have 0 or more hash keys after the first word

At which position does the regex fail?

I need a very simple string validator that shows the position of the first symbol that does not correspond to the desired format. I want to use a regex, but in that case I have to find the place where the string stops matching the expression, and I can't find a method that does that.
(It's got to be a fairly simple method... maybe there isn't one?)
For example if I have regex:
/^Q+E+R+$/
with string:
"QQQQEEE2ER"
The desired result should be 7
An idea: what you can do is to tokenize your pattern and write it with optional nested capturing groups:
^(Q+(E+(R+($)?)?)?)?
Then you only need to count the number of capture groups you obtain to know where the regex engine stops in the pattern and you can determine the offset of the match end in the string with the whole match length.
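A small sketch of how the nested optional groups might be read back in Ruby. The names NESTED and match_progress are invented for this example; the pattern is the one shown above.

```ruby
# Hypothetical names; the pattern is the nested-group version of /^Q+E+R+$/.
NESTED = /^(Q+(E+(R+($)?)?)?)?/

# Reports how far the match got: the offset where it stopped and how many
# of the four stages (Q+, E+, R+, end of string) were reached.
def match_progress(str)
  m = NESTED.match(str)   # always matches, since every part is optional
  { offset: m[0].size, stages: m.captures.compact.size }
end

p match_progress("QQQQEEE2ER")   #=> {offset: 7, stages: 2} (older Rubies print {:offset=>7, :stages=>2})
p match_progress("QQQQEEEERR")   # all four stages reached, offset 10
p match_progress("QQAQQEEEER")   # stops after the Q run, offset 2
```

A string satisfies the original regex exactly when all four stages are reached.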
As @zx81 notes in his comment, things are different if one element can also match the next one (for example, if Q can match E).
Let's say that Q is \w (and can match E and R). For the string QQQEEERRR, the previous pattern (with \w+ in place of Q+) gives only one capturing group, because the greedy \w+ matches everything, whereas ^(\w+)(E+)(R+)$ gives three groups: QQQEE, E, RRR
To obtain the same result you need to add an alternation:
^((?:\w+(?=E)|\w+)(E+(R+($)?)?)?)?
In the alternation, the case where E exists must be tested first, and only if this branch fails (with the lookahead), then the other branch where E doesn't exist is used.
Thus the full pattern can be rewritten like this to deal with this specific case:
^((?:Q+(?=E)|Q+)((?:E+(?=R)|E+)((?:R+(?=$)|R+)($)?)?)?)?
You could also take a look at the amatch gem.
This is an interesting task that can be accomplished with a neat regex trick:
^(?:(?=(Q+)))?(?:(?=(Q+E+)))?(?:(?=(Q+E+R+)))?(?:(?=(Q+E+R+$)))?
We have four optional lookaheads checking various parts of the pattern and capturing the partial matches to Groups 1, 2, 3 and 4 incrementally.
Group 1 contains Q+ if it can be matched, in your example QQQQ.
Group 2 contains Q+E+ if it can be matched, in your example QQQQEEE.
Group 3 contains Q+E+R+ if it can be matched, in your example nil.
Group 4 contains Q+E+R+$ if it can be matched, in your example nil.
In your code, check which is the last Group that is set by testing !$1.nil?, !$2.nil? and so on.
The last one set gives you the length that is matchable, so in your example $2.length gives you the 7 you wanted.
Incidentally, the fact that Group 2 is the last one set also tells you that we fail on R+.
For your example, you could do the following.
Code
Change your regex from:
/^Q+E+R+$/
to
R = /^(Q*)(E*)(R*)/
and then apply the following method to the string:
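def nbr_matched_chars(str)
  str.scan(R).flatten.reduce(0) { |t, e| return t if e.nil?; t + e.size }
end
str matches the original regex if and only if nbr_matched_chars(str) == str.size and each of the three captured runs is non-empty (the * quantifiers also accept a missing run, which Q+E+R+ does not).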
Examples
nbr_matched_chars("QQQQEEE2ER") #=> 7
nbr_matched_chars("QQQQEEEERR") #=> 10 (= "QQQQEEEERR".size)
nbr_matched_chars("QQAQQEEEER") #=> 2
Explanation
To see why this [evidently :-)] works, we can look at the results of invoking String#scan, followed by Array#flatten. Because each group uses *, a run that is missing is captured as an empty string rather than nil:
"QQQQEEE2ER".scan(R).flatten #=> ["QQQQ", "EEE",  ""  ]
"QQQQEEEERR".scan(R).flatten #=> ["QQQQ", "EEEE", "RR"]
"QQAQQEEEER".scan(R).flatten #=> ["QQ",   "",     ""  ]

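For the Q+E+R+ question above, a variation on the starred-groups idea that also treats a missing run as a failure (which the + quantifiers of the original regex require). This is only a sketch; PREFIX and first_bad_index are hypothetical names.

```ruby
# Hypothetical names. Each run is optional here so the match never fails;
# the method below checks afterwards that every run is actually present.
PREFIX = /\A(Q*)(E*)(R*)/

# Returns the index of the first character that breaks /^Q+E+R+$/,
# or nil when the whole string matches.
def first_bad_index(str)
  offset = 0
  PREFIX.match(str).captures.each do |run|
    return offset if run.empty?   # a required run is missing at this position
    offset += run.size
  end
  offset == str.size ? nil : offset
end

p first_bad_index("QQQQEEE2ER")   #=> 7
p first_bad_index("QQQQEEEERR")   #=> nil
p first_bad_index("QQRR")         #=> 2 (no E run after the Qs)
```
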
Ruby Regular expressions (regex): character appear only once at most

Suppose I want to make sure a string x is any combination of the characters abcd, with each character appearing zero or one times (no character may repeat, but the characters may appear in any order).
valid ex: bc .. abcd ... bcad ... b... d .. dc
invalid ex. abcdd, cc, bbbb, abcde (of course)
my effort:
I tried various techniques:
the closest I came was
x =~ /^(((a)?(b)?(c)?(d)?))$/
but this won't work if I do not type them in the same order as I have written them:
works for: ab, acd, abcd, a, d, c
won't work for: bcda, cb, da (anything that is not in the above order)
you can test your solutions here : http://rubular.com/r/wCpD355bub
PS: the characters may not be in alphabetical order, it could be u c e t
If you can use things besides regexes, you can try:
str.chars.uniq.length == str.length && str.match(/^[a-d]+$/)
The general idea here is that you just strip any duplicated characters from the string, and if the length of the uniq array is not equal to the length of the source string, you have a duplicated character in the string. The regex then enforces the character set.
This can probably be improved, but it's pretty straightforward. It does create a couple of extra arrays, so you might want a different approach if this needs to be used in a performance-critical location.
If you want to stick to regexes, you could use:
str.match(/^[a-d]+$/) && !str.match(/([a-d]).*\1/)
That'll basically check that the string only contains the allowed characters, and that those characters are never repeated.
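Since the question mentions that the allowed characters may be something other than a to d (for example u c e t), here is a sketch that takes the allowed set as a parameter. The method name and signature are made up for illustration. Unlike the [a-d]+ patterns above, it accepts the empty string, on the reading that every character then appears zero times.

```ruby
# Hypothetical helper: true if str uses only characters from allowed,
# and no character more than once.
def each_at_most_once?(str, allowed)
  chars = str.chars
  chars.uniq.size == chars.size &&      # no character repeats
    (chars - allowed.chars).empty?      # nothing outside the allowed set
end

valid   = %w[bc abcd bcad b d dc]
invalid = %w[abcdd cc bbbb abcde]

p valid.all?   { |s| each_at_most_once?(s, 'abcd') }   #=> true
p invalid.none? { |s| each_at_most_once?(s, 'abcd') }  #=> true
p each_at_most_once?('tec', 'ucet')                    #=> true
```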
This is really not what regular expressions are meant to do, but if you really really want to.
Here is a regex that satisfies the conditions.
^([a-d])(?!(\1))([a-d])?(?!(\1|\3))([a-d])?(?!(\1|\3|\5))([a-d])?(?!(\1|\3|\5|\7))$
Basically, it captures each character in a group, then uses a negative lookahead to make sure the next character does not repeat that group. It then captures the next character and makes sure the character after it repeats neither that group nor any previous one.
You can reverse it (match the condition that would make it fail)
re = /^ # start of line
(?=.*([a-d]).*\1) # match if a letter appears more than once
| # or
(?=.*[^a-d]) # match if a non abcd char appears
/x
puts 'fail' if %w{bc abcd bcad b d dc}.any?{|s| s =~ re}
puts 'fail' unless %w{abcdd cc bbbb abcde}.all?{|s| s =~ re}
I don't think regexes are well suited to this problem, so here is another non-regex solution. It's recursive:
def match_chars_no_more_than_once(characters, string)
  return true if string.empty?
  if characters.index(string[0])
    match_chars_no_more_than_once(characters.sub(string[0],''), string[1..-1])
  else
    false
  end
end
%w{bc bdac hello acbbd cdda}.each do |string|
  p [string, match_chars_no_more_than_once('abcd', string)]
end
Output:
["bc", true]
["bdac", true]
["hello", false]
["acbbd", false]
["cdda", false]

Regular expression to match my pattern of words, wild chars

can you help me with this:
I want a regular expression for my Ruby program to match a word with the below pattern
Pattern has
List of letters ( For example. ABCC => 1 A, 1 B, 2 C )
N wildcard characters (N can be 0, 1 or 2)
A fixed word (for example “XY”).
Rules:
Regarding the List of letters, it should match words with
a. 0 or 1 A
b. 0 or 1 B
c. 0 or 1 or 2 C
Based on the value of N, there can be 0 or 1 or 2 wild chars
Fixed word is always in the order it is given.
The combination of all these can be in any order and should match words like below
ABWXY ( if wild char = 1)
BAXY
CXYCB
But not words with 2 A’s or 2 B’s
I am using a pattern like ^[ABCC]*.XY$
But it also matches words with more than 1 A, 1 B or 2 C's, and it only matches words that end with XY. I want all words that have XY in any place, with the letters and wildcard characters in any position.
If it HAS to be a regex, the following could be used:
if subject =~
  /^                            # start of string
  (?!(?:[^A]*A){2})             # assert that there are less than two As
  (?!(?:[^B]*B){2})             # and less than two Bs
  (?!(?:[^C]*C){3})             # and less than three Cs
  (?!(?:[ABCXY]*[^ABCXY]){3})   # and less than three non-ABCXY characters
  (?=.*XY)                      # and that XY is contained in the string.
  /x
  # Successful match
else
  # Match attempt failed
end
This assumes that none of the characters A, B, C, X, or Y are allowed as wildcards.
I consider myself to be fairly good with regular expressions and I can't think of a way to do what you're asking. Regular expressions look for patterns and what you seem to want is quite a few different patterns. It might be more appropriate in your case to write a function which splits the string into characters and counts what you have, so you can check your criteria.
Just to give an example of your problem, a regex like /[abc]/ will match every single occurrence of a, b and c regardless of how many times those letters appear in the string. You can try /c{1,2}/ and it will match "c", "cc", and "ccc". It matches the last case because the unanchored pattern finds a run of one or two c's inside "ccc".
One thing I have found invaluable when developing and debugging regular expressions is rubular.com. Try some examples and I think you'll see what you're up against.
I don't know if this is really any help but it might help you choose a direction.
You need to break out your pattern properly. In regexp terms, [ABCC] means "any one of A, B or C" where the duplicate C is ignored. It's a set operator, not a grouping operator like () is.
What you seem to be describing is creating a regexp based on parameters. You can do this by passing a string to Regexp.new and using the result.
An example is roughly:
def match_for_options(options)
  pattern = '^'
  pattern << 'A' * options[:a] if (options[:a])
  pattern << 'B' * options[:b] if (options[:b])
  pattern << 'C' * options[:c] if (options[:c])
  Regexp.new(pattern)
end
You'd use it something like this:
if (match_for_options(:a => 1, :c => 2).match('ACC'))
# ...
end
Since you want to allow these "elements" to appear in any order, you might be better off writing a bit of Ruby code that goes through the string from beginning to end and counts the number of As, Bs, and Cs, finds whether it contains your desired substring. If the number of As, Bs, and Cs, is in your desired limits, and it contains the desired substring, and its length (i.e. the number of characters) is equal to the length of the desired substring, plus # of As, plus # of Bs, plus # of Cs, plus at most N characters more than that, then the string is good, otherwise it is bad. Actually, to be careful, you should first search for your desired substring and then remove it from the original string, then count # of As, Bs, and Cs, because otherwise you may unintentionally count the As, Bs, and Cs that appear in your desired string, if there are any there.
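The counting approach described above could look roughly like this. It is a sketch under stated assumptions: the method name word_ok? and its keyword parameters are invented here, a letter that exceeds its limit makes the word invalid, and any character that is not one of the listed letters counts as a wildcard.

```ruby
# Hypothetical helper. limits maps each listed letter to its maximum count;
# wildcards is the number of other characters allowed (N in the question).
def word_ok?(word, fixed: 'XY', limits: { 'A' => 1, 'B' => 1, 'C' => 2 }, wildcards: 0)
  idx = word.index(fixed)
  return false if idx.nil?                 # the fixed word must be present

  # Remove the fixed word first so its own letters are not counted.
  rest = word[0, idx] + word[(idx + fixed.size)..-1]

  counts = Hash.new(0)
  rest.each_char { |ch| counts[ch] += 1 }

  # No listed letter may appear more often than its limit.
  return false if limits.any? { |letter, max| counts[letter] > max }

  # Everything that is not a listed letter is treated as a wildcard.
  wild = counts.reject { |ch, _| limits.key?(ch) }.values.reduce(0, :+)
  wild <= wildcards
end

p word_ok?('ABWXY', wildcards: 1)   #=> true
p word_ok?('BAXY')                  #=> true
p word_ok?('CXYCB')                 #=> true
p word_ok?('AAXY', wildcards: 2)    #=> false (two As)
```

Which occurrence of the fixed word is removed does not matter, because only the counts of the remaining characters are checked.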
You can do what you want with a regular expression, but it would be a long ugly regular expression. Why? Because you would need a separate "case" in the regular expression for each of the possible orders of the elements. For example, the regular expression "^ABC..XY$" will match any string beginning with "ABC" and ending with "XY" and having two wild card characters in the middle. But only in that order. If you want a regular expression for all possible orders, you'd need to list all of those orders in the regular expression, e.g. it would begin something like "^(ABC..XY|ACB..XY|BAC..XY|BCA..XY|" and go on from there, with about 5! = 120 different orders for that list of 5 elements, then you'd need more for the cases where there was no A, then more for cases where there was no B, etc. I think a regular expression is the wrong tool for the job here.
