Ruby regex specify length of captured group - ruby

I need to match a string of variable length (between 5 and 12 characters), composed of uppercase letters and one or more digits between 1 and 8.
How can I specify that I need the whole captured group's length to be between 5 and 12?
I have tried with parentheses but with no luck.
I have tried this
\s([A-Z]+[1-8]+[A-Z]+){5,12}\s
My idea was to use the quantifier {5,12} to limit the length of the captured group in parentheses, but clearly it doesn't work like that.
The string needs to be identified inside a normal text just like
"THE STRING I NEED TO DECODE IS SOMETHING LIKE FD1531FHHKWF BUT NOT LIKE g4G58234JJ"

You actually have two conditions to meet:
The length of the match is set with the curly-bracket quantifier {5,12}, and there should be no letters or digits immediately before or after it. So:
/(?!\b[A-Z]+\b)\b[A-Z1-8]{5,12}\b/
First, the negative lookahead ensures that the word does not consist of letters only; then we match the pattern itself.
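To check the pattern above against the sample sentence from the question, here is a minimal sketch. The constant and helper names are illustrative and not part of the answer.

```ruby
# Pattern copied from the answer above: reject all-letter words, then match
# 5 to 12 characters drawn from A-Z and 1-8 between word boundaries.
CODE_PATTERN = /(?!\b[A-Z]+\b)\b[A-Z1-8]{5,12}\b/

# Hypothetical helper: returns every match found in the text.
def find_codes(text)
  text.scan(CODE_PATTERN)
end

sample = "THE STRING I NEED TO DECODE IS SOMETHING LIKE FD1531FHHKWF BUT NOT LIKE g4G58234JJ"
p find_codes(sample) # => ["FD1531FHHKWF"]
```

Note that a token made only of digits 1-8 (for example 12345) would also pass, because the lookahead only rules out all-letter words.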

Use positive look-ahead on total size of regex
\s(?=^.{5,12}$)([A-Z]+[1-8]+[A-Z]+)\s
Explanation
(?= # look-ahead match start
^.{5,12}$ # 5 to 12 characters from start to end
) # look-ahead match end
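Because the look-ahead above is anchored with ^ and $, it measures the whole line rather than the token, so it is unlikely to match a code embedded in a sentence. One possible variant, offered as an assumption and not as the answerer's pattern, bounds the look-ahead with \b instead:

```ruby
# Hypothetical variant: the look-ahead limits the token to 5..12 characters,
# then the main pattern requires letters, digits 1-8, letters.
BOUNDED_CODE = /\b(?=[A-Z1-8]{5,12}\b)[A-Z]+[1-8]+[A-Z]+\b/

# Hypothetical helper: returns every match found in the text.
def bounded_codes(text)
  text.scan(BOUNDED_CODE)
end

p bounded_codes("LIKE FD1531FHHKWF BUT NOT LIKE g4G58234JJ")    # => ["FD1531FHHKWF"]
p bounded_codes("TOO SHORT A1B OR TOO LONG ABCDEFG1234HIJKLMN") # => []
```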

Related

Why does this regex not match numbers and single letters?

Why does this regex not match 3a?
(\/\d{1,4}?|\d{1,4}?|\d{1,4}[A-z]{1})
Using \d{1,4}\D{1}, the result is the same.
Street numbers:
/1
78
3a
89/
-1 (special case)
1
https://regex101.com/r/cYCafR/3
The digits+letter combination is not matched due to the order of alternatives in your pattern. The \d{1,4}? matches the digit before the letter, and \d{1,4}[A-z]{1} does not even have a chance to step in. See the Remember That The Regex Engine Is Eager article.
The \/\d{1,4}? will match a / and a single digit after the slash, and \d{1,4}? will always match a single digit, as {min,max}? is a lazy range/interval/limiting quantifier and as such only matches as few chars as possible. See Laziness Instead of Greediness.
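The lazy behaviour described above can be observed directly. This is a small sketch with illustrative helper names:

```ruby
# {1,4}? is lazy: with nothing after it in the pattern, it stops at one digit.
def lazy_digits(str)
  str[/\d{1,4}?/]
end

# {1,4} is greedy: it takes as many digits as it can, up to four.
def greedy_digits(str)
  str[/\d{1,4}/]
end

p lazy_digits("1234")   # => "1"
p greedy_digits("1234") # => "1234"
p "/89"[/\/\d{1,4}?/]   # => "/8"
```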
Besides, [A-z] is a typo, it should be [A-Za-z]. The range A-z also covers the ASCII characters that sit between Z and a: [, \, ], ^, _ and `.
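A quick way to see the difference between the two classes, as a sketch with illustrative helper names:

```ruby
# [A-z] spans the ASCII range from "A" to "z", punctuation included.
def loose_letter?(char)
  !!(char =~ /\A[A-z]\z/)
end

# [A-Za-z] lists the two letter ranges explicitly.
def ascii_letter?(char)
  !!(char =~ /\A[A-Za-z]\z/)
end

p loose_letter?("_") # => true
p ascii_letter?("_") # => false
p ascii_letter?("q") # => true
```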
It seems you want
\d{1,4}[A-Za-z]|\/?\d{1,4}
See the regex demo. If it should be at the start of a line, use
^(?:\d{1,4}[A-Za-z]|\/?\d{1,4})
See this regex demo.
Details
^ - start of a line
(?: - start of a non-capturing group
\d{1,4}[A-Za-z] - 1 to 4 digits and an ASCII letter
| - or
\/? - an optional /
\d{1,4} - 1 to 4 digits
) - end of the group.
Your regex uses lazy quantifiers like {1,4}?. These match one character and stop, because nothing follows them in the pattern that would force them to consume more. See here for how greedy vs lazy quantifiers work.
Another reason is that you put the \d{1,4}[A-z]{1} case last. This case will only be tried if the first two cases don't match. With 3a, the 3 already matches the second case, so the last case won't be considered.
You seem to just want:
^(\d{1,4}[A-Za-z]|\/?\d{1,4})
Note how the \/\d{1,4} case and the \d{1,4} case in your original regex are combined into one case \/?\d{1,4}.
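To compare the original pattern with the reordered one on the street numbers from the question, here is a minimal sketch. The constant and helper names are illustrative.

```ruby
# Pattern from the question: lazy quantifiers, digit+letter case last.
ORIGINAL_STREET = /(\/\d{1,4}?|\d{1,4}?|\d{1,4}[A-z]{1})/
# Pattern from the answer: digit+letter case first, greedy quantifiers.
FIXED_STREET = /^(\d{1,4}[A-Za-z]|\/?\d{1,4})/

def original_match(str)
  str[ORIGINAL_STREET]
end

def fixed_match(str)
  str[FIXED_STREET]
end

["/1", "78", "3a", "89/", "1"].each do |street|
  puts "#{street}: original=#{original_match(street).inspect} fixed=#{fixed_match(street).inspect}"
end
# /1: original="/1" fixed="/1"
# 78: original="7" fixed="78"
# 3a: original="3" fixed="3a"
# 89/: original="8" fixed="89"
# 1: original="1" fixed="1"
```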

Regex matching plus or minus

Could someone please look at the following function and explain the regex to me? I don't like using something I don't understand, because I won't be able to reproduce it in the future and I don't learn anything from it.
Also, can someone explain the double !! in front? I know a single ! means not, so does a double one mean not "not"?
The function is a patch to String to check if it's capable of being converted to an integer or not.
class String
  def is_i?
    !!(self =~ /\A[-+]?[0-9]+\z/)
  end
end
The main thing that's giving me trouble is [-+] as it makes little sense to me, if you could explain in the context given it would be very helpful.
EDIT:
Since people missed the second part of the question I'll be a little more explicit.
What does !! mean in front of the check? I know a single ! means NOT but I can't find what !! means.
The [-+] Character Class
[-+] is a character class. It means "match one character specified by the class", i.e. - or +.
Hyphens in Character Classes
I can see how this particular class can be confusing because the hyphen often plays a special role in a character class: it links two characters to form a character range. For instance, [a-z] means "match one character between a and z", and [a-z0-9] means "match one character between a and z or between 0 and 9".
However, in this case, the hyphen in [-+] is positioned in a place where it cannot be used to specify a range, and the - is just a literal hyphen.
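The two roles of the hyphen can be tried out in a few lines. This is a sketch with illustrative constant names:

```ruby
SIGN  = /[-+]/  # hyphen first in the class: a literal "-" or a "+"
LOWER = /[a-z]/ # hyphen between two characters: the range a..z

p "-7"[SIGN] # => "-"
p "+7"[SIGN] # => "+"
p "7"[SIGN]  # => nil
p "q"[LOWER] # => "q"
p "-"[LOWER] # => nil
```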
Decoding the entire expression
\A - assert position at the beginning of the string
[-+]? - match a single character from the list “-+”
  ? - between zero and one times, as many times as possible, giving back as needed (greedy)
[0-9]+ - match a single character in the range between “0” and “9”
  + - between one and unlimited times, as many times as possible, giving back as needed (greedy)
\z - assert position at the very end of the string
A Character Class defines a set of characters, any one of which can occur in a string for a match to succeed.
For example, the regular expression [-+]?[0-9]+ will match 123, -123, or +123 because it defines a character class (accepting either -, +, or neither one) as its first character.
In context:
\A asserts position at the start of the string.
[-+] any character of: - or + (? optional, meaning between zero and one time)
[0-9] any character of: 0 to 9 (+ quantifier meaning 1 or more times)
\z asserts position at the very end of the string.
What does !! mean?
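!! is simply the ! (not) operator applied twice, which converts any value to a boolean. Here =~ returns the index of the match (an integer) or nil; the first ! turns that into false or true, and the second ! flips it back, so the method returns true on a match and false otherwise.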
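A short sketch of what each step yields, using illustrative helper names that are not part of the original function:

```ruby
INTEGER_PATTERN = /\A[-+]?[0-9]+\z/

# =~ returns the index where the match starts, or nil when nothing matches.
def match_index(str)
  str =~ INTEGER_PATTERN
end

# Applying ! twice turns any value into true or false.
def to_bool(value)
  !!value
end

p match_index("42")           # => 0
p !match_index("42")          # => false (0 is truthy in Ruby)
p to_bool(match_index("42"))  # => true
p match_index("4x2")          # => nil
p to_bool(match_index("4x2")) # => false
```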
explain the regex for me as I don't understand it
Pattern explanation: \A[-+]?[0-9]+\z
\A Start of string
[-+]? plus or minus sign [zero or one time (optional)]
[0-9]+ 0 to 9 any digit [one or more times]
\z End of string
The above pattern matches any integer, positive or negative, with an optional leading + or - sign.
Read more about Character Classes and test your regex pattern online at Rubular
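For reference, the patched method from the question behaves as follows on a few sample inputs, which are chosen here only for illustration:

```ruby
class String
  # True when the whole string is an optionally signed run of digits.
  def is_i?
    !!(self =~ /\A[-+]?[0-9]+\z/)
  end
end

p "123".is_i?  # => true
p "-123".is_i? # => true
p "+123".is_i? # => true
p "12.5".is_i? # => false
p "".is_i?     # => false
p "12\n".is_i? # => false (\z does not allow a trailing newline)
```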

How to find whole complete number with ruby regex

I'm looking to find the first whole occurrence of a number within a string. I'm not looking for the first digit, rather the whole first number. So, for example, the first number in: w134fklj342 is 134, while the first number in 1235alkj9342klja9034 is 1235.
I have attempted to use \d but I'm unsure how to expand that to include multiple digits (without specifying how long the number is).
I think you're looking for this regex:
\d+
"Plus" means "one or more". This regex matches every number within a string, so pick the first match.
strings = ['w134fklj342', '1235alkj9342klja9034']
strings.each do |s|
  puts s[/\d+/]
end
# >> 134
# >> 1235
Demo: http://rubular.com/r/YE8kPE2SyW
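To see the difference between collecting every number and taking only the first one, here is a small sketch with illustrative helper names:

```ruby
# String#scan returns every run of digits in the string.
def all_numbers(str)
  str.scan(/\d+/)
end

# String#[] with a regex returns only the first match, or nil.
def first_number(str)
  str[/\d+/]
end

p all_numbers("1235alkj9342klja9034")  # => ["1235", "9342", "9034"]
p first_number("1235alkj9342klja9034") # => "1235"
p first_number("no digits here")       # => nil
```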
The easiest way to understand regexes is to think of each bit as matching one character; e.g. \d, [1234567890] or [0-9] will match one digit.
To expand this one character you have 2 basic options: * and +
* will match the character 0 or more times
+ will match it one or more times
Like Sergio said you should use \d+ to match many digits.
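The difference between * and + matters here, because * is also satisfied by an empty match at the very start of the string. A small sketch with illustrative helper names:

```ruby
# \d* accepts zero digits, so it can match the empty string at position 0.
def first_with_star(str)
  str[/\d*/]
end

# \d+ needs at least one digit, so it skips ahead to the first number.
def first_with_plus(str)
  str[/\d+/]
end

p first_with_star("w134fklj342") # => ""
p first_with_plus("w134fklj342") # => "134"
p first_with_plus("no digits")   # => nil
```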
Excellent tutorial for regexes in general: http://www.regular-expressions.info/tutorial.html

How can I write a regex in Ruby that will determine if a string meets this criteria?

How can I write a regex in Ruby 1.9.2 that will determine if a string meets this criteria:
Can only include letters, numbers and the - character
Cannot be an empty string, i.e. cannot have a length of 0
Must contain at least one letter
/\A[a-z0-9-]*[a-z][a-z0-9-]*\z/i
It goes like
beginning of string
some (or zero) letters, digits and/or dashes
a letter
some (or zero) letters, digits and/or dashes
end of string
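The steps above can be checked against a few inputs. The constant and helper names, and the sample strings, are illustrative.

```ruby
# Pattern from the answer: allowed characters, one required letter, allowed characters.
IDENTIFIER = /\A[a-z0-9-]*[a-z][a-z0-9-]*\z/i

def valid_identifier?(str)
  !!(str =~ IDENTIFIER)
end

p valid_identifier?("abc-123") # => true
p valid_identifier?("A")       # => true
p valid_identifier?("123-456") # => false (no letter)
p valid_identifier?("")        # => false (empty)
p valid_identifier?("ab_c")    # => false (underscore not allowed)
```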
I suppose these two will help you: /\A[a-z0-9\-]{1,}\z/i and /[a-z]{1,}/i. The first one checks the first two rules and the second one checks the last condition.
No regex:
str.count("a-zA-Z") > 0 && str.count("^a-zA-Z0-9-") == 0
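As a sanity check, the two-regex approach and the String#count approach can be run side by side. This is only a sketch; the method names and sample inputs are illustrative.

```ruby
# Two regexes: allowed characters only and non-empty, plus at least one letter.
def valid_with_regexes?(str)
  !!(str =~ /\A[a-z0-9\-]{1,}\z/i && str =~ /[a-z]{1,}/i)
end

# No regex: at least one letter, and no character outside the allowed set.
def valid_with_count?(str)
  str.count("a-zA-Z") > 0 && str.count("^a-zA-Z0-9-") == 0
end

["abc-123", "123-456", "", "ab_c"].each do |s|
  p [s, valid_with_regexes?(s), valid_with_count?(s)]
end
# ["abc-123", true, true]
# ["123-456", false, false]
# ["", false, false]
# ["ab_c", false, false]
```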
You can take a look at this tutorial for how to use regular expressions in ruby. With regards to what you need, you can use the following:
^[A-Za-z0-9\-]+$
The ^ anchors the match at the beginning of a line (in Ruby, use \A to anchor at the very beginning of the string).
The [..] will instruct the regex engine to match any one of the characters they contain.
A-Z means any upper case letter, a-z means any lower case letter and 0-9 means any digit.
The \- will instruct the regex engine to match a literal -. The \ is used in front of it because - is the range operator inside square brackets; at the very end of the class the escape is optional but harmless.
The $ anchors the match at the end of the line (in Ruby, use \z for the very end of the string).
The + instructs the regex engine to match what is contained between the square brackets one or more times.
You can also use the i flag to make your search case insensitive, so the regex might become something like this:
/^[a-z0-9\-]+$/i
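In Ruby, ^ and $ match at line boundaries, so a multi-line string passes as soon as one of its lines matches. The sketch below contrasts this with \A and \z; the constant names and the sample input are illustrative.

```ruby
LINE_ANCHORED   = /^[a-z0-9\-]+$/i   # anchored to a line
STRING_ANCHORED = /\A[a-z0-9\-]+\z/i # anchored to the whole string

def line_valid?(str)
  !!(str =~ LINE_ANCHORED)
end

def string_valid?(str)
  !!(str =~ STRING_ANCHORED)
end

input = "abc-1\nnot allowed!"
p line_valid?(input)     # => true (the first line matches)
p string_valid?(input)   # => false
p string_valid?("abc-1") # => true
```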

Regex that allows for A-z, 0-9, and dashing in the middle, never on the ends?

I'm working to create a ruby regex that meets the following conditions:
Supported:
A-Z, a-z, 0-9, dashes in the middle but never starting or ending in a dash.
At least 5, no more than 500 characters
So far I have:
[0-9a-z]{5,500}
Any suggestions on how to update to meet the criteria above?
Thanks
[A-Za-z\d][-A-Za-z\d]{3,498}[A-Za-z\d]
If you are willing to treat _ as a letter also, it's even simpler:
\w[-\w]{3,498}\w
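A short sketch of the \w version follows. The \A and \z anchors are an addition made here so that the whole string is checked; the constant and helper names are illustrative.

```ruby
# \w covers letters, digits and underscore; \A and \z are added assumptions.
WORDY_NAME = /\A\w[-\w]{3,498}\w\z/

def wordy_name?(str)
  !!(str =~ WORDY_NAME)
end

p wordy_name?("ab-cd") # => true
p wordy_name?("ab_cd") # => true (underscore counts as a word character)
p wordy_name?("-abcd") # => false (leading dash)
p wordy_name?("abcd")  # => false (only 4 characters)
```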
This should work:
[0-9A-Za-z][0-9A-Za-z-]{3,498}[0-9A-Za-z]
Here you go:
/^[0-9A-Za-z][0-9A-Za-z\-]{3,498}[0-9A-Za-z]$/
or if you want the beginning and end to be only 0-9,A-Z,a-z (instead of non dash) then:
Explanation:
The first ^ matches the beginning of the string.
The next [] matches one character of A-Z, a-z, 0-9.
The next [] matches 3 to 498 chars of A-Z, a-z, 0-9, dash. Note that we match 3 to 498 chars because we match one char at the beginning and one at the end.
The last [] is again one character of A-Z, a-z, 0-9.
And lastly we match $ for the end of the string.
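The length arithmetic (1 + 3..498 + 1 = 5..500 characters) can be checked with a few boundary cases. This is a sketch using the pattern from this answer, with illustrative constant and helper names:

```ruby
# Pattern from the answer: alphanumeric ends, 3..498 alphanumerics or dashes between.
SLUG = /^[0-9A-Za-z][0-9A-Za-z\-]{3,498}[0-9A-Za-z]$/

def valid_slug?(str)
  !!(str =~ SLUG)
end

p valid_slug?("ab-cd")   # => true  (5 characters)
p valid_slug?("abcd")    # => false (4 characters)
p valid_slug?("-abcd")   # => false (starts with a dash)
p valid_slug?("abcd-")   # => false (ends with a dash)
p valid_slug?("a" * 500) # => true
p valid_slug?("a" * 501) # => false
```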
This assumes that there are either always dashes or never dashes. It also assumes only one dash is allowed between alphanumeric characters. It's the only way I can think of offhand to limit the number of characters rather than the number of repetitions of the group.
(([0-9a-zA-Z]{4,499})|([0-9a-zA-Z][\-]?){2,249})[0-9a-zA-Z]
Assuming there's no limit to the number of adjacent dashes allowed, this would work:
[0-9a-zA-Z][0-9a-zA-Z\-]{3,498}[0-9a-zA-Z]

Resources