Find any two periods and any non-digit character in regex string - ruby

I'm close, I just need a few tips on getting this correct.
Essentially I want to return true if I have the string 1.0, but return false if I have the string 1..0 or ANY character other than numbers.
So I need to match any character other than numbers but I can also match 2 or more periods.
EX:
1.0 => true
2.000004 => true
foobar => false
2..0 => false
2 => true
2.0.0.0.0 => false
I currently have been toying with this: /[a-zA-Z]|.{2,}/
my translation is that it matches any letters, or any 2 periods..
It works, but it only matches 2 consecutive periods, so the following happens
2.0.0.0.0 => true (I want to be false, as in I want to match this)
I would like to use \D instead of [a-zA-Z] to potentially avoid any strange ASCII characters but a period is in the subset of \D but not in [a-zA-Z]
Any tips? Thanks!

I've tested all your options
^\.{1}?[0-9]*\.{1}?[0-9]+$
http://rubular.com/r/5ll1Grhy6H
If you need negative numbers:
^[.-]{1}?[0-9]*\.{1}?[0-9]+$
http://rubular.com/r/LLOjq1pIUY
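As a cross-check against the examples in the question, here is a minimal sketch of a stricter whole-string pattern. The constant NUMERIC and the method numeric_string? are illustrative names, not part of the answer above. The pattern assumes an optional leading minus, at least one digit, and at most one fractional part.

```ruby
# Hypothetical helper (name is an assumption): true only for an optional "-",
# one or more digits, and at most one ".digits" fraction, anchored with
# \A and \z so the whole string must match.
NUMERIC = /\A-?\d+(?:\.\d+)?\z/

def numeric_string?(str)
  NUMERIC.match?(str)
end

# The examples from the question.
%w[1.0 2.000004 foobar 2..0 2 2.0.0.0.0].each do |s|
  puts "#{s} => #{numeric_string?(s)}"
end
```

This sketch deliberately rejects a bare leading dot such as .5; loosen the \d+ if that form should be accepted.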

Here is an opposite approach:
/\.{2,}|[^\d\.-]|(\d\.){2,}|\.\z|.-/
This regex will find "incorrect" strings, that contain:
two or more dots in a row
characters that are not: digits, dot, dash (for negative numbers)
two or more groups "digit-dot" (like in 2.0.0.0.0)
dot as a last character
dash not in the first position
All strings that return true in this comparison
string !~ /\.{2,}|[^\d\.-]|(\d\.){2,}|\.\z|.-/
should be valid numbers (floats or integers)
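To see how the negated check behaves, here is a small sketch that runs the pattern, unchanged, over the question's examples. The names INVALID and looks_numeric? are made up for illustration. One case worth knowing about: a string with a multi-digit group between dots, such as 1.23.4, is not caught by the (\d\.){2,} alternative, so it is included below.

```ruby
# The "invalid string" pattern from the answer, copied as-is.
INVALID = /\.{2,}|[^\d\.-]|(\d\.){2,}|\.\z|.-/

# Hypothetical wrapper: a string passes when none of the
# "invalid" alternatives match anywhere in it.
def looks_numeric?(str)
  str !~ INVALID
end

%w[1.0 2.000004 foobar 2..0 2 2.0.0.0.0 -3.5 1.23.4].each do |s|
  puts "#{s} => #{looks_numeric?(s)}"
end
```

If inputs like 1.23.4 can occur, an anchored positive pattern for the whole string may be the safer choice.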

Related

How to use regular expressions to match numbers with some exceptions

I need to match numbers in groups of 5, from 1 to 5, with the following exceptions:
Numbers can't include zeros
Numbers can't be like 11111, 22222 and so on.
Numbers can't be like 12345 or 54321
Some examples of valid numbers:
14252, 45121, 43412, 51321 ...
So far I got an expression to group the numbers and do not allow zeros.
/[1-5]{5}/
But I'm having some trouble to handle the second and third exceptions. I tried unsuccessfully to use a negative lookahead to disallow a match if I have a pattern of repeated numbers.
?!11111|?!22222
I'm trying with this expression:
((?!11111)[1-5]{5}?)
How can I write regular expressions to not match certain patterns?
I will eventually change it to not match any other sequence of numbers.
First off, you don't have to cram everything into one regex. Regexes are already complicated; if you can do the job with several of them, that will often make things much simpler and allow for more flexible code. For example, you can customize the error message based on which condition failed. Usually you only need to fold multiple regexes together for performance reasons, and there are tools to do that automatically.
So far I got an expression to group the numbers and do not allow zeros.
/[1-5]{5}/
Careful, you have to anchor at both ends, or else it will accept any string that contains a run of five characters from 1-5.
/\A[1-5]{5}\z/
Numbers can't be like 11111, 22222 and so on.
Use a capture within the regex to accomplish this. Capture the first number, then see if there's four more. () to capture and \1 to refer to what was captured.
/\A([1-5])\1{4}\z/
Numbers can't be like 12345 or 54321
/\A(?:12345|54321)\z/
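Put together, the three separate patterns could be used roughly like this. It is only a sketch: the method name code_error and the message strings are invented for illustration, and it returns nil when the string is acceptable.

```ruby
# Hypothetical validator: returns nil for an acceptable code, otherwise a
# message naming the first rule that failed. Each rule is its own regex.
def code_error(str)
  return "must be five digits from 1 to 5" unless str.match?(/\A[1-5]{5}\z/)
  return "must not repeat one digit" if str.match?(/\A([1-5])\1{4}\z/)
  return "must not be 12345 or 54321" if str.match?(/\A(?:12345|54321)\z/)
  nil
end

%w[14252 12045 22222 54321].each do |s|
  puts "#{s}: #{code_error(s) || 'ok'}"
end
```

Keeping the rules apart is what makes the per-rule message possible.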
Here's a solution that does not use a regular expression. I understand we are to determine if: a) the string contains five characters; b) each character equals '1', '2', '3', '4' or '5'; c) the string contains at least two different characters; and d) the string is neither '12345' nor '54321'. We can do that as follows.
def is_ok?(str)
  str.size == 5 && # five characters
    (str.chars - ['1','2','3','4','5']).empty? && # only the digits '1'-'5'
    str.squeeze.size > 1 && # not all the same character
    str != '12345' && # not an increasing sequence
    str != '54321' # not a decreasing sequence
end
is_ok? '12543' #=> true
is_ok? '12043' #=> false
is_ok? '12643' #=> false
is_ok? '22222' #=> false
is_ok? '12345' #=> false
is_ok? '54321' #=> false
You have the right idea using negative lookaheads, just the syntax was a little off. This works for me:
\A(?!11111|22222|33333|44444|55555|12345|54321)[1-5]{5}\z
How about this?
^(?!([1-5])\1{4})(?!54321)(?!12345)[1-5]{5}$
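For comparison, here is a sketch of the single-regex approach checked against the question's examples. The constant name VALID_CODE is an assumption. It combines the backreference lookahead with the two literal sequences, and uses \A and \z rather than ^ and $, because in Ruby ^ and $ match at line boundaries inside a multi-line string.

```ruby
# Negative lookaheads run at the start of the string, before any digit is
# consumed: the first rejects five copies of one digit, the second rejects
# the two straight sequences.
VALID_CODE = /\A(?!([1-5])\1{4})(?!12345|54321)[1-5]{5}\z/

%w[14252 45121 43412 51321 11111 12345 54321 10234].each do |s|
  puts "#{s} => #{VALID_CODE.match?(s)}"
end
```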

How to scan for substrings with specific characters in them

This is a follow-up to this question: How to scan and return a set of words with specific characters in them in Ruby
We want to scan for words starting with a certain set of letters and then return them in an array. Something like this:
b="h ARCabc s and other ARC12".scan(/\w+ARC*\w+/)
and get back:
["ARCabc","ARC12"]
How would I do this (and I know this is very similar to what I asked yesterday)?
Just use the following regex:
\bARC\w*\b
or (to exclude underscores from matching)
\bARC[[:alnum:]]*\b
See regex demo
The regex matches:
\b - a word boundary (ARC at the start of a word only)
ARC - a fixed sequence of characters
\w* - 0 or more letters, digits or underscores. NOTE: if you only want to limit the matches to letters and digits, replace this \w* with [[:alnum:]]*.
\b - end of word (trailing) boundary.
See IDEONE demo here (output: ARCabc and ARC12).
NOTE2: If you plan to match Unicode strings, consider using either of the following regexps:
\bARC\p{Word}*\b - this variation will match words with underscores after ARC
\bARC[\p{L}\p{M}\d]*\b - this regex will match words that only have digits and Unicode letters after ARC.
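A quick way to compare the first two variants is to run both through String#scan. In this sketch the sample string is the one from the question with two extra words added as an assumption (xARCy and ARC_9) to show the word boundary and the underscore handling.

```ruby
# Sample text: the question's string plus two illustrative extras.
str = "h ARCabc s and other ARC12 xARCy ARC_9"

# \w* accepts underscores after ARC.
with_underscore = str.scan(/\bARC\w*\b/)

# [[:alnum:]]* accepts only letters and digits after ARC.
alnum_only = str.scan(/\bARC[[:alnum:]]*\b/)

p with_underscore
p alnum_only
```

xARCy is skipped by both patterns because there is no word boundary before its ARC.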
For good readability, you could split the string into words and then select the ones you want:
str = "h ARCabc s and other ARC12"
target = "ARC"
str.split.select { |w| w.include?(target) }
#=> ["ARCabc", "ARC12"]
If the words must begin with target:
str.split.select { |w| w.start_with?(target) }

Substring starting with a combination of digits till the next white space

I have a very long string of around 2000 chars. The string is a join of segments with first two chars of each segment as the segment indicator.
Eg- '11xxxxx 12yyyy 14ddddd gghgfbddc 0876686589 SANDRA COLINS 201 STMONK CA'
Now I want to extract the segment with indicator 14.
I achieved this using:
str.split(' ').each do |substr|
  if substr.starts_with?('14')
    key = substr.slice(2,5).to_i
    break
  end
end
I feel there should be a better way to do this. I am not able to find a more direct and one line solution for string matching in ruby. Please someone suggest a better approach.
It's not entirely clear what you're looking for, because your example string shows letters, but your title says digits. Either way, this is a good task for a regular expression.
foo = '12yyyy 014dddd 14ddddd gghgfbddc'
bar = '12yyyy 014dddd 1499999 gghgfbddc'
baz = '12yyyy 014dddd 14a9B9z gghgfbddc'
foo[/\b14[a-zA-Z]+/] # => "14ddddd"
bar[/\b14\d+/] # => "1499999"
baz[/\b14\w+/] # => "14a9B9z"
foo[/\b14\S+/] # => "14ddddd"
bar[/\b14\S+/] # => "1499999"
baz[/\b14\S+/] # => "14a9B9z"
In the patterns:
\b means word boundary, so the pattern has to start at a transition between a non-word character (a space or punctuation, or the start of the string) and a word character.
[a-zA-Z]+ means one or more letters.
\d+ means one or more digits.
\w+ means one or more of letters, digits and '_'. That is equivalent to the character set [a-zA-Z0-9_]+.
\S+ means one or more non-whitespace characters, which is useful if you want everything up to a space.
Which of those is appropriate for your use-case is really up to you to decide.
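If only the part after the indicator is wanted, a capture group can be passed to String#[] together with the group index. This is a sketch using the string from the question; the variable names payload and missing are illustrative.

```ruby
str = '11xxxxx 12yyyy 14ddddd gghgfbddc 0876686589 SANDRA COLINS 201 STMONK CA'

# str[regex, 1] returns the text of the first capture group,
# or nil when the pattern does not match at all.
payload = str[/\b14(\S+)/, 1]
missing = 'no such segment'[/\b14(\S+)/, 1]

p payload
p missing
```

Checking for nil before using the result avoids surprises when the segment is absent.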

Using Regexp to check whether a string starts with a consonant

Is there a better way to write the following regular expression in Ruby? The first regex matches a string that begins with a (lower case) consonant, the second with a vowel.
I'm trying to figure out if there's a way to write a regular expression that matches the negative of the second expression, versus writing the first expression with several ranges.
string =~ /\A[b-df-hj-np-tv-z]/
string =~ /\A[aeiou]/
The statement
string =~ /\A[^aeiou]/
will test whether the string starts with a non-vowel character, which includes digits, punctuation, whitespace and control characters. That is fine if you know beforehand that the string begins with a letter, but to check that it starts with a consonant you can use lookaheads to test that it starts with both a non-vowel and a letter, like this
string =~ /\A(?=[^aeiou])(?=[a-z])/i
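The difference between the two tests shows up on strings that do not begin with a letter. A small sketch, with the constant names NON_VOWEL and CONSONANT and the word list chosen only for illustration:

```ruby
NON_VOWEL = /\A[^aeiou]/i                # any first character that is not a vowel
CONSONANT = /\A(?=[^aeiou])(?=[a-z])/i   # first character is a letter and not a vowel

words = %w[Ball area 1word _word zebra]

non_vowel_starts = words.select { |w| w.match?(NON_VOWEL) }
consonant_starts = words.select { |w| w.match?(CONSONANT) }

p non_vowel_starts
p consonant_starts
```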
To match a run of consonants, you can use the sub-expression (?i:(?![aeiou])[a-z]), which matches a single consonant. It is a self-contained group, so you can put a repetition count like {3} right after it. For example, this program finds all the strings in a list that begin with three consonants in a row
list = %w/ aab bybt xeix axei AAsE SAEE eAAs xxsa Xxsr /
puts list.select { |word| word =~ /\A(?i:(?![aeiou])[a-z]){3}/ }
output
bybt
xxsa
Xxsr
I modified the answer provided by @Alexander Cherednichenko in order to get rid of the if statements.
/^[^aeiou\W]/i.match(s) != nil
If you want to catch a string that starts with a consonant rather than a vowel, you can use the code below. It returns true if the string starts with any letter other than A, E, I, O, U. Here s is any string we give to the function.
if /^[^aeiou\W]/i.match(s) == nil
  return false
else
  return true
end
The i added at the end makes the regular expression case-insensitive.
\W inside the negated class rejects any non-word first character, such as punctuation or whitespace. Digits and underscores are word characters, so a string like "1something" still matches.
[^aeiou] means any character except a e i o u
And we put ^ at the beginning, before [, to indicate that the following class [^aeiou\W] is for the 1st character
Note that the ^[^aeiou\W] pattern is not correct because it also matches a line that starts with a digit or an underscore. Borodin's solution works well, but there is one more possible solution without lookaheads, based on character class subtraction (written in Ruby as an intersection, &&, with a negated class; more here) and using the more contemporary Regexp#match?:
/\A[a-z&&[^aeiou]]/i.match?(word)
See the Rubular demo.
Details
\A - start of a string (^ in Ruby is start of any line)
[a-z&&[^aeiou]] - an a-z character range matching any ASCII letter (/i flag makes it case insensitive) except for the aeiou chars.
See the Ruby demo:
test = %w/ 1word _word ball area programming /
puts test.select { |w| /\A[a-z&&[^aeiou]]/i.match?(w) }
# => ['ball', 'programming']

Regexp for specific matching of character string

I need a regex to match something like
"4f0f30500be4443126002034"
and
"4f0f30500be4443126002034>4f0f31310be4443126005578"
but not like
"4f0f30500be4443126002034>4f0f31310be4443126005578>4f0f31310be4443126005579"
Try:
^[\da-f]{24}(>[\da-f]{24})?$
[\da-f]{24} is exactly 24 characters consisting only of 0-9, a-f. The whole pattern is one such number optionally followed by a > and a second such number.
I think you want something like:
/^[0-9a-f]{24}(>[0-9a-f]{24})?$/
That matches 24 characters in the 0-9a-f range (which matches your first string) followed by zero or one strings starting with a >, followed by 24 characters in the 0-9a-f range (which matches your second string). Here's a RegexPal for this regex.
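Checked in Ruby against the three strings from the question, the idea looks roughly like this. The constant name ID_PAIR is an assumption, and the sketch swaps ^ and $ for \A and \z so that a multi-line string cannot slip through.

```ruby
# One 24-character lowercase hex id, optionally followed by ">" and a second id.
ID_PAIR = /\A[0-9a-f]{24}(?:>[0-9a-f]{24})?\z/

samples = [
  "4f0f30500be4443126002034",
  "4f0f30500be4443126002034>4f0f31310be4443126005578",
  "4f0f30500be4443126002034>4f0f31310be4443126005578>4f0f31310be4443126005579"
]

samples.each { |s| puts "#{s.count('>')} separator(s) => #{ID_PAIR.match?(s)}" }
```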
Don't need a regex.
str = "4f0f30500be4443126002034>4f0f31310be4443126005578"
match = str.count('>') < 2
match will be set to true for matches where there are 1 or 0 '>' in the string. Otherwise match is set to false.
