Ruby specifying regexp - ruby

How would I write a regexp so that the string MUST equal the exact format in the regexp?
For example:
/\d:\d/ =~ '5:4'
BUT
/\d:\d/ also matches '5:42alskjf2425'
How do I make it so that my regexp checks for only a digit, followed by a colon, followed by a digit, and nothing else?
Thanks.

Use \A and \z anchors, to match the beginning and end of a string:
/\A\d:\d\z/ =~ '5:4' # => 0 (truthy: the match starts at index 0)
/\A\d:\d\z/ =~ '5:4x' # => nil (falsy: no match)
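As a small sketch of why \z is the end anchor to reach for here (the sample strings are made up for illustration): \Z also accepts a trailing newline, and ^ and $ anchor to lines rather than to the whole string, so both are looser than \A and \z.

```ruby
exact = /\A\d:\d\z/

exact.match?("5:4")              # => true
exact.match?("5:4\n")            # => false, \z allows nothing after the last digit

# \Z tolerates a single trailing newline.
/\A\d:\d\Z/.match?("5:4\n")      # => true

# ^ and $ anchor to lines, not to the whole string.
/^\d:\d$/.match?("5:4\nextra")   # => true
```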

If you need to specify how many characters must be found, you can do it in a couple of ways:
\d finds one.
\d{1} finds one.
\d{1,2} finds one or two.
\d{1,} finds one or more.
\d{,2} finds zero, one or two.
In other words, use:
/\d{1}:\d{1}/
Check it out:
'5:4'[/\d{1}:\d{1}/] # => "5:4"
'5:42alskjf2425'[/\d{1}:\d{1}/] # => "5:4"
Note that without the \A and \z anchors the pattern still finds "5:4" inside the longer string. That's all documented, so take the time to read through the Regexp documentation.
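Quantifiers and anchors can be combined. A minimal sketch, assuming a hypothetical "one or two digits, a colon, exactly two digits" format that is not part of the original question (the constant name TIME_LIKE is also made up):

```ruby
# Hypothetical format for illustration: 1-2 digits, a colon, then exactly 2 digits.
TIME_LIKE = /\A\d{1,2}:\d{2}\z/

TIME_LIKE.match?("5:42")    # => true
TIME_LIKE.match?("12:30")   # => true
TIME_LIKE.match?("123:30")  # => false (three leading digits)
TIME_LIKE.match?("5:4")     # => false (only one trailing digit)
TIME_LIKE.match?("5:42x")   # => false (trailing junk is rejected by \z)
```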

Related

Positive Lookahead and Non-capturing group difference

When you want to match either of two patterns but not capture it, you would use a noncapturing group ?::
%r{(?:https?|ftp)://(.+)}
But what if I want to capture '_1' in the string 'john_1'? It could be '_2' or '_' followed by anything else. First I tried a non-capturing group:
'john_1'.gsub(/(?:.+)(_.+)/, "")
=> ""
It does not work. I am telling it to not capture one or more characters but to capture _ and all characters after it.
Instead the following works:
'john_1'.gsub(/(?=.+)(_.+)/, "")
=> "john"
I used a positive lookahead. The definition I found for positive lookahead was as follows:
q(?=u) matches a q that is
followed by a u, without making the u part of the match. The positive
lookahead construct is a pair of parentheses, with the opening
parenthesis followed by a question mark and an equals sign.
But that definition doesn't really fit my example. What makes the Positive Lookahead work but not the Non-capturing group work in the example I provide?
Capturing and matching are two different things. (?:expr) doesn't capture expr, but it's still included in the matched string. Zero-width assertions, e.g. (?=expr), don't capture or include expr in the matched string.
Perhaps some examples will help illustrate the difference:
> "abcdef"[/abc(def)/] # => abcdef
> $1 # => def
> "abcdef"[/abc(?:def)/] # => abcdef
> $1 # => nil
> "abcdef"[/abc(?=def)/] # => abc
> $1 # => nil
When you use a non-capturing group in your String#gsub call, it's still part of the match, and gets replaced by the replacement string.
Your first example doesn't work because a non-capturing group is still part of the overall match, whereas the lookahead is only used for matching but isn't part of the overall match.
This is easier to understand if you get the actual match data:
# Non-capturing group
/(?:.+)(_.+)/.match 'john_1'
=> #<MatchData "john_1" 1:"_1">
# Positive lookahead
/(?=.+)(_.+)/.match 'john_1'
=> #<MatchData "_1" 1:"_1">
EDIT: I should also mention that sub and gsub work on the entire match, not individual capture groups (although those can be used in the replacement).
'john_1'.gsub(/(?:.+)(_.+)/, 'phil\1')
=> "phil_1"
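To make the distinction concrete, here is a short sketch using MatchData, where index 0 is the whole match and index 1 is the first capture group. Because sub and gsub replace the whole match, keeping the prefix means either capturing it and putting it back, or not matching it in the first place.

```ruby
m = /(?:.+)(_.+)/.match('john_1')
m[0]  # => "john_1"  (the whole match, which is what gsub replaces)
m[1]  # => "_1"      (the only capture group)

# Capture the prefix and re-insert it in the replacement...
'john_1'.sub(/(.+)(_.+)/, '\1')  # => "john"

# ...or match only the part that should be removed.
'john_1'.sub(/_.+\z/, '')        # => "john"
```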
Let's consider a couple of situations.
The string preceding the underscore must be "john" and the underscore is followed by one or more characters
str = "john_1"
You have two choices.
Use a positive lookbehind
str[/(?<=john)_.+/]
#=> "_1"
The positive lookbehind requires that "john" must appear immediately before the underscore, but it is not part of the match that is returned.
Use a capture group:
str[/john(_.+)/, 1]
#=> "_1"
This regular expression matches "john_1", but "_.+" is captured in capture group 1. By examining the doc for the method String#[] you will see that one form of the method is str[regexp, capture], which returns the contents of the capture group capture. Here capture equals 1, meaning the first capture group.
Note that the string following the underscore may contain underscores: "john_1_a"[/(?<=john)_.+/] #=> "_1_a".
If the underscore can be at the end of the string replace + with * in the above regular expressions (meaning match zero or more characters after the underscore).
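A quick sketch of the difference between + and * when the underscore is the last character, and of the lookbehind rejecting a different prefix (the sample strings are made up for illustration):

```ruby
"john_"[/(?<=john)_.+/]    # => nil  (+ needs at least one character after "_")
"john_"[/(?<=john)_.*/]    # => "_"
"john_"[/john(_.*)/, 1]    # => "_"
"mary_1"[/(?<=john)_.+/]   # => nil  ("_" is not preceded by "john")
```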
The string preceding the underscore can be anything and the underscore is followed by one or more characters
str = "john_mary_tom_julie"
We may consider two cases.
The string returned is to begin with the first underscore
In this case we could write:
str[/_.+/]
#=> "_mary_tom_julie"
This works because the regex engine scans from left to right, so the match begins at the first underscore encountered; .+ is greedy by default, so it then consumes the rest of the string.
The string returned is to begin with the last underscore
Here we could write:
str[/_[^_]+\z/]
#=> "_julie"
This regex matches an underscore followed by one or more characters that are not underscores, followed by the end-of-string anchor (\z).
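The two cases can be compared side by side. The sketch below also shows a lazy quantifier, a greedy prefix that pushes the match to the last underscore, and String#rpartition as a non-regex alternative; these variants are additions for illustration, not part of the answer above.

```ruby
str = "john_mary_tom_julie"

str[/_.+/]           # => "_mary_tom_julie"  (leftmost "_", greedy .+ runs to the end)
str[/_.+?/]          # => "_m"               (lazy .+? stops after one character)
str[/.*(_.+)/, 1]    # => "_julie"           (greedy .* gives back only as far as the last "_")
str.rpartition("_")  # => ["john_mary_tom", "_", "julie"]
```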
Aside: the method String#[]
[] may seem an odd name for a method but it is a method nevertheless, so it can be invoked in the conventional way:
str.[](/john(_.+)/, 1)
#=> "_1"
The expression str[/john(_.+)/, 1] is an example (of which there are many in Ruby) of syntactic sugar. When written str[...] Ruby converts it to the conventional expression for methods before evaluating it.
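As a further sketch, the capture argument of String#[] can also be the name of a named group, and String#slice is an alias for the same method. The group name "suffix" is made up for this example.

```ruby
str = "john_1"

# Named capture group; "suffix" is a hypothetical name chosen for illustration.
str[/john(?<suffix>_.+)/, "suffix"]   # => "_1"

# String#slice is an alias for String#[].
str.slice(/john(_.+)/, 1)             # => "_1"
```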

How do I specify in Ruby that I want to match a character provided that a sequence following that character does not match a pattern?

I'm using Ruby on Rails 5.1. In Ruby, how do I say that I want to match a string if the first character matches something but the sequence that follows does NOT match a pattern? That is, I want to match a number provided that the sequence that follows is not a character from an array I have followed by two other numbers. Here's my character array ...
2.4.0 :010 > TOKENS
=> [":", ".", "'"]
So this string would NOT match
3:00
since ":00" matches the pattern of a character from my array followed by two numbers. But this string
3400
would match. This string would also match
3:0
and this would match
3
since nothing follows the above. How do I write the appropriate regex in Ruby?
string =~ /\A\d+(?!:\d{2})/
This regular expression means:
\A anchors the match to the start of the string.
\d+ means "one or more digits".
(?!...) is a negative look-ahead. It checks that the pattern contained in the parentheses does not match, looking ahead from the current position.
:\d{2} means : followed by two digits.
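The pattern above only looks for ":". A minimal sketch that covers every entry of the question's TOKENS array, assuming the intent is "one leading digit, not followed by a token and two digits" (the constant name LEADING_DIGIT and the local variable token are made up):

```ruby
TOKENS = [":", ".", "'"].freeze

# Regexp.union escapes each token, so "." is matched literally.
token = Regexp.union(TOKENS)

# One leading digit, not followed by a token plus two digits.
LEADING_DIGIT = /\A\d(?!#{token}\d{2})/

LEADING_DIGIT.match?("3:00")  # => false
LEADING_DIGIT.match?("3.00")  # => false
LEADING_DIGIT.match?("3400")  # => true
LEADING_DIGIT.match?("3:0")   # => true
LEADING_DIGIT.match?("3")     # => true

# With \d+ the engine can backtrack, so this still matches on the leading "3".
/\A\d+(?!:\d{2})/.match?("34:00")  # => true
```

The sketch keeps to the question's wording and tests only the first character; whether "34:00" should be accepted depends on requirements the question does not state.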
Consideration should be given to testing the first character and the remaining characters separately.
def match_it?(str, first_char_regex, no_match_regex)
  str[0].match?(first_char_regex) && !str[1..-1].match?(no_match_regex)
end
match_it?("0:00", /0/, /\A[:. ]cat\z/) #=> true
match_it?("0:00", /\d/, /\A[:. ]\d+\z/) #=> false
match_it?("0:00", /[[:alpha:]]/, /\A[:. ]\d+\z/) #=> false
I believe this reads well and it simplifies testing when compared to methods that employ a single regular expression.

Regex to find strings with only letters or numbers or both

I am searching for strings with only letters or numbers or both. How could I write a regex for that?
You can use the following regex to check whether the string contains only letters and/or numbers
^[a-zA-Z0-9]+$
Explanation
^: Start of the line
[]: Character class
a-zA-Z: Matches any ASCII letter
0-9: Matches any digit
+: Matches the preceding character class one or more times
$: End of the line
"abc&#*(2743438" !~ /[^a-z0-9]/i # => false
"abc2743438" !~ /[^a-z0-9]/i # => true
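This example avoids the line anchors ^ and $. In Ruby they match at the start and end of each line rather than of the whole string, which may present a security risk when validating input, so it's better to use \A and \z. Rails format validations reject ^ and $ unless you add the :multiline => true option.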
Only letters and numbers:
/\A[a-zA-Z0-9]+\z/
Or if you want to allow - and _ chars also:
/\A[a-zA-Z0-9_\-]+\z/
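A short sketch of the difference between the line anchors and the string anchors, using a made-up two-line input, plus one edge case where the negated-class check and the + patterns disagree:

```ruby
input = "abc123\n<script>"

# ^ and $ match at line boundaries, so the first line alone satisfies the pattern.
input.match?(/^[a-zA-Z0-9]+$/)     # => true

# \A and \z anchor to the whole string, so the second line causes a rejection.
input.match?(/\A[a-zA-Z0-9]+\z/)   # => false

# The negated-class check accepts the empty string; the + versions do not.
"" !~ /[^a-z0-9]/i                 # => true
"".match?(/\A[a-zA-Z0-9]+\z/)      # => false
```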

Ruby Regular expression match

Why does this regexp
(?<!\S)[^\s]*[aeiou][^\s]*(?<=\d)(?!\S)
match test123 but not 123test?
I want to match a word that must contain a vowel and a digit. As I am new to this, I don't understand all the methods completely; maybe that's causing the problem.
I want to match a word that must contain a vowel and a digit
(?<!\S)\S*(?:[aeiou]\S*\d|\d\S*[aeiou])\S*(?!\S)
This part of your regex,
(?<=\d)(?!\S)
asserts that the preceding character is a digit and that the next character is not a non-space character; in other words, the word must end with a digit. In test123 the final 3 satisfies this, so your regex matches. In 123test every digit is followed by a non-space character, so the match fails. Your regex also requires a vowel to appear somewhere before that final digit, which is a second reason 123test fails.
(?<!\S)[^\s]*[aeiou][^\s]*(?<=\d)(?!\S)
                          ^^^^^^^
Because of the lookbehind, the regex engine requires a digit immediately before the end of the word, and 123test does not end in one.
For your needs you can simply use
\b(?=[a-zA-Z0-9]*[0-9])(?=[a-zA-Z0-9]*[aeiou])[a-zA-Z0-9]+\b
See demo.
https://www.regex101.com/r/fJ6cR4/27
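In Ruby, the two patterns can be compared with String#scan. This is a small sketch over a made-up sample sentence; the variable names are chosen for illustration.

```ruby
words = "test123 123test test 123"

original     = /(?<!\S)[^\s]*[aeiou][^\s]*(?<=\d)(?!\S)/
either_order = /\b(?=[a-zA-Z0-9]*[0-9])(?=[a-zA-Z0-9]*[aeiou])[a-zA-Z0-9]+\b/

words.scan(original)      # => ["test123"]
words.scan(either_order)  # => ["test123", "123test"]
```

Each lookahead in the second pattern checks one requirement from the start of the word, so the order of the vowel and the digit no longer matters.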
Alternative way of doing this:
def vowel_and_num?(str)
  !(str.scan(/[aeiou]/).empty? || str.scan(/[0-9]/).empty?)
end
vowel_and_num?("test123")
# => true
vowel_and_num?("123test")
# => true
vowel_and_num?("test")
# => false
vowel_and_num?("123")
# => false

Regex dismatch repeat special character

I'm writing a regex to check a slug.
Currently my regex is: /^[^-][a-z\-].*[^-]+$/
Here's what I'm checking right now:
my-awesome-project => valid
-my-awesome-project => invalid
my-awesome-project- => invalid
Now what I want is to check whether the dash is repeated:
my-awesome-project => should be valid
my-awesome--project => should not be valid
my----awesome-project => should not be valid
Can I do that with a regex?
Thank you,
I think this regexp should work:
/^[a-z]+(-[a-z]+)*$/
What this does: ^[a-z]+ matches if the string begins with at least one lower-case letter. After that, (-[a-z]+)*$ allows zero or more occurrences of a dash followed by, again, at least one lower-case letter.
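A minimal sketch of the same idea checked against the question's examples. It swaps in \A and \z so that a multi-line string cannot pass, and uses a non-capturing group; the constant name SLUG is made up.

```ruby
# Lower-case words separated by single hyphens, anchored to the whole string.
SLUG = /\A[a-z]+(?:-[a-z]+)*\z/

SLUG.match?("my-awesome-project")     # => true
SLUG.match?("-my-awesome-project")    # => false
SLUG.match?("my-awesome-project-")    # => false
SLUG.match?("my-awesome--project")    # => false
SLUG.match?("my----awesome-project")  # => false
```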
As I understand, the string is valid unless it:
contains a character other than a lower-case letter or hyphen,
begins with a hyphen,
ends with a hyphen, or
contains two (or more) hyphens in a row.
If that's the case, it's easiest to check whether it is invalid:
R = /
    [^a-z-] # match one character other than a lower-case letter or hyphen
    |       # or
    ^-      # match a hyphen as the first character
    |       # or
    -$      # match a hyphen as the last character
    |       # or
    --      # match two hyphens
    /x
def valid?(str)
  str !~ R
end
valid? 'my-awesome-project' #=> true
valid? '-my-awesome-project' #=> false
valid? 'my-awesome-project-' #=> false
valid? 'my-awesome--project' #=> false
valid? 'my----awesome-project' #=> false
The regex below may be helpful.
[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*
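As written, that pattern has no anchors, so it succeeds as soon as any part of the string fits. A short sketch of the difference, with \A and \z added as an assumption about what was intended (the variable names are made up):

```ruby
loose  = /[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*/
strict = /\A[a-zA-Z0-9]+(?:-[a-zA-Z0-9]+)*\z/

"my--project" =~ loose    # => 0   (matches just "my")
"my--project" =~ strict   # => nil
"my-project"  =~ strict   # => 0
```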
