Regex condition on first and last characters - ruby

How can I write a regex to match a string that does not start or end with a white space character? A matching string can have any character in the middle, and importantly, a single-character string should match.
My attempt was:
/\A\S.*\S\z/
but this will not match a single character.

This is one of the cases where you should not attempt to build a regex that matches something, but rather a regex that matches the complement of something, and use the regex negatively.
re = /\A\s|\s\z/
re !~ " " # => false
re !~ "" # => true
re !~ "sss" # => true
re !~ "s ss" # => true
re !~ " s ss" # => false
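As a minimal sketch, the negated regex can be wrapped in a predicate. The name no_edge_whitespace? is hypothetical and does not come from the answer. The predicate accepts a single character, which is the case the question's /\A\S.*\S\z/ missed. It also accepts the empty string, because there is no edge whitespace to find. Add a str.empty? check if empty input should be rejected.

```ruby
# Matches a whitespace character at the very start OR at the very end of the string.
EDGE_WHITESPACE = /\A\s|\s\z/

# Hypothetical helper: true when +str+ neither starts nor ends with whitespace.
def no_edge_whitespace?(str)
  !EDGE_WHITESPACE.match?(str)
end

p no_edge_whitespace?("a")      # => true  (single character)
p no_edge_whitespace?("a b c")  # => true
p no_edge_whitespace?(" a")     # => false
p no_edge_whitespace?("a\n")    # => false (a trailing newline is whitespace too)
```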

is_ok = lambda do |str|
  a, z = str.chars.first, str.chars.last
  # true unless the first or last character is a space, newline or tab
  "#{a}#{z}" =~ / |\n|\t/ ? false : true
end
# A more compact version:
is_ok = lambda { |str| [0, -1].map { |i| str.chars[i] }.join =~ / |\n|\t/ ? false : true }
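The lambdas above treat only a space, a newline or a tab as whitespace. The variant below indexes the first and last characters directly and tests each one with \s. The name edge_ok is hypothetical. Because it uses \s, it also rejects strings that begin or end with a carriage return or a form feed. Like the lambdas above, it returns true for an empty string.

```ruby
edge_ok = lambda do |str|
  # str[0] and str[-1] are nil for an empty string, so skip the check.
  return true if str.empty?

  # \s matches ASCII whitespace such as space, \t, \n, \r and \f.
  !str[0].match?(/\s/) && !str[-1].match?(/\s/)
end

p edge_ok.call("a")    # => true
p edge_ok.call("a\r")  # => false
```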

Use this regex:
\A\S+(?:\s+\S+)*\z
I'm assuming that strings can span multiple lines, hence \A and \z rather than ^ and $ (in Ruby, ^ and $ match at the start and end of every line). I use \z rather than \Z because \Z also matches just before a trailing newline, which would let a string such as "a\n" through.
In Ruby, something like:
if subject =~ /\A\S+(?:\s+\S+)*\z/
  match = $&
end
Explanation
The \A anchor asserts that we are at the beginning of the subject string
\S+ matches one or more non-whitespace characters, that is, anything except spaces, tabs, newlines, etc. Alternatively, if you want to allow newlines at the beginning but only want to exclude a space character, you can use [^ ]+ instead of \S+
(?:\s+\S+) matches one or more whitespace characters followed by one or more non-whitespace characters. Using \s+ instead of \s* accepts exactly the same strings, but it avoids heavy backtracking on long non-matching input.
The * quantifier repeats that group zero or more times
The \z anchor asserts that we are at the very end of the subject string
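Here is a quick way to try the pattern in Ruby, assuming the \z and \s+ form. STRICT and LOOSE are illustrative names. The sketch also shows why the choice of end anchor matters: \Z is allowed to match just before a final newline, so a string ending in "\n" can slip through.

```ruby
# Ends in \z (the true end of the string); \s+ separates runs of non-whitespace.
STRICT = /\A\S+(?:\s+\S+)*\z/
# Same pattern ending in \Z, which may also match just before a trailing "\n".
LOOSE  = /\A\S+(?:\s+\S+)*\Z/

["a", "a b\tc", " a", "a "].each do |s|
  puts "#{s.inspect}: #{s.match?(STRICT)}"
end

p "a\n".match?(LOOSE)   # => true  (the trailing newline slips through)
p "a\n".match?(STRICT)  # => false
```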

Use lookaheads, like this:
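\A(?=\S).*\S\z
This matches the start of the string and requires two things:
(1) the first character must be a non-whitespace character; this is what the lookahead checks.
(2) the last character must be a non-whitespace character; this is what the \S right before \z checks.
If the string may contain newlines, add the m flag (/\A(?=\S).*\S\z/m), because otherwise . does not match a newline. I use \z rather than \Z because \Z also matches before a trailing newline.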
Matches:
"a"
"a b"
"a b c d 1231 e"
Non matches:
" " (just a space)
" a" (leading space)
"b " (trailing space)
"" (empty string)
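All of the examples above can be checked in one go. This sketch uses \z plus the m flag, so that . also crosses newlines. Those two choices are assumptions about the intended behaviour for multi-line input, and EDGES is an illustrative name.

```ruby
# First character must be non-whitespace (lookahead); last character must be
# non-whitespace (\S right before \z); /m lets .* span newlines.
EDGES = /\A(?=\S).*\S\z/m

matches     = ["a", "a b", "a b c d 1231 e", "a\nb"]
non_matches = [" ", " a", "b ", ""]

matches.each     { |s| puts "#{s.inspect} => #{s.match?(EDGES)}" }
non_matches.each { |s| puts "#{s.inspect} => #{s.match?(EDGES)}" }
```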

Related

A way to execute a block only when a unique phrase is input

For example I have two commands in a program:
elsif ##content.downcase.include? "!roll"
randomRoll = rand(100)
puts("You rolled a #{add_comma_to_numbers(randomRoll.to_s)} (1-100)")
elsif ##content.downcase.include? "!roll1m"
randomRoll = rand(1000000)
puts("You rolled a #{add_comma_to_numbers(randomRoll.to_s)} (1-1m)")
Because "!roll1m" starts with "!roll", the "!roll" condition is also true when "!roll1m" is input. As a result, the "!roll" branch runs for both commands, but I only want the matching branch to execute.
Is there a method other than include? that I could use?
I have thought about simply swapping the two branches, but that would only be a temporary fix. Any suggestions would help.
You can do that as follows:
r = /!roll\b/
which reads, "match the string '!roll' followed by a word break".
'!roll'.match?(r)
#=> true
'I am on a !roll today'.match?(r)
#=> true
'!roll1m'.match?(r)
#=> false
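Applied to the question's commands, a word-boundary check makes the order of the branches irrelevant. In this sketch the "!roll" test deliberately comes first. roll_limit is a hypothetical helper that returns the upper bound of the roll, or nil when neither command appears.

```ruby
# Hypothetical helper: maps a message to the upper bound of the requested roll.
def roll_limit(content)
  text = content.downcase
  if text.match?(/!roll\b/)        # "!roll" not followed by a word character
    100
  elsif text.match?(/!roll1m\b/)   # checked second on purpose: order no longer matters
    1_000_000
  end
end

p roll_limit("!roll")      # => 100
p roll_limit("!roll1m")    # => 1000000
p roll_limit("!rolling")   # => nil
```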
If, in addition, we wish to match "tootsie !roll" but not "tootsie!roll", we could use either of the following regular expressions:
r = /(?<=\A|\W)!roll\b/
r = /(?<!\w)!roll\b/
(?<=\A|\W), a positive lookbehind ((?<=...)), requires the string "!roll" to be immediately preceded by the beginning of the string (\A) or (|) by a non-word character (\W).
(?<!\w), a negative lookbehind ((?<!...)), requires the string "!roll" to not be immediately preceded by a word character (\w).
\b requires that '!roll' be followed by a word break, meaning that '!roll' must be followed by a non-word character or be at the end of the string.
Both return the following:
"tootsie !roll".match?(r) #=> true
"tootsie!roll".match?(r) #=> false
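To confirm that the two lookbehind forms behave the same, here is a small check over the examples above plus a bare "!roll" and "!roll1m". The constant names are just for illustration. Each line prints the result of the positive form followed by the negative form, and the two agree for every input.

```ruby
# Positive lookbehind: "!roll" must follow the start of the string or a non-word char.
LOOKBEHIND_POS = /(?<=\A|\W)!roll\b/
# Negative lookbehind: "!roll" must not follow a word character.
LOOKBEHIND_NEG = /(?<!\w)!roll\b/

["!roll", "tootsie !roll", "tootsie!roll", "!roll1m"].each do |s|
  puts "#{s.inspect}: #{s.match?(LOOKBEHIND_POS)} #{s.match?(LOOKBEHIND_NEG)}"
end
```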
I initially expected I could use the regular expression
r = /\b!roll\b/
but, as I was reminded in the comments, the word-break anchor \b requires a word character on one side and a non-word character on the other side. This includes a word character before and nothing after (the end of the string), or a word character after and nothing before (the start of the string); see note 1.
For example,
r = /\b!roll\b/
"!roll".match?(r)
#=> false (\W ('!') after \b but no \w before '!')
"tootsie!roll".match?(r)
#=> true (\w before \b, \W ('!') after \b)
"tootsie !roll".match?(r)
#=> false (\W before and after \b)
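One way to see where \b can match is to insert a marker at every word boundary with gsub. This is only a visualisation aid, not part of the solution. Note that "!roll" has no boundary before the "!", and that "tootsie !roll" has a boundary before the space but not before the "!".

```ruby
# Insert "|" at each zero-width position where \b matches.
["!roll", "tootsie!roll", "tootsie !roll"].each do |s|
  puts s.gsub(/\b/, "|")
end
```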
1. This is not made clear in the documentation for Regexp (v2.7.0), which states, "\b - Matches word boundaries when outside brackets; backspace (0x08) when inside brackets".

Matching strings that contain a letter with the first character not being a number

How do I write a regular expression that has at least one letter, but the first character must not be a number? I tried this
str = "a"
str =~ /^[^\d][[:space:]]*[a-z]*/i
# => 0
str = "="
str =~ /^[^\d][[:space:]]*[a-z]*/i
# => 0
The "=" is matched even though it contains no letters. I expect "a" to match, while a string like "3abcde" should not match.
The [a-z]* and [[:space:]]* patterns can match an empty string, so they make no difference when you need to validate the input. Also, = is not a digit, so it is matched by the negated character class [^\d]. That class is a consuming pattern: it requires one character other than a digit at that position.
You may rely on a lookahead that will restrict the start of string position:
/\A(?!\d).*[a-z]/im
Or, a slightly faster, Unicode-friendly version:
/\A(?!\d)\P{L}*\p{L}/
Details:
\A - start of a string
(?!\d) - the first char cannot be a digit
\P{L}* - 0 or more (*) chars other than letters (second pattern), or
.* - any 0+ chars, including line breaks if the /m modifier is used (first pattern)
\p{L} - a letter (second pattern; [a-z] in the first pattern)
The m modifier enables the . to match line break chars in a Ruby regex.
Use [a-z] when you need to restrict the letters to those in ASCII table only. Also, \p{L} may be replaced with [[:alpha:]] and \P{L} with [^[:alpha:]].
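To compare the two patterns side by side, the sketch below runs them over the question's inputs, a multi-line string and an accented letter. ASCII_VERSION and UNICODE_VERSION are illustrative names. Only the \p{L} version accepts "é", because [a-z] is limited to ASCII even with the i flag.

```ruby
ASCII_VERSION   = /\A(?!\d).*[a-z]/im
UNICODE_VERSION = /\A(?!\d)\P{L}*\p{L}/

samples = ["a", "=", "3abcde", "*!\n?a>", "\u00e9"]  # "\u00e9" is "é"
samples.each do |s|
  puts "#{s.inspect}: ascii=#{s.match?(ASCII_VERSION)} unicode=#{s.match?(UNICODE_VERSION)}"
end
```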
If two regular expressions were permitted you could write:
def pass_de_test?(str)
  # first character is not a digit, and the string contains a letter somewhere
  str[0] !~ /\d/ && str =~ /[[:alpha:]]/
end
pass_de_test?("*!\n?a>") #=> 4 (truthy)
pass_de_test?("3!\n?a>") #=> false
If you want true or false returned, change the operative line to:
(str[0] !~ /\d/ && str =~ /[[:alpha:]]/) ? true : false
or
!!(str[0] !~ /\d/ && str =~ /[[:alpha:]]/)
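On Ruby 2.4 or later, String#match? returns true or false directly, so neither the ternary nor !! is needed. Here is a sketch under that assumption. passes_test? is a hypothetical name, and the sketch checks the first character with an \A-anchored regex instead of str[0].

```ruby
# Hypothetical boolean variant of pass_de_test?.
def passes_test?(str)
  # First character is not a digit, and at least one letter appears somewhere.
  !str.match?(/\A\d/) && str.match?(/[[:alpha:]]/)
end

p passes_test?("*!\n?a>")  # => true
p passes_test?("3!\n?a>")  # => false
p passes_test?("=")        # => false
```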

Capitalize the first character after a dash

So I've got a string that's an improperly formatted name. Let's say, "Jean-paul Bertaud-alain".
I want to use a regex in Ruby to find the first character after every dash and make it uppercase. So, in this case, I want to apply a method that would yield: "Jean-Paul Bertaud-Alain".
Any help?
String#gsub can take a block argument, so this is as simple as:
str = "Jean-paul Bertaud-alain"
str.gsub(/-[a-z]/) {|s| s.upcase }
# => "Jean-Paul Bertaud-Alain"
Or, more succinctly:
str.gsub(/-[a-z]/, &:upcase)
Note that the regular expression /-[a-z]/ will only match letters in the a-z range, so it won't match, for example, à. Before Ruby 2.4, String#upcase did not capitalize characters with diacritics anyway. Since Ruby 2.4 it performs Unicode case mapping, but capitalization remains language-dependent. For example, i is capitalized differently in Turkish than in English, which is why upcase accepts a :turkic option. Read this answer for more information: https://stackoverflow.com/a/4418681
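Assuming Ruby 2.4 or later, where String#upcase performs Unicode case mapping, you can swap [a-z] for the Unicode property \p{Ll} (lower-case letter). That extends the gsub to accented initials. The names in the example are made up.

```ruby
name = "Marie-\u00e9lise Bertaud-\u00e0lain"   # "Marie-élise Bertaud-àlain"
# \p{Ll} matches any lower-case letter, not only a-z.
p name.gsub(/-\p{Ll}/, &:upcase)
#=> "Marie-Élise Bertaud-Àlain"
```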
"Jean-paul Bertaud-alain".gsub(/(?<=-)\w/, &:upcase)
# => "Jean-Paul Bertaud-Alain"
I suggest making the test more demanding by requiring the letter to be upcased to satisfy two conditions:
1) it is preceded by a capitalized word followed by a hyphen, and
2) it is followed by lowercase letters and then a word break.
r = /
\b # Match a word break
[A-Z] # Match an upper-case letter
[a-z]+ # Match >= 1 lower-case letters
\- # Match hyphen
\K # Forget everything matched so far
[a-z] # Match a lower-case letter
(?= # Begin a positive lookahead
[a-z]+ # Match >= 1 lower-case letters
\b # Match a word break
) # End positive lookahead
/x # Free-spacing regex definition mode
"Jean-paul Bertaud-alain".gsub(r) { |s| s.upcase }
#=> "Jean-Paul Bertaud-Alain"
"Jean de-paul Bertaud-alainM".gsub(r) { |s| s.upcase }
#=> "Jean de-paul Bertaud-alainM"

Regex dismatch repeat special character

I'm doing a regex to check a slug.
Currently my regex is: /^[^-][a-z\-].*[^-]+$/
Here's what I'm checking right now:
my-awesome-project => valid
-my-awesome-project => invalid
my-awesome-project- => invalid
Now what I want is to check whether the dash is repeated:
my-awesome-project => should be valid
my-awesome--project => should not be valid
my----awesome-project => should not be valid
Can I do that with a regex?
Thank you,
I think this regexp should work:
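/\A[a-z]+(-[a-z]+)*\z/
What this does: \A[a-z]+ matches if the string begins with at least one lower-case letter. After that, (-[a-z]+)* allows zero or more occurrences of a dash followed by at least one letter, and \z requires that nothing else follows. I use \A and \z rather than ^ and $ because in Ruby ^ and $ match at the start and end of any line, so "my-project\n--" would otherwise pass.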
As I understand, the string is valid unless it:
contains a character other than a lower-case letter or hyphen,
begins with a hyphen,
ends with a hyphen, or
contains two (or more) hyphens in a row.
If that's the case, it's easiest to check whether it is invalid:
R = /
[^a-z-] # match one character other than a lower-case letter or hyphen
| # or
^- # match a hyphen as the first character
| # or
-$ # match a hyphen as the last character
| # or
-- # match two hyphens
/x
def valid?(str)
str !~ R
end
valid? 'my-awesome-project' #=> true
valid? '-my-awesome-project' #=> false
valid? 'my-awesome-project-' #=> false
valid? 'my-awesome--project' #=> false
valid? 'my----awesome-project' #=> false
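For comparison with valid?, the positive pattern from the first answer can be anchored with \A and \z and run over the same examples. SLUG and slug? are illustrative names. One difference worth knowing: valid?('') returns true because R finds nothing to reject, while this pattern requires at least one letter.

```ruby
# Positive form: lower-case letters, optionally followed by groups made of a
# single hyphen and more letters. \A and \z anchor the whole string.
SLUG = /\A[a-z]+(?:-[a-z]+)*\z/

def slug?(str)
  str.match?(SLUG)
end

%w[my-awesome-project -my-awesome-project my-awesome-project-
   my-awesome--project my----awesome-project].each do |s|
  puts "#{s}: #{slug?(s)}"
end
p slug?("")  # => false
```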
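The regex below may be helpful. It is anchored so that the whole string must match, and note that it also allows upper-case letters and digits:
/\A[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*\z/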

Lookbehind with the ^ character in a Ruby regex

Why, in Ruby, do the first two regexes fail to match while the third matches?
str = 'ID: 4'
regex1 = /^(?<=ID: )\d+/
regex2 = /\A(?<=ID: )\d+/
regex3 = /(?<=ID: )\d+/
str.match(regex1) # => nil
str.match(regex2) #=> nil
str.match(regex3) #=> #<MatchData "4">
The only difference is the ^ or \A characters, which match the beginning of a line and beginning of the string, respectively. It seems both should be matched by str.
The look-behind pattern (?<=ID: ) matches a position in the string that is preceded by «ID: ».
The anchors ^ and \A match a position at the beginning of the line or string.
So the pattern \A(?<=ID: ) asks that both match together, i.e. that the beginning of the string is preceded by «ID: ». Not gonna happen!
Both of these would work fine if you put the anchor inside of the lookbehind:
regex1 = /(?<=^ID: )\d+/
regex2 = /(?<=\AID: )\d+/
If the anchors are outside of the lookbehind then you are saying "from the start of the string, are the previous characters ID:". This will always fail because there won't be any characters before the start of the string.
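Here is a short check of the two corrected patterns. It also includes a two-line string, an assumed example, that shows the remaining difference between putting ^ or \A inside the lookbehind.

```ruby
str = "ID: 4"
p str[/(?<=^ID: )\d+/]    # => "4"
p str[/(?<=\AID: )\d+/]   # => "4"
p str[/^(?<=ID: )\d+/]    # => nil  (anchor outside the lookbehind)

multi = "name: x\nID: 7"
p multi[/(?<=^ID: )\d+/]  # => "7"  (^ also matches right after a newline)
p multi[/(?<=\AID: )\d+/] # => nil  (\A only matches at the very start)
```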
Look-ahead and look-behind are non-capturing/zero-length, so the first two expressions don't match.
The first expression, for instance, can never match anything. It requires \d+ to start at the beginning of a line and also to be immediately preceded by «ID: ». However, the beginning of a line is preceded either by nothing or by a newline, never by a space.
In the third expression, the lookbehind can occur anywhere and specifically occurs at the zero-width position before the 4. You can see that only the 4 is matched.
With ^ or \A, the zero-width position at the beginning of the string must satisfy the lookbehind, which is impossible.
In regex1, which is /^(?<=ID: )\d+/, there has to be a beginning of a line that is preceded by ID:. The string in question has no such point.
In regex2, which is /\A(?<=ID: )\d+/, there has to be a beginning of the string that is preceded by ID:. No string has such a point.
In regex3, which is /(?<=ID: )\d+/, there has to be a point in the string that is preceded by ID: and followed by \d+. The string has such a point.
Look-behind doesn't change the position of the match.
/(?<=ID: )\d+/ is actually matched at the digit:
ID: 4
    ^
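MatchData makes this visible. The match begins at offset 4, and the text examined by the lookbehind ends up in pre_match rather than in the match itself.

```ruby
m = "ID: 4".match(/(?<=ID: )\d+/)
p m[0]         # => "4"
p m.begin(0)   # => 4     (the match starts at the digit, not at "I")
p m.pre_match  # => "ID: "
```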
