Regex: don't match if string contains whitespace - ruby

I can't seem to figure out the regex pattern for matching a string only if it doesn't contain whitespace. For example,
"this has whitespace".match(/some_pattern/)
should return nil but
"nowhitespace".match(/some_pattern/)
should return the MatchData with the entire string. Can anyone suggest a solution for the above?

In Ruby I think it would be
/^\S*$/
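This means "start of line, any number of non-whitespace characters, end of line". Note that in Ruby ^ and $ anchor to lines rather than to the whole string, so use /\A\S*\z/ if the input may contain newlines.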

You could always search for whitespace, and then negate the result:
"str".match(/\s/).nil?

>> "this has whitespace".match(/^\S*$/)
=> nil
>> "nospaces".match(/^\S*$/)
=> #<MatchData "nospaces">
^ = beginning of line (\A = beginning of string)
\S = non-whitespace character, * = 0 or more
$ = end of line (\z = end of string)

Not sure you can do it in one pattern, but you can do something like:
"string".match(/pattern/) unless "string".match(/\s/)

"nowhitespace".match(/^[^\s]*$/)

You want:
/^\S*$/
That says "match the beginning of the line, then zero or more non-whitespace characters, then the end of the line." The convention for pre-defined character classes is that a lowercase letter refers to a class, while an uppercase letter refers to its negation. Thus, \s refers to whitespace characters, while \S refers to non-whitespace.

str.match(/^\S*some_pattern\S*$/)
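
One caveat about the patterns above: for single-line input they behave as described, but with embedded newlines ^ and $ can match a single line of the string. Here is a minimal sketch of the difference, assuming Ruby 2.4+ for String#match?; the helper name no_whitespace? is made up for illustration.

```ruby
# Hypothetical helper, not taken from any of the answers above.
def no_whitespace?(str)
  # \A and \z anchor to the whole string; ^ and $ anchor to each line in Ruby.
  str.match?(/\A\S*\z/)
end

puts no_whitespace?("nowhitespace")        # true
puts no_whitespace?("this has whitespace") # false

# A multi-line input shows where the two anchor styles disagree.
multi = "has space\nnospace"
line_anchored   = multi.match(/^\S*$/)    # matches the second line only
string_anchored = multi.match(/\A\S*\z/)  # no match, the string contains whitespace

p line_anchored[0]  # "nospace"
p string_anchored   # nil
```

Note that \S* also accepts the empty string; use \S+ if an empty string should not count as a match.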

Related

In ruby, how to find the index of first non-whitespace character of given string

In ruby, how to find the index of first non-whitespace (non-tab, non-space, non-newline) character of given string.
For example, given string "\t\nstring", the index of first non-tab, non-space, non-newline character will be 2 which is 's'.
With this notation:
/\S/ =~ "\t\nstring"
# => 2
Try this one. s is your string
s.index(s.lstrip[0])
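The two approaches above agree on ordinary input, but s.index(s.lstrip[0]) breaks when the string is empty or all whitespace, because lstrip[0] is nil there. A small sketch using String#index with a regex, which returns nil in that case; the method name first_non_whitespace_index is only for illustration.

```ruby
# Hypothetical helper name; String#index accepts a Regexp directly.
def first_non_whitespace_index(str)
  str.index(/\S/)
end

p first_non_whitespace_index("\t\nstring")  # 2
p first_non_whitespace_index("string")      # 0
p first_non_whitespace_index(" \t\n")       # nil
```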

Why does /[<>]/ not return both angle brackets with String#match?

I expect this example to match the two characters < and >:
a = "<1acf457f477b41d4a363e89a1d0a6e57#Open-Xchange>"
a.match /[<>]/
# => #<MatchData "<">
It matches only the first character. Why?
#match returns only the first match, as a MatchData object, as you have seen; #scan returns all matches.
>> a="<1acf457f477b41d4a363e89a1d0a6e57#Open-Xchange>"
=> "<1acf457f477b41d4a363e89a1d0a6e57#Open-Xchange>"
>> a.scan /[<>]/
=> ["<", ">"]
Problem
You are misunderstanding your expression. /[<>]/ means:
Match a single character from the character class, which may be either < or >.
Ruby is correctly giving you exactly what you've asked for in your pattern.
Solution
If you're expecting the entire string between the two characters, you need a different pattern. For example:
"<1acf457f477b41d4a363e89a1d0a6e57#Open-Xchange>".match /<.*?>/
#=> #<MatchData "<1acf457f477b41d4a363e89a1d0a6e57#Open-Xchange>">
Alternatively, if you just want to match all the instances of < or > in your string, then you should use String#scan with a character class or alternation. In this particular case, the results will be identical either way. For example:
"<1acf457f477b41d4a363e89a1d0a6e57#Open-Xchange>".scan /<|>/
#=> ["<", ">"]
"<1acf457f477b41d4a363e89a1d0a6e57#Open-Xchange>".scan /[<>]/
#=> ["<", ">"]
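To put the options above side by side, here is a short sketch using the string from the question. The capture-group lookup via String#[] is my own addition, for the case where only the text between the brackets is wanted.

```ruby
a = "<1acf457f477b41d4a363e89a1d0a6e57#Open-Xchange>"

first = a.match(/[<>]/)  # MatchData for the first hit only
all   = a.scan(/[<>]/)   # every hit, as an array of strings

# String#[] with a regex and a group index returns just the captured text.
inner = a[/<(.*?)>/, 1]

puts first[0]  # <
p all          # ["<", ">"]
puts inner     # 1acf457f477b41d4a363e89a1d0a6e57#Open-Xchange
```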

Ruby Regular expression not matching properly

I am trying to create a regex to find words that contain any vowel.
So far I have tried this:
/(.*?\S[aeiou].*?[\s|\.])/i
but I have not used regexes much, so it's not working properly.
For example, if I input "test is 1234 and sky fly test1234"
it should match test, is, and, test1234, but it shows
test, is, 1234 and
If I put in something else, I get different output.
Alternatively you can also do something like:
"test is 1234 and sky fly test1234".split.find_all { |a| a =~ /[aeiou]/ }
# => ["test", "is", "and", "test1234"]
You could use the below regex.
\S*[aeiou]\S*
\S* matches zero or more non-space characters.
or
\w*[aeiou]\w*
It will solve:
\b\w*[aeiou]+\w*\b
https://www.debuggex.com/r/O-fU394iC5ErcSs7
or you can substitute \w by \S
\b\S*[aeiou]+\S*\b
https://www.debuggex.com/r/RNE6Y6q1q5yPJbe-
\b - a word boundary
\w - same as [_a-zA-Z0-9]
\S - a non-whitespace character
Try this:
\b\w*[aeiou]\w*\b
\b denotes a word boundary, so this regexp matches a word boundary, zero or more word characters, a vowel, zero or more word characters, and another word boundary.
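For completeness, a minimal sketch of how the \b\w*[aeiou]\w*\b pattern could be applied to the sentence from the question with String#scan. The /i flag is carried over from the asker's original attempt, and the variable names are made up.

```ruby
text = "test is 1234 and sky fly test1234"

# scan returns every non-overlapping match; "y" is not treated as a vowel here.
words_with_vowels = text.scan(/\b\w*[aeiou]\w*\b/i)

p words_with_vowels  # ["test", "is", "and", "test1234"]
```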

Regex dismatch repeat special character

I'm doing a regex to check a slug.
Currently my regex is: /^[^-][a-z\-].*[^-]+$/
Here's what I'm checking right now:
my-awesome-project => valid
-my-awesome-project => invalid
my-awesome-project- => invalid
Now what I want is to check whether the dash is repeated:
my-awesome-project => should be valid
my-awesome--project => should not be valid
my----awesome-project => should not be valid
Can I do that with a regex?
Thank you,
I think this regexp should work:
/^[a-z]+(-[a-z]+)*$/
What this does: ^[a-z]+ matches if the string begins with at least one lower-case letter. After that, (-[a-z]+)*$ allows zero or more occurrences of a dash followed by at least one more letter.
See on Rubular.
As I understand, the string is valid unless it:
contains a character other than a lower-case letter or hyphen,
begins with a hyphen,
ends with a hyphen, or
contains two (or more) hyphens in a row.
If that's the case, it's easiest to check whether it is invalid:
R = /
  [^a-z-] # match one character other than a lower-case letter or hyphen
  |       # or
  ^-      # match a hyphen as the first character
  |       # or
  -$      # match a hyphen as the last character
  |       # or
  --      # match two hyphens
/x
def valid?(str)
  str !~ R
end
valid? 'my-awesome-project' #=> true
valid? '-my-awesome-project' #=> false
valid? 'my-awesome-project-' #=> false
valid? 'my-awesome--project' #=> false
valid? 'my----awesome-project' #=> false
Below regex may be helpful.
[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*
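Note that a pattern without anchors can match just part of a string, so an invalid slug may still produce a match. Below is a sketch of the positive check from the first answer, anchored with \A and \z so the whole string has to fit. It assumes lower-case letters only, as in the question, and the names SLUG and valid_slug? are made up for illustration.

```ruby
# Hypothetical names; (?:...) is a non-capturing group.
SLUG = /\A[a-z]+(?:-[a-z]+)*\z/

def valid_slug?(str)
  str.match?(SLUG)
end

p valid_slug?('my-awesome-project')     # true
p valid_slug?('-my-awesome-project')    # false
p valid_slug?('my-awesome-project-')    # false
p valid_slug?('my-awesome--project')    # false
p valid_slug?('my----awesome-project')  # false
```

Add 0-9 to both character classes if digits should be allowed in a slug.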

Lookbehind with the ^ character in a Ruby regex

Why, in Ruby, do the first two regexes fail to match while the third matches?
str = 'ID: 4'
regex1 = /^(?<=ID: )\d+/
regex2 = /\A(?<=ID: )\d+/
regex3 = /(?<=ID: )\d+/
str.match(regex1) # => nil
str.match(regex2) #=> nil
str.match(regex3) #=> #<MatchData "4">
The only difference is the ^ or \A characters, which match the beginning of a line and beginning of the string, respectively. It seems both should be matched by str.
The look-behind pattern (?<=ID: ) matches a position in the string that is preceded by «ID: ».
The anchors ^ and \A match a position at the beginning of the line or string.
So the pattern \A(?<=ID: ) asks that both match together, i.e. that the beginning of the string is preceded by «ID: ». Not gonna happen!
Both of these would work fine if you put the anchor inside of the lookbehind:
regex1 = /(?<=^ID: )\d+/
regex2 = /(?<=\AID: )\d+/
If the anchors are outside of the lookbehind then you are saying "from the start of the string, are the previous characters ID:". This will always fail because there won't be any characters before the start of the string.
Look-ahead and look-behind are non-capturing/zero-length, so the first two expressions don't match.
The first expression, for instance, requires \d+ to start at the beginning of a line and, at the same time, to be immediately preceded by "ID: " (which is not possible, since nothing on the same line can come before ^).
In the third expression, the lookbehind can occur anywhere and specifically occurs in the zero-width space before the 4. You can see that only the 4 is matched.
With ^ or \A, the zero-width space at the beginning of the string must match the lookbehind, which is impossible.
In regex1, which is /^(?<=ID: )\d+/, there has to be a beginning of a line that is preceded by ID:. The string in question does not have such a point.
In regex2, which is /\A(?<=ID: )\d+/, there has to be a beginning of a string that is preceded by ID:. No string has such a point.
In regex3, which is /(?<=ID: )\d+/, there has to be a point in the string that is preceded by ID: and followed by \d+. There is such a point in the string.
Look-behind doesn't change position of the match.
/(?<=ID: )\d+/ is actually matched at the digit:
ID: 4
    ^
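To tie the explanations together, here is a short sketch comparing the anchor placed outside and inside the lookbehind on the string from the question. The capture-group version at the end is an alternative I'm adding that avoids lookbehind entirely; the variable names are made up.

```ruby
str = 'ID: 4'

# Anchor outside the lookbehind: the start of the string would need "ID: " before it.
outside = str.match(/\A(?<=ID: )\d+/)

# Anchor inside the lookbehind: a position preceded by "ID: " at the start of the string.
inside = str.match(/(?<=\AID: )\d+/)

# Alternative without lookbehind: match the prefix and capture only the digits.
captured = str[/\AID: (\d+)/, 1]

p outside    # nil
p inside[0]  # "4"
p captured   # "4"
```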
