Regex for name in Ruby

I know this question has been asked a lot but I need a RegEx for a name validator.
The only requirements: letters are okay; no numbers; no special characters other than up to 2 spaces, which cannot be at the beginning or end; "-" and "`" are also allowed. Everything else would be invalid.
All the other answers seem to ask for a lot more and seem to get too complicated.
Currently I am using
/^([^\d\W]|[-])*$/
But this fails when the name contains a space.
Sample data:
Pass:
Susan Johnson,
Stephanie Le'Sean,
John Pierre'-Frank
Fail:
Ricky2Good,
Jean,stewie,
Mike#dude,
Jim. McNeil

I've assumed that for a string to be valid, it may contain only uppercase and lowercase letters, apostrophes, dashes and at most two spaces, provided the spaces are not at the beginning or end of the string.
STR = "-a-z'"
r = /
\A # match beginning of string
(?: # begin non-capture group
[#{STR}]+ # match 1+ letters, "-" or "'"
| # or
[#{STR}]+\s[#{STR}]*\s?[#{STR}]+
# match 1+ letters, "-" or "'", space, 0+ letters, "-" or "'",
# optional space, 1+ letters, "-" or "'"
) # end non-capture group
\z # match end of string
/ix # case-indifferent and free-spacing regex definition modes
#=> /
\A # match beginning of string
(?: # begin non-capture group
[-a-z']+ # match 1+ letters, "-" or "'"
| # or
[-a-z']+\s[-a-z']*\s?[-a-z']+
# match 1+ letters, "-" or "'", space, 0+ letters, "-" or "'",
# optional space, 1+ letters, "-" or "'"
) # end non-capture group
\z # match end of string
/ix
If I did not use free-spacing mode to define the regex, it would look like this:
r = /\A(?:[-a-z']+|[-a-z']+\s[-a-z']*\s?[-a-z']+)\z/i
"a B-' v" =~ r #=> 0
"aB-'v" =~ r #=> 0
"aB-'1v" =~ r #=> nil
"a B-'1 v" =~ r #=> nil
" a B-1v" =~ r #=> nil
If you wish to return true or false, rather than a truthy value 0 or a falsy value nil, you could write, for example:
("a B-' v" =~ r) ? true : false #=> true
or (the "trick")
!!("a B-' v" =~ r) #=> true
The latter works because it is the same as:
!(!("a B-' v" =~ r))
#=> !(!(0)) => !(false) => true
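On Ruby 2.4 and later, Regexp#match? (and String#match?) returns true or false directly and, unlike =~, does not set $~. A minimal sketch, repeating the one-line form of r:

```ruby
# Same pattern as the one-line version of r above.
r = /\A(?:[-a-z']+|[-a-z']+\s[-a-z']*\s?[-a-z']+)\z/i

# Regexp#match? returns a real boolean and leaves $~ untouched (Ruby 2.4+).
p r.match?("a B-' v")  # prints true
p r.match?("aB-'1v")   # prints false
```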
The question asks for a regex to validate names. Using a regex may be the best approach, but it's not the only one. If the question is really how to validate names, by regex or otherwise, it should be stated in a way that doesn't stipulate a particular approach. Here's one way to validate without using a regex.
GOOD_CHARS = ('a'..'z').to_a.join << "'-"
#=> "abcdefghijklmnopqrstuvwxyz'-"
def validate(str)
return false if str.empty? || (str[0]==' ' || str[-1]==' ')
nbr_spaces = str.count(' ')
return false if nbr_spaces > 2
str.downcase.count(GOOD_CHARS) + nbr_spaces == str.size
end
validate "a B-' v" #=> true
validate "aB-'v" #=> true
validate "aB-`1v" #=> false
validate "a B-'1 v" #=> false
validate " a B-'1v" #=> false
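As a sanity check, the Pass and Fail samples from the question can be run through both approaches. The sketch below repeats r (one-line form) and validate so that it runs on its own:

```ruby
r = /\A(?:[-a-z']+|[-a-z']+\s[-a-z']*\s?[-a-z']+)\z/i

GOOD_CHARS = ('a'..'z').to_a.join << "'-"

def validate(str)
  return false if str.empty? || str[0] == ' ' || str[-1] == ' '
  nbr_spaces = str.count(' ')
  return false if nbr_spaces > 2
  str.downcase.count(GOOD_CHARS) + nbr_spaces == str.size
end

# Sample data copied from the question
passes = ["Susan Johnson", "Stephanie Le'Sean", "John Pierre'-Frank"]
fails  = ["Ricky2Good", "Jean,stewie", "Mike#dude", "Jim. McNeil"]

(passes + fails).each do |s|
  puts "#{s.inspect}: regex=#{r.match?(s)} validate=#{validate(s)}"
end
```

Both approaches accept the three Pass names and reject the four Fail names.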

The following regex should accept letters, dashes and backticks, with at most one space between two words, and reject numbers and other special characters:
/^[a-zA-Z\-\`]++(?: [a-zA-Z\-\`]++)?$/
Hope it helps!
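A caveat on this pattern, offered as a suggestion rather than as part of the answer: in Ruby, ^ and $ match at line boundaries, not at the ends of the whole string, so input containing a newline can slip through. Anchoring with \A and \z avoids that. (The pattern also allows only a single space and no apostrophe, unlike some of the question's samples.) A minimal sketch, with the character class written equivalently as [-a-zA-Z`]:

```ruby
line_anchored   = /^[-a-zA-Z`]++(?: [-a-zA-Z`]++)?$/
string_anchored = /\A[-a-zA-Z`]++(?: [-a-zA-Z`]++)?\z/

input = "Susan\n123"  # the second line contains digits

p line_anchored.match?(input)    # prints true: ^...$ matched only the first line
p string_anchored.match?(input)  # prints false
```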

Related

Regex for namespaced Ruby class / module names

What is a good regex to match any namespaced Ruby Class or Module name?
More generally, how do I match sequences of words separated by double colons?
Word1::Word2
Word1::Word2::Word3
Word1::Word2::Word3::Word4
etc.
This is the closest thing I got, but it only works for up to two consecutive words:
string.scan /[a-zA-Z0-9]+(?:\:\:[a-zA-Z0-9]+)/
Your approach is fine; you only need to quantify the non-capture group (put a + after it), or use something shorter:
\b\w+(?:::\w+)+\b
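For example, with + added after the asker's non-capture group, scan picks up names of any depth; the shorter \w version behaves the same on this input, though \w also admits underscores. A minimal sketch on a made-up sample sentence:

```ruby
text = "See Word1::Word2, Word1::Word2::Word3 and Word1::Word2::Word3::Word4."

# Asker's pattern with the non-capture group quantified by +
p text.scan(/[a-zA-Z0-9]+(?:::[a-zA-Z0-9]+)+/)
# Shorter form based on \w
p text.scan(/\b\w+(?:::\w+)+\b/)
```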
R = /
\A # match beginning of string
(?: # begin a non-capture group
(?:::)? # optionally match two colons
\p{Lu} # match an uppercase letter
\w* # match zero or more word characters
)+ # close non-capture group and execute group one or more times
\z # match end of string
/x # free-spacing regex definition mode
'AB::CD::EF'.match?(R) #=> true
'A'.match?(R) #=> true
'::A::C_d::E3F_'.match?(R) #=> true
'AB::cD::EF'.match?(R) #=> false
'AB:::CD::EF&'.match?(R) #=> false
Alternatively, we could write the following.
def valid_mod_name?(str)
  i = str.start_with?('::') ? 2 : 0
  parts = str[i..-1].split('::', -1) # -1 keeps trailing empty fields, so 'A::' is rejected
  !parts.empty? && parts.all? { |s| s.match?(/\A\p{Lu}\w*\z/) }
end
valid_mod_name? 'AB::CD::EF' #=> true
valid_mod_name? 'A' #=> true
valid_mod_name? '::A::C_d::E3F_' #=> true
valid_mod_name? 'AB::cD::EF' #=> false
valid_mod_name? 'AB:::CD::EF&' #=> false
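R also rejects a few edge cases not listed above, such as an empty string or a trailing ::. A quick check, with R repeated in one-line form so the snippet runs on its own:

```ruby
R = /\A(?:(?:::)?\p{Lu}\w*)+\z/

p ''.match?(R)          # prints false: at least one name segment is required
p 'A::'.match?(R)       # prints false: a trailing :: must be followed by a name
p '::'.match?(R)        # prints false
p 'Foo::Bar'.match?(R)  # prints true
```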

Find nth occurrence of variable regex in Ruby?

I'm writing a method that does what the title says: it needs to find the index of the nth occurrence of a particular left bracket chosen by the user. For example, if the user passes a string along with '{' and 5, it should find the 5th occurrence of '{'; the same goes for '(' and '['.
I'm currently doing it with a while loop that compares each character, but this looks ugly and isn't very interesting. Is there a way to do this with a regex? Can you use a variable in a regex?
def _find_bracket_n(str,left_brac,brackets_num)
i = 0
num_of_left_bracs = 0
while i < str.length && num_of_left_bracs < brackets_num
num_of_left_bracs += 1 if str[i] == left_brac
i += 1
end
n_th_lbrac_index = i - 1
end
We want the offset of the nth instance of a given character in a string, or nil if the string contains fewer than n instances of that character. I will give four solutions.
chr = "("
str = "a(b(cd((ef(g(hi("
n = 5
Use Enumerable#find_index
k = n # count down a copy so n keeps its value for the solutions below
str.each_char.find_index { |c| c == chr && (k -= 1).zero? }
#=> 10
Use a regular expression
chr_esc = Regexp.escape(chr)
#=> "\\("
r = /
\A # match the beginning of the string
(?: # begin a non-capture group
.*? # match zero or more characters lazily
#{chr_esc} # match the given character
) # end the non-capture group
{#{n-1}} # perform the non-capture group `n-1` times
.*? # match zero or more characters lazily
#{chr_esc} # match the given character
/x # free-spacing regex definition mode
#=> /
\A # match the beginning of the string
(?: # begin a non-capture group
.*? # match zero or more characters lazily
\( # match the given character
) # end the non-capture group
{4} # perform the non-capture group `n-1` times
.*? # match zero or more characters lazily
\( # match the given character
/x
str =~ r
#=> 0
$~.end(0)-1
#=> 10
For the last line we could instead write
Regexp.last_match.end(0)-1
See Regexp::escape, Regexp::last_match and MatchData#end.
Written conventionally (i.e., not in free-spacing mode), the regex is as follows.
/\A(?:.*?#{chr_esc}){#{n-1}}.*?#{chr_esc}/
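One caveat: when str contains fewer than n instances of chr, str =~ r returns nil and $~ is nil, so $~.end(0) raises NoMethodError instead of returning nil. A minimal sketch that wraps the regex in a method returning nil in that case; the name nth_offset_regex is hypothetical, and the /m flag is an addition so that . can also match newlines:

```ruby
# Hypothetical helper: offset of the nth instance of chr in str, or nil.
def nth_offset_regex(str, chr, n)
  return nil if n < 1
  chr_esc = Regexp.escape(chr)
  # /m lets . match newlines too, in case str spans several lines
  m = str.match(/\A(?:.*?#{chr_esc}){#{n - 1}}.*?#{chr_esc}/m)
  m && m.end(0) - 1
end

str = "a(b(cd((ef(g(hi("
p nth_offset_regex(str, "(", 5)   # prints 10
p nth_offset_regex(str, "(", 20)  # prints nil
```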
Convert characters to offsets, remove the offsets of non-matching characters and return the nth offset of those that remain
str.size.times.select { |i| str[i] == chr }[n-1]
#=> 10
n = 20
str.size.times.select { |i| str[i] == chr }[n-1]
#=> nil
Use String#index repeatedly to decapitate substrings
n = 5 # n was set to 20 above
s = str.dup
pos = n.times.reduce(0) do |off, _|
  i = s.index(chr)
  break nil if i.nil? # fewer than n instances of chr
  s = s[i+1..-1]
  off + i + 1
end
pos && pos - 1
#=> 10
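A related variant, not one of the four above: String#index accepts a starting offset as its second argument, so the search can simply resume just past the previous hit without building substrings. A minimal sketch; nth_index is a hypothetical name:

```ruby
def nth_index(str, chr, n)
  return nil if n < 1
  pos = -1
  n.times do
    pos = str.index(chr, pos + 1)  # search from just past the previous hit
    return nil if pos.nil?         # fewer than n instances
  end
  pos
end

str = "a(b(cd((ef(g(hi("
p nth_index(str, "(", 5)   # prints 10
p nth_index(str, "(", 20)  # prints nil
```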

I have a regular expression that I need to figure out

"peter,nick,jake,jack"
I need to match something like this.
I cannot have any whitespace after a word. For example, "peter,,", "peter," and "peter,,nick " are all incorrect.
It has to be just a word such as "peter", or a word followed by a comma and then another word ("peter,nick").
First confirm that the string has the required structure.
r = /
\A # match the beginning of the string
[[:alpha:]]+ # match > 0 letters
(?:,[[:alpha:]]+) # match a comma then > 0 letters in a non-capture group
* # match the preceding non-capture group >= 0 times
\z # match end of the string
/x # free-spacing regex definition mode
str = "peter,nick,jake,jack"
str =~ r #=> 0
Since it matches the regex, simply split on commas to return an array of the words.
str.split(',') #=> ["peter", "nick", "jake", "jack"]
By contrast:
"peter,nick,,jake,jack" =~ r #=> nil
"peter,nick,jake, jack" =~ r #=> nil
"peter,nick,jake,jack " =~ r #=> nil
"peter ispeter,nick" =~ r #=> nil
I assume the string must contain at least one letter.
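The two steps can be combined into one method that returns the array of words, or nil when the string lacks the required structure. A minimal sketch; parse_names and NAMES_RE are hypothetical names:

```ruby
NAMES_RE = /\A[[:alpha:]]+(?:,[[:alpha:]]+)*\z/

def parse_names(str)
  NAMES_RE.match?(str) ? str.split(',') : nil
end

p parse_names("peter,nick,jake,jack")  # prints ["peter", "nick", "jake", "jack"]
p parse_names("peter,,nick ")          # prints nil
```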

match a price amount after a particular substring

Considering this string:
Looking for a front-end developer who can fix a bug on my Wordpress site. The header logo disappeared after I updated some plugins. \n\nI have tried disabling all plugins but it didn't help.Budget: $25\nPosted On: May 06, 2016 16:29 UTCCategory: Web, Mobile & Software Dev > Web DevelopmentSkills: WordPress Country: Denmarkclick to apply
I'd like to retrieve the price value after the string "Budget:". I have a number of strings, all with the same pattern (the price right after "Budget:").
I tried /\$[\d.]+/ to extract the price, but that would take any price amount in the string, not only the one following Budget:.
How can I accomplish that?
r = /
\b # match a word break
[Bb] # match "B" or "b"
udget: # match string
\s+\$ # match one or more spaces followed by a dollar sign
\K # discard all matches so far
\d{1,3} # match between one and three digits
(?:\,\d{3}) # match a comma followed by three digits in a non-capture group
* # perform the preceding match zero or more times
(?:\.\d\d) # match a period followed by two digits in a non-capture group
? # make the preceding match optional
/x # free-spacing regex definition mode
"Some text Budget: $25\nsome more text"[r] #=> "25"
"Some text Budget: $25.42\nsome more text"[r] #=> "25.42"
"Some text Budget: $25,642,328\nsome more text"[r] #=> "25,642,328"
"Some text Budget: $25,642,328.01\nsome more text"[r] #=> "25,642,328.01"
This is actually not quite right because
"Some text Budget: $25,64,328.01\nsome more text"[r] #=> "25"
should return nil. Unfortunately, the fix calls for major surgery:
r = /
\b # match a word break
[Bb] # match "B" or "b"
udget: # match string
\s+\$ # match 1 or more spaces followed by a dollar sign
\K # discard all matches so far
\d{1,3} # match between 1 and 3 digits
(?: # begin a non-capture group
(?![\,\d]) # negative lookahead: next character is not a comma or digit
| # or
(?: # begin a non-capture group
(?:\,\d{3}) # match a comma followed by 3 digits in a non-capture group
+ # perform preceding match 1 or more times
) # end non-capture group
) # end non-capture group
(?:\.\d\d) # match a period followed by 2 digits in a non-capture group
? # make the preceding match optional
/x
"Some text Budget: $25\nsome more text"[r] #=> "25"
"Some text Budget: $25.42\nsome more text"[r] #=> "25.42"
"Some text Budget: $25,642,328\nsome more text"[r] #=> "25,642,328"
"Some text Budget: $25,642,328.01\nsome more text"[r] #=> "25,642,328.01"
"Some text Budget: $25,64,328.01\nsome more text"[r] #=> nil
Since you say the string "Budget:" doesn't change, and assuming there are no decimal values, I'd use something like this:
/Budget:(\s*\$\d*)/
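Note that the capture group in that regex includes the whitespace and the dollar sign. If only the digits are wanted, one option is to move the parentheses and pass the group number to String#[]. A sketch under the same no-decimals assumption, using a shortened copy of the question's text:

```ruby
s = "it didn't help.Budget: $25\nPosted On: May 06, 2016 16:29 UTC"

p s[/Budget:(\s*\$\d*)/, 1]       # prints " $25"
p s[/Budget:\s*\$(\d+)/, 1]       # prints "25"
p s[/Budget:\s*\$(\d+)/, 1].to_i  # prints 25
```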
Try this:
def extract_budget s
m = s.match(/Budget: \$([\d,.]+)\n/)
if m.nil?
nil
else
m.captures[0].gsub(/,/, "").to_f
end
end
If s1 is your string and s2 is the same string but with "Budget: $25,000.53":
irb> extract_budget s1
=> 25.0
irb> extract_budget s2
=> 25000.53
irb> extract_budget "foo"
=> nil

How to write a regex in a single line

I have this code:
str = 'printf("My name is %s and age is %0.2d", name, age);'
SPECIFIERS = 'diuXxofeEgsc'
format_specifiers = /((?:%(?:\*?([-+]?\d*\.?\d+)*(?:[#{SPECIFIERS}]))))/i
variables = /([.[^"]]*)\);$/
format = str.scan(format_specifiers)
var = str.scan(variables).first.first.split(/,/)
Is there any way a single regex can do that in a couple of lines?
My desired output is:
%s, name
%0.2d, age
I'm a big believer in keeping regular expressions as simple as possible; they can too quickly mushroom into unwieldy, unmaintainable messes. I'd start with something like this, then tweak as necessary:
str = 'printf("My name is %s and age is %0.2d", name, age);'
formats = str.scan(/%[a-z0-9.]+/) # => ["%s", "%0.2d"]
str[/,(.+)\);$/] # => ", name, age);"
vars = str[/,(.+)\);$/].scan(/[a-z]+/) # => ["name", "age"]
puts formats.zip(vars).map{ |a| a.join(', ')}
# >> %s, name
# >> %0.2d, age
Your question has two parts:
Q1: Is it possible to do this with a single regex?
Q2: Can this be done in one or two lines of code?
The answer to both questions is "yes".
format_specifiers = /
%[^\s\"]+ # match % followed by > 0 characters other than
# a whitespace or a double-quote
/x # free-spacing regex definition mode
variables = /
,\s* # match comma followed by >= 0 whitespaces
\K # forget matches so far
[a-zA-Z] # match a letter
\w* # match >= 0 word characters
/x
You can decide, after testing, whether these two regexes do their jobs adequately. For testing, refer to Kernel#sprintf.
r = /
(?:#{format_specifiers}) # match format_specifiers in a non-capture group
| # or
(?:#{variables}) # match variables in a non-capture group
/x
#=> /
(?:(?x-mi:
%[^\s\"]+ # match % followed by > 0 characters other than
# a whitespace or a double-quote
)) # match format_specifiers in a non-capture group
| # or
(?:(?x-mi:
,\s* # match comma followed by >= 0 whitespaces
\K # forget matches so far
[a-zA-Z] # match a letter
\w* # match >= 0 word characters
)) # match variables in a non-capture group
/x
r can of course also be written:
/(?:(?x-mi:%[^\s\"]+))|(?:(?x-mi:,\s*\K[a-zA-Z]\w*))/
One advantage of constructing r from two regexes is that each of the latter can be tested separately.
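For instance, simplified one-line versions of the two regexes can each be checked against the sample string before they are combined. This is a sketch; the patterns below are written without free-spacing mode:

```ruby
format_specifiers = /%[^\s"]+/            # % followed by non-space, non-quote chars
variables         = /,\s*\K[a-zA-Z]\w*/   # \K drops the comma and spaces from the match

str = 'printf("My name is %s and age is %0.2d", name, age);'

p str.scan(format_specifiers)  # prints ["%s", "%0.2d"]
p str.scan(variables)          # prints ["name", "age"]
```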
str = 'printf("My name is %s and age is %0.2d", name, age);'
arr = str.scan(r)
#=> ["%s", "%0.2d", "name", "age"]
arr.each_slice(arr.size/2).to_a.transpose.map { |s| s.join(', ') }
#=> ["%s, name", "%0.2d, age"]
I have five lines of code. We could reduce this to two by simply substituting out r in str.scan(r). We could make it a single line by writing:
str.scan(r).tap { |a|
a.replace(a.each_slice(a.size/2).to_a.transpose.map { |s| s.join(', ') }) }
#=> ["%s, name", "%0.2d, age"]
with r substituted out.
The steps here are as follows:
a = str.scan(r)
#=> ["%s", "%0.2d", "name", "age"]
b = a.each_slice(a.size/2)
#=> a.each_slice(2)
#=> #<Enumerator: ["%s", "%0.2d", "name", "age"]:each_slice(2)>
c = b.to_a
#=> [["%s", "%0.2d"], ["name", "age"]]
d = c.transpose
#=> [["%s", "name"], ["%0.2d", "age"]]
e = d.map { |s| s.join(', ') }
#=> ["%s, name", "%0.2d, age"]
a.replace(e)
#=> ["%s, name", "%0.2d, age"]
The methods used (aside from Array#size) are String#scan, Enumerable#each_slice, Enumerable#to_a, Enumerable#map, Array#transpose, Array#replace and Object#tap.
