Considering this string:
Looking for a front-end developer who can fix a bug on my Wordpress site. The header logo disappeared after I updated some plugins. \n\nI have tried disabling all plugins but it didn't help.Budget: $25\nPosted On: May 06, 2016 16:29 UTCCategory: Web, Mobile & Software Dev > Web DevelopmentSkills: WordPress Country: Denmarkclick to apply
I'd like to retrieve the price value that follows the string "Budget:". I have a number of strings, all with the same pattern (the price comes right after "Budget:").
I tried /\$[\d.]+/ to extract the price, but that matches any price amount in the string, not only the one following "Budget:".
How can I accomplish that?
r = /
\b # match a word break
[Bb] # match "B" or "b"
udget: # match string
\s+\$ # match one or more spaces followed by a dollar sign
\K # discard all matches so far
\d{1,3} # match between one and three digits
(?:\,\d{3}) # match a comma followed by three digits in a non-capture group
* # perform the preceding match zero or more times
(?:\.\d\d) # match a period followed by two digits in a non-capture group
? # make the preceding match optional
/x # free-spacing regex definition mode
"Some text Budget: $25\nsome more text"[r] #=> "25"
"Some text Budget: $25.42\nsome more text"[r] #=> "25.42"
"Some text Budget: $25,642,328\nsome more text"[r] #=> "25,642,328"
"Some text Budget: $25,642,328.01\nsome more text"[r] #=> "25,642,328.01"
This is actually not quite right because
"Some text Budget: $25,64,328.01\nsome more text"[r] #=> "25"
should return nil. Unfortunately, the fix calls for major surgery:
r = /
\b # match a word break
[Bb] # match "B" or "b"
udget: # match string
\s+\$ # match 1 or more spaces followed by a dollar sign
\K # discard all matches so far
\d{1,3} # match between 1 and 3 digits
(?: # begin a non-capture group
(?![\,\d]) # negative lookahead: the next character is neither a comma nor a digit
| # or
(?: # begin a non-capture group
(?:\,\d{3}) # match a comma followed by 3 digits in a non-capture group
+ # perform preceding match 1 or more times
) # end non-capture group
) # end non-capture group
(?:\.\d\d) # match a period followed by 2 digits in a non-capture group
? # make the preceding match optional
/x
"Some text Budget: $25\nsome more text"[r] #=> "25"
"Some text Budget: $25.42\nsome more text"[r] #=> "25.42"
"Some text Budget: $25,642,328\nsome more text"[r] #=> "25,642,328"
"Some text Budget: $25,642,328.01\nsome more text"[r] #=> "25,642,328.01"
"Some text Budget: $25,64,328.01\nsome more text"[r] #=> nil
Since you say the string "Budget:" doesn't change, and assuming there are no decimal values, I'd use something like this:
/Budget:(\s*\$\d*)/
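As a quick check of what that capture group holds, here is a small sketch using String#[] with a group index. The sample text is abbreviated from the question; the second pattern, which moves the parentheses so that only the digits are captured, is a suggested variation rather than part of the answer above.

```ruby
text = "it didn't help.Budget: $25\nPosted On: May 06, 2016 16:29 UTC"

# Group 1 of the pattern above includes the leading space and the dollar sign.
p text[/Budget:(\s*\$\d*)/, 1]   #=> " $25"

# Variation: capture only the digits (still assumes whole-dollar amounts).
p text[/Budget:\s*\$(\d+)/, 1]   #=> "25"
```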
Try this:
def extract_budget s
m = s.match(/Budget: \$([\d,.]+)\n/)
if m.nil?
nil
else
m.captures[0].gsub(/,/, "").to_f
end
end
If s1 is your string and s2 is the same string but with "Budget: $25,000.53":
irb> extract_budget s1
=> 25.0
irb> extract_budget s2
=> 25000.53
irb> extract_budget "foo"
=> nil
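Since the question mentions many strings with the same pattern, the helper can be mapped over a collection. A minimal sketch, repeating extract_budget in condensed form so the snippet runs on its own; the postings array is invented sample data, and Enumerable#filter_map requires Ruby 2.7 or later.

```ruby
# Condensed version of the helper above (same regex, same conversion).
def extract_budget(s)
  m = s.match(/Budget: \$([\d,.]+)\n/)
  m && m.captures[0].delete(',').to_f
end

# Invented sample data for illustration.
postings = [
  "Fix my logo.Budget: $25\nPosted On: May 06, 2016",
  "Build a site.Budget: $25,000.53\nPosted On: May 07, 2016",
  "No price in this one"
]

# filter_map drops the nil results for postings without a budget.
p postings.filter_map { |s| extract_budget(s) }   #=> [25.0, 25000.53]
```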
What is a good regex to match any namespaced Ruby Class or Module name?
More generally, how do I match sequences of words separated by double colons?
Word1::Word2
Word1::Word2::Word3
Word1::Word2::Word3::Word4
etc.
This is the closest thing I got, but it only works for up to two consecutive words:
string.scan /[a-zA-Z0-9]+(?:\:\:[a-zA-Z0-9]+)/
Your approach is fine; you only need to add a quantifier to the group. Or, more briefly:
\b\w+(?:::\w+)+\b
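To see this pattern at work with String#scan, here is a brief sketch; the sample sentence is made up. Because the group is non-capturing, scan returns whole matches, and because the group is quantified with +, a bare name without any :: is not matched.

```ruby
text = "Use Foo::Bar and ActiveRecord::Base::Thing, not plain Baz."

p text.scan(/\b\w+(?:::\w+)+\b/)
#=> ["Foo::Bar", "ActiveRecord::Base::Thing"]
```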
R = /
\A # match beginning of string
(?: # begin a non-capture group
(?:::)? # optionally match two colons
\p{Lu} # match an uppercase letter
\w* # match zero or more word characters
)+ # close non-capture group and execute group one or more times
\z # match end of string
/x # free-spacing regex definition mode
'AB::CD::EF'.match?(R) #=> true
'A'.match?(R) #=> true
'::A::C_d::E3F_'.match?(R) #=> true
'AB::cD::EF'.match?(R) #=> false
'AB:::CD::EF&'.match?(R) #=> false
Alternatively, we could write the following.
def valid_mod_name?(str)
i = str[0,2]=='::' ? 2 : 0
str[i..-1].split('::').all? { |s| s.match?(/\A\p{Lu}\w*\z/) }
end
valid_mod_name? 'AB::CD::EF' #=> true
valid_mod_name? 'A' #=> true
valid_mod_name? '::A::C_d::E3F_' #=> true
valid_mod_name? 'AB::cD::EF' #=> false
valid_mod_name? 'AB:::CD::EF&' #=> false
I'm using Ruby 2.4. Let's say I have a string that has a number of spaces in it
str = "abc def 123ffg"
How do I capture all the consecutive words at the beginning of the string that begin with a letter? So for example, in the above, I would want to capture
"abc def"
And if I had a string like
"aa22 b cc 33d ff"
I would want to capture
"aa22 b cc"
but if my string were
"66dd eee ff"
I would want to return nothing because the first word of that string does not begin with a letter.
If you can spare the extra spaces between words, you could then split the string and iterate the resulting array with take_while, using a regex to get the desired output; something like this:
str = "abc def 123ffg"
str.split.take_while { |word| word[0] =~ /[[:alpha:]]/ }
#=> ["abc", "def"]
The output is an array, but if a string is needed, you could use join at the end:
str.split.take_while { |word| word[0] =~ /[[:alpha:]]/ }.join(" ")
#=> "abc def"
More examples:
"aa22 b cc 33d ff".split.take_while { |word| word[0] =~ /[[:alpha:]]/ }
#=> ["aa22", "b", "cc"]
"66dd eee ff".split.take_while { |word| word[0] =~ /[[:alpha:]]/ }
#=> []
The Regular Expression
There's usually more than one way to match a pattern, although some are simpler than others. A relatively simple regular expression that works with your inputs and expected outputs is as follows:
/(?:(?:\A|\s*)\p{L}\S*)+/
This matches one or more strings when all of the following conditions are met:
start-of-string, or zero or more whitespace characters
followed by a Unicode category of "letter"
followed by zero or more non-whitespace characters
The first item in the list, which is the second non-capturing group, is what allows the match to be repeated until a word starts with a non-letter.
The Proofs
regex = /(?:(?:\A|\s*)\p{L}\S*)+/
regex.match 'aa22 b cc 33d ff' #=> #<MatchData "aa22 b cc">
regex.match 'abc def 123ffg' #=> #<MatchData "abc def">
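regex.match '66dd eee ff' #=> #<MatchData "dd eee ff">
The last result is not empty: because \s* can match nothing, the pattern is effectively unanchored, so the match starts at the first letter it finds ("dd") rather than failing on the leading digits.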
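If a string whose first word starts with a non-letter should yield nothing at all, one option is to anchor the whole pattern at the start of the string, so that matching cannot begin partway through. A minimal sketch, in which ANCHORED and leading_words are hypothetical names:

```ruby
# Anchored once at the start: a letter-initial word, then more such words.
ANCHORED = /\A\p{L}\S*(?:\s+\p{L}\S*)*/

# Returns the leading run of letter-initial words, or "" if there is none.
def leading_words(str)
  str[ANCHORED].to_s
end

p leading_words('aa22 b cc 33d ff')  #=> "aa22 b cc"
p leading_words('abc def 123ffg')    #=> "abc def"
p leading_words('66dd eee ff')       #=> ""
```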
The sub method can be used to replace with an empty string '' everything that needs to be removed from the expression.
In this case, a first sub method is needed to remove the whole text if it starts with a digit. Then another sub will remove everything starting from any word that starts with a digit.
Answer:
str.sub(/^\d+.*/, '').sub(/\s+\d+.*/, '')
Outputs:
str = "abc def 123ffg"
# => "abc def"
str = "aa22 b cc 33d ff"
# => "aa22 b cc"
str = "66dd eee ff"
# => ""
I know this question has been asked a lot but I need a RegEx for a name validator.
The only requirements are: letters are okay; no numbers; no special characters other than up to 2 spaces, which cannot be at the beginning or end; the "-" and "`" characters are also allowed. Everything else is invalid.
All the other answers seem to ask for a lot more and seem to get too complicated.
Currently I am using
/^([^\d\W]|[-])*$/
But this fails with the space
Sample data:
Pass:
Susan Johnson,
Stephanie Le'Sean,
John Pierre'-Frank
Fail:
Ricky2Good,
Jean,stewie,
Mike#dude,
Jim. McNeil
I've assumed that for a string to be valid, it may contain only uppercase and lowercase letters, apostrophes, dashes and at most two spaces, provided the spaces are not at the beginning or end of the string.
STR= "-a-z'"
r = /
\A # match beginning of string
(?: # begin non-capture group
[#{STR}]+ # match 1+ letters, "-" or "'"
| # or
[#{STR}]+\s[#{STR}]*\s?[#{STR}]+
# match 1+ letters, "-" or "'", space, 0+ letters, "-" or "'",
# optional space, 1+ letters, "-" or "'"
) # end non-capture group
\z # match end of string
/ix # case-indifferent and free-spacing regex definition modes
#=> /
\A # match beginning of string
(?: # begin non-capture group
[-a-z']+ # match 1+ letters, "-" or "'"
| # or
[-a-z']+\s[-a-z']*\s?[-a-z']+
# match 1+ letters, "-" or "'", space, 0+ letters, "-" or "'",
# optional space, 1+ letters, "-" or "'"
) # end non-capture group
\z # match end of string
/ix
If I did not use free-spacing mode to define the regex it would look like this:
r = /\A(?:[-a-z']+|[-a-z']+\s[-a-z']*\s?[-a-z']+)\z/i
"a B-' v" =~ r #=> 0
"aB-'v" =~ r #=> 0
"aB-'1v" =~ r #=> nil
"a B-'1 v" =~ r #=> nil
" a B-1v" =~ r #=> nil
If you wish to return true or false, rather than a truthy value 0 or a falsy value nil, you could write, for example:
("a B-' v" =~ r) ? true : false #=> true
or (the "trick")
!!("a B-' v" =~ r) #=> true
The latter works because it is the same as:
!(!("a B-' v" =~ r))
#=> !(!(0)) => !(false) => true
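On Ruby 2.4 and later, String#match? returns true or false directly, so neither the ternary nor the double negation is needed. A short sketch using the same regex, run against a few of the question's sample names:

```ruby
r = /\A(?:[-a-z']+|[-a-z']+\s[-a-z']*\s?[-a-z']+)\z/i

p "a B-' v".match?(r)             #=> true
p "Susan Johnson".match?(r)       #=> true
p "John Pierre'-Frank".match?(r)  #=> true
p "Ricky2Good".match?(r)          #=> false
p "Jim. McNeil".match?(r)         #=> false
```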
The question asks for a regex to validate names. Using a regex may be the best, but it's not the only way. If the question is really how to validate names--using a regex or otherwise--it should be stated in a way that doesn't stipulate a particular approach. Here's one way to validate without using a regex.
GOOD_CHARS = ('a'..'z').to_a.join << "'-"
#=> "abcdefghijklmnopqrstuvwxyz'-"
def validate(str)
return false if str.empty? || (str[0]==' ' || str[-1]==' ')
nbr_spaces = str.count(' ')
return false if nbr_spaces > 2
str.downcase.count(GOOD_CHARS) + nbr_spaces == str.size
end
validate "a B-' v" #=> true
validate "aB-'v" #=> true
validate "aB-`1v" #=> false
validate "a B-'1 v" #=> false
validate " a B-'1v" #=> false
The following regex should filter for letters, no special characters (other than one space, dashes, and backticks), and no numbers:
/^[a-zA-Z\-\`]++(?: [a-zA-Z\-\`]++)?$/
Hope it helps!
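A quick check of this pattern against some of the question's sample names, as a sketch. It bakes in two assumptions worth noticing: it allows at most one space, and it accepts backticks but not apostrophes, so a name such as Stephanie Le'Sean is rejected.

```ruby
re = /^[a-zA-Z\-\`]++(?: [a-zA-Z\-\`]++)?$/

p re.match?("Susan Johnson")      #=> true
p re.match?("Ricky2Good")         #=> false
p re.match?("Stephanie Le'Sean")  #=> false (the apostrophe is not in the character class)
```

In Ruby, ^ and $ match at line boundaries; \A and \z anchor to the whole string, which is usually what a validator wants.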
"peter,nick,jake,jack"
I need to have something like this.
There cannot be any whitespace after a word. For example,
"peter,,", "peter," and "peter,,nick " are all incorrect.
It has to be just a word, such as "peter", or words separated by single commas, such as "peter,nick".
First confirm that the string has the required structure.
r = /
\A # match the beginning of the string
[[:alpha:]]+ # match > 0 letters
(?:,[[:alpha:]]+) # match a comma then > 0 letters in a non-capture group
* # match the preceding non-capture group >= 0 times
\z # match end of the string
/x # free-spacing regex definition mode
str = "peter,nick,jake,jack"
str =~ r #=> 0
Since it matches the regex, simply split on commas to return an array of the words.
str.split(',') #=> ["peter", "nick", "jake", "jack"]
By contrast:
"peter,nick,,jake,jack" =~ r #=> nil
"peter,nick,jake, jack" =~ r #=> nil
"peter,nick,jake,jack " =~ r #=> nil
"peter ispeter,nick" =~ r #=> nil
I assume the string must contain at least one letter.
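The two steps (validate, then split) can be combined in one small helper. A sketch under the same assumptions as above; WORD_LIST and parse_words are hypothetical names, and returning nil for a malformed string is just one possible convention.

```ruby
# Single-line form of the free-spacing regex above.
WORD_LIST = /\A[[:alpha:]]+(?:,[[:alpha:]]+)*\z/

# Returns the array of words, or nil when the string is malformed.
def parse_words(str)
  str.match?(WORD_LIST) ? str.split(',') : nil
end

p parse_words("peter,nick,jake,jack")  #=> ["peter", "nick", "jake", "jack"]
p parse_words("peter")                 #=> ["peter"]
p parse_words("peter,,nick ")          #=> nil
p parse_words("peter,")                #=> nil
```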
I have this code:
str = 'printf("My name is %s and age is %0.2d", name, age);'
SPECIFIERS = 'diuXxofeEgsc'
format_specifiers = /((?:%(?:\*?([-+]?\d*\.?\d+)*(?:[#{SPECIFIERS}]))))/i
variables = /([.[^"]]*)\);$/
format = str.scan(format_specifiers)
var = str.scan(variables).first.first.split(/,/)
Is there any way a single regex can do that in a couple of lines?
My desired output is:
%s, name
%0.2d, age
I'm a big believer in keeping regular expressions as simple as possible; they can too quickly mushroom into unwieldy, unmaintainable messes. I'd start with something like this, then tweak as necessary:
str = 'printf("My name is %s and age is %0.2d", name, age);'
formats = str.scan(/%[a-z0-9.]+/) # => ["%s", "%0.2d"]
str[/,(.+)\);$/] # => ", name, age);"
vars = str[/,(.+)\);$/].scan(/[a-z]+/) # => ["name", "age"]
puts formats.zip(vars).map{ |a| a.join(', ')}
# >> %s, name
# >> %0.2d, age
Your question has two parts:
Q1: Is it possible to do this with a single regex?
Q2: Can this be done in one or two lines of code?
The answer to both questions is "yes".
format_specifiers = /
%[^\s\"\z]+ # match % followed by > 0 characters other than a
# whitespace, a double-quote or the end of the string
/x # free-spacing regex definition mode
variables = /
,\s* # match comma followed by >= 0 whitespaces
\K # forget matches so far
[a-zA-Z] # match a letter
\w* # match >= 0 word characters
/x
You can decide, after testing, if these two regexes do their jobs adequately. For testing, refer to Kernel#sprintf.
r = /
(?:#{format_specifiers}) # match format_specifiers in a non-capture group
| # or
(?:#{variables}) # match variables in a non-capture group
/x
#=> /
(?:(?x-mi:
%[^\s\"\z]+ # match % followed by > 0 characters other than a
# whitespace, a double-quote or the end of the string
)) # match format_specifiers in a non-capture group
| # or
(?:(?x-mi:
,\s* # match comma followed by >= 0 whitespaces
\K # forget matches so far
[a-zA-Z] # match a letter
\w* # match >= 0 word characters
)) # match variables in a non-capture group
/x
r can of course also be written:
/(?:(?x-mi:%[^\s\"\z]+))|(?:(?x-mi:,\s*\K[a-zA-Z]\w*))/
One advantage of constructing r from two regexes is that each of the latter can be tested separately.
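To illustrate testing the pieces separately, here is a sketch with simplified single-line stand-ins for the two patterns (the %-pattern below simply stops at whitespace or a double quote; it is an assumption, not the exact regex above). Regexp.union is shown as one alternative way of combining them.

```ruby
str = 'printf("My name is %s and age is %0.2d", name, age);'

# Simplified stand-ins for the two regexes above.
format_specifiers = /%[^\s"]+/
variables         = /,\s*\K[a-zA-Z]\w*/

# Each piece can be checked on its own...
p str.scan(format_specifiers)  #=> ["%s", "%0.2d"]
p str.scan(variables)          #=> ["name", "age"]

# ...and then combined into a single alternation.
combined = Regexp.union(format_specifiers, variables)
p str.scan(combined)           #=> ["%s", "%0.2d", "name", "age"]
```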
str = 'printf("My name is %s and age is %0.2d", name, age);'
arr = str.scan(r)
#=> ["%s", "%0.2d", "name", "age"]
arr.each_slice(arr.size/2).to_a.transpose.map { |s| s.join(', ') }
#=> ["%s, name", "%0.2d, age"]
I have five lines of code. We could reduce this to two by simply substituting out r in str.scan(r). We could make it a single line by writing:
str.scan(r).tap { |a|
a.replace(a.each_slice(a.size/2).to_a.transpose.map { |s| s.join(', ') }) }
#=> ["%s, name", "%0.2d, age"]
with r substituted out.
The steps here are as follows:
a = str.scan(r)
#=> ["%s", "%0.2d", "name", "age"]
b = a.each_slice(a.size/2)
#=> a.each_slice(2)
#=> #<Enumerator: ["%s", "%0.2d", "name", "age"]:each_slice(2)>
c = b.to_a
#=> [["%s", "%0.2d"], ["name", "age"]]
d = c.transpose
#=> [["%s", "name"], ["%0.2d", "age"]]
e = d.map { |s| s.join(', ') }
#=> ["%s, name", "%0.2d, age"]
a.replace(e)
#=> ["%s, name", "%0.2d, age"]
The methods used (aside from Array#size) are String#scan, Enumerable#each_slice, Enumerable#to_a, Enumerable#map, Array#transpose and Array#replace.
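Slicing the array in half relies on the specifiers and the variables being equal in number and on all specifiers coming first. As an alternative sketch (not the approach above), the matches can be split by whether they start with %, then paired with zip. The pattern here is a simplified single-line stand-in for r.

```ruby
str = 'printf("My name is %s and age is %0.2d", name, age);'
r   = /%[^\s"]+|,\s*\K[a-zA-Z]\w*/   # simplified stand-in for r

# Separate specifiers from variable names, then pair them up in order.
formats, vars = str.scan(r).partition { |s| s.start_with?('%') }
pairs = formats.zip(vars).map { |pair| pair.join(', ') }

p pairs  #=> ["%s, name", "%0.2d, age"]
```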