Regular expression to match numbers - ruby

I am looking for a regular expression regex to match one of these patterns:
Number followed by x
Number separated by one or more space
I don't know if match is the correct method to achieve the results.
Matching examples:
' 30x '
'30x'
'20 30'
' 20 30 '
'30x'.match(regex).to_a #=> ['30']
'30 40'.match(regex).to_a #=> ['30', '40']
"30".match(regex).to_a # => ["30"]
" 30 ".match(regex).to_a # => ["30"]
"30 40".match(regex).to_a # => ["30", "40"]
Non-matching examples:
'20x 30 '
'x20 '
"30xx".match(regex).to_a # => nil
"30 a".match(regex).to_a # => nil
"30 60x".match(regex).to_a # => nil
"30x 20".match(regex).to_a # => nil
EDIT
Following #TeroTilus advice, this is the use case for this question:
The user will insert how he will pay an debt. Then, we've created a textfield to
easily insert the payment condition. Example:
> "15 20" # Generate 2 bills: First for 15 days and second for 20 days
> "2x" # Generate 2 bills: First for 30 days and second for 60 days
> "2x 30" # Show message of 'Invalid Format'
> "ANY other string" # Show message of 'Invalid Format'

How about:
/^\s*\d+(?:x\s*|\s*\d+)?$/
explanation:
The regular expression:
(?-imsx:^\s*\d+(?:x\s*|\s*\d+)?$)
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
^ the beginning of the string
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
----------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
----------------------------------------------------------------------
x 'x'
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0
or more times (matching the most amount
possible))
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0
or more times (matching the most amount
possible))
----------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
----------------------------------------------------------------------
)? end of grouping
----------------------------------------------------------------------
$ before an optional \n, and the end of the
string
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------

Try string.scan(/(^|\s+)(\d+)x?/).map(&:last), it probably does what you want.

This works for me
\d*x\s*$|\d*[^x] \d[^\s]*

I have newer written a single line of ruby so the syntax of the below might be horible
But, the easiest solution to your problem is; first reduce your first case to your second case, then do your matching for numbers.
Something like;
("20x 30".gsub/^\s*(\d+)x\s*$/,'\1').match(/\b\d+\b/)

This should work for the examples you gave.
^(?:\s*(\d+))+x?\s*$
^ # Match start of string
(?: # Open non-capturing group
\s* # Zero or more spaces at start or between numbers
(\d+) # Capture one or more numbers
) # Close the group
+ # Group should appear one or more times
x? # The final group may have an x directly after it
\s* # Zero or more trailing spaces are allowed
$ # Match the end of the string
Edited to capture the numbers, not sure if you were looking to do this or just match the string.

Related

Ruby regex get the word combo separated by period

I'm trying to use Ruby regex to get word combo like below.
In a example below I only need cases 1-4, * marked them in caps for easy testing. Word in the middle (dbo, bcd) could be anything or nothing like in case#3. I have trouble how to get that double period case#3 working. It's also good to get standalone SALES as word too but probably it's too much for one regex ?Tx all guru .
This is my script which partially working, need add alpha..SALES
s = '1 alpha.dbo.SALES 2 alpha.bcd.SALES 3 alpha..SALES 4 SALES
bad cases 5x alpha.saleS 6x saleSXX'
regex = /alpha+\.+[a-z]+\.?sales/ix
puts 'R: ' + s.scan(regex).to_s
##R: ["alpha.dbo.SALES", "alpha.bcd.SALES"]
s = '1 alpha.dbo.SALES 2 alpha.bcd.SALES 3 alpha..SALES 4 SALES
bad cases 5x alpha.saleS 6x saleSXX 7x alpha.abc.SALES.etc'
regex = /(?<=^|\s)(?:alpha\.[a-z]*\.)?(?:sales)(?=\s|$)/i
puts 'R: ' + s.scan(regex).to_s
Output:
R: ["alpha.dbo.SALES", "alpha.bcd.SALES", "alpha..SALES", "SALES"]
r = /
(?<=\d[ ]) # match a digit followed by a space in a positive lookbehind
(?: # begin a non-capture group
\p{Alpha}+ # match one or more letters
\. # match a period
(?: # begin a non-capture group
\p{Alpha}+ # match one or more letters
\. # match a period
| # or
\. # match a period
) # end non-capture group
)? # end non-capture group and optionally match it
SALES # match string
(?!=[.\p{Alpha}]) # do not match a period or letter (negative lookahead)
/x # free-spacing regex definition mode.
s.scan(r)
#=> ["alpha.dbo.SALES", "alpha.bcd.SALES", "alpha..SALES", "SALES"]
This regular expression is customarily written as follows.
r = /
(?<=\d )(?:\p{Alpha}+\.(?:\p{Alpha}+\.|\.))?SALES(?!=[.\p{Alpha}])/
In free-spacing mode the space must be put in a character class ([ ]); else it would be stripped out.

Ruby parsing and regex

Picked up Ruby recently and have been fiddling around with it. I wanted to learn how to use regex or other Ruby tricks to check for certain words, whitespace characters, valid format etc in a given text line.
Let's say I have an order list that looks strictly like this in this format:
cost: 50 items: book,lamp
One space after semicolon, no space after each comma, no trailing whitespaces at the end and stuff like that.
How can I check for errors in this format using Ruby? This for example should fail my checks:
cost: 60 items:shoes,football
My goal was to split the string by a " " and check to see if the first word was "cost:", if the second word was a number and so on but I realized that splitting on a " " doesn't help me check for extra whitespaces as it just eats it up. Also doesn't help me check for trailing whitespaces. How do I go about doing this?
You could use the following regular expression.
r = /
\A # match beginning of string
cost:\s # match "cost:" followed by a space
\d+\s # match > 0 digits followed by a space
items:\s # match "items:" followed by a space
[[:alpha:]]+ # match > 0 lowercase or uppercase letters
(?:,[[:alpha:]]+) # match a comma followed by > 0 lowercase or uppercase
# letters in a non-capture group (?: ... )
* # perform the match on non-capture group >= 0 times
\z # match the end of the string
/x # free-spacing regex definition mode
"cost: 50 items: book,lamp" =~ r #=> 0 (a match, beginning at index 0)
"cost: 50 items: book,lamp,table" =~ r #=> 0 (a match, beginning at index 0)
"cost: 60 items:shoes,football" =~ r #=> nil (no match)
The regex can can of course be written in the normal manner:
r = /\Acost:\s\d+\sitems:\s[[:alpha:]]+(?:,[[:alpha:]]+)*\z/
or
r = /\Acost: \d+ items: [[:alpha:]]+(?:,[[:alpha:]]+)*\z/
though a whitespace character (\s) cannot be replaced by a space in the free-spacing mode definition (\x).

Match exact word in string ruby regex

I have this string in ruby and I'm trying to make a match for sin
"please don't share this: 234-604-142"
"please don't share this: 234-604-1421"
so far I have
\d{3}-\d{3}-\d{3}
but this will also match the second case. Adding a ^ and $ for begins and ends will cause this not to function at all.
To fix this I could do this:
x = "please don't share this: 234-604-1421"
x.split.last ~= /^\d{3}-\d{3}-\d{3}$/
but is there a way to match the sin otherwise?
Another option worth considering for your rule is
(?<!\d)\d{3}-\d{3}-\d{3}(?!\d)
That way, if for any reason your phone number is followed or preceded by letters, as in
"please don't share this number234-604-142because it's private"
the regex will still work.
Explain Regex
(?<! # look behind to see if there is not:
\d # digits (0-9)
) # end of look-behind
\d{3} # digits (0-9) (3 times)
- # '-'
\d{3} # digits (0-9) (3 times)
- # '-'
\d{3} # digits (0-9) (3 times)
(?! # look ahead to see if there is not:
\d # digits (0-9)
) # end of look-ahead

Need a regex pattern to match date with optional time

I need a regex pattern which matches a date with optional time.
The date should be a valid U.S. date in m/d/yyyy format. The time should be h:mm:ss am/pm or 24-hour time hh:mm:ss.
Matches: 9/1/2011 | 9/1/2011 10:00 am | 9/1/2011 10:00 AM | 9/1/2011 10:00:00
This pattern will be used in a Ruby on Rails project, so it should be in a format usable via Ruby. See http://rubular.com/ for testing.
Here's my existing date pattern (which may be an over-kill):
DATE_PATTERN = /^((((0[13578])|([13578])|(1[02]))[\/](([1-9])|([0-2][0-9])|(3[01])))|(((0[469])|([469])|(11))[\/](([1-9])|([0-2][0-9])|(30)))|((2|02)[\/](([1-9])|([0-2][0-9]))))[\/]\d{4}$|^\d{4}/
Regular expressions are horrible for this kind of job. If you're using Ruby I'd recommend using DateTime.strptime to parse the data and check its validity:
def validate_date(date_str)
valid_formats = ["%m/%d/%Y", "%m/%d/%Y %I:%M %P"]
#see http://www.ruby-doc.org/core-1.9.3/Time.html#method-i-strftime for more
valid_formats.each do |format|
valid = Time.strptime(date_str, format) rescue false
return true if valid
end
return false
end
Well, here's what I ended up with; using stricter military time:
DATE_TIME_FORMAT = /^([0,1]?\d{1})\/([0-2]?\d{1}|[3][0,1]{1})\/([1]{1}[9]{1}[9]{1}\d{1}|[2-9]{1}\d{3})\s([0]?\d|1\d|2[0-3]):([0-5]\d):([0-5]\d)$/
Matches: 1/19/2011 23:59:59
Captures:
1
19
2011
23
59
59
if subject =~ /\A(?:0?[1-9]|1[012])\/(?:0?[1-9]|[12]\d|3[01])\/(?:\d{4})(?:\s+(?:(?:[01]?\d|2[0-3]):(?:[0-5]\d)|(?:0?\d|1[0-2]):(?:[0-5]\d)\s+[ap]m))?\s*\Z/i
# Successful match
Good luck..
How it works :
"
^ # Assert position at the beginning of the string
(?: # Match the regular expression below
# Match either the regular expression below (attempting the next alternative only if this one fails)
0 # Match the character “0” literally
? # Between zero and one times, as many times as possible, giving back as needed (greedy)
[1-9] # Match a single character in the range between “1” and “9”
| # Or match regular expression number 2 below (the entire group fails if this one fails to match)
1 # Match the character “1” literally
[012] # Match a single character present in the list “012”
)
/ # Match the character “/” literally
(?: # Match the regular expression below
# Match either the regular expression below (attempting the next alternative only if this one fails)
0 # Match the character “0” literally
? # Between zero and one times, as many times as possible, giving back as needed (greedy)
[1-9] # Match a single character in the range between “1” and “9”
| # Or match regular expression number 2 below (attempting the next alternative only if this one fails)
[12] # Match a single character present in the list “12”
\d # Match a single digit 0..9
| # Or match regular expression number 3 below (the entire group fails if this one fails to match)
3 # Match the character “3” literally
[01] # Match a single character present in the list “01”
)
/ # Match the character “/” literally
(?: # Match the regular expression below
\d # Match a single digit 0..9
{4} # Exactly 4 times
)
(?: # Match the regular expression below
\s # Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
(?: # Match the regular expression below
# Match either the regular expression below (attempting the next alternative only if this one fails)
(?: # Match the regular expression below
# Match either the regular expression below (attempting the next alternative only if this one fails)
[01] # Match a single character present in the list “01”
? # Between zero and one times, as many times as possible, giving back as needed (greedy)
\d # Match a single digit 0..9
| # Or match regular expression number 2 below (the entire group fails if this one fails to match)
2 # Match the character “2” literally
[0-3] # Match a single character in the range between “0” and “3”
)
: # Match the character “:” literally
(?: # Match the regular expression below
[0-5] # Match a single character in the range between “0” and “5”
\d # Match a single digit 0..9
)
| # Or match regular expression number 2 below (the entire group fails if this one fails to match)
(?: # Match the regular expression below
# Match either the regular expression below (attempting the next alternative only if this one fails)
0 # Match the character “0” literally
? # Between zero and one times, as many times as possible, giving back as needed (greedy)
\d # Match a single digit 0..9
| # Or match regular expression number 2 below (the entire group fails if this one fails to match)
1 # Match the character “1” literally
[0-2] # Match a single character in the range between “0” and “2”
)
: # Match the character “:” literally
(?: # Match the regular expression below
[0-5] # Match a single character in the range between “0” and “5”
\d # Match a single digit 0..9
)
\s # Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
[ap] # Match a single character present in the list “ap”
m # Match the character “m” literally
)
)? # Between zero and one times, as many times as possible, giving back as needed (greedy)
\s # Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
$ # Assert position at the end of the string (or before the line break at the end of the string, if any)
"
Remember is not the spoon that bents but you!
Here's what I came up with that seems to work:
regex = /^1?\d{1}\/[123]?\d{1}\/\d{4}(\s[12]?\d:[0-5]\d(:[0-5]\d)?(\s[ap]m)?)?$/

Ruby regex extracting words

I'm currently struggling to come up with a regex that can split up a string into words where words are defined as a sequence of characters surrounded by whitespace, or enclosed between double quotes. I'm using String#scan
For instance, the string:
' hello "my name" is "Tom"'
should match the words:
hello
my name
is
Tom
I managed to match the words enclosed in double quotes by using:
/"([^\"]*)"/
but I can't figure out how to incorporate the surrounded by whitespace characters to get 'hello', 'is', and 'Tom' while at the same time not screw up 'my name'.
Any help with this would be appreciated!
result = ' hello "my name" is "Tom"'.split(/\s+(?=(?:[^"]*"[^"]*")*[^"]*$)/)
will work for you. It will print
=> ["", "hello", "\"my name\"", "is", "\"Tom\""]
Just ignore the empty strings.
Explanation
"
\\s # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
(?= # Assert that the regex below can be matched, starting at this position (positive lookahead)
(?: # Match the regular expression below
[^\"] # Match any character that is NOT a “\"”
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\" # Match the character “\"” literally
[^\"] # Match any character that is NOT a “\"”
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\" # Match the character “\"” literally
)* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
[^\"] # Match any character that is NOT a “\"”
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\$ # Assert position at the end of a line (at the end of the string or before a line break character)
)
"
You can use reject like this to avoid empty strings
result = ' hello "my name" is "Tom"'
.split(/\s+(?=(?:[^"]*"[^"]*")*[^"]*$)/).reject {|s| s.empty?}
prints
=> ["hello", "\"my name\"", "is", "\"Tom\""]
text = ' hello "my name" is "Tom"'
text.scan(/\s*("([^"]+)"|\w+)\s*/).each {|match| puts match[1] || match[0]}
Produces:
hello
my name
is
Tom
Explanation:
0 or more spaces followed by
either
some words within double-quotes OR
a single word
followed by 0 or more spaces
You can try this regex:
/\b(\w+)\b/
which uses \b to find the word boundary. And this web site http://rubular.com/ is helpful.

Resources