Ruby find a whole math expression in a string using RegEx [duplicate] - ruby

This question already has answers here:
Regular expression to match digits and basic math operators
(10 answers)
Closed 8 years ago.
I'm trying to write a program that will take in a string and use RegEx to search for certain mathematical expressions, such as 1 * 3 + 4 / 2. Only operators to look for are [- * + /].
So far:
string = "something something nothing 1/ 2 * 3 nothing hello world"
a = /\d+\s*[\+ \* \/ -]\s*\d+/
puts a.match(string)
produces:
1/ 2
I want to grab the whole equation 1/ 2 * 3. I'm essentially brand new to the world of regex, so any help will be appreciated!
New Information:
a = /\s*-?\d+(?:\s*[-\+\*\/]\s*\d+)+/
Thank you to zx81 for his answer. I had to modify it in order to work. For some reason ^ and $ do not produce any output, or perhaps a nil output, for a.match(string). Also, certain operators need a \ before them.
Version that works with parentheses:
a = /\(* \s* \d+ \s* (( [-\+\*\/] \s* \d+ \)* \s* ) | ( [-\+\*\/] \s* \(* \s* \d+ \s* ))+/

Regex Calculators
First off, you might want to have a look at this question about Regex Calculators (both RPN and non-RPN version).
But we're not dealing with parentheses, so we can go with something like:
^\s*-?\d+(?:\s*[-+*/]\s*\d+)+$
See demo.
Explanation
The ^ anchor asserts that we are at the beginning of the string
\s* allows optional spaces
-? allows an optional minus before the first digit
\d+ matches the first digits
The non-capturing group (?:\s*[-+*/]\s*\d+) matches optional spaces, an operator, optional spaces and digits
the + quantifier matches that one or more times
The $ anchor asserts that we are at the end of the string
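A minimal sketch of why the anchored pattern returns nil for the asker's input while an unanchored variant finds the expression. The variable names are illustrative, and the leading \s* is dropped in the unanchored version as an assumption so the match does not start with a space. In a /.../ literal only the forward slash needs a backslash here; + and * need no escaping inside a character class.

```ruby
string = "something something nothing 1/ 2 * 3 nothing hello world"

# Anchored: the whole line must be a math expression, so nothing matches here.
anchored = /^\s*-?\d+(?:\s*[-+*\/]\s*\d+)+$/

# Unanchored: the expression may appear anywhere inside the string.
unanchored = /-?\d+(?:\s*[-+*\/]\s*\d+)+/

p anchored.match(string)   # prints nil
found = unanchored.match(string)
puts found[0]              # prints 1/ 2 * 3
```
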

Related

Regex for selecting substrings before and after a string

I am trying to find a right regex expression to select substrings between another substring, which I'd like to exclude. For example in this string:
11 - 12£ in $ + 13
I want to select 12£ and $. Basically, it's substrings around in, until I hit an array of values I want to use as end/start, in this case, arithmetic operators %w(+ - / *)
So far closest I got was using this regex /(.\d\p{Sc})\sin\s(\p{Sc})/
Some more examples:
10 - 12$ in £ - 13$ should return 12$ and £
12 $ in £ should return 12$ and £
100£in$ should return 100£ and $
sentence.match(/[^-+*\/]*in[^-+*\/]*/).to_s.strip.split(/ *in */)
[^-+*\/]* matches any run of characters that are not arithmetic operators
this will hence get everything from the "opening" to the "closing" operator that surround an in
#strip removes the leading and trailing whitespaces
finally, split into two strings, removing in and the spaces around it
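As a rough check, the one-liner can be wrapped in a small helper and run against the examples from the question. The method name around_in is hypothetical. Note that it matches the letters "in" anywhere, so a word containing "in" would also trigger it, and that spaces inside an operand (as in "12 $") are kept.

```ruby
# Hypothetical helper wrapping the one-liner from the answer above.
def around_in(sentence)
  sentence.match(/[^-+*\/]*in[^-+*\/]*/).to_s.strip.split(/ *in */)
end

p around_in('11 - 12£ in $ + 13')   # ["12£", "$"]
p around_in('10 - 12$ in £ - 13$')  # ["12$", "£"]
p around_in('100£in$')              # ["100£", "$"]
p around_in('12 $ in £')            # ["12 $", "£"] (inner space is kept)
```
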
r = /
\s+[+*\/-]\s+ # match 1+ whitespaces, 1 char in char class, 1+ whitespaces
(\S+) # match 1+ non-whitespaces in capture group 1
\s+in\s+ # match 1+ whitespaces, 'in', 1+ whitespaces
(\S+) # match 1+ non-whitespaces in capture group 2
\s+[+*\/-]\s+ # match 1+ whitespaces, 1 char in char class, 1+ whitespaces
/x # free-spacing regex definition mode
str = '11 - 12£ in $ + 13 / 13F in % * 4'
str.scan(r)
#=> [["12£", "$"], ["13F", "%"]]
See the doc for String#scan to see how scan handles capture groups.
Note that '-' must be first or last in the character class [+*\/-] (or be escaped); anywhere else it would denote a range.

How exactly does this work string.split(/\?|\.|!/).size?

I know, or at least I think I know, what this does (string.split(/\?|\.|!/).size); splits the string at every ending punctuation into an array and then gets the size of the array.
The part I am confused with is (/\?|\.|!/).
Thank you for your explanation.
Regular expressions are surrounded by slashes / /
The backslash before the question mark and dot means use those characters literally (don't interpret them as special instructions)
The vertical pipes are "or"
So you have / then question mark \? then "or" | then period \. then "or" | then exclamation point ! then / to end the expression.
/\?|\.|!/
It's a Regular Expression. That particular one matches any '?', '.' or '!' in the target string.
You can learn more about them here: http://regexr.com/
A regular expression splitting on the char "a" would look like this: /a/. A regular expression splitting on "a" or "b" is like this: /a|b/. So splitting on "?", "!" and "." would look like /?|!|./ - but it does not. Unfortunately, "?", and "." have special meaning in regexps which we do not want in this case, so they must be escaped, using "\".
A way to avoid this is to use Regexp.union("?","!",".") which results in /\?|!|\./
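A short sketch of the Regexp.union approach; the sample sentence and variable names are made up for illustration.

```ruby
# Regexp.union escapes each string for us and joins them with "|".
punct = Regexp.union("?", "!", ".")
p punct                      # /\?|!|\./

parts = "Hi there. How are you? Fine!".split(punct)
p parts                      # ["Hi there", " How are you", " Fine"]
p parts.size                 # 3
```
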
(/\?|\.|!/)
Working outside in:
The outer parentheses () are just the method-call parentheses around the argument to split; they are not part of the regex and capture nothing.
The // tell Ruby you're using a Regular Expression.
\? Matches any ?
\. Matches any .
! Matches any !
The preceding \ tells Ruby we want to find these specific characters in the string, rather than using them as special characters.
Special characters (that need to be escaped to be matched) are:
. | ( ) [ ] { } + \ ^ $ * ?.
There is a nice guide to Ruby RegEx at:
http://rubular.com/ & http://www.tutorialspoint.com/ruby/ruby_regular_expressions.htm
For SO answers that involve regular expressions, I often use the "extended" mode, which makes them self-documenting. This one would be:
r = /
\? # match a question mark
| # or
\. # match a period
| # or
! # match an exclamation mark
/x # extended mode
str = "Out, damn'd spot! out, I say!—One; two: why, then 'tis time to " +
"do't.—Hell is murky.—Fie, my lord, fie, a soldier, and afeard?"
str.split(r)
#=> ["Out, damn'd spot",
# " out, I say",
# "—One; two: why, then 'tis time to do't",
# "—Hell is murky",
# "—Fie, my lord, fie, a soldier, and afeard"]
str.split(r).size #=> 5
@steenslag mentioned Regexp::union. You could also use Regexp::new to write (with single quotes):
r = Regexp.new('\?|\.|!')
#=> /\?|\.|!/
but it really doesn't buy you anything here. You might find it useful in other situations, however.

Splitting the content of brackets without separating the brackets ruby

I am currently working on a ruby program to calculate terms. It works perfectly fine except for one thing: brackets. I need to filter the content or at least, to put the content into an array, but I have tried for an hour to come up with a solution. Here is my code:
splitted = term.split(/\(+|\)+/)
I need an array instead of the brackets, for example:
"1-(2+3)" #=>["1", "-", ["2", "+", "3"]]
I already tried this:
/(\((?<=.*)\))/
but it returned:
Invalid pattern in look-behind.
Can someone help me with this?
UPDATE
I forgot to mention, that my program will split the term, I only need the content of the brackets to be an array.
If you need to keep track of the hierarchy of parentheses with arrays, you won't manage it just with regular expressions. You'll need to parse the string word by word, and keep a stack of expressions.
Pseudocode:
Expressions = new stack
Add new array on stack
For each word in string:
  If word is "(": add a new array on the stack
  Else if word is ")": remove the last array from the stack and add it to the (new) last array of the stack
  Else: add the word to the last array of the stack
When exiting the loop, there should be only one array in the stack (if not, you have inconsistent opening/closing parentheses).
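The pseudocode above could be sketched in Ruby roughly as follows. The method name nest_tokens and the tokenizer (unsigned integers, the four operators, and parentheses only) are assumptions for illustration, not part of the original answer.

```ruby
# Hypothetical helper: turn a term into nested arrays, one array per bracket pair.
def nest_tokens(term)
  # Assumed tokenizer: runs of digits, or a single operator/parenthesis.
  tokens = term.scan(/\d+|[-+*\/()]/)
  stack = [[]]                       # start with one array on the stack

  tokens.each do |token|
    case token
    when "("
      stack.push([])                 # open a new nested array
    when ")"
      raise ArgumentError, "unbalanced ')'" if stack.size < 2
      finished = stack.pop           # close the current array...
      stack.last << finished         # ...and append it to its parent
    else
      stack.last << token
    end
  end

  raise ArgumentError, "unbalanced '('" unless stack.size == 1
  stack.first
end

p nest_tokens("1-(2+3)")      # ["1", "-", ["2", "+", "3"]]
p nest_tokens("2*(1+(3-4))")  # ["2", "*", ["1", "+", ["3", "-", "4"]]]
```
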
Note: If your ultimate goal is to evaluate the expression, you could save time and parse the string in Postfix aka Reverse-Polish Notation.
Also consider using off-the-shelf libraries.
A solution depends on the pattern you expect between the parentheses, which you have not specified. (For example, for "(st12uv)" you might want ["st", "12", "uv"], ["st12", "uv"], ["st1", "2uv"] and so on). If, as in your example, it is a natural number followed by a +, followed by another natural number, you could do this:
str = "1-( 2+ 3)"
r = /
\(\s* # match a left parenthesis followed by >= 0 whitespace chars
(\d+) # match one or more digits in a capture group
\s* # match >= 0 whitespace chars
(\+) # match a plus sign in a capture group
\s* # match >= 0 whitespace chars
(\d+) # match one or more digits in a capture group
\s* # match >= 0 whitespace chars
\) # match a right parenthesis
/x
str.scan(r).first
#=> ["2", "+", "3"]
Suppose instead + could be +, -, * or /. Then you could change:
(\+)
to:
([-+*\/])
Note that, in a character class, + needn't be escaped and - needn't be escaped if it is the first or last character of the class (as in those cases it would not signify a range).
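Put together, the pattern with the wider operator class might look like this on one line. It is only a sketch for a single non-nested pair of operands; the extra input strings are invented for illustration.

```ruby
# Same pattern as above, with the operator class swapped in for (\+).
r = /\(\s*(\d+)\s*([-+*\/])\s*(\d+)\s*\)/

p "1-( 2+ 3)".scan(r).first    # ["2", "+", "3"]
p "4*(10 / 5)".scan(r).first   # ["10", "/", "5"]
p "(7-2)+(3*4)".scan(r)        # [["7", "-", "2"], ["3", "*", "4"]]
```
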
Incidentally, you received the error message "Invalid pattern in look-behind" because Ruby's lookbehinds cannot contain variable-length matches (i.e., .*). With positive lookbehinds you can get around that by using \K instead. For example,
r = /
\d+ # match one or more digits
\K # forget everything previously matched
[a-z]+ # match one or more lowercase letters
/x
"123abc"[r] #=> "abc"

Ruby regex need to exclude pattern

I have the following strings
ALEXANDRITE OVAL 5.1x7.9 GIA# 6167482443 FINE w:1.16
ALEXANDRITE OVAL 4x6 FINE w:1.16
I want to match the 5.1 and 7.9 and the 4 and 6 and not w:1.16 or w: 1.16 or the 6167482443. So far I managed to come up with these:
Matching the w:1.16 w: 1.16
([w][:]\d\.?\d*|[w][:]\s?\d\.?\d*)
Matching the other digits:
\d+\.?\d{,3}
I kind of expected this not to return the long number sequence because of the {,3}, but it still does.
My questions are :
1. How do I combine the two patterns excluding one and returning the other?
2. How do I exclude the long sequence of numbers? Why is it not being excluded now?
Thanks!
You could simply use the below regex.
\b(\d+(?:\.\d+)?)x(\d+(?:\.\d+)?)
DEMO
Explanation:
\b              the boundary between a word char (\w) and something that is not a word char
(               group and capture to \1:
  \d+           digits (0-9) (1 or more times)
  (?:           group, but do not capture (optional):
    \.          '.'
    \d+         digits (0-9) (1 or more times)
  )?            end of grouping
)               end of \1
x               'x'
(               group and capture to \2:
  \d+           digits (0-9) (1 or more times)
  (?:           group, but do not capture (optional):
    \.          '.'
    \d+         digits (0-9) (1 or more times)
  )?            end of grouping
)               end of \2
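A quick sketch running that pattern over the two strings from the question; the constant and variable names are illustrative. Because the pattern requires an x between the two numbers, the GIA number and the w:1.16 weight are never candidates.

```ruby
dims = /\b(\d+(?:\.\d+)?)x(\d+(?:\.\d+)?)/

first  = "ALEXANDRITE OVAL 5.1x7.9 GIA# 6167482443 FINE w:1.16"
second = "ALEXANDRITE OVAL 4x6 FINE w:1.16"

p first.scan(dims)    # [["5.1", "7.9"]]
p second.scan(dims)   # [["4", "6"]]
```
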
([\d\.])+x([\d\.])+
matches
5.1x7.9
4x6
(\d+(?:\.\d+)?)(?=x)|(?<=x)(\d+(?:\.\d+)?)
You can try this. See demo.
http://regex101.com/r/wQ1oW3/6
2) To ignore the long number, use \b\d{1,3}\b so that word boundaries delimit the digits.
http://regex101.com/r/wQ1oW3/7
Otherwise part of the long number will still match.

Ruby REGEX for letters and numbers or letters followed by period, letters and numbers

I am trying to construct a Ruby REGEX that will only allow the following:
some string (read letter only characters)
some string followed by numbers
some string followed by a period and another string
some string followed by a period and another string followed by numbers
period is only allowed if another string follows it
no other periods are allowed afterwards
numbers may only be at the very end
I have got \A[[^0-9.]a-z]*([0-9]*|((.)([[^0-9]a-z]*)[0-9]*))\z but I can't get what I need. This allows:
test.
test..
test.123
What is the correct REGEX? If someone could explain what I am doing wrong to help me understand for future that would be great too.
Edit: update requirements to be more descriptive
So I'm guessing you want identifiers separated by a period (.).
By identifier I mean:
a string consisting of alphanumeric characters
that does not start with a number
and is at least one character long.
Written out as a grammar, it would look something like this:
EXPR := IDENT "." EXPR | IDENT
IDENT := [A-Z]\w*
And the regex for this would be the following:
/\A[A-Z]\w*(\.[A-Z]\w*)*\Z/i
Try it out here
Note Due to the behaviour of \w this pattern will also accept _ (underscores) after the first character (i.e. test_123 will also pass).
EDIT to reflect update of question
So the grammar you want is actually like this:
EXPR := IDENT [0-9]*
IDENT := STR | STR "." STR
STR := [A-Z]+
And the regexp then is this:
/\A[A-Z]+(\.[A-Z]+)?[0-9]*\z/i
Try this one out here
The explanation is as follows:
/ # start Regexp
\A # start of string
[A-Z]+ # "some string"
(
\. # followed by a period
[A-Z]+ # and another string
)? # period + another string is optional
[0-9]* # optional digits at the end
\z # end of string
/i # this regexp is case insensitive.
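A small sketch checking that regexp against the cases from the question. The sample inputs beyond test., test.. and test.123 are made up for illustration, and Regexp#match? needs Ruby 2.4 or later.

```ruby
valid = /\A[A-Z]+(\.[A-Z]+)?[0-9]*\z/i

accepted = %w[test test123 test.abc test.abc123]
rejected = %w[test. test.. test.123 123test test.a.b test1a]

p accepted.all?  { |s| valid.match?(s) }   # true
p rejected.none? { |s| valid.match?(s) }   # true
```
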
You can try
^[a-z]+\.?[a-z]+[0-9]*$
Here is demo
Note: use \A and \z to match starting and ending of string instead of line.
You need to escape the . because an unescaped . matches any single character.
Pattern explanation:
^ the beginning of the line
[a-z]+ any character of: 'a' to 'z' (1 or more times)
\.? '.' (optional)
[a-z]+ any character of: 'a' to 'z' (1 or more times)
[0-9]* any character of: '0' to '9' (0 or more times)
$ the end of the line
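To illustrate the note about anchors, here is a sketch comparing the line-anchored pattern with a string-anchored variant on a multi-line input. The variable names and the input are made up. It also shows that this pattern requires at least two letters, since both [a-z]+ parts are mandatory.

```ruby
line_anchored   = /^[a-z]+\.?[a-z]+[0-9]*$/
string_anchored = /\A[a-z]+\.?[a-z]+[0-9]*\z/

input = "test.abc\n!!!"

p line_anchored.match?(input)     # true  (the first line alone satisfies ^...$)
p string_anchored.match?(input)   # false (the whole string must match)
p string_anchored.match?("a")     # false (a single letter is rejected)
```
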
