Ruby evaluate an expression with regex, no eval - ruby

Sorry if this is duplicated. I thought I'd reword my question a little bit.
How could I use regex to evaluate a mathematical expression? Without using the eval function.
Example expressions:
math1 = "1+1"
math2 = "3+2-1"
I would like it to work for a variable number of numbers in the expression like I showed in the example.

This is just a bad idea. Regexp is not a parser, nor an evaluator.
Use a grammar to describe your expressions. Parse it with a formal parser like the lovely ruby gem Treetop. Then evaluate the abstract syntax tree (AST) produced by the parser.
Gosh, Treetop's arithmetic example practically gives you the solution for free.

This is a little late, but I wrote a gem for evaluating arbitrary mathematical expressions (and it doesn't use eval internally): https://github.com/rubysolo/dentaku

For addition and subtraction, this should work
(?:(/d+)([-+]))+(/d+)
This means:
one or more digits, followed by exactly one plus or minus
the above can be repeated as many times as required (this is a non capturing group)
and then must end with one or more digits.
Note that each individual number and sign are captured in groups 1..n
So to evaluate, you could take captures 1 and 3, applying the sign from capture 2. Then apply the sign from capture 4 (if it exists) with the previous result and the number from capture 5 (which must exist if capture 4 exists) and so on...
So to evaluate, in psuedo code:
i=1
result=capture(i)
loop while i <= (n-2) (where n is the capture count):
If capture(i+1) == "-" // is subtraction
result = result - capture(i+2)
Else // is addition
result = result + capture(i+2)
End if
i = i + 2
End while
This is only going to work for simple addition and subtraction like in the examples you provided, as it relies on left to right associativity. As others have suggested, you'll probably need to properly parse anything more complex, eg by building a tree of nodes that can then be evaluated in the correct (depth-first?) order.

This is really messy…
math2 = "12+3-4"
head, *tail = math2.scan(/(?<digits>\d+)(?<op>[\+\-\*\/])?/)
.map{|(digits,op)|
[digits.to_i,op]
}
.reverse
tail.inject(head.first){|sum,(digits,op)|
op.nil? ?
digits :
digits.send(op,sum)
}
# => 11
You should really consider a parser though.

Related

Perform subtraction within regular expression

I have the following RSpec output:
30 examples, 15 failures
I would like to subtract the second number from the first. I have this code:
def capture_passing_score(output)
captures = output.match(/^(?<total>\d+)\s*examples,\s*(?<failed>\d+)\s*failures$/)
captures[:total].to_i - captures[:failed].to_i
end
I am wondering if there is a way to do the calculation within a regular expression. Ideally, I'd avoid the second step in my code, and subtract the numbers within a regex. Performing mathematical operations may not be possible with Ruby's (or any) regex engine, but I couldn't find an answer either way. Is this possible?
Nope.
By every definition I have ever seen, Regular Expressions are about text processing. It is character based pattern matching. Numbers are a class of textual characters in Regex and do not represent their numerical values. While syntactic sugar may mask what is actually being done, you still need to convert the text to a numeric value to perform the subtraction.
WikiPedia
RubyDoc
If you know the format is going to remain consistent, you could do something like this:
output.scan(/\d+/).map(&:to_i).inject(:-)
It's not doing the subtraction via regex, but it does make it more concise.

Why do we need prefix, postfix notation

I know how each of them can be converted to one another but never really understood what their applications are. The usual infix operation is quite readable, but where does it fail which led to inception of prefix and postfix notation
Infix notation is easy to read for humans, whereas pre-/postfix notation is easier to parse for a machine. The big advantage in pre-/postfix notation is that there never arise any questions like operator precedence.
For example, consider the infix expression 1 # 2 $ 3. Now, we don't know what those operators mean, so there are two possible corresponding postfix expressions: 1 2 # 3 $ and 1 2 3 $ #. Without knowing the rules governing the use of these operators, the infix expression is essentially worthless.
Or, to put it in more general terms: it is possible to restore the original (parse) tree from a pre-/postfix expression without any additional knowledge, but the same isn't true for infix expressions.
Postfix notation, also known as RPN, is very easy to process left-to-right. An operand is pushed onto a stack; an operator pops its operand(s) from the stack and pushes the result. Little or no parsing is necessary. It's used by Forth and by some calculators (HP calculators are noted for using RPN).
Prefix notation is nearly as easy to process; it's used in Lisp.
At least for the case of the prefix notation: The advantage of using a prefix operator is that syntactically, it reads as if the operator is a function call
Another aspect of prefix/postfix vs. infix is that the arity of the operator (how many arguments it is applied to) no longer has to be limited to exactly 2. It can be more, or sometimes less (0 or 1 when defaults are implied naturally, like zero for addition/subtraction, one for multiplication/division).

Need a regular expression to match a dynamically-determined repeat count

I need a ruby regexp pattern that matches a string containing a letter (for simplicity say 'a') n times and then n at the end.
For example, it should match "aaa3", "aaaa4" etc but not "a2" or "aaa1", etc.
I can do it in Perl, but not in Ruby.
/^(a+)(??{length($1)})$/
Fun, eh?
Check it out: http://ideone.com/ShB6C
That is not possible in regex since it is not a regular language (that's easy to prove with the Pumping Lemma for Regular Languages). I'm not sure how much more powerful ruby regex is than a true Regular Expression, but I doubt it's powerful enough for this. You can set a finite limit on it and state each possibility like:
a1|aa2|aaa3|aaaa4|aaaaa5||aaaaaa6||aaaaaaa7||aaaaaaaa8||aaaaaaaaa9
Since all finite lanugages are regular, but it would be much easy to use string operations to count the number of times a letter appears and then parse for that integer in the string right after the last of that letter.
I just woke up, so take this with a grain of salt, but instead of doing it with a single regex, an easy way to do it would be
def f(s)
s =~ /(a+)(\d)/
$1.size == $2.to_i
end #=> nil
f 'aaa3' #=> true
f 'aa3' #=> false

Regular Expression help in Ruby

Can anybody help me write a regular expression which could find all the instances of the following in a long string >
type="array" count="x" total="y"
where x and y could be any numbers from 1 to 100.
language is ruby.
First, since we'll use the regex for a number twice, we'll save it as its own variable. Note that the number regex is comprised of three separate pieces: one-digit numbers, two-digit numbers, and three-digit numbers. This is a good rule of thumb to use when trying to make a regex to match a range of numbers. It's easy to get it wrong otherwise (allowing strings like "07").
Once you have the number regex, the rest is easy.
number = /[1-9]|[1-9][0-9]|100/
regex = /type="array" count="#{number}" total="#{number}"/
string.scan(regex)
This will return an array of matches
long_string.scan(/type="array" count="(?:[1-9]\d?|100)" total="(?:[1-9]\d?|100)")

How to get rid of unnecessary parentheses in mathematical expression

Hi I was wondering if there is any known way to get rid of unnecessary parentheses in mathematical formula. The reason I am asking this question is that I have to minimize such formula length
if((-if(([V].[6432])=0;0;(([V].[6432])-([V].[6445]))*(((([V].[6443]))/1000*([V].[6448])
+(([V].[6443]))*([V].[6449])+([V].[6450]))*(1-([V].[6446])))))=0;([V].[6428])*
((((([V].[6443]))/1000*([V].[6445])*([V].[6448])+(([V].[6443]))*([V].[6445])*
([V].[6449])+([V].[6445])*([V].[6450])))*(1-([V].[6446])));
it is basically part of sql select statement. It cannot surpass 255 characters and I cannot modify the code that produces this formula (basically a black box ;) )
As you see many parentheses are useless. Not mentioning the fact that:
((a) * (b)) + (c) = a * b + c
So I want to keep the order of operations Parenthesis, Multiply/Divide, Add/Subtract.
Im working in VB, but solution in any language will be fine.
Edit
I found an opposite problem (add parentheses to a expression) Question.
I really thought that this could be accomplished without heavy parsing. But it seems that some parser that will go through the expression and save it in a expression tree is unevitable.
If you are interested in remove the non-necessary parenthesis in your expression, the generic solution consists in parsing your text and build the associated expression tree.
Then, from this tree, you can find the corresponding text without non-necessary parenthesis, by applying some rules:
if the node is a "+", no parenthesis are required
if the node is a "*", then parenthesis are required for left(right) child only if the left(right) child is a "+"
the same apply for "/"
But if your problem is just to deal with these 255 characters, you can probably just use intermediate variables to store intermediate results
T1 = (([V].[6432])-([V].[6445]))*(((([V].[6443]))/1000*([V].[6448])+(([V].[6443]))*([V].[6449])+([V].[6450]))*(1-([V].[6446])))))
T2 = etc...
You could strip the simplest cases:
([V].[6432]) and (([V].[6443]))
Becomes
v.[6432]
You shouldn't need the [] around the table name or its alias.
You could shorten it further if you can alias the columns:
select v.[6432] as a, v.[6443] as b, ....
Or even put all the tables being queried into a single subquery - then you wouldn't need the table prefix:
if((-if(a=0;0;(a-b)*((c/1000*d
+c*e+f)*(1-g))))=0;h*
(((c/1000*b*d+c*b*
e+b*f))*(1-g));
select [V].[6432] as a, [V].[6445] as b, [V].[6443] as c, [V].[6448] as d,
[V].[6449] as e, [V].[6450] as f,[V].[6446] as g, [V].[6428] as h ...
Obviously this is all a bit psedo-code, but it should help you simplify the full statement
I know this thread is really old, but as it is searchable from google.
I'm writing a TI-83 plus calculator program that addresses similar issues. In my case, I'm trying to actually solve the equation for a specific variable in number, but it may still relate to your problem, although I'm using an array, so it might be easier for me to pick out specific values...
It's not quite done, but it does get rid of the vast majority of parentheses with (I think), a somewhat elegant solution.
What I do is scan through the equation/function/whatever, keeping track of each opening parenthese "(" until I find a closing parenthese ")", at which point I can be assured that I won't run into any more deeply nested parenthese.
y=((3x + (2))) would show the (2) first, and then the (3x + (2)), and then the ((3x + 2))).
What it does then is checks the values immediately before and after each parenthese. In the case above, it would return + and ). Each of these is assigned a number value. Between the two of them, the higher is used. If no operators are found (*,/,+,^, or -) I default to a value of 0.
Next I scan through the inside of the parentheses. I use a similar numbering system, although in this case I use the lowest value found, not the highest. I default to a value of 5 if nothing is found, as would be in the case above.
The idea is that you can assign a number to the importance of the parentheses by subtracting the two values. If you have something like a ^ on the outside of the parentheses
(2+3)^5
those parentheses are potentially very important, and would be given a high value, (in my program I use 5 for ^).
It is possible however that the inside operators would render the parentheses very unimportant,
(2)^5
where nothing is found. In that case the inside would be assigned a value of 5. By subtracting the two values, you can then determine whether or not a set of parentheses is neccessary simply by checking whether the resulting number is greater than 0. In the case of (2+3)^5, a ^ would give a value of 5, and a + would give a value of 1. The resulting number would be 4, which would indicate that the parentheses are in fact needed.
In the case of (2)^5 you would have an inner value of 5 and an outer value of 5, resulting
in a final value of 0, showing that the parentheses are unimportant, and can be removed.
The downside to this is that, (at least on the TI-83) scanning through the equation so many times is ridiculously slow. But if speed isn't an issue...
Don't know if that will help at all, I might be completely off topic. Hope you got everything up and working.
I'm pretty sure that in order to determine what parentheses are unnecessary, you have to evaluate the expressions within them. Because you can nest parentheses, this is is the sort of recursive problem that a regular expression can only address in a shallow manner, and most likely to incorrect results. If you're already evaluating the expression, maybe you'd like to simplify the formula if possible. This also gets kind of tricky, and in some approaches uses techniques that that are also seen in machine learning, such as you might see in the following paper: http://portal.acm.org/citation.cfm?id=1005298
If your variable names don't change significantly from 1 query to the next, you could try a series of replace() commands. i.e.
X=replace([QryString],"(([V].[6443]))","[V].[6443]")
Also, why can't it surpass 255 characters? If you are storing this as a string field in an Access table, then you could try putting half the expression in 1 field and the second half in another.
You could also try parsing your expression using ANTLR, yacc or similar and create a parse tree. These trees usually optimize parentheses away. Then you would just have to create expression back from tree (without parentheses obviously).
It might take you more than a few hours to get this working though. But expression parsing is usually the first example on generic parsing, so you might be able to take a sample and modify it to your needs.

Resources