How can I avoid left-recursion in Treetop without backtracking?

I am having trouble avoiding left-recursion in this simple expression parser I'm working on. Essentially, I want to parse the equation 'f x y' into two expressions 'f x' and '(f x) y' (with implicit parentheses). How can I do this while avoiding left-recursion and backtracking? Does there have to be an intermediate step?
#!/usr/bin/env ruby
require 'rubygems'
require 'treetop'
Treetop.load_from_string DATA.read
parser = ExpressionParser.new
p parser.parse('f x y').value
__END__
grammar Expression
  rule equation
    expression (w+ expression)*
  end
  rule expression
    expression w+ atom
  end
  rule atom
    var / '(' w* expression w* ')'
  end
  rule var
    [a-z]
  end
  rule w
    [\s\n\t\r]
  end
end

You haven't given enough information about your desired result. In particular, do you expect "f(a b) y" to parse as "(f(a(b))) y"? I assume you do, which means that a function not followed by an open parenthesis has arity one.
So you want to say:
rule equation
  expression w* var / expression w* parenthesised_list
end
rule parenthesised_list
  '(' w* ( expression w* )+ ')'
end
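Independently of the arity question, the usual way to get left-associative application in a PEG such as Treetop is to drop the left-recursive "expression w+ atom" rule. Replace it with repetition, "atom (w+ atom)*", and then fold the list of atoms from the left.

The sketch below is plain Ruby rather than Treetop, so it runs without the gem. It hand-codes that repetition with StringScanner. It assumes single-letter variables, as in the question's grammar. ApplicationParser and the nested-array output [function, argument] are hypothetical choices made for illustration. With them, 'f x y' becomes ((f x) y) without any left recursion.

```ruby
require 'strscan'

# Hypothetical hand-written analogue of the PEG rules
#   expression <- atom (w+ atom)*
#   atom       <- var / '(' w* expression w* ')'
#   var        <- [a-z]
# Each application is represented as a two-element array [function, argument].
class ApplicationParser
  def initialize(source)
    @s = StringScanner.new(source)
  end

  def parse
    result = expression
    raise "unexpected input at #{@s.pos}" unless @s.eos?
    result
  end

  private

  # atom (w+ atom)*, folded left so that "f x y" becomes [["f", "x"], "y"]
  def expression
    node = atom or raise "expected an atom at #{@s.pos}"
    loop do
      start = @s.pos
      break unless @s.skip(/\s+/)   # w+
      arg = atom
      if arg.nil?                   # spaces not followed by an atom:
        @s.pos = start              # give the spaces back and stop
        break
      end
      node = [node, arg]            # left fold: ((f x) y)
    end
    node
  end

  def atom
    if (v = @s.scan(/[a-z]/))
      v
    elsif @s.skip(/\(\s*/)
      inner = expression
      @s.skip(/\s*\)/) or raise "expected ')' at #{@s.pos}"
      inner
    end
  end
end

p ApplicationParser.new("f x y").parse       # => [["f", "x"], "y"]
p ApplicationParser.new("f (g x) y").parse   # => [["f", ["g", "x"]], "y"]
```

In a Treetop grammar, the same fold would be written as a method on the repetition's node that walks its elements left to right.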
If, on the other hand, you have external (to the grammar) knowledge of the arity of f, and you want to iterate "expression" exactly that many times (as happens when parsing TeX, for example), then you will need to use a semantic predicate &{|s| ...} inside the iterated expression list. Beware that the argument passed to the block of a semantic predicate is not a SyntaxNode, which cannot be constructed yet because this sequence sub-rule has not yet succeeded. It is the array of nodes accumulated so far in the sequence. The truthiness of the block's return value decides whether the predicate succeeds, so it can be used to stop the iteration.
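A rough plain-Ruby model of the counted repetition such a predicate enforces follows. It is not Treetop's semantic-predicate mechanism. The ARITY table and parse_term are hypothetical stand-ins for the "external knowledge" and the grammar rule, and names missing from the table are treated as variables.

```ruby
# Hypothetical external knowledge: how many arguments each function takes.
# Names missing from the table are treated as plain variables (arity 0).
ARITY = { "f" => 2, "g" => 1 }.freeze

# Parses a prefix term from an array of tokens, consuming exactly
# ARITY[name] arguments after each function name (the job a semantic
# predicate would do by failing once enough arguments have been read).
# Returns [tree, remaining_tokens].
def parse_term(tokens)
  name, *rest = tokens
  raise ArgumentError, "unexpected end of input" if name.nil?
  args = []
  ARITY.fetch(name, 0).times do
    arg, rest = parse_term(rest)
    args << arg
  end
  [args.empty? ? name : [name, *args], rest]
end

tree, rest = parse_term("f g x y".split)
p tree   # => ["f", ["g", "x"], "y"]
p rest   # => []
```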
Another tool you might consider using is lookahead (!stuff_I_dont_expect_to_follow or &stuff_that_must_follow).
You can also ask such questions in http://groups.google.com/group/treetop-dev

Related

How to implement addition operator in math parser (ruby)

I'm trying to build my own evaluator for mathematical expressions in Ruby, and before doing that I am trying to implement a parser to break the expression into a tree (of arrays). It correctly breaks down expressions with parentheses, but I am having lots of trouble figuring out how to make it correctly break up an expression with operator precedence for addition.
Right now, a string like 1+2*3+4 becomes 1+[2*[3+4]] instead of 1+[2*3]+4. I'm trying to do the simplest solution possible.
Here is my code:
@d = 0
@error = false
# manipulate an array by reference
def calc_expr expr, array
  until @d == expr.length
    c = expr[@d]
    case c
    when "("
      @d += 1
      array.push calc_expr(expr, Array.new)
    when ")"
      @d += 1
      return array
    when /[\*\/]/
      @d += 1
      array.push c
    when /[\+\-]/
      @d += 1
      array.push c
    when /[0-9]/
      # count how many digits the number starting at @d has
      x = 0
      matched = false
      until matched == true
        case expr[@d+x]
        when /[0-9]/
          x += 1
        else matched = true
        end
      end
      array.push expr[@d,x].to_i
      @d += x
    else
      unless @error
        @error = true
        puts "Problem evaluating expression at index:#{@d}"
        puts "Char '#{expr[@d]}' not recognized"
      end
      return
    end
  end
  return array
end
@expression = ("(34+45)+(34+67)").gsub(" ","")
evaluated = calc_expr(@expression, Array.new)
puts evaluated.inspect   # prints [[34, "+", 45], "+", [34, "+", 67]]
Just for fun, here's a regex-based 'parser' that uses the nice "inside-out" approach suggested by @DavidLjungMadison. It performs simple "a*b" multiplication and division first, followed by "a+b" addition and subtraction. It then unwraps any number left alone in parentheses ("(a)" becomes "a") and starts over.
For simplicity in the regex I've only chosen to support integers; expanding each -?\d+ to something more robust, and replacing the .to_i with .to_f would allow it to work with floating point values.
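module Math
  def self.eval( expr )
    # Strip whitespace so the one-character lookarounds below are enough.
    expr = expr.gsub(/\s+/, '')
    go = true
    while go
      go = false
      # "a*b" and "a/b" first. The lookbehind keeps a binary minus out of the
      # left operand, so "2-1*-3" becomes "2--3" rather than "23".
      go = true while expr.sub!(/(?<![\d)])(-?\d+)([*\/])(-?\d+)/) {
        m,op,n = $1.to_i, $2, $3.to_i
        op=="*" ? m*n : m/n
      }
      # Then "a+b" and "a-b", skipping operands still tied to a "*" or "/" (which
      # happens while parentheses block the multiplication) and never splitting a
      # minus sign off the left operand: "(2+3)*4+1" now gives 21, not 25.
      go = true while expr.sub!(/((?<![*\/-])-\d+|(?<![*\/\d-])\d+)([+-])(-?\d+)(?![*\/\d])/) {
        a,op,b = $1.to_i, $2, $3.to_i
        op=="+" ? a+b : a-b
      }
      # Finally unwrap any number left alone in parentheses: "(7)" -> "7".
      go = true while expr.gsub!(/\((-?\d+)\)/,'\1')
    end
    expr.to_i
  end
end
And here's a bit of testing for it:
tests = {
  "1" => 1,
  "1+1" => 2,
  "1 + 1" => 2,
  "1 - 1" => 0,
  "-1" => -1,
  "1 + -1" => 0,
  "1 - -1" => 2,
  "2*3+1" => 7,
  "1+2*3" => 7,
  "(1+2)*3" => 9,
  "(2+(3-4) *3 ) * -6 * ( 3--4)" => 42,
  "(2+3)*4+1" => 21,
  "2-1*-3" => 5,
  "4*6/3*2" => 16
}
tests.each do |expr,expected|
  actual = Math.eval expr
  puts [expr.inspect,'=>',actual,'instead of',expected].join(' ') unless actual == expected
end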
Note that I use sub! instead of gsub! on the operators in order to survive the last test case. If I had used gsub! then "4*6/3*2" would first be turned into "24/6" and thus result in 4, instead of the correct expansion "24/3*2" → "8*2" → 16.
If you really need to do the expression parsing yourself, then you should search for both sides of an expression (such as '2*3') and replace that with either your answer (if you are trying to calculate the answer) or an expression object (such as your tree of arrays, if you want to keep the structure of the expressions and evaluate later). If you do this in the order of precedence, then precedence will be preserved.
As a simplified example, your expression parser should:
Repeatedly search for all innermost parens: /\(([^()]+)\)/ and replace each match with a call to the expression parser on $1 (sorry about the ugly regexp :)
Now all parens are gone, so you are looking at math operations between numbers and/or expression objects - treat them the same
Search for multiplication: /(expr|number)\*(expr|number)/
Replace this with either the answer or a new expression that encapsulates the two expressions. Again, this depends on whether you need the answer now or need the expression tree.
Search for addition: ... etc ...
If you are calculating the answer now, this is easy: each call to the expression parser eventually (after any necessary recursion) returns a number, which you can simply substitute for the original expression. It's a different story if you want to build the expression tree. How you deal with a mixture of strings and expression objects, so that you can still run a regexp over them, is up to you. You could encode a pointer to the expression object in the string, or else replace the entire string at the outset with an array of objects and use something similar to a regexp to search the array.
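The array-of-objects variant can be sketched in plain Ruby. This is a minimal illustration under stated assumptions, not the answer's own code. Only non-negative integers, the four binary operators and parentheses are supported (no unary operators), and tokenize, build_tree and evaluate_tree are hypothetical names.

The sketch works in three passes:
1. Each innermost parenthesised group is replaced by its subtree.
2. The * and / pairs are folded left to right.
3. The + and - pairs are folded left to right.

As a result, multiplication stays grouped tighter than addition, which gives the 1+[2*3]+4 shape the question asks for.

```ruby
# Split "1+2*3" into [1, "+", 2, "*", 3]; parentheses stay as "(" and ")".
def tokenize(str)
  str.scan(/\d+|[-+*\/()]/).map { |t| t.match?(/\A\d+\z/) ? t.to_i : t }
end

# Build a nested-array tree inside-out, working on the token array itself.
def build_tree(tokens)
  tokens = tokens.dup
  # 1. Replace every innermost "( ... )" with the subtree of its contents.
  while (close = tokens.index(")"))
    open = tokens[0...close].rindex("(") or raise "unbalanced ')'"
    tokens[open..close] = [build_tree(tokens[(open + 1)...close])]
  end
  raise "unbalanced '('" if tokens.include?("(")
  # 2. Fold "*" and "/" left to right, then 3. "+" and "-".
  [%w[* /], %w[+ -]].each do |ops|
    while (i = tokens.index { |t| ops.include?(t) })
      tokens[(i - 1)..(i + 1)] = [[tokens[i - 1], tokens[i], tokens[i + 1]]]
    end
  end
  tokens.first
end

# Evaluate a tree produced by build_tree (integer arithmetic).
def evaluate_tree(node)
  return node unless node.is_a?(Array)
  left, op, right = node
  evaluate_tree(left).public_send(op, evaluate_tree(right))
end

tree = build_tree(tokenize("1+2*3+4"))
p tree                  # => [[1, "+", [2, "*", 3]], "+", 4]
p evaluate_tree(tree)   # => 11
```

Because the folding is left to right, the additions nest to the left rather than forming a flat list. evaluate_tree then walks the tree with Integer#public_send.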
You should also consider dealing with unary operators: "3*+3"
(It might simplify things if the very first step you take is to convert all numbers to a simple expression object just containing the number, you might be able to deal with unary operators here, but that can involve tricky situations like "-3++1")
Or just find an expression parsing library as suggested. :)

Treetop infinite recursion with negative rule

I have the following treetop grammar:
grammar TestGrammar
  rule body
    text / expression
  end
  rule text
    not_delimiter*
  end
  rule expression
    delimiter text delimiter
  end
  rule delimiter
    '$'
  end
  rule not_delimiter
    !delimiter
  end
end
When I try to parse an expression, eg 'hello world $test$', the script goes in an infinite loop.
The problem seems to come from the not_delimiter rule, because when I remove it the expression gets parsed.
What is the problem with this grammar?
Thanks in advance.
The problem seems to be where you are attempting to match:
rule text
  not_delimiter*
end
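Since not_delimiter is only the lookahead !delimiter, it succeeds without consuming any input, so the * can keep matching nothing at the same position forever; that is what causes the infinite loop. Making not_delimiter consume a character, e.g. [^$], fixes it.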
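A tiny PEG-style model in plain Ruby makes the problem visible. The combinators lit, not_p, any_char, seq and star are hypothetical and are not Treetop's API. Each parser is a lambda that takes the input and a position and returns the new position, or nil on failure.

Where Treetop simply loops, as the question observes, this star raises an error when its operand succeeds without consuming input. The second call shows the usual repair: pairing the lookahead with a character-consuming '.', which is what [^$] amounts to.

```ruby
# Tiny PEG-style combinators (hypothetical, for illustration only).
# A parser is a lambda (input, pos) -> new position, or nil on failure.
def lit(str)
  ->(input, pos) { input[pos, str.size] == str ? pos + str.size : nil }
end

# !p : succeeds, consuming nothing, exactly when p fails.
def not_p(parser)
  ->(input, pos) { parser.call(input, pos) ? nil : pos }
end

# . : any single character.
def any_char
  ->(input, pos) { pos < input.size ? pos + 1 : nil }
end

def seq(*parsers)
  ->(input, pos) { parsers.reduce(pos) { |at, parser| at && parser.call(input, at) } }
end

# p* : repeat while p succeeds, but refuse to loop on a zero-width match.
def star(parser)
  lambda do |input, pos|
    loop do
      nxt = parser.call(input, pos)
      break if nxt.nil?
      raise "p* matched without consuming input at #{pos}" if nxt == pos
      pos = nxt
    end
    pos
  end
end

delimiter = lit("$")
input = "hello world $test$"

# not_delimiter* with not_delimiter = !delimiter: never advances.
begin
  star(not_p(delimiter)).call(input, 0)
rescue RuntimeError => e
  puts e.message   # => p* matched without consuming input at 0
end

# (!delimiter .)* : consumes one non-'$' character per iteration.
p star(seq(not_p(delimiter), any_char)).call(input, 0)   # => 12
```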
Also, you need to match multiple bodies at the starting rule, otherwise it will return nil, since you will only ever match either a text rule or an expression rule but not both.
rule bodies
  body+
end
This will parse:
require 'treetop'
Treetop.load_from_string DATA.read
parser = TestGrammarParser.new
p parser.parse "hello world $test$"
__END__
grammar TestGrammar
  rule bodies
    body+
  end
  rule body
    expression / text
  end
  rule expression
    delimiter text delimiter
  end
  rule text
    not_delimiter+
  end
  rule not_delimiter
    [^$]
  end
  rule delimiter
    '$'
  end
end

Treetop grammar issues using regular expressions

I have a simple grammar setup like so:
grammar Test
  rule line
    (adjective / not_adjective)* {
      def content
        elements.map{|e| e.content }
      end
    }
  end
  rule adjective
    ("good" / "bad" / "excellent") {
      def content
        [:adjective, text_value]
      end
    }
  end
  rule not_adjective
    !adjective {
      def content
        [:not_adjective, text_value]
      end
    }
  end
end
Let's say my input is "this is a good ball. let's use it". This gives an error, which I'm not mentioning right now because I want to understand the theory about why it's wrong first.
So, how do I create rule not_adjective so that it matches anything that is not matched by rule adjective? In general, how do I write a rule (specifically in Treetop) that doesn't match another named rule?
Treetop is a parser generator that generates parsers from a special class of grammars called Parsing Expression Grammars (PEGs).
The operational interpretation of !expression is that it succeeds if expression fails and fails if expression succeeds but it consumes NO input.
To match anything that rule expression does not match use the dot operator (that matches anything) in conjunction with the negation operator to avoid certain "words":
( !expression . )* i.e. "match anything BUT expression"
The previous answer is incorrect for the OP's question, since it will match any sequence of individual characters up to any adjective. So if you see the string xyzgood, it'll match xyz and a following rule will match the "good" part as an adjective. Likewise, the adjective rule of the OP will match the first three characters of "badge" as the adjective "bad", which isn't what they want.
Instead, the adjective rule should look something like this:
rule adjective
  a:("good" / "bad" / "excellent") ![a-z] {
    def content
      [:adjective, a.text_value]
    end
  }
end
and the not_adjective rule like this:
rule not_adjective
  !adjective w:([a-z]+) {
    def content
      [:not_adjective, w.text_value]
    end
  }
end
Include handling for upper-case, hyphenation, apostrophes, etc., as necessary. You'll also need white-space handling, of course.
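The effect of the trailing ![a-z] can be tried out with Ruby's own regular expressions, which also support negative lookahead. This is an analogue of the two rules above, not Treetop itself. classify, ADJECTIVE and NOT_ADJECTIVE are hypothetical names, only lowercase letters are handled, and anything else is skipped as separator text.

```ruby
require 'strscan'

# Regex analogues of the two PEG rules:
#   adjective     <- ("good" / "bad" / "excellent") ![a-z]
#   not_adjective <- !adjective [a-z]+
ADJECTIVE     = /(?:good|bad|excellent)(?![a-z])/
NOT_ADJECTIVE = /(?!#{ADJECTIVE})[a-z]+/

# Ordered choice (adjective / not_adjective / any other character)*,
# collecting [kind, word] pairs; non-letters are skipped one at a time.
def classify(text)
  s = StringScanner.new(text)
  out = []
  until s.eos?
    if (w = s.scan(ADJECTIVE))
      out << [:adjective, w]
    elsif (w = s.scan(NOT_ADJECTIVE))
      out << [:not_adjective, w]
    else
      s.getch   # whitespace, punctuation, apostrophes, ...
    end
  end
  out
end

p classify("this is a good ball. let's use it")
p classify("badge xyzgood")   # => [[:not_adjective, "badge"], [:not_adjective, "xyzgood"]]
```

With these patterns, "badge" and "xyzgood" are both reported as non-adjectives instead of having "bad" or "good" split off.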

How to define {min,max} matches in treetop peg

With Ruby's regular expressions I could write /[0-9]{3,}/, but I can't figure out how to write this in Treetop other than:
rule at_least_three_digit_number
  [0-9] [0-9] [0-9]+
end
Is there a 'match [at least|most] n' rule for treetop?
It looks like PEGs don't have some of the RE convenience operators, but in return you do get a much more powerful expression matcher.
http://treetop.rubyforge.org/syntactic_recognition.html
A generalised repetition count (minimum, maximum) is also available.
'foo' 2.. matches 'foo' two or more times
'foo' 3..5 matches 'foo' from three to five times
'foo' ..4 matches 'foo' from zero to four times

Most efficient way to process arguments from the command-line in prefix notation

Our homework is to write a Ruby script that calculates a subset of a word list depending on the expression.
The operators are:
&& And operator
|| Or operator
++ Concatenate operator
! Negation operator
A valid call would look like
./eval.rb wordlist && a c
or
./eval.rb wordlist && || a b c
The first call means: generate a new word list in which every word has at least one 'a' and one 'c'.
So my question is: how do I process the arguments efficiently? Maybe recursively?
I'm stuck...
Thanks in advance.
Looks like a grammar with prefix notation. A stack is indeed your friend, and the easiest stack to use is the call stack. For example, given this grammar:
expression ::= number | operator expression expression
operator ::= '*' | '+'
This is the code to evaluate it:
#!/usr/bin/env ruby
@tokens = ['*', 2, '+', 3, 4]
def evaluate
  token = @tokens.shift # Remove first token from @tokens
  case token
  when '*'
    return evaluate * evaluate
  when '+'
    return evaluate + evaluate
  else
    return token
  end
end
puts evaluate # => 14
Although this is Ruby, it's close enough to pseudo-code. I've put explicit returns in, although Ruby does not require them, because it may be clearer to someone who does not know Ruby.
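Applied to the homework's operators, the same call-stack recursion can build a word filter. This is a sketch under assumptions the question does not fully pin down:
- A bare argument such as a means "the word contains that string".
- && and || combine two sub-expressions.
- ! negates one sub-expression.
- ++ is left out because its intended meaning isn't spelled out.

build_filter is a hypothetical name, and the word list is passed in directly instead of being read from a file.

```ruby
# Consumes tokens from the front of the array (prefix notation) and returns
# a predicate lambda that tests a single word. Leftover tokens are not checked.
def build_filter(tokens)
  token = tokens.shift or raise ArgumentError, "expression ended too early"
  case token
  when "&&"
    left, right = build_filter(tokens), build_filter(tokens)
    ->(word) { left.(word) && right.(word) }
  when "||"
    left, right = build_filter(tokens), build_filter(tokens)
    ->(word) { left.(word) || right.(word) }
  when "!"
    inner = build_filter(tokens)
    ->(word) { !inner.(word) }
  else
    # assumption: a plain argument means "the word contains this string"
    ->(word) { word.include?(token) }
  end
end

words = %w[cat cab bad dog ace]
filter = build_filter(%w[&& a c])
p words.select(&filter)   # => ["cat", "cab", "ace"]
```

In the script itself, something like build_filter(ARGV.drop(1)) would replace the literal token list. In a shell, &&, || and ! have to be quoted (for example ./eval.rb wordlist '&&' a c), otherwise the shell interprets them before Ruby ever sees the arguments.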
Use a stack. The max size would be the number of arguments.
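For comparison, here is an explicit-stack version of the arithmetic example above. It reads the prefix tokens from right to left and pushes operands. When an operator appears, it pops its two operands and pushes the result. This is a minimal sketch: evaluate_prefix is a hypothetical name, and only * and + on integers are handled, as in the example. The stack never holds more entries than there are tokens.

```ruby
# Evaluates a prefix expression such as ['*', 2, '+', 3, 4] with an explicit
# stack instead of recursion, scanning the tokens from right to left.
def evaluate_prefix(tokens)
  stack = []
  tokens.reverse_each do |token|
    case token
    when '*', '+'
      left  = stack.pop   # the operand written first in prefix order
      right = stack.pop
      raise ArgumentError, "missing operand for #{token}" if left.nil? || right.nil?
      stack.push(left.public_send(token, right))
    else
      stack.push(token)
    end
  end
  raise ArgumentError, "malformed expression" unless stack.size == 1
  stack.first
end

p evaluate_prefix(['*', 2, '+', 3, 4])   # => 14
```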

Resources