Treetop seems to fail on a simple grammar (5 rules) - ruby

I am trying to write a parser for a subset of C.
The behavior of treetop is difficult to analyze on this simple (further simplified) grammar.
grammar Shyc
rule functionDef
type space identifier '(' ')' bloc
end
rule type
'int'
end
rule bloc
'{' '}'
end
rule identifier
[a-zA-Z] [a-zA-Z_]*
end
rule space
[\s]+
end
end
My test case is "int main(){}"
And the error message from treetop is :
error at line 1, column 9
failure reason : Expected [a-zA-Z_] at line 1, column 9 (byte 9) after
compiler.rb:25:in `parse': Parse error (RuntimeError)
from compiler.rb:73:in `<main>'
The problem thus seems to be around the identifier rule.
I am using Treetop 1.5.3 and Ruby 2.1.1.
Any ideas?

The problem was that my test case was in a separate file, with an extra end-of-line \n at the end, and the grammar tested here does not specify how to consume it.
Here is the grammar that solves the problem. As discussed on the Treetop mailing list, the error message is odd and somewhat misleading, but it is difficult in general to automate the emission of a clear message.
grammar Shyc
rule functionDef
type space identifier '(' ')' bloc space?
end
rule type
'int'
end
rule bloc
'{' '}'
end
rule identifier
[a-zA-Z] [a-zA-Z_]*
end
rule space
[\s\n]+
end
end
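The effect of the trailing newline can be reproduced without Treetop. The sketch below uses plain Ruby regular expressions as a stand-in for the two grammars (an assumption made for illustration; it is not how Treetop matches internally, and the names STRICT, LENIENT and parses? are hypothetical). It shows the same text failing once "\n" is appended, and succeeding when the input is stripped or when trailing whitespace is allowed.

```ruby
# Regex stand-in for the original grammar: nothing may follow the block.
STRICT  = /\Aint\s+[a-zA-Z][a-zA-Z_]*\(\)\{\}\z/
# Regex stand-in for the fixed grammar: optional whitespace after the block.
LENIENT = /\Aint\s+[a-zA-Z][a-zA-Z_]*\(\)\{\}\s*\z/

# Hypothetical helper: true when the whole text matches the pattern.
def parses?(pattern, text)
  pattern.match?(text)
end

from_file = "int main(){}\n" # a file saved by most editors ends with "\n"

puts "strict, raw input:      #{parses?(STRICT, from_file)}"
puts "strict, stripped input: #{parses?(STRICT, from_file.strip)}"
puts "lenient, raw input:     #{parses?(LENIENT, from_file)}"
```

Either approach works: strip the input before handing it to the parser, or let the grammar consume trailing whitespace as the fixed version does.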

Related

syntax error, unexpected keyword_else, expecting ':' - RUBY

prompt("Welcome to Calculator! Enter your name:")
name = ''
loop do
name = Kernel.gets().chomp()
if name.empty()?
prompt("Make sure to use a valid name")
else
break
end
end
Not sure what I'm missing here.
I got this error message:
syntax error, unexpected keyword_else, expecting ':'
Try out
if name.empty?
Note that you can call methods that have no params without parentheses. Otherwise you should do name.empty?() because ? is part of the name of the method.
Anyway, your mistake is the ? after the if condition. The error message is telling you that, because of that ?, Ruby is trying to parse a ternary operator, which has this syntax
condition ? expression1 : expression2
for this reason it expects :
The line
if name.empty()?
is interpreted as a ternary operator inside a regular if statement:
if (name.empty() ? do_something : do_something_else )
and the colon is missing in your code.
Maybe you meant this:
if name.empty? # is equal to
if name.empty?()
because the question mark is part of the method name.
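To make the difference concrete, here is a small sketch of the corrected loop next to a real ternary. The helper names ask_for_name and describe are hypothetical, and the input is simulated with an array instead of Kernel.gets so the snippet can run unattended.

```ruby
# Hypothetical helper: returns the first non-empty name from simulated input.
def ask_for_name(inputs)
  inputs.each do |line|
    name = line.chomp
    # `empty?` is the whole method name; no extra `?` follows it.
    return name unless name.empty?
    puts "=> Make sure to use a valid name"
  end
  nil
end

# A real ternary needs both `?` and `:` after the condition.
def describe(name)
  name.empty? ? "blank" : "ok"
end

puts ask_for_name(["\n", "Ada\n"])
puts describe("")
```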

Stack level too deep using Citrus for Parsing Expression Grammar

I'm trying to process what will eventually be Boolean logic using a grammar in Treetop-like Citrus for Ruby. I'm getting a recursion issue, but I'm unclear on exactly why. Here's the text I'm trying to process (it should have a newline at the very end):
COMMANDWORD # MYCOMMENT
Here's my Citrus grammar (intended to deal with more advanced stuff):
grammar Grammar
rule commandset
command+
end
rule command
identifier command_detail* comment_to_eol* "\n"
end
rule command_detail
assign_expr | expr
end
rule assign_expr
identifier ":=" expr
end
rule expr
# Stack overflow
or_expr | gtor_expr
# No problem!
# or_expr
end
rule or_expr
# Temporarily match everything except a comment...
[^#]+
# What I think will be correct in the future...
# gtor_expr "OR" expr
end
rule gtor_expr
and_expr | gtand_expr
end
rule and_expr
gtand_expr "AND" gtor_expr
end
rule gtand_expr
not_expr | gtnot_expr
end
rule not_expr
"NOT" gtnot_expr | gtand_expr
end
rule gtnot_expr
parens_expr | identifier
end
rule parens_expr
"(" expr ")"
end
rule identifier
ws* [a-zA-Z0-9]+ ws*
end
rule ws
[ ]
end
rule comment_to_eol
"#" [^\n]*
end
end
The important things are in the rule expr and the rule or_expr. I've altered or_expr so it matches everything except a comment. If I stick with the current expr rule I get a stack overflow. But if I switch it so it doesn't have the choice between or_expr and gtor_expr it works fine.
My understanding of the "choice" is that it will try to evaluate them in order. If the first choice fails, then it will try the second. In this case, the first choice is obviously capable of succeeding, so why do I get a stack overflow if I include a second choice that should never be taken?
The infinite recursion may arise because gtand_expr can call not_expr, and not_expr can call gtand_expr, without any input being consumed in between.
I think you can replace not_expr's rule with
not_expr -> "NOT" not_expr | gtnot_expr
You could also try simpler rules using repetition operators:
expr -> orexpr
orexpr -> andexpr ("OR" andexpr)*
andexpr -> notexpr ("AND" notexpr)*
notexpr -> "NOT"? atomicexpr
atomicexpr -> id | "(" expr ")"
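The layered rules above avoid left recursion because every rule consumes a token before it loops or recurses. As an illustration only, here is a hand-written recursive-descent sketch of the same layering in plain Ruby using the standard strscan library. It is not Citrus, and the class name BoolParser and the array-based tree shape are assumptions made for the example.

```ruby
require 'strscan'

# Recursive-descent sketch of:
#   expr -> or
#   or   -> and ("OR" and)*
#   and  -> not ("AND" not)*
#   not  -> "NOT"? atom
#   atom -> id | "(" expr ")"
class BoolParser
  def initialize(text)
    @s = StringScanner.new(text)
  end

  def parse
    tree = or_expr
    skip_ws
    raise ArgumentError, "unexpected input at #{@s.pos}" unless @s.eos?
    tree
  end

  private

  def skip_ws
    @s.skip(/[ ]*/)
  end

  # Try to match a token after optional spaces; returns nil on failure.
  def token(pattern)
    skip_ws
    @s.scan(pattern)
  end

  def or_expr
    left = and_expr
    left = [:or, left, and_expr] while token(/OR\b/)
    left
  end

  def and_expr
    left = not_expr
    left = [:and, left, not_expr] while token(/AND\b/)
    left
  end

  def not_expr
    token(/NOT\b/) ? [:not, atom] : atom
  end

  def atom
    if token(/\(/)
      inner = or_expr
      token(/\)/) or raise ArgumentError, "expected ')' at #{@s.pos}"
      inner
    else
      id = token(/[a-zA-Z0-9]+/)
      raise ArgumentError, "expected identifier at #{@s.pos}" unless id
      [:id, id]
    end
  end
end

p BoolParser.new("NOT a AND (b OR c)").parse
```

Each loop only continues after an "OR" or "AND" token has been consumed, so the parser always makes progress; AND binds tighter than OR because and_expr sits below or_expr.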

Treetop infinite recursion with negative rule

I have the following treetop grammar:
grammar TestGrammar
rule body
text / expression
end
rule text
not_delimiter*
end
rule expression
delimiter text delimiter
end
rule delimiter
'$'
end
rule not_delimiter
!delimiter
end
end
When I try to parse an expression, e.g. 'hello world $test$', the script goes into an infinite loop.
The problem seems to come from the not_delimiter rule, as when I remove it the expression gets parsed.
What is the problem with this grammar?
Thanks in advance.
The problem seems to be where you are attempting to match:
rule text
not_delimiter*
end
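Since not_delimiter is defined as !delimiter, a negative lookahead that consumes no input, not_delimiter* can succeed again and again at the same position without ever advancing, which I think is what is causing the infinite loop. Matching an actual character, as in [^$], avoids this.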
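The difference between a rule that only looks ahead and one that consumes a character can be seen with Ruby's standard StringScanner. This is an analogy rather than Treetop's own machinery; repeat_match and its limit parameter are hypothetical names, and the limit exists only so the zero-width case stops.

```ruby
require 'strscan'

# Count consecutive matches of `pattern`, giving up after `limit` iterations
# so that a zero-width match cannot loop forever.
def repeat_match(text, pattern, limit: 1000)
  scanner = StringScanner.new(text)
  count = 0
  count += 1 while count < limit && scanner.scan(pattern)
  { matches: count, position: scanner.pos }
end

# Lookahead only: succeeds without consuming anything, like `!delimiter`.
p repeat_match("hello $x$", /(?!\$)/)
# Consumes one non-delimiter character per match, like `[^$]`.
p repeat_match("hello $x$", /[^$]/)
```

The first call hits the iteration limit with the position still at 0, while the second stops by itself at the first '$'.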
Also, you need to match multiple bodies at the starting rule, otherwise it will return nil, since you will only ever match either a text rule or an expression rule but not both.
rule bodies
body+
end
This will parse:
require 'treetop'
Treetop.load_from_string DATA.read
parser = TestGrammarParser.new
p parser.parse "hello world $test$"
__END__
grammar TestGrammar
rule bodies
body+
end
rule body
expression / text
end
rule expression
delimiter text delimiter
end
rule text
not_delimiter+
end
rule not_delimiter
[^$]
end
rule delimiter
'$'
end
end

How can I avoid left-recursion in treetop without backtracking?

I am having trouble avoiding left-recursion in this simple expression parser I'm working on. Essentially, I want to parse the equation 'f x y' into two expressions 'f x' and '(f x) y' (with implicit parentheses). How can I do this while avoiding left-recursion and backtracking? Does there have to be an intermediate step?
#!/usr/bin/env ruby
require 'rubygems'
require 'treetop'
Treetop.load_from_string DATA.read
parser = ExpressionParser.new
p parser.parse('f x y').value
__END__
grammar Expression
rule equation
expression (w+ expression)*
end
rule expression
expression w+ atom
end
rule atom
var / '(' w* expression w* ')'
end
rule var
[a-z]
end
rule w
[\s\n\t\r]
end
end
You haven't given enough information about your desired result. In particular, do you expect "f(a b) y" to parse as "(f(a(b))) y"? I assume you do... which means that a function not followed by an open parenthesis has arity one.
So you want to say:
rule equation
expression w* var / expression w* parenthesised_list
end
rule parenthesised_list
'(' w* ( expression w* )+ ')'
end
If on the other hand you have external (to the grammar) knowledge of the arity of f, and you want to iterate "expression" exactly that many times - as happens in parsing TeX for example - then you will need to use a semantic predicate (&{|s| ...}) inside the iterated expression list. Beware that the argument passed to the block of a sempred is not a SyntaxNode (which cannot yet be constructed because this sequence sub-rule has not yet succeeded) but the accumulated array of nodes so far in the sequence. The truthiness of the block return value dictates the parse result and can stop the iteration.
Another tool you might consider using is lookahead (!stuff_I_dont_expect_to_follow or &stuff_that_must_follow).
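For the grouping asked about in the question ('f x y' read as '(f x) y'), one common way to sidestep a left-recursive rule is to match a flat sequence of atoms and fold it to the left afterwards. The plain-Ruby sketch below shows only that folding step. It assumes whitespace-separated atoms with no parentheses, and parse_application and show are hypothetical helper names, not part of Treetop.

```ruby
# Split a flat application such as "f x y" into atoms, then fold left so each
# new argument wraps everything built so far.
def parse_application(text)
  atoms = text.split # assumption: whitespace-separated atoms, no parentheses
  raise ArgumentError, "empty expression" if atoms.empty?
  atoms.drop(1).reduce(atoms.first) { |fn, arg| [:apply, fn, arg] }
end

# Render the nested structure with explicit parentheses.
def show(node)
  return node if node.is_a?(String)
  _, fn, arg = node
  "(#{show(fn)} #{show(arg)})"
end

puts show(parse_application("f x y"))
```

In a grammar, the same idea would correspond to a rule shaped like atom (w+ atom)* whose result is folded in a node method, which keeps the rule itself free of left recursion.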
You can also ask such questions in http://groups.google.com/group/treetop-dev

simplest rules in treetop not working

I have a treetop grammar with only two rules:
grammar RCFAE
rule num
[0-9]+ <Num>
end
rule identifier
[a-zA-Z] [a-zA-Z]* <ID>
end
end
I'm trying to parse simple strings ("A" and "5"). The "5" is recognized as a Num if I put that rule first, and returns nil if I put that rule second. Similarly, "A" is recognized as an ID if I put that rule first, and returns nil if I put that rule second. I can't understand how these two rules overlap in any way. It's driving me crazy!
Is there something I'm missing or don't understand about treetop or regular expressions? Thanks in advance for your help.
Treetop expects the first rule to be the "main rule". It doesn't try to apply all the rules you defined until one matches - it only applies the main rule and if that does not match, it fails.
To do what you want, you need to add a main rule which might be a num or an identifier, like this:
grammar RCFAE
rule expression
num / identifier
end
rule num
[0-9]+ <Num>
end
rule identifier
[a-zA-Z] [a-zA-Z]* <ID>
end
end
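The behaviour can be mimicked in plain Ruby to see why the order of the two rules mattered. This is only an analogy using regular expressions, not Treetop itself; RULES, parse_from and parse_expression are hypothetical names.

```ruby
# Regex stand-ins for the two grammar rules, in declaration order.
RULES = {
  num:        /\A[0-9]+\z/,
  identifier: /\A[a-zA-Z][a-zA-Z]*\z/
}.freeze

# Only the given root rule is tried, like a parser that starts from one rule.
def parse_from(root, input)
  RULES.fetch(root).match?(input) ? root : nil
end

# Ordered choice over both rules, like the `num / identifier` main rule.
def parse_expression(input)
  RULES.keys.find { |name| parse_from(name, input) }
end

p parse_from(:num, "A")  # starting from num alone cannot match "A"
p parse_expression("A")
p parse_expression("5")
```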
