I have a Treetop grammar like the one below:
grammar Addme
  rule AddExpr
    Num '+' Num
  end
  rule Num
    [0-9]+ <ExprNumber>
  end
end
This is working when I parse the expression:
g = AddmeParser.new
t = g.parse("1234+56789")
The resulting tree contains a syntax node that matches "1234" with type ExprNumber.
But, if I add parentheses to the rule like this:
rule Num
  ([0-9]+) <ExprNumber>
end
It will not match the class ExprNumber. Why would this happen?
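The node has already been created inside the parentheses by the time the <ExprNumber> annotation is applied. A module can still be mixed into that existing node, but a class cannot, because the node cannot be re-created as an instance of a different class.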
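The module-versus-class distinction is ordinary Ruby behaviour, so it can be seen without Treetop at all. The sketch below assumes the annotation is applied to the existing node with something like Object#extend; PlainNode, ExprNumberModule, ExprNumberClass and try_extend are made-up names for illustration, not part of Treetop.

```ruby
# Hypothetical stand-in for the node the parentheses have already built.
class PlainNode; end

# A module can be mixed into an object that already exists.
module ExprNumberModule
  def number?
    true
  end
end

# A class cannot be mixed in after the fact.
class ExprNumberClass; end

# Try to attach the annotation to an existing node and report what happened.
def try_extend(node, module_or_class)
  node.extend(module_or_class)
  :extended
rescue TypeError
  # Object#extend only accepts modules, so a class ends up here.
  :type_error
end

node = PlainNode.new
puts try_extend(node, ExprNumberModule)  # the module is mixed in
puts try_extend(node, ExprNumberClass)   # the class is rejected
```

If that assumption holds, declaring ExprNumber as a module, or dropping the parentheses, would be the two obvious ways out.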
Good morning everyone,
I'm currently trying to describe some basic Ruby grammar, but I'm stuck on how to parse whitespace.
I can handle x = 1 + 1,
but I can't parse x=1+1.
How should I parse the spaces?
I have tried adding space handling after every terminal,
but the parse still fails and returns nil.
How can I fix it?
Thank you very much, have a nice day.
grammar Test
  rule main
    s assign
  end
  rule assign
    name:[a-z]+ s '=' s expression s
    {
      def to_ast
        Assign.new(name.text_value.to_sym, expression.to_ast)
      end
    }
  end
  rule expression
    add
  end
  rule add
    left:brackets s '+' s right:add s
    {
      def to_ast
        Add.new(left.to_ast, right.to_ast)
      end
    }
    /
    minus
  end
  rule minus
    left:brackets s '-' s right:minus s
    {
      def to_ast
        Minus.new(left.to_ast, right.to_ast)
      end
    }
    /
    brackets
  end
  rule brackets
    '(' s expression ')' s
    {
      def to_ast
        expression.to_ast
      end
    }
    /
    term
  end
  rule term
    number / variable
  end
  rule number
    [0-9]+ s
    {
      def to_ast
        Number.new(text_value.to_i)
      end
    }
  end
  rule variable
    [a-z]+ s
    {
      def to_ast
        Variable.new(text_value.to_sym)
      end
    }
  end
  rule newline
    s "\n"+ s
  end
  rule s
    [ \t]*
  end
end
The code above works now.
Problem solved!
It's not enough to define the space rule, you have to use it anywhere there might be space. Because this occurs often, I usually use a shorter rule name S for mandatory space, and the lowercase version s for optional space.
Then, as a principle, I skip optional space first in my top rule, and again after every terminal that can be followed by space. Terminals here are strings, character sets, etc. So at the start of assign, and before the {} block on variable, boolean, number, and also after your '=', '-' and '+' literals, add a call to the rule s to skip any spaces.
This policy works well for me. It's a good idea to have a test case which has minimum space, and another case that has maximum space (in all possible places).
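To make the policy concrete outside of Treetop, here is a minimal hand-written sketch using Ruby's StringScanner. It is not the grammar above, only a toy recognizer for inputs like x = 1 + 1; the class TinyAssignParser and its methods are invented for illustration. The point is the shape: skip optional space once at the top, then again after every terminal, and test with both minimum and maximum spacing.

```ruby
require 'strscan'

# Toy recognizer for "name = number + number + ..." (illustrative only).
class TinyAssignParser
  def initialize(input)
    @scanner = StringScanner.new(input)
  end

  # Top rule: skip optional space first, then parse, then require end of input.
  # Returns [name, sum] on success and nil on failure.
  def parse
    skip_space
    name = terminal(/[a-z]+/)
    return nil unless name && terminal(/=/)
    first = terminal(/[0-9]+/)
    return nil unless first
    total = first.to_i
    while terminal(/\+/)
      operand = terminal(/[0-9]+/)
      return nil unless operand
      total += operand.to_i
    end
    @scanner.eos? ? [name.to_sym, total] : nil
  end

  private

  # The optional-space rule, like `s` in the grammar.
  def skip_space
    @scanner.skip(/[ \t]*/)
  end

  # Match one terminal, then skip any optional space that follows it.
  def terminal(pattern)
    text = @scanner.scan(pattern)
    skip_space if text
    text
  end
end

# One test case with minimum space and one with space in every possible place.
p TinyAssignParser.new("x=1+1").parse
p TinyAssignParser.new("  x \t=  1  +\t1  ").parse
```

Because the space skipping lives in one helper, forgetting it after a particular terminal is much harder than when every rule has to remember its own s.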
I am just starting to use Treetop for parsing work. The following snippet puzzles me:
grammar Fortran
  rule integer
    [1-9] [0-9]*
  end
  rule id
    [a-zA-Z] [a-zA-Z0-9]*
  end
end
parser = FortranParser.new
ast = parser.parse('1')
The result ast is:
[SyntaxNode offset=0, "1", SyntaxNode offset=1, ""]
But when I place rule id above rule integer, the result is nil. So what is the problem? Thanks in advance!
I think I just figured out what was wrong! There should be a top rule that includes the other rules, and it must be placed first:
grammar Fortran
  rule statement
    ( id / integer )* {
      def content
        elements.map { |e| e.content }
      end
    }
  end
  rule id
    [a-zA-Z] [a-zA-Z0-9]* {
      def content
        [:id, text_value]
      end
    }
  end
  rule integer
    [1-9] [0-9]* {
      def content
        [:integer, text_value]
      end
    }
  end
end
parser = FortranParser.new
ast = parser.parse('1')
Then the result is
[[:integer, "1"]]
I have the following treetop grammar:
grammar TestGrammar
  rule body
    text / expression
  end
  rule text
    not_delimiter*
  end
  rule expression
    delimiter text delimiter
  end
  rule delimiter
    '$'
  end
  rule not_delimiter
    !delimiter
  end
end
When I try to parse an expression, e.g. 'hello world $test$', the script goes into an infinite loop.
The problem seems to come from the not_delimiter rule, because when I remove it the expression gets parsed.
What is the problem with this grammar?
Thanks in advance.
The problem seems to be where you are attempting to match:
rule text
  not_delimiter*
end
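Since the * will also match nothing, and not_delimiter is only a negative lookahead (!delimiter) that consumes no input, not_delimiter* can keep succeeding at the same position without ever advancing, which I think is what is causing the infinite loop. What you presumably want is to match [^$] one or more times.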
Also, you need to match multiple bodies at the starting rule, otherwise it will return nil, since you will only ever match either a text rule or an expression rule but not both.
rule bodies
  body+
end
This will parse:
require 'treetop'
Treetop.load_from_string DATA.read
parser = TestGrammarParser.new
p parser.parse "hello world $test$"
__END__
grammar TestGrammar
  rule bodies
    body+
  end
  rule body
    expression / text
  end
  rule expression
    delimiter text delimiter
  end
  rule text
    not_delimiter+
  end
  rule not_delimiter
    [^$]
  end
  rule delimiter
    '$'
  end
end
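For anyone wondering why the original not_delimiter* could spin forever, here is a rough plain-Ruby sketch of the difference between a lookahead that consumes nothing and a character class that consumes one character. It uses StringScanner rather than Treetop, all the method names are invented for the example, and the "no progress" guard is my own addition so the demonstration terminates; I am not claiming Treetop does anything like it.

```ruby
require 'strscan'

# Repeat the given step until it fails. Also stop if a step succeeded without
# moving the scanner, since such a step would otherwise succeed forever.
# Returns the number of successful steps.
def repeat_with_guard(scanner)
  count = 0
  loop do
    before = scanner.pos
    break unless yield(scanner)
    count += 1
    break if scanner.pos == before
  end
  count
end

# Like `!delimiter`: succeeds when '$' is not next, but consumes nothing.
def lookahead_only(scanner)
  scanner.check(/\$/).nil?
end

# Like `[^$]`: consumes exactly one character that is not '$'.
def consume_non_delimiter(scanner)
  !scanner.scan(/[^$]/).nil?
end

scanner = StringScanner.new("hello world $test$")

# The lookahead succeeds but the position stays at 0, so only the guard stops it.
repeat_with_guard(scanner) { |s| lookahead_only(s) }
puts scanner.pos

# The consuming version advances one character per step and stops at the '$'.
repeat_with_guard(scanner) { |s| consume_non_delimiter(s) }
puts scanner.pos
```

The second loop ends on its own because each success moves the position forward, which is the property the rewritten not_delimiter rule restores.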
I have a treetop grammar with only two rules:
grammar RCFAE
  rule num
    [0-9]+ <Num>
  end
  rule identifier
    [a-zA-Z] [a-zA-Z]* <ID>
  end
end
I'm trying to parse simple strings ("A" and "5"). The "5" is recognized as a Num if I put that rule first, and the parse returns nil if I put that rule second. Similarly, "A" is recognized as an ID if I put that rule first, and the parse returns nil if I put that rule second. I can't understand how these two rules overlap in any way. It's driving me crazy!
Is there something I'm missing or don't understand about treetop or regular expressions? Thanks in advance for your help.
Treetop expects the first rule to be the "main rule". It doesn't try to apply all the rules you defined until one matches - it only applies the main rule and if that does not match, it fails.
To do what you want, you need to add a main rule which might be a num or an identifier, like this:
grammar RCFAE
  rule expression
    num / identifier
  end
  rule num
    [0-9]+ <Num>
  end
  rule identifier
    [a-zA-Z] [a-zA-Z]* <ID>
  end
end
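If it helps to see the "only the first rule is tried" behaviour in isolation, here is a toy model in plain Ruby. It is emphatically not how Treetop is implemented: rules are just a hash, a Regexp stands for a terminal rule, an Array of rule names stands for an ordered choice, and the names toy_parse and apply_rule are invented for this sketch.

```ruby
# Toy model only: a rule body is either a Regexp (terminal) or an Array of
# rule names (ordered choice, like `num / identifier`).
WITHOUT_MAIN_RULE = {
  num:        /\A[0-9]+\z/,
  identifier: /\A[a-zA-Z]+\z/
}.freeze

# Same rules, with a main rule placed first.
WITH_MAIN_RULE = { expression: [:num, :identifier] }.merge(WITHOUT_MAIN_RULE).freeze

# Apply one named rule; returns the name of the terminal rule that matched, or nil.
def apply_rule(rules, name, input)
  body = rules.fetch(name)
  if body.is_a?(Regexp)
    body.match?(input) ? name : nil
  else
    body.each do |alternative|
      result = apply_rule(rules, alternative, input)
      return result if result
    end
    nil
  end
end

# Parsing always starts from the first rule and never falls back to the others.
def toy_parse(rules, input)
  apply_rule(rules, rules.keys.first, input)
end

p toy_parse(WITHOUT_MAIN_RULE, "5")  # the first rule is num, so this matches
p toy_parse(WITHOUT_MAIN_RULE, "A")  # identifier is never tried
p toy_parse(WITH_MAIN_RULE, "A")     # the main rule reaches identifier
```

Swapping the order of the two entries in WITHOUT_MAIN_RULE reproduces the symptom from the question: whichever rule comes first is the only one that ever gets a chance.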
I have a simple grammar setup like so:
grammar Test
  rule line
    (adjective / not_adjective)* {
      def content
        elements.map{|e| e.content }
      end
    }
  end
  rule adjective
    ("good" / "bad" / "excellent") {
      def content
        [:adjective, text_value]
      end
    }
  end
  rule not_adjective
    !adjective {
      def content
        [:not_adjective, text_value]
      end
    }
  end
end
Let's say my input is "this is a good ball. let's use it". This gives an error, which I'm not mentioning right now because I want to understand the theory about why it's wrong first.
So, how do I create the rule not_adjective so that it matches anything that is not matched by the rule adjective? In general, how do I write a rule (specifically in Treetop) that "doesn't" match another named rule?
Treetop is a parser generator that generates parsers out of a special class of grammars called Parsing Expression Grammars or PEG.
The operational interpretation of !expression is that it succeeds if expression fails and fails if expression succeeds but it consumes NO input.
To match anything that rule expression does not match use the dot operator (that matches anything) in conjunction with the negation operator to avoid certain "words":
( !expression . )* ie. "match anything BUT expression"
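Ruby regular expressions have the same kind of negative lookahead, so the idiom can be tried without a grammar. This is only an analogue of the PEG expression, not Treetop code; the constant and method names are made up for the example.

```ruby
# Regexp analogue of ( !adjective . )* : at each position, check that no
# adjective starts here (consuming nothing), then consume one character.
ADJECTIVE = /good|bad|excellent/
ANYTHING_BUT_ADJECTIVE = /\A(?:(?!#{ADJECTIVE}).)*/

# Returns the longest prefix of the input that contains no adjective.
def text_before_adjective(input)
  input[ANYTHING_BUT_ADJECTIVE]
end

p text_before_adjective("this is a good ball. let's use it")  # => "this is a "
p text_before_adjective("xyzgood")                            # => "xyz"
```

Note that the lookahead is checked at every character, so the match also stops in the middle of a word such as xyzgood.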
The previous answer is incorrect for the OP's question, since it will match any sequence of individual characters up to any adjective. So if you see the string xyzgood, it'll match xyz and a following rule will match the "good" part as an adjective. Likewise, the adjective rule of the OP will match the first three characters of "badge" as the adjective "bad", which isn't what they want.
Instead, the adjective rule should look something like this:
rule adjective
  a:("good" / "bad" / "excellent") ![a-z] {
    def content
      [:adjective, a.text_value]
    end
  }
end
and the not_adjective rule like this:
rule not_adjective
  !adjective w:([a-z]+) {
    def content
      [:not_adjective, w.text_value]
    end
  }
end
Include handling for upper-case letters, hyphenation, apostrophes, etc., as necessary. You'll also need whitespace handling, of course.
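As a quick sanity check of the whole-word idea, the same classification can be mimicked in plain Ruby. This is a sketch with invented names, not output from the grammar: it splits the input into lower-case words and only calls a word an adjective when the entire word matches, which is what the ![a-z] after the adjective alternatives is meant to guarantee.

```ruby
# A word counts as an adjective only if the whole word matches.
WHOLE_WORD_ADJECTIVE = /\A(?:good|bad|excellent)\z/

# Split into runs of lower-case letters and tag each one, mirroring the
# [:adjective, ...] and [:not_adjective, ...] pairs from the rules above.
def classify_words(input)
  input.scan(/[a-z]+/).map do |word|
    if WHOLE_WORD_ADJECTIVE.match?(word)
      [:adjective, word]
    else
      [:not_adjective, word]
    end
  end
end

p classify_words("a good badge")
# => [[:not_adjective, "a"], [:adjective, "good"], [:not_adjective, "badge"]]
```

Here badge stays a single non-adjective word instead of being split into bad and ge.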