Does rule order matter in Treetop? - ruby

I am just starting to use Treetop for parsing. The following snippet puzzles me:
grammar Fortran
rule integer
[1-9] [0-9]*
end
rule id
[a-zA-Z] [a-zA-Z0-9]*
end
end
parser = FortranParser.new
ast = parser.parse('1')
The result ast is:
[SyntaxNode offset=0, "1", SyntaxNode offset=1, ""]
But when I place rule id above rule integer, the result is nil. So what is the problem? Thanks in advance!

I think I just figured out what was wrong. There should be a top rule that includes the other rules, and it has to be placed first in the grammar:
grammar Fortran
rule statement
( id / integer )* {
def content
elements.map { |e| e.content }
end
}
end
rule id
[a-zA-Z] [a-zA-Z0-9]* {
def content
[:id, text_value]
end
}
end
rule integer
[1-9] [0-9]* {
def content
[:integer, text_value]
end
}
end
end
parser = FortranParser.new
ast = parser.parse('1')
Then the result is
[[:integer, "1"]]

Related

Treetop parser : how to handle spaces?

Good morning everyone,
I'm currently trying to describe some basic Ruby grammar, but I'm stuck on how to parse spaces.
I can handle x = 1 + 1,
but I can't parse x=1+1.
How should I handle spaces?
I have tried adding space handling after every terminal,
but the parse still fails and returns nil.
How can I fix it?
Thank you very much, have a nice day.
grammar Test
rule main
s assign
end
rule assign
name:[a-z]+ s '=' s expression s
{
def to_ast
Assign.new(name.text_value.to_sym, expression.to_ast)
end
}
end
rule expression
add
end
rule add
left:brackets s '+' s right:add s
{
def to_ast
Add.new(left.to_ast, right.to_ast)
end
}
/
minus
end
rule minus
left:brackets s '-' s right:minus s
{
def to_ast
Minus.new(left.to_ast, right.to_ast)
end
}
/
brackets
end
rule brackets
'(' s expression ')' s
{
def to_ast
expression.to_ast
end
}
/
term
end
rule term
number / variable
end
rule number
[0-9]+ s
{
def to_ast
Number.new(text_value.to_i)
end
}
end
rule variable
[a-z]+ s
{
def to_ast
Variable.new(text_value.to_sym)
end
}
end
rule newline
s "\n"+ s
end
rule s
[ \t]*
end
end
This code works.
Problem solved!
It's not enough to define the space rule, you have to use it anywhere there might be space. Because this occurs often, I usually use a shorter rule name S for mandatory space, and the lowercase version s for optional space.
Then, as a principle, I skip optional space first in my top rule, and again after every terminal that can be followed by space. Terminals here are strings, character sets, etc. So at the start of assign, and before the {} block on variable, boolean, number, and also after your '=', '-' and '+' literals, add a call to the rule s to skip any spaces.
This policy works well for me. It's a good idea to have a test case which has minimum space, and another case that has maximum space (in all possible places).
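As a rough illustration of that policy outside Treetop, here is a hand-written sketch built on Ruby's StringScanner. The class TinyAssignParser and its method names are hypothetical, it only handles a name assigned to a sum of integers, and it is not how Treetop generates its parsers. The point is only the shape: skip optional space once at the start, then again after every terminal.

```ruby
require 'strscan'

# Hypothetical hand-written analogue of the "s after every terminal" policy.
class TinyAssignParser
  def initialize(input)
    @scanner = StringScanner.new(input)
  end

  # Returns a nested array on success, or nil if the input does not parse.
  def parse
    skip_space                                # optional space at the very start
    name = terminal(/[a-z]+/) or return nil
    terminal(/=/) or return nil
    value = sum or return nil
    @scanner.eos? ? [:assign, name.to_sym, value] : nil
  end

  private

  # Plays the role of the optional-space rule `s`.
  def skip_space
    @scanner.skip(/[ \t]*/)
  end

  # Matches one terminal, then skips any space that follows it.
  def terminal(pattern)
    text = @scanner.scan(pattern)
    skip_space if text
    text
  end

  # number ('+' number)*
  def sum
    left = terminal(/[0-9]+/) or return nil
    node = [:number, left.to_i]
    while terminal(/\+/)
      right = terminal(/[0-9]+/) or return nil
      node = [:add, node, [:number, right.to_i]]
    end
    node
  end
end

# One test case with minimum space and one with maximum space.
minimum = TinyAssignParser.new("x=1+1").parse
maximum = TinyAssignParser.new("  x  =  1  +  1  ").parse
```

Both inputs should produce the same tree, which is the property the minimum-space and maximum-space test cases are meant to check.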

Why is a custom SyntaxNode subclass not working with parentheses?

I have a treetop grammar like below:
grammar Addme
rule AddExpr
Num '+' Num
end
rule Num
[0-9]+ <ExprNumber>
end
end
This is working when I parse the expression:
g = AddmeParser.new
t = g.parse("1234+56789")
. . . there is a syntax node that matches "1234" with type ExprNumber.
But, if I add parentheses to the rule like this:
rule Num
([0-9]+) <ExprNumber>
end
It will not match the class ExprNumber. Why would this happen?
The node has already been created inside the parentheses. A module can be mixed in, but not a class.
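The distinction between mixing in a module and assigning a class can be seen in plain Ruby. The sketch below uses stand-in names (PlainNode, NumberBehaviour) that are purely illustrative and are not Treetop's real classes. It assumes, as the answer says, that the node object already exists by the time the annotation is applied.

```ruby
# Stand-ins for illustration only; these are not Treetop's real classes.
class PlainNode
  attr_reader :text

  def initialize(text)
    @text = text
  end
end

module NumberBehaviour
  def to_i
    text.to_i
  end
end

node = PlainNode.new("1234")   # the node already exists at this point
node.extend(NumberBehaviour)   # a module can still be mixed into it

value = node.to_i              # the mixed-in method is now available
klass = node.class             # still PlainNode: an existing object keeps its class
```

An existing object can gain methods through extend, but Ruby offers no way to change the class it was instantiated from, which is why a module works in that position and a class does not.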

Treetop infinite loop when parsing Latex document

I'm trying to write a parser with Treetop to convert some LaTeX commands into HTML markup. With the following grammar, the generated code spins forever. I've built the parser source with tt and stepped through it, but that doesn't really elucidate the underlying issue (it just spins in _nt_paragraph).
Test input: "\emph{hey} and some more text."
grammar Latex
rule document
(paragraph)* {
def content
[:document, elements.map { |e| e.content }]
end
}
end
# Example: These aren't the \emph{droids you're looking for} \n\n.
rule paragraph
( text / tag )* eop {
def content
[:paragraph, elements.map { |e| e.content } ]
end
}
end
rule text
( !( tag_start / eop) . )* {
def content
[:text, text_value ]
end
}
end
# Example: \tag{inner_text}
rule tag
"\\emph{" inner_text '}' {
def content
[:tag, inner_text.content]
end
}
end
# Example: \emph{inner_text}
rule inner_text
( !'}' . )* {
def content
[:inner_text, text_value]
end
}
end
# End of paragraph.
rule eop
newline 2.. {
def content
[:newline, text_value]
end
}
end
rule newline
"\n"
end
# You know, what starts a tag
rule tag_start
"\\"
end
end
For anyone curious, Clifford over at the treetop dev google group figured this out.
The problem was with paragraph and text.
Text is 0 or more characters, and there can be 0 or more texts in a paragraph. Once the parser reached the first \n, text kept succeeding with a zero-length match, so the repetition in paragraph matched an endless run of empty texts without ever advancing, and the parser spun forever. The fix was to adjust text to be:
( !( tag_start / eop) . )+
So that it must have at least one character to match.
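The same effect can be reproduced outside Treetop with a small StringScanner sketch. The helper repeat_match is a hypothetical name, the regular expressions are a simplification of the text rule (they stop at any backslash or newline), and the round limit exists only so the example terminates.

```ruby
require 'strscan'

# Hypothetical helper: applies `pattern` again and again at the current
# position, like a ( ... )* repetition, and gives up after `limit` rounds.
def repeat_match(input, pattern, limit = 1000)
  scanner = StringScanner.new(input)
  rounds = 0
  rounds += 1 while rounds < limit && scanner.scan(pattern)
  { rounds: rounds, position: scanner.pos }
end

input = "hey\n\n"

# The inner pattern may match the empty string. After consuming "hey" it
# keeps succeeding at the newline without advancing, so only the limit
# stops the loop.
star = repeat_match(input, /[^\\\n]*/)

# The inner pattern needs at least one character. It fails at the newline
# and the repetition ends normally.
plus = repeat_match(input, /[^\\\n]+/)
```

With the zero-or-more pattern the loop runs until the artificial limit, while the one-or-more pattern stops after a single round at the same position.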

Ruby Regex Odd Error, What is going on?

I have the following program:
class Matcher
include Enumerable
def initialize(string, match)
@string = string
@match = match
end
def each
@string.scan(/[##match]/) do |pattern|
yield pattern
end
end
end
mch = Matcher.new("the quickbrown fox", "aeiou")
puts mch.inject {|x, n| x+n}
It is supposed to match the characters aeiou against the string the quickbrown fox.
No matter what I put as the pattern, it oddly prints out the characters: thc. What's going on?
@string.scan(/[##match]/) do |pattern| is incorrect. #{@match} is what you're looking for.
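A minimal sketch of the class with the explicit interpolation applied, assuming the two instance variables are assigned in initialize as the question intends:

```ruby
class Matcher
  include Enumerable

  def initialize(string, match)
    @string = string
    @match = match
  end

  # Yields every character of @string that appears in @match.
  def each
    # #{@match} inserts the contents of @match into the character class.
    @string.scan(/[#{@match}]/) do |pattern|
      yield pattern
    end
  end
end

mch = Matcher.new("the quickbrown fox", "aeiou")
result = mch.inject { |x, n| x + n }
puts result
```

For this input the result is the vowels in order of appearance, euioo. If the match string could contain characters that are special inside a character class, passing it through Regexp.escape first is one way to guard against that.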

Simplest treetop grammar is returning a parse error, just learning

I'm trying to learn treetop and was taking most of the code from https://github.com/survival/lordbishop for parsing names and was going to build from that.
My structure is a bit different because I'm building it in rails, rather than ruby command line.
When I run a very simple parse, a parse error is returned on a space (which should be one of the simpler things in my grammar). What am I doing wrong?
My code is fairly simple, in my model
require 'treetop'
require 'polyglot'
require 'grammars/name'
class Name
def self.parse(data)
parser = FullNameParser.new
tree = parser.parse(data)
if tree.nil?
return "Parse error at offset: #{parser.index}"
end
result_hash = {}
tree.value.each do |node|
result_hash[node[0]] = node[1].strip if node.is_a?(Array) && !node[1].blank?
end
return result_hash
end
end
I've stripped most of the grammar down to just getting words and spaces
grammar FullName
rule word
[^\s]+ {
def value
text_value
end
}
end
rule s
[\s]+ {
def value
""
end
}
end
end
I'm trying to parse 'john smith'. I was hoping to just get back words and spaces and build my logic from there, but I'm stuck at even this simple level. Any suggestions?
AFAIK, Treetop starts parsing with the first rule in your grammar (the rule word, in your case). Now, if your input is 'John Smith' (i.e. word, s, word), it stops parsing after matching the rule word for the first time, and it produces an error when it encounters the first s, since word does not match s.
You need to add a rule to the top of your grammar that describes an entire name: that is a word, followed by a space followed by a word, etc.
grammar FullName
rule name
word (s word)* {
def value
text_value
end
}
end
rule word
[^\s]+ {
def value
text_value
end
}
end
rule s
[\s]+ {
def value
text_value
end
}
end
end
A quick test with the script:
#!/usr/bin/env ruby
require 'rubygems'
require 'treetop'
require 'polyglot'
require 'FullName'
parser = FullNameParser.new
name = parser.parse('John Smith').value
print name
will print:
John Smith
