Parsing with ANTLR v3 does not produce the full tree - antlr3

I'm using ANTLR v3 and have defined a grammar.
Now I want to display the parse tree (as in the ANTLRWorks Parse Tree or Stack views).
I've tried http://www.antlr.org/wiki/display/ANTLR3/Interfacing+AST+with+Java (walking through the children), but it omits the grammar rules that do not appear literally in the parsed string.
E.g. I have a SQL grammar.
I'm parsing SELECT title,description from document.
In ANTLRWorks I can see (in the parse tree)
root_statement->select_statement->select_expression->select_list->[displayed_column,displayed_column], which is what I want.
But when I get the AST from root_statement (through getChildren), I don't get select_statement or select_expression. The only children are the tokens of the string "SELECT title,description from document".
How can I walk the tree the way I would in ANTLR v4 (root_statement.select_statement.select_expression)?

ANTLR 3 builds ASTs with a custom shape defined by special syntax in the grammar (operators ^, !, and ->). ANTLR 4 builds parse trees that automatically follow the shape of the grammar itself.
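To make ANTLR 3 behave like ANTLR 4, you'd need to create rewrite rules for every parser rule in your grammar where the root node has the name of the rule itself. That root is an imaginary token, so it must be declared in the grammar's tokens { ... } section, and the grammar needs output=AST in its options. For example: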
myParserRule
: x y* -> ^(MyParserRule x y*)
| z+ -> ^(MyParserRule z+)
;
As for the other direction, there isn't an "easy" way to make ANTLR 4 behave like ANTLR 3.
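To illustrate why the rule-named roots matter, here is a small Python model of the two tree shapes. It is only a sketch: Node and child are hypothetical helpers, not part of the ANTLR runtime, and the trees are abbreviated versions of what the SELECT example might produce.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for an AST node: a label plus child nodes."""
    label: str
    children: list = field(default_factory=list)

    def child(self, label):
        """Return the first direct child with the given label."""
        for c in self.children:
            if c.label == label:
                return c
        raise KeyError(label)


# Shape produced when every rule rewrites to ^(RuleName ...), abbreviated.
columns = [Node("displayed_column", [Node("title")]),
           Node("displayed_column", [Node("description")])]
rule_shaped = Node("root_statement", [
    Node("select_statement", [
        Node("select_expression", [
            Node("SELECT"),
            Node("select_list", columns),
            Node("from"),
            Node("document"),
        ]),
    ]),
])

# Shape without such rewrite rules: only the matched tokens, no rule nodes.
flat = Node("nil", [Node(t) for t in
                    ["SELECT", "title", ",", "description", "from", "document"]])

# Walking by rule name only works on the rule-shaped tree.
select_list = (rule_shaped.child("select_statement")
               .child("select_expression")
               .child("select_list"))
column_names = [c.children[0].label for c in select_list.children]
```

On the flat tree, flat.child("select_statement") raises KeyError, which mirrors what the question observes: without the rewrite rules there is no node to walk to.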

Related

Algorithm to tell when we've processed a complex variable path expression while parsing?

I am working on a compiler for a homemade programming language and I am stuck on how to convert the lexical token stream into a tree of commands for constructing a DOM-like tree. The "tree of commands" will still be a list, essentially emitting events that describe how to create a tree from the partial information provided by the lexer. (This language is a bit like CoffeeScript, indentation based, or like XML with a focus on indentation.)
I am stuck on how to tell when a variable path has been discovered. A variable path can be simple, or complex, as these examples demonstrate:
foo
foo.bar
foo.bar[baz].hello[and][goodday].there
this[is[even[more.complicated].wouldnt.you[say]]]
They could get more complicated still, if we handled dynamic interpolation of strings, such as:
foo[`bar${x}abc`].baz
But in my simple lang, there are two relevant things, "paths" and "terms". Terms are anything matching /[a-z]+/ for now, and paths chain and nest terms, as in the first examples.
For demonstration purposes, everything else is a simple "term" of 1 word, so you might have this:
abc foo.bar[baz].hello[and][goodday].there, one foo.bar
It forms a simple tree.
Right now I have a lexer which spits out the tokens, so basically:
abc
[SPACE]
foo
.
bar
[
baz
]
.
hello
[
and
]
[
goodday
]
.
there
,
[SPACE]
one
[SPACE]
foo
.
bar
That is at least how I broke it up initially.
So given that sequence of strings, how can you generate messages to tell the parser how to build a tree?
term
nest-down
term
period
term
open-square
term
close-square
...
That is the stream of tokens with a name now, but it is not a tree yet. I would like this:
term-start
term # value: abc
term-end
nest-down
term-path-start
term-start
term # value: foo
term-end
period
term-start
term # value: bar
term-end
term-nest-start
term-start
term # value: baz
term-end
term-nest-end
...
I have been struggling with this example for several days now (boiled down from a complex real-world scenario). I can't seem to figure out how to keep track of all the information needed to decide when to say "this structure is done now, close it out". I'm wondering if you know how to get past this.
Note, I don't need the last tree to actually be a tree structure visually, I just need it to generate those messages which can be interpreted on the other end and used to construct a tree at runtime.
There is no way to construct a tree from a list without having the description of the tree in some form. Often, in relation to parsing, the description of this tree is given by a context-free grammar (CFG).
Then you create a parser on the basis of this given CFG. The lexical token stream is given as an input to the parser. The parser organizes the lexical tokens into a tree by using some parsing algorithm.
The parser emits commands for syntax tree construction based on the rules it uses during parsing. On entering a rule, a command "rule X enter" is emitted; on exiting a rule, a command "exit X rule" is emitted. When a lexical token is accepted, a "token forward" command is emitted with its lexeme characters. Some grammars, namely those in ABNF format, support repetition of elements. Depending on these repetitions, parts of the syntax tree might be represented as lists or arrays.
Then a builder module receives these commands and builds a tree, or uses the commands for a specific task via the listener pattern.
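As a rough illustration of the enter/exit idea applied to the path example from the question, here is a minimal recursive-descent sketch in Python. It assumes a tiny grammar of my own (path := term ('.' term | '[' path ']')*), reuses the event names from the question, and invents the names "term-path-end" and "comma", which do not appear there. The decision "this structure is done now" falls out of the call stack: a path closes when the function that opened it returns.

```python
def emit_events(tokens):
    """Turn a flat token list into a list of (event, value) pairs."""
    events = []
    pos = 0

    def peek(offset=0):
        i = pos + offset
        return tokens[i] if i < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def emit(name, value=None):
        events.append((name, value))

    def term():
        tok = peek()
        if tok is None or not tok.isalpha():
            raise SyntaxError(f"expected a term, got {tok!r}")
        advance()
        emit("term-start")
        emit("term", tok)
        emit("term-end")

    def path():
        # One token of lookahead tells us whether this is a bare term
        # or the start of a longer path.
        is_path = peek(1) in (".", "[")
        if is_path:
            emit("term-path-start")
        term()
        while peek() in (".", "["):
            if advance() == ".":
                emit("period")
                term()
            else:
                emit("term-nest-start")
                path()  # brackets may contain a whole path
                if peek() != "]":
                    raise SyntaxError("expected ']'")
                advance()
                emit("term-nest-end")
        if is_path:
            emit("term-path-end")  # assumed name, not from the question

    while pos < len(tokens):
        if peek() == "[SPACE]":
            advance()
            emit("nest-down")
        elif peek() == ",":
            advance()
            emit("comma")  # assumed name, not from the question
        else:
            path()
    return events


events = emit_events(["abc", "[SPACE]", "foo", ".", "bar", "[", "baz", "]"])
event_names = [name for name, _ in events]
```

A builder on the receiving end can keep its own stack, pushing on every "-start" event and popping on the matching "-end" event.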
I co-authored a paper (2021) describing a list of commands for building concrete/abstract syntax trees, depending on the CFG's structure, that are used in the parsers generated by the parser generator Tunnel Grammar Studio.
The paper is named "The Expressive Power of the Statically Typed Concrete Syntax Trees". It is in an open-access journal (intentionally). The commands are in section "4.3 Syntax Structure Construction Commands". The article is a bit "compressed" due to space limitations, and it is not really intended to be a software development guide but rather to document the approach taken. It might give you some ideas.
Another paper I co-authored in 2021, named "A Parsing Machine Architecture Encapsulating Different Parsing Approaches" (also in an open-access journal), describes a general form of parsing machine and its modules. There, Fig. 1 on p. 33 will give you a quick description.
Disclaimer: I have made the parser generator.

Rascal: Grammar Stack Trace

When parsing a file with a specific grammar and the parse fails, I get a corresponding error message with the location in the source file that offended the grammar.
What I would like to look at in these situations would be the list of grammar rules that were active at this moment, something like a grammar rule "stack trace", or the rules that have matched so far.
Is this possible in Rascal?
So, for a very simple example, in the EXP language from the documentation, if I tried to parse "2 + foo" I could get something like
Exp
=> left Exp "+" Exp
=> left IntegerLiteral "+" Exp
=> left IntegerLiteral "+" <?>
No derivation of "foo" from rule 'Exp'
Another way of saying this is looking at an incomplete parse tree, as it was the moment the parse error occurred. Does that make sense?
It makes total sense, but I'm afraid this "incomplete parse tree" feature is on our TODO list.
Note that with the non-deterministic parsing algorithm it would probably return a set of current parse contexts, so a "parse forest" rather than a single stack trace. Still I think that would be a very useful debugging feature.
The only suggestion I can offer right now is "delta-debugging": remove half of the input and check whether the parse error is still there, then try the other half, and repeat on ever smaller pieces.
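The halving procedure can be automated. Below is a simplified Python sketch of the idea (a greedy reduction, not the full ddmin algorithm). The still_fails callback is an assumption: in practice it would run your parser on the candidate input and report whether the parse error is still raised; here a stand-in predicate is used.

```python
def shrink_failing_input(lines, still_fails):
    """Greedily drop chunks of `lines` for as long as `still_fails` holds."""
    chunk = len(lines) // 2
    while chunk >= 1:
        i = 0
        while i < len(lines):
            candidate = lines[:i] + lines[i + chunk:]
            if candidate and still_fails(candidate):
                lines = candidate  # the dropped chunk was not needed
            else:
                i += chunk  # the chunk is needed, keep it and move on
        chunk //= 2  # retry with finer-grained chunks
    return lines


def parse_fails(lines):
    """Stand-in for 'the real parser raises a parse error' (assumption)."""
    return any("foo" in line for line in lines)


source = ["1 + 2", "3 * 4", "2 + foo", "5 - 1", "6"]
smallest = shrink_failing_input(source, parse_fails)
```

With the stand-in predicate, the five-line input shrinks to the single offending line, which is usually small enough to inspect against the grammar by hand.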

How to build Prolog grammar parse tree consisting of two sentences joined by a conjunction

I have the following Prolog code to recognise a sentence. Notice that it also builds a parse tree for the grammar.
sentence(plural,s(Np,Vp)) -->
noun_phrase(plural,Np),
verb_phrase(plural,Vp).
sentence(singular,s(Np,Vp)) -->
noun_phrase(singular,Np),
verb_phrase(singular,Vp).
I need a predicate that can recognise a compound sentence (one that consists of two sentences joined by a conjunction). I came up with the following code, but execution fails. Of course, my Prolog code also contains definitions for noun_phrase, verb_phrase and so on.
compound_sentence(comp_s(s1,Conj,s2)) -->
sentence(_,s1(Np,Vp)),
conjuction(_,Conj),
sentence(_,s2(Np,Vp)).
E.g. when I run this query, it fails.
?- phrase(compound_sentence(_),
[the,reboot,is,a,success,and,the,user,does,a,save]).
How do you go about detecting compound sentences?
The query
phrase(compound_sentence(_), ...)
fails because (a) the two subgoals sentence(_, s1(Np,Vp)) and sentence(_, s2(Np,Vp)) cannot match the parse tree that sentence/2 builds, which is s(Np,Vp), and (b) the two sentences cannot share the same Np and Vp. Note also that s1 and s2 in comp_s(s1,Conj,s2) are lowercase and therefore atoms, not variables. Try something like this:
compound_sentence(comp_s(S1,Conj,S2)) -->
sentence(_, S1),
conjuction(_,Conj),
sentence(_, S2).
where S1 = s(Np1, Vp1) corresponding to the first sentence, and S2 = s(Np2, Vp2) for the second.

Java Grammar To AST

In a Java grammar I have a parser rule:
name
: Identifier ('.' Identifier)* ';'
;
How can I get all the identifiers under a single AST node?
It seems impossible to me with only your lexer and parser.
For this you will need what is called a tree walker. This third stage of the parsing process lets you traverse the generated AST and, with a counter, print the number of occurrences.
I'll leave a reference here in case you decide to implement it.
https://theantlrguy.atlassian.net/wiki/display/ANTLR3/Tree+construction
I hope this helps!

Are there alternative ways to display a list other than by using loop?

I know how to display a list by using a loop.
For example,
choice(a):-write('This is the top 15 countries list:'),nl,
loop(X).
loop(X):-country(X),write(X),nl,fail.
Unfortunately, I don't know how to display the list by using a list. Can anyone guide me?
It's not very clear what you're trying to achieve.
I'm not sure from your description whether you have quite got to grips with the declarative style of Prolog. When you wrote your rule for loop you were providing a set of conditions under which Prolog would match the rule. This is different from a set of procedural instructions.
If you want to collect all the countries into a list, you can use the setof/3 predicate as follows:
top_countries(Cs):-
setof(C, country(C), Cs).
This unifies Cs with a sorted, duplicate-free list of the countries matched by country/1.
If you wanted to output each element of this list on a new line, you could use something like the following recursive predicate.
write_list([]).
write_list([H|T]):-
write(H),nl,
write_list(T).
The first rule matches the base case, when there are no elements left in the list. At this point we should match and stop. The second rule matches (unifies) the head of the list and writes it to the screen with a newline after it. Its final line calls write_list recursively on the tail (remainder) of the list.
You could then string them together with something like the following:
choice(a):-
write('This is the top 15 countries list:'),nl,
top_countries(X),
write_list(X).
Things to note
Try not to have singleton variables such as the X in your choice rule. Variables are there to unify (match) against something.
Look into good declarative programming style. When you use predicates like write, it can be tempting to treat Prolog in a procedural manner, but this will just cause you problems.
Hope this helps.
write/1 doesn't only write strings, it writes any Prolog term. So, though Oli has given a prettier write_list, the following would do the job:
choice(Countries):-write('This is the top 15 countries list:'),nl,write(Countries).
