Odd behavior of parser when trying sample grammar - antlr3

I'm trying to get a feel for antlr3, so I pasted the Expression evaluator into an ANTLRWorks window (latest version) and compiled it. It compiled successfully and started, but there are two problems:
Attempting to use an input of 1+2*4/3; resulted in the actual input for the parser being 1+2*43.
One of the errors it shows in its graphical parse tree is MissingTokenException(0!=0).
As I'm new to antlr, can someone help?

The example you linked to doesn't support division. Just look at the code and you'll notice there's no division here:
expr returns [int value]
    : e=multExpr {$value = $e.value;}
      ( '+' e=multExpr {$value += $e.value;}
      | '-' e=multExpr {$value -= $e.value;}
      )*
    ;

We often get
MissingTokenException(0!=0)
when we make mistakes. I think it means that the parser cannot find a token it is looking for, and it can be caused by an incorrect token. Depending on the grammar, the parser can sometimes "recover" from this.
Remember also that the lexer operates before the parser, so you should check which tokens are actually passed to the parser. The ANTLRWorks debugger can be very helpful here.
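To illustrate how a missing lexer rule could turn 1+2*4/3; into 1+2*43, here is a minimal Python sketch. It is not ANTLR's API. The token set and the `lex` function are hypothetical, and the sketch assumes the lexer simply drops any character it has no rule for, which would be consistent with the input shown in the question.

```python
import re

# Hypothetical token set for a grammar that knows integers, '+', '-', '*',
# parentheses and ';' but has no rule for '/'.
TOKEN_RE = re.compile(r"\d+|[+\-*();]")


def lex(text):
    """Return (tokens, skipped).

    Characters that match no token rule are collected in `skipped` and
    left out of the token stream (assumed recovery behaviour).
    """
    tokens, skipped = [], []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match:
            tokens.append(match.group())
            pos = match.end()
        else:
            skipped.append(text[pos])
            pos += 1
    return tokens, skipped


tokens, skipped = lex("1+2*4/3;")
print("".join(tokens))  # the text the parser effectively sees
print(skipped)          # the characters that never reached the parser
```

Note that in this sketch the parser would receive 4 and 3 as two separate integer tokens with no operator between them, which is the kind of situation where a parser reports a missing or unexpected token.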


Compile error using fast pipe operator after pipe last in ReasonML

The way that the "fast pipe" operator is compared to the "pipe last" in many places implies that they are drop-in replacements for each other. Want to send a value in as the last parameter to a function? Use pipe last (|>). Want to send it as the first parameter? Use fast pipe (once upon a time |., now deprecated in favour of ->).
So you'd be forgiven for thinking, as I did until earlier today, that the following code would get you the first match out of the regular expression match:
Js.String.match([%re "/(\\w+:)*(\\w+)/i"], "int:id")
|> Belt.Option.getExn
-> Array.get(1)
But you'd be wrong (again, as I was earlier today...)
Instead, the compiler emits the following error:
We've found a bug for you!
OCaml preview 3:10-27
This has type:
'a option -> 'a
But somewhere wanted:
'b array
See this sandbox. What gives?
Looks like they screwed up the precedence of -> so that it's actually interpreted as
Js.String.match([%re "/(\\w+:)*(\\w+)/i"], "int:id")
|> (Belt.Option.getExn->Array.get(1));
With the operators inlined:
Array.get(Belt.Option.getExn, 1, Js.String.match([%re "/(\\w+:)*(\\w+)/i"], "int:id"));
or with the partial application more explicit, since Reason's syntax is a bit confusing with regards to currying:
let f = Array.get(Belt.Option.getExn, 1);
f(Js.String.match([%re "/(\\w+:)*(\\w+)/i"], "int:id"));
Replacing -> with |. works, as does replacing the |> with |. instead.
I'd consider this a bug in Reason, but would in any case advise against using "fast pipe" at all since it invites a lot of confusion for very little benefit.
Also see this discussion on GitHub, which contains various workarounds. Leaving @glennsl's as the accepted answer because it describes the nature of the problem.
Update: there is also an article on Medium that goes into a lot of depth about the pros and cons of "data first" and "data last", specifically as it applies to OCaml / Reason with BuckleScript.
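The two groupings discussed in the answer can be modelled in Python to see why one of them fails. This is only a sketch: `pipe_last`, `pipe_first`, `get_exn`, `array_get` and the sample `match_result` are hypothetical stand-ins, not the Reason or Belt functions, and Python reports the problem at run time where Reason reports it at compile time.

```python
def pipe_last(value, fn):
    """Model of `value |> fn`: value is passed as the last (only) argument."""
    return fn(value)


def pipe_first(value, fn, *rest):
    """Model of `value -> fn(rest...)`: value is passed as the first argument."""
    return fn(value, *rest)


def get_exn(option):
    """Stand-in for Belt.Option.getExn; None plays the role of the empty option."""
    if option is None:
        raise ValueError("getExn called on None")
    return option


def array_get(array, index):
    """Stand-in for Array.get."""
    return array[index]


# Hypothetical result of the regular expression match from the question.
match_result = ["int:id", "int:", "id"]

# Intended grouping: (match |> getExn) -> Array.get(1)
intended = pipe_first(pipe_last(match_result, get_exn), array_get, 1)


def misgrouped():
    # Grouping described in the answer: match |> (getExn -> Array.get(1)).
    # The inner pipe calls array_get(get_exn, 1), i.e. it indexes a function.
    partially_applied = pipe_first(get_exn, array_get, 1)
    return pipe_last(match_result, partially_applied)


print(intended)
try:
    misgrouped()
except TypeError as error:
    print("misgrouped pipeline fails:", error)
```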

Datalog code not working in DrRacket

I am trying to run this Prolog code in DrRacket: http://www.anselm.edu/homepage/mmalita/culpro/graf1.html
#lang datalog
arc(a,b).
arc(b,c).
arc(a,c).
arc(a,d).
arc(b,e).
arc(e,f).
arc(b,f).
arc(f,g).
pathall(X,X,[]).
pathall(X,Y,[X,Z|L]):- arc(X,Z),pathall(Z,Y,L). % error on this line;
pathall(a,g)?
However, it gives the following error:
read: expected a `]' to close `['
I suspect the '|' symbol is not being read as the head-tail separator of the list. Additionally, [] also gives an error (if the subsequent line is removed):
#%app: missing procedure expression;
probably originally (), which is an illegal empty application in: (#%app)
How can these be corrected so that the code works and searches for paths between a and g?
The Datalog module in DrRacket is not an implementation of Prolog, and the syntax that you have used is not allowed (see the manual for the allowed syntax).
In particular, terms cannot be data structures like lists ([]). To run a program like the one above you need a Prolog interpreter with data structures.
What you can do is define for instance a predicate path, like in the example that you have linked:
path(X,Y):- arc(X,Y).
path(X,Y):- arc(X,Z),path(Z,Y).
and, for instance, ask if a path exists or not, as in:
path(a,g)?
or print all the paths to a certain node with
path(X,g)?
etc.
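For comparison, here is a small Python sketch of what the two path rules compute on the arc facts from the question, plus a helper that lists the actual routes (which is what the Prolog pathall predicate was after). The function names are hypothetical, the fixpoint loop is a naive bottom-up evaluation rather than what the Racket Datalog engine does internally, and `all_routes` assumes the graph has no cycles.

```python
ARCS = {
    ("a", "b"), ("b", "c"), ("a", "c"), ("a", "d"),
    ("b", "e"), ("e", "f"), ("b", "f"), ("f", "g"),
}


def path_closure(arcs):
    """Naive fixpoint for:
         path(X,Y) :- arc(X,Y).
         path(X,Y) :- arc(X,Z), path(Z,Y).
    """
    paths = set(arcs)
    while True:
        derived = {
            (x, y)
            for (x, z) in arcs
            for (z2, y) in paths
            if z == z2
        }
        new_facts = derived - paths
        if not new_facts:
            return paths
        paths |= new_facts


def all_routes(arcs, start, goal):
    """List every node sequence from start to goal (assumes an acyclic graph)."""
    if start == goal:
        return [[goal]]
    routes = []
    for (x, z) in sorted(arcs):
        if x == start:
            routes.extend([start] + rest for rest in all_routes(arcs, z, goal))
    return routes


closure = path_closure(ARCS)
print(("a", "g") in closure)                       # path(a,g)?
print(sorted(x for (x, y) in closure if y == "g"))  # path(X,g)?
print(all_routes(ARCS, "a", "g"))
```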

Java Grammar To AST

In a Java grammar I have a parser rule,
name
: Identifier ('.' Identifier)* ';'
;
How can I get all the identifiers under a single AST tree node?
It seems impossible to me with only your lexer and parser.
For this, you will need what is called a tree walker. This third part of the parsing process lets you go through the generated AST and, with a counter, print the number of occurrences.
I'm leaving a reference here in case you decide to implement it.
https://theantlrguy.atlassian.net/wiki/display/ANTLR3/Tree+construction
I hope this helps!

Writing F# object expression in one single line

As I was about to write a code generator for F#, I was wondering whether I could avoid stepping into the indentation mess by generating only one-line-values.
As part of this effort, I was considering how I could express Object Expressions in one single line, but couldn't succeed, except in Verbose Mode.
let Expr() = (let ToString = "ToString" in { new System.Object() with member this.ToString() = ToString; member this.GetHashCode() = ToString.GetHashCode()})
The issue is that I don't want to generate all my code in Verbose mode, this is a compatibility feature. Is there any other option?
Thanks a lot in advance for your insights!
François
The reason I ask is that I have to generate Object Expressions inside arbitrary expressions, and I would like to avoid counting the number of characters in the current line to compute how much I have to indent the next line.
(Shameless plug) I maintain an F# source code formatter which exposes APIs to pretty-print the full F# 3.0 syntax. You have several options:
Implement your code generator using verbose mode and run the source code formatter APIs on the output. Then you don't have to worry about indentation, and line breaks in verbose mode are very easy.
Implement your code generator using functional combinators. There are many combinators in the FormatConfig module which you can copy and modify. F#'s indentation rules are clear; you can read more in this article.
You probably have an AST for pretty printing. If you prefer a lightweight solution, F# Compiler CodeDom has similar combinators for code generation.
This is not strictly speaking an answer, but what is so bad about writing it with proper indentation? You can always make a list of the generated lines and add indentation in a separate step. This is how I usually generate code, even if the language (e.g. HTML) does not need indentation.
Why does generated code have to be illegible for humans? Proper indentation makes a review of the generated code easier, and you can even use the usual diff tools if, for instance, you have the generated sources committed to source control.
At least in my experience, proper indentation is actually far less difficult than one might think, especially when generating. Don't forget that you have all the context at hand during generation, so adding a level of indentation should be very easy.
let indent = List.map (sprintf " %s")
[
    yield "let Expr() ="
    yield! [
        yield "let ToString = \"ToString\""
        yield "{"
        yield! [
            "new System.Object() with"
            "member this.ToString() = ToString"
            "member this.GetHashCode() = ToString.GetHashCode()"
        ] |> indent
        yield "}"
    ] |> indent
]
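The same idea (build a list of lines first, then add indentation per nesting level as a separate step) can be sketched in Python. The helper names and the four-space prefix are assumptions made for illustration. The sketch only mirrors the structure of the snippet above and does not check that the emitted F# compiles.

```python
def indent(lines, prefix="    "):
    """Return a copy of `lines` with one more level of indentation."""
    return [prefix + line for line in lines]


def render_expr():
    """Build the generated source as a flat list of already indented lines."""
    members = indent([
        "new System.Object() with",
        "member this.ToString() = ToString",
        "member this.GetHashCode() = ToString.GetHashCode()",
    ])
    body = ['let ToString = "ToString"', "{"] + members + ["}"]
    return ["let Expr() ="] + indent(body)


print("\n".join(render_expr()))
```

Because each nested part is indented when it is spliced into its parent, no part of the generator needs to know its absolute column.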

Pythonesque blocks and postfix expressions

In JavaScript,
f = function(x) {
    return x + 1;
}
(5)
seems at a glance as though it should assign f the successor function, but actually assigns the value 6, because the lambda expression followed by parentheses is interpreted by the parser as a postfix expression, specifically a function call. Fortunately this is easy to fix:
f = function(x) {
    return x + 1;
};
(5)
behaves as expected.
If Python allowed a block in a lambda expression, there would be a similar problem:
f = lambda(x):
    return x + 1
(5)
but this time we can't solve it the same way because there are no semicolons. In practice Python avoids the problem by not allowing multiline lambda expressions, but I'm working on a language with indentation-based syntax where I do want multiline lambda and other expressions, so I'm trying to figure out how to avoid having a block parse as the start of a postfix expression. Thus far I'm thinking maybe each level of the recursive descent parser should have a parameter along the lines of 'we have already eaten a block in this statement so don't do postfix'.
Are there any existing languages that encounter this problem, and how do they solve it if so?
Python has semicolons. This is perfectly valid (though ugly and not recommended) Python code: f = lambda(x): x + 1; (5).
There are many other problems with multi-line lambdas in otherwise standard Python syntax though. It is completely incompatible with how Python handles indentation (whitespace in general, actually) inside expressions: it doesn't, and that's the complete opposite of what you want. You should read the numerous python-ideas threads about multi-line lambdas. It's somewhere between very hard and impossible.
If you want arbitrarily complex compound statements inside lambdas you can't use the existing rules for multi-line expressions even if you made all statements expressions. You'd have to change the indentation handling (see the language reference for how it works right now) so that expressions can also contain blocks. This is hard to do without breaking perfectly fine Python code, and will certainly result in a language many Python programmers will consider worse in several regards: Harder to understand, more complex to implement, permits some stupid errors, etc.
Most languages don't solve this exact problem at all. Most candidates (Scala, Ruby, Lisps, and variants of these three) have explicit end-of-block tokens. I know of two languages that have the same problem, one of which (Haskell) has been mentioned by another answer. CoffeeScript also uses indentation without end-of-block tokens. It parses the transliteration of your example correctly. However, I could not find any specification of how or why it does this (and I won't dig through the parser source code). Both differ significantly from Python in syntax as well as design philosophy, so their solution is of little (if any) use for Python.
In Haskell, there is an implicit semicolon whenever you start a line with the same indentation as a previous one, assuming the parser is in a layout-sensitive mode.
More specifically, after a token is encountered that signals the start of a (layout-sensitive) block, the indentation level of the first token of the first block item is remembered. Each line that is indented more continues the current block item; each line that is indented the same starts a new block item, and the first line that is indented less implies the closure of the block.
How your last example would be treated depends on whether the f = is a block item in some block or not. If it is, then there will be an implicit semicolon between the lambda expression and the (5), since the latter is indented the same as the former. If it is not, then the (5) will be treated as continuing whatever block item the f = is a part of, making it an argument to the lambda function.
The details are a bit messier than this; look at the Haskell 2010 report.
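The indentation rule described above can be sketched in a few lines of Python. This is a deliberately simplified model, not the algorithm from the Haskell report: it handles a single block, ignores nested blocks and explicit braces, and the function name `split_block_items` is hypothetical.

```python
def split_block_items(lines):
    """Group the lines of one layout block into block items.

    The indentation of the first non-blank line is remembered. A line
    indented the same starts a new item (the implicit semicolon), a line
    indented more continues the current item, and the first line indented
    less closes the block.
    """
    items = []
    block_indent = None
    for line in lines:
        if not line.strip():
            continue  # blank lines carry no layout information here
        indent = len(line) - len(line.lstrip(" "))
        if block_indent is None:
            block_indent = indent
        if indent < block_indent:
            break  # dedent: the block is closed
        if indent == block_indent:
            items.append([line.strip()])
        else:
            items[-1].append(line.strip())
    return items


# A transliteration of the lambda example, with f = as a block item.
source = [
    "f = \\x ->",
    "    x + 1",
    "(5)",
]
print(split_block_items(source))
```

Under this model the (5) starts a new item because it is indented the same as the f = line, so it is not parsed as an argument to the lambda.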
