JavaCC: A LOOKAHEAD of 2 or greater makes my compiler crash? - compilation

I am using the grammar defined in the official Java 8 Language Specification to write a parser for Java.
In my .jj file I get all of the usual kinds of choice conflicts, such as
Warning: Choice conflict involving two expansions at
line 25, column 3 and line 31, column 3 respectively.
A common prefix is:
Consider using a lookahead of 2 for earlier expansion.
or
Warning: Choice conflict in (...)* construct at line 25, column 8.
I carefully read the JavaCC lookahead tutorial, but my problem is that whenever I set a LOOKAHEAD(n) with n > 1 and compile the .jj file, the compilation hangs and I have to kill the Java process.
Why?
CODE
Since I am unable to localize the code that causes the problem, I cannot isolate the corresponding code portions either.
I was able to restrict the search for the erroneous code fragments as follows:
I have uploaded the code at scribd here.
Please note:
The first rules have a leading // OK comment. This means that when the grammar contains only these rules, I do get choice-conflict warnings from the compiler, but when I add
LOOKAHEAD(3)
at the corresponding position, the warnings disappear.
When I add all of the remaining rules (at once), I can no longer add the
LOOKAHEAD(3) statement. When I do, my Eclipse IDE freezes, and the javaw.exe process appears to deadlock or run into an infinite loop when I try to compile the file with JavaCC (which is my actual problem).

Your grammar is so far from LL(1) that it is hard to know where to begin. Let's look at types. After correcting it to follow the grammar in the JLS 8, you have
void Type() :
{ }
{
  PrimitiveType() |
  ReferenceType()
}
where
void PrimitiveType() :
{ }
{
  (Annotation())* NumericType() |
  (Annotation())* <KW_boolean>
}

void ReferenceType() :
{ }
{
  ClassOrInterfaceType() |
  TypeVariable() |
  ArrayType()
}

void ClassOrInterfaceType() :
{ }
{
  (Annotation())* <Identifier> (TypeArguments())? |
  (Annotation())* <Identifier> (TypeArguments())? M()
}
And the error for Type is
Warning: Choice conflict involving two expansions at
line 796, column 3 and line 797, column 3 respectively.
A common prefix is: "#" <Identifier>
Consider using a lookahead of 3 or more for earlier expansion.
The error message tells you exactly what the problem is. There can be annotations at the start of both alternatives in Type. One way to deal with this is to factor out what's common, which is annotations.
Now you have
void Type() :
{ }
{
  ( Annotation() )*
  ( PrimitiveType() | ReferenceType() )
}

void PrimitiveType() :
{ }
{
  NumericType() |
  <KW_boolean>
}

void ReferenceType() :
{ }
{
  ClassOrInterfaceType() |
  TypeVariable() |
  ArrayType()
}

void ClassOrInterfaceType() :
{ }
{
  <Identifier> (TypeArguments())? |
  <Identifier> (TypeArguments())? M()
}
That fixes the problem with Type. There are still lots of problems, but now there is one less.
For example, all three choices in ReferenceType can start with an identifier. In the end you will want something like this
void Type() :
{ }
{
  ( Annotation() )*
  ( PrimitiveType() | ReferenceTypesOtherThanArrays() )
  ( Dims() )?
}

void PrimitiveType() :
{ }
{
  NumericType() | <KW_boolean>
}

void ReferenceTypesOtherThanArrays() :
{ }
{
  <Identifier>
  ( TypeArguments() )?
  (
    <Token_Dot>
    ( Annotation() )*
    <Identifier>
    ( TypeArguments() )?
  )*
}
Notice that TypeVariable is gone. This is because there is no way to syntactically distinguish a type variable from a class (or interface) name. Thus the grammar just above will accept, say, T.x, where T is a type variable, whereas the JLS grammar does not. This is the kind of error you can only rule out using a symbol table. There are a few situations like this in Java; for example, without a symbol table, you can't tell a package name from a class name, or a class name from a variable name; in an expression a.b.c, a could be a package name, a class name, an interface name, a type variable, a variable, or a field name.
You can handle these sorts of issues in one of two ways: you can deal with the problem after parsing, i.e. in a later phase, or you can have a symbol table present during parsing and use it to guide the parser via semantic lookahead. The latter option is not a good one for Java, however; it is best to parse first and deal with all issues that need a symbol table later. This is because, in Java, a symbol can be declared after it is used; it might even be declared in another file. What we did in the Java compiler for the Teaching Machine was to parse all files first, then build a symbol table, then do semantic analysis. Of course, if your application does not require diagnosing all errors, these considerations can largely be ignored.

Related

Alternate syntax on introspecting modules/classes/etc

I'm rewriting a framework from Perl 5 to Perl 6 for my work purposes. At some point I need to collect information from other modules/classes by executing a public sub they might provide; or they may not. So it is necessary to find out whether the sub is present. This is not a big deal when a module is referenced directly (Foo::<&my-sub>) or by a symbolic name in a string (&::("Foo")::my-sub). But for simplicity I would like to allow passing module names as-is (let's say collector is the method collecting the info):
self.collector( Foo );
Where Foo could be the following:
module Foo {
    use Bar;
    use Baz;
    our sub my-sub { Bar, 'Baz' }
}
And this is where I'm missing something important from Perl6 syntax because the following:
method collector ( $mod ) {
    my $mod-name = $mod.WHO;
    my @mods;
    with &::($mod-name)::my-sub {
        @mods.push: &$_();
    }
}
is currently the only way I can perform the task.
I didn't try a type capture yet, though. It should work as expected, I guess. So the question is more about extending my knowledge of the syntax.
The final solution from the exchange with Vadim in the comments on their question. It's arguably insane. They think it's beautiful. And who am I to argue? .oO( Haha, hoho, heehee... )
my $pkg-arg = (Int, 'Int').pick;
my \pkg-sym = $pkg-arg && ::($pkg-arg);
my \sub-ref = &pkg-sym::($subname);
There are two obviously useful ways to refer to a package:
Its symbolic name. Int is the symbolic name of the Int class.
Its string name. 'Int' is the string name of the Int class.
Vadim, reasonably enough, wants a solution for both.
In the solution in this answer I simulate the two types of argument by randomly picking one and assigning it to $pkg-arg:
my $pkg-arg = (Int, 'Int').pick;
Now we need to normalize. If we've got a symbolic name we're good to go. But if it's a string name, we need to turn that into the symbolic name.
Vadim showed a couple of ways to do this in the comments on their question. This solution uses a third option:
my \pkg-sym = $pkg-arg && ::($pkg-arg);
If $pkg-arg holds a symbolic name, it is a type object, which boolifies to False. With a False LHS the && short-circuits and returns its LHS. If $pkg-arg is a string name, the && instead returns its RHS, which is ::($pkg-arg): a symbol lookup using $pkg-arg as a string name.
The upshot is that pkg-sym ends up containing a package symbolic name (or a Failure if the lookup failed to find a matching symbolic name).
Which leaves the last line. That looks for a sub named $subname in the package pkg-sym:
my \sub-ref = &pkg-sym::($subname);
The & is needed to ensure the RHS is treated as a reference rather than as an attempt to call a routine. And pkg-sym has to be a sigilless identifier otherwise the code won't work.
At the end of these three lines of code sub-ref contains either a Failure or a reference to the wanted sub.

Pattern matching over borrowed HashMap containing enums

I'm trying to learn Rust, so bear with me if I'm way off :-)
I have a program that inserts enums into a HashMap, and uses Strings as keys. I'm trying to match over the content of the HashMap. Problem is that I can't figure out how to get the correct borrowings, references and types in the eval_output function. How should the eval_output function look to properly handle a reference to a HashMap? Is there any good document that I can read to learn more about this particular subject?
use std::prelude::*;
use std::collections::HashMap;

enum Op {
    Not(String),
    Value(u16),
}

fn eval_output(output: &str, outputs: &HashMap<String, Op>) -> u16 {
    match outputs.get(output) {
        Some(&op) => {
            match op {
                Op::Not(input) => return eval_output(input.as_str(), outputs),
                Op::Value(value) => return value,
            }
        }
        None => panic!("Did not find input for wire {}", output),
    }
}

fn main() {
    let mut outputs = HashMap::new();
    outputs.insert(String::from("x"), Op::Value(17));
    outputs.insert(String::from("a"), Op::Not(String::from("x")));
    println!("Calculated output is {}", eval_output("a", &outputs));
}
Review what the compiler error message is:
error: cannot move out of borrowed content [E0507]
Some(&op) => {
^~~
note: attempting to move value to here
Some(&op) => {
^~
help: to prevent the move, use `ref op` or `ref mut op` to capture value by reference
While technically correct, using Some(ref op) would be a bit silly, as the type of op would then be a double-reference (&&Op). Instead, we simply remove the & and have Some(op).
This is a common mistake that bites people, because to get it right you have to be familiar with both pattern matching and references, plus Rust's strict borrow checker. When you have Some(&op), that says
Match an Option that is the variant Some. The Some must contain a reference to a value. The referred-to thing should be moved out of where it is and placed into op.
When pattern matching, the two keywords ref and mut can come into play. These are not pattern-matched, but instead they control how the value is bound to the variable name. They are analogs of & and mut.
This leads us to the next error:
error: mismatched types:
expected `&Op`,
found `Op`
Op::Not(input) => return eval_output(input.as_str(), outputs),
^~~~~~~~~~~~~~
It's preferred to do match *some_reference, when possible, but in this case you cannot. So we need to update the pattern to match a reference to an Op — &Op. Look at what error comes next...
error: cannot move out of borrowed content [E0507]
&Op::Not(input) => return eval_output(input.as_str(), outputs),
^~~~~~~~~~~~~~~
It's our friend from earlier. This time we will follow the compiler's advice and change it to ref input. A few more changes and we have it:
use std::collections::HashMap;

enum Op {
    Not(String),
    Value(u16),
}

fn eval_output(output: &str, outputs: &HashMap<String, Op>) -> u16 {
    match outputs.get(output) {
        Some(op) => {
            match op {
                &Op::Not(ref input) => eval_output(input, outputs),
                &Op::Value(value) => value,
            }
        }
        None => panic!("Did not find input for wire {}", output),
    }
}

fn main() {
    let mut outputs = HashMap::new();
    outputs.insert("x".into(), Op::Value(17));
    outputs.insert("a".into(), Op::Not("x".into()));
    println!("Calculated output is {}", eval_output("a", &outputs));
}
There's no need to use std::prelude::*; — the compiler inserts that automatically.
as_str doesn't exist in stable Rust. A reference to a String (&String) can use deref coercions to act like a string slice (&str).
I used into instead of String::from as it's a bit shorter. No real better reason.
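For what it's worth, on more recent Rust (the "match ergonomics" of the 2018 edition and later), the explicit `&` and `ref` in the patterns become unnecessary: matching through a reference automatically binds fields by reference. A sketch of the same program in that style:

```rust
use std::collections::HashMap;

enum Op {
    Not(String),
    Value(u16),
}

fn eval_output(output: &str, outputs: &HashMap<String, Op>) -> u16 {
    // `outputs.get` yields an `Option<&Op>`; matching it with plain patterns
    // binds `input` as `&String` and lets us copy the `u16` out with `*`.
    match outputs.get(output) {
        Some(Op::Not(input)) => eval_output(input, outputs),
        Some(Op::Value(value)) => *value,
        None => panic!("Did not find input for wire {}", output),
    }
}

fn main() {
    let mut outputs = HashMap::new();
    outputs.insert("x".into(), Op::Value(17));
    outputs.insert("a".into(), Op::Not("x".into()));
    println!("Calculated output is {}", eval_output("a", &outputs));
}
```

The semantics are the same as the `&Op::Not(ref input)` version above; the compiler infers the reference bindings for you.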

Where is the best place to define a variable?

I am wondering whether the code below makes any difference for performance or anything else. In the first example there are three variables, and each one is defined at the point where it is first used.
bool myFunc()
{
    string networkName;
    if ( !Parse(example, XML_ATTRIBUTE_NAME, networkName) )
    {
        return false;
    }
    BYTE networkId;
    if ( !Parse(example, XML_ATTRIBUTE_ID, networkId) )
    {
        return false;
    }
    string baudRate;
    if ( !Parse(example, XML_ATTRIBUTE_BAUDRATE, baudRate) )
    {
        return false;
    }
    return true;
}
Does it make any difference, for performance or anything else, compared with the code below?
bool myFunc()
{
    string networkName;
    string baudRate;
    BYTE networkId;
    if ( !Parse(example, XML_ATTRIBUTE_NAME, networkName) )
    {
        return false;
    }
    if ( !Parse(example, XML_ATTRIBUTE_ID, networkId) )
    {
        return false;
    }
    if ( !Parse(example, XML_ATTRIBUTE_BAUDRATE, baudRate) )
    {
        return false;
    }
    return true;
}
Code Readability
The recommended practice is to put the declaration as close as possible to the first place where the variable is used. This also minimizes the scope.
From Steve McConnell's "Code Complete" book:
Ideally, declare and define each variable close to where it’s first
used. A declaration establishes a variable’s type. A definition assigns
the variable a specific value. In languages that support it, such as
C++ and Java, variables should be declared and defined close to where
they are first used. Ideally, each variable should be defined at the
same time it’s declared.
Nevertheless, a few sources recommend placing declarations at the beginning of the block ({}).
From the obsolete Java Code Conventions:
Put declarations only at the beginning of blocks. (A block is any code
surrounded by curly braces "{" and "}".) Don't wait to declare
variables until their first use; it can confuse the unwary programmer
and hamper code portability within the scope.
Declaring variables only at the top of the function is considered bad practice. Place declarations in the most local blocks.
Performance
In fact, it depends. Declaring POD types should not affect performance at all: the memory for all local variables is allocated when the function is called (C, JavaScript, ActionScript...).
Remember that the compiler optimizes your code, so non-POD types probably aren't a problem either (C++).
Usually, choosing where to declare a variable is a premature optimization: any performance boost (or overhead) is microscopic and insignificant. The major argument is still code readability.
Additional Note
Before the C99 standard, the C language required variables to be declared at the beginning of a block.
Summarizing
Considering the above, the best approach (though not mandatory) is to declare a variable as close as possible to its first use, keeping scopes tight.
In general, it's mostly a matter of code readability.

Checking if an optional parameter is provided in Dart

I'm new to Dart and just learning the basics.
The Dart-Homepage shows following:
It turns out that Dart does indeed have a way to ask if an optional
parameter was provided when the method was called. Just use the
question mark parameter syntax.
Here is an example:
void alignDingleArm(num axis, [num rotations]) {
    if (?rotations) {
        // the parameter was really used
    }
}
So I wrote a simple test script to learn:
import 'dart:html';

void main() {
    String showLine(String string, {String printBefore : "Line: ", String printAfter}) {
        // check if the parameter was set manually:
        if (?printBefore) {
            // check if the parameter was set to null
            if (printBefore == null) {
                printBefore = "";
            }
        }
        String line = printBefore + string + printAfter;
        output.appendText(line);
        output.appendHtml("<br />\n");
        return line;
    }
    showLine("Hallo Welt!", printBefore: null);
}
The Dart-Editor already marks the questionmark as Error:
Multiple markers at this line
- Unexpected token '?'
- Conditions must have a static type of
'bool'
When running the script in Dartium, the JS console shows the following error:
Internal error: 'http://localhost:8081/main.dart': error: line 7 pos 8: unexpected token '?'
if(?printBefore){
^
I know, that it would be enough to check if printBefore is null, but I want to learn the language.
Does anyone know the reason for this problem?
How to check, if the parameter is set manually?
The feature existed at some point in Dart's development, but it was removed again because it caused more complication than it removed, without solving the problem that actually needed solving - forwarding of default parameters.
If you have a function foo([x = 42]) and you want a function to forward to it, bar([x]) => foo(x);, then, since foo could actually tell whether x was passed or not, you ended up writing bar([x]) => ?x ? foo(x) : foo();. That was worse than what you had to do without the ?x operator.
Ideas came up about having a bar([x]) => foo(?:x) or something that passed on x if it was present and not if it was absent (I no longer remember the actual proposed syntax), but that got complicated fast, e.g. when converting named arguments to positional - bar({x, y}) => foo(?:x, ?:y); - what if y was provided and x was not? It was really just a bad solution for a self-inflicted problem.
So, the ?x feature was rolled back. All optional parameters have a default value which is passed if there is no matching argument in a call. If you want to forward an optional parameter, you need to know the default value of the function you are forwarding to.
For most function arguments, the declared default value is null, with an internal if (arg == null) arg = defaultValue; statement to fix it. That means that the null value can be forwarded directly without any confusion.
Some arguments have a non-null default value. It's mostly boolean arguments, but there are other cases too. I recommend using null for everything except named boolean parameters (because they are really meant to be named more than they are meant to be optional). At least unless there is a good reason not to - like ensuring that all subclasses will have the same default value for a method parameter (which may be a good reason, or not, and should be used judiciously).
If you have an optional parameter that can also accept null as a value ... consider whether it should really be optional, or if you just need a different function with one more argument. Or maybe you can introduce a different "missing argument" default value. Example:
abstract class C { foo([D something]); }

class _DMarker implements D { const _DMarker(); }

class _ActualC {
    foo([D something = const _DMarker()]) {
        if (something == const _DMarker()) {
            // No argument passed, because user cannot create a _DMarker.
        } else {
            // Argument passed, may be null.
        }
    }
}
This is a big workaround, and hardly ever worth it. In general, just use null as default value, it's simpler.
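The same sentinel idea exists in other languages too; for comparison, here is a minimal Python sketch (the names greet and _MISSING are made up for illustration) of distinguishing "no argument" from an explicit None:

```python
# A private sentinel object; callers cannot plausibly pass it by accident.
_MISSING = object()

def greet(name=_MISSING):
    """Distinguish 'no argument passed' from an explicit None."""
    if name is _MISSING:
        return "no argument passed"
    if name is None:
        return "explicit None passed"
    return f"hello, {name}"

print(greet())         # no argument passed
print(greet(None))     # explicit None passed
print(greet("world"))  # hello, world
```

As in the Dart version, the sentinel only pays off when None (null) is itself a meaningful argument value; otherwise a plain None default is simpler.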
I was trying something similar:
This does not work
widget.optionalStringParameter ? widget.optionalStringParameter : 'default string'
This works
widget.optionalStringParameter != null ? widget.optionalStringParameter : 'default string'
This also works
widget.optionalStringParameter ?? 'default string'
There was support for checking whether an optional parameter was actually provided in the early Dart days (pre 1.0), but it was removed because it caused some trouble.

Explanation of Oslo error "M0197: 'Text' cannot be used in a Type context"?

In Microsoft Oslo SDK CTP 2008 (using Intellipad) the following code compiles fine:
module M {
    type T {
        Text : Text;
    }
}
while compiling the below code leads to the error "M0197: 'Text' cannot be used in a Type context"
module M {
    type T {
        Text : Text;
        Value : Text; // error
    }
}
I do not see the difference between the examples, as in the first case Text is also used in a Type context.
UPDATE:
To add to the confusion, consider the following example, which also compiles fine:
module M {
    type X;
    type T {
        X : X;
        Y : X;
    }
}
The M Language Specification states that:
Field declarations override lexical scoping to prevent the type of a declaration binding to the declaration itself. The ascribed type of a field declaration must not be the declaration itself; however, the declaration may be used in a constraint. Consider the following example:
type A;
type B {
    A : A;
}
The lexically enclosing scope for the type ascription of the field declaration A is the entity declaration B. With no exception, the type ascription A would bind to the field declaration in a circular reference which is an error. The exception allows lexical lookup to skip the field declaration in this case.
It seems that user defined types and built-in (intrinsic) types are not treated equal.
UPDATE2:
Note that Value in the above example is not a reserved keyword. The same error results if you rename Value to Y.
Any ideas?
Regards, tamberg
From what I can see, you have redefined Text:
Text : Text
and then you are attempting to use it for the type of Value:
Value : Text
which is not allowed. Why using a type name as a field name redefines the type I'm not entirely clear on (still reading the M language specification), but I'm sure there's a good reason for it. Just name Text something that's not already a defined type (escaping it with brackets ([Text]) does not work either).
http://social.msdn.microsoft.com/Forums/en-US/oslo/thread/fcaf10a1-52f9-4ab7-bef5-1ad9f9112948
Here's the problem: in M, you can do tricks like this:
module M
{
    type Address;
    type Person
    {
        Addresses : Address*;
        FavoriteAddress : Address where value in Addresses;
    }
}
In that example, "Addresses" refers to Person.Addresses. The problem, then, is that when you write something innocuous like
module M
{
    type T
    {
        Text : Text;
        SomethingElse : Text;
    }
}
...then the "Text" in the type ascription for SomethingElse refers not to Language.Text, but to T.Text. And that's what's going wrong. The workaround is to write it like this:
module M
{
    type T
    {
        Text : Text;
        SomethingElse : Language.Text;
    }
}
(You may wonder why things like "Text : Text" work in the example above. There's a special rule: identifiers in a field's type ascription cannot refer to the field itself. The canonical example for this is "Address : Address".)