Decision can match input such as "{'A'..'Z', '_', 'a'..'z'}" using multiple alternatives: 1, 3 - antlr3

I am a newbie to ANTLR 3.5. I understand that left recursion is accepted in ANTLR 4.0 but not in 3.5. I am getting an ambiguity warning for my grammar.
I am just trying to validate an email address using this grammar. Can someone help me fix it?
grammar HelloWorld;
options
{
// antlr will generate java lexer and parser
language = Java;
// generated parser should create abstract syntax tree
output = AST;
backtrack = true;
}
//as the generated lexer will reside in com.nuwaza.aqua.antlr
//package, we have to add package declaration on top of it
@lexer::header {
package com.nuwaza.aqua.antlr;
}
//as the generated parser will reside in org.meri.antlr_step_by_step.parsers
//package, we have to add package declaration on top of it
@parser::header {
package com.nuwaza.aqua.antlr;
}
// ***************** parser rules:
//our grammar accepts only salutation followed by an end symbol
expression : EmailId At Domain Dot Web EOF;
// ***************** lexer rules:
//the grammar must contain at least one lexer rule
EmailId: (Domain)+;
At : '@';
Domain:(Identifier)+;
Dot: DotOperator;
Web:(Identifier)+|(DotOperator)+|(Identifier)+;
/*Space
:
(
' '
| '\t'
| '\r'
| '\n'
| '\u000C'
)
{
skip();
}
;*/
Identifier
:
(
'a'..'z'
| 'A'..'Z'
| '_'
)
(
'a'..'z'
| 'A'..'Z'
| '_'
| Digit
)*
;
fragment
Digit
:
'0'..'9'
;
fragment DotOperator:'.';

I assume that your problem is in your rule: Identifier. If I were you, I would do something like:
Identifier : ID (ID |Digit)*;
fragment ID : ('a'..'z' | 'A'..'Z' | '_');
I hope this would help you. ;)
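To make the suggested rule concrete, here is a minimal Java sketch of what Identifier : ID (ID | Digit)* accepts: one letter or underscore, followed by any number of letters, underscores, or digits. The class and method names (IdentifierCheck, isIdStart, isIdentifier) are hypothetical and are not part of ANTLR or of the generated lexer.

```java
// Hypothetical helper that mirrors the lexer rule
//   Identifier : ID (ID | Digit)* ;
// It is not generated by ANTLR; it only illustrates what the rule matches.
final class IdentifierCheck {

    // Mirrors: fragment ID : ('a'..'z' | 'A'..'Z' | '_');
    static boolean isIdStart(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
    }

    // Mirrors: fragment Digit : '0'..'9';
    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // True if the whole string is a single Identifier.
    static boolean isIdentifier(String s) {
        if (s.isEmpty() || !isIdStart(s.charAt(0))) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!isIdStart(c) && !isDigit(c)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isIdentifier("user_1"));
        System.out.println(isIdentifier("1user"));
    }
}
```

Note that a dot is not part of an Identifier, so something like a.b is split into several tokens by the lexer.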

I have two separate grammar files, and I am trying to import them into a combined grammar to get different levels of abstraction.
My code is as follows:
HelloWorldParser.g
parser grammar HelloWorldParser;
options
{
// antlr will generate java lexer and parser
language = Java;
// generated parser should create abstract syntax tree
output = AST;
}
//as the generated parser will reside in org.meri.antlr_step_by_step.parsers
//package, we have to add package declaration on top of it
// ***************** parser rules:
//our grammar accepts only salutation followed by an end symbol
expression1
:
Hello World EOF;
and HelloWorldLexer.g
lexer grammar HelloWorldLexer;
//as the generated lexer will reside in com.nuwaza.aqua.antlr
//package, we have to add package declaration on top of it
//as the generated parser will reside in org.meri.antlr_step_by_step.parsers
//package, we have to add package declaration on top of it
// ***************** lexer rules:
Hello: 'Hello';
World: 'World';
My combined grammar is
Test.g
grammar Test;
options
{
// antlr will generate java lexer and parser
language = Java;
// generated parser should create abstract syntax tree
output = AST;
}
import HelloWorldLexer, HelloWorldParser;
@lexer::header {
package com.nuwaza.aqua.antlr;
}
@parser::header {
package com.nuwaza.aqua.antlr;
}
// ***************** parser rules:
//our grammar accepts only salutation followed by an end symbol
expression:expression1;
My LexerParserGenerator is :
package com.nuwaza.aqua.antlr.generator;
import org.antlr.Tool;
public class LexerParserGenerator {
private static final String OUTPUT_DIRECTORY_KEY = "-o";
public static void main(String[] args) {
//provide the grammar ( .g file) residing path
String grammarPath = "./src/main/resources/grammar/Test.g";
//Specify the path with which grammar has to be generated.
String outputPath = "./src/main/java/com/nuwaza/aqua/antlr/";
Tool tool = new Tool(new String[] { grammarPath, OUTPUT_DIRECTORY_KEY,
outputPath });
tool.process();
}
}

Related

JavaCC: A LOOKAHEAD of 2 or greater makes my compiler crash?

I am using the Grammar defined in the official Java 8 Language Specification to write a Parser for Java.
In my .jj file I have all of the usual kinds of choice conflicts such as
Warning: Choice conflict involving two expansions at
line 25, column 3 and line 31, column 3 respectively.
A common prefix is:
Consider using a lookahead of 2 for earlier expansion.
or
Warning: Choice conflict in (...)* construct at line 25, column 8.
I did carefully read the Lookahead tutorial from JavaCC but my problem is that whenever I set a LOOKAHEAD(n) where n > 1 and I compile the .jj file the compilation gets stuck and I need to kill the java process.
Why?
CODE
Since I am unable to localize the code which causes my problem, I am also not able to isolate the corresponding code portions.
I was able to restrict the search for the erroneous code fragments as follows:
I have uploaded the code at scribd here.
Please note:
The first rules have a leading // OK comment. This means that when I only have these rules I do get the warnings from the compiler that I have choice conflicts but when I add
LOOKAHEAD(3)
at the corresponding position the warnings disappear.
When I add all successive rules (at once) I am not able to add the
LOOKAHEAD(3) statement anymore. When I do, my Eclipse IDE freezes and the javaw.exe process seems to deadlock or run into an infinite loop when I try to compile the file with JavaCC (which is my actual problem).
Your grammar is so far from LL(1) that it is hard to know where to begin. Let's look at types. After correcting it to follow the grammar in the JLS 8, you have
void Type() :
{ }
{
PrimitiveType() |
ReferenceType()
}
where
void PrimitiveType() :
{ }
{
(Annotation())* NumericType() |
(Annotation())* <KW_boolean>
}
void ReferenceType() :
{ }
{
ClassOrInterfaceType() |
TypeVariable() |
ArrayType()
}
void ClassOrInterfaceType() :
{ }
{
(Annotation())* <Identifier> (TypeArguments())? |
(Annotation())* <Identifier> (TypeArguments())? M()
}
And the error for Type is
Warning: Choice conflict involving two expansions at
line 796, column 3 and line 797, column 3 respectively.
A common prefix is: "@" <Identifier>
Consider using a lookahead of 3 or more for earlier expansion.
The error message tells you exactly what the problem is. There can be annotations at the start of both alternatives in Type. One way to deal with this is to factor out what's common, which is annotations.
Now you have
void Type() :
{ }
{
( Annotation() )*
( PrimitiveType() | ReferenceType() )
}
void PrimitiveType() :
{ }
{
NumericType() |
<KW_boolean>
}
void ReferenceType() :
{ }
{
ClassOrInterfaceType() |
TypeVariable() |
ArrayType()
}
void ClassOrInterfaceType() :
{ }
{
<Identifier> (TypeArguments())? |
<Identifier> (TypeArguments())? M()
}
That fixes the problem with Type. There are still lots of problems, but now there is one less.
For example, all three choices in ReferenceType can start with an identifier. In the end you will want something like this
void Type() :
{ }
{
( Annotation() )*
( PrimitiveType() | ReferenceTypesOtherThanArrays() )
( Dims() )?
}
void PrimitiveType() :
{ }
{
NumericType() | <KW_boolean>
}
void ReferenceTypesOtherThanArrays() :
{ }
{
<Identifier>
( TypeArguments() )?
(
<Token_Dot>
( Annotation() )*
<Identifier>
( TypeArguments() )?
)*
}
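To see why the factored shape needs only one token of lookahead, here is a hand-written recursive-descent sketch in Java with the same structure. It is a simplification under stated assumptions, not JavaCC output: tokens are plain strings, an annotation is reduced to "@" followed by an identifier, the primitive types are limited to int and boolean, TypeArguments are left out, and Dims is reduced to repeated "[" "]" pairs. The class name TypeParserSketch and its methods are hypothetical.

```java
import java.util.List;

// Simplified recursive-descent parser following the factored grammar:
//   Type := Annotation* (PrimitiveType | ReferenceTypesOtherThanArrays) Dims?
// Every decision below inspects exactly one token (peek()).
final class TypeParserSketch {
    private final List<String> tokens;
    private int pos = 0;

    TypeParserSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    // Returns true if the whole token list is a Type in this simplified grammar.
    static boolean parses(List<String> tokens) {
        try {
            TypeParserSketch p = new TypeParserSketch(tokens);
            p.type();
            return p.pos == tokens.size();
        } catch (IllegalStateException e) {
            return false;
        }
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "<EOF>";
    }

    private static boolean isPrimitive(String t) {
        return t.equals("int") || t.equals("boolean"); // assumed subset
    }

    private static boolean isIdent(String t) {
        return !isPrimitive(t) && t.matches("[A-Za-z_][A-Za-z_0-9]*");
    }

    private void expect(String t) {
        if (!peek().equals(t)) {
            throw new IllegalStateException("expected " + t + " but found " + peek());
        }
        pos++;
    }

    private void identifier() {
        if (!isIdent(peek())) {
            throw new IllegalStateException("expected identifier but found " + peek());
        }
        pos++;
    }

    // Annotation* : annotations are consumed once, before the choice is made.
    private void annotations() {
        while (peek().equals("@")) {
            pos++;
            identifier();
        }
    }

    private void referenceType() {
        identifier();
        while (peek().equals(".")) {
            pos++;
            annotations();
            identifier();
        }
    }

    private void type() {
        annotations();
        if (isPrimitive(peek())) {
            pos++;
        } else {
            referenceType();
        }
        while (peek().equals("[")) { // simplified Dims
            pos++;
            expect("]");
        }
    }
}
```

Because the annotations are consumed before the choice between primitive and reference type, the parser never has to look past them to decide which alternative to take.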
Notice that TypeVariable is gone. This is because there is no way to syntactically distinguish a type variable from a class (or interface) name. Thus the grammar just above will accept, say, T.x, where T is a type variable, whereas the JLS grammar does not. This is the kind of error you can only rule out using a symbol table. There are a few situations like this in Java; for example, without a symbol table, you can't tell a package name from a class name or a class name from a variable name. In an expression a.b.c, a could be a package name, a class name, an interface name, a type variable, a variable, or a field name.
You can handle these sorts of issues in one of two ways: you can deal with the problem after parsing, i.e. in a later phase, or you can have a symbol table present during the parsing phase and use it to guide the parser via semantic lookahead. The latter option is not a good one for Java, however; it is best to parse first and deal with all issues that need a symbol table later. This is because in Java a symbol can be declared after it is used; it might even be declared in another file. What we did in the Java compiler for the Teaching Machine was to parse all files first, then build a symbol table, then do semantic analysis. Of course, if your application does not require diagnosing all errors, these considerations can largely be ignored.
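The parse-first ordering can be sketched in a few lines of Java. This is only an illustration of the phase order, assuming the parser has already recorded, per file, which names were declared and which were used; ParsedUnit, buildSymbolTable, and undeclared are hypothetical names and do not come from JavaCC or from the Teaching Machine compiler.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of "parse everything, then build the symbol table, then analyse".
final class TwoPhaseResolver {

    // Phase 1 result for one file: names are recorded, not judged.
    static final class ParsedUnit {
        final List<String> declared = new ArrayList<>();
        final List<String> used = new ArrayList<>();
    }

    // Phase 2: collect declarations from all files into one symbol table.
    static Set<String> buildSymbolTable(List<ParsedUnit> units) {
        Set<String> table = new HashSet<>();
        for (ParsedUnit unit : units) {
            table.addAll(unit.declared);
        }
        return table;
    }

    // Phase 3: semantic analysis. A use is fine even if the declaration
    // appears later or in another file; only truly missing names are reported.
    static List<String> undeclared(List<ParsedUnit> units, Set<String> table) {
        List<String> missing = new ArrayList<>();
        for (ParsedUnit unit : units) {
            for (String name : unit.used) {
                if (!table.contains(name)) {
                    missing.add(name);
                }
            }
        }
        return missing;
    }
}
```

A name used in the first file but declared in the second is resolved without any trouble, which is exactly what a symbol table consulted during parsing could not guarantee.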

Antlr v3 comment processing VHDL

I am facing an ANTLR problem in a VHDL grammar I wrote. VHDL doesn't have true multiline comments, and no pragmas, so tool vendors invented a comment based mechanism to exclude certain parts of the code, something like
-- pragma translate_off
code to disregard
-- pragma translate_on
('--' introduces a comment in VHDL) where the actual code for the pragma varies, "synopsys translate off" and "rtl translate_off" are known variants.
The part of the ANTLR grammar handling comments is now
@lexer::members {
private static final Pattern translateOnPattern = Pattern.compile("\\s*--\\s*(rtl_synthesis\\s+on|(pragma|synthesis|synopsys)\\s+translate(\\s|_)on)\\s*");
private static final Pattern translateOffPattern = Pattern.compile("\\s*--\\s*(rtl_synthesis\\s+off|(pragma|synthesis|synopsys)\\s+translate(\\s|_)off)\\s*");
private boolean translateOn = true;
}
[...]
COMMENT
: '--' ( ~( '\n' | '\r' ) )*
{
$channel = CHANNEL_COMMENT;
String content = getText();
Matcher mOn = translateOnPattern.matcher(content);
if(mOn.matches()) {
translateOn = true;
}
Matcher mOff = translateOffPattern.matcher(content);
if(mOff.matches()) {
translateOn = false;
}
}
;
The problem is that my comments go to the hidden channel, and while I can recognize these pragmas by processing the comment in a lexer action using a regex, I have not found a way to direct all following tokens to the hidden channel until "-- pragma translate_on". Is that possible, or would you generally use a different approach?
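For reference, the intended on/off behaviour can be written down outside ANTLR as a plain Java sketch that works line by line. It reuses the two regular expressions from the grammar above and does not touch the ANTLR API, so it says nothing about how to reroute tokens inside the lexer; the class name TranslatePragmaFilter and the method visibleLines are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Line-based sketch of the translate_off / translate_on state machine.
// This is a stand-in for the lexer logic, not ANTLR code.
final class TranslatePragmaFilter {
    private static final Pattern ON = Pattern.compile(
        "\\s*--\\s*(rtl_synthesis\\s+on|(pragma|synthesis|synopsys)\\s+translate(\\s|_)on)\\s*");
    private static final Pattern OFF = Pattern.compile(
        "\\s*--\\s*(rtl_synthesis\\s+off|(pragma|synthesis|synopsys)\\s+translate(\\s|_)off)\\s*");

    // Returns the lines that remain visible; pragma lines themselves are dropped.
    static List<String> visibleLines(List<String> lines) {
        List<String> visible = new ArrayList<>();
        boolean translateOn = true;
        for (String line : lines) {
            if (ON.matcher(line).matches()) {
                translateOn = true;
            } else if (OFF.matcher(line).matches()) {
                translateOn = false;
            } else if (translateOn) {
                visible.add(line);
            }
        }
        return visible;
    }
}
```

Everything between an "off" pragma and the next "on" pragma is discarded, which is the effect the hidden channel is meant to achieve for tokens.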

How should I handle a template string in Xtext?

I have the following input:
a: "a is {Foo foo}"
where foo is of type Foo; the business layer will eventually provide a value for the variable. After processing the input, two files will be generated, a properties file and a Java class:
a=a is {0}
public static I18nMessage a( Foo foo ) {
return new I18nMessageBuilder().id( "a" ).args( foo ).build();
}
The idea is that I assign each message an id which gives me a Java class that contains methods (where name == id) and which accepts typed parameters to complete the messages.
My question: How should I handle the text strings in my Xtext grammar? I would like to have code completion for the parameter types (Foo) but I have no idea how to handle the rest of the string which can contain spaces and any valid Unicode character.
Suggestions?
I found a better solution which gives a pretty simple grammar and solves some other problems:
a(Foo foo): "a is " foo;
So the text message is a list of strings and parameters. This way, code completion is very simple and there is no need for escape sequences if you want to add formatters:
a(Date d): "Date is " d( "short" );
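With the message modelled as a list of string and parameter parts, producing the properties value becomes a simple walk over that list. The Java sketch below shows one way this could look; it is an assumption about the generator, not Xtext API, and the names MessagePatternSketch, Part, and toPattern are hypothetical. Escaping of quotes and braces in the literal text is deliberately left out.

```java
import java.util.List;

// Sketch: turn a parsed message (literals and parameter references)
// into the value written to the properties file, e.g. "a is {0}".
final class MessagePatternSketch {

    // One element of the message body.
    static final class Part {
        final String text;      // literal text, or the parameter name
        final boolean isParam;

        private Part(String text, boolean isParam) {
            this.text = text;
            this.isParam = isParam;
        }

        static Part literal(String text) {
            return new Part(text, false);
        }

        static Part param(String name) {
            return new Part(name, true);
        }
    }

    // Parameters are numbered by their position in the declared parameter list.
    static String toPattern(List<String> paramNames, List<Part> parts) {
        StringBuilder sb = new StringBuilder();
        for (Part part : parts) {
            if (part.isParam) {
                int index = paramNames.indexOf(part.text);
                if (index < 0) {
                    throw new IllegalArgumentException("unknown parameter: " + part.text);
                }
                sb.append('{').append(index).append('}');
            } else {
                sb.append(part.text);
            }
        }
        return sb.toString();
    }
}
```

For the message a(Foo foo): "a is " foo; this yields a is {0}, matching the properties entry shown in the question.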

Why does one of these statements compile in Scala but not the other?

(Note: I'm using Scala 2.7.7 here, not 2.8).
I'm doing something pretty simple -- creating a map based on the values in a simple, 2-column CSV file -- and I've completed it easily enough, but I'm perplexed at why my first attempt didn't compile. Here's the code:
// Returns Iterator[String]
private def getLines = Source.fromFile(csvFilePath).getLines
// This doesn't compile:
def mapping: Map[String,String] = {
Map(getLines map { line: String =>
val pairArr = line.split(",")
pairArr(0) -> pairArr(1).trim()
}.toList:_*)
}
// This DOES compile
def mapping: Map[String,String] = {
def strPair(line: String): (String,String) = {
val pairArr = line.split(",")
pairArr(0) -> pairArr(1).trim()
}
Map(getLines.map( strPair(_) ).toList:_*)
}
The compiler error is
CsvReader.scala:16: error: value toList is not a member of (String) => (java.lang.String, java.lang.String)
[scalac] possible cause: maybe a semicolon is missing before `value toList'?
[scalac] }.toList:_*)
[scalac] ^
[scalac] one error found
So what gives? They seem like they should be equivalent to me, apart from the explicit function definition (vs. anonymous in the nonworking example) and () vs. {}. If I replace the curly braces with parentheses in the nonworking example, the error is "';' expected, but 'val' found." But if I remove the local variable definition and split the string twice AND use parens instead of curly braces, it compiles. Can someone explain this difference to me, preferably with a link to Scala docs explaining the difference between parens and curly braces when used to surround method arguments?
Looks like the difference is because you are using the operator notation in the first example. If you add an extra set of parentheses it works:
def mapping: Map[String,String] = {
Map((getLines map { line: String =>
val pairArr = line.split(",")
pairArr(0) -> pairArr(1).trim()
}).toList:_*)
}
or if you don't use the operator syntax it works
def mapping: Map[String,String] = {
Map(getLines.map({ line: String =>
val pairArr = line.split(",")
pairArr(0) -> pairArr(1).trim()
}).toList:_*)
}
I think the problem is that using the normal method invocation syntax has higher precedence than the operator syntax for method calls. This meant that the .toList was being applied to the anonymous function rather than to the result of the map method call.
If you don't use operator syntax, it compiles fine:
//Compiles
def mapping: Map[String,String] = {
Map(getLines.map { line: String =>
val pairArr = line.split(",")
pairArr(0) -> pairArr(1).trim()
}.toList:_*)
}
There is no problem with how you use the anonymous function, but as Ben mentioned, calling map without the . is not equivalent to the typical Java-style method call.

Explanation of Oslo error "M0197: 'Text' cannot be used in a Type context"?

In Microsoft Oslo SDK CTP 2008 (using Intellipad) the following code compiles fine:
module M {
type T {
Text : Text;
}
}
while compiling the below code leads to the error "M0197: 'Text' cannot be used in a Type context"
module M {
type T {
Text : Text;
Value : Text; // error
}
}
I do not see the difference between the examples, as in the first case Text is also used in a Type context.
UPDATE:
To add to the confusion, consider the following example, which also compiles fine:
module M {
type X;
type T {
X : X;
Y : X;
}
}
The M Language Specification states that:
Field declarations override lexical scoping to prevent the type of a declaration binding to the declaration itself. The ascribed type of a field declaration must not be the declaration itself; however, the declaration may be used in a constraint. Consider the following example:
type A;
type B {
A : A;
}
The lexically enclosing scope for the type ascription of the field declaration A is the entity declaration B. With no exception, the type ascription A would bind to the field declaration in a circular reference which is an error. The exception allows lexical lookup to skip the field declaration in this case.
It seems that user defined types and built-in (intrinsic) types are not treated equal.
UPDATE2:
Note that Value in the above example is not a reserved keyword. The same error results if you rename Value to Y.
Any ideas?
Regards, tamberg
From what I am seeing you have redefined Text:
Text : Text
and then you are attempting to use it for the type of Value:
Value : Text
which is not allowed. Why using a type name as a property redefines a type I'm not entirely clear on (still reading M language specification), but I'm sure there's a good reason for it. Just name Text something that's not already a defined type (escaping it with brackets ([Text]) does not work either).
http://social.msdn.microsoft.com/Forums/en-US/oslo/thread/fcaf10a1-52f9-4ab7-bef5-1ad9f9112948
Here's the problem: in M, you can do tricks like this:
module M
{
type Address;
type Person
{
Addresses : Address*;
FavoriteAddress : Address where value in Addresses;
}
}
In that example, "Addresses" refers to Person.Addresses. The problem, then, is that when you write something innocuous like
module M
{
type T
{
Text : Text;
SomethingElse : Text;
}
}
...then the "Text" in the type ascription for SomethingElse refers not to Language.Text, but to T.Text. And that's what's going wrong. The workaround is to write it like this:
module M
{
type T
{
Text : Text;
SomethingElse : Language.Text;
}
}
(You may wonder why things like "Text : Text" work in the example above. There's a special rule: identifiers in a field's type ascription cannot refer to the field itself. The canonical example for this is "Address : Address".)

Resources