ANTLR3 MySQL parser cannot parse some SQL statements

I am building a simple SQL parser with ANTLR 3 to analyze MySQL statements. Below is a piece of my grammar file.
It works for some SQL statements, but others cause an exception.
For example, "set @@global.sysvar = 123" and "set @@session.sysvar = 123" parse fine.
But "set global sysvar = 123" and "set session sysvar = 123" cause an exception:
Recognition exception MismatchedTokenException(0!=0)
Error node 3 (,1:4], resync=set global sysvarid = 123>/0)
Why does this happen?
grammar zhihu;
options
{
language=Java;
output=AST;
//ASTLabelType=CommonTree;
backtrack=true;
}
fragment A_ : 'a' | 'A';
fragment B_ : 'b' | 'B';
fragment C_ : 'c' | 'C';
fragment D_ : 'd' | 'D';
fragment E_ : 'e' | 'E';
fragment F_ : 'f' | 'F';
fragment G_ : 'g' | 'G';
fragment H_ : 'h' | 'H';
fragment I_ : 'i' | 'I';
fragment J_ : 'j' | 'J';
fragment K_ : 'k' | 'K';
fragment L_ : 'l' | 'L';
fragment M_ : 'm' | 'M';
fragment N_ : 'n' | 'N';
fragment O_ : 'o' | 'O';
fragment P_ : 'p' | 'P';
fragment Q_ : 'q' | 'Q';
fragment R_ : 'r' | 'R';
fragment S_ : 's' | 'S';
fragment T_ : 't' | 'T';
fragment U_ : 'u' | 'U';
fragment V_ : 'v' | 'V';
fragment W_ : 'w' | 'W';
fragment X_ : 'x' | 'X';
fragment Y_ : 'y' | 'Y';
fragment Z_ : 'z' | 'Z';
GLOBAL : G_ L_ O_ B_ A_ L_ ;
SESSION : S_ E_ S_ S_ I_ O_ N_ ;
SET : S_ E_ T_ ;
//SYSDATE : S_ Y_ S_ D_ A_ T_ E_ ;
//SYSTEM_USER : S_ Y_ S_ T_ E_ M_ '_' U_ S_ E_ R_ ;
EQ : '=';
SET_VAR : ':=' ;
COMMA : ',' ;
DOT : '.' ;
INTEGER_NUM: ('0'..'9')+ ;
ID:
( 'A'..'Z' | 'a'..'z' | '_' | '$') ( 'A'..'Z' | 'a'..'z' | '_' | '$' | '0'..'9' )*
;
SYS_VAR
: GLOBAL ID
| SESSION ID
| ('@@' | ('@@' GLOBAL DOT) | ('@@' SESSION DOT)) ID
;
WHITE_SPACE : ( ' '|'\r'|'\t'|'\n' ) {$channel=HIDDEN;} ;
set_sysvar_statement:
SET SYS_VAR (SET_VAR | EQ) INTEGER_NUM (COMMA SYS_VAR (SET_VAR | EQ) INTEGER_NUM)*
;
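A likely cause, offered as a hedged reading rather than a confirmed diagnosis: SYS_VAR is a lexer rule, and a lexer rule matches contiguous characters, so its GLOBAL ID and SESSION ID alternatives can only match when no whitespace separates the two parts. For "set global sysvar = 123" the lexer would then emit GLOBAL and ID as separate tokens, while set_sysvar_statement expects a single SYS_VAR after SET. The sketch below is Python, not ANTLR: a toy longest-match tokenizer (ANTLR 3's lexer prediction differs in detail) whose rule table and `tokenize` helper are made up for illustration. The system-variable prefix is written `@@`, as in MySQL.

```python
import re

ID = r"[A-Za-z_$][A-Za-z_$0-9]*"

# Toy rule table loosely mirroring the lexer rules above (illustrative only).
RULES = [
    ("GLOBAL", r"global"),
    ("SESSION", r"session"),
    ("SET", r"set"),
    ("SET_VAR", r":="),
    ("EQ", r"="),
    ("COMMA", r","),
    ("DOT", r"\."),
    ("INTEGER_NUM", r"[0-9]+"),
    ("ID", ID),
    # A lexer rule matches contiguous characters: no whitespace is skipped
    # between "global"/"session" and the identifier that follows.
    ("SYS_VAR", rf"(?:global|session){ID}|@@(?:(?:global|session)\.)?{ID}"),
    ("WHITE_SPACE", r"[ \r\t\n]"),
]
COMPILED = [(name, re.compile(pattern, re.IGNORECASE)) for name, pattern in RULES]


def tokenize(text):
    """Return token names, choosing the longest match (first rule wins ties)."""
    names, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, regex in COMPILED:
            match = regex.match(text, pos)
            if match and len(match.group()) > best_len:
                best_name, best_len = name, len(match.group())
        if best_name is None:
            raise ValueError(f"no rule matches at offset {pos}")
        if best_name != "WHITE_SPACE":  # whitespace goes to the hidden channel
            names.append(best_name)
        pos += best_len
    return names


print(tokenize("set @@global.sysvar = 123"))  # ['SET', 'SYS_VAR', 'EQ', 'INTEGER_NUM']
print(tokenize("set global sysvar = 123"))    # ['SET', 'GLOBAL', 'ID', 'EQ', 'INTEGER_NUM']
```

If this reading is right, one option would be to express the "global name" and "session name" forms as a parser rule built from the GLOBAL, SESSION and ID tokens, since hidden whitespace between tokens does not affect parser rules.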

Related

Parsing iOS/macOS localizable strings file with antlr4

I am trying to parse the localized "strings" files of macOS/iOS.
The format of this file is based on key/value pairs, with optional comments. An example follows:
/* This is a comment */
// This is also a comment
"key1" = "value1";
"key2" = "value2";
and so on. Note that the text inside the quotes can be absolutely anything.
EDIT: the original, erroneous grammar has been removed.
I tried to write a simple grammar for this, but unfortunately it doesn't work.
Since the contents inside the quotes can be quite tricky, not to mention the comments, I feel that a plain regex has no real power here.
EDIT: based on the comments by @GRosenberg I've created a new grammar. Now the problem is that I can't include Symbol in Char, or else parsing breaks.
grammar LProj;
Esc : '\\';
Spaces : [ \t\r\n\f]+;
BlockComment : '/*' .*? ('*/' | EOF) ;
LineComment : '//' ~[\r\n]* ( '\r'? '\n' [ \t]* '//' ~[\r\n]* )* ;
MLN_COMMENT: BlockComment -> channel(HIDDEN) ;
SLN_COMMENT: LineComment -> channel(HIDDEN) ;
doc : expression*;
expression
: BlockComment
| LineComment
| Spaces
| entry
;
entry : '"' key=VALUE '"' Spaces? '=' Spaces? '"' value=VALUE '"' Spaces? ';' ;
VALUE : ( EscSeq | Val )+ ;
fragment Val : Char ( EscSeq | Char )* ;
fragment Symbol
: '*'
| '/'
| ';'
| '='
;
fragment Char
: Spaces
| '!' // skip "
| '#'..')' // skip *
| '+'..'.' // skip /
| '0'..':' // skip ;
| '<' // skip =
| '>'..'[' // skip \
| ']'..'~'
| '\u00B7'..'\ufffd'
; // ignores | ['\u10000-'\uEFFFF] ;
fragment UnicodeEsc
: 'u' (Hex (Hex (Hex Hex?)?)?)?
;
fragment Hex : [0-9a-fA-F] ;
fragment EscSeq
: Esc
( [btnfr"\\] // standard escaped character set
| UnicodeEsc // standard Unicode escape sequence
| . // Invalid escape character
| EOF // Incomplete at EOF
)
;
The ANTLR grammar repository provides good examples of how to achieve the stated goal. Just define the ID terminal so that it allows escape sequences.
Thus (with obvious details omitted),
id : QUOTE key=ID EQ val=ID QUOTE ;
DOC_COMMENT: DocComment -> channel(HIDDEN) ;
MLN_COMMENT: BlockComment -> channel(HIDDEN) ;
SLN_COMMENT: LineComment -> channel(HIDDEN) ;
NAME : NameStartChar NameChar* ;
VALUE : ( EscSeq | Val )+ ;
fragment Val : NameStartChar ( EscSeq | NameChar )* ;
fragment Hws : [ \t] ;
fragment Vws : [\r\n\f] ;
fragment DocComment : '/**' .*? ('*/' | EOF) ;
fragment BlockComment : '/*' .*? ('*/' | EOF) ;
fragment LineComment : '//' ~[\r\n]* ( '\r'? '\n' Hws* '//' ~[\r\n]* )* ;
// escaped short-cut character or Unicode literal
fragment EscSeq
: Esc
( [btnfr"\\] // standard escaped character set
| UnicodeEsc // standard Unicode escape sequence
| . // Invalid escape character
| EOF // Incomplete at EOF
)
;
fragment Esc : '\\' ;
fragment UnicodeEsc
: 'u' (Hex (Hex (Hex Hex?)?)?)?
;
// A valid hex digit
fragment Hex : [0-9a-fA-F] ;
fragment NameChar
: NameStartChar
| '0'..'9'
| '_'
| '\u00B7'
| '\u0300'..'\u036F'
| '\u203F'..'\u2040'
;
fragment NameStartChar
: 'A'..'Z'
| 'a'..'z'
| '\u00C0'..'\u00D6'
| '\u00D8'..'\u00F6'
| '\u00F8'..'\u02FF'
| '\u0370'..'\u037D'
| '\u037F'..'\u1FFF'
| '\u200C'..'\u200D'
| '\u2070'..'\u218F'
| '\u2C00'..'\u2FEF'
| '\u3001'..'\uD7FF'
| '\uF900'..'\uFDCF'
| '\uFDF0'..'\uFFFD'
; // ignores | ['\u10000-'\uEFFFF] ;
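To make the idea concrete: the point is to let a single token cover a whole quoted string, escape sequences included, so that nothing inside the quotes can be mistaken for a comment or a separator. The sketch below is Python rather than ANTLR, and the regular expression only plays the role of such a token. The names `TOKEN` and `parse_strings` are hypothetical, and escape sequences are kept verbatim rather than decoded.

```python
import re

# One alternation scanned left to right, so a "//" inside a quoted string
# is consumed as part of the string and never seen as a comment.
TOKEN = re.compile(
    r"""
      (?P<COMMENT> /\*.*?\*/ | //[^\r\n]* )
    | (?P<STRING>  "(?:\\.|[^"\\])*" )     # escapes: backslash + any char
    | (?P<EQ>      = )
    | (?P<SEMI>    ; )
    | (?P<WS>      \s+ )
    """,
    re.VERBOSE | re.DOTALL,
)


def parse_strings(text):
    """Return {key: value} from `"key" = "value";` entries, skipping comments."""
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        if match.lastgroup not in ("COMMENT", "WS"):
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()

    entries = {}
    # Every entry is exactly four tokens: STRING EQ STRING SEMI.
    for i in range(0, len(tokens), 4):
        kinds = [kind for kind, _ in tokens[i:i + 4]]
        if kinds != ["STRING", "EQ", "STRING", "SEMI"]:
            raise ValueError(f"malformed entry near token {i}")
        key, value = tokens[i][1][1:-1], tokens[i + 2][1][1:-1]
        entries[key] = value  # escape sequences are kept verbatim
    return entries


sample = r'''
/* This is a comment */
// This is also a comment
"key1" = "value1";
"key2" = "say \"hi\"; see http://example.com // not a comment";
'''
print(parse_strings(sample))
```

The second entry is there on purpose: its value contains an escaped quote, a semicolon and a `//`, all of which stay inside the string because the string pattern is tried at the opening quote and consumes everything up to the matching closing quote.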

Antlr4 error when running parser

I am relatively new to ANTLR and was working on adding code to a parser for my purposes. I got it set up, but it gives me an error:
line 1:192 extraneous input '"cafe,restaurant,hotel"' expecting {'"', ')'}
for the input:
select * from source S where overlap(S.txt, "cafe,restaurant,hotel") > 0 Collective ( order by 1/Dist (S.loc, q1.loc) + Overlap(S.txt, "cafe, restaurant, hotel") limit contains(Collect(S.txt),"cafe,restaurant,hotel") Skip contains(collect(S.txt),S.txt))
I am thinking my problem is at this line:
limit
:(K_CONTAINS ( '(' K_COLLECT '('(file_read)')' ',' ('"'any_name'"' )? ')'(any_name)? )?)?
//(K_CONTAINS ( '(' K_COLLECT '('(file_read)')' ',' '"'any_name'"' ')')?)?
;
Here is my SQLite.g4 file for reference:
/*
* The MIT License (MIT)
*
* Copyright (c) 2014 by Bart Kiers
*
* Permission is hereby granted, free of charge, to any person
* obtaining a copy of this software and associated documentation
* files (the "Software"), to deal in the Software without
* restriction, including without limitation the rights to use,
* copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following
* conditions:
*
* The above copyright notice and this permission notice shall be
* included in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
* OTHER DEALINGS IN THE SOFTWARE.
*
* Project : sqlite-parser; an ANTLR4 grammar for SQLite
* https://github.com/bkiers/sqlite-parser
* Developed by : Bart Kiers, bart@big-o.nl
*/
grammar SQLite;
parse
: ( sql_stmt_list | error )* EOF
;
error
: UNEXPECTED_CHAR
{
throw new RuntimeException("UNEXPECTED_CHAR=" + $UNEXPECTED_CHAR.text);
}
;
sql_stmt_list
: ';'* sql_stmt ( ';'+ sql_stmt )* ';'*
;
sql_stmt
: ( K_EXPLAIN ( K_QUERY K_PLAN )? )? ( alter_table_stmt
| analyze_stmt
| attach_stmt
| begin_stmt
| commit_stmt
| compound_select_stmt
| create_index_stmt
| create_table_stmt
| create_trigger_stmt
| create_view_stmt
| create_virtual_table_stmt
| delete_stmt
| delete_stmt_limited
| detach_stmt
| drop_index_stmt
| drop_table_stmt
| drop_trigger_stmt
| drop_view_stmt
| factored_select_stmt
| insert_stmt
| pragma_stmt
| reindex_stmt
| release_stmt
| rollback_stmt
| savepoint_stmt
| simple_select_stmt
| select_stmt
| update_stmt
| update_stmt_limited
| vacuum_stmt )
;
alter_table_stmt
: K_ALTER K_TABLE ( database_name '.' )? table_name
( K_RENAME K_TO new_table_name
| K_ADD K_COLUMN? column_def
)
;
analyze_stmt
: K_ANALYZE ( database_name | table_or_index_name | database_name '.' table_or_index_name )?
;
attach_stmt
: K_ATTACH K_DATABASE? expr K_AS database_name
;
begin_stmt
: K_BEGIN ( K_DEFERRED | K_IMMEDIATE | K_EXCLUSIVE )? ( K_TRANSACTION transaction_name? )?
;
commit_stmt
: ( K_COMMIT | K_END ) ( K_TRANSACTION transaction_name? )?
;
compound_select_stmt
: with_clause?
select_core ( ( K_UNION K_ALL? | K_INTERSECT | K_EXCEPT ) select_core )+
( K_ORDER K_BY ordering_term ( ',' ordering_term )* )?
( K_LIMIT expr ( ( K_OFFSET | ',' ) expr )? )?
;
create_index_stmt
: K_CREATE K_UNIQUE? K_INDEX ( K_IF K_NOT K_EXISTS )?
( database_name '.' )? index_name K_ON table_name '(' indexed_column ( ',' indexed_column )* ')'
( K_WHERE expr )?
;
create_table_stmt
: K_CREATE ( K_TEMP | K_TEMPORARY )? K_TABLE ( K_IF K_NOT K_EXISTS )?
( database_name '.' )? table_name
( '(' column_def ( ',' column_def )*? ( ',' table_constraint )* ')' ( K_WITHOUT IDENTIFIER )?
| K_AS select_stmt
)
;
create_trigger_stmt
: K_CREATE ( K_TEMP | K_TEMPORARY )? K_TRIGGER ( K_IF K_NOT K_EXISTS )?
( database_name '.' )? trigger_name ( K_BEFORE | K_AFTER | K_INSTEAD K_OF )?
( K_DELETE | K_INSERT | K_UPDATE ( K_OF column_name ( ',' column_name )* )? ) K_ON ( database_name '.' )? table_name
( K_FOR K_EACH K_ROW )? ( K_WHEN expr )?
K_BEGIN ( ( update_stmt | insert_stmt | delete_stmt | select_stmt ) ';' )+ K_END
;
create_view_stmt
: K_CREATE ( K_TEMP | K_TEMPORARY )? K_VIEW ( K_IF K_NOT K_EXISTS )?
( database_name '.' )? view_name K_AS select_stmt
;
create_virtual_table_stmt
: K_CREATE K_VIRTUAL K_TABLE ( K_IF K_NOT K_EXISTS )?
( database_name '.' )? table_name
K_USING module_name ( '(' module_argument ( ',' module_argument )* ')' )?
;
delete_stmt
: with_clause? K_DELETE K_FROM qualified_table_name
( K_WHERE expr )?
;
delete_stmt_limited
: with_clause? K_DELETE K_FROM qualified_table_name
( K_WHERE expr )?
( ( K_ORDER K_BY ordering_term ( ',' ordering_term )* )?
K_LIMIT expr ( ( K_OFFSET | ',' ) expr )?
)?
;
detach_stmt
: K_DETACH K_DATABASE? database_name
;
drop_index_stmt
: K_DROP K_INDEX ( K_IF K_EXISTS )? ( database_name '.' )? index_name
;
drop_table_stmt
: K_DROP K_TABLE ( K_IF K_EXISTS )? ( database_name '.' )? table_name
;
drop_trigger_stmt
: K_DROP K_TRIGGER ( K_IF K_EXISTS )? ( database_name '.' )? trigger_name
;
drop_view_stmt
: K_DROP K_VIEW ( K_IF K_EXISTS )? ( database_name '.' )? view_name
;
factored_select_stmt
: with_clause?
select_core ( compound_operator select_core )*
( K_ORDER K_BY ordering_term ( ',' ordering_term )* )?
( K_LIMIT expr ( ( K_OFFSET | ',' ) expr )? )?
;
insert_stmt
: with_clause? ( K_INSERT
| K_REPLACE
| K_INSERT K_OR K_REPLACE
| K_INSERT K_OR K_ROLLBACK
| K_INSERT K_OR K_ABORT
| K_INSERT K_OR K_FAIL
| K_INSERT K_OR K_IGNORE ) K_INTO
( database_name '.' )? table_name ( '(' column_name ( ',' column_name )* ')' )?
( K_VALUES '(' expr ( ',' expr )* ')' ( ',' '(' expr ( ',' expr )* ')' )*
| select_stmt
| K_DEFAULT K_VALUES
)
;
pragma_stmt
: K_PRAGMA ( database_name '.' )? pragma_name ( '=' pragma_value
| '(' pragma_value ')' )?
;
reindex_stmt
: K_REINDEX ( collation_name
| ( database_name '.' )? ( table_name | index_name )
)?
;
release_stmt
: K_RELEASE K_SAVEPOINT? savepoint_name
;
rollback_stmt
: K_ROLLBACK ( K_TRANSACTION transaction_name? )? ( K_TO K_SAVEPOINT? savepoint_name )?
;
savepoint_stmt
: K_SAVEPOINT savepoint_name
;
simple_select_stmt
: with_clause?
select_core ( K_ORDER K_BY ordering_term ( ',' ordering_term )* )?
( K_LIMIT expr ( ( K_OFFSET | ',' ) expr )? )?
;
select_stmt
: with_clause?
select_or_values ( compound_operator select_or_values )*
( K_ORDER K_BY ordering_term ( ',' ordering_term )* )?
( K_LIMIT limit | (expr ( ( K_OFFSET | ',' ) expr )?) )?
( K_SKIP (K_CONTAINS ( '(' K_COLLECT '('(file_read)')' ',' ('"'(any_name)'"' ')')? (file_read)? )?)?)?
;//Skip and Contains will come here added
select_or_values
: K_SELECT ( K_DISTINCT | K_ALL )? result_column ( ',' result_column )* (',' K_OVER'(' K_PARTITION K_BY (K_WITHIN K_DISTANCE)? '('any_name '.' any_name ',' any_name')' ',' K_COLLECTIVE'(' (K_SKIP (K_CONTAINS ( '(' K_COLLECT '('(file_read)')' ',' ('"'(any_name)'"' )? (file_read)? )?)?')' ( K_LIMIT (K_CONTAINS ( '(' K_COLLECT '('(file_read)')' ',' ('"'(any_name)'"')? ')'')')))? (K_HAVING expr K_ORDER K_BY ordering_term ( ',' ordering_term )*)?)?)?
( K_FROM ( table_or_subquery ( ',' table_or_subquery )* | join_clause ) )?
( K_WHERE expr )?
( K_COLLECTIVE '(' K_ORDER K_BY ordering_term ( K_LIMIT limit K_SKIP (K_CONTAINS ( '(' K_COLLECT '('(file_read)')' ',' ('"'(any_name)'"' )? (file_read)? )?')')?)? ')')?
( K_GROUP K_BY expr ( ',' expr )* ( K_HAVING expr )? )?
| K_VALUES '(' expr ( ',' expr )* ')' ( ',' '(' expr ( ',' expr )* ')' )*
;
//Partition by added here
update_stmt
: with_clause? K_UPDATE ( K_OR K_ROLLBACK
| K_OR K_ABORT
| K_OR K_REPLACE
| K_OR K_FAIL
| K_OR K_IGNORE )? qualified_table_name
K_SET column_name '=' expr ( ',' column_name '=' expr )* ( K_WHERE expr )?
;
update_stmt_limited
: with_clause? K_UPDATE ( K_OR K_ROLLBACK
| K_OR K_ABORT
| K_OR K_REPLACE
| K_OR K_FAIL
| K_OR K_IGNORE )? qualified_table_name
K_SET column_name '=' expr ( ',' column_name '=' expr )* ( K_WHERE expr )?
( ( K_ORDER K_BY ordering_term ( ',' ordering_term )* )?
K_LIMIT expr ( ( K_OFFSET | ',' ) expr )?
)?
;
vacuum_stmt
: K_VACUUM
;
column_def
: column_name type_name? column_constraint*
;
type_name
: name+? ( '(' signed_number ')'
| '(' signed_number ',' signed_number ')' )?
;
column_constraint
: ( K_CONSTRAINT name )?
( K_PRIMARY K_KEY ( K_ASC | K_DESC )? conflict_clause K_AUTOINCREMENT?
| K_NOT? K_NULL conflict_clause
| K_UNIQUE conflict_clause
| K_CHECK '(' expr ')'
| K_DEFAULT (signed_number | literal_value | '(' expr ')')
| K_COLLATE collation_name
| foreign_key_clause
)
;
conflict_clause
: ( K_ON K_CONFLICT ( K_ROLLBACK
| K_ABORT
| K_FAIL
| K_IGNORE
| K_REPLACE
)
)?
;
//added limit here
limit
:(K_CONTAINS ( '(' K_COLLECT '('(file_read)')' ',' ('"'any_name'"' )? ')'(any_name)? )?)?
//(K_CONTAINS ( '(' K_COLLECT '('(file_read)')' ',' '"'any_name'"' ')')?)?
;
/*
SQLite understands the following binary operators, in order from highest to
lowest precedence:
||
* / %
+ -
<< >> & |
< <= > >=
= == != <> IS IS NOT IN LIKE GLOB MATCH REGEXP
AND
OR
*/
expr
: literal_value
| BIND_PARAMETER
| ( ( database_name '.' )? table_name '.' )? column_name
| unary_operator expr
| expr '||' expr
| expr ( '*' | '/' | '%' ) expr
| expr ( '+' | '-' ) expr
| expr ( '<<' | '>>' | '&' | '|' ) expr
| expr ( '<' | '<=' | '>' | '>=' ) expr
| expr ( '=' | '==' | '!=' | '<>' ) expr
| expr K_AND expr
| expr K_OR expr
| function_name '(' ( K_DISTINCT? expr ( ',' expr )* | '*' )? ')'
| '(' expr ')'
| K_CAST '(' expr K_AS type_name ')'
| expr K_COLLATE collation_name
| expr K_NOT? ( K_LIKE | K_GLOB | K_REGEXP | K_MATCH ) expr ( K_ESCAPE expr )?
| expr ( K_ISNULL | K_NOTNULL | K_NOT K_NULL )
| expr K_IS K_NOT? expr
| expr K_NOT? K_BETWEEN expr K_AND expr
| expr K_NOT? K_IN ( '(' ( select_stmt
| expr ( ',' expr )*
)?
')'
| ( database_name '.' )? table_name )
| ( ( K_NOT )? K_EXISTS )? '(' select_stmt ')'
| K_CASE expr? ( K_WHEN expr K_THEN expr )+ ( K_ELSE expr )? K_END
| raise_function
;
foreign_key_clause
: K_REFERENCES foreign_table ( '(' column_name ( ',' column_name )* ')' )?
( ( K_ON ( K_DELETE | K_UPDATE ) ( K_SET K_NULL
| K_SET K_DEFAULT
| K_CASCADE
| K_RESTRICT
| K_NO K_ACTION )
| K_MATCH name
)
)*
( K_NOT? K_DEFERRABLE ( K_INITIALLY K_DEFERRED | K_INITIALLY K_IMMEDIATE )? )?
;
raise_function
: K_RAISE '(' ( K_IGNORE
| ( K_ROLLBACK | K_ABORT | K_FAIL ) ',' error_message )
')'
;
indexed_column
: column_name ( K_COLLATE collation_name )? ( K_ASC | K_DESC )?
;
table_constraint
: ( K_CONSTRAINT name )?
( ( K_PRIMARY K_KEY | K_UNIQUE ) '(' indexed_column ( ',' indexed_column )* ')' conflict_clause
| K_CHECK '(' expr ')'
| K_FOREIGN K_KEY '(' column_name ( ',' column_name )* ')' foreign_key_clause
)
;
with_clause
: K_WITH K_RECURSIVE? common_table_expression ( ',' common_table_expression )*
;
qualified_table_name
: ( database_name '.' )? table_name ( K_INDEXED K_BY index_name
| K_NOT K_INDEXED )?
;
ordering_term
: expr ( K_COLLATE collation_name )? ( K_ASC | K_DESC )?
;
pragma_value
: signed_number
| name
| STRING_LITERAL
;
common_table_expression
: table_name ( '(' column_name ( ',' column_name )* ')' )? K_AS '(' select_stmt ')'
;
result_column
: '*'
| table_name '.' '*'
| expr ( K_AS? column_alias )?
;
table_or_subquery
: ( schema_name '.' )? table_name ( K_AS? table_alias )?
( K_INDEXED K_BY index_name
| K_NOT K_INDEXED )?
| ( schema_name '.' )? table_function_name '(' ( expr ( ',' expr )* )? ')' ( K_AS? table_alias )?
| '(' ( table_or_subquery ( ',' table_or_subquery )*
| join_clause )
')'
| '(' select_stmt ')' ( K_AS? table_alias )?
;
join_clause
: table_or_subquery ( join_operator table_or_subquery join_constraint )*
;
join_operator
: ','
| K_NATURAL? ( K_LEFT K_OUTER? | K_INNER | K_CROSS )? K_JOIN
;
join_constraint
: ( K_ON expr
| K_USING '(' column_name ( ',' column_name )* ')' )?
;
select_core
: K_SELECT ( K_DISTINCT | K_ALL )? result_column ( ',' result_column )*
( K_FROM ( table_or_subquery ( ',' table_or_subquery )* | join_clause ) )?
( K_WHERE expr )?
( K_GROUP K_BY expr ( ',' expr )* ( K_HAVING expr )? )?
| K_VALUES '(' expr ( ',' expr )* ')' ( ',' '(' expr ( ',' expr )* ')' )*
;
compound_operator
: K_UNION
| K_UNION K_ALL
| K_INTERSECT
| K_EXCEPT
;
signed_number
: ( '+' | '-' )? NUMERIC_LITERAL
;
literal_value
: NUMERIC_LITERAL
| STRING_LITERAL
| BLOB_LITERAL
| K_NULL
| K_CURRENT_TIME
| K_CURRENT_DATE
| K_CURRENT_TIMESTAMP
;
unary_operator
: '-'
| '+'
| '~'
| K_NOT
;
error_message
: STRING_LITERAL
;
module_argument // TODO check what exactly is permitted here
: expr
| column_def
;
column_alias
: IDENTIFIER
| STRING_LITERAL
;
keyword
: K_ABORT
| K_ACTION
| K_ADD
| K_AFTER
| K_ALL
| K_ALTER
| K_ANALYZE
| K_AND
| K_AS
| K_ASC
| K_ATTACH
| K_AUTOINCREMENT
| K_BEFORE
| K_BEGIN
| K_BETWEEN
| K_BY
| K_CASCADE
| K_CASE
| K_CAST
| K_CHECK
| K_COLLATE
| K_COLUMN
| K_COMMIT
| K_CONFLICT
| K_CONSTRAINT
| K_CREATE
| K_CROSS
| K_CURRENT_DATE
| K_CURRENT_TIME
| K_CURRENT_TIMESTAMP
| K_DATABASE
| K_DEFAULT
| K_DEFERRABLE
| K_DEFERRED
| K_DELETE
| K_DESC
| K_DETACH
| K_DISTINCT
| K_DROP
| K_EACH
| K_ELSE
| K_END
| K_ESCAPE
| K_EXCEPT
| K_EXCLUSIVE
| K_EXISTS
| K_EXPLAIN
| K_FAIL
| K_FOR
| K_FOREIGN
| K_FROM
| K_FULL
| K_GLOB
| K_GROUP
| K_HAVING
| K_IF
| K_IGNORE
| K_IMMEDIATE
| K_IN
| K_INDEX
| K_INDEXED
| K_INITIALLY
| K_INNER
| K_INSERT
| K_INSTEAD
| K_INTERSECT
| K_INTO
| K_IS
| K_ISNULL
| K_JOIN
| K_KEY
| K_LEFT
| K_LIKE
| K_LIMIT
| K_MATCH
| K_NATURAL
| K_NO
| K_NOT
| K_NOTNULL
| K_NULL
| K_OF
| K_OFFSET
| K_ON
| K_OR
| K_ORDER
| K_OUTER
| K_PLAN
| K_PRAGMA
| K_PRIMARY
| K_QUERY
| K_RAISE
| K_RECURSIVE
| K_REFERENCES
| K_REGEXP
| K_REINDEX
| K_RELEASE
| K_RENAME
| K_REPLACE
| K_RESTRICT
| K_RIGHT
| K_ROLLBACK
| K_ROW
| K_SAVEPOINT
| K_SELECT
| K_SET
| K_TABLE
| K_TEMP
| K_TEMPORARY
| K_THEN
| K_TO
| K_TRANSACTION
| K_TRIGGER
| K_UNION
| K_UNIQUE
| K_UPDATE
| K_USING
| K_VACUUM
| K_VALUES
| K_VIEW
| K_VIRTUAL
| K_WHEN
| K_WHERE
| K_WITH
| K_WITHOUT
| K_PARTITION
| K_SKIP
| K_OVER
| K_COLLECTIVE
| K_WITHIN
| K_DISTANCE
| K_CONTAINS
| K_COLLECT
;
// TODO check all names below
name
: any_name
;
function_name
: any_name
;
database_name
: any_name
;
schema_name
: any_name
;
table_function_name
: any_name
;
table_name
: any_name
;
table_or_index_name
: any_name
;
new_table_name
: any_name
;
column_name
: any_name
;
collation_name
: any_name
;
foreign_table
: any_name
;
index_name
: any_name
;
trigger_name
: any_name
;
view_name
: any_name
;
module_name
: any_name
;
pragma_name
: any_name
;
savepoint_name
: any_name
;
table_alias
: IDENTIFIER
| STRING_LITERAL
| '(' table_alias ')'
;
transaction_name
: any_name
;
file_read
: any_name '.' any_name
;
//added
inner_loop
: IDENTIFIER
| inner_loop ',' inner_loop
| any_name '.' any_name ',' any_name
;
any_name
: IDENTIFIER
| keyword
| STRING_LITERAL
| '(' any_name ')'
;
SCOL : ';';
DOT : '.';
OPEN_PAR : '(';
CLOSE_PAR : ')';
COMMA : ',';
ASSIGN : '=';
STAR : '*';
PLUS : '+';
MINUS : '-';
TILDE : '~';
PIPE2 : '||';
DIV : '/';
MOD : '%';
LT2 : '<<';
GT2 : '>>';
AMP : '&';
PIPE : '|';
LT : '<';
LT_EQ : '<=';
GT : '>';
GT_EQ : '>=';
EQ : '==';
NOT_EQ1 : '!=';
NOT_EQ2 : '<>';
// http://www.sqlite.org/lang_keywords.html
K_ABORT : A B O R T;
K_ACTION : A C T I O N;
K_ADD : A D D;
K_AFTER : A F T E R;
K_ALL : A L L;
K_ALTER : A L T E R;
K_ANALYZE : A N A L Y Z E;
K_AND : A N D;
K_AS : A S;
K_ASC : A S C;
K_ATTACH : A T T A C H;
K_AUTOINCREMENT : A U T O I N C R E M E N T;
K_BEFORE : B E F O R E;
K_BEGIN : B E G I N;
K_BETWEEN : B E T W E E N;
K_BY : B Y;
K_CASCADE : C A S C A D E;
K_CASE : C A S E;
K_CAST : C A S T;
K_CHECK : C H E C K;
K_COLLATE : C O L L A T E;
K_COLUMN : C O L U M N;
K_COMMIT : C O M M I T;
K_CONFLICT : C O N F L I C T;
K_CONSTRAINT : C O N S T R A I N T;
K_CREATE : C R E A T E;
K_CROSS : C R O S S;
K_CURRENT_DATE : C U R R E N T '_' D A T E;
K_CURRENT_TIME : C U R R E N T '_' T I M E;
K_CURRENT_TIMESTAMP : C U R R E N T '_' T I M E S T A M P;
K_DATABASE : D A T A B A S E;
K_DEFAULT : D E F A U L T;
K_DEFERRABLE : D E F E R R A B L E;
K_DEFERRED : D E F E R R E D;
K_DELETE : D E L E T E;
K_DESC : D E S C;
K_DETACH : D E T A C H;
K_DISTINCT : D I S T I N C T;
K_DROP : D R O P;
K_EACH : E A C H;
K_ELSE : E L S E;
K_END : E N D;
K_ESCAPE : E S C A P E;
K_EXCEPT : E X C E P T;
K_EXCLUSIVE : E X C L U S I V E;
K_EXISTS : E X I S T S;
K_EXPLAIN : E X P L A I N;
K_FAIL : F A I L;
K_FOR : F O R;
K_FOREIGN : F O R E I G N;
K_FROM : F R O M;
K_FULL : F U L L;
K_GLOB : G L O B;
K_GROUP : G R O U P;
K_HAVING : H A V I N G;
K_IF : I F;
K_IGNORE : I G N O R E;
K_IMMEDIATE : I M M E D I A T E;
K_IN : I N;
K_INDEX : I N D E X;
K_INDEXED : I N D E X E D;
K_INITIALLY : I N I T I A L L Y;
K_INNER : I N N E R;
K_INSERT : I N S E R T;
K_INSTEAD : I N S T E A D;
K_INTERSECT : I N T E R S E C T;
K_INTO : I N T O;
K_IS : I S;
K_ISNULL : I S N U L L;
K_JOIN : J O I N;
K_KEY : K E Y;
K_LEFT : L E F T;
K_LIKE : L I K E;
K_LIMIT : L I M I T;
K_MATCH : M A T C H;
K_NATURAL : N A T U R A L;
K_NO : N O;
K_NOT : N O T;
K_NOTNULL : N O T N U L L;
K_NULL : N U L L;
K_OF : O F;
K_OFFSET : O F F S E T;
K_ON : O N;
K_OR : O R;
K_ORDER : O R D E R;
K_OUTER : O U T E R;
K_PLAN : P L A N;
K_PRAGMA : P R A G M A;
K_PRIMARY : P R I M A R Y;
K_QUERY : Q U E R Y;
K_RAISE : R A I S E;
K_RECURSIVE : R E C U R S I V E;
K_REFERENCES : R E F E R E N C E S;
K_REGEXP : R E G E X P;
K_REINDEX : R E I N D E X;
K_RELEASE : R E L E A S E;
K_RENAME : R E N A M E;
K_REPLACE : R E P L A C E;
K_RESTRICT : R E S T R I C T;
K_RIGHT : R I G H T;
K_ROLLBACK : R O L L B A C K;
K_ROW : R O W;
K_SAVEPOINT : S A V E P O I N T;
K_SELECT : S E L E C T;
K_SET : S E T;
K_TABLE : T A B L E;
K_TEMP : T E M P;
K_TEMPORARY : T E M P O R A R Y;
K_THEN : T H E N;
K_TO : T O;
K_TRANSACTION : T R A N S A C T I O N;
K_TRIGGER : T R I G G E R;
K_UNION : U N I O N;
K_UNIQUE : U N I Q U E;
K_UPDATE : U P D A T E;
K_USING : U S I N G;
K_VACUUM : V A C U U M;
K_VALUES : V A L U E S;
K_VIEW : V I E W;
K_VIRTUAL : V I R T U A L;
K_WHEN : W H E N;
K_WHERE : W H E R E;
K_WITH : W I T H;
K_WITHOUT : W I T H O U T; //My additions
K_PARTITION : P A R T I T I O N;
K_SKIP : S K I P;
K_OVER : O V E R;
K_COLLECTIVE : C O L L E C T I V E;
K_WITHIN : W I T H I N;
K_DISTANCE : D I S T A N C E;
K_CONTAINS : C O N T A I N S;
K_COLLECT : C O L L E C T;
IDENTIFIER
: '"' (~'"' | '""')* '"'
| '`' (~'`' | '``')* '`'
| '[' ~']'* ']'
| [a-zA-Z_] [a-zA-Z_0-9]* // TODO check: needs more chars in set
| [a-zA-Z_0-9]* //added
;
NUMERIC_LITERAL
: DIGIT+ ( '.' DIGIT* )? ( E [-+]? DIGIT+ )?
| '.' DIGIT+ ( E [-+]? DIGIT+ )?
;
BIND_PARAMETER
: '?' DIGIT*
| [:#$] IDENTIFIER
;
STRING_LITERAL
: '\'' ( ~'\'' | '\'\'' )* '\''
;
BLOB_LITERAL
: X STRING_LITERAL
;
SINGLE_LINE_COMMENT
: '--' ~[\r\n]* -> channel(HIDDEN)
;
MULTILINE_COMMENT
: '/*' .*? ( '*/' | EOF ) -> channel(HIDDEN)
;
SPACES
: [ \u000B\t\r\n] -> channel(HIDDEN)
;
UNEXPECTED_CHAR
: .
;
fragment DIGIT : [0-9];
fragment A : [aA];
fragment B : [bB];
fragment C : [cC];
fragment D : [dD];
fragment E : [eE];
fragment F : [fF];
fragment G : [gG];
fragment H : [hH];
fragment I : [iI];
fragment J : [jJ];
fragment K : [kK];
fragment L : [lL];
fragment M : [mM];
fragment N : [nN];
fragment O : [oO];
fragment P : [pP];
fragment Q : [qQ];
fragment R : [rR];
fragment S : [sS];
fragment T : [tT];
fragment U : [uU];
fragment V : [vV];
fragment W : [wW];
fragment X : [xX];
fragment Y : [yY];
fragment Z : [zZ];
Note that any_name is defined as:
any_name
: IDENTIFIER
| keyword
| STRING_LITERAL
| '(' any_name ')'
;
and you are trying to match
'"'any_name'"'
so this will only match input where the text between the double quotes is a valid identifier, keyword, and so on.
Instead, try matching a quoted string the same way STRING_LITERAL is defined:
'"' ( ~'"' )* '"'
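One way to see why the parser complains, as a hedged illustration: in the grammar above, the first alternative of IDENTIFIER already accepts anything between double quotes, so the lexer presumably hands the whole text "cafe,restaurant,hotel" to the parser as one token and a bare '"' token never shows up. The Python sketch below only mimics two alternatives of that lexer rule with regular expressions (the constant names are made up for illustration); it is not ANTLR output.

```python
import re

# Regex stand-ins for two alternatives of the IDENTIFIER lexer rule above.
QUOTED_IDENTIFIER = re.compile(r'"(?:[^"]|"")*"')        # '"' (~'"' | '""')* '"'
BARE_IDENTIFIER = re.compile(r"[a-zA-Z_][a-zA-Z_0-9]*")  # [a-zA-Z_] [a-zA-Z_0-9]*

text = '"cafe,restaurant,hotel"'

# The quoted alternative consumes the quotes and everything between them.
whole = QUOTED_IDENTIFIER.fullmatch(text)
print(whole is not None)  # True

# The text between the quotes is not a bare identifier because of the commas.
inner = text[1:-1]
print(BARE_IDENTIFIER.fullmatch(inner) is not None)  # False
```

Under that assumption, a parser rule that spells out a '"' literal on each side cannot match, whereas a rule that accepts the quoted token as a whole can.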

ANTLRWorks v1.4.3 Debugger random behaviour (Can't connect to debugger)

If I debug this grammar:
grammar CDBFile;
options {
language=Java;
TokenLabelType=CommonToken;
output=AST;
k=1;
ASTLabelType=CommonTree;
}
tokens {
IMAG_COMPILE_UNIT;
MODULE;
}
//@lexer::namespace{Parser}
//@parser::namespace{Parser}
@lexer::header {
}
@lexer::members {
}
@parser::header {
}
@parser::members {
}
/*
* Lexer Rules
*/
fragment LETTER :
'a'..'z'
| 'A'..'Z';
MODULE_NAME
:
(LETTER)*
;
COLON
:
':'
;
/*
* Parser Rules
*/
public
compileUnit
:
(basic_record)* EOF
;
basic_record
:
(
'M' COLON module_record
| 'F' COLON function_record
) ('\n')?
;
module_record
:
MODULE_NAME
;
function_record
:
function_scope MODULE_NAME '$'
;
function_scope
:
('G$' | 'F$' | 'L$')
;
With just this input:
M:divide
the debugger simply does not start, saying
"Cannot launch the debuggerTab. Time-out waiting to connect to the remote parser".
But using this grammar here:
grammar Calculator;
options {
//DO NOT CHANGE THESE!
backtrack = false;
k = 1;
output = AST;
ASTLabelType = CommonTree;
//SERIOUSLY, DO NOT CHANGE THESE!
}
tokens {
// Imaginary tokens
// Root
PROGRAM;
// function top level
FUNCTION_DECLARATION;
FUNCTION_HEAD;
FUNCTION_BODY;
DECL;
FUN;
// if-else-statement
IF_STATEMENT;
IF_CONDITION;
IF_BODY;
ELSE_BODY;
// for-loop
FOR_STATEMENT;
FOR_INITIALIZE;
FOR_CONDITION;
FOR_INCREMENT;
FOR_BODY;
// Non-imaginary tokens
}
@lexer::header {
package at.tugraz.ist.cc;
}
@lexer::members {
}
@parser::header {
package at.tugraz.ist.cc;
}
@parser::members {
}
//Lexer rules
ASSIGNOP :
'=';
OR :
'||';
AND :
'&&';
RELOP :
'<'
| '<='
| '>'
| '>='
| '=='
| '!=';
SIGN :
'+'
| '-';
MULOP :
'*'
| '/'
| '%';
NOT :
'!';
fragment OPERATORS :
'<'
| '>'
| '='
| '+'
| '-'
| '/'
| '%'
| '*'
| '|'
| '&';
INT :
'0'
| DIGIT DIGIT0*;
fragment DIGIT :
'1'..'9';
fragment DIGIT0 :
'0'..'9';
BOOLEAN :
'true'
| 'false';
ID :
LETTER
(
LETTER
| DIGIT0
| '_'
)*;
fragment LETTER :
'a'..'z'
| 'A'..'Z';
PUNCT :
'.'
| ','
| ';'
| ':'
| '!';
WS :
(
' '
| '\t'
| '\r'
| '\n'
)
{
$channel = HIDDEN;
};
LITERAL :
'"'
(
LETTER
| DIGIT
| '_'
| '\\'
| OPERATORS
| PUNCT
| WS
)*
'"';
// parse rules
program :
functions -> ^(PROGRAM functions)
;
functions :
(function_declaration functions)?
;
function_declaration :
head=function_head '{' declarations optional_stmt return_stmt rc='}' -> ^(FUNCTION_DECLARATION[$head.start, $head.text] function_head ^( FUNCTION_BODY[rc,"FUNCTION_BODY"] declarations optional_stmt? return_stmt))
;
function_head :
typeInfo=type ID arguments -> ^(FUNCTION_HEAD[$typeInfo.start, "FUNCTION_HEAD"] type ID arguments?)
;
type :
'int'
| 'boolean'
| 'String'
;
arguments :
'(' ! argument_optional ')' !;
argument_optional :
parameter_list ? -> ^(DECL parameter_list)? ;
parameter_list :
type ID parameter_list2 -> ^(type ID) parameter_list2
;
parameter_list2 :
(',' type ID)* -> ^(type ID)*;
declarations :
( type idlist ';' )* -> ^(DECL ( ^(type idlist))*) ;
idlist :
( ID idlist2 );
idlist2 :
( ',' ! idlist ) ?;
optional_stmt :
( stmt_list ) ?;
stmt_list :
statement statement2;
statement2 :
stmt_list ?;
return_stmt :
'return' ^ expression ';' ! ;
statement :
(
compound_stmt
| ifThenElse
| forLoop
| assignment ';' !
) ;
ifThenElse :
(
'if' '(' ifCondition=expression ')' ifBody=statement 'else' elseBody=statement -> ^(IF_STATEMENT ^(IF_CONDITION $ifCondition) ^(IF_BODY $ifBody) ^(ELSE_BODY $elseBody))
)
;
forLoop :
(
'for' '(' forInitialization=assignment ';' forCondition=expression ';' forIncrement=assignment ')' forBody=statement ->
^(FOR_STATEMENT ^(FOR_INITIALIZE $forInitialization) ^(FOR_CONDITION $forCondition) ^(FOR_INCREMENT $forIncrement) ^(FOR_BODY $forBody))
)
;
compound_stmt :
'{'! optional_stmt '}' !;
assignment :
ID ASSIGNOP ^ expression;
expression: andExpression (OR ^ andExpression)*;
andExpression: relOPExpression (AND ^ relOPExpression)*;
relOPExpression: signExpression (RELOP ^ signExpression)*;
signExpression : mulExpression (SIGN ^ mulExpression)*;
mulExpression : factor (MULOP ^ factor)*;
factor :
(
factorID
| INT
| BOOLEAN
| LITERAL
| NOT ^ factor
| SIGN ^ factor
| '('! expression ')' !
);
factorID: ID
( function_call -> ^(FUN ID function_call)
| -> ID
)
;
function_call :
'('! function_call_opt ')' !;
function_call_opt :
extend_assign_expr_list ? ;
extend_assign_expr_list :
(
expression
extend_assign_expr_list1
) ;
extend_assign_expr_list1 :
( ',' ! extend_assign_expr_list ) ? ;
parsing an input like
int main()
{
return 0;
}
works just fine!
The internet has a lot of suggestions regarding this issue, but none of them seem to work. The thing is that the debugger DOES work for the second grammar. Assuming the input is not the problem here, the grammar has to be. But if there is a problem with the grammar, why would the interpreter work for both examples?
Any ideas?
Edit:
I have noticed that, for some reason, __Test__.java just contains:
M:divide
F:G0
I also get this output while interpreting M:asd:
[13:47:52] Interpreting...
[13:47:52] problem matching token at 1:3 NoViableAltException('a'@[1:1: Tokens : ( T__8 | T__9 | T__10 | T__11 | T__12 | T__13 | T__14 | COLON );])
[13:47:52] problem matching token at 1:4 NoViableAltException('s'@[1:1: Tokens : ( T__8 | T__9 | T__10 | T__11 | T__12 | T__13 | T__14 | COLON );])
[13:47:52] problem matching token at 1:5 NoViableAltException('d'@[1:1: Tokens : ( T__8 | T__9 | T__10 | T__11 | T__12 | T__13 | T__14 | COLON );])
(even though the tree is correct)
AFAIK, the debugger only works with the Java target. Since you have C# specific code in your first grammar:
@lexer::namespace{Parser}
@parser::namespace{Parser}
there are no .java classes generated (or at least, none that will compile), and the debugger hangs (and times out).
EDIT
I see you're using fragment rules in your parser rules: you can't. Fragment rules never become tokens on their own; they only exist to be used by other lexer rules.
I've tested the grammar without the C# code in ANTLRWorks 1.4.3, and had no issues.
You could try the following:
- restart ANTLRWorks
- change the port the debugger listens on (perhaps the port is used by another service, or by another debug run of ANTLRWorks)
- use the most recent version of ANTLRWorks
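The port suggestion can be checked before changing any settings. A minimal sketch, assuming the debugger connects over a local TCP port: the value 49100 is only my recollection of the ANTLR 3 default and should be replaced with whatever port your ANTLRWorks preferences show, and `port_is_free` is a hypothetical helper, not part of ANTLRWorks.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
        return True


DEBUG_PORT = 49100  # assumed default; use the port configured in ANTLRWorks
print(port_is_free(DEBUG_PORT))
```

If this prints False while no debug session is running, something else is holding the port, and pointing the debugger at another port is worth trying.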

ANTLR resolving non-LL(*) problems and syntactic predicates

Consider the following rules in the parser:
expression
: IDENTIFIER
| (...)
| procedure_call // e.g. (foo 1 2 3)
| macro_use // e.g. (xyz (some datum))
;
procedure_call
: '(' expression expression* ')'
;
macro_use
: '(' IDENTIFIER datum* ')'
;
and
// Note that any string that parses as an <expression> will also parse as a <datum>.
datum
: simple_datum
| compound_datum
;
simple_datum
: BOOLEAN
| NUMBER
| CHARACTER
| STRING
| IDENTIFIER
;
compound_datum
: list
| vector
;
list
: '(' (datum+ ( '.' datum)?)? ')'
| ABBREV_PREFIX datum
;
fragment ABBREV_PREFIX
: ('\'' | '`' | ',' | ',@')
;
vector
: '#(' datum* ')'
;
the procedure_call and macro_rule alternative in the expression rule generate an non-LL(*) structure error. I can see the problem, since (IDENTIFIER) will parse as both. but even when i define both with + instead of *, it generates the error, even though above example shouldn't be parsing anymore.
I came up with the idea of using syntactic predicates, but I can't figure out how to use them to do the trick here.
Something like
expression
: IDENTIFIER
| (...)
| (procedure_call)=>procedure_call // e.g. (foo 1 2 3)
| macro_use // e.g. (xyz (some datum))
;
or
expression
: IDENTIFIER
| (...)
| ('(' IDENTIFIER expression)=>procedure_call // e.g. (foo 1 2 3)
| macro_use // e.g. (xyz (some datum))
;
doesn't work either, since none but the first rule will match anything. Is there a proper way to solve this?
I found a JavaCC grammar of R5RS which I used to (quickly!) write an ANTLR equivalent:
/*
* Copyright (C) 2011 by Bart Kiers, based on the work done by Håkan L. Younes'
* JavaCC R5RS grammar, available at: http://mindprod.com/javacc/R5RS.jj
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*/
grammar R5RS;
parse
: commandOrDefinition* EOF
;
commandOrDefinition
: (syntaxDefinition)=> syntaxDefinition
| (definition)=> definition
| ('(' BEGIN commandOrDefinition)=> '(' BEGIN commandOrDefinition+ ')'
| command
;
syntaxDefinition
: '(' DEFINE_SYNTAX keyword transformerSpec ')'
;
definition
: '(' DEFINE ( variable expression ')'
| '(' variable defFormals ')' body ')'
)
| '(' BEGIN definition* ')'
;
defFormals
: variable* ('.' variable)?
;
keyword
: identifier
;
transformerSpec
: '(' SYNTAX_RULES '(' identifier* ')' syntaxRule* ')'
;
syntaxRule
: '(' pattern template ')'
;
pattern
: patternIdentifier
| '(' (pattern+ ('.' pattern | ELLIPSIS)?)? ')'
| '#(' (pattern+ ELLIPSIS? )? ')'
| patternDatum
;
patternIdentifier
: syntacticKeyword
| VARIABLE
;
patternDatum
: STRING
| CHARACTER
| bool
| number
;
template
: patternIdentifier
| '(' (templateElement+ ('.' templateElement)?)? ')'
| '#(' templateElement* ')'
| templateDatum
;
templateElement
: template ELLIPSIS?
;
templateDatum
: patternDatum
;
command
: expression
;
identifier
: syntacticKeyword
| variable
;
syntacticKeyword
: expressionKeyword
| ELSE
| ARROW
| DEFINE
| UNQUOTE
| UNQUOTE_SPLICING
;
expressionKeyword
: QUOTE
| LAMBDA
| IF
| SET
| BEGIN
| COND
| AND
| OR
| CASE
| LET
| LETSTAR
| LETREC
| DO
| DELAY
| QUASIQUOTE
;
expression
: (variable)=> variable
| (literal)=> literal
| (lambdaExpression)=> lambdaExpression
| (conditional)=> conditional
| (assignment)=> assignment
| (derivedExpression)=> derivedExpression
| (procedureCall)=> procedureCall
| (macroUse)=> macroUse
| macroBlock
;
variable
: VARIABLE
| ELLIPSIS
;
literal
: quotation
| selfEvaluating
;
quotation
: '\'' datum
| '(' QUOTE datum ')'
;
selfEvaluating
: bool
| number
| CHARACTER
| STRING
;
lambdaExpression
: '(' LAMBDA formals body ')'
;
formals
: '(' (variable+ ('.' variable)?)? ')'
| variable
;
conditional
: '(' IF test consequent alternate? ')'
;
test
: expression
;
consequent
: expression
;
alternate
: expression
;
assignment
: '(' SET variable expression ')'
;
derivedExpression
: quasiquotation
| '(' ( COND ( '(' ELSE sequence ')'
| condClause+ ('(' ELSE sequence ')')?
)
| CASE expression ( '(' ELSE sequence ')'
| caseClause+ ('(' ELSE sequence ')')?
)
| AND test*
| OR test*
| LET variable? '(' bindingSpec* ')' body
| LETSTAR '(' bindingSpec* ')' body
| LETREC '(' bindingSpec* ')' body
| BEGIN sequence
| DO '(' iterationSpec* ')' '(' test doResult? ')' command*
| DELAY expression
)
')'
;
condClause
: '(' test (sequence | ARROW recipient)? ')'
;
recipient
: expression
;
caseClause
: '(' '(' datum* ')' sequence ')'
;
bindingSpec
: '(' variable expression ')'
;
iterationSpec
: '(' variable init step? ')'
;
init
: expression
;
step
: expression
;
doResult
: sequence
;
procedureCall
: '(' operator operand* ')'
;
operator
: expression
;
operand
: expression
;
macroUse
: '(' keyword datum* ')'
;
macroBlock
: '(' (LET_SYNTAX | LETREC_SYNTAX) '(' syntaxSpec* ')' body ')'
;
syntaxSpec
: '(' keyword transformerSpec ')'
;
body
: ((definition)=> definition)* sequence
;
//sequence
// : ((command)=> command)* expression
// ;
sequence
: expression+
;
datum
: simpleDatum
| compoundDatum
;
simpleDatum
: bool
| number
| CHARACTER
| STRING
| identifier
;
compoundDatum
: list
| vector
;
list
: '(' (datum+ ('.' datum)?)? ')'
| abbreviation
;
abbreviation
: abbrevPrefix datum
;
abbrevPrefix
: '\'' | '`' | ',@' | ','
;
vector
: '#(' datum* ')'
;
number
: NUM_2
| NUM_8
| NUM_10
| NUM_16
;
bool
: TRUE
| FALSE
;
quasiquotation
: quasiquotationD[1]
;
quasiquotationD[int d]
: '`' qqTemplate[d]
| '(' QUASIQUOTE qqTemplate[d] ')'
;
qqTemplate[int d]
: (expression)=> expression
| ('(' UNQUOTE)=> unquotation[d]
| simpleDatum
| vectorQQTemplate[d]
| listQQTemplate[d]
;
vectorQQTemplate[int d]
: '#(' qqTemplateOrSplice[d]* ')'
;
listQQTemplate[int d]
: '\'' qqTemplate[d]
| ('(' QUASIQUOTE)=> quasiquotationD[d+1]
| '(' (qqTemplateOrSplice[d]+ ('.' qqTemplate[d])?)? ')'
;
unquotation[int d]
: ',' qqTemplate[d-1]
| '(' UNQUOTE qqTemplate[d-1] ')'
;
qqTemplateOrSplice[int d]
: ('(' UNQUOTE_SPLICING)=> splicingUnquotation[d]
| qqTemplate[d]
;
splicingUnquotation[int d]
: ',@' qqTemplate[d-1]
| '(' UNQUOTE_SPLICING qqTemplate[d-1] ')'
;
// macro keywords
LET_SYNTAX : 'let-syntax';
LETREC_SYNTAX : 'letrec-syntax';
SYNTAX_RULES : 'syntax-rules';
DEFINE_SYNTAX : 'define-syntax';
// syntactic keywords
ELSE : 'else';
ARROW : '=>';
DEFINE : 'define';
UNQUOTE_SPLICING : 'unquote-splicing';
UNQUOTE : 'unquote';
// expression keywords
QUOTE : 'quote';
LAMBDA : 'lambda';
IF : 'if';
SET : 'set!';
BEGIN : 'begin';
COND : 'cond';
AND : 'and';
OR : 'or';
CASE : 'case';
LET : 'let';
LETSTAR : 'let*';
LETREC : 'letrec';
DO : 'do';
DELAY : 'delay';
QUASIQUOTE : 'quasiquote';
NUM_2 : PREFIX_2 COMPLEX_2;
NUM_8 : PREFIX_8 COMPLEX_8;
NUM_10 : PREFIX_10? COMPLEX_10;
NUM_16 : PREFIX_16 COMPLEX_16;
ELLIPSIS : '...';
VARIABLE
: INITIAL SUBSEQUENT*
| PECULIAR_IDENTIFIER
;
STRING : '"' STRING_ELEMENT* '"';
CHARACTER : '#\\' (~(' ' | '\n') | CHARACTER_NAME);
TRUE : '#' ('t' | 'T');
FALSE : '#' ('f' | 'F');
// to ignore
SPACE : (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
COMMENT : ';' ~('\r' | '\n')* {$channel=HIDDEN;};
// fragments
fragment INITIAL : LETTER | SPECIAL_INITIAL;
fragment LETTER : 'a'..'z' | 'A'..'Z';
fragment SPECIAL_INITIAL : '!' | '$' | '%' | '&' | '*' | '/' | ':' | '<' | '=' | '>' | '?' | '^' | '_' | '~';
fragment SUBSEQUENT : INITIAL | DIGIT | SPECIAL_SUBSEQUENT;
fragment DIGIT : '0'..'9';
fragment SPECIAL_SUBSEQUENT : '.' | '+' | '-' | '@';
fragment PECULIAR_IDENTIFIER : '+' | '-';
fragment STRING_ELEMENT : ~('"' | '\\') | '\\' ('"' | '\\');
fragment CHARACTER_NAME : 'space' | 'newline';
fragment COMPLEX_2
: REAL_2 ('@' REAL_2)?
| REAL_2? SIGN UREAL_2? ('i' | 'I')
;
fragment COMPLEX_8
: REAL_8 ('@' REAL_8)?
| REAL_8? SIGN UREAL_8? ('i' | 'I')
;
fragment COMPLEX_10
: REAL_10 ('@' REAL_10)?
| REAL_10? SIGN UREAL_10? ('i' | 'I')
;
fragment COMPLEX_16
: REAL_16 ('@' REAL_16)?
| REAL_16? SIGN UREAL_16? ('i' | 'I')
;
fragment REAL_2 : SIGN? UREAL_2;
fragment REAL_8 : SIGN? UREAL_8;
fragment REAL_10 : SIGN? UREAL_10;
fragment REAL_16 : SIGN? UREAL_16;
fragment UREAL_2 : UINTEGER_2 ('/' UINTEGER_2)?;
fragment UREAL_8 : UINTEGER_8 ('/' UINTEGER_8)?;
fragment UREAL_10 : UINTEGER_10 ('/' UINTEGER_10)? | DECIMAL_10;
fragment UREAL_16 : UINTEGER_16 ('/' UINTEGER_16)?;
fragment DECIMAL_10
: UINTEGER_10 SUFFIX
| '.' DIGIT+ '#'* SUFFIX?
| DIGIT+ '.' DIGIT* '#'* SUFFIX?
| DIGIT+ '#'+ '.' '#'* SUFFIX?
;
fragment UINTEGER_2 : DIGIT_2+ '#'*;
fragment UINTEGER_8 : DIGIT_8+ '#'*;
fragment UINTEGER_10 : DIGIT+ '#'*;
fragment UINTEGER_16 : DIGIT_16+ '#'*;
fragment PREFIX_2 : RADIX_2 EXACTNESS? | EXACTNESS RADIX_2;
fragment PREFIX_8 : RADIX_8 EXACTNESS? | EXACTNESS RADIX_8;
fragment PREFIX_10 : RADIX_10 EXACTNESS? | EXACTNESS RADIX_10;
fragment PREFIX_16 : RADIX_16 EXACTNESS? | EXACTNESS RADIX_16;
fragment SUFFIX : EXPONENT_MARKER SIGN? DIGIT+;
fragment EXPONENT_MARKER : 'e' | 's' | 'f' | 'd' | 'l' | 'E' | 'S' | 'F' | 'D' | 'L';
fragment SIGN : '+' | '-';
fragment EXACTNESS : '#' ('i' | 'e' | 'I' | 'E');
fragment RADIX_2 : '#' ('b' | 'B');
fragment RADIX_8 : '#' ('o' | 'O');
fragment RADIX_10 : '#' ('d' | 'D');
fragment RADIX_16 : '#' ('x' | 'X');
fragment DIGIT_2 : '0' | '1';
fragment DIGIT_8 : '0'..'7';
fragment DIGIT_16 : DIGIT | 'a'..'f' | 'A'..'F';
which can be tested with the following class:
import org.antlr.runtime.*;
public class Main {
public static void main(String[] args) throws Exception {
String source =
"(define sum-iter \n" +
" (lambda(n acc i) \n" +
" (if (> i n) \n" +
" acc \n" +
" (sum-iter n (+ acc i) (+ i 1))))) ";
R5RSLexer lexer = new R5RSLexer(new ANTLRStringStream(source));
R5RSParser parser = new R5RSParser(new CommonTokenStream(lexer));
parser.parse();
}
}
and to generate a lexer & parser, compile all Java source files and run the main class, do:
bart@hades:~/Programming/ANTLR/Demos/R5RS$ java -cp antlr-3.3.jar org.antlr.Tool R5RS.g
bart@hades:~/Programming/ANTLR/Demos/R5RS$ javac -cp antlr-3.3.jar *.java
bart@hades:~/Programming/ANTLR/Demos/R5RS$ java -cp .:antlr-3.3.jar Main
bart@hades:~/Programming/ANTLR/Demos/R5RS$
The fact that nothing is being printed on the console means the parser (and lexer) didn't find any errors with the provided source.
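To turn "nothing printed" into something a test can check, one option is to capture System.err while the parser runs, since ANTLR 3's default error reporting writes there. The following is a minimal sketch using only the JDK; `StderrCheck` and `captureStderr` are hypothetical names, and the simulated message stands in for a real parser run.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

// Hypothetical helper: runs a task and returns whatever it wrote to System.err.
class StderrCheck {
    static String captureStderr(Runnable task) {
        PrintStream original = System.err;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        System.setErr(new PrintStream(buffer, true));
        try {
            task.run();
        } finally {
            // Always restore the real stream, even if the task throws.
            System.setErr(original);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        // With the demo above, the task would call parser.parse(); that method
        // declares a checked RecognitionException, so wrap it in try/catch
        // inside the Runnable.
        String quiet = captureStderr(() -> { });
        String noisy = captureStderr(() -> System.err.println("simulated syntax error"));
        System.out.println("quiet run reported errors: " + !quiet.isEmpty());
        System.out.println("noisy run reported errors: " + !noisy.isEmpty());
    }
}
```

The ANTLR 3 runtime also keeps a syntax error count on the recognizer (`getNumberOfSyntaxErrors()`, if your version provides it), which is usually a cleaner check than inspecting console output.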
Note that I have no unit tests and have only tested the single Scheme source inside the Main class. If you find errors in the ANTLR grammar, I'd appreciate hearing about them so I can fix the grammar. In due time, I'll probably commit the grammar to the official ANTLR Wiki.

ANTLR IDL Grammar

Using ANTLR I am trying to create a very simple IDL-style grammar. Here is what I have so far.
grammar idl;
data_type
: 'DataType' ID LCURLY attribute_list RCURLY
;
modifier
: 'public'
;
primitive
: 'byte'
| 'short'
| 'int'
| 'float'
| 'double'
;
attribute
: modifier primitive ID END
;
attribute_list
: attribute+
;
ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
;
COMMENT
: '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
| '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
;
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;
LCURLY : '{'
;
RCURLY : '}'
;
END : ';'
;
This does not seem to work when I run the debugger over 'data_type' however. It just halts when it reaches the 'attribute_list'. Changing 'attribute_list' to just 'attribute' works fine but obviously I want one-or-more attributes, not just one.
Thanks

Resources