Following this question, I have ANTLR installed via HomeBrew:
brew install antlr
and it is installed on:
/usr/local/Cellar/antlr/<version>/
and installed the Python runtime via
pip3 install antlr4-python3-runtime
From here, I ran:
export CLASSPATH=".:/usr/local/Cellar/antlr/<version>/antlr-<version>-complete.jar:$CLASSPATH"
but when I run the command
grun <grammarName> <inputFile>
I get the infamous error message:
Can't load <grammarName> as lexer or parser
I would appreciate it if you could help me know the problem and how to solve it.
P.S. It shouldn't matter, but you may see the code I am working on here.
This error message indicates that TestRig (which the grun alias uses) can’t find the Parser (or Lexer) class in your classpath. If you’ve placed your generated Parser in a package, you may need to take the package name into account, but the main thing is that the generated (and compiled) class must be in your classpath.
Also, TestRig takes the grammarName AND a startRule as parameters, and expects your input to come from stdin.
I cloned your repo to take a closer look at your issue.
The immediate reason grun gives you this error is that you specified your target language in the grammar file (looks like I need to retract that comment about it not being anything in your grammar). By specifying Python as the target language in the grammar, you didn't generate the *.java classes that the TestRig class (used by the grun alias) needs in order to execute.
I removed the target language option from the grammar and was able to run the grun command against your sample input. To get it to parse correctly, I took the liberty of modifying several things in your grammar:
Removed the target language. (It's generally better to specify the target language on the antlr command line so that the grammar remains language-agnostic. It's also essential if you want to use the TestRig/grun utility to test things out, since you'll need the Java target.)
Changed the SectionName lexer rule to a section parser rule (with labeled alternatives). Having lexer rules like 'Body ' Integer will give you a single token containing both the Body keyword and the integer, which you'd then have to pull apart later. (It also forces there to be exactly one space between 'Body' and the integer.)
Set the NewLine token to -> skip. (This is a bit more presumptive on my part, but not skipping NewLine would require modifying more parser rules to specify everywhere that NewLine is a valid token.)
Removed the StatementEnd lexer rule, since I skipped the NewLine tokens.
Reworked the Integer and Float rules to be two different tokens so that I could use the Integer token in the section parser rule.
Made a couple more minor tweaks just to get this skeleton to handle your sample input.
The resulting grammar I used was:
grammar ElmerSolver;
// Parser Rules
// eostmt: ';' | CR;
statement: ';';
statement_list: statement*;
sections: section+ EOF;
// section: SectionName /* statement_list */ End;
// Lexer Rules
fragment DIGIT: [0-9];
Integer: DIGIT+;
Float:
[+-]? (DIGIT+ ([.]DIGIT*)? | [.]DIGIT+) ([Ee][+-]? DIGIT+)?;
section:
'Header' statement_list End # headerSection
| 'Simulation' statement_list End # simulatorSection
| 'Constants' statement_list End # constantsSection
| 'Body ' Integer statement_list End # bodySection
| 'Material ' Integer statement_list End # materialSection
| 'Body Force ' Integer statement_list End # bodyForceSection
| 'Equation ' Integer statement_list End # equationSection
| 'Solver ' Integer statement_list End # solverSection
| 'Boundary Condition ' Integer statement_list End # boundaryConditionSection
| 'Initial Condition ' Integer statement_list End # initialConditionSection
| 'Component' Integer statement_list End # componentSection;
End: 'End';
// statementEnd: ';' NewLine*;
NewLine: ('\r'? '\n' | '\n' | '\r') -> skip;
LineJoining:
'\\' WhiteSpace? ('\r'? '\n' | '\r' | '\f') -> skip;
WhiteSpace: [ \t\r\n]+ -> skip;
LineComment: '#' ~( '\r' | '\n')* -> skip;
With those changes, I ran
➜ antlr4 ElmerSolver.g4
javac *.java
grun ElmerSolver sections -tree < examples/ex001.sif
and got the output:
(sections (section Simulation statement_list End) (section Equation 1 statement_list End) <EOF>)
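To illustrate the earlier point about a lexer rule like 'Body ' Integer, here is a minimal Python sketch that uses regular expressions as a stand-in for the lexer. It is not ANTLR-generated code, and the names fused and split_tokens are hypothetical. It only shows why a fused rule yields one lexeme that has to be pulled apart later and rejects extra spaces, while separate tokens with skipped whitespace do not.

```python
import re

# Stand-in for a fused lexer rule such as: SectionName: 'Body ' Integer;
FUSED = re.compile(r"Body [0-9]+")

# Stand-in for separate Body and Integer tokens with whitespace skipped.
TOKEN = re.compile(r"\s*(Body|[0-9]+)")


def fused(text):
    """Return the single fused lexeme, or None if the text does not match."""
    m = FUSED.fullmatch(text)
    return m.group(0) if m else None


def split_tokens(text):
    """Return the keyword and the integer as separate lexemes."""
    text = text.strip()
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"unexpected input at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


print(fused("Body 1"))           # one lexeme holding keyword and number
print(fused("Body   1"))         # None: exactly one space is required
print(split_tokens("Body   1"))  # keyword and number as separate lexemes
```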
I want to parse YAML with ANTLR4.
The target file contains the line image: xxx.com/node:8.14.
I wrote a grammar file like this:
grammar Drone;
yaml: obj+ ;
obj: ID ':' value;
value:
STRING;
ID
: ('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'0'..'9'|'-')+
;
STRING: ('a'..'z'|'A'..'Z'|'0'..'9'|'-'|'.'|'_'|'/'|':')+ ;
WS: [ \t]+ -> skip;
CRLF: [\r\n]+ ;
I got this result:
[antlr4] ➜ dronemigrate antlr4-parse Drone.g4 yaml -tree -trace drone.yml
line 1:0 mismatched input 'image:' expecting ID
enter yaml, LT(1)=image:
enter obj, LT(1)=image:
exit obj, LT(1)=<EOF>
exit yaml, LT(1)=<EOF>
(yaml:1 (obj:1 image: xxx.com/node:8.14))
When I remove the ':' character from the STRING rule in the grammar file, I get this result:
[antlr4] ➜ dronemigrate antlr4-parse Drone.g4 yaml -tree -trace drone.yml
enter yaml, LT(1)=image
enter obj, LT(1)=image
consume [#0,0:4='image',<2>,1:0] rule obj
consume [#1,5:5=':',<1>,1:5] rule obj
enter value, LT(1)=xxx.com/node
consume [#2,7:18='xxx.com/node',<3>,1:7] rule value
exit value, LT(1)=:
exit obj, LT(1)=:
exit yaml, LT(1)=:
(yaml:1 (obj:1 image : (value:1 xxx.com/node)))
How do I deal with the ':' in the string?
YAML is not well suited for parser generators like ANTLR (where there is a strict separation between the lexer and parser). There are ways around it (using lexical modes), but fully implementing a YAML grammar is by no means a trivial thing. That's why there is still an open issue in ANTLR's issue tracker to implement such a grammar. You could have a look at how it is done here: https://github.com/umaranis/FastYaml
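The two traces in the question follow from the lexer running before, and independently of, the parser: at each position ANTLR's lexer takes the longest match, and on a tie the rule defined first wins. Below is a rough Python model of that behaviour. It is a sketch using regular expressions, not ANTLR's implementation; tokenize, WITH_COLON and WITHOUT_COLON are hypothetical names, and COLON stands in for the implicit ':' token from the obj rule.

```python
import re


def tokenize(text, rules):
    """Longest match wins; on a tie, the rule listed first wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = pattern.match(text, pos)
            # Strictly greater, so an earlier rule keeps a tie.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"no rule matches at offset {pos}")
        if best_name != "WS":  # WS is skipped, as in the grammar
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


COLON = ("COLON", re.compile(r":"))
ID = ("ID", re.compile(r"[a-zA-Z][a-zA-Z0-9-]+"))
WS = ("WS", re.compile(r"[ \t]+"))

# STRING as in the question, with ':' allowed inside it.
WITH_COLON = [COLON, ID, ("STRING", re.compile(r"[a-zA-Z0-9._/:-]+")), WS]
# STRING after removing ':' from its character set.
WITHOUT_COLON = [COLON, ID, ("STRING", re.compile(r"[a-zA-Z0-9._/-]+")), WS]

line = "image: xxx.com/node:8.14"
print(tokenize(line, WITH_COLON))     # 'image:' becomes one STRING token
print(tokenize(line, WITHOUT_COLON))  # the value is split at its ':'
```

In the first case STRING matches one character more than ID, so the parser never sees an ID. In the second case ID wins the tie for image, but the value can no longer contain a ':'. Neither variant can be fixed by reordering rules alone, which is why context-sensitive lexing such as lexical modes comes up.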
I have the following powershell script:
$triggerBy = "Finish Build Trigger; Aed / My Application / Service / My Application Service, build #4.1.2.41"
$buildId = $triggerBy -replace 'Finish Build Trigger; ', '' -replace ', build #.*', '' -replace '( \/ )', '_' -replace ' ', ''
When this runs, $buildId is set to Aed_MyApplication_Service_MyApplicationService.
I then want to get the value of the variable %dep.Aed_MyApplication_Service_MyApplicationService.build.number%. But I need to use the value of $buildId for the middle part of that.
Is there a way to say $buildNumber = %dep.$buildId.build.number% and have TeamCity recognize that $buildId should be expanded before it evaluates the variable?
No, sadly there is no way to force TeamCity to evaluate the parameter reference lazily.
If this is a "switch/case" kind of choice on $triggerBy, though, you could create a build step (at the beginning of the build) that will:
test whether your calculated $buildId is Aed_MyApplication_Service_MyApplicationService;
if it is, emit the following, which should enable you to use %myDepBuildNumber% in the next build steps:
echo "##teamcity[setParameter name='myDepBuildNumber' value='%dep.Aed_MyApplication_Service_MyApplicationService.build.number%']"
otherwise, do nothing.
Repeat this if/else for every $triggerBy case.
If you get an error about %dep.Aed_MyApplication_Service_MyApplicationService.build.number% being unset, defining this parameter should fix the issue (e.g. if your build and Aed_MyApplication_Service_MyApplicationService both belong to Aed_MyApplication_Service, declare the parameter in that project).
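The switch/case idea above can be sketched as follows. This is a minimal Python illustration, not a TeamCity API: build_id_from_trigger, service_message and known_numbers are hypothetical names, and the sketch assumes the script runs in a build step where TeamCity substitutes %...% references in the script text before execution, so each known build id has to be listed with its own literal reference.

```python
import re


def build_id_from_trigger(triggered_by):
    """Mirror the -replace chain from the question."""
    s = triggered_by.replace("Finish Build Trigger; ", "")
    s = re.sub(r", build #.*", "", s)
    s = s.replace(" / ", "_")
    return s.replace(" ", "")


def service_message(build_id, known_numbers):
    """Return the setParameter message for a known build id, else None."""
    number = known_numbers.get(build_id)
    if number is None:
        return None  # the "else do nothing" branch
    # Values containing ' | [ ] would need TeamCity's | escaping (omitted).
    return f"##teamcity[setParameter name='myDepBuildNumber' value='{number}']"


# One entry per $triggerBy case. In a real build step each value is a literal
# %dep...% reference that TeamCity replaces before the script runs.
known_numbers = {
    "Aed_MyApplication_Service_MyApplicationService":
        "%dep.Aed_MyApplication_Service_MyApplicationService.build.number%",
}

trigger = ("Finish Build Trigger; Aed / My Application / Service / "
           "My Application Service, build #4.1.2.41")
message = service_message(build_id_from_trigger(trigger), known_numbers)
if message is not None:
    print(message)
```

The lookup table keeps every dependency reference spelled out in full, which is what allows TeamCity to resolve it up front; only the choice between the already-resolved values happens at run time.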
I tried to use ANTLR3 to build a simple regular-expression parser, but it throws an internal error.
Here is Sample.g:
grammar Sample;
options {
memoize=true;
output=AST;
}
tokens {
RegExp;
}
RegExpression:
'/' (a=~('/' | NL))+ '/'
-> ^(RegExp[$RegExpression.start, $RegExpression.text] $a+ )
;
fragment NL: '\n' | '\r';
ANY : . ;
I run the command:
java -jar antlr-3.5.2-complete.jar -print Sample.g
and it gives this:
error(10): internal error: Sample.g : java.lang.NullPointerException
org.antlr.grammar.v3.DefineGrammarItemsWalker.rewrite_atom(DefineGrammarItemsWalker.java:3896)
...
...
Updated according to comments
grammar Sample{
memoize=true;
output=AST;
}
tokens {
RegExp;
}
regExpression:
'/' (a=~('/' | NL))+ '/'
-> ^(RegExp[$regExpression.start, $regExpression.text] $a+ )
;
NL: '\n' | '\r';
And here are the errors after running java -jar antlr-3.5.2-complete.jar Sample.g:
error(10): internal error: Sample.g : java.lang.NullPointerException
org.antlr.grammar.v3.CodeGenTreeWalker.getTokenElementST(CodeGenTreeWalker.java:311)
org.antlr.grammar.v3.CodeGenTreeWalker.notElement(CodeGenTreeWalker.java:2886)
org.antlr.grammar.v3.CodeGenTreeWalker.element(CodeGenTreeWalker.java:2431)
org.antlr.grammar.v3.CodeGenTreeWalker.element(CodeGenTreeWalker.java:2446)
org.antlr.grammar.v3.CodeGenTreeWalker.alternative(CodeGenTreeWalker.java:2250)
org.antlr.grammar.v3.CodeGenTreeWalker.block(CodeGenTreeWalker.java:1798)
org.antlr.grammar.v3.CodeGenTreeWalker.ebnf(CodeGenTreeWalker.java:3014)
org.antlr.grammar.v3.CodeGenTreeWalker.element(CodeGenTreeWalker.java:2495)
org.antlr.grammar.v3.CodeGenTreeWalker.alternative(CodeGenTreeWalker.java:2250)
org.antlr.grammar.v3.CodeGenTreeWalker.block(CodeGenTreeWalker.java:1798)
org.antlr.grammar.v3.CodeGenTreeWalker.rule(CodeGenTreeWalker.java:1321)
org.antlr.grammar.v3.CodeGenTreeWalker.rules(CodeGenTreeWalker.java:955)
org.antlr.grammar.v3.CodeGenTreeWalker.grammarSpec(CodeGenTreeWalker.java:877)
org.antlr.grammar.v3.CodeGenTreeWalker.grammar_(CodeGenTreeWalker.java:518)
org.antlr.codegen.CodeGenerator.genRecognizer(CodeGenerator.java:415)
org.antlr.Tool.generateRecognizer(Tool.java:674)
org.antlr.Tool.process(Tool.java:487)
org.antlr.Tool.main(Tool.java:98)
You're trying to use a rewrite rule (tree construction) on a lexer rule. That doesn't make sense.
In ANTLR, all rules whose names start with an uppercase letter are lexer rules. Tree construction applies to AST nodes, not to tokens themselves, so you have to use it in parser rules (whose names start with a lowercase letter).
When you do that, keep in mind that your NL is a fragment now (you cannot use fragments in parser rules) and make sure your ANY token doesn't collide with anything else, i.e. define all needed tokens (/, NL etc.) and put them above the ANY token definition.
I want to have a multi-line bit of Markdown (containing Java code) in a YAML file. I tried many things, but I guess I don't quite get the quoting rules of YAML.
{
  title: Museum,
  body: |
    "```java
    code code code
    java2",
  answers: [
    "`museum`",
    "`museum.getFloor(3)`",
    "`museum.getFloor(3).getExhibit(5)`",
    "`museum.getFloor(3).getExhibit(5).getCurator()`",
    "`museum.getFloor(3).getExhibit(5).getCurator().name`",
    "`museum.getFloor(3).getExhibit(5).getCurator().name.toUpper()`"
  ]
}
Produces:
/Users/pitosalas/.rbenv/versions/2.3.1/lib/ruby/2.3.0/psych.rb:377:in `parse': (generator/test.yml): found character that cannot start any token while scanning for the next token at line 3 column 9 (Psych::SyntaxError)
YAML has two styles: the JSON-like flow style and the much more human-readable block style.
Roughly speaking, you can nest each style within itself, and you can nest flow style within block style, but block style nested within flow style is not allowed.
Your top-level { and } are flow style, but you try to introduce, with |, a literal block style scalar within that flow style. Replace the flow style with block style upwards from that scalar:
title: Museum
body: |
  "```java
  code code code
  java2"
answers: [
  "`museum`",
  "`museum.getFloor(3)`",
  "`museum.getFloor(3).getExhibit(5)`",
  "`museum.getFloor(3).getExhibit(5).getCurator()`",
  "`museum.getFloor(3).getExhibit(5).getCurator().name`",
  "`museum.getFloor(3).getExhibit(5).getCurator().name.toUpper()`"
]
and your YAML is fine. Note that the double quotes "around" the value for the key body are not going to be stripped when loading, maybe that is not what you intended.
You should IMO not leave out the trailing , after the last value in the (flow style) sequence that is the value for answers. This will certainly lead to errors when you extend the list and forget to put in the trailing comma on the line above.
I would personally go for block style all the way:
title: Museum
body: |
  "```java
  code code code
  java2"
answers:
- "`museum`"
- "`museum.getFloor(3)`"
- "`museum.getFloor(3).getExhibit(5)`"
- "`museum.getFloor(3).getExhibit(5).getCurator()`"
- "`museum.getFloor(3).getExhibit(5).getCurator().name`"
- "`museum.getFloor(3).getExhibit(5).getCurator().name.toUpper()`"
When dealing with YAML file generation that is convoluted or complex, or when it's not working as I expect, I revert to letting Ruby show me the way:
require 'yaml'
body = <<EOT
"```java
code code code
java2
"
EOT
answers = %w(
`museum`
`museum.getFloor(3)`
`museum.getFloor(3).getExhibit(5)`
`museum.getFloor(3).getExhibit(5).getCurator()`
`museum.getFloor(3).getExhibit(5).getCurator().name`
`museum.getFloor(3).getExhibit(5).getCurator().name.toUpper()`
)
obj = {
"title" => "Museum",
"body" => body,
"answers" => answers
}
puts obj.to_yaml
Which, in this case, outputs:
---
title: Museum
body: |
  "```java
  code code code
  java2
  "
answers:
- "`museum`"
- "`museum.getFloor(3)`"
- "`museum.getFloor(3).getExhibit(5)`"
- "`museum.getFloor(3).getExhibit(5).getCurator()`"
- "`museum.getFloor(3).getExhibit(5).getCurator().name`"
- "`museum.getFloor(3).getExhibit(5).getCurator().name.toUpper()`"
If you then pass that YAML back into the parser, you should get the original data structure back:
YAML.load(obj.to_yaml)
# => {"title"=>"Museum",
# "body"=>"\"```java\n" +
# "code code code\n" +
# "java2\n" +
# "\"\n",
# "answers"=>
# ["`museum`",
# "`museum.getFloor(3)`",
# "`museum.getFloor(3).getExhibit(5)`",
# "`museum.getFloor(3).getExhibit(5).getCurator()`",
# "`museum.getFloor(3).getExhibit(5).getCurator().name`",
# "`museum.getFloor(3).getExhibit(5).getCurator().name.toUpper()`"]}