how to deal char ':' and assign(:) in ANTLR4 grammar - yaml

I want to parse yaml with antlr4.
Target file contains image: xxx.com/node:8.14.
Then I wrote a grammar file like this:
grammar Drone;
yaml: obj+ ;
obj: ID ':' value;
value:
STRING;
ID
: ('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'0'..'9'|'-')+
;
STRING: ('a'..'z'|'A'..'Z'|'0'..'9'|'-'|'.'|'_'|'/'|':')+ ;
WS: [ \t]+ -> skip;
CRLF: [\r\n]+ ;
got result like this:
[antlr4] ➜ dronemigrate antlr4-parse Drone.g4 yaml -tree -trace drone.yml
line 1:0 mismatched input 'image:' expecting ID
enter yaml, LT(1)=image:
enter obj, LT(1)=image:
exit obj, LT(1)=<EOF>
exit yaml, LT(1)=<EOF>
(yaml:1 (obj:1 image: xxx.com/node:8.14))
When I remove char ':' in Grammar rules file, got reulst like this:
[antlr4] ➜ dronemigrate antlr4-parse Drone.g4 yaml -tree -trace drone.yml
enter yaml, LT(1)=image
enter obj, LT(1)=image
consume [#0,0:4='image',<2>,1:0] rule obj
consume [#1,5:5=':',<1>,1:5] rule obj
enter value, LT(1)=xxx.com/node
consume [#2,7:18='xxx.com/node',<3>,1:7] rule value
exit value, LT(1)=:
exit obj, LT(1)=:
exit yaml, LT(1)=:
(yaml:1 (obj:1 image : (value:1 xxx.com/node)))
how to deal the ':' in string?

YAML is not well suited for parser generators like ANTLR (where there is a strict separation between the lexer and parser). There are ways around it (using lexical modes), but fully implementing a YAML grammar is by no means a trivial thing. That's why there is still an open issue in ANTLR's issue tracker to implement such a grammar. You could have a look at how it is done here: https://github.com/umaranis/FastYaml

Related

Sphinx-autodoc with napoleon (Google Doc String Style): Warnings and Errors about Block quotes and indention

I am using Sphinx 4.4.0 with napoleon extension (Google Doc String). I have this two problems
ARNING: Block quote ends without a blank line; unexpected unindent.
ERROR: Unexpected indentation.
I found something about it on the internet but can not fit this two my code. My problem is I even do not understand the messages. I do not see where the problem could be.
This is the code:
def read_and_validate_csv(basename, specs_and_rules):
"""Read a CSV file with respect to specifications about format and
rules about valid values.
Hints: Do not use objects of type type (e.g. str instead of "str") when
specificing the column type.
specs_and_rules = {
'TEMPLATES': {
'T1l': ('Int16', [-9, ' '])
},
'ColumnA': 'str',
'ColumnB': ('str', 'no answer'),
'ColumnC': None,
'ColumnD': (
'Int16',
-9, {
'len': [1, 2, (4-8)],
'val': [0, 1, (3-9)]
}
}
Returns:
(pandas.DataFrame): Result.
"""
This are the original messages:
.../bandas.py:docstring of buhtzology.bandas.read_and_validate_csv:11: WARNING: Block quote ends without a blank line; unexpected unindent.
.../bandas.py:docstring of buhtzology.bandas.read_and_validate_csv:15: ERROR: Unexpected indentation.
.../bandas.py:docstring of buhtzology.bandas.read_and_validate_csv:17: ERROR: Unexpected indentation.
.../bandas.py:docstring of buhtzology.bandas.read_and_validate_csv:19: WARNING: Block quote ends without a blank line; unexpected unindent.
.../bandas.py:docstring of buhtzology.bandas.read_and_validate_csv:20: WARNING: Block quote ends without a blank line; unexpected unindent.
reStructuredText is not Markdown, and indentation alone is not enough to demarcate the code block. reStructuredText calls this a literal block. Although the use of :: is one option, you might want to explicitly specify the language (overriding the default) with the use of the code-block directive.
Also I noticed that you have invalid syntax in your code block—a missing ) and extra spaces in your indentation—which could have caused those errors.
Try this.
def read_and_validate_csv(basename, specs_and_rules):
"""Read a CSV file with respect to specifications about format and
rules about valid values.
Hints: Do not use objects of type type (e.g. str instead of "str") when
specificing the column type.
.. code-block:: python
specs_and_rules = {
'TEMPLATES': {
'T1l': ('Int16', [-9, ' '])
},
'ColumnA': 'str',
'ColumnB': ('str', 'no answer'),
'ColumnC': None,
'ColumnD': (
'Int16',
-9, {
'len': [1, 2, (4-8)],
'val': [0, 1, (3-9)]
}
)
}
Returns:
(pandas.DataFrame): Result.
"""

setting up ANTLR's CLASSPATH on macOS installed via HomeBrew

Following this question, I have ANTLR installed via HomeBrew:
brew install antlr
and it is installed on:
/usr/local/Cellar/antlr/<version>/
and installed the Python runtime via
pip3 install antlr4-python3-runtime
From here, I ran the
export CLASSPATH=".:/usr/local/Cellar/antlr/<version>/antlr-<version>-complete.jar:$CLASSPATH"
but when I run the command
grun <grammarName> <inputFile>
I get the infamous error message:
Can't load <grammarName> as lexer or parser
I would appreciate it if you could help me know the problem and how to solve it.
P.S. It shouldn't matter, but you may see the code I am working on here.
This error message is an indication that TestRig (that the grun alias uses), can’t find the Parser (or Lexer) class in your classpath. If you’ve placed your generated Parser in a package, you may need to take the package name into account, but the main thing is that the generated (and compiled) class is in your classpath.
Also.. TestRig takes the grammarName AND a startRule as parameters, and expects your input to come from stdin.
I cloned your repo to take a closer look at your issue.
The immediate issue for why grun is giving you this issue is that you specified your target language in the grammar file (looks like I need to retract that comment about it not being anything in your grammar). By specifying python as the target language in the grammar, you didn't generate the *.java classes that the TestRig class (used by the grun alias) needed to execute.
I removed the target language option from the grammar and was able to run the grun command against your sample input. To get it to parse correctly, I took the liberty of modifying several things in your grammar:
Removed the target language (It's generally better to specify the target language on the antlr command line so that the grammar remains language agnostic (it's also essential if you want to use the TestRig/grun utility to test things out, since you'll need the Java target)).
changed SectionName lexer rule to section parser rule (with labeled alternatives. Having lexer rules like 'Body ' Integer will give you a single token with both the body keyword and the integer, that you'd then have to pull apart later (it also forces there to be only a single space between 'Body' and the integer).
set the NewLine token to -> skip (This is a bit more presumptive on my part, but not skipping NewLine will require modifying more parse rule to specify where all NewLine is a valid token.)
removed the StatementEnd lexer rule since I skiped the NewLine tokens
reworked theInteger and Float stuff to be two different tokens so that I could use the Integer token in the section parser rule.
a couple more minor tweaks just to get this skeleton to handle your sample input.
The resulting grammar I used was:
grammar ElmerSolver;
// Parser Rules
// eostmt: ';' | CR;
statement: ';';
statement_list: statement*;
sections: section+ EOF;
// section: SectionName /* statement_list */ End;
// Lexer Rules
fragment DIGIT: [0-9];
Integer: DIGIT+;
Float:
[+-]? (DIGIT+ ([.]DIGIT*)? | [.]DIGIT+) ([Ee][+-]? DIGIT+)?;
section:
'Header' statement_list End # headerSection
| 'Simulation' statement_list End # simulatorSection
| 'Constants' statement_list End # constantsSection
| 'Body ' Integer statement_list End # bodySection
| 'Material ' Integer statement_list End # materialSection
| 'Body Force ' Integer statement_list End # bodyForceSection
| 'Equation ' Integer statement_list End # equationSection
| 'Solver ' Integer statement_list End # solverSection
| 'Boundary Condition ' Integer statement_list End # boundaryConditionSection
| 'Initial Condition ' Integer statement_list End # initialConditionSection
| 'Component' Integer statement_list End # componentSection;
End: 'End';
// statementEnd: ';' NewLine*;
NewLine: ('\r'? '\n' | '\n' | '\r') -> skip;
LineJoining:
'\\' WhiteSpace? ('\r'? '\n' | '\r' | '\f') -> skip;
WhiteSpace: [ \t\r\n]+ -> skip;
LineComment: '#' ~( '\r' | '\n')* -> skip;
With those changes, I ran
➜ antlr4 ElmerSolver.g4
javac *.java
grun ElmerSolver sections -tree < examples/ex001.sif
and got the output:
(sections (section Simulation statement_list End) (section Equation 1 statement_list End) <EOF>)

Accessing environment variables in a YAML file for Ruby project (using ${ENVVAR} syntax)

I am building an open source project using Ruby for testing HTTP services: https://github.com/Comcast/http-blackbox-test-tool
I want to be able to reference environment variables in my test-plan.yaml file. I could use ERB, however I don't want to support embedding any random Ruby code and ERB syntax is odd for non-rubyists, I just want to access environment variables using the commonly used Unix style ${ENV_VAR} syntax.
e.g.
order-lunch-app-health:
request:
url: ${ORDER_APP_URL}
headers:
content-type: 'application/text'
method: get
expectedResponse:
statusCode: 200
maxRetryCount: 5
All examples I have found for Ruby use ERB. Does anyone have a suggestion on the best way to deal with this? I an open to using another tool to preprocess the YAML and then send that to the Ruby application.
I believe something like this should work under most circumstances:
require 'yaml'
def load_yaml(file)
content = File.read file
content.gsub! /\${([^}]+)}/ do
ENV[$1]
end
YAML.load content
end
p load_yaml 'sample.yml'
As opposed to my original answer, this is both simpler and handles undefined ENV variables well.
Try with this YAML:
# sample.yml
path: ${PATH}
home: ${HOME}
error: ${NO_SUCH_VAR}
Original Answer (left here for reference)
There are several ways to do it. If you want to allow your users to use the ${VAR} syntax, then perhaps one way would be to first convert these variables to Ruby string substitution format %{VAR} and then evaluate all environment variables together.
Here is a rough proof of concept:
require 'yaml'
# Transform environments to a hash of { symbol: value }
env_hash = ENV.to_h.transform_keys(&:to_sym)
# Load the file and convert ${ANYTHING} to %{ANYTHING}
content = File.read 'sample.yml'
content.gsub! /\${([^}]+)}/, "%{\\1}"
# Use Ruby string substitution to replace %{VARS}
content %= env_hash
# Done
yaml = YAML.load content
p yaml
Use it with this sample.yml for instance:
# sample.yml
path: ${PATH}
home: ${HOME}
There are many ways this can be improved upon of course.
Preprocessing is easy, and I recommend you use a YAML loaderd/dumper
based solution, as the replacement might require quotes around the
replacement scalar. (E.g. you substitute the string true, if that
were not quoted, the resulting YAML would be read as a boolean).
Assuming your "source" is in input.yaml and your env. variable
ORDER_APP_URL set to https://some.site/and/url. And the following
script in expand.py:
import sys
import os
from pathlib import Path
import ruamel.yaml
def substenv(d, env):
if isinstance(d, dict):
for k, v in d.items():
if isinstance(v, str) and '${' in v:
d[k] = v.replace('${', '{').format(**env)
else:
substenv(v, env)
elif isinstance(d, list):
for idx, item in enumerate(d):
if isinstance(v, str) and '${' in v:
d[idx] = item.replace('${', '{').format(**env)
else:
substenv(item, env)
yaml = ruamel.yaml.YAML()
yaml.preserve_quotes = True
data = yaml.load(Path(sys.argv[1]))
substenv(data, os.environ)
yaml.dump(data, Path(sys.argv[2]))
You can then do:
python expand.py input.yaml output.yaml
which writes output.yaml:
order-lunch-app-health:
request:
url: https://some.site/and/url
headers:
content-type: 'application/text'
method: get
expectedResponse:
statusCode: 200
maxRetryCount: 5
Please note that the spurious quotes around 'application/text' are preserved, as would be any comments
in the original file.
Quotes around the substituted URL are not necessary, but the would have been added if they were.
The substenv routine recursively traverses the loaded data, and substitutes even if the substitution is in mid-scalar, and if there are more than substitution in one scalar. You can "tighten" the test:
if isinstance(v, str) and '${' in v:
if that would match too many strings loaded from YAML.

ANTLR3 NoViableAltException and MissingTokenException [duplicate]

I tried to use ANTLR3 to build a simple Regexpression parser, but it throws the internal error
Here is the Sample.g
grammar Sample;
options {
memoize=true;
output=AST;
}
tokens {
RegExp;
}
RegExpression:
'/' (a=~('/' | NL))+ '/'
-> ^(RegExp[$RegExpression.start, $RegExpression.text] $a+ )
;
fragment NL: '\n' | '\r';
ANY : . ;
I run the command:
java -jar antlr-3.5.2-complete.jar -print Sample.g
and it gives this:
error(10): internal error: Sample.g : java.lang.NullPointerException
org.antlr.grammar.v3.DefineGrammarItemsWalker.rewrite_atom(DefineGrammarItemsWalker.java:3896)
...
...
Updated according to comments
grammar Sample{
memoize=true;
output=AST;
}
tokens {
RegExp;
}
regExpression:
'/' (a=~('/' | NL))+ '/'
-> ^(RegExp[$regExpression.start, $regExpression.text] $a+ )
;
NL: '\n' | '\r';
And here are the errors after running the java -jar antlr-3.5.2-complete.jar Sample.g
error(10): internal error: Sample.g : java.lang.NullPointerException
org.antlr.grammar.v3.CodeGenTreeWalker.getTokenElementST(CodeGenTreeWalker.java:311)
org.antlr.grammar.v3.CodeGenTreeWalker.notElement(CodeGenTreeWalker.java:2886)
org.antlr.grammar.v3.CodeGenTreeWalker.element(CodeGenTreeWalker.java:2431)
org.antlr.grammar.v3.CodeGenTreeWalker.element(CodeGenTreeWalker.java:2446)
org.antlr.grammar.v3.CodeGenTreeWalker.alternative(CodeGenTreeWalker.java:2250)
org.antlr.grammar.v3.CodeGenTreeWalker.block(CodeGenTreeWalker.java:1798)
org.antlr.grammar.v3.CodeGenTreeWalker.ebnf(CodeGenTreeWalker.java:3014)
org.antlr.grammar.v3.CodeGenTreeWalker.element(CodeGenTreeWalker.java:2495)
org.antlr.grammar.v3.CodeGenTreeWalker.alternative(CodeGenTreeWalker.java:2250)
org.antlr.grammar.v3.CodeGenTreeWalker.block(CodeGenTreeWalker.java:1798)
org.antlr.grammar.v3.CodeGenTreeWalker.rule(CodeGenTreeWalker.java:1321)
org.antlr.grammar.v3.CodeGenTreeWalker.rules(CodeGenTreeWalker.java:955)
org.antlr.grammar.v3.CodeGenTreeWalker.grammarSpec(CodeGenTreeWalker.java:877)
org.antlr.grammar.v3.CodeGenTreeWalker.grammar_(CodeGenTreeWalker.java:518)
org.antlr.codegen.CodeGenerator.genRecognizer(CodeGenerator.java:415)
org.antlr.Tool.generateRecognizer(Tool.java:674)
org.antlr.Tool.process(Tool.java:487)
org.antlr.Tool.main(Tool.java:98)
You're trying to use a rewrite rule (tree construction) on a lexer rule. That doesn't make sense.
In ANTLR, all rules with name starting with an uppercase letter are lexer rules. The tree construction is used on AST nodes, not on tokens themselves, so you have to use it on parser rules (starting with lowercase letter).
When you do that, keep in mind that your NL is a fragment now (you cannot use fragments in parser rules) and make sure your ANY token doesn't collide with anything else, i.e. define all needed tokens (/, NL etc.) and put them above the ANY token definition.

YML syntax error

I want to have a multi-line bit of markdown java in a yam file. I tried many things but I guess I don't quite get the quoting rules of Yaml.
{
title: Museum,
body: |
"```java
code code code
java2",
answers: [
"`museum`",
"`museum.getFloor(3)`",
"`museum.getFloor(3).getExhibit(5)`",
"`museum.getFloor(3).getExhibit(5).getCurator()`",
"`museum.getFloor(3).getExhibit(5).getCurator().name`",
"`museum.getFloor(3).getExhibit(5).getCurator().name.toUpper()`"
]
}
Produces:
/Users/pitosalas/.rbenv/versions/2.3.1/lib/ruby/2.3.0/psych.rb:377:in `parse': (generator/test.yml): found character that cannot start any token while scanning for the next token at line 3 column 9 (Psych::SyntaxError)
YAML has two styles: the JSON like flow style and the much better human readable block style.
Roughly speaking you can have nested structures each style nested within itself and can have flow style nested within block style, but block style nested within flow style is not allowed.
Your to level { and } are flow style but you try to introduce, with |, a literal block style scalar within that flow style. Replace the flow style with block style upwards from that scalar:
title: Museum
body: |
"```java
code code code
java2"
answers: [
"`museum`",
"`museum.getFloor(3)`",
"`museum.getFloor(3).getExhibit(5)`",
"`museum.getFloor(3).getExhibit(5).getCurator()`",
"`museum.getFloor(3).getExhibit(5).getCurator().name`",
"`museum.getFloor(3).getExhibit(5).getCurator().name.toUpper()`"
]
and your YAML is fine. Note that the double quotes "around" the value for the key body are not going to be stripped when loading, maybe that is not what you intended.
You should IMO not leave out the trailing , after the last value in the (flow style) sequence that is the value for answers. This will certainly lead to errors when you extend the list and forget to put in the trailing comma on the line above.
I would personally go for block style all the way:
title: Museum
body: |
"```java
code code code
java2"
answers:
- "`museum`"
- "`museum.getFloor(3)`"
- "`museum.getFloor(3).getExhibit(5)`"
- "`museum.getFloor(3).getExhibit(5).getCurator()`"
- "`museum.getFloor(3).getExhibit(5).getCurator().name`"
- "`museum.getFloor(3).getExhibit(5).getCurator().name.toUpper()`"
When dealing with YAML file generation that is convoluted or complex, or when it's not working as I expect, I revert to letting Ruby show me the way:
require 'yaml'
body = <<EOT
"```java
code code code
java2
"
EOT
answers = %w(
`museum`
`museum.getFloor(3)`
`museum.getFloor(3).getExhibit(5)`
`museum.getFloor(3).getExhibit(5).getCurator()`
`museum.getFloor(3).getExhibit(5).getCurator().name`
`museum.getFloor(3).getExhibit(5).getCurator().name.toUpper()`
)
obj = {
"title" => "Museum",
"body" => body,
"answers" => answers
}
puts obj.to_yaml
Which, in this case, outputs:
---
title: Museum
body: |
"```java
code code code
java2
"
answers:
- "`museum`"
- "`museum.getFloor(3)`"
- "`museum.getFloor(3).getExhibit(5)`"
- "`museum.getFloor(3).getExhibit(5).getCurator()`"
- "`museum.getFloor(3).getExhibit(5).getCurator().name`"
- "`museum.getFloor(3).getExhibit(5).getCurator().name.toUpper()`"
If you then pass that YAML back into the parser, you should get the original data structure back:
YAML.load(obj.to_yaml)
# => {"title"=>"Museum",
# "body"=>"\"```java\n" +
# "code code code\n" +
# "java2\n" +
# "\"\n",
# "answers"=>
# ["`museum`",
# "`museum.getFloor(3)`",
# "`museum.getFloor(3).getExhibit(5)`",
# "`museum.getFloor(3).getExhibit(5).getCurator()`",
# "`museum.getFloor(3).getExhibit(5).getCurator().name`",
# "`museum.getFloor(3).getExhibit(5).getCurator().name.toUpper()`"]}

Resources