ANTLR 4, StringTemplate groups, and the rewrite option in a grammar - url-rewriting

In ANTLR 3 I could specify the template group from the parser. The grammar itself had the following options:
options {
output=template;
rewrite=true;
language=CSharp4;
}
parser.TemplateGroup = template;
How is this done in ANTLR 4 and ST4?

ANTLR 4 does not have an output option (or rewrite option). In ANTLR 4, you would implement a listener for the parse tree automatically generated by the grammar, and perform all of the StringTemplate operations within the listener.

Related

Example to find a value in a list using common expression language (CEL)

I was following the Google Common Expression Language specification. Can someone tell me whether I can do something like this:
I need to write an expression that checks whether "345" is in the phone_numbers list using Google CEL.
json : {"phone_numbers": ["123","234","345"] }
example : phone_numbers.exist_one("345"); // this does not work
https://github.com/google/cel-spec/blob/master/doc/langdef.md#standard-definitions
I got the expression :
phone_numbers.exists_one(r, r=="345")
Since you're just testing for the existence of a single value in a list, I would recommend using the in operator:
'345' in phone_numbers
The exists_one macro is the better fit when '345' must appear exactly once in the list: it is true only if exactly one element satisfies the predicate.
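To make the difference concrete, here is a minimal sketch in plain Java that mimics the two checks. It is not the CEL API: the class name and the in/existsOne helpers are made up for illustration, and CEL's error handling inside macros is ignored.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper class; it only imitates the semantics of the two CEL expressions.
final class CelSemanticsSketch {

    // Mimics CEL's `value in list`: true if the value occurs at least once.
    static <T> boolean in(T value, List<T> list) {
        return list.contains(value);
    }

    // Mimics `list.exists_one(r, predicate)`: true only if exactly one element matches.
    static <T> boolean existsOne(List<T> list, Predicate<T> predicate) {
        return list.stream().filter(predicate).count() == 1;
    }

    public static void main(String[] args) {
        List<String> phoneNumbers = Arrays.asList("123", "234", "345");
        List<String> withDuplicate = Arrays.asList("345", "234", "345");

        System.out.println(in("345", phoneNumbers));                    // true
        System.out.println(existsOne(phoneNumbers, "345"::equals));     // true
        System.out.println(in("345", withDuplicate));                   // true
        System.out.println(existsOne(withDuplicate, "345"::equals));    // false: two matches
    }
}
```

So if duplicates are possible and you only care about presence, the in operator is the safer choice.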

groupingBy operation in Java-8

I'm trying to rewrite a well-known Spark text classification example (http://chimpler.wordpress.com/2014/06/11/classifiying-documents-using-naive-bayes-on-apache-spark-mllib/) in Java 8.
I have a problem: in this code I'm preparing the data needed to compute the IDFs of all words in all files:
termDocsRdd.collect().stream().flatMap(doc -> doc.getTerms().stream()
.map(term -> new ImmutableMap.Builder<String, String>()
.put(doc.getName(),term)
.build())).distinct()
And I'm stuck on the groupBy operation. I need to group this by term, so that each term is a key and its value is a sequence of documents.
In Scala this operation looks very simple - .groupBy(_._2).
But how can I do this in Java?
I tried to write something like:
.groupingBy(term -> term, mapping((Document) d -> d.getDocNameContainsTerm(term), toList()));
but it's incorrect.
Does anybody know how to write this in Java?
Thank you very much.
If I understand you correctly, you want to do something like this:
(import static java.util.stream.Collectors.*;)
Map<Term, Set<Document>> collect = termDocsRdd.collect().stream().flatMap(
doc -> doc.getTerms().stream().map(term -> new AbstractMap.SimpleEntry<>(doc, term)))
.collect(groupingBy(Map.Entry::getValue, mapping(Map.Entry::getKey, toSet())));
Map.Entry/AbstractMap.SimpleEntry is used because Java 8 has no standard Pair<K,V> class. Map.Entry implementations can fill this role, but at the cost of unintuitive and verbose type and method names for something that merely serves as a pair.
If you are using the current Eclipse version (I tested with Luna SR1 20140925), with its limited type inference, you have to help the compiler a little:
Map<Term, Set<Document>> collect = termDocsRdd.collect().stream().flatMap(
doc -> doc.getTerms().stream().<Map.Entry<Document,Term>>map(term -> new AbstractMap.SimpleEntry<>(doc, term)))
.collect(groupingBy(Map.Entry::getValue, mapping(Map.Entry::getKey, toSet())));
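For anyone who wants to try the pattern without Spark, here is a small self-contained sketch of the same idea. It assumes documents are plain names mapped to a list of term strings; the class GroupByTermSketch and the method docsByTerm are hypothetical stand-ins for the Document and Term types above.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical stand-in: document name -> terms, instead of the Document/Term classes.
final class GroupByTermSketch {

    // Inverts "document -> terms" into "term -> set of documents containing it".
    static Map<String, Set<String>> docsByTerm(Map<String, List<String>> termsByDoc) {
        return termsByDoc.entrySet().stream()
                // Emit one (document, term) pair per term; the explicit type
                // witness keeps compilers with weaker inference happy.
                .flatMap(doc -> doc.getValue().stream()
                        .<Map.Entry<String, String>>map(
                                term -> new AbstractMap.SimpleEntry<>(doc.getKey(), term)))
                // Group by the term (entry value), collecting the documents (entry key).
                .collect(Collectors.groupingBy(Map.Entry::getValue,
                        Collectors.mapping(Map.Entry::getKey, Collectors.toSet())));
    }

    public static void main(String[] args) {
        Map<String, List<String>> termsByDoc = new LinkedHashMap<>();
        termsByDoc.put("a.txt", Arrays.asList("spark", "java"));
        termsByDoc.put("b.txt", Arrays.asList("java", "bayes"));

        // Iteration order of the resulting map and sets is not guaranteed.
        docsByTerm(termsByDoc).forEach((term, docs) -> System.out.println(term + " -> " + docs));
    }
}
```

Because the downstream collector is toSet(), a term that appears several times in one document still lists that document only once, which makes the distinct() call unnecessary here.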

Getting First and Follow metadata from an ANTLR4 parser

Is it possible to extract the first and follow sets from a rule using ANTLR4? I played around with this a little bit in ANTLR3 and did not find a satisfactory solution, but if anyone has info for either version, it would be appreciated.
I would like to parse user input up to the user's cursor location and then provide a list of possible choices for auto-completion. At the moment, I am not interested in auto-completing partially entered tokens. I want to display all possible following tokens at some point mid-parse.
For example:
sentence:
subjects verb (adverb)? '.' ;
subjects:
firstSubject (otherSubjects)* ;
firstSubject:
'The' (adjective)? noun ;
otherSubjects:
'and the' (adjective)? noun;
adjective:
'small' | 'orange' ;
noun:
CAT | DOG ;
verb:
'slept' | 'ate' | 'walked' ;
adverb:
'quietly' | 'noisily' ;
CAT : 'cat';
DOG : 'dog';
Given the grammar above...
If the user had not typed anything yet, the auto-complete list would be ['The']. (Note that I would have to retrieve the FIRST and not the FOLLOW of rule sentence, since the follow of the base rule is always EOF.)
If the input was "The", the auto-complete list would be ['small', 'orange', 'cat', 'dog'].
If the input was "The cat slept", the auto-complete list would be ['quietly', 'noisily', '.'].
So ANTLR3 provides a way to get the set of follows doing this:
BitSet followSet = state.following[state._fsp];
This works well. I can embed some logic into my parser so that when the parser calls the rule at which the user is positioned, it retrieves the follows of that rule and then provides them to the user. However, this does not work as well for nested rules (for instance, the base rule), because the follow set ignores any sub-rule follows, as it should.
I think I need to provide the FIRST set if the user has completed a rule (this could be hard to determine) as well as the FOLLOW set, to cover all valid options. I also think I will need to structure my grammar such that two tokens are never consecutive at the rule level.
I would have to break the above "firstSubject" rule into some sub-rules...
from
firstSubject:
'The'(adjective)? CAT | DOG;
to
firstSubject:
the (adjective)? CAT | DOG;
the:
'the';
I have yet to find any information on retrieving the FIRST set from a rule.
ANTLR4 appears to have drastically changed the way it works with follows at the level of the generated parser, so at this point I'm not really sure if I should continue with ANTLR3 or make the jump to ANTLR4.
Any suggestions would be greatly appreciated.
ANTLRWorks 2 (AW2) performs a similar operation, which I'll describe here. If you reference the source code for AW2, keep in mind that it is only released under an LGPL license.
1. Create a special token which represents the location of interest for code completion.
   - In some ways, this token behaves like the EOF. In particular, the ParserATNSimulator never consumes this token; a decision is always made at or before it is reached.
   - In other ways, this token is unique. In particular, if the token is located at an identifier or keyword, it is treated as though the token type was "fuzzy", and allowed to match any identifier or keyword for the language. For ANTLR 4 grammars, if the caret token is located at a location where the user has typed g, the parser will allow that token to match a rule name or the keyword grammar.
2. Create a specialized ATN interpreter that can return all possible parse trees which lead to the caret token, without looking past the caret for any decision, and without constraining the exact token type of the caret token.
3. For each possible parse tree, evaluate your code completion in the context of whatever the caret token matched in a parser rule.
The union of all the results found in step 3 is a superset of the complete set of valid code completion results, and can be presented in the IDE.
The following describes AW2's implementation of the above steps.
1. In AW2, this is the CaretToken, and it always has the token type CARET_TOKEN_TYPE.
2. In AW2, this specialized operation is represented by the ForestParser<TParser> interface, with most of the reusable implementation in AbstractForestParser<TParser> and specialized for parsing ANTLR 4 grammars for code completion in GrammarForestParser.
3. In AW2, this analysis is performed primarily by GrammarCompletionQuery.TaskImpl.runImpl(BaseDocument).
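Separately from the AW2 approach, the FIRST sets the question asks about can be computed for a grammar this small with the textbook fixed-point algorithm. The following is only a sketch in plain Java, not ANTLR API: the grammar from the question is translated by hand into plain productions, the helper rules adverbOpt, adjectiveOpt and otherSubjectsStar are invented to stand in for the ? and * operators, and CAT/DOG are written as their literal text.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: FIRST sets by fixed-point iteration over hand-written productions.
final class FirstSetsSketch {
    // Rule name -> alternatives; any symbol that is not a rule name is a terminal.
    private final Map<String, List<List<String>>> rules;
    private final Map<String, Set<String>> first = new LinkedHashMap<>();
    private final Set<String> nullable = new HashSet<>(); // rules that can match nothing

    FirstSetsSketch(Map<String, List<List<String>>> rules) {
        this.rules = rules;
        for (String rule : rules.keySet()) {
            first.put(rule, new TreeSet<String>());
        }
        boolean changed = true;
        while (changed) { // repeat until no FIRST set or nullable flag changes
            changed = false;
            for (Map.Entry<String, List<List<String>>> rule : rules.entrySet()) {
                for (List<String> alternative : rule.getValue()) {
                    changed |= first.get(rule.getKey()).addAll(firstOfSequence(alternative));
                    if (canBeEmpty(alternative)) {
                        changed |= nullable.add(rule.getKey());
                    }
                }
            }
        }
    }

    // Terminals that can start the given sequence of symbols.
    Set<String> firstOfSequence(List<String> symbols) {
        Set<String> result = new TreeSet<String>();
        for (String symbol : symbols) {
            if (!rules.containsKey(symbol)) { // terminal: it starts the sequence
                result.add(symbol);
                return result;
            }
            result.addAll(first.get(symbol));
            if (!nullable.contains(symbol)) { // rule cannot be skipped: stop here
                return result;
            }
        }
        return result;
    }

    private boolean canBeEmpty(List<String> symbols) {
        for (String symbol : symbols) {
            if (!nullable.contains(symbol)) {
                return false;
            }
        }
        return true; // also true for the empty alternative
    }

    static List<String> seq(String... symbols) {
        return Arrays.asList(symbols);
    }

    // The grammar from the question; seq() with no arguments is the empty alternative.
    static FirstSetsSketch exampleGrammar() {
        Map<String, List<List<String>>> g = new LinkedHashMap<>();
        g.put("sentence", Arrays.asList(seq("subjects", "verb", "adverbOpt", ".")));
        g.put("subjects", Arrays.asList(seq("firstSubject", "otherSubjectsStar")));
        g.put("otherSubjectsStar", Arrays.asList(seq("otherSubjects", "otherSubjectsStar"), seq()));
        g.put("firstSubject", Arrays.asList(seq("The", "adjectiveOpt", "noun")));
        g.put("otherSubjects", Arrays.asList(seq("and the", "adjectiveOpt", "noun")));
        g.put("adjectiveOpt", Arrays.asList(seq("adjective"), seq()));
        g.put("adverbOpt", Arrays.asList(seq("adverb"), seq()));
        g.put("adjective", Arrays.asList(seq("small"), seq("orange")));
        g.put("noun", Arrays.asList(seq("cat"), seq("dog")));
        g.put("verb", Arrays.asList(seq("slept"), seq("ate"), seq("walked")));
        g.put("adverb", Arrays.asList(seq("quietly"), seq("noisily")));
        return new FirstSetsSketch(g);
    }

    public static void main(String[] args) {
        FirstSetsSketch fs = exampleGrammar();
        System.out.println(fs.firstOfSequence(seq("sentence")));             // [The]
        System.out.println(fs.firstOfSequence(seq("adjectiveOpt", "noun"))); // [cat, dog, orange, small]
        System.out.println(fs.firstOfSequence(seq("adverbOpt", ".")));       // [., noisily, quietly]
    }
}
```

The three calls in main correspond to the three auto-complete cases listed in the question: each asks for the FIRST set of whatever remains of the rule after the cursor. Working out which "remaining sequence" applies at an arbitrary cursor position is the hard part, and that is what the caret-token approach above addresses.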

Can I get ANTLR to generate output from an AST that I create by hand?

I understand how to generate AST from character streams using ANTLR.
I'd like to be able to create an AST programmatically and have ANTLR apply the rules in the grammar to produce valid output, for example adding in the syntactic stuff that isn't in the AST like say quotes and whitespace.
To use a tree walker you seem to need a TokenStream attached to the TreeNodeStream, but if the tree was created programmatically there is no TokenStream. Failing to call setTokenStream(...) on a CommonTreeNodeStream causes NullPointerExceptions at runtime.
example:
TokenStream tokens = new CommonTokenStream(new MyLexer(somestream));
// parse an AST 'ast' from this stream
CommonTreeNodeStream nodes = new CommonTreeNodeStream(ast);
// needs this or npe
nodes.setTokenStream(tokens);
new MyWalker(nodes).start();
So: can you create an AST on the fly, without a character stream as input, and have some generated ANTLR class produce a character stream according to the rules defined in a grammar?

Access translated i18n messages from Scala templates (Play! Internationalization)

In my Play! 2.0 application I would like to define the following languages:
# The application languages
# ~~~~~
application.langs=en-GB,de-DE,nl-NL
I have also created three files whose names end with the corresponding language codes:
Messages.en-GB
Messages.de-DE
Messages.nl-NL
When I start the application without any request for a translated key I get the following error message:
conf/application.conf: 12: Key 'de-DE' may not be followed by token: ',' (if you intended ',' to be part of the value for 'de-DE', try enclosing the value in double quotes)
Also, when trying to access a message from the Scala template, I still see the same message. I request the message with the following code:
@Messages("login.page")
I made the above changes according to the Play manual: http://www.playframework.org/documentation/2.0/JavaI18N . So I have two questions:
How can I set the default language and change it, as in 1.2.4 (Lang.change("en-GB"))?
How do I access the messages from the Scala templates?
In your Scala file use:
<h1>@Messages("pack.key")</h1>
And in your Java file use:
String title = Messages.get("pack.key");
Don't forget to add quotes around your language list in conf/application.conf:
application.langs="en-GB,de-DE,nl-NL"
Changing the language is not possible in Play! 2.0, see this discussion: http://groups.google.com/group/play-framework/browse_thread/thread/744d523c169333ac/5bfe28732a6efd89?show_docid=5bfe28732a6efd89
and this ticket: https://play.lighthouseapp.com/projects/82401-play-20/tickets/174-20-i18n-add-ability-to-define-implicit-lang-for-java-api#ticket-174-4
However, when you register multiple languages you should enclose them in double quotes, like this:
application.langs="en-GB,de-DE,nl-NL"
And then you can access them from the Scala templates like this:
@Messages.get("login.title")
So currently the default language is the one defined in the file messages (the one without any language code in its name).
Or you can just use @Messages("not.logged.in")
