antlr v3 context-aware conditional comment inclusion - comments

I'm modifying a DSL grammar for a product that is in public use.
Currently all /*...*/ comments are silently ignored, but I need to modify it so that comments that are placed before certain key elements are parsed into the AST.
I need to maintain backwards compatibility whereby users can still add comments arbitrarily throughout the DSL and only those key comments are included.
The parser grammar currently looks a bit like this:
grammar StateGraph;
graph: 'graph' ID '{' graph_body '}';
graph_body: state+;
state: 'state' ID '{' state_body '}';
state_body: transition* ...etc...;
transition: 'transition' (transition_condition) ID ';';
COMMENT: '/*' ( options {greedy=false;} : . )* '*/' {skip();};
Comments placed before the 'graph' and 'state' elements contain meaningful description and annotations and need to be included within the parsed AST.
So I've modified those two rules and am no longer skipping COMMENT:
graph: comment* 'graph' ID '{' graph_body '}';
state: comment* 'state' ID '{' state_body '}';
COMMENT: '/*' ( options {greedy=false;} : . )* '*/';
If I naively use the above, the other comments cause mismatched token errors when subsequently executing the tree parser.
How do I ignore all instances of COMMENT that are not placed in front of 'graph' or 'state'?
An example DSL would be:
/* Some description
* #some.meta.info
*/
graph myGraph {
/* Some description of the state.
* #some.meta.info about the state
*/
state first {
transition if (true) second; /* this comment ignored */
}
state second {
}
/* this comment ignored */
}

This is the solution I've actually got working.
I'd love feedback.
The basic idea is to send comments to the HIDDEN channel, manually extract them in the places where I want them,
and to use rewrite rules to re-insert the comments where needed.
The extraction step is inspired by the information here: http://www.antlr.org/wiki/pages/viewpage.action?pageId=557063.
The grammar is now:
grammar StateGraph;
tokens { COMMENTS; }
@members {
// matches comments immediately preceding specified token on any channel -> ^(COMMENTS COMMENT*)
CommonTree treeOfCommentsBefore(Token token) {
List<Token> comments = new ArrayList<Token>();
for (int i=token.getTokenIndex()-1; i >= 0; i--) {
Token t = input.get(i);
if (t.getType() == COMMENT) {
comments.add(t);
}
else if (t.getType() != WS) {
break;
}
}
java.util.Collections.reverse(comments);
CommonTree commentsTree = new CommonTree(new CommonToken(COMMENTS, "COMMENTS"));
for (Token t: comments) {
commentsTree.addChild(new CommonTree(t));
}
return commentsTree;
}
}
graph
: 'graph' ID '{' graph_body '}'
-> ^(ID {treeOfCommentsBefore($start)} graph_body);
graph_body: state+;
state
: 'state' ID '{' state_body '}'
-> ^(ID {treeOfCommentsBefore($start)} state_body);
state_body: transition* ...etc...;
transition: 'transition' (transition_condition) ID ';';
COMMENT: '/*' .* '*/' {$channel=HIDDEN;};
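
The backward scan inside treeOfCommentsBefore can be tried in isolation. The following is a minimal standalone sketch, not the generated parser code: the Tok class and the Kind enum are hypothetical stand-ins for ANTLR's Token and the generated COMMENT and WS token types, and the method returns the comment texts instead of a CommonTree.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class CommentScan {
    // Hypothetical token kinds standing in for the generated COMMENT / WS / other token types.
    enum Kind { COMMENT, WS, OTHER }

    // Hypothetical minimal token; in the grammar above this role is played by ANTLR's Token.
    static final class Tok {
        final Kind kind;
        final String text;
        Tok(Kind kind, String text) { this.kind = kind; this.text = text; }
    }

    // Collects the comments that immediately precede tokens.get(index),
    // skipping whitespace and stopping at the first token of any other kind.
    static List<String> commentsBefore(List<Tok> tokens, int index) {
        List<String> comments = new ArrayList<>();
        for (int i = index - 1; i >= 0; i--) {
            Tok t = tokens.get(i);
            if (t.kind == Kind.COMMENT) {
                comments.add(t.text);
            } else if (t.kind != Kind.WS) {
                break;
            }
        }
        Collections.reverse(comments); // the scan ran backwards, so restore source order
        return comments;
    }

    public static void main(String[] args) {
        List<Tok> tokens = new ArrayList<>();
        tokens.add(new Tok(Kind.OTHER, ";"));
        tokens.add(new Tok(Kind.COMMENT, "/* ignored */"));
        tokens.add(new Tok(Kind.OTHER, "}"));
        tokens.add(new Tok(Kind.WS, "\n"));
        tokens.add(new Tok(Kind.COMMENT, "/* first */"));
        tokens.add(new Tok(Kind.WS, "\n"));
        tokens.add(new Tok(Kind.COMMENT, "/* second */"));
        tokens.add(new Tok(Kind.WS, "\n"));
        tokens.add(new Tok(Kind.OTHER, "state"));
        // prints [/* first */, /* second */]
        System.out.println(commentsBefore(tokens, 8));
    }
}
```

The comment after the ';' is not picked up because the '}' token ends the scan, which is the behaviour the grammar relies on to ignore arbitrary comments.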

Does this work for you?
grammar StateGraph;
graph: 'graph' ID '{' graph_body '}';
graph_body: state+;
state: .COMMENT 'state' ID '{' state_body '}';
state_body: .COMMENT transition* ...etc...;
transition: 'transition' (transition_condition) ID ';';
COMMENT: '/*' ( options {greedy=false;} : . )* '*/' {skip();};

How do I ignore all instances of COMMENT that are not placed in front of 'graph' or 'state'?
You can do that by checking, right after the closing "*/" of a comment, whether 'graph' or 'state' lies ahead, with optional whitespace in between. If so, do nothing and keep the token. If not, the predicate fails, the rule falls through to the second alternative, and the comment token is skipped with skip().
In ANTLR syntax that would look like:
COMMENT
: '/*' .* '*/' ( (SPACE* (GRAPH | STATE))=> /* do nothing, so keep this token */
| {skip();} /* or else, skip it */
)
;
GRAPH : 'graph';
STATE : 'state';
SPACES : SPACE+ {skip();};
fragment SPACE : ' ' | '\t' | '\r' | '\n';
Note that .* and .+ are ungreedy by default: there is no need to set options{greedy=false;}.
Also, take care not to use SPACES inside your COMMENT rule: SPACES executes skip() when it is called, which is why the predicate uses the SPACE fragment instead.
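
The decision the predicate makes can be sketched outside ANTLR. The Java below is only an illustration under assumptions: the class and method names are hypothetical, it works on a plain string rather than a lexer stream, and it requires a word boundary after the keyword, which is slightly stricter than the lexer rule above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class KeptComments {
    // A /* ... */ comment; reluctant quantifier plus DOTALL so it can span lines.
    private static final Pattern COMMENT = Pattern.compile("/\\*.*?\\*/", Pattern.DOTALL);
    // Optional whitespace followed by one of the two keywords.
    private static final Pattern KEYWORD_AHEAD = Pattern.compile("\\s*(?:graph|state)\\b");

    // Returns only the comments that are directly followed by 'graph' or 'state'.
    static List<String> keptComments(String source) {
        List<String> kept = new ArrayList<>();
        Matcher comment = COMMENT.matcher(source);
        while (comment.find()) {
            Matcher ahead = KEYWORD_AHEAD.matcher(source);
            ahead.region(comment.end(), source.length());
            if (ahead.lookingAt()) {
                kept.add(comment.group()); // keep this token
            }
            // otherwise the comment is dropped, mirroring skip()
        }
        return kept;
    }

    public static void main(String[] args) {
        String source = "/* kept */\nstate a {\n} /* dropped */\n}";
        // prints [/* kept */]
        System.out.println(keptComments(source));
    }
}
```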


How can I warn the user about unused variables within a logic rule in a Prolog-like DSL developed with Xtext?

I'm new here, but I hope someone can help me.
I'm developing a Prolog-like DSL for a university project.
This is a simplified grammar that I use to experiment with:
grammar it.unibo.gciatto.Garbage hidden (SL_COMMENT, ML_COMMENT, WS, ANY_OTHER)
import "http://www.eclipse.org/emf/2002/Ecore" as ecore
generate garbage "http://www.unibo.it/gciatto/Garbage"
PTheory returns Theory
: (kb+=PExpression '.')*
;
PExpression returns Expression
: PRule
;
PRule returns Expression
: PConjunction ({ Expression.left=current } name=':-' right=PConjunction)?
;
PConjunction returns Expression
: PExpression0 ({ Expression.left=current } name=',' right=PConjunction)?
;
PExpression0 returns Expression
: PTerm
| '(' PExpression ')'
;
PTerm returns Term
: PStruct
| PVariable
| PNumber
;
PVariable returns Variable
: { AnonymousVariable } name='_'
| name=VARIABLE
;
PNumber returns Number
: value=INT
;
PStruct returns Struct
: name=ATOM '(' arg+=PExpression0 (',' arg+=PExpression0)* ')'
| PAtom
;
PAtom returns Atom
: name=ATOM
| { AtomString } name=STRING
;
terminal fragment CHARSEQ : ('a'..'z' | 'A' .. 'Z' | '0'..'9' | '_')*;
terminal ATOM : ('a'..'z') CHARSEQ;
terminal VARIABLE : ('A'..'Z') CHARSEQ;
terminal INT returns ecore::EInt: ('0'..'9')+;
terminal STRING :
'"' ( '\\' . /* 'b'|'t'|'n'|'f'|'r'|'u'|'"'|"'"|'\\' */ | !('\\'|'"') )* '"' |
"'" ( '\\' . /* 'b'|'t'|'n'|'f'|'r'|'u'|'"'|"'"|'\\' */ | !('\\'|"'") )* "'"
;
terminal ML_COMMENT : '/*' -> '*/';
terminal SL_COMMENT : '//' !('\n'|'\r')* ('\r'? '\n')?;
terminal WS : (' '|'\t'|'\r'|'\n')+;
terminal ANY_OTHER: .;
When validating, I'd love to search for unused variables in rule definitions and suggest that the user use an anonymous variable instead. Once I've understood the mechanism, I may consider similar validation rules.
I know Xtext has a built-in scoping mechanism and I've been able to use it in different situations, but as you know, any IScopeProvider provides a scope for a given EReference (am I right?) and, as you can see, my grammar has no cross-references. The reason for that is simple: in Prolog a variable "definition" and its "references" are syntactically the same, so no context-free parser able to distinguish the two contexts can be generated (I'm pretty sure, even without a formal proof).
However, I think the validation algorithm is quite simple:
"while navigating the AST, collect every variable in an ad-hoc data structure and then count the occurrences", or something smarter than that.
Now the real question is: can I somehow (re)use any part of the Xtext scoping framework, and if so, how? Or should I build a simple scoping library myself?
Sorry for the long question and bad English, I hope I was exhaustive.
Thank you for reading.
The Xtext validation framework lets you write validation rules that reuse your scope provider instance easily. A sample validator is already generated for the Xtext grammar; you have to extend it with your specific validation case as follows:
import com.google.inject.Inject;
import org.eclipse.xtext.validation.Check;

public class GarbageLanguageJavaValidator extends AbstractGarbageLanguageJavaValidator {
    @Inject
    GarbageLanguageScopeProvider scopeProvider;

    // Validation rule for theories. For any other element, change the parameter type.
    @Check
    public void checkTheory(Theory theory) {
        // Here you can simply reuse the injected scope provider.
        // getAllReferencesInTheory() is a placeholder for whatever lookup your own scope provider offers.
        scopeProvider.getAllReferencesInTheory();
        // In case of problems, report them using the inherited error/warning methods.
    }
}
The created validation rules are automatically registered and executed (see also the Xtext documentation for details about validation).
I actually solved the problem a few days after I posted the question and then I was too busy to post the solution. Here it comes (for a more detailed description of both the problem and the solution, you are free to read the essay I wrote: RespectX - section 4.5 ).
I created my own IQualifiedNameProvider, called IPrologSimpleNameProvider, which simply returns the 'name' feature.
Then I created my own IContextualScopeProvider, which doesn't extend IScopeProvider. It exposes the following methods:
getContext: given any AST node, it returns the root of the current context, i.e. the EObject whose eContainer is an instance of Theory and which contains the input node within its sub-tree.
getScope: returns an IScope for the context of the input node.
getFilteredScope: applies a type filter to a getScope invocation (e.g. it makes it easy to create a scope containing only Variables).
getFilteredScope: filters a getScope invocation using a predicate.
Of course, the IContextualScopeProvider implementation uses an IPrologSimpleNameProvider implementation, so the validation rule is now quite simple to implement:
Given a variable, it uses the getScope method, which returns an IScope containing all the variables within that context.
It counts how many variables within the IScope have the same name as the current one.
If there are fewer than 2, a warning is reported.
I really hope I explained ^^"
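
The counting step can be sketched without any Xtext types. This is only an illustration under assumptions: the class and method names are hypothetical, and the list of names stands in for the variables an IScope would yield for one context (one rule).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SingletonVariables {
    // Returns the variable names that occur fewer than 2 times within one context.
    // The anonymous variable "_" is never reported.
    static List<String> unusedIn(List<String> variableOccurrences) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String name : variableOccurrences) {
            if (!name.equals("_")) {
                counts.merge(name, 1, Integer::sum);
            }
        }
        List<String> singletons = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() < 2) {
                singletons.add(entry.getKey()); // candidate for a warning
            }
        }
        return singletons;
    }

    public static void main(String[] args) {
        // Variables of the rule: p(X, Y) :- q(X, Z), r(_).
        // prints [Y, Z]
        System.out.println(unusedIn(Arrays.asList("X", "Y", "X", "Z", "_")));
    }
}
```

In a real validator each reported name would become a warning attached to the corresponding Variable node.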

preg_match not redirecting properly

I am using the following code to determine the proper redirect. The top check matches as expected and selects the proper page to load, but the second check does not return a match and I end up being redirected to the default page. I have compared the two lines until I can't see straight. They look identical in form and function, other than the obvious difference in what I am trying to match of course. What am I missing? Is there another way to accomplish my goal that might be a better choice? Thanks as always.
if (preg_match('/\Handicap Summer Foursomes/', $eventname)) {
$form = '<a class="primary" href="signup_HSF.php?eid=' . $EID . '&squads=2">Sign-up</a>';
} else if (preg_match('/\Gigantic 5/', $eventname)) {
$form = '<a class="primary" href="signup_G5.php?eid=' . $EID . '&squads=2">Sign-up</a>';
} else {
$form = '<a class="primary" href="signup.php?eid=' . $EID . '&squads=2">Sign-up</a>';
}
if (preg_match('/\Handicap Summer Foursomes/', $eventname)) {
                 ^^
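In preg, \H is "any character that is not a horizontal whitespace character". It happens to match a literal H, because H is not a horizontal whitespace character, but it is still an escape sequence you need to be aware of: the pattern matches more than you intended.
Ditto for \G in the other pattern. In preg, \G is "first matching position in subject", an assertion that consumes no character, so the pattern then looks for "igantic 5" at the start of the subject. That's definitely not going to match "Gigantic 5". Remove the backslashes and both patterns match the literal text.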
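
The same two escapes exist in java.util.regex, so the effect can be reproduced there. The sketch below uses Java only for illustration; the class and helper names are hypothetical, and the PHP fix itself is simply to write '/Handicap Summer Foursomes/' and '/Gigantic 5/' without the backslashes.

```java
import java.util.regex.Pattern;

final class EscapeDemo {
    // True if the regex matches anywhere in the subject, like preg_match returning 1.
    static boolean found(String regex, String subject) {
        return Pattern.compile(regex).matcher(subject).find();
    }

    public static void main(String[] args) {
        // \H matches any character that is not horizontal whitespace, so 'H' works by accident...
        System.out.println(found("\\Handicap Summer Foursomes", "Handicap Summer Foursomes")); // true
        // ...and so does any other non-blank first character.
        System.out.println(found("\\Handicap Summer Foursomes", "Xandicap Summer Foursomes")); // true
        // \G is a position assertion; the 'G' of the event name is never consumed.
        System.out.println(found("\\Gigantic 5", "Gigantic 5")); // false
        // Without the backslash the literal text matches.
        System.out.println(found("Gigantic 5", "Gigantic 5")); // true
    }
}
```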

CodeIgniter: Disallowed Key Characters

I have the same problem as the people below, but the solutions offered to them do not work for me.
CodeIgniter - disallowed key characters
CodeIgniter Disallowed Key Characters
Disallowed key characters error message in Codeigniter (v2)
I get "Disallowed Key Characters" when I submit a form.
I have CSRF protection enabled, and I am using arrays in my form field names (i.e., search[] as the name as there are multiple selection dropdown options). I have a feeling it is the "[]" in the form name that bothers this form.
I have followed all advice I could see in the posts above.
I disabled CSRF temporarily,
I disabled XSS temporarily,
I edited $config['permitted_uri_chars'] and
I edited Input.php where this message is generated.
Does anybody have any additional ideas about what could cause this problem on form submission?
Thanks!
As in my answer here, you just need to update the regex in MY_Input->_clean_input_keys() to allow more characters (e.g. escaped JSON, or escaped HTML/XML).
Allow just 'English': !preg_match("/^[a-z0-9\:\;\.\,\?\!\@\#\$%\^\*\"\~\'+=\\\ &_\/\.\[\]-\}\{]+$/iu", $str)
Allow Chinese Characters: !preg_match("/^[a-z0-9\x{4e00}-\x{9fa5}\:\;\.\,\?\!\@\#\$%\^\*\"\~\'+=\\\ &_\/\.\[\]-\}\{]+$/iu", $str)
My full working function looks like this:
public function _clean_input_keys($str) {
// NOTE: \x{4e00}-\x{9fa5} = allow Chinese characters
// NOTE: 'i' = case insensitive
// NOTE: 'u' = UTF-8 mode
if (!preg_match("/^[a-z0-9\x{4e00}-\x{9fa5}\:\;\.\,\?\!\@\#\$%\^\*\"\~\'+=\\\ &_\/\.\[\]-\}\{]+$/iu", $str)) {
/**
* Check for development environment: the error is non-descriptive,
* so show me the string that caused the problem
*/
if (is_env_dev()) {
var_dump($str);
}
exit('Disallowed Key Characters.');
}
// Clean UTF-8 if supported
if (UTF8_ENABLED === TRUE) {
return $this->uni->clean_string($str);
}
return $str;
}
my_helper.php
if (!function_exists('is_env_dev')) {
function is_env_dev() {
return (
defined('ENVIRONMENT') && strtolower(ENVIRONMENT) == 'development' ||
defined('ENVIRONMENT') && strtolower(ENVIRONMENT) == 'testing'
);
}
}
Thanks, but I found a comment hidden way below (right at the bottom at the time of this writing) on another post here: CodeIgniter Disallowed Key Characters
The comment suggested that I add $str to the exit() comment to test. This indicated that I had a missing double quote in my form fields. It is a very complex form built up dynamically, with 300 lines of code, so easy to miss.
Hope this answer (and the comment that inspired it) helps someone else.
Validating the source of the output could prevent problems such as this one :-)
Regards

How to extract token information from attribute "elements"?

1. Overall Task:
I want to customize Java.stg to change how tokens are displayed in the comment that precedes the generated code block for each alternative of a grammar rule.
2. Context:
One rule in my current grammar is:
temporal returns [String ret]:
NEXT disj
{$ret= $ret= "X ".concat($disj.ret);}
| EVENTUALLY disj
{$ret= "F ".concat($disj.ret);}
;
The corresponding generated code block (in parser) is as follows:
switch (alt2) {
case 1 :
// RERS.g:26:7: NEXT disj
{
match(input,NEXT,FOLLOW_NEXT_in_temporal204);
pushFollow(FOLLOW_disj_in_temporal206);
disj2=disj();
state._fsp--;
ret = ret = "X ".concat(disj2);
}
break;
case 2 :
// RERS.g:28:7: EVENTUALLY disj
{
match(input,EVENTUALLY,FOLLOW_EVENTUALLY_in_temporal222);
pushFollow(FOLLOW_disj_in_temporal224);
disj3=disj();
state._fsp--;
ret = "F ".concat(disj3);
}
break;
3. My Goal:
Change the comment from format like // RERS.g:26:7: NEXT disj to NEXT_disj, i.e., from <fileName>:<description> to <MyOwnAttribute>
4. Attempt so far:
I tried to modify the template "alt(elements,altNum,description,autoAST,outerAlt,treeLevel,rew)" as follows:
alt(elements,altNum,description,autoAST,outerAlt,treeLevel,rew) ::= <<
/* <elements:ExtractToken()> */
{
<@declarations()>
<elements:element()> // as I understand it, this is just a template expansion that applies the sub-template of each element
<rew>
<@cleanup()>
}
>>
I checked that in this context, the value of attribute elements is something like {el=/tokenRef(), line=26, pos=7}{el=/ruleRef(), line=26, pos=12}{el=/execAction(), line=27, pos=7}.
I think I should "overload" the "tokenRef" template to spit out tokens formatted like "NEXT_disj"
5. Questions:
How do I "overload" an existing template? I want to do that because otherwise I will have to modify the value of "elements".
How can I apply a template only to a specific element in the attribute "elements", instead of applying it to every element (as the template "element()" does)?
I think there should be some convenient way to achieve my goal. Any suggestions?
Thanks in advance.

CodeIgniter validation for exact matching of a string

I want to set up my validation rules in CodeIgniter so that a field must start with the character 'P' or 'S'; otherwise it is invalid. How can I do that using the CodeIgniter validation library?
Test Case 1: input: A145874 ------- invalid Must start with P or S
Test Case 2: input: P258741 ------- valid
Test Case 3: input: P45KK91 ------- invalid Must not contain Letters in other positions rather the first one.
Test Case 4: input: S457821 ------- valid
You would need to write a custom validation rule. Something like this:
public function check_first_char($str) {
$first_char = substr($str, 0, 1);
if ($first_char != 'P' && $first_char != 'S') { // invalid only when it is neither P nor S
$this->form_validation->set_message('check_first_char', 'The %s field must begin with P or S!');
return FALSE;
} else {
return TRUE;
}
}
Then you would add that validation rule like this:
$this->form_validation->set_rules('field_name', 'Field Name', 'callback_check_first_char');
The documentation explains it all pretty clearly.
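
The callback above only inspects the first character, while Test Case 3 in the question also rejects letters in later positions. A single anchored pattern covers both conditions. The sketch below shows the check in Java for illustration only; the class and method names are hypothetical, and the assumption is that every character after the first must be an ASCII digit. The same pattern could be used inside the PHP callback as preg_match('/^[PS][0-9]+$/', $str).

```java
import java.util.regex.Pattern;

final class CodeCheck {
    // First character P or S, then one or more digits and nothing else.
    private static final Pattern VALID = Pattern.compile("[PS][0-9]+");

    static boolean isValid(String input) {
        // matches() anchors the pattern to the whole input
        return input != null && VALID.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("A145874")); // false: must start with P or S
        System.out.println(isValid("P258741")); // true
        System.out.println(isValid("P45KK91")); // false: letters after the first position
        System.out.println(isValid("S457821")); // true
    }
}
```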
