GATE - JAPE rule nested ‘contains’ operators - correct syntax

I am getting errors when I try to create ‘Sentence contains’ JAPE rules with OR operators, i.e. when a Sentence contains (1 OR 2) AND (3 OR 4):
(
{
Sentence contains { Annotation1 | Annotation2 },
Sentence contains { Annotation3 | Annotation4 }
}
)
:temp
-->
Can someone please advise on the correct syntax?

There is no AND operator in the LHS of a JAPE grammar, and we cannot use the OR operator inside the contextual operators, i.e. contains and within. Instead, you can write it like this:
(
({Sentence contains {Annotation1}} | {Sentence contains {Annotation2}})
({Sentence contains {Annotation3}} | {Sentence contains {Annotation4}})
)
:temp
-->
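Outside JAPE, the condition from the question is easy to state precisely. The Go sketch below is only an illustration, not GATE code: Span, contains, containsAny and matches are hypothetical names, and JAPE's contains is approximated as offset containment. It checks both OR groups against one and the same sentence.

```go
package main

import "fmt"

// Span is a hypothetical stand-in for a GATE annotation: a type name plus
// start and end offsets. It is not part of any GATE API.
type Span struct {
	Type       string
	Start, End int
}

// contains approximates JAPE's "contains": inner lies within outer's offsets.
func contains(outer, inner Span) bool {
	return outer.Start <= inner.Start && inner.End <= outer.End
}

// containsAny is the OR part: does sentence cover at least one annotation
// whose type is one of types?
func containsAny(sentence Span, anns []Span, types ...string) bool {
	for _, a := range anns {
		for _, t := range types {
			if a.Type == t && contains(sentence, a) {
				return true
			}
		}
	}
	return false
}

// matches is the AND of the two OR groups, applied to the same sentence.
func matches(sentence Span, anns []Span) bool {
	return containsAny(sentence, anns, "Annotation1", "Annotation2") &&
		containsAny(sentence, anns, "Annotation3", "Annotation4")
}

func main() {
	sentence := Span{"Sentence", 0, 50}
	anns := []Span{{"Annotation2", 5, 10}, {"Annotation3", 20, 25}}
	fmt.Println(matches(sentence, anns))     // true
	fmt.Println(matches(sentence, anns[:1])) // false: no Annotation3 or Annotation4
}
```

Distributing the AND over the ORs gives four alternatives (1 and 3, 1 and 4, 2 and 3, 2 and 4); evaluating each OR group separately, as here, avoids spelling them out.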


Why can't || be used in pattern matching?

In OCaml, when I do pattern matching, I can't do the following:
let rec example = function
| ... -> ...
| ... || ... -> ... (* here I get a syntax error because I use ||*)
Instead I need to do:
let rec example1 = function
|... -> ...
|... | ... -> ...
I know that || means or in OCaml, but why do we need to use only one 'pipe' : | to specify 'or' in pattern matching?
Why doesn't the usual || work?
|| doesn't really mean "or" in general; it means "boolean or", or rather, it is the boolean or operator. Operators operate on values resulting from the evaluation of expressions, their operands. Operators and operands together also form expressions, which can then be used as operands of other operators to form further expressions, and so on.
Pattern matching, on the other hand, evaluates patterns, which are neither booleans nor expressions. Although patterns do, in a sense, evaluate to true or false when applied to (or rather, matched against) a value, they do not evaluate to anything on their own. In that sense they are more like operators than operands. Furthermore, the result of matching against a pattern is not just a boolean value but also a set of bindings.
Using || instead of | with patterns would overload its meaning and, I think, serve more to confuse than to clarify.

What is the empty statement in Golang?

In Python we can use the pass statement as a placeholder.
What is the equivalent statement in Go?
Is it ; or something else?
The Go Programming Language Specification
Empty statements
The empty statement does nothing.
EmptyStmt = .
Notation
The syntax is specified using Extended Backus-Naur Form (EBNF):
Production = production_name "=" [ Expression ] "." .
Expression = Alternative { "|" Alternative } .
Alternative = Term { Term } .
Term = production_name | token [ "…" token ] | Group | Option | Repetition .
Group = "(" Expression ")" .
Option = "[" Expression "]" .
Repetition = "{" Expression "}" .
Productions are expressions constructed from terms and the following operators, in increasing precedence:
| alternation
() grouping
[] option (0 or 1 times)
{} repetition (0 to n times)
Lower-case production names are used to identify lexical tokens. Non-terminals are in CamelCase. Lexical tokens are enclosed in double quotes "" or back quotes ``.
The form a … b represents the set of characters from a through b as alternatives. The horizontal ellipsis … is also used elsewhere in the spec to informally denote various enumerations or code snippets that are not further specified. The character … (as opposed to the three characters ...) is not a token of the Go language.
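The empty statement consists of nothing at all: in EBNF (Extended Backus–Naur Form) its production is EmptyStmt = . , whose right-hand side is empty, so it matches the empty string. Go therefore has no pass keyword; where Python needs pass, in Go you simply leave the block empty.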
For example,
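package main

import "fmt"

func main() {
	for i := 0; i < 3; i++ { // empty loop body
	}
	var no bool // a type is required: plain `var no` does not compile
	if true {
	} else {
		no = true
	}
	fmt.Println(no)
}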

Conflicts in ocamlyacc

I am trying to write a parser with ocamlyacc for a simple language that recognizes integer and float expressions. However, I want to introduce the possibility of having variables, so I defined the token VAR in my lexer.mll file, which allows it to be any alphanumeric string starting with a capital letter.
expr:
| INT { $1 }
| VAR { (* some action *) }
| expr PLUS expr { $1 + $3 }
| expr MINUS expr { $1 - $3 }
/* similar rules, defined differently, follow for real expressions */
Now I have a similar definition for real numbers. However, when I compile this file with ocamlyacc, I get 2 reduce/reduce conflicts: if I just enter an arbitrary string (recognized as the token VAR), the parser cannot tell whether it is an integer or a real variable, because VAR appears in the definitions of both int and real expressions in my grammar.
Var + 12 /*means that Var has to be an integer variable*/
Var /*Is a valid expression according to my grammar but can be of any type*/
How do I eliminate this reduce/reduce conflict without losing the generality of variable declarations and while keeping both data types available to me?

ANTLR 3 - how do I make unique tokens with NOT across special chars

I have a short question:
// Lexer
LOOP_NAME : (LETTER|DIGIT)+;
OTHERCHARS : ~('>' | '}')+;
LETTER : ('A'..'Z')|('a'..'z');
DIGIT : ('0'..'9');
A_ELEMENT
: (LETTER|'_')*(LETTER|DIGIT|'_'|'.');
// Parser configuration
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;
My problem is that this is impossible due to:
As a result, alternative(s) 2 were disabled for that input [14:55:32]
error(208): ltxt2.g:61:1: The following token definitions can never be
matched because prior tokens match the same input:
LETTER,DIGIT,A_ELEMENT,WS
My issue is that I also need to catch UTF-8 characters with OTHERCHARS, and I cannot put all special UTF-8 characters into a lexer rule, since I cannot specify a range like ("!".."?").
So I need the NOT (~). OTHERCHARS here can be anything except ">" or "}". These two close a literal context and are forbidden inside it.
Such cases don't seem to be documented very well, so I'd be happy if someone knew a workaround. The NOT operator here creates the ambiguity I need to resolve.
Thanks in advance.
Best,
wishi
Move OTHERCHARS to the very end of the lexer and define it like this:
OTHERCHARS : . ;
In the Java target, this will match a single UTF-16 code unit (one Java char) that is not matched by a previous rule. I typically name the rule ANY_CHAR and treat it as a fall-back. By using . instead of .+, the lexer will only use this rule if no other rule matches:
1. If another rule matches more than one character, that rule has priority over ANY_CHAR because it matches a larger number of characters from the input.
2. If another rule matches exactly one character, that rule has priority over ANY_CHAR because it appears earlier in the grammar.
Edit: To exclude } and > from the ANY_CHAR rule, you'll want to create rules for them so they are covered under point 2.
RBRACE : '}' ;
GT : '>' ;
ANY_CHAR : . ;

ANTLR parse problem

I need to be able to match a certain string ('[', then any number of equals signs or none, then '['), and then I need to match the corresponding close bracket (']', then the same number of equals signs, then ']') after some other match rules ((options{greedy=false;}:.)* if you must know). I have no clue how to do this in ANTLR. How can I do it?
An example: I need to match [===[whatever arbitrary text ]===] but not [===[whatever arbitrary text ]==].
I need to do it for an arbitrary number of equals signs as well, so therein lies the problem: how do I get it to match the same number of equals signs in the opener as in the closer? The parser rules supplied so far don't seem to help.
You can't easily write a lexer rule for it; you need parser rules. Two rules should be sufficient: one responsible for matching the brackets, one for matching the equals signs.
Something like this:
braces : '[' ']'
| '[' equals ']'
;
equals : '=' equals '='
| '=' braces '='
;
This should cover the use case you described. I'm not absolutely sure, but you may have to use a predicate in the first alternative of 'equals' to avoid ambiguous interpretations.
Edit:
It is hard to integrate your non-greedy rule and at the same time avoid a lexer context switch or something similar (hard in ANTLR). But if you are willing to embed a little bit of Java in your grammar, you can write a lexer rule.
The following example grammar shows how:
grammar TestLexer;
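SPECIAL
    : '[' { int counter = 0; }
      ('=' { counter++; })+      // count the '=' of the opener; '+' requires at least one
      '[' (options {greedy=false;} : .)*
      ']' ('=' { counter--; })+  // count them down again in the closer
      { if (counter != 0) throw new RecognitionException(input); }
      ']'
    ;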
ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
;
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;
rule : ID
| SPECIAL
;
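The counter idea behind the SPECIAL rule does not depend on ANTLR. As a plain-code comparison, here is a minimal Go sketch (matchLongBracket is a hypothetical helper, not ANTLR-generated code) that counts the '=' in the opener, builds the matching closer, and searches for it. It also accepts zero '=', which the rule above does not because of ('=')+, and it keeps scanning past closers whose count differs.

```go
package main

import (
	"fmt"
	"strings"
)

// matchLongBracket is a hypothetical helper. If s starts with an opener
// '[' '='*n '[', it looks for the first closer ']' '='*n ']' with the same n
// and returns the text in between plus the number of bytes consumed.
// Closers with a different number of '=' are treated as ordinary body text.
func matchLongBracket(s string) (body string, length int, ok bool) {
	if !strings.HasPrefix(s, "[") {
		return "", 0, false
	}
	i := 1
	for i < len(s) && s[i] == '=' { // count the '=' of the opener
		i++
	}
	if i >= len(s) || s[i] != '[' {
		return "", 0, false
	}
	level := i - 1
	closer := "]" + strings.Repeat("=", level) + "]"
	start := i + 1
	end := strings.Index(s[start:], closer)
	if end < 0 {
		return "", 0, false // no closer with the same count
	}
	return s[start : start+end], start + end + len(closer), true
}

func main() {
	for _, in := range []string{
		"[===[whatever arbitrary text ]===]",
		"[===[whatever arbitrary text ]==]",
		"[=[a]==]b]=]",
	} {
		body, n, ok := matchLongBracket(in)
		fmt.Printf("%q: ok=%v body=%q consumed=%d\n", in, ok, body, n)
	}
}
```

For the question's two examples, the first input matches with body "whatever arbitrary text " and the second does not, because "]==]" never closes a level-3 opener.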
Your tags mention lexing, but your question itself doesn't. What you're trying to do is non-regular, so I don't think it can be done as part of lexing (though I don't remember if ANTLR's lexer is strictly regular -- it's been a couple of years since I last used ANTLR).
What you describe should be possible in parsing, however. Here's the grammar for what you described:
thingy : LBRACKET middle RBRACKET;
middle : EQUAL middle EQUAL
| LBRACKET RBRACKET;
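The thingy/middle grammar translates directly into a recursive-descent parser, where the call stack does the counting that a regular lexer cannot. In this Go sketch each character stands for one token ('[' for LBRACKET, ']' for RBRACKET, '=' for EQUAL); parser, accept and parses are hypothetical names. Like the grammar, it accepts only the bracket skeleton, with no text in between.

```go
package main

import "fmt"

// parser is a hypothetical recursive-descent parser over single-character
// tokens '[', ']' and '='.
type parser struct {
	toks string
	pos  int
}

// accept consumes c if it is the next token.
func (p *parser) accept(c byte) bool {
	if p.pos < len(p.toks) && p.toks[p.pos] == c {
		p.pos++
		return true
	}
	return false
}

// thingy : LBRACKET middle RBRACKET ;
func (p *parser) thingy() bool {
	return p.accept('[') && p.middle() && p.accept(']')
}

// middle : EQUAL middle EQUAL | LBRACKET RBRACKET ;
// Each recursive call consumes one '=' on each side, so the nesting depth
// plays the role of the counter.
func (p *parser) middle() bool {
	if p.accept('=') {
		return p.middle() && p.accept('=')
	}
	return p.accept('[') && p.accept(']')
}

// parses reports whether the whole input is a thingy.
func parses(s string) bool {
	p := &parser{toks: s}
	return p.thingy() && p.pos == len(s)
}

func main() {
	fmt.Println(parses("[==[]==]")) // true
	fmt.Println(parses("[==[]=]"))  // false
}
```

The two alternatives of middle start with different tokens ('=' versus '['), so one token of lookahead is enough and no backtracking is needed.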
