In Python we can use the pass statement as a placeholder.
What is the equivalent statement in Golang?
A ; or something else?
The Go Programming Language Specification
Empty statements
The empty statement does nothing.
EmptyStmt = .
Notation
The syntax is specified using Extended Backus-Naur Form (EBNF):
Production = production_name "=" [ Expression ] "." .
Expression = Alternative { "|" Alternative } .
Alternative = Term { Term } .
Term = production_name | token [ "…" token ] | Group | Option | Repetition .
Group = "(" Expression ")" .
Option = "[" Expression "]" .
Repetition = "{" Expression "}" .
Productions are expressions constructed from terms and the following
operators, in increasing precedence:
| alternation
() grouping
[] option (0 or 1 times)
{} repetition (0 to n times)
Lower-case production names are used to identify lexical tokens.
Non-terminals are in CamelCase. Lexical tokens are enclosed in double
quotes "" or back quotes ``.
The form a … b represents the set of characters from a through b as
alternatives. The horizontal ellipsis … is also used elsewhere in the
spec to informally denote various enumerations or code snippets that
are not further specified. The character … (as opposed to the three
characters ...) is not a token of the Go language.
The empty statement is literally empty: in EBNF (Extended Backus–Naur Form) its production is EmptyStmt = . , with nothing between = and ., so it matches the empty string.
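For example, an infinite loop whose body is empty:
for {
}
or an if statement whose first branch is empty (inside a function body):
var no bool
if true {
} else {
	no = true
}
_ = no // use no; a variable that is only assigned is rejected as "declared and not used"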
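In practice you rarely write the empty statement explicitly: an empty block {} is what stands in for Python's pass, whether for a stub function or for a branch you deliberately leave blank. A minimal sketch; the function names placeholder and countNonNegative are made up for illustration:

```go
package main

import "fmt"

// placeholder is a stub with an empty body; like a Python function whose
// body is only `pass`, it compiles and does nothing.
func placeholder() {}

// countNonNegative deliberately leaves the if-branch empty, the way you
// might write `if x < 0: pass` in Python.
func countNonNegative(xs []int) int {
	n := 0
	for _, x := range xs {
		if x < 0 {
			// intentionally empty: nothing to do for negative values
		} else {
			n++
		}
	}
	return n
}

func main() {
	placeholder()
	fmt.Println(countNonNegative([]int{3, -1, 4, -1, 5})) // prints 3
}
```

Linters may suggest inverting such a condition instead, but the compiler accepts the empty branch as written.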
Here are some examples from the HTTP/1.1 specification (RFC 2616):
Transfer-Encoding = "Transfer-Encoding" ":" 1#transfer-coding
Upgrade = "Upgrade" ":" 1#product
Server = "Server" ":" 1*( product | comment )
delta-seconds = 1*DIGIT
Via = "Via" ":" 1#( received-protocol received-by [ comment ] )
chunk-extension= *( ";" chunk-ext-name [ "=" chunk-ext-val ] )
http_URL = "http:" "//" host [ ":" port ] [ abs_path [ "?" query ]]
date3 = month SP ( 2DIGIT | ( SP 1DIGIT ))
Questions are:
What does the 1# in 1#transfer-coding mean? Same with 1#product.
What does 1* mean, as in 1*( product | comment )? Or 1*DIGIT.
What do the square brackets mean, as in [ comment ]? The parentheses (...) group it all, but what about the [...]?
What does *(...) mean, as in *( ";" chunk-ext-name [ "=" chunk-ext-val ] )?
What do nested square brackets mean, as in [ abs_path [ "?" query ]]? Nested optional values? That doesn't make sense to me.
What do 2DIGIT and 1DIGIT mean, and where are they defined?
I may have missed where these are defined, but knowing them would help me understand how to parse the grammar definitions used in the RFCs.
I get the rest of the grammar notation, just not these few remaining pieces.
Update: This looks like a good start:
Square brackets enclose an optional element sequence:
[foo bar]
is equivalent to
*1(foo bar).
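Applied to the nested case from the question, [ abs_path [ "?" query ]] makes the whole group optional and, inside it, makes the query optional too, so a query may only appear after a path. A minimal Go sketch that translates each [...] (that is, *1(...)) into a regular-expression group followed by ?; the character classes are simplified stand-ins, not the RFC's real abs_path and query rules, and the name pathAndQuery is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// ABNF: [ abs_path [ "?" query ] ]
// Each [...] is *1(...), i.e. an optional group, written (?:...)? below.
// Simplified assumption: abs_path is "/" followed by anything except '?' or
// '#', and query is anything except '#'.
var pathAndQuery = regexp.MustCompile(`^(?:/[^?#]*(?:\?[^#]*)?)?$`)

func main() {
	for _, s := range []string{"", "/index.html", "/search?q=go", "?q=go"} {
		// The last input fails: a query without a preceding path is not allowed.
		fmt.Printf("%q -> %v\n", s, pathAndQuery.MatchString(s))
	}
}
```

The nesting is exactly what forbids "?q=go" on its own: the inner option only exists inside the outer one.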
Specific Repetition: nRule
A rule of the form:
<n>element
is equivalent to
<n>*<n>element
That is, exactly <n> occurrences of <element>. Thus, 2DIGIT is a
2-digit number, and 3ALPHA is a string of three alphabetic
characters.
Variable Repetition: *Rule
The operator "*" preceding an element indicates repetition. The full
form is:
<a>*<b>element
where <a> and <b> are optional decimal values, indicating at least
<a> and at most <b> occurrences of the element.
Default values are 0 and infinity so that *<element> allows any
number, including zero; 1*<element> requires at least one;
3*3<element> allows exactly 3; and 1*2<element> allows one or two.
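Read this way, 1*2DIGIT means one or two digits and 2*4DIGIT means two to four digits. A minimal Go sketch that maps the <a>*<b> (and <n>) prefix onto a regular-expression quantifier, assuming DIGIT is the ASCII range 0-9 as in the ABNF core rules; the helpers abnfRepeat and matchDigits are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// abnfRepeat turns an ABNF repetition prefix ("*", "1*", "3*3", "1*2", "2")
// into a regexp quantifier. As quoted above, <a> defaults to 0, <b> to
// infinity, and a bare <n> means exactly n (nRule is n*nRule).
// Hypothetical helper: it does not validate its input.
func abnfRepeat(prefix string) string {
	a, b, found := strings.Cut(prefix, "*")
	if !found {
		return "{" + prefix + "}" // specific repetition
	}
	if a == "" {
		a = "0"
	}
	if b == "" {
		return "{" + a + ",}" // no upper bound
	}
	return "{" + a + "," + b + "}"
}

// matchDigits reports whether s matches <prefix>DIGIT exactly.
func matchDigits(prefix, s string) bool {
	return regexp.MustCompile(`^[0-9]` + abnfRepeat(prefix) + `$`).MatchString(s)
}

func main() {
	fmt.Println(matchDigits("1*2", "7"), matchDigits("1*2", "123"))  // true false
	fmt.Println(matchDigits("2*4", "1"), matchDigits("2*4", "2024")) // false true
	fmt.Println(matchDigits("2", "05"), matchDigits("*", ""))        // true true
}
```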
But I'm still missing what the # means.
Update 2: I think I found it!
#RULE: LISTS
A construct "#" is defined, similar to "*", as follows:
<l>#<m>element
indicating at least <l> and at most <m> elements, each separated
by one or more commas (","). This makes the usual form of lists
very easy; a rule such as '(element *("," element))' can be shown
as "1#element".
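A minimal Go sketch of splitting a 1#element list such as a Transfer-Encoding value. It assumes the RFC 2616 reading of #rule, where optional whitespace around the commas is allowed and empty (null) elements are permitted but not counted; the helper parseList is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// parseList splits an HTTP-style <min>#<max> list into its elements.
// Whitespace around elements is ignored, and empty elements such as the gap
// in "a, , b" are dropped and not counted. max < 0 means "no upper bound".
func parseList(s string, min, max int) ([]string, error) {
	var elems []string
	for _, part := range strings.Split(s, ",") {
		if e := strings.TrimSpace(part); e != "" {
			elems = append(elems, e)
		}
	}
	if len(elems) < min || (max >= 0 && len(elems) > max) {
		return nil, fmt.Errorf("list has %d elements, outside the allowed range", len(elems))
	}
	return elems, nil
}

func main() {
	// Transfer-Encoding = "Transfer-Encoding" ":" 1#transfer-coding
	codings, err := parseList("gzip, chunked", 1, -1)
	fmt.Println(codings, err) // [gzip chunked] <nil>

	// 1# rejects a list that contains only null elements.
	_, err = parseList(" , ", 1, -1)
	fmt.Println(err != nil) // true
}
```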
Also, what do these mean?
1*2DIGIT
2*4DIGIT
I want to parse template strings:
`Some text ${variable.name} and so on ... ${otherVariable.function(parameter)} ...`
Here is my grammar:
varname: VAR ;
variable: varname funParameter? ('.' variable)* ;
templateString: '`' (TemplateStringLiteral* '${' variable '}' TemplateStringLiteral*)+ '`' ;
funParameter: '(' variable? (',' variable)* ')' ;
WS : [ \t\r\n\u000C]+ -> skip ;
TemplateStringLiteral: ('\\`' | ~'`') ;
VAR : [$]?[a-zA-Z0-9_]+|[$] ;
When the input is parsed, the template string no longer contains any whitespace because of WS -> skip. When I put TemplateStringLiteral before WS, I get the error:
extraneous input ' ' expecting {'`'}
How can I keep whitespace, rather than skipping it, only inside the template string?
What is currently happening
Running your example through your current grammar and displaying the generated tokens, the lexer produces this:
[#0,0:0='`',<'`'>,1:0]
[#1,1:4='Some',<VAR>,1:1]
[#2,6:9='text',<VAR>,1:6]
[#3,11:12='${',<'${'>,1:11]
[#4,13:20='variable',<VAR>,1:13]
[#5,21:21='.',<'.'>,1:21]
[#6,22:25='name',<VAR>,1:22]
[#7,26:26='}',<'}'>,1:26]
... shortened ...
[#26,85:84='<EOF>',<EOF>,2:0]
This shows that Some, which you intended to be matched by TemplateStringLiteral*, was actually lexed as VAR. Why is this happening?
As mentioned in this answer, ANTLR uses the longest possible match to create a token. Since your TemplateStringLiteral rule only matches single characters, while your VAR rule matches arbitrarily long runs of characters, the lexer uses the latter to match Some.
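The longest-match (maximal munch) behaviour can be reproduced outside ANTLR. The following Go sketch is an illustration only, not how ANTLR is implemented: it hand-translates the question's TemplateStringLiteral and VAR rules into anchored regular expressions (leaving out WS and the literal tokens), tries every rule at the current position and keeps the longest match, letting the earlier rule win a tie. The names lexRule, lexRules and nextToken are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// lexRule is one lexer rule: a token name plus a pattern anchored with ^.
type lexRule struct {
	name string
	re   *regexp.Regexp
}

// lexRules: TemplateStringLiteral is listed first on purpose, because
// rule order only matters when two matches have the same length.
var lexRules = []lexRule{
	{"TemplateStringLiteral", regexp.MustCompile("^(?:\\\\`|[^`])")},
	{"VAR", regexp.MustCompile(`^(?:[$]?[a-zA-Z0-9_]+|[$])`)},
}

// nextToken tries every rule at the start of input and keeps the longest
// match; on a tie the earlier rule wins.
func nextToken(input string) (name, text string) {
	for _, r := range lexRules {
		if m := r.re.FindString(input); len(m) > len(text) {
			name, text = r.name, m
		}
	}
	return name, text
}

func main() {
	input := "Some text"
	for input != "" {
		name, text := nextToken(input)
		if text == "" {
			break // no rule matches here
		}
		fmt.Printf("%s %q\n", name, text)
		input = input[len(text):]
	}
}
```

Running it prints VAR "Some", then TemplateStringLiteral " ", then VAR "text": the four-character VAR match beats the one-character literal even though the literal rule comes first.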
What you could try (Spoiler: won't work)
You could try to modify the rule like this:
TemplateStringLiteral: ('\\`' | ~'`')+ ;
so that it captures more than one character and is therefore preferred. This does not work, for two reasons:
How would the lexer ever match anything to the VAR rule?
The TemplateStringLiteral rule now also matches ${, which prevents the lexer from recognizing the start of a template chunk.
How to achieve what you actually want
There might be another solution, but this one works:
File MartinCup.g4:
parser grammar MartinCup;
options { tokenVocab=MartinCupLexer; }
templateString
: BackTick TemplateStringLiteral* (template TemplateStringLiteral*)+ BackTick
;
template
: TemplateStart variable TemplateEnd
;
variable
: varname funParameter? (Dot variable)*
;
varname
: VAR
;
funParameter
: OpenPar variable? (Comma variable)* ClosedPar
;
File MartinCupLexer.g4:
lexer grammar MartinCupLexer;
BackTick : '`' ;
TemplateStart
: '${' -> pushMode(templateMode)
;
TemplateStringLiteral
: '\\`'
| ~'`'
;
mode templateMode;
VAR
: [$]?[a-zA-Z0-9_]+
| [$]
;
OpenPar : '(' ;
ClosedPar : ')' ;
Comma : ',' ;
Dot : '.' ;
TemplateEnd
: '}' -> popMode;
This grammar uses lexer modes to differentiate between the inside and the outside of the curly braces. The VAR rule is now only active after ${ has been encountered and only stays active until } is read. It thereby does not catch non-template text like Some.
Notice that the use of lexer modes requires a split grammar (separate files for parser and lexer grammars). Since no lexer rules are allowed in a parser grammar, I had to introduce tokens for the parentheses, comma, dot and backticks.
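To make the effect of the two modes concrete, here is a hand-written Go approximation of MartinCupLexer. It is a sketch only (ANTLR generates very different code): the names Token, singleChar, isVarChar and lexTemplate are made up, it handles ASCII input only, there is no error recovery, and a boolean stands in for ANTLR's mode stack because templates do not nest here.

```go
package main

import (
	"fmt"
	"strings"
)

// Token pairs a token type (named after the MartinCupLexer rules) with its text.
type Token struct {
	Type, Text string
}

// singleChar maps the one-character tokens of templateMode to their names.
var singleChar = map[byte]string{
	'.': "Dot", '(': "OpenPar", ')': "ClosedPar", ',': "Comma", '}': "TemplateEnd",
}

func isVarChar(c byte) bool {
	return c == '_' || c >= '0' && c <= '9' || c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z'
}

// lexTemplate: in the default mode every character, whitespace included, is a
// one-character TemplateStringLiteral (or the escape \` as one token); "${"
// enters template mode (like pushMode), where VAR and the punctuation tokens
// are recognised, and "}" leaves it again (like popMode).
func lexTemplate(s string) []Token {
	var toks []Token
	inTemplate := false
	for i := 0; i < len(s); {
		if !inTemplate {
			switch {
			case strings.HasPrefix(s[i:], "${"):
				toks = append(toks, Token{"TemplateStart", "${"})
				inTemplate = true
				i += 2
			case strings.HasPrefix(s[i:], "\\`"):
				toks = append(toks, Token{"TemplateStringLiteral", "\\`"})
				i += 2
			case s[i] == '`':
				toks = append(toks, Token{"BackTick", "`"})
				i++
			default:
				toks = append(toks, Token{"TemplateStringLiteral", s[i : i+1]})
				i++
			}
			continue
		}
		// Template mode: VAR is [$]?[a-zA-Z0-9_]+ | [$]
		j := i
		if s[j] == '$' {
			j++
		}
		for j < len(s) && isVarChar(s[j]) {
			j++
		}
		if j > i {
			toks = append(toks, Token{"VAR", s[i:j]})
			i = j
			continue
		}
		typ, ok := singleChar[s[i]]
		if !ok {
			typ = "Unrecognized" // e.g. a space: templateMode has no rule for it
		}
		if typ == "TemplateEnd" {
			inTemplate = false
		}
		toks = append(toks, Token{typ, s[i : i+1]})
		i++
	}
	return toks
}

func main() {
	for _, t := range lexTemplate("`a ${x.f(y)} b`") {
		fmt.Printf("%s %q\n", t.Type, t.Text)
	}
}
```

The spaces around the template come out as TemplateStringLiteral tokens, while x, f and y inside ${...} become VAR tokens, which is the split the two modes are meant to achieve.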
About the whitespaces
I assume you want to keep whitespace inside the "normal text" but not allow it inside the templates, so I simply removed the WS rule. You can always re-add it if you like.
I tested your alternative grammar, where you put TemplateStringLiteral above WS, but contrary to your observation, this gives me:
line 1:1 extraneous input 'Some' expecting {'${', TemplateStringLiteral}
The reason for this is the same as above, Some is lexed to VAR.
Why is there a semicolon at the end of Proc.num_stack_slots.(i) <- 0 in the following code?
I thought semicolons are separators in OCaml. Can we always put an optional semicolon after the last expression of a block?
for i = 0 to Proc.num_register_classes - 1 do
Proc.num_stack_slots.(i) <- 0;
done;
See https://github.com/def-lkb/ocaml-tyr/blob/master/asmcomp/coloring.ml line 273 for the complete example.
There is no need for a semicolon after this expression, but it is allowed here as a syntactic courtesy. In the example you referenced, there is a semicolon because a second expression follows it.
Essentially, you can view a semicolon as a binary operator that takes two expressions, evaluates them from left to right, and returns the value of the second. For two unit expressions it would have the type
val (;): unit -> unit -> unit
With that in mind, the following example is easier to understand:
for i = 1 to 5 do
  Printf.printf "Hello, ";
  Printf.printf "world\n"
done
Here ; works just as glue. You may also put a ; after the second expression, but that is only syntactic sugar, a courtesy from the compiler developers.
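The "semicolon as a binary operator" view can be mimicked in Go with a small generic function. This is only an analogy: seq is a hypothetical name, and in OCaml ; is syntax rather than a function you could define or pass around.

```go
package main

import "fmt"

// seq mimics OCaml's sequence expression `e1; e2`: it runs first purely for
// its side effects, then evaluates second and returns that value. (In OCaml,
// e1 should have type unit, and the whole sequence has the type of e2.)
func seq[T any](first func(), second func() T) T {
	first()
	return second()
}

func main() {
	// Roughly the OCaml expression: print_string "Hello, "; print_endline "world"; 25
	result := seq(
		func() { fmt.Print("Hello, ") },
		func() int {
			fmt.Println("world")
			return 25
		},
	)
	fmt.Println(result) // 25
}
```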
If you open the parser definition of the OCaml compiler (parser.mly), you will see that an expression inside a seq_expr can be terminated by a semicolon:
seq_expr:
| expr %prec below_SEMI { $1 }
| expr SEMI { reloc_exp $1 }
| expr SEMI seq_expr { mkexp(Pexp_sequence($1, $3)) }
That means you can even write strange code like this:
let x = 2 in x; let y = 3 in y; 25
In PHP you need to use preg_quote() to escape all the characters in a string that have a particular meaning in a regular expression, to allow (for example) preg_match() to search for those special characters.
What is the equivalent in Ruby of the following code?
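// The content of this variable comes from user input, for example.
// Single quotes keep PHP from interpolating $var.
$search = '$var = 100';
// $subject (hypothetical name) holds the text being searched.
if (preg_match('/' . preg_quote($search, '/') . ';/i', $subject)) {
// …
}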
You want Regexp.escape.
str = "[...]"
re = /#{Regexp.escape(str)}/
"la[...]la[...]la".gsub(re,"") #=> "lalala"
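Go's standard library provides the same facility as regexp.QuoteMeta. A minimal sketch reproducing both the Ruby and the PHP example; the helper names stripLiteral and hasStatement are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// stripLiteral removes every occurrence of lit from s, treating lit as plain
// text rather than as a pattern (the Ruby gsub example).
func stripLiteral(s, lit string) string {
	re := regexp.MustCompile(regexp.QuoteMeta(lit))
	return re.ReplaceAllString(s, "")
}

// hasStatement reports whether subject contains search followed by ";",
// ignoring case (the PHP preg_match example with the /i flag).
func hasStatement(subject, search string) bool {
	re := regexp.MustCompile("(?i)" + regexp.QuoteMeta(search) + ";")
	return re.MatchString(subject)
}

func main() {
	fmt.Println(stripLiteral("la[...]la[...]la", "[...]"))  // lalala
	fmt.Println(hasStatement("$VAR = 100;", "$var = 100")) // true
}
```

Without QuoteMeta, the $ in "$var" would be an end-of-text anchor and [...] a character class, so neither pattern would mean what the user typed.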
I need to be able to match a certain string ('[', then any number of equals signs or none, then '['), and then, after some other match rules ((options{greedy=false;}:.)*, if you must know), match the corresponding closing bracket (']', then the same number of equals signs, then ']'). I have no clue how to do this in ANTLR. How can I do it?
An example: I need to match [===[whatever arbitrary text ]===] but not [===[whatever arbitrary text ]==].
I need to do this for an arbitrary number of equals signs, and therein lies the problem: how do I make it match the same number of equals signs in the closing bracket as in the opening one? The parser rules suggested so far don't seem to help.
You can't easily write a lexer rule for it; you need parser rules. Two rules should be sufficient: one for matching the brackets and one for matching the equals signs.
Something like this:
braces : '[' ']'
| '[' equals ']'
;
equals : '=' equals '='
| '=' braces '='
;
This should cover the use case you described. I'm not absolutely sure, but you may have to use a predicate in the first alternative of equals to avoid ambiguous interpretations.
Edit:
Integrating your non-greedy rule is hard without a lexer context switch or something similar, and those are awkward in ANTLR. But if you are willing to put a little Java into your grammar, you can write a lexer rule.
The following example grammar shows how:
grammar TestLexer;
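SPECIAL
    : '[' { int counter = 0; }        // opening '[', start counting
      ('=' { counter++; } )+          // count the opening '=' signs (at least one)
      '['
      (options{greedy=false;}:.)*     // arbitrary text, matched non-greedily
      ']'
      ('=' { counter--; } )+          // one decrement per closing '='
      { if(counter != 0) throw new RecognitionException(input); } // counts must agree
      ']'
    ;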
ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
;
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;
rule : ID
| SPECIAL
;
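The counting idea behind SPECIAL can be checked outside ANTLR. A minimal Go sketch (matchLongBracket is a hypothetical helper, not ANTLR code): count the equals signs in the opening bracket, build the matching closing bracket from that count, and search for its first occurrence. Unlike the rule above, it also accepts zero equals signs, as the question requires.

```go
package main

import (
	"fmt"
	"strings"
)

// matchLongBracket checks whether s starts with "[", n equals signs (n may be
// 0) and "[". If so, it returns the text up to and including the first
// closing "]" + n equals signs + "]".
func matchLongBracket(s string) (string, bool) {
	if !strings.HasPrefix(s, "[") {
		return "", false
	}
	n := 0
	for 1+n < len(s) && s[1+n] == '=' {
		n++
	}
	if 1+n >= len(s) || s[1+n] != '[' {
		return "", false
	}
	closing := "]" + strings.Repeat("=", n) + "]"
	start := n + 2 // length of the opening bracket
	end := strings.Index(s[start:], closing)
	if end < 0 {
		return "", false
	}
	return s[:start+end+len(closing)], true
}

func main() {
	for _, s := range []string{
		"[===[whatever arbitrary text ]===]",
		"[===[whatever arbitrary text ]==]",
		"[[no equals signs]]",
	} {
		m, ok := matchLongBracket(s)
		fmt.Println(ok, m)
	}
}
```

Because the closing delimiter is built from the count, an inner ]] or ]==] with a different number of equals signs is simply treated as ordinary text.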
Your tags mention lexing, but your question itself doesn't. What you're trying to do is non-regular, so I don't think it can be done as part of lexing (though I don't remember if ANTLR's lexer is strictly regular -- it's been a couple of years since I last used ANTLR).
What you describe should be possible in parsing, however. Here's the grammar for what you described:
thingy : LBRACKET middle RBRACKET;
middle : EQUAL middle EQUAL
| LBRACKET RBRACKET;
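Because middle is self-embedding (EQUAL middle EQUAL), it needs a recursive parser rather than a finite automaton. A sketch of a hand-written recursive-descent recognizer in Go for exactly that grammar; it assumes the tokens LBRACKET, RBRACKET and EQUAL are the single characters '[', ']' and '=', and, like the grammar, it allows no text between the inner brackets. The names recognizer and accepts are hypothetical.

```go
package main

import "fmt"

// recognizer implements
//
//	thingy : LBRACKET middle RBRACKET ;
//	middle : EQUAL middle EQUAL | LBRACKET RBRACKET ;
//
// with one method per rule; one character of lookahead picks the alternative.
type recognizer struct {
	input string
	pos   int
}

// expect consumes c if it is the next character.
func (r *recognizer) expect(c byte) bool {
	if r.pos < len(r.input) && r.input[r.pos] == c {
		r.pos++
		return true
	}
	return false
}

func (r *recognizer) thingy() bool {
	return r.expect('[') && r.middle() && r.expect(']')
}

func (r *recognizer) middle() bool {
	if r.pos < len(r.input) && r.input[r.pos] == '=' {
		// middle : EQUAL middle EQUAL
		return r.expect('=') && r.middle() && r.expect('=')
	}
	// middle : LBRACKET RBRACKET
	return r.expect('[') && r.expect(']')
}

// accepts reports whether the whole input is a thingy.
func accepts(s string) bool {
	r := &recognizer{input: s}
	return r.thingy() && r.pos == len(s)
}

func main() {
	for _, s := range []string{"[[]]", "[==[]==]", "[==[]=]", "[===[]====]"} {
		fmt.Println(s, accepts(s))
	}
}
```

Each recursive call to middle consumes one '=' on the way in and requires one '=' on the way out, which is how the counts are forced to agree.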