History of trailing comma in programming language grammars - syntax

Many programming languages allow trailing commas in their grammar following the last item in a list. Supposedly this was done to simplify automatic code generation, which is understandable.
As an example, the following is a perfectly legal array initialization in Java (JLS 10.6 Array Initializers):
int[] a = { 1, 2, 3, };
I'm curious if anyone knows which language was first to allow trailing commas such as these. Apparently C had it as far back as 1985.
Also, if anybody knows other grammar "peculiarities" of modern programming languages, I'd be very interested in hearing about those also. I read that Perl and Python for example are even more liberal in allowing trailing commas in other parts of their grammar.

I'm not an expert on the commas, but I know that standard Pascal was very persnickity about semi-colons being statement separators, not terminators. That meant you had to be very very careful about where you put one if you didn't want to get yelled at by the compiler.
Later Pascal-esque languages (C, Modula-2, Ada, etc.) had their standards written to accept the odd extra semicolon without behaving like you'd just peed in the cake mix.

I just found out that a g77 Fortran compiler has the -fugly-comma Ugly Null Arguments flag, though it's a bit different (and as the name implies, rather ugly).
The -fugly-comma option enables use of a single trailing comma to mean “pass an extra trailing null argument” in a list of actual arguments to an external procedure, and use of an empty list of arguments to such a procedure to mean “pass a single null argument”.
For example, CALL FOO(,) means “pass two null arguments”, rather than “pass one null argument”. Also, CALL BAR() means “pass one null argument”.
I'm not sure which version of the language this first appeared in, though.

[Does anybody know] other grammar "peculiarities" of modern programming languages?
One of my favorites, Modula-3, was designed in 1990 with Niklaus Wirth's blessing as the then-latest language in the "Pascal family". Does anyone else remember those awful fights about where semicolon should be a separator or a terminator? In Modula-3, the choice is yours! The EBNF for a sequence of statements is
stmt ::= BEGIN [stmt {; stmt} [;]] END
Similarly, when writing alternatives in a CASE statement, Modula-3 let you use the vertical bar | as either a separator or a prefix. So you could write
CASE c OF
| 'a', 'e', 'i', 'o', 'u' => RETURN Char.Vowel
| 'y' => RETURN Char.Semivowel
ELSE RETURN Char.Consonant
END
or you could leave off the initial bar, perhaps because you prefer to write OF in that position.
I think what I liked as much as the design itself was the designers' awareness that there was a religious war going on and their persistence in finding a way to support both sides.
Let the programmer choose!
P.S. Objective Caml allows permissive use of | in case expressions whereas the earlier and closely related dialect Standard ML does not. As a result, case expressions are often uglier in Standard ML code.
EDIT: After seeing T.E.D.'s answer I checked the Modula-2 grammar and he's correct, Modula-2 also supported semicolon as terminator, but through the device of the empty statement, which makes stuff like
x := x + 1;;;;;; RETURN x
legal. I suppose that's not a bad thing. Modula-2 didn't allow flexible use of the case separator |, however; that seems to have originated with Modula-3.

Something which has always galled me about C is that although it allows an extra trailing comma in an intializer list, it does not allow an extra trailing comma in an enumerator list (for defining the literals of an enumeration type). This little inconsistency has bitten me in the ass more times than I care to admit. And for no reason!

Related

What are valid identifiers in R7RS-small?

R7RS-small says that all identifiers must be terminated by a delimiter, but at the same time it defines pretty elaborate rules for what can be in an identifier. So, which one is it?
Is an identifier supposed to start with an initial character and then continue until a delimiter, or does it start with an initial character and continue following the syntax defined in 7.1.1.
Here are a couple of obvious cases. Are these valid identifiers?
a#a
b,b
c'c
d[d]
If they are not supposed to be valid, what is the purpose of saying that an identifier must be terminated by a delimiter?
|..ident..| are delimiters for symbols in R7RS, to allow any character that you cannot insert in an old style symbol (| is the delimiter).
However, in R6RS the "official" grammar was incorrect, as it did not allow to define symbols such that 1+, which led all implementations define their own rules to overcome this illness of the official grammar.
Unless you need to read the source code of a given implementation and see how it defines the symbols, you should not care too much about these rules and use classical symbols.
In the section 7.1.1 you find the backus-naur form that defines the lexical structure of R7RS identifiers but I doubt the implementations follow it.
I quote from here
As with identifiers, different implementations of Scheme use slightly
different rules, but it is always the case that a sequence of
characters that contains no special characters and begins with a
character that cannot begin a number is taken to be a symbol
In other words, an implementation will use a function like read-atom and after that it will classify an atom by backtracking with read-number and if number? fails it will be a symbol.

Dangling topic (or any other) variable does not fail

This works:
$_ =
say "hi";
That is, you can put any amount of whitespace between an assignment and stuff that's behind, it will simply ignore it. You can use any variable (with my) too. Effectively, $_ will be assigned the result of the say, which is True.
Is this surprising, but up to spec, or simply surprising?
There may be any amount of whitespace either side of an operator. Thus:
say 1
+ 2
+ 3;
Is, so far as the compiler sees it, entirely the same as:
say 1 + 2 + 3;
Assignment (=) is just another operator, so also follows these rules.
Further, say is just a normal built-in subroutine, so it's just like:
my $answer = flip '24';
say $answer; # 42
Except with more whitespace:
my $answer =
flip '24';
say $answer; # 42
There are some places in Perl 6 where whitespace is significant, but the whitespace between infix operators is not one of them.
TL;DR P6 syntax is freeform.
Dangling topic (or any other) variable does not fail
The issue you describe is a very general one. It's most definitely not merely about variable declaration/assignment!
In P6, parsing of a statement -- a single imperative unit ("do this") -- generally just keeps on going until it reaches an explicit statement separator -- ; -- that brings a statement to its end, just like the period (aka full stop) at the end of this English sentence.
you can put any amount of whitespace
Like many programming languages, standard P6 is generally freeform. That is to say, wherever some whitespace is valid, then generally any amount of whitespace -- horizontal and vertical whitespace -- is syntactically equivalent.
$_ =
say "hi";
The above works exactly as would be expected if someone is applying the freeform principle -- it's a single statement assigning the value of the say to the $_ variable.
Is this surprising, but up to spec, or simply surprising?
I like inventing (hopefully pithy) sayings. I just invented "Surprise follows surmise".
It's up to spec. I know to expect it. It doesn't surprise me.
If someone embraces the fact that P6 is generally freeform and has semicolon statement separation then it will, I predict, (eventually -- likely quickly) stop being surprising.
The foregoing is a direct answer to your question. See also Jonathan's answer for more along the same lines. Feel free to ignore the rest of this answer.
For the rest of this answer I use "freeform" to refer to P6's combination of freeform syntax, semicolon statement separation, and braced blocks ({...}).
The rest of this answer is in three sections:
P6 exceptions to freeform syntax
Freeform vs line oriented
Freeform and line oriented?
P6 exceptions to freeform syntax
#Larry concluded that intuition, aesthetics, convenience, and/or other factors justified an exception from a pure freeform syntax in standard P6 in a few cases.
Statements may omit the trailing semicolon if they:
Are the last statement in a source file, or in a block;
End in a block whose closing curly is followed by a newline (ignoring comments).
Thus none of the three statements below (the if and two says) need a closing semicolon:
if 42 {
say 'no semicolon needed before the closing curly'
} # no semicolon needed after the closing curly
say 'no semicolon needed if last statement in source file'
Sometimes this may not be what's wanted:
{ ... } # the closing } ends the statement (block)
.() # call is invoked on $_
One way to change that is to use parentheses:
({ ... })
.() # calls the block on prior line
For some constructs spaces are either required or disallowed. For example, some postfixes must directly follow the values they apply to and some infixes must not. These are both syntax errors:
foo ++
foo«+»bar
Freeform vs line oriented
For some coding scenarios P6's freeform syntax is arguably a strong net positive, eg:
One liners can use blocks;
FP code is natural (one can write non-trivial closures);
More straight-forward editing/refactoring/copying/pasting.
But there are downsides:
The writing and reading overhead of freeform syntax -- semicolons and block braces.
Ignoring the intuition that presumably led you to post your question.
The latter is potent. All of the following could lead someone to think that the say in your example is part of a new statement that isn't a continuation of the $_ = statement:
The newline after the =;
The blank line after that;
The lack of an indent at the start of the say line relative to the $_ = line;
The nature of say (it might seem like say must be the start of a new statement).
An upshot of the above intuitions is that some programming languages adopt a "line-oriented" syntax rather than a freeform one, with the most famous being Python.1
Freeform and line oriented syntax
Some languages, eg Haskell, allow use of either line oriented or freeform syntax (at least for some language constructs).
P6 supports slangs, userland modules that mutate the language. Imagine a slang that supported both freeform and line-oriented code so:
Those learning P6 encountered more familiarity and less surprises as they learned the language's basics by focusing on either line-oriented or freeform code based on their preference;
Those familiar with P6 could write better code by using either line-oriented or freeform syntax.
At the risk of over-complicating things, imagine a slang that adopts not only line orientation but also the off-side rule that Python supports, and implements no strict; for untyped sigil-free variables (which drops declarators and sigils and promotes immutability). Here's a fragment of some code written in said imagined slang that I posted in a reddit comment a few weeks ago:
sub egg(bar)
put bar
data = ["Iron man", "is", "Tony Stark"]
callbacks = []
Perhaps something like the above is too difficult to pull off? (I don't currently see why.)
Footnotes
1 The remainder of this section compares P6 and Python using the Wikipedia section on Programming language statements as our guide:
A statement separator is used to demarcate boundaries between two separate statements.
In P6 it's ; or the end of blocks.
In Python ; is available to separate statements. But it's primarily line-oriented.
Languages that interpret the end of line to be the end of a statement are called "line-oriented" languages.
In Python a line end terminates a statement unless the next line is suitably indented (in which case it's the start of an associated sub-block) or an explicit line continuation character appears at the end of a line.
P6 is not line-oriented. (At least, not standard P6. I'll posit a P6 slang at the end of this answer that supports both freeform and line-oriented syntax.)
"Line continuation" is a convention in line-oriented languages [that] allows a single statement to span more than just one line.
Python has line continuation features; see the Wikipedia article for details.
Standard P6 also has line continuation features despite not being line-oriented.2
2 P6 supports line continuation. Continuing with quotes from Wikipedia:
a newline normally results in a token being added to the token stream, unless line continuation is detected.
(A token is the smallest fragment of code -- beyond an individual character -- that's treated as an atomic unit by the parser.)
Standard P6 always assumes a token break if it encounters a newline with the sole exception of a string written across lines like this:
say
'The
quick
fox';
This will compile OK and display The, quick, and fox on three separate lines.
The equivalent in Python will generate a syntax error.
Backslash as last character of line
In P6 a backslash:
Cannot appear in the middle of a token;
Can be used to introduce whitespace in sourcecode that is ignored by the parser to avoid what would otherwise be a syntax error. For example:
say #foo\
»++
Is actually a more general concept of "unspace" that can be used anywhere within a line not just at the end:
say #foo\ »++
Some form of inline comment serves as line continuation
This works:
say {;}#`( An embedded
comment ).signature
An embedded comment:
Cannot appear in the middle of a token;
Isn't as general as a backslash (say #foo#`[...]»++ doesn't work).

What does the "%" mean in tcl?

In a situation like this for example:
[% $create_port %]
or [list [% $RTL_LIST %]]
I realized it had to do with the brackets, but what confuses me is that sometimes it is used with the brackets and variable followed, and sometimes you have brackets with variables inside without the %.
So i'm not sure what it is used for.
Any help is appreciated.
% is not a metacharacter in the Tcl language core, but it still has a few meanings in Tcl. In particular, it's the modulus operator in expr and a substitution field specifier in format, scan, clock format and clock scan. (It's also the default prompt character, and I have a trivial pass-through % command in my ~/.tclshrc to make cut-n-pasting code easier, but nobody else in the world needs to follow my lead there!)
But the code you have written does not appear to be any of those (because it would be a syntax error in all of the commands I've mentioned). It looks like it is some sort of directive processing scheme (with the special sequences being [% and %], with the brackets) though not one I recognise such as doctools or rivet. Because a program that embeds a Tcl interpreter could do an arbitrary transformation to scripts before executing them, it's extremely difficult to guess what it might really be.

What is VBS UCASE function doing to Japanese?

In order to avoid case conflicts comparing strings on an ASP classic site, some inherited code converts all strings with UCASE() first. This seems to work well across languages ... except Japanese. Here's a simple example on a Japanese string. I've provided the UrlEncoded values to make it clear how little is changing behind the scenes:
Server.UrlEncode("戦艦帝国") = %E6%88%A6%E8%89%A6%E5%B8%9D%E5%9B%BD
UCASE("戦艦帝国") = ƈ�ȉ�Ÿ�ś�
Server.UrlEncode(UCASE("戦艦帝国")) = %C6%88%A6%C8%89%A6%C5%B8%9D%C5%9B%BD
So is UCASE doing anything sensible with this Japanese string? Or is its behavior buggy, undefined, or known to be incompatible with Japanese?
(LCASE leaves the sample string alone. But now I'm wary of switching all comparisons to LCASE because I don't know if it bungles other non-western languages that do work with UCASE....)
https://msdn.microsoft.com/en-us/library/1systdcy(v=vs.84).aspx
Only lowercase letters are converted to uppercase; all uppercase letters and non-letter characters remain unchanged.
https://en.wikipedia.org/wiki/Letter_case
Most Western languages (particularly those with writing systems based on the Latin, Cyrillic, Greek, Coptic, and Armenian alphabets) use letter cases in their written form as an aid to clarity. Scripts using two separate cases are also called bicameral scripts. Many other writing systems make no distinction between majuscules and minuscules – a system called unicameral script or unicase.
"lowercase or uppercase letters" does not apply in Chinese-Japanese-Korean languages, hence, the output of UCase() should remain unchanged.

ACM programming - Arithmetica 1.0 - any additional operators?

Has anyone in this forum attempted to solve the ACM programming problem http://acm.mipt.ru/judge/problems.pl?browse=yes&problem=024? It is one of the simpler problems in ACM MIPT and the goal is to evaluate an expression consisting of +, -, * and parentheses. Despite the apparent simplicity, I haven't been able to get my solution accepted, apparently because one of the test case expressions has an operator not stated in the problem. I even added support for division ('/') but that too didn't help. Any idea on what other operator needs to be supported? FYI, my program removes all whitespaces from the input before processing so that spaces shouldn't be a problem. Anything not stated in the problem but needs to be taken care of?
You're being bitten by ruby's handling of strings and characters.
curr_ch = #input[i]
gives you an integer, for the input you get, the ASCII code of the character at index i of the input.
curr_ch == '('
for example compares that integer to the string "(", of course that fails. Also the regex matches fail because you pass them an integer where a string is expected.
Replacing all occurrences of some_var = #input[some_index] with some_var = #input[some_index...some_index+1] gives me a programme that seems to work (it works on a few test inputs I gave it). Probably someone who actually knows the quirks of ruby can give you a better fix.

Resources