In a situation like this for example:
[% $create_port %]
or [list [% $RTL_LIST %]]
I realized it had to do with the brackets, but what confuses me is that sometimes it is used with the brackets and variable followed, and sometimes you have brackets with variables inside without the %.
So i'm not sure what it is used for.
Any help is appreciated.
% is not a metacharacter in the Tcl language core, but it still has a few meanings in Tcl. In particular, it's the modulus operator in expr and a substitution field specifier in format, scan, clock format and clock scan. (It's also the default prompt character, and I have a trivial pass-through % command in my ~/.tclshrc to make cut-n-pasting code easier, but nobody else in the world needs to follow my lead there!)
But the code you have written does not appear to be any of those (because it would be a syntax error in all of the commands I've mentioned). It looks like it is some sort of directive processing scheme (with the special sequences being [% and %], with the brackets) though not one I recognise such as doctools or rivet. Because a program that embeds a Tcl interpreter could do an arbitrary transformation to scripts before executing them, it's extremely difficult to guess what it might really be.
Related
This works:
$_ =
say "hi";
That is, you can put any amount of whitespace between an assignment and stuff that's behind, it will simply ignore it. You can use any variable (with my) too. Effectively, $_ will be assigned the result of the say, which is True.
Is this surprising, but up to spec, or simply surprising?
There may be any amount of whitespace either side of an operator. Thus:
say 1
+ 2
+ 3;
Is, so far as the compiler sees it, entirely the same as:
say 1 + 2 + 3;
Assignment (=) is just another operator, so also follows these rules.
Further, say is just a normal built-in subroutine, so it's just like:
my $answer = flip '24';
say $answer; # 42
Except with more whitespace:
my $answer =
flip '24';
say $answer; # 42
There are some places in Perl 6 where whitespace is significant, but the whitespace between infix operators is not one of them.
TL;DR P6 syntax is freeform.
Dangling topic (or any other) variable does not fail
The issue you describe is a very general one. It's most definitely not merely about variable declaration/assignment!
In P6, parsing of a statement -- a single imperative unit ("do this") -- generally just keeps on going until it reaches an explicit statement separator -- ; -- that brings a statement to its end, just like the period (aka full stop) at the end of this English sentence.
you can put any amount of whitespace
Like many programming languages, standard P6 is generally freeform. That is to say, wherever some whitespace is valid, then generally any amount of whitespace -- horizontal and vertical whitespace -- is syntactically equivalent.
$_ =
say "hi";
The above works exactly as would be expected if someone is applying the freeform principle -- it's a single statement assigning the value of the say to the $_ variable.
Is this surprising, but up to spec, or simply surprising?
I like inventing (hopefully pithy) sayings. I just invented "Surprise follows surmise".
It's up to spec. I know to expect it. It doesn't surprise me.
If someone embraces the fact that P6 is generally freeform and has semicolon statement separation then it will, I predict, (eventually -- likely quickly) stop being surprising.
The foregoing is a direct answer to your question. See also Jonathan's answer for more along the same lines. Feel free to ignore the rest of this answer.
For the rest of this answer I use "freeform" to refer to P6's combination of freeform syntax, semicolon statement separation, and braced blocks ({...}).
The rest of this answer is in three sections:
P6 exceptions to freeform syntax
Freeform vs line oriented
Freeform and line oriented?
P6 exceptions to freeform syntax
#Larry concluded that intuition, aesthetics, convenience, and/or other factors justified an exception from a pure freeform syntax in standard P6 in a few cases.
Statements may omit the trailing semicolon if they:
Are the last statement in a source file, or in a block;
End in a block whose closing curly is followed by a newline (ignoring comments).
Thus none of the three statements below (the if and two says) need a closing semicolon:
if 42 {
say 'no semicolon needed before the closing curly'
} # no semicolon needed after the closing curly
say 'no semicolon needed if last statement in source file'
Sometimes this may not be what's wanted:
{ ... } # the closing } ends the statement (block)
.() # call is invoked on $_
One way to change that is to use parentheses:
({ ... })
.() # calls the block on prior line
For some constructs spaces are either required or disallowed. For example, some postfixes must directly follow the values they apply to and some infixes must not. These are both syntax errors:
foo ++
foo«+»bar
Freeform vs line oriented
For some coding scenarios P6's freeform syntax is arguably a strong net positive, eg:
One liners can use blocks;
FP code is natural (one can write non-trivial closures);
More straight-forward editing/refactoring/copying/pasting.
But there are downsides:
The writing and reading overhead of freeform syntax -- semicolons and block braces.
Ignoring the intuition that presumably led you to post your question.
The latter is potent. All of the following could lead someone to think that the say in your example is part of a new statement that isn't a continuation of the $_ = statement:
The newline after the =;
The blank line after that;
The lack of an indent at the start of the say line relative to the $_ = line;
The nature of say (it might seem like say must be the start of a new statement).
An upshot of the above intuitions is that some programming languages adopt a "line-oriented" syntax rather than a freeform one, with the most famous being Python.1
Freeform and line oriented syntax
Some languages, eg Haskell, allow use of either line oriented or freeform syntax (at least for some language constructs).
P6 supports slangs, userland modules that mutate the language. Imagine a slang that supported both freeform and line-oriented code so:
Those learning P6 encountered more familiarity and less surprises as they learned the language's basics by focusing on either line-oriented or freeform code based on their preference;
Those familiar with P6 could write better code by using either line-oriented or freeform syntax.
At the risk of over-complicating things, imagine a slang that adopts not only line orientation but also the off-side rule that Python supports, and implements no strict; for untyped sigil-free variables (which drops declarators and sigils and promotes immutability). Here's a fragment of some code written in said imagined slang that I posted in a reddit comment a few weeks ago:
sub egg(bar)
put bar
data = ["Iron man", "is", "Tony Stark"]
callbacks = []
Perhaps something like the above is too difficult to pull off? (I don't currently see why.)
Footnotes
1 The remainder of this section compares P6 and Python using the Wikipedia section on Programming language statements as our guide:
A statement separator is used to demarcate boundaries between two separate statements.
In P6 it's ; or the end of blocks.
In Python ; is available to separate statements. But it's primarily line-oriented.
Languages that interpret the end of line to be the end of a statement are called "line-oriented" languages.
In Python a line end terminates a statement unless the next line is suitably indented (in which case it's the start of an associated sub-block) or an explicit line continuation character appears at the end of a line.
P6 is not line-oriented. (At least, not standard P6. I'll posit a P6 slang at the end of this answer that supports both freeform and line-oriented syntax.)
"Line continuation" is a convention in line-oriented languages [that] allows a single statement to span more than just one line.
Python has line continuation features; see the Wikipedia article for details.
Standard P6 also has line continuation features despite not being line-oriented.2
2 P6 supports line continuation. Continuing with quotes from Wikipedia:
a newline normally results in a token being added to the token stream, unless line continuation is detected.
(A token is the smallest fragment of code -- beyond an individual character -- that's treated as an atomic unit by the parser.)
Standard P6 always assumes a token break if it encounters a newline with the sole exception of a string written across lines like this:
say
'The
quick
fox';
This will compile OK and display The, quick, and fox on three separate lines.
The equivalent in Python will generate a syntax error.
Backslash as last character of line
In P6 a backslash:
Cannot appear in the middle of a token;
Can be used to introduce whitespace in sourcecode that is ignored by the parser to avoid what would otherwise be a syntax error. For example:
say #foo\
»++
Is actually a more general concept of "unspace" that can be used anywhere within a line not just at the end:
say #foo\ »++
Some form of inline comment serves as line continuation
This works:
say {;}#`( An embedded
comment ).signature
An embedded comment:
Cannot appear in the middle of a token;
Isn't as general as a backslash (say #foo#`[...]»++ doesn't work).
I'm trying to understand exactly who among the set of TTY, kernel, line discipline, and shell actually deals with any given part of my input. In particular, I've been working through this article:
The TTY demystified
My question: if I want to have the actual delete character show up in BASH, I can use ^V to make it verbatim. E.g. I can use Ctrl+V then Ctrl+H to have a backspace character. Pretty neat!
echo '3^H'
3
My question is this: if I'm in something that reads in canonical mode (I believe cat does this) I can put a null character in there by doing Ctrl+V then Ctrl+2 (basically Ctrl+#, which is the caret notation for the null character).
BASH won't allow me to have a verbatim null character on one of its lines, though, and it looks like python and other readline programs won't either.
Does anybody know why this is, or a general-purpose workaround?
The C library uses a literal null as a string terminator, so anything using that is unable to represent strings containing literal nulls.
Programs which need to support literal nulls define their own string data type.
Which form is most efficient?
1)
v=''
v+='a'
v+='b'
v+='c'
2)
v2='a'` `'b'` `'c'
Assuming readability were exactly the same to you, and that's a stretch, would 1) mean creating and throwing away a few string immutables (like in Python) or act as a Java "StringBuffer" with periodical expansion of the buffer capacity? How are string concatenations handled internally in Bash?
If 2) were just as readable to you as 1), would the backticks spawn subshells and would that be more costly, even as a potential 'no-op' than what is done in 1) ?
Well, the simplest and most efficient mechanism would be option 0:
v="abc"
The first mechanism involves four assignments.
The second mechanism is bizarre (and is definitely not readable). It (nominally) runs an empty command in two sub-shells (the two ` ` parts) and concatenates the outputs (an empty string) with the three constants. If the shell simply executes the back-tick commands without noting that they're empty (and it's not unreasonable that it won't notice; it is a weird thing to try — I don't recall seeing it done in my previous 30 years of shell scripting), this is definitely vastly slower.
So, given only options (1) and (2), use option (1), but in general, use option (0) shown above.
Why would you be building up the string piecemeal like that? What's missing from your example that makes the original code sensible but the reduced code shown less sensible.
v=""
x=$(...)
v="$v$x"
y=$(...)
v="$v$y"
z=$(...)
v="$v$z"
This would make more sense, especially if you use each of $x, $y and $z later, and/or use intermediate values of $v (perhaps in the commands represented by triple dots). The concatenation notation used will work with any Bourne-shell derivative; the alternative += shell will work with fewer shells, but is probably slightly more efficient (with the emphasis on 'slightly').
The portable and straight forward method would be to use double quotes and curly brackets for variables:
VARA="beginning text ${VARB} middle text ${VARC}..."
you can even set default values for empty variables this way
VARA="${VARB:-default text} substring manipulation 1st 3 characters ${VARC:0:3}"
using the curly brackets prevents situations where there is a $VARa and you want to write ${VAR}a but end up getting the contents of ${VARa}
In my program, I'm grep-ing via NSTask. For some reason, sometimes I would get no results (even though the code was apparently the same as the command run from the CLI which worked just fine), so I checked through my code and found, in Apple's documentation, that when adding arguments to an NSTask object, "the NSTask object converts both path and the strings in arguments to appropriate C-style strings (using fileSystemRepresentation) before passing them to the task via argv[]" (snip).
The problem is that I might grep terms like "Río Gallegos". Sadly (as I checked with fileSystemRepresentation), that undergoes the conversion and turns out to be "RiÃÅo Gallegos".
How can I solve this?
-- Ry
The problem is that I might grep terms like "Río Gallegos". Sadly (as I checked with fileSystemRepresentation), that undergoes the conversion and turns out to be "RiÃÅo Gallegos".
That's one possible interpretation. What you mean is that “Río Gallegos” gets converted to “Ri\xcc\x81o Gallegos”—the UTF-8 bytes to represent the decomposed i + combining acute accent.
Your problem is that grep is not interpreting these bytes as UTF-8. grep is using some other encoding—apparently, MacRoman.
The solution is to tell grep to use UTF-8. That requires setting the LC_ALL variable in your grep task's environment.
The quick and dirty value to use would be “en_US.UTF-8”; a more proper way would be to get the language code for the user's primary preferred language, replace the hyphen, if any, with an underscore, and stick “.UTF-8” on the end of that.
Many programming languages allow trailing commas in their grammar following the last item in a list. Supposedly this was done to simplify automatic code generation, which is understandable.
As an example, the following is a perfectly legal array initialization in Java (JLS 10.6 Array Initializers):
int[] a = { 1, 2, 3, };
I'm curious if anyone knows which language was first to allow trailing commas such as these. Apparently C had it as far back as 1985.
Also, if anybody knows other grammar "peculiarities" of modern programming languages, I'd be very interested in hearing about those also. I read that Perl and Python for example are even more liberal in allowing trailing commas in other parts of their grammar.
I'm not an expert on the commas, but I know that standard Pascal was very persnickity about semi-colons being statement separators, not terminators. That meant you had to be very very careful about where you put one if you didn't want to get yelled at by the compiler.
Later Pascal-esque languages (C, Modula-2, Ada, etc.) had their standards written to accept the odd extra semicolon without behaving like you'd just peed in the cake mix.
I just found out that a g77 Fortran compiler has the -fugly-comma Ugly Null Arguments flag, though it's a bit different (and as the name implies, rather ugly).
The -fugly-comma option enables use of a single trailing comma to mean “pass an extra trailing null argument” in a list of actual arguments to an external procedure, and use of an empty list of arguments to such a procedure to mean “pass a single null argument”.
For example, CALL FOO(,) means “pass two null arguments”, rather than “pass one null argument”. Also, CALL BAR() means “pass one null argument”.
I'm not sure which version of the language this first appeared in, though.
[Does anybody know] other grammar "peculiarities" of modern programming languages?
One of my favorites, Modula-3, was designed in 1990 with Niklaus Wirth's blessing as the then-latest language in the "Pascal family". Does anyone else remember those awful fights about where semicolon should be a separator or a terminator? In Modula-3, the choice is yours! The EBNF for a sequence of statements is
stmt ::= BEGIN [stmt {; stmt} [;]] END
Similarly, when writing alternatives in a CASE statement, Modula-3 let you use the vertical bar | as either a separator or a prefix. So you could write
CASE c OF
| 'a', 'e', 'i', 'o', 'u' => RETURN Char.Vowel
| 'y' => RETURN Char.Semivowel
ELSE RETURN Char.Consonant
END
or you could leave off the initial bar, perhaps because you prefer to write OF in that position.
I think what I liked as much as the design itself was the designers' awareness that there was a religious war going on and their persistence in finding a way to support both sides.
Let the programmer choose!
P.S. Objective Caml allows permissive use of | in case expressions whereas the earlier and closely related dialect Standard ML does not. As a result, case expressions are often uglier in Standard ML code.
EDIT: After seeing T.E.D.'s answer I checked the Modula-2 grammar and he's correct, Modula-2 also supported semicolon as terminator, but through the device of the empty statement, which makes stuff like
x := x + 1;;;;;; RETURN x
legal. I suppose that's not a bad thing. Modula-2 didn't allow flexible use of the case separator |, however; that seems to have originated with Modula-3.
Something which has always galled me about C is that although it allows an extra trailing comma in an intializer list, it does not allow an extra trailing comma in an enumerator list (for defining the literals of an enumeration type). This little inconsistency has bitten me in the ass more times than I care to admit. And for no reason!