Bash allows things like ${#string} (string length) or ${array[10]} (indexing array). There are many more forms than these, for example ones for trimming, replacing, and changing case.
I've been unable to find a proper name for these. I've seen sources refer to these as "string manipulations" or "array manipulations", but I can't find any official source using these names.
The manual seems to do its best to avoid naming these constructs at all.
Does anyone know a name for these sorts of constructs? (ones of the form ${....} used to manipulate strings and arrays.) Or at least an unofficial name I could Google?
These are "parameter expansion" constructs.
See:
https://wiki.bash-hackers.org/syntax/pe (the relevant page in the bash-hackers' wiki)
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_02 (the relevant section of the POSIX sh specification)
https://www.gnu.org/software/bash/manual/bash.html#Shell-Parameter-Expansion (the official manual)
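For a quick feel of what falls under "parameter expansion", here is a minimal sketch of a few common forms. The variable names and values are made up for illustration, and the case-changing form assumes Bash 4 or later.

```bash
#!/usr/bin/env bash
# Illustrative values only; all variable names here are hypothetical.
string="Hello, World"
array=(zero one two three)
path="/usr/local/share/doc/readme.txt"

length=${#string}               # string length
third=${array[2]}               # array indexing (zero-based)
count=${#array[@]}              # number of array elements
basename=${path##*/}            # trim the longest prefix matching */
stem=${basename%.txt}           # trim the shortest suffix matching .txt
replaced=${string/World/Bash}   # replace the first match
upper=${string^^}               # convert to uppercase (Bash 4+)
fallback=${unset_var:-default}  # use a default if unset or empty

printf '%s\n' "$length" "$third" "$count" "$basename" \
              "$stem" "$replaced" "$upper" "$fallback"
```

All of these are documented under the same "Shell Parameter Expansion" section of the manual linked above.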
I noticed in the Advanced Bash-Scripting Guide, that multiline comments are denoted as #+ rather than simply #. E.g. here.
(there is also a #% used in that particular example, denoting something like a bullet list(?), but this is literally the only location in the document where this is used, whereas the #+ syntax is used extensively)
I was wondering if this is some sort of convention, or if there is a particular reason for it other than the fact it just looks nice.
I note that it specifically seems to denote lines that are meant to be a continuation on a single line, rather than multi-line comments in general, so I'm wondering if it was simply done internally for parsing / documentation generation.
Has anyone else encountered this out in the wild before? Does anyone actually use this style?
I have Googled a handful of things such as "lisp documentation strings", "lisp comments", and a few others, and I can't find anything that specifically addresses this.
I see a lot of code (especially in CL and elisp) that looks like
(defvar test 1
"This is a quoted string and it says things"
)
Where I would normally do
; This is a comment
(defvar test 1)
Which is preferred? Does each serve a different purpose? Thanks!
Many objects in Common Lisp can have a documentation string, that can be retrieved with the generic function documentation and set with the generic function (setf documentation). According to the specification:
Documentation strings are made available for debugging purposes. Conforming programs are permitted to use documentation strings when they are present, but should not depend for their correct behavior on the presence of those documentation strings. An implementation is permitted to discard documentation strings at any time for implementation-defined reasons.
So the first form defines a variable together with its documentation string. If the implementation permits, that string is kept at run time as information useful for documentation and debugging, available either through the IDE or directly through a form like:
(documentation 'test 'variable)
The second case, instead, is just a comment inside a source file, useful only for human consumption, and it is completely ignored by the reader/compiler of the system.
Development environments will use these documentation features. For example GNU Emacs / SLIME:
Move the text cursor onto the symbol test.
Type C-c C-d C-d (Describe Symbol).
Now SLIME displays a buffer with the following content:
COMMON-LISP-USER::TEST
[symbol]
TEST names a special variable:
Value: 1
Documentation:
This is a quoted string and it says things
A simple comment in the source code won't enable this form of development environment integration and documentation lookup.
I saw your tag included scheme and elisp as well. In CL and Elisp always use docstrings. They are used by documentation systems in their languages. Scheme does not have that feature so you will have to continue using comments to document functions.
Did you try looking at the HyperSpec entry for defvar?
defvar takes an optional argument, a documentation string, and this is what you are talking about.
Documentation specified this way can be accessed through documentation:
CL-USER> (defvar *a* "A variable" "A docstring")
*A*
CL-USER> (documentation '*a* 'variable)
"A docstring"
The Background
I recently posted an answer where I variously referred to #{} as a literal, an operator, and (in one draft) a "literal constructor." The squishiness of this definition didn't really affect the quality of the answer, since the question was more about what it does and how to find language references for it, but I'm unhappy with being unable to point to a canonical definition of exactly what to call this element of Ruby syntax.
The Ruby manual mentions this syntax element in the section on expression substitution, but doesn't really define the term for the syntax itself. Almost every reference to this language element says it's used for string interpolation, but doesn't define what it is.
Wikipedia Definitions
Here are some Wikipedia definitions that imply this construct is (strictly speaking) neither a literal nor an operator.
Literal (computer programming)
Operator (programming)
The Questions
Does anyone know what the proper term is for this language element? If so, can you please point me to a formal definition?
Ruby's parser calls #{} the "embexpr" operator. That's EMBedded EXPRession, naturally.
I would definitely call it neither a literal (that's more for, e.g. string literals or number literals themselves, but not parts thereof) nor an operator; those are solely for e.g. binary or unary (infix) operators.
I would either just refer to it without a noun (i.e. for string interpolation), or perhaps call those characters the string interpolation sequence or escape.
TL;DR
Originally, I'd hypothesized:
Embedded expression seems the most likely definition for this token, based on hints in the source code.
This turned out to be true, and has been officially validated by the Ruby 2.x documentation. Based on the updates to the Ripper documentation since this answer was originally written, it seems the parser token is formally defined as string_embexpr and the symbol itself is called an "embedded expression." See the Update for Ruby 2.x section at the bottom of this answer for detailed corroboration.
The remainder of the answer is still relevant, especially for older Rubies such as Ruby 1.9.3, and the methodology used to develop the original answer remains interesting. I am therefore updating the answer, but leaving the bulk of the original post as-is for historical purposes, even though the current answer could now be shorter.
Pre-2.x Answer Based on Ruby 1.9.3 Source Code
Related Answer
This answer calls attention to the Ruby source, which makes numerous references to embexpr throughout the code base. @Phlip suggests that this variable is an abbreviation for "EMBedded EXPRession." This seems like a reasonable interpretation, but neither the ruby-1.9.3-p194 source nor Google (as of this writing) explicitly references the term embedded expression in association with embexpr in any context, Ruby-related or not.
Additional Research
A scan of the Ruby 1.9.3-p194 source code with:
ack-grep -cil --type-add=YACC=.y embexpr .rvm/src/ruby-1.9.3-p194 |
sort -rnk2 -t: |
sed 's!^.*/!!'
reveals 9 files and 33 lines with the term embexpr:
test_scanner_events.rb:12
test_parser_events.rb:7
eventids2.c:5
eventids1.c:3
eventids2table.c:2
parse.y:1
parse.c:1
ripper.y:1
ripper.c:1
Of particular interest is the inclusion of string_embexpr on line 4,176 of the parse.y and ripper.y bison files. Likewise, TestRipper::ParserEvents#test_string_embexpr contains two references to parsing #{} on lines 899 and 902 of test_parser_events.rb.
The scanner, exercised in test_scanner_events.rb, is also noteworthy. This file defines tests in #test_embexpr_beg and #test_embexpr_end that scan for the token #{expr} inside various string expressions. The tests reference both embexpr and expr, raising the likelihood that "embedded expression" is indeed a sensible name for the thing.
Update for Ruby 2.x
Since this post was originally written, the documentation for the standard library's Ripper class has been updated to formally identify the token. The usage section provides "Hello, #{world}!" as an example, and says in part:
Within our :string_literal you’ll notice two @tstring_content, this is the literal part for Hello, and !. Between the two @tstring_content statements is a :string_embexpr, where embexpr is an embedded expression.
This blog post suggests it is called an 'idiom':
http://kconrails.com/2010/12/08/ruby-string-interpolation/
The Wikipedia article doesn't seem to contradict that:
http://en.wikipedia.org/wiki/Programming_idiom
#{} is called a placeholder and is used to reference variables within a string.
puts "My name is #{my_name}"
In the BashFAQ of Greg's Wiki, the following is written:
Don't mark strings that contain variables or other substitutions.
and
Bash (at least up through 4.0) performs locale expansion before other substitutions. Thus, in a case like this:
echo $"The answer is $answer"
The literal string $answer will become part of the marked string.
Now I can understand that using variables in strings marked as translatable is security-wise dangerous as described in http://www.gnu.org/software/gettext/manual/html_node/bash.html.
However, neither removing the variables nor splitting the strings is viable, as this makes the translation difficult/impossible (because of the different sentence structure in e.g. Russian, French, German and English).
So my question is: Does any sane and safe way of Bash localization exist, or does one use a more expressive programming language (like Python, Ruby or Perl) when it comes to localization?
http://www.linuxtopia.org/online_books/advanced_bash_scripting_guide/localization.html looks like a good tutorial for Bash localization using gettext, but I have not used it.
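One commonly suggested workaround for the problem described in the question is to keep variables out of the marked string entirely and pass them as printf arguments, so that only a fixed format string is looked up for translation. The following is a minimal sketch of that idea, not a vetted solution; the text domain name myscript and the function report_answer are hypothetical, and without an installed message catalog Bash simply uses the untranslated string.

```bash
#!/usr/bin/env bash
# Hypothetical text domain; no catalog needs to exist for this to run.
TEXTDOMAIN=myscript

# report_answer is a made-up helper for illustration.
# Only the constant format string is marked with $"...";
# the variable is substituted afterwards by printf, so it never
# becomes part of the string that is looked up for translation.
report_answer() {
    local answer=$1
    printf $"The answer is %s\n" "$answer"
}

report_answer 42
```

A translator can then move %s anywhere within the sentence. As far as I know, the printf builtin in Bash does not support reordering several arguments (the %1$s form), so messages with more than one variable may still need a different approach.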
I'm trying to write some specifications to be shared between a small team and getting picky about the format I put some command listings in. Is there any formal definition of the syntax used in the SYNOPSIS section of man pages?
From the Wikimedia Commons, here's an example of a man page with the SYNOPSIS section I'm talking about, where the command is listed with the required and optional arguments it understands.
There is no formal definition of a manpage anywhere, not even in the POSIX standard. The man(1) manpage in your example is pretty typical: you write out the various ways a program can be used (often just one) with [] denoting optional, bold (or typewriter font with the mdoc macros) denoting literal command line input and italics denoting variables.
The manpages man(7) and mdoc(7) will explain the most important conventions. man(7) is for old-style Unix manpages and is still popular on Linux (see man-pages(7)); mdoc(7) comes from 4.4BSD and is popular in its derivatives. The latter maintains a stricter separation of content and presentation and can produce (IMHO) prettier PDF/HTML output.
The conventions for utilities are documented in Chapter 12, Utility Conventions, of IEEE Std 1003.1, 2004 Edition.
A newer edition of this document exists here
man 7 man-pages:
briefly describes the command or function's interface. For commands,
this shows the syntax of the command and its arguments (including
options); boldface is used for as-is text and italics are used to
indicate replaceable arguments. Brackets ([]) surround optional
arguments, vertical bars (|) separate choices, and ellipses (...) can
be repeated. For functions, it shows any required data declarations
or #include directives, followed by the function declaration.
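To make the conventions quoted above concrete, here is a small sketch of a usage message written in that style. The tool name mytool and its options are invented for illustration; plain text cannot show the bold/italic distinction, so only the brackets, vertical bar, and ellipsis conventions appear.

```bash
#!/usr/bin/env bash
# Hypothetical tool "mytool"; its name and options are made up.
# Conventions shown:
#   [ ]   optional argument
#   |     choice between alternatives
#   ...   the preceding argument may be repeated
usage() {
    cat <<'EOF'
SYNOPSIS
    mytool [-v] [-o outfile] [-a | -b] file ...
EOF
}

usage
```

Here -v and -o outfile are optional, at most one of -a or -b may be given, and one or more file arguments are expected.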