Is there a specification for a man page's SYNOPSIS section?

I'm trying to write some specifications to be shared within a small team, and I'm getting picky about the format of the command listings in them. Is there any formal definition of the syntax used in the SYNOPSIS section of man pages?
From the Wikimedia Commons, here's an example of a man page with the SYNOPSIS section I'm talking about, where the command is listed with the required and optional arguments it understands.

There is no formal definition of a manpage anywhere, not even in the POSIX standard. The man(1) manpage in your example is pretty typical: you write out the various ways a program can be used (often just one) with [] denoting optional, bold (or typewriter font with the mdoc macros) denoting literal command line input and italics denoting variables.
The manpages man(7) and mdoc(7) explain the most important conventions. man(7) is for old-style Unix manpages and is still popular on Linux (see man-pages(7)); mdoc(7) comes from 4.4BSD and is popular in its derivatives. The latter maintains a stricter separation of content and presentation and can produce (IMHO) prettier PDF/HTML output.

Utility conventions are documented in Chapter 12, "Utility Conventions", of IEEE Std 1003.1, 2004 Edition.
A newer edition of this document also exists.

From man 7 man-pages:
SYNOPSIS briefly describes the command or function's interface. For commands, this shows the syntax of the command and its arguments (including options); boldface is used for as-is text and italics are used to indicate replaceable arguments. Brackets ([]) surround optional arguments, vertical bars (|) separate choices, and ellipses (...) can be repeated. For functions, it shows any required data declarations or #include directives, followed by the function declaration.
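To make the notation concrete, here is a small Python sketch that renders a plain-text SYNOPSIS line from a list of arguments, applying the conventions quoted above: brackets around optional items, a vertical bar between choices, and an ellipsis after a repeatable item. The Arg type and the synopsis() function are hypothetical helpers, not part of any man tool, and boldface/italics are left out because plain text cannot show them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arg:
    """One item on a SYNOPSIS line (hypothetical helper type)."""
    text: str                  # literal option such as "-q", or a placeholder such as "file"
    optional: bool = False     # wrap the item in [ ]
    repeatable: bool = False   # follow the item with ...
    alternatives: tuple = ()   # further choices, separated from text by |


def render(arg: Arg) -> str:
    """Apply the man-pages(7) conventions to a single item."""
    body = "|".join((arg.text, *arg.alternatives))
    if arg.repeatable:
        body += "..."
    return f"[{body}]" if arg.optional else body


def synopsis(command: str, args: list[Arg]) -> str:
    """Join the command name and its rendered arguments into one line."""
    return " ".join([command] + [render(a) for a in args])


line = synopsis("head", [
    Arg("-n lines", optional=True),
    Arg("-q", optional=True, alternatives=("-v",)),
    Arg("file", optional=True, repeatable=True),
])
print(line)  # → head [-n lines] [-q|-v] [file...]
```

In a real page the literal parts (head, -n, -q, -v) would be set in bold and the placeholders (lines, file) in italics.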

Related

Name for ${...} constructs (for strings and arrays) in bash?

Bash allows things like ${#string} (string length) or ${array[10]} (array indexing). There are many more forms than these, for example ones for trimming, replacing, changing case, etc.
I've been unable to find a proper name for these. I've seen sources refer to these as "string manipulations" or "array manipulations", but I can't find any official source using these names.
The manual seems to do its best to avoid naming these constructs at all.
Does anyone know a name for these sorts of constructs? (ones of the form ${....} used to manipulate strings and arrays.) Or at least an unofficial name I could Google?
These are "parameter expansion" constructs.
See:
https://wiki.bash-hackers.org/syntax/pe (the relevant page in the bash-hackers' wiki)
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_02 (the relevant section of the POSIX sh specification)
https://www.gnu.org/software/bash/manual/bash.html#Shell-Parameter-Expansion (the official manual)
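As a rough illustration of what the trimming forms do, here is a Python sketch that emulates ${var#pattern}, ${var##pattern}, ${var%pattern} and ${var%%pattern} with fnmatch. The helper names are hypothetical, and this is only an approximation: fnmatch's glob syntax is close to, but not identical with, bash's pattern matching (there is no extglob support, for example).

```python
from fnmatch import fnmatchcase


def remove_prefix(value: str, pattern: str, longest: bool = False) -> str:
    """Approximate ${value#pattern} (shortest match) or ${value##pattern} (longest)."""
    ends = range(len(value), -1, -1) if longest else range(len(value) + 1)
    for end in ends:
        if fnmatchcase(value[:end], pattern):
            return value[end:]
    return value  # no match: bash leaves the value unchanged


def remove_suffix(value: str, pattern: str, longest: bool = False) -> str:
    """Approximate ${value%pattern} (shortest match) or ${value%%pattern} (longest)."""
    starts = range(len(value) + 1) if longest else range(len(value), -1, -1)
    for start in starts:
        if fnmatchcase(value[start:], pattern):
            return value[:start]
    return value


path = "/usr/local/lib/libfoo.so.1"
print(remove_prefix(path, "*/", longest=True))  # ${path##*/} → libfoo.so.1
print(remove_suffix(path, "/*"))                # ${path%/*}  → /usr/local/lib
print(remove_suffix(path, ".*", longest=True))  # ${path%%.*} → /usr/local/lib/libfoo
print(remove_prefix(path, "*."))                # ${path#*.}  → so.1
```

The single/double character only switches between trying the shortest and the longest candidate first.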

Why is #+ used for multiline comments in the Advanced Bash-Scripting Guide

I noticed in the Advanced Bash-Scripting Guide, that multiline comments are denoted as #+ rather than simply #. E.g. here.
(there is also a #% used in that particular example, denoting something like a bullet list(?), but this is literally the only location in the document where this is used, whereas the #+ syntax is used extensively)
I was wondering if this is some sort of convention, or if there is a particular reason for it other than the fact it just looks nice.
I note that it specifically seems to denote lines that are meant to be a continuation on a single line, rather than multi-line comments in general, so I'm wondering if it was simply done internally for parsing / documentation generation.
Has anyone else encountered this out in the wild before? Does anyone actually use this style?

Strings in the middle of lisp S-exp?

I have Googled a handful of things such as "lisp documentation strings", "lisp comments", and a few others, and I can't find anything that specifically addresses this.
I see a lot of code (especially in CL and elisp) that looks like
(defvar test 1
"This is a quoted string and it says things"
)
Where I would normally do
; This is a comment
(defvar test 1)
Which is preferred? Does each serve a different purpose? Thanks!
Many objects in Common Lisp can have a documentation string, that can be retrieved with the generic function documentation and set with the generic function (setf documentation). According to the specification:
Documentation strings are made available for debugging purposes. Conforming programs are permitted to use documentation strings when they are present, but should not depend for their correct behavior on the presence of those documentation strings. An implementation is permitted to discard documentation strings at any time for implementation-defined reasons.
So the first form defines the variable together with a documentation string. If the implementation keeps that string, it is available at run time for documentation and debugging purposes, either through the IDE or directly, through a form like:
(documentation 'test 'variable)
The second case, instead, is just a comment inside a source file, useful only for human consumption, and it is completely ignored by the reader/compiler of the system.
Development environments will use these documentation features. For example GNU Emacs / SLIME:
Move the text cursor onto the symbol test.
Type C-c C-d C-d (Describe Symbol).
Now SLIME displays a buffer with the following content:
COMMON-LISP-USER::TEST
[symbol]
TEST names a special variable:
Value: 1
Documentation:
This is a quoted string and it says things
A simple comment in the source code won't enable this form of development environment integration and documentation lookup.
I saw your tag included scheme and elisp as well. In CL and Elisp always use docstrings. They are used by documentation systems in their languages. Scheme does not have that feature so you will have to continue using comments to document functions.
Did you check the HyperSpec entry for defvar?
defvar takes an optional documentation string as its third argument, and that is what you are seeing.
Documentation specified this way can be accessed through documentation:
CL-USER> (defvar *a* "A variable" "A docstring")
*A*
CL-USER> (documentation '*a* 'variable)
"A docstring"
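For comparison, and only as an analogy, Python draws the same line between a docstring and a comment. The function scale below is a hypothetical example; note that Python attaches docstrings only to modules, classes and functions, not to plain variables as defvar does.

```python
import inspect


# This comment is discarded when the source is compiled; nothing at run time can see it.
def scale(x, factor=2):
    """Return X multiplied by FACTOR."""
    return x * factor


# The docstring, by contrast, is stored on the function object itself.
print(scale.__doc__)          # → Return X multiplied by FACTOR.
print(inspect.getdoc(scale))  # same text, with indentation cleaned up
```

Much like Common Lisp's permission to discard documentation strings, running Python with -OO strips docstrings, in which case __doc__ is None.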

Are there any standards for tmlanguage keyword types?

.tmlanguage files work by defining a list of key-value pairs. Regular expressions are the keys and the type of syntax is the value. This is done in the following XML-ish manner:
<key>match</key>
<string>[0-9]</string>
<key>name</key>
<string>constant.numeric</string>
My main question is: Is there a list of values that could go in place of constant.numeric if the file is to be used by a text editor like Sublime?
For a basic introduction, check out the Language Grammars section of the TextMate Manual. The Naming Conventions section describes some of the base scopes, like comment, keyword, meta, storage, etc. These classes can then be subclassed to give as much detail as possible - for example, constant.numeric.integer.long.hexadecimal.python. However, it is very important to note that these are not hard-and-fast rules - just suggestions. This will become obvious as you scan through different language definitions and see, for example, all the different ways that functions are scoped - meta.function-call, support.function.name, meta.function-call punctuation.definition.parameters, etc.
The best way to learn about scopes is to examine existing .tmLanguage files, and to look through the source of different languages and see what scopes are assigned where. The XML format is very difficult to casually browse through, so I use the excellent PackageDev plugin to translate the XML to YAML. It is then much easier to scan and see which scopes are assigned by which regexes.
Another way to learn is to see how different language constructs are scoped, and for that I highly recommend using ScopeAlways. Once installed and activated, just place your cursor and the scope(s) that apply to that particular position are shown in the status bar. This is particularly useful when designing color schemes, as you can easily see which selectors will highlight a language feature of interest.
If you're interested, the color scheme used here is Neon, which I designed to make as many languages as possible look as good as possible, covering as many scopes as possible. Feel free to look through it to see how the different language elements are highlighted; this could also help you in designing your .tmLanguage to be consistent with other languages.
I hope all this helps, good luck!
Yes. The .tmlanguage format was originally used by TextMate. The TextMate manual provides full documentation for the format, including the possible types of language constructs.
Copied from the relevant docs page, in hierarchical format:
comment — for comments.
    line — line comments, we specialize further so that the type of comment start character(s) can be extracted from the scope
        double-slash — // comment
        double-dash — -- comment
        number-sign — # comment
        percentage — % comment
        character — other types of line comments.
    block — multi-line comments like /* … */ and <!-- … -->.
        documentation — embedded documentation.
constant — various forms of constants.
    numeric — those which represent numbers, e.g. 42, 1.3f, 0x4AB1U.
    character — those which represent characters, e.g. <, \e, \031.
        escape — escape sequences like \e would be constant.character.escape.
    language — constants (generally) provided by the language which are “special” like true, false, nil, YES, NO, etc.
    other — other constants, e.g. colors in CSS.
entity — an entity refers to a larger part of the document, for example a chapter, class, function, or tag. We do not scope the entire entity as entity.* (we use meta.* for that). But we do use entity.* for the “placeholders” in the larger entity, e.g. if the entity is a chapter, we would use entity.name.section for the chapter title.
    name — we are naming the larger entity.
        function — the name of a function.
        type — the name of a type declaration or class.
        tag — a tag name.
        section — the name is the name of a section/heading.
    other — other entities.
        inherited-class — the superclass/baseclass name.
        attribute-name — the name of an attribute (mainly in tags).
invalid — stuff which is “invalid”.
    illegal — illegal, e.g. an ampersand or lower-than character in HTML (which is not part of an entity/tag).
    deprecated — for deprecated stuff e.g. using an API function which is deprecated or using styling with strict HTML.
keyword — keywords (when these do not fall into the other groups).
    control — mainly related to flow control like continue, while, return, etc.
    operator — operators can either be textual (e.g. or) or be characters.
    other — other keywords.
markup — this is for markup languages and generally applies to larger subsets of the text.
    underline — underlined text.
        link — this is for links, as a convenience this is derived from markup.underline so that if there is no theme rule which specifically targets markup.underline.link then it will inherit the underline style.
    bold — bold text (text which is strong and similar should preferably be derived from this name).
    heading — a section header. Optionally provide the heading level as the next element, for example markup.heading.2.html for <h2>…</h2> in HTML.
    italic — italic text (text which is emphasized and similar should preferably be derived from this name).
    list — list items.
        numbered — numbered list items.
        unnumbered — unnumbered list items.
    quote — quoted (sometimes block quoted) text.
    raw — text which is verbatim, e.g. code listings. Normally spell checking is disabled for markup.raw.
    other — other markup constructs.
meta — the meta scope is generally used to markup larger parts of the document. For example the entire line which declares a function would be meta.function and the subsets would be storage.type, entity.name.function, variable.parameter etc. and only the latter would be styled. Sometimes the meta part of the scope will be used only to limit the more general element that is styled, most of the time meta scopes are however used in scope selectors for activation of bundle items. For example in Objective-C there is a meta scope for the interface declaration of a class and the implementation, allowing the same tab-triggers to expand differently, depending on context.
storage — things relating to “storage”.
    type — the type of something, class, function, int, var, etc.
    modifier — a storage modifier like static, final, abstract, etc.
string — strings.
    quoted — quoted strings.
        single — single quoted strings: 'foo'.
        double — double quoted strings: "foo".
        triple — triple quoted strings: """Python""".
        other — other types of quoting: $'shell', %s{...}.
    unquoted — for things like here-docs and here-strings.
    interpolated — strings which are “evaluated”: `date`, $(pwd).
    regexp — regular expressions: /(\w+)/.
    other — other types of strings (should rarely be used).
support — things provided by a framework or library should be below support.
    function — functions provided by the framework/library. For example NSLog in Objective-C is support.function.
    class — when the framework/library provides classes.
    type — types provided by the framework/library, this is probably only used for languages derived from C, which has typedef (and struct). Most other languages would introduce new types as classes.
    constant — constants (magic values) provided by the framework/library.
    variable — variables provided by the framework/library. For example NSApp in AppKit.
    other — the above should be exhaustive, but for everything else use support.other.
variable — variables. Not all languages allow easy identification (and thus markup) of these.
    parameter — when the variable is declared as the parameter.
    language — reserved language variables like this, super, self, etc.
    other — other variables, like $some_variables.
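Since a .tmLanguage file is an XML property list, the key/string pairs from the question can be generated with Python's plistlib instead of being written by hand. The sketch below also checks the first component of each scope name against the top-level names listed above; because those names are conventions rather than rules, it only prints a warning. The grammar name, the scopeName value and the make_rule() helper are hypothetical.

```python
import plistlib

# Top-level scope names from the TextMate naming conventions listed above.
STANDARD_ROOTS = {
    "comment", "constant", "entity", "invalid", "keyword", "markup",
    "meta", "storage", "string", "support", "variable",
}


def make_rule(regex: str, scope: str) -> dict:
    """Build one match rule; warn (but do not fail) on a non-standard scope root."""
    root = scope.split(".", 1)[0]
    if root not in STANDARD_ROOTS:
        print(f"warning: {scope!r} does not start with a standard scope name")
    return {"match": regex, "name": scope}


grammar = {
    "name": "Example",               # hypothetical grammar name
    "scopeName": "source.example",   # hypothetical scope name
    "patterns": [
        make_rule(r"[0-9]+", "constant.numeric.integer.example"),
        make_rule(r"#.*$", "comment.line.number-sign.example"),
    ],
}

xml_bytes = plistlib.dumps(grammar)  # XML plist, the same key/string layout as above
print(xml_bytes.decode("utf-8"))
```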

Short/long options with option argument - is this some sort of convention?

It seems that many commands handle option arguments like this:
if a short option requires an option argument, the option is separated by a space from the option argument, e.g.
$ head -n 10
if a long option requires an option argument, the option is separated by a = from the option argument, e.g.
$ head --lines=10
Is this some sort of convention, and if so, where can I find it? Also, what's the reasoning behind it?
Why e.g. is it not
$ head --lines 10
?
The short option rationale is documented in the POSIX Utility Conventions. Most option parsers allow the value to be 'attached' to the letter (-n10), mainly because of extensive historical precedent.
The long option rationale is specified by GNU in their Coding Standards and in the manual page for getopt_long().
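Python's getopt module, whose API is modelled on the C getopt() function, shows both conventions side by side. In this sketch (parse_head_args is a hypothetical helper), "n:" declares that -n takes an argument and "lines=" declares the same for --lines. The parser then accepts the value attached (-n5, --lines=5) or as the next word (-n 5, --lines 5); GNU getopt_long() likewise accepts both forms for a long option with a required argument, so head --lines 10 generally works too.

```python
import getopt


def parse_head_args(argv):
    """Parse a head-like command line: -n N / --lines=N followed by file operands."""
    # "n:" means -n requires an argument; "lines=" means --lines requires one.
    opts, operands = getopt.gnu_getopt(argv, "n:", ["lines="])
    lines = 10  # head prints 10 lines by default
    for opt, value in opts:
        if opt in ("-n", "--lines"):
            lines = int(value)
    return lines, operands


for argv in (["-n", "5", "a.txt"], ["-n5", "a.txt"],
             ["--lines=5", "a.txt"], ["--lines", "5", "a.txt"]):
    print(argv, "->", parse_head_args(argv))  # each gives (5, ['a.txt'])
```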
Once upon a long time ago, in a StackOverflow of long ago, there was a question about command option styles. Not perhaps a good question, but I think the answers rescued it (but I admit to bias). Anyway, it has since been deleted, so I'm going to resuscitate my answer here because (a) it was a painful process to rediscover the answer and (b) it has useful information in it related to options.
How many different types of options do you recognize? I can think of many, including:
Single-letter options preceded by single dash, groupable when there is no argument, argument can be attached to option letter or in next argument (many, many Unix commands; most POSIX commands).
Single-letter options preceded by single dash, grouping not allowed, arguments must be attached (RCS).
Single-letter options preceded by single dash, grouping not allowed, arguments must be separate (pre-POSIX SCCS, IIRC).
Multi-letter options preceded by single dash, arguments may be attached or in next argument (X11 programs).
Multi-letter options preceded by single dash, may be abbreviated (Atria ClearCase).
Multi-letter options preceded by single plus (obsolete).
Multi-letter options preceded by double dash; arguments may follow '=' or be separate (GNU utilities).
Options without prefix/suffix, some names have abbreviations or are implied, arguments must be separate. (AmigaOS Shell, added by porneL)
Options taking an optional argument sometimes must be attached, sometimes must follow an '=' sign. POSIX doesn't support optional arguments meaningfully (the POSIX getopt() only allows them for the last option on the command line).
All sensible option systems use an option consisting of double-dash ('--') alone to mean "end of options" - the following arguments are "non-option arguments" (usually file names) even if they start with a dash. (I regard supporting this notation as an imperative.) Note that if you have a command cmd with an option -f that expects an argument, and you invoke it with -- in place of the argument (cmd -f -- -other), many versions of getopt() will treat the -- as the file name for -f and then parse -other as regular options. That is, -- does not terminate the options if it has to be interpreted as an argument to another option.
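Python's getopt.getopt() follows the same rule, so the pitfall is easy to reproduce: a lone "--" ends option parsing, but a "--" that fills the argument slot of an option is consumed as that option's value. The option letters f and v here are hypothetical.

```python
import getopt

SHORTOPTS = "f:v"  # hypothetical: -f takes an argument, -v is a plain flag


def parse(argv):
    return getopt.getopt(argv, SHORTOPTS)


# A lone "--" ends option processing, so "-v" becomes an operand.
print(parse(["--", "-v", "file"]))
# → ([], ['-v', 'file'])

# Here "--" is the value of -f, so "-v" is still parsed as an option.
print(parse(["-f", "--", "-v", "file"]))
# → ([('-f', '--'), ('-v', '')], ['file'])
```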
Many but not all programs accept single dash as a file name to mean standard input (usually) or standard output (occasionally). Sometimes, as with GNU 'tar', both can be used in a single command line:
tar -cf - -F - | ...
The first solo dash means 'write to stdout'; the second means 'read file names from stdin'.
Some programs use other conventions — that is, options not preceded by a dash. Many of these are from the oldest days of Unix. For example, 'tar' and 'ar' both accept options without a dash, so:
tar cvzf /tmp/somefile.tgz some/directory
The dd command uses opt=value exclusively:
dd if=/some/file of=/another/file bs=16k count=200
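Both older conventions are simple enough to sketch by hand. The helpers below are hypothetical and simplified: parse_tar_oldstyle() treats the first word as a bundle of key letters, and each letter that takes a value (assumed here to be f and b) consumes the following words in order; parse_dd_operands() splits every operand on its first '='.

```python
def parse_tar_oldstyle(argv, takes_value=frozenset("fb")):
    """Traditional tar key letters, e.g. 'cvzf out.tgz dir' (hypothetical helper).

    The first word is a bundle of letters without a dash; each letter that
    takes a value consumes the next remaining word, in order.
    """
    letters, rest = argv[0], list(argv[1:])
    opts = []
    for letter in letters:
        if letter in takes_value:
            if not rest:
                raise ValueError(f"option {letter!r} needs a value")
            opts.append((letter, rest.pop(0)))
        else:
            opts.append((letter, None))
    return opts, rest


def parse_dd_operands(argv):
    """dd-style operands: every argument must be key=value (hypothetical helper)."""
    settings = {}
    for arg in argv:
        key, sep, value = arg.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {arg!r}")
        settings[key] = value
    return settings


print(parse_tar_oldstyle(["cvzf", "/tmp/somefile.tgz", "some/directory"]))
print(parse_dd_operands(["if=/some/file", "of=/another/file", "bs=16k", "count=200"]))
```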
Some programs allow you to interleave options and other arguments completely; the C compiler, make and the GNU utilities run without POSIXLY_CORRECT in the environment are examples. Many programs expect the options to precede the other arguments.
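Python's getopt module offers both behaviours, which makes the difference easy to see: getopt.getopt() stops at the first non-option argument, while getopt.gnu_getopt() keeps scanning unless POSIXLY_CORRECT is set in the environment (or the option string starts with '+'). The option letters are hypothetical.

```python
import getopt

argv = ["-v", "file", "-n", "3"]

# POSIX style: scanning stops at "file", so "-n 3" is left as operands.
print(getopt.getopt(argv, "vn:"))
# → ([('-v', '')], ['file', '-n', '3'])

# GNU style: options after operands are still recognised
# (assuming POSIXLY_CORRECT is not set in the environment).
print(getopt.gnu_getopt(argv, "vn:"))
# → ([('-v', ''), ('-n', '3')], ['file'])
```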
Modern programs such as git increasingly seem to use a base command name (git) followed by a sub-command (commit) followed by options (-m "Commit message"). This was presaged by the sccs interface to the SCCS commands, and then by cvs, and is used by svn too (and they are all version control systems). However, other big suites of commands adopt similar styles when it seems appropriate.
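In Python, this command / sub-command / options layout maps onto argparse sub-parsers. A minimal sketch, with a hypothetical vcs program that has only a commit sub-command:

```python
import argparse

parser = argparse.ArgumentParser(prog="vcs")  # hypothetical program name
subcommands = parser.add_subparsers(dest="command", required=True)

commit = subcommands.add_parser("commit", help="record changes")
commit.add_argument("-m", "--message", required=True, help="commit message")

args = parser.parse_args(["commit", "-m", "Commit message"])
print(args.command, "|", args.message)  # → commit | Commit message
```

Each sub-command gets its own parser, so its options and help text stay separate from those of other sub-commands.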
I don't have strong preferences between the different systems. When there are few enough options, then single letters with mnemonic value are convenient. GNU supports this, but recommends backing it up with multi-letter options preceded by a double-dash.
There are some things I do object to. One of the worst is the same option letter being used with different meanings depending on what other option letters have preceded it. In my book, that's a no-no, but I know of software where it is done.
Another objectionable behaviour is inconsistency in style of handling arguments (especially for a single program, but also within a suite of programs). Either require attached arguments or require detached arguments (or allow either), but do not have some options requiring an attached argument and others requiring a detached argument. And be consistent about whether '=' may be used to separate the option and the argument.
As with many, many (software-related) things — consistency is more important than the individual decisions.
Whatever you do, please read the Command-Line Options section of TAOUP (The Art of Unix Programming) and consider the GNU Standards for Command Line Interfaces. (Added by J F Sebastian; thanks, I agree.)
