Bash line comment continuation

I have seen some bash/shell comments use the notation
# Some comment block that starts on line 1, but then
#+ continues on line 2 with this silly plus sign.
# And this is another comment line that is not related to the ones above
Does the "#+" help with any kind of parser (like how Doxygen-style comments are used to auto-generate documentation)?
Is this a common practice? I understand that it doesn't hurt anything to include/exclude it, as far as the actual script execution goes, but I'm curious if there are advantages to adopting this style of commenting.

According to the Advanced Bash-Scripting Guide, it looks like this is one of several comment headers one can use to improve clarity and legibility in scripts. This tidbit is presented in the "Assorted Tips" section of the guide:
Use special-purpose comment headers to increase clarity and legibility in scripts.
Here are several of the ones they list in the example block from the guide:
## Caution.
rm -rf *.zzy   ##  The "-rf" options to "rm" are very dangerous,
               ##+ especially with wild cards.

#+ Line continuation.
# This is line 1
#+ of a multi-line comment,
#+ and this is the final line.

#* Note.

#o List item.

#> Another point of view.
while [ "$var1" != "end" ]  #> while test "$var1" != "end"
Apparently some people find these little bits helpful, but I personally don't see much benefit in doing it.
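If you do adopt the style, the one concrete (if modest) payoff I can think of is mechanical rather than tool-driven: no parser or documentation generator I know of looks for "#+", but the marker does make continuation lines easy to pick out with ordinary text tools. A minimal sketch (example.sh is just a placeholder name):

# Only the continuation lines of multi-line comment blocks:
grep -E '^#\+' example.sh

# Opening "# " lines together with their "#+" continuations:
grep -E '^#(\+| )' example.sh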

Should I split a ruby single-line if statement into multi-line if statement because the line is long?

I have a Ruby single-line statement that is very long, about 200 characters. According to a Ruby style guide, a single-line if statement is favored here because the body is a single line.
address = Module::InnerModule::Class.new(long_address) if Module::Class.new(long_address).is_good?
But, 200 characters is way over the usual threshold for line length (which is usually 120 at most). Should I split the if statement into a multi-line statement in order to reduce the line length (or should I just accept that the line is long)?
if Module::Class.new(long_address).is_good?
  address = Module::InnerModule::Class.new(long_address)
end
Also, what happens if the line is still very long after splitting? I'm new to Ruby, so I would appreciate any advice on the best practice here.
Style questions aside, if you want to maintain your current semantics, you can break lines at certain keywords and operators without escaping newlines with backslashes. For example:
address =
  Module::InnerModule::Class.new(long_address) if
    Module::Class.new(long_address).is_good?
Otherwise, change your semantics or refactor your code to fit your desired line length and chosen style. Questions about how to split lines are answerable, but the "best" way to split, indent, or refactor is largely subjective, and mostly amounts to a combination of readability and intent.

Dangling topic (or any other) variable does not fail

This works:
$_ =
say "hi";
That is, you can put any amount of whitespace between the assignment and whatever follows it; the parser simply ignores it. You can use any variable (declared with my) too. Effectively, $_ will be assigned the result of the say, which is True.
Is this surprising, but up to spec, or simply surprising?
There may be any amount of whitespace either side of an operator. Thus:
say 1
+ 2
+ 3;
Is, so far as the compiler sees it, entirely the same as:
say 1 + 2 + 3;
Assignment (=) is just another operator, so also follows these rules.
Further, say is just a normal built-in subroutine, so it's just like:
my $answer = flip '24';
say $answer; # 42
Except with more whitespace:
my $answer =
flip '24';
say $answer; # 42
There are some places in Perl 6 where whitespace is significant, but the whitespace between infix operators is not one of them.
TL;DR P6 syntax is freeform.
Dangling topic (or any other) variable does not fail
The issue you describe is a very general one. It's most definitely not merely about variable declaration/assignment!
In P6, parsing of a statement -- a single imperative unit ("do this") -- generally just keeps on going until it reaches an explicit statement separator -- ; -- that brings a statement to its end, just like the period (aka full stop) at the end of this English sentence.
you can put any amount of whitespace
Like many programming languages, standard P6 is generally freeform. That is to say, wherever some whitespace is valid, then generally any amount of whitespace -- horizontal and vertical whitespace -- is syntactically equivalent.
$_ =
say "hi";
The above works exactly as would be expected if someone is applying the freeform principle -- it's a single statement assigning the value of the say to the $_ variable.
Is this surprising, but up to spec, or simply surprising?
I like inventing (hopefully pithy) sayings. I just invented "Surprise follows surmise".
It's up to spec. I know to expect it. It doesn't surprise me.
If someone embraces the fact that P6 is generally freeform and has semicolon statement separation then it will, I predict, (eventually -- likely quickly) stop being surprising.
The foregoing is a direct answer to your question. See also Jonathan's answer for more along the same lines. Feel free to ignore the rest of this answer.
For the rest of this answer I use "freeform" to refer to P6's combination of freeform syntax, semicolon statement separation, and braced blocks ({...}).
The rest of this answer is in three sections:
P6 exceptions to freeform syntax
Freeform vs line oriented
Freeform and line oriented?
P6 exceptions to freeform syntax
@Larry concluded that intuition, aesthetics, convenience, and/or other factors justified an exception from a pure freeform syntax in standard P6 in a few cases.
Statements may omit the trailing semicolon if they:
Are the last statement in a source file, or in a block;
End in a block whose closing curly is followed by a newline (ignoring comments).
Thus none of the three statements below (the if and two says) need a closing semicolon:
if 42 {
    say 'no semicolon needed before the closing curly'
}   # no semicolon needed after the closing curly

say 'no semicolon needed if last statement in source file'
Sometimes this may not be what's wanted:
{ ... } # the closing } ends the statement (block)
.() # call is invoked on $_
One way to change that is to use parentheses:
({ ... })
.() # calls the block on prior line
For some constructs spaces are either required or disallowed. For example, some postfixes must directly follow the values they apply to and some infixes must not. These are both syntax errors:
foo ++
foo«+»bar
Freeform vs line oriented
For some coding scenarios P6's freeform syntax is arguably a strong net positive, eg:
One liners can use blocks;
FP code is natural (one can write non-trivial closures);
More straight-forward editing/refactoring/copying/pasting.
But there are downsides:
The writing and reading overhead of freeform syntax -- semicolons and block braces.
Ignoring the intuition that presumably led you to post your question.
The latter is potent. All of the following could lead someone to think that the say in your example is part of a new statement that isn't a continuation of the $_ = statement:
The newline after the =;
The blank line after that;
The lack of an indent at the start of the say line relative to the $_ = line;
The nature of say (it might seem like say must be the start of a new statement).
An upshot of the above intuitions is that some programming languages adopt a "line-oriented" syntax rather than a freeform one, with the most famous being Python.[1]
Freeform and line oriented?
Some languages, eg Haskell, allow use of either line oriented or freeform syntax (at least for some language constructs).
P6 supports slangs, userland modules that mutate the language. Imagine a slang that supported both freeform and line-oriented code so:
Those learning P6 encountered more familiarity and fewer surprises as they learned the language's basics by focusing on either line-oriented or freeform code based on their preference;
Those familiar with P6 could write better code by using either line-oriented or freeform syntax.
At the risk of over-complicating things, imagine a slang that adopts not only line orientation but also the off-side rule that Python supports, and implements no strict; for untyped sigil-free variables (which drops declarators and sigils and promotes immutability). Here's a fragment of some code written in said imagined slang that I posted in a reddit comment a few weeks ago:
sub egg(bar)
    put bar

data = ["Iron man", "is", "Tony Stark"]
callbacks = []
Perhaps something like the above is too difficult to pull off? (I don't currently see why.)
Footnotes
[1] The remainder of this section compares P6 and Python using the Wikipedia section on Programming language statements as our guide:
A statement separator is used to demarcate boundaries between two separate statements.
In P6 it's ; or the end of blocks.
In Python ; is available to separate statements. But it's primarily line-oriented.
Languages that interpret the end of line to be the end of a statement are called "line-oriented" languages.
In Python a line end terminates a statement unless the next line is suitably indented (in which case it's the start of an associated sub-block) or an explicit line continuation character appears at the end of a line.
P6 is not line-oriented. (At least, not standard P6. I'll posit a P6 slang at the end of this answer that supports both freeform and line-oriented syntax.)
"Line continuation" is a convention in line-oriented languages [that] allows a single statement to span more than just one line.
Python has line continuation features; see the Wikipedia article for details.
Standard P6 also has line continuation features despite not being line-oriented.[2]
[2] P6 supports line continuation. Continuing with quotes from Wikipedia:
a newline normally results in a token being added to the token stream, unless line continuation is detected.
(A token is the smallest fragment of code -- beyond an individual character -- that's treated as an atomic unit by the parser.)
Standard P6 always assumes a token break if it encounters a newline with the sole exception of a string written across lines like this:
say
'The
quick
fox';
This will compile OK and display The, quick, and fox on three separate lines.
The equivalent in Python will generate a syntax error.
Backslash as last character of line
In P6 a backslash:
Cannot appear in the middle of a token;
Can be used to introduce whitespace in sourcecode that is ignored by the parser to avoid what would otherwise be a syntax error. For example:
say @foo\
»++
Is actually a more general concept of "unspace" that can be used anywhere within a line not just at the end:
say @foo\ »++
Some form of inline comment serves as line continuation
This works:
say {;}#`( An embedded
comment ).signature
An embedded comment:
Cannot appear in the middle of a token;
Isn't as general as a backslash (say @foo#`[...]»++ doesn't work).

check for permutation in bash

I have a script wherein you have to input a string with a length greater than or equal to 1 and less than 26.
If that's not the case I want to return an error, but that's the part I have already figured out:
lengthAlphabetInput=${#1}
if [ $lengthAlphabetInput -lt 1 ] || [ $lengthAlphabetInput -gt 26 ]
then
echo "error: key needs to be between 1 and 26 characters"
exit 1
fi
Other than that I would like to check if the input the user gave is a permutation of (a part of) the alphabet.
For example, if the user inputs "abc" I want to return an error "abc is not a permutation of the alphabet".
If the user inputs "xxxgsdnoip" I again want to return the same error, because I don't want the user to use the same letter more than once.
But the input "xyz" or "jhcwslaedmviotrgzxkbynpuqf" would be correct, because these are permutations of the alphabet (x instead of a, y instead of b and z instead of c).
Can anyone help me transform this idea into code?
I realized that this is a question raised by a student, so I did not write down a detailed answer, since the experience of reading the manuals and figuring it out yourself will really help you learn how to use bash (actually the GNU/BSD core utilities), as @binaryzebra said. What you should do is:
Learn to read manuals in bash, with the man command, such as man sort for the manual of the sort utility. Hit the Up/Down arrow keys or the PageUp/PageDown keys to scroll; hit q to exit. Reading manuals is your first step into the Unix world. Sure, you can skip this and find all the information on Google, but learning to read the manuals will do you more good in the long run.
Read the manual of sed and learn substitution with regular expressions. The manual is a little too long for a newcomer, but luckily you do not need to read it all; just scan it and find the part about substitution, and read the examples as well, if there are any. Practice with some test file. Now you know how to check whether the input contains only letters (instead of whitespace, symbols, etc.), as well as how to split each character onto its own line.
Read the manual of uniq. It has a much shorter manual; reading the whole manual won't take long.
Now learn the pipeline feature in bash. I cannot find a short and focused manual entry, so you may as well just read the online manual from GNU. With the help of pipeline, you can combine sed and uniq to detect duplicated characters.
By "permutation", it seems that you do not want the characters in their original order. If so, read the manual of the sort utility and think how it can help you.
You do not seem to care about whether all 26 letters are there. If this is the case, you probably do not need the wc (word count) utility, unless you require the subset of letters be continuous (such as "cdefg" instead of "cdhjk").
That's all the hints; good luck with your homework.
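If, after working through those manuals, you want something to check your own attempt against, here is one possible shape of the pipeline the hints describe. It covers only the duplicate-letter part, and it uses fold -w1 instead of a sed substitution purely for brevity; treat it as a sketch, not the answer:

#!/bin/bash
# Sketch: reject input that contains non-letters or repeats a letter.
input=$1

case $input in
    *[!a-z]*|'') echo "error: lowercase letters a-z only" >&2; exit 1 ;;
esac

# One character per line, sorted, then ask uniq for the duplicates.
dups=$(printf '%s' "$input" | fold -w1 | sort | uniq -d)

if [ -n "$dups" ]; then
    echo "error: $input is not a permutation of the alphabet" >&2
    exit 1
fi

The split-into-characters step could equally be done with the sed substitution described above.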
#!/usr/bin/perl
$_ = shift;                                   # the candidate key is the first argument
print "not ok:repeated: $1\n" if /(.).*\1/;   # any letter used more than once?
my $i = 0;
my @s = ( map { ord($_) - 97 != $i++ ? () : ($_) } split('') );   # letters left in their original alphabet position
print "not ok:samePlace: @s\n" if @s;
usage:
$ perl ex.pl rty
$ perl ex.pl abc
not ok:samePlace: a b c
$ perl ex.pl ddss
not ok:repeated: d

An obscure one: Documented VT100 'soft-wrap' escape sequence?

When connected to a remote BASH session via SSH (with the terminal type set to vt100), the console command line will soft-wrap when the cursor hits column 80.
What I am trying to discover is if the <space><carriage return> sequence that gets sent at this point is documented anywhere?
For example sending the following string
std::string str = "0123456789" // 1
                  "0123456789"
                  "0123456789" // 3
                  "0123456789"
                  "0123456789" // 5
                  "012345678 9"
                  "0123456789_" // 7
                  "0123456789"
                  "0";
gets the following response back from the host (Linux Mint as it happens)
01234567890123456789012345678901234567890123456789012345678<WS><WS><CR>90123456789_01234567890
The behaviour observed is not really part of bash; rather, it is part of the behaviour of the readline library. It doesn't happen if you simply use echo (which is a bash builtin) to output enough text to force an automatic line wrap, nor does it happen if bash produces an error message which is wider than the console. (Try, for example, the command . with an argument of more than 80 characters not corresponding to any existing file.)
So it's not an official "soft-wrap sequence", nor is it part of any standard. Rather, it's a pragmatic solution to one of the many irritating problems related to console display management.
There is an ambiguity in terminal implementation of line wrapping:
1. The terminal wraps after a character is inserted at the rightmost position.
2. The terminal wraps just before the next character is sent.
As a result, it is not possible to reliably send a newline after the last column position. If the terminal had already wrapped (option 1 above), then the newline will create an extra blank line. Otherwise (option 2), the following newline will be "eaten".
These days, almost all terminals follow some variant of option 2, which was the behaviour of the DEC VT-100 terminal. In the vocabulary of the terminfo terminal description database, this is called xenl: the "eat-newline-glitch".
There are actually two possible subvariants of option 2. In the one actually implemented by the VT-100 (and xterm), the cursor ends up in an anomalous state at the end of the line; effectively, it is one character position off the screen, so you can still backspace the cursor in the same line. Other historic terminals "ate" the newline, but positioned the cursor at the beginning of the next line anyway, so that a backspace would not be possible. (Unless the terminal has the bw capability.)
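If you want to see what your own terminal description claims, the boolean capabilities can be queried from a shell. This is purely illustrative, just for poking around; readline reads the terminfo entry itself rather than shelling out:

# tput with a boolean capability prints nothing and answers via its exit status.
if tput xenl; then
    echo "$TERM eats the newline after the last column (xenl is set)"
fi

# Or dump the entry one capability per line and look for the two mentioned above:
infocmp -1 "$TERM" | grep -wE 'xenl|bw'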
This creates a problem for programs which need to accurately keep track of the cursor position, even for apparently simple applications like echoing input. (Obviously, the easiest way to echo input is to let the terminal do that itself, but that precludes being able to implement extra control characters like tab completion.) Suppose the user has entered text right up to the right margin, and then types the backspace character to delete the last character typed. Normally, you could implement a backspace-delete by outputting a cub1 (move left 1) code and then an el (clear to end of line). (It's more complicated if the deletion is in the middle of a line, but the principle is the same.)
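Expressed with the capability names used above, that ordinary backspace-delete is roughly the following (illustrative only; a line editor emits the escape sequences directly rather than calling tput):

tput cub1    # move the cursor left one column
tput el      # clear from the cursor to the end of the line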
However, if the cursor could possibly be at the beginning of the next line, this won't work. If you knew the cursor was at the beginning of the next line, you could move up and then to the right before doing the el, but that wouldn't work if the cursor was still on the same line.
Historically, what was considered "correct" was to force the cursor to the next line with a hard return. (Following quote is taken from the file terminfo.src found in the ncurses distribution. I don't know who wrote it or when):
# Note that the <xenl> glitch in vt100 is not quite the same as on the Concept,
# since the cursor is left in a different position while in the
# weird state (concept at beginning of next line, vt100 at end
# of this line) so all versions of vi before 3.7 don't handle
# <xenl> right on vt100. The correct way to handle <xenl> is when
# you output the char in column 80, immediately output CR LF
# and then assume you are in column 1 of the next line. If <xenl>
# is on, am should be on too.
But there is another way to handle the issue which doesn't require you to even know whether the terminal has the xenl "glitch" or not: output a space character, after which the terminal will definitely have line-wrapped, and then return to the leftmost column.
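You can watch the trick by hand from a shell prompt; the row of x's is just scaffolding to reach the last column, whatever your terminal width happens to be:

cols=$(tput cols)               # current terminal width
printf 'x%.0s' $(seq "$cols")   # fill exactly one screen row
printf ' \r'                    # the space forces the wrap, the CR homes the cursor
printf 'next line\n'            # starts cleanly at column 1 of the following row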
As it turns out, this trick has another benefit if the terminal emulator is xterm (and probably other such emulators), which allows you to select a "word" by double-clicking on it. If the automatic line wrap happens in the middle of a word, it would be ideal if you could still select the entire word even though it is split over two lines. If you follow the suggestion in the terminfo file above, then xterm will (quite reasonably) treat the split word as two words, because they have an explicit newline between them. But if you let the terminal wrap automatically, xterm treats the result as a single word. (It does this despite the output of the space character, presumably because the space character was overwritten.)
In short, the SP CR sequence is not in any way a standardized feature of the VT100 terminal. Rather, it is a pragmatic response to a specific feature of terminal descriptions combined with the observed behaviour of a specific (and common) terminal emulator. Variants of this code can be found in a variety of codebases, and although as far as I know it is not part of any textbook or formal documentation, it is certainly part of terminal-handling folkcraft [note 2].
In the case of readline, you'll find a comment in the code which is much more telegraphic than this answer: [note 1]
/* If we're at the right edge of a terminal that supports xn, we're
ready to wrap around, so do so. This fixes problems with knowing
the exact cursor position and cut-and-paste with certain terminal
emulators. In this calculation, TEMP is the physical screen
position of the cursor. */
(xn is the short form of xenl.)
Notes
1. The comment is at line 1326 of display.c in the current view of the git repository as I type this answer. In future versions it may be at a different line number, and the provided link will therefore not work. If you notice that it has changed, please feel free to correct the link.
2. In the original version of this answer, I described this procedure as "part of terminal handling folklore", in which I used the word "folklore" to describe knowledge passed down from programmer to programmer rather than being part of the canon of academic texts and international standards. While "folklore" is often used with a negative connotation, I use it without such prejudice. "lore" (according to wiktionary) refers to "all the facts and traditions about a particular subject that have been accumulated over time through education or experience", and is derived from an Old Germanic word meaning "teach". Folklore is therefore the accumulated education and experience of the "folk", as opposed to the establishment: in Eric S. Raymond's analogy of the Cathedral and the Bazaar, folklore is the knowledge base of the Bazaar.
This usage raised the eyebrows of at least one highly-skilled practitioner, who suggested the use of the word "esoteric" to describe this bit of information about terminal-handling. "Esoteric" (again according to wiktionary) applies to information "intended for or likely to be understood by only a small number of people with a specialized knowledge or interest, or an enlightened inner circle", being derived from the Greek ἐσωτερικός, "inner circle". (In other words, the knowledge of the Cathedral.)
While the semantic discussion is, at least, amusing, I changed the text by using the hopefully less emotionally-charged word "folkcraft".
There is more than one reason for making line-wrapping a special case (and "folklore" seems an inappropriate term):
The xterm FAQ "That description of wrapping is odd, say more?" is one of many places discussing vt100 line-wrapping.
vim and screen both take care to not use cursor-addressing to avoid the wrapping, since that would interfere with selecting a wrapped line in xterm. Instead (and the sample seems to show bash doing this too) they send a series of printable characters which step across the margin before sending other control sequences which would prevent the line-wrapping flag from being set in xterm. This is noted in xterm's manual page:
Logical words and lines selected by double- or triple-clicking may wrap
across more than one screen line if lines were wrapped by xterm itself
rather than by the application running in the window.
As for "comments in code" - there certainly are, to explain to maintainers what should not be changed. This from Sven Mascheck's XTerm resource file gives a good explanation:
! Wether this works also with _wrapped_ selections, depends on
! - the terminal emulator: Neither MIT X11R5/6 nor Suns openwin xterm
! know about that. Use the 'xfree xterm' or 'rxvt'. Both compile on
! all major platforms.
! - It only works if xterm is wrapping the line itself
! (not always really obvious for the user, though).
! - Among the different vi's, vim actually supports this with a
! clever and little hackish trick (see screen.c):
!
! But before: vim inspects the _name_ of the value of TERM.
! This must be similar to "xterm" (like "xterm-xfree86", which is
! better than "xterm-color", btw, see his FAQ).
! The terminfo entry _itself_ doesn't matter here
! (e.g.: 'xterm' and 'vs100' are the same entry, but with
! the latter it doesn't work).
!
! If vim has to wrap a word, it appends a space at the first part,
! this space will be wrapped by xterm. Going on with writing, vim
! in turn then positions the cursor again at the _beginning_ of this
! next line. Thus, the space is not visible. But xterm now believes
! that the two lines are actually a single one--as xterm _has_ done
! some wrapping also...
The comment which @rici quotes came from the terminfo file which Eric Raymond incorporated from SCO in 1995. The history section of the terminfo source refers to this. Some of the material in that is based on the BSD termcap sources, but differs, as one would notice when comparing the BSD termcap in this section with ncurses. The four paragraphs beginning with the "not quite" are the same (aside from line-wrapping) with the SCO file. Here is a cut/paste from that file:
# # --------------------------------
#
# dec: DEC (DIGITAL EQUIPMENT CORPORATION)
#
# Manufacturer: DEC (DIGITAL EQUIPTMENT CORP.)
# Class: II
#
# Info:
# Note that xenl glitch in vt100 is not quite the same as concept,
# since the cursor is left in a different position while in the
# weird state (concept at beginning of next line, vt100 at end
# of this line) so all versions of vi before 3.7 don't handle
# xenl right on vt100. The correct way to handle xenl is when
# you output the char in column 80, immediately output CR LF
# and then assume you are in column 1 of the next line. If xenl
# is on, am should be on too.
#
# I assume you have smooth scroll off or are at a slow enough baud
# rate that it doesn't matter (1200? or less). Also this assumes
# that you set auto-nl to "on", if you set it off use vt100-nam
# below.
#
# The padding requirements listed here are guesses. It is strongly
# recommended that xon/xoff be enabled, as this is assumed here.
#
# The vt100 uses rs2 and rf rather than is2/tbc/hts because the
# tab settings are in non-volatile memory and don't need to be
# reset upon login. Also setting the number of columns glitches
# the screen annoyingly. You can type "reset" to get them set.
#
# smkx and rmkx, given below, were removed.
# smkx=\E[?1h\E=, rmkx=\E[?1l\E>,
# Somtimes smkx and rmkx are included. This will put the auxilliary keypad in
# dec application mode, which is not appropriate for SCO applications.
vt100|vt100-am|dec vt100 (w/advanced video),
If you compare the two, the ncurses version has angle brackets added around the terminfo capability names, and a minor grammatical change was made in the first sentence. But the author of the comment clearly was not Raymond.

Most reliable way to get text into ruby script

I have a ruby script that’ll do some text parsing (à la markdown). It does it in a sequence of steps, like
string = string.gsub # more code here
string = string.gsub # more code here
# and so on
What is the best (i.e. most reliable) way to feed text into string in the first place? It’s a script, and the text it’ll be fed can vary a lot: it can be multilingual, have some characters that might trip a shell (like ", ', ’, &, $; you get the idea), and will likely be multi-line.
Is there some trick on the lines of
cat << EOF
bunch of text here
EOF
Additional considerations
I’m not looking for a markdown parser, this is something I want to do, not something I want a tool for.
I’m not a big ruby user (I’m starting to use it), so the more detailed the answer you can provide, the better.
It must be completely scriptable (i.e., no interrupting to ask the user for information).
The Kernel#gets method will read a string, separated by the record separator, from stdin or from files specified on the command line. So if you use that you can do things like:
yourscript <filename #read from filename
yourscript file1 file2 # read both file1 and file2
yourscript #lets you type at your script
So to run something like:
cat <<'eof' |ruby yourscript.rb
This' & will $all 'eof' be 'fine'''
eof
Script might contain something like:
s = gets() # read a line
lines = readlines() # read all lines into an array
That's fairly standard for command-line scripts. If you want to have a user-interface then you'll want something more complex. There is an option to the Ruby interpreter to set the encoding of files as they are read.
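That interpreter option is -E, which sets the default external (and optionally internal) encoding; for example, to insist on UTF-8 regardless of locale:

ruby -EUTF-8 yourscript.rb file1 file2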
Just read from stdin (which is an IO object):
$stdin.read
As you can see, stdin is provided in the global variable $stdin. Since it’s an IO object, there are a lot of other methods available if read doesn’t suit your needs.
Here’s a simple one-line example in the shell:
$ echo "foo\nbar" | ruby -e 'puts $stdin.read.upcase'
FOO
BAR
Obviously reading from stdin is extremely flexible since you can pipe input in from anywhere.
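Tying this back to the heredoc idea in the question: pipe a quoted heredoc into the script and read it with $stdin.read (or gets/readlines as in the other answer). Because the delimiter is quoted, the shell performs no expansion on the body, so quotes, $ and & arrive untouched. parse.rb is just a placeholder name for your script:

cat <<'TEXT' | ruby parse.rb
Multi-line input with "double quotes", 'single quotes',
$dollar signs & ampersands passes through exactly as written,
because the delimiter ('TEXT') is quoted.
TEXT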
Ruby is very adept at encodings (see eg. Encoding docs). To get text into Ruby, one typically uses either gets, or reads File objects, or uses a GUI, which one can build with gtk2 gem or rugui (if already finished). In case you are getting texts from the wild internet, security should be your concern. Ruby used to have 4 $SAFE levels, but after some discussions, now there might be only 3 of them left. In any case, the best strategy to handle strings is to know as much as possible about the properties of the string that you expect in advance. Handling absolutely arbitrary strings is a surprisingly difficult task. Try to limit the number of possible encodings and figure the maximum size for the string that you expect.
Also, with respect to your originally stated goal of writing a markdown-processor-like something, you might not want to reinvent the wheel (unless it is for didactic purposes). There is this SO post:
Better ruby markdown interpreter?
The answer will direct you to kramdown gem, which gets a lot of praise, though I have not tried it personally.
