Alternative for-loop construct - bash

General comment: any new answer that gives a new and useful insight into this question will be rewarded with a bonus.
The Bash reference manual mentions that Bash supports the
following for-loop constructs:
for name [ [in [words ...] ] ; ] do commands; done
for (( expr1 ; expr2 ; expr3 )) ; do commands ; done
Surprisingly, the following for-loop constructs are also valid:
for i in 1 2 3; { echo $i; }
for ((i=1;i<=3;++i)); { echo $i; }
These unusual constructs are not documented at all. Neither the Bash
manual, the Bash man pages, nor The Linux Documentation
Project makes any mention of these constructs.
When investigating the language grammar, one can see that using
open-and-close braces ({ commands; }) as an alternative to do commands; done is a valid construct, implemented for both
for-loops and select statements, and dating back to Bash-1.14.7
[1].
The other two loop-constructs:
until test-commands; do consequent-commands; done
while test-commands; do consequent-commands; done
do not have this alternative form.
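Both claims are easy to check from an interactive shell (assuming bash is available as bash on the PATH):

```shell
# The braced form parses fine for `for`:
bash -c 'for i in 1 2 3; { echo "$i"; }'

# ...but not for `while`; bash rejects it as a syntax error,
# since it keeps reading test-commands until it sees `do`:
bash -c 'while true; { break; }' 2>/dev/null || echo 'while: no braced form'
```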
Since many shell languages are related, one finds that these
constructs are also defined there, and mildly documented. The KSH manual mentions:
For historical reasons, open and close braces may be used instead of do and done e.g.
for i; { echo $i; }
ZSH, meanwhile, implements and documents similar alternatives for the other loop constructs, but with limitations. It states:
For the if, while and until commands, in both these cases the
test part of the loop must also be suitably delimited, such as by
[[ ... ]] or (( ... )), else the end of the test will not be recognized.
Question: What is the origin of this construct and why is
this not propagated to the other loop-constructs?
Update 1: There are some very useful and educational comments below
this post pointing out that this is an undocumented Bourne Shell feature which seems to be the result of a C-vs-sh language battle in the early days.
Update 2: When asking the question Why is this language feature not documented? on the GNU Bash mailing list, I received the following answer from Chet Ramey (the current lead developer of GNU Bash):
It's never been documented. The reason bash supports it (undocumented) is
because it was an undocumented Bourne shell feature that we implemented
for compatibility. At the time, 30+ years ago, there were scripts that used
it. I hope those scripts have gone into the dustbin of history, but who
knows how many are using this construct now.
I'm going to leave it undocumented; people should not be using it anyway.
Related questions/answers:
A bash loop with braces?
Hidden features of Bash (this answer)
[U&L] What is the purpose of the “do” keyword in Bash for loops?
Footnotes: [1] I did not find earlier versions; I do believe it predates this version.

[W]hy is this not propagated to the other loop-constructs?
Braced forms of the while and until commands would be syntactically ambiguous, because you can't separate test-commands from consequent-commands without a distinctive delimiter between them: both are defined by POSIX to be compound lists.
For example, a shell that supports such constructs can choose either one of the brace groups in the command below as consequent-commands and either way it would be a reasonable choice.
while true; { false; }; { break; }
Because of its ambiguous form, this command can be translated to either of the below; neither is a more accurate translation than the other, and they do completely different things.
while true; do
false
done
break
while true; { false; }; do
break
done
The for command is immune to this ambiguity because its first part—a variable name optionally followed by in and a list of words, or a special form of the (( compound command—can easily be distinguished from the brace group that forms its second part.
Given that we already have a consistent syntax for while and until commands, I don't really see any point in propagating this alternate form to them.
Regarding its origin, see:
Characteristical common properties of the traditional Bourne shells,
Stephen Bourne's talk at BSDCon,
Unix v7 source code, sh/cmd.c.

Bash 4.2.46 getopts concatenated short options and resetting OPTIND during processing leads to an infinite loop: How can this be remedied in Bash?

I'm using the following Bash version in an up-to-date CentOS 7 VM:
GNU bash, version 4.2.46(2)-release (x86_64-redhat-linux-gnu)
The following code performs as expected (take note of -x -y):
set -- -x -y; OPTIND=1; while getopts xy opt; do echo $opt; OPTIND=$OPTIND; done
x
y
However, when I combine the two short options to -xy, an infinite loop happens:
set -- -xy; OPTIND=1; while getopts xy opt; do echo $opt; OPTIND=$OPTIND; done
x
x
x ... infinite output
The trigger is the OPTIND=$OPTIND assignment. If it is removed, the behavior doesn't happen. It feels like there is some hidden substring indexing going on:
-xy isn't just index $1 that OPTIND could describe as a single integer
it feels like index ${1:1:1} for the x
and ${1:2:1} for the y
Perhaps these are indicated by other getopts parameters not described in the man page. Can anyone shed any light on this that might help me resolve this one issue when writing my wrapper to allow for partial argument/nested getopts handling?
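A small probe makes that hidden position visible (behavior observed with bash; OPTIND only advances once a whole word has been consumed):

```shell
bash -c '
set -- -xy
OPTIND=1
while getopts xy opt; do
    # While getopts is still inside the word -xy, OPTIND stays at 1;
    # it moves to 2 only after the last option character in the word.
    echo "$opt OPTIND=$OPTIND"
done'
```

So when the loop body assigns OPTIND=$OPTIND after parsing x, it assigns the value 1, which is exactly the specified reset, hence the infinite loop.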
For the curious: I've implemented some wrapper functions for nested getopts processing. These allow for the partial processing of arguments during which functions might be called that also use the wrapper functions to do getopts processing. I save the OPTIND values in a stack array variable and as one pops out of a nesting, the OPTIND needs to be reset. It all works quite nicely except for the case where one uses concatenated short flag arguments. (The implementation also makes specifying long options possible.)
How can this be remedied in Bash?
Well, by patching the sources. But I wouldn't like to do that; I believe the current behavior is the right one.
You could add an additional function to getopt.c and expose it via some special variable in variables.c that would allow manipulating getopts internal state.
Or even simpler: there is a getopts.def loadable builtin which you could patch to add some additional option to serialize/deserialize the getopts state.
And you can also provide your own implementation of getopts as a bash function with your custom semantics and custom state serializer/deserializer.
Can anyone shed any light on
From posix getopts:
If the application sets OPTIND to the value 1, a new set of parameters can be used: either the current positional parameters or new arg values. Any other attempt to invoke getopts multiple times in a single shell execution environment with parameters (positional parameters or arg operands) that are not the same in all invocations, or with an OPTIND value modified to be a value other than 1, produces unspecified results.
From that we know:
setting OPTIND=1 resets the internal state of getopts
it is not specified what should happen if you modify OPTIND.
The behavior you are seeing is documented: because $OPTIND is equal to 1 while getopts is still inside a word, the assignment OPTIND=$OPTIND sets OPTIND to 1 and thereby resets getopts, which results in an endless loop, as one would expect. Beyond that, the bash documentation does not specify what should happen when you modify OPTIND. Do not do it. Your expectation that setting OPTIND to a custom value will affect getopts in specific ways is not based on anything. It will not.
resolve this one issue when writing my wrapper to allow for partial argument/nested getopts handling?
If you are writing your own argument-parsing module, do not use getopts, and do not depend on undefined, unspecified, or implementation-defined behavior. I suggest doing it the same way GNU getopt does: produce a shell-sourceable string in a separate sub-process, instead of relying on the global OPT* variables and contributing to spaghetti code.
Do not nest getopts: it is not re-entrant, there is no way to affect its internal state, and it uses global variables. getopts sets OPTIND; it is not required to read it, except for the case where OPTIND is reset to 1, in which case getopts is reset. Any other value may simply be ignored. Just call one getopts after another.
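A minimal sketch of that sequential pattern, using only the reset POSIX specifies (OPTIND=1):

```shell
# First pass over one set of parameters:
set -- -x -y
OPTIND=1
while getopts xy opt; do echo "first:$opt"; done

# Second, independent pass: reset OPTIND to 1 and supply new parameters.
set -- -a
OPTIND=1
while getopts a opt; do echo "second:$opt"; done
```

Each pass starts from a clean state, so no hidden position within a word ever leaks from one pass into the next.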

Write a bash function that mimics the 'return' builtin

I'm trying to write a bash function that would do the equivalent of the return builtin; it could be used like this:
f() {
echo a
my_return 15
echo c
return 17
}
It would have to behave exactly as the return builtin is expected to work.
The context is that I'm looking into the depths of bash for fun and experiments, mainly to see if we can implement some higher-language constructs and notions such as exceptions, continuations, or similar concepts in bash.
I actually managed to implement the breaking of the instruction flow by using a DEBUG trap returning a value of 2 (as explained in the extdebug part of the bash manual, this simulates a return in the current function, but doesn't set its return value). The only problem is that this return value of 2 is then passed as the return value of the whole function, thus removing the possibility of setting it to an arbitrary value.
Using the RETURN trap did not seem to work either. It seems we can't set the return value of the function in any way. I have already achieved some pretty impressive flow-control feats thanks to the DEBUG/RETURN traps, and the goal doesn't seem so far off anymore.
So would you know of a way to achieve this implementation of my_return in pure bash, with the current implementation? If not, what minimal modification would you suggest to the bash implementation to allow this? (I'm thinking of patches, maybe to the run_{debug,return}_trap functions in trap.c, allowing the return value of the returning function to be set, or maybe adding a custom bash builtin, as bash seems to be quite pluggable here.)
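For illustration, here is a sketch of the DEBUG-trap approach described above (the names my_return, MY_RETURN_WANTED and MY_RETURN_FLAG are made up for this sketch, and it assumes the extdebug semantics as documented). It demonstrates exactly the limitation in question: echo c is skipped, but f's status comes out as the trap's 2 rather than the wanted 15:

```shell
bash <<'EOF'
shopt -s extdebug
# my_return only records the wanted status; the DEBUG trap does the jump.
my_return() { MY_RETURN_WANTED=$1; MY_RETURN_FLAG=1; }
# With extdebug, a DEBUG trap whose status is 2 simulates `return`
# from the function about to execute its next command.
trap 'if [ "${MY_RETURN_FLAG:-0}" = 1 ]; then MY_RETURN_FLAG=0; (exit 2); fi' DEBUG
f() { echo a; my_return 15; echo c; }
f; echo "f returned $?"
EOF
```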

Should I use "test" or "[" "]" in POSIX shell?

I believe both of the following code snippets are valid in POSIX compliant shell:
Option 1:
if [ "$var" = "dude" ]
then
echo "Dude, your var equals dude."
fi
Option 2:
if test "$var" = "dude"
then
echo "Dude, your var equals dude."
fi
Which syntax is preferred and why? Is there a reason to use one over the other in certain situations?
There is no functional difference, making this a purely stylistic choice with no widely accepted guidelines. The bash-hackers wiki has an extended section on classic (POSIX-compliant) test, with a great deal of attention to best practices and pitfalls, and takes no position on which to prefer.
Moreover, the POSIX specification for test -- while it does mark a great deal of functionality obsolescent[1] -- specifies neither form as preferred over the other.
That said, one advantage to test is that it's less conducive to folks bringing in expectations from other languages which result in broken or buggy code. For instance, it's a common error to write [$foo=1] rather than the correct [ "$foo" = 1 ], but folks aren't widely seen to write test$foo=1: It's more visually obvious that test "$foo" = 1 is following the same parsing rules as other shell commands, and thus requires the same care regarding quoting and whitespace.
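Both spellings invoke the same builtin, with [ merely requiring a trailing ]; a quick check:

```shell
foo=1
# The two forms are interchangeable as long as [ gets its closing ]:
if [ "$foo" = 1 ]; then echo bracket; fi
if test "$foo" = 1; then echo test; fi
```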
[1] Such as -a, -o, ( and ), and any usage with more than four arguments (excluding the trailing ] on an instance started under the name [).

printf column alignment issue

Can someone help me understand printf's alignment function? I have tried reading several examples on Stack and in general Google results, and I'm still having trouble understanding its syntax. Here is essentially what I'm trying to achieve:
HOLDING 1.1.1.1 Hostname Potential outage!
SKIPPING 1:1:1:1:1:1:1:1 Hostname Existing outage!
I'm sorry, I know this is more of a handout than my usual questions; I really don't know how to start here. I have tried using echo -e "\t" in the past, which works for horizontal placement but not alignment. I have also incorporated a much more complex tput cup solution using a for loop, but that will not work easily in this situation.
I just discovered printf's capability, though, and it seems like it will do what I need, but I don't understand the syntax. Maybe something like this?
A="HOLDING"
B="1.1.1.1"
C="Hostname"
D="Potential outage"
for (( j=1; j<=10; j++ )); do
    printf "%-10s" "$A" "$B" "$C" "$D"   # quote so "Potential outage" stays one field
    printf "\n"                          # echo "\n" prints a literal \n; printf interprets it
done
Those variables would be fed in from a database, though. I still don't really understand the printf syntax. Please help!
* ALSO *
Off-topic question: what is your incentive for responding? I'm fairly new to Stack Exchange. Do some of you get anything out of it other than reputation? Careers 2.0, or something else? Some people have ridiculous stats on this site. Just curious what the drive is.
The string %-10s can be broken up into multiple parts:
% introduces a conversion specifier, i.e. how to format an argument
- specifies that the field should be left aligned.
10 specifies the field width
s specifies the data type, string.
Bash printf format strings mimic those of the C library function printf(3), and this part is described in man 3 printf.
Additionally, when Bash's printf is given more arguments than conversion specifiers, it reuses the format string until all arguments are consumed, so that printf "%-10s" foo bar is equivalent to printf "%-10s" foo; printf "%-10s" bar. This is what lets you pass all the values in the same command, with %-10s applying to each of them.
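Putting this together for the sample data (the column widths 10, 18 and 10 are arbitrary choices, sized to the longest expected value in each column):

```shell
# Four left-aligned columns per line; the last column gets no padding.
printf '%-10s %-18s %-10s %s\n' HOLDING  1.1.1.1         Hostname "Potential outage!"
printf '%-10s %-18s %-10s %s\n' SKIPPING 1:1:1:1:1:1:1:1 Hostname "Existing outage!"
```

Since each call supplies exactly four arguments for four specifiers, the format is applied once per line; a script feeding rows from a database would simply run this printf inside its read loop.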
As for people's motivation, you could try the meta site, which is dedicated to questions about Stack Overflow itself.

Why are bash variables 'different'?

Is there some reason why bash 'variables' are different from variables in other 'normal' programming languages?
Is it due to the fact that they are set by the output of previous programs, or that they have to be set by some kind of literal text, i.e. by the output of some program or something outputting text through standard input/output or the console?
I am at a loss for the right vocabulary, but can anyone who understands what I am trying to say perhaps use the right words, or point me to some docs where I can understand bash variable concepts better?
In most languages, variables can contain different kinds of values. For example, in Python a variable can be a number that you can do arithmetic on (a-1), an array or string you can split (a[3:]), or a custom, nested object (person.name.first_name).
In bash, you can't do any of this directly. If I understood you right, you asked why this is.
There are two reasons why you can't really do the same in bash.
One: environment variables are (conventionally) simple key=value strings, and the original sh was a pretty thin wrapper on top of the Unix process model. Bash works the same, for technical and compatibility reasons. Since all variables are (based on) strings, you can't really have rich, nested types.
This also means that you can't set a variable in a subshell/subscript you call. The variable won't be set in the parent script, because that's not how environment variables work.
Two: the original sh didn't separate code and data, since this makes it easier to work with interactively. Sh treated all non-special characters as literal, i.e. find / -name foo was considered four literal strings: a command and three arguments.
Bash can't just decide that find / -name now means "the value of the variable find divided by the negated value of variable name", since that would mean everyone's find commands would start breaking. This is why you can't have the simple dereferencing syntax other languages do.
Even $name-1 can't be used to subtract, because it could just as easily be intended as part of $name-1-12-2012.tar.gz, a filename with a timestamp.
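This is easy to see interactively; subtraction needs the explicit arithmetic context $(( )):

```shell
name=5
echo "$name-1"       # expands to the literal text: 5-1
echo "$((name-1))"   # explicit arithmetic context: 4

name=archive
echo "$name-1-12-2012.tar.gz"   # just a filename: archive-1-12-2012.tar.gz
```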
I would say it has to do with Bash functions. Bash functions cannot return a value, only a status code.
So with Bash you can have a function
foo ()
{
grep bar baz
}
But if you try to "save" the return value of the function
quux=$?
It is merely saving the exit status, not any value. Contrast this with a language such as JavaScript, where functions can actually return values.
function foo() {
    return document.getElementById("dog").getAttribute("cat");
}
and save like this
quux = foo();
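In bash itself, the conventional way to get a value out of a function is to write it to stdout and capture it with command substitution; a minimal sketch:

```shell
foo() {
    # "Return" a value by printing it to stdout.
    printf '%s\n' "cat"
}

quux=$(foo)   # command substitution captures the output, analogous to quux = foo();
echo "$quux"
```

The exit status is still available separately in $? for signaling success or failure.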
