What does the POSIX spec mean when it says this is necessary to avoid ambiguity? - bash

When responding to this comment:
Now I get that the two ":"s are independent, and that's why I couldn't find any document about them. Is the first one needed in this case?
I noticed this paragraph in the spec for the first time:
In the parameter expansions shown previously, use of the <colon> in the format shall result in a test for a parameter that is unset or null; omission of the <colon> shall result in a test for a parameter that is only unset. If parameter is '#' and the colon is omitted, the application shall ensure that word is specified (this is necessary to avoid ambiguity with the string length expansion).
I've seen the matching explanation in the bash reference manual:
When not performing substring expansion, using the form described below (e.g., ‘:-’), Bash tests for a parameter that is unset or null. Omitting the colon results in a test only for a parameter that is unset. Put another way, if the colon is included, the operator tests for both parameter’s existence and that its value is not null; if the colon is omitted, the operator tests only for existence.
before and I understand what the difference is with the colon versions of these expansions.
What confused me just now is this sentence from the spec:
If parameter is '#' and the colon is omitted, the application shall ensure that word is specified (this is necessary to avoid ambiguity with the string length expansion).
I don't understand what ambiguity is possible here if word is unspecified.
None of the expansion sigils are valid in shell variable names so they cannot possibly start a single-character variable name. If they could then using a parameter of # would always be ambiguous without a colon since you could never tell if ${#+foo} meant the length of the variable foo or an alternate expansion on #, etc.
What am I missing here? What ambiguity requires ensuring that word exist? (I mean not having word in this expansion is clearly not useful but that's not the same thing.)

- is also a shell special parameter, whose value is a string indicating which shell options are currently set. For example,
$ echo $-
himBH
${#parameter} is the syntax for the length of a parameter.
$ foo=bar
$ echo ${#foo}
3
The expression ${#-} is therefore ambiguous: is it the length of the value of $-, or does it expand to the empty string if $# is unset? (Unlikely, since $# is always an integer and cannot be unset, but syntactically legal.) I interpret the spec to mean that ${#-} should resolve the ambiguity by expanding to the length of $- (which is what most shells seem to do).
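As a rough illustration of the two readings, the sketch below compares ${#-} with an unambiguous length computation and with the form that does supply a word. The variable names (flags, len_flags, ambiguous, with_word) are made up for this example, and the comment on ${#-} describes what bash does; other shells may behave differently.

```shell
set -- a b c             # three positional parameters, so $# is 3

flags=$-                 # save the current option flags
len_flags=${#flags}      # unambiguous: length of the saved value of $-
ambiguous=${#-}          # bash reads this as the length of $-
with_word=${#-fallback}  # word is present: "use default" applied to $#,
                         # which is set, so this expands to 3

echo "len_flags=$len_flags ambiguous=$ambiguous with_word=$with_word"
```

Once a word follows the -, only the "default value" reading is possible, which is why the spec asks the application to supply one.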

Related

Bash - Why does $VAR1=FOO or 'VAR=FOO' (with quotes) return command not found?

For each of two examples below I'll try to explain what result I expected and what I got instead. I'm hoping for you to help me understand why I was wrong.
1)
VAR1=VAR2
$VAR1=FOO
result: -bash: VAR2=FOO: command not found
In the second line, $VAR1 gets expanded to VAR2, but why does Bash interpret the resulting VAR2=FOO as a command name rather than a variable assignment?
2)
'VAR=FOO'
result: -bash: VAR=FOO: command not found
Why do the quotes make Bash treat the variable assignment as a command name?
Could you please describe, step by step, how Bash processes my two examples?
How best to indirectly assign variables is adequately answered in other Q&A entries in this knowledgebase. Among those:
Indirect variable assignment in bash
Saving function output into a variable named in an argument
If that's what you actually intend to ask, then this question should be closed as a duplicate. I'm going to make a contrary assumption and focus on the literal question -- why your other approaches failed -- below.
What does the POSIX sh language specify as a valid assignment? Why does $var1=foo or 'var=foo' fail?
Background: On the POSIX sh specification
The POSIX shell command language specification is very specific about what constitutes an assignment, as quoted below:
4.21 Variable Assignment
In the shell command language, a word consisting of the following parts:
varname=value
When used in a context where assignment is defined to occur and at no other time, the value (representing a word or field) shall be assigned as the value of the variable denoted by varname.
The varname and value parts shall meet the requirements for a name and a word, respectively, except that they are delimited by the embedded unquoted equals-sign, in addition to other delimiters.
Also, from section 2.9.1, on Simple Commands, with emphasis added:
The words that are recognized as variable assignments or redirections according to Shell Grammar Rules are saved for processing in steps 3 and 4.
The words that are not variable assignments or redirections shall be expanded. If any fields remain following their expansion, the first field shall be considered the command name and remaining fields are the arguments for the command.
Redirections shall be performed as described in Redirection.
Each variable assignment shall be expanded for tilde expansion, parameter expansion, command substitution, arithmetic expansion, and quote removal prior to assigning the value.
Also, from the grammar:
If all the characters preceding '=' form a valid name (see the Base Definitions volume of IEEE Std 1003.1-2001, Section 3.230, Name), the token ASSIGNMENT_WORD shall be returned. (Quoted characters cannot participate in forming a valid name.)
Note from this:
The command must be recognized as an assignment at the very beginning of the parsing sequence, before any expansions (or quote removal!) have taken place.
The name must be a valid name. Literal quotes are not part of a valid variable name.
The equals sign must be unquoted. In your second example, the entire string was quoted.
Assignments are recognized before tilde expansion, parameter expansion, command substitution, etc.
Why $var1=foo fails to act as an assignment
As given in the grammar, all characters before the = in an assignment must be valid characters within a variable name for an assignment to be recognized. $ is not a valid character in a name. Because assignments are recognized in step 1 of simple command processing, before expansion takes place, the literal text $var1, not the value of that variable, is used for this matching.
Why 'var=foo' fails to act as an assignment
First, all characters before the = must be valid in variable names, and ' is not valid in a variable name.
Second, an assignment is only recognized if the = is not quoted.
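The two failure modes can be reproduced without stopping the script, as a minimal sketch. The names name, target, expanded_status and quoted_status are chosen only for this illustration; the exit status for a command that is not found is expected to be 127.

```shell
name=target
target=one               # literal name, unquoted "=": a real assignment

expanded_status=0
quoted_status=0

# The word starts with "$", so it is not recognized as an assignment.
# After expansion the shell looks for a command named "target=two".
$name=two 2>/dev/null || expanded_status=$?

# The "=" is quoted, so again no assignment is recognized and the shell
# looks for a command named "target=three".
'target=three' 2>/dev/null || quoted_status=$?

echo "target=$target expanded=$expanded_status quoted=$quoted_status"
```

The value of target stays at one, because neither of the later lines was ever treated as an assignment.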
1)
VAR1=VAR2
$VAR1=FOO
You want to use a variable name contained in a variable for the assignment. Bash syntax does not allow this. However, there is an easy workaround:
VAR1=VAR2
declare "$VAR1"=FOO
It works with local and export too.
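A minimal sketch of the export variant, assuming it behaves like the declare form except that the variable is also exported to child processes. The argument is an ordinary word that is expanded before export sees it, which is why the name can come from a variable here.

```shell
VAR1=VAR2

# "$VAR1=FOO" expands to the single word VAR2=FOO, which export then
# processes as a name=value argument.
export "$VAR1=FOO"

echo "$VAR2"
```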
2)
By using single quotes (double quotes would yield the same result), you are telling Bash that what is inside is a string and to treat it as a single entity. Since it is the first item on the line, Bash tries to find an alias, or shell builtin, or an executable file in its PATH, that would be named VAR=FOO. Not finding it, it tells you there is no such command.
An assignment is not a normal command. To perform an assignment contained in a quoted string, you would need to use eval, like so:
eval "$VAR1=FOO" # But please don't do that in real life
Most experienced bash programmers would probably tell you to avoid eval, as it has serious drawbacks; I am giving it as an example only to recommend against its use. In the example above it does not involve any security risk or error potential, because the value of VAR1 is known and safe, but there are many cases where an arbitrary (i.e. user-supplied) value could cause a crash or unexpected behavior. Quoting inside an eval statement is also more difficult and reduces readability.
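To make the risk concrete, here is a sketch of a guard that refuses anything that is not a plain variable name before it reaches eval. assign_checked is a hypothetical helper written for this illustration, not a shell builtin, and the other names (greeting, rejected, injected) are likewise invented.

```shell
# assign_checked NAME VALUE  (hypothetical helper)
assign_checked() {
  case $1 in
    '' | [!A-Za-z_]* | *[!A-Za-z0-9_]*) return 1 ;;  # not a valid name
  esac
  # \$2 stays unexpanded in the string, so eval sees e.g.  greeting=$2
  eval "$1=\$2"
}

assign_checked greeting "hello world"
echo "$greeting"

# Without the check, eval would run "x" and then assign to "injected".
rejected=no
assign_checked 'x; injected=yes' FOO || rejected=yes
echo "rejected=$rejected"
```

Even with a check like this, the declare form avoids the quoting problems entirely, so this is only meant to show what an unchecked value could do.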
You declare VAR2 earlier in the program, right?
If you are trying to assign the value of VAR2 to VAR1, then you need to make sure and use $ in front of VAR2, like so:
VAR1=$VAR2
That will set VAR1 to the value of VAR2, because the $ tells the shell to substitute the value stored in the variable. Without it, VAR2 is just the literal string VAR2.
Basically, a word that doesn't have a $ in front of it is taken literally, and a literal word in command position is interpreted as a command name. That's why we have the $ to say "hey, this is a variable".

Why does field splitting not occur after parameter expansion in an assignment statement in shell?

Consider the following three commands.
$ a="foo bar"
$ b=$a
$ b=foo bar
bash: bar: command not found
Why does the second assignment work fine? How is the second command any different from the third command?
I was hoping the second assignment to fail because
b=$a
would expand to
b=foo bar
Since $a is not within double-quotes, foo bar is not quoted, therefore field-splitting should occur (as per my understanding) which would result in b=foo to be considered an assignment and bar to be a command that cannot be found.
Summary: I was expecting the 2nd command to fail for the same reason that caused the 3rd command to fail. Why does the 2nd command succeed?
I went through the POSIX specification but I am unable to find anything that specifies that field splitting won't occur after parameter expansion in an assignment.
I mean anywhere else field splitting would occur for an unquoted parameter after parameter expansion. For example,
$ a="foo bar"
$ printf "[%s] [%s]\n" $a
[foo] [bar]
See Section 2.6.5.
After parameter expansion (Parameter Expansion), command substitution (Command Substitution), and arithmetic expansion (Arithmetic Expansion), the shell shall scan the results of expansions and substitutions that did not occur in double-quotes for field splitting and multiple fields can result.
So which part of the POSIX standard prevents field splitting when parameter expansion occurs in an assignment statement?
In 2.9.1, "Simple Commands":
The words that are recognized as variable assignments or redirections according to Shell Grammar Rules are saved for processing in steps 3 and 4.
Step 2 -- which is explicitly skipped in this case per the above text -- reiterates that it ignores assignments when performing expansion and field splitting:
The words that are not variable assignments or redirections shall be expanded. If any fields remain following their expansion, the first field shall be considered the command name and remaining fields are the arguments for the command.
Thus, it's step 2 that determines the command to run (based on contents other than variable assignments and redirections), which addresses the b=$a case given in your question.
Step 4 performs other expansions -- "tilde expansion, parameter expansion, command substitution, arithmetic expansion, and quote removal" -- for assignments. Notably, field splitting is not a member of this set. Indeed, it's explicit in 2.6 that none of these create multiple words in and of themselves:
Tilde expansions, parameter expansions, command substitutions, arithmetic expansions, and quote removals that occur within a single word expand to a single field. It is only field splitting or pathname expansion that can create multiple fields from a single word. The single exception to this rule is the expansion of the special parameter '@' within double-quotes, as described in Special Parameters.
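The difference between the two contexts can be checked directly. This is a small sketch assuming the default IFS; the variable count is introduced only for the illustration.

```shell
a="foo bar"

# Assignment context: parameter expansion happens, field splitting does
# not, so b receives the whole string.
b=$a

# Command-argument context: the unquoted expansion is split into fields.
set -- $a
count=$#

echo "b=[$b] count=$count"
```

Here b holds "foo bar" as a single value, while the same unquoted expansion used as arguments produces two fields.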

unix: double quote and $ is removing exactly one digit

In the POSIX standard there are some definitions for the behavior of characters inside double quotes ("").
There are different expansions that are taking place on such characters. But one behavior I'm unable to find a representation/description of in the standard/internet is this:
// Longest possible Name of a Variable, : is not a valid character in the name of a variable
~>echo "$aa:"
:
// The first character of name is not a digit
~>4=test
error 4=test is not a directory (or a similar error message)
// So this can't fall under parameter expansion?
~>echo "$4a:"
a:
// Hu?
~>echo "$44a:"
4a:
Excerpts of the possible standard sections:
2.6.2 Parameter Expansion
If the parameter name or symbol is not enclosed in braces, the expansion shall use the longest valid name (see the Base Definitions volume of IEEE Std 1003.1-2001, Section 3.230, Name), whether or not the symbol represented by that name exists.
3.230 Name
In the shell command language, a word consisting solely of underscores, digits, and alphabetics from the portable character set. The first character of a name is not a digit.
Environment is a Fedora 19 64-bit standard terminal without any modifications.
Thanks in advance for clarification what's going on.
Refer to the section on positional parameters:
2.5.1 Positional Parameters
A positional parameter is a parameter denoted by the decimal value
represented by one or more digits, other than the single digit 0. The
digits denoting the positional parameters shall always be interpreted
as a decimal value, even if there is a leading zero. When a positional
parameter with more than one digit is specified, the application shall
enclose the digits in braces (see Parameter Expansion). Positional
parameters are initially assigned when the shell is invoked (see sh),
temporarily replaced when a shell function is invoked (see Function
Definition Command), and can be reassigned with the set special
built-in command.
When you say:
echo "$4a:"
the shell attempts to expand a positional parameter, namely $4 and concatenates a: with the expansion of the parameter.
Similarly, for
echo "$44a:"
4a: is concatenated to the expansion of $4.
Note that if you actually wanted to refer to a positional parameter $44, you'd need to say ${44} else the shell would concatenate 4 to the expansion of $4.
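A short sketch of the braced and unbraced forms, with four positional parameters set by hand (the values p1 to p4 and the variable names are placeholders for this example):

```shell
set -- p1 p2 p3 p4

unbraced="$44a:"    # $4 followed by the literal text "4a:"
braced="${44}a:"    # positional parameter 44, which is unset here

echo "$unbraced"
echo "$braced"
```

The first line prints p44a: and the second prints a:, since there is no 44th positional parameter.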

How does this bash code detect an interactive session?

Following some issues with scp (it did not like the presence of the bash bind command in my .bashrc file, apparently), I followed the advice of a clever guy on the Internet (I just cannot find that post right now) who put this at the top of his .bashrc file:
[[ ${-#*i} != ${-} ]] || return
in order to make sure that the bash initialization is NOT executed unless the session is interactive.
Now, that works. However, I am not able to figure how it works. Could you enlighten me?
According to this answer, the $- is the current options set for the shell and I know that the ${} is the so-called "substring" syntax for expanding variables.
However, I do not understand the ${-#*i} part. And why $-#*i is not the same as ${-#*i}.
${parameter#word}
${parameter##word}
The word is expanded to produce a pattern just as in filename
expansion. If the pattern matches the
beginning of the expanded value of parameter, then the result of the
expansion is the expanded value of parameter with the shortest
matching pattern (the ‘#’ case) or the longest matching pattern (the
‘##’ case) deleted. If parameter is ‘@’ or ‘*’, the pattern removal
operation is applied to each positional parameter in turn, and the
expansion is the resultant list. If parameter is an array variable
subscripted with ‘@’ or ‘*’, the pattern removal operation is applied
to each member of the array in turn, and the expansion is the
resultant list.
Source: http://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html
So basically what happens in ${-#*i} is that *i is expanded, and if it matches the beginning of the value of $-, then the result of the whole expansion is $- with the shortest matching pattern between *i and $- deleted.
Example
VAR="baioasd";
echo ${VAR#*i};
outputs oasd.
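For comparison, a sketch of the shortest and longest forms on a value that contains two i characters, plus a pattern that does not match at all. The value baioasid and the variable names are made up for this example.

```shell
VAR="baioasid"

shortest=${VAR#*i}    # removes "bai", leaving "oasid"
longest=${VAR##*i}    # removes "baioasi", leaving "d"
unmatched=${VAR#*z}   # no "z" in the value, so nothing is removed

echo "$shortest $longest $unmatched"
```

The unmatched case is the one the interactive test relies on: when the pattern does not match, the expansion equals the original value.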
In your case
If the shell is interactive, $- will contain the letter 'i', so when you strip the pattern *i from $- you will get a string that is different from the original $- ([[ ${-#*i} != ${-} ]] yields true).
If the shell is not interactive, $- does not contain the letter 'i', so the pattern *i does not match anything in $- and [[ ${-#*i} != $- ]] yields false, and the return statement is executed.
See this:
To determine within a startup script whether or not Bash is running interactively, test the value of the ‘-’ special parameter. It contains i when the shell is interactive
Your substitution removes the string up to and including the i, and tests whether the substituted version is equal to the original string. They will be different if there is an i in ${-}.
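The same test can be tried on fixed strings instead of the live value of $-. has_i_flag is a hypothetical helper for this sketch, and himBH and hB are sample flag strings, not values guaranteed for any particular shell.

```shell
# has_i_flag FLAGS: succeeds if FLAGS contains the letter i
# (hypothetical helper, applying the same comparison to its argument)
has_i_flag() {
  [ "${1#*i}" != "$1" ]
}

if has_i_flag himBH; then first=interactive; else first=non-interactive; fi
if has_i_flag hB;    then second=interactive; else second=non-interactive; fi

echo "himBH: $first"
echo "hB: $second"
```

In a startup file the argument would simply be "$-"; a case statement matching *i* against $- is a commonly seen alternative that expresses the same check.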

Bash bad substitution with subshell and substring

A contrived example... given
FOO="/foo/bar/baz"
this works (in bash)
BAR=$(basename $FOO) # result is BAR="baz"
BAZ=${BAR:0:1} # result is BAZ="b"
this doesn't
BAZ=${$(basename $FOO):0:1} # result is bad substitution
My question is which rule causes this [subshell substitution] to evaluate incorrectly? And what is the correct way, if any, to do this in 1 hop?
First off, note that when you say this:
BAR=$(basename $FOO) # result is BAR="baz"
BAZ=${BAR:0:1} # result is BAZ="b"
the first bit in the construct for BAZ is BAR and not the value that you want to take the first character of. So even if bash allowed variable names to contain arbitrary characters your result in the second expression wouldn't be what you want.
However, as to the rule that's preventing this, allow me to quote from the bash man page:
DEFINITIONS
The following definitions are used throughout the rest of this docu‐
ment.
blank A space or tab.
word A sequence of characters considered as a single unit by the
shell. Also known as a token.
name A word consisting only of alphanumeric characters and under‐
scores, and beginning with an alphabetic character or an under‐
score. Also referred to as an identifier.
Then a bit later:
PARAMETERS
A parameter is an entity that stores values. It can be a name, a num‐
ber, or one of the special characters listed below under Special Param‐
eters. A variable is a parameter denoted by a name. A variable has a
value and zero or more attributes. Attributes are assigned using the
declare builtin command (see declare below in SHELL BUILTIN COMMANDS).
And later when it defines the syntax you're asking about:
${parameter:offset:length}
Substring Expansion. Expands to up to length characters of
parameter starting at the character specified by offset.
So the rules as articulated in the manpage say that the ${foo:x:y} construct must have a parameter as the first part, and that a parameter can only be a name, a number, or one of the few special parameter characters. $(basename $FOO) is not one of the allowed possibilities for a parameter.
As for a way to do this in one assignment, use a pipe to other commands as mentioned in other responses.
Modified forms of parameter substitution such as ${parameter#word} can only modify a parameter, not an arbitrary word.
In this case, you might pipe the output of basename to a dd command, like
BAR=$(basename -- "$FOO" | dd bs=1 count=1 2>/dev/null)
(If you want a higher count, increase count and not bs, otherwise you may get fewer bytes than requested.)
In the general case, there is no way to do things like this in one assignment.
It fails because ${BAR:0:1} is a variable expansion. Bash expects to see a variable name after ${, not a value.
I'm not aware of a way to do it in a single expression.
As others have said, the first parameter of ${} needs to be a variable name. But you can use another subshell to approximate what you're trying to do.
Instead of:
BAZ=${$(basename $FOO):0:1} # result is bad substitution
Use:
BAZ=$(_TMP=$(basename $FOO); echo ${_TMP:0:1}) # this works
A contrived solution for your contrived example:
BAZ=$(expr $(basename $FOO) : '\(.\)')
as in
$ FOO=/abc/def/ghi/jkl
$ BAZ=$(expr $(basename $FOO) : '\(.\)')
$ echo $BAZ
j
In ${string:0:1}, string must be a variable name,
for example:
FOO="/foo/bar/baz"
baz="foo"
BAZ=$(eval echo '${'"$(basename $FOO)"':0:1}')
echo $BAZ
the result is 'f'
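Since a parameter expansion cannot wrap a command substitution, another option is to chain two plain parameter expansions and avoid external commands altogether. This is a sketch that assumes FOO is a simple path with no trailing slash, so that ${FOO##*/} gives the same result as basename; base and first are names introduced here.

```shell
FOO="/foo/bar/baz"

base=${FOO##*/}             # strip the longest prefix ending in "/"
first=${base%"${base#?}"}   # strip everything after the first character

echo "$first"
```

It still takes two steps, but neither one forks a process, and both expansions are available in plain POSIX sh as well as in bash.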
