How does `extension="${filename##*.}"` work in bash? [duplicate] - bash

This question already has answers here:
What is the meaning of the ${0##...} syntax with variable, braces and hash character in bash?
(4 answers)
Closed 2 years ago.
While looking online on how to get a file's extension and name, I found:
filename=$(basename "$fullfile")
extension="${filename##*.}"
filename="${filename%.*}
What is the ${} syntax...? I know regular expressions but "${filename##*.}" and "${filename%.*} escape my understanding.
Also, what's the difference between:
filename=$(basename "$fullfile")
And
filename=`basename "$fullfile"`
...?
Looking in Google is a nightmare, because of the strange characters...

The ${filename##*.} expression is parameter expansion ("parameters" being the technical name for the shell feature that other languages call "variables"). Plain ${varname} is the value of the parameter named varname, and if that's all you're doing, you can leave off the curly braces and just put $varname. But if you leave the curly braces there, you can put other things inside them after the name, to modify the result. The # and % are some of the most basic modifiers - they remove a prefix or suffix of the string that matches a wildcard pattern. # removes from the beginning, and % from the end; in each case, a single instance of the symbol removes the shortest matching string, while a double symbol matches the longest. So ${filename##*.} is "the value of filename with everything from the beginning to the last period removed", while ${filename%.*} is "the value of filename with everything from the last period to the end removed".
The backticks syntax (`...`) is the original way of doing command substitution in the Bourne shell, and has since been borrowed by languages like Perl and Ruby to incorporate calling out to system commands. But it doesn't deal well with nesting, and its attempt to even allow nesting means that quoting works differently inside them, and it's all very confusing. The newer $(...) syntax, originally introduced in the Korn shell and then adopted by Bash and zsh and codified by POSIX, lets quoting work the same at all levels of a nested substitution and makes for a nice symmetry with the ${...} parameter expansion.

As #e0k states in a comment on the question the ${varname...} syntax is Bash's parameter (variable) expansion. It has its own syntax that is unrelated to regular expressions; it encompasses a broad set of features that include:
specifying a default value
prefix and postfix stripping
string replacement
substring extraction
The difference between `...` and $(...) (both of which are forms of so-called command substitutions) is:
`...` is the older syntax (often called deprecated, but that's not strictly true).
$(...) is its modern equivalent, which facilitates nested use and works more intuitively when it comes to quoting.
See here for more information.

Related

What does a caret + brace do within a bash variable?

What does a caret do when appended to a bash variable but within braces? I'm trying to decipher this within a bash script:
readonly TEST=${USER^}
When I don't know the meaning of some syntax in bash/sh I use my browsers find function in bash's manual and sh's specification. This is pretty effective as both contain the entire manual in a single page.
From bash's manual:
${parameter^pattern}
[...]
The ‘^’ operator converts lowercase letters matching pattern to uppercase
[...]
If pattern is omitted, it is treated like a ‘?’, which matches every character.
So ${variable^} expands to the value of $variable with the first letter converted to its uppercase variant.

Why can substring variable expansion reference a variable without a dollar sign?

In bash, to get the first 4 characters of a variable, you can do:
variable='this is a variable'
echo ${variable:0:4}
Instead of hard-coding the length, you can reference a variable like this:
length=4
echo ${variable:0:$length}
However, it seems that you can leave off the $ off length as well:
echo ${variable:0:length}
It does not make sense to me that you should be able to do this because I always thought that to use/evaluate a variable, you have to prefix it with $.
In other languages, I would expect the text after each : to be a number or an expression that evaluates to a number. And in bash, length wouldn't evaluate to anything, but $length would.
This is confusing. Could someone help me understand what is going on here?
In general is correct to use the "$" symbol to expand a variable, but in some cases the bash auto-expands variable. For example in context like arithmetics or indirect expansion
(see Shell expansion to more detailed information).
However your case is a simple arithmetic context expansion.

Are quotes necessary in bash when declaring local variables based on the command line argument variable expansion? [duplicate]

This question already has answers here:
Quoting vs not quoting the variable on the RHS of a variable assignment
(5 answers)
Closed 4 years ago.
Are the quotes in the below example necessary or superfluous. And why?
#!/bin/bash
arg1="$1"
arg2="$2"
How do you explain the fact when $1 is 123 echo abc, the first assignment is not interpreted as:
arg1=123 echo abc
which is a normal command (echo) call with argument abc and an environment variable (arg) passed to the execution.
From section 2.9.1 of the POSIX shell syntax specification:
Each variable assignment shall be expanded for tilde expansion, parameter expansion, command substitution, arithmetic expansion, and quote removal prior to assigning the value.
String-splitting and globbing (the steps which double quotes suppress) are not in this list.
Thus, the quotes are superfluous -- not just for assignments where the right-and side refers to a positional parameter, but for all assignments barring those where (1) the behavior of single-quoted, not double-quoted, strings are desired; or (2) whitespace or other content in the value would be otherwise parsed as syntactic rather than literal.
(Note that the decision on how to parse a command -- thus, whether it is an assignment, a simple command, a compound command, or something else -- takes place before parameter expansions; thus, var=$1 is determined to be an assignment before the value of $1 is ever considered! Were this untrue, such that data could silently become syntax, it would be far more difficult -- if not impossible -- to write secure code handling untrusted data in bash).

Bash - Why does $VAR1=FOO or 'VAR=FOO' (with quotes) return command not found?

For each of two examples below I'll try to explain what result I expected and what I got instead. I'm hoping for you to help me understand why I was wrong.
1)
VAR1=VAR2
$VAR1=FOO
result: -bash: VAR2=FOO: command not found
In the second line, $VAR1 gets expanded to VAR2, but why does Bash interpret the resulting VAR2=FOO as a command name rather than a variable assignment?
2)
'VAR=FOO'
result: -bash: VAR=FOO: command not found
Why do the quotes make Bash treat the variable assignment as a command name?
Could you please describe, step by step, how Bash processes my two examples?
How best to indirectly assign variables is adequately answered in other Q&A entries in this knowledgebase. Among those:
Indirect variable assignment in bash
Saving function output into a variable named in an argument
If that's what you actually intend to ask, then this question should be closed as a duplicate. I'm going to make a contrary assumption and focus on the literal question -- why your other approaches failed -- below.
What does the POSIX sh language specify as a valid assignment? Why does $var1=foo or 'var=foo' fail?
Background: On the POSIX sh specification
The POSIX shell command language specification is very specific about what constitutes an assignment, as quoted below:
4.21 Variable Assignment
In the shell command language, a word consisting of the following parts:
varname=value
When used in a context where assignment is defined to occur and at no other time, the value (representing a word or field) shall be assigned as the value of the variable denoted by varname.
The varname and value parts shall meet the requirements for a name and a word, respectively, except that they are delimited by the embedded unquoted equals-sign, in addition to other delimiters.
Also, from section 2.9.1, on Simple Commands, with emphasis added:
The words that are recognized as variable assignments or redirections according to Shell Grammar Rules are saved for processing in steps 3 and 4.
The words that are not variable assignments or redirections shall be expanded. If any fields remain following their expansion, the first field shall be considered the command name and remaining fields are the arguments for the command.
Redirections shall be performed as described in Redirection.
Each variable assignment shall be expanded for tilde expansion, parameter expansion, command substitution, arithmetic expansion, and quote removal prior to assigning the value.
Also, from the grammar:
If all the characters preceding '=' form a valid name (see the Base Definitions volume of IEEE Std 1003.1-2001, Section 3.230, Name), the token ASSIGNMENT_WORD shall be returned. (Quoted characters cannot participate in forming a valid name.)
Note from this:
The command must be recognized as an assignment at the very beginning of the parsing sequence, before any expansions (or quote removal!) have taken place.
The name must be a valid name. Literal quotes are not part of a valid variable name.
The equals sign must be unquoted. In your second example, the entire string was quoted.
Assignments are recognized before tilde expansion, parameter expansion, command substitution, etc.
Why $var1=foo fails to act as an assignment
As given in the grammar, all characters before the = in an assignment must be valid characters within a variable name for an assignment to be recognized. $ is not a valid character in a name. Because assignments are recognized in step 1 of simple command processing, before expansion takes place, the literal text $var1, not the value of that variable, is used for this matching.
Why 'var=foo' fails to act as an assignment
First, all characters before the = must be valid in variable names, and ' is not valid in a variable name.
Second, an assignment is only recognized if the = is not quoted.
1)
VAR1=VAR2
$VAR1=FOO
You want to use a variable name contained in a variable for the assignment. Bash syntax does not allow this. However, there is an easy workaround :
VAR1=VAR2
declare "$VAR1"=FOO
It works with local and export too.
2)
By using single quotes (double quotes would yield the same result), you are telling Bash that what is inside is a string and to treat it as a single entity. Since it is the first item on the line, Bash tries to find an alias, or shell builtin, or an executable file in its PATH, that would be named VAR=FOO. Not finding it, it tells you there is no such command.
An assignment is not a normal command. To perform an assignment contained in a quote, you would need to use eval, like so :
eval "$VAR1=FOO" # But please don't do that in real life
Most experienced bash programmers would probably tell you to avoid eval, as it has serious drawbacks, and I am giving it as an example just to recommend against its use : while in the example above it would not involve any security risk or error potential because the value of VAR1 is known and safe, there are many cases where an arbitrary (i.e. user-supplied) value could cause a crash or unexpected behavior. Quoting inside an eval statement is also more difficult and reduces readability.
You declare VAR2 earlier in the program, right?
If you are trying to assign the value of VAR2 to VAR1, then you need to make sure and use $ in front of VAR2, like so:
VAR1=$VAR2
That will set the value of VAR2 equal to VAR1, because when you utilize the $, you are saying that value that is stored in the variable. Otherwise it doesn't recognize it as a variable.
Basically, a variable that doesn't have a $ in front of it will be interpreted as a command. Any word will. That's why we have the $ to clarify "hey this is a variable".

Bash command groups: Why do curly braces require a semicolon?

I know the difference in purpose between parentheses () and curly braces {} when grouping commands in bash.
But why does the curly brace construct require a semicolon after the last command, whereas for the parentheses construct, the semicolon is optional?
$ while false; do ( echo "Hello"; echo "Goodbye"; ); done
$ while false; do ( echo "Hello"; echo "Goodbye" ); done
$ while false; do { echo "Hello"; echo "Goodbye"; }; done
$ while false; do { echo "Hello"; echo "Goodbye" }; done
bash: syntax error near unexpected token `done'
$
I'm looking for some insight as to why this is the case. I'm not looking for answers such as "because the documentation says so" or "because it was designed that way". I'd like to know why it was designed this is way. Or maybe if it is just a historical artifact?
This may be observed in at least the following versions of bash:
GNU bash, version 3.00.15(1)-release (x86_64-redhat-linux-gnu)
GNU bash, version 3.2.48(1)-release (x86_64-apple-darwin12)
GNU bash, version 4.2.25(1)-release (x86_64-pc-linux-gnu)
Because { and } are only recognized as special syntax if they are the first word in a command.
There are two important points here, both of which are found in the definitions section of the bash manual. First, is the list of metacharacters:
metacharacter
A character that, when unquoted, separates words. A metacharacter is a blank or one of the following characters: ‘|’, ‘&’, ‘;’, ‘(’, ‘)’, ‘<’, or ‘>’.
That list includes parentheses but not braces (neither curly nor square). Note that it is not a complete list of characters with special meaning to the shell, but it is a complete list of characters which separate tokens. So { and } do not separate tokens, and will only be considered tokens themselves if they are adjacent to a metacharacter, such as a space or a semi-colon.
Although braces are not metacharacters, they are treated specially by the shell in parameter expansion (eg. ${foo}) and brace expansion (eg. foo.{c,h}). Other than that, they are just normal characters. There is no problem with naming a file {ab}, for example, or }{, since those words do not conform to the syntax of either parameter expansion (which requires a $ before the {) or brace expansion (which requires at least one comma between { and }). For that matter, you could use { or } as a filename without ever having to quote the symbols. Similarly, you can call a file if, done or time without having to think about quoting the name.
These latter tokens are "reserved words":
reserved word
A word that has a special meaning to the shell. Most reserved words introduce shell flow control constructs, such as for and while.
The bash manual doesn't contain a complete list of reserved words, which is unfortunate, but they certainly include the Posix-designated:
! { }
case do done elif else
esac fi for if in
then until while
as well as the extensions implemented by bash (and some other shells):
[[ ]]
function select time
These words are not the same as built-ins (such as [), because they are actually part of the shell syntax. The built-ins could be implemented as functions or shell scripts, but reserved words cannot because they change the way that the shell parses the command line.
There is one very important feature of reserved words, which is not actually highlighted in the bash manual but is made very explicit in Posix (from which the above lists of reserved words were taken, except for time):
This recognition [as a reserved word] shall only occur when none of the characters is quoted and when the word is used as:
The first word of a command …
(The full list of places where reserved words is recognized is slightly longer, but the above is a pretty good summary.) In other words, reserved words are only reserved when they are the first word of a command. And, since { and } are reserved words, they are only special syntax if they are the first word in a command.
Example:
ls } # } is not a reserved word. It is an argument to `ls`
ls;} # } is a reserved word; `ls` has no arguments
There is lots more I could write about shell parsing, and bash parsing in particular, but it would rapidly get tedious. (For example, the rule about when # starts a comment and when it is just an ordinary character.) The approximate summary is: "don't try this at home"; really, the only thing which can parse shell commands is a shell. And don't try to make sense of it: it's just a random collection of arbitrary choices and historical anomalies, many but not all based on the need to not break ancient shell scripts with new features.

Resources