Can Bash substring offset be omitted?

str=abcde
echo ${str:0:2} # output: ab
echo ${str::2} # output: ab
Both commands produce the same result.
As the documentation describes:
This is referred to as Substring Expansion. It expands to up to length characters of the value of parameter starting at the character specified by offset. If parameter is @ or *, an indexed array subscripted by @ or *, or an associative array name, the results differ as described below. If length is omitted, it expands to the substring of the value of parameter starting at the character specified by offset and extending to the end of the value. length and offset are arithmetic expressions (see Shell Arithmetic).
If offset evaluates to a number less than zero, the value is used as an offset in characters from the end of the value of parameter. If length evaluates to a number less than zero, it is interpreted as an offset in characters from the end of the value of parameter rather than a number of characters, and the expansion is the characters between offset and that result. Note that a negative offset must be separated from the colon by at least one space to avoid being confused with the :- expansion.
The documentation says nothing about omitting offset, but in fact it can be omitted.
I wonder whether there is documentation I have overlooked.

The Arithmetic Evaluation section of the manual mentions that a null value is interpreted as 0, and as we know, the arguments of that parameter expansion are subject to arithmetic evaluation.
A shell variable that is null or unset evaluates to 0 when referenced
by name without using the parameter expansion syntax.
A null value evaluates to 0.
This might be the closest reference you can get as to why an empty value evaluates to 0.
However, I think that allowing empty arguments should be documented explicitly.

When you have a look at man bash you can read:
${parameter:offset}, ${parameter:offset:length} Substring Expansion. Expands to up to length characters of the value of parameter starting at the character specified by offset. If parameter is @, an indexed array subscripted by @ or *, or an associative array name, the results differ as described below. If length is omitted, expands to the substring of the value of parameter starting at the character specified by offset and extending to the end of the value. length and offset are arithmetic expressions (see ARITHMETIC EVALUATION below).
It must be made clear that it is not possible to omit the offset value. Here, "omit" means that the <colon> character is missing as well. From the table below, you can see the ambiguity this would create:
               | offset                     | omitted offset
length         | ${parameter:offset:length} | ${parameter:length}
omitted length | ${parameter:offset}        | ${parameter}
From a syntactic point of view, you cannot omit offset, because ${parameter:length} would be indistinguishable from ${parameter:offset}.
It is, however, possible to leave it empty. man bash clearly states that both length and offset are arithmetic expressions, and in that section we find:
A null value evaluates to 0.
This covers both unset variables and empty expressions:
$ unset v
$ echo $(( v )) $(( ))
0 0
As offset is an arithmetic expression, an empty value evaluates to the same value as $(( )), which is 0.
So the following are all equivalent:
${parameter:0:length} == ${parameter::length}
${parameter:offset:0} == ${parameter:offset:} == ""
${parameter:0:0} == ${parameter::} == ""
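These equivalences can be checked directly in a bash session. The following is a minimal sketch; the sample value abcde is borrowed from the question, and the variable names are only illustrative.

```shell
# Sample value (same string as in the question).
parameter=abcde

# An empty offset is evaluated as an empty arithmetic expression, i.e. 0.
with_zero=${parameter:0:2}
with_empty=${parameter::2}
echo "$with_zero $with_empty"            # ab ab

# An empty length (second colon present) is also 0, so nothing is selected.
empty_length=${parameter:1:}
both_empty=${parameter::}
echo "[$empty_length] [$both_empty]"     # [] []
```

Note that the second colon matters: ${parameter:1} without it means "to the end of the value", not "length 0".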

Related

What does the # symbol mean in this bash for loop? [duplicate]

I know that one can get the length of an array in bash by doing ${#arrayname[@]}.
My question is: is this just something that I have to memorize, or can this syntax be broken down into understandable parts? For instance, what does the @ symbol mean where one would expect to find the index? Why the #?
# at the beginning of a variable reference means to get the length of the variable's value. For a normal variable this means its length in characters. # is the "number" sign, so you can remember this as meaning "the number of things in the variable".
@ or * in an array index means to use the whole array, not a specific element, and instead of returning the number of characters, it returns the number of array elements. * is used as a wildcard in many contexts, so this should be easy to remember. Also, $* and $@ are used to mean all the arguments to a shell script, so the parallel with all the array elements should be obvious.
You can't just write ${#arrayname} because when you use an array variable without a subscript, it's equivalent to element 0 of the array. So ${#arrayname} is the same as ${#arrayname[0]}, which is the number of characters in the first element of the array.
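The difference between the forms is easy to see side by side. This is a small sketch with a made-up array; the element values and variable names are assumptions chosen only for illustration.

```shell
# Hypothetical sample array with elements of different lengths.
arrayname=(alpha be gamma)

count=${#arrayname[@]}          # number of elements
first_len=${#arrayname}         # no subscript: same as element 0
also_first_len=${#arrayname[0]} # length of "alpha"
second_len=${#arrayname[1]}     # length of "be"

echo "$count $first_len $also_first_len $second_len"   # 3 5 5 2
```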
You should memorize. :) The # usually means number, e.g.:
$# - is the number of arguments
${#str} - length of the string $str
${#arr[@]} - length (number of elements) of the array arr
${#arr} - the length of the 1st element of the array (like the str above)
Unfortunately, ${parameter#word} and ${parameter##word} have nothing to do with numbers (they remove the shortest/longest match of the pattern word from the beginning of the parameter's value).
And also, the # .... is comment ;)
In general, the form ${#PARAMETER} returns the length of the parameter's value in characters, NOT in bytes.
myString="Hello StackOverflow!"
printf "%s\n" "${#myString}"
20
But for arrays, this expansion type has two meanings:
For individual elements, it reports the string length of the element (as for every "normal" parameter).
For the mass subscripts @ and *, it reports the number of set elements in the array.
Consider an example with arrays:
myArray=(1 2 3 4 15)
printf "%s\n" "${myArray[@]}" # <-- Gives me list of elements
1
2
3
4
15
printf "%s\n" "${#myArray[@]}" # <-- Gives me number of elements
5
It gets interesting now: the length of the last element (15), which is 2, can be obtained by doing
printf "%s\n" "${#myArray[4]}"
2
The '@' acts the same way as '*'. Instead of providing a specific index, this references the full thing.
The '#' is telling bash you want the length
https://www.cyberciti.biz/faq/finding-bash-shell-array-length-elements/

What does "${var:x:y}" mean in Bash?

Inside the function of a shell script I see something like this
func() {
local x
x=${1:3:1}
...
}
What does x=${1:3:1} mean? I know that $1, $2 and $3 are arguments of the function. So does the above statement mean that x = $1:$2:$3?
This is called parameter expansion in shell.
${PARAMETER:OFFSET:LENGTH}
This one can expand only a part of a parameter's value, given a position to start and maybe a length. If LENGTH is omitted, the parameter will be expanded up to the end of the string. If LENGTH is negative, it's taken as a second offset into the string, counting from the end of the string.
OFFSET and LENGTH can be any arithmetic expression. The OFFSET starts at 0, not at 1.
E.g., let's say the parameter is a string:
MYSTRING="Be liberal in what you accept, and conservative in what you send"
echo ${MYSTRING:34:13}
The above will give you the following
conservative
This works because offset 34 (indexing starts at 0) is the space just before the character "c", and a length of 13 characters covers that space plus the 12 letters of "conservative"; the unquoted echo then drops the leading space.
So in your case, $1 is the first argument passed to the function; the expansion skips its first 3 characters, takes a string of length 1, and assigns it to x.
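Both cases can be tried out as follows. This is only a sketch: the function body mirrors the one in the question, while the argument "foobar" and the variable names fourth_char and quoted are assumptions added for illustration.

```shell
# Same shape as the function in the question; it prints x so the
# result can be inspected.
func() {
    local x
    x=${1:3:1}          # 1 character of $1, starting at offset 3
    printf '%s\n' "$x"
}

fourth_char=$(func "foobar")
echo "$fourth_char"                 # b

# Quoting the expansion keeps the leading space that an unquoted echo hides.
MYSTRING="Be liberal in what you accept, and conservative in what you send"
quoted="[${MYSTRING:34:13}]"
echo "$quoted"                      # [ conservative]
```

Quoting the expansion is usually the safer habit, since an unquoted result is subject to word splitting.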
Read more here : http://wiki.bash-hackers.org/syntax/pe#substring_expansion
It is a GNU shell parameter expansion, part of many that start with ${.
Like ${parameter:-word}, ${parameter:=word}, ${parameter:?word}, ${parameter:+word} and several others.
This one (specific to ksh, bash and zsh), ${parameter:offset:length}, extracts length characters (optional; if missing, the rest of the string in parameter) starting at offset. Several details are described in the bash manual:
${name:offset:length}
Substring Expansion. Expands to up to length characters of the value of parameter starting at the character specified by offset. If parameter is @, an indexed array subscripted by @ or *, or an associative array name, the results differ as described below. If length is omitted, expands to the substring of the value of parameter starting at the character specified by offset and extending to the end of the value. length and offset are arithmetic expressions (see ARITHMETIC EVALUATION below).
If offset evaluates to a number less than zero, the value is used as an offset in characters from the end of the value of parameter. If length evaluates to a number less than zero, it is interpreted as an offset in characters from the end of the value of parameter rather than a number of characters, and the expansion is the characters between offset and that result. Note that a negative offset must be separated from the colon by at least one space to avoid being confused with the :- expansion.
If parameter is @, the result is length positional parameters beginning at offset. A negative offset is taken relative to one greater than the greatest positional parameter, so an offset of -1 evaluates to the last positional parameter. It is an expansion error if length evaluates to a number less than zero.
If parameter is an indexed array name subscripted by @ or *, the result is the length members of the array beginning with ${parameter[offset]}. A negative offset is taken relative to one greater than the maximum index of the specified array. It is an expansion error if length evaluates to a number less than zero.
Substring expansion applied to an associative array produces undefined results.
Substring indexing is zero-based unless the positional parameters are used, in which case the indexing starts at 1 by default. If offset is 0, and the positional parameters are used, $0 is prefixed to the list.
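The positional-parameter and array variants described above can be sketched as follows. The sample values and variable names are assumptions made for the example, and the * form is used inside quotes only so that each slice can be stored as a single space-joined string.

```shell
# Positional parameters $1..$5 (sample values, chosen only for illustration).
set -- a b c d e

# Two parameters starting at $2; positional parameters are indexed from 1.
pos_slice="${*:2:2}"
echo "$pos_slice"            # b c

# Negative offset: counted back from one past the last parameter.
# The space before the minus sign avoids the :- expansion.
last_param="${*: -1}"
echo "$last_param"           # e

# Indexed array: indexing is zero-based.
arr=(zero one two three four)
arr_slice="${arr[*]:1:2}"
echo "$arr_slice"            # one two

arr_tail="${arr[*]: -2}"
echo "$arr_tail"             # three four
```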
Use the manual page, all information is in there. man bash:
${parameter:offset:length}
Substring Expansion. Expands to up to length characters of the
value of parameter starting at the character specified by offset.
If parameter is @, an indexed array subscripted by @ or *,
or an associative array name, the results differ as described
below. If length is omitted, expands to the substring of the
value of parameter starting at the character specified by offset
and extending to the end of the value. length and offset are
arithmetic expressions (see ARITHMETIC EVALUATION below).
What does x=${1:3:1} mean?
It's substring cut, and in English: using the string in $1, pull 1 character starting at index 3 (where indexes are 0-based). So if $1 === "foobar", then ${1:3:1} === "b".
I know that $1, $2 and $3 are arguments of the function. So does the above statement mean that x = $1:$2:$3?
No, adjacency represents string concatenation: x="$1$2$3" is the result of concatenating the strings in $1, $2, and $3.
Also, it would be really helpful if someone could suggest how to search Google for special characters like this. Are there any standard keywords? I tried searching 'what is ":" in shell scripts' etc., but the results are random when trying to search for special characters.
bash parameter substitution usually gets you in the ballpark. I know I can't remember all the different syntax ways bash can fiddle with the data, so committing "parameter substitution" to memory pays off. String manipulation happens to be the chapter before parameter substitution.
Try this:
set ABCDEFG
echo ${1:3:1}
It extracts a substring. ${...} is the general parameter expansion syntax, not an array reference; here the value of $1 is treated as a sequence of characters indexed from 0, so this prints D.

why integer result in 0 when I change declared integer to string in Bash script

Code:
#!/bin/bash
declare -i number
# The script will treat subsequent occurrences of "number" as an integer.
number=3
echo "Number = $number" # Number = 3
number=three
echo "Number = $number" # Number = 0
# Tries to evaluate the string "three" as an integer.
I cannot figure out why number changed when I assign a string "three" to number. I think number should stay the same. That really surprised me.
From the declare section of man bash:
-i The variable is treated as an integer; arithmetic evaluation (see ARITHMETIC EVALUATION) is performed when the variable is assigned a value.
From the ARITHMETIC EVALUATION section of man bash:
The value of a variable is evaluated as an arithmetic expression when...a variable which has been given the integer attribute using declare -i is assigned a value. A null value evaluates to 0.
Together, these describe the behavior you're seeing, which is the expected behavior. When the characters t h r e e are evaluated arithmetically, they are treated as the name of a shell variable; because no variable named three is set, the expression evaluates to 0, which is then assigned to the variable number.
All assignments in bash are interpreted first as strings. number=10 interprets the 1 0 as a string first, recognizes it as a valid integer, and leaves it as-is. number=three is just as syntactically and semantically valid as number=10, which is why your script continues without any error after assigning the evaluated value of 0 to number.
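One way to see that the assigned text is evaluated as an arithmetic expression is to give the name three a value. A minimal sketch, where the value 7 and the variables first, second and third are assumptions used only to capture the results:

```shell
unset three                 # make sure no variable named "three" exists yet
declare -i number

number=three                # "three" is read as a variable name; unset, so 0
first=$number
echo "$first"               # 0

three=7
number=three                # the variable three now holds 7
second=$number
echo "$second"              # 7

number="three + 1"          # any arithmetic expression is accepted
third=$number
echo "$third"               # 8
```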

Bash: Irregular behavior of substrings?

I have a question about substrings in Bash, which seem irregular to me. Imagine I have X="Hello World!". Then:
echo ${X:4} # Prints 'o World!'
echo ${X:(-2)} # Prints 'd!'
Why does a positive integer show the whole string except for the specified characters, while a negative integer shows nothing except for the specified characters?
There are two ways to offset from end-of-string in bash. The first puts the index in parentheses, as you have:
echo ${X:(-2)} # Prints 'd!'
The second leaves a space after the colon:
echo ${X: -2} # Prints 'd!'
Both are offset from end-of-string. What is printed is the same for the positive and negative case. Characters are printed from the index to end-of-string. In the negative case, you offset from end-of-string by 2 and then print all remaining characters from that index. (the last 2-chars)
You can prove it to yourself with:
echo ${X: -2:1} # Prints 'd'
Positive indexes are offset from beginning-of-string, negative indexes are offset from end-of-string. In both cases what is printed is the remaining characters in the string unless the number of characters to print is specified following a second colon. (e.g. ${var:index:nchars} )
It works as documented:
${parameter:offset}
${parameter:offset:length}
This is referred to as Substring Expansion. It expands to up to length characters of the value of parameter starting at the character
specified by offset. If parameter is ‘@’, an indexed array subscripted
by ‘@’ or ‘*’, or an associative array name, the results differ as
described below. If length is omitted, it expands to the substring of
the value of parameter starting at the character specified by offset
and extending to the end of the value. length and offset are
arithmetic expressions (see Shell Arithmetic).
If offset evaluates to a number less than zero, the value is used as an offset in characters from the end of the value of parameter.
[..]
(emphasis mine)
So a negative offset is equivalent to its positive offset as below:
pos_offset = len_of_str + neg_offset
i.e. in your example, ${X:(-2)} behaves like ${X:10} (str_len = 12 and neg_offset = -2, so pos_offset = 10). So both print the string from the specified index to the end.
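The relation between the negative and positive offsets can be checked with a short sketch. The variable names from_end, from_start and default_form are assumptions introduced for the example; the last line shows what happens when the space after the colon is forgotten.

```shell
X="Hello World!"

from_end=${X: -2}           # negative offset, note the space after the colon
from_start=${X:${#X}-2}     # offset is arithmetic: 12-2 = 10
echo "$from_end $from_start"    # d! d!

# Without the space this is the ${X:-2} default-value expansion:
# X is set and non-empty, so its whole value is returned.
default_form=${X:-2}
echo "$default_form"            # Hello World!
```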

Complicated bash variable syntax

In a bash script I found the following construction:
if [[ "${xvar[id]:0:${#cnt}}" != "$cnt" ]]; then
Can someone explain what the above condition does?
The complicated expression is: ${xvar[id]:0:${#cnt}}.
$xvar must be an array, possibly associative. If it is associative, the part ${xvar[id]} refers to the element of the array identified by the string 'id'; if not, then it refers to the element indexed by variable $id (you're allowed to omit the nested $), as noted by chepner in a comment.
The ${xxx:0:${#cnt}} part of the expression refers to a substring from offset 0 to the length of the variable $cnt (so ${#cnt} is the length of the string in the variable $cnt).
All in all, the test checks whether the first characters of ${xvar[id]} are the same as the value of $cnt, that is, whether the value in $cnt is a prefix of the value in ${xvar[id]}.
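As a rough illustration, the condition can be wrapped in a small helper and tried with two values. The helper check_prefix, the sample array contents, and the use of an indexed array with id=1 are all assumptions made for this sketch; the original script's data is not shown in the question.

```shell
# Hypothetical data: an indexed array, with id selecting element 1.
xvar=("unrelated" "12345-abc")
id=1

# Hypothetical helper: prints "yes" if $1 is a prefix of ${xvar[id]}.
check_prefix() {
    local cnt=$1
    if [[ "${xvar[id]:0:${#cnt}}" != "$cnt" ]]; then
        echo "no"
    else
        echo "yes"
    fi
}

matches=$(check_prefix 123)
differs=$(check_prefix 124)
echo "$matches $differs"        # yes no
```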

Resources