I have a question about substrings in Bash, which seem irregular to me. Imagine I have X="Hello World!". Then:
echo ${X:4} # Prints 'o World!'
echo ${X:(-2)} # Prints 'd!'
Why does a positive integer show the whole string except for the specified characters, while a negative integer shows nothing except for the specified characters?
There are two ways to offset from the end of the string in bash. The first puts the index in parentheses, as you have:
echo ${X:(-2)} # Prints 'd!'
The second leaves a space after the colon:
echo ${X: -2} # Prints 'd!'
Both are offsets from the end of the string. What is printed follows the same rule in the positive and negative cases: characters are printed from the index to the end of the string. In the negative case, you offset from the end of the string by 2 and then print all remaining characters from that index (the last 2 characters).
You can prove it to yourself with:
echo ${X: -2:1} # Prints 'd'
Positive indexes are offset from beginning-of-string, negative indexes are offset from end-of-string. In both cases what is printed is the remaining characters in the string unless the number of characters to print is specified following a second colon. (e.g. ${var:index:nchars} )
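The rule above can be checked with a minimal sketch that reuses X from the question. The variable names on the left (from_start, from_end, and so on) are made up for illustration:

```shell
#!/usr/bin/env bash
X="Hello World!"

from_start="${X:4}"          # offset 4 from the beginning, up to the end of the string
from_end="${X: -2}"          # offset 2 back from the end, up to the end of the string
one_from_end="${X: -2:1}"    # same negative offset, but only 1 character
three_from_start="${X:4:3}"  # positive offset with an explicit length of 3

printf '[%s]\n' "$from_start" "$from_end" "$one_from_end" "$three_from_start"
# prints:
# [o World!]
# [d!]
# [d]
# [o W]
```

In every case the offset only picks the starting point; the optional length decides how much is taken from there.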
It works as documented:
${parameter:offset}
${parameter:offset:length}
This is referred to as Substring Expansion. It expands to up to length characters of the value of parameter starting at the character
specified by offset. If parameter is ‘@’, an indexed array subscripted
by ‘@’ or ‘*’, or an associative array name, the results differ as
described below. If length is omitted, it expands to the substring of
the value of parameter starting at the character specified by offset
and extending to the end of the value. length and offset are
arithmetic expressions (see Shell Arithmetic).
If offset evaluates to a number less than zero, the value is used as an offset in characters from the end of the value of parameter.
[..]
(emphasis mine)
So a negative offset is equivalent to its positive offset as below:
pos_offset = len_of_str + neg_offset
i.e. in your example, ${X:(-2)} behaves as ${X:10} (len_of_str = 12 and neg_offset = -2, so pos_offset = 10). So both print the whole string starting from the specified index.
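A minimal sketch of that equivalence, assuming the same X as in the question. The names len_of_str, neg_offset and pos_offset mirror the formula above, while via_negative and via_positive are illustrative only:

```shell
#!/usr/bin/env bash
X="Hello World!"

neg_offset=-2
len_of_str=${#X}                           # length of X in characters
pos_offset=$(( len_of_str + neg_offset ))  # the equivalent positive offset

# The offset is an arithmetic expression, so a bare variable name is evaluated.
via_negative="${X:neg_offset}"
via_positive="${X:pos_offset}"

printf '%s %s [%s] [%s]\n' "$len_of_str" "$pos_offset" "$via_negative" "$via_positive"
# prints: 12 10 [d!] [d!]
```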
str=abcde
echo ${str:0:2} # output: ab
echo ${str::2} # output: ab
Both lines produce the same result.
As the documentation describes:
This is referred to as Substring Expansion. It expands to up to length characters of the value of parameter starting at the character specified by offset. If parameter is @ or *, an indexed array subscripted by @ or *, or an associative array name, the results differ as described below. If length is omitted, it expands to the substring of the value of parameter starting at the character specified by offset and extending to the end of the value. length and offset are arithmetic expressions (see Shell Arithmetic).
If offset evaluates to a number less than zero, the value is used as an offset in characters from the end of the value of parameter. If length evaluates to a number less than zero, it is interpreted as an offset in characters from the end of the value of parameter rather than a number of characters, and the expansion is the characters between offset and that result. Note that a negative offset must be separated from the colon by at least one space to avoid being confused with the :- expansion.
There is no description about omitting offset but in fact it can be omitted.
I wonder if there are any documents I haven't noticed.
In the Arithmetic Evaluation section of the manual it is mentioned that a null value is interpreted as 0, and as we know the arguments of that parameter expansion are subject to arithmetic evaluation.
A shell variable that is null or unset evaluates to 0 when referenced
by name without using the parameter expansion syntax.
A null value evaluates to 0.
This might be the closest reference you can get as to why an empty value evaluates to 0.
However, I still think that allowing empty arguments should be documented.
When you have a look at man bash you can read:
${parameter:offset}, ${parameter:offset:length} Substring Expansion. Expands to up to length characters of the value of parameter starting at the character specified by offset. If parameter is @, an indexed array subscripted by @ or *, or an associative array name, the results differ as described below. If length is omitted, expands to the substring of the value of parameter starting at the character specified by offset and extending to the end of the value. length and offset are arithmetic expressions (see ARITHMETIC EVALUATION below).
It must be made clear that it is not possible to omit the offset value. Here, omit implies that the <colon> character is missing as well. From the table below, you can see that this case is ambiguous:
               | offset                     | omitted offset
length         | ${parameter:offset:length} | ${parameter:length}
omitted length | ${parameter:offset}        | ${parameter}
From a syntactic point of view, you cannot omit offset, because ${parameter:length} would be indistinguishable from ${parameter:offset}: there is no way to tell whether length or offset was omitted.
It is possible to leave it empty. man bash clearly states that both length and offset are arithmetic expressions, and in that section we find:
A null value evaluates to 0.
This entails both unset variables as well as empty expressions:
$ unset v
$ echo $(( v )) $(( ))
0 0
As offset is an arithmetic expression, an empty value will evaluate to the same value as $(( )), which is 0.
So the following are all equivalent:
${parameter:0:length} == ${parameter::length}
${parameter:offset:0} == ${parameter:offset:} == ""
${parameter:0:0} == ${parameter::} == ""
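These equivalences can be tried out with a short sketch. The sample value abcde and the variables offset and length are assumptions chosen for illustration, and the behaviour shown is what I would expect from a reasonably recent bash:

```shell
#!/usr/bin/env bash
parameter="abcde"
offset=1
length=2

# An empty offset is an empty arithmetic expression, which evaluates to 0.
explicit_zero="${parameter:0:length}"
empty_offset="${parameter::length}"

# An empty length also evaluates to 0, so nothing is extracted.
zero_length="${parameter:offset:0}"
empty_length="${parameter:offset:}"
both_empty="${parameter::}"

printf '[%s] [%s] [%s] [%s] [%s]\n' \
  "$explicit_zero" "$empty_offset" "$zero_length" "$empty_length" "$both_empty"
# prints: [ab] [ab] [] [] []
```

Note the difference between an empty length (the second colon is present, so the length is 0) and an omitted length (no second colon, so the expansion runs to the end of the value).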
I know that one can get the length of an array in bash by doing ${#arrayname[@]}.
My question is: is this just something that I have to memorize, or can this syntax be broken down into understandable parts? For instance, what does the @ symbol mean where one would expect to find the index? Why the @?
# at the beginning of a variable reference means to get the length of the variable's value. For a normal variable this means its length in characters. # is the "number" sign, so you can remember this as meaning "the number of things in the variable".
@ or * in an array index means to use the whole array, not a specific element, and instead of returning the number of characters, it returns the number of array elements. * is used as a wildcard in many contexts, so this should be easy to remember. Also, $* and $@ are used to mean all the arguments to a shell script, so the parallel with all the array elements should be obvious.
You can't just write ${#arrayname} because when you use an array variable without a subscript, it's equivalent to element 0 of the array. So ${#arrayname} is the same as ${#arrayname[0]}, which is the number of characters in the first element of the array.
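A small sketch of the three forms side by side. The array contents are made up for illustration:

```shell
#!/usr/bin/env bash
arrayname=("alpha" "be" "gamma delta")

count="${#arrayname[@]}"      # number of elements in the array
first_len="${#arrayname[0]}"  # length in characters of element 0
bare_len="${#arrayname}"      # no subscript: same as element 0

printf '%s %s %s\n' "$count" "$first_len" "$bare_len"
# prints: 3 5 5
```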
You should memorize. :) The # usually means number, e.g.:
$# - the number of arguments
${#str} - the length of the string $str
${#arr[@]} - the length (number of elements) of the array arr
${#arr} - the length of the 1st element of the array (like the str above)
Unfortunately, ${parameter#word} and ${parameter##word} have nothing to do with numbers (they remove the shortest/longest match of the pattern word from the beginning of the parameter).
And also, the # .... is comment ;)
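To illustrate the prefix-removal forms mentioned above, here is a minimal sketch. The path value and the pattern */ are arbitrary examples:

```shell
#!/usr/bin/env bash
path="/usr/local/lib/libfoo.so"

# A single # removes the shortest prefix matching the pattern */
shortest="${path#*/}"
# A double ## removes the longest prefix matching the pattern */
longest="${path##*/}"

printf '%s\n' "$shortest" "$longest"
# prints:
# usr/local/lib/libfoo.so
# libfoo.so
```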
In general, the form ${#PARAMETER} returns the length of the parameter's value in characters, NOT in bytes.
myString="Hello StackOverflow!"
printf "%s\n" "${#myString}"
20
But for arrays, this expansion type has two meanings:
For individual elements, it reports the string length of the element
(as for every "normal" parameter)
For the mass subscripts @ and * it
reports the number of set elements in the array
Consider an example over arrays,
myArray=(1 2 3 4 15)
printf "%s\n" "${myArray[@]}" # <-- Gives me list of elements
1
2
3
4
15
printf "%s\n" "${#myArray[@]}" # <-- Gives me number of elements
5
It gets interesting now: the length of the last element (15), which is 2, can be obtained by doing
printf "%s\n" "${#myArray[4]}"
2
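The phrase "number of set elements" matters for sparse arrays. The following sketch uses a made-up array with a gap in its indexes to show that the count is not the highest index plus one:

```shell
#!/usr/bin/env bash
# Only indexes 0 and 5 are set; 1 through 4 do not exist.
sparse=([0]="a" [5]="bcd")

count_at="${#sparse[@]}"    # number of set elements
count_star="${#sparse[*]}"  # same count with the * subscript
len_5="${#sparse[5]}"       # string length of the element at index 5

printf '%s %s %s\n' "$count_at" "$count_star" "$len_5"
# prints: 2 2 3
```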
The '@' acts the same way as '*'. Instead of providing a specific index, this references the full thing.
The '#' is telling bash you want the length.
https://www.cyberciti.biz/faq/finding-bash-shell-array-length-elements/
What do [0] and [1..-1] mean in the following code?
def capitalize(string)
puts "#{string[0].upcase}#{string[1..-1]}"
end
string[0] is a new string that contains the first character of string.
It is, in fact, syntactic sugar for string.[](0), i.e. calling the method String#[] on the String object stored in the variable string with argument 0.
The String#[] method also accepts a Range as argument, to extract a substring. In this case, the lower bound of range is the index where the substring starts and the upper bound is the index where the substring ends. Positive values count the characters from the beginning of the string (starting with 0), negative values count the characters from the end of the string (-1 denotes the last character).
The call string[1..-1] (string.[](1..-1)) returns a new string that is initialized with the substring of string that starts with the second character of string (1) and ends with its last character.
Put together, string[0].upcase is the uppercase version of the first character of string, string[1..-1] is the rest of string (everything but the first character).
Read more about different ways to access individual characters and substrings in strings using String#[] method.
Inside the function of a shell script I see something like this
func() {
local x
x=${1:3:1}
...
}
What does x=${1:3:1} mean? I know that $1, $2 and $3 are arguments of the function. So does the above statement mean that x = $1:$2:$3?
This is called parameter expansion in shell.
${PARAMETER:OFFSET:LENGTH}
This one can expand only a part of a parameter's value, given a position to start and maybe a length. If LENGTH is omitted, the parameter will be expanded up to the end of the string. If LENGTH is negative, it's taken as a second offset into the string, counting from the end of the string.
OFFSET and LENGTH can be any arithmetic expression. The OFFSET starts at 0, not at 1.
For example, let's say the parameter is a string:
MYSTRING="Be liberal in what you accept, and conservative in what you send"
echo ${MYSTRING:34:13}
The above will give you the following
conservative
Offset 34 (indexes start at 0) is the space just before "conservative", and the length of 13 covers that space plus the 12 letters of the word. Because the expansion is unquoted, echo never sees the leading space, so only the word is printed.
So in your case $1 is the first argument passed to the function; the expansion skips its first 3 characters, takes a substring of length 1, and assigns it to x.
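As a rough check of where offset 34 lands, the sketch below stores the expansion in a variable and prints it in brackets so that any leading whitespace stays visible. The variable names raw and word are illustrative:

```shell
#!/usr/bin/env bash
MYSTRING="Be liberal in what you accept, and conservative in what you send"

raw="${MYSTRING:34:13}"   # starts on the space before the word
word="${MYSTRING:35:12}"  # starts on the letter c

printf '[%s]\n' "$raw" "$word"
# prints:
# [ conservative]
# [conservative]
```

Quoting the expansion, as printf does here, preserves the leading space that an unquoted echo would drop through word splitting.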
Read more here : http://wiki.bash-hackers.org/syntax/pe#substring_expansion
It is a GNU shell parameter expansion, part of many that start with ${.
Like ${parameter:-word}, ${parameter:=word}, ${parameter:?word}, ${parameter:+word} and several others.
This one (specific to ksh, bash and zsh): ${parameter:offset:length} extracts length characters (optional; if missing, the rest of the string in parameter) starting at offset, with several details described in the bash manual.
${name:offset:length}
Substring Expansion. Expands to up to length characters of the value of parameter starting at the character specified by offset. If parameter is @, an indexed array subscripted by @ or *, or an associative array name, the results differ as described below. If length is omitted, expands to the substring of the value of parameter starting at the character specified by offset and extending to the end of the value. length and offset are arithmetic expressions (see ARITHMETIC EVALUATION below).
If offset evaluates to a number less than zero, the value is used as an offset in characters from the end of the value of parameter. If length evaluates to a number less than zero, it is interpreted as an offset in characters from the end of the value of parameter rather than a number of characters, and the expansion is the characters between offset and that result. Note that a negative offset must be separated from the colon by at least one space to avoid being confused with the :- expansion.
If parameter is @, the result is length positional parameters beginning at offset. A negative offset is taken relative to one greater than the greatest positional parameter, so an offset of -1 evaluates to the last positional parameter. It is an expansion error if length evaluates to a number less than zero.
If parameter is an indexed array name subscripted by @ or *, the result is the length members of the array beginning with ${parameter[offset]}. A negative offset is taken relative to one greater than the maximum index of the specified array. It is an expansion error if length evaluates to a number less than zero.
Substring expansion applied to an associative array produces undefined results.
Substring indexing is zero-based unless the positional parameters are used, in which case the indexing starts at 1 by default. If offset is 0, and the positional parameters are used, $0 is prefixed to the list.
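The special cases quoted above can be sketched as follows. The argument list, the array and the string are invented for the example, and the negative length form assumes bash 4.2 or later:

```shell
#!/usr/bin/env bash
# Positional parameters: indexing starts at 1.
set -- one two three four
params_slice="${*:2:2}"  # 2 parameters starting at $2
last_param="${@: -1}"    # offset -1 is the last positional parameter

# Indexed array: the slice counts elements, not characters.
arr=(a b c d e)
arr_slice="${arr[*]:1:3}"  # 3 elements starting at index 1

# Plain string: a negative length is an end offset counted from the end.
s="abcdef"
inner="${s:1:-1}"  # from index 1 up to, but excluding, the last character

printf '[%s] [%s] [%s] [%s]\n' "$params_slice" "$last_param" "$arr_slice" "$inner"
# prints: [two three] [four] [b c d] [bcde]
```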
Use the manual page, all information is in there. man bash:
${parameter:offset:length}
Substring Expansion. Expands to up to length characters of the
value of parameter starting at the character specified by offset.
If parameter is @, an indexed array subscripted by @ or *,
or an associative array name, the results differ as described
below. If length is omitted, expands to the substring of the
value of parameter starting at the character specified by offset
and extending to the end of the value. length and offset are
arithmetic expressions (see ARITHMETIC EVALUATION below).
What does x=${1:3:1} mean?
It's substring cut, and in English: using the string in $1, pull 1 character starting at index 3 (where indexes are 0-based). So if $1 === "foobar", then ${1:3:1} === "b".
I know that $1, $2 and $3 are arguments of the function. So does the above statement mean that x = $1:$2:$3?
No, adjacency represents string concatenation: x="$1$2$3" is the result of concatenating the strings in $1, $2, and $3.
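The difference between the two readings can be seen in a small sketch. The function name demo and its arguments are hypothetical:

```shell
#!/usr/bin/env bash
demo() {
  local sub joined
  sub=${1:3:1}      # 1 character of $1, starting at index 3
  joined="$1$2$3"   # concatenation of the first three arguments
  printf '%s %s\n' "$sub" "$joined"
}

result="$(demo foobar baz qux)"
printf '%s\n' "$result"
# prints: b foobarbazqux
```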
Also, it would be really helpful if someone could suggest how to google for special characters like this. Any standard keywords? I tried searching 'what is ":" in shell scripts' etc., but the results are random when trying to search for special characters.
bash parameter substitution usually gets you in the ballpark. I know I can't remember all the different syntax ways bash can fiddle with the data, so committing "parameter substitution" to memory pays off. String manipulation happens to be the chapter before parameter substitution.
Try this:
set ABCDEFG
echo ${1:3:1}
It extracts a substring. In general, ${...} is parameter expansion; here the :3:1 part selects 1 character starting at offset 3 of $1, so this prints D.
I found this snippet online and the purpose is to commify any number including numbers with decimals ... 99999999 => 99,999,999. I can see that it uses regex but I am confused by "$1.reverse, $2"
def commify(n)
n.to_s =~ /([^\.]*)(\..*)?/
int, dec = $1.reverse, $2 ? $2 : ""
while int.gsub!(/(,|\.|^)(\d{3})(\d)/, '\1\2,\3')
end
int.reverse + dec
end
Can anyone explain what is going on in this code?
$1, $2, $3 ... are Perl legacy. They are capture group variables, that is, they capture the groups inside the regular expression.
A capture group is indicated by parentheses. So, the first capture group, ([^\.]*), matches any run of non-dot characters, and the second, (\..*), matches a dot character \. and any characters after it.
Note that the second group is optional, so in the next line you have the ternary expression $2 ? $2 : "", which is a slightly cryptic way to get either the value of the capture or a blank string.
The int, dec = $1.reverse, $2_or_blank_string is a parallel assignment. Ruby supports assigning more than one variable at once; it's no different from doing int = $1.reverse and then dec = $2_or_blank_string. So int now holds the integer part (reversed) and dec the decimal part of the number. We are interested in the first one for now.
The next empty while does a string substitution. The method gsub! replaces all occurrences of the regular expression with the value in the second argument. But it returns nil if no change happened, which ends the while.
The /(,|\.|^)(\d{3})(\d)/ expression matches:
(,|\.|^) A comma, a point or the beginning of the string
(\d{3}) Three digits
(\d) A fourth digit
Then it replaces the match with \1\2,\3. The \n in a substitution string means the nth capture group, just as the $n variables do. So it basically does: if I have four digits, add a comma after the third one. Repeat until no group of four digits is found.
Then, just reverse the integer part again and append the decimal part.
n.to_s =~ /([^\.]*)(\..*)?/ takes the number as a string and stores everything before the decimal point (or simply everything if there is no decimal point) in $1 and everything after and including it in $2.
int, dec = $1.reverse, $2 ? $2 : "" stores the reverse of $1 in int and $2, or "" if $2 is nil, in dec. In other words int now contains the part before the decimal point reversed and dec contains the part after the point (not reversed).
The next line inserts a comma every three places into int. So by reversing int again we get the original integral part of the number with commas inserted every three places from the end. Now we add dec again at the end and get the original number with commas at the right places.
Another way:
class Integer
def commify
self.to_s.gsub(/(\d)(?=(\d{3})+$)/,'\1,')
end
end
Then you can do 12345678.commify and get the string 12,345,678
And here's one that handles floating point numbers:
class Float
def commify
self.to_s.reverse.gsub(/(\d\d\d)(?=\d)(?!\d*\.)/,'\1,').reverse
end
end