Assign last character of string to variable in bash script [duplicate] - bash

I found out that with ${string:0:3} one can access the first 3 characters of a string. Is there an equally easy method to access the last three characters?

Last three characters of string:
${string: -3}
or
${string:(-3)}
(mind the space between : and -3 in the first form).
Please refer to the Shell Parameter Expansion in the reference manual:
${parameter:offset}
${parameter:offset:length}
Expands to up to length characters of parameter starting at the character
specified by offset. If length is omitted, expands to the substring of parameter
starting at the character specified by offset. length and offset are arithmetic
expressions (see Shell Arithmetic). This is referred to as Substring Expansion.
If offset evaluates to a number less than zero, the value is used as an offset
from the end of the value of parameter. If length evaluates to a number less than
zero, and parameter is not ‘@’ and not an indexed or associative array, it is
interpreted as an offset from the end of the value of parameter rather than a
number of characters, and the expansion is the characters between the two
offsets. If parameter is ‘@’, the result is length positional parameters
beginning at offset. If parameter is an indexed array name subscripted by ‘@’ or
‘*’, the result is the length members of the array beginning with
${parameter[offset]}. A negative offset is taken relative to one greater than the
maximum index of the specified array. Substring expansion applied to an
associative array produces undefined results.
Note that a negative offset must be separated from the colon by at least one
space to avoid being confused with the ‘:-’ expansion. Substring indexing is
zero-based unless the positional parameters are used, in which case the indexing
starts at 1 by default. If offset is 0, and the positional parameters are used,
$# is prefixed to the list.
Since this answer gets a few regular views, let me add a possibility to address John Rix's comment; as he mentions, if your string has length less than 3, ${string: -3} expands to the empty string. If, in this case, you want the expansion of string, you may use:
${string:${#string}<3?0:-3}
This uses the ?: ternary if operator, that may be used in Shell Arithmetic; since as documented, the offset is an arithmetic expression, this is valid.
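The behaviour described above can be checked with a short sketch. The variable names (long, short and so on) are made up for illustration, and it assumes bash rather than a plain POSIX shell:

```bash
#!/usr/bin/env bash
long="hello"
short="hi"

last3=${long: -3}                 # the space keeps ": -3" apart from ":-"
paren3=${long:(-3)}               # parentheses work as well
empty=${short: -3}                # shorter than 3 characters: expands to nothing
safe=${short:${#short}<3?0:-3}    # ternary falls back to offset 0
also=${long:${#long}<3?0:-3}      # long enough, so the offset is -3

printf '%s\n' "$last3" "$paren3" "[$empty]" "$safe" "$also"
```

The brackets around $empty are only there to make the empty expansion visible in the output.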
Update for a POSIX-compliant solution
The previous part gives the best option when using Bash. If you want to target POSIX shells, here's an option (that doesn't use pipes or external tools like cut):
# New variable with 3 last characters removed
prefix=${string%???}
# The new string is obtained by removing the prefix from string
newstring=${string#"$prefix"}
One of the main things to observe here is the use of quoting for prefix inside the parameter expansion. This is mentioned in the POSIX ref (at the end of the section):
The following four varieties of parameter expansion provide for substring processing. In each case, pattern matching notation (see Pattern Matching Notation), rather than regular expression notation, shall be used to evaluate the patterns. If parameter is '#', '*', or '@', the result of the expansion is unspecified. If parameter is unset and set -u is in effect, the expansion shall fail. Enclosing the full parameter expansion string in double-quotes shall not cause the following four varieties of pattern characters to be quoted, whereas quoting characters within the braces shall have this effect. In each variety, if word is omitted, the empty pattern shall be used.
This is important if your string contains special characters. E.g. (in dash),
$ string="hello*ext"
$ prefix=${string%???}
$ # Without quotes (WRONG)
$ echo "${string#$prefix}"
*ext
$ # With quotes (CORRECT)
$ echo "${string#"$prefix"}"
ext
Of course, this is usable only when the number of characters is known in advance, as you have to hardcode the number of ? in the parameter expansion; but when that is the case, it's a good portable solution.
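If the count is only known at run time, one possible workaround is to build the run of ? characters in a loop. This is only a sketch: last_chars is a hypothetical helper, not part of the answer above, and it keeps to POSIX features (the function body is a subshell so its variables do not leak). Note that it prints an empty line when the string is shorter than the requested count, because the pattern then fails to match.

```bash
# last_chars STRING COUNT: print the last COUNT characters of STRING.
# Hypothetical helper for illustration only.
last_chars() (
    s=$1
    n=$2
    pat=
    while [ "$n" -gt 0 ]; do
        pat="$pat?"            # one ? per character to keep
        n=$((n - 1))
    done
    prefix=${s%$pat}           # unquoted: the ? characters act as a pattern
    printf '%s\n' "${s#"$prefix"}"   # quoted: the prefix is taken literally
)

last_chars "1234567890" 3
last_chars "hello*ext" 3
```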

You can use tail:
$ foo="1234567890"
$ echo -n $foo | tail -c 3
890
A somewhat roundabout way to get the last three characters would be to say:
echo $foo | rev | cut -c1-3 | rev

Another workaround is to use grep -o with a little regex magic to get three chars followed by the end of line:
$ foo=1234567890
$ echo "$foo" | grep -o '...$'
890
To make it optionally get the 1 to 3 last chars, in case of strings with less than 3 chars, you can use egrep with this regex:
$ echo a | egrep -o '.{1,3}$'
a
$ echo ab | egrep -o '.{1,3}$'
ab
$ echo abc | egrep -o '.{1,3}$'
abc
$ echo abcd | egrep -o '.{1,3}$'
bcd
You can also use different ranges, such as 5,10 to get the last five to ten chars.

1. Generalized Substring
To generalise the question and the answer of gniourf_gniourf (as this is what I was searching for), if you want to cut a range of characters from, say, 7th from the end to 3rd from the end, you can use this syntax:
${string: -7:4}
Where 4 is the length of course (7-3).
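As a quick check of the range form, here is a small sketch with a made-up ten-character string. The second variant uses a negative length as an end offset, which needs a reasonably recent bash (4.2 or later, as far as I know):

```bash
#!/usr/bin/env bash
string="abcdefghij"

by_length=${string: -7:4}     # start 7 from the end, take 4 characters
by_offsets=${string: -7:-3}   # start 7 from the end, stop 3 from the end

printf '%s\n' "$by_length" "$by_offsets"
```

Both expansions select the same characters here, so either form can be used depending on whether the length or the end position is known.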
2. Alternative using cut
In addition, while the solution of gniourf_gniourf is obviously the best and neatest, I just wanted to add an alternative solution using cut:
echo $string | cut -c $((${#string}-2))-
Here, ${#string} is the length of the string, and the trailing "-" means cut to the end.
3. Alternative using awk
This solution instead uses the substring function of awk, which has the syntax substr(string, start, length) and runs to the end of the string if the length is omitted. Starting at length($1)-2 thus picks up the last three characters.
echo $string | awk '{print substr($1,length($1)-2) }'

Related

Understanding Bash Script argument expressions

With the help of a number of web searches, SO questions and answers, and trial-and-error I have written the following script to send attachments to an email.
attachments=""
subject=""
args=( "$@" ) # Copy arguments
recipient="${@: -1}" # Last argument
unset "args[${#args[@]}-1]" # Remove last argument
for i in "${args[@]}"; do # Remaining Arguments
attachments="$attachments -a $i"
subject="$subject $i"
done
eval "echo 'See Attached …' | mail -r 'Fred <fred@example.net>' $attachments -s \"Attached: $subject\" $recipient"
It works perfectly using something like
send.sh file1 file2 file3 recipient@example.com
I have omitted some of the refinements in the above code, such as error checking, but the whole thing works as planned.
I have no trouble with the process, and I have good programming skills. However, I find that Bash scripting is like medieval Latin to me, and I am having a hard time understanding the four expressions which I have commented.
The idea is that I pop the last argument, which is supposed to be the recipient, and loop through the remaining arguments which will be attached files.
Can anybody detail the meanings of the expressions $@, ${@: -1}, ${args[@]}, and args[${#args[@]}-1], and explain what the hash is doing in the last expression?
No doubt the script could stand some improvement, but I am only trying to understand what is happening so far.
It's all in bash manual shell parameter expansion and some in bash special parameters. So:
Can anybody detail the meanings of the expressions $@
From the manual, important parts:
$@
($@) Expands to the positional parameters, starting from one. [...]
if not within double quotes, these words are subject to word splitting. In contexts where word splitting is not performed, this expands to a single word with each positional parameter separated by a space. When the expansion occurs within double quotes, and word splitting is performed, each parameter expands to a separate word. That is, "$@" is equivalent to "$1" "$2" ...
So "$@" is equal to "$1" "$2" "$3" ... for each parameter passed. Word splitting is what happens when an expansion is not quoted: the shell splits the result on whitespace (more precisely, on the characters in IFS), so a="arg1 arg2 arg3"; f $a runs f with 3 arguments.
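The difference can be made visible by counting the words each form produces. In this sketch count_args is a made-up helper, the file names are placeholders, and IFS is assumed to have its default value; inside the helper, $# is the number of arguments it received.

```bash
#!/usr/bin/env bash
count_args() { echo "$#"; }          # print how many arguments were received

set -- "file one.txt" file2.txt      # two positional parameters, one with a space

quoted=$(count_args "$@")            # each parameter stays one word
unquoted=$(count_args $@)            # word splitting breaks "file one.txt" apart
joined=$(count_args "$*")            # everything joined into a single word

printf 'quoted=%s unquoted=%s joined=%s\n' "$quoted" "$unquoted" "$joined"
```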
${@: -1}
From shell parameter expansion:
${parameter:offset}
${parameter:offset:length}
It expands to up to length characters of the value of parameter starting at the character specified by offset [...]
If parameter is ‘@’, the result is length positional parameters beginning at offset. A negative offset is taken relative to one greater than the greatest positional parameter, so an offset of -1 evaluates to the last positional parameter.
So ${@: -1} is the last positional argument passed to the script. The additional space is there because ${parameter:-word} means something different.
${args[@]}
From bash manual arrays:
Any element of an array may be referenced using ${name[subscript]}. The braces are required to avoid conflicts with the shell’s filename expansion operators. If the subscript is ‘@’ or ‘*’, the word expands to all members of the array name.
${args[@]} is equal to ${args[0]} ${args[1]} ${args[2]} ... (bash arrays are indexed from zero). Note that without quotes word splitting is performed. In your code you have for i in "${args[@]}", so the words are preserved.
args[${#args[@]}-1]
From bash manual shell parameter expansion:
${#parameter}
If parameter is an array name subscripted by ‘*’ or ‘@’, the value substituted is the number of elements in the array.
So ${#args[@]} expands to the count of elements in the array. The count of elements minus 1 is the index of the last element. So args[${#args[@]}-1] is args[<the index of last array element>]. The unset "args[${#args[@]}-1]" is used to remove the last array element.
explain what the hash is doing in the last expression?
The hash is the length operator: ${#args[@]} expands to the number of elements in the array instead of the elements themselves.
what ( "$@" ) is doing.
From manual:
Arrays are assigned to using compound assignments of the form
name=(value1 value2 … )
The var=("$@") creates an array var with a copy of the positional parameters, properly expanded with words preserved.
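Putting these expansions together, here is one way the script's argument handling could be written without eval. It is only a sketch under assumptions: the set -- line stands in for real script arguments, at least one argument is assumed, mail_args and files are made-up names, and the mail flags are simply the ones the question already uses. The command is printed rather than run.

```bash
#!/usr/bin/env bash
set -- file1 "file 2" recipient@example.com   # stand-in for real script arguments

recipient="${@: -1}"          # last positional parameter
files=( "${@:1:$#-1}" )       # everything except the last one

mail_args=( -r 'Fred <fred@example.net>' )
for f in "${files[@]}"; do
    mail_args+=( -a "$f" )    # one -a per attachment, spaces preserved
done
mail_args+=( -s "Attached: ${files[*]}" "$recipient" )

# Show the command instead of sending anything. To send for real:
#   echo 'See Attached …' | mail "${mail_args[@]}"
printf '%q ' mail "${mail_args[@]}"
echo
```

Keeping the arguments in an array means file names containing spaces survive intact, which the string concatenation plus eval approach does not guarantee.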
Everything is explained somewhere in the Bash Manual
$@ in Special Parameters
Expands to the positional parameters, starting from one. In contexts where word splitting is performed, this expands each positional parameter to a separate word; if not within double quotes, these words are subject to word splitting.
${@: -1} in Shell Parameter Expansion
${parameter:offset}
${parameter:offset:length}
... If parameter is ‘@’, the result is length positional parameters beginning at offset. A negative offset is taken relative to one greater than the greatest positional parameter, so an offset of -1 evaluates to the last positional parameter.
${args[@]} in Arrays
Any element of an array may be referenced using ${name[subscript]}. The braces are required to avoid conflicts with the shell’s filename expansion operators. If the subscript is ‘@’ or ‘*’, the word expands to all members of the array name.
args[${#args[@]}-1] also in Arrays:
${#name[subscript]} expands to the length of ${name[subscript]}. If subscript is ‘@’ or ‘*’, the expansion is the number of elements in the array.

How to keep/remove numbers in a variable in shell?

I have a variable such as:
disk=/dev/sda1
I want to extract:
only the non numeric part (i.e. /dev/sda)
only the numeric part (i.e. 1)
I'm gonna use it in a script where I need the disk and the partition number.
How can I do that in shell (bash and zsh mostly)?
I was thinking about using Shell parameters expansions, but couldn't find working patterns in the documentation.
Basically, I tried:
echo ${disk##[:alpha:]}
and
echo ${disk##[:digit:]}
But none worked. Both returned /dev/sda1
With bash and zsh and Parameter Expansion:
disk="/dev/sda12"
echo "${disk//[0-9]/} ${disk//[^0-9]/}"
Output:
/dev/sda 12
The expansions kind-of work the other way round. With [:digit:] you will match only a single digit. You need to match everything up until, or from a digit, so you need to use *.
The following looks ok:
$ echo ${disk%%[0-9]*} ${disk##*[^0-9]}
/dev/sda 1
To use [:digit:] you need double brackets, because the character class is [:class:] and it has to be placed inside [ ] itself. That's why I prefer 0-9: less typing*. The following is the same as above:
echo ${disk%%[[:digit:]]*} ${disk##*[^[:digit:]]}
* - Theoretically they may be not equal, as [0-9] can be affected by the current locale, so it may be not equal to [0123456789], but to something different.
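A variation that also copes with digits in the middle of the name is to extract the trailing number first and then strip it from the end. This is a sketch, not part of the answer above; it uses only plain patterns, so it should behave the same in bash and zsh, and the variable names are made up.

```bash
#!/usr/bin/env bash
disk="/dev/sda12"
part=${disk##*[!0-9]}      # remove everything up to the last non-digit
base=${disk%"$part"}       # remove that number from the end
printf '%s %s\n' "$base" "$part"

disk2="/dev/dsk/c0t2d0s0"
part2=${disk2##*[!0-9]}
base2=${disk2%"$part2"}
printf '%s %s\n' "$base2" "$part2"
```

If the value does not end in a digit, the number comes out empty and the base is the whole string.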
You have to be careful when using patterns in parameter substitution. These patterns are not regular expressions but pathname expansion patterns, or glob patterns.
The idea is to remove the last number, so you want to make use of Remove matching suffix pattern (${parameter%%word}). Here we remove the longest instance of the matched pattern described by word. Representing a single digit is easily done with the pattern [0-9]; multi-digit numbers, however, are harder. For this you need to use extended glob expressions:
*(pattern-list): Matches zero or more occurrences of the given patterns
So if you want to remove the last number, you use:
$ shopt -s extglob
$ disk="/dev/sda1"
$ echo "${disk#${disk%%*([0-9])}}" "${disk%%*([0-9])}"
1 /dev/sda
$ disk="/dev/dsk/c0t2d0s0"
$ echo "${disk#${disk%%*([0-9])}}" "${disk%%*([0-9])}"
0 /dev/dsk/c0t2d0s
We have to use ${disk#${disk%%*([0-9])}} to remove the prefix. It essentially finds the trailing number, removes it, and then removes what is left from the front of the original string, so that only the number remains.
You can also make use of pattern substitution (${parameter/pattern/string}) with the anchors % and # to anchor the pattern to the beginning or end of the parameter (see man bash for more information). This is completely equivalent to the previous solution:
$ shopt -s extglob
$ disk="/dev/sda1"
$ echo "${disk/${disk/%*([0-9])}/}" "${disk/%*([0-9])}"
1 /dev/sda
$ disk="/dev/dsk/c0t2d0s0"
$ echo "${disk/${disk/%*([0-9])}/}" "${disk/%*([0-9])}"
0 /dev/dsk/c0t2d0s

bash shell reworking variable replace dots by underscore

I can't seem to get it working:
echo $VERSIONNUMBER
I get : v0.9.3-beta
VERSIONNUMBERNAME=${VERSIONNUMBER:1}
echo $VERSIONNUMBERNAME
I get : 0.9.3-beta
VERSION=${VERSIONNUMBERNAME/./_}
echo $VERSION
I get : 0_9.3-beta
I want to have : 0_9_3-beta
I've been googling my brains out I can't make heads or tails of it.
Ideally I'd like to remove the v and replace the periods with underscores in one line.
Let's create your variables:
$ VERSIONNUMBER=v0.9.3-beta
$ VERSIONNUMBERNAME=${VERSIONNUMBER:1}
This form only replaces the first occurrence of .:
$ echo "${VERSIONNUMBERNAME/./_}"
0_9.3-beta
To replace all occurrences of ., use:
$ echo "${VERSIONNUMBERNAME//./_}"
0_9_3-beta
Because this approach avoids the creation of pipelines and subshells and the use of external executables, this approach is efficient. This approach is also unicode-safe.
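For the "remove the v and replace the periods" part of the question, bash cannot nest one expansion inside another, so a sketch needs an intermediate variable (stripped is a made-up name); the two assignments can still share one line.

```bash
#!/usr/bin/env bash
VERSIONNUMBER=v0.9.3-beta

# Drop a leading "v" if there is one, then replace every dot.
stripped=${VERSIONNUMBER#v}; VERSION=${stripped//./_}

echo "$VERSION"
```

Using ${VERSIONNUMBER#v} instead of ${VERSIONNUMBER:1} leaves the value untouched when it does not start with v.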
Documentation
From man bash:
${parameter/pattern/string}
Pattern substitution. The pattern is expanded to produce a pattern
just as in pathname expansion. Parameter is expanded and the longest
match of pattern against its value is replaced with
string. If pattern begins with /, all matches of pattern are replaced
with string. Normally only the first match is replaced. If
pattern begins with #, it must match at the beginning of the expanded
value of parameter. If pattern begins with %, it must match at the
end of the expanded value of parameter. If string
is null, matches of pattern are deleted and the / following pattern
may be omitted. If the nocasematch shell option is enabled, the
match is performed without regard to the case of alphabetic
characters. If parameter is @ or *, the substitution operation is
applied to each positional parameter in turn, and the
expansion is the resultant list. If parameter is an array variable
subscripted with @ or *, the substitution operation is
applied to each member of the array in turn, and the expansion is the
resultant list.
(Emphasis added.)
You can combine substring expansion with tr:
VERSION=$( echo "${VERSIONNUMBER:1}" | tr '.' '_' )

Using Interval expressions with bash extended globbing

I know for a fact that bash supports extended globs with regular-expression-like support for @(foo|bar), *(foo) and ?(foo). This syntax is quite unique, i.e. different from that of EREs: extended globs use a prefix notation (where the operator appears before its operands), rather than postfix like EREs.
I'm wondering does it support the interval expressions feature of type {n,m} i.e. if there is one number in the braces, the preceding regexp is repeated n times or if there are two numbers separated by a comma, the preceding regexp is repeated n to m times. I couldn't find a particular documentation that suggests this support enabled in extended glob.
Actual Question
I came across a requirement in one of the questions today, to remove only a pair of trailing zeroes in a string. Trying to solve this with the extended glob support in bash
Given some sample strings like
foobar0000
foobar00
foobar000
should produce
foobar00
foobar
foobar0
I tried using extended glob with parameter expansion to do
x='foobar000'
I tried using the interval expression as below, although it seemed obvious to me that it wouldn't work:
echo ${x%%+([0]{2})}
i.e. similar using sed in ERE as sed -E 's/[0]{2}$//' or in BRE as sed 's/[0]\{2\}$//'
So my question is: is this possible using any of the extended glob operators? I'm looking for answers specific to the extended glob support in bash, and would accept 'No' if it is not possible.
Somehow I managed to find a way to do this within the confinements of bash.
Are interval glob-expressions implemented in bash?
No! In contrast to other shells such as ksh and zsh, bash did not implement interval expressions for globbing.
Can we mimic interval expressions in bash?
Yes! However, it is not really practical, and building the pattern can sometimes benefit from printf. The idea is to build a glob expression that mimics the {n,m} interval using the ksh-style globs @(pattern) and ?(pattern).
In the explanation below, we assume that the pattern is stored in variable p
Match n occurrences of the given pattern ({n}):
The idea is to repeat the pattern n times. For large n you can use printf
$ var="foobar01010"
$ echo ${var%%@(0|1)@(0|1)}
foobar010
or
$ var="foobar01010"
$ p=$(printf "@(0|1)%.0s" {1..4})
$ echo ${var%%$p}
foobar0
Match at least m occurrences of the given pattern ({m,}):
It is the same as before, but with an additional *(pattern)
$ var="foobar01010"
$ echo ${var%%@(0|1)@(0|1)*(0|1)}
foobar
or
$ var="foobar01010"
$ p="(0|1)"
$ q=$(printf "@$p%.0s" {1..4})
$ echo ${var%%$q*$p}
foobar
Match from n to m occurrences of the given pattern ({n,m}):
The interval expression {n,m} implies we have for sure n appearances and m-n possible appearances. These can be constructed using the ksh-globs @(pat) n times and ?(pat) m-n times. For n=2 and m=3, this leads to:
$ var="foobar01010"
$ echo ${var%%@(0|1)@(0|1)?(0|1)}
foobar01
or
$ p="(0|1)"
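$ # brace expansion needs literal numbers: {1..n} and {n+1..m} with n=2, m=3
$ q=$(printf "@$p%.0s" {1..2})$(printf "?$p%.0s" {3..3})
$ echo ${var%%$q}
foobar01
$ var="foobar00200"
$ echo ${var%%$q}
foobar002
$ var="foobar00020"
$ echo ${var%%$q}
foobar00020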
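Because brace expansion cannot take its bounds from variables, a loop is one way to build the same kind of pattern for arbitrary n and m. The following is a sketch only: interval_glob is a hypothetical helper, and it assumes extglob is enabled before the pattern is used.

```bash
#!/usr/bin/env bash
shopt -s extglob

# interval_glob ALTERNATIVES N M: print an extglob that matches
# between N and M repetitions of ALTERNATIVES (hypothetical helper).
interval_glob() {
    local alt=$1 n=$2 m=$3 q="" i
    for ((i = 0; i < n; i++)); do q+="@($alt)"; done   # required repetitions
    for ((i = n; i < m; i++)); do q+="?($alt)"; done   # optional repetitions
    printf '%s\n' "$q"
}

q=$(interval_glob '0|1' 2 3)

var="foobar00200"
two_removed=${var%%$q}      # only "00" can match, since 2 is not in the set
var="foobar00020"
unchanged=${var%%$q}        # "20" does not match, so nothing is removed

printf '%s\n' "$q" "$two_removed" "$unchanged"
```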
Another way to construct the interval expression {n,m} is using the ksh-glob anything but pattern written as !(pat) which allows us to say: give me all, except...
man bash:
!(pattern-list): Matches anything except one of the given patterns
This way we can write
$ echo ${var%%!(!(*$p)|@$p@$p@$p+$p|?$p)}
or
$ p="(0|1)"
$ pn=$(printf "@$p%.0s" {1..n})
$ pm=$(printf "?$p%.0s" {1..m-1})
$ echo ${var%%!(!(*$p)|$pn+$p|$pm)}
note: you need to do a double exclusion here due to the or (|) in the pattern list.
What about other shells?
KSH93
The interval expression {n,m} has been implemented in ksh93:
man ksh:
{n}(pattern-list) Matches n occurrences of the given patterns.
{m,n}(pattern-list) Matches from m to n occurrences of the given patterns. If m is omitted, 0 will be used. If n is omitted at least m occurrences will be matched.
$ echo ${var%%{2,3}(0|1)}
ZSH
Also zsh has a form of interval expression. It is a globbing flag which is part of the EXTENDED_GLOB option:
man zshall:
(#cN,M) The flag (#cN,M) can be used anywhere that the # or ## operators can be used except in the expressions (*/)# and (*/)## in filename generation, where / has special meaning; it cannot be combined with other globbing flags and a bad pattern error occurs if it is misplaced. It is equivalent to the form {N,M} in regular expressions. The previous character or group is required to match between N and M times, inclusive. The form
(#cN) requires exactly N matches; (#c,M) is equivalent to specifying N as 0; (#cN,) specifies that there is no maximum limit on the number of matches.
$ echo ${var%%(0|1)(#c2,3)}
No
"Extended pattern matching features" is enabled using extglob (thus we call that extended glob). Extended pattern matching features are used in an operation called pattern matching. Pattern matching is used in filename expansion and in [[...]] conditional constructs when using = or != operators. Pattern matching is also used in parameter expansion, for example in ${parameter%%word}.
As you can see in pattern matching, extended glob or not, pattern matching does not support expressions like [set]{count}. We can for example match one or more occurrences with +(..) and so on, but specifying the number of occurrences of a pattern is not possible.
But this is bash and bash is powerful. We can specify the number of occurrences of a pattern simply by repeating the pattern. We cannot specify the ending or the beginning (I mean like using ^ and $ in regex), but we can use ${parameter%%word} parameter expansions to remove the trailing portion of the parameter. So this will work:
var='foobar000'
echo ${var%%[0][0]}
and, with some simple hacking, we can do this:
var='foobar000'
echo ${var%%$(yes '[0]' | head -n 2 | tr -d '\n')}
and this will remove two trailing zeros from the string.
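The same repetition can be done without external commands. As a sketch (strip_repeated_suffix is a made-up helper name), a loop builds the repeated pattern and a single % expansion removes it when it matches at the end of the string:

```bash
#!/usr/bin/env bash
# strip_repeated_suffix STRING PATTERN COUNT: remove COUNT consecutive
# matches of PATTERN from the end of STRING, if present (hypothetical helper).
strip_repeated_suffix() {
    local str=$1 pat=$2 count=$3 rep="" i
    for ((i = 0; i < count; i++)); do
        rep+=$pat                  # e.g. [0] repeated twice gives [0][0]
    done
    printf '%s\n' "${str%$rep}"    # unquoted so rep is treated as a pattern
}

for s in foobar0000 foobar00 foobar000; do
    strip_repeated_suffix "$s" '[0]' 2
done
```

Run on the three sample strings from the question, this prints foobar00, foobar and foobar0.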

Bash last index of

Sorry for the lame bash question, but I can't seem to be able to work it out.
I have the following simple case:
I have variable like artifact-1.2.3.zip
I would like to get a sub-string between the hyphen and the last index of the dot (both exclusive).
My bash skills are not too strong. I have the following:
a="artifact-1.2.3.zip"; b="-"; echo ${a:$(( $(expr index "$a" "$b" + 1) - $(expr length "$b") ))}
Producing:
1.2.3.zip
How do I remove the .zip part as well?
The bash man page section titled "Parameter Expansion" describes using ${var#pattern}, ${var##pattern}, ${var%pattern}, and ${var%%pattern}.
Assuming that you have a variable called filename, e.g.,
filename="artifact-1.2.3.zip"
then, the following are pattern-based extractions:
% echo "${filename%-*}"
artifact
% echo "${filename##*-}"
1.2.3.zip
Why did I use ## instead of #?
If the filename could possibly contain dashes within, such as:
filename="multiple-part-name-1.2.3.zip"
then compare the two following substitutions:
% echo "${filename#*-}"
part-name-1.2.3.zip
% echo "${filename##*-}"
1.2.3.zip
Once having extracted the version and extension, to isolate the version, use:
% verext="${filename##*-}"
% ver="${verext%.*}"
% ext="${verext##*.}"
% echo $ver
1.2.3
% echo $ext
zip
$ a="artifact-1.2.3.zip"; a="${a#*-}"; echo "${a%.*}"
‘#pattern’ removes pattern so long as it matches the beginning of $a.
The syntax of pattern is similar to that used in filename matching.
In our case,
* is any sequence of characters.
- means a literal dash.
Thus #*- matches everything up to, and including, the first dash.
Thus ${a#*-} expands to whatever $a would expand to,
except that artifact- is removed from the expansion,
leaving us with 1.2.3.zip.
Similarly, ‘%pattern’ removes pattern so long as it matches the end of the expansion.
In our case,
. a literal dot.
* any sequence of characters.
Thus %.* is everything including the last dot up to the end of the string.
Thus if $a expands to 1.2.3.zip,
then ${a%.*} expands to 1.2.3.
Job done.
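To see how the shortest and longest forms differ on the same value, here is a small sketch using the string from the question; the variable names are made up for illustration.

```bash
#!/usr/bin/env bash
a="artifact-1.2.3.zip"
rest=${a#*-}                # remove shortest prefix ending in a dash

short_suffix=${rest%.*}     # remove shortest suffix starting with a dot
long_suffix=${rest%%.*}     # remove longest suffix starting with a dot
short_prefix=${rest#*.}     # remove shortest prefix ending in a dot
long_prefix=${rest##*.}     # remove longest prefix ending in a dot

printf '%s\n' "$rest" "$short_suffix" "$long_suffix" "$short_prefix" "$long_prefix"
```

This prints 1.2.3.zip, 1.2.3, 1, 2.3.zip and zip, one per line.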
The man page content for this is as follows (at least on my machine, YMMV):
${parameter#word}
${parameter##word}
The word is expanded to produce a pattern just as in pathname
expansion. If the pattern matches the beginning of the value of
parameter, then the result of the expansion is the expanded
value of parameter with the shortest matching pattern (the ``#''
case) or the longest matching pattern (the ``##'' case) deleted.
If parameter is @ or *, the pattern removal operation is applied
to each positional parameter in turn, and the expansion is the
resultant list. If parameter is an array variable subscripted
with @ or *, the pattern removal operation is applied to each
member of the array in turn, and the expansion is the resultant
list.
${parameter%word}
${parameter%%word}
The word is expanded to produce a pattern just as in pathname
expansion. If the pattern matches a trailing portion of the
expanded value of parameter, then the result of the expansion is
the expanded value of parameter with the shortest matching
pattern (the ``%'' case) or the longest matching pattern (the
``%%'' case) deleted. If parameter is @ or *, the pattern
removal operation is applied to each positional parameter in
turn, and the expansion is the resultant list. If parameter is
an array variable subscripted with @ or *, the pattern removal
operation is applied to each member of the array in turn, and
the expansion is the resultant list.
HTH!
EDIT
Kudos to @x4d for the detailed answer.
Still think people should RTFM though.
If they don't understand the manual,
then post another question.
Using Bash RegEx feature:
str="artifact-1.2.3.zip"
[[ "$str" =~ -(.*)\.[^.]*$ ]] && echo "${BASH_REMATCH[1]}"
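The same idea can be extended to capture the name, version and extension in one match. This is a sketch rather than the answer's own code: the regex is kept in a variable (re, a made-up name) to avoid quoting surprises inside [[ ]], and the sample value with extra dashes is taken from another answer in this thread.

```bash
#!/usr/bin/env bash
str="multiple-part-name-1.2.3.zip"

# name: everything before the last dash; version: up to the last dot;
# extension: whatever follows the last dot.
re='^(.*)-([^-]*)\.([^.]*)$'

if [[ $str =~ $re ]]; then
    name=${BASH_REMATCH[1]}
    version=${BASH_REMATCH[2]}
    extension=${BASH_REMATCH[3]}
    printf '%s\n' "$name" "$version" "$extension"
fi
```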
I think you can do this:
string=$(a="artifact-1.2.3.zip"; b="-"; echo "${a:$(( $(expr index "$a" "$b" + 1) - $(expr length "$b") ))}")
substring=${string:0:${#string}-4}
The last step removes the last 4 characters (.zip) from the string. There's some more info on here.
