Attribute expansion substring, get a substring at the Nth occurrence - bash

Let's say I have a filename:
Filename=AB123_10_001_00202.jpg
Using bash "attribute expansion substring" as much as possible,
I would like to extract "202" or, in general, the number without the "_00".
If I do:
Name=${Filename%.jpg}
I get:
AB123_10_001_00202
but then, as many "_0" occur, I don't see how to proceed.
So I tried:
Number=${Name##*_0}
...which works when the last digits are 12, 123 or 1234, for example. But if a "_0" is in between some digits, as in 202, I only get "2".

Removing the leading zeroes is tricky using parameter expansion. You could remove them by interpreting the number:
Filename=AB123_10_001_00202.jpg
Name="${Filename%.jpg}"
PaddedNumber="${Name##*_}"
(( Number = "10#$PaddedNumber" ))
Alternatively, use bash's regex matching operator, =~:
Filename=AB123_10_001_00202.jpg
Regex='.*_0*([0-9]+)'
[[ "$Filename" =~ $Regex ]]
Number="${BASH_REMATCH[1]}"

Thank you to whoever posted this answer and then removed it.
It worked perfectly well. I'm posting it here in case someone gets stuck as I was.
$ fname=AB123_10_001_00202.jpg
$ str=${fname%.jpg}
$ echo $str
AB123_10_001_00202
$ shopt -s extglob
$ printf -v var '%s\n' "${str##*_*(0)}"
$ echo $var
202
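One edge case with the extglob form: if the last field is all zeros, the expansion strips everything and leaves an empty string. Here is a minimal sketch of a wrapper that falls back to 0 in that case. It assumes the number is always the last "_"-separated field before the extension, and trailing_number is a made-up helper name, not something from the answer above.

```bash
#!/usr/bin/env bash
shopt -s extglob   # needed for the *(0) pattern

# trailing_number is a hypothetical helper name used for illustration.
trailing_number() {
    local name=${1%.*}          # drop the extension
    local num=${name##*_*(0)}   # drop everything through the last "_" and any zeros right after it
    printf '%s\n' "${num:-0}"   # an all-zero field would otherwise come out empty
}

trailing_number AB123_10_001_00202.jpg   # prints 202
trailing_number AB123_10_001_00000.jpg   # prints 0
```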

Related

How to pull a string apart by its contents

I have a string in a common pattern that I want to manipulate. I want to be able to turn string 5B299 into 5B300 (increment the last number by one).
I want to avoid blindly splicing the string by index, as the first number and letter can change in size. Essentially I want to be able to get the entire value of everything after the first character, increment it by one, and re-append it.
The only things I've found online so far show me how to cut by a delimiter, but I don't have a constant delimiter.
You could use the regex support built into the bash shell: its =~ operator performs POSIX Extended Regular Expression (ERE) matching. All you need to do is define a regex and work on the captured groups to build the resulting string:
str=5B299
re='^(.*[A-Z])([0-9]+)$'
Now use the =~ operator to do the regex match. If the match succeeds, it populates the array BASH_REMATCH: index 0 holds the whole match, and the captured groups follow from index 1. Here the first part (5B in the example) is stored at index 1 and the number at index 2. We increment the value at index 2 with the $((..)) operator.
if [[ $str =~ $re ]]; then
result="${BASH_REMATCH[1]}$(( BASH_REMATCH[2] + 1 ))"
printf '%s\n' "$result"
fi
A version of the regex that does not depend on the locale would use POSIX character classes instead of range expressions (note that [[:alpha:]] also accepts lowercase letters; use [[:upper:]] if only uppercase should match):
posix_re='^(.*[[:alpha:]])([[:digit:]]+)$'
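One thing to watch with the regex approach: if the numeric part has leading zeros (say 5B008), $(( ... )) would try to read it as octal and fail. The sketch below is one possible way around that, assuming the string always ends in a run of digits preceded by at least one non-digit. The name increment_suffix is invented for the example.

```bash
#!/usr/bin/env bash

# increment_suffix is a hypothetical helper name used for illustration.
increment_suffix() {
    local re='^(.*[^0-9])([0-9]+)$'
    if [[ $1 =~ $re ]]; then
        local prefix=${BASH_REMATCH[1]} digits=${BASH_REMATCH[2]}
        # 10# forces base 10, so "008" is not parsed as (invalid) octal.
        # %0*d re-pads the result to at least the original width.
        printf '%s%0*d\n' "$prefix" "${#digits}" "$(( 10#$digits + 1 ))"
    else
        return 1   # input did not look like <prefix><digits>
    fi
}

increment_suffix 5B299    # prints 5B300
increment_suffix 5B008    # prints 5B009
increment_suffix 7cf999   # prints 7cf1000
```

Keeping the zero padding is a design choice made for this sketch; drop the %0*d part if the padding should not be preserved.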
You can do what you are attempting fairly easily with the bash parameter-expansion for string indexes along with the POSIX arithmetic operator. For instance you could do:
#!/bin/bash
[ -z "$1" ] && { ## validate at least 1 argument provided
printf "error: please provide a number.\n" >&2
exit 1
}
[[ $1 =~ [^0-9][^0-9]* ]] && { ## validate all digits in argument
printf "error: input contains non-digit characters.\n" >&2
exit 1
}
suffix=${1:1} ## take all characters past the 1st as suffix
suffix=$((suffix + 1)) ## increment suffix by 1
result=${1:0:1}$suffix ## append suffix to original 1st character
echo "$result" ## output
exit 0
Which will leave the 1st character alone while incrementing the remaining characters by 1 and then joining again with the original 1st digit, while validating that the input consisted only of digits, e.g.
Example Use/Output
$ bash prefixsuffix.sh
error: please provide a number.
$ bash prefixsuffix.sh 38a900
error: input contains non-digit characters.
$ bash prefixsuffix.sh 38900
38901
$ bash prefixsuffix.sh 39999
310000
Look things over and let me know if that is what you intended.
You can use sed in conjunction with awk:
increment() {
echo "$1" | sed -E 's/([0-9]+[a-zA-Z]+)([0-9]+)/\1 \2/' | awk '{printf "%s%d", $1, ++$2}'
}
echo $(increment "5B299")
echo $(increment "127ABC385")
echo $(increment "7cf999")
Output:
5B300
127ABC386
7cf1000

Remove leading digits from a string with Bash using parameter expansion

The initial string is RU="903B/100ms"
from which I wish to obtain B/100ms.
Currently, I have written:
#!/bin/bash
RU="903B/100ms"
RU=${RU#*[^0-9]}
echo $RU
which returns /100ms since the parameter expansion removes up to and including the first non-numeric character. I would like to keep the first non-numeric character in this case. How would I do this by amending the above text?
You can use BASH_REMATCH to extract the desired matching value:
$ RU="903B/100ms"
$ [[ $RU =~ ^([[:digit:]]+)(.*) ]] && echo ${BASH_REMATCH[2]}
B/100ms
Or just catch the desired part as:
$ [[ $RU =~ ^[[:digit:]]+(.*) ]] && echo ${BASH_REMATCH[1]}
B/100ms
Assuming shopt -s extglob:
RU="${RU##+([0-9])}"
Or with sed:
$ echo "903B/100ms" | sed 's/^[0-9]*//'
B/100ms
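Since the question asks about amending the parameter expansion itself, here is a sketch that needs neither extglob nor a regex. It works in two steps: first isolate the leading run of digits, then strip exactly that run from the front. The function name strip_leading_digits is made up for the example.

```bash
#!/usr/bin/env bash

# strip_leading_digits is a hypothetical helper name used for illustration.
strip_leading_digits() {
    local s=$1
    # ${s%%[!0-9]*} deletes from the first non-digit to the end,
    # leaving only the leading run of digits (possibly empty).
    local digits=${s%%[!0-9]*}
    # Remove that run from the front; the quotes keep it literal.
    printf '%s\n' "${s#"$digits"}"
}

strip_leading_digits "903B/100ms"   # prints B/100ms
```

The same two expansions can be written inline as RU=${RU#"${RU%%[!0-9]*}"}.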

Change a Number inside a Variable

I have the following problem (it's about dates).
The user will set the following variable:
variable1=33_2016
Now I want to automatically set a second variable that increments the "33" by 1,
so that I get:
variable2=34_2016
Thanks for any advice.
My first choice would be to break the first variable apart with read, then put the (updated) pieces back together.
IFS=_ read f1 f2 <<< "$variable1"
# Option 1
variable2=$((f1 + 1))_$f2
# Option 2
printf -v variable2 '%s_%s' "$((f1 + 1))" "$f2"
You can also use parameter expansion to do the parsing:
f1=${variable1%_*}
f2=${variable1#*_}
You can also use a regular expression, which is more readable for parsing but much longer to put the pieces back together (BASH_REMATCH could use a shorter synonym).
[[ $variable1 =~ (.*)_(.*) ]] &&
f1=$((${BASH_REMATCH[1]}+1)) f2=${BASH_REMATCH[2]}
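The first and third options also allow the possibility of working with an array:
# With read -a
IFS=_ read -a f <<< "$variable1"
f[0]=$(( f[0] + 1 ))   # increment the first field
variable2=$(IFS=_; echo "${f[*]}")
# With regular expression
[[ $variable1 =~ (.*)_(.*) ]]
f=("${BASH_REMATCH[@]:1:2}")   # copy the captured groups into an array of our own
f[0]=$(( f[0] + 1 ))
variable2=$(IFS=_; echo "${f[*]}")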
You can use awk:
awk 'BEGIN{FS=OFS="_"}{$1+=1}1' <<< "${variable1}"
While this needs an external process to spawn (a bit slower) it's easier to read/write. Decide for yourself what is more important for you here.
To store the return value in a variable, use command substitution:
variable2=$(awk 'BEGIN{FS=OFS="_"}{$1+=1}1' <<< "${variable1}")
You can do somewhat the same thing with parameter expansion with substring substitution, e.g.
$ v1=33_2016
$ v2=${v1/${v1%_*}/$((${v1%_*}+1))}
$ echo $v2
34_2016
It's six of one, half a dozen of the other.
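Since these are dates, values like 08_2016 or 09_2016 seem likely, and $(( ... )) would reject those as invalid octal. A minimal sketch that sidesteps this, assuming the input always has the form NN_YYYY; next_week is an invented name, and rolling over into the next year is not handled here.

```bash
#!/usr/bin/env bash

# next_week is a hypothetical helper; it assumes input of the form NN_YYYY.
next_week() {
    local week year
    IFS=_ read -r week year <<< "$1"
    # 10# forces base 10, so "08" and "09" are not rejected as bad octal.
    # %0*d keeps at least the original field width.
    printf '%0*d_%s\n' "${#week}" "$(( 10#$week + 1 ))" "$year"
}

next_week 33_2016   # prints 34_2016
next_week 08_2016   # prints 09_2016
```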

case insensitive string comparison in bash

The following line removes the leading text before the variable $PRECEDING
temp2=${content#$PRECEDING}
But now I want $PRECEDING to be matched case-insensitively. This works with sed's I flag, but I can't figure out the whole command.
No need to call out to sed or use shopt. The easiest and quickest way to do this (as long as you have Bash 4):
if [ "${var1,,}" = "${var2,,}" ]; then
echo "matched"
fi
All you're doing there is converting both strings to lowercase and comparing the results.
Here's a way to do it with sed:
temp2=$(sed -e "s/^.*$PRECEDING//I" <<< "$content")
Explanation:
^.*$PRECEDING: ^ means start of string, . means any character, .* means any character zero or more times. So together this means "match everything from the start of the string up to and including the string stored in $PRECEDING".
The I part means case-insensitive, the g part (if you use it) means "match all occurrences" instead of just the 1st.
The <<< notation is for herestrings, so you save an echo.
The only bash way I can think of is to check if there's a match (case-insensitively) and if yes, exclude the appropriate number of characters from the beginning of $content:
content=foo_bar_baz
PRECEDING=FOO
shopt -s nocasematch
[[ $content == "$PRECEDING"* ]] && temp2=${content:${#PRECEDING}}
echo $temp2
Outputs: _bar_baz
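If you would rather not change nocasematch for the whole shell, a similar check can be done by lowercasing both sides with the Bash 4 ${var,,} expansion. This is only a sketch, and strip_prefix_nocase is a made-up name:

```bash
#!/usr/bin/env bash

# strip_prefix_nocase is a hypothetical helper name used for illustration.
# Requires Bash 4+ for the ${var,,} lowercase expansion.
strip_prefix_nocase() {
    local content=$1 prefix=$2
    # Take as many leading characters as the prefix is long.
    local head=${content:0:${#prefix}}
    if [[ ${head,,} == "${prefix,,}" ]]; then
        printf '%s\n' "${content:${#prefix}}"   # prefix matched: drop it
    else
        printf '%s\n' "$content"                # no match: leave unchanged
    fi
}

strip_prefix_nocase foo_bar_baz FOO   # prints _bar_baz
```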
Your examples spawn external processes (context switching).
A better way (bash v4):
VAR1="HELLoWORLD"
VAR2="hellOwOrld"
if [[ "${VAR1^^}" = "${VAR2^^}" ]]; then
echo MATCH
fi
link: Converting string from uppercase to lowercase in Bash
If you don't have Bash 4, I find the easiest way is to first convert your string to lowercase using tr
VAR1=HelloWorld
VAR2=helloworld
VAR1_LOWER=$(echo "$VAR1" | tr '[:upper:]' '[:lower:]')
VAR2_LOWER=$(echo "$VAR2" | tr '[:upper:]' '[:lower:]')
if [ "$VAR1_LOWER" = "$VAR2_LOWER" ]; then
echo "Match"
else
echo "Invalid"
fi
This also makes it really easy to assign your output to variables by changing your echo commands to OUTPUT="Match" and OUTPUT="Invalid".

Padding zeros in a string

I'm writing a bash script to get some podcasts. The problem is that some of the podcast numbers are one digit while others are two or three digits, so I need to pad them all to 3 digits.
I tried the following:
n=1
n = printf %03d $n
wget http://aolradio.podcast.aol.com/sn/SN-$n.mp3
but the variable 'n' doesn't stay padded permanently. How can I make it permanent?
Use backticks to assign the result of the printf command (``):
n=1
wget http://aolradio.podcast.aol.com/sn/SN-`printf %03d $n`.mp3
EDIT: Note that I removed one line which was not really necessary.
If you want to assign the output of 'printf %...' to n, you could use
n=`printf %03d $n`
and after that, use the $n variable substitution you used before.
With spaces around the =, bash does not perform an assignment at all: it tries to run a command named n. To assign the output of printf, use command substitution:
bash-3.2$ n=1
bash-3.2$ n=$(printf %03d $n)
bash-3.2$ echo $n
001
Be careful, though, if your input string has a leading zero!
printf will still do the padding, but it will also interpret your string as an octal number.
# looks ok
$ echo `printf "%05d" 03`
00003
# but 033 is read as octal, which is 27 in decimal
$ echo `printf "%05d" 033`
00027
A solution to this seems to be printing a float instead of decimal.
The trick is omitting the decimal places with .0f.
# works with leading zero
$ echo `printf "%05.0f" 033`
00033
# as well as without
$ echo `printf "%05.0f" 33`
00033
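Another way to dodge the octal interpretation, without going through a float, is to force base 10 in an arithmetic expansion before handing the value to printf. A rough sketch, assuming the input is a non-empty string of decimal digits; the function name pad is just for illustration:

```bash
#!/usr/bin/env bash

# pad is a hypothetical helper: pad <width> <digits>
pad() {
    local width=$1 value=$2
    # 10#$value makes bash read the digits as base 10, even with leading zeros.
    printf '%0*d\n' "$width" "$(( 10#$value ))"
}

pad 5 033   # prints 00033
pad 3 08    # prints 008
pad 3 8     # prints 008
```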
To avoid spawning a subprocess, pad with string operations:
a="123"
b="00000${a}"
c="${b: -5}"
n=`printf '%03d' "2"`
Note the spacing and the backticks.
As mentioned by noselad, command substitution with $(...) is preferable, as it supersedes backticks, i.e. `...`.
It is much easier to work with when nesting several command substitutions, since you don't have to escape ("backslash") the inner backticks.
This is in response to an answer given by cC Xx.
It works only as long as a has no more than 5 digits.
Consider when a=12345678.
It'll truncate the leading digits:
a="12345678"
b="00000${a}"
c="${b: -5}"
echo "$a, $b, $c"
This gives the following output:
12345678, 0000012345678, 45678
Adding an if that checks whether a has fewer than 5 digits before padding could be a solution:
if (( ${#a} < 5 )) ; then b="00000${a}" ; c="${b: -5}" ; else c=$a; fi
Just typing this here for additional information.
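The same idea can be written as a small loop that prepends zeros until the string is long enough, so nothing is ever cut off. This is only a sketch; pad_left is a made-up name and it treats its input purely as a string.

```bash
#!/usr/bin/env bash

# pad_left is a hypothetical helper: pad_left <string> <width>
pad_left() {
    local s=$1 width=$2
    # Prepend zeros until the string reaches the requested width;
    # longer strings pass through unchanged.
    while (( ${#s} < width )); do
        s="0$s"
    done
    printf '%s\n' "$s"
}

pad_left 123 5        # prints 00123
pad_left 12345678 5   # prints 12345678
```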
If you know the number of zeroes you need, you can use the string concatenation:
let pad="0"
pad+=1
echo $pad # this will print 01

Resources