How does extglob work with shell parameter expansion? - bash

I thought I understood the use of the optional ?(pattern-list) in bash (when extglob shell option is on) and by default in ksh. For example in bash:
$ shopt -s extglob
$ V=35xAB
$ echo "${V#?(35|88)x}" "${V#35}"
AB xAB
But when the matching prefix pattern is just one ?() or one *(), which introduce what I call optional patterns, the 35 is not omitted unless ## is used:
$ echo "${V#?(35|88)}" "${V#*(35|88)}" # Why is 35 not left out?
35xA 35xA
$ echo "${V##?(35|88)}" "${V##*(35|88)}" # Why is it omitted when ## is used?
xA xA
The same behaviour is reported when ?() and *() are used in a matching suffix pattern (using % and %%):
$ echo "${V%5?(xA|Bz)}" # 5xA is omitted
3
$ echo "${V%?(xA|Bz)}" "${V%*(xA|Bz)}" # Why is xA not left out?
35xA 35xA
$ echo "${V%%?(xA|Bz)}" "${V%%*(xA|Bz)}" # xA is omitted when %% is used
35 35
I tested this issue in the bash releases 3.2.25, 4.1.2 and 4.1.6 and it makes me think that, perhaps, I had not properly understood the actual underlying shell mechanism for matching patterns.
Can anybody shed light on this?
Thanks in advance

If you use @ instead of ? then it works as expected:
$> echo "${V#@(35|88)}"
xAB
$> echo "${V%@(xAB|Bzh)}"
35
The same holds for + instead of *:
$> echo "${V#*(35|88)}"
35xAB
$> echo "${V#+(35|88)}"
xAB
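It is because # and % remove the shortest match, and a pattern that may match zero occurrences has an empty shortest match:
?(pattern-list) # Matches zero or one occurrence of the given patterns
@(pattern-list) # Matches one of the given patterns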
And:
*(pattern-list) # Matches zero or more occurrences of the given patterns
+(pattern-list) # Matches one or more occurrences of the given patterns
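A minimal sketch of the four operators applied to the question's V, assuming bash with extglob enabled; the variable names are illustrative only:

```shell
shopt -s extglob            # enable extended patterns before they are used

V=35xAB

short_opt=${V#?(35|88)}     # shortest prefix: ?() may match nothing, so V is unchanged
long_opt=${V##?(35|88)}     # longest prefix: ?() consumes "35"
short_at=${V#@(35|88)}      # @() must match exactly one alternative
short_plus=${V#+(35|88)}    # +() must match at least one alternative

printf '%s\n' "$short_opt" "$long_opt" "$short_at" "$short_plus"
```

Only the first expansion leaves the value untouched, because its pattern is the only one here that can match the empty string under shortest-match rules.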

Related

Extracting a string between last two slashes in Bash

I know this can be easily done using regex like I answered on https://stackoverflow.com/a/33379831/3962126, however I need to do this in bash.
The closest question I found on Stack Overflow is "bash: extracting last two dirs for a pathname"; the difference is that if
DIRNAME = /a/b/c/d/e
then I need to extract
d
This may be relatively long, but it's also much faster to execute than most preceding answers (other than the zsh-only one and that by j.a.), since it uses only string manipulations built into bash and uses no subshell expansions:
string='/a/b/c/d/e' # initial data
dir=${string%/*} # trim everything past the last /
dir=${dir##*/} # ...then remove everything before the last / remaining
printf '%s\n' "$dir" # demonstrate output
printf is used in the above because echo doesn't work reliably for all values (think about what it would do on a GNU system with /a/b/c/-n/e).
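The two expansions above can be wrapped in a small function. This is only a sketch; the name second_to_last is hypothetical and not part of the original answer:

```shell
# Hypothetical helper: prints the path component just before the last slash.
second_to_last() {
    local path=$1
    local dir=${path%/*}         # drop the last "/" and everything after it
    printf '%s\n' "${dir##*/}"   # keep what follows the last remaining "/"
}

second_to_last /a/b/c/d/e        # d
second_to_last /a/b/c/-n/e       # -n
```

Note that a string without any slash is printed unchanged, since neither pattern matches.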
Here is a pure bash solution:
[[ $DIRNAME =~ /([^/]+)/[^/]*$ ]] && printf '%s\n' "${BASH_REMATCH[1]}"
Compared to some of the other answers:
It matches the string between the last two slashes. So, for example, it doesn't match d if DIRNAME=d/e.
It's shorter and fast (just uses built-ins and doesn't create subprocesses).
It supports any character between the last two slashes (see Charles Duffy's answer for more on this).
Also notice that this is not the way to assign a variable in bash:
DIRNAME = /a/b/c/d/e
       ^ ^
Those spaces are wrong, so remove them:
DIRNAME=/a/b/c/d/e
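As a sketch of how the regex answer behaves on input with and without two slashes, the same pattern can be placed in a function that fails when nothing matches. The function name between_last_slashes is an assumption made for illustration:

```shell
# Hypothetical helper: prints the text between the last two slashes,
# and returns a non-zero status when the input has no such part.
between_last_slashes() {
    local regex='/([^/]+)/[^/]*$'
    [[ $1 =~ $regex ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}

between_last_slashes /a/b/c/d/e                     # d
between_last_slashes d/e || echo "no match for d/e"
```

Keeping the regex in a variable avoids quoting differences between bash versions.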
Using awk:
echo "/a/b/c/d/e" | awk -F / '{ print $(NF-1) }' # d
Edit: This does not work when the path contains newlines, and it still produces output when there are fewer than two slashes; see comments below.
Using sed:
If you want to get the fourth element:
DIRNAME="/a/b/c/d/e"
echo "$DIRNAME" | sed -r 's_^(/[^/]*){3}/([^/]*)/.*$_\2_g'
If you want to get the second-to-last element:
DIRNAME="/a/b/c/d/e"
echo "$DIRNAME" | sed -r 's_^.*/([^/]*)/[^/]*$_\1_g'
OMG, maybe this was obvious, but not to me initially. I got the right result with:
dir=$(basename -- "$(dirname -- "$str")")
echo "$dir"
Using zsh parameter substitution is pretty cool too
echo ${${DIRNAME%/*}##*/}
I think it's faster than the double $() as well, because it won't need any subprocesses.
Basically it slices off the right side first, and then all the remaining left side second.

VAR=${n:-m} usage in Bash

I want to write a Genetic Algorithm for bash based on the one posted here: http://father-natures.blogspot.mx/2013/04/implementing-genetic-algorithm-in-bash.html. I am quite inexperienced in advanced scripting and I don't get what VAR=${n:-m} stands for. My guess was that things like:
POOL_SIZE=${1:-6}
make $1=-6; however, when I check $1 it is empty and when I check $POOL_SIZE I get 6.
libertad@engrane4:~$ echo "$POOL_SIZE"
6
This is quite confusing for me. If I wanted the variable to be 6 I would write:
POOL_SIZE=6
Could you tell me what I am missing (what else is this assignment doing)?
Thank you,
It sets a default in case $1 is empty.
From 3.5.3 Shell Parameter Expansion in the Bash Reference Manual:
${parameter:-word}
If parameter is unset or null, the expansion of word is substituted.
Otherwise, the value of parameter is substituted.
Example
$ echo ${a:-"hello"}
hello
$ a="test"
$ echo ${a:-"hello"}
test
Based on your comment:
Thanks, @fedorqui. The original variables were POOL_SIZE=${1:-6},
REPRO_CHANCE=${2:-30}, BEST_FITS=${3:-70}. Now I am wondering if
${POOL_SIZE:-6}, ${REPRO_CHANCE:-30} and ${BEST_FITS:-70} would be the
same and why is the numeration needed
If you have
POOL_SIZE=${1:-6}
REPRO_CHANCE=${2:-30}
BEST_FITS=${3:-70}
it is because POOL_SIZE, REPRO_CHANCE and BEST_FITS are supposed to contain the values of $1, $2 and $3. Any $n refers to the nth positional parameter, for example the nth argument passed to a script. So if you have the following script:
$ cat a
#!/bin/bash
POOL_SIZE=${1:-6}
REPRO_CHANCE=${2:-30}
BEST_FITS=${3:-70}
echo "POOL_SIZE=$POOL_SIZE"
echo "REPRO_CHANCE=$REPRO_CHANCE"
echo "BEST_FITS=$BEST_FITS"
Running it with different numbers of arguments yields:
$ ./a
POOL_SIZE=6
REPRO_CHANCE=30
BEST_FITS=70
$ ./a 2 2 2
POOL_SIZE=2
REPRO_CHANCE=2
BEST_FITS=2
$ ./a 24 2
POOL_SIZE=24
REPRO_CHANCE=2
BEST_FITS=70
I hope it makes it clear.
Note also that ${var:-value} and ${var-value} are not the same; see "What is the difference between ${var:-word} and ${var-word}?".
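A short sketch of that difference, using two illustrative variables (unset_var and empty_var are names chosen here, not taken from the question): the colon form also treats an empty value as missing, while the form without the colon only reacts to an unset variable.

```shell
unset -v unset_var
empty_var=""

with_colon_unset=${unset_var:-default}   # unset: the default is used
with_colon_empty=${empty_var:-default}   # empty: the default is used
no_colon_unset=${unset_var-default}      # unset: the default is used
no_colon_empty=${empty_var-default}      # set but empty: stays empty

printf '[%s]\n' "$with_colon_unset" "$with_colon_empty" \
                "$no_colon_unset" "$no_colon_empty"
```

Only the last line of output is an empty pair of brackets.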

Shell: extract words matching pattern, but ignore circumventing expression

I am currently trying to extract ALL matching expressions from a text which e.g. looks like this and put them into an array.
aaaaaaaaa${bbbbbbb}ccccccc${dddd}eeeee
ssssssssssssssssss${TTTTTT}efhsekfh ej
348653jlk3jß1094utß43t59ßgöelfl,-s-fko
The matching expressions are similar to this: ${}. Beware that I need the full expression, not only the word in between this expression! So in this case the result should be an array which contains:
${bbbbbbb}
${dddd}
${TTTTTT}
Problems I have stumbled upon and couldn't solve:
It should NOT recognize this as a whole:
${bbbbbbb}ccccccc${dddd} but each one on its own
grep -o is not installed on the old machine, Perl is not allowed either!
Many commands e.g. BASH_REMATCH only deliver the whole line or the first occurrence of the expression, instead of all matching expressions in the line!
The mentioned pattern \${[^}]*} seems to work partly, as it can extract the first occurrence of the expression; however, it always omits the ones following after that if they are on the same line. What I need is ALL matching expressions found in the line, not only the first one.
You could split the string on any of the characters $,{,}:
$ s='...blaaaaa${blabla}bloooo${bla}bluuuuu...'
$ echo "$s"
...blaaaaa${blabla}bloooo${bla}bluuuuu...
$ IFS='${}' read -ra words <<< "$s"
$ for ((i=0; i<${#words[@]}; i++)); do printf "%d %s\n" $i "${words[i]}"; done
0 ...blaaaaa
1
2 blabla
3 bloooo
4
5 bla
6 bluuuuu...
So if you're trying to extract the words inside the braces:
$ for ((i=2; i<${#words[@]}; i+=3)); do printf "%d %s\n" $i "${words[i]}"; done
2 blabla
5 bla
If the above doesn't suit you, grep will work:
$ echo '...blaaaaa${blabla}bloooo${bla}bluuuuu...' | grep -o '\${[^}]\+}'
${blabla}
${bla}
You still haven't told us exactly what output you want.
Since it bugged me a lot I have asked directly on www.unix.com and was kindly provided with a solution which fits my ancient shell. So if anyone has the same problem, here is the solution:
line='aaaa$aa{yyy}aaa${important}xxxxxxxx${important2}oo{o$}oo$oo${importantstring3}'
IFS=\$ read -a words <<< "$line"
regex='^(\{[^}]+})'
for e in "${words[@]}"; do
    if [[ $e =~ $regex ]]; then
        echo "\$${BASH_REMATCH[0]}"
    fi
done
which prints then the following - without even getting disturbed by random occurrences of $ and { or } between the syntactically correct expressions:
${important}
${important2}
${importantstring3}
I have updated the full solution after I got another update from the forums: now it also ignores this: aaa$aa{yyy}aaaa - which it previously printed as ${yyy} - but which it should completely ignore as there are characters between $ and {. Now with the additional anchoring on the beginning of the regexp it works as expected.
I just found another issue: theoretically, using the above approach I would still get a wrong output if the read line looks like this: line='{ccc}aaaa${important}aaa'. The IFS would split it and the REGEX would match {ccc} although this did not have the $ sign in front. This is suboptimal.
However following approach could solve it: after getting the BASH_REMATCH I would need to do a search in the original line - the one I gave to the IFS - for this exact expression ${ccc} - with the difference, that the $ is included! And only if it finds this exact match, only then, it counts as a valid match; otherwise it should be ignored. Kind of a reverse search method...
Updated - add this reverse search to ignore the trap on the beginning of the line:
pattern="\$${BASH_REMATCH[0]}";
searchresult="";
searchresult=`echo "$line" | grep "$pattern"`;
if [ "$searchresult" != "" ]; then echo "It was found!"; fi;
Negligible issue: If the line looks like this line='{ccc}aaaaaa${ccc}bbbbb' it would recognize the first {ccc} as a valid match (although it isn't) and print it, because the reverse search found the second ${ccc}. Although this is not intended it's irrelevant for my specific purpose as it implies that this pattern does in fact exist at least once in the same line.
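An alternative that sidesteps both the IFS split and the reverse search is to match the full ${...} token, dollar sign included, and keep re-matching on whatever follows it. The sketch below is not the unix.com solution; it assumes bash 3.1 or later (for +=), and the function name extract_placeholders and the array name matches are hypothetical:

```shell
# Hypothetical helper: collects every ${...} token of a line, in order,
# into the global array "matches".
extract_placeholders() {
    local rest=$1
    local regex='(\$\{[^}]+\})(.*)'   # group 1: one token, group 2: the remainder
    matches=()
    while [[ $rest =~ $regex ]]; do
        matches+=("${BASH_REMATCH[1]}")
        rest=${BASH_REMATCH[2]}       # continue after the token just found
    done
}

extract_placeholders 'aaaa$aa{yyy}aaa${important}xxxxxxxx${important2}oo{o$}oo$oo${importantstring3}'
printf '%s\n' "${matches[@]}"
```

Because the $ is part of the regex, a bare {ccc} at the start of a line is never reported, with or without a later ${ccc}.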

Bash: trim a parameter from both ends

Greetings!
These are well-known Bash parameter expansion patterns:
${parameter#word}, ${parameter##word}
and
${parameter%word}, ${parameter%%word}
I need to chop one part from the beginning and another part from the end of the parameter. Could you advise me, please?
If you're using Bash version >= 3.2, you can use regular expression matching with a capture group to retrieve the value in one command:
$ path='/xxx/yyy/zzz/ABC/abc.txt'
$ [[ $path =~ ^.*/([^/]*)/.*$ ]]
$ echo ${BASH_REMATCH[1]}
ABC
This would be equivalent to:
$ path='/xxx/yyy/zzz/ABC/abc.txt'
$ path=$(echo "$path" | sed 's|^.*/\([^/]*\)/.*$|\1|')
$ echo "$path"
ABC
I don't know that there's an easy way to do this without resorting to sub-shells, something you probably want to avoid for efficiency. I would just use:
> xx=hello_there
> yy=${xx#he}
> zz=${yy%re}
> echo ${zz}
llo_the
If you're not fussed about efficiency and just want a one-liner:
> zz=$(echo ${xx%re} | sed 's/^he//')
> echo ${zz}
llo_the
Keep in mind that this second method starts sub-shells - it's not something I'd be doing a lot of if your script has to run fast.
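The two-step expansion can also be kept in one place without any sub-shell inside the helper itself. A minimal sketch, where trim_both is a hypothetical name and both arguments are treated as glob patterns:

```shell
# Hypothetical helper: removes a leading and a trailing pattern from a string.
trim_both() {
    local value=$1 prefix=$2 suffix=$3
    value=${value#$prefix}    # unquoted, so $prefix acts as a pattern
    value=${value%$suffix}    # same for $suffix at the end
    printf '%s\n' "$value"
}

trim_both hello_there he re   # llo_the
```

Quoting the inner expansions ("$prefix", "$suffix") would make them match literally instead.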
This solution uses what Andrey asked for and it does not employ any external tool. Strategy: Use the % parameter expansion to remove the file name, then use the ## to remove all but the last directory:
$ path=/path/to/my/last_dir/filename.txt
$ dir=${path%/*}
$ echo $dir
/path/to/my/last_dir
$ dir=${dir##*/}
$ echo $dir
last_dir
I would highly recommend going with bash arrays as their performance is just over 3x faster than regular expression matching.
$ path='/xxx/yyy/zzz/ABC/abc.txt'
$ IFS='/' arr=( $path )
$ echo ${arr[${#arr[@]}-2]}
ABC
This works by telling bash that each element of the array is separated by a forward slash / via IFS='/'. We access the penultimate element of the array by first determining how many elements are in the array via ${#arr[@]}, then subtracting 2 and using that as the index into the array.
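One caveat with the array approach: IFS='/' arr=( $path ) is a pair of plain assignments, so IFS stays set to / afterwards, and the unquoted $path is also subject to globbing. A hedged alternative, assuming bash with here-strings and a path without newlines, limits IFS to a single read; the names parts, count and parent are illustrative:

```shell
path='/xxx/yyy/zzz/ABC/abc.txt'

# IFS is changed only for this read; the fields are not glob-expanded.
IFS=/ read -r -a parts <<< "$path"

count=${#parts[@]}          # the leading "/" yields an empty first element
parent=${parts[count-2]}    # second-to-last element

printf '%s\n' "$parent"     # ABC
```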

^word^replacement^ on all matches in Bash?

To clarify, I am looking for a way to perform a global search and replace on the previous command used. ^word^replacement^ only seems to replace the first match.
Is there some set option that is eluding me?
Try this:
$ echo oneone
oneone
$ !!:gs/one/two/ # Repeats last command; substitutes 'one' --> 'two'.
twotwo
This solution uses Bash Substring Replacement:
$ SENTENCE="1 word, 2 words";echo "${SENTENCE//word/replacement}"
1 replacement, 2 replacements
Note the use of the double slashes denotes "global" string replacement.
This solution can be executed in one line.
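For comparison, a small sketch of the single-slash and double-slash forms side by side, reusing the SENTENCE value from above (first_only and all_matches are names introduced here):

```shell
SENTENCE="1 word, 2 words"

first_only=${SENTENCE/word/replacement}     # single slash: first match only
all_matches=${SENTENCE//word/replacement}   # double slash: every match

printf '%s\n' "$first_only" "$all_matches"
```

This operates on a variable's value, not on the shell history, so it complements the history expansion shown earlier rather than replacing it.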
Here's how to globally replace a string in a file named "myfile.txt":
$ sed -i -e "s/word/replacement/g" myfile.txt
Blending my answer here with John Feminella's you can do this if you want an alias:
$ alias dothis='`history -p "!?monkey?:gs/jpg/png/"`'
$ ls *.jpg
monkey.jpg
$ dothis
monkey.png
The !! only does the previous command, while !?string? matches the most recent command containing "string".
A nasty way to get around this could be something like this:
Want to echo BAABAA rather than BLABLA by swapping L's for A's
$ echo "BLABLA"
BLABLA
$ `echo "!!" | sed 's/L/A/g'`
$(echo "echo "BLABLA" " | sed 's/L/A/g')
BAABAA
$
Unfortunately this technique doesn't seem to work in functions or aliases.
This question has many duplicates, and one elegant answer appears only in this answer by user @Mikel on Unix & Linux Stack Exchange:
fc -s pat=rep
This bash builtin is documented in section 9.2, Bash History Builtins:
In the second form, command is re-executed after each instance of pat
in the selected command is replaced by rep. command is interpreted the
same as first above.
A useful alias to use with the fc command is r='fc -s', so that typing
‘r cc’ runs the last command beginning with cc and typing ‘r’
re-executes the last command (see Aliases).
I tested it on SUSE 10.1.
"^word^replacement^" doesn't work, while "^word^replacement" works well.
For instance:
linux-geek:/home/Myworks # ls /etc/ld.so.conf
/etc/ld.so.conf
linux-geek:/home/Myworks # ^ls^cat
cat /etc/ld.so.conf
/usr/X11R6/lib/Xaw3d
/usr/X11R6/lib
/usr/i486-linux-libc5/lib=libc5
/usr/i386-suse-linux/lib
/usr/local/lib
/opt/kde3/lib
/opt/gnome/lib
include /etc/ld.so.conf.d/*.conf
linux-geek:/home/Myworks #

Resources