string pattern replacement in bash - bash

Using bash, I have a string:
str='a $s'
echo ${str/\$/\$a}
# a $as
str='a $1'
echo ${str/\$/\$a}
# a $a1
How can I modify the pattern so that the replacement is performed only if the word after $ starts with a letter?
I want
str='a $s'
echo ${str/??/??}
# a $as
str='a $1'
echo ${str/??/??}
# a $1

This would have been trivial if pattern substitution in Bash allowed lookahead. The solution would then simply be ${str/\$(?=[a-zA-Z])/\$a}. Unfortunately, that is not the case: ${var/pattern/replacement} takes a glob pattern, not a regular expression.
You can still do it if you don't mind using sed.
str='a $s'
echo "$str" | sed 's/\$\([a-zA-Z]\)/\$a\1/g' # gives you a $as
str='a $1'
echo "$str" | sed 's/\$\([a-zA-Z]\)/\$a\1/g' # gives you a $1
\$\([a-zA-Z]\) matches $ followed by a letter and captures that letter. If a match is found, it is replaced with $a followed by the captured letter (\1). For more details, see this tutorial.
The trailing g means it will replace all occurrences in the string. Remove it if you want to replace only the first occurrence.
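The same capture-and-reinsert idea can probably be done without sed as well. Below is a minimal pure-bash sketch rather than a battle-tested solution. The function name insert_a_after_dollar is made up for illustration, and it assumes a bash with [[ =~ ]] and += (3.1 or later).

```bash
#!/usr/bin/env bash
# Hypothetical helper: insert "a" after every "$" that is followed by a letter.
insert_a_after_dollar() {
  local rest=$1 out='' re='\$[[:alpha:]]' match before
  while [[ $rest =~ $re ]]; do
    match=${BASH_REMATCH[0]}      # leftmost "$" plus the letter after it, e.g. "$s"
    before=${rest%%"$match"*}     # everything before that leftmost match
    out+=$before'$a'${match:1}    # re-emit "$", add "a", then the captured letter
    rest=${rest#*"$match"}        # continue scanning after the match
  done
  printf '%s\n' "$out$rest"
}

insert_a_after_dollar 'a $s'   # a $as
insert_a_after_dollar 'a $1'   # a $1
```

Because the regex match is the leftmost one, the first literal occurrence of the matched text is at the match position, so the two parameter expansions can safely cut the string there.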

You can use an if to check whether a letter follows the dollar sign and, if so, do the replacement:
if [[ "$str" =~ \$[[:alpha:]] ]]; then
echo "${str/\$/\$a}"
else
echo "$str"
fi

Demonstration:
for str in 'a $1' 'a $s'
do
[[ $str =~ \$([[:alpha:]]) ]]
echo ${str/\$/\$${BASH_REMATCH[1]:+a}}
done
Result:
a $1
a $as

Related

Get first character of each string with BASH_REMATCH

I am trying to get the first character of each string using a regex and BASH_REMATCH in a shell script.
My input text file contains:
config_text = STACK OVER FLOW
The strings STACK OVER FLOW must be uppercase, as shown.
My output should be something like this:
SOF
My code for now is:
var = config_text
values=$(grep $var test_file.txt | tr -s ' ' '\n' | cut -c 1)
if [[ $values =~ [=(.*)]]; then
echo $values
fi
As you can see, I am using tr and cut, but I would like to replace them with BASH_REMATCH alone, because these two commands have been reported in many links as not working on macOS.
I tried something like this:
var = config_text
values=$(grep $var test_file.txt)
if [[ $values =~ [=(.*)(\b[a-zA-Z])]]; then
echo $values
fi
The values, as I explained, should be:
S O F
But it seems \b does not work in a shell script.
Does anyone have an idea how to get my desired output with BASH_REMATCH only?
Thanks in advance for any help.
A generic BASH_REMATCH solution that handles any number of words and any separator:
input="STACK OVER FLOW" pattern='([[:upper:]]+)([^[:upper:]]*)' result=""
while [[ $input =~ $pattern ]]; do
# keep the first letter of the word plus the separator that follows it
result+="${BASH_REMATCH[1]::1}${BASH_REMATCH[2]}"
# drop the matched part and continue with the rest of the string
input="${input:${#BASH_REMATCH[0]}}"
done
echo "$result"
# Output: "S O F"
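To go from the full config line to SOF with regex matching only, something like the following might work. It is a sketch under assumptions: the names initials, word_re, and kv_re are made up for illustration, and the value is assumed to be everything after the first = on the line.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first character of every whitespace-separated word.
initials() {
  local input=$1 result=''
  local word_re='([^[:space:]])[^[:space:]]*'
  while [[ $input =~ $word_re ]]; do
    result+=${BASH_REMATCH[1]}               # first character of the leftmost word
    input=${input#*"${BASH_REMATCH[0]}"}     # drop everything up to and including that word
  done
  printf '%s\n' "$result"
}

line='config_text = STACK OVER FLOW'
kv_re='^[^=]*=(.*)$'                          # capture everything after the first "="
if [[ $line =~ $kv_re ]]; then
  initials "${BASH_REMATCH[1]}"               # SOF
fi
```

Cutting with ${input#*"match"} instead of a fixed offset means leading whitespace before a word does not throw the loop off.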
Bash's regexes are kind of cumbersome if you don't know how many words there are in the input string. How's this instead?
config_text="STACK OVER FLOW"
sed 's/\([^[:space:]]\)[^[:space:]]*/\1/g' <<<"$config_text"
First, add a valid shebang and paste your script at https://shellcheck.net for validation and recommendations.
Assuming the line starts with config and ends with FLOW, e.g.
config_text = STACK OVER FLOW
Now the script:
#!/usr/bin/env bash
values="config_text = STACK OVER FLOW"
regexp="config_text = ([[:upper:]]{1})[^ ]+ ([[:upper:]]{1})[^ ]+ ([[:upper:]]{1}).+$"
while IFS= read -r line; do
[[ "$line" = "$values" && "$values" =~ $regexp ]] &&
printf '%s %s %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
done < test_file.txt
If there is only one line, or the target string/pattern is on the first line of test_file.txt, the while loop is not needed.
#!/usr/bin/env bash
values="config_text = STACK OVER FLOW"
regexp="config_text = ([[:upper:]]{1})[^ ]+ ([[:upper:]]{1})[^ ]+ ([[:upper:]]{1}).+$"
IFS= read -r line < test_file.txt
[[ "$line" = "$values" && "$values" =~ $regexp ]] &&
printf '%s %s %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
Make sure you are running Bash 4+, since macOS defaults to Bash 3.
See also: How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?
Another option rather than bash regex would be to utilize bash parameter expansion substring ${parameter:offset:length} to extract the desired characters:
$ read -ra arr <text.file ; printf "%s%s%s\n" "${arr[2]:0:1}" "${arr[3]:0:1}" "${arr[4]:0:1}"
SOF
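If the number of words is not known in advance, the same substring expansion can be applied in a loop. This is only a sketch: first_letters is a hypothetical name, and the line is assumed to have the form key = WORD WORD ..., so the first two fields are skipped.

```bash
#!/usr/bin/env bash
# Hypothetical helper: first letters of all fields after "key =" on one line.
first_letters() {
  local -a fields
  local field out=''
  read -ra fields <<<"$1"            # split the line on whitespace
  for field in "${fields[@]:2}"; do  # skip field 0 (key) and field 1 ("=")
    out+=${field:0:1}
  done
  printf '%s\n' "$out"
}

first_letters 'config_text = STACK OVER FLOW'   # SOF
```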

How to capture the longest match of a repeating pattern using BASH_REMATCH

I am trying to capture the longest match of a repeating pattern
do_run() {
local regex='.*((abc)+).*'
local str='_abcabcabc123_'
echo "regex=${regex}"$'\n'
echo "str=${str}"$'\n'
if [[ "${str}" =~ ${regex} ]]
then
for i in ${!BASH_REMATCH[@]}
do
echo "$i=${BASH_REMATCH[i]}"
done
else
echo "no match"
fi
}
I get the following output :
regex=.*((abc)+).*
str=_abcabcabc123_
0=_abcabcabc123_
1=abc
2=abc
I am trying to get something like :
regex=.*((abc)+).*
str=_abcabcabc123_
0=_abcabcabc123_
x=abcabcabc
(Update : x is just here to indicate that the index of the matching group does not matter but I need to know what number to use to retrieve the matching group ...)
Update:
After reading the comments, the following regex works: ((abc)+)
However, I also need to capture what precedes and what follows ((abc)+).
I had not mentioned it earlier because I thought the same solution would apply.
So the new code would be:
do_run() {
local regex='(.*)((abc)+)(.*)'
local str='_abcabcabc123_'
echo "regex=${regex}"$'\n'
echo "str=${str}"$'\n'
if [[ "${str}" =~ ${regex} ]]
then
for i in ${!BASH_REMATCH[@]}
do
echo "$i=${BASH_REMATCH[i]}"
done
else
echo "no match"
fi
}
I get then the following output :
regex=(.*)((abc)+)(.*)
str=_abcabcabc123_
0=_abcabcabc123_
1=_abcabc
2=abc
3=abc
4=123_
I want to be able to retrieve abcabcabc from a matching group but also what precedes it and what follows it
As a workaround you can do like this:
[STEP 101] $ cat foo.sh
v=_abcabcabc123_
if [[ $v =~ (abc)+ ]]; then
middle=${BASH_REMATCH[0]}
[[ $v =~ (.*)"$middle" ]]
before=${BASH_REMATCH[1]}
[[ $v =~ "$middle"(.*) ]]
after=${BASH_REMATCH[1]}
echo "before: $before"
echo "middle: $middle"
echo "after : $after"
fi
[STEP 102] $ bash foo.sh
before: _
middle: abcabcabc
after : 123_
[STEP 103] $
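A variation on this workaround needs only one regex match and gets the surrounding parts with parameter expansion. It is a sketch, not a general solution: the function name split_around_abc is made up, and it relies on =~ returning the longest run at the leftmost position, as in the output above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split a string around the leftmost run of "abc".
# Sets the global variables before, middle, and after; returns 1 if there is no run.
split_around_abc() {
  local s=$1 re='(abc)+'
  [[ $s =~ $re ]] || return 1
  middle=${BASH_REMATCH[0]}   # whole match: the run of "abc" found at the leftmost position
  before=${s%%"$middle"*}     # text before the first occurrence of that run
  after=${s#*"$middle"}       # text after it
}

if split_around_abc '_abcabcabc123_'; then
  printf 'before: %s\nmiddle: %s\nafter : %s\n' "$before" "$middle" "$after"
fi
```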
I also need to capture what precedes and what follows ((abc)+).
For that, you would typically need a negative lookbehind with a Perl regex, something along the lines of (?<!abc)((abc)+)(.*).
I am bad at Perl regex, but with a Perl-enabled grep I was able to do this:
$ grep -oxP '(.*)(?<!abc)((abc)+)\K(.*)' <<<'_abcabcabc123_'
123_
$ grep -oP '((abc)+)' <<<'_abcabcabc123_'
abcabcabc
$ rev <<<'_abcabcabc123_' | grep -oP '(.*)(?<!cba)((cba)+)\K(.*)' | rev
_
Bash has no lookarounds and no perl regex. Consider using python or perl.
But you may use sed by splitting the part on the regex and then reading lines, which may be simpler:
$ readarray -t lines < <(<<<'_abcabcabc123_' sed -E 's/((abc)+)/\n&\n/'); declare -p lines
declare -a lines=([0]="_" [1]="abcabcabc" [2]="123_")
Another idea: you may use bash expansion to replace the abc parts by something unique, then split it on that separator:
$ IFS=' ' read -r before post < <(printf "%s\n" "${str//abc/ }") ; declare -p before post
declare -- before="_"
declare -- post="123_"
# or
$ IFS='#' read -r before post < <(<<<"${str//abc/#}" tr -s '#') ; declare -p before post
declare -- before="_"
declare -- post="123_"
For your given input this regex would work:
re='^([^a]|a[^b]*|ab[^c]*)((abc)+)(.*)'
str='_abcabcabc123_'
[[ $str =~ $re ]] && declare -p BASH_REMATCH
Output:
declare -ar BASH_REMATCH=([0]="_abcabcabc123_" [1]="_" [2]="abcabcabc" [3]="abc" [4]="123_")
So you can use:
"${BASH_REMATCH[1]}" # string before
"${BASH_REMATCH[2]}" # string containing all "abc"s
"${BASH_REMATCH[4]}" # string after
RegEx Demo

Remove leading digits from a string with Bash using parameter expansion

The initial string is RU="903B/100ms"
from which I wish to obtain B/100ms.
Currently, I have written:
#!/bin/bash
RU="903B/100ms"
RU=${RU#*[^0-9]}
echo $RU
which returns /100ms since the parameter expansion removes up to and including the first non-numeric character. I would like to keep the first non-numeric character in this case. How would I do this by amending the above text?
You can use BASH_REMATCH to extract the desired matching value:
$ RU="903B/100ms"
$ [[ $RU =~ ^([[:digit:]]+)(.*) ]] && echo ${BASH_REMATCH[2]}
B/100ms
Or just catch the desired part as:
$ [[ $RU =~ ^[[:digit:]]+(.*) ]] && echo ${BASH_REMATCH[1]}
B/100ms
Assuming shopt -s extglob:
RU="${RU##+([0-9])}"
Or with sed:
echo "903B/100ms" | sed 's/^[0-9]*//g'
B/100ms
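If enabling extglob or calling an external tool is not desirable, two plain parameter expansions should also do it. This is a small sketch, and the variable names digits and rest are chosen only for illustration.

```bash
#!/usr/bin/env bash
RU="903B/100ms"
digits=${RU%%[!0-9]*}   # leading digits: drop everything from the first non-digit onward
rest=${RU#"$digits"}    # remove exactly those digits from the front
echo "$rest"            # B/100ms
```

The first expansion isolates the leading digits, so the second one removes only those and keeps the first non-digit character.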

Bash variable substitution and strings

Let's say I have two variables:
a="AAA"
b="BBB"
I read a string from a file. This string is the following:
str='$a $b'
How to create a new string from the first one that substitutes the variables?
newstr="AAA BBB"
Bash variable indirection without eval:
Since eval is evil, we can try to do this without it, by using indirection in variable names.
a="AAA"
b="BBB"
str='$a $b'
newstr=()
for cnt in $str ;do
[ "${cnt:0:1}" == '$' ] && cnt=${cnt:1} && cnt=${!cnt}
newstr+=($cnt)
done
newstr="${newstr[*]}"
echo $newstr
AAA BBB
Another try:
var1="Hello"
var2="2015"
str='$var1 world! Happy new year $var2'
newstr=()
for cnt in $str ;do
[ "${cnt:0:1}" == '$' ] && cnt=${cnt:1} && cnt=${!cnt}
newstr+=($cnt)
done
newstr="${newstr[*]}"
echo $newstr
Hello world! Happy new year 2015
Addendum: As correctly pointed out in @EtanReisner's comment, if your string contains * or other glob-expandable strings, you may have to use set -f to prevent bad things:
cd /bin
var1="Hello"
var2="star"
var3="*"
str='$var1 this string contain a $var2 as $var3 *'
newstr=()
for cnt in $str ;do
[ "${cnt:0:1}" == '$' ] && cnt=${cnt:1} && cnt=${!cnt};
newstr+=("$cnt");
done;
newstr="${newstr[*]}"
echo "$newstr"
Hello this string contain a star as * bash bunzip2 busybox....zmore znew
echo ${#newstr}
1239
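Note: I've added double quotes in newstr+=("$cnt") to prevent glob expansion there, but set -f still seems to be required, because the unquoted $str in the for loop is itself subject to globbing.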
newstr=()
set -f
for cnt in $str ;do
[ "${cnt:0:1}" == '$' ] && cnt=${cnt:1} && cnt=${!cnt}
newstr+=("$cnt")
done
set +f
newstr="${newstr[*]}"
echo "$newstr"
Hello this string contain a star as * *
Note 2: This is far from a perfect solution. For example, if the string contains punctuation, it breaks again:
str='$var1, this string contain a $var2 as $var3: *'
with the same variables as in the previous run, this renders:
' this string contain a star as *' because ${!var1,} and ${!var3:} don't exist.
... and if $str contains special characters:
As @godblessfq asked:
If str contains a line break, how do I do the substitution and preserve the newline in the output?
So this is not robust, as every indirected variable must be first, last, or separated by spaces from all special characters!
str=$'$var1 world!\n... 2nd line...'
var1=Hello
newstr=()
set -f
IFS=' ' read -d$'\377' -ra array <<<"$str"
for cnt in "${array[@]}";do
[ "${cnt:0:1}" == '$' ] && cnt=${cnt:1} && cnt=${!cnt}
newstr+=("$cnt")
done
set +f
newstr="${newstr[*]}"
echo "$newstr"
Hello world!
... 2nd line...
Since the <<< here-string adds a trailing newline, the last echo command could be written:
echo "${newstr%$'\n'}"
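The punctuation and newline problems come from splitting on whitespace. One way around that might be to locate each $name with a regex and splice in the indirect expansion. This is a sketch only: expand_vars is a hypothetical name, placeholders are assumed to look like $identifier, and variables named rest, out, token, name, before, or re would be shadowed by the function's locals.

```bash
#!/usr/bin/env bash
# Hypothetical helper: replace each $name in the argument with the value of
# the shell variable of that name, leaving all other text (including glob
# characters and newlines) untouched.
expand_vars() {
  local rest=$1 out='' token name before
  local re='\$([A-Za-z_][A-Za-z_0-9]*)'
  while [[ $rest =~ $re ]]; do
    token=${BASH_REMATCH[0]}     # e.g. "$var1"
    name=${BASH_REMATCH[1]}      # e.g. "var1"
    before=${rest%%"$token"*}    # literal text before the leftmost placeholder
    out+=$before${!name}         # indirect expansion; unset variables expand to nothing
    rest=${rest#*"$token"}
  done
  printf '%s\n' "$out$rest"
}

var1="Hello" var2="star" var3="*"
expand_vars '$var1, this string contain a $var2 as $var3: *'
# Hello, this string contain a star as *: *
```

No set -f is needed here, because nothing is expanded unquoted in a context where word splitting or globbing applies.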
The easiest solution is to use eval:
eval echo "$str"
To assign it to a variable, use command substitution:
replaced=$(eval echo "$str")
Disclaimer: I only discovered perl an hour ago. But this seems to work robustly, whatever special characters you throw at it:
newstr=$(a2="$a" b2="$b" perl -pe 's/\$a\b/$ENV{a2}/g; s/\$b\b/$ENV{b2}/g' <(echo -e "$str"))
Test:
a='A*A\nA'
b='B*B\nB'
str='$a $aa * \n $b $bb'
newstr=$(a2="$a" b2="$b" perl -pe 's/\$a\b/$ENV{a2}/g; s/\$b\b/$ENV{b2}/g' <(echo -e "$str"))
echo -e "$newstr"
Output:
A*A
A $aa *
B*B
B $bb
I'd use an awk solution with awk variables. This allows passing text containing special characters and substituting any placeholder with it.
A workaround to match a literal $ is to use [\x24]:
awk -v a="$a" -v b="$b" '{gsub("[\x24]a",a);gsub("[\x24]b",b); print}' <<< "$str"
Here:
-v a="$a" defines the awk variable a
\x24 is the hex ASCII code for $, so [\x24]a matches $a
gsub(x,y) replaces every match of x with y

How can I concatenate / join two bash variables by a newline character?

My non-working code sample (erroneous line -> empty=$empty\n$url):
empty=""
IFS=$'\n'
for line in $s; do
if [[ $line =~ $regex ]]; then
url="${BASH_REMATCH[2]}${BASH_REMATCH[1]}"
echo $url
empty=$empty\n$url
else
echo "$s does not match"
fi
done
echo $empty|sort -f -t/ -k 4
I am trying to rebuild the modified lines that were split by the for loop.
empty="$empty"$'\n'"$url"
$'\n' is a literal newline in bash (double-quoting the variable references is not strictly necessary here, but helps readability; alternative: empty=${empty}$'\n'${url}).
Alternative solution with printf:
printf -v empty '%s\n%s' "$empty" "$url"
String literals can have embedded newlines:
entry="$entry
$url"
or
entry+="
$url"
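When many values are joined this way, collecting them in an array and joining once at the end may be easier to follow. The snippet below is a sketch with made-up sample URLs standing in for the values built in the loop.

```bash
#!/usr/bin/env bash
# Hypothetical sample data standing in for the URLs built in the loop.
urls=()
for url in 'http://b.example/z/2' 'http://a.example/y/1'; do
  urls+=("$url")                       # collect each value as one array element
done

joined=$(printf '%s\n' "${urls[@]}")   # one per line; $(...) strips the trailing newline
printf '%s\n' "$joined" | sort -f -t/ -k 4
```

This also avoids the empty first line that results from starting with an empty string and always prepending a newline.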

Resources