Bash variable substitution and strings - bash

Let's say I have two variables:
a="AAA"
b="BBB"
I read a string from a file. This string is the following:
str='$a $b'
How can I create a new string from the first one with the variables substituted?
newstr="AAA BBB"

Bash variable indirection without eval:
Well, as eval is evil, we may try to do this without it, by using indirection in variable names.
a="AAA"
b="BBB"
str='$a $b'
newstr=()
for cnt in $str ;do
[ "${cnt:0:1}" == '$' ] && cnt=${cnt:1} && cnt=${!cnt}
newstr+=($cnt)
done
newstr="${newstr[*]}"
echo $newstr
AAA BBB
Another try:
var1="Hello"
var2="2015"
str='$var1 world! Happy new year $var2'
newstr=()
for cnt in $str ;do
[ "${cnt:0:1}" == '$' ] && cnt=${cnt:1} && cnt=${!cnt}
newstr+=($cnt)
done
newstr="${newstr[*]}"
echo $newstr
Hello world! Happy new year 2015
Addendum: As correctly pointed out in @EtanReisner's comment, if your string contains * or other glob-expandable strings, you may have to use set -f to prevent bad things:
cd /bin
var1="Hello"
var2="star"
var3="*"
str='$var1 this string contain a $var2 as $var3 *'
newstr=()
for cnt in $str ;do
[ "${cnt:0:1}" == '$' ] && cnt=${cnt:1} && cnt=${!cnt};
newstr+=("$cnt");
done;
newstr="${newstr[*]}"
echo "$newstr"
Hello this string contain a star as * bash bunzip2 busybox....zmore znew
echo ${#newstr}
1239
Note: I've added double quotes in newstr+=("$cnt") to prevent glob expansion, but set -f still seems to be required, because the unquoted $str in the for loop is itself subject to globbing:
newstr=()
set -f
for cnt in $str ;do
[ "${cnt:0:1}" == '$' ] && cnt=${cnt:1} && cnt=${!cnt}
newstr+=("$cnt")
done
set +f
newstr="${newstr[*]}"
echo "$newstr"
Hello this string contain a star as * *
Note 2: This is far from a perfect solution. For example, if the string contains punctuation, this won't work anymore. Example:
str='$var1, this string contain a $var2 as $var3: *'
With the same variables as in the previous run, this renders:
' this string contain a star as *' because ${!var1,} and ${!var3:} don't exist.
... and if $str contains special characters:
As @godblessfq asked:
If str contains a line break, how do I do the substitution and preserve the newline in the output?
So this is not robust, as every indirected variable must be first, last, or separated by spaces from all special characters!
str=$'$var1 world!\n... 2nd line...'
var1=Hello
newstr=()
set -f
IFS=' ' read -d$'\377' -ra array <<<"$str"
for cnt in "${array[@]}";do
[ "${cnt:0:1}" == '$' ] && cnt=${cnt:1} && cnt=${!cnt}
newstr+=("$cnt")
done
set +f
newstr="${newstr[*]}"
echo "$newstr"
Hello world!
... 2nd line...
Since the <<< here-string adds a trailing newline, the last echo command could be written:
echo "${newstr%$'\n'}"
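The punctuation and newline limitations noted above come from splitting the string on whitespace. One possible way around them, sketched here under the assumption that variable names consist only of letters, digits and underscores, is to let a regex locate each $name and still resolve it through indirection. The function name expand_vars and its local variable names are hypothetical and not part of the answer above:

```bash
#!/usr/bin/env bash
# Hypothetical helper: expand $name references without eval or word splitting.
# Assumption: referenced variables are not called input, output, match, name
# or pattern, because those names are shadowed by the locals below.
expand_vars() {
    local input="$1" output="" match name
    local pattern='\$([A-Za-z_][A-Za-z0-9_]*)'
    while [[ $input =~ $pattern ]]; do
        match=${BASH_REMATCH[0]}      # the reference itself, e.g. $var1
        name=${BASH_REMATCH[1]}       # the bare name, e.g. var1
        output+=${input%%"$match"*}   # text before the first reference
        output+=${!name}              # value of the referenced variable
        input=${input#*"$match"}      # keep scanning after the reference
    done
    printf '%s\n' "$output$input"
}

var1="Hello"
var2="star"
var3="*"
expand_vars '$var1, this string contains a $var2 as $var3: *'
# prints: Hello, this string contains a star as *: *
```

Because nothing is word-split or left unquoted, set -f is not needed here, and newlines inside the string are passed through unchanged. Unset variables expand to an empty string.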

The easiest solution is to use eval:
eval echo "$str"
To assign it to a variable, use command substitution:
replaced=$(eval echo "$str")
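Note that eval runs whatever the string contains, not just variable references. A small sketch of that risk, assuming the string read from the file also embeds a command substitution (the injected text is made up for illustration):

```bash
a="AAA"
b="BBB"
# Assumption: the file content is not trusted and embeds a command substitution.
str='$a $b $(echo injected)'
replaced=$(eval echo "$str")
echo "$replaced"   # prints: AAA BBB injected
```

With a harmless echo this only adds a word, but any other command would be executed the same way, so this approach is only reasonable when the file content is fully trusted.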

Disclaimer: I only discovered perl an hour ago. But this seems to work robustly, whatever special characters you throw at it:
newstr=$(a2="$a" b2="$b" perl -pe 's/\$a\b/$ENV{a2}/g; s/\$b\b/$ENV{b2}/g' <(echo -e "$str"))
Test:
a='A*A\nA'
b='B*B\nB'
str='$a $aa * \n $b $bb'
newstr=$(a2="$a" b2="$b" perl -pe 's/\$a\b/$ENV{a2}/g; s/\$b\b/$ENV{b2}/g' <(echo -e "$str"))
echo -e "$newstr"
Output:
A*A
A $aa *
B*B
B $bb

I'd use an awk solution with awk variables. This allows passing text that contains special characters and substituting any placeholder with it.
A workaround to match $ is to use [\x24]:
awk -v a="$a" -v b="$b" '{gsub("[\x24]a",a);gsub("[\x24]b",b); print}' <<< "$str"
Here:
-v defines an awk variable, e.g. a="$a"
[\x24] is the ASCII code for $, so [\x24]a matches $a
gsub(x,y) replaces every match of x with y
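Two details are worth checking before relying on this with arbitrary text: awk processes backslash escapes in -v assignments, and gsub treats & in the replacement as the matched text. A minimal sketch that sidesteps both, assuming the values are passed through the environment instead, and using a hypothetical helper function named replace_literal:

```bash
a='R&D'
b='x\ty'
str='$a and $b'
newstr=$(printf '%s\n' "$str" | a="$a" b="$b" awk '
# Replace every literal occurrence of token in s with value (no regex involved).
function replace_literal(s, token, value,    out, pos) {
    out = ""
    while ((pos = index(s, token)) > 0) {
        out = out substr(s, 1, pos - 1) value
        s = substr(s, pos + length(token))
    }
    return out s
}
{
    line = replace_literal($0, "$a", ENVIRON["a"])
    line = replace_literal(line, "$b", ENVIRON["b"])
    print line
}')
printf '%s\n' "$newstr"   # prints: R&D and x\ty
```

This sketch has no word-boundary check, so $a would also match the start of $ab, and a value of a that itself contains $b would be substituted again by the second call.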


Get first character of each string with BASH_REMATCH

I'm trying to get the first character of each string using a regex and BASH_REMATCH in a shell script.
My input text file contains:
config_text = STACK OVER FLOW
The strings STACK OVER FLOW must be uppercase, as shown.
My output should be something like this:
SOF
My code for now is:
var = config_text
values=$(grep $var test_file.txt | tr -s ' ' '\n' | cut -c 1)
if [[ $values =~ [=(.*)]]; then
echo $values
fi
As you can see, I'm using tr and cut, but I'd like to replace them with BASH_REMATCH alone, because these two commands have been reported in many links as not working on macOS.
I tried something like this:
var = config_text
values=$(grep $var test_file.txt)
if [[ $values =~ [=(.*)(\b[a-zA-Z])]]; then
echo $values
fi
As I explained, values should be:
S O F
But it seems \b does not work in a shell script.
Does anyone have an idea how to get my desired output with BASH_REMATCH only?
Thanks in advance for any help.
A generic BASH_REMATCH solution handling any number of words and any separator:
# Use "local" for these variables when running inside a function.
input="STACK OVER FLOW" pattern='([[:upper:]]+)([^[:upper:]]*)' result=""
while [[ $input =~ $pattern ]]; do
result+="${BASH_REMATCH[1]::1}${BASH_REMATCH[2]}"
input="${input:${#BASH_REMATCH[0]}}"
done
echo "$result"
# Output: "S O F"
Bash's regexes are kind of cumbersome if you don't know how many words there are in the input string. How's this instead?
config_text="STACK OVER FLOW"
sed 's/\([^[:space:]]\)[^[:space:]]*/\1/g' <<<"$config_text"
First, add a valid shebang and paste your script at https://shellcheck.net for validation and recommendations.
With the assumption that the line starts with config and ends with FLOW e.g.
config_text = STACK OVER FLOW
Now the script.
#!/usr/bin/env bash
values="config_text = STACK OVER FLOW"
regexp="config_text = ([[:upper:]]{1})[^ ]+ ([[:upper:]]{1})[^ ]+ ([[:upper:]]{1}).+$"
while IFS= read -r line; do
[[ "$line" = "$values" && "$values" =~ $regexp ]] &&
printf '%s %s %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
done < test_file.txt
If there is only one line, or the target string/pattern is on the first line of test_file.txt, the while loop is not needed.
#!/usr/bin/env bash
values="config_text = STACK OVER FLOW"
regexp="config_text = ([[:upper:]]{1})[^ ]+ ([[:upper:]]{1})[^ ]+ ([[:upper:]]{1}).+$"
IFS= read -r line < test_file.txt
[[ "$line" = "$values" && "$values" =~ $regexp ]] &&
printf '%s %s %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
Make sure you are running Bash v4+, since macOS defaults to Bash v3.
See How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?
Another option rather than bash regex would be to utilize bash parameter expansion substring ${parameter:offset:length} to extract the desired characters:
$ read -ra arr <text.file ; printf "%s%s%s\n" "${arr[2]:0:1}" "${arr[3]:0:1}" "${arr[4]:0:1}"
SOF
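The regex and parameter-expansion approaches above can also be combined so that the number of words does not need to be known in advance. A minimal sketch, assuming the line has already been read into a variable (hard-coded here as line), that uses no external commands:

```bash
#!/usr/bin/env bash
line='config_text = STACK OVER FLOW'
# Capture everything after "config_text =" in one regex match.
re='^config_text[[:space:]]*=[[:space:]]*(.*)$'
initials=""
if [[ $line =~ $re ]]; then
    # Split the captured value on whitespace, then keep each word's first character.
    read -ra words <<<"${BASH_REMATCH[1]}"
    for word in "${words[@]}"; do
        initials+=${word:0:1}
    done
fi
echo "$initials"   # prints: SOF
```

Keeping the regex in a variable avoids quoting surprises inside [[ ... ]].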

How to capture the longest match of a repeating pattern using BASH_REMATCH

I am trying to capture the longest match of a repeating pattern
do_run() {
local regex='.*((abc)+).*'
local str='_abcabcabc123_'
echo "regex=${regex}"$'\n'
echo "str=${str}"$'\n'
if [[ "${str}" =~ ${regex} ]]
then
for i in "${!BASH_REMATCH[@]}"
do
echo "$i=${BASH_REMATCH[i]}"
done
else
echo "no match"
fi
}
I get the following output :
regex=.*((abc)+).*
str=_abcabcabc123_
0=_abcabcabc123_
1=abc
2=abc
I am trying to get something like :
regex=.*((abc)+).*
str=_abcabcabc123_
0=_abcabcabc123_
x=abcabcabc
(Update : x is just here to indicate that the index of the matching group does not matter but I need to know what number to use to retrieve the matching group ...)
Update:
After reading the comments, the following regex will work: ((abc)+)
However, I also need to capture what precedes and what follows ((abc)+).
I had not mentioned it earlier because I thought the same solution would apply.
So the new code would be:
do_run() {
local regex='(.*)((abc)+)(.*)'
local str='_abcabcabc123_'
echo "regex=${regex}"$'\n'
echo "str=${str}"$'\n'
if [[ "${str}" =~ ${regex} ]]
then
for i in "${!BASH_REMATCH[@]}"
do
echo "$i=${BASH_REMATCH[i]}"
done
else
echo "no match"
fi
}
I get then the following output :
regex=(.*)((abc)+)(.*)
str=_abcabcabc123_
0=_abcabcabc123_
1=_abcabc
2=abc
3=abc
4=123_
I want to be able to retrieve abcabcabc from a matching group, but also what precedes it and what follows it.
As a workaround, you can do this:
[STEP 101] $ cat foo.sh
v=_abcabcabc123_
if [[ $v =~ (abc)+ ]]; then
middle=${BASH_REMATCH[0]}
[[ $v =~ (.*)"$middle" ]]
before=${BASH_REMATCH[1]}
[[ $v =~ "$middle"(.*) ]]
after=${BASH_REMATCH[1]}
echo "before: $before"
echo "middle: $middle"
echo "after : $after"
fi
[STEP 102] $ bash foo.sh
before: _
middle: abcabcabc
after : 123_
[STEP 103] $
I also need to capture what precedes and what follows ((abc)+).
For that, you would typically need a negative lookbehind with Perl regex, something along the lines of (?<!abc)((abc)+)(.*).
I am bad at Perl regex, but with a Perl-enabled grep I was able to do this:
$ grep -oxP '(.*)(?<!abc)((abc)+)\K(.*)' <<<'_abcabcabc123_'
123_
$ grep -oP '((abc)+)' <<<'_abcabcabc123_'
abcabcabc
$ rev <<<'_abcabcabc123_' | grep -oP '(.*)(?<!cba)((cba)+)\K(.*)' | rev
_
Bash has no lookarounds and no perl regex. Consider using python or perl.
But you may use sed by splitting the part on the regex and then reading lines, which may be simpler:
$ readarray -t lines < <(<<<'_abcabcabc123_' sed -E 's/((abc)+)/\n&\n/'); declare -p lines
declare -a lines=([0]="_" [1]="abcabcabc" [2]="123_")
Another idea: you may use bash expansion to replace the abc parts by something unique, then split it on that separator:
$ IFS=' ' read -r before post < <(printf "%s\n" "${str//abc/ }") ; declare -p before post
declare -- before="_"
declare -- post="123_"
# or
$ IFS='#' read -r before post < <(<<<"${str//abc/#}" tr -s '#') ; declare -p before post
declare -- before="_"
declare -- post="123_"
For your given input this regex would work:
re='^([^a]|a[^b]*|ab[^c]*)((abc)+)(.*)'
str='_abcabcabc123_'
[[ $str =~ $re ]] && declare -p BASH_REMATCH
Output:
declare -ar BASH_REMATCH=([0]="_abcabcabc123_" [1]="_" [2]="abcabcabc" [3]="abc" [4]="123_")
So you can use:
"${BASH_REMATCH[1]}" # string before
"${BASH_REMATCH[2]}" # string containing all "abc"s
"${BASH_REMATCH[4]}" # string after
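The (.*)((abc)+)(.*) attempt in the question fails because the leading (.*) is greedy and takes as many abc repetitions as it can, leaving only one for the middle group. A sketch that combines the ideas above, assuming only the first run of abc matters: match (abc)+ on its own, then cut the string around that match with parameter expansion.

```bash
#!/usr/bin/env bash
str='_abcabcabc123_'
re='(abc)+'
if [[ $str =~ $re ]]; then
    middle=${BASH_REMATCH[0]}    # leftmost, longest run of "abc"
    before=${str%%"$middle"*}    # everything before that run
    after=${str#*"$middle"}      # everything after it
    printf 'before=%s middle=%s after=%s\n' "$before" "$middle" "$after"
fi
# prints: before=_ middle=abcabcabc after=123_
```

Quoting "$middle" inside the expansions makes it match literally instead of being treated as a glob pattern.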

Double quotes in bash variable assignment and command substitution

I have a few questions about variable assignment and command substitution:
Why does \"<Enter> add a new line to the output
$ v1="1\"
2"
$ echo "$v1"
1"
2
?
Why
$ v2=$(echo -e "123\n\n\n")
$ echo "$v2"
123
while
$ v2=$(echo -e "123\n\n\n5")
$ echo "$v2"
123
5
?
How to correctly use quotes in such constructs:
v3="$(command "$v2")"
?
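First question
<Enter> is a newline character, the same as \n. The \" before it is an escaped quote, so the string stays open and the newline becomes part of the value.
The following code illustrates this: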
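function print_hex() {
# printf avoids the extra newline that a <<< here-string would append
HEXVAL=$(printf '%s' "$1" | hexdump -v -e '1/1 "%02X"')
echo "$HEXVAL"
}
v1="
"
v2=$'\n'
# Quote the arguments; unquoted, a lone newline is removed by word splitting
print_hex "$v1"
print_hex "$v2"
---------output---------
0A
0A
The hex output shows that v1 and v2 are equal.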
Second question
The echo manual explains the options:
-e enable interpretation of backslash escapes
-E disable interpretation of backslash escapes (default)
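The difference seen in the second question is not caused by echo's options: command substitution removes all trailing newlines from the captured output, while newlines in the middle are kept. A short sketch of that behaviour, followed by a commonly used sentinel-character workaround (the x sentinel is an arbitrary choice):

```bash
v2=$(printf '123\n\n\n')
echo "${#v2}"    # prints: 3  (the three trailing newlines were stripped)

v2=$(printf '123\n\n\n5')
echo "${#v2}"    # prints: 7  (inner newlines survive)

# Append a sentinel so the newlines are no longer trailing, then remove it.
v2=$(printf '123\n\n\n'; printf x)
v2=${v2%x}
echo "${#v2}"    # prints: 6
```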
Third question
Do you mean printing the string, or getting the command's output?
In the following example, v3 holds the literal string and v4 holds the command output.
v2=.
v3="\$(ls \"$v2\")"
v4=$(ls "$v2")
echo $v3
echo $v4
---------output---------
$(ls ".")
test1.sh
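For the construct in the question itself, the quotes inside $(...) are parsed independently of the quotes around it, so nesting them as written is fine. A small sketch, using printf as a stand-in for command:

```bash
v2='two  words'
# Inner quotes keep $v2 as a single argument; the outer quotes are optional in
# an assignment but harmless.
v3="$(printf '<%s>' "$v2")"
printf '%s\n' "$v3"    # prints: <two  words>

# Without the inner quotes, $v2 is split into two arguments.
v4="$(printf '<%s>' $v2)"
printf '%s\n' "$v4"    # prints: <two><words>
```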

Why is the backslash not URL-encoded in this shell script?

I am trying to URL-encode a string in a shell script.
I have downloaded a script from the internet.
Here it is:
#!/bin/sh
url_encoder()
{
echo -n "$1" | awk -v ORS="" '{ gsub(/./,"&\n") ; print }' | while read l;
do
case "$l" in
[-_.~/a-zA-Z0-9] ) echo -n ${l} ;;
"" ) echo -n %20 ;;
* ) printf '%%%02X' "'$l"
esac
done
echo ""
}
The basic idea of the above code is to
(1) convert the input string into rows, with one character per row
(2) URL-encode the character in each row
So if I run
$ url_encoder "abc:"
the output is "abc%3A", which is correct.
But if I run
$ url_encoder "\\" # I want to encode the backslash, so I use 2 "\" here
there is no output at all.
Do you know why?
There is no need to use read, which is slow: variable expansion can extract a substring, and the space character needs no special handling, since the default case covers it.
url_encoder() {
local i str=$1 c
for ((i=0;i<${#str};i+=1)); do
c=${str:i:1}
case "$c" in
[-_.~/a-zA-Z0-9] ) echo -n "${c}" ;;
* ) printf '%%%02X' "'$c" ;;
esac
done
}
l='\'
printf '%%%02X' "'$l"
The reason the backslash disappears is that it has a special meaning for read; the -r option should be used to avoid this.
https://www.gnu.org/software/bash/manual/html_node/Bash-Builtins.html#index-read
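The effect of -r can be seen in isolation. A minimal sketch, feeding read one line at a time with printf:

```bash
# Without -r, read treats the backslash as an escape character and drops it.
printf '%s\n' 'a\b' | { read line; printf '%s\n' "$line"; }       # prints: ab
# With -r, the backslash is kept.
printf '%s\n' 'a\b' | { read -r line; printf '%s\n' "$line"; }    # prints: a\b
# A lone backslash before the newline is a line continuation, so read reaches
# end of input and fails; a while-read loop fed this input never runs its body.
printf '%s\n' '\' | { read line || echo "read failed"; }          # prints: read failed
```

The last case corresponds to the question's input: the awk step emits the backslash followed by a newline, which read without -r consumes as a continuation.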
Note that ~ should also be encoded according to RFC 1738: http://www.rfc-editor.org/rfc/rfc1738.txt
A printf argument that starts with a quote (single or double), as in "'$c", handles only ASCII characters (<128).
url_encoder() { (
LC_ALL=C
str=$1
for ((i=0;i<${#str};i+=1)); do
c=${str:i:1}
if [[ $c = [-_./a-zA-Z0-9] ]]; then
echo -n "${c}"
elif [[ $c = [$'\1'-$'\x7f'] ]]; then
printf '%%%02X' "'$c"
else
printf '%%%s' $(echo -n "$c" | od -An -tx1)
fi
done
)}
Nahuel Fouilleul's helpful answer explains the problem with your approach (-r is missing from your read command, resulting in unwanted interpretation of \ chars.) and offers a more efficient bash solution.
Here's a more efficient, POSIX-compliant solution (sh-compatible) that performs the encoding with a single awk command, assuming that the input string is composed only of characters in the ASCII/Unicode code-point range between 32 and 127, inclusive:
#!/bin/sh
url_encoder()
{
url="$1" awk -v ORS= 'BEGIN {
# Read the input via ENVIRON: awk -v would interpret backslash escapes in it.
url = ENVIRON["url"]
# Create lookup table that maps characters to their code points.
for(n=32;n<=127;n++) ord[sprintf("%c",n)]=n
# Process characters one by one, either passing them through, if they
# need no encoding, or converting them to their %-prefixed hex equivalent.
for(i=1;i<=length(url);++i) {
char = substr(url, i, 1)
if (char !~ "[-_.~/a-zA-Z0-9]") char = sprintf("%%%x", ord[char])
print char
}
printf "\n"
}'
}

How to print literal string "$1" in bash script?

I want to print the literal string "$1". But when I do this with echo, it prints the value of the $1 variable instead. How can I print "$1" as a plain string?
For example:
set -- "output" # this sets $1 to be "output"
echo $1 # ==> output
But I want this:
echo $1 # ==> $1
You have to escape the $ to have it shown:
$ echo "\$1"
or, as noted by JeremyP in comments, just use single quotes so that the value of $1 does not get expanded:
$ echo '$1'
You need to either:
Enclose the variable in SINGLE quotes: echo '$1'
Escape the $ sign: echo "\$1"
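Both forms can be checked side by side. A small sketch, which also mixes a literal $1 and its value in a single printf call:

```bash
set -- "output"               # sets $1 to "output"
echo "$1"                     # prints: output
echo '$1'                     # prints: $1  (single quotes suppress expansion)
echo "\$1"                    # prints: $1  (escaped dollar sign)
printf '%s=%s\n' '$1' "$1"    # prints: $1=output
```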
