How to insert variable and code after pattern using sed? - bash

I'm using a shell script to insert code with a variable after a previous code pattern in script.tex, however sed is not adding anything after the expected pattern.
cat script.tex
\multicolumn{1}{c}{st_var}
Expected result (script.tex) after script.sh is run:
\multicolumn{1}{c}{A} & \multicolumn{1}{c}{B} & \multicolumn{1}{c}{C} & \multicolumn{1}{c}{D} & \multicolumn{1}{c}{E} & \multicolumn{1}{c}{F} \\
Current result (script.tex):
\multicolumn{1}{c}{A}
The first part of the conditional is working as expected. The remaining is not being found by sed.
cat script.sh:
#!/bin/bash
var=("NA" "A" "B" "C" "D" "E" "F")
clen=$(( ${#var[@]} - 1 ))
cind=1
for (( i=1; i<${#var[@]}; i++ )) ; do
if [[ "$cind" -eq 1 ]]; then
sed -i 's/st_var/'${var[$i]//\"/}'/g' script.tex
elif [[ "$cind" -gt 1 ]] && [[ "$cind" -lt "$clen" ]]; then
sstr="\multicolumn{1}{c}{${var[$i-1]//\"/}}"
estr=" & \multicolumn{1}{c}{${var[$i]//\"/}}"
festr=" & \multicolumn{1}{c}{${var[$i]//\"/}} \\\\"
sed -i '/^${sstr}/ s/$/${estr}/' script.tex
else
sed -i '/^${sstr}/ s/$/${festr}/' script.tex
fi
cind=$((cind + 1))
done
The var array here must have all elements double quoted for other purposes outside of this question. Also, the var array is shown here for simplicity - the letters A-F could be any random string. The first element in the array here is skipped (NA).
The best attempt so far:
script.sh:
#!/bin/bash -x
var=("NA" "A" "B" "C" "D" "E" "F")
clen=$(( ${#var[@]} - 1 ))
cind=1
for (( i=1; i<${#var[@]}; i++ )) ; do
if [[ "$cind" -eq 1 ]]; then
sed -i 's/st_var/'${var[$i]//\"/}'/g' script.tex
elif [[ "$cind" -gt 1 ]] && [[ "$cind" -lt "$clen" ]]; then
sstr='\multicolumn{1}{c}{'${var[$i-1]//\"/}'}'
estr=' \& \multicolumn{1}{c}{'${var[$i]//\"/}'}'
festr=' \& \multicolumn{1}{c}{'${var[$i+1]//\"/}'} \\'
# sed -i '/$sstr/r $estr/' script.tex
# sed -i '/^'"${sstr}"'/'"${estr}"'/' script.tex
sed -i "s/$sstr/&$estr/" script.tex
else
sed -i "s/$sstr/&$festr/" script.tex
# sed -i '/^'"${sstr}"'/'"${festr}"'/' script.tex
fi
cind=$((cind + 1))
done
Result:
\multicolumn{1}{c}{A} & multicolumn{1}{c}{B} & multicolumn{1}{c}{C} & multicolumn{1}{c}{D} & multicolumn{1}{c}{F} \ & multicolumn{1}{c}{E}
The ampersands are coming through, however the backslashes before multicolumn aren't coming through, and neither are the two backslashes at the end of the line. E and F are also flipped - F should be last.

Consider a different approach. Instead of adding anything incrementally, which might be hard and confusing because you have to keep "state", just do one single run. One replacement and regex pattern.
var=("A" "B" "C" "D" "E" "F")
# Generate replacement for the line.
repl=$(
# Print var on separate lines with the stub
printf " \multicolumn{1}{c}{%s} \n" "${var[@]}" |
# join lines with & + space character
paste -sd '&'
)
# add trailing \\
repl+="\\\\"
# Remove leading space
repl=${repl:1}
# Properly escape
# see https://stackoverflow.com/questions/407523/escape-a-string-for-a-sed-replace-pattern
ESCAPED_REPLACE=$(printf '%s\n' "$repl" | sed -e 's/[\/&]/\\&/g')
KEYWORD="\multicolumn{1}{c}{st_var}";
ESCAPED_KEYWORD=$(printf '%s\n' "$KEYWORD" | sed -e 's/[]\/$*.^[]/\\&/g');
# Finally run sed
set -x
sed "s/^$ESCAPED_KEYWORD$/$ESCAPED_REPLACE/"
When executed, for the following input:
\multicolumn{1}{c}{st_var}
outputs:
+ sed 's/^\\multicolumn{1}{c}{st_var}$/\\multicolumn{1}{c}{A} \& \\multicolumn{1}{c}{B} \& \\multicolumn{1}{c}{C} \& \\multicolumn{1}{c}{D} \& \\multicolumn{1}{c}{E} \& \\multicolumn{1}{c}{F} \\\\/'
\multicolumn{1}{c}{A} & \multicolumn{1}{c}{B} & \multicolumn{1}{c}{C} & \multicolumn{1}{c}{D} & \multicolumn{1}{c}{E} & \multicolumn{1}{c}{F} \\
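The escaping step matters because, in the replacement part of s///, sed treats a backslash as an escape character and & as "the matched text", which is consistent with the lost backslashes the question reports. Below is a minimal sketch of just that step, reusing the escaping command from this answer. The function name escape_sed_replacement and the shortened two-column row are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: escape backslash, slash and ampersand so that sed
# inserts the text literally when it is used as an s/// replacement.
escape_sed_replacement() {
    printf '%s\n' "$1" | sed -e 's/[\/&]/\\&/g'
}

# Shortened example row (two columns only), ending in a LaTeX line break.
repl='\multicolumn{1}{c}{A} & \multicolumn{1}{c}{B} \\'
escaped=$(escape_sed_replacement "$repl")

# The escaped text survives the trip through sed unchanged.
printf '%s\n' 'st_var' | sed "s/st_var/$escaped/"
```

If the last command is run with $repl instead of $escaped, the backslashes before multicolumn should be dropped and & should expand to the matched text.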

The following code works:
#!/bin/bash -x
var=("NA" "A" "B" "C" "D" "E" "F")
clen=$(( ${#var[@]} - 1 ))
cind=1
for (( i=1; i<${#var[@]}; i++ )) ; do
if [[ "$cind" -eq 1 ]]; then
sed -i 's/st_var/'${var[$i]//\"/}'/g' script.tex
elif [[ "$cind" -gt 1 ]] && [[ "$cind" -lt "$clen" ]]; then
sstr='\\multicolumn{1}{c}{'${var[$i-1]//\"/}'}'
estr=' \& \\multicolumn{1}{c}{'${var[$i]//\"/}'}'
festr=' \& \\multicolumn{1}{c}{'${var[$i+1]//\"/}'} \\\\'
sed -i "s/$sstr/&$estr/" script.tex
else
sed -i "s/$estr/&$festr/" script.tex
fi
cind=$((cind + 1))
done

This might work for you (GNU sed):
sed -E 's/\\multicolumn\{1\}\{c\}\{st_var\}/ ABCDEF\n&/
:a;ta;s/(\S)(\S*\n(.*)\{st_var\})/\3{\1} \& \2/;ta
s/ (.*)\&.*/\1\\\\/' file
Prepend a space, the values to be substituted for st_var, and a newline to the original string \multicolumn{1}{c}{st_var}.
Iterate through each value, prepending the original string with the new value substituted, until no more values remain to be substituted.
Clean up the new string by removing the introduced newline and the original string, then append \\.


Get first character of each string with BASH_REMATCH

I'm trying to get the first character of each string using a regex and BASH_REMATCH in a shell script.
My input text file contains:
config_text = STACK OVER FLOW
The strings STACK OVER FLOW must be uppercase like that.
My output should be something like this :
SOF
My code for now is :
var = config_text
values=$(grep $var test_file.txt | tr -s ' ' '\n' | cut -c 1)
if [[ $values =~ [=(.*)]]; then
echo $values
fi
As you can see, I'm using tr and cut, but I'd like to replace them with BASH_REMATCH alone, because these two commands have been reported in many links as not working on macOS.
I tried something like this :
var = config_text
values=$(grep $var test_file.txt)
if [[ $values =~ [=(.*)(\b[a-zA-Z])]]; then
echo $values
fi
VALUES as I explained should be :
S O F
But it seems \b does not work in a shell script.
Does anyone have an idea how to get my desired output with BASH_REMATCH only?
Thanks in advance for any help.
A generic BASH_REMATCH solution handling any number of words and any separator.
input="STACK OVER FLOW" pattern='([[:upper:]]+)([^[:upper:]]*)' result=""
while [[ $input =~ $pattern ]]; do
result+="${BASH_REMATCH[1]::1}${BASH_REMATCH[2]}"
input="${input:${#BASH_REMATCH[0]}}"
done
echo "$result"
# Output: "S O F"
Bash's regexes are kind of cumbersome if you don't know how many words there are in the input string. How's this instead?
config_text="STACK OVER FLOW"
sed 's/\([^[:space:]]\)[^[:space:]]*/\1/g' <<<"$config_text"
First, add a valid shebang and paste your script at https://shellcheck.net for validation and recommendations.
Assuming that the line starts with config and ends with FLOW, e.g.
config_text = STACK OVER FLOW
Now the script.
#!/usr/bin/env bash
values="config_text = STACK OVER FLOW"
regexp="config_text = ([[:upper:]]{1})[^ ]+ ([[:upper:]]{1})[^ ]+ ([[:upper:]]{1}).+$"
while IFS= read -r line; do
[[ "$line" = "$values" && "$values" =~ $regexp ]] &&
printf '%s %s %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
done < test_file.txt
If there is only one line, or the target string/pattern is on the first line of test_file.txt, the while loop is not needed.
#!/usr/bin/env bash
values="config_text = STACK OVER FLOW"
regexp="config_text = ([[:upper:]]{1})[^ ]+ ([[:upper:]]{1})[^ ]+ ([[:upper:]]{1}).+$"
IFS= read -r line < test_file.txt
[[ "$line" = "$values" && "$values" =~ $regexp ]] &&
printf '%s %s %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
Make sure you are running Bash 4+, since macOS defaults to Bash 3.
See How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?
Another option rather than bash regex would be to utilize bash parameter expansion substring ${parameter:offset:length} to extract the desired characters:
$ read -ra arr <text.file ; printf "%s%s%s\n" "${arr[2]:0:1}" "${arr[3]:0:1}" "${arr[4]:0:1}"
SOF
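The one-liner above hard-codes three array indexes. As a rough sketch of the same ${parameter:offset:length} idea for any number of words, the hypothetical function first_letters below loops over the words instead. It assumes the value has already been separated from the "config_text = " prefix.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first character of every
# whitespace-separated word in the argument.
first_letters() {
    local -a words
    local word result=""
    read -ra words <<< "$1"          # split on whitespace without globbing
    for word in "${words[@]}"; do
        result+=${word:0:1}          # substring expansion: first character
    done
    printf '%s\n' "$result"
}

line="config_text = STACK OVER FLOW"
first_letters "${line#*= }"          # strip everything up to "= " first
```

This relies only on bash builtins, so neither tr nor cut is needed.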

Changing alternative character from lower to upper and upper to low - Unix shell script

How can I convert every alternate character of a string passed to a script, so that a lowercase character becomes uppercase and an uppercase character becomes lowercase?
read -p " Enter string" str
for i in `seq 0 ${#str}`
do
#echo $i
rem=$(($i % 2 ))
if [ $rem -eq 0 ]
then
echo ${str:$i:1}
else
fr=${str:$i:1}
if [[ "$fr" =~ [A-Z] ]]
then
echo ${str:$i:1} | tr '[:upper:]' '[:lower:]'
elif [[ "$fr" =~ [a-z] ]]
then
echo ${str:$i:1} | tr '[:lower:]' '[:upper:]'
else
echo ""
fi
fi
done
Your question is a bit challenging given that it is tagged shell and not as a question pertaining to an advanced shell like bash or zsh. In POSIX shell, you have no string indexes, no C-style for loop, and no [[ .. ]] operator to use character class pattern matching.
However, with a bit of awkward creativity, the old expr and POSIX string and arithmetic operations, and limiting your character strings to ASCII characters, you can iterate over a string changing uppercase to lowercase and lowercase to uppercase while leaving all other characters unchanged.
I wouldn't recommend the approach if you have an advanced shell available, but if you are limited to POSIX shell, as your question is tagged, it will work, but don't expect it to be super-fast...
#!/bin/sh
a=${1:-"This Is My 10TH String"} ## input and output strings
b=
i=1 ## counter and string length
len=$(expr length "$a")
asciiA=$(printf "%d" "'A") ## ASCII values for A,Z,a,z
asciiZ=$(printf "%d" "'Z")
asciia=$(printf "%d" "'a")
asciiz=$(printf "%d" "'z")
echo "input : $a" ## output original string
while [ "$i" -le "$len" ]; do ## loop over each character
c=$(expr substr "$a" "$i" "1") ## extract char from string
asciic=$(printf "%d" "'$c") ## convert to ASCII value
## check if asciic is [A-Za-z]
if [ "$asciiA" -le "$asciic" -a "$asciic" -le "$asciiZ" ] ||
[ "$asciia" -le "$asciic" -a "$asciic" -le "$asciiz" ]
then ## toggle the case bit (bit-6)
b="${b}$(printf "\x$(printf "%x" $((asciic ^ 1 << 5)))\n")"
else
b="$b$c" ## otherwise copy as is
fi
i=$(expr $i + 1)
done
echo "output: $b" ## output resulting string
The case change is effected by a simple toggle of the case bit (bit-6) in the ASCII value of each uppercase or lowercase character, changing it from lower to upper or vice versa. (Note that you can replace the printf and bit-shift with tr as an alternative.)
Example Use/Output
$ sh togglecase.sh
input : This Is My 10TH String
output: tHIS iS mY 10th sTRING
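The tr alternative mentioned above could look roughly like the sketch below. The function name toggle_case is made up for illustration, and the sketch assumes ASCII input, as the answer does. The arithmetic line only illustrates the bit toggle on one character code.

```shell
# Hypothetical helper: swap the case of every ASCII letter in the argument.
toggle_case() {
    # Map A-Z onto a-z and a-z onto A-Z in one pass; other characters pass through.
    printf '%s\n' "$1" | LC_ALL=C tr 'A-Za-z' 'a-zA-Z'
}

toggle_case "This Is My 10TH String"

# The bit toggle itself: 65 is "A", and flipping the case bit (1 << 5 = 32) gives 97, "a".
echo $(( 65 ^ (1 << 5) ))
```

Unlike the loop, this converts the whole string with a single tr process, which should be noticeably faster on long input.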
If you want to swap the case of every second character, try this:
read -p " Enter string " str
for i in `seq 0 ${#str}`; do
rem=$(($i % 2 ))
if [ $rem -eq 0 ]
then
printf "%s" "${str:$i:1}"
else
fr=${str:$i:1}
printf "%s" "$(tr '[:upper:][:lower:]' '[:lower:][:upper:]' <<< "${str:$i:1}")"
fi
done
echo
EDIT: Second solution
Switch case of str and merge the old and new string.
#!/bin/bash
str="part is lowercase & PART IS UPPERCASE"
str2=$(tr '[:upper:][:lower:]' '[:lower:][:upper:]' <<< "${str}")
str_chopped=$(sed -r 's/(.)./\1\n/g' <<< "${str}");
# Will have 1 additional char for odd length str
# str2_chopped_incorrect=$(sed -r 's/.(.)/\1\n/g' <<< "${str2}");
str2_chopped=$(fold -w2 <<< "${str2}" | sed -nr 's/.(.)/\1/p' );
paste -d '\n' <(echo "${str_chopped}") <(echo "${str2_chopped}") | tr -d '\n'; echo

Why is the backslash not URL-encoded in this shell script?

I am trying to URL-encode a string using a shell script.
I downloaded the following script from the internet:
#!/bin/sh
url_encoder()
{
echo -n "$1" | awk -v ORS="" '{ gsub(/./,"&\n") ; print }' | while read l;
do
case "$l" in
[-_.~/a-zA-Z0-9] ) echo -n ${l} ;;
"" ) echo -n %20 ;;
* ) printf '%%%02X' "'$l"
esac
done
}
echo ""
}
The basic idea of the code above is to
(1) convert a input string into the rows, each row has one character
(2) for each row, url encode the character
So If I run
$url_encoder "abc:"
the output would be "abc%3A", which is correct
But if I run
$url_encoder "\\" # I want to encode the backslash, so I use 2 "\" here
there is no output at all.
Do you know the reason why?
There is no need to use read, which is slow: parameter expansion can extract a substring. There is also no need to handle the space character specially, since the default case covers it.
url_encoder() {
local i str=$1 c
for ((i=0;i<${#str};i+=1)); do
c=${str:i:1}
case "$c" in
[-_.~/a-zA-Z0-9] ) echo -n "${c}" ;;
* ) printf '%%%02X' "'$c" ;;
esac
done
}
l='\'
printf '%%%02X' "'$l"
The backslash disappears because it has a special meaning for read; the -r option should be used to avoid this.
https://www.gnu.org/software/bash/manual/html_node/Bash-Builtins.html#index-read
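A small sketch of the difference, using two made-up helper functions (read_plain and read_raw) that read one line from a pipe the way the original script does:

```shell
# Hypothetical helpers: read one line with and without -r and print what read stored.
read_plain() {
    printf '%s\n' "$1" | { read l; printf '%s\n' "$l"; }
}
read_raw() {
    printf '%s\n' "$1" | { IFS= read -r l; printf '%s\n' "$l"; }
}

read_plain 'a\b'   # the backslash is consumed as an escape character
read_raw   'a\b'   # the backslash is kept
read_plain '\'     # a lone trailing backslash is treated as a line continuation
read_raw   '\'     # the backslash is kept
```

The third call mirrors the case in the question: a line consisting of a single backslash leaves the variable empty, so there is nothing to encode.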
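Note that ~ should also be encoded according to RFC 1738: http://www.rfc-editor.org/rfc/rfc1738.txt
A printf argument that starts with a quote (single or double), such as "'$c", yields the numeric value of the character that follows, but this works reliably only for ASCII characters (< 128).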
url_encoder() { (
LC_ALL=C
str=$1
for ((i=0;i<${#str};i+=1)); do
c=${str:i:1}
if [[ $c = [-_./a-zA-Z0-9] ]]; then
echo -n "${c}"
elif [[ $c = [$'\1'-$'\x7f'] ]]; then
printf '%%%02X' "'$c"
else
printf '%%%s' $(echo -n "$c" | od -An -tx1)
fi
done
)}
Nahuel Fouilleul's helpful answer explains the problem with your approach (-r is missing from your read command, resulting in unwanted interpretation of \ chars.) and offers a more efficient bash solution.
Here's a more efficient, POSIX-compliant solution (sh-compatible) that performs the encoding with a single awk command, assuming that the input string is composed only of characters in the ASCII/Unicode code-point range between 32 and 127, inclusively:
#!/bin/sh
url_encoder()
{
awk -v url="$1" -v ORS= 'BEGIN {
# Create lookup table that maps characters to their code points.
for(n=32;n<=127;n++) ord[sprintf("%c",n)]=n
# Process characters one by one, either passing them through, if they
# need no encoding, or converting them to their %-prefixed hex equivalent.
for(i=1;i<=length(url);++i) {
char = substr(url, i, 1)
if (char !~ "[-_.~/a-zA-Z0-9]") char = sprintf("%%%x", ord[char])
print char
}
printf "\n"
}'
}
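One caveat with -v url="$1" is that awk processes backslash escape sequences in -v assignments, so input containing a backslash may not arrive unchanged. The sketch below is a variant, not the answer's own code: it passes the string through the environment instead and reads it via ENVIRON. The names url_encoder_env and URL_IN are made up, and the ASCII-only assumption (code points 32 to 126 here) still applies.

```shell
# Hypothetical variant: hand the input to awk through the environment so
# that backslashes reach the awk program unmodified.
url_encoder_env() {
    URL_IN=$1 awk 'BEGIN {
        url = ENVIRON["URL_IN"]
        # Lookup table that maps printable ASCII characters to their code points.
        for (n = 32; n <= 126; n++) ord[sprintf("%c", n)] = n
        out = ""
        for (i = 1; i <= length(url); i++) {
            c = substr(url, i, 1)
            if (c ~ "[-_.~/a-zA-Z0-9]") out = out c
            else out = out sprintf("%%%02X", ord[c])
        }
        print out
    }'
}

url_encoder_env 'abc:'
url_encoder_env '\'
```

Characters outside the lookup table have no entry in ord, so they would be emitted as %00. Treat that as a known limitation of this sketch.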

Concatenating digits from a string in sh

Assuming that I have a string like this one:
string="1 0 . # 1 1 ? 2 2 4"
Is it possible to concatenate digits that are next to each other?
So that string be like: 10 . # 11 ? 224 ?
I found only basic things how to distinguish integers from other characters and how to "connect" them. But I have no idea how to iterate properly.
num=""
for char in $string; do
if [ $char -eq $char 2>/dev/null ] ; then
num=$num$char
Here's an almost pure-shell implementation -- transforming the string into a character per line and using a BashFAQ #1 while read loop.
string="1 0 . # 1 1 ? 2 2 4"
output=''
# replace spaces with newlines for easier handling
string=$(printf '%s\n' "$string" | tr ' ' '\n')
last_was_number=0
printf '%s\n' "$string" | {
while read -r char; do
if [ "$char" -eq "$char" ] 2>/dev/null; then # it's a number
if [ "$last_was_number" -eq "1" ]; then
output="$output$char"
last_was_number=1
continue
fi
last_was_number=1
else
last_was_number=0
fi
output="$output $char"
done
printf '%s\n' "$output"
}
To complement Charles Duffy's helpful, POSIX-compliant sh solution with a more concise perl alternative:
Note: perl is not part of POSIX, but it is preinstalled on most modern Unix-like platforms.
$ printf '%s\n' "1 0 . # 1 1 ? 2 2 4" | perl -pe 's/\d( \d)+/$& =~ s| ||gr/eg'
10 . # 11 ? 224
The outer substitution, s/\d( \d)+/.../eg, globally (g) finds runs of at least 2 adjacent digits (\d( \d)+), and replaces each run with the result of the expression (e) specified as the replacement string (represented as ... here).
The expression in the inner substitution, $& =~ s| ||gr, whose result is used as the replacement string, removes all spaces from each run of adjacent digits:
$& represents what the outer regex matched - the run of adjacent digits.
=~ applies the s call on the RHS to the LHS, i.e., $& (without this, the s call would implicitly apply to the entire input string, $_).
s| ||gr replaces all (g) instances of <space> in the value of $& and returns (r) the result, effectively removing all spaces.
Note that | is used arbitrarily as the delimiter character for the s call, so as to avoid a clash with the customary / delimiter used by the outer s call.
POSIX compliant one-liner with sed:
string="1 0 . # 1 1 ? 2 2 4"
printf '%s\n' "$string" | sed -e ':b' -e ' s/\([0-9]\) \([0-9]\)/\1\2/g; tb'
It iteratively removes any space between two digits until there are none left, resulting in:
10 . # 11 ? 224
Here is my solution:
string="1 0 . # 1 1 ? 2 2 4"
array=(${string/// })
arraylength=${#array[@]}
pattern="[0-9]"
i=0
while true; do
str=""
start=$i
if [ $i -eq $arraylength ]; then
break;
fi
for (( j=$start; j<${arraylength}; j++ )) do
curr=${array[$j]}
i=$((i + 1))
if [[ $curr =~ $pattern ]]; then
str="$str$curr"
else
break
fi
done
echo $str
done

Bash variable substitution and strings

Let's say I have two variables:
a="AAA"
b="BBB"
I read a string from a file. This string is the following:
str='$a $b'
How to create a new string from the first one that substitutes the variables?
newstr="AAA BBB"
Bash variable indirection without eval:
Well, as eval is evil, we may try to do this without it, by using indirection in variable names.
a="AAA"
b="BBB"
str='$a $b'
newstr=()
for cnt in $str ;do
[ "${cnt:0:1}" == '$' ] && cnt=${cnt:1} && cnt=${!cnt}
newstr+=($cnt)
done
newstr="${newstr[*]}"
echo $newstr
AAA BBB
Another try:
var1="Hello"
var2="2015"
str='$var1 world! Happy new year $var2'
newstr=()
for cnt in $str ;do
[ "${cnt:0:1}" == '$' ] && cnt=${cnt:1} && cnt=${!cnt}
newstr+=($cnt)
done
newstr="${newstr[*]}"
echo $newstr
Hello world! Happy new year 2015
Addendum: As correctly pointed out in @EtanReisner's comment, if your string contains * or other glob-expandable strings, you may have to use set -f to prevent bad things:
cd /bin
var1="Hello"
var2="star"
var3="*"
str='$var1 this string contain a $var2 as $var3 *'
newstr=()
for cnt in $str ;do
[ "${cnt:0:1}" == '$' ] && cnt=${cnt:1} && cnt=${!cnt};
newstr+=("$cnt");
done;
newstr="${newstr[*]}"
echo "$newstr"
Hello this string contain a star as * bash bunzip2 busybox....zmore znew
echo ${#newstr}
1239
Note: I've added double quotes in newstr+=("$cnt") to prevent glob expansion, but set -f still seems to be required:
newstr=()
set -f
for cnt in $str ;do
[ "${cnt:0:1}" == '$' ] && cnt=${cnt:1} && cnt=${!cnt}
newstr+=("$cnt")
done
set +f
newstr="${newstr[*]}"
echo "$newstr"
Hello this string contain a star as * *
Note 2: This is far from a perfect solution. For example, if the string contains punctuation, this won't work. Example:
str='$var1, this string contain a $var2 as $var3: *'
with same variables as previous run will render:
' this string contain a star as *' because ${!var1,} and ${!var3:} don't exist.
... and if $str contains special characters:
As @godblessfq asked:
If str contains a line break, how do I do the substitution and preserve the newline in the output?
So this is not robust, as every indirected variable must be first, last, or separated by spaces from all special characters!
str=$'$var1 world!\n... 2nd line...'
var1=Hello
newstr=()
set -f
IFS=' ' read -d$'\377' -ra array <<<"$str"
for cnt in "${array[@]}";do
[ "${cnt:0:1}" == '$' ] && cnt=${cnt:1} && cnt=${!cnt}
newstr+=("$cnt")
done
set +f
newstr="${newstr[*]}"
echo "$newstr"
Hello world!
... 2nd line...
Because the <<< here-string adds a trailing newline, the last echo command could be written as:
echo "${newstr%$'\n'}"
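The punctuation limitation described above comes from splitting the string on whitespace. A possible way around it, sketched here under the assumption that placeholders always look like $name, is to locate each $name with a regex and expand it by indirection, so that no splitting or globbing is involved. The function name expand_vars is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace every $name in the argument with the value
# of the shell variable called name, leaving all other text untouched.
expand_vars() {
    local input=$1 output="" match name prefix
    local re='\$([A-Za-z_][A-Za-z0-9_]*)'
    while [[ $input =~ $re ]]; do
        match=${BASH_REMATCH[0]}          # e.g. $var1
        name=${BASH_REMATCH[1]}           # e.g. var1
        prefix=${input%%"$match"*}        # text before the first placeholder
        output+="$prefix${!name}"         # indirect expansion, no eval
        input=${input#*"$match"}          # continue after the placeholder
    done
    printf '%s\n' "$output$input"
}

var1="Hello" var2="star" var3="*"
expand_vars '$var1, this string contain a $var2 as $var3: *'
```

A $ that is not followed by a valid variable name is left as is, and unset variables expand to an empty string. Note that any variable visible to the shell can be expanded this way, so the sketch should not be used on untrusted input without restricting the allowed names.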
The easiest solution is to use eval:
eval echo "$str"
To assign it to a variable, use command substitution:
replaced=$(eval echo "$str")
Disclaimer: I only discovered perl an hour ago. But this seems to work robustly, whatever special characters you throw at it:
newstr=$(a2="$a" b2="$b" perl -pe 's/\$a\b/$ENV{a2}/g; s/\$b\b/$ENV{b2}/g' <(echo -e "$str"))
Test:
a='A*A\nA'
b='B*B\nB'
str='$a $aa * \n $b $bb'
newstr=$(a2="$a" b2="$b" perl -pe 's/\$a\b/$ENV{a2}/g; s/\$b\b/$ENV{b2}/g' <(echo -e "$str"))
echo -e "$newstr"
Output:
A*A
A $aa *
B*B
B $bb
I'd use an awk solution with awk variables. This allows passing text that contains special characters and substituting any placeholder with it.
A workaround to recognize $ is to use [\x24]:
awk -v a="$a" -v b="$b" '{gsub("[\x24]a",a);gsub("[\x24]b",b); print}' <<< "$str"
Here:
-v defines an awk variable, e.g. a="$a"
[\x24] is the hex escape for $, so [\x24]a matches $a
gsub(x,y) replaces every match of x with y
