Short way to escape HTML in Bash? - bash

The box has no Ruby/Python/Perl etc.
Only bash, sed, and awk.
One way is to replace the characters one by one from a map, but that gets tedious.
Perhaps there is some built-in functionality I'm not aware of?

Escaping HTML really just involves replacing three characters: <, >, and &. For extra points, you can also replace " and '. So, it's not a long sed script:
sed 's/&/\&amp;/g; s/</\&lt;/g; s/>/\&gt;/g; s/"/\&quot;/g; s/'"'"'/\&#39;/g'
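For example, a quick check (sample input chosen for illustration):
$ echo '<b>Fish & "Chips"</b>' | sed 's/&/\&amp;/g; s/</\&lt;/g; s/>/\&gt;/g; s/"/\&quot;/g; s/'"'"'/\&#39;/g'
&lt;b&gt;Fish &amp; &quot;Chips&quot;&lt;/b&gt;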

Pure bash, no external programs:
function htmlEscape () {
    local s
    s=${1//&/&amp;}
    s=${s//</&lt;}
    s=${s//>/&gt;}
    s=${s//'"'/&quot;}
    printf -- %s "$s"
}
Just simple string substitution.
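A quick usage check (sample string assumed; on bash 5.2+ see the patsub_replacement note further down):
$ htmlEscape '<p class="x">Fish & Chips</p>'
&lt;p class=&quot;x&quot;&gt;Fish &amp; Chips&lt;/p&gt;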

You can use the recode utility:
echo 'He said: "Not sure that - 2<1"' | recode ascii..html
Output:
He said: &quot;Not sure that - 2&lt;1&quot;

or use xmlstarlet to escape/unescape special XML characters:
$ echo '<abc&def>'| xml esc
&lt;abc&amp;def&gt;

I'm using jq:
$ echo "2 < 4 is 'TRUE'" | jq -Rr #html
2 < 4 is &apos;TRUE&apos;

This is an update to miken32's "Pure bash, no external programs" answer:
bash 5.2 breaks backward compatibility in ways that are highly inconvenient.
From NEWS:
x. New shell option: patsub_replacement. When enabled, a '&' in
the replacement string of the pattern substitution expansion is
replaced by the portion of the string that matched the pattern.
Backslash will escape the '&' and insert a literal '&'.
The option is enabled by default. If you want to restore the previous
behavior, add shopt -u patsub_replacement.
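To see the breakage concretely: with patsub_replacement enabled, the unquoted & inside a replacement like &lt; expands to the matched character (a minimal reproduction on bash 5.2):
$ s='a<b'
$ t=${s//</&lt;}
$ echo "$t"
a<lt;b
(The s=${1//&/&amp;} line itself still happens to work, because there the matched text is & anyway.)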
So there are three ways to use miken32's code in bash 5.2+:
Either disable patsub_replacement:
shopt -u patsub_replacement
function htmlEscape () {
    local s
    s=${1//&/&amp;}
    s=${s//</&lt;}
    s=${s//>/&gt;}
    s=${s//'"'/&quot;}
    printf -- %s "$s"
}
Another option is to escape '&' with a backslash in the replacement, if you want the code to work regardless of the 5.2 patsub_replacement feature:
function htmlEscape () {
    local s
    s=${1//&/\&amp;}
    s=${s//</\&lt;}
    s=${s//>/\&gt;}
    s=${s//'"'/\&quot;}
    printf -- %s "$s"
}
And a third option is to quote the string in the replacement:
function htmlEscape () {
    local s
    s=${1//&/"&amp;"}
    s=${s//</"&lt;"}
    s=${s//>/"&gt;"}
    s=${s//'"'/"&quot;"}
    printf -- %s "$s"
}
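With any of the three variants, a quick check on bash 5.2+ (sample input assumed):
$ htmlEscape '2<3 & "done"'
2&lt;3 &amp; &quot;done&quot;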

There are much better answers, but I just found this so I thought I'd share.
PN=`basename "$0"` # Program name
VER=`echo '$Revision: 1.1 $' | cut -d' ' -f2`
Usage () {
echo >&2 "$PN - encode HTML unsafe characters, $VER
usage: $PN [file ...]"
exit 1
}
set -- `getopt h "$@"`
while [ $# -gt 0 ]
do
case "$1" in
--) shift; break;;
-h) Usage;;
-*) Usage;;
*) break;; # First file name
esac
shift
done
sed \
    -e 's/&/\&amp;/g' \
    -e 's/"/\&quot;/g' \
    -e 's/</\&lt;/g' \
    -e 's/>/\&gt;/g' \
    -e 's/„/\&auml;/g' \
    -e 's/Ž/\&Auml;/g' \
    -e 's/”/\&ouml;/g' \
    -e 's/™/\&Ouml;/g' \
    -e 's//\&uuml;/g' \
    -e 's/š/\&Uuml;/g' \
    -e 's/á/\&szlig;/g' \
    "$@"

The previous sed replacement defaces valid output like
&lt;
into
&amp;lt;
Adding a negative look-ahead so "&" is only changed into "&amp;" if that "&" isn't already followed by "amp;" fixes that:
sed 's/&(?!amp;)/\&amp;/g; s/</\&lt;/g; s/>/\&gt;/g; s/"/\&quot;/g; s/'"'"'/\&#39;/g'
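Note that plain sed (BRE/ERE) has no look-ahead, so the expression above needs a PCRE-capable tool. A sed-only sketch of the same idea is to normalize any existing &amp; back to & first and then escape everything; like the look-ahead version, it only protects &amp;, not other already-escaped entities:
sed 's/&amp;/\&/g; s/&/\&amp;/g; s/</\&lt;/g; s/>/\&gt;/g; s/"/\&quot;/g; s/'"'"'/\&#39;/g'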

Related

Double quotes in bash variable assignment and command substitution

I have a few questions about variable assignment and command substitution:
Why does \"<Enter> add a new line to the output
$ v1="1\"
2"
$ echo "$v1"
1"
2
?
Why
$ v2=$(echo -e "123\n\n\n")
$ echo "$v2"
123
while
$ v2=$(echo -e "123\n\n\n5")
$ echo "$v2"
123
5
?
How to correctly use quotes in such constructs:
v3="$(command "$v2")"
?
First question
Pressing < Enter > inserts a newline, which is the same as \n.
Use the following code to see this:
function print_hex() {
HEXVAL=$(hexdump -e '"%X"' <<< "$1")
echo $HEXVAL
}
v1="
"
v2=$'\n'
print_hex $v1
print_hex $v2
---------output---------
A
A
The hex output shows that v1 and v2 are identical.
Second question
See the echo manual:
-e enable interpretation of backslash escapes
-E disable interpretation of backslash escapes (default)
More importantly, command substitution $(...) removes all trailing newlines from its output. In the first case the newlines after 123 are trailing, so they are stripped; in the second case they are followed by 5, so they are kept.
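A quick check of the stripping in an interactive shell:
$ v=$(printf '123\n\n\n'); echo "${#v}"
3
$ v=$(printf '123\n\n\n5'); echo "${#v}"
7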
Third question
Do you want to print the string literally, or capture the command's output?
In the following example, v3 stores the literal string and v4 captures the command output.
v2=.
v3="\$(ls \"$v2\")"
v4=$(ls "$v2")
echo $v3
echo $v4
---------output---------
$(ls ".")
test1.sh
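As for the exact form in the question, v3="$(command "$v2")" is already quoted correctly: double quotes inside $( ) open a fresh quoting context, so the inner pair protects $v2 from word splitting and the outer pair protects the result. A small illustration, using ls on a file name with spaces as a stand-in:
v2="file with spaces"
touch "$v2"
v3="$(ls "$v2")"
echo "$v3"
---------output---------
file with spaces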

Bash double quotes getting single quoted

I have a problem where my script argument goes from this:
'_rofiarg: -mesg "State: Active:Enabled"'
To this (I strip the _rofiarg: using cut btw):
-mesg '"State:' 'Active:Enabled"'
As you can see, my purposeful double quotes get ignored, and Bash splits them into two words.
I'll rewrite a pseudocode here, as the original script has a lot of local dependencies to my configs.
#script1.sh
# $active and $state have some value, of course
script2.sh \
"_rofiarg: -mesg \"State: $active:$state\"" \
${somearray[@]};
#script2.sh
#Check input for rofi arguments, add them to Args(), all else goes to Data()
for Arg in "$@"; do
if [[ "$Arg" =~ _rofiarg:* ]]; then
Args+=( "$(echo "$Arg" | cut -c 11-)" );
else
Data+=( "$Arg" );
fi;
done;
After this I just pass the ${Args[@]} to the target program, in this case Rofi - like this:
out="$(
printf '%s\n' "${Data[@]}" |
rofi -config $CONF/rofi/config.cfg \
-dmenu -no-custom \
-padding $padd \
${Args[@]};
)";
I've been at this problem for hours. Everything that actually gets passed to which program is logged using set -o xtrace, and I'm at a point where I think I've literally tried all random combinations of single, double, $'', and all other quote types.
The line
${Args[@]};
with the unquoted variable expansion is the point where the coherence between "State: and …" is lost. To prevent that, we have to quote the expansion, but in order to remove the quotes and to separate -mesg from "State: …", we have to evaluate the command; this gives:
out="$(
printf '%s\n' "${Data[@]}" |
eval rofi -config $CONF/rofi/config.cfg \
-dmenu -no-custom \
-padding $padd \
"${Args[#]}"
)"

Bash substring separated by spaces

I've a string like this
var="--env=test --arg=foo"
I've tried to use substring ${var#*=} to get for example test but don't find a way to separate spaces. Any idea or should I use cut?
You can use BASH regex:
var="--env=test --arg=foo"
[[ $var =~ =([^ ]+) ]] && echo "${BASH_REMATCH[1]}"
test
Using extglob, you can do this in even shorter code:
shopt -s extglob
echo "${var//+( *|+([! ])=)}"
test
You could use an array:
$ var="--env=test --arg=foo"
$ arr=($var)
$ printf "%s\n" "${arr[@]}"
--env=test
--arg=foo
where tokens in var are split by IFS (which defaults to: space, tab, newline). If you want to split only by space, just set IFS=' ':
IFS=' ' arr=($var)
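From there, each value can be extracted by applying the question's ${var#*=} expansion to each element (building on the arr array above):
$ for a in "${arr[@]}"; do echo "${a#*=}"; done
test
foo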
util-linux (which should be part of any Linux distribution) has a built-in getopt command for command line parsing.
Usage in your case:
var="blub --env=test --arg=foo"
eval set -- $(getopt --longoptions env:,arg: -- $var)
while true ; do
case "$1" in
--arg)
echo "Arg is $2"
shift 2
;;
--env)
echo "Env is $2"
shift 2
;;
--) shift ; break ;;
esac
done
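If the parsing works as intended, the loop should print something like:
Env is test
Arg is foo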

why the backslash is not url encoded in this shell script?

I am trying to URL-encode a string in a shell script.
I have downloaded a script from the internet:
#!/bin/sh
url_encoder()
{
echo -n "$1" | awk -v ORS="" '{ gsub(/./,"&\n") ; print }' | while read l;
do
case "$l" in
[-_.~/a-zA-Z0-9] ) echo -n ${l} ;;
"" ) echo -n %20 ;;
* ) printf '%%%02X' "'$l"
esac
done
}
echo ""
}
The basic idea of the above code is to
(1) convert the input string into rows, one character per row
(2) URL-encode the character in each row
So If I run
$url_encoder "abc:"
the output would be "abc%3A", which is correct
But if I run
$url_encoder "\\" # I want to encode the backslash, so I use 2 "\" here
there is no output at all.
Do you know the reason why?
There is no need to use read, which is slow; variable expansion can take a substring, and there is no need to handle the space character specially, since the default case covers it:
url_encoder() {
local i str=$1 c
for ((i=0;i<${#str};i+=1)); do
c=${str:i:1}
case "$c" in
[-_.~/a-zA-Z0-9] ) echo -n "${c}" ;;
* ) printf '%%%02X' "'$c" ;;
esac
done
}
l='\'
printf '%%%02X' "'$l"
The reason why the backslash disappears is that it has a special meaning for read; the -r option should be used to avoid this.
https://www.gnu.org/software/bash/manual/html_node/Bash-Builtins.html#index-read
Note that ~ should also be encoded according to RFC 1738: http://www.rfc-editor.org/rfc/rfc1738.txt
A printf argument starting with a quote (single or double), such as "'$c", only handles a single ASCII character (< 128), so non-ASCII bytes need to be dumped explicitly:
url_encoder() { (
LC_ALL=C
str=$1
for ((i=0;i<${#str};i+=1)); do
c=${str:i:1}
if [[ $c = [-_./a-zA-Z0-9] ]]; then
echo -n "${c}"
elif [[ $c = [$'\1'-$'\x7f'] ]]; then
printf '%%%02X' "'$c"
else
printf '%%%s' $(echo -n "$c" | od -An -tx1)
fi
done
)}
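A quick check (assuming the argument arrives as UTF-8 bytes):
$ url_encoder 'a b/ü'
a%20b/%c3%bc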
Nahuel Fouilleul's helpful answer explains the problem with your approach (-r is missing from your read command, resulting in unwanted interpretation of \ chars.) and offers a more efficient bash solution.
Here's a more efficient, POSIX-compliant solution (sh-compatible) that performs the encoding with a single awk command, assuming that the input string is composed only of characters in the ASCII/Unicode code-point range between 32 and 127, inclusively:
#!/bin/sh
url_encoder()
{
awk -v url="$1" -v ORS= 'BEGIN {
# Create lookup table that maps characters to their code points.
for(n=32;n<=127;n++) ord[sprintf("%c",n)]=n
# Process characters one by one, either passing them through, if they
# need no encoding, or converting them to their %-prefixed hex equivalent.
for(i=1;i<=length(url);++i) {
char = substr(url, i, 1)
if (char !~ "[-_.~/a-zA-Z0-9]") char = sprintf("%%%x", ord[char])
print char
}
printf "\n"
}'
}
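For instance (sample ASCII-only input, as the function assumes):
$ url_encoder 'a b&c'
a%20b%26c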

Indirect parameter substitution in shell script

I'm having a problem with a shell script (POSIX shell under HP-UX, FWIW). I have a function called print_arg into which I'm passing the name of a parameter as $1. Given the name of the parameter, I then want to print the name and the value of that parameter. However, I keep getting an error. Here's an example of what I'm trying to do:
#!/usr/bin/sh
function print_arg
{
# $1 holds the name of the argument to be shown
arg=$1
# The following line errors off with
# ./test_print.sh[9]: argval=${"$arg"}: The specified substitution is not valid for this command.
argval=${"$arg"}
if [[ $argval != '' ]] ; then
printf "ftp_func: $arg='$argval'\n"
fi
}
COMMAND="XYZ"
print_arg "COMMAND"
I've tried re-writing the offending line every way I can think of. I've consulted the local oracles. I've checked the online "BASH Scripting Guide". And I sharpened up the ol' wavy-bladed knife and scrubbed the altar until it gleamed, but then I discovered that our local supply of virgins has been cut down to, like, nothin'. Drat!
Any advice regarding how to get the value of a parameter whose name is passed into a function as a parameter will be received appreciatively.
You could use eval, though using direct indirection as suggested by SiegeX is probably nicer if you can use bash.
#!/bin/sh
foo=bar
print_arg () {
arg=$1
eval argval=\"\$$arg\"
echo "$argval"
}
print_arg foo
In bash (but not in other sh implementations), indirection is done by: ${!arg}
Input
foo=bar
bar=baz
echo $foo
echo ${!foo}
Output
bar
baz
This worked surprisingly well:
#!/bin/sh
foo=bar
print_arg () {
local line name value
set | \
while read line; do
name=${line%=*} value=${line#*=\'}
if [ "$name" = "$1" ]; then
echo ${value%\'}
fi
done
}
print_arg foo
It has all the POSIX clunkiness; in Bash it would be much shorter, but then again, you won't need it there because you have ${!}. This (in case it proves solid) would have the advantage of using only builtins and no eval. If I were to construct this function using an external command, it would have to be sed, which would obviate the need for the read loop and the substitutions. Mind that asking for indirection in POSIX without eval has to be paid for with clunkiness, so don't beat me!
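For what it's worth, a rough sketch of that sed variant (it assumes set prints simple name=value or name='value' lines; multi-line values would still break it):
print_arg () {
    set | sed -n "s/^$1=//p" | sed "s/^'//; s/'\$//"
}
print_arg foo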
Even though the answer's already accepted, here's another method for those who need to preserve newlines and special characters like Escape ( \033 ): Storing the variable in base64.
You need: bc, wc, echo, tail, tr, uuencode, uudecode
Example
#!/bin/sh
#====== Definition =======#
varA="a
b
c"
# uuencode the variable
varB="`echo "$varA" | uuencode -m -`"
# Skip the first line of the uuencode output.
varB="`NUM=\`(echo "$varB"|wc -l|tr -d "\n"; echo -1)|bc \`; echo "$varB" | tail -n $NUM)`"
#====== Access =======#
namevar1=varB
namevar2=varA
echo simple eval:
eval "echo \$$namevar2"
echo simple echo:
echo $varB
echo precise echo:
echo "$varB"
echo echo of base64
eval "echo \$$namevar1"
echo echo of base64 - with updated newlines
eval "echo \$$namevar1 | tr ' ' '\n'"
echo echo of un-based, using sh instead of eval (but could be made with eval, too)
export $namevar1
sh -c "(echo 'begin-base64 644 -'; echo \$$namevar1 | tr ' ' '\n' )|uudecode"
Result
simple eval:
a b c
simple echo:
YQpiCmMK ====
precise echo:
YQpiCmMK
====
echo of base64
YQpiCmMK ====
echo of base64 - with updated newlines
YQpiCmMK
====
echo of un-based, using sh instead of eval (but could be made with eval, too)
a
b
c
Alternative
You could also use the set command and parse its output; with that, you don't need to treat the variable in a special way before it is accessed.
A safer solution with eval:
v=1
valid_var_name='[[:alpha:]_][[:alnum:]_]*$'
print_arg() {
local arg=$1
if ! expr "$arg" : "$valid_var_name" >/dev/null; then
echo "$0: invalid variable name ($arg)" >&2
exit 1
fi
local argval
eval argval=\$$arg
echo "$argval"
}
print_arg v
print_arg 'v; echo test'
Inspired by the following answer.
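For the two calls above, the expected behavior is roughly (assuming the snippet is saved as yourscript):
1
yourscript: invalid variable name (v; echo test)
and the script then exits with status 1.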
