How to urlencode data for curl command? - bash

I am trying to write a bash script for testing that takes a parameter and sends it through curl to a web site. I need to URL-encode the value to make sure that special characters are processed properly. What is the best way to do this?
Here is my basic script so far:
#!/bin/bash
host=${1:?'bad host'}
value=$2
shift
shift
curl -v -d "param=${value}" http://${host}/somepath $@

Use curl --data-urlencode; from man curl:
This posts data, similar to the other --data options with the exception that this performs URL-encoding. To be CGI-compliant, the <data> part should begin with a name followed by a separator and a content specification.
Example usage:
curl \
--data-urlencode "paramName=value" \
--data-urlencode "secondParam=value" \
http://example.com
See the man page for more info.
This requires curl 7.18.0 or newer (released January 2008). Use curl -V to check which version you have.
You can as well encode the query string:
curl --get \
--data-urlencode "p1=value 1" \
--data-urlencode "p2=value 2" \
http://example.com
# http://example.com?p1=value%201&p2=value%202
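Applied to the script from the question, this might look like the sketch below. The function name send_param is made up for illustration; the /somepath endpoint and the param field name are taken from the question.

```bash
#!/bin/bash
# Hypothetical wrapper around the question's script:
#   send_param HOST VALUE [extra curl options...]
send_param() {
    local host=${1:?'bad host'}
    local value=${2-}
    # Drop host and value; whatever remains is passed through to curl.
    shift "$(( $# < 2 ? $# : 2 ))"
    # --data-urlencode encodes only the part after the first "=".
    curl -v --data-urlencode "param=${value}" "http://${host}/somepath" "$@"
}
```

Called as send_param example.com 'a b&c=d' --silent, the value is handed to curl unmodified and curl does the encoding itself.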

Another option is to use jq:
$ printf %s 'input text'|jq -sRr @uri
input%20text
$ jq -rn --arg x 'input text' '$x|@uri'
input%20text
-r (--raw-output) outputs the raw contents of strings instead of JSON string literals. -n (--null-input) doesn't read input from STDIN.
-R (--raw-input) treats input lines as strings instead of parsing them as JSON, and -sR (--slurp --raw-input) reads the input into a single string. You can replace -sRr with -Rr if your input only contains a single line or if you don't want to replace linefeeds with %0A:
$ printf %s\\n multiple\ lines of\ text|jq -Rr @uri
multiple%20lines
of%20text
$ printf %s\\n multiple\ lines of\ text|jq -sRr @uri
multiple%20lines%0Aof%20text%0A
Or this percent-encodes all bytes:
xxd -p|tr -d \\n|sed 's/../%&/g'

Here is the pure BASH answer.
Update: Since many changes have been discussed, I have placed this on https://github.com/sfinktah/bash/blob/master/rawurlencode.inc.sh for anybody to issue a PR against.
Note: This solution is not intended to encode unicode or multi-byte characters - which are quite outside BASH's humble native capabilities. It's only intended to encode symbols that would otherwise ruin argument passing in POST or GET requests, e.g. '&', '=' and so forth.
Very Important Note: DO NOT ATTEMPT TO WRITE YOUR OWN UNICODE CONVERSION FUNCTION, IN ANY LANGUAGE, EVER. See end of answer.
rawurlencode() {
local string="${1}"
local strlen=${#string}
local encoded=""
local pos c o
for (( pos=0 ; pos<strlen ; pos++ )); do
c=${string:$pos:1}
case "$c" in
[-_.~a-zA-Z0-9] ) o="${c}" ;;
* ) printf -v o '%%%02x' "'$c"
esac
encoded+="${o}"
done
echo "${encoded}" # You can either set a return variable (FASTER)
REPLY="${encoded}" #+or echo the result (EASIER)... or both... :p
}
You can use it in two ways:
easier: echo http://url/q?=$( rawurlencode "$args" )
faster: rawurlencode "$args"; echo http://url/q?${REPLY}
[edited]
Here's the matching rawurldecode() function, which - with all modesty - is awesome.
# Returns a string in which the sequences with percent (%) signs followed by
# two hex digits have been replaced with literal characters.
rawurldecode() {
# This is perhaps a risky gambit, but since all escape characters must be
# encoded, we can replace %NN with \xNN and pass the lot to printf '%b', which
# will decode hex for us
printf -v REPLY '%b' "${1//%/\\x}" # You can either set a return variable (FASTER)
echo "${REPLY}" #+or echo the result (EASIER)... or both... :p
}
With the matching set, we can now perform some simple tests:
$ diff rawurlencode.inc.sh \
<( rawurldecode "$( rawurlencode "$( cat rawurlencode.inc.sh )" )" ) \
&& echo Matched
Output: Matched
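One caveat with the decoder above: printf '%b' also expands any backslash escapes already present in the input, so a literal \n in the encoded string turns into a newline. A minimal sketch that doubles backslashes first (rawurldecode_safe is a name made up here, not part of the linked repository):

```bash
rawurldecode_safe() {
    local s=${1//\\/\\\\}   # double each literal backslash so %b prints it unchanged
    s=${s//%/\\x}           # %41 -> \x41
    printf '%b' "$s"
}

rawurldecode_safe 'a%20b%26c'   # prints: a b&c
```

Like the original, this does not validate that every % is followed by two hex digits.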
And if you really really feel that you need an external tool (well, it will go a lot faster, and might do binary files and such...) I found this on my OpenWRT router...
replace_value=$(echo $replace_value | sed -f /usr/lib/ddns/url_escape.sed)
Where url_escape.sed was a file that contained these rules:
# sed url escaping
s:%:%25:g
s: :%20:g
s:<:%3C:g
s:>:%3E:g
s:#:%23:g
s:{:%7B:g
s:}:%7D:g
s:|:%7C:g
s:\\:%5C:g
s:\^:%5E:g
s:~:%7E:g
s:\[:%5B:g
s:\]:%5D:g
s:`:%60:g
s:;:%3B:g
s:/:%2F:g
s:?:%3F:g
s^:^%3A^g
s:@:%40:g
s:=:%3D:g
s:&:%26:g
s:\$:%24:g
s:\!:%21:g
s:\*:%2A:g
While it is not impossible to write such a script in BASH (probably using xxd and a very lengthy ruleset) capable of handling UTF-8 input, there are faster and more reliable ways. Attempting to decode UTF-8 into UTF-32 is a non-trivial task to do with accuracy, though very easy to do inaccurately such that you think it works until the day it doesn't.
Even the Unicode Consortium removed their sample code after discovering it was no longer 100% compatible with the actual standard.
The Unicode standard is constantly evolving, and has become extremely nuanced. Any implementation you can whip together will not be properly compliant, and if by some extreme effort you managed it, it wouldn't stay compliant.

Use Perl's URI::Escape module and uri_escape function in the second line of your bash script:
...
value="$(perl -MURI::Escape -e 'print uri_escape($ARGV[0]);' "$2")"
...
Edit: Fix quoting problems, as suggested by Chris Johnsen in the comments. Thanks!

One of variants, may be ugly, but simple:
urlencode() {
local data
if [[ $# != 1 ]]; then
echo "Usage: $0 string-to-urlencode"
return 1
fi
data="$(curl -s -o /dev/null -w %{url_effective} --get --data-urlencode "$1" "")"
if [[ $? != 3 ]]; then
echo "Unexpected error" 1>&2
return 2
fi
echo "${data##/?}"
return 0
}
Here is the one-liner version for example (as suggested by Bruno):
date | curl -Gso /dev/null -w %{url_effective} --data-urlencode @- "" | cut -c 3-
# If you experience the trailing %0A, use
date | curl -Gso /dev/null -w %{url_effective} --data-urlencode @- "" | sed -E 's/..(.*).../\1/'

For the sake of completeness: many solutions using sed or awk translate only a fixed set of special characters. They are therefore quite large in code size and still miss other special characters that should be encoded.
A safe way to urlencode is to encode every single byte, even those that would have been allowed.
echo -ne 'some random\nbytes' | xxd -plain | tr -d '\n' | sed 's/\(..\)/%\1/g'
xxd is taking care here that the input is handled as bytes and not characters.
edit:
xxd comes with the vim-common package in Debian, and I was just on a system where it was not installed and I didn't want to install it. The alternative is to use hexdump from the bsdmainutils package in Debian. According to the following graph, bsdmainutils and vim-common should be about equally likely to be installed:
http://qa.debian.org/popcon-png.php?packages=vim-common%2Cbsdmainutils&show_installed=1&want_legend=1&want_ticks=1
Nevertheless, here is a version that uses hexdump instead of xxd and avoids the tr call:
echo -ne 'some random\nbytes' | hexdump -v -e '/1 "%02x"' | sed 's/\(..\)/%\1/g'

I find it more readable in python:
encoded_value=$(python3 -c "import urllib.parse; print(urllib.parse.quote('''$value'''))")
The triple ' ensures that single quotes in value won't hurt. urllib is in the standard library. It works, for example, for this crazy (real world) url:
"http://www.rai.it/dl/audio/" "1264165523944Ho servito il re d'Inghilterra - Puntata 7
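Interpolating $value into the Python source can still break if the value itself contains ''' or ends with a backslash. A sketch that passes the value as an argument instead, so it is never parsed as code (urlencode_py is a hypothetical helper name, and python3 is assumed to be on PATH):

```bash
urlencode_py() {
    # The value arrives in sys.argv[1], so quotes and backslashes need no special care.
    python3 -c 'import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1]))' "$1"
}
```

Usage would be encoded_value=$(urlencode_py "$value"). Note that urllib.parse.quote leaves / unencoded by default; passing safe='' as a second argument encodes it as well.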

I've found the following snippet useful to stick it into a chain of program calls, where URI::Escape might not be installed:
perl -p -e 's/([^A-Za-z0-9])/sprintf("%%%02X", ord($1))/seg'
(source)

If you wish to run a GET request and use pure curl, just add --get to @Jacob's solution.
Here is an example:
curl -v --get --data-urlencode "access_token=$(cat .fb_access_token)" https://graph.facebook.com/me/feed

This may be the best one:
after=$(echo -e "$before" | od -An -tx1 | tr ' ' % | xargs printf "%s")
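Two details in the one-liner above are easy to trip over: echo -e appends a newline and interprets backslash sequences in $before, and od without -v collapses repeated 16-byte lines into a single *. A sketch that avoids both (encode_all_bytes is a made-up name):

```bash
encode_all_bytes() {
    printf '%s' "$1" |
        od -An -v -tx1 |          # one two-digit hex value per byte, no offsets
        tr -d ' \n' |             # join everything into one hex string
        sed 's/\(..\)/%\1/g'      # prefix each byte with %
}

encode_all_bytes 'a b'    # prints %61%20%62
```

As with the xxd and hexdump variants, every byte is encoded, including the ones that would have been safe to leave alone.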

Direct link to awk version : http://www.shelldorado.com/scripts/cmds/urlencode
I used it for years and it works like a charm
:
##########################################################################
# Title : urlencode - encode URL data
# Author : Heiner Steven (heiner.steven@odn.de)
# Date : 2000-03-15
# Requires : awk
# Categories : File Conversion, WWW, CGI
# SCCS-Id. : @(#) urlencode 1.4 06/10/29
##########################################################################
# Description
# Encode data according to
# RFC 1738: "Uniform Resource Locators (URL)" and
# RFC 1866: "Hypertext Markup Language - 2.0" (HTML)
#
# This encoding is used i.e. for the MIME type
# "application/x-www-form-urlencoded"
#
# Notes
# o The default behaviour is not to encode the line endings. This
# may not be what was intended, because the result will be
# multiple lines of output (which cannot be used in an URL or a
# HTTP "POST" request). If the desired output should be one
# line, use the "-l" option.
#
# o The "-l" option assumes, that the end-of-line is denoted by
# the character LF (ASCII 10). This is not true for Windows or
# Mac systems, where the end of a line is denoted by the two
# characters CR LF (ASCII 13 10).
# We use this for symmetry; data processed in the following way:
# cat | urlencode -l | urldecode -l
# should (and will) result in the original data
#
# o Large lines (or binary files) will break many AWK
# implementations. If you get the message
# awk: record `...' too long
# record number xxx
# consider using GNU AWK (gawk).
#
# o urlencode will always terminate its output with an EOL
# character
#
# Thanks to Stefan Brozinski for pointing out a bug related to non-standard
# locales.
#
# See also
# urldecode
##########################################################################
PN=`basename "$0"` # Program name
VER='1.4'
: ${AWK=awk}
Usage () {
echo >&2 "$PN - encode URL data, $VER
usage: $PN [-l] [file ...]
-l: encode line endings (result will be one line of output)
The default is to encode each input line on its own."
exit 1
}
Msg () {
for MsgLine
do echo "$PN: $MsgLine" >&2
done
}
Fatal () { Msg "$@"; exit 1; }
set -- `getopt hl "$@" 2>/dev/null` || Usage
[ $# -lt 1 ] && Usage # "getopt" detected an error
EncodeEOL=no
while [ $# -gt 0 ]
do
case "$1" in
-l) EncodeEOL=yes;;
--) shift; break;;
-h) Usage;;
-*) Usage;;
*) break;; # First file name
esac
shift
done
LANG=C export LANG
$AWK '
BEGIN {
# We assume an awk implementation that is just plain dumb.
# We will convert an character to its ASCII value with the
# table ord[], and produce two-digit hexadecimal output
# without the printf("%02X") feature.
EOL = "%0A" # "end of line" string (encoded)
split ("1 2 3 4 5 6 7 8 9 A B C D E F", hextab, " ")
hextab [0] = 0
for ( i=1; i<=255; ++i ) ord [ sprintf ("%c", i) "" ] = i + 0
if ("'"$EncodeEOL"'" == "yes") EncodeEOL = 1; else EncodeEOL = 0
}
{
encoded = ""
for ( i=1; i<=length ($0); ++i ) {
c = substr ($0, i, 1)
if ( c ~ /[a-zA-Z0-9.-]/ ) {
encoded = encoded c # safe character
} else if ( c == " " ) {
encoded = encoded "+" # special handling
} else {
# unsafe character, encode it as a two-digit hex-number
lo = ord [c] % 16
hi = int (ord [c] / 16);
encoded = encoded "%" hextab [hi] hextab [lo]
}
}
if ( EncodeEOL ) {
printf ("%s", encoded EOL)
} else {
print encoded
}
}
END {
#if ( EncodeEOL ) print ""
}
' "$@"

Here's a Bash solution which doesn't invoke any external programs:
uriencode() {
s="${1//'%'/%25}"
s="${s//' '/%20}"
s="${s//'"'/%22}"
s="${s//'#'/%23}"
s="${s//'$'/%24}"
s="${s//'&'/%26}"
s="${s//'+'/%2B}"
s="${s//','/%2C}"
s="${s//'/'/%2F}"
s="${s//':'/%3A}"
s="${s//';'/%3B}"
s="${s//'='/%3D}"
s="${s//'?'/%3F}"
s="${s//'@'/%40}"
s="${s//'['/%5B}"
s="${s//']'/%5D}"
printf %s "$s"
}

url=$(echo "$1" | sed -e 's/%/%25/g' -e 's/ /%20/g' -e 's/!/%21/g' -e 's/"/%22/g' -e 's/#/%23/g' -e 's/\$/%24/g' -e 's/\&/%26/g' -e 's/'\''/%27/g' -e 's/(/%28/g' -e 's/)/%29/g' -e 's/\*/%2a/g' -e 's/+/%2b/g' -e 's/,/%2c/g' -e 's/-/%2d/g' -e 's/\./%2e/g' -e 's/\//%2f/g' -e 's/:/%3a/g' -e 's/;/%3b/g' -e 's/</%3c/g' -e 's/=/%3d/g' -e 's/>/%3e/g' -e 's/?/%3f/g' -e 's/@/%40/g' -e 's/\[/%5b/g' -e 's/\\/%5c/g' -e 's/\]/%5d/g' -e 's/\^/%5e/g' -e 's/_/%5f/g' -e 's/`/%60/g' -e 's/{/%7b/g' -e 's/|/%7c/g' -e 's/}/%7d/g' -e 's/~/%7e/g')
This encodes the string in $1 and stores the result in $url, although you don't have to put it in a variable if you don't want to. BTW, I didn't include a sed expression for tab, since I thought it would be turned into spaces.

Using php from a shell script:
value="http://www.google.com"
encoded=$(php -r "echo rawurlencode('$value');")
# encoded = "http%3A%2F%2Fwww.google.com"
echo $(php -r "echo rawurldecode('$encoded');")
# returns: "http://www.google.com"
http://www.php.net/manual/en/function.rawurlencode.php
http://www.php.net/manual/en/function.rawurldecode.php

If you don't want to depend on Perl you can also use sed. It's a bit messy, as each character has to be escaped individually. Make a file with the following contents and call it urlencode.sed
s/%/%25/g
s/ /%20/g
s/\t/%09/g
s/!/%21/g
s/"/%22/g
s/#/%23/g
s/\$/%24/g
s/\&/%26/g
s/'/%27/g
s/(/%28/g
s/)/%29/g
s/\*/%2a/g
s/+/%2b/g
s/,/%2c/g
s/-/%2d/g
s/\./%2e/g
s/\//%2f/g
s/:/%3a/g
s/;/%3b/g
s/</%3c/g
s/=/%3d/g
s/>/%3e/g
s/?/%3f/g
s/@/%40/g
s/\[/%5b/g
s/\\/%5c/g
s/\]/%5d/g
s/\^/%5e/g
s/_/%5f/g
s/`/%60/g
s/{/%7b/g
s/|/%7c/g
s/}/%7d/g
s/~/%7e/g
To use it do the following.
STR1=$(echo "https://www.example.com/change&$ ^this to?%checkthe#-functionality" | cut -d\? -f1)
STR2=$(echo "https://www.example.com/change&$ ^this to?%checkthe#-functionality" | cut -d\? -f2)
OUT2=$(echo "$STR2" | sed -f urlencode.sed)
echo "$STR1?$OUT2"
This splits the string into the part that is fine and the part that needs encoding, encodes the part that needs it, then stitches the two back together.
You can put that into a sh script for convenience, maybe have it take a parameter to encode, put it on your path and then you can just call:
urlencode https://www.exxample.com?isThisFun=HellNo
source

You can emulate javascript's encodeURIComponent in perl. Here's the command:
perl -pe 's/([^a-zA-Z0-9_.!~*()'\''-])/sprintf("%%%02X", ord($1))/ge'
You could set this as a bash alias in .bash_profile:
alias encodeURIComponent='perl -pe '\''s/([^a-zA-Z0-9_.!~*()'\''\'\'''\''-])/sprintf("%%%02X",ord($1))/ge'\'
Now you can pipe into encodeURIComponent:
$ echo -n 'hèllo wôrld!' | encodeURIComponent
h%C3%A8llo%20w%C3%B4rld!

Python 3, based on @sandro's good answer from 2010:
echo "Test & /me" | python3 -c "import urllib.parse;print (urllib.parse.quote(input()))"
Test%20%26%20/me

This nodejs-based answer will use encodeURIComponent on stdin:
uriencode_stdin() {
node -p 'encodeURIComponent(require("fs").readFileSync(0))'
}
echo -n $'hello\nwörld' | uriencode_stdin
hello%0Aw%C3%B6rld

For those of you looking for a solution that doesn't need perl, here is one that only needs hexdump and awk:
url_encode() {
[ $# -lt 1 ] && { return; }
encodedurl="$1";
# make sure hexdump exists, if not, just give back the url
[ ! -x "/usr/bin/hexdump" ] && { return; }
encodedurl=`
echo $encodedurl | hexdump -v -e '1/1 "%02x\t"' -e '1/1 "%_c\n"' |
LANG=C awk '
$1 == "20" { printf("%s", "+"); next } # space becomes plus
$1 ~ /0[adAD]/ { next } # strip newlines
$2 ~ /^[a-zA-Z0-9.*()\/-]$/ { printf("%s", $2); next } # pass through what we can
{ printf("%%%s", $1) } # take hex value of everything else
'`
}
Stitched together from a couple of places across the net and some local trial and error. It works great!

uni2ascii is very handy:
$ echo -ne '你好世界' | uni2ascii -aJ
%E4%BD%A0%E5%A5%BD%E4%B8%96%E7%95%8C

Simple PHP option:
echo 'part-that-needs-encoding' | php -R 'echo urlencode($argn);'

What would parse URLs better than javascript?
node -p "encodeURIComponent('$url')"

Here is a POSIX function to do that:
url_encode() {
awk 'BEGIN {
for (n = 0; n < 128; n++) {
m[sprintf("%c", n)] = n
}
n = 1
while (1) {
s = substr(ARGV[1], n, 1)
if (s == "") {
break
}
t = s ~ /[[:alnum:]_.!~*\47()-]/ ? t s : t sprintf("%%%02X", m[s])
n++
}
print t
}' "$1"
}
Example:
value=$(url_encode "$2")

The question is about doing this in bash and there's no need for python or perl as there is in fact a single command that does exactly what you want - "urlencode".
value=$(urlencode "${2}")
This is also much better, as the above perl answer, for example, doesn't encode all characters correctly. Try it with the long dash you get from Word and you get the wrong encoding.
Note, you need "gridsite-clients" installed to provide this command:
sudo apt install gridsite-clients

Here's the node version:
uriencode() {
node -p "encodeURIComponent('${1//\'/\\\'}')"
}

Another php approach:
echo "encode me" | php -r "echo urlencode(file_get_contents('php://stdin'));"

Here is my version for busybox ash shell for an embedded system, I originally adopted Orwellophile's variant:
urlencode()
{
local S="${1}"
local encoded=""
local ch
local o
for i in $(seq 0 $((${#S} - 1)) )
do
ch=${S:$i:1}
case "${ch}" in
[-_.~a-zA-Z0-9])
o="${ch}"
;;
*)
o=$(printf '%%%02x' "'$ch")
;;
esac
encoded="${encoded}${o}"
done
echo ${encoded}
}
urldecode()
{
# urldecode <string>
local url_encoded="${1//+/ }"
printf '%b' "${url_encoded//%/\\x}"
}

Ruby, for completeness
value="$(ruby -r cgi -e 'puts CGI.escape(ARGV[0])' "$2")"

Here's a one-line conversion using Lua, similar to blueyed's answer except with all the RFC 3986 Unreserved Characters left unencoded (like this answer):
url=$(echo 'print((arg[1]:gsub("([^%w%-%.%_%~])",function(c)return("%%%02X"):format(c:byte())end)))' | lua - "$1")
Additionally, you may need to ensure that newlines in your string are converted from LF to CRLF, in which case you can insert a gsub("\r?\n", "\r\n") in the chain before the percent-encoding.
Here's a variant that, in the non-standard style of application/x-www-form-urlencoded, does that newline normalization, as well as encoding spaces as '+' instead of '%20' (which could probably be added to the Perl snippet using a similar technique).
url=$(echo 'print((arg[1]:gsub("\r?\n", "\r\n"):gsub("([^%w%-%.%_%~ ])",function(c)return("%%%02X"):format(c:byte())end):gsub(" ","+")))' | lua - "$1")

In this case, I needed to URL encode the hostname. Don't ask why. Being a minimalist, and a Perl fan, here's what I came up with.
url_encode()
{
echo -n "$1" | perl -pe 's/[^a-zA-Z0-9\/_.~-]/sprintf "%%%02x", ord($&)/ge'
}
Works perfectly for me.

Related

Bash: decode string with url escaped hex codes [duplicate]

I'm looking for a way to turn this:
hello &lt; world
to this:
hello < world
I could use sed, but how can this be accomplished without using cryptic regex?
Try recode (archived page; GitHub mirror; Debian page):
$ echo '&lt;' |recode html..ascii
<
Install on Linux and similar Unix-y systems:
$ sudo apt-get install recode
Install on Mac OS using:
$ brew install recode
With perl:
cat foo.html | perl -MHTML::Entities -pe 'decode_entities($_);'
With php from the command line:
cat foo.html | php -r 'while(($line=fgets(STDIN)) !== FALSE) echo html_entity_decode($line, ENT_QUOTES|ENT_HTML401);'
An alternative is to pipe through a web browser -- such as:
echo '&#33;' | w3m -dump -T text/html
This worked great for me in cygwin, where downloading and installing distributions are difficult.
This answer was found here
Using xmlstarlet:
echo 'hello &lt; world' | xmlstarlet unesc
A python 3.2+ version:
cat foo.html | python3 -c 'import html, sys; [print(html.unescape(l), end="") for l in sys.stdin]'
This answer is based on: Short way to escape HTML in Bash? which works fine for grabbing answers (using wget) on Stack Exchange and converting HTML to regular ASCII characters:
sed 's/&nbsp;/ /g; s/&amp;/\&/g; s/&lt;/\</g; s/&gt;/\>/g; s/&quot;/\"/g; s/&#39;/\'"'"'/g; s/&ldquo;/\"/g; s/&rdquo;/\"/g;'
Edit 1: April 7, 2017 - Added left double quote and right double quote conversion. This is part of a bash script that web-scrapes SE answers and compares them to local code files here: Ask Ubuntu - Code Version Control between local files and Ask Ubuntu answers
Edit June 26, 2017
Using sed was taking ~3 seconds to convert HTML to ASCII on a 1K line file from Ask Ubuntu / Stack Exchange. As such I was forced to use Bash built-in search and replace for ~1 second response time.
Here's the function:
LineOut="" # Make global
HTMLtoText () {
LineOut=$1 # Parm 1= Input line
# Replace external command: Line=$(sed 's/&amp;/\&/g; s/&lt;/\</g;
# s/&gt;/\>/g; s/&quot;/\"/g; s/&#39;/\'"'"'/g; s/&ldquo;/\"/g;
# s/&rdquo;/\"/g;' <<< "$Line") -- With faster builtin commands.
LineOut="${LineOut//&nbsp;/ }"
LineOut="${LineOut//&amp;/&}"
LineOut="${LineOut//&lt;/<}"
LineOut="${LineOut//&gt;/>}"
LineOut="${LineOut//&quot;/'"'}"
LineOut="${LineOut//&#39;/"'"}"
LineOut="${LineOut//&ldquo;/'"'}" # TODO: ASCII/ISO for opening quote
LineOut="${LineOut//&rdquo;/'"'}" # TODO: ASCII/ISO for closing quote
} # HTMLtoText ()
On macOS, you can use the built-in command textutil (which is a handy utility in general):
echo '👋 hello &lt; world 🌐' | textutil -convert txt -format html -stdin -stdout
outputs:
👋 hello < world 🌐
To support the unescaping of all HTML entities only with sed substitutions would require too long a list of commands to be practical, because every Unicode code point has at least two corresponding HTML entities.
But it can be done using only sed, grep, the Bourne shell and basic UNIX utilities (the GNU coreutils or equivalent):
#!/bin/sh
htmlEscDec2Hex() {
file=$1
[ ! -r "$file" ] && file=$(mktemp) && cat >"$file"
printf -- \
"$(sed 's/\\/\\\\/g;s/%/%%/g;s/&#[0-9]\{1,10\};/\&#x%x;/g' "$file")\n" \
$(grep -o '&#[0-9]\{1,10\};' "$file" | tr -d '&#;')
[ x"$1" != x"$file" ] && rm -f -- "$file"
}
htmlHexUnescape() {
printf -- "$(
sed 's/\\/\\\\/g;s/%/%%/g
;s/&#x\([0-9a-fA-F]\{1,8\}\);/\&#x0000000\1;/g
;s/&#x0*\([0-9a-fA-F]\{4\}\);/\\u\1/g
;s/&#x0*\([0-9a-fA-F]\{8\}\);/\\U\1/g' )\n"
}
htmlEscDec2Hex "$1" | htmlHexUnescape \
| sed -f named_entities.sed
Note, however, that a printf implementation supporting \uHHHH and \UHHHHHHHH sequences is required, such as the GNU utility’s. To test, check for example that printf "\u00A7\n" prints §. To call the utility instead of the shell built-in, replace the occurrences of printf with env printf.
This script uses an additional file, named_entities.sed, in order to support the named entities. It can be generated from the specification using the following HTML page:
<!DOCTYPE html>
<head><meta charset="utf-8" /></head>
<body>
<p id="sed-script"></p>
<script type="text/javascript">
const referenceURL = 'https://html.spec.whatwg.org/entities.json';
function writeln(element, text) {
element.appendChild( document.createTextNode(text) );
element.appendChild( document.createElement("br") );
}
(async function(container) {
const json = await (await fetch(referenceURL)).json();
container.innerHTML = "";
writeln(container, "#!/usr/bin/sed -f");
const addLast = [];
for (const name in json) {
const characters = json[name].characters
.replace("\\", "\\\\")
.replace("/", "\\/");
const command = "s/" + name + "/" + characters + "/g";
if ( name.endsWith(";") ) {
writeln(container, command);
} else {
addLast.push(command);
}
}
for (const command of addLast) { writeln(container, command); }
})( document.getElementById("sed-script") );
</script>
</body></html>
Simply open it in a modern browser, and save the resulting page as text as named_entities.sed. This sed script can also be used alone if only named entities are required; in this case it is convenient to give it executable permission so that it can be called directly.
Now the above shell script can be used as ./html_unescape.sh foo.html, or inside a pipeline reading from standard input.
For example, if for some reason it is needed to process the data by chunks (it might be the case if printf is not a shell built-in and the data to process is large), one could use it as:
nLines=20
seq 1 $nLines $(grep -c $ "$inputFile") | while read n
do sed -n "$n,$((n+nLines-1))p" "$inputFile" | ./html_unescape.sh
done
Explanation of the script follows.
There are three types of escape sequences that need to be supported:
&#D; where D is the decimal value of the escaped character’s Unicode code point;
&#xH; where H is the hexadecimal value of the escaped character’s Unicode code point;
&N; where N is the name of one of the named entities for the escaped character.
The &N; escapes are supported by the generated named_entities.sed script which simply performs the list of substitutions.
The central piece of this method for supporting the code point escapes is the printf utility, which is able to:
print numbers in hexadecimal format, and
print characters from their code point’s hexadecimal value (using the escapes \uHHHH or \UHHHHHHHH).
The first feature, with some help from sed and grep, is used to reduce the &#D; escapes into &#xH; escapes. The shell function htmlEscDec2Hex does that.
The function htmlHexUnescape uses sed to transform the &#xH; escapes into printf’s \u/\U escapes, then uses the second feature to print the unescaped characters.
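As a small illustration of the first feature, the sketch below converts a single decimal escape to its hexadecimal form with printf alone. The helper name dec_entity_to_hex is hypothetical and not part of the script above, it assumes its argument is exactly one well-formed &#D; escape, and it uses bash's local.

```bash
dec_entity_to_hex() {
    local n=${1#'&#'}    # strip the leading "&#"
    n=${n%';'}           # strip the trailing ";"
    printf '&#x%x;\n' "$n"
}

dec_entity_to_hex '&#167;'    # prints &#xa7;
```

With a printf that understands \u, printf '\u00a7\n' then prints the character itself, which is the step htmlHexUnescape relies on.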
I like the Perl answer given in https://stackoverflow.com/a/13161719/1506477.
cat foo.html | perl -MHTML::Entities -pe 'decode_entities($_);'
But it produced an unequal number of lines on plain text files (and I don't know Perl well enough to debug it).
I like the python answer given in https://stackoverflow.com/a/42672936/1506477 --
python3 -c 'import html, sys; [print(html.unescape(l), end="") for l in sys.stdin]'
but it builds the list [ ... for l in sys.stdin] in memory, which is prohibitive for large files.
Here is another easy pythonic way without buffering in memory: using awkg.
$ echo 'hello &lt; &colon; &quot; world' | \
awkg -b 'from html import unescape' 'print(unescape(R0))'
hello < : " world
awkg is a python based awk-like line processor. You may install it using pip https://pypi.org/project/awkg/:
pip install awkg
-b is awk's BEGIN{} block that runs once in the beginning.
Here we just did from html import unescape.
Each line record is in R0 variable, for which we did
print(unescape(R0))
Disclaimer:
I am the maintainer of awkg
I have created a sed script based on the list of entities so it must handle most of the entities.
sed -f htmlentities.sed < file.html
My original answer got some comments, that recode does not work for UTF-8 encoded HTML files. This is correct. recode supports only HTML 4. The encoding HTML is an alias for HTML_4.0:
$ recode -l | grep -iw html
HTML-i18n 2070 RFC2070
HTML_4.0 h h4 HTML
The default encoding for HTML 4 is Latin-1. This has changed in HTML 5. The default encoding for HTML 5 is UTF-8. This is the reason, why recode does not work for HTML 5 files.
HTML 5 defines the list of entities here:
https://html.spec.whatwg.org/multipage/named-characters.html
The definition includes a machine readable specification in JSON format:
https://html.spec.whatwg.org/entities.json
The JSON file can be used to perform a simple text replacement. The following example is a self modifying Perl script, which caches the JSON specification in its DATA chunk.
Note: For some obscure compatibility reasons, the specification allows entities without a terminating semicolon. Because of that the entities are sorted by length in reverse order to make sure, that the correct entities are replaced first so that they do not get destroyed by entities without the ending semicolon.
#! /usr/bin/perl
use utf8;
use strict;
use warnings;
use open qw(:std :utf8);
use LWP::Simple;
use JSON::Parse qw(parse_json);
my $entities;
INIT {
if (eof DATA) {
my $data = tell DATA;
open DATA, '+<', $0;
seek DATA, $data, 0;
my $entities_json = get 'https://html.spec.whatwg.org/entities.json';
print DATA $entities_json;
truncate DATA, tell DATA;
close DATA;
$entities = parse_json ($entities_json);
} else {
local $/ = undef;
$entities = parse_json (<DATA>);
}
}
local $/ = undef;
my $html = <>;
for my $entity (sort { length $b <=> length $a } keys %$entities) {
my $characters = $entities->{$entity}->{characters};
$html =~ s/$entity/$characters/g;
}
print $html;
__DATA__
Example usage:
$ echo '😊 &amp; ٱلْعَرَبِيَّة' | ./html5-to-utf8.pl
😊 & ٱلْعَرَبِيَّة
With Xidel:
echo 'hello &lt; &colon; &quot; world' | xidel -s - -e 'parse-html($raw)'
hello < : " world

Double quotes containing variable not working in sed [duplicate]

In my bash script I have an external (received from user) string, which I should use in sed pattern.
REPLACE="<funny characters here>"
sed "s/KEYWORD/$REPLACE/g"
How can I escape the $REPLACE string so it would be safely accepted by sed as a literal replacement?
NOTE: The KEYWORD is a dumb substring with no matches etc. It is not supplied by user.
Warning: This does not consider newlines. For a more in-depth answer, see this SO-question instead. (Thanks, Ed Morton & Niklas Peter)
Note that escaping everything is a bad idea. Sed needs many characters to be escaped to get their special meaning. For example, if you escape a digit in the replacement string, it will turn in to a backreference.
As Ben Blank said, there are only three characters that need to be escaped in the replacement string (escapes themselves, forward slash for end of statement and & for replace all):
ESCAPED_REPLACE=$(printf '%s\n' "$REPLACE" | sed -e 's/[\/&]/\\&/g')
# Now you can use ESCAPED_REPLACE in the original sed statement
sed "s/KEYWORD/$ESCAPED_REPLACE/g"
If you ever need to escape the KEYWORD string, the following is the one you need:
sed -e 's/[]\/$*.^[]/\\&/g'
And can be used by:
KEYWORD="The Keyword You Need";
ESCAPED_KEYWORD=$(printf '%s\n' "$KEYWORD" | sed -e 's/[]\/$*.^[]/\\&/g');
# Now you can use it inside the original sed statement to replace text
sed "s/$ESCAPED_KEYWORD/$ESCAPED_REPLACE/g"
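The two escaping steps can be wrapped into one helper. This is only a sketch that reuses the expressions above unchanged: the name sed_replace_literal is made up, / is assumed as the delimiter, and neither string may contain newlines.

```bash
# Hypothetical helper: sed_replace_literal KEYWORD REPLACE < input
sed_replace_literal() {
    local keyword=$1 replace=$2 esc_kw esc_rep
    # Escape characters that are special in a basic regular expression.
    esc_kw=$(printf '%s\n' "$keyword" | sed -e 's/[]\/$*.^[]/\\&/g')
    # Escape characters that are special in the replacement text.
    esc_rep=$(printf '%s\n' "$replace" | sed -e 's/[\/&]/\\&/g')
    sed "s/$esc_kw/$esc_rep/g"
}
```

For example, printf 'fooKEYWORDbar\n' | sed_replace_literal KEYWORD 'a/b&c' prints fooa/b&cbar.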
Remember, if you use a character other than / as the delimiter, you need to replace the slash in the expressions above with the character you are using. See PeterJCLaw's comment for an explanation.
Edited: Due to some corner cases previously not accounted for, the commands above have changed several times. Check the edit history for details.
The sed command allows you to use other characters instead of / as separator:
sed 's#"http://www\.fubar\.com"#URL_FUBAR#g'
The double quotes are not a problem.
The only three literal characters which are treated specially in the replace clause are / (to close the clause), \ (to escape characters, backreference, &c.), and & (to include the match in the replacement). Therefore, all you need to do is escape those three characters:
sed "s/KEYWORD/$(echo $REPLACE | sed -e 's/\\/\\\\/g; s/\//\\\//g; s/&/\\\&/g')/g"
Example:
$ export REPLACE="'\"|\\/><&!"
$ echo fooKEYWORDbar | sed "s/KEYWORD/$(echo $REPLACE | sed -e 's/\\/\\\\/g; s/\//\\\//g; s/&/\\\&/g')/g"
foo'"|\/><&!bar
Based on Pianosaurus's regular expressions, I made a bash function that escapes both keyword and replacement.
function sedeasy {
sed -i "s/$(echo $1 | sed -e 's/\([[\/.*]\|\]\)/\\&/g')/$(echo $2 | sed -e 's/[\/&]/\\&/g')/g" $3
}
Here's how you use it:
sedeasy "include /etc/nginx/conf.d/*" "include /apps/*/conf/nginx.conf" /etc/nginx/nginx.conf
It's a bit late to respond... but there IS a much simpler way to do this. Just change the delimiter (i.e., the character that separates fields). So, instead of s/foo/bar/ you write s|foo|bar|.
And, here's the easy way to do this:
sed 's|/\*!50017 DEFINER=`snafu`@`localhost`\*/||g'
The resulting output is devoid of that nasty DEFINER clause.
It turns out you're asking the wrong question. I also asked the wrong question. The reason it's wrong is the beginning of the first sentence: "In my bash script...".
I had the same question & made the same mistake. If you're using bash, you don't need to use sed to do string replacements (and it's much cleaner to use the replace feature built into bash).
Instead of something like, for example:
function escape-all-funny-characters() { UNKNOWN_CODE_THAT_ANSWERS_THE_QUESTION_YOU_ASKED; }
INPUT='some long string with KEYWORD that need replacing KEYWORD.'
A="$(escape-all-funny-characters 'KEYWORD')"
B="$(escape-all-funny-characters '<funny characters here>')"
OUTPUT="$(sed "s/$A/$B/g" <<<"$INPUT")"
you can use bash features exclusively:
INPUT='some long string with KEYWORD that need replacing KEYWORD.'
A='KEYWORD'
B='<funny characters here>'
OUTPUT="${INPUT//"$A"/"$B"}"
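Wrapped in a function, the same idea can be checked against strings that would need escaping in sed. This is a sketch; replace_literal is a made-up name. Quoting the pattern and the replacement inside the expansion is what keeps glob characters and & literal.

```bash
replace_literal() {
    local input=$1 keyword=$2 replacement=$3
    # Plain assignment: no word splitting happens here, so no outer quotes are needed,
    # while the inner quotes make both pattern and replacement literal.
    local out=${input//"$keyword"/"$replacement"}
    printf '%s\n' "$out"
}

replace_literal 'foo KEYWORD bar' KEYWORD 'a/b&c\d'    # prints foo a/b&c\d bar
```

No escaping step is involved at all, which is the point of this approach.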
Use awk - it is cleaner:
$ awk -v R='//addr:\\file' '{ sub("THIS", R, $0); print $0 }' <<< "http://file:\_THIS_/path/to/a/file\\is\\\a\\ nightmare"
http://file:\_//addr:\file_/path/to/a/file\\is\\\a\\ nightmare
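One caveat with sub(): the search text is treated as a regular expression and an & in the replacement stands for the match. If both sides should be fully literal, a sketch along these lines avoids that by using index() and substr() and by passing the strings through the environment, so awk does not process backslash escapes in them. The function name literal_replace is hypothetical:

```shell
# Hypothetical helper: replace every literal occurrence of $1 with $2
# in the text read from standard input.
literal_replace() {
    SEARCH=$1 REPL=$2 awk '
        BEGIN { s = ENVIRON["SEARCH"]; r = ENVIRON["REPL"]; n = length(s) }
        {
            rest = $0
            out = ""
            # Copy the text before each match, then the replacement.
            while (n > 0 && (i = index(rest, s)) > 0) {
                out = out substr(rest, 1, i - 1) r
                rest = substr(rest, i + n)
            }
            print out rest
        }'
}

printf '%s\n' 'http://file:\_THIS_/path' | literal_replace 'THIS' '//addr:\file&'
```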
Here is an example of an AWK I used a while ago. It is an AWK that prints new AWKS. AWK and SED being similar it may be a good template.
ls | awk '{ print "awk " "'"'"'" " {print $1,$2,$3} " "'"'"'" " " $1 ".old_ext > " $1 ".new_ext" }' > for_the_birds
It looks excessive, but somehow that combination of quotes works to keep the ' printed as literals. Then, if I remember correctly, the variables are just surrounded with quotes like this: "$1". Try it and let me know how it works with sed.
These are the escape codes that I've found:
* = \x2a
( = \x28
) = \x29
" = \x22
/ = \x2f
\ = \x5c
' = \x27
? = \x3f
% = \x25
^ = \x5e
sed is typically a mess, especially given the differences between GNU sed and BSD sed.
It might be easier to place some sort of sentinel on the sed side, then do a quick pipe over to awk, which is far more flexible in accepting any ERE regex, escaped hex, or escaped octals.
E.g., OFS in awk is the true replacement:
date | sed -E 's/[0-9]+/\xC1\xC0/g' |
mawk NF=NF FS='\xC1\xC0' OFS='\360\237\244\241'
Tue Aug 🤡 🤡:🤡:🤡 EDT 🤡
(tested and confirmed working on both BSD-sed and GNU-sed - the emoji isn't a typo that's what those 4 bytes map to in UTF-8 )
There are dozens of answers out there... If you don't mind using a bash function schema, below is a good answer. The objective below was to allow using sed with practically any parameter as a KEYWORD (F_PS_TARGET) or as a REPLACE (F_PS_REPLACE). We tested it in many scenarios and it seems to be pretty safe. The implementation below supports tabs, line breaks and single quotes for both KEYWORD and REPLACE.
NOTES: The idea here is to use sed to escape entries for another sed command.
CODE
F_REVERSE_STRING_R=""
f_reverse_string() {
: 'Do a string reverse.
To undo just use a reversed string as STRING_INPUT.
Args:
STRING_INPUT (str): String input.
Returns:
F_REVERSE_STRING_R (str): The modified string.
'
local STRING_INPUT=$1
F_REVERSE_STRING_R=$(echo "x${STRING_INPUT}x" | tac | rev)
F_REVERSE_STRING_R=${F_REVERSE_STRING_R%?}
F_REVERSE_STRING_R=${F_REVERSE_STRING_R#?}
}
# [Ref(s).: https://stackoverflow.com/a/2705678/3223785 ]
F_POWER_SED_ECP_R=""
f_power_sed_ecp() {
: 'Escape strings for the "sed" command.
Escaped characters will be processed as is (e.g. \n, \t ...).
Args:
F_PSE_VAL_TO_ECP (str): Value to be escaped.
F_PSE_ECP_TYPE (int): 0 - For the TARGET value; 1 - For the REPLACE value.
Returns:
F_POWER_SED_ECP_R (str): Escaped value.
'
local F_PSE_VAL_TO_ECP=$1
local F_PSE_ECP_TYPE=$2
# NOTE: Operational characters of "sed" will be escaped, as well as single quotes.
# By Questor
if [ ${F_PSE_ECP_TYPE} -eq 0 ] ; then
# NOTE: For the TARGET value. By Questor
F_POWER_SED_ECP_R=$(echo "x${F_PSE_VAL_TO_ECP}x" | sed 's/[]\/$*.^[]/\\&/g' | sed "s/'/\\\x27/g" | sed ':a;N;$!ba;s/\n/\\n/g')
else
# NOTE: For the REPLACE value. By Questor
F_POWER_SED_ECP_R=$(echo "x${F_PSE_VAL_TO_ECP}x" | sed 's/[\/&]/\\&/g' | sed "s/'/\\\x27/g" | sed ':a;N;$!ba;s/\n/\\n/g')
fi
F_POWER_SED_ECP_R=${F_POWER_SED_ECP_R%?}
F_POWER_SED_ECP_R=${F_POWER_SED_ECP_R#?}
}
# [Ref(s).: https://stackoverflow.com/a/24134488/3223785 ,
# https://stackoverflow.com/a/21740695/3223785 ,
# https://unix.stackexchange.com/a/655558/61742 ,
# https://stackoverflow.com/a/11461628/3223785 ,
# https://stackoverflow.com/a/45151986/3223785 ,
# https://linuxaria.com/pills/tac-and-rev-to-see-files-in-reverse-order ,
# https://unix.stackexchange.com/a/631355/61742 ]
F_POWER_SED_R=""
f_power_sed() {
: 'Facilitate the use of the "sed" command. Replaces in files and strings.
Args:
F_PS_TARGET (str): Value to be replaced by the value of F_PS_REPLACE.
F_PS_REPLACE (str): Value that will replace F_PS_TARGET.
F_PS_FILE (Optional[str]): File in which the replacement will be made.
F_PS_SOURCE (Optional[str]): String to be manipulated in case "F_PS_FILE" was
not informed.
F_PS_NTH_OCCUR (Optional[int]): [1~n] - Replace the nth match; [n~-1] - Replace
the last nth match; 0 - Replace every match; Default 1.
Returns:
F_POWER_SED_R (str): Return the result if "F_PS_FILE" is not informed.
'
local F_PS_TARGET=$1
local F_PS_REPLACE=$2
local F_PS_FILE=$3
local F_PS_SOURCE=$4
local F_PS_NTH_OCCUR=$5
if [ -z "$F_PS_NTH_OCCUR" ] ; then
F_PS_NTH_OCCUR=1
fi
local F_PS_REVERSE_MODE=0
if [ ${F_PS_NTH_OCCUR} -lt -1 ] ; then
F_PS_REVERSE_MODE=1
f_reverse_string "$F_PS_TARGET"
F_PS_TARGET="$F_REVERSE_STRING_R"
f_reverse_string "$F_PS_REPLACE"
F_PS_REPLACE="$F_REVERSE_STRING_R"
f_reverse_string "$F_PS_SOURCE"
F_PS_SOURCE="$F_REVERSE_STRING_R"
F_PS_NTH_OCCUR=$((-F_PS_NTH_OCCUR))
fi
f_power_sed_ecp "$F_PS_TARGET" 0
F_PS_TARGET=$F_POWER_SED_ECP_R
f_power_sed_ecp "$F_PS_REPLACE" 1
F_PS_REPLACE=$F_POWER_SED_ECP_R
local F_PS_SED_RPL=""
if [ ${F_PS_NTH_OCCUR} -eq -1 ] ; then
# NOTE: We kept this option because it performs better when we only need to replace
# the last occurrence. By Questor
# [Ref(s).: https://linuxhint.com/use-sed-replace-last-occurrence/ ,
# https://unix.stackexchange.com/a/713866/61742 ]
F_PS_SED_RPL="'s/\(.*\)$F_PS_TARGET/\1$F_PS_REPLACE/'"
elif [ ${F_PS_NTH_OCCUR} -gt 0 ] ; then
# [Ref(s).: https://unix.stackexchange.com/a/587924/61742 ]
F_PS_SED_RPL="'s/$F_PS_TARGET/$F_PS_REPLACE/$F_PS_NTH_OCCUR'"
elif [ ${F_PS_NTH_OCCUR} -eq 0 ] ; then
F_PS_SED_RPL="'s/$F_PS_TARGET/$F_PS_REPLACE/g'"
fi
# NOTE: As the "sed" commands below always process literal values for the "F_PS_TARGET"
# so we use the "-z" flag in case it has multiple lines. By Quaestor
# [Ref(s).: https://unix.stackexchange.com/a/525524/61742 ]
if [ -z "$F_PS_FILE" ] ; then
F_POWER_SED_R=$(echo "x${F_PS_SOURCE}x" | eval "sed -z $F_PS_SED_RPL")
F_POWER_SED_R=${F_POWER_SED_R%?}
F_POWER_SED_R=${F_POWER_SED_R#?}
if [ ${F_PS_REVERSE_MODE} -eq 1 ] ; then
f_reverse_string "$F_POWER_SED_R"
F_POWER_SED_R="$F_REVERSE_STRING_R"
fi
else
if [ ${F_PS_REVERSE_MODE} -eq 0 ] ; then
eval "sed -i -z $F_PS_SED_RPL \"$F_PS_FILE\""
else
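# NOTE: Write to a temporary file first; redirecting straight into "$F_PS_FILE"
# would truncate it before "tac" has read it.
local F_PS_TMP_FILE
F_PS_TMP_FILE=$(mktemp)
tac "$F_PS_FILE" | rev | eval "sed -z $F_PS_SED_RPL" | tac | rev > "$F_PS_TMP_FILE"
cat "$F_PS_TMP_FILE" > "$F_PS_FILE"
rm -f "$F_PS_TMP_FILE"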
fi
fi
}
MODEL
f_power_sed "F_PS_TARGET" "F_PS_REPLACE" "" "F_PS_SOURCE"
echo "$F_POWER_SED_R"
EXAMPLE
f_power_sed "{ gsub(/,[ ]+|$/,\"\0\"); print }' ./ and eliminate" "[ ]+|$/,\"\0\"" "" "Great answer (+1). If you change your awk to awk '{ gsub(/,[ ]+|$/,\"\0\"); print }' ./ and eliminate that concatenation of the final \", \" then you don't have to go through the gymnastics on eliminating the final record. So: readarray -td '' a < <(awk '{ gsub(/,[ ]+/,\"\0\"); print; }' <<<\"$string\") on Bash that supports readarray. Note your method is Bash 4.4+ I think because of the -d in readar"
echo "$F_POWER_SED_R"
IF YOU JUST WANT TO ESCAPE THE PARAMETERS TO THE SED COMMAND
MODEL
# "TARGET" value.
f_power_sed_ecp "F_PSE_VAL_TO_ECP" 0
echo "$F_POWER_SED_ECP_R"
# "REPLACE" value.
f_power_sed_ecp "F_PSE_VAL_TO_ECP" 1
echo "$F_POWER_SED_ECP_R"
IMPORTANT: If the strings for KEYWORD and/or replace REPLACE contain tabs or line breaks you will need to use the "-z" flag in your "sed" command. More details here.
EXAMPLE
f_power_sed_ecp "{ gsub(/,[ ]+|$/,\"\0\"); print }' ./ and eliminate" 0
echo "$F_POWER_SED_ECP_R"
f_power_sed_ecp "[ ]+|$/,\"\0\"" 1
echo "$F_POWER_SED_ECP_R"
NOTE: The f_power_sed_ecp and f_power_sed functions above were made available completely free as part of this project ez_i - Create shell script installers easily!.
Standard recommendation here: use perl :)
echo KEYWORD > /tmp/test
REPLACE="<funny characters here>"
perl -pi.bck -e "s/KEYWORD/${REPLACE}/g" /tmp/test
cat /tmp/test
Don't forget the extra trouble that comes from the shell's own quoting rules around " and ',
so (in ksh):
Var=">New version of \"content' here <"
printf "%s" "${Var}" | sed 's/\\/\\\\/g; s/\//\\\//g; s/&/\\\&/g' | read -r EscVar
echo "Here is your \"text\" to change" | sed "s/text/${EscVar}/g"
If you happen to be generating a random password to pass to a sed replacement pattern, you can be careful about which set of characters goes into the random string. If you choose a password made by encoding a value as base64, then there is only one character that is both possible in base64 and also special in a sed replacement pattern. That character is "/", and it is easily removed from the password you are generating:
# password from 32 random bytes, base64-encoded, minus any copies of the "/" character.
pass=`openssl rand -base64 32 | sed -e 's/\///g'`;
If you just want the variable's value to be expanded inside the sed command, remove the single quotes (single quotes stop the shell from expanding $ENV).
Example:
Change sed -i 's/dev-/dev-$ENV/g' test to sed -i s/dev-/dev-$ENV/g test
I have an improvement over the sedeasy function, which WILL break with special characters like tab.
function sedeasy_improved {
sed -i "s/$(
echo "$1" | sed -e 's/\([[\/.*]\|\]\)/\\&/g' |
sed -e 's:\t:\\t:g'
)/$(
echo "$2" | sed -e 's/[\/&]/\\&/g' |
sed -e 's:\t:\\t:g'
)/g" "$3"
}
So, what's different? $1 and $2 are wrapped in quotes to avoid shell expansions and to preserve tabs or double spaces.
There is an additional pipe, | sed -e 's:\t:\\t:g' (I like : as the delimiter), which transforms a tab into \t.
An easier way to do this is to build the string beforehand and pass it to sed as a parameter (note that this does not escape anything in $REPLACE):
rpstring="s/KEYWORD/$REPLACE/g"
sed -i "$rpstring" test.txt

Bash script to add double quotes in .CSV comma delimited file

I need to add double quotes to the csv file. My sample data is like this..
378478,COMPLETED,Tracfone,,,"2020/03/29 09:39:22",,2787,,356074101197544,89148000005748235454,75176540
378328,COMPLETED,"Total Wireless","Unlimited Talk, Text, & Data (First 25GB High Speed, then unlimited 2GB)",50,"2020/03/29 06:10:01",200890899011202395,0899,0279395,356058102052972,89148000005117597971,67756296
I have tried some code available online with awk and sed, and it produces the output below. The errors: the first digit of the number is being trimmed (e.g. '378478' is displayed as '78478'),
and it also adds double quotes around fields that already have double quotes. Nothing seems to work perfectly. Please guide me!
"78478","COMPLETED","Tracfone","","",""2020/03/29 09:39:22"","","2787","","356074101197544","89148000005748235454","75176540"
"78328","COMPLETED",""Total Wireless"",""Unlimited Talk"," Text"," & Data (First 25GB High Speed"," then unlimited 2GB)"","50",""2020/03/29 06:10:01"","200890899011202395","0899","0279395","356058102052972","89148000005117597971","67756296"
"78329","COMPLETED",""Cricket Wireless"",""Unlimited Talk"," Text"," & 4G LTE Data w/ 15GB Hotspot"","60",""2020/03/29""
This is the code I am using:
awk -F"'?,'?" -v OFS='","' '{$1=$1; gsub(/^.|$/,"\"")} 1' file # or
sed -E 's/([^,]*) , (.*)/"\1" , "\2"/' file
My total code is below. My intention was to first convert all .xlsx files to .csv, then add double quotes to the same csv and save it in the same file. I know the $file.csv part is wrong, hence I need some help.
find "$Src_Dir" -type f -iname "*.xlsx" -print>path/temp
cat path/temp | while IFS="" read -r -d $'\0' file;
do
echo $file
ssconvert "${file}" --export-type=Gnumeric_stf:stf_csv
awk -F"'?,'?" -v OFS='","' '{$1=$1; gsub(/^.|$/,"\"")} 1' $file > $file.csv
done
If you want to handle anything other than the simplest CSV files, you should probably move away from sed and awk. There are much better tools available.
For example, if you sudo apt install csvtool (or equivalent) on your favourite distro, you can use its call-per-line functionality to process each line in the input file. See the following script for an example:
#!/bin/bash
function quotify {
# Start empty line, process every field.
line=""
while [[ $# -ne 0 ]] ; do
# Append comma for all but first field, then quoted field.
[[ -n "${line}" ]] && line="${line},"
line="${line}\"$1\""
shift
done
# Output the fully quoted line.
echo "${line}"
}
# Needed to call functions. Also, ensure link: /bin/sh -> /bin/bash.
export -f quotify
# Pretty-print input and output.
echo "Input file:"
sed 's/^/ /' inputFile.csv
echo "Output file:"
csvtool call quotify inputFile.csv | sed 's/^/ /'
Note the quotify function which is called for each line in the CSV file, with the arguments set to each field within that line (sans quotes, whether the original fields had quotes or not).
It basically constructs a string of all the fields in the line, with quotes around them, then writes that to standard output, as shown below in the output from that script:
Input file:
378478,COMPLETED,Tracfone,,,"2020/03/29 09:39:22",,2787,,356074101197544,89148000005748235454,75176540
378328,COMPLETED,"Total Wireless","Unlimited Talk, Text, & Data (First 25GB High Speed, then unlimited 2GB)",50,"2020/03/29"
Output file:
"378478","COMPLETED","Tracfone","","","2020/03/29 09:39:22","","2787","","356074101197544","89148000005748235454","75176540"
"378328","COMPLETED","Total Wireless","Unlimited Talk, Text, & Data (First 25GB High Speed, then unlimited 2GB)","50","2020/03/29"
Even though using a separate tool is probably the easiest way to go, if you absolutely cannot install other packages, then you're going to have to code up something in a package you already have. The following bash script is a good place to start, as it uses no other tools to achieve its goal.
At the moment, it's tied to a very specific set of rules, as follows:
White space matters. Anything between the commas is considered part of the field. This especially matters when detecting a quoted field, it must have the quote as the first character, no abc, "d,e,f",ghi stuff since the "d,e,f" won't be handled correctly.
Quoted fields are allowed to contain commas, and "" sequences within them are turned into ".
It's probably not a good idea to supply ill-formatted CSV files :-)
But, with that in mind, here we go. I'll offer a brief textual description of each section but hopefully the comments in the code will be enough to figure out what's going on.
First, a function for finding the position of some string within another string, useful for working out the field bounds:
function findPos {
haystack="$1"
needle="$2"
# Remove everything past the needle.
prefix="${haystack%%${needle}*}"
# If nothing was removed, it wasn't found, so supply massive number.
# Otherwise, it was found at the length of the string with removed stuff.
position=999999
[[ ${#prefix} -ne ${#haystack} ]] && position=${#prefix}
echo ${position}
}
Then we can use that in the function that works out the length of the next field. This basically just looks for the next comma for unquoted fields, and does special handling for quoted fields by building up the field from segments (it has to handle quotes within quotes and commas):
function getNextFieldLen {
line="$1"
# Empty line means all work done.
[[ -z "${line}" ]] && echo -1 && return
# Handle unquoted first, this is easy.
[[ "${line:0:1}" != '"' ]] && { echo $(findPos "${line}" ","); return; }
# Now handle quoted. Loop over all segments where a segment is defined as
# the text up to the next <"">, assuming it's before the next <",>.
field=""
nextQuoteComma=$(findPos "${line}" '",')
nextDoubleQuote=$(findPos "${line}" '""')
while [[ ${nextDoubleQuote} -lt ${nextQuoteComma} ]]; do
# Append segment to the field and go back for next segment.
field="${field}${line:0:${nextDoubleQuote}}\"\""
line="${line:${nextDoubleQuote}}"
line="${line:2}"
nextQuoteComma=$(findPos "${line}" '",')
nextDoubleQuote=$(findPos "${line}" '""')
done
# Add final segment (up to the comma) and output entire field.
field="${field}${line:0:${nextQuoteComma}}\""
echo "${#field}"
}
Finally, there's the top-level function which will quotify whatever comes in via standard input:
function quotifyStdIn {
# Process file line by line.
while read -r line; do
# Start with empty output line and non-comma separator.
outLine="" ; sep=""
# Place terminator to make processing easier, start field loop.
line="${line},"
fieldLen=$(getNextFieldLen "${line}")
while [[ ${fieldLen} -ge 0 ]]; do
# Get field and quotify if needed, adjust line (remove field and comma).
field="${line:0:${fieldLen}}"
[[ "${field:0:1}" = '"' ]] || field="\"${field}\""
line="${line:$((fieldLen+1))}"
#line="${line:${fieldLen}}"
#line="${line:1}"
# Append to output line and prepare for next field.
outLine="${outLine}${sep}${field}"; sep=","
fieldLen=$(getNextFieldLen "${line}")
done
# Output built line.
echo "${outLine}"
done
}
And, on the off-chance you want to read directly from a file (though providing a file name that's empty or "-" will use standard input so you can probably just use the file-based function for everything):
function quotifyFile {
file="$1"
# Empty file or "-" means standard input, otherwise take input from real file.
[[ ${#file} -eq 0 ]] && { quotifyStdIn; return; }
[[ "${file}" = "-" ]] && { quotifyStdIn; return; }
quotifyStdIn < "${file}"
}
And, finally, because every program that's not a "Hello, world" one deserves some form of test harness, this is what you can use to test the various capabilities:
(
echo 'paxdiablo,was here'
echo 'and,"then, strangely,",he,was,not'
echo '50,"My name is ""Pax"", and yours is ""Bob""",42'
echo '17,"""Love"" is grand",19'
) > harness.csv
echo "Before:"
sed "s/^/ /" harness.csv
echo "After:"
quotifyFile harness.csv | sed "s/^/ /"
rm -rf harness.csv
And, since a test harness is of little use unless you run the tests, here's the results of the first run:
Before:
paxdiablo,was here
and,"then, strangely,",he,was,not
50,"My name is ""Pax"", and yours is ""Bob""",42
17,"""Love"" is grand",19
After:
"paxdiablo","was here"
"and","then, strangely,","he","was","not"
"50","My name is ""Pax"", and yours is ""Bob""","42"
"17","""Love"" is grand","19"
Hopefully, that will be enough to get you going in the absence of being able to install packages. Of course, if one of the packages you can't install is bash itself, then you have problems that I can't help you with :-)
Your starting CSV is not a good CSV: the 2 rows have different number of columns
+--------+-----------+----------------+--------------------------------------------------------------------------+----+---------------------+---+------+---+-----------------+----------------------+----------+
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
+--------+-----------+----------------+--------------------------------------------------------------------------+----+---------------------+---+------+---+-----------------+----------------------+----------+
| 378478 | COMPLETED | Tracfone | - | - | 2020/03/29 09:39:22 | - | 2787 | - | 356074101197544 | 89148000005748235454 | 75176540 |
| 378328 | COMPLETED | Total Wireless | Unlimited Talk, Text, & Data (First 25GB High Speed, then unlimited 2GB) | 50 | 2020/03/29 | - | - | - | - | - | - |
+--------+-----------+----------------+--------------------------------------------------------------------------+----+---------------------+---+------+---+-----------------+----------------------+----------+
Using Miller (https://github.com/johnkerl/miller) you could run
mlr --csv --quote-all -N unsparsify input >output
to have
"378478","COMPLETED","Tracfone","","","2020/03/29 09:39:22","","2787","","356074101197544","89148000005748235454","75176540"
"378328","COMPLETED","Total Wireless","Unlimited Talk, Text, & Data (First 25GB High Speed, then unlimited 2GB)","50","2020/03/29","","","","","",""
You can use it by downloading the executable from https://github.com/johnkerl/miller/releases/tag/v5.7.0

How can extract a value from .ini using sed [duplicate]

I have a parameters.ini file, such as:
[parameters.ini]
database_user = user
database_version = 20110611142248
I want to read in and use the database version specified in the parameters.ini file from within a bash shell script so I can process it.
#!/bin/sh
# Need to get database version from parameters.ini file to use in script
php app/console doctrine:migrations:migrate $DATABASE_VERSION
How would I do this?
How about grepping for that line then using awk
version=$(awk -F "=" '/database_version/ {print $2}' parameters.ini)
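With the sample file from the question, which has spaces around the = sign, $2 keeps the leading space, and the pattern also matches any other line that merely contains database_version. A small variation, sketched under the assumption that keys and values contain no = themselves and that lines are not indented, splits on the = together with the surrounding blanks and compares the key exactly:

```shell
# Recreate the sample file from the question in a temporary location
# so the sketch can be run as-is.
ini_file=$(mktemp)
trap 'rm -f "$ini_file"' EXIT
cat > "$ini_file" <<'EOF'
[parameters.ini]
database_user = user
database_version = 20110611142248
EOF

# Split on "=" plus any surrounding spaces or tabs; compare the key exactly.
version=$(awk -F '[ \t]*=[ \t]*' '$1 == "database_version" {print $2}' "$ini_file")
echo "$version"
```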
You can use bash native parser to interpret ini values, by:
$ source <(grep = file.ini)
Sample file:
[section-a]
var1=value1
var2=value2
IPS=( "1.2.3.4" "1.2.3.5" )
To access the variables, simply print them: echo $var1. You may also use arrays as shown above (echo ${IPS[@]}).
If you only want a single value just grep for it:
source <(grep var1 file.ini)
For the demo, check this recording at asciinema.
It is simple, as you don't need any external library to parse the data, but it comes with some disadvantages. For example:
If you have spaces around = (between the variable name and the value), then you have to trim the spaces first, e.g.
$ source <(grep = file.ini | sed 's/ *= */=/g')
Or if you don't care about the spaces (including in the middle), use:
$ source <(grep = file.ini | tr -d ' ')
To support ; comments, replace them with #:
$ source <(sed "s/;/#/g" foo.ini)
Sections aren't supported (e.g. if you have [section-name], you have to filter it out as shown above, e.g. with grep =); the same goes for other unexpected errors.
If you need to read a specific value under a specific section, use grep -A, sed, awk or ex.
E.g.
source <(grep = <(grep -A5 '\[section-b\]' file.ini))
Note: Where -A5 is the number of rows to read in the section. Replace source with cat to debug.
If you've got any parsing errors, ignore them by adding: 2>/dev/null
See also:
How to parse and convert ini file into bash array variables? at serverfault SE
Are there any tools for modifying INI style files from shell script
Sed one-liner, that takes sections into account. Example file:
[section1]
param1=123
param2=345
param3=678
[section2]
param1=abc
param2=def
param3=ghi
[section3]
param1=000
param2=111
param3=222
Say you want param2 from section2. Run the following:
sed -nr "/^\[section2\]/ { :l /^param2[ ]*=/ { s/[^=]*=[ ]*//; p; q;}; n; b l;}" ./file.ini
will give you
def
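Note that the one-liner keeps reading past the end of the section if the key is missing there, so it could return param2 from a later section. As an alternative, here is a sketch of the same lookup in awk (not sed) that stops considering lines once the next section header is reached. The function name ini_get is made up, and it assumes section headers sit alone on their line with no trailing whitespace:

```shell
# Hypothetical helper: ini_get FILE SECTION KEY
ini_get() {
    awk -v section="$2" -v key="$3" '
        # Any header line switches the "inside wanted section" flag.
        /^\[/ { in_section = ($0 == ("[" section "]")); next }
        in_section {
            eq = index($0, "=")
            if (eq == 0) next
            name = substr($0, 1, eq - 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            if (name == key) {
                value = substr($0, eq + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", value)
                print value
                exit
            }
        }
    ' "$1"
}

# Recreate the example file in a temporary location.
ini_file=$(mktemp)
trap 'rm -f "$ini_file"' EXIT
cat > "$ini_file" <<'EOF'
[section1]
param1=123
param2=345
param3=678
[section2]
param1=abc
param2=def
param3=ghi
[section3]
param1=000
param2=111
param3=222
EOF

ini_get "$ini_file" section2 param2
```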
Bash does not provide a parser for these files. Obviously you can use an awk command or a couple of sed calls, but if you are bash-priest and don't want to use any other shell, then you can try the following obscure code:
#!/usr/bin/env bash
cfg_parser ()
{
ini="$(<$1)" # read the file
ini="${ini//[/\[}" # escape [
ini="${ini//]/\]}" # escape ]
IFS=$'\n' && ini=( ${ini} ) # convert to line-array
ini=( ${ini[*]//;*/} ) # remove comments with ;
ini=( ${ini[*]/\ =/=} ) # remove tabs before =
ini=( ${ini[*]/=\ /=} ) # remove tabs after =
ini=( ${ini[*]/\ =\ /=} ) # remove anything with a space around =
ini=( ${ini[*]/#\\[/\}$'\n'cfg.section.} ) # set section prefix
ini=( ${ini[*]/%\\]/ \(} ) # convert text2function (1)
ini=( ${ini[*]/=/=\( } ) # convert item to array
ini=( ${ini[*]/%/ \)} ) # close array parenthesis
ini=( ${ini[*]/%\\ \)/ \\} ) # the multiline trick
ini=( ${ini[*]/%\( \)/\(\) \{} ) # convert text2function (2)
ini=( ${ini[*]/%\} \)/\}} ) # remove extra parenthesis
ini[0]="" # remove first element
ini[${#ini[*]} + 1]='}' # add the last brace
eval "$(echo "${ini[*]}")" # eval the result
}
cfg_writer ()
{
IFS=' '$'\n'
fun="$(declare -F)"
fun="${fun//declare -f/}"
for f in $fun; do
[ "${f#cfg.section}" == "${f}" ] && continue
item="$(declare -f ${f})"
item="${item##*\{}"
item="${item%\}}"
item="${item//=*;/}"
vars="${item//=*/}"
eval $f
echo "[${f#cfg.section.}]"
for var in $vars; do
echo $var=\"${!var}\"
done
done
}
Usage:
# parse the config file called 'myfile.ini', with the following
# contents::
# [sec2]
# var2='something'
cfg_parser 'myfile.ini'
# enable section called 'sec2' (in the file [sec2]) for reading
cfg.section.sec2
# read the content of the variable called 'var2' (in the file
# var2=XXX). If your var2 is an array, then you can use
# ${var[index]}
echo "$var2"
Bash ini-parser can be found at The Old School DevOps blog site.
Just include your .ini file into bash body:
File example.ini:
DBNAME=test
DBUSER=scott
DBPASSWORD=tiger
File example.sh
#!/bin/bash
#Including .ini file
. example.ini
#Test
echo "${DBNAME} ${DBUSER} ${DBPASSWORD}"
All of the solutions I've seen so far also hit on commented out lines. This one didn't, if the comment code is ;:
awk -F '=' '{if (! ($0 ~ /^;/) && $0 ~ /database_version/) print $2}' file.ini
You may use crudini tool to get ini values, e.g.:
DATABASE_VERSION=$(crudini --get parameters.ini '' database_version)
One more possible solution:
dbver=$(sed -n 's/.*database_version *= *\([^ ]*.*\)/\1/p' < parameters.ini)
echo $dbver
Display the value of my_key in an ini-style my_file:
sed -n -e 's/^\s*my_key\s*=\s*//p' my_file
-n -- do not print anything by default
-e -- execute the expression
s/PATTERN//p -- display anything following this pattern
In the pattern:
^ -- pattern begins at the beginning of the line
\s -- whitespace character
* -- zero or many (whitespace characters)
Example:
$ cat my_file
# Example INI file
something = foo
my_key = bar
not_my_key = baz
my_key_2 = bing
$ sed -n -e 's/^\s*my_key\s*=\s*//p' my_file
bar
So:
Find a pattern where the line begins with zero or many whitespace characters,
followed by the string my_key, followed by zero or many whitespace characters, an equal sign, then zero or many whitespace characters again. Display the rest of the content on that line following that pattern.
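\s is not part of POSIX regular expressions, so some sed implementations may not accept it. A sketch of the same command using the POSIX character class [[:space:]] instead, run against the example file above (recreated in a temporary file here):

```shell
# Recreate the example file so the sketch is runnable.
ini_file=$(mktemp)
trap 'rm -f "$ini_file"' EXIT
cat > "$ini_file" <<'EOF'
# Example INI file
something = foo
my_key = bar
not_my_key = baz
my_key_2 = bing
EOF

# Same substitution as above, with [[:space:]] in place of \s.
value=$(sed -n -e 's/^[[:space:]]*my_key[[:space:]]*=[[:space:]]*//p' "$ini_file")
echo "$value"
```

Because the pattern is anchored at the start of the line and requires only whitespace between my_key and =, the not_my_key and my_key_2 lines are not matched.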
Similar to the other Python answers, you can do this using the -c flag to execute a sequence of Python statements given on the command line:
$ python3 -c "import configparser; c = configparser.ConfigParser(); c.read('parameters.ini'); print(c['parameters.ini']['database_version'])"
20110611142248
This has the advantage of requiring only the Python standard library and the advantage of not writing a separate script file.
Or use a here document for better readability, thusly:
#!/bin/bash
python3 << EOI
import configparser
c = configparser.ConfigParser()
c.read('params.txt')
print(c['chassis']['serialNumber'])
EOI
serialNumber=$(python3 << EOI
import configparser
c = configparser.ConfigParser()
c.read('params.txt')
print(c['chassis']['serialNumber'])
EOI
)
echo $serialNumber
sed
You can use sed to parse the ini configuration file, especially when you've section names like:
# last modified 1 April 2001 by John Doe
[owner]
name=John Doe
organization=Acme Widgets Inc.
[database]
# use IP address in case network name resolution is not working
server=192.0.2.62
port=143
file=payroll.dat
so you can use the following sed script to parse above data:
# Configuration bindings found outside any section are given to
# to the default section.
1 {
x
s/^/default/
x
}
# Lines starting with a #-character are comments.
/^#/n
# Sections are unpacked and stored in the hold space.
/\[/ {
s/\[\(.*\)\]/\1/
x
b
}
# Bindings are unpacked and decorated with the section
# they belong to, before being printed.
/=/ {
s/^[[:space:]]*//
s/[[:space:]]*=[[:space:]]*/|/
G
s/\(.*\)\n\(.*\)/\2|\1/
p
}
this will convert the ini data into this flat format:
owner|name|John Doe
owner|organization|Acme Widgets Inc.
database|server|192.0.2.62
database|port|143
database|file|payroll.dat
so it'll be easier to parse using sed, awk or read by having section names in every line.
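Once the data is in that flat form, a plain read loop is enough to look up a value. A minimal sketch, with the flat output above pasted into a variable; the function name flat_lookup and the variable names are made up for illustration:

```shell
# The flat output shown above, stored in a variable for the demo.
flat_data='owner|name|John Doe
owner|organization|Acme Widgets Inc.
database|server|192.0.2.62
database|port|143
database|file|payroll.dat'

# Hypothetical helper: flat_lookup SECTION KEY, reading flat lines on stdin.
flat_lookup() {
    # With IFS set to "|", the third variable receives the rest of the
    # line, so values may contain spaces (or further "|" characters).
    while IFS='|' read -r f_section f_key f_value; do
        if [ "$f_section" = "$1" ] && [ "$f_key" = "$2" ]; then
            printf '%s\n' "$f_value"
            return 0
        fi
    done
    return 1
}

port=$(printf '%s\n' "$flat_data" | flat_lookup database port)
owner=$(printf '%s\n' "$flat_data" | flat_lookup owner name)
echo "$port"
echo "$owner"
```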
Credits & source: Configuration files for shell scripts, Michael Grünewald
Alternatively, you can use this project: chilladx/config-parser, a configuration parser using sed.
For people (like me) looking to read INI files from shell scripts (read shell, not bash) - I've knocked up a little helper library which tries to do exactly that:
https://github.com/wallyhall/shini (MIT license, do with it as you please. I've linked it above rather than including it inline, as the code is quite lengthy.)
It's somewhat more "complicated" than the simple sed lines suggested above - but works on a very similar basis.
Function reads in a file line-by-line - looking for section markers ([section]) and key/value declarations (key=value).
Ultimately you get a callback to your own function - section, key and value.
Here is my version, which parses sections and populates a global associative array g_iniProperties with it.
Note that this works only with bash v4.2 and higher.
function parseIniFile() { #accepts the name of the file to parse as argument ($1)
#declare syntax below (-gA) only works with bash 4.2 and higher
unset g_iniProperties
declare -gA g_iniProperties
currentSection=""
while read -r line
do
if [[ $line = \[* ]] ; then
currentSection=$(echo $line | sed -e 's/\r//g' | tr -d "[]")
else
if [[ $line = *=* ]] ; then
cleanLine=$(echo $line | sed -e 's/\r//g')
key=$currentSection.$(echo $cleanLine | awk -F: '{ st = index($0,"=");print substr($0,0,st-1)}')
value=$(echo $cleanLine | awk -F: '{ st = index($0,"=");print substr($0,st+1)}')
g_iniProperties[$key]=$value
fi
fi;
done < $1
}
And here is a sample code using the function above:
parseIniFile "/path/to/myFile.ini"
for key in "${!g_iniProperties[@]}"; do
echo "Found key/value $key = ${g_iniProperties[$key]}"
done
Yet another implementation using awk with a little more flexibility.
function parse_ini() {
cat /dev/stdin | awk -v section="$1" -v key="$2" '
BEGIN {
if (length(key) > 0) { params=2 }
else if (length(section) > 0) { params=1 }
else { params=0 }
}
match($0,/#/) { next }
match($0,/^\[(.+)\]$/){
current=substr($0, RSTART+1, RLENGTH-2)
found=current==section
if (params==0) { print current }
}
match($0,/(.+)=(.+)/) {
if (found) {
if (params==2 && key==$1) { print $3 }
if (params==1) { printf "%s=%s\n",$1,$3 }
}
}'
}
You can call it with between 0 and 2 parameters:
cat myfile1.ini myfile2.ini | parse_ini # List section names
cat myfile1.ini myfile2.ini | parse_ini 'my-section' # Prints keys and values from a section
cat myfile1.ini myfile2.ini | parse_ini 'my-section' 'my-key' # Print a single value
complex simplicity
ini file
test.ini
[section1]
name1=value1
name2=value2
[section2]
name1=value_1
name2 = value_2
bash script with read and execute
/bin/parseini
#!/bin/bash
set +a
while read p; do
reSec='^\[(.*)\]$'
#reNV='[ ]*([^ ]*)+[ ]*=(.*)' #Remove only spaces around name
reNV='[ ]*([^ ]*)+[ ]*=[ ]*(.*)' #Remove spaces around name and spaces before value
if [[ $p =~ $reSec ]]; then
section=${BASH_REMATCH[1]}
elif [[ $p =~ $reNV ]]; then
sNm=${section}_${BASH_REMATCH[1]}
sVa=${BASH_REMATCH[2]}
set -a
eval "$(echo "$sNm"=\""$sVa"\")"
set +a
fi
done < $1
then in another script I source the results of the command and can use any variables within
test.sh
#!/bin/bash
source parseini test.ini
echo $section2_name2
finally from command line the output is thus
# ./test.sh
value_2
Some of the answers don't respect comments. Some don't respect sections. Some recognize only one syntax (only ":" or only "="). Some Python answers fail on my machine because of differing capitalization or failing to import the sys module. All are a bit too terse for me.
So I wrote my own, and if you have a modern Python, you can probably call this from your Bash shell. It has the advantage of adhering to some of the common Python coding conventions, and even provides sensible error messages and help. To use it, name it something like myconfig.py (do NOT call it configparser.py or it may try to import itself,) make it executable, and call it like
value=$(myconfig.py something.ini sectionname value)
Here's my code for Python 3.5 on Linux:
#!/usr/bin/env python3
# Last Modified: Thu Aug 3 13:58:50 PDT 2017
"""A program that Bash can call to parse an .ini file"""
import sys
import configparser
import argparse
if __name__ == '__main__':
parser = argparse.ArgumentParser(description="A program that Bash can call to parse an .ini file")
parser.add_argument("inifile", help="name of the .ini file")
parser.add_argument("section", help="name of the section in the .ini file")
parser.add_argument("itemname", help="name of the desired value")
args = parser.parse_args()
config = configparser.ConfigParser()
config.read(args.inifile)
print(config.get(args.section, args.itemname))
I wrote a quick and easy python script to include in my bash script.
For example, your ini file is called food.ini
and in the file you can have some sections and some lines:
[FRUIT]
Oranges = 14
Apples = 6
Copy this small 6-line Python script and save it as read_ini.py (do not name it configparser.py, or the import below would pick up the script itself instead of the standard library module):
#!/usr/bin/env python3
import configparser
import sys
config = configparser.ConfigParser()
config.read(sys.argv[1])
print(config.get(sys.argv[2], sys.argv[3]))
Now, in your bash script you could do this, for example:
OrangeQty=$(python3 read_ini.py food.ini FRUIT Oranges)
or
ApplesQty=$(python3 read_ini.py food.ini FRUIT Apples)
echo $ApplesQty
This presupposes:
you have Python installed
you have the configparser library installed (this should come with a std python installation)
Hope it helps
:¬)
The explanation to the answer for the one-liner sed.
[section1]
param1=123
param2=345
param3=678
[section2]
param1=abc
param2=def
param3=ghi
[section3]
param1=000
param2=111
param3=222
sed -nr "/^\[section2\]/ { :l /^\s*[^#].*/ p; n; /^\[/ q; b l; }" ./file.ini
To understand, it will be easier to format the line like this:
sed -nr "
# start processing when we found the word \"section2\"
/^\[section2\]/ { #the set of commands inside { } will be executed
#create a label \"l\" (https://www.grymoire.com/Unix/Sed.html#uh-58)
:l /^\s*[^#].*/ p;
# move on to the next line. For the first run it is the \"param1=abc\"
n;
# check if this line is beginning of new section. If yes - then exit.
/^\[/ q
#otherwise jump to the label \"l\"
b l
}
" file.ini
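The same loop can be wrapped in a small shell function so that the section name is not hard-coded. This is only a sketch: the function name ini_section is hypothetical, and the label is passed as its own -e expression so that it should not depend on how a particular sed parses labels on a single line. Like the one-liner above, it also prints the [section] header line itself.

```bash
#!/bin/bash
# ini_section SECTION FILE   (hypothetical helper name)
# Prints the section header plus every non-empty line of that section
# that does not start with "#", then stops at the next "[...]" header.
ini_section() {
    local section=$1 file=$2
    sed -n \
        -e "/^\[$section\]/ {" \
        -e ':l' \
        -e '/^[[:space:]]*[^#]/ p' \
        -e 'n' \
        -e '/^\[/ q' \
        -e 'b l' \
        -e '}' "$file"
}
# Usage: ini_section section2 ./file.ini
```

The section name is interpolated into a regular expression, so names containing regex metacharacters would need escaping first.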
This script takes the following parameters:
pars_ini.ksh < path to ini file > < name of Sector in Ini file > < the name in name=value to return >
For example, if your ini file contains:
[ environment ]
a=x
[ DataBase_Sector ]
DSN = something
then calling:
pars_ini.ksh /users/bubu_user/parameters.ini DataBase_Sector DSN
will retrieve the value "something".
The script "pars_ini.ksh":
#!/bin/ksh
#INI_FILE=path/to/file.ini
#INI_SECTION=TheSection
# BEGIN parse-ini-file.sh
# SET UP THE MINIMUM VARS FIRST
alias sed=/usr/local/bin/sed
INI_FILE=$1
INI_SECTION=$2
INI_NAME=$3
INI_VALUE=""
eval `sed -e 's/[[:space:]]*\=[[:space:]]*/=/g' \
-e 's/;.*$//' \
-e 's/[[:space:]]*$//' \
-e 's/^[[:space:]]*//' \
-e "s/^\(.*\)=\([^\"']*\)$/\1=\"\2\"/" \
< $INI_FILE \
| sed -n -e "/^\[$INI_SECTION\]/,/^\s*\[/{/^[^;].*\=.*/p;}"`
TEMP_VALUE=`echo "$"$INI_NAME`
echo `eval echo $TEMP_VALUE`
This implementation uses awk and has the following advantages:
Will only return the first matching entry
Ignores lines that start with a ;
Trims leading and trailing whitespace, but not internal whitespace
Formatted version:
awk -F '=' '/^\s*database_version\s*=/ {
sub(/^ +/, "", $2);
sub(/ +$/, "", $2);
print $2;
exit;
}' parameters.ini
One-liner:
awk -F '=' '/^\s*database_version\s*=/ { sub(/^ +/, "", $2); sub(/ +$/, "", $2); print $2; exit; }' parameters.ini
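To look up an arbitrary key rather than a hard-coded one, the key can be passed in with awk -v. This is a minimal sketch, assuming a hypothetical function name ini_first_value. Unlike the version above, it takes everything after the first = as the value, so values that themselves contain = are kept whole. The key is used as a regular expression, so keys with regex metacharacters would need escaping.

```bash
#!/bin/bash
# ini_first_value KEY FILE   (hypothetical helper name)
ini_first_value() {
    local key=$1 file=$2
    awk -v key="$key" '
        # Match "key = value" lines, allowing blanks around the key.
        $0 ~ "^[ \t]*" key "[ \t]*=" {
            value = substr($0, index($0, "=") + 1)   # text after the first "="
            gsub(/^[ \t]+|[ \t]+$/, "", value)       # trim both ends
            print value
            exit                                     # first match only
        }' "$file"
}
# Usage: ini_first_value database_version parameters.ini
```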
You can use the CSV parser xsv to parse INI data.
cargo install xsv
$ cat /etc/*release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
$ xsv select -d "=" - <<< "$( cat /etc/*release )" | xsv search --no-headers --select 1 "DISTRIB_CODENAME" | xsv select 2
xenial
or from a file.
$ xsv select -d "=" - file.ini | xsv search --no-headers --select 1 "DISTRIB_CODENAME" | xsv select 2
My version of the one-liner
#!/bin/bash
#Reader for MS Windows 3.1 Ini-files
#Usage: inireader.sh
# e.g.: inireader.sh win.ini ERRORS DISABLE
# would return value "no" from the section of win.ini
#[ERRORS]
#DISABLE=no
INIFILE=$1
SECTION=$2
ITEM=$3
sed -n "/^\[$SECTION\]/,/^\[.*\]/p" "$INIFILE" | grep "^[[:space:]]*$ITEM[[:space:]]*=" | sed 's/^[^=]*=[[:space:]]*//'
I just finished writing my own parser. I tried various parsers found here, but none seemed to work with both ksh93 (AIX) and bash (Linux).
It uses an old programming style: parsing line by line. It is fairly fast because it uses few external commands, though it is slowed down somewhat by all the eval calls required for the dynamic array name.
The ini format supports 3 special syntaxes:
includefile=ini file -->
Load an additional ini file. Useful for splitting an ini into multiple files, or for re-using a piece of configuration
includedir=directory -->
Same as includefile, but includes a complete directory
includesection=section -->
Copy an existing section to the current section.
I used all of these syntaxes to build fairly complex, re-usable ini files. They are useful for installing products when setting up a new OS, which we do a lot.
Values can be accessed with ${ini[$section.$item]}. The array MUST be defined before calling this.
Have fun. Hope it's useful for someone else!
function Show_Debug {
[[ $DEBUG = YES ]] && echo "DEBUG $@"
}
function Fatal {
echo "$@. Script aborted"
exit 2
}
#-------------------------------------------------------------------------------
# This function load an ini file in the array "ini"
# The "ini" array must be defined in the calling program (typeset -A ini)
#
# It could be any array name, the default array name is "ini".
#
# There is heavy usage of "eval" since ksh and bash do not support
# reference variable. The name of the ini is passed as variable, and must
# be "eval" at run-time to work. Very specific syntax was used and must be
# understood before making any modifications.
#
# It greatly complicates the program, but adds flexibility.
#-------------------------------------------------------------------------------
function Load_Ini {
Show_Debug "$0($@)"
typeset ini_file="$1"
# Name of the array to fill. By default, it's "ini"
typeset ini_array_name="${2:-ini}"
typeset section variable value line my_section file subsection value_array include_directory all_index index sections pre_parse
typeset LF="
"
if [[ ! -s $ini_file ]]; then
Fatal "The ini file is empty or absent in $0 [$ini_file]"
fi
include_directory=$(dirname $ini_file)
include_directory=${include_directory:-$(pwd)}
Show_Debug "include_directory=$include_directory"
section=""
# Since this code support both bash and ksh93, you cannot use
# the syntax "echo xyz|while read line". bash doesn't work like
# that.
# It forces the use of "<<<", introduced in bash and ksh93.
Show_Debug "Reading file $ini_file and putting the results in array $ini_array_name"
pre_parse="$(sed 's/^ *//g;s/#.*//g;s/ *$//g' <$ini_file | egrep -v '^$')"
while read line; do
if [[ ${line:0:1} = "[" ]]; then # Is the line starting with "["?
# Replace [section_name] to section_name by removing the first and last character
section="${line:1}"
section="${section%\]}"
eval "sections=\${$ini_array_name[sections_list]}"
sections="$sections${sections:+ }$section"
eval "$ini_array_name[sections_list]=\"$sections\""
Show_Debug "$ini_array_name[sections_list]=\"$sections\""
eval "$ini_array_name[$section.exist]=YES"
Show_Debug "$ini_array_name[$section.exist]='YES'"
else
variable=${line%%=*} # content before the =
value=${line#*=} # content after the =
if [[ $variable = includefile ]]; then
# Include a single file
Load_Ini "$include_directory/$value" "$ini_array_name"
continue
elif [[ $variable = includedir ]]; then
# Include a directory
# If the value doesn't start with a /, add the calculated include_directory
if [[ $value != /* ]]; then
value="$include_directory/$value"
fi
# go thru each file
for file in $(ls $value/*.ini 2>/dev/null); do
if [[ $file != *.ini ]]; then continue; fi
# Load a single file
Load_Ini "$file" "$ini_array_name"
done
continue
elif [[ $variable = includesection ]]; then
# Copy an existing section into the current section
eval "all_index=\"\${!$ini_array_name[@]}\""
# It's not necessarily fast. Need to go thru all the array
for index in $all_index; do
# Only if it is the requested section
if [[ $index = $value.* ]]; then
# Evaluate the subsection [section.subsection] --> subsection
subsection=${index#*.}
# Get the current value (source section)
eval "value_array=\"\${$ini_array_name[$index]}\""
# Assign the value to the current section
# The $value_array must be resolved on the second pass of the eval, so make sure the
# first pass doesn't resolve it (\$value_array instead of $value_array).
# It must be evaluated on the second pass in case there is special character like $1,
# or ' or " in it (code).
eval "$ini_array_name[$section.$subsection]=\"\$value_array\""
Show_Debug "$ini_array_name[$section.$subsection]=\"$value_array\""
fi
done
fi
# Add the value to the array
eval "current_value=\"\${$ini_array_name[$section.$variable]}\""
# If there's already something for this field, add it with the current
# content separated by a LF (line_feed)
new_value="$current_value${current_value:+$LF}$value"
# Assign the content
# The $new_value must be resolved on the second pass of the eval, so make sure the
# first pass doesn't resolve it (\$new_value instead of $new_value).
# It must be evaluated on the second pass in case there is special character like $1,
# or ' or " in it (code).
eval "$ini_array_name[$section.$variable]=\"\$new_value\""
Show_Debug "$ini_array_name[$section.$variable]=\"$new_value\""
fi
done <<< "$pre_parse"
Show_Debug "exit $0($@)\n"
}
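The two-pass eval trick described in the comments above can be seen in isolation in the sketch below. The function names set_in_array and get_from_array are hypothetical and not part of Load_Ini. The point is only that \$value is escaped, so it is expanded when eval runs the assignment rather than when the command string is built; this keeps quotes, $ and backticks in the value from being interpreted as code. It assumes a shell with associative arrays (bash 4 or later here) and a trusted array name.

```bash
#!/bin/bash
# Hypothetical helpers showing assignment through a variable array name.
set_in_array() {
    typeset array_name=$1 key=$2 value=$3
    # After the first expansion the string is:  ini[$key]="$value"
    eval "$array_name[\$key]=\"\$value\""
}
get_from_array() {
    typeset array_name=$1 key=$2
    # After the first expansion the string is:  printf '%s\n' "${ini[$key]}"
    eval "printf '%s\n' \"\${$array_name[\$key]}\""
}

typeset -A ini
set_in_array ini "db.pass" 'a"b $HOME `id`'
get_from_array ini "db.pass"    # prints the value unchanged: a"b $HOME `id`
```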
When I store a password in base64, I use ":" as the separator because the base64 string may contain "=". For example (I use ksh):
> echo "Abc123" | base64
QWJjMTIzCg==
In parameters.ini put the line pass:QWJjMTIzCg==, and finally:
> PASS=`awk -F":" '/pass/ {print $2 }' parameters.ini | base64 --decode`
> echo "$PASS"
Abc123
If the line has spaces like "pass : QWJjMTIzCg== " add | tr -d ' ' to trim them:
> PASS=`awk -F":" '/pass/ {print $2 }' parameters.ini | tr -d ' ' | base64 --decode`
> echo "[$PASS]"
[Abc123]
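An alternative that keeps = as the separator is to split each line only at the first =, so that padding characters in the value survive. A minimal sketch using shell parameter expansion; the sample line is made up for illustration, and base64 --decode is the same call used above.

```bash
#!/bin/bash
line='pass=QWJjMTIzCg=='    # illustrative line, as it might appear in parameters.ini
key=${line%%=*}             # everything before the first "="
value=${line#*=}            # everything after the first "="
printf '%s\n' "$key"        # pass
printf '%s\n' "$value"      # QWJjMTIzCg==
printf '%s' "$value" | base64 --decode    # Abc123
```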
This uses the system perl and clean regular expressions:
cat parameters.ini | perl -0777ne 'print "$1" if /\[\s*parameters\.ini\s*\][\s\S]*?\sdatabase_version\s*=\s*(.*)/'
The answer by "Karen Gabrielyan" was the best among the answers here, but some environments, such as a typical busybox, do not have awk, so I changed that answer to the code below.
trim()
{
    local trimmed="$1"
    # Strip leading whitespace.
    trimmed="${trimmed#"${trimmed%%[![:space:]]*}"}"
    # Strip trailing whitespace.
    trimmed="${trimmed%"${trimmed##*[![:space:]]}"}"
    echo "$trimmed"
}
function parseIniFile() { #accepts the name of the file to parse as argument ($1)
    #declare syntax below (-gA) only works with bash 4.2 and higher
    unset g_iniProperties
    declare -gA g_iniProperties
    currentSection=""
    while read -r line
    do
        if [[ $line = \[* ]] ; then
            # section header: drop carriage returns and the brackets
            currentSection=$(echo "$line" | sed -e 's/\r//g' | tr -d "[]")
        elif [[ $line = *=* ]] ; then
            cleanLine=$(echo "$line" | sed -e 's/\r//g')
            key=$(trim "$currentSection.$(echo "$cleanLine" | cut -d'=' -f1)")
            # -f2- keeps values that themselves contain "="
            value=$(trim "$(echo "$cleanLine" | cut -d'=' -f2-)")
            g_iniProperties[$key]=$value
        fi
    done < "$1"
}
If Python is available, the following will read all the sections, keys and values and save them in variables with their names following the format "[section]_[key]". Python can read .ini files properly, so we make use of it.
#!/bin/bash
eval $(python3 << EOP
from configparser import ConfigParser
config = ConfigParser()
config.read("config.ini")
for section in config.sections():
    for (key, val) in config.items(section):
        print(section + "_" + key + "=\"" + val + "\"")
EOP
)
echo "Environment_type: ${Environment_type}"
echo "Environment_name: ${Environment_name}"
config.ini
[Environment]
type = DEV
name = D01
If using sections, this will do the job :
Example raw output :
$ ./settings
[section]
SETTING_ONE=this is setting one
SETTING_TWO=This is the second setting
ANOTHER_SETTING=This is another setting
Regexp parsing :
$ ./settings | sed -n -E "/^\[.*\]/{s/\[(.*)\]/\1/;h;n;};/^[a-zA-Z]/{s/#.*//;G;s/([^ ]*) *= *(.*)\n(.*)/\3_\1='\2'/;p;}"
section_SETTING_ONE='this is setting one'
section_SETTING_TWO='This is the second setting'
section_ANOTHER_SETTING='This is another setting'
Now all together :
$ eval "$(./settings | sed -n -E "/^\[.*\]/{s/\[(.*)\]/\1/;h;n;};/^[a-zA-Z]/{s/#.*//;G;s/([^ ]*) *= *(.*)\n(.*)/\3_\1='\2'/;p;}")"
$ echo $section_SETTING_TWO
This is the second setting
I have a nice one-liner (assuming you have php and jq installed):
cat file.ini | php -r "echo json_encode(parse_ini_string(file_get_contents('php://stdin'), true, INI_SCANNER_RAW));" | jq '.section.key'
This thread does not have enough solutions to choose from, so here is mine. It does not require tools like sed or awk:
grep '^\[section\]' -A 999 config.ini | tail -n +2 | grep -B 999 '^\[' | head -n -1 | grep '^key' | cut -d '=' -f 2
If you expect sections with more than 999 lines, feel free to adapt the example above. Note that you may want to trim the resulting value to remove spaces or a comment string after the value. Remove the ^ if you need to match keys that do not start at the beginning of the line, as in the example in the question. Better still, match explicitly for spaces and tabs in that case.
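Trimming the value can be done without external tools. The sketch below uses a hypothetical function name trim_value and assumes that # starts an inline comment and never occurs inside a value; adjust the comment character if your files use ; instead.

```bash
#!/bin/bash
# trim_value STRING   (hypothetical helper name)
trim_value() {
    local v=$1
    v=${v%%#*}                        # drop an inline "# comment", if any
    v=${v#"${v%%[![:space:]]*}"}      # strip leading whitespace
    v=${v%"${v##*[![:space:]]}"}      # strip trailing whitespace
    printf '%s\n' "$v"
}
trim_value '  hello world  # a comment'    # hello world
```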
If you have multiple values in a given section you want to read, but want to avoid reading the file multiple times:
CONFIG_SECTION=$(grep '^\[section\]' -A 999 config.ini | tail -n +2 | grep -B 999 '^\[' | head -n -1)
KEY1=$(echo "${CONFIG_SECTION}" | grep '^key1' | cut -d '=' -f 2)
echo "KEY1=${KEY1}"
KEY2=$(echo "${CONFIG_SECTION}" | grep '^key2' | cut -d '=' -f 2)
echo "KEY2=${KEY2}"

Different MD5 outputs in shell and script

This is driving me crazy. I'm trying to do some MD5 calculation based on the Fritzbox spec for logging in. Basically you have to convert the challenge and the password into UTF-16LE, hash that with MD5, and then concatenate: challenge-md5(utf-16le(challenge-password))
To do so I'm using iconv and md5 from Mac OS X in a script:
echo -n "challenge-password1234" | iconv -f ISO8859-1 -t UTF-16LE | md5
Which outputs to 2f42ad272c7aec4c94f0d9525080e6de
Doing the exact thing by just pasting it in the shell outputs to 1722e126192656712a1d352e550f1317
The latter one is correct (accepted by fritzbox) the first one is wrong.
Calling the script with bash script.sh results in the proper hash, calling it with sh script.sh results in the wrong hash, which leads to the new question: How come the output is any different between sh and bash?
Different versions of echo behave in very different ways. Some take command options (like -n) that modify their behavior (including -n suppressing the trailing linefeed), and some don't. Some interpret escape sequences in the string itself (including \c at the end of the string suppressing the trailing linefeed)... and some don't. Some do both. It appears the version of echo (/bin/echo) on your system doesn't take options, and therefore treats -n as a string to be printed. If you're using bash, its builtin version overrides /bin/echo, and does interpret flags.
Basically, echo is a mess of inconsistency and portability traps. So don't use it, use printf instead. It's a little more complicated because you have to specify a format string, then the actual stuff you want printed, but it can save a ton of headaches.
$ printf "%s" "challenge-password1234" | iconv -f ISO8859-1 -t UTF-16LE | md5
1722e126192656712a1d352e550f1317
And by the way, here's what the echo command was actually printing:
$ printf "%s\n" "-n challenge-password1234" | iconv -f ISO8859-1 -t UTF-16LE | md5
2f42ad272c7aec4c94f0d9525080e6de
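The printf-based pipeline can be wrapped in a function that behaves the same under sh and bash. This is a sketch with a hypothetical function name, fritz_md5; it assumes either md5sum (GNU coreutils) or md5 (macOS) is installed and uses whichever is found.

```bash
#!/bin/sh
# fritz_md5 STRING   (hypothetical helper name)
# Hashes the UTF-16LE encoding of STRING without a trailing newline.
fritz_md5() {
    printf '%s' "$1" |
        iconv -f ISO8859-1 -t UTF-16LE |
        if command -v md5sum >/dev/null 2>&1; then
            md5sum | cut -d ' ' -f 1    # md5sum prints "hash  -"; keep the hash
        else
            md5                         # macOS md5 prints only the hash for stdin
        fi
}
fritz_md5 'challenge-password1234'    # 1722e126192656712a1d352e550f1317, per the output above
```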
You are specifically asking about the difference between the script and command-line results. Please note that there can be other cases in which the script result will not be usable for your Fritzbox.
The sample code for the session handling in the AVM Fritzbox documentation is written in C#; see
https://avm.de/fileadmin/user_upload/Global/Service/Schnittstellen/AVM_Technical_Note_-_Session_ID.pdf
From that code I derived https://github.com/WolfgangFahl/fritz-csharp-api
and added some tests:
https://github.com/WolfgangFahl/fritz-csharp-api/blob/master/fritzsimpletest.cs
which were inspired by the tests in:
https://github.com/WolfgangFahl/fritzbox-java-api/blob/master/src/test/java/com/github/kaklakariada/fritzbox/Md5ServiceTest.java
Basically there were 5 examples:
"" -> "d41d8cd98f00b204e9800998ecf8427e"
"secret" -> "09433e1853385270b51511571e35eeca"
"test" -> "c8059e2ec7419f590e79d7f1b774bfe6"
"1234567z-äbc" -> "9e224a41eeefa284df7bb0f26c2913e2"
"!\"§$%&/()=?ßüäöÜÄÖ-.,;:_`´+*#'<>≤|" -> "ad44a7cb10a95cb0c4d7ae90b0ff118a"
and yours is now example number 6:
"challenge-password1234" -> "1722e126192656712a1d352e550f1317"
and these behave the same in the Java and C# implementations. I then tried them with the bash script below, whose getmd5 function uses
echo -n "$l_s" | iconv --from-code ISO8859-1 --to-code UTF-16LE | md5sum -b | gawk '{print substr($0,1,32)}'
This approach is discussed e.g. in https://www.ip-phone-forum.de/threads/fritzbox-challenge-response-in-sh.264639/
It gives different results for the umlaut cases, e.g. in my bash on Mac OS Sierra.
Some debug output has already been added.
For 1234567z-äbc, the script encodes the byte sequence 2d c3 a4 62 63, while e.g. the Java implementation has 2d e4 62 63.
So beware of umlauts in your password: the Fritzbox access might fail using this script solution. I am looking for a workaround and will post it here when I find it.
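The differing byte sequences can be reproduced directly. The sketch below assumes the password reaches the script as UTF-8 (typical for a modern terminal). In that case, telling iconv that the input is ISO8859-1 converts each of the two UTF-8 bytes of ä separately, while declaring the input as UTF-8 yields the single code unit e4 00. The helper name utf16le_hex is hypothetical, and whether this change alone makes the Fritzbox accept the response is not verified here.

```bash
#!/bin/sh
# "ä" is written as its two UTF-8 bytes (octal 303 244 = hex c3 a4),
# so the example does not depend on the encoding of this file.
utf16le_hex() {    # utf16le_hex SOURCE_ENCODING   (hypothetical helper)
    printf '\303\244' | iconv -f "$1" -t UTF-16LE | od -An -tx1 | tr -d ' \n'
    echo
}
utf16le_hex ISO8859-1    # c300a400  (two characters, as in the script)
utf16le_hex UTF-8        # e400      (one character, as in the Java output)
```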
bash script
#!/bin/bash
# WF 2017-10-30
# Fritzbox handling
#
# get the property with the given name
# params
# 1: the property name e.g. fritzbox.url, fritzbox.username, fritzbox.password
#
getprop() {
local l_prop="$1"
cat $HOME/.fritzbox/application.properties | grep "$l_prop" | cut -f2 -d=
}
#
# get a value from the fritzbox login_sid.lua
#
getboxval() {
local l_node="$1"
local l_response="$2"
local l_data=""
if [ "$l_response" != "" ]
then
l_data="&response=$l_response"
fi
fxml=/tmp/fxml$$
# append the response as a proper query parameter (empty on the first call)
curl --insecure -s "${box_url}/login_sid.lua?username=${username}${l_data}" > $fxml
cat $fxml |
gawk -v node=$l_node 'match($0,"<"node">([0-9a-f]+)</"node">",m) { print m[1] }'
cat $fxml
rm $fxml
}
#
# get the md5 for the given string
#
# see https://avm.de/fileadmin/user_upload/Global/Service/Schnittstellen/AVM_Technical_Note_-_Session_ID.pdf
#
# param
# 1: s - the string
#
# return
# md5
#
getmd5() {
local l_s="$1"
echo -n "$l_s" | iconv -f ISO8859-1 -t UTF-16LE | od -x
echo -n "$l_s" | iconv --from-code ISO8859-1 --to-code UTF-16LE | md5sum -b | gawk '{print substr($0,1,32)}'
}
# get global settings from application properties
box_url=$(getprop fritzbox.url)
username=$(getprop fritzbox.username)
password=$(getprop fritzbox.password)
# uncomment to test
getmd5 ""
# should be d41d8cd98f00b204e9800998ecf8427e
getmd5 secret
# should be 09433e1853385270b51511571e35eeca
getmd5 test
# should be c8059e2ec7419f590e79d7f1b774bfe6
getmd5 1234567z-äbc
# should be 9e224a41eeefa284df7bb0f26c2913e2
getmd5 "!\"§$%&/()=?ßüäöÜÄÖ-.,;:_\`´+*#'<>≤|"
# should be ad44a7cb10a95cb0c4d7ae90b0ff118a
exit
# Login and get SID
challenge=$(getboxval Challenge "")
echo "challenge=$challenge"
md5=$(getmd5 "${challenge}-${password}")
echo "md5=$md5"
response="${challenge}-${md5}"
echo "response=$response"
getboxval SID "$response"

Resources