Sanitize a string for json [duplicate] - bash

I'm using git, then posting the commit message and other bits as a JSON payload to a server.
Currently I have:
MSG=`git log -n 1 --format=oneline | grep -o ' .\+'`
which sets MSG to something like:
Calendar can't go back past today
then
curl -i -X POST \
-H 'Accept: application/text' \
-H 'Content-type: application/json' \
-d "{'payload': {'message': '$MSG'}}" \
'https://example.com'
My real JSON has another couple of fields.
This works fine, but of course when I have a commit message such as the one above with an apostrophe in it, the JSON is invalid.
How can I escape the characters required in bash? I'm not familiar with the language, so am not sure where to start. Replacing ' with \' would do the job at minimum I suspect.

jq can do this.
Lightweight, free, and written in C, jq enjoys widespread community support with over 15k stars on GitHub. I personally find it very speedy and useful in my daily workflow.
Convert string to JSON
echo -n '猫に小判' | jq -Rsa .
# "\u732b\u306b\u5c0f\u5224"
To explain,
-R means "raw input"
-s means "slurp": read the entire input (linebreaks included) into a single string
-a means "ascii output" (optional)
. means "output the root of the JSON document"
Git + Grep Use Case
To fix the code example given by the OP, simply pipe through jq.
MSG=`git log -n 1 --format=oneline | grep -o ' .\+' | jq -Rsa .`
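Note that jq's output already includes the surrounding double quotes, so the escaped $MSG can be dropped straight into the payload. A minimal sketch of the resulting request, reusing the placeholder endpoint from the question (be aware that -s also slurps the trailing newline into the string):
curl -i -X POST \
-H 'Accept: application/text' \
-H 'Content-type: application/json' \
-d '{"payload": {"message": '"$MSG"'}}' \
'https://example.com'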

Using Python:
This solution is not pure bash, but it's non-invasive and handles unicode.
json_escape () {
printf '%s' "$1" | python -c 'import json,sys; print(json.dumps(sys.stdin.read()))'
}
Note that the json module has been part of the Python standard library for a long time, so this is a pretty minimal Python dependency.
Or using PHP:
json_escape () {
printf '%s' "$1" | php -r 'echo json_encode(file_get_contents("php://stdin"));'
}
Use like so:
$ json_escape "ヤホー"
"\u30e4\u30db\u30fc"

Instead of worrying about how to properly quote the data, just save it to a file and use the @ construct that curl allows with the --data option. To ensure that the output of git is correctly escaped for use as a JSON value, use a tool like jq to generate the JSON, instead of creating it manually.
jq -n --arg msg "$(git log -n 1 --format=oneline | grep -o ' .\+')" \
'{payload: { message: $msg }}' > git-tmp.txt
curl -i -X POST \
-H 'Accept: application/text' \
-H 'Content-type: application/json' \
-d @git-tmp.txt \
'https://example.com'
You can also read directly from standard input using -d @-; I leave that as an exercise for the reader to construct the pipeline that reads from git and produces the correct payload message to upload with curl.
(Hint: it's jq ... | curl ... -d @- 'https://example.com' )
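Spelled out, one such pipeline might look like this (a sketch, again against the placeholder endpoint; jq -Rs reads the whole grep output as a single raw string):
git log -n 1 --format=oneline | grep -o ' .\+' |
jq -Rs '{payload: {message: .}}' |
curl -i -X POST \
-H 'Content-type: application/json' \
-d @- 'https://example.com'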

I was also trying to escape characters in Bash, for transfer using JSON, when I came across this. I found that there is actually a larger list of characters that must be escaped – particularly if you are trying to handle free form text.
There are two tips I found useful:
Use the Bash ${string//substring/replacement} syntax described in this thread.
Use the actual control characters for tab, newline, carriage return, etc. In vim you can enter these by typing Ctrl+V followed by the actual control code (Ctrl+I for tab for example).
The resultant Bash replacements I came up with are as follows:
JSON_TOPIC_RAW=${JSON_TOPIC_RAW//\\/\\\\} # \
JSON_TOPIC_RAW=${JSON_TOPIC_RAW//\//\\\/} # /
JSON_TOPIC_RAW=${JSON_TOPIC_RAW//\'/\\\'} # ' (not strictly needed ?)
JSON_TOPIC_RAW=${JSON_TOPIC_RAW//\"/\\\"} # "
JSON_TOPIC_RAW=${JSON_TOPIC_RAW// /\\t} # \t (tab)
JSON_TOPIC_RAW=${JSON_TOPIC_RAW//
/\\\n} # \n (newline)
JSON_TOPIC_RAW=${JSON_TOPIC_RAW//^M/\\\r} # \r (carriage return)
JSON_TOPIC_RAW=${JSON_TOPIC_RAW//^L/\\\f} # \f (form feed)
JSON_TOPIC_RAW=${JSON_TOPIC_RAW//^H/\\\b} # \b (backspace)
I have not at this stage worked out how to escape Unicode characters correctly which is also (apparently) required. I will update my answer if I work this out.

OK, found out what to do. Bash supports this natively as expected, though as always, the syntax isn't really very guessable!
Essentially ${string//substring/replacement} returns what you'd imagine, so you can use
MSG=${MSG//\'/\\\'}
To do this. The next problem is that the first regex doesn't work anymore, but that can be replaced with
git log -n 1 --pretty=format:'%s'
In the end, I didn't even need to escape them. Instead, I just swapped all the ' in the JSON to \". Well, you learn something every day.
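For the record, the working request with the quotes swapped looks like this; it is still fragile if $MSG itself contains a double quote or backslash, which is what the other answers address:
curl -i -X POST \
-H 'Accept: application/text' \
-H 'Content-type: application/json' \
-d "{\"payload\": {\"message\": \"$MSG\"}}" \
'https://example.com'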

git log -n 1 --format=oneline | grep -o ' .\+' | jq --slurp --raw-input
The above line works for me. Refer to
https://github.com/stedolan/jq for more jq tools

I found something like this:
MSG=`echo $MSG | sed "s/'/\\\\\'/g"`

The simplest way is using jshon, a command line tool to parse, read and create JSON.
jshon -s 'Your data goes here.' 2>/dev/null

[...] with an apostrophe in it, the JSON is invalid.
Not according to https://www.json.org. A single quote is allowed in a JSON string.
How can I escape the characters required in bash?
You can use xidel to properly prepare the JSON you want to POST.
As https://example.com can't be tested, I'll be using https://api.github.com/markdown (see this answer) as an example.
Let's assume 'çömmít' "mêssågè" as the exotic output of git log -n 1 --pretty=format:'%s'.
Create the (serialized) JSON object with the value of the "text"-attribute properly escaped:
$ git log -n 1 --pretty=format:'%s' | \
xidel -se 'serialize({"text":$raw},{"method":"json","encoding":"us-ascii"})'
{"text":"'\u00E7\u00F6mm\u00EDt' \"m\u00EAss\u00E5g\u00E8\""}
Curl (variable)
$ eval "$(
git log -n 1 --pretty=format:'%s' | \
xidel -se 'msg:=serialize({"text":$raw},{"method":"json","encoding":"us-ascii"})' --output-format=bash
)"
$ echo $msg
{"text":"'\u00E7\u00F6mm\u00EDt' \"m\u00EAss\u00E5g\u00E8\""}
$ curl -d "$msg" https://api.github.com/markdown
<p>'çömmít' "mêssågè"</p>
Curl (pipe)
$ git log -n 1 --pretty=format:'%s' | \
xidel -se 'serialize({"text":$raw},{"method":"json","encoding":"us-ascii"})' | \
curl -d#- https://api.github.com/markdown
<p>'çömmít' "mêssågè"</p>
Actually, there's no need for curl if you're already using xidel.
Xidel (pipe)
$ git log -n 1 --pretty=format:'%s' | \
xidel -s \
-d '{serialize({"text":read()},{"method":"json","encoding":"us-ascii"})}' \
"https://api.github.com/markdown" \
-e '$raw'
<p>'çömmít' "mêssågè"</p>
Xidel (pipe, in-query)
$ git log -n 1 --pretty=format:'%s' | \
xidel -se '
x:request({
"post":serialize(
{"text":$raw},
{"method":"json","encoding":"us-ascii"}
),
"url":"https://api.github.com/markdown"
})/raw
'
<p>'çömmít' "mêssågè"</p>
Xidel (all in-query)
$ xidel -se '
x:request({
"post":serialize(
{"text":system("git log -n 1 --pretty=format:'\''%s'\''")},
{"method":"json","encoding":"us-ascii"}
),
"url":"https://api.github.com/markdown"
})/raw
'
<p>'çömmít' "mêssågè"</p>

This is an escaping solution using Perl that escapes backslash (\), double-quote (") and control characters U+0000 to U+001F:
$ echo -ne "Hello, 🌵\n\tBye" | \
perl -pe 's/(\\(\\\\)*)/$1$1/g; s/(?!\\)(["\x00-\x1f])/sprintf("\\u%04x",ord($1))/eg;'
Hello, 🌵\u000a\u0009Bye
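Packaged as a helper in the style of the json_escape functions above (a sketch; note that it escapes the content but does not add the surrounding double quotes):
json_escape() {
printf '%s' "$1" | perl -pe 's/(\\(\\\\)*)/$1$1/g; s/(?!\\)(["\x00-\x1f])/sprintf("\\u%04x",ord($1))/eg;'
}
json_escape $'Hello,\tBye'
# Hello,\u0009Bye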

I struggled with the same problem. I was trying to add a variable to the payload of a cURL request in bash, and it kept coming back as invalid JSON. After trying a LOT of escaping tricks, I reached a simple method that fixed my issue. The answer was all in the single and double quotes:
curl --location --request POST 'https://hooks.slack.com/services/test-slack-hook' \
--header 'Content-Type: application/json' \
--data-raw '{"text":'"$data"'}'
Maybe it comes in handy for someone!
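Note that this only works if $data already holds a valid JSON value, quotes included. One way to guarantee that for an arbitrary string is to produce it with jq, as in the answers above (a sketch using this answer's placeholder webhook URL):
data=$(printf '%s' 'Hello from the hook' | jq -Rsa .)
curl --location --request POST 'https://hooks.slack.com/services/test-slack-hook' \
--header 'Content-Type: application/json' \
--data-raw '{"text":'"$data"'}'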

I had the same idea: send a message containing the commit message after each commit.
First I tried a similar way as the author here.
But later I found a better and simpler solution.
I just created a PHP file which sends the message, and I call it with wget.
In hooks/post-receive:
wget -qO - "http://localhost/git.php"
In git.php:
chdir("/opt/git/project.git");
$git_log = exec("git log -n 1 --format=oneline | grep -o ' .\+'");
Then create the JSON and call cURL in PHP.

Integrating a JSON-aware tool in your environment is sometimes a no-go, so here's a POSIX solution that should work on every UNIX/Linux:
json_stringify() {
[ "$#" -ge 1 ] || return 1
LANG=C awk '
BEGIN {
for ( i = 1; i <= 127; i++ )
repl[ sprintf( "%c", i) ] = sprintf( "\\u%04x", i )
for ( i = 1; i < ARGC; i++ ) {
s = ARGV[i]
printf("%s", "\"")
while ( match( s, /[\001-\037\177"\\]/ ) ) {
printf("%s%s", \
substr(s,1,RSTART-1), \
repl[ substr(s,RSTART,RLENGTH) ] \
)
s = substr(s,RSTART+RLENGTH)
}
print s "\""
}
exit
}
' "$#"
}
Or using the widely available perl:
json_stringify() {
[ "$#" -ge 1 ] || return 1
LANG=C perl -le '
for (@ARGV) {
s/[\x00-\x1f\x7f"\\]/sprintf("\\u%04x",ord($&))/ge;
print "\"$_\""
}
' -- "$#"
}
Then you can do:
json_stringify '"foo\bar"' 'hello
world'
"\u0022foo\bar\u0022"
"hello\u000aworld"
Limitations:
Doesn't handle NUL bytes.
Doesn't validate the input for UNICODE, it only escapes the mandatory ASCII characters specified in the RFC 8259.
Replying to OP's question:
MSG=$(git log -n 1 --format=oneline | grep -o ' .\+')
curl -i -X POST \
-H 'Accept: application/text' \
-H 'Content-type: application/json' \
-d '{"payload": {"message": '"$(json_stringify "$MSG")"'}}' \
'https://example.com'

Related

Unable to loop through the JSON internal Array having spaces in values using Bash script JQ [duplicate]

Background
I want to be able to pass a json file to WP CLI, to iteratively create posts.
So I thought I could create a JSON file:
[
{
"post_type": "post",
"post_title": "Test",
"post_content": "[leaflet-map][leaflet-marker]",
"post_status": "publish"
},
{
"post_type": "post",
"post_title": "Number 2",
"post_content": "[leaflet-map fitbounds][leaflet-circle]",
"post_status": "publish"
}
]
and iterate the array with jq:
cat posts.json | jq --raw-output .[]
I want to be able to iterate these to execute a similar function:
wp post create \
--post_type=post \
--post_title='Test Map' \
--post_content='[leaflet-map] [leaflet-marker]' \
--post_status='publish'
Is there a way I can do this with jq, or similar?
The closest I've gotten so far is this:
> for i in $(cat posts.json | jq -c .[]); do echo $i; done
But this seems to take issue with the (valid) spaces in the strings. Output:
{"post_type":"post","post_title":"Test","post_content":"[leaflet-map][leaflet-marker]","post_status":"publish"}
{"post_type":"post","post_title":"Number
2","post_content":"[leaflet-map
fitbounds][leaflet-circle]","post_status":"publish"}
Am I way off with this approach, or can it be done?
Use a while loop to read entire lines, rather than iterating over the words resulting from the command substitution.
while IFS= read -r obj; do
...
done < <(jq -c '.[]' posts.json)
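Building on that, a sketch of a full loop that pulls each field back out with jq (assuming every object carries the four keys shown in the question):
while IFS= read -r obj; do
wp post create \
--post_type="$(jq -r '.post_type' <<< "$obj")" \
--post_title="$(jq -r '.post_title' <<< "$obj")" \
--post_content="$(jq -r '.post_content' <<< "$obj")" \
--post_status="$(jq -r '.post_status' <<< "$obj")"
done < <(jq -c '.[]' posts.json)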
Maybe this would work for you:
Make a bash executable, maybe call it wpfunction.sh
#!/bin/bash
wp post create \
--post_type="$1"\
--post_title="$2" \
--post_content="$3" \
--post_status="$4"
Then run jq on your posts.json and pipe it into xargs
jq -M -c '.[] | [.post_type, .post_title, .post_content, .post_status][]' \
posts.json | xargs -n4 ./wpfunction.sh
I am experimenting to see how this would handle post_content that contained quotes...
First generate an array of the arguments you wish to pass, then convert it to a shell-compatible form using @sh. Then you could pass it to xargs to invoke the command.
$ jq -r '.[] | ["post", "create", (to_entries[] | "--\(.key)=\(.value|tojson)")] | @sh' input.json | xargs wp

Create variables base on cURL response - Bash

I'm trying to create two bash variables, $lat and $long, based on the result of my curl response.
curl ipinfo.io/33.62.137.111 | grep "loc" | awk '{print $2}'
I got:
"42.6334,-71.3162",
I'm trying to get
$lat=42.6334
$long=-71.3162
Can someone give me a little push?
IFS=, read -r lat long < <(
curl -s ipinfo.io/33.62.137.111 |
jq -r '.loc'
)
printf 'Latitude is: %s\nLongitude is: %s\n' "$lat" "$long"
The ipinfo.io API is returning JSON data, so let's parse it with jq.
Here is the JSON as returned by the query from your sample:
{
"ip": "33.62.137.111",
"city": "Columbus",
"region": "Ohio",
"country": "US",
"loc": "39.9690,-83.0114",
"postal": "43218",
"timezone": "America/New_York",
"readme": "https://ipinfo.io/missingauth"
}
We query the loc entry from the root JSON object:
curl -s ipinfo.io/33.62.137.111: download the JSON data, -s silently (no progress meter).
jq -r '.loc': process the JSON data, query the loc entry of the main object, and -r output the raw string.
IFS=, read -r lat long < <(: set the Internal Field Separator to , and read the lat and long variables from the output stream of the command group.
Although the answer from @LeaGris is quite interesting, if you don't want to use an external library or something, you can try this:
Playground: https://repl.it/repls/ThoughtfulImpressiveComputer
coordinates=($(curl ipinfo.io/33.62.137.111 | sed 's/ //g' | grep -P '(?<=\"loc\":").*?(?=\")' -o | tr ',' ' '))
echo "${coordinates[#]}"
echo ${coordinates[0]}
echo ${coordinates[1]}
Example output:
39.9690 -83.0114 # echo "${coordinates[@]}"
39.9690 # ${coordinates[0]}
-83.0114 # ${coordinates[1]}
Explanation:
curl ... get the JSON data
sed 's/ //g' remove all spaces
grep -P ... -o
-P interpret the given pattern as a perl regexp
(?<=\"loc\":").*?(?=\")
(?<=\"loc\":") regex lookbehind
.*? capture the longitude and latitude part with non-greedy search
(?=\") regex lookahead
-o get only the matching part, which would be e.g. 39.9690,-83.0114
tr ',' ' ' replace the comma with a space
Finally we got something like this: 39.9690 -83.0114
Putting it in parentheses lets us create an array with two values in it (cf. ${coordinates[...]}).

CURL error "URL using bad/illegal format or missing URL" when trying to pass variable as a part of URL

When I try to execute the script below, I get the error: "curl: (3) URL using bad/illegal format or missing URL"
#!/bin/bash
stage="develop"
branch="branch_name"
getDefinition=$(curl -u user@example.com:password -X GET "https://dev.azure.com/organization/project/_apis/build/definitions?api-version=5.1")
for def in $(echo "$getDefinition" | jq '.value[] | select (.path=="\\Some_path\\'$stage'") | .id'); do
getBuildInfo=$(curl -u user@example.com:password -X GET "https://dev.azure.com/organization/project/_apis/build/definitions/${def}\?api-version=5.1")
# echo $def
body=$(echo "${getBuildInfo}" | jq '.repository.defaultBranch = "refs/heads/release/'"${branch}"'"' | jq '.options[].inputs.branchFilters = "[\"+refs/heads/release/'"${branch}"'\"]"' | jq '.triggers[].branchFilters[] = "+refs/heads/release/'"${branch}"'"')
echo ${body} > data.json
done
It happens when I try to pass the variable ${def} into this line:
curl -u user@example.com:password -X GET "https://dev.azure.com/organization/project/_apis/build/definitions/${def}\?api-version=5.1"
But when I declare an array, curl works as expected.
Example:
declare -a def
def=(1 2 3 4)
curl -u user@example.com:password -X GET "https://dev.azure.com/organization/project/_apis/build/definitions/${def}\?api-version=5.1"
Could you please suggest how I can pass the variable into the URL properly?
Do you need to call curl 4 times? If so:
for def in 1 2 3 4; do curl -u user@example.com:password -X GET "https://dev.azure.com/organization/project/_apis/build/definitions/${def}\?api-version=5.1"; done
Set IFS=$' \t\r\n' at the top of your script.
IFS is the Internal Field Separator.
By default IFS is $' \t\n' (space, tab, newline), but to cope with Windows line endings it needs to be $' \t\r\n'.
The ^M character is \r.
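Alternatively, strip the carriage returns from the variable itself before interpolating it into the URL. A minimal sketch (the stray backslash before the ? is also dropped here, since ? needs no escaping inside double quotes):
def=${def//$'\r'/}
curl -u user@example.com:password -X GET "https://dev.azure.com/organization/project/_apis/build/definitions/${def}?api-version=5.1"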

ShellScript to send Mattermost notification is not working

I want to send a message to a Mattermost channel with the help of a shell script, a webhook, and cURL. The following function sends the message.
function matterSend() {
ENDPOINT=https://url.to.Mattermost/WebhookID
USERNAME="${USER}"
PAYLOAD=$(cat <<'EOF'
'payload={
"username" : "${USERNAME}",
"channel" : "TestChannel",
"text" : "#### Test to \n
| TestR | TestS | New Mode |
|:-----------|:-----------:|-----------------------------------------------:|
| ${2} | ${3} | ${1} :white_check_mark: |
"
}'
EOF
)
echo "CURL: curl -i -X POST -d ${PAYLOAD} ${ENDPOINT}"
curl -i -X POST -d "${PAYLOAD}" "${ENDPOINT}"
}
As you can see, when I ECHO the command I get:
curl -i -X POST -d 'payload={
"username" : "TestUser",
"channel" : "TestChannel",
"text" : "#### Test to \n
| TestR | TestS | New Mode |
|:-----------|:-----------:|-----------------------------------------------:|
| ${2} | ${3} | ${1} :white_check_mark: |
"
}' https://url.to.Mattermost/WebhookID
If I paste that code directly into the terminal and execute it, it works. But when I run the script via a Jenkins job, I get the error:
Unable to parse incoming data","message":"Unable to parse incoming
data".
Why is it not working?
Without knowledge of the API you are connecting to, I would guess that you need
# Drop function keyword, indent body
matterSend() {
# Lowercase variable names; declare them local
local endpoint=https://url.to.Mattermost/WebhookID
local username=$USER
# Pro tip: don't use a variable for the payload if it's effectively static
payload=$(cat <<-__EOF
payload={
"username" : "$username",
"channel" : "TestChannel",
"text" : "#### Test to \\n| TestR | TestS | New Mode |\\n|:-----------|:-----------:|-----------------------------------------------:|\\n| ${2} | ${3} | ${1} :white_check_mark: |\\n"
}
__EOF
)
echo "CURL: curl -i -X POST -d $payload $endpoint"
curl -i -X POST -d "$payload" "$endpoint"
}
Replacing the newlines inside the "text" element with \n (and doubling the backslash because the here document is now being interpreted by the shell when it's assigned) is mildly speculative; perhaps remove the remaining newlines, too. The real beef is removing the misplaced literal single quotes around the payload.
Maybe also explore printf for formatting fixed-width tabular text.
The here document's <<-__EOF uses the unquoted separator __EOF and the dash before it says to remove any tabs from the beginning of each line. Needless to say, the indentation on those lines consists of a literal tab character.
Generating JSON (or XML, or other structured formats) via string concatenation leads to pain and suffering. Instead, use a tool that actually understands the format.
Using a compliant generator such as jq means you no longer need to be responsible for putting \ns in the data (as multi-character strings), changing "s in text to \", or any of the other otherwise-necessary munging.
matterSend() {
# Lowercase variable names; declare them local
local endpoint=https://url.to.Mattermost/WebhookID
local username=$USER
local text="#### Test to
| TestR | TestS | New Mode |
|:-----------|:-----------:|-----------------------------------------------:|
| ${2} | ${3} | ${1} :white_check_mark: |
"
payload=$(jq -n --arg username "$username" \
--arg channel "TestChannel" \
--arg text "$text" \
'{"username": $username, "channel": $channel, "text": $text}')
# Advice: use "set -x" if you want to trace the commands your script would run.
# ...or at least printf %q, as below; avoids misleading output from echo.
# printf '%q ' curl -i -X POST -d "$payload" "$endpoint" >&2; echo >&2
curl -i -X POST -d "$payload" "$endpoint"
}

How to urlencode data for curl command?

I am trying to write a bash script for testing that takes a parameter and sends it through curl to a web site. I need to URL-encode the value to make sure that special characters are processed properly. What is the best way to do this?
Here is my basic script so far:
#!/bin/bash
host=${1:?'bad host'}
value=$2
shift
shift
curl -v -d "param=${value}" http://${host}/somepath $@
Use curl --data-urlencode; from man curl:
This posts data, similar to the other --data options with the exception that this performs URL-encoding. To be CGI-compliant, the <data> part should begin with a name followed by a separator and a content specification.
Example usage:
curl \
--data-urlencode "paramName=value" \
--data-urlencode "secondParam=value" \
http://example.com
See the man page for more info.
This requires curl 7.18.0 or newer (released January 2008). Use curl -V to check which version you have.
You can also encode the query string:
curl --get \
--data-urlencode "p1=value 1" \
--data-urlencode "p2=value 2" \
http://example.com
# http://example.com?p1=value%201&p2=value%202
Another option is to use jq:
$ printf %s 'input text'|jq -sRr @uri
input%20text
$ jq -rn --arg x 'input text' '$x|@uri'
input%20text
-r (--raw-output) outputs the raw contents of strings instead of JSON string literals. -n (--null-input) doesn't read input from STDIN.
-R (--raw-input) treats input lines as strings instead of parsing them as JSON, and -sR (--slurp --raw-input) reads the input into a single string. You can replace -sRr with -Rr if your input only contains a single line or if you don't want to replace linefeeds with %0A:
$ printf %s\\n multiple\ lines of\ text|jq -Rr @uri
multiple%20lines
of%20text
$ printf %s\\n multiple\ lines of\ text|jq -sRr @uri
multiple%20lines%0Aof%20text%0A
Or this percent-encodes all bytes:
xxd -p|tr -d \\n|sed 's/../%&/g'
Here is the pure BASH answer.
Update: Since many changes have been discussed, I have placed this on https://github.com/sfinktah/bash/blob/master/rawurlencode.inc.sh for anybody to issue a PR against.
Note: This solution is not intended to encode unicode or multi-byte characters - which are quite outside BASH's humble native capabilities. It's only intended to encode symbols that would otherwise ruin argument passing in POST or GET requests, e.g. '&', '=' and so forth.
Very Important Note: DO NOT ATTEMPT TO WRITE YOUR OWN UNICODE CONVERSION FUNCTION, IN ANY LANGUAGE, EVER. See end of answer.
rawurlencode() {
local string="${1}"
local strlen=${#string}
local encoded=""
local pos c o
for (( pos=0 ; pos<strlen ; pos++ )); do
c=${string:$pos:1}
case "$c" in
[-_.~a-zA-Z0-9] ) o="${c}" ;;
* ) printf -v o '%%%02x' "'$c"
esac
encoded+="${o}"
done
echo "${encoded}" # You can either set a return variable (FASTER)
REPLY="${encoded}" #+or echo the result (EASIER)... or both... :p
}
You can use it in two ways:
easier: echo http://url/q?=$( rawurlencode "$args" )
faster: rawurlencode "$args"; echo http://url/q?${REPLY}
Here's the matching rawurldecode() function, which - with all modesty - is awesome.
# Returns a string in which the sequences with percent (%) signs followed by
# two hex digits have been replaced with literal characters.
rawurldecode() {
# This is perhaps a risky gambit, but since all escape characters must be
# encoded, we can replace %NN with \xNN and pass the lot to printf '%b', which
# will decode hex for us
printf -v REPLY '%b' "${1//%/\\x}" # You can either set a return variable (FASTER)
echo "${REPLY}" #+or echo the result (EASIER)... or both... :p
}
With the matching set, we can now perform some simple tests:
$ diff rawurlencode.inc.sh \
<( rawurldecode "$( rawurlencode "$( cat rawurlencode.inc.sh )" )" ) \
&& echo Matched
Output: Matched
And if you really really feel that you need an external tool (well, it will go a lot faster, and might do binary files and such...) I found this on my OpenWRT router...
replace_value=$(echo $replace_value | sed -f /usr/lib/ddns/url_escape.sed)
Where url_escape.sed was a file that contained these rules:
# sed url escaping
s:%:%25:g
s: :%20:g
s:<:%3C:g
s:>:%3E:g
s:#:%23:g
s:{:%7B:g
s:}:%7D:g
s:|:%7C:g
s:\\:%5C:g
s:\^:%5E:g
s:~:%7E:g
s:\[:%5B:g
s:\]:%5D:g
s:`:%60:g
s:;:%3B:g
s:/:%2F:g
s:?:%3F:g
s^:^%3A^g
s:@:%40:g
s:=:%3D:g
s:&:%26:g
s:\$:%24:g
s:\!:%21:g
s:\*:%2A:g
While it is not impossible to write such a script in BASH (probably using xxd and a very lengthy ruleset) capable of handling UTF-8 input, there are faster and more reliable ways.
Even the Unicode Consortium removed their sample code after discovering it was no longer 100% compatible with the actual standard.
The Unicode standard is constantly evolving, and has become extremely nuanced. Any implementation you can whip together will not be properly compliant, and if by some extreme effort you managed it, it wouldn't stay compliant.
Use Perl's URI::Escape module and uri_escape function in the second line of your bash script:
...
value="$(perl -MURI::Escape -e 'print uri_escape($ARGV[0]);' "$2")"
...
Edit: Fix quoting problems, as suggested by Chris Johnsen in the comments. Thanks!
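For example, a quick check (assuming a reasonably recent URI::Escape, which leaves only the RFC 3986 unreserved characters unescaped):
$ perl -MURI::Escape -e 'print uri_escape($ARGV[0]);' 'hello world&foo=bar'
hello%20world%26foo%3Dbar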
One of the variants; it may be ugly, but it's simple:
urlencode() {
local data
if [[ $# != 1 ]]; then
echo "Usage: $0 string-to-urlencode"
return 1
fi
data="$(curl -s -o /dev/null -w %{url_effective} --get --data-urlencode "$1" "")"
if [[ $? != 3 ]]; then
echo "Unexpected error" 1>&2
return 2
fi
echo "${data##/?}"
return 0
}
Here is the one-liner version for example (as suggested by Bruno):
date | curl -Gso /dev/null -w %{url_effective} --data-urlencode @- "" | cut -c 3-
# If you experience the trailing %0A, use
date | curl -Gso /dev/null -w %{url_effective} --data-urlencode @- "" | sed -E 's/..(.*).../\1/'
For the sake of completeness: many solutions using sed or awk only translate a special set of characters, and are hence quite large by code size, while still not translating other special characters that should be encoded.
A safe way to urlencode is to just encode every single byte, even those that would have been allowed.
echo -ne 'some random\nbytes' | xxd -plain | tr -d '\n' | sed 's/\(..\)/%\1/g'
xxd is taking care here that the input is handled as bytes and not characters.
edit:
xxd comes with the vim-common package in Debian, and I was just on a system where it was not installed and I didn't want to install it. The alternative is to use hexdump from the bsdmainutils package in Debian. According to the following graph, bsdmainutils and vim-common should have about equal likelihood of being installed:
http://qa.debian.org/popcon-png.php?packages=vim-common%2Cbsdmainutils&show_installed=1&want_legend=1&want_ticks=1
But nevertheless, here is a version which uses hexdump instead of xxd and avoids the tr call:
echo -ne 'some random\nbytes' | hexdump -v -e '/1 "%02x"' | sed 's/\(..\)/%\1/g'
I find it more readable in python:
encoded_value=$(python3 -c "import urllib.parse; print(urllib.parse.quote('''$value'''))")
The triple ' ensures that single quotes in the value won't hurt. urllib.parse is in the standard library. It works, for example, for this crazy (real-world) URL:
"http://www.rai.it/dl/audio/" "1264165523944Ho servito il re d'Inghilterra - Puntata 7
I've found the following snippet useful to stick it into a chain of program calls, where URI::Escape might not be installed:
perl -p -e 's/([^A-Za-z0-9])/sprintf("%%%02X", ord($1))/seg'
If you wish to run a GET request with pure curl, just add --get to @Jacob's solution.
Here is an example:
curl -v --get --data-urlencode "access_token=$(cat .fb_access_token)" https://graph.facebook.com/me/feed
This may be the best one:
after=$(echo -e "$before" | od -An -tx1 | tr ' ' % | xargs printf "%s")
Direct link to the awk version: http://www.shelldorado.com/scripts/cmds/urlencode
I've used it for years and it works like a charm:
:
##########################################################################
# Title : urlencode - encode URL data
# Author : Heiner Steven (heiner.steven@odn.de)
# Date : 2000-03-15
# Requires : awk
# Categories : File Conversion, WWW, CGI
# SCCS-Id. : @(#) urlencode 1.4 06/10/29
##########################################################################
# Description
# Encode data according to
# RFC 1738: "Uniform Resource Locators (URL)" and
# RFC 1866: "Hypertext Markup Language - 2.0" (HTML)
#
# This encoding is used i.e. for the MIME type
# "application/x-www-form-urlencoded"
#
# Notes
# o The default behaviour is not to encode the line endings. This
# may not be what was intended, because the result will be
# multiple lines of output (which cannot be used in an URL or a
# HTTP "POST" request). If the desired output should be one
# line, use the "-l" option.
#
# o The "-l" option assumes, that the end-of-line is denoted by
# the character LF (ASCII 10). This is not true for Windows or
# Mac systems, where the end of a line is denoted by the two
# characters CR LF (ASCII 13 10).
# We use this for symmetry; data processed in the following way:
# cat | urlencode -l | urldecode -l
# should (and will) result in the original data
#
# o Large lines (or binary files) will break many AWK
# implementations. If you get the message
# awk: record `...' too long
# record number xxx
# consider using GNU AWK (gawk).
#
# o urlencode will always terminate its output with an EOL
# character
#
# Thanks to Stefan Brozinski for pointing out a bug related to non-standard
# locales.
#
# See also
# urldecode
##########################################################################
PN=`basename "$0"` # Program name
VER='1.4'
: ${AWK=awk}
Usage () {
echo >&2 "$PN - encode URL data, $VER
usage: $PN [-l] [file ...]
-l: encode line endings (result will be one line of output)
The default is to encode each input line on its own."
exit 1
}
Msg () {
for MsgLine
do echo "$PN: $MsgLine" >&2
done
}
Fatal () { Msg "$#"; exit 1; }
set -- `getopt hl "$@" 2>/dev/null` || Usage
[ $# -lt 1 ] && Usage # "getopt" detected an error
EncodeEOL=no
while [ $# -gt 0 ]
do
case "$1" in
-l) EncodeEOL=yes;;
--) shift; break;;
-h) Usage;;
-*) Usage;;
*) break;; # First file name
esac
shift
done
LANG=C export LANG
$AWK '
BEGIN {
# We assume an awk implementation that is just plain dumb.
# We will convert an character to its ASCII value with the
# table ord[], and produce two-digit hexadecimal output
# without the printf("%02X") feature.
EOL = "%0A" # "end of line" string (encoded)
split ("1 2 3 4 5 6 7 8 9 A B C D E F", hextab, " ")
hextab [0] = 0
for ( i=1; i<=255; ++i ) ord [ sprintf ("%c", i) "" ] = i + 0
if ("'"$EncodeEOL"'" == "yes") EncodeEOL = 1; else EncodeEOL = 0
}
{
encoded = ""
for ( i=1; i<=length ($0); ++i ) {
c = substr ($0, i, 1)
if ( c ~ /[a-zA-Z0-9.-]/ ) {
encoded = encoded c # safe character
} else if ( c == " " ) {
encoded = encoded "+" # special handling
} else {
# unsafe character, encode it as a two-digit hex-number
lo = ord [c] % 16
hi = int (ord [c] / 16);
encoded = encoded "%" hextab [hi] hextab [lo]
}
}
if ( EncodeEOL ) {
printf ("%s", encoded EOL)
} else {
print encoded
}
}
END {
#if ( EncodeEOL ) print ""
}
' "$#"
Here's a Bash solution which doesn't invoke any external programs:
uriencode() {
s="${1//'%'/%25}"
s="${s//' '/%20}"
s="${s//'"'/%22}"
s="${s//'#'/%23}"
s="${s//'$'/%24}"
s="${s//'&'/%26}"
s="${s//'+'/%2B}"
s="${s//','/%2C}"
s="${s//'/'/%2F}"
s="${s//':'/%3A}"
s="${s//';'/%3B}"
s="${s//'='/%3D}"
s="${s//'?'/%3F}"
s="${s//'#'/%40}"
s="${s//'['/%5B}"
s="${s//']'/%5D}"
printf %s "$s"
}
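For instance (note the function only covers the characters listed above, so e.g. ! and ' pass through unchanged):
$ uriencode 'foo bar?baz=qux'
foo%20bar%3Fbaz%3Dqux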
url=$(echo "$1" | sed -e 's/%/%25/g' -e 's/ /%20/g' -e 's/!/%21/g' -e 's/"/%22/g' -e 's/#/%23/g' -e 's/\$/%24/g' -e 's/\&/%26/g' -e 's/'\''/%27/g' -e 's/(/%28/g' -e 's/)/%29/g' -e 's/\*/%2a/g' -e 's/+/%2b/g' -e 's/,/%2c/g' -e 's/-/%2d/g' -e 's/\./%2e/g' -e 's/\//%2f/g' -e 's/:/%3a/g' -e 's/;/%3b/g' -e 's//%3e/g' -e 's/?/%3f/g' -e 's/#/%40/g' -e 's/\[/%5b/g' -e 's/\\/%5c/g' -e 's/\]/%5d/g' -e 's/\^/%5e/g' -e 's/_/%5f/g' -e 's/`/%60/g' -e 's/{/%7b/g' -e 's/|/%7c/g' -e 's/}/%7d/g' -e 's/~/%7e/g')
This will encode the string inside $1 and output it in $url, although you don't have to put it in a variable if you don't want to. By the way, the sed rule for tab isn't included; it was thought it would get turned into spaces.
Using php from a shell script:
value="http://www.google.com"
encoded=$(php -r "echo rawurlencode('$value');")
# encoded = "http%3A%2F%2Fwww.google.com"
echo $(php -r "echo rawurldecode('$encoded');")
# returns: "http://www.google.com"
http://www.php.net/manual/en/function.rawurlencode.php
http://www.php.net/manual/en/function.rawurldecode.php
If you don't want to depend on Perl you can also use sed. It's a bit messy, as each character has to be escaped individually. Make a file with the following contents and call it urlencode.sed
s/%/%25/g
s/ /%20/g
s/ /%09/g
s/!/%21/g
s/"/%22/g
s/#/%23/g
s/\$/%24/g
s/\&/%26/g
s/'\''/%27/g
s/(/%28/g
s/)/%29/g
s/\*/%2a/g
s/+/%2b/g
s/,/%2c/g
s/-/%2d/g
s/\./%2e/g
s/\//%2f/g
s/:/%3a/g
s/;/%3b/g
s/</%3c/g
s/>/%3e/g
s/?/%3f/g
s/@/%40/g
s/\[/%5b/g
s/\\/%5c/g
s/\]/%5d/g
s/\^/%5e/g
s/_/%5f/g
s/`/%60/g
s/{/%7b/g
s/|/%7c/g
s/}/%7d/g
s/~/%7e/g
To use it do the following.
STR1=$(echo "https://www.example.com/change&$ ^this to?%checkthe#-functionality" | cut -d\? -f1)
STR2=$(echo "https://www.example.com/change&$ ^this to?%checkthe#-functionality" | cut -d\? -f2)
OUT2=$(echo "$STR2" | sed -f urlencode.sed)
echo "$STR1?$OUT2"
This will split the string into a part that needs encoding and a part that is fine, encode the part that needs it, and then stitch the two back together.
You can put that into a sh script for convenience, maybe have it take a parameter to encode, put it on your path and then you can just call:
urlencode https://www.exxample.com?isThisFun=HellNo
You can emulate javascript's encodeURIComponent in perl. Here's the command:
perl -pe 's/([^a-zA-Z0-9_.!~*()'\''-])/sprintf("%%%02X", ord($1))/ge'
You could set this as a bash alias in .bash_profile:
alias encodeURIComponent='perl -pe '\''s/([^a-zA-Z0-9_.!~*()'\''\'\'''\''-])/sprintf("%%%02X",ord($1))/ge'\'
Now you can pipe into encodeURIComponent:
$ echo -n 'hèllo wôrld!' | encodeURIComponent
h%C3%A8llo%20w%C3%B4rld!
Python 3 based on @sandro's good answer from 2010:
echo "Test & /me" | python -c "import urllib.parse;print (urllib.parse.quote(input()))"
Test%20%26%20/me
This nodejs-based answer will use encodeURIComponent on stdin:
uriencode_stdin() {
node -p 'encodeURIComponent(require("fs").readFileSync(0))'
}
echo -n $'hello\nwörld' | uriencode_stdin
hello%0Aw%C3%B6rld
For those of you looking for a solution that doesn't need perl, here is one that only needs hexdump and awk:
url_encode() {
[ $# -lt 1 ] && { return; }
encodedurl="$1";
# make sure hexdump exists, if not, just give back the url
[ ! -x "/usr/bin/hexdump" ] && { return; }
encodedurl=`
echo $encodedurl | hexdump -v -e '1/1 "%02x\t"' -e '1/1 "%_c\n"' |
LANG=C awk '
$1 == "20" { printf("%s", "+"); next } # space becomes plus
$1 ~ /0[adAD]/ { next } # strip newlines
$2 ~ /^[a-zA-Z0-9.*()\/-]$/ { printf("%s", $2); next } # pass through what we can
{ printf("%%%s", $1) } # take hex value of everything else
'`
}
Stitched together from a couple of places across the net and some local trial and error. It works great!
uni2ascii is very handy:
$ echo -ne '你好世界' | uni2ascii -aJ
%E4%BD%A0%E5%A5%BD%E4%B8%96%E7%95%8C
Simple PHP option:
echo 'part-that-needs-encoding' | php -R 'echo urlencode($argn);'
What would parse URLs better than javascript?
node -p "encodeURIComponent('$url')"
Here is a POSIX function to do that:
url_encode() {
awk 'BEGIN {
for (n = 0; n < 128; n++) {
m[sprintf("%c", n)] = n
}
n = 1
while (1) {
s = substr(ARGV[1], n, 1)
if (s == "") {
break
}
t = s ~ /[[:alnum:]_.!~*\47()-]/ ? t s : t sprintf("%%%02X", m[s])
n++
}
print t
}' "$1"
}
Example:
value=$(url_encode "$2")
The question is about doing this in bash and there's no need for python or perl as there is in fact a single command that does exactly what you want - "urlencode".
value=$(urlencode "${2}")
This is also much better than, for example, the above perl answer, which doesn't encode all characters correctly. Try it with the long dash you get from Word and you get the wrong encoding.
Note, you need "gridsite-clients" installed to provide this command:
sudo apt install gridsite-clients
Here's the node version:
uriencode() {
node -p "encodeURIComponent('${1//\'/\\\'}')"
}
Another php approach:
echo "encode me" | php -r "echo urlencode(file_get_contents('php://stdin'));"
Here is my version for the busybox ash shell on an embedded system; I originally adapted Orwellophile's variant:
urlencode()
{
local S="${1}"
local encoded=""
local ch
local o
for i in $(seq 0 $((${#S} - 1)) )
do
ch=${S:$i:1}
case "${ch}" in
[-_.~a-zA-Z0-9])
o="${ch}"
;;
*)
o=$(printf '%%%02x' "'$ch")
;;
esac
encoded="${encoded}${o}"
done
echo ${encoded}
}
urldecode()
{
# urldecode <string>
local url_encoded="${1//+/ }"
printf '%b' "${url_encoded//%/\\x}"
}
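A quick round trip to check the pair (this version emits lowercase hex, which is equally valid):
$ urlencode 'foo bar/baz'
foo%20bar%2fbaz
$ urldecode 'foo%20bar%2fbaz'
foo bar/baz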
Ruby, for completeness
value="$(ruby -r cgi -e 'puts CGI.escape(ARGV[0])' "$2")"
Here's a one-line conversion using Lua, similar to blueyed's answer except with all the RFC 3986 Unreserved Characters left unencoded (like this answer):
url=$(echo 'print((arg[1]:gsub("([^%w%-%.%_%~])",function(c)return("%%%02X"):format(c:byte())end)))' | lua - "$1")
Additionally, you may need to ensure that newlines in your string are converted from LF to CRLF, in which case you can insert a gsub("\r?\n", "\r\n") in the chain before the percent-encoding.
Here's a variant that, in the non-standard style of application/x-www-form-urlencoded, does that newline normalization, as well as encoding spaces as '+' instead of '%20' (which could probably be added to the Perl snippet using a similar technique).
url=$(echo 'print((arg[1]:gsub("\r?\n", "\r\n"):gsub("([^%w%-%.%_%~ ]))",function(c)return("%%%02X"):format(c:byte())end):gsub(" ","+"))' | lua - "$1")
In this case, I needed to URL encode the hostname. Don't ask why. Being a minimalist, and a Perl fan, here's what I came up with.
url_encode()
{
echo -n "$1" | perl -pe 's/[^a-zA-Z0-9\/_.~-]/sprintf "%%%02x", ord($&)/ge'
}
Works perfectly for me.
