Splitting /proc/cmdline arguments with spaces - shell

Most scripts that parse /proc/cmdline break it up into words and then filter out arguments with a case statement, for example:
CMDLINE="quiet union=aufs wlan=FOO"
for x in $CMDLINE
do
    case $x in
        wlan=*)
            echo "${x//wlan=}"
            ;;
    esac
done
The problem is when the WLAN ESSID has spaces. Users expect to set wlan='FOO
BAR' (like a shell variable) and then get the unexpected result of 'FOO with the above code, since the for loop splits on spaces.
Is there a better way of parsing /proc/cmdline from a shell script, short of practically eval'ing it?
Or are there some quoting tricks? I was thinking I could perhaps ask users to encode spaces (e.g. as %20) and decode them like so: /bin/busybox httpd -d "FOO%20BAR". Or is that a bad solution?

There are some ways:
cat /proc/PID/cmdline | tr '\000' ' '
cat /proc/PID/cmdline | xargs -0 echo
These will work in most cases, but will fail when arguments have spaces in them. That said, there are probably better approaches than parsing /proc/PID/cmdline this way.
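If you do need to split /proc/PID/cmdline safely, the NUL separators can be preserved with a read loop instead; a minimal sketch, here inspecting the current shell's own arguments:
# Read NUL-delimited arguments one by one; spaces inside an argument survive.
while IFS= read -r -d '' arg; do
    printf 'arg: %s\n' "$arg"
done < "/proc/$$/cmdline"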

set -- $(cat /proc/cmdline)
for x in "$#"; do
case "$x" in
wlan=*)
echo "${x#wlan=}"
;;
esac
done

Most commonly, octal escape sequences (e.g. \040 for a space) are used when literal spaces are unacceptable.
In Bash, printf can be used to unescape them, e.g.
CMDLINE='quiet union=aufs wlan=FOO\040BAR'
for x in $CMDLINE; do
[[ $x = wlan=* ]] || continue
printf '%b\n' "${x#wlan=}"
done
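For example, a single escaped value decodes like this:
$ x='wlan=FOO\040BAR'
$ printf '%b\n' "${x#wlan=}"
FOO BAR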

Since you want the shell to parse the /proc/cmdline contents, it's hard to avoid eval'ing it.
#!/bin/bash
eval "kernel_args=( $(cat /proc/cmdline) )"
for arg in "${kernel_args[#]}" ; do
case "${arg}" in
wlan=*)
echo "${arg#wlan=}"
;;
esac
done
This is obviously dangerous though as it would blindly run anything that was specified on the kernel command-line like quiet union=aufs wlan=FOO ) ; touch EVIL ; q=( q.
Escaping spaces (\x20) sounds like the most straightforward and safe way.
A heavy alternative is to use some parser, which understand shell-like syntax.
In this case, you may not even need the shell anymore.
For example, with python:
$ cat /proc/cmdline
quiet union=aufs wlan='FOO BAR' key="val with space" ) ; touch EVIL ; q=( q
$ python -c 'import shlex; print shlex.split(None)' < /proc/cmdline
['quiet', 'union=aufs', 'wlan=FOO BAR', 'key=val with space', ')', ';', 'touch', 'EVIL', ';', 'q=(', 'q']

Use xargs -n1:
[centos@centos7 ~]$ CMDLINE="quiet union=aufs wlan='FOO BAR'"
[centos@centos7 ~]$ echo $CMDLINE
quiet union=aufs wlan='FOO BAR'
[centos@centos7 ~]$ echo $CMDLINE | xargs -n1
quiet
union=aufs
wlan=FOO BAR
[centos@centos7 ~]$ xargs -n1 -a /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-3.10.0-862.14.4.el7.x86_64
root=UUID=3260cdba-e07e-408f-93b3-c4e9ff55ab10
ro
consoleblank=0
crashkernel=auto
rhgb
quiet
LANG=en_US.UTF-8

You could do something like the following using bash, which would turn those arguments into variables like $cmdline_union and $cmdline_wlan:
bash -c "for i in $(cat /proc/cmdline); do printf \"cmdline_%q\n\" \"\$i\"; done" | grep = > /tmp/cmdline.sh
. /tmp/cmdline.sh
Then you would quote and/or escape things just like you would in a normal shell.
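As a sketch, assuming the kernel command line were quiet union=aufs wlan='FOO BAR', the generated /tmp/cmdline.sh would contain roughly (the exact %q quoting style can vary between bash versions):
cmdline_union=aufs
cmdline_wlan=FOO\ BAR
so after sourcing it, "$cmdline_wlan" expands to FOO BAR.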

In posh:
$ f() { echo $1 - $3 - $2 - $4
> }
$ a="quiet union=aufs wlan=FOO"
$ f $a
quiet - wlan=FOO - union=aufs -
You can define a function and give your $CMDLINE unquoted as an argument to the function. Then you'll invoke the shell's parsing mechanisms. Note that you should test this on the shell it will be working in -- zsh does some funny things with quoting ;-).
Then you can just tell the user to do quoting like in shell:
#!/bin/posh
CMDLINE="quiet union=aufs wlan=FOO"
f() {
while test x"$1" != x
do
case $1 in
union=*) echo ${1##union=}; shift;;
*) shift;;
esac
done
}
f $CMDLINE
(posh - Policy-compliant Ordinary SHell, a shell stripped of any features beyond standard POSIX)

Found a nice way to do it with awk here; unfortunately it will only work with double quotes:
# Replace spaces outside double quotes with newlines
args=`cat /proc/cmdline | tr -d '\n' | awk 'BEGIN {RS="\"";ORS="\"" }{if (NR%2==1){gsub(/ /,"\n",$0);print $0} else {print $0}}'`
IFS='
'
for line in $args; do
key=${line%%=*}
value=${line#*=}
value=`echo $value | sed -e 's/^"//' -e 's/"$//'`
printf "%20s = %s\n" "$key" "$value"
done

Related

How to parse multiple line output as separate variables

I'm relatively new to bash scripting and I would like someone to explain this properly, thank you. Here is my code:
#! /bin/bash
echo "first arg: $1"
echo "first arg: $2"
var="$( grep -rnw $1 -e $2 | cut -d ":" -f1 )"
var2=$( grep -rnw $1 -e $2 | cut -d ":" -f1 | awk '{print substr($0,length,1)}')
echo "$var"
echo "$var2"
The problem I have is with the output. The script I'm trying to write is a C++ function searcher, so upon launching my script I have 2 arguments, one for the directory and the second one for the function name. This is what my output looks like:
first arg: Projekt
first arg: iseven
Projekt/AX/include/ax.h
Projekt/AX/src/ax.cpp
h
p
Now my question is: how can I save the line-by-line output as a variable, so that later on I can use var as a path, or use var2 as a character to compare? My plan was to use IF() statements to determine the type, idea: IF(last_char == p){echo:"something"}. What I've tried was this question: Capturing multiple line output into a Bash variable and then giving it an array. So my code looked like: "${var[0]}". Please explain how I can use my line output later on, as variables.
I'd use readarray to populate an array variable, just in case there are spaces in your command's output that shouldn't be treated as field separators and that would end up messing up foo=( ... ). And you can use shell parameter expansion substring syntax to get the last character of a variable; no need for that awk bit in your var2:
#!/usr/bin/env bash
readarray -t lines < <(printf "%s\n" "Projekt/AX/include/ax.h" "Projekt/AX/src/ax.cpp")
for line in "${lines[#]}"; do
printf "%s\n%s\n" "$line" "${line: -1}" # Note the space before the -1
done
will display
Projekt/AX/include/ax.h
h
Projekt/AX/src/ax.cpp
p

Loop through a comma-separated shell variable

Suppose I have a Unix shell variable as below
variable=abc,def,ghij
I want to extract all the values (abc, def and ghij) using a for loop and pass each value into a procedure.
The script should allow extracting an arbitrary number of comma-separated values from $variable.
Not messing with IFS
Not calling external command
variable=abc,def,ghij
for i in ${variable//,/ }
do
# call your procedure/other scripts here below
echo "$i"
done
Using bash string manipulation http://www.tldp.org/LDP/abs/html/string-manipulation.html
You can use the following script to dynamically traverse through your variable, no matter how many fields it has as long as it is only comma separated.
variable=abc,def,ghij
for i in $(echo $variable | sed "s/,/ /g")
do
# call your procedure/other scripts here below
echo "$i"
done
Instead of the echo "$i" call above, between the do and done inside the for loop, you can invoke your procedure proc "$i".
Update: The above snippet works if the value of variable does not contain spaces. If you have such a requirement, please use one of the solutions that can change IFS and then parse your variable.
If you set a different field separator, you can directly use a for loop:
IFS=","
for v in $variable
do
# things with "$v" ...
done
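If you'd rather not leave IFS changed for the rest of the script, one option (a sketch) is to confine the change to a subshell:
variable=abc,def,ghij
(
  IFS=,
  for v in $variable; do
    echo "$v"
  done
)
# IFS is untouched out here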
You can also store the values in an array and then loop through it as indicated in How do I split a string on a delimiter in Bash?:
IFS=, read -ra values <<< "$variable"
for v in "${values[#]}"
do
# things with "$v"
done
Test
$ variable="abc,def,ghij"
$ IFS=","
$ for v in $variable
> do
> echo "var is $v"
> done
var is abc
var is def
var is ghij
You can find a broader approach in this solution to How to iterate through a comma-separated list and execute a command for each entry.
Examples on the second approach:
$ IFS=, read -ra vals <<< "abc,def,ghij"
$ printf "%s\n" "${vals[#]}"
abc
def
ghij
$ for v in "${vals[#]}"; do echo "$v --"; done
abc --
def --
ghij --
I think syntactically this is cleaner and also passes shell-check linting
variable=abc,def,ghij
for i in ${variable//,/ }
do
# call your procedure/other scripts here below
echo "$i"
done
#!/bin/bash
TESTSTR="abc,def,ghij"
for i in $(echo $TESTSTR | tr ',' '\n')
do
echo $i
done
I prefer to use tr instead of sed, because sed has problems with special chars like \r \n in some cases.
Another solution is to set IFS to the desired separator.
Another solution not using IFS and still preserving the spaces:
$ var="a bc,def,ghij"
$ while read line; do echo line="$line"; done < <(echo "$var" | tr ',' '\n')
line=a bc
line=def
line=ghij
Here is an alternative tr based solution that doesn't use echo, expressed as a one-liner.
for v in $(tr ',' '\n' <<< "$var") ; do something_with "$v" ; done
It feels tidier without echo but that is just my personal preference.
The following solution:
doesn't need to mess with IFS
doesn't need helper variables (like i in a for-loop)
should be easily extensible to work for multiple separators (with a bracket expression like [:,] in the patterns)
really splits only on the specified separator(s) and not, like some other solutions presented here, on e.g. spaces too.
is POSIX compatible
doesn't suffer from any subtle issues that might arise when bash’s nocasematch is on and a separator that has lower/upper case versions is used in a match like with ${parameter/pattern/string} or case
beware that:
it does however work on the variable itself and pop each element from it - if that is not desired, a helper variable is needed
it assumes var to be set and would fail if it's not and set -u is in effect
while true; do
x="${var%%,*}"
echo $x
#x is not really needed here, one can of course directly use "${var%%,*}"
if [ -z "${var##*,*}" ] && [ -n "${var}" ]; then
var="${var#*,}"
else
break
fi
done
Beware that separators that would be special characters in patterns (e.g. a literal *) would need to be quoted accordingly.
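For example, with var=abc,def,ghij the loop above should print:
abc
def
ghij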
Here's my pure bash solution that doesn't change IFS, and can take in a custom regex delimiter.
loop_custom_delimited() {
local list=$1
local delimiter=$2
local item
if [[ $delimiter != ' ' ]]; then
list=$(echo $list | sed 's/ /'`echo -e "\010"`'/g' | sed -E "s/$delimiter/ /g")
fi
for item in $list; do
item=$(echo $item | sed 's/'`echo -e "\010"`'/ /g')
echo "$item"
done
}
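A quick usage sketch; spaces inside a field survive because they are temporarily swapped for a placeholder character:
$ loop_custom_delimited "abc,def ghi,jkl" ","
abc
def ghi
jkl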
Try this one.
#!/bin/bash
testpid="abc,def,ghij"
count=`echo $testpid | grep -o ',' | wc -l` # this is not a good way
count=`expr $count + 1`
while [ $count -gt 0 ] ; do
echo $testpid | cut -d ',' -f $count   # note: this prints the fields in reverse order
count=`expr $count - 1 `
done

How can I expand arguments to a bash function into a chain of piped commands?

I often find myself doing something like this a lot:
something | grep cat | grep bat | grep rat
when all I recall is that those three words must have occurred somewhere, in some order, in the output of something... Now, I could do something like this:
something | grep '.*cat.*bat.*rat.*'
but that implies ordering (bat appears after cat). As such, I was thinking of adding a bash function to my environment called mgrep which would turn:
mgrep cat bat rat
into
grep cat | grep bat | grep rat
but I'm not quite sure how to do it (or whether there is an alternative?). One idea would be to for loop over the parameters like so:
while (($#)); do
grep $1 some_thing > some_thing
shift
done
cat some_thing
where some_thing is possibly some fifo like when one does >(cmd) in bash but I'm not sure. How would one proceed?
I believe you could generate a pipeline one command at a time, by redirecting stdin at each step. But it's much simpler and cleaner to generate your pipeline as a string and execute it with eval, like this:
CMD="grep '$1' " # consume the first argument
shift
for arg in "$#" # Add the rest in a pipeline
do
CMD="$CMD | grep '$arg'"
done
eval $CMD
This will generate a pipeline of greps that always reads from standard input, as in your model. Note that it protects spaces in quoted arguments, so that it works correctly if you write:
mgrep 'the cat' 'the bat' 'the rat'
Thanks to Alexis, this is what I did:
function mgrep() #grep multiple keywords
{
CMD=''
while (($#)); do
CMD="$CMD grep \"$1\" | "
shift
done
eval ${CMD%| }
}
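For instance, piping a couple of lines through it (a sketch):
$ printf '%s\n' 'the cat chased the bat and the rat' 'no rodents here' | mgrep cat bat rat
the cat chased the bat and the rat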
You can write a recursive function; I'm not happy with the base case, but I can't think of a better one. It seems a waste to need to call cat just to pass standard input to standard output, and the while loop is a bit inelegant:
mgrep () {
local e=$1;
# shift && grep "$e" | mgrep "$#" || while read -r; do echo "$REPLY"; done
shift && grep "$e" | mgrep "$#" || cat
# Maybe?
# shift && grep "$e" | mgrep "$#" || echo "$(</dev/stdin)"
}

How to decode URL-encoded string in shell?

I have a file with a list of user-agents which are encoded.
E.g.:
Mozilla%2F5.0%20%28Macintosh%3B%20U%3B%20Intel%20Mac%20OS%20X%2010.6%3B%20en
I want a shell script which can read this file and write to a new file with decoded strings.
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en
I have been trying to use this example to get it going but it is not working so far.
$ echo -e "$(echo "%31+%32%0A%33+%34" | sed 'y/+/ /; s/%/\\x/g')"
My script looks like:
#!/bin/bash
for f in *.log; do
echo -e "$(cat $f | sed 'y/+/ /; s/%/\x/g')" > y.log
done
Here is a simple one-line solution.
$ function urldecode() { : "${*//+/ }"; echo -e "${_//%/\\x}"; }
It may look like perl :) but it is just pure bash. No awks, no seds ... no overheads. Using the : builtin, special parameters, pattern substitution and the echo builtin's -e option to translate hex codes into characters. See bash's manpage for further details. You can use this function as a separate command
$ urldecode https%3A%2F%2Fgoogle.com%2Fsearch%3Fq%3Durldecode%2Bbash
https://google.com/search?q=urldecode+bash
or in variable assignments, like so:
$ x="http%3A%2F%2Fstackoverflow.com%2Fsearch%3Fq%3Durldecode%2Bbash"
$ y=$(urldecode "$x")
$ echo "$y"
http://stackoverflow.com/search?q=urldecode+bash
If you are a python developer, this may be preferable:
For Python 3.x (default):
echo -n "%21%20" | python3 -c "import sys; from urllib.parse import unquote; print(unquote(sys.stdin.read()));"
For Python 2.x (deprecated):
echo -n "%21%20" | python -c "import sys, urllib as ul; print ul.unquote(sys.stdin.read());"
urllib is really good at handling URL parsing
With BASH, to read the percent-encoded URL from standard input and decode it:
while read; do echo -e ${REPLY//%/\\x}; done
Press CTRL-D to signal the end of file (EOF) and quit gracefully.
You can decode the contents of a file by setting the file to be standard in:
while read; do echo -e ${REPLY//%/\\x}; done < file
You can also decode input from a pipe, for example:
echo 'a%21b' | while read; do echo -e ${REPLY//%/\\x}; done
The read built in command reads standard in until it sees a Line Feed character. It sets a variable called REPLY equal to the line of text it just read.
${REPLY//%/\\x} replaces all instances of '%' with '\x'.
echo -e interprets \xNN as the ASCII character with hexadecimal value of NN.
while repeats this loop until the read command fails, eg. EOF has been reached.
The above does not change '+' to ' '. To change '+' to ' ' also, like guest's answer:
while read; do : "${REPLY//%/\\x}"; echo -e ${_//+/ }; done
: is a BASH builtin command. Here it just takes in a single argument and does nothing with it.
The double quotes make everything inside one single parameter.
_ is a special parameter that is equal to the last argument of the previous command, after argument expansion. This is the value of REPLY with all instances of '%' replaced with '\x'.
${_//+/ } replaces all instances of '+' with ' '.
This uses only BASH and doesn't start any other process, similar to guest's answer.
This is what seems to be working for me.
#!/bin/bash
urldecode(){
echo -e "$(sed 's/+/ /g;s/%\(..\)/\\x\1/g;')"
}
for f in /opt/logs/*.log; do
name=${f##/*/}
cat $f | urldecode > /opt/logs/processed/$HOSTNAME.$name
done
Replacing '+'s with spaces, and % signs with '\x' escapes, and letting echo interpret the \x escapes using the '-e' option was not working. For some reason, the cat command was printing the % sign as its own encoded form %25. So sed was simply replacing %25 with \x25. When the -e option was used, it was simply evaluating \x25 as % and the output was same as the original.
Trace:
Original: Mozilla%2F5.0%20%28Macintosh%3B%20U%3B%20Intel%20Mac%20OS%20X%2010.6%3B%20en
sed: Mozilla\x252F5.0\x2520\x2528Macintosh\x253B\x2520U\x253B\x2520Intel\x2520Mac\x2520OS\x2520X\x252010.6\x253B\x2520en
echo -e: Mozilla%2F5.0%20%28Macintosh%3B%20U%3B%20Intel%20Mac%20OS%20X%2010.6%3B%20en
Fix: Basically ignore the 2 characters after the % in sed.
sed: Mozilla\x2F5.0\x20\x28Macintosh\x3B\x20U\x3B\x20Intel\x20Mac\x20OS\x20X\x2010.6\x3B\x20en
echo -e: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en
Not sure what complications this would result in, after extensive testing, but works for now.
Bash script for doing it in native Bash (original source):
LANG=C
urlencode() {
local l=${#1}
for (( i = 0 ; i < l ; i++ )); do
local c=${1:i:1}
case "$c" in
[a-zA-Z0-9.~_-]) printf "$c" ;;
' ') printf + ;;
*) printf '%%%.2X' "'$c"
esac
done
}
urldecode() {
local data=${1//+/ }
printf '%b' "${data//%/\x}"
}
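For example, the pair round-trips as expected (the trailing echo is only there because neither function prints a final newline):
$ urlencode 'foo bar/baz'; echo
foo+bar%2Fbaz
$ urldecode 'foo+bar%2Fbaz'; echo
foo bar/baz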
If you want to urldecode file content, just pass the file content as an argument.
Here's a test that will halt if the decoded content of an encoded file differs from the original (if it runs for a few seconds, the script probably works correctly):
while true
do cat /dev/urandom | tr -d '\0' | head -c1000 > /tmp/tmp;
A="$(cat /tmp/tmp; printf x)"
A=${A%x}
A=$(urlencode "$A")
urldecode "$A" > /tmp/tmp2
cmp /tmp/tmp /tmp/tmp2
if [ $? != 0 ]
then break
fi
done
perl -pi.back -e 'y/+/ /;s/%([\da-f]{2})/pack H2,$1/gie' ./*.log
With -i updates the files in-place (some sed implementations have borrowed that from perl) with .back as the backup extension.
s/x/y/e substitutes x with the evaluation of the y perl code.
The perl code in this case uses pack to pack the hex number captured in $1 (first parentheses pair in the regexp) as the corresponding character.
An alternative to pack is to use chr(hex($1)):
perl -pi.back -e 'y/+/ /;s/%([\da-f]{2})/chr hex $1/gie' ./*.log
If available, you could also use uri_unescape() from URI::Escape:
perl -pi.back -MURI::Escape -e 'y/+/ /;$_=uri_unescape$_' ./*.log
bash idiom for url-decoding
Here is a bash idiom for url-decoding a string held in variabe x and assigning the result to variable y:
: "${x//+/ }"; printf -v y '%b' "${_//%/\\x}"
Unlike the accepted answer, it preserves trailing newlines during assignment. (Try assigning the result of url-decoding v%0A%0A%0A to a variable.)
It also is fast. It is 6700% faster at assigning the result of url-decoding to a variable than the accepted answer.
Caveat: It is not possible for a bash variable to contain a NUL. For example, any bash solution attempting to decode %00 and assign the result to a variable will not work.
Benchmark details
function.sh
#!/bin/bash
urldecode() { : "${*//+/ }"; echo -e "${_//%/\\x}"; }
x=%21%20
for (( i=0; i<5000; i++ )); do
y=$(urldecode "$x")
done
idiom.sh
#!/bin/bash
x=%21%20
for (( i=0; i<5000; i++ )); do
: "${x//+/ }"; printf -v y '%b' "${_//%/\\x}"
done
$ hyperfine --warmup 5 ./function.sh ./idiom.sh
Benchmark #1: ./function.sh
Time (mean ± σ): 2.844 s ± 0.036 s [User: 1.728 s, System: 1.494 s]
Range (min … max): 2.801 s … 2.907 s 10 runs
Benchmark #2: ./idiom.sh
Time (mean ± σ): 42.4 ms ± 1.0 ms [User: 40.7 ms, System: 1.1 ms]
Range (min … max): 40.5 ms … 44.8 ms 64 runs
Summary
'./idiom.sh' ran
67.06 ± 1.76 times faster than './function.sh'
If you really want a function ...
If you really want a function, say for readability reasons, I suggest the following:
# urldecode [-v var ] argument
#
# Urldecode the argument and print the result.
# It replaces '+' with SPACE and then percent decodes.
# The output is consistent with https://meyerweb.com/eric/tools/dencoder/
#
# Options:
# -v var assign the output to shell variable VAR rather than
# print it to standard output
#
urldecode() {
local assign_to_var=
local OPTIND opt
while getopts ':v:' opt; do
case $opt in
v)
local var=$OPTARG
assign_to_var=Y
;;
\?)
echo "$FUNCNAME: error: -$OPTARG: invalid option" >&2
return 1
;;
:)
echo "$FUNCNAME: error: -$OPTARG: this option requires an argument" >&2
return 1
;;
*)
echo "$FUNCNAME: error: an unexpected execution path has occurred." >&2
return 1
;;
esac
done
shift "$((OPTIND - 1))"
# Convert all '+' to ' '
: "${1//+/ }"
# We exploit that the $_ variable (last argument to the previous command
# after expansion) contains the result of the parameter expansion
if [[ $assign_to_var ]]; then
printf -v "$var" %b "${_//%/\\x}"
else
printf %b "${_//%/\\x}"
fi
}
Example 1: Printing the result to stdout
x='v%0A%0A%0A'
urldecode "$x" | od -An -tx1
Result:
76 0a 0a 0a
Example 2: Assigning the result of decoding to a shell variable:
x='v%0A%0A%0A'
urldecode -v y "$x"
echo -n "$y" | od -An -tx1
(same result)
This function, while not as fast as the idiom above, is still 1300% faster than the accepted answer at doing assignments due to no subshell being involved. In addition, as shown in the example's output, it preserves trailing newlines due to no command substitution being involved.
If you have php installed on your server, you can "cat" or even "tail" any file, with url encoded strings very easily.
tail -f nginx.access.log | php -R 'echo urldecode($argn)."\n";'
As @barti_ddu said in the comments, \x "should be [double-]escaped".
% echo -e "$(echo "Mozilla%2F5.0%20%28Macintosh%3B%20U%3B%20Intel%20Mac%20OS%20X%2010.6%3B%20en" | sed 'y/+/ /; s/%/\\x/g')"
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en
Rather than mixing up Bash and sed, I would do this all in Python. Here's a rough cut of how:
#!/usr/bin/env python
import glob
import os
import urllib
for logfile in glob.glob(os.path.join('.', '*.log')):
with open(logfile) as current:
new_log_filename = logfile + '.new'
with open(new_log_filename, 'w') as new_log_file:
for url in current:
unquoted = urllib.unquote(url.strip())
new_log_file.write(unquoted + '\n')
Building upon some of the other answers, but for the POSIX world, you could use the following function:
url_decode() {
printf '%b\n' "$(sed -E -e 's/\+/ /g' -e 's/%([0-9a-fA-F]{2})/\\x\1/g')"
}
It uses printf '%b\n' because POSIX does not guarantee echo -e, and it splits the sed call into two expressions to make it easier to read, using -E so that the \1 back-reference works. It also requires that what follows a % looks like a hex code.
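A quick usage sketch, reading from standard input:
$ printf '%s' 'Hello%2C+World%21' | url_decode
Hello, World!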
Just wanted to share this other solution, pure bash:
encoded_string="Mozilla%2F5.0%20%28Macintosh%3B%20U%3B%20Intel%20Mac%20OS%20X%2010.6%3B%20en"
printf -v decoded_string "%b" "${encoded_string//\%/\\x}"
echo $decoded_string
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en
Updating Jay's answer for Python 3.5+:
echo "%31+%32%0A%33+%34" | python -c "import sys; from urllib.parse import unquote ; print(unquote(sys.stdin.read()))"
Still, brendan's bash solution with explanation seems more direct and elegant.
With GNU awk:
LC_ALL=C gawk -vRS='%[[:xdigit:]]{2}' '
RT {RT = sprintf("%c",strtonum("0x" substr(RT, 2)))}
{gsub(/\+/," ");printf "%s", $0 RT}'
Would take URI-encoded on stdin and print the decoded output on stdout.
We set the record separator to a regexp that matches a %XX sequence. In GNU awk, the input that matched it is stored in the RT special variable. We extract the hex digits from there, prepend "0x", and let strtonum() turn that into a number, which is passed in turn to sprintf("%c"); in the C locale that converts to the corresponding byte value.
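A quick test with a made-up input string, repeating the program inline:
$ printf '%s' 'FOO%20BAR+baz%21' | LC_ALL=C gawk -vRS='%[[:xdigit:]]{2}' '
    RT {RT = sprintf("%c",strtonum("0x" substr(RT, 2)))}
    {gsub(/\+/," ");printf "%s", $0 RT}'
FOO BAR baz!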
With sed:
#!/bin/bash
URL_DECODE="$(echo "$1" | sed -E 's/%([0-9a-fA-F]{2})/\\x\1/g;s/\+/ /g'"
echo -e "$URL_DECODE"
s/%([0-9a-fA-F]{2})/\\x\1/g replaces each %NN with \xNN, turning the percent-encoded bytes into hexadecimal escapes
s/\+/ /g replaces + with a space ' ', in case + is used in the query string
Just save it to decodeurl.sh and make it executable with chmod +x decodeurl.sh
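A usage sketch:
$ ./decodeurl.sh 'Hello%20World%21+again'
Hello World! again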
If you need a way to encode too, this complete code will help:
#!/bin/bash
#
# URL encoding and decoding with sed
#
# By Daniel Cambría
# daniel.cambria@bureau-it.com
#
# Jul 2021
function url_decode() {
echo "$#" \
| sed -E 's/%([0-9a-fA-F]{2})/\\x\1/g;s/\+/ /g'
}
function url_encode() {
# Per RFC 3986
echo "$@" \
| sed \
-e 's/ /%20/g' \
-e 's/:/%3A/g' \
-e 's/,/%2C/g' \
-e 's/\?/%3F/g' \
-e 's/#/%23/g' \
-e 's/\[/%5B/g' \
-e 's/\]/%5D/g' \
-e 's/@/%40/g' \
-e 's/!/%21/g' \
-e 's/\$/%24/g' \
-e 's/&/%26/g' \
-e "s/'/%27/g" \
-e 's/(/%28/g' \
-e 's/)/%29/g' \
-e 's/\*/%2A/g' \
-e 's/\+/%2B/g' \
-e 's/,/%2C/g' \
-e 's/;/%3B/g' \
-e 's/=/%3D/g'
}
echo -e "URL decode: " $(url_decode "$1")
echo -e "URL encode: " $(url_encode "$1")
$ uenc='H%C3%B6he %C3%BCber%20dem%20Meeresspiegel'
$ utf8=$(echo -e "${uenc//%/\\x}")
$ echo $utf8
Höhe über dem Meeresspiegel
$
With the zsh shell (instead of bash), the only shell whose variables can hold any byte value including NUL (encoded as %00):
set -o extendedglob +o multibyte
string='Mozilla%2F5.0%20%28Macintosh%3B%20U%3B%20Intel%20Mac%20OS%20X%2010.6%3B%20en'
decoded=${${string//+/ }//(#b)%([[:xdigit:]](#c2))/${(#):-0x$match[1]}}
${var//pattern/replacement}: ksh-style parameter expansion operator to expand to the value of $var with every string matching pattern replaced with replacement.
(#b) activate back references so every part inside brackets in the pattern can be accessed as corresponding $match[n] in the replacement.
(#c2): equivalent of ERE {2}
${(#)param-expansion}: parameter expansion where the # flag causes the result to be interpreted as an arithmetic expression and the corresponding byte value to be returned.
${var:-value}: expands to value if $var is empty, here applied to no variable at all, so we can just specify an arbitrary string as the subject of a parameter expansion.
To make it a function that decodes the contents of a variable in-place:
uridecode_var() {
emulate -L zsh
set -o extendedglob +o multibyte
eval $1='${${'$1'//+/ }//(#b)%([[:xdigit:]](#c2))/${(#):-0x$match[1]}}'
}
$ string='Mozilla%2F5.0%20%28Macintosh%3B%20U%3B%20Intel%20Mac%20OS%20X%2010.6%3B%20en'
$ uridecode_var string
$ print -r -- $string
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en
python, for zshrc
# Usage: decodeUrl %3A%2F%2F
function decodeUrl(){
echo "$1" | python3 -c "import sys; from urllib.parse import unquote; print(unquote(sys.stdin.read()));"
}
# Usage: encodeUrl https://google.com/search?q=urldecode+bash
# return: https://google.com/search\?q\=urldecode+bash
function encodeUrl(){
echo "$1" | python3 -c "import sys; from urllib.parse import quote; print(quote(sys.stdin.read()));"
}
Use gridsite-clients:
1. yum install gridsite-clients (or apt-get install gridsite-clients)
2. grep -a 'http' access.log | xargs urlencode -d
Here is a solution that is done in pure bash where input and output are bash variables. It will decode '+' as a space and handle the '%20' space, as well as other %-encoded characters.
#!/bin/bash
#here is text that contains both '+' for spaces and a %20
text="hello+space+1%202"
decoded=$(echo -e `echo $text | sed 's/+/ /g;s/%/\\\\x/g;'`)
echo decoded=$decoded
Expanding on https://stackoverflow.com/a/37840948/8142470 to work with HTML entities:
$ htmldecode() { : "${*//+/ }"; echo -e "${_//&#x/\\x}" | tr -d ';'; }
$ htmldecode "http://google.com/search&?q=urldecode+bash"
http://google.com/search&?q=urldecode+bash
(the argument must be quoted)
Just a quick hint for others who are searching for a busybox-compatible solution. In the busybox shell you can use
httpd -d $ENCODED_URL
Example use case for busybox:
Download a file with wget and save it with the original decoded filename:
wget --no-check-certificate $ENCODED_URL -O $(basename $(httpd -d $ENCODED_URL))
If you prefer gawk, there's absolutely no need to force LC_ALL=C or gawk -b just to decode URL-encoded data.
Here's a fully functional proof of concept showing how gawk in Unicode mode can directly decode purely binary files, such as URL-encoded MP3 audio or MP4 video files, and get back the exact same file, as confirmed by hashing.
It uses FS and OFS to handle the spaces that were encoded as +, similar to Python 3's quote_plus in urllib:
( fg && fg && fg ) 2>/dev/null;
gls8x "${f}"
echo
pvE0 < "${f}" | xxh128sum | lgp3
echo ; echo
pvE0 < "${f}" | urlencodeAWKchk \
\
| gawk -ne '
BEGIN {
RS="[%][[:xdigit:]]{2}";
FS="[+]"
_=(4^5)*54 # if this offset doesn't
# work, try
# 8^7
# instead
} (NF+="_"*(ORS = sprintf("%.*s", RT != "",
sprintf("%c",\
_+("0x" \
substr( RT, 2 ))))))~""' |pvE9|xxh128sum|lgp3
1 -rwxrwxrwx 1 5555 staff 9290187 May 27 2021 genieaudio_16277926_.lossless.mp3*
in0: 8.86MiB 0:00:00 [3.56GiB/s] [3.56GiB/s][=================>] 100%
5d43c221bf6c85abac80eea8dbb412a1 stdin
in0: 8.86MiB 0:00:00 [3.47GiB/s] [3.47GiB/s] [=================>] 100%
out9: 8.86MiB 0:00:05 [1.72MiB/s] [1.72MiB/s] [ <=> ]
5d43c221bf6c85abac80eea8dbb412a1 stdin
1 -rw-r--r-- 1 5555 staff 215098877 Feb 8 17:30 vg3.mp4
in0: 205MiB 0:00:00 [2.66GiB/s] [2.66GiB/s] [=================>] 100%
2778670450b08cee694dcefc23cd4d93 stdin
in0: 205MiB 0:00:00 [3.31GiB/s] [3.31GiB/s] [=================>] 100%
out9: 205MiB 0:02:01 [1.69MiB/s] [1.69MiB/s] [ <=> ]
2778670450b08cee694dcefc23cd4d93 stdin
Minimalistic uridecode [-v varname] bash function:
Coming late to this SO question (11 years on), I see:
The first answer suggesting the use of printf -v varname %b... was offered by jamp, nearly 3 years after the question was asked.
The first answer offering a function for doing this came 10 years and 6 months after the question, by Robin A. Meade.
Here is my smaller function:
uridecode() {
if [[ $1 == -v ]];then local -n _res="$2"; shift 2; else local _res; fi
: "${*//+/ }"; printf -v _res %b "${_//%/\\x}"
[[ ${_res@A} == _res=* ]] && echo "$_res"
}
Or less condensed:
uridecode() {
if [[ $1 == -v ]];then # If 1st argument is ``-v''
local -n _res="$2" # _res is a nameref to ``$2''
shift 2 # drop 1st two arguments
else
local _res # _res is a local variable
fi
: "${*//+/ }" # _ hold argumenrs having ``+'' replaced by spaces
printf -v _res %b "${_//%/\\x}" # store in _res rendered string
[[ ${_res#A} == _res=* ]] && # print _res if local
echo "$_res"
}
Usage:
uridecode Mozilla%2F5.0%20%28Macintosh%3B%20U%3B%20Intel%20Mac%20OS%20X%2010.6%3B%20en
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en
uridecode -v myvar Hell%6f w%6Frld%21
echo $myvar
Hello world!
Because I use $* instead of $1, and because a URI doesn't contain shell special characters, there is no need to quote the arguments.
A slightly modified version of the Python answer that accepts an input and output file in a one liner.
cat inputfile.txt | python -c "import sys, urllib as ul; print ul.unquote(sys.stdin.read());" > ouputfile.txt
$ uenc='H%C3%B6he %C3%BCber%20dem%20Meeresspiegel'
$ utf8=$(printf "${uenc//%/\\x}")
$ echo $utf8
Höhe über dem Meeresspiegel
$

best way to find top-level directory for path in bash

I need a command that will return the top level base directory for a specified path in bash.
I have an approach that works, but seems ugly:
echo "/go/src/github.myco.com/viper-ace/psn-router" | cut -d "/" -f 2 | xargs printf "/%s"
It seems there should be a better way; however, all the alternatives I've seen seem worse.
Thanks for any suggestions!
One option is using awk:
echo "/go/src/github.myco.com/viper-ace/psn-router" |
awk -F/ '{print FS $2}'
/go
As a native-bash approach forking no subshells and invoking no other programs (thus, written to minimize overhead), which works correctly in corner cases including directories with newlines:
topdir() {
local re='^(/+[^/]+)'
[[ $1 =~ $re ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}
Like most other solutions here, invocation will then look something like outvar=$(topdir "$path").
To minimize overhead even further, you could pass in the destination variable name rather than capturing stdout:
topdir() {
local re='^(/+[^/]+)'
[[ $1 =~ $re ]] && printf -v "$2" '%s' "${BASH_REMATCH[1]}"
}
...used as: topdir "$path" outvar, after which "$outvar" will expand to the result.
Not sure it's better, but with sed:
$ echo "/go/src/github.myco.com/viper-ace/psn-router" | sed -E 's_(/[^/]+).*_\1_'
/go
Here's a sed possibility. Still ugly. Handles things like ////////home/path/to/dir. Still blows up on newlines.
$ echo "////home/path/to/dir" | sed 's!/*\([^/]*\).*!\1!g'
/home
Newlines breaking it:
$ cd 'testing '$'\n''this'
$ pwd
/home/path/testing
this
$ pwd | sed 's!/*\([^/]*\).*!/\1!g'
/home
/this
If you know your directories will be rather normally named, your and anubhava's solutions certainly seem to be more readable.
This is bash, sed and tr in a function:
#!/bin/bash
function topdir(){
dir=$( echo "$1" | tr '\n' '_' )
echo "$dir" | sed -e 's#^\(/[^/]*\)\(.*\)$#\1#g'
}
topdir '/go/src/github.com/somedude/someapp'
topdir '/home/somedude'
topdir '/with spaces/more here/app.js'
topdir '/with newline'$'\n''before/somedir/somefile.txt'
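Those four calls should print something like the following (the embedded newline is intentionally turned into an underscore by tr):
/go
/home
/with spaces
/with newline_before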
Regards!
