How do i iterate all ASCII to use them as variable? [duplicate] - bash

This question already has answers here:
Integer ASCII value to character in BASH using printf
(13 answers)
Closed 3 years ago.
I have a binary, "crackme", that I want to run with every ASCII character as its parameter. Here's what I've done so far.
#!/bin/bash
for ((i=32;i<127;i++))
do
c="\\$(printf %03o "$i")";
echo $c;
./crackme $c;
done
The command executed is "./crackme \101" (for i=65), but what I'm actually trying to execute is "./crackme A".

For posterity, a couple of useful functions:
# 65 -> A
chr() { printf "\\x$(printf "%x" "$1")"; }
# A -> 65
ord() { printf "%d" "'$1"; }
The odd final parameter of the ord printf command is documented:
Arguments to non-string format specifiers are treated as C language constants, except that a leading plus or minus sign is allowed, and if the leading character is a single or double quote, the value is the ASCII value of the following character.
then
for ((i = 32; i < 127; i++)); do
./crackme "$(chr $i)"
done
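As a variation on the loop above, the conversion can also be done with printf -v, which avoids one command substitution per character. This is only a sketch: the name ascii_chars is made up for illustration, and it prints each character instead of calling ./crackme.

```shell
#!/usr/bin/env bash
# ascii_chars is a hypothetical helper (not from the answers above).
# It prints each printable ASCII character (codes 32..126) on its own line.
ascii_chars() {
  local i esc ch
  for ((i = 32; i < 127; i++)); do
    printf -v esc '\\%03o' "$i"   # 65 -> the four characters \101
    printf -v ch "$esc"           # expand the octal escape: \101 -> A
    printf '%s\n' "$ch"           # in the real loop: ./crackme "$ch"
  done
}
```

Calling ascii_chars lists 95 characters, starting with the space character.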

Not pretty, but this works:
for ((i=32;i<127;i++))
do
printf -v c '\\x%x' "$i"
./crackme "$(echo -e $c)"
done

Related

Reading a particular Digit from a given number in shell [duplicate]

I have a string in a Bash shell script that I want to split into an array of characters, not based on a delimiter but just one character per array index. How can I do this? Ideally it would not use any external programs. Let me rephrase that. My goal is portability, so things like sed that are likely to be on any POSIX compatible system are fine.
Try
echo "abcdefg" | fold -w1
Edit: Added a more elegant solution suggested in comments.
echo "abcdefg" | grep -o .
You can access each letter individually already without an array conversion:
$ foo="bar"
$ echo ${foo:0:1}
b
$ echo ${foo:1:1}
a
$ echo ${foo:2:1}
r
If that's not enough, you could use something like this:
$ bar=($(echo $foo|sed 's/\(.\)/\1 /g'))
$ echo ${bar[1]}
a
If you can't even use sed or something like that, you can use the first technique above combined with a while loop using the original string's length (${#foo}) to build the array.
Warning: the code below does not work if the string contains whitespace. I think Vaughn Cato's answer has a better chance at surviving with special chars.
thing=($(i=0; while [ $i -lt ${#foo} ] ; do echo ${foo:$i:1} ; i=$((i+1)) ; done))
As an alternative to iterating over 0 .. ${#string}-1 with a for/while loop, there are two other ways I can think of to do this with only bash: using =~ and using printf. (There's a third possibility using eval and a {..} sequence expression, but this lacks clarity.)
With the correct environment and NLS enabled in bash these will work with non-ASCII as hoped, removing potential sources of failure with older system tools such as sed, if that's a concern. These will work from bash-3.0 (released 2005).
Using =~ and regular expressions, converting a string to an array in a single expression:
string="wonkabars"
[[ "$string" =~ ${string//?/(.)} ]] # splits into array
printf "%s\n" "${BASH_REMATCH[@]:1}" # loop free: reuse fmtstr
declare -a arr=( "${BASH_REMATCH[@]:1}" ) # copy array for later
The way this works is to perform an expansion of string which substitutes (.) for each single character, then match this generated regular expression, with grouping, to capture each individual character into BASH_REMATCH[]. Index 0 is set to the entire string; since that special array is read-only, you cannot remove it. Note the :1 when the array is expanded, which skips over index 0 if needed.
Some quick testing for non-trivial strings (>64 chars) shows this method is substantially faster than one using bash string and array operations.
The above will work with strings containing newlines, =~ supports POSIX ERE where . matches anything except NUL by default, i.e. the regex is compiled without REG_NEWLINE. (The behaviour of POSIX text processing utilities is allowed to be different by default in this respect, and usually is.)
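To make the generated pattern visible, here is a small sketch of the same technique on a three-character string that contains a space. The variable names pattern and chars are chosen for illustration only.

```shell
#!/usr/bin/env bash
string='a b'
pattern="${string//?/(.)}"     # one (.) group per character
printf '%s\n' "$pattern"       # prints (.)(.)(.)

if [[ $string =~ $pattern ]]; then
  chars=("${BASH_REMATCH[@]:1}")   # skip index 0 (the whole match)
fi
declare -p chars
```

The copy into chars matters if another =~ match runs later, because each match overwrites BASH_REMATCH.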
Second option, using printf:
string="wonkabars"
ii=0
while printf "%s%n" "${string:ii++:1}" xx; do
((xx)) && printf "\n" || break
done
This loop increments index ii to print one character at a time, and breaks out when there are no characters left. This would be even simpler if the bash printf returned the number of characters printed (as in C) rather than an error status; instead, the number of characters printed is captured in xx using %n. (This works at least back as far as bash-2.05b.)
With bash-3.1 and printf -v var you have slightly more flexibility, and can avoid falling off the end of the string should you be doing something other than printing the characters, e.g. to create an array:
declare -a arr
ii=0
while printf -v cc "%s%n" "${string:(ii++):1}" xx; do
((xx)) && arr+=("$cc") || break
done
If your string is stored in variable x, this produces an array y with the individual characters:
i=0
while [ $i -lt ${#x} ]; do y[$i]=${x:$i:1}; i=$((i+1));done
The most simple, complete and elegant solution:
$ read -a ARRAY <<< $(echo "abcdefg" | sed 's/./& /g')
and test
$ echo ${ARRAY[0]}
a
$ echo ${ARRAY[1]}
b
Explanation: read -a reads the stdin as an array and assigns it to the variable ARRAY treating spaces as delimiter for each array item.
Piping the echoed string through sed just adds the needed spaces between the characters.
We are using Here String (<<<) to feed the stdin of the read command.
I have found that the following works the best:
array=( `echo string | grep -o . ` )
(note the backticks)
then if you do: echo ${array[@]} ,
you get: s t r i n g
or: echo ${array[2]} ,
you get: r
Pure Bash solution with no loop:
#!/usr/bin/env bash
str='The quick brown fox jumps over a lazy dog.'
# Need extglob for the replacement pattern
shopt -s extglob
# Split string characters into array (skip first record)
# Character 037 is the octal representation of ASCII Record Separator
# so it can capture all other characters in the string, including spaces.
IFS= mapfile -s1 -t -d $'\37' array <<<"${str//?()/$'\37'}"
# Strip out captured trailing newline of here-string in last record
array[-1]="${array[-1]%?}"
# Debug print array
declare -p array
string=hello123
for i in $(seq 0 $((${#string} - 1)))
do array[$i]=${string:$i:1}
done
echo "zero element of array is [${array[0]}]"
echo "entire array is [${array[@]}]"
The zero element of array is [h]. The entire array is [h e l l o 1 2 3].
Yet another one :). The stated question simply says 'Split string into character array'; it doesn't say much about the state of the receiving array, or about special chars and control chars.
My assumption is that if I want to split a string into an array of chars I want the receiving array containing just that string and no left over from previous runs, yet preserve any special chars.
For instance, the proposed family of solutions like
for (( i=0 ; i < ${#x} ; i++ )); do y[i]=${x:i:1}; done
leaves leftovers in the target array.
$ y=(1 2 3 4 5 6 7 8)
$ x=abc
$ for (( i=0 ; i < ${#x} ; i++ )); do y[i]=${x:i:1}; done
$ printf '%s ' "${y[@]}"
a b c 4 5 6 7 8
Besides, rather than writing the long line each time we want to split a string, why not hide all this in a function that we can keep in a package source file, with an API like
s2a "Long string" ArrayName
I got this one that seems to do the job.
$ s2a()
> { [ "$2" ] && typeset -n __=$2 && unset $2;
> [ "$1" ] && __+=("${1:0:1}") && s2a "${1:1}"
> }
$ a=(1 2 3 4 5 6 7 8 9 0) ; printf '%s ' "${a[@]}"
1 2 3 4 5 6 7 8 9 0
$ s2a "Split It" a ; printf '%s ' "${a[@]}"
S p l i t I t
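A non-recursive take on the same API might look like the sketch below. It assumes bash 4.3 or newer (for namerefs); the name split_chars and the _sc_ prefixed variables are made up here, and the caller's array must not itself be called _sc_out.

```shell
#!/usr/bin/env bash
# split_chars STRING ARRAYNAME  (hypothetical name; needs bash 4.3+)
split_chars() {
  local -n _sc_out=$2    # _sc_out becomes an alias for the caller's array
  local _sc_i
  _sc_out=()             # drop leftovers from earlier use
  for ((_sc_i = 0; _sc_i < ${#1}; _sc_i++)); do
    _sc_out+=("${1:_sc_i:1}")
  done
}

y=(1 2 3 4 5 6 7 8)
split_chars 'a b' y
declare -p y
```

After the call, y holds exactly three elements (a, a space, and b); the eight old values are gone.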
If the text can contain spaces:
eval a=( $(echo "this is a test" | sed "s/\(.\)/'\1' /g") )
$ echo hello | awk NF=NF FS=
h e l l o
Or
$ echo hello | awk '$0=RT' RS=[[:alnum:]]
h
e
l
l
o
I know this is a "bash" question, but please let me show you the perfect solution in zsh, a shell very popular these days:
string='this is a string'
string_array=(${(s::)string}) #Parameter expansion. And that's it!
print ${(t)string_array} -> type array
print $#string_array -> 16 items
This is an old post/thread, but bash v5.2+ has a new feature, the shell option patsub_replacement, which can be combined with the =~ operator for regex. This is more or less the same as @mr.spuratic's post/answer.
str='There can be only one, the Highlander.'
regexp="${str//?/(&)}"
[[ "$str" =~ $regexp ]] &&
printf '%s\n' "${BASH_REMATCH[@]:1}"
Or by just: (which includes the whole string at index 0)
declare -p BASH_REMATCH
If that is not desired, one can remove the value of the first index (index 0), with
unset -v 'BASH_REMATCH[0]'
instead of using printf or echo to print the value of the array BASH_REMATCH
One can check/see the value of the variable "$regexp" with either
declare -p regexp
Output
declare -- regexp="(T)(h)(e)(r)(e)( )(c)(a)(n)( )(b)(e)( )(o)(n)(l)(y)( )(o)(n)(e)(,)( )(t)(h)(e)( )(H)(i)(g)(h)(l)(a)(n)(d)(e)(r)(.)"
or
echo "$regexp"
Using it in a script, one might want to test if the shopt is enabled or not, although the manual says it is on/enabled by default.
Something like.
if ! shopt -q patsub_replacement; then
shopt -s patsub_replacement
fi
But yeah, check the bash version too! If you're not sure which version of bash is in use.
if ! ((BASH_VERSINFO[0] > 5 || (BASH_VERSINFO[0] == 5 && BASH_VERSINFO[1] >= 2))); then
printf 'No dice! bash version 5.2+ is required!\n' >&2
exit 1
fi
Space can be excluded from regexp variable, change it from
regexp="${str//?/(&)}"
To
regexp="${str//[! ]/(&)}"
and the output is:
declare -- regexp="(T)(h)(e)(r)(e) (c)(a)(n) (b)(e) (o)(n)(l)(y) (o)(n)(e) (t)(h)(e) (H)(i)(g)(h)(l)(a)(n)(d)(e)(r)(.)"
Maybe not as efficient as the other post/answer but it is still a solution/option.
If you want to store this in an array, you can do this:
string=foo
unset chars
declare -a chars
while read -N 1
do
chars[${#chars[@]}]="$REPLY"
done <<<"$string"x
unset chars[$((${#chars[@]} - 1))]
unset chars[$((${#chars[@]} - 1))]
echo "Array: ${chars[@]}"
Array: f o o
echo "Array length: ${#chars[@]}"
Array length: 3
The final x is necessary to handle the fact that a newline is appended after $string if it doesn't contain one.
If you want to use NUL-separated characters, you can try this:
echo -n "$string" | while read -N 1
do
printf %s "$REPLY"
printf '\0'
done
AWK is quite convenient:
a='123'; echo $a | awk 'BEGIN{FS="";OFS=" "} {print $1,$2,$3}'
where FS and OFS are the field delimiters for input and output, respectively
For those who landed here searching how to do this in fish:
We can use the builtin string command (since v2.3.0) for string manipulation.
↪ string split '' abc
a
b
c
The output is a list, so array operations will work.
↪ for c in (string split '' abc)
echo char is $c
end
char is a
char is b
char is c
Here's a more complex example iterating over the string with an index.
↪ set --local chars (string split '' abc)
for i in (seq (count $chars))
echo $i: $chars[$i]
end
1: a
2: b
3: c
zsh solution: To put the scalar string variable into arr, which will be an array:
arr=(${(ps::)string})
If you also need support for strings with newlines, you can do:
str2arr(){ local string="$1"; mapfile -d $'\0' Chars < <(for i in $(seq 0 $((${#string}-1))); do printf '%s\u0000' "${string:$i:1}"; done); printf '%s' "(${Chars[*]@Q})" ;}
string=$(printf '%b' "apa\nbepa")
declare -a MyString=$(str2arr "$string")
declare -p MyString
# prints declare -a MyString=([0]="a" [1]="p" [2]="a" [3]=$'\n' [4]="b" [5]="e" [6]="p" [7]="a")
As a response to Alexandro de Oliveira, I think the following is more elegant or at least more intuitive:
while read -r -n1 c ; do arr+=("$c") ; done <<<"hejsan"
declare -r some_string='abcdefghijklmnopqrstuvwxyz'
declare -a some_array
declare -i idx
for ((idx = 0; idx < ${#some_string}; ++idx)); do
some_array+=("${some_string:idx:1}")
done
for idx in "${!some_array[@]}"; do
echo "$((idx)): ${some_array[idx]}"
done
Pure bash, no loop.
Another solution, similar to/adapted from Léa Gris' solution, but using read -a instead of readarray/mapfile :
#!/usr/bin/env bash
str='azerty'
# Need extglob for the replacement pattern
shopt -s extglob
# Split string characters into array
# ${str//?()/$'\x1F'} replace each character "c" with "^_c".
# ^_ (Control-_, 0x1f) is Unit Separator (US), you can choose another
# character.
IFS=$'\x1F' read -ra array <<< "${str//?()/$'\x1F'}"
# now, array[0] contains an empty string and the rest of array (starting
# from index 1) contains the original string characters :
declare -p array
# Or, if you prefer to keep the array "clean", you can delete
# the first element and pack the array :
unset array[0]
array=("${array[@]}")
declare -p array
However, I prefer the shorter version (which I find easier to understand), where we remove the initial 0x1f before assigning the array:
#!/usr/bin/env bash
str='azerty'
shopt -s extglob
tmp="${str//?()/$'\x1F'}" # same as code above
tmp=${tmp#$'\x1F'} # remove initial 0x1f
IFS=$'\x1F' read -ra array <<< "$tmp" # assign array
declare -p array # verification

Value of var is not shown when concatenating $ and var --> $var [duplicate]

This question already has answers here:
How can I look up a variable by name with #!/bin/sh (POSIX sh)?
(4 answers)
Bash String concatenation and get content
(2 answers)
Closed 4 years ago.
I've got about a day of experience in bash as of now..
string () {
for (( i=0; i<${#1}; i++ ))
do
echo "$"${1:$i:1}""
done
}
string "hello"
This script returns "$h", "$e", "$l", "$l", "$o",
but I actually want it to return the contents of variables h, e, l, l and o.
How do I do that?
You need to use indirect parameter expansion:
for ((i=0; i<${#1}; i++)); do
t=${1:i:1}
echo "${!t}"
done
${!t} takes the value of $t, and treats that as the name of the parameter to expand.
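A slightly fuller sketch of indirect expansion, using the h and o variables from the other answer as sample data. The function name show_vars and the _sv_ prefixed locals are invented for this example; the prefix keeps the locals from shadowing one-letter variables being looked up. It assumes every character of the argument is a valid variable name.

```shell
#!/usr/bin/env bash
h=foo o=bar
unset -v e l

# show_vars is a hypothetical name. For each character of $1 it prints
# NAME=VALUE, using ${!name} to look the variable up by name.
show_vars() {
  local _sv_i _sv_name
  for ((_sv_i = 0; _sv_i < ${#1}; _sv_i++)); do
    _sv_name=${1:_sv_i:1}
    # ${!name-default} falls back to the default when the variable is unset
    printf '%s=%s\n' "$_sv_name" "${!_sv_name-<unset>}"
  done
}

show_vars hello
```

With the assignments above, this prints h=foo, three lines ending in =<unset> for e, l and l, and o=bar.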
A one-liner using eval to safely write a series of echos to output bash parameter transformation assignment statements, with GNU grep divvying up the input string:
h=foo o=bar
string() { eval $(printf 'echo ${%s@A};\n' $(grep -o . <<< "$@" )) ; }
string hello
Output, (blank lines represent the unset variables $e and $l):
h='foo'
o='bar'

How to iterate over the characters of a string in a POSIX shell script?

A POSIX compliant shell shall provide mechanisms like this to iterate over collections of strings:
for x in $(seq 1 5); do
echo $x
done
But, how do I iterate over each character of a word?
It's a little circuitous, but I think this'll work in any posix-compliant shell. I've tried it in dash, but I don't have busybox handy to test with.
var='ab * cd'
tmp="$var" # The loop will consume the variable, so make a temp copy first
while [ -n "$tmp" ]; do
rest="${tmp#?}" # All but the first character of the string
first="${tmp%"$rest"}" # Remove $rest, and you're left with the first character
echo "$first"
tmp="$rest"
done
Output:
a
b
*
c
d
Note that the double-quotes around the right-hand side of assignments are not needed; I just prefer to use double-quotes around all expansions rather than trying to keep track of where it's safe to leave them off. On the other hand, the double-quotes in [ -n "$tmp" ] are absolutely necessary, and the inner double-quotes in first="${tmp%"$rest"}" are needed if the string contains "*".
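The same first/rest loop can be wrapped in a function. The sketch below counts how often one character occurs in a string, using only POSIX features; count_char and the _cc_ prefixed variable names are made up for illustration (POSIX sh has no local, so the prefix is there to limit name clashes).

```shell
#!/bin/sh
# count_char STRING CHAR  (hypothetical helper; prints how often CHAR occurs)
count_char() {
  _cc_rest=$1
  _cc_n=0
  while [ -n "$_cc_rest" ]; do
    _cc_tail=${_cc_rest#?}               # all but the first character
    _cc_first=${_cc_rest%"$_cc_tail"}    # just the first character
    if [ "$_cc_first" = "$2" ]; then
      _cc_n=$((_cc_n + 1))
    fi
    _cc_rest=$_cc_tail
  done
  echo "$_cc_n"
}

count_char 'ab * cd *' '*'   # prints 2
```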
Use getopts to process input one character at a time. The : instructs getopts to ignore illegal options and set OPTARG. The leading - in the input makes getopts treat the string as options.
If getopts encounters a colon, it will not set OPTARG, so the script uses parameter expansion to return : when OPTARG is not set/null.
#!/bin/sh
IFS='
'
split_string () {
OPTIND=1;
while getopts ":" opt "-$1"
do echo "'${OPTARG:-:}'"
done
}
while read -r line;do
split_string "$line"
done
As with the accepted answer, this processes strings byte-wise instead of character-wise, corrupting multibyte codepoints. The trick is to detect multibyte codepoints, concatenate their bytes and then print them:
#!/bin/sh
IFS='
'
split_string () {
OPTIND=1;
while getopts ":" opt "$1";do
case "${OPTARG:=:}" in
([[:print:]])
[ -n "$multi" ] && echo "$multi" && multi=
echo "$OPTARG" && continue
esac
multi="$multi$OPTARG"
case "$multi" in
([[:print:]]) echo "$multi" && multi=
esac
done
[ -n "$multi" ] && echo "$multi"
}
while read -r line;do
split_string "-$line"
done
Here the extra case "$multi" is used to detect when the multi buffer contains a printable character. This works on shells like Bash and Zsh but Dash and busybox ash do not pattern match multibyte codepoints, ignoring locale.
This degrades somewhat nicely: Dash/ash treat sequences of multibyte codepoints as one character, but handle multibyte characters surrounded by single byte characters fine.
Depending on your requirements it may be preferable not to split consecutive multibyte codepoints anyway, as the next codepoint may be a combining character which modifies the character before it.
This won't handle the case where a single byte character is followed by a combining character.
This works in dash and busybox:
echo 'ab * cd' | grep -o .
Output:
a
b
*
c
d
I was developing a script which demanded stacks... So, we can use it to iterate through strings
#!/bin/sh
# posix script
pop () {
# $1 top
# $2 stack
eval $1='$(expr "'\$$2'" : "\(.\).*")'
eval $2='$(expr "'\$$2'" : ".\(.*\)" )'
}
string="ABCDEFG"
while [ "$string" != "" ]
do
pop c string
echo "--" $c
done

Reverse Triangle using shell

OK, so I've been at this for a couple of days. I'm new to this whole bash/UNIX thing, but I am trying to write a script where the user inputs an integer and the script prints a triangle, using that integer as the base and decreasing until it reaches zero. An example would be:
reverse_triangle.bash 4
****
***
**
*
This is what I have so far, but when I run it nothing happens, and I have no idea what is wrong:
#!/bin/bash
input=$1
count=1
for (( i=$input; i>=$count;i-- ))
do
for (( j=1; j>=i; j++ ))
do
echo -n "*"
done
echo
done
exit 0
When I try to run it, nothing happens; it just goes to the next line. Help would be greatly appreciated :)
As I said in a comment, your test is wrong: you need
for (( j=1; j<=i; j++ ))
instead of
for (( j=1; j>=i; j++ ))
Otherwise, this loop is only executed when i=1, and it becomes an infinite loop.
Now if you want another way to solve that, in a much better way:
#!/bin/bash
[[ $1 = +([[:digit:]]) ]] || { printf >&2 'Argument must be a number\n'; exit 1; }
number=$((10#$1))
for ((;number>=1;--number)); do
printf -v spn '%*s' "$number"
printf '%s\n' "${spn// /*}"
done
Why is it better? First off, we check that the argument is really a number. Without this, your code is subject to arbitrary code injection. Also, we make sure that the number is understood in radix 10 with 10#$1. Otherwise, an argument like 09 would raise an error.
We don't really need an extra variable for the loop, the provided argument is good enough. Now the trick: to print n times a pattern, a cool method is to store n spaces in a variable with printf: %*s will expand to n spaces, where n is the corresponding argument found by printf.
For example:
printf '%s%*s%s\n' hello 42 world
would print:
hello world
(with 42 spaces).
Editor's note: %*s will NOT generally expand to n spaces, as evidenced by above output, which contains 37 spaces.
Instead, the argument that * is mapped to, 42, is the field width for the s field, which maps to the following argument, world, causing the string world to be left-space-padded to a length of 42; since world has a character count of 5, 37 spaces are used for padding.
To make the example work as intended, use printf '%s%*s%s\n' hello 42 '' world - note the empty string argument following 42, which ensures that the entire field is made up of padding, i.e., spaces (you'd get the same effect if no arguments followed 42).
With printf's -v option, we can store any string formatted by printf into a variable; here we're storing $number spaces in spn. Finally, we replace all spaces by the character *, using the expansion ${spn// /*}.
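Put together, the padding trick (with the empty-string argument from the editor's note) can be sketched as a small helper. The name repeat_char is hypothetical, and CHAR is assumed to be a plain character such as * or -.

```shell
#!/usr/bin/env bash
# repeat_char COUNT CHAR  (hypothetical helper)
repeat_char() {
  local pad
  printf -v pad '%*s' "$1" ''     # empty string padded to width COUNT
  printf '%s\n' "${pad// /$2}"    # swap every space for CHAR
}

repeat_char 5 '*'   # prints *****
```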
Yet another possibility:
#!/bin/bash
[[ $1 = +([[:digit:]]) ]] || { printf >&2 'Argument must be a number\n'; exit 1; }
printf -v s '%*s' $((10#$1))
s=${s// /*}
while [[ $s ]]; do
printf '%s\n' "$s"
s=${s%?}
done
This time we construct the variable s that contains a bunch of * (number given by user), using the previous technique. Then we have a while loop that loops while s is non empty. At each iteration we print the content of s and we remove a character with the expansion ${s%?} that removes the last character of s.
Building on gniourf_gniourf's helpful answer:
The following is simpler and performs significantly better:
#!/bin/bash
count=$1 # (... number-validation code omitted for brevity)
# Create the 1st line, composed of $count '*' chars, and store in var. $line.
printf -v line '%.s*' $(seq $count)
# Count from $count down to 1.
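while (( count > 0 )); do
# Print a *substring* of the 1st line based on the current value of $count.
printf "%.${count}s\n" "$line"
# Decrement only after printing, so that the first line printed is the full one.
count=$(( count - 1 ))
done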
printf -v line '%.s*' $(seq $count) is a trick that prints * $count times, thanks to %.s* resulting in * for each argument supplied, irrespective of the arguments' values (%.s effectively ignores its argument). $(seq $count) expands to $count arguments, resulting overall in a string composed of $count * chars, which, thanks to -v line, is stored in variable $line.
printf "%.${count}s\n" "$line" prints a substring from the beginning of $line that is $count chars long.

How do you echo a 4-digit Unicode character in Bash?

I'd like to add the Unicode skull and crossbones to my shell prompt (specifically the 'SKULL AND CROSSBONES' (U+2620)), but I can't figure out the magic incantation to make echo spit out it, or any other 4-digit Unicode character. Two-digit ones are easy, for example echo -e "\x55".
In addition to the answers below it should be noted that, obviously, your terminal needs to support Unicode for the output to be what you expect. gnome-terminal does a good job of this, but it isn't necessarily turned on by default.
In macOS's Terminal app, go to Preferences -> Encodings and choose Unicode (UTF-8).
In UTF-8 it's actually 6 digits (or 3 bytes).
$ printf '\xE2\x98\xA0'
☠
To check how it's encoded by the console, use hexdump:
$ printf ☠ | hexdump
0000000 98e2 00a0
0000003
% echo -e '\u2620' # \u takes four hexadecimal digits
☠
% echo -e '\U0001f602' # \U takes eight hexadecimal digits
😂
This works in Zsh (I've checked version 4.3) and in Bash 4.2 or newer.
So long as your text-editors can cope with Unicode (presumably encoded in UTF-8) you can enter the Unicode code-point directly.
For instance, in the Vim text-editor you would enter insert mode and press Ctrl + V + U and then the code-point number as a 4-digit hexadecimal number (pad with zeros if necessary). So you would type Ctrl + V + U 2 6 2 0. See: What is the easiest way to insert Unicode characters into a document?
At a terminal running Bash you would type CTRL+SHIFT+U and type in the hexadecimal code-point of the character you want. During input your cursor should show an underlined u. The first non-digit you type ends input, and renders the character. So you should be able to print U+2620 in Bash using the following:
echo CTRL+SHIFT+U2620ENTERENTER
(The first enter ends Unicode input, and the second runs the echo command.)
Credit: Ask Ubuntu SE
Here's a fully internal Bash implementation, no forking, unlimited size of Unicode characters.
fast_chr() {
local __octal
local __char
printf -v __octal '%03o' $1
printf -v __char \\$__octal
REPLY=$__char
}
function unichr {
local c=$1 # Ordinal of char
local l=0 # Byte ctr
local o=63 # Ceiling
local p=128 # Accum. bits
local s='' # Output string
(( c < 0x80 )) && { fast_chr "$c"; echo -n "$REPLY"; return; }
while (( c > o )); do
fast_chr $(( t = 0x80 | c & 0x3f ))
s="$REPLY$s"
(( c >>= 6, l++, p += o+1, o>>=1 ))
done
fast_chr $(( t = p | c ))
echo -n "$REPLY$s"
}
## test harness
for (( i=0x2500; i<0x2600; i++ )); do
unichr $i
done
Output was:
─━│┃┄┅┆┇┈┉┊┋┌┍┎┏
┐┑┒┓└┕┖┗┘┙┚┛├┝┞┟
┠┡┢┣┤┥┦┧┨┩┪┫┬┭┮┯
┰┱┲┳┴┵┶┷┸┹┺┻┼┽┾┿
╀╁╂╃╄╅╆╇╈╉╊╋╌╍╎╏
═║╒╓╔╕╖╗╘╙╚╛╜╝╞╟
╠╡╢╣╤╥╦╧╨╩╪╫╬╭╮╯
╰╱╲╳╴╵╶╷╸╹╺╻╼╽╾╿
▀▁▂▃▄▅▆▇█▉▊▋▌▍▎▏
▐░▒▓▔▕▖▗▘▙▚▛▜▝▞▟
■□▢▣▤▥▦▧▨▩▪▫▬▭▮▯
▰▱▲△▴▵▶▷▸▹►▻▼▽▾▿
◀◁◂◃◄◅◆◇◈◉◊○◌◍◎●
◐◑◒◓◔◕◖◗◘◙◚◛◜◝◞◟
◠◡◢◣◤◥◦◧◨◩◪◫◬◭◮◯
◰◱◲◳◴◵◶◷◸◹◺◻◼◽◾◿
Quick one-liner to convert UTF-8 characters into their 3-byte format:
var="$(echo -n '☠' | od -An -tx1)"; printf '\\x%s' ${var^^}; echo
or
echo -n '☠' | od -An -tx1 | sed 's/ /\\x/g'
The output of both is \xE2\x98\xA0, so you can go the other way and write:
echo $'\xe2\x98\xa0' # ☠
Just put "☠" in your shell script. In the correct locale and on a Unicode-enabled console it'll print just fine:
$ echo ☠
☠
$
An ugly "workaround" would be to output the UTF-8 sequence, but that also depends on the encoding used:
$ echo -e '\xE2\x98\xA0'
☠
$
In bash, to print a Unicode character to output, use \x, \u or \U (the first takes up to 2 hex digits, the second up to 4, the third up to 8):
echo -e '\U1f602'
If you want to assign it to a variable, use the $'...' syntax:
x=$'\U1f602'
echo $x
Here is a list of all Unicode emoji available:
https://en.wikipedia.org/wiki/Emoji#Unicode_blocks
Example:
echo -e "\U1F304"
🌄
To get the UTF-8 byte values of this character, use hexdump:
echo -e "🌄" | hexdump -C
00000000 f0 9f 8c 84 0a |.....|
00000005
And then use those values in hex format:
echo -e "\xF0\x9F\x8C\x84\x0A"
🌄
Any of these three commands will print the character you want in a console, provided the console accepts UTF-8 characters (most current ones do):
echo -e "SKULL AND CROSSBONES (U+2620) \U02620"
echo $'SKULL AND CROSSBONES (U+2620) \U02620'
printf "%b" "SKULL AND CROSSBONES (U+2620) \U02620\n"
SKULL AND CROSSBONES (U+2620) ☠
After, you could copy and paste the actual glyph (image, character) to any (UTF-8 enabled) text editor.
If you need to see how such Unicode Code Point is encoded in UTF-8, use xxd (much better hex viewer than od):
echo $'(U+2620) \U02620' | xxd
0000000: 2855 2b32 3632 3029 20e2 98a0 0a (U+2620) ....
That means that the UTF8 encoding is: e2 98 a0
Or, in HEX to avoid errors: 0xE2 0x98 0xA0. That is, the values between the space (HEX 20) and the Line-Feed (Hex 0A).
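The three bytes can also be derived by hand with shell arithmetic. The sketch below covers only code points in the range U+0800..U+FFFF (the three-byte case); the function name utf8_bytes3 is made up for this example.

```shell
#!/usr/bin/env bash
# utf8_bytes3 CODEPOINT  (hypothetical helper)
# Prints the three UTF-8 bytes, in hex, for a code point in U+0800..U+FFFF.
utf8_bytes3() {
  local cp=$(( $1 ))
  printf '%02x %02x %02x\n' \
    $(( 0xE0 | (cp >> 12) )) \
    $(( 0x80 | ((cp >> 6) & 0x3F) )) \
    $(( 0x80 | (cp & 0x3F) ))
}

utf8_bytes3 0x2620   # prints e2 98 a0
```

The lead byte carries the top 4 bits after the 1110 prefix, and each continuation byte carries 6 bits after the 10 prefix.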
If you want a deep dive into converting numbers to chars: look here to see an article from Greg's wiki (BashFAQ) about ASCII encoding in Bash!
I'm using this:
$ echo -e '\u2620'
☠
This is much easier than searching for a hex representation... I'm using this in my shell scripts. That works on gnome-term and urxvt AFAIK.
You may need to encode the code point as octal in order for prompt expansion to correctly decode it.
U+2620 encoded as UTF-8 is E2 98 A0.
So in Bash,
export PS1="\342\230\240"
will make your shell prompt into skull and bones.
If you don't mind a Perl one-liner:
$ perl -CS -E 'say "\x{2620}"'
☠
-CS enables UTF-8 decoding on input and UTF-8 encoding on output. -E evaluates the next argument as Perl, with modern features like say enabled. If you don't want a newline at the end, use print instead of say.
Sorry for reviving this old question. But when using bash there is a very easy approach to create Unicode codepoints from plain ASCII input, which even does not fork at all:
unicode() { local -n a="$1"; local c; printf -vc '\\U%08x' "$2"; printf -va "$c"; }
unicodes() { local a c; for a; do printf -vc '\\U%08x' "$a"; printf "$c"; done; };
Use it as follows to define certain codepoints
unicode crossbones 0x2620
echo "$crossbones"
or to dump the first 65536 unicode codepoints to stdout (takes less than 2s on my machine. The additional space is to prevent certain characters to flow into each other due to shell's monospace font):
for a in {0..65535}; do unicodes "$a"; printf ' '; done
or to tell a little very typical parent's story (this needs Unicode 2010):
unicodes 0x1F6BC 32 43 32 0x1F62D 32 32 43 32 0x1F37C 32 61 32 0x263A 32 32 43 32 0x1F4A9 10
Explanation:
printf '\UXXXXXXXX' prints out any Unicode character
printf '\\U%08x' number prints \UXXXXXXXX with the number converted to Hex, this then is fed to another printf to actually print out the Unicode character
printf recognizes octal (0oct), hex (0xHEX) and decimal (0 or numbers starting with 1 to 9) as numbers, so you can choose whichever representation fits best
printf -v var .. gathers the output of printf into a variable, without fork (which tremendously speeds up things)
local variable is there to not pollute the global namespace
local -n var=other aliases var to other, such that assignment to var alters other. One interesting part here is, that var is part of the local namespace, while other is part of the global namespace.
Please note that there is no such thing as local or global namespace in bash. Variables are kept in the environment, and such are always global. Local just puts away the current value and restores it when the function is left again. Other functions called from within the function with local will still see the "local" value. This is a fundamentally different concept than all the normal scoping rules found in other languages (and what bash does is very powerful but can lead to errors if you are a programmer who is not aware of that).
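The point about called functions seeing the "local" value can be checked with a tiny sketch; the names x, show and outer are arbitrary.

```shell
#!/usr/bin/env bash
x=global
show()  { printf '%s\n' "$x"; }
outer() { local x=local; show; }

outer   # prints local: show sees the x that outer declared local
show    # prints global: the old value is back once outer has returned
```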
In Bash:
UnicodePointToUtf8()
{
local x="$1" # ok if '0x2620'
x=${x/\\u/0x} # '\u2620' -> '0x2620'
x=${x/U+/0x}; x=${x/u+/0x} # 'U+2620' -> '0x2620'
x=$((x)) # from hex to decimal
local y=$x n=0
[ $x -ge 0 ] || return 1
while [ $y -gt 0 ]; do y=$((y>>1)); n=$((n+1)); done
if [ $n -le 7 ]; then # 7
y=$x
elif [ $n -le 11 ]; then # 5+6
y=" $(( ((x>> 6)&0x1F)+0xC0 )) \
$(( (x&0x3F)+0x80 ))"
elif [ $n -le 16 ]; then # 4+6+6
y=" $(( ((x>>12)&0x0F)+0xE0 )) \
$(( ((x>> 6)&0x3F)+0x80 )) \
$(( (x&0x3F)+0x80 ))"
else # 3+6+6+6
y=" $(( ((x>>18)&0x07)+0xF0 )) \
$(( ((x>>12)&0x3F)+0x80 )) \
$(( ((x>> 6)&0x3F)+0x80 )) \
$(( (x&0x3F)+0x80 ))"
fi
printf -v y '\\x%x' $y
echo -n -e $y
}
# test
for (( i=0x2500; i<0x2600; i++ )); do
UnicodePointToUtf8 $i
[ "$(( i+1 & 0x1f ))" != 0 ] || echo ""
done
x='U+2620'
echo "$x -> $(UnicodePointToUtf8 $x)"
Output:
─━│┃┄┅┆┇┈┉┊┋┌┍┎┏┐┑┒┓└┕┖┗┘┙┚┛├┝┞┟
┠┡┢┣┤┥┦┧┨┩┪┫┬┭┮┯┰┱┲┳┴┵┶┷┸┹┺┻┼┽┾┿
╀╁╂╃╄╅╆╇╈╉╊╋╌╍╎╏═║╒╓╔╕╖╗╘╙╚╛╜╝╞╟
╠╡╢╣╤╥╦╧╨╩╪╫╬╭╮╯╰╱╲╳╴╵╶╷╸╹╺╻╼╽╾╿
▀▁▂▃▄▅▆▇█▉▊▋▌▍▎▏▐░▒▓▔▕▖▗▘▙▚▛▜▝▞▟
■□▢▣▤▥▦▧▨▩▪▫▬▭▮▯▰▱▲△▴▵▶▷▸▹►▻▼▽▾▿
◀◁◂◃◄◅◆◇◈◉◊○◌◍◎●◐◑◒◓◔◕◖◗◘◙◚◛◜◝◞◟
◠◡◢◣◤◥◦◧◨◩◪◫◬◭◮◯◰◱◲◳◴◵◶◷◸◹◺◻◼◽◾◿
U+2620 -> ☠
The printf builtin (just as the coreutils' printf) knows the \u escape sequence which accepts 4-digit Unicode characters:
\uHHHH Unicode (ISO/IEC 10646) character with hex value HHHH (4 digits)
Test with Bash 4.2.37(1):
$ printf '\u2620\n'
☠
Based on Stack Overflow questions Unix cut, remove first token and https://stackoverflow.com/a/15903654/781312:
(octal=$(echo -n ☠ | od -t o1 | head -1 | cut -d' ' -f2- | sed -e 's#\([0-9]\+\) *#\\0\1#g')
echo Octal representation is following $octal
echo -e "$octal")
Output is the following.
Octal representation is following \0342\0230\0240
☠
Easy with a Python2/3 one-liner:
$ python -c 'print u"\u2620"' # python2
$ python3 -c 'print(u"\u2620")' # python3
Results in:
☠
If hex value of unicode character is known
H="2620"
printf "%b" "\u$H"
If the decimal value of a unicode character is known
declare -i U=2*4096+6*256+2*16
printf -vH "%x" $U # convert to hex
printf "%b" "\u$H"
