How do you echo a 4-digit Unicode character in Bash?

I'd like to add the Unicode skull and crossbones to my shell prompt (specifically the 'SKULL AND CROSSBONES' (U+2620)), but I can't figure out the magic incantation to make echo spit out it, or any other 4-digit Unicode character. Two-digit ones are easy. For example, echo -e "\x55".
In addition to the answers below it should be noted that, obviously, your terminal needs to support Unicode for the output to be what you expect. gnome-terminal does a good job of this, but it isn't necessarily turned on by default.
In macOS's Terminal app, go to Preferences -> Encodings and choose Unicode (UTF-8).

In UTF-8 it's actually 6 digits (or 3 bytes).
$ printf '\xE2\x98\xA0'
☠
To check how it's encoded by the console, use hexdump:
$ printf ☠ | hexdump
0000000 98e2 00a0
0000003
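The 98e2 00a0 above is the same three bytes: hexdump's default format groups the input into 16-bit words and prints them in host byte order, so on a little-endian machine the bytes look swapped and the last word is zero-padded. A minimal sketch that prints the bytes in stream order instead, assuming od is available (the variable name bytes is only for illustration):

```bash
# Write the three UTF-8 bytes of U+2620 using octal escapes, then dump
# them one byte at a time so they appear in stream order.
bytes=$(printf '\342\230\240' | od -An -tx1 | tr -d ' \n')
printf '%s\n' "$bytes"   # e298a0
```

hexdump -C gives a similar byte-by-byte view.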

% echo -e '\u2620' # \u takes four hexadecimal digits
☠
% echo -e '\U0001f602' # \U takes eight hexadecimal digits
😂
This works in Zsh (I've checked version 4.3) and in Bash 4.2 or newer.

So long as your text-editors can cope with Unicode (presumably encoded in UTF-8) you can enter the Unicode code-point directly.
For instance, in the Vim text editor you would enter insert mode, press Ctrl+V followed by u, and then type the code point as a 4-digit hexadecimal number (pad with zeros if necessary). So you would type Ctrl+V u 2 6 2 0. See: What is the easiest way to insert Unicode characters into a document?
At a terminal running Bash you would type CTRL+SHIFT+U and then the hexadecimal code point of the character you want. During input your cursor should show an underlined u. The first non-digit you type ends input and renders the character. So you should be able to print U+2620 in Bash using the following:
echo CTRL+SHIFT+U2620ENTERENTER
(The first enter ends Unicode input, and the second runs the echo command.)
Credit: Ask Ubuntu SE

Here's a fully internal Bash implementation, no forking, unlimited size of Unicode characters.
fast_chr() {
local __octal
local __char
printf -v __octal '%03o' $1
printf -v __char \\$__octal
REPLY=$__char
}
function unichr {
local c=$1 # Ordinal of char
local l=0 # Byte ctr
local o=63 # Ceiling
local p=128 # Accum. bits
local s='' # Output string
(( c < 0x80 )) && { fast_chr "$c"; echo -n "$REPLY"; return; }
while (( c > o )); do
fast_chr $(( t = 0x80 | c & 0x3f ))
s="$REPLY$s"
(( c >>= 6, l++, p += o+1, o>>=1 ))
done
fast_chr $(( t = p | c ))
echo -n "$REPLY$s"
}
## test harness
for (( i=0x2500; i<0x2600; i++ )); do
unichr $i
done
Output was:
─━│┃┄┅┆┇┈┉┊┋┌┍┎┏
┐┑┒┓└┕┖┗┘┙┚┛├┝┞┟
┠┡┢┣┤┥┦┧┨┩┪┫┬┭┮┯
┰┱┲┳┴┵┶┷┸┹┺┻┼┽┾┿
╀╁╂╃╄╅╆╇╈╉╊╋╌╍╎╏
═║╒╓╔╕╖╗╘╙╚╛╜╝╞╟
╠╡╢╣╤╥╦╧╨╩╪╫╬╭╮╯
╰╱╲╳╴╵╶╷╸╹╺╻╼╽╾╿
▀▁▂▃▄▅▆▇█▉▊▋▌▍▎▏
▐░▒▓▔▕▖▗▘▙▚▛▜▝▞▟
■□▢▣▤▥▦▧▨▩▪▫▬▭▮▯
▰▱▲△▴▵▶▷▸▹►▻▼▽▾▿
◀◁◂◃◄◅◆◇◈◉◊○◌◍◎●
◐◑◒◓◔◕◖◗◘◙◚◛◜◝◞◟
◠◡◢◣◤◥◦◧◨◩◪◫◬◭◮◯
◰◱◲◳◴◵◶◷◸◹◺◻◼◽◾◿

Quick one-liner to convert UTF-8 characters into their 3-byte format:
var="$(echo -n '☠' | od -An -tx1)"; printf '\\x%s' ${var^^}; echo
or
echo -n '☠' | od -An -tx1 | sed 's/ /\\x/g'
The output of both is \xE2\x98\xA0, so you can write reversely:
echo $'\xe2\x98\xa0' # ☠
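The same round trip can be wrapped in a function. This is only a sketch: to_hex_escapes is a made-up name, and it assumes a non-empty argument and an od that supports -An -tx1.

```bash
to_hex_escapes() {
  local hex
  # od prints every byte as two hex digits; join the lines and upper-case them.
  hex=$(printf '%s' "$1" | od -An -tx1 | tr -d '\n' | tr 'a-f' 'A-F')
  # $hex is deliberately unquoted so that each byte becomes its own argument
  # and printf reuses the format once per byte.
  printf '\\x%s' $hex
}

skull=$(printf '\342\230\240')        # the UTF-8 bytes of U+2620
escaped=$(to_hex_escapes "$skull")
printf '%s\n' "$escaped"              # \xE2\x98\xA0
printf '%b\n' "$escaped"              # turns the escapes back into the bytes
```

Because it works on bytes, the function does not care how many bytes the character needs.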

Just put "☠" in your shell script. In the correct locale and on a Unicode-enabled console it'll print just fine:
$ echo ☠
☠
$
An ugly "workaround" would be to output the UTF-8 sequence, but that also depends on the encoding used:
$ echo -e '\xE2\x98\xA0'
☠
$

In Bash, to print a Unicode character use \x, \u or \U (the first takes up to two hex digits and produces a single byte, the second takes up to four hex digits, the third up to eight):
echo -e '\U1f602'
If you want to assign it to a variable, use the $'...' syntax:
x=$'\U1f602'
echo $x

Here is a list of the Unicode blocks that contain emoji:
https://en.wikipedia.org/wiki/Emoji#Unicode_blocks
Example:
echo -e "\U1F304"
🌄
To get the UTF-8 bytes of this character, use hexdump (the trailing 0a is the newline added by echo):
echo -e "🌄" | hexdump -C
00000000 f0 9f 8c 84 0a |.....|
00000005
Then use those byte values as hex escapes:
echo -e "\xF0\x9F\x8C\x84\x0A"
🌄

Any of these three commands will print the character you want in a console, provided the console accepts UTF-8 characters (most current ones do):
echo -e "SKULL AND CROSSBONES (U+2620) \U02620"
echo $'SKULL AND CROSSBONES (U+2620) \U02620'
printf "%b" "SKULL AND CROSSBONES (U+2620) \U02620\n"
SKULL AND CROSSBONES (U+2620) ☠
Afterwards, you can copy and paste the actual glyph (image, character) into any UTF-8 enabled text editor.
If you need to see how such a Unicode code point is encoded in UTF-8, use xxd (a much better hex viewer than od):
echo $'(U+2620) \U02620' | xxd
0000000: 2855 2b32 3632 3029 20e2 98a0 0a (U+2620) ....
That means that the UTF8 encoding is: e2 98 a0
Or, in HEX to avoid errors: 0xE2 0x98 0xA0. That is, the values between the space (HEX 20) and the Line-Feed (Hex 0A).
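For reference, those three bytes follow from the UTF-8 layout for code points in the range U+0800 to U+FFFF (1110xxxx 10xxxxxx 10xxxxxx). A small arithmetic sketch; the variable names are illustrative:

```bash
cp=$(( 0x2620 ))                       # the code point U+2620
b1=$(( 0xE0 | (cp >> 12) ))            # lead byte: 1110 + top 4 bits
b2=$(( 0x80 | ((cp >> 6) & 0x3F) ))    # continuation: 10 + middle 6 bits
b3=$(( 0x80 | (cp & 0x3F) ))           # continuation: 10 + low 6 bits
utf8=$(printf '%02X %02X %02X' "$b1" "$b2" "$b3")
printf '%s\n' "$utf8"                  # E2 98 A0
```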
If you want a deep dive into converting numbers to chars: look here to see an article from Greg's wiki (BashFAQ) about ASCII encoding in Bash!

I'm using this:
$ echo -e '\u2620'
☠
This is much easier than looking up a hex representation... I'm using this in my shell scripts. That works on gnome-term and urxvt AFAIK.

You may need to encode the code point as octal in order for prompt expansion to correctly decode it.
U+2620 encoded as UTF-8 is E2 98 A0.
So in Bash,
export PS1="\342\230\240"
will turn your shell prompt into a skull and crossbones.
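The octal escapes can be derived from the hex bytes with printf, which accepts 0x-prefixed numbers for %o. A sketch (octal_escapes is an illustrative name):

```bash
# printf reuses the format once per argument, so each hex byte
# becomes one backslash-plus-three-octal-digits escape.
octal_escapes=$(printf '\\%03o' 0xE2 0x98 0xA0)
printf '%s\n' "$octal_escapes"   # \342\230\240
```

The resulting string can be pasted into the PS1 assignment as is.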

If you don't mind a Perl one-liner:
$ perl -CS -E 'say "\x{2620}"'
☠
-CS enables UTF-8 decoding on input and UTF-8 encoding on output. -E evaluates the next argument as Perl, with modern features like say enabled. If you don't want a newline at the end, use print instead of say.

Sorry for reviving this old question. But when using bash there is a very easy approach to create Unicode codepoints from plain ASCII input, which even does not fork at all:
unicode() { local -n a="$1"; local c; printf -vc '\\U%08x' "$2"; printf -va "$c"; }
unicodes() { local a c; for a; do printf -vc '\\U%08x' "$a"; printf "$c"; done; };
Use it as follows to define certain codepoints
unicode crossbones 0x2620
echo "$crossbones"
or to dump the first 65536 unicode codepoints to stdout (takes less than 2s on my machine. The additional space is to prevent certain characters to flow into each other due to shell's monospace font):
for a in {0..65535}; do unicodes "$a"; printf ' '; done
or to tell a little very typical parent's story (this needs Unicode 2010):
unicodes 0x1F6BC 32 43 32 0x1F62D 32 32 43 32 0x1F37C 32 61 32 0x263A 32 32 43 32 0x1F4A9 10
Explanation:
printf '\UXXXXXXXX' prints out any Unicode character
printf '\\U%08x' number prints \UXXXXXXXX with the number converted to Hex, this then is fed to another printf to actually print out the Unicode character
printf recognizes octal (0oct), hex (0xHEX) and decimal (0 or numbers starting with 1 to 9) as numbers, so you can choose whichever representation fits best
printf -v var .. gathers the output of printf into a variable, without fork (which tremendously speeds up things)
local variable is there to not pollute the global namespace
local -n var=other aliases var to other, such that assignment to var alters other. One interesting part here is, that var is part of the local namespace, while other is part of the global namespace.
Please note that Bash has no local or global namespaces in the sense of other languages: it uses dynamic scoping. local just puts away the current value and restores it when the function is left again. Other functions called from within the function with local will still see the "local" value. This is fundamentally different from the scoping rules found in most other languages (what Bash does is very powerful, but it can lead to errors if you are not aware of it).
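The scoping behaviour described in the last point (a called function sees its caller's local variables) can be seen in a few lines. A minimal sketch with made-up function names:

```bash
x=global

show() { printf '%s\n' "$x"; }

outer() {
  local x=set-in-outer   # shadows the global x until outer returns
  show                   # show is called from outer, so it sees outer's x
}

outer   # set-in-outer
show    # global
```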

In Bash:
UnicodePointToUtf8()
{
local x="$1" # ok if '0x2620'
x=${x/\\u/0x} # '\u2620' -> '0x2620'
x=${x/U+/0x}; x=${x/u+/0x} # 'U+2620' -> '0x2620'
x=$((x)) # from hex to decimal
local y=$x n=0
[ $x -ge 0 ] || return 1
while [ $y -gt 0 ]; do y=$((y>>1)); n=$((n+1)); done
if [ $n -le 7 ]; then # 7
y=$x
elif [ $n -le 11 ]; then # 5+6
y=" $(( ((x>> 6)&0x1F)+0xC0 )) \
$(( (x&0x3F)+0x80 ))"
elif [ $n -le 16 ]; then # 4+6+6
y=" $(( ((x>>12)&0x0F)+0xE0 )) \
$(( ((x>> 6)&0x3F)+0x80 )) \
$(( (x&0x3F)+0x80 ))"
else # 3+6+6+6
y=" $(( ((x>>18)&0x07)+0xF0 )) \
$(( ((x>>12)&0x3F)+0x80 )) \
$(( ((x>> 6)&0x3F)+0x80 )) \
$(( (x&0x3F)+0x80 ))"
fi
printf -v y '\\x%x' $y
echo -n -e $y
}
# test
for (( i=0x2500; i<0x2600; i++ )); do
UnicodePointToUtf8 $i
[ "$(( i+1 & 0x1f ))" != 0 ] || echo ""
done
x='U+2620'
echo "$x -> $(UnicodePointToUtf8 $x)"
Output:
─━│┃┄┅┆┇┈┉┊┋┌┍┎┏┐┑┒┓└┕┖┗┘┙┚┛├┝┞┟
┠┡┢┣┤┥┦┧┨┩┪┫┬┭┮┯┰┱┲┳┴┵┶┷┸┹┺┻┼┽┾┿
╀╁╂╃╄╅╆╇╈╉╊╋╌╍╎╏═║╒╓╔╕╖╗╘╙╚╛╜╝╞╟
╠╡╢╣╤╥╦╧╨╩╪╫╬╭╮╯╰╱╲╳╴╵╶╷╸╹╺╻╼╽╾╿
▀▁▂▃▄▅▆▇█▉▊▋▌▍▎▏▐░▒▓▔▕▖▗▘▙▚▛▜▝▞▟
■□▢▣▤▥▦▧▨▩▪▫▬▭▮▯▰▱▲△▴▵▶▷▸▹►▻▼▽▾▿
◀◁◂◃◄◅◆◇◈◉◊○◌◍◎●◐◑◒◓◔◕◖◗◘◙◚◛◜◝◞◟
◠◡◢◣◤◥◦◧◨◩◪◫◬◭◮◯◰◱◲◳◴◵◶◷◸◹◺◻◼◽◾◿
U+2620 -> ☠

The printf builtin (just as the coreutils' printf) knows the \u escape sequence which accepts 4-digit Unicode characters:
\uHHHH Unicode (ISO/IEC 10646) character with hex value HHHH (4 digits)
Test with Bash 4.2.37(1):
$ printf '\u2620\n'
☠

Based on Stack Overflow questions Unix cut, remove first token and https://stackoverflow.com/a/15903654/781312:
(octal=$(echo -n ☠ | od -t o1 | head -1 | cut -d' ' -f2- | sed -e 's#\([0-9]\+\) *#\\0\1#g')
echo Octal representation is following $octal
echo -e "$octal")
Output is the following.
Octal representation is following \0342\0230\0240
☠

Easy with a Python2/3 one-liner:
$ python -c 'print u"\u2620"' # python2
$ python3 -c 'print(u"\u2620")' # python3
Results in:
☠

If hex value of unicode character is known
H="2620"
printf "%b" "\u$H"
If the decimal value of a unicode character is known
declare -i U=2*4096+6*256+2*16
printf -vH "%x" $U # convert to hex
printf "%b" "\u$H"
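If you only need the hex digits, for example to print the usual U+XXXX notation, printf -v with %04X gives four zero-padded uppercase digits. A sketch, with cp and hex as illustrative names:

```bash
cp=9760                      # decimal value of U+2620
printf -v hex '%04X' "$cp"   # zero-pad to four uppercase hex digits
printf 'U+%s\n' "$hex"       # U+2620
```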

Related

bash function name with dash is bad pratice? [duplicate]

What are the syntax rules for identifiers, especially function and variable names, in Bash?
I wrote a Bash script and tested it on various versions of Bash on Ubuntu, Debian, Red Hat 5 and 6, and even an old Solaris 8 box. The script ran well, so it shipped.
Yet when a user tried it on SUSE machines, it gave a "not a valid identifier" error. Fortunately, my guess that there was an invalid character in the function name was right. The hyphens were messing it up.
The fact that a script that was at least somewhat tested would have completely different behaviour on another Bash or distro was disconcerting. How can I avoid this?

From the manual:
Shell Function Definitions
...
name () compound-command [redirection]
function name [()] compound-command [redirection]
name is defined elsewhere:
name A word consisting only of alphanumeric characters and under‐
scores, and beginning with an alphabetic character or an under‐
score. Also referred to as an identifier.
So hyphens are not valid. And yet, on my system, they do work...
$ bash --version
GNU bash, version 4.2.25(1)-release (x86_64-pc-linux-gnu)

The question was about "the rules", which has been answered two different ways, each correct in some sense, depending on what you want to call "the rules". Just to flesh out @rici's point that you can shove about any character in a function name, I wrote a small bash script to try to check every possible (0-255) character as a function name, as well as the second character of a function name:
#!/bin/bash
ASCII=( nul soh stx etx eot enq ack bel bs tab nl vt np cr so si dle \
dc1 dc2 dc3 dc4 nak syn etb can em sub esc fs gs rs us sp )
for((i=33; i < 127; ++i)); do
printf -v Hex "%x" $i
printf -v Chr "\x$Hex"
ASCII[$i]="$Chr"
done
ASCII[127]=del
for((i=128; i < 256; ++i)); do
ASCII[$i]=$(printf "0X%x" $i)
done
# ASCII table is now defined
function Test(){
Illegal=""
for((i=1; i <= 255; ++i)); do
Name="$(printf \\$(printf '%03o' $i))"
eval "function $1$Name(){ return 0; }; $1$Name ;" 2>/dev/null
if [[ $? -ne 0 ]]; then
Illegal+=" ${ASCII[$i]}"
# echo Illegal: "${ASCII[$i]}"
fi
done
printf "Illegal: %s\n" "$Illegal"
}
echo "$BASH_VERSION"
Test
Test "x"
# can we really do funky crap like this?
function [}{(){
echo "Let me take you to, funkytown!"
}
[}{ # why yes, we can!
# though editor auto-indent modes may punish us
I actually skip NUL (0x00), as that's the one character bash may object to finding in the input stream. The output from this script was:
4.4.0(1)-release
Illegal: soh tab nl sp ! " # $ % & ' ( ) * 0 1 2 3 4 5 6 7 8 9 ; < > \ ` { | } ~ del
Illegal: soh " $ & ' ( ) ; < > [ \ ` | del
Let me take you to, funkytown!
Note that bash happily lets me name my function "[}{". Probably my code is not quite rigorous enough to provide the exact rules for legality-in-practice, but it should give a flavor of what manner of abuse is possible.
I wish I could mark this answer "For mature audiences only."

Command identifiers and variable names have different syntaxes. A variable name is restricted to alphanumeric characters and underscore, not starting with a digit. A command name, on the other hand, can be just about anything which doesn't contain bash metacharacters (and even then, they can be quoted).
In bash, function names can be command names, as long as they would be parsed as a WORD without quotes. (Except that, for some reason, they cannot be integers.) However, that is a bash extension. If the target machine is using some other shell (such as dash), it might not work, since the Posix standard shell grammar only allows "NAME" in the function definition form (and also prohibits the use of reserved words).
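If the script may run under other shells, one conservative option is to keep function names within the POSIX NAME rules. A minimal sketch of such a check (is_portable_name is a hypothetical helper, not a Bash builtin):

```bash
is_portable_name() {
  local LC_ALL=C   # keep the bracket ranges ASCII-only
  # A letter or underscore first, then letters, digits or underscores.
  [[ $1 =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]]
}

is_portable_name my_func && echo "my_func: portable"
is_portable_name my-func || echo "my-func: not portable"
```

Running such a check over the function names in a script before shipping would have caught the hyphenated names from the question.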

From 3.3 Shell Functions:
Shell functions are a way to group commands for later execution using a single name for the group. They are executed just like a "regular" command. When the name of a shell function is used as a simple command name, the list of commands associated with that function name is executed. Shell functions are executed in the current shell context; no new process is created to interpret them.
Functions are declared using this syntax:
name () compound-command [ redirections ]
or
function name [()] compound-command [ redirections ]
and from 2 Definitions:
name
A word consisting solely of letters, numbers, and underscores, and beginning with a letter or underscore. Names are used as shell variable and function names. Also referred to as an identifier.

Note The biggest correction here is that newline is never allowed in a function name.
My answer:
Bash --posix: [a-zA-Z_][0-9a-zA-Z_]*
Bash 3.0-4.4: [^#%0-9\0\1\9\10 "$&'();<>\`|\x7f][^\0\1\9\10 "$&'();<>\`|\x7f]*
Bash 5.0: [^#%0-9\0\9\10 "$&'();<>\`|][^\0\9\10 "$&'();<>\`|]*
\1 and \x7f work now
Bash 5.1: [^#%\0\9\10 "$&'();<>\`|][^\0\9\10 "$&'();<>\`|]*
Numbers can come first?! Yep!
Any bash 3-5: [^#%0-9\0\1\9\10 "$&'();<>\`|\x7f][^\0\1\9\10 "$&'();<>\`|\x7f]*
Same as 3.0-4.4
My suggestion (opinion): [^#%0-9\0-\f "$&'();<>\`|\x7f-\xff][^\0-\f "$&'();<>\`|\x7f-\xff]
Positive version: [!*+,-./:=?#A-Z\[\]^_a-z{}~][#%0-9!*+,-./:=?#A-Z\[\]^_a-z{}~]*
My version of the test:
for ((x=1; x<256; x++)); do
hex="$(printf "%02x" $x)"
name="$(printf \\x${hex})"
if [ "${x}" = "10" ]; then
name=$'\n'
fi
if [ "$(echo -n "${name}" | xxd | awk '{print $2}')" != "${hex}" ]; then
echo "$x failed first sanity check"
fi
(
eval "function ${name}(){ echo ${x};}" &>/dev/null
if test "$("${name}" 2>/dev/null)" != "${x}"; then
eval "function ok${name}doe(){ echo ${x};}" &>/dev/null
if test "$(type -t okdoe 2>/dev/null)" = "function"; then
echo "${x} failed second sanity test"
fi
if test "$("ok${name}doe" 2>/dev/null)" != "${x}"; then
echo "${x}(${name}) never works"
else
echo "${x}(${name}) cannot be first"
fi
else
# Just assume everything over 128 is hard, unless this says otherwise
if test "${x}" -gt 127; then
if declare -pF | grep -q "declare -f \x${hex}"; then
echo "${x} works, but is actually not difficult"
declare -pF | grep "declare -f \x${hex}" | xxd
fi
elif ! declare -pF | grep -q "declare -f \x${hex}"; then
echo "${x} works, but is difficult in bash"
fi
fi
)
done
Some additional notes:
Characters 1-31 are less than ideal, as they are more difficult to type.
Characters 128-255 are even less ideal in bash (except on bash 3.2 on macOS. It might be compiled differently?) because commands like declare -pF do not render the special characters, even though they are there in memory. This means any introspection code will incorrectly assume that these functions are not there. However, features like compgen still correctly render the characters.
Out of my testing scope, but some unicode does work too, although it's extra hard to paste/type on macOS over ssh.

This script tests all valid chars for
function names with 1 char.
It outputs 53 valid chars (a-zA-Z and underscore) using
a POSIX shell and 220 valid chars with BASH v4.4.12.
The Answer from Ron Burk is valid, but lacks the numbers.
#!/bin/sh
FILE='/tmp/FOO'
I=0
VALID=0
while [ $I -lt 256 ]; do {
NAME="$( printf \\$( printf '%03o' $I ))"
I=$(( I + 1 ))
>"$FILE"
( eval "$NAME(){ rm $FILE;}; $NAME" 2>/dev/null )
if [ -f "$FILE" ]; then
rm "$FILE"
else
VALID=$(( VALID + 1 ))
echo "$VALID/256 - OK: $NAME"
fi
} done

bash: Emit n printable characters from a string with ANSI codes

In bash, given an arbitrary string containing ANSI CSI codes (eg colours), how do I emit a subset of the printable characters, printed in the correct colours?
Eg, given:
s=$'\e[0;1;31mRED\e[0;1;32mGREEN\e[0;1;33mYELLOW'
How do I do something like:
coloursubstr "$s" 0 5
coloursubstr "$s" 2 7

With bash and GNU grep:
coloursubstr() {
local string="$1" from="$2" num="$3"
local line i array=()
# fill array
while IFS= read -r line; do
[[ $line =~ ^([^m]+m)(.*)$ ]]
for ((i=0;i<${#BASH_REMATCH[2]};i++)); do
array+=("${BASH_REMATCH[1]}${BASH_REMATCH[2]:$i:1}")
done
done < <(grep -Po $'\x1b.*?m[^\x1b]*' <<< "$string")
# print array
for ((i=$from;i<$from+$num;i++)); do
printf "%s" "${array[$i]}"
done
echo
}
s=$'\e[0;1;31mRED\e[0;1;32mGREEN\e[0;1;33mYELLOW'
coloursubstr "$s" 0 5
coloursubstr "$s" 2 7
I assume all color codes start with \e, end with m and text is prefixed by a color code.
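Under the same assumption (codes start with \e[ and end with m), a related building block is stripping the codes so that the printable characters can be counted. A sketch using only Bash's regex matching; strip_sgr is an illustrative name, and other kinds of escape sequences are not handled:

```bash
strip_sgr() {
  local s=$1 out=
  # Text without ESC, then ESC, '[', any run of digits and semicolons, 'm', then the rest.
  local re=$'^([^\e]*)\e\\[[0-9;]*m(.*)$'
  while [[ $s =~ $re ]]; do
    out+=${BASH_REMATCH[1]}   # text before the code
    s=${BASH_REMATCH[2]}      # keep scanning after the code
  done
  printf '%s\n' "$out$s"
}

s=$'\e[0;1;31mRED\e[0;1;32mGREEN\e[0;1;33mYELLOW'
plain=$(strip_sgr "$s")
printf '%s %d\n' "$plain" "${#plain}"   # REDGREENYELLOW 14
```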

Partial answer, (specific hack with magic numbers, not at all general):
echo "${s:0:23}"
echo "${s:0:9}${s:11:25}"

Simple bash script (input letter output number)

Hi, I'm looking to write a simple script which takes an input letter and outputs its numerical equivalent:
I was thinking of listing all letters as variables, then have bash read the input as a variable but from here I'm pretty stuck, any help would be awesome!
#!/bin/bash
echo "enter letter"
read "LET"
a=1
b=2
c=3
d=4
e=5
f=6
g=7
h=8
i=9
j=10
k=11
l=12
m=13
n=14
o=15
p=16
q=17
r=18
s=19
t=20
u=21
v=22
w=23
x=24
y=25
z=26
LET=${a..z}
if
$LET = [ ${a..z} ];
then
echo $NUM
sleep 5
echo "success!"
sleep 1
exit
else
echo "FAIL :("
exit
fi

Try this:
echo "Input letter"
read letter
result=$(($(printf "%d\n" \'$letter) - 65))
echo $result
0
ASCII equivalent of 'A' is 65 so all you've got to do to is to take away 65 (or 64, if you want to start with 1, not 0) from the letter you want to check. For lowercase the offset will be 97.
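Putting both offsets together, here is a sketch that handles upper and lower case and rejects anything else (letter_index is a made-up name; it assumes a single ASCII character as input):

```bash
letter_index() {
  local code
  # A leading single quote makes printf yield the character's numeric code.
  printf -v code '%d' "'$1"
  if (( code >= 65 && code <= 90 )); then
    echo $(( code - 64 ))    # A..Z -> 1..26
  elif (( code >= 97 && code <= 122 )); then
    echo $(( code - 96 ))    # a..z -> 1..26
  else
    return 1                 # not an ASCII letter
  fi
}

letter_index A   # 1
letter_index z   # 26
```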

A funny one, abusing Bash's radix system:
read -n1 -p "Type a letter: " letter
if [[ $letter = [[:alpha:]] && $letter = [[:ascii:]] ]]; then
printf "\nCode: %d\n" "$((36#$letter-9))"
else
printf "\nSorry, you didn't enter a valid letter\n"
fi
The interesting part is the $((36#$letter-9)). The 36# part tells Bash to understand the following string as a number in radix 36 which consists of a string containing the digits and letters (case not important, so it'll work with uppercase letters too), with 36#a=10, 36#b=11, …, 36#z=35. So the conversion is just a matter of subtracting 9.
The read -n1 only reads one character from standard input. The [[ $letter = [[:alpha:]] && $letter = [[:ascii:]] ]] checks that letter is really an ascii letter. Without the [[:ascii:]] test, we would validate characters like é (depending on locale) and this would mess up with the conversion.
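The base#digits notation is not limited to base 36; Bash accepts bases from 2 to 64 inside arithmetic expansion. A few quick checks (variable names are illustrative):

```bash
bin=$(( 2#1111 ))        # binary digits
hex=$(( 16#ff ))         # hexadecimal digits
letter=$(( 36#z - 9 ))   # 'z' is digit 35 in base 36
printf '%d %d %d\n' "$bin" "$hex" "$letter"   # 15 255 26
```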

use these two functions to get chr and ord :
chr() {
[ "$1" -lt 256 ] || return 1
printf "\\$(printf '%03o' "$1")"
}
ord() {
LC_CTYPE=C printf '%d' "'$1"
}
echo $(chr 97)
a

Using od and tr:
echo "type letter: "
read LET
echo "$LET" | tr -d "\n" | od -An -t uC
Or using echo -n:
echo -n "$LET" | od -An -t uC
If you want it to start at a=1
echo $(( $(echo -n "$LET" | od -An -t uC) - 96 ))
Explanation
Pipes into the tr to remove the newline.
Use od to change to unsigned decimal.

late to the party: use an associative array:
# require bash version 4
declare -A letters
for letter in {a..z}; do
letters[$letter]=$((++i))
done
read -p "enter a single lower case letter: " letter
echo "the value of $letter is ${letters[$letter]:-N/A}"

Bash shell Decimal to Binary base 2 conversion

I'm looking for an easy way in Bash to convert a decimal number into a binary number. I have variables that need to be converted:
$ip1 $ip2 $ip3 $ip4
Is there a simple method to do this without looking at every individual number?
I would prefer not to have to write a lot of code.

You can use bc as:
echo "obase=2;$ip1" | bc

Convert decimal to binary with bash builtin commands (range 0 to 255):
D2B=({0..1}{0..1}{0..1}{0..1}{0..1}{0..1}{0..1}{0..1})
echo ${D2B[7]}
00000111
echo ${D2B[85]}
01010101
echo ${D2B[127]}
01111111
To remove leading zeros, e.g. from ${D2B[7]}:
echo $((10#${D2B[7]}))
111
This creates an array with 00000000 00000001 00000010 ... 11111101 11111110 11111111 using bash's brace expansion. The position in array D2B represents its decimal value.
See also: Understanding code ({0..1}{0..1}{0..1}{0..1}{0..1}{0..1}{0..1}{0..1})

Decimal to binary conversion in Bash:
I'm using Ubuntu 14.04 to do this.
Convert the decimals 1 through 5 to binary.
el@apollo:~$ bc <<< "obase=2;1"
1
el@apollo:~$ bc <<< "obase=2;2"
10
el@apollo:~$ bc <<< "obase=2;3"
11
el@apollo:~$ bc <<< "obase=2;4"
100
el@apollo:~$ bc <<< "obase=2;5"
101
Bonus example:
el@apollo:~$ bc <<< "obase=2;1024"
10000000000
el@apollo:~$ bc <<< "obase=2;2^128"
100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

General method for converting an integer number into another representation with another base (but base<=10 because of using digits 0..9 for representation, only):
function convertIntvalToBase () # (Val Base)
{
val=$1
base=$2
result=""
while [ $val -ne 0 ] ; do
result=$(( $val % $base ))$result #residual is next digit
val=$(( $val / $base ))
done
echo -n $result
}
e.g.
convertIntvalToBase $ip1 2 # converts $ip1 into binary representation

Defined as a function in bash:
# to Binary:
toBinary(){
local n bit
for (( n=$1 ; n>0 ; n >>= 1 )); do bit="$(( n&1 ))$bit"; done
printf "%s\n" "$bit"
}

I like to use dc for this. It's very concise:
$ n=50; dc -e "$n 2op"
110010
The commands here are as follows:
Push the number, n, on the stack, via shell expansion.
Push 2 on the stack, then use o to pop the stack and use 2 as the output radix.
p to print the top of the stack (which is just n), using the output radix set in step 2 (so print in binary).
If you want padding:
$ n=50; pad_size=16; printf "%0${pad_size}d\n" $(dc -e "$n 2op")
0000000000110010

To make @codaddict's answer a little prettier, use this to prefix the output with 0b for "binary":
printf "0b%s\n" "$(echo "obase=2; $((num1 + num2))" | bc)"
Example:
num1=2#1111 # binary 1111 (decimal 15)
num2=2#11111 # binary 11111 (decimal 31)
printf "0b%s\n" "$(echo "obase=2; $((num1 + num2))" | bc)"
Output:
0b101110
This is decimal 46.
For details on the input base-2 formatting in bash, such as 2#1111 above, see the very end of my answer here: How to use all bash operators, and arithmetic expansion, in bash.
To have at least 8 digits in the output, use:
printf "0b%08d\n" $(echo "obase=2; $((num1 + num2))" | bc)
Source: David Rankin, in an answer to my question here.

Decimal to Binary using only Bash
Any integer can be converted to binary using this script:
touch dec2bin.bash && chmod +x "$_" && vim "$_"
And, then copy paste the following:
#!/bin/bash
num=$1;
dec2bin()
{
op=2; ## Since we're converting to binary
quo=$(( $num/ $op)); ## quotient
rem=$(( $num% $op)); ## remainder
array=(); ## array for putting remainder inside array
array+=("$rem"); ## array expansion
until [[ $quo -eq 0 ]]; do
num=$quo; ## loop to collect every remainder, until the quotient is 0
quo=$(( $num / $op));
rem=$(( $num % $op));
array+="$rem"; ## appends the digit to the string held in array[0]
done
binary=$(echo "${array[@]}" | rev); ## digits were collected least-significant first, so reverse them
printf "$binary\n"; ## print array
}
main()
{
[[ -n ${num//[0-9]/} ]] &&
{ printf "$num is not an integer bruv!\n"; return 1;
} || { dec2bin $num; }
}
main;
For example:
./dec2bin.bash $var
110100100
The argument must be an integer:
./dec2bin.bash 420.py
420.py is not an integer bruv!
Also, another way using python:
Much slower
python -c "print(bin(420))"
0b110100100
Hexadecimal to Binary using only Bash
Similarly, hexadecimal to binary, as follows using only bash:
#!/usr/local/bin/bash ## For Darwin :( higher bash :)
#!/bin/bash ## Linux :)
hex=$1;
hex2bin()
{
op=2; num=$((16#$hex));
quo=$(( $num/ $op));
rem=$(( $num% $op));
array=();
array+=("$rem");
until [[ $quo -eq 0 ]]; do
num=$quo;
quo=$(( $num / $op));
rem=$(( $num % $op));
array+="$rem";
done
binary=$(echo "${array[@]}" | rev);
printf "Binary of $1 is: $binary\n";
}
main()
{
[[ -n ${hex//[0-9,A-F,a-f]/} ]] &&
{ printf "$hex is not a hexa decimal number bruv!\n"; return 1;
} || { hex2bin $hex; }
}
main;
For example:
./hex2bin.bash 1aF
Binary of 1aF is: 110101111
Hex must be passed:
./hex2bin.bash XyZ
XyZ is not a hexa decimal number bruv!

toBin ()
{
printf "%08d\n" $(dc -e "$1 2op")
}
$ toBin 37
00100101

Padding zeros in a string

I'm writing a bash script to get some podcasts. The problem is that some of the podcast numbers are one digits while others are two/three digits, therefore I need to pad them to make them all 3 digits.
I tried the following:
n=1
n = printf %03d $n
wget http://aolradio.podcast.aol.com/sn/SN-$n.mp3
but the variable 'n' doesn't stay padded permanently. How can I make it permanent?

Use backticks to assign the result of the printf command (``):
n=1
wget http://aolradio.podcast.aol.com/sn/SN-`printf %03d $n`.mp3
EDIT: Note that I removed one line which was not really necessary.
If you want to assign the output of 'printf %...' to n, you could
use
n=`printf %03d $n`
and after that, use the $n variable substitution you used before.

It seems you're trying to assign the return value of the printf command (which is its exit code); what you want is to assign the output of printf.
bash-3.2$ n=1
bash-3.2$ n=$(printf %03d $n)
bash-3.2$ echo $n
001

Be careful, though, if your input string has a leading zero!
printf will still do the padding, but it will also interpret your string as an octal number.
# looks ok
$ echo `printf "%05d" 03`
00003
# but not here: 033 is read as octal, i.e. decimal 27
$ echo `printf "%05d" 033`
00027
A solution to this seems to be printing a float instead of decimal.
The trick is omitting the decimal places with .0f.
# works with leading zero
$ echo `printf "%05.0f" 033`
00033
# as well as without
$ echo `printf "%05.0f" 33`
00033
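An alternative that stays with integers is Bash's base#number arithmetic syntax, which forces a decimal reading regardless of leading zeros. A sketch (pad5 is an illustrative name; it assumes the input consists only of digits):

```bash
pad5() {
  # 10#... makes the arithmetic expansion read the digits as base 10,
  # so 033 is thirty-three rather than octal 27.
  printf '%05d\n' "$(( 10#$1 ))"
}

pad5 033   # 00033
pad5 08    # 00008
pad5 7     # 00007
```

Unlike the plain %d version, this also accepts inputs such as 08 that are not valid octal numbers.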

To avoid forking a subshell:
a="123"
b="00000${a}"
c="${b: -5}"
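Another way to avoid the subshell is printf -v, which writes the formatted result straight into a variable. A short sketch:

```bash
n=7
printf -v n '%03d' "$n"   # no command substitution, so no fork
printf '%s\n' "$n"        # 007
```

Unlike the string-slicing approach, this never truncates a value that is already longer than the requested width.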

n=`printf '%03d' "2"`
Note spacing and backticks

As mentioned by noselad, command substitution with $(...) is preferable, as it supersedes backticks, i.e. `...`.
It is much easier to work with when nesting several command substitutions, because the inner backticks don't have to be escaped with backslashes.

This is in response to an answer given by cC Xx.
It will work only as long as a's value has no more than 5 digits.
Consider when a=12345678.
It'll truncate the leading digits:
a="12345678"
b="00000${a}"
c="${b: -5}"
echo "$a, $b, $c"
This gives the following output:
12345678, 0000012345678, 45678
Adding an if that checks whether a has fewer than 5 digits, and padding only in that case, could be a solution:
if [[ ${#a} -lt 5 ]] ; then b="00000${a}" ; c="${b: -5}" ; else c=$a; fi

Just typing this here for additional information.
If you know the number of zeroes you need, you can use the string concatenation:
let pad="0"
pad+=1
echo $pad # this will print 01
