cat /dev/urandom is always a fun way to create scrolling characters on your display, but produces too many non-printable characters.
Is there an easy way to encode it on the command line so that all of its output is readable characters, with base64 or uuencode for example?
Note that I prefer solutions that require no additional files to be created.
What about something like
cat /dev/urandom | base64
Which gives (lots of) stuff like
hX6VYoTG6n+suzKhPl35rI+Bsef8FwVKDYlzEJ2i5HLKa38SLLrE9bW9jViSR1PJGsDmNOEgWu+6
HdYm9SsRDcvDlZAdMXAiHBmq6BZXnj0w87YbdMnB0e2fyUY6ZkiHw+A0oNWCnJLME9/6vJUGsnPL
TEw4YI0fX5ZUvItt0skSSmI5EhaZn09gWEBKRjXVoGCOWVlXbOURkOcbemhsF1pGsRE2WKiOSvsr
Xj/5swkAA5csea1TW5mQ1qe7GBls6QBYapkxEMmJxXvatxFWjHVT3lKV0YVR3SI2CxOBePUgWxiL
ZkQccl+PGBWmkD7vW62bu1Lkp8edf7R/E653pi+e4WjLkN2wKl1uBbRroFsT71NzNBalvR/ZkFaa
2I04koI49ijYuqNojN5PoutNAVijyJDA9xMn1Z5UTdUB7LNerWiU64fUl+cgCC1g+nU2IOH7MEbv
gT0Mr5V+XAeLJUJSkFmxqg75U+mnUkpFF2dJiWivjvnuFO+khdjbVYNMD11n4fCQvN9AywzH23uo
03iOY1uv27ENeBfieFxiRwFfEkPDgTyIL3W6zgL0MEvxetk5kc0EJTlhvin7PwD/BtosN2dlfPvw
cjTKbdf43fru+WnFknH4cQq1LzN/foZqp+4FmoLjCvda21+Ckediz5mOhl0Gzuof8AuDFvReF5OU
Or, without the (useless) cat and pipe:
base64 /dev/urandom
(Same kind of output ^^)
EDIT: you can also use the --wrap option of base64 to avoid having "short lines":
base64 --wrap=0 /dev/urandom
This will remove wrapping, and you'll get a "full-screen" display ^^
A number of folks have suggested catting and piping through base64 or uuencode. One issue with this is that you can't control how much data to read (it will continue forever, or until you hit Ctrl+C). Another possibility is to use the dd command, which lets you specify how much data to read before exiting. For example, to read 1 KiB:
dd if=/dev/urandom bs=1k count=1 2>/dev/null | base64
Another option is to pipe to the strings command, which may give more variety in its output (non-printable characters are discarded, and any run of at least 4 printable characters [by default] is displayed). The problem with strings is that it displays each "run" on its own line.
dd if=/dev/urandom bs=1k count=1 2>/dev/null | strings
(of course you can replace the entire command with
strings /dev/urandom
if you don't want it to ever stop).
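If the one-run-per-line output bothers you, a small tweak folds the newlines back into spaces (same bounded read as above):
dd if=/dev/urandom bs=1k count=1 2>/dev/null | strings | tr '\n' ' '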
If you want something really funky, try one of:
cat -v /dev/urandom
dd if=/dev/urandom bs=1k count=1 2>/dev/null | cat -v
So, what is wrong with
cat /dev/urandom | uuencode -
?
Fixed after the first attempt didn't actually work... ::sigh::
BTW, many Unix utilities use '-' in place of a filename to mean "use the standard input".
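If you'd rather not have it scroll forever, you can bound the read first (same '-' convention; head -c is a GNU/BSD extension):
head -c 128 /dev/urandom | uuencode -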
There are already several good answers on how to base64 encode random data (i.e. cat /dev/urandom | base64). However in the body of your question you elaborate:
... encode [urandom] on the command line so that all of its output is readable characters, with base64 or uuencode for example.
Given that you don't actually require parseable base64 and just want it to be readable, I'd suggest
cat /dev/urandom | tr -dC '[:graph:]'
base64 only outputs alphanumeric characters and two symbols (+ and / by default). [:graph:] matches any printable, non-whitespace ASCII character, including many symbols and punctuation marks that base64 lacks. Therefore using tr -dC '[:graph:]' will result in more random-looking output, and have better input/output efficiency.
I often use < /dev/random stdbuf -o0 tr -Cd '[:graph:]' | stdbuf -o0 head --bytes 32 for generating strong passwords.
You can do more interesting stuff with Bash's process substitution (backed by FIFOs or /dev/fd):
uuencode <(head -c 200 /dev/urandom | base64 | gzip) random.gz
(The second argument is just the file name recorded in the uuencode header; without it, uuencode would read from standard input instead of the substituted file.)
cat /dev/urandom | tr -dc 'a-zA-Z0-9'
Try
xxd -ps /dev/urandom
xxd(1)
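If you don't want it to run forever, xxd's -l option limits how many bytes are read:
xxd -ps -l 256 /dev/urandom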
Related
I want to generate a high amount of random numbers. I wrote the following bash command (note that I am using cat here for demonstration purposes; in my real use case, I am piping the numbers into a process):
for i in {1..99999999}; do echo -e "$(cat /dev/urandom | tr -dc '0-9' | fold -w 5 | head -n 1)"; done | cat
The numbers are printed at a very low rate. However, if I generate a smaller amount, it is much faster:
for i in {1..9999}; do echo -e "$(cat /dev/urandom | tr -dc '0-9' | fold -w 5 | head -n 1)"; done | cat
Note that the only difference is 9999 instead of 99999999.
Why is this? Is the data buffered somewhere? Is there a way to optimize this, so that the random numbers are piped/streamed into cat immediately?
Why is this?
Brace expansion of {1..99999999} generates 100000000 arguments, and parsing them requires a lot of memory allocation from bash. This significantly stalls the whole system.
Additionally, large chunks of data are read from /dev/urandom, and about 96% of that data is discarded by tr -dc '0-9'. This significantly depletes the entropy pool and additionally stalls the whole system.
Is the data buffered somewhere?
Each process has its own buffer, so:
cat /dev/urandom is buffering
tr -dc '0-9' is buffering
fold -w 5 is buffering
head -n 1 is buffering
the left side of the pipeline (the shell) has its own buffer
and the right side (| cat) has its own buffer
That's 6 buffering places. Even ignoring input buffering from head -n1 and from the right side of the pipeline | cat, that's 4 output buffers.
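If you just want output to appear immediately, a sketch with GNU stdbuf (coreutils) disables the output buffering at each stage:
< /dev/urandom stdbuf -o0 tr -dc '0-9' | stdbuf -o0 fold -w 5 | head -n 5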
Also, save the animals and stop cat abuse: use tr </dev/urandom instead of cat /dev/urandom | tr. Fun fact: tr can't take a filename as an argument.
Is there a way to optimize this, so that the random numbers are piped/streamed into cat immediately?
Remove the whole code.
Read only as many bytes from the random source as you need. To generate a 32-bit number you only need 32 bits, no more. To generate a 5-digit number, you only need 17 bits; rounding up to 8-bit bytes, that's only 3 bytes. The tr -dc '0-9' is a cool trick, but it definitely shouldn't be used in any real code.
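For example, a 5-digit number can be built from just 3 bytes of the source (a sketch; note the slight modulo bias, since 2^24 is not a multiple of 100000):
bytes=$(od -An -N3 -tx1 /dev/urandom | tr -d ' \n')
printf '%05d\n' $(( 0x$bytes % 100000 ))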
Funnily enough, I recently answered what I guess is a similar question. Copying the code from there, you could:
for ((i=0;i<100000000;++i)); do echo "$((0x$(dd if=/dev/urandom of=/dev/stdout bs=4 count=1 status=none | xxd -p)))"; done | cut -c-5
# cut to take first 5 digits
But that still would be unacceptably slow, as it runs 2 processes for each random number (and I think just taking the first 5 digits will have a bad distribution).
I suggest using $RANDOM, available in bash. If not, use $SRANDOM if you really want /dev/urandom (and really know why you want it). Failing that, I suggest writing the random number generation from /dev/urandom in a real programming language, like C, C++, Python, Perl, or Ruby. I believe one could even write it in awk.
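For example, a minimal bash-only sketch with $RANDOM (each draw is 15 bits, so two draws are combined; the modulo introduces a tiny bias):
for ((i=0; i<5; ++i)); do
  printf '%05d\n' $(( ((RANDOM << 15) | RANDOM) % 100000 ))
done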
The following looks nice, but converting binary data to hex just to convert it to decimal later is a workaround for the fact that the shell simply can't work with binary data:
count=10;
# take count*4 bytes from input
dd if=/dev/urandom of=/dev/stdout bs=4 count=$count status=none |
# Convert bytes to hex 4 bytes at a time
xxd -p -c 4 |
# Convert hex to decimal using GNU awk
awk --non-decimal-data '{printf "%d\n", "0x"$0}'
Why are you running this in a loop? You can just run a single set of these commands to generate everything, e.g.:
cat /dev/urandom | tr -dc '0-9' | fold -w 5 | head -n 100000000
I.e. just generate a single stream of numbers, rather than generate them individually.
I'd second the suggestion of using another language for this; it should be much more efficient. For example, in Python it would just be:
from random import randrange
for _ in range(100000000):
    print(randrange(100000))
@SamMason gave the best answer so far, as he completely did away with the loop:
cat /dev/urandom | tr -dc '0-9' | fold -w 5 | head -n 100000000
That still leaves a lot of room for improvement though. First, tr -dc '0-9' only keeps about 4% of the stuff that's coming out of /dev/urandom :-) Second, depending on how those random numbers will be consumed in the end, some additional overhead may be incurred for stripping leading zeros, so that some numbers are not interpreted as octal. Let me suggest a better alternative, using the od command:
outputFile=/dev/null # For test. Replace with the real file.
count=100000000
od -An -t u2 -w2 /dev/urandom | head -n $count >$outputFile
A quick test with the time command showed this to be roughly four times faster than the tr version. And there is really no need for "another language": both od and head are highly optimized, and this whole thing runs at native speed.
NOTE: The above command will be generating 16-bit integers, ranging from 0 to 65535 inclusive. If you need a larger range, then you could go for 32 bit numbers, and that will give you a range from 0 to 4294967295:
od -An -t u4 -w4 /dev/urandom | head -n $count >$outputFile
If needed, the end user can scale those down to the desired size with a modulo division.
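For instance, to reduce the 32-bit values to 5 digits on the fly (awk is my addition here; note the slight modulo bias, since 2^32 is not a multiple of 100000):
od -An -t u4 -w4 /dev/urandom | head -n $count | awk '{ print $1 % 100000 }' >$outputFile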
method1:
$ echo -n "The quick brown fox jumps over the lazy dog" | openssl sha1 | base64
MmZkNGUxYzY3YTJkMjhmY2VkODQ5ZWUxYmI3NmU3MzkxYjkzZWIxMgo=
method2:
$ echo -n "The quick brown fox jumps over the lazy dog" | openssl sha1 | xxd -r -p | base64
L9ThxnotKPzthJ7hu3bnORuT6xI=
method3:
$ echo -n "The quick brown fox jumps over the lazy dog" | openssl sha1 | xxd -b -p | base64
MzI2NjY0MzQ2NTMxNjMzNjM3NjEzMjY0MzIzODY2NjM2NTY0MzgzNDM5NjU2NTMxNjI2MjM3MzY2NTM3CjMzMzkzMTYyMzkzMzY1NjIzMTMyMGEK
I am basically trying to checksum the input string The quick brown fox jumps over the lazy dog via sha1 and then base64 the result, using the methods above. I think method 2 is the correct answer, but I have to do an extra step to convert the hex back into binary via xxd -r with plain format -p before I feed it into base64 again. Why do I have to do this extra step?
I can't find anywhere that the base64 command-line tool expects its input to be binary. But let's assume it does: when I explicitly convert the hex into binary and feed it to base64 via method 3's xxd -b option, the result is different again.
This might be easier in a programming language, because there we have full control, but via a few command-line tools it's a bit confusing. Could someone help explain this?
There are three different results here because you are passing in three different strings to base64.
Per your question on base64 expecting the input to be binary, @chepner is right here:
All data is binary; text is just a stream of bytes representing an encoding (ASCII, UTF-8, etc) of text.
Intermediary steps
Let's store the shared command in a variable for clarity.
$ msg='The quick brown fox jumps over the lazy dog'
$ sha_val="$(printf "$msg" | openssl sha1 | awk '{ print $2 }')"
$ printf "$sha_val"
2fd4e1c67a2d28fced849ee1bb76e7391b93eb12
A couple things to note:
Using printf because it is more consistent, especially when we are comparing bytes and hashes.
Piping to awk '{ print $2 }' as openssl may prepend with (stdin)=.
Comparing the bytes
We can use xxd to compare the bytes for each, passing -c 1000 to allow 1000-char lines (i.e. no newlines are added for strings shorter than 1000 chars). This is useful for strings like the output of method 2, where there are control characters that can't be printed.
method 1
This is the hex representation of the sha value. For example, the first 2 in the sha output is 32 in this result because hex 32 <=> dec 50 <=> ASCII/UTF-8 "2". If this is confusing, take a look at an ASCII table.
$ printf "$sha_val" | xxd -p -c 1000
32666434653163363761326432386663656438343965653162623736653733393162393365623132
method 2
This output is the EXACT SAME as $sha_val, given that we are converting from hex to raw bytes and then back with xxd. Note that base64 does not require its input to be converted from hex to binary; it accepts any bytes.
$ printf "$sha_val" | xxd -r -p | xxd -p -c 1000
2fd4e1c67a2d28fced849ee1bb76e7391b93eb12
method 3
xxd's -p option is overriding the -b option, so xxd -b -p <=> xxd -p.
$ printf "$sha_val" | xxd -p -c 1000 | xxd -p -c 1000
33323636363433343635333136333336333736313332363433323338363636333635363433383334333936353635333136323632333733363635333733333339333136323339333336353632333133323061
As you can see, base64 generates three different strings because it receives three different strings.
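To see this directly, you can reproduce what methods 1 and 2 actually feed to base64, using $sha_val from above (outputs match the originals up to openssl's trailing newline):
printf '%s' "$sha_val" | base64                 # method 1: base64 of the 40-char hex text
printf '%s' "$sha_val" | xxd -r -p | base64     # method 2: base64 of the 20 raw digest bytes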
I am trying to make a bash script work on MinGW and it seems the shell is not able to decode something like the following.
t=$(openssl rand -base64 64)
echo "$t" | base64 --decode
Resulting in,
Ԋ7▒%
▒7▒SUfX▒L:o<x▒▒䈏ţ94V▒▒▒▒uW;▒▒pxu▒base64: invalid input
Interestingly, if I paste the base64 text literally and run the command as such, it works.
echo "+e5dcWsijZ83uR2cuKxIDJTTiwTvqB7J0EJ63paJdzGomQxw9BhfPvFTkdKP9I1o
g29pZKjUfDO8/SUNt+idWQ==" | base64 --decode
Anybody knows what I am doing wrong?
Thanks
I solved the above case by passing the --ignore-garbage flag to the base64 decode. It ignores non-alphabet characters.
echo "$t" | base64 --decode --ignore-garbage
However, I would still like to know how I created "garbage" in the first place ;)
I think what has happened here is that the base64 string contains some embedded spaces, and that causes the actual "invalid input" error (and what you observe as garbage).
The openssl rand -base64 64 command introduces some newlines (not spaces), for example,
openssl rand -base64 64 > b64.txt
... then viewing the b64.txt file in an editor I see two separate lines
tPKqKPbH5LkGu13KR6zDdJOBpUGD4pAqS6wKGS32EOyJaK0AmTG4da3fDuOI4T+k
abInqlQcH5k7k9ZVEzv8FA==
... and this implies there is a newline character between the 'k' and 'a'
So the base64 string has this embedded newline. base64 -d can handle newlines (as demonstrated by your successful example), but it cannot handle a space character.
The newlines can get translated to spaces by some actions of the shell. This is very likely happening at echo $t: if t is unquoted and has newlines inside it, word splitting makes echo replace them with single spaces. Exactly how it behaves can depend on shell options and on the type of quoting, if any, applied.
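You can reproduce that behavior with a minimal two-line test:
t=$'line1\nline2'
echo $t      # unquoted: word splitting turns the newline into a space
echo "$t"    # quoted: the newline is preserved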
To fix the command, we can remove the newline before piping to the base64 -d command.
One way to do that is to use the tr command, e.g. the following works on Linux:
t=$(openssl rand -base64 64 | tr -d '\n')
echo $t | base64 -d
... or alternatively, remove the spaces, again using tr
t=$(openssl rand -base64 64)
echo $t | tr -d ' ' | base64 -d
What would be the easiest way to convert the text produced by utilities such as sha512sum into a binary file?
I'd like to convert hex string like 77f4de214a5423e3c7df8704f65af57c39c55f08424c75c0931ab09a5bfdf49f5f939f2caeff1e0113d9b3d6363583e4830cf40787100b750cde76f00b8cd3ec (example produced by sha512sum) into a binary file (64 bytes long), in which each byte's value would be equivalent to a pair of letters/digits from the string. I'm looking for a solution that would require minimal amount of tools, so I'd be happy if this can be done easily with bash, sed or some utility from coreutils. I'd rather avoid xxd, as this doesn't seem to handle such string anyway (I'd have to add "addresses" and some whitespace).
I need the hash as a binary file, to convert it into an object file and link with the application that I'm writing. If there's another way to embed such string (in a binary form!) into application (via an array or whatever) - it's also a solution for me.
A bit of sed and echo might do:
for i in $(echo 77f4de214a5423e3c7df8704f65af57c39c55f08424c75c0931ab09a5bfdf49f5f939f2caeff1e0113d9b3d6363583e4830cf40787100b750cde76f00b8cd3ec | sed 's/../& /g'); do
echo -ne "\x$i"
done > output.bin
The sed command splits the hex string into two-character byte pairs, and echo -ne "\x$i" emits each pair as a raw byte.
Or in a shorter form with sha512sum output, as suggested in the comment:
echo -ne "$(sha512sum some-file.txt | sed 's/ .*$//; s/../\\x&/g')"
How about perl:
<<<77f4de214a5423e3c7df8704f65af57c39c55f08424c75c0931ab09a5bfdf49f5f939f2caeff1e0113d9b3d6363583e4830cf40787100b750cde76f00b8cd3ec \
perl -ne 'chomp; print pack "H*", $_' > hash.bin
(The chomp strips the trailing newline from the here-string, which would otherwise be packed as a stray extra nibble.)
If you have openssl in your system and want a sha512 hash in binary form, you can use this:
openssl dgst -sha512 -binary somefile.txt
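To double-check it against sha512sum's hex output, something like this should work:
openssl dgst -sha512 -binary somefile.txt > hash.bin
xxd -p -c 128 hash.bin    # one 128-char hex line; should match what sha512sum prints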
If you have node:
node -e "var fs = require('fs'); fs.writeFileSync('binary', Buffer.from('77f4de214a5423e3c7df8704f65af57c39c55f08424c75c0931ab09a5bfdf49f5f939f2caeff1e0113d9b3d6363583e4830cf40787100b750cde76f00b8cd3ec', 'hex'))"
s="77f4de214a5423e3c7df8704f65af57c39c55f08424c75c0931ab09a5bfdf49f5f939f2caeff1e0113d9b3d6363583e4830cf40787100b750cde76f00b8cd3ec";
echo -n $s | xxd -r -p > file.bin
This produces a 64-byte file.bin. Tested on Ubuntu 16.04.7.
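Since the end goal is linking the hash into an application, it may help that xxd can also emit a C array directly (a hedged extra step; hash.bin and hash.h are just example names, and the variable names are derived from the input file name):
echo -n $s | xxd -r -p > hash.bin
xxd -i hash.bin > hash.h    # defines unsigned char hash_bin[] and unsigned int hash_bin_len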
I’m writing a script to change the UUID of an NTFS partition (AFAIK, no existing tool does this). That means writing 8 bytes from 0x48 to 0x4F (72-79 decimal) of /dev/sdaX (X being the number of my partition).
If I wanted to change it to a random UUID, I could use this:
dd if=/dev/urandom of=/dev/sdaX bs=8 count=1 seek=9 conv=notrunc
Or I could change /dev/urandom to /dev/sdaY to clone the UUID from another partition.
But... what if I want to craft a personalized UUID? I already have it stored (and regex-checked) in a $UUID variable in hexadecimal string format (16 characters), like this:
UUID="2AE2C85D31835048"
I was thinking about this approach:
echo "$UUID" | xxd -r -p | dd of=/dev/sdaX ...
This is just a sketch... I’m not sure about the exact options to make it work. My questions are:
Is the echo $var | xxd -r | dd really the best approach?
What would be the exact command and options to make it work?
As for the answers, I’m also looking for:
An explanation of all the options used, and what they do.
If possible, an alternative command to test it in a file and/or screen before changing the partition.
I already have a 100-byte dump file called ntfs.bin that I can use for tests, checking the results using
xxd ntfs.bin
So any solution that provides a way to check results using xxd on screen, so I can compare with the original ntfs.bin file, would be highly appreciated.
Try:
UUID="2AE2C85D31835048"
echo "$UUID" | xxd -r -p | wc -c
echo "$UUID" | xxd -r -p | dd of=file obs=1 oseek=72 conv=block,notrunc cbs=8