Convert hash string to binary file - bash

What would be the easiest way to convert the text produced by utilities such as sha512sum into a binary file?
I'd like to convert a hex string like 77f4de214a5423e3c7df8704f65af57c39c55f08424c75c0931ab09a5bfdf49f5f939f2caeff1e0113d9b3d6363583e4830cf40787100b750cde76f00b8cd3ec (example produced by sha512sum) into a binary file (64 bytes long), in which each byte's value is equivalent to a pair of hex digits from the string. I'm looking for a solution that requires a minimal set of tools, so I'd be happy if this can be done easily with bash, sed or some utility from coreutils. I'd rather avoid xxd, as it doesn't seem to handle such a string anyway (I'd have to add "addresses" and some whitespace).
I need the hash as a binary file so I can convert it into an object file and link it with the application I'm writing. If there's another way to embed such a string (in binary form!) into an application (via an array or whatever), that's also a solution for me.
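For the object-file step mentioned above, GNU ld can wrap an arbitrary binary file directly (a sketch, assuming a GNU toolchain; the _binary_* symbol names are derived from the input file name):
ld -r -b binary -o hash.o hash.bin
nm hash.o    # _binary_hash_bin_start, _binary_hash_bin_end, _binary_hash_bin_size
The resulting hash.o links into the application like any other object file.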

A bit of sed and echo might do:
for i in $(echo 77f4de214a5423e3c7df8704f65af57c39c55f08424c75c0931ab09a5bfdf49f5f939f2caeff1e0113d9b3d6363583e4830cf40787100b750cde76f00b8cd3ec | sed 's/../& /g'); do
    echo -ne "\x$i"
done > output.bin
The sed command splits the hex string into two-character pairs, and echo -ne emits each pair as a single raw byte.
Or in a shorter form with sha512sum output, as suggested in the comment:
echo -ne "$(sha512sum some-file.txt | sed 's/ .*$//; s/../\\x&/g')"

How about perl:
<<<77f4de214a5423e3c7df8704f65af57c39c55f08424c75c0931ab09a5bfdf49f5f939f2caeff1e0113d9b3d6363583e4830cf40787100b750cde76f00b8cd3ec \
perl -ne 'chomp; print pack "H*", $_' > hash.bin
The chomp matters: the here-string appends a newline, and without it pack "H*" would treat that character as one more hex digit and emit a 65th byte.
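To hash a file and write the binary digest in one step, the same idea works on sha512sum's output directly (a sketch; split takes the first whitespace-separated field, i.e. the hex digest):
sha512sum some-file.txt | perl -ne 'print pack "H*", (split)[0]' > hash.bin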

If you have openssl on your system and want a SHA-512 hash in binary form, you can use this:
openssl dgst -sha512 -binary somefile.txt > hash.bin

If you have node (Buffer.from replaces the deprecated new Buffer constructor):
node -e "const fs = require('fs'); fs.writeFileSync('binary', Buffer.from('77f4de214a5423e3c7df8704f65af57c39c55f08424c75c0931ab09a5bfdf49f5f939f2caeff1e0113d9b3d6363583e4830cf40787100b750cde76f00b8cd3ec', 'hex'))"

s="77f4de214a5423e3c7df8704f65af57c39c55f08424c75c0931ab09a5bfdf49f5f939f2caeff1e0113d9b3d6363583e4830cf40787100b750cde76f00b8cd3ec";
echo -n $s | xxd -r -p > file.bin
1 File(s) 64 bytes
Tested on Ubuntu 16.04.7
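A round-trip check confirms nothing was lost (xxd -p wraps its output across lines, hence the tr):
[ "$(xxd -p file.bin | tr -d '\n')" = "$s" ] && echo OK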

Related

How to check if a binary file is contained inside another binary from the Linux command line?

Basically I want a "multiline grep that takes binary strings as patterns".
For example:
printf '\x00\x01\n\x02\x03' > big.bin
printf '\x01\n\x02' > small.bin
printf '\x00\n\x02' > small2.bin
Then the following should hold:
small.bin is contained in big.bin
small2.bin is not contained in big.bin
I don't want to have to convert the files to ASCII hex representation with xxd as shown e.g. at: https://unix.stackexchange.com/questions/217936/equivalent-command-to-grep-binary-files because that feels wasteful.
Ideally, the tool should handle large files that don't fit into memory.
Note that the following attempts don't work.
grep -f matches where it shouldn't, because it splits the pattern file on newlines and each line becomes a separate pattern:
grep -F -f small.bin big.bin
# Correct: Binary file big.bin matches
grep -F -f small2.bin big.bin
# Wrong: Binary file big.bin matches
Shell substitution as in $(cat) fails because Bash strings cannot hold null characters; command substitution just strips them, so the pattern gets silently corrupted:
grep -F "$(cat small.bin)" big.bin
# Correct: Binary file big.bin matches
grep -F "$(cat small2.bin)" big.bin
# Wrong: Binary file big.bin matches
A C question has been asked at: How can i check if binary file's content is found in other binary file? but is it possible with any widely available CLI (hopefully POSIX, or GNU coreutils) tools?
Notably, implementing a non-naive algorithm such as Boyer-Moore is not entirely trivial.
I can hack up a working Python one-liner as follows, but it won't work for files that don't fit into memory (note the "rb": the files must be opened in binary mode, or Python 3 will try to decode them as text):
grepbin() ( python -c 'import sys;sys.exit(not open(sys.argv[1],"rb").read() in open(sys.argv[2],"rb").read())' "$1" "$2" )
grepbin small.bin big.bin && echo 1
grepbin small2.bin big.bin && echo 2
I could also find the following two tools on GitHub:
https://github.com/tmbinc/bgrep in C, installable with (amazing :-)):
curl -L 'https://github.com/tmbinc/bgrep/raw/master/bgrep.c' | gcc -O2 -x c -o /usr/local/bin/bgrep -
https://github.com/gahag/bgrep in Rust, installable with:
cargo install bgrep
but they don't seem to support taking the pattern from a file; you provide the pattern as hex ASCII on the command line. I could use:
bgrep $(xxd -p small.bin | tr -d '\n') big.bin
since it does not matter as much if the small file gets converted with xxd, but it's not really nice.
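That invocation is easy to wrap in a small shell function (a sketch; bgrepf is a hypothetical name, and it assumes one of the bgrep builds above is on the PATH):
# search for the raw bytes of file $1 inside file $2
bgrepf() {
    bgrep "$(xxd -p "$1" | tr -d '\n')" "$2"
}
bgrepf small.bin big.bin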
In any case, if I were to implement the feature, I'd likely add it to the Rust library above.
bgrep is also mentioned at: How does bgrep work?
Tested on Ubuntu 20.10.
The strictly POSIX-portable way would be to use od to convert to hex and then check for the substring with grep, with some sed scripting in between.
The usual, portable way would be to use xxd instead of od:
xxd -p small.bin | tr -d ' \n' > small.bin2
xxd -p big.bin | tr -d ' \n' > big.bin2
grep -F -f small.bin2 big.bin2
which works fine (tested in Docker on Alpine with BusyBox).
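One caveat: with plain -p output a pattern can match at an odd nibble offset, e.g. the pattern beef matching inside the hex text abeefc, which is actually the bytes ab ee fc. Delimiting every byte pair rules that out; a sketch of the same idea using od (which is POSIX):
od -An -vtx1 small.bin | tr -s ' \n' ' ' > small.hex
od -An -vtx1 big.bin | tr -s ' \n' ' ' > big.hex
grep -F -f small.hex big.hex    # the spaces enforce byte alignment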
But:
I don't want to have to convert the files to ASCII hex representation with xxd as shown
then you can't work with binary files in shell. Pick another language. Shell was created to parse nice-looking, human-readable strings; for anything else it's utterly unpleasant, and for files with zero bytes xxd is the first thing you type.
I can hack up a working Python one liner as follows,
awk is also POSIX and available everywhere - I believe someone more skilled in awk may come and write the exact 1:1 of your python script, but:
but it won't work for files that don't fit into memory:
So write a different algorithm that will not do that.
Overall, given the constraint of not using xxd (or od) to convert a binary file with zero bytes to its hex representation:
is it possible with any widely available CLI (hopefully POSIX, or GNU coreutils) tools?
No. Write your own program for that. You may also write it in perl; it's sometimes available on machines that don't have python.

Decoding base64-encoded random data on MinGW not working

I am trying to make a bash script work on MinGW and it seems the shell is not able to decode something like the following.
t=$(openssl rand -base64 64)
echo "$t" | base64 --decode
Resulting in,
Ԋ7▒%
▒7▒SUfX▒L:o<x▒▒䈏ţ94V▒▒▒▒uW;▒▒pxu▒base64: invalid input
Interestingly, if I paste the base64 output directly and run the command as such, it works.
echo "+e5dcWsijZ83uR2cuKxIDJTTiwTvqB7J0EJ63paJdzGomQxw9BhfPvFTkdKP9I1o
g29pZKjUfDO8/SUNt+idWQ==" | base64 --decode
Does anybody know what I am doing wrong?
Thanks
I solved the above case by passing the --ignore-garbage flag to base64 when decoding. It ignores non-alphabet characters.
echo "$t" | base64 --decode --ignore-garbage
However, I would still like to know how I created "garbage" ;) ?
I think what has happened here is the base64 string contains some embedded spaces, and that causes the actual "invalid input" (and what you observe as garbage).
The openssl rand -base64 64 command introduces some newlines (not spaces), for example,
openssl rand -base64 64 > b64.txt
... then viewing the b64.txt file in an editor I see two separate lines
tPKqKPbH5LkGu13KR6zDdJOBpUGD4pAqS6wKGS32EOyJaK0AmTG4da3fDuOI4T+k
abInqlQcH5k7k9ZVEzv8FA==
... and this implies there is a newline character between the 'k' and the 'a'.
So the base64 string has this embedded newline. base64 -d can handle newlines (as demonstrated by your successful example), but it cannot handle a space character.
The newlines can get translated to spaces by some actions of the shell. This is very likely happening with the echo $t, i.e. if t has newlines inside it, an unquoted echo $t will replace them with single spaces. Exactly how it behaves can depend on shell options and on the type of string quotes, if any, applied.
To fix the command, we can remove the newline before piping to the base64 -d command.
One way to do that is to use the tr command; e.g. the following works on Linux:
t=$(openssl rand -base64 64 | tr -d '\n')
echo $t | base64 -d
... or alternatively, remove the spaces, again using tr (note that $t is unquoted here, so the shell first collapses the newlines into spaces, which tr then deletes):
t=$(openssl rand -base64 64)
echo $t | tr -d ' ' | base64 -d
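To see the embedded newline for yourself, dump the variable byte by byte (a quick check using od; command substitution strips only trailing newlines, not interior ones):
t=$(openssl rand -base64 64)
printf '%s' "$t" | od -c    # the \n between the two base64 lines shows up here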

Convert array of bytes to base64 string using bash [duplicate]

cat /dev/urandom is always a fun way to create scrolling characters on your display, but it produces too many non-printable characters.
Is there an easy way to encode it on the command line so that all of its output is readable characters, base64 or uuencode for example?
Note that I prefer solutions that require no additional files to be created.
What about something like
cat /dev/urandom | base64
Which gives (lots of) stuff like
hX6VYoTG6n+suzKhPl35rI+Bsef8FwVKDYlzEJ2i5HLKa38SLLrE9bW9jViSR1PJGsDmNOEgWu+6
HdYm9SsRDcvDlZAdMXAiHBmq6BZXnj0w87YbdMnB0e2fyUY6ZkiHw+A0oNWCnJLME9/6vJUGsnPL
TEw4YI0fX5ZUvItt0skSSmI5EhaZn09gWEBKRjXVoGCOWVlXbOURkOcbemhsF1pGsRE2WKiOSvsr
Xj/5swkAA5csea1TW5mQ1qe7GBls6QBYapkxEMmJxXvatxFWjHVT3lKV0YVR3SI2CxOBePUgWxiL
ZkQccl+PGBWmkD7vW62bu1Lkp8edf7R/E653pi+e4WjLkN2wKl1uBbRroFsT71NzNBalvR/ZkFaa
2I04koI49ijYuqNojN5PoutNAVijyJDA9xMn1Z5UTdUB7LNerWiU64fUl+cgCC1g+nU2IOH7MEbv
gT0Mr5V+XAeLJUJSkFmxqg75U+mnUkpFF2dJiWivjvnuFO+khdjbVYNMD11n4fCQvN9AywzH23uo
03iOY1uv27ENeBfieFxiRwFfEkPDgTyIL3W6zgL0MEvxetk5kc0EJTlhvin7PwD/BtosN2dlfPvw
cjTKbdf43fru+WnFknH4cQq1LzN/foZqp+4FmoLjCvda21+Ckediz5mOhl0Gzuof8AuDFvReF5OU
Or, without the (useless) cat+pipe:
base64 /dev/urandom
(Same kind of output ^^ )
EDIT: you can also use the --wrap option of base64 to avoid having "short lines":
base64 --wrap=0 /dev/urandom
This will disable wrapping, and you'll get a "full-screen" display ^^
A number of folks have suggested catting and piping through base64 or uuencode. One issue with this is that you can't control how much data to read (it will continue forever, or until you hit Ctrl+C). Another possibility is to use the dd command, which lets you specify how much data to read before exiting. For example, to read 1 kB:
dd if=/dev/urandom bs=1k count=1 2>/dev/null | base64
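With GNU coreutils, head can bound the read as well (a sketch; the K suffix means 1024 bytes):
head -c 1K /dev/urandom | base64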
Another option is to pipe to the strings command, which may give more variety in its output (non-printable characters are discarded, and any run of at least 4 printable characters [by default] is displayed). The problem with strings is that it displays each "run" on its own line.
dd if=/dev/urandom bs=1k count=1 2>/dev/null | strings
(of course you can replace the entire command with
strings /dev/urandom
if you don't want it to ever stop).
If you want something really funky, try one of:
cat -v /dev/urandom
dd if=/dev/urandom bs=1k count=1 2>/dev/null | cat -v
So, what is wrong with
cat /dev/urandom | uuencode -
?
Fixed after the first attempt didn't actually work... ::sigh::
BTW-- Many unix utilities use '-' in place of a filename to mean "use the standard input".
There are already several good answers on how to base64 encode random data (i.e. cat /dev/urandom | base64). However in the body of your question you elaborate:
... encode [urandom] on the command-line in such a way that all of its output are readable characters, base64 or uuencode for example.
Given that you don't actually require parseable base64 and just want it to be readable, I'd suggest
cat /dev/urandom | tr -dC '[:graph:]'
base64 only outputs alphanumeric characters and two symbols (+ and / by default). [:graph:] will match any printable non-whitespace ascii, including many symbols/punctuation-marks that base64 lacks. Therefore using tr -dC '[:graph:]' will result in a more random-looking output, and have better input/output efficiency.
I often use < /dev/random stdbuf -o0 tr -Cd '[:graph:]' | stdbuf -o0 head --bytes 32 for generating strong passwords.
You can do more interesting stuff with Bash's process substitution (uuencode needs a name operand for the header, random.gz here):
uuencode <(head -c 200 /dev/urandom | base64 | gzip) random.gz
cat /dev/urandom | tr -dc 'a-zA-Z0-9'
Try
xxd -ps /dev/urandom
xxd(1)

Why does grep not function as expected with a large file? [duplicate]

grep returns
Binary file test.log matches
For example
echo "line1 re \x00\r\nline2\r\nline3 re\r\n" > test.log # in zsh
echo -e "line1 re \x00\r\nline2\r\nline3 re\r\n" > test.log # in bash
grep re test.log
I want the result to show line1 and line3 (two lines in total).
Is it possible to use tr to convert the unprintable data into readable data, so that grep works again?
grep -a
It can't get simpler than that.
One way is to simply treat binary files as text anyway, with grep --text, but this may well result in binary information being sent to your terminal. That's not really a good idea if you're running a terminal that interprets the output stream (such as VT/DEC or many others).
Alternatively, you can send your file through tr with the following command:
tr '\000-\011\013-\037\177-\377' '.' < test.log | grep whatever
This will change anything less than a space character (except newline), and anything greater than 126, into a . character, leaving only the printables.
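Applied to the test.log from the question, both matching lines come back; the NUL and the carriage return each show up as a dot:
$ tr '\000-\011\013-\037\177-\377' '.' < test.log | grep re
line1 re ..
line3 re.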
If you want every "illegal" character replaced by a different one, you can use something like the following C program, a classic standard input filter:
#include <stdio.h>

int main(void) {
    int ch;
    while ((ch = getchar()) != EOF) {
        if ((ch == '\n') || ((ch >= ' ') && (ch <= '~'))) {
            putchar(ch);
        } else {
            printf("{{%02x}}", ch);
        }
    }
    return 0;
}
This will give you {{NN}}, where NN is the hex code for the character. You can simply adjust the printf for whatever style of output you want.
You can see the program in action here:
pax$ printf 'Hello,\tBob\nGoodbye, Bob\n' | ./filterProg
Hello,{{09}}Bob
Goodbye, Bob
You could run the data file through cat -v, e.g.
$ cat -v tmp/test.log | grep re
line1 re ^@^M
line3 re^M
which could then be further post-processed to remove the junk; this is most analogous to your query about using tr for the task.
-v simply tells cat to display non-printing characters.
You can use "strings" to extract strings from a binary file, for example
strings binary.file | grep foo
You can force grep to look at binary files with:
grep --binary-files=text
You might also want to add -o (--only-matching) so you don't get tons of binary gibberish that will bork your terminal.
Starting with Grep 2.21, binary files are treated differently:
When searching binary data, grep now may treat non-text bytes as line
terminators. This can boost performance significantly.
So what happens now is that with binary data, all non-text bytes
(including newlines) are treated as line terminators. If you want to change this
behavior, you can:
use --text. This will ensure that only newlines are line terminators
use --null-data. This will ensure that only null bytes are line terminators
grep -a will force grep to search and output from a file that grep thinks is binary.
grep -a re test.log
As James Selvakumar already said, grep -a does the trick. -a or --text forces grep to handle the input stream as text.
See Manpage http://unixhelp.ed.ac.uk/CGI/man-cgi?grep
try
cat test.log | grep -a somestring
you can do
strings test.log | grep -i somestring
which will feed grep readable strings.
Here's what I used on a system that didn't have the "strings" command installed:
tr -cd "[:print:]" < yourfilename
This prints the text and removes unprintable characters in one fell swoop, unlike "cat -v filename" which requires some postprocessing to remove unwanted stuff. Note that it also deletes newlines, since [:print:] does not include them, and that some of the binary data may itself be printable, so you'll still get some gibberish between the good stuff. I think strings removes this gibberish too, if you can use that.
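To keep the line structure, add \n to the set of kept characters:
tr -cd '[:print:]\n' < yourfilename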
You can also try Word Extractor tool. Word Extractor can be used with any file in your computer to separate the strings that contain human text / words from binary code (exe applications, DLLs).
