How can I create a binary file using Bash? - bash

How can I create a binary file with consecutive binary values in Bash?
Like:
hexdump testfile
0000000 0100 0302 0504 0706 0908 0b0a 0d0c 0f0e
0000010 1110 1312 1514 1716 1918 1b1a 1d1c 1f1e
0000020 2120 2322 2524 2726 2928 2b2a 2d2c 2f2e
0000030 ....
In C, I do:
fd = open("testfile", O_RDWR | O_CREAT);
for (i=0; i< CONTENT_SIZE; i++)
{
testBufOut[i] = i;
}
num_bytes_written = write(fd, testBufOut, CONTENT_SIZE);
close (fd);
This is what I wanted:
#! /bin/bash
i=0
while [ $i -lt 256 ]; do
h=$(printf "%.2X\n" $i)
echo "$h"| xxd -r -p
i=$((i+1))
done

There's only one byte you cannot pass as an argument in a Bash command line: 0
For any other value, you can just redirect it. It's safe.
echo -n $'\x01' > binary.dat
echo -n $'\x02' >> binary.dat
...
For the value 0, there's another way to output it to a file
dd if=/dev/zero of=binary.dat bs=1c count=1
To append it to file, use
dd if=/dev/zero oflag=append conv=notrunc of=binary.dat bs=1c count=1
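As an alternative that avoids special-casing the zero byte, printf can emit any byte value, NUL included, through an octal escape in its format string. The following is only a minimal sketch; the helper name make_sequence is illustrative and does not come from the answers above.

```shell
#!/bin/bash
# Emit the bytes 0, 1, ..., n-1 on stdout (n <= 256).
# make_sequence is a hypothetical helper name.
make_sequence() {
    local n=$1 i
    for ((i = 0; i < n; i++)); do
        # The inner printf builds a 3-digit octal escape such as \007;
        # the outer printf turns that escape into one raw byte.
        printf "\\$(printf '%03o' "$i")"
    done
}

# Show the first 16 bytes as hex to check the result.
make_sequence 16 | od -An -tx1
```

Redirecting the output, for example make_sequence 256 > testfile, should give a file like the one in the question.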

Take a look at xxd:
xxd: creates a hex dump of a given file or standard input. It can also convert a hex dump back to its original binary form.

If you don't mind using a non-standard tool and want to describe your data in a text file, you can use binmake. It is a C++ program that you can compile and use as follows:
First get and compile binmake (the binary will be in bin/):
git clone https://github.com/dadadel/binmake
cd binmake
make
Create your text file file.txt:
big-endian
00010203
04050607
# Separated bytes not concerned by endianness
08 09 0a 0b 0c 0d 0e 0f
Generate your binary file file.bin:
./binmake file.txt file.bin
hexdump file.bin
0000000 0100 0302 0504 0706 0908 0b0a 0d0c 0f0e
0000010
Note: you can also use it with standard input and standard output.

Use the following command:
i=0; while [ $i -lt 256 ]; do echo -en '\x'$(printf "%02x" $i) >> binary.dat; i=$((i+1)); done

Related

bash: how to print an integer in hex to a specific length

I am trying to dump the decimal integer values from one file in a hex format.
I do have a file with integer values in decimal.
$ more test.dat_trim
2 9
0 -11
7 -17
14 -1
I am trying to print these integers in hex. I also know that the integer values are small enough to fit in 2 bytes. I want the output to be on 2 bytes. But when I try:
declare -i i;for i in $(<test.dat_trim);do printf "%.2x\n" $i; done;
02
09
00
fffffffffffffff5
07
ffffffffffffffef
0e
ffffffffffffffff
Basically, printf "%.2x\n" only works for positive numbers. How can I make it work for negative numbers too?
Just to clarify what I am expecting, the result should be like this:
02
09
00
f5
07
ef
0e
ff
meaning that I want the negative values to be sign-extended to only 1 byte.
Printing signed hex values is uncommon, so there is no conversion specifier providing this.
You could use the following work around:
for i in $(<test.dat_trim); do
if [ $i -ge 0 ]; then
printf " 0x%02x\n" $i;
else
printf "%c0x%02x\n" '-' $[$i * -1];
fi
done;
Referring to the update to the question:
Just replace this line
printf "%c0x%02x\n" '-' $[$i * -1];
with this
printf " 0x%02x\n" $[256 + $i];
This, however, only works for numbers >= -256.
It can be done in awk, that handles negative numbers also:
awk '{printf "0x%x%s0x%x\n", $1, OFS, $2}' OFS='\t' file
0x2 0x9
0x0 0xfffffff5
0x7 0xffffffef
0xe 0xffffffff
Kinda silly but what the heck:
xargs -a test.dat_trim bash -c 'printf %.2s\\n $(printf %02x\\n $* | rev) | rev' _
Have you tried printf("%04x\n",i)?
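Another way to get the two-digit two's-complement form asked for in the question is to mask the value to its low 8 bits before printing. This is a small sketch under that assumption; the function name to_hex_byte is made up for illustration.

```shell
#!/bin/bash
# Print the low byte of a (possibly negative) integer as two hex digits.
# to_hex_byte is a hypothetical helper name.
to_hex_byte() {
    # "& 0xff" keeps only the lowest 8 bits, which is the one-byte
    # two's-complement representation for values from -128 to 255.
    printf '%02x\n' $(( $1 & 0xff ))
}

for v in 2 9 0 -11 7 -17 14 -1; do
    to_hex_byte "$v"
done
```

For the sample data this yields 02, 09, 00, f5, 07, ef, 0e and ff, one per line. Values outside the one-byte range are silently truncated, so a range check may be worth adding.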

skip the first 32k of stdin with dd?

If I have a file on a file system I can do something like this with dd:
dd if=/my/filewithaheader.bin bs=32k skip=1 | gunzip | tar tvf -
however if I try something like this:
./commandthatputsstuffonstdout | dd bs=32k skip=1 | gunzip | tar tvf -
I get the error:
dd: 'standard input': cannot skip to specified offset.
How can I fix this? Can it be done with dd, or is there another Unix command I can use?
You could use tail. Say:
./commandthatputsstuffonstdout | tail -c +1025 ...
to skip the first 1024 bytes of output produced by your command.
From man tail:
-c, --bytes=K
output the last K bytes; alternatively, use -c +K to output
bytes starting with the Kth of each file
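Because tail -c +K counts from 1, skipping N bytes means passing N+1. A tiny wrapper makes that off-by-one explicit; this is only a sketch, and the name skip_bytes is invented for illustration.

```shell
#!/bin/bash
# Drop the first N bytes of standard input and pass the rest through.
# skip_bytes is a hypothetical helper name.
skip_bytes() {
    # tail -c +K starts output at byte K (1-based), hence N + 1.
    tail -c +"$(( $1 + 1 ))"
}

# Skip a 6-byte header on a pipe.
printf 'HEADERpayload\n' | skip_bytes 6
```

For the case in the question, skip_bytes 32768 would take the place of the failing dd stage in the pipeline.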
I've just run into this too, and using the fullblock iflag prevents the short read and subsequent abort.
Example:
gzip -d < ./disk_image.dd.gz | \
dd bs=4M skip=32768 iflag=fullblock,skip_bytes of=./partial_image.dd
Bit late answer but this dd example worked for me.
Create example source file:
$ dd if=/tmp/somefile of=/tmp/test skip=50 bs=100 count=1
Skip 50 bytes and copy 10 bytes after it to test_skip file:
$ dd if=/tmp/test of=/tmp/test_skip skip=50 bs=1 count=10
10+0 records in
10+0 records out
10 bytes (10 B) copied, 8.2091e-05 s, 122 kB/s
Or data from stdin:
cat /tmp/test| dd of=/tmp/stdin bs=1 skip=50 count=1
Verify output:
$ hexdump /tmp/test
0000000 ebf3 e8fd df1b 0aa1 faa3 1fba 1817 1267
0000010 1402 f539 fb69 f263 f319 084b 0b26 1150
0000020 182a f98d 030c e0b0 e47c f13d ef3b 1146
0000030 0b7e 0f72 0e58 f2bd f403 ee95 e529 0567
0000040 f88e 1994 0e83 12e5 11e7 fd4b 032f f4f0
0000050 fc9d 010a 0ab6 06b6 1224 f5cb 01e4 e67a
0000060 ebe0 f1a0
0000064
$ hexdump /tmp/test_skip
0000000 0f72 0e58 f2bd f403 ee95
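The byte at source file offset 50 (0x32) is 0x72; hexdump's default output shows 16-bit little-endian words, so the byte pair 72 0f is displayed as 0f72.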

redirect stdout to script, so it can be parsed and then sent to stdout

I have a (java) program that prints a line of hex numbers to stdout every 5ish seconds, until the program is terminated by the user.
I would like to redirect that output to a bash script so I could convert each of those hex numbers independently to decimal, then print the parsed line to stdout.
I tried using myProgram | myScript but that did the piping before any lines were printed, then didn't keep listening to stdout. I then tried myProgram > myScript, and that just overwrote the script.
Ideas?
Edit: adding output from the runs (sorry for the poor formatting; I couldn't get it all into the code highlighting, so the middle of the output is not highlighted).
Here is the script
#!/bin/bash
echo $0
echo $#
echo $1
Here is how my program runs when it goes straight to stdout; this would continue forever if I didn't terminate it.
mmmm@mmmm:~/mmmm/mmmm/mmmmm$ java net.tinyos.tools.Listen -comm
serial@/dev/ttyUSB0:micaz
serial@/dev/ttyUSB0:57600: resynchronising
00 FF FF 00 02 04 22 93 00 02 02 C9
00 FF FF 00 03 04 22 93 00 03 03 0E
00 FF FF 00 02 04 22 93 00 03 03 0E
00 FF FF 00 02 04 22 93 00 02 02 C9
^Z
[5]+ Stopped java net.tinyos.tools.Listen -comm
serial@/dev/ttyUSB0:micaz
Here is where I try to pipe it to my script (which I have set to print the number of command line arguments and the first argument). It just freezes after this...
mmmm@mmmm:~/mmmm/mmmm/mmmmm$ java net.tinyos.tools.Listen -comm serial@/dev/ttyUSB0:micaz | ./parser.sh
./parser.sh
0
serial@/dev/ttyUSB0:57600: resynchronising
Diagnosis
When you use this script like this:
java javaprog | myScript
and myScript contains:
#!/bin/bash
echo $0
echo $#
echo $1
Then the output from the script will be its name (myScript) from the echo $0, the number of arguments it was passed (0) from the echo $#, and the first argument (an empty line is echoed) from the echo $1. The script then exits (successfully). The issue is nothing to do with buffering; it is all to do with the script not reading anything from its standard input. Even a trivial modification would be an improvement:
#!/bin/bash
while read data; do echo $data; done
That's a slower form of cat, except that it normalizes random sequences of spaces and tabs into single spaces, stripping leading and trailing spaces off the line. It would at least demonstrate the script processing the output from the Java program.
Trying awk
To do what you're after, you should probably replace that with an awk program or something similar. This is a first draft, but it stands some chance of working:
awk '{for(i = 1; i <= NF; i++) { x = "0x" $i + 0; printf(" %d", x); } printf "\n";}'
This says 'for each line (because there is no pattern before the open brace)', do 'for each of the fields 1..NF, convert the field into an explicit hex string with the 0x prefix and adding 0, then print the value as a decimal number (trusting awk to convert a string such as '0xC9' to a number)'.
Using Perl
Unfortunately, a little testing shows that this does not work; the problem is getting a value other than 0 for x. So, ... time to fall back on Perl in awk-emulation mode:
$ echo '00 C9 28 13 A0 FF 01' |
> perl -na -e 'for ($i = 0; $i < scalar(@F); $i++) { printf(" %d", hex $F[$i]); }
> printf "\n";'
0 201 40 19 160 255 1
$
That works - it's even fairly easy to understand. The -n option means 'read each line of data and execute the commands in the script on each line (but do not print $_ at the end)'. The -a option combined with either -n (as here, or -p which is like -n except it prints $_ automatically) means 'automatically split the input into the array @F'. The script then processes each element of @F in each line (rather verbosely), using the hex function to convert the string in $F[$i] to a number and then printing that number with printf(). The verbosity can be reduced (this is Perl: There's More Than One Way To Do It, or TMTOWTDI - tim-toady) with:
$ echo '00 C9 28 13 A0 FF 01' |
> perl -na -e 'foreach my $i (@F) { printf(" %d", hex $i); } printf "\n";'
0 201 40 19 160 255 1
$
Same result, less code. There might be more abbreviated techniques; that's compact enough without being wholly illegible.
1. Check if your system has the unbuffer command installed
which unbuffer
(typically systems that are using bash are Linux-based, and have unbuffer available)
2. If yes,
unbuffer myProgram | myScript
edit
As you have shown us your shell script as
#!/bin/bash
echo $0
echo $#
echo $1
Please recall that the values you are echoing, $0, $#, $1 are positional parameters to bash related to the command line arguments. Typically options or filenames for processing.
To print the whole line, the number of fields on the line, and the value of the first field, awk is a perfect solution to this problem.
Try changing your script to
cat myScript.awk
#!/bin/awk -f
{
print $0
print NF
print $1
}
chmod 755 myScript.awk
Hmm.. Seeing ^Z to stop input tells me you are using Windows or are you using bash under Cygwin?
I hope this helps.
This might be a buffering issue. The GNU Coreutils come with a tool called stdbuf. If it is available on your system, try running:
stdbuf -o0 program | stdbuf -i0 script
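Since the question asked for a bash script, the read loop from the diagnosis can also do the hex-to-decimal conversion itself using bash's base#number arithmetic. The sketch below is one possible shape for parser.sh; the function name hex_lines_to_dec is an assumption, as is the rule of skipping any word that is not purely hexadecimal (so a status word that happens to look like hex would still be converted).

```shell
#!/bin/bash
# Read lines of space-separated hex numbers from stdin and print each
# line converted to decimal. hex_lines_to_dec is a hypothetical name.
hex_lines_to_dec() {
    local -a fields out
    local field
    while read -r -a fields; do
        out=()
        for field in "${fields[@]}"; do
            # Skip words that are not plain hex (e.g. status messages).
            [[ $field =~ ^[0-9A-Fa-f]+$ ]] || continue
            # 16#XX tells bash arithmetic to read XX as base 16.
            out+=( "$(( 16#$field ))" )
        done
        # Print only lines that produced at least one number.
        (( ${#out[@]} )) && echo "${out[*]}"
    done
    return 0
}

echo '00 C9 28 13 A0 FF 01' | hex_lines_to_dec
```

With the sample line this prints 0 201 40 19 160 255 1, matching the Perl output above. Each line is handled as soon as it arrives, so any remaining delay would come from buffering on the producer side.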

How to get only the first ten bytes of a binary file

I am writing a bash script that needs to get the header (first 10 bytes) of a file and then in another section get everything except the first 10 bytes. These are binary files and will likely have \0's and \n's throughout the first 10 bytes. It seems like most utilities work with ASCII files. What is a good way to achieve this task?
To get the first 10 bytes, as noted already:
head -c 10
To get all but the first 10 bytes (at least with GNU tail):
tail -c+11
head -c 10 does the right thing here.
You can use the dd command to copy an arbitrary number of bytes from a binary file.
dd if=infile of=outfile1 bs=10 count=1
dd if=infile of=outfile2 bs=10 skip=1
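To check that head -c and tail -c really are byte-safe, the commands above can be tried on a small file whose header contains a NUL and a newline. This is a sketch only; the helper names first_n and after_n and the demo data are made up for illustration.

```shell
#!/bin/bash
# first_n / after_n are hypothetical wrappers around head and tail.
first_n() { head -c "$1" -- "$2"; }                 # first N bytes of a file
after_n() { tail -c +"$(( $1 + 1 ))" -- "$2"; }     # everything after N bytes

demo=$(mktemp)
# 10-byte header (NUL, newline, "12345678") followed by a text payload.
printf '\000\n12345678payload' > "$demo"

first_n 10 "$demo" | od -An -tx1    # header shown as hex bytes
after_n 10 "$demo"; echo            # the remainder of the file
rm -f "$demo"
```

Piping the header through od rather than storing it in a shell variable matters here, because bash variables cannot hold NUL bytes.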
How to split a stream (or a file) under bash
Two answers here!
Reading the SO request:
get the header (first 10 bytes) of a file and then in another section get everything except the first 10 bytes.
I understand:
How to split a file at a specific point
All the other answers here access the same file twice, instead of just splitting it!
Here are my two cents:
The interesting thing about Un*x is that every job can be treated as a filter, so it's easy to split a stream using unbuffered I/O. Most standard Un*x tools (cat, grep, awk, sed, python, perl ...) work as filters.
1. Using head or dd but in a single pass
{ head -c 10 >head_part; cat >tail_part;} <file
This is the most efficient, as your file is read only once: the first 10 bytes go to head_part and the rest goes to tail_part.
Note: the second redirection >tail_part could also be placed outside the whole list ({ ...;}).
You could do the same using dd:
{ dd count=1 bs=10 of=head_part; cat;} <file >tail_part
This remains more efficient than running two dd processes that each open the same file, and cat still uses a standard block size for the rest of the file.
Another sample based on read by line:
Split HTTP (or mail) stream on near empty line (line containing only carriage return: \r):
nc google.com 80 <<<$'GET / HTTP/1.0\r\nHost: google.com\r\n\r' |
{ sed -u '/^\r$/q' >/tmp/so_head.raw; cat;} >/tmp/so_body.raw
or, to drop empty last head line:
nc google.com 80 <<<$'GET / HTTP/1.0\r\nHost: google.com\r\n\r' |
{ sed -nu '/^\r$/q;p' >/tmp/so_head.raw; cat;} >/tmp/so_body.raw
This will produce two files:
ls -l so_*.raw
-rw-r--r-- 1 root root 307 Apr 25 11:40 so_head.raw
-rw-r--r-- 1 root root 219 Apr 25 11:40 so_body.raw
grep www so_*.raw
so_body.raw:here.
so_head.raw:Location: http://www.google.com/
2. Pure bash way:
If the goal is to obtain values of first 10 bytes in a usable bash variable, here is a nice and efficient way:
Because ten bytes are few, forking to head can be avoided. From Read a file by bytes in BASH:
read8() {
local _r8_var=${1:-OUTBIN} _r8_car LANG=C IFS=
read -r -d '' -n 1 _r8_car || { printf -v $_r8_var '';return 1;}
printf -v $_r8_var %02X "'"$_r8_car
}
{
first10=()
for i in {0..9};do
read8 first10[i] || break
done
cat
} < "$infile" >"$outfile"
This will create an array ${first10[@]} containing the hexadecimal values of the first ten bytes of $infile and store the rest of the data in $outfile.
declare -p first10
declare -a first10=([0]="25" [1]="50" [2]="44" [3]="46" [4]="2D" [5]="31" [6]="2E"
[7]="34" [8]="0A" [9]="25")
This was a PDF (%PDF -> 25 50 44 46)... Here's another sample:
{
first10=()
for i in {0..9};do
read8 first10[i] || break
done
cat
} <<<"Hello world!"
d!
As I didn't redirect output, string d! will be output on terminal.
echo ${first10[@]}
48 65 6C 6C 6F 20 77 6F 72 6C
printf '%b%b%b%b%b%b%b%b%b%b\n' ${first10[@]/#/\\x}
Hello worl
About binary
You said:
These are binary files and will likely have \0's and \n's throughout the first 10 bytes.
{
first10=()
for i in {0..9};do
read8 first10[i] || break
done
cat
} < <(gzip <<<"Hello world!") >/dev/null
echo ${first10[@]}
1F 8B 08 00 00 00 00 00 00 03
( Sample with a \n at bottom of this ;)
As a function
read8() { local _r8_var=${1:-OUTBIN} _r8_car LANG=C IFS=
read -r -d '' -n 1 _r8_car || { printf -v $_r8_var '';return 1;}
printf -v $_r8_var %02X "'"$_r8_car ;}
get10() {
local -n result=${1:-first10} # 1st arg is array name
local -i _i
result=()
for ((_i=0;_i<${2:-10};_i++));do # 2nd arg is number of bytes
read8 result[_i] || { unset result[_i] ; return 1 ;}
done
cat
}
Then (here, I use the special character ⛶ for: there was no newline. ).
get10 pdf 4 <$infile >$outfile
printf %b ${pdf[@]/#/\\x}
%PDF⛶
echo $(( $(stat -c %s $infile) - $(stat -c %s $outfile) ))
4
get10 test 8 <<<'Hello World!'
rld!
printf %b ${test[@]/#/\\x}
Hello Wo⛶
get10 test 24 <<<'Hello World!'
printf %b ${test[@]/#/\\x}
Hello World!
( And the last character printed is a \n! ;)
Final binary demo:
get10 test 256 < <(gzip <<<'Hello world!')
printf '%b' ${test[@]/#/\\x} | gunzip
Hello world!
printf " %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s\n" ${test[@]}
1F 8B 08 00 00 00 00 00 00 03 F3 48 CD C9 C9 57
28 CF 2F CA 49 51 E4 02 00 41 E4 A9 B2 0D 00 00
00
Note!! This works fine and is very quick as long as the number of bytes to read stays low, even when processing large files. This could be used for file recognition, for example. But for splitting files into larger parts, you have to use split, head, tail and/or dd.

using bash: write bit representation of integer to file

I have a file with binary data and I need to replace a few bytes in a certain position. I've come up with the following to direct bash to the offset and show me that it found the place I want:
dd bs=1 if=file iseek=24 conv=block cbs=2 | hexdump
Now, to use "file" as the output:
echo anInteger | dd bs=1 of=hextest.txt oseek=24 conv=block cbs=2
This seems to work just fine, I can review the changes made in a hex editor. Problem is, "anInteger" will be written as the ASCII representation of that integer (which makes sense) but I need to write the binary representation.
I want to use bash for this and the script should run on as many systems as possible (I don't know if the target system will have python or whatever installed).
How do I tell the command to convert the input to binary (possibly from a hex)?
printf is more portable than echo. This function takes a decimal integer and outputs a byte with that value:
echobyte () {
if (( $1 >= 0 && $1 <= 255 ))
then
printf "\\x$(printf "%x" $1)"
else
printf "Invalid value\n" >&2
return 1
fi
}
$ echobyte 97
a
$ for i in {0..15}; do echobyte $i; done | hd
00000000 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f |................|
00000010
You can use echo to emit specific bytes using hex or octal. For example:
echo -n -e \\x30
will print ascii 0 (0x30)
(-n remove trailing newline)
xxd is the better way. xxd -r infile outfile reads an ASCII hex dump from infile and patches outfile with it, and each line of infile can name the offset to write at, like this: 1FE:55AA
Worked like a treat. I used the following code to replace 4 bytes at byte 24 in little endian with two integers (1032 and 1920). The code does not truncate the file.
echo -e \\x08\\x04\\x80\\x07 | dd of=<file> obs=1 oseek=24 conv=block,notrunc cbs=4
Thanks again.
I have a function to do this:
# number representation from 0 to 255 (one char long)
function chr() { printf "\\$(printf '%03o' "$1")" ; return 0 ; }
# from 0 to 65535 (two chars long); little-endian emits the low byte first
function word_littleendian() { chr $(($1 % 256)) ; chr $(($1 / 256)) ; return 0 ; }
function word_bigendian() { chr $(($1 / 256)) ; chr $(($1 % 256)) ; return 0 ; }
# from 0 to 4294967295 (four chars long)
function dword_littleendian() { word_littleendian $(($1 % 65536)) ; word_littleendian $(($1 / 65536)) ; return 0 ; }
function dword_bigendian() { word_bigendian $(($1 / 65536)) ; word_bigendian $(($1 % 65536)) ; return 0 ; }
You can use piping or redirection to catch the result.
If you're willing to rely on bc (which is fairly common)
echo -e "ibase=16\n obase=2 \n A1" | bc -q
might help.
With bash, printf has the -v option, and every shell has bitwise operators.
So here is a simpler form in bash:
int2bin() {
local i=$1
local f
printf -v f '\\x%02x\\x%02x\\x%02x\\x%02x' $((i&255)) $((i >> 8 & 255)) $((i >> 16 & 255)) $((i >> 24 & 255))
printf "$f"
}
You might put the desired input into a file and use the "if=" option to dd to insert exactly the input you desire.
In my case, I needed to go from a decimal numeric argument to the actual unsigned 16-bit big endian value. This is probably not the most efficient way, but it works:
# $1 is whatever number (0 to 65535) the caller specifies
DECVAL=$1
HEXSTR=`printf "%04x" "$DECVAL"`
BYTEONE=`echo -n "$HEXSTR" | cut -c 1-2`
BYTETWO=`echo -n "$HEXSTR" | cut -c 3-4`
echo -ne "\x$BYTEONE\x$BYTETWO" | dd of="$FILENAME" bs=1 seek=$((0xdeadbeef)) conv=notrunc
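The same idea can be wrapped in a function that combines the bit-shift approach with dd's conv=notrunc, so the cut calls are no longer needed. This is only a sketch; the name patch_u16_be and its argument order are assumptions, not part of any answer above.

```shell
#!/bin/bash
# Overwrite two bytes of FILE at byte OFFSET with VALUE (0..65535),
# stored big-endian, without truncating the file.
# patch_u16_be is a hypothetical helper name.
patch_u16_be() {
    local file=$1 offset=$2 value=$3 hi lo
    (( value >= 0 && value <= 65535 )) || return 1
    printf -v hi '%02x' $(( (value >> 8) & 255 ))   # high byte as two hex digits
    printf -v lo '%02x' $(( value & 255 ))          # low byte as two hex digits
    # The format becomes e.g. \x12\x34, which printf emits as two raw bytes;
    # conv=notrunc keeps the rest of the file intact.
    printf "\\x$hi\\x$lo" | dd of="$file" bs=1 seek="$offset" conv=notrunc 2>/dev/null
}

demo=$(mktemp)
printf 'AAAAAAAA' > "$demo"
patch_u16_be "$demo" 2 $((0x1234))
od -An -tx1 "$demo"
rm -f "$demo"
```

Swapping the order of the two escapes in the final printf would give the little-endian layout instead.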
