Is there a fast way to read alternate bytes in dd - shell

I'm trying to read out every other pair of bytes in a binary file using dd in a loop, but it is unusably slow.
I have a binary file on a BusyBox embedded device containing data in rgb565 format. Each pixel is 2 bytes and I'm trying to read out every other pixel to do very basic image scaling to reduce file size.
The overall size is 640x480 and I've been able to read every other "row" of pixels by looping dd with a 960-byte block size. But doing the same for every other "column" that remains, by looping through with a 2-byte block size, is ridiculously slow even on my local system.
i=1
while [[ $i -le 307200 ]]
do
dd bs=2 skip=$((i-1)) seek=$((i-1)) count=1 if=./tmpfile >> ./outfile 2>/dev/null
let i=i+2
done
While I get the output I expect, this method is unusable.
Is there some less obvious way to have dd quickly copy every other pair of bytes?
Sadly I don't have much control over what gets compiled into BusyBox. I'm open to other possible methods, but a dd/sh solution may be all I can use. For instance, one build has omitted head -c...
I appreciate all the feedback. I will check out each of the various suggestions and check back with results.

Skipping every other character is trivial for tools like sed or awk, as long as you don't need to cope with newlines and null bytes. But BusyBox's support for null bytes in sed and awk is poor enough that I don't think you can cope with them at all. It's possible to deal with newlines, but it's a giant pain, because there are 2^4 = 16 different combinations to deal with depending on whether each position in a 4-byte block is a newline or not.
Since arbitrary binary data is a pain, let's translate to hexadecimal or octal! I'll draw some inspiration from the bin2hex and hex2bin scripts by Stéphane Chazelas. Since we don't care about the intermediate format, I'll use octal, which is a lot simpler to deal with here because the final step uses printf, which only supports octal. Stéphane's hex2bin uses awk for the hexadecimal-to-octal conversion; an oct2bin can use sed. So in the end you need sh, od, sed and printf.
I don't think you can avoid printf: it's critical to outputting null bytes. While od is essential, most of its options aren't, so it should be possible to tweak this code to support a very stripped-down od with a bit more postprocessing.
od -An -v -t o1 -w4 |
sed 's/^ \([0-7]*\) \([0-7]*\).*/printf \\\\\1\\\\\2/' |
sh
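For the question's use case, the pipeline can be wrapped and pointed at the files directly. A minimal sketch (the function name everyotherpair is mine; tmpfile and outfile follow the question's naming):
everyotherpair() {
  # dump 4 bytes per line as plain octal, keep the first 2-byte pair,
  # rewrite it as a printf command, and execute the commands
  od -An -v -t o1 -w4 |
  sed 's/^ \([0-7]*\) \([0-7]*\).*/printf \\\\\1\\\\\2/' |
  sh
}
everyotherpair <./tmpfile >./outfile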
The reason this is so fast compared to your dd-based approach is that BusyBox runs printf in the parent process, whereas dd requires its own process. Forking is slow. If I remember correctly, there's a compilation option which makes BusyBox fork for all utilities. In this case my approach will probably be as slow as yours. Here's an intermediate approach using dd which can't avoid the forks, but at least avoids opening and closing the file every time. It should be a little faster than yours.
i=$(($(wc -c <"$1") / 4))    # number of 4-byte groups (one kept pair + one skipped pair)
exec <"$1"                   # open the input once; each dd picks up where the last left off
dd ibs=2 count=1 conv=notrunc 2>/dev/null            # copy the first pair
while [ $i -gt 1 ]; do
  dd ibs=2 count=1 skip=1 conv=notrunc 2>/dev/null   # skip a pair, copy a pair
  i=$((i - 1))
done
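Saved as, say, halve.sh (the name is mine), it would run as:
sh halve.sh tmpfile > outfile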

No idea if this will be faster or even possible with BusyBox, but it's a thought...
#!/bin/bash
# Empty result file
> result
exec 3< datafile
while true; do
  # Read 2 bytes into file "short"
  dd bs=2 count=1 <&3 > short 2> /dev/null
  [ ! -s short ] && break
  # Accumulate result file
  cat short >> result
  # Read two bytes and discard
  dd bs=2 count=1 <&3 > short 2> /dev/null
  [ ! -s short ] && break
done
Or this should be more efficient:
#!/bin/bash
exec 3< datafile
for ((i=0;i<76800;i++)) ; do
  # Skip 2 bytes then read 2 bytes
  dd bs=2 count=1 skip=1 <&3 2> /dev/null
done > result
Or, maybe you could use netcat or ssh to send the file to a sensible (more powerful) computer with proper tools to process it and return it. For example, if the remote computer had ImageMagick it could down-scale the image very simply.
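A minimal sketch of that transfer with netcat (the address and port are placeholders, and BusyBox nc flag support varies by build):
# on the desktop: listen and collect the raw frame
nc -l -p 9999 > frame.bin
# on the device: push the file across
nc 192.168.0.10 9999 < tmpfile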

Another option might be to use Lua, which has a reputation for being small, fast and well suited to embedded systems - see the Lua website. There are pre-built, downloadable binaries of it there too. It is also suggested on the BusyBox website.
I have never written any Lua before, so there may be some inefficiencies but this seems to work pretty well and processes a 640x480 RGB565 image in a few milliseconds on my desktop.
-- scale.lua
-- Usage: lua scale.lua input.bin output.bin
-- Scale an image by skipping alternate lines and alternate columns
-- Set up width, height and bytes per pixel
w = 640
h = 480
bpp = 2
-- Open first argument for input, second for output
inp = assert(io.open(arg[1], "rb"))
out = assert(io.open(arg[2], "wb"))
-- Read image, one line at a time
for i = 0, h-1, 1 do
  -- Read a whole line
  line = inp:read(w*bpp)
  -- Only use every second line
  if (i % 2) == 0 then
    io.write("DEBUG: Processing row: ", i, "\n")
    -- Build up new, reduced line by picking substrings
    reduced = ""
    for p = 1, w*bpp, bpp*2 do
      reduced = reduced .. string.sub(line, p, p+bpp-1)
    end
    io.write("DEBUG: New line length in bytes: ", #reduced, "\n")
    out:write(reduced)
  end
end
assert(out:close())
I created a greyscale test image with ImageMagick as follows:
magick -depth 16 -size 640x480 gradient: gray:image.bin
Then I ran the above Lua script with:
lua scale.lua image.bin smaller.bin
Then I made a JPEG I could view for testing with:
magick -depth 16 -size 320x240 gray:smaller.bin smaller.jpg

Related

Bash split stdin by null and pipe to pipeline

I have a stream that is null delimited, with an unknown number of sections. For each delimited section I want to pipe it into another pipeline until the last section has been read, and then terminate.
In practice, each section is very large (~1GB), so I would like to do this without reading each section into memory.
For example, imagine I have the stream created by:
for I in {3..5}; do seq $I; echo -ne '\0'; done
I'll get a stream that looks like:
1
2
3
^@1
2
3
4
^@1
2
3
4
5
^@
When piped through cat -v.
I would like to pipe each section through paste -sd+ | bc, so I get a stream that looks like:
6
10
15
This is simply an example. In actuality the stream is much larger and the pipeline is more complicated, so solutions that don't rely on streams are not feasible.
I've tried something like:
set -eo pipefail
while head -zn1 | head -c-1 | ifne -n false | paste -sd+ | bc; do :; done
but I only get
6
10
If I leave off bc I get
1+2+3
1+2+3+4
1+2+3+4+5
which is basically correct. This leads me to believe that the issue is potentially related to buffering and the way each process is actually interacting with the pipes between them.
Is there some way to fix the way that these commands exchange streams so that I can get the desired output? Or, alternatively, is there a way to accomplish this with other means?
In principle this is related to this question, and I could certainly write a program that reads stdin into a buffer, looks for the null character, and pipes the output to a spawned subprocess, as the accepted answer does for that question. Given the general support for streams and null delimiters in bash, I'm hoping for something a little more "native". In particular, if I go this route, I'll have to wrap the pipeline (paste -sd+ | bc) in a string instead of just letting the same shell interpret it. There's nothing too inherently bad about that, but it's a little ugly and will require a fair amount of error-prone escaping.
Edit
As was pointed out in an answer, head makes no guarantees about how much it buffers. Unless it buffered a single byte at a time, which would be impractical, this approach can never work. Thus, it seems the only solutions are to read each section into memory or to write a dedicated program.
The issue with your original code is that head doesn't guarantee that it won't read more than it outputs. Thus, it can consume more than one (NUL-delimited) chunk of input, even if it's emitting only one chunk of output.
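You can see the overread directly with a quick experiment (a sketch; the exact number of surviving lines depends on head's buffer size):
seq 100000 | { head -n 1 >/dev/null; wc -l; }
# wc reports far fewer than the 99999 remaining lines, because head
# consumed a whole buffer from the pipe, not just the line it printed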
read, by contrast, guarantees that it won't consume more than you ask it for.
set -o pipefail
while IFS= read -r -d '' line; do
  bc <<<"${line//$'\n'/+}"
done < <(build_a_stream)
If you want native logic, there's nothing more native than just writing the whole thing in shell.
Calling external tools -- including bc, cut, paste, or others -- involves a fork() penalty. If you're only processing small amounts of data per invocation, the efficiency of the tools is overwhelmed by the cost of starting them.
while read -r -d '' -a numbers; do    # read up to the next NUL into an array
  sum=0                               # initialize an accumulator
  for number in "${numbers[@]}"; do   # iterate over that array
    (( sum += number ))               # ...using an arithmetic context for our math
  done
  printf '%s\n' "$sum"
done < <(build_a_stream)
For all of the above, I tested with the following build_a_stream implementation:
build_a_stream() {
  local i j IFS=$'\n'
  local -a numbers
  for ((i=3; i<=5; i++)); do
    numbers=( )
    for ((j=0; j<=i; j++)); do
      numbers+=( "$j" )
    done
    printf '%s\0' "${numbers[*]}"
  done
}
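Wiring that generator to the read loop above reproduces the desired output:
$ while IFS= read -r -d '' line; do bc <<<"${line//$'\n'/+}"; done < <(build_a_stream)
6
10
15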
As discussed, the only real solution seemed to be writing a program to do this specifically. I wrote one in Rust called xstream-util. After installing it with cargo install xstream-util, you can pipe the input into
xstream -0 -- bash -c 'paste -sd+ | bc'
to get the desired output
6
10
15
It doesn't avoid having to run the pipeline in bash, so it still needs escaping if the pipeline is complicated. Also, it currently only supports single-byte delimiters.

Disk space required for unix sort

I am currently doing a UNIX sort (via GitBash on a Windows machine) of a 500GB text file. Due to running out of space on the main disk, I have used the -T option to direct the temp files to a disk where I have enough space to accommodate the entire file. The thing is, I've been watching the disk space and apparently the temp files are already in excess of what the original file was. I don't know how much further this is going to go, but I'm wondering if there is a rule by which I can predict how much space I will need for temp files.
I'd batch it manually as described in this unix.SE answer.
Find some very basic queries that will divide your content into chunks that are small enough to be sorted. For example, if it's a file of words, you could create queries like grep ^a …, grep ^b …, and so on. Some items may need more granularity than others.
You can script that like:
#!/bin/bash
for char1 in other {0..9} {a..z}; do
  out="/tmp/sort.$char1.xz"
  echo "Extracting lines starting with '$char1'"
  if [ "$char1" = "other" ]; then char1='[^a-z0-9]'; fi
  grep -i "^$char1" *.txt | xz -c0 > "$out"
  unxz -c "$out" | sort -u >> output.txt || exit 1
  rm "$out"
done
echo "It worked"
I'm using xz -0 because it's almost as fast as gzip's default level (gzip -6) yet vastly better at conserving space. I omitted compression from the final output in order to preserve the exit value of sort -u, but you could instead use a size check (IIRC, sort fails with zero output) and then use sort -u |xz -c0 >> output.txt.xz, since the xz (and gzip) container formats let you concatenate archives (I've written about that before too).
This works because the output of each grep run is already sorted relative to the rest (0 sorts before 1, which sorts before a, and so on), so the final assembly doesn't need another pass through sort. Note that the "other" section will be slightly off, since some non-alphanumeric characters sort before the numbers, others between the numbers and the letters, and others still after the letters. You can also remove grep's -i flag and additionally iterate through {A..Z} to be case sensitive, as sketched below. Each individual iteration obviously still needs to be sorted, but hopefully those chunks are manageable.
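That case-sensitive variant might look like this (a sketch, assuming the C locale so digits sort before upper case, which sorts before lower case; the caveat about the "other" bucket still applies):
#!/bin/bash
export LC_ALL=C
for char1 in other {0..9} {A..Z} {a..z}; do
  out="/tmp/sort.$char1.xz"
  echo "Extracting lines starting with '$char1'"
  if [ "$char1" = "other" ]; then char1='[^A-Za-z0-9]'; fi
  grep "^$char1" *.txt | xz -c0 > "$out"
  unxz -c "$out" | sort -u >> output.txt || exit 1
  rm "$out"
done
echo "It worked"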
If the program exits before completing all iterations and saying "It worked", you can edit the script with a more granular batch for the last iteration it tried. Remove all prior iterations, since they're already safely saved in output.txt.

How to move and rename a file with random characters in shell?

I have this file:
/root/.aria2/aria2.txt
and I want to move it to:
/var/spool/sms/outgoing/aria2_XXXXX
Note that XXXXX are random characters.
How do I do that using only the facilities exposed by the openwrt (a GNU/Linux distribution for embedded devices) and the ash shell?
A simple way of generating a semi-random number in bash is to use the date +%N command or the shell-provided $RANDOM:
rn=$(date +%N) # Nanoseconds
rn=${rn:3:5} # to limit to 5 digits
or, using $RANDOM, you need to check you have sufficient digits for your purpose. If 5 is the number of digits you need:
rn=$RANDOM
while [ ${#rn} -lt 5 ]; do
rn="${rn}${RANDOM}"
done
rn=${rn:0:5}
To move while providing the random suffix:
mv /root/.aria2/aria2.txt /var/spool/sms/outgoing/aria2_${rn}
On systems with /dev/random you can obtain a string of random ASCII characters with something like
dd if=/dev/random count=1 |
tr -dc ' -~' |
dd bs=8 count=1
Set the bs= in the second instance to the number of characters you want.
The probability of getting the same result twice is very low, but you haven't told us what range is acceptable. You should understand (or help us help you understand) what collision probability is acceptable in your scenario.
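Putting that together for the file in question (a sketch: I've narrowed tr's set to filename-safe characters, and the suffix length of 5 matches the XXXXX in the question; if the filter happens to leave fewer than 5 characters, just run it again):
suffix=$(dd if=/dev/random bs=64 count=1 2>/dev/null | tr -dc 'A-Za-z0-9' | dd bs=5 count=1 2>/dev/null)
mv /root/.aria2/aria2.txt "/var/spool/sms/outgoing/aria2_${suffix}"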
Use the tempfile command
mv /root/.aria2/aria2.txt "$(tempfile -d /var/spool/sms/outgoing -p aria2)"
see man tempfile for the gory details.
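If tempfile isn't available (it's Debian-specific), mktemp covers the same ground and is commonly enabled in BusyBox builds; a sketch (mktemp creates the file, then mv overwrites it):
mv /root/.aria2/aria2.txt "$(mktemp /var/spool/sms/outgoing/aria2_XXXXXX)"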

How to shrink a large file in bash?

I have a file that I want to truncate to 2kb (i.e. keep the first 2kb of data, get rid of the rest). How can I do this with bash?
The command is (surprise, surprise) truncate.
truncate -s 2KB file
(Note that with GNU truncate the KB suffix means 1000 bytes; use 2K if you want 1024-byte units.)
The standards-compliant way to do this (not relying on any Linux-only tools such as truncate) is to use dd:
dd if=/dev/null of=/file/to/truncate seek=1 bs=2k
Unlike the other dd answer, which merely copies the first 2k of a file, this one truncates the target file at that point.
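A quick sanity check of the result (a sketch; bigfile is a placeholder name):
$ dd if=/dev/null of=bigfile seek=1 bs=2k 2>/dev/null
$ wc -c < bigfile
2048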
You could do something like this in pure bash:
IFS= read -r -N 2048 first2k < file
printf "%s" "$first2k" > file
but using dd is a much better idea. For one, it's more likely to be atomic; it's possible an external process could modify the first 2048 bytes of file after the read, but before the printf. Second, it's less verbose :) (Note the -N: plain -n would stop at the first newline. Also, bash variables can't hold NUL bytes, so this trick is only safe for text data.)
You can also use read's default variable REPLY, which does not require setting IFS to avoid word splitting:
read -r -N 2048 < file
printf "%s" "$REPLY" > file
Use dd:
dd if=yourfile of=firstLump bs=2k count=1
if = the input file
of = the output file
bs = blocksize
count = number of blocks
Available on Linux and Mac OS X.

Bash Version of C64 Code Art: 10 PRINT CHR$(205.5+RND(1)); : GOTO 10

I picked up a copy of the book 10 PRINT CHR$(205.5+RND(1)); : GOTO 10
http://www.amazon.com/10-PRINT-CHR-205-5-RND/dp/0262018462
This book discusses the art produced by the single line of Commodore 64 BASIC:
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
This just repeatedly prints a random choice of character 205 or 206 to the screen, from the PETSCII set:
http://en.wikipedia.org/wiki/PETSCII
https://vimeo.com/26472518
I'm not sure why the original uses the characters 205 and 206 instead of the identical 109 and 110. Also, I prefer to add a clear at the beginning. This is what I usually type into the C64:
1?CHR$(147)
2?CHR$(109.5+RND(1));:GOTO2
RUN
You can try this all for yourself in an emulator, such as this one using Flash or JavaScript:
http://codeazur.com.br/stuff/fc64_final/
http://www.kingsquare.nl/jsc64
When inputting the above code into the emulators listed, you'll need to realize that
( is *
) is (
+ is ]
I decided it would be amusing to write a bash line to do something similar.
I currently have:
clear; while :; do [ $(($RANDOM%2)) -eq 0 ] && (printf "\\") || (printf "/"); done;
Two questions:
1. Any suggestions for making this more concise?
2. Any suggestions for a better output character? The forward and backward slashes are not nearly as beautiful, since their points don't line up. The characters used from PETSCII are special characters, not slashes. I didn't see anything in ASCII that could work as well, but maybe you can suggest a way to pull in a character from UTF-8 or something else?
Best ANSWERS So Far
Shortest for bash (40 characters):
yes 'c=(╱ ╲);printf ${c[RANDOM%2]}'|bash
Here is a short one for zsh (53 characters):
c=(╱ ╲);clear;while :;do printf ${c[RANDOM%2+1]};done
Here is an alias I like to put in my .bashrc or .profile
alias art='c=(╱ ╲);while :;do printf "%s" ${c[RANDOM%2]};done'
Funny comparing this to the shortest I can do for C64 BASIC (23 characters):
1?C_(109.5+R_(1));:G_1
The underscores are shift+H, shift+N, and shift+O respectively. I can't paste the character here since they are specific to PETSCII. Also, the C64 output looks prettier ;)
You can read about the C64 BASIC abbreviations here:
http://www.commodore.ca/manuals/c64_programmers_reference/c64-programmers_reference_guide-02-basic_language_vocabulary.pdf
How about this?
# The characters you want to use
chars=( $'\xe2\x95\xb1' $'\xe2\x95\xb2' )
# Precompute the size of the array chars
nchars=${#chars[@]}
# clear screen
clear
# The loop that prints it:
while :; do
  printf -- "${chars[RANDOM%nchars]}"
done
As a one-liner with shorter variable names to make it more concise:
c=($'\xe2\x95\xb1' $'\xe2\x95\xb2'); n=${#c[@]}; clear; while :; do printf -- "${c[RANDOM%n]}"; done
You can get rid of the loop if you know in advance how many characters to print (here 80*24=1920)
c=($'\xe2\x95\xb1' $'\xe2\x95\xb2'); n=${#c[@]}; clear; printf "%s" "${c[RANDOM%n]"{1..1920}"}"
Or, if you want to include the characters directly instead of their code:
c=(╱ ╲); n=${#c[@]}; clear; while :; do printf "${c[RANDOM%n]}"; done
Finally, with the size of the array c precomputed and removing unnecessary spaces and quotes (and I can't get shorter than this):
c=(╱‬ ╲);clear;while :;do printf ${c[RANDOM%2]};done
Number of bytes used for this line:
$ wc -c <<< 'c=(╱‬ ╲);clear;while :;do printf ${c[RANDOM%2]};done'
59
Edit. A funny way using the command yes:
clear;yes 'c=(╱ ╲);printf ${c[RANDOM%2]}'|bash
It uses 50 bytes:
$ wc -c <<< "clear;yes 'c=(╱ ╲);printf \${c[RANDOM%2]}'|bash"
51
or 46 characters:
$ wc -m <<< "clear;yes 'c=(╱ ╲);printf \${c[RANDOM%2]}'|bash"
47
After looking at some UTF stuff:
2571 BOX DRAWINGS LIGHT DIAGONAL UPPER RIGHT TO LOWER LEFT
2572 BOX DRAWINGS LIGHT DIAGONAL UPPER LEFT TO LOWER RIGHT
(╱ and ╲) seem best.
f="╱╲";while :;do print -n ${f[(RANDOM % 2) + 1]};done
also works in zsh (thanks Clint on OFTC for giving me bits of that)
Here is my 39-character command-line solution, which I just posted to #climagic:
grep -ao "[/\\]" /dev/urandom|tr -d \\n
In bash, you can remove the double quotes around the [/\\] match expression and make it even shorter than the C64 solution, but I've included them for good measure and cross-shell compatibility. If there were a one-character grep option to trim newlines, you could get this down to 27 characters.
I know this doesn't use the Unicode characters so maybe it doesn't count. It is possible to grep for the Unicode characters in /dev/urandom, but that will take a long time because that sequence comes up less often and if you pipe it the command pipeline will probably "stick" for quite a while before producing anything due to line buffering.
Bash supports Unicode now, so we don't need to use UTF-8 character sequences such as $'\xe2\x95\xb1'.
This is my most-correct version: it loops, prints either / or \ based on a random number as others do.
for((;;x=RANDOM%2+2571)){ printf "\U$x";}
41
My previous best was:
while :;do printf "\U257"$((RANDOM%2+1));done
45
And this one 'cheats' using embedded Unicode (I think for obviousness, maintainability, and simplicity, this is my favourite).
Z=╱╲;for((;;)){ printf ${Z:RANDOM&1:1};}
40
My previous best was:
while Z=╱╲;do printf ${Z:RANDOM&1:1};done
41
And here are some more.
while :;do ((RANDOM&1))&&printf "\U2571"||printf "\U2572";done
while printf -v X "\\\U%d" $((2571+RANDOM%2));do printf $X;done
while :;do printf -v X "\\\U%d" $((2571+RANDOM%2));printf $X;done
while printf -v X '\\U%d' $((2571+RANDOM%2));do printf $X;done
c=('\U2571' '\U2572');while :;do printf ${c[RANDOM&1]};done
X="\U257";while :;do printf $X$((RANDOM%2+1));done
Now, this one runs until we get a stack overflow (not another one!) since bash does not seem to support tail-call elimination yet.
f(){ printf "\U257"$((RANDOM%2+1));f;};f
40
And this is my attempt to implement a crude form of tail-process elimination. But when you have had enough and press ctrl-c, your terminal will vanish.
f(){ printf "\U257"$((RANDOM%2+1));exec bash -c f;};export -f f;f
UPDATE:
And a few more.
X=(╱ ╲);echo -e "\b${X[RANDOM&1]"{1..1000}"}" 46
X=("\U2571" "\U2572");echo -e "\b${X[RANDOM&1]"{1..1000}"}" 60
X=(╱ ╲);while :;do echo -n ${X[RANDOM&1]};done 46
Z=╱╲;while :;do echo -n ${Z:RANDOM&1:1};done 44
Sorry for necroposting, but here's a bash version in 38 characters.
yes 'printf \\u$[2571+RANDOM%2]'|bash
Using for instead of yes inflates this to 40 characters:
for((;;)){ printf \\u$[2571+RANDOM%2];}
109 characters for Python 3, which was the smallest I could get it.
#!/usr/bin/python3
import random
while True:
    if random.randrange(2)==1:print('\u2572',end='')
    else:print('\u2571',end='')
A variant that writes via sys.stdout and flushes after every character:
#!/usr/bin/python3
import random
import sys
while True:
    if random.randrange(2)==1:sys.stdout.write("\u2571")
    else:sys.stdout.write("\u2572")
    sys.stdout.flush()
Here's a version for Batch which fits in 127 characters:
cmd /v:on /c "for /l %a in (0,0,0) do @set /a "a=!random!%2" >nul & if "!a!"=="0" (set /p ".=/" <nul) else (set /p ".=\" <nul)"
