Which efficient & portable shell statement on GNU/Linux can zero-pad piped bytes to a word boundary?

I need to pad NUL bytes onto the end of a byte stream that exceeds available storage & memory, so that the output length is divisible by N. Context of the function I am implementing:
#!/bin/sh
generate_arbitrary_length | paddingN | work_with_padded
Working code for N=8192:
padding8192(){ dd status=none bs=8192 conv=sync ; }
But reducing the copy block size is orders of magnitude slower for small N; this did not finish:
padding4(){ dd status=none bs=4 conv=sync ; }
I can express the counting & padding using wc and dd, after duplicating the input stream:
padding4(){ { { tee /dev/fd/3 >&2 ; } 3>&1 | wc -c | { read -r isize ; pad=$(( (4 - isize % 4) % 4 )) ; [ 0 -lt "$pad" ] && dd status=none if=/dev/zero bs=$pad count=1 >&2 ; } } 2>&1 ; }
Much faster already, but very difficult to read: who could even tell why the padding ends up at EOF? (It does because tee sends the data straight to fd 2 while only the byte count travels through the pipe; dd then writes its zeroes to the same fd 2, necessarily after tee has hit EOF.)
Any better approach?
Though I only need to keep as much state as needed to store the byte count modulo the word size, I cannot think of a simple yet performant implementation using shell builtins. Dependencies should remain minimal: GNU coreutils/cpio/tar are fine, but no compiler, no Perl, and no features that differ between busybox/dash/bash. I have not come up with an awk solution either, as I failed to make awk perform well (GB/s) on binary input that is not evenly NL/NUL-separated into lines.

Even though you say a compiler shouldn't be among the dependencies, here's a tiny, portable C program. It does not get any faster or more memory-economical than this. It's even readable for most people in the programming community. If not, you can always sprinkle in /* Comments! */. :-)
#!/bin/sh
#
# pad.sh - pad input, reading in large blocks from stdin, writing stdout.
# padding $1:padchar $2:alignment $3:blocksize
padding () {
    aout="./a$$.out"
    cc -x c -o "$aout" - <<EOF
#include <stdio.h>
int main (void) {
    size_t align = $2, nwritten = 0, nread;
    char buffer[$3];
    while ((nread = fread (buffer, 1, sizeof buffer, stdin)) > 0)
        nwritten += fwrite (buffer, 1, nread, stdout);
    if ((nwritten % align) != 0)
        for (align -= nwritten % align; align != 0; --align)
            putchar ($1);
    return 0;
}
EOF
    "$aout" && rm "$aout"
}
printf '%s' 123456789 | padding 0 4 16384 | od -c
printf '%s' abcdefghi | padding "'\n'" 16 BUFSIZ | od -c
printf '%s' PAGE_SIZE | padding 65 32 "$(getconf PAGE_SIZE)" | od -c
In action:
$ ./pad.sh
0000000 1 2 3 4 5 6 7 8 9 \0 \0 \0
0000014
0000000 a b c d e f g h i \n \n \n \n \n \n \n
0000020
0000000 P A G E _ S I Z E A A A A A A A
0000020 A A A A A A A A A A A A A A A A
0000040
If you are concerned about the non-POSIX compiler option -x c, you can easily write the C program to pad.c and compile it from there. Advanced error handling for fwrite, fread and putchar is left to the reader.
Note how the here-document saves main from having to parse arguments. You can even pass macros like BUFSIZ, which stdio.h makes available by default.
I just realized that compiling C like this is not much different from a nifty awk script: awk also compiles an internal program and then executes it. And what's better than compiling for the machine's CPU and running the executable?

The POSIX thing to do would be to use a temporary file.
padding() (
    tmpf=$(mktemp) &&
    trap 'rm "$tmpf"' EXIT &&
    tee "$tmpf" &&
    isize=$(wc -c <"$tmpf") &&
    pad=$(( ($1 - isize % $1) % $1 )) &&
    if [ "$pad" -ne 0 ]; then
        dd status=none if=/dev/zero bs="$pad" count=1
    fi
)
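A quick sanity check of this function, padding nine bytes to a multiple of 4 (output as GNU od -c formats it):
$ printf '%s' 123456789 | padding 4 | od -c
0000000   1   2   3   4   5   6   7   8   9  \0  \0  \0
0000014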


Is there a command for substituting a set of characters by a set of strings?

I would like to substitute a set of single-byte characters with a set of literal strings in a stream, without any constraint on the line size.
#!/bin/bash
for (( i = 1; i <= 0x7FFFFFFFFFFFFFFF; i++ ))
do
    printf '\a,\b,\t,\v'
done |
chars_to_strings $'\a\b\t\v' '<bell>' '<backspace>' '<horizontal-tab>' '<vertical-tab>'
The expected output would be:
<bell>,<backspace>,<horizontal-tab>,<vertical-tab><bell>,<backspace>,<horizontal-tab>,<vertical-tab><bell>...
I can think of a bash function that would do that, something like:
chars_to_strings() {
    local delim buffer
    while true
    do
        delim=''
        IFS='' read -r -d '.' -n 4096 buffer && (( ${#buffer} != 4096 )) && delim='.'
        if [[ -n "${delim:+_}" ]] || [[ -n "${buffer:+_}" ]]
        then
            # Do the replacements in "$buffer"
            # ...
            printf "%s%s" "$buffer" "$delim"
        else
            break
        fi
    done
}
But I'm looking for a more efficient way, any thoughts?
Since you seem to be okay with using ANSI C quoting via $'...' strings, then maybe use sed?
sed $'s/\a/<bell>/g; s/\b/<backspace>/g; s/\t/<horizontal-tab>/g; s/\v/<vertical-tab>/g'
Or, via separate commands:
sed -e $'s/\a/<bell>/g' \
    -e $'s/\b/<backspace>/g' \
    -e $'s/\t/<horizontal-tab>/g' \
    -e $'s/\v/<vertical-tab>/g'
Or, using awk, which replaces newline characters too (by customizing the Output Record Separator, i.e., the ORS variable):
$ printf '\a,\b,\t,\v\n' | awk -v ORS='<newline>' '
{
    gsub(/\a/, "<bell>")
    gsub(/\b/, "<backspace>")
    gsub(/\t/, "<horizontal-tab>")
    gsub(/\v/, "<vertical-tab>")
    print $0
}
'
<bell>,<backspace>,<horizontal-tab>,<vertical-tab><newline>
For a simple one-liner with reasonable portability, try Perl.
for (( i = 1; i <= 0x7FFFFFFFFFFFFFFF; i++ ))
do
    printf '\a,\b,\t,\v'
done |
perl -pe 's/\a/<bell>/g; s/[\b]/<backspace>/g;
          s/\t/<horizontal-tab>/g; s/\x0b/<vertical-tab>/g'
(Note that in a Perl regex a bare \b is a word-boundary assertion and \v matches any vertical whitespace, newlines included, so the backspace and vertical-tab characters are spelled [\b] and \x0b here.)
Perl internally does some intelligent buffer management, so it's not encumbered by lines longer than its input buffer.
Perl by itself is not POSIX, of course; but it can be expected to be installed on any even remotely modern platform (short of perhaps embedded systems and the like).
Assuming the overall objective is to provide the ability to process a stream of data in real time, without having to wait for an EOL/end-of-buffer occurrence to trigger processing, a few items:
- continue to use the while/read -n loop to read a chunk of data from the incoming stream and store it in a buffer variable
- push the conversion code into something that's better suited to string manipulation (i.e., something other than bash); for the sake of discussion we'll choose awk
- within the while/read -n loop, printf "%s\n" "${buffer}" and pipe the output from the while loop into awk; NOTE: the key item is to introduce an explicit \n into the stream so as to trigger awk processing for each new 'line' of input; the OP can decide whether this additional \n must be distinguished from a \n occurring in the original stream of data
- awk then parses each line of input as per the replacement logic, making sure to append anything left over to the front of the next line of input (i.e., for when the while/read -n breaks an item in the 'middle')
General idea:
chars_to_strings() {
    while read -r -n 15 buffer    # using '15' for demo purposes; otherwise replace with '4096' or whatever the OP wants
    do
        printf "%s\n" "${buffer}"
    done | awk '{print NR,FNR,length($0)}'    # replace 'print ...' with the OP's replacement logic
}
Take for a test drive:
for (( i = 1; i <= 20; i++ ))
do
    printf '\a,\b,\t,\v'
    sleep 0.1    # add some delay to data being streamed to chars_to_strings()
done | chars_to_strings
1 1 15 # output starts printing right away
2 2 15 # instead of waiting for the 'for'
3 3 15 # loop to complete
4 4 15
5 5 13
6 6 15
7 7 15
8 8 15
9 9 15
A variation on this idea using a named pipe:
mkfifo /tmp/pipeX
sleep infinity > /tmp/pipeX &    # keep the pipe open so awk does not exit
awk '{print NR,FNR,length($0)}' < /tmp/pipeX &

chars_to_strings() {
    while read -r -n 15 buffer
    do
        printf "%s\n" "${buffer}"
    done > /tmp/pipeX
}
Take for a test drive:
for (( i = 1; i <= 20; i++ ))
do
    printf '\a,\b,\t,\v'
    sleep 0.1
done | chars_to_strings
1 1 15 # output starts printing right away
2 2 15 # instead of waiting for the 'for'
3 3 15 # loop to complete
4 4 15
5 5 13
6 6 15
7 7 15
8 8 15
9 9 15
# kill background 'awk' and/or 'sleep infinity' when no longer needed
Don't waste FS/OFS: use those built-in variables to take care of 2 of the 5 substitutions needed:
echo $' \t abc xyz \t \a \n\n ' |
mawk 'gsub(/\7/, "<bell>", $!(NF = NF)) + gsub(/\10/,"<bs>") +\
gsub(/\11/,"<h-tab>")^_' OFS='<v-tab>' FS='\13' ORS='<newline>'
<h-tab> abc xyz <h-tab> <bell> <newline><newline> <newline>
To have NO constraint on the line length you could do something like this with GNU awk:
awk -v RS='.{1,100}' -v ORS= '{
    $0 = RT
    gsub(foo,bar)
    print
}'
That will read and process the input 100 chars at a time no matter which chars are present, whether it has newlines or not, and even if the input was one multi-terabyte line.
Replace gsub(foo,bar) with whatever substitution(s) you have in mind, e.g.:
$ printf '\a,\b,\t,\v' |
awk -v RS='.{1,100}' -v ORS= '{
    $0 = RT
    gsub(/\a/,"<bell>")
    gsub(/\b/,"<backspace>")
    gsub(/\t/,"<horizontal-tab>")
    gsub(/\v/,"<vertical-tab>")
    print
}'
<bell>,<backspace>,<horizontal-tab>,<vertical-tab>
And of course it'd be trivial to pass a list of old and new strings to awk rather than hardcoding them; you'd just have to sanitize any regexp or backreference metacharacters before calling gsub().
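For instance, here is a minimal sketch of that idea with GNU awk (the old/new variables are just illustrative), escaping ERE metacharacters in the search string before using it as a dynamic regexp:
printf '\a,\b,\t,\v' |
awk -v RS='.{1,100}' -v ORS= -v old=$'\a' -v new='<bell>' '
BEGIN {
    # sanitize ERE metacharacters in "old" by prefixing each with a backslash
    gsub(/[][\\.^$(){}|*+?]/, "\\\\&", old)
    # caveat: "new" must not contain "&" or "\" unless escaped likewise
}
{
    $0 = RT
    gsub(old, new)
    print
}'
This prints <bell> in place of each \a and leaves the other three characters alone.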

Check if a file's hex dump has modulo 4 length

I am trying to write a script that, given a file as argument, checks whether the hex dump of the file has a length that is a multiple of 4. If the length is not a multiple of 4, the script must append 00 to the end of the dump until it is.
I tried with hexdump
D=$(hexdump filename)
if [ $((${#D}%4)) != 0 ]
then
    D+=00
fi
and with od
D=$(od -t x filename)
if [ $((${#D}%4)) != 0 ]
then
    D+=00
fi
but neither method works.
I think the problems are as follows:
I include the offset columns in variable D, when I only have to consider the hex dump itself.
For example:
cat file.txt
// Hello World!
//This is a file .txt
od -t x file.txt
//0000000 6c6c6548 6f77206f 21646c72 6968540a
//0000020 73692073 66206120 20656c69 7478742e
//0000040 0000000a
//0000041
The columns 0000000 0000020 0000040 0000041 must not be inside D.
In practice, D must contain 6c6c65486f77206f21646c726968540a736920736620612020656c697478742e0000000a
Also, I have to modify that dump so that it has a length that is a multiple of 4, and I don't know if it is enough to simply append 00 at the end.
Any idea how this can be done?
D=$(xxd -p "$1" | tr -d '\n')
while [ $(( ${#D} % 4 )) -ne 0 ]; do D+=00; done
This does not include the offset columns.
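For the sample file above (33 bytes, hence 66 hex digits), one round of padding brings the length to 68, a multiple of 4:
$ D=$(xxd -p file.txt | tr -d '\n')
$ echo "${#D}"
66
$ while [ $(( ${#D} % 4 )) -ne 0 ]; do D+=00; done
$ echo "${#D}"
68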
Why do you need to examine the hex dump? Just pad the raw file with zeros if necessary.
padump () {
    local size=$(stat -f '%z' "$1")
    local i
    ( cat "$1"
      for (( i = size; i % 2 > 0; i++ )); do
          printf '\0'
      done ) |
    hexdump
}
Usage: padump filename
The argument to stat is unfortunately not portable; on Linux, try stat -c '%s' "$1" (the code above was written on macOS).
The modulo in the above is 2 because four hex digits correspond to two bytes of file.
Anyway, I'll note that in your attempt, od already adds more padding than you bargained for, and it outputs the bytes in the wrong order. The value on the last data line (octal offset 0000040) is the single byte 0x0a; the zeros printed before it actually belong after it and are padding. The other hex values are similarly swapped: 0x6c is the character l, but it occurs first in the dump, so read as a regular hex dump, 6c6c6548 spells lleH. You can fix this with options:
#!/bin/sh
od -t x2 --endian=big "$1" |
sed 's/^[^ ]* //;s/ //g;$d'
Because od already adds the padding, this is probably the simplest solution. The above also discards the offsets (the first column; sed 's/^[^ ]* //'), removes any remaining spaces between the hex digits (s/ //g), and drops the final line that holds just an offset ($d). However, the --endian=big option is Linux-only. Alternatively, use -t x1 and add padding as above.
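Putting the pieces together without od's multibyte quirks, here is a minimal sketch (POSIX tools only) that pads the raw bytes to an even count (i.e., a multiple of 4 hex digits) and dumps them byte by byte:
#!/bin/sh
size=$(wc -c <"$1")
{
    cat "$1"
    while [ $(( size % 2 )) -ne 0 ]; do
        printf '\0'
        size=$(( size + 1 ))
    done
} | od -An -v -t x1 | tr -d ' \n'
echo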

dd: reading binary file as blocks of size N returned less data than N

I need to process large binary files in segments. In concept this would be similar to split, but instead of writing each segment to a file, I need to take that segment and send it as the input of another process. I thought I could use dd to read/write the file in chunks, but the results aren't at all what I expected. For example, if I try:
dd if=some_big_file bs=1M |
while : ; do
    dd bs=1M count=1 | processor
done
... the output sizes are actually 131,072 bytes and not 1,048,576.
Could anyone tell me why I'm not seeing output blocked into 1M chunks, and how I could better accomplish what I'm trying to do?
Thanks.
According to dd's manual:

bs=bytes
    [...] if no data-transforming conv option is specified, input is copied to the output as soon as it's read, even if it is smaller than the block size.

So try with dd iflag=fullblock:

fullblock
    Accumulate full blocks from input. The read system call may return early if a full block is not available. When that happens, continue calling read to fill the remainder of the block. This flag can be used only with iflag. This flag is useful with pipes for example as they may return short reads. In that case, this flag is needed to ensure that a count= argument is interpreted as a block count rather than a count of read operations.
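With that flag, the loop from the question becomes (a sketch; processor stands for the consuming command, as in the question):
dd if=some_big_file bs=1M |
while : ; do
    dd iflag=fullblock bs=1M count=1 | processor
done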
First of all, you don't need the first dd. A cat file | while or a done < file would do the trick as well.
dd bs=1M count=1 might return less than 1M; see:
When is dd suitable for copying data? (or, when are read() and write() partial)
Instead of dd count=…, use head with the (non-POSIX) option -c ….
file=some_big_file
(( m = 1024 ** 2 ))
(( blocks = ($(stat -c %s "$file") + m - 1) / m ))
for (( i = 0; i < blocks; ++i )); do
    head -c "$m" | processor
done < "$file"
Or, POSIX-conformant but very inefficient:
(( octM = 4 * 1024 * 1024 ))
someCommand | od -v -to1 -An | tr -d \\n | tr ' ' '\\' |
while IFS= read -rN $octM block; do
    printf %b "$block" | processor
done

using bash: write bit representation of integer to file

I have a file with binary data and I need to replace a few bytes in a certain position. I've come up with the following to direct bash to the offset and show me that it found the place I want:
dd bs=1 if=file iseek=24 conv=block cbs=2 | hexdump
Now, to use "file" as the output:
echo anInteger | dd bs=1 of=hextest.txt oseek=24 conv=block cbs=2
This seems to work just fine, I can review the changes made in a hex editor. Problem is, "anInteger" will be written as the ASCII representation of that integer (which makes sense) but I need to write the binary representation.
I want to use bash for this and the script should run on as many systems as possible (I don't know if the target system will have python or whatever installed).
How do I tell the command to convert the input to binary (possibly from a hex)?
printf is more portable than echo. This function takes a decimal integer and outputs a byte with that value:
echobyte () {
    if (( $1 >= 0 && $1 <= 255 ))
    then
        printf "\\x$(printf "%x" $1)"
    else
        printf "Invalid value\n" >&2
        return 1
    fi
}
$ echobyte 97
a
$ for i in {0..15}; do echobyte $i; done | hd
00000000 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f |................|
00000010
You can use echo to emit specific bytes using hex or octal escapes. For example:
echo -n -e \\x30
will print an ASCII 0 (0x30).
(-n removes the trailing newline.)
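Combined with the dd invocation from the question, this can patch a single byte at a given offset (a sketch reusing the question's hextest.txt and offset 24):
echo -n -e '\x30' | dd bs=1 of=hextest.txt oseek=24 conv=notrunc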
xxd is the better way. xxd -r infile outfile will take the ASCII hex values in infile and patch outfile with them, and you can specify the position in infile like this: 1FE:55AA
Worked like a treat. I used the following code to replace 4 bytes at byte 24 in little endian with two integers (1032 and 1920). The code does not truncate the file.
echo -e \\x08\\x04\\x80\\x07 | dd of=<file> obs=1 oseek=24 conv=block,notrunc cbs=4
Thanks again.
I have a function to do this:
# number representation from 0 to 255 (one char long)
function chr() { printf "\\$(printf '%03o' "$1")" ; return 0 ; }
# from 0 to 65535 (two chars long)
function word_littleendian() { chr $(($1 % 256)) ; chr $(($1 / 256)) ; return 0 ; }
function word_bigendian() { chr $(($1 / 256)) ; chr $(($1 % 256)) ; return 0 ; }
# from 0 to 4294967295 (four chars long)
function dword_littleendian() { word_littleendian $(($1 % 65536)) ; word_littleendian $(($1 / 65536)) ; return 0 ; }
function dword_bigendian() { word_bigendian $(($1 / 65536)) ; word_bigendian $(($1 % 65536)) ; return 0 ; }
You can use piping or redirection to catch the result.
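For example, 1032 is 0x0408, so its little-endian form comes out low byte first:
$ word_littleendian 1032 | od -An -tx1
 08 04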
If you're willing to rely on bc (which is fairly common):
echo -e "ibase=16\nobase=2\nA1" | bc -q
might help; it prints 10100001, the bit representation of 0xA1.
With bash, printf has the -v option, and all shells have bitwise operators. So here is a simpler form in bash:
int2bin() {
    local i=$1
    local f
    printf -v f '\\x%02x\\x%02x\\x%02x\\x%02x' $(( i & 255 )) $(( i >> 8 & 255 )) $(( i >> 16 & 255 )) $(( i >> 24 & 255 ))
    printf "$f"
}
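For example, 1920 is 0x00000780, written out as four little-endian bytes:
$ int2bin 1920 | od -An -tx1
 80 07 00 00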
You might put the desired input into a file and use the "if=" option to dd to insert exactly the input you desire.
In my case, I needed to go from a decimal numeric argument to the actual unsigned 16-bit big-endian value. This is probably not the most efficient way, but it works:
# $1 is whatever number (0 to 65535) the caller specifies
DECVAL=$1
HEXSTR=`printf "%04x" "$DECVAL"`
BYTEONE=`echo -n "$HEXSTR" | cut -c 1-2`
BYTETWO=`echo -n "$HEXSTR" | cut -c 3-4`
echo -ne "\x$BYTEONE\x$BYTETWO" | dd of="$FILENAME" bs=1 seek=$((0xdeadbeef)) conv=notrunc

Shell command to sum integers, one per line?

I am looking for a command that will accept (as input) multiple lines of text, each line containing a single integer, and output the sum of these integers.
As a bit of background, I have a log file which includes timing measurements. Through grepping for the relevant lines and a bit of sed reformatting I can list all of the timings in that file. I would like to work out the total. I can pipe this intermediate output to any command in order to do the final sum. I have always used expr in the past, but unless it runs in RPN mode I do not think it is going to cope with this (and even then it would be tricky).
How can I get the summation of integers?
Bit of awk should do it?
awk '{s+=$1} END {print s}' mydatafile
Note: some versions of awk have odd behaviours if you are going to be adding anything exceeding 2^31 (2147483647). One suggestion is to use printf rather than print:
awk '{s+=$1} END {printf "%.0f", s}' mydatafile
paste typically merges lines of multiple files, but it can also be used to convert the individual lines of a file into a single line. The delimiter flag allows you to pass an x+x-style equation to bc.
paste -s -d+ infile | bc
Alternatively, when piping from stdin,
<commands> | paste -s -d+ - | bc
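For example:
$ seq 5 | paste -s -d+ - | bc
15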
The one-liner version in Python:
$ python -c "import sys; print(sum(int(l) for l in sys.stdin))"
I would put a big WARNING on the commonly approved solution:
awk '{s+=$1} END {print s}' mydatafile # DO NOT USE THIS!!
That is because in this form awk uses a 32-bit signed integer representation: it will overflow for sums that exceed 2147483647 (i.e., 2^31).
A more general answer (for summing integers) would be:
awk '{s+=$1} END {printf "%.0f\n", s}' mydatafile # USE THIS INSTEAD
Plain bash:
$ cat numbers.txt
1
2
3
4
5
6
7
8
9
10
$ sum=0; while read num; do ((sum += num)); done < numbers.txt; echo $sum
55
With jq:
seq 10 | jq -s 'add' # 'add' is equivalent to 'reduce .[] as $item (0; . + $item)'
dc -f infile -e '[+z1<r]srz1<rp'
Note that negative numbers prefixed with minus sign should be translated for dc, since it uses _ prefix rather than - prefix for that. For example, via tr '-' '_' | dc -f- -e '...'.
Edit: Since this answer got so many votes "for obscurity", here is a detailed explanation:
The expression [+z1<r]srz1<rp does the following:
[    interpret everything up to the next ] as a string
+    pop two values off the stack, add them, and push the result
z    push the current stack depth
1    push one
<r   pop two values and execute register r if the original top-of-stack (1) is smaller
]    end of the string; pushes the whole thing onto the stack
sr   pop a value (the string above) and store it in register r
z    push the current stack depth again
1    push 1
<r   pop two values and execute register r if the original top-of-stack (1) is smaller
p    print the current top-of-stack
As pseudo-code:
Define "add_top_of_stack" as:
    Remove the two top values off the stack and add the result back
    If the stack has two or more values, run "add_top_of_stack" recursively
If the stack has two or more values, run "add_top_of_stack"
Print the result, now the only item left in the stack
To really understand the simplicity and power of dc, here is a working Python script that implements some of the commands from dc and executes a Python version of the above command:
### Implement some commands from dc
registers = {'r': None}
stack = []

def add():
    stack.append(stack.pop() + stack.pop())

def z():
    stack.append(len(stack))

def less(reg):
    if stack.pop() < stack.pop():
        registers[reg]()

def store(reg):
    registers[reg] = stack.pop()

def p():
    print(stack[-1])

### Python version of the dc command above

# The equivalent to -f: read a file and push every line to the stack
import fileinput
for line in fileinput.input():
    stack.append(int(line.strip()))

def cmd():
    add()
    z()
    stack.append(1)
    less('r')

stack.append(cmd)
store('r')
z()
stack.append(1)
less('r')
p()
Pure and short bash.
f=$(cat numbers.txt)
echo $(( ${f//$'\n'/+} ))
perl -lne '$x += $_; END { print $x; }' < infile.txt
My fifteen cents:
$ cat file.txt | xargs | sed -e 's/\ /+/g' | bc
Example:
$ cat text
1
2
3
3
4
5
6
78
9
0
1
2
3
4
576
7
4444
$ cat text | xargs | sed -e 's/\ /+/g' | bc
5148
I've done a quick benchmark on the existing answers which
- use only standard tools (sorry for stuff like Lua or Racket),
- are real one-liners,
- are capable of adding huge amounts of numbers (100 million), and
- are fast (I ignored the ones which took longer than a minute).
I always added the numbers of 1 to 100 million which was doable on my machine in less than a minute for several solutions.
Here are the results:
Python
:; seq 100000000 | python -c 'import sys; print sum(map(int, sys.stdin))'
5000000050000000
# 30s
:; seq 100000000 | python -c 'import sys; print sum(int(s) for s in sys.stdin)'
5000000050000000
# 38s
:; seq 100000000 | python3 -c 'import sys; print(sum(int(s) for s in sys.stdin))'
5000000050000000
# 27s
:; seq 100000000 | python3 -c 'import sys; print(sum(map(int, sys.stdin)))'
5000000050000000
# 22s
:; seq 100000000 | pypy -c 'import sys; print(sum(map(int, sys.stdin)))'
5000000050000000
# 11s
:; seq 100000000 | pypy -c 'import sys; print(sum(int(s) for s in sys.stdin))'
5000000050000000
# 11s
Awk
:; seq 100000000 | awk '{s+=$1} END {print s}'
5000000050000000
# 22s
Paste & Bc
This ran out of memory on my machine. It worked for half the size of the input (50 million numbers):
:; seq 50000000 | paste -s -d+ - | bc
1250000025000000
# 17s
:; seq 50000001 100000000 | paste -s -d+ - | bc
3750000025000000
# 18s
So I guess it would have taken ~35s for the 100 million numbers.
Perl
:; seq 100000000 | perl -lne '$x += $_; END { print $x; }'
5000000050000000
# 15s
:; seq 100000000 | perl -e 'map {$x += $_} <> and print $x'
5000000050000000
# 48s
Ruby
:; seq 100000000 | ruby -e "puts ARGF.map(&:to_i).inject(&:+)"
5000000050000000
# 30s
C
Just for comparison's sake I compiled the C version and tested it too, just to get an idea of how much slower the tool-based solutions are.

#include <stdio.h>

int main(int argc, char** argv) {
    long sum = 0;
    long i = 0;
    while (scanf("%ld", &i) == 1) {
        sum = sum + i;
    }
    printf("%ld\n", sum);
    return 0;
}

:; seq 100000000 | ./a.out
5000000050000000
# 8s
Conclusion
C is of course the fastest at 8s, but the Pypy solution adds only a very small overhead of about 30%, at 11s. But, to be fair, Pypy isn't exactly standard. Most people only have CPython installed, which is significantly slower (22s), exactly as fast as the popular Awk solution.
The fastest solution based on standard tools is Perl (15s).
Using the GNU datamash util:
seq 10 | datamash sum 1
Output:
55
If the input data is irregular, with spaces and tabs in odd places, this may confuse datamash; in that case, either use the -W switch:
<commands...> | datamash -W sum 1
...or use tr to clean up the whitespace:
<commands...> | tr -d '[[:blank:]]' | datamash sum 1
If the input is large enough, the output will be in scientific notation.
seq 100000000 | datamash sum 1
Output:
5.00000005e+15
To convert that to decimal, use the --format option:
seq 100000000 | datamash --format '%.0f' sum 1
Output:
5000000050000000
Plain bash one liner
$ cat > /tmp/test
1
2
3
4
5
^D
$ echo $(( $(cat /tmp/test | tr "\n" "+" ) 0 ))
A bash solution, if you want to make this a command (e.g. if you need to do this frequently):
addnums () {
    local total=0
    while read val; do
        (( total += val ))
    done
    echo $total
}
Then usage:
addnums < /tmp/nums
You can use num-utils, although it may be overkill for what you need. This is a set of programs for manipulating numbers in the shell, and it can do several nifty things, including, of course, adding them up. It's a bit out of date, but it still works and can be useful if you need to do something more.
https://suso.suso.org/programs/num-utils/index.phtml
It's really simple to use:
$ seq 10 | numsum
55
But it runs out of memory for large inputs.
$ seq 100000000 | numsum
Terminado (killed)
The following works in bash:
I=0
for N in `cat numbers.txt`
do
    I=`expr $I + $N`
done
echo $I
I realize this is an old question, but I like this solution enough to share it.
% cat > numbers.txt
1
2
3
4
5
^D
% cat numbers.txt | perl -lpe '$c+=$_}{$_=$c'
15
If there is interest, I'll explain how it works; in short, the }{ splits -p's implicit while (<>) { ... } continue { print } wrapper so that the sum accumulates inside the loop without printing, and $_ = $c followed by the final print runs only once, at EOF.
I cannot avoid submitting this; it is the most generic approach to this question. Please check:
jot 1000000 | sed '2,$s/$/+/;$s/$/p/' | dc
It is to be found over here; I was the OP and the answer came from the audience:
Most elegant unix shell one-liner to sum list of numbers of arbitrary precision?
And here are its special advantages over awk, bc, perl, GNU's datamash and friends:
- it uses standard utilities common in any unix environment
- it does not depend on buffering and thus it does not choke on really long inputs
- it implies no particular precision limits (or integer size, for that matter); hello, AWK friends!
- no need for different code if floating-point numbers need to be added instead
- it theoretically runs unhindered in the most minimal of environments
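For instance, with floating-point input (seq standing in for jot on GNU systems, since jot is a BSD tool):
$ seq 0.5 0.5 2 | sed '2,$s/$/+/;$s/$/p/' | dc
5.0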
sed 's/^/.+/' infile | bc | tail -1
Pure bash and in a one-liner :-)
$ cat numbers.txt
1
2
3
4
5
6
7
8
9
10
$ I=0; for N in $(cat numbers.txt); do I=$(($I + $N)); done; echo $I
55
Alternative pure Perl, fairly readable, no packages or options required:
perl -e "map {$x += $_} <> and print $x" < infile.txt
For Ruby Lovers
ruby -e "puts ARGF.map(&:to_i).inject(&:+)" numbers.txt
Here's a nice and clean Raku (formerly known as Perl 6) one-liner:
say [+] slurp.lines
We can use it like so:
% seq 10 | raku -e "say [+] slurp.lines"
55
It works like this:
slurp without any arguments reads from standard input by default; it returns a string. Calling the lines method on a string returns a list of lines of the string.
The brackets around + turn + into a reduction meta operator which reduces the list to a single value: the sum of the values in the list. say then prints it to standard output with a newline.
One thing to note is that we never explicitly convert the lines to numbers—Raku is smart enough to do that for us. However, this means our code breaks on input that definitely isn't a number:
% echo "1\n2\nnot a number" | raku -e "say [+] slurp.lines"
Cannot convert string to number: base-10 number must begin with valid digits or '.' in '⏏not a number' (indicated by ⏏)
in block <unit> at -e line 1
You can do it in Python, if you feel comfortable with it. Not tested, just typed:
out = open("filename").read()
lines = out.split()            # split() rather than split('\n') avoids an empty trailing entry
ints = map(int, lines)
s = sum(ints)
print(s)
Sebastian pointed out a one liner script:
cat filename | python -c"from fileinput import input; print sum(map(int, input()))"
The following should work (assuming your number is the second field on each line).
awk 'BEGIN {sum=0} \
     {sum=sum + $2} \
     END {print "tot:", sum}' Yourinputfile.txt
$ cat n
2
4
2
7
8
9
$ perl -MList::Util -le 'print List::Util::sum(<>)' < n
32
Or, you can type in the numbers on the command line:
$ perl -MList::Util -le 'print List::Util::sum(<>)'
1
3
5
^D
9
However, this one slurps the file so it is not a good idea to use on large files. See j_random_hacker's answer which avoids slurping.
One-liner in Racket:
racket -e '(define (g) (define i (read)) (if (eof-object? i) empty (cons i (g)))) (foldr + 0 (g))' < numlist.txt
C (not simplified)
seq 1 10 | tcc -run <(cat << EOF
#include <stdio.h>
int main(int argc, char** argv) {
    int sum = 0;
    int i = 0;
    while (scanf("%d", &i) == 1) {
        sum = sum + i;
    }
    printf("%d\n", sum);
    return 0;
}
EOF
)
My version:
seq -5 10 | xargs printf "- - %s" | xargs | bc
C++ (simplified):
echo {1..10} | scc 'WRL n+=$0; n'
SCC project - http://volnitsky.com/project/scc/
SCC is a C++ snippet evaluator at the shell prompt.
