Filter out numbers larger than 100 in a file - ksh

I have a large file with one number per line, and I just need to list the numbers higher than 100 in this file.
I know a while/if loop could help here, but I'd still like the most concise one-liner, for example an awk command, to get that output for me.
Example of outputs in my file:
0.000
0.000
260.591
259.906
0.000
864.451
866.000
0.000
0.000
260.796
0.000
0.000
866.351
0.000
87.554
80.000
846.142
1436.716
1435.794
522.925
524.617
0.000

Turning my comment into a proper answer:
Using awk
awk '$1 > 100' INPUT.txt
Bash cannot handle floats, but it's possible with a little help from bc:
$ while read; do if (( $(echo "$REPLY > 100" | bc -l) )); then echo $REPLY; fi; done < INPUT.txt
from help read:
If no NAMEs are supplied, the line read is stored in the REPLY variable.
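Since the question title mentions ksh: ksh93, unlike bash, does floating-point arithmetic natively, so a pure-shell loop is also an option. A minimal sketch, assuming ksh93 and the same INPUT.txt:
while read -r n; do
    # ksh93 arithmetic handles floats, so no bc is needed
    (( n > 100 )) && print -- "$n"
done < INPUT.txt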

Related

How to pad digits after the decimal point in bash

I have an input file whose contents are as follows:
0.0 0.0 98.0 91.0
145.525 72.62 243.525 163.63
I want output as
0.000 0.000 98.000 91.000
145.525 72.620 243.525 163.630
With awk:
awk '{for(i=1;i<=NF;i++){$i=sprintf("%0.3f", $i)}}1' file
See the manual for sprintf.
You can try with sed. It's not arithmetic, but ...
sed -E ':A;s/([0-9]*\.[0-9]{1,2})( |$)/\10\2/;tA' infile
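A pure-bash variant is also possible, since bash's printf builtin accepts %f formats. A minimal sketch, assuming whitespace-separated fields in file:
while read -r -a fields; do
    out=()
    for f in "${fields[@]}"; do
        out+=( "$(printf '%.3f' "$f")" )   # reformat each field to three decimals
    done
    echo "${out[*]}"
done < file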

Bash: Store the last word of each line starting with

I have a *.dat file that incrementally grows for several hours. I want to monitor a certain value over time so that I can compare values, watch the trend and so on.
What I have so far:
LTIME=$(stat -c %Z test2.dat)
while true
do
ATIME=$(stat -c %Z test2.dat)
if [[ "$ATIME" != "$LTIME" ]]
then
grep "15 RT" test2.dat > test_grep2.txt
LTIME=$ATIME
fi
sleep 60
done
which updates an artificial text file with each increment of the *.dat. It returns stuff like:
15 RT 0.000 0.000 0.000 0.000 0.000 -1.4666E+04
15 RT 0.000 0.000 0.000 0.000 0.000 -1.7073E+04
15 RT 0.000 0.000 0.000 0.000 0.000 -1.9379E+04
15 RT 0.000 0.000 0.000 0.000 0.000 -2.1583E+04
I also have this one:
while read line
do [ -z "$line" ] && continue ;echo ${line##* }
done < test_grep2.txt
which prints the last "word" of each line of that txt to the console:
1.0225E+04
1.1738E+04
1.3219E+04
1.4668E+04
1.6083E+04
2.4867E+04
2.5943E+04
But I haven't succeeded yet in putting these two together. This just doesn't work (the last "words" are not printed out as the grep txt keeps getting updated):
[ -e test_grep.txt ] && rm test_grep.txt
LTIME=$(stat -c %Z test2.dat)
while true
do
ATIME=$(stat -c %Z test2.dat)
if [[ "$ATIME" != "$LTIME" ]]
then
grep -i "15 RT" test.dat > test_grep.txt
LTIME=$ATIME
fi
sleep 5
done
datime=$(stat -c %Z test_grep.txt)
while true
do
datime2=$(stat -c %Z test_grep.txt)
if [[ "$datime2" != "$datime" ]]
then
while read line
do [ -z "$line" ] && continue ;echo ${line##* }
done < test_grep.txt
datime=$datime2
fi
sleep 5
done
And I believe that there must be a more efficient and elegant way than using a temp file.
May I ask for your help with this? Extracting the last "word" of each line that contains the string "15 RT" and either storing the values or saving them to a file for later comparison/evaluation, and all of this "online", as the *.dat grows on and on.
Thanks a lot!
Yes, this should do it:
tail -f growing.dat | awk '/15 RT/ {print $NF}'
tail -f is very efficient, as it listens for file-modify events and only outputs new lines as they are added (no need to loop and constantly check whether the file was modified). The awk script simply outputs the last field of each line that contains 15 RT.
Edit. Additionally, if you wish to store that output to a file, and monitor the values in terminal, you can use tee:
tail -f growing.dat | awk '/15 RT/ {print $NF}' | tee values.log
Since awk is buffering output, to see the values in real-time, you can flush the output after each update:
tail -f growing.dat | awk '/15 RT/ {print $NF; fflush()}' | tee values.log
Edit 2. If the file doesn't exist initially, you should use tail -F:
tail -F growing.dat | awk '/15 RT/ {print $NF}'
That way, tail will keep retrying to open the file if it is inaccessible. It looks like this (the messages are printed to stderr):
tail: cannot open 'growing.dat' for reading: No such file or directory
tail: 'growing.dat' has appeared; following new file
-5.1583E+04
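If the goal is to watch the trend, one option is to print each new value alongside its change from the previous one. A minimal sketch building on the same pipeline (the delta formatting is just an illustration, not part of the original answer):
tail -F growing.dat | awk '/15 RT/ {
    v = $NF
    if (seen) printf "%s  (delta: %+g)\n", v, v - prev
    else      print v
    prev = v; seen = 1
    fflush()
}' | tee values.log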

Most elegant unix shell one-liner to sum list of numbers of arbitrary precision?

As regards integer adding one-liners, several proposed shell scripting solutions exist;
however, on closer look at each of the solutions chosen, there are inherent limitations:
awk ones would choke on arbitrary precision and integer size (it behaves C-like, after all)
bc ones would rather be unhappy with arbitrarily long inputs: (sed 's/$/+\\/g';echo 0)|bc
Understanding that on top of that there may be portability issues across platforms (see [1] [2]), which is undesirable,
is there a generic solution which is a winner on both practicality and brevity?
Hint: SunOS & MacOSX are examples where portability would be an issue.
For instance, could the dc command handle arbitrarily large inputs (2^n, integer or otherwise)?
[1] awk: https://stackoverflow.com/a/450821/1574494 or https://stackoverflow.com/a/25245025/1574494 or Printing long integers in awk
[2] bc: Bash command to sum a column of numbers
An optimal solution for dc(1) sums the inputs as they are read:
$ jot 1000000 | sed '2,$s/$/+/;$s/$/p/' | dc
500000500000
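For clarity, the sed step rewrites the number list into a dc program that keeps a running total on the stack: it appends + to every line after the first and p to the last one. For example:
$ printf '1\n2\n3\n' | sed '2,$s/$/+/;$s/$/p/'
1
2+
3+p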
The one I usually use is paste -sd+|bc:
$ time seq 1 20000000 | paste -sd+|bc
200000010000000
real 0m10.092s
user 0m10.854s
sys 0m0.481s
(For strict Posix compliance, paste needs to be provided with an explicit argument: paste -sd+ -|bc. Apparently that is necessary with the BSD paste implementation installed by default on OS X.)
However, that will fail for larger inputs, because bc buffers an entire expression in memory before evaluating it. On my system, bc ran out of memory trying to add 100 million numbers, although it was able to do 70 million. But other systems may have smaller capacities.
Since bc has variables, you could avoid long lines by repetitively adding to a variable instead of constructing a single long expression. This is (as far as I know) 100% Posix compliant, but there is a 3x time penalty:
$ time seq 1 20000000|sed -e's/^/s+=/;$a\' -es|bc
200000010000000
real 0m29.224s
user 0m44.119s
sys 0m0.820s
Another way to handle the case where the input size exceeds bc's buffering capacity would be to use the standard xargs tool to add the numbers in groups:
$ time seq 1 100000000 |
> IFS=+ xargs sh -c 'echo "$*"' _ | bc | paste -sd+ | bc
5000000050000000
real 1m0.289s
user 1m31.297s
sys 0m19.233s
The number of input lines used by each xargs evaluation will vary from system to system, but it will normally be in the hundreds and it might be much more. Obviously, the xargs | bc invocations could be chained arbitrarily to increase capacity.
It might be necessary to limit the size of the xargs expansion using the -s switch, on systems where ARG_MAX exceeds the capacity of the bc command. Aside from performing an experiment to establish the bc buffer limit, there is no portable way to establish what that limit might be, but it certainly should be no less than LINE_MAX, which is guaranteed to be at least 2048. Even with 100-digit addends, that will allow a reduction by a factor of 20, so a chain of 10 xargs|bc pipes would handle over 10^13 addends, assuming you were prepared to wait a couple of months for that to complete.
As an alternative to constructing a large fixed-length pipeline, you could use a function to recursively pipe the output from xargs|bc until only one value is produced:
radd () {
if read a && read b; then
{ printf '%s\n%s\n' "$a" "$b"; cat; } |
IFS=+ xargs -s $MAXLINE sh -c 'echo "$*"' _ |
bc | radd
else
echo "$a"
fi
}
If you use a very conservative value for MAXLINE, the above is quite slow, but with plausible larger values it is not much slower than the simple paste|bc solution:
$ time seq 1 20000000 | MAXLINE=2048 radd
200000010000000
real 1m38.850s
user 0m46.465s
sys 1m34.503s
$ time seq 1 20000000 | MAXLINE=60000 radd
200000010000000
real 0m12.097s
user 0m17.452s
sys 0m5.090s
$ time seq 1 100000000 | MAXLINE=60000 radd
5000000050000000
real 1m3.972s
user 1m31.394s
sys 0m27.946s
As well as the bc solutions, I timed some other possibilities. As shown above, with an input of 20 million numbers, paste|bc took 10 seconds. That's almost identical to the time used by adding 20 million numbers with
gawk -M '{s+=$0} END{print s}'
Programming languages such as python and perl proved to be faster:
# 9.2 seconds to sum 20,000,000 integers
python -c $'import sys\nprint(sum(int(x) for x in sys.stdin))'
# 5.1 seconds
perl -Mbignum -lne '$s+=$_; END{print $s}'
I was unable to test dc -f - -e '[+z1<r]srz1<rp' on large inputs, since its performance appears to be quadratic (or worse); it summed 25 thousand numbers in 3 seconds, but it took 19 seconds to sum 50 thousand and 90 seconds to do 100 thousand.
Although bc is not the fastest and memory limitations require awkward workarounds, it has the advantage of working out of the box on Posix-compliant systems without the necessity to install enhanced versions of any standard utility (awk) or programming languages not required by Posix (perl and python).
You can use gawk with the -M flag:
$ seq 1 20000000 | gawk -M '{s+=$0} END{print s}'
200000010000000
Or Perl with bignum enabled:
$ seq 1 20000000 | perl -Mbignum -lne '$s+=$_; END{print $s}'
200000010000000
$ seq 1000|(sum=0;while read num; do sum=`echo $sum+$num|bc -l`;done;echo $sum)
500500
Also, this one will not win a top-speed prize; however, it IS:
a one-liner, yes.
portable
adds lists of any length
adds numbers of any precision (each number's length limited only by MAXLINE)
does not rely on external tools such as python/perl/awk/R etc
with a stretch, you may call it elegant too ;-)
come on guys, show the better way to do this!
It seems that the following does the trick:
$ seq 1000|dc -f - -e '[+z1<r]srz1<rp'
500500
but, is it the optimal solution?
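For readers unfamiliar with dc, here is one reading of that program (the comment syntax assumes GNU dc, which treats # as a comment to end of line):
dc -f - -e '
[+z1<r]   # push a macro: add the top two values, then rerun register r while the stack depth is > 1
sr        # store the macro in register r
z1<r      # if more than one number is on the stack, start summing
p         # print the final sum
'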
jot really slows you down:
( time ( jot 100000000 | pvZ -i 0.2 -l -cN in0 |
mawk2 '{ __+=$_ } END { print __ }' FS='\n' ) )
in0: 100M 0:00:17 [5.64M/s] [ <=> ]
( jot 100000000 | nice pv -pteba -i 1 -i 0.2 -l -cN in0 |
mawk2 FS='\n'; )
26.43s user 0.78s system 153% cpu 17.730 total
5000000050000000
Using another awk instance to generate the sequence shaves off 39.7%:
( time (
mawk2 -v __='100000000' '
BEGIN { for(_-=_=__=+__;_<__;) {
print ++_ } }' |
pvZ -i 0.2 -l -cN in0 |
mawk2 '{ __+=$_ } END{ print __ }' FS='\n' ))
in0: 100M 0:00:10 [9.37M/s] [ <=> ]
( mawk2 -v __='100000000' 'BEGIN {…}' | )
19.44s user 0.68s system 188% cpu 10.687 total
5000000050000000
For the bc option, GNU paste is quite a bit faster than BSD paste in this regard, but both absolutely pale compared to awk, while perl is only slightly behind:
time jot 15000000 | pvE9 | mawk2 '{ _+=$__ } END { print _ }'
out9: 118MiB 0:00:02 [45.0MiB/s] [45.0MiB/s] [ <=> ]
112500007500000
jot 15000000 2.60s user 0.03s system 99% cpu 2.640 total
pvE 0.1 out9 0.01s user 0.05s system 2% cpu 2.640 total
mawk2 '{...}'
1.09s user 0.03s system 42% cpu 2.639 total
perl -Mbignum -lne '$s+=$_; END{print $s}' # perl 5.36
1.36s user 0.03s system 52% cpu 2.662 total
time jot 15000000 | pvE9 | gpaste -sd+ -|bc
out9: 118MiB 0:00:02 [45.3MiB/s] [45.3MiB/s] [ <=> ]
112500007500000
jot 15000000 2.59s user 0.03s system 99% cpu 2.627 total
pvE 0.1 out9 0.01s user 0.05s system 2% cpu 2.626 total
gpaste -sd+ - 0.27s user 0.03s system 11% cpu 2.625 total # gnu-paste
bc 4.55s user 0.46s system 66% cpu 7.544 total
time jot 15000000 | pvE9 | paste -sd+ -|bc
out9: 118MiB 0:00:05 [22.7MiB/s] [22.7MiB/s] [ <=> ]
112500007500000
jot 15000000 2.63s user 0.03s system 51% cpu 5.207 total
pvE 0.1 out9 0.01s user 0.06s system 1% cpu 5.209 total
paste -sd+ - 5.14s user 0.05s system 99% cpu 5.211 total # bsd-paste
bc 4.53s user 0.40s system 49% cpu 10.029 total

How to generate random numbers between 0 and 1 in bash

I'm trying to figure out how to generate two random numbers as input values for the readproportion and updateproportion parameters, such that the sum of the two equals 1, in the following bash command.
$ ./bin/ycsb run basic -P workloads/workloada -p readproportion=0.50 -p updateproportion=0.50
Please help with your suggestions.
Thanks
As far as I can remember ${RANDOM} generates integers in the interval 0 - 32767. So, I guess, you might want to try something like this to generate random values in [0,1]:
bc -l <<< "scale=4 ; ${RANDOM}/32767"
$ arr=( $(awk 'BEGIN{srand(); r=rand(); print r, 1-r}') )
$ echo "${arr[0]}"
0.661624
$ echo "${arr[1]}"
0.338376
$ arr=( $(awk 'BEGIN{srand(); r=rand(); printf "%.2f %.2f\n", r, 1-r}') )
$ echo "${arr[0]}"
0.74
$ echo "${arr[1]}"
0.26
srand() will only update the seed once per second but as long as you don't need to call the script more frequently than once per second it should be fine.
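To feed the two values straight into the command from the question, a sketch (the ycsb invocation is copied verbatim from the question; only the parameter values change):
read r u < <(awk 'BEGIN{srand(); r=rand(); printf "%.2f %.2f\n", r, 1-r}')
./bin/ycsb run basic -P workloads/workloada -p readproportion="$r" -p updateproportion="$u"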
Here is a super simple way to do it:
echo 0."$RANDOM"
^^ DO NOT USE AND READ BELOW ^^
Edit:
I cannot read:
$ rnum=$((RANDOM%10000+1))
$ echo "scale=4; $rnum/10000" | bc
.8748
$ echo "scale=4; 1-$rnum/10000" | bc
.1252
Edit 2:
As pointed out, the first iteration of this is terrible. Once I read the issue entirely and tried to do some simple maths with the "number", I realized that it is super broken. You can read more at:
http://www.tldp.org/LDP/abs/html/randomvar.html
- and -
https://www.shell-tips.com/2010/06/14/performing-math-calculation-in-bash/
With an idea from Robert J:
RANDOM=$$ # Reseed using script's PID
r1=$((${RANDOM}%98+1))
r2=$((100-$r1))
printf -v r1 "0.%.2d" $r1
printf -v r2 "0.%.2d" $r2
echo "$r1 $r2"
Output (e.g.):
0.66 0.34
0.01 0.99
0.42 0.58
0.33 0.67
0.22 0.78
0.33 0.67
0.65 0.35
0.77 0.23
0.71 0.29
This is "wasteful" (it just throws away characters until /dev/urandom happens to spit out enough ASCII digits 0-9), but it is securely pseudorandom:
rand_frac() { echo -n 0.; LANG=C tr -dc 0-9 </dev/urandom | head -c12; }
Then, you can
$ echo The random number is.... $(rand_frac)
The random number is.... 0.413856349581
LANG=C is there to prevent tr from choking on bad UTF-8 sequences from /dev/urandom. -dc means delete the complement of the subsequent character class (so, delete everything except 0-9).
You can change the 0-9 to generate arbitrary character classes, or adjust head -c 12 to something else, which can be useful for making passwords.
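Following that idea, a hypothetical password generator along the same lines (the function name, character class, and length are arbitrary choices, not from the original answer):
gen_pass() { LANG=C tr -dc 'A-Za-z0-9' </dev/urandom | head -c 16; echo; }   # prints 16 random alphanumeric characters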
This works nicely:
$ bc -l <<< "scale=16; $(od -t u2 -An -N2 /dev/random)/(2 ^ 16)"
.2792053222656250
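A brief reading of the pieces (my annotation, not the original author's):
# od -N2 -t u2 -An /dev/random   reads two bytes and prints them as one unsigned
#                                16-bit integer (-An suppresses the address column)
# bc then divides that 0..65535 value by 2^16, yielding a fraction in [0,1)
bc -l <<< "scale=16; $(od -t u2 -An -N2 /dev/random)/(2 ^ 16)"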

How to zero pad a sequence of integers in bash so that all have the same width?

I need to loop some values,
for i in $(seq $first $last)
do
does something here
done
For $first and $last, I need it to be of fixed length 5. So if the input is 1, I need to add zeros in front such that it becomes 00001. It loops till 99999 for example, but the length has to be 5.
E.g.: 00002, 00042, 00212, 12312 and so forth.
Any idea on how I can do that?
In your specific case though it's probably easiest to use the -f flag to seq to get it to format the numbers as it outputs the list. For example:
for i in $(seq -f "%05g" 10 15)
do
echo $i
done
will produce the following output:
00010
00011
00012
00013
00014
00015
More generally, bash has printf as a built-in so you can pad output with zeroes as follows:
$ i=99
$ printf "%05d\n" $i
00099
You can use the -v flag to store the output in another variable:
$ i=99
$ printf -v j "%05d" $i
$ echo $j
00099
Notice that printf supports a slightly different format to seq so you need to use %05d instead of %05g.
Easier still, you can just do
for i in {00001..99999}; do
echo $i
done
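Note that zero-padded brace expansion like {00001..99999} requires bash 4.0 or newer; on older versions the leading zeros are dropped. A quick check, assuming a reasonably recent bash:
$ printf '%s\n' {01..03}
01
02
03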
If the end of the sequence has the maximal length of padding (for example, if you want 5 digits and the command is seq 1 10000), then you can use the -w flag for seq; it adds the padding itself.
seq -w 1 10
would produce
01
02
03
04
05
06
07
08
09
10
Use printf with "%05d", e.g.:
printf "%05d" 1
Very simple using printf
[jaypal:~/Temp] printf "%05d\n" 1
00001
[jaypal:~/Temp] printf "%05d\n" 2
00002
Use awk like this:
awk -v start=1 -v end=10 'BEGIN{for (i=start; i<=end; i++) printf("%05d\n", i)}'
OUTPUT:
00001
00002
00003
00004
00005
00006
00007
00008
00009
00010
Update:
As pure bash alternative you can do this to get same output:
for i in {1..10}
do
printf "%05d\n" $i
done
This way you can avoid using the external program seq, which is NOT available on all flavors of *nix.
I pad the output with more digits (zeros) than I need, then use tail to keep only the number of digits I am looking for. Notice that you have to use '6' in tail to get the last five digits, since the sixth byte is the trailing newline :)
for i in $(seq 1 10)
do
RESULT=$(echo 00000$i | tail -c 6)
echo $RESULT
done
If you want N digits, add 10^N and delete the first digit.
for (( num=100; num<=105; num++ ))
do
echo ${num:1:3}
done
Output:
00
01
02
03
04
05
Another way:
zeroos="000"
echo
for num in {99..105};do
echo ${zeroos:${#num}:${#zeroos}}${num}
done
So a simple function to convert any number would be:
function leading_zero(){
local num=$1
local zeroos=00000
echo ${zeroos:${#num}:${#zeroos}}${num}
}
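For example (a quick check of the function above):
$ leading_zero 42
00042
$ leading_zero 12345
12345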
One way without forking external processes is string manipulation; in a generic case it would look like this:
#start value
CNT=1
for [whatever iterative loop, seq, cat, find...];do
# number of 0s is at least the amount of decimals needed, simple concatenation
TEMP="000000$CNT"
# for example 6 digits zero padded, get the last 6 character of the string
echo ${TEMP:(-6)}
# increment, if the for loop doesn't provide the number directly
CNT=$(( CNT + 1 ))
done
This works quite well on WSL as well, where forking is a really heavy operation. I had a list of 110,000 files; using printf "%06d" $NUM took over 1 minute, while the solution above ran in about 1 second.
This will also work:
for i in {0..9}{0..9}{0..9}{0..9}
do
echo "$i"
done
If you're just after padding numbers with zeros to achieve a fixed length, just add the appropriate power of 10,
e.g. for 2 digits, add 10^2, then remove the first 1 before displaying the output.
This solution works to pad/format single numbers of any length, or a whole sequence of numbers using a for loop.
# Padding with leading zeros:
# Pure bash without externals, e.g. awk, sed, seq, head, tail etc.
# works with echo, no need for printf
pad=100000 ;# 5 digit fixed
for i in {0..99999}; do ((j=pad+i))
echo ${j#?}
done
Tested on Mac OSX 10.6.8, Bash ver 3.2.48
TL;DR
$ seq 1 10 | awk '{printf("%05d\n", $1)}'
Input (Pattern 1, slow):
$ seq 1 10 | xargs -n 1 printf "%05d\n"
Input (Pattern 2, fast):
$ seq 1 10 | awk '{printf("%05d\n", $1)}'
Output (same result in each case):
00001
00002
00003
00004
00005
00006
00007
00008
00009
00010
Explanation
I'd like to suggest the above patterns. These implementations can be used as commands so that we can reuse them with ease. All you have to care about in these commands is the length of the numbers after conversion (like changing %05d into %09d). Plus, they're also applicable to other tasks such as the following. The example is too dependent on my environment, so your output might be different, but I think you can see the usefulness easily.
$ wc -l * | awk '{printf("%05d\n", $1)}'
00007
00001
00001
00001
00013
00017
00001
00001
00001
00043
And like this:
$ wc -l * | awk '{printf("%05d\n", $1)}' | sort | uniq
00001
00007
00013
00017
00043
Moreover, if you write in this manner, you can also execute the commands asynchronously. (I found a nice article:
https://www.dataart.com/en/blog/linux-pipes-tips-tricks)
disclaimer: I'm not sure of this, and I am not a *nix expert.
Performance test:
Super Slow:
$ time seq 1 1000 | xargs -n 1 printf "%09d\n" > test
seq 1 1000 0.00s user 0.00s system 48% cpu 0.008 total
xargs -n 1 printf "%09d\n" > test 1.14s user 2.17s system 84% cpu 3.929 total
Relatively Fast:
for i in {1..1000}
do
printf "%09d\n" $i
done
$ time sh k.sh > test
sh k.sh > test 0.01s user 0.01s system 74% cpu 0.021 total
for i in {1..1000000}
do
printf "%09d\n" $i
done
$ time sh k.sh > test
sh k.sh > test 7.10s user 1.52s system 99% cpu 8.669 total
Fast:
$ time seq 1 1000 | awk '{printf("%09d\n", $1)}' > test
seq 1 1000 0.00s user 0.00s system 47% cpu 0.008 total
awk '{printf("%09d\n", $1)}' > test 0.00s user 0.00s system 52% cpu 0.009 total
$ time seq 1 1000000 | awk '{printf("%09d\n", $1)}' > test
seq 1 1000000 0.27s user 0.00s system 28% cpu 0.927 total
awk '{printf("%09d\n", $1)}' > test 0.92s user 0.01s system 99% cpu 0.937 total
If you have to implement the higher performance solution, probably it may require other techniques, not using the shell script.
1.) Create a sequence of numbers 'seq' from 1 to 1000, and fix the width '-w' (width is determined by length of ending number, in this case 4 digits for 1000).
2.) Also, select which numbers you want using 'sed -n' (in this case, we select numbers 1-100).
3.) 'echo' out each number. Numbers are stored in the variable 'i', accessed using the '$'.
Pros: This code is pretty clean.
Cons: 'seq' isn't native to all Linux systems (as I understand)
for i in `seq -w 1 1000 | sed -n '1,100p'`;
do
echo $i;
done
You don't need awk for that; either seq or jot alone suffices:
% seq -f '%05.f' 6 # bsd-seq
00001
00002
00003
00004
00005
00006
% gseq -f '%05.f' 6 # gnu-seq
00001
00002
00003
00004
00005
00006
% jot -w '%05.f' 6
00001
00002
00003
00004
00005
00006
…… unless you're going into bigint territory:
% gawk -Mbe '
function __(_,___) {
return +_<+___?___:_
}
BEGIN {
_+=_^=_<_
____="%0*.f\n"
} {
___=__($--_, !+$++_)
_____=__(++_+--_, length(______=+$NF))
do {
printf(____,_____,___)
} while (___++<______)
}' <<< '999999999999999999996 1000000000000000000003'
0999999999999999999996
0999999999999999999997
0999999999999999999998
0999999999999999999999
1000000000000000000000
1000000000000000000001
1000000000000000000002
1000000000000000000003
——————————————————————————————————————————————————
If you need to print out a HUGE range of numbers, then this approach may be a bit faster:
printing out every integer from 1 to 1 million, left-zero-padded to 9 digits wide, in 0.049s.
*Caveat: I didn't have the spare time to make it cover all input ranges; it's just a proof of concept accepting increments of powers of 10.
——————————————————————————————————————————————————
( time ( LC_ALL=C mawk2 '
function jot(____,_______,_____,_,__,___,______) {
if(____==(____^!____)) {
return +____<+_______\
? sprintf("%0*.f",_______,____)\
: +____
}
_______= (_______-=____=length(____)-\
(_=!(_<_)))<+_ \
? "" \
: sprintf("%0*.f",_______,!_)
__=_= (!(__=_+=_+_))(__=(-+--_)+(__+=_)^++_)\
(__+=_=(((_--^_--+_++)^++_-_^!_)/_))(__+_)
_____= "."
gsub(_____,"\\&&",__)
____--
do {
gsub(_____,__,_)
_____=_____"."
} while(--____)
gsub(_____,(_______)"&\n",_)
sub("^[^\n]+[\n]","",_)
sub(".$",""~"",_______)
return \
(_)(_______)\
sprintf("%0*.f",length(_____),__<__)
} { print jot($1,$2) }' <<< '10000000 9'
) | pvE9 ) |xxh128sum |ggXy3 | lgp3
sleep 2
( time ( LC_ALL=C jot 1000000 |
LC_ALL=C mawk2 '{ printf("%09.f\n", $1) }'
) | pvE9 ) |xxh128sum |ggXy3 | lgp3
out9: 9.54MiB 0:00:00 [ 275MiB/s] [ 275MiB/s] [<=> ]
( LC_ALL=C mawk2 <<< '1000000 9'; )
0.04s user 0.01s system 93% cpu 0.049 total
e0491043bdb4c8bc16769072f3b71f98 stdin
out9: 9.54MiB 0:00:00 [36.5MiB/s] [36.5MiB/s] [ <=> ]
( LC_ALL=C jot 1000000 | LC_ALL=C mawk2 '{printf("%09.f\n", $1)}'; )
0.43s user 0.01s system 158% cpu 0.275 total
e0491043bdb4c8bc16769072f3b71f98 stdin
By the time you're doing 10 million, the time differences become noticeable :
out9: 95.4MiB 0:00:00 [ 216MiB/s] [ 216MiB/s] [<=> ]
( LC_ALL=C mawk2 <<< '10000000 9'; )
0.38s user 0.06s system 95% cpu 0.458 total
be3ed6c8e9ee947e5ba4ce51af753663 stdin
out9: 95.4MiB 0:00:02 [36.3MiB/s] [36.3MiB/s] [ <=> ]
( LC_ALL=C jot 10000000 | LC_ALL=C mawk2 '{printf("%09.f\n", $1)}'; )
4.30s user 0.04s system 164% cpu 2.638 total
be3ed6c8e9ee947e5ba4ce51af753663 stdin
out9: 95.4MiB 0:00:02 [35.2MiB/s] [35.2MiB/s] [ <=> ]
( LC_ALL=C python3 -c '__=1; ___=10**7;
[ print("{0:09d}".format(_)) for _ in range(__,___+__) ]'
) | pvE9 ) | xxh128sum |ggXy3 | lgp3 ; )
2.68s user 0.04s system 99% cpu 2.725 total
be3ed6c8e9ee947e5ba4ce51af753663 stdin
