How can hex addition in BC be made to overflow at the 8 byte limit (uint64)? - bash

I am adding a series of 8196 64-bit unsigned integers, and I need the running total to "roll over" back to zero and continue counting from there, just as a "normal" programming language would do at the relevant INT_MAX ceiling.
As the test script shows, adding 1 to a boundary value (FF, FFFF, etc.) just keeps on increasing the total. A feature, no doubt, but I'd like to limit it to 64 bits for this particular instance.
Is there some way to limit bc in this?
unset f
for ((i=0; i<8; i++)); do
f=${f}FF; echo -ne "$((${#f}/2)) bytes + 1 "
echo 'ibase=16; obase=10; ('$f'+1)' |bc
done
echo "I want 8th+1 to = 0000000000000000"
# output
#
# 1 bytes + 1 100
# 2 bytes + 1 10000
# 3 bytes + 1 1000000
# 4 bytes + 1 100000000
# 5 bytes + 1 10000000000
# 6 bytes + 1 1000000000000
# 7 bytes + 1 100000000000000
# 8 bytes + 1 10000000000000000
# I want 8th+1 to = 0000000000000000

This is called a modulo operation: reduce each sum modulo 2^64 and the total wraps around to zero at the 64-bit boundary. See https://superuser.com/questions/31445/gnu-bc-modulo-with-scale-other-than-0 for details about modulo in bc.
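A minimal sketch of that idea, assuming bc is installed. The helper name add_u64_hex is made up for illustration. The modulus 10000000000000000 (hex, a 1 followed by sixteen zeros) is 2^64, and the result is left-padded to 16 hex digits:

```shell
#!/usr/bin/env bash
# add_u64_hex is a hypothetical helper, not something bc provides.
# It adds two uppercase hex numbers and wraps the result at 64 bits.
add_u64_hex() {
    local sum
    # obase is set first: once ibase=16 is in effect, "10" would be read as hex.
    # 10000000000000000 (hex) is 2^64, so % keeps only the low 64 bits.
    sum=$(echo "obase=16; ibase=16; ($1 + $2) % 10000000000000000" | bc)
    # Left-pad with zeros to 16 hex digits.
    printf '%16s\n' "$sum" | tr ' ' 0
}

add_u64_hex FFFFFFFFFFFFFFFF 1   # wraps to 0000000000000000
add_u64_hex FFFFFFFFFFFFFFFE 1   # still fits: FFFFFFFFFFFFFFFF
```

Note that bc only accepts uppercase hex digits on input, so lowercase values would need converting first.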

Related

Bash - Sum of all the multiples of 3 or 5 below N - timed-out

I'm trying to calculate the sum of all the multiples of 3 or 5 below N in bash but my attempts fail at the speed benchmark.
The input format is described as follows:
The first line is T, which denotes the number of test cases, followed by T lines, each containing a value of N.
Sample input:
2
10
100
Expected output:
23
2318
Here are my attempts:
With bc:
#!/bin/bash
readarray input
printf 'n=%d-1; x=n/3; y=n/5; z=n/15; (1+x)*x/2*3 + (1+y)*y/2*5 - (1+z)*z/2*15\n' "${input[@]:1}" |
bc
With pure bash:
#!/bin/bash
read t
while (( t-- ))
do
read n
echo "$(( --n, x=n/3, y=n/5, z=n/15, (1+x)*x/2*3 + (1+y)*y/2*5 - (1+z)*z/2*15 ))"
done
remark: I'm using t because the input doesn't end with a newline...
Both solutions are evaluated as "too slow", but I really don't know what could be further improved. Do you have an idea?
With awk:
BEGIN {
split("0 0 3 3 8 14 14 14 23 33 33 45 45 45", sums)
split("0 0 1 1 2 3 3 3 4 5 5 6 6 6", ns)
}
NR > 1 {
print fizzbuzz_sum($0 - 1)
}
function fizzbuzz_sum(x, q, r) {
q = int(x / 15)
r = x % 15
return q*60 + q*(q-1)/2*105 + sums[r] + (x-r)*ns[r]
}
It's pretty fast on my old laptop that has an AMD A9-9410 processor
$ printf '%s\n' 2 10 100 | awk -f fbsum.awk
23
2318
$
$ time seq 0 1000000 | awk -f fbsum.awk >/dev/null
real 0m1.532s
user 0m1.542s
sys 0m0.010s
$
And with bc, in case you need it to be capable of handling big numbers too:
{
cat <<EOF
s[1] = 0; s[2] = 0; s[3] = 3; s[4] = 3; s[5] = 8
s[6] = 14; s[7] = 14; s[8] = 14; s[9] = 23; s[10] = 33
s[11] = 33; s[12] = 45; s[13] = 45; s[14] = 45
n[1] = 0; n[2] = 0; n[3] = 1; n[4] = 1; n[5] = 2
n[6] = 3; n[7] = 3; n[8] = 3; n[9] = 4; n[10] = 5
n[11] = 5; n[12] = 6; n[13] = 6; n[14] = 6
define f(x) {
auto q, r
q = x / 15
r = x % 15
return q*60 + q*(q-1)/2*105 + s[r] + (x-r)*n[r]
}
EOF
awk 'NR > 1 { printf "f(%s - 1)\n", $0 }'
} | bc
It's much slower though.
$ printf '%s\n' 2 10 100 | sh ./fbsum.sh
23
2318
$
$ time seq 0 1000000 | sh ./fbsum.sh >/dev/null
real 0m4.980s
user 0m5.224s
sys 0m0.358s
$
Let's start from the basics and try to optimize it as much as possible:
#!/usr/bin/env bash
read N
sum=0
for ((i=1;i<N;++i)); do
if ((i%3 == 0 )) || (( i%5 == 0 )); then
(( sum += i ))
fi
done
echo $sum
In the above, we run the loop N times, perform minimally N comparisons and maximally 2N sums (i and sum). We could speed this up by doing multiple loops with steps of 3 and 5, however, we have to take care of double counting:
#!/usr/bin/env bash
read N
(( N-- )) # only multiples strictly below N count, so start from N-1
sum=0
for ((i=N-N%3;i>=3;i-=3)); do (( sum+=i )); done
for ((i=N-N%5;i>=5;i-=5)); do (( i%3 == 0 )) && continue; ((sum+=i)); done
echo $sum
We have now maximally 2N/3 + 2N/5 = 16N/15 sums and N/5 comparisons. This is already much faster. We could still optimise it by adding an extra loop with a step of 3*5 to subtract the double counting.
#!/usr/bin/env bash
read N
(( N-- )) # only multiples strictly below N count, so start from N-1
sum=0
for ((i=N-N%3 ; i>=3 ; i-=3 )); do ((sum+=i)); done
for ((i=N-N%5 ; i>=5 ; i-=5 )); do ((sum+=i)); done
for ((i=N-N%15; i>=15; i-=15)); do ((sum-=i)); done
echo $sum
This brings us to maximally 2(N/3 + N/5 + N/15) = 18N/15 = 6N/5 additions and zero comparisons. The comparisons are gone; however, we still have a call to an arithmetic expression per cycle. This we could absorb into the for-loop:
#!/usr/bin/env bash
read N
(( N-- )) # only multiples strictly below N count, so start from N-1
sum=0
for ((i=N-N%3 ; i>=3 ; sum+=i, i-=3 )); do :; done
for ((i=N-N%5 ; i>=5 ; sum+=i, i-=5 )); do :; done
for ((i=N-N%15; i>=15; sum-=i, i-=15)); do :; done
echo $sum
Finally, the easiest would be to use the formula for the arithmetic series, which removes all loops. Keeping in mind that bash uses integer arithmetic (i.e., m = p*(m/p) + m%p), one can write
#!/usr/bin/env bash
read N
(( N-- )) # only multiples strictly below N count, so start from N-1
(( sum = ( (3 + N-N%3) * (N/3) + (5 + N-N%5) * (N/5) - (15 + N-N%15) * (N/15) ) / 2 ))
echo $sum
The latter is the fastest possible way (with the exception of numbers below 15) as it does not call any external binary such as bc or awk and performs the task without any loops.
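One way to gain confidence in a closed form like this is to compare it against the naive loop for a handful of inputs. The sketch below uses the equivalent form p*k*(k+1)/2 with k = (N-1)/p for each step p. The function names closed_form and brute_force are made up for illustration:

```shell
#!/usr/bin/env bash
# closed_form (hypothetical name): sum of the multiples of 3 or 5 strictly below $1,
# computed from the arithmetic series p + 2p + ... + kp = p*k*(k+1)/2.
closed_form() {
    local n=$(( $1 - 1 ))
    echo $(( 3*(n/3)*(n/3+1)/2 + 5*(n/5)*(n/5+1)/2 - 15*(n/15)*(n/15+1)/2 ))
}

# brute_force (hypothetical name): the same sum, one candidate at a time.
brute_force() {
    local i sum=0
    for (( i = 3; i < $1; i++ )); do
        if (( i % 3 == 0 || i % 5 == 0 )); then
            sum=$(( sum + i ))
        fi
    done
    echo "$sum"
}

# Report any input where the two disagree.
for n in 1 3 10 15 16 100 1000; do
    if [[ $(closed_form "$n") != "$(brute_force "$n")" ]]; then
        echo "mismatch at $n"
    fi
done

closed_form 10    # 23
closed_form 100   # 2318
```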
What about something like this
#! /bin/bash
s35() {
m=$(($1-1)); { seq 3 3 $m; seq 5 5 $m; echo 0; } | paste -sd+ - | bc # joins with + on any seq; multiples of 15 are still counted twice
}
read t
while read n
do
s35 $n
done
or
s35() {
m=$(($1-1));
{ sort -nu <(seq 3 3 $m) <(seq 5 5 $m) | tr '\n' +; echo 0; } | bc
}
to remove duplicates.
This Shellcheck-clean pure Bash code processes input from echo 1000000; seq 1000000 (one million inputs) in 40 seconds on an unexotic Linux VM:
#! /bin/bash -p
a=( -15 1 -13 -27 -11 -25 -9 7 -7 -21 -5 11 -3 13 -1 )
b=( 0 -8 -2 18 22 40 42 28 28 42 40 22 18 -2 -8 )
read -r t
while (( t-- )); do
read -r n
echo "$(( m=n%15, ((7*n+a[m])*n+b[m])/30 ))"
done
The code depends on the fact that the sum for each value n can be calculated with a quadratic function of the form (7*n**2+A*n+B)/30. The values of A and B depend on the value of n modulo 15. The arrays a and b in the code contain the values of A and B for each possible modulus value ({0..14}). (To avoid doing the algebra I wrote a little Bash program to generate the a and b arrays.)
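The generator program itself is not shown, so the following is only a sketch of how such a table could be derived, not the author's program. For each residue m it evaluates a brute-force sum S at n = m and n = m + 15, then solves 30*S(n) - 7*n^2 = A*n + B for A and B. The function name S and the variable names are assumptions:

```shell
#!/usr/bin/env bash
# S (hypothetical name): brute-force sum of the multiples of 3 or 5 strictly below $1.
S() {
    local i sum=0
    for (( i = 3; i < $1; i++ )); do
        if (( i % 3 == 0 || i % 5 == 0 )); then
            sum=$(( sum + i ))
        fi
    done
    echo "$sum"
}

a=() b=()
for (( m = 0; m < 15; m++ )); do
    n1=$m
    n2=$(( m + 15 ))
    # d = 30*S(n) - 7*n^2 is linear in n within one residue class: d = A*n + B.
    d1=$(( 30 * $(S "$n1") - 7 * n1 * n1 ))
    d2=$(( 30 * $(S "$n2") - 7 * n2 * n2 ))
    a[m]=$(( (d2 - d1) / 15 ))     # slope between the two sample points
    b[m]=$(( d1 - a[m] * n1 ))     # intercept
done

echo "a=( ${a[*]} )"
echo "b=( ${b[*]} )"
```

Two sample points are enough here because the leading coefficient 7/30 is fixed, which leaves only A and B unknown for each residue.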
The code can easily be translated to other programming languages, and would run much faster in many of them.
For a pure bash approach,
#!/bin/bash
DBG=1
echo -e "This will generate the series sum for multiples of each of 3 and 5 ..."
echo -e "\nEnter the number of summation sets to be generated => \c"
read sets
for (( k=1 ; k<=${sets} ; k++))
do
echo -e "\n============================================================"
echo -e "Enter the maximum value of a multiple => \c"
read max
echo ""
for multiplier in 3 5
do
sum=0
iter=$((max/${multiplier}))
for (( i=1 ; i<=${iter} ; i++ ))
do
next=$((${i}*${multiplier}))
sum=$((sum+=${next}))
test ${DBG} -eq 1 && echo -e "\t ${next} ${sum}"
done
echo -e "TOTAL: ${sum} for ${iter} multiples of ${multiplier} <= ${max}\n"
done
done
The session log when DBG=1:
This will generate the series sum for multiples of each of 3 and 5 ...
Enter the number of summation sets to be generated => 2
============================================================
Enter the maximum value of a multiple => 15
3 3
6 9
9 18
12 30
15 45
TOTAL: 45 for 5 multiples of 3 <= 15
5 5
10 15
15 30
TOTAL: 30 for 3 multiples of 5 <= 15
============================================================
Enter the maximum value of a multiple => 12
3 3
6 9
9 18
12 30
TOTAL: 30 for 4 multiples of 3 <= 12
5 5
10 15
TOTAL: 15 for 2 multiples of 5 <= 12
While awk will always be faster than shell, with bash you can use ((m % 3 == 0)) || ((m % 5 == 0)) to identify the multiples of 3 and 5 less than n. You will have to see if it passes the time constraints, but it should be relatively quick,
#!/bin/bash
declare -i t n sum ## handle t, n and sum as integer values
read t || exit 1 ## read t or handle error
while ((t--)); do ## loop t times
sum=0 ## initialize sum zero
read n || exit 1 ## read n or handle error
## loop from 3 to < n
for ((m = 3; m < n; m++)); do
## m is multiple of 3 or multiple of 5
((m % 3 == 0)) || ((m % 5 == 0)) && {
sum=$((sum + m)) ## add m to sum
}
done
echo $sum ## output sum
done
Example Use/Output
With the script in sum35mod.sh and your data in dat/sum35mod.txt you would have:
$ bash sum35mod.sh < dat/sum35mod.txt
23
2318

How to find values 2 exponential in shell?

Is there a way to find a value's 2 exponential form in bash.
For example if I input 512 it should result output as 9 meaning 2 ^ 9 is 512.
Any help here is immensely appreciated - Thanks
When I read the question, 512 is the input, and 9 is the output. Is it possible that what is being asked here is "log_base_2(512)", which has an answer of "9"? If so, then maybe this would help.
$ echo "l(512) / l(2)" | bc -l
9.00000000000000000008
The explanation of the math can be found here:
How do I calculate the log of a number using bc?
Using awk.
$ echo 512 | awk '{print log($1)/log(2)}'
9
Put that into a script (expo.sh):
#!/bin/bash
_num="$1"
expon=$(awk -v a="$_num" 'BEGIN{print log(a)/log(2)}')
if [[ $expon =~ ^[0-9]+\.[0-9]*$ ]]; then # Match floating points
echo "$_num is not an exponent of 2"; # Not exponent if floating point
else
echo "$_num = 2^${expon}"; # print number
fi
Run:
$ ./expo.sh 512
512 = 2^9
$ ./expo.sh 21
21 is not an exponent of 2
A fast way to check whether a number x is a power of 2 is to test that the bitwise AND of x and x-1 is zero, while excluding 0 by also requiring x>0:
((x>0 && ( x & x-1 ) == 0 )) && echo $x is a 2-exponent
To compute log2, you can use this algorithm (fast-computing-of-log2-for-64-bit-integers):
tab32=( 0 9 1 10 13 21 2 29
11 14 16 18 22 25 3 30
8 12 20 28 15 17 24 7
19 27 23 6 26 5 4 31 )
log2_32() {
local value=$1
(( value |= value >> 1 ))
(( value |= value >> 2 ))
(( value |= value >> 4 ))
(( value |= value >> 8 ))
(( value |= value >> 16 ))
log2_32=${tab32[(value * 16#7C4ACDD & 16#ffffffff)>>27]}
}
log2_32 262144
echo "$log2_32"
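For the original question (an exact exponent, or a rejection when the input is not a power of 2), the bit test and a plain shift loop can be combined without any lookup table. This is a simple sketch rather than the table-based algorithm above, and the name ilog2 is made up:

```shell
#!/usr/bin/env bash
# ilog2 (hypothetical name): print k if $1 == 2^k, otherwise return 1 and print nothing.
ilog2() {
    local x=$1 k=0
    # A power of 2 has exactly one bit set, so x & (x-1) clears it to zero.
    (( x > 0 && (x & (x - 1)) == 0 )) || return 1
    # Count how many times x can be halved before reaching 1.
    while (( x > 1 )); do
        x=$(( x >> 1 ))
        k=$(( k + 1 ))
    done
    echo "$k"
}

ilog2 512                                   # 9
ilog2 21 || echo "21 is not a power of 2"
```

Because it stays in integer arithmetic, it avoids the rounding seen in the floating-point results above.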

Can't seem to add two numbers in shell

I have been googling and trying different methods but nothing seems to work.
I have the following code
string=0 4 5 27 8 7 0 6
total=0
for n in "$string"; do
total=$(($total + $n))
done
This way I want to count the total sum of all the numbers within that string.
I have also tried expr "$total" + "$n" but that gives me an error saying the operand is not an integer.
Any suggestion how I might make this work?
Don't quote the string in the in clause; a quoted string is not split into words:
#! /bin/bash
total=0
string='0 4 5 27 8 7 0 6'
for n in $string ; do
(( total += n ))
done
echo $total
string=0 4 5 27 8 7 0 6
This attempts to set the variable string to 0, then invoke the command 4 with arguments 5 27 8 7 0 6.
You need to quote the value:
string="0 4 5 27 8 7 0 6"
And you need to remove the quotes when you refer to it; change
for n in "$string"; do
to
for n in $string; do
You should use:
total=$(( total + n ))
There is no need for the $ before variables inside a $(( )) expression.
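If the numbers do not have to live in a single string, an array avoids the quoting question altogether, because each element stays a separate word even when quoted. A minimal sketch; the array name numbers is an assumption:

```shell
#!/usr/bin/env bash
# Store the values as array elements instead of one space-separated string.
numbers=(0 4 5 27 8 7 0 6)
total=0
for n in "${numbers[@]}"; do
    total=$(( total + n ))
done
echo "$total"   # 57
```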

Why use seq 0 in bash for loop

Why use seq 0 in bash for loop?
for i in `seq 0 $(( ${#ARRAYEX[@]} - 1 ))`
do
echo "ARRAYEX${i}=${ARRAYEX[${i}]}"
done
The seq command generates a sequence of numbers.
For example
seq 0 10
generates a sequence of numbers from 0 up to 10:
0 1 2 3 4 5 6 7 8 9 10
(usually each number is on a new line, but I place them after each other)
In your example, a sequence of numbers starting at 0 and going up to the size of the array minus 1 is generated.
The seq 0 $(( ${#ARRAYEX[@]} - 1 )) part expands to:
0 1 2 3 4
assuming that the ARRAYEX has a size of 5.
Inside the loop the array is used again, so the loop iterates over all array elements (the first element of the array is at index 0).
seq 0 $(( ${#ARRAYEX[@]} - 1 )) creates a sequence of all the possible indexes of the array. You can also use
for ((i=0; i<${#ARRAYEX[@]}; ++i )) ; do
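Another option is to let bash list the indexes itself with ${!ARRAYEX[@]}, which needs no seq call and also copes with arrays that have gaps. A small sketch; the sample values and the function name dump_array are made up for illustration:

```shell
#!/usr/bin/env bash
ARRAYEX=(alpha beta gamma)
unset 'ARRAYEX[1]'          # leave a gap: the remaining indexes are 0 and 2

# dump_array (hypothetical name): print every element that actually exists.
dump_array() {
    local i
    for i in "${!ARRAYEX[@]}"; do
        echo "ARRAYEX${i}=${ARRAYEX[i]}"
    done
}

dump_array
```

With the seq form, the size of this array is 2, so the loop would visit indexes 0 and 1 and miss the element at index 2.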

bash 'while read line' efficiency with big file

I was using a while loop to process a task
that reads records from a big file of about 10 million lines.
I found that the processing becomes slower and slower as time goes by.
I made a simulated script with 1 million lines, shown below, which reveals the problem.
But I still don't know why. How does the read command work?
seq 1000000 > seq.dat
while read s;
do
if [ `expr $s % 50000` -eq 0 ];then
echo -n $( expr `date +%s` - $A) ' ';
A=`date +%s`;
fi
done < seq.dat
The terminal outputs the time interval:
98 98 98 98 98 97 98 97 98 101 106 112 121 121 127 132 135 134
At about 50,000 lines, the processing obviously becomes slower.
Using your code, I saw the same pattern of increasing times (right from the beginning!). If you want faster processing, you should rewrite using shell internal features. Here's my bash version:
tabChar=" " # put a real tab char here, of course
seq 1000000 > seq.dat
while read s;
do
if (( ! ( s % 50000 ) )) ;then
echo $s "${tabChar}" $( expr `date +%s` - $A)
A=$(date +%s);
fi
done < seq.dat
edit
fixed bug, output indicated each line was being processed, now only every 50000'th line gets the timing treatment. Doah!
was
if (( s % 50000 )) ;then
fixed to
if (( ! ( s % 50000 ) )) ;then
Output now, under ksh (echo ${.sh.version} = Version JM 93t+ 2010-05-24):
50000
100000 1
150000 0
200000 1
250000 0
300000 1
350000 0
400000 1
450000 0
500000 1
550000 0
600000 1
650000 0
700000 1
750000 0
800000 1
850000 0
900000 1
950000 0
1e+06 1
output bash
50000 480
100000 3
150000 2
200000 3
250000 3
300000 2
350000 3
400000 3
450000 2
500000 2
550000 3
600000 2
650000 2
700000 3
750000 3
800000 2
850000 2
900000 3
950000 2
As to why your original test case is taking so long ... not sure. I was surprised to see both the time for each test cycle AND the increase in time. If you really need to understand this, you may need to spend time instrumenting more test stuff. Maybe you'd see something running truss or strace (depending on your base OS).
I hope this helps.
Read is a comparatively slow process, as the author of "Learning the Korn Shell" points out*. (Just above Section 7.2.2.1.) There are other programs, such as awk or sed that have been highly optimized to do what is essentially the same thing: read from a file one line at a time and perform some operations using that input.
Not to mention, that you're calling an external process every time you're doing subtraction or taking the modulus, which can get expensive. awk has both of those functionalities built in.
As the following test points out, awk is quite a bit faster:
#!/usr/bin/env bash
seq 1000000 |
awk '
BEGIN {
command = "date +%s"
prevTime = 0
}
$1 % 50000 == 0 {
command | getline currentTime
close(command)
print currentTime - prevTime
prevTime = currentTime
}
'
Output:
1335629268
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
0
0
0
0
Note that the first number is equivalent to date +%s. Just like in your test case, I let the first match be.
Note
*Yes, the author is talking about the Korn shell, not bash as the OP tagged, but bash and ksh are rather similar in a lot of ways, and bash borrows many of its features from ksh. So I would assume that the read command is not drastically different from one shell to another.
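The cost of external processes can also be removed while staying in bash, by using (( )) for the modulus and the built-in SECONDS variable for timing instead of expr and date. The following is a sketch only. The function name report_intervals and its two parameters are made up for illustration:

```shell
#!/usr/bin/env bash
# report_intervals (hypothetical name): for every line of FILE whose number is a
# multiple of STEP, print the number and the seconds elapsed since the previous
# report. Only shell builtins are used, so no process is forked per line.
report_intervals() {
    local file=$1 step=$2 s last=$SECONDS
    while IFS= read -r s; do
        if (( s % step == 0 )); then
            echo "$s $(( SECONDS - last ))"
            last=$SECONDS
        fi
    done < "$file"
}
```

To mirror the test case in the question, it would be run as seq 1000000 > seq.dat followed by report_intervals seq.dat 50000. SECONDS has one-second resolution, the same as date +%s.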

Resources