Getting overflow in a bash script that computes multinomial coefficients - bash

I'm using bash to compute the multinomial coefficients. The code follows below:
#!/bin/bash
function factorial {
declare n=$1
(( n < 2 )) && echo 1 && return
echo $(( n * $(factorial $((n-1))) ))
}
function binomial {
declare n=$1
declare k=$2
echo $(( $(factorial $((n))) / ( $(factorial $((k))) * $(factorial $((n-k))) ) ))
}
function multinomial {
arr=("$@")
declare mcoeff=1
declare n=0
for k in "${arr[@]}";
do
((n=$n+$k))
((mcoeff=$mcoeff*$(binomial "$n" "$k")))
done
echo "$mcoeff"
}
multinomial "$@"
It seems I have an overflow in some situations.
$ ./multinomial.sh 4 5 6
630630
$ ./multinomial.sh 4 5 6 7
-119189070
Any idea how to fix this?

A shell is an environment from which to call tools, with a language to sequence those calls. It is not meant to be a full-featured programming language, nor is it meant for complicated calculations. Try this instead; I just translated your shell code into the equivalent awk:
$ cat multinomial.sh
#!/usr/bin/env bash
awk -v nums="$*" '
function factorial(n) {
if ( n < 2 ) {
return 1
}
return n * factorial(n-1)
}
function binomial(n,k) {
return factorial(n) / ( factorial(k) * factorial(n-k) )
}
function multinomial(str, arr, mcoeff, n, k) {
split(str,arr)
mcoeff = 1
n = 0
for (j=1; j in arr; j++) {
k = arr[j]
n = n + k
mcoeff = mcoeff * binomial(n,k)
}
return mcoeff
}
BEGIN { print multinomial(nums) }
'
$ ./multinomial.sh 4 5 6
630630
$ ./multinomial.sh 4 5 6 7
107550162720
$ ./multinomial.sh 4 5 6 7 8
629483036137955968
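If staying in bash is a requirement, the overflow can also be pushed much further out by never forming the full factorials. The sketch below is not from the answer above; it assumes bash's usual signed 64-bit arithmetic and non-negative integer arguments, and it builds each binomial coefficient multiplicatively, so intermediate values stay close to the size of the result. In the original script the factorials overflow first (21! is already larger than 2^63 - 1), even when the final coefficient would fit. Note also that awk computes in double-precision floating point, so awk results above 2^53 are rounded rather than exact.

```shell
#!/usr/bin/env bash

# binomial n k: C(n, k) without factorials.
# After step i the running value equals C(n-k+i, i), so every division is exact.
binomial() {
    local -i n=$1 k=$2 i result=1
    # Use the smaller of k and n-k to shorten the loop (C(n,k) = C(n,n-k)).
    if (( k > n - k )); then
        k=$(( n - k ))
    fi
    for (( i = 1; i <= k; i++ )); do
        result=$(( result * (n - k + i) / i ))
    done
    echo "$result"
}

# multinomial k1 k2 ...: product of C(k1,k1) * C(k1+k2,k2) * ...
multinomial() {
    local -i n=0 k mcoeff=1
    for k in "$@"; do
        n=$(( n + k ))
        mcoeff=$(( mcoeff * $(binomial "$n" "$k") ))
    done
    echo "$mcoeff"
}
```

With these definitions, multinomial 4 5 6 prints 630630 and multinomial 4 5 6 7 prints 107550162720, matching the awk output above. Results larger than 2^63 - 1 will still wrap; for those, an arbitrary-precision tool such as bc is the safer choice.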

Related

Java String.hashCode() implementation in Bash

I am trying to implement the String.hashCode() function in Bash, but I couldn't figure out the bug.
This is my sample implementation:
function hashCode(){ #similar function to java String.hashCode()
foo=$1
echo $foo
h=0
for (( i=0; i<${#foo}; i++ )); do
val=$(ord ${foo:$i:1})
echo $val
if ((31 * h + val > 2147483647))
then
h=$((-2147483648 + (31 * h + val) % 2147483648 ))
elif ((31 * h + val < -2147483648))
then
h=$(( 2147483648 - ( 31 * h + val) % 2147483648 ))
else
h=$(( 31 * h + val))
fi
done
printf %d $h
}
function ord() { #asci to int conversion
LC_CTYPE=C printf %d "'$1"
}
Java function looks like this
public int hashCode() {
int h = hash;
if (h == 0 && value.length > 0) {
char val[] = value;
for (int i = 0; i < value.length; i++) {
h = 31 * h + val[i];
}
hash = h;
}
return h;
}
Expected output for string "__INDEX_STAGING_DATA__0_1230ee6d-c37a-46cf-821c-55412f543fa6" is "1668783629" but the output is -148458597
Note - Have to handle java int overflow and underflow.
Vinujan, your code is working for the purpose of hashing a given string using the algorithm you have included. You do not need the ord function as you can cause the literal conversion to ASCII value with printf -v val "%d" "'${foo:$i:1}" (unless you need the LC_CTYPE=C for character set differences).
For example, with just minor tweaks to your code, it will hash the string "hello" properly:
#!/bin/bash
function hashCode() {
local foo="$1"
local -i h=0
for ((i = 0; i < ${#foo}; i++)); do
printf -v val "%d" "'${foo:$i:1}" # val is ASCII val
if ((31 * h + val > 2147483647)) # hash scheme
then
h=$((-2147483648 + (31 * h + val) % 2147483648 ))
elif ((31 * h + val < -2147483648))
then
h=$(( 2147483648 - ( 31 * h + val) % 2147483648 ))
else
h=$(( 31 * h + val))
fi
done
printf "%d" $h # final hashCode in decimal
}
hash=$(hashCode "$1")
printf "\nhashCode: 0x%02x (%d decimal)\n" $hash $hash
Example Use/Output
$ bash hashcode.sh hello
hashCode: 0x5e918d2 (99162322 decimal)
Where you appear to have a problem is in the hashing algorithm itself. For example, a longer string like password will result in your scheme returning a negative 64-bit value that looks suspect, e.g.:
$ bash hashcode.sh password
hashCode: 0xffffffffb776462d (-1216985555 decimal)
This may be your intended hash, I have nothing to compare the algorithm against. Look things over, and if you still have problems, edit your question and describe exactly what problems/error/etc. you are getting when you run the script and add that output to your question.
Edit of Hash Function for Better Behavior
Without an algorithm to implement, the only thing I can do is to reformulate the algorithm you provided to be better behaved when the calculations exceed INT_MAX/INT_MIN. Looking at your existing algorithm, it appeared to make the problems worse as large numbers were encountered rather than smoothing the values to ensure they remained within the bounds.
Frankly, it looked like you had omitted subtracting INT_MIN or adding INT_MAX to h before reducing the value modulo 2147483648 when it exceeded/fell below those limits (e.g. you forgot the parentheses around the subtraction and addition). Simply adding that to the hash algorithm seemed to produce better behavior and your desired output.
I also save the result of your hash calculation in hval, so that it is not computed multiple times each loop, e.g.
function hashCode() {
local foo="$1"
local -i h=0
for ((i = 0; i < ${#foo}; i++)); do
printf -v val "%d" "'${foo:$i:1}" # val is ASCII val
hval=$((31 * h + val))
if ((hval > 2147483647)) # hash scheme
then
h=$(( (hval - 2147483648) % 2147483648 ))
elif ((hval < -2147483648))
then
h=$(( (hval + 2147483648) % 2147483648 ))
else
h=$(( hval ))
fi
done
printf "%d" $h # final hashCode in decimal
}
New Values
Note the hash for "hello" remains the same (as you would expect), but the value for "password" is now better behaved and returns what you would expect instead of some sign-extended 64-bit value. E.g.,
$ bash hashcode2.sh hello
hashCode: 0x5e918d2 (99162322 decimal)
$ bash hashcode2.sh password
hashCode: 0x4889ba9b (1216985755 decimal)
And note, it does produce your expected output:
$ bash hashcode2.sh "__INDEX_STAGING_DATA__0_1230ee6d-c37a-46cf-821c-55412f543fa6"
hashCode: 0x63779e0d (1668783629 decimal)
Let me know if that is more what you were attempting to do.
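For comparison, here is a sketch that mimics Java's int wrap-around directly instead of branching on the limits. Java's int arithmetic is modulo 2^32 with the result read as a signed 32-bit value, so the idea is to keep only the low 32 bits at each step and convert to signed once at the end. The function name java_hash is made up for illustration, and the sketch assumes single-byte (ASCII) characters and bash's 64-bit arithmetic. As far as I can tell, reducing modulo 2147483648 keeps the result non-negative, so it would differ from Java for strings whose Java hash is negative; the version below is meant to cover that case too.

```shell
#!/usr/bin/env bash

# java_hash STRING: emulate Java's String.hashCode() for ASCII input.
java_hash() {
    local s=$1 c
    local -i h=0 i val
    for (( i = 0; i < ${#s}; i++ )); do
        c=${s:i:1}
        printf -v val '%d' "'$c"                 # character code of c
        h=$(( (31 * h + val) & 0xFFFFFFFF ))     # keep the low 32 bits
    done
    # Reinterpret the unsigned 32-bit value as a signed Java int.
    if (( h >= 0x80000000 )); then
        h=$(( h - 0x100000000 ))
    fi
    printf '%d\n' "$h"
}
```

Called as java_hash hello, this prints 99162322, the same value as in the output above.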
I got a lean solution:
hashCode() {
o=$1
h=0
for j in $(seq 1 ${#o})
do
a=$((j-1))
c=${o:$a:1}
v=$(echo -n "$c" | od -d)
i=${v:10:3}
h=$((31 * $h + $i ))
# echo -n a $a c $c i $i h $h
h=$(( (2**31-1) & $h ))
# echo -e "\t"$h
done
echo $h
}
which was wrong. :) The error was in my clever bitwise XOR, (2**31-1) ^ $h; a bitwise AND seems a bit wiser: (2**31-1) & $h
This might be condensed to:
hashCode() {
o=$1
h=0
for j in $(seq 1 ${#o})
do
v=$(echo -n "${o:$((j-1)):1}" | od -d)
h=$(( (31 * $h + ${v:10:3}) & (2**31-1) ))
done
echo $h
}

How do I perform this calculation and output it to standard out?

I am trying to do this in Bash:
read n
echo int(math.ceil((math.sqrt(1 + 8 * n) - 1) / 2))
Of course this isn't working syntax but I am just putting it there so you can tell what I am trying to do.
Is there an easy way to actually make this into valid Bash?
Although you ask to do this in Bash, there's no native support for functions like square root or ceiling. It would be simpler to delegate to Perl:
perl -wmPOSIX -e "print POSIX::ceil((sqrt(1 + 8 * $n) - 1) / 2)"
Alternatively, you could use bc to calculate the square root, and some Bash to calculate the ceiling.
Let's define a function that prints the result of the formula with sqrt of bc:
formula() {
local n=$1
bc -l <<< "(sqrt(1 + 8 * $n) - 1) / 2"
}
The -l flag changes the scale from the default 0 to 20.
This affects the scale in the display of floating point results.
For example, with the default zero, 10 / 3 would print just 3.
We need the floating point details in the next step to compute the ceiling.
ceil() {
local n=$1
local intpart=${n%%.*}
if [[ $n =~ \.00*$ ]]; then
echo $intpart
else
echo $((intpart + 1))
fi
}
The idea here is to extract the integer part,
and if the decimal part is all zeros, then we print simply the integer part,
otherwise the integer part + 1, as that is the ceiling.
And a final simple function that combines the above functions to get the result that you want:
compute() {
local n=$1
ceil $(formula $n)
}
And a checker function to test it:
check() {
local n num
for n; do
num=$(formula $n)
echo $n $num $(compute $n)
done
}
Let's try it:
check 1 2 3 4 7 11 12 16 17
It produces:
1 1.00000000000000000000 1
2 1.56155281280883027491 2
3 2.00000000000000000000 2
4 2.37228132326901432992 3
7 3.27491721763537484861 4
11 4.21699056602830190566 5
12 4.42442890089805236087 5
16 5.17890834580027361089 6
17 5.35234995535981255455 6
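An integer-only alternative, offered as a sketch rather than as part of the answer above: ceil((sqrt(1 + 8n) - 1) / 2) is the smallest non-negative integer k with k(k+1)/2 >= n, so it can be found by counting up in plain Bash arithmetic, with no floating point or string matching involved. The function name tri_ceil is made up for illustration, and n is assumed to be a non-negative integer.

```shell
#!/usr/bin/env bash

# tri_ceil N: smallest k >= 0 such that k*(k+1)/2 >= N,
# which equals ceil((sqrt(1 + 8*N) - 1) / 2) for non-negative integer N.
tri_ceil() {
    local -i n=$1 k=0
    while (( k * (k + 1) / 2 < n )); do
        k=$(( k + 1 ))
    done
    echo "$k"
}
```

For the inputs in the table above (1 2 3 4 7 11 12 16 17) this yields 1 2 2 3 4 5 5 6 6. The loop takes about sqrt(2n) iterations, which is fine for moderate n; for very large n the bc approach scales better.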
You can use bc's sqrt function.
echo "(sqrt(1 + 8 * 3) - 1) / 2" | bc
The ceiling function can be implemented using any of the methods described in this answer:
Getting Ceil integer
For example:
ceiling_divide() {
ceiling_result=`echo "($1 + $2 - 1)/$2" | bc`
}
You can use bc for the whole job:
$>cat filebc
print "Enter a number\n";
scale=20
a=read()
b=((sqrt(1 + 8 * a) - 1) / 2)
scale=0
print "ceil = ";
((b/1)+((b%1)>0))
quit
Call it like this:
bc -q filebc

Calculate Median in Multiple Rows

I have a file named numbers that simply contains a bunch of random numbers:
1 2 3
7 5 9
2 2 9
5 4 5
7 2 6
I have to create a script that finds the median of each row, and here is my code:
while read -a row
do
for i in "${row[@]}"
do
length=`expr ${#row[@]} % 2`
if [ $length -ne 0 ] ; then
mid=`expr ${#row[@]} / 2`
echo ${row[middle]}
elif [ $length -eq 0 ] ; then
val1=`expr ${#row[@]} / 2`
val2=`expr (${$row[@]} / 2) + 1`
mid=`expr ($val1 + $val2) / 2`
echo $mid
done | sort -n
done < numbers
However, this doesn't work; it shows an error instead. What mistake did I make in this code? Also, I still haven't figured out the proper place to put the sort -n, since the row needs to be sorted before calculating the median, right?
Bash can only do integer arithmetic; you need a tool like bc to compute the average:
#!/bin/bash
while read -a n ; do
n=($(IFS=$'\n' ; echo "${n[*]}" | sort -n))
len=${#n[@]}
if (( len % 2 )) ; then
echo ${n[ len / 2 ]}
else
bc -l <<< "scale=1; (${n[ len / 2 - 1 ]} + ${n[ len / 2 ]}) / 2"
fi
done
I'd probably reach for a higher level language, e.g. Perl:
#!/usr/bin/perl
use warnings;
use strict;
while (<>) {
my @n = sort { $a <=> $b } split;
print @n % 2 ? $n[ @n / 2 ]
: ($n[ @n / 2 - 1 ] + $n[ @n / 2 ]) / 2,
"\n";
}
I just had to awk it, for the fun of it.
Notice I don't use an if but fractions of indexes.
awk '{
split($0,a) # create array a from input line
asort(a,b) # sort array into array b (gnu awk specific)
# add twice the median, or around the median and divide by 2
print ( b[int(NF/2+0.7)] + b[int(NF/2+1.2)] )/2
}' numbers
Shortened (67 chars):
awk '{split($0,a);asort(a,b);print(b[int(NF/2+0.7)]+b[int(NF/2+1.2)])/2}' numbers
66 chars golf :-)
awk '{split($0,a);asort(a,b);$0=(b[int(NF/2+0.7)]+b[int(NF/2+1.2)])/2}1' numbers
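The sort-then-pick-the-middle idea used in the answers above can also be wrapped in a small bash function that needs no bc. This is only a sketch: the name row_median is made up, it assumes bash 4 or later (for mapfile) and non-negative integer input, and it prints a trailing .5 by hand when the two middle values have an odd sum.

```shell
#!/usr/bin/env bash

# row_median N1 N2 ...: median of non-negative integers.
row_median() {
    local -a sorted
    # Sort the arguments numerically, one per line, into an array.
    mapfile -t sorted < <(printf '%s\n' "$@" | sort -n)
    local -i len=${#sorted[@]} sum
    if (( len % 2 )); then
        # Odd count: the middle element is the median.
        echo "${sorted[len / 2]}"
    else
        # Even count: average the two middle elements using integers only.
        sum=$(( sorted[len / 2 - 1] + sorted[len / 2] ))
        if (( sum % 2 )); then
            echo "$(( sum / 2 )).5"
        else
            echo "$(( sum / 2 ))"
        fi
    fi
}

# Hypothetical usage, one median per input row of the file numbers:
#   while read -ra row; do row_median "${row[@]}"; done < numbers
```

For the sample rows, row_median 7 5 9 prints 7 and row_median 2 2 9 prints 2.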

Find smallest missing integer in an array

I'm writing a bash script which requires searching for the smallest available integer in an array and piping it into a variable.
I know how to identify the smallest or the largest integer in an array but I can't figure out how to identify the 'missing' smallest integer.
Example array:
1
2
4
5
6
In this example I would need 3 as a variable.
Using sed for this would be silly. With GNU awk you could do
array=(1 2 4 5 6)
echo "${array[@]}" | awk -v RS='\\s+' '{ a[$1] } END { for(i = 1; i in a; ++i); print i }'
...which remembers all numbers, then counts from 1 until it finds one that it doesn't remember and prints that. You can then remember this number in bash with
array=(1 2 4 5 6)
number=$(echo "${array[@]}" | awk -v RS='\\s+' '{ a[$1] } END { for(i = 1; i in a; ++i); print i }')
However, if you're already using bash, you could just do the same thing in pure bash:
#!/bin/bash
array=(1 2 4 5 6)
declare -a seen
for i in ${array[@]}; do
seen[$i]=1
done
for((number = 1; seen[number] == 1; ++number)); do true; done
echo $number
You can iterate from the minimum to the maximum number and take the first value that is not present:
use List::Util qw( first );
my @arr = sort {$a <=> $b} qw(1 2 4 5 6);
my $min = $arr[0];
my $max = $arr[-1];
my %seen;
@seen{@arr} = ();
my $first = first { !exists $seen{$_} } $min .. $max;
This code will do as you ask. It can easily be accelerated by using a binary search, but it is clearest stated in this way.
The first element of the array can be any integer, and the subroutine returns the first value that isn't in the sequence. It returns undef if the complete array is contiguous.
use strict;
use warnings;
use 5.010;
my @data = qw/ 1 2 4 5 6 /;
say first_missing(@data);
@data = ( 4 .. 99, 101 .. 122 );
say first_missing(@data);
sub first_missing {
my $start = $_[0];
for my $i ( 1 .. $#_ ) {
my $expected = $start + $i;
return $expected unless $_[$i] == $expected;
}
return;
}
output
3
100
Here is a Perl one liner:
$ echo '1 2 4 5 6' | perl -lane '}
{@a=sort { $a <=> $b } @F; %h=map {$_=>1} @a;
foreach ($a[0]..$a[-1]) { if (!exists($h{$_})) {print $_}} ;'
If you want to switch from a pipeline to a file input:
$ perl -lane '}
{@a=sort { $a <=> $b } @F; %h=map {$_=>1} @a;
foreach ($a[0]..$a[-1]) { if (!exists($h{$_})) {print $_}} ;' file
Since it is sorted in the process, input can be in arbitrary order.
$ cat tst.awk
BEGIN {
split("1 2 4 5 6",a)
for (i=1;a[i+1]==a[i]+1;i++) ;
print a[i]+1
}
$ awk -f tst.awk
3
Having fun with @Borodin's excellent answer:
#!/usr/bin/env perl
use 5.020; # why not?
use strict;
use warnings;
sub increasing_stream {
my $start = int($_[0]);
return sub {
$start += 1 + (rand(1) > 0.9);
};
}
my $stream = increasing_stream(rand(1000));
my $first = $stream->();
say $first;
while (1) {
my $next = $stream->();
say $next;
last unless $next == ++$first;
$first = $next;
}
say "Skipped: $first";
Output:
$ ./tyu.pl
381
382
383
384
385
386
387
388
389
390
391
392
393
395
Skipped: 394
Here's one bash solution (assuming the numbers are in a file, one per line):
sort -n numbers.txt | grep -n . |
grep -v -m1 '^\([0-9]\+\):\1$' | cut -f1 -d:
The first part sorts the numbers and then adds a sequence number to each one, and the second part finds the first sequence number which doesn't correspond to the number in the array.
Same thing, using sort and awk (bog-standard, no extensions in either):
sort -n numbers.txt | awk '$1!=NR{print NR;exit}'
Here is a slight variation on the theme set by other answers. Values coming in are not necessarily pre-sorted:
$ cat test
sort -nu <<END-OF-LIST |
1
5
2
4
6
END-OF-LIST
awk 'BEGIN { M = 1 } M > $1 { next } M == $1 { M++; next }
M < $1 { exit } END { print M }'
$ sh test
3
Notes:
If numbers are pre-sorted, do not bother with the sort.
If there are no missing numbers, the next higher number is output.
In this example, a here document supplies numbers, but one can use a file or pipe.
M may start greater than the smallest to ignore missing numbers below a threshold.
To auto-start the search at the lowest number, change BEGIN { M = 1 } to NR == 1 { M = $1 }.
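The same scan can be sketched as a bash function that stores the result in a variable, which is what the question asks for. This is an illustration under assumptions, not any answerer's code: the name first_missing is made up, the input is assumed to be integers passed as arguments, and the search starts at 1 as in the awk version above.

```shell
#!/usr/bin/env bash

# first_missing N1 N2 ...: smallest integer >= 1 that is not among the arguments.
first_missing() {
    local -i expected=1 n
    while read -r n; do
        if (( n < expected )); then
            continue                      # below the search range, ignore
        elif (( n == expected )); then
            expected=$(( expected + 1 ))  # present, look for the next one
        else
            break                         # gap found
        fi
    done < <(printf '%s\n' "$@" | sort -nu)
    echo "$expected"
}

# Hypothetical usage with the example array from the question:
#   array=(1 2 4 5 6)
#   number=$(first_missing "${array[@]}")
```

If there is no gap, the next higher number is printed, matching the behaviour described in the notes above.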

Manage Sub-variables in Bash

I want to manage subvariables in Bash. I can assign the subvariables, but I don't know how to use them:
#!/bin/bash
n=1
for lvl in 1 2;
do
export key$n="${RANDOM:0:2}"
let n=$n+1
done
for num in 1 2; do
echo $key$num
done
If I use echo $key$num, it prints the sequence of values of $num, not the random numbers.
Use arrays.
for n in 1 2; do
key[n]="${RANDOM:0:2}"
done
for num in 1 2; do
echo "${key[num]}"
done
See http://mywiki.wooledge.org/BashGuide/Arrays.
Also, in bash you'll generally do better counting from 0 instead of 1, and you don't need to export variables unless you want to run some other program that is going to look for them in its inherited environment.
You may use arrays (see @MarkReed), or use declare:
for n in 1 2; do
declare -- key$n="${RANDOM:0:2}"
done
for n in 1 2; do
v=$(declare -p key$n) ; v="${v#*=}" ; echo "${v//\"/}"
done
The same using functions:
key_set () # n val
{
declare -g -- key$1=$2
}
key_get () # n
{
local v=$(declare -p key$1) ; v="${v#*=}" ; echo "${v//\"/}"
}
for n in 1 2; do
key_set $n "${RANDOM:0:2}"
done
for n in 1 2; do
key_get $n
done
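Another option, not mentioned in the answers above, is Bash's indirect expansion ${!name}, which reads the variable whose name is stored in name and avoids parsing the output of declare -p. The sketch below uses printf -v to assign to a computed variable name; the helper names set_key and get_key are made up for illustration. Arrays remain the simpler choice when nothing forces separate variables.

```shell
#!/usr/bin/env bash

# set_key N VALUE: assign VALUE to the variable keyN.
set_key() {
    printf -v "key$1" '%s' "$2"
}

# get_key N: print the value of the variable keyN via indirect expansion.
get_key() {
    local name="key$1"
    printf '%s\n' "${!name}"
}

# Hypothetical usage, mirroring the loops in the question:
#   for n in 1 2; do set_key "$n" "${RANDOM:0:2}"; done
#   for n in 1 2; do get_key "$n"; done
```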

Resources