Java String.hashCode() implementation in Bash - bash

I am trying to implement the String.hashCode() function in Bash, but I couldn't figure out the bug.
This is my sample implementation:
function hashCode(){ #similar function to java String.hashCode()
foo=$1
echo $foo
h=0
for (( i=0; i<${#foo}; i++ )); do
val=$(ord ${foo:$i:1})
echo $val
if ((31 * h + val > 2147483647))
then
h=$((-2147483648 + (31 * h + val) % 2147483648 ))
elif ((31 * h + val < -2147483648))
then
h=$(( 2147483648 - ( 31 * h + val) % 2147483648 ))
else
h=$(( 31 * h + val))
fi
done
printf %d $h
}
function ord() { #asci to int conversion
LC_CTYPE=C printf %d "'$1"
}
The Java function looks like this:
public int hashCode() {
int h = hash;
if (h == 0 && value.length > 0) {
char val[] = value;
for (int i = 0; i < value.length; i++) {
h = 31 * h + val[i];
}
hash = h;
}
return h;
}
The expected output for the string "__INDEX_STAGING_DATA__0_1230ee6d-c37a-46cf-821c-55412f543fa6" is 1668783629, but the actual output is -148458597.
Note: I have to handle Java int overflow and underflow.

Vinujan, your code is working for the purpose of hashing a given string using the algorithm you have included. You do not need the ord function as you can cause the literal conversion to ASCII value with printf -v val "%d" "'${foo:$i:1}" (unless you need the LC_CTYPE=C for character set differences).
For example, with just minor tweaks to your code, it will hash the string "hello" properly:
#!/bin/bash
function hashCode() {
local foo="$1"
local -i h=0
for ((i = 0; i < ${#foo}; i++)); do
printf -v val "%d" "'${foo:$i:1}" # val is ASCII val
if ((31 * h + val > 2147483647)) # hash scheme
then
h=$((-2147483648 + (31 * h + val) % 2147483648 ))
elif ((31 * h + val < -2147483648))
then
h=$(( 2147483648 - ( 31 * h + val) % 2147483648 ))
else
h=$(( 31 * h + val))
fi
done
printf "%d" $h # final hashCode in decimal
}
hash=$(hashCode "$1")
printf "\nhashCode: 0x%02x (%d decimal)\n" $hash $hash
Example Use/Output
$ bash hashcode.sh hello
hashCode: 0x5e918d2 (99162322 decimal)
Where you appear to have problems is in the hashing algorithm itself. For example, a longer string like password makes your scheme return a negative 64-bit value that looks suspect, e.g.:
$ bash hashcode.sh password
hashCode: 0xffffffffb776462d (-1216985555 decimal)
This may be your intended hash, I have nothing to compare the algorithm against. Look things over, and if you still have problems, edit your question and describe exactly what problems/error/etc. you are getting when you run the script and add that output to your question.
Edit of Hash Function for Better Behavior
Without an algorithm to implement, the only thing I can do is to reformulate the algorithm you provided to be better behaved when the calculations exceed INT_MAX/INT_MIN. Looking at your existing algorithm, it appeared to make the problems worse as large numbers were encountered rather than smoothing the values to ensure they remained within the bounds.
Frankly, it looked like you had omitted subtracting INT_MIN or adding INT_MAX to h before reducing the value modulo 2147483648 when it exceeded/fell below those limits (e.g. you forgot the parentheses around the subtraction and addition). Simply adding that to the hash algorithm seemed to produce better behavior and your desired output.
I also save the result of your hash calculation in hval, so that it is not computed multiple times each loop, e.g.
function hashCode() {
local foo="$1"
local -i h=0
for ((i = 0; i < ${#foo}; i++)); do
printf -v val "%d" "'${foo:$i:1}" # val is ASCII val
hval=$((31 * h + val))
if ((hval > 2147483647)) # hash scheme
then
h=$(( (hval - 2147483648) % 2147483648 ))
elif ((hval < -2147483648))
then
h=$(( (hval + 2147483648) % 2147483648 ))
else
h=$(( hval ))
fi
done
printf "%d" $h # final hashCode in decimal
}
New Values
Note the hash for "hello" remains the same (as you would expect), but the value for "password" is now better behaved and returns what you would expect instead of some sign-extended 64-bit value. E.g.,
$ bash hashcode2.sh hello
hashCode: 0x5e918d2 (99162322 decimal)
$ bash hashcode2.sh password
hashCode: 0x4889ba9b (1216985755 decimal)
And note, it does produce your expected output:
$ bash hashcode2.sh "__INDEX_STAGING_DATA__0_1230ee6d-c37a-46cf-821c-55412f543fa6"
hashCode: 0x63779e0d (1668783629 decimal)
Let me know if that is more what you were attempting to do.
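For comparison, here is a minimal sketch of another way to get Java's wraparound. It assumes a bash with 64-bit arithmetic and ASCII input. Instead of branching on the limits, it masks the running hash to its low 32 bits on every step and reinterprets bit 31 as the sign at the end. The name java_hash_code is made up for this illustration.

```shell
#!/bin/bash
# Sketch: emulate Java's 32-bit int wraparound with a mask.
# Assumes bash integers are 64-bit and the input is ASCII.
java_hash_code() {
    local s=$1 val i
    local -i h=0
    for (( i = 0; i < ${#s}; i++ )); do
        printf -v val '%d' "'${s:i:1}"          # character code of s[i]
        h=$(( (31 * h + val) & 0xFFFFFFFF ))    # keep only the low 32 bits
    done
    # A value with bit 31 set is negative in Java's two's-complement int.
    (( h >= 0x80000000 )) && h=$(( h - 0x100000000 ))
    printf '%d\n' "$h"
}
```

Calling java_hash_code hello prints 99162322, the same value as the scripts above. Because 31 * h + val stays far below 2^63 when h is at most 32 bits wide, the intermediate result never overflows bash's own integers.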

I got a lean solution:
hashCode() {
o=$1
h=0
for j in $(seq 1 ${#o})
do
a=$((j-1))
c=${o:$a:1}
v=$(echo -n "$c" | od -d)
i=${v:10:3}
h=$((31 * $h + $i ))
# echo -n a $a c $c i $i h $h
h=$(( (2**31-1) & $h ))
# echo -e "\t"$h
done
echo $h
}
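which was wrong at first. :) The error was in my clever use of bitwise XOR, (2**31-1) ^ $h; a bitwise AND seems a bit wiser: (2**31-1) & $h. Note that this mask keeps only the low 31 bits, so the result is never negative, whereas Java's int hash can be.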
This might be condensed to:
hashCode() {
o=$1
h=0
for j in $(seq 1 ${#o})
do
v=$(echo -n "${o:$((j-1)):1}" | od -d)
h=$(( (31 * $h + ${v:10:3}) & (2**31-1) ))
done
echo $h
}

Related

How to calculate Adler32 checksum for zip in Bash?

I need to get the Adler32 checksum of a file and store it in a variable in bash.
It will be used in an automated script, so it would be useful if no additional app/library had to be installed by the user.
Is it possible to get this value with common/basic Bash commands?
This is monumentally slow (about 60,000 times slower than C), but it shows that yes, it is possible.
#!/bin/bash
sum1=1
sum2=0
while LANG=C IFS= read -r -d '' -n 1 ch; do
printf -v val '%d\n' "'$ch"
(( val = val < 0 ? val + 256 : val, sum1 = (sum1 + val) % 65521, sum2 = (sum2 + sum1) % 65521 ))
done
(( adler = sum1 + 65536 * sum2 ))
echo $adler
Hopefully someone who actually knows bash could vastly improve on this.
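Much of the time in a loop like this goes into reading one character at a time and calling printf for each one. Here is a sketch of one way to cut that down, assuming od is available: let od turn the bytes into decimal numbers and keep only the arithmetic in bash. The function name adler32_od is made up for this example, and it is still far slower than a C implementation.

```shell
#!/bin/bash
# Sketch: Adler-32 of a file, with od doing the byte-to-number conversion.
adler32_od() {
    local file=$1
    # -An: no offsets, -v: do not collapse repeated lines, -tu1: unsigned bytes
    od -An -v -tu1 < "$file" | {
        local -a bytes
        local -i a=1 b=0 byte
        while read -r -a bytes; do              # one line of od output at a time
            for byte in "${bytes[@]}"; do
                (( a = (a + byte) % 65521, b = (b + a) % 65521 ))
            done
        done
        printf '%08x\n' $(( (b << 16) | a ))    # high half is b, low half is a
    }
}
```

Reading through od also sidesteps the problems that read has with NUL bytes and locale-dependent characters, which matters for binary input such as a zip file. The result is printed as eight hex digits; use '%d' in the final printf for decimal.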
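Maybe this solution, if Python is available? It reads the file in binary mode so that the checksum covers the file's contents rather than its name:
python3 -c 'import sys, zlib; print(zlib.adler32(open(sys.argv[1], "rb").read()))' "$file"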
I tried two Adler-32 bash functions, one with an ordinal lookup dictionary and one with printf.
I also tried some bit shifting: instead of sum1=(sum1+val)%65521, something like temp=(sum1+val), sum1=((temp >> 16) * 15 + (temp & 65535)) % 65521.
I wasn't able to improve it much; perhaps somebody knows a faster way.
The last function is an awk function; it is the fastest and also works on files.
#!/bin/bash
a=$'Hello World'; b=""
for ((i=0;i<1000;i++)); do b+=$a; done
#-- building associative array ord byte character array
declare -Ai ordCHAR=()
for ((i=1;i<256;i++)); do printf -v hex "%x" $i; printf -v char "\x"$hex; ordCHAR[$char]=$i; done
unset hex char i
#-- building associative array ord byte character array -- END
#-- with dictionary
function adler32_A ()
{
local char; local -i sum1=1 sum2=0 val
LC_ALL=C; while read -rN 1 char; do
val=${ordCHAR[$char]};
((sum1=(sum1+val) % 65521, sum2 = (sum2 + sum1) % 65521 ))
done <<< $1
#-- removing 0A=\n addition, because of here string
(( sum2-=sum1, sum2<0 ? sum2+=65521 :0, sum1-=val, sum1<0 ? sum1+=65521 :0 ));
printf "%08x" $(( (sum2 << 16) + sum1 ))
LC_ALL=""
}
#-- with printf
function adler32_B ()
{
local char; local -i sum1=1 sum2=0 val
LC_ALL=C; while read -rN 1 char;
do
printf -v val '%d' "'$char"
(( sum1 = (sum1 + val) % 65521, sum2 = (sum2 + sum1) % 65521 ))
done <<< $1
#-- removing 0A=\n addition, because of here string
(( sum2-=sum1, sum2<0 ? sum2+=65521 :0, sum1-=val, sum1<0 ? sum1+=65521 :0 ));
printf "%x" $((sum1 + 65536 * sum2 ))
LC_ALL=""
}
#-- call adler32_awk [text STR] [evaluate text as path bINT]
function adler32_awk ()
{
local -i bPath=$2;
awk -b \
' BEGIN {RS="^$"; bPath='"$bPath"'; for (i=0;i<256;i++) charOrdARR[sprintf("%c",i)]=i; A=1; B=0;}
{
recordSTR=substr($0,1,length($0)-1); if (bPath) {getline byte_data < recordSTR; close(recordSTR);} else byte_data=recordSTR;
l=length(byte_data); for (i=1;i<=l;i++) {
A+=charOrdARR[substr(byte_data,i,1)]; if (A>65520) A-=65521;
B+=A; if (B>65520) B-=65521;}
printf "%x", lshift(B,16)+A; }
' <<<$1
}
time adler32_A "$b"
time adler32_B "$b"
#-- adler 32 of file -> adler32_awk "/home/.../your file" 1
time adler32_awk "$b"

Generate all possible ipv4 addresses using seq?

When using seq to generate an IP address, I use seq 0 255 and it generates the last octet. How can I extend this so that it generates all the other octets and their possible combinations (over 4 billion combinations)? Any help to get started would be appreciated.
If you were looking for a bash solution:
for h in {1..255}; do for i in {1..255}; do for j in {1..255}; do for k in {1..255}; do echo "$h.$i.$j.$k"; done; done; done; done
Or the multi-line version
for h in {1..255}
do for i in {1..255}
do for j in {1..255}
do for k in {1..255}
do echo "$h.$i.$j.$k"
done
done
done
done
Or if you are really intent on using seq
for h in `seq 255`; do for i in `seq 255`; do for j in `seq 255`; do for k in `seq 255`; do echo "$h.$i.$j.$k"; done; done; done; done
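Note that the loops above start at 1, so addresses containing a 0 octet are skipped. As an alternative sketch, the four nested loops can be collapsed into a single counter: an IPv4 address is just a 32-bit number, and shifts and masks split it into octets. The function names int_to_ip and ip_range are made up for this example.

```shell
#!/bin/bash
# Sketch: treat an IPv4 address as a 32-bit number and split it into octets.
int_to_ip() {
    local -i n=$1
    printf '%d.%d.%d.%d\n' \
        $(( (n >> 24) & 255 )) $(( (n >> 16) & 255 )) \
        $(( (n >> 8) & 255 ))  $(( n & 255 ))
}

# Print `count` consecutive addresses, starting at the number `start`.
ip_range() {
    local -i start=$1 count=$2 n
    for (( n = start; n < start + count; n++ )); do
        int_to_ip "$n"
    done
}
```

For example, ip_range 3232235776 3 prints 192.168.1.0, 192.168.1.1 and 192.168.1.2, and ip_range 0 4294967296 would walk the whole address space. A single counter also makes it easy to resume from a given address or to split the range between several processes.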
With awk and four loops:
awk 'BEGIN{OFS="."; for(h=0;h<256;h++){for(i=0;i<256;i++){for(j=0;j<256;j++){for(k=0;k<256;k++){print h,i,j,k}}}}}'
With C and four loops:
Put this in a file named ipgen.c:
#include <stdio.h>
int main() {
int h, i, j, k;
for (h = 0; h < 256; h++) {
for (i = 0; i < 256; i++) {
for (j = 0; j < 256; j++) {
for (k = 0; k < 256; k++) {
printf("%d.%d.%d.%d\n", h, i, j, k);
}
}
}
}
return 0;
}
Compile it: gcc ipgen.c -o ipgen
Start it: ./ipgen
For speed I'd vote for a 4-way awk/for loop (per Cyrus comment), but thought I'd look at a recursive function:
tuple () {
local level=$1 # number of tuples to generate
local max=$2 # assume tuples are numbered 1 to max
local in=$3 # current tuple from parent
local pfx="." # for all but the topmost call we append a period on the front of our loop counter
[[ -z "${in}" ]] && pfx="" # topmost call appends no prefix
local i
local out
for (( i=1 ; i<=${max} ; i++ ))
do
out="${in}${pfx}${i}"
if [[ "${level}" -eq 1 ]] # if we've reached the bottom of our function calls
then
echo "${out}" # print our latest string
else
tuple $((level-1)) "${max}" "${out}" # otherwise recurse with the latest string
fi
done
}
To build a 3-tuple with values ranging from 1 to 2:
$ tuple 3 2
1.1.1
1.1.2
1.2.1
1.2.2
2.1.1
2.1.2
2.2.1
2.2.2
To build a 3-tuple with values ranging from 1 to 4:
$ tuple 3 4
1.1.1
1.1.2
1.1.3
1.1.4
1.2.1
...
4.3.4
4.4.1
4.4.2
4.4.3
4.4.4
To build a 4-tuple with values ranging from 1 to 255 (keep in mind this is NOT going to be fast since we're making a LOT of bash-level calls):
$ tuple 4 255
1.1.1.1
1.1.1.2
1.1.1.3
1.1.1.4
1.1.1.5
1.1.1.6
1.1.1.7
1.1.1.8
1.1.1.9
1.1.1.10
... if you let it run long enough ...
255.255.255.250
255.255.255.251
255.255.255.252
255.255.255.253
255.255.255.254
255.255.255.255

Getting overflow in bash script to compute the multinomial coefficients

I'm using bash to compute the multinomial coefficients. The code follows below:
#!/bin/bash
function factorial {
declare n=$1
(( n < 2 )) && echo 1 && return
echo $(( n * $(factorial $((n-1))) ))
}
function binomial {
declare n=$1
declare k=$2
echo $(( $(factorial $((n))) / ( $(factorial $((k))) * $(factorial $((n-k))) ) ))
}
function multinomial {
arr=("$@")
declare mcoeff=1
declare n=0
for k in "${arr[@]}";
do
((n=$n+$k))
((mcoeff=$mcoeff*$(binomial "$n" "$k")))
done
echo "$mcoeff"
}
multinomial "$@"
It seems I have an overflow in some situations.
$ ./multinomial.sh 4 5 6
630630
$ ./multinomial.sh 4 5 6 7
-119189070
Any idea how to fix this?
A shell is an environment from which to call tools with a language to sequence those calls. It is not meant to be a full-featured programming language nor is it meant for complicated calculations. Try this instead, I just translated your shell code into the equivalent awk:
$ cat multinomial.sh
#!/usr/bin/env bash
awk -v nums="$*" '
function factorial(n) {
if ( n < 2 ) {
return 1
}
return n * factorial(n-1)
}
function binomial(n,k) {
return factorial(n) / ( factorial(k) * factorial(n-k) )
}
function multinomial(str, arr, mcoeff, n, k) {
split(str,arr)
mcoeff = 1
n = 0
for (j=1; j in arr; j++) {
k = arr[j]
n = n + k
mcoeff = mcoeff * binomial(n,k)
}
return mcoeff
}
BEGIN { print multinomial(nums) }
'
$ ./multinomial.sh 4 5 6
630630
$ ./multinomial.sh 4 5 6 7
107550162720
$ ./multinomial.sh 4 5 6 7 8
629483036137955968
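If you want to stay in bash, the overflow in the original script comes from the factorials, not from the final result: 22! alone is larger than a signed 64-bit integer can hold. Here is a sketch that builds each binomial coefficient step by step instead, so the intermediate values stay close to the size of the result. It assumes 64-bit bash integers and reuses the function names from the question.

```shell
#!/bin/bash
# Sketch: binomial(n, k) without factorials.
# After step i of the loop, r equals C(n-k+i, i).
binomial() {
    local -i n=$1 k=$2 r=1 i
    for (( i = 1; i <= k; i++ )); do
        r=$(( r * (n - k + i) / i ))    # this division is always exact
    done
    echo "$r"
}

multinomial() {
    local -i mcoeff=1 n=0 k
    for k in "$@"; do
        n=$(( n + k ))
        mcoeff=$(( mcoeff * $(binomial "$n" "$k") ))
    done
    echo "$mcoeff"
}
```

With this, multinomial 4 5 6 7 prints 107550162720. It still overflows silently once a result or an intermediate product exceeds 2^63-1. The awk version uses double-precision floats instead, so it keeps going for larger inputs but is only exact up to 2^53.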

How to round a large number in Shell command?

In Mac terminal, I would like to round a large number.
For example,
At 10^13th place:
1234567812345678 --> 1230000000000000
Or at 10^12th place:
1234567812345678 --> 1235000000000000
So I would like to specify the place, and then get the rounded number.
How do I do this?
You can use arithmetic expansion:
$ val=1234567812345678
$ echo $(( ${val: -13:1} < 5 ? val - val % 10**13 : val - val % 10**13 + 10**13 ))
1230000000000000
$ echo $(( ${val: -12:1} < 5 ? val - val % 10**12 : val - val % 10**12 + 10**12 ))
1235000000000000
This checks if the most significant removed digit is 5 or greater, and if it is, the last significant unremoved digit is increased by one; then we subtract the division remainder from the (potentially modified) initial value.
If you don't want to have to write it this way, you can wrap it in a little function:
round () {
echo $(( ${1: -$2:1} < 5 ? $1 - $1 % 10**$2 : $1 - $1 % 10**$2 + 10**$2 ))
}
which can then be used like this:
$ round "$val" 13
1230000000000000
$ round "$val" 12
1235000000000000
Notice that quoting $val isn't strictly necessary here, it's just a good habit.
If the one-liner is too cryptic, this is a more readable version of the same:
round () {
local rounded=$(( $1 - $1 % 10**$2 )) # Truncate
# Check if most significant removed digit is >= 5
if (( ${1: -$2:1} >= 5 )); then
(( rounded += 10**$2 ))
fi
echo $rounded
}
Apart from arithmetic expansion, this also uses parameter expansion to get a substring: ${1: -$2:1} stands for "take $1, count $2 from the back, take one character". There has to be a space before -$2 (or it has to be in parentheses) because otherwise it would be interpreted as a different expansion, checking if $1 is unset or null, which we don't want.
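The same rounding can also be done with arithmetic alone: add half of the rounding unit, then truncate with integer division. A minimal sketch, assuming non-negative integers that fit into bash's 64-bit arithmetic; the name round_half_up is made up for this example.

```shell
#!/bin/bash
# Sketch: round a non-negative integer to the 10^place position, half up.
round_half_up() {
    local -i value=$1 place=$2
    local -i unit=$(( 10 ** place ))
    # Adding unit/2 pushes values at or above the halfway point over the
    # next multiple of unit; integer division then drops the low digits.
    echo $(( (value + unit / 2) / unit * unit ))
}
```

round_half_up 1234567812345678 13 prints 1230000000000000 and round_half_up 1234567812345678 12 prints 1235000000000000, matching the results above.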
awk's [s]printf function can do rounding for you, within the limits of double-precision floating-point arithmetic:
$ for p in 13 12; do
awk -v p="$p" '{ n = sprintf("%.0f", $0 / 10^p); print n * 10^p }' <<<1234567812345678
done
1230000000000000
1235000000000000
For a pure bash implementation, see Benjamin W.'s helpful answer.
Actually, if you want to round to n significant digits you might be best served by mixing up traditional math and strings.
Serious debugging is left to the student, but this is what I quickly came up with for bash shell and hope MAC is close enough:
function rounder
{
local value=$1;
local digits=${2:-3};
local zeros="$( eval "printf '0%.0s' {1..$digits}" )"; #proper zeros
# a bit of shell magic that repeats the '0' $digits times.
if (( value > 1$zeros )); then
# large enough to require rounding
local length=${#value};
local digits_1=$(( digits + 1 )); #digits + 1
local tval="${value:0:$digits_1}"; #leading digits, plus one
tval=$(( 10#$tval + 5 )); #half-add
local tlength=${#tval}; #check if carried a digit
local zerox="";
if (( tlength > digits_1 )); then
zerox="0";
fi
local pad; printf -v pad '%0*d' $(( length - digits )) 0; #one zero per dropped digit
value="${tval:0:$digits}${pad}$zerox";
fi
echo "$value";
}
This can be done much shorter, but that's another exercise for the student.
Floating-point math is avoided because of its inherent problems.
Special cases, such as negative numbers, are not covered.

More simple math help in bash!

In the same thread as this question, I am giving this another shot and ask SO to help address how I should take care of this problem. I'm writing a bash script which needs to perform the following:
I have a circle in x and y with radius r.
I specify resolution which is the distance between points I'm checking.
I need to loop over x and y (from -r to r) and check if the current (x,y) is in the circle, but I loop over discrete i and j instead.
Then i and j need to go from -r/resolution to +r/resolution.
In the loop, what will need to happen is echo "some_text i*resolution j*resolution 15.95 cm" (note lack of $'s because I'm clueless). This output is what I'm really looking for.
My best shot so far:
r=40.5
resolution=2.5
end=$(echo "scale=0;$r/$resolution") | bc
for (( i=-end; i<=end; i++ ));do
for (( j=-end; j<=end; j++ ));do
x=$(echo "scale=5;$i*$resolution") | bc
y=$(echo "scale=5;$j*$resolution") | bc
if (( x*x + y*y <= r*r ));then <-- No, r*r will not work
echo "some_text i*resolution j*resolution 15.95 cm"
fi
done
done
I've had just about enough with bash and may look into ksh like was suggested by someone in my last question, but if anyone knows a proper way to execute this, please let me know! What ever the solution to this, it will set my future temperament towards bash scripting for sure.
The pipe into bc has to be inside the $(...). Instead of
end=$(echo "scale=0;$r/$resolution") | bc
use
end=$(echo "scale=0;$r/$resolution" | bc)
That should help a bit.
EDIT And here's a solution.
r=40.5
resolution=2.5
end=$(echo "scale=0;$r/$resolution" | bc)
for i in $(seq -${end} ${end}); do
for j in $(seq -${end} ${end}); do
x=$(echo "scale=5;$i*$resolution" | bc)
y=$(echo "scale=5;$j*$resolution" | bc)
check=$(echo "($x^2+$y^2)<=$r^2" | bc)
if [ ${check} -eq '1' ]; then
iRes=$(echo "$i*$resolution" | bc)
jRes=$(echo "$j*$resolution" | bc)
echo "some_text $iRes $jRes 15.95 cm"
fi
done
done
As already mentioned this problem is probably best solved using bc, awk, ksh or another scripting language.
Pure Bash. Simple problems which actually need floating point arithmetic sometimes can be transposed to some sort of fixed point arithmetic using only integers. The following solution simulates 2 decimal places after the decimal point.
There is no need for pipes and external processes inside the loops if this precision is sufficient.
factor=100 # 2 digits after the decimal point
r=4050 # the representation of 40.50
resolution=250 # the representation of 2.50
end=$(( (r/resolution)*factor )) # correct the result of the division
for (( i=-end; i<=end; i+=factor )); do
for (( j=-end; j<=end; j+=factor )); do
x=$(( (i*resolution)/factor )) # correct the result of the division
y=$(( (j*resolution)/factor )) # correct the result of the division
if [ $(( x*x + y*y )) -le $(( r*r )) ] ;then # no correction needed
echo "$x $y ... "
fi
done
done
echo -e "resolution = $((resolution/factor)).$((resolution%factor))"
echo -e "r = $((r/factor)).$((r%factor))"
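One detail to watch when printing fixed-point values: $((resolution%factor)) prints the remainder without zero padding, so a value such as 2.05 (stored as 205) would come out as 2.5, and negative values get a minus sign in both parts. A small sketch of a formatting helper, assuming the factor is a power of ten; the name fmt_fixed is made up for this example.

```shell
#!/bin/bash
# Sketch: print a scaled integer as a decimal number.
# Assumes factor is a power of ten (100 means two decimal places).
fmt_fixed() {
    local -i v=$1 factor=${2:-100}
    local sign=""
    (( v < 0 )) && { sign="-"; v=$(( -v )); }   # format the magnitude, add the sign back
    local -i digits=$(( ${#factor} - 1 ))       # "100" has 3 characters, so 2 decimals
    printf '%s%d.%0*d\n' "$sign" $(( v / factor )) "$digits" $(( v % factor ))
}
```

For example, fmt_fixed 4050 prints 40.50, fmt_fixed 205 prints 2.05 and fmt_fixed -250 prints -2.50, so it can also be used for the x and y values inside the loop.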
If you haven't used (g)awk yet, it is worth learning; it will benefit you in the long run. Here is a translation of your bash script to awk:
awk 'BEGIN{
r=40.5
resol=2.5
end = int(r/resol) # integer step count, like scale=0 in bc
print end
for (i=-end;i<=end;i++) {
for( j=-end;j<=end;j++ ){
x=sprintf("%.5f",i*resol) # %f keeps the fractional part; %d would truncate it
y=sprintf("%.5f",j*resol)
if ( x*x + y*y <= r*r ){
print ".......blah blah ......"
}
}
}
}'
It's looking more like a bc script than a Bash one anyway, so here goes:
#!/usr/bin/bc -q
/* -q suppresses a welcome banner - GNU extension? */
r = 40.5
resolution = 2.5
scale = 0
end = r / resolution
scale = 5
for ( i = -end; i <= end; i++ ) {
/* moved x outside the j loop since it only changes with i */
x = i * resolution
for ( j = -end; j <= end; j++ ) {
y = j * resolution
if ( x^2 + y^2 <= r^2 ) {
/*
the next few lines output on separate lines, the quote on
a line by itself causes a newline to be created in the output
numeric output includes newlines automatically
you can comment this out and uncomment the print statement
to use it which is a GNU extension
*/
/* */
"some_text
"
i * resolution
j * resolution
"15.95 cm
"
/* */
/* non-POSIX:
print "some_text ", i * resolution, " ", j * resolution, " 15.95 cm\n"
*/
}
}
}
quit
