Issue with precision of Ruby math operations - ruby

Do you know how to fix the following issue with math precision?
p RUBY_VERSION # => "1.9.1"
p 0.1%1 # => 0.1
p 1.1%1 # => 0.1
p 90.0%1 # => 0.0
p 90.1%1 # => 0.0999999999999943
p 900.1%1 # => 0.100000000000023
p RUBY_VERSION # => "1.9.2"
p 0.1%1 # => 0.1
p 1.1%1 # => 0.10000000000000009
p 90.0%1 # => 0.0
p 90.1%1 # => 0.09999999999999432
p 900.1%1 # => 0.10000000000002274

Big Decimal
As the man said;
Squeezing infinitely many real numbers into a finite number of bits requires an approximate representation.
I have however had great success using the BigDecimal class. To quote its intro
Ruby provides built-in support for arbitrary precision integer arithmetic. For example:
42**13 -> 1265437718438866624512
BigDecimal provides similar support for very large or very accurate floating point numbers.
Taking one of your examples;
>> x = BigDecimal('900.1')
=> #<BigDecimal:101113be8,'0.9001E3',8(8)>
>> x % 1
=> #<BigDecimal:10110b498,'0.1E0',4(16)>
>> y = x % 1
=> #<BigDecimal:101104760,'0.1E0',4(16)>
>> y.to_s
=> "0.1E0"
>> y.to_f
=> 0.1
As you can see, ensuring decent precision is possible but it requires a little bit of effort.
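For newer Rubies, here is a minimal sketch of the same idea. BigDecimal.new was deprecated and later removed from the bigdecimal library, so this uses the Kernel#BigDecimal conversion method instead. The helper name fractional_part is only illustrative and is not part of any library.

```ruby
require 'bigdecimal'

# Hypothetical helper: parse the decimal string exactly, then take the
# remainder modulo 1. Passing a String avoids going through a binary Float.
def fractional_part(decimal_string)
  BigDecimal(decimal_string) % 1
end

frac = fractional_part('900.1')
puts frac.to_s('F')              # plain notation rather than "0.1e0"
puts frac == BigDecimal('0.1')   # exact comparison between two BigDecimals
```

The important detail is that the value is built from a String; building it from a Float literal would carry the binary rounding error along.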

This is true of all computer languages, not just Ruby. It's a feature of representing floating point numbers on binary computers:
What Every Computer Scientist Should Know About Floating Point Arithmetic

Writing 0.1 into a floating point number will always result in a rounding error, because 0.1 has no finite binary representation. If you want a 'precise' decimal representation, you should use the BigDecimal class.
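To see the rounding for yourself, a small sketch using only core methods (Float#to_r and format) can print the value that is actually stored when you write 0.1:

```ruby
# Float#to_r returns the exact rational value of the stored binary double.
stored = 0.1.to_r
puts stored                        # numerator/denominator of the stored value
puts stored.denominator == 2**55   # the denominator is a power of two
puts stored == Rational(1, 10)     # false: 1/10 itself is not representable
puts format('%.20f', 0.1)          # extra digits reveal the rounding error
```

Every finite Float is a fraction whose denominator is a power of two, which is why 1/10 can only be approximated.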

Related

What determines the length of an Integer in Ruby?

I have been wondering what could be the maximum length of Integer before it gets to Float::INFINITY.
On my 64 bit (Arch Linux) system:
> 1023.**(3355446).bit_length
# => 33549731
> 1023.**(3355446).+(1000000 ** 1000000).+(1000 ** 100).bit_length
# => 33549731
In fact:
> a = 1023.**(3355446) ; ''
# => ""
> b = 1023.**(3355446).+(1000000 ** 1000000).+(1000 ** 100) ; ''
# => ""
> a.to_s.length == b.to_s.length
# => true
The above takes some time, but this one doesn't
a, b, length_of = 1023.**(3355446), 1023.**(3355446).+(1000000 ** 1000000).+(1000 ** 100), lambda { |x| Math.log10(x).to_i.next } ; ''
# => ""
length_of.(a).eql?(length_of.(b))
# => true
Thus, if you are running a program, which has an infinite loop and a counter which increases many hundreds or thousands of times a second, and you have to run it 24 * 365, that may cause bugs I think.
So the question is what determines the length of the Integer in Ruby? Does it differ on 32 bit and 64 bit systems?
Edit:
On my Raspberry Pi 3 Model B:
2.**(31580669).bit_length
# => 31580670
2.**(31580669).next.bit_length
# => 31580670
> l = ->(x) { Math.log10(x).to_i.next }
# => #<Proc:0x00a46df0#(irb):1 (lambda)>
> l === 2.**(31580669)
# => 9506729
> l === 2.**(31580669) + 100 ** 100
# => 9506729
So on Ruby 2.3 and older the question would be how big a Bignum can be. From Ruby 2.4 on, the question is how big an Integer can be.
What could be the maximum length of Integer before it gets to Float::INFINITY.
Integer operations in Ruby will (almost) never return Infinity. An Integer can be as big as you have memory to hold it.
Float is implemented as a classic double-precision floating-point number with an upper limit of about 1.7976931348623157e+308, and it will return Float::INFINITY if you go too high.
1.7976931348623157e+308.to_f + 10**307
=> Infinity
Some languages, like Perl 5, upgrade integers to doubles to get more space to work. So you will get Infinity if you go too high.
$ perl -wle 'printf "%f\n", 10**308'
100000000000000001097906362944045541740492309677311846336810682903157585404911491537163328978494688899061249669721172515611590283743140088328307009198146046031271664502933027185697489699588559043338384466165001178426897626212945177628091195786707458122783970171784415105291802893207873272974885715430223118336.000000
$ perl -wle 'printf "%f\n", 10**308 + 10**308'
Inf
But Ruby's Integers have no limit but your memory. When Integers get too large they switch to using the GNU Multiple Precision Arithmetic Library which supports arbitrary precision arithmetic.
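A short sketch of that difference, using only core Integer and Float methods (the variable names are just for illustration):

```ruby
# An Integer keeps growing past the 64-bit boundary without losing precision.
big = 2**64
big_plus_one = big + 1
puts big_plus_one - big           # 1: the increment is not lost
puts big_plus_one.bit_length      # 65

# A Float saturates at Infinity instead of growing.
overflowed = Float::MAX * 2
puts overflowed.infinite?.inspect # 1 means positive infinity

# Floats also stop representing every integer above 2**53.
puts((2.0**53 + 1) == 2.0**53)    # true: the +1 is rounded away
```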
There are a few operations which can result in Infinity, like power.
10**10000000
(irb):5: warning: in a**b, b may be too big
=> Infinity
But multiplication has no such limit.
a = 10**1000000
...
a *= a
...
a *= a
...
a *= a
...
a.bit_length
=> 26575425
Thus, if you are running a program, which has an infinite loop and a counter which increases many hundreds or thousands of times a second, and you have to run it 24 * 365, that may cause bugs I think.
This is a real world concern for 32 bit integers which becomes a pressing problem as 2038 approaches, but not 64 bit integers. If we incremented a counter a million times a second it would take almost 300,000 years. What I've just described is 64 bit time with microsecond resolution.
But in Ruby you can make a simple counter effectively as large as you want.
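The arithmetic behind those figures can be sketched as follows. The helper years_until_overflow is hypothetical, and it assumes a signed fixed-width counter and 365-day years:

```ruby
# Hypothetical helper: how many whole years until a signed counter of the
# given width overflows, if it is incremented at a steady rate.
def years_until_overflow(bits, increments_per_second)
  seconds_per_year = 60 * 60 * 24 * 365
  max_signed = 2**(bits - 1) - 1
  max_signed / (increments_per_second * seconds_per_year)
end

puts years_until_overflow(32, 1)          # 68 years of seconds: 1970 + 68 = 2038
puts years_until_overflow(64, 1_000_000)  # a little under 300,000 years
```

A Ruby Integer has no fixed width at all, so a plain counter never hits either limit.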

Why is 10^9942066 the biggest power I can calculate without overflows?

In ruby, some large numbers are larger than infinity. Through binary search, I discovered:
(1.0/0) > 10**9942066.000000001 # => false
(1.0/0) > 10**9942066 # => true
RUBY_VERSION # => "2.3.0"
Why is this? What is special about 10^9942066? It doesn't seem to be an arbitrary number like 9999999, and it is not close to any power of two (it's approximately equivalent to 2^33026828.36662442).
Why isn't Ruby's infinity infinite? How is 10^9942066 involved?
I now realize that any number greater than 10^9942066 will overflow to infinity:
10**9942066.000000001 #=> Infinity
10**9942067 #=> Infinity
But that still leaves the question: Why 10^9942066?
TL;DR
I did the calculations done inside numeric.c's int_pow manually, checking where an integer overflow (and a propagation to Bignum's, including a call to rb_big_pow) occurs. Once the call to rb_big_pow happens there is a check whether the two intermediate values you've got in int_pow are too large or not, and the cutoff value seems to be just around 9942066 (if you're using a base of 10 for the power). Approximately this value is close to
BIGLEN_LIMIT / ceil(log2(base^n)) * n ==
32*1024*1024 / ceil(log2(10^16)) * 16 ==
32*1024*1024 / 54 * 16 ~=
9942054
where BIGLEN_LIMIT is an internal limit in ruby which is used as a constant to check if a power calculation would be too big or not, and is defined as 32*1024*1024. base is 10, and n is the largest power-of-2 exponent for the base that would still fit inside a Fixnum.
Unfortunately I can't find a better way than this approximation, due to the algorithm used to calculate powers of big numbers, but it might be good enough to use as an upper limit if your code needs to check validity before doing exponentiation on big numbers.
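As a rough sketch, the approximation above can be written in Ruby like this. The method name approx_pow_cutoff and its default arguments are my own; the 62-bit Fixnum width assumes a 64-bit build, and Integer#bit_length stands in for ceil(log2(...)), which gives the same value as long as the base is not a power of two.

```ruby
# Hypothetical helper implementing the approximation
#   BIGLEN_LIMIT / ceil(log2(base^n)) * n
# where n is the largest power-of-2 exponent with base^n still in a Fixnum.
def approx_pow_cutoff(base, biglen_limit = 32 * 1024 * 1024, fixnum_bits = 62)
  n = 1
  # Keep squaring the base while the squared value still fits.
  n *= 2 while (base**(n * 2)).bit_length <= fixnum_bits
  bits = (base**n).bit_length
  (biglen_limit.to_f / bits * n).round
end

puts approx_pow_cutoff(10)   # close to the observed cutoff of 9942066
puts approx_pow_cutoff(12)
```

This only estimates the cutoff; the exact value depends on how the exponent splits during the fast exponentiation.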
Original question:
The problem is not with 9942066, but with one of your numbers being an integer and the other one being a float. So
(10**9942066).class # => Bignum
(10**9942066.00000001).class # => Float
The first one is representable internally as a specific number, which is smaller than Infinity. The second one, being a float, is too large to be represented by an actual number and is simply replaced by Infinity, which is of course not larger than Infinity.
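The same effect shows up with a much smaller exponent, which makes it cheap to try. This is only a sketch, and on Ruby 2.4+ the class reported is Integer rather than Bignum:

```ruby
int_power   = 10**400     # Integer exponent: exact arbitrary-precision Integer
float_power = 10**400.0   # Float exponent: the result is a Float, which overflows

puts int_power.is_a?(Integer)         # true
puts float_power.infinite?.inspect    # 1 means positive infinity
puts Float::INFINITY > int_power      # true: a finite Integer is below Infinity
puts Float::INFINITY > float_power    # false: Infinity is not greater than itself
```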
Updated question:
You are right that there seem to be some difference around 9942066 (if you're using a 64-bit ruby under Linux, as the limits might be different under other systems). While ruby does use the GMP library to handle big numbers, it does some precheck before even going to GMP, as shown by the warnings you can receive. It will also do the exponentiation manually using GMP's mul commands, without calling GMP's pow functions.
Fortunately the warnings are easy to catch:
irb(main):010:0> (10**9942066).class
=> Bignum
irb(main):005:0> (10**9942067).class
(irb):5: warning: in a**b, b may be too big
=> Float
And then you can actually check where these warnings are emitted inside ruby's bignum.c library.
But first we need to get to the Bignum realm, as both of our numbers are simple Fixnums. The initial part of the calculation, and the "upgrade" from Fixnum to Bignum, is done inside numeric.c. Ruby does quick exponentiation, and at every step it checks whether the result would still fit into a Fixnum (which is 2 bits less than the system bitsize: 62 bits on a 64 bit machine). If not, it converts the values to the Bignum realm and continues the calculations there. We are interested in the point where this conversion happens, so let's try to figure out when it does in our 10^9942066 example (I'm using the x, y, z variables as present inside Ruby's numeric.c code):
x = 10^1 z = 10^0 y = 9942066
x = 10^2 z = 10^0 y = 4971033
x = 10^2 z = 10^2 y = 4971032
x = 10^4 z = 10^2 y = 2485516
x = 10^8 z = 10^2 y = 1242758
x = 10^16 z = 10^2 y = 621379
x = 10^16 z = 10^18 y = 621378
x = OWFL
At this point x will overflow (10^32 > 2^62-1), so the process will continue on the Bignum realm by calculating x**y, which is (10^16)^621378 (which are actually still both Fixnums at this stage)
If you now go back to bignum.c and check how it determines if a number is too large or not, you can see that it checks the number of bits required to hold x, and multiplies this number by y. If the result is larger than 32*1024*1024, it fails (it emits a warning and does the calculation using basic floats).
(10^16) is 54 bits (ceil(log_2(10^16)) == 54), 54*621378 is 33554412. This is only slightly smaller than 33554432 (by 20), the limit after which ruby will not do Bignum exponentiation, but simply convert y to double, and hope for the best (which will obviously fail, and just return Infinity)
Now let's try to check this with 9942067:
x = 10^1 z = 10^0 y = 9942067
x = 10^1 z = 10^1 y = 9942066
x = 10^2 z = 10^1 y = 4971033
x = 10^2 z = 10^3 y = 4971032
x = 10^4 z = 10^3 y = 2485516
x = 10^8 z = 10^3 y = 1242758
x = 10^16 z = 10^3 y = 621379
x = 10^16 z = OWFL
Here, at the point z overflows (10^19 > 2^62-1), the calculation will continue in the Bignum realm, and will calculate x**y. Note that here it will calculate (10^16)^621379, and while (10^16) is still 54 bits, 54*621379 is 33554466, which is larger than 33554432 (by 34). As it's larger you'll get the warning, and Ruby will only do the calculation using doubles, hence the result is Infinity.
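The two traces above can be reproduced with a small simulation. To be clear, this is a simplified model of the steps as described here, not a translation of the C code: the method names are hypothetical, the 62-bit Fixnum limit assumes a 64-bit build, and the actual overflow checks in numeric.c are written differently.

```ruby
# Simulate the fast exponentiation until x or z would leave Fixnum range.
# Returns [x, y] as handed over to the Bignum power routine, or nil if the
# whole computation fits in a Fixnum.
def bignum_handoff(base, exponent, fixnum_max = 2**62 - 1)
  x, z, y = base, 1, exponent
  while y > 0
    if y.even?
      return [x, y] if x * x > fixnum_max   # squaring x would overflow
      x *= x
      y /= 2
    else
      return [x, y] if x * z > fixnum_max   # multiplying into z would overflow
      z *= x
      y -= 1
    end
  end
  nil
end

# Apply the size check described above: bits(x) * y against BIGLEN_LIMIT.
def pow_too_big?(base, exponent, biglen_limit = 32 * 1024 * 1024)
  x, y = bignum_handoff(base, exponent)
  return false if x.nil?
  x.bit_length * y > biglen_limit
end

p bignum_handoff(10, 9942066)   # [10000000000000000, 621378]
p pow_too_big?(10, 9942066)     # false
p bignum_handoff(10, 9942067)   # [10000000000000000, 621379]
p pow_too_big?(10, 9942067)     # true
```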
Note that these checks are only done if you are using the power function. That's why you can still do (10**9942066)*10, as similar checks are not present when doing plain multiplication, meaning you could implement your own quick exponentiation method in ruby, in which case it will still work with larger values, although you won't have this safety check anymore. See for example this quick implementation:
def unbounded_pow(x,n)
  if n < 0
    x = 1.0 / x
    n = -n
  end
  return 1 if n == 0
  y = 1
  while n > 1
    if n.even?
      x = x*x
      n = n/2
    else
      y = x*y
      x = x*x
      n = (n-1)/2
    end
  end
  x*y
end
puts (10**9942066) == (unbounded_pow(10,9942066)) # => true
puts (10**9942067) == (unbounded_pow(10,9942067)) # => false
puts ((10**9942066)*10) == (unbounded_pow(10,9942067)) # => true
But how would I know the cutoff for a specific base?
My math is not exactly great, but I can tell a way to approximate where the cutoff value will be. If you check the above calls you can see the conversion between Fixnum and Bignum happens when the intermediate base reaches the limit of Fixnum. The intermediate base at this stage will always have an exponent which is a power of 2, so you just have to maximize this value. For example let's try to figure out the maximum cutoff value for 12.
First we have to check what is the highest base we can store in a Fixnum:
ceil(log2(12^1)) = 4
ceil(log2(12^2)) = 8
ceil(log2(12^4)) = 15
ceil(log2(12^8)) = 29
ceil(log2(12^16)) = 58
ceil(log2(12^32)) = 115
We can see 12^16 is the max we can store in 62 bits, or if we're using a 32 bit machine 12^8 will fit into 30 bits (ruby's Fixnums can store values up to two bits less than the machine size limit).
For 12^16 we can easily determine the cutoff value. It will be 32*1024*1024 / ceil(log2(12^16)), which is 33554432 / 58 ~= 578525. We can easily check this in ruby now:
irb(main):004:0> ((12**16)**578525).class
=> Bignum
irb(main):005:0> ((12**16)**578526).class
(irb):5: warning: in a**b, b may be too big
=> Float
Now we have to go back to our original base of 12. There the cutoff will be around 578525*16 (16 being the exponent of the new base), which is 9256400. If you check in Ruby, the values are actually quite close to this number:
irb(main):009:0> (12**9256401).class
=> Bignum
irb(main):010:0> (12**9256402).class
(irb):10: warning: in a**b, b may be too big
=> Float
Note that the problem is not with the number but with the operation, as told by the warning you get.
$ ruby -e 'puts (1.0/0) > 10**9942067'
-e:1: warning: in a**b, b may be too big
false
The problem is 10**9942067 breaks Ruby's power function. Instead of throwing an exception, which would be a better behavior, it erroneously results in infinity.
$ ruby -e 'puts 10**9942067'
-e:1: warning: in a**b, b may be too big
Infinity
The other answer explains why this happens near 10**9942067.
10**9942067 is not greater than infinity, it is erroneously resulting in infinity. This is a bad habit of a lot of math libraries that makes mathematicians claw their eyeballs out in frustration.
Infinity is not greater than infinity, they're equal, so your greater than check is false. You can see this by checking if they're equal.
$ ruby -e 'puts (1.0/0) == 10**9942067'
-e:1: warning: in a**b, b may be too big
true
Contrast this with specifying the number directly using scientific notation. Now Ruby doesn't have to do math on huge numbers: the literal is a Float that is out of range, so it is read as Infinity, and the comparison is false for the same reason.
$ ruby -e 'puts (1.0/0) > 10e9942067'
false
Now you can put on as big an exponent as you like.
$ ruby -e 'puts (1.0/0) > 10e994206700000000000000000000000000000000'
false

Internal rounding woes: accurate way to sum Ruby floating point numbers?

This is of course broken:
(0.1 + 0.1 + 0.1) => 0.30000000000000004
(0.1 + 0.1 + 0.1) == 0.3 # false
I don't need a perfect sum, just good enough to say two Floats are the same value. The best I can figure out is to multiply both sides of the equation and round. Is this the best way?
((0.1 + 0.1 + 0.1) * 1000).round == (0.3 * 1000).round
UPDATE: I'm stuck on Ruby v1.8.7.
There is a difference between summing accurately and comparing effectively. You say you want the former, but it looks like you want the latter. The underlying Ruby float arithmetic is IEEE and has sensible semantics for minimizing accumulated error, but there will always be some error when using a representation that can't exactly represent all values. To accurately model error, FP addition shouldn't produce an exact value; it should produce an interval, and further additions should operate on intervals.
In practice, many applications don't need to have detailed accounting for error, they just need to do their calculation and be aware that comparisons aren't exact and output decimal representations should be rounded.
Here's a simple extension to Float that will help you out with comparison. It or something like it should be in the stdlib, but ain't.
class Float
  def near_enough?(other, epsilon = 1e-6)
    (self - other.to_f).abs < epsilon.to_f
  end
end
pry(main)> (0.1 + 0.1 + 0.1).near_enough?(0.3)
=> true
pry(main)> (0.1 + 0.1 + 0.1).near_enough?(0.3, 1e-17)
=> false
pry(main)> ( [0.1] * (10**6) ).reduce(:+).near_enough?(10**5, 1e-5)
=> true
pry(main)> ( [0.1] * (10**6) ).reduce(:+).near_enough?(10**5)
=> false
Picking an appropriate epsilon can be tricky in the general case. You should read What Every Computer Scientist Should Know About Floating-Point Arithmetic. I've found Bruce Dawson's floating point tricks blogs excellent, here's his chapter on Comparing Floating Point Numbers
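One common way around a fixed epsilon is to scale the tolerance with the magnitude of the operands. The sketch below is not from any library: nearly_equal? and its default tolerances are assumptions you would tune for your data, and it uses positional defaults so it should also work on older Rubies.

```ruby
# Hypothetical helper: compare with a tolerance relative to the larger
# operand, plus an optional absolute floor for values near zero.
def nearly_equal?(a, b, rel_tol = 1e-9, abs_tol = 0.0)
  diff = (a - b).abs
  diff <= [rel_tol * [a.abs, b.abs].max, abs_tol].max
end

puts nearly_equal?(0.1 + 0.1 + 0.1, 0.3)                   # true
puts nearly_equal?(([0.1] * (10**6)).reduce(:+), 10**5)    # true: error scales with size
puts nearly_equal?(1.0, 1.1)                               # false
```

With the default abs_tol of 0.0, nothing but zero itself is "nearly equal" to zero, so pass an absolute tolerance when comparing against 0.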
If you really are concerned about accuracy, you could do your arithmetic using an exact representation. Ruby supplies a Rational class (even back in 1.8) which lets you do exact arithmetic on fractions.
pry(main)> r=Rational(1,10)
=> (1/10)
pry(main)> (r + r + r) == Rational(3,10)
=> true
pry(main)> (r + r + r) == 0.3
=> true
pry(main)> r.to_f
=> 0.1
pry(main)> (r + r + r).to_f
=> 0.3
The round method supports the specification of decimal places to which to round: http://www.ruby-doc.org/core-1.9.3/Float.html#method-i-round
So
(0.1 + 0.1 + 0.1).round(1) == (0.3).round(1)
... ought to be good.

multiplying floating point numbers produces zero

The code below outputs 0.0. Is this because of overflow? How can I avoid it? If not, why does it happen?
p ((1..100000).map {rand}).reduce :*
I was hoping to speed up this code:
p r.reduce(0) {|m, v| m + (Math.log10 v)}
and use this instead:
p Math.log10 (r.reduce :*)
but apparently this is not always possible...
The values produced by rand are all between 0.0 and 1.0. This means that on each multiplication, your number gets smaller. So by the time you have multiplied 1000 of them, it is probably indistinguishable from 0.
At some point, Ruby will take your number to be so small that it is 0. For instance: 2.0e-1000 # => 0.0
Every multiplication reduces your number by a factor of about 1/2^1, so after about 50 of them you are down to about 1/2^50, and after 100000 (actually, after about 700) you have underflowed the FP format itself, see here.
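Here is a small sketch of the underflow, and of why summing the logarithms (as in the original version of the code) avoids it. The inputs are fixed at 0.5 purely so the result is reproducible; that choice is an assumption for illustration, not part of the question.

```ruby
# 2000 factors of 0.5, whose true product is 2**-2000.
values = [0.5] * 2000

# Multiplying directly underflows: 2**-2000 is far below the smallest
# positive double (about 4.9e-324), so the product collapses to 0.0.
product = values.reduce(:*)
puts product                  # 0.0
puts Math.log10(product)      # -Infinity

# Summing the logarithms stays well inside Float range.
log_sum = values.reduce(0.0) { |sum, v| sum + Math.log10(v) }
puts log_sum                  # close to 2000 * log10(0.5), about -602.06
```

So the slower sum-of-logs version is the one that gives a usable answer for long lists.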
Ruby provides the BigDecimal class, which implements accurate floating point arithmetic.
require 'bigdecimal'
n = 100
decimals = n.times.map { BigDecimal(rand.to_s) }
result = decimals.reduce :*
result.nonzero?.nil? # returns nil if zero, self otherwise
# => false
result.precs # [significant_digits, maximum_significant_digits]
# => [1575, 1764]
Math.log10 result
# => -46.8031931083014
It is a lot slower than native floating point numbers, however. With n = 100_000, the decimals.reduce :* call went on for minutes on my computer before I finally interrupted it.

Removing scientific notation from float

I'm currently multiplying two floats like so: 0.0004 * 0.0000000000012 = 4.8e-16
How do I get the result in a normal format, i.e. without the scientific notation, something like 0.0000000000324, and then round it to, say, 5 digits?
You can use string formatting.
a = 0.0004 * 0.0000000000012 # => 4.8e-16
'%.5f' % a # => "0.00000"
pi = Math::PI # => 3.141592653589793
'%.5f' % pi # => "3.14159"
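Note that with only 5 decimal places a value as small as 4.8e-16 prints as all zeros, so you need to ask for enough places to reach the significant digits. A minimal sketch (plain_decimal is a hypothetical helper name; it just wraps Kernel#format):

```ruby
# Hypothetical helper: fixed-point string with the requested number of
# digits after the decimal point, never scientific notation.
def plain_decimal(value, places)
  format("%.#{places}f", value)
end

a = 0.0004 * 0.0000000000012
puts plain_decimal(a, 5)    # "0.00000": too few places to show anything
puts plain_decimal(a, 20)   # "0.00000000000000048000"
```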
