Why is 1.001 + 0.001 not equal to 1.002? [duplicate] - ruby

This question already has answers here:
Is floating point math broken?
(31 answers)
Closed 6 years ago.
I have some code:
num1 = 1.001
num2 = 0.001
sum = num1 + num2
puts sum
I expected 1.002000 but I am getting 1.0019999999999998. Why is this the case?

This is a commonly-found, fundamental problem with binary representation of fractional values, and is not Ruby-specific.
Because of the way that floating-point numbers are implemented as binary values, there are sometimes noticeable discrepancies between what you'd expect with "normal" decimal math and what actually results. Ruby's default representation of floating-point numbers is no exception--since it is based on the industry-standard double-precision (IEEE 754) format, it internally represents non-integers as binary values, and so its approximations don't quite line up with decimal values.
If you need to do decimal calculations in Ruby, consider using BigDecimal:
require 'bigdecimal'
num1 = BigDecimal("1.001")
num2 = BigDecimal("0.001")
puts num1 + num2 #=> 0.1002E1
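To see what is actually stored, one can print more digits than Ruby shows by default, or convert the Float to an exact fraction. The following is only a minimal sketch; the variable names exact and decimal_sum are illustrative and not part of the question.

```ruby
num1 = 1.001
num2 = 0.001
sum  = num1 + num2

# Print more digits than the default output shows.
puts format('%.20f', num1)
puts format('%.20f', sum)

# Float#to_r returns the exact binary value as a fraction.
# Its denominator is a power of two, so it cannot equal 1001/1000.
exact = num1.to_r
puts exact.denominator.to_s(2)         # a 1 followed by zeros
puts exact == Rational(1001, 1000)     # false

# The same arithmetic on exact decimal fractions gives the expected answer.
decimal_sum = Rational('1.001') + Rational('0.001')
puts decimal_sum == Rational('1.002')  # true
```

Rational is used here only to make the stored value visible; for everyday decimal work BigDecimal, as suggested above, is usually the more convenient choice.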

In a perfect world you would get 1.002000, but floating-point arithmetic rounds the result of each operation, and that rounding introduces a small error. Search the web for "machine epsilon" or simply "floating point" for background.
For example, you can measure the accumulated error like this. The machine epsilon of Ruby's Float is Float::EPSILON, about 2.2e-16, and the error grows as more operations are chained:
f = 0.0
100.times { f += 0.1 }
p f #=> 9.99999999999998 # should be 10.0 in the ideal world.
p 10-f #=> 1.9539925233402755e-14 # the floating-point error.
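Because of this accumulated error, comparing Floats with == is fragile. A common workaround is to compare within a relative tolerance. The sketch below assumes a helper named nearly_equal? and a default tolerance of 1e-9; both are illustrative choices, not something Ruby provides.

```ruby
# Hypothetical helper: true when a and b agree to within a relative tolerance.
def nearly_equal?(a, b, rel_tol = 1e-9)
  (a - b).abs <= rel_tol * [a.abs, b.abs].max
end

f = 0.0
100.times { f += 0.1 }

puts Float::EPSILON           # gap between 1.0 and the next representable Float
puts f == 10.0                # false
puts nearly_equal?(f, 10.0)   # true
puts((10.0 - f) / 10.0)       # relative error of the accumulated sum
```

The right tolerance depends on how many operations produced the values being compared, so 1e-9 should be treated as a starting point rather than a rule.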

Related

How to compare matrices in GNU Octave [duplicate]

I am writing a program where I need to delete duplicate points stored in a matrix. The problem is that when I check whether those points are in the matrix, MATLAB doesn't recognize them even though they exist.
In the following code, intersections function gets the intersection points:
[points(:,1), points(:,2)] = intersections(...
obj.modifiedVGVertices(1,:), obj.modifiedVGVertices(2,:), ...
[vertex1(1) vertex2(1)], [vertex1(2) vertex2(2)]);
The result:
>> points
points =
12.0000 15.0000
33.0000 24.0000
33.0000 24.0000
>> vertex1
vertex1 =
12
15
>> vertex2
vertex2 =
33
24
Two points (vertex1 and vertex2) should be eliminated from the result. It should be done by the below commands:
points = points((points(:,1) ~= vertex1(1)) | (points(:,2) ~= vertex1(2)), :);
points = points((points(:,1) ~= vertex2(1)) | (points(:,2) ~= vertex2(2)), :);
After doing that, we have this unexpected outcome:
>> points
points =
33.0000 24.0000
The outcome should be an empty matrix. As you can see, the first (or second?) pair of [33.0000 24.0000] has been eliminated, but not the second one.
Then I checked these two expressions:
>> points(1) ~= vertex2(1)
ans =
0
>> points(2) ~= vertex2(2)
ans =
1 % <-- It means 24.0000 is not equal to 24.0000?
What is the problem?
More surprisingly, I made a new script that has only these commands:
points = [12.0000 15.0000
33.0000 24.0000
33.0000 24.0000];
vertex1 = [12 ; 15];
vertex2 = [33 ; 24];
points = points((points(:,1) ~= vertex1(1)) | (points(:,2) ~= vertex1(2)), :);
points = points((points(:,1) ~= vertex2(1)) | (points(:,2) ~= vertex2(2)), :);
The result as expected:
>> points
points =
Empty matrix: 0-by-2
The problem you're having relates to how floating-point numbers are represented on a computer. A more detailed discussion of floating-point representations appears towards the end of my answer (The "Floating-point representation" section). The TL;DR version: because computers have finite amounts of memory, numbers can only be represented with finite precision. Thus, the accuracy of floating-point numbers is limited to a certain number of decimal places (about 16 significant digits for double-precision values, the default used in MATLAB).
Actual vs. displayed precision
Now to address the specific example in the question... while 24.0000 and 24.0000 are displayed in the same manner, it turns out that they actually differ by very small decimal amounts in this case. You don't see it because MATLAB only displays 4 digits after the decimal point by default, keeping the overall display neat and tidy. If you want to see the full precision, you should either issue the format long command or view a hexadecimal representation of the number:
>> pi
ans =
3.1416
>> format long
>> pi
ans =
3.141592653589793
>> num2hex(pi)
ans =
400921fb54442d18
Initialized values vs. computed values
Since there are only a finite number of values that can be represented for a floating-point number, it's possible for a computation to result in a value that falls between two of these representations. In such a case, the result has to be rounded off to one of them. This introduces a small machine-precision error. This also means that initializing a value directly or by some computation can give slightly different results. For example, the value 0.1 doesn't have an exact floating-point representation (i.e. it gets slightly rounded off), and so you end up with counter-intuitive results like this due to the way round-off errors accumulate:
>> a=sum([0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1]); % Sum 10 0.1s
>> b=1; % Initialize to 1
>> a == b
ans =
logical
0 % They are unequal!
>> num2hex(a) % Let's check their hex representation to confirm
ans =
3fefffffffffffff
>> num2hex(b)
ans =
3ff0000000000000
How to correctly handle floating-point comparisons
Since floating-point values can differ by very small amounts, any comparisons should be done by checking that the values are within some range (i.e. tolerance) of one another, as opposed to exactly equal to each other. For example:
a = 24;
b = 24.000001;
tolerance = 0.001;
if abs(a-b) < tolerance, disp('Equal!'); end
will display "Equal!".
You could then change your code to something like:
points = points((abs(points(:,1)-vertex1(1)) > tolerance) | ...
(abs(points(:,2)-vertex1(2)) > tolerance),:)
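The same tolerance idea can be sketched in Ruby, the language used by the other threads on this page, rather than MATLAB. The helper name remove_vertex, the TOLERANCE constant and the sample data are assumptions made for illustration; the 1e-12 offset stands in for the kind of invisible difference a computed value can carry.

```ruby
TOLERANCE = 1e-3 # assumed value, matching the example above

# Hypothetical helper: drop every [x, y] row that lies within `tol`
# of `vertex` in both coordinates.
def remove_vertex(points, vertex, tol = TOLERANCE)
  points.reject do |x, y|
    (x - vertex[0]).abs <= tol && (y - vertex[1]).abs <= tol
  end
end

points = [[12.0, 15.0], [33.0, 24.0 + 1e-12], [33.0, 24.0]]
points = remove_vertex(points, [12, 15])
points = remove_vertex(points, [33, 24])
p points # => []
```

An exact comparison would have kept the second row, which is the behaviour the question describes.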
Floating-point representation
A good overview of floating-point numbers (and specifically the IEEE 754 standard for floating-point arithmetic) is What Every Computer Scientist Should Know About Floating-Point Arithmetic by David Goldberg.
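A binary floating-point number is actually represented by three integers: a sign bit s, a significand (or coefficient/fraction) b, and an exponent e. For double-precision floating-point format, each number is represented by 64 bits laid out in memory as follows: 1 sign bit, then 11 exponent bits, then 52 fraction bits (b_51 down to b_0).
The real value of a normal number can then be found with the following formula:
value = (-1)^s * (1 + sum_{i=1}^{52} b_{52-i} * 2^{-i}) * 2^{e-1023}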
This format allows for number representations in the range 10^-308 to 10^308. For MATLAB you can get these limits from realmin and realmax:
>> realmin
ans =
2.225073858507201e-308
>> realmax
ans =
1.797693134862316e+308
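To make the sign/exponent/fraction layout described above concrete, here is a rough Ruby sketch that pulls the three fields out of a double and rebuilds the value from them. The helper names double_fields and rebuild are made up for this example, and the rebuild step only handles normal numbers (not zero, subnormals, infinities or NaN).

```ruby
# Split a Float into the three IEEE 754 double-precision fields.
def double_fields(x)
  bits = [x].pack('G').unpack1('Q>') # the 64 raw bits, big-endian
  {
    sign:     bits >> 63,             # 1 bit
    exponent: (bits >> 52) & 0x7ff,   # 11 bits, biased by 1023
    fraction: bits & ((1 << 52) - 1)  # 52 bits
  }
end

# Rebuild the value of a normal double from its fields, using exact
# Rational arithmetic so that no extra rounding is introduced.
def rebuild(fields)
  significand = 1 + Rational(fields[:fraction], 2**52)
  value = significand * Rational(2)**(fields[:exponent] - 1023)
  (fields[:sign].zero? ? value : -value).to_f
end

puts([Math::PI].pack('G').unpack1('H*')) # 400921fb54442d18, as num2hex(pi) shows
p double_fields(1.0)                     # sign 0, exponent 1023, fraction 0
puts rebuild(double_fields(0.1)) == 0.1  # true
```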
Since there are a finite number of bits used to represent a floating-point number, there are only so many finite numbers that can be represented within the above given range. Computations will often result in a value that doesn't exactly match one of these finite representations, so the values must be rounded off. These machine-precision errors make themselves evident in different ways, as discussed in the above examples.
In order to better understand these round-off errors it's useful to look at the relative floating-point accuracy provided by the function eps, which quantifies the distance from a given number to the next larger floating-point representation:
>> eps(1)
ans =
2.220446049250313e-16
>> eps(1000)
ans =
1.136868377216160e-13
Notice that the precision is relative to the size of a given number being represented; larger numbers will have larger distances between floating-point representations, and will thus have fewer digits of precision following the decimal point. This can be an important consideration with some calculations. Consider the following example:
>> format long % Display full precision
>> x = rand(1, 10); % Get 10 random values between 0 and 1
>> a = mean(x) % Take the mean
a =
0.587307428244141
>> b = mean(x+10000)-10000 % Take the mean at a different scale, then shift back
b =
0.587307428244458
Note that when we shift the values of x from the range [0 1] to the range [10000 10001], compute a mean, then subtract the mean offset for comparison, we get a value that differs for the last 3 significant digits. This illustrates how an offset or scaling of data can change the accuracy of calculations performed on it, which is something that has to be accounted for with certain problems.
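For readers following along in Ruby rather than MATLAB, the spacing that eps reports can be approximated with Float#next_float. The helper name spacing is an assumption for this sketch, and it is only meant for positive finite values.

```ruby
# Hypothetical helper mirroring MATLAB's eps(x) for positive finite x:
# the distance from x to the next larger representable Float.
def spacing(x)
  x.next_float - x
end

puts spacing(1.0)     # equals Float::EPSILON
puts spacing(1000.0)  # larger gap, matching eps(1000) above
puts spacing(1.0e15)  # 0.125: only three binary places remain after the point
```

The growing gap is why shifting data by 10000 before averaging, as in the example above, costs a few trailing digits.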
Look at this article: The Perils of Floating Point. Though its examples are in FORTRAN, it applies to virtually any modern programming language, including MATLAB. Your problem (and its solution) is described in the "Safe Comparisons" section.
Type:
format long g
This command will show the FULL value of the number. It's likely to be something like 24.00000021321 != 24.00000123124
Try writing
0.1 + 0.1 + 0.1 == 0.3.
Warning: You might be surprised about the result!
Maybe the two numbers are really 24.0 and 24.000000001 but you're not seeing all the decimal places.
Check out the Matlab EPS function.
Matlab uses floating point math up to 16 digits of precision (only 5 are displayed).

How to prevent BigDecimal from truncating results?

Follow up to this question:
I want to calculate 1/1048576 and get the correct result, i.e. 0.00000095367431640625.
Using BigDecimal's / truncates the result:
require 'bigdecimal'
a = BigDecimal(1)
#=> #<BigDecimal:7fd8f18aaf80,'0.1E1',9(27)>
b = BigDecimal(2**20)
#=> #<BigDecimal:7fd8f189ed20,'0.1048576E7',9(27)>
n = a / b
#=> #<BigDecimal:7fd8f0898750,'0.9536743164 06E-6',18(36)>
n.to_s('F')
#=> "0.000000953674316406" <- should be ...625
This really surprised me, because I was under the impression that BigDecimal would just work.
To get the correct result, I have to use div with an explicit precision:
n = a.div(b, 100)
#=> #<BigDecimal:7fd8f29517a8,'0.9536743164 0625E-6',27(126)>
n.to_s('F')
#=> "0.00000095367431640625" <- correct
But I don't really understand that precision argument. Why do I have to specify it and what value do I have to use to get un-truncated results?
Does this even qualify as "arbitrary-precision floating point decimal arithmetic"?
Furthermore, if I calculate the above value via:
a = BigDecimal(5**20)
#=> #<BigDecimal:7fd8f20ab7e8,'0.9536743164 0625E14',18(27)>
b = BigDecimal(10**20)
#=> #<BigDecimal:7fd8f2925ab8,'0.1E21',9(36)>
n = a / b
#=> #<BigDecimal:7fd8f4866148,'0.9536743164 0625E-6',27(54)>
n.to_s('F')
#=> "0.00000095367431640625"
I do get the correct result. Why?
BigDecimal can perform arbitrary-precision floating point decimal arithmetic, however it cannot automatically determine the "correct" precision for a given calculation.
For example, consider
BigDecimal(1) / BigDecimal(3)
# <BigDecimal:1cfd748, '0.3333333333 33333333E0', 18(36)>
Arguably, there is no correct precision in this case; the right value to use depends on the accuracy required in your calculations. It's worth noting that in a mathematical sense†, almost all whole number divisions result in a number with an infinite decimal expansion, thus requiring rounding. A fraction only has a finite representation if, after reducing it to lowest terms, the denominator's only prime factors are 2 and 5.
So you have to specify the precision. Unfortunately the precision argument is a little weird, because it seems to be both the number of significant digits and the number of digits after the decimal point. Here's 1/1048576 for varying precision
precision  result
1          0.000001
2          0.00000095
3          0.000000953
9          0.000000953
10         0.0000009536743164
11         0.00000095367431641
12         0.000000953674316406
18         0.000000953674316406
19         0.00000095367431640625
For any value less than 10, BigDecimal truncates the result to 9 digits, which is why you get a sudden spike in accuracy at precision 10: at that point it switches to truncating to 18 digits (and then rounds to 10 significant digits).
† Depending on how comfortable you are comparing the sizes of countably infinite sets.
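The rule about prime factors 2 and 5 given above can be turned into a small check that also says how many decimal places an exact result needs. This is a sketch using plain Integer and Rational arithmetic, not BigDecimal; the helper names exact_decimal_places and exact_decimal_string are invented here, and non-negative numerators with positive denominators are assumed.

```ruby
# Hypothetical helper: number of decimal places needed to write num/den
# exactly, or nil when the decimal expansion never terminates.
def exact_decimal_places(num, den)
  den = Rational(num, den).denominator # reduce to lowest terms
  twos = 0
  fives = 0
  while (den % 2).zero?
    den /= 2
    twos += 1
  end
  while (den % 5).zero?
    den /= 5
    fives += 1
  end
  den == 1 ? [twos, fives].max : nil
end

# Hypothetical helper: the exact decimal string for num/den, or nil.
def exact_decimal_string(num, den)
  places = exact_decimal_places(num, den)
  return nil if places.nil?

  scaled = num * 10**places / den # divides evenly by construction
  return scaled.to_s if places.zero?

  digits = scaled.to_s.rjust(places + 1, '0')
  "#{digits[0...-places]}.#{digits[-places, places]}"
end

p exact_decimal_places(1, 3)         # nil: 1/3 never terminates
p exact_decimal_places(1, 2**20)     # 20
puts exact_decimal_string(1, 2**20)  # 0.00000095367431640625
```

For 1/1048576 the count of 20 places is presumably a safe upper bound for the precision passed to div, but given how unusual that argument's behaviour is, it is worth confirming against the BigDecimal version in use.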

How do I square a number without using multiplication?

Was wondering if there is a way to write a method which squares a number (integer or decimal/float) without using the operational sign (*). For example: square of 2 will be 4, square of 2.5 will be 6.25, and 3.5's will be 12.25.
Here is my approach:
def square(num)
number = num
number2 = number
(1...(number2.floor)).each{ num += number }
num
end
puts square(2) #=> 4 [Correct]
puts square(16) #=> 256 [Correct]
puts square(2.5) #=> 5.0 [Wrong]
puts square(3.5) #=> 10.5 [Wrong]
The code works for integers, but not with floats/decimals. What am I doing wrong here? Also, if anybody has a fresh approach to this problem then please share. Algorithms are also welcome. Also, considering performance of the method will be a plus.
There are a few tricks you could use, arranged here in order of increasing trickery.
Logarithms
Observe that k * k = e^log(k*k) = e^(log(k) + log(k)) for positive k, and use that rule:
Math.exp(Math.log(5.2) + Math.log(5.2))
# => 27.04
No multiplication here!
Division
As another commenter suggested, you could take the reciprocal operation, division: k/(1.0/k) == k^2. However, this introduces additional floating-point errors, since k / (1.0 / k) is two floating-point operations, whereas k * k is only one.
Exponentiation
Or, since this is Ruby, if you want exactly the same value as the floating-point operation and you don't want to use the multiplication operator, you can use the exponentiation operator: k**2 == k * k.
Call a web service
It's not multiplying if you don't do it yourself!
require 'wolfram' # https://github.com/cldwalker/wolfram
query = 'Square[5.2]'
result = Wolfram.fetch(query)
Blatant cheating
Finally, if you're feeling really cheap, you could avoid actually employing the literal "*" operation, and use something equivalent:
n = ...
require 'base64'
n.send (Base64.decode64 'Kg==').to_sym, n # => n * n
Didn't use any operation sign.
def square(num)
num.send 42.chr, num
end
Well, the inverse of multiplication is division, so you can get the same result* by dividing by its inverse. That is: square(n) = n / (1.0 / n). Just make sure you don't inadvertently do integer division.
*Technically dividing twice introduces a second opportunity for rounding error in floating-point arithmetic since it performs two operations. So, this will not produce exactly the same result as floating-point multiplication - but this was also not a requirement in the question.
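To compare the approaches suggested in this thread side by side, they can be wrapped in small methods. The method names are made up for this sketch, and exact printed values are deliberately not shown because the logarithm and division variants may differ from k * k in the last digit.

```ruby
def square_by_logs(k)
  Math.exp(Math.log(k) + Math.log(k)) # only valid for k > 0
end

def square_by_division(k)
  k / (1.0 / k) # 1.0 forces float division; k must be nonzero
end

def square_by_power(k)
  k**2
end

[2, 2.5, 3.5].each do |k|
  p [k, square_by_logs(k), square_by_division(k), square_by_power(k)]
end
```

Of the three, only the exponentiation version keeps an Integer input as an Integer; the other two always return a Float.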

Floating point is limited to 16 digits

I have the following code:
def pi
pivalue = 4 * (4 * Math.atan(1.0/5.0) - Math.atan(1.0/239.0))
pivaluestring = pivalue.to_s
puts pivaluestring[0,20]
end
Why is pivalue limited to 16 decimal places? I want a much higher limit (the maximum possible).
Use BigMath and BigDecimal (in the Standard Library):
require "bigdecimal/math"
p BigMath::PI(50).to_s
#=>"0.3141592653589793238462643383279502884197169399375105820974944592309049629352442819E1"
# Or
include BigMath
p PI(100).to_s
BigDecimal provides arbitrary-precision floating point decimal arithmetic.
Ruby floats are 64-bit floats. Once you take away the sign bit and the exponent bits you are left with 52 bits for the mantissa (53 counting the implicit leading bit), which is about 16 digits of decimal precision.
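Ruby exposes these limits as constants on Float, so the "about 16 digits" figure can be checked directly. This is just a quick illustration using built-in constants.

```ruby
puts Float::MANT_DIG                  # 53 significand bits (52 stored + 1 implicit)
puts Float::DIG                       # 15 decimal digits always survive a round trip
puts Float::MANT_DIG * Math.log10(2)  # roughly 15.95 decimal digits
puts 0.1 + 0.2                        # 0.30000000000000004
```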
Ruby does have an arbitrary-precision library: BigDecimal. Converting your code to use it would look a little like this:
require "bigdecimal"
require "bigdecimal/math"
def pi(prec=20)
pivalue = 4 * (4 * BigMath.atan(BigDecimal("0.2", prec), prec) - BigMath.atan(BigDecimal(1).div(BigDecimal(239), prec), prec))
pivaluestring = pivalue.to_s
puts pivaluestring[0,20]
end
You usually have to give bigdecimal a precision to tell it how many decimals it has to track.
There is also a built in BigMath.PI function
That is because Math.atan returns a Float. Since you only have that much precision in the middle of the calculation, you cannot get more precision out of it.
By the way, at Float precision you can get pi simply with:
Math::PI # => 3.141592653589793
Floating point values have limited precision based on the number of bits used to store the value. Read these articles about floating-point arithmetic and limitations:
http://floating-point-gui.de/
http://www.ruby-doc.org/core-2.0/Float.html

BigDecimal loses precision after multiplication

I'm getting a strange behaviour with BigDecimal in ruby. Why does this print false?
require 'bigdecimal'
a = BigDecimal('100')
b = BigDecimal('5.1')
c = a / b
puts c * b == a #false
BigDecimal doesn't claim to have infinite precision, it just provides support for precisions outside the normal floating point ranges:
BigDecimal provides similar support for very large or very accurate floating point numbers.
But BigDecimal values still have a finite number of significant digits, hence the precs method:
precs
Returns an Array of two Integer values.
The first value is the current number of significant digits in the BigDecimal. The second value is the maximum number of significant digits for the BigDecimal.
You can see things starting to go awry if you look at your c:
>> c.to_s
=> "0.19607843137254901960784313725E2"
That's a nice clean rational number but BigDecimal doesn't know that, it is still stuck seeing c as a finite string of digits.
If you use Rational instead, you'll get the results you're expecting:
>> a = Rational(100)
>> b = Rational(51, 10)
>> c = a / b
>> c * b == a
=> true
Of course, this trickery only applies if you are working with Rational numbers so anything fancy (such as roots or trigonometry) is out of bounds.
This is normal behaviour, and not at all strange.
BigDecimal does not guarantee infinite accuracy; it lets you specify an arbitrary accuracy, which is not the same thing. The value 100/5.1 cannot be expressed exactly in a floating-point representation, no matter how many digits or bits are used.
A "big rational" approach could achieve it - but would not give you access to some functions e.g. square roots.
See http://ruby-doc.org/core-1.9.3/Rational.html
# require 'rational' necessary only in Ruby 1.8
a = 100.to_r
b = '5.1'.to_r
c = a / b
c * b == a
# => true

Resources