Behaviour of ruby with regards to numerical exactness (Scheme comparison) - ruby

In Ruby (1.9), are numbers always considered exact? I know Ruby does some fairly complex stuff with numbers under the hood, since you can do crazy things like ask for 1 trillion to the power of 1 trillion and you will get an answer (after waiting many moons for it to compute).
In Scheme, part of the specification dictates that implementations should indicate whether or not the implementation's internal representation of the number is "exact" or not. For example, 1/2 is always exact, 1/3 is exact, 0.3333 is exact etc etc. But the result of an imprecise mathematical operation on an exact number may produce a number that the Scheme implementation knows to be inexact (due to floating point precision).
(exact? (/ 0.33333 2))
=> #f
That's false, so it's not exact.
Is there a way to deduce the same information in Ruby? If I always use the Complex (or Rational) representation of a number during mathematical operations, will it always be exact in Ruby, or just extremely close to exact?
Complex("0.33333") / 2
Is that exact, or not?

Ruby's class Float is not exact. Class Integer on the other hand more or less "is" since it shifts seamlessly from Fixnum to Bignum
>> 1.4534346345235236346363574564356435
=> 1.45343463452352
>> 10**400
=> 10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
>> 10.0**400
=> Infinity
It looks like others would love to see the test you are asking about, namely Numeric#exact? -- see this feature request.

Related

When is it useful to compare floating-point values for equality?

I seem to see people asking all the time around here questions about comparing floating-point numbers. The canonical answer is always: just see if the numbers are within some small number of each other…
So the question is this: Why would you ever need to know if two floating point numbers are equal to each other?
In all my years of coding I have never needed to do this (although I will admit that even I am not omniscient). From my point of view, if you are trying to use floating point numbers for and for some reason want to know if the numbers are equal, you should probably be using an integer type (or a decimal type in languages that support it). Am I just missing something?
A few reasons to compare floating-point numbers for equality are:
Testing software. Given software that should conform to a precise specification, exact results might be known or feasibly computable, so a test program would compare the subject software’s results to the expected results.
Performing exact arithmetic. Carefully designed software can perform exact arithmetic with floating-point. At its simplest, this may simply be integer arithmetic. (On platforms which provide IEEE-754 64-bit double-precision floating-point but only 32-bit integer arithmetic, floating-point arithmetic can be used to perform 53-bit integer arithmetic.) Comparing for equality when performing exact arithmetic is the same as comparing for equality with integer operations.
Searching sorted or structured data. Floating-point values can be used as keys for searching, in which case testing for equality is necessary to determine that the sought item has been found. (There are issues if NaNs may be present, since they report false for any order test.)
Avoiding poles and discontinuities. Functions may have special behaviors at certain points, the most obvious of which is division. Software may need to test for these points and divert execution to alternate methods.
Note that only the last of these tests for equality when using floating-point arithmetic to approximate real arithmetic. (This list of examples is not complete, so I do not expect this is the only such use.) The first three are special situations. Usually when using floating-point arithmetic, one is approximating real arithmetic and working with mostly continuous functions. Continuous functions are “okay” for working with floating-point arithmetic because they transmit errors in “normal” ways. For example, if your calculations so far have produced some a' that approximates an ideal mathematical result a, and you have a b' that approximates an ideal mathematical result b, then the computed sum a'+b' will approximate a+b.
Discontinuous functions, on the other hand, can disrupt this behavior. For example, if we attempt to round a number to the nearest integer, what happens when a is 3.49? Our approximation a' might be 3.48 or 3.51. When the rounding is computed, the approximation may produce 3 or 4, turning a very small error into a very large error. When working with discontinuous functions in floating-point arithmetic, one has to be careful. For example, consider evaluating the quadratic formula, (−b±sqrt(b2−4ac))/(2a). If there is a slight error during the calculations for b2−4ac, the result might be negative, and then sqrt will return NaN. So software cannot simply use floating-point arithmetic as if it easily approximated real arithmetic. The programmer must understand floating-point arithmetic and be wary of the pitfalls, and these issues and their solutions can be specific to the particular software and application.
Testing for equality is a discontinuous function. It is a function f(a, b) that is 0 everywhere except along the line a=b. Since it is a discontinuous function, it can turn small errors into large errors—it can report as equal numbers that are unequal if computed with ideal mathematics, and it can report as unequal numbers that are equal if computed with ideal mathematics.
With this view, we can see testing for equality is a member of a general class of functions. It is not any more special than square root or division—it is continuous in most places but discontinuous in some, and so its use must be treated with care. That care is customized to each application.
I will relate one place where testing for equality was very useful. We implement some math library routines that are specified to be faithfully rounded. The best quality for a routine is that it is correctly rounded. Consider a function whose exact mathematical result (for a particular input x) is y. In some cases, y is exactly representable in the floating-point format, in which case a good routine will return y. Often, y is not exactly representable. In this case, it is between two numbers representable in the floating-point format, some numbers y0 and y1. If a routine is correctly rounded, it returns whichever of y0 and y1 is closer to y. (In case of a tie, it returns the one with an even low digit. Also, I am discussing only the round-to-nearest ties-to-even mode.)
If a routine is faithfully rounded, it is allowed to return either y0 or y1.
Now, here is the problem we wanted to solve: We have some version of a single-precision routine, say sin0, that we know is faithfully rounded. We have a new version, sin1, and we want to test whether it is faithfully rounded. We have multiple-precision software that can evaluate the mathematical sin function to great precision, so we can use that to check whether the results of sin1 are faithfully rounded. However, the multiple-precision software is slow, and we want to test all four billion inputs. sin0 and sin1 are both fast, but sin1 is allowed to have outputs different from sin0, because sin1 is only required to be faithfully rounded, not to be the same as sin0.
However, it happens that most of the sin1 results are the same as sin0. (This is partly a result of how math library routines are designed, using some extra precision to get a very close result before using a few final arithmetic operations to deliver the final result. That tends to get the correctly rounded result most of the time but sometimes slips to the next nearest value.) So what we can do is this:
For each input, calculate both sin0 and sin1.
Compare the results for equality.
If the results are equal, we are done. If they are not, use the extended precision software to test whether the sin1 result is faithfully rounded.
Again, this is a special case for using floating-point arithmetic. But it is one where testing for equality serves very well; the final test program runs in a few minutes instead of many hours.
The only time I needed, it was to check if the GPU was IEEE 754 compliant.
It was not.
Anyway I haven't used a comparison with a programming language. I just run the program on the CPU and on the GPU producing some binary output (no literals) and compared the outputs with a simple diff.
There are plenty possible reasons.
Since I know Squeak/Pharo Smalltalk better, here are a few trivial examples taken out of it (it relies on strict IEEE 754 model):
Float>>isFinite
"simple, byte-order independent test for rejecting Not-a-Number and (Negative)Infinity"
^(self - self) = 0.0
Float>>isInfinite
"Return true if the receiver is positive or negative infinity."
^ self = Infinity or: [self = NegativeInfinity]
Float>>predecessor
| ulp |
self isFinite ifFalse: [
(self isNaN or: [self negative]) ifTrue: [^self].
^Float fmax].
ulp := self ulp.
^self - (0.5 * ulp) = self
ifTrue: [self - ulp]
ifFalse: [self - (0.5 * ulp)]
I'm sure that you would find some more involved == if you open some libm implementation and check... Unfortunately, I don't know how to search == thru github web interface, but manually I found this example in julia libm (a variant of fdlibm)
https://github.com/JuliaLang/openlibm/blob/master/src/s_remquo.c
remquo(double x, double y, int *quo)
{
...
fixup:
INSERT_WORDS(x,hx,lx);
y = fabs(y);
if (y < 0x1p-1021) {
if (x+x>y || (x+x==y && (q & 1))) {
q++;
x-=y;
}
} else if (x>0.5*y || (x==0.5*y && (q & 1))) {
q++;
x-=y;
}
GET_HIGH_WORD(hx,x);
SET_HIGH_WORD(x,hx^sx);
q &= 0x7fffffff;
*quo = (sxy ? -q : q);
return x;
Here, the remainder function answer a result x between -y/2 and y/2. If it is exactly y/2, then there are 2 choices (a tie)... The == test in fixup is here to test the case of exact tie (resolved so as to always have an even quotient).
There are also a few ==zero tests, for example in __ieee754_logf (test for trivial case log(1)) or __ieee754_rem_pio2 (modulo pi/2 used for trigonometric functions).

why does Ruby's Rational class treat string arguments differently from numeric arguments?

I'm using ruby's Rational library to convert the width & height of images to aspect ratios.
I've noticed that string arguments are treated differently than numeric arguments.
>> Rational('1.91','1')
=> (191/100)
>> Rational(1.91,1)
=> (8601875288277647/4503599627370496)
>> RUBY_VERSION
=> "2.1.5"
>> RUBY_ENGINE
=> "ruby"
FYI 1.91:1 is an aspect ratio recommended by Facebook for images on their platform.
Values like 191 and 100 are much more convenient to store in my database than 8601875288277647 and 4503599627370496. But I'd like to understand where this different originates before deciding which approach to use.
The Rational test suite doesn't seem to cover this exact case.
Disclaimer: This is only an educated guess, based on some knowledge on how to implement such a feat.
As Kent Dahl already said, Floats are not precise, they have a fixed precision, which means 1.91 is really 1.910000000000000000001 or something like that, which ruby "knows" should be displayed as 1.91.
"1.91" on the other hand is a string, basically an array of characters: '1', '.', '9', '1'.
This said, here is what you need to do, to build the rational out of floats:
Get rid of the . (mathematically by multiplying the numerator and denominator with 10^x, or multiplying with ten as many times, as there are numbers behind the .)
Find the greatest common denominator (gcd)
Divide num and denom with the gcd
Step 1 however, is a little different for Float and String:
The Float, we will have to multiply with 10^x, where x is (because of the precision) not 2 (as one would think with 1.91), but more something like 16 (remember: 1.9100...1).
For the String, we COULD cast it into a float and do the same trick, but hey, there is an easier way: We just count the number of characters behind the dot (which is 2), remove the dot and multiply the denom with 10^2... This is not only the easier, but also the more precise way.
The big numbers might disappear again, when applying step 3, that's why you will not always get those strange results when dealing with rationals from floats.
TLDR: The numbers will be built differently based on the arguments being String, or FLoat. FLoats can produce long-ass numbers, because precision.
The Float 1.91 is stored as a double which has a given amount of precision, limited by binary presentation. The equivalent Rational object retains this precision a such as possible, so it is huge. There is no way of storing 1.91 exactly in a double, but the value you get is close enough for most uses.
As for the String, it represents a different value - the exact value of 1.91 - and as you create a Rational it retains it better. It is more correct than the Float, UT takes longer to use for calculations.
This is similar to the problem with 1.0/3 as it "goes on forever" 0.333333...etc, but Rational can represent it exactly.

How does addition work in racket/scheme?

For example, if you try (+ 3 4), how is it broken down and calculated in the source, specifically? Does it use recursion with add1?
The implementation of + is actually much more complicated than you might expect, because arithmetic is generic in Racket: it works on integers, rational numbers, complex numbers, and so on. You can even mix and match these kinds of numbers and it'll do the right thing. In the end, it's ultimately going to use arithmetic in C, which is what the runtime system is written in.
If you're curious, you can find more of the guts of the numeric tower here: https://github.com/plt/racket/blob/master/src/racket/src/numarith.c
Other pointers: Bignum arithmetic, the Scheme numeric tower, the Racket reference on numbers.
The + operator is a primitive operation, part of the core language. For efficiency reasons, it wouldn't make much sense to implement it as a recursive procedure.

Why does Ruby's Fixnum#/ method round down when it is called on another Fixnum?

Okay, so what's up with this?
irb(main):001:0> 4/3
=> 1
irb(main):002:0> 7/8
=> 0
irb(main):003:0> 5/2
=> 2
I realize Ruby is doing integer division here, but why? With a langauge as flexible as Ruby, why couldn't 5/2 return the actual, mathematical result of 5/2? Is there some common use for integer division that I'm missing? It seems to me that making 7/8 return 0 would cause more confusion than any good that might come from it is worth. Is there any real reason why Ruby does this?
Because most languages (even advanced/high-level ones) in creation do it? You will have the same behaviour on integer in C, C++, Java, Perl, Python... This is Euclidian Division (hence the corresponding modulo % operator).
The integer division operation is even implemented at hardware level on many architecture. Others have asked this question, and one reason is symetry: In static typed languages such as see, this allows all integer operations to return integers, without loss of precision. It also allow easy access to the corresponding low-level assembler operation, since C was designed as a sort of extension layer over it.
Moreover, as explained in one comment to the linked article, floating point operations were costly (or not supported on all architectures) for many years, and not required for processes such as splitting a dataset in fixed lots.

Arbitrary precision arithmetic with Ruby

How the heck does Ruby do this? Does Jörg or anyone else know what's happening behind the scenes?
Unfortunately I don't know C very well so bignum.c is of little help to me. I was just kind of curious it someone could explain (in plain English) the theory behind whatever miracle algorithm its using.
irb(main):001:0> 999**999
368063488259223267894700840060521865838338232037353204655959621437025609300472231530103873614505175218691345257589896391130393189447969771645832382192366076536631132001776175977932178658703660778465765811830827876982014124022948671975678131724958064427949902810498973271030787716781467419524180040734398996952930832508934116945966120176735120823151959779536852290090377452502236990839453416790640456116471139751546750048602189291028640970574762600185950226138244530187489211615864021135312077912018844630780307462205252807737757672094320692373101032517459518497524015120165166724189816766397247824175394802028228160027100623998873667435799073054618906855460488351426611310634023489044291860510352301912426608488807462312126590206830413782664554260411266378866626653755763627796569082931785645600816236891168141774993267488171702172191072731069216881668294625679492696148976999868715671440874206427212056717373099639711168901197440416590226524192782842896415414611688187391232048327738965820265934093108172054875188246591760877131657895633586576611857277011782497943522945011248430439201297015119468730712364007639373910811953430309476832453230123996750235710787086641070310288725389595138936784715274150426495416196669832679980253436807864187160054589045664027158817958549374490512399055448819148487049363674611664609890030088549591992466360050042566270348330911795487647045949301286614658650071299695652245266080672989921799342509291635330827874264789587306974472327718704306352445925996155619153783913237212716010410294999877569745287353422903443387562746452522860420416689019732913798073773281533570910205207767157128174184873357050830752777900041943256738499067821488421053870869022738698816059810579221002560882999884763252161747566893835178558961142349304466506402373556318707175710866983035313122068321102457824112014969387225476259342872866363550383840720010832906695360553556647545295849966279980830561242960013654529514995113584909050813015198928283202189194615501403435553060147713139766323195743324848047347575473228198492343231496580885057330510949058490527738662697480293583612233134502078182014347192522391449087738579081585795613547198599661273567662441490401862839817822686573112998663038868314974259766039340894024308383451039874674061160538242392803580758232755749310843694194787991556647907091849600704712003371103926967137408125713631396699343733288014254084819379380555174777020843568689927348949484201042595271932630685747613835385434424807024615161848223715989797178155169951121052285149157137697718850449708843330475301440373094611119631361702936342263219382793996895988331701890693689862459020775599439506870005130750427949747071390095256759203426671803377068109744629909769176319526837824364926844730545524646494321826241925107158040561607706364484910978348669388142016838792902926158979355432483611517588605967745393958061959024834251565197963477521095821435651996730128376734574843289089682710350244222290017891280419782767803785277960834729869249991658417000499998999
Simple: it does it the same way you do, ever since first grade. Except it doesn't compute in base 10, it computes in base 4 billion (and change).
Think about it: with our number system, we can only represent numbers from 0 to 9. So, how can we compute 6+7 without overflowing? Easy: we do actually overflow! We cannot represent the result of 6+7 as a number between 0 and 9, but we can overflow to the next place and represent it as two numbers between 0 and 9: 3×100 + 1×101. If you want to add two numbers, you add them digit-wise from the right and overflow ("carry") to the left. If you want to multiply two numbers, you have to multiply every digit of one number individually with the other number, then add up the intermediate results.
BigNum arithmetic (this is what this kind of arithmetic where the numbers are bigger than the native machine numbers is usually called) works basically the same way. Except that the base is not 10, and its not 2, either – it's the size of a native machine integer. So, on a 32 bit machine, it would be base 232 or 4 294 967 296.
Specifically, in Ruby Integer is actually an abstract class that is never instianted. Instead, it has two subclasses, Fixnum and Bignum, and numbers automagically migrate between them, depending on their size. In MRI and YARV, Fixnum can hold a 31 or 63 bit signed integer (one bit is used for tagging) depending on the native word size of the machine. In JRuby, a Fixnum can hold a full 64 bit signed integer, even on an 32 bit machine.
The simplest operation is adding two numbers. And if you look at the implementation of + or rather bigadd_core in YARV's bignum.c, it's not too bad to follow. I can't read C either, but you can cleary see how it loops over the individual digits.
You could read the source for bignum.c...
At a very high level, without going into any implementation details, bignums are calculated "by hand" like you used to do in grade school. Now, there are certainly many optimizations that can be applied, but that's the gist of it.
I don't know of the implementation details so I'll cover how a basic Big Number implementation would work.
Basically instead of relying on CPU "integers" it will create it's own using multiple CPU integers. To store arbritrary precision, well lets say you have 2 bits. So the current integer is 11. You want to add one. In normal CPU integers, this would roll over to 00
But, for big number, instead of rolling over and keeping a "fixed" integer width, it would allocate another bit and simulate an addition so that the number becomes the correct 100.
Try looking up how binary math can be done on paper. It's very simple and is trivial to convert to an algorithm.
Beaconaut APICalc 2 just released on Jan.18, 2011, which is an arbitrary-precision integer calculator for bignum arithmetic, cryptography analysis and number theory research......
http://www.beaconaut.com/forums/default.aspx?g=posts&t=13
It uses the Bignum class
irb(main):001:0> (999**999).class
=> Bignum
Rdoc is available of course

Resources