I have a Rails app that processes some data. Some of this data includes numbers with decimals, such as 1.9943, and divisions between these numbers and other integers. I wanted to know the best way to store this.
I thought of storing the numbers that stay whole as integers and the numbers that can become decimals as decimals. Although the decimals come back in a weird format like
#<BigDecimal:7fda470aa9f0,'0.197757E1',18(18)>
it seems to perform the correct arithmetic when I divide two decimal numbers, or a decimal by an integer. When I divide integers by integers, though, it doesn't work correctly. I thought Rails would automatically convert the result into a proper decimal, but it keeps it as an integer and truncates the remainder. Is there anything I can do about this?
And what would be the best way to store this type of information? Should I store it all as decimals, or maybe floats?
If you want to divide two integers without losing precision, you need to cast one of them to a Float or BigDecimal first:
irb(main):007:0> 2/3
=> 0
irb(main):008:0> Float(2)/3
=> 0.666666666666667
I am a bit confused when you say that you get different results dividing Float/Integer vs. Integer/Float - these should give the same result:
irb(main):010:0> Integer(2)/Float(3)
=> 0.666666666666667
irb(main):011:0> Float(2)/Integer(3)
=> 0.666666666666667
irb(main):012:0> String( BigDecimal('2')/3 )
=> "0.666666666666666666666666666666666666666666666666666667E0"
irb(main):013:0> String( 2/BigDecimal('3') )
=> "0.666666666666666666666666666666666666666666666666666667E0"
Can you provide a code example?
As far as storage goes, any integer data should be stored as an Integer regardless of its expected use in future calculations.
Storing Floats vs. BigDecimals depends on how much precision you require. If you don't require much precision, a Float will provide a double-precision representation. If you require a high degree of precision, BigDecimal will provide an arbitrary-precision representation.
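For example, here is a quick sketch of the difference (BigDecimal#div's second argument is the number of significant digits to keep; output shown as on recent Rubies):

require 'bigdecimal'

# Float: fixed double precision, roughly 15-17 significant digits
1.0 / 3                     #=> 0.3333333333333333

# BigDecimal: arbitrary precision - ask for as many digits as you need
BigDecimal('1').div(3, 20)  #=> 0.33333333333333333333e0 (20 significant digits)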
Related: Ruby Numbers - Explains the difference between Integers, Floats, BigDecimals, and Rationals
You don't need big precision to have problems with floats, and you should avoid floats as much as possible.
For example, 123.6 - 123 with floats gives you 0.5999..., while with BigDecimal you get 0.6.
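A minimal snippet reproducing that (the exact Float output may differ in its last digits by platform):

require 'bigdecimal'

123.6 - 123                #=> 0.5999999999999943
BigDecimal('123.6') - 123  #=> 0.6 (a BigDecimal; inspects as 0.6e0 on recent Rubies)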
Here's what's bringing me trouble:
irb> (0.5).round # => 1 YES
irb> (0.075).round(2) # => 0.08 YES
irb> (9.075).round(2) # => 9.07 WHY???
What is going on? How come the result isn't 9.08?
Floating point is tricky. The decimal 9.075 can't be exactly represented as a float. This isn't specific to ruby.
The rounding algorithm in most cases (not including nans and the like) works by multiplying the number by the appropriate power of 10, rounding, and then dividing by that same number. That multiplying by 10 results in some loss of precision.
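You can reproduce the precision loss by doing the multiply/round/divide steps by hand (a sketch of the idea, not the actual C implementation):

x = 9.075
scaled = x * 100      #=> 907.4999999999999  <- precision lost here
scaled.round          #=> 907
scaled.round / 100.0  #=> 9.07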
Floating-point numbers cannot represent all numbers precisely. I advise reading Floating Point - Representable numbers, conversion and rounding.
Since Ruby 2.2 you can use prev_float and next_float to see the closest representable floating-point values to a given number:
9.075.prev_float
#=> 9.074999999999998
9.075.next_float
#=> 9.075000000000001
As you can see, the closest representable neighbours are 9.074999999999998 and 9.075000000000001; the double actually stored for the literal 9.075 lies slightly below the exact decimal 9.075, and that is why 9.075 rounds down to 9.07.
This has to do with the way Ruby handles Float. If you convert to Rational, it rounds to the closest number:
(0.075).to_r.round(2)  #=> (7/100)
9.075.to_r.round(2)    #=> (907/100)
More details on the floating-point logic (the logic used internally to store floats): https://en.wikipedia.org/wiki/Floating_point
I'm using ruby's Rational library to convert the width & height of images to aspect ratios.
I've noticed that string arguments are treated differently than numeric arguments.
>> Rational('1.91','1')
=> (191/100)
>> Rational(1.91,1)
=> (8601875288277647/4503599627370496)
>> RUBY_VERSION
=> "2.1.5"
>> RUBY_ENGINE
=> "ruby"
FYI 1.91:1 is an aspect ratio recommended by Facebook for images on their platform.
Values like 191 and 100 are much more convenient to store in my database than 8601875288277647 and 4503599627370496. But I'd like to understand where this difference originates before deciding which approach to use.
The Rational test suite doesn't seem to cover this exact case.
Disclaimer: This is only an educated guess, based on some knowledge on how to implement such a feat.
As Kent Dahl already said, Floats are not precise; they have a fixed precision, which means 1.91 is really 1.910000000000000000001 or something like that, which Ruby "knows" should be displayed as 1.91.
"1.91" on the other hand is a string, basically an array of characters: '1', '.', '9', '1'.
This said, here is what you need to do to build the rational out of floats:
Get rid of the . (mathematically: multiply the numerator and denominator by 10^x, i.e. multiply by ten as many times as there are digits behind the .)
Find the greatest common divisor (GCD)
Divide the numerator and denominator by the GCD
Step 1, however, is a little different for Float and String:
For the Float, we will have to multiply by 10^x, where x is (because of the precision) not 2 (as one would think with 1.91), but something more like 16 (remember: 1.9100...1).
For the String, we COULD cast it into a Float and do the same trick, but hey, there is an easier way: we just count the number of digits behind the dot (which is 2), remove the dot, and multiply the denominator by 10^2. This is not only the easier but also the more precise way.
The big numbers might disappear again when applying step 3; that's why you will not always get those strange results when dealing with rationals built from floats.
TL;DR: The numbers are built differently depending on whether the argument is a String or a Float. Floats can produce long-ass numbers, because precision.
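Here is a minimal sketch of the string path described above (hand-rolled for illustration; Ruby's real implementation is in C, and Rational(a, b) already reduces by the GCD for you):

# Build a Rational from a decimal string by counting the digits after the dot
def rational_from_string(str)
  int_part, frac_part = str.split('.')
  numerator   = (int_part + frac_part).to_i  # "1.91" -> 191
  denominator = 10**frac_part.length         # 2 digits -> 100
  g = numerator.gcd(denominator)             # steps 2 and 3: reduce the fraction
  Rational(numerator / g, denominator / g)
end

rational_from_string('1.91')  #=> (191/100)
1.91.to_r                     #=> (8601875288277647/4503599627370496), the float path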
The Float 1.91 is stored as a double, which has a given amount of precision limited by its binary representation. The equivalent Rational object retains this precision as far as possible, so it is huge. There is no way of storing 1.91 exactly in a double, but the value you get is close enough for most uses.
As for the String, it represents a different value - the exact value of 1.91 - and as you create a Rational it retains it better. It is more correct than the Float, but takes longer to use for calculations.
This is similar to the problem with 1.0/3 as it "goes on forever" 0.333333...etc, but Rational can represent it exactly.
I'm trying to convince the person behind a Ruby lib dealing with money to use BigDecimal, not Float.
This library explicitly only supports two decimals of precision. It takes a float input (e.g. 12.34), turns it into a string (12.34.to_s # => "12.34") and sends it to an API. It then gets a string amount back ("56.78") that it turns into a float ("56.78".to_f # => 56.78).
I can easily reproduce Float rounding issues doing math on two-decimal floats (e.g. 139.25 + 74.79 # => 214.04000000000002).
But assume you don't do math. You only turn a two-decimal number represented as a Float into a string and back again. Is Ruby's Float then guaranteed to be reliable, or can you think of any case where it is not?
You can easily exceed floating point precision for larger numbers:
"100000000000000.01".to_f #=> 100000000000000.02
ree-1.8.7-2010.02 :003 > (10015.8*100.0).to_i
=> 1001579
ree-1.8.7-2010.02 :004 > 10015.8*100.0
=> 1001580.0
ree-1.8.7-2010.02 :005 > 1001580.0.to_i
=> 1001580
ruby 1.8.7 produces the same.
Does anybody know how to eradicate this heresy? =)
Actually, all of this makes sense.
Because 0.8 cannot be represented exactly by any series of 1 / 2 ** x for various x, it must be represented approximately, and it happens that this is slightly less than 10015.8.
So, when you just print it, it is rounded reasonably.
When you convert it to an integer without adding 0.5 first, it truncates 1001579.9999... down to 1001579.
When you type in 1001580.0, well, that has an exact representation in all formats, including float and double. So you don't see the truncation of a value ever so slightly less than the next integral step.
Floating point is not inaccurate, it just has limitations on what can be represented. Yes, FP is perfectly accurate, but it cannot necessarily represent every number we can easily type in using base 10. (Update/clarification: well, ironically, it can represent every integer exactly, up to the 2**53 limit of the mantissa, because integers have a 2 ** x composition, but "every fraction" is another story. Only certain decimal fractions can be exactly composed using a 1/2**x series.)
In fact, JavaScript implementations use floating point storage and arithmetic for all numeric values. This is because FP hardware produces exact results for integers, so this got the JS guys 52-bit math using existing hardware on (at the time) almost-entirely 32-bit machines.
Due to truncation error in float calculation, 10015.8*100.0 is actually calculated as 1001579.999999... So if you simply apply to_i, it cuts off the decimal part and returns 1001579
http://en.wikipedia.org/wiki/Floating_point#Accuracy_problems
>> sprintf("%.16f", 10015.8*100.0)
=> "1001579.9999999999000000"
And Float#to_i truncates this to 1001579.
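Two common workarounds (a sketch; round if you want the nearest integer, or switch to exact decimal arithmetic):

require 'bigdecimal'

(10015.8 * 100.0).round            #=> 1001580, round instead of truncating
(BigDecimal('10015.8') * 100).to_i #=> 1001580, exact decimal arithmetic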
I am writing code that will deal with currencies, charges, etc. I am going to use the BigDecimal class for math and storage, but we ran into something weird with it.
This statement:
1876.8 == BigDecimal('1876.8')
returns false.
If I run those values through a formatting string "%.13f" I get:
"%.20f" % 1876.8 => 1876.8000000000000
"%.20f" % BigDecimal('1876.8') => 1876.8000000000002
Note the extra 2 from the BigDecimal at the last decimal place.
I thought BigDecimal was supposed to counter the inaccuracies of storing real numbers directly in the native floating point of the computer. Where is this 2 coming from?
It won't give you as much control over the number of decimal places, but the conventional format mechanism for BigDecimal appears to be:
a.to_s('F')
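For example (the default to_s format varies by Ruby version; older versions print 0.18768E4):

require 'bigdecimal'

a = BigDecimal('1876.8')
a.to_s('F')  #=> "1876.8"
a.to_s       #=> "0.18768e4" on recent Rubies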
If you need more control, consider using the Money gem, assuming your domain problem is mostly about currency.
gem install money
You are right, BigDecimal should be storing it correctly, my best guess is:
BigDecimal is storing the value correctly
When passed to a string formatting function, BigDecimal is being cast as a lower precision floating point value, creating the ...02.
When compared directly with a Float, the Float has extra digits far beyond the 13 you see (the classic "floats can't be compared for equality" behavior).
Either way, you are unlikely to get accurate results comparing a float to a BigDecimal.
Don't compare FPU decimal string fractions for equality
The problem is that an equality comparison of a float or double value with a decimal constant that contains a fraction is rarely successful.
Very few decimal string fractions have exact values in the binary FP representation, so equality comparisons are usually doomed.*
To answer your exact question, the 2 is coming from a slightly different conversion of the decimal string fraction into the Float format. Because the fraction cannot be represented exactly, it's possible that two computations will consider different amounts of precision in intermediate calculations and ultimately end up rounding the result to a 52-bit IEEE 754 double precision mantissa differently. It hardly matters because there is no exact representation anyway, but one is probably more wrong than the other.
In particular, your 1876.8 cannot be represented exactly by an FP object. In fact, between 0.01 and 0.99, only 0.25, 0.50, and 0.75 have exact binary representations. All the others, including 1876.8, repeat forever and are rounded to 52 bits. This is about half of the reason that BigDecimal even exists. (The other half of the reason is the fixed precision of FP data: sometimes you need more.)
So, the result that you get when comparing an actual machine value with a decimal string constant depends on every single bit in the binary fraction ... down to 1/2**52 ... and even then requires rounding.
If there is anything even the slightest bit (hehe, bit, sorry) imperfect about the process that produced the number, or the input conversion code, or anything else involved, they won't look exactly equal.
An argument could even be made that the comparison should always fail because no IEEE-format FPU can even represent that number exactly. They really are not equal, even though they look like it. On the left, your decimal string has been converted to a binary string, and most of the numbers just don't convert exactly. On the right, it's still a decimal string.
So don't mix floats with BigDecimal, just compare one BigDecimal with another BigDecimal. (Even when both operands are floats, testing for equality requires great care or a fuzzy test. Also, don't trust every formatted digit: output formatting will carry remainders way off the right side of the fraction, so you don't generally start seeing zeroes, you will just see garbage values.)
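If you do have to compare two floats, a fuzzy test looks something like this (the tolerance here is an arbitrary choice for the sketch):

def roughly_equal?(a, b, epsilon = 1e-9)
  (a - b).abs < epsilon
end

(139.25 + 74.79) == 214.04              #=> false
roughly_equal?(139.25 + 74.79, 214.04)  #=> true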
*The problem: machine numbers are x/2**n, but decimal constants are x/(2**n * 5**m). Your value as sign, exponent, and mantissa is the infinitely repeating 0 10000001001 1101010100110011001100110011001100110011001100110011... Ironically, FP arithmetic is perfectly precise and equality comparisons work perfectly well when the value has no fraction.
as David said, BigDecimal is storing it right
p (BigDecimal('1876.8') * 100000000000000).to_i
returns 187680000000000000
so, yes, the string formatting is ruining it
If you don't need fractional cents, consider storing and manipulating the currency as an integer, then dividing by 100 when it's time to display. I find that easier than dealing with the inevitable precision issues of storing and manipulating in floating point.
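A sketch of that approach (the variable names are made up for illustration):

price_cents = 187_680                # 1876.80 stored as 187680 cents
format('%.2f', price_cents / 100.0)  #=> "1876.80" (floats only at display time)
139_25 + 74_79                       #=> 21404 cents, i.e. 214.04, computed exactly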
On Mac OS X, I'm running ruby 1.8.7 (2008-08-11 patchlevel 72) [i686-darwin9]
irb(main):004:0> 1876.8 == BigDecimal('1876.8')
=> true
However, being Ruby, I think you should think in terms of messages sent to objects. What does this return to you:
BigDecimal('1876.8') == 1876.8
The two aren't equivalent, and if you're trying to use BigDecimal's ability to determine precise decimal equality, it should be the receiver of the message asking about the equality.
For the same reason I don't think formatting the BigDecimal by sending a format message to the format string is the right approach either.