Ruby JSON: parse non-integer numbers as strings (or BigDecimal)

Is there any way to tell the default Ruby JSON library to parse non-integer numeric values as strings (or BigDecimal) instead of as Floats?
i.e., JSON.parse('{"foo": 123.45}')['foo'].class outputs Float, which may lead to precision issues.
PS: the oj library supports loading these values as BigDecimals.
PS 2: it seems there isn't: https://github.com/flori/json/blob/76f41a84e2bace20c3076aba53887537e37dfdb2/lib/json/pure/parser.rb#L196
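Update for readers on newer versions: recent releases of the json gem (2.1.0 and later, if I recall the version correctly) appear to accept a decimal_class: option to JSON.parse that does exactly this:

```ruby
require 'json'
require 'bigdecimal'

# decimal_class tells the parser which class to build for
# non-integer numbers instead of Float.
value = JSON.parse('{"foo": 123.45}', decimal_class: BigDecimal)['foo']
value.class             # => BigDecimal
value == BigDecimal("123.45")  # => true
```

Check your json gem version before relying on this; older bundled versions silently ignore unknown options.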

In theory JSON as a container could hold highly precise numbers, but in practice one end is generally limited to IEEE 754 double precision floating point numbers as that is what JavaScript itself is limited to. Any precision loss will already be incurred if the values are encoded in JavaScript or almost any JSON implementation.
Hence, converting to BigDecimal from the parsed Float will almost always result in no additional loss of precision:
require 'bigdecimal'
require 'json'

data = JSON.parse("[1.025]")
# Float can't represent most decimal values exactly, so `round` goes the wrong way
data.first.round(2) # => 1.02
# Converting to BigDecimal improves the precision of future operations
BigDecimal(data.first.to_s).round(2).to_s("F") # => "1.03"
You are much better off transporting your highly precise values as strings.
Lastly, if you really need to, Ruby libraries can always be monkey-patched to behave how you want.

Related

Convert string currency to float with ruby

I have the following string:
"1.273,08"
And I need to convert to float and the result must be:
1273.08
I tried some code using gsub but I couldn't solve this.
How can I do this conversion?
You have already received two good answers how to massage your String into your desired format using String#delete and String#tr.
But there is a deeper problem.
The decimal value 1273.08₁₀ cannot be accurately represented as an IEEE 754-2019 / ISO/IEC 60559:2020 binary64 floating point value.
Just like the value 1/3rd can easily be represented in ternary (0.1₃) but has an infinite representation in decimal (0.33333333…₁₀, i.e. 0.[3]…₁₀) and thus cannot be accurately represented, the value 8/100th can easily be represented in decimal (0.08₁₀) but has an infinite representation in binary (0.0001010001111010111000010100011110101110000101…₂, i.e. 0.[00010100011110101110]…₂). In other words, it is impossible to express 1273.08₁₀ as a Ruby Float.
And that's not specific to Ruby, or even to programming, that is just basic high school maths: you cannot represent this number in binary, period, just like you cannot represent 1/3rd in decimal, or π in any integer base.
And of course, computers don't have infinite memory, so not only does 1273.08₁₀ have an infinite representation in binary, but as a Float, it will also be cut off after 64 bits. The closest possible value to 1273.08₁₀ as an IEEE 754-2019 / ISO/IEC 60559:2020 binary64 floating point value is 1273.079 999 999 999 927 240 423 858 17₁₀, which is less than 1273.08₁₀.
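You can see this from Ruby itself by asking printf for more decimal places than Float#to_s shows:

```ruby
# Thirteen decimal places are enough to expose the nearest binary64
# value, which sits slightly below 1273.08:
"%.13f" % 1273.08  # => "1273.0799999999999"
```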
That is why you should never represent money using binary numbers: everybody will expect it to be decimal, not binary; if I write a cheque, I write it in decimal, not binary. People will expect that it is impossible to represent $ 1/3rd, they will expect that it is impossible to represent $ π, but they will not expect and not accept that if they put $ 1273.08 into their account, they will actually end up with slightly less than that.
The correct way to represent money would be to use a specialized Money datatype, or at least using the bigdecimal library from the standard library:
require 'bigdecimal'
BigDecimal('1.273,08'.delete('.').tr(',', '.'))
#=> 0.127308e4
I would do
"1.273,08".delete('.') # delete '.' from the string
.tr(',', '.') # replace ',' with '.'
.to_f # translate to float
#=> 1273.08
So, assuming . is used as the thousands separator and , as the decimal separator:
str = "1.273,08"
str.gsub('.','').gsub(',', '.').to_f
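If the values are money, the same string massaging can feed BigDecimal instead of to_f, so binary floating point never enters the picture. A small sketch (the helper name is my own, not from the answers):

```ruby
require 'bigdecimal'

# Hypothetical helper: parse a "1.273,08"-style amount straight
# into a BigDecimal, avoiding Float entirely.
def parse_money(str)
  BigDecimal(str.delete('.').tr(',', '.'))
end

parse_money("1.273,08").to_s("F")  # => "1273.08"
```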

Go Protobuf Precision Decimals

What is the correct scalar type to use in my protobuf definition file, if I want to transmit an arbitrary-precision decimal value?
I am using shopspring/decimal instead of a float64 in my Go code to prevent math errors. When writing my protobuf file with the intention of transmitting these values over gRPC, I could use:
double which translates to a float64
string which would certainly be precise in its own way but strikes me as clunky
Something like decimal from mgravell/protobuf-net?
Conventional wisdom has taught me to skirt floats in monetary applications, but I may be over-careful since it's a point of serialization.
If you really need arbitrary precision, I fear there is no correct answer right now. There is https://github.com/protocolbuffers/protobuf/issues/4406 open, but it does not seem to be very active. Without built-in support, you will really need to perform the serialization manually and then use either string or bytes to store the result. Which one to use between string and bytes likely depends on whether you need cross-platform/cross-library compatibility: if you need compatibility, use string and parse the decimal representation in the string using the appropriate arbitrary precision type in the reader; if you don't need it and you're going to read the data using the same cpu architecture and library you can probably just use the binary serialization provided by that library (MarshalBinary/UnmarshalBinary) and use bytes.
On the other hand, if you just need to send monetary values with an appropriate precision and do not need arbitrary precision, you can probably just use sint64/uint64 and use an appropriate unit (these are commonly called fixed-point numbers). To give an example, if you need to represent a monetary value in dollars with 4 decimal digits, your unit would be 1/10000th of a dollar so that e.g. the value 1 represents $0.0001, the value 19900 represents $1.99, -500000 represents $-50, and so on. With such a unit you can represent the range $-922,337,203,685,477.5808 to $922,337,203,685,477.5807 - that should likely be sufficient for most purposes. You will still need to perform the scaling manually, but it should be fairly trivial and portable. Given the range above, I would suggest that sint64 is preferable, as it also allows you to represent negative values; uint64 should be considered only if you need the extra positive range and don't need negative values.
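The scaling scheme is language-agnostic; here is a sketch of the same idea in Ruby (the Go side would do the equivalent arithmetic before filling the sint64 field; the helper names and UNIT choice are mine):

```ruby
require 'bigdecimal'

UNIT = 10_000  # 1/10000th of a dollar, as in the example above

# Scale a decimal dollar amount to integer units for the wire...
def dollars_to_units(str)
  (BigDecimal(str) * UNIT).to_i
end

# ...and back to a decimal string after receiving.
def units_to_dollars(units)
  (BigDecimal(units) / UNIT).to_s("F")
end

dollars_to_units("1.99")  # => 19900
dollars_to_units("-50")   # => -500000
units_to_dollars(19900)   # => "1.99"
```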
Alternatively, if you don't mind importing another package, you may want to take a look at https://github.com/googleapis/googleapis/blob/master/google/type/money.proto or https://github.com/googleapis/googleapis/blob/master/google/type/decimal.proto (that incidentally implement something very similar to the two models described above), and the related utility functions at https://pkg.go.dev/github.com/googleapis/go-type-adapters/adapters
As a side note, you are completely correct that you should almost never use floating point for monetary values.

Why does ruby BigDecimal show representation inaccuracy similar to float?

Certain floating point numbers have inherent inaccuracy from binary floating point representation:
> puts "%.50f" % (0.5) # cleanly representable
0.50000000000000000000000000000000000000000000000000
> puts "%.50f" % (0.1) # not cleanly representable
0.10000000000000000555111512312578270211815834045410
This is nothing new. But why does ruby's BigDecimal also show this behaviour?
> puts "%.50f" % ("0.1".to_d)
0.10000000000000000555111512312578270211815834045410
(I'm using the rails shorthand .to_d instead of BigDecimal.new for brevity only, this is not a rails specific question.)
Question: Why is "0.1".to_d still showing errors on the order of 10⁻¹⁷? I thought the purpose of BigDecimal was expressly to avoid inaccuracies like that?
At first I thought this was because I was converting an already inaccurate floating point 0.1 to BigDecimal, and BigDecimal was just losslessly representing the inaccuraccy. But I made sure I was using the string constructor (as in the snippet above), which should avoid the problem.
EDIT:
A bit more investigation shows that BigDecimal does still internally represent things cleanly. (Obvious, because otherwise this would be a huge bug in a very widely used system.) Here's an example with an operation that would still show error:
> puts "%.50f" % ("0.1".to_d * "10".to_d)
1.00000000000000000000000000000000000000000000000000
If the representation were lossy, that would show the same error as above, just shifted by an order of magnitude. What is going on here?
The %.50f specifier takes a floating point value, so that decimal value needs to be converted to floating point before it's rendered for display, and as such is subjected to the same floating point noise you get in ordinary floating point values.
sprintf and friends, like the String#% method, do conversions automatically depending on the type specified in the placeholder.
To suppress that you'd have to use the .to_s method on the BigDecimal number directly. It can take an optional format specifier if you want a certain number of places, and this can be chained in to a %s placeholder in your other string.
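To illustrate the difference: %f forces a round trip through Float, while BigDecimal#to_s renders the decimal value directly:

```ruby
require 'bigdecimal'

d = BigDecimal("0.1")
# %f converts the BigDecimal to Float first, reintroducing binary noise:
"%.50f" % d  # => "0.10000000000000000555111512312578270211815834045410"
# to_s("F") formats the exact decimal value, no Float involved:
d.to_s("F")  # => "0.1"
```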

Are two-decimal Ruby Floats guaranteed accurate if you turn them into strings and back again?

I'm trying to convince the person behind a Ruby lib dealing with money to use BigDecimal, not Float.
This library explicitly only supports two decimals of precision. It takes a float input (e.g. 12.34), turns it into a string (12.34.to_s # => "12.34") and sends it to an API. It then gets a string amount back ("56.78") that it turns into a float ("56.78".to_f # => 56.78).
I can easily reproduce Float rounding issues doing math on two-decimal floats (e.g. 139.25 + 74.79 # => 214.04000000000002).
But assume you don't do math. You only turn a two-decimal number represented as a Float into a string and back again. Is Ruby's Float then guaranteed to be reliable, or can you think of any case where it is not?
You can easily exceed floating point precision for larger numbers:
"100000000000000.01".to_f #=> 100000000000000.02

Why is BigDecimal returning a weird value?

I am writing code that will deal with currencies, charges, etc. I am going to use the BigDecimal class for math and storage, but we ran into something weird with it.
This statement:
1876.8 == BigDecimal('1876.8')
returns false.
If I run those values through the formatting string "%.13f" I get:
"%.13f" % 1876.8 => 1876.8000000000000
"%.13f" % BigDecimal('1876.8') => 1876.8000000000002
Note the extra 2 from the BigDecimal at the last decimal place.
I thought BigDecimal was supposed to counter the inaccuracies of storing real numbers directly in the native floating point of the computer. Where is this 2 coming from?
It won't give you as much control over the number of decimal places, but the conventional format mechanism for BigDecimal appears to be:
a.to_s('F')
If you need more control, consider using the Money gem, assuming your domain problem is mostly about currency.
gem install money
You are right, BigDecimal should be storing it correctly, my best guess is:
BigDecimal is storing the value correctly
When passed to a string formatting function, BigDecimal is being cast as a lower precision floating point value, creating the ...02.
When compared directly with a float, the float has extra decimal places far beyond the digits you see (the classic "floats can't be compared for equality" behavior).
Either way, you are unlikely to get accurate results comparing a float to a BigDecimal.
Don't compare FPU decimal string fractions for equality
The problem is that the equality comparison of a floating or double value with a decimal constant that contains a fraction is rarely successful.
Very few decimal string fractions have exact values in the binary FP representation, so equality comparisons are usually doomed.*
To answer your exact question, the 2 is coming from a slightly different conversion of the decimal string fraction into the Float format. Because the fraction cannot be represented exactly, it's possible that two computations will consider different amounts of precision in intermediate calculations and ultimately end up rounding the result to a 52-bit IEEE 754 double precision mantissa differently. It hardly matters because there is no exact representation anyway, but one is probably more wrong than the other.
In particular, your 1876.8 cannot be represented exactly by an FP object; in fact, between 0.01 and 0.99, only 0.25, 0.50, and 0.75 have exact binary representations. All the others, including 1876.8, repeat forever and are rounded to 52 bits. This is about half of the reason that BigDecimal even exists. (The other half of the reason is the fixed precision of FP data: sometimes you need more.)
So, the result that you get when comparing an actual machine value with a decimal string constant depends on every single bit in the binary fraction ... down to 1/2⁵² ... and even then requires rounding.
If there is anything even the slightest bit (hehe, bit, sorry) imperfect about the process that produced the number, or the input conversion code, or anything else involved, they won't look exactly equal.
An argument could even be made that the comparison should always fail because no IEEE-format FPU can even represent that number exactly. They really are not equal, even though they look like it. On the left, your decimal string has been converted to a binary string, and most of the numbers just don't convert exactly. On the right, it's still a decimal string.
So don't mix floats with BigDecimal, just compare one BigDecimal with another BigDecimal. (Even when both operands are floats, testing for equality requires great care or a fuzzy test. Also, don't trust every formatted digit: output formatting will carry remainders way off the right side of the fraction, so you don't generally start seeing zeroes, you will just see garbage values.)
*The problem: machine numbers are x/2ⁿ, but decimal constants are x/(2ⁿ · 5ᵐ). Your value as sign, exponent, and mantissa is the infinitely repeating 0 10000001001 1101010100110011001100110011001100110011001100110011... Ironically, FP arithmetic is perfectly precise and equality comparisons work perfectly well when the value has no fraction.
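If you must compare across the two types, an explicit conversion plus a tolerance is safer than ==. A sketch (the helper name and tolerance are my own, not from this answer):

```ruby
require 'bigdecimal'

# Compare a Float against a BigDecimal within an explicit tolerance,
# converting the Float via its decimal approximation first.
def approx_equal?(float, dec, eps = BigDecimal("1e-9"))
  (BigDecimal(float, Float::DIG) - dec).abs < eps
end

approx_equal?(1876.8, BigDecimal("1876.8"))  # => true
```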
As David said, BigDecimal is storing it right:
require 'bigdecimal'
p (BigDecimal('1876.8') * 100_000_000_000_000).to_i
returns 187680000000000000, so yes, the string formatting is ruining it.
If you don't need fractional cents, consider storing and manipulating the currency as an integer, then dividing by 100 when it's time to display. I find that easier than dealing with the inevitable precision issues of storing and manipulating in floating point.
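The integer-cents approach from this answer, sketched:

```ruby
# Store money as integer cents; format only at display time.
price_cents = 139_25 + 74_79      # => 21404, no Float drift
dollars, cents = price_cents.divmod(100)
format("%d.%02d", dollars, cents) # => "214.04"
```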
On Mac OS X, I'm running ruby 1.8.7 (2008-08-11 patchlevel 72) [i686-darwin9]:
irb(main):004:0> 1876.8 == BigDecimal('1876.8')
=> true
However, being Ruby, I think you should think in terms of messages sent to objects. What does this return to you:
BigDecimal('1876.8') == 1876.8
The two aren't equivalent, and if you're trying to use BigDecimal's ability to determine precise decimal equality, it should be the receiver of the message asking about the equality.
For the same reason I don't think formatting the BigDecimal by sending a format message to the format string is the right approach either.