Go Protobuf Precision Decimals

What is the correct scalar type to use in my protobuf definition file, if I want to transmit an arbitrary-precision decimal value?
I am using shopspring/decimal instead of a float64 in my Go code to prevent math errors. When writing my protobuf file with the intention of transmitting these values over gRPC, I could use:
double, which translates to a float64
string, which would certainly be precise in its own way but strikes me as clunky
something like decimal from mgravell/protobuf-net?
Conventional wisdom has taught me to avoid floats in monetary applications, but I may be over-cautious here, since this is only a question of serialization.

If you really need arbitrary precision, I fear there is no correct answer right now. There is https://github.com/protocolbuffers/protobuf/issues/4406 open, but it does not seem to be very active. Without built-in support, you will need to perform the serialization manually and then use either string or bytes to store the result. The choice between string and bytes likely depends on whether you need cross-platform/cross-library compatibility: if you do, use string and have the reader parse the decimal representation into its own arbitrary-precision type; if you don't, and you will read the data using the same library (and CPU architecture), you can probably just use the binary serialization provided by that library (MarshalBinary/UnmarshalBinary) and use bytes.
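To make the string option concrete, here is a minimal Go sketch, using shopspring/decimal as in the question; the toWire/fromWire helper names (and the assumption that the generated message exposes a plain string field) are mine:

package main

import (
    "fmt"

    "github.com/shopspring/decimal"
)

// toWire renders the decimal in its canonical decimal string form,
// suitable for storing in a proto string field; lossless and portable.
func toWire(d decimal.Decimal) string {
    return d.String() // e.g. "19.99"
}

// fromWire parses the proto string field back into a decimal on the reader side.
func fromWire(s string) (decimal.Decimal, error) {
    return decimal.NewFromString(s)
}

func main() {
    d := decimal.RequireFromString("19.99")
    wire := toWire(d)
    back, err := fromWire(wire)
    if err != nil {
        panic(err)
    }
    fmt.Println(wire, back.Equal(d)) // 19.99 true
}

For the bytes variant you would instead call d.MarshalBinary() and store the result in a bytes field.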
On the other hand, if you just need to send monetary values with a fixed, appropriate precision and do not need arbitrary precision, you can probably just use sint64/uint64 with an appropriate unit (this is commonly called a fixed-point number). For example, if you need to represent a monetary value in dollars with 4 decimal digits, your unit would be 1/10000th of a dollar, so that the value 1 represents $0.0001, 19900 represents $1.99, -500000 represents $-50, and so on. With such a unit you can represent the range $-922,337,203,685,477.5808 to $922,337,203,685,477.5807, which should be sufficient for most purposes. You will still need to perform the scaling manually, but it should be fairly trivial and portable. Given the range above, sint64 is preferable, as it also allows you to represent negative values; consider uint64 only if you need the extra positive range and will never need negative values.
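A sketch of that scaling in Go, again using shopspring/decimal on the application side (the helper names and the strict "reject excess digits" policy are my own choices):

package main

import (
    "errors"
    "fmt"

    "github.com/shopspring/decimal"
)

const scale = 4 // 4 decimal digits -> unit of $0.0001

// toFixed converts a decimal dollar amount into the sint64 wire value,
// refusing values that do not fit the chosen scale exactly.
func toFixed(d decimal.Decimal) (int64, error) {
    scaled := d.Shift(scale) // multiply by 10^scale
    if !scaled.Equal(scaled.Truncate(0)) {
        return 0, errors.New("value has more than 4 decimal places")
    }
    return scaled.IntPart(), nil
}

// fromFixed converts the wire value back into a decimal dollar amount.
func fromFixed(v int64) decimal.Decimal {
    return decimal.New(v, -scale) // v * 10^-scale
}

func main() {
    v, _ := toFixed(decimal.RequireFromString("1.99"))
    fmt.Println(v, fromFixed(v)) // 19900 1.99
}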
Alternatively, if you don't mind importing another package, you may want to take a look at https://github.com/googleapis/googleapis/blob/master/google/type/money.proto or https://github.com/googleapis/googleapis/blob/master/google/type/decimal.proto (which, incidentally, implement something very similar to the two models described above), and the related utility functions at https://pkg.go.dev/github.com/googleapis/go-type-adapters/adapters
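As a rough illustration of the money.proto model (it stores whole units plus nanos, i.e. billionths of a unit, with matching signs), here is a hedged Go sketch of the split; the helper name is mine, and the go-type-adapters package linked above provides ready-made equivalents:

package main

import (
    "fmt"

    "github.com/shopspring/decimal"
)

// toUnitsNanos splits d into whole units and nanos (units of 10^-9),
// both carrying the sign of d, as google.type.Money expects.
// Digits beyond nine decimal places are truncated in this sketch.
func toUnitsNanos(d decimal.Decimal) (units int64, nanos int32) {
    units = d.IntPart()                  // truncates toward zero
    frac := d.Sub(decimal.New(units, 0)) // signed fractional remainder
    nanos = int32(frac.Shift(9).IntPart())
    return units, nanos
}

func main() {
    u, n := toUnitsNanos(decimal.RequireFromString("-50.75"))
    fmt.Println(u, n) // -50 -750000000
}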
As a side note, you are completely correct that you should almost never use floating point for monetary values.

Related

64 bit integer and 64 bit float homogeneous representation

Assume we have some sequence as input. For performance reasons we may want to convert it to a homogeneous representation, and to do that we try to convert every element to the same type. Let's consider only 2 input types here - int64 and float64 (in my example code I will use numpy and Python, but that is not the point of this question - think simply of 64-bit integers and 64-bit floats).
First we may try to cast everything to float64.
So we want an input like:
31 1.2 -1234
to be converted to float64. If the input were all int64 we could leave it unchanged ("already homogeneous"), or if something else were found we would return "not homogeneous". Pretty straightforward.
But here is the problem. Consider a slightly modified input:
31000000 1.2 -1234
The idea is clear - we need to check that our "caster" properly handles int64 values of large absolute value:
format(np.float64(31000000), '.0f') # just convert to float64 and print
'31000000'
Seems like no problem at all. So let's get straight to the point:
im = np.iinfo(np.int64).max # maximum of int64 type
format(np.float64(im), '.0f')
format(np.float64(im-100), '.0f')
'9223372036854775808'
'9223372036854775808'
Now this is really undesirable - we lose information that may be needed. That is, we want to preserve all the information provided in the input sequence.
So our im and im-100 values cast to the same float64 representation. The reason is clear - float64 has only a 53-bit significand out of its 64 bits. That gives it enough precision for log10(2^53) ≈ 15.95 decimal digits, i.e. roughly any 16-digit int64 can be represented without information loss, while an int64 can have up to 19 digits.
So we end up with a range of roughly [10^16; 10^19] (more precisely [2^53; int64.max]) in which an int64 may only be representable with information loss.
Q: What decision should one make in such a situation in order to represent int64 and float64 homogeneously?
I see several options for now:
Just convert the whole int64 range to float64 and "forget" about the possible information loss.
The motivation here is that "the majority of inputs will rarely contain int64 values > 10^16".
EDIT: This clause was misleading. In the clear formulation we do not consider such solutions (but I leave it here for completeness).
Do not make such automatic conversions at all; convert only if explicitly requested.
That is, we accept the performance drawbacks for all mixed int/float arrays, even ones as simple as the first case.
Calculate the threshold below which conversion to float64 is lossless, and use it when making the casting decision: if an int64 above this threshold is found, do not convert (return "not homogeneous"). See the Go sketch after this list.
We have effectively computed this threshold already: it is 2^53 (log10(2^53) ≈ 16 decimal digits).
Create a new type, "fint64". This is an exotic option, but I'm considering even this one for completeness.
The motivation consists of 2 points. First, it is a frequent situation that a user wants to store int and float values together. Second, the structure of the float64 type: I don't quite understand why one needs a value range of ~308 digits when the significand holds only ~16 of them and the other ~292 are essentially noise. So we might use one of the float64 exponent bits to indicate whether a float or an int is stored. For int64, losing 1 bit would be a definite drawback, since it halves the integer range, but we would gain the ability to store ints alongside floats without any additional overhead.
EDIT: While I initially thought of this as an "exotic" option, it is really just a variant of another alternative - a composite type for our representation (see option 5). I should add that my first composition has a definite drawback: it loses some range for both float64 and int64. What we should rather do is not subtract 1 bit but add one, as a flag indicating whether an int or a float is stored in the following 64 bits.
As #Brendan proposed, one may use a composite type consisting of "a combination of 2 or more primitive types". Using additional primitives we can, for example, cover the "problem" range of int64 and get a homogeneous representation in this "new" type.
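For option 3, the threshold check is small enough to show in full; a minimal Go sketch (the helper name is mine):

package main

import "fmt"

// 2^53: every integer of magnitude <= 2^53 converts to float64 exactly.
// The bound is conservative: some larger values (e.g. powers of two)
// are also exact, but below it the round trip is always lossless.
const maxExact = int64(1) << 53

// toFloat64Lossless converts v only when the conversion is guaranteed lossless.
func toFloat64Lossless(v int64) (float64, bool) {
    if v > maxExact || v < -maxExact {
        return 0, false // caller reports "not homogeneous"
    }
    return float64(v), true
}

func main() {
    fmt.Println(toFloat64Lossless(31000000))  // 3.1e+07 true
    fmt.Println(toFloat64Lossless(1<<62 + 1)) // 0 false
}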
EDITs:
Because questions arose here, I need to be very specific: the application in question does the following - it converts a sequence of int64 or float64 values to some homogeneous representation, losslessly if possible. Solutions are compared by performance (e.g. the total extra RAM needed for the representation). That is all; no other requirements are considered (because we should consider the problem in its minimal state, not write a whole application). Accordingly, any algorithm that represents our data homogeneously and losslessly (so we are sure no information was lost) fits the application.
I've decided to remove the words "app" and "user" from the question - they were also misleading.
When choosing a data type there are 3 requirements:
whether values may have different signs
needed precision
needed range
Of course hardware doesn't provide a lot of types to choose from; so you'll need to select the next largest provided type. For example, if you want to store values ranging from 0 to 500 with 8 bits of precision; then hardware won't provide anything like that and you will need to use either 16-bit integer or 32-bit floating point.
When choosing a homogeneous representation there are 3 requirements:
whether values may have different signs; determined from the requirements of all of the original types being represented
needed precision; determined from the requirements of all of the original types being represented
needed range; determined from the requirements of all of the original types being represented
For example, if you have integers from -10 to +10000000000 you need a 35-bit integer type; that doesn't exist, so you'll use a 64-bit integer. If you need floating point values from -2 to +2 with 31 bits of precision, you'll need a 33-bit floating point type; that doesn't exist either, so you'll use a 64-bit floating point type. From the requirements of these two original types you'll know that a homogeneous representation will need a sign flag, a 33-bit significand (with an implied bit), and a 1-bit exponent; that doesn't exist, so you'll use a 64-bit floating point type as the homogeneous representation.
However, if you don't know anything about the requirements of the original data types (and only know that, whatever the requirements were, they led to the selection of a 64-bit integer type and a 64-bit floating point type), then you have to assume "worst cases". This leads to needing a homogeneous representation that has a sign flag, 62 bits of precision (plus an implied bit), and an 11-bit exponent to cover float64's full range. Of course this 74-bit floating point type doesn't exist, so you need to select the next largest type.
Also note that sometimes there is no "next largest type" that hardware supports. When this happens you need to resort to "composed types" - a combination of 2 or more primitive types. This can include anything up to and including "big rational numbers" (numbers represented by 3 big integers in "numerator / divisor * (1 << exponent)" form).
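A Go sketch of one such composed type, essentially the question's option 4/5 (one discriminator plus the raw 64 value bits; the type and names are mine):

package main

import (
    "fmt"
    "math"
)

// num stores either an int64 or a float64 losslessly: a one-bit idea
// that in practice costs a flag byte (plus padding) per element.
type num struct {
    isFloat bool
    bits    uint64 // raw int64 value or float64 bit pattern
}

func fromInt(v int64) num     { return num{false, uint64(v)} }
func fromFloat(v float64) num { return num{true, math.Float64bits(v)} }

func (n num) String() string {
    if n.isFloat {
        return fmt.Sprintf("float64(%g)", math.Float64frombits(n.bits))
    }
    return fmt.Sprintf("int64(%d)", int64(n.bits))
}

func main() {
    seq := []num{fromInt(9223372036854775807), fromFloat(1.2)}
    fmt.Println(seq) // both values preserved exactly
}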
Of course if the original types (the 64-bit integer type and 64-bit floating point type) were primitive types and your homogeneous representation needs to use a "composed type", then your "for performance reasons we may want to convert it to a homogeneous representation" assumption is likely to be false (it's likely that, for performance reasons, you want to avoid using a homogeneous representation).
In other words:
If you don't know anything about the requirements of the original data types, it's likely that, for performance reasons, you want to avoid using a homogeneous representation.
Now...
Let's rephrase your question as "How to deal with design failures (choosing the wrong types which don't meet requirements)?". There is only one answer, and that is to avoid the design failure. Run-time checks (e.g. throwing an exception if the conversion to the homogeneous representation caused precision loss) serve no purpose other than to notify developers of design failures.
It is actually very basic: use 64-bit floating point. Floating point is an approximation, and you will lose precision for many ints. But there are no uncertainties other than "might this value originally have been integral" and "does the original value deviate by more than 1.0".
I know of one non-standard floating point representation that would be more powerful (it can be found on the net). That might (or might not) help cover the ints.
The only way to have an exact int mapping would be to reduce the int range and guarantee (say) 60-bit ints to be exact, with the remaining range approximated by floating point. Floating point would have to be reduced too, either in exponent range as mentioned, or in precision (the mantissa).

Go type for purchasing/financial calculations

I'm building an online store in Go. As would be expected, several important pieces need to record exact monetary amounts. I'm aware of the rounding problems associated with floats (i.e. 0.3 cannot be exactly represented, etc.).
The concept of currency seems easy to just represent as a string. However, I'm unsure of what type would be most appropriate to express the actual monetary amount in.
The key requirements would seem to be:
Can exactly express decimal numbers down to a specified number of decimal places, based on the currency (some currencies use more than 2 decimal places: http://www.londonfx.co.uk/ccylist.html )
Obviously basic arithmetic operations are needed - add/sub/mul/div.
Sane string conversion - which would essentially mean conversion to its decimal equivalent. Also, internationalization would need to be at least possible, even if all of the logic for that isn't built in (1.000 in Europe vs 1,000 in the US).
Rounding, possibly using alternate rounding schemes like Banker's rounding.
Needs to have a simple and obvious way to correspond to a database value - MySQL in my case. (It might make the most sense to treat the value as a string at this level in order to ensure its value is preserved exactly.)
I'm aware of math/big.Rat and it seems to solve a lot of these things, but its string output won't work as-is, since it outputs in the "a/b" form. I'm sure there is a solution for that too, but I'm wondering if there is some sort of existing best practice (that I couldn't easily find) for this sort of thing.
UPDATE: This package looks promising: https://code.google.com/p/godec/
You should keep i18n decoupled from your currency implementation. So no, don't bundle everything in a struct and call it a day. Mark what currency the amount represents but nothing more. Let i18n take care of formatting, stringifying, prefixing, etc.
Use an arbitrary precision numerical type like math/big.Rat. If that is not an option (because of serialization limitations or other barriers), then use the biggest fixed-size integer type available to represent the amount of atomic money units in whatever currency you are representing - cents for USD, yen for JPY, rappen for CHF, cents for EUR, and so forth.
When using the second approach, take extra care not to incur overflow, and define a clear and meaningful rounding behaviour for division.
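To make both suggestions concrete, a small Go sketch; note that big.Rat's FloatString method also answers the question's complaint about the "a/b" output form (the 8.25% tax figure is just an example):

package main

import (
    "fmt"
    "math/big"
)

func main() {
    // Exact arithmetic with math/big.Rat:
    price := new(big.Rat).SetFrac64(1999, 100) // $19.99 as 1999/100
    tax := new(big.Rat).SetFrac64(825, 10000)  // 8.25% as a rational
    total := new(big.Rat).Add(price, new(big.Rat).Mul(price, tax))
    // FloatString renders a rounded decimal expansion instead of "a/b":
    fmt.Println(total.FloatString(2)) // 21.64

    // The fixed-size integer alternative: amounts as atomic units (cents).
    var cents int64 = 1999
    fmt.Printf("%d.%02d\n", cents/100, cents%100) // 19.99
}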

Arbitrary base conversion algorithm for (textually represented) integers

I am looking for a general algorithm that would convert from one (arbitrary) numerical base to another (also arbitrary) without storing the result in a large integer and performing arithmetic operations on it in between.
The algorithm I am looking for takes an array of numerical values in a given base (mostly that would be a string of characters) and returns the result in the same form.
Thank you for your help.
I would say it is not possible. For certain bases it would be possible to convert from one string to another by just streaming the chars through (e.g. when the bases are related as powers of a common base, like octal->hex via binary), but for arbitrary bases it is not possible without arithmetic operations.
If you did it with strings/chars in between, it would still be big-integer arithmetic; your integers would just be in an (unnecessarily big) unusual format.
So you really have the choice between two options: either reimplement the arithmetic operations on char-encoded numbers, or take the plunge, use a big-integer library, and walk the convert(base1 string -> bigInt) -> convert(bigInt -> base2 string) path.
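If the big-integer route is acceptable, Go's math/big does all of this directly (bases 2 through 62); a minimal sketch, with the helper name mine:

package main

import (
    "fmt"
    "math/big"
)

// convertBase reparses s from base `from` into base `to` via a big.Int.
func convertBase(s string, from, to int) (string, error) {
    n, ok := new(big.Int).SetString(s, from)
    if !ok {
        return "", fmt.Errorf("invalid base-%d digits: %q", from, s)
    }
    return n.Text(to), nil
}

func main() {
    out, err := convertBase("777", 8, 16) // octal -> hex
    if err != nil {
        panic(err)
    }
    fmt.Println(out) // 1ff
}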
It's computable, but it's not pretty.
Seriously, it'd probably be easier and faster to include one of the many bignum libraries or write your own.

How do I trim the zero value after decimal

As I tried to debug, I found that just as I type in
Dim value As Double
value = 0.90000
then hit enter, and it automatically converts to 0.9
Shouldn't it keep the precision of the Double in Visual Basic?
For my calculation, I absolutely need to show the precision
If precision is required then the Currency data type is what you want to use.
There are at least two representations of your value in play. One is the value you see on the screen -- a string -- and one is the internal representation -- a binary value. In dealing with fractional values, the two are often not equivalent and where they aren't, it's because they can't be.
If you stick with doubles, VB will maintain 53 bits of mantissa throughout your calculations, no matter how they might appear when printed. If you transition through the string domain, say by saving to a file or DB and later retrieving, it often has to leave some of that precision behind. It's inevitable, because the interface between the two domains is not perfect. Some values that can be exactly represented as strings (or Decimals, that is, powers of ten) can't be exactly represented as fractional powers of 2.
This has nothing to do with VB, it's the nature of floating point. The best you can do is control where the rounding occurs. For this purpose your friend is the Format function, which controls how a value appears in string form.
? Format$(0.9, "0.00000") will show you an example.
You are getting what you see on the screen confused with what bits are being set in the Double to make that number.
VB is simply being "helpful" and knocking off the excess zeros. But for all intents and purposes,
0.9
is identical to
0.90000
If you don't believe me, try doing this comparison:
Debug.Print CDbl("0.9") = CDbl("0.90000")
As has already been said, displayed precision can be shown using the Format$() function, e.g.
Debug.Print Format$(0.9, "0.00000")
No, it shouldn't keep the precision. Binary floating point values don't retain this information... and it would be somewhat odd to do so, given that you're expressing the value in one base even though it's being represented in another.
I don't know whether VB6 has a decimal floating point type, but that's probably what you want - or a fixed point decimal type, perhaps. Certainly in .NET, System.Decimal has retained extra 0s from .NET 1.1 onwards. If this doesn't help you, you could think about remembering two integers - e.g. "90000" and "100000" in this case, so that the value you're representing is one integer divided by another, with the associated level of precision.
EDIT: I thought that Currency may be what you want, but according to this article, that's fixed at 4 decimal places, and you're trying to retain 5. You could potentially just multiply by 10, if you always want 5 decimal places - but it's an awkward thing to remember to do everywhere... and you'd have to work out how to format it appropriately. It would also always be 4 decimal places, I suspect, even if you'd specified fewer - so if you want "0.300" to be different to "0.3000" then Currency may not be appropriate. I'm entirely basing this on articles online though...
You can also enter the value as 0.9# instead. This helps avoid implicit coercion within an expression that may truncate the precision you expect. In most cases the compiler won't require this hint though because floating point literals default to Double (indeed, the IDE typically deletes the # symbol unless the value was an integer, e.g. 9#).
Contrast the results of these:
MsgBox TypeName(0.9)
MsgBox TypeName(0.9!)
MsgBox TypeName(0.9#)

Hashtables/Dictionaries that use floats/doubles

I read somewhere about other data structures similar to hashtables and dictionaries, but instead of using ints they were using floats/doubles, etc.
Does anyone know what they are?
If you mean using floats/doubles as keys in your hash, that's easy. For example, in .NET, it's just using Dictionary<double,MyValueType>.
If you're talking about having the hash be based on a double instead of an int....
Technically, you can have any element as your internal hash. Normally, this is done using an int or long, since these are fast, and the hashing algorithm is easy to compute.
However, the hash is really just a BitArray at heart, so anything would work. There really isn't much advantage to making it something other than an int or long, other than potentially allowing a larger set of hash values (i.e. if you go to an 8-byte or larger type for your hash).
You mean as keys? That strikes me as tricky.
If you're using them as arbitrary keys, they're no better than integers.
If you expect to calculate a floating-point value and use it to look something up in a hash table, you're living very dangerously. Floating point numbers do not have infinite precision, and calculating the same thing in two slightly different ways can result in very tiny differences in the result. Hash keys rely on getting the exact same thing every time, so you'd have to be careful to round, and round in exactly the same way at all times. This is trickier than it sounds, by the way.
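A short Go demonstration of that hazard (map keys here, but the same applies to any hash lookup):

package main

import "fmt"

func main() {
    m := map[float64]string{0.3: "found"}

    a, b := 0.1, 0.2 // variables force runtime float64 arithmetic
    key := a + b     // 0.30000000000000004, not 0.3
    _, ok := m[key]
    fmt.Println(ok) // false: the computed key misses the entry
}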
So, what would you do with floating-point hashes?
A hash algorithm is, in general terms, just a function that produces a smaller output from a larger input. Good hash functions have interesting properties like a large change in output for a small change in the input, and an assurance that they produce every possible output value for some input.
It's not hard to write a simple polynomial type hash function that outputs a floating-point value, rather than an integer value, but it's difficult to ensure that the resulting hash function has the desired properties without getting into the details of the particular floating-point representation used.
At least part of the reason that hash functions are nearly always implemented in integer arithmetic is because proving various properties about an integer calculation is easier than doing the same for a floating point calculation.
It's fairly easy to prove that some (sum of prime factors) modulo (another prime) must, necessarily, produce every possible output for some input. Doing the same for a calculation with a bunch of floating-point fractions would be a drag.
Add to that the relative difficulty of storing and transmitting floating-point values without corruption, and it's just not worth it.
Your question history shows that you use .NET, so I'll answer in that context.
If you want a Dictionary that is type aware, such that you can specify it should use floats or doubles for the keys or values, use System.Collections.Generic.Dictionary<T, U> http://msdn.microsoft.com/en-us/library/xfhwa508.aspx
If you want a Dictionary that is type blind, such that you can use floats AND doubles for keys and values, use System.Collections.Hashtable http://msdn.microsoft.com/en-us/library/system.collections.hashtable.aspx
