DBL_MAX & Max value of a double - cocoa

This line:
NSLog(@"DBL_MAX: %f", DBL_MAX);
prints this very large value:
17976931348623157081452742373170435679807056752584499659891747680315726078002853876058955863276687817154045895351438246423432132688946418276846754670353751698604991057655128207624549009038932894407586850845513394230458323690322294816580855933212334827479
However, when I test a double value like this:
double test = 9999999999999999.0;
NSLog(@"test: %f", test);
I get this unexpected result:
test: 10000000000000000.000000
This appears to be the maximum number of digits that produce the expected result:
double test = 999999999999999.0;
NSLog(@"test: %f", test);
test: 999999999999999.000000
How can I work with higher positive fractions?
Thanks.

Unfortunately, I can't answer the question directly, as I don't understand what you mean by "How can I work with higher positive fractions?".
However, I can shed some light on what a floating-point number is and what it isn't.
A floating-point number consists of:
A sign (plus or minus)
An exponent
A value (known as the "mantissa").
These are combined using a clever encoding typically into 32, 64, 80, or 128 bits. In addition, some special encodings are used to represent +-infinity, Not a Number (NaN), and +-Zero.
As the mantissa has a limited number of bits, your value can only have that many significant bits. A double can represent values as small as about 10^-308 and as large as about 10^308, but any individual number only carries about 16 significant decimal digits.
In other words, the print-out of DBL_MAX does not correspond to the amount of information actually stored in the variable. For example, there is no way to represent the same number but with ...7480 instead of ...7479 at the end.
So back to the question: in order to tell how to represent your values, you must describe what kind of values you want to represent. Are they really fractions (i.e. one integer divided by another integer)? In that case you might want to represent them as two integers. If you want to represent really large values, you might want to use a package like GMP (http://gmplib.org).
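To see the digit limit directly, here is a minimal sketch in Go (float64 is the same IEEE 754 double, so it behaves exactly like the Objective-C code above):

package main

import (
	"fmt"
	"math"
)

func main() {
	// float64 carries a 53-bit mantissa, roughly 15-16 significant
	// decimal digits, so this literal cannot be stored exactly.
	var test float64 = 9999999999999999.0
	fmt.Printf("test: %f\n", test) // prints 10000000000000000.000000

	// Around 1e16 the representable doubles are 2 apart, so the
	// closest neighbour below 1e16 shows the gap.
	fmt.Printf("next below 1e16: %f\n", math.Nextafter(1e16, 0))
	fmt.Printf("max double: %g\n", math.MaxFloat64) // ~1.7976931348623157e+308
}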

Floating point in C# doesn't produce exact results all the time. There are numbers that cannot be represented as double, float, or even decimal. You can improve your accuracy by using "decimal" instead of "double", but it still doesn't ensure that every number is represented exactly.

Related

Fastest algorithm to convert hexadecimal numbers into decimal form without using a fixed length variable to store the result

I want to write a program to convert hexadecimal numbers into their decimal forms without using a variable of fixed length to store the result because that would restrict the range of inputs that my program can work with.
Let's say I were to use a variable of type long long int to calculate, store and print the result. Doing so would limit the range of hexadecimal numbers that my program can handle to between 8000000000000001 and 7FFFFFFFFFFFFFFF. Anything outside this range would cause the variable to overflow.
I did write a program that calculates and stores the decimal result in a dynamically allocated string by performing carry and borrow operations but it runs much slower, even for numbers that are as big as 7FFFFFFFF!
Then I stumbled onto this site which could take numbers that are way outside the range of a 64 bit variable. I tried their converter with numbers much larger than 16^65 - 1 and still couldn't get it to overflow. It just kept on going and printing the result.
I figured that they must be using a much better algorithm for hex to decimal conversion, one that isn't limited to 64 bit values.
So far, Google's search results have only led me to algorithms that use some fixed-length variable for storing the result.
That's why I am here. I want to know if such an algorithm exists and, if it does, what it is.
Well, it sounds like you already did it when you wrote "a program that calculates and stores the decimal result in a dynamically allocated string by performing carry and borrow operations".
Converting from base 16 (hexadecimal) to base 10 means implementing multiplication and addition of numbers in a base-10 representation. Then for each hex digit d, you calculate result = result*16 + d. When you're done you have the same number in a base-10 representation that is easy to write out as a decimal string.
There could be any number of reasons why your string-based method was slow. If you provide it, I'm sure someone could comment.
The most important trick for making it reasonably fast, though, is to pick the right base to convert to and from. I would probably do the multiplication and addition in base 10^9, so that each digit will be as large as possible while still fitting into a 32-bit integer, and process 7 hex digits at a time, which is as many as I can while only multiplying by single digits.
For every 7 hex digits, I'd convert them to a number d, and then do result = result * 16^7 + d.
Then I can get the 9 decimal digits for each resulting digit in base 10^9.
This process is pretty easy, since you only have to multiply by single digits. I'm sure there are faster, more complicated ways that recursively break the number into equal-sized pieces.
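For what it's worth, here is a rough sketch in Go of exactly that scheme (big digits in base 10^9, consuming 7 hex characters per step); hexToDecimal is just an illustrative name, and input validation is omitted:

package main

import "fmt"

// hexToDecimal converts an arbitrarily long hexadecimal string to its
// decimal representation, keeping the intermediate number as "big digits"
// in base 10^9 and consuming up to 7 hex characters (a chunk < 16^7) per step.
func hexToDecimal(hex string) string {
	const base = 1_000_000_000 // 10^9, one "big digit" per slot
	digits := []uint64{0}      // little-endian digits in base 10^9

	for i := 0; i < len(hex); {
		// Take up to 7 hex characters as one chunk d < 16^7.
		n := 7
		if len(hex)-i < n {
			n = len(hex) - i
		}
		var chunk, factor uint64 = 0, 1
		for _, c := range hex[i : i+n] {
			var v uint64
			switch {
			case c >= '0' && c <= '9':
				v = uint64(c - '0')
			case c >= 'a' && c <= 'f':
				v = uint64(c-'a') + 10
			case c >= 'A' && c <= 'F':
				v = uint64(c-'A') + 10
			}
			chunk = chunk*16 + v
			factor *= 16
		}
		i += n

		// digits = digits*factor + chunk, carried digit by digit in base 10^9.
		carry := chunk
		for j := range digits {
			cur := digits[j]*factor + carry
			digits[j] = cur % base
			carry = cur / base
		}
		for carry > 0 {
			digits = append(digits, carry%base)
			carry /= base
		}
	}

	// Most significant digit first; pad the lower digits to 9 places.
	out := fmt.Sprintf("%d", digits[len(digits)-1])
	for j := len(digits) - 2; j >= 0; j-- {
		out += fmt.Sprintf("%09d", digits[j])
	}
	return out
}

func main() {
	fmt.Println(hexToDecimal("7FFFFFFFFFFFFFFF")) // 9223372036854775807
}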

Is using integers as fractional coefficients instead of floats a good idea for a monetary application?

My application requires a fractional quantity multiplied by a monetary value.
For example, $65.50 × 0.55 hours = $36.025 (rounded to $36.03).
I know that floats should not be used to represent money, so I'm storing all of my monetary values as cents. $65.50 in the above equation is stored as 6550 (integer).
For the fractional coefficient, my issue is that 0.55 does not have an exact 32-bit float representation. In the use case above, 0.55 hours == 33 minutes, so 0.55 is an example of a specific value that my application will need to account for exactly. A floating-point representation like 0.550000012 is unacceptable, because the user will not understand where the additional 0.000000012 came from. I cannot simply call a rounding function on 0.550000012 because it would round to the whole number.
Multiplication solution
To solve this, my first idea was to store all quantities as integers and multiply × 1000. So 0.55 entered by the user would become 550 (integer) when stored. All calculations would happen without floats, and then simply divide by 1000 (integer division, not float) when presenting the result to the user.
I realize that this would permanently limit me to 3 decimal places of precision. If I decide that 3 is adequate for the lifetime of my application, does this approach make sense?
Are there potential rounding issues if I were to use integer division?
Is there a name for this process? EDIT: As indicated by @SergGr, this is fixed-point arithmetic.
Is there a better approach?
EDIT:
I should have clarified, this is not time-specific. It is for generic quantities like 1.256 pounds of flour, 1 sofa, or 0.25 hours (think invoices).
What I'm trying to replicate here is a more exact version of Postgres's extra_float_digits = 0 functionality, where if the user enters 0.55 (float32), the database stores 0.550000012 but when queried for the result returns 0.55 which appears to be exactly what the user typed.
I am willing to limit this application's precision to 3 decimal places (it's business, not scientific), so that's what made me consider the × 1000 approach.
I'm using the Go programming language, but I'm interested in generic cross-language solutions.
Another solution is to store the value in rational form. You can represent the number by two integers p and q such that the value equals p/q. This way you keep more precision and can do the arithmetic directly on the rational numbers in this two-integer format.
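Since the question mentions Go, here is a minimal sketch of that idea using the standard library's math/big.Rat, with the values taken from the question's example:

package main

import (
	"fmt"
	"math/big"
)

func main() {
	// $65.50 * 0.55 hours, kept exact as rationals.
	price := big.NewRat(6550, 100) // 65.50
	hours := big.NewRat(55, 100)   // 0.55
	total := new(big.Rat).Mul(price, hours)

	fmt.Println(total)                // 1441/40, i.e. exactly 36.025
	fmt.Println(total.FloatString(2)) // "36.03", rounded only for display
}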
Note: This is an attempt to merge different comments into one coherent answer as was requested by Matt.
TL;DR
Yes, this approach makes sense but most probably is not the best choice
Yes, there are rounding issues but there inevitably will be some no matter what representation you use
What you suggest using is called Decimal fixed point numbers
I'd argue yes, there is a better approach and it is to use some standard or popular decimal floating point numbers library for your language (Go is not my native language so I can't recommend one)
In PostgreSQL it is better to use Numeric (something like Numeric(15,3) for example) rather than a combination of float4/float8 and extra_float_digits. Actually this is what the first item in the PostgreSQL doc on Floating-Point Types suggests:
If you require exact storage and calculations (such as for monetary amounts), use the numeric type instead.
Some more details on how non-integer numbers can be stored
First of all there is a fundamental fact that there are infinitely many numbers in the range [0;1] so you obviously can't store every number there in any finite data structure. It means you have to make some compromises: no matter what way you choose, there will be some numbers you can't store exactly so you'll have to round.
Another important point is that people are used to the base-10 system, and in that system only results of division by numbers of the form 2^a*5^b can be represented using a finite number of digits. For every other rational number, even if you somehow store it in exact form, you will have to do some truncation and rounding at the formatting-for-human-usage stage.
Potentially there are infinitely many ways to store numbers. In practice only a few are widely used:
floating point numbers, with two major branches: binary (this is what most of today's hardware natively implements and what most languages support as float or double) and decimal. This format stores a mantissa and an exponent (which can be negative), so the number is mantissa * base^exponent (I omit the sign and just say it is logically part of the mantissa, although in practice it is usually stored separately). Binary vs. decimal is determined by the base. For example 0.5 will be stored in binary as the pair (1,-1), i.e. 1*2^-1, and in decimal as the pair (5,-1), i.e. 5*10^-1. Theoretically you could use any other base as well, but in practice only 2 and 10 make sense.
fixed point numbers, with the same split into binary and decimal. The idea is the same as in floating point numbers but some fixed exponent is used for all the numbers. What you suggest is actually a decimal fixed point number with the exponent fixed at -3. I've seen binary fixed-point numbers used on some embedded hardware that has no built-in support for floating point, because binary fixed-point numbers can be implemented with reasonable efficiency using integer arithmetic. As for decimal fixed-point numbers, in practice they are not much easier to implement than decimal floating-point numbers but provide much less flexibility.
rational numbers format, i.e. the value is stored as a pair (p, q) which represents p/q (and usually q>0, so the sign is stored in p, and either p=0, q=1 for 0 or gcd(p,q) = 1 for every other number). Usually this requires some big integer arithmetic to be useful in the first place (Go's math/big provides Rat, for example). Actually this might be a useful format for some problems and people often forget about this possibility, probably because it is often not a part of a standard library. Another obvious drawback is that, as I said, people are not used to thinking in rational numbers (can you easily compare which is greater, 123/456 or 213/789?), so you'll have to convert the final results to some other form. Another drawback is that if you have a long chain of computations, the internal numbers (p and q) might easily become very big values, so computations will be slow. Still it may be useful for storing intermediate results of calculations.
In practical terms there is also a division into arbitrary length and fixed length representations. For example:
IEEE 754 float or double are fixed length floating-point binary representations,
Go's math/big.Float is an arbitrary length floating-point binary representation
.Net decimal is a fixed length floating-point decimal representation
Java BigDecimal is an arbitrary length floating-point decimal representation
In practical terms I'd say that the best solution for your problem is some big enough fixed length floating-point decimal representation (like .Net decimal). An arbitrary length implementation would also work. If you have to make an implementation from scratch, then your idea of a fixed length fixed point decimal representation might be OK, because it is the easiest thing to implement yourself (a bit easier than the previous alternatives), but it may become a burden at some point.
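To make that last option concrete, here is a minimal sketch in Go (the asker's language) of a hand-rolled decimal fixed-point type with 3 implied decimal places; all the names are made up for illustration, and a real implementation would need overflow checks, division, and more care with negative values:

package main

import "fmt"

// Milli is a decimal fixed-point value with 3 implied decimal places,
// i.e. the "multiply by 1000" idea from the question.
type Milli int64

// Mul multiplies two fixed-point values; the raw product is scaled by
// 1000*1000, so divide by 1000 once, rounding half away from zero.
func (a Milli) Mul(b Milli) Milli {
	p := int64(a) * int64(b)
	if p >= 0 {
		return Milli((p + 500) / 1000)
	}
	return Milli((p - 500) / 1000)
}

func (a Milli) String() string {
	whole, frac := int64(a)/1000, int64(a)%1000
	if frac < 0 {
		frac = -frac
	}
	return fmt.Sprintf("%d.%03d", whole, frac)
}

func main() {
	price := Milli(65500) // 65.500 dollars
	hours := Milli(550)   // 0.550 hours
	fmt.Println(price.Mul(hours)) // 36.025
}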
As mentioned in the comments, it would be best to use some built-in Decimal module in your language to handle exact arithmetic. However, since you haven't specified a language, we cannot be certain that your language even has such a module. If it does not, here is one way to go about it.
Consider using Binary Coded Decimal to store your values. The way it works is by restricting the values that can be stored per byte to 0 through 9 (inclusive), "wasting" the rest. You can encode a decimal representation of a number byte by byte that way. For example, 613 would become
6 -> 0000 0110
1 -> 0000 0001
3 -> 0000 0011
613 -> 0000 0110 0000 0001 0000 0011
Each group of 4 bits above is a "nibble" of a byte. In practice, a packed variant is used, where two decimal digits are packed into a byte (one per nibble) to be less "wasteful". You can then implement a few methods to do your basic addition, subtraction, multiplication, etc. Just iterate over an array of bytes and perform your classic grade-school addition / multiplication algorithms (keep in mind that for the packed variant you may need to pad a zero to get an even number of nibbles). You just need to keep a variable to store where the decimal point is, and remember to carry where necessary to preserve the encoding.
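As a rough illustration, here is a sketch in Go of the grade-school addition on the unpacked variant (one decimal digit per byte); addBCD is a made-up name, and the packed variant and decimal-point bookkeeping are left out:

package main

import "fmt"

// addBCD adds two numbers stored as unpacked BCD: one decimal digit (0-9)
// per byte, most significant digit first.
func addBCD(a, b []byte) []byte {
	// Pad the shorter operand with leading zeros.
	for len(a) < len(b) {
		a = append([]byte{0}, a...)
	}
	for len(b) < len(a) {
		b = append([]byte{0}, b...)
	}
	result := make([]byte, len(a))
	carry := byte(0)
	for i := len(a) - 1; i >= 0; i-- {
		sum := a[i] + b[i] + carry
		result[i] = sum % 10
		carry = sum / 10
	}
	if carry > 0 {
		result = append([]byte{carry}, result...)
	}
	return result
}

func main() {
	x := []byte{6, 1, 3}      // 613
	y := []byte{4, 9, 9}      // 499
	fmt.Println(addBCD(x, y)) // [1 1 1 2], i.e. 1112
}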

why does Ruby's Rational class treat string arguments differently from numeric arguments?

I'm using ruby's Rational library to convert the width & height of images to aspect ratios.
I've noticed that string arguments are treated differently than numeric arguments.
>> Rational('1.91','1')
=> (191/100)
>> Rational(1.91,1)
=> (8601875288277647/4503599627370496)
>> RUBY_VERSION
=> "2.1.5"
>> RUBY_ENGINE
=> "ruby"
FYI 1.91:1 is an aspect ratio recommended by Facebook for images on their platform.
Values like 191 and 100 are much more convenient to store in my database than 8601875288277647 and 4503599627370496. But I'd like to understand where this difference originates before deciding which approach to use.
The Rational test suite doesn't seem to cover this exact case.
Disclaimer: This is only an educated guess, based on some knowledge on how to implement such a feat.
As Kent Dahl already said, Floats are not precise; they have a fixed precision, which means 1.91 is really 1.910000000000000000001 or something like that, which Ruby "knows" should be displayed as 1.91.
"1.91" on the other hand is a string, basically an array of characters: '1', '.', '9', '1'.
This said, here is what you need to do, to build the rational out of floats:
Get rid of the . (mathematically, by multiplying the numerator and denominator by 10^x, i.e. multiplying by ten as many times as there are digits behind the .)
Find the greatest common divisor (gcd)
Divide num and denom by the gcd
Step 1 however, is a little different for Float and String:
For the Float, we will have to multiply by 10^x, where x is (because of the precision) not 2 (as one would expect for 1.91), but something more like 16 (remember: 1.9100...1).
For the String, we COULD cast it into a float and do the same trick, but hey, there is an easier way: we just count the number of characters behind the dot (which is 2), remove the dot and multiply the denom by 10^2... This is not only the easier, but also the more precise way.
The big numbers might disappear again when applying step 3; that's why you will not always get those strange results when dealing with rationals built from floats.
TL;DR: The numbers will be built differently depending on whether the argument is a String or a Float. Floats can produce long-ass numbers, because precision.
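To see the same two construction paths outside Ruby, here is a small sketch using Go's math/big; ratFromDecimalString is a made-up helper that follows the string steps above, while SetFloat64 captures the exact value stored in the double:

package main

import (
	"fmt"
	"math/big"
	"strings"
)

// ratFromDecimalString builds an exact rational from a decimal string:
// drop the dot, put 10^(digits after the dot) in the denominator, and let
// big.Rat reduce by the gcd. No input validation is done.
func ratFromDecimalString(s string) *big.Rat {
	digitsAfterDot := 0
	if dot := strings.IndexByte(s, '.'); dot >= 0 {
		digitsAfterDot = len(s) - dot - 1
		s = s[:dot] + s[dot+1:]
	}
	num := new(big.Int)
	num.SetString(s, 10)
	den := new(big.Int).Exp(big.NewInt(10), big.NewInt(int64(digitsAfterDot)), nil)
	return new(big.Rat).SetFrac(num, den) // SetFrac reduces to lowest terms
}

func main() {
	fmt.Println(ratFromDecimalString("1.91"))  // 191/100
	fmt.Println(new(big.Rat).SetFloat64(1.91)) // 8601875288277647/4503599627370496
}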
The Float 1.91 is stored as a double, which has a given amount of precision limited by its binary representation. The equivalent Rational object retains this precision as well as possible, so it is huge. There is no way of storing 1.91 exactly in a double, but the value you get is close enough for most uses.
As for the String, it represents a different value - the exact value of 1.91 - and as you create a Rational it retains it better. It is more correct than the Float, but takes longer to use for calculations.
This is similar to the problem with 1.0/3 as it "goes on forever" 0.333333...etc, but Rational can represent it exactly.

Need a maths algorithm to encode a number in a big integer as an integer

I want to convert a number value of 100 digits into less than 10 digits and vice versa.
So I can pass the encoded number to a mobile user and, on getting it back, reconstruct the 100-digit number again.
I want to use it in PHP, .NET or JS.
But before that I need an algorithm for that.
I have some idea of using simple divide-subtract and add-multiply operations to implement this, but I need something more secure than that.
What you're asking for is impossible. You are trying to pigeonhole 10^100 items into 10^10 boxes. Some box will get more than one item and so it's impossible to invert back to "the" original item.
You could encode the 100-digit base-10 numbers as a 56-digit base-62 number (use uppercase and lowercase Roman alphabet and digits 0-9). The math here is 100 * log(10) / log(62).
To encode using less than ten characters from some alphabet, you need an alphabet with ~2^34 symbols. The math here is 100 * log(10) / log(number of symbols). Good luck with that.
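For what it's worth, the base-62 encoding from the first paragraph can be done with Go's math/big, which formats and parses integers in any base up to 62 (its digit order is 0-9, a-z, A-Z, slightly different from the description above); a minimal sketch:

package main

import (
	"fmt"
	"math/big"
	"strings"
)

func main() {
	// Largest 100-digit decimal number, re-encoded in base 62.
	n := new(big.Int)
	n.SetString(strings.Repeat("9", 100), 10)

	encoded := n.Text(62)
	fmt.Println(len(encoded), encoded) // 56 characters

	// Decoding is just parsing in base 62 again.
	decoded := new(big.Int)
	decoded.SetString(encoded, 62)
	fmt.Println(decoded.Cmp(n) == 0) // true
}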
If you have more than 10 000 000 000 different possible values in the 100 digit number you can not possibly map that to a 10 digit number and reliably map back to the original number.
A 100-digit number: I assume this is a base-ten number. When talking about numbers on computers, talk of 'digits' is almost meaningless.
If you actually mean a 100-bit integer, then this won't easily fit into a single 64-bit integer (range +/- 9,223,372,036,854,775,808), and you have not phrased your question all that well. And no amount of compression or encoding will let you represent 100 bits using no more than 10 bits.
If you mean 100 figures in base ten, then you are dealing with bignums, so you should probably just treat them as bytes and use a bignum library.
100 base ten figures is still less than 512 bits.
Assuming that the 100-digit number is base 10, then if my math is not wrong you'll need 10 base 100 digits to represent the same number. So instead of using just characters from 0-9, you'll need to expand the characters to include other glyphs, including upper-case and lower-case letters, etc., to complete a 100 character alphabet. OK, my math is wrong, so disregard this, but consider the next paragraph.
Another thought is to use a hashing algorithm to derive a 10-byte hash from your 100-digit number and use that as key in a server-side database (hash-table). No encoding/decoding, just send the key to the mobile client, the mobile client uses the key to fetch the 100-digit number from the server.
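A minimal sketch of that hashing idea in Go, assuming SHA-256 truncated to 10 bytes as the key (the particular hash and truncation length are illustrative choices, not something the answer prescribes):

package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

func main() {
	// The 100-digit number stays on the server; only a short digest of it
	// is sent to the mobile client and later used as the lookup key.
	number := strings.Repeat("1234567890", 10) // a 100-digit number as a string
	sum := sha256.Sum256([]byte(number))
	key := hex.EncodeToString(sum[:10]) // first 10 bytes -> 20 hex characters
	fmt.Println(key)
}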

Arithmetic in ruby

Why does this code, 7.30 - 7.20, return 0.0999999999999996 in Ruby, and not 0.10?
But if I write 7.30 - 7.16, for example, everything is fine; I get 0.14.
What is the problem, and how can I solve it?
What Every Computer Scientist Should Know About Floating-Point Arithmetic
The problem is that some numbers we can easily write in decimal don't have an exact representation in the particular floating point format implemented by current hardware. A casual way of stating this is that all the integers do, but not all of the fractions, because we normally store the fraction with a 2**e exponent. So, you have 3 choices:
Round off appropriately. The unrounded result is always really really close, so a rounded result is invariably "perfect". This is what Javascript does and lots of people don't even realize that JS does everything in floating point.
Use fixed point arithmetic. Ruby actually makes this really easy; it's one of the only languages that seamlessly shifts to Class Bignum from Fixnum as numbers get bigger.
Use a class that is designed to solve this problem, like BigDecimal
To look at the problem in more detail, we can try to represent your "7.3" in binary. The 7 part is easy, 111, but how do we do .3? 111.1 is 7.5, too big, 111.01 is 7.25, getting closer. Turns out, 111.010011 is the "next closest smaller number", 7.296875, and when we try to fill in the missing .003125 eventually we find out that it's just 111.010011001100110011... forever, not representable in our chosen encoding in a finite bit string.
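You can see those nearest representable values directly by printing more digits than the format actually carries; a quick sketch in Go (float64 is the same IEEE 754 double Ruby uses):

package main

import "fmt"

func main() {
	// Printing more digits than a double actually carries exposes the
	// nearest representable values (the digits shown are approximate).
	fmt.Printf("%.17f\n", 7.3)     // 7.29999999999999982
	fmt.Printf("%.17f\n", 7.2)     // 7.20000000000000018
	fmt.Printf("%.17f\n", 7.3-7.2) // 0.09999999999999964
}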
The problem is that floating point is inaccurate. You can solve it by using Rational, BigDecimal or just plain integers (for example if you want to store money you can store the number of cents as an int instead of the number of dollars as a float).
BigDecimal can accurately store any number that has a finite number of digits in base 10 and rounds numbers that don't (so three thirds aren't one whole).
Rational can accurately store any rational number and can't store irrational numbers at all.
That is a common error that comes from how floating-point numbers are represented in memory.
Use BigDecimal if you need exact results.
require 'bigdecimal'
result = BigDecimal("7.3") - BigDecimal("7.2")
puts "%2.2f" % result
It is interesting to note that a number that has few decimals in one base may typically have a very large number of decimals in another. For instance, it takes an infinite number of decimals to express 1/3 (=0.3333...) in the base 10, but only one decimal in the base 3. Similarly, it takes many decimals to express the number 1/10 (=0.1) in the base 2.
Since you are doing floating-point math, the number returned reflects the precision your computer uses.
If you want a closer answer, to a set precision, just multiply the float by that (such as by 100), convert it to an int, do the math, then divide.
There are other solutions, but I find this to be the simplest since rounding always seems a bit iffy to me.
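A quick sketch of that approach in Go, scaling by 100 (the Round call is there because the scaled float can land just below the intended integer):

package main

import (
	"fmt"
	"math"
)

func main() {
	// Scale to a chosen precision (here two decimal places), do the math
	// on integers, then scale back only for display.
	a := int64(math.Round(7.30 * 100)) // 730
	b := int64(math.Round(7.20 * 100)) // 720
	diff := a - b                      // 10

	fmt.Printf("%.2f\n", float64(diff)/100) // 0.10
}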
This has been asked before here, you may want to look for some of the answers given before, such as this one:
Dealing with accuracy problems in floating-point numbers
