I would like to know the best way to get a pseudorandom float number in a closed interval using the Ruby rand kernel function (please not Random module).
To take an example I will use the closed interval [0.0, 7.7] (both 0.0 and 7.7 included in the interval), but any other float interval should be valid too.
For the interval [0.0, 7.7] the following solution is not valid:
rand * 7.7
Why?
If you call rand without arguments you get a pseudorandom floating point number greater than or equal to 0.0 and less than 1.0. So what is the range of float numbers that the previous solution can give us?
rand will return a pseudorandom float number in the range [0.0, 0.9999999...]
0.0 * 7.7
=> 0.0 # Correct!
0.9999999 * 7.7
=> 7.69999923 # Incorrect!
The interval does not match [0.0, 7.7].
Does anyone know an elegant solution to this problem?
Thank you!
There's a Random class that can do what you want:
generator = Random.new # You need to instantiate it
generator.rand 0.0..7.7
(The documentation states that the difference between 0.0..7.7 and 0.0...7.7 will be taken into account.)
In the upcoming 1.9.3, you'll be able to pass a range to Kernel#rand and Random.rand (you can already do that in the preview version).
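A minimal sketch of both forms (the Kernel#rand line assumes Ruby 1.9.3 or later):
# Random#rand with an inclusive range: both endpoints can be returned.
generator = Random.new
generator.rand(0.0..7.7)   # closed interval [0.0, 7.7]
generator.rand(0.0...7.7)  # half-open interval [0.0, 7.7)

# On 1.9.3+, Kernel#rand accepts a range as well.
rand(0.0..7.7)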
I would do something like this:
Fineness = 2**64
puts rand(Fineness+1)*7.7/Fineness
Whenever rand returns its maximum possible value, you will get Fineness*7.7/Fineness which turns out to equal 7.7 exactly (but I'm not totally sure this will always be the case, because floats are inexact).
As long as Fineness has more bits in it than a double on your computer, then I believe you will not notice any strangeness in the distribution of your results.
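A quick sanity check of the endpoint claim (just a sketch; it relies on Fineness being an exact power of two, so multiplying by 7.7 and dividing back introduces no rounding at the extremes):
fineness = 2**64
puts fineness * 7.7 / fineness == 7.7   # => true: the upper endpoint is reachable exactly
puts 0 * 7.7 / fineness == 0.0          # => true: the lower endpoint too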
How about:
(rand/0.9999999999999999...)*7.7
Basically, normalize the random number by the largest possible random number. That way you will create the range [0..1].
However, I am unsure how to get the largest Float that is less than 1.0 in Ruby.
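If you do want that largest Float below 1.0, here is a sketch (Float#prev_float needs Ruby 2.2+; on IEEE 754 doubles 1.0 - Float::EPSILON / 2 is the same value):
max_below_one = 1.0.prev_float        # largest double strictly less than 1.0
# equivalently: max_below_one = 1.0 - Float::EPSILON / 2
scaled = (rand / max_below_one) * 7.7
puts scaled                           # stays within [0.0, 7.7]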
Why do you need this? I don't know of a case where there would be a need for this as a true single or double precision number. On the other hand, there are real cases where you might need numbers between 0.0 and 7.7 in increments of 0.1. In that case you could use well established techniques to go from 0 to 77 and then divide by 10.
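For instance, a sketch of that 0.1-increment idea for the interval in the question:
# rand(78) yields an integer in 0..77; dividing by 10.0 gives
# 0.0, 0.1, ..., 7.7 with both endpoints reachable.
value = rand(78) / 10.0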
Depending on the number of digits of precision you need, you could use a round-to-even approach to snap values near the boundaries of the interval onto the edges. Hope this helps.
Here is the text from Wikipedia:
Round half to even
A tie-breaking rule that is even less biased is round half to even, namely:
If the fraction of y is 0.5, then q is the even integer nearest to y.
Thus, for example, +23.5 becomes +24, +22.5 becomes +22, −22.5 becomes
−22, and −23.5 becomes −24.
This method also treats positive and negative values symmetrically,
and therefore is free of overall bias if the original numbers are
positive or negative with equal probability. In addition, for most
reasonable distributions of y values, the expected (average) value of
the rounded numbers is essentially the same as that of the original
numbers, even if the latter are all positive (or all negative).
However, this rule will still introduce a positive bias for even
numbers (including zero), and a negative bias for the odd ones.
This variant of the round-to-nearest method is also called unbiased
rounding (ambiguously, and a bit abusively), convergent rounding,
statistician's rounding, Dutch rounding, Gaussian rounding, or
bankers' rounding. This is widely used in bookkeeping.
This is the default rounding mode used in IEEE 754 computing functions
and operators.
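For reference, a sketch of round half to even in Ruby (the half: keyword argument to Float#round needs Ruby 2.4+):
puts 23.5.round(half: :even)      # => 24
puts 22.5.round(half: :even)      # => 22
puts((-22.5).round(half: :even))  # => -22
puts((-23.5).round(half: :even))  # => -24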
Related
This is something that's been on my mind for years, but I never took the time to ask before.
Many (pseudo) random number generators generate a random number between 0.0 and 1.0. Mathematically there are infinite numbers in this range, but double is a floating point number, and therefore has a finite precision.
So the questions are:
Just how many double numbers are there between 0.0 and 1.0?
Are there just as many numbers between 1 and 2? Between 100 and 101? Between 10^100 and 10^100+1?
Note: if it makes a difference, I'm interested in Java's definition of double in particular.
Java doubles are in IEEE-754 format, therefore they have a 52-bit fraction; between any two adjacent powers of two (inclusive of one and exclusive of the next one), there will therefore be 2 to the 52nd power different doubles (i.e., 4503599627370496 of them). For example, that's the number of distinct doubles between 0.5 included and 1.0 excluded, and exactly that many also lie between 1.0 included and 2.0 excluded, and so forth.
Counting the doubles between 0.0 and 1.0 is harder than doing so between powers of two, because there are many powers of two included in that range, and one also gets into the thorny issue of denormalized numbers. 10 of the 11 bits of the exponent cover the range in question, so, including denormalized numbers (and, I think, a few kinds of NaN) you'd have 1024 times as many doubles as lie between two adjacent powers of two -- no more than 2**62 in total anyway. Excluding denormalized numbers &c, I believe the count would be 1023 times 2**52.
For an arbitrary range like "100 to 100.1" it's even harder because the upper bound cannot be exactly represented as a double (not being an exact multiple of any power of two). As a handy approximation, since the progression between powers of two is linear, you could say that said range is 0.1/64 of the span between the surrounding powers of two (64 and 128), so you'd expect about
(0.1 / 64) * 2**52
distinct doubles -- which comes to 7036874417766.4004... give or take one or two;-).
Every double value whose representation is between 0x0000000000000000 and 0x3ff0000000000000 lies in the interval [0.0, 1.0]. That's (2^62 - 2^52) distinct values (plus or minus a couple depending on whether you count the endpoints).
The interval [1.0, 2.0] corresponds to representations between 0x3ff0000000000000 and 0x4000000000000000; that's 2^52 distinct values.
The interval [100.0, 101.0] corresponds to representations between 0x4059000000000000 and 0x4059400000000000; that's 2^46 distinct values.
There are no doubles between 10^100 and 10^100 + 1. Neither one of those numbers is representable in double precision, and there are no doubles that fall between them. The closest two double precision numbers are:
99999999999999982163600188718701095...
and
10000000000000000159028911097599180...
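If you want to check these counts yourself, here is a small sketch in Ruby (assuming the platform Float is an IEEE 754 binary64; it reinterprets the bits with pack/unpack and counts representations in a closed interval of non-negative values):
def double_bits(x)
  [x].pack('d').unpack('Q').first   # native double reinterpreted as a 64-bit unsigned integer
end

def doubles_between(lo, hi)         # closed interval, both endpoints non-negative
  double_bits(hi) - double_bits(lo) + 1
end

puts doubles_between(1.0, 2.0)      # => 4503599627370497    (2**52 + 1)
puts doubles_between(100.0, 101.0)  # => 70368744177665      (2**46 + 1)
puts doubles_between(0.0, 1.0)      # => 4607182418800017409 (2**62 - 2**52 + 1)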
Others have already explained that there are around 2^62 doubles in the range [0.0, 1.0].
(Not really surprising: there are almost 2^64 distinct finite doubles; of those, half are positive, and roughly half of those are < 1.0.)
But you mention random number generators: note that a random number generator generating numbers between 0.0 and 1.0 cannot in general produce all these numbers; typically it'll only produce numbers of the form n/2^53 with n an integer (see e.g. the Java documentation for nextDouble). So there are usually only around 2^53 (+/-1, depending on which endpoints are included) possible values for the random() output. This means that most doubles in [0.0, 1.0] will never be generated.
The article Java's new math, Part 2: Floating-point numbers from IBM offers the following code snippet to solve this (in floats, but I suspect it works for doubles as well):
public class FloatCounter {
public static void main(String[] args) {
float x = 1.0F;
int numFloats = 0;
while (x <= 2.0) {
numFloats++;
System.out.println(x);
x = Math.nextUp(x);
}
System.out.println(numFloats);
}
}
They have this comment about it:
It turns out there are exactly 8,388,609 floats between 1.0 and 2.0 inclusive; large but hardly the uncountable infinity of real numbers that exist in this range. Successive numbers are about 0.0000001 apart. This distance is called an ULP for unit of least precision or unit in the last place.
2^53 -- the size of the significand/mantissa of a 64-bit floating point number, including the hidden bit.
Roughly yes, as the significand is fixed but the exponent changes.
See the Wikipedia article for more information.
The Java double is an IEEE 754 binary64 number.
This means that we need to consider:
The mantissa is 52 bits.
The exponent is an 11-bit number with a bias of 1023 (i.e. 1023 is added to it).
If the exponent is all zeros and the mantissa is non-zero, the number is said to be denormalized.
This basically means there is a total of 2^62 - 2^52 + 1 possible double representations that, according to the standard, lie between 0 and 1 inclusive. The -2^52 + 1 adjustment is there because the topmost exponent in that range (the one used by 1.0 itself) contributes only a single value rather than a full 2^52 mantissa patterns.
Remember that if the mantissa is positive but the (unbiased) exponent is negative, the number is positive but less than 1 :-)
For other ranges it is a bit harder, because the edge integers may not be exactly representable in IEEE 754, and because the spacing between adjacent doubles grows with the exponent, so the larger the number, the fewer distinct values there are within an interval of a given width.
Math.random() returns a number with 17 digits after the decimal point in my console, so the probability of it being exactly 0 is astronomically low. What exactly is the point of it technically being able to return 0 but never 1, when in practice it is basically never going to return either of those values anyway? What led to this design? Could someone give me some examples where this property of Math.random() is relevant?
Typically, to produce the random floating-point value, a random integer is generated first and then scaled to the floating-point range. Because of the binary representation of floating-point values there are a power-of-two number of uniformly distributed values between 0 and 1, not including 1.0. If you do include 1.0 then the range holds a power of two plus one values, and generating a uniform random integer in such a range is much more difficult.
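In Ruby terms, a sketch of that construction (assuming a 53-bit integer source, the usual choice for doubles):
# Scale a uniform integer in 0...2**53 down to a double in [0.0, 1.0).
# 1.0 itself can never come out, because the integer never reaches 2**53.
n = rand(2**53)
x = n / 2.0**53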
Besides that, returning 1.0 would only cause problems. After you multiply it by a constant you end up with an extremely small chance, but still a chance, of getting exactly that constant. Even after quantising with round-to-nearest you still only get half the chance of it occurring compared to any other value (except zero, which also suffers the same penalty).
So it's both easier to do, and more useful.
That said, this is no place to fuss about perfectly uniform distributions. Simply by multiplying a floating-point value, which is inherently restricted to some finite set of values, by a constant and quantising the result, you end up with some results being slightly (immeasurably) more likely than others.
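As a concrete illustration of the "only cause problems" point, consider the usual idiom for drawing a random integer, sketched here; it relies on the top end being excluded:
# Classic idiom: map [0.0, 1.0) onto the integers 1..6 (a die roll).
roll = (rand * 6).floor + 1   # rand < 1.0 guarantees roll is in 1..6
# If rand could return exactly 1.0, (1.0 * 6).floor + 1 == 7 would slip through.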
When upgrading to Ruby 1.9, I have a failing test when comparing expected vs. actual values for a BigDecimal that is the result of dividing a Float.
expected: '0.495E0',9(18)
got: '0.49500000000000005E0',18(27)
googling for things like "bigdecimal ruby precision" and "bigdecimal changes ruby 1.9" isn't getting me anywhere.
How did BigDecimal's behavior change in ruby 1.9?
update 1
> RUBY_VERSION
=> "1.8.7"
> 1.23.to_d
=> #<BigDecimal:1034630a8,'0.123E1',18(18)>
> RUBY_VERSION
=> "1.9.3"
> 1.23.to_d
=> #<BigDecimal:1029f3988,'0.123E1',18(45)>
What do 18(18) and 18(45) mean? Precision, I imagine, but what is the notation/unit?
update 2
The code being run is:
((10 - 0.1) * (5.0/100)).to_d
My test is expecting this to be equal (==) to:
0.495.to_f
This passed under 1.8, fails under 1.9.2 and 1.9.3
Equality comparisons rarely succeed on FP values
The short answer is that the Float#to_d is more accurate in 1.9 and is correctly failing the equality test that should not have succeeded in 1.8.7.
The long answer involves a basic rule of floating point programming: never do equality comparisons. Instead, fuzzy comparisons like if (abs(x-y) < epsilon) are recommended, or code is written to avoid the need for equality comparison altogether.
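In Ruby that fuzzy comparison might look like the sketch below (the tolerance is an arbitrary choice for this example, not something the standard library provides):
EPSILON = 1e-9   # tolerance chosen for this example
def roughly_equal?(x, y, eps = EPSILON)
  (x - y).abs < eps
end

puts roughly_equal?((10 - 0.1) * (5.0 / 100), 0.495)   # => true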
Although there are in theory about 2^32 single-precision numbers and 2^64 double-precision numbers that could be exactly compared, there are an infinite number that cannot be so compared. (Note: it is safe to do equality comparisons on FP values that happen to be integral. So, contrary to much advice, they are actually perfectly safe for loop indices and subscripts.)
Worse, the way we write fractional numbers makes it unlikely that a comparison with any specific constant will be successful.
That's because the fractions are binary, that is 1/2 + 1/4 + 1/8 ... but our constants are decimal. So, for example, consider monetary amounts in the range $1.00, $1.01, $1.02 .. $1.99. There are 100 values in this range and yet only 4 of them have exact FP representations: 1.00, 1.25, 1.50, and 1.75.
So, back to your problem. Your result of 0.495 has no exact representation and neither does the input constant of 0.1. You begin the calculation with a subtraction of two FP numbers with different magnitudes. The smaller number will be denormalized in order to accomplish the subtraction and so it will lose two or three low-order bits. As a result, the calculation will lead to a slightly larger number than 0.495, because the entire 0.1 was not subtracted from 10. Your constant is actually slightly smaller (internally) than 0.495. And that's why the comparison fails.
Ruby 1.8 must have been accidentally or deliberately losing some low order bits and effectively introducing a rounding step that ended up helping your test.
Remember: the rule of thumb is that you must explicitly program in such rounding for floating point comparisons.
Notes. To answer the question from the comments about simple decimal fraction constants not having exact representations: they don't have exact finite forms because they repeat in binary. Every machine fraction is a rational number of the form x/2^n. Now, the constants are decimal and every decimal constant is a rational number of the form x/(2^n * 5^m). The 5^m numbers are odd, so they contribute no factor of 2. Only when m == 0 is there a finite representation in both the binary and decimal expansions of the fraction. So, 1.25 is exact because it's 5/(2^2 * 5^0) but 0.1 is not because it's 1/(2^1 * 5^1). There is simply no way to express 0.1 as a finite sum of x/2^n components.
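You can see this directly in Ruby by asking each literal for its exact rational value (Float#to_r is exact, so it exposes the binary approximation actually stored):
p 1.25.to_r                      # => (5/4)  -- exact, the m == 0 case
p 0.1.to_r                       # => (3602879701896397/36028797018963968)
p 0.1.to_r == Rational(1, 10)    # => false: the stored value is not exactly 1/10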
See the Wikipedia article on floating point accuracy problems. It does a very good job of explaining why numbers like 0.1 and 0.01 cannot be represented exactly using floating point numbers.
The simple explanation is that these numbers, when represented in binary floating-point format, are recurring, just like one third is 0.3333333333... recurring in decimal.
Just as you can never represent one third exactly using a finite set of decimal digits, you cannot represent these numbers exactly using a finite set of binary digits.
I was wondering, how I can get the best precision on ruby. Someone told me that the best precision is probably between 0 and 1, because as you go into larger numbers the step increases as well.
I suppose a way to find out would be to know what the minimum float number is and what the next float number is; then the precision would be the difference, right? If I'm correct, how could I do this in Ruby?
I am not sure how to use this http://ruby.wikia.com/wiki/Float to find that information.
Any help appreciated.
In terms of significant digits, the precision is the same, regardless of scale. That is, if you scale your range from [0.0, 1000.0] down to [0.0, 1.0] just by dividing numbers in the natural range by 1000.0, this will have no discernible effect on the precision of your range. In fact, a larger range will have marginally greater precision since it fully contains the smaller range.
As for discovering the absolute precision, you have two problems:
The absolute precision depends on the magnitude, which varies "infinitely" within the range [0, 1] (the limit of log(x) as x → 0 is −∞). So there is no one precision for numbers in that range. You can only derive absolute precision at a given point in the range.
The common technique for discovering the minimum step — known as the ulp — is to interpret the bit-representation of the float as an integer, increment it by one, and reinterpret the result as a float. Ruby doesn't, AFAIK, let you do this.
There is, however, an iterative solution. Simply add 1.0 to the number and subtract ((x + 1.0) - x). If the difference is zero, double the addend ((x + 2.0) - x) and repeat until the difference is non-zero. Otherwise, halve the addend (to 0.5) and repeat until the difference is zero. Whenever you stop, the lowest addend that produces a non-zero difference is the ulp. (I described this from vague memory, so it might be NQR.)
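For what it's worth, Ruby can reinterpret the bits via pack/unpack after all, and Ruby 2.2+ also exposes Float#next_float, so the ulp at a point can be obtained without the iteration; a sketch:
x = 0.5
# Direct route (Ruby 2.2+): distance to the next representable Float.
ulp = x.next_float - x                           # => 1.1102230246251565e-16

# Bit-twiddling route: reinterpret the bits, add one, reinterpret back.
bits = [x].pack('d').unpack('Q').first
next_up = [bits + 1].pack('Q').unpack('d').first
puts next_up - x == ulp                          # => true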
You can use the Rational class - it stores non-integer numbers as a fraction of two Integers, which (as far as I know) will be automatically promoted to Bignum when needed.
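A quick illustration of Rational staying exact where Float drifts:
p Rational(1, 10) + Rational(2, 10) == Rational(3, 10)   # => true (exact arithmetic)
p 0.1 + 0.2 == 0.3                                       # => false (binary rounding error)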
The Flt ruby library provides arbitrary floating point precision.