probabilities with small numbers

I am working with large numbers of probabilities that I multiply together, so I quickly obtain very small numbers, and eventually Python stores the final result as zero.
To get around this difficulty, I decided to sum the logarithms of these probabilities instead of multiplying the probabilities directly. This strategy returns a negative number (call it c) as expected.
But then, if I apply the exponential to c (to get back the real value of my product of probabilities), I obtain zero, because c is too strongly negative (something like -123445.4).
How can I get around this problem?

If you are going to use numbers of that magnitude, you should use a specialized library that can handle arbitrary floating-point precision. Check out the mpmath or bigfloat packages, for example.
IEEE double precision natively supports numbers only down to about 1e-308 (roughly exp(-708)). Alternatively, you could restrict your code to store only the exponent and never convert it back to a decimal representation.
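For example, a minimal sketch of both suggestions (the probability values are just illustrative; mpmath is the arbitrary-precision package mentioned above):

import math
import mpmath

# The product 1e-200 * 1e-150 * 1e-30 = 1e-380 underflows double precision,
# but the sum of logs is a perfectly ordinary float.
log_probs = [math.log(p) for p in (1e-200, 1e-150, 1e-30)]
c = sum(log_probs)
print(c)              # about -874.9
print(math.exp(c))    # 0.0: doubles stop near 1e-308

# mpmath has no trouble exponentiating c when a decimal value is needed.
print(mpmath.exp(c))  # about 1.0e-380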

Related

Precision of digital computers

I read that multiplying many values between 0 and 1 will significantly reduce the precision of digital computers. I want to know the basis for this claim, and whether it still holds for modern-day computers.
The typical IEEE-conformant representation of fractional numbers only supports a limited number of (binary) digits. So, very often, the result of some computation isn't an exact representation of the expected mathematical value, but something close to it (rounded to the next number representable within the digits limit), meaning that there is some amount of error in most calculations.
If you do multi-step calculations, you might be lucky that the error introduced by one step is compensated by some complementary error at a later step. But that's pure luck, and statistics teaches us that the expected error will indeed increase with every step.
If you e.g. do 1000 multiplications using the float datatype (typically achieving 6-7 significant decimal digits accuracy), I'd expect the result to be correct only up to about 5 digits, and in worst case only 3-4 digits.
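A quick experiment along these lines, as a sketch with NumPy (the value range and count are just illustrative):

import numpy as np

# Multiply 1000 float32 values near 1.0 and compare against a float64 reference.
rng = np.random.default_rng(0)
vals = rng.uniform(0.999, 1.001, 1000)

p32 = np.float32(1.0)
for v in vals.astype(np.float32):
    p32 = np.float32(p32 * v)
p64 = float(np.prod(vals))  # float64 reference

print(abs(p32 - p64) / abs(p64))  # relative error, typically around 1e-6 to 1e-5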
There are ways to do precise calculations (at least for addition, subtraction, multiplication and division), e.g. using the ratio type in the LISP programming language, but in practice they are rarely used.
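Python offers the same idea in its standard fractions module; a minimal illustration:

from fractions import Fraction

# Exact rational arithmetic, analogous to Lisp's ratio type: no rounding error.
print(Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))  # True
print(0.1 + 0.2 == 0.3)  # False: binary floats must round these values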
So yes, doing multi-step calculations in datatypes supporting fractional numbers quickly degrades precision, and it happens with all number ranges, not only with numbers between 0 and 1.
If this is a problem for some application, it's a special skill to transform mathematical formulas into equivalent ones that can be computed with better precision (e.g. formulas with fewer intermediate steps).

How to get a representative random number from a set of pseudo random numbers?

Let's say I got three pseudo random numbers from different pseudo random number generators.
Since each generator reflects only part of a real random process, I believe one way to get a number closer to truly random might be to somehow take a "center" of the three pseudo-random numbers.
An easy way to get that "center" would be to take average, median or mode (if any) of them.
I am wondering if there is a more sophisticated way, given that they are meant to represent random numbers.
Well, there is an approach called an entropy extractor which allows one to get good random numbers from not-quite-random sources.
If you have three independent but somewhat low-quality (biased) RNGs, you can combine them into a uniform source.
Suppose you have three generators giving you a single byte each; then a uniform output would be
t = X*Y + Z
where addition and multiplication are done over the finite field GF(2^8).
Some code (Python), assuming the pyfinite package:

from pyfinite import ffield

def RNG1():
    return ...  # single random byte

def RNG2():
    return ...  # single random byte

def RNG3():
    return ...  # single random byte

def muRNG():
    # t = X*Y + Z over GF(2^8)
    X = RNG1()
    Y = RNG2()
    Z = RNG3()
    GF = ffield.FField(8)
    return GF.Add(GF.Multiply(X, Y), Z)
Paper where this idea was stated
Trying to use some form of "centering" turns out to be a bad idea if your goal is to have a better representation of the randomness.
First, a thought experiment. If you think three values give more randomness, wouldn't more be even better? It turns out that if you take either the average or the median of n Uniform(0,1) values, then as n→∞ both converge to 0.5, a single point.

It also happens that replacing distributions with a "representative" constant is generally a bad idea if you want to understand stochastic systems. As an extreme example, consider queues. As the arrival rate of customers/entities approaches the rate at which they can be served, stochastic queues get progressively longer on average. If the arrival and service distributions are constant, however, queues remain at zero length until the arrival rate exceeds the service rate, at which point they grow without bound. When the rates are equal, the stochastic queue grows without bound, while the deterministic queue remains at its initial length (usually assumed to be zero). Infinity and zero are about as wildly different as you can get, illustrating that replacing the distributions in a queueing model with their means would give you no understanding of how queues actually work.
Next, empirical evidence. Below are histograms of the medians and averages constructed from 10,000 samples of three uniforms. As you can see, they have different distributional shapes, but both are clearly no longer uniform. Values bunch in the middle and become progressively rarer towards the endpoints of the range (0,1).
The uniform distribution has maximum entropy for continuous distributions on a closed interval, so both of these alternatives, being non-uniform, are clearly lower entropy, i.e., more predictable.
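You can reproduce this in a few lines with Python's standard library (for a truly uniform output, about half the values would land in (0.25, 0.75)):

import random
import statistics

samples = [[random.random() for _ in range(3)] for _ in range(10_000)]
medians = [statistics.median(s) for s in samples]
means = [statistics.mean(s) for s in samples]

# Both "centers" pile up in the middle of the interval.
print(sum(0.25 < m < 0.75 for m in medians) / 10_000)  # about 0.69
print(sum(0.25 < m < 0.75 for m in means) / 10_000)    # about 0.86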
To get good random numbers, it's advisable to gather some bits of entropy. Depending on whether or not they are used for security purposes, you could just take the time from the system clock as a seed for a random number generator, or use more sophisticated means. The open-source project PWGen (on SourceForge) monitors Windows events as a source of random bits of entropy.
You can find more info on random numbers in C++ in this SO question: Random number generation in C++11: how to generate, how does it work? It turns out C++'s random numbers aren't always all that random: Everything You Never Wanted to Know about C++'s random_device. Looking for a good way to seed, e.g. by passing the time in ms to srand() and calling rand(), might be a quick and dirty way to go.

Random number from many other random numbers, is it more random?

We want to generate a uniform random number from the interval [0, 1].
Let's first generate k random booleans (for example by rand() < 0.5) and decide, according to these, in which subinterval [m*2^{-k}, (m+1)*2^{-k}] the number will fall. Then we use one rand() to get the final output as m*2^{-k} + rand()*2^{-k}.
Let's assume we have arbitrary precision.
Will a random number generated this way be 'more random' than the usual rand()?
PS. I guess the subinterval picking amounts to choosing the binary representation of the output 0.b_1 b_2 b_3... one digit b_i at a time, and the final step appends the representation of rand() to the end of the output.
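In code, the construction looks something like this (a sketch using Python's random module in place of rand()):

import random

def layered_rand(k, rng=random):
    # Choose the k leading bits one at a time...
    m = 0
    for _ in range(k):
        m = (m << 1) | (rng.random() < 0.5)
    # ...then append a full random number, scaled into the chosen subinterval.
    return (m + rng.random()) * 2.0 ** -k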
It depends on the definition of "more random". If you use more random generators, it means more random state, which means the cycle length will be greater. But cycle length is just one property of random generators. A cycle length of 2^64 is usually OK for almost any purpose (the only exception I know of is when you need a lot of different, long sequences, as in some kinds of simulation).
However, if you combine two bad random generators, they don't necessarily become better; you have to analyze the combination. But there are generators which do work this way. KISS is an example: it combines three not-too-good generators, and the result is a good generator.
For card shuffling, you'll need a cryptographic RNG. Even a very good but non-cryptographic RNG is inadequate for this purpose. For example, the Mersenne Twister, which is a good RNG, is not suitable for secure card shuffling! This is because, by observing its output, it is possible to recover its internal state, so the shuffle result can be predicted.
This can help, but only if you use a different pseudorandom generator for the first and last bits. (It doesn't have to be a different pseudorandom algorithm, just a different seed.)
If you use the same generator, then you will still only be able to construct 2^n different shuffles, where n is the number of bits in the random generator's state.
If you have two generators, each with n bits of state, then you can produce up to a total of 2^(2n) different shuffles.
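To make the state-size limit concrete (my own back-of-the-envelope check, not from the answer above): a 52-card deck has 52! orderings, so indexing every possible shuffle needs log2(52!) bits of state.

import math

print(math.lgamma(53) / math.log(2))  # log2(52!) is about 225.6, far beyond 64 bits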
Tinkering with a random number generator, as you are doing by using only one bit of its output at a time and then calling it iteratively, usually weakens its random properties. All RNGs fail some statistical tests for randomness, but you are more likely to find that a noticeable cycle crops up if you start making many calls and combining the results.

math: scale coordinate system so that certain points get integer coordinates

This is more of a mathematical problem; nonetheless I am looking for an algorithm in pseudocode to solve it.
Given is a one-dimensional coordinate system with a number of points. The coordinates of the points may be floating point.
Now I am looking for a factor that scales this coordinate system so that all points land on fixed numbers (i.e. integer coordinates).
If I am not mistaken, there should be a solution to this problem as long as the number of points is not infinite.
If I am wrong and there is no analytical solution to this problem, I am interested in an algorithm that approximates the solution as closely as possible (i.e. the coordinates will look like 15.0001).
If you are interested in the concrete problem:
I would like to overcome the well-known pixel-snapping problem in Adobe Flash, which cuts off half-pixels at the border of bitmaps when the whole stage is scaled. I would like to find an ideal scaling factor for the stage that places my bitmaps on whole (screen-)pixel coordinates.
Since I am placing two bitmaps on the stage, the number of points will be 4 in each direction (x, y).
Thanks!
As suggested, you have to convert your floating point numbers to rational ones. Fix a tolerance epsilon, and for each coordinate, find its best rational approximation within epsilon.
An algorithm and the relevant definitions are outlined in the Wikipedia material on best rational approximations (continued fractions).
Once you have converted all the coordinates into rational numbers, the scaling is given by the least common multiple of the denominators.
Note that this latter number can become quite huge, so you may want to experiment with epsilon so that to control the denominators.
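In Python the whole recipe fits in a few lines (a sketch; Fraction.limit_denominator plays the role of the epsilon tolerance, and math.lcm needs Python 3.9+):

from fractions import Fraction
from math import lcm

def integer_scale(coords, max_denominator=1000):
    # Best rational approximation of each coordinate, then scale by the
    # least common multiple of the denominators.
    fracs = [Fraction(c).limit_denominator(max_denominator) for c in coords]
    scale = lcm(*(f.denominator for f in fracs))
    return scale, [f * scale for f in fracs]

print(integer_scale([0.5, 0.3333333333, 1.25]))
# (12, [Fraction(6, 1), Fraction(4, 1), Fraction(15, 1)])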
My own inclination, if I were in your situation, would be to use rational numbers rather than floating point.
The algorithm you are looking for is then finding the lowest common denominator.
A floating point number is an integer, multiplied by a power of two (the power might be negative).
So, find the largest necessary power of two among your inputs, and that gives you a scale factor that will work. The power of two isn't just -1 times the exponent of the float; it's a few more than that, depending on where the least significant 1 bit is in the significand.
It's also optimal, because if x times a power of 2 is an odd integer, then x in its float representation was already in simplest rational form, and there is no smaller integer that you can multiply x by to get an integer.
Obviously if you have a mixture of large and small values among your input, then the resulting integers will tend to be bigger than 64 bit. So there is an analytical solution, but perhaps not a very good one given what you want to do with the results.
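Python exposes this exact form directly via float.as_integer_ratio(), whose denominator is always a power of two; a minimal sketch:

def exact_scale(coords):
    ratios = [c.as_integer_ratio() for c in coords]
    # The denominators are powers of two, so the largest one is their LCM.
    scale = max(d for _, d in ratios)
    return scale, [n * (scale // d) for n, d in ratios]

print(exact_scale([0.5, 0.75, 2.25]))  # (4, [2, 3, 9])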
Note that this approach treats floats as being precise representations, which they are not. You may get more sensible results by representing each float as a rational number with smaller denominator (within some defined tolerance), then taking the lowest common multiple of all the denominators.
The problem there, though, is the approximation process: if the input float is 0.334[*], then I can't in general be sure whether the person who gave it to me really meant 0.334, or whether it's 1/3 with some inaccuracy. I therefore don't know whether to use a scale factor of 3 and say the scaled result is 1, or use a scale factor of 500 and say the scaled result is 167. And that's just with 1 input, never mind a bunch of them.
With 4 inputs and an allowed final tolerance of 0.0001, you could perhaps find the 10 closest rationals to each input with a certain maximum denominator, then try the 10^4 different possibilities and see whether the resulting scale factor gives you any values that are too far from an integer. Brute force seems nasty, but you might at least be able to bound the search a bit as you go. Also, "maximum denominator" might be expressed in terms of the primes present in the factorization rather than just the number itself, since if you can find a lot of common factors among the denominators, then they'll have a smaller lcm and hence a smaller deviation from integers after scaling.
[*] Not that 0.334 is an exact float value, but that sort of thing. Decimal examples are easier.
If you are talking about single-precision floating-point numbers, then according to Wikipedia the number can be expressed like this:

value = (-1)^sign * (1 + fraction/2^23) * 2^(e-127)

From this formula you can deduce that you always get an integer if you multiply by 2^(127+23). (Actually, when e is 0 you have to use another formula for the special range of "subnormal" numbers, so 2^(126+23) is sufficient. See the linked Wikipedia article for details.)
To do this in code you will probably need to do some bit twiddling to extract the factors in the above formula from the bits in the floating point value. And then you will need some kind of support for unlimited size numbers to express the integer result of the scaling (e.g. BigInteger in .NET). Normal primitive types in most languages/platforms are typically limited to much smaller sizes.
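For instance, in Python the bit twiddling can be done with the struct module (a sketch; the field layout is the IEEE 754 single-precision one from the formula above):

import struct

def float32_parts(x):
    bits = struct.unpack('>I', struct.pack('>f', x))[0]  # raw 32 bits
    sign = bits >> 31
    e = (bits >> 23) & 0xFF        # 8-bit biased exponent
    fraction = bits & 0x7FFFFF     # 23-bit fraction field
    return sign, e, fraction

print(float32_parts(0.75))  # (0, 126, 4194304): 0.75 = +1.5 * 2**(126-127)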
It's really a problem in statistical inference combined with noise reduction. This is the method I'm going to try out soon. I'm assuming you're trying to get a regularly spaced 2-D grid but a similar method could work on a regularly spaced grid of 3 or more dimensions.
First tabulate all the differences, and note that (dx,dy) and (-dx,-dy) denote the same displacement, so there is an equivalence relation. Group those differences that are within a pre-assigned threshold (epsilon) of one another. Epsilon should be large enough to capture measurement errors due to random noise or lack of image resolution, but small enough not to accidentally merge clusters.
Sort the clusters by their average size (dr = sqrt(dx^2 + dy^2)).
If the original grid was indeed regularly spaced and generated by two independent basis vectors, then the two smallest linearly independent clusters will reveal them. The smallest cluster is the one centered on (0,0). The next smallest cluster, (dx0,dy0), gives the first basis vector up to sign (recall that (-dx0,-dy0) denotes the same displacement).
The next smallest clusters may be linearly dependent on this (up to the threshold epsilon) by virtue of being multiples of (dx0, dy0). Find the smallest cluster which is NOT a multiple of (dx0, dy0). Call this (dx1, dy1).
Now you have enough to tag the original vectors. Sort the vectors in increasing lexicographic order ((x,y) > (x',y') if x > x', or x = x' and y > y'). Take the smallest, (x0,y0), and assign the integer pair (0,0) to it. For every other (x,y), find the decomposition (x,y) = (x0,y0) + M0(x,y)*(dx0,dy0) + M1(x,y)*(dx1,dy1) and assign it the integers (m0(x,y), m1(x,y)) = (round(M0), round(M1)).
Now do a least-squares fit of the integers to the vectors, using the equations
(x,y) = (ux,uy) + m0(x,y)*(u0x,u0y) + m1(x,y)*(u1x,u1y),
to find (ux,uy), (u0x,u0y), and (u1x,u1y). This identifies the grid.
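The fit itself is ordinary linear least squares; a minimal NumPy sketch, with P the n x 2 array of points and m0, m1 the integer labels assigned above:

import numpy as np

def fit_grid(P, m0, m1):
    # Solve (x,y) ~ (ux,uy) + m0*(u0x,u0y) + m1*(u1x,u1y) in the least-squares sense.
    A = np.column_stack([np.ones(len(P)), m0, m1])   # n x 3 design matrix
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(P), rcond=None)
    return coeffs  # rows: (ux,uy), (u0x,u0y), (u1x,u1y)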
Test this match to determine whether or not all the points are within a given threshold of this fit (maybe using the same threshold epsilon for this purpose).
The 1-D version of this same routine should also work on a spectrograph to identify the fundamental frequency in a voice print. In that case the assumed value for ux (which replaces (ux,uy)) is just 0, and one is only looking for a fit to the homogeneous equation x = m0(x)*u0x.

Mersenne Twister: seeding & visualization

I am using a C# implementation of Mersenne Twister I downloaded from CenterSpace. I have two problems with it:
No matter how I seed the algorithm, it does not pass the Diehard tests; by that I mean I get quite a lot of 1s and 0s as p-values. Also, my KS test on 269 p-values gives 0. I cannot quite interpret the p-values, but I think a number of exact 1s and 0s in the results is bad news.
I have been asked to visually show the randomness of the numbers. So I plot the numbers as they are generated, and this does not seem random at all. Here are two screenshots of the result after a few seconds and a few seconds later. As you can see in the second screenshot, the numbers fall on some parallel lines. I have tried different algorithms to map numbers to points; they all result in parallel lines, but with different angles! This is how I mapped numbers to points for these screenshots: new Point(number % _canvasWidth, number % _canvasHeight). As you may guess, the visual result depends on the form's width and height, which is a disastrous result.
Here are a few ways I tried to seed the algorithm:
User entry. I enter some numbers to seed the algorithm as an int array.
Random numbers generated by the algorithm itself!!
An array of new Guid().GetHashCode()
What am I missing here? How should I seed the algorithm? How can I get it to pass Diehard?
While I cannot speak to your first point, the second problem has to do with how you are computing the points to draw on. Specifically,
x = number % _canvasWidth;
y = number % _canvasHeight;
will give you a "pattern" that corresponds somewhat to the aspect ratio of the window you are drawing to. For example, if _canvasWidth and _canvasHeight were equal, you would always draw on a single diagonal line as x and y would always be the same. This graphical representation wouldn't be appropriate in this case, then.
What about taking the N bits of the RNG output and using half for the x coordinate and the other half for the y coordinate? For coordinates that fall outside the bounds of your window, you might want to consider two options:
Don't draw them (or draw them offscreen)
Perform a linear interpolation to map the range of bits to the width/height of your window
Either option should give you a more representative picture of the bits you are getting out of your random number generator. Good luck!
Your stripy point-plotting problem should easily be fixed by generating a new random number for each of the x and y coordinates. Trying to reuse a single generated number for x and y is basically premature optimization, but if you do go down that route, make sure you extract different bits for each from the number; as is, x=n%width;y=n%height gives you enormous correlation between x and y, as can be seen in your images.
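Both fixes take only a couple of lines; here is a Python sketch (the same idea carries over directly to a C# implementation):

import random

rng = random.Random(12345)

def random_point(width, height):
    # Independent draws for x and y: no correlation between the coordinates.
    return rng.randrange(width), rng.randrange(height)

def random_point_bits(width, height):
    # Or split one 32-bit number into disjoint 16-bit halves.
    n = rng.getrandbits(32)
    return (n & 0xFFFF) % width, (n >> 16) % height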
I've been using various C++ Mersenne Twister implementations for years (most recently boost's) to generate random points and had no difficulties with it (seed related or otherwise). It really is a superb generator.
True random number generation cannot be done with a mathematical function. If it is important to have truly random numbers, get a hardware random number generator. I've developed real-money online poker games, and such hardware is the only way to be confident there are no patterns in the numbers.
If you are targeting a Linux environment, the /dev/random and /dev/urandom pseudo-devices do a lot better than a mathematical generator, since they incorporate entropy from hardware activity.
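From application code you normally reach these through the OS interface; in Python, for instance:

import os

# os.urandom reads from the OS entropy pool (backed by /dev/urandom on Linux).
print(os.urandom(16).hex())  # 16 cryptographically strong random bytes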
