Sampling from Geometric distribution in constant time - random

I would like to know if there is any method to sample from the geometric distribution in constant time without using log, which can be hard to approximate. Thanks.

Without relying on logarithms, there is no algorithm to sample from a geometric(p) distribution in constant expected time. Rather, on a realistic computing model, such an algorithm's expected running time must grow at least as fast as 1 + log(1/p)/w, where w is the word size of the computer in bits (Bringmann and Friedrich 2013). The following algorithm, which is equivalent to one in the Bringmann paper, generates a geometric(px/py) random number without relying on logarithms, and when px/py is very small, the algorithm is considerably faster than the trivial algorithm of generating trials until a success:
Set pn to px, k to 0, and d to 0.
While pn*2 <= py, add 1 to k and multiply pn by 2.
With probability (1 - px/py)^(2^k), add 1 to d and repeat this step.
Generate a uniform random integer in [0, 2^k), call it m, then with probability (1 - px/py)^m, return d*2^k + m. Otherwise, repeat this step.
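Here is a minimal Python sketch of these steps, assuming px and py are positive integers with px <= py. It performs the Bernoulli checks with ordinary floating-point arithmetic, so unlike the exact construction in the Bringmann paper it is only approximate:

import random

def sample_geometric(px, py):
    # Geometric(px/py) sampler following the steps above; px, py are
    # integers with 0 < px <= py.  Floating-point sketch only.
    pn, k, d = px, 0, 0
    while pn * 2 <= py:
        k += 1
        pn *= 2
    q = 1.0 - px / py      # failure probability of a single trial
    block = 2 ** k         # trials are consumed in blocks of 2^k
    while random.random() < q ** block:
        d += 1             # a whole block of failures
    while True:
        m = random.randrange(block)
        if random.random() < q ** m:
            return d * block + m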
(The actual algorithm described in the Bringmann paper is in fact much more involved than this; see my note "On a Geometric Sampler".)
REFERENCES:
Bringmann, K., and Friedrich, T., 2013, July. Exact and efficient generation of geometric random variates and random graphs, in International Colloquium on Automata, Languages, and Programming (pp. 267-278).

Related

generating sorted random numbers without exponentiation involved?

I am looking for a math equation or algorithm which can generate uniform random numbers in ascending order in the range [0,1] without the help of the division operator. I am keen to skip the division operation because I am implementing it in hardware. Thank you.
Generating the numbers in ascending (or descending) order means generating them sequentially but with the right distribution. That, in turn, means we need to know the distribution of the minimum of a set of size N, and then at each stage we need to use conditioning to determine the next value based on what we've already seen. Mathematically these are both straightforward except for the issue of avoiding division.
You can generate the minimum of N uniform(0,1)'s from a single uniform(0,1) random number U using the algorithm min = 1 - U**(1/N), where ** denotes exponentiation. In other words, the complement of the Nth root of a uniform has the same distribution as the minimum of N uniforms over the range [0,1], which can then be scaled to any other interval length you like.
The conditioning aspect basically says that the k values already generated will have eaten up some portion of the original interval, and that what we now want is the minimum of N-k values, scaled to the remaining range.
Combining the two pieces yields the following logic. Generate the smallest of the N uniforms, scale it by the remaining interval length (1 the first time), and make that result the last value we have generated. Then generate the smallest of N-1 uniforms, scale it by the remaining interval length, and add it to the last one to give you your next value. Lather, rinse, repeat, until you have done them all. The following Ruby implementation gives distributionally correct results, assuming you have read in or specified N prior to this:
last_u = 0.0
N.downto(1) do |i|
  p last_u += (1.0 - last_u) * (1.0 - (rand ** (1.0 / i)))
end
but we have that pesky ith root which uses division. However, if we know N ahead of time, we can pre-calculate the inverses of the integers from 1 to N offline and table them.
# inverse[i] holds the precomputed value 1.0/i, tabled offline
last_u = 0.0
N.downto(1) do |i|
  p last_u += (1.0 - last_u) * (1.0 - (rand ** inverse[i]))
end
I don't know of any way to get the correct distributional behavior sequentially without using exponentiation. If that's a show-stopper, you're going to have to give up on either the sequential nature of the process or the uniformity requirement.
You can try so-called "stratified sampling", which means you divide the range into bins and then sample randomly from the bins. A sample generated this way is more uniform (less clumping) than a sample generated from the entire interval. For this reason, stratified sampling reduces the variance of Monte Carlo estimates (I don't suppose that's important to you, but that's why the method was invented, as a variance reduction method).
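If "approximately uniform but already ordered" is good enough, a minimal Python sketch of that binned idea looks like this (one draw per equal-width bin, so the output is ascending by construction; note that the joint distribution is not the same as the sorted order statistics of N independent uniforms, and the single reciprocal can be precomputed offline to avoid run-time division):

import random

def stratified_ascending(n):
    # One uniform draw per bin [i/n, (i+1)/n); the results come out sorted.
    width = 1.0 / n    # precompute offline if division must be avoided at run time
    return [(i + random.random()) * width for i in range(n)]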
It is an interesting problem to generate numbers in order, but my guess is that to get a uniform distribution over the entire interval, you will have to apply some formulas which require more computation. If you want to minimize computation time, I suspect you cannot do better than generating a sample and then sorting it.

Iterative random weighted choice [duplicate]

Suppose that I have an n-sided loaded die, where each side k has some probability pk of coming up when I roll it. I’m curious if there is a good data structure for storing this information statically (i.e., for a fixed set of probabilities), so that I can efficiently simulate a random roll of the die.
Currently, I have an O(lg n) solution for this problem. The idea is to store a table of the cumulative probability of the first k sides for all k, then generate a random real number in the range [0, 1) and perform a binary search over the table to get the largest index whose cumulative value is no greater than the chosen value.
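For reference, a minimal Python sketch of this cumulative-table-plus-binary-search roll (bisect and the helper names are illustrative, not part of the question):

import bisect
import random
from itertools import accumulate

def make_table(probs):
    # Cumulative probabilities of sides 1..n.
    return list(accumulate(probs))

def roll(table):
    # O(log n) per roll: index of the first cumulative value exceeding the draw.
    return bisect.bisect_right(table, random.random()) + 1   # sides numbered from 1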
I rather like this solution, but it seems odd that the runtime doesn’t take the probabilities into account. In particular, in the extreme cases of one side always coming up or the values being uniformly distributed, it’s possible to generate the result of the roll in O(1) using a naive approach, while my solution will still take logarithmically many steps.
Does anyone have any suggestions for how to solve this problem in a way that is somehow “adaptive” in its runtime?
Update: Based on the answers to this question, I have written up an article describing many approaches to this problem, along with their analyses. It looks like Vose’s implementation of the alias method gives Θ(n) preprocessing time and O(1) time per die roll, which is truly impressive. Hopefully this is a useful addition to the information contained in the answers!
You are looking for the alias method, which provides an O(1) method for sampling from a fixed discrete probability distribution (assuming you can access entries in an array of length n in constant time) with a one-time O(n) set-up. You can find it documented in chapter 3 (PDF) of "Non-Uniform Random Variate Generation" by Luc Devroye.
The idea is to take your array of probabilities p_k and produce three new n-element arrays, q_k, a_k, and b_k. Each q_k is a probability between 0 and 1, and each a_k and b_k is an integer between 1 and n.
We generate random numbers between 1 and n by generating two random numbers, r and s, between 0 and 1. Let i = floor(r*n) + 1. If q_i < s then return a_i, else return b_i. The work in the alias method is in figuring out how to produce q_k, a_k and b_k.
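For what it's worth, here is a Python sketch of Vose's setup and the O(1) sampling step. It uses the common specialisation in which one of the two arrays is just the index itself (b_i = i), so only the thresholds q and the aliases a need to be stored; the names and the 0-based indexing are implementation choices, not part of the answer:

import random

def build_alias(probs):
    # Vose's O(n) setup: q[i] is a threshold in [0, 1], alias[i] a fallback index.
    n = len(probs)
    q = [p * n for p in probs]
    alias = list(range(n))
    small = [i for i, v in enumerate(q) if v < 1.0]
    large = [i for i, v in enumerate(q) if v >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        alias[s] = l
        q[l] = (q[l] + q[s]) - 1.0
        (small if q[l] < 1.0 else large).append(l)
    return q, alias

def roll(q, alias):
    # O(1) per roll: pick a column, then keep it or take its alias.
    i = random.randrange(len(q))
    return i if random.random() < q[i] else alias[i]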
Use a balanced binary search tree (or binary search in an array) and get O(log n) complexity. Have one node for each die result and have the keys be the interval that will trigger that result.
function get_result(node, seed):
    if seed < node.interval.start:
        return get_result(node.left_child, seed)
    else if seed < node.interval.end:
        // start <= seed < end
        return node.result
    else:
        return get_result(node.right_child, seed)
The good thing about this solution is that it is very simple to implement but still has good complexity.
I'm thinking of granulating your table.
Instead of having a table with the cumulative probability for each die value, you could create an integer array of length xN, where x is ideally a large number, to increase the accuracy of the probabilities.
Populate this array using the index (normalized by xN) as the cumulative value and, in each 'slot' in the array, store the would-be dice roll if this index comes up.
Maybe I can explain this more easily with an example:
Using a three-sided die: P(1) = 0.2, P(2) = 0.5, P(3) = 0.3
Create an array; in this case I will choose a simple length, say 10 (that is, x = 3.33333):
arr[0] = 1,
arr[1] = 1,
arr[2] = 2,
arr[3] = 2,
arr[4] = 2,
arr[5] = 2,
arr[6] = 2,
arr[7] = 3,
arr[8] = 3,
arr[9] = 3
Then, to get a roll, just generate a random integer in [0, 10) and simply access that index.
This method might lose accuracy, but if you increase x, the accuracy will be sufficient.
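A small Python sketch of this lookup-table idea, using the example above (the function names are mine; the rounding in the table construction is exactly where the accuracy loss comes from):

import bisect
import random
from itertools import accumulate

def build_table(probs, x):
    # One slot per equal slice of [0, 1); slot i holds the face whose
    # cumulative range contains i/size.  Faces are numbered from 1.
    size = round(len(probs) * x)
    cum = list(accumulate(probs))
    return [min(bisect.bisect_right(cum, i / size), len(probs) - 1) + 1
            for i in range(size)]

def roll(table):
    # O(1) per roll; accuracy is limited by the table's resolution.
    return table[random.randrange(len(table))]

For the example above, build_table([0.2, 0.5, 0.3], 10 / 3) reproduces the ten slots listed.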
There are many ways to generate a random integer with a custom distribution (also known as a discrete distribution). The choice depends on many things, including the number of integers to choose from, the shape of the distribution, and whether the distribution will change over time.
One of the simplest ways to choose an integer with a custom weight function f(x) is the rejection sampling method. The following assumes that the highest possible value of f is max and each weight is 0 or greater. The time complexity for rejection sampling is constant on average, but depends greatly on the shape of the distribution and has a worst case of running forever. To choose an integer in [1, k] using rejection sampling:
Choose a uniform random integer i in [1, k].
With probability f(i)/max, return i. Otherwise, go to step 1. (For example, if all the weights are integers greater than 0, choose a uniform random integer in [1, max] and if that number is f(i) or less, return i, or go to step 1 otherwise.)
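A bare-bones Python sketch of that loop (f, k and fmax are stand-ins for the weight function, the number of integers and the maximum weight; nothing here is specific to any particular library):

import random

def weighted_choice_rejection(f, k, fmax):
    # Propose uniformly in [1, k]; accept with probability f(i)/fmax.
    # Expected number of iterations is k*fmax divided by the sum of all weights.
    while True:
        i = random.randint(1, k)
        if random.random() * fmax < f(i):
            return i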
Other algorithms have an average sampling time that doesn't depend so greatly on the distribution (usually either constant or logarithmic), but often require you to precalculate the weights in a setup step and store them in a data structure. Some of them are also economical in terms of the number of random bits they use on average. Many of these algorithms were introduced after 2011, and they include—
The Bringmann–Larsen succinct data structure ("Succinct Sampling from Discrete Distributions", 2012),
Yunpeng Tang's multi-level search ("An Empirical Study of Random Sampling Methods for Changing Discrete Distributions", 2019), and
the Fast Loaded Dice Roller (2020).
Other algorithms include the alias method (already mentioned in your article), the Knuth–Yao algorithm, the MVN data structure, and more. See my section "Weighted Choice With Replacement" for a survey.

Is there a fast way to invert a matrix in Matlab?

I have lots of large (around 5000 x 5000) matrices that I need to invert in Matlab. I actually need the inverse, so I can't use mldivide instead, which is a lot faster for solving Ax=b for just one b.
My matrices are coming from a problem that means they have some nice properties. First off, their determinant is 1, so they're definitely invertible. They aren't diagonalizable, though, or I would try to diagonalize them, invert them, and then put them back. Their entries are all real numbers (actually rational).
I'm using Matlab for getting these matrices and for this stuff I need to do with their inverses, so I would prefer a way to speed Matlab up. But if there is another language I can use that'll be faster, then please let me know. I don't know a lot of other languages (a little bit of C and a little bit of Java), so if it's really complicated in some other language, then I might not be able to use it. Please go ahead and suggest it anyway, though, just in case.
I actually need the inverse, so I can't use mldivide instead,...
That's not true, because you can still use mldivide to get the inverse. Note that A^{-1} = A^{-1} * I. In MATLAB, this is equivalent to
invA = A\speye(size(A));
On my machine, this takes about 10.5 seconds for a 5000x5000 matrix. Note that MATLAB does have an inv function to compute the inverse of a matrix. Although this will take about the same amount of time, it is less efficient in terms of numerical accuracy (more info in the link).
First off, their determinant is 1 so they're definitely invertible
Rather than det(A) = 1, it is the condition number of your matrix that dictates how accurate or stable the inverse will be. Note that det(A) = ∏_{i=1..n} λ_i. So just setting λ_1 = M, λ_n = 1/M and λ_i = 1 for i ≠ 1, n will give you det(A) = 1. However, as M → ∞, cond(A) = M^2 → ∞ and λ_n → 0, meaning your matrix is approaching singularity and there will be large numerical errors in computing the inverse.
My matrices are coming from a problem that means they have some nice properties.
Of course, there are other more efficient algorithms that can be employed if your matrix is sparse or has other favorable properties. But without any additional info on your specific problem, there is nothing more that can be said.
I would prefer a way to speed Matlab up
MATLAB uses Gauss elimination to compute the inverse of a general matrix (full rank, non-sparse, without any special properties) using mldivide, and this is Θ(n^3), where n is the size of the matrix. So, in your case, n = 5000 and there are 1.25 × 10^11 floating point operations. So on a reasonable machine with about 10 Gflops of computational power, you're going to require at least 12.5 seconds to compute the inverse, and there is no way out of this unless you exploit the "special properties" (if they're exploitable).
Inverting an arbitrary 5000 x 5000 matrix is not computationally easy no matter what language you are using. I would recommend looking into approximations. If your matrices are low rank, you might want to try a low-rank approximation M = USV'
Here are some more ideas from math-overflow:
https://mathoverflow.net/search?q=matrix+inversion+approximation
First suppose the eigenvalues are all 1. Let A be the Jordan canonical form of your matrix. Then you can compute A^{-1} using only matrix multiplication and addition by
A^{-1} = I + (I-A) + (I-A)^2 + ... + (I-A)^k
where k < dim(A). Why does this work? Because generating functions are awesome. Recall the expansion
(1-x)^{-1} = 1/(1-x) = 1 + x + x^2 + ...
This means that we can invert (1-x) using an infinite sum. You want to invert a matrix A, so you want to take
A = I - X
Solving for X gives X = I-A. Therefore by substitution, we have
A^{-1} = (I - (I-A))^{-1} = I + (I-A) + (I-A)^2 + ...
Here I've just used the identity matrix I in place of the number 1. Now we have the problem of convergence to deal with, but this isn't actually a problem. By the assumption that A is in Jordan form and has all eigenvalues equal to 1, we know that A is upper triangular with all 1s on the diagonal. Therefore I-A is upper triangular with all 0s on the diagonal. Therefore all eigenvalues of I-A are 0, so its characteristic polynomial is x^dim(A) and its minimal polynomial is x^{k+1} for some k < dim(A). Since a matrix satisfies its minimal (and characteristic) polynomial, this means that (I-A)^{k+1} = 0. Therefore the above series is finite, with the largest nonzero term being (I-A)^k. So it converges.
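To illustrate the finite series concretely, here is a NumPy sketch rather than Matlab code (the function name is mine; it assumes A already has all eigenvalues equal to 1, e.g. unit upper triangular):

import numpy as np

def inv_unit_eigenvalues(a, tol=1e-12):
    # A^{-1} = I + (I-A) + (I-A)^2 + ...; since every eigenvalue of A is 1,
    # (I-A) is nilpotent and the series stops after at most dim(A)-1 powers.
    n = a.shape[0]
    term = np.eye(n) - a
    result = np.eye(n)
    power = np.eye(n)
    for _ in range(n - 1):
        power = power @ term
        if np.all(np.abs(power) < tol):
            break
        result += power
    return result

For example, with A = np.eye(4) + np.triu(np.random.rand(4, 4), 1), the product inv_unit_eigenvalues(A) @ A is the identity to machine precision.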
Now, for the general case, put your matrix into Jordan form, so that you have a block triangular matrix, e.g.:
A 0 0
0 B 0
0 0 C
Where each block has a single value along the diagonal. If that value is a for block A, then use the above trick to invert (1/a) * A, and then multiply the result by 1/a (that is, divide by a again) to recover the inverse of the block. Since the full matrix is block diagonal, the inverse will be
A^{-1} 0 0
0 B^{-1} 0
0 0 C^{-1}
There is nothing special about having three blocks, so this works no matter how many you have.
Note that this trick works whenever you have a matrix in Jordan form. The computation of the inverse in this case will be very fast in Matlab because it only involves matrix multiplication, and you can even use tricks to speed that up since you only need powers of a single matrix. This may not help you, though, if it's really costly to get the matrix into Jordan form.

Why don't genetic algorithms work on problems like factoring RSA?

Some time ago I was pretty interested in GAs and I studied them quite a bit. I used the C++ GAlib library to write some programs and I was quite amazed by their ability to solve otherwise difficult-to-compute problems in a matter of seconds. They seemed like a great brute-forcing technique that works really smartly and adapts.
I was reading a book by Michalewicz, if I remember the name correctly, and it all seemed to be based on the Schema Theorem, proved by MIT.
I've also heard that it cannot really be used to approach problems like factoring RSA private keys.
Could anybody explain why this is the case?
Genetic algorithms are not smart at all; they are very greedy optimization algorithms. They all work around the same idea. You have a group of points ('a population of individuals'), and you transform that group into another one with stochastic operators, with a bias in the direction of best improvement ('mutation + crossover + selection'). Repeat until it converges or you are tired of it; nothing smart there.
For a genetic algorithm to work, a new population of points should perform close to the previous population of points. A small perturbation should create a small change. If, after a small perturbation of a point, you obtain a point that represents a solution with completely different performance, then the algorithm is nothing better than random search, which is usually not a good optimization algorithm. In the RSA case, if your points are directly the numbers, it's either YES or NO, just by flipping a bit... Thus a genetic algorithm is no better than random search if you represent the RSA problem without much thinking ("let's code search points as the bits of the numbers").
I would say it is because factorisation of keys is not an optimisation problem, but an exact problem. This distinction is not very precise, so here are some details.
Genetic algorithms are great for solving problems where there are minima (local/global), but there aren't any in the factorising problem. Genetic algorithms, like DCA or simulated annealing, need a measure of "how close I am to the solution", but you can't define one for our problem.
For an example of a problem where genetic approaches do well, consider hill climbing.
GAs are based on fitness evaluation of candidate solutions.
You basically have a fitness function that takes in a candidate solution as input and gives you back a scalar telling you how good that candidate is. You then go on and allow the best individuals of a given generation to mate with higher probability than the rest, so that the offspring will be (hopefully) more 'fit' overall, and so on.
There is no way to evaluate fitness (how good is a candidate solution compared to the rest) in the RSA factorization scenario, so that's why you can't use them.
GAs are not brute-forcing, they’re just a search algorithm. Each GA essentially looks like this:
candidates = seed_value;
while (!good_enough(best_of(candidates))) {
    candidates = compute_next_generation(candidates);
}
Where good_enough and best_of are defined in terms of a fitness function. A fitness function says how well a given candidate solves the problem. That seems to be the core issue here: how would you write a fitness function for factorization? For example 20 = 2*10 or 4*5. The tuples (2,10) and (4,5) are clearly winners, but what about the others? How “fit” is (1,9) or (3,4)?
Indirectly, you can use a genetic algorithm to factor an integer N. Dixon's integer factorization method uses equations involving powers of the first k primes, modulo N. These products of powers of small primes are called "smooth". If we are using the first k=4 primes - {2,3,5,7} - 42=2x3x7 is smooth and 11 is not (for lack of a better term, 11 is "rough"). Dixon's method requires an invertible k x k matrix consisting of the exponents that define these smooth numbers. For more on Dixon's method see https://en.wikipedia.org/wiki/Dixon%27s_factorization_method.
Now, back to the original question: There is a genetic algorithm for finding equations for Dixon's method.
Let r be the inverse of a smooth number mod N - so r is a rough number
Let s be smooth
Generate random solutions of rx = sy mod N. These solutions [x,y] are the population for the genetic algorithm. Each x, y has a smooth component and a rough component. For example suppose x = 369 = 9 x 41. Then (assuming 41 is not small enough to count as smooth), the rough part of x is 41 and the smooth part is 9.
Choose pairs of solutions - "parents" - to combine into linear combinations with ever smaller rough parts.
The algorithm terminates when a pair [x,y] is found with rough parts [1,1], [1,-1],[-1,1] or [-1,-1]. This yields an equation for Dixon's method, because rx=sy mod N and r is the only rough number left: x and y are smooth, and s started off smooth. But even 1/r mod N is smooth, so it's all smooth!
Every time you combine two pairs - say [v,w] and [x,y] - the smooth parts of the four numbers are obliterated, except for the factors the smooth parts of v and x share, and the factors the smooth parts of w and y share. So we choose parents that share smooth parts to the greatest possible extent. To make this precise, write
g = gcd(smooth part of v, smooth part of x)
h = gcd(smooth part of w, smooth part of y)
[v,w], [x,y] = [g v/g, h w/h], [g x/g, h y/h].
The hard-won smooth factors g and h will be preserved into the next generation, but the smooth parts of v/g, w/h, x/g and y/h will be sacrificed in order to combine [v,w] and [x,y]. So we choose parents for which v/g, w/h, x/g and y/h have the smallest smooth parts. In this way we really do drive down the rough parts of our solutions to rx = sy mod N from one generation to the next.
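To make the smooth/rough bookkeeping concrete, here is a small Python sketch of the decomposition and of the shared factors g and h from the pairing rule above (the factor base {2, 3, 5, 7} and the names are illustrative only; a full selection rule would also penalise the smooth parts that get sacrificed, as just described):

from math import gcd

def smooth_rough_split(x, factor_base=(2, 3, 5, 7)):
    # Split x into (smooth part, rough part), e.g. 369 = 9 * 41 -> (9, 41).
    smooth = 1
    for p in factor_base:
        while x % p == 0:
            smooth *= p
            x //= p
    return smooth, x

def shared_smooth_factors(vw, xy, factor_base=(2, 3, 5, 7)):
    # g and h: the smooth factors that survive when parents [v, w] and [x, y]
    # are combined.
    (v, w), (x, y) = vw, xy
    g = gcd(smooth_rough_split(v, factor_base)[0],
            smooth_rough_split(x, factor_base)[0])
    h = gcd(smooth_rough_split(w, factor_base)[0],
            smooth_rough_split(y, factor_base)[0])
    return g, h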
On further thought the best way to make your way towards smooth coefficients x, y in the lattice ax = by mod N is with regression, not a genetic algorithm.
Two regressions are performed, one with response vector R0 consisting of x-values from randomly chosen solutions of ax = by mod N; and the other with response vector R1 consisting of y-values from the same solutions. Both regressions use the same explanatory matrix X. In X are columns consisting of the remainders of the x-values modulo smooth divisors, and other columns consisting of the remainders of the y-values modulo other smooth divisors.
The best choice of smooth divisors is the one that minimizes the errors from each regression:
E0 = R0 - X (inverse of (X-transpose)(X)) (X-transpose) (R0)
E1 = R1 - X (inverse of (X-transpose)(X)) (X-transpose) (R1)
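As a concrete sketch, the two error vectors can be computed with an ordinary least-squares fit (NumPy here; np.linalg.lstsq is just a numerically safer way of applying the X (X-transpose X)^{-1} X-transpose projection written above):

import numpy as np

def regression_errors(X, R0, R1):
    # E = R - X (X^T X)^{-1} X^T R: the residual after regressing each
    # response vector on the same explanatory matrix X.
    beta0 = np.linalg.lstsq(X, R0, rcond=None)[0]
    beta1 = np.linalg.lstsq(X, R1, rcond=None)[0]
    return R0 - X @ beta0, R1 - X @ beta1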
What follows is a sequence of row operations to annihilate X. Then apply a result z of these row operations to the x- and y-values from the original solutions from which X was formed.
z R0 = z R0 - 0
= z R0 - zX (inverse of (X-transpose)(X)) (X-transpose) (R0)
= z E0
Similarly, z R1 = z E1
Three properties are now combined in z R0 and z R1:
They are multiples of large smooth numbers, because z annihilates remainders modulo smooth numbers.
They are relatively small, since E0 and E1 are small.
Like any linear combination of solutions to ax = by mod N, z R0 and z R1 are themselves solutions to that equation.
A relatively small multiple of a large smooth number might just be the smooth number itself. Having a smooth solution of ax = by mod N yields an input to Dixon's method.
Two optimizations make this particularly fast:
There is no need to guess all the smooth numbers and columns of X at once. You can run regressions continuously, adding one column to X at a time, choosing columns that reduce E0 and E1 the most. At no time will any two smooth numbers with a common factor be selected.
You can also start with a lot of random solutions of ax = by mod N, and remove the ones with the largest errors between selections of new columns for X.


Resources