PyMC: What's the best way to create an array with a random number of random variables in PyMC?

Say you have a hierarchical model in which N is given by a Poisson distribution (N ~ Poisson(mu=lambda); prob(N=n)=Poisson(x=n; mu=lambda)), or any other distribution, it doesn't matter. And then you want to create an array with N random variables, say, exponentially distributed: X = (X_1, X_2, ..., X_n) and X_i ~ Exponential(beta=a_i). What's the best way to do this? (Note that both the dimension and the components are random.)
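The question is about PyMC, but independent of any particular PyMC API the generative process itself can be forward-sampled in a few lines of NumPy. This is only a sketch of the model being described, not a PyMC answer; `lam` and `beta` are illustrative values, not from the question:

```python
import numpy as np

# Forward-sample the generative process: N ~ Poisson(lam),
# then X_i ~ Exponential(beta) for i = 1..N.
rng = np.random.default_rng(42)
lam, beta = 4.0, 2.0

n = rng.poisson(lam)                          # random dimension N
x = rng.exponential(scale=1.0 / beta, size=n) # N random components
```

Note that both the length of `x` and its entries change from draw to draw, which is exactly what makes the array awkward to declare up front in a PyMC model.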

Related

Algorithm For Partition An Array into Subsets With Minimum Total Variance

I have an array of floating point numbers and I would like to partition the array into two subsets, such that their total variance is minimized.
The total variance is defined in the following:
var = (var_1 * n_1 + var_2 * n_2)/(n_1 + n_2)
where n_1 and n_2 are number of elements on the left/right respectively, and var_1 and var_2 are variance on the left/right respectively.
My question is: is there an efficient algorithm for finding the global minimum of the total variance? The algorithm should output the two subsets, each containing the elements of the corresponding group.
Moreover, suppose each element is a tuple (x, y), and instead of the variance I would like to minimize the total covariance of the left and right, defined analogously. Is there a general algorithm for dealing with such partition problems? I suspect this case is harder, because all the algorithms I can think of require sorting the array, and there is no obvious comparator for sorting tuples here.
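For the one-dimensional version, the objective (var_1*n_1 + var_2*n_2)/(n_1+n_2) is the within-cluster sum of squares divided by n, i.e. 1D 2-means, and for that problem an optimal two-way partition is always contiguous once the array is sorted. A workable sketch (the function name is invented) therefore just scans every split point of the sorted array:

```python
import numpy as np

def best_split(a):
    # For this weighted objective the optimal two-way partition is
    # contiguous in sorted order (it is 1D 2-means), so it suffices
    # to scan every split point of the sorted array.
    a = np.sort(np.asarray(a, dtype=float))
    n = len(a)
    best_cost, best_parts = float("inf"), None
    for i in range(1, n):
        left, right = a[:i], a[i:]
        cost = left.var() * i + right.var() * (n - i)
        if cost < best_cost:
            best_cost, best_parts = cost, (left.tolist(), right.tolist())
    return best_cost / n, best_parts
```

This is O(n log n) for the sort plus O(n) split evaluations if the variances are updated incrementally (the sketch above recomputes them for clarity, giving O(n^2)).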

minimize variance of k integers from n ordered integers

Given a series of n integers and a number k (n > k), what is the way to minimize the variance of k new integers? You may add up any successive integers into a new integer and thus reduce the n integers to k integers.
Here is an example. Given n=4, k=2, the series of integers are 4,4,1,1. The solution is 4,6 instead of 8,2 or 9,1.
I have come up with a greedy algorithm which goes like this: for every possible new integer, minimize the absolute difference between that integer and the average of all the integers. But this doesn't work in some cases. Is there an efficient algorithm that does?
The variance of a random variable X is E[(X - E[X])^2]. Here X is a random element of the output list. We know that E[X] is equal to the sum of the input numbers divided by k, so this objective is equivalent to the sum of (x - sum/k)^2 over output values x. This can be accomplished by slightly modifying a word wrap algorithm: Word wrap to X lines instead of maximum width (Least raggedness)
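Following that reduction, the problem is a dynamic program over contiguous blocks with block cost (block_sum - total/k)^2. A sketch (the function name is invented; O(k·n^2) time):

```python
def min_variance_partition(xs, k):
    # dp[j][i]: best cost splitting the first i elements into j blocks,
    # where a block's cost is (block_sum - total/k)**2.
    n, total = len(xs), sum(xs)
    mean = total / k
    prefix = [0] * (n + 1)
    for i, v in enumerate(xs):
        prefix[i + 1] = prefix[i] + v
    INF = float("inf")
    dp = [[INF] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    dp[0][0] = 0.0
    for j in range(1, k + 1):
        for i in range(j, n + 1):
            for p in range(j - 1, i):
                c = dp[j - 1][p] + (prefix[i] - prefix[p] - mean) ** 2
                if c < dp[j][i]:
                    dp[j][i], back[j][i] = c, p
    # walk the back-pointers to recover the k block sums
    sums, i = [], n
    for j in range(k, 0, -1):
        p = back[j][i]
        sums.append(prefix[i] - prefix[p])
        i = p
    return sums[::-1]
```

On the example from the question, n=4, k=2 and input 4,4,1,1, this picks the split 4 | 4+1+1, i.e. the sums 4 and 6.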

Efficient way to take determinant of an n! x n! matrix in Maple

I have a large matrix, n! x n!, for which I need to take the determinant. For each permutation of n, I associate
a vector of length 2n (this is easy computationally)
a polynomial in 2n variables (a product of linear factors computed recursively on n)
The matrix is the evaluation matrix for the polynomials at the vectors (thought of as points). So the sigma,tau entry of the matrix (indexed by permutations) is the polynomial for sigma evaluated at the vector for tau.
Example: For n=3, if the ith polynomial is (x1 - 4)(x3 - 5)(x4 - 4)(x6 - 1) and the jth point is (2,2,1,3,5,2), then the (i,j)th entry of the matrix will be (2 - 4)(1 - 5)(3 - 4)(2 - 1) = -8. Here n=3, so the points are in R^(3!) = R^6 and the polynomials have 3!=6 variables.
My goal is to determine whether or not the matrix is nonsingular.
My approach right now is this:
the function point takes a permutation and outputs a vector
the function poly takes a permutation and outputs a polynomial
the function nextPerm gives the next permutation in lexicographic order
The abridged pseudocode version of my code is this:
B := [];
P := [];
w := [1, 2, ..., n];
while w <> NULL do
  B := [op(B), poly(w)];   # append the polynomial for w
  P := [op(P), point(w)];  # append the point for w
  w := nextPerm(w);
od;
# build the evaluation matrix in Maple
M := Matrix(n!, (i, j) -> eval(B[i], P[j]));
# compute the determinant in Maple
det := LinearAlgebra[Determinant](M);
# report whether the matrix is nonsingular
if det = 0 then return false;
else return true; fi;
I'm working in Maple using the built in function LinearAlgebra[Determinant], but everything else is a custom built function that uses low level Maple functions (e.g. seq, convert and cat).
My problem is that this takes too long, meaning I can go up to n=7 with patience, but getting n=8 takes days. Ideally, I want to be able to get to n=10.
Does anyone have an idea for how I could improve the time? I'm open to working in a different language, e.g. Matlab or C, but would prefer to find a way to speed this up within Maple.
I realize this might be hard to answer without all the gory details, but the code for each function, e.g. point and poly, is already optimized, so the real question here is if there is a faster way to take a determinant by building the matrix on the fly, or something like that.
UPDATE: Here are two ideas that I've toyed with that don't work:
I can store the polynomials (since they take a while to compute, I don't want to redo that if I can help it) in a vector of length n!, compute the points on the fly, and plug these values into the permutation (Leibniz) formula for the determinant: det(M) = sum over permutations sigma of sgn(sigma) * M[1, sigma(1)] * ... * M[n!, sigma(n!)].
The problem here is that the permutation formula is O(N!) in the size N of the matrix, so for my case this will be O((n!)!). When n=10, (n!)! = 3,628,800!, which is way too big to even consider.
Compute the determinant using the LU decomposition. Luckily, the main diagonal of my matrix is nonzero, so this is feasible. Since this is O(N^3) in the size of the matrix, that becomes O((n!)^3) which is much closer to doable. The problem, though, is that it requires me to store the whole matrix, which puts serious strain on memory, nevermind the run time. So this doesn't work either, at least not without a bit more cleverness. Any ideas?
It isn't clear to me whether your problem is space or time; obviously the two trade off against each other. If you only wish to know whether the determinant is zero or not, then you should definitely go with LU decomposition. The reason is that if A = LU with L lower triangular and U upper triangular, then
det(A) = det(L) det(U) = l_11 * ... * l_nn * u_11 * ... * u_nn
so you only need to determine if any of the main diagonal entries of L or U is 0.
To simplify further, use Doolittle's algorithm, where l_ii = 1. If at any point the algorithm breaks down, the matrix is singular so you can stop. Here's the gist:
for k := 1, 2, ..., n do {
    for j := k, k+1, ..., n do {
        u_kj := a_kj - sum_{s=1..k-1} l_ks * u_sj;
    }
    for i := k+1, k+2, ..., n do {
        l_ik := (a_ik - sum_{s=1..k-1} l_is * u_sk) / u_kk;
    }
}
The key is that you can compute the kth row of U and the kth column of L at the same time, and you only need the previously computed rows and columns to move forward. Since you can compute the entries a_ij as needed, you never have to hold the full matrix A, only the rows of U and columns of L produced so far. The full algorithm takes O(n^3) time. You might be able to find a few more tricks, but that depends on your space/time trade-off.
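A hedged Python illustration of the gist above (dense storage here for clarity; in the real setting the entries `a[i][j]` would be generated on demand, and the function name is invented):

```python
def doolittle_singular(a, tol=1e-12):
    # Doolittle LU without pivoting: L has unit diagonal, and a
    # (near-)zero pivot u_kk means the factorization breaks down,
    # so we report the matrix as singular and stop early.
    n = len(a)
    U = [[0.0] * n for _ in range(n)]
    L = [[0.0] * n for _ in range(n)]
    for k in range(n):
        L[k][k] = 1.0
        for j in range(k, n):  # kth row of U
            U[k][j] = a[k][j] - sum(L[k][s] * U[s][j] for s in range(k))
        if abs(U[k][k]) < tol:
            return True  # zero pivot: singular (for this ordering)
        for i in range(k + 1, n):  # kth column of L
            L[i][k] = (a[i][k] - sum(L[i][s] * U[s][k] for s in range(k))) / U[k][k]
    return False
```

Note that without pivoting a zero pivot only proves singularity when, as the asker states, the diagonal structure cooperates; in general one would pivot.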
Not sure if I've followed your problem; is it (or does it reduce to) the following?
You have two vectors of n numbers, call them x and c, then the matrix element is product over k of (x_k+c_k), with each row/column corresponding to distinct orderings of x and c?
If so, then I believe the matrix will be singular whenever there are repeated values in either x or c, since the matrix will then have repeated rows/columns. Try a bunch of Monte Carlo runs on a smaller n with distinct values of x and c to see whether that case is non-singular in general; if it's true for 6, it's quite likely true for 10.
As far as brute force goes, of your two methods:
1. The permutation formula is a non-starter.
2. LU decomposition will work much more quickly (it should be a few seconds for n=7), though instead of LU you might want to try SVD, which will do a much better job of telling you how well-behaved your matrix is.
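For the SVD suggestion, a small NumPy sketch (the function name and tolerance are arbitrary choices): the ratio of smallest to largest singular value is a scale-free measure of how close the matrix is to singular.

```python
import numpy as np

def near_singular(M, tol=1e-10):
    # s is sorted in descending order; s[-1]/s[0] is the reciprocal
    # of the 2-norm condition number of M.
    s = np.linalg.svd(M, compute_uv=False)
    return bool(s[-1] / s[0] < tol)
```

Unlike an exact determinant, this also tells you *how* ill-conditioned the matrix is, which matters once the entries are floating-point evaluations of polynomials.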

Sum Combination List

I need an algorithm for this problem:
Given a set of n natural numbers x1, x2, ..., xn and two numbers S and k, form sums of k numbers picked from the set (a number can be picked many times) that equal S.
Stated differently: list every such combination for S. Bounds: n <= 256, x <= 1000, k <= 32.
E.g.
problem instance: {1,2,5,9,11,12,14,15}, S=30, k=3
There are 4 possible combinations
S = 1+14+15, 2+14+14, 5+11+14, 9+9+12.
With these bounds it is infeasible to use brute force, but I think dynamic programming is a good approach.
The scheme is: Table t, with t[m,v] = number of combinations of sum v formed by m numbers.
1. Initialize t[1,x(i)], for every i.
2. Then use formula t[m,v]=Sum(t[m-1,v-x(i)], every i satisfied v-x(i)>0), 2<=m<=k.
3. After obtaining t[k,S], I can trace back to find all the combinations.
The dilemma is that t[m,v] is inflated by commutative duplicates, e.g. t[2,16] = 2 because 16 = 15+1 and 1+15. Consequently the final count t[3,30] is too large, counting 1+14+15, 1+15+14, ..., 2+14+14, 14+2+14, ...
How to get rid of symmetric permutations? Thanks in advance.
You can get rid of permutations by imposing an ordering on the way you pick elements of x. Make your table a triple t[m, v, n] = number of combinations of sum v formed by m numbers from x1..xn. Now observe t[m, v, n] = t[m, v, n-1] + t[m-1, v-x_n, n]. This solves the permutation problem by only generating summands in reverse order from their appearance in x. So for instance it'll generate 15+14+1 and 14+14+2 but never 14+15+1.
(You probably don't need to fill out the whole table, so you should probably compute lazily; in fact, a memoized recursive function is probably what you want here.)
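A memoized version of that recurrence might look like the following sketch (the function name is invented):

```python
from functools import lru_cache

def count_combinations(xs, S, k):
    # t(m, v, n): number of multisets of m numbers drawn from
    # xs[0..n-1] summing to v, counted once each via the recurrence
    # t(m, v, n) = t(m, v, n-1) + t(m-1, v - xs[n-1], n).
    @lru_cache(maxsize=None)
    def t(m, v, n):
        if m == 0:
            return 1 if v == 0 else 0
        if n == 0 or v < 0:
            return 0
        # either never use xs[n-1], or use it (possibly again)
        return t(m, v, n - 1) + t(m - 1, v - xs[n - 1], n)

    return t(k, S, len(xs))
```

On the question's instance ({1,2,5,9,11,12,14,15}, S=30, k=3) this counts the 4 combinations exactly once each; tracing back through the memo table recovers the combinations themselves.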

partition a sequence of 2n real numbers so that

I'm currently reading The Algorithm Design Manual and I'm stuck on this exercise.
Take a sequence of 2n real numbers as input. Design an O(n log n) algorithm that
partitions the numbers into n pairs, with the property that the partition minimizes
the maximum sum of a pair. For example, say we are given the numbers (1,3,5,9).
The possible partitions are ((1,3),(5,9)), ((1,5),(3,9)), and ((1,9),(3,5)). The pair
sums for these partitions are (4,14), (6,12), and (10,8). Thus the third partition has
10 as its maximum sum, which is the minimum over the three partitions.
My guess from some examples is that the solution looks like this:
# in pseudo Ruby
a = [1, 3, 5, 9].sort  # the input must be sorted
pairs = []
until a.empty?
  pairs << [a.shift, a.pop]  # pair the smallest with the largest
end
pairs
But how to prove it?
The algorithm works because when x_0, x_1, ..., x_{2n-1} is the sorted list, there is always an optimal solution that contains (x_0, x_{2n-1}).
Proof:
Consider any optimal solution which does not contain (x_0, x_{2n-1}). It must contain pairs (x_0, x_a) and (x_b, x_{2n-1}) with x_0 ≤ x_a ≤ x_{2n-1} and x_0 ≤ x_b ≤ x_{2n-1}. Remove those pairs from the solution, and in their place put (x_0, x_{2n-1}) and (x_a, x_b). Could the presence of either new pair have "damaged" the solution? The pair (x_0, x_{2n-1}) could not have, since its sum is less than or equal to the sum of (x_b, x_{2n-1}), which was a member of the original, optimal solution. Neither could (x_a, x_b) have caused damage, since its sum is also less than or equal to the sum of (x_b, x_{2n-1}), which was a member of the same solution. We have constructed an optimal solution which does contain (x_0, x_{2n-1}).
Thus the algorithm you give never forecloses the possibility of finding an optimal solution at any step, and when there are only two values left to pair they must be paired together.
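The greedy pairing itself is a few lines in Python, mirroring the pseudo-Ruby above (the function name is invented):

```python
def min_max_pair_sum(nums):
    # Sort, then repeatedly pair the smallest remaining number
    # with the largest remaining one; O(n log n) overall.
    s = sorted(nums)
    pairs = [(s[i], s[-1 - i]) for i in range(len(s) // 2)]
    return pairs, max(a + b for a, b in pairs)
```

On the book's example (1, 3, 5, 9) this yields the pairs (1, 9) and (3, 5) with maximum pair sum 10, matching the worked answer.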
Given the input array
x_0 x_1 x_2 ... x_{2n-1}
use merge sort to sort it in O(n log n) time, producing a sorted list
x_{a_0} x_{a_1} ... x_{a_{2n-1}}
where the indices a_i record the permutation performed on the initial list to obtain the sorted one.
I claim that the pairing producing the minimum maximum sum over all pairings is the following:
(x_{a_i}, x_{a_{2n-1-i}}) for i = 0, 1, ..., n-1. That is, you group the highest valued number with the lowest valued number available.
Proof
Proceed by induction; the case 2n = 2 is obvious.
Without loss of generality assume the input is a sorted list (if it is not, sort it first):
x_0 x_1 x_2 ... x_{2n-1}.
Consider pairing x_{2n-1} with any number; clearly the minimum max sum for this pairing is achieved with (x_{2n-1}, x_0).
Now consider the pairing of x_{2n-2}: either (x_{2n-1}, x_0), (x_{2n-2}, x_1) produces the minimum max sum, or (x_{2n-1}, x_1), (x_{2n-2}, x_0) does, or both do. In the last case our choice doesn't matter, and the middle case is impossible, since its maximum is at least as large. In general, proceeding inductively, when we are looking for a pair for x_{2n-1-k}, the value x_k is the lowest unused value available; suppose instead we paired x_k with some larger x_{2n-1-j}, j < k, to try to get a lower maximum sum. This cannot help, since x_{2n-1-j} + x_k ≥ x_{2n-1-k} + x_k, so at best we achieve the same maximum sum.
This means that choosing (x_{2n-1-k}, x_k) gives the minimum pairing.
I think I can prove this for a sequence with no duplicate numbers; extending the proof to non-unique sequences should be a reasonably simple exercise for the reader.
Pair x_0 and x_{2n-1} together, then pair all other numbers according to an optimal solution.
Now compare the pair (x_0, x_{2n-1}) against any other pair (x_y, x_z) from that optimal subset. x_{2n-1} plus either x_y or x_z is at least x_y + x_z and also at least x_{2n-1} + x_0, so swapping partners cannot lower the maximum; therefore pairing x_{2n-1} with x_0 was optimal.
The proof extends by induction to the pairing of x_1 with x_{2n-2}, and onward through the remaining subsets, eventually producing the OP's pairing.
