I have a positive definite covariance matrix C of size 3n x 3n constructed from n^2 blocks of size 3x3.
Running MvNormal (with e.g a zero mean vector) on this matrix to draw Gaussian random vectors, I am getting the error
PosDefException: matrix is not positive definite; Cholesky factorization failed.
and indeed checking isposef(C) returns false when n becomes too large. However my matrix should be positive definite for any n, so it seems that there is some kind of numerical instability (perhaps due to the determinant becoming too small or too large beyond machine precision).
The reproducible code I am using to generate C is below:
#######################################################
# inputs
grid_size=10
l_sq = 1
xmax = 2
#######################################################
# kernel function used to construct covariance matrix C
function corr(x, y, l_sq)
v=x-y
d_sq=sum(v.^2)
n = d_sq/(2l_sq)
return exp(-n)*(Matrix{Float64}(I, length(x), length(x)))
end
nb_grid_points = grid_size^3
gaussian_vector_dim = 3*nb_grid_points
oneD_grid = LinRange(-xmax, xmax, grid_size)
# get input set X which indexes grid points
threeD_grid = collect.(Iterators.product(oneD_grid, oneD_grid, oneD_grid))
grid_points = vec(reshape(threeD_grid,:,1))
########################################
# build C by blocks
C = Array{Float64}(undef, gaussian_vector_dim, gaussian_vector_dim)
for i in 1:nb_grid_points
for j in 1:nb_grid_points
#block covariance matrix C consist of DxD correlation-function matrices K_i,j for i,j=1,...,nb_grid_points
C[3*(i-1)+1:(3*i),(3*(j-1)+1):(3*j)] = corr(grid_points[i], grid_points[j],l_sq)
end
end
#########################################
# plot covariance matrix
plt.imshow(C,cmap="Blues", interpolation="none")
plt.colorbar()
plt.title("Covariance matrix")
#########################################
print("C is symmetric:",issymmetric(C))
print("\ndet C=",det(C))
print("\nC is positive definite=",isposdef(C))
Maintaining, l_sq = 1 , xmax = 2, the code above gives isposdef(C) = false when grid_size=10 but isposdef(C)=true if grid_size is 9 or less.
Why is this failure occurring and how can I fix it? Perhaps I can help Julia by indicating that the covariance matrix is sparse?
Related
Let N be a (linear) single-layer perceptron with weight matrix w of dimension nxn.
I want to train N under the Boolean constraint that the condition number k(w) of the weights w remain below a given threshold k_0 at each step of the optimisation.
Is there a standard way to implement this constraint (in pytorch, say)?
After each optimizer step, go through the list of parameters and recondition all matrices:
(code looked at for a few seconds, but not tested)
def recondition_(x, max_cond): # would need to be fixed for non-square x
u, s, vh = torch.linalg.svd(x)
curr_cond = s[0] / s[-1]
if curr_cond > max_cond:
ratio = curr_cond / max_cond
mult = torch.linspace(0, math.log(ratio), len(s)).exp()
s = mult * s
x[:] = torch.mm(u, torch.mm(torch.diag(s), vh))
Training loop:
...
optimizer.step()
with torch.no_grad():
for p in model.parameters():
if p.dim() == 2:
recondition_(p, max_cond)
...
I'm using a 64 bit LCG (MMIX (by Knuth)). It generate a certain block of random numbers inside my code, which use them to perform some operations. My code works in single core and I would like to parallelize the work to reduce the execution time.
Before start thinking to more advanced methods in this sense I'd like to simply execute more identical codes in parallel (in fact the code repeats the same task over a certain numbers of indipendent simulation, so I can simply split the number of simulation between more identical codes and run them in parallel).
My only problem now is to find a seed for each code; in particular, to avoid the possibility of unwanted non trivial correlation between data generated in different codes, I have to be sure that the random number generated in the various codes don't overlap. To do so, starting from a certain seed in the first code I have to find a way to find a value (the next seed) very distant not in absolute value but in the pseudo-random sequence (so, such that, to go from the first to the second seed, I need a huge number of steps of LCG).
My first attempt was this:
starting from the LCG relation between 2 consecutive numbers generated in the sequence
So, in principle, I could calculate the above relation with, say, n = 2^40 and I_0 equal to the value of the first seed, and obtain a new seed distant 2^40 steps in the random CLG sequence from the first one.
The problem is that, doing so, I necessary go in overflow calculating a^n. In fact for MMIX (by Knuth) a~2^62 and i use unsigned long long int (64 bit). Note that the only problem here is the fraction in the above relation. If there only were sum and multiplication I could ignore the overflow problem due to the following modular properties (in fact I'm using 2^64 as c (64 bit generator)):
So, starting from a certain value (first seed), how can I find a second one distant a huge number of step in the LC pseudo-random sequence?
[EDIT]
r3mainer solution is perfectly suited for python codes. I'm trying now to implement it in c using unsigned __int128 variables. I have only one problem: in principle I should compute:
Say, for simplicity, I want to compute:
with n = 2^40 and c(a-1)~2^126. I proceed with a cycle.Starting with temp = a, in each iteration I compute temp = temp*temp, then I compute temp%c(a-1). The problem is in the second step (temp = temp*temp). temp in fact could be, in principle any number < c(a-1)~2^126. If temp is a big number, say > 2^64, I'll go in overflow, reaching 2^128 - 1, before the next module operation. So is there a way to avoid it? For now the only solution I see is to perform each multiplication with a loop over bit, as suggested here: c code: prevent overflow in modular operation with huge modules (modules near the overflow treshold)
Is there another way to perform module operation during the multiplication?
(note that being c = 2^64, with mod(c) operation I don't have the same problem because the overflow point (for ull int variables) coincides with the module)
Any LCG of the form x[n+1] = (x[n] * a + c) % m can be skipped to an arbitrary position very quickly.
Starting with a seed value of zero, the first few iterations of the LCG will give you this sequence:
x₀ = 0
x₁ = c % m
x₂ = (c(a + 1)) % m
x₃ = (c(a² + a + 1)) % m
x₄ = (c(a³ + a² + a + 1)) % m
It's pretty easy to see that each term is actually the sum of a geometric series, which can be calculated with a simple formula:
x_n = (c(a^{n-1} + a^{n-2} + ... + a + 1)) % m
= (c * (a^n - 1) / (a - 1)) % m
The (a^n - 1) term can be calculated quickly by modular exponentiation, but dividing by (a-1) is a bit tricky because (a-1) and m are both even (i.e., not coprime), so we can't calculate the modular multiplicative inverse of (a-1) mod m directly.
Instead, calculate (a^n-1) mod m*(a-1), then perform a straightforward (non-modular) division of the result by a-1. In Python, the calculation would go something like this:
def lcg_skip(m, a, c, n):
# Calculate nth term of LCG sequence with parameters m (modulus),
# a (multiplier) and c (increment), assuming an initial seed of zero
a1 = a - 1
t = pow(a, n, m * a1) - 1
t = (t * c // a1) % m
return t
def test(nsteps):
m = 2**64
a = 6364136223846793005
c = 1442695040888963407
#
print("Calculating by brute force:")
seed = 0
for i in range(nsteps):
seed = (seed * a + c) % m
print(seed)
#
print("Calculating by fast method:")
# Calculate nth term by modular exponentiation
print(lcg_skip(m, a, c, nsteps))
test(1000000)
So to create LCGs with non-overlapping output sequences, all you would need to do is use initial seed values generated by lcg_skip() with values of n that are far enough apart.
Well, for LCG it is known property to jump forward and backward in O(log2(N)) time where N is the distance between jump points, paper by F. Brown, "Random Number Generation with Arbitrary Stride," Trans. Am. Nucl. Soc. (Nov. 1994).
It means if you have LCG parameters (a, c) satisfying Hull–Dobell Theorem, then whole period would be 264 numbers before repeating themself, and say for Nt number pf threads you make jump distance of 264 / Nt, and all threads start with the same seed and just jump after initializing LCG by (264 / Nt)*threadId, and you would be completely safe from RNG correlations due to sequences overlap.
For simplest case of common 64 unsigned modulo math, as implemented in NumPy, code below should work fine
import numpy as np
class LCG(object):
UZERO: np.uint64 = np.uint64(0)
UONE : np.uint64 = np.uint64(1)
def __init__(self, seed: np.uint64, a: np.uint64, c: np.uint64) -> None:
self._seed: np.uint64 = np.uint64(seed)
self._a : np.uint64 = np.uint64(a)
self._c : np.uint64 = np.uint64(c)
def next(self) -> np.uint64:
self._seed = self._a * self._seed + self._c
return self._seed
def seed(self) -> np.uint64:
return self._seed
def set_seed(self, seed: np.uint64) -> np.uint64:
self._seed = seed
def skip(self, ns: np.int64) -> None:
"""
Signed argument - skip forward as well as backward
The algorithm here to determine the parameters used to skip ahead is
described in the paper F. Brown, "Random Number Generation with Arbitrary Stride,"
Trans. Am. Nucl. Soc. (Nov. 1994). This algorithm is able to skip ahead in
O(log2(N)) operations instead of O(N). It computes parameters
A and C which can then be used to find x_N = A*x_0 + C mod 2^M.
"""
nskip: np.uint64 = np.uint64(ns)
a: np.uint64 = self._a
c: np.uint64 = self._c
a_next: np.uint64 = LCG.UONE
c_next: np.uint64 = LCG.UZERO
while nskip > LCG.UZERO:
if (nskip & LCG.UONE) != LCG.UZERO:
a_next = a_next * a
c_next = c_next * a + c
c = (a + LCG.UONE) * c
a = a * a
nskip = nskip >> LCG.UONE
self._seed = a_next * self._seed + c_next
#%%
np.seterr(over='ignore')
seed = np.uint64(1)
rng64 = LCG(seed, np.uint64(6364136223846793005), np.uint64(1))
print(rng64.next())
print(rng64.next())
print(rng64.next())
#%%
rng64.skip(-3) # back by 3
print(rng64.next())
print(rng64.next())
print(rng64.next())
rng64.skip(-3) # back by 3
rng64.skip(2) # forward by 2
print(rng64.next())
Tested in Python 3.9.1, x64 Win 10
I am writing a function that checks if a matrix X is positive semidefinite with a given rank k. To do this, I compute the eigenvalues of X, and I check that exactly k of them are positive and the rest are 0. Here's what I have so far:
using LinearAlgebra
function ispossemdef(X::AbstractMatrix, k::Int, ϵ::Real = 1e-10)
n = size(X, 1) # dim of X
!issymmetric(X) && return false # short-circuit if X is asymmetric
k > n && error("k > n") # throw error if k > n
eigs = eigvals(X) # eigenvalues of X in ascending order
z = eigs[1:(n - k)] # the values that should be zero
p = eigs[(n - k + 1):end] # the values that should be positive
n_minus_k_zero_eigenvalues = norm(z) < ϵ
k_positive_eigenvalues = all(p .> ϵ)
return n_minus_k_zero_eigenvalues & k_positive_eigenvalues
end
Is there a better algorithm for doing this? Better might mean faster (avoids computing the eigenvalues), or more numerically stable (lets me get away with a stricter error tolerance).
For example, the isposdef function (which is the k = n special case of what I'm doing) works by attempting to compute the Cholesky factor of X, and reporting back with whether or not it could. Can I generalize this procedure to semidefinite matrices? If so, is it better than checking the eigenvalues?
It will not work on all matrices, but have you looked at
using LinearAlgebra # for julia 1+
help> isposdef
at the isposdef() function?
Motivation:
In writing out a matrix operation that was to be performed over tens of thousands of vectors I kept coming across the warning:
Requested 200000x200000 (298.0GB) array exceeds maximum array size
preference. Creation of arrays greater than this limit may take a long
time and cause MATLAB to become unresponsive. See array size limit or
preference panel for more information.
The reason for this was my use of diag() to get the values down the diagonal of an matrix inner product. Because MATLAB is generally optimized for vector/matrix operations, when I first write code, I usually go for the vectorized form. In this case, however, MATLAB has to build the entire matrix in order to get the diagonal which causes the memory and speed issues.
Experiment:
I decided to test the use of diag() vs a for loop to see if at any point it was more efficient to use diag():
num = 200000; % Matrix dimension
x = ones(num, 1);
y = 2 * ones(num, 1);
% z = diag(x*y'); % Expression to solve
% Loop approach
tic
z = zeros(num,1);
for i = 1 : num
z(i) = x(i)*y(i);
end
toc
% Dividing the too-large matrix into process-able chunks
fraction = [10, 20, 50, 100, 500, 1000, 5000, 10000, 20000];
time = zeros(size(fraction));
for k = 1 : length(fraction)
f = fraction(k);
% Operation to time
tic
z = zeros(num,1);
for i = 1 : k
first = (i-1) * (num / f);
last = first + (num / f);
z(first + 1 : last) = diag(x(first + 1: last) * y(first + 1 : last)');
end
time(k) = toc;
end
% Plot results
figure;
hold on
plot(log10(fraction), log10(chunkTime));
plot(log10(fraction), repmat(log10(loopTime), 1, length(fraction)));
plot(log10(fraction), log10(chunkTime), 'g*'); % Plot points along time
legend('Partioned Running Time', 'Loop Running Time');
xlabel('Log_{10}(Fractional Size)'), ylabel('Log_{10}(Running Time)'), title('Running Time Comparison');
This is the result of the test:
(NOTE: The red line represents the loop time as a threshold--it's not to say that the total loop time is constant regardless of the number of loops)
From the graph it is clear that it takes breaking the operations down into roughly 200x200 square matrices to be faster to use diag than to perform the same operation using loops.
Question:
Can someone explain why I'm seeing these results? Also, I would think that with MATLAB's ever-more optimized design, there would be built-in handling of these massive matrices within a diag() function call. For example, it could just perform the i = j indexed operations. Is there a particular reason why this might be prohibitive?
I also haven't really thought of memory implications for diag using the partition method, although it's clear that as the partition size decreases, memory requirements drop.
Test of speed of diag vs. a loop.
Initialization:
n = 10000;
M = randn(n, n); %create a random matrix.
Test speed of diag:
tic;
d = diag(M);
toc;
Test speed of loop:
tic;
d = zeros(n, 1);
for i=1:n
d(i) = M(i,i);
end;
toc;
This would test diag. Your code is not a clean test of diag...
Comment on where there might be confusion
Diag only extracts the diagonal of a matrix. If x and y are vectors, and you do d = diag(x * y'), MATLAB first constructs the n by n matrix x*y' and calls diag on that. This is why, you get the error, "cannot construct 290GB matrix..." Matlab interpreter does not optimize in a crazy way, realize you only want the diagonal and construct just a vector (rather than full matrix with x*y', that does not happen.
Not sure if you're asking this, but the fastest way to calculate d = diag(x*y') where x and y are n by 1 vectors would simply be: d = x.*y
Statement of Problem:
I have an array M with m rows and n columns. The array M is filled with non-zero elements.
I also have a vector t with n elements, and a vector omega
with m elements.
The elements of t correspond to the columns of matrix M.
The elements of omega correspond to the rows of matrix M.
Goal of Algorithm:
Define chi as the multiplication of vector t and omega. I need to obtain a 1D vector a, where each element of a is a function of chi.
Each element of chi is unique (i.e. every element is different).
Using mathematics notation, this can be expressed as a(chi)
Each element of vector a corresponds to an element or elements of M.
Matlab code:
Here is a code snippet showing how the vectors t and omega are generated. The matrix M is pre-existing.
[m,n] = size(M);
t = linspace(0,5,n);
omega = linspace(0,628,m);
Conceptual Diagram:
This appears to be a type of integration (if this is the right word for it) along constant chi.
Reference:
Link to reference
The algorithm is not explicitly stated in the reference. I only wish that this algorithm was described in a manner reminiscent of computer science textbooks!
Looking at Figure 11.5, the matrix M is Figure 11.5(a). The goal is to find an algorithm to convert Figure 11.5(a) into 11.5(b).
It appears that the algorithm is a type of integration (averaging, perhaps?) along constant chi.
It appears to me that reshape is the matlab function you need to use. As noted in the link:
B = reshape(A,siz) returns an n-dimensional array with the same elements as A, but reshaped to siz, a vector representing the dimensions of the reshaped array.
That is, create a vector siz with the number m*n in it, and say A = reshape(P,siz), where P is the product of vectors t and ω; or perhaps say something like A = reshape(t*ω,[m*n]). (I don't have matlab here, or would run a test to see if I have the product the right way around.) Note, the link does not show an example with one number (instead of several) after the matrix parameter to reshape, but I would expect from the description that A = reshape(t*ω,m*n) might also work.
You should add a pseudocode or a link to the algorithm you want to implement. From what I could understood I have developed the following code anyway:
M = [1 2 3 4; 5 6 7 8; 9 10 11 12]' % easy test M matrix
a = reshape(M, prod(size(M)), 1) % convert M to vector 'a' with reshape command
[m,n] = size(M); % Your sample code
t = linspace(0,5,n); % Your sample code
omega = linspace(0,628,m); % Your sample code
for i=1:length(t)
for j=1:length(omega) % Acces a(chi) in the desired order
chi = length(omega)*(i-1)+j;
t(i) % related t value
omega(j) % related omega value
a(chi) % related a(chi) value
end
end
As you can see, I also think that the reshape() function is the solution to your problems. I hope that this code helps,
The basic idea is to use two separate loops. The outer loop is over the chi variable values, whereas the inner loop is over the i variable values. Referring to the above diagram in the original question, the i variable corresponds to the x-axis (time), and the j variable corresponds to the y-axis (frequency). Assuming that the chi, i, and j variables can take on any real number, bilinear interpolation is then used to find an amplitude corresponding to an element in matrix M. The integration is just an averaging over elements of M.
The following code snippet provides an overview of the basic algorithm to express elements of a matrix as a vector using the spectral collapsing from 2D to 1D. I can't find any reference for this, but it is a solution that works for me.
% Amp = amplitude vector corresponding to Figure 11.5(b) in book reference
% M = matrix corresponding to the absolute value of the complex Gabor transform
% matrix in Figure 11.5(a) in book reference
% Nchi = number of chi in chi vector
% prod = product of timestep and frequency step
% dt = time step
% domega = frequency step
% omega_max = maximum angular frequency
% i = time array element along x-axis
% j = frequency array element along y-axis
% current_i = current time array element in loop
% current_j = current frequency array element in loop
% Nchi = number of chi
% Nivar = number of i variables
% ivar = i variable vector
% calculate for chi = 0, which only occurs when
% t = 0 and omega = 0, at i = 1
av0 = mean( M(1,:) );
av1 = mean( M(2:end,1) );
av2 = mean( [av0 av1] );
Amp(1) = av2;
% av_val holds the sum of all values that have been averaged
av_val_sum = 0;
% loop for rest of chi
for ccnt = 2:Nchi % 2:Nchi
av_val_sum = 0; % reset av_val_sum
current_chi = chi( ccnt ); % current value of chi
% loop over i vector
for icnt = 1:Nivar % 1:Nivar
current_i = ivar( icnt );
current_j = (current_chi / (prod * (current_i - 1))) + 1;
current_t = dt * (current_i - 1);
current_omega = domega * (current_j - 1);
% values out of range
if(current_omega > omega_max)
continue;
end
% use bilinear interpolation to find an amplitude
% at current_t and current_omega from matrix M
% f_x_y is the bilinear interpolated amplitude
% Insert bilinear interpolation code here
% add to running sum
av_val_sum = av_val_sum + f_x_y;
end % icnt loop
% compute the average over all i
av = av_val_sum / Nivar;
% assign the average to Amp
Amp(ccnt) = av;
end % ccnt loop