C++ Eigen: A.inverse()*B not equal to A.ldlt().solve(B)

I would like to compute the trace of the product of two given matrices, say A and B, Trace(AInv * B) where * is the regular matrix product, AInv is the inverse of A (being symmetric and positive definite) and B is symmetric.
Solution 1: computing the inverse explicitly
Noting that Trace(AInv * B) is equivalent to taking the sum of the componentwise product of AInv and B:
double sol1 = (A.inverse().cwiseProduct(B)).sum();
Solution 2: using ldlt decomposition from the Eigen library
double sol2 = (A.selfadjointView<Lower>().ldlt().solve(B)).trace();
Theoretically, these solutions should be the same, but in my test they aren't, so it sounds like I am missing something. Since .ldlt().solve() is not made to compute a matrix inverse but rather to solve a linear system, my question is: does .ldlt() perform any sort of normalization? If not, what am I doing wrong?
Many thanks!

The statement computing sol1 is wrong: you need to either transpose one of the operands or use a true matrix-matrix product. Correct versions:
double sol1 = (A.inverse().cwiseProduct(B.transpose())).sum();
double sol1 = (A.inverse().lazyProduct(B)).diagonal().sum();
double sol1 = (A.inverse().lazyProduct(B)).trace();
double sol1 = (A.inverse() * B).diagonal().sum();
double sol1 = (A.inverse() * B).trace();
Note that, in Eigen, when you write (A*B).diagonal(), only the diagonal elements of A*B are computed, not the off-diagonal ones.
In general, it is not recommended to explicitly compute the inverse of a matrix: using either A.lu().solve(B) or A.ldlt().solve(B) will give you more accurate results and will be faster too, because unless A is very small (2x2, 3x3, 4x4), A.inverse() is equivalent to A.lu().solve(I). In the future, Eigen will very likely rewrite expressions like:
A.inverse() * B
as:
A.lu().solve(B)
for you anyway.
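The identity behind sol1 is easy to cross-check numerically outside of Eigen. Here is a quick NumPy sketch (my addition; sizes and values are illustrative): trace(inv(A) @ B) equals the sum of the componentwise product of inv(A) and the transpose of B.

import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5*np.eye(5)          # symmetric positive definite
B = rng.standard_normal((5, 5))    # deliberately NOT symmetric

Ainv = np.linalg.inv(A)
t_ref = np.trace(Ainv @ B)
assert np.allclose(t_ref, (Ainv * B.T).sum())           # needs the transpose
assert np.allclose(t_ref, np.trace(np.linalg.solve(A, B)))

When B is exactly symmetric the transpose makes no difference, so if the two Eigen solutions still disagree it is worth checking whether B is only approximately symmetric in the actual test.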

Related

Computing a single element of the adjugate or inverse of a symbolic binary matrix

I'm trying to get a single element of an adjugate A_adj of a matrix A, both of which need to be symbolic expressions, where the symbols x_i are binary and the matrix A is symmetric and sparse. Python's sympy works great for small problems:
from sympy import zeros, symbols
size = 4
A = zeros(size,size)
x_i = [x for x in symbols(f'x0:{size}')]
for i in range(size-1):
    A[i,i] += 0.5*x_i[i]
    A[i+1,i+1] += 0.5*x_i[i]
    A[i,i+1] = A[i+1,i] = -0.3*(i+1)*x_i[i]
A_adj_0 = A[1:,1:].det()
A_adj_0
This calculates the first element A_adj_0 of the cofactor matrix (which is the corresponding minor) and correctly gives me 0.125*x0*x1*x2 - 0.28*x0*x2**2 - 0.055*x1**2*x2 - 0.28*x1*x2**2, which is the expression I need, but there are two issues:
1. This is completely unfeasible for larger matrices (I need this for sizes of ~100).
2. The x_i are binary variables (i.e. either 0 or 1) and there seems to be no way for sympy to simplify expressions of binary variables, i.e. simplifying polynomials via x_i^n = x_i.
The first issue can be partly addressed by instead solving a linear equation system Ay = b, where b is set to the first basis vector [1, 0, 0, 0], such that y is the first column of the inverse of A. The first entry of y is the first element of the inverse of A:
b = zeros(size,1)
b[0] = 1
y = A.LUsolve(b)
s = {x_i[i]: 1 for i in range(size)}
print(y[0].subs(s) * A.subs(s).det())
print(A_adj_0.subs(s))
The problem here is that the expression for the first element of y is extremely complicated, even after using simplify() and so on. It would be a very simple expression if the simplification of binary expressions mentioned in point 2 above were applied. This is a faster method, but still unfeasible for larger matrices.
This boils down to my actual question:
Is there an efficient way to compute a single element of the adjugate of a sparse and symmetric symbolic matrix, where the symbols are binary values?
I'm open to using other software as well.
Addendum 1:
It seems simplifying binary expressions in sympy is possible with a simple custom substitution which I wasn't aware of:
A_subs = A_adj_0
for i in range(size):
    # binary variables satisfy x_i**2 == x_i
    A_subs = A_subs.subs(x_i[i]*x_i[i], x_i[i])
A_subs
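If powers higher than squares ever show up, a replace-based variant (the same idea used at the end of the answer below; this one-liner is my addition) collapses every positive power in one pass:

A_subs = A_adj_0.replace(lambda e: e.is_Pow and e.exp.is_positive, lambda e: e.base)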
You should make sure to use Rational rather than floats in sympy, so S(1)/2 or Rational(1, 2) rather than 0.5.
There is a new (undocumented and for the moment internal) implementation of matrices in sympy called DomainMatrix. It is likely to be a lot faster for a problem like this and always produces polynomial results in fully expanded form. It still seems fairly slow here, though, because it is not sparse internally (yet - that will probably change in the next release) and because it does not take advantage of the simplification from the symbols being binary-valued. It can be made to work over GF(2), but not with symbols that are assumed to be in GF(2), which is something different.
In case it is helpful, though, this is how you would use it in sympy 1.7.1:
from sympy import zeros, symbols, Rational
from sympy.polys.domainmatrix import DomainMatrix
size = 10
A = zeros(size,size)
x_i = [x for x in symbols(f'x0:{size}')]
for i in range(size-1):
    A[i,i] += Rational(1, 2)*x_i[i]
    A[i+1,i+1] += Rational(1, 2)*x_i[i]
    A[i,i+1] = A[i+1,i] = -Rational(3, 10)*(i+1)*x_i[i]
# Convert to DomainMatrix:
dM = DomainMatrix.from_list_sympy(size-1, size-1, A[1:, 1:].tolist())
# Compute determinant and convert back to normal sympy expression:
# Could also use dM.det().as_expr() although it might be slower
A_adj_0 = dM.charpoly()[-1].as_expr()
# Reduce powers:
A_adj_0 = A_adj_0.replace(lambda e: e.is_Pow, lambda e: e.args[0])
print(A_adj_0)

sympy matrix to explicit sum and back (to matrix notation)

I am working in sympy with symbolic matrices.
Once made explicit, I cannot return to the implicit representation.
I tried to work something out with the pair .as_explicit() and MatrixExpr.from_index_summation(expr).
But the latter seems to expect an explicit sigma-notation sum, not a sum of indexed elements.
As a minimal working example here is my approach on matrix multiplication:
from sympy import MatrixSymbol, MatrixExpr

A = MatrixSymbol('A',3,4)
B = MatrixSymbol('B',4,3)
Matrix_Notation = A * B
Expanded = (A * B).as_explicit()
FromSummation = MatrixExpr.from_index_summation(Expanded)
Here we can see that FromSummation is still the same as Expanded.
I suppose that the Expanded expression should be converted to sigma sums such that .from_index_summation can be expected to work. But how can this be done?
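For reference, from_index_summation does recover matrix notation when it is handed a genuine Sum over indexed elements (this sketch follows sympy's documented behaviour; it does not by itself convert the output of .as_explicit()):

from sympy import MatrixSymbol, MatrixExpr, Sum, symbols

N = symbols('N')
i, j, k = symbols('i j k')
A = MatrixSymbol('A', N, N)
B = MatrixSymbol('B', N, N)

expr = Sum(A[i, k]*B[k, j], (k, 0, N - 1))
print(MatrixExpr.from_index_summation(expr))   # A*B

So one route would be to rebuild the elementwise sums of the explicit matrix as Sum objects over symbolic indices before calling from_index_summation.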

Numpy Hermitian Matrix class

Are you aware of something like a Hermitian matrix class in numpy? I'd like to optimize matrix calculations like
B = U * A * U.H
where A (and thus B) is Hermitian. Without exploiting the structure, all matrix elements of B are calculated, even though it should be possible to save a factor of about 2 here. Am I missing something?
The method I need should take the upper/lower triangle of A, the full matrix of U, and return the upper/lower triangle of B.
I don't think there exists a method for your specific problem, but with a little thought you might be able to build an algorithm from the low-level BLAS routines that are wrapped in SciPy. For example, dgemm, dsymm, and dtrmm do general, symmetric, and triangular matrix products respectively. Here's an example of using them:
import numpy as np
from scipy.linalg.blas import dgemm, dsymm, dtrmm
A = np.random.rand(10, 10)
B = np.random.rand(10, 10)
S = np.dot(A, A.T) # symmetric matrix
T = np.triu(S) # upper triangular matrix
# normal matrix-matrix product
assert np.allclose(dgemm(1, A, B), np.dot(A, B))
# symmetric mat-mat product using only upper-triangle
assert np.allclose(dsymm(1, T, B), np.dot(S, B))
# upper-triangular mat-mat product
assert np.allclose(dtrmm(1, T, B), np.dot(T, B))
There are many other low-level BLAS routines available; I find the NETLIB page to be a good resource to learn what they do. You may be able to cleverly use some combination of the available routines to efficiently solve the problem you have in mind.
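For the specific product B = U * A * U.H, one possible combination is sketched below (my sketch, real-symmetric case; only the first product exploits the symmetry of A, the second is a plain GEMM, so this does not yet realize the full factor-of-two saving):

import numpy as np
from scipy.linalg.blas import dsymm, dgemm

n = 6
rng = np.random.default_rng(0)
U = rng.standard_normal((n, n))
A = rng.standard_normal((n, n))
A = A + A.T                        # real symmetric stand-in for Hermitian

W = dsymm(1.0, np.triu(A), U.T)    # W = A @ U.T, reading only A's upper triangle
B = dgemm(1.0, U, W)               # B = U @ W = U @ A @ U.T

assert np.allclose(B, U @ A @ U.T)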
Edit: it looks like there are LAPACK routines that quickly compute exactly what you want: dsytrd or zhetrd, but unfortunately these don't appear to be wrapped directly in scipy.linalg.lapack, though scipy does provide cython wrappers for them. Best of luck!
I needed tridiagonal reduction of a symmetric/Hermitian matrix A,
T = Q^H * A * Q
– presumably OP's underlying problem – and I've just submitted a pull request to SciPy for properly interfacing LAPACK's {s,d}sytrd (for real symmetric matrices) and {c,z}hetrd (for Hermitian matrices). All routines use either only the upper or the lower triangular part of the matrix.
Once this has been merged, it can be used like
import numpy as np
from scipy.linalg.lapack import dsytrd as sytrd, dsytrd_lwork as sytrd_lwork

n = 3
A = np.zeros((n, n), dtype=np.float64)
A[np.triu_indices_from(A)] = np.arange(1, 2*n+1, dtype=np.float64)

# query the optimal workspace size -- optional
lwork, info = sytrd_lwork(n)
assert info == 0

data, d, e, tau, info = sytrd(A, lwork=lwork)
assert info == 0
The vectors d and e now contain the main diagonal and the off-diagonal (sub-/superdiagonal) of the tridiagonal matrix, respectively.
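As a quick sanity check (my addition, not part of the answer): the tridiagonal matrix assembled from d and e is orthogonally similar to A (T = Q^H * A * Q), so the two must have identical eigenvalues:

T = np.diag(d) + np.diag(e, 1) + np.diag(e, -1)
A_full = np.triu(A) + np.triu(A, 1).T     # symmetrize the stored upper triangle
assert np.allclose(np.linalg.eigvalsh(A_full), np.linalg.eigvalsh(T))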

How is `(d*a)mod(b)=1` written in Ruby?

How should I write this:
(d*a)mod(b)=1
in order to make it work properly in Ruby? I tried it on Wolfram, but their solution:
(∂a(b, d))/(∂d) = -a/d
doesn't help me. I know a and b. I need to solve (d*a)mod(b)=1 for d, in the form d=....
It's not clear what you're asking, and, depending on what you mean, a solution may be impossible.
First off, (∂a(b, d))/(∂d) = -a/d is not a solution to that equation; rather, it's Wolfram Alpha misinterpreting your input using the notation for partial derivatives, which is entirely unrelated to what you want.
Secondly, if you're trying to solve (d*a)mod(b)=1 for d, you may be out of luck. For any values of a and b that share a common prime factor, no value of d satisfies the equation, because (d*a) mod b is always a multiple of that factor and can never equal 1. If a and b are coprime, you can use the formula given in LutzL's answer.
Additionally, if you're looking to perform symbolic manipulation of equations, Ruby is likely not the proper tool. Consider using a CAS, like Python's SymPy or Wolfram Mathematica.
Finally, if you're just trying to compute (d*a)mod(b), the modulo operator in Ruby is %, so you'd write (d*a)%(b).
You are looking for the modular inverse of a modulo b.
For any two numbers a, b the extended Euclidean algorithm
g, u, v = xgcd(a, b)
gives coefficients u, v such that
u*a + v*b = g
where g is the greatest common divisor. You need a, b coprime, preferably by ensuring that b is a prime number, to get g=1, and then you can set d = u (reduced modulo b if you want a non-negative value).
In Ruby:
def xgcd(a, b)
  return [a, 1, 0] if b == 0
  q, r = a.divmod(b)        # a == q*b + r
  g, u, v = xgcd(b, r)
  # g == u*b + v*r == u*b + v*(a - q*b) == v*a + (u - q*v)*b
  [g, v, u - q*v]
end

g, u, _v = xgcd(a, b)
d = u % b                   # now (d * a) % b == 1, provided g == 1

How to calculate the sum of two normal distributions

I have a value type that represents a gaussian distribution:
struct Gauss {
    double mean;
    double variance;
}
I would like to perform an integral over a series of these values:
Gauss eulerIntegrate(double dt, Gauss iv, Gauss[] values) {
    Gauss r = iv;
    foreach (Gauss v in values) {
        r += v*dt;
    }
    return r;
}
My question is how to implement addition for these normal distributions.
The multiplication by a scalar (dt) seemed simple enough. But it wasn't simple! Thanks FOOSHNICK for the help:
public static Gauss operator * (Gauss g, double d) {
    return new Gauss(g.mean * d, g.variance * d * d);
}
However, addition eludes me. I assume I can just add the means; it's the variance that's causing me trouble. Either of these definitions seems "logical" to me.
public static Gauss operator + (Gauss a, Gauss b) {
    double mean = a.mean + b.mean;
    // Is it this? (Yes, it is!)
    return new Gauss(mean, a.variance + b.variance);
    // Or this? (nope)
    //return new Gauss(mean, Math.Max(a.variance, b.variance));
    // Or how about this? (nope)
    //return new Gauss(mean, (a.variance + b.variance)/2);
}
Can anyone help define a statistically correct - or at least "reasonable" - version of the + operator?
I suppose I could switch the code to use interval arithmetic instead, but I was hoping to stay in the world of prob and stats.
The sum of two independent normally distributed random variables is itself normally distributed:
N(mean1, variance1) + N(mean2, variance2) ~ N(mean1 + mean2, variance1 + variance2)
This is all covered on the Wikipedia page on sums of normally distributed random variables.
Be careful that these really are variances and not standard deviations.
// X + Y
public static Gauss operator + (Gauss a, Gauss b) {
    // NOTE: this is valid if X, Y are independent normal random variables
    return new Gauss(a.mean + b.mean, a.variance + b.variance);
}
// X*b
public static Gauss operator * (Gauss a, double b) {
    return new Gauss(a.mean*b, a.variance*b*b);
}
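A quick Monte Carlo sanity check of the independent case (a NumPy sketch of my own; the parameters are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(2.0, 1.0, size=1_000_000)   # mean 2, variance 1
y = rng.normal(5.0, 2.0, size=1_000_000)   # mean 5, variance 4
z = x + y
print(z.mean(), z.var())                   # ~7.0 and ~5.0: means add, variances add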
To be more precise:
If a random variable Z is defined as the linear combination of two uncorrelated Gaussian random variables X and Y, then Z is itself a Gaussian random variable, e.g.:
if Z = aX + bY,
then mean(Z) = a * mean(X) + b * mean(Y), and variance(Z) = a^2 * variance(X) + b^2 * variance(Y).
If the random variables are correlated, then you have to account for that. Variance(X) is defined by the expected value E([X - mean(X)]^2). Working this through for Z = aX + bY, we get:
variance(Z) = a^2 * variance(X) + b^2 * variance(Y) + 2ab * covariance(X,Y)
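This formula is also easy to verify numerically (again a sketch of my own, with arbitrary numbers):

import numpy as np

rng = np.random.default_rng(1)
cov = np.array([[1.0, 0.6],
                [0.6, 2.0]])               # Var(X)=1, Var(Y)=2, Cov(X,Y)=0.6
xy = rng.multivariate_normal([0.0, 0.0], cov, size=1_000_000)
a, b = 2.0, -1.0
z = a*xy[:, 0] + b*xy[:, 1]
predicted = a**2*cov[0, 0] + b**2*cov[1, 1] + 2*a*b*cov[0, 1]   # 4 + 2 - 2.4 = 3.6
print(z.var(), predicted)                  # both ~3.6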
If you are summing two uncorrelated random variables which do not have Gaussian distributions, then the distribution of the sum is the convolution of the two component distributions.
If you are summing two correlated non-Gaussian random variables, you have to work through the appropriate integrals yourself.
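For the non-Gaussian case, the convolution can be carried out numerically on a grid. A minimal sketch (my addition), using two normal densities so the result can be checked against the known answer N(1, 5):

import numpy as np

dx = 0.01
x = np.arange(-10.0, 10.0, dx)
f = np.exp(-x**2/2) / np.sqrt(2*np.pi)           # N(0, 1) density
g = np.exp(-(x - 1)**2/8) / np.sqrt(8*np.pi)     # N(1, 4) density
h = np.convolve(f, g) * dx                       # density of the sum
xs = 2*x[0] + dx*np.arange(len(h))               # grid of the convolution
expected = np.exp(-(xs - 1)**2/10) / np.sqrt(10*np.pi)   # N(1, 5) density
assert np.allclose(h, expected, atol=1e-3)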
Well, your multiplication by a scalar was wrong - you should multiply the variance by the square of d. If you're adding a constant, then just add it to the mean; the variance stays the same. If you're adding two distributions, then add the means and add the variances.
Can anyone help define a statistically correct - or at least "reasonable" - version of the + operator?
Arguably not, as adding two distributions can mean different things. Having worked in reliability and maintainability, my first reaction from the title would be the distribution of a system's MTBF, if the MTBF of each part is normally distributed and the system has no redundancy. You are talking about the distribution of the sum of two normally distributed independent variates, not the (logical) sum of two normal distributions' effect. Very often, operator overloading has surprising semantics. I'd leave it as a function and call it 'normalSumDistribution' unless your code has a very specific target audience.
Hah, I thought you couldn't add gaussian distributions together, but you can!
http://mathworld.wolfram.com/NormalSumDistribution.html
In fact, the mean of the sum is the sum of the individual means, and the variance of the sum is the sum of the individual variances.
I'm not sure that I like what you're calling "integration" over a series of values. Do you mean that word in a calculus sense? Are you trying to do numerical integration? There are other, better ways to do that. Yours doesn't look right to me, let alone optimal.
The Gaussian distribution is a nice, smooth function. I think a nice quadrature approach or Runge-Kutta would be a much better idea.
I would have thought it depends on what type of addition you are doing. If you just want a normal distribution whose properties (mean, variance, etc.) equal the sums of the two input distributions' properties, then adding the properties as given in the other answers is fine. This is the assumption used in something like PERT, where adding up a large number of normal probability distributions yields another normal probability distribution.
The problem comes when the operation you actually want is mixing rather than summing. Take for instance a distribution with a mean of 2 and standard deviation of 1, and a distribution with a mean of 10 and standard deviation of 2. If you overlay these two distributions as a mixture (each sample is drawn from one or the other), you get a density with two peaks, one near 2 and one near 10, which is not a normal distribution. The sum of independent draws, by contrast, is exactly normal, so the property-adding rule is valid whenever "addition" means adding independent variates; it is only the mixture interpretation that requires the original distributions to be very similar, or very numerous, for the result to look normal.
