Minimizing a function of vectors - algorithm

I need to minimize the following sum:
minimize sum for all i{(i = 1 to n) fi(v(i), v(i - 1), tangent(i))}
v and tangent are vectors.
fi takes the 3 vectors as arguments and returns a cost associated with these 3 vectors. For this function, v(i - 1) is the vector chosen in the previous iteration. tangent(i) is also known. fi calculates the cost of choosing a vector v(i), given the other two vectors v(i - 1) and tangent(i). The v(0) and v(n) vectors are known. tangent(i) values are also known in advance for alli = 0 to n.
My task is to determine all such v(i)s such that the total cost of the function values for i = 1 to n is minimized.
Can you please give me any ideas to solve this?
So far I can think of Branch and Bound or dynamic programming methods.
Thanks!

I think this is a problem in mathematical optimisation, with an objective function built up of dot products and arcCosines, subject to the constraint that your vectors should be unit vectors. You could enforce this either with Lagrange multipliers, or by including a normalising step in the arc-Cosine. If Ti is a unit vector then for Vi calculate cos^-1(Ti.Vi/sqrt(Vi.Vi)). I would have a go at using a conjugate gradient optimiser for this, or perhaps even Newton's method, with my starting point Vi = Ti.
I would hope that this would be reasonably tractable, because the Vi are only related to neighbouring Vi. You might even get somewhere by repeatedly adjusting each Vi in isolation, one by one, to optimise the objective function. It might be worth just seeing what happens if you repeatedly set Vi to be the average of Ti, Vi+1, and Vi-1, and then scaled Vi to be a unit vector again.

Related

Numerical instability?

I am working in a program that concerns the optimization of some objective function obj over the scalar beta. The true global minimum beta0 is set at beta0=1.
In the mwe below you can see that obj is constructed as the sum of the 100-R (here I use R=3) smallest eigenvalues of the 100x100 symmetric matrix u'*u. While around the true global minimum obj "looks good" when I plot the objective function evaluated at much larger values of beta the objective function becomes very unstable (here or running the mwe you can see that multiple local minima (and maxima) appear, associated with values of obj(beta) smaller than the true global minimum).
My guess is that there is some sort of "numerical instability" going on, but I am unable to find the source.
%Matrix dimensions
N=100;
T=100;
%Reproducibility
rng('default');
%True global minimum
beta0=1;
%Generating data
l=1+randn(N,2);
s=randn(T+1,2);
la=1+randn(N,2);
X(1,:,:)=1+(3*l+la)*(3*s(1:T,:)+s(2:T+1,:))';
s=s(1:T,:);
a=(randn(N,T));
Y=beta0*squeeze(X(1,:,:))+l*s'+a;
%Give "beta" a large value
beta=1e6;
%Compute objective function
u=Y-beta*squeeze(X(1,:,:));
ev=sort(eig(u'*u)); % sort eigenvalues
obj=sum(ev(1:100-3))/(N*T); % "obj" is sum of 97 smallest eigenvalues
This evaluates the objective function at obj(beta=1e6). I have noticed that some of the eigenvalues from eig(u'*u) are negative (see object ev), when by construction the matrix u'*u is positive semidefinite
I am guessing this may have to do with floating point arithmetic issues and may (partly) be the answer to the instability of my function, but I am not sure.
Finally, this is what the objective function obj evaluated at a wide range of values for betalooks like:
% Now plot "obj" for a wide range of values of "beta"
clear obj
betaGrid=-5e5:100:5e5;
for i=1:length(betaGrid)
u=Y-betaGrid(i)*squeeze(X(1,:,:));
ev=sort(eig(u'*u));
obj(i)=sum(ev(1:100-3))/(N*T);
end
plot(betaGrid,obj,"*")
xlabel('\beta')
ylabel('obj')
This gives this figure, which shows how unstable it becomes for extreme values for beta.
The key here is noticing that computing eigenvalues can be a hard problem.
Actually the condition number for this problem is K = norm(A) * norm(inv(A)) (don't compute it this way, use cond(). This means the the an (relative) perturbation in the inpute (i.e. the matrix entries) gets amplified by the condition number when computing the output. I modified your code a little bit to compute and plot the condition number in each step. It turns out that for a large part of the range you are interested in it is greater than 10^17, which is abysmal. (Note that the double floating point numbers are accurate to not quite 16 significant (decimal) digits. This means even the representation error of double floating point numbers will here produce errors that make every digit "insignificant".) This already explains the bad behaviour. You should note that usually we can compute the largest eigenvalues quite accurately, the errors in the smaller (in magnitude) ones usually increase.
If the condition number was better (closer to 1) I would have suggested
computing the singular values, as they happen to be the eigenvalues (due to the symmetry). The svd is numerically more stable, but with this really bad
condition even this will not help. In the following modification of the
final snippet I added a graph that plots the condition number.
The only case where anything is salvageable is for R=0, then we actually
want to compute the sum of all eigenvalues, which happens to be the
trace of our matrix, which can easily be computed by just summing the
diagonal entries.
To summarize: This problem seems to have an inherent bad condition, so it doesn't really matter how you compute it. If you have a completely different formulation for the same problem that might help.
% Now plot "obj" for a wide range of values of "beta"
clear obj
L = 5e5; % decrease to 5e-1 to see that the condition number is still >1e9 around the optimum
betaGrid=linspace(-L,L,1000);
condition = nan(size(betaGrid));
for i=1:length(betaGrid)
disp(i/length(betaGrid))
u=Y-betaGrid(i)*squeeze(X(1,:,:));
A = u'*u;
ev=sort(eig(A));
condition(i) = cond(A);
obj(i)=sum(ev(1:100-3))/(N*t); % for R=0 use trace(A)/(N*T);
end
subplot(1,2,1);
plot(betaGrid,obj,"*")
xlabel('\beta')
ylabel('obj')
subplot(1,2,2);
semilogy(betaGrid, condition);
title('condition number');

Matrix Chain Multiplication Dynamic Programming

Assume that multiplying a matrix G1 of dimension p×q with another matrix G2 of dimension q×r requires pqr scalar multiplications. Computing the product of n matrices G1G2G3 ….. Gn can be done by parenthesizing in different ways. Define GiGi+1 as an explicitly computed pair for a given paranthesization if they are directly multiplied. For example, in the matrix multiplication chain G1G2G3G4G5G6 using parenthesization (G1(G2G3))(G4(G5G6)), G2G3 and G5G6 are only explicitly computed pairs.
Consider a matrix multiplication chain F1F2F3F4F5, where matrices F1,F2,F3,F4 and F5 are of dimensions 2×25,25×3,3×16,16×1 and 1×1000, respectively. In the parenthesization of F1F2F3F4F5 that minimizes the total number of scalar multiplications, the explicitly computed pairs is/are
F1F2 and F3F4 only
F2F3 only
F3F4 only
F2F3 and F4F5 only
=======================================================================
My approach - I want to solve this under one minute, but the only way I know is that to use Bottom up Dynamic Approach by making a table and the other thing I can conclude is we should multiply with F5 at last because it has 1000 in it's dimension.So, please how to develop fast intuition for this kind of question!
======================================================================
Correct answer is F3F4
The most important thing to note is the dimension 1×1000. You better watch out for it if you want to minimize the multiplications. OK, now we do know what we are looking for is basically multiply a small number with 1000.
Carefully examining if we go with F4F5, we would be multiplying 16x1x1000. But computing F3F4 first , the result matrix has dimension 3x1. So going with F3F4 we are able to get small numbers like 3,1 . So , no way im going with F4F5.
By similar logic I would not go with F2F3 and loose the smaller 3 and get bigger 25 and 16 to be later used with 1000.
OK, for F1F2, you can quickly find that (F1F2)(F3F4) is not better than
(F1(F2(F3F4))) . So the answer is F3F4

Querying large amount of multidimensional points in R^N

I'm looking at listing/counting the number of integer points in R^N (in the sense of Euclidean space), within certain geometric shapes, such as circles and ellipses, subject to various conditions, for small N. By this I mean that N < 5, and the conditions are polynomial inequalities.
As a concrete example, take R^2. One of the queries I might like to run is "How many integer points are there in an ellipse (parameterised by x = 4 cos(theta), y = 3 sin(theta) ), such that y * x^2 - x * y = 4?"
I could implement this in Haskell like this:
ghci> let latticePoints = [(x,y) | x <- [-4..4], y <-[-3..3], 9*x^2 + 16*y^2 <= 144, y*x^2 - x*y == 4]
and then I would have:
ghci> latticePoints
[(-1,2),(2,2)]
Which indeed answers my question.
Of course, this is a very naive implementation, but it demonstrates what I'm trying to achieve. (I'm also only using Haskell here as I feel it most directly expresses the underlying mathematical ideas.)
Now, if I had something like "In R^5, how many integer points are there in a 4-sphere of radius 1,000,000, satisfying x^3 - y + z = 20?", I might try something like this:
ghci> :{
Prelude| let latticePoints2 = [(x,y,z,w,v) | x <-[-1000..1000], y <- [-1000..1000],
Prelude| z <- [-1000..1000], w <- [-1000..1000], v <-[1000..1000],
Prelude| x^2 + y^2 + z^2 + w^2 + v^2 <= 1000000, x^3 - y + z == 20]
Prelude| :}
so if I now type:
ghci> latticePoints2
Not much will happen...
I imagine the issue is because it's effectively looping through 2000^5 (32 quadrillion!) points, and it's clearly unreasonably of me to expect my computer to deal with that. I can't imagine doing a similar implementation in Python or C would help matters much either.
So if I want to tackle a large number of points in such a way, what would be my best bet in terms of general algorithms or data structures? I saw in another thread (Count number of points inside a circle fast), someone mention quadtrees as well as K-D trees, but I wouldn't know how to implement those, nor how to appropriately query one once it was implemented.
I'm aware some of these numbers are quite large, but the biggest circles, ellipses, etc I'd be dealing with are of radius 10^12 (one trillion), and I certainly wouldn't need to deal with R^N with N > 5. If the above is NOT possible, I'd be interested to know what sort of numbers WOULD be feasible?
There is no general way to solve this problem. The problem of finding integer solutions to algebraic equations (equations of this sort are called Diophantine equations) is known to be undecidable. Apparently, you can write equations of this sort such that solving the equations ends up being equivalent to deciding whether a given Turing machine will halt on a given input.
In the examples you've listed, you've always constrained the points to be on some well-behaved shape, like an ellipse or a sphere. While this particular class of problem is definitely decidable, I'm skeptical that you can efficiently solve these problems for more complex curves. I suspect that it would be possible to construct short formulas that describe curves that are mostly empty but have a huge bounding box.
If you happen to know more about the structure of the problems you're trying to solve - for example, if you're always dealing with spheres or ellipses - then you may be able to find fast algorithms for this problem. In general, though, I don't think you'll be able to do much better than brute force. I'm willing to admit that (and in fact, hopeful that) someone will prove me wrong about this, though.
The idea behind the kd-tree method is that you recursive subdivide the search box and try to rule out whole boxes at a time. Given the current box, use some method that either (a) declares that all points in the box match the predicate (b) declares that no points in the box match the predicate (c) makes no declaration (one possibility, which may be particularly convenient in Haskell: interval arithmetic). On (c), cut the box in half (say along the longest dimension) and recursively count in the halves. Obviously the method can choose (c) all the time, which devolves to brute force; the goal here is to do (a) or (b) as much as possible.
The performance of this method is very dependent on how it's instantiated. Try it -- it shouldn't be more than a couple dozen lines of code.
For nicely connected region, assuming your shape is significantly smaller than your containing search space, and given a seed point, you could do a growth/building algorithm:
Given a seed point:
Push seed point into test-queue
while test-queue has items:
Pop item from test-queue
If item tests to be within region (eg using a callback function):
Add item to inside-set
for each neighbour point (generated on the fly):
if neighbour not in outside-set and neighbour not in inside-set:
Add neighbour to test-queue
else:
Add item to outside-set
return inside-set
The trick is to find an initial seed point that is inside the function.
Make sure your set implementation gives O(1) duplicate checking. This method will eventually break down with large numbers of dimensions as the surface area exceeds the volume, but for 5 dimensions should be fine.

Algorithm for finding all combinations of (x,y,z,j) that satisfy w+x = y+j, where w,x,y,j are integers between -N...N inclusive

I'm working on a problem that requires an array (dA[j], j=-N..N) to be calculated from the values of another array (A[i], i=-N..N) based on a conservation of momentum rule (x+y=z+j). This means that for a given index j for all the valid combinations of (x,y,z) I calculate A[x]A[y]A[z]. dA[j] is equal to the sum of these values.
I'm currently precomputing the valid indices for each dA[j] by looping x=-N...+N,y=-N...+N and calculating z=x+y-j and storing the indices if abs(z) <= N.
Is there a more efficient method of computing this?
The reason I ask is that in future I'd like to also be able to efficiently find for each dA[j] all the terms that have a specific A[i]. Essentially to be able to compute the Jacobian of dA[j] with respect to dA[i].
Update
For the sake of completeness I figured out a way of doing this without any if statements: if you parametrize the equation x+y=z+j given that j is a constant you get the equation for a plane. The constraint that x,y,z need to be integers between -N..N create boundaries on this plane. The points that define this boundary are functions of N and j. So all you have to do is loop over your parametrized variables (s,t) within these boundaries and you'll generate all the valid points by using the vectors defined by the plane (s*u + t*v + j*[0,0,1]).
For example, if you choose u=[1,0,-1] and v=[0,1,1] all the valid solutions for every value of j are bounded by a 6 sided polygon with points (-N,-N),(-N,-j),(j,N),(N,N),(N,-j), and (j,-N).
So for each j, you go through all (2N)^2 combinations to find the correct x's and y's such that x+y= z+j; the running time of your application (per j) is O(N^2). I don't think your current idea is bad (and after playing with some pseudocode for this, I couldn't improve it significantly). I would like to note that once you've picked a j and a z, there is at most 2N choices for x's and y's. So overall, the best algorithm would still complete in O(N^2).
But consider the following improvement by a factor of 2 (for the overall program, not per j): if z+j= x+y, then (-z)+(-j)= (-x)+(-y) also.

Is there a fast way to invert a matrix in Matlab?

I have lots of large (around 5000 x 5000) matrices that I need to invert in Matlab. I actually need the inverse, so I can't use mldivide instead, which is a lot faster for solving Ax=b for just one b.
My matrices are coming from a problem that means they have some nice properties. First off, their determinant is 1 so they're definitely invertible. They aren't diagonalizable, though, or I would try to diagonlize them, invert them, and then put them back. Their entries are all real numbers (actually rational).
I'm using Matlab for getting these matrices and for this stuff I need to do with their inverses, so I would prefer a way to speed Matlab up. But if there is another language I can use that'll be faster, then please let me know. I don't know a lot of other languages (a little but of C and a little but of Java), so if it's really complicated in some other language, then I might not be able to use it. Please go ahead and suggest it, though, in case.
I actually need the inverse, so I can't use mldivide instead,...
That's not true, because you can still use mldivide to get the inverse. Note that A-1 = A-1 * I. In MATLAB, this is equivalent to
invA = A\speye(size(A));
On my machine, this takes about 10.5 seconds for a 5000x5000 matrix. Note that MATLAB does have an inv function to compute the inverse of a matrix. Although this will take about the same amount of time, it is less efficient in terms of numerical accuracy (more info in the link).
First off, their determinant is 1 so they're definitely invertible
Rather than det(A)=1, it is the condition number of your matrix that dictates how accurate or stable the inverse will be. Note that det(A)=∏i=1:n λi. So just setting λ1=M, λn=1/M and λi≠1,n=1 will give you det(A)=1. However, as M → ∞, cond(A) = M2 → ∞ and λn → 0, meaning your matrix is approaching singularity and there will be large numerical errors in computing the inverse.
My matrices are coming from a problem that means they have some nice properties.
Of course, there are other more efficient algorithms that can be employed if your matrix is sparse or has other favorable properties. But without any additional info on your specific problem, there is nothing more that can be said.
I would prefer a way to speed Matlab up
MATLAB uses Gauss elimination to compute the inverse of a general matrix (full rank, non-sparse, without any special properties) using mldivide and this is Θ(n3), where n is the size of the matrix. So, in your case, n=5000 and there are 1.25 x 1011 floating point operations. So on a reasonable machine with about 10 Gflops of computational power, you're going to require at least 12.5 seconds to compute the inverse and there is no way out of this, unless you exploit the "special properties" (if they're exploitable)
Inverting an arbitrary 5000 x 5000 matrix is not computationally easy no matter what language you are using. I would recommend looking into approximations. If your matrices are low rank, you might want to try a low-rank approximation M = USV'
Here are some more ideas from math-overflow:
https://mathoverflow.net/search?q=matrix+inversion+approximation
First suppose the eigen values are all 1. Let A be the Jordan canonical form of your matrix. Then you can compute A^{-1} using only matrix multiplication and addition by
A^{-1} = I + (I-A) + (I-A)^2 + ... + (I-A)^k
where k < dim(A). Why does this work? Because generating functions are awesome. Recall the expansion
(1-x)^{-1} = 1/(1-x) = 1 + x + x^2 + ...
This means that we can invert (1-x) using an infinite sum. You want to invert a matrix A, so you want to take
A = I - X
Solving for X gives X = I-A. Therefore by substitution, we have
A^{-1} = (I - (I-A))^{-1} = 1 + (I-A) + (I-A)^2 + ...
Here I've just used the identity matrix I in place of the number 1. Now we have the problem of convergence to deal with, but this isn't actually a problem. By the assumption that A is in Jordan form and has all eigen values equal to 1, we know that A is upper triangular with all 1s on the diagonal. Therefore I-A is upper triangular with all 0s on the diagonal. Therefore all eigen values of I-A are 0, so its characteristic polynomial is x^dim(A) and its minimal polynomial is x^{k+1} for some k < dim(A). Since a matrix satisfies its minimal (and characteristic) polynomial, this means that (I-A)^{k+1} = 0. Therefore the above series is finite, with the largest nonzero term being (I-A)^k. So it converges.
Now, for the general case, put your matrix into Jordan form, so that you have a block triangular matrix, e.g.:
A 0 0
0 B 0
0 0 C
Where each block has a single value along the diagonal. If that value is a for A, then use the above trick to invert 1/a * A, and then multiply the a back through. Since the full matrix is block triangular the inverse will be
A^{-1} 0 0
0 B^{-1} 0
0 0 C^{-1}
There is nothing special about having three blocks, so this works no matter how many you have.
Note that this trick works whenever you have a matrix in Jordan form. The computation of the inverse in this case will be very fast in Matlab because it only involves matrix multiplication, and you can even use tricks to speed that up since you only need powers of a single matrix. This may not help you, though, if it's really costly to get the matrix into Jordan form.

Resources