I am trying to develop an algorithm (in the framework of gradient descent) for an SEM (structural equation model) problem. There is a parameter matrix B (n×n) with all its diagonal elements fixed to zero, and a term inv(I - B) (the inverse of I - B) in my objective function. There is no other constraint on B, such as symmetry.
My question is: how can we make sure (I - B) is not singular during the iterations?
Because the domain of the objective function is not the whole space of n×n matrices, it seems that the usual conditions for the convergence of gradient descent are not satisfied; standard textbooks assume the objective is defined on the whole space. So gradient descent does not seem to have guaranteed convergence here.
In my current implementation, at each update I check whether (I - B) is close to singular, and if it is, I shrink the step size of the gradient descent. Is there a better numerical approach to dealing with this problem?
You can try putting a logarithmic barrier on det(I-B) > 0 or det(I-B) < 0, depending on which gives you a better result (or on whatever additional information you have about your problem). The gradient of log det is quite nice: https://math.stackexchange.com/questions/38701/how-to-calculate-the-gradient-of-log-det-matrix-inverse
You can also compute the Fenchel dual, so you can potentially use a primal-dual approach.
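Concretely, a gradient step with such a barrier could look like the sketch below (Julia; a hedged sketch only, where grad_f stands in for the gradient of your actual SEM objective, which I don't know). Since the gradient of log det(I - B) with respect to B is -inv(I - B)', the barrier term -mu*log|det(I - B)| contributes +mu*inv(I - B)' to the gradient being descended. The diagonal of the update is zeroed because those entries are fixed, and the step is halved if the update would cross or approach the singular set, which is essentially your current safeguard.

using LinearAlgebra

# Hedged sketch: grad_f is a placeholder for the gradient of the actual SEM
# objective; the barrier term added to the objective is -mu*log|det(I - B)|.
function barrier_gradient_step(B, grad_f, mu, step)
    G = grad_f(B) .+ mu .* inv(I - B)'   # objective gradient + barrier gradient
    G[diagind(G)] .= 0                   # diagonal of B is fixed at zero
    Bnew = B .- step .* G
    # Backtrack if the update crosses, or lands too close to, det(I - B) = 0
    for _ in 1:60
        if sign(det(I - Bnew)) == sign(det(I - B)) && abs(det(I - Bnew)) > 1e-12
            break
        end
        step /= 2
        Bnew = B .- step .* G
    end
    return Bnew
end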
I stumbled upon something which I consider very strange.
As an example, consider the code
A = reshape(1:6, 3,2)
A/[1 1]
which gives
3×1 Array{Float64,2}:
2.5
3.5
4.5
As I understand it, in general such division gives a weighted average of the columns, where each weight is inversely proportional to the corresponding element of the vector.
So my question is: why is it defined this way?
What is the mathematical justification for this definition?
It's the minimum-error solution v to |A - v*[1 1]|₂, which, being overconstrained, has no exact solution in general (i.e. no value v makes the norm exactly zero). The behavior of / and \ is heavily overloaded, solving both under- and overconstrained systems by a variety of techniques and heuristics. Whether this kind of overloading is a good idea or not is debatable, but it's what people have come to expect from these operations in Matlab and Octave, and it's often quite convenient to have so much functionality available in a single operator.
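A quick way to see this in Julia (just a sketch for this particular example): the normal-equations solution of minimizing |A - v*[1 1]|₂ over v is A*w / (w'w) with w = [1, 1], i.e. the row means, and it matches what / returns.

using LinearAlgebra

A = Float64[1 4; 2 5; 3 6]        # same matrix as reshape(1:6, 3, 2)
v = A / [1 1]                     # what the question asks about

w = [1.0, 1.0]
v_explicit = (A * w) ./ dot(w, w) # normal-equations solution A*w / (w'w)

v[:] ≈ v_explicit                 # true: both give [2.5, 3.5, 4.5]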
Let A be an N×N matrix and b an N×1 column vector. Then \ solves Ax=b, and / solves xA=b.
As Stefan mentions, this is extended to non-square (over- and underdetermined) cases as the least-squares solution. This is done via the QR or SVD decompositions; see the details of those algorithms to see why this is the case. Hint: the linear form of the OLS estimator can actually be written as the solution of a matrix decomposition, so it's the same thing.
Now you might ask, how does it actually solve it? That's a complicated question. Essentially, it uses a matrix factorization, but which factorization depends on the matrix type. The reason is that Gaussian elimination is O(n^3), so treating every problem fully generally is usually not a good idea; whenever you can specialize, you can get speedups. So essentially \ (and /, which transposes and calls \) checks for a bunch of special types and picks a factorization or other algorithm (LU, QR, SVD, Cholesky, etc.) based on the matrix type. The flow chart from MATLAB explains this very well. There are a lot of details here, and it gets even more detailed when the matrix is sparse. IterativeSolvers.jl should also be mentioned, because it's another set of algorithms for solving Ax=b.
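For illustration, here is a small Julia sketch of the same idea done by hand rather than relying on the dispatch inside \ (so this is not the actual dispatch code, just the flavor of it): the more structure you can promise about the matrix, the cheaper the factorization you can use.

using LinearAlgebra

b = rand(4)

A_general = rand(4, 4)
A_spd     = A_general' * A_general + I   # symmetric positive definite
A_tall    = rand(6, 4)                   # overdetermined system

x_lu   = A_general \ b                   # generic square matrix: LU with pivoting
x_chol = cholesky(Symmetric(A_spd)) \ b  # SPD: Cholesky, roughly what \ would pick
x_qr   = A_tall \ rand(6)                # non-square: pivoted QR least-squares solution

# Factorizations can also be computed once and reused:
F = lu(A_general)
F \ b ≈ x_lu                             # true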
Most applied math problems reduce to linear algebra, with solving Ax=b being one of the most important and difficult problems, which is why there is a ton of research on the subject. In fact, you can probably say that the vast majority of the field of numerical linear algebra is devoted to finding fast methods for solving Ax=b on specific matrix types. \ essentially puts all of the direct (non-iterative) methods into one convenient operator.
I'm trying to understand the logic behind the iterative approximation algorithm in OpenCV's cv::undistortPoints().
The implementation is available at:
https://github.com/Itseez/opencv/blob/master/modules/imgproc/src/undistort.cpp (lines 361-368).
The way I see it:
using the last best-guess pixel position (x, y), try to find a better guess by applying the inverse of the 'distortion at the current best guess', and adjust the pixel position with respect to the initial distorted position (x0, y0)
use the initial distorted position (x0, y0) as the first 'best guess'
But the above doesn't really explain why this can be done...
One of the users posted (here: Understanding of openCV undistortion) that this is a kind of "non-linear solving algorithm (e.g. Newton's method, the Levenberg-Marquardt algorithm, etc.)". And from what I've seen there are at least a few possible solutions to this kind of undistortion problem.
Questions:
What iterative algorithm exactly is implemented in cv::undistortPoints()?
Is there any white paper showing (and, more importantly, explaining 'like I'm five') the idea behind it?
How do we know that this algorithm will converge (at least to a local minimum)?
Why do we do the correction in regard to the initial position (x0, y0)?
It uses the false position ("regula falsi") method. I have not seen a proof that the sequence converges for this particular equation, regardless of the choice of distortion parameters (or even for every choice of "physically plausible" parameters). It'd be very easy to write one for a few special cases, e.g. physical pure 2nd-order barrel distortion.
In practice it seems to work well. If you feel uncomfortable with it, you can always replace it with the equation solver of your choice. For pure radial distortion of any order (i.e. with a single unknown), you can use any polynomial equation solver, e.g. good old SLATEC's rpqr79.
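For intuition, here is a hedged sketch (Julia, a purely radial model with two coefficients k1 and k2; not OpenCV's actual code) of the fixed-point style iteration the question's bullet points describe: start from the distorted point, evaluate the distortion factor at the current guess, and correct the initial distorted coordinates by its inverse.

# Sketch only, not OpenCV's implementation. (xd, yd) are normalized
# distorted coordinates; k1, k2 are radial distortion coefficients.
function undistort_point(xd, yd, k1, k2; iters = 10)
    x, y = xd, yd                 # first "best guess" is the distorted point itself
    for _ in 1:iters
        r2 = x^2 + y^2
        icdist = 1 / (1 + k1*r2 + k2*r2^2)   # inverse distortion factor at the guess
        x = xd * icdist           # correct relative to the initial distorted position
        y = yd * icdist
    end
    return x, y
end

# Round-trip check with the forward model: distort, then undistort.
distort(x, y, k1, k2) = (1 + k1*(x^2 + y^2) + k2*(x^2 + y^2)^2) .* (x, y)
xd, yd = distort(0.3, -0.2, -0.2, 0.05)
undistort_point(xd, yd, -0.2, 0.05)      # ≈ (0.3, -0.2) for mild distortion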
Problem
I have a formula for evaluating a one-dimensional polynomial-based (joint) function, and I want to find all local maxima of that function within a given range.
My approach
My current solution is to evaluate the function at a certain number of points in the range and then go through these points, remembering those where the function changes from rising to falling. I can of course change the number of samples within the interval, but I want to find all maxima with as few samples as possible.
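In code, the current approach looks roughly like this (Julia, a minimal sketch with made-up names):

# Evaluate f on a grid over [a, b] and keep interior points where the sampled
# values switch from rising to falling.
function sampled_maxima(f, a, b; nsamples = 1000)
    xs = range(a, b; length = nsamples)
    ys = f.(xs)
    [xs[i] for i in 2:nsamples-1 if ys[i-1] < ys[i] && ys[i] > ys[i+1]]
end

sampled_maxima(x -> sin(3x) - 0.1x^2, 0.0, 5.0)   # example usage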
Question
Can you suggest an effective algorithm?
Finding all the maxima of an unknown function is hard. You can never be sure that a maximum you found is really just one maximum or that you have not overlooked a maximum somewhere.
However, if something is known about the function, you can try to exploit that. The simplest case is, of course, if the function is known to be a polynomial of bounded degree. Up to degree five it is possible to derive all (at most four) extrema from a closed formula, since they are the roots of the quartic derivative; see http://en.wikipedia.org/wiki/Quartic_equation#General_formula_for_roots for details. Most likely you don't want to implement the quartic formula, but the closed formulas for linear, quadratic, and cubic roots are feasible and can be used to find the maxima of a quartic function.
That is only the simplest information that might be known; other useful information is whether you can give a bound on the second derivative. This would allow you to reduce the sampling density where the slope is strong.
You may also be able to exploit information from how you intend to use the maxima you found. It can give you clues about how much precision you need. Is it sufficient to know that a point is near a maximum? Or that a point is flat? Is it really a problem if a saddle point is classified as a maximum? Or if a maximum right next to a turning point is overlooked? And how much is the allowable error margin?
If you cannot exploit information like this, you are thrown back to sampling your function in small steps and hoping you don't make too much of an error.
Edit:
You mention in the comments that your function is in fact a kernel density estimation. This gives you at least the following information:
Unless the kernel has unbounded support, your estimated function will be a piecewise function: any point on it is influenced by only a precisely calculable number of measurement points.
If the kernel is based on a rational function, the resulting estimated function will be piecewise rational, and it will be of the same degree as the kernel!
If the kernel is the uniform kernel, your estimated function will be a step function.
This case needs special handling because there won't be any maxima in the mathematical sense. However, it also makes your job really easy.
If the kernel is the triangular kernel, your estimated function will be a piecewise linear function.
If the kernel is the Epanechnikov kernel, your estimated function will be a piecewise quadratic function.
In all these cases it is next to trivial to produce the piecewise functions and to find their maxima.
If the kernel is of too high a degree, or transcendental, you still know the measurements that your estimation is based on, and you know the kernel's properties. This allows you to derive a heuristic for how dense your maxima can get.
At the very least, you know the first and second derivative of the kernel.
In principle, this allows you to calculate the first and second derivative of the estimated function at any point.
In the case of a local kernel, it might be more prudent to calculate the first derivative and an upper bound to the second derivative of the estimated function at any point.
With this information, it should be possible to constrain the search to the regions where there are maxima and avoid oversampling of the slopes.
As you see, there is a lot of useful information that you can derive from the knowledge of your function, and which you can use to your advantage.
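As an illustration of the last few points, here is a hedged Julia sketch (made-up helper names), assuming an Epanechnikov kernel: the kernel's derivative gives the derivative of the estimate analytically, so maxima can be located as plus-to-minus sign changes of that derivative on a coarse grid and then refined by bisection, instead of blindly oversampling the function itself.

epan(u)  = abs(u) <= 1 ? 0.75 * (1 - u^2) : 0.0    # Epanechnikov kernel
depan(u) = abs(u) <= 1 ? -1.5 * u : 0.0            # its derivative

kde(x, data, h)  = sum(epan((x - d) / h)  for d in data) / (length(data) * h)
dkde(x, data, h) = sum(depan((x - d) / h) for d in data) / (length(data) * h^2)

function kde_maxima(data, h, a, b; coarse = 200, bisections = 60)
    xs = range(a, b; length = coarse)
    maxima = Float64[]
    for i in 1:coarse-1
        lo, hi = xs[i], xs[i+1]
        # keep only + -> - sign changes of the derivative (maxima, not minima)
        dkde(lo, data, h) > 0 && dkde(hi, data, h) < 0 || continue
        for _ in 1:bisections            # bisection refinement of the sign change
            mid = (lo + hi) / 2
            dkde(mid, data, h) > 0 ? (lo = mid) : (hi = mid)
        end
        push!(maxima, (lo + hi) / 2)
    end
    return maxima
end

data = [1.0, 1.2, 1.1, 3.0, 3.1, 2.9]
kde_maxima(data, 0.5, 0.0, 4.0)   # roughly one mode near 1.1 and one near 3.0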
The local maxima are among the roots of the first derivative. To isolate those roots in your working interval you can use the Sturm theorem, and proceed by dichotomy. In theory (using exact arithmetic) it gives you all real roots.
An equivalent approach is to express your polynomial in the Bezier/Bernstein basis and look for changes of signs of the coefficients (hull property). Dichotomic search can be efficiently implemented by recursive subdivision of the Bezier.
There are several classical algorithms available for polynomials, such as Laguerre, that usually look for the complex roots as well.
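If a polynomial-root package is acceptable, the derivative-root approach is only a few lines; the sketch below assumes the Polynomials.jl package (an assumption on my part) and finds the critical points via a companion-matrix eigenvalue solve rather than the Sturm dichotomy or Bernstein subdivision described above.

using Polynomials

# Local maxima of p on [a, b]: real roots of p' at which p'' < 0.
function local_maxima(p::Polynomial, a, b; tol = 1e-9)
    dp, ddp = derivative(p), derivative(p, 2)
    [real(r) for r in roots(dp) if abs(imag(r)) < tol &&
                                   a <= real(r) <= b &&
                                   ddp(real(r)) < 0]
end

# Example: p(x) = x^4 - 2x^2 has a single local maximum, at x = 0.
local_maxima(Polynomial([0, 0, -2, 0, 1]), -2, 2)   # ≈ [0.0]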
Is there an algorithm to check if a given (possibly nonlinear) function f is always positive?
The idea I currently have is to find the roots of the function (using the Newton-Raphson algorithm or similar techniques, see http://en.wikipedia.org/wiki/Root-finding_algorithm) and check the derivatives, or to find the minimum of f, but these don't seem to be the best solutions to this problem; there are also a lot of convergence issues with root-finding algorithms.
For example, in Maple, function verify can do this, but I need to implement it in my own program.
Maple Help on verify: http://www.maplesoft.com/support/help/Maple/view.aspx?path=verify/function_shells
Maple example:
assume(x,'real');
verify(x^2+1,0,'greater_than' ); --> returns true, since for every x we have x^2+1 > 0
[edit] Some background on the question:
The function $f$ is the right-hand side of a nonlinear differential model for a circuit. A nonlinear circuit can be modeled as a set of ordinary differential equations by applying modified nodal analysis (MNA). For the sake of simplicity, let's consider only systems of dimension 1, so $x' = f(x)$ where $f$ describes the circuit; for example $f$ can be $f(x) = 10x - 100x^2 + 200x^3 - 300x^4 + 100x^5$ (a model for a nonlinear tunnel diode) or $f(x) = 10 - 2\sin(4x) + 3x$ (a model for a Josephson junction).
$x$ is bounded and $f$ is only defined on an interval $[a,b] \subset \mathbb{R}$. $f$ is continuous.
I can also assume that $f$ is Lipschitz with Lipschitz constant $L > 0$, but I don't want to unless I have to.
If I understand your problem correctly, it boils down to counting the number of (real) roots in an interval without necessarily identifying them. In fact, you don't even need to get the exact number, just whether or not it's equal to zero.
If your function is a polynomial, I think that Sturm's theorem may be applicable. The Wikipedia article claims two other procedures are preferred, so you might want to check those out, too. I'm not sure if Descartes' rule of signs works on an interval, but Budan's theorem does appear to.
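Since Sturm's theorem came up: below is a rough Julia sketch (floating-point, made-up helper names) of counting the real roots of a polynomial in an interval. If the count on [a, b] is zero and f is positive at a single sample point, then f is positive on all of [a, b]. For exact answers you would want rational arithmetic and a square-free polynomial; treat this as an illustration only.

# Polynomials as coefficient vectors in ascending order: p = c[1] + c[2]*x + ...
polyval(c, x) = foldr((ci, acc) -> ci + x*acc, c; init = zero(x))   # Horner
polyder(c) = [c[i+1]*i for i in 1:length(c)-1]

# Remainder of the polynomial division a / b (deg b >= 1), same convention.
function polyrem(a, b)
    a = float.(a)
    da, db = length(a) - 1, length(b) - 1
    for k in da:-1:db
        q = a[k+1] / b[db+1]
        for j in 0:db
            a[k-db+j+1] -= q * b[j+1]
        end
    end
    r = a[1:db]
    while length(r) > 1 && isapprox(r[end], 0; atol = 1e-12)
        pop!(r)
    end
    return r
end

# Sturm chain: p, p', then negated remainders until a constant is reached.
function sturm_chain(p)
    chain = [float.(p), polyder(float.(p))]
    while length(chain[end]) > 1
        r = -polyrem(chain[end-1], chain[end])
        all(iszero, r) && break
        push!(chain, r)
    end
    return chain
end

signchanges(vals) = count(i -> vals[i]*vals[i+1] < 0, 1:length(vals)-1)

# Number of distinct real roots of p in (a, b], assuming p(a) != 0 and p(b) != 0.
function nroots(p, a, b)
    chain = sturm_chain(p)
    signchanges([polyval(c, a) for c in chain]) -
    signchanges([polyval(c, b) for c in chain])
end

# Example: x^2 + 1 has no real roots in [-10^6, 10^6]; together with f(0) = 1 > 0
# this certifies f > 0 on that interval.
nroots([1, 0, 1], -1e6, 1e6)   # == 0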
Let P(x) denote the polynomial in question. The least fixed point (LFP) of P is the lowest value of x such that x = P(x). The polynomial has real coefficients. There is no guarantee in general that an LFP will exist, although one is guaranteed to exist if the degree is odd and ≥ 3. I know of an efficient solution if the degree is 3: since x = P(x) is equivalent to P(x) - x = 0 and there is a closed-form cubic formula, solving for x is somewhat trivial and can be hardcoded. Degrees 2 and 1 are similarly easy. It's the more complicated cases that I'm having trouble with, since I can't seem to come up with a good algorithm for arbitrary degree.
EDIT:
I'm only considering real fixed points and taking the least among them, not necessarily the fixed point with the least absolute value.
Just find a root of f(x) = P(x) - x using your favorite numerical method. For example, you could iterate
x_{n + 1} = x_n - (P(x_n) - x_n) / (P'(x_n) - 1).
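A tiny Julia sketch of that iteration (function names made up; P and its derivative are passed in as ordinary functions):

function newton_fixed_point(P, dP, x0; iters = 50, tol = 1e-12)
    x = x0
    for _ in 1:iters
        step = (P(x) - x) / (dP(x) - 1)   # x_{n+1} = x_n - (P(x_n) - x_n)/(P'(x_n) - 1)
        x -= step
        abs(step) < tol && return x
    end
    return x
end

# e.g. P(x) = x^3: starting near -1.2 this converges to the fixed point -1,
# but like any Newton iteration it only finds one fixed point per starting guess.
newton_fixed_point(x -> x^3, x -> 3x^2, -1.2)   # ≈ -1.0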
You won't find a closed-form formula in general, because there is no closed-form formula for quintic and higher-degree polynomials. Thus, for degree five and above you have to use a numerical method of some sort.
Since you want the least fixed point, you can't get away without finding all real roots of P(x) - x and selecting the smallest.
Finding all the roots of a polynomial is a tricky subject. If you have a black box routine, then by all means use it. Otherwise, consider the following trick:
Form M the companion matrix of P(x) - x
Find all eigenvalues of M
but this requires access to a routine for finding eigenvalues (which is another tricky problem, but there are plenty of good libraries).
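In Julia, for example, the whole trick fits in a few lines using eigvals from the LinearAlgebra standard library (a sketch with made-up names, not production code; it assumes the leading coefficient of P(x) - x is nonzero):

using LinearAlgebra

# Coefficients ascending: P(x) = c[1] + c[2]*x + ... + c[end]*x^n.
function least_fixed_point(c; tol = 1e-9)
    q = float.(c)
    q[2] -= 1                        # Q(x) = P(x) - x
    q ./= q[end]                     # make Q monic
    n = length(q) - 1                # degree of Q
    M = zeros(n, n)                  # companion matrix of Q:
    for i in 2:n
        M[i, i-1] = 1                #   ones on the subdiagonal,
    end
    M[:, n] .= -q[1:n]               #   last column holds -q[1], ..., -q[n]
    lambda = eigvals(M)              # eigenvalues of M are the roots of Q
    real_roots = [real(z) for z in lambda if abs(imag(z)) < tol]
    return isempty(real_roots) ? nothing : minimum(real_roots)
end

# Example: the fixed points of P(x) = x^3 are -1, 0 and 1, so the LFP is -1.
least_fixed_point([0.0, 0.0, 0.0, 1.0])   # ≈ -1.0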
Otherwise, you can implement the Jenkins-Traub algorithm, which is a highly nontrivial piece of code.
I don't really recommend finding a zero (with e.g. Newton's method) and deflating until you reach degree one: it is very unstable if not done properly, and you'll lose a lot of accuracy (and it is very difficult to handle multiple roots with it). The proper way to do it is in fact the above-mentioned Jenkins-Traub algorithm.
This problem amounts to finding the "least" root (here I'm not sure if you mean least in magnitude or actually the smallest, which could be the most negative) of the polynomial P(x) - x. There is no closed-form solution for polynomials of high degree, but there are myriad numerical approaches to finding roots.
As is often the case, Wikipedia is a good place to begin your search.
If you want to find the smallest root, then you can use the rule of signs to pin down the interval where it exists and then use some numerical method to find roots in that interval.