How to add a lasso (L1-norm) residual in Ceres Solver

I want to add a regularization term to a non-linear least squares objective. How do I do this in Ceres Solver?

A pure L1-norm residual is non-differentiable, so you cannot use it with Ceres or, for that matter, any solver that depends on derivatives. However, the Huber loss and the SoftL1 loss included with Ceres are smooth approximations to the L1 loss, and you should be able to use them.
http://ceres-solver.org/nnls_modeling.html#instances
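A minimal sketch of how a loss function is attached in Ceres (the residual functor MyResidual and the parameter x below are made-up placeholders; SoftLOneLoss and HuberLoss are the loss classes documented at the link above):

#include "ceres/ceres.h"

// Hypothetical one-dimensional residual r(x) = x - 5, purely to show where the loss goes.
struct MyResidual {
  template <typename T>
  bool operator()(const T* const x, T* residual) const {
    residual[0] = x[0] - T(5.0);
    return true;
  }
};

int main() {
  double x = 0.0;
  ceres::Problem problem;
  ceres::CostFunction* cost =
      new ceres::AutoDiffCostFunction<MyResidual, 1, 1>(new MyResidual);
  // The loss function is the second argument of AddResidualBlock;
  // SoftLOneLoss(1.0) (or HuberLoss(1.0)) behaves like |r| for large residuals.
  problem.AddResidualBlock(cost, new ceres::SoftLOneLoss(1.0), &x);

  ceres::Solver::Options options;
  ceres::Solver::Summary summary;
  ceres::Solve(options, &problem, &summary);
  return 0;
}

Swapping in new ceres::HuberLoss(1.0) works the same way.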

Related

What is the numerical method to guarantee safer matrix inversion?

I am trying to develop an algorithm (in the framework of gradient descent) for an SEM (structural equation model) problem. There is an n*n parameter matrix B with all of its diagonal elements fixed to zero, and a term inv(I - B) (the inverse of I - B) in my objective function. There is no other constraint on B, such as symmetry.
My question is: how can we make sure (I - B) does not become singular during the iterations?
Because the domain of the objective function is not all of R^n, it seems that the strict conditions for the convergence of gradient descent are not satisfied. Standard textbooks assume the objective is defined on all of R^n, so gradient descent appears to have no guaranteed convergence here.
In the update step, my current implementation checks whether (I - B) is close to singular; if it is, the step size of the gradient descent is shrunk. Is there a better numerical approach to dealing with this problem?
You can try putting a logarithmic barrier on det(I-B) > 0 or det(I-B) < 0, depending on which gives you a better result or on any additional information you have about your problem. The gradient of log det is quite nice: https://math.stackexchange.com/questions/38701/how-to-calculate-the-gradient-of-log-det-matrix-inverse
You can also compute the Fenchel dual so you can potentially use a primal-dual approach.
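(For reference, a sketch of the barrier term with an assumed weight $\mu > 0$; the gradient below follows from the linked identity for $\log\det$:)
$$ f_\mu(B) = f(B) - \mu \log\lvert\det(I-B)\rvert, \qquad \nabla_B \log\lvert\det(I-B)\rvert = -\left((I-B)^{-1}\right)^T. $$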

Definition of matrix-vector division operator of Julia

I stumbled upon something, which I consider very strange.
As an example consider the code
A = reshape(1:6, 3,2)
A/[1 1]
which gives
3×1 Array{Float64,2}:
2.5
3.5
4.5
As I understand it, in general such a division gives a weighted average of the columns, where each weight is inversely proportional to the corresponding element of the vector.
So my question is: why is it defined this way?
What is the mathematical justification of this definition?
It's the minimum error solution to |A - v*[1 1]|₂ – which, being overconstrained, has no exact solution in general (i.e. value v such that the norm is precisely zero). The behavior of / and \ is heavily overloaded, solving both under and overconstrained systems by a variety of techniques and heuristics. Whether this kind of overloading is a good idea or not is debatable, but it's what people have come to expect from these operations in Matlab and Octave, and it's often quite convenient to have so much functionality available in a single operator.
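(A worked check, not part of the original answer: for this example the least-squares problem decouples row by row.)
$$ \min_{v\in\mathbb{R}^3} \lVert v\,[1\ \ 1] - A\rVert_F^2 = \sum_{i=1}^{3}\left[(v_i - a_{i1})^2 + (v_i - a_{i2})^2\right] \quad\Longrightarrow\quad v_i = \frac{a_{i1}+a_{i2}}{2}, $$
so each entry of A/[1 1] is the mean of the corresponding row of A, which for A = reshape(1:6, 3,2) gives 2.5, 3.5, 4.5, matching the output above.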
Let A be an NxN matrix and b be an Nx1 column vector. Then \ solves Ax=b, and / solves xA=b.
As Stefan mentions, this is extended to over- and underdetermined cases as a least-squares solution. This is done via the QR or SVD decompositions. See the details on these algorithms to see why this is the case. Hint: the linear form of the OLS estimator can actually be written as the solution to a matrix decomposition, so it's the same thing.
Now you might ask, how does it actually solve it? That's a complicated question. Essentially, it uses a matrix factorization. But which matrix factorization is used depends on the matrix type. The reason is that Gaussian elimination is O(n^3), so treating the problem generically is usually not a good idea; whenever you can specialize, you can get speedups. So essentially \ (and /, which transposes and calls \) checks for a bunch of special types and picks a factorization or other algorithm (LU, QR, SVD, Cholesky, etc.) based on the matrix type. The flow chart from MATLAB explains this very well. There are a lot of details here, and it gets even more detailed when the matrix is sparse. Also IterativeSolvers.jl should be mentioned because it's another set of algorithms for solving Ax=b.
Most applied math problems reduce down to linear algebra, with solving Ax=b being one of the most important and difficult problems, which is why there is tons of research on the subject. In fact, you can probably say that the vast majority of the field of numerical linear algebra is devoted to finding fast methods for solving Ax=b on specific matrix types. \ essentially puts all of the direct (non-iterative) methods into one convenient operator.

Inverting small matrix

I have a piece of code in Fortran 90 in which I have to solve both a non-linear system of equations (for which I have to invert the Jacobian matrix) and a linear system. When I say small, I mean n unknowns for both operations, with n<=4. Unfortunately, n is not known a priori.
I thought of writing explicit formulas for cases with n=1,2 and using other methods for n=3,4 (e.g. some functions of the Intel MKL libraries), for the sake of performance. Is this sensible or should I write explicit formulas for the inverse matrix also for n=3,4?
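(For reference, a standard formula rather than anything from the question: the explicit inverses for small n come from the adjugate.)
$$ A^{-1} = \frac{\operatorname{adj}(A)}{\det A}, \qquad \text{e.g. for } n=2:\quad \begin{pmatrix} a & b\\ c & d\end{pmatrix}^{-1} = \frac{1}{ad-bc}\begin{pmatrix} d & -b\\ -c & a\end{pmatrix}. $$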

Can SCIP solve a MIP subject to matrix equations?

My question concerns mixed integer programming (MIP) in SCIP:
I have the following problem:
$\min \operatorname{trace}(X)$
subject to
$$(A+D)^T X + X(A+D) = I,\\
d_i \in \{0,1\} \mbox{ for } i=1,\ldots,n,$$
where $A$ is an $n \times n$ matrix and $D = \operatorname{diag}(d_1,\ldots,d_n)$ is a diagonal matrix.
Since the matrix constraint is linear, it can be transformed into a system of linear equations (via the Kronecker product and the vectorization operation), but this is limited to small n. Is it possible to solve the matrix equation directly with SCIP? Is there a way to embed an external solver? Or do I have to write my own solver for the continuous Lyapunov matrix equation?
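(For reference, the vectorization mentioned above uses the standard identity $\operatorname{vec}(MXN) = (N^T \otimes M)\operatorname{vec}(X)$, which turns the Lyapunov-type constraint into a linear system in $\operatorname{vec}(X)$:)
$$\left(I \otimes (A+D)^T + (A+D)^T \otimes I\right)\operatorname{vec}(X) = \operatorname{vec}(I).$$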
You could try using the PIP file format, which is used for polynomial constraints and objectives. See http://polip.zib.de/ and http://polip.zib.de/pipformat.php
You would have to do the matrix operations yourself or use ZIMPL.
Matrix equations cannot be handled in SCIP; you would need to transform them into linear equations. Also, all the data has to be loaded into an LP solver at some point and needs to be formulated as ordinary constraints there as well. So even if SCIP itself were able to handle matrix equations, you would sooner or later be required to expand the problem.

Optimization of a multivariate function with an initial solution close to the optimum

I was wondering if anyone knows what kind of algorithm could be used in my case. I have already run the optimizer on my multivariate function and found a solution to my problem, assuming that my function is regular enough. I now perturb the problem slightly and would like to find the optimal solution, which is close to my last solution. Is there a very fast algorithm for this case, or should I just fall back to a regular one?
We probably need a bit more information about your problem; but since you know you're near the right solution, and if derivatives are easy to calculate, Newton-Raphson is a sensible choice, and if not, Conjugate-Gradient may make sense.
If you already have an iterative optimizer (for example, based on Powell's direction set method, or CG), why don't you use your initial solution as a starting point for the next run of your optimizer?
EDIT, due to your comment: if calculating the Jacobian or the Hessian matrix gives you performance problems, try BFGS (http://en.wikipedia.org/wiki/BFGS_method); it avoids calculating the Hessian completely. At
http://www.alglib.net/optimization/lbfgs.php you will find a (free-for-non-commercial) implementation of BFGS, along with a good description of the details.
And don't expect to get anything from finding your initial solution with a less sophisticated algorithm.
So this is all about unconstrained optimization. If you need information about constrained optimization, I suggest you google for "SQP".
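A sketch of warm-starting a quasi-Newton (L-BFGS) solver, here illustrated with Ceres's GradientProblemSolver purely as one concrete option (an assumption on my part; the answer above points to ALGLIB instead, and the objective below is a made-up placeholder):

#include "ceres/ceres.h"

// Made-up smooth objective f(x, y) = (x - 3)^2 + 10 (y + 1)^2, with its gradient.
class ExampleObjective : public ceres::FirstOrderFunction {
 public:
  bool Evaluate(const double* x, double* cost, double* gradient) const override {
    cost[0] = (x[0] - 3.0) * (x[0] - 3.0) + 10.0 * (x[1] + 1.0) * (x[1] + 1.0);
    if (gradient != nullptr) {
      gradient[0] = 2.0 * (x[0] - 3.0);
      gradient[1] = 20.0 * (x[1] + 1.0);
    }
    return true;
  }
  int NumParameters() const override { return 2; }
};

int main() {
  // Warm start: initialize at (an approximation of) the minimizer of the unperturbed problem.
  double x[2] = {3.0, -1.0};
  ceres::GradientProblem problem(new ExampleObjective());
  ceres::GradientProblemSolver::Options options;
  options.line_search_direction_type = ceres::LBFGS;  // quasi-Newton, no explicit Hessian
  ceres::GradientProblemSolver::Summary summary;
  ceres::Solve(options, problem, x, &summary);
  return 0;
}

The point of the warm start is simply that x is initialized at the previous optimum, so only a few iterations are typically needed after a small perturbation.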
There are a bunch of algorithms for finding the roots of equations. If you know approximately where the root is, there are algorithms that will get you arbitrarily close very quickly, in a number of iterations that grows only logarithmically with the desired precision, or better.
One is Newton's method.
Another is the bisection method.
Note that these algorithms are for single variable functions, but can be expanded to multivariate functions.
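(For reference, Newton's iteration for a root of $f$, and its usual multivariate form with the Jacobian $J_F$; standard formulas, not from the answer above:)
$$ x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}, \qquad x_{k+1} = x_k - J_F(x_k)^{-1} F(x_k). $$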
Every minimization algorithm performs better (read: performs at all) if you have a good initial guess. The initial guess for the perturbed problem will, in your case, be the minimum point of the unperturbed problem.
Then you have to specify your requirements: you want speed. What accuracy do you want? Does space efficiency matter? Most importantly, what information do you have: only the value of the function, or do you also have the derivatives (possibly second derivatives)?
Some background on the problem would help too. Looking for a smooth function which has been discretized will be very different from looking for hundreds of unrelated parameters.
Global information (i.e., is the function convex, is there a guaranteed global minimum or many local ones, etc.) can be left aside for now. If you have trouble finding the minimum point of the perturbed problem, this is something you will have to investigate, though.
Answering these questions will allow us to select a particular algorithm. There are many choices (and trade-offs) for multivariate optimization.
Also, which is quicker will very much depend on the problem (rather than on the algorithm), and should be determined by experimentation.
Though I don't know much about using computers in this capacity, I remember an article that used neuroevolutionary techniques to find "best-fit" equations relatively efficiently, given a known function complexity (linear, Nth-polynomial, exponential, logarithmic, etc.) and a set of point plots. As I recall, it was one of the earliest uses of what we now know as computational neuroevolution; because the functional complexity (and thus the number of terms) of the equation is known and fixed, a static neural net can be used and seeded with your closest values, then "mutated" and tested for fitness, with heuristics to make new nets closer to existing nets with high fitness. Using multithreading, many nets can be created, tested and evaluated in parallel.
