Ceres: Compute uncertainty on parameter - ceres-solver

I am using Ceres to make a fit, and would like to get an uncertainty for the fit parameters. It has been suggested to use the Covariance class, but I am not sure whether I read the documentation correctly. Here is what I tried in analogy to the documentation to get the uncertainties for a simple linear fit:
void Fit::fit_linear_function(const std::vector<double>& x, const std::vector<double>& y, int idx_start, int idx_end, double& k, double& d) {
Problem problem;
for (int i = idx_start; i <= idx_end; ++i) {
//std::cout << "i x y "<<i<< " " << x[i] << " " << y[i] << std::endl;
problem.AddResidualBlock(
new ceres::AutoDiffCostFunction<LinearResidual, 1,1, 1>(
new LinearResidual(x[i], y[i])),
NULL, &k, &d);
}
Covariance::Options options;
Covariance covariance(options);
std::vector<std::pair<const double*, const double *>> covariance_blocks;
covariance_blocks.push_back(std::make_pair(&k,&k));
covariance_blocks.push_back(std::make_pair(&d,&d));
CHECK(covariance.Compute(covariance_blocks,&problem));
double covariance_kk;
double covariance_dd;
covariance.GetCovarianceBlock(&k,&k, &covariance_kk);
covariance.GetCovarianceBlock(&d,&d, &covariance_dd);
std::cout<< "Covariance test k" << covariance_kk<<std::endl;
std::cout<< "Covariance test d" << covariance_dd<<std::endl;
It compiles and produces output, but the results are quite off from what I get from scipy so I must have made a mistake.

Solve the problem and then use the ceres::Covariance class.
http://ceres-solver.org/nnls_covariance.html
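As a cross-check on what the Covariance class reports: for a full-rank problem the Ceres documentation defines the covariance as (J^T J)^-1, with J the Jacobian at the solution, and it applies no scaling by the residual variance. scipy.optimize.curve_fit, with its default absolute_sigma=False, multiplies that matrix by the residual variance, so the two numbers can differ by that factor even when both fits agree. Whether that explains the mismatch here is an assumption, since the question does not say which scipy routine was used. The sketch below computes both quantities in closed form for y = k*x + d without Ceres; the struct and function names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type, not part of Ceres or of the question's code.
struct LinearFitCovariance {
    double k;                  // fitted slope
    double d;                  // fitted intercept
    double cov_kk;             // (J^T J)^-1 entry for k, unscaled
    double cov_dd;             // (J^T J)^-1 entry for d, unscaled
    double residual_variance;  // sum of squared residuals / (n - 2)
};

// Closed-form least squares for y = k*x + d.
// Each Jacobian row is [x_i, 1], so J^T J = [[Sxx, Sx], [Sx, n]].
LinearFitCovariance linear_fit_covariance(const std::vector<double>& x,
                                          const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() > 2);
    const double n = static_cast<double>(x.size());
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sx += x[i];
        sy += y[i];
        sxx += x[i] * x[i];
        sxy += x[i] * y[i];
    }
    const double det = n * sxx - sx * sx;  // determinant of J^T J
    assert(det != 0.0);

    LinearFitCovariance r{};
    r.k = (n * sxy - sx * sy) / det;
    r.d = (sy - r.k * sx) / n;
    r.cov_kk = n / det;    // first diagonal entry of the inverse
    r.cov_dd = sxx / det;  // second diagonal entry of the inverse

    double ss = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double res = y[i] - (r.k * x[i] + r.d);
        ss += res * res;
    }
    r.residual_variance = ss / (n - 2.0);
    return r;
}
```

Multiplying cov_kk or cov_dd by residual_variance and taking the square root gives the kind of standard error that curve_fit reports with its default settings.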

Related

Async doesn't work for long vectors

I am doing some parallel programming with async. I have an integrator, and in a test program I wanted to see whether dividing a vector into 4 subvectors actually takes one fourth of the time to complete the task.
I had an initial issue with the measured time, now solved, as steady_clock measures real time and not CPU time.
I tried the code with different vector lengths. For short lengths (<10e5 elements) the direct integration is faster, which is expected, as the .get() calls and the final sum take their time.
For intermediate lengths (about 1e8 elements) the integration followed the expected time, giving 1 s for the first measurement and 0.26 s for the second.
For long vectors (10e9 elements or more) the second integration takes much more time than expected: more than 3 s, against a similar or greater time for the first.
Why? What is the process that makes the divide and conquer routine slower?
A couple of additional notes: Please note that I pass the vectors by reference, so that cannot be the issue, and keep in mind that this is a test code, thus the subvector creation is not the point of the question.
#include<iostream>
#include<vector>
#include<thread>
#include<future>
#include<ctime>
#include<chrono>
using namespace std;
using namespace chrono;
typedef steady_clock::time_point tt;
double integral(const std::vector<double>& v, double dx) //simpson 1/3
{
int n=v.size();
double in=0.;
if(n%2 == 1) {in+=v[n-1]*v[n-1]; n--;}
in=(v[0]*v[0])+(v[n-1]*v[n-1]);
for(int i=1; i<n/2; i++)
in+= 2.*v[2*i] + 4.*v[2*i+1];
return in*dx/3.;
}
int main()
{
double h=0.001;
vector<double> v1(100000,h); // a vector, content is not important
// subvectors
vector<double> sv1(v1.begin(), v1.begin() + v1.size()/4),
sv2(v1.begin() + v1.size()/4 +1,v1.begin()+ 2*v1.size()/4),
sv3( v1.begin() + 2*v1.size()/4+1, v1.begin() + 3*v1.size()/4+1),
sv4( v1.begin() + 3*v1.size()/4+1, v1.end());
double a,b;
cout << "f1" << endl;
tt bt1 = chrono::steady_clock::now();
// complete integration: should take time t
a=integral(v1, h);
tt et1 = chrono::steady_clock::now();
duration<double> time_span = duration_cast<duration<double>>(et1 - bt1);
cout << time_span.count() << endl;
future<double> f1, f2,f3,f4;
cout << "f2" << endl;
tt bt2 = chrono::steady_clock::now();
// four integrations: should take time t/4
f1 = async(launch::async, integral, ref(sv1), h);
f2 = async(launch::async, integral, ref(sv2), h);
f3 = async(launch::async, integral, ref(sv3), h);
f4 = async(launch::async, integral, ref(sv4), h);
b=f1.get()+f2.get()+f3.get()+f4.get();
tt et2 = chrono::steady_clock::now();
duration<double> time_span2 = duration_cast<duration<double>>(et2 - bt2);
cout << time_span2.count() << endl;
cout << a << " " << b << endl;
return 0;
}
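One thing worth ruling out, although the question sets it aside: the four subvectors are full copies, so the test holds the data twice (a vector of 1e9 doubles occupies about 8 GB, and the copies add as much again), and the copies also skip elements at the chunk boundaries (sv2 starts at v1.size()/4 + 1). Whether memory pressure explains the slowdown on the poster's machine is an assumption. A minimal sketch of the same divide-and-combine idea over index ranges of one shared vector, with no copies; partial_sum and parallel_sum are hypothetical names, and a plain sum stands in for the Simpson rule:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <numeric>
#include <vector>

// Sums the half-open index range [begin, end) of v without copying any data.
double partial_sum(const std::vector<double>& v, std::size_t begin, std::size_t end) {
    return std::accumulate(v.begin() + static_cast<std::ptrdiff_t>(begin),
                           v.begin() + static_cast<std::ptrdiff_t>(end), 0.0);
}

// Splits v into `parts` contiguous ranges, sums each range in its own task,
// and combines the partial results.
double parallel_sum(const std::vector<double>& v, std::size_t parts) {
    assert(parts > 0);
    const std::size_t n = v.size();
    std::vector<std::future<double>> futures;
    for (std::size_t p = 0; p < parts; ++p) {
        // Consecutive ranges share their boundary, so no element is skipped
        // or counted twice.
        const std::size_t begin = n * p / parts;
        const std::size_t end = n * (p + 1) / parts;
        futures.push_back(std::async(std::launch::async, partial_sum,
                                     std::cref(v), begin, end));
    }
    double total = 0.0;
    for (auto& f : futures) total += f.get();
    return total;
}
```

Timing parallel_sum(v1, 4) against partial_sum(v1, 0, v1.size()) with the same steady_clock code would show whether the copies were part of the cost.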

performance tuning on Eigen sparse matrix

I've implemented something using Eigen's SparseMatrix, basically it's something like,
SparseMatrix W;
...
W.row(i) += X.row(j); // X is another SparseMatrix, both W and X are row major.
...
and I did some perf-profiling on the code via google-pprof, and I think the above code is problematic.
[Figures 1-3: google-pprof profiling output, not reproduced]
It looks like operator+= brings in a lot of memory copying.
I don't know much about the internals of SparseMatrix operations, but is there any recommended way to optimize the above code?
If the sparsity pattern of X is a subset of the sparsity pattern of W, then you can write your own function doing the addition in place:
#include <Eigen/Sparse>
#include <cassert>
namespace Eigen {
template<typename Dst, typename Src>
void inplace_sparse_add(Dst &dst, const Src &src)
{
EIGEN_STATIC_ASSERT( ((internal::evaluator<Dst>::Flags&RowMajorBit) == (internal::evaluator<Src>::Flags&RowMajorBit)),
THE_STORAGE_ORDER_OF_BOTH_SIDES_MUST_MATCH);
using internal::evaluator;
evaluator<Dst> dst_eval(dst);
evaluator<Src> src_eval(src);
assert(dst.rows()==src.rows() && dst.cols()==src.cols());
for (Index j=0; j<src.outerSize(); ++j)
{
typename evaluator<Dst>::InnerIterator dst_it(dst_eval, j);
typename evaluator<Src>::InnerIterator src_it(src_eval, j);
while(src_it)
{
while(dst_it && dst_it.index()!=src_it.index())
++dst_it;
assert(dst_it);
dst_it.valueRef() += src_it.value();
++src_it;
}
}
}
}
Here is a usage example:
#include <iostream>
using namespace Eigen;
using namespace std;
int main()
{
int n = 10;
MatrixXd R = MatrixXd::Random(n,n);
SparseMatrix<double, RowMajor> A = R.sparseView(0.25,1), B = 0.5*R.sparseView(0.65,1);
cout << A.toDense() << "\n\n" << B.toDense() << "\n\n";
inplace_sparse_add(A, B);
cout << A.toDense() << "\n\n";
auto Ai = A.row(2);
inplace_sparse_add(Ai, B.row(2));
cout << A.toDense() << "\n\n";
}
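The core of the loop above is a two-pointer walk over sorted indices. A library-free sketch of that idea for a single row, assuming a simple SparseRow struct with sorted column indices (a stand-in for illustration, not Eigen's storage layout):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical compressed row: column indices sorted ascending, one value each.
struct SparseRow {
    std::vector<int> index;
    std::vector<double> value;
};

// Adds src into dst in place. Requires every column index of src to be
// present in dst already, so dst never has to grow or reallocate.
void inplace_row_add(SparseRow& dst, const SparseRow& src) {
    std::size_t d = 0;
    for (std::size_t s = 0; s < src.index.size(); ++s) {
        // Advance in dst until the matching column is found; both index
        // lists are sorted, so d never has to move backwards.
        while (d < dst.index.size() && dst.index[d] != src.index[s]) ++d;
        assert(d < dst.index.size());  // the subset assumption was violated
        dst.value[d] += src.value[s];
    }
}
```

Because nothing is inserted, the cost is linear in the number of stored entries of the two rows and no temporary is created.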

Reading in from file with modern c++ - data is not stored

Maybe I am getting something wrong with shared pointers, or I am missing something basic, but I can't get this right. I want to read some data from a file. Each line of the data file holds position and momentum data, and the first line stores the number of data points.
I need to read this into my data structure, but for some reason my graph is not filled, although the data is read correctly.
const int dim = 3; // dimension of problem
template <typename T, typename G>
// T is the type of the inputted locations and G is the type of the
// distance between them
// for example: int point with float/double distance
struct Node{
std::pair< std::array<T, dim>,std::pair< std::array<T, dim>, G > > pos; // position
std::pair< std::array<T, dim>,std::pair< std::array<T, dim>, G > > mom; // momentum
// a pair indexed by a position in space and has a pair of position
// and the distance between these points
};
template <typename T, typename G>
struct Graph{
int numOfNodes;
std::vector< Node<T,G> > nodes;
};
This is the data structure and here's my read function (std::cout-s are only for testing):
template <typename T, typename G>
std::istream& operator>>(std::istream& is, std::shared_ptr< Graph<T,G> >& graph){
is >> graph->numOfNodes; // there's the number of nodes on the first line of the data file
std::cout << graph->numOfNodes << "\n";
for(int k=0; k<graph->numOfNodes; k++){
Node<T,G> temp;
for(auto i : temp.pos.first){
is >> i;
std::cout << i << "\t";
}
std::cout << "\t";
for(auto i : temp.mom.first){
is >> i;
std::cout << i << "\t";
}
std::cout << "\n";
graph->nodes.push_back(temp);
}
return is;
}
I have an output function as well. When I output the graph that I intended to fill during read-in, it is zeroed out: the number of nodes is correct, but the positions and momenta are all zero. What did I do wrong? Thanks in advance.
for(auto i : temp.pos.first){
is >> i;
std::cout << i << "\t";
}
Think of this as similar to a function. If you have something like:
void doX(int i) { i = 42; }
int main() {
int j=5;
doX(j);
return j;
}
Running this code, you'll see the program returns the value 5. This is because the function doX takes i by value; it basically takes a copy of the variable.
If you replace doX's signature with
void doX(int &i)
and run the code, you'll see it returns 42. This is because the function is now taking the argument by reference, and so can modify it.
Your loops behave similarly. As written, each iteration takes a copy of the next array element rather than a reference to it, so is >> i fills the copy and the array itself is left unchanged.
As with the function, you can change your loops to look like
for(auto &i : temp.pos.first){
is >> i;
std::cout << i << "\t";
}
This should then let you change the values stored in the arrays.
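The difference can be reproduced in isolation. A small sketch, assuming three whitespace-separated numbers in a string; read_by_value and read_by_reference are hypothetical helpers, not part of the question's code:

```cpp
#include <array>
#include <sstream>
#include <string>

// Loop variable is a copy: the extracted numbers are thrown away and the
// array keeps its initial zeros.
std::array<double, 3> read_by_value(const std::string& text) {
    std::istringstream is(text);
    std::array<double, 3> a{};
    for (auto v : a) {
        is >> v;  // writes into the copy only
    }
    return a;
}

// Loop variable is a reference: each extraction writes into the array.
std::array<double, 3> read_by_reference(const std::string& text) {
    std::istringstream is(text);
    std::array<double, 3> a{};
    for (auto& v : a) {
        is >> v;  // writes into the array element itself
    }
    return a;
}
```

Both functions consume the same input, which matches the symptom in the question: the values print correctly while being read, yet only the by-reference version stores them.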

Redefine a single operator of a defined type. C++

I want to redefine the bit shift operator on a 64 bit unsigned integer in c++ in such a way that I can do say, x<<d, where x is a 64 bit integer and d is an integer with |d|<64, to make it equivalent to x<<d for d>0 and x>>|d| for d<0.
The only way I know how to do this is to define a whole new class and overload the << operator, but I think that also means I need to overload all the other operators I need (unless there is a trick I don't know), which seems a bit silly considering I want them to behave exactly as they do for the pre-defined type. It's just the bitshift that I want to change. At present, I have just written a function called 'shift' to do this, which doesn't seem very c++ ish, even though it works fine.
What is the stylistically correct way to do what I need?
Thanks
If you were able to do this, it would be very confusing to other C++ programmers who read your code and see:
int64 x = 92134;
int64 y = x >> 3;
And have it behave differently than their expectations, and behave differently from what the C++ standard defines.
The stylistic choice that agrees most with the C++ code I've seen is to continue using your own myshift() function.
int64 y = myshift(x, 3);
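Such a function can stay very small. A sketch, assuming the name signed_shift (not from the question) and assuming that counts with |d| >= 64 should give 0, a case the question excludes; the guard is there because shifting a 64-bit value by 64 or more is undefined behaviour in C++:

```cpp
#include <cstdint>

// Shifts left for d > 0 and right by |d| for d < 0.
// Counts outside (-64, 64) return 0 instead of invoking undefined behaviour.
std::uint64_t signed_shift(std::uint64_t x, int d) {
    if (d >= 64 || d <= -64) {
        return 0;
    }
    return d >= 0 ? (x << d) : (x >> -d);
}
```

A call then reads signed_shift(x, -3) for a right shift by three bits, and every other operator on the integer keeps its built-in meaning.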
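I think it's quite horrible (and I propose it just for fun), but if you are willing to wrap the shift count in a struct, you can overload operator<< for that struct. Note that this toy version shifts left by |d| and does not shift at all for |d| >= 64; it does not turn a negative count into a right shift.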
#include <iostream>
struct foo
{ int num; };
long long int operator<< (const long long int & lli, const foo & f)
{
int d { f.num };
if ( d < 0 )
d = -d;
if ( d >= 64 )
d = 0;
return lli << d;
}
int main()
{
long long int lli { 1 };
std::cout << (lli << foo{+3}) << std::endl; // shift +3
std::cout << (lli << foo{-3}) << std::endl; // shift +3 (-3 -> +3)
std::cout << (lli << foo{+90}) << std::endl; // no shift (over 64)
std::cout << (lli << foo{-90}) << std::endl; // no shift (over 64)
return 0;
}

Is fftw output depending on size of input?

Over the last week I have been programming some 2-dimensional convolutions with FFTW: I transform both signals to the frequency domain, multiply them, and transform back.
Surprisingly, I am getting the correct result only when the input size is less than a fixed number!
I am posting some working code, in which I take simple constant matrices: a constant value for the input (5 in the code below) and 1 for the filter in the spatial domain. This way, the result of convolving them should be a matrix holding the average of the input values, i.e. the input constant itself. I get the correct output for every width and height up to h=215, w=215. If I set h=216, w=216, or greater, the output gets corrupted! I would really appreciate some clues about where I could be making a mistake. Thank you very much!
#include <fftw3.h>
#include <iostream>
using namespace std;
int main(int argc, char* argv[]) {
int h=215, w=215;
//Input and 1 filter are declared and initialized here
float *in = (float*) fftwf_malloc(sizeof(float)*w*h);
float *identity = (float*) fftwf_malloc(sizeof(float)*w*h);
for(int i=0;i<w*h;i++){
in[i]=5;
identity[i]=1;
}
//Declare two forward plans and one backward
fftwf_plan plan1, plan2, plan3;
//Allocate for complex output of both transforms
fftwf_complex *inTrans = (fftwf_complex*) fftwf_malloc(sizeof(fftwf_complex)*h*(w/2+1));
fftwf_complex *identityTrans = (fftwf_complex*) fftwf_malloc(sizeof(fftwf_complex)*h*(w/2+1));
//Initialize forward plans
plan1 = fftwf_plan_dft_r2c_2d(h, w, in, inTrans, FFTW_ESTIMATE);
plan2 = fftwf_plan_dft_r2c_2d(h, w, identity, identityTrans, FFTW_ESTIMATE);
//Execute them
fftwf_execute(plan1);
fftwf_execute(plan2);
//Multiply in frequency domain. Theoretically, no need to multiply imaginary parts; since signals are real and symmetric
//their transforms are also real, identityTrans[i][1] = 0, but I leave this here for a more generic implementation.
for(int i=0; i<(w/2+1)*h; i++){
//Save the real part first: it is still needed after inTrans[i][0] has been overwritten
float re = inTrans[i][0];
inTrans[i][0] = re*identityTrans[i][0] - inTrans[i][1]*identityTrans[i][1];
inTrans[i][1] = re*identityTrans[i][1] + inTrans[i][1]*identityTrans[i][0];
}
//Execute inverse transform, store result in identity, where identity filter lied.
plan3 = fftwf_plan_dft_c2r_2d(h, w, inTrans, identity, FFTW_ESTIMATE);
fftwf_execute(plan3);
//Output first results of convolution(in, identity) to see if they are the average of in.
for(int i=0;i<h/h+4;i++){
for(int j=0;j<w/w+4;j++){
std::cout<<"After convolution, component (" << i <<","<< j << ") is " << identity[j+i*w]/(w*h*w*h) << endl;
}
}std::cout<<endl;
//Compute average of data
float sum=0.0;
for(int i=0; i<w*h;i++)
sum+=in[i];
std::cout<<"Mean of input was " << (float)sum/(w*h) << endl;
std::cout<< endl;
fftwf_destroy_plan(plan1);
fftwf_destroy_plan(plan2);
fftwf_destroy_plan(plan3);
return 0;
}
Your problem has nothing to do with FFTW! It comes from this line:
std::cout<<"After convolution, component (" << i <<","<< j << ") is " << identity[j+i*w]/(w*h*w*h) << endl;
If w=216 and h=216, then w*h*w*h = 2 176 782 336. The upper limit of a signed 32-bit integer is 2 147 483 647, so you are facing an overflow.
The solution is to cast the denominator to float:
std::cout<<"After convolution, component (" << i <<","<< j << ") is " << identity[j+i*w]/(((float)w)*h*w*h) << endl;
The next problem you are going to face is this one:
float sum=0.0;
for(int i=0; i<w*h;i++)
sum+=in[i];
Remember that a float has about 7 significant decimal digits. If w=h=4000, the computed average will be lower than the real one. Use a double, or write two loops and sum over the inner loop (localsum) before adding to the outer sum (sum+=localsum)!
Bye,
Francis
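Both fixes can be sketched without FFTW. The helpers below (normalization and mean are hypothetical names) do the arithmetic in double, so the product w*h*w*h is never formed in int and the running sum keeps enough digits:

```cpp
#include <vector>

// Scale factor used in the output line, computed in double so that the
// product cannot overflow a 32-bit int.
double normalization(int w, int h) {
    return static_cast<double>(w) * h * w * h;
}

// Mean of float data accumulated in a double, which avoids the precision
// loss of a float accumulator for large inputs.
double mean(const std::vector<float>& data) {
    if (data.empty()) {
        return 0.0;
    }
    double sum = 0.0;
    for (float v : data) {
        sum += v;
    }
    return sum / static_cast<double>(data.size());
}
```

In the posted program this would mean dividing identity[j+i*w] by normalization(w, h) and declaring sum as double.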
