Separating the image background using OpenCV

I'm trying to subtract the background from this image. Specifically, I just want to remove the green background. Here is the code I'm using:
Mat img_object = imread(patternImageName);
Mat imageInHSV;
cvtColor(img_object, imageInHSV, CV_BGR2HSV);
Mat chan[3],imgThreshed, processed;
split( imageInHSV, chan );
Mat H = chan[0];
// compute statistics for Hue value
cv::Scalar mean, stddev;
cv::meanStdDev(H, mean, stddev);
// ensure we get 95% of all valid Hue samples (statistics 3*sigma rule)
float minHue = 80;
float maxHue = 95;
cout << "MinValue :" << mean[0] << " MaxHue:" << stddev[0] << endl;
cout << H << endl;
// STEP 2: detection phase
cv::inRange(H, cv::Scalar(minHue), cv::Scalar(maxHue), imgThreshed);
imshow("thresholded", imgThreshed);
I inspected the values of the H channel to decide on minHue and maxHue, and chose the interval containing the most frequent values in the matrix, which should be the green background. However, I got this result, which is obviously not what I'm looking for because parts of it are missing. Any idea how to improve it? How can I better subtract the background from this kind of image?

I am not sure exactly what your goal is. However, I got a much better result on your sample image from the other two channels (Saturation and Value rather than Hue), using the range [mean-stddev, mean+stddev]. Averaging the results of all three channels shows some improvement:
#include <iostream>
#include <opencv2/opencv.hpp>
using namespace std ;
using namespace cv ;
int main()
{
Mat img_object = imread("1.png");
Mat imageInHSV;
cvtColor(img_object, imageInHSV, CV_BGR2HSV);
Mat chan[3];
split( imageInHSV, chan );
Mat result ;
Mat threshImg[3] ;
for(int i=0 ; i<3 ; i++)
{
Mat H = chan[i];
// compute statistics for each channel
cv::Scalar mean, stddev;
cv::meanStdDev(H, mean, stddev);
// statistically 68% of data should be in this range
float minVal = mean[0]-stddev[0];
float maxVal = mean[0]+stddev[0];
cout << "Channel " << i << " mean: " << mean[0] << " stddev: " << stddev[0] << endl;
// Separating the dominant 68% which we guess should be the background.
cv::inRange(H, cv::Scalar(minVal), cv::Scalar(maxVal), threshImg[i]);
}
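// averaging the results from three different channels (Hue, Saturation, Value).
// each 8-bit mask is scaled before adding, because adding the masks first would saturate at 255.
result = threshImg[0]/3 + threshImg[1]/3 + threshImg[2]/3 ;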
imwrite("thresholded_012.jpg", result) ;
}
Input image:
Output image:
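The thresholding idea in this answer can also be written without OpenCV, which makes the arithmetic easier to inspect. What follows is only a minimal sketch, assuming a single channel stored as a flat `std::vector<std::uint8_t>`. The helpers `channelStats`, `rangeMask` and `averageMasks` are hypothetical names, not OpenCV functions. The sketch uses the population standard deviation (dividing by N) and adds the three masks in `int` before dividing, so the sum cannot clip at 255.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Mean and population standard deviation of one 8-bit channel.
std::pair<double, double> channelStats(const std::vector<std::uint8_t>& ch) {
    assert(!ch.empty());
    double sum = 0.0, sumSq = 0.0;
    for (std::uint8_t v : ch) {
        sum += v;
        sumSq += static_cast<double>(v) * v;
    }
    const double mean = sum / ch.size();
    const double var = sumSq / ch.size() - mean * mean;
    return std::make_pair(mean, std::sqrt(var > 0.0 ? var : 0.0));
}

// 255 where the sample lies in [mean - stddev, mean + stddev], else 0.
std::vector<std::uint8_t> rangeMask(const std::vector<std::uint8_t>& ch) {
    const std::pair<double, double> s = channelStats(ch);
    const double lo = s.first - s.second;
    const double hi = s.first + s.second;
    std::vector<std::uint8_t> mask(ch.size());
    for (std::size_t i = 0; i < ch.size(); ++i) {
        mask[i] = (ch[i] >= lo && ch[i] <= hi) ? 255 : 0;
    }
    return mask;
}

// Average three masks, accumulating in int so the sum cannot clip at 255.
std::vector<std::uint8_t> averageMasks(const std::vector<std::uint8_t>& a,
                                       const std::vector<std::uint8_t>& b,
                                       const std::vector<std::uint8_t>& c) {
    assert(a.size() == b.size() && b.size() == c.size());
    std::vector<std::uint8_t> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        const int total = int(a[i]) + int(b[i]) + int(c[i]);
        out[i] = static_cast<std::uint8_t>(total / 3);
    }
    return out;
}
```

With this averaging, a pixel selected by one, two or three channels ends up at 85, 170 or 255, so the output can be thresholded again to require agreement between channels.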

Related

How to quickly calculate A'A in Eigen, where A is a sparse matrix?

As shown in the question, is there any sample code to calculate this matrix multiplication?
Here is a link to Dense matrix.
I think this example demonstrates what you need.
#include <iostream>
#include <Eigen/Dense>
#include <Eigen/Sparse>
int main()
{
int m = 8;
int n = 5;
//Eigen has no built-in random sparse function (that I know of!)
Eigen::MatrixXd A_dense = Eigen::MatrixXd::Random(m, n);
//create a sparse copy to demonstrate functionality
Eigen::SparseMatrix<double> A = A_dense.sparseView();
//create a sparse matrix of compatible dimensions for A^T * A
Eigen::SparseMatrix<double> ATA(n, n);
//compute A^T * A, filling only the lower triangle
ATA.selfadjointView<Eigen::Lower>().rankUpdate(A.transpose(), 1.0);
//print A in dense format so it's readable
std::cout << A.toDense() << "\n\n";
//print ATA in dense format so it's readable
std::cout << ATA.toDense() << "\n\n";
//check with the intuitive / less optimized operation
std::cout << (A.transpose() * A).toDense() << "\n\n";
return 0;
}
This will use a specialized routine to calculate A^T * A and will only compute your preferred triangle (in this case, the lower half) as the result is a symmetric matrix.
My output:
-0.997497 0.64568 -0.817194 -0.982177 -0.0984222
0.127171 0.49321 -0.271096 -0.24424 -0.295755
-0.613392 -0.651784 -0.705374 0.0633259 -0.885922
0.617481 0.717887 -0.668203 0.142369 0.215369
0.170019 0.421003 0.97705 0.203528 0.566637
-0.0402539 0.0270699 -0.108615 0.214331 0.605213
-0.299417 -0.39201 -0.761834 -0.667531 0.0397656
0.791925 -0.970031 -0.990661 0.32609 -0.3961
2.51603 0 0 0 0
-0.318591 2.87295 0 0 0
0.414807 0.986724 4.21357 0 0
1.48181 -0.656854 1.09012 1.6879 0
0.483357 1.1462 1.49161 0.232797 1.77424
2.51603 -0.318591 0.414807 1.48181 0.483357
-0.318591 2.87295 0.986724 -0.656854 1.1462
0.414807 0.986724 4.21357 1.09012 1.49161
1.48181 -0.656854 1.09012 1.6879 0.232797
0.483357 1.1462 1.49161 0.232797 1.77424
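To make the "only one triangle" idea concrete without depending on Eigen, here is a minimal sketch. It is not Eigen's routine: it assumes a hypothetical storage where each column of A is a list of (row, value) pairs sorted by row, and the names `SparseColumn`, `sparseDot` and `lowerAtA` are made up for illustration. Entry (r, c) of A^T * A is the dot product of columns r and c, so only the pairs with c <= r are evaluated.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// One sparse column: (row index, value) pairs sorted by row index.
using SparseColumn = std::vector<std::pair<int, double>>;

// Dot product of two sorted sparse columns (merge-style traversal).
double sparseDot(const SparseColumn& a, const SparseColumn& b) {
    double acc = 0.0;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i].first < b[j].first) {
            ++i;
        } else if (a[i].first > b[j].first) {
            ++j;
        } else {
            acc += a[i].second * b[j].second;
            ++i;
            ++j;
        }
    }
    return acc;
}

// Dense n x n matrix holding only the lower triangle of A^T * A;
// the strictly upper part is left at zero.
std::vector<std::vector<double>> lowerAtA(const std::vector<SparseColumn>& cols) {
    const std::size_t n = cols.size();
    std::vector<std::vector<double>> out(n, std::vector<double>(n, 0.0));
    for (std::size_t r = 0; r < n; ++r) {
        for (std::size_t c = 0; c <= r; ++c) {
            out[r][c] = sparseDot(cols[r], cols[c]);
        }
    }
    return out;
}
```

For the 3 x 2 matrix with columns (1, 2, 0) and (0, 3, 4), this yields the lower triangle 5 / 6, 25, which mirrors the zero-filled upper half in the second matrix printed above.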

Unexpected and large runtime variations in Eigen for matrix multiplies

I am comparing ways to perform equivalent matrix operations within Eigen, and am getting extraordinarily different runtimes, including some non-intuitive results.
I am comparing three mathematically equivalent forms of the matrix multiplication:
wx * transpose(data)
The three forms I'm comparing are:
result = wx * data.transpose() (straight multiply version)
result.noalias() = wx * data.transpose() (noalias version)
result = (data * wx.transpose()).transpose() (transposed version)
I am also testing using both Column Major and Row Major storage.
With column major storage, the transposed version is significantly faster (an order of magnitude) than both the straight multiply and the no alias version, which are both approximately equal in runtime.
With row major storage, the noalias and the transposed version are both significantly faster than the straight multiply in runtime.
I understand that Eigen uses lazy evaluation, and that the immediate results returned from an operation are often expression templates, and are not the intermediate values. I also understand that matrix * matrix operations will always produce a temporary when they are the last operation on the right hand side, to avoid aliasing issues, hence why I am attempting to speed things up through noalias().
My main questions:
Why is the transposed version always significantly faster, even (in the case of column major storage) when I explicitly state noalias so no temporaries are created?
Why does the (significant) difference in runtime only occur between the straight multiply and the noalias version when using column major storage?
The code I am using for this is below. It is being compiled using gcc 4.9.2, on a Centos 6 install, using the following command line.
g++ eigen_test.cpp -O3 -std=c++11 -o eigen_test -pthread -fopenmp -finline-functions
using Matrix = Eigen::Matrix<float, Eigen::Dynamic, Eigen::Dynamic, Eigen::ColMajor>;
// using Matrix = Eigen::Matrix<float, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor>;
int wx_rows = 8000;
int wx_cols = 1000;
int samples = 1;
// Eigen::MatrixXf matrix = Eigen::MatrixXf::Random(matrix_rows, matrix_cols);
Matrix wx = Eigen::MatrixXf::Random(wx_rows, wx_cols);
Matrix data = Eigen::MatrixXf::Random(samples, wx_cols);
Matrix result;
unsigned int iterations = 10000;
float sum = 0;
auto before = std::chrono::high_resolution_clock::now();
for (unsigned int ii = 0; ii < iterations; ++ii)
{
result = wx * data.transpose();
sum += result(result.rows() - 1, result.cols() - 1);
}
auto after = std::chrono::high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(after - before).count();
std::cout << "original sum: " << sum << std::endl;
std::cout << "original time (ms): " << duration << std::endl;
std::cout << std::endl;
sum = 0;
before = std::chrono::high_resolution_clock::now();
for (unsigned int ii = 0; ii < iterations; ++ii)
{
result.noalias() = wx * data.transpose();
sum += result(wx_rows - 1, samples - 1);
}
after = std::chrono::high_resolution_clock::now();
duration = std::chrono::duration_cast<std::chrono::milliseconds>(after - before).count();
std::cout << "alias sum: " << sum << std::endl;
std::cout << "alias time (ms) : " << duration << std::endl;
std::cout << std::endl;
sum = 0;
before = std::chrono::high_resolution_clock::now();
for (unsigned int ii = 0; ii < iterations; ++ii)
{
result = (data * wx.transpose()).transpose();
sum += result(wx_rows - 1, samples - 1);
}
after = std::chrono::high_resolution_clock::now();
duration = std::chrono::duration_cast<std::chrono::milliseconds>(after - before).count();
std::cout << "new sum: " << sum << std::endl;
std::cout << "new time (ms) : " << duration << std::endl;
One half of the explanation is that, in the current version of Eigen, multi-threading is achieved by splitting the work over blocks of columns of the result (and of the right-hand side). With only 1 column, multi-threading does not take place. In the column-major case, this explains why cases 1 and 2 underperform. On the other hand, case 3 is evaluated as:
column_major_tmp.noalias() = data * wx.transpose();
result = column_major_tmp.transpose();
and since wx.transpose().cols() is huge, multi-threading is effective.
To understand the row-major case, you also need to know that internally matrix products are implemented for a column-major destination. If the destination is row-major, as in case 2, then the product is transposed, so what really happens is:
row_major_result.transpose().noalias() = data * wx.transpose();
and so we're back to case 3 but without temporary.
This is clearly a limitation of current Eigen's multi-threading implementation for highly unbalanced matrix sizes. Ideally threads should be spread on row-block and/or column-block depending on the size of the matrices at hand.
BTW, you should also compile with -march=native to let Eigen fully exploit your CPU (AVX, FMA, AVX512...).

Iterative multiplication of relative transforms leads to fast instability

I'm writing a program that receives Eigen transforms and stores them in a container after applying some noise. In particular, at time k, I receive transform T_k. I get from the container the transform T_{k-1}, create delta = T_{k-1}^{-1} · T_k, apply some noise to delta, and store T_{k-1} · delta as a new element of the container.
I've noticed that after 50 iterations the values are completely wrong and at every iteration I see that the last element of the container, when pre-multiplied by its inverse, is not even equal to the identity.
I've already checked that the container follows the rules of allocation specified by Eigen.
I think the problem is related to the instability of the operations I'm doing.
The following simple code produces nonzero values when max = 35, and the values go to infinity when max is greater than 60.
Eigen::Isometry3d my_pose = Eigen::Isometry3d::Identity();
my_pose.translate(Eigen::Vector3d::Random());
my_pose.rotate(Eigen::Quaterniond::UnitRandom());
Eigen::Isometry3d my_other_pose = my_pose;
int max = 35;
for(int i=0; i < max; i++)
{
my_pose = my_pose * my_pose.inverse() * my_pose;
}
std::cerr << my_pose.matrix() - my_other_pose.matrix() << std::endl;
I'm surprised how fast the divergence happens. Since my real program is expected to iterate more than hundreds of times, is there a way to create relative transforms that are more stable?
Yes, use a Quaterniond for the rotations:
Eigen::Isometry3d my_pose = Eigen::Isometry3d::Identity();
my_pose.translate(Eigen::Vector3d::Random());
my_pose.rotate(Eigen::Quaterniond::UnitRandom());
Eigen::Isometry3d my_other_pose = my_pose;
Eigen::Quaterniond q(my_pose.rotation());
int max = 35;
for (int i = 0; i < max; i++) {
std::cerr << q.matrix() << "\n\n";
std::cerr << my_pose.matrix() << "\n\n";
q = q * q.inverse() * q;
my_pose = my_pose * my_pose.inverse() * my_pose;
}
std::cerr << q.matrix() - Eigen::Quaterniond(my_other_pose.rotation()).matrix() << "\n";
std::cerr << my_pose.matrix() - my_other_pose.matrix() << std::endl;
If you had examined the difference you printed out, you would have seen that the rotation part of the matrix accumulates a huge error, while the translation part is tolerable. The inverse of the rotation matrix hits stability issues quickly, so using it directly is usually not recommended.
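One way to see how quickly such an error can grow is a small quaternion experiment that needs no Eigen at all. The sketch below is an illustration under assumptions, not a description of Eigen's internals: it assumes that an inverse which takes the rotation for exactly unit length (a conjugate for quaternions, a transpose for rotation matrices) is what amplifies the error. The `Quat` struct and the helpers `mul`, `conjugate`, `inverse`, `normalized` and `normDrift` are hypothetical names. With the conjugate standing in for the inverse, q * conj(q) * q equals |q|^2 * q, so the norm is cubed on every pass; with the true inverse the norm is preserved.

```cpp
#include <cmath>

struct Quat { double w, x, y, z; };

// Hamilton product a * b.
Quat mul(const Quat& a, const Quat& b) {
    return {a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z,
            a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,
            a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,
            a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w};
}

double squaredNorm(const Quat& q) {
    return q.w * q.w + q.x * q.x + q.y * q.y + q.z * q.z;
}

double norm(const Quat& q) { return std::sqrt(squaredNorm(q)); }

// Inverse that is only valid for unit quaternions.
Quat conjugate(const Quat& q) { return {q.w, -q.x, -q.y, -q.z}; }

// True inverse: conjugate divided by the squared norm.
Quat inverse(const Quat& q) {
    const double n2 = squaredNorm(q);
    return {q.w / n2, -q.x / n2, -q.y / n2, -q.z / n2};
}

// Rescale to unit length.
Quat normalized(const Quat& q) {
    const double n = norm(q);
    return {q.w / n, q.x / n, q.y / n, q.z / n};
}

// Repeats q = q * inv(q) * q and returns |norm(q) - 1| afterwards.
// With assumeUnit == true the conjugate stands in for the inverse.
double normDrift(Quat q, int iterations, bool assumeUnit) {
    for (int i = 0; i < iterations; ++i) {
        const Quat inv = assumeUnit ? conjugate(q) : inverse(q);
        q = mul(mul(q, inv), q);
    }
    return std::fabs(norm(q) - 1.0);
}
```

Starting from a quaternion whose norm is off by only 1e-10, the unit-length shortcut drifts visibly within about 20 iterations, while the true inverse stays at the initial error. In a long-running program it is therefore common to renormalize the rotation after each update, for example with `normalized` here or `Eigen::Quaterniond::normalize()` in Eigen.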

Async doesn't work for long vectors

I am doing some parallel programming with async. I have an integrator, and in a test program I wanted to see whether dividing a vector into 4 subvectors actually takes one fourth of the time to complete the task.
I had an initial issue about the time measured, now solved as steady_clock() measures real and not CPU time.
I tried the code with different vector lengths. For short lengths (<10e5 elements) the direct integration is faster, which is normal, as the .get() calls and the sum take their time.
For intermediate lengths (about 1e8 elements) the integration followed the expected time, giving 1 s as the first time and 0.26 s for the second time.
For long vectors (10e9 or higher) the second integration takes much more time than the first, more than 3 s against a similar or greater time.
Why? What is the process that makes the divide and conquer routine slower?
A couple of additional notes: Please note that I pass the vectors by reference, so that cannot be the issue, and keep in mind that this is a test code, thus the subvector creation is not the point of the question.
#include<iostream>
#include<vector>
#include<thread>
#include<future>
#include<ctime>
#include<chrono>
using namespace std;
using namespace chrono;
typedef steady_clock::time_point tt;
double integral(const std::vector<double>& v, double dx) //simpson 1/3
{
int n=v.size();
double in=0.;
if(n%2 == 1) {in+=v[n-1]*v[n-1]; n--;}
in=(v[0]*v[0])+(v[n-1]*v[n-1]);
for(int i=1; i<n/2; i++)
in+= 2.*v[2*i] + 4.*v[2*i+1];
return in*dx/3.;
}
int main()
{
double h=0.001;
vector<double> v1(100000,h); // a vector, content is not important
// subvectors
vector<double> sv1(v1.begin(), v1.begin() + v1.size()/4),
sv2(v1.begin() + v1.size()/4 +1,v1.begin()+ 2*v1.size()/4),
sv3( v1.begin() + 2*v1.size()/4+1, v1.begin() + 3*v1.size()/4+1),
sv4( v1.begin() + 3*v1.size()/4+1, v1.end());
double a,b;
cout << "f1" << endl;
tt bt1 = chrono::steady_clock::now();
// complete integration: should take time t
a=integral(v1, h);
tt et1 = chrono::steady_clock::now();
duration<double> time_span = duration_cast<duration<double>>(et1 - bt1);
cout << time_span.count() << endl;
future<double> f1, f2,f3,f4;
cout << "f2" << endl;
tt bt2 = chrono::steady_clock::now();
// four integrations: should take time t/4
f1 = async(launch::async, integral, ref(sv1), h);
f2 = async(launch::async, integral, ref(sv2), h);
f3 = async(launch::async, integral, ref(sv3), h);
f4 = async(launch::async, integral, ref(sv4), h);
b=f1.get()+f2.get()+f3.get()+f4.get();
tt et2 = chrono::steady_clock::now();
duration<double> time_span2 = duration_cast<duration<double>>(et2 - bt2);
cout << time_span2.count() << endl;
cout << a << " " << b << endl;
return 0;
}

Is fftw output depending on size of input?

Over the last week I have been programming some 2-dimensional convolutions with FFTW, by transforming both signals to the frequency domain, multiplying, and then transforming back.
Surprisingly, I am getting the correct result only when input size is less than a fixed number!
I am posting some working code, in which I take simple initial constant matrices of value 2 for the input, and 1 for the filter in the spatial domain. This way, the result of convolving them should be a matrix of the average of the first matrix values, i.e., 2, since it is constant. This is the output I get when I vary the width and height from 0 up to h=215, w=215. If I set h=216, w=216, or greater, then the output gets corrupted! I would really appreciate some clues about where I could be making a mistake. Thank you very much!
#include <fftw3.h>
int main(int argc, char* argv[]) {
int h=215, w=215;
//Input and 1 filter are declared and initialized here
float *in = (float*) fftwf_malloc(sizeof(float)*w*h);
float *identity = (float*) fftwf_malloc(sizeof(float)*w*h);
for(int i=0;i<w*h;i++){
in[i]=5;
identity[i]=1;
}
//Declare two forward plans and one backward
fftwf_plan plan1, plan2, plan3;
//Allocate for complex output of both transforms
fftwf_complex *inTrans = (fftwf_complex*) fftw_malloc(sizeof(fftwf_complex)*h*(w/2+1));
fftwf_complex *identityTrans = (fftwf_complex*) fftw_malloc(sizeof(fftwf_complex)*h*(w/2+1));
//Initialize forward plans
plan1 = fftwf_plan_dft_r2c_2d(h, w, in, inTrans, FFTW_ESTIMATE);
plan2 = fftwf_plan_dft_r2c_2d(h, w, identity, identityTrans, FFTW_ESTIMATE);
//Execute them
fftwf_execute(plan1);
fftwf_execute(plan2);
//Multiply in frequency domain. Theoretically, no need to multiply imaginary parts; since signals are real and symmetric
//their transform are also real, identityTrans[i][i] = 0, but i leave here this for more generic implementation.
for(int i=0; i<(w/2+1)*h; i++){
inTrans[i][0] = inTrans[i][0]*identityTrans[i][0] - inTrans[i][1]*identityTrans[i][1];
inTrans[i][1] = inTrans[i][0]*identityTrans[i][1] + inTrans[i][1]*identityTrans[i][0];
}
//Execute inverse transform, store result in identity, where identity filter lied.
plan3 = fftwf_plan_dft_c2r_2d(h, w, inTrans, identity, FFTW_ESTIMATE);
fftwf_execute(plan3);
//Output first results of convolution(in, identity) to see if they are the average of in.
for(int i=0;i<h/h+4;i++){
for(int j=0;j<w/w+4;j++){
std::cout<<"After convolution, component (" << i <<","<< j << ") is " << identity[j+i*w]/(w*h*w*h) << endl;
}
}std::cout<<endl;
//Compute average of data
float sum=0.0;
for(int i=0; i<w*h;i++)
sum+=in[i];
std::cout<<"Mean of input was " << (float)sum/(w*h) << endl;
std::cout<< endl;
fftwf_destroy_plan(plan1);
fftwf_destroy_plan(plan2);
fftwf_destroy_plan(plan3);
return 0;
}
Your problem has nothing to do with FFTW! It comes from this line:
std::cout<<"After convolution, component (" << i <<","<< j << ") is " << identity[j+i*w]/(w*h*w*h) << endl;
If w=216 and h=216, then w*h*w*h = 2 176 782 336. The upper limit of a signed 32-bit integer is 2 147 483 647, so you are facing an overflow.
The solution is to cast the denominator to float:
std::cout<<"After convolution, component (" << i <<","<< j << ") is " << identity[j+i*w]/(((float)w)*h*w*h) << endl;
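The size limit can be checked directly by evaluating the product in a wider type. This is a small sketch under the assumption that the dimensions are ordinary non-negative `int` values; `normalizationCount` and `fitsInInt32` are hypothetical helper names introduced here, not part of FFTW.

```cpp
#include <cstdint>
#include <limits>

// w*h*w*h evaluated in 64 bits. This stays exact for dimensions up to a few
// tens of thousands, far beyond the point where a 32-bit int overflows.
std::int64_t normalizationCount(int w, int h) {
    const std::int64_t n = static_cast<std::int64_t>(w) * h;
    return n * n;
}

// True when the same product would still fit in a signed 32-bit int.
bool fitsInInt32(int w, int h) {
    return normalizationCount(w, h) <= std::numeric_limits<std::int32_t>::max();
}
```

For w = h = 215 the product is 2 136 750 625 and still fits, while w = h = 216 gives 2 176 782 336, which matches the size at which the output started to look corrupted.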
The next trouble that you are going to face is this one:
float sum=0.0;
for(int i=0; i<w*h;i++)
sum+=in[i];
Remember that a float has about 7 significant decimal digits. If w=h=4000, the computed average will be lower than the real one. Use a double, or write two loops and sum in the inner loop (localsum) before accumulating in the outer loop (sum+=localsum)!
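The three options can be compared side by side. The sketch below assumes the data is held in a `std::vector<float>` and that the build uses strict IEEE-754 single precision (no fast-math style reordering). `sumNaive`, `sumDouble` and `sumBlocked` are hypothetical names, and the block length plays the role of the inner loop (localsum) described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Single float accumulator: small addends are lost once the total is large.
float sumNaive(const std::vector<float>& v) {
    float sum = 0.0f;
    for (float x : v) sum += x;
    return sum;
}

// Double accumulator: enough precision for image-sized inputs.
double sumDouble(const std::vector<float>& v) {
    double sum = 0.0;
    for (float x : v) sum += x;
    return sum;
}

// Two-level float sum: add each block into a local sum, then add the local sums.
float sumBlocked(const std::vector<float>& v, std::size_t block) {
    assert(block > 0);
    float total = 0.0f;
    for (std::size_t start = 0; start < v.size(); start += block) {
        const std::size_t end = std::min(start + block, v.size());
        float local = 0.0f;
        for (std::size_t i = start; i < end; ++i) local += v[i];
        total += local;
    }
    return total;
}
```

As a worked case, summing 17 000 000 values of 1.0f with `sumNaive` stalls at 16 777 216 (2^24), because adding 1 to that total no longer changes a float, whereas `sumDouble` and `sumBlocked` with a block of 4000 both reach 17 000 000.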
Bye,
Francis
