Suppose we have a connected component in an image, as the following image illustrates:
My question is how I can calculate the bounding ellipse of the connected component (the red ellipse in the image). I have checked the MATLAB function regionprops and understand how MATLAB does it. I also noticed that OpenCV has a similar function, CBlob::GetEllipse(). However, although I understand how they obtain the result from reading the code, the underlying theory is still unclear to me. I am therefore wondering whether there are standard algorithms for this job. Thanks!
Based on the comments, I have reorganized my question: on the Wikipedia page for image moments, the formula for the angle of the longest axis is
theta = 1/2 * arctan( 2*mu'11 / (mu'20 - mu'02) )
However, in the MATLAB function regionprops, the code is as follows:
% Calculate orientation.
if (uyy > uxx)
    num = uyy - uxx + sqrt((uyy - uxx)^2 + 4*uxy^2);
    den = 2*uxy;
else
    num = 2*uxy;
    den = uxx - uyy + sqrt((uxx - uyy)^2 + 4*uxy^2);
end
This implementation is inconsistent with the formula in Wikipedia. I was wondering which one is correct.
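For what it's worth, the two expressions look like the same angle written in two ways. With tan(2*theta) = 2*uxy / (uxx - uyy), the half-angle identities tan(theta) = sin(2*theta) / (1 + cos(2*theta)) = (1 - cos(2*theta)) / sin(2*theta) give the two num/den pairs in the snippet. The minimal Python sketch below checks this numerically. It assumes the two pairs are the branches of an if/else and that the angle is atan(num/den); it ignores sign conventions such as the direction of the y axis, and the function names are made up for illustration.

```python
import math

def orientation_wikipedia(uxx, uyy, uxy):
    """Half-angle form: theta = 0.5 * atan2(2*uxy, uxx - uyy)."""
    return 0.5 * math.atan2(2.0 * uxy, uxx - uyy)

def orientation_branching(uxx, uyy, uxy):
    """Branching form, assuming the quoted num/den pairs are an if/else."""
    if uyy > uxx:
        num = uyy - uxx + math.sqrt((uyy - uxx) ** 2 + 4.0 * uxy ** 2)
        den = 2.0 * uxy
    else:
        num = 2.0 * uxy
        den = uxx - uyy + math.sqrt((uxx - uyy) ** 2 + 4.0 * uxy ** 2)
    if num == 0 and den == 0:
        return 0.0  # isotropic shape: orientation is undefined, use 0
    if den == 0:
        return math.copysign(math.pi / 2.0, num)  # vertical major axis
    return math.atan(num / den)

# Compare both forms on a few hand-picked moment triples (uxx, uyy, uxy).
cases = [(4.0, 1.0, 0.5), (1.0, 4.0, 0.5), (1.0, 4.0, -0.5),
         (2.0, 2.0, 1.0), (1.0, 3.0, 0.0), (3.0, 3.0, 0.0)]
differences = [abs(orientation_wikipedia(*c) - orientation_branching(*c))
               for c in cases]
```

The branching form avoids dividing by a small uxx - uyy, which may be why an implementation would prefer it.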

If you're looking for an OpenCV implementation, then I can give you one. The algorithm is the following:
1. Convert the image to 1-bit (black and white)
2. Find all contours
3. Create one contour that contains all points from the found contours
4. Calculate the convex hull of this contour
5. Fit a rotated ellipse (returned as a rotated rectangle) to the hull from the previous step; note that fitEllipse is a least-squares fit, so the result is not guaranteed to be the minimal-area enclosing ellipse
Here's code:
Mat src = imread("ellipse.jpg"), tmp;
vector<Vec4i> hierarchy;
vector<vector<Point> > contours;
vector<Point> bigContour, hull;
RotatedRect ell;
//step 1
cvtColor(src, tmp, CV_BGR2GRAY);
threshold(tmp, tmp, 100, 255, THRESH_BINARY);
//step 2
findContours(tmp, contours, hierarchy, CV_RETR_EXTERNAL, CV_CHAIN_APPROX_SIMPLE);
//step 3
for (size_t i=0; i<contours.size(); i++)
    for (size_t j=0; j<contours[i].size(); j++)
        bigContour.push_back(contours[i][j]); // merge every contour point into one big contour
//step 4
convexHull(bigContour, hull);
//step 5
ell = fitEllipse(hull);
//drawing result
ellipse(src, ell, Scalar(0,0,255), 2);
imshow("result", src);
This is the input:
And here's a result:

I was also trying to find out what algorithm is behind it so I could write my own implementation. I found it in a MathWorks blog post. In one of the comments, the author says:
regionprops calculates the 2nd-order moments of the object in question and then returns measurements of the ellipse with the same 2nd-order moments.
and later:
The equations used are from Haralick and Shapiro, Computer and Robot Vision vol. 1, Appendix A, Addison-Wesley 1992. I did a sanity check by constructing an image containing an ellipse with major axis length = 100 and minor axis length = 50, and regionprops returned the correct measurements.
I don't have that book but seems I'll need to get a copy of it.
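To make the quoted idea concrete, here is a minimal Python sketch of "the ellipse with the same 2nd-order moments". It relies on the fact that a filled ellipse with semi-axis a has variance a^2/4 along that axis, so each full axis length is 4*sqrt(eigenvalue) of the pixel covariance matrix. The function name is made up, and the sketch ignores any per-pixel correction terms a real implementation may add, so treat it as an illustration rather than what regionprops actually runs.

```python
import math

def ellipse_from_moments(pixels):
    """Return (cx, cy, major_len, minor_len, angle_rad) of the ellipse that has
    the same normalized second central moments as the given (x, y) pixels."""
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    uxx = sum((x - cx) ** 2 for x, _ in pixels) / n
    uyy = sum((y - cy) ** 2 for _, y in pixels) / n
    uxy = sum((x - cx) * (y - cy) for x, y in pixels) / n
    common = math.sqrt((uxx - uyy) ** 2 + 4.0 * uxy ** 2)
    # Eigenvalues of the covariance matrix [[uxx, uxy], [uxy, uyy]].
    lam_major = (uxx + uyy + common) / 2.0
    lam_minor = (uxx + uyy - common) / 2.0
    # Variance along an axis of a filled ellipse is (semi-axis)^2 / 4,
    # so the full axis length is 4 * sqrt(variance).
    major_len = 4.0 * math.sqrt(lam_major)
    minor_len = 4.0 * math.sqrt(max(lam_minor, 0.0))
    angle = 0.5 * math.atan2(2.0 * uxy, uxx - uyy)
    return cx, cy, major_len, minor_len, angle

# Sanity check in the spirit of the quoted comment: rasterize an axis-aligned
# ellipse with full axis lengths 100 and 50, then measure it back.
pixels = [(x, y) for x in range(-60, 61) for y in range(-60, 61)
          if (x / 50.0) ** 2 + (y / 25.0) ** 2 <= 1.0]
cx, cy, major_len, minor_len, angle = ellipse_from_moments(pixels)
```

Note that this ellipse matches the moments of the region; it is not a bounding ellipse in the strict sense, so for irregular shapes some pixels may fall outside it.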

I'm not sure how MATLAB or OpenCV calculates the ellipse. But if you are interested in the math behind it, there is a very nice optimization approach called the Löwner-John ellipsoid. You can find more information about this method in the Stanford Convex Optimization course. I hope it helps...


What does "filter" mean in Realistic Ray Tracing?

I'm reading the book Realistic Ray Tracing and I can't understand the box filter code:
void boxFilter(Vector2* samples, int num_samples)
{
    for (int i = 0; i < num_samples; i++) {
        samples[i].x = samples[i].x - 0.5f;
        samples[i].y = samples[i].y - 0.5f;
    }
}
In my opinion, a "filter" is an array of weights: sampling generates the positions used to produce rays, and filtering combines the results. So the filter function should produce a float[], but the function above operates on a Vector2[]. What does the code mean?
The basic idea of box filtering is that no matter where on an image plane "pixel" the sample lands, a box filter causes the renderer to act as if it landed in the exact center of that pixel.
I haven't read that particular book, but I'm guessing that in that code fragment, sample[].x and .y are the int (or previously floor()ed) locations where the returned ray hits the image plane, in pixel coordinates. Therefore, subtracting .5 from each puts the sample in the geometric center of each pixel, hence, it's a box filter.
For an in-depth discussion of box filtering (and other filters), see Physically-Based Rendering, Chapter 7, "Sampling and Reconstruction".
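Another possible reading of the fragment, offered only as a sketch: if the samples arrive as uniform values in the unit square [0, 1)^2 (an assumption, the question does not say), then subtracting 0.5 turns them into offsets in [-0.5, 0.5)^2 around the pixel center, and averaging the results with equal weights amounts to a box filter. In the Python below, render_pixel and radiance are hypothetical names, not functions from the book.

```python
import random

def box_filter(samples):
    """Shift samples from [0, 1)^2 to offsets in [-0.5, 0.5)^2."""
    return [(x - 0.5, y - 0.5) for (x, y) in samples]

def render_pixel(px, py, radiance, num_samples=16, seed=0):
    """Average radiance over filtered sample positions inside pixel (px, py).

    radiance is a hypothetical callable (x, y) -> float standing in for
    'shoot a ray through image-plane position (x, y)'.
    """
    rng = random.Random(seed)
    unit_samples = [(rng.random(), rng.random()) for _ in range(num_samples)]
    offsets = box_filter(unit_samples)
    total = 0.0
    for dx, dy in offsets:
        # Pixel center is at (px + 0.5, py + 0.5); every sample has equal weight.
        total += radiance(px + 0.5 + dx, py + 0.5 + dy)
    return total / num_samples

offsets = box_filter([(0.0, 0.0), (0.5, 0.5), (0.25, 0.75)])
flat = render_pixel(3, 4, lambda x, y: 1.0)
```

Under this reading the "weights" are implicit: the shape of the filter is encoded in where the samples land, not in a separate float array.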

Compute curvature of a bent pipe using image processing (Hough transform parabola detection)

I'm trying to design a way to detect this pipe's curvature. I tried applying a Hough transform and found lines, but they don't lie along the surface of the pipe, so smoothing them to fit a Bézier curve is not working. Please suggest a good way to start for an image like this.
The image obtained by the Hough transform to detect lines is as follows:
I'm using standard MATLAB code for probabilistic Hough transform line detection, which generates line segments surrounding the structure. Essentially, the shape of the pipe resembles a parabola, but for Hough parabola detection I need to provide the eccentricity of the point prior to detection. Please suggest a good way of finding discrete points along the curve that can be fitted to a parabola. I have tagged the question with OpenCV and ITK, so if there is a function that can be applied to this particular picture, please suggest it and I will try it out to see the results.
img = imread('test2.jpg');
rawimg = rgb2gray(img);
[accum, axis_rho, axis_theta, lineprm, lineseg] = Hough_Grd(bwtu, 8, 0.01);
figure(1); imagesc(axis_theta*(180/pi), axis_rho, accum); axis xy;
xlabel('Theta (degree)'); ylabel('Rho (pixels)');
title('Accumulation Array from Hough Transform');
figure(2); imagesc(bwtu); colormap('gray'); axis image;
title('Raw Image with Line Segments Detected');
The edge map of the image is as follows, and the result of applying the Hough transform to the edge map is also not good. I was thinking of a solution that does general parametric shape detection: the curve could be expressed as a family of parabolas, and curve fitting would estimate the coefficients as the pipe bends so that its curvature can be analyzed. I need to design a real-time procedure, so please suggest anything in this direction.
I suggest the following approach:
First stage: generate a segmentation of the pipe.
Perform thresholding on the image.
Find connected components in the thresholded image.
Search for the connected component that represents the pipe.
The connected component that represents the pipe should have an edge map that divides into top and bottom edges (see attached image).
The top and bottom edges should have similar sizes, and they should be at a relatively constant distance from one another. In other words, the variance of their per-pixel distances should be low.
Second stage: extract the curve.
At this stage, you should extract the points of the curve for performing Bézier fitting.
You can perform this calculation on either the top edge or the bottom edge.
Another option is to do it on the skeleton of the pipe segmentation.
The pipe segmentation. Top and bottom edges are marked in blue and red, respectively.
I = mat2gray(imread('ILwH7.jpg'));
im = rgb2gray(I);
%constant values to be used later on
%NOTE: the original values are missing from this listing; the four values
%below are placeholder assumptions and need tuning for your image
BW_THRESHOLD = 0.5;
MIN_CC_SIZE = 50;
VAR_THRESHOLD = 2;
SIMILAR_SIZE_THRESHOLD = 0.85;
%stage 1 - thresholding & noise cleaning
bwIm = im > BW_THRESHOLD;
bwIm = imfill(bwIm,'holes');
bwIm = imopen(bwIm,strel('disk',1));
CC = bwconncomp(bwIm);
%iterates over the CC list, and searches for the CC which represents the pipe
for ii=1:length(CC.PixelIdxList)
    %ignore small CC
    if length(CC.PixelIdxList{ii}) < MIN_CC_SIZE
        continue;
    end
    %extracts CC edges
    ccMask = zeros(size(bwIm));
    ccMask(CC.PixelIdxList{ii}) = 1;
    ccMaskEdges = edge(ccMask);
    %finds connected components in the edges mat (there should be two).
    %these are the top and bottom parts of the pipe.
    CC2 = bwconncomp(ccMaskEdges);
    if length(CC2.PixelIdxList)~=2
        continue;
    end
    %tests that the top and bottom edges have similar sizes
    s1 = length(CC2.PixelIdxList{1});
    s2 = length(CC2.PixelIdxList{2});
    if(min(s1,s2)/max(s1,s2) < SIMILAR_SIZE_THRESHOLD)
        continue;
    end
    %calculate the masks of these two connected components
    topEdgeMask = false(size(ccMask));
    topEdgeMask(CC2.PixelIdxList{1}) = true;
    bottomEdgeMask = false(size(ccMask));
    bottomEdgeMask(CC2.PixelIdxList{2}) = true;
    %tests that the variance of the distances between the points is low
    topEdgeDists = bwdist(topEdgeMask);
    bottomEdgeDists = bwdist(bottomEdgeMask);
    var1 = std(topEdgeDists(bottomEdgeMask));
    var2 = std(bottomEdgeDists(topEdgeMask));
    %if the variances are low - we have found the CC of the pipe. break!
    if var1 < VAR_THRESHOLD && var2 < VAR_THRESHOLD
        pipeMask = ccMask;
        break;
    end
end
%performs median filtering on the top and bottom boundaries.
[topCurveY, topCurveX] = find(topEdgeMask);
topCurveX = medfilt1(topCurveX);
topCurveY = medfilt1(topCurveY);
[bottomCurveY, bottomCurveX] = find(bottomEdgeMask);
bottomCurveX = medfilt1(bottomCurveX);
bottomCurveY = medfilt1(bottomCurveY);
%display results (the two plot calls are an assumed completion of the listing)
imshow(pipeMask); hold on;
plot(topCurveX, topCurveY, '.-');
plot(bottomCurveX, bottomCurveY, '.-');
In this specific example, acquiring the pipe segmentation by thresholding was relatively easy. In some scenes it may be more complex. In those cases, you may want to use a region-growing algorithm to generate the pipe segmentation.
Detecting the connected component that represents the pipe can be done using some more heuristics. For example, the local curvature of its boundaries should be low.
You can find the connected components (CCs) of your inverted edge-map image. Then you can somehow filter those components, say for example, based on their pixel count, using region-properties. Here are the connected components I obtained using the given Octave code.
Now you can fit a model to each of these CCs using something like nlinfit or any suitable method.
im = imread('uFBtU.png');
gr = rgb2gray(uint8(im));
er = imerode(gr, ones(3)) < .5;
[lbl, n] = bwlabel(er, 8);
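Once edge or skeleton points of the pipe are available, the fitting step mentioned in this thread can be sketched in a few lines. The Python below is a minimal sketch, not taken from any of the answers: it fits y = a*x^2 + b*x + c by ordinary least squares (normal equations) and evaluates the curvature of that parabola, kappa = |2a| / (1 + (2ax + b)^2)^(3/2). It assumes the pipe can be written as a single-valued function y(x) in image coordinates; the function names are made up.

```python
def fit_parabola(points):
    """Least-squares fit of y = a*x^2 + b*x + c. Returns (a, b, c)."""
    # Sums of x^0 .. x^4 and of y*x^0 .. y*x^2 for the normal equations.
    s = [sum(x ** k for x, _ in points) for k in range(5)]
    t = [sum(y * x ** k for x, y in points) for k in range(3)]
    # Augmented 3x3 system for the unknowns (a, b, c).
    m = [[s[4], s[3], s[2], t[2]],
         [s[3], s[2], s[1], t[1]],
         [s[2], s[1], s[0], t[0]]]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [mr - f * mc for mr, mc in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))

def curvature(a, b, x):
    """Curvature of y = a*x^2 + b*x + c at abscissa x."""
    return abs(2.0 * a) / (1.0 + (2.0 * a * x + b) ** 2) ** 1.5

# Synthetic points on y = 0.5*x^2 - 2*x + 3 stand in for pipe edge pixels.
points = [(x, 0.5 * x * x - 2.0 * x + 3.0) for x in range(-5, 6)]
a, b, c = fit_parabola(points)
kappa_at_vertex = curvature(a, b, 2.0)
```

For real-time use, the fit is a fixed 3x3 solve per frame, so the cost is dominated by extracting the points rather than by the fitting itself.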

What algorithms or approaches apart from Haar cascades could be used for custom objects detection?

I need to do computer vision tasks in order to detect water bottles or soda cans. I will obtain 'frontal' images of bottles, soda cans or other random objects (one at a time), and my algorithm should determine whether it's a bottle, a can or neither.
Some details about object detecting scenario:
As mentioned, I will test one single object per image/video frame.
Not all water bottles are the same. The plastic may be colored, and the lid or label may vary. Some might have no label or lid.
The same variation applies to soda cans. No wrinkled cans will be tested, though.
There could be small size variation between objects.
I could have a green (or any custom color) background.
I will do any needed filters on image.
This will be run on a Raspberry Pi.
Just in case, an example of each:
I've tested OpenCV's face detection algorithms a couple of times and I know they work pretty well, but with that approach I'd need a special Haar cascade features XML file for each custom object.
So, the distinct alternatives I have in mind are:
Creating a custom Haar Classifier.
Considering shapes.
Considering outlines.
I'd like a simple algorithm, and I think creating a custom Haar classifier might not even be needed. What would you suggest?
I strongly considered the shape/aspect ratio approach.
However, I guess I'm facing some issues, as bottles come in different sizes and even shapes. But this led me to set the following considerations:
I'm applying a threshold with THRESH_BINARY method. (Thanks to the answers).
I will use a white background on detection.
Soda cans are all same size.
So, a bounding box for soda cans with high accuracy might distinguish a can.
What I've achieved:
Thresholding really helped. In tests on a white background, this is what I obtained for cans:
And this is what I obtained for bottles:
So the dominance of darker areas is noticeable. There are some cases with cans where this might produce false negatives, and for bottles, light and angle may lead to inconsistent results, but I really think this could be a shorter approach.
So I'm now quite confused about how I should evaluate that dominance of darkness. I've read that findContours can help, but I'm lost on how to use that function. For example, for soda cans it may find several contours, so I don't know which to evaluate.
Note: I'm open to testing any other algorithms or libraries besides OpenCV.
I see a few basic ideas here:
Check the object's (to be precise, its bounding rect's) width/height ratio; the numbers below are the long side over the short side, i.e. height/width for an upright object. For a can it's approximately 2-2.5; for a bottle I think it will be >3. It's a very simple idea, so it should be easy to test quickly, and I think it should have quite good accuracy. For in-between values, like 2.75 (assuming the values I gave are correct, which most likely isn't true), you can use a different algorithm.
Check whether your object contains glass/transparent regions; if yes, then it's definitely a bottle. Here you can read more about it.
Use the GrabCut algorithm to get an object mask (a more precise shape) and check whether the shape's width at the top is similar to its width at the bottom. If yes, it's a can; if not, it's a bottle (bottles have a screw cap at the top).
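The first idea can be sketched without any library, given a binary mask of the object. The Python below is a minimal sketch under stated assumptions: the mask is a list of rows of 0/1 values, the object is upright, and the cut-offs 2.5 and 3.0 are simply the rough guesses from the list above (which its author already flags as probably inaccurate). The function names are made up.

```python
def bounding_box(mask):
    """Return (x, y, width, height) of the nonzero pixels in a 0/1 mask."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, v in enumerate(row) if v]
    x0, x1 = min(cols), max(cols)
    y0, y1 = rows[0], rows[-1]
    return x0, y0, x1 - x0 + 1, y1 - y0 + 1

def classify_by_aspect(mask, can_max=2.5, bottle_min=3.0):
    """Classify by height/width of the bounding box; thresholds are guesses."""
    _, _, w, h = bounding_box(mask)
    ratio = h / w
    if ratio <= can_max:
        return "can", ratio
    if ratio >= bottle_min:
        return "bottle", ratio
    return "unsure", ratio  # in-between values need another test

def solid_mask(width, height, pad=2):
    """Helper for the demo: a filled rectangle with a border of zeros."""
    empty = [0] * (width + 2 * pad)
    row = [0] * pad + [1] * width + [0] * pad
    return [empty[:] for _ in range(pad)] + [row[:] for _ in range(height)] \
        + [empty[:] for _ in range(pad)]

can_result = classify_by_aspect(solid_mask(10, 20))
bottle_result = classify_by_aspect(solid_mask(10, 35))
unsure_result = classify_by_aspect(solid_mask(10, 28))
```

The "unsure" band is where one of the other checks in the list (transparency, top versus bottom width) would take over.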
Since you want to recognize can vs bottle rather than pepsi vs coke, shape matching is probably the way to go when compared to Haar and the features2d matchers like SIFT/SURF/ORB
A unique background color will make things easier.
First create a histogram from an image of just the background
int channels[] = {0,1,2}; // use all the channels
int rgb_bins = 32; // quantize to 32 colors per channel
int histSize[] = {rgb_bins, rgb_bins, rgb_bins};
float _range[] = {0,256}; // upper bound is exclusive, so 256 covers pixel value 255
const float* ranges[] = {_range, _range, _range};
cv::SparseMat bghist;
cv::calcHist(&bg_image, 1, channels, cv::noArray(),bghist, 3, histSize, ranges );
Then use calcBackProject to create a mask of bg and not bg
cv::MatND temp_ND;
cv::calcBackProject( &bottle_image, 1, channels, bghist, temp_ND, ranges );
cv::Mat bottle_mask, bottle_backproj;
if( feeling_lazy ){
    cv::normalize(temp_ND, bottle_backproj, 0, 255, cv::NORM_MINMAX, CV_8U);
    //a small blur here could work nicely
    threshold( bottle_backproj, bottle_mask, 0, 255, THRESH_OTSU );
    bottle_mask = cv::Scalar(255) - bottle_mask; //invert the mask
} else {
    //finding just the right value here might be better than the above method
    int magic_threshold = 64;
    temp_ND.convertTo( bottle_backproj, CV_8U, 255.);
    //I expect temp_ND to be CV_32F ranging from 0-1, but I might be wrong.
    threshold( bottle_backproj, bottle_mask, magic_threshold, 255, THRESH_BINARY_INV );
}
Then either:
Compare bottle_mask or bottle_backproj to a few sample bottle masks/backprojections using matchTemplate with a threshold on confidence to decide if it's a match.
matchTemplate(bottle_mask, bottle_template, result, CV_TM_CCORR_NORMED);
double confidence; minMaxLoc( result, NULL, &confidence);
Or use matchShapes, though I've never gotten this to work properly.
double confidence = matchShapes(bottle_mask, bottle_template, CV_CONTOURS_MATCH_I3);
Or use linemod which is difficult to set up but works great for images like this where the shape isn't very complex. Aside from the linked file, I haven't found any working samples of this method so here's what I did.
First create/train the detector with some sample images
//some magic numbers
std::vector<int> T_at_level;
//add some padding so linemod doesn't scream at you
const int T = 32;
int width = bottle_mask.cols;
if( width % T != 0)
width += T - width % T;
int height = bottle_mask.rows;
if( height % T != 0)
height += T - height % T;
//in this case template_backproj is created specifically from a sample bottle_backproj
cv::Rect padded_roi( (width - template_backproj.cols)/2, (height - template_backproj.rows)/2, template_backproj.cols, template_backproj.rows);
//cv::Mat::zeros takes (rows, cols, type); copyTo writes the pixels into the ROI
cv::Mat padded_backproj = cv::Mat::zeros( height, width, template_backproj.type());
template_backproj.copyTo( padded_backproj( padded_roi ) );
cv::Mat padded_mask = cv::Mat::zeros( height, width, template_mask.type());
template_mask.copyTo( padded_mask( padded_roi ) );
//you might need to erode padded_mask by a few pixels.
//initialize detector
std::vector< cv::Ptr<cv::linemod::Modality> > modalities;
modalities.push_back( cv::makePtr<cv::linemod::ColorGradient>() ); //for those that don't have a kinect
cv::Ptr<cv::linemod::Detector> new_detector = cv::makePtr<cv::linemod::Detector>(modalities, T_at_level);
//add sample images to the detector
std::vector<cv::Mat> template_images;
template_images.push_back( padded_backproj);
cv::Rect ignore_me;
const std::string class_id = "bottle";
int template_id = new_detector->addTemplate(template_images, class_id, padded_mask, &ignore_me);
Then do some matching
std::vector<cv::Mat> sources_vec;
sources_vec.push_back( padded_backproj );
//padded_backproj doesn't need to be the same size as the trained template images, but it does need to be padded the same way.
float matching_threshold = 0.8; //a higher number makes the algorithm faster
std::vector<cv::linemod::Match> matches;
std::vector<cv::String> class_ids;
new_detector->match(sources_vec, matching_threshold, matches,class_ids);
float confidence = matches.size() > 0? matches[0].similarity : 0;
As cyriel suggests, the aspect ratio (width/height) might be one useful measure. Here is some OpenCV Python code that finds contours (hopefully including the outline of the bottle or can) and gives you aspect ratio and some other measurements:
import cv2

# src image should have already had some contrast enhancement (such as
# cv2.threshold) and edge finding (such as cv2.Canny)
# Note: OpenCV 3.x returns three values here (image, contours, hierarchy).
contours, hierarchy = cv2.findContours(src, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for contour in contours:
    num_points = len(contour)
    if num_points < 5:
        # The contour has too few points to fit an ellipse. Skip it.
        continue
    # We could use area to help determine the type of object.
    # Small contours are probably false detections (not really a whole object).
    area = cv2.contourArea(contour)
    bounding_ellipse = cv2.fitEllipse(contour)
    center, radii, angle_degrees = bounding_ellipse
    # Let's define an ellipse's normal orientation to be landscape (width > height).
    # We must ensure that the ellipse's measurements match this orientation.
    if radii[0] < radii[1]:
        radii = (radii[1], radii[0])
        angle_degrees -= 90.0
    # We could use the angle to help determine the type of object.
    # A bottle or can's angle is probably approximately a multiple of 90 degrees,
    # assuming that it is at rest and not falling.
    # Calculate the aspect ratio (width / height).
    # For example, 0.5 means the object's height is 2 times its width.
    # A bottle is probably taller than a can.
    aspect_ratio = radii[0] / radii[1]
For checking transparency, you can compare the picture to a known background using histogram analysis or background subtraction.
The contour's moments can be used to determine its centroid (center of gravity):
moments = cv2.moments(contour)
m00 = moments['m00']
m01 = moments['m01']
m10 = moments['m10']
centroid = (m10 / m00, m01 / m00)
You could compare this to the center. If the object is bigger ("heavier") on one end, the centroid will be closer to that end than the center is.
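The centroid-versus-center comparison can be tried on a plain binary mask before wiring it into OpenCV. The Python below is a minimal sketch with made-up names: it computes the vertical centroid from the zeroth and first moments of the mask pixels and reports how far it sits from the middle of the bounding box, as a fraction of the box height. It assumes image rows grow downward, so a positive value means the shape is "heavier" toward the bottom.

```python
def vertical_centroid_offset(mask):
    """Return (centroid_y - bbox_center_y) / bbox_height for a 0/1 mask."""
    ys = [r for r, row in enumerate(mask) for v in row if v]
    m00 = len(ys)                 # zeroth moment: pixel count
    m01 = sum(ys)                 # first moment in y
    centroid_y = m01 / m00
    top, bottom = min(ys), max(ys)
    center_y = (top + bottom) / 2.0
    return (centroid_y - center_y) / (bottom - top + 1)

# A plain rectangle (can-like) and a shape with a narrow neck on top
# (bottle-like); both are synthetic masks made up for this demo.
can_like = [[1] * 10 for _ in range(20)]
neck_row = [0] * 4 + [1] * 2 + [0] * 4
bottle_like = [neck_row[:] for _ in range(10)] + [[1] * 10 for _ in range(20)]

can_offset = vertical_centroid_offset(can_like)
bottle_offset = vertical_centroid_offset(bottle_like)
```

A threshold on this offset would have to be chosen from real samples; the sketch only shows that the quantity separates the two synthetic shapes.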
So, my main approach for detection was:
Bottles are transparent and cans are opaque.
In general, the algorithm consisted of:
Take a grayscale picture.
Apply a binary threshold.
Select a convenient ROI from it.
Obtain its color mean and even the standard deviation.
The implementation basically reduced to this function (where CAN and BOTTLE were previously defined):
int detector(int x, int y, int width, int height, int thresholdValue, CvCapture* capture) {
    Mat img;
    Rect r;
    vector<Mat> channels;
    r = Rect(x,y,width,height);
    if ( !capture ) {
        fprintf( stderr, "ERROR: capture is NULL \n" );
        return -1;
    }
    img = Mat(cvQueryFrame( capture ));
    threshold(img, img, 127, 255, THRESH_BINARY);
    // ROI
    Mat roiImage = img(r);
    split(roiImage, channels);
    Scalar m = mean(channels[0]);
    float media = m[0];
    printf("Media: %f\n", media);
    if (media < thresholdValue) {
        return CAN;
    }
    else {
        return BOTTLE;
    }
}
As can be seen, a THRESH_BINARY threshold was applied, and a plain white background was used. However, the main and critical issue I faced with this whole approach was changes in ambient luminosity, even minor ones.
Sometimes I noticed that THRESH_BINARY_INV might help more, but I wonder whether certain threshold parameters or other filters could remove ambient lighting as an issue.
I really appreciate the aspect-ratio approach based on a bounding box or contours, but I found this one straightforward and simple once the conditions were adjusted.
I'd use deep learning, based on transfer learning.
The idea is this: given a highly complex, well-trained neural network that was trained on a similar classification task (typically on a large public dataset, like ImageNet), you can freeze the majority of its weights and train only the last layers. There are lots of tutorials out there. You don't need a background in deep learning.
There is a tutorial that works almost out of the box with TensorFlow here, and here there is another based on Keras.

Transform point position in trapezoid to rectangle position

I am trying to find out how I can transform a coordinate Pxy within the green trapezoid below into the equivalent coordinate on the real ground plane.
I have the exact measures of the room, meaning I can exactly say how long A,B,C and D are in that room shown below.
I also know how long A, B, C and D are in that green trapezoid (coordinate-wise).
I have already been reading about homography and matrix transformation, but can't really wrap my head around it. Any input steering me into the right direction would be appreciated.
Here is code that computes the perspective transformation matrix using the OpenCV library (it shows how to transform your trapezoid to a rectangle and how to find the transformation matrix for further calculations):
//example from book
// Learning OpenCV: Computer Vision with the OpenCV Library
// by Gary Bradski and Adrian Kaehler
// Published by O'Reilly Media, October 3, 2008
#include <cv.h>
#include <highgui.h>
#include <stdlib.h>
#include <stdio.h>
#include <assert.h>
int main(int argc, char* argv[])
{
    IplImage *src=0, *dst=0;
    // absolute or relative path to image should be in argv[1]
    char* filename = argc == 2 ? argv[1] : "Image0.jpg";
    // get the picture
    src = cvLoadImage(filename,1);
    printf("[i] image: %s\n", filename);
    assert( src != 0 );
    // points (corners of )
    CvPoint2D32f srcQuad[4], dstQuad[4];
    // transformation matrix
    CvMat* warp_matrix = cvCreateMat(3,3,CV_32FC1);
    // clone image
    dst = cvCloneImage(src);
    // define all the points
    //here the coordinates of corners of your trapezoid
    srcQuad[0].x = ??; //src Top left
    srcQuad[0].y = ??;
    srcQuad[1].x = ??; //src Top right
    srcQuad[1].y = ??;
    srcQuad[2].x = ??; //src Bottom left
    srcQuad[2].y = ??;
    srcQuad[3].x = ??; //src Bot right
    srcQuad[3].y = ??;
    //- - - - - - - - - - - - - -//
    //coordinates of rectangle in src image
    dstQuad[0].x = 0; //dst Top left
    dstQuad[0].y = 0;
    dstQuad[1].x = src->width-1; //dst Top right
    dstQuad[1].y = 0;
    dstQuad[2].x = 0; //dst Bottom left
    dstQuad[2].y = src->height-1;
    dstQuad[3].x = src->width-1; //dst Bot right
    dstQuad[3].y = src->height-1;
    // get transformation matrix that you can use to calculate
    //coordinates of point Pxy
    cvGetPerspectiveTransform(srcQuad, dstQuad, warp_matrix); // call missing from the listing, restored by assumption
    // perspective transformation
    cvWarpPerspective(src, dst, warp_matrix); // call missing from the listing, restored by assumption
    cvNamedWindow( "cvWarpPerspective", 1 );
    cvShowImage( "cvWarpPerspective", dst );
    cvWaitKey(0); // keep the window open until a key is pressed
    return 0;
}
Hope it helps!
If I understand your question correctly, you are looking for the transform matrix that expresses the position and orientation (aka the "pose") of your camera in relation to the world. If you have this matrix (let's call it M), you can map any point from your camera coordinate frame to the world coordinate frame and vice versa. In your case you'll want to transform a rectangle onto the plane (0, 1, 0)^T + 0 in world coordinates.
There are several ways to derive this pose matrix. First of all, you'll need to know another matrix, K, which describes the internal camera parameters used to convert positions in the camera coordinate frame to actual pixel positions. This involves a standard pinhole projection as well as radial distortion and a few other things.
To determine both K and M you have to calibrate your camera. This is usually done with a calibration pattern (e.g. a chessboard pattern) for which the positions of the chessboard fields are known. Then you can establish so-called point correspondences between the known positions on the pattern and the observed pixel positions. Once you have enough of these point pairs, you can solve for a matrix H = KM. This is the homography matrix you've already mentioned. Once you have that, you can reconstruct K and M.
So much for the theory. For the practical part, I would suggest having a look at the OpenCV documentation (e.g. you could start here: OpenCV Camera calibration and here: OpenCV Pose estimation).
I hope this will point you in the right directions ;)
Just for the sake of completeness: I ended up looking at the thread suggested by @mmgp and implemented a solution that is equivalent to the one presented by Christopher R. Wren:
Perspective Transform Estimation
This turned out to work really well for my case, although there was some distortion from the camera.
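For readers who want to see the estimation step spelled out, here is a minimal Python sketch of the standard four-point approach: fix h33 = 1, write two linear equations per point pair, solve the resulting 8x8 system, then map any point with a perspective divide. It is not the poster's actual implementation. The corner coordinates and the 4 x 6 room size are made-up example values, and the function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def find_homography(src, dst):
    """src, dst: four (x, y) pairs. Returns a 3x3 matrix with h33 fixed to 1."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), likewise for v.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]

def map_point(hm, x, y):
    """Apply the homography to (x, y), including the perspective divide."""
    w = hm[2][0] * x + hm[2][1] * y + hm[2][2]
    return ((hm[0][0] * x + hm[0][1] * y + hm[0][2]) / w,
            (hm[1][0] * x + hm[1][1] * y + hm[1][2]) / w)

# Example values (assumptions): trapezoid corners in pixels, room in meters.
trapezoid = [(200, 100), (440, 100), (40, 400), (600, 400)]
room = [(0.0, 0.0), (4.0, 0.0), (0.0, 6.0), (4.0, 6.0)]
H = find_homography(trapezoid, room)
mapped_corners = [map_point(H, x, y) for x, y in trapezoid]
bottom_mid = map_point(H, 320, 400)
```

Lens distortion is not modeled here, so with a real camera the mapped positions will drift toward the image edges unless the image is undistorted first.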

how to draw ROC curve in computer vision algorithm?

I used a detection algorithm to detect the object in 100 images, each containing exactly 2 ground-truth objects. Then I added noise and found the best one. I calculated the intersection area intArea between the detection result and the ground truth, and also the union area unionArea = rectA + rectB - intArea. Then I planned to use these ratios to draw the ROC curve as follows:
init TP, FP as 100X1 array.
for threshold = 0..1, step = 0.01
    curIdx = 1;
    for each ratio(i), i = 1..100
        if ratio(i) > threshold then
            TP(curIdx) = TP(curIdx) + 1;
        else
            FP(curIdx) = FP(curIdx) + 1;
        end
    end
end
Then I used TP/100 as the Y-axis value and TP/(TP+FP) as the X-axis value to draw the ROC curve.
But the result is not as expected (I can't post an image yet because I'm a new user).
So, could anyone please help and tell me where I went wrong? Thank you all!
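The overlap ratio described in the question (intersection area divided by union area) can be sketched as follows. This is a minimal Python sketch that assumes axis-aligned rectangles given as (x, y, width, height); the function name is made up.

```python
def iou(rect_a, rect_b):
    """Intersection area divided by union area of two (x, y, w, h) rectangles."""
    ax, ay, aw, ah = rect_a
    bx, by, bw, bh = rect_b
    # Width and height of the overlap; negative means no overlap on that axis.
    iw = min(ax + aw, bx + bw) - max(ax, bx)
    ih = min(ay + ah, by + bh) - max(ay, by)
    inter = max(iw, 0) * max(ih, 0)
    union = aw * ah + bw * bh - inter   # rectA + rectB - intArea
    return inter / union if union > 0 else 0.0

half_overlap = iou((0, 0, 10, 10), (5, 0, 10, 10))
identical = iou((0, 0, 10, 10), (0, 0, 10, 10))
disjoint = iou((0, 0, 10, 10), (20, 20, 5, 5))
```

A common convention is to count a detection as correct when this ratio exceeds a fixed value such as 0.5, and to sweep a separate detector confidence score when drawing the curve.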
VLFeat implements a very easy way to draw a ROC curve in the MATLAB environment. Please check this link:
If you want to know the internals of ROC graph generation, you can read Tom Fawcett's report on ROC graphs or his ScienceDirect article.
If you just want to generate the plots without going into the technicalities, you can use the Yard Python library or the ROCR R package.
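For reference, a ROC curve plots the true positive rate TP/(TP+FN) against the false positive rate FP/(FP+TN) while a score threshold is swept; TP/(TP+FP) is precision, which belongs on a precision-recall plot instead. Here is a minimal Python sketch of the ROC computation. It assumes each candidate detection has a confidence score and a 0/1 label saying whether it matches a ground-truth object, which is more than the question's setup provides; the function name is made up.

```python
def roc_points(scores, labels):
    """Return (fpr, tpr) pairs, one per distinct threshold, highest first.

    scores: detector confidences; labels: 1 for a true object, 0 otherwise.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing accepted
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= thr and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= thr and l == 0)
        points.append((fp / neg, tp / pos))
    return points

points = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```

The sketch needs at least one positive and one negative label; without negatives the false positive rate is undefined, which is one reason a detection experiment with only ground-truth objects cannot produce a ROC curve directly.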
