What algorithms or approaches apart from Haar cascades could be used for custom objects detection? - algorithm

I need to do computer visions tasks in order to detect watter bottles or soda cans. I will obtain 'frontal' images of bottles, soda cans or any other random objects (one by one) and my algorithm should determine whether it's a bottle, a can or any of them.
Some details about object detecting scenario:
As mentioned, I will test one single object per image/video frame.
Not all watter bottles are the same. There could be color in plastic, lid or label variation. Maybe some could not get label or lid.
Same about variation goes for soda cans. No wrinkled soda cans are gonna be tested though.
There could be small size variation between objects.
I could have a green (or any custom color) background.
I will do any needed filters on image.
This will be run on a Raspberry Pi.
Just in case, an example of each:
I've tested a couple times OpenCV face detection algorithms and I know it works pretty good but I'd need to obtain an special Haar Cascades features XML file for detecting each custom object on this approach.
So, the distinct alternatives I have in mind are:
Creating a custom Haar Classifier.
Considering shapes.
Considering outlines.
I'd like to get a simple algorithm and I think creating a custom Haar classifier could be even not needed. What would you suggest?
Update
I strongly considered the shape/aspect ratio approach.
However I guess I'm facing some issues as bottles come in distinct sizes or even shapes each. But this made me think or set following considerations:
I'm applying a threshold with THRESH_BINARY method. (Thanks to the answers).
I will use a white background on detection.
Soda cans are all same size.
So, a bounding box for soda cans with high accuracy might distinguish a can.
What I've achieved:
Threshold really helped me, I could notice that on white background tests I would obtain for cans:
And this is what it's obtained for bottles:
So, darker areas left dominancy is noticeable. There are some cases in cans where this might turn into false negatives. And for bottles, light and angle may lead to not consistent results but I really really think this could be a shorter approach.
So, I'm quite confused now how I should evaluate that darkness dominancy, I've read that findContours leads to it but I'm quite lost on how to seize such function. For example, in case of soda cans, it may find several contours, so I get lost on what to evaluate.
Note: I'm open to test any other algorithms or libraries distinct to Open CV.

I see few basic ideas here:
Check object (to be precise - object boundind rect) width/height ratio. For can it's approimetely 2-2.5, for bottle i think it will be >3. It's very simple idea to it should be easy to test it quickly and i think it should has quite good accuracy. For some values, like 2.75 (assumimg that values that i gave are correct, which most likely isn't true) you can use some different algorithm.
Check whether you object contains glass/transparence regions - if yes, than definitely it's a bottle. Here you can read more about it.
Use grabcut algorithm to get object mask/more precise shape and check whether this shape width at the top is similar to width at the bottom - if yes than it's a can, no - bottle (bottles has screw cap at the top).

Since you want to recognize can vs bottle rather than pepsi vs coke, shape matching is probably the way to go when compared to Haar and the features2d matchers like SIFT/SURF/ORB
A unique background color will make things easier.
First create a histogram from an image of just the background
int channels[] = {0,1,2}; // use all the channels
int rgb_bins = 32; // quantize to 32 colors per channel
int histSize[] = {rgb_bins, rgb_bins, rgb_bins};
float _range[] = {0,255};
float* ranges[] = {_range, _range, _range};
cv::SparseMat bghist;
cv::calcHist(&bg_image, 1, channels, cv::noArray(),bghist, 3, histSize, ranges );
Then use calcBackProject to create a mask of bg and not bg
cv::MatND temp_ND;
cv::calcBackProject( &bottle_image, 1, channels, bghist, temp_ND, ranges );
cv::Mat bottle_mask, bottle_backproj;
if( feeling_lazy ){
cv::normalize(temp_ND, bottle_backproj, 0, 255, cv::NORM_MINMAX, CV_8U);
//a small blur here could work nicely
threshold( bottle_backproj, bottle_mask, 0, 255, THRESH_OTSU );
bottle_mask = cv::Scalar(255) - bottle_mask; //invert the mask
} else {
//finding just the right value here might be better than the above method
int magic_threshold = 64;
temp_ND.convertTo( bottle_backproj, CV_8U, 255.);
//I expect temp_ND to be CV_32F ranging from 0-1, but I might be wrong.
threshold( bottle_backproj, bottle_mask, magic_threshold, 255, THRESH_BINARY_INV );
}
Then either:
Compare bottle_mask or bottle_backproj to a few sample bottle masks/backprojections using matchTemplate with a threshold on confidence to decide if it's a match.
matchTemplate(bottle_mask, bottle_template, result, CV_TM_CCORR_NORMED);
double confidence; minMaxLoc( result, NULL, &confidence);
Or use matchShapes, though I've never gotten this to work properly.
double confidence = matchShapes(bottle_mask, bottle_template, CV_CONTOURS_MATCH_I3);
Or use linemod which is difficult to set up but works great for images like this where the shape isn't very complex. Aside from the linked file, I haven't found any working samples of this method so here's what I did.
First create/train the detector with some sample images
//some magic numbers
std::vector<int> T_at_level;
T_at_level.push_back(4);
T_at_level.push_back(8);
//add some padding so linemod doesn't scream at you
const int T = 32;
int width = bottle_mask.cols;
if( width % T != 0)
width += T - width % T;
int height = bottle_mask.rows;
if( height % T != 0)
height += T - height % T;
//in this case template_backproj is created specifically from a sample bottle_backproj
cv::Rect padded_roi( (width - template_backproj.cols)/2, (height - template_backproj.rows)/2, template_backproj.cols, template_backproj.rows);
cv::Mat padded_backproj = zeros( width, height, template_backproj.type());
padded_backproj( padded_roi ) = template_backproj;
cv::Mat padded_mask = zeros( width, height, template_mask.type());
padded_mask( padded_roi ) = template_mask;
//you might need to erode padded_mask by a few pixels.
//initialize detector
std::vector< cv::Ptr<cv::linemod::Modality> > modalities;
modalities.push_back( cv::makePtr<cv::linemod::ColorGradient>() ); //for those that don't have a kinect
cv::Ptr<cv::linemod::Detector> new_detector = cv::makePtr<cv::linemod::Detector>(modalities, T_at_level);
//add sample images to the detector
std::vector<cv::Mat> template_images;
templates.push_back( padded_backproj);
cv::Rect ignore_me;
const std::string class_id = "bottle";
template_id = new_detector->addTemplate(template_images, class_id, padded_mask, &ignore_me);
Then do some matching
std::vector<cv::Mat> sources_vec;
sources_vec.push_back( padded_backproj );
//padded_backproj doesn't need to be the same size as the trained template images, but it does need to be padded the same way.
float matching_threshold = 0.8; //a higher number makes the algorithm faster
std::vector<cv::linemod::Match> matches;
std::vector<cv::String> class_ids;
new_detector->match(sources_vec, matching_threshold, matches,class_ids);
float confidence = matches.size() > 0? matches[0].similarity : 0;

As cyriel suggests, the aspect ratio (width/height) might be one useful measure. Here is some OpenCV Python code that finds contours (hopefully including the outline of the bottle or can) and gives you aspect ratio and some other measurements:
# src image should have already had some contrast enhancement (such as
# cv2.threshold) and edge finding (such as cv2.Canny)
contours, hierarchy = cv2.findContours(src, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for contour in contours:
num_points = len(contour)
if num_points < 5:
# The contour has too few points to fit an ellipse. Skip it.
continue
# We could use area to help determine the type of object.
# Small contours are probably false detections (not really a whole object).
area = cv2.contourArea(contour)
bounding_ellipse = cv2.fitEllipse(contour)
center, radii, angle_degrees = bounding_ellipse
# Let's define an ellipse's normal orientation to be landscape (width > height).
# We must ensure that the ellipse's measurements match this orientation.
if radii[0] < radii[1]:
radii = (radii[1], radii[0])
angle_degrees -= 90.0
# We could use the angle to help determine the type of object.
# A bottle or can's angle is probably approximately a multiple of 90 degrees,
# assuming that it is at rest and not falling.
# Calculate the aspect ratio (width / height).
# For example, 0.5 means the object's height is 2 times its width.
# A bottle is probably taller than a can.
aspect_ratio = radii[0] / radii[1]
For checking transparency, you can compare the picture to a known background using histogram analysis or background subtraction.
The contour's moments can be used to determine its centroid (center of gravity):
moments = cv2.moments(contour)
m00 = moments['m00']
m01 = moments['m01']
m10 = moments['m10']
centroid = (m10 / m00, m01 / m00)
You could compare this to the center. If the object is bigger ("heavier") on one end, the centroid will be closer to that end than the center is.

So, my main approach for detection was:
Bottles are transparent and cans are opaque
Generally algorithm consisted in:
Take a grayscale picture.
Apply a binary threshold.
Select a convenient ROI from it.
Obtain it's color mean and even the standard deviation.
Distinguish.
Implementation was basically reduced to this function (where CAN and BOTTLE were previously defined):
int detector(int x, int y, int width, int height, int thresholdValue, CvCapture* capture) {
Mat img;
Rect r;
vector<Mat> channels;
r = Rect(x,y,width,height);
if ( !capture ) {
fprintf( stderr, "ERROR: capture is NULL \n" );
getchar();
return -1;
}
img = Mat(cvQueryFrame( capture ));
cvtColor(img,img,CV_RGB2GRAY);
threshold(img, img, 127, 255, THRESH_BINARY);
// ROI
Mat roiImage = img(r);
split(roiImage, channels);
Scalar m = mean(channels[0]);
float media = m[0];
printf("Media: %f\n", media);
if (media < thresholdValue) {
return CAN;
}
else {
return BOTTLE;
}
}
As it can be seen, a THRESH_BINARY threshold was applied, and it was a plain white background which was used. However the main and critical issue I faced with this whole approach and algorithm was luminosity changes in environment, even minor ones.
Sometimes I could notice a THRESH_BINARY_INV might help more, but I wonder if I could use some certian threshold parameters or wether applying other filters may lead to getting rid of environment lightning as an issue.
I really appreciate the aspect ratio calculation approach from bounding box or finding contours but I found this straight forward and simple when conditions were adjusted.

I'd use deep learning, based on Transfer learning.
The idea is this: given a highly complex well trained neural network, that was trained on a similar classification task (tipically over a large public dataset, like imagenet), you can freeze the majority of its weigths and only train the last layers. There are lots of tutorials out there. You don't need to have a background on deep learning.
There is a tutorial which is almost out of the box with tensorflow here and here there is another based on keras.

Related

How can I scale/interpolate an image with indexed values smoothly?

I am wanting to scale grayscale images (input masks, really) with discrete values up smoothly. The values in these images are indexes that represent arbitrary concepts (e.g. "terrain types"; they are usually indices into a table), rather than values on a continuous scale, so they can't be averaged or blended in any way.
Do there exist algorithms that can do this with a more pleasing result than nearest-neighbour, which results in a very blocky, pixelated result? I am looking for something that will at least produce more rounded, more fluid results. The kind of thing that would be ideal would be a whitepaper, or a library (preferably in Java).
I've researched the subject, but I can't find anything. There is plenty about linear or cubic interpolation, etc., but that won't work for indexed values. The only algorithm I ever see mentioned that does not try to average values is nearest-neighbour. But there must be more?
Using colour here for clarity. I do of course understand that the preferred result here is impossible; I'm not asking for something that reconstitutes destroyed information, just hoping for something that will at least guestimate something smoother than the first result.
Scan the destination image and for every corresponding source pixel (non-integer coordinates) check if the colors of the four surrounding pixels are the same. If yes, assign that color.
If not, perform as many bilinear interpolations as there are different colors. For this assign the weight 1 for a given color (each in turn) and 0 for the others, and interpolate the weight. Finally, keep the color with the largest weight.
By analytical geometry, one can show that in bilinear interpolation, the iso-weight curves are arcs of hyperbola. If your magnification is large, you will see them. G1 continuity is not guaranteed. If this is an annoyance, you can work with G1 bicubic interpolation instead.
If this still does not satisfy you, you can try smooth approximating surfaces rather than interpolating ones. But the principle of keeping the color of maximum weight remains.
If there aren't many distinct colors and you want to use ready-made functions, you can work this out as follows:
split the image in several binary images (white for a chosen color, black for background);
magnify all images (to grayscale) using the favorite method;
now implement yourself a function that assigns every pixel the color that has the largest value among the magnified images.
You can also apply a smoothing filter to the binary images before or after magnification.
For the sake of illustration, here is what you would get with two colors at a time (but this easily generalizes).
Color source image:
Smoothing applied to the binary equivalents:
Magnified:
Maximum weight decision:
One thing you could try is to extract a polygon for the boundary of each uniformly-colored region, then upscale and draw the polygon in the output image. You won’t create neatly rounded edges, but you will avoid the stair-case effect of the nearest neighbor interpolation. Upscaling polygons should avoid gaps between the regions too.
I guess that smoothing the shape for each value individually is a way to avoid undesired mixed value.
To handle values individually, here, I started with your nearest-neighbour image v, and create 3 image { A.bmp, B.bmp, C.bmp } by hand.
(each image has only 1 color region and background is black. e.g. A.bmp is below:)
After smoothing the shape for each image, draw these shapes to one result image buffer with different color.
//I use C++ and OpenCV
int main()
{
const std::string FileNames[3] = { "A.bmp", "B.bmp", "C.bmp" };
const cv::Scalar ResultShowColor[3] = { cv::Scalar(0,255,255), cv::Scalar(0,255,0), cv::Scalar(0,0,255) };
cv::Mat Imgs[3];
const int KernelSize = 15;
for( int i=0; i<3; ++i )
{
Imgs[i] = cv::imread( FileNames[i], cv::IMREAD_GRAYSCALE );
if( Imgs[i].empty() )return 0;
cv::threshold( Imgs[i], Imgs[i], 32, 255, cv::THRESH_BINARY );
cv::GaussianBlur( Imgs[i], Imgs[i], cv::Size(KernelSize,KernelSize), 0 );
cv::threshold( Imgs[i], Imgs[i], 255*0.5, 255, cv::THRESH_BINARY );
cv::imshow( FileNames[i], Imgs[i] );
}
cv::Mat ResultImg = cv::Mat::zeros( Imgs[0].size(), CV_8UC3 );
for( int i=0; i<3; ++i )
{
ResultImg.setTo( ResultShowColor[i], Imgs[i] );
}
cv::imshow( "ResultImg", ResultImg );
if( cv::waitKey() == 's' ){ cv::imwrite( "ResultImg.png", ResultImg ); }
return 0;
}
This is result:
Yes, this result is not enough. Gaps exist at the boundaries of shapes.
Therefore some ingenuity is required... but I post this because it might be some hint for you.

depth peeling invariance in webgl (and threejs)

I'm looking at what i think is the first paper for depth peeling (the simplest algorithm?) and I want to implement it with webgl, using three.js
I think I understand the concept and was able to make several peels, with some logic that looks like this:
render(scene, camera) {
const oldAutoClear = this._renderer.autoClear
this._renderer.autoClear = false
setDepthPeelActive(true) //sets a global injected uniform in a singleton elsewhere, every material in the scene has onBeforeRender injected with additional logic and uniforms
let ping
let pong
for (let i = 0; i < this._numPasses; i++) {
const pingPong = i % 2 === 0
ping = pingPong ? 1 : 0
pong = pingPong ? 0 : 1
const writeRGBA = this._screenRGBA[i]
const writeDepth = this._screenDepth[ping]
setDepthPeelPassNumber(i) //was going to try increasing the polygonOffsetUnits here globally,
if (i > 0) {
//all but first pass write to depth
const readDepth = this._screenDepth[pong]
setDepthPeelFirstPass(false)
setDepthPeelPrevDepthTexture(readDepth)
this._depthMaterial.uniforms.uFirstPass.value = 0
this._depthMaterial.uniforms.uPrevDepthTex.value = readDepth
} else {
//first pass just renders to depth
setDepthPeelFirstPass(true)
setDepthPeelPrevDepthTexture(null)
this._depthMaterial.uniforms.uFirstPass.value = 1
this._depthMaterial.uniforms.uPrevDepthTex.value = null
}
scene.overrideMaterial = this._depthMaterial
this._renderer.render(scene, camera, writeDepth, true)
scene.overrideMaterial = null
this._renderer.render(scene, camera, writeRGBA, true)
}
this._quad.material = this._blitMaterial
// this._blitMaterial.uniforms.uTexture.value = this._screenDepth[ping]
this._blitMaterial.uniforms.uTexture.value = this._screenRGBA[
this._currentBlitTex
]
console.log(this._currentBlitTex)
this._renderer.render(this._scene, this._camera)
this._renderer.autoClear = oldAutoClear
}
I'm using gl_FragCoord.z to do the test, and packing the depth into a 8bit RGBA texture, with a shader that looks like this:
float depth = gl_FragCoord.z;
vec4 pp = packDepthToRGBA( depth );
if( uFirstPass == 0 ){
float prevDepth = unpackRGBAToDepth( texture2D( uPrevDepthTex , vSS));
if( depth <= prevDepth + 0.0001) {
discard;
}
}
gl_FragColor = pp;
Varying vSS is computed in the vertex shader, after the projection:
vSS.xy = gl_Position.xy * .5 + .5;
The basic idea seems to work and i get peels, but only if i using the fudge factor. It looks like it fails though as the angle gets more obtuse (which is why polygonOffset needs both the factor and units, to account for the slope?).
I didn't understand at all how the invariance is solved. I don't understand how the mentioned extension is being used other than it seems to be overriding the fragment depth, but with what?
I must admit that I'm not sure even which interpolation is being referred to here since every pixel is aligned, i'm just using nearest filtering.
I did see some hints about depth buffer precision, but not really understanding the issue, i wanted to try packing the depth into only three channels and see what happens.
Having such a small fudge factor make it sort of work tells me that likely all these sampled and computed depths do seem to exist in the same space. But this seems to be the same issue as if using gl.EQUAL for depth testing? For shits and giggles i tried to override the depth with the unpacked depth immediately after packing it, but it didn't seem to do anything.
edit
Increasing the polygon offset with each peel seems to have done the trick. I got some fighting though with the lines but i think it's due to the fact that i was already using offset to draw them and i need to include that in the peel offset. I'd still love to understand more about the problem.
The depth buffer stores depths :) Depending on the 'far' and 'near' planes the perspective projection tends to set the depths of the points "stacked" in just a short part of the buffer. It's not linear in z. You can see this on your own setting a different color depending on the depth and render some triangle that takes most of near-far distance.
A shadow map stores depths (distances to light)... calculated after projection. Later, in the second or following pass, you will compare those depths, which are "stacked", which makes some comparisons to fail due to they are very similar values: hazardous variances.
You can user a more fine-grained depth buffer, 24 bits instead of 16 or 8 bits. This may solve part of the problem.
There's another issue: the perspective division or z/w, needed to get normalized device coordinates (NDC). It occurs after vertex shader, so gl_FragDepth = gl_FragCoord.z is affected.
The other approach is to store the depths calculated in some space that doesn't suffer "stacking" nor perspective division. Camera space is one. In other words, you can calculate the depth undoing projection in the vertex shader.
The article you link to is for old fixed-pipeline, without shaders. It shows a NVIDIA extension to deal with these variances.

How to convert cv2.addWeighted and cv2.GaussianBlur into MATLAB?

I have this Python code:
cv2.addWeighted(src1, 4, cv2.GaussianBlur(src1, (0, 0), 10), src2, -4, 128)
How can I convert it to Matlab? So far I got this:
f = imread0('X.jpg');
g = imfilter(f, fspecial('gaussian',[size(f,1),size(f,2)],10));
alpha = 4;
beta = -4;
f1 = f*alpha+g*beta+128;
I want to subtract local mean color image.
Input image:
Blending output from OpenCV:
The documentation for cv2.addWeighted has the definition such that:
cv2.addWeighted(src1, alpha, src2, beta, gamma[, dst[, dtype]]) → dst
Also, the operations performed on the output image is such that:
(source: opencv.org)
Therefore, what your code is doing is exactly correct... at least for cv2.addWeighted. You take alpha, multiply this by the first image, then beta, multiply this by the second image, then add gamma on top of this. The only intricacy left to deal with is saturate, which means that any values that are beyond the dynamic range of the data type you are dealing with, you cap it at that much. Because there is a potential for negatives to occur in the result, the saturate option simply means to make any values that are negative 0 and any values that are greater than the maximum expected to that max. In this case, you'll want to make any values larger than 1 equal to 1. As such, it'll be a good idea to convert your image to double through im2double because you want to allow the addition and subtraction of values beyond the dynamic range to happen first, then you saturate after. By using the default image precision of the image (which is uint8), the saturation will happen even before the saturate operation occurs, and that'll give you the wrong results. Because you're doing this double conversion, you'll want to convert the addition of 128 for your gamma to 0.5 to compensate.
Now, the only slight problem is your Gaussian Blur. Looking at the documentation, by doing cv2.GaussianBlur(src1, (0, 0), 10), you are telling OpenCV to infer on the mask size while the standard deviation is 10. MATLAB does not infer the size of the mask for you, so you need to do this yourself. A common practice is to simply find six-times the standard deviation, take the floor and add 1. This is for both the horizontal and vertical dimensions of the mask. You can see my post here on the justification as to why this is common practice: By which measures should I set the size of my Gaussian filter in MATLAB?
Therefore, in MATLAB, you would do this with your Gaussian blur instead. BTW, it's simply imread, not imread0:
f = im2double(imread('http://i.stack.imgur.com/kl3Md.jpg')); %// Change - Reading image directly from StackOverflow
sigma = 10; %// Change
sz = 1 + floor(6*sigma); %// Change
g = imfilter(f, fspecial('gaussian', sz, sigma)); %// Change
%// Rest of the code is the same
alpha = 4;
beta = -4;
f1 = f*alpha+g*beta+0.5; %// Change
%// Saturate
f1(f1 > 1) = 1;
f1(f1 < 0) = 0;
I get this image:
Take a note that there is a slight difference in the way this appears between OpenCV in MATLAB... especially the hallowing around the eye. This is because OpenCV does something different when inferring the mask size for the Gaussian blur. This I'm not sure what is going on, but how I specified the mask size by looking at the standard deviation is one of the most common heuristics for it. Play around with the standard deviation until you get something you like.

Need to automatically eliminate noise in image and outer boundary of a object

i am mechanical engineering student working on a project to automatically detect the weld seam (The seam is a edge that is to be welded) present in a workshop. This gives a basic terminology involved in welding (http://i.imgur.com/Hfwjq0w.jpg).
To separate the weldment from the other objects, i have taken the background image and subtracted the foreground image having the weldment to obatin only the weldment(http://i.imgur.com/v7yBWs1.jpg). After image subtraction,there are the shadow ,glare and remnant noises of subtracted background are still present.
As i want to automatically identify only the weld seam without the outer boundary of weldment, i have tried to detect the edges in the weldment image using canny algorithm and tried to eliminate the isolated noises using the function bwareopen.I have somehow obtained the approximate boundary of weldment and weld seam. The threshold i have used are purely on trial and error approach as dont know a way to automatically set a threshold to detect them.
The problem now i am facing is that i cant specify an definite threshold as this algorithm should be able to identify the seam of any material regardless of its surface texture,glare and shadow present there. I need some assistance to remove the glare,shadow and isolated points from the background subtracted image.
Also i need help to get rid of the outer boundary and obtain only smooth weld seam from starting point to end point.
i have tried to use the following code:
a=imread('imageofworkpiece.jpg'); %http://i.imgur.com/3ngu235.jpg
b=imread('background.jpg'); %http://i.imgur.com/DrF6wC2.jpg
Ip = imsubtract(b,a);
imshow(Ip) % weldment separated %http://i.imgur.com/v7yBWs1.jpg
BW = rgb2gray(Ip);
c=edge(BW,'canny',0.05); % by trial and error
figure;imshow(c) % %http://i.imgur.com/1UQ8E3D.jpg
bw = bwareaopen(c, 100); % by trial and error
figure;imshow(bw) %http://i.imgur.com/Gnjy2aS.jpg
Can anybody please suggest me a adaptive way to set a threhold and remove the outer boundary to detect only the seam? Thank you
Well this doesn't solve your problem of finding an automatic thresholding algorithm. but I can help with isolation the seam. The seam is along the y axis (will this always be the case?) so I used hough transform to isolate only near vertical lines. Normally it finds all lines but I restricted the theta search parameter. The code I'm using now happens to highlight the longest line segment (I got it directly from the matlab website) and it is coincidentally the weld seam. This was purely coincidental. But using your bwareaopened image as input the hough line detector is able to find the seam. Of course it required a bit of playing around to work, so you are stuck at your original problem of finding optimal settings somehow
Maybe this can be a springboard for someone else
a=imread('weldment.jpg'); %http://i.imgur.com/3ngu235.jpg
b=imread('weld_bg.jpg'); %http://i.imgur.com/DrF6wC2.jpg
Ip = imsubtract(b,a);
imshow(Ip) % weldment separated %http://i.imgur.com/v7yBWs1.jpg
BW = rgb2gray(Ip);
c=edge(BW,'canny',0.05); % by trial and error
bw = bwareaopen(c, 100); % by trial and error
figure(1);imshow(c) ;title('canny') % %http://i.imgur.com/1UQ8E3D.jpg
figure(2);imshow(bw);title('bw area open') %http://i.imgur.com/Gnjy2aS.jpg
[H,T,R] = hough(bw,'RhoResolution',1,'Theta',-15:5:15);
figure(3)
imshow(H,[],'XData',T,'YData',R,...
'InitialMagnification','fit');
xlabel('\theta'), ylabel('\rho');
axis on, axis normal, hold on;
P = houghpeaks(H,5,'threshold',ceil(0.5*max(H(:))));
x = T(P(:,2)); y = R(P(:,1));
plot(x,y,'s','color','white');
% Find lines and plot them
lines = houghlines(BW,T,R,P,'FillGap',2,'MinLength',30);
figure(4), imshow(BW), hold on
max_len = 0;
for k = 1:length(lines)
xy = [lines(k).point1; lines(k).point2];
plot(xy(:,1),xy(:,2),'LineWidth',2,'Color','green');
% Plot beginnings and ends of lines
plot(xy(1,1),xy(1,2),'x','LineWidth',2,'Color','yellow');
plot(xy(2,1),xy(2,2),'x','LineWidth',2,'Color','red');
% Determine the endpoints of the longest line segment
len = norm(lines(k).point1 - lines(k).point2);
if ( len > max_len)
max_len = len;
xy_long = xy;
end
end
% highlight the longest line segment
plot(xy_long(:,1),xy_long(:,2),'LineWidth',2,'Color','blue');
from your image it looks like the weld seam will be usually very dark with sharp intensity edge so why don't you use that ?
do not use background
create derivation image
dx[y][x]=pixel[y][x]-pixel[y][x-1]
do this for whole image (if on place then x must decrease in loop!!!)
filter out all derivations lower then thresholds
if (|dx[y][x]|<threshold) dx[y][x]=0; else pixel[y][x]=255;` // or what ever values you use
how to obtain threshold value ?
compute min and max intensity and set threshold as (max-min)*scale where scale is value lower then 1.0 (start with 0.02 or 0.1 for example ...
do this also for y axis
so compute dy[][]... and combine dx[][] and dy[][] together. Either with OR or by AND logical functions
filter out artifacts
you can use morphologic filters or smooth threshold for this. After all this you will have mask of pixels of weld seam
if you need boundig box then just loop through all pixels and remember min,max x,y coords ...
[Notes]
if your images will have good lighting then you can ignore the derivation and threshold the intensity directly with something like:
threshold = 0.5*(average_intensity+lowest_intensity)
if you want really fully automate this then you have to use adaptive thresholds. So try more thresholds in a loop and remember result closest to desired output based on geometry size,position etc ...
[edit1] finally have some time/mood for this so
Intensity image threshold
you provided just single image which is far from enough to make reliable algorithm. This is the result
as you can see without further processing this is not good approach
Derivation image threshold
threshold derivation by x (10%)
threshold derivation by y (5%)
AND combination of both 10% di/dx and 1.5% di/dy
The code in C++ looks like this (sorry do not use Matlab):
int x,y,i,i0,i1,tr2,tr3;
pic1=pic0; // copy input image pic0 to pic1
pic2=pic0; // copy input image pic0 to pic2 (just to resize to desired size for derivation)
pic3=pic0; // copy input image pic0 to pic3 (just to resize to desired size for derivation)
pic1.rgb2i(); // RGB -> grayscale
// abs derivate by x
for (y=pic1.ys-1;y>0;y--)
for (x=pic1.xs-1;x>0;x--)
{
i0=pic1.p[y][x ].dd;
i1=pic1.p[y][x-1].dd;
i=i0-i1; if (i<0) i=-i;
pic2.p[y][x].dd=i;
}
// compute min,max derivation
i0=pic2.p[1][1].dd; i1=i0;
for (y=1;y<pic1.ys;y++)
for (x=1;x<pic1.xs;x++)
{
i=pic2.p[y][x].dd;
if (i0>i) i0=i;
if (i1<i) i1=i;
}
tr2=i0+((i1-i0)*100/1000);
// abs derivate by y
for (y=pic1.ys-1;y>0;y--)
for (x=pic1.xs-1;x>0;x--)
{
i0=pic1.p[y ][x].dd;
i1=pic1.p[y-1][x].dd;
i=i0-i1; if (i<0) i=-i;
pic3.p[y][x].dd=i;
}
// compute min,max derivation
i0=pic3.p[1][1].dd; i1=i0;
for (y=1;y<pic1.ys;y++)
for (x=1;x<pic1.xs;x++)
{
i=pic3.p[y][x].dd;
if (i0>i) i0=i;
if (i1<i) i1=i;
}
tr3=i0+((i1-i0)*15/1000);
// threshold the derivation images and combine them
for (y=1;y<pic1.ys;y++)
for (x=1;x<pic1.xs;x++)
{
// copy original (pic0) pixel for non thresholded areas the rest fill with green color
if ((pic2.p[y][x].dd>=tr2)&&(pic3.p[y][x].dd>=tr3)) i=0x00FF00;
else i=pic0.p[y][x].dd;
pic1.p[y][x].dd=i;
}
pic0 is input image
pic1 is output image
pic2,pic3 are just temporary storage for derivations
pic?.xy,pic?.ys is the size of pic?
pic.p[y][x].dd is pixel axes (dd means access pixel as DWORD ...)
as you can see there is a lot of stuff around (nod visible in the first image you provided) so you need to process this further
segmentate and separate...,
use hough transform ...
filter out small artifacts ...
identify object by expected geometry properties (aspect ratio,position,size)
Adaptive thresholds:
you need for this to know the desired output image properties (not possible to reliably deduce from single image input) then create function that do the above processing with variable tr2,tr3. Try in loop more options of tr2,tr3 (loop through all values or iterate to better results and remember the best output (so you also need some function that detects the quality of output) for example:
quality=0.0; param=0.0;
for (a=0.2;a<=0.8;a+=0.1)
{
pic1=process_image(pic0,a);
q=detect_quality(pic1);
if (q>quality) { quality=q; param=a; pico=pic1; }
}
after this the pic1 should hold the relatively best threshold image ... You should process like this all threshold separately inside the process_image the targeted threshold must be scaled by a for example tr2=i0+((i1-i0)*a);

Detection of coins (and fit ellipses) on an image

I am currently working on a project where I am trying to detect a few coins lying on a flat surface (i.e. a desk). The coins do not overlap and are not hidden by other objects. But there might be other objects visible and the lighting conditions may not be perfect... Basically, consider yourself filming your desk which has some coins on it.
So each point should be visible as an Ellipse. Since I don't know the position of the camera the shape of the ellipses may vary, from a circle (view from top) to flat ellipses depending on the angle the coins are filmed from.
My problem is that I am not sure how to extract the coins and finally fit ellipses over them (which I am looking for to do further calculations).
For now, I have just made the first attempt by setting a threshold value in OpenCV, using findContours() to get the contour lines and fitting an ellipse. Unfortunately, the contour lines only rarely give me the shape of the coins (reflections, bad lighting, ...) and this way is also not preferred since I don't want the user to set any threshold.
Another idea was to use a template matching method of an ellipse on that image, but since I don't know the angle of the camera nor the size of the ellipses I don't think this would work well...
So I wanted to ask if anybody could tell me a method that would work in my case.
Is there a fast way to extract the three coins from the image? The calculations should be made in realtime on mobile devices and the method should not be too sensitive for different or changing lights or the color of the background.
Would be great if anybody could give me any tips on which method could work for me.
Here's some C99 source implementing the traditional approach (based on OpenCV doco):
#include "cv.h"
#include "highgui.h"
#include <stdio.h>
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif
//
// We need this to be high enough to get rid of things that are too small too
// have a definite shape. Otherwise, they will end up as ellipse false positives.
//
#define MIN_AREA 100.00
//
// One way to tell if an object is an ellipse is to look at the relationship
// of its area to its dimensions. If its actual occupied area can be estimated
// using the well-known area formula Area = PI*A*B, then it has a good chance of
// being an ellipse.
//
// This value is the maximum permissible error between actual and estimated area.
//
#define MAX_TOL 100.00
int main( int argc, char** argv )
{
IplImage* src;
// the first command line parameter must be file name of binary (black-n-white) image
if( argc == 2 && (src=cvLoadImage(argv[1], 0))!= 0)
{
IplImage* dst = cvCreateImage( cvGetSize(src), 8, 3 );
CvMemStorage* storage = cvCreateMemStorage(0);
CvSeq* contour = 0;
cvThreshold( src, src, 1, 255, CV_THRESH_BINARY );
//
// Invert the image such that white is foreground, black is background.
// Dilate to get rid of noise.
//
cvXorS(src, cvScalar(255, 0, 0, 0), src, NULL);
cvDilate(src, src, NULL, 2);
cvFindContours( src, storage, &contour, sizeof(CvContour), CV_RETR_CCOMP, CV_CHAIN_APPROX_SIMPLE, cvPoint(0,0));
cvZero( dst );
for( ; contour != 0; contour = contour->h_next )
{
double actual_area = fabs(cvContourArea(contour, CV_WHOLE_SEQ, 0));
if (actual_area < MIN_AREA)
continue;
//
// FIXME:
// Assuming the axes of the ellipse are vertical/perpendicular.
//
CvRect rect = ((CvContour *)contour)->rect;
int A = rect.width / 2;
int B = rect.height / 2;
double estimated_area = M_PI * A * B;
double error = fabs(actual_area - estimated_area);
if (error > MAX_TOL)
continue;
printf
(
"center x: %d y: %d A: %d B: %d\n",
rect.x + A,
rect.y + B,
A,
B
);
CvScalar color = CV_RGB( rand() % 255, rand() % 255, rand() % 255 );
cvDrawContours( dst, contour, color, color, -1, CV_FILLED, 8, cvPoint(0,0));
}
cvSaveImage("coins.png", dst, 0);
}
}
Given the binary image that Carnieri provided, this is the output:
./opencv-contour.out coin-ohtsu.pbm
center x: 291 y: 328 A: 54 B: 42
center x: 286 y: 225 A: 46 B: 32
center x: 471 y: 221 A: 48 B: 33
center x: 140 y: 210 A: 42 B: 28
center x: 419 y: 116 A: 32 B: 19
And this is the output image:
What you could improve on:
Handle different ellipse orientations (currently, I assume the axes are perpendicular/horizontal). This would not be hard to do using image moments.
Check for object convexity (have a look at cvConvexityDefects)
Your best way of distinguishing coins from other objects is probably going to be by shape. I can't think of any other low-level image features (color is obviously out). So, I can think of two approaches:
Traditional object detection
Your first task is to separate the objects (coins and non-coins) from the background. Ohtsu's method, as suggested by Carnieri, will work well here. You seem to worry about the images being bipartite but I don't think this will be a problem. As long as there is a significant amount of desk visible, you're guaranteed to have one peak in your histogram. And as long as there are a couple of visually distinguishable objects on the desk, you are guaranteed your second peak.
Dilate your binary image a couple of times to get rid of noise left by thresholding. The coins are relatively big so they should survive this morphological operation.
Group the white pixels into objects using region growing -- just iteratively connect adjacent foreground pixels. At the end of this operation you will have a list of disjoint objects, and you will know which pixels each object occupies.
From this information, you will know the width and the height of the object (from the previous step). So, now you can estimate the size of the ellipse that would surround the object, and then see how well this particular object matches the ellipse. It may be easier just to use width vs height ratio.
Alternatively, you can then use moments to determine the shape of the object in a more precise way.
I don't know what the best method for your problem is. About thresholding specifically, however, you can use Otsu's method, which automatically finds the optimal threshold value based on an analysis of the image histogram. Use OpenCV's threshold method with the parameter ThresholdType equal to THRESH_OTSU.
Be aware, though, that Otsu's method work well only in images with bimodal histograms (for instance, images with bright objects on a dark background).
You've probably seen this, but there is also a method for fitting an ellipse around a set of 2D points (for instance, a connected component).
EDIT: Otsu's method applied to a sample image:
Grayscale image:
Result of applying Otsu's method:
If anyone else comes along with this problem in the future as I did, but using C++:
Once you have used findContours to find the contours (as in Misha's answer above), you can easily fit ellipses using fitEllipse, eg
vector<vector<Point> > contours;
findContours(img, contours, CV_RETR_TREE, CV_CHAIN_APPROX_SIMPLE, Point(0,0));
RotatedRect rotRecs[contours.size()];
for (int i = 0; i < contours.size(); i++) {
rotRecs[i] = fitEllipse(contours[i]);
}

Resources