Loop through all pixels and get/set individual pixel color in OpenGL? - macos

I wrote a little thingy with Processing that I now would like to make a Mac OS X Screen Saver of. However, diving in to OpenGL was not as easy as I thought it would be.
Basically, I want to loop through all the pixels on the screen and, based on each pixel's color, set another pixel's color.
The Processing code looks like this:
void setup(){
  size(500,500, P2D);
  frameRate(30);
  background(255);
}
void draw(){
  for(int x = 0; x<width; x++){
    for(int y = 0; y<height; y++){
      float xRand2 = x+random(2);
      float yRand2 = y+random(2);
      int xRand = int(xRand2);
      int yRand = int(yRand2);
      if(get(x,y) == -16777216){
        set(x+xRand, y+yRand, #FFFFFF);
      }
      else if(get(x,y) == -1){
        set(x+xRand, y+yRand, #000000);
      }
    }
  }
}
It's not very pretty, nor is it very efficient. However, I'd like to find out how to do something similar with OpenGL. I don't even know where to start.

The basic idea of OpenGL is that you never set the values of individual pixels manually, because that's often too slow. Instead you render triangles and do all kinds of tricks with them, like textures, blending, etc.
In order to freely program what each individual pixel does in OpenGL, you need to use a technique called shaders. And that's not very easy if you haven't done anything similar before. The idea of shaders is that GPU executes them instead of CPU, which results in very good performance and takes the load off from the CPU. But in your case it is probably a better idea to do it with CPU and not with shaders and OpenGL, as that approach is much easier to start with.
I recommend you use a library like SDL (or possibly glfw), which lets you do things with pixels without hardware acceleration. You can still do it with OpenGL too, though, by using the function glDrawPixels, which draws raw pixel data to the screen. But it's probably not very fast.
So start by reading some tutorials about SDL, for example.
Edit: If you want to use shaders, the difficulty with them (among other things) is that you can't specify the coordinates at which to set pixel values, and you can't read pixel values directly from the screen either. One way to do it with shaders would be the following:
Set up two textures: texture A and texture B
Bind one of the textures as a target that you render everything to
Bind another one of the textures as an input texture for the shader
Render a full-screen quadrangle using your shader and show the result on the screen
Swap textures A and B so that your previous result becomes your next input
Render again
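The same ping-pong idea can be sketched on the CPU, with two plain lists standing in for textures A and B. This is only an illustration in Python, not OpenGL code: `invert_step` plays the role of the shader, and both it and `run_ping_pong` are hypothetical names chosen for this sketch.

```python
def invert_step(src, dst):
    """Stand-in for the shader: write the inverse of each source pixel (0 or 255)."""
    for i, value in enumerate(src):
        dst[i] = 255 - value


def run_ping_pong(initial, step, frames):
    """Alternate between two buffers so each frame reads the previous result."""
    buf_a = list(initial)          # "texture A": holds the current image
    buf_b = [0] * len(initial)     # "texture B": receives the next image
    source, target = buf_a, buf_b
    for _ in range(frames):
        step(source, target)               # render: read source, write target
        source, target = target, source    # swap roles for the next frame
    return source                          # the most recently written buffer


result = run_ping_pong([0, 255, 0, 0], invert_step, frames=3)
```

The point of the swap is that a frame never reads from the buffer it is writing to, which is the constraint a shader imposes anyway.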

If you don't want to go the shader route, try doing all your pixel modification on the CPU in a 2D memory array, then use glDrawPixels every frame to push your pixels to the screen. It won't benefit much from hardware acceleration, but it might be fine for your purposes. Another thing to try is to use glTexImage2D to upload your new pixel data to a texture every frame and then render a textured quad covering the entire screen. I'm not sure which will be faster. My advice is to try these things before jumping into the complexity of shaders.

There are a few bugs in your code that make reverse engineering and porting it harder, and make me wonder if you actually posted the correct code. Assuming that the visual effect produced is what you want, here is a more efficient and more correct draw():
void draw() {
  loadPixels();
  for(int x = 0; x<width/2; x++) {
    for(int y = 0; y<height/2; y++) {
      int x_new = 2*x+int(random(2));
      int y_new = 2*y+int(random(2));
      if (x_new < width && y_new < height) {
        int dest_pixel = (y_new*width + x_new);
        color c = pixels[y*width+x];
        if(c == #FFFFFF){
          pixels[dest_pixel] = #000000;
        }
        else {
          pixels[dest_pixel] = #FFFFFF;
        }
      }
    }
  }
  updatePixels();
}
Note that the upper bounds of the loop are divided by two. As you wrote it, 3/4 of your set() calls were for pixels that are beyond the bounds of the window. The extra if is necessary because of the addition of small random values to the coordinates.
The overall effect of this code could be described as an in-place stretch and invert of the image, with a little bit of randomness thrown in. Because it's an in-place transformation, it can't be easily parallelized or accelerated, so you are best off implementing this as bitmap/texture operations on the CPU. You can do this without having to ever read pixels from the GPU, but you will have to push a screen full of pixels to the GPU each frame.
If you use glDrawPixels with a format argument of GL_LUMINANCE and a type argument of GL_UNSIGNED_BYTE, then you can pretty easily convert this code to operate on a byte array, which will keep the memory consumption down somewhat as compared with using 32-bit RGBA values.
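As a rough sketch of what the byte-array version might look like, here is the same loop in Python operating on a `bytearray` with one byte per pixel (0 for black, 255 for white). The function name `stretch_and_invert` and the injectable `rand` parameter are assumptions made for this example; the buffer it produces is the kind of data one would hand to glDrawPixels, but no OpenGL call is made here.

```python
import random


def stretch_and_invert(pixels, width, height, rand=random.random):
    """One frame of the effect on a row-major bytearray of 0 (black) / 255 (white)."""
    for x in range(width // 2):
        for y in range(height // 2):
            x_new = 2 * x + int(rand() * 2)   # same role as int(random(2))
            y_new = 2 * y + int(rand() * 2)
            if x_new < width and y_new < height:
                source = pixels[y * width + x]
                # White sources write black; everything else writes white.
                pixels[y_new * width + x_new] = 0 if source == 255 else 255


frame = bytearray([255] * (4 * 4))   # a 4x4 all-white image, one byte per pixel
# A fixed "random" value makes this run repeatable.
stretch_and_invert(frame, 4, 4, rand=lambda: 0.0)
```

Because the transformation reads and writes the same buffer, the visit order matters, which is why the loop nesting is kept identical to the Processing version.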

Related

OpenCL crash when calling finish()

I am writing an OpenCL app on mac using c++, and it crashes in certain cases depending on the work size.
The program crashes due to a SIGABRT.
Is there any way to get more information about the error?
Why is SIGABRT being raised? Can I catch it?
EDIT:
I realize that this program is a doozie, however I will try to explain it in case anyone would like to take a stab at it.
Through debugging I discovered that the cause of the SIGABRT was one of the kernels timing out.
The program is a tile-based 3D renderer. It is an OpenCL implementation of this algorithm: https://github.com/ssloy/tinyrenderer
The screen is divided into 8x8 tiles. One of the kernels (the tiler) computes which polygons overlap each tile, storing the results in a data structure called tilePolys. A subsequent kernel (the rasterizer), which runs one work item per tile, iterates over the list of polys occupying the tile and rasterizes them.
The tiler writes to an integer buffer which is a list of lists of polygon indices. Each list is of a fixed size (polysPerTile + 1 for the count) where the first element is the count and the subsequent polysPerTile elements are indices of polygons in the tile. There is one such list per tile.
For some reason in certain cases the tiler writes a very large poly count (13172746) to one of the tile's lists in tilePolys. This causes the rasterizer to loop for a long time and time out.
The strange thing is that the index to which the large count is written is never accessed by the tiler.
The code for the tiler kernel is below:
// this kernel is executed once per polygon
// it computes which tiles are occupied by the polygon and adds the index of the polygon to the list for that tile
kernel void tiler(
// number of polygons
ulong nTris,
// width of screen
int width,
// height of screen
int height,
// number of tiles in x direction
int tilesX,
// number of tiles in y direction
int tilesY,
// number of pixels per tile (tiles are square)
int tileSize,
// size of the polygon list for each tile
int polysPerTile,
// 4x4 matrix representing the viewport
global const float4* viewport,
// vertex positions
global const float* vertices,
// indices of vertices
global const int* indices,
// array of array-lists of polygons per tile
// structure of list is an int representing the number of polygons covering that tile,
// followed by [polysPerTile] integers representing the indices of the polygons in that tile
// there are [tilesX*tilesY] such arraylists
volatile global int* tilePolys)
{
size_t faceInd = get_global_id(0);
// compute vertex position in viewport space
float3 vs[3];
for(int i = 0; i < 3; i++) {
// indices are vertex/uv/normal
int vertInd = indices[faceInd*9+i*3];
float4 vertHomo = (float4)(vertices[vertInd*4], vertices[vertInd*4+1], vertices[vertInd*4+2], vertices[vertInd*4+3]);
vertHomo = vec4_mul_mat4(vertHomo, viewport);
vs[i] = vertHomo.xyz / vertHomo.w;
}
float2 bboxmin = (float2)(INFINITY,INFINITY);
float2 bboxmax = (float2)(-INFINITY,-INFINITY);
// size of screen
float2 clampCoords = (float2)(width-1, height-1);
// compute bounding box of triangle in screen space
for (int i=0; i<3; i++) {
for (int j=0; j<2; j++) {
bboxmin[j] = max(0.f, min(bboxmin[j], vs[i][j]));
bboxmax[j] = min(clampCoords[j], max(bboxmax[j], vs[i][j]));
}
}
// transform bounding box to tile space
int2 tilebboxmin = (int2)(bboxmin[0] / tileSize, bboxmin[1] / tileSize);
int2 tilebboxmax = (int2)(bboxmax[0] / tileSize, bboxmax[1] / tileSize);
// loop over all tiles in bounding box
for(int x = tilebboxmin[0]; x <= tilebboxmax[0]; x++) {
for(int y = tilebboxmin[1]; y <= tilebboxmax[1]; y++) {
// get index of tile
int tileInd = y * tilesX + x;
// get start index of polygon list for this tile
int counterInd = tileInd * (polysPerTile + 1);
// get current number of polygons in list
int numPolys = atomic_inc(&tilePolys[counterInd]);
// if list is full, skip tile
if(numPolys >= polysPerTile) {
// decrement the count because we will not add to the list
atomic_dec(&tilePolys[counterInd]);
} else {
// otherwise add the poly to the list
// the index is the offset + numPolys + 1 as tilePolys[counterInd] holds the poly count
int ind = counterInd + numPolys + 1;
tilePolys[ind] = (int)(faceInd);
}
}
}
}
My theories are that either:
I have incorrectly implemented the atomic functions for reading and incrementing the count
I am using an incorrect number format causing garbage to be written into tilePolys
One of my other kernels is inadvertently writing into the tilePolys buffer
I do not think it is the last one though because if instead of writing faceInd to tilePolys, I write a constant value, the large poly count disappears.
tilePolys[counterInd+numPolys+1] = (int)(faceInd); // this is the problem line
tilePolys[counterInd+numPolys+1] = (int)(5); // this fixes the issue
It looks like your kernel is crashing on the GPU itself. You can't really get any extra diagnostics about that directly, at least not on macOS. You'll need to start narrowing down the problem. Some suggestions:
As the crash is currently happening in clFinish() you don't know what asynchronous command is causing the crash. Try switching all your enqueue calls to blocking mode. This should cause it to crash in the call that's actually going wrong.
Check return/error codes on all OpenCL API calls. Sometimes, ignoring an error from an earlier call can cause problems in a later call which relies on earlier results. For example, if creating a buffer fails, passing the result of that buffer creation as a kernel argument will cause problems when trying to run the kernel.
The most likely reason for the crash is that your OpenCL kernel is accessing memory out of bounds or is otherwise misusing pointers. Re-check any array index calculations.
Check if the problem occurs with smaller work batches. Scale up from one workgroup (or work item if not using groups) and see if it only occurs beyond a certain work size. This may give you a clue about buffer sizes and array indices that might be causing the crash.
Systematically comment out parts of your kernel. If the crash goes away if you comment out a specific piece of code, there's a good chance the problem is in that code.
If you've narrowed the problem down to a small area of code but can't work out where it's coming from, start recording diagnostic output to check that variables have the values you're expecting.
Without seeing any code, I can't give you any more specific advice than that.
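As one illustration of the diagnostic-output suggestion: if a buffer is laid out as one fixed-size list per tile (a count followed by index slots, as the question describes for tilePolys), a copy read back to the host can be checked for values that break the layout. This is a sketch in Python under that assumption; `check_tile_lists` is a hypothetical helper, and the readback itself is not shown.

```python
def check_tile_lists(tile_polys, tiles_x, tiles_y, polys_per_tile, n_tris):
    """Return (tile index, problem) pairs for lists that break the layout rules."""
    stride = polys_per_tile + 1          # one count slot plus the index slots
    problems = []
    if len(tile_polys) != tiles_x * tiles_y * stride:
        problems.append((-1, "buffer length does not match tile layout"))
        return problems
    for tile in range(tiles_x * tiles_y):
        base = tile * stride
        count = tile_polys[base]
        if not 0 <= count <= polys_per_tile:
            problems.append((tile, f"count {count} out of range"))
            continue
        for slot in range(count):
            index = tile_polys[base + 1 + slot]
            if not 0 <= index < n_tris:
                problems.append((tile, f"polygon index {index} out of range"))
    return problems


# Two tiles with two slots each; the second tile carries the kind of count seen in the question.
buffer_copy = [1, 0, 0, 13172746, 0, 0]
report = check_tile_lists(buffer_copy, tiles_x=2, tiles_y=1, polys_per_tile=2, n_tris=4)
```

Running such a check after each kernel in turn can show which kernel first leaves the buffer in a bad state.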
Note that OpenCL is deprecated on macOS, so if you're specifically targeting that platform and don't need to support Linux, Windows, etc. I recommend learning Metal Compute instead. Apple has made it clear that this is the GPU programming platform they want to support, and the tooling for it is already much better than their OpenCL tooling ever was.
I suspect Apple will eventually stop implementing OpenCL support when they release a Mac with a new type of GPU, so even if you're targeting the Mac as well as other platforms, you will probably need to switch to Metal on the Mac somewhere down the line anyway. As of macOS 10.14, the minimum system requirements of the OS already include a Metal-capable GPU, so you only need OpenCL as a fallback if you wish to support all Mac models able to run 10.13 or an even older OS version.

outlining text in processing

My goal is to obtain an outline of text that is 1 pixel wide.
It could look something like this: https://jsfiddle.net/Lk1ju9yw/
I can't think of a good way to go about this so I did the following (in pseudocode):
PImage img;
void setup() {
size(400, 400);
// use text() to write on the canvas
// initialize PImage img
// load pixels for canvas and img
// loop thru canvas pixels and look for contrast
for (int x = 0; x < width; x++) {
for (int y = 0; y < height; y++) {
// compare canvas pixels at x-y with its neighbors
// change respective pixel on PImage img so as not to disturb canvas
}
}
// update pixels and draw img over the canvas
img.updatePixels();
image(img, 0, 0);
}
In a nutshell, I wrote white text on a black background on the canvas, did some edge detection and drew the results on a PImage, then used the PImage to store the results. I guess I could have skipped the PImage phase but I wanted to see what the edge detection algorithm produced.
So this does a decent job of getting the outline but there are some problems:
The outline is sometimes 1+ pixels wide. This is a problem. Suppose I want to store the outline (ie. all the positions of the white pixels) in an ArrayList.
For example, if using the ArrayList I draw an ellipse at EVERY point along the outline, the result is ok. But if I want the ellipses spaced apart, the ellipse-outline becomes kind of rough. In the fiddle I provided, the left edge of the letter 'h' is 2 pixels wide. Sometimes the ellipse will be drawn at the inner pixel, sometimes at the outer. That kind of thing makes it look ugly.
Elements of the ArrayList might be neighbors in the ArrayList, but not on the PImage. If I want to draw a circle for every 10th ArrayList location, the result won't necessarily be spaced apart on the PImage.
Here is an example of how ugly it can be: https://jsfiddle.net/Lk1ju9yw/1/
I am quite sure I understand why this is happening. I just don't know how to avoid it.
I also believe there is a solution (a PFont method) in p5.js. I am comfortable using p5 but unless I have to (let's say, because of difficulty), I would rather use processing. I've also heard of some libraries in processing that can help with this. Partly, I am interested in the result, but I am also interested in learning if I can program a solution myself (with some guidance, that is).
You can get an outline of text very easily in P5.js, because text honors the fill and stroke colors. So if you call noFill() the text will not be filled in, and if you call stroke(0) the text will have a black outline.
function setup() {
createCanvas(400, 200);
noSmooth();
}
function draw() {
background(220);
textSize(72);
textAlign(CENTER);
noFill();
stroke(0);
text("hey", width/2, height/2);
}
<script src="https://cdnjs.cloudflare.com/ajax/libs/p5.js/0.5.16/p5.js"></script>
Unfortunately this approach won't work in regular Processing, because it just uses the fill color for text. I'm not totally sure about Processing.js, but my guess is it's the same as Processing.
If you draw this to a buffer (using createGraphics()), then you can iterate over the buffer to get a list of points that make up your outline.
Now, as for putting the points in the correct order, you're going to have to do that yourself. The first approach that occurs to me is to sort them and group them by letter.
For example, your algorithm might be:
1. Find the upper-left-most point. Add it to your list.
2. Does that point you just added have any neighbors? If so, pick one and add it to your list. Repeat this step until the point has no neighbors.
3. Are there any points left? If so, find the point closest to the one you just added, and add it to your list. Go to step 2.
This might not be perfect, but if you want something more advanced you might have to start thinking about processing the list of points: maybe removing points that have a left neighbor, for example. You're going to have to play around to find the effect you're looking for.
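The three steps above could be sketched roughly as follows. This is Python rather than Processing, purely to show the logic; `order_outline` and `OFFSETS` are names invented for this sketch, and preferring edge-adjacent neighbors over diagonal ones is an assumption of mine, since the steps only say to "pick one".

```python
# Edge-adjacent offsets first, then diagonals (an assumed preference).
OFFSETS = [(1, 0), (0, 1), (-1, 0), (0, -1), (1, 1), (-1, 1), (-1, -1), (1, -1)]


def order_outline(points):
    """Order (x, y) pixels by walking neighbors, jumping to the nearest leftover point."""
    remaining = set(points)
    if not remaining:
        return []
    # Step 1: start at the upper-left-most point (smallest y, then smallest x).
    current = min(remaining, key=lambda p: (p[1], p[0]))
    remaining.remove(current)
    ordered = [current]
    while remaining:
        cx, cy = current
        # Step 2: take the first unvisited neighbor, if there is one.
        for dx, dy in OFFSETS:
            candidate = (cx + dx, cy + dy)
            if candidate in remaining:
                current = candidate
                break
        else:
            # Step 3: no neighbor left, so jump to the closest remaining point.
            current = min(remaining,
                          key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2)
        remaining.remove(current)
        ordered.append(current)
    return ordered


# A 3x3 square ring (one "letter") plus a separate pixel far to the right.
ring = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
path = order_outline(ring + [(10, 1)])
```

With the points ordered this way, taking every 10th entry gives shapes that are spaced along the outline rather than scattered across it, although outlines thicker than one pixel will still produce detours.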
This was an interesting question, thanks for that. Good luck, it sounds like a fun project.

Do I need to move the tiles or the player in a 2d tile world?

I'm currently creating a 2D tile game and I'm wondering whether the tiles have to move or the character.
I ask this question because I already have created the "2d tile map" but it is running too slow, and I can't fix it. I tried everything now and the result is that I get 30 fps.
The reason that it is running too slow is because every 1ms with a timer, the tiles are being redrawn. But I can't figure out how to fix this problem.
This is how I make the map :
public void makeBoard()
{
for (int i = 0; i < tileArray.GetLength(0); i++)
{
for (int j = 0; j < tileArray.GetLength(1); j++)
{
tileArray[i, j] = new Tile() { xPos = j * 50, yPos = i * 50 };
}
}
}
Here I redraw each 1ms or higher the tiles and sprites :
private void Wereld_Paint_1(object sender, PaintEventArgs e)
{
//label1.Text = k++.ToString();
using (Graphics grap = Graphics.FromImage(bmp))
{
for (int i = 0; i < tileArray.GetLength(0); i++)
{
for (int j = 0; j < tileArray.GetLength(1); j++)
{
grap.DrawImage(tileArray[i, j].tileImage, j * 50, i * 50, 50, 50);
}
}
grap.DrawImage(player.movingObjectImage, player.xPos, player.yPos, 50, 50);
grap.DrawImage(enemyGoblin.movingObjectImage, enemyGoblin.xPos, enemyGoblin.yPos, 50, 50);
groundPictureBox.Image = bmp;
// grap.Dispose();
}
}
This is the Timer with a specific interval :
private void UpdateTimer_Tick(object sender, EventArgs e)
{
if(player.Update()==true) // true keydown event is fired
{
this.Invalidate();
}
label1.Text = lastFrameRate.ToString(); // for fps rate show
CalculateFrameRate(); // for fps rate show
}
Are you writing the tile implementation yourself? Probably the issue is that at every frame you're drawing all tiles.
2D engines with scrolling tiles should draw the tiles onto a sprite larger than the screen, then draw that sprite at the scroll offset, which is a fast operation (you'd need to specify the language you're using so I can provide some hints on how to actually make that fast; basically it's an accelerated blit in video memory, but every language has its own way to make it happen).
When the border of this super-sprite gets closer to the screen border than a threshold (usually half a tile), the larger sprite is redrawn around the current position. But there is no need to draw all the tiles on it! Start by copying the old super-sprite onto the recentered sprite; then you only need to draw the tiles that were missing from the previous super-sprite because of the offset.
As mentioned in the comments your concept is wrong. So here's just a simple summary of how to do this task:
Tile map is static
From a functional point of view it does not matter whether the player moves or the map does, but from a performance point of view the number of tiles is hugely bigger than the number of players, so moving the player is faster.
To achieve player centered or follow views you have to move the camera too.
rendering
Repainting every 1ms is insane and most likely impossible on today's computers if the scene is of medium complexity. Human vision can't detect it anyway, so there is no point in repainting at more than 25-40 fps. The only reason for a higher fps is to be synchronized with your monitor's refresh to avoid scan line artifacts (even LCDs refresh in scan lines). Having more fps than the refresh rate of your monitor is pointless (many FPS players would object, but our perception is what it is no matter what they say).
Anyway, if your rendering takes more than 1ms (which is more than likely), then your timer is screwed, because it would have to fire several times before the first handler even finishes. That usually causes massive slowdowns due to synchronisation problems, so the resulting fps is usually even lower than what the rendering engine could provide. So how to remedy that?
set timer interval to 20ms or more
add bool _redraw=false
And use it to redraw only when you need to repaint screen. So on any action like player movement, camera movement or turn, animation change set it to true
inside timer event handler call your repaint only if _redraw==true and set it to false afterwards.
This will boost performance a lot. Even if your repaint takes longer than the timer interval, this will still be much, much faster than your current approach.
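The redraw-flag idea is independent of the GUI toolkit, so here is a minimal sketch of it in Python rather than C#. The class `RedrawGate` and its method names are hypothetical; in a WinForms program the `paint` callback would correspond to invalidating the control, and `on_timer_tick` to the timer's Tick handler.

```python
class RedrawGate:
    """Tracks whether anything changed since the last repaint."""

    def __init__(self):
        self.redraw = False     # plays the role of `_redraw` in the steps above
        self.paint_count = 0

    def mark_dirty(self):
        """Call on player movement, camera movement, animation change, etc."""
        self.redraw = True

    def on_timer_tick(self, paint):
        """Timer handler: repaint only when something asked for it."""
        if self.redraw:
            paint()
            self.paint_count += 1
            self.redraw = False   # reset afterwards, as described above


gate = RedrawGate()
painted = []
for tick in range(5):
    if tick in (1, 3):               # pretend the player moved on ticks 1 and 3
        gate.mark_dirty()
    gate.on_timer_tick(lambda: painted.append(tick))
```

Out of five timer ticks only the two that followed a change trigger a repaint; the other ticks cost almost nothing.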
To avoid flickering use Back buffering.
camera and clipping
Your map is most likely much bigger than the screen, so there is no point in repainting all the tiles. You can look at the camera as a means to select the right part of your map. If your game does not use rotations then you need just a position and maybe a zoom/scale. If you want rotations then 2D 3x3 homogeneous matrices are the way.
Let's assume you have only a position (no zoom or rotation); then you can use these transformations:
screen_x=world_x-camera_x
screen_y=world_y-camera_y
world_x=screen_x+camera_x
world_y=screen_y+camera_y
So camera is your camera view position, world is your tile position in the map grid and screen is the position on screen. If you have the indexes of your tile in the map, just multiply them by the tile size in pixels to obtain the world coordinates.
To select only the visible tiles you need to obtain the corner positions of your screen, convert them into world coordinates, then into indexes in the map, and finally render only the tiles inside the rectangle that these points form in your map plus some margin of error (for example, render a rectangle enlarged by 1 tile in all directions). This way the rendering time will be independent of your map size. This process is called clipping.
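The clipping computation can be sketched as a small function. This is an illustrative Python version assuming integer pixel coordinates, square tiles and a camera position that marks the top-left corner of the view; the names `visible_tile_range` and `world_to_screen` are made up for this example.

```python
def visible_tile_range(camera_x, camera_y, screen_w, screen_h,
                       tile_size, map_cols, map_rows, margin=1):
    """Return (first_col, last_col, first_row, last_row), inclusive, of tiles to draw."""
    # Screen corners converted to world coordinates: world = screen + camera.
    left, top = camera_x, camera_y
    right, bottom = camera_x + screen_w - 1, camera_y + screen_h - 1
    # World coordinates to tile indices, widened by the margin and clamped to the map.
    first_col = max(0, left // tile_size - margin)
    last_col = min(map_cols - 1, right // tile_size + margin)
    first_row = max(0, top // tile_size - margin)
    last_row = min(map_rows - 1, bottom // tile_size + margin)
    return first_col, last_col, first_row, last_row


def world_to_screen(world_x, world_y, camera_x, camera_y):
    """screen = world - camera, as in the formulas above."""
    return world_x - camera_x, world_y - camera_y


# 800x600 window over a 100x100 map of 50-pixel tiles, camera at (1230, 475).
cols_rows = visible_tile_range(1230, 475, 800, 600, 50, 100, 100)
```

In this example only 19 x 15 tiles are drawn out of 10,000, and that count stays the same however large the map grows.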
I strongly recommend to look at these related QAs:
Improving performance of click detection on a staggered column isometric grid
2D Diamond (isometric) map editor ... read the comments there !!!
The demo in the linked QAs use only GDI and direct pixel access to bitmaps in win32 form app so you can compare performance with your code (they should be similar) and tweak your code until it behaves as should.

how can I implement a slow smooth background scrolling in sdl

I am trying to implement background scrolling using SDL 2.
As far as I understand one can only move source rectangle by an integer value.
My scrolling works fine when I move it by one every iteration of the game loop.
But I want to move it slower. I tried to move it using this code
moved += speed;
if (moved >= 1.0) {
    ++src_rect.x;
    moved -= 1;
}
Here moved and speed are doubles. I want my background to move about ten times slower, so I set speed to 0.1. It does move ten times slower, but the animation is no longer smooth. It kind of jumps from one pixel to another, which looks and feels ugly when the speed is low.
I am thinking of making my background larger and scrolling it using an integer. Maybe when background is large enough the speed of 1 will seem slower.
Is there a way to scroll a not very large background slowly and smoothly at the same time?
Thanks.
What I would do is have a set of floats that track the virtual screen position, and cast the floats to integers only when you actually render. That way you never lose the precision of the floats.
To give you an example: I have an SDL_Rect that I want to move every frame. I keep two floating point variables that track the x and y position of the rect. Every frame I update those x and y positions, cast them to integers, and then render the rect. For example:
// Rect position
float XPos = 0.0f;
float YPos = 0.0f;
SDL_Rect rect = {0, 0, 64, 64};
// Update virtual positions
XPos += 20.0f * DeltaTime;
YPos += 20.0f * DeltaTime;
// Move rect down and to the right
rect.x = (int)XPos;
rect.y = (int)YPos;
While this doesn't give you the exact result you want, it is the only way I know of to do this. It lets you pace your movement more precisely without that ugly chunkiness, and it also lets you add things like more precise acceleration. Hope this helps.
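To make the float-then-truncate pattern concrete, here is a small Python simulation of it (not SDL code). The function name `scroll_positions` and the fixed frame time are assumptions for this sketch; in a real loop the delta time would be measured each frame.

```python
def scroll_positions(speed_px_per_s, frame_dt, frames):
    """Integer x offsets used for rendering, from a float position updated every frame."""
    x_pos = 0.0                      # precise position, never rounded
    rendered = []
    for _ in range(frames):
        x_pos += speed_px_per_s * frame_dt   # scale movement by elapsed time
        rendered.append(int(x_pos))          # truncate only at render time
    return rendered


# 6 pixels per second at 4 frames per second: the offset advances 1.5 px per frame.
offsets = scroll_positions(6.0, 0.25, 8)
```

The rendered offsets alternate between steps of one and two pixels, so the average speed stays exact even though each drawn position is a whole pixel. At speeds below one pixel per frame the drawn position still changes in whole-pixel steps, which is the limitation acknowledged above.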

What algorithms or approaches apart from Haar cascades could be used for custom objects detection?

I need to do computer vision tasks in order to detect water bottles or soda cans. I will obtain 'frontal' images of bottles, soda cans or any other random objects (one by one) and my algorithm should determine whether it's a bottle, a can or neither of them.
Some details about object detecting scenario:
As mentioned, I will test one single object per image/video frame.
Not all water bottles are the same. There could be variation in plastic color, lid or label. Some may have no label or lid.
The same variation applies to soda cans. No wrinkled soda cans are going to be tested, though.
There could be small size variation between objects.
I could have a green (or any custom color) background.
I will do any needed filters on image.
This will be run on a Raspberry Pi.
Just in case, an example of each:
I've tested OpenCV face detection algorithms a couple of times and I know they work pretty well, but with this approach I'd need to obtain a special Haar cascade features XML file for detecting each custom object.
So, the distinct alternatives I have in mind are:
Creating a custom Haar Classifier.
Considering shapes.
Considering outlines.
I'd like to get a simple algorithm and I think creating a custom Haar classifier might not even be needed. What would you suggest?
Update
I strongly considered the shape/aspect ratio approach.
However, I'm facing some issues, as bottles come in different sizes and even shapes. But this led me to the following considerations:
I'm applying a threshold with THRESH_BINARY method. (Thanks to the answers).
I will use a white background on detection.
Soda cans are all same size.
So, a bounding box for soda cans with high accuracy might distinguish a can.
What I've achieved:
Threshold really helped me, I could notice that on white background tests I would obtain for cans:
And this is what it's obtained for bottles:
So, the dominance of darker areas is noticeable. There are some cases with cans where this might lead to false negatives. And for bottles, light and angle may lead to inconsistent results, but I really think this could be a shorter approach.
So, I'm quite confused now about how I should evaluate that dark-area dominance. I've read that findContours can help with it, but I'm quite lost on how to make use of that function. For example, in the case of soda cans, it may find several contours, so I'm not sure which to evaluate.
Note: I'm open to test any other algorithms or libraries distinct to Open CV.
I see a few basic ideas here:
Check the height/width ratio of the object (to be precise, of the object's bounding rect). For a can it's approximately 2-2.5, for a bottle I think it will be >3. It's a very simple idea, so it should be easy to test quickly, and I think it should have quite good accuracy. For in-between values, like 2.75 (assuming that the values I gave are correct, which most likely isn't true), you can use a different algorithm.
Check whether your object contains glass/transparent regions; if yes, then it's definitely a bottle. Here you can read more about it.
Use the grabcut algorithm to get the object mask/a more precise shape and check whether the shape's width at the top is similar to its width at the bottom; if yes, then it's a can, if not, a bottle (bottles have a screw cap at the top).
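The first idea can be tried without any library once a binary mask of the object is available. The sketch below is plain Python on a list-of-lists mask, reading the ratio as height divided by width so that taller objects give larger values. The function names and the thresholds (taken from the rough figures in the first idea) are placeholders, not measured values.

```python
def bounding_box_ratio(mask):
    """Height/width of the bounding box of non-zero pixels, or None for an empty mask."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    if not rows or not cols:
        return None
    height = rows[-1] - rows[0] + 1
    width = cols[-1] - cols[0] + 1
    return height / width


def classify_by_ratio(ratio, can_max=2.5, bottle_min=3.0):
    """Placeholder thresholds; real values would have to come from sample images."""
    if ratio is None:
        return "nothing"
    if ratio <= can_max:
        return "can"
    if ratio >= bottle_min:
        return "bottle"
    return "unsure"   # in-between values would need a second test


# A 2-wide, 4-tall blob inside a 6x6 mask.
mask = [[0] * 6 for _ in range(6)]
for y in range(1, 5):
    for x in range(2, 4):
        mask[y][x] = 1
label = classify_by_ratio(bounding_box_ratio(mask))
```

The "unsure" band is where one of the other two ideas (transparency or top versus bottom width) would take over.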
Since you want to recognize can vs bottle rather than pepsi vs coke, shape matching is probably the way to go when compared to Haar and the features2d matchers like SIFT/SURF/ORB
A unique background color will make things easier.
First create a histogram from an image of just the background
int channels[] = {0,1,2}; // use all the channels
int rgb_bins = 32; // quantize to 32 colors per channel
int histSize[] = {rgb_bins, rgb_bins, rgb_bins};
float _range[] = {0,256}; // the upper bound is exclusive
const float* ranges[] = {_range, _range, _range}; // calcHist expects const float**
cv::SparseMat bghist;
cv::calcHist(&bg_image, 1, channels, cv::noArray(),bghist, 3, histSize, ranges );
Then use calcBackProject to create a mask of bg and not bg
cv::MatND temp_ND;
cv::calcBackProject( &bottle_image, 1, channels, bghist, temp_ND, ranges );
cv::Mat bottle_mask, bottle_backproj;
if( feeling_lazy ){
cv::normalize(temp_ND, bottle_backproj, 0, 255, cv::NORM_MINMAX, CV_8U);
//a small blur here could work nicely
threshold( bottle_backproj, bottle_mask, 0, 255, THRESH_OTSU );
bottle_mask = cv::Scalar(255) - bottle_mask; //invert the mask
} else {
//finding just the right value here might be better than the above method
int magic_threshold = 64;
temp_ND.convertTo( bottle_backproj, CV_8U, 255.);
//I expect temp_ND to be CV_32F ranging from 0-1, but I might be wrong.
threshold( bottle_backproj, bottle_mask, magic_threshold, 255, THRESH_BINARY_INV );
}
Then either:
Compare bottle_mask or bottle_backproj to a few sample bottle masks/backprojections using matchTemplate with a threshold on confidence to decide if it's a match.
matchTemplate(bottle_mask, bottle_template, result, CV_TM_CCORR_NORMED);
double confidence; minMaxLoc( result, NULL, &confidence);
Or use matchShapes, though I've never gotten this to work properly.
double confidence = matchShapes(bottle_mask, bottle_template, CV_CONTOURS_MATCH_I3);
Or use linemod which is difficult to set up but works great for images like this where the shape isn't very complex. Aside from the linked file, I haven't found any working samples of this method so here's what I did.
First create/train the detector with some sample images
//some magic numbers
std::vector<int> T_at_level;
T_at_level.push_back(4);
T_at_level.push_back(8);
//add some padding so linemod doesn't scream at you
const int T = 32;
int width = bottle_mask.cols;
if( width % T != 0)
width += T - width % T;
int height = bottle_mask.rows;
if( height % T != 0)
height += T - height % T;
//in this case template_backproj is created specifically from a sample bottle_backproj
cv::Rect padded_roi( (width - template_backproj.cols)/2, (height - template_backproj.rows)/2, template_backproj.cols, template_backproj.rows);
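cv::Mat padded_backproj = cv::Mat::zeros( height, width, template_backproj.type()); //Mat::zeros takes rows, then cols
template_backproj.copyTo( padded_backproj( padded_roi ) ); //plain assignment to an ROI would not copy the pixels
cv::Mat padded_mask = cv::Mat::zeros( height, width, template_mask.type());
template_mask.copyTo( padded_mask( padded_roi ) );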
//you might need to erode padded_mask by a few pixels.
//initialize detector
std::vector< cv::Ptr<cv::linemod::Modality> > modalities;
modalities.push_back( cv::makePtr<cv::linemod::ColorGradient>() ); //for those that don't have a kinect
cv::Ptr<cv::linemod::Detector> new_detector = cv::makePtr<cv::linemod::Detector>(modalities, T_at_level);
//add sample images to the detector
std::vector<cv::Mat> template_images;
template_images.push_back( padded_backproj);
cv::Rect ignore_me;
const std::string class_id = "bottle";
int template_id = new_detector->addTemplate(template_images, class_id, padded_mask, &ignore_me);
Then do some matching
std::vector<cv::Mat> sources_vec;
sources_vec.push_back( padded_backproj );
//padded_backproj doesn't need to be the same size as the trained template images, but it does need to be padded the same way.
float matching_threshold = 0.8; //a higher number makes the algorithm faster
std::vector<cv::linemod::Match> matches;
std::vector<cv::String> class_ids;
new_detector->match(sources_vec, matching_threshold, matches,class_ids);
float confidence = matches.size() > 0? matches[0].similarity : 0;
As cyriel suggests, the aspect ratio (width/height) might be one useful measure. Here is some OpenCV Python code that finds contours (hopefully including the outline of the bottle or can) and gives you aspect ratio and some other measurements:
import cv2

# src image should have already had some contrast enhancement (such as
# cv2.threshold) and edge finding (such as cv2.Canny)
# Note: OpenCV 3.x returns three values here (image, contours, hierarchy).
contours, hierarchy = cv2.findContours(src, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for contour in contours:
    num_points = len(contour)
    if num_points < 5:
        # The contour has too few points to fit an ellipse. Skip it.
        continue
    # We could use area to help determine the type of object.
    # Small contours are probably false detections (not really a whole object).
    area = cv2.contourArea(contour)
    bounding_ellipse = cv2.fitEllipse(contour)
    center, radii, angle_degrees = bounding_ellipse
    # Let's define an ellipse's normal orientation to be landscape (width > height).
    # We must ensure that the ellipse's measurements match this orientation.
    if radii[0] < radii[1]:
        radii = (radii[1], radii[0])
        angle_degrees -= 90.0
    # We could use the angle to help determine the type of object.
    # A bottle or can's angle is probably approximately a multiple of 90 degrees,
    # assuming that it is at rest and not falling.
    # Calculate the aspect ratio (width / height).
    # After the swap above this is always >= 1. For example, 2.0 means the
    # ellipse's long axis is 2 times its short axis.
    # A bottle is probably more elongated than a can.
    aspect_ratio = radii[0] / radii[1]
For checking transparency, you can compare the picture to a known background using histogram analysis or background subtraction.
The contour's moments can be used to determine its centroid (center of gravity):
moments = cv2.moments(contour)
m00 = moments['m00']
m01 = moments['m01']
m10 = moments['m10']
centroid = (m10 / m00, m01 / m00)
You could compare this to the center. If the object is bigger ("heavier") on one end, the centroid will be closer to that end than the center is.
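The centroid-versus-center comparison can be illustrated without OpenCV. The sketch below computes the raw moments by summing over the filled pixels of a binary mask, which is a simplification: cv2.moments on a contour works from the polygon outline instead, so the numbers will differ slightly on real data. `centroid_and_center` is a hypothetical helper name.

```python
def centroid_and_center(mask):
    """Centroid from raw moments and the bounding-box center of a binary mask."""
    m00 = m10 = m01 = 0
    xs, ys = [], []
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                m00 += 1      # area
                m10 += x      # sum of x coordinates
                m01 += y      # sum of y coordinates
                xs.append(x)
                ys.append(y)
    if m00 == 0:
        return None
    centroid = (m10 / m00, m01 / m00)
    center = ((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2)
    return centroid, center


# A bottle-like silhouette: narrow neck (rows 0-1) on top of a wide body (rows 2-5).
silhouette = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]
(cx, cy), (bx, by) = centroid_and_center(silhouette)
```

Here the centroid sits below the bounding-box center (a larger y value means lower in the image), reflecting the wider body; for a can-like rectangle the two points would coincide.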
So, my main approach for detection was:
Bottles are transparent and cans are opaque
In general, the algorithm consisted of:
Take a grayscale picture.
Apply a binary threshold.
Select a convenient ROI from it.
Obtain its color mean and even the standard deviation.
Distinguish.
Implementation was basically reduced to this function (where CAN and BOTTLE were previously defined):
int detector(int x, int y, int width, int height, int thresholdValue, CvCapture* capture) {
Mat img;
Rect r;
vector<Mat> channels;
r = Rect(x,y,width,height);
if ( !capture ) {
fprintf( stderr, "ERROR: capture is NULL \n" );
getchar();
return -1;
}
img = Mat(cvQueryFrame( capture ));
cvtColor(img,img,CV_RGB2GRAY);
threshold(img, img, 127, 255, THRESH_BINARY);
// ROI
Mat roiImage = img(r);
split(roiImage, channels);
Scalar m = mean(channels[0]);
float media = m[0];
printf("Media: %f\n", media);
if (media < thresholdValue) {
return CAN;
}
else {
return BOTTLE;
}
}
As can be seen, a THRESH_BINARY threshold was applied, and a plain white background was used. However, the main and critical issue I faced with this whole approach was changes in ambient luminosity, even minor ones.
Sometimes I noticed that THRESH_BINARY_INV might help more, but I wonder whether certain threshold parameters or other filters could eliminate ambient lighting as an issue.
I really appreciate the approach of calculating the aspect ratio from a bounding box or from contours, but I found this one straightforward and simple when conditions were adjusted.
I'd use deep learning, based on transfer learning.
The idea is this: given a highly complex, well-trained neural network that was trained on a similar classification task (typically on a large public dataset, like ImageNet), you can freeze the majority of its weights and only train the last layers. There are lots of tutorials out there. You don't need to have a background in deep learning.
There is a tutorial which is almost out of the box with tensorflow here and here there is another based on keras.

Resources