I cannot get my head around the exact meaning of the AllocColorPlanes and AllocColorCells requests. I get that they allocate colors, which I then refer to by pixel value, and I understand in general terms what a plane mask is, as described in the Glossary section of the documentation. What I do not understand is why plane masks are returned when allocating colors. I tried to find the answer in the X protocol documentation, the Xlib documentation, various man pages, and code using XCB/Xlib, but none provided a meaningful answer.


Removing skew/distortion based on known dimensions of a shape

I have an idea for an app that takes a printed page with a square in each of its four corners and lets you measure objects on the paper, as long as at least two squares are visible. I want a user to be able to take a picture from a less than perfect angle and still have the objects measured accurately.
I'm unable to figure out exactly how to find information on this subject due to my lack of knowledge in the area. I've been able to find examples of opencv code that does some interesting transforms and the like but I've yet to figure out what I'm asking in simpler terms.
Does anyone know of papers or mathematical concepts I can lookup to get further into this project?
I'm not quite sure how or who to ask other than people on this forum, sorry for the somewhat vague question.
What you describe is very reminiscent of augmented reality marker tracking. Maybe you can start by searching these words on a search engine of your choice.
A single marker, if done correctly, can be used to identify it without confusing it with other markers AND to determine how the surface is placed in 3D space in front of the camera.
But that's all very difficult and advanced stuff. I'd strongly advise you NOT to try to implement something like this yourself; it would take years of research. The practical route is to use a ready-made open source library that outputs the data you need for your app.
Such a library may not even exist, in which case you'll have to buy one. Given how niche your problem is, that would be perfectly plausible.
Here I cover only the programming aspect; if you want, you can work out the mathematical aspect from these examples. Most of the functions you need are available in OpenCV. Here are some examples in Python:
To detect the printed paper, you can use the cv2.findContours function. The outermost contour is possibly the paper, but you need to test on actual images. https://docs.opencv.org/3.1.0/d4/d73/tutorial_py_contours_begin.html
If the paper is tilted (not at a perfect angle), you can find the angle with cv2.minAreaRect, which returns the angle of the contour you found above. https://docs.opencv.org/3.1.0/dd/d49/tutorial_py_contour_features.html (part 7b).
If you want to rotate the paper, use cv2.warpAffine. https://docs.opencv.org/3.0-beta/doc/py_tutorials/py_imgproc/py_geometric_transformations/py_geometric_transformations.html
To detect the objects on the paper, there are several methods. The easiest is to use the contours above. If the objects have distinctive colors, you can detect them with a color filter. https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_colorspaces/py_colorspaces.html
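The rotation step only handles in-plane tilt. For a photo taken at an angle, the mathematical concept to look up is the planar homography (perspective transform), which OpenCV exposes as cv2.getPerspectiveTransform and cv2.warpPerspective. Here is a minimal pure-Python sketch of the idea, assuming the four corners of one printed square have already been located in the photo. All coordinates and the 20 mm square size are made-up values for illustration.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def homography(src, dst):
    """3x3 matrix mapping four src points onto four dst points (h33 fixed to 1)."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]

def apply_h(h, pt):
    """Map one pixel coordinate through the homography."""
    x, y = pt
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)

# Hypothetical data: pixel corners of one printed square, known to be 20 mm wide
square_px = [(100, 120), (300, 100), (320, 310), (90, 290)]
square_mm = [(0, 0), (20, 0), (20, 20), (0, 20)]
H = homography(square_px, square_mm)

# Measure an object between two (hypothetical) pixel positions, in millimetres
p = apply_h(H, (150, 200))
q = apply_h(H, (260, 240))
length_mm = math.hypot(p[0] - q[0], p[1] - q[1])
```

With more than one square visible you would have more than four correspondences, and a least-squares fit (for example cv2.findHomography) would be the usual choice instead of an exact solve.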

Image Segmentation

So I am trying to write some code that lets me segment the fuses you see in the picture below. I have come up with two approaches:
1) Based on color. I threshold using OpenCV's inRange function. This approach works well for all fuses except the brown fuse. The brown fuse is too similar in colour to the fusebox itself, so it is very hard to segment out.
2) I considered thresholding the image heavily so that I can detect the white points/terminals on the fuses themselves using OpenCV SimpleBlobDetector. I then filter out the blobs by their distances to each other. As I know the size of the fuses, I can filter out the invalid fuses. This approach works well for all fuses but the white one as it appears in even the most thresholded images.
I was hoping I could get a pointer on how to segment such an image. Would background subtraction work?
My experience with segmentation is that a single approach frequently does not work for difficult cases. If one algorithm works for all but brown and the other for all but white, the union of the two should yield a complete result. I know it is nice to have one elegant algorithm, but many of my best results have come from a hybrid of multiple techniques.
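As a rough sketch of what "union of the two" means in practice: combine the two binary masks pixel by pixel with a logical OR (with OpenCV images, cv2.bitwise_or does the same job). The tiny masks below are invented purely for illustration.

```python
def union_masks(mask_a, mask_b):
    """Pixel-wise OR of two binary masks of the same size."""
    return [[1 if (a or b) else 0 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(mask_a, mask_b)]

# Hypothetical 1x4 masks, one column per fuse
color_mask = [[1, 1, 0, 1]]  # colour threshold misses the brown fuse (column 2)
blob_mask = [[1, 0, 1, 1]]   # blob approach misses the white fuse (column 1)
combined = union_masks(color_mask, blob_mask)
```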
I'd consider separating the image into its RGB channels and into hue, saturation, and value, and looking at each channel separately. Sometimes browns that look very similar in color have significantly different saturation or color channel values. Adding and subtracting different channels can also sometimes enhance contrast. This is simple, but in many cases it quickly produces an output that can be used for thresholding, watershed (see below), or perhaps background subtraction.
I think you might also want to try the watershed algorithm. Many examples and explanations are available. Watershed requires that you provide a mask that contains the background (the fusebox and table) and a piece of each of the foreground objects (fuses). As I understand it, you can already detect the contacts on the fuses, so that piece is done.
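To illustrate the point about saturation, here is a small sketch using Python's standard colorsys module. The two RGB triples are invented stand-ins for a brown fuse and the fusebox, not values measured from the picture.

```python
import colorsys

def hsv_of(rgb):
    """Convert an 8-bit RGB triple to (h, s, v), each in the range 0..1."""
    r, g, b = (c / 255.0 for c in rgb)
    return colorsys.rgb_to_hsv(r, g, b)

# Hypothetical colours: both are "brownish" and have nearly the same hue
brown_fuse = (150, 90, 40)
fusebox = (120, 100, 85)

h1, s1, v1 = hsv_of(brown_fuse)
h2, s2, v2 = hsv_of(fusebox)
hue_gap = abs(h1 - h2)          # small: hue alone barely separates them
saturation_gap = abs(s1 - s2)   # large: saturation separates them well
```

For these made-up values, thresholding on the saturation channel would separate the two where a hue threshold would not. Whether that holds for the real image needs to be checked on actual pixels.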
Another approach is to just accept you can't see brown fuses. If you can detect empty slots and every other color, you may be able to know by deduction where the brown ones are.
It's hard to know what will work beforehand without some experimentation, but this should give you some ideas of how to improve what you have.
Basic colour image segmentation code in C++ is available at https://github.com/imensedave/basic_vision_region_segmentation
The trick with this approach is that edge detection is done before colour regions are grown.

Matching photographed image with screenshot (or generated image based on data model)

First of all, I have to say I'm new to the field of computer vision, and I'm currently facing a problem that I tried to solve with OpenCV (Java wrapper) without success.
Basically, I have a picture of a part of a model taken by a camera (different angles, resolutions, rotations...) and I need to find the position of that part in the model.
Example Picture:
Model Picture:
So one question is: Where should I start/which algorithm should I use?
My first try was keypoint matching with SURF as detector and descriptor and BF (brute force) as matcher.
It worked for about 2 pictures out of 10. I used the default parameters and tried other detectors, without any improvement. (Maybe it's a question of the right parameters. But how do I find the right parameters combined with the right algorithm?)
Two examples:
My second try was to use color to differentiate certain elements in the model and to compare the structure with the model itself (in addition to the picture of the model I also have an XML representation of the model).
So far I have extracted the color red from the image, adjusting the H, S, V values manually to get the best detection for about 4 pictures; the same values fail for other pictures.
Two examples:
I also tried edge detection (Canny, on grayscale, with histogram equalization) to detect geometric structures. For some results I could imagine that it will work, but using the same Canny parameters for other pictures fails. Two examples:
As I said, I'm not familiar with computer vision and just tried out some algorithms. My problem is that I don't know which combination of algorithms and techniques is best, nor which parameters I should use. Testing it all manually seems impossible.
Thanks in advance
Your initial idea of using SURF features was actually very good; just try to understand how the parameters of this algorithm work and you should be able to register your images. A good starting point would be varying only the Hessian threshold, and being fearless while doing so: your features are quite well defined, so try thresholds around 2000 and above (increasing in steps of 500-1000 until you get good results is totally OK).
Alternatively, you can try to detect your ellipses, calculate an affine warp that normalizes them, and run a cross-correlation to register them. This alternative does imply much more work, but is quite fascinating. Some ideas on that normalization using the covariance matrix and its Cholesky decomposition here.
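One way to read the covariance/Cholesky normalization, sketched in pure Python: take the points of a detected ellipse, compute their 2x2 covariance matrix, factor it as L times L-transposed, and apply the inverse of L to the centred points. The result has identity covariance, i.e. the ellipse becomes a circle up to a rotation. This is my own minimal illustration with a synthetic ellipse, not necessarily the exact procedure from the linked material.

```python
import math

def covariance(points):
    """Mean and 2x2 covariance (sxx, sxy, syy) of a list of (x, y) points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return (mx, my), (sxx, sxy, syy)

def cholesky_2x2(sxx, sxy, syy):
    """Lower-triangular L = [[l11, 0], [l21, l22]] with L L^T equal to the covariance."""
    l11 = math.sqrt(sxx)
    l21 = sxy / l11
    l22 = math.sqrt(syy - l21 * l21)
    return l11, l21, l22

def normalize(points):
    """Apply the affine warp L^-1 (p - mean) so the point set has identity covariance."""
    (mx, my), (sxx, sxy, syy) = covariance(points)
    l11, l21, l22 = cholesky_2x2(sxx, sxy, syy)
    out = []
    for x, y in points:
        dx, dy = x - mx, y - my
        z1 = dx / l11                 # forward substitution for L z = d
        z2 = (dy - l21 * z1) / l22
        out.append((z1, z2))
    return out

# Synthetic ellipse: semi-axes 5 and 2, rotated by 30 degrees, centred at (40, 25)
theta = math.radians(30)
ellipse_points = []
for k in range(36):
    t = 2 * math.pi * k / 36
    ex, ey = 5 * math.cos(t), 2 * math.sin(t)
    ellipse_points.append((40 + ex * math.cos(theta) - ey * math.sin(theta),
                           25 + ex * math.sin(theta) + ey * math.cos(theta)))

normalized = normalize(ellipse_points)
```

The remaining unknown after this normalization is a rotation, which is what the cross-correlation step would then have to resolve.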

Template matching algorithms

Please suggest any template matching algorithms, which are independent of size and rotation.
(any source codes as examples if possible please)
Actually I understand how the algorithm works, we can resize template and rotate it. It is computationally expensive, but we can use image pyramids. But the real problem for me now is when the picture is made at some angle to object, so that only a perspective transform can correct the image. I mean that even if we rotate image or scale it, we will not get a good match if the object in image is perspectively transformed. Of course it is possible to try to generate many templates at different perspective, but I think it is very bad idea.
One more problem when using template matching based on shape matching.
What if the image doesn't have many sharp edges? For example, a plate or dish?
I've also heard about camera calibration for object detection. What is the algorithm used for that purpose? I don't understand how it can be used for template matching.
I don't think there is an efficient template matching algorithm that is affine-invariant (rotation+scale+translation).
You can make template matching somewhat robust to scale+rotation by using a distance transform (see Chamfering style methods). You should probably also look at SIFT and MSER to get a sense of how the research area has been shaped the past decade. But these are not template matching algorithms.
Check out this recent 2013 paper on efficient affine template matching: "Fast-Match". http://www.eng.tau.ac.il/~simonk/FastMatch/
MATLAB code is available on that website. The basic idea is to search the affine space exhaustively, but as sparsely as possible based on how smooth the image is. It has a formal approximation guarantee, although it won't always find the absolute best answer.

Computing the difference between images

Do you guys know of any algorithms that can be used to compute difference between images?
Take this webpage for example: http://tineye.com/ You give it a link or upload an image and it finds similar images. I doubt that it compares the image in question against all of them (or maybe it does).
By compute I mean like what the Levenshtein_distance or the Hamming distance is for strings.
By no means do I need the correct answer for a project or anything; I just found the website and got very curious. I know Digg pays for a similar service for their website.
The very simplest measures are going to be RMS-error based approaches, for example:
Root Mean Square Deviation
Peak Signal to Noise Ratio
These probably gel with your notions of distance measures, but their results are really only meaningful if you've got two images that are very close already, like if you're looking at how well a particular compression scheme preserved the original image. Also, the same result from either comparison can mean a lot of different things, depending on what kind of artifacts there are (take a look at the paper I cite below for some example photos of how RMS/PSNR can be misleading).
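For concreteness, both measures fit in a few lines. This is a minimal sketch for equal-sized grayscale images stored as nested lists with 8-bit values (the 255 maximum is that assumption); the two tiny images are made up.

```python
import math

def rmse(img_a, img_b):
    """Root mean square deviation between two equal-sized grayscale images."""
    diffs = [(a - b) ** 2
             for row_a, row_b in zip(img_a, img_b)
             for a, b in zip(row_a, row_b)]
    return math.sqrt(sum(diffs) / len(diffs))

def psnr(img_a, img_b, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = rmse(img_a, img_b)
    if err == 0:
        return math.inf
    return 20.0 * math.log10(max_value / err)

# Hypothetical 2x2 images
original = [[0, 0], [0, 0]]
compressed = [[3, 4], [0, 0]]
error = rmse(original, compressed)        # sqrt((9 + 16) / 4) = 2.5
quality_db = psnr(original, compressed)
```

Lower RMSE and higher PSNR mean the images are closer, with the caveats described above.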
Beyond these, there's a whole field of research devoted to image similarity. I'm no expert, but here are a few pointers:
A lot of work has gone into approaches using dimensionality reduction (PCA, SVD, eigenvalue analysis, etc) to pick out the principal components of the image and compare them across different images.
Other approaches (particularly in medical imaging) use segmentation techniques to pick out important parts of images, then compare the images based on what's found.
Still others have tried to devise similarity measures that get around some of the flaws of RMS error and PSNR. There was a pretty cool paper on the spatial domain structural similarity (SSIM) measure, which tries to mimic peoples' perceptions of image error instead of direct, mathematical notions of error. The same guys did an improved translation/rotation-invariant version using wavelet analysis in this paper on WSSIM.
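To give a feel for SSIM, here is a simplified sketch that evaluates the SSIM formula once over the whole image. The published measure computes it over local windows and averages the results, so treat this as an illustration of the formula only. The constants follow the commonly used defaults (0.01 and 0.03 times the dynamic range, squared), and the pixel values are invented.

```python
def global_ssim(x, y, dynamic_range=255.0):
    """SSIM formula evaluated once over two flat lists of pixel values."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    c1 = (0.01 * dynamic_range) ** 2   # stabilises the luminance term
    c2 = (0.03 * dynamic_range) ** 2   # stabilises the contrast/structure term
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx * mx + my * my + c1) * (vx + vy + c2))

# Hypothetical pixel rows
image = [10, 50, 90, 130, 170, 210]
brighter = [p + 5 for p in image]    # same structure, slightly brighter
inverted = [255 - p for p in image]  # structure reversed

same_score = global_ssim(image, image)
brighter_score = global_ssim(image, brighter)
inverted_score = global_ssim(image, inverted)
```

A small brightness shift keeps the score close to 1, while inverting the structure drives it negative. That is the kind of perceptual behaviour a plain RMS error does not capture.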
It looks like TinEye uses feature vectors with values for lots of attributes to do their comparison. If you hunt around on their site, you eventually get to the Idée Labs page, and their FAQ has some (but not too many) specifics on the algorithm:
Q: How does visual search work?
A: Idée’s visual search technology uses sophisticated algorithms to analyze hundreds of image attributes such as colour, shape, texture, luminosity, complexity, objects, and regions. These attributes form a compact digital signature that describes the appearance of each image, and these signatures are calculated by and indexed by our software. When performing a visual search, these signatures are quickly compared by our search engine to return visually similar results.
This is by no means exhaustive (it's just a handful of techniques I've encountered in the course of my own research), but if you google for technical papers or look through proceedings of recent conferences on image processing, you're bound to find more methods for this stuff. It's not a solved problem, but hopefully these pointers will give you an idea of what's involved.
One technique is to use color histograms. You can use machine learning algorithms to find similar images based on the representation you use, for example the commonly used k-means algorithm. I have seen other solutions that try to analyze the vertical and horizontal lines in the image after edge detection. Texture analysis is also used.
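A minimal sketch of the color histogram idea: quantize each RGB channel into a few bins, count pixels per bin, and compare two normalized histograms by their intersection (1.0 means identical color distributions, 0.0 means no overlap). The bin count and the sample pixels are arbitrary choices for illustration.

```python
def color_histogram(pixels, bins=4):
    """Normalized joint RGB histogram with bins**3 cells for 8-bit pixels."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        idx = ((r * bins // 256) * bins * bins
               + (g * bins // 256) * bins
               + (b * bins // 256))
        hist[idx] += 1.0
    return [h / len(pixels) for h in hist]

def histogram_intersection(h1, h2):
    """Sum of bin-wise minima of two normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))

# Hypothetical "images" given as flat pixel lists
reds = [(250, 10, 10)] * 4
blues = [(10, 10, 250)] * 4
mixed = [(250, 10, 10)] * 2 + [(10, 10, 250)] * 2

red_vs_red = histogram_intersection(color_histogram(reds), color_histogram(reds))
red_vs_blue = histogram_intersection(color_histogram(reds), color_histogram(blues))
red_vs_mixed = histogram_intersection(color_histogram(reds), color_histogram(mixed))
```

The histogram vectors are also the kind of representation you could feed to a clustering algorithm such as k-means.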
A recent paper clustered images from Picasa Web. You can also try the clustering algorithm that I am working on.
Consider using lossy wavelet compression and comparing the highest relevance elements of the images.
What TinEye does is a sort of hashing over the image or parts of it (see their FAQ). It's probably not a real hash function, since they want similar "hashes" for similar (or nearly identical) images. But all they need to do is compare that hash, and probably substrings of it, to know whether the images are similar/identical or whether one is contained in another.
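To show what a "hash" that stays similar for similar images can look like, here is a sketch of a simple average hash compared by Hamming distance. This is a generic, well-known technique chosen for illustration; it is not a description of what TinEye actually does. It assumes the image has already been shrunk to a small grayscale thumbnail.

```python
def average_hash(gray):
    """One bit per pixel: 1 if the pixel is brighter than the image mean."""
    flat = [p for row in gray for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]

def hamming(bits_a, bits_b):
    """Number of positions where two equal-length bit lists differ."""
    return sum(a != b for a, b in zip(bits_a, bits_b))

# Hypothetical 2x2 thumbnails
thumb = [[10, 200], [220, 30]]
brightened = [[30, 220], [240, 50]]  # same picture, uniformly brighter
different = [[200, 10], [30, 220]]   # bright and dark regions swapped

near_duplicate = hamming(average_hash(thumb), average_hash(brightened))
unrelated = hamming(average_hash(thumb), average_hash(different))
```

A small Hamming distance suggests near-duplicates, which is the string-style distance the question asks about.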
Here's an image similarity page, but it's for polygons. You could convert your image into a finite number of polygons based on color and shape, and run these algorithms on each of them.
Here is some code I wrote 4 years ago in Java (yikes) that does image comparisons using histograms. Don't look at any part of it other than buildHistograms().
Maybe it's helpful, at least if you are using Java.
Correlation techniques will make a match jump out. If they're JPEGs, you could compare the dominant coefficients for each 8x8 block and get a decent match. This isn't exactly correlation, but it's based on a cosine transform, so it's a first cousin.
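A rough sketch of the block-coefficient idea, computing the 8x8 DCT directly rather than reading coefficients out of a JPEG file. It treats the low-frequency corner of the coefficient matrix as the "dominant" coefficients, which is an assumption on my part (they usually carry most of the energy), and the test blocks are synthetic.

```python
import math

def dct_8x8(block):
    """2-D DCT-II of an 8x8 block, with the normalization used by JPEG."""
    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            au = 1 / math.sqrt(2) if u == 0 else 1.0
            av = 1 / math.sqrt(2) if v == 0 else 1.0
            s = 0.0
            for x in range(8):
                for y in range(8):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * au * av * s
    return out

def low_freq_distance(block_a, block_b, k=3):
    """Euclidean distance between the k x k lowest-frequency coefficients."""
    da, db = dct_8x8(block_a), dct_8x8(block_b)
    return math.sqrt(sum((da[u][v] - db[u][v]) ** 2
                         for u in range(k) for v in range(k)))

# Synthetic blocks: a flat one and a horizontal gradient
flat = [[8] * 8 for _ in range(8)]
gradient = [[y * 2 for y in range(8)] for _ in range(8)]

dc_of_flat = dct_8x8(flat)[0][0]
same = low_freq_distance(flat, flat)
apart = low_freq_distance(flat, gradient)
```

For whole images you would compute this per block and sum or average the distances.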
