I have a collection of more than 100 images. Although I can see the common patterns between them, I was asked to write some kind of script/program to check for similarities, or the degree of similarity, between them.
All of the images are 255x255 in size, and there are only two colors: black and white. In most cases the images are composed of three primitive shapes:
Squares/rectangles
Vertical/horizontal/diagonal lines
Blob/cloud-like shapes stretched diagonally across the image
These shapes are in most cases in the same place on the images, but with different sizes and shapes. For example, squares appear in the corners of the image but in different sizes.
My question is: is there any kind of software that can give me a numerical value representing the degree of similarity between images?
You can apply these operations and match the results:
1. Median filter / erode / dilate
2. Compute image gradients (Scharr, Sobel) to extract strong patterns
3. Compute the Hough line transform or contour extraction on the result of 2
4a. Compute the Mahalanobis distance on Hu moments from the result of 3
4b. (Alternative) compute histograms of the results of 3 and match them
One of the best libraries for this so far is OpenCV (http://www.opencv.org), which is well documented (docs.opencv.org).
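For illustration, here is a minimal sketch of steps 1 through 4a using OpenCV's Python bindings. The file names and the threshold value are placeholders, and for brevity it uses a plain Euclidean distance between Hu-moment vectors rather than a full Mahalanobis distance (which would need a covariance matrix estimated across the whole collection):

```python
import cv2
import numpy as np

def shape_signature(path):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    img = cv2.medianBlur(img, 5)                        # step 1: median filter
    gx = cv2.Scharr(img, cv2.CV_64F, 1, 0)              # step 2: gradients
    gy = cv2.Scharr(img, cv2.CV_64F, 0, 1)
    mag = cv2.convertScaleAbs(np.hypot(gx, gy))
    _, edges = cv2.threshold(mag, 50, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)  # step 3: contours
    largest = max(contours, key=cv2.contourArea)        # step 4a: Hu moments
    return cv2.HuMoments(cv2.moments(largest)).flatten()

# Distance between Hu-moment vectors as a similarity score
# (hypothetical file names; smaller = more similar).
a = shape_signature("image_a.png")
b = shape_signature("image_b.png")
print(np.linalg.norm(a - b))
```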
First I applied Delaunay triangulation to an image with 3000 triangles. I measured the similarity (SSIM) to the original image as 0.75 (the higher the value, the more similar).
Then I applied Delaunay triangulation to the image's RGB channels separately, with 1000 triangles each. I then combined the 3 images to form the final image, and measured its similarity (SSIM) to the original image as 0.65.
In both cases the points were chosen randomly, and the median value of the pixels inside each triangle was chosen as the triangle's color. A sketch of this setup is below.
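For reference, a minimal sketch of that setup, assuming SciPy, NumPy, and Pillow; the point count and file name are placeholders:

```python
import numpy as np
from PIL import Image
from scipy.spatial import Delaunay

img = np.asarray(Image.open("input.png").convert("RGB"))
h, w = img.shape[:2]

# points chosen randomly, as in the experiment
pts = np.column_stack([np.random.randint(0, w, 3000),
                       np.random.randint(0, h, 3000)])
tri = Delaunay(pts)

# assign each pixel to the triangle that contains it
ys, xs = np.mgrid[0:h, 0:w]
member = tri.find_simplex(np.column_stack([xs.ravel(), ys.ravel()]))

# color each triangle with the median of the pixels it contains
flat, out = img.reshape(-1, 3), np.zeros((h * w, 3), np.uint8)
for t in range(len(tri.simplices)):
    mask = member == t
    if mask.any():
        out[mask] = np.median(flat[mask], axis=0)
result = out.reshape(h, w, 3)
```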
I ran many trials, but none of them showed better results.
Isn't this weird? Think about it: I use 1000 random triangles on one layer, then 1000 more on a second layer, then 1000 more on a third layer. When these are stacked on top of each other, they should create more than 3000 unique polygons compared to the single 3000-triangle triangulation, because they do not coincide.
a) What can be the reason behind this?
b) What advantages can I obtain by applying Delaunay triangulation to the RGB channels separately instead of applying it to the image itself? It is obvious I cannot get better similarity, but maybe I can do better storage-wise? Or in other areas? What might they be?
When the triangles in each layer don't coincide, it creates a low-pass filtering effect in brightness, because the three triangles that contribute to a pixel's brightness are larger than the single triangle you get in the other case.
It's hard to suggest any 'advantages' to either approach, since we don't really know why you are doing this in the first place.
If you want better similarity, though, then you have to pick better points. I would suggest making the probability of selecting a point proportional to the magnitude of the gradient at that point.
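A minimal sketch of that sampling scheme, assuming NumPy and OpenCV (the file name and point count are placeholders):

```python
import cv2
import numpy as np

img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)
gx = cv2.Sobel(img, cv2.CV_64F, 1, 0)
gy = cv2.Sobel(img, cv2.CV_64F, 0, 1)
mag = np.hypot(gx, gy).ravel()

# selection probability proportional to gradient magnitude
prob = mag / mag.sum()
idx = np.random.choice(mag.size, size=3000, replace=False, p=prob)
ys, xs = np.unravel_index(idx, img.shape)
points = np.column_stack([xs, ys])   # feed these to the triangulation
```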
I have read two references about the SIFT algorithm, here and here, and I don't truly understand how just some keypoints are detected, considering that the algorithm works on the difference of Gaussians calculated at several resolutions (they call them octaves). Here are the steps of the technique as I understood them from the paper.
1. Given the input image, blur it with Gaussian filters using different sigmas, resulting in Gaussian-filtered images. In the paper, they use 5 Gaussian filters per octave (they state that two adjacent Gaussian-filtered images are filtered using sigma and k * sigma as the Gaussian filter parameters), and they consider 4 octaves in the algorithm. So there are a total of 20 Gaussian-filtered images (5 per octave), but we act on the 5 Gaussian-filtered images from each octave individually.
2. For each octave, we calculate 4 difference-of-Gaussian (DoG) images from the 5 Gaussian-filtered images by simply subtracting adjacent Gaussian-filtered images. So now we have a total of 16 DoG images, but we consider the 4 DoG images from each octave individually (a sketch of steps 1 and 2 follows this list).
3. Find local extrema (maximum or minimum values) by comparing each pixel in each DoG image with its 26 neighbors. Among these, 8 pixels are at the same scale as the pixel (in a 3x3 window), 9 are in a 3x3 window at the scale above (the adjacent DoG image from the same octave), and 9 others are in a 3x3 window at the scale below.
4. Once these local extrema are found in the different octaves, we must refine them, eliminating low-contrast points and weak edge points. They filter bad candidates using a threshold on a Taylor expansion of the DoG function and an eigenvalue-ratio threshold computed from a Hessian matrix.
5. (this part I don't understand perfectly): For each interest point that survived (in each octave, I believe), they consider a neighborhood around it and calculate the gradient magnitude and orientation of each pixel in that region. They build a gradient orientation histogram covering 360 degrees and select the highest peak, as well as any peaks higher than 80% of the highest peak. They define the keypoint's orientation by fitting a parabola to the 3 histogram values closest to each peak to interpolate the peak position (I really don't understand this part perfectly).
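To make sure I understand steps 1 and 2, here is a minimal sketch of how I picture the pyramid construction (OpenCV and NumPy assumed; the sigma and k values are guesses on my part, not taken from the paper):

```python
import cv2
import numpy as np

def dog_pyramid(img, n_octaves=4, n_scales=5, sigma=1.6, k=np.sqrt(2)):
    dogs = []
    base = img.astype(np.float32)
    for _ in range(n_octaves):
        # step 1: blur the octave's base image with increasing sigmas
        gaussians = [cv2.GaussianBlur(base, (0, 0), sigma * k ** i)
                     for i in range(n_scales)]
        # step 2: subtract adjacent Gaussian images -> n_scales - 1 DoGs
        dogs.append([g2 - g1 for g1, g2 in zip(gaussians, gaussians[1:])])
        # down-sample by a factor of 2 before the next octave
        base = cv2.resize(base, (base.shape[1] // 2, base.shape[0] // 2))
    return dogs
```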
What I am not understanding
1- The tutorial, and even the original paper, are not clear on how to detect a single keypoint when we are dealing with multiple octaves (image resolutions). For example, suppose I have detected 1000 keypoints in the first octave, 500 in the second, 250 in the third, and 125 in the fourth. The SIFT algorithm returns the following data about each keypoint: 1. the (x, y) coordinates, 2. the scale (what is that?), 3. the orientation, and 4. the feature vector (which I easily understood how to build). There are also Python functions in OpenCV that can draw these keypoints on the original image (thus, at the first octave's resolution), but how, if the keypoints are detected in different octaves and thus the algorithm considers DoG images with different resolutions?
2- I don't understand part 5 of the algorithm very well. It is used for defining the orientation of the keypoint, right? Can somebody explain it to me in other words so that maybe I can understand?
3- To find the local extrema per octave (step 3), they don't explain how to do that in the first and last DoG images. Since we are considering 4 DoG images, it is possible to do that only in the second and third DoG images.
4- There is another thing the author wrote that completely confused my understanding of the approach:
Figure 1: For each octave of scale space, the initial image is repeatedly convolved with Gaussians to produce the set of scale space images shown on the left. Adjacent Gaussian images are subtracted to produce the difference-of-Gaussian images on the right. After each octave, the Gaussian image is down-sampled by a factor of 2, and the process repeated.
What? Does he down-sample only one Gaussian image? How can the process be repeated by doing that? I mean, the difference of Gaussians is originally produced by filtering the INPUT IMAGE with different sigmas. So I believe the INPUT IMAGE, and not a Gaussian image, must be resampled. Or did the author forget to write that THE GAUSSIAN IMAGES from a given octave are down-sampled and the process is repeated for the next octave?
I am comparing RGB images of small colored granules spilled randomly on a white backdrop. My current method involves importing the image into Matlab, converting to a binary image by setting a threshold and forcing all pixels above it to white, and then calculating the percentage of pixels that are black. For comparing the images to one another, the % black pixels measurement is great; however, it does not take into account how well the granules are dispersed. Although the % black from two different images may be identical, the images may be far from alike. For example, assume I have two images to compare, both showing 15% black pixels. In one picture, the black pixels are randomly distributed throughout the image. In the other, a clump of black pixels sits in one corner and the rest of the image is very sparse. A sketch of the current measurement is below.
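(My code is in Matlab; this equivalent sketch uses Python with OpenCV, with a placeholder threshold and file name.)

```python
import cv2
import numpy as np

img = cv2.imread("granules.png", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(img, 128, 255, cv2.THRESH_BINARY)  # above threshold -> white
pct_black = 100.0 * np.count_nonzero(binary == 0) / binary.size
print(pct_black)
```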
What can I use in Matlab to numerically quantify how "spread out" the black pixels are for the purpose of comparing the two images?
I haven't been able to wrap my brain around this one yet, and need some help. Your thoughts/answers are most appreciated.
I found an answer to a very similar problem here: https://stats.stackexchange.com/a/13274
Basically, you would use the average distance from a central point to every black pixel as a measure of dispersion.
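A minimal sketch of that measure (NumPy assumed), taking the centroid of the black pixels as the central point:

```python
import numpy as np

def dispersion(binary):
    """Mean distance from the black-pixel centroid to each black pixel."""
    ys, xs = np.nonzero(binary)          # binary: True where the pixel is black
    cy, cx = ys.mean(), xs.mean()        # central point = centroid
    return np.hypot(ys - cy, xs - cx).mean()
```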
My idea is based upon the mean free path used in ideal gas theory / thermodynamics.
First, you must separate your foreground objects, using something like bwconncomp.
The mean free path is calculated from the mean distance between the centers of your regions. So for n regions, you take all n(n-1)/2 pairs, calculate all the distances, and average them. If the mean distance is large, your particles are well spread; if it is small, your objects are close together.
You may want to multiply the resulting mean by n and divide it by the edge length to get a dimensionless number, independent of your image size and of the number of particles.
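A sketch of this measure, assuming SciPy, where scipy.ndimage.label plays the role of Matlab's bwconncomp:

```python
import numpy as np
from scipy import ndimage
from scipy.spatial.distance import pdist

def mean_free_path(binary):
    labels, n = ndimage.label(binary)                  # separate foreground regions
    centers = np.array(ndimage.center_of_mass(binary, labels, range(1, n + 1)))
    mean_dist = pdist(centers).mean()                  # average over all n(n-1)/2 pairs
    return mean_dist * n / max(binary.shape)           # dimensionless normalization
```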
I want to find the percentage similarity between uncolored images. Specifically, I want to compare my own drawing with an image. Here's an example image:
I don't have any knowledge of image processing. What algorithms can be used to achieve my goal? Any guidance would be appreciated.
If your images are both black and white, you could compute the Hausdorff distance. In simple words: each black pixel is a point. For each point of image A, you compute the closest point of image B. You get a list of distances. The Hausdorff distance is the greatest value in this list. The smaller it is, the more similar your images are.
You will have to compute this for several relative position/angle/aspect ratio between your two images, in order to find the position that matches the best.
You can extend this method to any non-B&W image by computing the edges first.
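A minimal sketch, assuming SciPy and OpenCV (file names and the binarization threshold are placeholders); the distance is computed in both directions to make it symmetric:

```python
import cv2
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def black_points(path):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    return np.column_stack(np.nonzero(img < 128))  # each black pixel is a point

a = black_points("drawing.png")
b = black_points("reference.png")
h = max(directed_hausdorff(a, b)[0], directed_hausdorff(b, a)[0])
print(h)  # smaller = more similar
```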
I am interested in using shapes like these:
Usually a tangram is made of 7 shapes (5 triangles, 1 square and 1 parallelogram).
What I want to do is fill a shape using only tangram shapes, so at this point the size and repetition of shapes shouldn't matter.
Here's something I manually tried:
I am a bit lost on how to approach this.
Assuming I have a path (an ordered list/array of points for the outline), I imagine I should try some sort of triangulation. Is there such a thing as Delaunay triangulation with the triangles constrained to 45-degree right-angled triangles?
A more 'brute' approach would be to add a bunch of 45-degree triangles and use SAT (separating axis theorem) collision detection to 'fix' overlaps, hoping that gaps will be avoided.
Since the square and the parallelogram can be made of 45-degree triangles too, I imagine there would be a nice clean geometric solution, right?
How do I pack 45-degree right triangles inside an arbitrary shape?
Any ideas are welcome.
A few random thoughts (maybe they help you find a better solution), assuming you're using only the original sizes of the shapes:
as you point out, all shapes in the tangram can be composed of, e.g., the yellow or pink triangle (d-g-c), so also try thinking of a bottom-up approach: first place as many of these smallest triangles into your shape as possible, then combine them into larger shapes where you can. In the worst case, you'll end up with a set of the smallest triangles.
any kind of triangulation of non-polygons (such as the half-moon in your example) probably does not work very well...
It looks like you require the shapes to have only a few discrete orientations. To find the best fit of these triangles into the given shape, I'd propose the following approximate solution: draw a grid of triangles (i.e. a square grid with diagonal lines) across the shape and take those triangles which are fully contained. This most likely will not give you the optimal coverage, but you could then repeatedly shift the grid by a tenth of the grid size in the horizontal and vertical directions and see whether you find something that covers a larger fraction of the original shape (or you could go in steps of 1/2, then 1/4, etc. of the original grid size in the spirit of a binary search). A sketch of this is at the end of this answer.
If you allow arbitrary scaling of the shapes, you could approximate any (reasonably smooth?) shape to arbitrary precision by adding smaller and smaller shapes. E.g. if you have a raster image, you can choose the size of the yellow triangle such that two of them make up a pixel of the image, and then you can represent any such raster image.
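A minimal sketch of the grid-shift idea above, assuming Shapely; `shape` (the target outline as a Polygon) and the cell size `s` are placeholders:

```python
from shapely.geometry import Polygon

def grid_triangles(shape, s, dx=0.0, dy=0.0):
    """45-degree right triangles of one shifted grid fully inside `shape`."""
    minx, miny, maxx, maxy = shape.bounds
    kept = []
    y = miny + dy - s
    while y < maxy:
        x = minx + dx - s
        while x < maxx:
            # split each grid square along a diagonal into two 45-degree triangles
            a, b, c, d = (x, y), (x + s, y), (x + s, y + s), (x, y + s)
            for tri in (Polygon([a, b, c]), Polygon([a, c, d])):
                if shape.contains(tri):
                    kept.append(tri)
            x += s
        y += s
    return kept

def best_cover(shape, s, steps=10):
    """Shift the grid in tenths of the cell size; keep the best coverage."""
    grids = (grid_triangles(shape, s, i * s / steps, j * s / steps)
             for i in range(steps) for j in range(steps))
    return max(grids, key=lambda tris: sum(t.area for t in tris))
```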