In machine learning, many techniques require defining a metric between data points. I want to know what some popular metrics are when the data points are images.
An obvious way of measuring distance between images is to sum up the squares of pixel errors. But this is sensitive to simple transformations like translation. For example, even shifting the whole image by one pixel could result in a large distance.
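To make that concrete, here is roughly what I mean by summed squared pixel error, and how even a one-pixel shift inflates it (a minimal NumPy sketch, with a random image standing in for real data):

```python
import numpy as np

def ssd(a, b):
    """Sum of squared pixel differences between two equally sized images."""
    return np.sum((a.astype(float) - b.astype(float)) ** 2)

img = np.random.rand(64, 64)
shifted = np.roll(img, 1, axis=1)   # the same image, shifted right by one pixel

print(ssd(img, img))       # 0.0 for identical images
print(ssd(img, shifted))   # large, even though the content is essentially the same
```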
What are some other distance-measuring techniques that are more robust to translation, rotation, etc.?
Wasserstein distance (earth mover's distance) and Kullback-Leibler divergence are the two that I have come across while studying the literature on Generative Adversarial Networks (GANs).
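As a rough illustration of the first one (just the metric itself, not how GANs use it), SciPy can compute the 1D Wasserstein distance between the intensity distributions of two grayscale images. A minimal sketch, assuming the images are NumPy arrays; note that it ignores spatial layout entirely, which is exactly why it is insensitive to translation:

```python
import numpy as np
from scipy.stats import wasserstein_distance

img_a = np.random.rand(64, 64)
img_b = np.roll(img_a, 5, axis=0)   # shifted copy: identical intensity distribution

# Treat each image as a bag of pixel intensities and compare the two
# distributions; a shifted copy gives a distance of (essentially) zero.
print(wasserstein_distance(img_a.ravel(), img_b.ravel()))
```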
I'm reading about image search, and I've gotten to the point where I have a basic understanding of feature vectors and a very basic (definitely incomplete) understanding of rotation-invariant and scale-invariant features: for example, how you can look at multi-sampled images for scale invariance and at corners for rotational invariance.
To search a billion images, though, there is no way you could do a linear search. Most of my reading seems to imply that a k-d tree is used as a partitioning data structure to improve lookup times.
What metric is the k-d tree split on? If you use descriptors like SIFT, SURF, or ORB, there is no guarantee your similar keypoints line up in the feature vectors, so I'm confused about how you determine 'left' or 'right', since with features like this you need the split to be based on similarity. My guess is Euclidean distance from some 'standard', after which you do a robust nearest-neighbor search, but I would like some input on how the initial query into the k-d tree is handled before the nearest-neighbor search. I would think a k-d tree needs to be comparing similar features in each dimension, but I don't see how that happens with many keypoints.
I can find a lot of papers on the nearest-neighbor search, but most seem to assume you know how this is handled, so I'm missing something here.
It's quite simple. All of these feature descriptors represent an image as a point in a multidimensional space. Just for the sake of simplicity, let's assume that your descriptor dimension is 2. Then all your images would be mapped onto a two-dimensional plane. The k-d tree then splits this plane into rectangular areas, and any images that fall within the same area are considered similar.
That also means, by the way, that two images which lie really close to each other but fall in different areas (leaves of the k-d tree) will not be considered similar.
To overcome this issue, cosine similarity can be used instead of Euclidean distance. You can read more about the subject on Wikipedia.
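A minimal sketch of the idea with SciPy's k-d tree, assuming each image has already been reduced to a single fixed-length descriptor vector (e.g. by aggregating its local SIFT/SURF/ORB keypoints into a bag-of-visual-words or similar global descriptor; that aggregation step is assumed here, not shown):

```python
import numpy as np
from scipy.spatial import cKDTree

descriptors = np.random.rand(10000, 128)   # one global descriptor per image (placeholder data)
query = np.random.rand(128)                # descriptor of the query image

# For cosine similarity, L2-normalize everything first: on the unit sphere,
# the nearest neighbors in Euclidean distance are also the most cosine-similar.
descriptors /= np.linalg.norm(descriptors, axis=1, keepdims=True)
query /= np.linalg.norm(query)

tree = cKDTree(descriptors)
dist, idx = tree.query(query, k=5)          # indices of the 5 most similar images
print(idx, dist)
```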
I have a general question about image processing. I have a noisy image that I would like to segment into regions. Two well-known approaches that could be used are:
MRF/Gibbs MRF: models the spatial dependence between neighboring pixels.
Total variation: the key idea is, I believe, based on minimizing the total variation of the image.
My question is: could you tell me what the differences are between the two approaches for noise removal? Which one is better? Thanks
The MRF gives you a framework for discrete optimization of problems that respect the Markov property, i.e. (roughly stated) a pixel is conditioned only on its neighboring pixels. Typical applications include binary or multi-class labeling problems. Total variation, on the other hand, is generally used as a regularizer, by adding the integral of the absolute gradient of the signal/image to the energy functional. This helps the solution ignore irrelevant detail and focus on the important structure.
We cannot say one is better than the other, as they are not really competing approaches. It depends on the application and the energy function you use in the MRF.
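If you want to see the total-variation side in practice, scikit-image ships a ready-made TV denoiser; a minimal sketch (the noise level and weight are just placeholder values to tune):

```python
from skimage import data
from skimage.restoration import denoise_tv_chambolle
from skimage.util import random_noise

noisy = random_noise(data.camera(), var=0.01)   # standard test image plus Gaussian noise

# weight controls the strength of the TV penalty: larger values give smoother
# results at the cost of fidelity to the noisy input.
denoised = denoise_tv_chambolle(noisy, weight=0.1)
```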
I am making use of the ELKI library to perform some distance measurements between features.
Among other features, I am planning to implement Tamura features. From the research I have done, this algorithm returns a vector that represents three 'unrelated' features (1st element: coarseness, 2nd element: contrast, 3rd-18th elements: directionality). Should the distance between two Tamura feature vectors be measured as a whole, or is it better for the distances between these three features to be measured independently (possibly with different distance functions)?
Besides, I read that chi-square and quadratic-form distances are good for measuring distance between histograms, since they use information across bins to retrieve more perceptually desirable results. However, I am still not sure whether such measures are adequate for the directionality-histogram part of the Tamura feature. Can someone suggest a good distance function for this situation?
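For reference, this is the chi-square bin-by-bin distance I have in mind for the 16-bin directionality histogram (a rough sketch; the histograms here are random placeholders):

```python
import numpy as np

def chi_square_distance(h1, h2, eps=1e-10):
    """Chi-square distance between two normalized histograms."""
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

# hypothetical 16-bin directionality histograms (elements 3-18 of the Tamura vector)
d1 = np.random.rand(16); d1 /= d1.sum()
d2 = np.random.rand(16); d2 /= d2.sum()
print(chi_square_distance(d1, d2))
```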
Thanks!
I was trying to make an application that compares the difference between two images in Java with OpenCV. After trying various approaches, I came across the Demons algorithm.
To me it seems to capture the difference between images as a transformation at each location, but I couldn't understand it since the references I found were too complex for me.
Even if the Demons algorithm does not do what I need, I'm interested in learning it.
Can anyone explain simply what happens in the Demons algorithm, and how to write simple code to apply it to two images?
I can give you an overview of general algorithms for deformable image registration; demons is one of them.
There are three components to such an algorithm: a similarity metric, a transformation model, and an optimization algorithm.
A similarity metric is used to compute pixel-based / patch-based similarity between the images. Common similarity measures are SSD and normalized cross-correlation for mono-modal images, while information-theoretic measures like mutual information are used in the case of multi-modal image registration.
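As a rough illustration of the first two (mutual information takes a bit more code), a NumPy sketch:

```python
import numpy as np

def ssd(patch_a, patch_b):
    """Sum of squared differences; suited to mono-modal registration."""
    return np.sum((patch_a.astype(float) - patch_b.astype(float)) ** 2)

def ncc(patch_a, patch_b):
    """Normalized cross-correlation; insensitive to linear intensity changes."""
    a = patch_a.astype(float) - patch_a.mean()
    b = patch_b.astype(float) - patch_b.mean()
    return np.sum(a * b) / (np.sqrt(np.sum(a ** 2) * np.sum(b ** 2)) + 1e-10)
```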
In deformable registration, a regular grid is generally superimposed over the image, and the grid is deformed by solving an optimization problem formulated so that the similarity metric and a smoothness penalty imposed on the transformation are minimized. Once the grid deformations are known, the final transformation at the pixel level is computed using a B-spline interpolation of the grid, so that the transformation is smooth and continuous.
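To make that grid-to-pixel step concrete, here is a minimal sketch that upsamples a coarse displacement grid to a dense per-pixel field with cubic-spline interpolation (the grid size, image size, and random displacements are placeholders):

```python
import numpy as np
from scipy import ndimage

# displacements defined on a coarse 8x8 control grid, one array per component
grid_dx = np.random.randn(8, 8)
grid_dy = np.random.randn(8, 8)

# cubic (order-3) spline interpolation up to the full 256x256 image,
# giving a smooth, continuous per-pixel displacement field
H, W = 256, 256
dense_dx = ndimage.zoom(grid_dx, (H / 8, W / 8), order=3)
dense_dy = ndimage.zoom(grid_dy, (H / 8, W / 8), order=3)
```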
There are two general approaches to solving the optimization problem: some people use discrete optimization and solve it as an MRF optimization problem, while others use gradient descent; I think demons uses gradient descent.
In MRF-based approaches, the unary cost is the cost of deforming each node in the grid, and it is the similarity computed between patches; the pairwise cost, which imposes the smoothness of the grid, is generally a Potts or truncated quadratic potential that ensures neighboring nodes in the grid have almost the same displacement. Once you have the unary and pairwise costs, you feed them to an MRF optimization algorithm and get the displacements at the grid level, then you use a B-spline interpolation to compute pixel-level displacements. This process is repeated in a coarse-to-fine fashion over several scales, and the algorithm is also run many times at each scale (reducing the displacement at each node every time).
In gradient-descent-based methods, the problem is formulated as an energy function combining the similarity metric and the grid transformation computed over the image, and the gradient of that energy function is computed. The energy is then minimized using iterative gradient descent; however, these approaches can get stuck in local minima and are quite slow.
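Since the question was specifically about demons: the classic (Thirion-style) formulation computes, at every pixel, a force from the intensity difference and the gradient of the fixed image, adds it to the displacement field, and then smooths the field with a Gaussian, which plays the role of the regularizer. A much-simplified single-iteration sketch, not a faithful reimplementation (step size and smoothing sigma are placeholders):

```python
import numpy as np
from scipy import ndimage

def demons_step(fixed, moving_warped, disp, sigma=1.0, step=1.0):
    """One simplified demons update on a (H, W, 2) displacement field."""
    fixed = fixed.astype(float)
    diff = moving_warped.astype(float) - fixed
    gy, gx = np.gradient(fixed)
    denom = gx ** 2 + gy ** 2 + diff ** 2 + 1e-10

    disp[..., 0] += step * diff * gx / denom    # x-component of the force
    disp[..., 1] += step * diff * gy / denom    # y-component of the force

    # Gaussian smoothing of the field acts as the smoothness regularization
    disp[..., 0] = ndimage.gaussian_filter(disp[..., 0], sigma)
    disp[..., 1] = ndimage.gaussian_filter(disp[..., 1], sigma)
    return disp
```

In practice you would warp the moving image with the current field before each step and run the whole thing coarse to fine, as described above.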
Some popular tools are DROP and Elastix, and ITK also provides some tools.
If you want to know more about algorithms related to deformable image registration, I recommend taking a look at FAIR (there is a guide book). FAIR is a toolbox for MATLAB, so you will have examples to help understand the theory.
http://www.cas.mcmaster.ca/~modersit/FAIR/
Then, if you specifically want to see a demons example, there is this other toolbox:
http://www.mathworks.es/matlabcentral/fileexchange/21451-multimodality-non-rigid-demon-algorithm-image-registration
I'm looking for any information/algorithms related to comparing vector graphics. E.g., say there are two point collections or vector files with two almost identical figures. I want to determine that the first figure is about 90% similar to the second one.
A common way to test for similarity is with image moments. Central moments are intrinsically translation-invariant, and if the objects you compare might be scaled or rotated you can use moments that are invariant to these transformations as well, such as Hu moments.
Most of the programs I know of require rasterized versions of the vector objects, but the moments can be calculated directly from the vector graphics using a Green's theorem approach. A more simplistic approach, which just captures the unique (unordered) vertex configuration, is to convert the Hu moment integrals to sums over the vertices; in a physics analogy, this replaces the continuous object with equal point masses at each vertex.
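If OpenCV is available, this can be done directly on the vertex lists, since cv2.moments on a contour integrates over the polygon rather than over pixels; a rough sketch with placeholder rectangles standing in for the real figures:

```python
import cv2
import numpy as np

# two closed figures as (N, 1, 2) point arrays (OpenCV contour format)
poly_a = np.array([[0, 0], [100, 0], [100, 50], [0, 50]],
                  dtype=np.float32).reshape(-1, 1, 2)
poly_b = np.array([[10, 10], [210, 10], [210, 110], [10, 110]],
                  dtype=np.float32).reshape(-1, 1, 2)   # shifted, scaled copy of poly_a

# the seven Hu invariants, if you want to build your own distance on them
hu_a = cv2.HuMoments(cv2.moments(poly_a)).flatten()
hu_b = cv2.HuMoments(cv2.moments(poly_b)).flatten()

# matchShapes compares log-scaled Hu moments; a small value means the shapes
# are similar regardless of translation, scale, or rotation.
print(cv2.matchShapes(poly_a, poly_b, cv2.CONTOURS_MATCH_I1, 0.0))
```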
There is a paper on a tool called VISTO that sorts vector graphics images (using moments, I think), which should certainly be useful for more details.
You could search for fingerprint matching algorithms. Fingerprints are usually converted to a set of points with their locations relative to each other, which makes it basically the same problem as yours.
You could rasterize it to a non-vector graphic and then apply standard image analysis techniques like SIFT keypoints, etc.
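A rough sketch of what that could look like with OpenCV's SIFT, assuming the two figures have been rasterized to figure_a.png and figure_b.png (both filenames and the 0.75 ratio threshold are just placeholders):

```python
import cv2

img_a = cv2.imread("figure_a.png", cv2.IMREAD_GRAYSCALE)
img_b = cv2.imread("figure_b.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp_a, des_a = sift.detectAndCompute(img_a, None)
kp_b, des_b = sift.detectAndCompute(img_b, None)

# Lowe's ratio test; the fraction of keypoints with a good match gives a
# crude similarity score between the two figures.
matcher = cv2.BFMatcher()
matches = matcher.knnMatch(des_a, des_b, k=2)
good = [m for m, n in (p for p in matches if len(p) == 2)
        if m.distance < 0.75 * n.distance]
print(len(good) / max(len(kp_a), 1))
```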