Image size influence when comparing histograms in OpenCV

I'm using compareHist() function to compare the histograms of two images.
My question is: does the size of the image have a considerable influence on the results? Should I resize the images or normalize the histograms before comparing? I'm using the CV_COMP_CORREL method.

You have to normalize the histograms before comparison.
Imagine you have non-normalized histograms, e.g. one of them with bin values in the interval [0..1000] and the other in [0..1]. How could you compare them? Any mathematical operation like addition makes no sense, because what would the result of that addition mean?
In practice, then, the size of the image does not really matter.
By "in practice" I mean: if you have an image A, scale it by, say, a factor of two to get an image B, and then compute hist(A) and hist(B) and normalize both, the two histograms will be practically the same. This is because when you scale an image by a factor k, and you had n pixels of color c in image A, you get approximately k*k*n pixels of color c in image B (depending on the interpolation). So the amount of every color also "scales" proportionally, and after normalizing hist(A) and hist(B) the results will be approximately the same (also when your bins are wider than 1, e.g. 16).
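For illustration, a minimal sketch in Python (the file names are placeholders; in recent OpenCV versions the Python constant corresponding to CV_COMP_CORREL is cv2.HISTCMP_CORREL):

import cv2

# Two images of possibly very different sizes (placeholder file names)
a = cv2.imread('imageA.png', cv2.IMREAD_GRAYSCALE)
b = cv2.imread('imageB.png', cv2.IMREAD_GRAYSCALE)

# 256-bin grayscale histograms
hist_a = cv2.calcHist([a], [0], None, [256], [0, 256])
hist_b = cv2.calcHist([b], [0], None, [256], [0, 256])

# L1-normalize so the bin values no longer depend on the pixel count
cv2.normalize(hist_a, hist_a, alpha=1.0, norm_type=cv2.NORM_L1)
cv2.normalize(hist_b, hist_b, alpha=1.0, norm_type=cv2.NORM_L1)

# Correlation method: 1.0 means identical (normalized) histograms
score = cv2.compareHist(hist_a, hist_b, cv2.HISTCMP_CORREL)
print(score)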

Related

Data structure for pixel selections in a picture

Is there a convenient data structure for storing a pixel selection in a picture?
By pixel selection I mean a set of pixels you obtain with selection tools such as those in image editing software (rectangles, lasso, magic wand, etc.). There can be holes, and in the general case the selection is (much) smaller than the picture itself.
The objective is to be able to save/load selections, display only the selected pixels in a separate view (the size of the bounding box), use selections in specific algorithms (typically algorithms requiring segmentation), etc. It should use as little memory as possible, since the objective is to store a lot of them in a DB.
Solutions I found so far:
a boolean array (size of the picture/8)
a list of (uint16, uint16) => inefficient if there are many pixels in the selection
an array of lists: one list of pixel runs for each line
A boolean array will take W x H bits for the raster plus extra accounting (such as ROI limits). This is roughly proportional to the area of the bounding box.
A list of pixel coordinates will take about 32 bits (2 x 16 bits) per selected pixel. This is pretty large compared to the boolean array, except when the selection is very hollow.
Another useful representation is run-length encoding, which encodes the contiguous pixel runs row by row. This representation will take about 16 bits per run. Said differently, 16 / n bits per pixel when the average length of the runs is n pixels. This works fine for large filled shapes, but poorly for isolated pixels.
Finally, you can also consider just storing the outlines of the shapes as a list of pixels (32 bits per pixel) or as a Freeman chain (only 3 bits per pixel), which can be a significant saving with respect to the full enumeration.
As you can see, the choice is not an easy one, because the efficiency of the different representations depends strongly on the shape of the selection. Another important aspect is the ease with which the given representation can be used for the targeted processing of the selection.
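To illustrate the run-length idea concretely, here is a rough sketch (not a full implementation; it assumes the selection is held as a 2D boolean numpy array):

import numpy as np

def rle_encode(mask):
    # Encode a 2D boolean mask as per-row runs of (start_column, length)
    runs = []
    for y in range(mask.shape[0]):
        padded = np.concatenate(([0], mask[y].astype(np.int8), [0]))
        edges = np.flatnonzero(np.diff(padded))
        starts, ends = edges[0::2], edges[1::2]
        runs.append([(int(s), int(e - s)) for s, e in zip(starts, ends)])
    return runs

def rle_decode(runs, shape):
    # Rebuild the boolean mask from the per-row runs
    mask = np.zeros(shape, dtype=bool)
    for y, row_runs in enumerate(runs):
        for start, length in row_runs:
            mask[y, start:start + length] = True
    return mask

# Tiny usage example
mask = np.zeros((4, 8), dtype=bool)
mask[1, 2:6] = True
mask[2, 0:3] = True
runs = rle_encode(mask)   # [[], [(2, 4)], [(0, 3)], []]
assert np.array_equal(mask, rle_decode(runs, mask.shape))

Each run costs two small integers, so the cost per selected pixel drops as the runs get longer, which matches the trade-off described above.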

Algorithm to cut down number of comparisons to calculate chi-squared distance between histograms?

I'm working on a side project that will accept a source image and then produce a photo mosaic using a set of thumbnailed images it has available. I have an implementation that works OK (see below) but I'm running up against "big O" issues trying to increase the number of available images for replacement.
The process I'm currently using is the following:
I pre-calculate 4-bucket RGB color histograms for all the available replacement images
Scale up the source image to 1000x1000
Create 20x20 "tiles" from the scaled source image and create a 4-bucket RGB histogram for each tile
For each tile, calculate the Chi-squared distance for each of the available replacement images
For each tile, select the replacement image with the smallest Chi-squared distance
So concretely, the problem I'm running into is that as the number of available replacement images increases, the number of comparisons blows up: every tile is compared against every candidate image. I'm currently testing with 25,000 available replacement images and it takes nearly 10 minutes to generate the final image across 4 cores on my laptop.
My question is: is there an approach I can use to cut down the number of distance calculations?
One idea I had was calculating the distances between each of the goal "tiles", separating them into some N groups, finding an average histogram within each group and then finding the closest K available images to that average histogram. From there, I'd go back and calculate the closest matches for the tiles within each group, but only from the smaller pool of those K closest images.
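For reference, the per-tile distance I compute is the usual Chi-squared distance between histograms, roughly like this (a sketch; h1 and h2 stand for the flattened 4-bucket RGB histograms as numpy arrays):

import numpy as np

def chi_squared(h1, h2, eps=1e-10):
    # Chi-squared distance between two histograms (lower = more similar)
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))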
The pragmatic answer is to cheat.
Define several aggregate projections, like "average R", "average G", "average B". Precompute these projections for all of your replacement images. Then give each tile a preliminary score against each thumbnail: the sum of absolute differences between the tile's projections and the thumbnail's.
Now throw the thumbnails into a heap and pull off the best 50. Do your detailed calculation on those 50 and select the best one of them.
You might not pick the perfect answer, but you'll pick a pretty good one, and the necessary work per thumbnail is very small: for each of the 400 tiles you do 3 lookups and a couple of comparisons. Only a few thumbnails make the cut to the real work.
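A rough sketch of that scheme in Python (the names here are made up; tile_proj/thumb_projs are the precomputed (avg R, avg G, avg B) triples as numpy arrays, and detailed_distance is whatever full histogram distance you already use, e.g. the Chi-squared distance from the question):

import heapq
import numpy as np

def best_match(tile_proj, tile_hist, thumb_projs, thumb_hists,
               detailed_distance, shortlist=50):
    # Cheap score: sum of absolute differences on the (avgR, avgG, avgB) projections
    cheap = np.abs(thumb_projs - tile_proj).sum(axis=1)

    # Keep only the `shortlist` thumbnails with the smallest cheap score
    candidates = heapq.nsmallest(shortlist, range(len(cheap)),
                                 key=cheap.__getitem__)

    # Detailed distance (e.g. Chi-squared) only on those survivors
    return min(candidates,
               key=lambda i: detailed_distance(tile_hist, thumb_hists[i]))

This keeps the expensive histogram comparison at 50 evaluations per tile instead of one per candidate image.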

Change image brightness and contrast using DCT coefficients

I'm trying to perform some image transformations in the frequency domain (using the DCT coefficients), such as adjusting the brightness and contrast of a grayscale image. What I know so far is that adjusting the brightness means adding an offset to the pixel intensities, and adjusting the contrast means multiplying each pixel by a value. My question is whether the same can be done in the frequency domain.
img = imread('lena.bmp');                    % read the grayscale image
img = double(img) - 128;                     % shift to be centred around 0
blkSz = 8;                                   % DCT block size
coef = blkproc(img, [blkSz blkSz], 'dct2');  % block-wise DCT
new_coef = coef - 0.3;                       % offset applied to EVERY coefficient
% IDCT
new_img = blkproc(new_coef, [blkSz blkSz], 'idct2');
new_img = new_img + 128;
When I do this there is no visible difference in the image, even though the values are a bit higher. But if, instead of going block by block, I apply it to the coefficients of the full image
coef = dct2(img); % or blkSz = 512; % full image
the difference is noticeable.
What am I doing wrong? Is it the way I choose the values I add and multiply (which are totally random)? I would also like to mention that if I add an offset, artifacts from the IDCT appear in the output (the first 3 top-left pixels of each block are very different from the others).
I know that the top-left value of each DCT block holds the average brightness (DC component) of the block. Should I modify only this one and not the other values? How does the block size influence the result?
Adjusting the brightness and contrast of your image in the frequency domain is certainly possible, but the practical benefit is questionable. Mainly, I am not sure why you would want to go through the computational burden of calculating the DCT of each block: contrast and brightness enhancement in the spatial domain is at worst O(n), where n is the total number of pixels in the image, and going to the frequency domain only adds computational cost.
In any case, as you mentioned in your post, you can increase brightness by adding a constant to all of the intensity values, and you can increase contrast by scaling each pixel by a constant factor. This carries over to the frequency domain like so:
Contrast Enhancement
Looking at the spatial domain: if you multiply every pixel by a constant, all of the DCT coefficients also get multiplied by that same constant, because the DCT is a linear transform. So, to achieve contrast enhancement, take every single value in every DCT block you have and multiply it by the constant.
Brightness Enhancement
Looking at the spatial domain: if you add a constant to every pixel, you are essentially increasing the overall "power" of the image. If you compare the frequency spectrum of an image with that of the same image plus a constant, the DCT coefficients of each block are the same except for the DC value (the top-left corner of each block). So, to increase brightness in the frequency domain, add a constant to the DC value of each DCT block. Note, however, that the numbers do not carry over one-to-one: adding a value of, say, 5 in the spatial domain does not mean that adding 5 to each DC value gives the same result. There will definitely be an increase in brightness, but the amount depends on the DCT normalization and the block size (see the sketch after this answer for the exact relation).
A caveat to take note of: if you add a value to every DCT coefficient, you inject energy at every frequency, which shows up after the inverse DCT as a fixed artifact pattern in each block, concentrated near the top-left pixels of the block; this is exactly the artifact you describe. The higher the value, the stronger the artifacts, and the result may not be to your liking. Therefore, make sure you only add the constant to the DC value (i.e. the top-left corner of each DCT block).
Long story short: theoretically it is possible to do this in the frequency domain, but practically speaking I don't see the point.
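To make both points concrete, here is a small numerical sketch (Python, with scipy's orthonormal dctn/idctn standing in for MATLAB's dct2/blkproc; the image and the constants are made up):

import numpy as np
from scipy.fft import dctn, idctn

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64)).astype(float)   # toy grayscale image
B = 8                                                      # DCT block size

def blockwise(x, func):
    # Apply func to every BxB block (rough stand-in for MATLAB's blkproc)
    out = np.empty_like(x)
    for i in range(0, x.shape[0], B):
        for j in range(0, x.shape[1], B):
            out[i:i+B, j:j+B] = func(x[i:i+B, j:j+B])
    return out

coef = blockwise(img, lambda b: dctn(b, norm='ortho'))

# Contrast: the DCT is linear, so scaling EVERY coefficient by c
# is the same as scaling every pixel by c.
c = 1.5
contrast = blockwise(coef * c, lambda b: idctn(b, norm='ortho'))
assert np.allclose(contrast, c * img)

# Brightness: only touch the DC term (top-left entry of each block).
# For the orthonormal DCT, DC = B * block mean, so adding B*delta to the DC
# raises every pixel in that block by delta.
delta = 10.0
coef_bright = coef.copy()
coef_bright[::B, ::B] += B * delta
bright = blockwise(coef_bright, lambda b: idctn(b, norm='ortho'))
assert np.allclose(bright, img + delta)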

Does a negative cross correlation show high or low similarity?

I am programming some image processing techniques which require comparing the similarity of two sub-images. I'm using the normalised cross-correlation metric, which returns a value between -1 and +1. Should I take the absolute value of this as my similarity measure, or does a negative cross-correlation imply poor similarity?
Negative and positive correlation are both meaningful, and it all depends on your application. Let me make it clearer. Suppose you have three datasets (e.g. A: age, B: hair, C: height). Suppose the correlation between A and C is positive (0.98); it means that with increasing age, people are expected to be taller. However, when you calculate the correlation between A and B, you find that it is negative! What does that mean? It means that with increasing age, you can expect to have less hair! So, as you can see, a positive correlation means a parallel increase/decrease in both datasets, while a negative correlation means two opposite trends, which can be just as meaningful: based on the negative correlation you can expect to have more hair when you are a kid!
-1 is a sign of correlation, too. Only values around 0 indicate that there is no correlation. Near +1 means that the image is very similar to the other one. Near -1 means that one image is likely the negative of the other and should be inverted; then the images are similar and get a correlation near +1.
First of all, the Normalized Cross-Correlation (NCC) used as a similarity function has different properties than the correlation coefficient: positive large values imply high similarity, while negative large values imply low similarity.
If your input matrices have only positive values, you cannot get negative NCC values. However, if your implementation of the NCC first removes the mean intensity of the images, the images will contain both positive and negative values, and hence you can get negative NCC values; this is what normxcorr2 does, for instance.
TL;DR
First, you are trying to do image registration (template matching), i.e. transform an image to fit the coordinate system of the template image. In order to do that, you need to use a similarity function (or dissimilarity function) to estimate the needed transformation.
I assume that:
You only apply linear transformations, i.e. what needs to be estimated is the spatial misalignment between the two images: the offset along the x- and y-axis.
Both images have the same modality (single modality), i.e. both template and image were captured by the same device/configuration.
Hence, the use of the normalized cross-correlation seems like a good option. If the previous assumptions don't hold, please use another similarity function.
The Normalized Cross-Correlation (NCC) is an intensity-based similarity function: it measures how similar two images are based only on the pixel intensities. Basically, the image is spatially shifted over the template, and for each shift the pixel-by-pixel products at the overlapping positions are summed up.
Therefore, for maximum alignment, you need to shift the image to the position that maximises the similarity between the two images, i.e. the largest NCC value.
See also: image registration
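As a minimal sketch of that shift-and-score idea with OpenCV (hypothetical file names; cv2.TM_CCOEFF_NORMED is the mean-subtracted NCC, cv2.TM_CCORR_NORMED the plain normalized cross-correlation):

import cv2

image = cv2.imread('scene.png', cv2.IMREAD_GRAYSCALE)        # image to search in
template = cv2.imread('template.png', cv2.IMREAD_GRAYSCALE)  # sub-image to locate

# Slide the template over the image and compute the NCC at every offset
scores = cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)

# The offset with the largest NCC is the best estimate of the misalignment
_, max_val, _, max_loc = cv2.minMaxLoc(scores)
print('best offset (x, y):', max_loc, 'NCC:', max_val)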
I don't think you are right in saying that "the Normalized Cross-Correlation (NCC) used as a similarity function has different properties than the correlation coefficient: positive large values imply high similarity, while negative large values imply low similarity."
Correlation in images is no different from other kinds of correlation.
In NCC, a correlation value of +1 indicates that two images are identical pixel by pixel. A correlation value of 0 indicates no similarity. However, a correlation value of -1 does not imply no similarity; it also means maximum similarity, but in the opposite sense. The image pixels in the normalized domain can take values in the range [0, 1]. If you take one of the images and subtract every pixel value from 1 (1 - pixel_value), you create an inverted image, where bright spots become dark and dark spots become light. If the correlation with the original image is 1 (100% similarity), the correlation with the inverted image is -1.
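A quick numerical check of that last point (a plain numpy sketch; the test patch is random):

import numpy as np

def ncc(a, b):
    # Normalized cross-correlation of two equally sized patches (mean removed)
    a = a - a.mean()
    b = b - b.mean()
    return float(np.sum(a * b) / np.sqrt(np.sum(a * a) * np.sum(b * b)))

rng = np.random.default_rng(0)
img = rng.random((32, 32))   # pixel values in [0, 1]
inverted = 1.0 - img         # bright spots become dark and vice versa

print(ncc(img, img))         # 1.0  (identical)
print(ncc(img, inverted))    # -1.0 (maximally similar, in the opposite sense)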

What is sparsity in image processing?

I am new to image processing and I don't know the use of some basic terms. I know the basic definition of sparsity, but can anyone please elaborate on the definition in terms of image processing?
Well Sajid, I was actually doing image processing a few months ago, and I found a website that gave me what I thought was the best definition of sparsity.
Sparsity and density are terms used to describe the percentage of cells in a database table that are not populated and populated, respectively. The sum of the sparsity and density should equal 100%. A table that is 10% dense has 10% of its cells populated with non-zero values. It is therefore 90% sparse, meaning that 90% of its cells are either not filled with data or are zeros.
I took this in the context of on/off for black and white image processing. If many pixels were off, then the image was sparse.
As The Obscure Question said, sparsity is when a vector or matrix is mostly zeros. To see a real world example of this, just look at the wavelet transform, which is known to be sparse for any real-world image.
(In a typical visualization of a wavelet-transformed image, all the black values are 0.)
Sparsity has powerful impacts. It can turn the multiplication of two NxN matrices, normally an O(N^3) operation, into an operation whose cost scales with the number k of non-zero elements instead. Why? Because it's a well-known fact that for all x, x * 0 = 0, so the zero entries can simply be skipped.
What does sparsity mean? In the problems I've been exposed to, it means similarity in some domain. For example, natural images are largely the same color over large areas (the sky is blue, the grass is green, etc.). If you take the wavelet transform of such a natural image, the output is sparse thanks to the recursive nature of the wavelet (well, at least of the Haar wavelet).
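To make that concrete, here is a small sketch in plain numpy: one level of the Haar transform of a smooth, natural-image-like patch leaves almost all of the detail coefficients near zero, which is exactly the sparsity being talked about:

import numpy as np

def haar2d_level(x):
    # One level of the 2D Haar transform: average (LL) and detail sub-bands
    lo = (x[:, 0::2] + x[:, 1::2]) / 2.0
    hi = (x[:, 0::2] - x[:, 1::2]) / 2.0
    ll = (lo[0::2, :] + lo[1::2, :]) / 2.0
    lh = (lo[0::2, :] - lo[1::2, :]) / 2.0
    hl = (hi[0::2, :] + hi[1::2, :]) / 2.0
    hh = (hi[0::2, :] - hi[1::2, :]) / 2.0
    return ll, lh, hl, hh

# A smooth gradient as a stand-in for the flat regions of a natural image
xg, yg = np.meshgrid(np.linspace(0, 1, 64), np.linspace(0, 1, 64))
img = xg * yg

ll, lh, hl, hh = haar2d_level(img)
details = np.concatenate([lh.ravel(), hl.ravel(), hh.ravel()])
print('fraction of detail coefficients near zero:',
      np.mean(np.abs(details) < 1e-2))   # close to 1.0, i.e. a sparse representation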
