Checking images for similarity with OpenCV

Does OpenCV support the comparison of two images, returning some value (maybe a percentage) that indicates how similar these images are? E.g. 100% would be returned if the same image was passed twice, 0% would be returned if the images were totally different.
I have already read a lot of similar topics here on Stack Overflow and done quite a bit of Googling, but sadly I couldn't come up with a satisfying answer.

This is a huge topic, with answers ranging from 3 lines of code to entire research journals.
I will outline the most common techniques and their results.
Comparing histograms
One of the simplest & fastest methods. Proposed decades ago as a means to find picture similarities. The idea is that a forest will have a lot of green, and a human face a lot of pink, or whatever. So, if you compare two pictures with forests, you'll get some similarity between histograms, because you have a lot of green in both.
Downside: it is too simplistic. A banana and a beach will look the same, as both are yellow.
OpenCV method: compareHist()
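As a rough sketch of this approach (the file names are placeholders and the bin counts are arbitrary choices), you could compare hue-saturation histograms with the correlation metric, where 1.0 means identical histograms:
import cv2

img1 = cv2.imread('image1.jpg')
img2 = cv2.imread('image2.jpg')

# Compare hue-saturation histograms in HSV space
hsv1 = cv2.cvtColor(img1, cv2.COLOR_BGR2HSV)
hsv2 = cv2.cvtColor(img2, cv2.COLOR_BGR2HSV)
hist1 = cv2.calcHist([hsv1], [0, 1], None, [50, 60], [0, 180, 0, 256])
hist2 = cv2.calcHist([hsv2], [0, 1], None, [50, 60], [0, 180, 0, 256])
cv2.normalize(hist1, hist1)
cv2.normalize(hist2, hist2)

# HISTCMP_CORREL returns 1.0 for identical histograms, lower values for dissimilar ones
print(cv2.compareHist(hist1, hist2, cv2.HISTCMP_CORREL))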
Template matching
A good example here: matchTemplate finding a good match. It convolves the template image with the one being searched in. It is usually used to find smaller image parts in a bigger one.
Downsides: It only returns good results with identical images, same size & orientation.
OpenCV method: matchTemplate()
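A minimal sketch of how this is typically used (file names are placeholders), assuming the template is smaller than the image being searched:
import cv2

image = cv2.imread('scene.jpg', cv2.IMREAD_GRAYSCALE)       # image to search in
template = cv2.imread('template.jpg', cv2.IMREAD_GRAYSCALE) # smaller patch to find

# Slide the template over the image and record a normalized match score at each position
result = cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)

# max_val close to 1.0 means a very good match at position max_loc
print(max_val, max_loc)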
Feature matching
Considered one of the most efficient ways to do image search. A number of features are extracted from an image, in a way that guarantees the same features will be recognized again even when rotated, scaled or skewed. The features extracted this way can be matched against other image feature sets. Another image that has a high proportion of the features matching the first one is considered to be depicting the same scene.
Finding the homography between the two sets of points will allow you to also find the relative difference in shooting angle between the original pictures or the amount of overlapping.
There are a number of OpenCV tutorials/samples on this, and a nice video here. A whole OpenCV module (features2d) is dedicated to it.
Downsides: It may be slow. It is not perfect.
Over on the OpenCV Q&A site I discuss the difference between feature descriptors, which are great when comparing whole images, and texture descriptors, which are used to identify objects like human faces or cars in an image.
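For illustration, here is a minimal ORB-based sketch (the file names and the 0.75 ratio are arbitrary choices, not from the answer above); the fraction of good matches can serve as a crude similarity score, and the matched keypoints could be passed on to findHomography():
import cv2

img1 = cv2.imread('image1.jpg', cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread('image2.jpg', cv2.IMREAD_GRAYSCALE)

# Detect keypoints and compute binary descriptors
orb = cv2.ORB_create(nfeatures=1000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Match descriptors and keep the best ones with Lowe's ratio test
bf = cv2.BFMatcher(cv2.NORM_HAMMING)
matches = bf.knnMatch(des1, des2, k=2)
good = []
for pair in matches:
    if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
        good.append(pair[0])

# Fraction of matched features as a rough similarity measure
score = len(good) / max(len(kp1), len(kp2))
print("Similarity: {:.1f}%".format(score * 100))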

Since no one has posted a complete concrete example, here are two quantitative methods to determine the similarity between two images. One method compares images with the same dimensions; the other is scale-invariant and transformation-indifferent. Both methods return a similarity score between 0 and 100, where 0 represents a completely different image and 100 represents an identical/duplicate image. For all other values in between: the lower the score, the less similar; the higher the score, the more similar.
Method #1: Structural Similarity Index (SSIM)
To compare differences and determine the exact discrepancies between two images, we can utilize the Structural Similarity Index (SSIM), which was introduced in Image Quality Assessment: From Error Visibility to Structural Similarity. SSIM is an image quality assessment approach which estimates the degradation of structural similarity based on the statistical properties of local information between a reference and a distorted image. SSIM values lie in the range [-1, 1], and SSIM is typically calculated using a sliding window in which the SSIM value for the whole image is computed as the average across all individual window results. This method is already implemented in the scikit-image library for image processing and can be installed with pip install scikit-image.
The skimage.metrics.structural_similarity() function returns a comparison score and a difference image, diff. The score represents the mean SSIM score between two images with higher values representing higher similarity. The diff image contains the actual image differences with darker regions having more disparity. Larger areas of disparity are highlighted in black while smaller differences are in gray. Here's an example:
Input images
Difference image -> highlighted mask differences
The SSIM score after comparing the two images shows that they are very similar.
Similarity Score: 89.462%
To visualize the exact differences between the two images, we can iterate through each contour, filter using a minimum threshold area to remove tiny noise, and highlight discrepancies with a bounding box.
Limitations: Although this method works very well, there are some important limitations. The two input images must have the same size/dimensions, and the method also suffers from a few problems including scaling, translations, rotations, and distortions. SSIM also does not perform very well on blurry or noisy images. These problems are addressed in Method #2.
Code:
from skimage.metrics import structural_similarity
import cv2
import numpy as np
first = cv2.imread('clownfish_1.jpeg')
second = cv2.imread('clownfish_2.jpeg')
# Convert images to grayscale
first_gray = cv2.cvtColor(first, cv2.COLOR_BGR2GRAY)
second_gray = cv2.cvtColor(second, cv2.COLOR_BGR2GRAY)
# Compute SSIM between two images
score, diff = structural_similarity(first_gray, second_gray, full=True)
print("Similarity Score: {:.3f}%".format(score * 100))
# The diff image contains the actual image differences between the two images
# and is represented as a floating point data type so we must convert the array
# to 8-bit unsigned integers in the range [0,255] before we can use it with OpenCV
diff = (diff * 255).astype("uint8")
# Threshold the difference image, followed by finding contours to
# obtain the regions that differ between the two images
thresh = cv2.threshold(diff, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
contours = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
contours = contours[0] if len(contours) == 2 else contours[1]
# Highlight differences
mask = np.zeros(first.shape, dtype='uint8')
filled = second.copy()
for c in contours:
    area = cv2.contourArea(c)
    if area > 100:
        x,y,w,h = cv2.boundingRect(c)
        cv2.rectangle(first, (x, y), (x + w, y + h), (36,255,12), 2)
        cv2.rectangle(second, (x, y), (x + w, y + h), (36,255,12), 2)
        cv2.drawContours(mask, [c], 0, (0,255,0), -1)
        cv2.drawContours(filled, [c], 0, (0,255,0), -1)
cv2.imshow('first', first)
cv2.imshow('second', second)
cv2.imshow('diff', diff)
cv2.imshow('mask', mask)
cv2.imshow('filled', filled)
cv2.waitKey()
Method #2: Dense Vector Representations
Typically, two images will not be exactly the same. They may have variations with slightly different backgrounds, dimensions, feature additions/subtractions, or transformations (scaled, rotated, skewed). In other words, we cannot use a direct pixel-to-pixel approach since with variations, the problem shifts from identifying pixel-similarity to object-similarity. We must switch to deep-learning feature models instead of comparing individual pixel values.
To determine identical and near-similar images, we can use the sentence-transformers library, which provides an easy way to compute dense vector representations for images, and the OpenAI Contrastive Language-Image Pre-Training (CLIP) model, which is a neural network already trained on a variety of (image, text) pairs. The idea is to encode all images into vector space and then find high density regions which correspond to areas where the images are fairly similar.
When two images are compared, they are given a score between 0 and 1.00. We can use a threshold parameter to identify two images as similar or different. A lower threshold will result in larger clusters whose images are less similar to each other; conversely, a higher threshold will result in smaller clusters of more similar images. A duplicate image will have a score of 1.00, meaning the two images are exactly the same. To find near-similar images, we can set the threshold to any arbitrary value, say 0.9. For instance, if the determined score between two images is greater than 0.9, then we can conclude they are near-similar images.
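As a minimal sketch of comparing just two images this way (file names are placeholders; the full clustering example follows below), one can encode both images with the CLIP model and take the cosine similarity of the embeddings:
from sentence_transformers import SentenceTransformer, util
from PIL import Image

# Same CLIP model as in the full example below
model = SentenceTransformer('clip-ViT-B-32')

emb1 = model.encode(Image.open('cat_1.jpg'), convert_to_tensor=True)
emb2 = model.encode(Image.open('cat_2.jpg'), convert_to_tensor=True)

# Cosine similarity of the two embeddings; a score >= 0.9 would count as near-similar here
score = util.cos_sim(emb1, emb2).item()
print("Score: {:.3f}%".format(score * 100))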
An example:
This dataset has five images; notice how there are duplicates of flower #1 while the others are different.
Identifying duplicate images
Score: 100.000%
.\flower_1 copy.jpg
.\flower_1.jpg
Both flower #1 and its copy are the same
Identifying near-similar images
Score: 97.141%
.\cat_1.jpg
.\cat_2.jpg
Score: 95.693%
.\flower_1.jpg
.\flower_2.jpg
Score: 57.658%
.\cat_1.jpg
.\flower_1 copy.jpg
Score: 57.658%
.\cat_1.jpg
.\flower_1.jpg
Score: 57.378%
.\cat_1.jpg
.\flower_2.jpg
Score: 56.768%
.\cat_2.jpg
.\flower_1 copy.jpg
Score: 56.768%
.\cat_2.jpg
.\flower_1.jpg
Score: 56.284%
.\cat_2.jpg
.\flower_2.jpg
We get more interesting results between different images. The higher the score, the more similar; the lower the score, the less similar. Using a threshold of 0.9 or 90%, we can filter for near-similar images.
Comparison between just two images
Score: 97.141%
.\cat_1.jpg
.\cat_2.jpg
Score: 95.693%
.\flower_1.jpg
.\flower_2.jpg
Score: 88.914%
.\ladybug_1.jpg
.\ladybug_2.jpg
Score: 94.503%
.\cherry_1.jpg
.\cherry_2.jpg
Code:
from sentence_transformers import SentenceTransformer, util
from PIL import Image
import glob
import os
# Load the OpenAI CLIP Model
print('Loading CLIP Model...')
model = SentenceTransformer('clip-ViT-B-32')
# Next we compute the embeddings
# To encode an image, you can use the following code:
# from PIL import Image
# encoded_image = model.encode(Image.open(filepath))
image_names = list(glob.glob('./*.jpg'))
print("Images:", len(image_names))
encoded_image = model.encode([Image.open(filepath) for filepath in image_names], batch_size=128, convert_to_tensor=True, show_progress_bar=True)
# Now we run the clustering algorithm. This function compares images against
# all other images and returns a list with the pairs that have the highest
# cosine similarity score
processed_images = util.paraphrase_mining_embeddings(encoded_image)
NUM_SIMILAR_IMAGES = 10
# =================
# DUPLICATES
# =================
print('Finding duplicate images...')
# Filter list for duplicates. Results are triplets (score, image_id1, image_id2) sorted in decreasing order
# A duplicate image will have a score of 1.00
# It may be 0.9999 due to lossy image compression (.jpg)
duplicates = [image for image in processed_images if image[0] >= 0.999]
# Output the top X duplicate images
for score, image_id1, image_id2 in duplicates[0:NUM_SIMILAR_IMAGES]:
    print("\nScore: {:.3f}%".format(score * 100))
    print(image_names[image_id1])
    print(image_names[image_id2])
# =================
# NEAR DUPLICATES
# =================
print('Finding near duplicate images...')
# Use a threshold parameter to identify two images as similar. By setting the threshold lower,
# you will get larger clusters which have less similar images in it. Threshold 0 - 1.00
# A threshold of 1.00 means the two images are exactly the same. Since we are finding near
# duplicate images, we can set it at 0.99 or any number 0 < X < 1.00.
threshold = 0.99
near_duplicates = [image for image in processed_images if image[0] < threshold]
for score, image_id1, image_id2 in near_duplicates[0:NUM_SIMILAR_IMAGES]:
    print("\nScore: {:.3f}%".format(score * 100))
    print(image_names[image_id1])
    print(image_names[image_id2])

For matching identical images (same size/orientation):
// Compare two images by getting the L2 error (square-root of sum of squared error).
double getSimilarity( const Mat A, const Mat B ) {
    if ( A.rows > 0 && A.rows == B.rows && A.cols > 0 && A.cols == B.cols ) {
        // Calculate the L2 relative error between images.
        double errorL2 = norm( A, B, CV_L2 );
        // Convert to a reasonable scale, since L2 error is summed across all pixels of the image.
        double similarity = errorL2 / (double)( A.rows * A.cols );
        return similarity;
    }
    else {
        // Images have a different size
        return 100000000.0;  // Return a bad value
    }
}
Source
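For reference (not part of the original answer), a rough Python/NumPy equivalent of the same idea, assuming both images are already loaded and have matching sizes:
import cv2

def get_similarity(a, b):
    # L2 error normalized by the number of pixels; lower values mean more similar images
    if a.shape == b.shape and a.size > 0:
        error_l2 = cv2.norm(a, b, cv2.NORM_L2)
        return error_l2 / float(a.shape[0] * a.shape[1])
    return 1e8  # images have a different size: return a bad value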

Sam's solution should be sufficient. I've used a combination of both histogram difference and template matching because no single method was working for me 100% of the time. I've given less importance to the histogram method, though. Here's how I've implemented it in a simple Python script.
import cv2
class CompareImage(object):

    def __init__(self, image_1_path, image_2_path):
        self.minimum_commutative_image_diff = 1
        self.image_1_path = image_1_path
        self.image_2_path = image_2_path

    def compare_image(self):
        image_1 = cv2.imread(self.image_1_path, 0)
        image_2 = cv2.imread(self.image_2_path, 0)
        commutative_image_diff = self.get_image_difference(image_1, image_2)

        if commutative_image_diff < self.minimum_commutative_image_diff:
            print("Matched")
            return commutative_image_diff
        return 10000  # random failure value

    @staticmethod
    def get_image_difference(image_1, image_2):
        first_image_hist = cv2.calcHist([image_1], [0], None, [256], [0, 256])
        second_image_hist = cv2.calcHist([image_2], [0], None, [256], [0, 256])

        img_hist_diff = cv2.compareHist(first_image_hist, second_image_hist, cv2.HISTCMP_BHATTACHARYYA)
        img_template_probability_match = cv2.matchTemplate(first_image_hist, second_image_hist, cv2.TM_CCOEFF_NORMED)[0][0]
        img_template_diff = 1 - img_template_probability_match

        # taking only 10% of histogram diff, since it's less accurate than template method
        commutative_image_diff = (img_hist_diff / 10) + img_template_diff
        return commutative_image_diff


if __name__ == '__main__':
    compare_image = CompareImage('image1/path', 'image2/path')
    image_difference = compare_image.compare_image()
    print(image_difference)

A little bit off-topic but useful is the pythonic NumPy approach. It's robust and fast, but it only compares pixels and not the objects or data the picture contains (and it requires images of the same size and shape):
A very simple and fast approach to do this without OpenCV or any computer vision library is to normalize the picture arrays:
import numpy as np
picture1 = np.random.rand(100,100)
picture2 = np.random.rand(100,100)
picture1_norm = picture1/np.sqrt(np.sum(picture1**2))
picture2_norm = picture2/np.sqrt(np.sum(picture2**2))
After defining both normed pictures (or matrices) you can just sum over the multiplication of the pictures you like to compare:
1) If you compare identical pictures, the sum will return 1:
In[1]: np.sum(picture1_norm**2)
Out[1]: 1.0
2) If they aren't similar, you'll get a value between 0 and 1 (a percentage if you multiply by 100):
In[2]: np.sum(picture2_norm*picture1_norm)
Out[2]: 0.75389941124629822
Please notice that if you have colored pictures you have to do this in all 3 dimensions or just compare a greyscaled version. I often have to compare huge amounts of pictures with arbitrary content and that's a really fast way to do so.
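For example, a small sketch of the grayscale route with real image files (file names are placeholders; PIL is used only for loading, the comparison itself stays pure NumPy), assuming both images have the same shape:
import numpy as np
from PIL import Image

# convert('L') gives a grayscale version of each picture
a = np.asarray(Image.open('image1.jpg').convert('L'), dtype=float)
b = np.asarray(Image.open('image2.jpg').convert('L'), dtype=float)

# Normalize each picture to unit L2 norm, then sum over the element-wise product
a_norm = a / np.sqrt(np.sum(a**2))
b_norm = b / np.sqrt(np.sum(b**2))
print(np.sum(a_norm * b_norm))  # 1.0 for identical pictures, lower otherwise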

One can use an autoencoder for such a task, using architectures like VGG16 pre-trained on ImageNet data; then calculate the distance between the query and the other images in order to find the closest match.
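As a hedged illustration of that idea (a pre-trained VGG16 used as a plain feature extractor rather than a full autoencoder; the file names are placeholders), one could compare images by the cosine similarity of their pooled convolutional features:
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing import image

# VGG16 pre-trained on ImageNet, with global average pooling as the embedding
model = VGG16(weights='imagenet', include_top=False, pooling='avg')

def embed(path):
    img = image.load_img(path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return model.predict(x)[0]

a = embed('query.jpg')      # hypothetical query image
b = embed('candidate.jpg')  # hypothetical candidate image

# Cosine similarity between the two embeddings: closer to 1.0 means more similar
print(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))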

Related

Calculate 3D distance based on change in intensity

I have three sections (top, mid, bot) of grayscale images (3D). In each section, I have a point with coordinates (x,y) and intensity values [0-255]. The distance between each section is 20 pixels.
I created an illustration to show how those images were generated using a microscope:
Illustration
Illustration (side view): the red line is the object of interest. Blue stars represent the dots which are visible in the top, mid, and bot sections. The (x,y) coordinates of these dots are known. The length of the object remains the same, but it can rotate in space - 'out of focus' (the illustration shows a rotating line at time point 5). At time point 1, the red line is resting (in the 2D image: 2 dots with a distance equal to the length of the object).
I want to estimate the x,y,z-coordinates of the end points (represented as stars) by using the changes in intensity, the knowledge about the length of the object, and the information in the sections I have. Any help would be appreciated.
Here is an example of images:
Bot section
Mid section
Top section
My 3D PSF data:
https://drive.google.com/file/d/1qoyhWtLDD2fUy2zThYUgkYM3vMXxNh64/view?usp=sharing
Attempt so far:
I guess the correct approach would be to record three images with slightly different z-coordinates for your bot and your top frame, then do a 3D-deconvolution (using Richardson-Lucy or whatever algorithm).
However, a more simple approach would be as I have outlined in my comment. If you use the data for a publication, I strongly recommend to emphasize that this is just an estimation and to include the steps how you have done it.
I'd suggest the following procedure:
Since I do not have your PSF data, I fake some by estimating the PSF as a 3D Gaussian. Of course, this is a strong simplification, but you should be able to get the idea behind it.
First, fit a Gaussian to the PSF along z:
[xg, yg, zg] = meshgrid(-32:32, -32:32, -32:32);
rg = sqrt(xg.^2+yg.^2);
psf = exp(-(rg/8).^2) .* exp(-(zg/16).^2);
% add some noise to make it a bit more realistic
psf = psf + randn(size(psf)) * 0.05;
% view psf:
%
subplot(1,3,1);
s = slice(xg,yg,zg, psf, 0,0,[]);
title('faked PSF');
for i=1:2
    s(i).EdgeColor = 'none';
end
% data along z through PSF's center
z = reshape(psf(33,33,:),[65,1]);
subplot(1,3,2);
plot(-32:32, z);
title('PSF along z');
% Fit the data
% Generate a function for a gaussian distibution plus some background
gauss_d = @(x0, sigma, bg, x)exp(-1*((x-x0)/(sigma)).^2)+bg;
ft = fit ((-32:32)', z, gauss_d, ...
'Start', [0 16 0] ... % You may find proper start points by looking at your data
);
subplot(1,3,3);
plot(-32:32, z, '.');
hold on;
plot(-32:.1:32, feval(ft, -32:.1:32), 'r-');
title('fit to z-profile');
The function that relates the intensity I to the z-coordinate is
gauss_d = @(x0, sigma, bg, x)exp(-1*((x-x0)/(sigma)).^2)+bg;
You can re-arrange this formula for x. Due to the square root, there are two possibilities:
% now make a function that returns the z-coordinate from the intensity
% value:
zfromI = @(I)ft.sigma * sqrt(-1*log(I-ft.bg))+ft.x0;
zfromI2= @(I)ft.sigma * -sqrt(-1*log(I-ft.bg))+ft.x0;
Note that the PSF I have faked is normalized to have one as its maximum value. If your PSF data is not normalized, you can divide the data by its maximum.
Now, you can use zfromI or zfromI2 to get the z-coordinate for your intensity. Again, I should be normalized, that is the fraction of the intensity to the intensity of your reference spot:
zfromI(.7)
ans =
9.5469
>> zfromI2(.7)
ans =
-9.4644
Note that due to the random noise I have added, your results might look slightly different.

How to sort image blocks to minimize the differences between them?

I have a list of small image blocks. All of them are the same size (eg: 16x16). They are black and white, with pixel values between 0-255. I would like to sort those images to minimize the difference between them.
What I have in mind is to compute the mean absolute difference (MAD) of pixels between adjacent blocks (eg : block N vs block N+1, block N+1 vs block N+2, ...).
From this I can calculate the sum of those MAD values (eg : sum = mad1 + mad2 + ...). What I would like is to find the ordering that minimize that sum.
Before :
After : (this was done by hand only to provide an example, there is probably a better ordering for those blocks, especially the ones with vertical stripes)
Based on RaymoAisla's comment, which pointed out that this is similar to the travelling salesman problem, I decided to implement a solution using the nearest neighbour algorithm. While not perfect, it gives great results:
Here is code :
from PIL import Image
import numpy as np
def sortImages(images):  # Nearest neighbour algorithm
    result = []
    tovisit = list(range(len(images)))
    best = max(tovisit, key=lambda x: images[x].sum())  # brightest block as first
    tovisit.remove(best)
    result.append(best)
    while tovisit:
        best = min(tovisit, key=lambda x: MAD(images[x], images[best]))  # nearest neighbour
        tovisit.remove(best)
        result.append(best)
    return result  # indices of sorted image blocks

def MAD(a, b):  # sum of absolute differences (proportional to the mean for equal-sized blocks)
    return np.absolute(a - b).sum()

images = ...  # should contain b&w images only (eg: from PIL Image)
arrays = [np.array(x).astype(int) for x in images]  # convert images to np arrays
result = sortImages(arrays)

Average set of color images and standard deviation

I am learning image analysis and trying to average a set of color images and get the standard deviation at each pixel.
I have done this, but not by averaging the RGB channels separately (e.g. rchannel = I(:,:,1)).
filelist = dir('dir1/*.jpg');
ims = zeros(215, 300, 3);
for i=1:length(filelist)
    imname = ['dir1/' filelist(i).name];
    rgbim = im2double(imread(imname));
    ims = ims + rgbim;
end
avgset1 = ims/length(filelist);
figure;
imshow(avgset1);
I am not sure if this is correct. I am confused as to how averaging images is useful.
Also, I couldn't get the matrix holding standard deviation.
Any help is appreciated.
If you are concerned about finding the mean RGB image, then your code is correct. What I like is that you converted the images using im2double before accumulating the mean, so you are making everything double precision. As Parag said, finding the mean image is very useful, especially in machine learning. It is common to find the mean image of a set of images before doing image classification, as it allows the dynamic range of each pixel to be within a normalized range. This allows the training of the learning algorithm to converge quickly to the optimum solution and provide the best set of parameters to facilitate the best accuracy in classification.
If you want to find the mean RGB colour which is the average colour over all images, then no your code is not correct.
You have summed all of the images per channel, which is stored in sumrgbims; dividing by the number of images gives the mean image. To get the mean RGB colour, take the mean image and average over each channel's spatial dimensions. Two calls to mean over the first and second dimensions chained together will help. This produces a 1 x 1 x 3 array, so using squeeze after this removes the singleton dimensions and gives a 3 x 1 vector representing the mean RGB colour over all images.
Therefore:
mean_colour = squeeze(mean(mean(rgbavgset1, 1), 2));
To address your second question, I'm assuming you want to find the standard deviation of each pixel value over all images. What you will have to do is accumulate the square of each image in addition to accumulating each image inside the loop. After that, you know that the standard deviation is the square root of the variance, and the variance is equal to the mean of the squares minus the square of the mean. We have the mean image, so you just have to square the mean image and subtract it from the average of the squared images. Just to be sure our math is right, suppose we have a signal X with a mean mu. Given that we have N values in our signal, the variance is thus equal to:
Var(X) = (1/N) * sum_{i=1..N} (x_i - mu)^2 = (1/N) * sum_{i=1..N} x_i^2 - mu^2
(Source: Science Buddies)
The standard deviation would simply be the square root of the above result. We would thus calculate this for each pixel independently. Therefore you can modify your loop to do that for you:
filelist = dir('set1/*.jpg');
sumrgbims = zeros(215, 300, 3);
sum2rgbims = sumrgbims; % New - for standard deviation
for i=1:length(filelist)
    imname = ['set1/' filelist(i).name];
    rgbim = im2double(imread(imname));
    sumrgbims = sumrgbims + rgbim;
    sum2rgbims = sum2rgbims + rgbim.^2; % New
end
rgbavgset1 = sumrgbims/length(filelist);
% New - find standard deviation
rgbstdset1 = ((sum2rgbims / length(filelist)) - rgbavgset1.^2).^(0.5);
figure;
imshow(rgbavgset1, []);
% New - display standard deviation image
figure;
imshow(rgbstdset1, []);
Also, to be sure, I've scaled the display of each imshow call so the smallest value gets mapped to 0 and the largest value gets mapped to 1. This does not change the actual contents of the images; it is just for display purposes.

Comparing 2 images intensities

I have two images, 1 and 2. I want to get the V (intensity) channel of the HSV images; then I want to make the V (intensity) values of the first image match the V (intensity) values of the second image.
I used this code to get the v
v = image1(:, :, 3);
u = image2(:, :, 3);
How do I make both u and v the same value?
Thanks,
It definitely sounds like you want to do some kind of histogram equalization. As a first attempt you can try Matlab's histeq function. (You should also read the documentation for imhist.) To make the intensity values in image1 more closely match those in image2 (they will almost certainly never become identical) you would do something like this:
v = image1(:, :, 3);
u = image2(:, :, 3);
targetHist = imhist(u);
newV = histeq(v, targetHist);
newImage1 = cat(3, image1(:, :, 1), image1(:, :, 2), newV);
% or, alternatively %
image1(:, :, 3) = newV;
If this technique doesn't give you the results you need, there are other methods of histogram equalization that you can use, including adaptive techniques that equalize intensities by regions.
Edit:
Here's a little about what the code is doing. For more information you can look at the Algorithm section of the link to histeq I gave above.
Given a reference image (in this case image2) in HSV format, we're taking the Value channel (or Intensity, or Brightness channel) and using imhist to divide the intensity levels in the image into 256 bins (by default), with each bin containing the number of pixels that have that intensity value.
histeq in this usage actually does histogram matching. It calculates the histogram for the second image and tries to "match" the input histogram. For instance, if the histogram has all of the bins empty except for bin 100, which has 300 pixels, and your image has all of the bins empty except for bin 80, which has 300 pixels, histeq will increase the intensity of each pixel in the image by 20. A very simplistic example, but hopefully you get the idea.
It would be very helpful to plot the 3 histograms for image1, image2 and the equalized histogram to see how the intensity levels for image1 are changed.

How does the algorithm to color the song list in iTunes 11 work? [closed]

The new iTunes 11 has a very nice view for the song list of an album, picking the colors for the fonts and background as a function of the album cover. Has anyone figured out how the algorithm works?
I approximated the iTunes 11 color algorithm in Mathematica given the album cover as input:
How I did it
Through trial and error, I came up with an algorithm that works on ~80% of the albums with which I've tested it.
Color Differences
The bulk of the algorithm deals with finding the dominant color of an image. A prerequisite to finding dominant colors, however, is calculating a quantifiable difference between two colors. One way to calculate the difference between two colors is to calculate their Euclidean distance in the RGB color space. However, human color perception doesn't match up very well with distance in the RGB color space.
Therefore, I wrote a function to convert RGB colors (in the form {1,1,1}) to YUV, a color space which is much better at approximating color perception:
(EDIT: @cormullion and @Drake pointed out that Mathematica's built-in CIELAB and CIELUV color spaces would be just as suitable... looks like I reinvented the wheel a bit here)
convertToYUV[rawRGB_] :=
Module[{yuv},
yuv = {{0.299, 0.587, 0.114}, {-0.14713, -0.28886, 0.436},
{0.615, -0.51499, -0.10001}};
yuv . rawRGB
]
Next, I wrote a function to calculate color distance with the above conversion:
ColorDistance[rawRGB1_, rawRGB2_] :=
EuclideanDistance[convertToYUV @ rawRGB1, convertToYUV @ rawRGB2]
Dominant Colors
I quickly discovered that the built-in Mathematica function DominantColors doesn't allow enough fine-grained control to approximate the algorithm that iTunes uses. I wrote my own function instead...
A simple method to calculate the dominant color in a group of pixels is to collect all pixels into buckets of similar colors and then find the largest bucket.
DominantColorSimple[pixelArray_] :=
Module[{buckets},
buckets = Gather[pixelArray, ColorDistance[#1,#2] < .1 &];
buckets = Sort[buckets, Length[#1] > Length[#2] &];
RGBColor @@ Mean @ First @ buckets
]
Note that .1 is the tolerance for how different colors must be to be considered separate. Also note that although the input is an array of pixels in raw triplet form ({{1,1,1},{0,0,0}}), I return a Mathematica RGBColor element to better approximate the built-in DominantColors function.
My actual function DominantColorsNew adds the option of returning up to n dominant colors after filtering out a given other color. It also exposes tolerances for each color comparison:
DominantColorsNew[pixelArray_, threshold_: .1, n_: 1,
numThreshold_: .2, filterColor_: 0, filterThreshold_: .5] :=
Module[
{buckets, color, previous, output},
buckets = Gather[pixelArray, ColorDistance[#1, #2] < threshold &];
If[filterColor =!= 0,
buckets =
Select[buckets,
ColorDistance[ Mean[#1], filterColor] > filterThreshold &]];
buckets = Sort[buckets, Length[#1] > Length[#2] &];
If[Length @ buckets == 0, Return[{}]];
color = Mean @ First @ buckets;
buckets = Drop[buckets, 1];
output = List[RGBColor @@ color];
previous = color;
Do[
If[Length @ buckets == 0, Return[output]];
While[
ColorDistance[(color = Mean @ First @ buckets), previous] <
numThreshold,
If[Length @ buckets != 0, buckets = Drop[buckets, 1],
Return[output]]
];
output = Append[output, RGBColor @@ color];
previous = color,
{i, n - 1}
];
output
]
The Rest of the Algorithm
First, I resized the album cover to 36x36 px and reduced detail with a bilateral filter:
image = Import["http://i.imgur.com/z2t8y.jpg"]
thumb = ImageResize[ image, 36, Resampling -> "Nearest"];
thumb = BilateralFilter[thumb, 1, .2, MaxIterations -> 2];
iTunes picks the background color by finding the dominant color along the edges of the album. However, it ignores narrow album cover borders by cropping the image.
thumb = ImageCrop[thumb, 34];
Next, I found the dominant color (with the new function above) along the outermost edge of the image with a default tolerance of .1.
border = Flatten[
Join[ImageData[thumb][[1 ;; 34 ;; 33]] ,
Transpose @ ImageData[thumb][[All, 1 ;; 34 ;; 33]]], 1];
background = DominantColorsNew[border][[1]];
Lastly, I returned 2 dominant colors in the image as a whole, telling the function to filter out the background color as well.
highlights = DominantColorsNew[Flatten[ImageData[thumb], 1], .1, 2, .2,
List @@ background, .5];
title = highlights[[1]];
songs = highlights[[2]];
The tolerance values above are as follows:
.1 is the minimum difference between "separate" colors.
.2 is the minimum difference between numerous dominant colors (a lower value might return black and dark gray, while a higher value ensures more diversity in the dominant colors).
.5 is the minimum difference between dominant colors and the background (a higher value will yield higher-contrast color combinations).
Voila!
Graphics[{background, Disk[]}]
Graphics[{title, Disk[]}]
Graphics[{songs, Disk[]}]
Notes
The algorithm can be applied very generally. I tweaked the above settings and tolerance values to the point where they work to produce generally correct colors for ~80% of the album covers I tested. A few edge cases occur when DominantColorsNew doesn't find two colors to return for the highlights (i.e. when the album cover is monochrome). My algorithm doesn't address these cases, but it would be trivial to duplicate iTunes' functionality: when the album yields less than two highlights, the title becomes white or black depending on the best contrast with the background. Then the songs become the one highlight color if there is one, or the title color faded into the background a bit.
More Examples
With the answer of @Seth-thompson and the comment of @bluedog, I built a little Objective-C (Cocoa-Touch) project to generate color schemes as a function of an image.
You can check the project at :
https://github.com/luisespinoza/LEColorPicker
For now, LEColorPicker is doing the following:
The image is scaled to 36x36 px (this reduces the compute time).
It generates a pixel array from the image.
It converts the pixel array to YUV space.
Colors are gathered as Seth Thompson's code does it.
The color sets are sorted by count.
The algorithm selects the three most dominant colors.
The most dominant color is assigned as the background.
The second and third most dominant colors are tested using the W3C color contrast formula, to check whether they have enough contrast with the background (see the sketch after this list).
If one of the text colors doesn't pass the test, it is assigned white or black, depending on the Y component.
That is it for now. I will be checking the ColorTunes project (https://github.com/Dannvix/ColorTunes) and the Wade Cosgrove project for new features. I also have some new ideas to improve the color scheme results.
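For reference, here is a small Python sketch of the W3C (WCAG) contrast-ratio check mentioned in the list above, assuming sRGB colors given as 0-255 triplets:
def relative_luminance(rgb):
    # WCAG relative luminance for an sRGB color given as (R, G, B) in 0-255
    def channel(c):
        c = c / 255.0
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (channel(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(rgb1, rgb2):
    # WCAG contrast ratio: from 1:1 (no contrast) up to 21:1 (black on white)
    lighter, darker = sorted((relative_luminance(rgb1), relative_luminance(rgb2)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)

# Example: normal text is usually required to reach at least 4.5:1 against the background
print(contrast_ratio((255, 255, 255), (30, 30, 30)))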
Wade Cosgrove of Panic wrote a nice blog post describing his implementation of an algorithm that approximates the one in iTunes. It includes a sample implementation in Objective-C.
You might also check out ColorTunes, which is an HTML implementation of the iTunes album view that uses the MMCQ (median cut color quantization) algorithm.
I just wrote a JS library implementing roughly the same algorithm as the one described by @Seth. It is freely available on github.com/arcanis/colibrijs, and on NPM as colibrijs.
With @Seth's answer I implemented the algorithm to get the dominant color in the two lateral borders of a picture using PHP and Imagick.
https://gist.github.com/philix/5688064#file-simpleimage-php-L81
It's being used to fill the background of cover photos in http://festea.com.br
I asked the same question in a different context and was pointed to http://charlesleifer.com/blog/using-python-and-k-means-to-find-the-dominant-colors-in-images/ for a learning algorithm (k-means) that roughly does the same thing using random starting points in the image. That way, the algorithm finds the dominant colors by itself.
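A minimal sketch of that k-means idea using OpenCV (the image path and k are placeholder choices), clustering all pixels into k colors and reporting the center of the largest cluster:
import cv2
import numpy as np

img = cv2.imread('cover.jpg')
pixels = img.reshape(-1, 3).astype(np.float32)

# Cluster all pixels into k colors
k = 3
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
_, labels, centers = cv2.kmeans(pixels, k, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)

# The center with the most assigned pixels is the dominant color (in BGR order)
counts = np.bincount(labels.flatten())
print(centers[np.argmax(counts)])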
