I have a binary image below:
it's an image of random abstract picture, and by using matlab, what I wanna do is to detect, how many peaks does it have so I'll know that there are roughly 5 objects in it.
As you can see, there are, 5 peaks in it, so it means there are 5 objects in it.
I've tried using imregionalmax(), but I don't find it usefull, since my image already in binary image. I also tried to use regionprops('Area'), but it shows wrong number since there is no exact whitespace between each object. Thanks in advance
An easy way to do this would be to simply sum across the rows for each column and find the peaks of the result using findpeaks. In the example below, I have opted to use the inverse of the image which will result in positive peaks where the columns are.
rowSum = sum(1 - image, 1);
If we plot this, it looks like the bottom plot
We can then use findpeaks to identify the peaks in this plot. We will apply a 5-point moving average to it to help eliminate false peaks.
[peaks, locations, widths, prominences] = findpeaks(smooth(rowSum));
You can then select the "true" peaks by thresholding based on any of these outputs. For this example we can use prominences and find the more prominent peaks.
isPeak = prominences > 50;
nPeaks = sum(isPeak)
5
Then we can plot the peaks locations to confirm
plot(locations(isPeak), peaks(isPeak), 'r*');
If you have some prior knowledge about the expected widths of the peaks, you could adjust the smooth span to match this expected width and obtain some cleaner peaks when using findpeaks.
Using an expected width of 40 for your image, findpeaks was able to perfectly detect all 5 peaks with no false positive.
findpeaks(smooth(rowSum, 40));
As your they are peaks, they are vertical structures. So in this particular case, you case use projection histograms (also know as histogram projection function): you make all the black pixels fall as if they were effected by gravity. Then you will find a curve of black pixels on the bottom of your image. Then you can count the number of peaks.
Here is the algorithm:
Invert the image (black is normally the absence of information)
Histogram projection
Closing and opening in order to clean the signal and get the final result.
You can add a maxima detection to get the top of the peaks.
I'm building a photographic film scanner. The electronic hardware is done now I have to finish the mechanical advance mechanism then I'm almost done.
I'm using a line scan sensor so it's one pixel width by 2000 height. The data stream I will be sending to the PC over USB with a FTDI FIFO bridge will be just 1 byte values of the pixels. The scanner will pull through an entire strip of 36 frames so I will end up scanning the entire strip. For the beginning I'm willing to manually split them up in Photoshop but I would like to implement something in my program to do this for me. I'm using C++ in VS. So, basically I need to find a way for the PC to detect the near black strips in between the images on the film, isolate the images and save them as individual files.
Could someone give me some advice for this?
That sounds pretty simple compared to the things you've already implemented; you could
calculate an average pixel value per row, and call the resulting signal s(n) (n being the row number).
set a threshold for s(n), setting everything below that threshold to 0 and everything above to 1
Assuming you don't know the exact pixel height of the black bars and the negatives, search for periodicities in s(n). What I describe in the following is total overkill, but that's how I roll:
use FFTw to calculate a discrete fourier transform of s(n), call it S(f) (f being the frequency, i.e. 1/period).
find argmax(abs(S(f))); that f represents the distance between two black bars: number of rows / f is the bar distance.
S(f) is complex, and thus has an argument; arctan(imag(S(f_max))/real(S(f_max)))*number of rows will give you the position of the bars.
To calculate the width of the bars, you could do the same with the second highest peak of abs(S(f)), but it'll probably be easier to just count the average length of 0 around the calculated center positions of the black bars.
To get the exact width of the image strip, only take the pixels in which the image border may lie: r_left(x) would be the signal representing the few pixels in which the actual image might border to the filmstrip material, x being the coordinate along that row). Now, use a simplistic high pass filter (e.g. f(x):= r_left(x)-r_left(x-1)) to find the sharpest edge in that region (argmax(abs(f(x)))). Use the average of these edges as the border location.
By the way, if you want to write a source block that takes your scanned image as input and outputs a stream of pixel row vectors, using GNU Radio would offer you a nice method of having a flow graph of connected signal processing blocks that does exactly what you want, without you having to care about getting data from A to B.
I forgot to add: Use the resulting coordinates with something like openCV, or any other library capable of reading images and specifying sub-images by coordinates as well as saving to new images.
I have an image (logical values), like this
I need to get this image resampled from pixel to mm or cm; this is the code I use to get the resampling:
function [ Ires ] = imresample3( I, pixDim )
[r,c]=size(I);
x=1:1:c;
y=1:1:r;
[X,Y]=meshgrid(x,y);
rn=r*pixDim;
cn=c*pixDim;
xNew=1:pixDim:cn;
yNew=1:pixDim:rn;
[Xnew,Ynew]=meshgrid(xNew,yNew);
Id=double(I);
Ires=interp2(X,Y,Id,Xnew,Ynew);
end
What I get is a black image. I suspect that this code does something that is not what I have in mind: it seems to take only the upper-left part of the image.
What I want is, instead, to have the same image on a mm/cm scale: what I expect is that every white pixel should be mapped from the original position to the new position (in mm/cm); what happen is certainly not what I expect.
I'm not sure that interp2 is the right command to use.
I don't want to resize the image, I just want to go from pixel world to mm/cm world.
pixDim is of course the dimension of the image pixel, obtained dividing the height of the ear in cm by the height of the ear in mm (and it is on average 0.019 cm).
Any ideas?
EDIT: I was quite sure that the code had no sense, but someone told me to do that way...anyway, if I have two edged ears, I need first to scale both the the real dimension and then perform some operations on them. What I mean with "real dimension" is that if one has size 6.5x3.5cm and the other has size 6x3.2cm, I need to perform operations on this dimensions.
I don't get how can I move from the pixel dimension to cm dimension BEFORE doing operation.
I want to move from one world to the other because I want to get rid of the capturing distance (because I suppose that if a picture of the ear is taken near and the other is taken far, they should have different size in pixel dimension).
Am I correct? There is a way to do it? I thought I can plot the ear scaling the axis, but then I suppose I cannot subtract one from the other, right?
Matlab does not use units. To apply your factor of 0.019cm/pixel you have to scale by a factor of 0.019 to have a 1cm grid, but this would cause any artefact below a size of 1cm to be lost.
Best practice is to display the data using multiple axis, one for cm and one for pixels. It's explained here: http://www.mathworks.de/de/help/matlab/creating_plots/using-multiple-x-and-y-axes.html
Any function processing the data should be independent of the scale or use the scale factor as an input argument, everything else is a sign of some serious algorithmic issues.
For a thumbnail-engine I would like to develop an algorithm that takes x random thumbnails (crop, no resize) from an image, analyzes them for contrast and chooses the one with the highest contrast. I'm working with PHP and Imagick but I would be glad for some general tips about how to compute contrast of imagery.
It seems that many things are easier than computing contrast, for example counting colors, computing luminosity,etc.
What are your experiences with the analysis of picture material?
I'd do it that way (pseudocode):
L[256] = {0,0,0...}
loop over each pixel:
luminance = avg(R,G,B)
increment L[luminance] by 1
for i = 0 to 255:
if L[i] < C: L[i] = 0 // C = threshold of your chose
find index of first and last non-zero value of L[]
contrast = last - first
In looking for the image "with the highest contrast," you will need to be very careful in how you define contrast for the image. In the simplest way, contrast is the difference between the lowest intensity and the highest intensity in the image. That is not going to be very useful in your case.
I suggest you use a histogram approach to describe the contrast of a given image and then compare the properties of the histograms to determine the image with the highest contrast as you define it. You could use a variety of well known containers to represent the histogram in code, or construct a class to meet your specific needs. (I am not implying that you need to create a histogram in the form of a chart – just a statistical representation of the intensity values.) You could use the variance of each histogram directly as a measure of contrast, or use the standard deviation if that is easier to work with.
The key really lies in how you define the contrast of the image. In general, I would define a high contrast image as one with values present for all, or nearly all, the possible values. And I would further add that in this definition of a high contrast image, the intensity values of the image will tend to be distributed across the range of possible values in a uniform way.
Using this approach, a low contrast image would tend to have relatively few discrete intensity values and they would tend to be closely grouped together rather than uniformly distributed. (As a general rule, they will also tend to be grouped toward the center of the range.)
I'm looking to create a base table of images and then compare any new images against that to determine if the new image is an exact (or close) duplicate of the base.
For example: if you want to reduce storage of the same image 100's of times, you could store one copy of it and provide reference links to it. When a new image is entered you want to compare to an existing image to make sure it's not a duplicate ... ideas?
One idea of mine was to reduce to a small thumbnail and then randomly pick 100 pixel locations and compare.
Below are three approaches to solving this problem (and there are many others).
The first is a standard approach in computer vision, keypoint matching. This may require some background knowledge to implement, and can be slow.
The second method uses only elementary image processing, and is potentially faster than the first approach, and is straightforward to implement. However, what it gains in understandability, it lacks in robustness -- matching fails on scaled, rotated, or discolored images.
The third method is both fast and robust, but is potentially the hardest to implement.
Keypoint Matching
Better than picking 100 random points is picking 100 important points. Certain parts of an image have more information than others (particularly at edges and corners), and these are the ones you'll want to use for smart image matching. Google "keypoint extraction" and "keypoint matching" and you'll find quite a few academic papers on the subject. These days, SIFT keypoints are arguably the most popular, since they can match images under different scales, rotations, and lighting. Some SIFT implementations can be found here.
One downside to keypoint matching is the running time of a naive implementation: O(n^2m), where n is the number of keypoints in each image, and m is the number of images in the database. Some clever algorithms might find the closest match faster, like quadtrees or binary space partitioning.
Alternative solution: Histogram method
Another less robust but potentially faster solution is to build feature histograms for each image, and choose the image with the histogram closest to the input image's histogram. I implemented this as an undergrad, and we used 3 color histograms (red, green, and blue), and two texture histograms, direction and scale. I'll give the details below, but I should note that this only worked well for matching images VERY similar to the database images. Re-scaled, rotated, or discolored images can fail with this method, but small changes like cropping won't break the algorithm
Computing the color histograms is straightforward -- just pick the range for your histogram buckets, and for each range, tally the number of pixels with a color in that range. For example, consider the "green" histogram, and suppose we choose 4 buckets for our histogram: 0-63, 64-127, 128-191, and 192-255. Then for each pixel, we look at the green value, and add a tally to the appropriate bucket. When we're done tallying, we divide each bucket total by the number of pixels in the entire image to get a normalized histogram for the green channel.
For the texture direction histogram, we started by performing edge detection on the image. Each edge point has a normal vector pointing in the direction perpendicular to the edge. We quantized the normal vector's angle into one of 6 buckets between 0 and PI (since edges have 180-degree symmetry, we converted angles between -PI and 0 to be between 0 and PI). After tallying up the number of edge points in each direction, we have an un-normalized histogram representing texture direction, which we normalized by dividing each bucket by the total number of edge points in the image.
To compute the texture scale histogram, for each edge point, we measured the distance to the next-closest edge point with the same direction. For example, if edge point A has a direction of 45 degrees, the algorithm walks in that direction until it finds another edge point with a direction of 45 degrees (or within a reasonable deviation). After computing this distance for each edge point, we dump those values into a histogram and normalize it by dividing by the total number of edge points.
Now you have 5 histograms for each image. To compare two images, you take the absolute value of the difference between each histogram bucket, and then sum these values. For example, to compare images A and B, we would compute
|A.green_histogram.bucket_1 - B.green_histogram.bucket_1|
for each bucket in the green histogram, and repeat for the other histograms, and then sum up all the results. The smaller the result, the better the match. Repeat for all images in the database, and the match with the smallest result wins. You'd probably want to have a threshold, above which the algorithm concludes that no match was found.
Third Choice - Keypoints + Decision Trees
A third approach that is probably much faster than the other two is using semantic texton forests (PDF). This involves extracting simple keypoints and using a collection decision trees to classify the image. This is faster than simple SIFT keypoint matching, because it avoids the costly matching process, and keypoints are much simpler than SIFT, so keypoint extraction is much faster. However, it preserves the SIFT method's invariance to rotation, scale, and lighting, an important feature that the histogram method lacked.
Update:
My mistake -- the Semantic Texton Forests paper isn't specifically about image matching, but rather region labeling. The original paper that does matching is this one: Keypoint Recognition using Randomized Trees. Also, the papers below continue to develop the ideas and represent the state of the art (c. 2010):
Fast Keypoint Recognition using Random Ferns - faster and more scalable than Lepetit 06
BRIEF: Binary Robust Independent Elementary Features - less robust but very fast -- I think the goal here is real-time matching on smart phones and other handhelds
The best method I know of is to use a Perceptual Hash. There appears to be a good open source implementation of such a hash available at:
http://phash.org/
The main idea is that each image is reduced down to a small hash code or 'fingerprint' by identifying salient features in the original image file and hashing a compact representation of those features (rather than hashing the image data directly). This means that the false positives rate is much reduced over a simplistic approach such as reducing images down to a tiny thumbprint sized image and comparing thumbprints.
phash offers several types of hash and can be used for images, audio or video.
This post was the starting point of my solution, lots of good ideas here so I though I would share my results. The main insight is that I've found a way to get around the slowness of keypoint-based image matching by exploiting the speed of phash.
For the general solution, it's best to employ several strategies. Each algorithm is best suited for certain types of image transformations and you can take advantage of that.
At the top, the fastest algorithms; at the bottom the slowest (though more accurate). You might skip the slow ones if a good match is found at the faster level.
file-hash based (md5,sha1,etc) for exact duplicates
perceptual hashing (phash) for rescaled images
feature-based (SIFT) for modified images
I am having very good results with phash. The accuracy is good for rescaled images. It is not good for (perceptually) modified images (cropped, rotated, mirrored, etc). To deal with the hashing speed we must employ a disk cache/database to maintain the hashes for the haystack.
The really nice thing about phash is that once you build your hash database (which for me is about 1000 images/sec), the searches can be very, very fast, in particular when you can hold the entire hash database in memory. This is fairly practical since a hash is only 8 bytes.
For example, if you have 1 million images it would require an array of 1 million 64-bit hash values (8 MB). On some CPUs this fits in the L2/L3 cache! In practical usage I have seen a corei7 compare at over 1 Giga-hamm/sec, it is only a question of memory bandwidth to the CPU. A 1 Billion-image database is practical on a 64-bit CPU (8GB RAM needed) and searches will not exceed 1 second!
For modified/cropped images it would seem a transform-invariant feature/keypoint detector like SIFT is the way to go. SIFT will produce good keypoints that will detect crop/rotate/mirror etc. However the descriptor compare is very slow compared to hamming distance used by phash. This is a major limitation. There are a lot of compares to do, since there are maximum IxJxK descriptor compares to lookup one image (I=num haystack images, J=target keypoints per haystack image, K=target keypoints per needle image).
To get around the speed issue, I tried using phash around each found keypoint, using the feature size/radius to determine the sub-rectangle. The trick to making this work well, is to grow/shrink the radius to generate different sub-rect levels (on the needle image). Typically the first level (unscaled) will match however often it takes a few more. I'm not 100% sure why this works, but I can imagine it enables features that are too small for phash to work (phash scales images down to 32x32).
Another issue is that SIFT will not distribute the keypoints optimally. If there is a section of the image with a lot of edges the keypoints will cluster there and you won't get any in another area. I am using the GridAdaptedFeatureDetector in OpenCV to improve the distribution. Not sure what grid size is best, I am using a small grid (1x3 or 3x1 depending on image orientation).
You probably want to scale all the haystack images (and needle) to a smaller size prior to feature detection (I use 210px along maximum dimension). This will reduce noise in the image (always a problem for computer vision algorithms), also will focus detector on more prominent features.
For images of people, you might try face detection and use it to determine the image size to scale to and the grid size (for example largest face scaled to be 100px). The feature detector accounts for multiple scale levels (using pyramids) but there is a limitation to how many levels it will use (this is tunable of course).
The keypoint detector is probably working best when it returns less than the number of features you wanted. For example, if you ask for 400 and get 300 back, that's good. If you get 400 back every time, probably some good features had to be left out.
The needle image can have less keypoints than the haystack images and still get good results. Adding more doesn't necessarily get you huge gains, for example with J=400 and K=40 my hit rate is about 92%. With J=400 and K=400 the hit rate only goes up to 96%.
We can take advantage of the extreme speed of the hamming function to solve scaling, rotation, mirroring etc. A multiple-pass technique can be used. On each iteration, transform the sub-rectangles, re-hash, and run the search function again.
My company has about 24million images come in from manufacturers every month. I was looking for a fast solution to ensure that the images we upload to our catalog are new images.
I want to say that I have searched the internet far and wide to attempt to find an ideal solution. I even developed my own edge detection algorithm.
I have evaluated speed and accuracy of multiple models.
My images, which have white backgrounds, work extremely well with phashing. Like redcalx said, I recommend phash or ahash. DO NOT use MD5 Hashing or anyother cryptographic hashes. Unless, you want only EXACT image matches. Any resizing or manipulation that occurs between images will yield a different hash.
For phash/ahash, Check this out: imagehash
I wanted to extend *redcalx'*s post by posting my code and my accuracy.
What I do:
from PIL import Image
from PIL import ImageFilter
import imagehash
img1=Image.open(r"C:\yourlocation")
img2=Image.open(r"C:\yourlocation")
if img1.width<img2.width:
img2=img2.resize((img1.width,img1.height))
else:
img1=img1.resize((img2.width,img2.height))
img1=img1.filter(ImageFilter.BoxBlur(radius=3))
img2=img2.filter(ImageFilter.BoxBlur(radius=3))
phashvalue=imagehash.phash(img1)-imagehash.phash(img2)
ahashvalue=imagehash.average_hash(img1)-imagehash.average_hash(img2)
totalaccuracy=phashvalue+ahashvalue
Here are some of my results:
item1 item2 totalsimilarity
desk1 desk1 3
desk1 phone1 22
chair1 desk1 17
phone1 chair1 34
Hope this helps!
As cartman pointed out, you can use any kind of hash value for finding exact duplicates.
One starting point for finding close images could be here. This is a tool used by CG companies to check if revamped images are still showing essentially the same scene.
I have an idea, which can work and it most likely to be very fast.
You can sub-sample an image to say 80x60 resolution or comparable,
and convert it to grey scale (after subsampling it will be faster).
Process both images you want to compare.
Then run normalised sum of squared differences between two images (the query image and each from the db),
or even better Normalised Cross Correlation, which gives response closer to 1, if
both images are similar.
Then if images are similar you can proceed to more sophisticated techniques
to verify that it is the same images.
Obviously this algorithm is linear in terms of number of images in your database
so even though it is going to be very fast up to 10000 images per second on the modern hardware.
If you need invariance to rotation, then a dominant gradient can be computed
for this small image, and then the whole coordinate system can be rotated to canonical
orientation, this though, will be slower. And no, there is no invariance to scale here.
If you want something more general or using big databases (million of images), then
you need to look into image retrieval theory (loads of papers appeared in the last 5 years).
There are some pointers in other answers. But It might be overkill, and the suggest histogram approach will do the job. Though I would think combination of many different
fast approaches will be even better.
I believe that dropping the size of the image down to an almost icon size, say 48x48, then converting to greyscale, then taking the difference between pixels, or Delta, should work well. Because we're comparing the change in pixel color, rather than the actual pixel color, it won't matter if the image is slightly lighter or darker. Large changes will matter since pixels getting too light/dark will be lost. You can apply this across one row, or as many as you like to increase the accuracy. At most you'd have 47x47=2,209 subtractions to make in order to form a comparable Key.
Picking 100 random points could mean that similar (or occasionally even dissimilar) images would be marked as the same, which I assume is not what you want. MD5 hashes wouldn't work if the images were different formats (png, jpeg, etc), had different sizes, or had different metadata. Reducing all images to a smaller size is a good bet, doing a pixel-for- pixel comparison shouldn't take too long as long as you're using a good image library / fast language, and the size is small enough.
You could try making them tiny, then if they are the same perform another comparison on a larger size - could be a good combination of speed and accuracy...
What we loosely refer to as duplicates can be difficult for algorithms to discern.
Your duplicates can be either:
Exact Duplicates
Near-exact Duplicates. (minor edits of image etc)
perceptual Duplicates (same content, but different view, camera etc)
No1 & 2 are easier to solve. No 3. is very subjective and still a research topic.
I can offer a solution for No1 & 2.
Both solutions use the excellent image hash- hashing library: https://github.com/JohannesBuchner/imagehash
Exact duplicates
Exact duplicates can be found using a perceptual hashing measure.
The phash library is quite good at this. I routinely use it to clean
training data.
Usage (from github site) is as simple as:
from PIL import Image
import imagehash
# image_fns : List of training image files
img_hashes = {}
for img_fn in sorted(image_fns):
hash = imagehash.average_hash(Image.open(image_fn))
if hash in img_hashes:
print( '{} duplicate of {}'.format(image_fn, img_hashes[hash]) )
else:
img_hashes[hash] = image_fn
Near-Exact Duplicates
In this case you will have to set a threshold and compare the hash values for their distance from each
other. This has to be done by trial-and-error for your image content.
from PIL import Image
import imagehash
# image_fns : List of training image files
img_hashes = {}
epsilon = 50
for img_fn1, img_fn2 in zip(image_fns, image_fns[::-1]):
if image_fn1 == image_fn2:
continue
hash1 = imagehash.average_hash(Image.open(image_fn1))
hash2 = imagehash.average_hash(Image.open(image_fn2))
if hash1 - hash2 < epsilon:
print( '{} is near duplicate of {}'.format(image_fn1, image_fn2) )
If you have a large number of images, look into a Bloom filter, which uses multiple hashes for a probablistic but efficient result. If the number of images is not huge, then a cryptographic hash like md5 should be sufficient.
I think it's worth adding to this a phash solution I built that we've been using for a while now: Image::PHash. It is a Perl module, but the main parts are in C. It is several times faster than phash.org and has a few extra features for DCT-based phashes.
We had dozens of millions of images already indexed on a MySQL database, so I wanted something fast and also a way to use MySQL indices (which don't work with hamming distance), which led me to use "reduced" hashes for direct matches, the module doc discusses this.
It's quite simple to use:
use Image::PHash;
my $iph1 = Image::PHash->new('file1.jpg');
my $p1 = $iph1->pHash();
my $iph2 = Image::PHash->new('file2.jpg');
my $p2 = $iph2->pHash();
my $diff = Image::PHash::diff($p1, $p2);
I made a very simple solution in PHP for comparing images several years ago. It calculates a simple hash for each image, and then finds the difference. It works very nice for cropped or cropped with translation versions of the same image.
First I resize the image to a small size, like 24x24 or 36x36. Then I take each column of pixels and find average R,G,B values for this column.
After each column has its own three numbers, I do two passes: first on odd columns and second on even ones. The first pass sums all the processed cols and then divides by their number ( [1] + [2] + [5] + [N-1] / (N/2) ). The second pass works in another manner: ( [3] - [4] + [6] - [8] ... / (N/2) ).
So now I have two numbers. As I found out experimenting, the first one is a major one: if it's far from the values of another image, they are not similar from the human point of view at all.
So, the first one represents the average brightness of the image (again, you can pay most attention to green channel, then the red one, etc, but the default R->G->B order works just fine). The second number can be compared if the first two are very close, and it in fact represents the overall contrast of the image: if we have some black/white pattern or any contrast scene (lighted buildings in the city at night, for example) and if we are lucky, we will get huge numbers here if out positive members of sum are mostly bright, and negative ones are mostly dark, or vice versa. As I want my values to be always positive, I divide by 2 and shift by 127 here.
I wrote the code in PHP in 2017, and seems I lost the code. But I still have the screenshots:
The same image:
Black & White version:
Cropped version:
Another image, ranslated version:
Same color gamut as 4th, but another scene:
I tuned the difference thresholds so that the results are really nice. But as you can see, this simple algorithm cannot do anything good with simple scene translations.
On a side note I can notice that a modification can be written to make cropped copies from each of two images at 75-80 percent, 4 at the corners or 8 at the corners and middles of the edges, and then by comparing the cropped variants with another whole image just the same way; and if one of them gets a significantly better similarity score, then use its value instead of the default one).