Downscaling images using Bilinear and Bicubic algorithms - image

I'm trying to understand how simple algos for resizing images work.
I've implemented Bilinear and Bicubic interpolation methods and they both work perfectly when used to upscale images. However if I'm using the same methods for downscaling(let's say 0.25x of original one) the image starts to drop tiny details.
For example(wires arent a straight lines anymore):
I understand what causes this effect.
So my question is how it can be fixed? I mean if I'm using the same algos in different software the final image doesn't have that kind of distortion at this type of details(in my case - wires).
The only solution I've found so far is to downscale image step by step at rate not more than 10-20% of its original size per step, but it's of course much slower.
Thanks for your attention.

Interestingly, few textbooks on image processing warn about this: for downsampling, interpolation is both overkill and inappropriate. Inappropriate because by the Shannon theorem, you shouldn't sample below the Nyquist frequency, or you get nasty aliasing effects. Overkill because subpixel accuracy is useless when the pixels are tiny.
There are two ways to obtain correct results:
lowpass filtering (blur) followed by decimation,
averaging of blocks of pixels.
The latter is more efficient from a computational point of view, but a little less flexible.

Related

How to remove a scratch from an image using matlab

Let's say I have this image this:
With a black scratch and I want to remove it from my image. I know it is noise. I have tried neighbourhood filter and also gaussian filter but no success.
If you know the location of the scratch, this problem is known as inpainting, and there are very sophisticated algorithms for that. So one approach would be to detect the scratch as good as you can, then use a standard inpainting algorithm on it. I've played with your image in Mathematica a little:
First I applied a median filter to the image. As you found out yourself, this removes the scratch, but also removes a lot of detail. The difference between median and original image is a good indicator for your scratch, though:
When I binarize this image with a manually selected threshold, I get a quick&dirty scratch detector:
If you have more knowledge about what your scratches look like, you can improve this detector a lot. e.g. are the scratches always dark? Do they always have high contrast? Are they always smooth curves, i.e. is their curvature always low? - Each of these properties can be measured somehow, so you'd combine these measurements to a single image and binarize that.
One small improvement is to remove small components:
This is still not perfect, but the result is good enough to use it as an inpainting mask:
This will remove some detail, too, but the differences are harder to spot.
Full Mathematica code:
difference = ImageDifference[sourceImage, MedianFilter[sourceImage, 2]];
mask = DeleteSmallComponents[Binarize[difference, 0.15], 15];
Inpaint[sourceImage, mask]
EDIT:
If you're don't have access to a standard inpainting algorithm (like Navier Stokes or Telea), a poor man's algorithm would be to use the median filtered image in those regions where the mask is 1 (probably something like mask*sourceImage + (1-mask)*medialFilteredImage in Matlab). Depending on the image data, the difference might not be worth the extra effort of a "real" inpainting algorithm:
A filter for Avisynth and a plugin for VirtualDub (my two favourite video editing tools). It will hardly get better than these two (You can learn from them if you really need to implement it yourself).
My result using median filter with ImageJ

Algorithm for culling pixels in a graphical data view?

I'm writing a wxpython widget which shows the state of several objects over time (x cycles). Right now I have it working with 1 pixel/cycle and zooming in and back out to 1:1; but I would like to allow zooming out. I wanted to see if there are any go-to algorithms for thowing away/combining data before I started rolling my own using only my own feeble heuristics. Is there any such algo, or should I just start coding my own solution?
Depends a lot on what type of images you're resizing. See The myth of infinite detail: Bilinear vs. Bicubic and Better Image Resizing by our very own Jeff! There you can compare results of naive nearest neighbor, bilinear filtering, bicubic filtering, bicubic sharper and genuine fractals.
Jeff's conclusion:
Reducing images is a completely safe
and rational operation. You're simply
reducing precision and resolution by
discarding information. Make the image
as small as you want, and you have
complete fidelity-- within the bounds
of the number of pixels you've
a> llowed. You'll get good results no
matter which algorithm you pick.
(Well, unless you pick the nave Pixel
Resize or Nearest Neighbor
algorithms.)
Enlarging images is risky. Beyond a
certain point, enlarging images is a
fool's errand; you can't magically
s> ynthesize an infinite number of new
pixels out of thin air. And
interpolated pixels are never as good
as real pixels. That's why it's more
than a little artificial to upsize the
512x512 Lena image by 500%. It'd be
smarter to find a higher resolution
scan or picture of whatever you need*
than it would be to upsize it in
software.
But when you can't avoid enlarging an
image, that's when it pays to know the
tradeoffs between bicubic, bilinear,
and more advanced resizing algorithms.
At least arm yourself with enough
knowledge to pick the best of the bad
options you have.

How do I choose an image interpolation method? (Emgu/OpenCV)

The image resizing function provided by Emgu (a .net wrapper for OpenCV) can use any one of four interpolation methods:
CV_INTER_NN (default)
CV_INTER_LINEAR
CV_INTER_CUBIC
CV_INTER_AREA
I roughly understand linear interpolation, but can only guess what cubic or area do. I suspect NN stands for nearest neighbour, but I could be wrong.
The reason I'm resizing an image is to reduce the amount of pixels (they will be iterated over at some point) whilst keeping them representative. I mention this because it seems to me that interpolation is central to this purpose - getting the right type ought therefore be quite important.
My question then, is what are the pros and cons of each interpolation method? How do they differ and which one should I use?
Nearest neighbor will be as fast as possible, but you will lose substantial information when resizing.
Linear interpolation is less fast, but will not result in information loss unless you're shrinking the image (which you are).
Cubic interpolation (probably actually "Bicubic") uses one of many possible formulas that incorporate multiple neighbor pixels. This is much better for shrinking images, but you are still limited as to how much shrinking you can do without information loss. Depending on the algorithm, you can probably reduce your images by 50% or 75%. The primary con of this approach is that it is much slower.
Not sure what "area" is - it may actually be "Bicubic". In all likelihood, this setting will give your best result (in terms of information loss / appearance), but at the cost of the longest processing time.
Update: this link gives more details (including a fifth type not included in your list):
http://docs.opencv.org/modules/imgproc/doc/geometric_transformations.html?highlight=resize#resize
The algorithms are: (descriptions are from the OpenCV documentation)
INTER_NEAREST - a nearest-neighbor interpolation
INTER_LINEAR - a bilinear interpolation (used by default)
INTER_AREA - resampling using pixel area relation. It may be a preferred method for image decimation, as it gives moire’-free results. But when the image is zoomed, it is similar to the INTER_NEAREST method.
INTER_CUBIC - a bicubic interpolation over 4x4 pixel neighborhood
INTER_LANCZOS4 - a Lanczos interpolation over 8x8 pixel neighborhood
If you want more speed use Nearest Neighbor method.
If you want to preserve quality of Image after downsampling, you can consider using INTER_AREA based interpolation, but again it depends on image content.
You can find detailed analysis of speed comparison here
Below is the speed comparison on 400*400 px image taken from the above link
The interpolation method to use depends on what you are trying to achieve:
CV_INTER_LINEAR or CV_INTER_CUBIC apply a lowpass filter (average) in order to achieve a trade-off between visual quality and edge removal (lowpass filters tend to remove edges in order to reduce aliasing in images). Between these two, i'd recommend you CV_INTER_CUBIC.
CV_INTER_NN method actually is Nearest neighbour, it's the most basic method and you'll get sharper edges (no lowpass filter will be applied). However this method simply is like "zooming" the image, no visual enhancement.
They all lose information, which you use depends on the speed you need, how much information you can afford to lose and the nature of your image.
Sorry there is no correct answer - that's why there is a choice

Where can I find a good read about bicubic interpolation and Lanczos resampling?

I want to implement the two above mentioned image resampling algorithms (bicubic and Lanczos) in C++. I know that there are dozens of existing implementations out there, but I still want to make my own. I want to make it partly because I want to understand how they work, and partly because I want to give them some capabilities not found in mainstream implementations (like configurable multi-CPU support and progress reporting).
I tried reading Wikipedia, but the stuff is a bit too dry for me. Perhaps there are some nicer explanations of these algorithms? I couldn't find anything either on SO or Google.
Added: Seems like nobody can give me a good link about these topics. Can anyone at least try to explain them here?
The basic operation principle of both algorithms is pretty simple. They're both convolution filters. A convolution filter that for each output value moves the convolution functions point of origin to be centered on the output and then multiplies all the values in the input with the value of the convolution function at that location and adds them together.
One property of convolution is that the integral of the output is the product of the integrals of the two input functions. If you consider the input and output images, then the integral means average brightness and if you want the brightness to remain the same the integral of the convolution function needs to add up to one.
One way how to understand them is to think of the convolution function as something that shows how much input pixels influence the output pixel depending on their distance.
Convolution functions are usually defined so that they are zero when the distance is larger than some value so that you don't have to consider every input value for every output value.
For lanczos interpolation the convolution function is based on the sinc(x) = sin(x*pi)/x function, but only the first few lobes are taken. Usually 3:
lanczos(x) = {
0 if abs(x) > 3,
1 if x == 0,
else sin(x*pi)/x
}
This function is called the filter kernel.
To resample with lanczos imagine you overlay the output and input over eachother, with points signifying where the pixel locations are. For each output pixel location you take a box +- 3 output pixels from that point. For every input pixel that lies in that box, calculate the value of the lanczos function at that location with the distance from the output location in output pixel coordinates as the parameter. You then need to normalize the calculated values by scaling them so that they add up to 1. After that multiply each input pixel value with the corresponding scaling value and add the results together to get the value of the output pixel.
Because lanzos function has the separability property and, if you are resizing, the grid is regular, you can optimize this by doing the convolution horizontally and vertically separately and precalculate the vertical filters for each row and horizontal filters for each column.
Bicubic convolution is basically the same, with a different filter kernel function.
To get more detail, there's a pretty good and thorough explanation in the book Digital Image Processing, section 16.3.
Also, image_operations.cc and convolver.cc in skia have a pretty well commented implementation of lanczos interpolation.
While what Ants Aasma says roughly describes the difference, I don't think it is particularly informative as to why you might do such a thing.
As far as links go, you are asking a very basic question in image processing, and any decent introductory textbook on the subject will describe this. If I remember correctly, Gonzales and Woods is decent on it, but I'm away from my books and can't check.
Now on to the particulars, it should help to think about what you are doing fundamentally. You have a square lattice of measurements that you want to interpolate new values for. In the simple case of upsampling, lets imagine you want a new measurement in between every one that you already have (e.g. double the resolution).
Now you won't get the "correct" value, because in general you don't have that information. So you have to estimate it. How to do this? A very simple way would be to linearly interpolate. Everyone knows how to do this with two points, you just draw a line between them, and read the new value off the line (in this case, at the half way point).
Now an image is two dimensional, so you really want to do this in both the left-right and up-down directions. Use the result for your estimate and voila you have "bilinear" interpolation.
The main problem with this is that it isn't very accurate, although it's better (and slower) than the "nearest neighbor" approach which is also very local and fast.
To address the first problem, you want something better than a linear fit of two points, you want to fit something to more data points (pixels), and something that can be nonlinear. A good trade off on accuracy and computational cost is something called a cubic spline. So this will give you a smooth fit line, and again you approximate your new "measurement" by the value it takes in the middle. Do this in both directions and you've got "bicubic" interpolation.
So that's more accurate, but still heavy. One way to address the speed issue is to use a convolution, which has the nice property that in the Fourier domain, it's just a multiplication, so we can implement it quite quickly. But you don't need to worry about the implementation to understand that the convolution result at any point is one function (your image) being integrated in product another, typically much smaller support (the part that is non-zero) function called the kernel), after that kernel has been centered over that particular point. In the discrete world, these are just sums of the products.
It turns out that you can design a convolution kernel that has properties quite like the cubic spline, and use that to get a fast "bicubic"
Lancsoz resampling is a similar thing, with slightly different properties in the kernel, which primarily means they will have different characteristic artifacts. You can look up the details of these kernel functions easily enough (I'm sure wikipedia has them, or any intro text). The implementations used in graphics programs tend to be highly optimized and sometimes have specialized assumptions which make them more efficient but less general.
I would like suggest the following article for a basic understanding of different image interpolation methods image interpolation via convolution. If you want to try more interpolation methods, the imageresampler is a nice open source project to begin with.
In my opinion image interpolation can be understood from two aspects, one is from function fitting perspective, and one is from convolution perspective. For example, the spline interpolation explained in image interpolation via convolution is well explained from function fitting perspective in Cubic interpolation.
Additionally, image interpolation is always related to a specific application, for example image zooming, image rotation and so on. In fact for a specific application, image interpolation can be implemented i.n a smart way. For example, image rotation can be implemented via a three-shearing method, and during each shearing operation different one-dimension interpolation algorithms can be implemented.

Detecting if two images are visually identical

Sometimes two image files may be different on a file level, but a human would consider them perceptively identical. Given that, now suppose you have a huge database of images, and you wish to know if a human would think some image X is present in the database or not. If all images had a perceptive hash / fingerprint, then one could hash image X and it would be a simple matter to see if it is in the database or not.
I know there is research around this issue, and some algorithms exist, but is there any tool, like a UNIX command line tool or a library I could use to compute such a hash without implementing some algorithm from scratch?
edit: relevant code from findimagedupes, using ImageMagick
try $image->Sample("160x160!");
try $image->Modulate(saturation=>-100);
try $image->Blur(radius=>3,sigma=>99);
try $image->Normalize();
try $image->Equalize();
try $image->Sample("16x16");
try $image->Threshold();
try $image->Set(magick=>'mono');
($blob) = $image->ImageToBlob();
edit: Warning! ImageMagick $image object seems to contain information about the creation time of an image file that was read in. This means that the blob you get will be different even for the same image, if it was retrieved at a different time. To make sure the fingerprint stays the same, use $image->getImageSignature() as the last step.
findimagedupes is pretty good. You can run "findimagedupes -v fingerprint images" to let it print "perceptive hash", for example.
Cross-correlation or phase correlation will tell you if the images are the same, even with noise, degradation, and horizontal or vertical offsets. Using the FFT-based methods will make it much faster than the algorithm described in the question.
The usual algorithm doesn't work for images that are not the same scale or rotation, though. You could pre-rotate or pre-scale them, but that's really processor intensive. Apparently you can also do the correlation in a log-polar space and it will be invariant to rotation, translation, and scale, but I don't know the details well enough to explain that.
MATLAB example: Registering an Image Using Normalized Cross-Correlation
Wikipedia calls this "phase correlation" and also describes making it scale- and rotation-invariant:
The method can be extended to determine rotation and scaling differences between two images by first converting the images to log-polar coordinates. Due to properties of the Fourier transform, the rotation and scaling parameters can be determined in a manner invariant to translation.
Colour histogram is good for the same image that has been resized, resampled etc.
If you want to match different people's photos of the same landmark it's trickier - look at haar classifiers. Opencv is a great free library for image processing.
I don't know the algorithm behind it, but Microsoft Live Image Search just added this capability. Picasa also has the ability to identify faces in images, and groups faces that look similar. Most of the time, it's the same person.
Some machine learning technology like a support vector machine, neural network, naive Bayes classifier or Bayesian network would be best at this type of problem. I've written one each of the first three to classify handwritten digits, which is essentially image pattern recognition.
resize the image to a 1x1 pixle... if they are exact, there is a small probability they are the same picture...
now resize it to a 2x2 pixle image, if all 4 pixles are exact, there is a larger probability they are exact...
then 3x3, if all 9 pixles are exact... good chance etc.
then 4x4, if all 16 pixles are exact,... better chance.
etc...
doing it this way, you can make efficiency improvments... if the 1x1 pixel grid is off by a lot, why bother checking 2x2 grid? etc.
If you have lots of images, a color histogram could be used to get rough closeness of images before doing a full image comparison of each image against each other one (i.e. O(n^2)).
There is DPEG, "The" Duplicate Media Manager, but its code is not open. It's a very old tool - I remember using it in 2003.
You could use diff to see if they are REALLY different.. I guess it will remove lots of useless comparison. Then, for the algorithm, I would use a probabilistic approach.. what are the chances that they look the same.. I'd based that on the amount of rgb in each pixel. You could also find some other metrics such as luminosity and stuff like that.

Resources