So here is my problem:
I have an image, that image is large (high resolution) and it needs to be small (much lower resolution).
So I do the naive thing (kill every other pixel) and the result looks poor.
So I try to do something more intelligent (low pass filtering using a Fourier transform and re-sampling in Fourier space) and the result is a little better but still fairly poor.
So my question, is there a perceptually motivated image down-sampling algorithm (or implementation)?
edit:
While I am aware of a number of resampling techniques, my application is more concerned with preserving the perceptual features, rather than producing smooth images.
edit2: it is safe to assume I have some level of familiarity with digital signal processing, convolutions, wavelet transforms, etc
Read this:
http://www.dspguide.com/
OK, that's quite a read. But understanding filter design would be handy.
In general, the process for scaling an image from W1 x H1 to W2 x H2 where W1, W2, H1, H2 are integers, is to find new W3, H3 so that W1 and W2 are integer factors of W3 and H1 and H2 are integer factors of H3, and then pad the original image with zeros (used to space the pixels of the original image) so that it's now W3 x H3 in size. This introduces high frequencies due to discontinuities in the image, so you apply a low-pass filter to the image, and then decimate the filtered image to its new size (W2 x H2). Sounds like you might be trying to do this already, but the filtering can be done in the time domain so that the Fourier transform isn't really necessary.
In practice, the process I just described is optimized (you'll note that when applying a convolution filter to the upscaled image most of the terms will be 0, so you can avoid most of the multiplication operations in your algorithm, for example. And since you end up throwing away many of the filtered results, you don't need to calculate those, so you end up with a handful of multiplications and additions for each pixel in the target image, basically. The trick is to figure out which coefficients to use.)
libswscale in the ffmpeg project does something like this, I believe. Check it out:
http://gitorious.org/libswscale
As others pointed out, (and you apparently noticed) decimating the image introduces aliasing artifacts. I can't be sure about your resampling implementation, but the technique has interesting gotchas depending on the window size you use and other implementation details.
Bicubic interpolation is generally regarded as good enough, but there is no perfect solution, it depends on people and on the properties of the picture being resampled.
Related links:
I didn't even know that sharpness was also called acutance.
Aliasing is a problem that can occur when downsampling naively.
Pascal is right. Depends on the image, and on what you want. Some factors:
preserving sharp edges
preserving colours
algorithm speed
This is your method.
Some others:
Lanczos resampling
Bicubic interpolation
Spline interpolation
Note that sometimes resampling down can get you a sharper result than, say, using a lower resolution camera, because there will be edges in the high-resolution image that cannot be detected by a lower-res device.
Side note: Many algorithms (especially Nearest Neighbour) can be optimised if you are scaling down by an integer (e.g. dividing by 4 or 6).
Recommended ImageMagick "general purpose" downsampling methods are discussed here: http://www.imagemagick.org/Usage/filter/nicolas/#downsample
You could try a content aware resizing algorithm. See: http://www.seamcarving.com/
Paint Mono (an OS fork of Paint.NET) implements Supersampling algorithm for image downsampling here: http://code.google.com/p/paint-mono/source/browse/trunk/src/PdnLib/Surface.cs?spec=svn59&r=59#1313
Related
An image is given to us that has been corrupted by:
Gaussian blur
Gaussian noise
Motion blur
in that order. The parameters of all the above (filter size, variance, SNR, etc) are known to us.
How can we restore the image?
I have tried to compute the aggregate degradation function by convolving the above and then used the Weiner filter to restore, but the attempts have failed so far, since the blur still remains.
Could anyone please shed some light?
For Gaussian and motion blur, it is a matter of deducing the convolution kernel. Once it is known, deconvolution can be done in Fourier space. The Fourier transform of the image, divided by the Fourier transform of the kernel, gives the Fourier transform of a (hopefully) improved image.
Gaussians transform into other Gaussians, so there is no problem with divide by zero. But Gaussians do fall of rather fast, as exp(-x^2), so you'd be dividing by small numbers to obtain large whacky high frequency amplitudes. So, some sort constant bias or other way of keeping the FT of the kernal from getting small must be applied. That's where the Wiener filter comes in. The bias is usually chosen in relation to random noise levels, or quantization.
For motion blur, a typical case is when the clean image is convolved with a short line segment. Unfortunately, sharply cut-off line segments have plenty of zeros. Again, Wiener filter to the rescue.
Additive Guassian noise cannot be removed, but can be averaged out. The simplest quickest way is to blur the image with Gaussian, box, or other filter. Biggest problem with that - you end up with a blurred image! Median filters are somewhat better at preserving edges and details if not too small. There are many noise reduction techniques out there.
Sometimes noise reduction is easy for certain types of images. For Cassini imaging work, most image features were either high-contrast hard edges (planet edges, craters), or softly varying (cloud details in atmospheres) so I used an edge detector, fattened (dilated) its output, blurred it, and used that as a mask to protect parts of the image from a small-radius blur filter. Applying different filters.
There's Signal Processing Stack Exchange site (in beta for now) which may have questions and answers about restoring corrupted images. https://dsp.stackexchange.com/questions
I have been thinking about this for quite some time, but never really performed detailed analysis on this. Does the foreground segmentation using GrabCut[1] algorithm depend on the size of the input image? Intuitively, it appears to me that since grabcut is based on color models, color distributions should not change as the size of the image changes, but [aliasing] artifacts in smaller images might play a role.
Any thoughts or existing experiments on the dependence of size of the image on image segmentation using grabcut would be highly appreciated.
Thanks
[1] C. Rother, V. Kolmogorov, and A. Blake, GrabCut: Interactive foreground extraction using iterated graph cuts, ACM Trans. Graph., vol. 23, pp. 309–314, 2004.
Size matters.
The objective function of GrabCut balances two terms:
The unary term that measures the per-pixel fit to the foreground/background color model.
The smoothness term (pair-wise term) that measures the "complexity" of the segmentation boundary.
The first term (unary) scales with the area of the foreground while the second (smoothness) scales with the perimeter of the foreground.
So, if you scale your image by a x2 factor you increase the area by x4 while the perimeter scales only roughly by a x2 factor.
Therefore, if you tuned (or learned) the parameters of the energy function for a specific image size / scale, these parameters may not work for you in different image sizes.
PS
Did you know that Office 2010 "foreground selection tool" is based on GrabCut algorithm?
Here's a PDF of the GrabCut paper, courtesy of Microsoft Research.
The two main effects of image size will be run time and the scale of details in the image which may be considered significant. Of these two, run time is the one which will bite you with GrabCut - graph cutting methods are already rather slow, and GrabCut uses them iteratively.
It's very common to start by downsampling the image to a smaller resolution, often in combination with a low-pass filter (i.e. you sample the source image with a Gaussian kernel). This significantly reduces the n which the algorithm runs over while reducing the effect of small details and noise on the result.
You can also use masking to restrict processing to only specific portions of the image. You're already getting some of this in GrabCut as the initial "grab" or selection stage, and again later during the brush-based refinement stage. This stage also gives you some implicit information about scale, i.e. the feature of interest is probably filling most of the selection region.
Recommendation:
Display the image at whatever scale is convenient and downsample the selected region to roughly the n = 100k to 200k range per their example. If you need to improve the result quality, use the result of the initial stage as the starting point for a following iteration at higher resolution.
I'm writing a wxpython widget which shows the state of several objects over time (x cycles). Right now I have it working with 1 pixel/cycle and zooming in and back out to 1:1; but I would like to allow zooming out. I wanted to see if there are any go-to algorithms for thowing away/combining data before I started rolling my own using only my own feeble heuristics. Is there any such algo, or should I just start coding my own solution?
Depends a lot on what type of images you're resizing. See The myth of infinite detail: Bilinear vs. Bicubic and Better Image Resizing by our very own Jeff! There you can compare results of naive nearest neighbor, bilinear filtering, bicubic filtering, bicubic sharper and genuine fractals.
Jeff's conclusion:
Reducing images is a completely safe
and rational operation. You're simply
reducing precision and resolution by
discarding information. Make the image
as small as you want, and you have
complete fidelity-- within the bounds
of the number of pixels you've
a> llowed. You'll get good results no
matter which algorithm you pick.
(Well, unless you pick the nave Pixel
Resize or Nearest Neighbor
algorithms.)
Enlarging images is risky. Beyond a
certain point, enlarging images is a
fool's errand; you can't magically
s> ynthesize an infinite number of new
pixels out of thin air. And
interpolated pixels are never as good
as real pixels. That's why it's more
than a little artificial to upsize the
512x512 Lena image by 500%. It'd be
smarter to find a higher resolution
scan or picture of whatever you need*
than it would be to upsize it in
software.
But when you can't avoid enlarging an
image, that's when it pays to know the
tradeoffs between bicubic, bilinear,
and more advanced resizing algorithms.
At least arm yourself with enough
knowledge to pick the best of the bad
options you have.
The image resizing function provided by Emgu (a .net wrapper for OpenCV) can use any one of four interpolation methods:
CV_INTER_NN (default)
CV_INTER_LINEAR
CV_INTER_CUBIC
CV_INTER_AREA
I roughly understand linear interpolation, but can only guess what cubic or area do. I suspect NN stands for nearest neighbour, but I could be wrong.
The reason I'm resizing an image is to reduce the amount of pixels (they will be iterated over at some point) whilst keeping them representative. I mention this because it seems to me that interpolation is central to this purpose - getting the right type ought therefore be quite important.
My question then, is what are the pros and cons of each interpolation method? How do they differ and which one should I use?
Nearest neighbor will be as fast as possible, but you will lose substantial information when resizing.
Linear interpolation is less fast, but will not result in information loss unless you're shrinking the image (which you are).
Cubic interpolation (probably actually "Bicubic") uses one of many possible formulas that incorporate multiple neighbor pixels. This is much better for shrinking images, but you are still limited as to how much shrinking you can do without information loss. Depending on the algorithm, you can probably reduce your images by 50% or 75%. The primary con of this approach is that it is much slower.
Not sure what "area" is - it may actually be "Bicubic". In all likelihood, this setting will give your best result (in terms of information loss / appearance), but at the cost of the longest processing time.
Update: this link gives more details (including a fifth type not included in your list):
http://docs.opencv.org/modules/imgproc/doc/geometric_transformations.html?highlight=resize#resize
The algorithms are: (descriptions are from the OpenCV documentation)
INTER_NEAREST - a nearest-neighbor interpolation
INTER_LINEAR - a bilinear interpolation (used by default)
INTER_AREA - resampling using pixel area relation. It may be a preferred method for image decimation, as it gives moire’-free results. But when the image is zoomed, it is similar to the INTER_NEAREST method.
INTER_CUBIC - a bicubic interpolation over 4x4 pixel neighborhood
INTER_LANCZOS4 - a Lanczos interpolation over 8x8 pixel neighborhood
If you want more speed use Nearest Neighbor method.
If you want to preserve quality of Image after downsampling, you can consider using INTER_AREA based interpolation, but again it depends on image content.
You can find detailed analysis of speed comparison here
Below is the speed comparison on 400*400 px image taken from the above link
The interpolation method to use depends on what you are trying to achieve:
CV_INTER_LINEAR or CV_INTER_CUBIC apply a lowpass filter (average) in order to achieve a trade-off between visual quality and edge removal (lowpass filters tend to remove edges in order to reduce aliasing in images). Between these two, i'd recommend you CV_INTER_CUBIC.
CV_INTER_NN method actually is Nearest neighbour, it's the most basic method and you'll get sharper edges (no lowpass filter will be applied). However this method simply is like "zooming" the image, no visual enhancement.
They all lose information, which you use depends on the speed you need, how much information you can afford to lose and the nature of your image.
Sorry there is no correct answer - that's why there is a choice
How do I segment a 2D image into blobs of similar values efficiently? The given input is a n array of integer, which includes hue for non-gray pixels and brightness of gray pixels.
I am writing a virtual mobile robot using Java, and I am using segmentation to analyze the map and also the image from the camera. This is a well-known problem in Computer Vision, but when it's on a robot performance does matter so I wanted some inputs. Algorithm is what matters, so you can post code in any language.
Wikipedia article: Segmentation (image processing)
[PPT] Stanford CS-223-B Lecture 11 Segmentation and Grouping (which says Mean Shift is perhaps the best technique to date)
Mean Shift Pictures (paper is also available from Dorin Comaniciu)
I would downsample,in colourspace and in number of pixels, use a vision method(probably meanshift) and upscale the result.
This is good because downsampling also increases the robustness to noise, and makes it more likely that you get meaningful segments.
You could use floodfill to smooth edges afterwards if you need smoothness.
Some more thoughts (in response to your comment).
1) Did you blend as you downsampled? y[i]=(x[2i]+x[2i+1])/2 This should eliminate noise.
2)How fast do you want it to be?
3)Have you tried dynamic meanshift?(also google for dynamic x for all algorithms x)
Not sure if it is too efficient, but you could try using a Kohonen neural network (or, self-organizing map; SOM) to group the similar values, where each pixel contains the original color and position and only the color is used for the Kohohen grouping.
You should read up before you implement this though, as my knowledge of the Kohonen network goes as far as that it is used for grouping data - so I don't know what the performance/viability options are for your scenario.
There are also Hopfield Networks. They can be mangled into grouping from what I read.
What I have now:
Make a buffer of the same size as the input image, initialized to UNSEGMENTED.
For each pixel in the image where the corresponding buffer value is not UNSEGMENTED, flood the buffer using the pixel value.
a. The border checking of the flooding is done by checking if pixel is within EPSILON (currently set to 10) of the originating pixel's value.
b. Flood filling algorithm.
Possible issue:
The 2.a.'s border checking is called many times in the flood filling algorithm. I could turn it into a lookup if I could precalculate the border using edge detection, but that may add more time than current check.
private boolean isValuesCloseEnough(int a_lhs, int a_rhs) {
return Math.abs(a_lhs - a_rhs) <= EPSILON;
}
Possible Enhancement:
Instead of checking every single pixel for UNSEGMENTED, I could randomly pick a few points. If you are expecting around 10 blobs, picking random points in that order may suffice. Drawback is that you might miss a useful but small blob.
Check out Eyepatch (eyepatch.stanford.edu). It should help you during the investigation phase by providing a variety of possible filters for segmentation.
An alternative to flood-fill is the connnected-components algorithm. So,
Cheaply classify your pixels. e.g. divide pixels in colour space.
Run the cc to find the blobs
Retain the blobs of significant size
This approach is widely used in early vision approaches. For example in the seminal paper "Blobworld: A System for Region-Based Image Indexing and Retrieval".