What factors are best for image resizing?

Let's say I have an image that is 3000 px wide. I know (at least I think I do) that if I downsize it to be 1500 px wide (that is, 50%), the result will be better than if I resize it to be 1499 or 1501 px wide.
I suppose that will be so regardless of the algorithm used. But I have no solid proof, and the reason I'd like to have proof is that it could help me decide less obvious cases.
For instance, reducing it to 1000 px (one third) will also presumably work ok. But what about 3/4? Is it better than 1/2? It certainly can hold more detail, but will part of it not become irretrievably fuzzy? Is there a metric for the 'incurred fuzziness' which can be offset against the actual resolution?
For instance, I suppose such a metric would clearly show 3000 -> 1501 to be worse than 3000 -> 1500, by more than is gained from the one extra pixel of width.
Intuitively, resizing by 1/n, where n is an integer divisor of the original size, should yield the best results, followed by ratios n/m with the smallest possible denominator. Where the original size (both X and Y) is not a multiple of the denominator, I'd expect poorer results, though I have no proof of that.
These issues must have been studied by someone. People have devised all sorts of complex algorithms, so they must take this into consideration somehow. But I don't even know where to ask these questions. I ask them here because I've seen related ones with good answers. Thanks for your attention and please excuse the contrived presentation.

The algorithm is key. Here's a list of common ones, from lowest quality to highest. As you get higher in quality, the exact ratio of input size to output size makes less of a difference. By the end of the list you shouldn't be able to tell the difference between resizing to 1499 or 1500.
Nearest Neighbor, i.e. keeping some pixels and dropping others.
Bilinear interpolation. This takes the 2x2 area of pixels around the point where your ideal sample would be, and calculates a new value based on how close its position is to each of the 4 pixels. It doesn't work well if you're reducing below 2:1 because it starts to resemble nearest neighbor.
Bicubic interpolation. Similar to bilinear but using a 4x4 area of pixels with a more complex weighting formula to get sharper results. Again not good below 2:1.
Pixel averaging. If this isn't done with an integer ratio of input size to output size you'll be averaging different numbers of pixels for each output pixel and the results will be uneven.
Lanczos filtering. This takes a number of pixels from the input and runs them through a windowed version of the sinc function that attempts to retain as much of the detail as possible while keeping the calculations tractable. The size and speed of the filter varies with the resizing ratio. It's slow, but not as slow as sinc.
Sinc filtering. This is theoretically perfect, but it requires processing a large chunk of input for every pixel output so it's very slow. You may also notice the difference between theory and practice when you see ringing artifacts in the output.
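Pixel averaging in particular only behaves predictably at exact integer ratios. A minimal sketch (grayscale, pure Python; names and data are illustrative):

```python
# Hypothetical sketch of "pixel averaging" at an exact integer factor.
# The image is a list of rows of grayscale values.

def box_downscale(pixels, factor):
    """Average each factor x factor block into one output pixel.
    Assumes width and height are exact multiples of `factor`."""
    h, w = len(pixels), len(pixels[0])
    assert h % factor == 0 and w % factor == 0, "use an integer ratio"
    out = []
    for y in range(0, h, factor):
        row = []
        for x in range(0, w, factor):
            block = [pixels[y + dy][x + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) // len(block))
        out.append(row)
    return out

# 4x4 image -> 2x2: each output pixel is the mean of a 2x2 block.
img = [[0, 0, 100, 100],
       [0, 0, 100, 100],
       [50, 50, 200, 200],
       [50, 50, 200, 200]]
print(box_downscale(img, 2))  # [[0, 100], [50, 200]]
```

At a non-integer ratio the blocks would have to straddle pixel boundaries, which is exactly where the unevenness comes from.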

The answer to your question:
The most important factor is choosing a good resizing algorithm. For example, bicubic interpolation will not work well if you resize by a factor greater than 2 and do not apply smoothing first. Unfortunately there is no single best algorithm. If you are using Photoshop or another advanced resizing tool you can choose the algorithm; in Picasa you cannot. Each algorithm has its downsides: some are better for natural images, others for computer-generated images.
A less important factor is the exact ratio. The larger the output image, the better the results you will get, but the file will take more megabytes. Rescaling from 3000 px to 1600 will give you visually better results than rescaling to 1500.
Another factor is the number of rescales. Resizing an image from 3000 to 2000 and then to 1500 will produce a slightly worse result than resizing directly from 3000 to 1500. Each time you resize the image, some information is lost.
Friendly advice: keep both the height and width of your image divisible by 4. For example, 1501 is a bad size; 1500 or 1504 is better. The reason is that some hardware deals faster with images whose dimensions are divisible by 4. Quality will not improve, but your browsing experience will.
If you display your image on a computer screen, try to match its size to the size of your screen. Otherwise the display process will make another resampling and you will not be able to observe the true beauty of your image.
If you intend to print your image, keep the resolution high. You will need at least 300 dpi, so if you want to print it on 10 inch paper, keep it at least 3000 pixels wide.
The last one is obvious, but I will mention it: keep the original aspect ratio when you resize the image; otherwise it will become distorted. So if you downscale it from 3000 px wide to 1499, you will not be able to choose an integer height that preserves the original aspect ratio.
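Whether an integer height exists for a given target width is a quick divisibility check; a small sketch (dimensions are illustrative):

```python
# Sketch: check whether a target width admits an integer height at the
# original aspect ratio.

def scaled_height(orig_w, orig_h, new_w):
    """Return the exact height for new_w, or None if it isn't an integer."""
    num = new_w * orig_h
    return num // orig_w if num % orig_w == 0 else None

print(scaled_height(3000, 2000, 1500))  # 1000 -> ratio preserved exactly
print(scaled_height(3000, 2000, 1499))  # None -> no integer height exists
```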
JPEG compression will harm your image much more than the difference in visual quality between a 1500 px image and a 1499 px one. Keep that in mind: even with slight compression you will not be able to see the difference in quality.
As a summary: stop worrying about the exact image size. Choose a modern resampling algorithm (if you can) and roughly estimate the size as a trade-off between size on disk, image quality and printing paper size (if relevant).
Keep the original aspect ratio and remember that JPEG compression will harm your image much more than the difference in visual quality of different resampling algorithms or slight variation in image size.

Related

How to estimate GIF file size?

We're building an online video editing service. One of the features allows users to export a short segment from their video as an animated gif. Imgur has a file size limit of 2Mb per uploaded animated gif.
GIF file size depends on the number of frames, color depth and the image content itself: a solid flat color results in a very lightweight GIF, while a random-colors tv-noise animation would be quite heavy.
First I export each video frame as a PNG of the final GIF frame size (fixed, 384x216).
Then, to maximize gif quality I undertake several gif render attempts with slightly different parameters - varying number of frames and number of colors in the gif palette. The render that has the best quality while staying under the file size limit gets uploaded to Imgur.
Each render takes time and CPU resources — this I am looking to optimize.
Question: what could be a smart way to estimate the best render settings depending on the actual images, to fit as close as possible to the file size limit, or at least to reduce the number of render attempts to 2–3?
The GIF image format uses LZW compression, infamous because Unisys, owner of the algorithm's patent, aggressively pursued royalty payments just as the format got popular. It turned out well in the end: we have PNG to thank for that.
The amount by which LZW can compress the image is highly non-deterministic and depends greatly on the image content. At best you can provide the user with a heuristic that estimates the final file size, displaying, say, the prediction with a colored bar. You can compute it quickly by converting just the first frame; that won't take long on a 384x216 image, it runs in human time, a fraction of a second.
Then extrapolate the effective compression rate of that first frame to the subsequent frames, which ought to encode only small differences from the first frame and so should have comparable compression rates.
You can't truly know whether the result exceeds the site's size limit until you've encoded the entire sequence. So be sure to emphasize in your UI design that your prediction is just an estimate, so your user isn't disappointed too much. And of course provide the tools to get the size lowered, something like a nearest-neighbor interpolation that makes the pixels in the image bigger. Focusing on making the later frames smaller can pay off handsomely as well; GIF encoders don't normally do this well by themselves. YMMV.
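The "encode one frame, extrapolate" idea can be sketched as follows. Note that zlib's DEFLATE stands in for GIF's LZW here, so the ratio is only a rough proxy, and the per-frame delta cost is an invented tuning knob:

```python
# Rough sketch of the "encode one frame, extrapolate" heuristic.
# zlib (DEFLATE) is used as a stand-in for GIF's LZW; the frame data
# and the delta_fraction parameter are made up for illustration.
import zlib

def estimate_gif_size(first_frame_bytes, n_frames, delta_fraction=0.2):
    """Estimate total size from the first frame's compression ratio.
    delta_fraction: assumed relative cost of each inter-frame delta."""
    compressed = len(zlib.compress(first_frame_bytes))
    ratio = compressed / len(first_frame_bytes)
    per_delta = ratio * len(first_frame_bytes) * delta_fraction
    return int(compressed + (n_frames - 1) * per_delta)

frame = bytes([i % 7 for i in range(384 * 216)])  # toy low-entropy frame
print(estimate_gif_size(frame, n_frames=30))
```

The estimate grows linearly in the frame count, which matches the intuition that later frames encode only small diffs from the first.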
There's no simple answer to this. Single-frame GIF size mainly depends on image entropy after quantization, and you could try using stddev as an estimator using e.g. ImageMagick:
identify -format "%[fx:standard_deviation]" imagename.png
You can very probably get better results by running a smoothing kernel on the image in order to eliminate some high-frequency noise that's unlikely to be informational, and very likely to mess up compression performance. This goes much better with JPEG than with GIF, anyway.
Then, in general, you want to run a great many samples in order to come up with something of the kind (let's say you have a single compression parameter Q)
STDDEV    SIZE W/Q=1    SIZE W/Q=2    SIZE W/Q=3    ...
value1    v1,1          v1,2          v1,3
After running several dozen tests (you need do this only once, not "at runtime"), you will get both an estimate of the expected size and a measurement of its error. You'll then see that an image with stddev 0.45 that compresses to 108 Kb when Q=1 will compress to 91 Kb plus or minus 5 when Q=2, 88 Kb plus or minus 3 when Q=3, and so on.
At that point you get an unknown image, get its stddev and compression #Q=1, and you can interpolate the probable size when Q equals, say, 4, without actually running the encoding.
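That interpolation step might look like this (the calibration values are the invented ones from the example above):

```python
# Sketch of interpolating a probable compressed size from calibration
# data. The table values below are invented for illustration.

CALIBRATION = {  # stddev -> {Q: size_kb}
    0.45: {1: 108, 2: 91, 3: 88},
}

def estimate_size_kb(stddev, q, table=CALIBRATION):
    # Pick the calibration row with the nearest stddev ...
    row = table[min(table, key=lambda s: abs(s - stddev))]
    if q in row:
        return row[q]
    # ... and extrapolate linearly from the two largest measured Qs.
    qs = sorted(row)
    q1, q2 = qs[-2], qs[-1]
    slope = (row[q2] - row[q1]) / (q2 - q1)
    return row[q2] + slope * (q - q2)

print(estimate_size_kb(0.45, 4))  # 88 + (88 - 91) * (4 - 3) = 85.0
```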
While your service is active you can store statistical data (i.e., after you really do the encoding, store the actual results) to further improve the estimation; after all, you'd only be storing some numbers, not any potentially sensitive or personal information from the video, and acquiring and storing those numbers comes nearly for free.
Backgrounds
It might be worthwhile to recognize videos with a fixed background; in that case you can adapt the frames to make them identical in those areas and let the GIF encoder avoid storing that information. When you get such a video (e.g. a talking head), this could lead to huge savings. It would, however, throw the parameter estimation completely off, unless you can also estimate the extent of the background area: if that area is B and the frame area is A, the compressed "image" size for five frames would be roughly A + (A - B) * (5 - 1) instead of A * 5, and you could apply this correction factor to the estimate.
Compression optimization
Then there are optimization techniques that slightly modify the image and adapt it for a better compression, but we'd stray from the topic at hand. I had several algorithms that worked very well with paletted PNG, which is similar to GIF in many regards, but I'd need to check out whether and which of them may be freely used.
Some thoughts: the LZW algorithm scans the data linearly. So whenever a sequence of N pixels is "less than X%" different (perceptually or arithmetically) from an already encountered sequence, rewrite the sequence:
018298765676523456789876543456787654
987678656755234292837683929836567273
here the 656765234 sequence in the first row is almost matched by the 656755234 sequence in the second row. By changing the mismatched 5 to 6, the LZW algorithm is likely to pick up the whole sequence and store it with one symbol instead of three (6567,5,5234) or more.
Also, LZW works with bits, not bytes. This means, very roughly speaking, that the more the 0's and 1's are balanced, the worse the compression will be. The more unpredictable their sequence, the worse the results.
So if we can find a way of making the distribution more asymmetrical, we win.
And we can do it, and we can do it losslessly (the same works with PNG). We choose the most common colour in the image, once we have quantized it. Let that color be color index 0. That's 00000000, eight fat zeroes. Now we choose the most common colour that follows that one, or the second most common colour; and we give it index 1, that is, 00000001. Another seven zeroes and a single one. The next colours will be indexed 2, 4, 8, 16, 32, 64 and 128; each of these has only a single bit 1, all others are zeroes.
Since colors will be very likely distributed following a power law, it's reasonable to assume that around 20% of the pixels will be painted with the first nine most common colours; and that 20% of the data stream can be made to be at least 87.5% zeroes. Most of them will be consecutive zeroes, which is something that LZW will appreciate no end.
Best of all, this intervention is completely lossless; the reindexed pixels will still be the same colour, it's only the palette that will be shifted accordingly. I developed such a codec for PNG some years ago, and in my use case scenario (PNG street maps) it yielded very good results, ~20% gain in compression. With more varied palettes and with LZW algorithm the results will be probably not so good, but the processing is fast and not too difficult to implement.
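A sketch of that reindexing idea, assuming 8-bit palette indices and illustrative pixel data:

```python
# Lossless palette reindexing sketch: give the most frequent colours the
# indices with the fewest 1-bits (0, 1, 2, 4, 8, ...). Data is invented.
from collections import Counter

def bit_sparse_indices(n):
    """Indices 0..n-1 ordered by population count, then by value."""
    return sorted(range(n), key=lambda i: (bin(i).count("1"), i))

def reindex(pixels):
    """Remap palette indices so frequent colours get sparse bit patterns.
    Returns (new_pixels, old_to_new); the palette is permuted the same
    way, so the decoded image is unchanged."""
    by_freq = [c for c, _ in Counter(pixels).most_common()]
    sparse = bit_sparse_indices(256)
    old_to_new = {old: sparse[rank] for rank, old in enumerate(by_freq)}
    return [old_to_new[p] for p in pixels], old_to_new

pixels = [7, 7, 7, 7, 3, 3, 9]        # colour 7 dominates
new_pixels, mapping = reindex(pixels)
print(new_pixels)                      # [0, 0, 0, 0, 1, 1, 2]
```

The dominant colour becomes index 0 (eight zero bits), the next ones get single-bit indices, and the palette entries are shuffled identically, so the operation is lossless.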

Accuracy depending on the image size

I would like to know if my opinion is correct or not:
Consider a specific model that performs several complex computations in order to compute the accuracy, i.e. the correct classification rate, on a large image database.
Note: all the images have the size: 300 x 200 pixels.
FIRST
The images are reduced to 180 x 180, and the model is then computed using this resized database.
SECONDLY
The images are reduced to 120 x 120, and the model is then computed using this resized database.
In this case, is it correct that when the size of the images increases, the accuracy also increases? (Of course the time complexity increases too.)
And when the size of the images decreases (as in the second case, from 180x180 to 120x120), does the accuracy also decrease? (With a corresponding decrease in time complexity.)
I need your opinions, with brief explanation. Any help will be very appreciated!
The answer is "it depends". It depends on the specific problem you are trying to solve. If you are training a classifier to determine whether or not an image contains a face, you can get away with reducing the size of the image quite a bit. 32x32 is a common size used by face detectors. On the other hand, if you are trying to determine whose face it is, you will most likely need a higher-resolution image.
Think about it this way: reducing the size of the image removes high-frequency information. The more of it you remove, the less specific your representation becomes. I would expect decreasing image size to decrease false negatives and increase false positives, but again, that depends on what kinds of categories you are trying to classify. For any particular problem there is probably a "sweet spot": an image size that yields the maximum accuracy.

Does GrabCut Segmentation depend on the size of the image?

I have been thinking about this for quite some time, but never really performed detailed analysis on it. Does foreground segmentation using the GrabCut [1] algorithm depend on the size of the input image? Intuitively, it appears to me that since GrabCut is based on color models, color distributions should not change as the size of the image changes, but aliasing artifacts in smaller images might play a role.
Any thoughts or existing experiments on the dependence of size of the image on image segmentation using grabcut would be highly appreciated.
Thanks
[1] C. Rother, V. Kolmogorov, and A. Blake, GrabCut: Interactive foreground extraction using iterated graph cuts, ACM Trans. Graph., vol. 23, pp. 309–314, 2004.
Size matters.
The objective function of GrabCut balances two terms:
The unary term that measures the per-pixel fit to the foreground/background color model.
The smoothness term (pair-wise term) that measures the "complexity" of the segmentation boundary.
The first term (unary) scales with the area of the foreground while the second (smoothness) scales with the perimeter of the foreground.
So, if you scale your image by a factor of 2, you increase the area by a factor of 4 while the perimeter grows only by roughly a factor of 2.
Therefore, if you tuned (or learned) the parameters of the energy function for a specific image size/scale, these parameters may not work well at other image sizes.
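A toy calculation (invented per-pixel weights) shows how the balance between the two terms shifts with scale:

```python
# Toy illustration (invented numbers): how GrabCut's two energy terms
# scale when the image is upsampled by a factor s.

def energy(area, perimeter, unary_per_px=1.0, pairwise_per_px=5.0):
    # unary term scales with area, smoothness term with boundary length
    return unary_per_px * area + pairwise_per_px * perimeter

def unary_share(area, perimeter, pairwise_per_px=5.0):
    return area / (area + pairwise_per_px * perimeter)

a, p = 100 * 100, 400          # a 100x100 square foreground region
s = 2                          # upscale factor
print(unary_share(a, p))               # ~0.83 before scaling
print(unary_share(a * s * s, p * s))   # ~0.91 after x2 upscaling
```

The relative weight of the smoothness term shrinks as the image grows, which is exactly why parameters tuned at one scale stop working at another.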
PS
Did you know that the Office 2010 "foreground selection tool" is based on the GrabCut algorithm?
Here's a PDF of the GrabCut paper, courtesy of Microsoft Research.
The two main effects of image size will be run time and the scale of details in the image which may be considered significant. Of these two, run time is the one which will bite you with GrabCut - graph cutting methods are already rather slow, and GrabCut uses them iteratively.
It's very common to start by downsampling the image to a smaller resolution, often in combination with a low-pass filter (i.e. you sample the source image with a Gaussian kernel). This significantly reduces the number of pixels n the algorithm runs over, while reducing the effect of small details and noise on the result.
You can also use masking to restrict processing to only specific portions of the image. You're already getting some of this in GrabCut as the initial "grab" or selection stage, and again later during the brush-based refinement stage. This stage also gives you some implicit information about scale, i.e. the feature of interest is probably filling most of the selection region.
Recommendation:
Display the image at whatever scale is convenient and downsample the selected region to roughly the n = 100k to 200k range used in the paper's examples. If you need to improve the result quality, use the result of this initial pass as the starting point for a following iteration at higher resolution.
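Choosing the downsampling factor to land in that pixel range can be sketched as follows (the target value is an assumption):

```python
# Sketch: pick an integer downsampling factor so the working image has
# roughly target_px pixels (the 100k-200k range mentioned above).
import math

def downsample_factor(w, h, target_px=150_000):
    return max(1, round(math.sqrt(w * h / target_px)))

w, h = 1920, 1080
f = downsample_factor(w, h)
print(f, (w // f) * (h // f))  # factor and resulting pixel count
```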

How to estimate the size of JPEG image which will be scaled down

For example, I have a 1024*768 JPEG image. I want to estimate the size of the image after it is scaled down to 800*600 or 640*480. Is there any algorithm to calculate the size without generating the scaled image?
I took a look at the resize dialog in Photoshop. The size it shows is basically (width pixels * height pixels * bits/pixel), which differs hugely from the actual file size.
I have a mobile image browser application which allows users to send images through email, with options to scale the image down. We provide check boxes for the user to choose a down-scale resolution together with an estimated size. For large images (> 10 MB) we have 3 down-scale sizes to choose from. If we generate a cached image for each option it may hurt memory, so we are trying to find the best solution that avoids that memory consumption.
I have successfully estimated the scaled size based on the DQT (quantization table), i.e. the quality factor.
I conducted some experiments and found that if we use the same quality factor as in the original JPEG image, the scaled image will have a size roughly equal to (scale factor * scale factor) times the original file size. The quality factor can be estimated from the DQT defined in every JPEG image; an algorithm has been defined to estimate it from the standard quantization tables shown in Annex K of the JPEG spec.
Although other factors like color subsampling, different compression algorithms and the image content itself contribute to the error, the estimation is pretty accurate.
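The rule of thumb above amounts to scaling the file size by the square of the linear scale factor; a sketch with an illustrative original size:

```python
# Sketch of the rule of thumb above: scaled JPEG size ~ original file
# size times the area ratio (i.e. the squared linear scale factor).

def estimate_scaled_jpeg_bytes(orig_bytes, orig_w, orig_h, new_w, new_h):
    scale_area = (new_w * new_h) / (orig_w * orig_h)
    return int(orig_bytes * scale_area)

orig = 300_000  # hypothetical 1024x768 JPEG of ~300 KB
print(estimate_scaled_jpeg_bytes(orig, 1024, 768, 800, 600))  # 183105
print(estimate_scaled_jpeg_bytes(orig, 1024, 768, 640, 480))  # 117187
```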
P.S. Examining JPEGSnoop and its source code helped me a lot :-)
Cheers!
Like everyone else said, the best algorithm to determine what sort of JPEG compression you'll get is the JPEG compression algorithm.
However, you could also calculate the Shannon entropy of your image, in order to try and understand how much information is actually present. This might give you some clues as to the theoretical limits of your compression, but is probably not the best solution for your problem.
This concept will help you measure the difference in information between an all-white image and that of a crowd, which is related to its compressibility.
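Computing the Shannon entropy of raw image bytes is straightforward; a sketch with invented sample data:

```python
# Sketch: Shannon entropy (bits per byte) of raw image data as a rough
# indicator of compressibility. Sample data is invented.
import math
from collections import Counter

def shannon_entropy(data):
    counts = Counter(data)
    n = len(data)
    # entropy is always >= 0; max() guards against a -0.0 result
    return max(0.0, -sum(c / n * math.log2(c / n)
                         for c in counts.values()))

flat = bytes(1000)               # a single byte value: one symbol
busy = bytes(range(256)) * 4     # every byte value equally often
print(shannon_entropy(flat))     # 0.0 (maximally compressible)
print(shannon_entropy(busy))     # 8.0 (noise-like, incompressible)
```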
-Brian J. Stinar-
Why estimate what you can measure?
In essence, it's impossible to provide any meaningful estimate due to the fact that different types of images (in terms of their content) will compress very differently using the JPEG algorithm. (A 1024x768 pure white image will be vastly smaller than a photograph of a crowd scene for example.)
As such, if you're after an accurate figure it would make sense to simply carry out the re-size.
Alternatively, you could just provide a range such as "40KB to 90KB", based on an "average" set of images.
I think what you want is weird and difficult to do. Depending on the JPEG compression level, some images are heavier (larger in size) than others.
My hunch for JPEG images: Given two images at same resolution, compressed at the same quality ratio - the image taking smaller memory will compress more (in general) when its resolution is reduced.
Why? From experience: many times when working with a set of images, I have seen that if a thumbnail occupies significantly more memory than most others, reducing its resolution makes almost no change in its size. On the other hand, reducing the resolution of one of the average-size thumbnails reduces the size significantly (all parameters, like original/final resolution and JPEG quality, being the same in the two cases).
Roughly speaking - higher the entropy, less will be the impact on size of image by changing resolution (at the same JPEG quality).
If you can verify this with experiments, maybe you can use it as a quick method to estimate the size. If my language is confusing, I can explain with some mathematical notation / pseudo formula.
An 800*600 image file should be roughly (800*600)/(1024*768) times as large as the 1024*768 image file it was scaled down from. But this is really a rough estimate, because the compressibility of original and scaled versions of the image might be different.
Before I attempt to answer your question, I'd like to join the ranks of people that think it's simpler to measure rather than estimate. But it's still an interesting question, so here's my answer:
Look at the block DCT coefficients of the input JPEG image. Perhaps you can find some sort of relationship between the number of higher frequency components and the file size after shrinking the image.
My hunch: all other things (e.g. quantization tables) being equal, the more high-frequency components you have in your original image, the bigger the difference in file size between the original and shrunk image will be.
I think that by shrinking the image you will reduce some of the higher-frequency components during interpolation, increasing the possibility that they will be quantized to zero during the lossy quantization step.
If you go down this path, you're in luck: I've been playing with JPEG block DCT coefficients and put some code up to extract them.

Image fingerprint to compare similarity of many images

I need to create fingerprints of many images (about 100,000 existing, 1,000 new per day, RGB, JPEG, max size 800x800) to compare every image to every other image very fast. I can't use binary comparison methods because images which are nearly similar should also be recognized.
Best would be an existing library, but also some hints to existing algorithms would help me a lot.
Normal hashing or CRC calculation algorithms do not work well with image data. The dimensional nature of the information must be taken into account.
If you need extremely robust fingerprinting, such that affine transformations (scaling, rotation, translation, flipping) are accounted for, you can use a Radon transformation on the image source to produce a normative mapping of the image data - store this with each image and then compare just the fingerprints. This is a complex algorithm and not for the faint of heart.
A few simple solutions are possible:
Create a luminosity histogram for the image as a fingerprint
Create scaled down versions of each image as a fingerprint
Combine technique (1) and (2) into a hybrid approach for improved comparison quality
A luminosity histogram (especially one that is separated into RGB components) is a reasonable fingerprint for an image, and can be implemented quite efficiently. Subtracting one histogram from another produces a new histogram which you can process to decide how similar two images are. Because histograms only evaluate the distribution and occurrence of luminosity/color information, they handle affine transformations quite well. If you quantize each color component's luminosity information down to an 8-bit value, 768 bytes of storage are sufficient for the fingerprint of an image of almost any reasonable size. Luminosity histograms produce false negatives when the color information in an image is manipulated: transformations like contrast/brightness, posterize and color shifting all change the luminosity information. False positives are also possible with certain types of images, such as landscapes and images where a single color dominates.
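A minimal sketch of such a histogram fingerprint (grayscale, 8 buckets to keep it short; a real fingerprint would use more buckets, e.g. 256 per RGB channel):

```python
# Sketch of a luminosity-histogram fingerprint plus an L1 distance
# between two fingerprints. Pixel data is invented.

def histogram_fingerprint(pixels, buckets=8):
    hist = [0] * buckets
    for p in pixels:                 # p: luminosity in 0..255
        hist[p * buckets // 256] += 1
    n = len(pixels)
    return [h / n for h in hist]     # normalize: size-independent

def distance(fp_a, fp_b):
    return sum(abs(a - b) for a, b in zip(fp_a, fp_b))

dark = [10, 20, 30, 40]
bright = [200, 210, 220, 230]
print(distance(histogram_fingerprint(dark), histogram_fingerprint(dark)))
print(distance(histogram_fingerprint(dark), histogram_fingerprint(bright)))
```

Because the fingerprint is normalized, two copies of the same image at different resolutions still compare as identical.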
Using scaled images is another way to reduce the information density of the image to a level that is easier to compare. Reductions below 10% of the original image size generally lose too much of the information to be of use, so an 800x800 pixel image can be scaled down to 80x80 and still provide enough information to perform decent fingerprinting. Unlike histogram data, you have to perform anisotropic scaling of the image data when the source resolutions have varying aspect ratios. In other words, reducing a 300x800 image into an 80x80 thumbnail deforms the image, so that comparing it with a very similar 300x500 image will cause false negatives. Thumbnail fingerprints also often produce false negatives when affine transformations are involved: if you flip or rotate an image, its thumbnail will be quite different from the original and may fail to match.
Combining both techniques is a reasonable way to hedge your bets and reduce the occurrence of both false positives and false negatives.
There is a much less ad-hoc approach than the scaled down image variants that have been proposed here that retains their general flavor, but which gives a much more rigorous mathematical basis for what is going on.
Take a Haar wavelet of the image. Basically the Haar wavelet is the succession of differences from the lower resolution images to each higher resolution image, but weighted by how deep you are in the 'tree' of mipmaps. The calculation is straightforward. Then once you have the Haar wavelet appropriately weighted, throw away all but the k largest coefficients (in terms of absolute value), normalize the vector and save it.
If you take the dot product of two of those normalized vectors it gives you a measure of similarity with 1 being nearly identical. I posted more information over here.
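A 1-D sketch of the idea (a real image fingerprint would apply the transform in 2-D; k and the signal are illustrative):

```python
# Minimal 1-D Haar transform: keep only the k largest coefficients,
# normalize, and compare fingerprints with a dot product.
import math

def haar_1d(v):
    coeffs = []
    while len(v) > 1:
        avgs = [(v[i] + v[i + 1]) / 2 for i in range(0, len(v), 2)]
        diffs = [(v[i] - v[i + 1]) / 2 for i in range(0, len(v), 2)]
        coeffs = diffs + coeffs
        v = avgs
    return v + coeffs  # [overall average, then detail coefficients]

def fingerprint(v, k=3):
    c = haar_1d(v)
    top = sorted(range(len(c)), key=lambda i: -abs(c[i]))[:k]
    fp = [c[i] if i in top else 0.0 for i in range(len(c))]
    norm = math.sqrt(sum(x * x for x in fp))
    return [x / norm for x in fp]

def similarity(a, b):
    return sum(x * y for x, y in zip(fingerprint(a), fingerprint(b)))

sig = [8, 6, 7, 5, 3, 0, 9, 2]
print(similarity(sig, sig))                   # 1.0 for identical input
print(similarity(sig, [x + 1 for x in sig]))  # still close to 1.0
```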
You should definitely take a look at phash.
For image comparison there is this PHP project:
https://github.com/kennethrapp/phasher
And my little javascript clone:
https://redaktor.me/phasher/demo_js/index.html
Unfortunately this is "bitcount"-based but will recognize rotated images.
Another JavaScript approach is to build a luminosity histogram from the image with the help of canvas. You can visualize the histogram as a polygon on the canvas and compare that polygon against your database (e.g. MySQL spatial extensions ...).
A long time ago I worked on a system that had some similar characteristics, and this is an approximation of the algorithm we followed:
Divide the picture into zones. In our case we were dealing with 4:3 resolution video, so we used 12 zones. Doing this takes the resolution of the source images out of the picture.
For each zone, calculate an overall color - the average of all pixels in the zone
For the entire image, calculate an overall color - the average of all zones
So for each image, you're storing n + 1 integer values, where n is the number of zones you're tracking.
For comparisons, you also need to look at each color channel individually.
For the overall image, compare the color channels for the overall colors to see if they are within a certain threshold - say, 10%
If the images are within the threshold, next compare each zone. If all zones also are within the threshold, the images are a strong enough match that you can at least flag them for further comparison.
This lets you quickly discard images that are not matches; you can also use more zones and/or apply the algorithm recursively to get stronger match confidence.
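The zone scheme above can be sketched like this for grayscale data (zone counts and threshold are illustrative):

```python
# Sketch of the zone-average comparison above: n zone averages plus one
# overall average, with a cheap threshold test on the overall value
# first. Grayscale only; the 12-zone 4x3 layout mirrors the answer.

def zone_fingerprint(pixels, zones_x=4, zones_y=3):
    h, w = len(pixels), len(pixels[0])
    zh, zw = h // zones_y, w // zones_x
    zones = []
    for zy in range(zones_y):
        for zx in range(zones_x):
            block = [pixels[y][x]
                     for y in range(zy * zh, (zy + 1) * zh)
                     for x in range(zx * zw, (zx + 1) * zw)]
            zones.append(sum(block) / len(block))
    overall = sum(zones) / len(zones)
    return zones, overall

def probably_match(fp_a, fp_b, threshold=0.10):
    zones_a, overall_a = fp_a
    zones_b, overall_b = fp_b
    if abs(overall_a - overall_b) > threshold * 255:
        return False                      # cheap early rejection
    return all(abs(a - b) <= threshold * 255
               for a, b in zip(zones_a, zones_b))

img = [[100] * 8 for _ in range(6)]       # flat 8x6 toy image
print(probably_match(zone_fingerprint(img), zone_fingerprint(img)))
```

Most candidates fail the single overall-average test, so the per-zone comparison only runs for plausible matches.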
Similar to Ic's answer, you might try comparing the images at multiple resolutions. So each image gets saved as 1x1, 2x2, 4x4 ... 800x800. If the lowest resolution doesn't match (subject to a threshold), you can immediately reject it. If it does match, you can compare them at the next higher resolution, and so on.
Also - if the images share any similar structure, such as medical images, you might be able to extract that structure into a description that is easier/faster to compare.
As of 2015 (back to the future... on this 2009 question which is now high-ranked in Google) image similarity can be computed using Deep Learning techniques. The family of algorithms known as Auto Encoders can create a vector representation which is searchable for similarity. There is a demo here.
One way you can do this is to resize the image and drop the resolution significantly (to 200x200 maybe?), storing a smaller (pixel-averaged) version for doing the comparison. Then define a tolerance threshold and compare each pixel. If the RGB of all pixels are within the tolerance, you've got a match.
Your initial run through is O(n^2) but if you catalog all matches, each new image is just an O(n) algorithm to compare (you only have to compare it to each previously inserted image). It will eventually break down however as the list of images to compare becomes larger, but I think you're safe for a while.
After 400 days of running, you'll have 500,000 images, which means (discounting the time to resize the image down) 200(H)*200(W)*500,000(images)*3(RGB) = 60,000,000,000 comparisons. If every image is an exact match, you're going to be falling behind, but that's probably not going to be the case, right? Remember, you can discount an image as a match as soon as a single comparison falls outside your threshold.
Do you literally want to compare every image against the others? What is the application? Maybe you just need some kind of indexing and retrieval of images based on certain descriptors? Then for example you can look at MPEG-7 standard for Multimedia Content Description Interface. Then you could compare the different image descriptors, which will be not that accurate but much faster.
So you want to do "fingerprint matching"; that's pretty different from "image matching". Fingerprint analysis has been studied deeply over the past 20 years, and several interesting algorithms have been developed to ensure the right detection rate (with respect to the FAR and FRR measures: False Acceptance Rate and False Rejection Rate).
I suggest you look at the LFA (Local Feature Analysis) class of detection techniques, mostly built on minutiae inspection. Minutiae are specific characteristics of any fingerprint and have been classified into several classes. Mapping a raster image to a minutiae map is what most public authorities actually do to file criminals or terrorists.
See here for further references
For iPhone image comparison and image similarity development check out:
http://sites.google.com/site/imagecomparison/
To see it in action, check out eyeBuy Visual Search on the iTunes AppStore.
It seems that specialised image hashing algorithms are an area of active research but perhaps a normal hash calculation of the image bytes would do the trick.
Are you seeking byte-identical images, or looking for images that are derived from the same source but may be in a different format or resolution (which strikes me as a rather hard problem)?
