Uncertainty in Lab space of compressed JPEG images

My team wish to calculate the contrast between two photographs taken in a wet environment.
We will calculate contrast using the formula
Contrast = SQRT((ΔL)^2 + (Δa)^2 + (Δb)^2)
where ΔL is the difference in lightness (L*), Δa is the difference in redness-greenness (a*) and Δb is the difference in yellowness-blueness (b*), which are the dimensions of Lab space.
Our (so far successful) approach has been to convert each pixel from RGB to Lab space and to take the mean values of the relevant sections of the image as our A and B variables.
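A minimal sketch of that per-pixel approach in Python. It assumes the GoPro's JPEGs are sRGB-encoded with a D65 white point (typical, but worth confirming for your camera); `rgb_to_lab`, `mean_lab` and `contrast` are illustrative names, not an existing API, and the sample pixels are made up.

```python
import math

# Assumption: images are sRGB with a D65 reference white.
D65_WHITE = (0.95047, 1.00000, 1.08883)


def srgb_to_linear(c8):
    """Undo the sRGB transfer curve for one 8-bit channel value."""
    c = c8 / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def rgb_to_lab(rgb):
    """Convert an 8-bit sRGB triple to CIE L*a*b* (D65)."""
    r, g, b = (srgb_to_linear(v) for v in rgb)
    # Linear sRGB -> CIE XYZ (sRGB primaries, D65 white).
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = (f(v / w) for v, w in zip((x, y, z), D65_WHITE))
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def mean_lab(pixels):
    """Mean L*, a*, b* over a region, converting each pixel first."""
    labs = [rgb_to_lab(p) for p in pixels]
    return tuple(sum(lab[i] for lab in labs) / len(labs) for i in range(3))


def contrast(lab_a, lab_b):
    """Contrast = sqrt(dL^2 + da^2 + db^2) between two mean Lab values."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(lab_a, lab_b)))


region_a = [(250, 250, 250), (255, 255, 255)]  # hypothetical pixel samples
region_b = [(0, 0, 0), (5, 5, 5)]
print(contrast(mean_lab(region_a), mean_lab(region_b)))
```

Averaging in Lab (rather than averaging RGB first and converting once) matches the procedure described above; the two give slightly different results because the conversion is non-linear.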
However the environment limits us to using a (waterproof) GoPro camera which compresses images to JPEG format, rather than saving as TIFF, so we are not using a true-colour image.
We now need to quantify the uncertainty in the contrast - for which we need to know the uncertainty in A and B and by extension the uncertainties (or mean/typical uncertainty) in each a and b value for each RGB pixel. We can calculate this only if we know the typical/maximum uncertainty produced when converting from true-colour to JPEG.
Therefore we need to know the maximum possible difference in each of the RGB channels when saving in JPEG format.
E.g., if the true-colour RGB pixel (5, 7, 9) became (2, 9, 13) after compression, the uncertainty in each channel would be (+/- 3, +/- 2, +/- 4).
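For propagating such per-channel uncertainties into the contrast, the standard first-order error-propagation formula is a reasonable starting point. It assumes the errors in ΔL, Δa and Δb are small and independent, which JPEG errors (correlated between neighbouring pixels and between channels) satisfy only roughly, so treat the result as an approximation; it also breaks down as the contrast approaches zero. Writing C for Contrast and A, B for the two regions:

$$C = \sqrt{(\Delta L)^2 + (\Delta a)^2 + (\Delta b)^2}$$

$$\sigma_C \approx \frac{\sqrt{(\Delta L\,\sigma_{\Delta L})^2 + (\Delta a\,\sigma_{\Delta a})^2 + (\Delta b\,\sigma_{\Delta b})^2}}{C}$$

$$\sigma_{\Delta L}^2 = \sigma_{L,A}^2 + \sigma_{L,B}^2, \quad \sigma_{\Delta a}^2 = \sigma_{a,A}^2 + \sigma_{a,B}^2, \quad \sigma_{\Delta b}^2 = \sigma_{b,A}^2 + \sigma_{b,B}^2$$

Each term comes from the partial derivative of C, e.g. ∂C/∂(ΔL) = ΔL/C.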
We believe that the camera uses 4:2:0 chroma subsampling; is there a way to test this?
However, our main question is: is there any way of knowing the maximum possible error in each channel, or of calculating the uncertainty from the compressed RGB result?
Note: We know it is impossible to convert back from JPEG to TIFF as JPEG compression is lossy. We merely need to quantify the extent of this loss on colour.

In short, it is not possible to absolutely quantify the maximum possible difference in digital counts in a JPEG image.
You highlight one of these points well already. When image data is encoded using the JPEG standard, it is first converted to the YCbCr color space.
Once in this color space, the chroma channels (Cb and Cr) are downsampled, because the human visual system is less sensitive to artifacts in chroma information than it is lightness information.
The error introduced here is content-dependent; an area of very rapidly varying chroma and hue will have considerably more content loss than an area of constant hue/chroma.
Even knowing that the camera uses 4:2:0 subsampling, which describes the amount and geometry of the downsampling, the content still dictates the error introduced at this step.
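On the question of testing for 4:2:0: the sampling factors are stored in the JPEG frame header (the SOFn marker segment defined by the JPEG standard), so they can be read directly rather than inferred. A minimal pure-Python sketch; the function names are illustrative, and it assumes the usual layout where the first component is Y and the two chroma components use 1x1 sampling.

```python
def sampling_factors(jpeg_bytes):
    """Return [(component_id, h, v), ...] from the first SOFn frame header."""
    if jpeg_bytes[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker; not a JPEG stream")
    i = 2
    while i + 4 <= len(jpeg_bytes):
        if jpeg_bytes[i] != 0xFF:
            raise ValueError(f"expected a marker at offset {i}")
        marker = jpeg_bytes[i + 1]
        if marker == 0xFF:  # fill byte before a marker
            i += 1
            continue
        length = int.from_bytes(jpeg_bytes[i + 2:i + 4], "big")
        # SOF0..SOF15, excluding DHT (C4), JPG (C8) and DAC (CC).
        if 0xC0 <= marker <= 0xCF and marker not in (0xC4, 0xC8, 0xCC):
            seg = jpeg_bytes[i + 4:i + 2 + length]
            # precision(1) height(2) width(2) n_components(1), then 3 bytes each:
            # component id, (H << 4 | V), quantization table id
            n = seg[5]
            return [(seg[6 + 3 * k], seg[7 + 3 * k] >> 4, seg[7 + 3 * k] & 0x0F)
                    for k in range(n)]
        if marker == 0xDA:  # start of scan reached without a frame header
            break
        i += 2 + length
    raise ValueError("no SOF frame header found")


def describe_subsampling(factors):
    """Name common layouts, assuming component 1 is Y and Cb/Cr are sampled 1x1."""
    (_, hy, vy), *chroma = factors
    if not chroma:
        return "grayscale"
    if any((h, v) != (1, 1) for _, h, v in chroma):
        return "other"
    return {(1, 1): "4:4:4", (2, 1): "4:2:2", (2, 2): "4:2:0"}.get((hy, vy), "other")


# Usage (file name hypothetical):
#   with open("GOPR0001.JPG", "rb") as f:
#       print(describe_subsampling(sampling_factors(f.read())))
```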
Another problem is the quantization performed in JPEG compression.
The resulting information is encoded using a Discrete Cosine Transform. In the transformed space, the results are again quantized depending on the desired quality. This quantization is set at the time of file generation, which is performed in-camera. Again, even if you knew the exact DCT quantization being performed by the camera, the actual effect on RGB digital counts is ultimately content-dependent.
Yet another difficulty is noise created by DCT block artifacts, which (again) is content dependent.
These scene dependencies make the algorithm very good for visual image compression, but very difficult to characterize absolutely.
However, there is some light at the end of the tunnel. JPEG compression will cause significantly more error in areas of rapidly changing image content. Areas of constant color and texture will have significantly less compression error and artifacts. Depending on your application you may be able to leverage this to your benefit.
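One way to exploit this, sketched under assumptions: split a channel into 8x8 blocks (the JPEG block size) and keep only low-variance blocks as measurement regions. `flat_blocks` and its default `threshold` of 2.0 are hypothetical choices to tune, and low variance in the compressed output only suggests, rather than proves, low compression error.

```python
from statistics import pstdev


def flat_blocks(channel, threshold=2.0, block=8):
    """Return (row, col) origins of block x block tiles with pixel stddev <= threshold.

    channel: 2-D list of values (e.g. one RGB channel or L*), with dimensions
    assumed to be multiples of `block`.
    """
    out = []
    for r in range(0, len(channel), block):
        for c in range(0, len(channel[0]), block):
            tile = [channel[y][x] for y in range(r, r + block) for x in range(c, c + block)]
            if pstdev(tile) <= threshold:
                out.append((r, c))
    return out
```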

Related

How to estimate GIF file size?

We're building an online video editing service. One of the features allows users to export a short segment of their video as an animated GIF. Imgur has a file size limit of 2 MB per uploaded animated GIF.
GIF file size depends on the number of frames, the color depth and the image content itself: a solid flat color results in a very lightweight GIF, while a random-color TV-noise animation would be quite heavy.
First I export each video frame as a PNG of the final GIF frame size (fixed, 384x216).
Then, to maximize gif quality I undertake several gif render attempts with slightly different parameters - varying number of frames and number of colors in the gif palette. The render that has the best quality while staying under the file size limit gets uploaded to Imgur.
Each render takes time and CPU resources — this I am looking to optimize.
Question: what could be a smart way to estimate the best render settings depending on the actual images, to fit as close as possible to the filesize limit, and at least minimize the number of render attempts to 2–3?
The GIF image format uses LZW compression, which became infamous when the owner of the algorithm's patent, Unisys, aggressively pursued royalty payments just as the image format got popular. It turned out well: we have that episode to thank for PNG.
The amount by which LZW can compress an image is extremely hard to predict and depends greatly on the image content. At best, you can give the user a heuristic that estimates the final file size, displayed, say, as a success prediction with a colored bar. You can color it pretty quickly by converting just the first frame: that won't take long on a 384x216 image; it runs in human time, a fraction of a second.
Then extrapolate the effective compression rate of that first frame to the subsequent frames, which ought to encode only small differences from the first frame and so ought to have comparable compression rates.
You can't truly know whether it exceeds the site's size limit until you've encoded the entire sequence. So be sure to emphasize in your UI design that your prediction is just an estimate, so your users aren't too disappointed. And of course give them tools to get the size down, something like nearest-neighbor interpolation that makes the pixels in the image bigger. Focusing on making the later frames smaller can pay off handsomely as well; GIF encoders don't normally do this well by themselves. YMMV.
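A rough sketch of the first-frame heuristic in Python: count the bits a simplified GIF-style LZW coder would emit for the first frame's palette indices, then multiply by the number of frames. `gif_lzw_bits` and `estimate_gif_bytes` are hypothetical helpers; they ignore the GIF header, palette, per-frame headers and sub-block length bytes, and a real encoder's table-reset policy may differ, so the figure is only a heuristic.

```python
import math


def gif_lzw_bits(indices, min_code_size=8):
    """Approximate bit count of GIF-style LZW output for a stream of palette indices.

    Simplified model: variable-width codes growing from min_code_size + 1 up to
    12 bits, a clear code at the start and whenever the 4096-entry table fills,
    and an end-of-information code at the end.
    """
    clear = 1 << min_code_size
    first_free = clear + 2  # clear and end-of-information codes are reserved
    table, next_code, width = {}, first_free, min_code_size + 1
    bits = width  # initial clear code
    prefix = None  # code of the longest match so far
    for k in indices:
        if prefix is None:
            prefix = k  # single indices are their own codes
            continue
        key = (prefix, k)
        if key in table:
            prefix = table[key]
            continue
        bits += width  # emit the code for the current prefix
        if next_code < 4096:
            table[key] = next_code
            next_code += 1
            if next_code == (1 << width) and width < 12:
                width += 1  # later codes need one more bit
        else:
            bits += width  # table full: emit a clear code and start over
            table, next_code, width = {}, first_free, min_code_size + 1
        prefix = k
    if prefix is not None:
        bits += width  # code for the final prefix
    return bits + width  # end-of-information code


def estimate_gif_bytes(first_frame, n_frames, min_code_size=8):
    """Extrapolate the first frame's compressed size to the whole animation."""
    return n_frames * math.ceil(gif_lzw_bits(first_frame, min_code_size) / 8)
```

A flat 384x216 frame yields far fewer bits than a busy one, which is the content dependence the answer describes; calibrating the extrapolation against a few real renders would tighten the estimate.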
There's no simple answer to this. Single-frame GIF size mainly depends on image entropy after quantization, and you could try using stddev as an estimator using e.g. ImageMagick:
identify -format "%[fx:standard_deviation]" imagename.png
You can very probably get better results by running a smoothing kernel on the image in order to eliminate some high-frequency noise that's unlikely to be informational, and very likely to mess up compression performance. This goes much better with JPEG than with GIF, anyway.
Then, in general, you want to run a great many samples in order to come up with something of this kind (let's say you have a single compression parameter Q):

| STDDEV | SIZE W/Q=1 | SIZE W/Q=2 | SIZE W/Q=3 | ... |
|---|---|---|---|---|
| value1 | v1,1 | v1,2 | v1,3 | ... |
After running several dozen tests (but you need to do this only once, not "at runtime"), you will get both an estimate of, say, the size at each Q as a function of stddev and of the size at Q=1, and a measure of its error. You'll then see that an image with stddev 0.45 that compresses to 108 KB when Q=1 will compress to 91 KB plus or minus 5 when Q=2, and 88 KB plus or minus 3 when Q=3, and so on.
At that point you get an unknown image, get its stddev and compression #Q=1, and you can interpolate the probable size when Q equals, say, 4, without actually running the encoding.
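A sketch of the calibrate-once, interpolate-at-runtime idea, assuming (purely for illustration) that for a fixed target Q the ratio size(Q)/size(Q=1) varies roughly linearly with stddev. The helper names and the calibration tuples are hypothetical; real calibration data would come from the offline test runs described above.

```python
import math


def fit_ratio_model(samples):
    """Least-squares fit of size(Q)/size(Q=1) against stddev.

    samples: (stddev, size_at_q1, size_at_q) tuples from offline calibration.
    Returns (slope, intercept, rms_residual).
    """
    xs = [s for s, _, _ in samples]
    ys = [sq / s1 for _, s1, sq in samples]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    rms = math.sqrt(sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / n)
    return slope, intercept, rms


def predict_size(model, stddev, size_at_q1):
    """Predicted size at the calibrated Q, plus a rough +/- from the fit residuals."""
    slope, intercept, rms = model
    return size_at_q1 * (intercept + slope * stddev), size_at_q1 * rms


# Hypothetical calibration runs for one target Q: (stddev, KB at Q=1, KB at Q).
model = fit_ratio_model([(0.2, 100, 90), (0.4, 100, 80), (0.6, 200, 140)])
estimate, plus_minus = predict_size(model, 0.45, 108)
```

The RMS residual gives a crude version of the "plus or minus" figure mentioned above; a nearest-neighbour lookup in the table or a richer model may fit real data better.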
While your service is active, you can store statistical data (i.e., after you really do the encoding, you store the actual results) to further improve estimation; after all you'd only store some numbers, not any potentially sensitive or personal information that might be in the video. And acquiring and storing those numbers would come nearly for free.
Backgrounds
It might be worthwhile to recognize videos with a fixed background; in that case you can make all the frames identical in some areas, so that the GIF animation algorithm does not need to store that information again. If you get such a video (e.g. a talking head), this could lead to huge savings, but it would throw the parameter estimation completely off unless you could also estimate the actual extent of the background area. In that case, let B be the background area and A the frame area: the compressed "image" size for five frames would be A+(A-B)*(5-1) instead of A*5, and you could apply this correction factor to the estimate.
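Generalising the five-frame example to n frames, the factor to apply to a per-frame estimate is as follows (assuming, as the answer does, that the background costs nothing after the first frame):

$$k = \frac{A + (A - B)(n - 1)}{nA}$$

For example, if the background covers 60% of the frame (B = 0.6A) and n = 5, then k = (1 + 0.4 × 4)/5 = 0.52, roughly half the naive estimate.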
Compression optimization
Then there are optimization techniques that slightly modify the image and adapt it for a better compression, but we'd stray from the topic at hand. I had several algorithms that worked very well with paletted PNG, which is similar to GIF in many regards, but I'd need to check out whether and which of them may be freely used.
Some thoughts: the LZW algorithm works through the image line by line, as one continuous stream. So whenever a sequence of N pixels is "less than X%" different (perceptually or arithmetically) from an already encountered sequence, rewrite the sequence:
018298765676523456789876543456787654
987678656755234292837683929836567273
here the 656765234 sequence in the first row is almost matched by the 656755234 sequence in the second row. By changing the mismatched 5 to 6, the LZW algorithm is likely to pick up the whole sequence and store it with one symbol instead of three (6567,5,5234) or more.
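A minimal sketch of this lossy pre-pass. The text's "less than X%" criterion is replaced here by a simple count of mismatching values (a perceptual colour distance would be a refinement), and `snap_rows`, `n` and `max_mismatches` are hypothetical names and defaults.

```python
def snap_rows(prev_row, row, n=9, max_mismatches=1):
    """Lossy pre-pass: replace windows of `row` that nearly match a window of `prev_row`."""
    out = list(row)
    i = 0
    while i + n <= len(out):
        window = out[i:i + n]
        for j in range(len(prev_row) - n + 1):
            candidate = list(prev_row[j:j + n])
            mismatches = sum(a != b for a, b in zip(window, candidate))
            if mismatches <= max_mismatches:
                out[i:i + n] = candidate  # no-op when it already matches exactly
                i += n  # skip past the rewritten run
                break
        else:
            i += 1  # no near match starting here; slide by one
    return out
```

Applied to the second row's `656755234` with the first row as reference, it returns `656765234`, the rewrite described above.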
Also, LZW works with bits, not bytes. This means, very roughly speaking, that the more the 0's and 1's are balanced, the worse the compression will be. The more unpredictable their sequence, the worse the results.
So if we can find a way of making the distribution more **a**symmetrical, we win.
And we can do it, and we can do it losslessly (the same works with PNG). We choose the most common colour in the image, once we have quantized it. Let that color be color index 0. That's 00000000, eight fat zeroes. Now we choose the most common colour that follows that one, or the second most common colour; and we give it index 1, that is, 00000001. Another seven zeroes and a single one. The next colours will be indexed 2, 4, 8, 16, 32, 64 and 128; each of these has only a single bit 1, all others are zeroes.
Since colors will be very likely distributed following a power law, it's reasonable to assume that around 20% of the pixels will be painted with the first nine most common colours; and that 20% of the data stream can be made to be at least 87.5% zeroes. Most of them will be consecutive zeroes, which is something that LZW will appreciate no end.
Best of all, this intervention is completely lossless; the reindexed pixels will still be the same colour, and only the palette is shifted accordingly. I developed such a codec for PNG some years ago, and in my use case (PNG street maps) it yielded very good results, a ~20% gain in compression. With more varied palettes and with the LZW algorithm the results will probably not be as good, but the processing is fast and not too difficult to implement.
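A sketch of the reindexing step, assuming an 8-bit (256-entry) palette so that indices up to 128 exist; `reindex_palette` is a hypothetical helper. It only permutes the palette, so every pixel keeps its colour; whether the output actually shrinks depends on the encoder, as noted above.

```python
from collections import Counter

SINGLE_BIT = [0, 1, 2, 4, 8, 16, 32, 64, 128]  # indices with at most one 1 bit


def reindex_palette(pixels, palette):
    """Permute a 256-entry palette so the commonest colours get indices 0, 1, 2, 4, ..., 128.

    pixels: list of palette indices. Returns (new_pixels, new_palette).
    """
    if len(palette) != 256:
        raise ValueError("sketch assumes an 8-bit (256-entry) palette")
    counts = Counter(pixels)
    by_freq = sorted(range(256), key=lambda i: -counts[i])  # stable: ties keep index order
    targets = SINGLE_BIT + [i for i in range(256) if i not in SINGLE_BIT]
    mapping = dict(zip(by_freq, targets))  # old index -> new index
    new_palette = [None] * 256
    for old, new in mapping.items():
        new_palette[new] = palette[old]
    return [mapping[p] for p in pixels], new_palette
```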

What are the steps in which loss takes place in jpeg compression?

JPEG is a lossy image compression method that can give a high compression ratio.
As far as I know, information loss takes place in JPEG during quantization.
Are there any other steps in JPEG compression where the loss takes place or can take place?
If it takes place, then where?
There are 3 aspects of JPEG compression which affect the quality and accuracy of images:
1) Loss of precision takes place during the quantization stage. Accuracy of the colors is lost in order to reduce the amount of data generated.
2) Errors are introduced during the conversion to/from the RGB/YCC color spaces, because the converted values are rounded to 8-bit integers.
3) Errors are introduced during the transformation to/from the frequency domain. The Discrete Cosine Transform converts pixels into the frequency domain, and because it is computed with finite precision and its results are rounded, this conversion incurs errors in both directions.
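To illustrate point 2, a sketch of an 8-bit round trip through YCbCr using the JFIF full-range conversion formulas, with rounding and clamping to 0..255 as an encoder and decoder would store the values. Even with no subsampling or quantization, some colours come back off by one; for example, pure blue (0, 0, 255) returns as (0, 0, 254). The function names are illustrative.

```python
def clamp8(v):
    """Round to the nearest integer and clamp to the 8-bit range."""
    return max(0, min(255, round(v)))


def rgb_to_ycbcr(r, g, b):
    """JFIF full-range RGB -> YCbCr, rounded to 8 bits as an encoder would store it."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return clamp8(y), clamp8(cb), clamp8(cr)


def ycbcr_to_rgb(y, cb, cr):
    """JFIF full-range YCbCr -> RGB, rounded back to 8 bits."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clamp8(r), clamp8(g), clamp8(b)


def max_roundtrip_error(step=15):
    """Largest per-channel error over an RGB grid after an 8-bit YCbCr round trip."""
    worst = 0
    for r in range(0, 256, step):
        for g in range(0, 256, step):
            for b in range(0, 256, step):
                back = ycbcr_to_rgb(*rgb_to_ycbcr(r, g, b))
                worst = max(worst, max(abs(p - q) for p, q in zip(back, (r, g, b))))
    return worst
```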
Another place where loss can take place in JPEG compression is the chroma subsampling stage.
My understanding is that most JPEG-compressed images use 4:2:0 color subsampling: after converting each pixel from RGB to YCbCr, the Cb values for a 2x2 block of pixels are averaged to a single value, and the Cr values for that 2x2 block of pixels are also averaged to a single value.
The JPEG standard also supports 4:4:4 (no downsampling).
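A small sketch of the 4:2:0 averaging described above, for one chroma plane stored as a list of rows with even dimensions. The upsampling here simply repeats each averaged value over its 2x2 block; real decoders often interpolate instead (libjpeg's "fancy upsampling", for instance), so the exact error differs, but the content dependence is the same: a constant plane survives unchanged, while the 2x2 block [[0, 255], [0, 255]] comes back as 128 everywhere, an error of up to 128 levels.

```python
def subsample_420(chroma):
    """Average each 2x2 block of a chroma plane (even dimensions assumed)."""
    return [[round((chroma[y][x] + chroma[y][x + 1] + chroma[y + 1][x] + chroma[y + 1][x + 1]) / 4)
             for x in range(0, len(chroma[0]), 2)]
            for y in range(0, len(chroma), 2)]


def upsample_420(small):
    """Repeat each value over a 2x2 block to restore full size."""
    out = []
    for row in small:
        wide = [v for v in row for _ in range(2)]
        out.extend([wide, list(wide)])
    return out


def max_chroma_error(chroma):
    """Largest absolute difference after subsampling and upsampling again."""
    restored = upsample_420(subsample_420(chroma))
    return max(abs(a - b) for ra, rb in zip(chroma, restored) for a, b in zip(ra, rb))
```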

Is JPG format good for image processing algorithms?

Most consumer cameras (phone cameras and webcams) provide lossy JPEG images as output.
While the loss may not be noticeable to the human eye, it could be critical for image processing algorithms.
If I am correct, what general approach do you take when analyzing input images?
(Please note: using an industry-standard camera may not be an option for hobbyist programmers.)
JPG is an entire family of implementations; there are actually 4 methods. The most common is the "normal" method, based on the Discrete Cosine Transform. It simply divides the image into 8x8 blocks and calculates the DCT of each block, which results in a list of coefficients. To store these coefficients efficiently, they are divided element-wise by another matrix (the quantization matrix) and rounded, so that the higher frequencies are usually rounded to zero. This is the main lossy step in the process. The reason this is done is to be able to store the coefficients more efficiently than before.
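A minimal sketch of that DCT and quantization step for a single 8x8 block, using a straightforward (slow) orthonormal DCT and the example luminance quantization table from Annex K of the JPEG standard. Real encoders use fast integer DCTs and quality-scaled tables, so this shows the principle rather than any particular codec's output: a flat block survives exactly, while a checkerboard, whose energy sits in high frequencies with large quantizer steps, does not.

```python
import math

# Example luminance table from Annex K (Table K.1);
# rows = vertical frequency u, columns = horizontal frequency v.
Q_LUMA = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def _c(k):
    """Orthonormal DCT-II scale factor."""
    return math.sqrt(1 / 8) if k == 0 else math.sqrt(2 / 8)


def _basis(pos, freq):
    return math.cos((2 * pos + 1) * freq * math.pi / 16)


def dct_8x8(block):
    return [[_c(u) * _c(v) * sum(block[y][x] * _basis(y, u) * _basis(x, v)
                                 for y in range(8) for x in range(8))
             for v in range(8)] for u in range(8)]


def idct_8x8(coeffs):
    return [[sum(_c(u) * _c(v) * coeffs[u][v] * _basis(y, u) * _basis(x, v)
                 for u in range(8) for v in range(8))
             for x in range(8)] for y in range(8)]


def jpeg_block_roundtrip(block, q=Q_LUMA):
    """Level-shift, DCT, quantize (the lossy step), dequantize, inverse DCT."""
    coeffs = dct_8x8([[p - 128 for p in row] for row in block])
    quant = [[round(coeffs[u][v] / q[u][v]) for v in range(8)] for u in range(8)]
    dequant = [[quant[u][v] * q[u][v] for v in range(8)] for u in range(8)]
    return [[max(0, min(255, round(p + 128))) for p in row] for row in idct_8x8(dequant)]


def max_abs_error(block, q=Q_LUMA):
    out = jpeg_block_roundtrip(block, q)
    return max(abs(a - b) for ra, rb in zip(block, out) for a, b in zip(ra, rb))
```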
So your question is not easily answered. It also depends on the size of the input: if you have a sufficiently large image (say 3000x2000) stored at a relatively high quality, you will have no trouble with artefacts, whereas a small image with a high compression ratio might cause trouble.
Remember though that an image taken with a camera contains a lot of noise, which in itself is probably far more troubling than the jpg compression.
In my work I usually converted all images to PGM format, which is an uncompressed format. This ensures that if I process the image in a pipeline fashion, none of the intermediate steps suffer from JPEG compression.
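Writing binary PGM (P5) needs no library, which is part of its appeal for pipelines: a short text header followed by one byte per pixel. A minimal sketch for 8-bit grayscale; for colour the analogous binary PPM (P6) format would be used. `to_pgm_bytes` and `write_pgm` are illustrative names.

```python
def to_pgm_bytes(pixels):
    """Encode 8-bit grayscale rows as binary PGM (P5): text header + raw bytes."""
    height, width = len(pixels), len(pixels[0])
    if any(len(row) != width for row in pixels):
        raise ValueError("all rows must have the same width")
    header = b"P5\n%d %d\n255\n" % (width, height)
    # bytes(row) raises ValueError for values outside 0..255.
    return header + b"".join(bytes(row) for row in pixels)


def write_pgm(path, pixels):
    with open(path, "wb") as f:
        f.write(to_pgm_bytes(pixels))
```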
Keep in mind that operations such as rotation, scaling, and repeated saving of JPG cause data loss each iteration.

How to estimate the size of JPEG image which will be scaled down

For example, I have a 1024x768 JPEG image. I want to estimate the file size of the image after it is scaled down to 800x600 or 640x480. Is there any algorithm to calculate the size without generating the scaled image?
I took a look at the resize dialog in Photoshop. The size it shows is basically (width pixels * height pixels * bits/pixel), which is hugely different from the actual file size.
I have a mobile image browser application which allows users to send images by email, with options to scale the image down. We provide check boxes for the user to choose the down-scaled resolution, with the estimated size. For large images (> 10MB), we have 3 down-scaled sizes to choose from. If we generate a cached image for each option, it may use too much memory. We are trying to find the best solution that avoids this memory consumption.
I have successfully estimated the scaled size based on the DQT (the quality factor).
I conducted some experiments and found that if we use the same quality factor as in the original JPEG image, the scaled image's size is roughly (scale factor * scale factor) times the original image size. The quality factor can be estimated from the DQT defined in every JPEG image; algorithms have been defined to estimate the quality factor based on the standard quantization table shown in Annex K of the JPEG spec.
Although other factors like color subsampling, different compression algorithm and the image itself will contribute to error, the estimation is pretty accurate.
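A sketch of this approach under stated assumptions: it assumes the encoder scaled the Annex K luminance table the way the IJG libjpeg library does for qualities 1 to 100 (other encoders use different tables, and the asker's exact algorithm is not shown), and it applies the (scale factor)² rule. `scaled_table`, `estimate_quality` and `estimate_scaled_size` are hypothetical helpers.

```python
# Annex K (Table K.1) luminance table, natural (row-major) order.
K1_LUMA = [
    16, 11, 10, 16, 24, 40, 51, 61,
    12, 12, 14, 19, 26, 58, 60, 55,
    14, 13, 16, 24, 40, 57, 69, 56,
    14, 17, 22, 29, 51, 87, 80, 62,
    18, 22, 37, 56, 68, 109, 103, 77,
    24, 35, 55, 64, 81, 104, 113, 92,
    49, 64, 78, 87, 103, 121, 120, 101,
    72, 92, 95, 98, 112, 100, 103, 99,
]


def scaled_table(quality, base=K1_LUMA):
    """Scale a base table the way IJG libjpeg does for quality 1..100 (baseline clamp)."""
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    return [min(255, max(1, (b * scale + 50) // 100)) for b in base]


def estimate_quality(dqt_luma):
    """Quality whose libjpeg-scaled K.1 table best matches dqt_luma.

    dqt_luma must be in natural order; tables read straight from a DQT
    segment are stored in zigzag order and need reordering first.
    """
    return min(range(1, 101),
               key=lambda q: sum(abs(a - b) for a, b in zip(scaled_table(q), dqt_luma)))


def estimate_scaled_size(original_bytes, original_width, new_width):
    """Apply the (scale factor)^2 rule, assuming the aspect ratio is preserved."""
    s = new_width / original_width
    return original_bytes * s * s
```

For a 1,000,000-byte 1024x768 original scaled to 800x600, the rule gives about 610,000 bytes.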
P.S. Examining JPEGSnoop and its source code helped me a lot :-)
Cheers!
Like everyone else said, the best algorithm to determine what sort of JPEG compression you'll get is the JPEG compression algorithm.
However, you could also calculate the Shannon entropy of your image, in order to try and understand how much information is actually present. This might give you some clues as to the theoretical limits of your compression, but is probably not the best solution for your problem.
This concept will help you measure the difference in information between an all-white image and that of a crowd, which is related to its compressibility.
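A minimal sketch of first-order Shannon entropy over pixel values, in bits per pixel. This histogram entropy ignores spatial structure, so it is only a loose indicator of JPEG compressibility; `shannon_entropy` is an illustrative name.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """First-order entropy in bits per value: -sum(p * log2(p)) over the histogram."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

An all-white image scores 0 bits per pixel; a busy photograph scores much higher, up to 8 bits for 8-bit data.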
-Brian J. Stinar-
Why estimate what you can measure?
In essence, it's impossible to provide any meaningful estimate due to the fact that different types of images (in terms of their content) will compress very differently using the JPEG algorithm. (A 1024x768 pure white image will be vastly smaller than a photograph of a crowd scene for example.)
As such, if you're after an accurate figure it would make sense to simply carry out the re-size.
Alternatively, you could just provide a range such as "40KB to 90KB", based on an "average" set of images.
I think what you want is unusual and difficult to do. At the same JPEG compression level, some images are heavier (larger) than others.
My hunch for JPEG images: given two images at the same resolution, compressed at the same quality, the one taking up less space will (in general) shrink more when its resolution is reduced.
Why? From experience: many times when working with a set of images, I have seen that if a thumbnail is occupying significantly more memory than most others, reducing its resolution has almost no change in the size (memory). On other hand, reducing resolution of one of the average size thumbnails reduces the size significantly. (all parameters like original/final resolution and JPEG quality being the same in the two cases).
Roughly speaking: the higher the entropy, the less changing the resolution will affect the image's file size (at the same JPEG quality).
If you can verify this with experiments, maybe you can use it as a quick method to estimate the size. If my language is confusing, I can explain with some mathematical notation or a pseudo-formula.
An 800*600 image file should be roughly (800*600)/(1024*768) times as large as the 1024*768 image file it was scaled down from. But this is really a rough estimate, because the compressibility of original and scaled versions of the image might be different.
Before I attempt to answer your question, I'd like to join the ranks of people that think it's simpler to measure rather than estimate. But it's still an interesting question, so here's my answer:
Look at the block DCT coefficients of the input JPEG image. Perhaps you can find some sort of relationship between the number of higher frequency components and the file size after shrinking the image.
My hunch: all other things (e.g. quantization tables) being equal, the more high-frequency components your original image has, the bigger the difference in file size between the original and the shrunk image will be.
I think that by shrinking the image, you will reduce some of the higher frequency components during interpolation, increasing the possibility that they will be quantized to zero during the lossy quantization step.
If you go down this path, you're in luck: I've been playing with JPEG block DCT coefficients and put some code up to extract them.

Is there a quality, file-size, or other benefit to JPEG sizes being multiples of 8px or 16px?

The JPEG compression encoding process splits a given image into blocks of 8x8 pixels and works with these blocks in the subsequent lossy and lossless compression steps.
It is also mentioned that if the image dimensions are a multiple of the MCU (Minimum Coded Unit, 'usually 16 pixels in both directions'), lossless alterations to a JPEG can be performed.
I am working with product images and would like to know both if, and how much benefit can be derived from using multiples of 16 in my final image size (say, using an image with size 480px by 360px) vs. a non-multiple of 16 (such as 484x362). In this example I am not interested in further alterations, editing, or recompression of the final image.
To try to get closer to a specific answer, where I know there must be largely generalities: given a 480x360 image that is 64 KB and saved at maximum quality in Photoshop:
Can I expect any quality loss from an image that is 484x362?
What file size increase can I expect (for this example, the additional space would be white pixels)?
Are there any other disadvantages to growing larger than the 8px grid?
I know it's arbitrary to use that specific example, but it would still be helpful (for me and potentially anyone else pondering an image size) to understand what level of compromise I'd be dealing with in breaking the 8px grid.
The key issue here, and the subject of a debate I've had, is whether images with dimensions divisible by 8 pixels are higher quality than images that are not.
8 pixels is the cutoff, because JPEG images are simply an array of 8x8 DCT blocks; if the image resolution isn't mod8 in both directions, the encoder has to pad the sides up to the next mod8 resolution. In practice this is not very expensive bit-wise; much worse are the cases where an image has sharp black lines (such as a letterboxed image) that don't lie on block boundaries. This is especially problematic in video encoding. It is a problem because the frequency transform of a sharp edge spreads its energy across many coefficients instead of concentrating it in a few, which takes an enormous number of bits to code.
For those curious, the most common method of padding edges in intra compression (such as JPEG images) is to mirror the lines of pixels before the edge. For example, if you need to pad three lines and line X is the edge, line X+1 is equal to line X, line X+2 is equal to line X-1, and line X+3 is equal to line X-2. This quite effectively minimizes the cost in transform coefficients of the extra lines.
In inter coding, however, the padding algorithms generally simply duplicate the last line, because the mirror method does not work well for inter compression, such as in video compression.
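The two padding schemes described above can be sketched as follows, treating each list element as one line of pixels; `pad_rows` and `padded_length` are hypothetical helpers, and which scheme a given encoder actually uses is not something this sketch can tell you.

```python
def padded_length(n, block=8):
    """Round n up to the next multiple of block."""
    return -(-n // block) * block


def pad_rows(rows, target, mode="mirror"):
    """Extend rows to length target.

    mirror:    X+1 = X, X+2 = X-1, X+3 = X-2, ... (X is the last real line)
    replicate: every extra line repeats the last line
    """
    out = list(rows)
    last = len(rows) - 1
    for k in range(target - len(rows)):
        if mode == "mirror":
            out.append(rows[max(last - k, 0)])  # falls back to line 0 for very short images
        elif mode == "replicate":
            out.append(rows[last])
        else:
            raise ValueError(f"unknown mode: {mode}")
    return out
```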
Sometimes you need to use 16-pixel boundaries rather than 8 because of chroma subsampling: with 4:2:0, every 2nd chroma sample in each direction is discarded (averaged away) during encoding, so those 8x8 chroma DCT blocks started out as 16x16 pixels and will decode back to 16x16. This won't be a problem at the highest quality settings.
A JPG whose dimensions are multiples of 8 can also be rotated/flipped with no quality loss. For example, gthumb can do this on Linux.
The image dimensions being multiples of 8 or 16 is not going to affect the size on disk very much, but you can get dramatic savings if you can line up the visual contents to the 8x8 pixel grid, such as if there is a repeating pattern or texture in the image.
What Tometzky said. If you don't have the correct multiple, the lossless flip and rotate algorithms don't work. That's because the padding on the right/bottom that can be safely ignored now ends up on the left/top, where it can't.
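A small sketch of the MCU arithmetic behind this: the MCU is 8*h_max by 8*v_max pixels, where h_max and v_max are the largest sampling factors in the frame header (2 and 2 for 4:2:0, 1 and 1 for 4:4:4). The check tests both dimensions, as a 90° rotation needs; the helper names are hypothetical.

```python
def mcu_size(h_max, v_max):
    """MCU width and height in pixels for the largest sampling factors in a frame."""
    return 8 * h_max, 8 * v_max


def mcu_aligned(width, height, h_max=2, v_max=2):
    """True if both dimensions are whole multiples of the MCU (default: 4:2:0)."""
    mw, mh = mcu_size(h_max, v_max)
    return width % mw == 0 and height % mh == 0
```

With 4:2:0, the asker's 480x360 example is not MCU-aligned vertically (360 is not a multiple of 16), while with 4:4:4 it is aligned.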

Resources