MPEG-1 Video Compression Quantization

MPEG-1 quantizes both inter-frames and intra-frames; however, I am curious why the quantization table for inter-frames consists entirely of constants, whereas the default quantization table for intra-frames does not.

This is because inter-frames contain only the error residuals left after subtracting the predicted image from the source image. Such residuals have a nearly uniform frequency distribution (if the prediction, i.e. the motion estimation, is accurate). For these frames we use uniform quantization matrices.
In contrast, intra-frames have very high power in the low frequencies; they are hard to compress because of their enormous amplitudes. For these unpredicted frames we use quantization matrices weighted toward the low frequencies (fine steps for low frequencies, coarse steps for high frequencies).
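For concreteness, here is a minimal sketch in Python (NumPy assumed) contrasting the two default matrices. The flat non-intra matrix is the constant 16; the intra values below are the commonly quoted MPEG-1 defaults (the normative table is in ISO/IEC 11172-2), and the divide-and-round step is a simplification of the real MPEG-1 quantizer arithmetic, not the exact rule.

```python
import numpy as np

# Default MPEG-1 non-intra (inter) quantization matrix: a flat constant,
# because prediction residuals have a roughly uniform frequency distribution.
inter_quant = np.full((8, 8), 16)

# Commonly quoted MPEG-1 default intra quantization matrix: fine steps for the
# low frequencies (top-left), coarse steps for the high frequencies (bottom-right).
intra_quant = np.array([
    [ 8, 16, 19, 22, 26, 27, 29, 34],
    [16, 16, 22, 24, 27, 29, 34, 37],
    [19, 22, 26, 27, 29, 34, 34, 38],
    [22, 22, 26, 27, 29, 34, 37, 40],
    [22, 26, 27, 29, 32, 35, 40, 48],
    [26, 27, 29, 32, 35, 40, 48, 58],
    [26, 27, 29, 34, 38, 46, 56, 69],
    [27, 29, 35, 38, 46, 56, 69, 83],
])

def quantize(dct_block, quant_matrix):
    """Simplified element-wise quantization of an 8x8 block of DCT coefficients."""
    return np.round(dct_block / quant_matrix)
```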

Related

Gamma Correction and Luminance Masking

As per Weber's law, ΔL/L is a constant, where L is luminance measured in cd/m², i.e. (L2 - L1)/L1 is constant. This implies that a small change in the lower luminance range (darker) is perceptually much larger than the same change in the higher luminance range (brighter).
The sRGB images which we have stored are gamma corrected, i.e. they first undergo a non-linear transfer function which also partially simulates human perception.
I would like to know what happens to luminance masking after gamma correction. Does Weber's law still hold on these sRGB images, or are they perceptually uniform, i.e. is 1 unit of difference in pixel value the same whether it is in a darker region or in a brighter region? In other words, is ΔL constant in gamma-corrected images, where L is the gamma-corrected pixel value?
Weber's law does not apply to sRGB-coded values to the extent it applies to luminance. In other words, an sRGB value is closer to being perceptually uniform than luminance in cd/m².
To answer your question, I would NOT expect Δ(sRGB-coded pseudo-L) to be (even vaguely) constant.
However, keep in mind that both Weber-Fechner and sRGB are coarse approximations to perception. CIECAM02 is a more modern alternative worth exploring.
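As a rough numerical check (a sketch in Python with NumPy, not part of the original answer), one can decode sRGB values back to linear values with the standard IEC 61966-2-1 transfer function and compare the Weber fraction ΔL/L produced by the same one-code-value step in a dark and in a bright region:

```python
import numpy as np

def srgb_to_linear(s):
    """Inverse sRGB transfer function: encoded value in [0, 1] -> linear value."""
    s = np.asarray(s, dtype=float)
    return np.where(s <= 0.04045, s / 12.92, ((s + 0.055) / 1.055) ** 2.4)

# Same 1/255 step in sRGB code value, once in a dark and once in a bright region.
for code in (0.10, 0.80):
    L1 = srgb_to_linear(code)
    L2 = srgb_to_linear(code + 1 / 255)
    print(f"sRGB code {code:.2f}: Weber fraction dL/L = {(L2 - L1) / L1:.4f}")
```

The two printed fractions differ by several times, so a fixed step in coded value does not correspond to a fixed Weber fraction, which is consistent with sRGB being only a coarse approximation to perceptual uniformity.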

Uncertainty in L,a,b space of compressed JPEG images

My team wish to calculate the contrast between two photographs taken in a wet environment.
We will calculate contrast using the formula
Contrast = SQRT((ΔL)^2 + (Δa)^2 + (Δb)^2)
where ΔL is the difference in luminosity, Δa is the difference in redness-greenness and Δb is the difference in yellowness-blueness, which are the dimensions of Lab space.
Our (so far successful) approach has been to convert each pixel from RGB to Lab space and to take the mean values of the relevant sections of the image as our A and B variables.
However the environment limits us to using a (waterproof) GoPro camera which compresses images to JPEG format, rather than saving as TIFF, so we are not using a true-colour image.
We now need to quantify the uncertainty in the contrast - for which we need to know the uncertainty in A and B and by extension the uncertainties (or mean/typical uncertainty) in each a and b value for each RGB pixel. We can calculate this only if we know the typical/maximum uncertainty produced when converting from true-colour to JPEG.
Therefore we need to know the maximum possible difference in each of the RGB channels when saving in JPEG format.
E.g. if the true-colour RGB pixel (5, 7, 9) became (2, 9, 13) after compression, the uncertainty in each channel would be (±3, ±2, ±4).
We believe that the camera subsamples colour in the ratio 4:2:0 - is there a way to test this?
However, our main question is: is there any way of knowing the maximum possible error in each channel, or of calculating the uncertainty from the compressed RGB result?
Note: We know it is impossible to convert back from JPEG to TIFF as JPEG compression is lossy. We merely need to quantify the extent of this loss on colour.
In short, it is not possible to absolutely quantify the maximum possible difference in digital counts in a JPEG image.
You already highlight one of the reasons for this: when image data is encoded using the JPEG standard, it is first converted to the YCbCr color space.
Once in this color space, the chroma channels (Cb and Cr) are downsampled, because the human visual system is less sensitive to artifacts in chroma information than it is to lightness information.
The error introduced here is content-dependent; an area of very rapidly varying chroma and hue will have considerably more content loss than an area of constant hue/chroma.
Even knowing the 4:2:0 subsampling, which describes the amount and geometry of the downsampling, the content still dictates the error introduced at this step.
Another problem is the quantization performed in JPEG compression.
The resulting information is encoded using a Discrete Cosine Transform. In the transformed space, the results are again quantized depending on the desired quality. This quantization is set at the time of file generation, which is performed in-camera. Again, even if you knew the exact DCT quantization being performed by the camera, the actual effect on RGB digital counts is ultimately content-dependent.
Yet another difficulty is noise created by DCT block artifacts, which (again) is content dependent.
These scene dependencies make the algorithm very good for visual image compression, but very difficult to characterize absolutely.
However, there is some light at the end of the tunnel. JPEG compression will cause significantly more error in areas of rapidly changing image content. Areas of constant color and texture will have significantly less compression error and artifacts. Depending on your application you may be able to leverage this to your benefit.
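One pragmatic (though approximate) way to get a feel for the error magnitude for your particular content is to round-trip the already-decoded image through JPEG again and measure the per-channel differences. The sketch below uses Pillow and NumPy; the file name and the quality setting are assumptions, since the camera's real encoder settings are unknown, and the numbers characterise typical error for this content rather than a true maximum.

```python
import io
import numpy as np
from PIL import Image

def jpeg_roundtrip_error(path, quality=85):
    """Re-encode a decoded image as JPEG and report per-channel max/mean |error|.

    This cannot recover the loss the camera already introduced; it only gives an
    empirical estimate of how large JPEG errors are for this kind of content.
    """
    src = np.asarray(Image.open(path).convert("RGB"), dtype=np.int16)
    buf = io.BytesIO()
    Image.fromarray(src.astype(np.uint8)).save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    out = np.asarray(Image.open(buf).convert("RGB"), dtype=np.int16)
    diff = np.abs(out - src)
    return diff.max(axis=(0, 1)), diff.mean(axis=(0, 1))

# Hypothetical usage:
# max_err, mean_err = jpeg_roundtrip_error("gopro_frame.jpg", quality=85)
# print("per-channel max error:", max_err, "  per-channel mean error:", mean_err)
```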

What are the steps in which loss takes place in jpeg compression?

JPEG is a lossy image compression format that can give a high compression ratio.
As far as I know, information loss takes place in JPEG during quantization.
Are there any other steps in JPEG compression where the loss takes place or can take place?
If it takes place, then where?
There are 3 aspects of JPEG compression which affect the quality and accuracy of images:
1) Loss of precision takes place during the quantization stage. Accuracy of the colors is lost in order to reduce the amount of data generated.
2) Errors are introduced during the conversion to/from the RGB/YCC color spaces (see the sketch after this list).
3) Errors are introduced during the transformation to/from the frequency domain. The Discrete Cosine Transform converts pixels into the frequency domain, and because it is carried out with finite-precision arithmetic, this conversion incurs errors in both directions.
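To make point 2 concrete, here is a small Python/NumPy sketch (an illustration, not from the original answer) that round-trips 8-bit RGB through the JFIF YCbCr conversion with rounding at each stage and reports the resulting per-channel error:

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Forward JFIF colour conversion (floating point)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299    * r + 0.587    * g + 0.114    * b
    cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128
    cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128
    return np.stack([y, cb, cr], axis=-1)

def ycbcr_to_rgb(ycc):
    """Inverse JFIF colour conversion."""
    y, cb, cr = ycc[..., 0], ycc[..., 1] - 128, ycc[..., 2] - 128
    r = y + 1.402    * cr
    g = y - 0.344136 * cb - 0.714136 * cr
    b = y + 1.772    * cb
    return np.stack([r, g, b], axis=-1)

# Round-trip 8-bit RGB through YCbCr with integer rounding at each stage,
# the way a typical codec stores the intermediate values.
rgb = np.random.default_rng(0).integers(0, 256, size=(64, 64, 3)).astype(float)
ycc = np.round(rgb_to_ycbcr(rgb)).clip(0, 255)
back = np.round(ycbcr_to_rgb(ycc)).clip(0, 255)
print("max per-channel rounding error:", np.abs(back - rgb).max(axis=(0, 1)))
```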
Another place where loss can take place in JPEG compression is the chroma subsampling stage.
My understanding is that most JPEG-compressed images use 4:2:0 color subsampling: after converting each pixel from RGB to YCbCr, the Cb values for a 2x2 block of pixels are averaged to a single value, and the Cr values for that 2x2 block of pixels are also averaged to a single value.
The JPEG standard also supports 4:4:4 (no downsampling).
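The 2x2 averaging described above can be sketched in a few lines of Python/NumPy (an illustration, not from the original answer):

```python
import numpy as np

def subsample_420(chroma):
    """Average each 2x2 block of a chroma plane to a single value (4:2:0 style).

    Assumes the plane has even height and width.
    """
    h, w = chroma.shape
    return chroma.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample_420(chroma_small):
    """Nearest-neighbour reconstruction back to full resolution."""
    return np.repeat(np.repeat(chroma_small, 2, axis=0), 2, axis=1)

cb = np.random.default_rng(1).integers(0, 256, size=(8, 8)).astype(float)
err = np.abs(upsample_420(subsample_420(cb)) - cb)
print("max chroma error introduced by 2x2 averaging:", err.max())
```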

is jpg format good for image processing algorithms

Most non-professional cameras (phone cameras and webcams) provide lossy JPEG images as output.
While the loss may not be noticeable to a human eye, it could be critical for image processing algorithms.
If I am correct, what is the general approach you take when analyzing such input images?
(Please note: using an industry-standard camera may not be an option for hobbyist programmers.)
JPEG is an entire family of methods; there are actually four modes of operation (sequential/baseline, progressive, lossless, and hierarchical). The most common is the "normal" baseline method, based on the Discrete Cosine Transform. This simply divides the image into 8x8 blocks and calculates the DCT of each block, which results in a list of coefficients. To store these coefficients efficiently, they are divided element-wise by another matrix (the quantization matrix) and rounded, such that the higher frequencies are usually rounded to zero. This is the main lossy step in the process; the reason it is done is to be able to store the coefficients more compactly than before.
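To make the quantization rounding concrete, here is a small Python/NumPy sketch (not part of the original answer; the quantization matrix below is a made-up example, not a standard JPEG table):

```python
import numpy as np

def dct2_matrix(n=8):
    """Orthonormal DCT-II basis matrix for an n-point transform."""
    k = np.arange(n)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

C = dct2_matrix(8)

# A smooth 8x8 block (a diagonal ramp), centred around zero as a JPEG codec does.
block = np.add.outer(np.arange(8.0), np.arange(8.0)) * 16 - 128
coeffs = C @ block @ C.T                      # 2-D DCT of the block

# Made-up quantization matrix: larger (coarser) steps for higher frequencies.
quant = 16 + 8 * (np.arange(8)[:, None] + np.arange(8)[None, :])
quantized = np.round(coeffs / quant)          # this rounding is where the loss happens

print("non-zero coefficients before quantization:", np.count_nonzero(np.round(coeffs)))
print("non-zero coefficients after quantization :", np.count_nonzero(quantized))
```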
So, your question is not answered very easily. It also depends on the size of the input: if you have a sufficiently large image (say 3000x2000) stored at a relatively high quality, you will have no trouble with artefacts, whereas a small image with a high compression rate might cause trouble.
Remember though that an image taken with a camera contains a lot of noise, which in itself is probably far more troubling than the jpg compression.
In my work I usually convert all images to the PGM format, which is a raw (uncompressed) format. This ensures that when I process an image in a pipeline fashion, the intermediate steps do not suffer from additional JPG compression.
Keep in mind that operations such as rotation, scaling, and repeated saving of JPG cause data loss each iteration.

Does JPEG use a row-major compression algorithm?

A JPEG image, if it is non-progressive, loads from top to bottom, and not from left to right or in any other manner.
Doesn't that imply that JPEG uses some row-wise compression technique? Does it?
No, it doesn't. JPEG mainly consists of chroma channel subsampling, a discrete cosine transform (DCT) with quantization, and some lossless compression such as run-length encoding. The image is divided into blocks, usually 8x8 pixels, and then transformed into a frequency-domain representation via the DCT. In a non-progressive JPEG, these blocks are stored left to right, top to bottom. In a progressive JPEG, the lower-frequency components are stored before the higher ones, allowing a low-resolution preview to be viewed before the whole image has been transmitted.
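As a small experiment supporting this (a sketch using Pillow and NumPy, not part of the original answers), one can save the same pixels once as a baseline and once as a progressive JPEG; the fully decoded images should match, since only the order in which the coefficient data is written differs:

```python
import io
import numpy as np
from PIL import Image

# A synthetic test image (any RGB image would do).
rng = np.random.default_rng(3)
img = Image.fromarray(rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8))

# Save the same pixels once as baseline and once as progressive JPEG.
baseline, progressive = io.BytesIO(), io.BytesIO()
img.save(baseline, format="JPEG", quality=90, progressive=False)
img.save(progressive, format="JPEG", quality=90, progressive=True)
baseline.seek(0)
progressive.seek(0)

# Progressive mode only reorders how the quantized DCT coefficients are written,
# so the fully decoded pixels should be the same; only the loading order differs.
a = np.asarray(Image.open(baseline))
b = np.asarray(Image.open(progressive))
print("decoded pixels identical:", np.array_equal(a, b))
```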
Since you can rotate a JPEG by 90 degrees quickly and losslessly, I think it's not row-major compression. It's just that the compressed blocks are stored in some order, and that order happens to be row by row.
