Compression method effectiveness varies hugely - image

I decided to try an image compression idea I'd had (working from pixel RGB values) for a bit of extra credit in class. I've finished, but I find the level of compression I get varies HUGELY from image to image. With this image, I'm getting a file size 1.25x the size of the corresponding PNG. With this image, however, I'm getting a file size 22.5x the size of the PNG.
My compression works by first assigning each color in the image an int (starting from 0), then using that int rather than the actual color in the file. The file is formatted as:
0*qj8c1*50i2p2*pg93*9zlds4*2rk5*4ok4r6*8mv1w7*2r25l8*3m89o9*9yp7c10*111*2clz112*g1j13*2w34z14*auq15*3zhg616*mmhc17*5lgsi18*25lw919*7ip84+0!0!0!0!0!0!0!0!0!0!0!0!0!1!1!1!#2!2!2!2!2!2!2!2!2!2!2!2!2!3!3!3!#4!4!4!4!4!4!4!4!4!4!4!4!4!5!5!5!#6!6!6!6!6!6!6!6!6!6!6!6!6!0!0!0!#3!3!3!3!3!3!3!3!3!3!3!3!3!2!2!2!#1!1!1!1!1!1!1!1!1!1!1!1!1!4!4!4!#7!7!7!7!7!7!7!7!7!7!7!7!7!6!6!6!#5!5!5!5!5!5!5!5!5!5!5!5!5!8!8!8!#9!9!9!9!9!9!9!9!9!9!9!9!9!10!10!10!#8!8!8!8!8!8!8!8!8!8!8!8!8!11!11!11!#12!12!12!12!12!12!12!12!12!12!12!12!12!7!7!7!#13!13!13!13!13!13!13!13!13!13!13!13!13!14!14!14!#15!15!15!15!15!15!15!15!15!15!15!15!15!16!16!16!#17!17!17!17!17!17!17!17!17!17!17!17!17!13!13!13!#18!18!18!18!18!18!18!18!18!18!18!18!18!15!15!15!#10!10!10!10!10!10!10!10!10!10!10!10!10!19!19!19!#
Where the first part (with the *s and base36 numbers) is the dictionary defining the colors, and the second part, separated by !s, is the actual image.
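A minimal Python sketch of the layout described above, inferred from the sample file rather than taken from the poster's code. The function names, the numbering of colours in order of first appearance, and the packing of a colour as (r << 16) | (g << 8) | b are all assumptions made for illustration; the sample suggests the original packs colours differently.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base36(n):
    """Convert a non-negative integer to a lowercase base-36 string."""
    if n == 0:
        return "0"
    out = []
    while n > 0:
        n, rem = divmod(n, 36)
        out.append(DIGITS[rem])
    return "".join(reversed(out))


def encode(rows):
    """Encode rows of (r, g, b) tuples as '<dictionary>+<image>'.

    Dictionary entries are written as 'index*base36colour' with no separator
    between entries (as the sample appears to do). Each pixel is written as
    'index!' and each row is closed with '#'.
    """
    index_of = {}  # colour -> int, numbered in order of first appearance (assumption)
    body = []
    for row in rows:
        for colour in row:
            if colour not in index_of:
                index_of[colour] = len(index_of)
            body.append(f"{index_of[colour]}!")
        body.append("#")
    dictionary = "".join(
        # Hypothetical packing of a colour into one integer
        f"{i}*{to_base36((r << 16) | (g << 8) | b)}"
        for (r, g, b), i in index_of.items()
    )
    return dictionary + "+" + "".join(body)


print(encode([[(0, 0, 0), (0, 0, 1)]]))
```

Because the indices are written as decimal text, a pixel whose index has d digits costs d + 1 characters. Any pixel whose index is 100 or higher therefore costs at least 4 characters, more than the 3 bytes of raw RGB, before the dictionary is even counted.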
Why am I seeing the level of compression vary so hugely from image to image? Is there a flaw in my compression algorithm?
Edit: The fact that the actual level of compression is poor compared to JPEG or PNG isn't an issue; I wasn't expecting to rival any major formats.
Thanks

Related

My .jpg file is larger than .png?

I used Microsoft Paint to create a 15248 x 6552 solid color picture. I saved it as both .png and .jpg and was expecting the .jpg to be smaller than the .png, but it was not.
The .jpg file is 1.49MB, while the .png is 391KB. Shouldn't JPEG, being a lossy compression, technically be smaller in size?
I read somewhere that .png is better for solid colors etc., so I downloaded a picture from the web (not a solid color) and used Paint to save it in both formats. This time the JPEG was smaller than the PNG. Is it solely due to the gradient of colors? If so, then why?
Even if the picture is a solid color, shouldn't JPG encoding be able to compress it even better?
It's to be expected that PNG performs better than JPEG in this scenario.
As pointed out in another answer, PNG does a per-line pixel prediction, followed by zlib compression. If the image has a single colour, the prediction will produce a constant zero value for all the pixels, except for the start of each row. Hence the compression will be very effective. I'd bet that if the image were "landscape" (6552 x 15248 instead of 15248 x 6552) the compression would be even a little better.
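The prediction step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it applies PNG's Sub filter (filter type 1) to every row, whereas a real encoder may choose a different filter per row, and the image size and colour are made up.

```python
import zlib


def sub_filter(row, bpp):
    """PNG filter type 1 (Sub): each byte minus the byte bpp positions to its left, mod 256."""
    return bytes(
        (row[i] - (row[i - bpp] if i >= bpp else 0)) & 0xFF
        for i in range(len(row))
    )


width, height, bpp = 64, 64, 3         # hypothetical small RGB image
row = bytes([200, 30, 30]) * width     # one scanline of a single solid colour
filtered_row = sub_filter(row, bpp)    # first pixel kept, every later byte is 0

raw = row * height
# Each filtered scanline is prefixed with its filter-type byte (1 = Sub)
filtered = (b"\x01" + filtered_row) * height

print(len(raw), len(zlib.compress(filtered)))
```

After filtering, only the first pixel of each row is non-zero, so the stream handed to zlib is almost entirely zeros and shrinks to a small fraction of the raw size.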
JPEG compression, instead, divides the image into blocks of 8x8 pixels, and for each one it attempts to quantize the low-frequency components finely and the high-frequency components coarsely. This works nicely for "natural" (photographic or rendered) images, but not so nicely for images with few colors (or a single one!).
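To give a feel for what "frequency components" means here, the sketch below applies an unnormalised one-dimensional DCT-II to two rows of 8 samples. It is a simplification: JPEG uses a two-dimensional transform on each 8x8 block followed by quantization, neither of which is reproduced here, and the sample values are made up.

```python
import math


def dct_1d(samples):
    """Unnormalised DCT-II of a sequence of samples."""
    n = len(samples)
    return [
        sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(samples))
        for k in range(n)
    ]


def count_nonzero(coeffs, eps=1e-6):
    """Number of coefficients whose magnitude exceeds eps."""
    return sum(1 for c in coeffs if abs(c) > eps)


flat = [128] * 8                          # a row from a flat-coloured block
edge = [0, 0, 0, 0, 255, 255, 255, 255]   # a row containing a hard edge

print(count_nonzero(dct_1d(flat)))
print(count_nonzero(dct_1d(edge)))
```

The flat row keeps only its first (DC) coefficient, while the row with a hard edge spreads into several higher-frequency coefficients. Those are the ones JPEG quantizes coarsely, which is where artifacts around sharp edges come from.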
See some comparisons here.
Not necessarily.
PNG is a prediction-based algorithm, which means that it tries to deduce the value of one pixel based on previously coded pixels. I bet the prediction is really accurate for a solid image, hence the very good results.
JPEG accepts different "quality levels" which determine the size of your compressed file. The size differences between your experiment and the web version are likely due to that (unless you're downloading a different image, of course!).
Note that JPEG may introduce some image artifacts because it is a lossy algorithm, while PNG will recover the exact input image for you.
I've found that, for the same picture, if you save as PNG first and then as JPG, the PNG will be smaller; if you save as JPG first, it will be smaller than the PNG saved afterwards.

How can an Interlaced .png file's size be smaller than the original file?

OK, so I tried to use the ImageMagick command:
"convert picA.png -interlace line picB.png"
to make an interlaced version of my .png images. Most of the time, the resulting image is larger than the original one, which is kind of normal. However, for certain images, the resulting file is smaller.
So I just wonder why does that happen? I really don't want my new image to lose any quality because of the command.
Also, is there any compatibility problem with interlaced .png images?
EDIT: I guess my problem is that the original image was not compressed as best as it could be.
The following only applies to the cases where the pixel size is >= 8 bits. I didn't investigate for other cases but I expect similar outcomes.
A content-identical interlaced PNG image file will almost always be larger because of the additional filter-type bytes required for the scanlines of each pass. I explained this in detail in this web page, based on the PNG specification, RFC 2083.
In short, this is because the sum of the numbers of filter-type bytes per interlacing pass, listed below, is almost always greater than the image height (which is the number of filter-type bytes for non-interlaced images):
nb_pass1_lines = CEIL(height/8)
nb_pass2_lines = (width>4?CEIL(height/8):0)
nb_pass3_lines = CEIL((height-4)/8)
nb_pass4_lines = (width>2?CEIL(height/4):0)
nb_pass5_lines = CEIL((height-2)/4)
nb_pass6_lines = (width>1?CEIL(height/2):0)
nb_pass7_lines = FLOOR(height/2)
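The formulas above translate directly into Python. The function names are made up for this sketch, and negative numerators (heights under 4 or 2) are clamped to zero lines.

```python
def ceil_div(a, b):
    """Ceiling of a / b for integers, clamped at 0 when a is negative."""
    return max(0, -(-a // b))


def interlaced_filter_bytes(width, height):
    """Total number of filter-type bytes over the seven Adam7 passes."""
    passes = [
        ceil_div(height, 8),                      # pass 1
        ceil_div(height, 8) if width > 4 else 0,  # pass 2
        ceil_div(height - 4, 8),                  # pass 3
        ceil_div(height, 4) if width > 2 else 0,  # pass 4
        ceil_div(height - 2, 4),                  # pass 5
        ceil_div(height, 2) if width > 1 else 0,  # pass 6
        height // 2,                              # pass 7
    ]
    return sum(passes)


# A non-interlaced image needs exactly `height` filter-type bytes.
for width, height in [(8, 8), (1, 8), (1, 1)]:
    print(width, height, interlaced_filter_bytes(width, height), height)
```

For an 8 x 8 image the interlaced version needs 15 filter-type bytes against 8. For images one pixel wide, the even-numbered passes are empty and the two counts are equal.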
Theoretically, though, the data entropy/complexity may accidentally be lowered enough by the Adam7 interlacing that, with the help of filtering, the additional space usually needed for filter types with interlacing is compensated for by the deflate compression used in the PNG format. This would be a particular case, yet to be proven, as the entropy/complexity is more likely to increase with interlacing because the interlacing deconstruction makes the image data less consistent.
I used the word "accidentally" because reducing the data entropy/complexity is not the purpose of Adam7 interlacing. Its purpose is to allow progressive loading and display of the image through a mechanism of passes. Reducing the entropy/complexity is, instead, the purpose of PNG filtering.
I used the word "usually" because, as shown in the explanation web page, for example, a 1 pixel image will be described through the same length of uncompressed data whether interlaced or not. So, in this case, no additional space should be needed.
When it comes to the PNG file size, a lower size for interlaced can be due to:
Different non-pixel encoding related content embedded in the file, such as a palette (in the case of color type != 3) and non-critical chunks such as chromaticities, gamma, number of significant bits, default background color, histogram, transparency, physical pixel dimensions, time, text, compressed text. Note that some of this non-pixel encoding related content can lead to a different display of the image depending on the software used and the situation.
Different pixel encoding related content (which can change the image quality), such as bit depth, color type (and thus the use of a palette or not with color type = 3), image size, etc.
Different compression related content, such as better filtering choices, accidentally lower data entropy/complexity due to interlacing as explained above (theoretical particular case), or a higher compression level (as you mentioned).
If I had to check whether 2 PNG image files are equivalent pixel wise, I would use the following command in a bash prompt:
diff <( convert non-interlaced.png rgba:- ) <( convert interlaced.png rgba:- )
It should return no difference.
For the compatibility question, if the PNG encoder and PNG decoder implement the mandatory aspects of the PNG RFC, I see no reason for the interlacing to lead to a compatibility issue.
Edit 2018 11 13:
Some experiments based on auto evolved distributed genetic algorithms with niche mechanism (hosted on https://en.oga.jod.li ) are explained here:
https://jod.li/2018/11/13/can-an-interlaced-png-image-be-smaller-than-the-equivalent-non-interlaced-image/
Those experiments show that it is possible for equivalent PNG images to have a smaller size interlaced than non-interlaced. The best images for this are tall, they have a one pixel width and have pixel content that appear random. Though, the shape is not the only important aspect for the interlaced image to be smaller than the non-interlaced image as random cases with the same shape lead to different size differences.
So, yes, some PNG images can be identical pixel wise and for non-pixel related content but have a smaller size interlaced than non-interlaced.
So I just wonder why does that happen?
From the section "Interlacing and pass extraction" of the PNG spec:
Scanlines that do not completely fill an integral number of bytes are padded as defined in 7.2: Scanlines.
NOTE If the reference image contains fewer than five columns or fewer than five rows, some passes will be empty.
I would assume the behavior you're experiencing is the result of the Adam7 method requiring additional padding.

Difference in entropy values for the same image

I am finding the entropy value of an RGB image after histogram processing using the Y plane as follows:
i % the original image
y1=rgb2ycbcr(i);
y=y1(:,:,1);cb=y1(:,:,2);cr=y1(:,:,3);
he1=histeq(y);
r1=cat(3,he1,cb,cr);
r1=ycbcr2rgb(r1);
g1=rgb2gray(r1);
e1=entropy(g1);
Now I followed the procedure:
imwrite(r1,'temp1.jpg');
i2=imread('temp1.jpg');
g2=rgb2gray(i2);
e2=entropy(g2);
But now e1 and e2 are different. Why is that so?
You're writing the image r1 to disk using the JPEG compression standard. JPEG is lossy, which means that what is written to disk is not the same as what was originally stored in memory. Though the images look perceptually the same, if you compared the colour values between corresponding pixels, the majority of them will be slightly different. These slight differences are why the JPEG standard gives high compression ratios and thus smaller file sizes.
If you want to ensure that what you write to file is the same as what you read in, use a lossless compression standard, such as PNG. As such, change the destination filename so that you're using PNG, not JPEG:
imwrite(r1,'temp1.png'); %// Change
i2=imread('temp1.png'); %// Change
g2=rgb2gray(i2);
e2=entropy(g2);
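To see why small pixel changes move the entropy, here is a Python sketch that assumes the usual Shannon definition over the histogram of grey levels. The pixel values are made up, and the "round trip" is a crude stand-in (rounding down to a multiple of 4), not an actual JPEG codec.

```python
import math
from collections import Counter


def entropy(pixels):
    """Shannon entropy, in bits, of a flat sequence of grey levels."""
    counts = Counter(pixels)
    total = len(pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


original = [10, 11, 12, 13, 200, 201, 202, 203]

# Stand-in for a lossy round trip: NOT JPEG, just coarse rounding of each value
round_trip = [p // 4 * 4 for p in original]

print(entropy(original), entropy(round_trip))
```

The two lists would look almost identical as pixels, but their histograms differ, and so do their entropies. A lossless round trip leaves the histogram, and therefore the entropy, unchanged.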

Concept of Image pixels, resolution and magnification of image

I am trying to understand the relation between image pixels, their size and resolution, and how changing the resolution of an image results in the same image size but a slight amount of blurriness. I referred to these links but some doubts still remain:
1) So, what is the "commonly" accepted definition of "resolution"?
2) How(and Why) does changing the resolution of an image (say, a Desktop wallpaper) result in the same image size but a slight amount of blurriness? In this case, what definition of "resolution", "pixels" and "image size" are we talking about?
Thanks in advance!
The image that is displayed on the screen is composed of thousands (or millions) of small dots; these are called pixels.
The number of pixels that can be displayed on the screen is referred to as the resolution of the image.
Image resolution is the detail an image holds.
As the resolution goes up, the image becomes more clear. It becomes sharper, more defined, and more detailed as well.
In addition to image size, the quality of the image can also be manipulated. Here we use the word "compression". An uncompressed image is saved in a file format that doesn't compress the pixels in the image at all. Formats such as BMP, or TIFF saved without compression, do not compress the image.
If you want to reduce the "file size" (number of megabytes required to save the image), you can choose to store your image as a JPG file and choose the amount of compression you want before saving the image.
These terms are explained in a video
Put in simple words, resolution is simply the number of pixels (picture elements) contained along the horizontal/vertical axis. The more pixels arranged along an axis, the better the image. The formal definition is easy to find on the web.
Changing the resolution simply means changing the number of pixels in the image. Higher resolution implies more pixels, hence a more detailed image. Reducing the resolution of an image decreases the image size, following either lossy or lossless compression algorithms. This further reduces the amount of information in the image, leading to blurriness or jagged edges. Image size varies according to the degree of compression only. 'Pixels' and 'resolution' remain as explained above.
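The loss of information can be shown on a single row of pixels. The Python sketch below makes two arbitrary choices for illustration: it halves the resolution by averaging neighbouring pairs and stretches the result back by repeating pixels. Real scalers use other resampling filters.

```python
def halve(row):
    """Halve the resolution by averaging each pair of neighbouring pixels."""
    return [(row[i] + row[i + 1]) // 2 for i in range(0, len(row) - 1, 2)]


def double(row):
    """Stretch back to the original width by repeating each pixel."""
    return [p for p in row for _ in range(2)]


row = [0, 0, 0, 255, 255, 255, 0, 0]   # made-up row with two hard edges
restored = double(halve(row))

print(row)
print(restored)
```

The restored row has the same number of pixels as the original, so it fills the same area on screen, but the hard edge has been replaced by an in-between grey value. That is the blurriness seen when a lower-resolution image is displayed at the same size.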

Matlab imwrite function changes the pixel values

I tried to change some pixel values of a grayscale image and save it using imwrite in MATLAB.
There was no problem with saving.
The problem is that when I read it back, some pixel values have changed; they are not exactly the values I assigned to the pixels before saving.
I'm trying to hash images, so a 1-unit difference will affect the hash.
As mentioned by mmgp, JPG can be lossy. That means that some of the information in your image will be lost in favor of storage efficiency.
The rationale behind JPG is somewhat like that behind MP3 -- changes in hues etc. that the human eye is not particularly well-adapted to distinguish will be simplified or removed altogether, thus decreasing the amount of information in the image. The information in a JPG represents a similar-looking, but in fact very different image. This is probably what you're experiencing.
In MATLAB, have a look at the output of help imwrite. You can give the JPG writer a parameter called 'Quality', which is a number between 0 and 100, 100 meaning (near-)lossless compression.
Although the JPEG standard does allow for (near-)lossless compression, it is not often used in practice (at least, in my field). More popular lossless image formats are PNG, JPEG2000 and TIFF. Read more about it here.
All of these are also available in Matlab's imwrite function.
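The sensitivity of a hash to a one-unit pixel change can be illustrated in Python. SHA-256 is used here only as an example; the question does not say which hash is involved, and the pixel values are made up.

```python
import hashlib


def pixel_hash(pixels):
    """SHA-256 digest (hex) of a sequence of 8-bit pixel values."""
    return hashlib.sha256(bytes(pixels)).hexdigest()


original = [12, 200, 37, 90]
altered = [12, 201, 37, 90]   # one pixel differs by a single unit

print(pixel_hash(original))
print(pixel_hash(altered))
```

The two digests are unrelated even though the pixel data differs by one unit, which is why the image has to be stored in a lossless format before it is read back and hashed.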

Resources