I am trying to compress a grayscale image using Huffman coding in MATLAB, and have tried the following code.
I have used a grayscale image with size 512x512 in tif format. My problem is that the size of the compressed image (length of the compressed codeword) is getting bigger than the size of the uncompressed image. The compression ratio is getting less than 1.
clc;
clear all;
A1 = imread('fig1.tif');
[M N]=size(A1);
A = A1(:);
count = [0:1:255]; % Distinct data symbols appearing in sig
total=sum(count);
for i=1:1:size((count)');
p(i)=count(i)/total;
end
[dict,avglen]=huffmandict(count,p) % build the Huffman dictionary
comp= huffmanenco(A,dict); %encode your original image with the dictionary you just built
compression_ratio= (512*512*8)/length(comp) %computing the compression ratio
%% DECODING
Im = huffmandeco(comp,dict); % Decode the code
I11=uint8(Im);
decomp=reshape(I11,M,N);
imshow(decomp);
There is a slight error in your code. I'm assuming you want to calculate the probability of encountering each pixel, which is the normalized histogram. You're not computing it properly. Specifically:
count = [0:1:255]; % Distinct data symbols appearing in sig
total=sum(count);
for i=1:1:size((count)');
p(i)=count(i)/total;
end
total is summing over [0,255] which is not correct. You're supposed to compute the probability distribution of your image. You should use imhist for that instead. As such, you should do this instead:
count = 0:255;
p = imhist(A1) / numel(A1);
This will correctly calculate your probability distribution for your image. Remember, when you're doing Huffman coding, you need to specify the probability of encountering a pixel. Assuming that each pixel can equally be likely to be chosen, this is captured by calculating the image's histogram, then normalizing by the total number of pixels in your image. Try that and see if you get any better results.
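Putting the fix together, a minimal sketch of the whole pipeline could look like the following. It assumes the Image Processing Toolbox (imhist, plus the bundled cameraman.tif as a stand-in for your image) and the Communications Toolbox (huffmandict, huffmanenco, huffmandeco). Dropping the intensities that never occur and casting the pixels to double before encoding are my own precautions, not something your code strictly requires.

```matlab
% Sketch: Huffman-code a grayscale image using its normalized histogram.
A1 = imread('cameraman.tif');            % assumption: any uint8 grayscale image
counts = imhist(A1);                     % 256x1 histogram of intensities 0..255
symbols = (0:255)';
keep = counts > 0;                       % only keep intensities that occur
p = counts(keep) / numel(A1);            % probabilities, summing to 1

dict = huffmandict(symbols(keep), p);    % build the Huffman dictionary
comp = huffmanenco(double(A1(:)), dict); % encode the pixels as a bit vector
compression_ratio = (numel(A1) * 8) / numel(comp);
fprintf('Compression ratio: %.3f\n', compression_ratio);

% Decode and confirm the round trip is lossless
decomp = reshape(uint8(huffmandeco(comp, dict)), size(A1));
fprintf('Lossless: %d\n', isequal(decomp, A1));
```

A fixed 8-bit code is itself a valid prefix code and Huffman is optimal among prefix codes, so once p is the real histogram the encoded stream should not exceed 8 bits per pixel. This ignores the cost of storing the dictionary.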
However, Huffman will only give you good compression ratios if you have frequently occurring symbols. Did you happen to take a look at the histogram or the spread of your pixels in your image?
If the spread is quite large, with very few entries per bin, then Huffman will not give you any compression savings. In fact it may give you a larger size as a result. Bear in mind that the TIFF compression standard only uses Huffman as part of the algorithm. There is also some pre- and post-processing done to further drive down the size.
As a further example, suppose I had an image that consisted of [0, 1, 2, ..., 255; 0, 1, 2, ..., 255; 0, 1, 2, ..., 255]. I have 3 rows of the values 0 to 255, but really it could be any number of rows. This means that all 256 symbols are equiprobable, each with probability 1/256, so we would need log2(256) = 8 bits per symbol... which is essentially the raw pixel value anyway!
The key behind Huffman is that each symbol is represented by a variable-length group of bits. Frequently occurring symbols get assigned a shorter sequence of bits. Because this particular image has intensities that are equiprobable, every intensity ends up with a code of the same length, 8 bits. With this, not only do you have to transmit the dictionary, you would effectively be sending one byte per pixel, and this is no better than sending the raw byte stream.
If you want your image to be compressed by raw Huffman, the distribution of pixels has to be skewed. For example, if most of the intensities in your image are dark, or are bright. If your image has good contrast or if the spread of the pixel intensities is flat throughout the image, then Huffman will not give you any compression savings.
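One way to check whether an image is worth Huffman coding at all is to estimate the entropy of its histogram. The entropy is a lower bound on the average number of bits per pixel that any symbol-by-symbol code can reach. Here is a small sketch in base MATLAB with two made-up test images (flat and skewed are names I chose for illustration):

```matlab
% Entropy (bits per pixel) of a flat versus a skewed intensity distribution.
flat   = uint8(repmat(0:255, 3, 1));                  % every intensity equally likely
skewed = uint8([zeros(3, 240), repmat(1:16, 3, 1)]);  % mostly black pixels

images = {flat, skewed};
H = zeros(1, 2);
for idx = 1:2
    img = images{idx};
    counts = accumarray(double(img(:)) + 1, 1, [256 1]);  % histogram without toolboxes
    p = counts(counts > 0) / numel(img);                  % non-zero probabilities
    H(idx) = -sum(p .* log2(p));                          % Shannon entropy
end
fprintf('flat: %.3f bits/pixel, skewed: %.3f bits/pixel\n', H(1), H(2));
```

The flat image comes out at 8 bits per pixel, so there is nothing to gain. The skewed one is far lower, although plain Huffman can never go below 1 bit per symbol.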
This question is based on the one asked earlier Understanding image steganography by LSB substitution method
In order to make the code efficient and reduce the mean square error (MSE) the suggestion was: "read the file as is with and convert it to bits with de2bi(fread(fopen(filename)), 8). Embed these bits to your cover image with the minimum k factor required, probably 1 or 2. When you extract your secret, you'll be able to reconstruct the original file." This is what I have been trying but somewhere I am doing wrong as I am not getting any display. However, the MSE has indeed reduced. Basically, I am confused as to how to convert the image to binary, perform the algorithm on that data and display the image after extraction.
Can somebody please help?
I've made some modifications to your code to get this to work regardless of what the actual image is. However, both images need to be either colour or grayscale. There are also some errors in your code that would not allow me to run it on my version of MATLAB.
Firstly, you aren't reading in the images properly. You're opening up a byte stream for the images, then using imread on the byte stream to read in the image. That's wrong - just provide a path to the actual file.
Secondly, the images are already in uint8, so you can perform the permuting and shifting of bits natively on this.
The rest of your code is the same as before, except for the image resizing. You don't need to specify the number of channels. Also, there was a syntax error with bitcmp. I used 'uint8' instead of the value 8 as my version of MATLAB requires that you specify a string of the expected data type. The value 8 here I'm assuming you mean 8 bits, so it makes sense to put 'uint8' here.
I'll also read your images directly from Stack Overflow. I'll assume the dinosaur image is the cover while the flower is the message:
%%% Change
x = imread('https://i.stack.imgur.com/iod2d.png'); % cover image
y = imread('https://i.stack.imgur.com/Sg5mr.png'); % message image
n = input('Enter the no of LSB bits to be substituted- ');
%%% Change
S = uint8(bitor(bitand(x,bitcmp(2^n-1,'uint8')),bitshift(y,n-8))); %Stego
E = uint8(bitand(255,bitshift(S,8-n))); %Extracted
origImg = double(y); %message image
distImg = double(E); %extracted image
[M N d] = size(origImg);
distImg1=imresize(distImg,[M N]); % Change
figure(1),imshow(x);title('1.Cover image')
figure(2),imshow(y);title('2.Message to be hidden')
figure(3),imshow((abs(S)),[]);title('3.Steganographic image')
figure(4),imshow(real(E),[]); title('4.Extracted image');
This runs for me and I manage to reconstruct the message image. Choosing the number of bits to be about 4 gives you a good compromise between the cover and message image.
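To see what the two bit-manipulation lines are doing without downloading any images, here is a small sketch on made-up 8x8 arrays (cover and message are stand-ins I invented). It uses bitcmp on a uint8 value rather than the 'uint8' string form, which should behave the same here.

```matlab
% LSB substitution on synthetic data: hide the top n bits of the message
% in the bottom n bits of the cover, then pull them back out.
n = 4;
cover   = uint8(magic(8) * 3);           % made-up cover values (3..192)
message = uint8(255 - magic(8) * 3);     % made-up message values (63..252)

mask  = bitcmp(uint8(2^n - 1));          % 11110000 for n = 4
stego = bitor(bitand(cover, mask), bitshift(message, n - 8));

% Shifting a uint8 left discards the overflow bits, which leaves only the
% hidden bits, moved back to the most significant positions.
extracted = bitshift(stego, 8 - n);

maxCoverChange = max(abs(double(stego(:)) - double(cover(:))));
fprintf('Largest change to the cover: %d\n', maxCoverChange);
```

The extracted image equals the message with its lowest 8 - n bits zeroed, and the cover never changes by more than 2^n - 1. That is the trade-off you are tuning with n.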
Loading the byte stream instead of the pixel array of the secret will result in a smaller payload. How much smaller it'll be depends on the image format and how repetitive the colours are.
imread() requires a filename and loads a pixel array if said filename is a valid image file. Loading the byte stream of the file and passing that to imread() makes no sense. What you want is this
% read in the byte stream of a file
fileID = fopen(filename);
secretBytes = fread(fileID);
fclose(fileID);
% write it back to a file (fopen defaults to read-only, so ask for write access)
fileID = fopen(filename, 'w');
fwrite(fileID, secretBytes);
fclose(fileID);
Note that the cover image is loaded as a pixel array, because you'll need to modify it.
The size of your payload is length(secretBytes) * 8 and this must fit in your cover image. If you decide to embed k bits per pixel, for all your colour planes, the following requirement must be met
numel(secretBytes) * 8 <= numel(coverImage) * k
If you want to embed in only one colour plane, regardless of whether your cover medium is an RGB or greyscale, you need to modify that to
numel(secretBytes) * 8 <= size(coverImage,1) * size(coverImage,2) * k
If this requirement isn't met, you can choose to
stop the process
ask the user for a smaller file to embed
increase k
include more colour planes, if available
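As a rough sketch, the capacity check described above could be written like this. The 64x64 cover is a placeholder I made up, and I've also counted a 24-bit length header in the payload, which is an assumption about how the message length will be stored.

```matlab
% Capacity check before embedding (all values here are illustrative).
coverImage  = zeros(64, 64, 3, 'uint8');   % hypothetical RGB cover
secretBytes = uint8('Hello world');        % any byte stream
k = 1;                                     % bits embedded per pixel value
HEADER_LEN = 24;                           % assumed length-header size in bits

payloadBits      = numel(secretBytes) * 8 + HEADER_LEN;
capacityAll      = numel(coverImage) * k;                           % all colour planes
capacityOnePlane = size(coverImage, 1) * size(coverImage, 2) * k;   % a single plane

fitsAll      = payloadBits <= capacityAll;
fitsOnePlane = payloadBits <= capacityOnePlane;
fprintf('Payload %d bits, capacity %d (all planes) / %d (one plane)\n', ...
    payloadBits, capacityAll, capacityOnePlane);
```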
The following is a prototype for embedding in one colour plane in the least significant bit only (k = 1).
HEADER_LEN = 24;
coverImage = imread('lena.png');
secretBytes = uint8('Hello world'); % this could be any byte stream
%% EMBEDDING
coverPlane = coverImage(:,:,1); % this assumes an RGB image
bits = de2bi(secretBytes,8)';
bits = [de2bi(numel(bits), HEADER_LEN) bits(:)'];
nBits = length(bits);
coverPlane(1:nBits) = bitset(coverPlane(1:nBits),1,bits);
coverImage(:,:,1) = coverPlane;
%% EXTRACTION
nBits = bi2de(bitget(coverPlane(1:HEADER_LEN),1));
extBits = bitget(coverPlane(HEADER_LEN+1:HEADER_LEN+nBits),1);
extractedBytes = bi2de(reshape(extBits',8,length(extBits)/8)')';
Along with your message bytes you have to embed the length of the secret, so the extractor knows how many bits to extract.
If you embed with k > 1 or in more than one colour plane, the logic becomes more complicated and you have to be careful how you implement the changes.
For example, you can choose to embed in each colour plane at a time until you run out of bits to hide, or you can flatten the whole pixel array with coverImage(:), which will embed in the RGB of each pixel, one pixel at a time until you run out of bits.
If you embed with k > 1, you have to pad your bits vector with 0s until its length is divisible by k. Then you can combine your bits in groups of k with
bits = bi2de(reshape(bits',k,length(bits)/k)')';
And to embed them, you want to resort back to using bitand() and bitor().
coverPlane(1:nBits) = bitor(bitand(coverPlane(1:nBits), bitcmp(2^k-1,'uint8')), uint8(bits)); % nBits is now the number of k-bit groups
There are more details, like extracting exactly 24 bits for the message length and I can't stress enough you have to think very carefully how you implement all of those things. You can't just stitch parts from different code snippets and expect everything to do what you want it to do.
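To make the k > 1 case more concrete, here is a sketch of the pad, group, embed and extract steps using only base MATLAB (bitget and a weighted sum stand in for de2bi/bi2de). The cover plane and the variable names are made up. For brevity the extractor is told the message length instead of reading it from a header, which a real implementation would have to do.

```matlab
% Embedding k bits per pixel value, and reading them back.
k = 2;
secretBytes = uint8('Hi');
coverPlane  = uint8(reshape(0:255, 16, 16));   % made-up cover plane

% Bytes -> row vector of bits, least significant bit of each byte first
nBytes = numel(secretBytes);
bits = reshape(bitget(repmat(secretBytes, 8, 1), repmat((1:8)', 1, nBytes)), 1, []);
bits = [bits, zeros(1, mod(-numel(bits), k), 'uint8')];   % pad to a multiple of k

% Combine every k bits into one value in the range 0 .. 2^k - 1
values  = uint8((2 .^ (0:k-1)) * double(reshape(bits, k, [])));
nGroups = numel(values);

% Embed: clear the k lowest bits of each pixel value, then OR the payload in
mask = bitcmp(uint8(2^k - 1));
coverPlane(1:nGroups) = bitor(bitand(coverPlane(1:nGroups), mask), values);

% Extract: isolate the k lowest bits, split them up and rebuild the bytes
extValues = bitand(coverPlane(1:nGroups), uint8(2^k - 1));
extBits = reshape(bitget(repmat(extValues, k, 1), repmat((1:k)', 1, nGroups)), 1, []);
extBits = extBits(1:8 * nBytes);               % drop any padding bits
extractedBytes = uint8((2 .^ (0:7)) * double(reshape(extBits, 8, [])));
disp(char(extractedBytes));
```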
I'm working on LSB-DCT based image steganography, in which I have to apply LSB substitution to the DCT coefficients of an image to embed data in a JPEG. I'm new to all this, so I searched and read some research papers, but they all lack a lot of information regarding the process after the DCT. I also read many questions and answers on Stack Overflow and got more confused.
Here are my questions:
1- Research papers and questions on the web all use an 8x8 block size for the DCT. What should I do if the resolution of the image does not divide evenly into 8x8 blocks, like 724 x 520?
520 / 8 = 65 but 724 / 8 = 90.5
2- If I have a lot of blocks and some information to hide which, let's suppose, fits into 5 blocks, do I still need to take the DCT and IDCT of the remaining blocks?
3- Do I need to apply quantization after the DCT and then apply LSB, or can I apply LSB directly?
4- Research papers don't mention anything about not touching quantized DCT coefficients with the values 0 and 1, or the first value. Should I use them or not, and why not? I get it about the 0s, because those were high frequency components that JPEG removes for compression. But I'm not doing any compression, so can I use them and still produce the same JPEG file?
5- In quantization we divide the DCT coefficients by the quantization matrix and round off the values. In reverse, do I just have to multiply the DCT coefficients by the quantization matrix? Is there no undo for the rounding?
For the Comment on DCT and then IDCT:
From different Research Papers:
JPEG steganography
If you want to save your image to jpeg, you have to follow the jpeg encoding process. Unfortunately, most papers I've read don't do it justice. The complete process is the following (wiki summary of a 182-page specifications book):
RGB to YCbCr conversion (optional),
subsampling of the chroma channels (optional),
8x8 block splitting,
pixel value recentering,
DCT,
quantisation based on compression ratio/quality,
order the coefficients in a zigzag pattern, and
entropy encoding; most frequently involving Huffman coding and run-length encoding (RLE).
There are actually a lot more details involved, such as headers, section markers, specifics of how to store the DC and AC coefficients, etc. Then, there are aspects that the standard has only loosely defined and their implementation can vary between codecs, e.g., subsampling algorithm, quantisation tables and entropy encoding. That said, most pieces of software abide by the general JFIF standard and can be read by various software. If you want your jpeg file to do the same, be prepared to write hundreds (to about a thousand) lines of code just for an encoder. You're better off borrowing an encoder that has already been published on the internet than writing your own. You can start by looking into libjpeg which is written in C and forms the basis of many other jpeg codecs, its C# implementation or even a Java version inspired by it.
In some pseudocode, the encoding/decoding process can be described as follows.
function saveToJpeg(pixels, fileout) {
// pixels is a 2D or 3D array containing your raw pixel values
// blocks is a list of 2D arrays of size 8x8 each, containing pixel values
blocks = splitBlocks(pixels);
// a list similar to blocks, but for the DCT coefficients
coeffs = dct(blocks);
saveCoefficients(coeffs, fileout);
}
function loadJpeg(filein) {
coeffs = readCoefficients(filein);
blocks = idct(coeffs);
pixels = combineBlocks(blocks);
return pixels;
}
For steganography, you'd modify it as follows.
function embedSecretToJpeg(pixels, secret, fileout) {
blocks = splitBlocks(pixels);
coeffs = dct(blocks);
modified_coeffs = embedSecret(coeffs, secret);
saveCoefficients(modified_coeffs, fileout);
}
function extractSecretFromJpeg(filein) {
coeffs = readCoefficients(filein);
secret = extractSecret(coeffs);
return secret;
}
If your cover image is already in jpeg, there is no need to load it with a decoder to pixels and then pass it to an encoder to embed your message. You can do this instead.
function embedSecretToJpeg(secret, filein, fileout) {
coeffs = readCoefficients(filein);
modified_coeffs = embedSecret(coeffs, secret);
saveCoefficients(modified_coeffs, fileout);
}
As far as your questions are concerned, 1, 2, 3 and 5 should be taken care of by the encoder/decoder unless you're writing one yourself.
Question 1: Generally, you want to pad the image with the necessary number of rows/columns so that both the width and height are divisible by 8. Internally, the encoder will keep track of the padded rows/columns, so that the decoder will discard them after reconstruction. The choice of pixel value for these dummy rows/columns is up to you, but you're advised against using a constant value because it will result in ringing artifacts. This has to do with the fact that the Fourier transform of a rectangular pulse is the sinc function.
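If you do end up handling the padding yourself, a minimal sketch for the 724 x 520 case might look like this. Replicating the last row/column is just one possible choice of fill, and the test image is made up.

```matlab
% Pad an image so both dimensions are multiples of 8 by replicating edges.
rows = 520; cols = 724;
img = uint8(mod(repmat((1:rows)', 1, cols) + repmat(1:cols, rows, 1), 256));  % made-up image

padRows = mod(-rows, 8);     % rows to add (0 here, since 520 = 65 * 8)
padCols = mod(-cols, 8);     % columns to add (4 here, since 724 = 90 * 8 + 4)

padded = [img, repmat(img(:, end), 1, padCols)];         % repeat the last column
padded = [padded; repmat(padded(end, :), padRows, 1)];   % repeat the last row

fprintf('Padded size: %d x %d\n', size(padded, 1), size(padded, 2));

% The decoder only needs the original size to discard the padding again
restored = padded(1:rows, 1:cols);
```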
Question 2: While you'll modify only a few blocks, the encoding process requires you to transform them all so they can be stored to a file.
Question 3: You have to quantise the float DCT coefficients as that's what's stored losslessly to a file. You can modify them to your heart's content after the quantisation step.
Question 4: Nobody prevents you from modifying any coefficient, but you have to remember each coefficient affects all 64 pixels in a block. The DC coefficient and the low frequency AC ones introduce the biggest distortions, so you might want to stay away from them. More specifically, because of the way the DC coefficients are stored, modifying one would propagate the distortion to all following blocks.
Since most high frequency coefficients are 0, they are efficiently compressed with RLE. Modifying a 0 coefficient may flip it to a 1 (if you're doing basic LSB substitution), which disrupts this efficient compression.
Lastly, some algorithms store their secret in any non-zero coefficients and will skip any 0s. However, if you attempted to modify a 1, it might flip to a 0 and in the extraction process you'd blindly skip reading it. Therefore, such algorithms don't go near any coefficients with the value of 1 or 0.
Question 5: In decoding you just multiply the coefficient by the respective quantisation table value. For example, say the DC coefficient is 309.443 and quantisation gives you round(309.443 / 16) = 19. The rounding is the lossy part here, which doesn't allow you to reconstruct 309.443. So the reverse is simply 19 * 16 = 304.
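The same arithmetic as a few lines of MATLAB, with one extra step of my own to show what an LSB change on the quantised value costs after dequantisation:

```matlab
% Quantise and dequantise a single DCT coefficient.
dc = 309.443;                  % float DCT coefficient from the example
q  = 16;                       % matching quantisation table entry

quantised = round(dc / q);     % 19, the value stored in the file
restored  = quantised * q;     % 304, the value the decoder sees
roundingLoss = dc - restored;  % about 5.443, gone for good

% Flipping the LSB of the stored value moves the decoded coefficient by q
stegoQuantised = bitset(quantised, 1, 0);   % 19 -> 18
stegoRestored  = stegoQuantised * q;        % 288

fprintf('%d -> %d, stego %d -> %d\n', quantised, restored, stegoQuantised, stegoRestored);
```

So a one-bit change in the file turns into a change of one full quantisation step in the decoded coefficient, which is why coefficients with large table entries are more visible when modified.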
Other uses of DCT in steganography
Frequency transforms, such as DCT and DWT can be used in steganography to embed the secret in the frequency domain but not necessarily store the stego image to jpeg. This process is pixels -> DCT -> coefficients -> modify coefficients -> IDCT -> pixels, which is what you send to the receiver. As such, the choice of format matters here. If you decide to save your pixels to jpeg, your secret in the DCT coefficients may be disturbed by another layer of quantisation from the jpeg encoding.
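Here is a rough sketch of that pixels -> DCT -> modify -> IDCT -> pixels loop on a single 8x8 block. I build the orthonormal DCT-II matrix by hand so no toolbox is needed. The block contents, the coefficient position (3,4) and the odd/even embedding rule are all arbitrary choices for illustration.

```matlab
% Embed one bit in a DCT coefficient and see how pixel rounding disturbs it.
N = 8;
[c, r] = meshgrid(0:N-1, 0:N-1);                 % c: sample index, r: frequency index
D = sqrt(2 / N) * cos((2 * c + 1) .* r * pi / (2 * N));
D(1, :) = sqrt(1 / N);                           % orthonormal DCT-II matrix

pixels = reshape(0:63, N, N)' * 2 + 60;          % made-up block, values 60..186
coeffs = D * (pixels - 128) * D';                % forward 2-D DCT
roundTrip = D' * coeffs * D + 128;               % inverse DCT, no changes made

embedded = coeffs;
embedded(3, 4) = 2 * round(coeffs(3, 4) / 2) + 1;   % force an odd integer (bit "1")

% Back to integer pixels, which is what actually gets sent
stegoPixels = min(max(round(D' * embedded * D + 128), 0), 255);

% The receiver recomputes the DCT and finds a slightly different coefficient
recovered = D * (stegoPixels - 128) * D';
drift = recovered(3, 4) - embedded(3, 4);
fprintf('Embedded %.3f, recovered %.3f\n', embedded(3, 4), recovered(3, 4));
```

Even without any jpeg quantisation, rounding the pixels to integers already shifts the coefficient a little. A robust scheme has to tolerate that drift, for example by using a coarser step than 1.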
I am enrolled in a Coursera Machine Learning course where I am learning about neural networks. I got some hand-written digits data from this link: http://yann.lecun.com/exdb/mnist/
Now I want to convert these data in to .jpg format, and I am using this code.
function nx=conv(x)
nx=zeros(size(x));
for i=1:size(x,1)
c=reshape(x(i,:),20,20);
imwrite(c,'data.jpg','jpg')
nx(i,:)=(imread('data.jpg'))(:)';
delete('data.jpg');
end
end
Then, I run the above code with:
nx=conv(x);
x is 5000 training examples of handwritten digits. Each training example is a 20 x 20 pixel grayscale image of a digit. Each pixel is represented by a floating point number indicating the grayscale intensity at that location.
The 20 x 20 grid of pixels is "unrolled" into a 400-dimensional vector. Each of these training examples becomes a single row in our data matrix x. This gives us a 5000 x 400 matrix x where every row is a training example for a handwritten digit image.
After I run this code, I rewrite an image to disk to check:
imwrite(nx(1,:),'check.jpg','jpg')
However, I find the image is fuzzy. How would I convert these images correctly?
You are saving the images using JPEG, which is a lossy compression algorithm. Lossy compression algorithms typically achieve a high compression ratio, but at the expense of slightly degrading your image. That's probably why you are seeing fuzzy images: the blur comes from compression artifacts.
From the looks of it, you want to save exactly what the data should be to file. As such, use a lossless compression algorithm instead, like PNG. Therefore, change your saving line of code to use PNG:
imwrite(c,'data.png','png')
nx(i,:)=(imread('data.png'))(:)';
delete('data.png');
Also:
imwrite(nx(1,:),'check.png','png')
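If you want to convince yourself of the difference, here is a quick sketch that writes the same 20 x 20 image both ways and compares what comes back. The gradient image is a stand-in for one of your digits. I convert to uint8 first so the comparison doesn't depend on how imwrite scales doubles.

```matlab
% Round-trip the same image through PNG and JPEG and compare.
c  = reshape(linspace(0, 1, 400), 20, 20);   % made-up 20x20 grayscale image
c8 = uint8(round(c * 255));                  % explicit 8-bit version

pngFile = [tempname '.png'];
jpgFile = [tempname '.jpg'];
imwrite(c8, pngFile);
imwrite(c8, jpgFile);

fromPng = imread(pngFile);
fromJpg = imread(jpgFile);
delete(pngFile);
delete(jpgFile);

pngExact   = isequal(fromPng, c8);                            % PNG is lossless
jpgMaxDiff = max(abs(double(fromJpg(:)) - double(c8(:))));    % JPEG usually is not
fprintf('PNG exact: %d, largest JPEG pixel error: %d\n', pngExact, jpgMaxDiff);
```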
I was trying to calculate the CR (compression ratio) of an image that I compressed and decompressed using FFT in MATLAB. I have read a similar post here about CR calculation, but I did not understand the method used there to process the image. That post said: CR = numel(X)/numel(Y)
What I understood is that X is my image before FFT and Y is after. So I said that
I=imread('flowers.tif')
RGB = im2double(I);
%process...
iRGB = my reconstructed image after iFFT
CR = numel(RGB)/numel(iRGB);
But this results in CR =1 which I do not think that is the correct answer. Can someone explain to me what am I missing?
The compression ratio compares the number of elements in the uncompressed representation with the number in the compressed one, i.e. CR = numel(uncompressed) / numel(compressed). Your iRGB is a reconstructed representation and therefore has the same number of elements as RGB (you need to reconstruct the entire image), which is why you get 1. For the CR you need the numel of your compressed representation, i.e. what you actually keep after the FFT step.
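Since I don't know what your processing step does, here is a sketch under one common assumption: that "compressing" means keeping only the largest FFT coefficients and zeroing the rest. The test surface from peaks and the 10% figure are arbitrary.

```matlab
% Compression ratio when only the largest FFT coefficients are kept.
X = peaks(64);                          % made-up 64x64 test "image"
F = fft2(X);

keepCount = round(0.1 * numel(F));      % aim to keep roughly 10% of coefficients
mags = sort(abs(F(:)), 'descend');
threshold = mags(keepCount);

Fc = F .* (abs(F) >= threshold);        % compressed representation (mostly zeros)
Xrec = real(ifft2(Fc));                 % reconstruction, same size as X

CR = numel(X) / nnz(Fc);                % count what is stored, not the reconstruction
relErr = norm(Xrec(:) - X(:)) / norm(X(:));
fprintf('CR = %.2f, relative error = %.4f\n', CR, relErr);
```

This counts each kept complex coefficient as one element and ignores the cost of storing its position, so treat the number as optimistic.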
My team wish to calculate the contrast between two photographs taken in a wet environment.
We will calculate contrast using the formula
Contrast = SQRT((ΔL)^2 + (Δa)^2 + (Δb)^2)
where ΔL is the difference in luminosity, Δa is the difference in (redness-greenness) and Δb is the difference in (yellowness-blueness), which are the dimensions of Lab space.
Our (so far successful) approach has been to convert each pixel from RGB to Lab space, and taking the mean values of the relevant sections of the image as our A and B variables.
However the environment limits us to using a (waterproof) GoPro camera which compresses images to JPEG format, rather than saving as TIFF, so we are not using a true-colour image.
We now need to quantify the uncertainty in the contrast - for which we need to know the uncertainty in A and B and by extension the uncertainties (or mean/typical uncertainty) in each a and b value for each RGB pixel. We can calculate this only if we know the typical/maximum uncertainty produced when converting from true-colour to JPEG.
Therefore we need to know the maximum possible difference in each of the RGB channels when saving in JPEG format.
EG. if true colour RGB pixel (5, 7, 9) became (2, 9, 13) after compression the uncertainty in each channel would be (+/- 3, +/- 2, +/- 4).
We believe that the camera compresses colour in the aspect ratio 4:2:0 - is there a way to test this?
However our main question is; is there any way of knowing the maximum possible error in each channel, or calculating the uncertainty from the compressed RGB result?
Note: We know it is impossible to convert back from JPEG to TIFF as JPEG compression is lossy. We merely need to quantify the extent of this loss on colour.
In short, it is not possible to absolutely quantify the maximum possible difference in digital counts in a JPEG image.
You highlight one of these points well already. When image data is encoded using the JPEG standard, it is first converted to the YCbCr color space.
Once in this color space, the chroma channels (Cb and Cr) are downsampled, because the human visual system is less sensitive to artifacts in chroma information than it is lightness information.
The error introduced here is content-dependent; an area of very rapidly varying chroma and hue will have considerably more content loss than an area of constant hue/chroma.
Even knowing the 4:2:0 compression, which describes the amount and geometry of downsampling, the content still dictates the error introduced at this step.
Another problem is the quantization performed in JPEG compression.
The resulting information is encoded using a Discrete Cosine Transform. In the transformed space, the results are again quantized depending on the desired quality. This quantization is set at the time of file generation, which is performed in-camera. Again, even if you knew the exact DCT quantization being performed by the camera, the actual effect on RGB digital counts is ultimately content-dependent.
Yet another difficulty is noise created by DCT block artifacts, which (again) is content dependent.
These scene dependencies make the algorithm very good for visual image compression, but very difficult to characterize absolutely.
However, there is some light at the end of the tunnel. JPEG compression will cause significantly more error in areas of rapidly changing image content. Areas of constant color and texture will have significantly less compression error and artifacts. Depending on your application you may be able to leverage this to your benefit.
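To get a feel for how strongly the error depends on content, you could measure it empirically on test images. The sketch below uses MATLAB's own JPEG encoder at an arbitrary quality of 75 with two synthetic images I made up. It says nothing about the settings your camera uses, so treat the numbers as illustrative only.

```matlab
% Per-channel maximum JPEG error for a flat image versus a busy one.
flatImg = repmat(uint8(128), [64 64 3]);          % constant mid-gray

[cc, rr] = meshgrid(1:64, 1:64);
checker = mod(cc + rr, 2) == 0;                   % pixel-level checkerboard
busyImg = zeros(64, 64, 3, 'uint8');
busyImg(:, :, 1) = uint8(255 * checker);          % red squares ...
busyImg(:, :, 2) = uint8(255 * ~checker);         % ... alternating with cyan
busyImg(:, :, 3) = uint8(255 * ~checker);

imgs = {flatImg, busyImg};
maxErr = zeros(2, 3);                             % rows: images, columns: R, G, B
jpgFile = [tempname '.jpg'];
for idx = 1:2
    imwrite(imgs{idx}, jpgFile, 'Quality', 75);
    back = imread(jpgFile);
    diffs = abs(double(back) - double(imgs{idx}));
    maxErr(idx, :) = squeeze(max(max(diffs, [], 1), [], 2))';
end
delete(jpgFile);
disp(maxErr);
```

The rapidly varying image should show a far larger worst-case error than the flat one. That is why a single uncertainty figure per channel cannot cover all content.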