I'm working on LSB-DCT based Image steganography in which i have to apply LSB to DCT coefficients of the image for data embedding to JPEG.i'm new to all this.so searched and read some research papers they all lack a lot of information regarding the process after DCT.i also read many questions and answers on stackoverflow too and got more confused.
here are the questions:
1-reasearch paper and in question on the web they all are using 8x8 block size from image for DCT..what i should do if the resolution of image does not completely divides into 8x8 blocks like 724 x 520.
520 / 8 = 65 but 724 / 8 = 90.5
2-if i have a lot of blocks and some information to hide which we suppose can fit into 5 blocks..do i still need to take dct of the remaining blocks and and idct.
3-do i need to apply quantization after dct and then apply lsb or i can apply lsb directly??
4-research papers are not mentioning anything about not to touch quantized dct coefficients with value 0 and 1 and the first value..now should i use them or not?? and why not?? i get it about the 0 because it's was high frequency components and is removed in JPEG for compression..and i'm not doing any compression..so can i use it and still produce the same JPEG file???
5-in quantization we divide the DCT Coefficients with quantization matrix and round off the values.in reverse,i have to multiply quantization matrix with DCT Coefficients just..no undo for round off???
For the Comment on DCT and then IDCT:
From different Research Papers:
JPEG steganography
If you want to save your image to jpeg, you have to follow the jpeg encoding process. Unfortunately, papers most I've read say don't do it justice. The complete process is the following (wiki summary of a 182-page specifications book):
RGB to YCbCr conversion (optional),
subsampling of the chroma channels (optional),
8x8 block splitting,
pixel value recentering,
DCT,
quantisation based on compression ratio/quality,
order the coefficients in a zigzag pattern, and
entropy encoding; most frequently involving Huffman coding and run-length encoding (RLE).
There are actually a lot more details involved, such as headers, section markers, specifics of how to store the DC and AC coefficients, etc. Then, there are aspects that the standard has only loosely defined and their implementation can vary between codecs, e.g., subsampling algorithm, quantisation tables and entropy encoding. That said, most pieces of software abide by the general JFIF standard and can be read by various software. If you want your jpeg file to do the same, be prepared to write hundreds (to about a thousand) lines of code just for an encoder. You're better off borrowing an encoder that has already been published on the internet than writing your own. You can start by looking into libjpeg which is written in C and forms the basis of many other jpeg codecs, its C# implementation or even a Java version inspired by it.
In some pseudocode, the encoding/decoding process can be described as follows.
function saveToJpeg(pixels, fileout) {
// pixels is a 2D or 3D array containing your raw pixel values
// blocks is a list of 2D arrays of size 8x8 each, containing pixel values
blocks = splitBlocks(pixels);
// a list similar to blocks, but for the DCT coefficients
coeffs = dct(blocks);
saveCoefficients(coeffs, fileout);
}
function loadJpeg(filein) {
coeffs = readCoefficients(filein);
blocks = idct(coeffs);
pixels = combineBlocks(blocks);
return pixels;
}
For steganography, you'd modify it as follows.
function embedSecretToJpeg(pixels, secret, fileout) {
blocks = splitBlocks(pixels);
coeffs = dct(blocks);
modified_coeffs = embedSecret(coeffs, secret);
saveCoefficients(modified_coeffs, fileout);
}
function extractSecretFromJpeg(filein) {
coeffs = readCoefficients(filein);
secret = extractSecret(coeffs);
return secret;
}
If your cover image is already in jpeg, there is no need to load it with a decoder to pixels and then pass it to an encoder to embed your message. You can do this instead.
function embedSecretToJpeg(pixels, secret, filein, fileout) {
coeffs = readCoefficients(filein);
modified_coeffs = embedSecret(coeffs, secret);
saveCoefficients(modified_coeffs, fileout);
}
As far as your questions are concerned, 1, 2, 3 and 5 should be taken care of by the encoder/decoder unless you're writing one yourself.
Question 1: Generally, you want to pad the image with the necessary number of rows/columns so that both the width and height are divisible by 8. Internally, the encoder will keep track of the padded rows/columns, so that the decoder will discard them after reconstruction. The choice of pixel value for these dummy rows/columns is up to you, but you're advised against using a constant value because it will result to ringing artifacts which has to do with the fact that the Fourier transform of a square wave being the sinc function.
Question 2: While you'll modify only a few blocks, the encoding process requires you to transform them all so they can be stored to a file.
Question 3: You have to quantise the float DCT coefficients as that's what's stored losslessly to a file. You can modify them to your heart's content after the quantisation step.
Question 4: Nobody prevents you from modifying any coefficient, but you have to remember each coefficient affects all 64 pixels in a block. The DC coefficient and the low frequency AC ones introduce the biggest distortions, so you might want to stay away from them. More specifically, because of the way the DC coefficients are stored, modifying one would propage the distortion to all following blocks.
Since most high frequency coefficients are 0, they are efficiently compressed with RLE. Modifying a 0 coefficient may flip it to a 1 (if you're doing basic LSB substitution), which disrupts this efficient compression.
Lastly, some algorithms store their secret in any non-zero coefficients and will skip any 0s. However, if you attempted to modify a 1, it might flip to a 0 and in the extraction process you'd blindly skip reading it. Therefore, such algorithms don't go near any coefficients with the value of 1 or 0.
Question 5: In decoding you just multiply the coefficient with the respective quantisation table value. For example, the DC coefficient is 309.443 and quantisation gives you round(309.443 / 16) = 19. The rounding off bit is the lossy part here, which doesn't allow you to reconstruct 309.433. So the reverse is simply 19 * 16 = 304.
Other uses of DCT in steganography
Frequency transforms, such as DCT and DWT can be used in steganography to embed the secret in the frequency domain but not necessarily store the stego image to jpeg. This process is pixels -> DCT -> coefficients -> modify coefficients -> IDCT -> pixels, which is what you send to the receiver. As such, the choice of format matters here. If you decide to save your pixels to jpeg, your secret in the DCT coefficients may be disturbed by another layer of quantisation from the jpeg encoding.
Related
I am working on a application where images at different focus planes are aquired and currently stored inside a multipage tif. Unfortunately the tif-based compression techniques does not
benefit
from the signal redundancy over the different focus planes.
I found some resourcs about this
here
ZPEG
and here
JPEG2000 Addon
unfortunately they are all far away from a standard.
I was wondering if there is probably a video codec which could achive great compression ratios in this scenario?
I am also very open very any other ideas.
Here's a different approach: turning the cross-plane redundancy into spatial redundancy and then using standard image compression.
In the simplest way, just take strips of width*1 pixel, from every plane, and stack them. As an image, that will look vertically smeared in a weird way. It's best if this lines up with DCT blocks (if applicable) to avoid having a sharp horizontal edge through a block, so it should probably be padded to a multiple of (usually) 8 planes by duplicating a plane. You could gain a bit more by optimizing the padding for minimum energy, but that's complicated whereas duplicating is already pretty good and trivial.
It obviously wouldn't compress well with unfiltered lossless compression, but PNG with a suitable filter (up, average or paeth) should work.
The problem with tiff is that it does not support inter-component decorrelation in its baseline. There are some extensions (not very broadly supported) that allow storing other file compression formats (such as a complete JPEG2000 JP2 file, extension 0x8798), but it is not guaranteed that an standard decoder will process it correctly.
If you can use any tool you want, close to optimal coding performance is probably obtained with a good spectral decorrelation transform (the KLT for lossy compression and the RKLT for lossless compression - see http://gici.uab.cat/GiciWebPage/downloads.php#spectral for a JAVA implementation of these transform) and then a good compression algorithm such as JPEG2000. On the other hand, this approach can be a bit complicated to implement and slow due to the KLT/RKLT transforms.
Other simpler approach is to simply use JPEG2000 with the DWT for spectral decorrelation. For instance, if you use the Kakadu implementation (kakadusoftware.com), you just need to use the proper parameters when compressing. Here you have an example invocation extracted from http://kakadusoftware.com/wp-content/uploads/2014/06/Usage_Examples.txt,
Ai) kdu_compress -i catscan.rawl*35#524288 -o catscan.jpx -jpx_layers *
-jpx_space sLUM Creversible=yes Sdims={512,512} Clayers=16
Mcomponents=35 Msigned=no Mprecision=12
Sprecision=12,12,12,12,12,13 Ssigned=no,no,no,no,no,yes
Mvector_size:I4=35 Mvector_coeffs:I4=2048
Mstage_inputs:I25={0,34} Mstage_outputs:I25={0,34}
Mstage_collections:I25={35,35}
Mstage_xforms:I25={DWT,1,4,3,0}
Mnum_stages=1 Mstages=25
-- Compresses a medical volume consisting of 35 slices, each 512x512,
represented in raw little-endian format with 12-bits per sample,
packed into 2 bytes per sample. This example follows example (x)
above, but adds a multi-component transform, which is implemented
using a 3 level DWT, based on the 5/3 reversible kernel (the kernel-id
is 1, which is found in the second field of the `Mstage_xforms' record.
-- To decode the above parameter attributes, note that:
a) There is only one multi-component transform stage, whose instance
index is 25 (this is the I25 suffix found on the descriptive
attributes for this stage). The value 25 is entirely arbitrary. I
picked it to make things interesting. There can, in general, be
any number of transform stages.
b) The single transform stage consists of only one transform block,
defined by the `Mstage_xforms:I25' attribute -- there can be
any number of transform blocks, in general.
c) This block takes 35 input components and produces 35 output
components, as indicated by the `Mstage_collections:I25' attribute.
d) The stage inputs and stage outputs are not permuted in this example;
they are enumerated as 0-34 in each case, as given by the
`Mstage_inputs:I25' and `Mstage_outputs:I25' attributes.
e) The transform block itself is implemented using a DWT, whose kernel
ID is 1 (this is the Part-1 5/3 reversible DWT kernel). Block
outputs are added to the offset vector whose instance index is 4
(as given by `Mvector_size:I4' and `Mvector_coeffs:I4') and the
DWT has 3 levels. The final field in the `Mstage_xforms' record
is set to 0, meaning that the canvas origin for the multi-component
DWT is to be taken as 0.
f) Since a multi-component transform is being used, the precision
and signed/unsigned properties of the final decompressed (or
original compressed) image components are given by `Mprecision'
and `Msigned', while their number is given by `Mcomponents'.
g) The `Sprecision' and `Ssigned' attributes record the precision
and signed/unsigned characteristics of what we call the codestream
components -- i.e., the components which are obtained by block
decoding and spatial inverse wavelet transformation. In this
case, the first 5 are low-pass subband components, at the bottom
of the DWT tree; the next 4 are high-pass subband components
from level 3; then come 9 high-pass components from level 2 of
the DWT; and finally the 17 high-pass components belonging to
the first DWT level. DWT normalization conventions for both
reversible and irreversible multi-component transforms dictate
that all high-pass subbands have a passband gain of 2, while
low-pass subbands have a passband gain of 1. This is why all
but the first 5 `Sprecision' values have an extra bit -- remember
that missing entries in the `Sprecision' and `Ssigned' arrays
are obtained by replicating the last supplied value.
I am trying to compress a grayscale image using Huffman coding in MATLAB, and have tried the following code.
I have used a grayscale image with size 512x512 in tif format. My problem is that the size of the compressed image (length of the compressed codeword) is getting bigger than the size of the uncompressed image. The compression ratio is getting less than 1.
clc;
clear all;
A1 = imread('fig1.tif');
[M N]=size(A1);
A = A1(:);
count = [0:1:255]; % Distinct data symbols appearing in sig
total=sum(count);
for i=1:1:size((count)');
p(i)=count(i)/total;
end
[dict,avglen]=huffmandict(count,p) % build the Huffman dictionary
comp= huffmanenco(A,dict); %encode your original image with the dictionary you just built
compression_ratio= (512*512*8)/length(comp) %computing the compression ratio
%% DECODING
Im = huffmandeco(comp,dict); % Decode the code
I11=uint8(Im);
decomp=reshape(I11,M,N);
imshow(decomp);
There is a slight error in your code. I'm assuming you want to calculate the probability of encountering each pixel, which is the normalized histogram. You're not computing it properly. Specifically:
count = [0:1:255]; % Distinct data symbols appearing in sig
total=sum(count);
for i=1:1:size((count)');
p(i)=count(i)/total;
end
total is summing over [0,255] which is not correct. You're supposed to compute the probability distribution of your image. You should use imhist for that instead. As such, you should do this instead:
count = 0:255;
p = imhist(A1) / numel(A1);
This will correctly calculate your probability distribution for your image. Remember, when you're doing Huffman coding, you need to specify the probability of encountering a pixel. Assuming that each pixel can equally be likely to be chosen, this is captured by calculating the image's histogram, then normalizing by the total number of pixels in your image. Try that and see if you get any better results.
However, Huffman will only give you good compression ratios if you have frequently occurring symbols. Did you happen to take a look at the histogram or the spread of your pixels in your image?
If the spread is quite large, with very few entries per bin, then Huffman will not give you any compression savings. In fact it may give you a larger size as a result. Bear in mind that the TIFF compression standard only uses Huffman as part of the algorithm. There is also some pre- and post-processing done to further drive down the size.
As a further example, suppose I had an image that consisted of [0, 1, 2, ... 255; 0, 1, 2, ..., 255; 0, 1, 2, ..., 255]; I have 3 rows of [0,255], but really it could be any number of rows. This means that the probability of encountering each symbol is equiprobable, or 1/255, which means that for each symbol, we would need 8 bits per symbol... which is essentially the raw pixel value anyway!
The key behind Huffman is that a group of bits together generate one symbol. Frequently occurring symbols get assigned a smaller sequence of bits. Because this particular image that I talked about has intensities that are equiprobable, then you'd only generate one symbol per intensity rather than a group. With this, not only will you transmit the dictionary, you would effectively be sending one character at a time, and this is no better than sending the raw byte stream.
If you want your image to be compressed by raw Huffman, the distribution of pixels has to be skewed. For example, if most of the intensities in your image are dark, or are bright. If your image has good contrast or if the spread of the pixel intensities is flat throughout the image, then Huffman will not give you any compression savings.
My team wish to calculate the contrast between two photographs taken in a wet environment.
We will calculate contrast using the formula
Contrast = SQRT((ΔL)^2 + (Δa)^2 + (Δb)^2)
where ΔL is the difference in luminosity, Δa is the difference in (redness-greeness) and Δb is (yellowness-blueness), which are the dimensions of Lab space.
Our (so far successful) approach has been to convert each pixel from RGB to Lab space, and taking the mean values of the relevant sections of the image as our A and B variables.
However the environment limits us to using a (waterproof) GoPro camera which compresses images to JPEG format, rather than saving as TIFF, so we are not using a true-colour image.
We now need to quantify the uncertainty in the contrast - for which we need to know the uncertainty in A and B and by extension the uncertainties (or mean/typical uncertainty) in each a and b value for each RGB pixel. We can calculate this only if we know the typical/maximum uncertainty produced when converting from true-colour to JPEG.
Therefore we need to know the maximum possible difference in each of the RGB channels when saving in JPEG format.
EG. if true colour RGB pixel (5, 7, 9) became (2, 9, 13) after compression the uncertainty in each channel would be (+/- 3, +/- 2, +/- 4).
We believe that the camera compresses colour in the aspect ratio 4:2:0 - is there a way to test this?
However our main question is; is there any way of knowing the maximum possible error in each channel, or calculating the uncertainty from the compressed RGB result?
Note: We know it is impossible to convert back from JPEG to TIFF as JPEG compression is lossy. We merely need to quantify the extent of this loss on colour.
In short, it is not possible to absolutely quantify the maximum possible difference in digital counts in a JPEG image.
You highlight one of these points well already. When image data is encoded using the JPEG standard, it is first converted to the YCbCr color space.
Once in this color space, the chroma channels (Cb and Cr) are downsampled, because the human visual system is less sensitive to artifacts in chroma information than it is lightness information.
The error introduced here is content-dependent; an area of very rapidly varying chroma and hue will have considerably more content loss than an area of constant hue/chroma.
Even knowing the 4:2:0 compression, which describes the amount and geometry of downsampling (more information here), the content still dictates the error introduced at this step.
Another problem is the quantization performed in JPEG compression.
The resulting information is encoded using a Discrete Cosine Transform. In the transformed space, the results are again quantized depending on the desired quality. This quantization is set at the time of file generation, which is performed in-camera. Again, even if you knew the exact DCT quantization being performed by the camera, the actual effect on RGB digital counts is ultimately content-dependent.
Yet another difficulty is noise created by DCT block artifacts, which (again) is content dependent.
These scene dependencies make the algorithm very good for visual image compression, but very difficult to characterize absolutely.
However, there is some light at the end of the tunnel. JPEG compression will cause significantly more error in areas of rapidly changing image content. Areas of constant color and texture will have significantly less compression error and artifacts. Depending on your application you may be able to leverage this to your benefit.
I'm trying to implement JPEG Compression (or as close to it as I can), but there are some points that I need clarity on with the actual implementation. I will explain what I currently know and where I see the issues, if anyone could clear them up that would be fantastic.
The first step is to split the image into 8x8 blocks. But I am not sure on the best way to do this, for example, what dimension array would be best to use to store all of these segments considering that it will have to have chroma down sampled and then DCT applied. Would it be 3D array (two dimensions to store 2D elements of the image and then one dimension for the color channels) and then iterate through in groups of 8 or a 4D array (with an extra dimension for storing each 8x8 group) or another method entirely.
I can then potentially see issues with the chrominance down sampling because then the size of the array would have to change size once the number of chrominance values have been reduced and these would then have to be put into the DCT which cant really take all of the different sized arrays for chrominance and luminescence at the same time.
Also is the idea of the DCT that it takes all three color channels of the 8x8 group and then converts the three values to one value thus saving space or does it take each color channel one at a time (if so I don't really understand the point how converting to Fourier space makes the compression more efficient)? Also I have noticed that the values I get for the DCT are well out of bounds of 0-255 and are instead much higher. As far as I know these values for each 8x8 block would be divided by the IJG standard quantization matrix follow by different entropy encoding.
I realize this question covers a lot of areas and is quite messy, but I can provide any additional information if required, any help would be greatly appreciated.
"Digital Video Compression" by Peter Symes has a chapter on JPEG, and is a good introduction to compression in general.
The reference implementation for JPEG might be a good start.
The JPEG compression encoding process splits a given image into blocks of 8x8 pixels, working with these blocks in future lossy and lossless compressions. [source]
It is also mentioned that if the image is a multiple 1MCU block (defined as a Minimum Coded Unit, 'usually 16 pixels in both directions') that lossless alterations to a JPEG can be performed. [source]
I am working with product images and would like to know both if, and how much benefit can be derived from using multiples of 16 in my final image size (say, using an image with size 480px by 360px) vs. a non-multiple of 16 (such as 484x362). In this example I am not interested in further alterations, editing, or recompression of the final image.
To try to get closer to a specific answer where I know there must be largely generalities: Given a 480x360 image that is 64k and saved at maximum quality in Photoshop [example]:
Can I expect any quality loss from an image that is 484x362
What amount of file size addition can I expect (for this example, the additional space would be white pixels)
Are there any other disadvantages to growing larger than the 8px grid?
I know it's arbitrary to use that specific example, but it would still be helpful (for me and potentially any others pondering an image size) to understand what level of compromise I'd be dealing with in breaking the non-8px grid.
The key issue here is a debate I've had is whether 8-pixel divisible images are higher quality than images that are not divisible by 8-pixels.
8 pixels is the cutoff. The reason is because JPEG images are simply an array of 8x8 DCT blocks; if the image resolution isn't mod8 in both directions, the encoder has to pad the sides up to the next mod8 resolution. This in practice is not very expensive bit-wise; what's much worse are the cases when an image has sharp black lines (such as a letterboxed image) that don't lie on block boundaries. This is especially problematic in video encoding. The reason for this being a problem is that the frequency transform of a sharp line is a Gaussian distribution of coefficients--resulting in an enormous number of bits to code.
For those curious, the most common method of padding edges in intra compression (such as JPEG images) is to mirror the lines of pixels before the edge. For example, if you need to pad three lines and line X is the edge, line X+1 is equal to line X, line X+2 is equal to line X-1, and line X+3 is equal to line X-2. This quite effectively minimizes the cost in transform coefficients of the extra lines.
In inter coding, however, the padding algorithms generally simply duplicate the last line, because the mirror method does not work well for inter compression, such as in video compression.
Sometimes you need to use 16 pixel boundaries rather than 8 because of subsampling; every 2nd pixel is thrown away during the encoding process, and those 8x8 DCT blocks started out as 16x16 and will decode back to 16x16. This won't be a problem at the highest quality settings.
A JPG with sizes being multiplies of 8 can also be rotated/flipped with no quality loss. For example gthumb can do this on Linux.
The image dimensions being multiples of 8 or 16 is not going to affect the size on disk very much, but you can get dramatic savings if you can line up the visual contents to the 8x8 pixel grid, such as if there is a repeating pattern or texture in the image.
What Tometzky said. If you don't have the correct multiple, the lossless flip and rotate algorithms don't work. That's because the padding on the right/bottom that can be safely ignored now ends up on the left/top, where it can't.