I know how to find the maximum possible capacity of an image given the resolution, image type, and bits per pixel for hiding. How do I find the estimated message size?
Say the image is 100 x 200 pixels and is an 8-bit paletted image, and we are hiding with 4-bit LSB. What would the estimated message size be?
The estimated message length is the total length of 1s and 0s you will embed. This is composed of the header (optional) and message stream.
Size of message (message stream)
This depends on the size of the message and how you hide it. Generally, you want to ask what form your message takes when you convert it to 1s and 0s (message stream). The numbers 0-255 and ASCII characters can be represented by 8 bits each. The most straightforward examples include:
plain text: number of characters x 8
binary (black and white) image: height x width
grayscale image: height x width x 8
colour image: height x width x 24
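As a sketch of the four cases in Python (the function name and type labels are my own, just for illustration):

```python
def message_stream_bits(kind, chars=0, height=0, width=0):
    """Number of bits in the raw message stream for the four types above."""
    if kind == "text":       # 8 bits per ASCII character
        return chars * 8
    if kind == "binary":     # 1 bit per pixel (black and white)
        return height * width
    if kind == "grayscale":  # 8 bits per pixel
        return height * width * 8
    if kind == "colour":     # 24 bits per pixel (RGB)
        return height * width * 24
    raise ValueError("unknown message type")

print(message_stream_bits("text", chars=11))                     # 88
print(message_stream_bits("grayscale", height=151, width=256))   # 309248
```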
You can also decide to compress your message before embedding, with, for example, Huffman coding. This should convert your message to fewer bits than in the above examples, but you will also need to include your decoding table in the message so that the recipient can decode it. Overall, Huffman compression results in a shorter total length, even including the table, for as little as a few hundred characters.
Metadata (header)
Speaking of metadata, in many cases, embedding just your message stream is not enough. For a message stream which is shorter than the maximum message capacity, you have 3 options on what to do with the remaining space:
fill it with random 1s and 0s (this effectively introduces the most distortion to the cover image),
do nothing, or
stretch the message so that it takes up as much of the maximum message capacity as possible, e.g. matrix encoding in the F5 steganographic system.
If you decide to do nothing, you may need to tell the recipient how many pixels they have to read to extract the whole message, so as not to carry on reading gibberish. You can either state the total number of bits of the message stream, or how many pixels to read until all the information is extracted. For the former option, the tendency is to devote 32 bits to the message length, but that can be quite the overkill. You can either set a more practical limit, or adopt a more dynamic approach.
A practical limit would be 24 bits, if you assume you will never use a bigger cover than a 1920x1200 grayscale image with 4-bit LSB embedding (1920x1200x4 = 9216000 < 2^24 maximum storage capacity).
A more dynamic approach would be to estimate the minimum number of bits required to represent the message length, e.g. 8 bits for a message length of up to 256, 9 bits for up to 512, etc. Then encode this number as a 5-bit value, followed by the message length itself. For example, if the message length is 3546 bits, using 32 bits to encode the length it becomes 00000000000000000000110111011010. But with the dynamic approach, it is 01100 110111011010, where 01100 is binary for 12, which says to read the 12 following bits to obtain the message length.
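A sketch of this dynamic length field in Python (the function name is my own; it returns the bit string with a space between the 5-bit prefix and the length, for readability):

```python
def encode_length_dynamic(n):
    """Encode message length n as a 5-bit bit-count followed by that many bits."""
    body = format(n, "b")              # minimal binary representation of n
    prefix = format(len(body), "05b")  # how many bits follow, in 5 bits
    return prefix + " " + body

print(encode_length_dynamic(3546))   # '01100 110111011010'
```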
If your program handles both text and images as the secret message, you'll also need to tell the recipient the type of the secret. If you are ever only going to use the four above types, I would encode that using two bits: plain text = 00, binary image = 01, grayscale image = 10, colour image = 11.
If the secret is an image, you'll also need to provide the height and width. Two 16-bit numbers are the general tendency here. But similarly to the message length above, you can use something more practical. For example, if you expect no image dimension to exceed 2047 pixels, 11x2 bits will be enough to encode this information.
If you embed your secret in more than the last LSB, your message length may not be divisible by that number. For example, take a message length of 301 when you embed in the 4-bit LSB. In this case, you need to pad the message with 3 more junk 1s or 0s so that it becomes divisible by 4. Now, 304 is your reported message stream length, but after you extract it, you can discard the last 3 bits. It is logical to assume you will never embed in more than the 7-bit LSB, so devoting 3 bits to the padding field should be more than enough.
Depending on what you choose to include in the metadata, you can stitch all of these together and call them header.
Examples
Let's do a couple of examples to see this in action. Let the header format be, in order: message length, secret type, padding, height, width (the last two only if necessary).
Embed the string 'Hello World' using 4-bit LSB
The message stream is 11 x 8 = 88 bits.
88 mod 4 = 0, so no padding is needed and the padding field is 000 (3 bits).
The message length is 88 = 00111 1011000 (12 bits).
Secret type is text, so 00 (2 bits).
Estimated message length: Header + message stream = (12 + 2 + 3) + 88 = 105 bits.
Embed a grayscale image of size 151 x 256 using 3-bit LSB
The message stream is 151 x 256 x 8 = 309248 bits.
309248 mod 3 = 2, so the padding is 3 - 2 = 1 bit, encoded as 001 (3 bits).
The message length is 309249 = 10011 1001011100000000001 (24 bits).
Secret type is grayscale image, so 10 (2 bits).
The secret is an image, so add the width and height as two 16-bit numbers (32 bits).
Estimated message length: Header + message stream = (24 + 2 + 3 + 32) + 309249 = 309310 bits.
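Putting the pieces together, both worked examples can be reproduced with a short sketch (Python; the function name is my own, and the header layout follows the examples above, with 16-bit height/width fields included only for image secrets):

```python
def estimated_message_length(stream_bits, secret_type, bits_per_pixel,
                             height=None, width=None):
    """Header + message stream, using the header format from the examples:
    dynamic message length, 2-bit secret type, 3-bit padding field,
    plus 16-bit height and width when the secret is an image."""
    # pad the stream so it divides evenly by the embedding depth
    padding = (-stream_bits) % bits_per_pixel
    padded = stream_bits + padding
    # dynamic length field: 5-bit size prefix + minimal binary length
    length_field = 5 + len(format(padded, "b"))
    header = length_field + 2 + 3           # length + type + padding fields
    if secret_type != "text":               # images also carry height and width
        header += 16 + 16
    return header + padded

print(estimated_message_length(88, "text", 4))                     # 105
print(estimated_message_length(309248, "grayscale", 3, 151, 256))  # 309310
```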
This is all of the information I was provided in the practice question. I am trying to figure out how to calculate it when prompted to do so on an exam...
How to determine the number of bytes necessary to store an uncompressed grayscale image of size 8000 × 3400 pixels?
I am also curious how the calculation changes if the image is a compressed binary image.
"I am trying to figure out how to calculate it when prompted to do so on an exam."
There are 8 bits in 1 byte, so once you know how many bits per pixel (bpp) you have, this is a very simple calculation.
For 8 bits per pixel greyscale, just multiply the width by the height.
8000 * 3400 = 27200000 bytes.
For 1 bit per pixel black&white, multiply the width by the height and then divide by 8.
(8000 * 3400) / 8 = 3400000 bytes.
It's critical that the image is uncompressed, and that there's no padding at the end of each raster line. Otherwise the count will be off.
The first thing to work out is how many pixels you have. That is easy, it is just the width of the image multiplied by the height:
N = w * h
So, in your case:
N = 8000 * 3400 = 27200000 pixels
Next, in general you need to work out how many samples (S) you have at each of those 27200000 pixel locations in the image. That depends on the type of the image:
if the image is greyscale, you will have a single grey value at each location, so S=1
if the image is greyscale and has transparency as well, you will have a grey value plus a transparency (alpha) value at each location, so S=2
if the image is colour, you will have three samples for each pixel - one Red sample, one Green sample and one Blue sample, so S=3
if the image is colour and has transparency as well, you will get the 3 RGB values plus a transparency (alpha) value for each pixel, so S=4
there are others, but let's not get too complicated
The final piece of the jigsaw is how big each sample is, or how much storage it takes, i.e. the bytes per sample (B).
8-bit data takes 1 byte per sample, so B=1
16-bit data takes 2 bytes per sample, so B=2
32-bit floating point or integer data take 4 bytes per sample, so B=4
there are others, but let's not get too complicated
So, the actual answer for an uncompressed greyscale image is:
storage required = w * h * S * B
and in your specific case:
storage required = 8000 * 3400 * 1 * 1 = 27200000 bytes
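Both answers boil down to the same arithmetic; a minimal Python sketch (the function name is my own):

```python
def storage_bytes(width, height, samples_per_pixel, bytes_per_sample):
    """Uncompressed storage in bytes: w * h * S * B (no row padding)."""
    return width * height * samples_per_pixel * bytes_per_sample

# 8000 x 3400 8-bit greyscale: S = 1 (grey only), B = 1 (1 byte per sample)
print(storage_bytes(8000, 3400, 1, 1))   # 27200000

# 1 bit-per-pixel black & white doesn't fit an integer B; count bits, then divide:
print((8000 * 3400) // 8)                # 3400000
```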
If the image were compressed, the only thing you should hope and expect is that it takes less storage. The actual amount required will depend on:
how repetitive/predictable the image is - the more predictable the image is, in general, the better it will compress
how many colours the image contains - fewer colours generally means better compression
which image file format you require (PNG, JPEG, TIFF, GIF)
which compression algorithm you use (RLE, LZW, DCT)
how long you are prepared to wait for compression and decompression - the longer you can wait, the better you can compress in general
what losses/inaccuracies you are prepared to tolerate to save space - if you are prepared to accept a lower quality version of your image, you can get a smaller file
Determine the number of bytes necessary to store an uncompressed RGB color image of size 640 × 480 pixels using 8, 10, 12, and 14 bits per color channel?
I know how to calculate the size of an image by using Size = (rows * columns * bpp), but I cannot understand what bits per color channel means in this question.
Bits per color channel is the number of bits used for storing a color component of a single pixel.
RGB color space has 3 channels: Red, Green and Blue.
The "bits per color channel" (bpc) is the number of bits that are used for storing each component (e.g 8 bits for red, 8 bits for green, 8 bits for blue).
The dynamic range of 8 bits is [0, 255] (255 = 2^8-1).
8 bpc gives 24 bits per pixel (bpp).
The number of bits per pixel defines the Color Depth of the image.
24 bpp can represent 2^24 = 16,777,216 different colors.
More bits give a larger range: the 12-bit range is [0, 4095] (4095 = 2^12-1), so a much larger variety of colors can be coded in each pixel.
12 bpc gives 36 bpp, and can represent 2^36 = 68,719,476,736 different colors.
For more information refer to BIT DEPTH TUTORIAL
Remark: The bits per channel is not directly related to memory storage (e.g. it's common to store 12 bit in 2 bytes [16 bits] in memory).
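A short Python sketch of both readings — bits packed back to back, versus each sample rounded up to whole bytes, which the remark above notes is common in memory (the function name is my own):

```python
def rgb_bytes(width, height, bpc, packed=True):
    """Bytes for an uncompressed RGB image with `bpc` bits per color channel.
    packed=True:  bits stored back to back -> w * h * 3 * bpc / 8
    packed=False: each sample rounded up to whole bytes (common in memory)"""
    if packed:
        return width * height * 3 * bpc // 8
    bytes_per_sample = (bpc + 7) // 8     # e.g. 12 bpc stored in 2 bytes
    return width * height * 3 * bytes_per_sample

for bpc in (8, 10, 12, 14):
    print(bpc, rgb_bytes(640, 480, bpc))
```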
As you probably know, an image is built as a matrix of pixels.
(Figures omitted: the structure of an RGB image, and pixels with 8, 10, and 12 bits per color channel.)
The subject is much wider than that, but I think that's enough...
I modified some Labview code I found online to use in my program. It works, I understand nearly all of it, but there's one section that confuses me. This is the program:
This program takes 2 images, subtracts them, and returns the picture plus a percentage difference. What I understand is it takes the pictures, subtracts them, converts the subtracted image into an array of colored pixels, then math happens, and the pixels are compared to the threshold. It adds a 1 for every pixel greater than the threshold, divides it by the image size, and out comes a percentage. The part I don't understand is the math part, the whole quotient and remainder section with a "random" 256. Because I don't understand how to get these numbers, I have a percentage, but I don't understand what they mean. Here's a picture of the front panel with 2 different tests.
In the top one, I have a percentage of 15, and in the bottom a percentage of 96. This tells me that the bottom one is "96 percent different". But is there any way to make sure this is accurate?
The other question I have is threshold, as I don't know exactly what that does either. Like if I change the threshold on the bottom image to 30, my percentage is 8%, with the same picture.
I'm sure once I understand the quotient/remainder part, it'll all make sense, but I can't seem to get it. Thank you for your help.
My best guess is that someone tried to characterize the difference between 2 images with a single number. The remainder-quotient part is a "poor man's" approach to split each 2D array element of the difference into the 2 lower bytes (2 remainders) and the upper 2-byte word. Then the 2 lower bytes of the difference are summed and the result is added to the upper 2 bytes (as a word). Maybe the 3 different bytes each represented a different channel of the camera (e.g. RGB color)?
Then, the value is compared against the threshold, and the number of pixels above the threshold is counted. This number is divided by the total number of pixels to calculate the % difference. So the result is the % of pixels which differ from the master image by more than the threshold.
E.g. if certain pixel of your image was 0x00112233 and corresponding master image pixel had a value of 0x00011122, then the number compared to the threshold is (0x11 - 0x01) + (0x22 - 0x11) + (0x33 - 0x22) = 0x10 + 0x11 + 0x11 = 0x32 = 50 decimal.
Whether this is the best possible comparison/difference criteria is the question well outside of this topic.
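Assuming that guess is right, the remainder-quotient logic can be sketched in Python (divmod by 256 peels off one byte at a time; the function name is hypothetical):

```python
def channel_diff(pixel, master):
    """Split two 32-bit pixel values into bytes with divmod(x, 256)
    (the 'remainder-quotient' trick) and sum the per-channel differences."""
    total = 0
    for _ in range(3):                    # three 8-bit channels
        pixel, p = divmod(pixel, 256)     # quotient carries the upper bytes
        master, m = divmod(master, 256)   # remainder is the current channel
        total += p - m
    return total

print(channel_diff(0x00112233, 0x00011122))   # 50, as in the example above
```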
When you open an image in a text editor you get some characters which don't really make sense (at least not to me). Is there a way to add comments to that text, so the file would not appear damaged when opened with an image viewer?
The way to go is setting metadata fields if your image format supports any.
For example, for a PNG you can set a comment field when exporting the file or with a separate tool like exiftool:
exiftool -comment="One does not simply put text into the image data" test.png
If the purpose of the text is to ensure ownership then take a look at digital watermarking.
If you are looking to actually encode information in your image you should use steganography ( https://en.wikipedia.org/wiki/Steganography )
The wiki article runs you through the basics and shows an example of a picture of a cat hidden in a picture of trees as an example of how you can hide information. In the case of hiding text you can do the following:
Encoding
Come up with your phrase: for argument's sake I'll use the word Hidden
Convert that text to a numeric representation - for simplicity I'll assume ASCII conversion of characters, but you don't have to
"Hidden" = 72 105 100 100 101 110
Convert the numeric representation to Binary
72 = 01001000 / 105 = 01101001 / 100 = 01100100 / 101 = 01100101 / 110 = 01101110
For each letter convert the 8 bit binary representations into four 2 bit binary representations that we shall call mA,mR,mG,mB for reasons that will become clear shortly
72 = 01 00 10 00 => 1 0 2 0 = mA mR mG mB
Open an image file for editing: I would suggest using C# to load the image and then use Get/Set Pixels to edit them (How to manipulate images at the pixel level in C# )
use the last 2 bits of each color channel for each pixel to encode your message. For example to encode H in the first pixel of an image you can use the C# code at the end of the instructions
Once all letters of the Word - one per pixel - have been encoded in the image you are done.
Decoding
Use the same basic process in reverse.
You walk through the image one pixel at a time
You take the 2 least significant bits of each color channel in the pixel
You concatenate the LSB together in alpha,red,green,blue order.
You convert the concatenated bits into an 8 bit representation and then convert that binary form to base 10. Finally, you perform a look up on the base 10 number in an ASCII chart, or just cast the number to a char.
You repeat for the next pixel
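The decoding steps above can be sketched in Python (a hypothetical decode_pixels helper operating on a plain list of (A, R, G, B) tuples rather than a real image, to keep it self-contained):

```python
def decode_pixels(pixels):
    """Recover one ASCII character per pixel from the 2 LSBs of each channel,
    concatenated in alpha, red, green, blue order."""
    chars = []
    for a, r, g, b in pixels:
        code = ((a & 3) << 6) | ((r & 3) << 4) | ((g & 3) << 2) | (b & 3)
        chars.append(chr(code))
    return "".join(chars)

# a pixel whose channel LSBs are 01 00 10 00 decodes to chr(0b01001000) = 'H'
print(decode_pixels([(253, 252, 254, 252)]))   # 'H'
```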
The thing to remember is that the technique I described will allow you to encode information in the image without a human observer noticing because it only manipulates the image on the last 2 bits of each color channel in a single pixel, and human eyes cannot really tell the difference between the colors in the range of [(252,252,252,252) => (255,255,255,255)].
But as food for thought, I will mention that a computer can with the right algorithms, and there is active research into bettering the ability of a computer to be able to pick this sort of thing out.
So if you only want to put in a watermark then this should work, but if you want to actually hide something you have to encrypt the message and then perform the steganography on the encrypted binary. Since encrypted data can be larger than the plain text, it may require an image with more pixels.
Here is the code to encode H into the first pixel of your image in C#.
// H = 72 needs the following message Alpha, Red, Green, Blue components
int mA = 1;
int mR = 0;
int mG = 2;
int mB = 0;
Bitmap myBitmap = new Bitmap("YourImage.bmp");
// pixel (0, 0) is the first pixel
Color pixelColor = myBitmap.GetPixel(0, 0);
// ANDing with 252 (11111100) clears the 2 low bits of each channel;
// ORing with the message bits then writes the message into them
pixelColor = Color.FromArgb(
    (pixelColor.A & 252) | mA,
    (pixelColor.R & 252) | mR,
    (pixelColor.G & 252) | mG,
    (pixelColor.B & 252) | mB);
myBitmap.SetPixel(0, 0, pixelColor);
I wanted to use CreateBitmapFromMemory method, and its requires the stride as an input. and this stride confused me.
cbStride [in]
Type: UINT
The number of bytes between successive scanlines in pbBuffer.
and here it says: stride = image width + padding
Why do we need this extra space (padding)? Why not just the image width?
Is this how to calculate the stride?
lWidthByte = (lWidth * bits + 7) / 8;
lWidth→pixel count
bits→bits per pixel
I suppose dividing by 8 converts bits to bytes, but what is the +7 doing here?
and finally
cbStride =((lWidthByte + 3) / 4) * 4;
Whats going on here? (why not cbStride = lWidthByte)
Please help me clear these up.
The use of padding is due to various (old and current) memory layout optimizations.
Having image pixel rows with a length (in bytes) that is an integral multiple of 4/8/16 bytes can significantly simplify and optimize many image-based operations. The reason is that these sizes allow proper storage and parallel pixel processing in the CPU registers, e.g. with SSE/MMX, without mixing pixels from two consecutive rows.
Without padding, extra code has to be inserted to handle partial WORD/DWORD pixel data since two consecutive pixels in memory might refer to one pixel on the right of one row and the left pixel on the next row.
If your image is a single-channel image with 8-bit depth, i.e. grayscale in the range [0, 255], then the stride would be the image width rounded up to the nearest multiple of 4 or 8 bytes. Note that the stride is always specified in bytes, even when a pixel may be more than one byte deep.
For images with more channels and/or more than one byte per pixel/channel, the stride would be the image width in bytes rounded up to the nearest multiple of 4 or 8 bytes.
The +7 and the similar arithmetic in the examples you gave just make sure that the numbers are correctly rounded up, since integer math truncates the non-integer component of the division.
Just insert some numbers and see how it works. Don't forget to truncate (floor()) the intermediate division results.
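Putting both formulas together (a sketch in Python; the function name is my own, and align=4 matches the DWORD alignment in the question, though some APIs use other values):

```python
def stride_bytes(width_px, bits_per_pixel, align=4):
    """Bytes per scanline: width in bytes, rounded up to a multiple of `align`."""
    width_bytes = (width_px * bits_per_pixel + 7) // 8   # ceil bits to whole bytes
    return (width_bytes + align - 1) // align * align    # ceil to the alignment

print(stride_bytes(10, 8))    # 10 bytes  -> padded to 12
print(stride_bytes(10, 24))   # 30 bytes  -> padded to 32
print(stride_bytes(10, 1))    # 10 bits = 2 bytes -> padded to 4
```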