Stride and padding of an image

I wanted to use the CreateBitmapFromMemory method, and it requires the stride as an input. This stride confused me.
cbStride [in]
Type: UINT
The number of bytes between successive scanlines in pbBuffer.
And here it says: stride = image width + padding.
Why do we need this extra space (padding)? Why not just the image width?
This is how to calculate the stride, right?
lWidthByte = (lWidth * bits + 7) / 8;
lWidth→pixel count
bits→bits per pixel
I suppose dividing by 8 is to convert bits to bytes, but
what is the (+7) doing here?
and finally
cbStride =((lWidthByte + 3) / 4) * 4;
What's going on here? (Why not cbStride = lWidthByte?)
Please help me clear these up.

The use of padding is due to various (old and current) memory layout optimizations.
Having image pixel rows with a length (in bytes) that is an integral multiple of 4/8/16 bytes can significantly simplify and optimize many image-based operations. The reason is that these sizes allow proper storage and parallel pixel processing in the CPU registers, e.g. with SSE/MMX, without mixing pixels from two consecutive rows.
Without padding, extra code has to be inserted to handle partial WORD/DWORD pixel data, since two consecutive pixels in memory might be the rightmost pixel of one row and the leftmost pixel of the next row.
If your image is a single-channel image with 8-bit depth, i.e. grayscale in the range [0,255], then the stride would be the image width rounded up to the nearest multiple of 4 or 8 bytes. Note that the stride is always specified in bytes, even when a pixel may occupy more than one byte.
For images with more channels and/or more than one byte per pixel/channel, the stride would be the image width in bytes rounded up to the nearest multiple of 4 or 8 bytes.
The +7 and the similar arithmetic in the examples you gave just make sure that the numbers are rounded up correctly, since integer math truncates the fractional part of the division.
Just insert some numbers and see how it works. Don't forget to truncate (floor()) the intermediate division results.
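As a concrete illustration, here is a small C sketch of that rounding (the variable names mirror the ones from the question; the example width and bit depth are my own):

#include <stdio.h>

/* Round a bit count up to whole bytes, then round the byte count up to the
   next multiple of 4 (DWORD alignment). Integer division truncates, so
   adding (divisor - 1) before dividing turns truncation into a ceiling. */
unsigned int ComputeStride(unsigned int lWidth, unsigned int bits)
{
    unsigned int lWidthByte = (lWidth * bits + 7) / 8;    /* ceil(lWidth * bits / 8) */
    unsigned int cbStride   = ((lWidthByte + 3) / 4) * 4; /* round up to multiple of 4 */
    return cbStride;
}

int main(void)
{
    /* A 10-pixel-wide, 24-bpp row: 30 bytes of pixel data, padded to 32. */
    printf("%u\n", ComputeStride(10, 24)); /* prints 32 */
    return 0;
}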

Related

How to determine the number of bytes necessary to store an uncompressed grayscale image of size 8000 × 3400 pixels?

This is all of the information I was provided in the practice question. I am trying to figure out how to calculate it when prompted to do so on an exam...
How to determine the number of bytes necessary to store an uncompressed grayscale image of size 8000 × 3400 pixels?
I am also curious how the calculation changes if the image is a compressed binary image.
"I am trying to figure out how to calculate it when prompted to do so on an exam."
There are 8 bits to make 1 byte, so once you know how many bits-per-pixel (bpp) you have, this is a very simple calculation.
For 8 bits per pixel greyscale, just multiply the width by the height.
8000 * 3400 = 27200000 bytes.
For 1 bit per pixel black&white, multiply the width by the height and then divide by 8.
(8000 * 3400) / 8 = 3400000 bytes.
It's critical that the image is uncompressed, and that there's no padding at the end of each raster line. Otherwise the count will be off.
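A minimal sketch of both calculations in C, assuming no compression and no row padding as stated above:

#include <stdio.h>

int main(void)
{
    unsigned long w = 8000, h = 3400;

    /* 8 bits per pixel greyscale: one byte per pixel */
    unsigned long bytes8bpp = w * h;        /* 27200000 */

    /* 1 bit per pixel black & white: eight pixels per byte (w is divisible by 8 here) */
    unsigned long bytes1bpp = (w * h) / 8;  /* 3400000 */

    printf("%lu %lu\n", bytes8bpp, bytes1bpp);
    return 0;
}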
The first thing to work out is how many pixels you have. That is easy, it is just the width of the image multiplied by the height:
N = w * h
So, in your case:
N = 8000 * 3400 = 27200000 pixels
Next, in general you need to work out how many samples (S) you have at each of those 27200000 pixel locations in the image. That depends on the type of the image:
if the image is greyscale, you will have a single grey value at each location, so S=1
if the image is greyscale and has transparency as well, you will have a grey value plus a transparency (alpha) value at each location, so S=2
if the image is colour, you will have three samples for each pixel - one Red sample, one Green sample and one Blue sample, so S=3
if the image is colour and has transparency as well, you will get the 3 RGB values plus a transparency (alpha) value for each pixel, so S=4
there are others, but let's not get too complicated
The final piece of the jigsaw is how big each sample is, or how much storage it takes, i.e. the bytes per sample (B).
8-bit data takes 1 byte per sample, so B=1
16-bit data takes 2 bytes per sample, so B=2
32-bit floating point or integer data take 4 bytes per sample, so B=4
there are others, but let's not get too complicated
So, the actual answer for an uncompressed greyscale image is:
storage required = w * h * S * B
and in your specific case:
storage required = 8000 * 3400 * 1 * 1 = 27200000 bytes
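Expressed in code, the general formula above might look like the following sketch (S and B are the sample count and bytes per sample defined above):

#include <stdio.h>

/* Uncompressed, unpadded storage in bytes: width * height * samples * bytes per sample. */
unsigned long long StorageBytes(unsigned long w, unsigned long h,
                                unsigned int samples, unsigned int bytesPerSample)
{
    return (unsigned long long)w * h * samples * bytesPerSample;
}

int main(void)
{
    /* 8000 x 3400 greyscale (S = 1), 8-bit data (B = 1) */
    printf("%llu\n", StorageBytes(8000, 3400, 1, 1)); /* prints 27200000 */
    return 0;
}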
If the image were compressed, the only thing you should hope and expect is that it takes less storage. The actual amount required will depend on:
how repetitive/predictable the image is - the more predictable the image is, in general, the better it will compress
how many colours the image contains - fewer colours generally means better compression
which image file format you require (PNG, JPEG, TIFF, GIF)
which compression algorithm you use (RLE, LZW, DCT)
how long you are prepared to wait for compression and decompression - the longer you can wait, the better you can compress in general
what losses/inaccuracies you are prepared to tolerate to save space - if you are prepared to accept a lower quality version of your image, you can get a smaller file

Masking out the upper or lower 4 bits results in the same image

I am processing a 12-bit image which is unfortunately stored as a 16-bit TIFF. However, I do not know which 4 of the 16 bits are useless. So I tried three methods: masking each pixel with 0xFFF0, 0x0FFF, or 0x0FF0. The resulting images of these three methods look just the same to me, but their MD5 values are different. Why does this happen? Are there any differences if I use any of these three images for other purposes later?
Computer monitors can only display 256 distinct brightness levels, so a 12-bit image has its lower 4 bits ignored for display. That is why you see no difference whether you zero out those bits or not.
When a 12-bit image is stored in a 16-bit integer, the upper 4 bits are usually left at zero, so there is no difference whether you zero them or not. [Sometimes the pixel value is scaled to occupy the full 16-bit range, but this is not usually the case.]
So my recommendation is: don't mask out any bits. Zeroing out the lower 4 bits just reduces the precision of the values in the image, making it equivalent to an 8-bit image. Masking the upper 4 bits is pointless because they are already zero.
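To see why the three masked images looked identical on screen, here is a small sketch assuming, as described above, that the 12-bit data sits in the low 12 bits of each 16-bit sample (the sample value is made up):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint16_t sample = 0x0ABC;                /* a 12-bit value stored in a 16-bit word */

    uint16_t keepUpper12 = sample & 0xFFF0;  /* drops the 4 least significant bits */
    uint16_t keepLower12 = sample & 0x0FFF;  /* drops the (already zero) top 4 bits */
    uint16_t keepMiddle8 = sample & 0x0FF0;  /* drops both */

    /* A display pipeline that only uses the top 8 of the 12 significant bits
       shows the same brightness for all three, even though the stored values
       (and therefore the file checksums) differ. */
    printf("%03X %03X %03X -> displayed as %02X %02X %02X\n",
           keepUpper12, keepLower12, keepMiddle8,
           (keepUpper12 >> 4) & 0xFF, (keepLower12 >> 4) & 0xFF, (keepMiddle8 >> 4) & 0xFF);
    return 0;
}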

Specifying a pitch for SetDIBitsToDevice

I am displaying bitmaps with the function SetDIBitsToDevice. This function knows about the total image size via a LPBITMAPINFO structure that has Width and Height fields. It also knows about the region of interest to be drawn via the arguments XDest, YDest, Width, Height. All these are specified in pixels.
So far so good when the image is stored as a canonical one, i.e. with a row pitch (number of bytes between a pixel and the one immediately below) that matches the image width in bytes, with padding (if necessary) to reach the next multiple of four bytes.
For technical reasons, I have images with a larger pitch (but still a multiple of four). For instance, width=1000 but pitch=1024. For a grayscale image (1 byte per pixel), I can trick the function by declaring a width of 1024 in LPBITMAPINFO and a width of 1000 when passed to SetDIBitsToDevice.
But for a 3-bytes-per-pixel image (RGB), I am stuck, because 1024 bytes do not correspond to an integer number of pixels, and I see no way to specify that pitch.
Do you see a workaround, or something I missed in the documentation? (I don't think the SizeImage field can be of any use.)
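For reference, the working 8-bpp trick described above would look roughly like this sketch (the structure and function names are the standard GDI ones, but the helper function, the top-down orientation and the grayscale palette setup are my own assumptions, and this does not solve the 24-bpp case):

#include <windows.h>
#include <stdlib.h>

/* Hypothetical helper: draw a width x height 8-bpp grayscale image whose rows
   are 'pitch' bytes apart by declaring the pitch as the bitmap width. */
void BlitPadded8bpp(HDC hdc, const BYTE *pixels, int width, int height, int pitch)
{
    /* Allocate a BITMAPINFO large enough for a 256-entry grayscale palette. */
    size_t cb = sizeof(BITMAPINFOHEADER) + 256 * sizeof(RGBQUAD);
    BITMAPINFO *bmi = (BITMAPINFO *)calloc(1, cb);

    bmi->bmiHeader.biSize        = sizeof(BITMAPINFOHEADER);
    bmi->bmiHeader.biWidth       = pitch;    /* the "lie": e.g. 1024 instead of 1000 */
    bmi->bmiHeader.biHeight      = -height;  /* negative height = top-down rows */
    bmi->bmiHeader.biPlanes      = 1;
    bmi->bmiHeader.biBitCount    = 8;
    bmi->bmiHeader.biCompression = BI_RGB;

    for (int i = 0; i < 256; ++i) {          /* identity grayscale palette */
        bmi->bmiColors[i].rgbRed   = (BYTE)i;
        bmi->bmiColors[i].rgbGreen = (BYTE)i;
        bmi->bmiColors[i].rgbBlue  = (BYTE)i;
    }

    /* Only 'width' columns of each 'pitch'-byte row are drawn. */
    SetDIBitsToDevice(hdc, 0, 0, width, height, 0, 0, 0, height,
                      pixels, bmi, DIB_RGB_COLORS);
    free(bmi);
}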

Can Someone Explain This LabVIEW Code

I modified some LabVIEW code I found online to use in my program. It works, and I understand nearly all of it, but there's one section that confuses me. This is the program:
This program takes 2 images, subtracts them, and returns the picture plus a percentage difference. What I understand is it takes the pictures, subtracts them, converts the subtracted image into an array of colored pixels, then math happens, and the pixels are compared to the threshold. It adds a 1 for every pixel greater than the threshold, divides it by the image size, and out comes a percentage. The part I don't understand is the math part, the whole quotient and remainder section with a "random" 256. Because I don't understand how to get these numbers, I have a percentage, but I don't understand what they mean. Here's a picture of the front panel with 2 different tests.
In the top one, I have a percentage of 15, and in the bottom one a percentage of 96. This tells me that the bottom one is "96 percent different". But is there any way to make sure this is accurate?
The other question I have is about the threshold, as I don't know exactly what that does either. For example, if I change the threshold on the bottom image to 30, my percentage becomes 8%, with the same picture.
I'm sure once I understand the quotient/remainder part, it'll all make sense, but I can't seem to get it. Thank you for your help.
My best guess is that someone tried to characterize the difference between 2 images with a single number. The remainder-quotient part is a "poor man's" way of splitting each 2D array element of the difference into its 2 lower bytes (the 2 remainders) and the upper 2-byte word. The lower 2 bytes of the difference are then summed and the result is added to the upper 2 bytes (as a word). Maybe the 3 different bytes each represented a different channel of the camera (e.g. RGB color)?
Then the value is compared against the threshold, and the number of pixels above the threshold is counted. This number is divided by the total number of pixels to calculate the percentage difference. So the result is the percentage of pixels which differ from the master image by more than the threshold.
E.g. if a certain pixel of your image was 0x00112233 and the corresponding master image pixel had a value of 0x00011122, then the number compared to the threshold is (0x11 - 0x01) + (0x22 - 0x11) + (0x33 - 0x22) = 0x10 + 0x11 + 0x11 = 0x32 = 50 decimal.
Whether this is the best possible comparison/difference criterion is a question well outside of this topic.
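In textual form, the quotient/remainder stage is just repeated division by 256, i.e. peeling off one byte at a time. A rough C equivalent of the description above (my interpretation of the diagram, not the original LabVIEW code):

#include <stdint.h>
#include <stdio.h>

/* Split each 32-bit difference value into its two lowest bytes plus the upper
   16-bit word, sum them, and count the pixels whose sum exceeds the threshold. */
double PercentAboveThreshold(const uint32_t *diff, size_t count, uint32_t threshold)
{
    size_t above = 0;
    for (size_t i = 0; i < count; ++i) {
        uint32_t v  = diff[i];
        uint32_t b0 = v % 256;  v /= 256;   /* lowest byte (first remainder) */
        uint32_t b1 = v % 256;  v /= 256;   /* next byte (second remainder) */
        uint32_t sum = b0 + b1 + v;         /* plus the remaining upper word */
        if (sum > threshold)
            ++above;
    }
    return 100.0 * above / count;
}

int main(void)
{
    /* 0x00101111 is the example difference worked out above (sum = 50). */
    uint32_t diff[] = { 0x00101111, 0x00000005, 0x00020202 };
    printf("%.1f%%\n", PercentAboveThreshold(diff, 3, 20)); /* prints 33.3% */
    return 0;
}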

Bitmap storage / memory representation for N-bit RGB bitmaps

Typically, the most common RGB format seems to be 24-bit RGB (8-bits for each channel). However, historically, RGB has been represented in many other formats, including 3-bit RGB (1-bit per channel), 6-bit RGB (2-bits per channel), 9-bit RGB (3-bits per channel), etc.
When an N-bit RGB file has a value of N that is not a multiple of 8, how are these bitmaps typically represented in memory? For example, if we have 6-bit RGB, it means each pixel is 6 bits, and thus each pixel is not directly addressable by a modern computer without using bitwise operations.
So, is it common practice to simply convert N-bit RGB files into bitmaps where each pixel is of an addressable size (e.g. convert 6-bit to 8-bit)? Or is it more common to simply use bitwise operations to manipulate bitmaps where the pixel size is not addressable?
And what about disk storage? How is, say, a 6-bit RGB file stored on disk, when the last byte of the bitmap may not even contain a full pixel?
Images are often heavy and bandwidth is critical when transferring them, so a 6-bit-per-channel image is a reasonable choice if some loss in chrominance is acceptable (usually unnoticeable with textures and photos).
How is, say, a 6-bit RGB file stored on disk, when the last byte of the bitmap may not even contain a full pixel?
If the smallest unit of storage is a byte, then yes, you need to add some padding to be 8-bit aligned. This is fine, because the space saving compared to an 8-bit-per-channel image can be considerable.
No power-of-two word size is an exact multiple of 6 bits. Better numbers are 5 bits for the red and blue channels and 6 bits for the green channel, for a total of 16 bits per pixel. R5G6B5 is a very common pixel format.
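As an illustration of that last point, packing an 8-bit-per-channel pixel down to R5G6B5 looks roughly like this sketch (discarding the low bits of each channel is where the precision loss happens):

#include <stdint.h>
#include <stdio.h>

/* Pack 8-bit R, G, B into a 16-bit R5G6B5 value by keeping only the top bits
   of each channel: 5 for red, 6 for green, 5 for blue. */
uint16_t PackRGB565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

int main(void)
{
    printf("0x%04X\n", PackRGB565(255, 128, 64)); /* prints 0xFC08 */
    return 0;
}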
Apologies for the archeological dig, but I just couldn't resist, as there's no good answer imho.
In the old days, memory was the most expensive part of a computer. These days memory is dirt cheap, so the most sensible way to handle N-bit-channel images on modern hardware is to blow up every channel to the number of bits that fits your API or hardware (usually 8 bits).
For files, you could do the same, as disk space is also cheap these days (and you can use one of the many compressed file formats out there).
That said, the way it worked in the old days when these formats were common, is this:
The total number of bits per pixel was not a multiple of 8, but the number of pixels per scan line always was a multiple of 8. In this case you can store your scan lines "a bit at a time" and not waste any memory space when storing it in bytes.
So if your pixels were 9 bits per pixel, and a scan line was 320 pixels, you would have 320/8 = 40 bytes containing bit #0 of each pixel, followed by 40 bytes containing all bit #1's etc. up to and including bit #8. Hence all pixel info for your scan line would be exactly 360 bytes.
The video chips would have different hardware wiring to the memory, so rendering such scan lines was fast. In fact, this is the easiest way to implement hardware support for a variable number of bits per pixel: by pulling bits from N addresses at once.
Note that this method does not change the amount of 'bit shifting' required to find the bits for pixel number X in a scan line, based on the total number of bits you use. You just read fewer addresses ahead at once.
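A rough sketch of reading one pixel back out of that bit-plane layout, using the 9-bpp, 320-pixel example above (this is my own illustration, including the least-significant-bit-first ordering within each byte, not code from any particular system):

#include <stdint.h>
#include <stdio.h>

/* Planar layout: within one scan line, plane 0 holds bit #0 of every pixel
   (width / 8 bytes), then plane 1 holds bit #1, and so on. */
unsigned ReadPlanarPixel(const uint8_t *scanline, unsigned width,
                         unsigned bitsPerPixel, unsigned x)
{
    unsigned bytesPerPlane = width / 8;  /* width is a multiple of 8 */
    unsigned value = 0;
    for (unsigned bit = 0; bit < bitsPerPixel; ++bit) {
        const uint8_t *plane = scanline + bit * bytesPerPlane;
        unsigned pixelBit = (plane[x / 8] >> (x % 8)) & 1;
        value |= pixelBit << bit;
    }
    return value;
}

int main(void)
{
    /* A 320-pixel, 9-bpp scan line occupies exactly 9 * 40 = 360 bytes. */
    uint8_t scanline[360] = {0};
    scanline[0]      = 0x01;  /* set bit #0 of pixel 0 */
    scanline[8 * 40] = 0x01;  /* set bit #8 of pixel 0 */
    printf("%u\n", ReadPlanarPixel(scanline, 320, 9, 0)); /* prints 257 */
    return 0;
}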
