Direct3D9 - Convert an A2R10G10B10 image to an A8R8G8B8 image

Before starting:
A2B10G10R10 (2 bits for the alpha, 10 bits for each color channel)
A8B8G8R8 (8 bits for every channel)
Correct me if I'm wrong, but is it right that the A2B10G10R10 pixel format cannot be displayed directly on screens?
If so, I would like to convert my A2B10G10R10 image to a displayable A8B8G8R8 one, either using OpenCV, Direct3D9 or even manually, but I'm really bad when it comes to bitwise operations, which is why I need your help.
So here I am:
// Get the texture bits pointer
offscreenSurface->LockRect(&memDesc, NULL, 0);
// Copy the texture bits to a cv::Mat
cv::Mat m(desc.Height, desc.Width, CV_8UC4, memDesc.pBits, memDesc.Pitch);
// Convert from A2B10G10R10 to A8B8G8R8
???
Here is how I think I should do it for every 32-bit pack:
Copy the first 2 original bits into the first 8 converted bits
Scale every original 10 bits to every converted 8 bits (how to do that?) for each of the other channels
Note:
cv::cvtColor doesn't seem to offer the format conversion I need
I can't use the IDirect3DDevice9::StretchRect method
Even Google seems to be lost on this subject
So, to sum up, the question is:
How do I convert an A2B10G10R10 pixel format texture to an A8B8G8R8 one?
Thanks. Best regards.

I'm not sure why you are using legacy Direct3D 9 instead of DirectX 11. In any case, the naming scheme between Direct3D 9 era D3DFMT and the modern DXGI_FORMAT is flipped, so it can be a bit confusing.
D3DFMT_A8B8G8R8 is the same as DXGI_FORMAT_R8G8B8A8_UNORM
D3DFMT_A2B10G10R10 is the same as DXGI_FORMAT_R10G10B10A2_UNORM
D3DFMT_A8R8G8B8 is the same as DXGI_FORMAT_B8G8R8A8_UNORM
There is no direct equivalent of D3DFMT_A2R10G10B10 in DXGI but you can swap the red/blue channels to get it.
There's also a long-standing bug in the deprecated D3DX9, D3DX10, and D3DX11 helper libraries where the DDS file format's DDPIXELFORMAT has the red and blue masks backwards for both 10:10:10:2 formats. My DDS texture readers solve this by flipping the mapping of the masks to the formats on read, and always writing DDS files using the more modern DX10 header where I explicitly use DXGI_FORMAT_R10G10B10A2_UNORM. See this post for more details.
The biggest problem with converting 10:10:10:2 to 8:8:8:8 is that you are losing 2 bits of data from the R, G, B color channels. You can do a naïve bit-shift, but the results are usually crap. To handle the color conversion where you are losing precision, you want to use something like error diffusion or ordered dithering.
Furthermore, for the 2-bit alpha you don't want 3 (11) to map to 192 (11000000), because in 2-bit alpha 3 (11) is fully opaque, while in 8-bit alpha fully opaque is 255 (11111111).
Take a look at DirectXTex which is an open source library that does conversions for every DXGI_FORMAT and can handle legacy conversions of most D3DFMT. It implements all the stuff I just mentioned.
The library uses float4 intermediate values because it's built on DirectXMath and that provides a more general solution than having a bunch of special-case conversion combinations. For special-case high-performance use, you could write a direct 10-bit to 8-bit converter with all the dithering, but that's a pretty unusual situation.
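For a rough idea of what that looks like against the question's locked surface, here is a sketch using DirectXTex (the wrapper function is illustrative; double-check the Convert signature and filter flags against the DirectXTex release you use):
#include <DirectXTex.h>
#include <cstdint>
using namespace DirectX;
// Illustrative helper: wrap the locked D3D9 surface memory in a DirectXTex Image
// and convert it to 8:8:8:8 with error-diffusion dithering.
HRESULT ConvertTo8888(const void* bits, size_t width, size_t height,
                      size_t rowPitch, ScratchImage& result)
{
    Image src = {};
    src.width      = width;
    src.height     = height;
    src.format     = DXGI_FORMAT_R10G10B10A2_UNORM;   // i.e. D3DFMT_A2B10G10R10
    src.rowPitch   = rowPitch;
    src.slicePitch = rowPitch * height;
    src.pixels     = const_cast<uint8_t*>(static_cast<const uint8_t*>(bits));
    // TEX_FILTER_DITHER_DIFFUSION requests error-diffusion dithering for the
    // 10-bit -> 8-bit precision loss discussed above.
    return Convert(src, DXGI_FORMAT_R8G8B8A8_UNORM,
                   TEX_FILTER_DITHER_DIFFUSION, TEX_THRESHOLD_DEFAULT, result);
}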
With all that discussion of image format conversion out of the way, you can in fact render a 10:10:10:2 texture onto an 8:8:8:8 render target for display. You can use 10:10:10:2 as a render target backbuffer format as well, and it will get converted to 8:8:8:8 as part of the present. Hardware support for 10:10:10:2 is optional on Direct3D 9, but required for Direct3D Feature Level 10 or better cards when using DirectX 11. You can even get true 10-bit display scan-out when using the "exclusive" full screen rendering mode, and Windows 10 is implementing HDR display output natively later this year.

There's a general solution to this, and it's nice to be able to do it at the drop of a hat without needing to incorporate a bulky library, or introduce new rendering code (sometimes not even practical).
First, rip it apart. I can never keep track of which order the RGBA fields are in, so I just try it every way until one works, a strategy which reliably works every time... eventually. But you may as well trust your analysis for your first attempt. The docs I found say D3D lists them from MSB to LSB, so in this case we have %AA RRRRRRRRRR GGGGGGGGGG BBBBBBBBBB (but I have no idea if that's right)
b = (src>> 0) & 0x3FF;
g = (src>>10) & 0x3FF;
r = (src>>20) & 0x3FF;
a = (src>>30) & 0x003;
Next, you fix the precision. Naive bit-shift frequently works fine. If the results are 8 bits per channel, you're no worse off than you are with most images. A shift down from 10 bits to 3 would look bad without dithering but from 10 to 8 can look alright.
r >>= 2; g >>= 2; b >>= 2;
Now the alpha component does get tricky because it's shifting the other way. As Chuck Walbourn said, you need to consider how you want the alpha values to map. Here's what you probably want:
%00 -> %00000000
%01 -> %01010101
%10 -> %10101010
%11 -> %11111111
Although a lookup table with size 4 probably makes the most sense here, there's a more general way of doing it. What you do is shove your small value to the top of the big value and then replicate it. Here it is with your scenario and one other more interesting scenario:
%Aa -> %Aa Aa Aa Aa
%xyz -> %xyz xyz xy
Let's examine what this replication produces for xyz:
%000 -> %000 000 00 (0)
%001 -> %001 001 00 (36) +36
%010 -> %010 010 01 (73) +37
%011 -> %011 011 01 (109) +36
%100 -> %100 100 10 (146) +37
%101 -> %101 101 10 (182) +36
%110 -> %110 110 11 (219) +37
%111 -> %111 111 11 (255) +36
As you can see, we get some good characteristics with this approach. Naturally having a %00000000 and %11111111 result is of paramount importance.
Next we pack the results:
dst = (a<<24)|(r<<16)|(g<<8)|(b<<0);
And then we decide if we need to optimize it or look at what the compiler does and scratch our heads.
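Putting the pieces together, here is a minimal sketch of a complete per-pixel converter for this layout (the alpha expansion uses the bit replication described above rather than a plain shift):
#include <stdint.h>
// Convert one A2R10G10B10 pixel (%AA RRRRRRRRRR GGGGGGGGGG BBBBBBBBBB) to A8R8G8B8.
static uint32_t ConvertPixel(uint32_t src)
{
    uint32_t b = (src >>  0) & 0x3FF;
    uint32_t g = (src >> 10) & 0x3FF;
    uint32_t r = (src >> 20) & 0x3FF;
    uint32_t a = (src >> 30) & 0x3;
    // Naive precision reduction: drop the low 2 bits of each color channel.
    r >>= 2; g >>= 2; b >>= 2;
    // Expand 2-bit alpha by replication: %Aa -> %AaAaAaAa, so 0 -> 0, 1 -> 85, 2 -> 170, 3 -> 255.
    a = (a << 6) | (a << 4) | (a << 2) | a;
    return (a << 24) | (r << 16) | (g << 8) | b;
}
Looping this over memDesc.pBits row by row (stepping by memDesc.Pitch) gives an 8:8:8:8 buffer that OpenCV or Direct3D can display; add dithering as discussed in the other answer if banding is visible.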

Related

Convolution effect changed from Gimp version 2.8.22 to 2.10.18

I recently had the task of applying several convolution filters at university. While playing around with Gimp version 2.10.18, I noticed that the filters from the exercises did not produce the expected outcome.
I found out that convolution behavior changed from Gimp 2.8.22 to 2.10.18 and wanted to ask if someone knew how to get the old behavior back.
Let me explain what should happen and what actually happens in 2.10.18:
My sample picture looks like this (these are the values in all its pixel rows):
90 90 150 210 210
I now apply the filter
0 0 0
0 2 0
0 0 0
with divisor 1 and offset 0.
The maths behind it and Gimp 2.8 tell me that the outcome should be composed of
180 values on the left side, 255 on the right side
I don't understand what Gimp 2.10 does, but the outcome just has brighter values (90->125, 150->205, 210->255) instead of the expected change.
Is this a bug or am I somehow missing something? Thanks!
A big difference between 2.10 (high bit depth) and previous versions (8-bit) is that 2.10 works in "linear light". In 2.8, the 0..255 pixel values are not a linear representation of the color/luminosity but are gamma-corrected (so that there are more values for dark tones(*)). Most Gimp 2.8 tools work (incorrectly) directly on these gamma-corrected values. In Gimp 2.10, if you are in 8-bit (and, in the general case, using a gamma-corrected representation, but this is mostly relevant in 8-bit), the pixel data is converted to 32-bit FP linear, removing the gamma compensation, then the required transformation is applied, then the data is converted back to 8-bit with the gamma compensation reinstated.
June 2021 Edit: in 2.10, if you put the image in a high-precision mode and use values that are the mathematical equivalents of 90/255, 150/255 and 210/255, you get a result that is equivalent to 180/255, which confirms that in 2.10 convolution operates on "linear light".
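To see where the observed 90 -> 125, 150 -> 205, 210 -> 255 values come from, here is a small check (a sketch assuming the standard sRGB transfer function) that converts each value to linear light, applies the x2 kernel there, clamps, and converts back:
#include <cmath>
#include <cstdio>
// Standard sRGB <-> linear transfer functions.
static double srgb_to_linear(double c)
{
    return (c <= 0.04045) ? c / 12.92 : std::pow((c + 0.055) / 1.055, 2.4);
}
static double linear_to_srgb(double c)
{
    return (c <= 0.0031308) ? c * 12.92 : 1.055 * std::pow(c, 1.0 / 2.4) - 0.055;
}
int main()
{
    const int vals[] = {90, 150, 210};
    for (int v : vals) {
        double lin = srgb_to_linear(v / 255.0) * 2.0;   // apply the 2x kernel in linear light
        if (lin > 1.0) lin = 1.0;                       // clamp
        std::printf("%d -> %.0f\n", v, linear_to_srgb(lin) * 255.0);
    }
    // prints 90 -> 125, 150 -> 205, 210 -> 255, matching the observed 2.10 output
}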
So
If you want the old behavior, use the old Gimp. But you have to keep in mind that the old behavior was incorrect, even if some workflows could take advantage of it.
If you wanted to see what a spatial convolution matrix can do, then use Gimp 2.10 in "linear light".
(*) Try this: open two images in Gimp, fill one with a checkerboard pattern and one with grey (128,128,128). Step back until the checkerboard becomes a uniform gray. You'll notice that the plain gray image is darker... so (128,128,128) is not the middle of the luminosity range.

How do successive convolutional layers work?

If my first convolution has 64 filters and my second has 32 filters,
will I have:
1 image -> Conv(64 filters) -> 64 filtered images -> Conv(32 filters) -> 64 x 32 = 2048 filtered images
Or:
1 image -> Conv(64 filters) -> 64 filtered images -> Conv(32 filters) -> 32 filtered images
If it is the second answer: what is going on between the 64 filtered images and the second Conv?
Thanks for your answer; I can't find a good tutorial that explains this clearly, it's always rushed...
Your first option is correct. Convolutions are essentially ways of altering and extracting features from data. We do this by creating m images, each looking at a certain frame of the original image. On the second convolutional layer, we then make n images for each convolved image from the first layer.
So: m * n would be the total number of images.
To further this point, a convolution works by making feature maps of an image. When you have successive convolutional layers, you are making feature maps of feature maps. I.e. if I start with 1 image, and my first convolutional layer is of size 20, then I have 20 images (more specifically, feature maps) at the end of convolution 1. Then let's say I add a second convolution of size 10. What happens then is that I am making 10 feature maps for every 1 image. Thus, it would be 20*10 images = 200 feature maps.
Let's say, for example, you have a 50x50 pixel image and a convolutional layer with a filter of size 5x5. What happens (if you don't have padding or anything else) is that you "slide" across the image and get a weighted average of the pixels at each iteration of the slide (depending on your location). You would then get an output feature map of size 46x46. Let's say you do this 20 times (i.e. a 5x5x20 convolution). You would then have as output 20 feature maps of size 46x46. In the diagram mentioned in the VGG neural network post below, the diagram only shows the number of feature maps to be made from the incoming feature maps, NOT the total number of feature maps.
I hope this explanation was thorough!
Here we have the architecture of the VGG-16.
In VGG-16 we have 4 convolution sizes: 64, 128, 256 and 512.
And in the architecture we see that we don't have 64 images, 64*128 images, etc.,
but just 64 images, 128 images, etc.
So the correct answer was not the first but the second. And that implies my second question:
"What is going on between the 64 filtered images and the second Conv?"
I think that between a 64 conv and a 32 conv there is finally only 1 filter, but over two pixel layers, so it divides the thickness of the conv by 2.
And between a 64 conv and a 128 conv there are only 2 filters over one pixel layer, so it multiplies the thickness of the conv by 2.
Am I right?
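For what it's worth, in a standard convolutional layer (as in VGG) each of the 32 filters of the second conv spans all 64 input channels and sums over them, which is why you end up with 32 feature maps rather than 64 x 32. A minimal sketch of that channel-summing behaviour (plain C++, no framework, all names illustrative):
#include <vector>
#include <cstddef>
// Illustrative only: a stride-1, no-padding 2D convolution where each output
// channel is produced by one filter that spans ALL input channels.
using Tensor3 = std::vector<std::vector<std::vector<float>>>; // [channels][height][width]
Tensor3 conv2d(const Tensor3& input,                   // e.g. 64 x H x W
               const std::vector<Tensor3>& filters)    // e.g. 32 filters, each 64 x k x k
{
    const std::size_t inC = input.size(), H = input[0].size(), W = input[0][0].size();
    const std::size_t outC = filters.size(), k = filters[0][0].size();
    const std::size_t outH = H - k + 1, outW = W - k + 1;
    Tensor3 output(outC, std::vector<std::vector<float>>(outH, std::vector<float>(outW, 0.f)));
    for (std::size_t o = 0; o < outC; ++o)             // one output map per filter...
        for (std::size_t y = 0; y < outH; ++y)
            for (std::size_t x = 0; x < outW; ++x)
                for (std::size_t c = 0; c < inC; ++c)  // ...summing over ALL input channels
                    for (std::size_t i = 0; i < k; ++i)
                        for (std::size_t j = 0; j < k; ++j)
                            output[o][y][x] += filters[o][c][i][j] * input[c][y + i][x + j];
    return output;   // 32 x outH x outW feature maps, not 64*32
}
With 64 input maps and 32 filters, conv2d returns 32 output maps; the 64 input channels are folded into each output map by the inner loop over c.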

Comment image's "source code"

When you open an image in a text editor you get some characters which don't really make sense (at least not to me). Is there a way to add comments to that text, so the file would not appear damaged when opened with an image viewer?
The way to go is setting metadata fields if your image format supports any.
For example, for a PNG you can set a comment field when exporting the file or with a separate tool like exiftool:
exiftool -comment="One does not simply put text into the image data" test.png
If the purpose of the text is to ensure ownership then take a look at digital watermarking.
If you are looking to actually encode information in your image you should use steganography (https://en.wikipedia.org/wiki/Steganography).
The wiki article runs you through the basics and shows an example of a picture of a cat hidden in a picture of trees to illustrate how you can hide information. In the case of hiding text, you can do the following:
Encoding
Come up with your phrase: for argument's sake I'll use the word Hidden
Convert that text to a numeric representation - for simplicity I'll assume ASCII conversion of characters, but you don't have to
"Hidden" = 72 105 100 100 101 110
Convert the numeric representation to Binary
72 = 01001000 / 105 = 01101001 / 100 = 01100100 / 100 = 01100100 / 101 = 01100101 / 110 = 01101110
For each letter convert the 8 bit binary representations into four 2 bit binary representations that we shall call mA,mR,mG,mB for reasons that will become clear shortly
72 = 01 00 10 00 => 1 0 2 0 = mA mR mG mB
Open an image file for editing: I would suggest using C# to load the image and then use Get/Set Pixels to edit them (How to manipulate images at the pixel level in C# )
Use the last 2 bits of each color channel of each pixel to encode your message. For example, to encode H in the first pixel of an image you can use the C# code at the end of these instructions.
Once all letters of the word - one per pixel - have been encoded in the image, you are done.
Decoding
Use the same basic process in reverse.
You walk through the image one pixel at a time
You take the 2 least significant bits of each color channel in the pixel
You concatenate the LSBs together in alpha, red, green, blue order.
You convert the concatenated bits into an 8-bit representation and then convert that binary form to base 10. Finally, you perform a lookup of the base-10 number in an ASCII chart, or just cast the number to a char.
You repeat for the next pixel
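Putting those decoding steps together, here is a minimal sketch (illustrative, assuming a 32-bit ARGB pixel value; the helper name is made up) of recovering one character per pixel:
#include <stdint.h>
// Recover one character from a 32-bit ARGB pixel encoded as described above:
// the 2 least significant bits of A, R, G and B hold the character's bits,
// concatenated in alpha, red, green, blue order.
static char DecodePixel(uint32_t argb)
{
    uint32_t a = (argb >> 24) & 0x3;
    uint32_t r = (argb >> 16) & 0x3;
    uint32_t g = (argb >>  8) & 0x3;
    uint32_t b =  argb        & 0x3;
    return (char)((a << 6) | (r << 4) | (g << 2) | b);   // e.g. 1,0,2,0 -> 72 -> 'H'
}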
The thing to remember is that the technique I described will allow you to encode information in the image without a human observer noticing because it only manipulates the image on the last 2 bits of each color channel in a single pixel, and human eyes cannot really tell the difference between the colors in the range of [(252,252,252,252) => (255,255,255,255)].
But as food for thought, I will mention that a computer can, with the right algorithms, and there is active research into improving a computer's ability to pick this sort of thing out.
So if you only want to put in a watermark then this should work, but if you want to actually hide something you have to encrypt the message and then perform the steganography on the encrypted binary. Since encrypted data is MUCH larger than plain text data, it will require an image with far more pixels.
Here is the code to encode H into the first pixel of your image in C#.
// H = 72 and needs the following message Alpha, message Red, message Green, message Blue components
int mA = 1;
int mR = 0;
int mG = 2;
int mB = 0;
Bitmap myBitmap = new Bitmap("YourImage.bmp");
// pixel 0,0 is the first pixel
Color pixelColor = myBitmap.GetPixel(0, 0);
// ANDing with 252 (11111100) clears the 2 least significant bits of each channel,
// then ORing writes the 2 message bits into them
pixelColor = Color.FromArgb((pixelColor.A & 252) | mA,
                            (pixelColor.R & 252) | mR,
                            (pixelColor.G & 252) | mG,
                            (pixelColor.B & 252) | mB);
myBitmap.SetPixel(0, 0, pixelColor);

Compressing/packing "don't care" bits into 3 states

At the moment I am working on an on screen display project with black, white and transparent pixels. (This is an open source project: http://code.google.com/p/super-osd; that shows the 256x192 pixel set/clear OSD in development but I'm migrating to a white/black/clear OSD.)
Since each pixel is black, white or transparent I can use a simple 2 bit/4 state encoding where I store the black/white selection and the transparent selection. So I would have a truth table like this (x = don't care):
B/W T
x 0 pixel is transparent
0 1 pixel is black
1 1 pixel is white
However as can be clearly seen this wastes one bit when the pixel is transparent. I'm designing for a memory constrained microcontroller, so whenever I can save memory it is good.
So I'm trying to think of a way to pack these 3 states into some larger unit (say, a byte.) I am open to using lookup tables to decode and encode the data, so a complex algorithm can be used, but it cannot depend on the states of the pixels before or after the current unit/byte (this rules out any proper data compression algorithm) and the size must be consistent; that is, a scene with all transparent pixels must be the same as a scene with random noise. I was imagining something on the level of densely packed decimal which packs 3 x 4-bit (0-9) BCD numbers in only 10 bits with something like 24 states remaining out of the 1024, which is great. So does anyone have any ideas?
Any suggestions? Thanks!
In a byte (256 possible values) you can store 5 of your three-state values. One way to look at it: three to the fifth power is 243, slightly less than 256. The fact that it's only slightly less also shows that you're hardly wasting any fraction of a bit.
For encoding five of your base-3 "digits" into a byte, think of taking the number in base 3 made from your five "digits" in succession -- the resulting value is guaranteed to be less than 243 and therefore directly storable in a byte. Similarly, for decoding, do the base-3 conversion of the byte's value.
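As an illustration, a minimal sketch of that base-3 packing and unpacking (assuming each pixel state is stored as 0, 1 or 2):
#include <stdint.h>
// Pack five base-3 pixel states (each 0, 1 or 2) into one byte.
// The maximum result is 2*81 + 2*27 + 2*9 + 2*3 + 2 = 242 < 256.
static uint8_t pack5(const uint8_t px[5])
{
    uint8_t v = 0;
    for (int i = 0; i < 5; ++i)
        v = (uint8_t)(v * 3 + px[i]);
    return v;
}
// Unpack a byte back into five base-3 pixel states.
static void unpack5(uint8_t v, uint8_t px[5])
{
    for (int i = 4; i >= 0; --i) {
        px[i] = v % 3;
        v /= 3;
    }
}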

color depth bits?

(Quick version: jump to the next-to-last paragraph - the one beginning with "But")
I was happy in my ignorance believing that PVRTC images were 4 or 2 bits per channel. That sounded plausible. It would give 4+4+4+4 (16-bit) or 2+2+2+2 (8-bit) textures, which would have 2^16 (65536) and 2^8 (256) color depth respectively. But reading through some documents about PVRTC, I suddenly realized that they said 4 bpp (and 2 bpp), i.e. 4 bits per pixel. Confusion and madness entered my world.
What?! 4 bits? Per pixel? But that's just 1 bit per channel! (And don't even get me started on the 2 bpp one; that one was far too weird for my brain to grasp at the moment.) Some moments into this agonizing reality, I came to understand this wasn't so real after all. Apparently, when saying 4 bpp, it's referring to the compression, not the color depth. Phew, I wasn't mad after all.
But then I started to wonder: what color depth do these images have then, after decompression? I tried to look this information up, but apparently it's not considered important to mention (or I'm just bad at finding info).
The fact that PVRTC compressed images don't seem to give any visible artifacts in OpenGLES with the pixel format RGBA4444 would suggest they're 16-bit (using 32-bit PNG images with the pixel format RGBA4444 in OpenGLES on the iPhone gives very visible artifacts).
According to the paper http://web.onetel.net.uk/~simonnihal/assorted3d/fenney03texcomp.pdf the final output of the decompressor is 8 bits per channel.
