Detecting .png alpha channel in Windows/D3D

I'm loading a texture from .png using D3DXCreateTextureFromFile(). How can my program know if the image file contains an alpha channel?

This isn't too hard to do by simply examining the file.
A PNG file consists of:
A file header
One or more 'chunks'
The file header is always 8 bytes and should be skipped over.
Each chunk begins with 4 bytes indicating its length, and 4 bytes indicating its type. The first chunk's length field will always be 13 (the length counts only the chunk's data) and its type will be IHDR. This chunk contains the information about the image.
The tenth byte of the IHDR chunk's data is the colour type, which is exactly what you're looking for: it will be 6 if the PNG is truecolour with alpha (RGBA), and 4 if it is greyscale with alpha.
More information can be found in the PNG specification.
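As a rough sketch, the check can be done like this in C++ (assuming the file is read from disk, and ignoring CRC validation and tRNS-based transparency):

    #include <cstdio>

    // Returns true if the PNG's IHDR colour type indicates an alpha channel
    // (6 = truecolour + alpha, 4 = greyscale + alpha).
    bool PngHasAlpha(const char* path)
    {
        std::FILE* f = std::fopen(path, "rb");
        if (!f) return false;

        // 8-byte signature + 4-byte length + 4-byte type + 10 bytes of IHDR data.
        unsigned char buf[26];
        std::size_t got = std::fread(buf, 1, sizeof(buf), f);
        std::fclose(f);
        if (got < sizeof(buf)) return false;

        // Bytes 12..15 must spell out the chunk type "IHDR".
        if (buf[12] != 'I' || buf[13] != 'H' || buf[14] != 'D' || buf[15] != 'R')
            return false;

        // IHDR data: width(4) height(4) bit depth(1) colour type(1) ...
        // so the colour type is the 26th byte of the file.
        unsigned char colourType = buf[25];
        return colourType == 6 || colourType == 4;
    }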

Call IDirect3DTexture9::GetSurfaceLevel to get the top-level surface, then call IDirect3DSurface9::GetDesc on it. The D3DSURFACE_DESC::Format member will tell you whether the format carries alpha (for example D3DFMT_A8R8G8B8 as opposed to D3DFMT_X8R8G8B8).
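A minimal sketch of that approach (note it reports the format D3DX chose for the created texture, which may not match the file byte-for-byte):

    #include <d3d9.h>

    // Returns true if the texture's surface format carries an alpha channel.
    bool TextureHasAlpha(IDirect3DTexture9* tex)
    {
        IDirect3DSurface9* surface = nullptr;
        if (FAILED(tex->GetSurfaceLevel(0, &surface)))
            return false;

        D3DSURFACE_DESC desc = {};
        HRESULT hr = surface->GetDesc(&desc);
        surface->Release();
        if (FAILED(hr))
            return false;

        switch (desc.Format)
        {
        case D3DFMT_A8R8G8B8:   // alpha formats (not an exhaustive list)
        case D3DFMT_A8B8G8R8:
        case D3DFMT_A1R5G5B5:
        case D3DFMT_A4R4G4B4:
        case D3DFMT_DXT3:
        case D3DFMT_DXT5:
            return true;
        default:                // e.g. D3DFMT_X8R8G8B8 has no alpha
            return false;
        }
    }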

Related

When creating a Xing or Info tag in an MP3, may I use any MP3 header or does it have to match other frames?

I have a set of bare MP3 files. Bare as in I removed all tags (no ID3, no Xing, no Info) from those files.
Just before sending one of these files to the client, I want to add an Info tag. All of my files are CBR so we will use an Info tag (no Xing).
Right now I read the first 4 bytes of the existing MP3 to get the version (MPEG-1, Layer III), bitrate, frequency, stereo mode, etc., and from those determine the size of one frame. I create the tag that way, reusing these 4 bytes as the Info tag's frame header and sizing the frame accordingly.
For those wondering, these 4 bytes may look like this:
FF FB 78 04
My impression was that the Info tag is expected to use the exact same first 4 bytes as the other audio frames of the MP3, yet ffmpeg writes an Info tag with a hard-coded header (wrong bitrate, wrong frequency, etc.).
My question is: is ffmpeg really doing it right? (LAME doesn't do that.) Could I do the same, skip loading those first 4 bytes, and still have the great majority of players out there play my files as expected?
Note: since I read these 4 bytes over the network, not having to fetch them with a HEAD request would definitely save time and some bandwidth that I could use for the GET requests instead...
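For illustration, here is a minimal sketch of that kind of header decoding, assuming MPEG-1 Layer III and skipping validation of the sync word and reserved table indices:

    #include <cstdint>

    struct Mp3FrameInfo {
        int bitrateKbps;
        int sampleRate;
        int frameSize;   // in bytes
    };

    // Decode the 4-byte frame header of an MPEG-1 Layer III file
    // (bytes such as FF FB 78 04) and compute the frame size.
    Mp3FrameInfo DecodeMpeg1Layer3Header(const uint8_t h[4])
    {
        // Bitrate table for MPEG-1 Layer III (index 0 = "free", 15 = invalid).
        static const int kBitrate[16] = {
            0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 0 };
        // Sampling-rate table for MPEG-1 (index 3 is reserved).
        static const int kSampleRate[4] = { 44100, 48000, 32000, 0 };

        Mp3FrameInfo info = {};
        info.bitrateKbps = kBitrate[h[2] >> 4];          // bits 7..4 of byte 2
        info.sampleRate  = kSampleRate[(h[2] >> 2) & 3]; // bits 3..2 of byte 2
        int padding      = (h[2] >> 1) & 1;              // bit 1 of byte 2

        // Layer III frame size: 144 * bitrate / samplerate + padding.
        info.frameSize = 144 * info.bitrateKbps * 1000 / info.sampleRate + padding;
        return info;
    }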
The reason for the difference is that with certain configurations, the size of a frame is less than 192 bytes. In that case, the full Info/Xing tag will not fit (and from what I can see, the four optional fields are always included, so an Info/Xing tag is always full even if not required to be).
So, for example, if you have a single channel with 44.1kHz data at 32kbps, the MP3 frame is 117 or 118 bytes. This is less than what is necessary to save the Info/Xing tag.
What LAME does in that situation is simply omit the Info/Xing tag; it will not appear anywhere in the file.
What FFmpeg does instead is create that frame with a higher bitrate: instead of 32kbps it tries 48kbps, then 64kbps, and stops once it finds a configuration whose frame is large enough to hold the Info/Xing tag. (I have not looked at the code, so I do not know exactly how FFmpeg searches for a large enough frame, but on my end I simply incremented the bitrate index field by one until the frame size was >= 192 bytes, and that works; see the sketch below.)
You can replicate this by creating (or converting) a WAVE file at 44.1kHz, converting it to MP3 with ffmpeg at a 32kbps bitrate, and observing that the Info/Xing tag frame uses a different bitrate.
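A rough sketch of that bitrate-bumping loop, using the MPEG-1 Layer III tables (this mirrors the workaround described above, not FFmpeg's actual code):

    // Return a bitrate index (MPEG-1 Layer III table) whose frame is large
    // enough to hold the Info/Xing tag, starting from the file's own index.
    int PickInfoTagBitrateIndex(int originalIndex, int sampleRate)
    {
        static const int kBitrateKbps[16] = {
            0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 0 };

        int index = originalIndex;
        // Layer III frame size (no padding): 144 * bitrate / samplerate.
        while (index < 14 &&
               144 * kBitrateKbps[index] * 1000 / sampleRate < 192)
            ++index;   // try the next higher bitrate, as described above
        return index;
    }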

H.264 - Identify Access Units of an image

I need to parse an H.264 stream to collect only the NALs needed to form a complete image of a single frame. I'm reading the H.264 standard, but it's confusing and hard to read. I made some experiments, but they did not work. For example, I extracted an access unit with primary_pic_type == 0 containing only slice_type == 7 (I-Slice); it should give me a frame, but when I tried it with ffmpeg, it did not work. However, when I appended the next access unit, containing only slice_type == 5 (P-Slice), it worked. Maybe I need to extract POC information, but I don't think so, because I only need to extract one frame; I'm not sure. Does anyone have a tip on how to get only the NALs I need to form one complete image?
I assume that you have an "Annex B" style stream that looks like this:
(AUD)(SPS)(PPS)(I-Slice)(PPS)(P-Slice)(PPS)(P-Slice) ... (AUD)(SPS)(PPS)(I-Slice)
I assume that you want to decode a single I frame and we hope that your I frame is also an IDR frame.
You are somewhere in the middle of the stream.
Keep reading until you find an (AUD) = 0x00 0x00 0x00 0x01 0x09.
Now push everything into your decoder up to the point marked with | (just before the second (PPS)): (AUD)(SPS)(PPS)(I-Slice) | (PPS)
Flush your decoder to emit an uncompressed frame.
This doesn't solve the general case, but it will probably decode most well-behaved streams.
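A small sketch of the AUD scan, assuming 4-byte start codes as shown above (some streams use 3-byte 00 00 01 start codes as well, which this does not handle):

    #include <cstdint>
    #include <vector>

    // Return the byte offsets of AUD NAL units (start code 00 00 00 01
    // followed by nal_unit_type 9). Everything between two consecutive
    // AUDs is one access unit that can be pushed to the decoder.
    std::vector<std::size_t> FindAccessUnitDelimiters(const uint8_t* data,
                                                      std::size_t size)
    {
        std::vector<std::size_t> offsets;
        for (std::size_t i = 0; i + 5 <= size; ++i)
        {
            if (data[i] == 0x00 && data[i + 1] == 0x00 &&
                data[i + 2] == 0x00 && data[i + 3] == 0x01 &&
                (data[i + 4] & 0x1F) == 9)   // nal_unit_type == 9 (AUD)
            {
                offsets.push_back(i);
            }
        }
        return offsets;
    }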
Just in case someone has the same problem: I solved it. I scan until I find an AUD with primary_pic_type == 0. I extract that access unit and the next one (when it's a field), send the two access units to the server, and decode the frame using ffmpeg to generate a JPG image.

What exactly is the difference between MediaFoundation RGB data and a BMP?

In trying to understand how to convert Media Foundation RGB32 data into bitmap data that can be loaded into image/bitmap widgets or saved as a bitmap file, I am wondering what the RGB32 data actually is compared with the data a BMP holds.
Is it simply missing the header information a bitmap file has, such as width, height, etc.?
What does RGB32 actually mean, in comparison to BMP data in a bitmap file or memory stream?
You normally have 32-bit RGB as an IMFMediaBuffer attached to an IMFSample. This is just the bitmap bits, without format-specific metadata. You can access this data by obtaining the media buffer pointer, for example by calling IMFSample::ConvertToContiguousBuffer and then IMFMediaBuffer::Lock to get a pointer to the pixel data.
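A short sketch of that buffer access, trimmed to the essentials:

    #include <mfidl.h>

    // Get a pointer to the raw RGB32 pixel bits carried by an IMFSample.
    HRESULT AccessPixels(IMFSample* sample)
    {
        IMFMediaBuffer* buffer = nullptr;
        HRESULT hr = sample->ConvertToContiguousBuffer(&buffer);
        if (FAILED(hr))
            return hr;

        BYTE* pixels = nullptr;
        DWORD maxLength = 0, currentLength = 0;
        hr = buffer->Lock(&pixels, &maxLength, &currentLength);
        if (SUCCEEDED(hr))
        {
            // 'pixels' now points to currentLength bytes of 32-bit RGB data,
            // i.e. the same bits a .BMP file stores after its headers.
            buffer->Unlock();
        }
        buffer->Release();
        return hr;
    }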
The obtained buffer is compatible with the data in a standard .BMP file (except that the rows may sometimes be stored in reverse, bottom-up order); a .BMP file just has a header before this data. A .BMP file normally has a BITMAPFILEHEADER structure, then a BITMAPINFOHEADER, and then the buffer in question. If you write them one after another, initialized appropriately, you get a valid picture file. This and other questions here show how to create a .BMP file from bitmap bits.
See this GitHub code snippet, which is really close to the requested task and might be a good starting point.
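For illustration, a minimal sketch that writes such a buffer to a .BMP file, assuming 32 bpp and top-down rows (a negative biHeight tells BMP readers the rows are stored top-down):

    #include <windows.h>
    #include <cstdio>

    bool WriteBmp32(const char* path, const void* pixels, LONG width, LONG height)
    {
        const DWORD imageSize = width * height * 4;

        BITMAPINFOHEADER bih = {};
        bih.biSize        = sizeof(bih);
        bih.biWidth       = width;
        bih.biHeight      = -height;       // negative = top-down row order
        bih.biPlanes      = 1;
        bih.biBitCount    = 32;
        bih.biCompression = BI_RGB;
        bih.biSizeImage   = imageSize;

        BITMAPFILEHEADER bfh = {};
        bfh.bfType    = 0x4D42;            // 'BM'
        bfh.bfOffBits = sizeof(bfh) + sizeof(bih);
        bfh.bfSize    = bfh.bfOffBits + imageSize;

        std::FILE* f = std::fopen(path, "wb");
        if (!f) return false;
        std::fwrite(&bfh, sizeof(bfh), 1, f);
        std::fwrite(&bih, sizeof(bih), 1, f);
        std::fwrite(pixels, 1, imageSize, f);
        std::fclose(f);
        return true;
    }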

Parsing split video with ffmpeg

I have a video file split into a few chunks. The split was done at random file positions, but the chunks are large enough.
I need to parse every part with different instances of AVFormatContext. Chunks come one after another in right order. I think there are two options here:
Being able to save and restore AVFormatContext state;
Save video file header (from first chunk) and attach it to every chunk.
I tried both, but without success. The first approach requires going too deep beyond ffmpeg's public API. With the second approach, I am unable to merge the header with a new chunk in a way that ffmpeg can handle.
Can you help me with this?
Thank you.
It totally depends on the container format. With MP4, for example, the header must be completely rewritten and cannot just be copied. With FLV the header can probably just be copied, but the file MUST be split on a frame boundary, not at random positions. TS could handle this, but you would lose a frame at the cut point.
Realistically, the file will need to be reassembled, then split correctly.
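If rewriting the chunks into a single file is not an option, one way to present them to libavformat as one reassembled stream is a custom AVIOContext. A rough sketch, where ChunkStore and read_bytes() are hypothetical stand-ins for however the downloaded chunks are stored:

    extern "C" {
    #include <libavformat/avformat.h>
    }

    struct ChunkStore;                                      // hypothetical chunk storage
    int read_bytes(ChunkStore* s, uint8_t* dst, int size);  // hypothetical: reads across chunk boundaries

    static int ReadPacket(void* opaque, uint8_t* buf, int buf_size)
    {
        int n = read_bytes(static_cast<ChunkStore*>(opaque), buf, buf_size);
        return n > 0 ? n : AVERROR_EOF;
    }

    AVFormatContext* OpenReassembled(ChunkStore* store)
    {
        const int kBufSize = 64 * 1024;
        unsigned char* buffer = static_cast<unsigned char*>(av_malloc(kBufSize));
        AVIOContext* io = avio_alloc_context(buffer, kBufSize, 0, store,
                                             ReadPacket, nullptr, nullptr);

        AVFormatContext* fmt = avformat_alloc_context();
        fmt->pb = io;
        fmt->flags |= AVFMT_FLAG_CUSTOM_IO;
        if (avformat_open_input(&fmt, nullptr, nullptr, nullptr) < 0)
            return nullptr;   // avformat_open_input frees fmt on failure
        return fmt;
    }

Note there is no seek callback here, so containers that require seeking to find their header (e.g. an MP4 with the moov box at the end) would still need the complete file.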

Where is the endianness of the frame buffer device accounted for?

I'm working on a board with an AT91 SAMA5D3-based device. In order to test the framebuffer, I redirected a raw data file to /dev/fb0. The file was generated using GIMP and exported as a raw data file in the 'normal RGB' format, so as a serialised byte stream the data format in the file is RGBRGB (where each colour is 8 bits).
When copied to the framebuffer, the red and blue channels were swapped, as the LCD controller operates in little-endian format when configured for 24 bpp mode.
This got me wondering, at what point is the endianness of the framebuffer taken into account? I'm planning on using directfb and had a look at some of the docs but didn't see anything directly alluding to the endianness of the pixel data.
I'm not familiar with how the kernel framebuffer works and so am missing a few pieces of the puzzle.
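One place where the kernel describes the expected pixel layout of /dev/fb0 is the FBIOGET_VSCREENINFO ioctl: the fb_var_screeninfo bitfields give the bit offset and length of each colour channel, which user space can use to match its data to the controller's format. A small sketch that prints them:

    #include <fcntl.h>
    #include <linux/fb.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <cstdio>

    int main()
    {
        int fd = open("/dev/fb0", O_RDWR);
        if (fd < 0) { std::perror("open /dev/fb0"); return 1; }

        fb_var_screeninfo var = {};
        if (ioctl(fd, FBIOGET_VSCREENINFO, &var) < 0) {
            std::perror("FBIOGET_VSCREENINFO");
            close(fd);
            return 1;
        }

        std::printf("%ux%u, %u bpp\n", var.xres, var.yres, var.bits_per_pixel);
        std::printf("red:   offset %u, length %u\n", var.red.offset,   var.red.length);
        std::printf("green: offset %u, length %u\n", var.green.offset, var.green.length);
        std::printf("blue:  offset %u, length %u\n", var.blue.offset,  var.blue.length);
        // e.g. red.offset == 16 and blue.offset == 0 means the in-memory
        // order is B,G,R(,X), which is why raw RGB data shows swapped channels.
        close(fd);
        return 0;
    }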
