I captured raw audio data stream together with its WAVEFORMATEXTENSIBLE struct.
WAVEFORMATEXTENSIBLE is shown in the figure below:
Following the standard of wav file, I tried to write the raw bits into a wav file.
What I do is:
write "RIFF".
write a DWORD. (filesize - sizeof("RIFF") - sizeof(DWORD)).
=== WaveFormat Chunk ===
write "WAVEfmt "
write a DWORD. (size of the WAVEFORMATEXTENSIBLE struct)
write the WAVEFORMATEXTENSIBLE struct.
=== Fact Chunk ===
write "fact"
write a DWORD. ( 4 )
write a DWORD. ( num of samples in the stream, which should be sizeof(rawdata)*8/wBitsPerSample ).
=== Data Chunk ===
write "data"
write a DWORD (size of rawdata)
write the raw data.
After getting the wav file from the above steps, I played the wav file with media player, there is no sound, playing with audacity will give me a distorted sound, I can hear that it is the correct audio I want, but the sound is distorted with noise.
The raw data can be find here
The wav file I generate is here
It is very confusing to me, because when I use the same method to convert IEEE-float data to wav file, it works just fine.
I figured this out, it seems the getbuffer releasebuffer cycle in IAudioRenderClient is putting raw data that has the format same as that passed into the initialize method of the IAudioClient.
The GetMixFormat in IAudioClient in my case is different from the format passed into the initialize method. I think GetMixFormat gets the format that the device supports.
IAudioClient should have done the conversion of format from the initialized format to the mixformat. I intercept the initialize method, get the format, and it works like a charm.
I'm intercepting WASAPI to access the audio data and face the exact same issue where the generated audio file from the data sounds like the correct content but is very noisy somehow although the frame rate, sample width, number of channels etc. are set properly.
The SubFormat field of WAVEFORMATEXTENSIBLE shows that the data is actually KSDATAFORMAT_SUBTYPE_IEEE_FLOAT, while I originally treat it as integers. According to this page, KSDATAFORMAT_SUBTYPE_IEEE_FLOAT is equivalent to WAVE_FORMAT_IEEE_FLOAT in WAVEFORMATEX. Hence, setting the "audio format" in the wav file's fmt chunk(normally starts in the 20th position) to WAVE_FORMAT_IEEE_FLOAT(which is 3) solved the problem. Remember to put it in little endian.
Original value of audio format
After modification
Related
I have an MFC based project that decodes some data and generates 16 bit 48000 Hz raw wav audio data
The program continuously generates wav audio data in real time
Are there any functions in MFC that will let me play back the audio data in the sound card? I have been googling around for a while and the consensus seems to be that MFC doesn't have this feature. I have also found this tutorial that shows how to playback a wav file using PlaySound() function, but it looks like it is only for wav files and even if it plays audio data in memory, that data has to be prepared in the form of a full wav file with all the header information, while I need to play back raw wav data generated in real time
I have also seen people suggest using Direct X, but I feel like something like this should be possible using basic windows library functions without having to use any other extra libraries. I also found this tutorial for creating and reading wav files in an MFC based project, but it's not really clear how to use it to play raw wav data in memory. This tutorial uses waveOutOpen() function to playbakc the wav file, and it looks like this is probably what I need, but I cannot find a simple tutorial that shows how to use it.
How do I playback raw wav audio in memory in an MFC Dialog based project? I am looking for something where I can specify pointer to the wav data, number of samples, bits and sampling frequency and the function would playback the wav data for me. A basic working example such as generating a sinewave and playing it back will be appreciated. If directx is the only way to do this then that's fine as well.
I need to parse a H.264 stream to collect only NAL's needed to form a complete image, of only one frame. I'm reading the H.264 standard, but it's confuse and hard to read. I made some experiments but, did not worked. For example, i extracted an access unit with primary_pic_type == 0 containing only slice_type == 7 (I-Slice), it should give me a frame, but i tried to extract from ffmpeg, it did not work. But, when i append the next access_unit, containing only slice_type == 5 (P-Slice) it worked. Maybe i need to extract POC information, but i think not, because i only need extract one frame, but i'm not sure. Someone have some tip of how get only NAL's i need to form one complete image?
I assume that you have an "Annex B" style stream that looks like this:
(AUD)(SPS)(PPS)(I-Slice)(PPS)(P-Slice)(PPS)(P-Slice) ... (AUD)(SPS)(PPS)(I-Slice)
I assume that you want to decode a single I frame and we hope that your I frame is also an IDR frame.
Your are somewhere in the middle of the stream.
Keep reading until your find an (AUD) = 0x00 0x00 0x00 0x01 0x09.
Now push everything into your decoder until you are in front of | marking the second (PPS) : (AUD)(SPS)(PPS)(I-Slice) | (PPS)
Flush your decoder to emit an uncompressed frame.
This doesn't solve the general case but probably decodes most well behaved streams.
Just in case someone has the same problem, i solved it. I go until i find an AUD of primary_pic_type == 0. So i extract the AUD and the next one (when it's a field), send the two AUD to the server, and decode the frame using ffmpeg to generate a JPG image.
In trying to understand how to convert mediafoundation rgb32 data into a bitmap data that can be loaded into image/bitmap widgets or saved as a bitmap file, I am wondering what the RGB32 data actually is, in comparison to the data a BMP has?
Is it simply missing header information or key information a bitmap file has like width, height, etc?
What does RGB32 actually mean, in comparison to BMP data in a bitmap file or memory stream?
You normally have 32-bit RGB as IMFMediaBuffer attached to IMFSample. This is just bitmap bits, without format specific metadata. You can access this data by obtaining media buffer pointer, such as, for example, by doing IMFSample::ConvertToContiguousBuffer call, then doing IMFMediaBuffer::Lock to get a pixel data pointer.
The obtained buffer is compatible to data in standard .BMP file (except maybe, at some times, the rows could be in reverse order), it is just .BMP file has a header before this data. .BMP file normally has BITMAPFILEHEADER structure, then BITMAPINFOHEADER and then the buffer in question. If you write it one after another initialized respectively, this would yield you a valid picture file. This and other questions here show the way to create a .BMP file from bitmap bits.
See this GitHub code snippet, which is really close to the requested task and might be a good starting point.
I am trying to use ffmpeg, and have been doing a lot of experiment last 1 month.
I have not been able to get through. Is it really difficult to use FFmpeg?
My requirement is simple as below.
Can you please guide me if ffmpeg is suitable one or I have implement on my own (using codec libs available).
I have a webm file (having VP8 and OPUS frames)
I will read the encoded data and send it to remote guy
The remote guy will read the encoded data from socket
The remote guy will write it to a file (can we avoid decoding).
Then remote guy should be able to pay the file using ffplay or any player.
Now I will take a specific example.
Say I have a file small.webm, containing VP8 and OPUS frames.
I am reading only audio frames (OPUS) using av_read_frame api (Then checks stream index and filters audio frames only)
So now I have data buffer (encoded) as packet.data and encoded data buffer size as packet.size (Please correct me if wrong)
Here is my first doubt, everytime audio packet size is not same, why the difference. Sometimes packet size is as low as 54 bytes and sometimes it is 420 bytes. For OPUS will frame size vary from time to time?
Next say somehow extract a single frame (really do not know how to extract a single frame) from packet and send it to remote guy.
Now remote guy need to write the buffer to a file. To write the file we can use av_interleaved_write_frame or av_write_frame api. Both of them takes AVPacket as argument. Now I can have a AVPacket, set its data and size member. Then I can call av_write_frame api. But that does not work. Reason may be one should set other members in packet like ts, dts, pts etc. But I do not have such informations to set.
Can somebody help me to learn if FFmpeg is the right choice, or should I write a custom logic like parse a opus file and get frame by frame.
Now remote guy need to write the buffer to a file. To write the file
we can use av_interleaved_write_frame or av_write_frame api. Both of
them takes AVPacket as argument. Now I can have a AVPacket, set its
data and size member. Then I can call av_write_frame api. But that
does not work. Reason may be one should set other members in packet
like ts, dts, pts etc. But I do not have such informations to set.
Yes, you do. They were in the original packet you received from the demuxer in the sender. You need to serialize all information in this packet and set each value accordingly in the receiver.
I'm using a MSDN tutorial to encode RAW RGB32 frame to an h264 videon this first part works without any problem. ( http://msdn.microsoft.com/en-us/library/ff819477%28v=VS.85%29.aspx)
But, there is one think that i can do : I just want to write the output encoded video to a BYTE array other than the file, i have read about 400 different web pages and all the Media Foundation documentation, but i don't see how to do that !!
I have try many different way, life using MFCreateTempFile and work with the IMFByteStream but there is nothing to do !
After i have try with it :
http://msdn.microsoft.com/en-us/library/windows/desktop/ms698913%28v=VS.85%29.aspx
But my buffer is empty !
Please help me !! I'm losing my eyes !!
H.264 Video Encoder is an MFT, that is it exposes IMFTransform interface and does not necessarily need to participate in a session. You can instantiate it standalone, set it up and get raw H.264 encoded data from its ProcessOutput method.