I need to parse an H.264 stream to collect only the NAL units needed to form one complete image, for a single frame. I'm reading the H.264 standard, but it is confusing and hard to read. I ran some experiments, but they did not work. For example, I extracted an access unit with primary_pic_type == 0 containing only slice_type == 7 (I-Slice). It should give me a frame, but when I tried to extract it with ffmpeg, it did not work. However, when I appended the next access unit, containing only slice_type == 5 (P-Slice), it worked. Maybe I need to extract POC information, but I think not, because I only need to extract one frame; I'm not sure. Does anyone have a tip on how to get only the NAL units I need to form one complete image?
I assume that you have an "Annex B" style stream that looks like this:
(AUD)(SPS)(PPS)(I-Slice)(PPS)(P-Slice)(PPS)(P-Slice) ... (AUD)(SPS)(PPS)(I-Slice)
I assume that you want to decode a single I frame and we hope that your I frame is also an IDR frame.
You are somewhere in the middle of the stream.
Keep reading until you find an (AUD): a start code followed by a NAL header byte with nal_unit_type 9, typically 0x00 0x00 0x00 0x01 0x09.
Now push everything into your decoder until you reach the position marked | just before the second (PPS): (AUD)(SPS)(PPS)(I-Slice) | (PPS)
Flush your decoder to emit an uncompressed frame.
This doesn't solve the general case, but it probably decodes most well-behaved streams.
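The steps above can be sketched roughly as follows. This is only a minimal sketch, assuming a plain Annex B byte stream with no data partitioning; the helper names (split_annexb, first_idr_picture, to_annexb) are made up for illustration. It starts at an AUD, keeps SPS/PPS and the IDR slices (nal_unit_type 5), and stops at the first NAL unit after them.

```python
AUD, IDR_SLICE, NON_IDR_SLICE = 9, 5, 1
START_CODE = b"\x00\x00\x01"

def split_annexb(data: bytes):
    """Yield (nal_unit_type, nal_bytes) for each NAL unit in an Annex B stream."""
    starts = []
    pos = data.find(START_CODE)
    while pos != -1:
        starts.append(pos + 3)                 # the NAL unit begins right after 00 00 01
        pos = data.find(START_CODE, pos + 3)
    for n, begin in enumerate(starts):
        end = starts[n + 1] - 3 if n + 1 < len(starts) else len(data)
        # Drop zero bytes that belong to the next 4-byte start code or to padding;
        # a NAL unit itself never ends with a zero byte.
        nal = data[begin:end].rstrip(b"\x00")
        if nal:
            yield nal[0] & 0x1F, nal

def first_idr_picture(data: bytes):
    """Return the NAL units from an AUD up to and including the IDR slices."""
    collected, seen_idr, in_unit = [], False, False
    for nal_type, nal in split_annexb(data):
        if seen_idr and nal_type != IDR_SLICE:
            break                              # first NAL after the IDR slices: done
        if nal_type == AUD:
            collected, in_unit = [], True      # (re)start at every access unit delimiter
        if not in_unit:
            continue                           # still in the middle of the stream
        if nal_type == NON_IDR_SLICE:
            collected, in_unit = [], False     # not an IDR picture, wait for the next AUD
            continue
        collected.append(nal)
        seen_idr = seen_idr or nal_type == IDR_SLICE
    return collected if seen_idr else []

def to_annexb(nals):
    """Re-add 4-byte start codes so the result can be fed to a decoder."""
    return b"".join(b"\x00\x00\x00\x01" + nal for nal in nals)
```

The bytes returned by to_annexb(first_idr_picture(stream)) are what you would push into the decoder before flushing it.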
In case someone has the same problem: I solved it. I scan until I find an AUD with primary_pic_type == 0. Then I extract the access unit that starts at that AUD and the next one (when the picture is coded as fields), send the two access units to the server, and decode the frame using ffmpeg to generate a JPG image.
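For reference, primary_pic_type is the first three bits of the AUD payload, i.e. the top three bits of the byte that follows the NAL header byte. A minimal sketch of that check (the function name is made up, and the NAL unit is assumed to be passed without its start code):

```python
def aud_primary_pic_type(nal: bytes):
    """Return primary_pic_type (0..7) of an access unit delimiter, else None."""
    if len(nal) < 2 or (nal[0] & 0x1F) != 9:   # nal_unit_type 9 = access unit delimiter
        return None
    return nal[1] >> 5                          # u(3), followed by the RBSP stop bit

def starts_intra_only_access_unit(nal: bytes) -> bool:
    """True when the AUD announces a picture made of I slices only."""
    return aud_primary_pic_type(nal) == 0
```

An AUD that reads 09 10 therefore has primary_pic_type == 0, while 09 30 has primary_pic_type == 1.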
I have a set of bare MP3 files. Bare as in I removed all tags (no ID3, no Xing, no Info) from those files.
Just before sending one of these files to the client, I want to add an Info tag. All of my files are CBR so we will use an Info tag (no Xing).
Right now I get the first 4 bytes of the existing MP3 to get the Version (MPEG-1, Layer III), Bitrate, Frequency, Stereo Mode, etc. and thus determine the size of one frame. I create the tag that way, reusing these 4 bytes for the Info tag and determining the size of the frame.
For those wondering, these 4 bytes may look like this:
FF FB 78 04
To me it felt like you are expected to use the exact same first 4 bytes in the Info tag as found in the other audio frames of the MP3, but ffmpeg writes an Info tag with a hard-coded header (wrong bitrate, wrong frequency, etc.).
My question is: is ffmpeg really doing it right? (LAME doesn't do that.) Could I do the same, skipping the load of the first 4 bytes, and still have the great majority of players out there play my files as expected?
Note: since I read these 4 bytes over the network, it would definitely save a lot of time and some bandwidth to not have to load these 4 bytes on a HEAD request. Resources I could use for the GET requests instead...
The reason for the difference is that with certain configurations, the size of a frame is less than 192 bytes. In that case, the full Info/Xing tag will not fit (and from what I can see, the four optional fields are always included, so an Info/Xing tag is always full even if not required to be).
So, for example, if you have a single channel with 44.1kHz data at 32kbps, the MP3 frame is 104 or 105 bytes (144 * 32000 / 44100 is about 104.5, plus one byte when the padding bit is set). This is less than what is necessary to hold the Info/Xing tag.
What LAME does in that situation is forfeit the Info/Xing tag. It's not going to be seen anywhere in the file.
On the other hand, what FFMPEG does is create a frame with a higher bitrate. So instead of 32kbps, it will try with 48kbps and then 64kbps. Once it finds a configuration which offers a frame large enough to support the Info/Xing tag, it stops. (I have not looked at the code, so how FFMPEG really finds a large enough frame, I do not know, but on my end I just incremented the bitrate index field by one until frame size >= 192 and it works).
You can reproduce this by creating (or converting) a mono WAVE file at 44.1kHz, then converting it to MP3 at 32kbps using ffmpeg: the frame carrying the Info/Xing tag has a different bitrate than the audio frames.
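Here is a minimal sketch of the "increment the bitrate index until the frame is large enough" approach described above. It assumes MPEG-1 Layer III headers only and takes the 192 byte minimum from this answer; the function names are made up, and this is not a copy of what ffmpeg does internally.

```python
# MPEG-1 Layer III tables, indexed by the 4-bit bitrate and 2-bit sample rate fields.
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES = [44100, 48000, 32000, None]

def frame_size(header: bytes) -> int:
    """Frame length in bytes for an MPEG-1 Layer III header (4 bytes)."""
    if len(header) < 4 or header[0] != 0xFF or (header[1] & 0xFE) != 0xFA:
        raise ValueError("not an MPEG-1 Layer III frame header")
    bitrate = BITRATES_KBPS[header[2] >> 4]
    sample_rate = SAMPLE_RATES[(header[2] >> 2) & 0x03]
    padding = (header[2] >> 1) & 0x01
    if bitrate is None or sample_rate is None:
        raise ValueError("free-format or reserved header values are not handled")
    return 144 * bitrate * 1000 // sample_rate + padding

def info_tag_header(header: bytes, min_size: int = 192) -> bytes:
    """Raise the bitrate index until the frame can hold the Info/Xing tag."""
    candidate = bytes(header[:4])
    index = candidate[2] >> 4
    while frame_size(candidate) < min_size:
        index += 1
        if index > 14:
            raise ValueError("no bitrate gives a large enough frame")
        candidate = bytes([candidate[0], candidate[1],
                           (index << 4) | (candidate[2] & 0x0F), candidate[3]])
    return candidate
```

With the header FF FB 78 04 from the question, the frame is already 432 bytes, so the header comes back unchanged. A mono 44.1kHz header at 32kbps (FF FB 10 C0) ends up at 64kbps, the first bitrate whose frame reaches 192 bytes.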
The first problem is with audio resampling. I'm trying to rework doc/examples/transcode_aac.c so that it also resamples from 41100 to 48000; the example contains a warning that it can't do this.
Using doc/examples/resampling_audio.c as a reference, I saw that before calling swr_convert, I need to find the number of audio samples at the output with code like this:
int dst_nb_samples = av_rescale_rnd( input_frame->nb_samples + swr_get_delay(resampler_context, 41100),
48000, 41100, AV_ROUND_UP);
The problem is that when I just set int dst_nb_samples = input_frame->nb_samples (which is 1024), it encodes and plays normally, but when I use the av_rescale_rnd calculation (which results in 1196), the audio is slowed down and distorted, as if there were skips in it.
The second problem is with trying to mux WebM with Opus audio.
When I set AVStream->time_base to 1/48000 and increase AVFrame->pts by 960, the resulting file shows up in the player as much longer than it is: 17 seconds of audio is displayed as 16m11s, although it plays normally.
When I increase pts by 20, it displays normally, but I get a lot of [libopus # 00ffa660] Queue input is backward in time messages during encoding. The same happens when I increase pts by 30.
Should I try a time_scale of 1/1000? WebM always has timecodes in milliseconds, and Opus has a packet size of 20ms (960 samples at 48000 Hz).
Search for pts += 20;
Here is the whole file; all the modifications I made are marked with //MINE: http://www.mediafire.com/file/jlgo7x4hiz7bw64/transcode_aac.c
Here is the file I tested it on http://www.mediafire.com/file/zdy0zarlqw3qn6s/480P_600K_71149981_soundonly.mkv
The easiest way to achieve that is by using swr_convert_frame, which takes a frame and resamples it into a completely different one.
You can read more about it here: https://ffmpeg.org/doxygen/3.2/swresample_8h_source.html
dst_nb_samples can be calculated like this:
dst_nb_samples = 48000.0 / audio_stream->codec->sample_rate * inputAudioFrame->nb_samples;
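To sanity check the numbers, here is the same arithmetic as a small sketch (plain integer math rounded up, which is what av_rescale_rnd with AV_ROUND_UP computes for non-negative input; the function name is made up). Note that 1196 is what you get for an input rate of 41100 with no resampler delay. If the source file is actually 44100 Hz, which is only an assumption on my part since 41100 is not a common rate, the expected count for 1024 input samples is 1115.

```python
def dst_nb_samples(nb_samples: int, delay: int, in_rate: int, out_rate: int) -> int:
    """ceil((delay + nb_samples) * out_rate / in_rate) using integer arithmetic."""
    total = (delay + nb_samples) * out_rate
    return (total + in_rate - 1) // in_rate

count_44100 = dst_nb_samples(1024, 0, 44100, 48000)   # input really at 44100 Hz
count_41100 = dst_nb_samples(1024, 0, 41100, 48000)   # the rate written in the question
```

If the rate passed to the resampler and to this calculation does not match the real input rate, the output will sound slowed down, so that is worth checking first.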
Yours is probably correct too. I didn't check it, but this is the one I have used before. Compare it with yours; the number you gave checks out, so the real problem is probably somewhere else. Try to supply 960 samples in sync with the video frames. To do this you need to store the audio frames in an additional linear buffer. See if that fixes the problem.
And/or:
Secondly, in my experience the audio pts increases by the number of samples per frame (i.e. 960 for 50fps video at 48000 Hz (48000/50)), not by ms. If you supply 1196 samples, use pts += 1196 (if you are not using the additional buffer I mentioned above). This is different from video frame pts.
You are definitely on the right path. I'll examine the source code if I have time. Anyway, hope that helps.
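To illustrate the difference between counting in samples and counting in milliseconds, here is a small sketch that converts a pts between two time bases (the same idea as av_rescale_q; the function name and the rounding rule for ties are my own choices, and it only handles non-negative timestamps):

```python
from fractions import Fraction
from math import floor

def rescale_ts(ts: int, src_tb: Fraction, dst_tb: Fraction) -> int:
    """Convert a non-negative timestamp from src_tb to dst_tb, rounding to nearest."""
    return floor(ts * src_tb / dst_tb + Fraction(1, 2))

SAMPLE_TB = Fraction(1, 48000)   # encoder time base: one tick per sample
MS_TB = Fraction(1, 1000)        # millisecond time base

# Frames of 960 samples: pts advances by 960 ticks in the 1/48000 time base ...
pts_in_samples = [n * 960 for n in range(4)]
# ... which is the same instant as 20 ms steps in a 1/1000 time base.
pts_in_ms = [rescale_ts(p, SAMPLE_TB, MS_TB) for p in pts_in_samples]
```

So pts += 960 and pts += 20 describe the same step, but only when each is paired with the matching time base. Mixing them (960 ticks interpreted as milliseconds, or 20 ticks interpreted as samples) gives a wrong duration or timestamps that look like they go backwards.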
I am trying to use FFmpeg and have been experimenting a lot over the last month.
I have not been able to get it working. Is FFmpeg really that difficult to use?
My requirement is simple, as described below.
Can you please tell me whether FFmpeg is suitable, or whether I have to implement this on my own (using the available codec libraries)?
I have a webm file (having VP8 and OPUS frames)
I will read the encoded data and send it to remote guy
The remote guy will read the encoded data from socket
The remote guy will write it to a file (can we avoid decoding?).
Then the remote guy should be able to play the file using ffplay or any player.
Now I will take a specific example.
Say I have a file small.webm, containing VP8 and OPUS frames.
I am reading only the audio frames (OPUS) using the av_read_frame API (then checking the stream index and keeping audio packets only).
So now I have the encoded data buffer as packet.data and its size as packet.size (please correct me if I am wrong).
Here is my first doubt: the audio packet size is not the same every time. Why the difference? Sometimes the packet size is as low as 54 bytes and sometimes it is 420 bytes. For OPUS, does the frame size vary from time to time?
Next, say I somehow extract a single frame from the packet (I really do not know how to extract a single frame) and send it to the remote guy.
Now the remote guy needs to write the buffer to a file. To write the file we can use the av_interleaved_write_frame or av_write_frame API. Both of them take an AVPacket as argument. I can create an AVPacket and set its data and size members, then call av_write_frame. But that does not work. The reason may be that one should set other members of the packet like ts, dts, pts etc. But I do not have such information to set.
Can somebody help me learn whether FFmpeg is the right choice, or whether I should write custom logic, such as parsing an Opus file and getting it frame by frame?
Now remote guy need to write the buffer to a file. To write the file
we can use av_interleaved_write_frame or av_write_frame api. Both of
them takes AVPacket as argument. Now I can have a AVPacket, set its
data and size member. Then I can call av_write_frame api. But that
does not work. Reason may be one should set other members in packet
like ts, dts, pts etc. But I do not have such informations to set.
Yes, you do. They were in the original packet you received from the demuxer in the sender. You need to serialize all information in this packet and set each value accordingly in the receiver.
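As a rough illustration of what "serialize all information in this packet" could look like, here is a sketch of a made-up wire format: a fixed header with pts, dts, duration, stream_index and flags (the AVPacket fields of the same names) followed by the payload. The layout and the names Packet, pack_packet and unpack_packet are assumptions for this example, not an FFmpeg format.

```python
import struct
from dataclasses import dataclass

# Hypothetical header: pts, dts, duration (int64), stream_index, flags (int32), size (uint32)
HEADER = struct.Struct("<qqqiiI")

@dataclass
class Packet:
    pts: int
    dts: int
    duration: int
    stream_index: int
    flags: int
    data: bytes

def pack_packet(p: Packet) -> bytes:
    """Sender side: header with the timing fields, then the encoded payload."""
    return HEADER.pack(p.pts, p.dts, p.duration, p.stream_index, p.flags, len(p.data)) + p.data

def unpack_packet(buf: bytes):
    """Receiver side: return (Packet, bytes consumed) from the start of buf."""
    pts, dts, duration, stream_index, flags, size = HEADER.unpack_from(buf)
    end = HEADER.size + size
    if len(buf) < end:
        raise ValueError("incomplete packet")
    return Packet(pts, dts, duration, stream_index, flags, bytes(buf[HEADER.size:end])), end
```

On the receiving side you would copy these values into the AVPacket before calling av_write_frame. The receiver also needs the stream parameters and the time base once, before the first packet, so that the timestamps mean the same thing on both ends.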
I captured raw audio data stream together with its WAVEFORMATEXTENSIBLE struct.
WAVEFORMATEXTENSIBLE is shown in the figure below:
Following the WAV file format, I tried to write the raw bits into a wav file.
What I do is:
write "RIFF".
write a DWORD. (filesize - sizeof("RIFF") - sizeof(DWORD)).
=== WaveFormat Chunk ===
write "WAVEfmt "
write a DWORD. (size of the WAVEFORMATEXTENSIBLE struct)
write the WAVEFORMATEXTENSIBLE struct.
=== Fact Chunk ===
write "fact"
write a DWORD. ( 4 )
write a DWORD. ( num of samples in the stream, which should be sizeof(rawdata)*8/wBitsPerSample ).
=== Data Chunk ===
write "data"
write a DWORD (size of rawdata)
write the raw data.
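The steps above can be sketched like this, assuming a 40 byte WAVEFORMATEXTENSIBLE body and that you already know whether the samples are integer PCM or IEEE float. The function name and its parameters are made up for illustration; the important part is that the SubFormat GUID has to describe what the raw data really contains.

```python
import struct
import uuid

SUBTYPE_PCM = uuid.UUID("00000001-0000-0010-8000-00aa00389b71")
SUBTYPE_IEEE_FLOAT = uuid.UUID("00000003-0000-0010-8000-00aa00389b71")
WAVE_FORMAT_EXTENSIBLE = 0xFFFE

def build_wav(raw: bytes, channels: int, rate: int, bits: int,
              is_float: bool, channel_mask: int = 0) -> bytes:
    """Wrap raw interleaved samples in RIFF/WAVE with fmt, fact and data chunks."""
    block_align = channels * bits // 8
    subformat = SUBTYPE_IEEE_FLOAT if is_float else SUBTYPE_PCM
    fmt_body = struct.pack(
        "<HHIIHHHHI16s",
        WAVE_FORMAT_EXTENSIBLE, channels, rate, rate * block_align,
        block_align, bits,
        22,                      # cbSize: bytes that follow the basic WAVEFORMATEX
        bits,                    # wValidBitsPerSample
        channel_mask, subformat.bytes_le)
    fact_body = struct.pack("<I", len(raw) // block_align)   # samples per channel
    pad = b"\x00" if len(raw) % 2 else b""                   # chunks are word aligned
    chunks = (b"fmt " + struct.pack("<I", len(fmt_body)) + fmt_body
              + b"fact" + struct.pack("<I", len(fact_body)) + fact_body
              + b"data" + struct.pack("<I", len(raw)) + raw + pad)
    return b"RIFF" + struct.pack("<I", 4 + len(chunks)) + b"WAVE" + chunks
```

For example, build_wav(raw, 2, 48000, 32, is_float=True) would describe 32-bit float stereo data. Writing the same bytes with is_float=False gives a file that plays as loud noise with the original audio faintly recognizable underneath.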
After getting the wav file from the above steps, I played it with a media player and there was no sound. Playing it with Audacity gives me distorted sound: I can hear that it is the audio I want, but it is distorted with noise.
The raw data can be found here
The wav file I generate is here
It is very confusing to me, because when I use the same method to convert IEEE-float data to wav file, it works just fine.
I figured this out. It seems the GetBuffer/ReleaseBuffer cycle in IAudioRenderClient carries raw data in the same format as the one passed to the Initialize method of IAudioClient.
In my case, the format returned by GetMixFormat in IAudioClient is different from the format passed to the Initialize method. I think GetMixFormat returns the format that the device supports.
IAudioClient should do the conversion from the initialized format to the mix format. I intercepted the Initialize method, got the format from it, and it works like a charm.
I'm intercepting WASAPI to access the audio data and faced the exact same issue: the audio file generated from the data sounds like the correct content but is very noisy, although the frame rate, sample width, number of channels etc. are set properly.
The SubFormat field of WAVEFORMATEXTENSIBLE shows that the data is actually KSDATAFORMAT_SUBTYPE_IEEE_FLOAT, while I originally treated it as integers. According to this page, KSDATAFORMAT_SUBTYPE_IEEE_FLOAT is equivalent to WAVE_FORMAT_IEEE_FLOAT in WAVEFORMATEX. Hence, setting the "audio format" in the wav file's fmt chunk (normally at byte offset 20) to WAVE_FORMAT_IEEE_FLOAT (which is 3) solved the problem. Remember to write it in little-endian order.
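A minimal sketch of that patch, assuming the canonical layout where the fmt chunk is the first chunk after "WAVE" so that the 16-bit format tag sits at byte offset 20 (the function name is made up):

```python
import struct

WAVE_FORMAT_IEEE_FLOAT = 3

def patch_format_tag(wav: bytes, new_tag: int = WAVE_FORMAT_IEEE_FLOAT):
    """Return (old_tag, patched_bytes) with the fmt chunk's format tag replaced."""
    buf = bytearray(wav)
    if buf[0:4] != b"RIFF" or buf[8:12] != b"WAVE" or buf[12:16] != b"fmt ":
        raise ValueError("fmt chunk is not at the expected position")
    old_tag = struct.unpack_from("<H", buf, 20)[0]
    struct.pack_into("<H", buf, 20, new_tag)    # "<H" writes a little-endian 16-bit value
    return old_tag, bytes(buf)
```

If the fmt chunk is not the first chunk in your file, you would have to walk the chunk list to find it instead of using the fixed offset.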
Original value of audio format
After modification
I'm using an MSDN tutorial to encode RAW RGB32 frames to an H.264 video. This first part works without any problem. (http://msdn.microsoft.com/en-us/library/ff819477%28v=VS.85%29.aspx)
But there is one thing that I can't do: I just want to write the encoded output video to a BYTE array rather than to a file. I have read about 400 different web pages and all the Media Foundation documentation, but I don't see how to do that!
I have tried many different ways, like using MFCreateTempFile and working with the IMFByteStream, but nothing works!
After that I tried this:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms698913%28v=VS.85%29.aspx
But my buffer is empty!
Please help me! I'm losing my eyes!
The H.264 Video Encoder is an MFT; that is, it exposes the IMFTransform interface and does not necessarily need to participate in a session. You can instantiate it standalone, set it up, and get raw H.264 encoded data from its ProcessOutput method.