FFmpeg transcoded sound (AAC) stops after half video time - ffmpeg

I have a strange problem in my C/C++ FFmpeg transcoder, which takes an input MP4 (varying input codecs) and produces an output MP4 (x264 baseline & AAC LC @ 44100 Hz sample rate with libfdk_aac):
The resulting MP4 video has fine images (x264) and the audio (AAC LC) works fine as well, but it only plays until exactly halfway through the video.
The audio is not slowed down, not stretched and doesn't stutter. It just stops right in the middle of the video.
One hint may be that the input file has a sample rate of 22050 and 22050/44100 is 0.5, but I really don't get why this would make the sound just stop after half the time. I'd expect such an error leading to sound being at the wrong speed. Everything works just fine if I don't try to enforce 44100 and instead just use the incoming sample_rate.
Another guess would be that the pts calculation doesn't work. But the audio sounds just fine (until it stops) and I do exactly the same for the video part, where it works flawlessly. "Exactly", as in the same code, but "audio"-variables replaced with "video"-variables.
FFmpeg reports no errors during the whole process. I also flush the decoders/encoders/interleaved_writing after all the packet reading from the input is done. It works well for the video so I doubt there is much wrong with my general approach.
Here are the functions of my code (stripped of the error handling & other class stuff):
AudioCodecContext Setup
outContext->_audioCodec = avcodec_find_encoder(outContext->_audioTargetCodecID);
outContext->_audioStream =
avformat_new_stream(outContext->_formatContext, outContext->_audioCodec);
outContext->_audioCodecContext = outContext->_audioStream->codec;
outContext->_audioCodecContext->channels = 2;
outContext->_audioCodecContext->channel_layout = av_get_default_channel_layout(2);
outContext->_audioCodecContext->sample_rate = 44100;
outContext->_audioCodecContext->sample_fmt = outContext->_audioCodec->sample_fmts[0];
outContext->_audioCodecContext->bit_rate = 128000;
outContext->_audioCodecContext->strict_std_compliance = FF_COMPLIANCE_EXPERIMENTAL;
outContext->_audioCodecContext->time_base =
(AVRational){1, outContext->_audioCodecContext->sample_rate};
outContext->_audioStream->time_base = (AVRational){1, outContext->_audioCodecContext->sample_rate};
int retVal = avcodec_open2(outContext->_audioCodecContext, outContext->_audioCodec, NULL);
Resampler Setup
outContext->_audioResamplerContext =
swr_alloc_set_opts( NULL, outContext->_audioCodecContext->channel_layout,
outContext->_audioCodecContext->sample_fmt,
outContext->_audioCodecContext->sample_rate,
_inputContext._audioCodecContext->channel_layout,
_inputContext._audioCodecContext->sample_fmt,
_inputContext._audioCodecContext->sample_rate,
0, NULL);
int retVal = swr_init(outContext->_audioResamplerContext);
Decoding
decodedBytes = avcodec_decode_audio4( _inputContext._audioCodecContext,
_inputContext._audioTempFrame,
&p_gotAudioFrame, &_inputContext._currentPacket);
Converting (only if decoding produced a frame, of course)
int retVal = swr_convert( outContext->_audioResamplerContext,
outContext->_audioConvertedFrame->data,
outContext->_audioConvertedFrame->nb_samples,
(const uint8_t**)_inputContext._audioTempFrame->data,
_inputContext._audioTempFrame->nb_samples);
Encoding (only if decoding produced a frame, of course)
outContext->_audioConvertedFrame->pts =
av_frame_get_best_effort_timestamp(_inputContext._audioTempFrame);
// Init the new packet
av_init_packet(&outContext->_audioPacket);
outContext->_audioPacket.data = NULL;
outContext->_audioPacket.size = 0;
// Encode
int retVal = avcodec_encode_audio2( outContext->_audioCodecContext,
&outContext->_audioPacket,
outContext->_audioConvertedFrame,
&p_gotPacket);
// Set pts/dts time stamps for writing interleaved
av_packet_rescale_ts( &outContext->_audioPacket,
outContext->_audioCodecContext->time_base,
outContext->_audioStream->time_base);
outContext->_audioPacket.stream_index = outContext->_audioStream->index;
Writing (only if encoding produced a packet, of course)
int retVal = av_interleaved_write_frame(outContext->_formatContext, &outContext->_audioPacket);
I am quite out of ideas about what would cause such a behaviour.

So, I finally managed to figure things out myself.
The problem was indeed in the difference of the sample_rate.
You'd assume that a call to swr_convert() would give you all the samples you need for converting the audio frame when called like I did.
Of course, that would be too easy.
Instead, you need to call swr_convert (potentially) multiple times per frame and buffer its output, if required. Then you need to grab a single frame from the buffer and that is what you will have to encode.
Here is my new convertAudioFrame function:
// Calculate number of output samples
int numOutputSamples = av_rescale_rnd(
swr_get_delay(outContext->_audioResamplerContext, _inputContext._audioCodecContext->sample_rate)
+ _inputContext._audioTempFrame->nb_samples,
outContext->_audioCodecContext->sample_rate,
_inputContext._audioCodecContext->sample_rate,
AV_ROUND_UP);
if (numOutputSamples == 0)
{
return;
}
uint8_t* tempSamples;
av_samples_alloc( &tempSamples, NULL,
outContext->_audioCodecContext->channels, numOutputSamples,
outContext->_audioCodecContext->sample_fmt, 0);
int retVal = swr_convert( outContext->_audioResamplerContext,
&tempSamples,
numOutputSamples,
(const uint8_t**)_inputContext._audioTempFrame->data,
_inputContext._audioTempFrame->nb_samples);
// Write to audio fifo
if (retVal > 0)
{
retVal = av_audio_fifo_write(outContext->_audioFifo, (void**)&tempSamples, retVal);
}
av_freep(&tempSamples);
// Get a frame from audio fifo
int samplesAvailable = av_audio_fifo_size(outContext->_audioFifo);
if (samplesAvailable > 0)
{
retVal = av_audio_fifo_read(outContext->_audioFifo,
(void**)outContext->_audioConvertedFrame->data,
outContext->_audioCodecContext->frame_size);
// We got a frame, so also set its pts
if (retVal > 0)
{
p_gotConvertedFrame = 1;
if (_inputContext._audioTempFrame->pts != AV_NOPTS_VALUE)
{
outContext->_audioConvertedFrame->pts = _inputContext._audioTempFrame->pts;
}
else if (_inputContext._audioTempFrame->pkt_pts != AV_NOPTS_VALUE)
{
outContext->_audioConvertedFrame->pts = _inputContext._audioTempFrame->pkt_pts;
}
}
}
I basically call this function until there are no more frames in the audio FIFO buffer.
So, the audio was only half as long because I only encoded as many frames as I decoded, when I actually needed to encode twice as many frames because the output sample_rate is twice the input sample_rate.
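The arithmetic behind this can be sketched in plain C without FFmpeg. The helper names (`rescale_up`, `drain_frames`, `FifoState`) are hypothetical, and the AAC frame size of 1024 samples used in the note below is an assumption (the real value is `frame_size` on the opened encoder context). The sketch also advances a running sample counter in a 1/sample_rate time base for the output pts, which is a swapped-in alternative to copying the input frame's pts as the function above does.

```c
#include <stdint.h>

/* ceil(a * b / c) for non-negative values, mirroring what
   av_rescale_rnd(a, b, c, AV_ROUND_UP) computes. */
static int64_t rescale_up(int64_t a, int64_t b, int64_t c)
{
    return (a * b + c - 1) / c;
}

/* Hypothetical stand-in for the audio FIFO: only sample counts are tracked. */
typedef struct {
    int64_t buffered;  /* resampled samples waiting in the FIFO */
    int64_t next_pts;  /* pts of the next output frame, in 1/out_rate units */
} FifoState;

/* Add `converted` resampled samples, then drain every complete encoder frame.
   Returns the number of frames that would be handed to the encoder. */
static int drain_frames(FifoState *s, int64_t converted, int frame_size)
{
    int frames = 0;
    s->buffered += converted;
    while (s->buffered >= frame_size) {
        /* a real implementation would read frame_size samples here and
           stamp the frame with s->next_pts before encoding it */
        s->buffered -= frame_size;
        s->next_pts += frame_size;
        frames++;
    }
    return frames;
}
```

With a 22050 Hz input and a 44100 Hz output, one decoded 1024-sample frame becomes 2048 samples, so `drain_frames` reports two encoder frames for each decoded frame. Encoding only one of them matches the "audio stops at half time" symptom.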

Related

VideoToolbox hardware encoded I frame not clear on Intel Mac

When I captured video from the camera on an Intel Mac and used VideoToolbox to hardware-encode the raw pixel buffers to H.264 slices, I found that the I-frames encoded by VideoToolbox are not clear, so the video looks blurry every several seconds. Below are the properties I set:
self.bitrate = 1000000;
self.frameRate = 20;
int interval_second = 2;
NSDictionary *compressionProperties = @{
(id)kVTCompressionPropertyKey_ProfileLevel: (id)kVTProfileLevel_H264_High_AutoLevel,
(id)kVTCompressionPropertyKey_RealTime: @YES,
(id)kVTCompressionPropertyKey_AllowFrameReordering: @NO,
(id)kVTCompressionPropertyKey_H264EntropyMode: (id)kVTH264EntropyMode_CABAC,
(id)kVTCompressionPropertyKey_PixelTransferProperties: @{
(id)kVTPixelTransferPropertyKey_ScalingMode: (id)kVTScalingMode_Trim,
},
(id)kVTCompressionPropertyKey_AverageBitRate: @(self.bitrate),
(id)kVTCompressionPropertyKey_ExpectedFrameRate: @(self.frameRate),
(id)kVTCompressionPropertyKey_MaxKeyFrameInterval: @(self.frameRate * interval_second),
(id)kVTCompressionPropertyKey_MaxKeyFrameIntervalDuration: @(interval_second),
(id)kVTCompressionPropertyKey_DataRateLimits: @[@(self.bitrate / 8), @1.0],
};
result = VTSessionSetProperties(self.compressionSession, (CFDictionaryRef)compressionProperties);
if (result != noErr) {
NSLog(@"VTSessionSetProperties failed: %d", (int)result);
return;
} else {
NSLog(@"VTSessionSetProperties succeeded");
}
These are very strange compression settings. Do you really need a short GOP and very strict data rate limits?
I very much suspect you just copied some code off the internet without knowing what it does. If that's the case, just set interval_second = 300 and remove kVTCompressionPropertyKey_DataRateLimits completely.
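To see why these settings can be hard on key frames, it helps to put numbers on them. The sketch below is plain C arithmetic, not VideoToolbox code, and the helper names are hypothetical. It reads `DataRateLimits` as a hard cap of N bytes per window of S seconds, which is how Apple documents that property. Whether this cap is what actually blurs the I-frames in the question is an assumption.

```c
#include <stdint.h>

/* Hard cap in bytes for one DataRateLimits window. The question passes
   [bitrate / 8, 1.0], i.e. bitrate/8 bytes per 1-second window. */
static int64_t window_cap_bytes(int64_t bitrate_bps)
{
    return bitrate_bps / 8;
}

/* Average bytes available per frame if the window is shared evenly. */
static int64_t avg_bytes_per_frame(int64_t bitrate_bps, int fps)
{
    return window_cap_bytes(bitrate_bps) / fps;
}

/* Number of key frames forced per hour by a maximum key frame interval. */
static int keyframes_per_hour(int interval_seconds)
{
    return 3600 / interval_seconds;
}
```

At 1 Mbit/s and 20 fps this gives 125000 bytes per second, about 6250 bytes per frame on average, and a 2-second interval forces 1800 key frames per hour, each of which has to fit under the same one-second cap. With `interval_second = 300` the count drops to 12 per hour.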

H264 Decoding with Apple Video Toolkit

I am trying to get an H264 streaming app working on various platforms using a combination of Apple Video Toolbox and OpenH264. There is one use case that doesn't work and for which I can't find a solution: the source uses Video Toolbox on a 2011 iMac running macOS High Sierra and the receiver is a MacBook Pro running Big Sur.
On the receiver the decoded image is about 3/4 green. If I scale the image down to about 1/8 of the original before encoding then it works fine. If I capture the frames on the MacBook and then run exactly the same decoding software in a test program on the iMac then it decodes fine. Doing the same on the MacBook (same image of the test program) gives 3/4 green again. I have a similar problem when receiving from an OpenH264 encoder on a slower Windows machine. I suspect that this has something to do with temporal processing, but I really don't understand H264 well enough to work it out. One thing that I did notice is that the decode call returns with no error code but a NULL pixel buffer about 70% of the time.
The "guts" of the decoding part looks like this (modified from a demo on GitHub)
void didDecompress(void *decompressionOutputRefCon, void *sourceFrameRefCon, OSStatus status, VTDecodeInfoFlags infoFlags, CVImageBufferRef pixelBuffer, CMTime presentationTimeStamp, CMTime presentationDuration )
{
CVPixelBufferRef *outputPixelBuffer = (CVPixelBufferRef *)sourceFrameRefCon;
*outputPixelBuffer = CVPixelBufferRetain(pixelBuffer);
}
void initVideoDecodeToolBox ()
{
if (!decodeSession)
{
const uint8_t* parameterSetPointers[2] = { mSPS, mPPS };
const size_t parameterSetSizes[2] = { mSPSSize, mPPSSize };
OSStatus status = CMVideoFormatDescriptionCreateFromH264ParameterSets(kCFAllocatorDefault,2, //param count
parameterSetPointers,
parameterSetSizes,
4, //nal start code size
&formatDescription);
if(status == noErr)
{
CFDictionaryRef attrs = NULL;
const void *keys[] = { kCVPixelBufferPixelFormatTypeKey, kVTDecompressionPropertyKey_RealTime };
uint32_t v = kCVPixelFormatType_32BGRA;
const void *values[] = { CFNumberCreate(NULL, kCFNumberSInt32Type, &v), kCFBooleanTrue };
attrs = CFDictionaryCreate(NULL, keys, values, 2, NULL, NULL);
VTDecompressionOutputCallbackRecord callBackRecord;
callBackRecord.decompressionOutputCallback = didDecompress;
callBackRecord.decompressionOutputRefCon = NULL;
status = VTDecompressionSessionCreate(kCFAllocatorDefault, formatDescription, NULL, attrs, &callBackRecord, &decodeSession);
CFRelease(attrs);
}
else
{
NSLog(@"IOS8VT: reset decoder session failed status=%d", (int)status);
}
}
}
CVPixelBufferRef decode ( const char *NALBuffer, size_t NALSize )
{
CVPixelBufferRef outputPixelBuffer = NULL;
if (decodeSession && formatDescription )
{
// The NAL buffer has been stripped of the NAL length data, so this has to be put back in
MemoryBlock buf ( NALSize + 4);
memcpy ( (char*)buf.getData()+4, NALBuffer, NALSize );
*((uint32*)buf.getData()) = CFSwapInt32HostToBig ((uint32)NALSize);
CMBlockBufferRef blockBuffer = NULL;
OSStatus status = CMBlockBufferCreateWithMemoryBlock(kCFAllocatorDefault, buf.getData(), NALSize+4,kCFAllocatorNull,NULL, 0, NALSize+4, 0, &blockBuffer);
if(status == kCMBlockBufferNoErr)
{
CMSampleBufferRef sampleBuffer = NULL;
const size_t sampleSizeArray[] = {NALSize + 4};
status = CMSampleBufferCreateReady(kCFAllocatorDefault,blockBuffer,formatDescription,1, 0, NULL, 1, sampleSizeArray,&sampleBuffer);
if (status == noErr && sampleBuffer)
{
VTDecodeFrameFlags flags = 0;VTDecodeInfoFlags flagOut = 0;
// The default is synchronous operation:
// didDecompress is called before VTDecompressionSessionDecodeFrame returns.
OSStatus decodeStatus = VTDecompressionSessionDecodeFrame ( decodeSession, sampleBuffer, flags, &outputPixelBuffer, &flagOut );
if(decodeStatus != noErr)
{
DBG ( "decode failed status=" + String ( decodeStatus) );
}
CFRelease(sampleBuffer);
}
CFRelease(blockBuffer);
}
}
return outputPixelBuffer;
}
Note: the NAL blocks don't have a 00 00 00 01 separator because they are streamed in blocks with explicit length field.
Decoding works fine on all platforms, and the encoded stream decodes fine with OpenH264.
Well, I finally found the answer so I'm going to leave it here for posterity. It turns out that the Video Toolbox decode function expects all the NAL blocks that belong to the same frame to be copied into a single SampleBuffer. The older Mac provides the app with single keyframes that are split into separate NAL blocks, which the app then sends individually across the network. Unfortunately this means that only the first NAL block is processed, in my case less than a quarter of the picture, and the rest are discarded. What you need to do is work out which NALs are part of the same frame and bundle them together. Unfortunately this requires you to partially parse the PPS and the frames themselves, which is not trivial. Many thanks to the post here at the Apple site which put me on the right track.
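The bundling step can be sketched in plain C. This is a simplified illustration under loud assumptions, not the author's code: the function names are hypothetical, and the "new picture" test only checks whether `first_mb_in_slice` is 0 (the first `ue(v)` field of the slice header, where the value 0 is coded as a single `1` bit). That heuristic ignores arbitrary slice order and other cases that a full parser, as described above, would handle.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* NAL unit type is the low 5 bits of the first NAL byte
   (1 = non-IDR slice, 5 = IDR slice). */
static int nal_type(const uint8_t *nal)
{
    return nal[0] & 0x1F;
}

/* Heuristic (assumption): a coded slice starts a new picture when
   first_mb_in_slice == 0, i.e. when the top bit of the byte that
   follows the one-byte NAL header is set. */
static int starts_new_picture(const uint8_t *nal, size_t size)
{
    int t = nal_type(nal);
    if (size < 2 || (t != 1 && t != 5))
        return 0;
    return (nal[1] & 0x80) != 0;
}

/* Append one NAL unit to dst with a 4-byte big-endian length prefix.
   Returns the new write offset, or 0 if dst is too small. */
static size_t append_avcc(uint8_t *dst, size_t cap, size_t off,
                          const uint8_t *nal, size_t size)
{
    if (off > cap || cap - off < 4 || cap - off - 4 < size)
        return 0;
    dst[off + 0] = (uint8_t)(size >> 24);
    dst[off + 1] = (uint8_t)(size >> 16);
    dst[off + 2] = (uint8_t)(size >> 8);
    dst[off + 3] = (uint8_t)size;
    memcpy(dst + off + 4, nal, size);
    return off + 4 + size;
}
```

A receiver would keep appending slices to the same buffer until `starts_new_picture` returns 1 for an incoming slice, then hand the accumulated buffer to the decoder as one sample and start a new buffer with that slice.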

sync dumped rtp streams [duplicate]

Hi, I need a bit of help/guidance because I got stuck in my research.
The problem:
How to convert RTP data using either gstreamer or avlib (ffmpeg), either through the API (programmatically) or with the console tools.
Data
I have an RTP dump that comes from RTP/RTCP over TCP, so I can get the precise start and stop of each RTP packet in the file. It's an H264 video stream dump.
The data is in this form because I need to acquire the RTCP/RTP interleaved stream via libcurl (which I'm currently doing).
Status
I've tried to use ffmpeg to consume pure RTP packets, but it seems that using RTP, either from the console or programmatically, involves "starting" the whole RTSP/RTP session business in ffmpeg. I stopped there and for the time being haven't pursued this avenue further. I guess this is possible with a lower-level RTP API like ff_rtp_parse_packet(), but I'm too new to this library to do it straight away.
Then there is gstreamer. It has somewhat more capabilities to do this without programming, but for the time being I haven't been able to figure out how to pass it the RTP dump I have.
I have also tried a little bit of trickery: streaming the dump via socat/nc to a UDP port and listening on it via ffplay with an SDP file as input. There seems to be some progress, as the RTP at least gets recognized, but with socat there are loads of packets missing (data sent too fast, perhaps?) and in the end the data is not displayed. When I used nc the video was badly distorted, but at least there were fewer receive errors.
One way or another, the data is not displayed properly.
I know I can depacketize the data "by hand", but the idea is to do it via some kind of library, because in the end there would also be a second stream with audio that would have to be muxed together with the video.
I would appreciate any help on how to tackle this problem.
Thanks.
After some time I was finally able to sit down with this problem again, and I've got a solution that satisfies me. I went on with the RTP interleaved stream (RTP interleaved with RTCP over a single TCP connection).
So I had an interleaved RTCP/RTP stream that needed to be disassembled into audio (PCM A-law) and video (H.264 Constrained Baseline) RTP packets.
The decomposition of the RTSP stream containing RTP data is described in RFC 2326.
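As a minimal sketch of that decomposition: RFC 2326 (section 10.12) frames each interleaved packet with a `$` byte, a one-byte channel identifier and a two-byte big-endian length, followed by the RTP or RTCP packet itself. The struct and function names below are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t  channel;  /* interleaved channel id from the RTSP SETUP exchange */
    uint16_t length;   /* size in bytes of the RTP/RTCP packet that follows */
} InterleavedHeader;

/* Parse the 4-byte interleaved frame header at buf.
   Returns 1 on success, 0 if fewer than 4 bytes are available,
   and -1 if the data does not start with '$' (0x24). */
static int parse_interleaved_header(const uint8_t *buf, size_t len,
                                    InterleavedHeader *out)
{
    if (len < 4)
        return 0;
    if (buf[0] != 0x24)
        return -1;
    out->channel = buf[1];
    out->length = (uint16_t)((buf[2] << 8) | buf[3]);
    return 1;
}
```

After a successful parse, the next `length` bytes are one complete packet. Which channel carries RTP and which carries RTCP depends on the `interleaved=` values negotiated in the session setup.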
Depacketization of H.264 is described in RFC 6184; for PCM A-law the RTP payload turned out to be raw audio, so no depacketization was necessary.
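One part of RFC 6184 depacketization that usually needs code is the FU-A fragmentation unit (payload type 28), where a large NAL unit is split across several RTP packets. The sketch below only rebuilds the original NAL header byte and reports the start/end flags; the function name is hypothetical, and single NAL unit packets and STAP-A aggregation are not covered.

```c
#include <stddef.h>
#include <stdint.h>

/* Inspect an RTP payload that may be an FU-A fragment.
   payload[0] is the FU indicator (F, NRI, type = 28),
   payload[1] is the FU header (S, E, R, original NAL type).
   Returns 1 and fills the outputs for FU-A, 0 otherwise. */
static int fua_nal_header(const uint8_t *payload, size_t len,
                          uint8_t *nal_header, int *is_start, int *is_end)
{
    if (len < 2 || (payload[0] & 0x1F) != 28)
        return 0;
    /* F and NRI bits come from the indicator, the type from the FU header */
    *nal_header = (uint8_t)((payload[0] & 0xE0) | (payload[1] & 0x1F));
    *is_start = (payload[1] & 0x80) != 0;
    *is_end = (payload[1] & 0x40) != 0;
    return 1;
}
```

On a start fragment the depacketizer would emit the rebuilt header byte followed by the payload from offset 2; for later fragments it appends only the bytes from offset 2, until a fragment with the end flag completes the NAL unit.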
The next step was to calculate a proper PTS (presentation time stamp) for each stream. That was a bit of a hassle, but the Live555 code finally came to help
(see RTP lipsync synchronization).
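The usual idea behind this kind of synchronization is that each RTCP sender report pairs a wall-clock (NTP) time with an RTP timestamp of the same stream, so any later RTP timestamp can be mapped onto the shared wall clock. The sketch below assumes the NTP time has already been converted to microseconds; the struct and function names are hypothetical and this is not the Live555 code itself.

```c
#include <stdint.h>

typedef struct {
    int64_t  ntp_us;  /* wall-clock time of the sender report, in microseconds */
    uint32_t rtp_ts;  /* RTP timestamp carried in the same sender report */
} SenderReport;

/* Map an RTP timestamp to wall-clock microseconds using the last sender
   report. clock_rate is 90000 for H.264 video and 8000 for PCM A-law.
   The unsigned subtraction followed by a signed cast handles 32-bit
   timestamp wraparound (assumes two's complement conversion). */
static int64_t rtp_to_wallclock_us(const SenderReport *sr, uint32_t rtp_ts,
                                   uint32_t clock_rate)
{
    int32_t diff = (int32_t)(rtp_ts - sr->rtp_ts);
    return sr->ntp_us + (int64_t)diff * 1000000 / (int64_t)clock_rate;
}
```

Because both the audio and the video stream are mapped onto the same wall clock, the resulting times can be rescaled to each output stream's time base and used as pts values that stay in sync.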
The last task was to mux everything into a container that supports PCM A-law; I used ffmpeg's libav* libraries.
There are many examples on the Internet, but many of them are outdated (ffmpeg's API changes very often), so I'm posting the most important parts of what actually worked for me in the end:
The setup part:
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
#include "libavutil/intreadwrite.h"
#include "libavutil/mathematics.h"
AVFormatContext *formatContext;
AVOutputFormat *outputFormat;
AVStream *video_st;
AVStream *audio_st;
AVCodec *av_encode_codec = NULL;
AVCodec *av_audio_encode_codec = NULL;
AVCodecContext *av_video_encode_codec_ctx = NULL;
AVCodecContext *av_audio_encode_codec_ctx = NULL;
av_register_all();
av_log_set_level(AV_LOG_TRACE);
outputFormat = av_guess_format(NULL, pu8outFileName, NULL);
outputFormat->video_codec = AV_CODEC_ID_H264;
av_encode_codec = avcodec_find_encoder(AV_CODEC_ID_H264);
av_audio_encode_codec = avcodec_find_encoder(AV_CODEC_ID_PCM_ALAW);
avformat_alloc_output_context2(&formatContext, NULL, NULL, pu8outFileName);
formatContext->oformat = outputFormat;
strcpy(formatContext->filename, pu8outFileName);
outputFormat->audio_codec = AV_CODEC_ID_PCM_ALAW;
av_video_encode_codec_ctx = avcodec_alloc_context3(av_encode_codec);
av_audio_encode_codec_ctx = avcodec_alloc_context3(av_audio_encode_codec);
av_video_encode_codec_ctx->codec_id = outputFormat->video_codec;
av_video_encode_codec_ctx->codec_type = AVMEDIA_TYPE_VIDEO;
av_video_encode_codec_ctx->bit_rate = 4000;
av_video_encode_codec_ctx->width = u32width;
av_video_encode_codec_ctx->height = u32height;
av_video_encode_codec_ctx->time_base = (AVRational){ 1, u8fps };
av_video_encode_codec_ctx->max_b_frames = 0;
av_video_encode_codec_ctx->pix_fmt = AV_PIX_FMT_YUV420P;
av_audio_encode_codec_ctx->sample_fmt = AV_SAMPLE_FMT_S16;
av_audio_encode_codec_ctx->codec_id = AV_CODEC_ID_PCM_ALAW;
av_audio_encode_codec_ctx->codec_type = AVMEDIA_TYPE_AUDIO;
av_audio_encode_codec_ctx->sample_rate = 8000;
av_audio_encode_codec_ctx->channels = 1;
av_audio_encode_codec_ctx->time_base = (AVRational){ 1, u8fps };
av_audio_encode_codec_ctx->channel_layout = AV_CH_LAYOUT_MONO;
video_st = avformat_new_stream(formatContext, av_encode_codec);
audio_st = avformat_new_stream(formatContext, av_audio_encode_codec);
audio_st->index = 1;
video_st->avg_frame_rate = (AVRational){ 90000, 90000 / u8fps };
av_stream_set_r_frame_rate(video_st, (AVRational){ 90000, 90000 / u8fps });
The packets for video are written like this:
uint8_t *pu8framePtr = video_frame;
AVPacket pkt = { 0 };
av_init_packet(&pkt);
if (0x65 == pu8framePtr[4] || 0x67 == pu8framePtr[4] || 0x68 == pu8framePtr[4])
{
pkt.flags = AV_PKT_FLAG_KEY;
}
pkt.data = (uint8_t *)pu8framePtr;
pkt.size = u32LastFrameSize;
pkt.pts = av_rescale_q(s_video_sync.fSyncTime.tv_sec * 1000000 + s_video_sync.fSyncTime.tv_usec, (AVRational){ 1, 1000000 }, video_st->time_base);
pkt.dts = pkt.pts;
pkt.stream_index = video_st->index;
av_interleaved_write_frame(formatContext, &pkt);
av_packet_unref(&pkt);
and for the audio like this:
AVPacket pkt = { 0 };
av_init_packet(&pkt);
pkt.flags = AV_PKT_FLAG_KEY;
pkt.data = (uint8_t *)pu8framePtr;
pkt.size = u32AudioDataLen;
pkt.pts = av_rescale_q(s_audio_sync.fSyncTime.tv_sec * 1000000 + s_audio_sync.fSyncTime.tv_usec, (AVRational){ 1, 1000000 }, audio_st->time_base);
pkt.dts = pkt.pts;
pkt.stream_index = audio_st->index;
if (u8FirstIFrameFound) {av_interleaved_write_frame(formatContext, &pkt);}
av_packet_unref(&pkt);
and at the end some deinits:
av_write_trailer(formatContext);
av_dump_format(formatContext, 0, pu8outFileName, 1);
avcodec_free_context(&av_video_encode_codec_ctx);
avcodec_free_context(&av_audio_encode_codec_ctx);
avio_closep(&formatContext->pb);
avformat_free_context(formatContext);
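The pts computation in the packet-writing code above rescales a microsecond time into the stream's time base. As a rough, FFmpeg-free sketch of what that rescaling does: `Rational` is a hypothetical stand-in for `AVRational`, the helper names are made up, and the code assumes non-negative inputs. The quotient and remainder are scaled separately so that large wall-clock values do not overflow 64 bits.

```c
#include <stdint.h>

typedef struct { int num; int den; } Rational;  /* stand-in for AVRational */

/* Rescale a non-negative value from time base `from` to time base `to`,
   rounding to nearest, which is the default rounding of av_rescale_q(). */
static int64_t rescale_q(int64_t a, Rational from, Rational to)
{
    int64_t num = (int64_t)from.num * to.den;
    int64_t den = (int64_t)from.den * to.num;
    int64_t q = a / den;
    int64_t r = a % den;
    return q * num + (r * num + den / 2) / den;
}

/* Convert a seconds/microseconds pair to ticks of the stream time base. */
static int64_t timeval_to_pts(int64_t sec, int64_t usec, Rational stream_tb)
{
    Rational us = {1, 1000000};
    return rescale_q(sec * 1000000 + usec, us, stream_tb);
}
```

For example, 1.5 seconds becomes 135000 ticks in a 1/90000 time base, and 125 microseconds is exactly one tick in a 1/8000 time base.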

Is it possible to decode MPEG4 frames without delay with ffmpeg?

I use ffmpeg's MPEG4 decoder. The decoder has CODEC_CAP_DELAY capability among others. It means the decoder will give me decoded frames with latency of 1 frame.
I have a set of MPEG4 (I- & P- )frames from AVI file and feed ffmpeg decoder with these frames. For the very first I-frame decoder gives me nothing, but decodes the frames successfully. I can force the decoder to get the decoded frame with the second call of avcodec_decode_video2 and providing nulls (flush it), but if I do so for each frame I get artifacts for the first group of pictures (e.g. second decoded P-frame is of gray color).
If I do not force ffmpeg decoder to give me decoded frame right now, then it works flawlessly and without artifacts.
Question: But is it possible to get decoded frame without giving the decoder next frame and without artifacts?
Small example of how decoding is implemented for each frame:
// decode
int got_frame = 0;
int err = 0;
int tries = 5;
do
{
err = avcodec_decode_video2(m_CodecContext, m_Frame, &got_frame, &m_Packet);
/* some codecs, such as MPEG, transmit the I and P frame with a
latency of one frame. You must do the following to have a
chance to get the last frame of the video */
m_Packet.data = NULL;
m_Packet.size = 0;
--tries;
}
while (err >= 0 && got_frame == 0 && tries > 0);
But as I said that gave me artifacts for the first gop.
Use the "-flags +low_delay" option (or in code, set AVCodecContext.flags |= CODEC_FLAG_LOW_DELAY).
I tested several options, and "-flags low_delay" and "-probesize 32" matter more than the others. The code below worked for me.
AVDictionary* avDic = nullptr;
av_dict_set(&avDic, "flags", "low_delay", 0);
av_dict_set(&avDic, "probesize", "32", 0);
const int errorCode = avformat_open_input(&pFormatCtx, mUrl.c_str(), nullptr, &avDic);

AVFrame to RGB - decoding artifacts

I want to programmatically convert a mp4 video file (with h264 codec) to single RGB images. With the command line this looks like:
ffmpeg -i test1080.mp4 -r 30 image-%3d.jpg
This command produces a nice set of pictures. But when I try to do the same programmatically, some images (probably B and P frames) look odd (e.g. they have distorted areas that look like difference information). The reading and conversion code is as follows:
AVFrame *frame = avcodec_alloc_frame();
AVFrame *frameRGB = avcodec_alloc_frame();
AVPacket packet;
int buffer_size=avpicture_get_size(PIX_FMT_RGB24, m_codecCtx->width,
m_codecCtx->height);
uint8_t *buffer = new uint8_t[buffer_size];
avpicture_fill((AVPicture *)frameRGB, buffer, PIX_FMT_RGB24,
m_codecCtx->width, m_codecCtx->height);
while (true)
{
// Read one packet into `packet`
if (av_read_frame(m_formatCtx, &packet) < 0) {
break; // End of stream. Done decoding.
}
if (avcodec_decode_video(m_codecCtx, frame, &buffer_size, packet.data, packet.size) < 1) {
break; // Error in decoding
}
if (!buffer_size) {
break;
}
// Convert
img_convert((AVPicture *)frameRGB, PIX_FMT_RGB24, (AVPicture*)frame,
m_codecCtx->pix_fmt, m_codecCtx->width, m_codecCtx->height);
// RGB data is now available in frameRGB for further processing
}
How can I convert the video stream so that each final image shows all image data, so that information from B and P frames is included in all frames?
[EDIT:] A sample image showing the artifacts is here: http://imageshack.us/photo/my-images/201/sampleq.jpg/
Regards,
If the third argument of avcodec_decode_video comes back as zero, that does not indicate an error. It means that the frame is not ready yet. You need to keep reading packets until the value becomes nonzero.
if (!buffer_size) {
continue;
}
Update:
Try to add the check and display only the key frames, it will help isolate the problem.
while (true)
{
// Read one packet into `packet`
if (av_read_frame(m_formatCtx, &packet) < 0) {
break; // End of stream. Done decoding.
}
if (avcodec_decode_video(m_codecCtx, frame, &buffer_size,
packet.data, packet.size) < 1)
{
break; // Error in decoding
}
if (!buffer_size) {
continue; // <-- It's important!
}
// check for key frame
if (packet.flags & AV_PKT_FLAG_KEY)
{
// Convert
img_convert((AVPicture *)frameRGB, PIX_FMT_RGB24, (AVPicture*)frame,
m_codecCtx->pix_fmt, m_codecCtx->width, m_codecCtx->height);
}
}
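The difference between `break` and `continue` in this loop can be shown with a small model that needs no FFmpeg. The array stands in for the "got picture" result of each decode call; the function name and the sample pattern are hypothetical.

```c
#include <stddef.h>

/* got[i] is 1 if the decoder produced a picture for packet i and 0 if it
   only buffered the packet. Returns how many pictures the loop converts. */
static int pictures_converted(const int *got, size_t n, int break_on_empty)
{
    int converted = 0;
    for (size_t i = 0; i < n; i++) {
        if (!got[i]) {
            if (break_on_empty)
                break;      /* behaviour of the loop in the question */
            continue;       /* behaviour of the corrected loop */
        }
        converted++;
    }
    return converted;
}
```

With a pattern such as `{0, 1, 1, 0, 1}`, where the decoder returns nothing for the first packet, the `break` variant converts no pictures at all while the `continue` variant converts three.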
