I am trying to get an H.264 streaming app working on various platforms using a combination of Apple Video Toolbox and OpenH264. There is one use case that doesn't work and I can't find any solution: when the source uses Video Toolbox on a 2011 iMac running macOS High Sierra and the receiver is a MacBook Pro running Big Sur.
On the receiver the decoded image is about 3/4 green. If I scale the image down to about 1/8 of the original before encoding, it works fine. If I capture the frames on the MacBook and then run exactly the same decoding software in a test program on the iMac, it decodes fine. Doing the same on the MacBook (same images, same test program) gives 3/4 green again. I have a similar problem when receiving from an OpenH264 encoder on a slower Windows machine. I suspect that this has something to do with temporal processing, but I really don't understand H.264 well enough to work it out. One thing I did notice is that the decode call returns with no error code but a NULL pixel buffer about 70% of the time.
The "guts" of the decoding part looks like this (modified from a demo on GitHub)
void didDecompress(void *decompressionOutputRefCon, void *sourceFrameRefCon, OSStatus status, VTDecodeInfoFlags infoFlags, CVImageBufferRef pixelBuffer, CMTime presentationTimeStamp, CMTime presentationDuration )
{
CVPixelBufferRef *outputPixelBuffer = (CVPixelBufferRef *)sourceFrameRefCon;
*outputPixelBuffer = CVPixelBufferRetain(pixelBuffer);
}
void initVideoDecodeToolBox ()
{
if (!decodeSession)
{
const uint8_t* parameterSetPointers[2] = { mSPS, mPPS };
const size_t parameterSetSizes[2] = { mSPSSize, mPPSSize };
OSStatus status = CMVideoFormatDescriptionCreateFromH264ParameterSets(kCFAllocatorDefault,2, //param count
parameterSetPointers,
parameterSetSizes,
4, //nal start code size
&formatDescription);
if(status == noErr)
{
uint32_t v = kCVPixelFormatType_32BGRA;
CFNumberRef pixelFormat = CFNumberCreate(NULL, kCFNumberSInt32Type, &v);
const void *keys[] = { kCVPixelBufferPixelFormatTypeKey, kVTDecompressionPropertyKey_RealTime };
const void *values[] = { pixelFormat, kCFBooleanTrue };
CFDictionaryRef attrs = CFDictionaryCreate(NULL, keys, values, 2, &kCFTypeDictionaryKeyCallBacks, &kCFTypeDictionaryValueCallBacks);
CFRelease(pixelFormat); // the dictionary retains it via the type callbacks
VTDecompressionOutputCallbackRecord callBackRecord;
callBackRecord.decompressionOutputCallback = didDecompress;
callBackRecord.decompressionOutputRefCon = NULL;
status = VTDecompressionSessionCreate(kCFAllocatorDefault, formatDescription, NULL, attrs, &callBackRecord, &decodeSession);
CFRelease(attrs);
}
else
{
NSLog(#"IOS8VT: reset decoder session failed status=%d", status);
}
}
}
CVPixelBufferRef decode ( const char *NALBuffer, size_t NALSize )
{
CVPixelBufferRef outputPixelBuffer = NULL;
if (decodeSession && formatDescription )
{
// The NAL buffer has been stripped of the NAL length data, so this has to be put back in
MemoryBlock buf ( NALSize + 4);
memcpy ( (char*)buf.getData()+4, NALBuffer, NALSize );
*((uint32*)buf.getData()) = CFSwapInt32HostToBig ((uint32)NALSize);
CMBlockBufferRef blockBuffer = NULL;
OSStatus status = CMBlockBufferCreateWithMemoryBlock(kCFAllocatorDefault, buf.getData(), NALSize+4,kCFAllocatorNull,NULL, 0, NALSize+4, 0, &blockBuffer);
if(status == kCMBlockBufferNoErr)
{
CMSampleBufferRef sampleBuffer = NULL;
const size_t sampleSizeArray[] = {NALSize + 4};
status = CMSampleBufferCreateReady(kCFAllocatorDefault,blockBuffer,formatDescription,1, 0, NULL, 1, sampleSizeArray,&sampleBuffer);
if (status == kCMBlockBufferNoErr && sampleBuffer)
{
VTDecodeFrameFlags flags = 0;
VTDecodeInfoFlags flagOut = 0;
// The default is synchronous operation, so didDecompress
// has already been called by the time DecodeFrame returns.
OSStatus decodeStatus = VTDecompressionSessionDecodeFrame ( decodeSession, sampleBuffer, flags, &outputPixelBuffer, &flagOut );
if(decodeStatus != noErr)
{
DBG ( "decode failed status=" + String ( decodeStatus) );
}
CFRelease(sampleBuffer);
}
CFRelease(blockBuffer);
}
}
return outputPixelBuffer;
}
Note: the NAL blocks don't have a 00 00 00 01 separator because they are streamed in blocks with an explicit length field.
Decoding works fine on all platforms, and the encoded stream decodes fine with OpenH264.
Well, I finally found the answer, so I'm going to leave it here for posterity. It turns out that the Video Toolbox decode function expects all of the NAL blocks that belong to the same frame to be copied into a single sample buffer. The older Mac is providing the app with single keyframes that are split into separate NAL blocks, which the app then sends individually across the network. Unfortunately this means that only the first NAL block is processed, in my case less than a quarter of the picture, and the rest are discarded. What you need to do is work out which NALs are part of the same frame and bundle them together. Unfortunately this requires you to partially parse the PPS and the frames themselves, which is not trivial. Many thanks to the post here at the Apple site which put me on the right track.
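In case it helps anyone following the same trail, here is a minimal sketch of the frame-boundary test I ended up with. The helper name is mine and the rules are simplified; a fully conformant access-unit detector (ITU-T H.264, section 7.4.1.2.4) compares several more slice-header fields, but this was enough for my streams:
// Hypothetical helper: true when a NAL unit (length prefix already stripped)
// is a coded slice that begins a new picture. first_mb_in_slice is the first
// Exp-Golomb value after the one-byte NAL header, and ue(v) == 0 is coded as
// a single '1' bit, i.e. the top bit of the second byte.
static bool nalStartsNewPicture (const uint8_t* nal, size_t size)
{
    if (size < 2)
        return false;

    const int nalType = nal[0] & 0x1f;   // 1 = non-IDR slice, 5 = IDR slice

    if (nalType != 1 && nalType != 5)    // SPS/PPS/SEI etc. never start a picture;
        return false;                    // keep them buffered for the next frame

    return (nal[1] & 0x80) != 0;         // first_mb_in_slice == 0
}
The caller accumulates length-prefixed NALs into one buffer; whenever this returns true for an incoming slice, the buffer collected so far is submitted as a single CMSampleBuffer and a new one is started. (SPS and PPS are better diverted into the CMVideoFormatDescription, as in initVideoDecodeToolBox above.)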
When I capture video from the camera on an Intel Mac and use VideoToolbox to hardware-encode the raw pixel buffers into H.264 slices, I find that the encoded I-frames are not clear, which makes the video look blurry every several seconds. Below are the properties I set:
self.bitrate = 1000000;
self.frameRate = 20;
int interval_second = 2;
NSDictionary *compressionProperties = @{
(id)kVTCompressionPropertyKey_ProfileLevel: (id)kVTProfileLevel_H264_High_AutoLevel,
(id)kVTCompressionPropertyKey_RealTime: @YES,
(id)kVTCompressionPropertyKey_AllowFrameReordering: @NO,
(id)kVTCompressionPropertyKey_H264EntropyMode: (id)kVTH264EntropyMode_CABAC,
(id)kVTCompressionPropertyKey_PixelTransferProperties: @{
(id)kVTPixelTransferPropertyKey_ScalingMode: (id)kVTScalingMode_Trim,
},
(id)kVTCompressionPropertyKey_AverageBitRate: @(self.bitrate),
(id)kVTCompressionPropertyKey_ExpectedFrameRate: @(self.frameRate),
(id)kVTCompressionPropertyKey_MaxKeyFrameInterval: @(self.frameRate * interval_second),
(id)kVTCompressionPropertyKey_MaxKeyFrameIntervalDuration: @(interval_second),
(id)kVTCompressionPropertyKey_DataRateLimits: @[@(self.bitrate / 8), @1.0],
};
result = VTSessionSetProperties(self.compressionSession, (CFDictionaryRef)compressionProperties);
if (result != noErr) {
NSLog(#"VTSessionSetProperties failed: %d", (int)result);
return;
} else {
NSLog(#"VTSessionSetProperties succeeded");
}
These are very strange compression settings. Do you really need such a short GOP and such strict data-rate limits?
I very much suspect you just copied some code off the internet without having any idea what it does. If that's the case, just set interval_second = 300 and remove kVTCompressionPropertyKey_DataRateLimits completely.
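Concretely, the property dictionary from the question would then look something like this (same self.bitrate / self.frameRate fields as above; the scaling-mode entry is dropped here only for brevity):
int interval_second = 300; // long GOP, as suggested above
NSDictionary *compressionProperties = @{
    (id)kVTCompressionPropertyKey_ProfileLevel: (id)kVTProfileLevel_H264_High_AutoLevel,
    (id)kVTCompressionPropertyKey_RealTime: @YES,
    (id)kVTCompressionPropertyKey_AllowFrameReordering: @NO,
    (id)kVTCompressionPropertyKey_H264EntropyMode: (id)kVTH264EntropyMode_CABAC,
    (id)kVTCompressionPropertyKey_AverageBitRate: @(self.bitrate),
    (id)kVTCompressionPropertyKey_ExpectedFrameRate: @(self.frameRate),
    (id)kVTCompressionPropertyKey_MaxKeyFrameInterval: @(self.frameRate * interval_second),
    (id)kVTCompressionPropertyKey_MaxKeyFrameIntervalDuration: @(interval_second),
    // kVTCompressionPropertyKey_DataRateLimits deliberately omitted: a hard
    // cap of bitrate/8 bytes per second starves the keyframes of bits
};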
When I run encoder->ProcessInput(stream_id, sample.Get(), 0) I get an E_FAIL ("Unspecified error"), which isn't very helpful.
I am trying to either (1) figure out what the real error is, and/or (2) get past this unspecified error.
Ultimately, my goal is achieving this: http://alax.info/blog/1716
Here's the gist of what I am doing:
(Error occurs in this block)
void encode_frame(ComPtr<ID3D11Texture2D> texture) {
_com_error error = NULL;
IMFTransform *encoder = nullptr;
encoder = get_encoder();
if (!encoder) {
cout << "Did not get a valid encoder to utilize\n";
return;
}
cout << "Making it Direct3D aware...\n";
setup_D3_aware_mft(encoder);
cout << "Setting up input/output media types...\n";
setup_media_types(encoder);
error = encoder->ProcessMessage(MFT_MESSAGE_COMMAND_FLUSH, NULL); // flush all stored data
error = encoder->ProcessMessage(MFT_MESSAGE_NOTIFY_BEGIN_STREAMING, NULL);
error = encoder->ProcessMessage(MFT_MESSAGE_NOTIFY_START_OF_STREAM, NULL); // first sample is about to be processed, req for async
cout << "Encoding image...\n";
IMFMediaEventGenerator *event_generator = nullptr;
error = encoder->QueryInterface(&event_generator);
while (true) {
IMFMediaEvent *event = nullptr;
MediaEventType type;
error = event_generator->GetEvent(0, &event);
error = event->GetType(&type);
uint32_t stream_id = get_stream_id(encoder); // Likely just going to be 0
uint32_t frame = 1;
uint64_t sample_duration = 0;
ComPtr<IMFSample> sample = nullptr;
IMFMediaBuffer *mbuffer = nullptr;
DWORD length = 0;
uint32_t img_size = 0;
MFCalculateImageSize(desktop_info.input_sub_type, desktop_info.width, desktop_info.height, &img_size);
switch (type) {
case METransformNeedInput:
ThrowIfFailed(MFCreateDXGISurfaceBuffer(__uuidof(ID3D11Texture2D), texture.Get(), 0, false, &mbuffer),
mbuffer, "Failed to generate a media buffer");
ThrowIfFailed(MFCreateSample(&sample), sample.Get(), "Couldn't create sample buffer");
ThrowIfFailed(sample->AddBuffer(mbuffer), sample.Get(), "Couldn't add buffer");
// Test (delete this) - fake buffer
/*byte *buffer_data;
MFCreateMemoryBuffer(img_size, &mbuffer);
mbuffer->Lock(&buffer_data, NULL, NULL);
mbuffer->GetCurrentLength(&length);
memset(buffer_data, 0, img_size);
mbuffer->Unlock();
mbuffer->SetCurrentLength(img_size);
sample->AddBuffer(mbuffer);*/
MFFrameRateToAverageTimePerFrame(desktop_info.fps, 1, &sample_duration);
sample->SetSampleDuration(sample_duration);
// ERROR
ThrowIfFailed(encoder->ProcessInput(stream_id, sample.Get(), 0), sample.Get(), "ProcessInput failed.");
I set up my media types like this:
void setup_media_types(IMFTransform *encoder) {
IMFMediaType *output_type = nullptr;
IMFMediaType *input_type = nullptr;
ThrowIfFailed(MFCreateMediaType(&output_type), output_type, "Failed to create output type");
ThrowIfFailed(MFCreateMediaType(&input_type), input_type, "Failed to create input type");
/*
List of all MF types:
https://learn.microsoft.com/en-us/windows/desktop/medfound/alphabetical-list-of-media-foundation-attributes
*/
_com_error error = NULL;
int stream_id = get_stream_id(encoder);
error = output_type->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
error = output_type->SetGUID(MF_MT_SUBTYPE, desktop_info.output_sub_type);
error = output_type->SetUINT32(MF_MT_AVG_BITRATE, desktop_info.bitrate);
error = MFSetAttributeSize(output_type, MF_MT_FRAME_SIZE, desktop_info.width, desktop_info.height);
error = MFSetAttributeRatio(output_type, MF_MT_FRAME_RATE, desktop_info.fps, 1);
error = output_type->SetUINT32(MF_MT_INTERLACE_MODE, MFVideoInterlace_Progressive); // motion will be smoother, fewer artifacts
error = output_type->SetUINT32(MF_MT_MPEG2_PROFILE, eAVEncH264VProfile_High);
error = output_type->SetUINT32(MF_MT_MPEG2_LEVEL, eAVEncH264VLevel3_1);
error = output_type->SetUINT32(CODECAPI_AVEncCommonRateControlMode, eAVEncCommonRateControlMode_CBR); // probably will change this
ThrowIfFailed(encoder->SetOutputType(stream_id, output_type, 0), output_type, "Couldn't set output type");
error = input_type->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
error = input_type->SetGUID(MF_MT_SUBTYPE, desktop_info.input_sub_type);
error = input_type->SetUINT32(MF_MT_INTERLACE_MODE, MFVideoInterlace_Progressive);
error = MFSetAttributeSize(input_type, MF_MT_FRAME_SIZE, desktop_info.width, desktop_info.height);
error = MFSetAttributeRatio(input_type, MF_MT_FRAME_RATE, desktop_info.fps, 1);
error = MFSetAttributeRatio(input_type, MF_MT_PIXEL_ASPECT_RATIO, 1, 1);
ThrowIfFailed(encoder->SetInputType(stream_id, input_type, 0), input_type, "Couldn't set input type");
}
My desktop_info struct is:
struct desktop_info {
int fps = 30;
int width = 2560;
int height = 1440;
uint32_t bitrate = 10 * 1000000; // 10Mb
GUID input_sub_type = MFVideoFormat_ARGB32;
GUID output_sub_type = MFVideoFormat_H264;
} desktop_info;
Output of my program prior to reaching ProcessInput:
Hello World!
Number of devices: 3
Device #0
Adapter: Intel(R) HD Graphics 630
Got some information about the device:
\\.\DISPLAY2
Attached to desktop : 1
Got some information about the device:
\\.\DISPLAY1
Attached to desktop : 1
Did not find another adapter. Index higher than the # of outputs.
Successfully duplicated output from IDXGIOutput1
Accumulated frames: 0
Created a 2D texture...
Number of encoders/processors available: 1
Encoder name: Intel® Quick Sync Video H.264 Encoder MFT
Making it Direct3D aware...
Setting up input/output media types...
If you're curious what my Locals were right before ProcessInput: http://prntscr.com/mx1i9t
This may be an "unpopular" answer since it doesn't provide a solution for MFT specifically, but after 8 months of working heavily on this stuff I would highly recommend not using MFT and implementing the encoders directly.
My solution was implementing a hardware encoder like NVENC/QSV, and you can fall back on a software encoder like x264 if the client doesn't have hardware acceleration available.
The reason for this is that MFT is far more opaque and not well documented/supported by Microsoft. I think you'll also find you want more control over the encoders' settings and parameter tuning, and each encoder implementation is subtly different.
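To make that concrete, here is a minimal sketch of opening the x264 fallback (the function name and parameter choices are illustrative, not from my production code; the ARGB-to-I420 conversion and the x264_encoder_encode loop are omitted):
#include <x264.h>

/* Open a software H.264 encoder roughly comparable to the MFT setup above. */
x264_t *open_x264_fallback(int width, int height, int fps, int bitrate_kbps)
{
    x264_param_t param;
    if (x264_param_default_preset(&param, "veryfast", "zerolatency") < 0)
        return NULL;

    param.i_width   = width;
    param.i_height  = height;
    param.i_fps_num = fps;
    param.i_fps_den = 1;
    param.i_csp     = X264_CSP_I420;      /* callers convert ARGB to I420 first */
    param.rc.i_rc_method = X264_RC_ABR;   /* average bitrate, closest to the CBR mode above */
    param.rc.i_bitrate   = bitrate_kbps;  /* kbit/s */

    if (x264_param_apply_profile(&param, "high") < 0)
        return NULL;

    return x264_encoder_open(&param);
}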
We have seen this error coming from the Intel graphics driver. (The H.264 encoder MFT uses the Intel GPU to encode the video into H.264 format.)
In our case, I think the bug was triggered by configuring the encoder to a very high bit rate and then reconfiguring it to a low bit rate. In your sample code it does not look like you are changing the bit rate, so I am not sure if it is the same bug.
Intel just released a new driver about two weeks ago, that is supposed to have the fix for the bug that we were seeing. So, you may want to give that new driver a try -- hopefully it will fix the problem that you are having.
The new driver is version 25.20.100.6519. You can get it from the Intel web site: https://downloadcenter.intel.com/download/28566/Intel-Graphics-Driver-for-Windows-10
If the new driver does not fix the problem, you could try running your program on a different PC that uses a NVidia or AMD graphics card, to see if the problem only happens on PCs that have Intel graphics.
I have an audio-related app that uses a multichannel mixer to play several m4a files at a time.
I'm using the AudioToolbox framework to stream audio, but on iOS 9 the framework throws an exception in the mixer render callback where I am streaming the audio files.
Interestingly, apps compiled with the iOS 9 SDK continue to stream the same files perfectly on iOS 7/8 devices, but not on iOS 9.
Now I can't figure out whether Apple broke something in iOS 9 or whether we have the files encoded wrong on our end, given that they play just fine on iOS 7/8 but not on 9.
Exception:
malloc: *** error for object 0x7fac74056e08: incorrect checksum for freed object - object was probably modified after being freed.
*** set a breakpoint in malloc_error_break to debug
It works for all other formats without throwing any exception or memory error, but it does not work for the m4a format, which is very surprising.
Here is the code that loads the files; it works for wav, aif, etc., but not for m4a:
- (void)loadFiles{
AVAudioFormat *clientFormat = [[AVAudioFormat alloc] initWithCommonFormat:AVAudioPCMFormatFloat32
sampleRate:kGraphSampleRate
channels:1
interleaved:NO];
for (int i = 0; i < numFiles && i < maxBufs; i++) {
ExtAudioFileRef xafref = 0;
// open one of the two source files
OSStatus result = ExtAudioFileOpenURL(sourceURL[i], &xafref);
if (result || !xafref) {break; }
// get the file data format, this represents the file's actual data format
AudioStreamBasicDescription fileFormat;
UInt32 propSize = sizeof(fileFormat);
result = ExtAudioFileGetProperty(xafref, kExtAudioFileProperty_FileDataFormat, &propSize, &fileFormat);
if (result) { break; }
// set the client format - this is the format we want back from ExtAudioFile and corresponds to the format
// we will be providing to the input callback of the mixer, therefore the data type must be the same
double rateRatio = kGraphSampleRate / fileFormat.mSampleRate;
propSize = sizeof(AudioStreamBasicDescription);
result = ExtAudioFileSetProperty(xafref, kExtAudioFileProperty_ClientDataFormat, propSize, clientFormat.streamDescription);
if (result) { break; }
// get the file's length in sample frames
UInt64 numFrames = 0;
propSize = sizeof(numFrames);
result = ExtAudioFileGetProperty(xafref, kExtAudioFileProperty_FileLengthFrames, &propSize, &numFrames);
if (result) { break; }
if(i==metronomeBusIndex)
numFrames = (numFrames+6484)*4;
//numFrames = (numFrames * rateRatio); // account for any sample rate conversion
numFrames *= rateRatio;
// set up our buffer
mSoundBuffer[i].numFrames = (UInt32)numFrames;
mSoundBuffer[i].asbd = *(clientFormat.streamDescription);
UInt32 samples = (UInt32)numFrames * mSoundBuffer[i].asbd.mChannelsPerFrame;
mSoundBuffer[i].data = (Float32 *)calloc(samples, sizeof(Float32));
mSoundBuffer[i].sampleNum = 0;
// set up a AudioBufferList to read data into
AudioBufferList bufList;
bufList.mNumberBuffers = 1;
bufList.mBuffers[0].mNumberChannels = 1;
bufList.mBuffers[0].mData = mSoundBuffer[i].data;
bufList.mBuffers[0].mDataByteSize = samples * sizeof(Float32);
// perform a synchronous sequential read of the audio data out of the file into our allocated data buffer
UInt32 numPackets = (UInt32)numFrames;
result = ExtAudioFileRead(xafref, &numPackets, &bufList);
if (result) {
free(mSoundBuffer[i].data);
mSoundBuffer[i].data = 0;
}
// close the file and dispose the ExtAudioFileRef
ExtAudioFileDispose(xafref);
}
// [clientFormat release];
}
If anyone could point me in the right direction, how do I go about debugging this issue?
Do we need to re-encode our files in some specific way?
I tried it on the iOS 9.1 beta 3 yesterday and things seem to be back to normal.
Try it out, and let us know if it works for you too.
I have a strange problem in my C/C++ FFmpeg transcoder, which takes an input MP4 (with varying input codecs) and produces an output MP4 (x264 baseline & AAC-LC at a 44100 sample rate via libfdk_aac):
The resulting MP4 has fine images (x264) and the audio (AAC-LC) works fine as well, but it is only played until exactly half of the video.
The audio is not slowed down, not stretched and doesn't stutter. It just stops right in the middle of the video.
One hint may be that the input file has a sample rate of 22050, and 22050/44100 is 0.5, but I really don't get why this would make the sound just stop after half the time. I'd expect such an error to lead to the sound playing at the wrong speed. Everything works just fine if I don't try to enforce 44100 and instead just use the incoming sample_rate.
Another guess would be that the pts calculation doesn't work. But the audio sounds just fine (until it stops), and I do exactly the same for the video part, where it works flawlessly. "Exactly", as in the same code, but with the "audio" variables replaced by "video" variables.
FFmpeg reports no errors during the whole process. I also flush the decoders/encoders/interleaved writing after all the packet reading from the input is done. It works well for the video, so I doubt there is much wrong with my general approach.
Here are the functions of my code (stripped of the error handling and other class stuff):
AudioCodecContext Setup
outContext->_audioCodec = avcodec_find_encoder(outContext->_audioTargetCodecID);
outContext->_audioStream =
avformat_new_stream(outContext->_formatContext, outContext->_audioCodec);
outContext->_audioCodecContext = outContext->_audioStream->codec;
outContext->_audioCodecContext->channels = 2;
outContext->_audioCodecContext->channel_layout = av_get_default_channel_layout(2);
outContext->_audioCodecContext->sample_rate = 44100;
outContext->_audioCodecContext->sample_fmt = outContext->_audioCodec->sample_fmts[0];
outContext->_audioCodecContext->bit_rate = 128000;
outContext->_audioCodecContext->strict_std_compliance = FF_COMPLIANCE_EXPERIMENTAL;
outContext->_audioCodecContext->time_base =
(AVRational){1, outContext->_audioCodecContext->sample_rate};
outContext->_audioStream->time_base = (AVRational){1, outContext->_audioCodecContext->sample_rate};
int retVal = avcodec_open2(outContext->_audioCodecContext, outContext->_audioCodec, NULL);
Resampler Setup
outContext->_audioResamplerContext =
swr_alloc_set_opts( NULL, outContext->_audioCodecContext->channel_layout,
outContext->_audioCodecContext->sample_fmt,
outContext->_audioCodecContext->sample_rate,
_inputContext._audioCodecContext->channel_layout,
_inputContext._audioCodecContext->sample_fmt,
_inputContext._audioCodecContext->sample_rate,
0, NULL);
int retVal = swr_init(outContext->_audioResamplerContext);
Decoding
decodedBytes = avcodec_decode_audio4( _inputContext._audioCodecContext,
_inputContext._audioTempFrame,
&p_gotAudioFrame, &_inputContext._currentPacket);
Converting (only if decoding produced a frame, of course)
int retVal = swr_convert( outContext->_audioResamplerContext,
outContext->_audioConvertedFrame->data,
outContext->_audioConvertedFrame->nb_samples,
(const uint8_t**)_inputContext._audioTempFrame->data,
_inputContext._audioTempFrame->nb_samples);
Encoding (only if decoding produced a frame, of course)
outContext->_audioConvertedFrame->pts =
av_frame_get_best_effort_timestamp(_inputContext._audioTempFrame);
// Init the new packet
av_init_packet(&outContext->_audioPacket);
outContext->_audioPacket.data = NULL;
outContext->_audioPacket.size = 0;
// Encode
int retVal = avcodec_encode_audio2( outContext->_audioCodecContext,
&outContext->_audioPacket,
outContext->_audioConvertedFrame,
&p_gotPacket);
// Set pts/dts time stamps for writing interleaved
av_packet_rescale_ts( &outContext->_audioPacket,
outContext->_audioCodecContext->time_base,
outContext->_audioStream->time_base);
outContext->_audioPacket.stream_index = outContext->_audioStream->index;
Writing (only if encoding produced a packet, of course)
int retVal = av_interleaved_write_frame(outContext->_formatContext, &outContext->_audioPacket);
I am quite out of ideas about what would cause such behaviour.
So, I finally managed to figure things out myself.
The problem was indeed the difference in sample_rate.
You'd assume that a call to swr_convert() gives you all the samples you need for converting the audio frame when called like I did.
Of course, that would be too easy.
Instead, you need to call swr_convert() (potentially) multiple times per frame and buffer its output, if required. Then you grab a single frame from the buffer, and that is what you have to encode.
Here is my new convertAudioFrame function:
// Calculate number of output samples
int numOutputSamples = av_rescale_rnd(
swr_get_delay(outContext->_audioResamplerContext, _inputContext._audioCodecContext->sample_rate)
+ _inputContext._audioTempFrame->nb_samples,
outContext->_audioCodecContext->sample_rate,
_inputContext._audioCodecContext->sample_rate,
AV_ROUND_UP);
if (numOutputSamples == 0)
{
return;
}
uint8_t* tempSamples;
av_samples_alloc( &tempSamples, NULL,
outContext->_audioCodecContext->channels, numOutputSamples,
outContext->_audioCodecContext->sample_fmt, 0);
int retVal = swr_convert( outContext->_audioResamplerContext,
&tempSamples,
numOutputSamples,
(const uint8_t**)_inputContext._audioTempFrame->data,
_inputContext._audioTempFrame->nb_samples);
// Write to audio fifo
if (retVal > 0)
{
retVal = av_audio_fifo_write(outContext->_audioFifo, (void**)&tempSamples, retVal);
}
av_freep(&tempSamples);
// Get a frame from audio fifo
int samplesAvailable = av_audio_fifo_size(outContext->_audioFifo);
if (samplesAvailable > 0)
{
retVal = av_audio_fifo_read(outContext->_audioFifo,
(void**)outContext->_audioConvertedFrame->data,
outContext->_audioCodecContext->frame_size);
// We got a frame, so also set its pts
if (retVal > 0)
{
p_gotConvertedFrame = 1;
if (_inputContext._audioTempFrame->pts != AV_NOPTS_VALUE)
{
outContext->_audioConvertedFrame->pts = _inputContext._audioTempFrame->pts;
}
else if (_inputContext._audioTempFrame->pkt_pts != AV_NOPTS_VALUE)
{
outContext->_audioConvertedFrame->pts = _inputContext._audioTempFrame->pkt_pts;
}
}
}
I basically call this function until there are no more frames in the audio FIFO buffer.
So the audio was only half as long because I was encoding only as many frames as I decoded, when I actually needed to encode twice as many frames due to the doubled sample_rate.
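To make the fix explicit, here is an outline of the calling pattern (the helper names are hypothetical; the FIFO test uses the same fields as the code above):
while (readNextAudioPacketFromInput())
{
    decodeAudioPacket();        // avcodec_decode_audio4, as above
    convertAndBufferSamples();  // swr_convert + av_audio_fifo_write, as above

    // With 22050 Hz in and 44100 Hz out, each decoded frame leaves roughly
    // two encoder frames' worth of samples in the FIFO, so drain it fully:
    while (av_audio_fifo_size(outContext->_audioFifo)
               >= outContext->_audioCodecContext->frame_size)
    {
        av_audio_fifo_read(outContext->_audioFifo,
                           (void**)outContext->_audioConvertedFrame->data,
                           outContext->_audioCodecContext->frame_size);
        encodeAndWriteAudioFrame();  // avcodec_encode_audio2 + interleaved write
    }
}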
I want to programmatically convert an MP4 video file (with H.264 codec) to single RGB images. From the command line this looks like:
ffmpeg -i test1080.mp4 -r 30 image-%3d.jpg
Using this command produces a nice set of pictures. But when I try to do the same programmatically, some images (probably B- and P-frames) look odd, e.g. they have distorted areas containing difference information. The reading and conversion code is as follows:
AVFrame *frame = avcodec_alloc_frame();
AVFrame *frameRGB = avcodec_alloc_frame();
AVPacket packet;
int buffer_size=avpicture_get_size(PIX_FMT_RGB24, m_codecCtx->width,
m_codecCtx->height);
uint8_t *buffer = new uint8_t[buffer_size];
avpicture_fill((AVPicture *)frameRGB, buffer, PIX_FMT_RGB24,
m_codecCtx->width, m_codecCtx->height);
while (true)
{
// Read one packet into `packet`
if (av_read_frame(m_formatCtx, &packet) < 0) {
break; // End of stream. Done decoding.
}
if (avcodec_decode_video(m_codecCtx, frame, &buffer_size, packet.data, packet.size) < 1) {
break; // Error in decoding
}
if (!buffer_size) {
break;
}
// Convert
img_convert((AVPicture *)frameRGB, PIX_FMT_RGB24, (AVPicture*)frame,
m_codecCtx->pix_fmt, m_codecCtx->width, m_codecCtx->height);
// RGB data is now available in frameRGB for further processing
}
How can I decode the video stream so that each final image shows all image data, i.e. so that the information from B- and P-frames is incorporated into every output frame?
[EDIT:] A sample image showing the artifacts is here: http://imageshack.us/photo/my-images/201/sampleq.jpg/
If the third argument of avcodec_decode_video returns a null value, it does not mean an error. It means that the frame is not ready yet. You need to continue reading packets until the value becomes nonzero.
if (!buffer_size) {
continue;
}
UPD
Try adding a check so that only the key frames are converted and displayed; this will help isolate the problem.
while (true)
{
// Read one packet into `packet`
if (av_read_frame(m_formatCtx, &packet) < 0) {
break; // End of stream. Done decoding.
}
if (avcodec_decode_video(m_codecCtx, frame, &buffer_size,
packet.data, packet.size) < 1)
{
break; // Error in decoding
}
if (!buffer_size) {
continue; // <-- It's important!
}
// check for key frame
if (packet.flags & AV_PKT_FLAG_KEY)
{
// Convert
img_convert((AVPicture *)frameRGB, PIX_FMT_RGB24, (AVPicture*)frame,
m_codecCtx->pix_fmt, m_codecCtx->width, m_codecCtx->height);
}
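One extra point that is not part of the answer above: codecs with B-frames hold pictures back, so at end-of-stream you should drain the decoder or the last few frames never come out. A minimal sketch using the same legacy API as the snippets above (placed after the read loop ends):
/* Flush delayed frames: feed empty input until no more pictures appear. */
while (true)
{
    if (avcodec_decode_video(m_codecCtx, frame, &buffer_size, NULL, 0) < 0)
        break;                    /* decoder error */
    if (!buffer_size)
        break;                    /* decoder fully drained */
    img_convert((AVPicture *)frameRGB, PIX_FMT_RGB24, (AVPicture*)frame,
                m_codecCtx->pix_fmt, m_codecCtx->width, m_codecCtx->height);
    /* consume frameRGB as in the main loop */
}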
}