VideoToolbox hardware encoded I frame not clear on Intel Mac - macos

I capture video from the camera on an Intel Mac and use VideoToolbox to hardware-encode the raw pixel buffers into H.264 slices. The I-frames that VideoToolbox produces are not sharp, so the picture looks blurry every few seconds. These are the properties I set:
self.bitrate = 1000000;
self.frameRate = 20;
int interval_second = 2;
NSDictionary *compressionProperties = @{
    (id)kVTCompressionPropertyKey_ProfileLevel: (id)kVTProfileLevel_H264_High_AutoLevel,
    (id)kVTCompressionPropertyKey_RealTime: @YES,
    (id)kVTCompressionPropertyKey_AllowFrameReordering: @NO,
    (id)kVTCompressionPropertyKey_H264EntropyMode: (id)kVTH264EntropyMode_CABAC,
    (id)kVTCompressionPropertyKey_PixelTransferProperties: @{
        (id)kVTPixelTransferPropertyKey_ScalingMode: (id)kVTScalingMode_Trim,
    },
    (id)kVTCompressionPropertyKey_AverageBitRate: @(self.bitrate),
    (id)kVTCompressionPropertyKey_ExpectedFrameRate: @(self.frameRate),
    (id)kVTCompressionPropertyKey_MaxKeyFrameInterval: @(self.frameRate * interval_second),
    (id)kVTCompressionPropertyKey_MaxKeyFrameIntervalDuration: @(interval_second),
    (id)kVTCompressionPropertyKey_DataRateLimits: @[@(self.bitrate / 8), @1.0],
};
result = VTSessionSetProperties(self.compressionSession, (CFDictionaryRef)compressionProperties);
if (result != noErr) {
    NSLog(@"VTSessionSetProperties failed: %d", (int)result);
    return;
} else {
    NSLog(@"VTSessionSetProperties succeeded");
}

These are very strange compression settings. Do you really need short GOP and very strict data rate limits?
I very much suspect you just copied some code off the internet without having any idea what it does. If it's the case, just set interval_second = 300 and remove kVTCompressionPropertyKey_DataRateLimits completely
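To make the point about the short GOP and the strict limit concrete, here is a rough sketch of the arithmetic using the values from the question (1,000,000 bit/s, 20 fps, a limit of bitrate / 8 bytes per 1.0 second). The helper names are hypothetical, and the reading that a zero-headroom cap starves the keyframes is an assumption about encoder behaviour, not something VideoToolbox documents in these terms.

```c
#include <assert.h>

/* Average number of bytes available per frame at the target bitrate. */
long average_bytes_per_frame(long bits_per_second, long frames_per_second)
{
    assert(frames_per_second > 0);
    return bits_per_second / 8 / frames_per_second;
}

/* Bytes the hard cap allows in one window beyond what the average
 * bitrate already consumes in that window. Zero means no burst room. */
long headroom_bytes(long bits_per_second, long limit_bytes, double window_seconds)
{
    return limit_bytes - (long)(bits_per_second / 8 * window_seconds);
}
```

With the question's settings, average_bytes_per_frame(1000000, 20) is 6250 and headroom_bytes(1000000, 125000, 1.0) is 0. An I-frame normally needs several times the average frame budget, so a hard cap equal to the average rate plausibly leaves the encoder no choice but to quantize keyframes heavily.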

Related

H264 Decoding with Apple Video Toolkit

I am trying to get an H264 streaming app working on various platforms using a combination of Apple Video Toolbox and OpenH264. There is one use case that doesn't work and I can't find any solution: the source uses Video Toolbox on a 2011 iMac running macOS High Sierra, and the receiver is a MacBook Pro running Big Sur.
On the receiver the decoded image is about 3/4 green. If I scale the image down to about 1/8 of the original before encoding, it works fine. If I capture the frames on the MacBook and then run exactly the same decoding software in a test program on the iMac, it decodes fine. Doing the same on the MacBook (same image, same test program) gives 3/4 green again. I have a similar problem when receiving from an OpenH264 encoder on a slower Windows machine. I suspect that this has something to do with temporal processing, but I really don't understand H264 well enough to work it out. One thing that I did notice is that the decode call returns with no error code but a NULL pixel buffer about 70% of the time.
The "guts" of the decoding part looks like this (modified from a demo on GitHub)
void didDecompress(void *decompressionOutputRefCon, void *sourceFrameRefCon, OSStatus status, VTDecodeInfoFlags infoFlags, CVImageBufferRef pixelBuffer, CMTime presentationTimeStamp, CMTime presentationDuration )
{
CVPixelBufferRef *outputPixelBuffer = (CVPixelBufferRef *)sourceFrameRefCon;
*outputPixelBuffer = CVPixelBufferRetain(pixelBuffer);
}
void initVideoDecodeToolBox ()
{
if (!decodeSession)
{
const uint8_t* parameterSetPointers[2] = { mSPS, mPPS };
const size_t parameterSetSizes[2] = { mSPSSize, mPPSSize };
OSStatus status = CMVideoFormatDescriptionCreateFromH264ParameterSets(kCFAllocatorDefault,2, //param count
parameterSetPointers,
parameterSetSizes,
4, //nal start code size
&formatDescription);
if(status == noErr)
{
CFDictionaryRef attrs = NULL;
const void *keys[] = { kCVPixelBufferPixelFormatTypeKey, kVTDecompressionPropertyKey_RealTime };
uint32_t v = kCVPixelFormatType_32BGRA;
const void *values[] = { CFNumberCreate(NULL, kCFNumberSInt32Type, &v), kCFBooleanTrue };
attrs = CFDictionaryCreate(NULL, keys, values, 2, NULL, NULL);
VTDecompressionOutputCallbackRecord callBackRecord;
callBackRecord.decompressionOutputCallback = didDecompress;
callBackRecord.decompressionOutputRefCon = NULL;
status = VTDecompressionSessionCreate(kCFAllocatorDefault, formatDescription, NULL, attrs, &callBackRecord, &decodeSession);
CFRelease(attrs);
}
else
{
NSLog(@"IOS8VT: reset decoder session failed status=%d", status);
}
}
}
CVPixelBufferRef decode ( const char *NALBuffer, size_t NALSize )
{
CVPixelBufferRef outputPixelBuffer = NULL;
if (decodeSession && formatDescription )
{
// The NAL buffer has been stripped of the NAL length data, so this has to be put back in
MemoryBlock buf ( NALSize + 4);
memcpy ( (char*)buf.getData()+4, NALBuffer, NALSize );
*((uint32*)buf.getData()) = CFSwapInt32HostToBig ((uint32)NALSize);
CMBlockBufferRef blockBuffer = NULL;
OSStatus status = CMBlockBufferCreateWithMemoryBlock(kCFAllocatorDefault, buf.getData(), NALSize+4,kCFAllocatorNull,NULL, 0, NALSize+4, 0, &blockBuffer);
if(status == kCMBlockBufferNoErr)
{
CMSampleBufferRef sampleBuffer = NULL;
const size_t sampleSizeArray[] = {NALSize + 4};
status = CMSampleBufferCreateReady(kCFAllocatorDefault,blockBuffer,formatDescription,1, 0, NULL, 1, sampleSizeArray,&sampleBuffer);
if (status == kCMBlockBufferNoErr && sampleBuffer)
{
VTDecodeFrameFlags flags = 0;VTDecodeInfoFlags flagOut = 0;
// The default is synchronous operation.
// Call didDecompress and call back after returning.
OSStatus decodeStatus = VTDecompressionSessionDecodeFrame ( decodeSession, sampleBuffer, flags, &outputPixelBuffer, &flagOut );
if(decodeStatus != noErr)
{
DBG ( "decode failed status=" + String ( decodeStatus) );
}
CFRelease(sampleBuffer);
}
CFRelease(blockBuffer);
}
}
return outputPixelBuffer;
}
Note: the NAL blocks don't have a 00 00 00 01 separator because they are streamed in blocks with explicit length field.
Decoding works fine on all platforms, and the encoded stream decodes fine with OpenH264.
Well, I finally found the answer, so I'm going to leave it here for posterity. It turns out that the Video Toolbox decode function expects all the NAL blocks that belong to the same frame to be copied into a single sample buffer. The older Mac provides the app with single keyframes that are split into separate NAL blocks, which the app then sends individually across the network. Unfortunately this means that only the first NAL block is processed (in my case less than a quarter of the picture) and the rest are discarded. What you need to do is work out which NALs are part of the same frame and bundle them together. Unfortunately this requires you to partially parse the PPS and the frames themselves, which is not trivial. Many thanks to the post here at the Apple site which put me on the right track.
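As a rough illustration of the bundling step, the sketch below packs several NAL units into one length-prefixed buffer that could then back a single CMBlockBuffer and sample buffer. The NalUnit type and both function names are hypothetical. The frame-boundary test is a simplified heuristic that only checks whether first_mb_in_slice is 0 in a slice NAL; it assumes slices arrive in order and ignores SPS/PPS/SEI grouping, so it is not a substitute for the fuller parsing the answer describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper type: one NAL unit, no start code, no length prefix. */
typedef struct {
    const uint8_t *bytes;
    size_t size;
} NalUnit;

/* Returns 1 if this NAL looks like the first slice of a new picture.
 * Slice NALs (types 1 and 5) start their header with first_mb_in_slice
 * coded as ue(v); the value 0 is the single bit '1'. */
int nal_starts_new_frame(const uint8_t *nal, size_t size)
{
    if (size < 2)
        return 0;
    int type = nal[0] & 0x1F;
    if (type != 1 && type != 5)
        return 0;
    return (nal[1] & 0x80) != 0;
}

/* Concatenates the NALs of one frame, each preceded by a 4-byte
 * big-endian length. Returns a malloc'd buffer the caller must free,
 * or NULL on allocation failure. */
uint8_t *pack_access_unit(const NalUnit *nals, size_t count, size_t *out_size)
{
    assert(nals != NULL && out_size != NULL);
    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += 4 + nals[i].size;

    uint8_t *buf = (uint8_t *)malloc(total);
    if (buf == NULL)
        return NULL;

    size_t off = 0;
    for (size_t i = 0; i < count; i++) {
        uint32_t n = (uint32_t)nals[i].size;
        buf[off + 0] = (uint8_t)(n >> 24);
        buf[off + 1] = (uint8_t)(n >> 16);
        buf[off + 2] = (uint8_t)(n >> 8);
        buf[off + 3] = (uint8_t)n;
        memcpy(buf + off + 4, nals[i].bytes, nals[i].size);
        off += 4 + nals[i].size;
    }
    *out_size = total;
    return buf;
}
```

The receiver would collect NALs until nal_starts_new_frame reports the next picture, pack the collected set, and pass the whole buffer (with its total size as the single sample size) to the decode call.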

m4a audio files not playing on iOS 9

I have an audio app that uses a multichannel mixer to play several m4a files at the same time.
I'm using the AudioToolbox framework to stream audio, but on iOS 9 the framework throws an exception in the mixer rendering callback where I am streaming the audio files.
Interestingly, apps compiled with the iOS 9 SDK continue to stream the same files perfectly on iOS 7/8 devices, but not on iOS 9.
I can't figure out whether Apple broke something in iOS 9 or whether we have encoded the files wrongly on our end; they play just fine on iOS 7/8 but not on 9.
Exception:
malloc: *** error for object 0x7fac74056e08: incorrect checksum for freed object - object was probably modified after being freed.
*** set a breakpoint in malloc_error_break to debug
It works for all other formats without any exception or memory error, but it does not work for the m4a format, which is very surprising.
Here is a code to load files which works for wav,aif etc formats but not for m4a:
- (void)loadFiles{
AVAudioFormat *clientFormat = [[AVAudioFormat alloc] initWithCommonFormat:AVAudioPCMFormatFloat32
sampleRate:kGraphSampleRate
channels:1
interleaved:NO];
for (int i = 0; i < numFiles && i < maxBufs; i++) {
ExtAudioFileRef xafref = 0;
// open one of the two source files
OSStatus result = ExtAudioFileOpenURL(sourceURL[i], &xafref);
if (result || !xafref) {break; }
// get the file data format, this represents the file's actual data format
AudioStreamBasicDescription fileFormat;
UInt32 propSize = sizeof(fileFormat);
result = ExtAudioFileGetProperty(xafref, kExtAudioFileProperty_FileDataFormat, &propSize, &fileFormat);
if (result) { break; }
// set the client format - this is the format we want back from ExtAudioFile and corresponds to the format
// we will be providing to the input callback of the mixer, therefore the data type must be the same
double rateRatio = kGraphSampleRate / fileFormat.mSampleRate;
propSize = sizeof(AudioStreamBasicDescription);
result = ExtAudioFileSetProperty(xafref, kExtAudioFileProperty_ClientDataFormat, propSize, clientFormat.streamDescription);
if (result) { break; }
// get the file's length in sample frames
UInt64 numFrames = 0;
propSize = sizeof(numFrames);
result = ExtAudioFileGetProperty(xafref, kExtAudioFileProperty_FileLengthFrames, &propSize, &numFrames);
if (result) { break; }
if(i==metronomeBusIndex)
numFrames = (numFrames+6484)*4;
//numFrames = (numFrames * rateRatio); // account for any sample rate conversion
numFrames *= rateRatio;
// set up our buffer
mSoundBuffer[i].numFrames = (UInt32)numFrames;
mSoundBuffer[i].asbd = *(clientFormat.streamDescription);
UInt32 samples = (UInt32)numFrames * mSoundBuffer[i].asbd.mChannelsPerFrame;
mSoundBuffer[i].data = (Float32 *)calloc(samples, sizeof(Float32));
mSoundBuffer[i].sampleNum = 0;
// set up a AudioBufferList to read data into
AudioBufferList bufList;
bufList.mNumberBuffers = 1;
bufList.mBuffers[0].mNumberChannels = 1;
bufList.mBuffers[0].mData = mSoundBuffer[i].data;
bufList.mBuffers[0].mDataByteSize = samples * sizeof(Float32);
// perform a synchronous sequential read of the audio data out of the file into our allocated data buffer
UInt32 numPackets = (UInt32)numFrames;
result = ExtAudioFileRead(xafref, &numPackets, &bufList);
if (result) {
free(mSoundBuffer[i].data);
mSoundBuffer[i].data = 0;
}
// close the file and dispose the ExtAudioFileRef
ExtAudioFileDispose(xafref);
}
// [clientFormat release];
}
If anyone could point me in the right direction, how do i go about debugging the issue?
Do we need to re-encode our files in some specific way?
I tried it on iOS 9.1.beta3 yesterday and things seem to be back to normal.
Try it out. Let us know if it works out for you too.

FFmpeg transcoded sound (AAC) stops after half video time

I have a strange problem in my C/C++ FFmpeg transcoder, which takes an input MP4 (varying input codecs) and produces an output MP4 (x264, baseline & AAC LC @44100 sample rate with libfdk_aac):
The resulting MP4 video has fine images (x264) and the audio (AAC LC) works fine as well, but it only plays until exactly half of the video.
The audio is not slowed down, not stretched and doesn't stutter. It just stops right in the middle of the video.
One hint may be that the input file has a sample rate of 22050 and 22050/44100 is 0.5, but I really don't get why this would make the sound just stop after half the time. I'd expect such an error leading to sound being at the wrong speed. Everything works just fine if I don't try to enforce 44100 and instead just use the incoming sample_rate.
Another guess would be that the pts calculation doesn't work. But the audio sounds just fine (until it stops) and I do exactly the same for the video part, where it works flawlessly. "Exactly", as in the same code, but "audio"-variables replaced with "video"-variables.
FFmpeg reports no errors during the whole process. I also flush the decoders, the encoders and the interleaved writer after all packets have been read from the input. It works well for the video, so I doubt there is much wrong with my general approach.
Here are the functions of my code (stripped off the error handling & other class stuff):
AudioCodecContext Setup
outContext->_audioCodec = avcodec_find_encoder(outContext->_audioTargetCodecID);
outContext->_audioStream =
avformat_new_stream(outContext->_formatContext, outContext->_audioCodec);
outContext->_audioCodecContext = outContext->_audioStream->codec;
outContext->_audioCodecContext->channels = 2;
outContext->_audioCodecContext->channel_layout = av_get_default_channel_layout(2);
outContext->_audioCodecContext->sample_rate = 44100;
outContext->_audioCodecContext->sample_fmt = outContext->_audioCodec->sample_fmts[0];
outContext->_audioCodecContext->bit_rate = 128000;
outContext->_audioCodecContext->strict_std_compliance = FF_COMPLIANCE_EXPERIMENTAL;
outContext->_audioCodecContext->time_base =
(AVRational){1, outContext->_audioCodecContext->sample_rate};
outContext->_audioStream->time_base = (AVRational){1, outContext->_audioCodecContext->sample_rate};
int retVal = avcodec_open2(outContext->_audioCodecContext, outContext->_audioCodec, NULL);
Resampler Setup
outContext->_audioResamplerContext =
swr_alloc_set_opts( NULL, outContext->_audioCodecContext->channel_layout,
outContext->_audioCodecContext->sample_fmt,
outContext->_audioCodecContext->sample_rate,
_inputContext._audioCodecContext->channel_layout,
_inputContext._audioCodecContext->sample_fmt,
_inputContext._audioCodecContext->sample_rate,
0, NULL);
int retVal = swr_init(outContext->_audioResamplerContext);
Decoding
decodedBytes = avcodec_decode_audio4( _inputContext._audioCodecContext,
_inputContext._audioTempFrame,
&p_gotAudioFrame, &_inputContext._currentPacket);
Converting (only if decoding produced a frame, of course)
int retVal = swr_convert( outContext->_audioResamplerContext,
outContext->_audioConvertedFrame->data,
outContext->_audioConvertedFrame->nb_samples,
(const uint8_t**)_inputContext._audioTempFrame->data,
_inputContext._audioTempFrame->nb_samples);
Encoding (only if decoding produced a frame, of course)
outContext->_audioConvertedFrame->pts =
av_frame_get_best_effort_timestamp(_inputContext._audioTempFrame);
// Init the new packet
av_init_packet(&outContext->_audioPacket);
outContext->_audioPacket.data = NULL;
outContext->_audioPacket.size = 0;
// Encode
int retVal = avcodec_encode_audio2( outContext->_audioCodecContext,
&outContext->_audioPacket,
outContext->_audioConvertedFrame,
&p_gotPacket);
// Set pts/dts time stamps for writing interleaved
av_packet_rescale_ts( &outContext->_audioPacket,
outContext->_audioCodecContext->time_base,
outContext->_audioStream->time_base);
outContext->_audioPacket.stream_index = outContext->_audioStream->index;
Writing (only if encoding produced a packet, of course)
int retVal = av_interleaved_write_frame(outContext->_formatContext, &outContext->_audioPacket);
I am quite out of ideas about what would cause such a behaviour.
So, I finally managed to figure things out myself.
The problem was indeed in the difference of the sample_rate.
You'd assume that a call to swr_convert() would give you all the samples you need for converting the audio frame when called like I did.
Of course, that would be too easy.
Instead, you need to call swr_convert (potentially) multiple times per frame and buffer its output, if required. Then you need to grab a single frame from the buffer and that is what you will have to encode.
Here is my new convertAudioFrame function:
// Calculate number of output samples
int numOutputSamples = av_rescale_rnd(
swr_get_delay(outContext->_audioResamplerContext, _inputContext._audioCodecContext->sample_rate)
+ _inputContext._audioTempFrame->nb_samples,
outContext->_audioCodecContext->sample_rate,
_inputContext._audioCodecContext->sample_rate,
AV_ROUND_UP);
if (numOutputSamples == 0)
{
return;
}
uint8_t* tempSamples;
av_samples_alloc( &tempSamples, NULL,
outContext->_audioCodecContext->channels, numOutputSamples,
outContext->_audioCodecContext->sample_fmt, 0);
int retVal = swr_convert( outContext->_audioResamplerContext,
&tempSamples,
numOutputSamples,
(const uint8_t**)_inputContext._audioTempFrame->data,
_inputContext._audioTempFrame->nb_samples);
// Write to audio fifo
if (retVal > 0)
{
retVal = av_audio_fifo_write(outContext->_audioFifo, (void**)&tempSamples, retVal);
}
av_freep(&tempSamples);
// Get a frame from audio fifo
int samplesAvailable = av_audio_fifo_size(outContext->_audioFifo);
if (samplesAvailable > 0)
{
retVal = av_audio_fifo_read(outContext->_audioFifo,
(void**)outContext->_audioConvertedFrame->data,
outContext->_audioCodecContext->frame_size);
// We got a frame, so also set its pts
if (retVal > 0)
{
p_gotConvertedFrame = 1;
if (_inputContext._audioTempFrame->pts != AV_NOPTS_VALUE)
{
outContext->_audioConvertedFrame->pts = _inputContext._audioTempFrame->pts;
}
else if (_inputContext._audioTempFrame->pkt_pts != AV_NOPTS_VALUE)
{
outContext->_audioConvertedFrame->pts = _inputContext._audioTempFrame->pkt_pts;
}
}
}
I basically call this function until there are no more frames in the audio FIFO buffer.
So, the audio was only half as long because I only encoded as many frames as I decoded, when I actually needed to encode twice as many frames because the output sample_rate is twice the input sample_rate.
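The sample-count arithmetic behind this can be sketched without any FFmpeg calls. The two helper names below are hypothetical; the first mirrors the rounded-up rescale that the function above performs with av_rescale_rnd, and the second counts how many full encoder frames a FIFO holds. The frame size of 1024 used in the note afterwards is the usual AAC LC value and is an assumption here, since the real value comes from the encoder's frame_size.

```c
#include <assert.h>
#include <stdint.h>

/* Number of output samples produced by resampling, rounded up:
 * ceil((buffered + in_samples) * out_rate / in_rate). */
int64_t resampled_sample_count(int64_t buffered, int64_t in_samples,
                               int in_rate, int out_rate)
{
    assert(in_rate > 0 && out_rate > 0);
    int64_t total = buffered + in_samples;
    return (total * out_rate + in_rate - 1) / in_rate;
}

/* Number of complete encoder frames that can be read from the FIFO. */
int64_t full_frames_available(int64_t fifo_samples, int frame_size)
{
    assert(frame_size > 0);
    return fifo_samples / frame_size;
}
```

For one decoded 1024-sample frame at 22050 Hz converted to 44100 Hz, resampled_sample_count(0, 1024, 22050, 44100) is 2048, and full_frames_available(2048, 1024) is 2. Encoding only one output frame per decoded frame therefore drops half of the audio.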

Decoder crashes after ffmpeg upgrade

Recently I upgraded ffmpeg from 0.9 to 1.0 (tested on Win7 x64 and on iOS), and now avcodec_decode_video2 segfaults. Long story short: the crash occurs every time the video dimensions change (e.g. from 320x240 to 160x120 or vice versa).
I receive mpeg4 video stream from some proprietary source and decode it like this:
// once, during initialization:
AVCodec *codec_ = avcodec_find_decoder(CODEC_ID_MPEG4);
AVCodecContext *ctx_ = avcodec_alloc_context3(codec_);
avcodec_open2(ctx_, codec_, 0);
AVPacket packet_;
av_init_packet(&packet_);
AVFrame *picture_ = avcodec_alloc_frame();
// on every frame:
int got_picture;
packet_.size = size;
packet_.data = (uint8_t *)buffer;
avcodec_decode_video2(ctx_, picture_, &got_picture, &packet_);
Again, all the above had worked flawlessly until I upgraded to 1.0. Now every time the frame dimensions change - avcodec_decode_video2 crashes. Note that I don't assign width/height in AVCodecContext - neither in the beginning, nor when the stream changes - can it be the reason?
I'd appreciate any idea!
Update: setting ctx_.width and ctx_.height doesn't help.
Update2: just before the crash I get the following log messages:
mpeg4, level 24: "Found 2 unreleased buffers!".
level 8: "Assertion i < avci->buffer_count failed at libavcodec/utils.c:603"
Update3 upgrading to 1.1.2 fixed this crash. The decoder is able again to cope with dimensions change on the fly.
You can try to fill the AVPacket::side_data. If you change the frame size, codec receives information from it (see libavcodec/utils.c apply_param_change function)
This structure can be filled as follows:
int my_ff_add_param_change(AVPacket *pkt, int32_t width, int32_t height)
{
uint32_t flags = 0;
int size = 4 * 3;
uint8_t *data;
if (!pkt)
return AVERROR(EINVAL);
flags = AV_SIDE_DATA_PARAM_CHANGE_DIMENSIONS;
data = av_packet_new_side_data(pkt, AV_PKT_DATA_PARAM_CHANGE, size);
if (!data)
return AVERROR(ENOMEM);
((uint32_t*)data)[0] = flags;
((uint32_t*)data)[1] = width;
((uint32_t*)data)[2] = height;
return 0;
}
You need to call this function every time the size changes.
I think this feature appeared recently. I didn't know about it until I looked at the new ffmpeg sources.
UPD
As you write, the easiest method to solve the problem is to perform codec restart. Just call avcodec_close / avcodec_open2
I just ran into the same issue when my frames were changing size on the fly. However, calling avcodec_close/avcodec_open2 is superfluous. A cleaner way is to reset your AVPacket data structure before the call to avcodec_decode_video2. Here is the code:
av_init_packet(&packet_);
The key here is that this function resets the optional fields of the AVPacket to their defaults. It does not touch data and size, which still have to be set for every frame. Check the docs for more info.

What is a 10.6-compatible means of recording video frames to a movie without using the QuickTime API?

I'm updating an application to be 64-bit-compatible, but I'm having a little difficulty with our movie recording code. We have a FireWire camera that feeds YUV frames into our application, which we process and encode out to disk within an MPEG4 movie. Currently, we are using the C-based QuickTime API to do this (using Image Compression Manager, etc.), but the old QuickTime API does not have support for 64 bit.
My first attempt was to use QTKit's QTMovie and encode individual frames using -addImage:forDuration:withAttributes:, but that requires the creation of an NSImage for each frame (which is computationally expensive) and it does not do temporal compression, so it doesn't generate the most compact files.
I'd like to use something like QTKit Capture's QTCaptureMovieFileOutput, but I can't figure out how to feed raw frames into that which aren't associated with a QTCaptureInput. We can't use our camera directly with QTKit Capture because of our need to manually control the gain, exposure, etc. for it.
On Lion, we now have the AVAssetWriter class in AVFoundation which lets you do this, but I still have to target Snow Leopard for the time being, so I'm trying to find a solution that works there as well.
Therefore, is there a way to do non-QuickTime frame-by-frame recording of video that is more efficient than QTMovie's -addImage:forDuration:withAttributes: and produces file sizes comparable to what the older QuickTime API can?
In the end, I decided to go with the approach suggested by TiansHUo, and use libavcodec for the video compression here. Based on the instructions by Martin here, I downloaded the FFmpeg source and built a 64-bit compatible version of the necessary libraries using
./configure --disable-gpl --arch=x86_64 --cpu=core2 --enable-shared --disable-amd3dnow --enable-memalign-hack --cc=llvm-gcc
make
sudo make install
This creates the LGPL shared libraries for the 64-bit Core2 processors in the Mac. At first I could not find a way to make the library run without crashing when the MMX optimizations were enabled, so I disabled them, which slowed down encoding somewhat. After some experimentation, I found that I could build a 64-bit version of the library which had MMX optimizations enabled and was stable on the Mac by using the above configuration options. This is much faster when encoding than the library built with MMX disabled.
Note that if you use these shared libraries, you should make sure you follow the LGPL compliance instructions on FFmpeg's site to the letter.
In order to get these shared libraries to function properly when placed in proper folder within my Mac application bundle, I needed to use install_name_tool to adjust the internal search paths in these libraries to point to their new location in the Frameworks directory within the application bundle:
install_name_tool -id @executable_path/../Frameworks/libavutil.51.9.1.dylib libavutil.51.9.1.dylib
install_name_tool -id @executable_path/../Frameworks/libavcodec.53.7.0.dylib libavcodec.53.7.0.dylib
install_name_tool -change /usr/local/lib/libavutil.dylib @executable_path/../Frameworks/libavutil.51.9.1.dylib libavcodec.53.7.0.dylib
install_name_tool -id @executable_path/../Frameworks/libavformat.53.4.0.dylib libavformat.53.4.0.dylib
install_name_tool -change /usr/local/lib/libavutil.dylib @executable_path/../Frameworks/libavutil.51.9.1.dylib libavformat.53.4.0.dylib
install_name_tool -change /usr/local/lib/libavcodec.dylib @executable_path/../Frameworks/libavcodec.53.7.0.dylib libavformat.53.4.0.dylib
install_name_tool -id @executable_path/../Frameworks/libswscale.2.0.0.dylib libswscale.2.0.0.dylib
install_name_tool -change /usr/local/lib/libavutil.dylib @executable_path/../Frameworks/libavutil.51.9.1.dylib libswscale.2.0.0.dylib
Your specific paths may vary. This adjustment lets them work from within the application bundle without having to install them in /usr/local/lib on the user's system.
I then had my Xcode project link against these libraries, and I created a separate class to handle the video encoding. This class takes in raw video frames (in BGRA format) through the videoFrameToEncode property and encodes them within the movieFileName file as MPEG4 video in an MP4 container. The code is as follows:
SPVideoRecorder.h
#import <Foundation/Foundation.h>
#include "libavcodec/avcodec.h"
#include "libavformat/avformat.h"
#include "libswscale/swscale.h"
uint64_t getNanoseconds(void);
@interface SPVideoRecorder : NSObject
{
NSString *movieFileName;
CGFloat framesPerSecond;
AVCodecContext *codecContext;
AVStream *videoStream;
AVOutputFormat *outputFormat;
AVFormatContext *outputFormatContext;
AVFrame *videoFrame;
AVPicture inputRGBAFrame;
uint8_t *pictureBuffer;
uint8_t *outputBuffer;
unsigned int outputBufferSize;
int frameColorCounter;
unsigned char *videoFrameToEncode;
dispatch_queue_t videoRecordingQueue;
dispatch_semaphore_t frameEncodingSemaphore;
uint64_t movieStartTime;
}
@property(readwrite, assign) CGFloat framesPerSecond;
@property(readwrite, assign) unsigned char *videoFrameToEncode;
@property(readwrite, copy) NSString *movieFileName;
// Movie recording control
- (void)startRecordingMovie;
- (void)encodeNewFrameToMovie;
- (void)stopRecordingMovie;
@end
SPVideoRecorder.m
#import "SPVideoRecorder.h"
#include <sys/time.h>
@implementation SPVideoRecorder
uint64_t getNanoseconds(void)
{
struct timeval now;
gettimeofday(&now, NULL);
return now.tv_sec * NSEC_PER_SEC + now.tv_usec * NSEC_PER_USEC;
}
#pragma mark -
#pragma mark Initialization and teardown
- (id)init;
{
if (!(self = [super init]))
{
return nil;
}
/* must be called before using avcodec lib */
avcodec_init();
/* register all the codecs */
avcodec_register_all();
av_register_all();
av_log_set_level( AV_LOG_ERROR );
videoRecordingQueue = dispatch_queue_create("com.sonoplot.videoRecordingQueue", NULL);
frameEncodingSemaphore = dispatch_semaphore_create(1);
return self;
}
#pragma mark -
#pragma mark Movie recording control
- (void)startRecordingMovie;
{
dispatch_async(videoRecordingQueue, ^{
NSLog(@"Start recording to file: %@", movieFileName);
const char *filename = [movieFileName UTF8String];
// Use an MP4 container, in the standard QuickTime format so it's readable on the Mac
outputFormat = av_guess_format("mov", NULL, NULL);
if (!outputFormat) {
NSLog(@"Could not set output format");
}
outputFormatContext = avformat_alloc_context();
if (!outputFormatContext)
{
NSLog(@"avformat_alloc_context Error!");
}
outputFormatContext->oformat = outputFormat;
snprintf(outputFormatContext->filename, sizeof(outputFormatContext->filename), "%s", filename);
// Add a video stream to the MP4 file
videoStream = av_new_stream(outputFormatContext,0);
if (!videoStream)
{
NSLog(@"av_new_stream Error!");
}
// Use the MPEG4 encoder (other DiVX-style encoders aren't compatible with this container, and x264 is GPL-licensed)
AVCodec *codec = avcodec_find_encoder(CODEC_ID_MPEG4);
if (!codec) {
fprintf(stderr, "codec not found\n");
exit(1);
}
codecContext = videoStream->codec;
codecContext->codec_id = codec->id;
codecContext->codec_type = AVMEDIA_TYPE_VIDEO;
codecContext->bit_rate = 4800000;
codecContext->width = 640;
codecContext->height = 480;
codecContext->pix_fmt = PIX_FMT_YUV420P;
// codecContext->time_base = (AVRational){1,(int)round(framesPerSecond)};
// videoStream->time_base = (AVRational){1,(int)round(framesPerSecond)};
codecContext->time_base = (AVRational){1,200}; // Set it to 200 FPS so that we give a little wiggle room when recording at 50 FPS
videoStream->time_base = (AVRational){1,200};
// codecContext->max_b_frames = 3;
// codecContext->b_frame_strategy = 1;
codecContext->qmin = 1;
codecContext->qmax = 10;
// codecContext->mb_decision = 2; // -mbd 2
// codecContext->me_cmp = 2; // -cmp 2
// codecContext->me_sub_cmp = 2; // -subcmp 2
codecContext->keyint_min = (int)round(framesPerSecond);
// codecContext->flags |= CODEC_FLAG_4MV; // 4mv
// codecContext->flags |= CODEC_FLAG_LOOP_FILTER;
codecContext->i_quant_factor = 0.71;
codecContext->qcompress = 0.6;
// codecContext->max_qdiff = 4;
codecContext->flags2 |= CODEC_FLAG2_FASTPSKIP;
if(outputFormat->flags & AVFMT_GLOBALHEADER)
{
codecContext->flags |= CODEC_FLAG_GLOBAL_HEADER;
}
// Open the codec
if (avcodec_open(codecContext, codec) < 0)
{
NSLog(@"Couldn't initialize the codec");
return;
}
// Open the file for recording
if (avio_open(&outputFormatContext->pb, outputFormatContext->filename, AVIO_FLAG_WRITE) < 0)
{
NSLog(@"Couldn't open file");
return;
}
// Start by writing the video header
if (avformat_write_header(outputFormatContext, NULL) < 0)
{
NSLog(@"Couldn't write video header");
return;
}
// Set up the video frame and output buffers
outputBufferSize = 400000;
outputBuffer = malloc(outputBufferSize);
int size = codecContext->width * codecContext->height;
int pictureBytes = avpicture_get_size(PIX_FMT_YUV420P, codecContext->width, codecContext->height);
pictureBuffer = (uint8_t *)av_malloc(pictureBytes);
videoFrame = avcodec_alloc_frame();
videoFrame->data[0] = pictureBuffer;
videoFrame->data[1] = videoFrame->data[0] + size;
videoFrame->data[2] = videoFrame->data[1] + size / 4;
videoFrame->linesize[0] = codecContext->width;
videoFrame->linesize[1] = codecContext->width / 2;
videoFrame->linesize[2] = codecContext->width / 2;
avpicture_alloc(&inputRGBAFrame, PIX_FMT_BGRA, codecContext->width, codecContext->height);
frameColorCounter = 0;
movieStartTime = getNanoseconds();
});
}
- (void)encodeNewFrameToMovie;
{
// NSLog(@"Encode frame");
if (dispatch_semaphore_wait(frameEncodingSemaphore, DISPATCH_TIME_NOW) != 0)
{
return;
}
dispatch_async(videoRecordingQueue, ^{
// CFTimeInterval previousTimestamp = CFAbsoluteTimeGetCurrent();
frameColorCounter++;
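if (codecContext == NULL)
{
// Release the semaphore so later frames are not dropped forever
dispatch_semaphore_signal(frameEncodingSemaphore);
return;
}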
// Take the input BGRA texture data and convert it to a YUV 4:2:0 planar frame
avpicture_fill(&inputRGBAFrame, videoFrameToEncode, PIX_FMT_BGRA, codecContext->width, codecContext->height);
struct SwsContext * img_convert_ctx = sws_getContext(codecContext->width, codecContext->height, PIX_FMT_BGRA, codecContext->width, codecContext->height, PIX_FMT_YUV420P, SWS_FAST_BILINEAR, NULL, NULL, NULL);
sws_scale(img_convert_ctx, (const uint8_t* const *)inputRGBAFrame.data, inputRGBAFrame.linesize, 0, codecContext->height, videoFrame->data, videoFrame->linesize);
// The scaler context is created per frame, so free it here to avoid leaking one per frame
sws_freeContext(img_convert_ctx);
// Encode the frame
int out_size = avcodec_encode_video(codecContext, outputBuffer, outputBufferSize, videoFrame);
// Generate a packet and insert in the video stream
if (out_size != 0)
{
AVPacket videoPacket;
av_init_packet(&videoPacket);
if (codecContext->coded_frame->pts != AV_NOPTS_VALUE)
{
uint64_t currentFrameTime = getNanoseconds();
videoPacket.pts = av_rescale_q(((uint64_t)currentFrameTime - (uint64_t)movieStartTime) / 1000ull/*codecContext->coded_frame->pts*/, AV_TIME_BASE_Q/*codecContext->time_base*/, videoStream->time_base);
// NSLog(@"Frame time %lld, converted time: %lld", ((uint64_t)currentFrameTime - (uint64_t)movieStartTime) / 1000ull, videoPacket.pts);
}
if(codecContext->coded_frame->key_frame)
{
videoPacket.flags |= AV_PKT_FLAG_KEY;
}
videoPacket.stream_index = videoStream->index;
videoPacket.data = outputBuffer;
videoPacket.size = out_size;
int ret = av_write_frame(outputFormatContext, &videoPacket);
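if (ret < 0)
{
av_log(outputFormatContext, AV_LOG_ERROR, "%s","Error while writing frame.\n");
av_free_packet(&videoPacket);
// Release the semaphore on the error path as well
dispatch_semaphore_signal(frameEncodingSemaphore);
return;
}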
av_free_packet(&videoPacket);
}
// CFTimeInterval frameDuration = CFAbsoluteTimeGetCurrent() - previousTimestamp;
// NSLog(#"Frame duration: %f ms", frameDuration * 1000.0);
dispatch_semaphore_signal(frameEncodingSemaphore);
});
}
- (void)stopRecordingMovie;
{
dispatch_async(videoRecordingQueue, ^{
// Write out the video trailer
if (av_write_trailer(outputFormatContext) < 0)
{
av_log(outputFormatContext, AV_LOG_ERROR, "%s","Error while writing trailer.\n");
exit(1);
}
// Close out the file
if (!(outputFormat->flags & AVFMT_NOFILE))
{
avio_close(outputFormatContext->pb);
}
// Free up all movie-related resources
avcodec_close(codecContext);
av_free(codecContext);
codecContext = NULL;
av_free(pictureBuffer); // allocated with av_malloc, so release it with av_free
free(outputBuffer);
av_free(videoFrame);
av_free(outputFormatContext);
av_free(videoStream);
});
}
#pragma mark -
#pragma mark Accessors
@synthesize framesPerSecond, videoFrameToEncode, movieFileName;
@end
This works under Lion and Snow Leopard in a 64-bit application. It records at the same bitrate as my previous QuickTime-based approach, with overall lower CPU usage.
Hopefully, this will help out someone else in a similar situation.
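The timestamp handling in encodeNewFrameToMovie can be checked in isolation. The sketch below is a hypothetical stand-alone helper, not part of the class above: it converts elapsed wall-clock nanoseconds to microseconds and rescales them to a stream time base with round-to-nearest, which is how I understand av_rescale_q to behave by default.

```c
#include <assert.h>
#include <stdint.h>

/* Converts elapsed nanoseconds to a pts in units of tb_num/tb_den seconds.
 * Goes through microseconds first, like the code above, then rounds to
 * the nearest tick (ties round up). */
int64_t pts_from_elapsed_ns(uint64_t elapsed_ns, int tb_num, int tb_den)
{
    assert(tb_num > 0 && tb_den > 0);
    int64_t elapsed_us = (int64_t)(elapsed_ns / 1000);
    int64_t num = elapsed_us * tb_den;
    int64_t den = (int64_t)1000000 * tb_num;
    return (num + den / 2) / den;
}
```

With the 1/200 time base used above, one second maps to 200 ticks and a 20 ms frame interval (50 FPS) maps to 4 ticks, so each captured frame lands on a distinct pts with room to spare for jitter.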
I asked a very similar question of a QuickTime engineer last month at WWDC and they basically suggested using a 32-bit helper process...
I know that's not what you wanted to hear. ;)
Yes, there is (at least) a way to do non-QuickTime frame-by-frame recording of video that is more efficient and produces files comparable to Quicktime.
The open-source library libavcodec is a good fit for your video-encoding case. It is used in very popular open-source and commercial software and libraries (for example MPlayer, Google Chrome, ImageMagick and OpenCV). It also provides a huge number of options to tweak and supports numerous file formats (all the important ones and lots of exotic ones). It is efficient and produces files at all kinds of bit rates.
From Wikipedia:
libavcodec is a free software/open source LGPL-licensed library of codecs for encoding and decoding video and audio data. It is provided by FFmpeg project or Libav project. libavcodec is an integral part of many open-source multimedia applications and frameworks. The popular MPlayer, xine and VLC media players use it as their main, built-in decoding engine that enables playback of many audio and video formats on all supported platforms. It is also used by the ffdshow tryouts decoder as its primary decoding library. libavcodec is also used in video editing and transcoding applications like Avidemux, MEncoder or Kdenlive for both decoding and encoding.
libavcodec is particular in that it contains decoder and sometimes encoder implementations of several proprietary formats, including ones for which no public specification has been released. This reverse engineering effort is thus a significant part of libavcodec development. Having such codecs available within the standard libavcodec framework gives a number of benefits over using the original codecs, most notably increased portability, and in some cases also better performance, since libavcodec contains a standard library of highly optimized implementations of common building blocks, such as DCT and color space conversion. However, even though libavcodec strives for decoding that is bit-exact to the official implementation, bugs and missing features in such reimplementations can sometimes introduce compatibility problems playing back certain files.
You can choose to import FFmpeg directly into your XCode project.
Another solution is to directly pipe your frames into the FFmpeg
executable.
The FFmpeg project is a fast, accurate multimedia transcoder which can
be applied in a variety of scenarios on OS X.
FFmpeg (libavcodec included) can be compiled in mac
http://jungels.net/articles/ffmpeg-howto.html
FFmpeg (libavcodec included) can be also compiled in 64 bits on snow leopard
http://www.martinlos.com/?p=41
FFmpeg supports a huge number of video and audio codecs:
http://en.wikipedia.org/wiki/Libavcodec#Implemented_video_codecs
Note that libavcodec and FFmpeg are LGPL-licensed, which means that you will have to mention you've used them, but you don't need to open source your project.

Resources