Encoding raw nv12 frames with ffmpeg - ffmpeg

I am trying to encode raw frames in NV12 format at a frame rate of 15 fps, using avcodec. My capture device has a callback function that is invoked whenever a raw viewfinder frame is available. I copy the raw viewfinder frame and make an AVFrame from the data, then supply the frame to avcodec_encode_video as described in the API sample, but I am not getting the expected result. I am using POSIX threads: the raw frames are kept in a buffer, and my encoder thread collects data from the buffer and encodes it. Encoding is too slow (tested with H.264 and MPEG-1). Is it a problem with my threading or something else? I am at a loss. The output is also puzzling: the whole encoding process is a single function running on a single thread, yet I find a bunch of frames encoded at a time. How exactly does the encoder function? Here is the code snippet for encoding:
fprintf(stderr,"Encoding %d\n",i);
AVFrame *picture;
int y = 0,x;
picture = avcodec_alloc_frame();
av_image_alloc(picture->data, picture->linesize,c->width, c->height,c->pix_fmt, 1);
uint8_t* buf_source = new uint8_t[vr->width*vr->height*3/2];
uint8_t* data = vr->buffer->Read(vr->width*vr->height*3/2);
/* Commented-out attempt to copy the captured data from buf_source:
for (y = 0; y < vr->height*vr->width; y++) {
    picture->data[0][(y/vr->width) * picture->linesize[0] + (y%vr->width)] = buf_source[(y/vr->width)+(y%vr->width)]; // was: x + y + i * 7
    picture->data[1][(y/vr->width) * picture->linesize[1] + (y%vr->width)] = buf_source[vr->width*vr->height + 2 * ((y/vr->width)+(y%vr->width))]; // was: 128 + y + i * 2
    picture->data[2][(y/vr->width) * picture->linesize[2] + (y%vr->width)] = buf_source[vr->width*vr->height + 2 * ((y/vr->width)+(y%vr->width)) + 1]; // was: 64 + x + i * 5
}
*/
/* Y (dummy data, as in the API sample) */
for(y=0;y<c->height;y++) {
    for(x=0;x<c->width;x++) {
        picture->data[0][y * picture->linesize[0] + x] = x + y + i * 7;
    }
}
/* Cb and Cr */
for(y=0;y<c->height/2;y++) {
    for(x=0;x<c->width/2;x++) {
        picture->data[1][y * picture->linesize[1] + x] = 128 + y + i * 2;
        picture->data[2][y * picture->linesize[2] + x] = 64 + x + i * 5;
    }
}
fprintf(stderr,"Data ready\n");
outbuf_size = 100000 + c->width*c->height*3/2;
outbuf = (uint8_t*)malloc(outbuf_size);
fprintf(stderr,"Preparation done!!!\n");
out_size = avcodec_encode_video(c, outbuf, outbuf_size, picture);
had_output |= out_size;
printf("encoding frame %3d (size=%5d)\n", i, out_size);
fwrite(outbuf, 1, out_size, f);

You can use sws_scale in libswscale for colorspace conversion. First create an SwsContext specifying the source (NV12) and destination (YUV420P) using sws_getContext.
m_pSwsCtx = sws_getContext(picture_width,
And then when you want to do the conversion each frame,
sws_scale(m_pSwsCtx, frameData, frameLineSize, 0, frameHeight,
outFrameData, outFrameLineSize);
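To make the conversion concrete: when no scaling is involved, going from NV12 to YUV420P amounts to copying the Y plane and de-interleaving the UV plane. Below is a minimal plain-C sketch of that idea, not what libswscale actually executes. The function name nv12_to_yuv420p and its parameters are made up for illustration, and it assumes a tightly packed NV12 input with even width and height.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: split a tightly packed NV12 frame (Y plane followed
 * by one interleaved UV plane) into three separate planes, each with its
 * own stride, as YUV420P expects. */
static void nv12_to_yuv420p(const uint8_t *nv12, int width, int height,
                            uint8_t *dst_y, int stride_y,
                            uint8_t *dst_u, int stride_u,
                            uint8_t *dst_v, int stride_v)
{
    /* Assumption: even dimensions, so chroma is exactly width/2 x height/2. */
    assert(width > 0 && height > 0 && width % 2 == 0 && height % 2 == 0);

    const uint8_t *src_y = nv12;
    const uint8_t *src_uv = nv12 + (size_t)width * height;

    /* Luma is identical in both formats: copy it row by row. */
    for (int y = 0; y < height; y++)
        memcpy(dst_y + (size_t)y * stride_y, src_y + (size_t)y * width, (size_t)width);

    /* Chroma: NV12 stores U0 V0 U1 V1 ...; YUV420P wants all U, then all V. */
    for (int y = 0; y < height / 2; y++) {
        const uint8_t *row = src_uv + (size_t)y * width;
        for (int x = 0; x < width / 2; x++) {
            dst_u[(size_t)y * stride_u + x] = row[2 * x];
            dst_v[(size_t)y * stride_v + x] = row[2 * x + 1];
        }
    }
}
```

With an AVFrame, the three destination pointers and strides would be picture->data[0..2] and picture->linesize[0..2].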


Why is chrominance lost when I copy a DXGI_FORMAT_NV12 ID3D11Texture from a d3d11device to a d3d11on12device?

D3D11_TEXTURE2D_DESC texture_desc = {0};
texture_desc.Width = 640;
texture_desc.Height = 480;
texture_desc.MipLevels = 1;
texture_desc.Format = DXGI_FORMAT_NV12;
texture_desc.SampleDesc.Count = 1;
texture_desc.ArraySize = 1;
texture_desc.Usage = D3D11_USAGE_DEFAULT;
texture_desc.MiscFlags = D3D11_RESOURCE_MISC_SHARED;
Microsoft::WRL::ComPtr<ID3D11Texture2D> temp_texture_for_my_device{nullptr};
my_device->CreateTexture2D(&texture_desc, NULL, &temp_texture_for_my_device);
Microsoft::WRL::ComPtr<IDXGIResource> dxgi_resource{nullptr};
HANDLE shared_handle = NULL;
Microsoft::WRL::ComPtr<ID3D11Texture2D> temp_texture_for_ffmpeg_device {nullptr};
ffmpeg_device->OpenSharedResource(shared_handle, __uuidof(ID3D11Texture2D), (void**)temp_texture_for_ffmpeg_device.GetAddressOf());
ffmpeg_device_context->CopySubresourceRegion(temp_texture_for_ffmpeg_device.Get(), 0, 0, 0, 0, (ID3D11Texture2D*)ffmpeg_avframe->data[0], (int)ffmpeg_avframe->data[1], NULL);
When I copy temp_texture_for_ffmpeg_device to a D3D11_USAGE_STAGING texture, the result is normal, but when I copy temp_texture_for_my_device to a D3D11_USAGE_STAGING texture, I lose the chrominance data.
When I map the textures to the CPU via D3D11_USAGE_STAGING:
temp_texture_for_ffmpeg_device: RowPitch is 768, DepthPitch is 768 * 720;
temp_texture_for_my_device: RowPitch is 1024, DepthPitch is 1024 * 480;
I think some parameters differ between the two devices (or device contexts?), but I don't know which parameters would cause such a difference.
my_device and my_device_context are created by D3D11On12CreateDevice.
The DirectX Video formats are planar, meaning that each component is contiguous in memory rather than being interleaved like most formats. For DirectX 12, this is explicitly exposed in the layout information which you can obtain via D3D12GetFormatPlaneCount.
Here's a template that works with D3D12_SUBRESOURCE_DATA and D3D12_MEMCPY_DEST. Here the SlicePitch is set to the size of an individual plane.
template<typename T, typename PT> void AdjustPlaneResource(
    _In_ DXGI_FORMAT fmt,
    _In_ size_t height,
    _In_ size_t slicePlane,
    _Inout_ T& res) noexcept
{
    switch (static_cast<int>(fmt))
    {
    case DXGI_FORMAT_NV12:
    case DXGI_FORMAT_P010:
    case DXGI_FORMAT_P016:
        if (!slicePlane) {
            // Plane 0
            res.SlicePitch = res.RowPitch * static_cast<PT>(height);
        } else {
            // Plane 1
            res.pData = const_cast<uint8_t*>(reinterpret_cast<const uint8_t*>(res.pData) + res.RowPitch * PT(height));
            res.SlicePitch = res.RowPitch * static_cast<PT>((height + 1) >> 1);
        }
        break;
    case DXGI_FORMAT_NV11:
        if (!slicePlane) {
            // Plane 0
            res.SlicePitch = res.RowPitch * static_cast<PT>(height);
        } else {
            // Plane 1
            res.pData = const_cast<uint8_t*>(reinterpret_cast<const uint8_t*>(res.pData) + res.RowPitch * PT(height));
            res.RowPitch = (res.RowPitch >> 1);
            res.SlicePitch = res.RowPitch * static_cast<PT>(height);
        }
        break;
    }
}
For DirectX 11, the extra planar information has to be assumed as it's not directly exposed by the API as such. You have to compute the extra space required for the additional plane(s). Here's a snippet from DirectXTex. In this case slice is the total size of all the planes in one 'slice' of the resource.
case DXGI_FORMAT_YUY2:
    pitch = ((uint64_t(width) + 1u) >> 1) * 4u;
    slice = pitch * uint64_t(height);
    break;
case DXGI_FORMAT_Y210:
case DXGI_FORMAT_Y216:
    pitch = ((uint64_t(width) + 1u) >> 1) * 8u;
    slice = pitch * uint64_t(height);
    break;
case DXGI_FORMAT_NV12:
case DXGI_FORMAT_420_OPAQUE:
    pitch = ((uint64_t(width) + 1u) >> 1) * 2u;
    slice = pitch * (uint64_t(height) + ((uint64_t(height) + 1u) >> 1));
    break;
case DXGI_FORMAT_P010:
case DXGI_FORMAT_P016:
    pitch = ((uint64_t(width) + 1u) >> 1) * 4u;
    slice = pitch * (uint64_t(height) + ((uint64_t(height) + 1u) >> 1));
    break;
case DXGI_FORMAT_NV11:
    pitch = ((uint64_t(width) + 3u) >> 2) * 4u;
    slice = pitch * uint64_t(height) * 2u;
    break;
case DXGI_FORMAT_P208:
    pitch = ((uint64_t(width) + 1u) >> 1) * 2u;
    slice = pitch * uint64_t(height) * 2u;
    break;
case DXGI_FORMAT_V208:
    pitch = uint64_t(width);
    slice = pitch * (uint64_t(height) + (((uint64_t(height) + 1u) >> 1) * 2u));
    break;
case DXGI_FORMAT_V408:
    pitch = uint64_t(width);
    slice = pitch * (uint64_t(height) + (uint64_t(height >> 1) * 4u));
    break;
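As a rough cross-check of the NV12 case, here is a small C sketch of the same arithmetic as the snippet above. The helper names are hypothetical; the formulas assume the usual NV12 layout, where full-height luma rows are followed by half-height rows of interleaved chroma that share the same row pitch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for NV12 size arithmetic. */

/* Smallest row pitch in bytes: the width rounded up to an even number. */
static uint64_t nv12_min_pitch(uint64_t width)
{
    return ((width + 1u) >> 1) * 2u;
}

/* Rows stored per slice: all luma rows plus half as many chroma rows. */
static uint64_t nv12_total_rows(uint64_t height)
{
    return height + ((height + 1u) >> 1);
}

/* Bytes in one slice for a given (possibly padded) row pitch. */
static uint64_t nv12_slice_bytes(uint64_t row_pitch, uint64_t height)
{
    assert(row_pitch > 0);
    return row_pitch * nv12_total_rows(height);
}

/* Byte offset of the interleaved UV plane from the start of the slice. */
static uint64_t nv12_chroma_offset(uint64_t row_pitch, uint64_t height)
{
    return row_pitch * height;
}
```

For the 640x480 texture in the question, nv12_total_rows(480) is 720, which matches the 768 * 720 DepthPitch reported for one device, while 1024 * 480 only accounts for the 480 luma rows.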

FFmpeg - MJPEG decoding gives inconsistent values

I have a set of JPEG frames which I am muxing into an avi, which gives me a mjpeg video. This is the command I run on the console:
ffmpeg -y -start_number 0 -i %06d.JPEG -codec copy vid.avi
When I demux and decode the video using the FFmpeg C API, I get frames whose pixel values are slightly different. The demuxing code looks something like this:
AVFormatContext* fmt_ctx = NULL;
AVCodecContext* cdc_ctx = NULL;
AVCodec* vid_cdc = NULL;
int ret;
unsigned int height, width;
// read_nframes is the number of frames to read
output_arr = new unsigned char [height * width * 3 *
sizeof(unsigned char) * read_nframes];
avcodec_open2(cdc_ctx, vid_cdc, NULL);
int num_bytes;
uint8_t* buffer = NULL;
const AVPixelFormat out_format = AV_PIX_FMT_RGB24;
num_bytes = av_image_get_buffer_size(out_format, width, height, 1);
buffer = (uint8_t*)av_malloc(num_bytes * sizeof(uint8_t));
AVFrame* vid_frame = NULL;
vid_frame = av_frame_alloc();
AVFrame* conv_frame = NULL;
conv_frame = av_frame_alloc();
av_image_fill_arrays(conv_frame->data, conv_frame->linesize, buffer,
out_format, width, height, 1);
struct SwsContext *sws_ctx = NULL;
sws_ctx = sws_getContext(width, height, cdc_ctx->pix_fmt,
width, height, out_format,
int frame_num = 0;
AVPacket vid_pckt;
while (av_read_frame(fmt_ctx, &vid_pckt) >=0) {
    ret = avcodec_send_packet(cdc_ctx, &vid_pckt);
    if (ret < 0)
        break;
    ret = avcodec_receive_frame(cdc_ctx, vid_frame);
    if (ret < 0 && ret != AVERROR(EAGAIN) && ret != AVERROR_EOF)
        break;
    if (ret >= 0) {
        // convert image from native format to packed RGB24
        sws_scale(sws_ctx, vid_frame->data,
                  vid_frame->linesize, 0, vid_frame->height,
                  conv_frame->data, conv_frame->linesize);
        // split the packed RGB24 rows into planar R, G, B inside output_arr
        unsigned char* r_ptr = output_arr +
            (height * width * sizeof(unsigned char) * 3 * frame_num);
        unsigned char* g_ptr = r_ptr + (height * width * sizeof(unsigned char));
        unsigned char* b_ptr = g_ptr + (height * width * sizeof(unsigned char));
        unsigned int pxl_i = 0;
        for (unsigned int r = 0; r < height; ++r) {
            uint8_t* avframe_r = conv_frame->data[0] + r*conv_frame->linesize[0];
            for (unsigned int c = 0; c < width; ++c) {
                r_ptr[pxl_i] = avframe_r[0];
                g_ptr[pxl_i] = avframe_r[1];
                b_ptr[pxl_i] = avframe_r[2];
                avframe_r += 3;
                ++pxl_i;
            }
        }
        ++frame_num;
        if (frame_num >= read_nframes)
            break;
    }
}
In my experience, around two-thirds of the pixel values differ, each by ±1 (in a range of [0,255]). Could this be due to some decoding scheme FFmpeg uses for reading JPEG frames? I tried encoding and decoding PNG frames, and that works perfectly. I am sure this has something to do with the libav decoding process, because the MD5 values are consistent between the images and the video:
ffmpeg -i %06d.JPEG -f framemd5 -
ffmpeg -i vid.avi -f framemd5 -
In short, my goal is to get the same pixel-by-pixel values for each JPEG frame as I would have gotten if I had read the JPEG images directly. Here is the stand-alone bitbucket code I used. It includes cmake files to build the code, plus a couple of JPEG frames and the converted AVI file to reproduce the problem (pass '--filetype png' to test PNG decoding).
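For anyone reproducing this, the copy loop in the demuxing code can be isolated from FFmpeg entirely. The following is a minimal sketch of that packed-to-planar step only, with a hypothetical function name; it assumes the source rows are packed RGB24 and may be padded (linesize >= 3 * width). It only rearranges bytes, so it cannot by itself change pixel values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy packed RGB24 rows (R G B R G B ...) into three
 * consecutive planes (all R, then all G, then all B) inside dst.
 * dst must hold 3 * width * height bytes. */
static void rgb24_to_planar(const uint8_t *src, int linesize,
                            int width, int height, uint8_t *dst)
{
    assert(width > 0 && height > 0 && linesize >= 3 * width);

    size_t plane = (size_t)width * height;
    uint8_t *r = dst;
    uint8_t *g = dst + plane;
    uint8_t *b = dst + 2 * plane;
    size_t i = 0;

    for (int row = 0; row < height; row++) {
        /* Advance by linesize, not 3 * width, so row padding is skipped. */
        const uint8_t *p = src + (size_t)row * linesize;
        for (int col = 0; col < width; col++, p += 3, i++) {
            r[i] = p[0];
            g[i] = p[1];
            b[i] = p[2];
        }
    }
}
```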

Is this part of a real IFFT process really optimal?

When calculating an (I)FFT, it is possible to compute 2N real data points using an ordinary complex (I)FFT of N data points.
Not sure about my terminology here, but this is how I've read it described.
There are several posts about this on stackoverflow already.
This can speed things up a bit when only dealing with such "real" data which is often the case when dealing with for example sound (re-)synthesis.
This increase in speed is offset by the need for a pre-processing step that somehow... uhh... fidaddles? the data to achieve this. Look I'm not even going to try to convince anyone I fully understand this but thanks to previously mentioned threads, I came up with the following routine, which does the job nicely (thank you!).
However, on my microcontroller this costs a bit more than I'd like even though trigonometric functions are already optimized with LUTs.
But the routine itself looks like it should be possible to optimize mathematically to minimize processing. To me it seems similar to a plain 2D rotation. I can't quite wrap my head around it, but it feels like this could be done with fewer trigonometric calls and fewer arithmetic operations.
I was hoping perhaps someone else might easily see what I don't and provide some insight into how this math may be simplified.
This particular routine is for use with IFFT, before the bit-reversal stage.
MAG_A/B = 0 TO 1
PHA_A/B = 0 TO 2PI
r = MAG_A * sin(PHA_A)
i = MAG_B * sin(PHA_B)
rsum = r + i
rdif = r - i
r = MAG_A * cos(PHA_A)
i = MAG_B * cos(PHA_B)
isum = r + i
idif = r - i
r = -cos(INDEX)
i = -sin(INDEX)
rtmp = r * isum + i * rdif
itmp = i * isum - r * rdif
OUTPUT rsum + rtmp
OUTPUT itmp + idif
OUTPUT rsum - rtmp
OUTPUT itmp - idif
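For readers who prefer something compilable, the pseudocode above can be transcribed line by line into plain C with floating-point values. This is only a sketch: the struct and function names are hypothetical, and the sine and cosine values are passed in by the caller (for instance from lookup tables) so that the arithmetic is separated from the trigonometry.

```c
#include <assert.h>

/* Hypothetical output type: the four OUTPUT values, in pseudocode order. */
typedef struct {
    double n_r, n_i;   /* rsum + rtmp, itmp + idif */
    double z_r, z_i;   /* rsum - rtmp, itmp - idif */
} pair_out_t;

/* sin_a/cos_a are sin(PHA_A)/cos(PHA_A), sin_b/cos_b likewise for PHA_B,
 * and cos_w/sin_w are cos(INDEX)/sin(INDEX). */
static pair_out_t preprocess_pair(double mag_a, double sin_a, double cos_a,
                                  double mag_b, double sin_b, double cos_b,
                                  double cos_w, double sin_w)
{
    assert(mag_a >= 0.0 && mag_b >= 0.0);

    double r = mag_a * sin_a;
    double i = mag_b * sin_b;
    double rsum = r + i, rdif = r - i;

    r = mag_a * cos_a;
    i = mag_b * cos_b;
    double isum = r + i, idif = r - i;

    r = -cos_w;
    i = -sin_w;
    double rtmp = r * isum + i * rdif;
    double itmp = i * isum - r * rdif;

    pair_out_t out = { rsum + rtmp, itmp + idif, rsum - rtmp, itmp - idif };
    return out;
}
```

Written this way, the rtmp/itmp pair is exactly one complex multiplication: rtmp + j*itmp = (r + j*i) * (isum - j*rdif). That is where the resemblance to a 2D rotation comes from, since (r, i) has unit length.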
original working code, if that's your poison:
void fft_nz_set(fft_complex_t complex[], unsigned bits, unsigned index, int32_t mag_lo, int32_t pha_lo, int32_t mag_hi, int32_t pha_hi) {
    unsigned size = 1 << bits;
    unsigned shift = SINE_TABLE_BITS - (bits - 1);
    unsigned n = index; // index for mag_lo, pha_lo
    unsigned z = size - index; // index for mag_hi, pha_hi
    int32_t rsum, rdif, isum, idif, r, i;
    r = smmulr(mag_lo, sine(pha_lo)); // mag_lo * sin(pha_lo)
    i = smmulr(mag_hi, sine(pha_hi)); // mag_hi * sin(pha_hi)
    rsum = r + i; rdif = r - i;
    r = smmulr(mag_lo, cosine(pha_lo)); // mag_lo * cos(pha_lo)
    i = smmulr(mag_hi, cosine(pha_hi)); // mag_hi * cos(pha_hi)
    isum = r + i; idif = r - i;
    r = -sinetable[(1 << SINE_BITS) - (index << shift)]; // cos(pi_c * (index / size) / 2)
    i = -sinetable[index << shift]; // sin(pi_c * (index / size) / 2)
    int32_t rtmp = smmlar(r, isum, smmulr(i, rdif)) << 1; // r * isum + i * rdif
    int32_t itmp = smmlsr(i, isum, smmulr(r, rdif)) << 1; // i * isum - r * rdif
    complex[n].r = rsum + rtmp;
    complex[n].i = itmp + idif;
    complex[z].r = rsum - rtmp;
    complex[z].i = itmp - idif;
}
// For reference, this would be used as follows to generate a sawtooth (after IFFT)
void synth_sawtooth(fft_complex_t *complex, unsigned fft_bits) {
    unsigned fft_size = 1 << fft_bits;
    fft_sym_dc(complex, 0, 0); // sets dc bin [0]
    for(unsigned n = 1, z = fft_size - 1; n <= fft_size >> 1; n++, z--) {
        // calculation of amplitude/index (sawtooth) for both n and z
        fft_sym_magnitude(complex, fft_bits, n, 0x4000000 / n, 0x4000000 / z);
    }
}

How to use a real image instead of a dummy image in ffmpeg api-example.c

I am using the video_encode_example function from api-example.c of FFmpeg, which basically creates 25 dummy images and encodes them into a one-second video.
However, I am unable to supply real images instead of the dummy ones.
If anyone knows how to do this in Objective-C with Xcode, please reply.
Below is the function:
/*
 * Video encoding example
 */
static void video_encode_example(const char *filename)
{
    AVCodec *codec;
    AVCodecContext *c= NULL;
    int i, out_size, size, x, y, outbuf_size;
    FILE *f;
    AVFrame *picture;
    uint8_t *outbuf, *picture_buf;
    printf("Video encoding\n");
    /* find the mpeg1 video encoder */
    codec = avcodec_find_encoder(CODEC_ID_MPEG1VIDEO);
    if (!codec) {
        fprintf(stderr, "codec not found\n");
        exit(1);
    }
    c= avcodec_alloc_context();
    picture= avcodec_alloc_frame();
    /* put sample parameters */
    c->bit_rate = 400000;
    /* resolution must be a multiple of two */
    c->width = 352;
    c->height = 288;
    /* frames per second */
    c->time_base= (AVRational){1,25};
    c->gop_size = 10; /* emit one intra frame every ten frames */
    c->pix_fmt = PIX_FMT_YUV420P;
    /* open it */
    if (avcodec_open(c, codec) < 0) {
        fprintf(stderr, "could not open codec\n");
        exit(1);
    }
    f = fopen(filename, "wb");
    if (!f) {
        fprintf(stderr, "could not open %s\n", filename);
        exit(1);
    }
    /* alloc image and output buffer */
    outbuf_size = 100000;
    outbuf = malloc(outbuf_size);
    size = c->width * c->height;
    picture_buf = malloc((size * 3) / 2); /* size for YUV 420 */
    picture->data[0] = picture_buf;
    picture->data[1] = picture->data[0] + size;
    picture->data[2] = picture->data[1] + size / 4;
    picture->linesize[0] = c->width;
    picture->linesize[1] = c->width / 2;
    picture->linesize[2] = c->width / 2;
    /* encode 1 second of video */
    for(i=0;i<25;i++) {
        /* prepare a dummy image */
        /* Y */
        for(y=0;y<c->height;y++) {
            for(x=0;x<c->width;x++) {
                picture->data[0][y * picture->linesize[0] + x] = x + y + i * 3;
            }
        }
        /* Cb and Cr */
        for(y=0;y<c->height/2;y++) {
            for(x=0;x<c->width/2;x++) {
                picture->data[1][y * picture->linesize[1] + x] = 128 + y + i * 2;
                picture->data[2][y * picture->linesize[2] + x] = 64 + x + i * 5;
            }
        }
        /* encode the image */
        out_size = avcodec_encode_video(c, outbuf, outbuf_size, picture);
        printf("encoding frame %3d (size=%5d)\n", i, out_size);
        fwrite(outbuf, 1, out_size, f);
    }
    /* get the delayed frames */
    for(; out_size; i++) {
        out_size = avcodec_encode_video(c, outbuf, outbuf_size, NULL);
        printf("write frame %3d (size=%5d)\n", i, out_size);
        fwrite(outbuf, 1, out_size, f);
    }
    /* add sequence end code to have a real mpeg file */
    outbuf[0] = 0x00;
    outbuf[1] = 0x00;
    outbuf[2] = 0x01;
    outbuf[3] = 0xb7;
    fwrite(outbuf, 1, 4, f);
    fclose(f);
    free(picture_buf);
    free(outbuf);
    avcodec_close(c);
    av_free(c);
    av_free(picture);
}
I didn't test it, but if you are capturing the image in YUV420P format, you may be able to point picture->data[0] at your captured byte buffer:
picture->data[0] = picture_buf; //put your data here, remove the dummy data insertion
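To spell out what "pointing at your data" implies for the other two planes, here is a small C sketch of the contiguous YUV420P layout that the example function assumes (Y, then U, then V, with no row padding). The struct and function names are hypothetical, and a capture buffer with padded rows would need different strides.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of a tightly packed YUV420P buffer. */
typedef struct {
    size_t y_off, u_off, v_off;  /* byte offsets of the three planes */
    size_t total;                /* total buffer size in bytes */
    int y_stride, c_stride;      /* bytes per row: luma and chroma */
} yuv420p_layout;

static yuv420p_layout yuv420p_layout_for(int width, int height)
{
    /* Assumption: even dimensions, as the encoder example requires. */
    assert(width > 0 && height > 0 && width % 2 == 0 && height % 2 == 0);

    size_t size = (size_t)width * height;
    yuv420p_layout l;
    l.y_off = 0;
    l.u_off = size;             /* U follows the full-resolution Y plane */
    l.v_off = size + size / 4;  /* V follows the quarter-size U plane */
    l.total = size * 3 / 2;
    l.y_stride = width;
    l.c_stride = width / 2;
    return l;
}
```

Given a captured buffer buf in this layout, picture->data[0], data[1] and data[2] would be buf + y_off, buf + u_off and buf + v_off, and picture->linesize[0..2] would be y_stride, c_stride and c_stride.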

How to write a video encoder with ffmpeg that explicitly controls the position of keyframes?

I want to write an encoder with ffmpeg which can put I-frames (keyframes) at positions I want. Where can I find tutorials or reference material for it?
Is it possible to do this with mencoder or any other open-source encoder? I want to encode an H.263 file. I am developing on and for Linux.
You'll need to look at the libavcodec documentation - specifically, at avcodec_encode_video(). I found that the best available documentation is in the ffmpeg header files and the API sample source code that's provided with the ffmpeg source. Specifically, look at libavcodec/api-example.c or even ffmpeg.c.
To force an I frame, you'll need to set the pict_type member of the picture you're encoding to 1: 1 is an I frame, 2 is a P frame, and I don't remember what's the code for a B frame off the top of my head... Also, the key_frame member needs to be set to 1.
Some introductory material is available here and here, but I don't really know how good it is.
You'll need to be careful how you allocate the frame objects that the API calls require. api-example.c is your best bet as far as that goes, in my opinion. Look for the function video_encode_example() - it's concise and illustrates all the important things you need to worry about - pay special attention to the second call to avcodec_encode_video() that passes a NULL picture argument - it's required to get the last frames of video since MPEG video is encoded out of sequence and you may end up with a delay of a few frames.
An up-to-date version of api-example.c can be found at http://ffmpeg.org/doxygen/trunk/doc_2examples_2decoding_encoding_8c-example.html
It does the entire video encoding in a single and relatively short function. So this is probably a good place to start. Compile and run it. And then start modifying it until it does what you want.
It also has audio encoding and audio & video decoding examples.
GStreamer has decent documentation, has bindings for a number of languages (although the native API is C), and supports any video format you can find plugins for, including H.263 via gstreamer-ffmpeg.
You will need the libavcodec library. As a first step, I think you can learn how to use it from the ffplay.c file inside the ffmpeg source code. It will tell you a lot. You can also check my video project at rtstegvideo.sourceforge.net.
Hope this helps.
If you're a Java programmer, use Xuggler.
Minimal runnable example on FFmpeg 2.7
Based on Ori Pessach's answer, below is a minimal example that generates frames of form.
The key parts of the code that control frame type are:
c = avcodec_alloc_context3(codec);
/* Minimal distance of I-frames. This is the maximum value allowed,
or else we get a warning at runtime. */
c->keyint_min = 600;
/* Otherwise it defaults to 0 and B-frames are not allowed. */
c->max_b_frames = 1;
frame->key_frame = 0;
switch (frame->pts % 4) {
    case 0:
        frame->key_frame = 1;
        frame->pict_type = AV_PICTURE_TYPE_I;
        break;
    case 1:
    case 3:
        frame->pict_type = AV_PICTURE_TYPE_P;
        break;
    case 2:
        frame->pict_type = AV_PICTURE_TYPE_B;
        break;
}
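The requested pattern can be checked in isolation, without FFmpeg. The sketch below reproduces only the pts % 4 mapping from the snippet above, using plain characters in place of the AV_PICTURE_TYPE_* constants; the function names are hypothetical. It describes what is requested, and the encoder may still override it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the switch above: returns 'I', 'P' or 'B'
 * for a given presentation timestamp. */
static char requested_type(long pts)
{
    assert(pts >= 0);  /* % on negative values would not give 0..3 */
    switch (pts % 4) {
    case 0:
        return 'I';
    case 2:
        return 'B';
    default:           /* cases 1 and 3 */
        return 'P';
    }
}

/* Fill out with the requested types for pts = 0 .. n-1, NUL-terminated.
 * out must have room for n + 1 characters. */
static void requested_pattern(char *out, size_t n)
{
    for (size_t k = 0; k < n; k++)
        out[k] = requested_type((long)k);
    out[n] = '\0';
}
```

For the first eight frames this yields the string IPBPIPBP.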
We can then verify the frame type with:
ffprobe -select_streams v \
-show_frames \
-show_entries frame=pict_type \
-of csv \
as mentioned at: https://superuser.com/questions/885452/extracting-the-index-of-key-frames-from-a-video-using-ffmpeg
Some rules were enforced by FFmpeg even when I tried to override them:
the first frame is an I-frame
a B-frame cannot be placed before an I-frame (TODO why?)
#include <stdio.h>
#include <stdlib.h>
#include <libavcodec/avcodec.h>
#include <libavutil/imgutils.h>
#include <libavutil/opt.h>
#include <libswscale/swscale.h>
static AVCodecContext *c = NULL;
static AVFrame *frame;
static AVPacket pkt;
static FILE *file;
struct SwsContext *sws_context = NULL;
/*
Convert RGB24 array to YUV. Save directly to the `frame`,
modifying its `data` and `linesize` fields
*/
static void ffmpeg_encoder_set_frame_yuv_from_rgb(uint8_t *rgb) {
    const int in_linesize[1] = { 3 * c->width };
    sws_context = sws_getCachedContext(sws_context,
            c->width, c->height, AV_PIX_FMT_RGB24,
            c->width, c->height, AV_PIX_FMT_YUV420P,
            0, 0, 0, 0);
    sws_scale(sws_context, (const uint8_t * const *)&rgb, in_linesize, 0,
            c->height, frame->data, frame->linesize);
}
/*
Generate 2 different images with four colored rectangles, each 25 frames long:
Image 1:
    black | red
    green | blue
Image 2:
    yellow | magenta
    cyan   | white
*/
uint8_t* generate_rgb(int width, int height, int pts, uint8_t *rgb) {
    int x, y, cur;
    rgb = realloc(rgb, 3 * sizeof(uint8_t) * height * width);
    for (y = 0; y < height; y++) {
        for (x = 0; x < width; x++) {
            cur = 3 * (y * width + x);
            rgb[cur + 0] = 0;
            rgb[cur + 1] = 0;
            rgb[cur + 2] = 0;
            if ((frame->pts / 25) % 2 == 0) {
                if (y < height / 2) {
                    if (x < width / 2) {
                        /* Black. */
                    } else {
                        rgb[cur + 0] = 255;
                    }
                } else {
                    if (x < width / 2) {
                        rgb[cur + 1] = 255;
                    } else {
                        rgb[cur + 2] = 255;
                    }
                }
            } else {
                if (y < height / 2) {
                    rgb[cur + 0] = 255;
                    if (x < width / 2) {
                        rgb[cur + 1] = 255;
                    } else {
                        rgb[cur + 2] = 255;
                    }
                } else {
                    if (x < width / 2) {
                        rgb[cur + 1] = 255;
                        rgb[cur + 2] = 255;
                    } else {
                        rgb[cur + 0] = 255;
                        rgb[cur + 1] = 255;
                        rgb[cur + 2] = 255;
                    }
                }
            }
        }
    }
    return rgb;
}
/* Allocate resources and write header data to the output file. */
void ffmpeg_encoder_start(const char *filename, int codec_id, int fps, int width, int height) {
    AVCodec *codec;
    int ret;
    codec = avcodec_find_encoder(codec_id);
    if (!codec) {
        fprintf(stderr, "Codec not found\n");
        exit(1);
    }
    c = avcodec_alloc_context3(codec);
    if (!c) {
        fprintf(stderr, "Could not allocate video codec context\n");
        exit(1);
    }
    c->bit_rate = 400000;
    c->width = width;
    c->height = height;
    c->time_base.num = 1;
    c->time_base.den = fps;
    /* I, P, B frame placement parameters. */
    c->gop_size = 600;
    c->max_b_frames = 1;
    c->keyint_min = 600;
    c->pix_fmt = AV_PIX_FMT_YUV420P;
    if (codec_id == AV_CODEC_ID_H264)
        av_opt_set(c->priv_data, "preset", "slow", 0);
    if (avcodec_open2(c, codec, NULL) < 0) {
        fprintf(stderr, "Could not open codec\n");
        exit(1);
    }
    file = fopen(filename, "wb");
    if (!file) {
        fprintf(stderr, "Could not open %s\n", filename);
        exit(1);
    }
    frame = av_frame_alloc();
    if (!frame) {
        fprintf(stderr, "Could not allocate video frame\n");
        exit(1);
    }
    frame->format = c->pix_fmt;
    frame->width = c->width;
    frame->height = c->height;
    ret = av_image_alloc(frame->data, frame->linesize, c->width, c->height, c->pix_fmt, 32);
    if (ret < 0) {
        fprintf(stderr, "Could not allocate raw picture buffer\n");
        exit(1);
    }
}
/*
Write trailing data to the output file
and free resources allocated by ffmpeg_encoder_start.
*/
void ffmpeg_encoder_finish(void) {
    uint8_t endcode[] = { 0, 0, 1, 0xb7 };
    int got_output, ret;
    /* Passing a NULL frame drains the frames the encoder has delayed. */
    do {
        ret = avcodec_encode_video2(c, &pkt, NULL, &got_output);
        if (ret < 0) {
            fprintf(stderr, "Error encoding frame\n");
            exit(1);
        }
        if (got_output) {
            fwrite(pkt.data, 1, pkt.size, file);
            av_free_packet(&pkt);
        }
    } while (got_output);
    fwrite(endcode, 1, sizeof(endcode), file);
    fclose(file);
    avcodec_close(c);
    av_free(c);
    av_freep(&frame->data[0]);
    av_frame_free(&frame);
}
/*
Encode one frame from an RGB24 input and save it to the output file.
Must be called after ffmpeg_encoder_start, and ffmpeg_encoder_finish
must be called after the last call to this function.
*/
void ffmpeg_encoder_encode_frame(uint8_t *rgb) {
    int ret, got_output;
    ffmpeg_encoder_set_frame_yuv_from_rgb(rgb);
    av_init_packet(&pkt);
    pkt.data = NULL;
    pkt.size = 0;
    switch (frame->pts % 4) {
        case 0:
            frame->key_frame = 1;
            frame->pict_type = AV_PICTURE_TYPE_I;
            break;
        case 1:
        case 3:
            frame->key_frame = 0;
            frame->pict_type = AV_PICTURE_TYPE_P;
            break;
        case 2:
            frame->key_frame = 0;
            frame->pict_type = AV_PICTURE_TYPE_B;
            break;
    }
    ret = avcodec_encode_video2(c, &pkt, frame, &got_output);
    if (ret < 0) {
        fprintf(stderr, "Error encoding frame\n");
        exit(1);
    }
    if (got_output) {
        fwrite(pkt.data, 1, pkt.size, file);
        av_free_packet(&pkt);
    }
}
/* Represents the main loop of an application which generates one frame per loop. */
static void encode_example(const char *filename, int codec_id) {
    int pts;
    int width = 320;
    int height = 240;
    uint8_t *rgb = NULL;
    ffmpeg_encoder_start(filename, codec_id, 25, width, height);
    for (pts = 0; pts < 100; pts++) {
        frame->pts = pts;
        rgb = generate_rgb(width, height, pts, rgb);
        ffmpeg_encoder_encode_frame(rgb);
    }
    ffmpeg_encoder_finish();
}
int main(void) {
    avcodec_register_all();
    encode_example("tmp.h264", AV_CODEC_ID_H264);
    encode_example("tmp.mpg", AV_CODEC_ID_MPEG1VIDEO);
    /* TODO: is this encoded correctly? Possible to view it without container? */
    /*encode_example("tmp.vp8", AV_CODEC_ID_VP8);*/
    return 0;
}
Tested on Ubuntu 15.10. GitHub upstream.
Do you really want to do this?
In most cases, you are better off just controlling the global parameters of AVCodecContext.
FFmpeg does smart things like using a keyframe if the new frame is completely different from the previous one, and not much would be gained from differential encoding.
For example, if we set just:
c->keyint_min = 600;
then we get exactly 4 key-frames on the above example, which is logical since there are 4 abrupt frame changes on the generated video.
