Am Currently updating our FFMPEG library usage from a pretty old version(0.5) to 2.8. As part of the change, had replaced avcodec_decode_video to avcodec_decode_video2. However, am noticing quite a difference in the way avcodec_decode_video2 functions compared to the old avcodec_decode_video. For the same packet (same data), 'avcodec_decode_video2' gives got_picture_ptr as zeo whereas the old 'avcodec_decode_video' was giving a non-zero value. In the example that am describing here, am decoding an FLV file with VideoCodec:H264-MPEG-4 AVC (part 10) and AudioCodec:MPEG AAC Audio (Am attaching a part of the Hex Version of the FLV file in FLV_Sample.Hex FLV_Sample_Hex). The original flv file is too large). For the first AVPacket (obtained from av_read_frame), got_picture_ptr from 'avcodec_decode_video2' is zero but old 'avcodec_decode_video' gives 296(Am attaching the entire AVPacket data obtained and the outputs obtained from the two functions in the file FFMPEG_Decoding_Packet_Info.txt FFMPEG_Decoding_Packet_Info). Continuing on, the new 'avcodec_decode_video2' keeps giving 'Zero' till the 23rd Packet where it gives 1. So its not like avcodec_decode_video2 keeps giving zero. My main dilemma is that am not sure if this difference in behaviour is due to the changes in 'avcodec_decode_video2' or any errors that I have made in using the Decoder. I have put a snippet of the code that am using to use the decoder below. Any suggestions will be helpful.
AVFormatContext *pFormatCtx;
AVCodecContext *pCodecCtx;
AVCodec *pCodec;
AVFrame *pFrameRGB;
#if FFMPEG_2_8
avformat_open_input(&pFormatCtx, strFileName, NULL, NULL) ;
#else
av_open_input_file(&pFormatCtx, strFileName, NULL, 0, NULL) ;
#endif //FFMPEG_2_8
size_t videoStream=pFormatCtx->nb_streams;
bool streamFound = false ;
for(size_t i=0; i<pFormatCtx->nb_streams; i++)
{
#if FFMPEG_2_8
if(pFormatCtx->streams[i]->codec->codec_type==AVMEDIA_TYPE_VIDEO)
#else
if(pFormatCtx->streams[i]->codec->codec_type==CODEC_TYPE_VIDEO)
#endif //FFMPEG_2_8
{
videoStream = i;
streamFound = true ;
break;
}
}
if(streamFound)
{
pCodecCtx=pFormatCtx->streams[videoStream]->codec;
// Find the decoder for the video stream
pCodec=avcodec_find_decoder(pCodecCtx->codec_id);
if(pCodec==NULL)
return false; // Codec not found
// Open codec
#if FFMPEG_2_8
if(avcodec_open2(pCodecCtx, pCodec,NULL)<0)
#else
if(avcodec_open(pCodecCtx, pCodec)<0)
#endif //FFMPEG_2_8
{
return false; // Could not open codec
}
#if FFMPEG_2_8
pFrameRGB=av_frame_alloc() ;
#else
pFrameRGB=avcodec_alloc_frame();
#endif //FFMPEG_2_8
if(pFrameRGB==NULL)
return false; //No Memory
while(true)
{
AVPacket packet ;
if (av_read_frame(pFormatCtx, &packet) < 0)
{
break ;
}
int frameFinished;
if (packet.stream_index == videoStream)
{
#if FFMPEG_2_8
avcodec_decode_video2(pCodecCtx, pFrameRGB, &frameFinished, &packet);
#else
avcodec_decode_video(pCodecCtx, pFrameRGB, &frameFinished, packet.data, packet.size);
#endif //FFMPEG_2_8
}
if(frameFinished !=0)
{
break ;
}
}
}
I do have almost same implementation, working on latest 2.8.1 version. I have no idea about old version(0.5), but your implementation for new version seem to be fine.
I guess one thing about "got_picture_ptr ", not sure though. JFYI, decode order and display order are different in H264. May be earlier version of FFMPEG used to give out the pictures in decode order, rather than display order. In such case, you will see non-zero value for every packet decode, beginning from the first packet.
At some point, ffmpeg would have corrected this to give out the pictures in display order. Hence, you may not observe non-zero value from the very first packet decode.
I guess your application is working fine, irrespective of this difference.
Related
I'm wanting to read the audio out of a video file as fast as possible, using the libav libraries. It's all working fine, but it seems like it could be faster.
To get a performance baseline, I ran this ffmpeg command and timed it:
time ffmpeg -threads 1 -i file -map 0:a:0 -f null -
On a test file (a 2.5gb 2hr .MOV with pcm_s16be audio) this comes out to about 1.35 seconds on my M1 Macbook Pro.
On the other hand, this minimal C code (based on FFmpeg's "Demuxing and decoding" example) is consistently around 0.3 seconds slower.
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
static int decode_packet(AVCodecContext *dec, const AVPacket *pkt, AVFrame *frame)
{
int ret = 0;
// submit the packet to the decoder
ret = avcodec_send_packet(dec, pkt);
// get all the available frames from the decoder
while (ret >= 0) {
ret = avcodec_receive_frame(dec, frame);
av_frame_unref(frame);
}
return 0;
}
int main (int argc, char **argv)
{
int ret = 0;
AVFormatContext *fmt_ctx = NULL;
AVCodecContext *dec_ctx = NULL;
AVFrame *frame = NULL;
AVPacket *pkt = NULL;
if (argc != 3) {
exit(1);
}
int stream_idx = atoi(argv[2]);
/* open input file, and allocate format context */
avformat_open_input(&fmt_ctx, argv[1], NULL, NULL);
/* get the stream */
AVStream *st = fmt_ctx->streams[stream_idx];
/* find a decoder for the stream */
AVCodec *dec = avcodec_find_decoder(st->codecpar->codec_id);
/* allocate a codec context for the decoder */
dec_ctx = avcodec_alloc_context3(dec);
/* copy codec parameters from input stream to output codec context */
avcodec_parameters_to_context(dec_ctx, st->codecpar);
/* init the decoder */
avcodec_open2(dec_ctx, dec, NULL);
/* allocate frame and packet structs */
frame = av_frame_alloc();
pkt = av_packet_alloc();
/* read frames from the specified stream */
while (av_read_frame(fmt_ctx, pkt) >= 0) {
if (pkt->stream_index == stream_idx)
ret = decode_packet(dec_ctx, pkt, frame);
av_packet_unref(pkt);
if (ret < 0)
break;
}
/* flush the decoders */
decode_packet(dec_ctx, NULL, frame);
return ret < 0;
}
I tried measuring parts of this program to see if it was spending a lot of time in the setup, but it's not – at least 1.5 seconds of the runtime is the loop where it's reading frames.
So I took some flamegraph recordings (using cargo-flamegraph) and ran each a few times to make sure the timing was consistent. There's probably some overhead since both were consistently higher than running normally, but they still have the ~0.3 second delta.
# 1.812 total
time sudo flamegraph ./minimal file 1
# 1.542 total
time sudo flamegraph ffmpeg -threads 1 -i file -map 0:a:0 -f null - 2>&1
Here are the flamegraphs stacked up, scaled so that the faster one is only 85% as wide as the slower one. (click for larger)
The interesting thing that stands out to me is how long is spent on read in the minimal example vs. ffmpeg:
The time spent on lseek is also a lot longer in the minimal program – it's plainly visible in that flamegraph, but in the ffmpeg flamegraph, lseek is a single pixel wide.
What's causing this discrepancy? Is ffmpeg actually doing less work than I think it is here? Is the minimal code doing something naive? Is there some buffering or other I/O optimizations that ffmpeg has enabled?
How can I shave 0.3 seconds off of the minimal example's runtime?
The difference is that ffmpeg, when run with the -map flag, is explicitly setting the AVDISCARD_ALL flag on the streams that were going to be ignored. The packets for those streams still get read from disk, but with this flag set, they never make it into av_read_frame (with the mov demuxer, at least).
In the example code, by contrast, this while loop receives every packet from every stream, and only drops the packets after they've been (wastefully) passed through av_read_frame.
/* read frames from the specified stream */
while (av_read_frame(fmt_ctx, pkt) >= 0) {
if (pkt->stream_index == stream_idx)
ret = decode_packet(dec_ctx, pkt, frame);
av_packet_unref(pkt);
if (ret < 0)
break;
}
I changed the program to set the discard flag on the unused streams:
// ...
/* open input file, and allocate format context */
avformat_open_input(&fmt_ctx, argv[1], NULL, NULL);
/* get the stream */
AVStream *st = fmt_ctx->streams[stream_idx];
/* discard packets from other streams */
for(int i = 0; i < fmt_ctx->nb_streams; i++) {
fmt_ctx->streams[i]->discard = AVDISCARD_ALL;
}
st->discard = AVDISCARD_DEFAULT;
// ...
With that change in place, it gives about a ~1.8x speedup on the same test file, after the cache is warmed up.
Minimal example, without discard 1.593s
ffmpeg with -map 0:a:0 1.404s
Minimal example, with discard 0.898s
I use dll of ffmpeg read the USB camera, display the picture on the control, and save the image to the file. At first, the reading is normal. The sound and image can be saved. But after reading for a few minutes, data can not be read, av_read_frame return error code is - 5, and the camera is automatically turned off. Direct use ffmpeg.exe when I go to read the camera and save the file, it won't be a problem for hours. Anybody can you tell me how to return - 5 error code and how to deal with it?
ffmpeg version is 4.2.3
The code like this:
while (1)
{
if ((ret = av_read_frame(m_pVidFmtCtx, dec_pkt)) == 0)
{
ret = avcodec_send_packet(m_pVidFmtCtx->streams[dec_pkt->stream_index]->codec, dec_pkt);
while(ret >= 0)
{
ret = avcodec_receive_frame(m_pVidFmtCtx->streams[dec_pkt->stream_index]->codec, pframe);
sws_scale(img_rgb_ctx, pframe->data, pframe->linesize, 0, cy, dstData, dstLinesize);
var bitmap = new Bitmap(dstWidth, dstHeight, dstLinesize[0], PixelFormat.Format24bppRgb, convertedFrameBufferPtr);
}
}
else
{
printf("Error code: %d", ret); // Here ret is -5
}
}
FFmpeg error codes based on standard POSIX error codes used on errno. But FFmpeg directly returns these with AVERROR macro and this effectively makes it a negative number.
So error code -5 is AVERROR(EIO). Obviously something seriously wrong with Input/Output.
I have a strange problem in my C/C++ FFmpeg transcoder, which takes an input MP4 (varying input codecs) and produces and output MP4 (x264, baseline & AAC LC #44100 sample rate with libfdk_aac):
The resulting mp4 video has fine images (x264) and the audio (AAC LC) works fine as well, but is only played until exactly the half of the video.
The audio is not slowed down, not stretched and doesn't stutter. It just stops right in the middle of the video.
One hint may be that the input file has a sample rate of 22050 and 22050/44100 is 0.5, but I really don't get why this would make the sound just stop after half the time. I'd expect such an error leading to sound being at the wrong speed. Everything works just fine if I don't try to enforce 44100 and instead just use the incoming sample_rate.
Another guess would be that the pts calculation doesn't work. But the audio sounds just fine (until it stops) and I do exactly the same for the video part, where it works flawlessly. "Exactly", as in the same code, but "audio"-variables replaced with "video"-variables.
FFmpeg reports no errors during the whole process. I also flush the decoders/encoders/interleaved_writing after all the package reading from the input is done. It works well for the video so I doubt there is much wrong with my general approach.
Here are the functions of my code (stripped off the error handling & other class stuff):
AudioCodecContext Setup
outContext->_audioCodec = avcodec_find_encoder(outContext->_audioTargetCodecID);
outContext->_audioStream =
avformat_new_stream(outContext->_formatContext, outContext->_audioCodec);
outContext->_audioCodecContext = outContext->_audioStream->codec;
outContext->_audioCodecContext->channels = 2;
outContext->_audioCodecContext->channel_layout = av_get_default_channel_layout(2);
outContext->_audioCodecContext->sample_rate = 44100;
outContext->_audioCodecContext->sample_fmt = outContext->_audioCodec->sample_fmts[0];
outContext->_audioCodecContext->bit_rate = 128000;
outContext->_audioCodecContext->strict_std_compliance = FF_COMPLIANCE_EXPERIMENTAL;
outContext->_audioCodecContext->time_base =
(AVRational){1, outContext->_audioCodecContext->sample_rate};
outContext->_audioStream->time_base = (AVRational){1, outContext->_audioCodecContext->sample_rate};
int retVal = avcodec_open2(outContext->_audioCodecContext, outContext->_audioCodec, NULL);
Resampler Setup
outContext->_audioResamplerContext =
swr_alloc_set_opts( NULL, outContext->_audioCodecContext->channel_layout,
outContext->_audioCodecContext->sample_fmt,
outContext->_audioCodecContext->sample_rate,
_inputContext._audioCodecContext->channel_layout,
_inputContext._audioCodecContext->sample_fmt,
_inputContext._audioCodecContext->sample_rate,
0, NULL);
int retVal = swr_init(outContext->_audioResamplerContext);
Decoding
decodedBytes = avcodec_decode_audio4( _inputContext._audioCodecContext,
_inputContext._audioTempFrame,
&p_gotAudioFrame, &_inputContext._currentPacket);
Converting (only if decoding produced a frame, of course)
int retVal = swr_convert( outContext->_audioResamplerContext,
outContext->_audioConvertedFrame->data,
outContext->_audioConvertedFrame->nb_samples,
(const uint8_t**)_inputContext._audioTempFrame->data,
_inputContext._audioTempFrame->nb_samples);
Encoding (only if decoding produced a frame, of course)
outContext->_audioConvertedFrame->pts =
av_frame_get_best_effort_timestamp(_inputContext._audioTempFrame);
// Init the new packet
av_init_packet(&outContext->_audioPacket);
outContext->_audioPacket.data = NULL;
outContext->_audioPacket.size = 0;
// Encode
int retVal = avcodec_encode_audio2( outContext->_audioCodecContext,
&outContext->_audioPacket,
outContext->_audioConvertedFrame,
&p_gotPacket);
// Set pts/dts time stamps for writing interleaved
av_packet_rescale_ts( &outContext->_audioPacket,
outContext->_audioCodecContext->time_base,
outContext->_audioStream->time_base);
outContext->_audioPacket.stream_index = outContext->_audioStream->index;
Writing (only if encoding produced a packet, of course)
int retVal = av_interleaved_write_frame(outContext->_formatContext, &outContext->_audioPacket);
I am quite out of ideas about what would cause such a behaviour.
So, I finally managed to figure things out myself.
The problem was indeed in the difference of the sample_rate.
You'd assume that a call to swr_convert() would give you all the samples you need for converting the audio frame when called like I did.
Of course, that would be too easy.
Instead, you need to call swr_convert (potentially) multiple times per frame and buffer its output, if required. Then you need to grab a single frame from the buffer and that is what you will have to encode.
Here is my new convertAudioFrame function:
// Calculate number of output samples
int numOutputSamples = av_rescale_rnd(
swr_get_delay(outContext->_audioResamplerContext, _inputContext._audioCodecContext->sample_rate)
+ _inputContext._audioTempFrame->nb_samples,
outContext->_audioCodecContext->sample_rate,
_inputContext._audioCodecContext->sample_rate,
AV_ROUND_UP);
if (numOutputSamples == 0)
{
return;
}
uint8_t* tempSamples;
av_samples_alloc( &tempSamples, NULL,
outContext->_audioCodecContext->channels, numOutputSamples,
outContext->_audioCodecContext->sample_fmt, 0);
int retVal = swr_convert( outContext->_audioResamplerContext,
&tempSamples,
numOutputSamples,
(const uint8_t**)_inputContext._audioTempFrame->data,
_inputContext._audioTempFrame->nb_samples);
// Write to audio fifo
if (retVal > 0)
{
retVal = av_audio_fifo_write(outContext->_audioFifo, (void**)&tempSamples, retVal);
}
av_freep(&tempSamples);
// Get a frame from audio fifo
int samplesAvailable = av_audio_fifo_size(outContext->_audioFifo);
if (samplesAvailable > 0)
{
retVal = av_audio_fifo_read(outContext->_audioFifo,
(void**)outContext->_audioConvertedFrame->data,
outContext->_audioCodecContext->frame_size);
// We got a frame, so also set its pts
if (retVal > 0)
{
p_gotConvertedFrame = 1;
if (_inputContext._audioTempFrame->pts != AV_NOPTS_VALUE)
{
outContext->_audioConvertedFrame->pts = _inputContext._audioTempFrame->pts;
}
else if (_inputContext._audioTempFrame->pkt_pts != AV_NOPTS_VALUE)
{
outContext->_audioConvertedFrame->pts = _inputContext._audioTempFrame->pkt_pts;
}
}
}
This function I basically call until there are no more frame in the audio fifo buffer.
So, the audio was only half as long because I only encoded as many frames as I decoded. Where I actually needed to encode 2 times as many frames due to 2 times the sample_rate.
I want to programmatically convert a mp4 video file (with h264 codec) to single RGB images. With the command line this looks like:
ffmpeg -i test1080.mp4 -r 30 image-%3d.jpg
Using this command produces a nice set of pictures. But when I try to programmatically do the same some images (probably B and P frames) look odd (e.g. have kind of distorted areas with difference information etc.). The reading and conversion code is as follow:
AVFrame *frame = avcodec_alloc_frame();
AVFrame *frameRGB = avcodec_alloc_frame();
AVPacket packet;
int buffer_size=avpicture_get_size(PIX_FMT_RGB24, m_codecCtx->width,
m_codecCtx->height);
uint8_t *buffer = new uint8_t[buffer_size];
avpicture_fill((AVPicture *)frameRGB, buffer, PIX_FMT_RGB24,
m_codecCtx->width, m_codecCtx->height);
while (true)
{
// Read one packet into `packet`
if (av_read_frame(m_formatCtx, &packet) < 0) {
break; // End of stream. Done decoding.
}
if (avcodec_decode_video(m_codecCtx, frame, &buffer_size, packet.data, packet.size) < 1) {
break; // Error in decoding
}
if (!buffer_size) {
break;
}
// Convert
img_convert((AVPicture *)frameRGB, PIX_FMT_RGB24, (AVPicture*)frame,
m_codecCtx->pix_fmt, m_codecCtx->width, m_codecCtx->height);
// RGB data is now available in frameRGB for further processing
}
How can I convert the video stream so that each final image shows all image data, so that information from B and P frames is included in all frames?
[EDIT:] A sample image showing the artifacts is here: http://imageshack.us/photo/my-images/201/sampleq.jpg/
Regards,
If the third argument of avcodec_decode_video returns a null value, it does not mean the error. This means that the frame is not yet ready. You need to continue to read frames until the value becomes nonzero.
if (!buffer_size) {
continue;
}
UPD
Try to add the check and display only the key frames, it will help isolate the problem.
while (true)
{
// Read one packet into `packet`
if (av_read_frame(m_formatCtx, &packet) < 0) {
break; // End of stream. Done decoding.
}
if (avcodec_decode_video(m_codecCtx, frame, &buffer_size,
packet.data, packet.size) < 1)
{
break; // Error in decoding
}
if (!buffer_size) {
continue; // <-- It's important!
}
// check for key frame
if (packet.flags & AV_PKT_FLAG_KEY)
{
// Convert
img_convert((AVPicture *)frameRGB, PIX_FMT_RGB24, (AVPicture*)frame,
m_codecCtx->pix_fmt, m_codecCtx->width, m_codecCtx->height);
}
}
Recently I upgraded ffmpeg from 0.9 to 1.0 (tested on Win7x64 and on iOS), and now avcodec_decode_video2 seagfaults. Long story short: the crash occurs every time the video dimensions change (eg. from 320x240 to 160x120 or vice versa).
I receive mpeg4 video stream from some proprietary source and decode it like this:
// once, during initialization:
AVCodec *codec_ = avcodec_find_decoder(CODEC_ID_MPEG4);
AVCodecContext ctx_ = avcodec_alloc_context3(codec_);
avcodec_open2(ctx_, codec_, 0);
AVPacket packet_;
av_init_packet(&packet_);
AVFrame picture_ = avcodec_alloc_frame();
// on every frame:
int got_picture;
packet_.size = size;
packet_.data = (uint8_t *)buffer;
avcodec_decode_video2(ctx_, picture_, &got_picture, &packet_);
Again, all the above had worked flawlessly until I upgraded to 1.0. Now every time the frame dimensions change - avcodec_decode_video2 crashes. Note that I don't assign width/height in AVCodecContext - neither in the beginning, nor when the stream changes - can it be the reason?
I'd appreciate any idea!
Update: setting ctx_.width and ctx_.height doesn't help.
Update2: just before the crash I get the following log messages:
mpeg4, level 24: "Found 2 unreleased buffers!".
level 8: "Assertion i < avci->buffer_count failed at libavcodec/utils.c:603"
Update3 upgrading to 1.1.2 fixed this crash. The decoder is able again to cope with dimensions change on the fly.
You can try to fill the AVPacket::side_data. If you change the frame size, codec receives information from it (see libavcodec/utils.c apply_param_change function)
This structure can be filled as follows:
int my_ff_add_param_change(AVPacket *pkt, int32_t width, int32_t height)
{
uint32_t flags = 0;
int size = 4 * 3;
uint8_t *data;
if (!pkt)
return AVERROR(EINVAL);
flags = AV_SIDE_DATA_PARAM_CHANGE_DIMENSIONS;
data = av_packet_new_side_data(pkt, AV_PKT_DATA_PARAM_CHANGE, size);
if (!data)
return AVERROR(ENOMEM);
((uint32_t*)data)[0] = flags;
((uint32_t*)data)[1] = width;
((uint32_t*)data)[2] = height;
return 0;
}
You need to call this function every time the size changes.
I think this feature has appeared recently. I didn't know about it until I looked new ffmpeg sources.
UPD
As you write, the easiest method to solve the problem is to perform codec restart. Just call avcodec_close / avcodec_open2
I just ran into same issue when my frames were changing size on the fly. However, calling avcodec_close/avcodec_open2 is superflous. A cleaner way is to just reset your AVPacket data structure before the call to avcodec_decode_video2. Here it is the code:
av_init_packet(&packet_)
The key here is that this method resets the all of the values of AVPacket to defaults. Check docs for more info.