Opening an HLS stream using avformat_open_input retrieves data from all streams, but I would like to retrieve data from only some of them. Is that possible?
Consider the following MWE:
#include <stdio.h>
#include <libavformat/avformat.h>

int main(int argc, char **argv)
{
    AVFormatContext *inFmtCtx = NULL;
    AVPacket packet;
    const char *inUrl;
    int ret;

    if (argc < 2) { return -1; }
    inUrl = argv[1];

    if ((ret = avformat_open_input(&inFmtCtx, inUrl, NULL, NULL)) < 0)
        goto end;
    if ((ret = avformat_find_stream_info(inFmtCtx, NULL)) < 0)
        goto end;

    while (1) {
        ret = av_read_frame(inFmtCtx, &packet);
        if (ret < 0) break;
        // # Placeholder: Do Something # //
        printf("%i, ", packet.stream_index);
        av_packet_unref(&packet); // release the packet before reading the next one
    }

end:
    avformat_close_input(&inFmtCtx);
    if (ret < 0 && ret != AVERROR_EOF) {
        fprintf(stderr, "Error occurred: %s\n", av_err2str(ret));
        return 1;
    }
    return 0;
}
Using the example HLS URL "" (might be geolocked), the printf prints values between 0 and 9, indicating that all 10 streams (5 video, 5 audio) are retrieved.
Of course, one could discard all but the selected ones after they have been read, e.g. using
if (packet.stream_index != selectedVideoStreamId && packet.stream_index != selectedAudioStreamId) {
    av_packet_unref(&packet);
    continue;
}
But can the input context / ffmpeg be configured to only retrieve the selected streams, i.e. not downloading all the data that is not needed (the unselected streams)?
You can disable an HLS variant by discarding all streams that belong to it:
if ((ret = avformat_open_input(&inFmtCtx, inUrl, NULL, NULL)) < 0)
    goto end;

// disable all but the last stream
for (unsigned int i = 0; i + 1 < inFmtCtx->nb_streams; ++i) {
    AVStream *st = inFmtCtx->streams[i];
    st->discard = AVDISCARD_ALL;
}

if ((ret = avformat_find_stream_info(inFmtCtx, NULL)) < 0)
    goto end;
Reading your stream for a few seconds yields:
stream=0 pkt_count=0
stream=1 pkt_count=0
stream=2 pkt_count=0
stream=3 pkt_count=0
stream=4 pkt_count=0
stream=5 pkt_count=0
stream=6 pkt_count=0
stream=7 pkt_count=0
stream=8 pkt_count=998
stream=9 pkt_count=937
As you can see, it reads two streams, corresponding to the multiplexed audio/video streams in the last playlist, even though only a single stream was enabled. If you need finer granularity than that, you'll have to modify the HLS demuxer.
I'm using ffmpeg to read a UDP stream (containing only video) and decode its frames, which I would then like to encode again. However, somewhere in the demuxing or decoding I get blurry pictures, especially in the lower part.
I have a video player that also uses ffmpeg and displays the video perfectly. I have looked through its code, but I don't see any differences.
In the log I see things like
Invalid NAL unit 8, skipping.
nal_unit_type: 1(Coded slice of a non-IDR picture), nal_ref_idc: 3
Invalid NAL unit 7, skipping.
bytestream overread td
error while decoding MB 109 49, bytestream td
The main parts of the code look like this:
AVFormatContext *fmt_ctx = 0;
AVDictionary *options = 0;
av_dict_set(&options, "analyzeduration", "500000", NULL);
av_dict_set(&options, "probesize", "500000", NULL);
char* url = "udp://";
avformat_open_input(&fmt_ctx, url, 0, &options);
avformat_find_stream_info(fmt_ctx, &options);
int nRet = 0;
av_dump_format(fmt_ctx, 0, url, 0);
AVStream *pStream = fmt_ctx->streams[0];
AVCodecID nCodecid = pStream->codec->codec_id;
AVCodec* pCodec = avcodec_find_decoder(nCodecid);
AVCodecContext* pCodecCtx = pStream->codec;
nRet = avcodec_open2(pCodecCtx, pCodec, NULL);
int nInH = pStream->codec->height;
int nInW = pStream->codec->width;
int nOutW = nInW / 4;
int nOutH = nInH / 4;
SwsContext* pSwsCtx = sws_getContext(nInW, nInH, AV_PIX_FMT_YUV420P,
nOutW, nOutH, AV_PIX_FMT_RGB24,
m_pFilmWdg->m_img = QImage(nOutW, nOutH, QImage::Format_RGB888);
int linesizes[4];
av_image_fill_linesizes(linesizes, AV_PIX_FMT_RGB24, nOutW);
for (;;)
{
    av_init_packet(&pkt);
    pkt.data = NULL;
    pkt.size = 0;
    nRet = av_read_frame(fmt_ctx, &pkt);
    nRet = avcodec_send_packet(pCodecCtx, &pkt);
    AVFrame* picture = av_frame_alloc();
    nRet = avcodec_receive_frame(pCodecCtx, picture);
    if (AVERROR(EAGAIN) == nRet)
        continue; // the decoder needs more input before it can output a frame
    uint8_t* p[] = { m_pFilmWdg->m_img.bits() };
    nRet = sws_scale(pSwsCtx, picture->data, picture->linesize, 0, nInH, p, linesizes);
}
I found the problem after a lot of debugging. The issue was that UDP packets were lost or damaged while decoding was in progress. I split the loop into two loops running in different threads, a DemuxLoop and a DecodeLoop, something like:
void VideoDecode::DemuxLoop()
{
    // runs in its own thread so that reading is never stalled by decoding
    int nRet = av_read_frame(fmt_ctx, &pkt);
    // put pkt in queue
}

void VideoDecode::DecodeLoop()
{
    // pick pkt from queue
    int nRet = avcodec_send_packet(pCodecCtx, &pkt);
    AVFrame* picture = av_frame_alloc();
    nRet = avcodec_receive_frame(pCodecCtx, picture);
}
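The "put in queue" / "pick from queue" steps are not shown in the answer. As a rough sketch of what such a hand-off could look like, here is a small bounded queue guarded by a mutex and two condition variables. The names PacketQueue, queue_push, queue_pop and the capacity of 64 are made up for illustration and are not part of FFmpeg; the queue stores plain pointers, which in the scenario above would be heap-allocated packets.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define QUEUE_CAPACITY 64 /* hypothetical size; tune it for your bitrate */

typedef struct {
    void *items[QUEUE_CAPACITY]; /* ring buffer of packet pointers */
    size_t head, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
} PacketQueue;

static void queue_init(PacketQueue *q)
{
    assert(q != NULL);
    q->head = q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

/* Called by the demux thread; blocks while the queue is full. */
static void queue_push(PacketQueue *q, void *item)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAPACITY)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->items[(q->head + q->count) % QUEUE_CAPACITY] = item;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Called by the decode thread; blocks while the queue is empty. */
static void *queue_pop(PacketQueue *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    void *item = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAPACITY;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return item;
}
```

Note that queue_push blocks when the queue is full, which would stall the demux thread again; for a lossy UDP source the capacity should be generous enough that this rarely happens, or the oldest entry could be dropped instead.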
I have a set of JPEG frames which I am muxing into an avi, which gives me a mjpeg video. This is the command I run on the console:
ffmpeg -y -start_number 0 -i %06d.JPEG -codec copy vid.avi
When I try to demux the video using ffmpeg C api, I get frames which are slightly different in values. Demuxing code looks something like this:
AVFormatContext* fmt_ctx = NULL;
AVCodecContext* cdc_ctx = NULL;
AVCodec* vid_cdc = NULL;
int ret;
unsigned int height, width;
// read_nframes is the number of frames to read
output_arr = new unsigned char [height * width * 3 *
sizeof(unsigned char) * read_nframes];
avcodec_open2(cdc_ctx, vid_cdc, NULL);
int num_bytes;
uint8_t* buffer = NULL;
const AVPixelFormat out_format = AV_PIX_FMT_RGB24;
num_bytes = av_image_get_buffer_size(out_format, width, height, 1);
buffer = (uint8_t*)av_malloc(num_bytes * sizeof(uint8_t));
AVFrame* vid_frame = NULL;
vid_frame = av_frame_alloc();
AVFrame* conv_frame = NULL;
conv_frame = av_frame_alloc();
av_image_fill_arrays(conv_frame->data, conv_frame->linesize, buffer,
out_format, width, height, 1);
struct SwsContext *sws_ctx = NULL;
sws_ctx = sws_getContext(width, height, cdc_ctx->pix_fmt,
width, height, out_format,
int frame_num = 0;
AVPacket vid_pckt;
while (av_read_frame(fmt_ctx, &vid_pckt) >= 0) {
    ret = avcodec_send_packet(cdc_ctx, &vid_pckt);
    if (ret < 0)
        break;
    ret = avcodec_receive_frame(cdc_ctx, vid_frame);
    if (ret < 0 && ret != AVERROR(EAGAIN) && ret != AVERROR_EOF)
        break;
    if (ret >= 0) {
        // convert image from native format to packed RGB24
        sws_scale(sws_ctx, vid_frame->data,
                  vid_frame->linesize, 0, vid_frame->height,
                  conv_frame->data, conv_frame->linesize);
        // destination planes (R, then G, then B) for this frame
        unsigned char* r_ptr = output_arr +
            (height * width * sizeof(unsigned char) * 3 * frame_num);
        unsigned char* g_ptr = r_ptr + (height * width * sizeof(unsigned char));
        unsigned char* b_ptr = g_ptr + (height * width * sizeof(unsigned char));
        unsigned int pxl_i = 0;
        for (unsigned int r = 0; r < height; ++r) {
            uint8_t* avframe_r = conv_frame->data[0] + r*conv_frame->linesize[0];
            for (unsigned int c = 0; c < width; ++c) {
                r_ptr[pxl_i] = avframe_r[0];
                g_ptr[pxl_i] = avframe_r[1];
                b_ptr[pxl_i] = avframe_r[2];
                avframe_r += 3;
                ++pxl_i;
            }
        }
        ++frame_num;
        if (frame_num >= read_nframes)
            break;
    }
}
In my experience around two-thirds of the pixel values are different, each by ±1 (in a range of [0,255]). I am wondering whether this is due to some decoding scheme FFmpeg uses for reading JPEG frames. I tried encoding and decoding PNG frames, and that works perfectly fine. I am sure this has something to do with the libav decoding process, because the MD5 values are consistent between the images and the video:
ffmpeg -i %06d.JPEG -f framemd5 -
ffmpeg -i vid.avi -f framemd5 -
In short, my goal is to get the same pixel-by-pixel values for each JPEG frame as I would have gotten if I were reading the JPEG images directly. Here is the stand-alone bitbucket code I used. It includes cmake files to build the code, and a couple of JPEG frames with the converted AVI file to test this problem (give '--filetype png' to test the PNG decoding).
I am looking to copy an AVFrame into an array where pixels are stored one channel at a time in a row-major order.
I am using FFMPEG's api to read frames from a video. I have used avcodec_decode_video2 to fetch each frame as an AVFrame as follows:
AVFormatContext* fmt_ctx = NULL;
avformat_open_input(&fmt_ctx, filepath, NULL, NULL);
int video_stream_idx; // stores the stream index for the video
AVFrame* vid_frame = NULL;
vid_frame = av_frame_alloc();
AVPacket vid_pckt;
int frame_finish;
while (av_read_frame(fmt_ctx, &vid_pckt) >= 0) {
    if (vid_pckt.stream_index == video_stream_idx) {
        avcodec_decode_video2(cdc_ctx, vid_frame, &frame_finish, &vid_pckt);
        if (frame_finish) {
            /* perform conversion */
        }
    }
}
The destination array looks like this:
unsigned char* frame_arr = new unsigned char [cdc_ctx->width * cdc_ctx->height * 3];
I need to copy all of vid_frame into frame_arr, where the range of pixel values should be [0, 255]. The problem is that the array needs to store the frame in row major order, one channel at a time, i.e. R11, R12, ... R21, R22, ... G11, G12, ... G21, G22, ... B11, B12, ... B21, B22, ... (I have used the notation [color channel][row index][column index], i.e. G21 is the green channel value of pixel at row 2, column 1). I have had a look at sws_scale, but I don't understand it enough to figure out whether that function is capable of doing such a conversion. Can somebody help!! :)
The layout you call "one channel at a time" is known as planar (the opposite layout is called packed). Almost every pixel format is stored in row order.
The problem here is that the input format may vary, and all inputs should be converted to one format. That's what sws_scale() does.
However, there is no such planar RGB format in the ffmpeg libs yet. You would have to write your own pixel format description into the ffmpeg source file libavutil/pixdesc.c and rebuild the libs.
Alternatively, you can convert the frame to AV_PIX_FMT_GBRP, which is the closest format to what you want. AV_PIX_FMT_GBRP is a planar format in which the green channel comes first, blue second and red last. You can then rearrange these channels yourself.
// Create a SwsContext first:
SwsContext* sws_ctx = sws_getContext(cdc_ctx->width, cdc_ctx->height, cdc_ctx->pix_fmt, cdc_ctx->width, cdc_ctx->height, AV_PIX_FMT_GBRP, 0, 0, 0, 0);
// alloc some new space for storing converted frame
AVFrame* gbr_frame = av_frame_alloc();
gbr_frame->format = AV_PIX_FMT_GBRP;
gbr_frame->width = cdc_ctx->width;
gbr_frame->height = cdc_ctx->height;
av_frame_get_buffer(gbr_frame, 32);
while (av_read_frame(fmt_ctx, &vid_pckt) >= 0) {
    ret = avcodec_send_packet(cdc_ctx, &vid_pckt);
    // In particular, we don't expect AVERROR(EAGAIN), because we read all
    // decoded frames with avcodec_receive_frame() until done.
    if (ret < 0)
        break;
    ret = avcodec_receive_frame(cdc_ctx, vid_frame);
    if (ret < 0 && ret != AVERROR(EAGAIN) && ret != AVERROR_EOF)
        break;
    if (ret >= 0) {
        // convert image from native format to planar GBR
        sws_scale(sws_ctx, vid_frame->data,
                  vid_frame->linesize, 0, vid_frame->height,
                  gbr_frame->data, gbr_frame->linesize);
        // rearrange gbr channels in gbr_frame as you like
        // g channel is gbr_frame->data[0]
        // b channel is gbr_frame->data[1]
        // r channel is gbr_frame->data[2]
        // ......
    }
}
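The "rearrange" step is left open above. One way it might be done, sketched in plain C without any FFmpeg calls: copy each plane row by row into the destination array in R, G, B order, skipping the per-row padding that linesize may include. The function names copy_plane and gbrp_to_planar_rgb are made up for this sketch; data and linesize are assumed to be laid out like gbr_frame->data and gbr_frame->linesize (plane 0 = G, 1 = B, 2 = R).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy one plane row by row, dropping any per-row padding
 * (linesize may be larger than width). */
static void copy_plane(uint8_t *dst, const uint8_t *src, int linesize,
                       int width, int height)
{
    assert(linesize >= width);
    for (int row = 0; row < height; ++row)
        memcpy(dst + (size_t)row * width,
               src + (size_t)row * linesize,
               (size_t)width);
}

/* data[0..2] and linesize[0..2] follow the GBRP plane order: G, B, R.
 * dst must hold width * height * 3 bytes; it receives all of R,
 * then all of G, then all of B, each in row-major order. */
static void gbrp_to_planar_rgb(uint8_t *dst, uint8_t *const data[3],
                               const int linesize[3], int width, int height)
{
    size_t plane = (size_t)width * height;
    copy_plane(dst,             data[2], linesize[2], width, height); /* R */
    copy_plane(dst + plane,     data[0], linesize[0], width, height); /* G */
    copy_plane(dst + 2 * plane, data[1], linesize[1], width, height); /* B */
}
```

With the variables from the question, the call would presumably look like gbrp_to_planar_rgb(frame_arr, gbr_frame->data, gbr_frame->linesize, cdc_ctx->width, cdc_ctx->height).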
I am currently working on capturing audio and streaming it to an RTMP server. I work under macOS (in Xcode), so for capturing the audio sample buffer I use the AVFoundation framework. But for encoding and streaming I need to use the ffmpeg API and the libfaac encoder. So the output format must be AAC (to support stream playback on iOS devices).
I have run into the following problem: the audio capture device (in my case a Logitech camera) gives me a sample buffer with 512 LPCM samples, and I can select an input sample rate of 16000, 24000, 36000 or 48000 Hz. When I give these 512 samples to the AAC encoder (configured for the appropriate sample rate), I hear slow and jerky audio (it sounds like a piece of silence after each frame).
I figured out (maybe I am wrong) that the libfaac encoder accepts only audio frames with 1024 samples. When I set the input sample rate to 24000 and resample the input sample buffer to 48000 before encoding, I obtain 1024 resampled samples. After encoding these 1024 samples to AAC, I hear proper sound on the output. But my webcam produces 512 samples per buffer for any input sample rate, while the output sample rate must be 48000 Hz. So I need to resample in any case, and I will not obtain exactly 1024 samples in the buffer after resampling.
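To make the arithmetic concrete, here is a small sketch of the sample count after resampling. It mirrors the rounding-up calculation that the code further down performs with av_rescale_rnd, but it is plain C and ignores any delay buffered inside the resampler; the function name resampled_count is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Number of output samples produced when in_samples at in_rate are
 * converted to out_rate, rounded up. Resampler delay is ignored. */
static int64_t resampled_count(int64_t in_samples, int in_rate, int out_rate)
{
    assert(in_rate > 0 && out_rate > 0);
    return (in_samples * out_rate + in_rate - 1) / in_rate;
}
```

For a 512-sample buffer and a 48000 Hz output, this gives 1536 samples at 16000 Hz input, 1024 at 24000 Hz, 683 at 36000 Hz and 512 at 48000 Hz, so only the 24000 Hz setting happens to match the 1024-sample frame.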
Is there a way to solve this problem within ffmpeg-API functionality?
I would be grateful for any help.
I guess that I can accumulate resampled buffers until the count of samples reaches 1024 and then encode them, but this is a stream, so there will be trouble with the resulting timestamps and with other input devices, and such a solution is not suitable.
The current issue came out of the problem described in [question]: How to fill audio AVFrame (ffmpeg) with the data obtained from CMSampleBufferRef (AVFoundation)?
Here is the code with the audio codec configuration (there was also a video stream, but video works fine):
/*global variables*/
static AVFrame *aframe;
static AVFrame *frame;
AVOutputFormat *fmt;
AVFormatContext *oc;
AVStream *audio_st, *video_st;
Init ()
AVCodec *audio_codec, *video_codec;
int ret;
avformat_alloc_output_context2(&oc, NULL, "flv", filename);
fmt = oc->oformat;
oc->oformat->video_codec = AV_CODEC_ID_H264;
oc->oformat->audio_codec = AV_CODEC_ID_AAC;
video_st = NULL;
audio_st = NULL;
if (fmt->video_codec != AV_CODEC_ID_NONE)
{ //… /*init video codec*/}
if (fmt->audio_codec != AV_CODEC_ID_NONE) {
audio_codec= avcodec_find_encoder(fmt->audio_codec);
if (!(audio_codec)) {
fprintf(stderr, "Could not find encoder for '%s'\n",
audio_st= avformat_new_stream(oc, audio_codec);
if (!audio_st) {
fprintf(stderr, "Could not allocate stream\n");
audio_st->id = oc->nb_streams-1;
audio_st->codec->sample_fmt = AV_SAMPLE_FMT_S16;
audio_st->codec->bit_rate = 32000;
audio_st->codec->sample_rate = 48000;
audio_st->time_base = (AVRational){1, audio_st->codec->sample_rate };
audio_st->codec->channels = 1;
audio_st->codec->channel_layout = AV_CH_LAYOUT_MONO;
if (oc->oformat->flags & AVFMT_GLOBALHEADER)
audio_st->codec->flags |= CODEC_FLAG_GLOBAL_HEADER;
if (video_st)
// …
/*prepare video*/
if (audio_st)
aframe = avcodec_alloc_frame();
if (!aframe) {
fprintf(stderr, "Could not allocate audio frame\n");
AVCodecContext *c;
int ret;
c = audio_st->codec;
ret = avcodec_open2(c, audio_codec, 0);
if (ret < 0) {
fprintf(stderr, "Could not open audio codec: %s\n", av_err2str(ret));
And resampling and encoding audio:
if (mType == kCMMediaType_Audio)
CMSampleTimingInfo timing_info;
CMSampleBufferGetSampleTimingInfo(sampleBuffer, 0, &timing_info);
double pts=0;
double dts=0;
AVCodecContext *c;
AVPacket pkt = { 0 }; // data and size must be 0;
int got_packet, ret;
c = audio_st->codec;
CMItemCount numSamples = CMSampleBufferGetNumSamples(sampleBuffer);
NSUInteger channelIndex = 0;
CMBlockBufferRef audioBlockBuffer = CMSampleBufferGetDataBuffer(sampleBuffer);
size_t audioBlockBufferOffset = (channelIndex * numSamples * sizeof(SInt16));
size_t lengthAtOffset = 0;
size_t totalLength = 0;
SInt16 *samples = NULL;
CMBlockBufferGetDataPointer(audioBlockBuffer, audioBlockBufferOffset, &lengthAtOffset, &totalLength, (char **)(&samples));
const AudioStreamBasicDescription *audioDescription = CMAudioFormatDescriptionGetStreamBasicDescription(CMSampleBufferGetFormatDescription(sampleBuffer));
SwrContext *swr = swr_alloc();
int in_smprt = (int)audioDescription->mSampleRate;
av_opt_set_int(swr, "in_channel_layout", AV_CH_LAYOUT_MONO, 0);
av_opt_set_int(swr, "out_channel_layout", audio_st->codec->channel_layout, 0);
av_opt_set_int(swr, "in_channel_count", audioDescription->mChannelsPerFrame, 0);
av_opt_set_int(swr, "out_channel_count", audio_st->codec->channels, 0);
av_opt_set_int(swr, "out_channel_layout", audio_st->codec->channel_layout, 0);
av_opt_set_int(swr, "in_sample_rate", audioDescription->mSampleRate,0);
av_opt_set_int(swr, "out_sample_rate", audio_st->codec->sample_rate,0);
av_opt_set_sample_fmt(swr, "in_sample_fmt", AV_SAMPLE_FMT_S16, 0);
av_opt_set_sample_fmt(swr, "out_sample_fmt", audio_st->codec->sample_fmt, 0);
uint8_t **input = NULL;
int src_linesize;
int in_samples = (int)numSamples;
ret = av_samples_alloc_array_and_samples(&input, &src_linesize, audioDescription->mChannelsPerFrame,
in_samples, AV_SAMPLE_FMT_S16P, 0);
uint8_t *output=NULL;
int out_samples = av_rescale_rnd(swr_get_delay(swr, in_smprt) +in_samples, (int)audio_st->codec->sample_rate, in_smprt, AV_ROUND_UP);
av_samples_alloc(&output, NULL, audio_st->codec->channels, out_samples, audio_st->codec->sample_fmt, 0);
in_samples = (int)numSamples;
out_samples = swr_convert(swr, &output, out_samples, (const uint8_t **)input, in_samples);
aframe->nb_samples =(int) out_samples;
ret = avcodec_fill_audio_frame(aframe, audio_st->codec->channels, audio_st->codec->sample_fmt,
(uint8_t *)output,
(int) out_samples *
av_get_bytes_per_sample(audio_st->codec->sample_fmt) *
audio_st->codec->channels, 1);
aframe->channel_layout = audio_st->codec->channel_layout;
aframe->sample_rate= audio_st->codec->sample_rate;
if (timing_info.presentationTimeStamp.timescale!=0)
pts=(double) timing_info.presentationTimeStamp.value/timing_info.presentationTimeStamp.timescale;
aframe->pts = av_rescale_q(aframe->pts, audio_st->time_base, audio_st->codec->time_base);
ret = avcodec_encode_audio2(c, &pkt, aframe, &got_packet);
if (ret < 0) {
fprintf(stderr, "Error encoding audio frame: %s\n", av_err2str(ret));
if (got_packet)
pkt.stream_index = audio_st->index;
pkt.pts = av_rescale_q(pkt.pts, audio_st->codec->time_base, audio_st->time_base);
pkt.dts = av_rescale_q(pkt.dts, audio_st->codec->time_base, audio_st->time_base);
// Write the compressed frame to the media file.
ret = av_interleaved_write_frame(oc, &pkt);
if (ret != 0) {
fprintf(stderr, "Error while writing audio frame: %s\n",
I also ended up here after having a similar problem. I'm reading audio and video from a Blackmagic Decklink SDI card in 720p50 meaning I had 960 samples per videoframe (48k/50fps) I wanted to encode together with the video. Got really weird audio when only sending 960 samples to aacenc and it didn't really complain about this fact either.
I started to use AVAudioFifo (see ffmpeg/doc/examples/transcode_aac.c) and kept adding samples to it until I had enough to satisfy aacenc. I guess this means some samples play too late, since the pts is set on each block of 1024 samples when the first 960 should really have had a different value. But it's not really noticeable as far as I can hear/see.
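For what it's worth, the timestamp worry can be sidestepped by deriving each output pts from a running sample count instead of from the input frames. Below is a minimal plain-C sketch of that idea for mono S16 audio; it stands in for AVAudioFifo rather than using it, and SampleFifo, fifo_write, fifo_read_frame and FIFO_CAPACITY are hypothetical names. The pts is expressed in samples, i.e. in a time base of 1/sample_rate.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FRAME_SIZE 1024    /* samples per encoder frame (AAC) */
#define FIFO_CAPACITY 8192 /* hypothetical capacity in samples */

typedef struct {
    int16_t samples[FIFO_CAPACITY];
    int count;        /* samples currently buffered */
    int64_t next_pts; /* pts of the oldest buffered sample, in samples */
} SampleFifo;

static void fifo_init(SampleFifo *f, int64_t first_pts)
{
    f->count = 0;
    f->next_pts = first_pts;
}

/* Append n mono S16 samples. Returns 0 on success, -1 if they do not fit. */
static int fifo_write(SampleFifo *f, const int16_t *in, int n)
{
    assert(n >= 0);
    if (f->count + n > FIFO_CAPACITY)
        return -1;
    memcpy(f->samples + f->count, in, (size_t)n * sizeof(*in));
    f->count += n;
    return 0;
}

/* If a full frame is buffered, copy it to out, store its pts and return 1;
 * otherwise return 0 and leave the FIFO untouched. */
static int fifo_read_frame(SampleFifo *f, int16_t *out, int64_t *pts)
{
    if (f->count < FRAME_SIZE)
        return 0;
    memcpy(out, f->samples, FRAME_SIZE * sizeof(*out));
    memmove(f->samples, f->samples + FRAME_SIZE,
            (size_t)(f->count - FRAME_SIZE) * sizeof(*out));
    f->count -= FRAME_SIZE;
    *pts = f->next_pts;
    f->next_pts += FRAME_SIZE;
    return 1;
}
```

Feeding 960-sample buffers, the first full frame comes out after the second write with pts 0 and the next one with pts 1024, so the encoded frames stay contiguous regardless of the input buffer size.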
I had a similar problem. I was encoding PCM packets to AAC, and the PCM packets are sometimes shorter than 1024 samples.
If I encode a packet that is shorter than 1024 samples, the audio is slow. On the other hand, if I throw it away, the audio gets faster. From my observation, the swr_convert function does not do any automatic buffering.
I ended up with a buffering scheme in which packets are copied into a 1024-sample buffer, and the buffer is encoded and emptied every time it is full.
The function that fills the buffer is below:
// put frame data into buffer of fixed size
bool ffmpegHelper::putAudioBuffer(const AVFrame *pAvFrameIn, AVFrame **pAvFrameBuffer, AVCodecContext *dec_ctx, int frame_size, int &k0) {
    // prepare pFrameAudio
    if (!(*pAvFrameBuffer)) {
        if (!(*pAvFrameBuffer = av_frame_alloc())) {
            av_log(NULL, AV_LOG_ERROR, "Alloc frame failed\n");
            return false;
        } else {
            (*pAvFrameBuffer)->format = dec_ctx->sample_fmt;
            (*pAvFrameBuffer)->channels = dec_ctx->channels;
            (*pAvFrameBuffer)->sample_rate = dec_ctx->sample_rate;
            (*pAvFrameBuffer)->nb_samples = frame_size;
            int ret = av_frame_get_buffer(*pAvFrameBuffer, 0);
            if (ret < 0) {
                char err[500];
                av_log(NULL, AV_LOG_ERROR, "get audio buffer failed: %s\n",
                       av_make_error_string(err, AV_ERROR_MAX_STRING_SIZE, ret));
                return false;
            }
        }
        (*pAvFrameBuffer)->nb_samples = 0;
        (*pAvFrameBuffer)->pts = pAvFrameIn->pts;
    }

    // copy input data to buffer
    int n_channels = pAvFrameIn->channels;
    int new_samples = min(pAvFrameIn->nb_samples - k0, frame_size - (*pAvFrameBuffer)->nb_samples);
    int k1 = (*pAvFrameBuffer)->nb_samples;

    if (pAvFrameIn->format == AV_SAMPLE_FMT_S16) {
        int16_t *d_in = (int16_t *)pAvFrameIn->data[0];
        d_in += n_channels * k0;
        int16_t *d_out = (int16_t *)(*pAvFrameBuffer)->data[0];
        d_out += n_channels * k1;

        for (int i = 0; i < new_samples; ++i) {
            for (int j = 0; j < pAvFrameIn->channels; ++j) {
                *d_out++ = *d_in++;
            }
        }
    } else {
        printf("not handled format for audio buffer\n");
        return false;
    }

    (*pAvFrameBuffer)->nb_samples += new_samples;
    k0 += new_samples;

    return true;
}
And the loop that fills the buffer and encodes it is below:
// transcoding needed
int got_frame;
AVMediaType stream_type;

// decode the packet (do it your self)
decodePacket(packet, dec_ctx, &pAvFrame_, got_frame);

if (enc_ctx->codec_type == AVMEDIA_TYPE_AUDIO) {
    ret = 0;
    // break audio packet down to buffer
    if (enc_ctx->frame_size > 0) {
        int k = 0;
        while (k < pAvFrame_->nb_samples) {
            if (!putAudioBuffer(pAvFrame_, &pFrameAudio_, dec_ctx, enc_ctx->frame_size, k))
                return false;
            if (pFrameAudio_->nb_samples == enc_ctx->frame_size) {
                // the buffer is full, encode it (do it yourself)
                ret = encodeFrame(pFrameAudio_, stream_index, got_frame, false);
                if (ret < 0)
                    return false;
                pFrameAudio_->pts += enc_ctx->frame_size;
                pFrameAudio_->nb_samples = 0;
            }
        }
    } else {
        ret = encodeFrame(pAvFrame_, stream_index, got_frame, false);
    }
} else {
    // encode packet directly
    ret = encodeFrame(pAvFrame_, stream_index, got_frame, false);
}
You have to break the sample buffer into chunks of size 1024. I did this for recording MP3 on Android; for more info follow these links: link1, link2
If anyone ends up here: I had the same issue, and just as @Mohit pointed out, for AAC each audio frame has to be broken down into 1024-byte chunks.
uint8_t *buffer = (uint8_t*) malloc(1024);
AVFrame *frame = av_frame_alloc();
while((fread(buffer, 1024, 1, fp)) == 1) {
frame->data[0] = buffer;
A possible solution is to use the asetnsamples filter, which sets the number of samples for each output audio frame.
You can feed the filter with your input frames, and the resulting output frames will each have the desired number of samples. The number of samples set in the filter should equal the frame_size of the encoder's AVCodecContext.