FFmpeg command line CPU usage vs code CPU usage - ffmpeg

On my existing PC/IP Camera combination, the command line function
ffmpeg -i rtsp://abcd:123456#1.2.3.4/rtspvideostream /home/pc/video.avi
correctly writes the video stream to file and uses approximately 30% of my CPU.
The command line function
ffmpeg -i rtsp://abcd:123456#1.2.3.4/rtspvideostream -vcodec copy /home/pc/video.avi
uses approximately 3% of my CPU for seemingly the same result. I assume the removal of some functionality related to the codec contributes to this CPU saving.
Using the following standard ffmpeg initialisation:
AVFormatContext *pFormatCtx;
AVCodecContext *pCodecCtx;
AVDictionary *opts = NULL;
av_dict_set(&opts, "rtsp_transport", "tcp", 0);
avformat_open_input(&pFormatCtx,"rtsp://abcd:123456#1.2.3.4/rtspvideostream", NULL, &opts);
avformat_find_stream_info(pFormatCtx,NULL);
int videoStream = -1;
for(int i=0; i<(int)pFormatCtx->nb_streams; i++)
{
if(pFormatCtx->streams[i]->codec->codec_type== AVMEDIA_TYPE_VIDEO)
{
videoStream=i;
break;
}
}
pCodecCtx=pFormatCtx->streams[videoStream]->codec;
pCodec=avcodec_find_decoder(pCodecCtx->codec_id);
avcodec_open2(pCodecCtx, pCodec, NULL)
AVFrame *pFrame = av_frame_alloc();
int numBytes = avpicture_get_size(AV_PIX_FMT_YUV420P, IMAGEWIDTH, IMAGEHEIGHT);
uint8_t *buffer12 = (uint8_t*)av_malloc(numBytes*sizeof(uint8_t));
avpicture_fill((AVPicture *)pFrame, buffer12, AV_PIX_FMT_YUV420P, IMAGEWIDTH, IMAGEHEIGHT);
with the standard reading implementation:
int frameFinished = 0;
while(frameFinished == 0)
{
av_read_frame(pFormatCtx, &packet);
if(packet.stream_index==videoStream)
{
avcodec_decode_video2(pCodecCtx, pFrame, &frameFinished, &packet);
}
if( packet.duration ) av_free_packet(&packet);
packet.duration = 0;
}
correctly gets the video stream and uses approximately 30% of CPU.
In the command-line ffmpeg implementation, the addition of the parameters '-vcodec copy' dramatically decreases CPU usage. I am unable to reproduce a similar drop in CPU usage for the above coding implementation.
Assuming it is possible, how do I do it ?

The -vcodec copy implies that the video stream is copied as is (it is not being decoded/encdoed) so the CPU usage is pure I/O. To do the same you should remove all the codec opening stuff and decoding (avcodec_decode_video2) from your code and just write the packet directly to the output stream.

Related

Why is ffmpeg faster than this minimal example?

I'm wanting to read the audio out of a video file as fast as possible, using the libav libraries. It's all working fine, but it seems like it could be faster.
To get a performance baseline, I ran this ffmpeg command and timed it:
time ffmpeg -threads 1 -i file -map 0:a:0 -f null -
On a test file (a 2.5gb 2hr .MOV with pcm_s16be audio) this comes out to about 1.35 seconds on my M1 Macbook Pro.
On the other hand, this minimal C code (based on FFmpeg's "Demuxing and decoding" example) is consistently around 0.3 seconds slower.
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
static int decode_packet(AVCodecContext *dec, const AVPacket *pkt, AVFrame *frame)
{
int ret = 0;
// submit the packet to the decoder
ret = avcodec_send_packet(dec, pkt);
// get all the available frames from the decoder
while (ret >= 0) {
ret = avcodec_receive_frame(dec, frame);
av_frame_unref(frame);
}
return 0;
}
int main (int argc, char **argv)
{
int ret = 0;
AVFormatContext *fmt_ctx = NULL;
AVCodecContext *dec_ctx = NULL;
AVFrame *frame = NULL;
AVPacket *pkt = NULL;
if (argc != 3) {
exit(1);
}
int stream_idx = atoi(argv[2]);
/* open input file, and allocate format context */
avformat_open_input(&fmt_ctx, argv[1], NULL, NULL);
/* get the stream */
AVStream *st = fmt_ctx->streams[stream_idx];
/* find a decoder for the stream */
AVCodec *dec = avcodec_find_decoder(st->codecpar->codec_id);
/* allocate a codec context for the decoder */
dec_ctx = avcodec_alloc_context3(dec);
/* copy codec parameters from input stream to output codec context */
avcodec_parameters_to_context(dec_ctx, st->codecpar);
/* init the decoder */
avcodec_open2(dec_ctx, dec, NULL);
/* allocate frame and packet structs */
frame = av_frame_alloc();
pkt = av_packet_alloc();
/* read frames from the specified stream */
while (av_read_frame(fmt_ctx, pkt) >= 0) {
if (pkt->stream_index == stream_idx)
ret = decode_packet(dec_ctx, pkt, frame);
av_packet_unref(pkt);
if (ret < 0)
break;
}
/* flush the decoders */
decode_packet(dec_ctx, NULL, frame);
return ret < 0;
}
I tried measuring parts of this program to see if it was spending a lot of time in the setup, but it's not – at least 1.5 seconds of the runtime is the loop where it's reading frames.
So I took some flamegraph recordings (using cargo-flamegraph) and ran each a few times to make sure the timing was consistent. There's probably some overhead since both were consistently higher than running normally, but they still have the ~0.3 second delta.
# 1.812 total
time sudo flamegraph ./minimal file 1
# 1.542 total
time sudo flamegraph ffmpeg -threads 1 -i file -map 0:a:0 -f null - 2>&1
Here are the flamegraphs stacked up, scaled so that the faster one is only 85% as wide as the slower one. (click for larger)
The interesting thing that stands out to me is how long is spent on read in the minimal example vs. ffmpeg:
The time spent on lseek is also a lot longer in the minimal program – it's plainly visible in that flamegraph, but in the ffmpeg flamegraph, lseek is a single pixel wide.
What's causing this discrepancy? Is ffmpeg actually doing less work than I think it is here? Is the minimal code doing something naive? Is there some buffering or other I/O optimizations that ffmpeg has enabled?
How can I shave 0.3 seconds off of the minimal example's runtime?
The difference is that ffmpeg, when run with the -map flag, is explicitly setting the AVDISCARD_ALL flag on the streams that were going to be ignored. The packets for those streams still get read from disk, but with this flag set, they never make it into av_read_frame (with the mov demuxer, at least).
In the example code, by contrast, this while loop receives every packet from every stream, and only drops the packets after they've been (wastefully) passed through av_read_frame.
/* read frames from the specified stream */
while (av_read_frame(fmt_ctx, pkt) >= 0) {
if (pkt->stream_index == stream_idx)
ret = decode_packet(dec_ctx, pkt, frame);
av_packet_unref(pkt);
if (ret < 0)
break;
}
I changed the program to set the discard flag on the unused streams:
// ...
/* open input file, and allocate format context */
avformat_open_input(&fmt_ctx, argv[1], NULL, NULL);
/* get the stream */
AVStream *st = fmt_ctx->streams[stream_idx];
/* discard packets from other streams */
for(int i = 0; i < fmt_ctx->nb_streams; i++) {
fmt_ctx->streams[i]->discard = AVDISCARD_ALL;
}
st->discard = AVDISCARD_DEFAULT;
// ...
With that change in place, it gives about a ~1.8x speedup on the same test file, after the cache is warmed up.
Minimal example, without discard 1.593s
ffmpeg with -map 0:a:0 1.404s
Minimal example, with discard 0.898s

Encoding of raw frames (D3D11Texture2D) to an rtsp stream using libav*

I have managed to create a rtsp stream using libav* and directX texture (which I am obtaining from GDI API using Bitblit method). Here's my approach for creating live rtsp stream:
Create output context and stream (skipping the checks here)
avformat_alloc_output_context2(&ofmt_ctx, NULL, "rtsp", rtsp_url); //RTSP
vid_codec = avcodec_find_encoder(ofmt_ctx->oformat->video_codec);
vid_stream = avformat_new_stream(ofmt_ctx,vid_codec);
vid_codec_ctx = avcodec_alloc_context3(vid_codec);
Set codec params
codec_ctx->codec_tag = 0;
codec_ctx->codec_id = ofmt_ctx->oformat->video_codec;
//codec_ctx->codec_type = AVMEDIA_TYPE_VIDEO;
codec_ctx->width = width; codec_ctx->height = height;
codec_ctx->gop_size = 12;
//codec_ctx->gop_size = 40;
//codec_ctx->max_b_frames = 3;
codec_ctx->pix_fmt = target_pix_fmt; // AV_PIX_FMT_YUV420P
codec_ctx->framerate = { stream_fps, 1 };
codec_ctx->time_base = { 1, stream_fps};
if (fctx->oformat->flags & AVFMT_GLOBALHEADER)
{
codec_ctx->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;
}
Initialize video stream
if (avcodec_parameters_from_context(stream->codecpar, codec_ctx) < 0)
{
Debug::Error("Could not initialize stream codec parameters!");
return false;
}
AVDictionary* codec_options = nullptr;
if (codec->id == AV_CODEC_ID_H264) {
av_dict_set(&codec_options, "profile", "high", 0);
av_dict_set(&codec_options, "preset", "fast", 0);
av_dict_set(&codec_options, "tune", "zerolatency", 0);
}
// open video encoder
int ret = avcodec_open2(codec_ctx, codec, &codec_options);
if (ret<0) {
Debug::Error("Could not open video encoder: ", avcodec_get_name(codec->id), " error ret: ", AVERROR(ret));
return false;
}
stream->codecpar->extradata = codec_ctx->extradata;
stream->codecpar->extradata_size = codec_ctx->extradata_size;
Start streaming
// Create new frame and allocate buffer
AVFrame* AllocateFrameBuffer(AVCodecContext* codec_ctx, double width, double height)
{
AVFrame* frame = av_frame_alloc();
std::vector<uint8_t> framebuf(av_image_get_buffer_size(codec_ctx->pix_fmt, width, height, 1));
av_image_fill_arrays(frame->data, frame->linesize, framebuf.data(), codec_ctx->pix_fmt, width, height, 1);
frame->width = width;
frame->height = height;
frame->format = static_cast<int>(codec_ctx->pix_fmt);
//Debug::Log("framebuf size: ", framebuf.size(), " frame format: ", frame->format);
return frame;
}
void RtspStream(AVFormatContext* ofmt_ctx, AVStream* vid_stream, AVCodecContext* vid_codec_ctx, char* rtsp_url)
{
printf("Output stream info:\n");
av_dump_format(ofmt_ctx, 0, rtsp_url, 1);
const int width = WindowManager::Get().GetWindow(RtspStreaming::WindowId())->GetTextureWidth();
const int height = WindowManager::Get().GetWindow(RtspStreaming::WindowId())->GetTextureHeight();
//DirectX BGRA to h264 YUV420p
SwsContext* conversion_ctx = sws_getContext(width, height, src_pix_fmt,
vid_stream->codecpar->width, vid_stream->codecpar->height, target_pix_fmt,
SWS_BICUBIC | SWS_BITEXACT, nullptr, nullptr, nullptr);
if (!conversion_ctx)
{
Debug::Error("Could not initialize sample scaler!");
return;
}
AVFrame* frame = AllocateFrameBuffer(vid_codec_ctx,vid_codec_ctx->width,vid_codec_ctx->height);
if (!frame) {
Debug::Error("Could not allocate video frame\n");
return;
}
if (avformat_write_header(ofmt_ctx, NULL) < 0) {
Debug::Error("Error occurred when writing header");
return;
}
if (av_frame_get_buffer(frame, 0) < 0) {
Debug::Error("Could not allocate the video frame data\n");
return;
}
int frame_cnt = 0;
//av start time in microseconds
int64_t start_time_av = av_gettime();
AVRational time_base = vid_stream->time_base;
AVRational time_base_q = { 1, AV_TIME_BASE };
// frame pixel data info
int data_size = width * height * 4;
uint8_t* data = new uint8_t[data_size];
// AVPacket* pkt = av_packet_alloc();
while (RtspStreaming::IsStreaming())
{
/* make sure the frame data is writable */
if (av_frame_make_writable(frame) < 0)
{
Debug::Error("Can't make frame writable");
break;
}
//get copy/ref of the texture
//uint8_t* data = WindowManager::Get().GetWindow(RtspStreaming::WindowId())->GetBuffer();
if (!WindowManager::Get().GetWindow(RtspStreaming::WindowId())->GetPixels(data, 0, 0, width, height))
{
Debug::Error("Failed to get frame buffer. ID: ", RtspStreaming::WindowId());
std::this_thread::sleep_for (std::chrono::seconds(2));
continue;
}
//printf("got pixels data\n");
// convert BGRA to yuv420 pixel format
int srcStrides[1] = { 4 * width };
if (sws_scale(conversion_ctx, &data, srcStrides, 0, height, frame->data, frame->linesize) < 0)
{
Debug::Error("Unable to scale d3d11 texture to frame. ", frame_cnt);
break;
}
//Debug::Log("frame pts: ", frame->pts, " time_base:", av_rescale_q(1, vid_codec_ctx->time_base, vid_stream->time_base));
frame->pts = frame_cnt++;
//frame_cnt++;
//printf("scale conversion done\n");
//encode to the video stream
int ret = avcodec_send_frame(vid_codec_ctx, frame);
if (ret < 0)
{
Debug::Error("Error sending frame to codec context! ",frame_cnt);
break;
}
AVPacket* pkt = av_packet_alloc();
//av_init_packet(pkt);
ret = avcodec_receive_packet(vid_codec_ctx, pkt);
if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF)
{
//av_packet_unref(pkt);
av_packet_free(&pkt);
continue;
}
else if (ret < 0)
{
Debug::Error("Error during receiving packet: ",AVERROR(ret));
//av_packet_unref(pkt);
av_packet_free(&pkt);
break;
}
if (pkt->pts == AV_NOPTS_VALUE)
{
//Write PTS
//Duration between 2 frames (us)
int64_t calc_duration = (double)AV_TIME_BASE / av_q2d(vid_stream->r_frame_rate);
//Parameters
pkt->pts = (double)(frame_cnt * calc_duration) / (double)(av_q2d(time_base) * AV_TIME_BASE);
pkt->dts = pkt->pts;
pkt->duration = (double)calc_duration / (double)(av_q2d(time_base) * AV_TIME_BASE);
}
int64_t pts_time = av_rescale_q(pkt->dts, time_base, time_base_q);
int64_t now_time = av_gettime() - start_time_av;
if (pts_time > now_time)
av_usleep(pts_time - now_time);
//pkt.pts = av_rescale_q_rnd(pkt.pts, in_stream->time_base, out_stream->time_base, (AVRounding)(AV_ROUND_NEAR_INF | AV_ROUND_PASS_MINMAX));
//pkt.dts = av_rescale_q_rnd(pkt.dts, in_stream->time_base, out_stream->time_base, (AVRounding)(AV_ROUND_NEAR_INF | AV_ROUND_PASS_MINMAX));
//pkt.duration = av_rescale_q(pkt.duration, in_stream->time_base, out_stream->time_base);
//pkt->pos = -1;
//write frame and send
if (av_interleaved_write_frame(ofmt_ctx, pkt)<0)
{
Debug::Error("Error muxing packet, frame number:",frame_cnt);
break;
}
//Debug::Log("RTSP streaming...");
//sstd::this_thread::sleep_for(std::chrono::milliseconds(1000/20));
//av_packet_unref(pkt);
av_packet_free(&pkt);
}
//av_free_packet(pkt);
delete[] data;
/* Write the trailer, if any. The trailer must be written before you
* close the CodecContexts open when you wrote the header; otherwise
* av_write_trailer() may try to use memory that was freed on
* av_codec_close(). */
av_write_trailer(ofmt_ctx);
av_frame_unref(frame);
av_frame_free(&frame);
printf("streaming thread CLOSED!\n");
}
Now, this allows me to connect to my rtsp server and maintain the connection. However, on the rtsp client side I am getting either gray or single static frame as shown below:
Would appreciate if you can help with following questions:
Firstly, why the stream is not working in spite of continued connection to the server and updating frames?
Video codec. By default rtsp format uses Mpeg4 codec, is it possible to use h264? When I manually set it to AV_CODEC_ID_H264 the program fails at avcodec_open2 with return value of -22.
Do I need to create and allocate new "AVFrame" and "AVPacket" for every frame? Or can I just reuse global variable for this?
Do I need to explicitly define some code for real-time streaming? (Like in ffmpeg we use "-re" flag).
Would be great if you can point out some example code for creating livestream. I have checked following resources:
https://github.com/FFmpeg/FFmpeg/blob/master/doc/examples/encode_video.c
streaming FLV to RTMP with FFMpeg using H264 codec and C++ API to flv.js
https://medium.com/swlh/streaming-video-with-ffmpeg-and-directx-11-7395fcb372c4
Update
While test I found that I am able to play the stream using ffplay, while it's getting stuck on VLC player. Here is snapshot on the ffplay log
The basic construct and initialization seems to be okay. Find below responses to your questions
why the stream is not working in spite of continued connection to the server and updating frames?
If you're getting an error or broken stream, you might wanna check into your presentation and decompression timestamps (pts/dts) of your packet.
In your code, I notice that you're taking time_base from video stream object which is not guranteed to be same as codec->time_base value and usually varies depending upon active stream.
AVRational time_base = vid_stream->time_base;
AVRational time_base_q = { 1, AV_TIME_BASE };
Video codec. By default rtsp format uses Mpeg4 codec, is it possible to use h264?
I don't see why not... RTSP is just a protocol for carrying your packets over the network. So you should be able use AV_CODEC_ID_H264 for encoding the stream.
Do I need to create and allocate new "AVFrame" and "AVPacket" for every frame? Or can I just reuse global variable for this?
In libav during encoding process a single packet is used for encoding a video frame, while there can be multiple audio frames in a single packet. I should reference this, but can't seem to find any source at the moment. But anyways the point is you would need to create new packet every time.
Do I need to explicitly define some code for real-time streaming? (Like in ffmpeg we use "-re" flag).
You don't need to add anything else for real time streaming. Although you might wanna implement it to limit the number of frame updates that you pass to encoder and save some performance.
for me the difference between ffplay good capture and VLC bad capture (for UDP packets) was pkt_size=xxx attribute (ffmpeg -re -i test.mp4 -f mpegts udp://127.0.0.1:23000?pkt_size=1316) (VLC open media network tab udp://#:23000:pkt_size=1316). So only if pkt_size is defined (and equal) VLC is able to capture.

Replace Bento4 with libav / ffmpeg

We use Bento4 - a really well designed SDK - to demux mp4 files in .mov containers. Decoding is done by an own codec, so only the raw (intraframe) samples are needed. By now this works pretty straightforward
AP4_Track *test_videoTrack = nullptr;
AP4_ByteStream *input = nullptr;
AP4_Result result = AP4_FileByteStream::Create(filename, AP4_FileByteStream::STREAM_MODE_READ, input);
AP4_File m_file (*input, true);
//
// Read movie tracks, and metadata, find the video track
size_t index = 0;
uint32_t m_width = 0, m_height = 0;
auto item = m_file.GetMovie()->GetTracks().FirstItem();
auto track = item->GetData();
if (track->GetType() == AP4_Track::TYPE_VIDEO)
{
m_width = (uint32_t)((double)test_videoTrack->GetWidth() / double(1 << 16));
m_height = (uint32_t)((double)test_videoTrack->GetHeight() / double(1 << 16));
std::string codec("unknown");
auto sd = track->GetSampleDescription(0);
AP4_String c;
if (AP4_SUCCEEDED(sd->GetCodecString(c)))
{
codec = c.GetChars();
}
// Find and instantiate the decoder
AP4_Sample sample;
AP4_DataBuffer sampleData;
test_videoTrack->ReadSample(0, sample, sampleData);
}
For several reasons we would prefer replacing Bento4 with libav/ffmpeg (mainly because we already have in the project and want to reduce dependencies)
How would we ( preferrably in pseudo-code ) replace the Bento4-tasks done above with libav? Please remember that the used codec is not in the ffmpeg library, so we cannot use the standard ffmpeg decoding examples. Opening the media file simply fails. Without decoder we got no size or any other info so far. What we need would
open the media file
get contained tracks (possibly also audio)
get track size / length info
get track samples by index
It turned out to be very easy:
AVFormatContext* inputFile = avformat_alloc_context();
avformat_open_input(&inputFile, filename, nullptr, nullptr);
avformat_find_stream_info(inputFile, nullptr);
//Get just two streams...First Video & First Audio
int videoStreamIndex = -1, audioStreamIndex = -1;
for (int i = 0; i < inputFile->nb_streams; i++)
{
if (inputFile->streams[i]->codec->codec_type == AVMEDIA_TYPE_VIDEO && videoStreamIndex == -1)
{
videoStreamIndex = i;
}
else if (inputFile->streams[i]->codec->codec_type == AVMEDIA_TYPE_AUDIO && audioStreamIndex == -1)
{
audioStreamIndex = i;
}
}
Now test for the correct codec tag
// get codec id
char ct[64] = {0};
static const char* codec_id = "MPAK";
av_get_codec_tag_string( ct, sizeof(ct),inputFile->streams[videoStreamIndex]->codec->codec_tag);
assert(strncmp( ct , codec_id, strlen(codec_id)) == 0)
I did not know that the sizes are set even before a codec is chosen (or even available).
// lookup size
Size2D mediasize(inputFile->streams[videoStreamIndex]->codec->width, inputFile->streams[videoStreamIndex]->codec->height);
Seeking by frame and unpacking (video) is done like this:
AVStream* s = m_file->streams[videoStreamIndex];
int64_t seek_ts = (int64_t(frame_index) * s->r_frame_rate.den * s->time_base.den) / (int64_t(s->r_frame_rate.num) * s->time_base.num);
av_seek_frame(m_hap_file, videoStreamIndex, seek_ts, AVSEEK_FLAG_ANY);
AVPacket pkt;
av_read_frame(inputFile, &pkt);
Now the packet contains a frame ready to unpack with own decoder.

How to fill audio AVFrame (ffmpeg) with the data obtained from CMSampleBufferRef (AVFoundation)?

I am writing program for streaming live audio and video from webcamera to rtmp-server. I work in MacOS X 10.8, so I use AVFoundation framework for obtaining audio and video frames from input devices. This frames come into delegate:
-(void) captureOutput:(AVCaptureOutput*)captureOutput didOutputSampleBuffer: (CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection*)connection ,
where sampleBuffer contains audio or video data.
When I recieve audio data in the sampleBuffer, I'm trying to convert this data into AVFrame and encode AVFramewith libavcodec:
aframe = avcodec_alloc_frame(); //AVFrame *aframe;
int got_packet, ret;
CMItemCount numSamples = CMSampleBufferGetNumSamples(sampleBuffer); //CMSampleBufferRef
NSUInteger channelIndex = 0;
CMBlockBufferRef audioBlockBuffer = CMSampleBufferGetDataBuffer(sampleBuffer);
size_t audioBlockBufferOffset = (channelIndex * numSamples * sizeof(SInt16));
size_t lengthAtOffset = 0;
size_t totalLength = 0;
SInt16 *samples = NULL;
CMBlockBufferGetDataPointer(audioBlockBuffer, audioBlockBufferOffset, &lengthAtOffset, &totalLength, (char **)(&samples));
const AudioStreamBasicDescription *audioDescription = CMAudioFormatDescriptionGetStreamBasicDescription(CMSampleBufferGetFormatDescription(sampleBuffer));
aframe->nb_samples =(int) numSamples;
aframe->channels=audioDescription->mChannelsPerFrame;
aframe->sample_rate=(int)audioDescription->mSampleRate;
//my webCamera configured to produce 16bit 16kHz LPCM mono, so sample format hardcoded here, and seems to be correct
avcodec_fill_audio_frame(aframe, aframe->channels, AV_SAMPLE_FMT_S16,
(uint8_t *)samples,
aframe->nb_samples *
av_get_bytes_per_sample(AV_SAMPLE_FMT_S16) *
aframe->channels, 0);
//encoding audio
ret = avcodec_encode_audio2(c, &pkt, aframe, &got_packet);
if (ret < 0) {
fprintf(stderr, "Error encoding audio frame: %s\n", av_err2str(ret));
exit(1);
}
The problem is that when I get so formed frames, I can hear the wanted sound, but it is slowing down and discontinuous (as if after each data frame comes the same frame of silence). It seems that something is wrong in the transformation from CMSampleBuffer to AVFrame , because the preview from the microphone created with AVFoundation from the same sample buffers played normally.
I would be grateful for your help.
UPD: Creating and initializing the AVCodceContext structure
audio_codec= avcodec_find_encoder(AV_CODEC_ID_AAC);
if (!(audio_codec)) {
fprintf(stderr, "Could not find encoder for '%s'\n",
avcodec_get_name(AV_CODEC_ID_AAC));
exit(1);
}
audio_st = avformat_new_stream(oc, audio_codec); //AVFormatContext *oc;
if (!audio_st) {
fprintf(stderr, "Could not allocate stream\n");
exit(1);
}
audio_st->id=1;
audio_st->codec->sample_fmt= AV_SAMPLE_FMT_S16;
audio_st->codec->bit_rate = 64000;
audio_st->codec->sample_rate= 16000;
audio_st->codec->channels=1;
audio_st->codec->codec_type= AVMEDIA_TYPE_AUDIO;

Converting 3gp (amr) to mp3 using ffmpeg api calls

Converting 3gp (amr) to mp3 using ffmpeg api calls
I try to use libavformat (ffmpeg) to build my own function that converts 3gp audio files (recorded with an android mobile device) into mp3 files.
I use av_read_frame() to read a frame from the input file and use avcodec_decode_audio3() to decode the data
into a buffer and use this buffer to encode the data into mp3 with avcodec_encode_audio.
This seems to give me a correct result for converting wav to mp3 and mp3 to wav (Or decode one mp3 and encode to another mp3) but not for amr to mp3.
My resulting mp3 file seems to has the right length but only consists of noise.
In another post I read that amr-decoder does not use the same sample format than mp3 does.
AMR uses FLT and mp3 S16 or S32 und that I have to do resampling.
So I call av_audio_resample_init() and audio_resample for each frame that has been decoded.
But that does not solve my problem completely. Now I can hear my recorded voice and unsterstand what I was saying, but the quality is very low and there is still a lot of noise.
I am not sure if I set the parameters of av_audio_resample correctly, especially the last 4 parameters (I think not) or if I miss something else.
ReSampleContext* reSampleContext = av_audio_resample_init(1, 1, 44100, 8000, AV_SAMPLE_FMT_S32, AV_SAMPLE_FMT_FLT, 0, 0, 0, 0.0);
while(1)
{
if(av_read_frame(ic, &avpkt) < 0)
{
break;
}
out_size = AVCODEC_MAX_AUDIO_FRAME_SIZE;
int count;
count = avcodec_decode_audio3(audio_stream->codec, (short *)decodedBuffer, &out_size, &avpkt);
if(count < 0)
{
break;
}
if((audio_resample(reSampleContext, (short *)resampledBuffer, (short *)decodedBuffer, out_size / 4)) < 0)
{
fprintf(stderr, "Error\n");
exit(1);
}
out_size = AVCODEC_MAX_AUDIO_FRAME_SIZE;
pktOut.size = avcodec_encode_audio(c, outbuf, out_size, (short *)resampledBuffer);
if(c->coded_frame && c->coded_frame->pts != AV_NOPTS_VALUE)
{
pktOut.pts = av_rescale_q(c->coded_frame->pts, c->time_base, outStream->time_base);
//av_res
}
pktOut.pts = AV_NOPTS_VALUE;
pktOut.dts = AV_NOPTS_VALUE;
pktOut.flags |= AV_PKT_FLAG_KEY;
pktOut.stream_index = audio_stream->index;
pktOut.data = outbuf;
if(av_write_frame(oc, &pktOut) != 0)
{
fprintf(stderr, "Error while writing audio frame\n");
exit(1);
}
}

Resources