FFMPEG libavformat internal buffering

I'm using FFMPEG for a C++ audio streaming and playback application.
I use the avformat_open_input function to open a URL to an external compressed audio file and then step through the stream using av_read_frame. For each packet, I decode the data directly and queue it in an audio buffer using OpenAL.
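Roughly, the loop looks like this (a minimal sketch; error handling and the actual decoding/OpenAL code are elided, and decode_and_queue stands in for my own routine):

    extern "C" {
    #include <libavformat/avformat.h>
    }

    void decode_and_queue(AVPacket *pkt);   // placeholder for my decode + OpenAL code

    void stream_url(const char *url)
    {
        AVFormatContext *fmt = nullptr;
        if (avformat_open_input(&fmt, url, nullptr, nullptr) < 0)
            return;
        avformat_find_stream_info(fmt, nullptr);

        // pick the audio stream and pull packets one by one
        int audio = av_find_best_stream(fmt, AVMEDIA_TYPE_AUDIO, -1, -1, nullptr, 0);

        AVPacket pkt;
        while (av_read_frame(fmt, &pkt) >= 0) {   // packets are pulled here, one call at a time
            if (pkt.stream_index == audio)
                decode_and_queue(&pkt);
            av_packet_unref(&pkt);
        }
        avformat_close_input(&fmt);
    }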
My question is whether FFmpeg internally prebuffers compressed data from the external URL.
Does FFmpeg keep downloading data in the background even if I don't call av_read_frame?
Or is it my responsibility to maintain an intermediate buffer, downloading as many packets as possible ahead of time to avoid starving the audio playback?
If so, how much does it buffer/download internally? Can I configure this?
I have been looking through the documentation but have not found any information on this.
Thanks.
Update:
According to this thread http://ffmpeg.zeranoe.com/forum/viewtopic.php?f=15&t=376 libav should by default prebuffer about 5 MB, depending on AVFormatContext::max_analyze_duration. However, I haven't noticed this behavior, and it doesn't seem to change if I alter max_analyze_duration.
If I monitor the memory consumption of my process, it doesn't increase after I call avformat_open_input, and if I simulate a slow network, av_read_frame stops returning data immediately, as if it didn't have any packets buffered.
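For completeness, this is roughly how I pass the probing options when opening the input (a sketch; "probesize" and "analyzeduration" are the generic AVFormatContext options, and the values here are arbitrary examples):

    extern "C" {
    #include <libavformat/avformat.h>
    #include <libavutil/dict.h>
    }

    AVFormatContext *open_with_probe_options(const char *url)
    {
        AVFormatContext *fmt = nullptr;
        AVDictionary *opts = nullptr;
        av_dict_set(&opts, "probesize", "5000000", 0);        // bytes read while probing
        av_dict_set(&opts, "analyzeduration", "5000000", 0);  // microseconds analyzed
        if (avformat_open_input(&fmt, url, nullptr, &opts) < 0)
            fmt = nullptr;
        av_dict_free(&opts);   // options that were not consumed remain in opts
        return fmt;
    }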

Related

FFmpeg buffering before write

I have written a working audio transcoder using the new FFmpeg API.
As documented, the following behavior occurs:
At the beginning of decoding or encoding, the codec might accept multiple input frames/packets without returning a frame...
This buffering can be on the order of 200ms before the write callback is invoked. Is there any easy way to tell FFmpeg to always immediately process and write all available data, or to adjust the internal buffer sizes?
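One operation that reliably forces out everything the codec is still holding is the end-of-stream flush; it does not remove the startup latency, but it is the standard way to drain the internal queue. A minimal sketch, assuming the send/receive API, where enc_ctx and write_packet are placeholders for the transcoder's own encoder context and output routine:

    extern "C" {
    #include <libavcodec/avcodec.h>
    }

    extern AVCodecContext *enc_ctx;      // assumed: the transcoder's configured encoder
    void write_packet(AVPacket *pkt);    // assumed: the transcoder's own output routine

    void flush_encoder()
    {
        AVPacket *pkt = av_packet_alloc();
        avcodec_send_frame(enc_ctx, nullptr);              // a NULL frame enters draining mode
        while (avcodec_receive_packet(enc_ctx, pkt) == 0) {
            write_packet(pkt);                             // buffered frames come out here
            av_packet_unref(pkt);
        }
        av_packet_free(&pkt);
    }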

Understanding the ffmpeg -re parameter

I was reading about the -re option in ffmpeg. From the docs:
-re (input)
Read input at the native frame rate. Mainly used to simulate a grab device, or live input stream (e.g. when reading from a file). Should not be used with actual grab devices or live input streams (where it can cause packet loss). By default ffmpeg attempts to read the input(s) as fast as possible. This option will slow down the reading of the input(s) to the native frame rate of the input(s). It is useful for real-time output (e.g. live streaming).
My doubt concerns what seems like a contradiction in the description above: it says the option should not be used with live input streams, yet in the end it is recommended for real-time output.
Considering a situation where both the input and the output are RTMP streams, should I use it or not?
Don't use it. It's useful for real-time output when ffmpeg is able to process a source at a speed faster than real-time. In that scenario, ffmpeg may send output at that faster rate and the receiver may not be able to or want to buffer and queue its input.
It (-re) is suitable when streaming from offline files: it reads them at their native speed (e.g. 25 fps). Otherwise FFmpeg may output hundreds of frames per second, which can cause problems for the receiver.

FFmpeg - Is it difficult to use?

I am trying to use FFmpeg and have been experimenting with it for the last month.
I have not been able to get it working. Is FFmpeg really that difficult to use?
My requirement is simple, as described below.
Can you please advise whether FFmpeg is suitable, or whether I have to implement this myself (using the available codec libraries)?
I have a WebM file (containing VP8 and Opus frames).
I will read the encoded data and send it to a remote guy.
The remote guy will read the encoded data from the socket.
The remote guy will write it to a file (can we avoid decoding?).
Then the remote guy should be able to play the file using ffplay or any other player.
Now I will take a specific example.
Say I have a file small.webm, containing VP8 and OPUS frames.
I am reading only the audio frames (Opus) using the av_read_frame API (I check the stream index and keep only the audio packets).
So now I have the encoded data buffer in packet.data and its size in packet.size (please correct me if I'm wrong).
Here is my first doubt: the audio packet size is not the same every time. Why the difference? Sometimes a packet is as small as 54 bytes and sometimes it is 420 bytes. Does the Opus frame size vary from time to time?
Next, say I somehow extract a single frame from the packet (I really do not know how to extract a single frame) and send it to the remote guy.
Now the remote guy needs to write the buffer to a file. To write the file we can use the av_interleaved_write_frame or av_write_frame API; both take an AVPacket as argument. I can have an AVPacket and set its data and size members, then call av_write_frame, but that does not work. The reason may be that one should also set other members of the packet, like pts, dts, etc., but I do not have that information to set.
Can somebody help me figure out whether FFmpeg is the right choice, or should I write custom logic, such as parsing an Opus file and getting it frame by frame?
Now the remote guy needs to write the buffer to a file. To write the file we can use the av_interleaved_write_frame or av_write_frame API; both take an AVPacket as argument. I can have an AVPacket and set its data and size members, then call av_write_frame, but that does not work. The reason may be that one should also set other members of the packet, like pts, dts, etc., but I do not have that information to set.
Yes, you do. They were in the original packet you received from the demuxer in the sender. You need to serialize all information in this packet and set each value accordingly in the receiver.
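To illustrate, the receiving side could look roughly like this. It is only a sketch, under the assumption that the sender also transmits the Opus stream's codec parameters and time base, plus each packet's pts, dts and duration alongside the data and size (the ReceivedPacket struct is made up for the example, and error handling is elided):

    extern "C" {
    #include <libavformat/avformat.h>
    #include <libavcodec/avcodec.h>
    }
    #include <cstdint>
    #include <cstring>

    struct ReceivedPacket {                 // hypothetical wire format
        uint8_t *data;
        int size;
        int64_t pts, dts, duration;
    };

    void write_opus_file(const char *path, const AVCodecParameters *in_par,
                         AVRational in_time_base, ReceivedPacket *pkts, int count)
    {
        AVFormatContext *out = nullptr;
        avformat_alloc_output_context2(&out, nullptr, nullptr, path);  // e.g. "out.mka"
        AVStream *st = avformat_new_stream(out, nullptr);
        avcodec_parameters_copy(st->codecpar, in_par);                 // codec info from the sender

        avio_open(&out->pb, path, AVIO_FLAG_WRITE);
        avformat_write_header(out, nullptr);

        for (int i = 0; i < count; i++) {
            AVPacket *pkt = av_packet_alloc();
            av_new_packet(pkt, pkts[i].size);                          // refcounted buffer
            memcpy(pkt->data, pkts[i].data, pkts[i].size);
            pkt->pts = pkts[i].pts;
            pkt->dts = pkts[i].dts;
            pkt->duration = pkts[i].duration;
            pkt->stream_index = st->index;
            // timestamps were expressed in the sender's time base; convert to the muxer's
            av_packet_rescale_ts(pkt, in_time_base, st->time_base);
            av_interleaved_write_frame(out, pkt);
            av_packet_free(&pkt);
        }

        av_write_trailer(out);
        avio_closep(&out->pb);
        avformat_free_context(out);
    }

The key point, as the answer says, is that the timestamps and the stream's codec parameters have to travel with the data; the muxer cannot invent them.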

Osx: Core Audio: Parse raw, compressed audio data with AudioToolbox (to get PCM)

I am downloading various sound files with my own C++ HTTP client (e.g. mp3s, aiffs, etc.). Now I want to parse them using Core Audio's AudioToolbox to get linear PCM data for playback with, for example, OpenAL. According to this document: https://developer.apple.com/library/mac/#documentation/MusicAudio/Conceptual/CoreAudioOverview/ARoadmaptoCommonTasks/ARoadmaptoCommonTasks.html , it should be possible to also create an audio file from memory. Unfortunately I didn't find any way of doing this when browsing the API, so what is the common way to do it? Please don't say that I should save the file to my hard drive first.
Thank you!
I have done this using an input memory buffer, avoiding any files. In my case I started with the AAC audio format and used Apple's API AudioConverterFillComplexBuffer to do the hardware decompression into LPCM. The trick is that you have to define a callback function to supply each packet of input data; that API call does the format conversion on a per-packet basis. In my case I had to write code to parse the compressed AAC data to identify packet starts (0xfff), then use the callback to spoon-feed each packet into the API call. I am also using OpenAL for audio rendering, which has its own challenges when avoiding input files.
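To make the callback idea concrete, its rough shape is shown below. This is only a sketch: MemorySource is a made-up stand-in for the in-memory buffer and packet parser, and creating the AudioConverterRef itself (AudioConverterNew with the input and output stream descriptions) is not shown.

    #include <AudioToolbox/AudioToolbox.h>

    // Made-up holder for the downloaded data; in practice it points at the
    // next compressed packet located by the parser.
    struct MemorySource {
        const UInt8 *packetData;
        UInt32 packetSize;
        AudioStreamPacketDescription desc;   // filled in while parsing the stream
    };

    // Input callback: AudioConverterFillComplexBuffer calls this whenever it
    // needs another compressed packet to convert.
    static OSStatus FeedPacket(AudioConverterRef,
                               UInt32 *ioNumberDataPackets,
                               AudioBufferList *ioData,
                               AudioStreamPacketDescription **outPacketDesc,
                               void *inUserData)
    {
        MemorySource *src = static_cast<MemorySource *>(inUserData);
        if (src->packetSize == 0) {          // nothing parsed yet: stop for now
            *ioNumberDataPackets = 0;
            return -1;                       // any non-zero status pauses the conversion
        }
        ioData->mNumberBuffers = 1;
        ioData->mBuffers[0].mData = const_cast<UInt8 *>(src->packetData);
        ioData->mBuffers[0].mDataByteSize = src->packetSize;
        *ioNumberDataPackets = 1;            // hand over exactly one packet per call
        if (outPacketDesc)
            *outPacketDesc = &src->desc;     // where that packet starts and how big it is
        src->packetSize = 0;                 // mark as consumed; refill from the network
        return noErr;
    }

The conversion is then driven by calling AudioConverterFillComplexBuffer repeatedly with FeedPacket and a MemorySource as the user-data pointer, and the LPCM output can be handed to OpenAL buffers.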

Encode WebCam frames with H.264 on .NET

What I want to do is the following procedure:
Get a frame from the Webcam.
Encode it with an H264 encoder.
Create a packet with that frame with my own "protocol" to send it via UDP.
Receive it and decode it...
It would be live streaming.
Well, I just need help with the second step.
I'm retrieving camera images with the AForge framework.
I don't want to write frames to files and then decode them; I guess that would be very slow.
I would like to handle encoded frames in memory and then create the packets to be sent.
I need to use an open-source encoder. I already tried x264, following this example:
How does one encode a series of images into H264 using the x264 C API?
but it seems it only works on Linux, or at least that's what I concluded after seeing about 50 errors when trying to compile the example with Visual C++ 2010. (The core of that API is sketched at the end of this post.)
I should make clear that I already did a lot of research (a week of reading) before writing this, but couldn't find a (simple) way to do it.
I know there is the RTMP protocol, but the video stream will only ever be watched by one person at a time, and RTMP is more oriented toward streaming to many people. I also already streamed with an Adobe Flash application I made, but it was too laggy.
I would also like your advice on whether it's OK to send frames one by one, or whether it would be better to send several of them within each packet.
I hope that at least someone can point me in the right direction.
My English is not great; apologies. :P
PS: It doesn't have to be in .NET; it can be in any language as long as it works on Windows.
Many thanks in advance.
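For reference, the core of the x264 C API usage from the linked answer looks roughly like this. It is only a sketch: it assumes the webcam frame has already been converted to planar I420, and it skips error handling and the Windows build setup (which is the part that actually caused the compile errors):

    #include <cstdint>
    extern "C" {
    #include <x264.h>
    }

    struct Encoder {
        x264_t *enc;
        x264_picture_t pic_in, pic_out;
    };

    bool open_encoder(Encoder &e, int width, int height, int fps)
    {
        x264_param_t param;
        x264_param_default_preset(&param, "veryfast", "zerolatency");  // low-latency live settings
        param.i_width   = width;
        param.i_height  = height;
        param.i_fps_num = fps;
        param.i_fps_den = 1;
        param.b_repeat_headers = 1;     // prepend SPS/PPS to every keyframe
        param.b_annexb = 1;             // Annex-B start codes, easy to packetize yourself
        x264_param_apply_profile(&param, "baseline");

        if (x264_picture_alloc(&e.pic_in, X264_CSP_I420, width, height) < 0)
            return false;
        e.enc = x264_encoder_open(&param);
        return e.enc != nullptr;
    }

    // Encode one frame. Before calling, copy the I420 planes of the webcam
    // frame into e.pic_in.img.plane[0..2]. Returns the number of encoded
    // bytes and points *out at them (the NAL units are contiguous in memory).
    int encode_frame(Encoder &e, int64_t pts, uint8_t **out)
    {
        e.pic_in.i_pts = pts;
        x264_nal_t *nals = nullptr;
        int n_nals = 0;
        int bytes = x264_encoder_encode(e.enc, &nals, &n_nals, &e.pic_in, &e.pic_out);
        if (bytes > 0)
            *out = nals[0].p_payload;
        return bytes;                   // <= 0 means no output for this frame (or an error)
    }

On shutdown the encoder should be flushed (call x264_encoder_encode with a NULL picture while x264_encoder_delayed_frames is non-zero) and closed with x264_encoder_close.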
You could try your approach using Microsoft's DirectShow technology. There is an open-source x264 wrapper available for download at Monogram.
If you download the filter, you need to register it with the OS using regsvr32. I would suggest doing some quick testing to find out whether this approach is feasible: use the GraphEdit tool to connect your webcam to the encoder and have a look at the configuration options.
I would also like your advice on whether it's OK to send frames one by one, or whether it would be better to send several of them within each packet.
This really depends on the required latency: the more frames you package together, the less header overhead, but the more latency, since you have to wait for multiple frames to be encoded before you can send them. For live streaming the latency should be kept to a minimum, and the typical protocols used are RTP/UDP. This implies that your maximum packet size is limited to the MTU of the network, often requiring IDR frames to be fragmented and sent in multiple packets.
My advice would be to not worry about sending more frames in one packet until/unless you have a reason to. This is more often necessary with audio streaming since the header size (e.g. IP + UDP + RTP) is considered big in relation to the audio payload.
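Since the question mentions a custom protocol over UDP anyway, here is a rough illustration of that fragmentation step. The 4-byte header is made up for the example and is not RTP; a real deployment would normally use RTP's fragmentation units instead.

    #include <algorithm>
    #include <cstdint>
    #include <cstring>
    #include <vector>

    // Split one encoded frame into MTU-sized UDP payloads.
    std::vector<std::vector<uint8_t>> fragment_frame(const uint8_t *frame, size_t size,
                                                     uint8_t frame_id, size_t mtu = 1400)
    {
        const size_t payload_max = mtu - 4;                  // leave room for the toy header
        std::vector<std::vector<uint8_t>> packets;
        size_t idx = 0;
        for (size_t off = 0; off < size; off += payload_max, ++idx) {
            size_t chunk = std::min(payload_max, size - off);
            std::vector<uint8_t> pkt(4 + chunk);
            pkt[0] = frame_id;                               // which frame this belongs to
            pkt[1] = static_cast<uint8_t>(idx);              // fragment index within the frame
            pkt[2] = (off + chunk == size) ? 1 : 0;          // last-fragment marker
            pkt[3] = 0;                                      // reserved / padding
            std::memcpy(pkt.data() + 4, frame + off, chunk);
            packets.push_back(std::move(pkt));
        }
        return packets;
    }

The receiver reassembles fragments with the same frame id in index order and only hands the frame to the decoder once the last-fragment marker has arrived; a missing fragment means the whole frame has to be dropped.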
