I'm developing an application that needs to publish a media stream to an rtmp "ingestion" url (as used in YouTube Live, or as input to Wowza Streaming Engine, etc), and I'm using the ffmpeg library (programmatically, from C/C++, not the command line tool) to handle the rtmp layer. I've got a working version ready, but am seeing some problems when streaming higher bandwidth streams to servers with worse ping. The problem exists both when using the ffmpeg "native"/builtin rtmp implementation and the librtmp implementation.
When streaming to a local target server with low ping through a good network (specifically, a local Wowza server), my code has so far handled every stream I've thrown at it and managed to upload everything in real time - which is important, since this is meant exclusively for live streams.
However, when streaming to a remote server with a worse ping (e.g. the youtube ingestion urls on a.rtmp.youtube.com, which for me have 50+ms pings), lower bandwidth streams work fine, but with higher bandwidth streams the network is underutilized - for example, for a 400kB/s stream, I'm only seeing ~140kB/s network usage, with a lot of frames getting delayed/dropped, depending on the strategy I'm using to handle network pushback.
Now, I know this is not a problem with the network connection to the target server, because I can successfully upload the stream in real time when using the ffmpeg command line tool to the same target server or using my code to stream to a local Wowza server which then forwards the stream to the youtube ingestion point.
So the network connection itself is not the problem; the issue seems to lie with my code.
I've timed various parts of my code and found that when the problem appears, calls to av_write_frame / av_interleaved_write_frame (I never mix and match them; I always use one consistently in any given build, I've just experimented with both to see if there is any difference) sometimes take a really long time - I've seen those calls take up to 500-1000 ms, though the average "bad case" is in the 50-100 ms range. Not all calls take this long; most return instantly. But the average time spent in these calls grows bigger than the average frame duration, so I'm not getting a real-time upload anymore.
The main suspect, it seems to me, could be the rtmp Acknowledgement Window mechanism, where a sender of data waits for a confirmation of receipt after sending every N bytes, before sending any more data - this would explain the available network bandwidth not being fully used, since the client would simply sit there and wait for a response (which takes a longer time because of the lower ping), instead of using the available bandwidth. Though I haven't looked at ffmpeg's rtmp/librtmp code to see if it actually implements this kind of throttling, so it could be something else entirely.
The full code of the application is too much to post here, but here are some important snippets:
Format context creation:
const int nAVFormatContextCreateError = avformat_alloc_output_context2(&m_pAVFormatContext, nullptr, "flv", m_sOutputUrl.c_str());
Stream creation:
m_pVideoAVStream = avformat_new_stream(m_pAVFormatContext, nullptr);
m_pVideoAVStream->id = m_pAVFormatContext->nb_streams - 1;
m_pAudioAVStream = avformat_new_stream(m_pAVFormatContext, nullptr);
m_pAudioAVStream->id = m_pAVFormatContext->nb_streams - 1;
Video stream setup:
m_pVideoAVStream->codecpar->codec_type = AVMEDIA_TYPE_VIDEO;
m_pVideoAVStream->codecpar->codec_id = AV_CODEC_ID_H264;
m_pVideoAVStream->codecpar->width = nWidth;
m_pVideoAVStream->codecpar->height = nHeight;
m_pVideoAVStream->codecpar->format = AV_PIX_FMT_YUV420P;
m_pVideoAVStream->codecpar->bit_rate = 10 * 1000 * 1000;
m_pVideoAVStream->time_base = AVRational { 1, 1000 };
m_pVideoAVStream->codecpar->extradata_size = int(nTotalSizeRequired);
m_pVideoAVStream->codecpar->extradata = (uint8_t*)av_malloc(m_pVideoAVStream->codecpar->extradata_size + AV_INPUT_BUFFER_PADDING_SIZE);
// Fill in the extradata here - I'm sure I'm doing that correctly.
Audio stream setup:
m_pAudioAVStream->time_base = AVRational { 1, 1000 };
// Let's leave creation of m_pAudioCodecContext out of the scope of this question, I'm quite sure everything is done right there.
const int nAudioCodecCopyParamsError = avcodec_parameters_from_context(m_pAudioAVStream->codecpar, m_pAudioCodecContext);
Opening the connection:
const int nAVioOpenError = avio_open2(&m_pAVFormatContext->pb, m_sOutputUrl.c_str(), AVIO_FLAG_WRITE);
Starting the stream:
AVDictionary * pOptions = nullptr;
const int nWriteHeaderError = avformat_write_header(m_pAVFormatContext, &pOptions);
Sending a video frame:
AVPacket pkt = { 0 };
av_init_packet(&pkt);
pkt.dts = nTimestamp;
pkt.pts = nTimestamp;
pkt.duration = nDuration; // I know that I have the wrong duration sometimes, but I don't think that's the issue.
pkt.data = pFrameData;
pkt.size = pFrameDataSize;
pkt.flags = bKeyframe ? AV_PKT_FLAG_KEY : 0;
pkt.stream_index = m_pVideoAVStream->index;
const int nWriteFrameError = av_write_frame(m_pAVFormatContext, &pkt); // This is where too much time is spent.
Sending an audio frame:
AVPacket pkt = { 0 };
av_init_packet(&pkt);
pkt.pts = m_nTimestampMs;
pkt.dts = m_nTimestampMs;
pkt.duration = m_nDurationMs;
pkt.stream_index = m_pAudioAVStream->index;
const int nWriteFrameError = av_write_frame(m_pAVFormatContext, &pkt);
Any ideas? Am I on the right track with thinking about the Acknowledgement Window? Am I doing something else completely wrong?
I don't think this explains everything, but, just in case, for someone in a similar situation, the fix/workaround I found was:
1) build ffmpeg with the librtmp implementation of the rtmp protocol
2) build ffmpeg with --enable-network; it adds a couple of features to the librtmp protocol
3) pass the "rtmp_buffer_size" parameter to avio_open2, and increase its value to a satisfactory one
I can't give you a full step-by-step explanation of what was going wrong, but this fixed at least the symptom that was causing me problems.
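For completeness, here's roughly how step 3 looks in code. Treat this strictly as a configuration sketch: the option name "rtmp_buffer_size" and the 2 MB value below are what worked for my build, so check the protocol options of your own build (e.g. `ffmpeg -h protocol=rtmp`) before copying them.

```cpp
extern "C" {
#include <libavformat/avformat.h>
#include <libavutil/dict.h>
}

// Sketch only: the exact option name and a good value depend on your
// ffmpeg build; "rtmp_buffer_size" and 2 MB are what worked for me.
int OpenRtmpOutput(AVFormatContext * pFormatContext, const char * szUrl)
{
    AVDictionary * pOptions = nullptr;
    av_dict_set(&pOptions, "rtmp_buffer_size", "2097152", 0); // bytes

    const int nError = avio_open2(&pFormatContext->pb, szUrl,
                                  AVIO_FLAG_WRITE, nullptr, &pOptions);

    av_dict_free(&pOptions); // entries the protocol didn't consume remain here
    return nError;
}
```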
Related
I am trying to make a custom media sink for video playback in an OpenGL application (without the various WGL_NV_DX_interop extensions, as I am not sure all my target devices support them).
What I have done so far is to write a custom stream sink that accepts RGB32 samples and set up playback with a media session; however, I encountered a problem with initial testing of playing an mp4 file:
one (or more) of the MFTs in the generated topology keeps failing with the error code MF_E_TRANSFORM_NEED_MORE_INPUT, so my stream sink never receives samples
After a few samples have been requested, the media session receives the event MF_E_ATTRIBUTENOTFOUND, but I still don't know where it is coming from
If, however, I configure the stream sink to receive NV12 samples, everything seems to work fine.
My best guess is that the color converter MFT generated by the TopologyLoader needs some more configuration, but I don't know how to do that, considering that I need to keep this entire process independent from the original file types.
I've made a minimal test case that demonstrates the use of a custom video renderer with a classical Media Session.
I use big_buck_bunny_720p_50mb.mp4, and I don't see any problems using the RGB32 format.
Sample code here : https://github.com/mofo7777/Stackoverflow under MinimalSinkRenderer.
EDIT
Your program works well with big_buck_bunny_720p_50mb.mp4. I think that your mp4 file is the problem. Share it, if you can.
I just made a few changes :
Stop on MESessionEnded, and Close on MESessionStopped:
case MediaEventType.MESessionEnded:
Debug.WriteLine("MediaSession:SessionEndedEvent");
hr = mediaSession.Stop();
break;
case MediaEventType.MESessionClosed:
Debug.WriteLine("MediaSession:SessionClosedEvent");
receiveSessionEvent = false;
break;
case MediaEventType.MESessionStopped:
Debug.WriteLine("MediaSession:SessionStoppedEvent");
hr = mediaSession.Close();
break;
default:
Debug.WriteLine("MediaSession:Event: " + eventType);
break;
Adding this to wait for the sound, and to check that the sample is OK:
internal HResult ProcessSample(IMFSample s)
{
//Debug.WriteLine("Received sample!");
CurrentFrame++;
if (s != null)
{
long llSampleTime = 0;
HResult hr = s.GetSampleTime(out llSampleTime);
if (hr == HResult.S_OK && ((CurrentFrame % 50) == 0))
{
TimeSpan ts = TimeSpan.FromMilliseconds(llSampleTime / 10000); // sample time is in 100-ns units
Debug.WriteLine("Frame {0} : {1}", CurrentFrame.ToString(), ts.ToString());
}
// Do not call SafeRelease here, it is done by the caller, it is a parameter
//SafeRelease(s);
}
System.Threading.Thread.Sleep(26);
return HResult.S_OK;
}
In
public HResult SetPresentationClock(IMFPresentationClock pPresentationClock)
adding
SafeRelease(PresentationClock);
before
if (pPresentationClock != null)
PresentationClock = pPresentationClock;
In a user-space application, I'm writing v210-formatted video data to a V4L2 loopback device. When I watch the video in VLC or another viewer, I just get clownbarf, and the viewer claims that the stream is UYVY or something else, not v210. I suspect I need to tell the loopback device something more than what I have, to make the stream appear as v210 to the viewer. Is there one more place/way to tell it that it'll be handling a certain format?
What I do now:
int frame_w, frame_h = ((some sane values))
outputfd = open("/dev/video4", O_RDWR);
// check VIDIOC_QUERYCAPS, ...
struct v4l2_format fmt;
memset(&fmt, 0, sizeof(fmt));
fmt.type = V4L2_BUF_TYPE_VIDEO_OUTPUT;
fmt.fmt.pix.width = frame_w;
fmt.fmt.pix.height = frame_h;
fmt.fmt.pix.bytesperline = inbpr; // no padding
fmt.fmt.pix.field = V4L2_FIELD_NONE; // = 1, progressive
fmt.fmt.pix.sizeimage = frame_h * fmt.fmt.pix.bytesperline;
fmt.fmt.pix.colorspace = V4L2_COLORSPACE_SRGB;
v210width = (((frame_w+47)/48)*48); // round up to mult of 48 px
byte_per_row = (v210width*8)/3;
fmt.fmt.pix.pixelformat = v4l2_fourcc('v', '2', '1', '0'); // same as 'v' | '2' << 8 | '1' << 16 | '0' << 24
fmt.fmt.pix.width = v210width;
fmt.fmt.pix.bytesperline = byte_per_row ;
ioctl(outputfd, VIDIOC_S_FMT, &fmt);
// later, in some inner loop...
... write stuff to uint8_t buffer[] ...
write(outputfd, buffer, buffersize);
If I write UYVY format, or RGB or others, it can be made to work. Viewers display the video and report the correct format.
This code is based on examples, reading the V4L docs, and some working in-house code. No one here knows exactly all the things one must do to open and write to a video device.
While there is an easily found example online of how to read video from a V4L device, I couldn't find a similar quality example for writing. If such exists, it may show the missing piece.
Context: I have a file called libffmpeg.so that I took from the APK of an Android application that uses FFMPEG to encode and decode files between several codecs. Thus, I take for granted that it was compiled with encoding options enabled and that this .so file contains all the codecs somewhere. This file is compiled for ARM (what we call the ARMEABI profile on Android).
I also have a very complete class with interops to call the ffmpeg API. Whatever the origin of this shared library, all call responses are good and most entry points exist. If not, I add them or fix deprecated ones.
When I want to create an ffmpeg Encoder, the returned encoder is correct.
var thisIsSuccessful = avcodec_find_encoder(myAVCodec.id);
Now, I have a problem with Codecs. The problem is that - let's say that out of curiosity - I iterate through the list of all the codecs to see which one I'm able to open with the avcodec_open call ...
AVCodec* res = null; // av_codec_next(null) starts the walk at the first codec
while((res = FFmpeg.av_codec_next(res)) != null)
{
var name = res->longname;
AVCodec* encoder = FFmpeg.avcodec_find_encoder(res->id);
if (encoder != null) {
AVCodecContext c = new AVCodecContext ();
/* put sample parameters */
c.bit_rate = 64000;
c.sample_rate = 22050;
c.channels = 1;
if (FFmpeg.avcodec_open (ref c, encoder) >= 0) {
System.Diagnostics.Debug.WriteLine ("[YES] - " + name);
}
} else {
System.Diagnostics.Debug.WriteLine ("[NO ] - " + name);
}
}
... then only uncompressed codecs are working. (YUV, FFmpeg Video 1, etc)
My hypotheses are these:
An option that was missing at the time of compiling to the .so file
The avcodec_open call acts depending on the properties of the AVCodecContext I've referenced in the call.
I'm really curious about why only a minimal set of uncompressed codecs is returned.
[EDIT]
Ronald S. Bultje's answer led me to read the AVCodecContext API description, and there are a lot of mandatory fields marked "MUST be set by user" when used on an encoder. Setting a value for these parameters on AVCodecContext made most of the nice codecs available:
c.time_base = new AVRational (); // Output framerate. Here, 30fps
c.time_base.num = 1;
c.time_base.den = 30;
c.me_method = 1; // Motion-estimation mode on compression -> 1 is none
c.width = 640; // Source width
c.height = 480; // Source height
c.gop_size = 30; // Used by h264. Just here for test purposes.
c.bit_rate = c.width * c.height * 4; // Randomly set to that...
c.pix_fmt = FFmpegSharp.Interop.Util.PixelFormat.PIX_FMT_YUV420P; // Source pixel format
The avcodec_open call acts depending on the properties of the
AVCodecContext I've referenced in the call.
It's basically that. I mean, for the video encoders, you didn't even set width/height, so most encoders really can't be expected to do anything useful like that, and are right to error out.
You can set default parameters using e.g. avcodec_get_context_defaults3(), which should go a long way toward getting some useful settings into the AVCodecContext. After that, set typical ones like width/height/pix_fmt to the ones describing your input format (if you want to do audio encoding - which is actually surprisingly unclear from your question - you'll need to set some different ones like sample_fmt/sample_rate/channels, but same idea). And then you should be relatively good to go.
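A minimal sketch of the above for a video encoder, assuming a reasonably recent libavcodec, where avcodec_alloc_context3() already applies the codec's defaults (the same job avcodec_get_context_defaults3() does for an existing context). All the concrete numbers are placeholders, not recommendations:

```cpp
extern "C" {
#include <libavcodec/avcodec.h>
}

// Sketch: allocate a context pre-filled with the codec's defaults, then set
// the fields an encoder cannot guess (dimensions, pixel format, time base).
AVCodecContext * OpenVideoEncoder(AVCodecID codecId)
{
    const AVCodec * pCodec = avcodec_find_encoder(codecId);
    if (!pCodec)
        return nullptr;

    AVCodecContext * pContext = avcodec_alloc_context3(pCodec);
    pContext->width = 640;                     // placeholder source width
    pContext->height = 480;                    // placeholder source height
    pContext->pix_fmt = AV_PIX_FMT_YUV420P;    // placeholder source format
    pContext->time_base = AVRational{ 1, 30 }; // placeholder 30 fps
    pContext->bit_rate = 640 * 480 * 4;        // placeholder, as in the question

    if (avcodec_open2(pContext, pCodec, nullptr) < 0) {
        avcodec_free_context(&pContext);
        return nullptr;
    }
    return pContext;
}
```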
Imagine I have H.264 AnxB frames coming in from a real-time conversation. What is the best way to encapsulate in MPEG2 transport stream while maintaining the timing information for subsequent playback?
I am using the libavcodec and libavformat libraries. When I obtain a pointer to an object (*pcc) of type AVCodecContext, I set the following:
pcc->codec_id = CODEC_ID_H264;
pcc->bit_rate = br;
pcc->width = 640;
pcc->height = 480;
pcc->time_base.num = 1;
pcc->time_base.den = fps;
When I receive NAL units, I create an AVPacket and call av_interleaved_write_frame():
AVPacket pkt;
av_init_packet( &pkt );
pkt.flags |= AV_PKT_FLAG_KEY;
pkt.stream_index = pst->index;
pkt.data = (uint8_t*)p_NALunit;
pkt.size = len;
pkt.dts = AV_NOPTS_VALUE;
pkt.pts = AV_NOPTS_VALUE;
av_interleaved_write_frame( fc, &pkt );
I basically have two questions:
1) For a variable framerate, is there a way to not specify the following:
pcc->time_base.num = 1;
pcc->time_base.den = fps;
and replace it with something to indicate variable framerate?
2) While submitting packets, what "timestamps" should I assign to
pkt.dts and pkt.pts?
Right now, when I play the output using ffplay it is playing at constant framerate (fps) which I use in the above code.
I also would love to know how to accommodate varying spatial resolution. In the stream that I receive, each keyframe is preceded by SPS and PPS. I know whenever the spatial resolution changes.
Is there a way to not have to specify
pcc->width = 640;
pcc->height = 480;
upfront? In other words, indicate that the spatial resolution can change mid-stream.
Thanks a lot,
Eddie
DTS and PTS are measured in a 90 kHz clock. See ISO 13818 part 1, section 2.4.3.6, way down below the syntax table.
As for the variable frame rate, your framework may or may not have a way to generate this (vui_parameters.fixed_frame_rate_flag=0). Whether the playback software handles it is an ENTIRELY different question. Most players assume a fixed frame rate regardless of PTS or DTS. mplayer can't even compute the frame rate correctly for a fixed-rate transport stream generated by ffmpeg.
I think if you're going to change the resolution you need to end the stream (nal_unit_type 10 or 11) and start a new sequence. It can be in the same transport stream (assuming your client's not too simple).
How would I limit upload speed from the server in node.js?
Is this even an option?
Scenario: I'm writing some methods to allow users to upload files to my server in an automated way. I want to limit the upload speed to (for instance) 50 kB/s (configurable, of course).
I do not think you can force a client to stream at a predefined speed; however, you can control the "average speed" of the entire process.
var startTime = Date.now(),
    totalBytes = ..., //NOTE: you need the client to give you the total amount of incoming bytes
    curBytes = 0;

stream.on('data', function(chunk) { //NOTE: chunk is expected to be a Buffer; if it's a string, count its bytes differently
    curBytes += chunk.length;
    var offsetTime = calcReqDelay(targetUploadSpeed);
    if (offsetTime > 0) {
        stream.pause();
        setTimeout(function() { stream.resume(); }, offsetTime); // the delay is the second argument
    }
});

function calcReqDelay(targetUploadSpeed) { //speed in bytes per second
    var timePassed = Date.now() - startTime;
    var targetBytes = targetUploadSpeed * timePassed / 1000;
    //how long to wait until the average drops back to the target
    //(negative means we actually could be faster)
    return (curBytes - targetBytes) / targetUploadSpeed * 1000;
}
This is of course pseudo code, but you probably get the point. There may be another, and better, way which I do not know about. In such case, I hope someone else will point it out.
Note that it is also not very precise, and you may want to have a different metric than the average speed.
Use the throttle module to control the pipe stream speed:
npm install throttle
var Throttle = require('throttle');
// create a "Throttle" instance that reads at 1 byte per second
var throttle = new Throttle(1);
req.pipe(throttle).pipe(gzip).pipe(res);
Instead of rolling your own, the normal way to do this in production, is to let your load balancer or entry server throttle the incoming requests. See http://en.wikipedia.org/wiki/Bandwidth_throttling. It's not typically something an application needs to handle itself.