Encapsulating variable-framerate H.264 streams in an MPEG-2 transport stream - ffmpeg

Imagine I have H.264 Annex B frames coming in from a real-time conversation. What is the best way to encapsulate them in an MPEG-2 transport stream while maintaining the timing information for subsequent playback?
I am using the libavcodec and libavformat libraries. When I obtain a pointer (*pcc) to an AVCodecContext object, I set the following:
pcc->codec_id = CODEC_ID_H264;
pcc->bit_rate = br;
pcc->width = 640;
pcc->height = 480;
pcc->time_base.num = 1;
pcc->time_base.den = fps;
When I receive NAL units, I create an AVPacket and call av_interleaved_write_frame().
AVPacket pkt;
av_init_packet( &pkt );
pkt.flags |= AV_PKT_FLAG_KEY;
pkt.stream_index = pst->index;
pkt.data = (uint8_t*)p_NALunit;
pkt.size = len;
pkt.dts = AV_NOPTS_VALUE;
pkt.pts = AV_NOPTS_VALUE;
av_interleaved_write_frame( fc, &pkt );
I basically have two questions:
1) For variable framerate, is there a way to not specify the following
pcc->time_base.num = 1;
pcc->time_base.den = fps;
and replace it with something to indicate variable framerate?
2) While submitting packets, what "timestamps" should I assign to
pkt.dts and pkt.pts?
Right now, when I play the output using ffplay, it plays at the constant framerate (fps) that I used in the code above.
I also would love to know how to accommodate varying spatial resolution. In the stream that I receive, each keyframe is preceded by SPS and PPS. I know whenever the spatial resolution changes.
Is there a way to not have to specify
pcc->width = 640;
pcc->height = 480;
upfront? In other words, indicate that the spatial resolution can change mid-stream.
Thanks a lot,
Eddie

DTS and PTS are measured in a 90 kHz clock. See ISO 13818 part 1, section 2.4.3.6, way down below the syntax table.
As for the variable frame rate, your framework may or may not have a way to generate this (vui_parameters.fixed_frame_rate_flag=0). Whether the playback software handles it is an ENTIRELY different question. Most players assume a fixed frame rate regardless of PTS or DTS. mplayer can't even compute the frame rate correctly for a fixed-rate transport stream generated by ffmpeg.
I think if you're going to change the resolution you need to end the stream (nal_unit_type 10 or 11) and start a new sequence. It can be in the same transport stream (assuming your client's not too simple).

Related

Can't get the right formula to set frame pts for a stream using libav

I'm trying to save a stream of frames as mp4.
The source framerate is not fixed; it stays in the range [15, 30].
Encoder params:
...
eCodec.time_base = AVRational(1,3000);
eCodec.framerate = AVRational(30, 1);
...
Stream params:
eStream = avformat_new_stream(eFormat, null);
eStream.codecpar = codecParams;
eStream.time_base = eCodec.time_base;
Decoder time_base is 0/1 and it marks each frame with a pts like:
480000
528000
576000
...
PTS(f) is always == PTS(f-1)+48000
Encoding (dFrame is the received frame, micro the elapsed time in microseconds):
av_frame_copy(eFrame, dFrame);
eFrame.pts = micro*3/1000;
This makes the video play too fast.
I can't understand why, but changing micro*3/1000 to micro*3*4/1000 makes the video play at the correct speed (checked against a clock after many minutes of varying fps).
What am I missing?

How to write a video stream containing B-frame and no DTS to a MP4 container?

I want to save an H.264 video stream received from an RTSP source to an MP4 container.
Unlike other questions asked on SO, the challenges I face here are:
The stream contains B frames.
The stream has only PTS given by the RTP/RTCP.
Here is the code I did
// ffmpeg
pkt->data = ..;
pkt->size = ..;
pkt->flags = bKeyFrame? AV_PKT_FLAG_KEY : 0;
pkt->dts = AV_NOPTS_VALUE;
pkt->pts = PTS;
// PTS is based on epoch microseconds so I ignored re-scaling.
//av_packet_rescale_ts(pkt, { 1, AV_TIME_BASE }, muxTimebase);
auto ret = av_interleaved_write_frame(m_pAVFormatCtx, pkt);
I received a lot of error messages like this:
"Application provided invalid, non monotonically increasing dts to muxer ...".
Result: the mp4 file is playable via VLC, but the FPS is just half the original FPS, and the video duration is incorrect (VLC shows a weird number).
So how do I set correct DTS and PTS before sending to the container?
Update:
I have tried some changes, though not successfully yet. I found that the frame rate drop is caused by the muxer discarding frames with incorrect DTS.
Additionally, if I set the starting PTS and DTS values too big, some players like VLC delay for some time before showing video.
I have done several experiments and have some things to share with you.
Regardless of whether there are B-frames, the mp4 muxer requires (at least) the following:
DTS monotonically increasing.
DTS <= PTS for each frame.
PTS and DTS starting from values close to zero (otherwise players like VLC delay for some time before displaying video).
If there are no B-frames in the stream, DTS can simply be copied from PTS, and frames can be saved to an mp4 file without any issue.
If there are B-frames in the stream, the story is totally different.
In that case, the PTS of frames is not monotonically increasing, because of the B-frames.
Hence, just copying DTS = PTS definitely won't work. We have to obtain DTS either by sending it out-of-band or by calculating it from the FPS and PTS.
Sending it out-of-band is quite complicated, because it requires handling both the RTSP server and the RTSP client. Here I just want to show the simple way of deducing DTS from FPS and PTS.
Rough steps are like this:
Detect the average duration (or FPS) between frames:
Parse the FPS from the SDP of the receiving RTSP session. This depends on RTSP server support; some servers provide it, others do not.
Another way is to calculate the average duration from a sequence of frames. Buffer a number of frames equal to the size of one GOP; the PTS difference between the first and last frame of the GOP, divided by the number of frames, gives the average duration. For example, if the FPS is assumed to be 30, the calculated average duration should be approximately 33,333 us.
Saving to the container
// Initialize the container
pAVStream->time_base = { 1, AV_TIME_BASE }; // PTS/DTS in microseconds.
pAVFormatCtx->oformat->flags |= AVFMT_SEEK_TO_PTS;
ret = avformat_write_header(m_pAVFormatCtx, &priv_opts);
Assume that you have pre-calculated the average duration:
nAvgDuration = 33'333LL;
// Per each frame
if (waitingForTheFirstKeyFrame) {
    if (!bsKeyFrame) {
        return false;
    }
    waitingForTheFirstKeyFrame = false;
    nPTSOffset = nPTS;               // pts will start from 0
    nStartDTS = nPTS - nAvgDuration; // dts will start from -nAvgDuration
}
nDTS = nStartDTS;
nStartDTS += nAvgDuration; // dts is monotonically increasing
pkt->pts = nPTS - nPTSOffset;
pkt->dts = nDTS - nPTSOffset;
// Since PTS/DTS are in microseconds, no need for further rescaling.
// Of course, you can use a different time_base.
auto ret = av_interleaved_write_frame(m_pAVFormatCtx, pkt);
Caution:
This solution works well under the assumption that the original PTS of the stream (at the server side) is monotonically increasing, there are no gaps between frames, and there is no frame loss. Otherwise, the accuracy of DTS may be reduced, or the mp4 file may not even be playable.
It is not normal that "The stream has only PTS given by the RTP/RTCP." Something is wrong here.
If there is no dts, it means you should use pts only. If there really are B-frames, then you would have dts values that differ from pts.
Try dts = pts in your code and see what happens.

Using FFMPEG to make HLS clips from H264

I am using a Hi35xx camera processor from HiSilicon. It is an Arm9 with a video pipeline bolted on the side. At one end of the pipeline is the CMOS sensor. At the other end is a H264 encoder. When I turn on the pipeline, the encoder outputs H264 NAL packets like this:
frame0: <SPS>,<PPS>,<SEI>,<key frame>
frame1: <delta frame>
frame2: <delta frame>
...
frameN: <delta frame>
frameN+1: <SPS>,<PPS>,<SEI><key frame>
frameN+2: <delta frame>
frameN+3: <delta frame>
...
etc.
I am turning that into HLS clips by doing the following (pseudo code for clarity):
av_register_all();
avformat_network_init();
avformat_alloc_output_context2(&ctx_out, NULL, "hls", "./foo.m3u8");
strm_out = avformat_new_stream(ctx_out, NULL);
codec_out = strm_out->codecpar;
codec_out->codec_id = AV_CODEC_ID_H264;
codec_out->codec_type = AVMEDIA_TYPE_VIDEO;
codec_out->width = encoder_width;
codec_out->height = encoder_height;
codec_out->bit_rate = encoder_bitrate;
codec_out->codec_tag = 0;
avformat_write_header(ctx_out, NULL);
while (get_packet_from_pipeline_encoder(&encoder_packet)) {
    AVPacket pkt;
    av_init_packet(&pkt);
    pkt.stream_index = 0;
    pkt.dts = AV_NOPTS_VALUE;
    pkt.pts = AV_NOPTS_VALUE;
    pkt.duration = (1000000 / FRAMERATE); // frame duration in microseconds
    pkt.data = encoder_packet.data;
    pkt.size = encoder_packet.size;
    if (is_keyframe(&encoder_packet)) {
        pkt.flags |= AV_PKT_FLAG_KEY;
    }
    av_write_frame(ctx_out, &pkt);
}
av_write_trailer(ctx_out);
avformat_free_context(ctx_out);
This seems to work fine, except that the resulting HLS frame rate is not right. Of course, this happens because I am not setting the pts/dts stuff correctly, and ffmpeg lets me know that. So I have two questions:
Am I going about this right?
How can I set the pts/dts stuff correctly?
The encoder is giving me packets and I am submitting them as frames. Those <SPS>, <PPS> and <SEI> packets are really out of band data and don't really have a timestamp. How can I submit them correctly?
My conclusion is that I was going about this the wrong way. The basic problem was that I didn't have an input context, so there was no h264 parser to take the SPS, PPS and SEI packets and do anything with them. I suspect my loop appears to work because I am writing to an 'mpegts' file, which is just h264 packets with the leading NAL zeroes replaced with length words (some bit-stream filter is doing that). But that means there isn't much chance of getting the timestamps right, because I have to submit them as frames. I can't submit them as 'extradata/sidedata' because there is no decoder to catch them.
This problem is fixed by writing a custom IO context for the output of my encoder and then using a normal input context. I have done some experiments with this approach and it seems to work.

Play a PCM stream sampled at 16 kHz

I receive an input frame stream through a socket; it is a mono 32-bit IEEE floating-point PCM stream sampled at 16 kHz (audio file sample).
With Audacity I can visualize it, and I see regular cuts in my audio stream. I play it with the following code:
var audioCtx = new(window.AudioContext || window.webkitAudioContext)();
var audioBuffer = audioCtx.createBuffer(1, 256, 16000);
var BufferfloatArray;
var source = audioCtx.createBufferSource();
source.buffer = audioBuffer;
var gainNode = audioCtx.createGain();
gainNode.gain.value = 0.1;
gainNode.connect(audioCtx.destination);
source.connect(gainNode);
source.start(0);
socket.on('audioFrame', function(raw) {
    var context = audioCtx;
    BufferfloatArray = new Float32Array(raw);
    var src = context.createBufferSource();
    audioBuffer.getChannelData(0).set(BufferfloatArray);
    src.buffer = audioBuffer;
    src.connect(gainNode);
    src.start(0);
});
I think it is because the sample rate of my raw buffer (16000) is different from the sample rate of my audio context (44100). What do you think?
This is not a sample rate problem, because the AudioBufferSourceNode resamples the audio to the AudioContext's rate when playing.
What you should do here is keep a little queue of buffers that you fill from the network, and then play your buffers normally like you do, but from the buffer queue, taking extra care to schedule them (using the first parameter of the start method of the AudioBufferSourceNode) at the right time, so that the end of the previous buffer is exactly the start of the next one. You can use the AudioBuffer.duration property to achieve this (duration is in seconds).

Is packet duration guaranteed to be uniform for entire stream?

I use packet duration to translate from frame index to pts and back, and I'd like to be sure that this is a reliable method of doing so.
Alternatively, is there a better way to translate pts to a frame index and vice versa?
A snippet showing my usage:
bool seekFrame(int64_t frame)
{
    if (frame > container.frameCount)
        frame = container.frameCount;

    // Seek to a frame behind the desired frame because nextFrame() will also increment the frame index
    int64_t seek = pts_cache[frame-1]; // pts_cache is an array of all frame pts values

    // get the nearest prior keyframe
    int precedingKeyframe = av_index_search_timestamp(container.video_st, seek, AVSEEK_FLAG_BACKWARD);

    // here's where I'm worried that packetDuration isn't a reliable method of translating frame index to
    // pts value
    int64_t nearestKeyframePts = precedingKeyframe * container.packetDuration;

    avcodec_flush_buffers(container.pCodecCtx);
    int ret = av_seek_frame(container.pFormatCtx, container.videoStreamIndex, nearestKeyframePts, AVSEEK_FLAG_ANY);
    if (ret < 0) return false;

    container.lastPts = nearestKeyframePts;

    AVFrame *pFrame = NULL;
    while (nextFrame(pFrame, NULL) && container.lastPts < seek)
    {
        ;
    }
    container.currentFrame = frame-1;
    av_free(pFrame);
    return true;
}
No, not guaranteed. It may work with some codec/container combinations where the frame rate is static; avi, raw h264 (Annex B) and yuv4mpeg come to mind. But other containers like flv, mp4 and ts have a PTS/DTS (or CTS) for EVERY frame. The source could be variable frame rate, or frames could have been dropped at some point during processing due to bandwidth. Also, some codecs will remove duplicate frames.
So unless you created the file yourself, do not trust it. There is no guaranteed way to look at a frame and know its 'index' except to start at the beginning and count.
Your method MAY be good enough for most files, however.
