MPEG-2 TS end-to-end delay

I need to calculate the end-to-end delay (between the encoder and the decoder) in an MPEG-2 TS
based on the timestamp information (PTS, PCR, DTS). Are those timestamps enough to calculate the delay?

These timestamps are inserted into the transport stream by the encoder and are used by the decoder: to synchronize audio and video frames and, more generally, to lock onto the encoder's original clock so that the video is presented correctly.
The delay between an encoder and a decoder, on the other hand, is like asking what the delay is between transmitting data at the source and receiving it at the destination. That is not determined by the data itself (i.e. the transport stream and the timestamps within it) but by the network conditions.
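For context, here is a minimal sketch (assuming a single 188-byte TS packet in pkt; field layout per ISO/IEC 13818-1) of pulling the PCR out of the adaptation field. It shows that these values are samples of the encoder's 27 MHz clock, not of wall-clock transmission time:

```c
#include <stdint.h>

/* Sketch: extract the 27 MHz PCR from one 188-byte TS packet, if present.
 * Returns -1 when the packet carries no PCR. */
int64_t extract_pcr(const uint8_t pkt[188])
{
    if (pkt[0] != 0x47)
        return -1;                                    /* lost sync */

    int afc = (pkt[3] >> 4) & 0x3;                    /* adaptation_field_control */
    if (!(afc & 0x2) || pkt[4] < 7 || !(pkt[5] & 0x10))
        return -1;                                    /* no adaptation field or no PCR flag */

    int64_t base = ((int64_t)pkt[6] << 25) | ((int64_t)pkt[7] << 17) |
                   ((int64_t)pkt[8] << 9)  | ((int64_t)pkt[9] << 1)  |
                   (pkt[10] >> 7);                    /* 33-bit PCR base (90 kHz) */
    int64_t ext  = ((pkt[10] & 0x01) << 8) | pkt[11]; /* 9-bit PCR extension */

    return base * 300 + ext;                          /* total in 27 MHz ticks */
}
```

Comparing successive PCR values with a local clock tells you about arrival jitter on the link, but the absolute encoder-to-decoder delay still has to be measured outside the stream itself (for example by stamping packets with a synchronized wall clock), exactly as the answer above says.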

Related

GOP size does not correlate with actual latency

As far as I know, GOP size should correlate with observable video delay (latency). For example, if the GOP size is 2, then the video delay should be near two seconds, and so on, at least with CBR. But when I set the GOP size to 2, publish the stream to an ingest server, consume that stream and measure the latency, it is between 0.8 and 1.2 seconds, not 2+ seconds as expected. Increasing the GOP size gives similar results: with a GOP of 4 the latency is near 2.5 seconds, not 4 seconds.
How I measure this latency: I stream a running stopwatch from a webcam via OBS to the ingest server and calculate the difference between the stopwatch value and the value displayed in the stream consumed from the ingest. For greater measurement accuracy, I take a photo with the stopwatch and the actual image from the ingest in one field of view.
My OBS settings are here:
Can you suggest why I get such results, and how valid is my assumption about the correlation between GOP size and video latency? Maybe H.264 settings like "zerolatency" work some magic?
Thanks.
For streaming, each group of pictures is made of IPPPPPP: a key frame followed by some number of seconds' worth of P-frames. In principle, an encoder need not incur a delay of any given length, so the GOP size by itself does not dictate latency. When you send constant bit rate streams, the delay happens because the encoder must sometimes recode some frames at a lower or higher bit rate.
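If you want to rule out encoder-side buffering as a latency source, a minimal sketch of a low-latency encoder configuration with libavcodec/libx264 might look like this (the preset, GOP length and pixel format here are illustrative, not the OBS defaults):

```c
#include <libavcodec/avcodec.h>
#include <libavutil/opt.h>

/* Sketch: open libx264 so that frames leave the encoder as soon as they are
 * encoded. The GOP length only controls key-frame spacing, not latency. */
AVCodecContext *make_low_latency_encoder(int width, int height, int fps)
{
    const AVCodec *codec = avcodec_find_encoder_by_name("libx264");
    AVCodecContext *ctx  = avcodec_alloc_context3(codec);

    ctx->width        = width;
    ctx->height       = height;
    ctx->time_base    = (AVRational){1, fps};
    ctx->pix_fmt      = AV_PIX_FMT_YUV420P;
    ctx->gop_size     = 2 * fps;  /* key frame roughly every 2 seconds */
    ctx->max_b_frames = 0;        /* B-frames add reordering delay */

    av_opt_set(ctx->priv_data, "preset", "veryfast",    0);
    av_opt_set(ctx->priv_data, "tune",   "zerolatency", 0); /* disables lookahead */

    if (avcodec_open2(ctx, codec, NULL) < 0) {
        avcodec_free_context(&ctx);
        return NULL;
    }
    return ctx;
}
```

With settings like these, the remaining end-to-end latency comes mostly from the ingest server and the player's buffering, which is consistent with the sub-GOP figures you measured.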

Convert LPCM buffer to AAC for HTTP Live Streaming

I have an application that records audio from devices into a Float32 (LPCM) buffer.
However, according to the HTTP Live Streaming specification, LPCM needs to be encoded into an audio format (MP3, AAC) before it can be used as a media segment for streaming. I have found some useful resources on how to convert an LPCM file to an AAC/MP3 file, but this is not exactly what I am looking for, since I want to convert a buffer, not a file.
What are the main differences between converting an audio file and converting a raw audio buffer (LPCM, Float32)? Is the latter simpler?
My initial thought was to create a thread that would regularly fetch data from a ring buffer (where the raw audio is stored) and convert it to a valid audio format (either AAC or MP3).
Would it be more sensible to convert immediately when the AudioBuffer is captured through an AURenderCallback, and hence do without the ring buffer?
Thanks for your help,
The Core Audio recording buffer length and the desired audio file length are rarely exactly the same. So it might be better to poll your circular/ring buffer (you know the sample rate, which tells you approximately how often) to decouple the two rates, and convert the buffer contents (once sufficiently filled) to a file at a later time. You can memory-map a raw audio file to the buffer, but there may or may not be any performance difference between that and asynchronously writing a temp file.
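To make the polling idea concrete, here is a minimal sketch; the ring buffer layout and the encode_chunk() helper are hypothetical, and in practice the encoding step would be something like an AudioConverter call:

```c
#include <stddef.h>

/* Hypothetical single-producer/single-consumer ring buffer of Float32 samples.
 * head/tail are monotonically increasing sample counts; memory-ordering
 * details are omitted for brevity. */
typedef struct {
    float  *data;
    size_t  capacity;        /* in samples */
    volatile size_t head;    /* advanced by the AURenderCallback */
    volatile size_t tail;    /* advanced by the encoder thread */
} RingBuffer;

extern void encode_chunk(const float *samples, size_t count); /* stand-in for AAC/MP3 encoding */

/* Called periodically from a worker thread. The polling interval follows from
 * the sample rate: 1024-sample chunks at 44.1 kHz fill up roughly every 23 ms. */
void drain_ring_buffer(RingBuffer *rb, size_t chunk_samples)
{
    while (rb->head - rb->tail >= chunk_samples) {
        size_t start = rb->tail % rb->capacity;
        /* Simplification: assumes chunk_samples divides capacity, so a chunk
         * never wraps around the end of the storage. */
        encode_chunk(&rb->data[start], chunk_samples);
        rb->tail += chunk_samples;
    }
}
```

Doing the conversion inside the render callback itself is usually avoided, because the encoder can block and real-time audio callbacks should not.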

av_interleaved_write_frame return 0 but no data written

I use ffmpeg to stream encoded AAC data, and I call
av_interleaved_write_frame()
to write each frame.
The return value is 0, which the documentation describes as success:
Write a packet to an output media file ensuring correct interleaving.
The packet must contain one audio or video frame. If the packets are already correctly interleaved, the application should call av_write_frame() instead as it is slightly faster. It is also important to keep in mind that completely non-interleaved input will need huge amounts of memory to interleave with this, so it is preferable to interleave at the demuxer level.
Parameters
s media file handle
pkt The packet containing the data to be written. pkt->buf must be set to a valid AVBufferRef describing the packet data. Libavformat takes ownership of this reference and will unref it when it sees fit. The caller must not access the data through this reference after this function returns. This can be NULL (at any time, not just at the end), to flush the interleaving queues. Packet's stream_index field must be set to the index of the corresponding stream in s.streams. It is very strongly recommended that timing information (pts, dts, duration) is set to correct values.
Returns
0 on success, a negative AVERROR on error.
However, I find that no data is written.
What did I miss? How can I solve it?
av_interleaved_write_frame() must hold data in memory before it writes it out. Interleaving is the process of taking multiple streams (one audio stream and one video stream, for example) and serializing them in a monotonic order. So, if you write an audio frame, it will be kept in memory until you write a video frame that comes 'later'. Once a later video frame is written, the audio frame can be flushed. This way streams can be processed at different speeds or in different threads, but the output is still monotonic. If you are only writing one stream (one AAC stream, no video), then use av_write_frame() as suggested.
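As a point of reference, a minimal sketch of feeding and flushing the muxer (assuming an AVFormatContext *oc already opened with avformat_write_header(); error handling trimmed):

```c
#include <libavformat/avformat.h>

/* Sketch: write one encoded packet. `pkt` is expected to come from the encoder
 * with valid pts/dts/duration expressed in `enc_time_base`. */
int write_packet(AVFormatContext *oc, AVPacket *pkt,
                 AVRational enc_time_base, int stream_index)
{
    /* Rescale timestamps to the stream's time base; without correct pts/dts
     * the muxer may buffer or misorder the packet. */
    av_packet_rescale_ts(pkt, enc_time_base, oc->streams[stream_index]->time_base);
    pkt->stream_index = stream_index;
    return av_interleaved_write_frame(oc, pkt);
}

int finish(AVFormatContext *oc)
{
    /* Passing NULL flushes whatever is still held in the interleaving queue. */
    av_interleaved_write_frame(oc, NULL);
    return av_write_trailer(oc);
}
```

If only one AAC stream is being written, replacing av_interleaved_write_frame() with av_write_frame() skips the queue entirely, as suggested above.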

RTP decoding issue on P-frames

I am streaming an RTSP stream from an IP camera. I have a parser which packages the data into frames based on the RTP payload type. The parser is able to process I-frames, since these contain start-of-frame and end-of-frame packets as well as packets in between (this is the FU-A payload type).
These are combined to create a complete frame. The problem comes in when I try to construct P-frames. From the Wireshark dump, some of these appear to be fragmented (FU-A payload type): they contain the start-of-frame and end-of-frame packets, but no packets in between. Also, in some instances the camera sends strangely marked packets with a payload type of 1, which, according to my understanding, should be a complete frame.
When I process these two kinds of P-frames and then use ffmpeg to attempt to decode them, I receive error messages like "top block unavailable for intra mode 4x4".
At first I thought this could be due to an old ffmpeg version, but I searched the web and recompiled ffmpeg, and the problem remained.
The I-frames appear fragmented and contain lots of packets. Some P-frames have a start of frame (0x81) and an end of frame (0x41) but no packets in between, and some just look corrupt, starting with 0x41 (which seems like it should be the second byte), which gives a payload type of 1. I am a novice when it comes to these issues, but I looked at the RTP documentation and cannot find an issue with how I handle the data.
Also, I stream from VLC and this seems fine, although it appears to halve the frame rate; I am not sure how they are able to reconstruct frames.
Please could someone help.
It is common for I-frames to be fragmented, since they are usually a lot bigger than P-frames; however, P-frames can also be fragmented. There is nothing wrong with a P-frame that has been fragmented into two RTP packets, i.e. one with the FU header start bit set and the following one with the end bit set. There do not need to be packets in between. For example, if the MTU is 1500 and the NAL unit is 1600 bytes, it will be fragmented into two RTP packets.
As for the packets "looking corrupt" starting with 0x41 without a prior packet with a 0x81, you should examine the sequence number in the RTP header as this will tell you straight away if packets are missing. If you are seeing packet loss, the first thing to try is to increase your socket receiver buffer size.
Since VLC is able to play the stream, there is most likely an issue in the way you are reassembling the NAL units.
Also, in your question it is not always clear which byte you are referring to: I'm assuming that the 0x41 and 0x81 appear in the 2nd byte of the RTP payload, i.e. the FU header in the case where the NAL unit type of the first byte is FU-A.
Finally, note that "payload type" is the RTP payload type (RFC3550), not the NAL unit type defined in the H.264 standard.
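For reference, here is a minimal sketch of how the FU indicator and FU header bytes are usually interpreted when reassembling a NAL unit (field layouts per RFC 6184; begin_nal(), append_to_nal() and finish_nal() are hypothetical helpers, and RTP sequence-number checking is assumed to happen before this point):

```c
#include <stdint.h>
#include <stddef.h>

extern void begin_nal(uint8_t nal_header);                  /* hypothetical helper */
extern void append_to_nal(const uint8_t *data, size_t len); /* hypothetical helper */
extern void finish_nal(void);                               /* hand the complete NAL to the decoder */

/* Sketch: handle one H.264 RTP payload, i.e. everything after the 12-byte
 * fixed RTP header (the sequence number lives in bytes 2-3 of that header). */
void handle_h264_payload(const uint8_t *payload, size_t len)
{
    uint8_t nal_type = payload[0] & 0x1F;

    if (nal_type == 28 && len > 2) {          /* FU-A fragment */
        uint8_t fu_indicator = payload[0];
        uint8_t fu_header    = payload[1];    /* e.g. 0x81 = start, 0x41 = end */
        if (fu_header & 0x80) {
            /* Start fragment: rebuild the original NAL header from both bytes. */
            begin_nal((fu_indicator & 0xE0) | (fu_header & 0x1F));
        }
        append_to_nal(payload + 2, len - 2);
        if (fu_header & 0x40)
            finish_nal();                     /* end fragment: the NAL is complete */
    } else {
        /* Single NAL unit packet (types 1..23): the payload is the NAL itself. */
        begin_nal(payload[0]);
        append_to_nal(payload + 1, len - 1);
        finish_nal();
    }
}
```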

DirectShow push sources, syncing and timestamping

I have a filter graph that takes raw audio and video input and then uses the ASF Writer to encode them to a WMV file.
I've written two custom push source filters to provide the input to the graph. The audio filter just uses WASAPI in loopback mode to capture the audio and send the data downstream. The video filter takes raw RGB frames and sends them downstream.
For both the audio and video frames, I have the performance counter value for the time the frames were captured.
Question 1: If I want to properly timestamp the video and audio, do I need to create a custom reference clock that uses the performance counter or is there a better way for me to sync the two inputs, i.e. calculate the stream time?
The video input is captured from a Direct3D buffer somewhere else and I cannot guarantee the framerate, so it behaves like a live source. I always know the start time of a frame, of course, but how do I know the end time?
For instance, let's say the video filter ideally wants to run at 25 FPS, but due to latency and so on, frame 1 starts perfectly at the 1/25th mark but frame 2 starts later than the expected 2/25th mark. That means there's now a gap in the graph since the end time of frame 1 doesn't match the start time of frame 2.
Question 2: Will the downstream filters know what to do with the delay between frame 1 and 2, or do I manually have to decrease the length of frame 2?
One option is to omit time stamps, but this might result in filters failing to process the data. Another option is to use the System Reference Clock to generate time stamps; in any case this is preferable to using the performance counter directly as a time stamp source.
Yes, you need to time stamp the video and audio in order to keep them in sync; this is the only way to tell that the data is actually attributed to the same time.
Video samples effectively have no duration: you can omit the stop time or set it equal to the start time, and a gap between one video frame's stop time and the next frame's start time has no consequences.
Renderers are free to choose whether to respect time stamps or not; with audio, of course, you will want a smooth stream without gaps in the time stamps.
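To illustrate the timestamping arithmetic from the answer above, here is a sketch that converts a QueryPerformanceCounter reading taken at capture time into a stream-relative REFERENCE_TIME; the qpc_stream_start value (the counter sampled when the graph started running) is an assumption of this sketch:

```c
#include <windows.h>
#include <dshow.h>   /* REFERENCE_TIME: 100-nanosecond units */

/* Sketch: turn a performance-counter value captured with a frame into a
 * stream time relative to `qpc_stream_start`, the counter value sampled when
 * the graph went into the running state. */
REFERENCE_TIME qpc_to_stream_time(LARGE_INTEGER qpc_capture,
                                  LARGE_INTEGER qpc_stream_start)
{
    LARGE_INTEGER freq;
    QueryPerformanceFrequency(&freq);

    LONGLONG ticks = qpc_capture.QuadPart - qpc_stream_start.QuadPart;
    /* Scale counter ticks to 100-ns units (fine for sessions of a few hours;
     * longer runs would need wider intermediate arithmetic). */
    return (REFERENCE_TIME)((ticks * 10000000LL) / freq.QuadPart);
}
```

Because both push sources share the same origin, the resulting values keep audio and video aligned; you would pass them as the start time to IMediaSample::SetTime, with the stop time omitted or set equal to the start time as noted above.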
