Convert Jitter from RTP timestamp units to milliseconds - ffmpeg

I have a video conference app and I want to display the Interarrival Jitter to the user. I am getting this information from FFmpeg, and it follows RFC 3550 Appendix A.8, so the value is in timestamp units. I am not sure how to convert it. I am currently dividing the Jitter by 90000 (the video stream timebase). Is this correct?
Similar question: Jitter units for Live555
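A minimal sketch of the unit arithmetic, assuming the video stream uses the common 90 kHz RTP clock (check the `a=rtpmap` line of your SDP; audio streams usually use their sampling rate instead). Because the jitter is expressed in RTP clock ticks, dividing by the clock rate gives seconds, so you multiply by 1000 (equivalently, divide by 90 at 90 kHz) to get milliseconds. The `update_jitter` helper mirrors the RFC 3550 A.8 estimator and is included only to show that every quantity stays in timestamp units; the function names are hypothetical.

```python
def update_jitter(jitter, prev_transit, arrival_ts_units, rtp_timestamp):
    """One step of the RFC 3550 A.8 jitter estimator.

    arrival_ts_units must already be expressed in RTP clock ticks, so the
    returned jitter is in RTP timestamp units as well.
    Returns (new_jitter, transit) so the caller can keep the transit time.
    """
    transit = arrival_ts_units - rtp_timestamp
    if prev_transit is None:
        # First packet: no previous transit time to compare against yet.
        return jitter, transit
    d = abs(transit - prev_transit)
    # Exponential smoothing with gain 1/16, as in the RFC.
    jitter += (d - jitter) / 16.0
    return jitter, transit


def jitter_to_ms(jitter_ts_units, clock_rate_hz=90_000):
    """Convert jitter from RTP timestamp units to milliseconds.

    ASSUMPTION: clock_rate_hz is the RTP clock rate of the stream
    (90 kHz is typical for video payloads).
    """
    return jitter_ts_units * 1000.0 / clock_rate_hz
```

For example, a reported jitter of 450 timestamp units on a 90 kHz stream is 5 ms; dividing by 90000 alone would give 0.005, which is in seconds.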

Related

Converting image sequence to video with inconsistent frame rate

I recently collected video data in which each video was recorded as an image sequence. However, different numbers of frames were acquired for videos of the same length, which made me think that the frame rate varies between videos. So my question is: how do I convert these image sequences back to video with accurate durations between frames? Is there a way to get that information from the date and time each image was created, using code? I know ffmpeg seems to be the tool many people use.
I am not sure where to start. I am not very familiar with coding, so I already have trouble running the correct commands.
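One possible approach, sketched under a loud assumption: each image's file modification time (`st_mtime`) was set when the frame was captured and has not been changed since (copying files can reset it, and true creation time is not portably available). The script below writes a script for ffmpeg's concat demuxer in which every image gets a `duration` equal to the gap until the next image. The function and file names are hypothetical, and the last frame simply reuses the previous gap because it has no successor.

```python
from pathlib import Path


def _quote(path):
    # Concat-script quoting: wrap in single quotes, write a literal ' as '\''
    return "'" + str(path).replace("'", "'\\''") + "'"


def build_concat_list(image_dir, pattern="*.png", out_file="frames.txt"):
    """Write an ffmpeg concat-demuxer script whose per-image durations come
    from the gaps between the images' modification times.

    ASSUMPTION: mtime reflects the capture time of each frame.
    Returns the list of durations (seconds) that were written.
    """
    images = sorted(Path(image_dir).glob(pattern), key=lambda p: p.stat().st_mtime)
    if len(images) < 2:
        raise ValueError("need at least two images to derive frame durations")
    times = [p.stat().st_mtime for p in images]

    # Duration of frame i = time until frame i+1 was written.
    durations = [later - earlier for earlier, later in zip(times, times[1:])]
    # The last frame has no successor; reuse the previous gap (a guess).
    durations.append(durations[-1])

    lines = ["ffconcat version 1.0"]
    for img, dur in zip(images, durations):
        lines.append(f"file {_quote(img.resolve())}")
        lines.append(f"duration {dur:.6f}")
    # Commonly recommended workaround: list the last file once more so that
    # its duration is honoured.
    lines.append(f"file {_quote(images[-1].resolve())}")
    Path(out_file).write_text("\n".join(lines) + "\n")
    return durations
```

The resulting list can then be fed to ffmpeg with something like `ffmpeg -f concat -safe 0 -i frames.txt -vsync vfr -pix_fmt yuv420p output.mp4` (newer releases prefer `-fps_mode vfr` over `-vsync vfr`); `-safe 0` is needed because the script uses absolute paths.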

VMAF-like quality indicator with single video file

I am looking for a VMAF-like objective, perception-based video quality scanner that works at scale. The use case is a Twitch-like streaming service where videos become available on demand after the live stream completes. We want to ensure some level of quality in the on-demand library without having to watch every live stream. We encode the livestreams into HLS playlists after each stream completes, but using VMAF to compare the post-stream MP4 to the encoded MP4s in HLS doesn't provide the information we need, because the original MP4 could itself be of low quality due to bandwidth issues during the live stream.
Clarification
I'm not sure I understand the question correctly. You want to measure the output quality of the transcoded video without using the reference video. Is that correct?
Answer
VMAF is a full-reference quality metric, which means it measures how much perceptual distortion was introduced into the transcoded video relative to the source video. It always needs a reference input video.
I think what you are looking for is one or more no-reference quality metrics, which measure the "quality" of a video without a reference source video. There are many no-reference quality metrics, each intended to capture a different distortion artifact in the output video, for example blurring or blocking. You can then build an aggregate metric from these values, depending on what you want to measure.
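As a rough illustration of a single no-reference cue, the sketch below computes the variance of the discrete Laplacian of a grayscale frame, a widely used blur heuristic: sharp frames with strong edges tend to score higher, blurry frames lower. It is not calibrated to human perception and is no substitute for a proper no-reference model. The frame is assumed to be a 2-D list of luma values already decoded by some other tool (not shown), and any threshold you apply to the score would be hypothetical and need tuning on your own content.

```python
def laplacian_variance(gray):
    """Variance of the 4-neighbour discrete Laplacian over interior pixels.

    gray: 2-D list (rows of equal length) of luma values, at least 3x3.
    Lower values suggest a blurrier frame.
    """
    h, w = len(gray), len(gray[0])
    if h < 3 or w < 3:
        raise ValueError("frame must be at least 3x3")
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Sum of the four neighbours minus four times the centre pixel.
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)
```

Per-frame scores like this could then be averaged, or their low percentiles tracked, over a whole VOD as one input to the aggregated metric described above.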
Conclusion
So, if I were you, I would start searching for no-reference quality metrics. And then look for tools that can measure those no-reference quality metrics efficiently. Hope that answers your question.

What does ffmpeg think is the difference between an audio frame and audio sample?

Here's a curious option listed in the man pages of ffmpeg:
-aframes number (output)
Set the number of audio frames to output. This is an obsolete alias for "-frames:a", which you should use instead.
It's unclear to me what an 'audio frame' is. This SO answer says that frame is synonymous with sample, but that can't be what ffmpeg thinks a frame is. Just look at this example, where I resample some audio to 22.05 kHz and a length of exactly 313 frames:
$ ffmpeg -i input.mp3 -frames:a 313 -ar:a 22.05K output.wav
If 'frame' and 'sample' were synonymous, we would expect the audio duration to be 0.014 seconds, but the actual duration is 8 seconds. ffmpeg thinks the frame rate of my input is 39.125.
What's going on here? What does ffmpeg think an audio frame really is? How do I go about finding this frame rate of my input audio?
FFmpeg uses an AVFrame structure internally to convey and process all media data in chunks. The number of samples per frame depends on the decoder. For video, a frame consists of all pixel data for one picture, which is a logical grouping, although it can also contain pixel data for two half-pictures of an interlaced video stream.
For audio, decoders of DCT-based codecs typically fill a frame with the number of samples used in the DCT window: 1024 for AAC, and 576 or 1152 for MP3 depending on the sampling rate, as Brad mentioned. PCM samples are independent, so there is no inherent concept of framing and thus of frame size. However, the samples still need to be accommodated within AVFrames, and ffmpeg defaults to 1024 samples per frame for planar PCM, in each buffer (one for each channel).
You can use the ashowinfo filter to display the frame size. You can also use the asetnsamples filter to regroup the data in a custom frame size.
A "frame" is a bit of an overloaded term here.
In PCM, a frame is a set of samples occurring at the same time. If your audio were 22.05 kHz and you had 313 PCM frames, its length in time would be about 14 milliseconds, as you expect.
However, your audio isn't PCM... it's MP3. An MP3 frame is about 26 milliseconds long. 313 of them add up to about 8 seconds. The frame here is a block of audio that cannot be decoded independently. (In fact, some frames actually depend on other frames via the bit reservoir!)
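The arithmetic behind both numbers is just frames × samples per frame ÷ sample rate. The sketch below assumes the MP3 input is MPEG-1 Layer III at 44.1 kHz (1152 samples per frame); the question does not state the input sample rate, so substitute the real one (`ffprobe input.mp3` reports it). The helper names are hypothetical.

```python
def duration_seconds(n_frames, samples_per_frame, sample_rate_hz):
    """Length of n_frames audio frames, in seconds."""
    return n_frames * samples_per_frame / sample_rate_hz


def frames_per_second(samples_per_frame, sample_rate_hz):
    """How many codec frames make up one second of audio."""
    return sample_rate_hz / samples_per_frame


# PCM: one "frame" is one sample per channel.
pcm_len = duration_seconds(313, 1, 22_050)
# MP3 (ASSUMED MPEG-1 Layer III at 44.1 kHz): 1152 samples per frame.
mp3_len = duration_seconds(313, 1152, 44_100)
mp3_fps = frames_per_second(1152, 44_100)

print(f"PCM: {pcm_len:.4f} s, MP3: {mp3_len:.2f} s, {mp3_fps} MP3 frames/s")
```

Under that assumption, 313 MP3 frames last about 8.18 s, and one MP3 frame is 1152 / 44100 ≈ 26.1 ms, matching the figure above.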

Implementing custom h264 quantization for FFmpeg?

I have a Raspberry Pi, and I'm livestreaming using FFmpeg. Unfortunately, my wifi signal varies over the course of my stream. I'm currently using raspivid to send h264-encoded video to the stream. I have set a constant resolution and FPS, but have set neither bitrate nor quantization, so both are variable.
However, the issue is that the quantization doesn't vary enough for my needs. If my wifi signal drops, my ffmpeg streaming speed dips below 1.0x to around 0.95x for minutes, but my bitrate drops so slowly that ffmpeg never makes it back to 1.0x. As a result, my stream runs into problems and starts buffering.
I would like the following to happen:
If the speed reported by FFmpeg (my streaming command) goes below 1.0x (slower than realtime streaming), then increase quantization (lower the bitrate) exponentially until the FFmpeg speed stabilizes at 1.0x. Prioritize stabilizing at 1.0x as quickly as possible.
My understanding is that the quantization logic FFmpeg uses should be in the h264 encoder, but I can't find any mention of quantization at all in this GitHub repository: https://github.com/cisco/openh264
My knowledge of h264 is almost zilch, so I'm trying to figure out:
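The control rule described above amounts to a multiplicative (exponential) decrease while the stream is slower than realtime, plus some slower recovery once it is stable. A minimal sketch of that decision step, with every name and constant hypothetical: reading the `speed=` value from ffmpeg's progress output and passing the new bitrate to the encoder are not shown, and with raspivid the latter may require restarting the encoder.

```python
def next_bitrate(current_kbps, speed, min_kbps=250, max_kbps=6000,
                 decrease_factor=0.7, increase_step_kbps=100):
    """Pick the next target bitrate from the last reported ffmpeg speed.

    All parameters are illustrative defaults, not tuned values.
    """
    if speed < 1.0:
        # Slower than realtime: shrink the bitrate by a constant factor each
        # step, so repeated steps give an exponential decrease.
        return max(min_kbps, current_kbps * decrease_factor)
    # At (or above) realtime: recover gently with an additive step so the
    # stream does not immediately fall behind again.
    return min(max_kbps, current_kbps + increase_step_kbps)


# Example: two slow readings, then two realtime readings.
rate = 4000.0
for reported_speed in [0.95, 0.95, 1.0, 1.0]:
    rate = next_bitrate(rate, reported_speed)
    print(f"speed={reported_speed} -> {rate:.0f} kbps")
```

This pairing of fast multiplicative backoff with slow additive recovery is the same idea as AIMD congestion control; a smaller `decrease_factor` stabilizes faster at the cost of a larger quality drop.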
A) How does h264 currently vary the quantization during my stream, if at all?
B) Where is that code?
C) How hard is it for me to implement what I'm describing?
Thanks in advance!!

Delay in video in DirectShow graph

I'm seeing a noticeable video delay, which causes the resulting audio/video sync to be off, with a capture card that I'm testing. My graph topology is as follows.
Video Source -> Sample Grabber -> Null Renderer
Audio Source -> Sample Grabber -> Null Renderer
The video samples are compressed using H264, and the audio is compressed using FAAC. This topology and application code work for capture cards that I've used in the past, but I see this delay with the card I'm currently testing. Naturally, I thought it was related to the card itself, so I checked and found that there is no video/audio desync when using Open Broadcaster, VLC, or the same graph in GraphEdit to capture with this card.
This indicates to me that the problem is related to how I'm constructing the graph. I then tried adjusting the buffer sizes using IAMBufferNegotiation, as well as SetStreamSyncOffset without success.
The sync is almost perfect if I apply a 500 ms lag to the video (i.e. videoTimeStamp = videoTimeStamp - 500). This is strange because I would expect to see more latency in the audio than in the video.
Video and audio synchronization is all about time stamps. The video or audio leg might delay processing of data, but it is the time stamps that carry the original, intended sync.
Potential causes include:
Video and audio sources timestamp data independently, incorrectly delivering unsynchronized data - does not look like your case
You ignore the time stamps and use the actual time each sample arrives at your sample grabber, which is incorrect
Another filter in between, such as a decoder, incorrectly restamps data when it processes it
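To illustrate the second cause, here is a language-neutral sketch in Python (not DirectShow code) of ordering samples by their media time stamps rather than by when the grabber callback happened to run. In a real graph the start time would come from `IMediaSample::GetTime` (in 100 ns `REFERENCE_TIME` units) or the `SampleTime` argument of `ISampleGrabberCB::SampleCB` (in seconds); the `Sample` type and its fields are hypothetical.

```python
from dataclasses import dataclass

TICKS_PER_MS = 10_000  # REFERENCE_TIME ticks are 100 ns each


@dataclass
class Sample:
    stream: str        # "video" or "audio"
    start_ticks: int   # media start time in 100 ns ticks (from the sample)
    arrival_ms: float  # wall-clock time the callback ran (illustrative only)


def ref_time_to_ms(ticks):
    """Convert a REFERENCE_TIME value (100 ns units) to milliseconds."""
    return ticks / TICKS_PER_MS


def interleave_by_timestamp(samples):
    """Order samples by media time, ignoring when they arrived.

    A leg that delivers late (e.g. video held up by an encoder) still ends
    up aligned with audio stamped for the same moment.
    """
    return sorted(samples, key=lambda s: s.start_ticks)


# A video sample stamped at t=0 that arrives 500 ms late still sorts first.
samples = [Sample("audio", 400_000, 50.0),
           Sample("video", 0, 500.0),
           Sample("audio", 0, 10.0)]
for s in interleave_by_timestamp(samples):
    print(s.stream, ref_time_to_ms(s.start_ticks), "ms")
```

If the time stamps themselves are already 500 ms apart for simultaneous content, the problem lies upstream (first or third cause) rather than in how the samples are consumed.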
