FFmpeg dash manifest '-window_size'

In the FFmpeg DASH documentation I don't understand the purpose of -window_size, which is explained as:
Set the maximum number of segments kept in the manifest.
If my video is 30 seconds long, the GOP size is 4 seconds and the segment length is 4 seconds, what is the meaning and purpose of a parameter that controls the maximum number of segments kept in the manifest? When does this parameter need to be used, and how do you determine valid values?
I'm guessing that the stream is being loaded into memory and the number of segments in the manifest controls how much is kept in memory at one time, but that's just a wild guess and I can't find any further explanation.
I am not live streaming, in case that's relevant.

The window size is only relevant if you stream live. In a live scenario the manifest is rewritten continuously and only the most recent -window_size segments are listed in it; that in turn determines how far back a player can rewind. Since you are not live streaming, it is not relevant for you.
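For illustration, a live-style invocation where -window_size actually matters might look like the sketch below (input.mp4 is a placeholder, and depending on your FFmpeg version the segment length is set with -seg_duration or the older -min_seg_duration, so treat this as a template rather than a drop-in command):
ffmpeg -re -i input.mp4 -c:v libx264 -c:a aac \
       -f dash -seg_duration 4 \
       -window_size 5 -extra_window_size 2 \
       manifest.mpd
Here the manifest only ever advertises the 5 newest segments, while -extra_window_size keeps a couple of older ones on disk for clients that are slightly behind the live edge.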

Is there a way to predict the amount of memory needed for ffmpeg?

I've just started using ffmpeg and I want to create a VR180 video from a list of images with resolution 11520x5760. (The images are 80 MB each; for now I have just 225 for testing.)
I used the command:
ffmpeg -framerate 30 -i "%06d.png" "output.mp4"
I ran out of my 8 GB of RAM and ffmpeg crashed.
So I created a 10 GB swap; ffmpeg filled it up and crashed again.
Is there a way to know how much memory is needed for an ffmpeg command to run properly?
Please provide the output of the ffmpeg command when you run it.
I'm assuming FFmpeg will transcode to H.264, so it will create an H.264 encoder. Most memory sits in the lookahead queue and the reference buffers. For H.264, the default for --rc-lookahead is 40. I believe H.264 allows something like 2x4=8 references (?) plus the current frame(s) (there can be frame-threading), so let's say roughly 50 frames in total. A YUV420P frame takes 1.5 bytes per pixel, so 1.5 x 11520 x 5760 is about 100 MB per frame, and 50 frames come to roughly 5 GB. Add to that encoder-specific data, which roughly doubles this, so 10 GB should be enough.
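The arithmetic behind that estimate can be reproduced with a quick back-of-the-envelope calculation (illustrative only; the figure of roughly 50 frames is the guess from above, not something FFmpeg reports):
# bytes per YUV420P frame (1.5 bytes per pixel) x roughly 50 frames in flight, in GB
echo "11520*5760*1.5*50/10^9" | bc -l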
If 8 + 10 GB is not enough, my rough hand-wavy calculation is probably not precise enough. Your options are:
significantly reduce --rc-lookahead, --threads and --level so there are fewer frames alive at a time - read the documentation for each of these options to understand what they do, what their defaults are and what to change them to in order to reduce memory usage (see e.g. the documentation note for --rc-lookahead); an example command follows this list.
You can also use a different (less complex) codec that has smaller memory requirements.
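As a rough sketch (assuming ffmpeg picks libx264 for the .mp4 output on your build; the option values here are arbitrary illustrations, not tuned recommendations), the lookahead, reference frames and thread count can be reduced like this:
ffmpeg -framerate 30 -i "%06d.png" \
       -c:v libx264 -threads 2 \
       -x264-params rc-lookahead=10:ref=1 \
       output.mp4
Fewer threads, fewer reference frames and a shorter lookahead mean fewer of those ~100 MB frames held in memory at once, at some cost in encoding speed and compression efficiency.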

How to set a frame offset for the source and reference video when calculating PSNR using the FFmpeg command line?

I have a scenario where I am streaming a reference video from a server machine and receiving it on a client machine with the exact same codec, using FFmpeg via UDP/RTP.
So, I have a reference.avi file and a recording.ts file with me. Now, due to a network-side issue and FFmpeg discarding old frames, the recording.ts often lacks exactly 12 frames from the beginning. Sometimes it may lack more frames in between, but that is due to general network traffic and packet loss, and I don't plan to account for that. Anyway, because of those 12 frames, when I calculate the PSNR it drops down to ~13, even though the remaining frames may or may not be affected.
So, my aim is to discard the first 12 frames from reference.avi and then compare. For that, I would also need to adjust the frames from recording.ts.
Consider the following scenario:
reference.avi has 1500 frames, so naturally I am going to trim it to 1488. Then we have the following cases:
recording.ts has 1500 frames. This is not affected. I will still remove 12 frames to match the count, so frame 1 would then represent frame 13.
recording.ts has 1496 frames. This is not affected. I will still remove 12 frames, even though that brings it down to 1484, assuming that frame 1 would then represent frame 13.
recording.ts has 1488 frames. This is affected. No need to remove frames.
recording.ts has 1480 frames. This is affected. No need to remove frames.
Once that is done, I will calculate the PSNR. So FFmpeg should be able to do all this, hopefully in a single bash command.
A better alternative would be for FFmpeg to find where the 13th frame is in recording.ts and then trim from the beginning. That would be preferable, and even more so if no trimming is required at all, i.e. if the offset could be set inline in the command so that no additional video output has to be generated for the PSNR comparison.
Currently I am using the following command to calculate the PSNR:
ffmpeg -i 'recording.ts' -vf "movie='reference.avi', psnr=stats_file='psnr.txt'" -f rawvideo -y /dev/null
It'd be great if somebody could help me in this regard. Thanks.
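One possible approach (a sketch, not a tested answer: it assumes the mismatch is always exactly the first 12 frames of the reference, and it uses the trim and psnr filters together so that no intermediate file is written):
ffmpeg -i recording.ts -i reference.avi \
       -lavfi "[1:v]trim=start_frame=12,setpts=PTS-STARTPTS[ref];[0:v][ref]psnr=stats_file=psnr.txt" \
       -f null -
For the cases where the recording did not lose its first frames (the 1500/1496-frame cases above), the same trim=start_frame=12 step could also be applied to [0:v] so that both streams start at their respective frame 13.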

MME Audio Output Buffer Size

I am currently playing around with outputting FP32 samples via the old MME API (waveOutXxx functions). The problem I've bumped into is that if I provide a buffer length that does not evenly divide the sample rate, audible clicks appear in the audio stream; when recorded, it looks like some of the samples are lost (I'm generating a sine wave for the test). Currently I am using the "magic" value of 2205 samples per buffer for a 44100 Hz sample rate.
The question is, does anybody know the reason for these dropouts, and whether there is some magic formula that provides a way to compute the "proper" buffer size?
The safe alignment for data buffers is the nBlockAlign value of the WAVEFORMATEX structure:
Software must process a multiple of nBlockAlign bytes of data at a time. Data written to and read from a device must always start at the beginning of a block. For example, it is illegal to start playback of PCM data in the middle of a sample (that is, on a non-block-aligned boundary).
For PCM formats this is the number of bytes of a single sample across all channels, i.e. nChannels x wBitsPerSample / 8 - for 32-bit float stereo that is 2 x 4 = 8 bytes. Non-PCM formats have their own alignment, often equal to the length of a format-specific block, e.g. 20 ms of audio.
Back when waveOutXxx was the primary audio API, carrying over unaligned bytes would have been an unreasonable burden on the API and unneeded performance overhead. Today this API is a compatibility layer on top of other audio APIs, and I suppose unaligned trailing bytes are simply stripped so that the rest of the content still plays, instead of the whole buffer being rejected over what is usually just a small, non-fatal inaccuracy on the caller's side.
If you fill the audio buffer with sine samples and play it looped, it will very easily click unless the buffer length is a whole number of periods of the sine frequency, as you said... the audible click is in fact a discontinuity in the wave. A more advanced technique is to fill the buffer dynamically: set a callback notification as the buffer pointer advances and fill the buffer with the appropriate data at the appropriate offset. I would use a larger buffer, as 2205 samples is too short to receive the async notification, calculate the data and write the buffer, all while playing, but that depends on CPU power.

DirectShow push sources, syncing and timestamping

I have a filter graph that takes raw audio and video input and then uses the ASF Writer to encode them to a WMV file.
I've written two custom push source filters to provide the input to the graph. The audio filter just uses WASAPI in loopback mode to capture the audio and send the data downstream. The video filter takes raw RGB frames and sends them downstream.
For both the audio and video frames, I have the performance counter value for the time the frames were captured.
Question 1: If I want to properly timestamp the video and audio, do I need to create a custom reference clock that uses the performance counter or is there a better way for me to sync the two inputs, i.e. calculate the stream time?
The video input is captured from a Direct3D buffer somewhere else and I cannot guarantee the framerate, so it behaves like a live source. I always know the start time of a frame, of course, but how do I know the end time?
For instance, let's say the video filter ideally wants to run at 25 FPS, but due to latency and so on, frame 1 starts perfectly at the 1/25th mark but frame 2 starts later than the expected 2/25th mark. That means there's now a gap in the graph since the end time of frame 1 doesn't match the start time of frame 2.
Question 2: Will the downstream filters know what to do with the delay between frame 1 and 2, or do I manually have to decrease the length of frame 2?
One option is to omit time stamps, but this might result in downstream filters failing to process the data. Another option is to use the System Reference Clock to generate time stamps; either way, this is preferable to using the performance counter directly as a time stamp source.
Yes, you need to time stamp the video and audio in order to keep them in sync; it is the only way to tell that data is actually attributed to the same point in time.
Video samples do not really need a stop time: you can omit it or set it equal to the start time, and a gap between one frame's stop time and the next frame's start time has no consequences.
Renderers are free to choose whether they respect time stamps or not; with audio you will of course want a smooth stream without gaps in the time stamps.

How to calculate the frames per second(fps) performance of a video decoder?

How do we measure the performance of a video decoder in terms of how many frames it can decode per second? I know the following parameters are used to arrive at fps, but I am not able to relate them in a formula that gives the exact answer:
the seconds taken to decode a video sequence, the total number of frames in the encoded video sequence, the clock rate of the hardware/processor which executes the code, and the million cycles per second (MCPS) of the decoder.
How are MCPS and fps related?
Building on Byron's calculation (in the other answer), I think it should be more along the lines of:
a file to be encoded consists of N frames
and takes T seconds to encode on a processor that can do X MCPS;
then I would say the encoder uses (T*X)/N MC (million cycles) per frame.
Given that the frame rate is F (for instance 25 frames per second),
the above value times F, i.e. (T*X*F)/N, gives the MCPS used by the encoder.
If this is lower than the MCPS of your processor, you can encode in real time (or faster).
When a codec quotes an MCPS number, it is for a specific hardware configuration.
MCPS stands for Million Cycles Per Second. This parameter describes the performance of a piece of software on a given processor: when we say a codec takes 100 MCPS on a given processor, it means that it consumes 100 million cycles of that processor every second.
Also, different codecs suit different material: video streams will have different performance characteristics depending on the type of content encoded. There are codecs that encode anime very well and fast but do horribly on DVD movies. There are many parameters to consider.
The best way to determine the performance of a specific algorithm is to run it on the same hardware against the type of streams you think you will be encoding; you should do multiple runs with different videos and average the results.
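As a concrete way to run such a measurement with ffmpeg (a sketch; input.mp4 is a placeholder for one of your representative streams, and it assumes ffmpeg's decoder is the one you care about), decode the file without writing any output and divide the frame count by the elapsed time:
# decode only and discard the output; -benchmark makes ffmpeg report the time it used,
# `time` gives the wall-clock seconds, and the final progress line shows how many frames were decoded
time ffmpeg -benchmark -i input.mp4 -f null -
# fps = total frames decoded / real (wall-clock) seconds
The same run also gives you the T and N values needed for the formulas above.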
That said, for a specific stream on a specific piece of hardware the math is relatively simple:
if it takes a 2.5 GHz processor (2500 million cycles per second) 5 seconds to encode, say, a 25-second file, the encoder has consumed 2500 * 5 = 12500 million cycles, which works out to 12500 / 25 = 500 MCPS.
There is also a peak MCPS number, where peak MCPS can be defined as:
...Peak MCPS is the maximum average MCPS calculated over a sliding window of 4 pictures. The actual MCPS number may vary within a +/- 5% range.
