Matlab VideoWriter Disklogger - Too slow for HD - performance

I am trying to capture a video with MATLAB.
The video may run for about 15-60 minutes, so it is fairly large and I log it to disk instead of memory. (I don't need online processing.)
I want to capture the video in high resolution (1280x720 would be fine). At high resolutions, however, MATLAB does not log the data to disk fast enough.
Here are my observations:
Resolution of 640x480: Everything works fine.
Resolution of 800x600 or above: RAM utilization increases linearly while the video is being captured and decreases linearly after I stop capturing. After the stop command, MATLAB's Command Window is blocked for some time. During that time, I can see that the .avi file is still growing. Of course, the higher the resolution, the faster RAM utilization increases.
So my problem is that I cannot use a resolution of 1280x720: after capturing for about 5 minutes, all of my RAM (8 GB) is in use and I get an out-of-memory error. (Interesting fact: the video that fills my whole RAM is only about 300 MB on disk. That must be the MJPEG compression.)
Has anybody got an idea how to solve my problem? Is MATLAB's VideoWriter class just too slow, so that there is nothing I can do? Other video capture software is able to record HD video.
Best regards,
Richi
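For completeness, here is the code I used:
% Output file for the disk logger
path = 'C:\Daten\test\test.avi';
% Open the first winvideo device in MJPG 1280x720 mode
videoRec = videoinput('winvideo',1,'MJPG_1280x720');
src = getselectedsource(videoRec);
src.FrameRate = '30.0000';
% Keep acquiring until stopped, logging frames to disk only
set(videoRec,'TriggerRepeat',inf);
set(videoRec, 'LoggingMode', 'disk');
% VideoWriter with its default profile (Motion JPEG AVI) writes the file
logger = VideoWriter(path);
set(logger,'FrameRate',str2double(src.FrameRate));
videoRec.DiskLogger = logger;
start(videoRec);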
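As a rough plausibility check of the RAM growth described above, here is a back-of-envelope sketch in Python. It assumes (this is not confirmed by the question) that frames waiting for the disk logger are held as uncompressed 24-bit RGB, and it treats 8 GB as 8e9 bytes with nothing else using memory. The function names are made up for illustration.

```python
# Back-of-envelope sketch: how fast could a frame backlog fill RAM?
# Assumption: buffered frames are uncompressed RGB (3 bytes per pixel).

def raw_rate_bytes_per_s(width, height, bytes_per_pixel, fps):
    """Raw data rate of the incoming video stream in bytes per second."""
    return width * height * bytes_per_pixel * fps


def seconds_until_full(free_ram_bytes, capture_rate, drain_rate):
    """Time until the backlog fills RAM, given capture and disk-logging rates."""
    backlog_rate = capture_rate - drain_rate
    if backlog_rate <= 0:
        return float("inf")  # the logger keeps up, so the backlog never grows
    return free_ram_bytes / backlog_rate


capture = raw_rate_bytes_per_s(1280, 720, 3, 30)
print(f"raw rate: {capture / 1e6:.1f} MB/s")
print(f"time to fill 8 GB with no logging at all: "
      f"{seconds_until_full(8e9, capture, 0):.0f} s")
```

Under these assumptions the raw stream is about 83 MB/s, so memory lasting roughly 5 minutes would mean the logger is draining a large part of the stream, just not all of it.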

Related

Is there a way to predict the amount of memory needed for ffmpeg?

I've just started using ffmpeg and I want to create a VR180 video from a list of images with resolution 11520x5760. (The images are 80 MB each; for now I have just 225 for testing.)
I used the command:
ffmpeg -framerate 30 -i "%06d.png" "output.mp4"
It used up my 8 GB of RAM and ffmpeg crashed.
So I created a 10 GB swap; ffmpeg filled that up and crashed as well.
Is there a way to know how much memory an ffmpeg command needs to run properly?
Please provide output of the ffmpeg command when you run it.
I'm assuming FFmpeg will transcode to H.264, so it will create an H.264 encoder. Most of the memory sits in the lookahead queue and the reference buffers. For H.264, the default for --rc-lookahead is 40. I believe H.264 allows something like 2x4=8 references (?) plus the current frame(s) (there can be several with frame threading), so let's say roughly 50 frames in total. A frame of YUV420P data takes 1.5 bytes per pixel, so 1.5 x 11520 x 5760 x 50 = ~5 GB. Add to that encoder-specific data, which roughly doubles this, so 10 GB should be enough.
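The estimate above can be written out as a small Python sketch. The figures of 50 live frames and a factor of 2 for encoder data are the rough guesses from this answer, not measured values, and the function names are illustrative.

```python
# Sketch of the hand-wavy memory estimate: frames alive x frame size x overhead.

def yuv420p_frame_bytes(width, height):
    """8-bit YUV420P: full-resolution luma plus two quarter-resolution
    chroma planes, i.e. 1.5 bytes per pixel."""
    return width * height * 3 // 2


def estimate_encoder_bytes(width, height, frames_alive=50, overhead_factor=2):
    """frames_alive and overhead_factor are the rough guesses from the answer."""
    return yuv420p_frame_bytes(width, height) * frames_alive * overhead_factor


frames_only = yuv420p_frame_bytes(11520, 5760) * 50
total = estimate_encoder_bytes(11520, 5760)
print(f"frames only: {frames_only / 1e9:.2f} GB, with overhead: {total / 1e9:.2f} GB")
```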
If 8+10 GB is not enough, my rough, hand-wavy calculation is probably not precise enough. Your options are:
Significantly reduce --rc-lookahead, --threads and --level so that fewer frames are alive at a time. Read the documentation for each of these options to understand what they do, what their defaults are and what to change them to in order to reduce memory usage (see e.g. note here for --rc-lookahead).
Use a different (less complex) codec that has smaller memory requirements.
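A minimal sketch of the first option, building the ffmpeg command in Python. It assumes the libx264 encoder and uses ffmpeg's spelling of the options (-rc-lookahead and -threads rather than the x264 CLI's double-dash form). The values 10 and 2 are placeholders for illustration, not tuned recommendations, and the helper name is made up.

```python
import shlex

def build_low_memory_cmd(pattern, output, lookahead=10, threads=2, framerate=30):
    """Build an ffmpeg command that keeps fewer frames alive in the encoder.
    lookahead and threads are illustrative values, not recommendations."""
    return [
        "ffmpeg",
        "-framerate", str(framerate),
        "-i", pattern,
        "-c:v", "libx264",
        "-rc-lookahead", str(lookahead),  # shorter lookahead queue
        "-threads", str(threads),         # fewer frames in flight from threading
        output,
    ]


cmd = build_low_memory_cmd("%06d.png", "output.mp4")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```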

Using more than 2 NV_ENC at a time with FFMPEG

I'm currently generating timelapse videos using a thread on my CPU with fluent-ffmpeg running on nodejs. It takes roughly 1 minute to generate a 10 second timelapse. I'm generating many at the same time (basically one per thread) such that I tend to get the best performance at 8 worker threads. ... overall system throughput is about one video per 12 seconds.
GPU processing using h264_nvenc brings the single-thread time down to about 3-4 seconds. Yippie! I went out and bought some NVIDIA 1660s to take advantage.
Unfortunately, when I go to generate the 3rd simultaneous video, I get a "Conversion Failed!" error from FFmpeg.
Some basic research seems to show that you can only run 2 at a time, perhaps 3 with updated drivers.
Is there a way around this? Posts like this one indicate that the limit is artificial and can be worked around: https://www.techpowerup.com/268495/nvidia-silently-increases-geforce-nvenc-concurrent-sessions-limit-to-3
Or is there perhaps a way to use all the CUDA/tensor/etc. cores to render timelapse videos instead of relying only on the limited NVENC?
The current limit is 3 simultaneous renders on both my GTX 1060 and my RTX 2080 Ti. Another post says the GTX 1660 is the same, so this is obviously an artificial limit. Looking at the NVIDIA link posted above, which has a list of cards and their NVENC/NVDEC capabilities, it looks like most NVIDIA gaming cards have this 3-render limit. However, most of the modern (Pascal and up) Quadro cards allow unlimited renders per card.
As another workaround, you can put multiple gaming cards in a system. FFmpeg can send a particular job to the card of your choosing. The same encoder module is in the GTX 1660 as in the RTX 2080 Ti, so there shouldn't be much speed difference between low-end and high-end cards. There may be some minor difference from the memory bus width, but I haven't compared the 1660 and the 2080 Ti directly. What I'm saying is: if you need more encoding horsepower, just buy another couple of low-end cards and divide up the workload using FFmpeg's built-in functionality.
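One way to divide the workload might look like the following Python sketch. It assumes the h264_nvenc encoder's -gpu option is used to pick the card, and that each card accepts at most 3 concurrent sessions as described above. The job list, file names and helper names are hypothetical, and the sketch only plans the commands rather than running them.

```python
def nvenc_cmd(pattern, output, gpu_index, framerate=30):
    """ffmpeg command for one timelapse, pinned to one GPU via -gpu."""
    return [
        "ffmpeg", "-framerate", str(framerate), "-i", pattern,
        "-c:v", "h264_nvenc", "-gpu", str(gpu_index), output,
    ]


def plan_jobs(jobs, gpu_count):
    """Distribute (pattern, output) jobs round-robin; returns {gpu: [cmd, ...]}."""
    plan = {gpu: [] for gpu in range(gpu_count)}
    for i, (pattern, output) in enumerate(jobs):
        gpu = i % gpu_count
        plan[gpu].append(nvenc_cmd(pattern, output, gpu))
    return plan


# Hypothetical job list: five timelapses spread over two cards.
jobs = [(f"cam{n}/%06d.jpg", f"timelapse{n}.mp4") for n in range(5)]
plan = plan_jobs(jobs, gpu_count=2)
for gpu, cmds in plan.items():
    print(f"GPU {gpu}: {len(cmds)} job(s)")
```

Each per-GPU list could then be fed to its own worker pool capped at 3 workers, so that no single card exceeds its session limit.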

FFmpeg dash manifest '-window_size'

In the FFmpeg DASH documentation I don't understand the purpose of -window_size which is explained as:
Set the maximum number of segments kept in the manifest.
If my video is 30 seconds long, the GOP size is 4 seconds and the segment length is 4 seconds, what is the meaning and purpose of a parameter to control the maximum number of segments kept in the manifest, when does this parameter need to be used and how do you determine valid values?
I'm guessing that the stream is being loaded into memory and the number of segments in the manifest controls how much is kept in memory at one time but it's just a wild guess and I can't find any further explanation.
I am not live streaming in case it's relevant.
The window size is relevant if you stream live. In a live scenario a player can rewind, and the window size determines how far back it can go. Since you are not live streaming, it is not relevant for you.
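To illustrate the idea, here is a toy Python simulation of a sliding manifest window. It is not how FFmpeg implements it. It only shows which segment numbers remain listed, and it assumes that a window size of 0 means "keep all segments".

```python
from collections import deque

def manifest_after(total_segments, window_size):
    """Segment numbers still listed after all segments have been written.
    window_size == 0 is treated as 'keep everything' (assumption)."""
    window = deque(maxlen=window_size or None)
    for segment in range(1, total_segments + 1):
        window.append(segment)  # the oldest entry drops out when the window is full
    return list(window)


# 30 s of video in 4 s segments gives 8 segments (the last one is shorter).
print(manifest_after(8, 0))  # non-live case: every segment stays listed
print(manifest_after(8, 3))  # live-style window: only the newest 3 remain
```

With a window of 3, a player joining at the end could rewind at most about 12 seconds. For a finished 30-second file you want every segment listed, so the window is left unlimited.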

How can I speed up gzip compression with h5py?

I'm trying to store the frames of an mp4 video in HDF5 using h5py. At first I tried simply not compressing the data. This caused a 5000 MB video to take up about 500 GB when stored in HDF5. I'm experimenting with gzip compression to make the dataset more manageable, but with compression it takes about a minute to store a single frame of the video. Here is a minimal code example:
import h5py
import numpy as np

hdf5 = h5py.File(file, mode='a')
# One slot per frame: 70000 frames of 1080x1920 RGB
dset = hdf5.create_dataset(dset_name, shape=(70000, 1080, 1920, 3),
                           dtype=np.uint8, chunks=True, compression='gzip')
for i, frame in enumerate(video_stream):
    dset[i] = frame
Each video has about 70e3 1080p RGB frames. video_stream is an object that returns (1080, 1920, 3) arrays when iterated over. You can look at it here if you think that's important. So how can I store this data in HDF5 at a reasonable speed and end up with a reasonable file size? Is it possible to get close to mp4 compression?
MP4 is a quite advanced standard, specifically designed to store video, often with hardware acceleration. You can see its efficiency in how it packs more than 400 billion values into just 5 billion bytes.
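The numbers behind that statement can be checked with a few lines of Python, using the dataset shape from the question and taking the 5000 MB video as 5e9 bytes:

```python
# Raw size of the dataset from the question versus the size of the mp4 file.
frames, height, width, channels = 70000, 1080, 1920, 3

raw_values = frames * height * width * channels  # one byte per uint8 value
mp4_bytes = 5000 * 10**6                         # "5000 MB" taken as 5e9 bytes

print(f"raw: {raw_values / 1e9:.0f} GB")
print(f"ratio raw/mp4: {raw_values / mp4_bytes:.0f}x")
```

So the uncompressed frames come to roughly 435 GB, in line with the "about 500 GB" HDF5 file reported in the question, and the mp4 is smaller by a factor of nearly 90.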
HDF5 is not a video standard, and gzip isn't well suited to video either. Python probably doesn't matter much, as the gzip compression is probably implemented in C anyway, but it should be noted that the code is single-threaded. In summary, you're not going to get anything close to MP4.
To be honest, why are you even trying? I suspect you don't have much affinity with video data yet.

Windows seems to hang sometimes for 300-600ms - measured by Performance Counter

Does anyone know how to prevent Windows 7 from sometimes pausing for 300-600ms? The pause even freezes SystemTime and MultimediaTimer, so if you measure the time before and after the pause with them, you get 0ms, while PerformanceCounter does measure the pause correctly. CPU load is pretty low (10%). The system uses a new MLC SSD. Do these still have stutter issues?
I found this behaviour by measuring timestamps from a camera grabbing at 6 frames per second. I logged when images came in, and in the grabbing log the time between images looked fine until I added a warning for intervals that were 20% too short or 20% too long. Then I sometimes (once per hour, sometimes only after 4 hours) got 300-600ms warnings, followed by some "too fast" warnings (the image buffer suddenly delivers, in a burst, the images that built up during the 300-600ms pause). However, the times in the log entries show that the system time wasn't updated during this time.
Log timestamps are given by GetLocalTime(LPSYSTEMTIME), and the time between grabbed images is given by PerformanceCounter. When I use the multimedia timer to measure the time between new images, the duration is the same as what you get by subtracting the times in the log. Then I thought it was weird that it gave me extra images with only 0-30ms between them.
I tried all kinds of tweaks and driver updates for the network interface, and different cameras, with no luck.
166ms is the ideal time between images, but here is an example of "bursts" of missing time slots and the discrepancy between system time and performance counter:
[03:06:09:48:22:615]New Image
[03:06:09:48:22:781]New Image
[03:06:09:48:22:949]New Image
[03:06:09:48:22:974]New Image. Warning Time since last: 224ms
[03:06:09:48:23:083]New Image
[03:06:09:48:23:238]New Image. Warning Time since last: 454ms
[03:06:09:48:23:261]New Image. Warning Time since last: 224ms
[03:06:09:48:23:415]New Image. Warning Time since last: 353ms
[03:06:09:48:23:551]New Image
[03:06:09:48:23:583]New Image. Warning Time since last: 330ms
[03:06:09:48:23:734]New Image. Warning Time since last: 451ms
[03:06:09:48:23:754]New Image. Warning Time since last: 119ms
[03:06:09:48:23:854]New Image
[03:06:09:48:24:020]New Image
[03:06:09:48:24:186]New Image
[03:06:09:48:24:354]New Image
[03:06:09:48:24:520]New Image
[03:06:09:48:24:686]New Image
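The discrepancy in the log above can be checked mechanically. The following Python sketch flags intervals where the high-resolution counter saw much more time pass than the wall clock did. The sample data is taken from the first four log lines. For lines without a warning the counter interval is not logged, so it is assumed here to equal the wall-clock interval. The function name and the 100ms threshold are arbitrary choices for illustration.

```python
def find_masked_stalls(samples, threshold_ms=100):
    """samples: list of (wall_clock_ms, perf_counter_ms) per image.
    Returns (wall_clock_ms, wall_delta, perf_delta) for each interval where
    the performance counter advanced far more than the wall clock."""
    stalls = []
    for (w0, p0), (w1, p1) in zip(samples, samples[1:]):
        wall_dt = w1 - w0
        perf_dt = p1 - p0
        if perf_dt - wall_dt > threshold_ms:
            stalls.append((w1, wall_dt, perf_dt))
    return stalls


# Milliseconds within the minute, from the log lines 22:615 .. 22:974.
# Counter values for unwarned lines are assumed equal to the wall-clock deltas.
samples = [(22615, 0), (22781, 166), (22949, 334), (22974, 558)]
print(find_masked_stalls(samples))
```

For a live measurement, the same comparison could be fed with pairs of time.time() and time.perf_counter() readings taken whenever an image arrives.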
So it all comes down to this question:
What phenomenon can cause the systemtime and multimedia time to lock up with the rest of the system so the pause is masked in the timings, while performance counter still keeps time, and how can I fix it?
I fixed this by installing a new network driver and disabling hyperthreading, Turbo Boost and CPU P-states.
