I'm trying to store the frames of an MP4 video in HDF5 using h5py. At first I tried simply not compressing the data, which turned a 5000 MB video into roughly 500 GB of HDF5. I'm experimenting with gzip compression to make the dataset more manageable, but with compression it takes about a minute to store a single frame of the video. Here is a minimal code example:
import numpy as np
import h5py

hdf5 = h5py.File(file, mode='a')   # 'file' is the target path
dset = hdf5.create_dataset(dset_name, shape=(70000, 1080, 1920, 3),
                           dtype=np.uint8, chunks=True, compression='gzip')
for i, frame in enumerate(video_stream):
    dset[i] = frame
Each video has about 70,000 1080p RGB frames. video_stream is an object that yields (1080, 1920, 3) arrays when iterated over; you can look at it here if you think that's important. So how can I store this data in HDF5 at a reasonable speed and still end up with a reasonable file size? Is it possible to get close to MP4's compression?
MP4 is a quite advanced standard, designed specifically to store video, often with hardware acceleration. You see its efficiency in the fact that it packs more than 400 billion values into just 5 billion bytes.
HDF5 is not a video standard, and gzip isn't well suited to video either. Python probably doesn't matter much, since the gzip compression is implemented in C anyway, but note that the code is single-threaded. In summary, you're not going to get anything close to MP4.
To be honest, why are you even trying? I suspect you don't have much affinity with video data yet.
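That said, if HDF5 is a hard requirement, much of the write-speed problem is about chunking: with chunks=True h5py picks an automatic chunk shape that typically splits each frame across several chunks and spans multiple frames, so a single-frame write recompresses far more data than one frame. A minimal sketch, assuming one chunk per frame and the lighter built-in 'lzf' filter instead of gzip (the file and dataset names are placeholders, and video_stream is the iterator from the question):

import h5py
import numpy as np

n_frames, h, w, c = 70000, 1080, 1920, 3

with h5py.File('frames.h5', mode='a') as f:           # placeholder filename
    dset = f.create_dataset(
        'frames',                                      # placeholder dataset name
        shape=(n_frames, h, w, c),
        dtype=np.uint8,
        chunks=(1, h, w, c),       # one chunk per frame: each write touches exactly one chunk
        compression='lzf',         # much faster than gzip, at the cost of a weaker ratio
        shuffle=True,              # byte-shuffle filter, often helps the compressor
    )
    for i, frame in enumerate(video_stream):           # video_stream as in the question
        dset[i] = frame

This still won't approach MP4's compression ratio, but it keeps the per-frame write time reasonable.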
I've just started using ffmpeg and I want to create a VR180 video from a list of images with resolution 11520x5760. (The images are 80 MB each; for now I have just 225 for testing.)
I used the command:
ffmpeg -framerate 30 -i "%06d.png" "output.mp4"
I ran out of my 8 GB of RAM and ffmpeg crashed.
So I created a 10 GB swap file; ffmpeg filled it up and crashed too.
Is there a way to know how much memory an ffmpeg command needs to run properly?
Please provide the output of the ffmpeg command when you run it.
I'm assuming FFmpeg will transcode to H.264, so it will create an H.264 encoder. Most memory sits in the lookahead queue and the reference buffers. For H.264, the default for --rc-lookahead is 40. I believe H.264 allows something like 2x4=8 references (?) plus the current frame(s) (there can be frame-threading), so let's say roughly 50 frames in total. Frame size for YUV420P data is 1.5 × the resolution in pixels, so 1.5 × 11520 × 5760 × 50 ≈ 5 GB. Add to that encoder-specific data, which roughly doubles this, so 10 GB should be enough.
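Spelled out, that back-of-the-envelope arithmetic looks like this (just the estimate above, not a model of the actual encoder):

# Rough memory estimate for encoding 11520x5760 YUV420P with ~50 frames alive at once.
width, height = 11520, 5760
bytes_per_frame = 1.5 * width * height      # YUV420P: full-res luma + two quarter-res chroma planes
frames_alive = 50                           # ~40 lookahead + ~8 references + current frame(s)
frame_buffers = bytes_per_frame * frames_alive
total = 2 * frame_buffers                   # encoder-specific data roughly doubles it
print(f"frame buffers ~{frame_buffers / 1e9:.1f} GB, total ~{total / 1e9:.1f} GB")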
If 8+10GB is not enough, my rough handwavy calculation is probably not precise enough. Your options are:
significantly reduce --rc-lookahead, --threads and --level so that fewer frames are alive at a time; read the documentation for each of these options to understand what it does, what its default is, and what to change it to in order to reduce memory usage (see e.g. the note here for --rc-lookahead). A sketch of such an invocation follows this list.
You can also use a different (less complex) codec that has smaller memory requirements.
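For example, something along these lines might keep the frame count down (a sketch only: rc-lookahead=10 and -threads 2 are arbitrary illustrative values, and you should check the option spellings against your ffmpeg/libx264 build):

import subprocess

# Hypothetical low-memory variant of the command from the question: a smaller lookahead
# queue and fewer threads mean fewer frames are held in memory at once.
subprocess.run([
    "ffmpeg", "-framerate", "30", "-i", "%06d.png",
    "-c:v", "libx264",
    "-x264-params", "rc-lookahead=10",   # shrink the lookahead queue (default ~40)
    "-threads", "2",                     # fewer frame threads alive simultaneously
    "output.mp4",
], check=True)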
I have an application that records audio from devices into a Float32 (LPCM) buffer.
However, according to the HTTP Live Streaming specification, LPCM needs to be encoded into an audio format (MP3, AAC) before it can be used as a media segment for streaming. I have found some useful resources on how to convert an LPCM file to an AAC/MP3 file, but that is not exactly what I am looking for, since I want to convert a buffer, not a file.
What are the main differences between converting an audio file and converting a raw audio buffer (LPCM, Float32)? Is the latter simpler?
My initial thought was to create a thread that would regularly fetch data from a ring buffer (where the raw audio is stored) and convert it to a valid audio format (either AAC or MP3).
Would it be more sensible to do the conversion immediately when the AudioBuffer is captured through an AURenderCallback, and thereby prune the ring buffer?
Thanks for your help,
The Core Audio recording buffer length and the desired audio file length are rarely exactly the same. So it might be better to poll your circular/ring buffer (you know the sample rate, which tells you approximately how often to poll) so that the two rates are decoupled, and convert the buffer (once it has filled sufficiently) to a file at a later time. You can memory-map a raw audio file to the buffer, but there may or may not be any performance difference between that and asynchronously writing a temp file.
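To illustrate the decoupling (a Python sketch of the idea only, not Core Audio; encode_block is a hypothetical stand-in for the real AAC/MP3 encode step): the render callback only appends samples, and a separate polling thread drains the ring in encoder-sized blocks.

import collections
import threading
import time

SAMPLE_RATE = 44100
BLOCK = 1024                             # samples the (placeholder) encoder consumes per call

ring = collections.deque()               # stands in for the lock-free ring buffer
lock = threading.Lock()

def encode_block(samples):               # placeholder for the actual AAC/MP3 encode step
    pass

def render_callback(samples):            # capture side: just append the Float32 samples
    with lock:
        ring.extend(samples)

def consumer():                          # polling thread: drain the ring, feed the encoder
    while True:
        with lock:
            ready = len(ring) >= BLOCK
            block = [ring.popleft() for _ in range(BLOCK)] if ready else None
        if block is not None:
            encode_block(block)
        else:
            time.sleep(BLOCK / SAMPLE_RATE)   # poll roughly as fast as blocks fill up

threading.Thread(target=consumer, daemon=True).start()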
The context is transcoding on a Raspberry Pi 3 from 1080i MPEG-2 TS to 1080p@30fps H.264 MP4 using libav's avconv or ffmpeg. Both use an almost identical omx.c source file and give the same result.
The performance falls short of 30 fps (about 22 fps), which makes it unsuitable for live transcoding without reducing the frame rate.
By timestamping the critical code, I noticed the following:
OMX_EmptyThisBuffer can take 10-20 ms to return. The spec/documentation indicates that this should be < 5 ms. This alone would almost account for the performance deficit. Can someone explain why this OMX call is out of spec?
In omx.c, a zerocopy option is used to optimize the image copying performance. But the precondition for this code path (contiguous planes and stride alignment) is never satisfied, and thus the optimization was never in effect. Can someone explain how this zerocopy optimization can be employed?
An additional question on the h264_omx encoder: it seems to only accept MP4 or raw H.264 output formats. How difficult is it to add another format, e.g. TS?
Thanks
I am trying to capture a video with MATLAB.
The video might have a duration of about 15-60 minutes, so it is pretty big and I store it to disk instead of memory. (I don't need online processing.)
I want to capture the video in high resolution (1280x720 would be fine). But at high resolution I run into the problem that MATLAB does not log the data to disk fast enough.
Here are my observations:
Resolution of 640x480: Everything works fine.
Resolution of 800x600 or above: My RAM utilization increases linearly while capturing the video and decreases linearly after I stop capturing. After the stop command, MATLAB's Command Window is blocked for some time, and during that time I can see that the .avi file is growing. Of course, the higher I pick the resolution, the faster the RAM utilization increases.
So my problem is that I cannot use a resolution of 1280x720, because after capturing the video for about 5 minutes my whole RAM (8 GB) is used up and I get an awful out-of-memory error. (Interesting fact: the video that fills my whole RAM is only about 300 MB on disk. That must be the MJPEG compression ratio.)
Has anybody got an idea how to solve my problem? Is MATLAB's VideoWriter class just too slow, and there is nothing I can do? Other video-capturing software manages to record HD video.
Best regards,
Richi
For completeness, here is the code that I used:
path = 'C:\Daten\test\test.avi';
videoRec = videoinput('winvideo', 1, 'MJPG_1280x720');
src = getselectedsource(videoRec);
src.FrameRate = '30.0000';
set(videoRec, 'TriggerRepeat', Inf);
set(videoRec, 'LoggingMode', 'disk');                 % log frames to disk instead of memory
logger = VideoWriter(path);                           % default profile is Motion JPEG AVI
set(logger, 'FrameRate', str2double(src.FrameRate));
videoRec.DiskLogger = logger;                         % property name is DiskLogger
start(videoRec);
What is the preferred way to encode video for the internet?
2-pass encoding probably takes longer to process, but results in a smaller file size and a bitrate that averages out better (?). Correct?
CRF (constant rate factor) results in a constant rate, but higher file size?
What is the default way sites like YouTube and Vimeo encode their videos? And should I do it any other way than I do now, with 2-pass encoding?
Fredrick is right about VBR vs. CBR, but dropson mentions CRF (constant rate factor), which is actually kind of a third method. CBR and VBR both lock in on a bit rate, while CRF locks in on a perceived visual quality. It also takes into account motion in the video, and can typically achieve better compression than 2-pass VBR. More info.
It's the default setting if you're using x264 or Zencoder. I'd go with CRF any time you're doing H.264.
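For example, a typical single-pass CRF encode driven from Python (a sketch only; the filenames are placeholders and CRF 23 is simply x264's default):

import subprocess

# Constant-rate-factor encode: perceived quality is held roughly constant,
# so the bitrate floats with how complex the content is.
subprocess.run([
    "ffmpeg", "-i", "input.mov",
    "-c:v", "libx264",
    "-crf", "23",            # lower = better quality and bigger files; ~18-28 is typical
    "-preset", "medium",
    "output.mp4",
], check=True)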
There are two encoding modes for video
CBR or Constant Bit Rate
The main usage is when you have a fixed carrier for your data; the best example here is the video telephony use case, where audio/video/control information needs to coexist on a fixed 64 kbit/s carrier. Since this is a real-time use case, one-pass encoding is used and the rate controller (RC) does its best to assign a fixed number of bits to each frame so that the bitrate is deterministic.
VBR or Variable Bit Rate
This encoding scheme is used practically everywhere else. Variable here means that, e.g., if the video goes black or has no motion, almost no bits are sent, i.e. the bitrate is near 0 for that moment; then, when things start to move again, the bitrate skyrockets.
This encoding scheme normally has no real-time requirements, e.g. when encoding/transcoding a video offline. Normally you would use a multi-pass encoder here to get the highest quality and to even out the bitrate peaks.
YouTube uses VBR. Use e.g. clive to download videos from YouTube and analyse them using ffmpeg, and you'll see the variable bitrate in action.
As always, Wikipedia is your friend; read its entries on VBR and CBR.
There is no reason for you to use anything other than VBR (unless you plan to set up a streaming server).
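For comparison, a two-pass VBR encode targeting an average bitrate looks roughly like this (again only a sketch around ffmpeg; the filenames and the 2 Mbit/s target are placeholders, and /dev/null would be NUL on Windows):

import subprocess

# Pass 1 analyses the video and writes a stats file; pass 2 does the real encode,
# spending bits where the analysis says they matter while averaging ~2 Mbit/s.
common = ["ffmpeg", "-y", "-i", "input.mov", "-c:v", "libx264", "-b:v", "2M"]
subprocess.run(common + ["-pass", "1", "-an", "-f", "null", "/dev/null"], check=True)
subprocess.run(common + ["-pass", "2", "output.mp4"], check=True)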