Faster FFMPEG conversion from WMV - ffmpeg

I'm converting a lot of WMV files to MP4.
The command is:
ffmpeg -loglevel 0 -y -i source.wmv destination.mp4
I have roughly 100gb to convert. 24 hours later it's still not done (xeon, 64gb, super fast storage)
Am I missing something out? is there a better way to convert?

Here's a list of various things you can try:
Preset
Use a faster x264 encoding preset. A preset is a set of options that gives a speed vs compression efficiency tradeoff. Current presets in descending order of speed are: ultrafast, superfast, veryfast, faster, fast, medium, slow, slower, veryslow, placebo. The default preset is "medium". Example:
ffmpeg -i input.wmv -preset fast output.mp4
CPU capabilities
Check that the encoder is actually using the capabilities of your CPU. When encoding via libx264 the console output should show something like:
[libx264 # 0x7f8451001e00] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX
If it shows none then encoding speed will suffer. Your x264 is likely misconfigured and you'll need to get a new one or re-compile.
ASM
Related to the above suggestion, but make sure your ffmpeg was not configured with --disable-asm, --disable-inline-asm, and/or --disable-yasm.
Also, make sure your x264 that is linked to ffmpeg is not compiled with --disable-asm.
If these configure options are used then encoding will be much slower.
AAC
You can encode AAC audio faster using the -aac_coder fast option when using the native FFmpeg AAC encoder (-c:a aac). However, this will have much less of an impact than choosing a faster preset for H.264 video encoding, and the audio quality will probably be reduced when compared to omitting -aac_coder fast.
ffmpeg
Use a recent ffmpeg. See the FFmpeg Download page for links to builds for Linux, macOS, and Windows.

In my case simple WMV to MP4 conversion using this command ffmpeg -i input.wmv output.mp4 increased size of file from 5.7MB to 350MB and the speed was 0.190x, which took ages to make mp4 video. Anyway, I waited more than 2 hours for it to finish and found out, that output video had 1000 frames/second. Having in mind, that it was Full HD video, 2 hours were pretty okay. My solution looks like this
ffmpeg -i input.wmv -crf 26 -vf scale=iw/3:ih/3,fps=15 output.mp4
Here I reduce video height and width in 3 times, make it only 15 fps and make a little compression
which resulted in:
only 5.7MB -> 15MB
278x speed improvement! (from 0.19x to 52.9x)
some quality loss, but for me it was not so neccessary

Related

Is there a way to predict the amount of memory needed for ffmpeg?

I've just starting using ffmpeg and I want to create a VR180 video from a list of images with resolution 11520x5760. (Images are 80MB each, i have for now just 225 for testing.)
I used the code :
ffmpeg -framerate 30 -i "%06d.png" "output.mp4"
I ran out of my 8G RAM and ffmpeg crashed.
So I've create a 10G swap, ffmpeg filled it up and crashed.
Is there a way to know how much is needed for an ffmpeg command to run properly ?
Please provide output of the ffmpeg command when you run it.
I'm assuming FFmpeg will transcode to H.264, so it will create a H.264 encoder. Most memory sits in the lookahead queue and reference buffers. For H.264, the default for --rc-lookahead is 40. I believe H.264 allows something like 2x4=8 references (?) + current frame(s) (there can be frame-threading), so let's say roughly 50 frames in total. Frame size for YUV420P data is 1.5xresolution, so 1.5x11520x5760x50=~5GB. Add to that encoder-specific data which roughly doubles this, so 10GB should be enough.
If 8+10GB is not enough, my rough handwavy calculation is probably not precise enough. Your options are:
significantly reduce --rc-lookahead, --threads and --level so there's fewer frames alive at a time - read the documentation for each of these options to understand what they do, what their defaults are and what to change them to to reduce memory usage (see e.g. note here for --rc-lookahead).
You can also use a different (less complex) codec that has smaller memory requirements.

ffmpeg encoder streaming issues

I am trying to build ffmpeg encoder on linux. I started with a custom built server Dual 1366 2.6 Ghz Xeon CPUs (6 cores) with 16 GB RAM with Ubuntu 16.04 minimal install. Built ffmpeg with h264 and aac. I am taking live source OTA channels and encoding/streaming them with following parameters
-vcodec libx264 -preset superfast -crf 25 -x264opts keyint=60:min-keyint=60:scenecut=-1 -bufsize 7000k -b:v 6000k -maxrate 6300k -muxrate 6000k -s 1920x1080 -format yuv420p -g 60 -sn -c:a aac -b:a 384k -ar 44100
And I am able to successfully udp out using mpegts. My problem starts with 5th stream. The server can handle four streams and as soon as I introduce 5th stream I start seeing hiccups in output. Looking at my cpu usage using top I still see only 65% to 75% usage with occasional 80% hit. Memory usage is well within acceptable parameters. So I am wondering either top is not giving me accurate cpu usage or something is not right with ffmpeg. The server is isolated for udp in/out on a 1 Gbps network.
I decided to up the cpu power and installed two 3.5 Ghz CPUs (6 cores) thinking it was perhaps the cpu clock. To my surprise the results were no different. So now I am wondering is there some built in limit I am hitting when I process at 1080p. If I change the resolution to 720p it is able to process 8 streams but 720 is not acceptable.
My target is 10 1080p streams per server.
So my questions are
1. If I use a quad motherboard and up the cpu count to 4 (6 or 8 cores) will I get 10 1080p streams? Is there any theoretical max I can go with ffmpeg per machine?
2. Do cores matter more or does clock matter more?
3. Any suggestions in improvement with my options. I have tried ultrafast preset but the output quality is unacceptable.
Thanks in advance
Have you really excluded the CPU? Make sure to check how each individual core is operating. If no core is reaching 100%, then your most likely candidate is bandwidth: either your motherboard cannot handle all the data going back and forth, or your memory. Exchanging memory with a faster version is a simple test and should give you your answer.

raspberry pi 3 OpenMax EmptyThisBuffer slow response when transcoding with libav or ffmpeg

The context is transcoding on a Raspberry Pi 3 from 1080i MPEG2 TS to 1080p#30fps H264 MP4 using libav avconv or ffmpeg. Both are using almost idenitical omx.c source file and share the same result.
The performance is short of 30fps (about 22fps) which makes it unsuitable for live transcoding without reducing the frame rate.
By timestamping the critical code, I noticed the following:
OMX_EmptyThisBuffer can take 10-20 msec to return. The spec/document indicates that this should be < 5msec. This along would almost account for the performance deficit. Can someone explains why this OMX call is out of spec?
In omx.c, a zerocopy option is used to optimized the image copying performance. But the precondition (contiguous planes and stride alignment) for this code is never satisfied and this the optimization was never in effect. Can someone explain how this zerocopy optimization can be employed?
Additional question on h264_omx encoder: it seems to on accept MP4 or raw H264 output format. How difficult it is to add other format, e.g. TS?
Thanks

Allow MEncoder to use more CPU

I am using MEncoder to combine a huge amount of jpg pictures into a time-lapse video. I have two main folders with about 10 subfolders each and in order to automate the process i am running:
find . -type d -name img -exec sh -c '(cd {} && /Volumes/SAMSUNG/PedestrianBehaviour/BreakableBonds/jpg2avi.sh t*)' ';'
where jpg2avi is the settings for MEncoder.
mencoder 'mf://t00*.jpg' -mf fps=10 -o raw.avi -ovc lavc -lavcopts vcodec=mpeg4:vhq:vbitrate=2000 -o out.avi
In order to parallelize it I have started this command in the two folders BreakableBonds and UnBreakableBonds. However each process only uses about 27% so a total of a bit above 50%. Are there any way that I can accelerate this? such that each process takes up about 50%. (I am aware that 50% on each process is not possible.)
Depending on the video codec you're using (x264 is a good choice), one encode should be able to saturate several CPU cores. I usually use ffmpeg directly, because mencoder was designed with some AVI-like assumptions.
See ffmpeg's wiki page on how to do this.
Again, I'd highly recommend h.264 in an mkv or mp4 container, rather than anything in an avi container. h.264 in avi is a hack, and the usual codec for avi is divx (h.263). h.264 is a big step forward.
h.264 is for video what mp3 is for audio: the first codec that's Good Enough, and that came along just as CPUs were getting fast enough to do it in realtime, and disks and networks were capable of handling the file sizes that produce good quality. Use it with ffmpeg as a frontend for libx264.
h.265 (and vp9) are both better codecs (in terms of compression efficiency) than h.264, but are far less widely supported, and take more CPU time. If you want to use them, use ffmpeg as a frontend for libx265 or libvpx. x265 is under heavy development, so it's probably improved, but several months ago, given equal encode CPU time, x265 didn't beat x264 in quality per bitrate. Given much more CPU time, x265 can do a great job and make as-good-looking encodes at much less bitrate than x264, though.
All 3 of those codecs are multi-threaded and can saturate 4 cores even at fast settings. At higher settings (spending more CPU time per block), you can saturate at least 16 cores, I'd guess. I'm not sure, but I think you can efficiently take advantage of 64 cores, and maybe more, on a HD x265 encode.
At fast settings and high bitrates, the gzip-style entropy coding final stage (i.e. CABAC for x264) limits the amount of CPUs you can keep busy. It's a serial task, so the whole encode only goes as fast as one CPU can compress the final bitstream.

Transcode HLS Segments individually using FFMPEG

I am recording a continuous, live stream to a high-bitrate HLS stream. I then want to asynchronously transcode this to different formats/bitrates. I have this working, mostly, except audio artefacts are appearing between each segment (gaps and pops).
Here is an example ffmpeg command line:
ffmpeg -threads 1 -nostdin -loglevel verbose \
-nostdin -y -i input.ts -c:a libfdk_aac \
-ac 2 -b:a 64k -y -metadata -vn output.ts
Inspecting an example sound file shows that there is a gap at the end of the audio:
And the start of the file looks suspiciously attenuated (although this may not be an issue):
My suspicion is that these artefacts are happening because transcoding are occurring without the context of the stream as a whole.
Any ideas on how to convince FFMPEG to produce audio that will fit back into a HLS stream?
** UPDATE 1 **
Here are the start/end of the original segment. As you can see, the start still appears the same, but the end is cleanly ended at 30s. I expect some degree of padding with lossy encoding, but I there is some way that HLS manages to do gapless playback (is this related to iTunes method with custom metadata?)
** UPDATED 2 **
So, I converted both the original (128k aac in MPEG2 TS) and the transcoded (64k aac in aac/adts container) to WAV and put the two side-by-side. This is the result:
I'm not sure if this is representative of how a client will play it back, but it seems a bit odd that decoding the transcoded one introduces a gap at the start and makes the segment longer. Given they are both lossy encoding, I would have expected padding to be equally present in both (if at all).
** UPDATE 3 **
According to http://en.wikipedia.org/wiki/Gapless_playback - Only a handful of encoders support gapless - for MP3, I've switched to lame in ffmpeg, and the problem, so far, appears to have gone.
For AAC (see http://en.wikipedia.org/wiki/FAAC), I have tried libfaac (as opposed to libfdk_aac) and it also seems to produce gapless audio. However, the quality of the latter isn't that great and I'd rather use libfdk_aac is possible.
This is more of a conceptual answer rather than containing explicit tools to use, sorry, but it may be of some use in any case - it removes the problem of introducing audio artifacts at the expense of introducing more complexity in your processing layer.
My suggestion would be to not split your uncompressed input audio at all, but only produce a contiguous compressed stream that you pipe into an audio proxy such as an icecast2 server (or similar, if icecast doesn't support AAC) and then do the split/recombine on the client-side of the proxy using chunks of compressed audio.
So, the method here would be to regularly (say, every 60sec?) connect to the proxy and collect a chunk of audio a little bit bigger than the period that you are polling (say, 75sec worth?) - this needs to be set up to run in parallel, since at some points there will be two clients running - it could even be run from cron if need be or backgrounded from a shell script ...
Once that's working, you will have a series of chunks of audio that overlap a little - you'd then need to do some processing work to compare these and isolate the section of audio in the middle which is unique to each chunk ...
Obviously this is a simplification, but assuming that the proxy does not add any metadata info (ie, ICY data or hinting) then splitting up the audio this way should allow the processed chunks to be concatenated without any audio artifacts since there is only one set of output for the original audio input and comparing them will be a doddle since you actually don't care one whit about the format, it's just bytes at that point.
The benefit here is that you've disconnected the audio encoder from the client, so if you want to run some other process in parallel to transcode to different formats or bit rates or chunk the stream more aggressively for some other consumer then that doesn't change anything on the encoder side of the proxy - you just add another client to the proxy using a tool chain similar to the above.

Resources