Parallelize encoding of audio-only segments in ffmpeg

We are looking to decrease the execution time of segmenting and encoding WAV to segmented AAC for HTTP Live Streaming, using ffmpeg to segment and generate an m3u8 playlist, by utilizing all the cores of our machine.
In one experiment, I had ffmpeg directly segment a wav file into aac with libfdk_aac; however, it took quite a long time to finish.
In the second experiment, I had ffmpeg segment a wav file as is (wav), which was quite fast (< 1 second on our machines), then used GNU parallel to run ffmpeg again to encode the wav segments to aac, and manually edited the .m3u8 file without changing the segment durations. This was much faster, however "silence" gaps could be heard when streaming the output audio.
I initially tried the second scenario using mp3 and the result was much the same. I've read that lame adds padding during encoding (http://scruss.com/blog/2012/02/21/generational-loss-in-mp3-re-encoding/); does this mean that libfdk_aac also adds padding during encoding?
Maybe this is related to the underlying question: how can I encode and segment audio files without gaps (or audio pops) between segments when I reconstruct the stream?
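For reference, a rough sketch of what the second experiment looked like (the segment length, bitrate and filenames here are placeholders rather than our exact commands):
# 1) split the WAV into fixed-length WAV segments plus an m3u8 playlist (fast, no re-encode)
ffmpeg -i input.wav -c copy -f segment -segment_time 10 -segment_list segments.m3u8 seg%03d.wav
# 2) encode every WAV segment to AAC on all cores with GNU parallel
ls seg*.wav | parallel ffmpeg -nostdin -i {} -c:a libfdk_aac -b:a 128k {.}.aac
The .m3u8 entries were then edited by hand to point at the .aac files.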

According to section 4 of the HLS specification, we have this:
A Transport Stream or audio elementary stream segment MUST be the
continuation of the encoded media at the end of the segment with the
previous sequence number, where values in a continuous series, such as
timestamps and Continuity Counters, continue uninterrupted
"Silence" gaps are 99.99% of the time related to wrong counters/discontinuities. Because you wrote that you manually changed the .m3u8 file without changing the durations, I deduce you tried to cut the audio yourself. It can't be done.
An HLS stream can't be created in parallel because of these counters. They must follow a sequence [ MPEG2-TS :-( ]. You'd better get a faster processor.

Related

Is there a way to predict the amount of memory needed for ffmpeg?

I've just started using ffmpeg and I want to create a VR180 video from a list of images with resolution 11520x5760. (The images are 80 MB each; for now I have just 225 of them for testing.)
I used the following command:
ffmpeg -framerate 30 -i "%06d.png" "output.mp4"
I ran out of my 8 GB of RAM and ffmpeg crashed.
So I created a 10 GB swap; ffmpeg filled it up and crashed.
Is there a way to know how much memory is needed for an ffmpeg command to run properly?
Please provide the output of the ffmpeg command when you run it.
I'm assuming FFmpeg will transcode to H.264, so it will create an H.264 encoder. Most memory sits in the lookahead queue and reference buffers. For H.264, the default for --rc-lookahead is 40. I believe H.264 allows something like 2x4=8 references (?) plus the current frame(s) (there can be frame-threading), so let's say roughly 50 frames in total. Frame size for YUV420P data is 1.5 × resolution, so 1.5 × 11520 × 5760 × 50 ≈ 5 GB. Add to that encoder-specific data, which roughly doubles this, so 10 GB should be enough.
If 8 + 10 GB is not enough, my rough handwavy calculation is probably not precise enough. Your options are:
Significantly reduce --rc-lookahead, --threads and --level so there are fewer frames alive at a time. Read the documentation for each of these options to understand what they do, what their defaults are and what to change them to in order to reduce memory usage (see e.g. the note here for --rc-lookahead); a sketch of this follows below.
You can also use a different (less complex) codec that has smaller memory requirements.
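As a hedged illustration of the first option, assuming libx264 ends up being the encoder (the values below are guesses to be tuned against the documentation, not recommended settings):
# hypothetical example: a smaller lookahead and fewer threads keep fewer frames alive at once
ffmpeg -framerate 30 -i "%06d.png" -c:v libx264 -rc-lookahead 10 -threads 2 "output.mp4"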

How to set frame offset for source and reference video for calculating PSNR using the FFmpeg command line?

I have a scenario where I am streaming a reference video from a server machine and receiving it at a client machine with the exact same codec, using FFmpeg via UDP/RTP.
So, I have a reference.avi file and a recording.ts file with me. Now, due to a network-side issue and FFmpeg discarding old frames, the recording.ts often lacks exactly 12 FRAMES from the beginning. Sometimes it may lack more frames in between, but that'd be due to general network traffic and packet loss, and I don't plan to account for that. Anyway, because of those 12 frames, when I calculate the PSNR it drops to ~13, even though the remaining frames may or may not be affected.
So, my aim is to discard first 12 frames from reference.ts and then compare. For that, I would also need to adjust the frames from recording.ts.
Consider the following scenario:
reference.ts has 1500 frames, so naturally I am going to cut it short to 1488. Then we have the following cases:
recording.ts has 1500 frames. This one is not affected. Still, I will remove 12 frames to match the count, so frame 1 would then represent frame 13.
recording.ts has 1496 frames. This one is not affected. Still, I will remove 12 frames even though that gets the count to 1484, assuming that frame 1 would then represent frame 13.
recording.ts has 1488 frames. This is affected. No need to remove frames.
recording.ts has 1480 frames. This is affected. No need to remove frames.
Once that is done, I will calculate the PSNR. So my FFmpeg invocation should be able to do all this, hopefully in a single command on bash.
A better alternative would be for FFmpeg to find where the 13th frame is in recording.ts and then cut from the beginning. That would be preferable, and even more so if the offset could be set in-line in the command so that no additional video output is generated for the PSNR comparison.
Currently I am using the following command to calculate the PSNR:
ffmpeg -i 'recording.ts' -vf "movie='reference.avi', psnr=stats_file='psnr.txt'" -f rawvideo -y /dev/null
It'd be great if somebody could help me in this regard. Thanks.
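For illustration, something along these lines is roughly what I imagine, using ffmpeg's trim and psnr filters (the 12-frame offset comes from the scenario above, and which input gets trimmed would have to change per case; I am not sure this is the right approach):
# hypothetical example: drop the first 12 frames of the reference in-line, then compute PSNR
ffmpeg -i recording.ts -i reference.avi \
-lavfi "[1:v]trim=start_frame=12,setpts=PTS-STARTPTS[ref];[0:v][ref]psnr=stats_file=psnr.txt" \
-f null -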

Convert LPCM buffer to AAC for HTTP Live Streaming

I have an application that records audio from devices into a Float32 (LPCM) buffer.
However, LPCM needs to be encoded into an audio format (MP3, AAC) to be used as a media segment for streaming, according to the HTTP Live Streaming specification. I have found some useful resources on how to convert an LPCM file to an AAC/MP3 file, but this is not exactly what I am looking for, since I want to convert a buffer, not a file.
What are the main differences between converting an audio file and a raw audio buffer (LPCM, Float32)? Is the latter more trivial?
My initial thought was to create a thread that would regularly fetch data from a ring buffer (where the raw audio is stored) and convert it to a valid audio format (either AAC or MP3).
Would it be more sensible to do the conversion immediately when the AudioBuffer is captured through an AURenderCallback, and hence prune the ring buffer?
Thanks for your help,
The Core Audio recording buffer length and the desired audio file length are rarely exactly the same. So it might be better to poll your circular/ring buffer (you know the sample rate, which tells you approximately how often to poll) to decouple the two rates, and convert the buffer (if it is filled sufficiently) to a file at a later time. You can memory-map a raw audio file to the buffer, but there may or may not be any performance difference between that and asynchronously writing a temp file.
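On the ffmpeg side, the main practical difference between a file and a raw buffer is that the buffer carries no header, so the sample format, rate and channel count must be stated explicitly. A minimal sketch, assuming 44.1 kHz stereo interleaved Float32 arriving on stdin (the rate and channel count are assumptions):
# hypothetical example: encode raw Float32 LPCM from stdin to ADTS AAC on stdout
ffmpeg -f f32le -ar 44100 -ac 2 -i pipe:0 -c:a aac -b:a 128k -f adts pipe:1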

raspberry pi 3 OpenMax EmptyThisBuffer slow response when transcoding with libav or ffmpeg

The context is transcoding on a Raspberry Pi 3 from 1080i MPEG2 TS to 1080p@30fps H264 MP4 using libav avconv or ffmpeg. Both use an almost identical omx.c source file and produce the same result.
The performance falls short of 30fps (about 22fps), which makes it unsuitable for live transcoding without reducing the frame rate.
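For context, the transcode being timed is along these lines (the deinterlace filter and bitrates here are placeholders rather than the exact command used):
# hypothetical example: deinterlace 1080i MPEG2 TS and encode with the hardware h264_omx encoder
ffmpeg -i input.ts -vf yadif -c:v h264_omx -b:v 4M -c:a aac -b:a 128k output.mp4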
By timestamping the critical code, I noticed the following:
OMX_EmptyThisBuffer can take 10-20 msec to return. The spec/documentation indicates that this should be < 5 msec. This alone would almost account for the performance deficit. Can someone explain why this OMX call is out of spec?
In omx.c, a zerocopy option is used to optimize the image copying performance. But the precondition (contiguous planes and stride alignment) for this code is never satisfied, and thus the optimization was never in effect. Can someone explain how this zerocopy optimization can be employed?
An additional question on the h264_omx encoder: it seems to only accept MP4 or raw H264 output formats. How difficult would it be to add another format, e.g. TS?
Thanks

Transcode HLS Segments individually using FFMPEG

I am recording a continuous, live stream to a high-bitrate HLS stream. I then want to asynchronously transcode this to different formats/bitrates. I have this working, mostly, except that audio artefacts appear between each segment (gaps and pops).
Here is an example ffmpeg command line:
ffmpeg -threads 1 -nostdin -loglevel verbose \
-y -i input.ts -vn -c:a libfdk_aac \
-ac 2 -b:a 64k output.ts
Inspecting an example sound file shows that there is a gap at the end of the audio, and the start of the file looks suspiciously attenuated (although this may not be an issue).
My suspicion is that these artefacts appear because each segment is transcoded without the context of the stream as a whole.
Any ideas on how to convince FFmpeg to produce audio that will fit back into an HLS stream?
** UPDATE 1 **
Here are the start/end of the original segment. As you can see, the start still appears the same, but the end is cleanly cut at 30s. I expect some degree of padding with lossy encoding, but presumably there is some way that HLS manages gapless playback (is this related to the iTunes method with custom metadata?).
** UPDATE 2 **
So, I converted both the original (128k AAC in an MPEG2 TS container) and the transcoded version (64k AAC in an ADTS container) to WAV and put the two side by side.
I'm not sure if this is representative of how a client will play it back, but it seems a bit odd that decoding the transcoded one introduces a gap at the start and makes the segment longer. Given that both are lossy encodings, I would have expected the padding to be equally present in both (if at all).
** UPDATE 3 **
According to http://en.wikipedia.org/wiki/Gapless_playback, only a handful of encoders support gapless playback. For MP3, I've switched to lame in ffmpeg, and the problem, so far, appears to have gone away.
For AAC (see http://en.wikipedia.org/wiki/FAAC), I have tried libfaac (as opposed to libfdk_aac) and it also seems to produce gapless audio. However, libfaac's quality isn't that great and I'd rather use libfdk_aac if possible.
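For reference, the lame substitution mentioned above looks roughly like this (the bitrate and filenames are placeholders, not the exact command):
# hypothetical example: re-encode a segment's audio with libmp3lame instead of libfdk_aac
ffmpeg -nostdin -y -i input.ts -vn -ac 2 -c:a libmp3lame -b:a 64k output.ts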
This is more of a conceptual answer than one with explicit tools to use, sorry, but it may be of some use in any case: it removes the problem of introducing audio artifacts at the expense of adding more complexity to your processing layer.
My suggestion would be to not split your uncompressed input audio at all, but to produce only a contiguous compressed stream that you pipe into an audio proxy such as an icecast2 server (or similar, if icecast doesn't support AAC), and then do the split/recombine on the client side of the proxy using chunks of compressed audio.
So the method here would be to regularly (say, every 60 sec?) connect to the proxy and collect a chunk of audio a little bigger than the period you are polling at (say, 75 sec worth?). This needs to be set up to run in parallel, since at some points there will be two clients running; it could even be run from cron if need be, or backgrounded from a shell script...
Once that's working, you will have a series of chunks of audio that overlap a little. You'd then need to do some processing work to compare them and isolate the section of audio in the middle that is unique to each chunk...
Obviously this is a simplification, but assuming the proxy does not add any metadata (i.e. ICY data or hinting), splitting up the audio this way should let the processed chunks be concatenated without any audio artifacts, since there is only one set of encoder output for the original audio input. Comparing the chunks will be a doddle, since at that point you don't care one whit about the format; it's just bytes.
The benefit here is that you've disconnected the audio encoder from the client, so if you want to run some other process in parallel to transcode to different formats or bit rates, or to chunk the stream more aggressively for some other consumer, nothing changes on the encoder side of the proxy: you just add another client to the proxy using a tool chain similar to the above.
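A rough sketch of the encoder side of such a setup, assuming an icecast2 mount and MP3 as the compressed format (the URL, credentials and bitrate are placeholders):
# hypothetical example: feed one contiguous compressed stream to an icecast2 proxy
ffmpeg -re -i input.wav -vn -c:a libmp3lame -b:a 128k -f mp3 icecast://source:hackme@localhost:8000/live
The chunk-collecting clients would then connect to the mount point on the other side of the proxy.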
