Making hls files with -hls_time > 10 - ffmpeg

I am building a local app for watching local videos from the browser. Because some of the videos are over an hour long, they started to lag, and using HLS instead of .mp4 solved this.
In the app I'm building the user will often skip 10-40 seconds forward.
My question is: should I use -hls_time 60, or would it be better to just use -hls_time 10?
Current code: ffmpeg -i "input.mp4" -profile:v baseline -level 3.0 -start_number 0 -hls_time 10 -hls_playlist_type vod -f hls "input\index.m3u8"

Longer segments imply larger segment sizes, so after a seek the player might take longer to resume, depending on the available bandwidth and on whether the required segment has already been retrieved.
If the app is intended for mobile devices, where network conditions are expected to vary, you will also need to consider adaptive streaming. In that case longer segments mean less quality switching, but you risk stalling playback.
Some observations about your ffmpeg command:
Don't set the level: it is auto-calculated when not specified, and you risk getting it wrong and breaking device compatibility checks.
Segments are cut only on keyframes, so their duration can be greater than the specified hls_time. If you need precise segment durations you need to insert a keyframe at the desired interval, as in the sketch below.
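If a keyframe is needed exactly every hls_time seconds, something like the following should work (a minimal sketch based on the command in the question; the 10-second interval is illustrative):
# force a keyframe every 10 seconds so segment cuts can line up with -hls_time 10
ffmpeg -i "input.mp4" -profile:v baseline -force_key_frames "expr:gte(t,n_forced*10)" -start_number 0 -hls_time 10 -hls_playlist_type vod -f hls "input\index.m3u8"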

Related

Is there a way to predict the amount of memory needed for ffmpeg?

I've just started using ffmpeg and I want to create a VR180 video from a list of images with resolution 11520x5760. (The images are 80 MB each; for now I have just 225 of them for testing.)
I used the command:
ffmpeg -framerate 30 -i "%06d.png" "output.mp4"
I ran out of my 8 GB of RAM and ffmpeg crashed.
So I created a 10 GB swap file; ffmpeg filled it up and crashed as well.
Is there a way to know how much memory an ffmpeg command needs to run properly?
Please provide output of the ffmpeg command when you run it.
I'm assuming FFmpeg will transcode to H.264, so it will create an H.264 encoder. Most of the memory sits in the lookahead queue and the reference buffers. For H.264, the default for --rc-lookahead is 40. I believe H.264 allows something like 2x4=8 references (?) plus the current frame(s) (there can be frame threading), so let's say roughly 50 frames in total. The frame size for YUV420P data is 1.5x the resolution, so 1.5x11520x5760x50 = ~5 GB. Add to that encoder-specific data, which roughly doubles this, so 10 GB should be enough.
If 8+10 GB is not enough, my rough, handwavy calculation is probably not precise enough. Your options are:
significantly reduce --rc-lookahead, --threads and --level so there are fewer frames alive at a time - read the documentation for each of these options to understand what they do, what their defaults are and what to change them to in order to reduce memory usage (see e.g. the x264 documentation for --rc-lookahead, and the sketch after this list);
You can also use a different (less complex) codec that has smaller memory requirements.
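As a rough sketch of the first option, assuming libx264 is the encoder actually being used (the values below are illustrative, not tuned):
# illustrative values: a shorter lookahead, fewer reference frames and fewer threads keep fewer frames alive at once
ffmpeg -framerate 30 -i "%06d.png" -c:v libx264 -x264-params "rc-lookahead=10:ref=2" -threads 2 "output.mp4"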

Re-streaming RTSP with higher FPS

I have an RTSP stream with a very low and non-constant FPS (varying between 0.2 and 0.5). It is generated using the -skip_frame flag to reduce network and CPU usage as much as possible:
ffmpeg -skip_frame nointra -i <rtsp-source> -vsync 2 -f rtsp <low-fps-destination>
As a consequence, it takes a very long time (1 to 3 minutes) to connect to that stream and see the first meaningful image. I want this stream to work with generic players without any tweaks, so my decision was to re-stream it with a higher FPS (10, to be exact):
ffmpeg -i <low-fps-source> -vf "fps=fps=10,setpts=N/(10*TB)" -f rtsp <normal-fps-destination>
I'm not entirely sure how that command works, but it somehow did the trick and reduced the connection time to about 5 seconds. However, I suspect that it outputs frames in bursts, which is not ideal. For example, if the original low-fps stream contains 2 frames that are 3 seconds apart, my re-stream command (probably) does the following:
When the first input frame comes, output a bunch of frames as fast as possible
Don't output anything for an entire 3 seconds
When the second frame comes, output 3 * 10 = 30 frames as fast as possible
Sleep again until a new input frame shows up...
Is there a way to make my re-stream command output frames evenly (with a constant FPS)? Or maybe there's another way to reduce the RTSP connection (buffering?) time?

ffmpeg encoder streaming issues

I am trying to build an ffmpeg encoder on Linux. I started with a custom-built server with dual 2.6 GHz Xeon CPUs (socket 1366, 6 cores each) and 16 GB of RAM, running a minimal install of Ubuntu 16.04. I built ffmpeg with h264 and aac. I am taking live OTA source channels and encoding/streaming them with the following parameters:
-vcodec libx264 -preset superfast -crf 25 -x264opts keyint=60:min-keyint=60:scenecut=-1 -bufsize 7000k -b:v 6000k -maxrate 6300k -muxrate 6000k -s 1920x1080 -pix_fmt yuv420p -g 60 -sn -c:a aac -b:a 384k -ar 44100
And I am able to successfully send the output over UDP using mpegts. My problem starts with the 5th stream. The server can handle four streams, but as soon as I introduce a 5th stream I start seeing hiccups in the output. Looking at my CPU usage with top I still see only 65% to 75% usage, with the occasional 80% spike. Memory usage is well within acceptable parameters. So I am wondering whether top is not giving me accurate CPU usage or something is not right with ffmpeg. The server is isolated for UDP in/out on a 1 Gbps network.
I decided to up the CPU power and installed two 3.5 GHz CPUs (6 cores each), thinking it was perhaps the CPU clock. To my surprise, the results were no different. So now I am wondering whether there is some built-in limit I am hitting when I process at 1080p. If I change the resolution to 720p it can process 8 streams, but 720p is not acceptable.
My target is 10 1080p streams per server.
So my questions are:
1. If I use a quad-socket motherboard and up the CPU count to 4 (6 or 8 cores each), will I get 10 1080p streams? Is there any theoretical max I can reach with ffmpeg per machine?
2. Do cores matter more, or does clock speed matter more?
3. Any suggestions for improving my options? I have tried the ultrafast preset, but the output quality is unacceptable.
Thanks in advance
Have you really excluded the CPU? Make sure to check how each individual core is doing. If no core is reaching 100%, then your most likely candidate is bandwidth: either your motherboard cannot handle all the data going back and forth, or your memory can't. Swapping the memory for a faster version is a simple test and should give you your answer.
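For example, to watch per-core load while the streams are running (assuming the sysstat package is installed; pressing 1 inside top gives a similar per-core view):
# per-core CPU utilisation, refreshed every second
mpstat -P ALL 1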

How to limit the backward dependency between coded frames in ffmpeg/x264

I am currently playing with ffmpeg + libx264, but I couldn't find a way to limit the backward dependency between coded frames.
Let me explain what I mean: I want the coded frames to only contain references to, at most, let's say, 5 frames in the future. As a result, no frame has to "wait" for more than 5 frames to be coded (which makes sense for low-latency applications).
I am aware of the -tune zerolatency option, but that's not what I want; I still want bidirectional prediction.
If you mean to limit the number of consecutive B-frames then you can use the --bframes <integer> x264 option or the -bf <integer> FFmpeg option.
See also: Diary Of An x264 Developer - x264: the best low-latency...
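A minimal sketch using the FFmpeg option, with an illustrative value of 5 to match the example in the question:
# cap the number of consecutive B-frames at 5, which bounds how many future frames a frame can depend on
ffmpeg -i input.mp4 -c:v libx264 -bf 5 output.mp4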

Transcode HLS Segments individually using FFMPEG

I am recording a continuous live stream to a high-bitrate HLS stream. I then want to asynchronously transcode this to different formats/bitrates. I have this working, mostly, except that audio artefacts are appearing between each segment (gaps and pops).
Here is an example ffmpeg command line:
ffmpeg -threads 1 -nostdin -loglevel verbose \
-nostdin -y -i input.ts -c:a libfdk_aac \
-ac 2 -b:a 64k -y -metadata -vn output.ts
Inspecting an example sound file shows that there is a gap at the end of the audio, and the start of the file looks suspiciously attenuated (although this may not be an issue).
My suspicion is that these artefacts are happening because the transcoding occurs without the context of the stream as a whole.
Any ideas on how to convince FFmpeg to produce audio that will fit back into an HLS stream?
** UPDATE 1 **
Here are the start/end of the original segment. As you can see, the start still appears the same, but the end is cleanly cut at 30s. I expect some degree of padding with lossy encoding, but there must be some way that HLS manages to do gapless playback (is this related to the iTunes method with custom metadata?).
** UPDATE 2 **
So, I converted both the original (128k AAC in an MPEG-TS container) and the transcoded (64k AAC in an ADTS container) to WAV and put the two side by side. This is the result.
I'm not sure if this is representative of how a client will play it back, but it seems a bit odd that decoding the transcoded one introduces a gap at the start and makes the segment longer. Given that they are both lossy encodings, I would have expected padding to be equally present in both (if at all).
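For reference, the side-by-side comparison above can be reproduced with something like the following (file names are hypothetical):
# decode both segments to WAV so the leading/trailing padding can be inspected in an audio editor
ffmpeg -i original_segment.ts -vn original.wav
ffmpeg -i transcoded_segment.aac transcoded.wav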
** UPDATE 3 **
According to http://en.wikipedia.org/wiki/Gapless_playback, only a handful of encoders support gapless playback. For MP3, I've switched to LAME in ffmpeg, and the problem, so far, appears to have gone.
For AAC (see http://en.wikipedia.org/wiki/FAAC), I have tried libfaac (as opposed to libfdk_aac) and it also seems to produce gapless audio. However, libfaac's quality isn't that great, and I'd rather use libfdk_aac if possible.
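A sketch of the LAME variant mentioned above, assuming ffmpeg was built with libmp3lame (the channel and bitrate settings mirror the original command):
# same segment transcode as before, but via libmp3lame; LAME-style delay/padding info is what makes gapless MP3 playback possible
ffmpeg -nostdin -y -i input.ts -vn -ac 2 -c:a libmp3lame -b:a 64k output.mp3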
This is more of a conceptual answer than one with explicit tools to use, sorry, but it may be of some use in any case - it removes the problem of introducing audio artifacts at the expense of adding more complexity to your processing layer.
My suggestion would be not to split your uncompressed input audio at all, but to produce a single contiguous compressed stream that you pipe into an audio proxy such as an icecast2 server (or similar, if Icecast doesn't support AAC), and then do the split/recombine on the client side of the proxy using chunks of compressed audio.
So the method here would be to regularly (say, every 60 seconds?) connect to the proxy and collect a chunk of audio a little bit bigger than the period you are polling at (say, 75 seconds' worth?). This needs to be set up to run in parallel, since at some points there will be two clients running - it could even be run from cron if need be, or backgrounded from a shell script...
Once that's working, you will have a series of chunks of audio that overlap a little - you'd then need to do some processing work to compare them and isolate the section of audio in the middle that is unique to each chunk...
Obviously this is a simplification, but assuming that the proxy does not add any metadata (i.e. ICY data or hinting), splitting up the audio this way should allow the processed chunks to be concatenated without any audio artifacts, since there is only one set of output for the original audio input. Comparing the chunks will be a doddle, since you don't actually care one whit about the format - it's just bytes at that point.
The benefit here is that you've disconnected the audio encoder from the client, so if you want to run some other process in parallel to transcode to different formats or bitrates, or to chunk the stream more aggressively for some other consumer, nothing changes on the encoder side of the proxy - you just add another client to the proxy using a toolchain similar to the above.
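A minimal sketch of the encoder side of that setup, assuming an Icecast server on localhost:8000 with a hypothetical mount point and source password, and MP3 as the compressed format:
# one contiguous compressed stream pushed to the proxy; clients pull and chunk the compressed audio themselves
ffmpeg -i <live-input> -vn -ac 2 -c:a libmp3lame -b:a 128k -f mp3 icecast://source:hackme@localhost:8000/live.mp3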
