I have hours-long video files living on a server, and I need to be able to cut segments from them in a fast and accurate way.
I tried:
ffmpeg -ss 10:00 -i http://server/input.mp4 -t 5:00 -vcodec copy -acodec copy out.mp4
This works and is fast, but it's not precise, since ffmpeg seeks backward until it finds a keyframe. The resulting video file ends up with a negative start time, which doesn't play nicely in browsers.
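For reference, the negative offset shows up in the container metadata; one quick way to print it (just a diagnostic check, not part of the workflow) is:
ffprobe -v error -show_entries format=start_time -of default=noprint_wrappers=1 out.mp4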
I know I can re-encode the video for accurate seeking, but this would be slow and lose quality.
I came up with the following idea:
For example, let's say we have a 20-second video with keyframes every 5 seconds, at 0, 5, 10, 15, 20.
If I want a segment from second 2 to second 17, I would re-encode from 2 to 5, stream-copy from 5 to 15, re-encode from 15 to 17, and concatenate the three files.
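Roughly, the idea would look something like this (an untested sketch; the keyframe times and encoder settings are placeholders, not a tested recipe):
# re-encode the head (2 s -> 5 s)
ffmpeg -ss 2 -i http://server/input.mp4 -t 3 -c:v libx264 -c:a aac head.mp4
# stream-copy the keyframe-aligned middle (5 s -> 15 s)
ffmpeg -ss 5 -i http://server/input.mp4 -t 10 -c copy middle.mp4
# re-encode the tail (15 s -> 17 s)
ffmpeg -ss 15 -i http://server/input.mp4 -t 2 -c:v libx264 -c:a aac tail.mp4
# concatenate the three parts
printf "file 'head.mp4'\nfile 'middle.mp4'\nfile 'tail.mp4'\n" > list.txt
ffmpeg -f concat -safe 0 -i list.txt -c copy out.mp4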
This is the general problem and my idea, now the requirements are:
Source files must be accessed over HTTP
There must be little or no quality loss
It must be fast: cutting a 10-minute segment from a 2-hour video should take around 1-2 seconds
It must be accurate
It must work for webm and mp4
It doesn't have to be an ffmpeg command; I'm quite sure I'll have to write a C wrapper around libav.
The source file can be pre-processed in any way necessary to make this work.
UPDATE: After getting more info, it seems my "idea" of gluing re-encoded extremity parts onto a stream-copied middle part won't work because of differing encoding settings. But I could extract a longer part (keyframe -> keyframe) and extract timecodes that I could put in an HTTP header, like Time-Range: 0.25,15.14, which would mean "play from sec 0.25 to 15.14 and discard the rest".
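For example, to find the keyframes bounding a requested range and copy a keyframe-to-keyframe part without re-encoding (the times and URL are placeholders; this is only a sketch of the approach):
# list keyframe timestamps (packets flagged K)
ffprobe -v error -select_streams v:0 -show_entries packet=pts_time,flags -of csv=p=0 http://server/input.mp4 | grep K
# stream-copy from the keyframe before the start to the keyframe after the end
ffmpeg -ss 595 -i http://server/input.mp4 -t 310 -c copy -avoid_negative_ts make_zero part.mp4
The Time-Range header would then carry the exact offsets to play within that keyframe-aligned part.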
Related
I've just started using ffmpeg, and I want to create a VR180 video from a list of images with resolution 11520x5760. (Images are 80 MB each; for now I have just 225 for testing.)
I used the command:
ffmpeg -framerate 30 -i "%06d.png" "output.mp4"
I ran out of my 8 GB of RAM and ffmpeg crashed.
So I created a 10 GB swap; ffmpeg filled it up and crashed again.
Is there a way to know how much memory is needed for an ffmpeg command to run properly?
Please provide output of the ffmpeg command when you run it.
I'm assuming FFmpeg will transcode to H.264, so it will create an H.264 encoder. Most memory sits in the lookahead queue and reference buffers. For H.264, the default for --rc-lookahead is 40. I believe H.264 allows something like 2x4=8 references (?) + current frame(s) (there can be frame-threading), so let's say roughly 50 frames in total. Frame size for YUV420P data is 1.5 bytes per pixel, so 1.5 x 11520 x 5760 x 50 = ~5 GB. Add to that encoder-specific data, which roughly doubles this, so 10 GB should be enough.
If 8+10GB is not enough, my rough handwavy calculation is probably not precise enough. Your options are:
significantly reduce --rc-lookahead, --threads and --level so there are fewer frames alive at a time. Read the documentation for each of these options to understand what they do, what their defaults are, and what to change them to in order to reduce memory usage (see e.g. the note here for --rc-lookahead); an example command is sketched after these options.
You can also use a different (less complex) codec that has smaller memory requirements.
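For example, something along these lines (untested; the exact values are guesses you would have to tune):
ffmpeg -framerate 30 -i "%06d.png" -c:v libx264 -threads 2 -x264-params rc-lookahead=10 output.mp4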
I have an RTSP stream with very low and non-constant FPS (varying between 0.2 ... 0.5). It is generated using the -skip_frame flag to reduce network and CPU usage as much as possible:
ffmpeg -skip_frame nointra -i <rtsp-source> -vsync 2 -f rtsp <low-fps-destination>
As a consequence, it takes a very long time (1 ... 3 minutes) to connect to that stream and see the first meaningful image. I want this stream to work with generic players without any tweaks, so my decision was to re-stream it with a higher FPS (10, to be exact):
ffmpeg -i <low-fps-source> -vf "fps=fps=10,setpts=N/(10*TB)" -f rtsp <normal-fps-destination>
I'm not entirely sure how that command works, but it somehow did the trick and reduced the connection time to about 5 seconds. However, I suspect that it's outputting frames in bursts, which is not ideal. For example, if the original low-fps stream contains 2 frames which are 3 seconds apart, my re-stream command (probably) does the following:
When the first input frame comes, output a bunch of frames as fast as possible
Don't output anything for an entire 3 seconds
When the second frame comes, output 3 * 10 = 30 frames as fast as possible
Sleep again until new input frame shows up...
Is there a way to make my re-stream command output frames evenly (with a constant FPS)? Or maybe there's another way to reduce RTSP connection (buffering?) time?
I have a scenario where I am streaming a reference video on a server machine and receiving it at a client machine with the exact same codec, using FFmpeg over UDP/RTP.
So, I have a reference.avi file and a recording.ts file with me. Now, due to a network-side issue and FFmpeg discarding old frames, the recording.ts often lacks exactly 12 frames from the beginning. Sometimes it may lack more frames in between, but that'd be due to general network traffic and packet loss, and I don't plan to account for that. Anyway, because of those 12 frames, when I calculate the PSNR it drops down to ~13, even though the remaining frames may or may not be affected.
So, my aim is to discard first 12 frames from reference.ts and then compare. For that, I would also need to adjust the frames from recording.ts.
Consider the following scenario:
reference.ts has 1500 frames. So naturally I am going to cut it short to 1488. Then we have the following cases:
recording.ts has 1500 frames. This is not affected. Still I will remove 12 frames to match the count. So frame 1 would then represent frame 13.
recording.ts has 1496 frames. This is not affected. Still I will remove 12 frames, even though that'd bring the count to 1484, assuming that frame 1 would then represent frame 13.
recording.ts has 1488 frames. This is affected. No need to remove frames.
recording.ts has 1480 frames. This is affected. No need to remove frames.
Once that is done, I will calculate the PSNR. So, FFmpeg should be able to do all this, hopefully in a single command on bash.
A better alternative would be for FFmpeg to find where the 13th frame is in recording.ts and then cut short from the beginning. That'd be preferable, and even more so if no actual cutting is required, i.e. if the offset could be set inline in the command and no additional video output had to be generated for the PSNR comparison.
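Something along these lines is roughly what I have in mind (an untested sketch; whether trim/setpts/psnr can be wired together exactly like this is my assumption):
ffmpeg -i recording.ts -i reference.avi -filter_complex "[1:v]trim=start_frame=12,setpts=PTS-STARTPTS[ref];[0:v][ref]psnr=stats_file=psnr.txt" -f null -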
Currently I am using the following command to calculate the PSNR.
ffmpeg -i 'recording.ts' -vf "movie='reference.avi', psnr=stats_file='psnr.txt'" -f rawvideo -y /dev/null
It'd be great if somebody could help me in this regard. Thanks.
I am recording a continuous, live stream to a high-bitrate HLS stream. I then want to asynchronously transcode this to different formats/bitrates. I have this working, mostly, except audio artefacts are appearing between each segment (gaps and pops).
Here is an example ffmpeg command line:
ffmpeg -threads 1 -nostdin -loglevel verbose \
-y -i input.ts -c:a libfdk_aac \
-ac 2 -b:a 64k -vn output.ts
Inspecting an example sound file shows that there is a gap at the end of the audio:
And the start of the file looks suspiciously attenuated (although this may not be an issue):
My suspicion is that these artefacts are happening because the transcoding occurs without the context of the stream as a whole.
Any ideas on how to convince FFmpeg to produce audio that will fit back into an HLS stream?
** UPDATE 1 **
Here are the start/end of the original segment. As you can see, the start still appears the same, but the end is cut cleanly at 30s. I expect some degree of padding with lossy encoding, but presumably there is some way that HLS manages to do gapless playback (is this related to iTunes' method with custom metadata?)
** UPDATE 2 **
So, I converted both the original (128k aac in MPEG2 TS) and the transcoded (64k aac in aac/adts container) to WAV and put the two side-by-side. This is the result:
I'm not sure if this is representative of how a client will play it back, but it seems a bit odd that decoding the transcoded one introduces a gap at the start and makes the segment longer. Given they are both lossy encoding, I would have expected padding to be equally present in both (if at all).
** UPDATE 3 **
According to http://en.wikipedia.org/wiki/Gapless_playback, only a handful of encoders support gapless output. For MP3, I've switched to lame in ffmpeg, and the problem, so far, appears to be gone.
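(By "switched to lame" I just mean swapping the audio encoder, roughly like this; the bitrate and output name are arbitrary:)
ffmpeg -i input.ts -vn -c:a libmp3lame -b:a 64k output.mp3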
For AAC (see http://en.wikipedia.org/wiki/FAAC), I have tried libfaac (as opposed to libfdk_aac) and it also seems to produce gapless audio. However, libfaac's quality isn't that great, and I'd rather use libfdk_aac if possible.
This is more of a conceptual answer than one with explicit tools to use, sorry, but it may be of some use in any case: it removes the problem of audio artifacts at the expense of more complexity in your processing layer.
My suggestion would be to not split your uncompressed input audio at all, but only produce a contiguous compressed stream that you pipe into an audio proxy such as an icecast2 server (or similar, if icecast doesn't support AAC) and then do the split/recombine on the client-side of the proxy using chunks of compressed audio.
So, the method here would be to regularly (say, every 60 sec?) connect to the proxy and collect a chunk of audio a little bit bigger than the period you are polling at (say, 75 sec worth?). This needs to be set up to run in parallel, since at some points there will be two clients running; it could even be run from cron if need be, or backgrounded from a shell script ...
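As a rough illustration only (the proxy URL, chunk length and output names are made up, not tested):
# run every 60 s (e.g. from cron); grab ~75 s of the compressed stream so consecutive chunks overlap
curl -sS --max-time 75 http://proxy.example:8000/stream -o "chunk-$(date +%s).aac" &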
Once that's working, you will have a series of chunks of audio that overlap a little - you'd then need to do some processing work to compare these and isolate the section of audio in the middle which is unique to each chunk ...
Obviously this is a simplification, but assuming that the proxy does not add any metadata (i.e. ICY data or hinting), splitting up the audio this way should allow the processed chunks to be concatenated without any audio artifacts, since there is only one set of output for the original audio input. Comparing the chunks will be a doddle, since you don't care one whit about the format; it's just bytes at that point.
The benefit here is that you've disconnected the audio encoder from the client, so if you want to run some other process in parallel to transcode to different formats or bit rates or chunk the stream more aggressively for some other consumer then that doesn't change anything on the encoder side of the proxy - you just add another client to the proxy using a tool chain similar to the above.
We are looking to decrease the execution time of segmenting/encoding WAV to segmented AAC for HTTP Live Streaming, using ffmpeg to segment and generate an m3u8 playlist, by utilizing all the cores of our machine.
In one experiment, I had ffmpeg directly segment a wav file into aac with libfdk_aac, however it took quite a long time to finish.
In the second experiment, I had ffmpeg segment a wav file as is (wav), which was quite fast (< 1 second on our machines), then used GNU parallel to run ffmpeg again to encode the wav segments to aac, and manually changed the entries in the .m3u8 file without changing their durations. This was much faster; however, "silence" gaps could be heard when streaming the output audio.
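Roughly, the second experiment looked like this (simplified; the segment length and bitrate are placeholders):
# 1) quick segmentation of the source wav (stream copy, < 1 second)
ffmpeg -i input.wav -f segment -segment_time 10 -segment_format wav -c copy seg%03d.wav
# 2) encode the segments in parallel - this is the step that produced the gaps
ls seg*.wav | parallel ffmpeg -i {} -c:a libfdk_aac -b:a 128k {.}.aac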
I initially tried the second scenario using mp3 and the result was much the same. I've read that lame adds padding during encoding (http://scruss.com/blog/2012/02/21/generational-loss-in-mp3-re-encoding/); does this mean that libfdk_aac also adds padding during encoding?
Maybe this one is related to this question: How can I encode and segment audio files without having gaps (or audio pops) between segments when I reconstruct it?
According to section 4 of HLS Specification, we have this:
A Transport Stream or audio elementary stream segment MUST be the
continuation of the encoded media at the end of the segment with the
previous sequence number, where values in a continuous series, such as
timestamps and Continuity Counters, continue uninterrupted
"Silence" gaps are 99,99% of times related to wrong counters/discontinuity. Because you wrote that you manually changed the .m3u8 file without changing their durations I deduce you tried to cut the audio by yourself. It can't be done.
An HLS stream can't be created in a parallelizable way because of these counters. They must follow a sequence [ MPEG2-TS :-( ]. You'd better get a faster processor.
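If it helps, the closest you can get is a single sequential pass that encodes and segments together, roughly like this (the segment length and bitrate are placeholders):
ffmpeg -i input.wav -c:a libfdk_aac -b:a 128k -f hls -hls_time 10 -hls_list_size 0 playlist.m3u8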