FFMPEG Seeking with concat demuxer causes video & audio to be out of sync

I have a very simple use case that's driving me bananas.
My problem and question:
I'm using ffmpeg version 5.1.2 on macOS, and I'm using ffmpeg seeking and the concat demuxer to cut many 1-minute videos into 15 seconds chopped up over 12 clips, where every clip is just 2 seconds from the same video (kind of like mini teasers for the video). I would really like to avoid re-encoding so that the video processing is as fast as possible.
First, I take each 1 minute video and cut it up into 12 clips (I do all this programmatically in python fwiw)
ffmpeg -ss 0 -i input.mp4 -t 2 -c copy -y cut_1.mp4
ffmpeg -ss 4 -i input.mp4 -t 2 -c copy -y cut_2.mp4
ffmpeg -ss 8 -i input.mp4 -t 2 -c copy -y cut_3.mp4
I then write all the output file names to my concat_manifest.txt
file cut_1.mp4
file cut_2.mp4
Then I run my concat command:
ffmpeg -f concat -i concat_manifest.txt -c copy -y concat_video.mp4
This works really fast but the audio and video at the stitch point get out of sync and sometimes the video just chokes & lags. It's mostly not a smooth experience.
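Since the cutting is done from Python, the steps above could be scripted roughly like this. It is only a sketch of the commands shown in the question: the helper names `cut_commands` and `manifest_text` are made up for illustration, and it only builds the argument lists and the manifest text (it does not run ffmpeg and does not fix the sync problem).

```python
import shlex


def cut_commands(source, starts, duration=2):
    """Build one stream-copy cut command per start time (in seconds)."""
    commands = []
    for index, start in enumerate(starts, start=1):
        output = f"cut_{index}.mp4"
        commands.append(
            ["ffmpeg", "-ss", str(start), "-i", source,
             "-t", str(duration), "-c", "copy", "-y", output]
        )
    return commands


def manifest_text(commands):
    """Build the concat manifest: one `file <name>` line per cut output."""
    return "".join(f"file {cmd[-1]}\n" for cmd in commands)


if __name__ == "__main__":
    # Start times 0, 4, 8, ... as in the question (12 of them here).
    cuts = cut_commands("input.mp4", starts=range(0, 48, 4))
    for cmd in cuts:
        print(shlex.join(cmd))
    print(manifest_text(cuts), end="")
```

Each list can be passed to `subprocess.run(cmd, check=True)`, and the manifest text written to concat_manifest.txt before running the concat command.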
What I have tried:
using the concat protocol with intermediate profiles: ffmpeg.org/wiki/Concatenate#demuxer
Putting the -ss when I seek after the -i. This makes everything worse
Playing around with different -ss values. This has some noticeable effects, but it's not obvious why yet.
I've also read from the ffmpeg resource regarding seeking and copying:
Which leads me to believe that maybe because ffmpeg is using timestamps instead of frames, seeking isn't accurate using -ss when using the concat demuxer
Is there a way to get concat demuxer cutting and concatenating the video where the audio is somewhat in sync with the video?
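One thing that may be worth checking here: with -c copy, video can only start at a keyframe, while audio can usually be cut much closer to the requested time, so a seek point that falls between keyframes is a plausible source of the drift. The sketch below measures how far each -ss value is from the previous keyframe. The ffprobe call in the comment is an assumption about how to list keyframe times, the helper names are hypothetical, and the sample timestamps are invented for illustration.

```python
from bisect import bisect_right

# Assumed way to list video keyframe times, one per line:
#   ffprobe -v error -select_streams v:0 -skip_frame nokey \
#           -show_entries frame=pts_time -of csv=p=0 input.mp4


def parse_keyframe_times(ffprobe_output):
    """Parse one timestamp per line, ignoring blank lines and trailing commas."""
    times = []
    for line in ffprobe_output.splitlines():
        field = line.strip().rstrip(",")
        if field:
            times.append(float(field))
    return sorted(times)


def previous_keyframe(keyframes, t):
    """Return the last keyframe time <= t, or None if there is none."""
    i = bisect_right(keyframes, t)
    return keyframes[i - 1] if i else None


if __name__ == "__main__":
    # Invented sample: a keyframe every 2.5 seconds.
    keyframes = parse_keyframe_times("0.000000\n2.500000\n5.000000\n7.500000\n")
    for seek in (0, 4, 8):
        start = previous_keyframe(keyframes, seek)
        print(f"-ss {seek}: previous keyframe at {start}, gap {seek - start:.2f}s")
```

If the gaps are non-zero, choosing -ss values that coincide with keyframes (or re-encoding only the cuts) would be the things to try.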
EDIT: I found an answer and I'll be posting the solution in the coming few days.


ffmpeg: crop video into two grayscale sub-videos; guarantee monotonical frames; and get timestamps

The need
Hello, I need to extract two regions of a .h264 video file via the crop filter into two files. The output videos need to be monochrome and extension .mp4. The encoding (or format?) should guarantee that video frames are organized monotonically. Finally, I need to get the timestamps for both files (which I'd bet are the same timestamps that I would get from the input file, see below).
In the end I will be happy to do everything in one command via an elegant one liner (via a complex filter I guess), but I start doing it in multiple steps to break it down in simpler problems.
Along the way I ran into many difficulties, and despite having searched in many places I can't find solutions that work. Unfortunately I'm no expert in ffmpeg or video conversion, so the more I search, the more details I discover and the fewer problems I solve.
Below you find some of my attempts to work with the following options:
-filter:v "crop=400:ih:260:0,format=gray" to do the crop and the monochrome conversion
-vf showinfo possibly combined with -vsync 0 or -copyts to get the timestamps via stderr redirection &> filename
-c:v mjpeg to force monotony of frames (are there other ways?)
1. cropping each region and obtaining monochrome videos
$ ffmpeg -y -hide_banner -i inVideo.h264 -filter:v "crop=400:ih:260:0,format=gray" outL.mp4
$ ffmpeg -y -hide_banner -i inVideo.h264 -filter:v "crop=400:ih:1280:0,format=gray" outR.mp4
The issue here is that in the output files the frames are not organized monotonically (I don't understand why; how would that make sense in any video format? I can't say whether that comes from the input file).
EDIT. Maybe it is not frames, but packets, as returned by av .demux() method that are not monotonic (see below "instructions to reproduce...")
I have got the advice to do a ffmpeg -i outL.mp4 outL.mjpeg after, but this produces two videos that look very pixellated (at least playing them with ffplay) despite being surprisingly 4x bigger than the input. Needless to say, I need both monotonic frames and lossless conversion.
EDIT. I acknowledge the advice to specify -q:v 1; this fixes the pixellation effect but produces a file even bigger, ~12x in size. Is it necessary? (see below "instructions to reproduce...")
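For the one-command version of step 1, a sketch like the following might work. It builds a single ffmpeg call that splits the input and writes both cropped grayscale regions. The function name and the `extra_video_args` parameter are hypothetical; passing `("-bf", "0")` asks the encoder to avoid B-frames, which only helps under the assumption that B-frame reordering is what makes the packet timestamps non-monotonic.

```python
def split_crop_command(source, left_out, right_out, extra_video_args=()):
    """Build one ffmpeg call that writes both cropped grayscale regions."""
    graph = (
        "[0:v]split=2[a][b];"
        "[a]crop=400:ih:260:0,format=gray[left];"
        "[b]crop=400:ih:1280:0,format=gray[right]"
    )
    cmd = ["ffmpeg", "-y", "-hide_banner", "-i", source, "-filter_complex", graph]
    # Per-output options must come before the output file they apply to.
    for label, output in (("[left]", left_out), ("[right]", right_out)):
        cmd += ["-map", label, *extra_video_args, output]
    return cmd


if __name__ == "__main__":
    print(split_crop_command("inVideo.h264", "outL.mp4", "outR.mp4", ("-bf", "0")))
```

The returned list can be run with `subprocess.run(cmd, check=True)`; the crop geometry is copied from the two commands above.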
2. getting the timestamps
I found this piece of advice, but I don't want to generate hundreds of image files, so I tried the following:
$ ffmpeg -y -hide_banner -i outL.mp4 -vf showinfo -vsync 0 &>tsL.txt
$ ffmpeg -y -hide_banner -i outR.mp4 -vf showinfo -vsync 0 &>tsR.txt
The issue here is that I don't get any output because ffmpeg claims it needs an output file.
The need to produce an output file, and the doubt that the timestamps could be lost in the previous conversions, lead me back to a first attempt at a one-liner, where I am also testing the -copyts option and forcing the encoding with the -c:v mjpeg option as per the advice mentioned above (I don't know if it is in the right position, though)
ffmpeg -y -hide_banner -i testTex2.h264 -copyts -filter:v "crop=400:ih:1280:0,format=gray" -vf showinfo -c:v mjpeg eyeL.mp4 &>tsL.txt
This does not work because surprisingly the output .mp4 I get is the same as the input. If instead I put the -vf showinfo option just before the stderr redirection, I get no redirected output
ffmpeg -y -hide_banner -i testTex2.h264 -copyts -filter:v "crop=400:ih:260:0,format=gray" -c:v mjpeg outR.mp4 -vf showinfo dummy.mp4 &>tsR.txt
In this case I get the desired timestamps output (too much: I will need some solution to grab only the pts and pts_time data out of it) but I have to produce a big dummy file. The worst thing is anyway, that the mjpeg encoding produces a low resolution very pixellated video again
I admit that the logic how to place the options and the output files on the command line is obscure to me. Possible combinations are many, and the more options I try the more complicated it gets, and I am not getting much closer to the solution.
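To grab only the pts and pts_time values, the showinfo log can be filtered after the fact. The sketch below assumes the log lines look like the usual `n: ... pts: ... pts_time:...` showinfo output, and the function name is made up. As far as I know, `-f null -` can be used as the output so that no dummy file is written, and showinfo can be appended to the same filter chain (for example `crop=...,format=gray,showinfo`) rather than given as a second -vf option.

```python
import re

# Assumed to be run on a log captured with something like:
#   ffmpeg -hide_banner -i outL.mp4 -vf showinfo -f null - 2> tsL.txt
SHOWINFO_RE = re.compile(r"\bn:\s*(\d+)\s+pts:\s*(-?\d+)\s+pts_time:\s*(-?[\d.]+)")


def parse_showinfo(log_text):
    """Return (frame_index, pts, pts_time) for every showinfo frame line."""
    rows = []
    for match in SHOWINFO_RE.finditer(log_text):
        n, pts, pts_time = match.groups()
        rows.append((int(n), int(pts), float(pts_time)))
    return rows


if __name__ == "__main__":
    sample = (
        "[Parsed_showinfo_0 @ 0x1] n:   0 pts:      0 pts_time:0       pos:       48 fmt:gray\n"
        "[Parsed_showinfo_0 @ 0x1] n:   1 pts:    512 pts_time:0.04    pos:     1024 fmt:gray\n"
    )
    for row in parse_showinfo(sample):
        print(row)
```

Lines that do not describe a frame (the banner, the `config in` line, encoder statistics) are simply skipped by the regular expression.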
3. [EDIT] instructions how to reproduce this
get a .h264 video
turn it into .mp4 with the ffmpeg command $ ffmpeg -i inVideo.h264 out.mp4
run the following python cell in a jupyter-notebook
see that the packets timestamps have diffs greater and less than zero
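%matplotlib inline
import av
import numpy as np
import matplotlib.pyplot as mpl
fname, ext = "outL.direct", "mp4"
# Open the container separately for each pass, since one pass consumes the stream.
cont = av.open(fname + "." + ext)
# Packet timestamps, in the order the demuxer returns them
pk_pts = np.array([p.pts for p in cont.demux(video=0) if p.pts is not None])
cont.close()
cont = av.open(fname + "." + ext)
# Decoded frame timestamps
fm_pts = np.array([f.pts for f in cont.decode(video=0) if f.pts is not None])
cont.close()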
finally create also the mjpeg encoded files in various ways, and check packets monotony with the same script (see also file size)
$ ffmpeg -i inVideo.h264 out.mjpeg
$ ffmpeg -i inVideo.h264 -c:v mjpeg out.c_mjpeg.mp4
$ ffmpeg -i inVideo.h264 -c:v mjpeg -q:v 1 out.c_mjpeg_q1.mp4
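The monotonicity check from the steps above can be reduced to a few lines. One plausible explanation for the result, assuming the H.264 stream uses B-frames, is that packets are stored in decoding order while decoded frames come out in presentation order; MJPEG has no such reordering because every frame is coded independently. The pts values below are invented to show the pattern.

```python
def is_monotonic(pts):
    """True if every timestamp is strictly greater than the one before it."""
    return all(b > a for a, b in zip(pts, pts[1:]))


if __name__ == "__main__":
    # Hypothetical packet pts in demux order for a stream with B-frames:
    # the frame with pts 3 is stored before the frames with pts 1 and 2.
    packet_pts = [0, 3, 1, 2, 6, 4, 5]
    # Decoding reorders frames into presentation order.
    frame_pts = sorted(packet_pts)
    print(is_monotonic(packet_pts), is_monotonic(frame_pts))
```

The same function accepts the `pk_pts` and `fm_pts` arrays from the notebook cell.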
Finally, the question
What is a working way / the right way to do it?
Any hints, even about single steps and how to combine them correctly, will be appreciated. Also, I am not limited to the command line, and I would be able to try a more programmatic solution in Python (Jupyter notebook) if someone points me in that direction.

FFMPEG - Speed up video for time lapse - quicker/faster?

Okay, I know this question has been asked a bajillion times. However, I have one small addition to the question that I haven't been able to find in my googling.
I'm certainly not a pro at FFMPEG...I've been using the standard speed up/slow down template for FFMPEG, the one I'm using is:
ffmpeg -i input.mp4 -filter:v "setpts=PTS/60" -an output.mp4
I'm currently working with an hour long 4K/60FPS video...I want to shrink it down to about 30 seconds or so, so I'm using PTS/100, and I don't need audio...the problem is, this is taking FOREVER...which I completely expected it to.
But as I'm sitting here waiting for it to finish...I can't help but wonder...is there a faster/more efficient way to accomplish this? I know there's a lot of weird things about FFMPEG in regards to the order of the commands you use to speed up seek time, and presets and etc.
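For reference, the divisor in setpts is just the input duration divided by the target duration. A small sketch of that arithmetic, with made-up helper names, using the command template from the question:

```python
def speed_factor(input_seconds, target_seconds):
    """How many times faster the output must play than the input."""
    return input_seconds / target_seconds


def timelapse_command(source, output, factor):
    """Build the setpts speed-up command from the question for a given factor."""
    return ["ffmpeg", "-i", source, "-filter:v", f"setpts=PTS/{factor:g}", "-an", output]


if __name__ == "__main__":
    factor = speed_factor(60 * 60, 30)  # one hour down to 30 seconds
    print(factor)
    print(timelapse_command("input.mp4", "output.mp4", factor))
```

An hour squeezed into 30 seconds needs a factor of 120; PTS/100 gives 36 seconds instead.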
You can use
ffmpeg -itsscale 0.016667 -i input.mp4 -c copy -an output.mp4
where 0.016667 is 1/60.
However, this will keep all frames, and if the input timebase doesn't have sufficient resolution, you'll have incorrect timestamps. You can work around that by creating a temp file first.
ffmpeg -i input.mp4 -c copy -video_track_timescale 90k -an temp.mp4
and then running the first command on this temp file.
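The two-step variant can be generated like this. It is a sketch only: the function name and the `temp` parameter are hypothetical, and the options are copied from the two commands above.

```python
def itsscale_commands(source, output, factor, temp="temp.mp4"):
    """Build the remux step and the timestamp-scaling step for a speed-up factor."""
    # Step 1: remux with a finer video timescale so scaled timestamps stay distinct.
    remux = ["ffmpeg", "-i", source, "-c", "copy",
             "-video_track_timescale", "90k", "-an", temp]
    # Step 2: scale input timestamps by 1/factor, still without re-encoding.
    scale = ["ffmpeg", "-itsscale", f"{1 / factor:.6f}", "-i", temp,
             "-c", "copy", "-an", output]
    return remux, scale


if __name__ == "__main__":
    for cmd in itsscale_commands("input.mp4", "output.mp4", 60):
        print(" ".join(cmd))
```

Because every frame is kept, the output frame rate is multiplied by the same factor, which some players may not handle well.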
This sequence of commands may be helpful to solve that issue:
ffmpeg -i source.avi -r 0.016667 image/image%05d.bmp
ffmpeg -i image/image%05d.bmp -vcodec libx264 -b:v 500k -f avi video.avi

ffmpeg multiple overlays at different times

I have written a bash script which cycles through various RTMP live streams and switches every thirty seconds.
I have a PNG sequence which plays at the start of the video (blindRev-%d.png). A blind is pulled up, revealing the stream.
28 seconds later, I would like it to come back down to cover the stream so that when the next stream is loaded, it retracts to reveal the next stream in sequence once again (blind-%d.png). I've tried using itsoffset to accomplish this. No audio is required.
However, only the first PNG sequence is played, the second never seems to happen.
The command I am using is:
ffmpeg -i rtmp://localhost/live/$stream -i blind/blindRev-%d.png -itsoffset 28 -i blind/blind-%d.png -filter_complex overlay -an -f flv rtmp://localhost/live/All
What am I doing incorrectly?
Many thanks.
You'll have to loop the images and then alter their timestamps.
ffmpeg -i rtmp://localhost/live/$stream -loop 1 -i blind/blindRev-%d.png \
  -filter_complex "[1]setpts='(mod(N,50)/25+30*trunc(N/50))/TB'[im];[0][im]overlay" \
  -an -f flv rtmp://localhost/live/All
I assume that your sequence has 50 images, and that you want the overlay to start every 30 seconds.
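To see what the timestamp expression is meant to do, here is the same mapping written out in seconds. It uses the assumptions stated above (50 images, restarting every 30 seconds) plus an assumed 25 fps for the image sequence; the function name is made up.

```python
def overlay_time(n, images=50, fps=25, period=30):
    """Seconds at which looped overlay frame n should be shown."""
    position_in_cycle = (n % images) / fps  # time within one pass of the sequence
    cycle_start = period * (n // images)    # each pass starts `period` seconds later
    return position_in_cycle + cycle_start


if __name__ == "__main__":
    for n in (0, 49, 50, 99, 100):
        print(n, overlay_time(n))
```

Frames 0 to 49 cover the first two seconds, frame 50 jumps to the 30-second mark, frame 100 to the 60-second mark, and so on.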

How to capture multiple screenshot from online video stream using ffmpeg with specific seek time

I'm using ffmpeg to take screenshots from an online video stream. I want to seek to multiple points in the timeline. I've used the following command to capture one screenshot at a seek position:
ffmpeg -ss 00:02:10 -i "stream-url" -frames:v 1 out1.jpg
How can I take multiple screenshots at multiple seek times? I've searched for a solution but without success.
I've used the following command to take multiple screenshots:
ffmpeg -noaccurate_seek -ss 00:01:10 -i "stream-url" -map 0:v:0 -vframes 1 -f mpeg "thumb/output_01.jpg" -ss 00:02:10 -i "stream-url" -map 1:v:0 -vframes 1 -f mpeg "thumb/output_02.jpg"
Is there any way to generate the screenshots from the same input with several seek points? How can I make it faster? How can I avoid repeating the input (-i parameter)? I've also tried other commands, but those are slower. Can anyone help me?
There's no easy way I know to specify a number of arbitrary seek points from which to extract frames (similar question here).
However, seeking is very fast with the way you specified. Instead of constructing a complex command, you could just download the YouTube video using youtube-dl (if you haven't done that already) and generate the commands like this:
ffmpeg -ss 00:01:10 -i input -frames:v 1 out1.jpg
ffmpeg -ss 00:02:05 -i input -frames:v 1 out2.jpg
ffmpeg -ss 00:03:20 -i input -frames:v 1 out3.jpg
Note that exporting JPG might lead to low quality. Using PNG is preferred; you will get lossless frames that you can handle with another program later (e.g. to resize or compress).
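Generating those commands for an arbitrary list of seek points could look like this. It is a minimal sketch with a hypothetical function name; it writes PNG by default, following the note above.

```python
def screenshot_commands(source, timestamps, pattern="out{}.png"):
    """Build one single-frame extraction command per seek timestamp."""
    return [
        ["ffmpeg", "-ss", ts, "-i", source, "-frames:v", "1", pattern.format(i)]
        for i, ts in enumerate(timestamps, start=1)
    ]


if __name__ == "__main__":
    for cmd in screenshot_commands("input", ["00:01:10", "00:02:05", "00:03:20"]):
        print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)`; since the commands are independent, they could also be run in parallel.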
If you want to get frames from regular intervals, use the fps filter to drop the framerate:
ffmpeg -i input -filter:v fps=1/60 out%02d.jpg
This will output a frame every minute (1/60 frames per second = 1 frame per minute), with two zero-padded digits as output numbers. You could additionally offset the start by providing a -ss option before the input file.

Add multiple audio files to video at specific points using FFMPEG

I am trying to create a video out of a sequence of images and various audio files using FFmpeg. While it is no problem to create a video containing the sequence of images with the following command:
ffmpeg -f image2 -i image%d.jpg video.mpg
I haven't found a way yet to add audio files at specific points to the generated video.
Is it possible to do something like:
ffmpeg -f image2 -i image%d.jpg -i audio1.mp3 AT 10s -i audio2.mp3 AT 15s video.mpg
Any help is much appreciated!
The solution in my case was to use sox as suggested by blahdiblah in the answer below. You first have to create an empty audio file as a starting point like that:
sox -n -r 44100 -c 2 silence.wav trim 0.0 20.0
This generates a 20 sec empty WAV file. After that you can mix the empty file with other audio files.
sox -m silence.wav "|sox sound1.mp3 -p pad 0" "|sox sound2.mp3 -p pad 2" out.wav
The final audio file has a duration of 20 seconds and plays sound1.mp3 right at the beginning and sound2.mp3 after 2 seconds.
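With more than a couple of sounds, the two sox commands can be generated from a list of (file, offset) pairs. This is only a sketch with made-up function names, reusing the sox options shown above.

```python
import shlex


def silence_command(output, seconds, rate=44100, channels=2):
    """Build the sox command that creates a silent file of the given length."""
    return ["sox", "-n", "-r", str(rate), "-c", str(channels),
            output, "trim", "0.0", f"{float(seconds)}"]


def sox_mix_command(silence, placements, output):
    """Build the sox mix command; placements is a list of (path, offset_seconds)."""
    cmd = ["sox", "-m", silence]
    for path, offset in placements:
        # sox runs each "|..." argument as a shell command, so quote the path.
        cmd.append(f"|sox {shlex.quote(path)} -p pad {offset:g}")
    cmd.append(output)
    return cmd


if __name__ == "__main__":
    print(silence_command("silence.wav", 20))
    print(sox_mix_command("silence.wav", [("sound1.mp3", 0), ("sound2.mp3", 2)], "out.wav"))
```

Both lists can be passed to `subprocess.run(cmd, check=True)` without a shell, because sox itself interprets the arguments that start with `|`.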
To combine the sequence of images with the audio file we can use FFmpeg.
ffmpeg -i video_%05d.png -i out.wav -r 25 out.mp4
See this question on adding a single audio input with some offset. The -itsoffset bug mentioned there is still open, but see users' comments for some cases in which it does work.
If it works in your case, that would be ideal:
ffmpeg -i in%d.jpg -itsoffset 10 -i audio1.mp3 -itsoffset 15 -i audio2.mp3 out.mpg
If not, you should be able to combine all the audio files with sox, overlaying or inserting silence to produce the correct offsets and then use that as input to FFmpeg. Not as convenient, but guaranteed to work.
One approach I can think of is to create your audio file for the whole duration of the video first and then mux the audio with the video file.
