There are two ffmpeg commands. The first one seeks and stream-copies a video chunk. The second one transcodes that chunk, applying the select filter to get an exact frame match.
Here is how:
ffmpeg -ss <sec_from> -to <sec_to> -copyts -i <input> -map 0:v:0 -c copy chunk.mp4
ffmpeg -copyts -i chunk.mp4 -vf 'select=between(pts\,<pts_from>\,<pts_to>)' transcoded_chunk.mp4
It works fine most of the time, but for some inputs there is a slight pts drift in the extracted chunk, so missing frames are possible. In other words, the pts of the same packets (compared by hash) are shifted by a small amount (in my case 0.0002 sec) between the input and the chunked output.
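The drift can be seen by dumping the packet timestamps of both files and diffing them, for example (the exact ffprobe fields here are just one way to do it):
ffprobe -v error -select_streams v:0 -show_entries packet=pts_time -of csv=p=0 <input> > input_pts.txt
ffprobe -v error -select_streams v:0 -show_entries packet=pts_time -of csv=p=0 chunk.mp4 > chunk_pts.txt
diff input_pts.txt chunk_pts.txt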
What is the possible reason for such pts drift?
UPDATE 1: That's because ffmpeg sets timescale=1000 in the mvhd atom, so the edit-list media time to start from loses precision. Is it possible to force the mvhd timescale?
UPDATE 2: It's not possible to change the mvhd timescale because ffmpeg uses a constant (MOV_TIMESCALE 1000):
https://github.com/FFmpeg/FFmpeg/blob/82bd02a2c73bb5e6b7cf5e5eba486e279f1a7358/libavformat/movenc.c#L3498
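For what it's worth, the mov/mp4 muxer does expose a -video_track_timescale option, but as far as I can tell it only sets the per-track (mdhd) timescale; the movie timescale written into mvhd stays at the hard-coded 1000, so it does not fix the edit-list precision issue:
ffmpeg -ss <sec_from> -to <sec_to> -copyts -i <input> -map 0:v:0 -c copy -video_track_timescale 90000 chunk.mp4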
UPDATE 3: the same issue was discussed earlier.
I am searching for a method to extract certain frames from a video stream (.m2t or .ts file) as-is, still encoded. OpenCV also extracts frames easily, but it decodes them immediately.
Given:
A .ts or .m2t file with H.264/MPEG-4 encoded stream.
Starting point in time for extraction like h:m:s.f (example: 0:2:1.12).
Ending point in time in the same format.
I need to read all frames in the given interval from the file and provide them to another program as a buffer, frame by frame, as they are. The catch is to keep the frames encoded as they are, without decoding/re-encoding/re-encapsulating them.
Picking a frame from the H.264 m2t to a pipe:
ffmpeg -ss 0:2:1.12 -i .\my_video.ts -c:v copy -f mpegts -frames:v 1 pipe: -y -hide_banner
Obviously, the timestamp increases with every subsequent frame. From the pipe it is not a problem to convert the data to a buffer.
Questions:
Is this method correct for extracting a separate frame as-is, without any references to or recalculation against neighboring frames?
I am not sure that the -f mpegts flag really keeps the frame untouched. Is there a better flag? (Maybe -f null?)
How can I know the type of the extracted frame (I, P, or B)?
Thank you.
This answer diverges slightly from the original question. However, it does the job and gives a good enough result.
The original question asked to extract encoded video frame by frame. The suggested variants instead write compressed video to standard output in batches of several frames. This is easy to pick up in software and process/concatenate later as needed.
Variant 1
This method gives really smooth video when the resulting chunks are concatenated. It consumes more CPU than Variant 2, and processing each successive chunk of the movie takes longer and longer, so Variant 2 is far better in terms of performance.
ffmpeg.exe -y -nostdin -loglevel quiet -hide_banner -i "c:\\temp\\in.ts" -vf trim=<from_second>:<to_second> -f mpegts pipe:
The order of the options is important. They mean:
-y -nostdin -loglevel quiet -hide_banner - don't ask questions, don't print excessive output.
-vf trim=<from_second>:<to_second> - video filter that trims away everything except the part of the original file between the start and stop positions of the required chunk. Both values are floats (seconds).
-f mpegts - normally not needed if the output goes to a file: ffmpeg derives the format from the file name. In this case the result goes to a pipe, so an explicit output format specification is needed.
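A concrete invocation with made-up numbers, redirecting the piped chunk (from second 12.0 to 14.5) into a file for inspection:
ffmpeg.exe -y -nostdin -loglevel quiet -hide_banner -i "c:\\temp\\in.ts" -vf trim=12.0:14.5 -f mpegts pipe: > chunk_01.ts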
Variant 2
This method gives almost smooth video when the resulting chunks are concatenated. "Almost", because some visible jumping remains, so it is not a perfect method in terms of quality.
ffmpeg.exe -y -nostdin -loglevel quiet -hide_banner -ss <from_second> -t <chunk_duration> -i "c:\\temp\\in.ts" -copyts -f mpegts pipe:
Without repeating the options already explained for Variant 1, only the new options:
-ss <from_second> - skip the movie up to the specified position. It is important to give this option BEFORE -i to save processing time; otherwise ffmpeg reads and decodes the whole movie up to the specified position instead of seeking over it. The value can be supplied as h:mm:ss.ff or as float seconds.
-t <chunk_duration> - the required/optimal chunk size, as a float in seconds. If the GOP size is known, it's better to cut chunks at GOP boundaries; this improves performance.
-copyts - keep the timestamps of the video chunks from the original video. Without this option the result plays only the first frame of each chunk. A better interpretation/explanation is welcome.
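To find the GOP size mentioned above, and as one way to check the I/P/B type of each frame (the third question), ffprobe can report the picture type of every frame, for example:
ffprobe -v error -select_streams v:0 -show_entries frame=pict_type -of csv "c:\\temp\\in.ts"
Counting the frames between successive I entries gives the GOP size.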
I'd like to achieve two things without re-encoding the whole video stream:
Extend a video by freezing the last frame for a given duration.
Extend a video by freezing a frame at a given timestamp for a given duration.
Currently I'm using ffmpeg -i in.mp4 -vf tpad=stop_mode=clone:stop_duration=5 out.mp4, but it requires re-encoding the whole video stream and only allows freezing the last frame of the stream. To get my desired result I would need to split the video into segments, extract the last second of a segment to a separate file (so that only that part is re-encoded), run the above command on it, and then merge all the segments back with the concat demuxer.
Is there any better and simpler way to achieve the above?
To 'extend' the last frame, extend the audio stream by padding it; since the video stream is copied, players hold the last video frame while the padded audio plays out.
ffmpeg -i in.mp4 -c:v copy -af apad -t 5 out.mp4
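Note that -t here is the total duration of the output, not the amount of padding; for example, if in.mp4 is 60 seconds long and the last frame should be held for 5 extra seconds (illustrative numbers):
ffmpeg -i in.mp4 -c:v copy -af apad -t 65 out.mp4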
If there's no existing audio stream, add one
ffmpeg -i in.mp4 -f lavfi -i anullsrc -c:v copy -af apad -t 5 out.mp4
For pausing a frame in the middle with minimal re-encoding, segmenting + concat is indeed the way to go.
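A hedged sketch of that workflow, assuming the freeze should happen around the 60-second mark and last 5 seconds (cut points, durations and file names are made up; stream-copy cuts land on keyframes, apad=pad_dur needs a reasonably recent ffmpeg, and the re-encoded middle part must end up with the same codec parameters as the copied parts for the concat demuxer to accept -c copy):
ffmpeg -i in.mp4 -t 59 -c copy part1.mp4
ffmpeg -ss 59 -i in.mp4 -t 1 -vf tpad=stop_mode=clone:stop_duration=5 -af apad=pad_dur=5 part2.mp4
ffmpeg -ss 60 -i in.mp4 -c copy part3.mp4
ffmpeg -f concat -safe 0 -i list.txt -c copy out.mp4
where list.txt contains one "file 'partN.mp4'" line per segment, in order.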
Machine learning algorithms for video processing typically work on frames (images) rather than video.
In my work, I use ffmpeg to dump a specific scene as a sequence of .png files, process them in some way (denoise, deblur, colorize, annotate, inpainting, etc), output the results into an equal number of .png files, and then update the original video with the new frames.
This works well with constant frame-rate (CFR) video. I dump the images like so (e.g., a 50-frame sequence starting at 1:47):
ffmpeg -i input.mp4 -vf "select='gte(t,107)*lt(selected_n,50)'" -vsync passthrough '107+%06d.png'
And then, after editing the images, I replace the originals like so (for a 12.5 fps CFR video):
ffmpeg -i input.mp4 -itsoffset 107 -framerate 25/2 -i '107+%06d.png' -filter_complex "[0]overlay=eof_action=pass" -vsync passthrough -c:a copy output.mp4
However, many of the videos I work with are variable frame-rate (VFR), and this has created some challenges.
A simple solution is to convert VFR video to CFR, which ffmpeg wants to do anyway, but I'm wondering if it's possible to avoid this. The reason is that CFR conversion requires either dropping frames (since the purpose of ML video processing is usually to improve the output, I'd like to avoid that) or duplicating frames (an upscaling algorithm I'm working with right now uses the previous and next frame for data; if the previous or next frame is a duplicate, there is no data for upscaling).
With -vsync passthrough, I had hoped that I could simply remove the -framerate option, and preserve the original frames as-is, but the resulting command:
ffmpeg -i input.mp4 -itsoffset 107 -i '107+%06d.png' -filter_complex "[0]overlay=eof_action=pass" -vsync passthrough -c:a copy output.mp4
uses ffmpeg's default of 25fps, and drops a lot of frames. Is there a reliable way to replace frames in VFR video?
Yes, it can be done, but it's complicated. It is crucial that the overlay video have exactly the same frame timestamps as the underlay video for this process to work reliably. Generating such a VFR video segment overlay requires capturing the frame timestamps from the source video to generate a precisely timed replacement segment.
The short version of the process is to replace the above commands with the following to extract the images:
ffmpeg -i input.mp4 -vf "select='gte(t,107)*lt(selected_n,50)',showinfo" -vsync passthrough '107+%06d.png' 2>&1 | sed 's/\r/\n/g' | showinfo2concat.py --prefix="107+" >concat.txt
This requires a script that can be downloaded here. After editing the images, update the source video with:
ffmpeg -i input.mp4 -f concat -safe 0 -i concat.txt -filter_complex "[1]settb=1/90000,setpts=9644455+PTS*25/90000[o];[0:v:0][o]overlay=eof_action=pass" -vsync passthrough -r 90000 output.mp4
Where 90000 is the timescale (inverse of timebase), and 9644455 is the PTS of the first frame to replace.
See the source for more details about what these commands actually do.
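For reference, the concat.txt consumed by the second command is an ordinary concat-demuxer script; presumably the helper writes each image with its exact display duration taken from the showinfo timestamps, roughly along these lines (the numbers are purely illustrative):
ffconcat version 1.0
file 107+000001.png
duration 0.080
file 107+000002.png
duration 0.040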
I am trying to concat multiple videos into one video and add background music to it.
For some reason the background music is added to the output video perfectly, but the audio of each part of the output is sped up to a chipmunk version of itself. This results in a 7-minute output video with about 5 minutes of silence, since everything is so fast that all the audio finishes after about 2 minutes.
My command is:
ffmpeg -safe 0 -i videolist.ffconcat -i bg_loop.mp3 -y -filter_complex "[1:0]volume=0.3[a1];[0:a][a1]amix=inputs=2" -vcodec libx264 -r 25 -filter:v scale=w=1920:h=1080 -map 0:v:0 output.mp4
I tried removing the background music (since I wasn't able to loop it through the video, I thought maybe that was the issue), and still all the audio of the video clips is sped up, resulting in chaotic audio at the beginning and silence at the end.
My video list looks like this:
ffconcat version 1.0
file intro.mp4
file clip-x.mp4
file clip-y.mp4
file clip-x.mp4
file clip-y.mp4
[... and so on]
I hope somebody can tell me what I'm doing wrong here (and maybe how to adjust my command to loop the background music through all the clips).
I googled a bit and found the suggestion to adjust my command to amix=inputs=2:duration=first, but that doesn't do the trick, and with duration=shortest or duration=longest nothing about the output audio changes.
The concat demuxer requires that all streams in the inputs have the same properties. For audio, that includes codec, sampling rate, channel layout, sample format, and so on.
If the audio of some inputs sounds funny after concat, that usually indicates a sampling rate mismatch. Run ffprobe -show_streams -select_streams a -v 0 "input-file" on each input to check. For those that differ, you can re-encode only the audio by adding -ar X, where X is the most common sampling rate found among your inputs, e.g. -ar 44100. Other parameters will depend on format details. Keep the video by using -c:v copy.
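A sketch of that per-input fix, assuming 44100 Hz turns out to be the common rate and the clips use AAC audio (adjust -c:a to whatever the inputs actually use), then point videolist.ffconcat at the re-encoded file:
ffprobe -show_streams -select_streams a -v 0 clip-x.mp4
ffmpeg -i clip-x.mp4 -c:v copy -c:a aac -ar 44100 clip-x-fixed.mp4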
I have an FLV video and want to dump, let's say, 3 s of the video starting from the first keyframe (PICT_TYPE_I) encountered after 00:39. I read the ffmpeg seeking documentation and quote it here:
ffmpeg -ss 00:23:00 -i Mononoke.Hime.mkv -frames:v 1 out1.jpg
This example will produce one image frame (out1.jpg) at the
twenty-third minute from the beginning of the movie. The input will be
parsed using keyframes, which is very fast. As of FFmpeg 2.1, when
transcoding with ffmpeg (i.e. not just stream copying), -ss is now
also "frame-accurate" even when used as an input option. Previous
behavior (seeking only to the nearest preceding keyframe, even if not
precisely accurate) can be restored with the -noaccurate_seek option.
So I think if I use this command (putting -ss before -i):
ffmpeg -noaccurate_seek -ss 00:39 -i input.flv -r 10 -s 720x400 -t 3.12 dump.flv
This should dump a video that lasts 3.12 s and begins with the first keyframe after 00:39, right? After all, this is what I need.
But the resulting dump.flv does not start with a keyframe, i.e. a PICT_TYPE_I frame.
I know I could find all keyframe start times with ffprobe and recalculate the -ss seek time to achieve this, but is there a better way?
If audio is of no concern, you can use
ffmpeg -ss 39.00 -i in.flv -vf "select='if(eq(pict_type,I),st(1,t),gt(t,ld(1)))',setpts=N/FRAME_RATE/TB" -an -t 3.12 out.flv
Here, the select filter discards all frames before the first keyframe that comes after the seek point. The -t value controls the total duration selected.
Do note that video duration is quantized, i.e. a 25 fps video can only have a duration in steps of 0.04 s.
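For completeness, the keyframe-lookup alternative mentioned in the question can be done by telling the decoder to skip non-key frames and printing their timestamps (the field is pts_time in recent ffprobe builds, pkt_pts_time in older ones):
ffprobe -v error -select_streams v:0 -skip_frame nokey -show_entries frame=pts_time -of csv=p=0 input.flv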