How to add subtitles to an encoded video using ffmpeg?

I am trying to add my SRT file to a video using options from earlier answers here. My input file already has closed captions by default. I have tried several ways to get captions enabled in my encoded video. Below is the command I used.
ffmpeg -i input.ts -i captions.srt -b:a 32000 -ar 48000 -force_key_frames 'expr:gte(t,n_forced*3)' -acodec libfaac -hls_flags single_file -hls_list_size 0 -hls_time 3 -vcodec libx264 -s 320x240 -b:v 512000 -maxrate 512000 -c:s mov_text outfile.ts
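As an aside, the -force_key_frames expression above requests a keyframe whenever t reaches n_forced*3; since n_forced counts the keyframes already forced, that means one every 3 seconds, aligned with -hls_time 3. A quick sketch of the resulting keyframe times:

```shell
# Keyframe times implied by expr:gte(t, n_forced*3): frames are forced
# at t = 0, 3, 6, 9, 12, ... seconds.
awk 'BEGIN { for (n = 0; n < 5; n++) print n * 3 }'
```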
But mediainfo on the encoded file shows no captions. Here is the log of my command:
[mpegts # 0x56412e67b0c0] max_analyze_duration 5000000 reached at 5024000 microseconds st:1
input.ts FPS 29.970030 1
Input #0, mpegts, from 'input.ts':
Duration: 00:03:00.07, start: 1.400000, bitrate: 2172 kb/s
Program 1
Metadata:
service_name : Service01
service_provider: FFmpeg
Stream #0:0[0x100]: Video: mpeg2video (Main), 1 reference frame ([2][0][0][0] / 0x0002), yuv420p(tv), 704x480 [SAR 10:11 DAR 4:3], Closed Captions, max. 15000 kb/s,
29.97 fps, 29.97 tbr, 90k tbn, 59.94 tbc
Stream #0:1[0x101](eng): Audio: ac3 ([129][0][0][0] / 0x0081), 48000 Hz, stereo, fltp, 192 kb/s
Input #1, srt, from 'captions.srt':
Duration: N/A, bitrate: N/A
Stream #1:0: Subtitle: subrip
[graph 0 input from stream 0:0 # 0x56412e678540] w:704 h:480 pixfmt:yuv420p tb:1/90000 fr:30000/1001 sar:10/11 sws_param:flags=2
[scaler for output stream 0:0 # 0x56412e9caac0] w:320 h:240 flags:'bicubic' interl:0
[scaler for output stream 0:0 # 0x56412e9caac0] w:704 h:480 fmt:yuv420p sar:10/11 -> w:320 h:240 fmt:yuv420p sar:1/1 flags:0x4
[graph 1 input from stream 0:1 # 0x56412e9f74e0] tb:1/48000 samplefmt:fltp samplerate:48000 chlayout:0x3
[audio format for output stream 0:1 # 0x56412e9f7b40] auto-inserting filter 'auto-inserted resampler 0' between the filter 'Parsed_anull_0' and the filter 'audio format
for output stream 0:1'
[auto-inserted resampler 0 # 0x56412e9f9f20] ch:2 chl:stereo fmt:fltp r:48000Hz -> ch:2 chl:stereo fmt:s16 r:48000Hz
[libx264 # 0x56412e9bb8c0] VBV maxrate specified, but no bufsize, ignored
[libx264 # 0x56412e9bb8c0] using SAR=1/1
[libx264 # 0x56412e9bb8c0] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2
[libx264 # 0x56412e9bb8c0] profile High, level 2.0
[mpegts # 0x56412e9ba5a0] muxrate VBR, pcr every 2 pkts, sdt every 200, pat/pmt every 40 pkts
Output #0, mpegts, to 'my_encoded_all-3.ts':
Metadata:
encoder : Lavf57.25.100
Stream #0:0: Video: h264 (libx264), -1 reference frame, yuv420p, 320x240 [SAR 1:1 DAR 4:3], q=-1--1, 512 kb/s, 29.97 fps, 90k tbn, 29.97 tbc
Metadata:
encoder : Lavc57.24.102 libx264
Side data:
unknown side data type 10 (24 bytes)
Stream #0:1(eng): Audio: aac (libfaac), 48000 Hz, stereo, s16, 32 kb/s
Metadata:
encoder : Lavc57.24.102 libfaac
Stream #0:2(eng): Subtitle: subrip (srt), 320x240
Metadata:
encoder : Lavc57.24.102 srt
Stream mapping:
Stream #0:0 -> #0:0 (mpeg2video (native) -> h264 (libx264))
Stream #0:1 -> #0:1 (ac3 (native) -> aac (libfaac))
Stream #1:0 -> #0:2 (subrip (srt) -> subrip (srt))
Press [q] to stop, [?] for help
[scaler for output stream 0:0 # 0x56412e9caac0] w:704 h:480 fmt:yuv420p sar:40/33 -> w:320 h:240 fmt:yuv420p sar:4/3 flags:0x4
No more output streams to write to, finishing.e=00:02:58.85 bitrate= 658.4kbits/s speed=13.2x
frame= 5377 fps=385 q=-1.0 Lsize= 16672kB time=00:59:16.18 bitrate= 38.4kbits/s speed= 255x
video:11401kB audio:1410kB subtitle:446kB other streams:0kB global headers:0kB muxing overhead: 25.753614%
Input file #0 (input.ts):
Input stream #0:0 (video): 5380 packets read (40279443 bytes); 5377 frames decoded;
Input stream #0:1 (audio): 5625 packets read (4320000 bytes); 5625 frames decoded (8640000 samples);
Total: 11005 packets (44599443 bytes) demuxed
Input file #1 (captions.srt):
Input stream #1:0 (subtitle): 10972 packets read (447147 bytes); 10972 frames decoded;
Total: 10972 packets (447147 bytes) demuxed
Output file #0 (output.ts):
Output stream #0:0 (video): 5377 frames encoded; 5377 packets muxed (11675098 bytes);
Output stream #0:1 (audio): 8438 frames encoded (8640000 samples); 8439 packets muxed (1444109 bytes);
Output stream #0:2 (subtitle): 10972 frames encoded; 10972 packets muxed (456619 bytes);
Total: 24788 packets (13575826 bytes) muxed
[libx264 # 0x56412e9bb8c0] frame I:81 Avg QP:15.08 size: 16370
[libx264 # 0x56412e9bb8c0] frame P:2312 Avg QP:17.77 size: 3393
[libx264 # 0x56412e9bb8c0] frame B:2984 Avg QP:22.38 size: 839
[libx264 # 0x56412e9bb8c0] consecutive B-frames: 20.6% 13.2% 9.4% 56.8%
[libx264 # 0x56412e9bb8c0] mb I I16..4: 11.6% 37.0% 51.4%
[libx264 # 0x56412e9bb8c0] mb P I16..4: 1.2% 3.2% 2.4% P16..4: 33.7% 18.0% 15.7% 0.0% 0.0% skip:25.6%
[libx264 # 0x56412e9bb8c0] mb B I16..4: 0.2% 0.3% 0.2% B16..8: 30.7% 9.2% 3.2% direct: 5.1% skip:51.1% L0:33.6% L1:44.5% BI:21.9%
[libx264 # 0x56412e9bb8c0] final ratefactor: 16.76
[libx264 # 0x56412e9bb8c0] 8x8 transform intra:43.0% inter:49.8%
[libx264 # 0x56412e9bb8c0] coded y,uvDC,uvAC intra: 77.9% 85.1% 68.8% inter: 22.9% 21.7% 6.0%
[libx264 # 0x56412e9bb8c0] i16 v,h,dc,p: 30% 38% 5% 26%
[libx264 # 0x56412e9bb8c0] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 21% 23% 18% 4% 5% 7% 6% 7% 8%
[libx264 # 0x56412e9bb8c0] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 28% 23% 11% 5% 6% 8% 6% 7% 6%
[libx264 # 0x56412e9bb8c0] i8c dc,h,v,p: 46% 24% 21% 8%
[libx264 # 0x56412e9bb8c0] Weighted P-Frames: Y:4.2% UV:2.6%
[libx264 # 0x56412e9bb8c0] ref P L0: 73.3% 10.9% 11.1% 4.5% 0.1%
[libx264 # 0x56412e9bb8c0] ref B L0: 92.1% 6.3% 1.6%
[libx264 # 0x56412e9bb8c0] ref B L1: 97.4% 2.6%
The encode itself reported no issues, but the captions are not enabled in my output video. I played it in VLC: no subtitle tracks.
Is it not possible to add subtitles to a video while encoding?
Any help in achieving this would be appreciated.
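For reference, the log shows the SRT ended up as a subrip stream inside the .ts (the requested mov_text did not take effect, and mov_text is an MP4-only subtitle codec in any case); most TS players ignore such a track. One commonly suggested workaround, sketched here with the same file names and untested against this input, is to burn the subtitles into the picture via the subtitles filter (requires an ffmpeg built with libass; the native aac encoder stands in for libfaac, which newer builds no longer ship):

```shell
# Hypothetical sketch: hard-code captions.srt into the video frames so any
# player displays them, keeping the original scale/bitrate settings.
ffmpeg -i input.ts -vf "subtitles=captions.srt" \
    -c:v libx264 -s 320x240 -b:v 512000 -maxrate 512000 \
    -c:a aac -b:a 32000 -ar 48000 \
    outfile.ts
```

Keeping a selectable text track instead generally means switching containers (e.g., MP4 with -c:s mov_text, or HLS with WebVTT segments), since plain MPEG-TS has no standard slot for SRT-style subtitles.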

Related

Using FFmpeg to stitch together H.264 videos and variably-spaced JPEG pictures; dealing with ffmpeg warnings

Context
I have a process flow that may output either H.264 Annex B streams, variably-spaced JPEGs, or a mixture of the two. By variably-spaced I mean that the elapsed time between any two adjacent JPEGs may be (and likely is) different from that between any other two adjacent JPEGs. Examples of possible inputs are:
stream1.h264
{Set of JPEGs}
stream1.h264 + stream2.h264
stream1.h264 + {Set of JPEGs}
stream1.h264 + {Set of JPEGs} + stream2.h264
stream1.h264 + {Set of JPEGs} + stream2.h264 + {Set of JPEGs} + ...
stream1.h264 + stream2.h264 + {Set of JPEGs} + ...
The output needs to be a single stitched (i.e., concatenated) output in an MPEG-4 container.
Requirements: no re-encoding or transcoding of the existing video compression (a one-time conversion of JPEG sets to a video format is okay).
Solution Prototype
To prototype the solution I found that ffmpeg has a concat demuxer that lets me specify an ordered sequence of inputs, which ffmpeg then concatenates together, but all inputs must be of the same format. So, to meet that requirement, I:
Convert every JPEG set to an .mp4 using concat (using the duration # directive to specify the time-spacing between JPEGs)
Convert every .h264 to .mp4 using -c copy to avoid transcoding.
Stitch all generated interim .mp4 files into the single final .mp4 using -f concat and -c copy.
Here's the bash script, in parts, that performs the above:
Ignore the curl comment; it is left over from originally generating 100 JPEG images with numbers, which are now simply saved locally. The loop generates a concat input file with file sequence#.jpeg directives and a duration # directive, where each successive JPEG delay is incremented by 0.1 seconds (0.1 between the first and second, 0.2 between the 2nd and 3rd, 0.3 between the 3rd and 4th, and so on). It then runs an ffmpeg command to convert the set of JPEGs to an interim .mp4 file.
echo "ffconcat version 1.0" >ffconcat-jpeg.txt
echo >>ffconcat-jpeg.txt
for i in {1..100}
do
echo "file $i.jpeg" >>ffconcat-jpeg.txt
d=$(echo "$i" | awk '{printf "%f", $1 / 10}')
# d=$(echo "scale=2; $i/10" | bc)
echo "duration $d" >>ffconcat-jpeg.txt
echo "" >>ffconcat-jpeg.txt
# curl -o "$i.jpeg" "https://math.tools/equation/get_equaimages?equation=$i&fontsize=256"
done
ffmpeg \
-hide_banner \
-vsync vfr \
-f concat \
-i ffconcat-jpeg.txt \
-r 30 \
-video_track_timescale 90000 \
video-jpeg.mp4
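As a quick sanity check on the generated directives: the duration values 0.1 s through 10.0 s should sum to exactly the 00:08:25 that ffmpeg reports later for this concat input.

```shell
# Sum the duration directives the loop writes (i/10 s for i = 1..100).
# Summing the integers first keeps the arithmetic exact: 5050 / 10 = 505 s.
total=$(awk 'BEGIN { s = 0; for (i = 1; i <= 100; i++) s += i; printf "%d", s / 10 }')
printf '%02d:%02d:%02d\n' "$((total / 3600))" "$(((total % 3600) / 60))" "$((total % 60))"
# → 00:08:25
```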
Convert two streams from .h264 to .mp4 via copy (no transcoding).
ffmpeg \
-hide_banner \
-i low-motion-video.h264 \
-c copy \
-vsync vfr \
-video_track_timescale 90000 \
low-motion-video.mp4
ffmpeg \
-hide_banner \
-i full-video.h264 \
-c copy \
-video_track_timescale 90000 \
-vsync vfr \
full-video.mp4
Stitch all together by generating another concat directive file.
echo "ffconcat version 1.0" >ffconcat-h264.txt
echo >>ffconcat-h264.txt
echo "file low-motion-video.mp4" >>ffconcat-h264.txt
echo >>ffconcat-h264.txt
echo "file full-video.mp4" >>ffconcat-h264.txt
echo >>ffconcat-h264.txt
echo "file video-jpeg.mp4" >>ffconcat-h264.txt
echo >>ffconcat-h264.txt
ffmpeg \
-hide_banner \
-f concat \
-i ffconcat-h264.txt \
-pix_fmt yuv420p \
-c copy \
-video_track_timescale 90000 \
-vsync vfr \
video-out.mp4
Problem (and attempted troubleshooting)
The above does produce a reasonable output: it plays the first video, then the second video with no timing/rate issues as far as I can tell, then the JPEGs with the time between each JPEG "frame" growing successively, as expected.
But the conversion process produces warnings that concern me (for compatibility with players, or for other real-world streams that may trigger issues my prototyping content doesn't expose). Initial attempts generated hundreds of warnings; with some added arguments I reduced them to just a handful, but this handful is stubborn and nothing I have tried helps.
The first conversion of JPEGs to .mp4 goes fine with the following output:
Input #0, concat, from 'ffconcat-jpeg.txt':
Duration: 00:08:25.00, start: 0.000000, bitrate: 0 kb/s
Stream #0:0: Video: png, pal8(pc), 176x341 [SAR 3780:3780 DAR 16:31], 25 fps, 25 tbr, 25 tbn, 25 tbc
Stream mapping:
Stream #0:0 -> #0:0 (png (native) -> h264 (libx264))
Press [q] to stop, [?] for help
[libx264 # 0x7fe418008e00] using SAR=1/1
[libx264 # 0x7fe418008e00] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2
[libx264 # 0x7fe418008e00] profile High 4:4:4 Predictive, level 1.3, 4:4:4, 8-bit
[libx264 # 0x7fe418008e00] 264 - core 163 r3060 5db6aa6 - H.264/MPEG-4 AVC codec - Copyleft 2003-2021 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=4 threads=11 lookahead_threads=1 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, mp4, to 'video-jpeg.mp4':
Metadata:
encoder : Lavf58.76.100
Stream #0:0: Video: h264 (avc1 / 0x31637661), yuv444p(tv, progressive), 176x341 [SAR 1:1 DAR 16:31], q=2-31, 30 fps, 90k tbn
Metadata:
encoder : Lavc58.134.100 libx264
Side data:
cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: N/A
frame= 100 fps=0.0 q=-1.0 Lsize= 157kB time=00:07:55.33 bitrate= 2.7kbits/s speed=2.41e+03x
video:155kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 1.800846%
[libx264 # 0x7fe418008e00] frame I:1 Avg QP:20.88 size: 574
[libx264 # 0x7fe418008e00] frame P:43 Avg QP:14.96 size: 2005
[libx264 # 0x7fe418008e00] frame B:56 Avg QP:21.45 size: 1266
[libx264 # 0x7fe418008e00] consecutive B-frames: 14.0% 24.0% 30.0% 32.0%
[libx264 # 0x7fe418008e00] mb I I16..4: 36.4% 55.8% 7.9%
[libx264 # 0x7fe418008e00] mb P I16..4: 5.1% 7.5% 11.2% P16..4: 5.6% 8.1% 4.5% 0.0% 0.0% skip:57.9%
[libx264 # 0x7fe418008e00] mb B I16..4: 2.4% 0.9% 3.9% B16..8: 16.2% 8.8% 4.6% direct: 1.2% skip:62.0% L0:56.6% L1:38.7% BI: 4.7%
[libx264 # 0x7fe418008e00] 8x8 transform intra:28.3% inter:3.7%
[libx264 # 0x7fe418008e00] coded y,u,v intra: 26.5% 0.0% 0.0% inter: 9.8% 0.0% 0.0%
[libx264 # 0x7fe418008e00] i16 v,h,dc,p: 82% 13% 4% 0%
[libx264 # 0x7fe418008e00] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 20% 8% 71% 1% 0% 0% 0% 0% 0%
[libx264 # 0x7fe418008e00] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 41% 11% 29% 4% 2% 3% 1% 7% 1%
[libx264 # 0x7fe418008e00] Weighted P-Frames: Y:0.0% UV:0.0%
[libx264 # 0x7fe418008e00] ref P L0: 44.1% 4.2% 28.4% 23.3%
[libx264 # 0x7fe418008e00] ref B L0: 56.2% 32.1% 11.6%
[libx264 # 0x7fe418008e00] ref B L1: 92.4% 7.6%
[libx264 # 0x7fe418008e00] kb/s:2.50
The conversion of individual streams from .h264 to .mp4 generates two types of warnings each. One is [mp4 # 0x7faee3040400] Timestamps are unset in a packet for stream 0. This is deprecated and will stop working in the future. Fix your code to set the timestamps properly, and the other is [mp4 # 0x7faee3040400] pts has no value.
Some posts on SO (I can't find my original sources now) suggested that these are safe to ignore and stem from H.264 being an elementary stream that supposedly doesn't contain timestamps. That surprises me a bit, since I produce that stream using the NVENC API and clearly supply timing information for each frame via the PIC_PARAMS structure: NV_STRUCT(PIC_PARAMS, pp); ...; pp.inputTimeStamp = _frameIndex++ * (H264_CLOCK_RATE / _params.frameRate);, where #define H264_CLOCK_RATE 9000 and _params.frameRate = 30.
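For reference, the timing fed to NVENC by that expression steps by a fixed tick count per frame; a shell mirror of the C expression above, using the stated clock rate and frame rate:

```shell
# inputTimeStamp = frameIndex * (H264_CLOCK_RATE / frameRate)
# with H264_CLOCK_RATE = 9000 and frameRate = 30 → 300 ticks per frame
awk 'BEGIN { for (i = 0; i < 4; i++) print i * (9000 / 30) }'
```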
Input #0, h264, from 'low-motion-video.h264':
Duration: N/A, bitrate: N/A
Stream #0:0: Video: h264 (High), yuv420p(progressive), 1440x3040 [SAR 1:1 DAR 9:19], 30 fps, 30 tbr, 1200k tbn, 60 tbc
Output #0, mp4, to 'low-motion-video.mp4':
Metadata:
encoder : Lavf58.76.100
Stream #0:0: Video: h264 (High) (avc1 / 0x31637661), yuv420p(progressive), 1440x3040 [SAR 1:1 DAR 9:19], q=2-31, 30 fps, 30 tbr, 90k tbn, 1200k tbc
Stream mapping:
Stream #0:0 -> #0:0 (copy)
Press [q] to stop, [?] for help
[mp4 # 0x7faee3040400] Timestamps are unset in a packet for stream 0. This is deprecated and will stop working in the future. Fix your code to set the timestamps properly
[mp4 # 0x7faee3040400] pts has no value
[mp4 # 0x7faee3040400] pts has no value0kB time=-00:00:00.03 bitrate=N/A speed=N/A
Last message repeated 17985 times
frame=17987 fps=0.0 q=-1.0 Lsize= 79332kB time=00:09:59.50 bitrate=1084.0kbits/s speed=1.59e+03x
video:79250kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.103804%
Input #0, h264, from 'full-video.h264':
Duration: N/A, bitrate: N/A
Stream #0:0: Video: h264 (High), yuv420p(progressive), 1440x3040 [SAR 1:1 DAR 9:19], 30 fps, 30 tbr, 1200k tbn, 60 tbc
Output #0, mp4, to 'full-video.mp4':
Metadata:
encoder : Lavf58.76.100
Stream #0:0: Video: h264 (High) (avc1 / 0x31637661), yuv420p(progressive), 1440x3040 [SAR 1:1 DAR 9:19], q=2-31, 30 fps, 30 tbr, 90k tbn, 1200k tbc
Stream mapping:
Stream #0:0 -> #0:0 (copy)
Press [q] to stop, [?] for help
[mp4 # 0x7f9381864600] Timestamps are unset in a packet for stream 0. This is deprecated and will stop working in the future. Fix your code to set the timestamps properly
[mp4 # 0x7f9381864600] pts has no value
[mp4 # 0x7f9381864600] pts has no value0kB time=-00:00:00.03 bitrate=N/A speed=N/A
Last message repeated 17981 times
frame=17983 fps=0.0 q=-1.0 Lsize= 52976kB time=00:09:59.36 bitrate= 724.1kbits/s speed=1.33e+03x
video:52893kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.156232%
But the most worrisome error for me is from stitching together all interim .mp4 files into one:
[mov,mp4,m4a,3gp,3g2,mj2 # 0x7f9ff2010e00] Auto-inserting h264_mp4toannexb bitstream filter
Input #0, concat, from 'ffconcat-h264.txt':
Duration: N/A, bitrate: 1082 kb/s
Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1440x3040 [SAR 1:1 DAR 9:19], 1082 kb/s, 30 fps, 30 tbr, 90k tbn, 60 tbc
Metadata:
handler_name : VideoHandler
vendor_id : [0][0][0][0]
Output #0, mp4, to 'video-out.mp4':
Metadata:
encoder : Lavf58.76.100
Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1440x3040 [SAR 1:1 DAR 9:19], q=2-31, 1082 kb/s, 30 fps, 30 tbr, 90k tbn, 90k tbc
Metadata:
handler_name : VideoHandler
vendor_id : [0][0][0][0]
Stream mapping:
Stream #0:0 -> #0:0 (copy)
Press [q] to stop, [?] for help
[mov,mp4,m4a,3gp,3g2,mj2 # 0x7f9fe1009c00] Auto-inserting h264_mp4toannexb bitstream filter
[mp4 # 0x7f9ff2023400] Non-monotonous DTS in output stream 0:0; previous: 53954460, current: 53954460; changing to 53954461. This may result in incorrect timestamps in the output file.
[mov,mp4,m4a,3gp,3g2,mj2 # 0x7f9fd1008a00] Auto-inserting h264_mp4toannexb bitstream filter
[mp4 # 0x7f9ff2023400] Non-monotonous DTS in output stream 0:0; previous: 107900521, current: 107874150; changing to 107900522. This may result in incorrect timestamps in the output file.
[mp4 # 0x7f9ff2023400] Non-monotonous DTS in output stream 0:0; previous: 107900522, current: 107886150; changing to 107900523. This may result in incorrect timestamps in the output file.
frame=36070 fps=0.0 q=-1.0 Lsize= 132464kB time=00:27:54.26 bitrate= 648.1kbits/s speed=6.54e+03x
video:132296kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.126409%
I'm not sure how to deal with these non-monotonous DTS errors; no matter what I try, nothing budges. I analyzed the interim .mp4 files using ffprobe -show_frames and found that the last frame of each interim .mp4 has no DTS, while the previous frames do. E.g.:
...
[FRAME]
media_type=video
stream_index=0
key_frame=0
pkt_pts=53942461
pkt_pts_time=599.360678
pkt_dts=53942461
pkt_dts_time=599.360678
best_effort_timestamp=53942461
best_effort_timestamp_time=599.360678
pkt_duration=3600
pkt_duration_time=0.040000
pkt_pos=54161377
pkt_size=1034
width=1440
height=3040
pix_fmt=yuv420p
sample_aspect_ratio=1:1
pict_type=B
coded_picture_number=17982
display_picture_number=0
interlaced_frame=0
top_field_first=0
repeat_pict=0
color_range=unknown
color_space=unknown
color_primaries=unknown
color_transfer=unknown
chroma_location=left
[/FRAME]
[FRAME]
media_type=video
stream_index=0
key_frame=0
pkt_pts=53927461
pkt_pts_time=599.194011
pkt_dts=N/A
pkt_dts_time=N/A
best_effort_timestamp=53927461
...
My guess is that as the concat demuxer reads input (or somewhere else in ffmpeg's conversion pipeline), it sees no DTS set for the last frame and produces a virtual value equal to the last one seen. Further down the pipeline, that value is consumed, seen to repeat, a warning is issued, and the DTS is offset by an increment of one, which may be a somewhat nonsensical/unrealistic timing value.
I tried using -fflags +genpts as suggested in this SO answer, but that doesn't change anything.
Per other posts suggesting the issue lies with incompatible tbn and tbc values and possible timebase problems, I tried adding -time_base 1:90000, -enc_time_base 1:90000, and -copytb 1; nothing budges. The -video_track_timescale 90000 is there because it helped reduce those DTS warnings from hundreds down to 3, but it doesn't eliminate them all.
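The values in those warnings are in 90 kHz track-timescale ticks, so the size of each backward jump can be read off directly; e.g., the second warning's discontinuity:

```shell
# (previous - current) DTS from the second non-monotonous-DTS warning,
# converted to seconds at the -video_track_timescale of 90000.
awk 'BEGIN { printf "%.3f\n", (107900521 - 107874150) / 90000 }'
# → 0.293
```

That is, the stitched stream's DTS steps back by roughly 0.29 s at that join, in line with the missing-DTS last frames found with ffprobe.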
Question
What is missing and how can I get ffmpeg to perform conversions without these warnings, to be sure it produces proper, well-formed output?

FFmpeg, when live streaming, prints an error and exits after some frames were sent

When streaming with FFmpeg, everything works perfectly until I get these messages, at which point ffmpeg.exe exits:
av_interleaved_write_frame(): Unknown error
frame= 1224 fps=3.4 q=13.0 size= 2758kB time=00:01:21.94 bitrate= 275.8kbits/s speed=0.226x
av_interleaved_write_frame(): Unknown error
[flv # 000001e310e8a1c0] Failed to update header with correct duration.
[flv # 000001e310e8a1c0] Failed to update header with correct filesize.
Error writing trailer of rtmp://example.com/s/2b32abdc-130c-43e5-997e-079e69d1fd7f: Error number -10053 occurred
frame= 1224 fps=3.4 q=13.0 Lsize= 2758kB time=00:01:21.98 bitrate= 275.6kbits/s speed=0.226x
video:2481kB audio:221kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 2.084671%
[libx264 # 000001e310ad6080] frame I:41 Avg QP:10.29 size: 57664
[libx264 # 000001e310ad6080] frame P:1183 Avg QP:13.52 size: 148
[libx264 # 000001e310ad6080] mb I I16..4: 100.0% 0.0% 0.0%
[libx264 # 000001e310ad6080] mb P I16..4: 0.1% 0.0% 0.0% P16..4: 0.2% 0.0% 0.0% 0.0% 0.0% skip:99.7%
[libx264 # 000001e310ad6080] coded y,uvDC,uvAC intra: 10.9% 7.1% 5.4% inter: 0.0% 0.1% 0.0%
[libx264 # 000001e310ad6080] i16 v,h,dc,p: 84% 6% 6% 4%
[libx264 # 000001e310ad6080] i8c dc,h,v,p: 91% 6% 3% 1%
[libx264 # 000001e310ad6080] kb/s:248.98
[aac # 000001e310a46d40] Qavg: 108.454
Conversion failed!
Normally, the messages I receive are similar to this:
frame= 1196 fps=3.4 q=13.0 size= 2692kB time=00:01:20.08 bitrate= 275.4kbits/s speed=0.227x
These are the expected messages. Sometimes I receive the following message, but it does not cause ffmpeg.exe to exit:
Input #0, matroska,webm, from 'pipe:':
Metadata:
encoder : Chrome
Duration: N/A, start: 0.000000, bitrate: N/A
Stream #0:0(eng): Audio: opus, 48000 Hz, mono, fltp (default)
Stream #0:1(eng): Video: h264 (Constrained Baseline), yuv420p(progressive), 1920x1080, SAR 1:1 DAR 16:9, 30.30 fps, 14.99 tbr, 1k tbn, 60 tbc (default)
What might be happening? Is it a problem with the RTMP server, or is something wrong with FFmpeg?
This FFmpeg build is for Windows; I am launching the ffmpeg.exe process from C#.
As noted, this happens after many frames have been sent to the server. Only once did the problem occur after just a few frames. That is why I suspect the RTMP server.
EDIT: This is the command:
FFMPEG -i - \
  -c:v libx264 -preset ultrafast -tune zerolatency \
  -max_muxing_queue_size 1000 -bufsize 5000 \
  -r 15 -g 30 -keyint_min 30 -x264opts keyint=30 \
  -crf 25 -pix_fmt yuv420p -profile:v baseline -level 3 \
  -c:a aac -b:a 22k -ar 22050 \
  -f flv rtmp://rtmp.xxxx.yyyy
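For what it's worth, the keyframe settings in that command pin the GOP length; at -r 15 with -g 30 (and keyint=30), a keyframe lands every two seconds:

```shell
# seconds per GOP = gop_size / frame_rate
awk 'BEGIN { printf "%.1f\n", 30 / 15 }'
```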

FFMPEG - adding a nullsrc causes my script to report "1000 duplicate frames"

I'm trying to add coloured rectangle highlights to a video, appearing at different locations and times. The highlights are on a 6x6 grid of 320x180 rectangles.
Originally I didn't have nullsrc=size=1920x1080, thinking the chain would start from an empty image, but omitting it seems to make ffmpeg assume where the input comes from. So I added nullsrc=size=1920x1080 to start from a transparent 1920x1080 image, but this command warns that more than 1000 duplicate frames have been produced, and it keeps running past the end of the input video with no sign of stopping.
ffmpeg -y \
-i "Input.mp4" \
-filter_complex \
"nullsrc=size=1920x1080, drawbox=x=(3-1)*320:y=(3-1)*180:w=320:h=180:t=7:c=cyan, fade=in:st=10:d=1:alpha=1, fade=out:st=40:d=1:alpha=1[tmp1]; \
nullsrc=size=1920x1080, drawbox=x=(4-1)*320:y=(4-1)*180:w=320:h=180:t=7:c=blue, fade=in:st=20:d=1:alpha=1, fade=out:st=50:d=1:alpha=1[tmp2]; \
nullsrc=size=1920x1080, drawbox=x=(5-1)*320:y=(5-1)*180:w=320:h=180:t=7:c=green, fade=in:st=30:d=1:alpha=1, fade=out:st=60:d=1:alpha=1[tmp3]; \
nullsrc=size=1920x1080, drawbox=x=(6-1)*320:y=(6-1)*180:w=320:h=180:t=7:c=yellow, fade=in:st=40:d=1:alpha=1, fade=out:st=70:d=1:alpha=1[tmp4]; \
[tmp1][tmp2] overlay=0:0[ovr1]; \
[tmp3][tmp4] overlay=0:0[ovr2]; \
[ovr1][ovr2] overlay=0:0[boxes]; \
[0:v][boxes] overlay=0:0" \
"Output.mp4"
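The drawbox positions in the filtergraph come straight from the grid arithmetic; e.g., the cyan box for cell (3,3) of the 6x6 grid of 320x180 cells lands at the x:640 y:360 seen later in the log:

```shell
# x = (col - 1) * 320, y = (row - 1) * 180 for grid cell (3,3)
awk 'BEGIN { print (3 - 1) * 320, (3 - 1) * 180 }'
# → 640 360
```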
The input video is around 01:45 long. Log of run:
ffmpeg -y -loglevel verbose -i "Input.mp4" -filter_complex " nullsrc=size=1920x1080, drawbox=x=(3-1)*320:y=(3-1)*180:w=320:h=180:t=7:c=cyan, fade=in:st=10:d=1:alpha=1, fade=out:st=40:d=1:alpha=1[tmp1]; nullsrc=size=1920x1080, drawbox=x =(4-1)*320:y=(4-1)*180:w=320:h=180:t=7:c=blue, fade=in:st=20:d=1:alpha=1, fade=out:st=50:d=1:alpha=1[tmp2]; nullsrc=size=1920x1080, drawbox=x=(5-1)*320:y=(5-1)*180:w=320:h=180:t=7:c=green, fa
de=in:st=30:d=1:alpha=1, fade=out:st=60:d=1:alpha=1[tmp3]; nullsrc=size=1920x1080, drawbox=x=(6-1)*320:y=(6-1)*180:w=320:h=180:t=7:c=yellow, fade=in:st=40:d=1:alpha=1, fade=out:st=70:d=1:alpha=1[tmp4]; [tmp1][tmp2] overlay=0:0[ovr1]; [tmp3][tmp4] overlay=0:0[ovr2]; [ovr1][ovr2] overlay=0:0[boxes]; [0:v][boxes] overlay=0:0" "Output.mp4"
ffmpeg version 3.4 Copyright (c) 2000-2017 the FFmpeg developers
built with gcc 7.2.0 (GCC)
configuration: --enable-gpl --enable-version3 --enable-sdl2 --enable-bzlib --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvid
stab --enable-libvorbis --enable-cuda --enable-cuvid --enable-d3d11va --enable-nvenc --enable-dxva2 --enable-avisynth --enable-libmfx
libavutil 55. 78.100 / 55. 78.100
libavcodec 57.107.100 / 57.107.100
libavformat 57. 83.100 / 57. 83.100
libavdevice 57. 10.100 / 57. 10.100
libavfilter 6.107.100 / 6.107.100
libswscale 4. 8.100 / 4. 8.100
libswresample 2. 9.100 / 2. 9.100
libpostproc 54. 7.100 / 54. 7.100
[h264 # 000001df2b623ea0] Reinit context to 1920x1088, pix_fmt: yuv420p
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'Input.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf58.29.100
Duration: 00:01:48.67, start: 0.000000, bitrate: 1825 kb/s
Stream #0:0(und): Video: h264 (High), 1 reference frame (avc1 / 0x31637661), yuv420p(left), 1920x1080 (1920x1088) [SAR 1:1 DAR 16:9], 1693 kb/s, 30 fps, 30 tbr, 15360 tbn, 60 tbc (default)
Metadata:
handler_name : VideoHandler
Stream #0:1(und): Audio: mp3 (mp4a / 0x6134706D), 44100 Hz, stereo, s16p, 127 kb/s (default)
Metadata:
handler_name : SoundHandler
[Parsed_nullsrc_0 # 000001df2b61af00] size:1920x1080 rate:25/1 duration:-1.000000 sar:1/1
[Parsed_fade_2 # 000001df2b6a6860] type:in start_time:10.000000 duration:1.000000 alpha:1
[Parsed_fade_3 # 000001df2b9ddec0] type:out start_time:40.000000 duration:1.000000 alpha:1
[Parsed_nullsrc_4 # 000001df2bc00560] size:1920x1080 rate:25/1 duration:-1.000000 sar:1/1
[Parsed_fade_6 # 000001df29fe6780] type:in start_time:20.000000 duration:1.000000 alpha:1
[Parsed_fade_7 # 000001df29fe6840] type:out start_time:50.000000 duration:1.000000 alpha:1
[Parsed_nullsrc_8 # 000001df2b642dc0] size:1920x1080 rate:25/1 duration:-1.000000 sar:1/1
[Parsed_fade_10 # 000001df2b6442a0] type:in start_time:30.000000 duration:1.000000 alpha:1
[Parsed_fade_11 # 000001df2b6444e0] type:out start_time:60.000000 duration:1.000000 alpha:1
[Parsed_nullsrc_12 # 000001df2b62d000] size:1920x1080 rate:25/1 duration:-1.000000 sar:1/1
[Parsed_fade_14 # 000001df2b62d580] type:in start_time:40.000000 duration:1.000000 alpha:1
[Parsed_fade_15 # 000001df2b62ddc0] type:out start_time:70.000000 duration:1.000000 alpha:1
Stream mapping:
Stream #0:0 (h264) -> overlay:main (graph 0)
overlay (graph 0) -> Stream #0:0 (libx264)
Stream #0:1 -> #0:1 (mp3 (native) -> aac (native))
Press [q] to stop, [?] for help
[h264 # 000001df2b8e5040] Reinit context to 1920x1088, pix_fmt: yuv420p
[graph_1_in_0_1 # 000001df2b62e100] tb:1/44100 samplefmt:s16p samplerate:44100 chlayout:0x3
[format_out_0_1 # 000001df2b62e5e0] auto-inserting filter 'auto_resampler_0' between the filter 'Parsed_anull_0' and the filter 'format_out_0_1'
[auto_resampler_0 # 000001df2b62dd00] ch:2 chl:stereo fmt:s16p r:44100Hz -> ch:2 chl:stereo fmt:fltp r:44100Hz
[Parsed_nullsrc_0 # 000001df2b62e440] size:1920x1080 rate:25/1 duration:-1.000000 sar:1/1
[Parsed_fade_2 # 000001df2b62df60] type:in start_time:10.000000 duration:1.000000 alpha:1
[Parsed_fade_3 # 000001df2b62e6c0] type:out start_time:40.000000 duration:1.000000 alpha:1
[Parsed_nullsrc_4 # 000001df2b62d9c0] size:1920x1080 rate:25/1 duration:-1.000000 sar:1/1
[Parsed_fade_6 # 000001df2b62e520] type:in start_time:20.000000 duration:1.000000 alpha:1
[Parsed_fade_7 # 000001df2b62d8e0] type:out start_time:50.000000 duration:1.000000 alpha:1
[Parsed_nullsrc_8 # 000001df2b62e1e0] size:1920x1080 rate:25/1 duration:-1.000000 sar:1/1
[Parsed_fade_10 # 000001df2b62e2a0] type:in start_time:30.000000 duration:1.000000 alpha:1
[Parsed_fade_11 # 000001df2b62e380] type:out start_time:60.000000 duration:1.000000 alpha:1
[Parsed_nullsrc_12 # 000001df2b62dc20] size:1920x1080 rate:25/1 duration:-1.000000 sar:1/1
[Parsed_fade_14 # 000001df2b816f60] type:in start_time:40.000000 duration:1.000000 alpha:1
[Parsed_fade_15 # 000001df2b817ac0] type:out start_time:70.000000 duration:1.000000 alpha:1
[graph 0 input from stream 0:0 # 000001df2b817ba0] w:1920 h:1080 pixfmt:yuv420p tb:1/15360 fr:30/1 sar:1/1 sws_param:flags=2
[Parsed_drawbox_1 # 000001df2b62da80] x:640 y:360 w:320 h:180 color:0xA9A610FF
[Parsed_drawbox_5 # 000001df2b62e780] x:960 y:540 w:320 h:180 color:0x29F06EFF
[Parsed_overlay_16 # 000001df2b817860] main w:1920 h:1080 fmt:yuva420p overlay w:1920 h:1080 fmt:yuva420p
[Parsed_overlay_16 # 000001df2b817860] [framesync # 000001df2b815348] Selected 1/25 time base
[Parsed_overlay_16 # 000001df2b817860] [framesync # 000001df2b815348] Sync level 2
[Parsed_drawbox_9 # 000001df2b62db60] x:1280 y:720 w:320 h:180 color:0x515B51FF
[Parsed_drawbox_13 # 000001df2b818be0] x:1600 y:900 w:320 h:180 color:0xD21092FF
[Parsed_overlay_17 # 000001df2b816d00] main w:1920 h:1080 fmt:yuva420p overlay w:1920 h:1080 fmt:yuva420p
[Parsed_overlay_17 # 000001df2b816d00] [framesync # 000001df2b815e48] Selected 1/25 time base
[Parsed_overlay_17 # 000001df2b816d00] [framesync # 000001df2b815e48] Sync level 2
[Parsed_overlay_18 # 000001df2b8183c0] main w:1920 h:1080 fmt:yuva420p overlay w:1920 h:1080 fmt:yuva420p
[Parsed_overlay_18 # 000001df2b8183c0] [framesync # 000001df2b815668] Selected 1/25 time base
[Parsed_overlay_18 # 000001df2b8183c0] [framesync # 000001df2b815668] Sync level 2
[Parsed_overlay_19 # 000001df2b817d40] main w:1920 h:1080 fmt:yuv420p overlay w:1920 h:1080 fmt:yuva420p
[Parsed_overlay_19 # 000001df2b817d40] [framesync # 000001df2b816488] Selected 1/76800 time base
[Parsed_overlay_19 # 000001df2b817d40] [framesync # 000001df2b816488] Sync level 2
[libx264 # 000001df2b62cd40] using SAR=1/1
[libx264 # 000001df2b62cd40] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
[libx264 # 000001df2b62cd40] profile High, level 4.0
[libx264 # 000001df2b62cd40] 264 - core 152 r2851 ba24899 - H.264/MPEG-4 AVC codec - Copyleft 2003-2017 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=12 lookahead_threads=2 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weigh
tb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, mp4, to 'Output.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf57.83.100
Stream #0:0: Video: h264 (libx264), 1 reference frame (avc1 / 0x31637661), yuv420p(left), 1920x1080 [SAR 1:1 DAR 16:9], q=-1--1, 30 fps, 15360 tbn, 30 tbc (default)
Metadata:
encoder : Lavc57.107.100 libx264
Side data:
cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: -1
Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, delay 1024, 128 kb/s (default)
Metadata:
handler_name : SoundHandler
encoder : Lavc57.107.100 aac
[Parsed_overlay_19 # 000001df2b817d40] [framesync # 000001df2b816488] Sync level 1speed=0.78x
Past duration 0.800774 too large
*** 1 dup!5 fps= 24 q=29.0 size= 21760kB time=00:01:46.86 bitrate=1668.0kbits/s speed=0.78x
Last message repeated 1 times
*** 1 dup!8 fps= 24 q=29.0 size= 21760kB time=00:01:47.30 bitrate=1661.3kbits/s dup=2 drop=0 speed=0.78x
Last message repeated 2 times
*** 1 dup!3 fps= 24 q=29.0 size= 21760kB time=00:01:47.80 bitrate=1653.6kbits/s dup=5 drop=0 speed=0.781x
Last message repeated 1 times
...
Last message repeated 2 times
*** 1 dup!3 fps= 25 q=29.0 size= 27392kB time=00:05:05.80 bitrate= 733.8kbits/s dup=995 drop=0 speed=0.832x
Last message repeated 1 times
*** 1 dup!6 fps= 25 q=29.0 size= 27392kB time=00:05:06.23 bitrate= 732.8kbits/s dup=997 drop=0 speed=0.832x
Last message repeated 1 times
*** 1 dup!0 fps= 25 q=29.0 size= 27392kB time=00:05:06.70 bitrate= 731.6kbits/s dup=999 drop=0 speed=0.832x
Last message repeated 1 times
More than 1000 frames duplicated
*** 1 dup!3 fps= 25 q=29.0 size= 27392kB time=00:05:07.13 bitrate= 730.6kbits/s dup=1001 drop=0 speed=0.832x
Last message repeated 2 times
*** 1 dup!8 fps= 25 q=29.0 size= 27392kB time=00:05:07.63 bitrate= 729.4kbits/s dup=1004 drop=0 speed=0.832x
Last message repeated 1 times
...
*** 1 dup!9 fps= 25 q=29.0 size= 27904kB time=00:05:18.66 bitrate= 717.3kbits/s dup=1059 drop=0 speed=0.834x
Last message repeated 1 times
*** 1 dup!3 fps= 25 q=29.0 size= 27904kB time=00:05:19.13 bitrate= 716.3kbits/s dup=1061 drop=0 speed=0.834x
frame= 9635 fps= 25 q=-1.0 Lsize= 28364kB time=00:05:21.06 bitrate= 723.7kbits/s dup=1062 drop=0 speed=0.837x
video:26539kB audio:1637kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.669889%
Input file #0 (Input.mp4):
Input stream #0:0 (video): 3260 packets read (23006797 bytes); 3260 frames decoded;
Input stream #0:1 (audio): 4025 packets read (1682285 bytes); 4025 frames decoded (4636800 samples);
Total: 7285 packets (24689082 bytes) demuxed
Output file #0 (Output.mp4):
Output stream #0:0 (video): 9635 frames encoded; 9635 packets muxed (27175566 bytes);
Output stream #0:1 (audio): 4529 frames encoded (4636800 samples); 4530 packets muxed (1676214 bytes);
Total: 14165 packets (28851780 bytes) muxed
[libx264 # 000001df2b62cd40] frame I:39 Avg QP:16.49 size:213954
[libx264 # 000001df2b62cd40] frame P:2446 Avg QP:17.77 size: 6277
[libx264 # 000001df2b62cd40] frame B:7150 Avg QP:30.38 size: 486
[libx264 # 000001df2b62cd40] consecutive B-frames: 0.8% 0.8% 0.1% 98.3%
[libx264 # 000001df2b62cd40] mb I I16..4: 13.2% 43.0% 43.8%
[libx264 # 000001df2b62cd40] mb P I16..4: 0.2% 0.2% 0.1% P16..4: 7.1% 3.3% 1.7% 0.0% 0.0% skip:87.5%
[libx264 # 000001df2b62cd40] mb B I16..4: 0.0% 0.0% 0.0% B16..8: 4.3% 0.1% 0.0% direct: 0.0% skip:95.5% L0:41.9% L1:56.1% BI: 2.0%
[libx264 # 000001df2b62cd40] 8x8 transform intra:44.1% inter:65.8%
[libx264 # 000001df2b62cd40] coded y,uvDC,uvAC intra: 71.3% 83.5% 52.5% inter: 1.1% 1.5% 0.0%
[libx264 # 000001df2b62cd40] i16 v,h,dc,p: 31% 28% 4% 38%
[libx264 # 000001df2b62cd40] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 28% 20% 10% 6% 6% 7% 7% 8% 8%
[libx264 # 000001df2b62cd40] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 33% 28% 7% 4% 6% 6% 6% 5% 5%
[libx264 # 000001df2b62cd40] i8c dc,h,v,p: 34% 27% 28% 11%
[libx264 # 000001df2b62cd40] Weighted P-Frames: Y:0.0% UV:0.0%
[libx264 # 000001df2b62cd40] ref P L0: 71.1% 14.9% 10.6% 3.4%
[libx264 # 000001df2b62cd40] ref B L0: 93.3% 5.9% 0.8%
[libx264 # 000001df2b62cd40] ref B L1: 97.8% 2.2%
[libx264 # 000001df2b62cd40] kb/s:676.90
[aac # 000001df2b6a5620] Qavg: 1165.766
Exiting normally, received signal 2.
Terminate batch job (Y/N)? y

Error while processing the decoded data for stream using ffmpeg

I am using the following command:
ffmpeg
-i "video1a.flv"
-i "video1b.flv"
-i "video1c.flv"
-i "video2a.flv"
-i "video3a.flv"
-i "video4a.flv"
-i "video4b.flv"
-i "video4c.flv"
-i "video4d.flv"
-i "video4e.flv"
-filter_complex
nullsrc=size=640x480[base];
[0:v]setpts=PTS-STARTPTS+0.12/TB,scale=320x240[1a];
[1:v]setpts=PTS-STARTPTS+3469.115/TB,scale=320x240[1b];
[2:v]setpts=PTS-STARTPTS+7739.299/TB,scale=320x240[1c];
[5:v]setpts=PTS-STARTPTS+4390.466/TB,scale=320x240[4a];
[6:v]setpts=PTS-STARTPTS+6803.937/TB,scale=320x240[4b];
[7:v]setpts=PTS-STARTPTS+8242.005/TB,scale=320x240[4c];
[8:v]setpts=PTS-STARTPTS+9811.577/TB,scale=320x240[4d];
[9:v]setpts=PTS-STARTPTS+10765.19/TB,scale=320x240[4e];
[base][1a]overlay=eof_action=pass[o1];
[o1][1b]overlay=eof_action=pass[o1];
[o1][1c]overlay=eof_action=pass:shortest=1[o1];
[o1][4a]overlay=eof_action=pass:x=320:y=240[o4];
[o4][4b]overlay=eof_action=pass:x=320:y=240[o4];
[o4][4c]overlay=eof_action=pass:x=320:y=240[o4];
[o4][4d]overlay=eof_action=pass:x=320:y=240[o4];
[o4][4e]overlay=eof_action=pass:x=320:y=240;
[0:a]asetpts=PTS-STARTPTS+0.12/TB,aresample=async=1,pan=1c|c0=c0,apad[a1a];
[1:a]asetpts=PTS-STARTPTS+3469.115/TB,aresample=async=1,pan=1c|c0=c0,apad[a1b];
[2:a]asetpts=PTS-STARTPTS+7739.299/TB,aresample=async=1,pan=1c|c0=c0[a1c];
[3:a]asetpts=PTS-STARTPTS+82.55/TB,aresample=async=1,pan=1c|c0=c0,apad[a2a];
[4:a]asetpts=PTS-STARTPTS+2687.265/TB,aresample=async=1,pan=1c|c0=c0,apad[a3a];
[a1a][a1b][a1c][a2a][a3a]amerge=inputs=5
-c:v libx264 -c:a aac -ac 2 output.mp4
This is the stream data from ffmpeg:
Input #0
Stream #0:0: Video: vp6f, yuv420p, 160x128, 1k tbr, 1k tbn
Stream #0:1: Audio: nellymoser, 11025 Hz, mono, flt
Input #1
Stream #1:0: Audio: nellymoser, 11025 Hz, mono, flt
Stream #1:1: Video: vp6f, yuv420p, 160x128, 1k tbr, 1k tbn
Input #2
Stream #2:0: Audio: nellymoser, 11025 Hz, mono, flt
Stream #2:1: Video: vp6f, yuv420p, 160x128, 1k tbr, 1k tbn
Input #3
Stream #3:0: Audio: nellymoser, 11025 Hz, mono, flt
Input #4
Stream #4:0: Audio: nellymoser, 11025 Hz, mono, flt
Input #5
Stream #5:0: Video: vp6f, yuv420p, 1680x1056, 1k tbr, 1k tbn
Input #6
Stream #6:0: Video: vp6f, yuv420p, 1680x1056, 1k tbr, 1k tbn
Input #7
Stream #7:0: Video: vp6f, yuv420p, 1680x1056, 1k tbr, 1k tbn
Input #8
Stream #8:0: Video: vp6f, yuv420p, 1680x1056, 1k tbr, 1k tbn
Input #9
Stream #9:0: Video: vp6f, yuv420p, 1680x1056, 1k tbr, 1k tbn
Stream mapping:
Stream #0:0 (vp6f) -> setpts
Stream #0:1 (nellymoser) -> asetpts
Stream #1:0 (nellymoser) -> asetpts
Stream #1:1 (vp6f) -> setpts
Stream #2:0 (nellymoser) -> asetpts
Stream #2:1 (vp6f) -> setpts
Stream #3:0 (nellymoser) -> asetpts
Stream #4:0 (nellymoser) -> asetpts
Stream #5:0 (vp6f) -> setpts
Stream #6:0 (vp6f) -> setpts
Stream #7:0 (vp6f) -> setpts
Stream #8:0 (vp6f) -> setpts
Stream #9:0 (vp6f) -> setpts
overlay -> Stream #0:0 (libx264)
amerge -> Stream #0:1 (aac)
This is the error:
Press [q] to stop, [?] for help
Enter command: <target>|all <time>|-1 <command>[ <argument>]
Parse error, at least 3 arguments were expected, only 1 given in string 'ho Oscar'
[Parsed_amerge_44 # 0a7238c0] No channel layout for input 1
[Parsed_amerge_44 # 0a7238c0] Input channel layouts overlap: output layout will be determined by the number of distinct input channels
[Parsed_pan_27 # 07681880] Pure channel mapping detected: 0
[Parsed_pan_31 # 07681b40] Pure channel mapping detected: 0
[Parsed_pan_35 # 0a7232c0] Pure channel mapping detected: 0
[Parsed_pan_38 # 0a7234c0] Pure channel mapping detected: 0
[Parsed_pan_42 # 0a723740] Pure channel mapping detected: 0
[libx264 # 069e8a40] using SAR=1/1
[libx264 # 069e8a40] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
[libx264 # 069e8a40] profile High, level 3.0
[libx264 # 069e8a40] 264 - core 155 r2901 7d0ff22 - H.264/MPEG-4 AVC codec - Copyleft 2003-2018 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=15 lookahead_threads=2 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, mp4, to 'output.mp4':
Metadata:
canSeekToEnd : false
encoder : Lavf58.16.100
Stream #0:0: Video: h264 (libx264) (avc1 / 0x31637661), yuv420p(progressive), 640x480 [SAR 1:1 DAR 4:3], q=-1--1, 25 fps, 12800 tbn, 25 tbc (default)
Metadata:
encoder : Lavc58.19.102 libx264
Side data:
cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: -1
Stream #0:1: Audio: aac (LC) (mp4a / 0x6134706D), 11025 Hz, stereo, fltp, 128 kb/s (default)
Metadata:
encoder : Lavc58.19.102 aac
frame= 200 fps=0.0 q=28.0 size= 0kB time=00:00:07.82 bitrate= 0.0kbits/s speed=15.6x
...
frame=30132 fps=497 q=28.0 size= 29952kB time=00:20:05.14 bitrate= 203.6kbits/s speed=19.9x
Error while filtering: Cannot allocate memory
Failed to inject frame into filter network: Cannot allocate memory
Error while processing the decoded data for stream #2:1
[libx264 # 069e8a40] frame I:121 Avg QP: 8.83 size: 7052
[libx264 # 069e8a40] frame P:7609 Avg QP:18.33 size: 1527
[libx264 # 069e8a40] frame B:22367 Avg QP:25.44 size: 112
[libx264 # 069e8a40] consecutive B-frames: 0.6% 0.7% 1.0% 97.8%
[libx264 # 069e8a40] mb I I16..4: 75.7% 18.3% 6.0%
[libx264 # 069e8a40] mb P I16..4: 0.3% 0.7% 0.1% P16..4: 10.6% 3.3% 1.6% 0.0% 0.0% skip:83.4%
[libx264 # 069e8a40] mb B I16..4: 0.0% 0.0% 0.0% B16..8: 3.2% 0.2% 0.0% direct: 0.2% skip:96.5% L0:47.7% L1:48.2% BI: 4.0%
[libx264 # 069e8a40] 8x8 transform intra:37.4% inter:70.2%
[libx264 # 069e8a40] coded y,uvDC,uvAC intra: 38.9% 46.1% 28.7% inter: 1.7% 3.3% 0.1%
[libx264 # 069e8a40] i16 v,h,dc,p: 78% 8% 4% 10%
[libx264 # 069e8a40] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 33% 20% 12% 3% 6% 8% 6% 6% 7%
[libx264 # 069e8a40] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 37% 22% 9% 4% 6% 7% 5% 5% 4%
[libx264 # 069e8a40] i8c dc,h,v,p: 60% 16% 17% 7%
[libx264 # 069e8a40] Weighted P-Frames: Y:0.7% UV:0.6%
[libx264 # 069e8a40] ref P L0: 65.5% 12.3% 14.2% 8.0% 0.0%
[libx264 # 069e8a40] ref B L0: 90.2% 7.5% 2.3%
[libx264 # 069e8a40] ref B L1: 96.4% 3.6%
[libx264 # 069e8a40] kb/s:99.58
[aac # 069e9600] Qavg: 65519.982
[aac # 069e9600] 2 frames left in the queue on closing
Conversion failed!
I am trying to figure out how to fix these errors:
Error while filtering: Cannot allocate memory
Failed to inject frame into filter network: Cannot allocate memory
Error while processing the decoded data for stream #2:1
Observation #1
If I run the following command on stream #2:1 by itself:
ffmpeg -i video1c.flv -vcodec libx264 -acodec aac video1c.mp4
The file converts fine with no errors.
Observation #2
Running MediaInfo on video1c.flv (stream #2) shows the following:
Format: Flash Video
Video Codecs: On2 VP6
Audio Codecs: Nellymoser
Any help in resolving this error would be appreciated.
Update #1
I have tried splitting the filter graph into two as requested but I receive the same errors:
Error while filtering: Cannot allocate memory
Failed to inject frame into filter network: Cannot allocate memory
Error while processing the decoded data for stream #1:1
However, I did discover something: if I try to play stream #1:1 mentioned above (video1b.flv) in VLC Media Player, I can hear the audio but cannot see the video, and I receive this error message:
No suitable decoder module:
VLC Does not support the audio or video format "undf".
Unfortunately there is no way for you to fix this.
Update #2
The above error occurred with the 32-bit version of ffmpeg. I switched to a 64-bit machine and am now running the 64-bit build ffmpeg-20180605-b748772-win64-static.
Now I no longer receive the following error:
Error while processing the decoded data for stream #1:1
But I now have a new error. About an hour into the run, I receive the following:
av_interleaved_write_frame(): Cannot allocate memory
[mp4 # 000000000433f080] Application provided duration: 3327365388930198318
/ timestamp: 17178820096 is out of range for mov/mp4 format
I also tried remuxing all of the files first, as suggested, and running the above command on the remuxed files, but that did not help; I still get the same error.
Try with audio and video in different filtergraphs
ffmpeg
-i "video1a.flv"
-i "video1b.flv"
-i "video1c.flv"
-i "video2a.flv"
-i "video3a.flv"
-i "video4a.flv"
-i "video4b.flv"
-i "video4c.flv"
-i "video4d.flv"
-i "video4e.flv"
-filter_complex
nullsrc=size=640x480[base];
[0:v]setpts=PTS-STARTPTS+0.12/TB,scale=320x240[1a];
[1:v]setpts=PTS-STARTPTS+3469.115/TB,scale=320x240[1b];
[2:v]setpts=PTS-STARTPTS+7739.299/TB,scale=320x240[1c];
[5:v]setpts=PTS-STARTPTS+4390.466/TB,scale=320x240[4a];
[6:v]setpts=PTS-STARTPTS+6803.937/TB,scale=320x240[4b];
[7:v]setpts=PTS-STARTPTS+8242.005/TB,scale=320x240[4c];
[8:v]setpts=PTS-STARTPTS+9811.577/TB,scale=320x240[4d];
[9:v]setpts=PTS-STARTPTS+10765.19/TB,scale=320x240[4e];
[base][1a]overlay=eof_action=pass[o1];
[o1][1b]overlay=eof_action=pass[o1];
[o1][1c]overlay=eof_action=pass:shortest=1[o1];
[o1][4a]overlay=eof_action=pass:x=320:y=240[o4];
[o4][4b]overlay=eof_action=pass:x=320:y=240[o4];
[o4][4c]overlay=eof_action=pass:x=320:y=240[o4];
[o4][4d]overlay=eof_action=pass:x=320:y=240[o4];
[o4][4e]overlay=eof_action=pass:x=320:y=240
-filter_complex
[0:a]asetpts=PTS-STARTPTS+0.12/TB,aresample=async=1,pan=1c|c0=c0,apad[a1a];
[1:a]asetpts=PTS-STARTPTS+3469.115/TB,aresample=async=1,pan=1c|c0=c0,apad[a1b];
[2:a]asetpts=PTS-STARTPTS+7739.299/TB,aresample=async=1,pan=1c|c0=c0[a1c];
[3:a]asetpts=PTS-STARTPTS+82.55/TB,aresample=async=1,pan=1c|c0=c0,apad[a2a];
[4:a]asetpts=PTS-STARTPTS+2687.265/TB,aresample=async=1,pan=1c|c0=c0,apad[a3a];
[a1a][a1b][a1c][a2a][a3a]amerge=inputs=5
-c:v libx264 -c:a aac -ac 2 output.mp4

Specifying audio/video for a multiple stream/multiple file setup using ffmpeg

Folks, I have the following ffmpeg command:
ffmpeg
-i video1a -i video2a -i video3a -i video4a
-i video1b -i video2b -i video3b -i video4b
-i video1c
-filter_complex "
nullsrc=size=640x480 [base];
[0:v] setpts=PTS-STARTPTS+ 0/TB, scale=320x240 [1a];
[1:v] setpts=PTS-STARTPTS+ 300/TB, scale=320x240 [2a];
[2:v] setpts=PTS-STARTPTS+ 400/TB, scale=320x240 [3a];
[3:v] setpts=PTS-STARTPTS+ 400/TB, scale=320x240 [4a];
[4:v] setpts=PTS-STARTPTS+2500/TB, scale=320x240 [1b];
[5:v] setpts=PTS-STARTPTS+ 800/TB, scale=320x240 [2b];
[6:v] setpts=PTS-STARTPTS+ 700/TB, scale=320x240 [3b];
[7:v] setpts=PTS-STARTPTS+ 800/TB, scale=320x240 [4b];
[8:v] setpts=PTS-STARTPTS+3000/TB, scale=320x240 [1c];
[base][1a] overlay=eof_action=pass [o1];
[o1][1b] overlay=eof_action=pass [o1];
[o1][1c] overlay=eof_action=pass:shortest=1 [o1];
[o1][2a] overlay=eof_action=pass:x=320 [o2];
[o2][2b] overlay=eof_action=pass:x=320 [o2];
[o2][3a] overlay=eof_action=pass:y=240 [o3];
[o3][3b] overlay=eof_action=pass:y=240 [o3];
[o3][4a] overlay=eof_action=pass:x=320:y=240[o4];
[o4][4b] overlay=eof_action=pass:x=320:y=240"
-c:v libx264 output.mp4
I have just found out something about the files I will be processing with the above command: some mp4 files contain both video and audio, some contain audio only, and some contain video only. I can already determine which is which using ffprobe. My question is: how do I modify the above command to account for what each file contains (video/audio/both)?
This is which streams each file has:
video   streams
======= =========
Area 1:
video1a audio
video1b both
video1c video
Area 2:
video2a video
video2b audio
Area 3:
video3a video
video3b audio
Area 4:
video4a video
video4b both
Again, how do I correctly modify the command above to specify what each file has (audio/video/both)? Thank you.
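Since the question already relies on ffprobe to determine which files have audio/video/both, a minimal sketch of that classification step might look like the following. The `probe` helper and file names are illustrative assumptions; the `-show_entries stream=codec_type -of csv=p=0` invocation is the standard way to list one `audio`/`video` token per stream, and the parsing is split out so it can be checked without running ffprobe:

```python
import subprocess

def classify(codec_types):
    """codec_types: list of 'audio'/'video' strings, one per stream.
    Returns 'audio', 'video', or 'both'."""
    has_a = "audio" in codec_types
    has_v = "video" in codec_types
    if has_a and has_v:
        return "both"
    return "audio" if has_a else "video"

def probe(path):
    # Ask ffprobe for the codec_type of every stream, one per line.
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "stream=codec_type",
         "-of", "csv=p=0", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return classify(out.split())
```

With the resulting labels you know which `[N:v]`/`[N:a]` pads exist, so the filtergraph only references streams that are actually present.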
Update #1
I ran test as follows:
-i "video1a.flv"
-i "video1b.flv"
-i "video1c.flv"
-i "video2a.flv"
-i "video3a.flv"
-i "video4a.flv"
-i "video4b.flv"
-i "video4c.flv"
-i "video4d.flv"
-i "video4e.flv"
-filter_complex
nullsrc=size=640x480[base];
[0:v]setpts=PTS-STARTPTS+120/TB,scale=320x240[1a];
[1:v]setpts=PTS-STARTPTS+3469115/TB,scale=320x240[1b];
[2:v]setpts=PTS-STARTPTS+7739299/TB,scale=320x240[1c];
[5:v]setpts=PTS-STARTPTS+4390466/TB,scale=320x240[4a];
[6:v]setpts=PTS-STARTPTS+6803937/TB,scale=320x240[4b];
[7:v]setpts=PTS-STARTPTS+8242005/TB,scale=320x240[4c];
[8:v]setpts=PTS-STARTPTS+9811577/TB,scale=320x240[4d];
[9:v]setpts=PTS-STARTPTS+10765190/TB,scale=320x240[4e];
[base][1a]overlay=eof_action=pass[o1];
[o1][1b]overlay=eof_action=pass[o1];
[o1][1c]overlay=eof_action=pass:shortest=1[o1];
[o1][4a]overlay=eof_action=pass:x=320:y=240[o4];
[o4][4b]overlay=eof_action=pass:x=320:y=240[o4];
[o4][4c]overlay=eof_action=pass:x=320:y=240[o4];
[o4][4d]overlay=eof_action=pass:x=320:y=240[o4];
[o4][4e]overlay=eof_action=pass:x=320:y=240;
[0:a]asetpts=PTS-STARTPTS+120/TB,aresample=async=1,apad[a1a];
[1:a]asetpts=PTS-STARTPTS+3469115/TB,aresample=async=1,apad[a1b];
[2:a]asetpts=PTS-STARTPTS+7739299/TB,aresample=async=1[a1c];
[3:a]asetpts=PTS-STARTPTS+82550/TB,aresample=async=1,apad[a2a];
[4:a]asetpts=PTS-STARTPTS+2687265/TB,aresample=async=1,apad[a3a];
[a1a][a1b][a1c][a2a][a3a]amerge=inputs=5
-c:v libx264 -c:a aac -ac 2 output.mp4
This is the stream data from ffmpeg:
Input #0
Stream #0:0: Video: vp6f, yuv420p, 160x128, 1k tbr, 1k tbn
Stream #0:1: Audio: nellymoser, 11025 Hz, mono, flt
Input #1
Stream #1:0: Audio: nellymoser, 11025 Hz, mono, flt
Stream #1:1: Video: vp6f, yuv420p, 160x128, 1k tbr, 1k tbn
Input #2
Stream #2:0: Audio: nellymoser, 11025 Hz, mono, flt
Stream #2:1: Video: vp6f, yuv420p, 160x128, 1k tbr, 1k tbn
Input #3
Stream #3:0: Audio: nellymoser, 11025 Hz, mono, flt
Input #4
Stream #4:0: Audio: nellymoser, 11025 Hz, mono, flt
Input #5
Stream #5:0: Video: vp6f, yuv420p, 1680x1056, 1k tbr, 1k tbn
Input #6
Stream #6:0: Video: vp6f, yuv420p, 1680x1056, 1k tbr, 1k tbn
Input #7
Stream #7:0: Video: vp6f, yuv420p, 1680x1056, 1k tbr, 1k tbn
Input #8
Stream #8:0: Video: vp6f, yuv420p, 1680x1056, 1k tbr, 1k tbn
Input #9
Stream #9:0: Video: vp6f, yuv420p, 1680x1056, 1k tbr, 1k tbn
This is the error:
Stream mapping:
Stream #0:0 (vp6f) -> setpts
Stream #0:1 (nellymoser) -> asetpts
Stream #1:0 (nellymoser) -> asetpts
Stream #1:1 (vp6f) -> setpts
Stream #2:0 (nellymoser) -> asetpts
Stream #2:1 (vp6f) -> setpts
Stream #3:0 (nellymoser) -> asetpts
Stream #4:0 (nellymoser) -> asetpts
Stream #5:0 (vp6f) -> setpts
Stream #6:0 (vp6f) -> setpts
Stream #7:0 (vp6f) -> setpts
Stream #8:0 (vp6f) -> setpts
Stream #9:0 (vp6f) -> setpts
overlay -> Stream #0:0 (libx264)
amerge -> Stream #0:1 (aac)
Press [q] to stop, [?] for help
Enter command: <target>|all <time>|-1 <command>[ <argument>]
Parse error, at least 3 arguments were expected, only 1 given in string 'ho Oscar'
[Parsed_amerge_39 # 0aa147c0] No channel layout for input 1
Last message repeated 1 times
[AVFilterGraph # 05e01900] The following filters could not choose their formats: Parsed_amerge_39
Consider inserting the (a)format filter near their input or output.
Error reinitializing filters!
Failed to inject frame into filter network: I/O error
Error while processing the decoded data for stream #4:0
Conversion failed!
Update #2
Would it be like this?
-i "video1a.flv"
-i "video1b.flv"
-i "video1c.flv"
-i "video2a.flv"
-i "video3a.flv"
-i "video4a.flv"
-i "video4b.flv"
-i "video4c.flv"
-i "video4d.flv"
-i "video4e.flv"
-filter_complex
nullsrc=size=640x480[base];
[0:v]setpts=PTS-STARTPTS+120/TB,scale=320x240[1a];
[1:v]setpts=PTS-STARTPTS+3469115/TB,scale=320x240[1b];
[2:v]setpts=PTS-STARTPTS+7739299/TB,scale=320x240[1c];
[5:v]setpts=PTS-STARTPTS+4390466/TB,scale=320x240[4a];
[6:v]setpts=PTS-STARTPTS+6803937/TB,scale=320x240[4b];
[7:v]setpts=PTS-STARTPTS+8242005/TB,scale=320x240[4c];
[8:v]setpts=PTS-STARTPTS+9811577/TB,scale=320x240[4d];
[9:v]setpts=PTS-STARTPTS+10765190/TB,scale=320x240[4e];
[base][1a]overlay=eof_action=pass[o1];
[o1][1b]overlay=eof_action=pass[o1];
[o1][1c]overlay=eof_action=pass:shortest=1[o1];
[o1][4a]overlay=eof_action=pass:x=320:y=240[o4];
[o4][4b]overlay=eof_action=pass:x=320:y=240[o4];
[o4][4c]overlay=eof_action=pass:x=320:y=240[o4];
[o4][4d]overlay=eof_action=pass:x=320:y=240[o4];
[o4][4e]overlay=eof_action=pass:x=320:y=240;
[0:a]asetpts=PTS-STARTPTS+120/TB,aresample=async=1,pan=1c|c0=c0,apad[a1a];
[1:a]asetpts=PTS-STARTPTS+3469115/TB,aresample=async=1,pan=1c|c0=c0,apad[a1b];
[2:a]asetpts=PTS-STARTPTS+7739299/TB,aresample=async=1,pan=1c|c0=c0[a1c];
[3:a]asetpts=PTS-STARTPTS+82550/TB,aresample=async=1,pan=1c|c0=c0,apad[a2a];
[4:a]asetpts=PTS-STARTPTS+2687265/TB,aresample=async=1,pan=1c|c0=c0,apad[a3a];
[a1a][a1b][a1c][a2a][a3a]amerge=inputs=5
-c:v libx264 -c:a aac -ac 2 output.mp4
Update #3
Now getting this error:
Stream mapping:
Stream #0:0 (vp6f) -> setpts
Stream #0:1 (nellymoser) -> asetpts
Stream #1:0 (nellymoser) -> asetpts
Stream #1:1 (vp6f) -> setpts
Stream #2:0 (nellymoser) -> asetpts
Stream #2:1 (vp6f) -> setpts
Stream #3:0 (nellymoser) -> asetpts
Stream #4:0 (nellymoser) -> asetpts
Stream #5:0 (vp6f) -> setpts
Stream #6:0 (vp6f) -> setpts
Stream #7:0 (vp6f) -> setpts
Stream #8:0 (vp6f) -> setpts
Stream #9:0 (vp6f) -> setpts
overlay -> Stream #0:0 (libx264)
amerge -> Stream #0:1 (aac)
Press [q] to stop, [?] for help
Enter command: <target>|all <time>|-1 <command>[ <argument>]
Parse error, at least 3 arguments were expected, only 1 given in string 'ho Oscar'
[Parsed_amerge_44 # 0a9808c0] No channel layout for input 1
[Parsed_amerge_44 # 0a9808c0] Input channel layouts overlap: output layout will be determined by the number of distinct input channels
[Parsed_pan_27 # 07694800] Pure channel mapping detected: 0
[Parsed_pan_31 # 07694a80] Pure channel mapping detected: 0
[Parsed_pan_35 # 0a980300] Pure channel mapping detected: 0
[Parsed_pan_38 # 0a980500] Pure channel mapping detected: 0
[Parsed_pan_42 # 0a980780] Pure channel mapping detected: 0
[libx264 # 06ad78c0] using SAR=1/1
[libx264 # 06ad78c0] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
[libx264 # 06ad78c0] profile High, level 3.0
[libx264 # 06ad78c0] 264 - core 155 r2901 7d0ff22 - H.264/MPEG-4 AVC codec - Copyleft 2003-2018 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=15 lookahead_threads=2 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, mp4, to 'output.mp4':
Metadata:
canSeekToEnd : false
encoder : Lavf58.16.100
Stream #0:0: Video: h264 (libx264) (avc1 / 0x31637661), yuv420p(progressive), 640x480 [SAR 1:1 DAR 4:3], q=-1--1, 25 fps, 12800 tbn, 25 tbc (default)
Metadata:
encoder : Lavc58.19.102 libx264
Side data:
cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: -1
Stream #0:1: Audio: aac (LC) (mp4a / 0x6134706D), 11025 Hz, stereo, fltp, 128 kb/s (default)
Metadata:
encoder : Lavc58.19.102 aac
...
...
Error while processing the decoded data for stream #1:1
[libx264 # 06ad78c0] frame I:133 Avg QP: 8.58 size: 6481
[libx264 # 06ad78c0] frame P:8358 Avg QP:17.54 size: 1386
[libx264 # 06ad78c0] frame B:24582 Avg QP:24.27 size: 105
[libx264 # 06ad78c0] consecutive B-frames: 0.6% 0.5% 0.7% 98.1%
[libx264 # 06ad78c0] mb I I16..4: 78.3% 16.1% 5.6%
[libx264 # 06ad78c0] mb P I16..4: 0.3% 0.7% 0.1% P16..4: 9.6% 3.0% 1.4% 0.0% 0.0% skip:84.9%
[libx264 # 06ad78c0] mb B I16..4: 0.0% 0.0% 0.0% B16..8: 2.9% 0.1% 0.0% direct: 0.2% skip:96.8% L0:47.0% L1:49.0% BI: 4.0%
[libx264 # 06ad78c0] 8x8 transform intra:35.0% inter:70.1%
[libx264 # 06ad78c0] coded y,uvDC,uvAC intra: 36.8% 43.7% 27.3% inter: 1.6% 3.0% 0.1%
[libx264 # 06ad78c0] i16 v,h,dc,p: 79% 8% 4% 9%
[libx264 # 06ad78c0] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 32% 20% 12% 3% 6% 8% 6% 5% 7%
[libx264 # 06ad78c0] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 38% 22% 9% 4% 6% 7% 5% 5% 4%
[libx264 # 06ad78c0] i8c dc,h,v,p: 62% 15% 16% 7%
[libx264 # 06ad78c0] Weighted P-Frames: Y:0.6% UV:0.5%
[libx264 # 06ad78c0] ref P L0: 65.4% 12.3% 14.3% 7.9% 0.0%
[libx264 # 06ad78c0] ref B L0: 90.2% 7.5% 2.3%
[libx264 # 06ad78c0] ref B L1: 96.3% 3.7%
[libx264 # 06ad78c0] kb/s:90.81
[aac # 06ad8480] Qavg: 65519.970
[aac # 06ad8480] 2 frames left in the queue on closing
Conversion failed!
Use
ffmpeg
-i video1a -i video2a -i video3a -i video4a
-i video1b -i video2b -i video3b -i video4b
-i video1c
-filter_complex "
nullsrc=size=640x480 [base];
[1:v] setpts=PTS-STARTPTS+ 300/TB, scale=320x240 [2a];
[2:v] setpts=PTS-STARTPTS+ 400/TB, scale=320x240 [3a];
[3:v] setpts=PTS-STARTPTS+ 400/TB, scale=320x240 [4a];
[4:v] setpts=PTS-STARTPTS+2500/TB, scale=320x240 [1b];
[7:v] setpts=PTS-STARTPTS+2500/TB, scale=320x240 [4b];
[8:v] setpts=PTS-STARTPTS+3000/TB, scale=320x240 [1c];
[base][1b] overlay=eof_action=pass [o1];
[o1][1c] overlay=eof_action=pass:shortest=1 [o1];
[o1][2a] overlay=eof_action=pass:x=320 [o2];
[o2][3a] overlay=eof_action=pass:y=240 [o3];
[o3][4a] overlay=eof_action=pass:x=320:y=240[o4];
[o4][4b] overlay=eof_action=pass:x=320:y=240;
[0:a] asetpts=PTS-STARTPTS+ 0/TB, aresample=async=1, apad [a1a];
[4:a] asetpts=PTS-STARTPTS+2500/TB, aresample=async=1 [a1b];
[5:a] asetpts=PTS-STARTPTS+ 800/TB, aresample=async=1, apad [a2b];
[6:a] asetpts=PTS-STARTPTS+ 700/TB, aresample=async=1, apad [a3b];
[7:a] asetpts=PTS-STARTPTS+ 800/TB, aresample=async=1, apad [a4b];
[a1a][a1b][a2b][a3b][a4b]amerge=inputs=5"
-c:v libx264 -c:a aac -ac 2 output.mp4
For each video stream, apply the setpts (timestamp offset) and scale filters, then overlay the results.
For each audio stream, apply asetpts for the time offset, then aresample=async=1 to insert silence up to the start time, then apad to extend the end of the audio with silence. Skip apad for the audio stream that ends last. amerge joins all processed audio streams, and the merged output ends when that last audio stream ends.
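The audio half of that recipe can be sketched programmatically. The `build_audio_graph` helper below is a hypothetical illustration (not part of the answer itself) that emits the asetpts/aresample/apad/amerge chain for a list of (input index, offset in seconds, pad?) tuples, with the pad flag set to False for the stream that ends last:

```python
def build_audio_graph(tracks):
    """tracks: list of (input_index, offset_seconds, pad) tuples.
    Returns an ffmpeg filter_complex fragment that offsets each audio
    stream, resamples to fill leading silence, pads all but the
    last-ending stream, and merges everything with amerge."""
    chains = []
    labels = []
    for i, (idx, offset, pad) in enumerate(tracks):
        label = f"a{i}"
        chain = f"[{idx}:a]asetpts=PTS-STARTPTS+{offset}/TB,aresample=async=1"
        if pad:
            chain += ",apad"  # extend with trailing silence
        chain += f"[{label}]"
        chains.append(chain)
        labels.append(f"[{label}]")
    merge = "".join(labels) + f"amerge=inputs={len(tracks)}"
    return ";".join(chains + [merge])
```

For example, `build_audio_graph([(0, 0, True), (4, 2500, False)])` reproduces the shape of the `[a1a]`/`[a1b]` chains in the answer, which can help avoid hand-editing mistakes when the number of inputs changes.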

Resources