I have recordings of each side of a video call, where each side records only its own audio/video. I want to merge and sync them so that they look like a recording of the complete call.
I know the time at which each side started its recording, e.g. the first from 0 and the second from 50 seconds after the first one started.
I found the ffmpeg library and am trying to do this with it. So far, I am only able to map both recordings into a single file in which they start at the same time. The problem is that when the two input streams have different durations, the library triggers a memory-related error.
ffmpeg -i 1.mp4 -i 2.mp4 -filter_complex "[0:v][1:v]hstack[t]; [0:a][1:a]amerge=inputs=2[b]" -map "[t]" -map "[b]" out.mp4
I see the following error:
Error while filtering=28.0 size= 166kB time=00:00:06.58 bitrate= 207.1kbits/s speed=3.26x
Failed to inject frame into filter network: Cannot allocate memory
Error while processing the decoded data for stream #1:1
Since both streams start at the same time, their audio/video don't sync.
Cutting the longer stream, merging it with the shorter one, and finally concatenating the pieces doesn't look like a good option to me.
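One direction I'm considering instead is delaying the second input inside the filter graph. A rough, untested sketch, assuming the 50-second offset from above and that the tpad/adelay filters behave as documented:
ffmpeg -i 1.mp4 -i 2.mp4 -filter_complex \
  "[1:v]tpad=start_duration=50[v1]; \
   [1:a]adelay=50000|50000[a1]; \
   [0:v][v1]hstack[t]; \
   [0:a][a1]amerge=inputs=2[b]" \
  -map "[t]" -map "[b]" out.mp4
Here tpad pads the late video with 50 seconds of black and adelay shifts both audio channels by 50000 ms; the remaining duration mismatch at the end would presumably still need apad or -shortest.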
Can you please suggest how to achieve this without losing audio/video?
Thanks,
R.
Related
I am making a datamoshing program in C++, and I need to find a way to remove one frame from a video (specifically, the p-frame right after a sequence jump) without re-encoding the video. I am currently using h.264 but would like to be able to do this with VP9 and AV1 as well.
I have one way of going about it, but it doesn't work for one frustrating reason (mentioned later). I can turn the original video into two intermediate videos - one with just the i-frame before the sequence jump, and one with the p-frame that was two frames later. I then create a concat.txt file with the following contents:
file video.mkv
file video1.mkv
And run ffmpeg -y -f concat -i concat.txt -c copy output.mp4. This produces the expected output, although it is of course not as efficient as I would like, since it requires creating intermediate files and reading the .txt file from disk (performance is very important in this project).
But worse yet, I couldn't generate the intermediate videos with ffmpeg; I had to use avidemux. I tried all sorts of variations on ffmpeg -y -ss 00:00:00 -i video.mp4 -t 0.04 -codec copy video.mkv, but that command really bugs out on videos 1-2 frames long, while it works fine for longer videos. My best guess is that there is some internal check to ensure the output video is not corrupt (which, unfortunately, is exactly what I want it to be!).
Maybe there's a way to do it this way that gets around that problem, or better yet, a more elegant solution to the problem in the first place.
Thanks!
If you know the PTS or data offset or packet index of the target frame, then you can use the noise bitstream filter. This is codec-agnostic.
ffmpeg -copyts -i input -c copy -enc_time_base -1 -bsf:v:0 noise=drop=eq(pos\,11291) out
This will drop the packet from the first video stream stored at offset 11291 in the input file. See other available variables at http://www.ffmpeg.org/ffmpeg-bitstream-filters.html#noise
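For example, to drop by packet index instead of byte offset (assuming the target is, say, the packet at index 5; n is one of the expression variables listed on that page):
ffmpeg -copyts -i input -c copy -enc_time_base -1 -bsf:v:0 noise=drop=eq(n\,5) out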
I'm encoding videos scene by scene. At the moment I have two ways of doing so. The first one uses a Python application which gives me a list of frame numbers marking the scene changes, like this:
285
378
553
1145
...
The first scene runs from frame 1 to 285, the second from 285 to 378, and so on. So I made a bash script which encodes all these scenes. Basically, it takes the current and previous frame numbers, converts them to times, and finally runs the ffmpeg command:
begin=$(awk -v f="$previous" 'BEGIN{ print f/24 }')
end=$(awk -v f="$current" 'BEGIN{ print f/24 }')
time=$(awk -v b="$begin" -v e="$end" 'BEGIN{ print e-b }')
ffmpeg -i "$video" -r 24 -c:v libx265 -f mp4 -c:a aac -strict experimental -b:v 1.5M -ss "$begin" -t "$time" "output$count.mp4" -nostdin
This works perfectly. The second method uses ffmpeg itself: I run a command that gives me a list of times, like this:
15.75
23.0417
56.0833
71.2917
...
Again I made a bash script that encodes all these scenes. In this case I don't have to convert frames to times, because what I get are already times:
time=$(awk -v p="$previous" -v c="$current" 'BEGIN{ print c-p }')
ffmpeg -i "$video" -r 24 -c:v libx265 -f mp4 -c:a aac -strict experimental -b:v 1.5M -ss "$previous" -t "$time" "output$count.mp4" -nostdin
With all this explained, here comes the problem. Once all the scenes are encoded I need to concatenate them, and for that I create a list with the video names and then run the ffmpeg concat command.
list.txt
file 'output1.mp4'
file 'output2.mp4'
file 'output3.mp4'
file 'output4.mp4'
command:
ffmpeg -f concat -i list.txt -c copy big_buck_bunny.mp4
The problem is that the concatenated video is longer than the original by 2.11 seconds: the original lasts 596.45 seconds and the encoded one 598.56. If I add up every segment's duration I also get 598.56, so I think the problem is in the encoding process, even though both videos have the same number of frames. My goal is to get metrics about the encoding process, but when I run VQMT to get PSNR and SSIM I get weird results, which I think is caused by this problem.
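(For anyone reproducing this: the per-segment durations can be read with a standard ffprobe invocation, e.g.:
ffprobe -v error -show_entries format=duration -of csv=p=0 output1.mp4
)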
By the way, I'm using the big_buck_bunny video.
The probable difference is due to the copy codec. In the latter case you tell ffmpeg to copy the segments, but it can't do that exactly at your input times.
It first has to find the previous I-frame (a frame that can be decoded without reference to any previous frame) and start from there.
To get what you need, you have to either re-encode the video (as you did in the two former examples) or change the cut times so they land on I-frames.
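To illustrate the difference (a sketch using the first two times from your list; file names are placeholders):
# stream copy: the cut snaps back to the previous I-frame, so the segment can come out longer than requested
ffmpeg -ss 15.75 -i input.mp4 -t 7.2917 -c copy segment_copy.mp4
# re-encode: frame-accurate, at the cost of CPU time
ffmpeg -ss 15.75 -i input.mp4 -t 7.2917 -c:v libx265 -c:a aac segment_exact.mp4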
To check that I'm getting your issue correctly:
You have a source video (encoded at a variable frame rate, close to 18 fps).
You want to split the source video via ffmpeg, forcing the frame rate to 24 fps.
Then you want to concat the segments.
I think the issue is mainly that you have some discrepancy in the timing (if I divide the frame indexes by the times you've given, I get between 16 fps and 18 fps). When you convert them in step 2, the output video segments will be 24 fps. ffmpeg does not resample on the time axis, so if you force a video rate, the video will speed up or slow down.
There is also the issue of stream consistency:
Typically, a video stream must start with an I-frame, so when splitting with the copy codec, ffmpeg has to locate the previous I-frame, and this changes the duration of the segment.
When concatenating you can also hit a consistency issue: if the segment you are concatenating ends with an I-frame and the next one starts with an I-frame, it's possible ffmpeg drops either one (although I don't remember what the current behavior is).
So, to solve your issue, if I were you I would avoid step 2 (it's bad for quality anyway). That is, I would use ffmpeg to split the segments of interest based on frame numbers (the only value that isn't approximate in your scheme) into png or ppm frames (or into a pipe if you don't care about keeping them), and then concat all the frames by encoding them in a last step with the expected rate set to totalVideoTime / totalFrameCount.
You'll get a smaller and higher-quality final video.
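A sketch of that pipeline for a single scene (frame numbers taken from your first list; the 17.5 fps figure is a placeholder for totalVideoTime / totalFrameCount):
# extract exactly frames 285..377 as images, without duplicating frames
ffmpeg -i input.mp4 -vf "select=between(n\,285\,377)" -vsync 0 scene2_%05d.png
# encode the images in one pass at the true average frame rate
ffmpeg -framerate 17.5 -i scene2_%05d.png -c:v libx265 scene2.mp4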
If you can't do what I said for whatever reason, then at least for the concat input you should use the ffconcat format:
ffconcat version 1.0
file segment1
duration 12.2
file segment2
duration 10.3
This gives you the expected total duration by cutting each segment if it's longer.
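It is consumed by the same concat demuxer as before, e.g.:
ffmpeg -f concat -i list.ffconcat -c copy big_buck_bunny.mp4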
For selecting by frame number (instead of by time, as time is hard to get right on variable-frame-rate video), you should use the select filter, like this:
-vf "select=between(n\,start_frame_num\,end_frame_num),setpts=PTS-STARTPTS"
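Slotted into a full command for one scene, that might look like this (a sketch; video only, since the audio would need a matching aselect/atrim):
ffmpeg -i "$video" -vf "select=between(n\,285\,377),setpts=PTS-STARTPTS" -an -c:v libx265 "output$count.mp4"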
I suggest checking the input and output frame rates and making sure they match; a mismatch could be another source of the discrepancy.
I'm trying to do a live restream of an RTSP feed from a webcam using ffmpeg, but the stream repeatedly stops with the error:
"No more output streams to write to, finishing."
The problem seems to get worse at higher bitrates (256 kbps is mostly reliable) and is pretty random in its occurrence. At 1 Mbps, sometimes the stream will run for several hours without any trouble; on other occasions it will fail every few minutes. I've got a cron job running which restarts the stream automatically when it fails, but I would prefer to avoid the continued interruptions.
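For reference, the restart logic amounts to a minimal watchdog loop like this sketch (CAMERA_URL is a placeholder):
while true; do
  ffmpeg -rtsp_transport tcp -i "$CAMERA_URL" -acodec copy -vcodec copy streaming.m3u8
  sleep 5  # brief pause so a dead camera doesn't cause a tight restart loop
done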
I have seen this problem reported in a handful of other forums, so it is not unique, but none of those reports had a solution attached. My ffmpeg command looks like this:
ffmpeg -loglevel verbose -r 25 -rtsp_transport tcp -i rtsp://user:password#camera.url/live/ch0 -reset_timestamps 1 -movflags frag_keyframe+empty_moov -bufsize 7168k -stimeout 60000 -hls_flags temp_file -hls_time 5 -hls_wrap 180 -acodec copy -vcodec copy streaming.m3u8 > encode.log 2>&1
What gets me is that the error makes no sense: this is a live stream, so output is always wanted until I shut the stream off. Having it shut down because output isn't wanted is downright odd; if ffmpeg were complaining about a problem with the input, it would make more sense.
I'm running version 3.3.4, which I believe is the latest.
Update 13 Oct 2017:
After extensive testing I've established that the "No more outputs" error message generated by ffmpeg is very misleading. The error seems to be generated if the data coming in over RTSP is delayed, e.g. by other activity on the router the camera is connected through. I've got a large buffer and timeout set which should be sufficient for 60 seconds, but I can still deliberately trigger this error with far shorter interruptions, so clearly the buffer and timeout aren't having the desired effect. This might be fixed by setting a QoS policy on the router and checking that the TCP packets from the camera have a suitably high priority set; it's possible this isn't the case.
However, I would still like to improve the robustness of the input stream if it is briefly interrupted. Is there any way to persuade FFMPEG to tolerate this or to actually make use of the buffer it seems to be ignoring? Can FFMPEG be persuaded to simply stop writing output and wait for input to become available rather than bailing out? Or could I get FFMPEG to duplicate the last complete frame until it's able to get more data? I can live with the stream stuttering a bit, but I've got to significantly reduce the current behaviour where the stream drops at the slightest hint of a problem.
Further update 13 Oct 2017:
After more tests, I've found that the problem actually seems to be that HLS is incapable of coping with a discontinuity in the incoming video stream. If I deliberately cut the network connection between the camera and FFMPEG, FFMPEG will wait for the connection to be re-established for quite a long time. If the interruption was long (>10 seconds) the stream will immediately drop with the "No More Outputs" error the instant that the connection is re-established. If the interruption is short, then RTSP will actually start pulling data from the camera again, but the stream will then drop with the same error a few seconds later. So it seems clear that the gap in the input data is causing the HLS encoder to have a fit and give up once the stream is resumed, but the size of the gap has an impact on whether the drop is instant or not.
I had a similar problem. In my case the stream stopped without any errors after a few minutes. I fixed this by switching from FreeBSD to Linux. Maybe the problem is bad package dependencies or the ffmpeg version, so my suggestion is to try an older or newer version of ffmpeg, or another OS.
Update: actually this doesn't solve the problem. I've tested a bit more and the stream stopped after 15 minutes.
I've been facing the same problem. After extended trial and error, I found that the problem resided in my CCTV camera's parameters. More exactly, I adjusted the key frame interval parameter to match the frame rate of the recording camera.
My syntax (Windows):
SET cam1_rtsp="rtsp://192.168.0.93:554/11?timeout=30000000&listen_timeout=30000000&recv_buffer_size=30000000"
ffmpeg -rtsp_transport tcp -vsync -1 -re -i %cam1_rtsp% -vcodec copy -af apad -shortest -async 1 -strftime 1 -reset_timestamps 1 -metadata title=\\"Cam\\" -map 0 -f segment -segment_time 300 -segment_atclocktime 1 -segment_format mp4 CCTV\%%Y-%%m-%%d_%%H-%%M-%%S.mp4 -loglevel verbose
After this correction I got a 120-hour smooth input stream with no errors.
Hope this helps someone.
Can someone tell me what server-side technology (perhaps ffmpeg) one could use in order to:
1) display this full-screen live-streaming video:
http://aolhdshls-lh.akamaihd.net/i/gould_1#134793/master.m3u8
2) overlay it in the lower-right corner with a live video coming from a webRTC video-chat stream?
3) and send that combined stream out as a new m3u8 live stream?
4) Note that it needs to be a server-side solution: I cannot launch multiple video players in this case (the resulting stream has to be passed to smart TVs, which only have one video decoder at a time).
The closest example I've found so far is this article:
https://trac.ffmpeg.org/wiki/Create%20a%20mosaic%20out%20of%20several%20input%20videos
Which isn't really live, nor is it really doing overlays.
Any advice is greatly appreciated.
Let me clarify what you want in this case:
Input video is HLS streaming plus a webRTC feed: what about delay? Is delay an important factor in your work?
Overlaying an image onto video: this requires decoding the input video, filtering it, and encoding it again, so it needs a lot of CPU resources, and even more if the input video is 1080p.
Re-structuring into a new HLS format: you must set quite a few encoding options to make sure the ts fragments work well. The most important things are the GOP size and the ts duration.
You need a web server to serve the m3u8 index file; you can use nginx or Apache.
What I give you in this answer is an ffmpeg command line which overlays an image onto the input HLS stream and re-makes the ts segments.
The following command line will do what you want in steps 1 to 3:
ffmpeg \
-re -i "http://aolhdshls-lh.akamaihd.net/i/gould_1#134793/master.m3u8" \
-i "[OVERLAY_IMAGE].png" \
-filter_complex "[0:v][1:v]overlay=main_w-overlay_w:main_h-overlay_h[output]" \
-map "[output]" -map 0:a -c:v libx264 -c:a aac -strict -2 \
-f ssegment -segment_list out.list out%03d.ts
This is a basic command line that overlays an image onto your input HLS stream, then creates ts segments and an index file.
I don't have much more experience with HLS, so it may work without any tuning options, but you will probably have to tune it for your work. You should also search a little for a web server to serve the m3u8, but that won't be hard.
The GOP size (-g) and the segment duration (segment_time), as I said, will be the key points of your tuning.
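As a starting point for that tuning (a sketch assuming 25 fps input; the input/overlay names are placeholders): fix the GOP length and make the segment duration a whole number of GOPs, so every segment starts on a keyframe:
ffmpeg -re -i "input.m3u8" -i overlay.png \
-filter_complex "[0:v][1:v]overlay=main_w-overlay_w:main_h-overlay_h[output]" \
-map "[output]" -map 0:a -c:v libx264 -g 125 -keyint_min 125 -sc_threshold 0 \
-c:a aac -strict -2 -f ssegment -segment_time 5 -segment_list out.list out%03d.ts
Here -g 125 = 25 fps x 5 s, and -sc_threshold 0 stops x264 from inserting extra keyframes on scene cuts.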
So far we have done the following:
We have a video chat client which has a set of 9 video streams (users) with the h.264 codec, using Adobe FMS. Using ffmpeg, we are able to combine these streams into one stream with the overlay (video) and amix (audio) filters, and we are able to send the single combined stream to a live streaming service. The stream of the active speaker is shown at a bigger size using ffmpeg's scale filter.
Code as follows:
ffmpeg -i "rtmp://localhost/live/mystream" -i "rtmp://localhost/live/mystream2 " -i "rtmp://localhost/live/mystream3 "-filter_complex"nullsrc=size=300x300 [b1];[0:v] setpts=PTS-STARTPTS,scale=100x100 [s1];[1:v] setpts=PTS-STARTPTS,scale=200x200 [s2];[2:v]setpts=PTS-STARTPTS,scale=100x100 [s3];[b1][s1] overlay=shortest=1 [b1+s1];[b1+s1][s2] overlay=shortest=1 [b1+s2];
[b1+s2][s3] overlay=shortest=1:x=100" out.mp4
We need help with the following 2 major issues; any help would be appreciated.
Whenever the active speaker changes, the stream of that user should be shown at the bigger size. Is this possible without restarting the ffmpeg process? (See the sketch below for one direction we're exploring.)
Right now, if one of the 9 streams stops, the whole ffmpeg process crashes.
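For the first issue, one direction we're exploring (untested; it requires an ffmpeg build with --enable-libzmq) is appending the zmq filter to the graph, since the overlay filter accepts runtime x and y commands:
ffmpeg -i "rtmp://localhost/live/mystream" \
  -i "rtmp://localhost/live/mystream2" \
  -i "rtmp://localhost/live/mystream3" \
  -filter_complex "nullsrc=size=300x300 [b1]; \
    [0:v] setpts=PTS-STARTPTS, scale=100x100 [s1]; \
    [1:v] setpts=PTS-STARTPTS, scale=200x200 [s2]; \
    [2:v] setpts=PTS-STARTPTS, scale=100x100 [s3]; \
    [b1][s1] overlay=shortest=1 [o1]; \
    [o1][s2] overlay=shortest=1 [o2]; \
    [o2][s3] overlay=shortest=1:x=100, zmq" \
  out.mp4
# from another shell, move the last overlay at runtime using ffmpeg's tools/zmqsend
# (the exact Parsed_overlay_N name depends on the filter's position in the graph):
echo "Parsed_overlay_9 x 0" | ./zmqsend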