WebRTC Stream Freezes When Picture Complexity Increases

WebRTC Stream Freezes When Picture Complexity Increases - ffmpeg

I am developing an application that uses WebRTC to display a live video stream being captured from a V4L2 source. The stream originates from a Linux box that has a DVI-USB capture card, is encoded to H264 by ffmpeg and sent to RTP, received by a Janus WebRTC server which is accessed by the web interface.
Here is my current ffmpeg command - pretty simple:
ffmpeg -f v4l2 -i /dev/video0 -vf "transpose=1,scale=768:1024" -vcodec libx264 -profile:v baseline -pix_fmt yuv420p -f rtp rtp://10.116.80.86:8004
I can't go into details, but the DVI source generates a portrait 768x1024 image that initially is a simple image where the only movement is a small clock near the center that increments every second. At this stage, everything appears to work great. The image is high-quality and continuous/smooth in the browser.
Once I interact with the DVI source, a more complex image is generated, with some text/lines in the upper half. Still not very complex - only 2 colors involved and some basic 1px line shapes, and only the little clock is moving. At this point, the video starts to freeze frequently, and only updates once in a while for a few seconds. Bandwidth should not be an issue here, and the bitrate appears to stay high. However, many fewer frames are decoded.
I have also tried scaling the video down to 480x640 from 768x1024 and with that change the issue does not occur. However, I really need the full resolution and, again, there should not be a bandwidth issue here.
I have also tried capturing the output of ffmpeg to a file rather than streaming to RTP and in the file everything is good.
Here is a screenshot of the WebRTC internals (in Edge) for this stream. You can clearly see when the video image changes from the simple clock to including more shapes & text (nothing is changed here other than the image from the DVI source):
In Firefox, the video just freezes whenever frames are not decoded. In Edge, the video goes black after a moment with no frames decoded.
Any ideas as to what might be causing this?

Answering my own question for future Googlers:
I ended up figuring out that this was due to the WebRTC server (Janus) running on a Raspberry Pi. Apparently the Pi 3B+ was powerful enough to handle the packet flow when the bitrate was low (just the clock), but when the rate got higher it would choke.
I re-hosted Janus on a more powerful server and all is working well.

Related

Kurento video recordings have very low bitrate, seems to be hard capped at 300Kbps

We have a WebRTC application that records video server side using Kurento Media Server. The stream runs through Kurento and it is simultaneously recorded to WEBM VP9.
The quality of the video during the call is 1280x720 as expected. The quality of the recording is usually lower than 640x480 - the resolution seems to change during the recording, but is maintains constant bitrate around 300Kbps.
The resulting recording is extremely low quality.
We tried to affect the bitrates by changing the bitrates on RecorderEndpoint
recorder.setMaxOutputBitrate(0);
recorder.setMinOutputBitrate(1000000);
We also tried to set bitrates on WebRTCEndpoint
webRtcEndpoint.setMaxVideoRecvBandwidth(0);
webRtcEndpoint.setMaxVideoSendBandwidth(0);
webRtcEndpoint.setMinVideoRecvBandwidth(750);
webRtcEndpoint.setMinVideoSendBandwidth(750);
webRtcEndpoint.setMaxOutputBitrate(0);
webRtcEndpoint.setMinOutputBitrate(1000000);
Setters are called before the recording starts.
This should set the video bitrate for recording to 1Mbps, but it does not have any effect. It seems to be a common problem, but there seem to be no solution other than setting the minimum output bitrate which does not work for us.
How can we increase the bitrate of the recording?

Why does it take forever just to add audio to an mp4?

I am currently using Kdenlive, but have also used ffmpeg when I have the simple task of adding audio to a video that does not yet have audio. Since it is just a matter of putting the video file together with the audio, it seems like it ought to be simple. Is there something about encoding mp4's that means it must take a lot of processing to complete?
I have good hardware (i7 6700k and gtx 1080), but kdenlive currently estimates 2.5 hours to complete adding audio to a 10 minute video.

Without more info (encoder, settings, video width x height, instructions to duplicate the behavior, etc) we can only guess. It's probably re-encoding the video instead of only muxing it. Encoding is CPU intensive and takes a long time. Although 2.5 hours for 10 minutes seems excessive, but there is not enough info in the question to say why it takes this long.
If you want to add audio with ffmpeg see How to add a new audio into a video using ffmpeg? This will allow you to mux the video (and optionally the audio) without encoding it: like a copy and paste.

How to extract motion vectors from h264 without a full decode on the CPU

I'm trying to use my nose as a pointing device. The plan is to encode the video stream from a webcam pointed at my face as h264 or the like, get the motion vectors, cook the numbers a bit and chuck them into /dev/uinput to make the mouse pointer move about. The uinput bit was easy.
This has to work with zero discernable latency. This, for instance:
#!/bin/bash
[ -p pipe.mkv ] || mkfifo pipe.mkv
ffmpeg -y -rtbufsize 1M -s 640x360 -vcodec mjpeg -i /dev/video0 -c h264_nvenc pipe.mkv &
ffplay -flags2 +export_mvs -vf codecview=mv=pf+bf+bb pipe.mkv
shows that the vectors are there but with a latency of several seconds which is unusable in a mouse. I know that the first ffmpeg step is working very fast by using the GPU, so either the pipe or the h264 decode in the second step is introducing the latency.
I tried MV Tractus (same as mpegflow I think) in a similar pipe arrangement and it was also very slow. They do a full h264 decode on the CPU and I think that's the problem cos I can see them imposing a lot of load on one CPU. If the pipe had caused the delay by buffering badly then the CPU wouldn't have been loaded. I guess ffplay also did the decoding on the CPU and I couldn't persuade it not to, but it only wants to draw arrows which are no use to me.
I think there are several approaches, and I'd like advice on which would be best, or if there's something even better I don't know about. I could:
Decode in hardware and get the motion vectors. So far this has failed. I tried combining ffmpeg's extract_mvs.c and hw_decode.c samples but no motion vectors turn up. vdpau is the only decoder I got working on my linux box. I have a nvidia gpu.
Do a minimal parse of the h264 to fish out the motion vectors only, ignoring all the other data. I think this would mean putting some kind of "motion only" option in libav's parser, but I'm not at all familiar with that code.
Find some other h264 parsing library that has said option and also unpacks the container.
Forget about hardware accelerated encoding and use a stripped down encoder to make only the motion vectors on either CPU or GPU. I suspect this would be slow cos I think calculating the motion vectors is the hardest part of the algorithm.
I'm tending towards the second option but I need some help figuring out where in the libav code to do it.

Very interesting project! I'm no ffmpeg expert, but it looks to me like your ffmpeg command is decoding the mjpeg output of your /dev/video0 and then ENCODING it into h.264 to get the motion vectors. That h.264 encoding step is computationally intensive and is likely causing your latency. Some things you can do to speed it up are (a) use a webcam that outputs h.264 instead of mjpeg; (b) run the h.264 encode on faster hardware and (c) use ffmpeg to lower the resolution of your video stream before encoding it. For example, you could define a small "hot region" in the video camera where the motions of your nose can control the mouse.

ffmpeg drop frames on purpose to lower filesize

Our security system records and archives our IP cameras streams with ffmpeg -use_wallclock_as_timestamps 1 -i rtsp://192.168.x.x:554/mpeg4 -c copy -t 60 my_input_video.avi
I run it with crontab every minute so it creates videos of 60 seconds (~15Mb) for each camera every minute. When an intrusion occurs, the camera sends a picture through FTP and a script called by incrontab:
1- forwards immediately the picture by email
2- selects the video covering the minute the intrusion occured, compress it with h264 (to ~2,6Mb) and sends it by email
It is working really well but if a thief crosses the path of various cameras, the connection to the SMTP server is not fast enough so video emails are delayed. I'd like to compress the videos even more to avoid that. I could lower the resolution (640x480 to 320x240 for example) but sometimes 640x480 is handy to zoom on something which looks to be moving...
So my idea is to drop frames in the video in order to lower the filesize. I don't care if the thief is walking like a "stop motion Lego" on the video, the most important is I know there is someone so I can act.
mediainfo my_input_video.avi says Frame rate = 600.000 fps but it is of course wrong. FPS sent by IP cameras are always false because it varies with the network quality; this is why i use "-use_wallclock_as_timestamps 1" in my command to record the streams.
with ffmpeg -i my_input_video.avi -vcodec h264 -preset ultrafast -crf 28 -acodec mp3 -q:a 5 -r 8 output.avi the video is OK but filesize is higher (3Mb)
with ffmpeg -i my_input_video.avi -vcodec h264 -preset ultrafast -crf 28 -acodec mp3 -q:a 5 -r 2 output.avi the filesize is lower (2,2Mb) but the video doesn't work (it is blocked at the first frame).
Creating a mjpeg video (mjpeg = not interlaced frames) in the middle of the process (first exporting to mjpeg with less frames and then exporting to h264) creates same results.
Do you know how I can get my thief to walk like a "stop motion Lego" to lower the filesize to a minimum?
Thanks for any help

What are your constraints file size wise? 2.6MB for 60 seconds of video seems pretty reasonable to me, thats about 350kbps, which is pretty low for video quality.
You need to specify the video bitrate -b:v 125000 (125kbps, should drop you to about 900kb) to control the bitrate/s you want the video encoded at. Your not giving FFMpeg enough hints as to how you want the video handled, so its picking arbitrary values you don't like. As you drop the frame rate, its just using up the buffers allocating more bits to each frame. (one big thing you need to keep in mind with this is, as you stretch the video out over a longer time period the more likely the scene will change significantly require an I frame (full encoded frame vs frame based on previous frame) so reducing the frame rate will help some, but may not help as much as you'd think).
Your "(it is blocked at the first frame)." is most likely an issue with you trying to start decoding a stream when it is not at an I frame and not an issue with your settings.

What movie formats and resolutions should be generated to ensure cross-browser/platform compatibility?

I'm looking to generate web videos from movies taken with my digital camera. What formats should I generate, and at what resolution and bitrate to ensure playback on mobile and desktop devices?
Here's what I was thinking:
Input format: AVI, MOV
Output format: webm, ogv, mp4
Output resolutions: 1080p, 720p, 320p

Not really a programming question but I will answer it anyways:
WebM can be ditched completely. Very few devices support it. mp4 is the most common format that all devices support. Low end phones support 3gpp format instead [cousin of mp4]. If you have it you should be fine for 90% of the devices.
mp4 with h.264/aac is the most common and for devices that don't support those mpeg4 with mp3 will suffice.
How many devices do you have are 1080p resolution. Better to ditch the 1080p and get one SD resolution 480p in there.
Bitrates depend on the encoding profile and content. Just ensure do two pass encoding using ffmpeg and libx264 to get the best quality.

Most mobile devices can display "HD" content fairly well, these days. However, if you're looking to save on bandwidth on peoples data plans, a good resolution would probably be 852x480.
now, depending on if you need near lossless quality, or if you can accept minor artifacts in your video will determine your bitrate. for 1080p and x264, you can get near lossless with about 15mbps, but you could have watchable video with 10-11mbps. im not sure how well the other codecs compare, so you may have to try a couple test runs with a short video.
if you do 720p, you can most certainly get away with 4-6mbps.
with 852x480, you may be successful with as low as 1.5-2mbps.
480x320, or maybe even 320x240 may be a good option, if you suspect people will be watching this on lower end devices or on really slow connection, or very limited bandwidth. you could probably get away with 500kbps for 320x240, and 1mbps for 480x320.
these are all starting points, as each codec and selected encoding options will increase/decrease the quality. but i believe these to be good starting points.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio