Implementing custom H.264 quantization for FFmpeg? - ffmpeg

I have a Raspberry Pi, and I'm livestreaming using FFmpeg. Unfortunately, my Wi-Fi signal varies over the course of my stream. I'm currently using raspivid to send H.264-encoded video to the stream. I have set a constant resolution and FPS but have set neither bitrate nor quantization, so both are variable.
However, the quantization doesn't vary enough for my needs. If my Wi-Fi signal drops, ffmpeg's streaming speed dips below 1.0x to roughly 0.95x for minutes at a time, but my bitrate drops so slowly that ffmpeg never makes it back to 1.0x. As a result, my stream falls behind and starts buffering.
I would like the following to happen:
If the speed reported by FFmpeg (my stream command) goes below 1.0x (slower than real-time streaming), then increase quantization (more compression, lower bitrate) exponentially until the FFmpeg speed stabilizes at 1.0x. Prioritize getting back to 1.0x as quickly as possible.
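To make the desired behaviour concrete, here is a minimal Python sketch of that control rule, not a working encoder patch. It assumes the `speed=...x` field that ffmpeg prints on its status line, and it expresses "more quantization" as a lower target bitrate. The function names, the halving factor, the 5% recovery step, and the bitrate limits are all hypothetical values chosen for illustration; how the new target would actually be pushed into the encoder is left open.

```python
import re

# Matches the "speed=0.953x" field on an ffmpeg status line.
SPEED_RE = re.compile(r"speed=\s*([0-9]*\.?[0-9]+)x")


def parse_speed(status_line):
    """Return the speed multiplier from a status line, or None if absent (e.g. "N/A")."""
    match = SPEED_RE.search(status_line)
    return float(match.group(1)) if match else None


def next_bitrate(current_bps, speed, floor_bps=250_000, ceiling_bps=4_000_000,
                 backoff=0.5, recovery=1.05):
    """Pick the next target bitrate (hypothetical policy).

    While the stream is slower than real time, multiply the target by
    `backoff` on every step, which gives an exponential drop. Once the
    speed is back at 1.0x or above, creep upward slowly.
    """
    factor = backoff if speed < 1.0 else recovery
    target = current_bps * factor
    # Clamp so the stream never collapses to nothing or overshoots the link.
    return int(min(ceiling_bps, max(floor_bps, target)))
```

Calling `next_bitrate` once per status update while the speed stays under 1.0x would take a 2 Mbps target to 1 Mbps, then 500 kbps, then the 250 kbps floor, which is the "stabilize fast, recover slowly" shape described above.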
My understanding is that the quantization logic FFmpeg is using should be in the H.264 encoder, but I can't find any mention of quantization at all in this GitHub repository: https://github.com/cisco/openh264
My knowledge of H.264 is almost zilch, so I'm trying to figure out:
A) How does h264 currently vary the quantization during my stream, if at all?
B) Where is that code?
C) How hard is it for me to implement what I'm describing?
Thanks in advance!!

Related

Kurento video recordings have very low bitrate, seems to be hard capped at 300Kbps

We have a WebRTC application that records video server side using Kurento Media Server. The stream runs through Kurento and it is simultaneously recorded to WEBM VP9.
The quality of the video during the call is 1280x720 as expected. The quality of the recording is usually lower than 640x480. The resolution seems to change during the recording, but it maintains a constant bitrate of around 300Kbps.
The resulting recording is extremely low quality.
We tried to affect the bitrates by changing the bitrates on RecorderEndpoint
recorder.setMaxOutputBitrate(0);
recorder.setMinOutputBitrate(1000000);
We also tried to set bitrates on WebRTCEndpoint
webRtcEndpoint.setMaxVideoRecvBandwidth(0);
webRtcEndpoint.setMaxVideoSendBandwidth(0);
webRtcEndpoint.setMinVideoRecvBandwidth(750);
webRtcEndpoint.setMinVideoSendBandwidth(750);
webRtcEndpoint.setMaxOutputBitrate(0);
webRtcEndpoint.setMinOutputBitrate(1000000);
Setters are called before the recording starts.
This should set the video bitrate for recording to 1Mbps, but it does not have any effect. It seems to be a common problem, but there seems to be no solution other than setting the minimum output bitrate, which does not work for us.
How can we increase the bitrate of the recording?

What is the fastest ffmpeg video codec for decoding?

I am using ffmpeg on Linux to transcode video files. The files are video from a race car camera. They have been downloaded from YouTube in "webm" format. I want to compare two of the videos, side-by-side, using GridPlayer, which uses VLC as its underlying video processor. GridPlayer has very nice frame-by-frame controls, but they are very slow. What video codec should I use to impose the least decoding overhead on VLC/GridPlayer for smoother playback?
I have tried re-encoding as h264, 1920x1080, 30 fps, in mp4 container. I have since discovered a "-tune fastdecode" option that seems to be helpful, along with resizing to 854x480. Any other suggestions?

WebRTC Stream Freezes When Picture Complexity Increases

I am developing an application that uses WebRTC to display a live video stream being captured from a V4L2 source. The stream originates from a Linux box that has a DVI-USB capture card, is encoded to H264 by ffmpeg and sent to RTP, received by a Janus WebRTC server which is accessed by the web interface.
Here is my current ffmpeg command - pretty simple:
ffmpeg -f v4l2 -i /dev/video0 -vf "transpose=1,scale=768:1024" -vcodec libx264 -profile:v baseline -pix_fmt yuv420p -f rtp rtp://10.116.80.86:8004
I can't go into details, but the DVI source generates a portrait 768x1024 image that initially is a simple image where the only movement is a small clock near the center that increments every second. At this stage, everything appears to work great. The image is high-quality and continuous/smooth in the browser.
Once I interact with the DVI source, a more complex image is generated, with some text/lines in the upper half. It is still not very complex: only 2 colors are involved, with some basic 1px line shapes, and only the little clock is moving. At this point, the video starts to freeze frequently, and only updates once in a while for a few seconds. Bandwidth should not be an issue here, and the bitrate appears to stay high. However, far fewer frames are decoded.
I have also tried scaling the video down to 480x640 from 768x1024 and with that change the issue does not occur. However, I really need the full resolution and, again, there should not be a bandwidth issue here.
I have also tried capturing the output of ffmpeg to a file rather than streaming to RTP and in the file everything is good.
Here is a screenshot of the WebRTC internals (in Edge) for this stream. You can clearly see when the video image changes from the simple clock to including more shapes & text (nothing is changed here other than the image from the DVI source):
In Firefox, the video just freezes whenever frames are not decoded. In Edge, the video goes black after a moment with no frames decoded.
Any ideas as to what might be causing this?
Answering my own question for future Googlers:
I ended up figuring out that this was due to the WebRTC server (Janus) running on a Raspberry Pi. Apparently the Pi 3B+ was powerful enough to handle the packet flow when the bitrate was low (just the clock), but when the rate got higher it would choke.
I re-hosted Janus on a more powerful server and all is working well.

How to extract motion vectors from h264 without a full decode on the CPU

I'm trying to use my nose as a pointing device. The plan is to encode the video stream from a webcam pointed at my face as h264 or the like, get the motion vectors, cook the numbers a bit and chuck them into /dev/uinput to make the mouse pointer move about. The uinput bit was easy.
This has to work with zero discernible latency. This, for instance:
#!/bin/bash
[ -p pipe.mkv ] || mkfifo pipe.mkv
ffmpeg -y -rtbufsize 1M -s 640x360 -vcodec mjpeg -i /dev/video0 -c h264_nvenc pipe.mkv &
ffplay -flags2 +export_mvs -vf codecview=mv=pf+bf+bb pipe.mkv
shows that the vectors are there, but with a latency of several seconds, which is unusable for a mouse. I know that the first ffmpeg step is working very fast by using the GPU, so either the pipe or the h264 decode in the second step is introducing the latency.
I tried MV Tractus (same as mpegflow I think) in a similar pipe arrangement and it was also very slow. They do a full h264 decode on the CPU and I think that's the problem cos I can see them imposing a lot of load on one CPU. If the pipe had caused the delay by buffering badly then the CPU wouldn't have been loaded. I guess ffplay also did the decoding on the CPU and I couldn't persuade it not to, but it only wants to draw arrows which are no use to me.
I think there are several approaches, and I'd like advice on which would be best, or if there's something even better I don't know about. I could:
1) Decode in hardware and get the motion vectors. So far this has failed. I tried combining ffmpeg's extract_mvs.c and hw_decode.c samples but no motion vectors turn up. vdpau is the only decoder I got working on my Linux box. I have an NVIDIA GPU.
2) Do a minimal parse of the h264 to fish out the motion vectors only, ignoring all the other data. I think this would mean putting some kind of "motion only" option in libav's parser, but I'm not at all familiar with that code.
3) Find some other h264 parsing library that has said option and also unpacks the container.
4) Forget about hardware accelerated encoding and use a stripped down encoder to make only the motion vectors on either CPU or GPU. I suspect this would be slow cos I think calculating the motion vectors is the hardest part of the algorithm.
I'm tending towards the second option but I need some help figuring out where in the libav code to do it.
Very interesting project! I'm no ffmpeg expert, but it looks to me like your ffmpeg command is decoding the mjpeg output of your /dev/video0 and then ENCODING it into h.264 to get the motion vectors. That h.264 encoding step is computationally intensive and is likely causing your latency. Some things you can do to speed it up are: (a) use a webcam that outputs h.264 instead of mjpeg; (b) run the h.264 encode on faster hardware; and (c) use ffmpeg to reduce the size of your video stream before encoding it. For example, you could define a small "hot region" in the camera image where the motions of your nose can control the mouse.
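As a rough illustration of the "hot region" idea, the sketch below builds the argument for ffmpeg's `crop` filter (`crop=w:h:x:y`) for a region centered in the frame. The helper name and the example sizes are hypothetical, and centering the region is just an assumption; in practice the region would be placed wherever the nose actually sits in the picture.

```python
def hot_region_filter(frame_w, frame_h, region_w, region_h):
    """Build an ffmpeg crop filter string for a region centered in the frame."""
    if region_w > frame_w or region_h > frame_h:
        raise ValueError("hot region must fit inside the frame")
    # Top-left corner that centers the region.
    x = (frame_w - region_w) // 2
    y = (frame_h - region_h) // 2
    return f"crop={region_w}:{region_h}:{x}:{y}"
```

For a 640x360 capture and a 160x120 region this returns `crop=160:120:240:120`, which could be passed to ffmpeg with `-vf` so that only the small region reaches the encoder.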

Video Slideshow from png files + mp3 audio

I have a bunch of .png frames and a .mp3 audio file which I would like to convert into a video. Unfortunately, the frames do not correspond to a constant frame rate. For instance, one frame may need to be displayed for 1 second, whereas another may need to be displayed for 3 seconds.
Is there any open-source software (something like ffmpeg) which would help me accomplish this? Any feedback would be greatly appreciated.
Many thanks!
This is not an elegant solution, but it will do the trick: duplicate frames as necessary so that you end up with a fairly high constant framerate, such as 30 or 60 fps (or higher if you need finer time resolution). You simply switch to the next source frame at the output frame closest to the exact timestamp you want. Frames which are exact duplicates will be encoded to a tiny size (a few bytes) with any decent codec, so this is fairly compact. Then just encode with ffmpeg as usual.
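A minimal sketch of that duplication step, assuming you already have a list of how long each frame should stay on screen. The function name and the 30 fps default are placeholders for illustration; it only computes how many copies of each frame to emit and leaves the actual file copying and encoding to you.

```python
def duplication_schedule(durations, fps=30):
    """Return how many times each source frame should be repeated.

    `durations` holds the display time of each frame in seconds. Each
    switch to the next frame happens at the output frame closest to the
    exact timestamp, so rounding errors do not accumulate.
    """
    counts = []
    elapsed = 0.0   # exact end time of the current frame, in seconds
    emitted = 0     # output frames written so far
    for duration in durations:
        elapsed += duration
        end_frame = round(elapsed * fps)
        counts.append(end_frame - emitted)
        emitted = end_frame
    return counts
```

For the example in the question, `duplication_schedule([1, 3])` gives `[30, 90]`: 30 copies of the first frame and 90 of the second. Writing the copies out as a numbered image sequence then lets ffmpeg read them at a constant frame rate alongside the mp3.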
If you have a whole lot of these and need to do it the "right" way: you can indicate the timing either in the container (such as mp4, mkv, etc.) or in the codec. For example, in an H.264 stream you will have to insert SEI messages of type pic_timing to specify the timing of each frame. Alternatively, you will have to write your own muxer relying on a container library such as libmatroska (mkv) or GPAC (mp4) to indicate the timing in the container. Note that not all codecs/containers support arbitrarily variable frame rate. Only a few codecs support timing in the codec. Also, if timing is specified in both container and codec, the container timing is used (but if you are muxing a stream into a container, the muxer should pick up the individual frame timestamps from the codec).

Resources