Slow frame rate for gstreamer stream - performance

I am using the following command for a gstreamer pipeline for a videostream from a webcam:
gst-launch-1.0 v4l2src device=/dev/video0 ! videorate ! video/x-raw,format=I420,width=1920,height=1080,framerate=25/1 ! xvimagesink
Unfortunately, the displayed stream has a very low frame rate; it feels like maybe 3 frames per second.
I don't really know what could be the problem here. How can I increase the performance for this video stream?
I already tried reducing the width and height values to lower the resolution, but this did not give me any noticeable improvement.
Might the format be slowing me down? It may be helpful to know that I chose I420 because it was needed for a node-webrtc implementation, where a function was seemingly only called with frames of this format.

Check your camera's capabilities first, e.g. with v4l2-ctl --list-formats-ext -d /dev/video0. It might be that I420 requires conversion. If your PC is not able to keep up with the conversion, you'll see a message like:
WARNING: from element /GstPipeline:pipeline0/GstXvImageSink:xvimagesink0: A lot of buffers are being dropped.
If so, consider using MJPG to stream at a higher framerate.
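For example, assuming v4l2-ctl reports an MJPG mode at 1080p (the caps values below are assumptions; match them to what your camera actually advertises), a pipeline along these lines requests JPEG frames and decodes them instead of pulling raw I420 from the camera:

gst-launch-1.0 v4l2src device=/dev/video0 ! image/jpeg,width=1920,height=1080,framerate=30/1 ! jpegdec ! videoconvert ! xvimagesink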

Related

WebRTC Stream Freezes When Picture Complexity Increases

I am developing an application that uses WebRTC to display a live video stream captured from a V4L2 source. The stream originates from a Linux box with a DVI-USB capture card, is encoded to H.264 by ffmpeg, and is sent over RTP to a Janus WebRTC server, which is accessed via the web interface.
Here is my current ffmpeg command - pretty simple:
ffmpeg -f v4l2 -i /dev/video0 -vf "transpose=1,scale=768:1024" -vcodec libx264 -profile:v baseline -pix_fmt yuv420p -f rtp rtp://10.116.80.86:8004
I can't go into details, but the DVI source generates a portrait 768x1024 image that initially is a simple image where the only movement is a small clock near the center that increments every second. At this stage, everything appears to work great. The image is high-quality and continuous/smooth in the browser.
Once I interact with the DVI source, a more complex image is generated, with some text/lines in the upper half. Still not very complex - only 2 colors involved and some basic 1px line shapes, and only the little clock is moving. At this point, the video starts to freeze frequently, and only updates once in a while for a few seconds. Bandwidth should not be an issue here, and the bitrate appears to stay high. However, many fewer frames are decoded.
I have also tried scaling the video down to 480x640 from 768x1024 and with that change the issue does not occur. However, I really need the full resolution and, again, there should not be a bandwidth issue here.
I have also tried capturing the output of ffmpeg to a file rather than streaming to RTP and in the file everything is good.
Here is a screenshot of the WebRTC internals (in Edge) for this stream. You can clearly see when the video image changes from the simple clock to including more shapes & text (nothing is changed here other than the image from the DVI source):
In Firefox, the video just freezes whenever frames are not decoded. In Edge, the video goes black after a moment with no frames decoded.
Any ideas as to what might be causing this?
Answering my own question for future Googlers:
I ended up figuring out that this was due to the WebRTC server (Janus) running on a Raspberry Pi. Apparently the Pi 3B+ was powerful enough to handle the packet flow when the bitrate was low (just the clock), but when the rate got higher it would choke.
I re-hosted Janus on a more powerful server and all is working well.

How to extract motion vectors from h264 without a full decode on the CPU

I'm trying to use my nose as a pointing device. The plan is to encode the video stream from a webcam pointed at my face as h264 or the like, get the motion vectors, cook the numbers a bit and chuck them into /dev/uinput to make the mouse pointer move about. The uinput bit was easy.
This has to work with zero discernible latency. This, for instance:
#!/bin/bash
# Create the named pipe on first run
[ -p pipe.mkv ] || mkfifo pipe.mkv
# Read MJPEG from the webcam and re-encode it to H.264 on the GPU
ffmpeg -y -rtbufsize 1M -s 640x360 -vcodec mjpeg -i /dev/video0 -c h264_nvenc pipe.mkv &
# Play the stream with motion vectors exported and drawn as arrows
ffplay -flags2 +export_mvs -vf codecview=mv=pf+bf+bb pipe.mkv
shows that the vectors are there, but with a latency of several seconds, which is unusable in a mouse. I know the first ffmpeg step is working very fast by using the GPU, so either the pipe or the h264 decode in the second step is introducing the latency.
I tried MV Tractus (same as mpegflow, I think) in a similar pipe arrangement and it was also very slow. They do a full h264 decode on the CPU, and I think that's the problem because I can see them imposing a lot of load on one CPU. If the pipe had caused the delay by buffering badly, the CPU wouldn't have been loaded. I guess ffplay also did the decoding on the CPU and I couldn't persuade it not to, but it only wants to draw arrows, which are no use to me.
I think there are several approaches, and I'd like advice on which would be best, or if there's something even better I don't know about. I could:
Decode in hardware and get the motion vectors. So far this has failed: I tried combining ffmpeg's extract_mvs.c and hw_decode.c samples, but no motion vectors turn up. vdpau is the only decoder I got working on my Linux box. I have an NVIDIA GPU.
Do a minimal parse of the h264 to fish out the motion vectors only, ignoring all the other data. I think this would mean putting some kind of "motion only" option in libav's parser, but I'm not at all familiar with that code.
Find some other h264 parsing library that has said option and also unpacks the container.
Forget about hardware-accelerated encoding and use a stripped-down encoder to produce only the motion vectors, on either the CPU or the GPU. I suspect this would be slow, because I think calculating the motion vectors is the hardest part of the algorithm.
I'm tending towards the second option but I need some help figuring out where in the libav code to do it.
Very interesting project! I'm no ffmpeg expert, but it looks to me like your ffmpeg command is decoding the mjpeg output of your /dev/video0 and then ENCODING it into h.264 to get the motion vectors. That h.264 encoding step is computationally intensive and is likely causing your latency. Some things you can do to speed it up are: (a) use a webcam that outputs h.264 instead of mjpeg; (b) run the h.264 encode on faster hardware; and (c) use ffmpeg to lower the resolution of your video stream before encoding it. For example, you could define a small "hot region" in the camera image where the motions of your nose control the mouse.
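As a rough sketch (the exact flag values are assumptions, and nvenc preset names vary across ffmpeg versions), you could request MJPG from the v4l2 device directly and tell ffplay not to buffer, which removes most of the avoidable latency in this kind of pipe:

ffmpeg -y -f v4l2 -input_format mjpeg -video_size 640x360 -i /dev/video0 -c:v h264_nvenc -g 30 pipe.mkv &
ffplay -fflags nobuffer -flags low_delay -probesize 32 -analyzeduration 0 -flags2 +export_mvs -vf codecview=mv=pf+bf+bb pipe.mkv

Whether the remaining latency is low enough for a pointing device is something only the uinput end can tell you.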

FFMPEG API -- How much do stream parameters change frame-to-frame?

I'm trying to extract raw streams from devices and files using ffmpeg. I notice the crucial frame information (Video: width, height, pixel format, color space, Audio: sample format) is stored both in the AVCodecContext and in the AVFrame. This means I can access it prior to the stream playing and I can access it for every frame.
How much do I need to account for these values changing frame-to-frame? I found https://ffmpeg.org/doxygen/trunk/demuxing__decoding_8c_source.html#l00081 which indicates that at least width, height, and pixel format may change frame to frame.
Will the color space and sample format also change frame to frame?
Will these changes be temporary (a single frame) or lasting (a significant block of frames) and is there any way to predict for this stream which behavior will occur?
Is there a way to find the most descriptive attributes that this stream is capable of producing, so that I can scale all the lower-quality frames up but not offer a result that is needlessly higher quality than the source, even if this is a device or a network stream where I cannot play all the frames in advance?
The fundamental question is: how do I resolve the flexibility of this API with the restriction that raw streams (my output) do not have any way of specifying a change of stream attributes mid-stream. I imagine I will need to either predict the most descriptive attributes to give the stream, or offer a new stream when the attributes change. Which choice to make depends on whether these values will change rapidly or stay relatively stable.
So, to add to what #szatmary says, the typical use case for stream parameter changes is adaptive streaming:
imagine you're watching youtube on a laptop with various methods of internet connectivity, and suddenly bandwidth decreases. Your stream will automatically switch to a lower bandwidth. FFmpeg (which is used by Chrome) needs to support this.
alternatively, imagine a similar scenario in a rtc video chat.
The reason FFmpeg does what it does is that the API is essentially trying to accommodate the lowest common denominator. Videos shot on a phone won't ever change resolution. Neither will most videos exported from video editing software. Even videos from youtube-dl will typically not switch resolution; that is a client-side decision, and youtube-dl simply won't do it. So what should you do? I'd just use the stream information from the first frame(s) and rescale all subsequent frames to that resolution. This will work for 99.99% of cases. Whether you want to accommodate the remaining 0.01% depends on what type of videos you expect people to upload and whether resolution changes make any sense in that context.
Does colorspace change? It could (theoretically), in software that mixes screen recordings with video fragments, but it's highly unlikely in practice. Sample format changes about as often as video resolution: quite often in the adaptive scenario, but whether you care depends on your service and the types of videos you expect to get.
Usually not often, or ever. However, this depends on the codec and on options chosen at encode time. I pass the decoded frames through swscale just in case.
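As a hedged CLI analogue of that advice (the target numbers below are placeholders, not recommendations), you can read the declared parameters up front with ffprobe and then pin every output frame to one fixed resolution, pixel format, and sample format, which is essentially what routing decoded frames through swscale does:

ffprobe -v error -select_streams v:0 -show_entries stream=width,height,pix_fmt,color_space -of default=noprint_wrappers=1 input.mkv
ffmpeg -i input.mkv -vf scale=1920:1080,format=yuv420p -ar 48000 -sample_fmt s16 output.mkv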

Video Slideshow from png files + mp3 audio

I have a bunch of .png frames and a .mp3 audio file which I would like to convert into a video. Unfortunately, the frames do not correspond to a constant frame rate. For instance, one frame may need to be displayed for 1 second, whereas another may need to be displayed for 3 seconds.
Is there any open-source software (something like ffmpeg) which would help me accomplish this? Any feedback would be greatly appreciated.
Many thanks!
This is not an elegant solution, but it will do the trick: duplicate frames as necessary so that you end up with some (fairly high) constant framerate, say 30 or 60 fps (or higher if you need finer time resolution). You simply switch to the next source frame at the constant-rate frame closest to the exact timestamp you want. Frames that are exact duplicates are encoded to a tiny size (a few bytes) by any decent codec, so the result stays fairly compact. Then just encode with ffmpeg as usual.
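A minimal sketch of that approach, assuming the duplicated frames have already been written out as a numbered sequence (img0001.png, img0002.png, ...) at the chosen constant rate:

ffmpeg -framerate 30 -i img%04d.png -i audio.mp3 -c:v libx264 -pix_fmt yuv420p -c:a aac -shortest slideshow.mp4

-shortest stops the output when the shorter of the two inputs ends, so the video and the mp3 stay in step.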
If you have a whole lot of these and need to do it the "right" way: you can indicate the timing either in the container (mp4, mkv, etc.) or in the codec. For example, in an H.264 stream you would insert SEI messages of type pic_timing to specify the timing of each frame. Alternatively, you would write your own muxing step on top of a container library such as libmatroska (mkv) or GPAC (mp4) to indicate the timing in the container. Note that not all codecs/containers support arbitrarily variable frame rates, and only a few codecs support timing in the codec. Also, if timing is specified in both the container and the codec, the container timing is used (but when muxing a stream into a container, the muxer should pick up the individual frame timestamps from the codec).
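If you stay with ffmpeg, its concat demuxer is one ready-made way to put per-frame durations into the container; the file names and durations below are placeholders, and the last file is listed twice because the demuxer otherwise ignores the final duration:

# frames.txt
file 'frame_0001.png'
duration 1
file 'frame_0002.png'
duration 3
file 'frame_0002.png'

ffmpeg -f concat -safe 0 -i frames.txt -i audio.mp3 -vsync vfr -c:v libx264 -pix_fmt yuv420p -c:a aac -shortest out.mp4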

Forcing custom H.264 intra-frames (keyframes) at encode-time?

I have a video sequence that I'd like to skip to specific frames at playback-time (my player is implemented using AVPlayer in iOS, but that's incidental). Since these frames will fall at unpredictable intervals, I can't use the standard "keyframe every N frames/seconds" functionality present in most video encoders. I do, however, know the target frames in advance.
In order to do this skipping as efficiently as possible, I need to force the target frames to be i-frames at encode time. Ideally in some kind of GUI which would let me scrub to a frame, mark it as a keyframe, and then (re)encode my video.
If such a tool isn't available, I have the feeling this could probably be done by rolling a custom encoder with libavcodec, but I'd rather use a higher-level (and preferably scriptable) tool to do the job if a GUI isn't possible. Is this the kind of task ffmpeg or mencoder can be bent to?
Does anybody have a technique for doing this? Also, it's entirely possible that this is an impossible task because of some fundamental ignorance I have of the h.264 codec. If so, please do put me right.
ffmpeg has a -force_key_frames option that accepts a series of arbitrary timestamps as well as other ways to specify the frames. From the documentation:
-force_key_frames 0:05:00,...
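For instance (the timestamps here are made up), to force I-frames at a handful of known points while leaving the rest of the GOP decisions to the encoder:

ffmpeg -i in.mp4 -force_key_frames 0:00:12.3,0:01:05,0:02:47 -c:v libx264 -c:a copy out.mp4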
Answered my own question: it's possible to set custom compression keyframes in Apple Compressor.
Compression markers are also known as manual compression markers. These are markers you can add to a Final Cut Pro sequence (or in the Compressor Preview window) to indicate when Compressor should generate an MPEG I-frame during compression.
Source.
Could you not use chapter markers to jump between sections? Not an ideal solution but a lot easier to achieve.
You can use this software:
http://www.applesolutions.com/bantha/MH.html
