ffmpeg process has GPU usage limit - ffmpeg

I'm using ffmpeg with an NVIDIA GPU for my video transcoding process, and I have one problem.
A single process uses only 263MiB of my second GPU; it does not use the GPU fully.
That is not good. I think there should be a way to remove this limitation for GPU processes.
The ffmpeg command I run is:
ffmpeg -y -loglevel info -hwaccel cuda -hwaccel_output_format cuda -hwaccel_device 1 -i "MYVIDEO" -vf scale_npp=w=426:h=240 -c:v h264_nvenc -profile:v main -b:v 400k -sc_threshold 0 -g 25 -keyint_min 25 -bf 2 -c:a aac -b:a 64k -ar 48000 -f hls -hls_time 6 -hls_playlist_type vod -hls_allow_cache 1 -hls_segment_filename f-0-seg-%d.ts f-0.m3u8

There is no limitation here, at least not one related to memory.
You are scaling the video to 426x240. Assuming 4:2:0 subsampling, that is about 153 KB per frame. The encoder needs 16 frames at most, which is a little over 2 MB. The GPU is using over 100 times that.
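The arithmetic behind that estimate can be checked with a short sketch. It assumes 8-bit samples and takes the 16-frame bound and the 263 MiB figure from the text above; the function name is made up for illustration.

```python
def yuv420_frame_bytes(width: int, height: int) -> int:
    """Bytes in one 8-bit 4:2:0 frame: a full-resolution luma plane
    plus two chroma planes at a quarter of the resolution each."""
    return width * height * 3 // 2

frame = yuv420_frame_bytes(426, 240)   # 153360 bytes, about 153 KB
frames_needed = 16                     # upper bound quoted in the answer
total = frame * frames_needed          # 2453760 bytes, a little over 2 MB
reported = 263 * 1024 * 1024           # 263 MiB reported for the process

print(frame, total, round(reported / total))
```

The last printed value is the ratio between the reported memory and the raw frame estimate, which comes out above 100. This is consistent with the point that the encoder is not starved for memory.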

Related

How to use FFmpeg with multiple GPUs

I have a problem and can't find a suitable answer for it.
It is about using multiple GPUs.
I have 3 graphics cards, as you can see here: https://i.stack.imgur.com/msR83.jpg
My problem is that when I run more than one ffmpeg command with CUDA, all processes are assigned to the first GPU, as in this image: https://i.stack.imgur.com/PfYfz.jpg
As you can see, all 6 processes are assigned to the first GPU.
I am really confused about how to fix this.
My FFmpeg command is:
ffmpeg -y -vsync 0 -hwaccel cuda -hwaccel_output_format cuda -i my-video.mp4 \
-vf scale_npp=w=426:h=240 -c:v h264_nvenc -profile:v main -b:v 400k -sc_threshold 0 -g 25 \
-c:a aac -b:a 64k -ar 48000 \
-f hls -hls_time 6 -hls_playlist_type vod \
-hls_allow_cache 1 -hls_key_info_file encription.keyinfo \
-hls_segment_filename f-0-seg-%d.ts f-0.m3u8
I run the FFmpeg command above for 6 different videos at the same time.
Please help me find an answer by sharing your knowledge or some links that could help.
Thanks a lot.
The process seems fairly straightforward from what I can tell:
https://developer.nvidia.com/blog/nvidia-ffmpeg-transcoding-guide/
Encoding and decoding work must be explicitly assigned to a GPU when using multiple GPUs in one system. GPUs are identified by their index number; by default all work is performed on the GPU with index 0. Use the following command to obtain a list of all NVIDIA GPUs in the system and their corresponding ID numbers:
ffmpeg -vsync 0 -i input.mp4 -c:v h264_nvenc -gpu list -f null -
Once you know the index, the -hwaccel_device index flag can be used to set the active GPU for decoding and encoding. In the example below the work will be executed on the gpu with index 1.
ffmpeg -vsync 0 -hwaccel cuvid -hwaccel_device 1 -c:v h264_cuvid -i input.mp4 -c:a copy -c:v h264_nvenc -b:v 5M output.mp4
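One way to spread several jobs across the cards is to hand out GPU indices in round-robin order and pass each one through -hwaccel_device. The following is a minimal sketch, not a tested production script: the input file names, the output naming, and the GPU index list [0, 1, 2] are assumptions, and the ffmpeg options are a shortened version of the command in the question. It only builds and prints the commands.

```python
from itertools import cycle

def build_commands(videos, gpu_indices):
    """Pair each input with a GPU index in round-robin order and
    return one ffmpeg argument list per input."""
    commands = []
    for n, (video, gpu) in enumerate(zip(videos, cycle(gpu_indices))):
        commands.append([
            "ffmpeg", "-y", "-vsync", "0",
            "-hwaccel", "cuda", "-hwaccel_output_format", "cuda",
            "-hwaccel_device", str(gpu),   # selects the GPU for this job
            "-i", video,
            "-vf", "scale_npp=w=426:h=240",
            "-c:v", "h264_nvenc", "-b:v", "400k",
            "-c:a", "aac", "-b:a", "64k",
            "-f", "hls", "-hls_time", "6", "-hls_playlist_type", "vod",
            "-hls_segment_filename", f"f-{n}-seg-%d.ts", f"f-{n}.m3u8",
        ])
    return commands

videos = [f"video-{i}.mp4" for i in range(6)]   # hypothetical file names
commands = build_commands(videos, gpu_indices=[0, 1, 2])
for cmd in commands:
    print(" ".join(cmd))
```

With 6 inputs and 3 GPUs, each card receives two jobs. To launch the jobs, each list could be passed to subprocess.Popen; the GPU indices should first be confirmed with the -gpu list command quoted above.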

FFmpeg: Get better encoding out of my function

I need some assistance with my task.
I am using FFmpeg to burn time and the channel name onto the video.
My goal is to record a stream that is html5 compatible with the following settings:
Video wrapper MP4
Video codec H.264
Bitrate 1Mbps
Audio codec AAC
Audio bitrate 128Kbps
And GPU encoding.
This is what I am using:
ffmpeg -hwaccel cuvid -y -i {udp} -vf "drawtext=fontfile=calibrib.tff:fontsize=25:text='{ChannelName} %{localtime}': x=10: y=10: fontcolor=white: box=1: boxcolor=0x000000" -pix_fmt yuv420p -vsync 1 -c:v h264_nvenc -r 25 -threads 0 -b:v 1M -profile:v main -minrate 1M -maxrate 1M -bufsize 10M -sc_threshold 0 -c:a aac -b:a 128k -ac 2 -ar 44100 -af "aresample=async=1:min_hard_comp=0.100000:first_pts=0" -bsf:v h264_mp4toannexb -t 00:30:00 {output}\{ChannelName}\{ChannelName}_{year}_{monthno}_{day}__{Hours}_{Minutes}_{Seconds}.mp4
{ChannelName}_{year}_{monthno}_{day}__{Hours}_{Minutes}_{Seconds} are all variables holding information.
{udp} holds the UDP stream link.
I have done it this way because I record multiple UDP streams.
Although this works, is there a better way to do this while keeping the -vf filter, since I need the time and channel name?
Currently, this uses between 0.8% and 1.9% GPU on my Quadro P4000. I don't want to use more than this, as I have more than 30 streams.
Here are some suggestions:
-profile:v: use the Constrained Baseline or Baseline profile, as most browsers and HTML5 players support it.
Check how many parallel encoder instances you can run on the GPU (Quadro P4000); the remaining streams can run on the CPU.
Based on the resolution and frame rate, you can decide the video bitrate range (minimum and maximum) for encoding (-b:v 1M -minrate 1M -maxrate 1M). Refer to: https://trac.ffmpeg.org/wiki/Limiting%20the%20output%20bitrate
-sc_threshold (FFmpeg)
Adjusts the sensitivity of x264's scenecut detection. Rarely needs to be adjusted. Recommended default: 40
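For the bitrate suggestion, a rough calculation shows what the settings in the question imply. This sketch assumes ffmpeg's SI suffixes (1M = 1,000,000 bit/s), ignores container overhead, and uses helper names invented for illustration.

```python
def vbv_seconds(bufsize_bits: int, maxrate_bps: int) -> float:
    """Seconds of video at the maximum rate that fit in the rate-control buffer."""
    return bufsize_bits / maxrate_bps

def recording_bytes(video_bps: int, audio_bps: int, seconds: int) -> int:
    """Approximate payload size of a constant-bitrate recording."""
    return (video_bps + audio_bps) * seconds // 8

window = vbv_seconds(10_000_000, 1_000_000)           # -bufsize 10M, -maxrate 1M
size = recording_bytes(1_000_000, 128_000, 30 * 60)   # -t 00:30:00
all_streams_bps = 30 * (1_000_000 + 128_000)          # 30 parallel streams

print(window, size, all_streams_bps)
```

Under these assumptions each 30-minute file carries roughly 254 MB of payload, and 30 streams together need about 34 Mbit/s of sustained write throughput.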

ffmpeg 1080p50 50fps unsupported on some samsung models, resulting in Error

So I use FFmpeg for live transcoding with NVENC GPU acceleration. I recently made a minor improvement by raising the output from 1080p25 to 1080p50.
I noticed that this caused "error" messages on some Samsung models. I was wondering whether it is due to my command, whether we can improve compatibility, or whether the TVs are simply unable to play back 1080p50, which I think would be really strange.
This is the command I use:
ffmpeg -hwaccel cuvid -vcodec h264_cuvid -vcodec h264_cuvid -i 'rtmp://127.0.0.1:8001/input/bla' -max_muxing_queue_size 1024 -map 0:v -map 0:a -vf yadif_cuda=1 -acodec libfdk_aac -b:a 128k -c:v h264_nvenc -preset llhq -vprofile high -level 4.2 -rc:v vbr -qmin:v 18 -qmax:v 42 -b:v 6M -maxrate 6M -bufsize 12M -threads 0 -r 50 -g 200 -f flv 'rtmp://127.0.0.1:8001/input/test'
About 80% of the models (Samsung, LG, Sony) are able to play it, but a small number of Samsung TVs give a stream error. I have a feeling it is just the high frame rate that the TV or app is unable to play back, resulting in "streaming error", because even on older LG models the stream plays back perfectly. It does not seem to be a format issue...
1080p25 fits within the Level 4.0/4.1 decoder limits, while 1080p50 requires Level 4.2. Check each manufacturer's specifications for the maximum level the device supports.
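The level requirement follows from the macroblock limits in the H.264 level table. The sketch below is a partial check only: it covers levels 3 to 5.1, looks at frame size and macroblock rate, and ignores the per-level bitrate and buffer limits (Level 4.1 differs from 4.0 only in those). The function name is made up for illustration.

```python
import math

# (level, max macroblocks per second, max macroblocks per frame)
# Rows taken from the H.264 level limits table; levels 3 to 5.1 only.
LEVELS = [
    ("3",   40_500,  1_620),
    ("3.1", 108_000, 3_600),
    ("3.2", 216_000, 5_120),
    ("4",   245_760, 8_192),
    ("4.1", 245_760, 8_192),
    ("4.2", 522_240, 8_704),
    ("5",   589_824, 22_080),
    ("5.1", 983_040, 36_864),
]

def min_level(width: int, height: int, fps: int):
    """Return the lowest listed level whose frame-size and
    macroblock-rate limits both accommodate the given video."""
    mbs_per_frame = math.ceil(width / 16) * math.ceil(height / 16)
    mbs_per_second = mbs_per_frame * fps
    for name, max_mbps, max_fs in LEVELS:
        if mbs_per_frame <= max_fs and mbs_per_second <= max_mbps:
            return name
    return None  # beyond the levels listed here

print(min_level(1920, 1080, 25))
print(min_level(1920, 1080, 50))
```

A 1080p frame is 8160 macroblocks, so 50 frames per second gives 408,000 macroblocks per second, which exceeds the Level 4.0/4.1 limit of 245,760 and only fits from Level 4.2 upward. A TV whose decoder stops at Level 4.1 would therefore play 1080p25 but fail on 1080p50.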

How to switch from yuyv422 to yuv420p for better framerate in ffmpeg on Windows 10

I upgraded my PC from Windows 7 to Windows 10, as Windows 7 is discontinued. The problem is that I had a low-latency monitoring and recording solution with FFmpeg.
After the upgrade, the Logitech camera switched from yuv420p to yuyv422 and I lost 30 FPS support at 1280x720. Now it is limited to 10 FPS.
I tried different drivers, but it is still yuyv422.
Here is the command I use:
ffmpeg -y -loglevel panic -hwaccel qsv -threads 1 -fflags nobuffer -flags low_delay -strict experimental -f dshow -video_size 1280x720 -framerate 10 -pixel_format yuyv422 -i video="C922 Pro Stream Webcam" -codec:v libx264 -preset ultrafast -crf 24 -tune zerolatency -map 0 -f segment -segment_time 600 -segment_wrap 2 -reset_timestamps 1 dvr_%%04d.avi -codec:v copy -f nut - | ffplay -fflags nobuffer -flags low_delay -vf scale=1920x1080:flags=lanczos -window_title "kamera" -noborder -left 1920 -top 150 -fast -framedrop -
I really need low CPU usage, no-latency monitoring at a minimum of 24 FPS, and recording capabilities. File size doesn't matter much.
Using mjpeg eats CPU like crazy.
I force-installed an old Logitech driver and got yuv420p/30 FPS support back.
I store some instructions and drivers here: https://github.com/mjasnikovs/logitechC920-vlc
Maybe somebody will find it useful.

Reducing FFMPEG h264 video stream latency

I'm using FFmpeg (h264) and I want to reduce latency as much as possible. It is currently about 700 ms and I can't get it any lower. I have tried almost everything, so maybe someone has an idea how to help me?
ffmpeg -f dshow -i video="screen-capture-recorder" -pix_fmt yuv420p -probesize 32 -r 100 -an -vcodec libx264 -crf 40 -preset ultrafast -tune zerolatency -threads 8 -thread_type slice -f mpegts udp://192.168.88.228:1234
The weird thing is that I get this latency even on 127.0.0.1...
(On the other side I just use ffplay udp:// .......)
I would try to set -threads to 1 to disable multi-threaded decoding. Multi-threaded decoding introduces delay by adding a lag of 1 frame for each thread.
Works with a GoPro Hero 8 Black and Linux
ffmpeg -threads 1 -i 'udp://#0.0.0.0:8554?overrun_nonfatal=1&fifo_size=50000000' -f:v mpegts -fflags nobuffer -vf format=yuv420p -f v4l2 /dev/video0
