Monitoring GPU usage by FFMPEG - ffmpeg

I have a small HTTP server which receives requests to process some video clips. The server spawns a child process and uses FFmpeg for this. I recently compiled FFmpeg to use GPUs. I am using an Nvidia GeForce GTX 1080.
However, I am unable to figure out a way to analyse the memory and other usage statistics of the GPU. I have tried nvidia-smi, but it always seems to report 0%.
My question is: what are some of the best tools available for monitoring GPU usage?
Edit: I am on Ubuntu 16.04 and only have remote access, so command-line tools are better.

I finally figured out that nvidia-smi provides various ways to monitor and log GPU stats. nvidia-smi dmon with the -f (or --filename) option periodically logs the GPU stats to a file. nvidia-smi daemon also does something similar.
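If the stats need to be collected from the server process itself rather than from a log file, a rough Python sketch along the following lines might work. It relies on nvidia-smi's --query-gpu and --format=csv,noheader,nounits options; the function names and the choice of fields are my own assumptions, not something from the original setup.

```python
import subprocess

# Fields passed to nvidia-smi's --query-gpu option (the selection is an assumption).
QUERY_FIELDS = ["index", "utilization.gpu", "memory.used", "memory.total"]


def _to_int(value):
    """Convert a CSV cell to int; cells such as [N/A] become None."""
    try:
        return int(value)
    except ValueError:
        return None


def parse_gpu_csv(text):
    """Parse csv,noheader,nounits output into one dict per GPU."""
    rows = []
    for line in text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        if len(values) != len(QUERY_FIELDS):
            continue  # skip lines that do not match the expected layout
        rows.append(dict(zip(QUERY_FIELDS, (_to_int(v) for v in values))))
    return rows


def query_gpus():
    """Run nvidia-smi once and return the parsed rows (needs nvidia-smi on PATH)."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_gpu_csv(out)
```

Calling query_gpus() in a loop with a short sleep, while FFmpeg is running, would give a simple time series of utilization and memory use per GPU.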

Related

How to check IRQ latency in Linux (X86_64) for performance tuning?

Is there a way to check the interrupt processing latency in the Linux kernel?
Or is there a way to check why CPU usage is only 40% with a specific configuration of Linux 4.19.138?
Background:
I have an x86 server that runs either a third-party Linux 4.19.138 kernel (whose configuration file is about 6000 lines long) or Ubuntu 20.04 x86_64 (whose configuration file is about 9500 lines long).
When running a netperf test on this server, I found that with the third-party Linux 4.19.138 kernel the I/O latency of netperf is worse than with Ubuntu 20.04. CPU usage stays below 40% when running the third-party kernel, while it is about 100% when running Ubuntu 20.04.
Both use the same kernel command line and the same performance profile at runtime.
It seems that the interrupt handling or the netserver process on the server is throttled in Linux 4.19.138.
I then rebuilt the Ubuntu 20.04 kernel using the short configuration file (6000 lines long) and got similarly bad results.
So I concluded that the kernel configuration makes the difference.
Before comparing the two configurations (6000 lines vs 9500 lines), and to narrow it down, my question is: is there a way to check why CPU usage is only 40% with that configuration of 4.19.138? Or is there a way to check the interrupt processing latency in the Linux kernel?
I finally found the reason:
net.core.busy_read and
net.core.busy_poll are both set to 0.
That means busy polling on sockets is disabled, which hurts the netperf latency.
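To compare the two kernels quickly, the net.core.busy_read and net.core.busy_poll values can be read from /proc/sys on each one. Here is a minimal sketch; the function name and the sysctl_root parameter are my own, added so that the code can be pointed at a test directory. As far as I know these entries only exist when the kernel is built with busy-poll support, so a missing file is reported as None rather than treated as an error.

```python
from pathlib import Path


def read_busy_poll_settings(sysctl_root="/proc/sys"):
    """Return the busy-poll sysctls as a dict; missing or unreadable entries map to None."""
    settings = {}
    for name in ("busy_read", "busy_poll"):
        path = Path(sysctl_root) / "net" / "core" / name
        try:
            settings["net.core." + name] = int(path.read_text().strip())
        except (OSError, ValueError):
            # The file is absent (or not a number) on this kernel.
            settings["net.core." + name] = None
    return settings
```

Running read_busy_poll_settings() on both kernels and comparing the two dicts shows at a glance whether busy polling differs between them.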
But the question has now changed:
In this case, the lower CPU usage is a sign that something is different in Linux. What kind of tool can we use, or how should we go about figuring out, what causes the CPU usage difference between the two kernels?
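One rough way to see where the CPU time goes on each kernel is to take two snapshots of the aggregate "cpu" line in /proc/stat during the netperf run and compare how the time splits between user, system, irq, softirq and idle. The sketch below is only an illustration of that idea; the function names are hypothetical, and it does not measure interrupt latency itself.

```python
# Order of the first eight counters on the "cpu" line of /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(stat_text):
    """Extract the aggregate 'cpu' counters from the contents of /proc/stat."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            return dict(zip(FIELDS, map(int, parts[1:1 + len(FIELDS)])))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_share(before, after):
    """Return the percentage of elapsed CPU time spent in each state."""
    delta = {k: after[k] - before[k] for k in FIELDS}
    total = sum(delta.values())
    if total == 0:
        return {k: 0.0 for k in FIELDS}
    return {k: 100.0 * v / total for k, v in delta.items()}
```

Reading /proc/stat once before and once after the test, and passing both parsed snapshots to cpu_share, gives a breakdown that can be compared between the two kernels, for example to see whether the softirq share differs.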

How can we monitor memory, threads, CPU etc. of a GraalVM native image during performance testing?

I want to run some performance tests against a Quarkus native image. In a traditional Java application I would use VisualVM to connect to the application and monitor its memory (young gen, old gen, etc.), CPU usage, threads and so on.
Since native images are now ordinary OS processes, is there a way to get insight into the process equivalent to what we got with VisualVM, or should we just stick to the OS information (CPU usage + memory)?
One option, if you add the metrics extension, is to fetch the metrics and then plot them in some way. Another option could be vmstat on Unix, but it reports figures for the whole system.
If you deploy in a Kubernetes environment, Prometheus can fetch the information for you.
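For the "fetch and plot" route, a small script can poll the metrics endpoint during the test and keep the samples. The sketch below assumes the endpoint returns the Prometheus text format; the URL in the usage note is a hypothetical example and depends on which metrics extension and version is in use. The parser is deliberately naive: it takes the last space-separated token as the value, so it would need adjusting if the endpoint also emits timestamps.

```python
import urllib.request


def parse_metrics(text):
    """Parse Prometheus-style text into {metric name with labels: value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name, _, value = line.rpartition(" ")
        try:
            samples[name] = float(value)
        except ValueError:
            continue  # ignore lines that do not end in a number
    return samples


def fetch_metrics(url):
    """Download the metrics page at url (an assumption, see above) and parse it."""
    with urllib.request.urlopen(url) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

Something like fetch_metrics("http://localhost:8080/q/metrics"), called every few seconds, would then produce data that can be written to CSV and plotted afterwards.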

Why is Docker limiting my container's CPU usage to 100% on Raspberry

I'm running the latest Docker version on Raspbian on my Raspberry Pi 3.
I have a program that takes pictures with the camera, compresses them and sends them over the network.
When I run the program outside of Docker, I can see in top that it constantly consumes around 130% CPU (out of 4 cores x 100% on the Raspberry Pi). The constant compression is the CPU-intensive part of the program, and it manages to compress at around 32 fps.
When I run the exact same program in a Docker container, I can see in top that it only uses 100% CPU (still distributed among the cores). Here the program is only able to compress at around 23 fps.
I tried passing the --cpus flag but it returned an error:
docker: Error response from daemon: NanoCPUs can not be set, as your kernel does not support CPU cfs period/quota or the cgroup is not mounted.
Note: I have done many tests and networking is not the issue.
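The error message suggests that the kernel's CFS period/quota support is not visible to Docker. One way to check that from the host is to look for the corresponding control files in the cpu cgroup. This is a sketch only: it assumes a cgroup v1 layout mounted at /sys/fs/cgroup/cpu, and the function name is made up for illustration.

```python
from pathlib import Path


def cfs_quota_available(cpu_cgroup_dir="/sys/fs/cgroup/cpu"):
    """Return True if both CFS bandwidth control files exist in the cpu cgroup."""
    root = Path(cpu_cgroup_dir)
    # cgroup v1 exposes the period and quota as two separate files.
    return (root / "cpu.cfs_period_us").is_file() and (root / "cpu.cfs_quota_us").is_file()
```

If this returns False on the Raspberry Pi, the --cpus flag cannot work there, which matches the error above.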
I think I figured out the problem.
When the image was built from the Dockerfile, it downloaded different versions of the libraries I use in the code. So technically it was running different code from what was running on the host, and it was not a Docker issue.

Get GPU Processor Usage Programmatically

Is there any way to get the GPU processor usage using CUDA? I want to get the processor usage of each GPU connected in a cluster and assign the job to the GPU with the least processor usage.
The operating system I am using is Windows 7 64-bit. All the connected GPUs have the Fermi architecture.
Please help.
The NVIDIA Management Library (NVML) is a C-based API for monitoring and managing various states of NVIDIA GPU devices. It provides direct access to the queries and commands exposed via the command-line tool nvidia-smi.
https://developer.nvidia.com/nvidia-management-library-nvml
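As a rough illustration of the "pick the least busy GPU" idea, here is a sketch that uses the Python bindings for NVML (the pynvml module) instead of the C API directly. That is an assumption on my part: the bindings have to be installed separately, and whether utilization is reported at all depends on the card and driver. The function names are hypothetical.

```python
def pick_least_used(utilizations):
    """Return the index of the GPU with the lowest utilization percentage."""
    if not utilizations:
        raise ValueError("no GPUs reported")
    # On a tie, min() keeps the first (lowest) index.
    return min(range(len(utilizations)), key=lambda i: utilizations[i])


def read_utilizations():
    """Query NVML for the utilization (in percent) of every local GPU."""
    import pynvml  # assumption: the NVML Python bindings are installed

    pynvml.nvmlInit()
    try:
        result = []
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            result.append(pynvml.nvmlDeviceGetUtilizationRates(handle).gpu)
        return result
    finally:
        pynvml.nvmlShutdown()
```

pick_least_used(read_utilizations()) gives the device index to hand the next job to. NVML only sees the GPUs of the machine it runs on, so in a cluster each node would have to report its own numbers.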

ATI Stream SDK on ubuntu 9.04

I have used the ATI Stream SDK on Windows XP SP3 and implemented an algorithm on the GPU. Now I am interested in scaling this algorithm to multiple GPUs on multiple machines, so I switched to Ubuntu to use MPI (to send messages).
I googled this and found references for installation on SLES and RHEL, but I am looking for Ubuntu 9.04.
Thanks
GG
AMD is switching to an OpenCL-based API soon. It may be worthwhile holding your horses until the OpenCL API stabilizes. CUDA is far ahead of the curve in terms of GPU usability; there is a nice project called MAGMA which is bringing together the LAPACK library for joint CPU-GPU usage.
I know of people who are using the ATI Stream SDK and ACML-GPU on Ubuntu without any special problems -- that is, no problems that they wouldn't have on any other Linux distro.
If you can get the Catalyst drivers installed correctly (which in this case will probably mean compiling your kernel modules) and your X Window configuration set up correctly (especially the DRI module; there are also security issues if you want Stream to work with remote access), it should work.
I'm tempted to ask/comment how you plan to share GPUs between multiple MPI processes, but that's probably wandering off-topic.
