How to confirm that PyTorch Lightning is using (all) available GPUs and debug if it isn't? - pytorch-lightning

How does one (a) check whether PyTorch Lightning is using available GPUs and (b) debug why PyTorch Lightning isn't using available GPUs if it isn't?

For (a), monitoring, you can use an independent tool such as Glances, which should show whether all your GPUs are in use. (To enable GPU support, install it with pip install glances[gpu].) For (b), debugging resource usage, first check that your PyTorch installation can reach your GPUs, for example: python -c "import torch; print(torch.cuda.device_count())". If that prints the number of GPUs you expect, PyTorch itself can see them.
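A slightly longer version of that one-liner, as a rough sketch: the helper name describe_cuda is made up for illustration, and the import guard is only there so the script still reports something sensible when PyTorch is not installed.

```python
def describe_cuda():
    """Report what PyTorch can see; hypothetical helper, not part of Lightning."""
    try:
        import torch
    except ImportError:
        # PyTorch missing entirely: nothing can be using the GPUs through it.
        return {"torch_installed": False, "cuda_available": False,
                "device_count": 0, "devices": []}

    count = torch.cuda.device_count()
    return {
        "torch_installed": True,
        "cuda_available": torch.cuda.is_available(),
        "device_count": count,
        # One human-readable name per visible device.
        "devices": [torch.cuda.get_device_name(i) for i in range(count)],
    }


info = describe_cuda()
print(info)
```

If device_count is lower than the number of cards in the machine, the problem is below Lightning (driver, CUDA build of PyTorch, or an environment variable such as CUDA_VISIBLE_DEVICES hiding devices), so there is no point debugging the Trainer yet.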

You can also check whether the GPUs in your computer are being used by running the command:
nvidia-smi
If none or only some of the GPUs show activity, Lightning is not using all of them (the opposite is not always true, since a GPU can be busy because of another process).
Lightning also usually logs a warning when you are not using all of the available GPUs, so check your training log.
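To make the nvidia-smi check scriptable, something like the following could work. It is a sketch under assumptions: the function names and the 100 MiB "idle" threshold are invented for illustration, and it assumes nvidia-smi reports plain numbers for every field (some GPUs report "[N/A]", which this does not handle).

```python
import shutil
import subprocess

# Ask nvidia-smi for machine-readable output: one CSV row per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_usage(csv_text):
    """Turn 'index, util, mem' rows into a list of dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, util, mem = (field.strip() for field in line.split(","))
        rows.append({"index": int(index),
                     "util_percent": int(util),
                     "memory_mib": int(mem)})
    return rows


def idle_gpus(rows, min_memory_mib=100):
    """GPUs with 0% utilization and almost no memory in use (arbitrary threshold)."""
    return [r["index"] for r in rows
            if r["util_percent"] == 0 and r["memory_mib"] < min_memory_mib]


def query_gpus():
    """Run nvidia-smi if it is on PATH; return [] otherwise."""
    if shutil.which("nvidia-smi") is None:
        return []
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_usage(out)


# Example with canned output, so it runs on a machine without a GPU:
sample = "0, 97, 10240\n1, 0, 3\n"
print(idle_gpus(parse_gpu_usage(sample)))
```

Run idle_gpus(query_gpus()) while training is in progress; any index it returns is a GPU your job is probably not touching. A single sample can be misleading between batches, so it is worth polling a few times.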

Related

How to check how many compute cores my GPU has under Windows

I'm using Windows and I'm trying to find out how many compute cores my GPU has. I'm on a laptop with a 3050 Ti, but it doesn't seem to be the same as a Founders Edition 3050 desktop GPU, and I can't seem to find the specifications anywhere. Is there a way to do this in Windows?
I would check the specifications with GPU-Z:
https://www.techpowerup.com/download/techpowerup-gpu-z/
The solution for this was to install GPU-Z and then use it to get the exact name of the GPU, which can then be looked up. Note also: CUDA cores seem to be the same as compute cores.

NCCL necessary to train with multiple GPUs (Windows Caffe)?

I am using the command-line version of Caffe on Windows to train a network. There are two GPUs (GTX 1080) available in the system. When I train with the CPU only, or specify a single GPU (either of the two), the net trains correctly. If the option "gpu all" is given for training, both GPUs are recognized, but I get a "Segmentation fault" before the initialization of the test network finishes, and training does not start.
That is why I think it is a problem with the multi-GPU configuration. I have run some tests building Caffe with the USE_NCCL option enabled and disabled (=1 and =0), but I get the same behaviour in both cases. I built Caffe from the windows branch.
I have also read on NVIDIA sites that NCCL is necessary in Caffe for multi-GPU usage, but the NCCL installer is only available for Linux. Is it necessary to install NCCL separately on Windows in order to use more than one GPU? I have also read that since the beginning of this year NCCL has been integrated into official Caffe, but is it also integrated in the windows branch, or is installing it separately on Windows mandatory? I cannot find a way to install it on Windows 7. Thanks

Would TensorFlow utilize GPU on a Mac if installed on a VM?

From TensorFlow's "Getting Started" page:
# Only CPU-version is available at the moment.
$ pip install https://storage.googleapis.com/tensorflow/mac/tensorflow-0.5.0-py2-none-any.whl
I'm not super familiar with using GPU or CUDA libraries, but if I installed TensorFlow inside a Linux VM (say the precise32 available through Vagrant), then would TensorFlow utilize the GPU when running inside that VM?
Probably not. VirtualBox, for example, does not support PCI Passthrough on a MacOS host, only a Linux host (and even then, I'd... uh, not get my hopes up). MacOS ends up so tightly integrated with its GPU(s) that I'd be very dubious that any VM can do it at this point.
As an update: TensorFlow can now use GPUs on Mac OS X. The relevant PR is https://github.com/tensorflow/tensorflow/pull/664 and after a brew install coreutils the Linux installation 'build from source' instructions should work. I see a 10x speedup compared to the CPU version with an NVIDIA GeForce 960 and Intel i7-6700K.
Edit/(downdate?): Starting with macOS Mojave, due to some API changes and what appears to be some long-standing beef between Apple and NVIDIA, drivers for NVIDIA graphics cards are no longer available. No NVIDIA means no CUDA means no TensorFlow, nor really any other respectable machine learning. It appears something like Google Colaboratory is the way to go for now.

How to install a bare Linux kernel without any distribution to study it?

I want to study the Linux kernel without any distribution.
I found the LOADLIN bootloader for MS-DOS, but I think it only works with older versions of Windows (Windows 95, 98, ME).
So I need to install only the kernel on my PC, if possible.
How can I install it?
The kernel on its own is not much use to you; you'll probably need a shell and a working compiler if you want to test things first-hand, and these are not part of the kernel.
There's a distribution called Linux From Scratch which basically allows you to install the kernel and then whatever other stuff you want, literally from scratch (as in, by compiling stuff yourself and only adding what YOU want)
I am wondering though, what is it exactly you want to study and how does having a distribution affect your studying of the kernel? (Yes, some distributions ship custom kernels but the major features are almost always the same)
Minimal Linux Live is a small script that:
downloads the source for the kernel and busybox
compiles them
generates a bootable 8 MB ISO with them
The ISO then leaves you in a minimal shell with busybox.
With QEMU you can then easily boot into the system, which might be a more convenient way to study the kernel.
Or you can just use the Live ISO as a regular distribution and install it on metal.
Usage:
git clone https://github.com/ivandavidov/minimal
cd minimal/src
./build_minimal_linux_live.sh
# Wait.
# Install QEMU.
# minimal_linux_live.iso was generated
./qemu64.sh
and you will be left inside a QEMU window with your new minimal system. Awesome.
See also:
https://unix.stackexchange.com/questions/17122/is-it-possible-to-install-the-linux-kernel-alone
https://superuser.com/questions/307087/linux-distro-with-just-busybox-and-bash
Why not use a distribution? Just get a free VM (e.g. VirtualBox) and install an arbitrary Linux distribution. You then have all the build tools you need to compile the kernel, without actually touching your system.

ATI Stream SDK on ubuntu 9.04

I have used the ATI Stream SDK on Windows XP SP3 and implemented one algorithm on the GPU. Now I am interested in scaling this algorithm to multiple GPUs on multiple machines, so I switched to Ubuntu to use MPI (to send messages).
I googled this, but I only found references for installation on SLES and RHEL, and I am looking for Ubuntu 9.04.
Thanks
GG
AMD is switching to an OpenCL-based API soon. Maybe it will be worthwhile holding your horses until the OpenCL API stabilizes. CUDA is far ahead of the curve in terms of GPU usability, and there is a nice project called MAGMA which is bringing together the LAPACK library for joint CPU-GPU usage.
I know of people who are using the ATI Stream SDK and ACML-GPU on Ubuntu without any special problems -- that is, no problems that they wouldn't have on any other Linux distro.
If you can get the Catalyst drivers installed correctly (which in this case will probably mean compiling your kernel modules) and your X windows configured correctly (especially DRI module, and there are security issues if you want Stream to work with remote access) it should work.
I'm tempted to ask/comment how you plan to share GPUs between multiple MPI processes, but that's probably wandering off-topic.
