Is NCCL necessary to train with multiple GPUs (Windows Caffe)? - windows

I am using the command-line version of Caffe on Windows to train a network. There are two GPUs (GTX 1080) available in the system. When I train with the CPU only, or specify a single GPU (either of the two), the net trains correctly. If the option "gpu all" is given for training, both GPUs are recognized, but I get a "Segmentation fault" before the initialization of the test network finishes, and training does not start.
That is why I think it is a problem with the multi-GPU configuration. I have tested building Caffe with the option USE_NCCL enabled and disabled (=1 and =0), but I get the same behaviour in both cases. I built Caffe from the windows branch.
I have also read on NVIDIA sites that NCCL is necessary in Caffe for multi-GPU usage, but there are only Linux versions of the NCCL installer. Is it necessary to install NCCL separately on Windows in order to use more than one GPU? I have also read that since the beginning of this year NCCL is integrated in the official Caffe, but is it also integrated in the windows branch, or is installing it separately on Windows mandatory? I cannot find a way to install it on Windows 7. Thanks
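One way to narrow this down is to run the same solver under each GPU selection and compare exit codes. The sketch below is only an illustration: it assumes a `caffe` executable on the PATH and a hypothetical `solver.prototxt` (ideally with a small `max_iter` so each run is short). The `--solver` and `--gpu` flags are the ones the caffe command-line tool documents; everything else is made up for the example.

```python
import subprocess

def caffe_train_cmd(solver, gpus, caffe_bin="caffe"):
    """Build a `caffe train` command line for one GPU selection.

    `gpus` is passed through unchanged, e.g. "0", "1", "0,1" or "all".
    `caffe_bin` and the solver path are assumptions for this sketch.
    """
    return [caffe_bin, "train", f"--solver={solver}", f"--gpu={gpus}"]

def run_configs(solver, configs=("0", "1", "0,1", "all"), runner=subprocess.run):
    """Run training once per GPU selection and collect the exit codes.

    A non-zero exit code marks the configurations that crash.
    """
    results = {}
    for gpus in configs:
        proc = runner(caffe_train_cmd(solver, gpus))
        results[gpus] = proc.returncode
    return results

# Example call (needs a real Caffe build, so it is left commented out):
# print(run_configs("solver.prototxt"))
```

If "0" and "1" succeed while "0,1" and "all" both fail, the problem is in the multi-GPU path itself rather than in how "all" is expanded.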

Related

Error running 'make DETECT_DEVICES' on Intel FPGA Monitor Program

I'm currently trying to run ARM assembly on my DE series board. However when I try to open my project I get the following error on the Intel FPGA Monitor Program:
Error running 'make DETECT_DEVICES'. (java.io.IOException: The pipe is being closed)
How can I solve that?
Depends on the OS you are running. If you are running on Windows 11, it's not going to work because there is no USB Blaster II driver support for it unfortunately.
(see: https://community.intel.com/t5/Programmable-Devices/USB-Blaster-for-Windows-11/m-p/1422212#M87272)
NazrulNaim_Intel Employee
10-16-2022 11:57 PM
Hi Fari,
Regarding the issue with the USB Blaster: as mentioned by ak6dn, there will be issues installing the USB Blaster on Windows 11 because it is not officially supported by Intel yet. We cannot be sure that it will work 100% on Windows 11. As a workaround to troubleshoot the issue, you can follow the instructions from the link that I have attached below.
https://www.terasic.com.tw/wiki/Altera_USB_Blaster_Driver_Installation_Instructions
Regards,
Nazrul Naim
I suggest you use a VM with Windows 10 if that's the case.
The FPGA monitor program requires WSL1 with a Linux distro installed on your PC. Make sure WSL1 is set as the default; WSL2 is not supported and will result in crashes while trying to compile your code.
To install WSL1 and set it to default, follow this link:
https://learn.microsoft.com/en-us/windows/wsl/install
After installation, launch the installed distro and follow this link step by step:
https://www.intel.com/content/www/us/en/docs/programmable/683525/21-3/installing-windows-subsystem-for-linux.html
Although the document refers to the NIOS II EDS it is also applicable for the FPGA monitor.
Also make sure that the version of Quartus corresponds to the version of the FPGA monitor and keep the Linux distro running in the background while compiling.
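To confirm which WSL version each installed distro actually runs under, you can inspect the output of `wsl -l -v`. The sketch below parses that table; the sample text and distro names are made up for illustration, and the parser assumes the usual NAME / STATE / VERSION column layout.

```python
def parse_wsl_list(text):
    """Map distro name -> (is_default, wsl_version) from `wsl -l -v` output."""
    distros = {}
    for line in text.splitlines():
        line = line.replace("\x00", "")            # stray NULs from UTF-16 output
        is_default = line.lstrip().startswith("*")  # '*' marks the default distro
        fields = line.replace("*", " ", 1).split()
        # Data rows end with a numeric VERSION; the header row does not.
        if len(fields) >= 3 and fields[-1].isdigit():
            name = " ".join(fields[:-2])
            distros[name] = (is_default, int(fields[-1]))
    return distros

# Illustrative sample, not captured from a real machine.
sample = """  NAME            STATE           VERSION
* Ubuntu-20.04    Running         1
  Debian          Stopped         2
"""

for name, (is_default, version) in parse_wsl_list(sample).items():
    note = "OK for the monitor program" if version == 1 else "needs converting to WSL1"
    print(f"{name}: WSL{version} ({note}){' [default]' if is_default else ''}")

# On a real system the text could come from something like:
#   subprocess.run(["wsl", "-l", "-v"], capture_output=True).stdout.decode("utf-16-le")
# (wsl.exe normally emits UTF-16; treat the decoding as an assumption to verify.)
```

A distro that reports version 2 can be converted with `wsl --set-version <distro> 1`, and `wsl --set-default-version 1` makes WSL1 the default for newly installed distros.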

Is Windows Subsystem for Linux sufficient for BUILDING tasks using bitbake, gcc, cmake etc?

My project is currently using VirtualBox + Ubuntu 18 running on Windows (x86_64) to build an ARM image using bitbake, gcc, cmake, make.
QUESTION
Is WSL sufficient for building activities that are currently running on virtual machine + Ubuntu 18?
It depends on what your builds do. If they try to do long double arithmetic, they will likely fail or be incorrect, due to this WSL bug:
long double floating point calculations give inexact results
People have also reported issues with build tools caused by antivirus software on the host. Other imperfections in the Linux emulation may not matter if you are cross-compiling.
What likely matters more is that WSL is extremely slow compared to virtualization, especially for file system operations. If your builds are split across many small files, switching from virtualization to WSL will likely result in a huge slowdown.
EDIT: The above applies to the original WSL (LXCORE.SYS). WSL2 is based on Hyper-V and likely behaves much better (but I haven't tried it yet). The Hyper-V dependency, however, means that you have to disable VirtualBox completely.
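If you want a rough number before migrating, a small-file micro-benchmark run in both environments gives a first impression. This is a minimal sketch, not a substitute for timing a real bitbake build; the file count and size are arbitrary assumptions.

```python
import os
import tempfile
import time

def time_small_files(count=2000, size=256, root_dir=None):
    """Create, read back, and delete `count` small files; return elapsed seconds.

    `root_dir` selects where the scratch directory lives, so the same test
    can be pointed at different file systems (default: the system temp dir).
    """
    payload = b"x" * size
    start = time.perf_counter()
    with tempfile.TemporaryDirectory(dir=root_dir) as root:
        for i in range(count):
            with open(os.path.join(root, f"f{i}.tmp"), "wb") as fh:
                fh.write(payload)
        for i in range(count):
            with open(os.path.join(root, f"f{i}.tmp"), "rb") as fh:
                fh.read()
    # Leaving the `with` block deletes the files, so deletion is timed too.
    return time.perf_counter() - start

print(f"{time_small_files():.3f} s for 2000 small files")
```

Run it once inside the VirtualBox guest and once inside WSL, using the same directory your build tree would live in, and compare the two figures.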

Linux kernel on virtual machine

I am studying Linux driver programming, and the material recommends working on self-compiled Linux kernels rather than on distribution kernels. I have tried compiling Linux 2.6.9 in Ubuntu, but the process returns errors at the 'make menuconfig' stage.
I would prefer to work with Linux in a virtual environment so that I can fearlessly experiment with the kernel. So, is there any way I can compile and run Linux in a virtual machine (say VMware installed on Windows)? I can use live CDs for the purpose of compiling the kernel.
So in short, please suggest, how can I compile, install and run Linux kernel in a virtual machine in an error-free way?
I searched and read this. But after following these steps, when I restarted my computer there was no separate Linux 3.2.17 OS; instead, my Ubuntu 12.04 was now showing the 3.2.17 kernel. Although this is the first time I could compile a whole kernel on Ubuntu without any error, I want to load that kernel on another partition and use it as an independent OS. So, can anyone tell me what to do in addition to the steps in the tutorial to achieve this?
The simplest thing to do is probably to install some Linux distribution on a VM, such as VMWare or VirtualBox, and continue from there. You could try using a live-cd, but I'm guessing that the lack of persistent storage might get irritating. There are, of course, ways around that, but installing some distribution is probably simpler, and you don't really need that much disk space for it if all you want to do is compile a kernel.
If all you want to do is compile a kernel module, and if you already have some pre-installed Linux environment, you should also note that modern Linux installations allow you to compile modules without the need to re-compile the entire kernel. You will need the kernel source and headers, though. See, for example, this document.
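As a quick check of whether the running system is set up for that, you can look for the kernel build directory that out-of-tree module builds are normally pointed at. The sketch assumes the conventional `/lib/modules/<release>/build` layout, which distributions usually populate when their kernel headers package is installed.

```python
import os
import platform
import posixpath

def module_build_dir(release=None):
    """Return the conventional kernel build directory for a kernel release.

    Defaults to the running kernel (same string as `uname -r`).
    """
    release = release or platform.release()
    return posixpath.join("/lib/modules", release, "build")

def can_build_modules(release=None):
    """True if the build directory contains a top-level Makefile."""
    return os.path.isfile(posixpath.join(module_build_dir(release), "Makefile"))

build_dir = module_build_dir()
if can_build_modules():
    # Typical out-of-tree build, run from the module's source directory:
    #   make -C <build_dir> M=$PWD modules
    print(f"Headers found; build with: make -C {build_dir} M=$PWD modules")
else:
    print(f"No kernel build tree at {build_dir}; install the matching headers first")
```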
And BTW, speaking of modern kernels, why did you choose to use 2.6.9? It's almost 8 years old by now. Newer kernels might actually be easier to develop for. Also, there's no guarantee that modules developed with such an old kernel would still work with current ones.
I suggest you read this page. This document shows you how to boot your personal kernel on qemu and how to use the debugger on it.
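In rough outline, booting a self-built kernel under QEMU with a debugger attached looks like the sketch below. It only assembles the command line; the kernel and initrd paths are hypothetical, and the x86_64 target is an assumption. `-s` opens a gdb server on TCP port 1234 and `-S` keeps the CPU halted until the debugger continues.

```python
import shlex

def qemu_debug_cmd(kernel, initrd=None, arch="x86_64", append="console=ttyS0"):
    """Build a QEMU command that boots `kernel` and waits for gdb to attach."""
    cmd = [
        f"qemu-system-{arch}",
        "-kernel", kernel,   # boot this kernel image directly, no bootloader
        "-append", append,   # kernel command line
        "-nographic",        # route the serial console to the terminal
        "-s",                # shorthand for -gdb tcp::1234
        "-S",                # do not start the CPU until gdb says continue
    ]
    if initrd:
        cmd += ["-initrd", initrd]
    return cmd

# Hypothetical paths, shown only to illustrate the shape of the command.
print(shlex.join(qemu_debug_cmd("arch/x86/boot/bzImage", initrd="initramfs.img")))
```

From a second terminal, `gdb vmlinux` followed by `target remote :1234` attaches to the halted guest. Since the kernel runs inside the emulator, a crash costs nothing more than restarting QEMU.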
Kernelnewbies is the right place to start kernel hacking. This website contains a set of rich tutorials about kernel hacking and tweaking just for newbie Linux developers. Also, you can join the community and start contributing to some tiny Linux projects.
For a quick start, follow the instructions from the "kernel first patch" tutorial. Since you're cloning the "origin" remote repository in this tutorial, you'll work on the latest branches of the Linux kernel, so there's no need to worry about working on an old version of Linux. Meanwhile, if you're not comfortable working with git trees, you can always download the latest version of Linux from the front page of "kernel.org".

Playing/Learning -- QEMU (for ARM), Angstrom Linux (or Debian)

My ultimate goal is to do some programming for Angstrom Linux (or Debian or another Linux distro) on QEMU emulating an ARM processor board such as the Versatile board. I am happy to experiment, but if someone has attempted something similar and can give a little guidance, it might hasten progress.
My understanding of the steps needed is:
1. Build QEMU from source (although I am not sure if a prebuilt binary won't do). I found QEMuManager on Windows (XP being my Desktop OS on which I intend to run QEMU).
2. Install ARM tool chain (e.g. Yagarto / GNU-ARM for Cygwin?)
3. Download an Angstrom Linux tarball and build it
4. Create a QEMU image with Angstrom Linux.
However I am missing on the details, as I believe there are choices to be made at each of those steps.
IMHO you should use a Linux distribution as the host machine for QEMU instead of trying to compile/install all the QEMU stuff on a Cygwin-based system; it will save you some future headaches. You can use VMware Player with an Ubuntu image.
I used to play with this tutorial for Debian on QEMU.
The beagleboard, hawkboard, and open-rd sites all tend to lead to their distros being built on qemu (arm), and from there, there is no reason why you cannot just keep running on the simulation instead of heading for hardware.
This is an example of how to do it with ubuntu.
https://wiki.edubuntu.org/ARM/RootfsFromScratch
Yes, it is also possible to cross-compile everything. I would start with wiki pages that hand-hold you through all of the steps. Or, as with the hawkboard or beagleboard, get a pre-built binary (kernel and root file system), boot it, and run in that environment without building everything yourself.
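For the pre-built route, the QEMU invocation for an emulated Versatile board usually has the shape sketched below. The file names are placeholders, and the root device (`/dev/sda1`) and memory size are assumptions that depend on the image you download; check the instructions that come with the image.

```python
import shlex

def qemu_versatile_cmd(kernel, disk_image, initrd=None,
                       root="/dev/sda1", memory_mb=256):
    """Build a qemu-system-arm command for the emulated Versatile PB board."""
    cmd = [
        "qemu-system-arm",
        "-M", "versatilepb",        # ARM Versatile/PB machine model
        "-m", str(memory_mb),       # guest RAM in MiB
        "-kernel", kernel,          # pre-built ARM kernel image
        "-hda", disk_image,         # root file system image
        "-append", f"root={root}",  # tell the kernel where the root fs is
    ]
    if initrd:
        cmd += ["-initrd", initrd]
    return cmd

# Placeholder file names; substitute the kernel and image you actually have.
print(shlex.join(qemu_versatile_cmd("zImage", "rootfs.img")))
```

Keeping the command in one small function makes it easy to swap in a self-built kernel later while reusing the same root file system image.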

ATI Stream SDK on ubuntu 9.04

I have used the ATI Stream SDK on Windows XP SP3 and implemented one algorithm on the GPU. Now I am interested in scaling this algorithm to multiple GPUs on multiple machines, so I switched to Ubuntu to use MPI (to send messages).
I googled this, but I only found references for installation on SLES and RHEL, and I am looking for Ubuntu 9.04.
Thanks
GG
AMD is switching to an OpenCL-based API soon. Maybe it would be worthwhile to hold your horses until the OpenCL API stabilizes. CUDA is far ahead of the curve in terms of GPU usability; there is a nice project called MAGMA which is bringing the LAPACK library to joint CPU-GPU usage.
I know of people who are using the ATI Stream SDK and ACML-GPU on Ubuntu without any special problems -- that is, no problems that they wouldn't have on any other Linux distro.
If you can get the Catalyst drivers installed correctly (which in this case will probably mean compiling your kernel modules) and your X windows configured correctly (especially DRI module, and there are security issues if you want Stream to work with remote access) it should work.
I'm tempted to ask/comment how you plan to share GPUs between multiple MPI processes, but that's probably wandering off-topic.
