OpenMP: Working with Anaconda Python / Cython, but not with System (Arch) Python / Cython

I have a Python/Cython application that is parallelized with OpenMP and makes several calls to Intel MKL. I usually set the number of threads via OMP_NUM_THREADS=xx. When I run my script with an Anaconda distribution (Python 3.6), both the Cython module and MKL (Pardiso solver calls) correctly start several threads; the CPU load and the number of busy cores are clearly visible in the system monitor.
However, with the system's Python distribution (Python 3.6 under Arch Linux), only one thread is started, for both the Cython module and Intel MKL.
At least for my Cython module I can tell that the correct number of threads is requested (via prange()), but only one thread is obtained.
No compilation errors arise, and of course the '-fopenmp' flag is used for compilation. Since the issue affects both my Cython module and Intel MKL, I assume it is somehow related to my system's OpenMP.
What is the issue here? Thank you!

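Try specifying the number of threads inside the code, just before the loop, in case OMP_NUM_THREADS is overridden somewhere outside. In a Cython module, for example:
cimport openmp  # Cython's bundled OpenMP declarations ("import openmp" does not work)
openmp.omp_set_num_threads(NumThreads)  # NumThreads: the desired thread count
# parallel loop here, e.g. for i in prange(n, nogil=True): ...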
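If the variable might be overridden, it also helps to confirm what each interpreter actually sees. The following is a minimal Python sketch, assuming you run it under both the Anaconda and the system Python and compare the output; the helper name `thread_settings` is hypothetical, and MKL_NUM_THREADS is included because MKL reads it in addition to OMP_NUM_THREADS.

```python
import os

# Environment variables that control OpenMP and MKL thread counts.
THREAD_VARS = ("OMP_NUM_THREADS", "MKL_NUM_THREADS")


def thread_settings(env=None):
    """Return the thread-count variables visible to this process (None if unset)."""
    env = os.environ if env is None else env
    return {name: env.get(name) for name in THREAD_VARS}


if __name__ == "__main__":
    for name, value in thread_settings().items():
        print(f"{name} = {value!r}")
    # Number of logical CPUs reported by the OS (None if it cannot be determined).
    print("os.cpu_count() =", os.cpu_count())
```

If both interpreters report the same values, the environment is probably not the culprit and the difference lies in the OpenMP/MKL runtimes themselves.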

Related

How to setup OpenCL in Cygwin for Intel GPU

I have a laptop with an Intel(R) HD Graphics 520 GPU. I added the OpenCL developer package to Cygwin. I found a small Mandelbrot-set calculator program for OpenCL in C on GitHub. It was written for Apple, so I modified the Makefile to use the proper headers and settings for gcc. Now the code compiles and executes nicely (a bmp file is created):
$ ./mandelbrot.exe
Device 0: GenuineIntel pthread-Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz
I have two questions:
#1. How can I add (if it is possible) the Intel GPU to the /etc/OpenCL/vendors list? I tried installing the Intel CPU Runtime for OpenCL Applications for Windows OS and the Intel Graphics Technology driver package from the Intel site, but I do not know where I can find the proper OpenCL DLL to point to in the intel.icd file.
#2. In /etc/OpenCL/vendors I found a pocl.icd file pointing to cygpocl-2.dll. I assume this is the pthread library. But it seems to be running only a single thread, although I have 4 cores. Should I make any modification to run it in multiple threads? I debugged the code, and it seems that since only one device is found, it runs on only one thread. In the initialization function it sets the device_work_size property so that each device processes one strip of the final bmp. But as there is only one device, the whole bmp is processed in one run (one clEnqueueNDRangeKernel and one clEnqueueReadBuffer are called).
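The per-device strip split described above can be illustrated with a short Python sketch. This is hypothetical code, not the Mandelbrot program's own: `split_rows` divides the image height into one contiguous strip per device, so with a single device the whole image becomes one strip, which matches the single clEnqueueNDRangeKernel call observed.

```python
def split_rows(height, num_devices):
    """Split `height` image rows into one contiguous (start, count) strip per device.

    When the height is not evenly divisible, the first devices get one extra row.
    """
    if num_devices < 1:
        raise ValueError("need at least one device")
    base, extra = divmod(height, num_devices)
    strips = []
    start = 0
    for i in range(num_devices):
        count = base + (1 if i < extra else 0)
        strips.append((start, count))
        start += count
    return strips


# One device: the whole image is a single strip, i.e. a single kernel launch.
print(split_rows(1080, 1))
# Four devices: four strips that together cover every row exactly once.
print(split_rows(1080, 4))
```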
UPDATE
I have installed the Intel(R) Graphics – Windows* DCH Drivers, which installed the graphics drivers. I found intelocl64.dll (as /cygdrive/c/Program Files (x86)/Common Files/Intel/Shared Libraries/intel64/intelocl64.dll) and put this whole path into the /etc/OpenCL/vendors/intel.icd file. So far, so good. Now it cannot even find the pthread device... Bah...
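An .icd file in /etc/OpenCL/vendors is a plain text file whose first line names the vendor library the ICD loader should open. A minimal Python sketch to list what the loader would see can help spot a wrong or mistyped path; the function name `list_icd_entries` is hypothetical, and it only reads the files, it does not load anything.

```python
import os
from pathlib import Path

DEFAULT_VENDOR_DIR = "/etc/OpenCL/vendors"


def list_icd_entries(vendor_dir=DEFAULT_VENDOR_DIR):
    """Map each *.icd file name to the library name or path written on its first line."""
    entries = {}
    for icd in sorted(Path(vendor_dir).glob("*.icd")):
        lines = icd.read_text().splitlines()
        entries[icd.name] = lines[0].strip() if lines else ""
    return entries


if __name__ == "__main__":
    for name, library in list_icd_entries().items():
        # Bare names (e.g. cygpocl-2.dll) are resolved through the library search path,
        # so only absolute paths can be checked for existence here.
        exists = os.path.isabs(library) and os.path.exists(library)
        print(f"{name}: {library} (absolute path exists: {exists})")
```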
I don't think it is possible to use the GPU from Cygwin. I would recommend either building a native Windows binary (e.g. with Visual Studio or Intel DPC++) or using WSL. See https://github.com/intel/compute-runtime/blob/master/WSL.md for requirements.

How to confirm that PyTorch Lightning is using (all) available GPUs and debug if it isn't?

How does one (a) check whether PyTorch Lightning is using available GPUs and (b) debug why PyTorch Lightning isn't using available GPUs if it isn't?
For (a), monitoring, you can use the tool Glances, which should show that all your GPUs are in use (to enable GPU support, install it with pip install glances[gpu]). To debug resource usage (b), first check that your PyTorch installation can reach your GPUs, for example: python -c "import torch; print(torch.cuda.device_count())". If that prints the expected number, the setup itself should be fine...
You can also check whether the GPUs in your computer are used by running the command:
nvidia-smi
If none or only some of the GPUs are in use, Lightning is not using all of them (the converse is not always true: all GPUs being busy does not by itself prove that Lightning is the process using them).
Lightning also usually shows a warning when you are not using all of the GPUs, so check your log.
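To turn the nvidia-smi check into something scriptable, the following Python sketch queries per-GPU utilization in CSV form and lists GPUs that look idle. It assumes nvidia-smi is on the PATH and that you run it while training is in progress; the 5% threshold and the helper names are hypothetical choices, and fields such as "[N/A]" (reported by some GPUs) are treated as unknown.

```python
import shutil
import subprocess

# Query fields and output format are standard nvidia-smi options.
QUERY = ["--query-gpu=index,utilization.gpu,memory.used", "--format=csv,noheader,nounits"]


def _to_int(field):
    """Convert a CSV field to int; values such as '[N/A]' become None."""
    field = field.strip()
    return int(field) if field.isdigit() else None


def parse_gpu_usage(csv_text):
    """Map GPU index -> (utilization in percent, memory used in MiB)."""
    usage = {}
    for line in csv_text.strip().splitlines():
        index, util, mem = line.split(",")
        usage[int(index)] = (_to_int(util), _to_int(mem))
    return usage


def idle_gpus(usage, threshold=5):
    """Indices whose reported utilization is at or below `threshold` percent."""
    return sorted(i for i, (util, _) in usage.items() if util is not None and util <= threshold)


if __name__ == "__main__" and shutil.which("nvidia-smi"):
    proc = subprocess.run(["nvidia-smi", *QUERY], capture_output=True, text=True)
    if proc.returncode == 0:
        usage = parse_gpu_usage(proc.stdout)
        print("per-GPU usage:", usage)
        print("apparently idle GPUs:", idle_gpus(usage))
```

A GPU that stays near 0% for the whole run is a strong hint that Lightning is not using it; a busy GPU still needs the per-process list in plain nvidia-smi output to confirm which process owns the load.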

Do I need to recompile for another processor arc?

I am trying to understand this whole "compiling" topic in more detail than all those "what is a compiler (doing)?" articles out there.
One big question for me is processor and OS platform dependency when compiling directly to machine code (e.g. C). I'll try to formulate the concrete questions that need to be resolved to get a clearer picture:
I compile my C code with gcc on a Linux distribution:
Can I run the resulting executable on any other Linux distribution?
Is that executable bound to the processor platform it was compiled on? Do I need to search for another gcc, e.g. a PowerPC one, when I am running an x86 distro?
Can I somehow execute this on Windows? I know executables differ, but the binary code is the same, isn't it?
So in the end, my question is: does compiling target a specific OS platform, a processor platform, or both?
Thanks!
Compiling targets both the OS and the architecture.
The OS needs to be targeted because:
The executable file format differs between operating systems (e.g. ELF on Linux, PE on Windows).
Programs call the operating system even for common tasks like writing to the console, reading from a file, or terminating cleanly (standards like POSIX mitigate OS dependencies by defining a common layer between the program and the OS).
The CPU architecture must be targeted because the CPU instructions are different, even among different generations of the "same architecture".
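To see concretely that a compiled binary records both its format and its target CPU, a small Python sketch can read the first bytes of an executable. Linux executables use ELF, whose e_machine field (a 16-bit value at offset 18) names the architecture, while Windows executables start with an "MZ" header. The machine table below is a partial list, and `describe_executable` is a hypothetical helper.

```python
import struct
import sys

# A few ELF e_machine values (as defined in the ELF specification and <elf.h>).
ELF_MACHINES = {
    2: "SPARC",
    3: "x86 (i386)",
    20: "PowerPC",
    21: "PowerPC64",
    40: "ARM",
    43: "SPARC V9",
    62: "x86-64",
    183: "AArch64",
}


def describe_executable(header):
    """Identify the container format and, for ELF, the target CPU from the first bytes."""
    if header[:4] == b"\x7fELF" and len(header) >= 20:
        # Byte 5 (EI_DATA) gives the byte order: 1 = little-endian, 2 = big-endian.
        endian = "<" if header[5] == 1 else ">"
        # e_machine sits at offset 18 in both 32-bit and 64-bit ELF headers.
        (machine,) = struct.unpack_from(endian + "H", header, 18)
        return "ELF", ELF_MACHINES.get(machine, f"unknown ({machine})")
    if header[:2] == b"MZ":
        # Windows PE files start with a DOS "MZ" stub; the CPU type is stored further in.
        return "PE/MZ (Windows)", None
    return "unknown", None


if __name__ == "__main__":
    # Inspect the running Python interpreter's own executable.
    with open(sys.executable, "rb") as f:
        print(sys.executable, describe_executable(f.read(64)))
```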
Can I run the resulting executable on any other Linux Distribution?
In general, yes, but in specific cases it may depend on the type of program (e.g. GUI) and the services it assumes are available on the OS.
Is that executable bound to the processor platform it was compiled on? Do I need to search for another gcc, e.g. a PowerPC one, when I am running an x86 distro?
I'm not sure what you mean by "search", but yes, it is bound to the architecture, and you can cross-compile from, say, x86 targeting PPC.
Can I somehow execute this on Windows? I know executables differ but the binary code is the same, isn't it?
These days Windows has Ubuntu integration (the Windows Subsystem for Linux), which allows some exceptions, but the general answer is no, for the reasons above.

can gcc cross compile for different CPU?

Is it possible for gcc, installed on Fedora 16, to cross-compile for a different CPU, say SPARC?
I have built a certain understanding and need an expert to correct me if I am wrong. Different operating systems differ in the system calls they use to access the kernel, or entirely in the kernel they use. IS THIS CORRECT? Different kernels understand different system calls for accessing the underlying hardware. Binaries, executables, or programs are nothing but a bunch of system calls. Therefore every OS has its own executable; an executable meant to run on Windows would not run on Linux. By cross-compiling the source code of any Windows executable, we can generate executables for other OSs. The word PLATFORM means operating system. POSIX is a set of design standards for UNIX-like OSs.
We usually cross-compile for different OSs. But can we cross-compile for different hardware too? For example, for a microcontroller that does not have an OS?
No. You can't use the native-machine (x86) gcc to compile programs for a different architecture. For that you need a cross-compiler gcc that is specific to that processor architecture.
Your understanding of system calls is correct. Each OS has its own set of system calls, which are used by libraries. These libraries are ultimately translated into machine language for the processor.
Each processor architecture has its own set of instructions, known as its Instruction Set Architecture (ISA). So when a program written in a high-level language (like C) is compiled, it must be converted into machine language for that ISA. This job is done by the compiler (gcc). A given gcc build targets only one processor architecture; for example, the gcc installed on an x86 distribution generates x86 code. So if you want a compiler for a different processor on your x86 machine, you need a cross-compiler for that processor.
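You can ask any gcc binary which target it generates code for with its -dumpmachine option, which prints a GNU target triplet whose first component is the CPU architecture. The Python sketch below wraps that; the cross-compiler name sparc64-linux-gnu-gcc is an assumption (it follows the usual triplet-prefix naming of distribution cross-compiler packages) and may differ on your system.

```python
import shutil
import subprocess


def gcc_target(gcc="gcc"):
    """Return the target triplet a gcc binary generates code for, or None if unavailable."""
    path = shutil.which(gcc)
    if path is None:
        return None
    result = subprocess.run([path, "-dumpmachine"], capture_output=True, text=True)
    return result.stdout.strip() or None


def target_arch(triplet):
    """The first component of a GNU target triplet is the CPU architecture."""
    return triplet.split("-", 1)[0]


if __name__ == "__main__":
    # The native compiler, then a cross compiler named with the usual triplet prefix.
    for compiler in ("gcc", "sparc64-linux-gnu-gcc"):
        triplet = gcc_target(compiler)
        if triplet is None:
            print(f"{compiler}: not installed")
        else:
            print(f"{compiler}: {triplet} (arch: {target_arch(triplet)})")
```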
You would have to build such a version. That's part of the process of porting gcc to a new platform. You build a version that cross-compiles, then you cross-compile that version, then you test that version on the new platform, debug, rinse, and repeat.

Getting kernel version from linux kernel module at runtime

How can I obtain runtime information about which kernel version is running from inside Linux kernel module code (kernel mode)?
By convention, the Linux kernel module loading mechanism doesn't allow loading modules that were not compiled against the running kernel, so the "running kernel" you are referring to is most likely already known at module compilation time.
For retrieving the version string constant, older versions require you to include <linux/version.h>, others <linux/utsrelease.h>, and newer ones <generated/utsrelease.h>. If you really want more information at run time, the utsname() function from <linux/utsname.h> is the most standard run-time interface.
The implementation of the virtual /proc/version procfs node uses utsname()->release.
If you want to condition the code based on kernel version in compile time, you can use a preprocessor block such as:
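#include <linux/version.h>  /* defines LINUX_VERSION_CODE and KERNEL_VERSION() */

#if LINUX_VERSION_CODE <= KERNEL_VERSION(2,6,16)
/* code for kernels up to and including 2.6.16 */
#else
/* code for newer kernels */
#endif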
It allows you to compare against major, minor, and patch (sublevel) versions.
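The version macros are plain integer arithmetic: KERNEL_VERSION(a, b, c) packs the three components into one number, so comparisons between versions become integer comparisons. For illustration only, this user-space Python sketch (not kernel code) mirrors the macro and parses a release string of the kind utsname()->release and /proc/version report; `parse_release` is a hypothetical helper, and the clamp of the third component to 255 follows recent kernel headers.

```python
import os
import re


def kernel_version(a, b, c):
    """Python mirror of KERNEL_VERSION(a, b, c); recent headers clamp c to 255."""
    return (a << 16) + (b << 8) + min(c, 255)


def parse_release(release):
    """Turn a release string such as '5.15.0-91-generic' into (5, 15, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


if __name__ == "__main__" and hasattr(os, "uname"):
    # os.uname().release is the user-space view of utsname()->release.
    running = parse_release(os.uname().release)
    print("running:", running)
    print("newer than 2.6.16:", kernel_version(*running) > kernel_version(2, 6, 16))
```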
You can only safely build a module for one kernel version at a time, so asking from a module at runtime is redundant.
You can find this out at build time, for example by looking at the value of UTS_RELEASE, which in recent kernels is defined in <generated/utsrelease.h>; there are other ways of doing this as well.
Why can't I build a kernel module for any version?
Because the kernel module API is unstable by design as explained in the kernel tree at: Documentation/stable_api_nonsense.txt. The summary reads:
Executive Summary
-----------------
You think you want a stable kernel interface, but you really do not, and
you don't even know it. What you want is a stable running driver, and
you get that only if your driver is in the main kernel tree. You also
get lots of other good benefits if your driver is in the main kernel
tree, all of which has made Linux into such a strong, stable, and mature
operating system which is the reason you are using it in the first
place.
See also: How to build a Linux kernel module so that it is compatible with all kernel releases?
How to do it at compile time was asked at: Is there a macro definition to check the Linux kernel version?
