Anaconda MKL/OpenBLAS and IPOPT - anaconda

IPOPT is not thread safe. In Anaconda Python I can opt out of the MKL optimizations using conda install nomkl. However, OpenBLAS is then installed automatically instead. I was wondering whether I might run into problems or wrong results because Anaconda still uses threaded versions of some underlying routines?

Generally, "A is not thread safe" means you should not use A from more than one thread of a single process. It does not mean that A cannot use threaded libraries such as MKL.
So this is not something you need to worry about. If you are still not sure, you could run some of the IPOPT tests/examples to see what happens.
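If you want to rule out threading effects while running those tests, one option is to pin the BLAS thread pools to a single thread before NumPy/SciPy (or anything else that loads BLAS) is imported. This is only a sketch and not part of the answer above: limit_blas_threads is a hypothetical helper name, and the three variables are the ones commonly read by OpenMP runtimes, OpenBLAS, and MKL at load time.

```python
import os

# Environment variables commonly read by OpenMP runtimes, OpenBLAS and MKL
# when they are loaded. They have to be set before the libraries start up.
_THREAD_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def limit_blas_threads(n=1):
    """Set the thread-count variables to n and return what was set (hypothetical helper)."""
    if n < 1:
        raise ValueError("thread count must be at least 1")
    applied = {}
    for var in _THREAD_VARS:
        os.environ[var] = str(n)
        applied[var] = os.environ[var]
    return applied


if __name__ == "__main__":
    # Call this before importing numpy/scipy or any IPOPT wrapper.
    print(limit_blas_threads(1))
```

If the results with one thread match the results with the default settings, threading in the BLAS backend is unlikely to be the source of a problem.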

Related

Why is Tensorflow GPU extremely slow when creating models and training models compared to the CPU version?

I would first like to give you some information about how I installed tensorflow and other packages before explaining the problem. It took me a lot of time to get tensorflow running on my GPU (Nvidia RTX 3070, Windows 10 system). First, I installed CUDA (v10.1), downloaded cuDNN (v7.6), and copied the cuDNN files into the correct CUDA installation folders (as described here: https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html#install-windows)
I want to use tensorflow 2.3.0 and checked whether the CUDA and cuDNN versions are compatible using the table on this page: https://www.tensorflow.org/install/source
Then I opened the Anaconda prompt window, activated my new environment (>> activate [MyEnv]) and installed the required packages. I read that it is important to install tensorflow first, so the first package I installed was tensorflow-gpu, followed by a bunch of other packages. Later, I ran into the problem that my GPU was not found when I typed in
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
The response was "Num GPUs Available: 0"
I did a lot of googling and found the following discussion:
https://github.com/ContinuumIO/anaconda-issues/issues/12194#issuecomment-751700156
where it is mentioned that a faulty tensorflow build is installed when using conda install tensorflow-gpu in the Anaconda prompt window. Instead (when using Python 3.8, as I do), one has to pass the correct tensorflow build in the prompt window. So I set up a new environment and used
conda install tensorflow-gpu=2.3 tensorflow=2.3=mkl_py38h1fcfbd6_0
to install tensorflow. So far so good. Now the cudatoolkit (version 10.1.243) and cudnn (version 7.6.5), which were missing in my first environment, are included in the tensorflow package and thus in my second environment [MyEnv2].
I start VSCode, select the correct environment, and retest if the gpu can be found by repeating the test:
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
And... it works. The GPU is found and everything looks good at first sight.
So what's the problem?
Using tensorflow on the GPU is extremely slow. Not only when training models, but also when creating the model with
model = models.Sequential()
model.add(...)
(...)
model.summary()
Running the same code sample on the CPU finishes almost immediately, whereas running the code on the GPU takes more than 10 minutes! (When I look at the Task Manager performance tab, nothing happens. Neither the CPU nor the GPU seems to do anything when I run the code on the GPU!) And this happens when just creating the model, without training!
After compiling the model and starting the training, the same problem occurs. Training on the CPU gives me immediate feedback about the epoch progress, while training on the GPU seems to freeze the program, as nothing happens for several minutes (maybe "freezing" is the wrong word, because I can still switch between the tabs in VSCode; the program itself is not freezing). Another confusing aspect is that when training on the GPU, I only get NaNs for the loss and MAE when the training finally starts after minutes of waiting. In the Task Manager I can observe that the model needs about 7.5 GB of VRAM. The RTX 3070 comes with 8 GB of VRAM. When I run the same code on the CPU, loss and MAE look perfectly fine...
I really have no idea what is happening and why I am getting this strange behaviour when I run tensorflow on my gpu. Do you have any ideas?
Nvidia RTX 3070 cards are based on the Ampere architecture, for which compatible CUDA versions start with 11.x.
You can upgrade tensorflow from 2.3 to 2.4, because that version supports the Ampere architecture.
So to benefit from your GPU, a compatible combination is TF 2.4, CUDA 11.0, and cuDNN 8.0.
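To see which CUDA and cuDNN versions an installed TensorFlow binary was built against, tf.sysconfig.get_build_info() returns a dictionary with that information. The following is a minimal sketch, assuming the dictionary carries the keys "is_cuda_build" and "cuda_version" with a dotted version string such as "11.0"; the exact keys and value formats have varied between releases and platforms, so treat the parsing as an assumption. meets_cuda_requirement and cuda_major are hypothetical helper names, and the "major version 11 or later" rule is just the requirement stated above turned into a check.

```python
import re


def cuda_major(version):
    """Return the major version from a dotted string like '11.0', or None if unparseable."""
    match = re.fullmatch(r"(\d+)(\.\d+)*", str(version))
    return int(match.group(1)) if match else None


def meets_cuda_requirement(build_info, min_cuda_major=11):
    """True if build_info describes a CUDA build with CUDA major >= min_cuda_major."""
    if not build_info.get("is_cuda_build", False):
        return False
    major = cuda_major(build_info.get("cuda_version", ""))
    return major is not None and major >= min_cuda_major


if __name__ == "__main__":
    try:
        import tensorflow as tf
        info = dict(tf.sysconfig.get_build_info())
    except (ImportError, AttributeError):
        print("TensorFlow with get_build_info() is not available here")
    else:
        print(info, "->", meets_cuda_requirement(info))
```

A result of False in an environment built against CUDA 10.1 would be consistent with the behaviour described in the question.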

How to run OpenMP in WSL?

I have some binaries built from C++ code with OpenMP support on a Linux system. I want to run them in WSL (Windows Subsystem for Linux) on my Windows 10 machine. I think I need to install some packages to support OpenMP. What do I need to install? Should I follow the Windows OpenMP installation routine and install the Intel C++ compiler? Or the Ubuntu routine and install a recent version of gcc? Maybe both? (Then my system disk may not have enough space.)
In case the question above is ill-posed, here it is in more detail. Suppose I have a binary that supports OpenMP. Is it necessary to install any package to use it? Could I just set the environment variable "OMP_NUM_THREADS" and run it directly? And if I have C++ code that I want to compile into a binary supporting OpenMP, what do I need to install under WSL?
I couldn't find an easy tutorial about OpenMP under WSL. Forgive my ignorance. Could you give me some suggestions? Thank you.

Tradeoffs for gcc configuration options when installing via brew

A coworker and I both have MacBook Pros with macOS 14.x installed. We are starting a project that uses Haskell. We ended up with sharply divergent results when installing the Haskell stack. His installation breezed through; my experience was quite different.
After some tribulations, it turned out that the root of my issues was essentially that the gcc linker was not happy, so it was changed to clang: https://stackoverflow.com/a/61032017/1056563. But then, why did the original settings using gcc work for him?
The primary suspect in my mind is a different set of options or installation mechanism for the gcc. Here is how I installed it:
brew install gcc --enable-cxx --enable-fortran --use-llvm
I am uncertain how he installed it, but I am guessing he used the default
brew install gcc
What, then, are the differences in behavior, and what gotchas would I run into if I were to uninstall brew and use the defaults? One is that one or more of my other packages would become unhappy, since the install options I used were copied from a package's instructions. I just do not happen to remember exactly which one had that stipulation. Some of the packages I have built from source, off the top of my head:
scientific python numpy/scikit-learn etc.
deep learning tf, pytorch
opencv
R and a bunch of R libraries
Is there any general guidance on most robust settings? Robust here meaning: will cover the widest swath of build-from-source requirements.
Update: My co-worker has determined the following:
I just confirmed that on my macbook I have system gcc (not from homebrew), which is a wrapper around clang
looks like installing gcc from homebrew might be contra-indicated in this case
So my question still stands - but this information sheds light on the discrepancies of behavior for haskell stack
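The co-worker's observation can be checked on any machine by looking at which gcc is first on the PATH and whether its version banner mentions clang. The sketch below is one way to do that from Python; looks_like_clang and describe_gcc are hypothetical helper names, and the check is a plain substring test on the banner, so it is a heuristic and not a definitive identification of the toolchain.

```python
import shutil
import subprocess


def looks_like_clang(version_output):
    """True if `gcc --version` style output mentions clang."""
    return "clang" in version_output.lower()


def describe_gcc(command="gcc"):
    """Return (resolved_path, is_clang) for the compiler command, or None if it is absent."""
    path = shutil.which(command)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    # Some toolchains print part of the banner on stderr, so look at both streams.
    return path, looks_like_clang(result.stdout + result.stderr)


if __name__ == "__main__":
    print(describe_gcc("gcc"))
```

Comparing the output on both MacBooks shows whether the two installations are actually invoking the same compiler and linker.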

Anaconda NumPy (SciPy stack) performance on Ryzen 3000 and windows

I have a new Ryzen CPU and ran into this issue. That is, the default Anaconda channel uses Intel MKL, and this cripples performance on Ryzen systems. If a NumPy build using OpenBLAS is used instead, it is much faster. The above example is on Ubuntu, but I need to achieve this on Windows as well.
To be more specific, I actually managed to install NumPy with OpenBLAS, but as soon as I try to install anything on top of it, like scikit-learn, it "downgrades" to MKL again.
What I'm looking for is install instructions for a "SciPy stack" Python environment on Windows using OpenBLAS.
EDIT:
This issue seems to be extremely annoying. Although a nomkl package has recently become available for Windows as well, it doesn't seem to take effect, as conda always installs the MKL version regardless. Even if I install from pip, conda will just overwrite it with an MKL version again the next time I install something, in my case another library which requires conda.
EDIT2:
As far as I can tell, for now the only "solution" is to install everything SciPy-related from PyPI (pip): numpy, SciPy, pandas, scikit-learn, possibly more. That is only really a solution if you need Anaconda for one specific package, which I do.
EDIT3:
So the MKL_DEBUG_CPU_TYPE=5 trick does indeed work. Performance with MKL is restored and is a bit better than with OpenBLAS.
I did a very basic test (see the link above) with a fixed seed, and the result is the same for MKL and OpenBLAS.
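A comparison like the one described can be sketched by running the same snippet in two fresh Python processes, one with MKL_DEBUG_CPU_TYPE=5 in its environment and one without, since the variable has to be set before MKL is loaded. run_python_with_env and the BENCHMARK snippet below are hypothetical and are not the test from the linked post; the matrix size and seed are arbitrary choices.

```python
import os
import subprocess
import sys


def run_python_with_env(code, extra_env):
    """Run `code` in a fresh Python process with extra environment variables set."""
    env = dict(os.environ)
    env.update(extra_env)
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


# Hypothetical benchmark: time one matrix product on data from a fixed seed.
BENCHMARK = """
import time
import numpy as np
rng = np.random.RandomState(0)
a = rng.rand(2000, 2000)
start = time.perf_counter()
a @ a
print(round(time.perf_counter() - start, 3))
"""

if __name__ == "__main__":
    try:
        print("default:", run_python_with_env(BENCHMARK, {}))
        print("MKL_DEBUG_CPU_TYPE=5:",
              run_python_with_env(BENCHMARK, {"MKL_DEBUG_CPU_TYPE": "5"}))
    except subprocess.CalledProcessError as exc:
        print("benchmark failed (is NumPy installed?):", exc.stderr)
```

The variable only matters for an MKL-backed NumPy; with an OpenBLAS build both runs should take about the same time.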

Building strace for an older Linux system that does not have a build environment

I have a bit of a problem. I need to use the strace utility to figure out why a command is crashing on an older Linux system. Unfortunately, I don't have strace nor do I have gcc/binutils on that system.
I tried building the app statically on a current Debian system, but calls to getpwnam require a dynamic load of the version of libc that was used at compile time. That would be fine, but since the utilities on the older system were all built using an ancient version of libc, putting a newer libc on that system breaks everything else.
Short of downloading and installing an old distribution of Linux and then doing the build, is there an easier way around this problem? The original distribution on this system is currently unknown, and the more I research it, the more it seems like a huge chicken-and-egg problem. Any tips would be much appreciated.
Using an outdated Linux system is never wise... can it be upgraded? If not, why not? What is failing, and how? Any chance of updating that?
There should be a file named /etc/release or similar, which should give you an idea of the distribution and version. Or uname -a might give a clue about the distribution. If that doesn't work, check whether commands like rpm, apt-get, or one of the other package management commands are available; that will narrow down the distribution. A Google search for some of the installed packages and their versions might help narrow down the version of the distribution.
Knowing the distribution and version, you may be able to get strace (and perhaps other needed packages). Many distributions keep archival versions around (at least of the original installation media for old versions).
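The checks described above can be gathered in one small script. The sketch below assumes a Python 3 interpreter is available on the old system, which may well not be the case; the same checks can be done by hand with ls /etc and the shell's command lookup. release_files, available_package_managers, and the list of package manager names are illustrative choices, not an exhaustive catalogue.

```python
import glob
import os
import shutil

# A few common package management commands; extend as needed.
PACKAGE_MANAGERS = ("rpm", "dpkg", "apt-get", "yum", "zypper", "pacman")


def release_files(etc_dir="/etc"):
    """Return {path: first line} for release/version style files under etc_dir."""
    found = {}
    for pattern in ("*release*", "*version*", "issue"):
        for path in sorted(glob.glob(os.path.join(etc_dir, pattern))):
            if not os.path.isfile(path):
                continue
            try:
                with open(path, errors="replace") as handle:
                    found[path] = handle.readline().strip()
            except OSError:
                continue  # unreadable file, skip it
    return found


def available_package_managers():
    """Return the package management commands found on the PATH."""
    return [name for name in PACKAGE_MANAGERS if shutil.which(name)]


if __name__ == "__main__":
    for path, first_line in release_files().items():
        print(path, "->", first_line)
    print("package managers:", available_package_managers())
```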
