zgetrf routine is very slow with gfortran 9.3 - performance

I have a C++ code (compiled with g++) that creates and fills a blitz++ matrix with complex double values (I call it lseMatrix). The matrix is then inverted using the LAPACK Fortran routine zgetrf, called as
zgetrf_(&n, &n, &((*lseMatrix)(0, 0)), &n, &(iPiv(0)), &info)
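(For reference, the zgetrf call, plus the zgetri call that completes the inversion, can be reproduced through SciPy's LAPACK bindings. This Python sketch is not the original C++/blitz++ code, but it is handy for timing zgetrf in isolation:)

import numpy as np
from scipy.linalg import lapack

n = 2000  # illustrative size only
a = np.random.rand(n, n) + 1j * np.random.rand(n, n)  # complex double matrix

lu, piv, info = lapack.zgetrf(a)      # LU factorization -- the slow step here
inv_a, info = lapack.zgetri(lu, piv)  # completes the inversion from the factors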
I have an old machine (Ubuntu 16.04, single processor) with g++ and gfortran (versions 5.4) and lapack/blas (v. 3.6.0-2ubuntu2) installed. The code runs perfectly fine there: it takes about 10 minutes to invert a relatively large matrix with zgetrf. However, when I run the code on my new machine (Ubuntu 20.04, two processors) with the newest versions of g++ and gfortran (v. 9.3) and the latest lapack/blas (3.9.0-1build1), the inversion takes 5 hours.
The following tests have already been made:
Running the code on the old and new machines without compiler optimisation flags. Result: the performance does not change.
Building a statically linked executable with the -static flag at the linking stage on the old machine and running it on the new machine. This partially solves the problem: the speed is then the same on both machines, but the program can sometimes crash unexpectedly.
Advice on possible solutions would be very much appreciated.

Thank you, Vladimir.
The problem is solved. I had installed BLAS on the new server but did not notice that the program was actually linked with OpenBLAS on my old server. Installing OpenBLAS completely solved the problem. An interesting point is that I cannot see during compilation whether I am linking against the reference BLAS or OpenBLAS.
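For anyone facing the same issue: which BLAS the binary actually resolves to can be checked after the fact with ldd. A small Python wrapper as a sketch ("myprog" is a placeholder for the real executable name):

import subprocess

# List the shared libraries the (placeholder) executable resolves at run time
out = subprocess.run(["ldd", "./myprog"], capture_output=True, text=True)
for line in out.stdout.splitlines():
    if "blas" in line.lower():
        print(line.strip())  # e.g. libopenblas.so.0 => /usr/lib/x86_64-linux-gnu/...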

Related

Why is Tensorflow GPU extremely slow when creating models and training models compared to the CPU version?

I would first like to give you some information about how I installed tensorflow and other packages before explaining the problem. It took me a lot of time to get tensorflow running on my GPU (Nvidia RTX 3070, Windows 10 system). First, I installed CUDA (v10.1), downloaded cuDNN (v7.6), and copied the cuDNN files into the correct CUDA installation folders (as described here: https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html#install-windows)
I want to use tensorflow 2.3.0 and checked whether the CUDA and cuDNN versions are compatible using the table on this page: https://www.tensorflow.org/install/source
Then I opened the anaconda prompt window, activated my new environment (>> activate [MyEnv]) and installed the required packages. I read that it is important to install tensorflow first, so the first package I installed was tensorflow-gpu, followed by a bunch of other packages. Later, I ran into the problem that my GPU was not found when I typed in
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
The response was "Num GPUs Available: 0"
I did a lot of googling and found the following discussion:
https://github.com/ContinuumIO/anaconda-issues/issues/12194#issuecomment-751700156
where it is mentioned that a faulty tensorflow build is installed when using conda install tensorflow-gpu in the anaconda prompt window. Instead (when using Python 3.8, as I do), one has to pass the correct tensorflow build in the prompt window. So I set up a new environment and used
conda install tensorflow-gpu=2.3 tensorflow=2.3=mkl_py38h1fcfbd6_0
to install tensorflow. So far so good. Now the cudatoolkit (version 10.1.243) and cudnn (version 7.6.5), which were missing in my first environment, are included in the tensorflow package and thus in my second environment [MyEnv2].
I start VSCode, select the correct environment, and retest whether the GPU can be found by repeating the test:
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
And... it works. The GPU is found and everything looks good at first sight.
So what's the problem?
Using tensorflow on the GPU is extremely slow. Not only when training models, but also when creating the model with
model = models.Sequential()
model.add(...)
(...)
model.summary()
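(The layers are elided above; a minimal self-contained stand-in with hypothetical layer sizes would look like this:)

import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential()
model.add(layers.Dense(64, activation='relu', input_shape=(10,)))  # hypothetical layer
model.add(layers.Dense(1))
model.summary()  # on a healthy install this prints in well under a second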
Running the same code sample on the CPU finishes almost immediately, whereas running it on the GPU needs more than 10 minutes! (When I look into the Task Manager performance tab, nothing happens. Neither the CPU nor the GPU seems to do anything when I run the code on the GPU!) And this happens when just creating the model, without any training!
After compiling the model and starting the training, the same problem occurs. Training on the CPU gives me immediate feedback about the epoch progress, while training on the GPU seems to freeze the program, as nothing happens for several minutes (maybe "freezing" is the wrong word, because I can still switch between the tabs in VSCode; the program itself is not frozen). Another confusing aspect is that when training on the GPU, I only get NaNs for the loss and MAE once the training finally starts after minutes of waiting. In the Task Manager I can observe that the model needs about 7.5 GB of VRAM; the RTX 3070 comes with 8 GB of VRAM. When I run the same code on the CPU, loss and MAE look perfectly fine...
I really have no idea what is happening and why I am getting this strange behaviour when I run tensorflow on my gpu. Do you have any ideas?
Nvidia RTX 3070 cards are based on the Ampere architecture, for which compatible CUDA versions start with 11.x.
You can upgrade tensorflow from 2.3 to 2.4, because that version supports the Ampere architecture.
So, to get the benefit of your GPU card, the compatible combination is TF 2.4, CUDA 11.0 and cuDNN 8.0.
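As a sanity check after upgrading, recent TensorFlow versions can report which CUDA/cuDNN the installed wheel was built against (a sketch; the exact dictionary keys may vary by version):

import tensorflow as tf

info = tf.sysconfig.get_build_info()
print("built with CUDA:", info.get("cuda_version"))    # expect 11.0 for TF 2.4
print("built with cuDNN:", info.get("cudnn_version"))  # expect 8 for TF 2.4
print("GPUs visible:", tf.config.list_physical_devices("GPU"))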

AVR-GCC Windows vs. Linux

I'm working on an Arduino project that was being developed on a Windows machine with the Arduino Builder. After having a release candidate, I started developing a Linux docker container that would automatically compile this project once it was pushed to my git remote. However, I noticed that the compiled binary from the Linux container is different from that of the Windows machine, in spite of the same Arduino version and compilation flags. Is this supposed to happen? Shouldn't AVR-GCC produce identical cross-compilation results?
Thanks in advance.
Ok, so after having dug deeper into this issue and confirming that the toolchains had the same versions (despite the different host OSs), I found out that on Linux I was not passing the linker the -flto flag (which enables the link-time optimizer), while I was doing so on Windows.
So all in all, there was indeed a configuration error on my part, and the code now runs smoothly when compiled on both host OSs.

Can perl v5.28.0 compiled from source (with gcc 4.8.5) run on RHEL 5.5?

My question is about whether it would be possible to run perl 5.28.0 compiled from source (with GCC 4.8.5 on CentOS 7) on RHEL 5.5 (Tikanga), where the GCC version is lower, as are other libs like libc, glibc, etc.
Our production environment is running a very old perl version (5.8.8) and, due to security concerns, it is under heavy lockdown, i.e. most of our servers lack make, gcc and related tools, and there is no root access available to anyone.
I was wondering if it would be possible to compile perl from source, i.e. the latest 5.28.0, with GCC 4.8.5 and try to use this compiled version on our production servers (with GCC 4.8.2).
This will save me tonnes of headaches with slow bureaucracy and I can get going with my project with the new tools.
Have not been able to find any discussion or hint about this subject. Can anyone shed some light?
Thank you in advance.
Update after 2 days:
As it turns out, Perl 5.28 compiled on RHEL 7 does not work on RHEL 5.5. You will have to compile it on RHEL 5.5 and make it relocatable for further usage on any server.
So I downloaded the RHEL 5.5 and CentOS 5.5 ISOs and ran into bootable-ISO issues:
1. Couldn't make a suitable bootable disk for either RHEL 5.5 or CentOS 5.5.
2. The RHEL 5.5 ISO was a single DVD image, and running file rhel5.5.iso at the command prompt showed it as bootable. I tried UNetbootin, the Rufus ISO creator and the dd command, created ISOs and tried them one by one, but couldn't get a boot menu to show. I tried FAT and NTFS filesystems while making the boot disk. Stuck here now.
3. The CentOS 5.5 ISO came in 8 pieces of 600 MB files. I had to create a single ISO image out of them, found some online procedure to do it and made one ISO file. I got a boot menu and it looked like it worked, but then it got hung up on some sort of source media check test and couldn't proceed further. I found an article with a fix (imprint the md5sum on the ISO and it should work), but it didn't.
4. Just now I found something on grokbase that mentions a new technique which could take me forward from the point of failure mentioned in point no. 3 above.
Edit: static compilation bypasses the problems you are cautious about. You need to figure out whether the result is suitable for your intended purposes.
Otherwise you contend with traditional compilation like you had planned. If the libc is too different, it won't work. You could certainly just go ahead and try, then you'll know for certain.
The real solution is to set up a copy of your production environment (can be in a virtual machine) and compile stuff there.
You could try PerlApp + ActivePerl from ActiveState.com (maybe as part of the PDK, Perl Development Kit). I've used it for many years. It compiles Perl source and included modules (compiled modules as well) into a .exe program file on Windows and a binary executable file on Linux. There is a paid version and a free/demo version. The paid version allows for cross-compilation and more versions of Perl, if I remember correctly.
You might run into trouble with differing versions of glibc/libc on the dev vs. prod computers, so try to use PerlApp on a CentOS 5.5 Linux (free) for compilation. CentOS 5.5 resembles RHEL 5.5 closely enough for most projects. Good luck.
Try perlbrew (an admin-free perl installation management tool).

Clang crashes on Windows 7 64-bit

I've been using clang successfully on Windows XP and Windows Vista using the 'experimental' builds for MinGW, but now that I try it on my new Windows 7 64-bit laptop, it simply crashes. Even if I just run "clang++" or "clang", it crashes, and I can't figure out how to get Windows to give me more detailed crash info (I will edit that in if I can). I've redownloaded clang and reinstalled MinGW, and I've tried running clang.exe in compatibility mode, but it still doesn't work. This is the first time I am using it on 64-bit; I hope that's not the issue (if it is, I still have another computer I can use).
I've looked around and can't find anyone else having this same problem of clang crashing before even giving any output or processing any input. I really feel lost.
This has now happened multiple times on various systems, and I have found the solution. Reinstall MinGW using the prepackaged files; the 'latest' ones have a tendency to be unstable in relation to clang. Make sure you haven't also installed a newer version of gcc on top of the MinGW installation, as that will cause issues too.

Compile an old fortran code

I need to compile an old (1992) Fortran code. This code ran on an SGI IRIX workstation and was originally compiled with f77. I get errors if I try to compile it with gcc (g77) on my MacBook Pro or in an Ubuntu virtual machine. I wonder if there is some way to get the old f77 compiler (a virtual machine, whatever). f2c does not work at all.
Unless it relies on some vendor extensions which modern compilers do not support (extremely rare), I don't see why compiling and building F77 code should present any problem at all. Fortran IV or older, maybe, but F77 (within reasonable bounds) should work without much trouble; for instance, modern gfortran accepts most legacy F77 code with the -std=legacy option.
However, it is hard to suggest anything but a mere list of available compilers without knowing anything about it: the details of the code, the errors that you got, the compile options ... (?)
Please post some more details.
I successfully compiled some Fortran 77 source code from ~1992 with the Open Watcom Fortran compiler:
http://www.openwatcom.org/
more fortran compilers:
http://www.fortran.com/compilers.html
