TensorFlow CPU performance: Anaconda vs custom build (cmake, on Windows)

I've built TensorFlow from the master branch (commit 8cb6e535) with the latest Visual Studio Community with AVX2 support, and compared the performance to the latest TensorFlow build provided by Anaconda (1.7.0), which was compiled to use SSE but not AVX ("Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2").
The result is that on my network architecture (Conv2D, MaxPool, ReLU, LSTM, Dense) the conda build is faster than the custom build:
batch size 32: 0.00302 s per sample (custom) vs 0.00237 s per sample (conda), i.e. the conda build is >20% faster
The result is the average over several thousand batches, omitting the first few batches to avoid warmup effects.
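For context, a minimal sketch of how such a per-sample timing can be collected under TF 1.x (the single Conv2D layer below is just a stand-in for the real network; only the averaging over many batches and the warmup skipping mirror the description above):

import time
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for the real network: one Conv2D layer.
x = tf.placeholder(tf.float32, [None, 64, 64, 1])
y = tf.layers.conv2d(x, 8, 3)

batch_size = 32
warmup_batches = 50
per_sample_times = []

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    batch = np.random.rand(batch_size, 64, 64, 1).astype(np.float32)
    for i in range(2000):                      # several thousand batches in the real measurement
        start = time.perf_counter()
        sess.run(y, feed_dict={x: batch})
        elapsed = time.perf_counter() - start
        if i >= warmup_batches:                # omit the first batches (warmup)
            per_sample_times.append(elapsed / batch_size)

print("mean per-sample time: %.5f s" % np.mean(per_sample_times))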
Some ideas:
the anaconda build uses MKL (for Eigen/TF)
a bug in the cmake script prevents the use of SIMD (and anaconda uses bazel to build), or one needs to do more to actually enable SIMD extensions in the cmake build (see below for how I built)
anaconda builds with cmake but some other cmake/compiler settings (/GL, /fp:fast, PGO?, ...)
the monolithic build is faster (could not find that option in the cmake build scripts)
Is there anyone who did a Windows build that is provably faster than the Anaconda build? Or has any idea what the Anaconda people did to get a fast build?
Thanks
Build Details
I'm using cmake to build Tensorflow on Windows, following the instructions here. Make sure swig and cmake are in your PATH environment variable. For Visual Studio 2017 the command to initialize the environment is "C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Auxiliary\Build\vcvarsall.bat" amd64.
I change /MP to /MP2 in "CMakeLists.txt" and "tf_core_kernels.cmake", otherwise I get "fatal error C1060: compiler is out of heap space" (see this issue on github, and occurrences of /MP in tensorflow on github).
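For illustration, the kind of edit involved looks like this (the exact variable and lines differ between CMakeLists.txt and tf_core_kernels.cmake; /MPn caps MSVC at n parallel compiler processes):

# before: /MP lets MSVC spawn one compiler process per core, which can exhaust the compiler heap
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /MP")
# after: limit to 2 parallel compiler processes
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /MP2")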
The cmake command is (omitting the path to swig, python, etc.):
cmake .. -A x64 -T host=x64 -DCMAKE_BUILD_TYPE=Release -Dtensorflow_WIN_CPU_SIMD_OPTIONS="/arch:AVX2"
Output (shortened):
-- Building for: Visual Studio 15 2017
-- Selecting Windows SDK version 10.0.16299.0 to target Windows 6.1.7601.
-- The C compiler identification is MSVC 19.13.26132.0
-- The CXX compiler identification is MSVC 19.13.26132.0
-- [...]
-- Found PythonInterp: C:/Users/%username%/AppData/Local/Continuum/miniconda3/python.exe (found version "3.6.5")
-- Found PythonLibs: C:/Users/%username%/AppData/Local/Continuum/miniconda3/libs/python36.lib (found version "3.6.5")
-- Found SWIG: C:/Users/%username%/Repos/swigwin/swig.exe (found version "3.0.12")
Then the python wheel was built with:
MSBuild /p:Configuration=Release tf_python_build_pip_package.vcxproj
and installed into the conda environment already containing the dependencies with pip install ....
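For reference, the install step looks roughly like this (the wheel path and filename are hypothetical and depend on the build output and Python version):

pip install <path-to-build-output>\tensorflow-1.8.0rc1-cp36-cp36m-win_amd64.whl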
Other Info
Doing:
print("Tensorflow version: ", tf.__version__)
print("Compiler version: ", tf.__compiler_version__)
print("Monolithic build: ", tf.__monolithic_build__)
for both builds yields:
Anaconda:
Tensorflow version: 1.7.0
Compiler version: MSVC
Monolithic build: 1
Custom:
Tensorflow version: 1.8.0-rc1
Compiler version: MSVC 191326132
Monolithic build: 0

Related

Cross-compiling Rust on Win10 for aarch64/Linux

I'm trying to cross-compile for a 64-bit ARMv8 / Raspbian (DietPi actually), from Windows, but I'm getting a series of issues with 3rd-party crates.
What I installed
rust toolchain 1.61.0
ARMv8 gcc toolchain (from here)
MS Visual Studio 2019 C++ build tools (from here)
(IntelliJ IDEA UE and the IntelliJ plugin - FYI but not really relevant to the question)
From there it is possible to add the required target - note that it's dependent on the ARM gcc toolchain (for example the 32-bit version is armv7-unknown-linux-gnueabihf):
rustup target add aarch64-unknown-linux-gnu
Then I edited %USERPROFILE%\.cargo\config and added those lines:
[target.aarch64-unknown-linux-gnu]
linker = "aarch64-linux-gnu-gcc.exe"
And finally, I added those to the PATH:
%USERPROFILE%\.cargo\bin
c:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\Common7\IDE\CommonExtensions\Microsoft\CMake\CMake\bin (for cmake)
c:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Current\Bin (for msbuild)
[ARMv8 gcc toolchain]\bin
How I cross-compiled
After making sure everything was compiling and running correctly for Windows, I tried to cross-compile:
cargo build -r --target=aarch64-unknown-linux-gnu
The problem I have
While this worked for simple applications, it quickly becomes clear that many crates fail to compile. For example, freetype-sys, which is a dependency of plotters that I'm using:
error: failed to run custom build command for `freetype-sys v0.13.1`
Caused by:
process didn't exit successfully: `D:\projects\rust\humidity\rh\target\release\build\freetype-sys-4feef64f7ae6c484\build-script-build` (exit code: 101)
--- stdout
[...]
running: "cmake" "[...]\\freetype-sys-0.13.1\\freetype2" "-DWITH_BZip2=OFF" "-DWITH_HarfBuzz=OFF" "-DWITH_PNG=OFF" "-DWITH_ZLIB=OFF" "-DCMAKE_INSTALL_PREFIX=D:\\projects\\rust\\hum
idity\\rh\\target\\aarch64-unknown-linux-gnu\\release\\build\\freetype-sys-3464f88f9fbe3bc0\\out" "-DCMAKE_C_FLAGS= -ffunction-sections -fdata-sections -fPIC" "-DCMAKE_CXX_FLAGS= -ffunction-sections -fdata-sections -fPIC" "-DCMAKE_ASM_FLAGS=
-ffunction-sections -fdata-sections -fPIC" "-DCMAKE_BUILD_TYPE=Release"
-- Building for: Visual Studio 15 2017
-- Selecting Windows SDK version 10.0.17763.0 to target Windows 10.0.19043.
-- Configuring incomplete, errors occurred!
See also "D:/projects/rust/humidity/rh/target/aarch64-unknown-linux-gnu/release/build/freetype-sys-3464f88f9fbe3bc0/out/build/CMakeFiles/CMakeOutput.log".
--- stderr
CMake Error at CMakeLists.txt:119 (project):
Failed to run MSBuild command:
MSBuild.exe
to get the value of VCTargetsPath:
Microsoft (R) Build Engine version 16.0.462+g62fb89029d for .NET Framework
Copyright (C) Microsoft Corporation. All rights reserved.
Build started 11/06/2022 11:53:19.
Project "D:\projects\rust\humidity\rh\target\aarch64-unknown-linux-gnu\release\build\freetype-sys-3464f88f9fbe3bc0\out\build\CMakeFiles\3.13.19031502-MSVC_2\VCTargetsPath.vcxproj" on node 1 (default targets).
c:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Microsoft\VC\v160\Microsoft.CppBuild.targets(378,5): error MSB8020: The build tools for Visual Studio 2017 (Platform Toolset = 'v141') cannot be found. To build usin
g the v141 build tools, please install Visual Studio 2017 build tools. Alternatively, you may upgrade to the current Visual Studio tools by selecting the Project menu or right-click the solution, and then selecting "Retarget solution". [D:\p
rojects\rust\humidity\rh\target\aarch64-unknown-linux-gnu\release\build\freetype-sys-3464f88f9fbe3bc0\out\build\CMakeFiles\3.13.19031502-MSVC_2\VCTargetsPath.vcxproj]
Done Building Project "D:\projects\rust\humidity\rh\target\aarch64-unknown-linux-gnu\release\build\freetype-sys-3464f88f9fbe3bc0\out\build\CMakeFiles\3.13.19031502-MSVC_2\VCTargetsPath.vcxproj" (default targets) -- FAILED.
A previous crate required the path to the 2019 MSBuild.exe, hence the extra PATH entries above, which solved that problem.
This one seems to require the MS VS 2017 build tools. This is getting desperate, so I think the problem is coming from something else.
What else I have tried
EDIT1:
I noticed that the Build Tools for Visual Studio 2017 (version 15.9) (here) include a cross-compiler to ARM64. So
I installed this version
launched the (somewhat hidden) vcvarsamd64_arm64.bat script to setup the environment
replaced the aarch64-linux-gnu-gcc.exe executable in %USERPROFILE%\.cargo\config with cl.exe, which is the MS compiler/linker (see the snippet after this list).
from the project directory, cargo clean
cargo build -r --target=aarch64-unknown-linux-gnu
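Based on the config shown earlier, that edit presumably looks like this:

[target.aarch64-unknown-linux-gnu]
linker = "cl.exe"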
It compiles much faster than the gcc toolchain, but it fails compiling the freetype crate:
Compiling freetype v0.7.0
error: could not find native static library `freetype`, perhaps an -L flag is missing?
Same result with the gcc toolchain and MS VC 2017.
Question: What exactly is required to cross-compile to this target? Am I missing something?
do I need to install several versions of VS build tools? I imagine they'll conflict if they're all in the PATH
do I need to install cmake separately, instead of using the one available in VS? (see PATH defined earlier with CMake)
is it simply not possible from Windows?
EDIT2: I'm starting to believe that the freetype crate, which hasn't been updated for a few years and is still at version 0.7.0, cannot be cross-compiled for some reason.
UPDATE: I worked around the problem by replacing plotters with something else. That removed the freetype dependency (this crate really has an issue) and allowed the cross-compilation to complete successfully.
I'm still interested in a solution to the problem, but it probably involves generating or finding the library for the target and finding a way to feed it to the compiler in the build flow, so it may be somewhat convoluted.
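For example, assuming an aarch64 build of freetype were available somewhere, it could in principle be passed to the linker via rustflags in %USERPROFILE%\.cargo\config — a hypothetical, untested sketch:

[target.aarch64-unknown-linux-gnu]
linker = "aarch64-linux-gnu-gcc.exe"
# hypothetical location of a freetype library cross-built for aarch64
rustflags = ["-L", "C:/cross-libs/aarch64/freetype/lib"]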

Cross-compiling Rust project for Cortex-M4 in Windows

I'm having a hard time cross-compiling an embedded Rust project for Cortex-M4 in Windows. Going through the Embedded Rust book, I understood that it is needed to install the necessary target and toolchain. I'm trying to do this as follows (in a Windows cmd session):
> rustup target add thumbv7em-none-eabihf
info: component 'rust-std' for target 'thumbv7em-none-eabihf' is up to date
> rustup toolchain install stable-thumbv7em-none-eabihf
info: syncing channel updates for 'stable-thumbv7em-none-eabihf'
info: latest update on 2021-05-10, rust version 1.52.1 (9bc8c42bb 2021-05-09)
error: target 'thumbv7em-none-eabihf' not found in channel. Perhaps check https://doc.rust-
lang.org/nightly/rustc/platform-support.html for available targets
I do not understand the above error message. I have checked the provided link, and it seems that Cortex-M4 is a "tier 2" target. I suspect that I have used the wrong toolchain prefix, e.g. "stable"?
Of course, if I skip the above and try to build the project with cargo build, it fails while looking for the wrong linker executable, i.e.:
error: linker `link.exe` not found
|
= note: The system cannot find the file specified. (os error 2)
note: the msvc targets depend on the msvc linker but `link.exe` was not found
note: please ensure that VS 2013, VS 2015, VS 2017 or VS 2019 was installed with the Visual C++ option
As a side note, the project builds fine on Linux and Macos.
Could someone shed some light on how to set up the toolchain and target correctly? Unfortunately, the Embedded Rust book does not dive into OS-specific toolchain installation.
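For what it's worth, the cortex-m-quickstart template used by the Embedded Rust book pins the target in .cargo/config.toml so that cargo stops defaulting to the host (MSVC) target — a minimal sketch, assuming the usual quickstart layout with cortex-m-rt as a dependency:

[build]
# build for the Cortex-M4F target by default instead of the host target
target = "thumbv7em-none-eabihf"

[target.thumbv7em-none-eabihf]
# use the linker script provided by cortex-m-rt
rustflags = ["-C", "link-arg=-Tlink.x"]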

The MinGW gfortran compiler is not able to compile a simple test program

Following this post, I'm trying to compile Elmer FEM on Windows using the MinGW compilers. However when running the
cmake -DCMAKE_C_COMPILER=C:\\MinGW\\bin\\gcc.exe -DCMAKE_CXX_COMPILER=C:\\MinGW\\bin\\g++.exe -DCMAKE_Fortran_COMPILER=C:\\MinGW\\bin\\gfortran.exe ..
command in the build folder I get the error:
-- Selecting Windows SDK version 10.0.17134.0 to target Windows 10.0.18363.
-- The Fortran compiler identification is unknown
-- Check for working Fortran compiler: C:\MinGW\bin\gfortran.exe
-- Check for working Fortran compiler: C:\MinGW\bin\gfortran.exe -- broken
CMake Error at C:/Program Files/CMake/share/cmake-3.16/Modules/CMakeTestFortranCompiler.cmake:45 (message):
The Fortran compiler
"C:\MinGW\bin\gfortran.exe"
is not able to compile a simple test program.
It fails with the following output:
Change Dir: C:/Users/foobar/Desktop/elmer/elmerfem/build/CMakeFiles/CMakeTmp
Run Build Command(s):C:/Program Files (x86)/Microsoft Visual Studio/2017/Community/Common7/IDE/devenv.com CMAKE_TRY_COMPILE.sln /build Debug /project cmTC_8d573 &&
Microsoft Visual Studio 2017 Version 15.0.28010.2050.
Copyright (C) Microsoft Corp. All rights reserved.
Some errors occurred during migration. For more information, see the migration report:
C:\Users\foobar\Desktop\elmer\elmerfem\build\CMakeFiles\CMakeTmp\UpgradeLog.htm
Invalid project
from here I tried adding the
set(CMAKE_TRY_COMPILE_TARGET_TYPE "STATIC_LIBRARY")
to the CMakeLists.txt file, running the cmd as admin from here, and from here tried uninstalling (from Chocolatey) and re-installing MinGW from the original website, to no avail. I would appreciate it if you could help me figure out what the problem is and how to solve it.
P.S. To solve the above issue one should use the command:
cmake -DCMAKE_C_COMPILER=C:/MinGW/bin/gcc.exe -DCMAKE_CXX_COMPILER=C:/MinGW/bin/g++.exe -DCMAKE_Fortran_COMPILER=C:/MinGW/bin/gfortran.exe -DCMAKE_MAKE_PROGRAM=C:/MinGW/bin/mingw32-make.exe .. -G "MinGW Makefiles"
But then there is a missing BLAS issue, which I'm trying to solve using MSYS2.
To build Elmer on Windows, you need MSYS2, as you pointed out (the Visual Studio error about an invalid project occurs because an Intel Fortran Visual Studio project was generated when running in cmd.exe, but the Intel Fortran Visual Studio extension is not installed). Use pacman to install Elmer's MSYS2 dependencies: cmake, openblas, qt5, qwt-qt5, and nsis (as of commit 442ea2000f87). See this script for all the commands required to install these dependencies. You can also run that script in MSYS2 to install all the required dependencies, build Elmer, and create a local Elmer install directory with executable Elmer binaries.
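A sketch of the corresponding pacman invocation from an MSYS2 MinGW 64-bit shell (package names are illustrative; the linked script has the authoritative list):

pacman -S --needed mingw-w64-x86_64-cmake mingw-w64-x86_64-openblas mingw-w64-x86_64-qt5 mingw-w64-x86_64-qwt-qt5 mingw-w64-x86_64-nsis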

error building tensorflow with cuda support on windows with bazel

I'm trying to compile TensorFlow with CUDA support on Windows 10 64bit via bazel.
This is how my system is set-up:
Windows 10 64bit
Nvidia GeForce 1050 with CUDA capabilities 6.1
CUDA Toolkit v8.0 -> C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0
cuDNN v6.0 -> C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0
bazel 0.7.0 (renamed as bazel.exe) -> C:\Users\eliam\bazel\0.7.0
MSYS2 64bit
TensorFlow master branch -> C:\Users\eliam\tensorflow
I've also already set these environment variables:
BAZEL_PYTHON=C:/Users/eliam/Miniconda3
BAZEL_SH=C:/msys64/usr/bin/bash.exe
BAZEL_VC=C:/Program Files (x86)/Microsoft Visual Studio/2017/BuildTools/VC
BAZEL_VS=C:/Program Files (x86)/Microsoft Visual Studio 14.0
CUDA_PATH=C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0
CUDA_TOOLKIT_PATH=C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0
LD_LIBRARY_PATH=C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0/lib/x64
PYTHON_BIN_PATH=C:/Users/eliam/Miniconda3/python.exe
PYTHON_PATH=C:/Users/eliam/Miniconda3/python.exe
PYTHONPATH=C:/Users/eliam/Miniconda3/python.exe
PYTHON_LIB_PATH=C:/Users/eliam/Miniconda3/lib/site-packages
PATH=C:\Users\eliam\bazel\0.7.0;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\lib\x64;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\include;%PATH%
Bazel is set up with all the steps required by its website (https://docs.bazel.build/versions/master/install-windows.html)
MSYS2 is set up with all the steps required by its website (http://www.msys2.org/)
I managed to complete configure.py without issues.
python ./configure.py
You have bazel 0.7.0 installed.
Do you wish to build TensorFlow with XLA JIT support? [y/N]:
No XLA JIT support will be enabled for TensorFlow.
Do you wish to build TensorFlow with GDR support? [y/N]:
No GDR support will be enabled for TensorFlow.
Do you wish to build TensorFlow with VERBS support? [y/N]:
No VERBS support will be enabled for TensorFlow.
Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 8.0]:
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 6.0]:
Please specify the location where cuDNN 6 library is installed. Refer to README.md for more details. [Default is C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.5,5.2]
Do you wish to build TensorFlow with MPI support? [y/N]:
No MPI support will be enabled for TensorFlow.
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:
Add "--config=mkl" to your bazel command to build with MKL support.
Please note that MKL on MacOS or windows is still not supported.
If you would like to use a local MKL instead of downloading, please set the environment variable "TF_MKL_ROOT" every time before build.
Configuration finished
After that I set some other environment variables with the following command:
set BUILD_OPTS='--cpu=x64_windows_msvc --host_cpu=x64_windows_msvc --copt=/w --verbose_failures --experimental_ui --config=cuda'
in order to prevent this error:
$ bazel build -c opt --config=cuda --verbose_failures --subcommands //tensorflow/cc:tutorials_example_trainer
..............
WARNING: The lower priority option '-c opt' does not override the previous value '-c opt'.
____Loading package: tensorflow/cc
____Loading package: @local_config_cuda//crosstool
____Loading package: @local_config_xcode//
ERROR: No toolchain found for cpu 'x64_windows'. Valid cpus are: [
k8,
piii,
arm,
darwin,
ppc,
].
____Elapsed time: 10.196s
Then I start bazel build, using the following command
bazel build -c opt $BUILD_OPTS //tensorflow/tools/pip_package:build_pip_package
This is where the problems begin. This is a link to the complete log.
Any idea why?
The important part of the log is this:
ERROR: C:/msys64/home/eliam/tensorflow/tensorflow/stream_executor/BUILD:52:1: C++ compilation of rule '//tensorflow/stream_executor:cuda_platform' failed (Exit 2).
tensorflow/stream_executor/cuda/cuda_platform.cc(48): error C3861: 'strcasecmp': identifier not found
tensorflow/stream_executor/cuda/cuda_platform.cc(50): error C3861: 'strcasecmp': identifier not found
tensorflow/stream_executor/cuda/cuda_platform.cc(52): error C3861: 'strcasecmp': identifier not found
Target //tensorflow/cc:tutorials_example_trainer failed to build
tensorflow/stream_executor/cuda/cuda_platform.cc(48) has strcmp in it.
The compiler complains about strcasecmp, therefore something must be #define'ing strcmp to strcasecmp. Whatever the case, could you run the build with --verbose_failures? That'll show the command Bazel was executing. That may hint at what's happening.
Also, I see this in your envvars:
BAZEL_VC=C:/Program Files (x86)/Microsoft Visual Studio/2017/BuildTools/VC
BAZEL_VS=C:/Program Files (x86)/Microsoft Visual Studio 14.0
You only need to set one of these. I recommend keeping BAZEL_VC since that points to a newer compiler. I admit I don't know what happens when both envvars are defined, whether Bazel prefers one to the other. But I do know it works fine with just one of them defined.
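In other words, something like the following (reusing the value already shown above) should be enough:

rem clear BAZEL_VS and keep only BAZEL_VC
set BAZEL_VS=
set BAZEL_VC=C:/Program Files (x86)/Microsoft Visual Studio/2017/BuildTools/VC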

CMake cannot find newer CUDA package?

I have both CUDA versions 7.5 and 8.0 installed, but CMake seems to only be able to find the 7.5 version. Running this code:
find_package(CUDA 8.0 REQUIRED)
Gives this error:
CMake Error at P:/Program Files/CMake/share/cmake-3.9/Modules/FindPackageHandleStandardArgs.cmake:137 (message):
Could NOT find CUDA: Found unsuitable version "7.5", but required is at
least "8.0" (found C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v7.5)
Even though v8.0 is in the same directory as v7.5. Is this a problem with cmake, or am I doing something wrong here?
No matter how many CUDA toolkits you have installed, find_package(CUDA) finds the one whose nvcc (typically located in <CUDA root dir>/bin) is in the environment variable $PATH. If there are several nvcc in $PATH, it will pick up the first one. On Windows, the installer typically adds the relevant environment variables automatically, so the version found depends on the order of installation.
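A quick way to check which nvcc wins is to list them in PATH order from a command prompt; the first entry is the one find_package(CUDA) will use:

where nvcc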
You should not be using find_package(CUDA) anymore as CMake now has first-class support for CUDA.
For details check:
CMake documentation for FindCUDA
First few paragraphs of the header comment in Modules/FindCUDA.cmake
What are PATH and other environment variables, and how can I set or use them?
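A minimal sketch of that first-class support (CMake >= 3.8), which avoids FindCUDA entirely:

cmake_minimum_required(VERSION 3.9)
project(example LANGUAGES CXX CUDA)   # CUDA enabled as a first-class language
add_executable(example main.cu)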
You could feed CMake with the path to CUDA explicitly, by setting the CUDA_TOOLKIT_ROOT_DIR flag from CMake command line:
cmake -DCUDA_TOOLKIT_ROOT_DIR=<path-to-cuda-8.0>.
CUDA version detection is done by CMake's FindCUDA module:
https://cmake.org/cmake/help/v3.0/module/FindCUDA.html
It's possible that for some reason the FindCUDA search fails to locate the CUDA 8.0 you have installed.
It might be that CUDA_BIN_PATH is set to 7.5, and therefore CMake picks that.
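If that is the case, pointing it at the 8.0 install before configuring (or passing CUDA_TOOLKIT_ROOT_DIR as above) should fix the lookup — for example, from the same command prompt that runs cmake:

set CUDA_BIN_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0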
