Building TensorFlow with LTCG on Windows

I'm trying to build TensorFlow 1.14 on Windows using VS 2017 with LTCG (link-time code generation) enabled. I'm hitting this crash partway through the build:
external/bazel_tools/tools/def_parser/def_parser.exe bazel-out/x64_windows-opt/bin/tensorflow/contrib/layers/python/ops/_sparse_feature_cross_op.so.gen.def _sparse_feature_cross_op.so #bazel-out/x64_windows-opt/bin/tensorflow/contrib/layers/python/ops/_sparse_feature_cross_op.so.gen.def-0.params
ERROR: E:/tensorflow/tensorflow/contrib/layers/BUILD:22:1: DefParser tensorflow/contrib/layers/python/ops/_sparse_feature_cross_op.so.gen.def failed (Exit -1073741819): def_parser.exe failed: error executing command
My environment is:
TensorFlow version: 1.14 (no source edits). Retrieved from https://github.com/tensorflow/tensorflow.git, branch r1.14
Visual Studio version: VS 2017
Bazel version: 0.25.2
Steps:
set BAZEL_VC=C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\VC
set BAZEL_VC_FULL_VERSION=14.16.27023
set BAZEL_VS=C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise
python .\configure.py
<Use all of the defaults>
bazel build -s --config=opt --copt=/GL --linkopt=/LTCG //tensorflow/tools/pip_package:build_pip_package
I've tried various versions of Bazel (0.21, 0.26, 0.27, 0.28) and also tried VS 2019, changing the BAZEL_* environment variables accordingly, but I still hit the same error. I've also run the def_parser.exe command shown above locally, and it does crash with the -1073741819 exit code.
Has anyone had experience building TensorFlow with LTCG, or using Bazel with LTCG?

TF 1.14 requires Bazel 0.24.1; AFAIK it doesn't work with newer Bazel versions (>= 0.25).
I don't know what the problem could be, but I can tell you how to debug it.
You'll need to get Bazel 0.24.1's sources, add debug logging to the DEF parser, build Bazel from source, and use the resulting binary to build TensorFlow.
To do so:
1. Download the Bazel 0.24.1 release.
2. Download and extract the 0.24.1 sources, OR git clone Bazel's GitHub tree and check out the 0.24.1 tag.
3. Add debug logging / printf calls to third_party/def_parser/* as you see fit.
4. With the 0.24.1 release binary, run bazel build //src:bazel.exe in the patched source tree.
5. Use the resulting bazel-bin\src\bazel.exe to build TensorFlow.
6. If you need to add more debug logging, repeat steps 3..5.
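As a side note to the debugging steps above, the exit code in the error message can be decoded first. Windows exit codes are unsigned 32-bit values, and the build output shows this one as a signed integer, so -1073741819 corresponds to 0xC0000005, the NTSTATUS value for an access violation. A minimal sketch of the conversion follows; the helper name and the small lookup table are only illustrative, not part of Bazel or TensorFlow.

```python
# Minimal sketch: reinterpret a signed 32-bit exit code as an unsigned NTSTATUS value.
# The lookup table is deliberately tiny and only illustrative.
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
}


def decode_exit_code(code: int) -> str:
    """Return the exit code as 8-digit hex plus a symbolic name if known."""
    unsigned = code & 0xFFFFFFFF  # two's-complement reinterpretation
    name = KNOWN_NTSTATUS.get(unsigned, "unknown")
    return f"0x{unsigned:08X} ({name})"


print(decode_exit_code(-1073741819))  # prints 0xC0000005 (STATUS_ACCESS_VIOLATION)
```

An access violation suggests def_parser.exe is reading memory it should not, which fits the idea of it misparsing its input rather than failing on a missing file.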

I just tried to build TF 2.2 with MSVC 2019 v142 toolset (exact version 14.25.28610) with /GL and /LTCG options and I got the same error, but in a slightly different place. Here is my cmd line:
set BAZEL_VC=C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC
bazel build --config=opt --config=windows --compilation_mode=opt --strip=always --copt="/MT" --copt="/Oy" --copt="/fp:fast" --copt="/GL" --linkopt="/DEBUG:NONE" --linkopt="/LTCG" --linkopt="/NODEFAULTLIB:msvcrt.lib" --linkopt="/NODEFAULTLIB:vcruntime.lib" --local_ram_resources=6512 --subcommands //tensorflow/tools/lib_package:libtensorflow > out.log 2>&1
I had to specify the MSVC tools folder directly, because Bazel kept trying to use the older version, details in this issue.
This is where it fails:
SUBCOMMAND: # //tensorflow:tf_custom_op_library_additional_deps.dll [action 'DefParser tensorflow/tf_custom_op_library_additional_deps.dll.gen.def', configuration: e5dbf2de175ef0b99efae20c93576efaae21f61b49e23200be8ee726f25b19c6]
cd C:/users/roman.kruglov/_bazel_roman.kruglov/e5u6xdzn/execroot/org_tensorflow
SET PATH=C:\Program Files\Git\bin;C:\Program Files\Git\usr\bin;C:\WINDOWS;C:\WINDOWS\System32;C:\WINDOWS\System32\WindowsPowerShell\v1.0
SET PYTHON_BIN_PATH=C:/Users/roman.kruglov/AppData/Local/Programs/Python/Python37/python.exe
SET PYTHON_LIB_PATH=C:/Users/roman.kruglov/AppData/Local/Programs/Python/Python37/lib/site-packages
SET RUNFILES_MANIFEST_ONLY=1
SET TF2_BEHAVIOR=1
SET TF_CONFIGURE_IOS=0
SET TF_ENABLE_XLA=1
external/bazel_tools/tools/def_parser/def_parser.exe bazel-out/x64_windows-opt/bin/tensorflow/tf_custom_op_library_additional_deps.dll.gen.def tf_custom_op_library_additional_deps.dll #bazel-out/x64_windows-opt/bin/tensorflow/tf_custom_op_library_additional_deps.dll.gen.def-0.params
ERROR: C:/data_d/git/test/tensorflow/tensorflow/BUILD:750:1: DefParser tensorflow/tf_custom_op_library_additional_deps.dll.gen.def failed (Exit -1073741819)
My current conjecture is that this happens because, with /GL enabled, cl.exe produces object files in a different format. As specified here, .obj files produced with /GL are not available to linker utilities such as EDITBIN and DUMPBIN. I guess the DefParser tool just can't read that output. I'll try to rebuild without global optimization and share my findings.
I guess it's just not feasible.
P.S. Just a heads up: I tried to build with /GL and the related options numerous times with no luck, and I managed to build without them numerous times. There are several posts on the internet describing similar attempts, all failing with roughly the same symptoms.
Thus I conclude that my conjecture was right and that it's not possible to build TF with global optimizations, link-time code generation, etc. I guess the same holds for Linux, because using LTO there changes the object file format as well.
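The conjecture about the object format can be probed without Bazel. As far as I know, objects compiled with /GL start with an anonymous object header (first 16-bit word 0x0000, second 0xFFFF) instead of a regular COFF file header, whose first word is the machine type. Here is a minimal sketch of that check; the function names are made up for illustration. The same signature is also used by /bigobj objects and short import-library members, so a match is a hint rather than proof.

```python
import struct

# Machine values for regular COFF objects, taken from the PE/COFF specification.
COFF_MACHINES = {0x014C: "x86", 0x8664: "x64", 0xAA64: "arm64"}


def classify_obj_header(header: bytes) -> str:
    """Classify an object file from its first four bytes."""
    if len(header) < 4:
        return "too short"
    sig1, sig2 = struct.unpack_from("<HH", header, 0)
    if sig1 == 0x0000 and sig2 == 0xFFFF:
        # Anonymous object header: /GL objects, /bigobj objects and
        # short import-library members all start this way.
        return "anonymous object (possibly /GL)"
    if sig1 in COFF_MACHINES:
        return f"regular COFF ({COFF_MACHINES[sig1]})"
    return "unrecognized"


def classify_obj_file(path: str) -> str:
    """Read the first four bytes of a file and classify them."""
    with open(path, "rb") as f:
        return classify_obj_header(f.read(4))


print(classify_obj_header(b"\x00\x00\xff\xff"))
print(classify_obj_header(bytes.fromhex("64860300")))
```

Running classify_obj_file on one of the .obj files that feed the failing DefParser action, once with /GL and once without, should show whether the parser is being handed a format it was not written for.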

Related

Instrumenting a Rust executable for profiling

I'm trying to analyze the performance of a Rust executable on Windows, but cannot seem to instrument the executable image using the VSInstr.exe tool.
To get started I set up a binary cargo package (cargo new --bin sample), and enabled generation of a PDB by adding
[profile.release]
debug = true
to the standard Cargo.toml file. Executing cargo build --release produces both sample.exe and sample.pdb. Navigating to target/release and executing vsinstr sample.exe produces the following diagnostic (VS2017 and VS2019, respectively):
Error VSP1011: Unable to obtain debug information. Link with the /Profile linker switch.
Unable to obtain debug information. Link with the /PROFILE linker switch.
Fair enough, so let's add the following .cargo/config file:
[build]
rustflags = ["-C", "link-args=/PROFILE"]
to pass the required flag to the linker. Running cargo build --release again produces the same artifacts, sample.exe and sample.pdb. Attempting to instrument the image using vsinstr sample.exe now produces the following diagnostic (VS2017 and VS2019, respectively):
Error VSP1033: The file '<path>\sample.exe' does not contain a recognized executable image.
The file does not contain a recognized executable image.
I'm not sure how to proceed from here, or how to further diagnose the core issue.
Is there something wrong with what I'm doing?
Am I running into a limitation of rustc/Clang?
Most importantly, how do I fix the issue?
Notes:
The /PROFILE linker option is documented to be "available only in Enterprise (team development) versions". That information seems outdated. Using the option on a C++ application in Visual Studio 2019 Community had the desired effect.
I've verified that the /PROFILE flag gets passed to the linker by adding the -Z print-link-args flag to the rustflags in .cargo/config to see the actual linker command line.
Profiling any given Rust application in sample mode (CPU Usage) from the Visual Studio IDE works as long as the respective PDB is available. That's helpful, but I'd really like to get accurate call graphs, so instrumentation is required.
Alongside VSInstr.exe, Visual Studio also ships a binary called vsinstr.legacy.exe that apparently dumps out more information. Running this against the Rust executable produces a list of warnings that have the following shape:
Warning VSP2005: Internal instrumentation warning: Block start at 1400016D9 is inside instruction at 1400016D8. Removed.
The addresses in all messages differ exactly by 1, which looks an awful lot like an off-by-one error, presumably in the code that produces the PDBs.
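The off-by-one observation can be checked mechanically across the whole warning list. This is a minimal sketch, assuming the output of vsinstr.legacy.exe has been captured as text and every warning follows the shape shown above; the function name is made up for illustration.

```python
import re

# Matches the two hexadecimal addresses in a VSP2005 warning line.
WARNING_RE = re.compile(
    r"Block start at ([0-9A-Fa-f]+) is inside instruction at ([0-9A-Fa-f]+)"
)


def block_offsets(log_text: str) -> list:
    """Return (block start - instruction start) for every matching warning."""
    return [
        int(block, 16) - int(instr, 16)
        for block, instr in WARNING_RE.findall(log_text)
    ]


sample = (
    "Warning VSP2005: Internal instrumentation warning: "
    "Block start at 1400016D9 is inside instruction at 1400016D8. Removed."
)
print(block_offsets(sample))  # prints [1]
```

If set(block_offsets(full_log)) comes out as {1} for the complete log, the pattern holds for every warning and not just the ones I looked at.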

Building CMake from source for Visual Studio 2019

I get the following error when trying to build CMake 3.18 from https://github.com/microsoft/CMake:
gmake: *** No rule to make target '/home/ubuntu/Projects/CMake/Source/cmStringTable.cxx'
By the way, the file is not in any of the folders.
The system is Ubuntu arm64.
It is said that this version must be used because of the activated CMake Server mode.
I can build CMake from Kitware on the same machine without problems.
This is a known issue with Microsoft's fork: https://github.com/microsoft/CMake/issues/90
User "tinco" writes:
I fixed it for myself by not using bootstrap and instead using cmake to compile itself.
I think the fix for this is to remove mention of cmStringTable from the bootstrap script. A more complete fix would be to have the bootstrap script generate the components instead of having them hardcoded.
So you should use your system CMake to build Microsoft's fork instead of using their bootstrap script.
"It is said that this version must be used because of the activated CMake Server mode."
I wonder, though, who says this? The CMake Server mode was deprecated a while ago. Microsoft releases its own binaries as part of Visual Studio 2019's CMake tools for Windows. I am curious why you want to use this fork on Linux, rather than the upstream version.

Shared library under Windows and CMake: DLL not found before installation

The library mylib consists of the library proper, in directory lib/, and a test suite, in directory test/. It is completely under CMake control:
mylib/CMakeLists.txt:
...
add_subdirectory(lib)
add_subdirectory(test)
...
mylib/lib/CMakeLists.txt:
...
add_library(mylib ${src_files})
...
mylib/test/CMakeLists.txt:
...
add_executable(mytest mytest.c)
target_link_libraries(mytest mylib)
Build steps are:
mkdir build
cd build
cmake ..
make
ctest # or make test
make install
This works under Linux and has been stable for many years. Under Windows 10, though, a message window pops up, entitled "mytest.exe - System error": "The code execution cannot proceed because mylib.dll was not found. Reinstalling the program may fix this problem."
No, installing (rather than reinstalling) would not be a good solution: I need to first test the library before I install it (btw: this excludes most solutions proposed in response to somewhat similar questions).
Isn't CMake supposed to work cross-platform? What is the minimally invasive adjustment to make the above build steps work under Windows?
The right way of doing this on Windows is to populate the PATH environment variable for the test run:
set_tests_properties(your_test_name
  PROPERTIES
  ENVIRONMENT "PATH=path-containing-your-dll")
I believe you can use a generator expression if path-containing-your-dll depends on an artifact that you generate in your build.
Cherry on top: since CMake 3.13, the target property VS_DEBUGGER_ENVIRONMENT can also be set for a nicer debugging experience inside Visual Studio (e.g. being able to debug the application directly from Visual Studio instead of going through ctest).

Compile Boost on Windows XP

I am trying to compile the Boost library for Windows (as a prerequisite for building the Bitcoin client), using the MinGW compiler toolchain to do so (rather than Visual Studio) and running into errors.
Following various guides online, I have a working bjam application, and the boost_1_55_0 source files. I have tried in the windows shell doing:
path/to/bjam.exe toolset=gcc --build-type=complete stage (the instructions that Bitcoin provides), but get mingw.jam: No such file or directory errors
bootstrap mingw from a standard DOS shell runs successfully, but the .\b2 after emits a bunch of 'cl' is not recognized as an internal or external command, operable program or batch file errors, implying it's not really set up to use gcc/mingw, since it's calling for the Microsoft compiler.
bootstrap.sh --with-toolset=mingw from the MSYS prompt (as suggested here), which creates a log file that doesn't have as many errors, but running ./b2 afterwards leads to a mingw.jam no such file error and a mingw.init unknown error.
Downloading the compiled binaries from http://sourceforge.net/projects/boost/files/boost-binaries/1.55.0/ (boost_1_55_0-msvc-12.0-64.exe). After extracting and referring to the lib and header files, compiling the final executable throws a whole bunch of undefined reference to 'boost::system::generic_category()' for various boost features, implying to me the library files aren't actually containing the proper definitions? Is that because they're Visual Studio libraries?
Downloading the archives from http://www.boost.org/users/history/version_1_55_0.html (boost_1_55_0.7z), which the documentation implies comes with a pre-compiled lib dir, but does not in fact.
So, I'm banging my head on several walls at once. Can anyone help me get past any of these roadblocks?
I used the following steps to successfully build boost version 1.54 in a MinGW/MSYS environment:
Build bjam.exe and b2.exe:
boost_1_54_0\tools\build\v2\engine>build.bat mingw
Copy build tools to the root-directory:
cp boost_1_54_0\tools\build\v2\engine\bin.ntx86\*.exe boost_1_54_0
Run bjam.exe:
bjam --toolset=gcc --build-type=complete install > boost_build.log
I used this process with slight variations for various Boost versions, so it's a good guess it will work for 1.55 too.

Building Boost with MinGW64 without MASM

I tried to build the Boost library for native 64bit using MinGW64 compiler, but got some failures. Any pointers are appreciated. Thanks.
I got bjam.exe (b2.exe) compiled in 64-bit (with a warning) and used it to build Boost. I got the following error when building Boost.Context. (I wrote the commands in a batch file for repeatable builds.) Is there anything I missed?
Command: b2.exe install --prefix=%~dp0\bld\Boost.Build
Error: 'ml64' is not recognized as an internal or external command, operable program or batch file.
I read the documentation and it said:
Boost.Context must be built for the particular compiler(s) and CPU architecture(s)s being targeted. Boost.Context includes assembly code and, therefore, requires GNU AS for supported POSIX systems, and MASM for Windows systems.
So, is it possible to tell the bjam to use the as.exe included in my MinGW installation?
(As I have multiple MinGW installations, the location is not the standard C:\MinGW\bin.)
project-config.jam
import option ;
using gcc ;
option.set keep-going : false ;
Platform
Windows 7 x64
Boost 1.52.0 (source from sourceforge)
MinGW 4.7.2 (rubenvb x64)
No MSVC installation (no ml64.exe installed/found in my machine)
Edit Problems occurred when installing WDK
Warning when building BJam, I think it can be ignored
function.c: In function 'check_alignment':
function.c:222:5: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
Full batch
SET OPATH=%PATH%
SET BOOST_VER=boost_1_52_0
SET "PATH_ZIP=C:\Program Files\7-zip"
SET "PATH_MINGW=C:\MinGW\rubenvb-4.7.2-64"
SET "PATH_SRC=%~dp0\%BOOST_VER%"
SET "PATH_BJAM=%PATH_SRC%\tools\build\v2\engine"
TITLE Extracting Packages ...
IF NOT EXIST "%PATH_SRC%.7z" GOTO :err_nozip
RD /S /Q "%PATH_SRC%"
"%PATH_ZIP%"\7z x "%PATH_SRC%.7z"
TITLE Building BJam ...
PUSHD "%PATH_BJAM%"
SET "PATH=%PATH_MINGW%\bin"
SET "BOOST_JAM_TOOLSET_ROOT=%PATH_MINGW%\"
CALL build.bat mingw --show-locate-target
SET PATH=%OPATH%
COPY "bin.ntx86_64\b2.exe" "%PATH_SRC%\" > nul
COPY "bin.ntx86_64\bjam.exe" "%PATH_SRC%\" > nul
POPD
TITLE Installing Boost Build...
PUSHD "%PATH_SRC%"
ECHO import option ; > project-config.jam
ECHO. >> project-config.jam
ECHO using gcc ; >> project-config.jam
ECHO. >> project-config.jam
ECHO option.set keep-going : false ; >> project-config.jam
ECHO. >> project-config.jam
b2.exe install --prefix=%~dp0\bld\Boost.Build
POPD
SET PATH=%OPATH%
This is a known issue for building Boost >~1.51 with MinGW. At the moment, building Boost with MinGW is broken because Boost has a dependency on MASM (in your case ml64) when building Boost::Context for Windows, even with MinGW.
As a bodge you can get MASM from the Microsoft Website: http://www.microsoft.com/en-gb/download/details.aspx?id=12654 for a 32-bit version, or else the Windows Driver Kit for the 64-bit version: http://msdn.microsoft.com/en-us/windows/hardware/hh852365.aspx
Alternatively, you can use the patch provided on the Boost bug tracker (https://svn.boost.org/trac/boost/ticket/7262) to make Boost::Context compile with just MinGW, thus re-enabling cross-compilation of Boost. You can also read the responses by Boost's Olli on the subject. Don't expect anything to be fixed in Boost for a while at least!
Posting this answer here for the benefit of Google, because I've been struggling with this problem all day, and finally found a solution.
Boost context will fail to link under MinGW if built with MASM 6, because it produces the EXPORT symbol.
This manifests as undefined reference to `make_fcontext' even though the library is linked correctly.
Objdump on the resulting library gives make_i386_ms_pe_masm.o: File format not recognized.
The solution is to make sure you're using MASM 8.
You can download it at http://www.microsoft.com/en-us/download/confirmation.aspx?id=12654 - the installer will bitch about needing to have VC installed, but you can just bypass this by extracting the contents of the installer using a tool such as WinRAR; extract setup.exe and extract again to get a .cab, and extract a third time and rename the resulting binary file to ml.exe.
Then rebuild Boost with bjam --toolset=gcc --with-context -a stage.
Hopefully someone googling the same terms I've been googling all day will find this helpful.
According to Boost's requirements, you can find MASM64 in Microsoft's Windows Driver Kit (WDK).
I downloaded WDK 7 from Microsoft Download Center, and after installing it, I found ml64.exe in bin\x86\amd64. With that, I was able to successfully compile Boost 1.53.0.
(If this is relevant still) This happens when your build folders have msvc artifacts left in there. I'm assuming your project-config.jam was initially
import option ;
using msvc ;
and you had built for msvc, then changed to "using gcc". In that case you need to issue the following first:
bjam --clean
which should clear the artifacts from the msvc build. Then you can issue
bjam toolset=gcc variant=..... and so on and on
and things should be fine.
By the way, I saw you wrote that you have Windows 7 x64. Your bjam command needs to have address-model=64, otherwise 32-bit binaries will be produced...
A bit late maybe but I managed to compile boost-modular (the Git repository, so should be similar to 1.55 as of July 2014) on Windows 7, using MinGW and the WDK 7.
The steps I used were
install MinGW and Msys (bash etc) using mingw-get-setup (the easy way), add bin/ to path
install the Windows Driver Kit (for W7 I used WDK 7) -- GRMWDK_EN_7600_1.ISO
downloading the ISO image and extracting the files with WinRAR worked for me
the installer advises against installing the DSF, so skip that
add the directories of ML64.exe and ML.exe to the path (both required AFAIK)
C:\Windows\WinDDK\7600.16385.1\bin\x86\amd64;C:\Windows\WinDDK\7600.16385.1\bin\x86
open cmd.exe as administrator and start bash
in the parent dir of boost, run
git clone --recursive https://github.com/boostorg/boost.git boost > clone.log
exit bash, goto directory boost and run: bootstrap gcc
if that finishes w/o problems (if ML64.exe is found), run
b2 -a -d+2 -q --build-type=complete --build-dir=build toolset=gcc link=shared runtime-link=shared threading=multi
Without explicitly adding the ML(64) directories to the path, I still got the errors about ML.
Installing MASM is not the same as installing MSVC. I tried using different assemblers first, but Boost is not compatible with their output.
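Since the build only worked once both assemblers were on the path, it may be worth checking that before starting a long b2 run. A minimal sketch using Python's standard shutil.which follows; the function name is made up for illustration, and the check says nothing about whether the assemblers found are the right version.

```python
import shutil


def find_assemblers(names=("ml", "ml64"), path=None):
    """Map each assembler name to its resolved location, or None if not found.

    With path=None the current PATH environment variable is searched.
    """
    return {name: shutil.which(name, path=path) for name in names}


if __name__ == "__main__":
    for name, location in find_assemblers().items():
        print(f"{name}: {location or 'NOT FOUND on PATH'}")
```

On Windows, shutil.which also tries the extensions listed in PATHEXT, so looking up ml and ml64 should resolve ml.exe and ml64.exe when their directories are on the path.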
