NOTE: I do not understand object files, linking, or makefiles very well; I only understand enough to get a program running.
I'm working on a GPU-accelerated version of a previous project of mine that works with no problems. I am currently testing a modified version of a makefile I have used for other CUDA programs.
The file:
exe: main.o b.o
    gcc -fopenmp -L /usr/local/cuda/lib64 -o exe main.o b.o -lcudart -lglfw -lGL
main:
    gcc -fopenmp -o main.o main.c -Ofast -march=native -mtune=native -lglfw -lGL -I /usr/local/cuda/include
b.o: b.cu b.h
    nvcc -Xcompiler -fPIC -ccbin clang-3.8 -c -o b.o b.cu
b.cu is a CUDA file containing some test functions; it does not affect anything yet.
When I run the compiled program, it only uses a single core and runs at roughly 1/4 of the expected frame rate (which is consistent with only one core of a 4-core CPU being used).
I've Googled as many questions as I can, but I have not found any results that work for me.
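In case it helps, the exact commands make actually runs for each target (as opposed to the rules as written above) can be printed with a dry run, which shows which flags each file is really compiled with:

# Print the commands make would execute for the 'exe' target without running them
make -n exe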
System info:
OS: Ubuntu 18.04 bionic
CPU: AMD A8-3850
GPU: GeForce GTX 1060 6GB
RAM: 7974MiB
GCC: 7.3.0
I am trying to cross-compile my application for an ARM-based system.
I have 2 libraries compiled in the following way:
$ gcc -shared --sysroot=$DIR_PATH -o $LIBPATH/libfoo.so foo.o
$ gcc -shared --sysroot=$DIR_PATH -o $LIBPATH/libbar.so bar.o
A third library is compiled:
gcc -shared -o $LIBPATH/libfoobar.so --sysroot=$DIR_PATH -L$LIBPATH -Wl,-rpath=$RUN_TIME_PATH foobar.o -lfoo -lbar
Then finally I compile a binary:
gcc -o app --sysroot=$DIR_PATH -L$LIBPATH -Wl,-rpath=$RUN_TIME_PATH app.o -lfoobar
However, when linking app I get:
warning: libfoo.so, needed by libfoobar.so, not found (try using -rpath or -rpath-link)
I believe you need to use -Wl,-rpath-link=$LIBPATH to tell the linker where to look to resolve runtime library references during the link operation.
More info can be found in the ld documentation: https://sourceware.org/binutils/docs-2.37/ld/Options.html
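For example, applied to the final link command above, the fix might look something like:

gcc -o app --sysroot=$DIR_PATH -L$LIBPATH -Wl,-rpath=$RUN_TIME_PATH -Wl,-rpath-link=$LIBPATH app.o -lfoobar

-rpath-link only affects where the linker searches while resolving libfoobar.so's own dependencies at build time; the runtime search path is still whatever -rpath embeds.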
I wrote a C++ extension for PyTorch that implements a custom convolution function.
First, I compiled this function directly with g++ for testing; the latency was 5 milliseconds.
Second, I integrated the function into PyTorch and installed the extension with setuptools, following the steps in the tutorial provided by PyTorch. However, the latency is now 16 milliseconds.
The function invocation itself only accounts for about 1-2 ms, so why does the performance differ so much?
The direct g++ compilation was done with
g++ -pthread -mavx2 -mfma ...
and the directives in the source file include
#pragma GCC diagnostic ignored "-Wformat"
#pragma STDC FP_CONTRACT ON
#pragma GCC optimize("O3","unroll-loops","omit-frame-pointer","inline") //Optimization flags
// #pragma GCC option("arch=native","tune=native","no-zero-upper") //Enable AVX
#pragma GCC target("avx")
These directives were also included in the file built by setuptools. The setup.py file is:
from setuptools import setup
from torch.utils.cpp_extension import CppExtension, BuildExtension

setup(
    name='cusconv_cpp',
    ext_modules=[
        CppExtension(name='cusconv_cpp', sources=['src/cusconv.cpp'],
                     extra_compile_args={'cxx': ['-O3', '-pthread', '-mavx2', '-mfma']})
    ],
    cmdclass={
        'build_ext': BuildExtension
    })
The build log output by setuptools is
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/max/.local/lib/python3.6/site-packages/torch/lib/include -I/home/max/.local/lib/python3.6/site-packages/torch/lib/include/torch/csrc/api/include -I/home/max/.local/lib/python3.6/site-packages/torch/lib/include/TH -I/home/max/.local/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/include/python3.6m -c src/indconv.cpp -o build/temp.linux-x86_64-3.6/src/indconv.o -O3 -pthread -mavx2 -mfma -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=indconv_cpp -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
which does indeed include my flags, but many other flags were added as well.
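If it matters, when several -O levels appear on one command line, GCC uses the last one, so the command above should still be compiling at -O3. The optimization settings that are actually in effect can be dumped with something like:

# Show which optimizations GCC enables for this combination of flags
g++ -O2 -O3 -mavx2 -mfma -Q --help=optimizers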
Does anyone have any ideas?
I need to build this Lua module so that I can use it in another application that already has the Lua core included. The module will be loaded via 'require'.
I'm using MinGW x64 on Windows 10. I successfully built Lua 5.2 with it, extracted the lua.dll file, and renamed it to liblua.dll.a.
Then I built the module using the following Makefile:
CC = x86_64-w64-mingw32-gcc
LUA_INCDIR=$(STAGING_DIR)/usr/include

utf8.dll: lutf8lib.o
    $(CC) -m64 -O -shared -fpic lutf8lib.c -o utf8.dll -llua

lutf8lib.o: lutf8lib.c
    $(CC) -O2 -fpic -c -DLUA_BUILD_AS_DLL lutf8lib.c -I$(LUA_INCDIR)
The problem is the file size: it's 420 KB and it definitely includes the Lua core (I got a 'multiple VMs' error). I need to build the module without including the core.
Previously I installed the usual MinGW (x86) and used the following Makefile:
CC = gcc
LUA_INCDIR=$(STAGING_DIR)/usr/include

utf8.dll: lutf8lib.o
    $(CC) -m32 -shared lutf8lib.c -o utf8.dll -llua

lutf8lib.o: lutf8lib.c
    $(CC) -fPIC -c lutf8lib.c -I$(LUA_INCDIR)
That gave a 97 KB file without the Lua core. Unfortunately, I specifically need an x64 file.
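For what it's worth, the DLLs each build imports at runtime can be listed with objdump (assuming an MSYS-style shell with grep available), which shows whether the module imports Lua from lua.dll or has the core linked in:

# Inspect the import table of the built module
x86_64-w64-mingw32-objdump -p utf8.dll | grep "DLL Name"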
UPD: I tried to build the same module using MSVC, but it seems that the IDE changes the luaopen_utf8 function name. If I add this to fix it:
int __declspec(dllexport)
then the IDE includes the Lua core VM in the DLL file. Again.
I'd like to know whether I can use different compilers for compiling and linking.
For example, I have two files, a.c and b.c.
I use clang to compile a.c and b.c:
clang -c a.c -o a.o
clang -c b.c -o b.o
and then use gcc to link the two .o files into a shared library:
gcc -lm -lz -shared a.o b.o -o libad.so
The .so file is generated successfully, but the app crashes when it uses this library.
Update:
More detailed information: what I have done is a cross-compile, and the target platform is armv7-a. I use the Android NDK and compile the code on a Mac, so the gcc is arm-linux-androideabi-gcc and the clang is arm-linux-androideabi-clang.
Unless special flags are specified at link time (-fuse-ld=xxx [1][2]), both clang and gcc invoke the system's default linker (Apple's ld64 on macOS, and usually GNU ld on Linux), so running the second command with either gcc or clang should produce the same linked binary.
[1] https://gcc.gnu.org/onlinedocs/gcc/Link-Options.html
[2] http://clang-developers.42468.n3.nabble.com/LLD-to-be-the-default-linker-in-Clang-td4053949.html
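For example, to pick the linker explicitly instead of relying on the default, you could do something like this (a sketch; the gold linker must be installed for -fuse-ld=gold to work):

# Compile with clang, link with gcc, forcing a specific linker
clang -c a.c -o a.o
clang -c b.c -o b.o
gcc -shared -fuse-ld=gold -o libad.so a.o b.o -lm -lz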
I need to use some Fortran code that was developed by a co-worker using Microsoft Visual Studio (it is a modified version of this groundwater flow model). However, I don't have an ifort license, and generally prefer to use open-source alternatives, so I am trying to transition to gfortran. After much googling and time on StackOverflow, I've successfully created a makefile, compiled the code, and run a simulation using gfortran. However, while benchmarking I found that the code I compiled with gfortran runs ~2.5x slower than my coworker's code compiled from Visual Studio (~30 seconds compared to ~13 seconds). Also, the size of the resulting .exe file from gfortran is about half that of Visual Studio.
I understand that there can be some differences in speed just due to switching compilers; however, the difference I observe seems extreme, leading me to believe there are certain compiler settings that differ between the two. I realize that the question, 'how can I optimize my Fortran code?' is vague and has been answered in detail elsewhere. Instead, my question is: What's the most efficient way to duplicate or approximate a compilation carried out using Microsoft Visual Studio (+ ifort) with gfortran?
My first instinct was to have my co-worker export a makefile from Microsoft Visual Studio (as described here), which I could then modify to work with gfortran by using equivalent settings where possible; however, it appears to be no longer possible to export a makefile from within Visual Studio. Second, I looked for a Fortran equivalent to MakeItSo, but didn't have any luck. I also had my co-worker send me screenshots of the Configuration Properties for Fortran in Visual Studio; however, there appear to be Visual Studio settings for which I cannot find clear analogs in gfortran (for example, the "Favor Size or Speed?" switch).
FWIW, I am running gfortran on Cygwin:
$ gfortran --version
GNU Fortran (GCC) 5.4.0
Copyright (C) 2015 Free Software Foundation, Inc.
After help in the comments section from @VladimirF, I have updated my makefile and can confirm that the flags are being successfully passed to gfortran:
$ make SUTRA
gfortran -O3 -march=native -ffixed-line-length-72 -m64 -c -o slake_mods.o slake_mods.F
gfortran -O3 -march=native -ffixed-line-length-72 -m64 -c -o fmods_3_0.o fmods_3_0.F
gfortran -O3 -march=native -ffixed-line-length-72 -m64 -c -o slake.o slake.F
gfortran -O3 -march=native -ffixed-line-length-72 -m64 -c -o slake_blas.o slake_blas.F
gfortran -O3 -march=native -ffixed-line-length-72 -m64 -c -o slake_linpack.o slake_linpack.F
gfortran -O3 -march=native -ffixed-line-length-72 -m64 -c -o slake_slatec.o slake_slatec.F
gfortran -O3 -march=native -ffixed-line-length-72 -m64 -c -o ssubs_3_0.o ssubs_3_0.F
gfortran -O3 -march=native -ffixed-line-length-72 -m64 -c -o usubs_3_0_non-Jeff.o usubs_3_0_non-Jeff.F
gfortran -O3 -march=native -ffixed-line-length-72 -m64 -c -o sutra_3_0.o sutra_3_0.F
gfortran slake_mods.o fmods_3_0.o slake.o slake_blas.o slake_linpack.o slake_slatec.o ssubs_3_0.o usubs_3_0_non-Jeff.o sutra_3_0.o -m64 \
-O3 -march=native -ffixed-line-length-72 -m64 -o SUTRA
My co-worker is using Intel Visual Fortran Compiler XE 12.0.2.154 on IA-32 with Microsoft Visual Studio 2008.
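In case it is useful, the rough ifort/gfortran flag equivalents I have been able to piece together so far (my own guesses, not taken from the Visual Studio project settings) are:

# /O2, /O3 (optimization level)            ->  -O2, -O3
# /QxHost (generate code for the host CPU) ->  -march=native
# /Qipo (interprocedural optimization)     ->  -flto
# /fp:fast (ifort's default fp model)      ->  roughly -ffast-math
# "Favor Size" (/Os)                       ->  -Os
FFLAGS = -O3 -march=native -flto -ffast-math -ffixed-line-length-72 -m64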