I am trying to compile FFTW3 to run on ARM NEON (more precisely, on a Cortex-A53). The build environment is x86_64-pokysdk-linux and the host environment is aarch64-poky-linux. I am using the aarch64-poky-linux-gcc compiler.
I used the following command at first:
./configure --prefix=/build_env/neon/neon_install_8 --host=aarch64-poky-linux --enable-shared --enable-single --enable-neon --with-sysroot=/opt/poky/2.5.3/sysroots/aarch64-poky-linux "CC=/opt/poky/2.5.3/sysroots/x86_64-pokysdk-linux/usr/bin/aarch64-poky-linux/aarch64-poky-linux-gcc -march=armv8-a+simd -mcpu=cortex-a53 -mfloat-abi=softfp -mfpu=neon"
The compiler did not support -mfloat-abi=softfp or -mfpu=neon, and it also did not let me specify the sysroot path this way.
I then used the following command:
./configure --prefix=/build_env/neon/neon_install_8 --host=aarch64-poky-linux --enable-shared --enable-single --enable-neon "CC=/opt/poky/2.5.3/sysroots/x86_64-pokysdk-linux/usr/bin/aarch64-poky-linux/aarch64-poky-linux-gcc" "CFLAGS=--sysroot=/opt/poky/2.5.3/sysroots/aarch64-poky-linux -mcpu=cortex-a53 -march=armv8-a+simd"
This command succeeded, producing this config.log and this config.h. I then ran make followed by make install, copied the resulting shared library into my host environment, and switched my code base to use fftwf_ instead of fftw_. The final step was to recompile the program. I ran a test and compared the times for both versions using <sys/resource.h>, and I called fftw[f]_forget_wisdom() for both so that the comparison would be fair. However, I am not getting a speedup, even though I believe using a SIMD architecture (NEON in our case) should accelerate the FFTW library.
I would really appreciate it if anyone could point out something I am doing wrong, so that I can try a fix and see whether I get the performance boost I am looking for.
Related
I have been trying to use gcc (trunk version) offloading but so far I am failing to do so. I compiled gcc following the instructions for OpenACC offloading with nvidia from this site: https://gcc.gnu.org/wiki/Offloading
I also compiled the host compiler following the instructions of the same website. However, I get an error when I try to compile anything with OpenACC enabled. To make sure I am using the right compiler I cd into the directory of the host compiler and I run this:
./g++ main.cpp -fopenacc -foffload=nvptx-none
But I get this error:
lto-wrapper: fatal error: problem with building target image for nvptx-none
compilation terminated.
/mnt/home/george/usr/local/gcc-7/bin/../lib/gcc/x86_64-pc-linux-gnu/7.0.0/../../../../x86_64-pc-linux-gnu/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status
Running ./g++ -v gives me the following:
Using built-in specs.
COLLECT_GCC=../g++
COLLECT_LTO_WRAPPER=/mnt/home/george/usr/local/gcc-7/bin/../libexec/gcc/x86_64-pc-linux-gnu/7.0.0/lto-wrapper
OFFLOAD_TARGET_NAMES=x86_64-intelmicemul-linux-gnu:nvptx-none
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-7-20161211/configure --prefix=/home/george/usr/local/gcc-7 --disable-multilib --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --target=x86_64-pc-linux-gnu --enable-offload-targets=x86_64-intelmicemul-linux-gnu=/home/george/usr/local/gcc-7-mic,nvptx-none=/home/george/usr/local/nvptx-tools/nvptx-none --with-cuda-driver=/usr/local/cuda-7.5
Thread model: posix
gcc version 7.0.0 20161211 (experimental) (GCC)
I would really appreciate it if someone could point me in the right direction as to what exactly is causing this error.
PS: I have also compiled gcc for Intel MIC offloading, but I don't care about that for now.
EDIT 1:
When I compile the host compiler, what should --enable-offload-targets=nvptx-none=XXX point to? The compiled nvptx or the accel compiler? Also, the nvptx-tools directory includes a bin directory and an nvptx-none/bin directory. Currently I point it to the latter.
It may sound weird, but I would like to know whether we can have a compiler on an embedded device (let's say gcc support on i.MX6).
Of course, it is not uncommon to have target tools, but it is not trivial. A non-native (from the host's perspective) compiler must be cross-compiled for the target architecture. You didn't provide any details, but maybe your build system can build target tools for you. Note that you need much more than just a compiler: you probably need make, autotools, and more, depending on what you are trying to compile on the target.
Your best bet would be to gain some proficiency using a cross-compiler environment. If you haven't already, you might check out the Yocto Project. It supports i.MX6 (and much more) and probably provides a path to getting target tools on your board.
Good luck!
For the ARM architecture it is easy to get a target compiler: Linaro Ubuntu, from the Linaro project, provides a complete solution for ARM. It can give you a GNOME desktop, a toolchain, and informative tools on your target.
You can get more info from the following link:
https://wiki.linaro.org/Platform/DevPlatform/Ubuntu
Yes, that should be easy enough. Check which version of cross-compiler you have on your machine, then download the matching gcc sources from here: https://ftp.gnu.org/gnu/gcc/
Now what you want to do is cross-compile the GCC you downloaded, using the cross-compiler you already have.
The following is an example of compiling 4.7.4. NOTE: set $BUILD and $HOST according to your platform:
./contrib/download_prerequisites
cd ..
mkdir objdir
cd objdir
../gcc-4.7.4/configure --build=$BUILD \
--host=$HOST \
--target=$HOST \
--prefix=/usr \
--disable-nls \
--enable-languages=c,c++ \
--with-float=hard
make -j $JOBS
make DESTDIR=<path_where_to_install> install
So I'm trying to build a cross-compiler toolchain off of the latest GCC (gcc-5.1.0). GCC requires GMP and so I downloaded GNU MP 6.0 (gmp-6.0.0).
Instructions for building GMP suggest (for my purpose) passing the parameter --enable-mpbsd, which is documented as follows:
The meaning of the new configure options:
--enable-cxx
This parameter enables C++ support
--enable-mpbsd
This builds the Berkeley MP compatibility library
However, when I run configure, it warns me:
configure: WARNING: unrecognized options: --enable-mpbsd
This suggests that the option was introduced in 5.x and deprecated again in 6.x, or replaced by something else ...?!
The exact command line I use is (just for completeness):
./configure --prefix=$PREFIX --enable-shared --enable-static --enable-mpbsd --enable-fft --enable-cxx --host=x86_64-pc-freebsd6
PS: for now I intend to disregard this warning and proceed anyway. I'll report back whether this still turns out to be a functional toolchain.
--enable-mpbsd
This builds the Berkeley MP compatibility library
This was potentially useful 20 years ago, but it hasn't been for a long time, which is why it was removed from GMP. Linux From Scratch is wrong to recommend the use of that option, it was never required (though it didn't hurt). Please contact them so they can update their instructions.
By the way, you do not need --enable-shared --enable-static --enable-fft, they are the default.
My distro (CentOS 6.3) comes with gcc 4.4.6. Since I wanted to try out the Fortran2003 features I decided to compile gcc 4.7.
I followed the steps I found online: first I compiled gmp, mpc, mpfr, ppl, and cloog separately, and then I compiled gcc.
I ran the configure script as:
configure --prefix=... --with-gmp=... --with-mpfr=... --with-mpc=... --program-suffix=-4.7 --enable-cloog-backend=isl --with-ppl=... --with-cloog=... --disable-multilib
This all worked, and I was able to build with make and make install.
Now, when trying my new compiler with a simple test program (a hello world kind of thing) I get the error:
gfortran-4.7 -o test test.F90
/home/amcastro/gcc-4.7/output/libexec/gcc/x86_64-unknown-linux-gnu/4.7.0/f951: error while loading shared libraries: libcloog-isl.so.1: cannot open shared object file: No such file or directory
So I decide to set LD_LIBRARY_PATH=/home/amcastro/gcc-4.7/output/lib
and then I can compile.
When running I get the error:
./test
./test: error while loading shared libraries: libquadmath.so.0: cannot open shared object file: No such file or directory
So I set LD_LIBRARY_PATH=/home/amcastro/gcc-4.7/output/lib:/home/amcastro/gcc-4.7/output/lib64
and now the program runs normally.
The question is: why does my distro's version of gcc (4.4.6) not need me to set LD_LIBRARY_PATH? How does the distro gcc know where to look for these dynamically linked libraries? Should I somehow link them statically?
I have also read that setting LD_LIBRARY_PATH is not a good idea. Is there another solution?
Thank you in advance
A.
I was thinking about using ccache with gcc-compiled code on a team-wide basis (the same ccache cache would be used by all developers on the same machine).
Since we're talking about a commercial product, "correctness" of compilation is a top priority.
Here are my questions:
Is compilation using ccache safe/reproducible?
Are there corner cases where ccache mistakenly reports a cache hit?
If I check out the source code and compile it, I expect to get the same products
(exactly the same libraries/binaries) each time I repeat a fresh compilation.
This is a must for a commercial product.
Are there open-source/commercial products that use ccache as an integral part of their
build system? That would make it easier to convince my colleagues to use ccache.
Thanks
According to its manual, ccache determines whether it has compiled some object before based on the following:
the pre-processor output from running the compiler with -E
the command line options
the real compiler's size and modification time
any stderr output generated by the compiler
If some PHB is still worried about any assumed risk you take because of ccache, only use it for development builds and build the final product using the compiler without any front-end. Or you could clear the cache before building the final product.
Update: I don't know about products using ccache as an integral part of their build system, but it is really trivial to integrate into any environment where you can set the compiler's path. I.e. for autoconf:
CC="ccache gcc" ./configure
And after looking at the author's name, I'd say it's a pretty safe assumption that it has been widely used within the Samba team.
Update in response to Ringding's comment about usage of stderr: From ccache's point of view, one interesting bit of information is the C compiler's version and configuration string. gcc outputs that to the standard error file:
$ gcc -v 2>err
$ cat err
Using built-in specs.
Target: i486-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.3.4-2' --with-bugurl=file:///usr/share/doc/gcc-4.3/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --enable-shared --enable-multiarch --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --with-gxx-include-dir=/usr/include/c++/4.3 --program-suffix=-4.3 --enable-clocale=gnu --enable-libstdcxx-debug --enable-objc-gc --enable-mpfr --enable-targets=all --with-tune=generic --enable-checking=release --build=i486-linux-gnu --host=i486-linux-gnu --target=i486-linux-gnu
Thread model: posix
gcc version 4.3.4 (Debian 4.3.4-2)
I'd bet that ccache uses this or a similar output. But, hey, you can always look at its source code. :-)
I am personally familiar only with ccache, which is very simple to use, and I find it extremely useful for my large-scale private projects.
However, as for a team wide base, I have no experience yet.
You may also be interested in AO (audited objects):
In general:
it provides a more robust mechanism and can use a distributed environment for caching
ccache speeds up only compilation, while AO speeds up linking too
it is not limited to C/C++
Not long after I posted this answer (1.5 years ago...), I managed to convince our build and R&D managers to integrate ccache into the automatic build system, and they are grateful to me for that. The company employs more than 200 developers, so it really works. As for the linking phase, it is still an issue.