I'm trying to get the SuperLU sparse matrix solver to run, but I can't seem to compile it. I'm writing my program in Fortran, so I'm trying to call SuperLU from my program. I'm using the g95 compiler for Fortran.
http://crd.lbl.gov/~xiaoye/SuperLU/#superlu
How can I compile this so that I can call it from Fortran? I tried, and it said "error cannot exec 'cc1', no file found". I don't care which Fortran compiler I use, just any way I can call SuperLU from Fortran.
I don't know much about linking Fortran with C/C++ programs; what I did is: g95 -o test f77_main.f hbcode1.f c_fortran_dgssv.c
I recommend using the Fortran ISO C Binding for calling C routines from Fortran, or vice versa. This binding, which is part of Fortran 2003 but has been widely available for several years, tells the Fortran compiler to use C calling conventions. It is part of the language and is therefore compiler- and platform-independent. Besides previous answers here, there are code examples in the gfortran manual under "Mixed-Language Programming". There are also many examples of Fortran interfaces for calling the C routines of the GNU Scientific Library at http://www.lrz.de/services/software/mathematik/gsl/fortran/index.html.
Regarding how to compile and link a mixed Fortran and C program: it is generally easier to use the Fortran compiler for the linking step, because it brings in the extra Fortran runtime libraries. So proceed in two steps: compile your C routines to object files, then compile the Fortran routines and link them together with the pre-compiled C objects. If using C++, use extern "C" to make it compatible with C. So, for example:
gcc -c MyCRoutine.c
gfortran FortranMain.f95 MyCRoutine.o
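For example, here is a minimal sketch of what MyCRoutine.c could contain (the routine add_one and its arguments are made up for illustration), with the matching Fortran 2003 interface shown in the comment:
/* MyCRoutine.c -- a minimal sketch (add_one is a hypothetical routine).
 * On the Fortran side you would declare, inside an interface block:
 *   subroutine add_one(n, x) bind(C, name="add_one")
 *     use iso_c_binding, only: c_int, c_double
 *     integer(c_int), value :: n
 *     real(c_double)        :: x(n)
 *   end subroutine add_one
 */
void add_one(int n, double *x)      /* plain C linkage, no name mangling */
{
    for (int i = 0; i < n; ++i)
        x[i] += 1.0;                /* Fortran passes the array by reference */
}
This compiles and links exactly as in the two commands above.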
A good start to solving your problem can be found in a previous answer here.
It looks a bit strange that you are passing a .c file to g95. You should compile the C file with GCC, then link the resulting .o file against your compiled Fortran code.
According to this link, there are no predefined preprocessor symbols for AVX512 in MSVC 2017.
I'm trying to build thundersvm, which uses the Eigen library, on (you guessed it) Windows. Both Eigen and thundersvm use CMake, and depending on the compiler preprocessor symbols, Eigen either compiles with AVX512 instructions or not.
It seems that using /arch:AVX512 doesn't trigger any errors in MSVC, but it doesn't define the __AVX512F__ symbol which Eigen needs. I also tried to include -D__AVX512F__=ON in the CMake arguments, but still no luck.
Since there is no predefined preprocessor symbol for AVX512, is there any way to force Eigen to compile with AVX512?
Update
Following chtz's comment, I checked out the default branch of Eigen and recompiled thundersvm with /arch:AVX512 and these CMake arguments (maybe not all are needed):
-DUSE_CUDA=OFF -DUSE_EIGEN=ON -DBUILD_SHARED_LIBS=OFF -DEIGEN_ENABLE_AVX512=ON -D__AVX512F__=ON -DEIGEN_VECTORIZE_AVX512=ON -DEIGEN_VECTORIZE_AVX2=ON -DEIGEN_VECTORIZE_AVX=ON -DEIGEN_VECTORIZE_FMA=ON
Comparing the instruction mix from Intel's SDE -mix tool before and after the patch, I can clearly see that AVX instructions are used (SDE complains that it doesn't recognise the instruction vbroadcastss zmm0, xmm0 when running for the skl CPU, but it works fine for skx). The problem is that MSVC uses the scalar version of AVX and there is no improvement in the runtime (the total number of instructions is also the same), which is similar to this post.
Are there other flags I need to define so that MSVC generates non-scalar instructions? (I think I'll also give gcc a try.)
MSVC has poor support for AVX-512 and makes no distinction between the different subsets. There is no safe way to produce AVX512F code on MSVC without also possibly emitting AVX512DQ instructions.
The best compilers for AVX-512 are gcc and clang. There is a Clang plugin for Visual Studio that you can use if you like the IDE. The gcc and clang compilers define preprocessor symbols such as __AVX512F__, __AVX512VL__, etc.
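A quick way to check what a given compiler and flag combination actually enables is a small test on those feature macros. Here is a minimal sketch (it uses only the standard macros and AVX-512F intrinsics from <immintrin.h>, nothing Eigen-specific):
#include <stdio.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

int main(void)
{
#if defined(__AVX512F__)
    /* Defined by gcc/clang with -mavx512f or -march=skylake-avx512, so
     * 512-bit code paths such as Eigen's will be compiled in. */
    __m512d v = _mm512_set1_pd(1.0);             /* broadcast 1.0 into a 512-bit register */
    double out[8];
    _mm512_storeu_pd(out, _mm512_add_pd(v, v));  /* 8 doubles added in one instruction */
    printf("__AVX512F__ defined, 1.0 + 1.0 = %f\n", out[0]);
#else
    /* A compiler that does not define __AVX512F__ (as described in the
     * question for MSVC 2017 with /arch:AVX512) lands here. */
    printf("__AVX512F__ not defined: no AVX-512 code paths\n");
#endif
#if defined(__AVX512VL__)
    printf("__AVX512VL__ defined\n");
#endif
    return 0;
}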
This question already has an answer here: Relocation error when compiling NASM code in 64-bit mode (1 answer). Closed 4 years ago.
I made a very simple single-stage bootloader that does two main things: it switches from 16-bit real mode to 64-bit long mode, and it reads the next few sectors from the hard disk, which contain the basic kernel.
For the basic kernel, I am trying to write code in C instead of assembly, and I have some questions regarding that:
How should I compile and link the nasm file and the C file?
When compiling the files, should I compile for 16-bit or 64-bit, since I am switching from 16 to 64 bits?
How would I add more files from either C or assembly to the project?
I rewrote the question to make my goal clearer, so if source code is needed, tell me and I will add it.
Code: https://github.com/LatKid/BasicBootloaderNASMC
Since I am also linking a NASM file with the C file, it spits out an error for the NASM object file, which is: relocation R_X86_64_16 against `.text' can not be used when making a shared object; recompile with -fPIC
One of your issues is probably inside that NASM assembler file (which you don't show in the initial version of your question). It should contain only position-independent code (PIC), so it must not produce an object file with a relocation like R_X86_64_16. (In your edited question, mov sp, main is obviously not PIC; you should use the instruction-pointer-relative addressing of x86-64. You also cannot define main both in your NASM file and in a C file, and you cannot mix 16-bit and 64-bit code when linking.)
Study ELF, then the x86-64 ABI to understand what kind of relocations are permitted in a PIC file (and what constraints an assembler file should follow to produce a PIC object file).
Use objdump(1) & readelf(1) to inspect object files (and shared objects and executables).
Once your NASM code produces a PIC object file, link with gcc and use gcc -v to understand what happens under the hood (you'll see that extra libraries and object files, including the crt0 ones, -lgcc and -lc, are used).
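As for the C side of the kernel, here is a minimal freestanding sketch (the file and function names are made up, not taken from the linked repository); the NASM stage would jump to kmain once long mode is set up:
/* kernel.c -- a minimal freestanding sketch (hypothetical names).
 * Compile without the hosted runtime, e.g.
 *   gcc -ffreestanding -mno-red-zone -c kernel.c
 * (the exact flags depend on how you link, as discussed in this answer).
 */
#include <stdint.h>

void kmain(void)
{
    /* Write directly into the VGA text buffer at 0xB8000:
     * low byte = character, high byte = colour attribute. */
    volatile uint16_t *vga = (volatile uint16_t *)0xB8000;
    const char *msg = "Hello from the C kernel";

    for (int i = 0; msg[i] != '\0'; ++i)
        vga[i] = (uint16_t)((0x0F << 8) | (uint8_t)msg[i]);

    for (;;)
        ;   /* never return to the boot code */
}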
Perhaps you need to understand compilation and linking better. Read Levine's book Linkers and Loaders, Drepper's paper How To Write Shared Libraries, and, about compilation, the Dragon book.
You might want to link with gcc but use your own linker script. See also this answer to a very related question (probably with motivations similar to yours); the references there are highly relevant for you.
PS. Your question lacks motivation and context (it has no MCVE but needs one) and might be an XY problem. I guess you are on Linux. I strongly recommend publishing your actual full code, even buggy (perhaps on GitHub or GitLab or elsewhere), as free software to get potential help. I also strongly recommend using an existing bootloader (probably GRUB) and focusing your efforts on your OS code (which should be published as free software, to get some feedback).
I developed an R package which calls C++ code through Rcpp and RcppEigen. My Makevars.win looks like this (the numbers refer to my questions below):
1. CXX_STD = CXX11
2. PKG_CPPFLAGS = -fopenmp -O3 -Wall -ftree-vectorize -march=native -mavx -mfma
3. PKG_CXXFLAGS += $(SHLIB_OPENMP_CXXFLAGS)
4. PKG_LIBS = -fopenmp
5. PKG_LIBS += $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS) $(SHLIB_OPENMP_CXXFLAGS)
6. PKG_CPPFLAGS += -I../inst/include/
as I want to use OpenMP and link the R package against the Intel MKL library. I am also adding the plugins // [[Rcpp::plugins(cpp11)]] and // [[Rcpp::plugins(openmp)]] in my source files.
When I compile the package everything works fine, but I am still getting the default compilation flags -O2 and -std=c++0x. So my questions are:
A. isn't 1. supposed to force -std=c++11 (by the way, using the same Makevars yields the right C++ version, so there must be something specific to Windows)?
B. does 3. repeat -fopenmp from 2.?
C. how can I check whether 5. has been taken into account? I am asking because the same package built on Mac is much faster than on Windows, while their configurations are the same. I have benchmarked the same code on Windows using Microsoft R Open and on Mac, and Windows was faster in that case.
Thank you very much for your very precious help.
Where to start?
First off, compilation and linking options are based on the union of R's Makeconf and your package's src/Makevars. You can add to values; you cannot replace them.
Second, and related, which BLAS you get is a system setup issue. You cannot generally govern that from your package.
Third, plugins are for sourceCpp() and cppFunction(). In packages you make direct declarations, i.e. CXX_STD = CXX11.
Fourth, there are almost 1000 packages on CRAN using Rcpp. Sometimes it helps just to look at what some of these do. Many employ OpenMP.
Fifth, OpenMP is severely challenging on OS X thanks to Apple. I've forgotten what the Windows situation is. It just works on Linux.
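Regarding the OpenMP part specifically, one way to see whether the flags actually reached the compiler is a guarded check on the _OPENMP macro. A minimal sketch (written as a standalone C program here; in a package the same check would live in one of your C++ sources):
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

int main(void)
{
#ifdef _OPENMP
    /* _OPENMP is only defined when the compiler was invoked with -fopenmp,
     * i.e. when SHLIB_OPENMP_CXXFLAGS really made it onto the command line. */
    printf("OpenMP enabled, max threads = %d\n", omp_get_max_threads());
#else
    printf("compiled without OpenMP\n");
#endif
    return 0;
}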
I understand more or less the idea: when compiling separate modules and producing assembly code, functions calling each other have to strictly respect the calling convention, which kills the opportunity for many optimisations across module boundaries.
For instance if I have function A which calls function B which calls function C, all 3 in their own separate source files, it becomes possible to allocate registers evenly within the functions so that no register saving on the stack is necessary at all during those calls. With traditional compile-assembly-linking this is not possible, as the caller-saved and callee-saved registers are imposed by the calling convention.
Another optimisation is to inline functions which are called only once. This was previously possible only if the function was local, but thanks to link-time optimisation it is now possible even if the function is in another source file.
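To make that concrete, here is a minimal sketch (made-up file and function names) of the cross-module case; with -flto, the call to square() in main() can be inlined even though it lives in another translation unit, while without it the call has to go through the calling convention:
/* ---- b.c ---- */
int square(int x)
{
    return x * x;
}

/* ---- a.c ---- */
#include <stdio.h>

int square(int x);                 /* defined in b.c */

int main(void)
{
    printf("%d\n", square(21));    /* candidate for cross-module inlining under LTO */
    return 0;
}

/* Built, for instance, as:
 *   gcc -O2 -flto -c a.c b.c
 *   gcc -O2 -flto a.o b.o
 * The second command, the link step, is where the final code generation happens.
 */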
Now, if I compile with both -flto and -S flags, I see that instead of normal assembly instructions, gcc generates an encoded representation of the program, such as this:
.section .gnu.lto_.inline.c3c5e6ef8ec983c,"dr0"
.ascii "x\234mQ;N\303#\20}\273\353\17\370C\234\20\242`\"!Q\20\11Ah\322&\25\242\314\231|\4\32\220\220(,$.#\205D\343\3P Z.\341Tn\231\35\274\31L\342\342\355\314\274\371<\317\30\354\376\356\365\357\333\7\262"
.ascii "1\240G\325\273\202\7\216\232\204\36\205"
.ascii "8\242\370\240|\222"
.ascii "8\374\21\205ty\352\"*r\340!:!n\357n%]\224\345\10|\304\23\342\274z\346"
.ascii "8\35\23\370\7\4\1\366s\362\203j\271]\27bb{\316\353\27\343\310\4\371\374\237*n#\220\342rA\31"
.ascii "7\365\263\327\231\26\364\10"
.ascii "2\\-\311\277\255^w\220}|\340\233\306\352\263\362Qo+e+\314\354\277\246\354\252\277\20\364\224%T\233'eR\301{\32\340\372\313\362\263\242\331\314\340\24\6\21s\210\243!\371\347\325\333&m\210\305\203\355\277*\326\236\34\300-\213\327\306\2Td\317\27\231\26tl,\301\26\21cd\27\335#\262L\223"
.ascii "8\353\30\351\264{I\26\316\11\14"
.ascii "9\326h\254\220B}6a\247\13\353\27M\274\231"
.ascii "0\23M\332\272\272%d[\274\36Q\200\37\321\1&\35"
Since the data is in its own particular section, the linker sees it and does the code generation. If the module had been written in assembly, or compiled without the -flto flag, the linker would see data in the .text section instead, so there is no confusion possible for the linker.
The problem is: how can the linker generate code? Normally only gcc can generate code; the linker's role is just to change a few offsets and adapt the binary format. In order to generate code, the linker would need to contain a second copy of the entire gcc backend (the half of the compiler which generates assembly code from the intermediate representation), as well as the entire assembler (since no assembly code was produced). How is such a thing possible, especially considering that binutils is a completely separate entity from gcc, developed by different teams?
GCC's -flto emits a serialized form of GCC's internal representation, as you discovered.
Then, at link time, the linker reinvokes GCC and passes it the objects that need final compilation. GCC reads the internal representation and does the work.
I think the actual work is done in collect2, which is the part of GCC that is used when invoking the linker (I'm a little fuzzy on the details). There is also a "linker plugin" system that enables this to work a little better (for example, by letting the linker decide how to split the compilation). This is implemented at least by the binutils ld and by gold; but as far as I recall this is just an optimization and isn't needed to get the basic -flto feature to work. You can see a bit more information on the original LTO project page; the links from there may explain more.
There is more overlap between the GCC and binutils teams than you might think. The two projects share some code and have a long history of working together. Some people work on both projects.
From https://gcc.gnu.org/wiki/LinkTimeOptimization:
Despite the "link time" name, LTO does not need to use any special
linker features. The basic mechanism needed is the detection of GIMPLE
sections inside object files. This is currently implemented in
collect2 [which is called by gcc; -ps]. Therefore, LTO will work on any linker already supported by
GCC.
I assume this means you must link by calling the compiler driver gcc. Simply linking with the system's vanilla linker wouldn't optimize the whole program, as you already concluded.
Update:
https://gcc.gnu.org/onlinedocs/gccint/Collect2.html says:
The program collect2 is installed as ld in the directory where the passes of the compiler are installed. When collect2 needs to find the real ld it tries the following file names: [...]
(The page goes on detailing how collect2 looks for configuration-dependent executables and ones with well-known names like real-ld, finally even ld; but will not call itself recursively.)
I was looking up the PyPy project (Python in Python), and started pondering what runs the outer layer of Python. Surely, I conjectured, it can't be, as the old saying goes, "turtles all the way down"! After all, Python is not valid x86 assembly!
Soon I remembered the concept of bootstrapping, and looked up compiler bootstrapping. "OK", I thought, "so it can be either written in a different language or hand-compiled from assembly". In the interest of performance, I'm sure C compilers are just built up from assembly.
This is all well, but the question still remains, how does the computer get that assembly file?!
Say I buy a new CPU with nothing on it. During the first operation I wish to install an OS, which runs C. What runs the C compiler? Is there a miniature C compiler in the BIOS?
Can someone explain this to me?
Say I buy a new CPU with nothing on it. During the first operation I wish to install an OS, which runs C. What runs the C compiler? Is there a miniature C compiler in the BIOS?
I understand what you're asking... what would happen if we had no C compiler and had to start from scratch?
The answer is that you'd have to start from assembly or hardware. That is, you can either build a compiler in software or in hardware. If there were no compilers in the whole world, these days you could probably do it faster in assembly; however, back in the day I believe compilers were in fact dedicated pieces of hardware. The Wikipedia article is somewhat short and doesn't back me up on that, but never mind.
The next question, I guess, is what happens today? Well, those compiler writers have been busy writing portable C for years, so the compiler should be able to compile itself. It's worth discussing at a very high level what compilation is. Basically, you take a set of statements and produce assembly from them. That's it. Well, it's actually more complicated than that - you can do all sorts of things with lexers and parsers, and I only understand a small subset of it, but essentially, you're looking to map C to assembly.
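As a tiny illustration of that mapping (a made-up function; the exact output varies by compiler, target and optimisation level):
/* For a function like this ... */
int add_numbers(int a, int b)
{
    return a + b;
}

/* ... a compiler targeting x86-64 might emit something along the lines of:
 *
 *   add_numbers:
 *       lea eax, [rdi + rsi]   ; a arrives in edi, b in esi; the sum is returned in eax
 *       ret
 */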
Under normal operation, the compiler produces assembly code matching your platform, but it doesn't have to. It can produce assembly code for any platform you like, provided it knows how to. So the first step in making C work on your platform is to create a target in an existing compiler, start adding instructions and get basic code working.
Once this is done, in theory, you can now cross compile from one platform to another. The next stages are: building a kernel, bootloader and some basic userland utilities for that platform.
Then, you can have a go at compiling the compiler for that platform (once you've got a working userland and everything you need to run the build process). If that succeeds, you've got basic utilities, a working kernel, userland and a compiler system. You're now well on your way.
Note that in the process of porting the compiler, you probably needed to write an assembler and linker for that platform too. To keep the description simple, I omitted them.
If this is of interest, Linux From Scratch is an interesting read. It doesn't tell you how to create a new target from scratch (which is significantly non-trivial) - it assumes you're going to build for an existing, known target - but it does show you how to cross-compile the essentials and begin building up the system.
Python does not actually compile down to assembly. For a start, the running Python program keeps track of reference counts of objects, something that a CPU won't do for you. However, the concept of instruction-based code is at the heart of Python too. Have a play with this:
>>> def hello(x, y, z, q):
... print "Hello, world"
... q()
... return x+y+z
...
>>> import dis
>>> dis.dis(hello)
2 0 LOAD_CONST 1 ('Hello, world')
3 PRINT_ITEM
4 PRINT_NEWLINE
3 5 LOAD_FAST 3 (q)
8 CALL_FUNCTION 0
11 POP_TOP
4 12 LOAD_FAST 0 (x)
15 LOAD_FAST 1 (y)
18 BINARY_ADD
19 LOAD_FAST 2 (z)
22 BINARY_ADD
23 RETURN_VALUE
There you can see how Python thinks of the code you entered. This is Python bytecode, i.e. the assembly language of Python. It effectively has its own "instruction set", if you like, for implementing the language. This is the concept of a virtual machine.
Java has exactly the same kind of idea. I took a class function and ran javap -c class to get this:
invalid.site.ningefingers.main:();
Code:
0: aload_0
1: invokespecial #1; //Method java/lang/Object."<init>":()V
4: return
public static void main(java.lang.String[]);
Code:
0: iconst_0
1: istore_1
2: iconst_0
3: istore_1
4: iload_1
5: aload_0
6: arraylength
7: if_icmpge 57
10: getstatic #2;
13: new #3;
16: dup
17: invokespecial #4;
20: ldc #5;
22: invokevirtual #6;
25: iload_1
26: invokevirtual #7;
//.......
}
I take it you get the idea. These are the assembly languages of the Python and Java worlds, i.e. how the Python interpreter and the Java compiler think, respectively.
Something else that is worth reading up on is JonesForth. It is both a working Forth interpreter and a tutorial, and I can't recommend it enough for thinking about how things get executed and how you might write a simple, lightweight language.
In the interest of performance, I'm sure C compilers are just built up from assembly.
C compilers are, nowadays, (almost?) completely written in C (or in higher-level languages - Clang is C++, for instance). Compilers gain little to nothing from including hand-written assembly code. The things that take the most time are as slow as they are because they solve very hard problems, where "hard" means "big computational complexity" - rewriting them in assembly brings at most a constant-factor speedup, and that doesn't really matter anymore at that level.
Also, most compilers aim for high portability, so architecture-specific tricks in the front and middle end are out of the question (and in the backends they're not desirable either, because they may break cross-compilation).
Say I buy a new CPU with nothing on it. During the first operation I wish to install an OS, which runs C. What runs the C compiler? Is there a miniature C compiler in the BIOS?
When you're installing an OS, there's (usually) no C compiler being run. The setup CD is full of ready-compiled binaries for that architecture. If a C compiler is included (as is the case with many Linux distros), it is an already-compiled executable too. And those distros that make you build your own kernel etc. also ship at least one executable - the compiler. That is, of course, unless you compile your own kernel on an existing installation of something that has a C compiler.
If by "new CPU" you mean a new architecture that isn't backwards-compatible with anything that is already supported, self-hosting compilers can follow the usual porting procedure: first write a backend for that new target, then compile the compiler itself for it, and suddenly you have a mature compiler with a battle-hardened (it compiled a whole compiler) native backend on the new platform.
If you buy a new machine with a pre-installed OS, it doesn't even need to include a compiler anywhere, because all the executable code has been compiled on some other machine, by whoever provides the OS - your machine doesn't need to compile anything itself.
How do you get to this point if you have a completely new CPU architecture? In this case, you would probably start by writing a new code generation back-end for your new CPU architecture (the "target") for an existing C compiler that runs on some other platform (the "host") - a cross-compiler.
Once your cross-compiler (running on the host) works well enough to generate a correct compiler (and necessary libraries, etc.) that will run on the target, then you can compile the compiler with itself on the target platform, and end up with a target-native compiler, which runs on the target and generates code which runs on the target.
It's the same principle with a new language: you have to write code in an existing language that you do have a toolchain for, which will compile your new language into something that you can work with (let's call this the "bootstrap compiler"). Once you get this working well enough, you can write a compiler in your new language (the "real compiler"), and then compile the real compiler with the bootstrap compiler. At this point you're writing the compiler for your new language in the new language itself, and your language is said to be "self-hosting".