gcc/g++/ld caching?

I'm developing a program that generates parts of itself as C/C++ libraries.
E.g. it creates directories lib1, lib2, ..., libN.
For each library it generates C/C++ code plus a Makefile, compiles it with gcc/g++, links it with ld, and finally calls the code from the libraries.
Now the problem is that if lib1 defines a function fun and libN does as well, a call intended for libN's fun ends up using lib1's.
I've tried different versions of gcc/g++ up to v4.7.

Now the problem is that if lib1 defines a function fun and libN does as well, a call intended for libN's fun ends up using lib1's.
Presumably you are talking about shared libraries, and not archive libraries (where you'd get a multiply-defined symbol error).
Yes, that is how it is supposed to work, and has always worked on UNIX. Caching has nothing to do with it.
If you are on an ELF platform, you might be able to make it work more Windows-like by using -Wl,-Bsymbolic, but you'll be fighting the default system behavior, and should expect a rough ride and lots of unexpected gotchas. If fun doesn't need to be exposed from libX, hidden symbol visibility is your friend.
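For the common case where fun is only used inside its own library, a minimal sketch of the visibility approach might look like this (assuming GCC on an ELF platform; the function names here are made up):

/* Inside libN: fun is kept out of the dynamic symbol table entirely,
 * so it can never interpose on (or be interposed by) another library. */
__attribute__((visibility("hidden"))) int fun(int x) {
    return x + 1;
}

/* Alternatively, compile the whole library with hidden visibility, e.g.
 *   gcc -shared -fPIC -fvisibility=hidden libN.c -o libN.so
 * and mark only the real entry points as visible: */
__attribute__((visibility("default"))) int libN_entry(int x);

As a bonus, hidden symbols also mean fewer dynamic relocations for the loader to process at startup.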
Since you are generating the code for lib1, ..., libN, it might be easier to simply avoid the name collision by using e.g. libX_fun instead of fun. This would also be much more portable, as it will just work everywhere.
Update:
the function name really has to be fun according to the interface specifications.
According to whose interface specifications?
You obviously control both the main program, and the libraries. So you can, and likely should change the interface specification to avoid this problem.

Related

How does ld deal with code that is supplied twice (in a source file and in a library)?

Suppose we call
gcc -Dmyflag -lmylib mycode.c
where mylib contains all of mycode but is compiled without -Dmyflag. So all functions and other entities implemented in mycode are available in two versions to the linker. Empirically, I find that the version from mycode is taken. Can I rely on that? Will mycode always override mylib?
Empirically, I find that the version from mycode is taken.
Read this explanation of how the linker works with archive libraries, and possibly this one.
Can I rely on that?
You should rely on understanding how this works.
If you understood the material in the referenced links, you'll observe that adding main to libmylib.a will invert the answer (and if mycode.c also contains main, you'll get a duplicate symbol definition error).
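A minimal demonstration of the archive pull-in rule (file names are made up):

/* mylib.c, archived into libmylib.a: */
#include <stdio.h>
void fun(void) { puts("library fun"); }

/* mycode.c: */
#include <stdio.h>
void fun(void) { puts("mycode fun"); }
int main(void) { fun(); return 0; }

/* Build and run:
 *   gcc -c mylib.c && ar rcs libmylib.a mylib.o
 *   gcc mycode.c -L. -lmylib && ./a.out     --> prints "mycode fun"
 * mycode.o already satisfies fun, so the linker never has an unresolved
 * reference that would force it to pull mylib.o out of the archive,
 * which is also why no duplicate-definition error occurs here. */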
If you are using a dynamic library libmylib.so, the rules are different, and the library will always lose to the main binary, although there are many complications, such as LD_PRELOAD, linking the library with -Bsymbolic, and others.
In short, you should prefer to not do this at all.

Profiling the OCaml compiler

Background information (you do not need to repeat these steps to answer the question, this just gives some background):
I am trying to compile a rather large set of generated modules. These files are the output of a prototype Modelica to OCaml compiler and reflect the Modelica class structure of the Modelica Standard Library.
The main feature is the use of polymorphic, open recursion: Every method takes a this argument which contains the final superclass hierarchy. So for instance the model:
model A
  type T = Real;
  type S = T;
end A;
is translated into
let m_A = object
  method m_T this = m_Modelica_Real
  method m_S this = this#m_T this
end
and has to be closed before usage:
let _ = m_A#m_T m_A
This seems to postpone a lot of typechecking until the superclass hierarchy is actually fixed, which in turn makes it impossible to compile the final linkage module (try ocamlbuild Linkage.cmo after editing the comments in the corresponding file to see what I mean).
Unfortunately, since the code base is rather large and uses a lot of objects, the type structure might not be the root cause after all; it might as well be some optimization or a flaw in the code generation (although I strongly suspect the typechecker). So my question is: is there any way to profile the OCaml compiler that signals when a certain phase (typechecking, intermediate code generation, optimization) is over and how long it took? Any further insights into my particular use case are also welcome.
As of right now, there isn't.
You can do it yourself, though: the compiler sources are open, and you can get them and modify them to fit your needs.
Depending on whether you use ocamlc or ocamlopt, you'll need to modify either driver/compile.ml or driver/optcompile.ml to add timers to the compilation process.
Fortunately, this has already been done for you here. Just compile with the option -dtimings or with the environment variable OCAMLPARAM=timings=1,_.
Even more easily, you can install the opam Flambda switch:
opam switch install 4.03.0+pr132
ocamlopt -dtimings myfile.ml
Note: Flambda itself changes the compilation time (mostly what happens after typing), and its integration into the OCaml compiler is not confirmed yet.
The OCaml compiler is an ordinary OCaml program in that regard. I would use a poor man's profiler for a quick inspection, e.g. using the pmp script.

Windows DLL & Dynamic Initialization Ordering

I have some questions regarding dynamic initialization (i.e. constructors running before main) and DLL link ordering, for both Windows and POSIX.
To make it easier to talk about, I'll define a couple terms:
Load-Time Libraries: libraries which have been "linked" at compile time such that, when the system loads my application, they get loaded in automatically (i.e. the ones put in CMake's target_link_libraries command).
Run-Time Libraries: libraries which I load manually via dlopen or equivalents. For the purposes of this discussion, I'll say that I only ever manually load libraries using dlopen in main, so this should simplify things.
Dynamic Initialization: if you're not familiar with the C++ spec's definition of this, please don't try to answer this question.
Ok, so let's say I have an application (MyAwesomeApp) and it links against a dynamic library (MyLib1), which in turn links against another library (MyLib2). So the dependency tree is:
MyAwesomeApp -> MyLib1 -> MyLib2
For this example, let's say MyLib1 and MyLib2 are both Load-Time Libraries.
What's the initialization order of the above? It is obvious that all static initialization, including linking of exported/imported functions (Windows-only), will occur first... But what happens with Dynamic Initialization? I'd expect the overall ordering:
ALL import/export symbol linking
ALL Static Initialization
ALL of MyLib2's Dynamic Initialization
ALL of MyLib1's Dynamic Initialization
ALL of MyAwesomeApp's Dynamic Initialization
MyAwesomeApp's main() function
But I can't find anything in the specs that mandates this. I DID see something in the ELF documentation that hinted at it, but I need guarantees from the specs for what I'm trying to do.
Just to make sure my thinking is clear: I'd expect library loading to work very similarly to import in Python in that, if a library hasn't been loaded yet, it'll be loaded fully (including any initialization) before I do anything with it... and if it has already been loaded, then I'll just link to it.
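One way to check the order empirically is to plant a logging dynamic initializer in each module; here is a sketch (the InitProbe name is mine, not from any spec, and the observed order proves nothing about guarantees):

#include <cstdio>

// The constructor of a namespace-scope object runs during the dynamic
// initialization of the module that contains it.
struct InitProbe {
    explicit InitProbe(const char* module) {
        std::printf("dynamic init: %s\n", module);
    }
};

// Put one of these in a source file of MyLib2, MyLib1 and MyAwesomeApp:
static InitProbe probe("MyLib2");

// With the ordering I expect, the output before main() would be:
//   dynamic init: MyLib2
//   dynamic init: MyLib1
//   dynamic init: MyAwesomeApp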
To give a more complex example, so as to make sure there isn't another reading of my first example that yields a different answer:
MyAwesomeApp depends on MyLib1 & MyLib2
MyLib1 depends on MyLib2
I'd expect the following initialization:
ALL import/export symbol linking
ALL Static Initialization
ALL of MyLib2's Dynamic Initialization
ALL of MyLib1's Dynamic Initialization
ALL of MyAwesomeApp's Dynamic Initialization
MyAwesomeApp's main() function
I'd love any help pointing out specs that say this is how it is. Or, if this is wrong, any spec saying what REALLY happens!
Nothing in the C++ standard mandates how dynamic linking works.
Having said that, Visual Studio ships with the C Runtime (aka CRT) source, and you can see where static initializers get run in dllcrt0.c.
You can also derive the relative ordering of operations if you think about what constraints need to be satisfied to run each stage:
Import/export resolution just needs .dlls.
Static initialization just needs .dlls.
Dynamic initialization requires all imports to be resolved for the .dll.
Steps 1 & 2 do not depend on each other, so they can happen independently. Step 3 requires steps 1 & 2 to be complete for each .dll, so it has to happen after both.
So any specific loading order that satisfies the constraints above will be a valid loading order.
In other words, if you need to care about the specific ordering of specific steps, you are probably doing something dangerous that relies on implementation-specific details which will not be preserved across major or minor revisions of the OS. For example, the way the loader lock works for .dlls has changed significantly over the various releases of Windows.

C++ library: .hpp + .inl (separate definitions and declarations) vs .hpp only (in-class body code)

I'm rewriting my Windows C++ native library (an ongoing effort since 2002) with a public release in mind. For the past 10 years I've been the sole beneficiary of those 150+ KLOC, and I feel others might find good uses for it as well.
Currently the entire library is heavily templated and header-only. That means all code is in the bodies of the classes. It's not very easy to manage, but it's OK.
I'm VERY tempted, after reading several C++ library coding guidelines, to break it into .hpp + .inl files. I've experimentally done so for a few classes, and it does increase readability and make the code easier for others to deal with. I know where everything is at any given time. But other users might want a quick overview of a class's declaration... and the definition only when necessary (debugging).
QUESTION:
What are the pros/cons of splitting the member definitions from the class's definition for a class template? Is there a commonly accepted practice?
This is important for me because it's a one-way road: I can't refactor it the other way later on, so any feedback matters...
I've found my answer in another question: "When should I consider making a library header-only?"
And the answer is: I will break it into .cpp and .hpp files and make it ready to compile both as header-only and as a static library or DLL.
@Steve Jessop:
If you think your non-template library could be header-only, consider dividing it into two files anyway, then providing a third file that includes both the .h and the .cpp (with an include guard).
Then anyone who uses your library in a lot of different TUs, and suspects that this might be costing a lot of compile time, can easily make the change to test it.
This is an awesome idea. It will take a bit more work, but it's SO versatile.
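A sketch of that three-file layout (all names hypothetical). The MYLIB_API macro expands to inline in header-only mode, so that including the .cpp from many translation units doesn't violate the one-definition rule:

// mylib.hpp -- declarations
#ifndef MYLIB_HPP
#define MYLIB_HPP
#ifdef MYLIB_HEADER_ONLY
#define MYLIB_API inline
#else
#define MYLIB_API
#endif
MYLIB_API int answer();
#endif

// mylib.cpp -- definitions; compiled on its own for the static/DLL build
#include "mylib.hpp"
MYLIB_API int answer() { return 42; }

// mylib_header_only.hpp -- include this single header for header-only use
#ifndef MYLIB_HEADER_ONLY_HPP
#define MYLIB_HEADER_ONLY_HPP
#define MYLIB_HEADER_ONLY
#include "mylib.hpp"
#include "mylib.cpp"
#endif

Users who want the compiled library include mylib.hpp and link; users who suspect the extra TU is costing them compile time include mylib_header_only.hpp instead and can test the difference with a one-line change.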
UPDATE
It's important to explicitly instantiate the templated classes in the .cpp files.
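For the template parts, that looks roughly like this (the class and member names are made up):

// myvec.hpp -- declaration only
template <typename T>
class MyVec {
public:
    void push_back(const T& value);
    // ...
};

// myvec.cpp -- member definitions plus explicit instantiations
#include "myvec.hpp"

template <typename T>
void MyVec<T>::push_back(const T& value) { /* ... */ }

// Without these lines the compiled library contains no code for the
// template at all, and users of the .lib/.dll get unresolved symbols:
template class MyVec<int>;
template class MyVec<double>;

The obvious cost is that the compiled build only supports the instantiations you listed; anything else still needs the header-only route.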

GCC technical details

I don't know if this is the right place for things like this, but I am curious about a few aspects of the GCC front-end/back-end architecture:
I know I can compile .o files from C code and link them to C++ code, and I think I can do it the other way around, too. Does this work because the two languages are similar, or because the GCC back end is really language-independent? Would this work with Ada code too? (I don't even know if that makes sense, since I don't know Ada or whether it even has "functions", but the question should be clear. If it makes no sense, think "Pascal" or even "my own custom language front end".)
Where would garbage collection be implemented? Take a Java front end, for example. The way I understand it, when compiling to a JVM back end the "platform" takes care of the GC, so the front end need not do anything about it; but when compiling to native code, would the front end send garbage-collecting GENERIC code to the back end, or does it turn on some flag telling the back end to produce garbage-collecting code? The first makes more sense to me, but that would mean the front end produces different output based on the target, which seems to defeat the point of GCC's front-end/back-end architecture.
Where would language-specific libraries go? For instance, the standard Java classes or the standard C headers. If they are linked in at the end, could a C program theoretically call functions from the Java library or something like that, since it is just another linked library?
Yes, the backend is at least reasonably language independent. Yes, it works with Ada.
GCJ generates native code which uses a runtime library. The garbage collector is part of the runtime library.
GCJ implements the CNI, which allows you to write code in C++ that can be used as native methods by Java code -- but being able to do this is a consequence of them having designed it in, not just an accidental byproduct of using the same back-end.
It is possible because the calling conventions are compatible, but the name mangling is different (there is no mangling in C). To call a C function from C++, you should declare it with extern "C". And to call a C++ function from C, you have to declare it under its mangled name (and maybe with additional or different type arguments). Calling Fortran code is possible in some cases too, but the argument-passing convention is different (pass-by-reference in Fortran).
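A minimal sketch of the C/C++ direction (the function names are invented):

/* In C (c_side.c): */
int c_add(int a, int b) { return a + b; }

// In C++ (cpp_side.cpp):
extern "C" int c_add(int a, int b);   // declaration with C linkage, so the
                                      // compiler looks for the unmangled name

extern "C" int cpp_double(int x) {    // definition with C linkage: C code can
    return 2 * x;                     // now declare and call it simply as
}                                     //   int cpp_double(int);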
There were actually converters from C++ to C (cfront) and from Fortran to C (f2c), and some solutions from them are still in use.
Garbage collection is implemented in the runtime library, e.g. the Boehm GC. The back end should generate objects compatible with the selected GC library.
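To get a feel for that division of labour, here is a minimal sketch of a program using the Boehm collector directly (assuming libgc is installed; link with -lgc):

#include <gc.h>
#include <stdio.h>

int main(void) {
    GC_INIT();                              /* set up the collector */
    for (int i = 0; i < 1000000; i++) {
        int *p = GC_MALLOC(sizeof *p);      /* collected, never free()d */
        *p = i;
    }
    printf("heap size: %zu\n", GC_get_heap_size());
    return 0;
}

A compiler with GC support effectively emits allocations like these (plus whatever metadata the chosen collector needs) instead of plain malloc calls.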
The compiler driver (g++, gfortran, ...) adds the language-specific libraries at the linking step.
