I want to test GCC/Clang, focusing on the parts where most of the computation/optimization happens. Which files are those?
You probably won't find any blatant hot spot in the GCC compiler (there were some GSoC projects around that idea a few years ago), at least when you ask it to optimize.
You could use the -ftime-report & -fmem-report options to gcc (in addition to optimization options like -O2) to find out which (compiler optimization) passes are using the most time. For most workloads, you won't find any pass blatantly eating more resources than the others.
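For instance (a minimal sketch; the file name is made up), feeding GCC a small translation unit with those flags prints a per-pass breakdown of time and memory:

    // timing_demo.cc -- small translation unit to feed the timing options.
    // Compile with, for example:
    //   g++ -O2 -ftime-report -fmem-report -c timing_demo.cc
    // GCC then prints how long each optimization pass took and how much
    // memory it allocated.
    #include <algorithm>
    #include <cstddef>
    #include <vector>

    int sum_sorted(std::vector<int> v) {
        std::sort(v.begin(), v.end()); // gives the optimizer something to chew on
        int s = 0;
        for (std::size_t i = 0; i < v.size(); ++i)
            s += v[i];
        return s;
    }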
I guess it is the same in Clang. Compilers are very complex software, and there is no easy hot-spot to optimize inside them (otherwise, people within the compiler community would have found them).
BTW, recent GCC versions have plugin hooks, which enable you to write your own GCC plugin (in C++), or your own GCC extension (in MELT), to find out more.
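To give an idea of what that entails (a hedged sketch, not a tested plugin; the callback name is mine), a minimal GCC plugin skeleton looks roughly like this:

    // min_plugin.cc -- minimal GCC plugin skeleton (illustrative sketch).
    // Build roughly like:
    //   g++ -shared -fPIC -I`gcc -print-file-name=plugin`/include \
    //       min_plugin.cc -o min_plugin.so
    // and load with: gcc -fplugin=./min_plugin.so ...
    #include "gcc-plugin.h"
    #include "plugin-version.h"

    int plugin_is_GPL_compatible; // required symbol; GCC refuses to load otherwise

    static void on_finish_unit(void *gcc_data, void *user_data) {
        // Called once per translation unit; a real plugin would inspect
        // the intermediate representation here, e.g. to time or count passes.
    }

    int plugin_init(struct plugin_name_args *info,
                    struct plugin_gcc_version *version) {
        if (!plugin_default_version_check(version, &gcc_version))
            return 1; // plugin was built against a different GCC version
        register_callback(info->base_name, PLUGIN_FINISH_UNIT,
                          on_finish_unit, 0);
        return 0;
    }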
I am messing around with creating my own custom GCC toolchain for an ARM Cortex-A5 CPU, and I am trying to dive as deeply as possible into each step. I have deliberately avoided using crosstool-ng or other tools to assist, in order to get a better understanding of what is going on in the process of creating a toolchain.
One thing stumps me, though. During configuration and building of binutils, I need to specify a target (--target). This target can be either the classic host tuple (e.g. arm-none-linux-gnueabi) or a more specific type, something like i686-elf for example.
Why is this target needed? And what does it specifically do to the generated "as" and "ld" programs built by binutils?
For example, if I build it with arm-none-linux-gnueabi, it looks like the resulting "as" program supports every ARM instruction set under the sun (armv4t, armv5, etc.).
Is it simply for saving space in the resulting executable? Or is something more going on?
I would get it if I configured binutils for a specific instruction set, for example: build me an assembler that understands armv4t instructions.
Looking through the source of binutils, and gas specifically, it looks like the target tuple selects some header files located in gas/config/tc*, gas/config/te*. Again, this seems arbitrary, as these are broad categories of systems.
Sorry for the rambling :) I guess my question can be stated as: why is binutils not an all-in-one package?
Why is this target needed?
Because there are (many) different target architectures. ARM assembly/code is different from PowerPC's, which is different from x86's, etc. In principle it would have been possible to design one tool for all targets, but that was not the approach taken at the time.
The focus was mainly on speed/performance. The executables are small by today's standards, but combining all 40+ architectures and all tools like as, ld, nm, etc. would have been quite clunky.
Moreover, not only are modern host machines much more powerful, that also applies to the compiled/assembled programs, sometimes zillions of lines of (preprocessed) C++ to compile. This means the overall build time has shifted much more in the direction of compilation than in the old days.
Usually, only different core families within one architecture are switchable/selectable via options.
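To illustrate (a sketch; the exact tool names depend on how you configured binutils/GCC), the same ARM-targeted tools can select different cores per invocation, but cannot emit a different architecture altogether:

    // arch_demo.cc -- trivial source used only to show per-family selection.
    //
    // One arm-none-linux-gnueabi toolchain can target several ARM cores:
    //   arm-none-linux-gnueabi-g++ -march=armv4t  -c arch_demo.cc
    //   arm-none-linux-gnueabi-g++ -march=armv5te -c arch_demo.cc
    // but producing, say, PowerPC code needs a separately configured
    // toolchain (--target=powerpc-...).
    int add(int a, int b) { return a + b; }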
When I'm stepping into the debugged program, it says that it can't find the crt/crt_c.c file. I have the sources of GCC 6.3.0 downloaded, but where is crt_c.c in there?
Also how can I find source code for printf and rand in there? I'd like to step through them in debugger.
The IDE is Code::Blocks, if that's important.
Edit: I'm trying to do so because I'm trying to decrease the size of my executable. Going straight into freestanding leaves me with a lot of missing functions, so I intend to study and replace them one by one. I'm trying to do that to make my program a little smaller and faster, and to be able to study the assembly output a bit more easily.
Also, forgot to mention, I'm on Windows, MSYS2. But the answer is still helpful.
How can I find source code for printf and rand in there?
They (printf, rand, etc.) are part of your C standard library, which (on Linux) is outside of the GCC compiler. But crt0 is provided by GCC (however, it is often not compiled with debug information), and some C files there are generated in the build tree during the compilation of GCC.
(On Windows, most of the C standard library is proprietary (inside some DLL provided by Microsoft) and you are probably forbidden to look into the implementation or to reverse-engineer it; AFAIK EU laws might mention some exception related to interoperability, but then you need to consult a lawyer, and I am not a lawyer.)
Look into GNU glibc (or perhaps musl-libc) if you want to study libc source code. libc generally uses system calls (listed in syscalls(2)) provided by the Linux kernel.
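As a small illustration of that layering (Linux-only; on Windows/MSYS2 the C library sits on top of Microsoft DLLs instead), printf is libc code that eventually ends up in the write(2) system call:

    // syscall_demo.cc -- printf vs. the underlying write(2) system call.
    #include <cstdio>
    #include <unistd.h> // POSIX header declaring write()

    int main() {
        std::printf("via printf\n");    // buffered, formatted: runs libc code
        write(1, "via write(2)\n", 13); // thin wrapper: the kernel does the I/O
        return 0;
    }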
I'd like to step through them in debugger.
In practice you won't be able to do that easily, because the libc is provided by your distribution and has generally been compiled without debug information in DWARF format.
Some Linux distributions provide a debuggable variant of libc, perhaps as some libc6-dbg package.
(your question lacks motivation and smells like some XY problem)
I intend to study and replace them one by one.
This is very unrealistic (particularly on Windows, whose system call interface is not well documented) and could take you many years (or perhaps more than a lifetime). Do you have that much time?
Read also Operating Systems: Three Easy Pieces and look into the OSDev wiki.
I'm trying to do so because I'm trying to decrease the size of my executable.
Wrong approach. A debugger needs debug info (e.g. in DWARF), which will increase the size of the executable (but it can later be stripped). BTW, standard C functions are in some common shared library (or DLL on Windows) which is shared by many processes.
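A quick way to see this for yourself (a sketch of the workflow; the file names are made up):

    // size_demo.cc -- debug info inflates the binary; strip removes it again.
    // Compile twice and compare sizes, e.g. with `ls -l`:
    //   g++ -g -O2 size_demo.cc -o with_debug   # DWARF debug info included
    //   g++    -O2 size_demo.cc -o no_debug
    //   strip with_debug                        # removes the debug sections
    int main() { return 0; }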
I'm on Windows, MSYS2.
Bad choice. Windows is proprietary. Linux is made of free software (more than ten billion lines of source code, if you consider all the useful packages inside a typical Linux distribution), whose source code you could study (even if it would take several lifetimes).
I think I finally know what I want in a compiled programming language: a fast compiler. I get the feeling that this is a really superficial thing to care about, but after switching from Java to Scala for a while, I realized that being able to make a small change in code and immediately run the program is actually quite important to me. Besides Java and Go, I don't know of any languages that really value compile speed.
Delphi/Object Pascal. Make a change, press F9 and it runs - you don't even notice the compile time. A full rebuild of a fairly substantial project that we run takes on the order of 10-20 seconds, even on a fairly wimpy machine.
There's an open source variant available at www.freepascal.org. I've not messed with it but it reportedly is just as fast - it's the design of the Pascal language that allows this.
Java isn't fast at compiling. The feature you are looking for is probably hot replacement/redeployment while coding. Eclipse recompiles just the files you changed.
You could try some interpreted languages. They usually don't require compiling at all.
I wouldn't choose a language based on compilation speed...
Java's compiler is not the fastest out there.
Pascal (and its close relatives) is designed to be fast - it can be compiled in a single pass. Objective Caml is known for its compilation speed (and there is a REPL too).
On the other hand, what you really need is a REPL, not a fast recompilation and re-linking of everything. So you may want to try a language which supports incremental compilation. Clojure fits well (and it is built on top of the same JVM you're used to). Common Lisp is another option.
I'd like to add that there are official compilers for languages and unofficial ones made by different people. Obviously, because of this, performance varies per compiler.
If you were to talk just about the official compiler, I'd say it's probably Fortran. It's very old, but it's still used in most science and engineering projects because it is one of the fastest languages. C and C++ probably come tied for second because they're also used in science and engineering.
Following the linked question, I'm wondering if there are some side effects of enabling C++0x in GCC.
According to gcc: "GCC's support for C++0x is experimental".
What I'm afraid of is that, for example, the compiler will generate some code differently, or the standard library uses some C++0x feature which is broken in GCC.
So if I don't explicitly use any C++0x features, can it break my existing code?
The C++0x support has been, and still is, under heavy development. One thing this means is that bugs get fixed quickly; another is that there might be small bugs present. I say small, for two reasons:
libstdc++ has not been rewritten from scratch, so all the old elements are just as stable as they were before any of this C++0x work was available, if not more stable, because of several years of bug fixes.
There are corner cases in the new/old standard that haven't been ironed out yet. Are these the runtime quirks you talk about? No. C++0x support has been under development for four releases now; don't worry.
Most of the impact from that flag will be felt in the new language features; library features like move constructors and std::thread (on POSIX platforms) don't affect code that doesn't use them.
Bottom line: "experimental" is too strict a word for daily production use. The standard has changed in the three or four years GCC has been working on support. Old revisions of C++0x will be broken in a newer GCC, but that's a good thing. C++0x is finished as far as the non-paying-for-a-PDF world is concerned, so no breaking changes should be added. Decide if you want the new stuff or not beforehand, because you won't be able to just switch it off once you've gotten used to using it.
Usually, it won't break your source code, but you may include (even without noticing or knowing it) C++0x idioms that compile because these features are enabled but won't compile in a strict C++ compiler. For instance, in C++0x you can use >> as the terminator of nested templates, but not in C++98/03, so if you forget to separate the brackets with a space, you will have problems when you try to compile the code with a strict C++ compiler.
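A concrete instance of the >> pitfall:

    #include <vector>

    // Fine under -std=c++0x: ">>" closes both templates.
    std::vector<std::vector<int>> a;

    // Required spelling in strict C++98/03: without the space, ">>" is
    // lexed as the right-shift operator and the declaration won't parse.
    std::vector<std::vector<int> > b;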
R-value references/moves are a feature which can have a huge impact on how the compiler does optimizations and such. Even if you don't use move in your own code, the standard library headers will automatically switch to their new versions, which include move constructors/assignment.
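For illustration, a minimal sketch (itself compiled with -std=c++0x) of a class whose move constructor the library picks up automatically:

    #include <string>
    #include <utility>
    #include <vector>

    struct Buffer {
        std::string data;
        Buffer() : data(1000, 'x') {}
        Buffer(const Buffer& other) : data(other.data) {}       // copy: duplicates storage
        Buffer(Buffer&& other) : data(std::move(other.data)) {} // move: steals storage
    };

    int main() {
        std::vector<Buffer> v;
        v.push_back(Buffer()); // the C++0x vector moves the temporary instead
                               // of copying it; no change needed in user code
        return 0;
    }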
There are some circumstances which allow the compiler to create move constructors/assignment operators implicitly for user defined classes. These rules changed a few times during the standardization process (I don't even know what the current rules are). So, depending on the exact version of your compiler it could be using a set of rules for generating these implicit functions that isn't even in the latest version of the standard.
Most of the other major C++0x changes don't have a big run-time impact; they are mostly compile-time (constexpr, string literals, variadic templates) or syntax helpers (range-based for, auto, initializer lists).
I originally wrote the question you link to because of (in my opinion) a very big issue, as described here. Basically, overloading a function on shared_ptr to a const type was not recognized by the compiler. That's a huge flaw in my opinion. It was fixed between GCC 4.5 and GCC 4.6, but it serves as an example of a big bug that was still around in the default installation of GCC in Ubuntu, for example. So while bugs are fixed quickly, there still might be bugs, and you may waste a weekend looking for the source of, and solution to, those bugs.
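The kind of code that triggered it looked roughly like this (a reconstruction from memory, not the exact test case):

    #include <memory>

    void f(std::shared_ptr<int>) {}       // overload for pointer-to-non-const
    void f(std::shared_ptr<const int>) {} // overload for pointer-to-const

    int main() {
        std::shared_ptr<const int> p(new int(42));
        f(p); // should unambiguously pick the const overload; the GCC 4.5
              // bug mishandled this overload resolution
        return 0;
    }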
My recommendation, based on this personal experience, is to avoid C++0x until the word "experimental" is removed from the description of GCC's support of C++0x or until you actually need any of the C++0x features to a degree that an alternative implementation would significantly sacrifice good design.
Setup: I installed Cygwin with the GNU programming lineup (gcc, g++, make, and gdb) and successfully compiled and ran a program that I was working on. Then I decided to install Boost into Cygwin because I will need typical Boost stuff as my program develops. So, using the Cygwin setup.exe, I installed Boost. After this, the program that I had just successfully compiled and run no longer worked. (And recall that it didn't depend on Boost.)
I found out that when Boost installed, it also installed a new compiler, g++-4.exe, whereas previously I had been using g++-3.exe. Boost had also symbolically linked g++.exe to the new compiler. After I changed back the symbolic link, my old program compiled correctly.
Is there any reason that I should be using g++-4 rather than g++-3?
g++ 3 is very old and the gcc community has long since dropped maintenance of it. (GCC 4.3 is currently the oldest maintained release series.) There have been lots of language conformance improvements in newer versions (both in accepting valid code and rejecting bad code), so you'll have an easier time going forward if you bite the bullet now. You can check the release notes for each series (e.g. for 4.0) for explanations of these improvements and the code changes they might require. Personally, I find programming much more enjoyable when I can reason about programs according to a precise language specification, and only rarely be forced to understand the quirks of a particular compiler.
Also, Boost support for g++ 3 seems to be nearing an end, as Boost 1.44 considers GCC 3(.4.6) as an "additional test compiler" on only a single platform (RHEL). Boost development is linear (not branched), so you can find yourself in a situation where you need to upgrade to get bug fixes, but then find that your platform is no longer supported.