Is newer GCC documentation compatible with older documentation?

An example:
In "Using and Porting GCC" (2001), there is the macro SMALL_REGISTER_CLASSES, which tells the compiler to minimize the lifetime of hard registers. Its definition consists of a simple zero / non-zero expression, usually a constant.
In "GCC internals" (2011), the above macro is replaced by the following target hook:
bool TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P (enum machine_mode mode)
which is not nearly as neat as the original macro.
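To make the contrast concrete, here is a hedged sketch of how a hypothetical port might spell the two variants (the mycpu_ name is invented, and the exact spelling varies between GCC versions):

/* Old style (circa GCC 2.95), in the port's target header: */
#define SMALL_REGISTER_CLASSES 1

/* New style (circa GCC 4.6), in the port's target .c file: */
static bool
mycpu_small_register_classes_for_mode_p (enum machine_mode mode)
{
  return true;  /* this sketch gives the same answer for every mode */
}

#undef TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P
#define TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P \
  mycpu_small_register_classes_for_mode_p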
Note: I'm not sure what the difference is between "Using and Porting" and "GCC Internals" as far as porting goes (RTL representation, Machine Description, and Target Description Macros and Functions). I started by reading the first one thoroughly because it was the suggested documentation, overlooking the fact that it is actually 10 years old.

The short answer is "no".
At the start of 2001, the current release was 2.95, although 3.0 was already well into development. The current release is 4.6, with 4.7 due in a few months. That's two major release numbers, which means two large-scale rewrites of the source code, plus many, many other smaller changes that add up to a lot of code churn.
Of course, you'll find lots of details that are the same now as ever, but the old documents are not to be trusted.
The current documentation is pretty good, as far as it goes, but it's hardly comprehensive, so if you'd like to improve it as you learn more, I'm sure it'll be appreciated. ;)

Related

GCC/LLVM backend guides to make reading the source code a little bit easier?

I am starting to get acquainted with the implementation of code-generation and optimization algorithms in gcc and llvm. Can anyone give advice on where to find materials, articles, or lectures about how these things are arranged in these compilers? I was trying to find something that describes, in fairly simple language, how the optimization and code generation algorithms are implemented, or simply a detailed explanation, but I didn't find anything. Maybe there is an exhaustive guide where I'd be able to find information about the exact classes and methods that are called, which files these algorithms are written in, and the basic structures they operate on (symbol tables and their entries, graphs, ASTs, struct tree and rtl in gcc, etc.). I'm familiar with Steven Muchnick's "Advanced Compiler Design and Implementation", but it's quite hard to match the algorithms given in ICAN notation to anything in the source code of gcc and llvm without some additional guidance.
Summary:
My goal is to get acquainted with the implementation of optimization and code generation algorithms, using gcc and llvm as examples. So I would like to find materials that somehow simplify reading the source code of gcc or llvm. I hope such materials exist.
Your question is off-topic here (since it is about finding resources and books).
However, for GCC, I did collect several references and wrote hundreds of slides; see the documentation page of GCC MELT (and the many web pages linked from it).
For LLVM, you will need to find the equivalent documentation (there is a lot of it too).
GCC MELT is now (as of November 2017) an inactive project, so my slides cover older GCC versions. I could be funded to work on something similar.
Maybe there is an exhaustive guide
You won't find anything exhaustive and up to date, because both GCC and Clang are evolving significantly and continuously. The most exhaustive resource is still the source code itself (millions of lines, growing by a few percent each year), and the community behind it. You'd need several years of full-time work to comprehend these monster free software projects, and you should also follow their evolution.
Once you have spent several weeks reading about GCC and looking inside the source code, you can ask some precise questions on gcc@gcc.gnu.org. If you experiment with a GCC plugin or work on your own fork of GCC, be sure to make it free software, and publish your alpha-quality (even buggy and incomplete) source code somewhere (perhaps on GitHub) under a GPL license before asking.
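For what it's worth, the bare minimum a GCC plugin needs is roughly the following sketch (minimal_plugin.c is an invented name, and the build details vary between GCC versions):

/* minimal_plugin.c -- skeleton of the smallest loadable GCC plugin. */
#include "gcc-plugin.h"        /* must be the first GCC header included */
#include "plugin-version.h"

int plugin_is_GPL_compatible;  /* required symbol: declares a GPL-compatible license */

int
plugin_init (struct plugin_name_args *plugin_info,
             struct plugin_gcc_version *version)
{
  /* Refuse to load into a GCC that does not match the headers we built against. */
  if (!plugin_default_version_check (version, &gcc_version))
    return 1;

  /* A real plugin would now call register_callback () to hook into passes,
     attributes, pragmas, etc.  Returning 0 means the plugin loaded fine. */
  return 0;
}

It is typically compiled as a shared object against the headers reported by gcc -print-file-name=plugin and loaded with -fplugin=./minimal_plugin.so.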
BTW, real-life compilers are much more complex than what is taught in textbooks, even ones as good as the Dragon Book. Nobody can understand GCC (or LLVM) completely (it is too complex for a single brain, and it is evolving too fast), and that also holds for any multi-million-line software project.
So I would like to find materials that somehow simplify reading the source code of gcc or llvm
Most of what I have written for GCC MELT (notably the slides that are not MELT-specific, and all the references I have collected) fits that goal. However, the authoritative material is the (continuously changing) source code of GCC.
NB: My gcc-melt.org domain will be lost in April 2018 (and I probably won't renew it). So look on http://starynkevitch.net/Basile/gcc-melt which should be kept longer.

Why is declaration expression dropped in C# 6?

In the preview for C# 6, Microsoft introduced syntactic sugar for declaring an out parameter inline, as seen in this article:
http://odetocode.com/blogs/scott/archive/2014/09/15/c-6-0-features-part-3-declaration-expressions.aspx
Does anybody know why this feature was dropped in the release version of .NET 4.6?
The explanation is in this CodePlex topic:
Hi all,
As we enter the final stage in our long quest to renew the C# and Visual Basic experience, we’ve had to make some tough decisions around the set of language features that will make it into the next version of the languages.
These decisions are primarily based on cost vs. risk. Some of the features you’ve seen in the previews still need a lot of downstream work to be supported in the IDE, debugger, etc., and also to get to great quality in the compiler itself.
As you’ve maybe heard me say before, language features are a secondary consideration in this release. The primary goal is to deliver a magnificent first release of the Roslyn value proposition: deep language understanding in the IDE and available to everyone through a robust and comprehensive API. To deliver this well, we need to scale back our appetite for language features a bit.
The features we are cutting are:
Primary constructors in C# (along with initializers in structs)
Declaration expressions in C# / Out parameters in VB
They are both characterized by having large amounts of downstream work still remaining. They are also features that we see as the potential beginning of a bigger story further down the line: primary constructors could grow up to become a full-blown record feature, and declaration expressions would form the corner stone of pattern matching and deconstruction facilities. Now, those features will all be considered together for a later release. As a silver lining we then get to design this continuum holistically, rather than in steps that might tie our hands unduly in a later phase.
All that said, I am sad to let these features go, and I know that goes for many of you as well. You’ve provided amazingly valuable feedback on both these features, and those learnings will feed directly into our future design work. I cannot thank you enough for this incredible engagement! I also hope you’ve enjoyed seeing more of the “inner workings” this time around, even if it leads to disappointment when you watch things come and go in our plans. Your increased involvement has certainly been rewarding for us, and – we hope! – helped improve the quality and timeliness of our decisions.
There’s a bit of good news too: string interpolation (which hasn’t been previewed yet) is currently looking to make it in. You should see that one show up first in C# (where we’ve already prototyped our approach), and a little later in VB.
Thanks again!
The reason is somewhere out there in https://github.com/dotnet/roslyn/issues.
But the main reason is that the feature wasn't finished, and decisions made to get it into C# 6 might have limited features planned for C# 7 and beyond.

ATmegaXXX V, P, does it matter for compilation?

I used an ATmega649 before but then switched to the ATmega649V.
Does it matter which MCU version is given to the compiler: ATmega649, ATmega649V, or ATmega649P?
My understanding is that the architecture is exactly the same, and the only difference is some power saving that is somehow achieved without changing the architecture?
Using avr-gcc.
Well, you can use an "almost" compatible architecture with no harm, though you have to triple-check the datasheet to make sure there's no difference in the way the registers are set up; otherwise your program won't work, or worse, will work until some feature fails. It is usually a source of frustration when you've forgotten that you've been using an architecture that is close enough, but not exactly the one you're targeting.
I don't know the ATmega649X variants well enough, and I won't thoroughly read the lengthy datasheets to find those differences. So if you decide to do it, be careful, and don't forget about that!
Usually the additional letters signal differences in maximum speed, supply voltage ratings, or power consumption. The core itself is compatible, so if the numbers match, there is no difference from the compiler's point of view.
However, the flash tool may recognize them as different parts and require the correct settings.
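As a hedged illustration (the exact -mmcu spelling depends on your avr-gcc and avr-libc versions, and main.c is just a placeholder name), all three variants would typically be compiled against the same core:

avr-gcc -mmcu=atmega649 -Os -o main.elf main.c

The V/P distinction then matters mainly to the programming tool, which may insist on the exact part name.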

Trying to understand Wirth's Pascal pl/0 compiler code

Is there a simple explanation of Wirth's source code or even a version with a little more commenting so that I can figure out how it works?
Wirth's PL/0 compiler is here: http://www.moorecad.com/standardpascal/plzero.pas
My main goal is to modify it to work with integer arrays, similarly to Oberon, while touching the code as little as possible.
Oberon referenced here: http://www.ethoberon.ethz.ch/WirthPubl/CBEAll.pdf
The code is described in detail in Wirth's book, Algorithms + Data Structures = Programs. I'm looking at the 1976 edition, which contains about 70 pages about the program.
As far as I can tell, the 1976 version of the book is not online, but he later ported the code to Modula-2 and then Oberon. The Oberon edition is available as a free PDF, but the PL/0 chapter was removed and expanded into a second book (also free online), Compiler Construction.
This expanded book uses a more robust language called Oberon-0, which includes arrays, records, types, etc. He discusses in detail how to implement each of these things.
The entire compiler is different, since it's written in Oberon and targets a different machine, but all of Wirth's compilers share the same basic structure, so you should be able to map ideas between them.
Alternatively, he also wrote another expanded compiler in Pascal (the "p4" reference implementation for ISO Pascal). That compiler has been extensively studied and documented in the book Pascal Implementation, now transformed into a nice website with hypertext cross-references to the source.
Finally, there is also a Python port of the PL/0 compiler by Samuel G. Williams. My fork of his PL/0 Languages Tools includes a couple of additional back-ends, as well as a copy of Wirth's original code (the program you linked), modified slightly to run under Free Pascal.

GCC: Inline assembly - good for?

So I just found out GCC could do inline assembly and I was wondering two things:
What's the benefit of being able to inline assembly?
Is it possible to use GCC as an assembly compiler/assembler to learn assembly?
I've found a couple of articles, but they are all oldish (2000 and 2001), so I'm not really sure of their relevance.
Thanks
The benefit of inline assembly is to have the assembly code, inlined (wait wait, don't kill me). By doing this, you don't have to worry about calling conventions, and you have much more control over the final object file (meaning you can decide where each variable goes: to which register, or whether it is stored in memory), because that code won't be optimized (assuming you use the volatile keyword).
Regarding your second question, yes, it's possible. What you can do is write simple C programs, and then translate them to assembly, using
gcc -S source.c
With this, and the architecture manuals (MIPS, Intel, etc) as well as the GCC manual, you can go a long way.
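For example, a tiny function is enough to get started (square.c is just a hypothetical file name):

/* square.c -- a toy function whose generated assembly is easy to follow */
int square(int x)
{
    return x * x;
}

gcc -S -O2 square.c leaves the generated assembly in square.s; comparing the -O0 and -O2 output is instructive too.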
There's some material online.
http://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html
http://gcc.gnu.org/onlinedocs/gcc-4.4.2/gcc/
The downside of inline assembly is that usually your code will not be portable between different compilers.
Hope it helps.
Inline Assembly is useful for in-place optimizations, and access to CPU features not exposed by any libraries or the operating system.
For example, some applications need strict tracking of timing. On x86 systems, the RDTSC instruction can be used to read the internal CPU timer.
Time Stamp Counter - Wikipedia
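As a sketch of what that looks like with GCC extended asm (x86/x86-64 only, and shown for illustration rather than as a ready-made timing facility, since RDTSC has ordering caveats):

#include <stdint.h>

/* Read the x86 time-stamp counter: RDTSC leaves the low 32 bits in EAX
   and the high 32 bits in EDX, captured by the "=a" and "=d" constraints. */
static inline uint64_t rdtsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
    return ((uint64_t)hi << 32) | lo;
}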
Using GCC or any C/C++ compiler with inline assembly is useful for small snippets of code, but many environments do not have good debugging support, which becomes more important when developing projects where inline assembly provides specific functionality. Also, portability will become a recurring issue if you use inline assembly. It is preferable to create specific items in a suitable environment (GNU assembler, MASM) and import them into projects as needed.
Inline assembly is generally used to access hardware features not otherwise exposed by the compiler (e.g. vector SIMD instructions where no intrinsics are provided), and/or for hand-optimizing performance critical sections of code where the compiler generates suboptimal code.
Certainly there is nothing to stop you using the inline assembler to test routines you have written in assembly language; however, if you intend to write large sections of code you are better off using a real assembler to avoid getting bogged down with irrelevancies. You will likely find the GNU assembler got installed along with the rest of the toolchain ;)
The benefit of embedding custom assembly code is that sometimes (dare I say, often times) a developer can write more efficient assembly code than a compiler can. So for extremely performance intensive items, custom written assembly might be beneficial. Games tend to come to mind....
As far as using it to learn assembly, I have no doubt that you could. But, I imagine that using an actual assembly SDK might be a better choice. Aside from the standard experimentation of learning how to use the language, you'd probably want the knowledge around setting up a development environment.
You should not learn assembly language by using the inline asm feature.
Regarding what it's good for, I agree with jldupont, mostly obfuscation. In theory, it allows you to easily integrate with the compiler, because the complex syntax of extended asm allows you to cooperate with the compiler on register usage, and it allows you to tell the compiler that you want this and that to be loaded from memory and placed in registers for you, and finally, it allows the compiler to be warned that you have clobbered this register or that one.
However, all of that could have been done by simply writing standard-conforming C code and then writing an assembler module, and calling the extension as a normal function. Perhaps ages ago the procedure call machine op was too slow to tolerate, but you won't notice today.
I believe the real answer is that it is easier, once you know the constraint DSL. People just throw in an asm and obfuscate the C program rather than go to the trouble of modifying the Makefile and adding a new module to the build and deploy workflow.
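For reference, the constraint DSL being discussed looks like this (a minimal sketch in x86 AT&T syntax; add_asm is an invented example, not a pattern to copy):

/* Add b into result with one inline instruction; the constraints tell GCC
   which registers it may pick and what the instruction clobbers. */
int add_asm(int a, int b)
{
    int result = a;
    __asm__ ("addl %1, %0"
             : "+r" (result)   /* "+r": read-write operand in any general register */
             : "r" (b)         /* "r":  input operand in any general register */
             : "cc");          /* the add clobbers the condition codes */
    return result;
}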
This isn't really an answer, but kind of an extended comment on other peoples' answers.
Inline assembly is still used to access CPU features. For instance, in the ARM chips used in cell phones, different manufacturers distinguish their offerings via special features that require unusual machine language instructions that would have no equivalent in C/C++.
Back in the 80s and early 90s, I used inline assembly a lot for optimizing loops. For instance, C compilers targeting 680x0 processors back then would do really stupid things, like:
calculate a value and put it in data register D1
PUSH D1, A7 # Put the value from D1 onto the stack in RAM
POP D1, A7 # Pop it back off again
do something else with the value in D1
But I haven't needed to do that in, oh, probably fifteen years, because modern compilers are much smarter. In fact, current compilers will sometimes generate more efficient code than most humans would. Especially given CPUs with long pipelines, branch prediction, and so on, the fastest-executing sequence of instructions is not always the one that would make most sense to a human. So you can say, "Do A B C D in that order", and the compiler will scramble the order all around for greater efficiency.
Playing a little with inline assembly is fine for starters, but if you're serious, I echo those who suggest you move to a "real" assembler after a while.
Manual optimization of loops that are executed a lot. This article is old, but can give you an idea about the kinds of optimizations hand-coded assembly is used for.
You can also use the assembler gcc uses directly. It's called as (see man as). However, many books and articles on assembly assume you are using a DOS or Windows environment, so it might be kind of hard to learn on Linux (maybe run FreeDOS in a virtual machine), because you not only need to know the processor you are coding for (you can usually download the official manuals) but also how to hook into the OS you are running.
A nice beginner book using DOS is the one by Norton and Socha. It's pretty old (the 3rd and latest edition is from 1992), so you can get used copies for like $0.01 (no joke). The only book I know of that is specific to Linux is the free "Programming from the Ground Up".
