Searching for recent GCC GIMPLE grammar - gcc

For my finale year project I'm learning about compiler techniques, and currently I'm trying to experiment with the GCC intermediate representation (raw GIMPLE) and getting the control flow graphs from different source files (C, Cpp and Java) using GCC-5.4.
So far i can generate *.004t.gimple and *.011t.cfg raw files using -fdump-tree-all-graph-raw but later I'm looking to understand more the GIMPLE language so i searched for its grammar and i have found this :
GIMPLE WIKI
SIMPLE
GENERIC and GIMPLE
latest GIMPLE Doc (has no grammar!!!)
GCC FE
grammar for gcc-4.3.6
grammar for gcc-4.2.1
GIMPLE Doc for gcc-5.4.0 (has no grammar too!!!)
So the language seems to be constantly changing and have multiple formats (High level GIMPLE, Low_level_GIMPLE, SSA GIMPLE, tree) and also the grammar seems to keep changing between versions but i can't find the GIMPLE grammar for the recent versions and specifically the one used in GCC-5.4 and i can't understand the different formats.
Questions about the grammar :
where can i find the GIMPLE grammar used in GCC-5.4 and more recent versions?
how is it written ? (in BNF or EBNF or ...)
How does GCC implement this grammar to generate, parse and understand
Gimple files it generates and later transform them to RTL?
is it possible for me to write a small subset of the GIMPLE grammar
in Xtext from examples of *.004t.gimple files that i generate?
Questions about the formats:
What's the difference between the 3 Gimple formats? (i can't seem to
find detailed documentation about each one in the wiki)
which format is used in the raw files *.c.004t.gimple and
*.c.011t.cfg ? (High or Low, ...)
which one represents better the control flow from the original source
code without optimizations ?
Thank You,

It looks like you just starting to learn GIMPLE and did not even read documents you`re posted above. I am digging in depth of GCC for some time and I will try to answer your questions.
Anyway you need to read gccint document lays here: https://gcc.gnu.org/onlinedocs/gccint.pdf it helps to answer some questions and gives some info about GIMPLE, and this is the only document where GIMPLE is described at least somehow. The best description in sources, it is sad but as is. Look also here, http://www.netgull.com/gcc/summit/2003/GENERIC%20and%20GIMPLE.pdf, this document based on gccint and consist of some extract from.
There is no "GIMPLE grammar" described in a clear way, like C language, just look in sources, maybe some poor examples on the internet.
I think it is generated from Tree-adjoining grammar(TAG), based on SIMPLE IL used by the McCAT compiler project at McGill University [SIMPLE].
How GCC implement and understand? And again you need to look in depths of GCC, gimple.h, basic-block.h, tree-pass.h for example, all of these lays in $src/gcc/. Some part of the functions is described in gccint in section GIMPLE. The reference gccint is not exactly accurate, it consists of some outdated functions and references, you must remember that(FOR_EACH_BB for example, deprecated in 2013).
About Xtext, I never used that, and I do not understand the need to write some GIMPLE yourself, which is intermediate language IL you can create a plugin for optimizing your code flow, but I can not see the need to use GIMPLE separately.
About format.
There is one GIMPLE format, but it can have two forms AFAIK. GIMPLE HIGH it is just GIMPLE that is not fully lowered and consists of the IL before the pass pass_lower_cf. High GIMPLE contains some container statements like lexical scopes (represented by GIMPLE_BIND) and nested expressions (e.g., GIMPLE_TRY). Low GIMPLE exposes all of the implicit jumps for control and exception expressions directly in the IL and EH region trees(EH means Exception Handling). There is also RAW representation, it is some kind of polish notation as I understand, IMO it more useful than usual representation, you can get it with -fdump-tree-all-all-raw for example.
*.c.004t.gimple - this is the first step of GIMPLE appear, *.c.011t.cfg - first attempt for control flow graph(cfg). The internal name of GIMPLE lower is "lower" you can see them in gimple-low.c in section
const pass_data pass_data_lower_cf =
{
GIMPLE_PASS, /* type */
"lower", /* name */
OPTGROUP_NONE, /* optinfo_flags */
TV_NONE, /* tv_id */
PROP_gimple_any, /* properties_required */
PROP_gimple_lcf, /* properties_provided */
0, /* properties_destroyed */
0, /* todo_flags_start */
0, /* todo_flags_finish */
};
You can use search and find that this pass is *.c.007t.lower
The answer is above I think, I am using RAW representation it is more informative IMO.
It not much, but I hope it helps you with your GCC exploration, and sorry for my bad "Engrish".

Related

Where is __builtin_va_start defined?

I'm trying to locate where __builtin_va_start is defined in GCC's source code, and see how it is implemented. (I was looking for where va_start is defined and then found that this macro is defined as __builtin_va_start.) I used cscope -r in GCC 9.1's source code directory to search the definition but haven't found it. Can anyone point where this function is defined?
That __builtin_va_start is not defined anywhere. It is a GCC compiler builtin (a bit like sizeof is a compile-time operator). It is an implementation detail related to the <stdarg.h> standard header (provided by the compiler, not the C standard library implementation libc). What really matters are the calling conventions and ABI followed by the generated assembler.
GCC has special code to deal with compiler builtins. And that code is not defining the builtin, but implementing its ad-hoc behavior inside the compiler. And __builtin_va_start is expanded into some compiler-specific internal representation of your compiled C/C++ code, specific to GCC (some GIMPLE perhaps)
From a comment of yours, I would infer that you are interested in implementation details. But that should be in your question
If you study GCC 9.1 source code, look inside some of gcc-9.1.0/gcc/builtins.c (the expand_builtin_va_start function there), and for other builtins inside gcc-9.1.0/gcc/c-family/c-cppbuiltin.c, gcc-9.1.0/gcc/cppbuiltin.c, gcc-9.1.0/gcc/jit/jit-builtins.c
You could write your own GCC plugin (in 2Q2019, for GCC 9, and the C++ code of your plugin might have to change for the future GCC 10) to add your own GCC builtins. BTW, you might even overload the behavior of the existing __builtin_va_start by your own specific code, and/or you might have -at least for research purposes- your own stdarg.h header with #define va_start(v,l) __my_builtin_va_start(v,l) and have your GCC plugin understand your __my_builtin_va_start plugin-specific builtin. Be however aware of the GCC runtime library exception and read its rationale: I am not a lawyer, but I tend to believe that you should (and that legal document requires you to) publish your GCC plugin with some open source license.
You first need to read a textbook on compilers, such as the Dragon book, to understand that an optimizing compiler is mostly transforming internal representations of your compiled code.
You further need to spend months in studying the many internal representations of GCC. Remember, GCC is a very complex program (of about ten millions lines of code). Don't expect to understand it with only a few days of work. Look inside the GCC resource center website.
My dead GCC MELT project had references and slides explaining more of GCC (the design philosophy and architecture of GCC changes slowly; so the concepts are still relevant, even if individual details changed). It took me almost ten years full time to partly understand some of the middle-end layers of GCC. I cannot transmit that knowledge in a StackOverflow answer.
My draft Bismon report (work in progress, funded by H2020, so lot of bureaucracy) has a dozen of pages (in its sections ยง1.3 and 1.4) introducing the internal representations of GCC.

What is this apparently non-standard struct packing syntax fed into GCC?

I am a little bit dumbstruck by some code that is associated with a 3rd-party code base I'm working with. All code is written in C or assembler except for a number of files adhering to the syntax described below. I cannot find any documentation on this syntax yet GCC swallows it without any problem. It's GCC 8 I work with. The syntac must be some extension to GCC. It would be very nice if somebody could enlighten me as to exactly what extension it is and where it is documented.
The code obviously defines struct types with packing and uses syntax like this:
Comment lines begin with "--"
Keywords are "block", "padding", "field", and "field_high", possibly more. A typical piece of code looks like this:
block <BLOCK_NAME> {
field <FIELD_NAME_NO_1> 1
field <FIELD_NAME_NO_2> 1
padding 8
field_high <FIELD_NAME_NO_3> 6
}
A block can contain any number of fields and paddings. The numbers given always add up to a word length on the target architecture.
Files containing this kind of code most often have ".bf" es their extension while ".c" can occur too. Some files have #include's referring to ordinary C headers while some ordinary C files have #includes referring to ".bf" files.
A quick glance at the tools directory in the Git repository found me bitfield_gen.py, which claims to be a code generator for "bitfield structures". I presume that's what .bf stands for.
There are some CMake functions for building bitfield targets in tools/helpers.cmake. That will probably make sense to people more familiar with CMake than I am.
The Bit Field Generator is documented here http://research.davidcock.fastmail.fm/papers/Cock_08.pdf

Make: Uses of static pattern rules in make

I would like to know the Uses of static pattern rules against normal rules in make. I an new to make and gone through some tutorials. I want to know when do we use this static pattern rules ? Could you please explain in brief ?
Thanks in Advance.
Your question is mostly a matter of opinion. Notice that there are several build automation tools (not only GNU make), e.g. also ninja, scons, omake, etc...
When you code in C (or in C++....) some project, you could have some C (or C++) files which are generated from something else (e.g. by lemon or by your own utility...). For such cases (pedantically you could call them metaprogramming), pattern rules could be useful (in particular if you have several such cases in a project). In other cases you generate other files (than object files) from C source (e.g. generating documentation with doxygen), and then pattern rules are also very useful.
An example of a large C++ project with many C++ code generators is the GCC compiler. And back when (in 2009) GCC was coded in C, it already had a dozen of specialized code generator programs emitting some C code. For these cases, pattern rules could be convenient.
Of course, pattern rules are a luxury. You could in principle generate your Makefile and have it contain a simple rule for each individual file. (in GCC, the Makefile-s are generated by autoconf and automake based things...)
If you observe and study the source code of most large free software projects, you'll find out that most of them do have generators for C (or C++) files. So generating C code is a usual practice (the original Unix from late 1970s did that already). Today, some software projects have most or even all (e.g. CAIA) of their C code generated.

How to analyse GCC Internal Representation like GIMPLE, RTL

I have generated dump output files using command -fdump-tree-all and -fdump-rtl-all and I got a lot of dump files. I have read that the codes in GIMPLE are in pseudo-C syntax and RTL dump files are too low level to be understood. Is there any ways to understand GIMPLE and RTL dump files? Any software that can convert it to C code or something useful? Any tutorial to learn to understand it? Thanks
the best way to do it (for me) is to dump some examples and understand by yourself the emitted code. It's not difficult, there are some change from the original code (like cycles are transformed in if with goto), there are a lot of passes in gcc and my advice is to dump what you need. In my case i use frequently the commands:
-fdump-tree-lower
-fdump-tree-cfg
-fdump-tree-ssa
-fdump-tree-optimized (it's the last pass before going into rtl passes)
rtl is almost incompressible and it's needed a great understanding over that dialect

Get the control-flow graph in assembly

I am desperately looking for a way to get the control-flow graph in assembly. I have the source code written in C and the processor is x86. I have already looked at the gcc's documentation and it does provide cfg in only gimple and rtl format. Any idea how to get it in assembly format?
If all you need is to view a control flow graph of the program I can suggest to use the free evaluation version of the Interactive Disassembler, more commonly known as IDA.
If you visit their website, under the screenshots section it displays the graph view of a compiled method from the binary itself.

Resources