How to analyse GCC Internal Representation like GIMPLE, RTL - gcc

I generated dump files with -fdump-tree-all and -fdump-rtl-all and got a lot of them. I have read that GIMPLE dumps use a pseudo-C syntax and that RTL dumps are too low-level to be understood easily. Are there any ways to understand GIMPLE and RTL dump files? Is there any software that can convert them to C code or something similarly useful? Is there a tutorial for learning to read them? Thanks

The best way (for me) is to dump a few examples and work through the emitted code yourself. It's not difficult; there are some changes from the original code (for example, loops are turned into if statements with gotos). GCC has a lot of passes, so my advice is to dump only what you need. In my case I frequently use these options:
-fdump-tree-lower
-fdump-tree-cfg
-fdump-tree-ssa
-fdump-tree-optimized (the last pass before the RTL passes)
RTL is almost incomprehensible, and reading it requires a solid understanding of that dialect.
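To make the loop-to-goto change mentioned above more concrete, here is a small hand-written C++ sketch. It is not actual GCC output: the function and label names (sum_loop, sum_lowered, cond, body, done) are made up for illustration, and real dumps use compiler-generated labels and temporaries. It only shows the general shape you should expect when comparing your source with a lowered dump.

```cpp
#include <cassert>

// Structured form, as one would write it in the source file.
int sum_loop(int n) {
    assert(n >= 0);
    int total = 0;
    for (int i = 0; i < n; ++i) {
        total += i;
    }
    return total;
}

// Hand-lowered form of the same function: the loop is expressed with
// labels, a conditional, and gotos, similar in spirit to a lowered dump.
// The label names here are hypothetical.
int sum_lowered(int n) {
    assert(n >= 0);
    int total = 0;
    int i = 0;
    goto cond;
body:
    total = total + i;
    i = i + 1;
cond:
    if (i < n) goto body; else goto done;
done:
    return total;
}
```

Both functions compute the same value for any non-negative n, so you can read the second one as a rough guide to what the first one turns into.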

Related

How (if possible) to generate a C/C++ AST with offsets of the variables and statements?

I want to generate an AST that lists the offset of each element of a program so that I can "track" it while it's running.
I've looked into using gcc --fdump-tree-switch-options=address, but this appears to be outdated, so I am looking for an alternative.
I know objdump can produce offsets but that is after the program has been assembled.
I'm looking for a way to either do it through the command line or using python.
Any pointers or help is much appreciated.

Searching for recent GCC GIMPLE grammar

For my final-year project I'm learning about compiler techniques, and I'm currently experimenting with the GCC intermediate representation (raw GIMPLE) and getting the control flow graphs from different source files (C, C++ and Java) using GCC-5.4.
So far I can generate *.004t.gimple and *.011t.cfg raw files using -fdump-tree-all-graph-raw, but I would also like to understand the GIMPLE language better, so I searched for its grammar and found these:
GIMPLE WIKI
SIMPLE
GENERIC and GIMPLE
latest GIMPLE Doc (has no grammar!!!)
GCC FE
grammar for gcc-4.3.6
grammar for gcc-4.2.1
GIMPLE Doc for gcc-5.4.0 (has no grammar too!!!)
So the language seems to be constantly changing and to have multiple formats (High-level GIMPLE, Low-level GIMPLE, SSA GIMPLE, tree), and the grammar also seems to keep changing between versions. But I can't find the GIMPLE grammar for recent versions, specifically the one used in GCC-5.4, and I can't understand the different formats.
Questions about the grammar :
Where can I find the GIMPLE grammar used in GCC-5.4 and more recent versions?
How is it written? (In BNF, EBNF, or ...)
How does GCC implement this grammar to generate, parse and understand the GIMPLE files it generates and later transform them to RTL?
Is it possible for me to write a small subset of the GIMPLE grammar in Xtext from examples of the *.004t.gimple files that I generate?
Questions about the formats:
What's the difference between the 3 GIMPLE formats? (I can't seem to find detailed documentation about each one in the wiki.)
Which format is used in the raw files *.c.004t.gimple and *.c.011t.cfg? (High or Low, ...)
Which one better represents the control flow of the original source code without optimizations?
Thank You,
It looks like you are just starting to learn GIMPLE and have not yet read the documents you posted above. I have been digging into GCC for some time, and I will try to answer your questions.
In any case, you need to read the gccint document here: https://gcc.gnu.org/onlinedocs/gccint.pdf. It answers some of these questions and gives some information about GIMPLE; it is the only document where GIMPLE is described in any detail. The best description is in the sources, which is sad but is how things are. Look also here: http://www.netgull.com/gcc/summit/2003/GENERIC%20and%20GIMPLE.pdf. That document is based on gccint and consists of extracts from it.
There is no "GIMPLE grammar" written down clearly the way there is for the C language. Just look in the sources; there may also be a few poor examples on the internet.
I think it is generated from Tree-adjoining grammar (TAG), based on the SIMPLE IL used by the McCAT compiler project at McGill University [SIMPLE].
How does GCC implement and understand it? Again, you need to look deep into GCC: gimple.h, basic-block.h and tree-pass.h, for example, all of which are in $src/gcc/. Some of the functions are described in the GIMPLE section of gccint. Keep in mind that gccint is not entirely accurate: it contains some outdated functions and references (FOR_EACH_BB, for example, deprecated in 2013).
About Xtext: I have never used it, and I do not understand the need to write GIMPLE yourself. GIMPLE is an intermediate language (IL); you can create a plugin to optimize your code flow, but I cannot see the need to use GIMPLE separately.
About the formats.
There is one GIMPLE format, but it can take two forms AFAIK. High GIMPLE is just GIMPLE that is not fully lowered; it is the IL before the pass pass_lower_cf. High GIMPLE contains container statements such as lexical scopes (represented by GIMPLE_BIND) and nested expressions (e.g., GIMPLE_TRY). Low GIMPLE exposes all of the implicit jumps for control and exception expressions directly in the IL and the EH region trees (EH means Exception Handling). There is also the RAW representation, which is some kind of Polish notation as I understand it; IMO it is more useful than the usual representation. You can get it with -fdump-tree-all-all-raw, for example.
*.c.004t.gimple is the first stage at which GIMPLE appears, and *.c.011t.cfg is the first attempt at a control flow graph (CFG). The internal name of the GIMPLE lowering pass is "lower"; you can see it in gimple-low.c, in this section:
const pass_data pass_data_lower_cf =
{
  GIMPLE_PASS, /* type */
  "lower", /* name */
  OPTGROUP_NONE, /* optinfo_flags */
  TV_NONE, /* tv_id */
  PROP_gimple_any, /* properties_required */
  PROP_gimple_lcf, /* properties_provided */
  0, /* properties_destroyed */
  0, /* todo_flags_start */
  0, /* todo_flags_finish */
};
You can search the dump files and find that this pass is *.c.007t.lower.
I think the answer to the last question is above; I use the RAW representation, which is more informative IMO.
It's not much, but I hope it helps you with your GCC exploration, and sorry for my bad "Engrish".
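Since the numbers in names like *.c.004t.gimple, *.c.007t.lower and *.c.011t.cfg appear to follow the pass order, a small helper can split such a name so that the dumps can be sorted or filtered. This is only a rough C++ sketch: DumpName and parse_dump_name are hypothetical names, and the layout <source>.<number><letter>.<pass> (with t for tree dumps, r for RTL dumps and i for IPA dumps) is an assumption based on the file names above, so it may need adjusting for other GCC versions or dump options.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Pieces of a dump file name such as "test.c.004t.gimple".
// The struct and its field names are hypothetical, for illustration only.
struct DumpName {
    bool valid = false;   // false if the name did not match the assumed layout
    std::string source;   // e.g. "test.c"
    int number = 0;       // e.g. 4
    char kind = '?';      // 't', 'r' or 'i' (assumed: tree, RTL, IPA)
    std::string pass;     // e.g. "gimple"
};

DumpName parse_dump_name(const std::string& file) {
    // Assumed layout: <source>.<digits><t|r|i>.<pass name without dots>
    static const std::regex pattern(R"re((.+)\.(\d+)([tri])\.([^.]+))re");
    DumpName out;
    std::smatch m;
    if (!std::regex_match(file, m, pattern)) return out;
    assert(m.size() == 5);  // whole match plus four capture groups
    out.valid = true;
    out.source = m[1].str();
    out.number = std::stoi(m[2].str());
    out.kind = m[3].str()[0];
    out.pass = m[4].str();
    return out;
}
```

With this, parse_dump_name("test.c.007t.lower") gives number 7, kind 't' and pass "lower", which is enough to order the files the same way the passes are numbered.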

Boost Spirit Qi : Is it suitable language/tool to analyse/cut a "multiline" data file?

I want to apply various operations to data files: set algebra, statistics, reporting, modifications. But the format of the files is far from the usual code examples and a bit weird. There are different sorts of items and item types, and some of them are grouped together as a collection. There is a simplified example below.
I'm new to boost::spirit, and I have tried writing code to split the items and get the basic information (name, version, date) required for most of the processing. In the end it seems tricky to me. Is the problem my lack of skill, or is boost::spirit not suited to this format?
Studying boost::spirit is not a waste of time; I am sure I will use it later. But I didn't find code examples that resemble my case, so I may not be going the right way.
>>>process_type_A
//name(typeA_1)
//version(A.1.99)
//date(2016.01.01)
//property1 "pA11"
//property2 "pA12"
//etc_A_1 (thousand of lines - a lot are "multiline" and/or mulitline sub-records)
<<<process_type_A
>>>process_type_A
//name(typeA_2)
//version(A.2.99)
//date(2016.01.02)
//property1 "pA21"
//property2 "pA22"
//etc_A_2 (hundred or thousand of lines)
<<<process_type_A
>>>process_type_B
//name(typeB_1)
//version(B.1.99)
//date(2016.02.01)
//property1 "pB11"
//property2 "pB12"
//etc_B_1 (hundred or thousand of lines)
<<<process_type_B
>>>paramset_type_C
//>>paramlist
////name(typeC_1)
////version(C.1.99)
////date(2016.03.01)
////property1 "pC11"
////property2 "pC12"
////etc_C_1 (hundred or thousand of lines)
//<<paramlist
//>>paramlist
////name(typeC_2)
////version(C.2.99)
////date(2016.04.01)
////property1 "pC21"
////property2 "pC22"
////etc_C_2 (hundred or thousand of lines)
//<<paramlist
<<<paramset_type_C
Code::Blocks
Boost 1.60.0
GCC Compiler on Windows and Linux
I think @Orient is right: regex w/captures is enough here.
However, Spirit has the upside of coming without a linker dependency. Here are some approaches (using seek[] and raw[]) for inspiration:
Boost spirit revert parsing
rule to extract key+phrases from a text document
Parsing text file with binary envelope using boost Spirit (binary content)
much more involved logic: How to implement #ifdef in a boost::spirit::qi grammar?
Note that Spirit X3 (still experimental) also has a seek[] directive and it will compile much faster.
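As a rough illustration of the regex-with-captures route for the name/version/date fields, here is a minimal C++ sketch that uses std::regex from the standard library rather than Spirit. ItemHeader and extract_headers are hypothetical names, and the sketch assumes that every record's name(...) line comes before its version(...) and date(...) lines, as in the sample above; all other lines are simply skipped.

```cpp
#include <cassert>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Basic information for one item; the type name is hypothetical.
struct ItemHeader {
    std::string name, version, date;
};

// Scans the text line by line and collects name/version/date fields.
// Assumption: a name(...) line starts a new item, as in the sample data.
std::vector<ItemHeader> extract_headers(const std::string& text) {
    // One or more leading slashes, a known key, then a parenthesised value.
    static const std::regex field(R"re(/+(name|version|date)\(([^)]*)\)\s*)re");
    std::vector<ItemHeader> items;
    std::istringstream in(text);
    std::string line;
    std::smatch m;
    while (std::getline(in, line)) {
        if (!std::regex_match(line, m, field)) continue;  // property lines etc.
        assert(m.size() == 3);  // whole match plus key and value
        const std::string key = m[1].str();
        const std::string value = m[2].str();
        if (key == "name") {
            items.push_back({value, "", ""});
        } else if (!items.empty()) {
            if (key == "version") items.back().version = value;
            else items.back().date = value;
        }
    }
    return items;
}
```

Because the pattern accepts any number of leading slashes, the nested ////name(...) lines inside a paramlist are picked up the same way as the top-level //name(...) lines.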
The main advice I would give about Qi is that it is a very powerful and flexible tool for parsing. You can define quite complicated, possibly recursive structures, using boost::variant, boost::optional, etc., and associate these types with qi rules and it seemingly magically does the right thing, giving you a nice AST for your data.
The biggest sources of difficulty in my (limited) experience are when you try to make it do more than that and also process the data. It's sometimes tempting to try to "eagerly" do some processing at the same time that you are parsing the data, often in a semantic action or something. Don't do it! It usually makes things harder to read in the end, a bit harder to debug, and sometimes you can be surprised what will happen if the grammar has to backtrack across your semantic action which it already executed.
qi should work great if you can write a nice grammar for your data. If you can't write an unambiguous grammar, you might be able to use qi::eps to make it parseable but you don't want to have to do that too often IMO. I don't think "hundreds or thousands" of items will pose any particular problem.
Right now the question is rather opinion-oriented -- if you can post a more complete description of the data format you have, or better, a complete code example which is failing, it might make it easier to give precise answers.

How to convert a VHDL code in Verilog using Icarus Verilog?

I can't find an example in the docs for converting VHDL code to Verilog with Icarus. I found how to go from Verilog to VHDL here.
I tried to modify the command to do the VHDL conversion on this code:
$ iverilog -tvlog95 -o button_deb.v button_deb.vhdl
button_deb.vhdl:3: syntax error
I give up.
But I get a syntax error. Is my VHDL code wrong, or is the iverilog command wrong?
There's no Verilog target, so you can't generate Verilog output, and VHDL compilation is still experimental anyway. You could ask on the mailing list to make sure there's nothing under the hood which could help. VHDL to Verilog conversion is only possible in relatively simple cases (synthesisable code should be Ok), so you may have to do it manually anyway.
It seems that some support has arrived in the meantime (mainly using the -g2005-sv, -g2009, or -g2012 switch). Try this:
iverilog -g2012 -tvlog95 -o button_deb.v button_deb.vhd
If you pay closer attention to the output, you'll see that this way you lose the two generics at the entity interface. Using vhdlpp directly could be useful:
/path/to/vhdlpp button_deb.vhd > button_deb.v

Extracting the Control Flow Graph from the gcc output

I am trying to extract the control flow graph from the assembly code that gcc produces. I have managed to dump the CFG of several IRs (RTL phases) into .vcg files using the arguments -fdump-rtl-* and -dv. Is there any way to do the same thing for the final assembly code? I would like a generic, target-independent and easy-to-parse representation (like the vcg representation). My source code is in C (in case that matters).
Best regards,
Michalis.
Intel PTU and VTune will do it if you can run the app for profiling... not sure if it can generate the graph without having run the code though. Otherwise you might be looking at something like this: http://compilers.cs.ucla.edu/avrora/cfg.html.
