Get the control-flow graph in assembly - gcc

I am desperately looking for a way to get the control-flow graph in assembly. I have the source code written in C and the processor is x86. I have already looked at gcc's documentation, and it only provides the CFG in GIMPLE and RTL format. Any idea how to get it in assembly format?

If all you need is to view a control flow graph of the program, I can suggest using the free evaluation version of the Interactive Disassembler, more commonly known as IDA.
If you visit their website, the screenshots section displays the graph view of a compiled method from the binary itself.

Related

Searching for recent GCC GIMPLE grammar

For my final year project I'm learning about compiler techniques, and currently I'm trying to experiment with the GCC intermediate representation (raw GIMPLE) and getting the control flow graphs from different source files (C, Cpp and Java) using GCC-5.4.
So far I can generate *.004t.gimple and *.011t.cfg raw files using -fdump-tree-all-graph-raw, but later I'm looking to understand the GIMPLE language more, so I searched for its grammar and I have found this:
GIMPLE WIKI
SIMPLE
GENERIC and GIMPLE
latest GIMPLE Doc (has no grammar!!!)
GCC FE
grammar for gcc-4.3.6
grammar for gcc-4.2.1
GIMPLE Doc for gcc-5.4.0 (has no grammar too!!!)
So the language seems to be constantly changing and to have multiple formats (High level GIMPLE, Low_level_GIMPLE, SSA GIMPLE, tree), and also the grammar seems to keep changing between versions, but I can't find the GIMPLE grammar for the recent versions, specifically the one used in GCC-5.4, and I can't understand the different formats.
Questions about the grammar:
Where can I find the GIMPLE grammar used in GCC-5.4 and more recent versions?
How is it written? (in BNF or EBNF or ...)
How does GCC implement this grammar to generate, parse and understand the Gimple files it generates and later transform them to RTL?
Is it possible for me to write a small subset of the GIMPLE grammar in Xtext from examples of *.004t.gimple files that I generate?
Questions about the formats:
What's the difference between the 3 Gimple formats? (I can't seem to find detailed documentation about each one in the wiki)
Which format is used in the raw files *.c.004t.gimple and *.c.011t.cfg? (High or Low, ...)
Which one better represents the control flow from the original source code without optimizations?
Thank You,
It looks like you are just starting to learn GIMPLE and did not even read the documents you posted above. I have been digging into the depths of GCC for some time and I will try to answer your questions.
Anyway, you need to read the gccint document, which lives here: https://gcc.gnu.org/onlinedocs/gccint.pdf. It helps answer some questions and gives some info about GIMPLE, and it is the only document where GIMPLE is described at all. The best description is in the sources; it is sad, but that is how it is. Look also here, http://www.netgull.com/gcc/summit/2003/GENERIC%20and%20GIMPLE.pdf; this document is based on gccint and consists of some extracts from it.
There is no "GIMPLE grammar" described in a clear way, like the C language; just look in the sources, and maybe some poor examples on the internet.
I think it is generated from a Tree-adjoining grammar (TAG), based on the SIMPLE IL used by the McCAT compiler project at McGill University [SIMPLE].
How does GCC implement and understand it? Again, you need to look into the depths of GCC: gimple.h, basic-block.h, tree-pass.h for example; all of these live in $src/gcc/. Some of the functions are described in gccint in the GIMPLE section. The gccint reference is not exactly accurate; it contains some outdated functions and references, you must remember that (FOR_EACH_BB for example, deprecated in 2013).
About Xtext: I never used it, and I do not understand the need to write some GIMPLE yourself. It is an intermediate language (IL); you can create a plugin for optimizing your code flow, but I cannot see the need to use GIMPLE separately.
About the format.
There is one GIMPLE format, but it can have two forms AFAIK. High GIMPLE is just GIMPLE that is not fully lowered and consists of the IL before the pass pass_lower_cf. High GIMPLE contains some container statements like lexical scopes (represented by GIMPLE_BIND) and nested expressions (e.g., GIMPLE_TRY). Low GIMPLE exposes all of the implicit jumps for control and exception expressions directly in the IL and EH region trees (EH means Exception Handling). There is also the RAW representation; it is some kind of Polish notation as I understand it, and IMO it is more useful than the usual representation; you can get it with -fdump-tree-all-all-raw for example.
*.c.004t.gimple - this is the first step where GIMPLE appears; *.c.011t.cfg - the first attempt at a control flow graph (cfg). The internal name of the GIMPLE lowering pass is "lower"; you can see it in gimple-low.c in the section
const pass_data pass_data_lower_cf =
{
GIMPLE_PASS, /* type */
"lower", /* name */
OPTGROUP_NONE, /* optinfo_flags */
TV_NONE, /* tv_id */
PROP_gimple_any, /* properties_required */
PROP_gimple_lcf, /* properties_provided */
0, /* properties_destroyed */
0, /* todo_flags_start */
0, /* todo_flags_finish */
};
You can search and find that this pass is *.c.007t.lower.
The answer is above, I think; I am using the RAW representation, it is more informative IMO.
It is not much, but I hope it helps you with your GCC exploration, and sorry for my bad "Engrish".
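The *.cfg dump the question asks about is plain text, and its block structure is easy to recover without a grammar: each basic block begins with a `<bb N>:` label and ends either in an explicit `goto <bb M>;`, a `return`, or falls through to the next block in dump order. The following Python is an added example, not code from the answer or from GCC; it parses a constructed dump written in the style of `gcc -O0 -fdump-tree-cfg` output (the function `classify` and the `D.1805` temporary are made up) into an edge list and emits DOT for Graphviz. The `-graph` dump option used in the question can also write .dot files directly; this shows what that file contains.

```python
import re

# Constructed sample in the style of a gcc -fdump-tree-cfg dump (not a real
# dump from the question); the '<bb N>' labels and 'goto <bb N>' jumps are
# what the parser keys on.
SAMPLE_CFG_DUMP = """\
;; Function classify (classify, funcdef_no=0, decl_uid=1800, cgraph_uid=0, symbol_order=0)

classify (int n)
{
  int r;
  int D.1805;

  <bb 2>:
  if (n > 0)
    goto <bb 3>;
  else
    goto <bb 4>;

  <bb 3>:
  r = 1;
  goto <bb 5>;

  <bb 4>:
  r = 0;

  <bb 5>:
  D.1805 = r;
  return D.1805;

}
"""

BB_LABEL = re.compile(r"^\s*<bb (\d+)>\s*:")   # '<bb 2>:'; gcc 8+ prints '<bb 2> :'
GOTO = re.compile(r"\bgoto <bb (\d+)>")
FUNC_HEADER = re.compile(r"^;; Function (\S+) ")


def parse_cfg_dump(text):
    """Return (function_name, blocks, edges): blocks maps bb number to its
    statement lines, edges is a sorted list of (src, dst) bb numbers."""
    func, blocks, order, current = None, {}, [], None
    for line in text.splitlines():
        m = FUNC_HEADER.match(line)
        if m:
            func = m.group(1)
            continue
        m = BB_LABEL.match(line)
        if m:
            current = int(m.group(1))
            blocks[current] = []
            order.append(current)
            continue
        # Declarations before the first <bb> and the closing brace are not
        # part of any block.
        if current is not None and line.strip() and line.strip() != "}":
            blocks[current].append(line.strip())
    edges = set()
    for i, bb in enumerate(order):
        stmts = blocks[bb]
        for s in stmts:
            for t in GOTO.findall(s):
                edges.add((bb, int(t)))
        last = stmts[-1] if stmts else ""
        # A block that does not end in an explicit jump or return falls
        # through to the next block in dump order (bb 4 -> bb 5 above).
        ends_control = last.startswith("goto ") or last.startswith("return")
        if not ends_control and i + 1 < len(order):
            edges.add((bb, order[i + 1]))
    return func, blocks, sorted(edges)


def to_dot(func, blocks, edges):
    """Render the parsed CFG as a Graphviz digraph, one box per block."""
    lines = ['digraph "%s" {' % func]
    for bb, stmts in blocks.items():
        label = "\\l".join(["bb %d" % bb] + stmts) + "\\l"   # \l = left-justified line
        lines.append('  bb%d [shape=box, label="%s"];' % (bb, label.replace('"', '\\"')))
    for src, dst in edges:
        lines.append("  bb%d -> bb%d;" % (src, dst))
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_dot(*parse_cfg_dump(SAMPLE_CFG_DUMP)))
```

Feed the printed text to `dot -Tpng` to view it. The same two regular expressions work on the raw (`-raw`) dumps the question generates, since the `<bb N>` labels and `goto` statements are printed the same way there; only the statement bodies change.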

Java Bytecode manipulation libraries

I am starting to work on a project and for one of the tasks I need to analyze the source code in order to gather information about the classes and their methods. More specifically, for each method I need to know which internal attributes and external objects (references) it uses throughout the entire method body.
I discussed it with my supervisors and they think that Bytecode manipulation libraries is the way to go. I already looked at BCEL, ASM and Javassist but I'm not sure which one I need to use. Do they all provide access to the method body where I can see all the instructions and get the information I need?
Any advice would be appreciated. Thank you!
If you really “need to analyze the source code”, then libraries which allow you to inspect the bytecode are not the way to go.
Otherwise, you really need to define your task precisely. Either you are about to analyze classes, regardless of whether you will look at their source code or byte code, or you want to analyze source code and consider doing it by compiling first, followed by analyzing the compiled result. In the latter case, you have to compare the effort of both steps with alternative solutions, which may, e.g., incorporate direct source code analysis.
Parsing byte code is rather easy, easier than analyzing source code, which is the reason why bytecode is produced prior to the execution of Java programs. To answer your concrete question: yes, all three libraries offer you a way to analyze the instructions and associated information. Which one best fits your needs is a question that is beyond the scope of Stackoverflow.
Whether analyzing the byte code helps depends on your exact requirements. When it comes to field and method access, you may precisely get most of them using that approach. Only inlined compile-time constants lack their origins. When it comes to type use, you have to consider that not every source code artifact has an existing counterpart in the byte code, e.g. widening casts produce no actual code, and local variables usually don’t have a declared type (debugging information aside), but only an implied type which depends on how they are actually used. They also have no information about Generics, unless debugging information has been included.

Compiling Pharo to C?

It is said that Pharo's VM (CogVM) is developed, tested, profiled, etc. in Smalltalk, but then the Smalltalk code is transcompiled to C, which is then compiled alongside some OS abstraction C code using the default system C compiler.
Well, I'd like to do a similar thing: I want to develop, test and profile code using Pharo, but then compile it to C. How can I do it? How does the compilation to C work? Does Pharo come with a Smalltalk to C transcompiler? How can I invoke it? Does it compile the full Smalltalk, or do I have to use some kind of a Smalltalk subset? Is there any good documentation about it?
The Pharo VM is hosted on github.
Follow the steps to build it and you'll get a Smalltalk image called "generator.image" which you can run (it's a regular image). Inside of that image you'll find the VMMaker package which is responsible for generating the C code from the special Smalltalk dialect used for this (which is called Slang; it's a subset of Smalltalk). Look at the code in the generator image to get a feel for what it does. There's also some information contained in the workspaces that are open when you first open the image.
As soon as you have the C sources it's basically straightforward C compilation (which we do with CMake + gcc / clang).
As for documentation: you should probably read the Blue Book.
Clarification
As #Leandro Caniglia points out in the comment, the purpose of Slang is to generate C source code for the VM. It has been designed to ease translation to C. That does not mean that:
arbitrary Smalltalk code can be translated to C using the generating mechanism
arbitrary Smalltalk code can be rewritten in Slang (at least not "easily")

how to get a graphical representation of a c++ code?

I have used Doxygen + graphviz to get images from my code (C++ in Eclipse), but it doesn't reflect the code flow; I mean I want to get the code structures (if/else, the while) in the images....
The CoFlo open source utility can generate control flow graphs for C and C++. See their live demo.

Extracting the Control Flow Graph from the gcc output

I am trying to extract the Control Flow Graph from the assembly code that gcc produces. I have managed to dump the CFG of several IRs (rtl phases) into .vcg files using the arguments -fdump-rtl-* and -dv. Is there any way to do the same thing but for the final assembly code? I would like a generic, target-independent and easy-to-parse representation (like the vcg representation). My source code is in C (in case that plays any important role).
Best regards,
Michalis.
Intel PTU and VTune will do it if you can run the app for profiling... not sure if it can generate the graph without having run the code though. Otherwise you might be looking at something like this: http://compilers.cs.ucla.edu/avrora/cfg.html.
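Both this question and the first one on the page ask for a CFG of the assembly gcc emits rather than of an intermediate representation. gcc has no pass after final assembly to dump one, but the listing from `gcc -S` is regular enough to split with the standard leader rule: a new basic block starts at the first instruction, at every jump target, and after every jump or return. The Python below is an added example, not code from either thread or from gcc; it does that for a constructed AT&T-syntax x86-64 listing (the function `classify` is made up) and emits the VCG text this asker wanted. Conditional jumps get a taken and a not-taken edge, `jmp` gets only its target, `ret` ends the function, and a computed jump (`jmp *...`, which gcc emits for switch tables) is flagged as `<indirect>` because its targets are not visible statically, which is exactly why the tools named in the answer prefer to run the program.

```python
import re

# Constructed x86-64 AT&T listing in the style of 'gcc -S' output (a
# hand-written sample, not output from the question's program).
SAMPLE_ASM = """\
    .text
    .globl  classify
    .type   classify, @function
classify:
    testl   %edi, %edi
    jle     .L2
    movl    $1, %eax
    jmp     .L3
.L2:
    movl    $0, %eax
.L3:
    ret
    .size   classify, .-classify
"""

LABEL = re.compile(r"^([.\w$]+):")
COND_JUMP = re.compile(r"^j(?!mp)[a-z]+$")   # je, jne, jle, ... but not jmp/jmpq


def parse_listing(text):
    """Flatten the listing into (labels, mnemonic, operands) tuples, dropping
    assembler directives, comments and blank lines. Labels attach to the
    next instruction."""
    items, pending = [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()      # strip AT&T '#' comments
        if not line:
            continue
        m = LABEL.match(line)
        if m:
            pending.append(m.group(1))
            continue
        if line.startswith("."):                 # .text, .globl, .size, ...
            continue
        parts = line.split(None, 1)
        items.append((pending, parts[0], parts[1].strip() if len(parts) > 1 else ""))
        pending = []
    return items


def build_cfg(text):
    """Split instructions into basic blocks, returned in listing order as
    {"name", "insns", "succ"}; succ holds block names, or "<indirect>" for a
    computed jump whose targets cannot be resolved statically."""
    items = parse_listing(text)
    if not items:
        return []
    targets = {op for _, m, op in items if m.startswith("j") and not op.startswith("*")}
    # Leaders: first instruction, every jump target, and every instruction
    # that follows a jump or return.
    leaders = {0}
    for i, (labels, mnem, op) in enumerate(items):
        if any(l in targets for l in labels):
            leaders.add(i)
        if (mnem.startswith("j") or mnem == "ret") and i + 1 < len(items):
            leaders.add(i + 1)
    starts = sorted(leaders)
    ends = starts[1:] + [len(items)]
    blocks, label_to_block = [], {}
    for s, e in zip(starts, ends):
        labels = items[s][0]
        name = labels[0] if labels else "B%d" % s   # unlabeled block: index-based name
        for l in labels:
            label_to_block[l] = name
        blocks.append({"name": name,
                       "insns": [(m + " " + op).strip() for _, m, op in items[s:e]],
                       "succ": []})
    for n, (blk, e) in enumerate(zip(blocks, ends)):
        _, mnem, op = items[e - 1]
        nxt = blocks[n + 1]["name"] if n + 1 < len(blocks) else None
        if mnem == "ret":
            continue                              # function exit: no successor
        if mnem.startswith("j"):
            blk["succ"].append("<indirect>" if op.startswith("*") else label_to_block.get(op, op))
            if COND_JUMP.match(mnem) and nxt:
                blk["succ"].append(nxt)           # not-taken edge
        elif nxt:
            blk["succ"].append(nxt)               # fallthrough (call included: intra-procedural CFG)
    return blocks


def to_vcg(blocks):
    """Render the blocks in the VCG format gcc's -dv dumps use."""
    out = ["graph: {", '  title: "cfg"']
    for b in blocks:
        label = "\\n".join([b["name"] + ":"] + b["insns"])
        out.append('  node: { title: "%s" label: "%s" }' % (b["name"], label))
    for b in blocks:
        for s in b["succ"]:
            out.append('  edge: { sourcename: "%s" targetname: "%s" }' % (b["name"], s))
    out.append("}")
    return "\n".join(out)


if __name__ == "__main__":
    print(to_vcg(build_cfg(SAMPLE_ASM)))
```

The parser is deliberately target-specific in one place only, the `j*`/`ret` mnemonic test, so porting it to another gcc backend means changing that test and the comment character; the leader rule and the edge construction do not change.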