Building complete control flow graph for Linux kernel binary - linux-kernel

Are there any tools that can build the control flow graph for an entire Linux kernel binary? For example, consider Linux kernel compiled for x86 architecture (vmlinux file).
Is it possible to determine all execution paths (disregarding indirect branches or other control flows that need runtime information) using static analysis only? Are there any tools suitable for this?

Our DMS Software Reengineering Toolkit with its C Front End can do this.
DMS provides generic parsing, control flow graph and call graph construction; the C front end provides C-specific parsing details and the logic for constructing C-specific flow graphs, including indirect gotos, as well as a points-to analysis that has been used on code systems of some 16 million lines, so it should handle the Linux kernel. The flow graphs are produced one per compilation unit; the call graph is for a set of linked compilation units. All this information is available as DMS data structures, and/or exportable as XML if you insist and can stomach gigabytes of output.
You can see examples of Control flow, Data Flow, and Call graphs.

You can try CppDepend, it provides a powerful dependency graph with many features.
However you have to analyze the source code and not the binaries.

There are two tools (CodeViz and Egypt) that can generate a call graph during compilation.
I don't think they will help you much in learning the Linux kernel. Many execution paths depend on macros and runtime conditions, so the call graph generated by a static analyzer is not very practical. You still need printk and dmesg to figure out what happened in some functions; in practice, printk is more useful than these tools.

GrammaTech CodeSonar can perform static analysis on binary code (https://www.grammatech.com/products/binary-analysis) and it allows you to visualize and navigate the control-flow graph. This is a commercial tool though.
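For a rough do-it-yourself starting point, the direct-call part of the question can be approximated from a textual disassembly. The following is only a sketch, assuming the usual `objdump -d` text layout for x86 (a `<name>:` header per function, and `call`/`callq` lines that end in `<target>`); the function name `direct_call_graph` is hypothetical. It yields a call graph only, not per-function control flow, and it skips indirect calls, as the question allows.

```python
import re

# Assumed objdump -d layout: "ffffffff81000000 <name>:" starts a function,
# and direct calls look like "call   ffffffff81000020 <target>".
FUNC = re.compile(r"^[0-9a-f]+ <([^>]+)>:\s*$")
CALL = re.compile(r"\bcallq?\s+[0-9a-f]+ <([^>+]+)")

def direct_call_graph(disassembly):
    """Map each function name to the set of functions it calls directly."""
    graph = {}
    current = None
    for line in disassembly.splitlines():
        header = FUNC.match(line)
        if header:
            current = header.group(1)
            graph.setdefault(current, set())  # keep leaf functions in the graph
            continue
        call = CALL.search(line)
        if call and current is not None:
            # "printk+0x4" style targets are reduced to "printk";
            # indirect calls such as "call *%rax" do not match and are skipped.
            graph[current].add(call.group(1))
    return graph
```

One way to use it would be to run `objdump -d vmlinux > vmlinux.dis` and pass the contents of that file to the function; expect the output for a full kernel to be very large.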

Related

Is it possible to trace the OpenMP source code?

I would like to get a deep understanding of OpenMP and its internal mechanisms, its data structures, and algorithms at an operating system level (for example, I suppose that for task affinity each place has its own task queue and in the case of untied tasks there is a task migration / stealing between run queues). Is there a database of technical papers that describe all this stuff as well as a guide describing what files in the gcc source tree are of interest?
I searched the GCC source tree for filenames containing the string omp and I found some results, but I don't know if these are all the associated files.
I would like to get a deep understanding of OpenMP and its internal mechanisms, its data structures, and algorithms at an operating system level
OpenMP is a standard, not an implementation. There are multiple implementations. Two are mainstream: GOMP, associated with GCC, and IOMP, associated with Clang (and ICC).
Is there a database of technical papers that describe all this stuff as well as a guide describing what files in the GCC source tree are of interest?
AFAIK, not for GOMP. The code is the reference for this, along with the associated documentation (more specifically this page). The code changes over time, so any such document would quickly become obsolete (especially since new versions of the OpenMP specification are released relatively frequently, sometimes causing changes deep in the target implementation).
Note that there is some generated documentation online like this one, but it looks like it is really obsolete now.
I searched the GCC source tree for filenames containing the string omp and I found some results, but I don't know if these are all the associated files.
Generally, an OpenMP implementation is written in two parts. One part is in the compiler: it parses pragmas and then converts them to runtime calls (e.g. parallel sections), or tunes the compiler behaviour (e.g. SIMD directives). This is a kind of front-end. The other part is the runtime, which is the heart of the implementation, a kind of back-end, where the dynamic data structures live (e.g. for tasks, barriers, parallel sections, etc.). GOMP is implemented that way. That being said, the two parts are more closely interrelated than in other implementations like IOMP (AFAIK, GOMP is not meant to be used from a compiler other than GCC).
The code is available here. "loop.c" is probably the first file to look at to understand the implementation. GOMP is relatively simple overall.
I suppose that for task affinity each place has its own task queue and in the case of untied tasks there is a task migration / stealing between run queues
Task affinity is a new feature that has only been supported recently (in GCC 12). I would not be surprised if it were a no-op (this is not rare for new features). In fact, "affinity.c" tends to confirm this. As for the queues, the last time I looked at the code, GOMP was using a central queue (which does not scale).
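To make the "central queue" idea concrete, here is a purely conceptual sketch in Python. It is not GOMP code and does not mirror its data structures; `run_with_central_queue` is a hypothetical name. It only illustrates the shape of the design: every worker pulls from the same shared queue, so all workers contend on one structure, which is why such a design tends not to scale the way per-thread queues with work stealing can.

```python
import queue
import threading

def run_with_central_queue(tasks, num_workers=4):
    """Run callables taken from ONE shared queue; return results (unordered)."""
    pending = queue.Queue()
    for task in tasks:
        pending.put(task)

    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            try:
                # Every worker contends on this single queue.
                task = pending.get_nowait()
            except queue.Empty:
                return  # no work left
            value = task()
            with results_lock:
                results.append(value)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```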

Is There a Way of Providing asm.js or WebAssembly Code to V8 Turbofan?

After looking into the recently announced support for WebAssembly, it occurs to me that it would dramatically increase its utility if there were some way to:
1. Have TurboFan, the successor to the V8 JIT Crankshaft optimizer, output all the assembly code it generates along with the static type signatures and execution profile of that generated code.
2. Permit the programmer to provide their own asm.js/WebAssembly code for specific static type signatures that override the optimizer.
Is there some way to do this already?
There is some indication that it may be from the following passage from this article:
Under the hood, the WebAssembly implementation in V8 is designed to
reuse much of the existing JavaScript virtual machine infrastructure,
specifically the TurboFan compiler. A specialized WebAssembly decoder
validates modules by checking types, local variable indices, function
references, return values, and control flow structure in a single
pass. The decoder produces a TurboFan graph which is processed by
various optimization passes and finally turned into machine code by
the same backend which generates machine code for optimized JavaScript
and asm.js. In the next few months, the team will concentrate on
improving the startup time of the V8 implementation through compiler
tuning, parallelism, and compilation policy improvements.
To expand on the idea for a more general audience:
Typical top-down optimization involves high level programming and then execution profiling to identify which pieces of code require more effort. This is true whether the optimization is automated code generation or manual coding of optimized code. In the case of dynamically typed languages you'll frequently want to go beyond just optimizing dynamically-typed algorithms and provide code specialized for specific static types. This is, in fact, what the V8 JIT optimizer does automatically. If humans want to manually provide some particularly 'hot' specialized cases, they'd need to inform the automated optimizer, somehow, that they have already done the work so the automated optimizer can incorporate the manually optimized code rather than automatically generating suboptimal code.
No, that's not possible, and it's highly unlikely that it ever will be, given that it would probably require piercing all sorts of abstraction barriers within the system. The complexity would be enormous, and the effect on maintainability and security would probably be severe.
The web interface to WebAssembly modules (through the Wasm object) provides a clean and simple way to interface between JS and Wasm. In the future, ES6 modules might simplify interop further. It's not obvious what advantage a complicated mechanism like you propose would have over that.
For 1. you can play with the following flags:
trace_turbo: trace generated TurboFan IR
trace_turbo_graph: trace generated TurboFan graphs
trace_turbo_cfg_file: trace turbo cfg graph (for C1 visualizer) to a given file name
trace_turbo_types: trace TurboFan's types
trace_turbo_scheduler: trace TurboFan's scheduler
trace_turbo_reduction: trace TurboFan's various reducers
trace_turbo_jt: trace TurboFan's jump threading
trace_turbo_ceq: trace TurboFan's control equivalence
turbo_stats: print TurboFan statistics
They may change in future versions of V8 and aren't a stable API.
TurboFan is pretty complicated in that it consumes information from the baseline JIT / the interpreter, and may get to that information after deopt. The compiler isn't always a straight pipeline from JS / wasm to assembly. Inlining and a bunch of other things affect what happens.
For 2.: write wasm code or valid asm.js in the first place.
We've discussed performing a bunch of different types of dynamic tracing, caching traces (and allowing injection of traces for testing), but that's probably not something we'd expose considering that there's already a way to give the compiler precise type information!

Does a Tool for Automatically Visualizing a Project's Source Code's Control Flow In-Line Exist?

I would like to be able to use a tool that lets you visualize a program's control flow(s) in the context of its source code. To clarify, such a tool should basically show what happens in a program by spitting out a human-readable abstract syntax tree in the form of a multidigraph with nodes containing snippets of source-code translation units. The resulting graph initial node would, I presume, contain the block of code starting with a program's entry point (that'd be main for a C or C++ program.) New nodes would be created when a node needs to reference another block of code, whether that might be in the current file or in another one, and arrows would connect the nodes. Does such a tool exist, or would it have to be created from scratch?
You aren't going to get a tool that does this for arbitrary languages off the shelf. There are too many languages, each with its own syntax and semantics. You need a tool per language. You might find such tools for very commonly used languages, e.g., Understand for Software.
I think that the only way to do this is to build metatools that enable the construction of language-specific tools relatively easily. Such a tool has to have the common machinery needed by all such language processing tools: strong parsers (so writing grammars for languages is relatively straightforward), AST construction machinery, symbol table support, routines to build control and data flow graphs. By providing such machinery, one can build language front ends for modest costs.
There's a class of tools that does this, program transformation. Most of them have parsing engines, but not the rest of the mechanisms I have suggested above.
I believe this enough to have invested 20 years of my life to building such meta tools. Our DMS Software Reengineering Toolkit shows its strength in being able to parse some 50+ languages, including the stunningly hard C++14 (both MS and GNU variants). It shows symbol table support and control flow graph construction for COBOL, Java, C, C++. (We can't do everything at once; pedaling as fast as practical).
[DMS builds these graphs as data structures rather than "showing" them; the examples on that page are drawn with the additional help of DOT].
One of the few other tools that tries to do this is Clang/LLVM; this covers a wide variety of popular languages. Clang doesn't have any specific support for parsing that I know about; you get to code it all yourself. I think you get control flow graphs only after you convert the language to LLVM. I don't think it has any specific support for drawing control flow graphs, either.
An older tool with a good reputation for multi-language support in this space is CoCo/R; I don't know a lot about it. I know it parses, and has some support for ASTs; I don't know what it does about control flow analysis.
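As a small illustration of the per-language point, here is a sketch for one language only (Python source, using the standard `ast` module). It is a call graph rather than a full control flow graph: each node is labeled with the first source line of a function, and edges follow direct calls by name. The function name `call_graph_dot` is hypothetical, and the output is DOT text that Graphviz could draw.

```python
import ast

def call_graph_dot(source):
    """Return DOT text: one box per function, one edge per direct call by name."""
    tree = ast.parse(source)
    funcs = {n.name: n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)}
    lines = ["digraph calls {", "  node [shape=box];"]
    for name, node in funcs.items():
        # Use the "def ..." line as the source snippet shown inside the node.
        header = ast.get_source_segment(source, node).splitlines()[0]
        label = header.replace('"', '\\"')
        lines.append(f'  "{name}" [label="{label}"];')
        callees = sorted({
            call.func.id
            for call in ast.walk(node)
            if isinstance(call, ast.Call)
            and isinstance(call.func, ast.Name)
            and call.func.id in funcs
        })
        for callee in callees:
            lines.append(f'  "{name}" -> "{callee}";')
    lines.append("}")
    return "\n".join(lines)
```

Method calls, calls through variables, and anything resolved at runtime are ignored here, which is exactly the kind of language-specific semantics a real tool has to model.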

GCC code statistics/analysis

Does GCC/G++ have an option available to output analysis?
It would be useful to be able to compare differences between the previous code with the new one (size, sizes of classes/structures). Those can then be diff'd with the previous output for comparison, which could be useful for many purposes.
If no such output analysis is available, what is the best way to obtain such information?
GCCXML is a GCC variant that dumps symbol and type declaration data in an XML format. That may or may not have the "properties" you care about in it.
If you want specific information, you may be able to bend GCC to produce it. The learning curve for this is likely long and hard, because GCC wants to be a compiler, not a your-favorite-property-dumper, and it is a big, complex tool. You probably have some kind of chance with "struct size" as the compiler must compute that somewhere and it seems reasonable that it would be stored with information about the struct declaration. I'd expect that some of the command line switches do output some information and you might consider trying to emulate those. If you want really odd properties such as "register pressure inside a loop" you'll have to reach deeply inside a compiler.
If you want general properties derivable from the source code you will want to use a language-processing framework that has a strong C front end integrated into it. Clang is one. It likely has a learning curve similar to that for GCC, but is said to be better designed for tasks like yours. I have no specific experience with this.
Our DMS Software Reengineering Toolkit is explicitly designed to support such tasks. It has a full C Front End, with APIs for building full parse trees, symbol tables relating identifiers to their point of declaration and actual type, and full control and data flow analysis. DMS also has a full C++ Front End, with similar properties, but it does not yet provide flow analysis information. DMS lets you write arbitrary code on top of this to compute whatever (arbitrary) property you like.

Assembly Analysis Tools

Does anyone have any suggestions for assembly file analysis tools? I'm attempting to analyze ARM/Thumb-2 ASM files generated by LLVM (or alternatively GCC) when passed the -S option. I'm particularly interested in instruction statistics at the basic block level, e.g. memory operation counts, etc. I may wind up rolling my own tool in Python, but was curious to see if there were any existing tools before I started.
Update: I've done a little searching, and found a good resource for disassembly tools / hex editors / etc here, but unfortunately it is mainly focused on x86 assembly, and also doesn't include any actual assembly file analyzers.
What you need is a tool for which you can define an assembly language syntax, and then build custom analyzers. Your analyzers might be simple ("how much space does an instruction take?") or complex ("How many cycles will this instruction take to execute?" [which depends on the preceding sequence of instructions and possibly a sophisticated model of the processor you care about]).
One designed specifically to do that is the New Jersey Machine Toolkit. It is really designed to build code generators and debuggers. I suspect it would be good at "instruction byte count". It isn't clear it is good at more sophisticated analyses. And I believe it insists you follow its syntax style, rather than yours.
One not designed specifically to do that, but good at parsing/analyzing languages in general, is our DMS Software Reengineering Toolkit.
DMS can be given a grammar description for virtually any context-free language (that covers most assembly language syntax) and can then parse a specific instance of that grammar (assembly code) into ASTs for further processing. We've done this with several assembly languages, including the IBM 370, Motorola's 8-bit CPU line, and a rather peculiar DSP, without trouble.
You can specify an attribute grammar (computation over an AST) to DMS easily. These are a great way to encode analyses that need just local information, such as "How big is this instruction?". For more complex analyses, you'll need a processor model that is driven from a series of instructions; passing such a machine model the ASTs for individual instructions would be an easy way to apply it to compute more complex things such as "How long does this instruction take?".
Other analyses, such as control flow and data flow, are provided in generic form by DMS. You can use an attribute evaluator to collect local facts ("control-next for this instruction is...", "data from this instruction flows to...") and feed them to the flow analyzers to compute global flow facts ("if I execute this instruction, what other instructions might be executed downstream?").
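The step from local facts to a global fact can be sketched generically. The following is not DMS's API, just a minimal illustration in Python: the local facts are assumed to be a mapping from each instruction to its possible "control-next" instructions, and the global fact is the set of instructions that might execute downstream of a given one. The name `reachable_from` and the integer instruction IDs are hypothetical.

```python
from collections import deque

def reachable_from(successors, start):
    """Return every instruction that may execute after `start`.

    `successors` maps an instruction ID to the IDs that can follow it
    (the local "control-next" facts).
    """
    seen = set()
    work = deque(successors.get(start, ()))
    while work:
        node = work.popleft()
        if node in seen:
            continue
        seen.add(node)
        work.extend(successors.get(node, ()))
    return seen

# Local facts for a tiny loop: 1 branches to 2 or 4, and 3 jumps back to 1.
control_next = {0: [1], 1: [2, 4], 2: [3], 3: [1], 4: []}
downstream_of_2 = reachable_from(control_next, 2)
```

Because instruction 2 sits inside the loop, it appears in its own downstream set, which is one quick way to spot instructions that can execute repeatedly.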
You do have to configure DMS for your particular (assembly) language. It is designed to be configured for tasks like these.
Yes, you can likely code all this in Python; after all, it's a Turing machine. But likely not nearly as easily.
An additional benefit: DMS is willing to apply transformations to your code, based on your analyses. So you could implement your optimizer with it, too. After all, you need to connect the analysis indicating that the optimization is safe to the actual optimization steps.
I have written many disassemblers, including ARM and Thumb ones. They are not production quality, but were written for the purpose of learning the assembler. For both ARM and Thumb, the ARM ARM (ARM Architecture Reference Manual) has a nice chart from which you can easily count up data operations from load/store, etc. Maybe an hour's worth of work, maybe two. At least up front, you would end up with data values being counted though.
The other poster may be right, as with the chart I am talking about it should be very simple to write a program to examine the ASCII looking for ldr, str, add, etc. No need to parse everything if you are interested in memory operations counts, etc. Of course the downside is that you are likely not going to be able to examine loops. One function may have a load and store, another may have a load and store but have it wrapped by a loop, causing many more memory operations once executed.
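The text-scanning approach described above might look like the following sketch. It assumes GNU-style ARM assembly as emitted with -S (labels end with `:`, directives start with `.`, and `@` starts a comment), and the mnemonic prefix lists are my own rough guess rather than a complete classification. Blocks are delimited by labels only, which approximates basic blocks but does not split at branches; `count_memory_ops` is a hypothetical name.

```python
import re
from collections import Counter

# Assumed mnemonic prefixes for memory operations; extend as needed.
LOADS = ("ldr", "ldm", "pop", "vldr", "vldm")
STORES = ("str", "stm", "push", "vstr", "vstm")
LABEL = re.compile(r"^([.\w$]+):")

def count_memory_ops(asm_text):
    """Return {label: Counter} of load/store/other counts per label-delimited block."""
    stats = {}
    current = "<entry>"
    for raw in asm_text.splitlines():
        line = raw.split("@", 1)[0].strip()  # drop '@' comments
        if not line:
            continue
        label = LABEL.match(line)
        if label:
            current = label.group(1)
            line = line[label.end():].strip()
            if not line:
                continue
        if line.startswith("."):  # assembler directive, not an instruction
            continue
        mnemonic = line.split()[0].lower()
        if mnemonic.startswith(LOADS):
            kind = "load"
        elif mnemonic.startswith(STORES):
            kind = "store"
        else:
            kind = "other"
        stats.setdefault(current, Counter())[kind] += 1
    return stats
```

These are static counts per block; as noted above, they say nothing about how many times a block inside a loop actually executes.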
Not knowing what you are really interested in, my guess is you might want to simulate the code and count these sorts of things. I wrote a Thumb simulator (thumbulator) that attempts to do just that (and I have used it to compare LLVM execution vs GCC execution when it comes to number of instructions executed, fetches, memory operations, etc.). The problem may be that it is Thumb only: no ARM, no Thumb-2. Thumb-2 could be added more easily than ARM. There exists an armulator from ARM, which is in the gdb sources among other places. I can't remember now if it executes Thumb-2. My understanding is that when ARM was using it, it would accurately tell you these sorts of statistics.
You can plug your statistics into the LLVM code generator; it's quite flexible and it already collects some stats, which could be used as an example.
