I am looking for a way to instrument a function in Linux kernel. It seems that GCC's -finstrument-functions flag allows instrumentation, but is there any way to instrument only a particular Linux function using compiler directives (i.e., function attributes) instead of instrumenting all functions?
It seems that KProbe also has functionality of instrumentation, but KProbe maintains a blacklist of functions and does not allow to monitor those functions, resulting limited scope of instrumentation.
I am running Ubuntu-16.04 with kernel version 4.8.11 on x86_64.
The purpose of instrumentation is to monitor the entry and exit of the target function by setting and clearing a flag.
The answer is it depends. However it is not as straightforward to inject as one might think.
In particular, example you can use -finstrument-functions for specific C files and then use attribute((no_instrument_function)) to exclude the ones you don't want instrumented
You would have to keep track of preemption and fastcalls, you would have to make sure that you're not accidentally stopping or removing instrumentation. Oh you also have to track any random stack layout that the kernel is self inflicting.
In a tutorial I've encountered a new concept (for me), that I never thought is possible. Actually, I thought that compilation is an entirely pre-run-time process. This is the phrase from tutorial: "Compile time occurs before link time (when the output of one or more compiled files are joined together) and runtime (when a program is executed). In some programming languages it may be necessary for some compilation and linking to occur at runtime".
My questions are:
Is pre-run-time compilation and linking processes absolutely different from run-time compilation and linking? If yes, please explain the main differences.
How are code sections that need to be compiled(linked) during run-time marked and where is that information kept? (This may be different from language to language, if possible, please give a specific example).
Thank you very much for your time!
Runtime compilation
The best (most well known) example I'm personally aware of is the just in time compilation used by Java. As you might know Java code is being compiled into bytecode which can be interpreted by the Java Virtual Machine. It's therefore different from let's say C++ which is first fully (preprocessed) compiled (and linked) into an executable which can be ran directly by the OS without any virtual machine.
The Java bytecode is instead interpreted by the VM, which maps them to processor specific instructions. That being said the JVM does JIT, which takes that bytecode and compiles it (during runtime) into machine code. Here we arrive at your second question. Even in Java it can depend on which JVM you are using but basically there are pieces of code called hotspots, the pieces of code that are run frequently and which might be compiled so the application's performance improves. This is done during runtime because the normal compiler does not have (or well might not have) all the necessary data to make a proper judgement which pieces of code are in fact ran frequently. Therefore JIT requires some kind of runtime statistics gathering, which is done parallel to the program execution and is done by the JVM. What kind of statistics are gathered, what can be optimised (compiled in runtime) etc. depends on the implementation (you obviously cannot do everything a normal compiler would do due to memory and time constraints - guess this partly answers the first question? you don't compile everything and usually only a limited set of optimisations are supported in runtime compilation). You can try looking for such info but from my experience usually it's very badly documented and hard to find (at least when it comes to official sources, not presentations/blogs etc.)
Runtime linking
Linker is a different pair of shoes. We cannot use the Java example anymore since it doesn't really have a linker like C or C++ (instead it has a classloader which takes care of loading files and putting it all together).
Usually linking is performed by a linker after the compilation step (static linking), this has pros (no dependencies) and cons (higher memory imprint as we cannot use a shared library, when the library number changes you need to recompile your sources).
Runtime linking (dynamic/late linking) is actually performed by the OS and it's the OS linker's job to first load shared libraries and then attach them to a running process. Furthermore there are also different types of dynamic linking: explicit and implicit. This has the benefit of not having to recompile the source when the version number changes since it's dynamic and library sharing but also drawbacks, what if you have different programs that use the same library but require different versions (look for DLL hell). So yes those two concepts are also quite different.
Again how it's all done, how it's decided what and how should be linked, is OS specific, for instance Microsoft has the dynamic-link library concept.
There is a device driver for a camera device provided to us as a .so library file by the vendor.
Only the header file with API's is available which provides the list of functions that we can work with the device. Our application is linked with the .so library file provided by the vendor and uses the interface functions provided for our objective.
When we wanted to measure the time taken by our application in handling different tasks, we have added GCC -pg flag and compiled+built our application.
But we found that using this executable built with -pg, we are observing random failure in the camera image acquire functions. Since we are using the .so library file, we do not know what is going wrong inside that function.
So in general I wanted to understand what could be the possible reasons of such a failure mode. Any pointers or documents that can help what goes inside profiling and its side effects is appreciated.
This answer is a helpful overview of how the gcc -pg flag profiler actually works. The take-home point is mostly to do with possible changes to timing. If your library has any kind of time-sensitivity in it, introducing profiler overheads might be changing the time it takes to execute parts of the code, and perhaps violating some kind of constraint.
If you look at the gprof documentation, it would explain the implementation details:
Profiling works by changing how every function in your program is
compiled so that when it is called, it will stash away some
information about where it was called from. From this, the profiler
can figure out what function called it, and can count how many times
it was called. This change is made by the compiler when your program
is compiled with the `-pg' option, which causes every function to call
mcount (or _mcount, or __mcount, depending on the OS and compiler) as
one of its first operations.
So the timing of your application would change quite a bit when you turn on -pg.
If you would like to instrument your code without significantly affecting the timings, you could possibly look at oprofile. It does not pose as significant an overhead as gprof does.
Another fairly recent tool that serves as a good lightweight profiling tool is perf.
The profiling tools are useful primarily in understanding the CPU bound pieces of your library/application and can help you optimize those critical pieces. Most of the time they serve to identify some culprit function/method which wastes CPU cycles. So do not use it as the sole piece for debugging any and all issues.
Most vendor libraries would also provide means to turn on extra debugging or dumping extra information during runtime. They include means such as environment variables, log files, /proc or /sys interfaces for drivers, etc. and sometimes even tools to increase debugging levels at runtime. See if you can leverage these.
If you have defined APIs in a library/driver, you should run unit-tests on them instead of trying to debug the whole application you've built.
If you find a certain unit-test fails, send the source code of the unit-test to your vendor, and ask them to fix the bug. If it is not a bug, your vendor would at least point you towards the right set of APIs or the semantics to use.
Breakpoints are one of the coolest feature supported by most popular Debuggers like GDB. But how a breakpoint works ? What code modifications does the compiler do to achieve the breakpoint? Are there any special hardware features used to support breakpoints?
Compiler does not need to "modify" the binary in any way to support the breakpoints. However it is important, that:
Compiler includes enough information in the executable (that is not in the code itself but in special sections in same file), so that debugger can relate source that user wants to debug with machine code. One typical thing debugger needs to know to be able to set breakpoints (unless you specify addresses directly), is where (at which address) program functions and lines of source code start (within machine code).
Code is not optimized by compiler in any way, that makes it impossible to relate source and machine code. Typically you will want debug code that was not optimized or code where only carefully selected optimizations were performed.
The rest of work is then performed by debugger itself.
Software breakpoints don't necessarily need special hardware features. Debugger here relies on modifying original binary (it's copy that is loaded to memory). When you set a breakpoint, debugger will place special instruction at the location of breakpoint. This special instruction needs to somehow let debugger detect when it (this special instruction) is executing. This can be some instruction that causes some kind of interrupt/exception, that debugger can hook onto, or some instruction that handles the control to debug unit. If this runs under some OS, that OS needs to support modifying running program (with something like ptrace poke/peek). Downside of SW breakpoints is that debugger needs to be able to modify running program, which is not possible if program is running from some kind of read-only memory (quite common in embedded world).
Hardware breakpoints (which need to be supported by CPU) implement similar behavior without modifying program binary. This is CPU specific, but usually it lets you to at least define a program address at which execution should hit a breakpoint. CPU continuously compares current PC with these breakpoint addresses and once the condition is matched, it breaks the execution. Number of these breakpoints is always limited.
To put a break point first we have to add some special information in to the binary .We use the flag -g while compiling the c source files to include this info.The Software debugger actually use this info to put break points.The best example for hardware break point support is in VxWorks as I have experienced.
Basically at the break point the processor halts.So internally any step which will give an exception to processor can be used to put a software break point.While a Hardware break point works by matching the address stored in Hardware registers to cause an exception.So Hardware break point is very powerful but it is heavily architecture dependent.
A very good explanation is here
What is the difference between hardware and software breakpoints?
A good intro with Processor related information is given here
http://processors.wiki.ti.com/index.php/How_Do_Breakpoints_Work
What does a JIT compiler specifically do as opposed to a non-JIT compiler? Can someone give a succinct and easy to understand description?
A JIT compiler runs after the program has started and compiles the code (usually bytecode or some kind of VM instructions) on the fly (or just-in-time, as it's called) into a form that's usually faster, typically the host CPU's native instruction set. A JIT has access to dynamic runtime information whereas a standard compiler doesn't and can make better optimizations like inlining functions that are used frequently.
This is in contrast to a traditional compiler that compiles all the code to machine language before the program is first run.
To paraphrase, conventional compilers build the whole program as an EXE file BEFORE the first time you run it. For newer style programs, an assembly is generated with pseudocode (p-code). Only AFTER you execute the program on the OS (e.g., by double-clicking on its icon) will the (JIT) compiler kick in and generate machine code (m-code) that the Intel-based processor or whatever will understand.
In the beginning, a compiler was responsible for turning a high-level language (defined as higher level than assembler) into object code (machine instructions), which would then be linked (by a linker) into an executable.
At one point in the evolution of languages, compilers would compile a high-level language into pseudo-code, which would then be interpreted (by an interpreter) to run your program. This eliminated the object code and executables, and allowed these languages to be portable to multiple operating systems and hardware platforms. Pascal (which compiled to P-Code) was one of the first; Java and C# are more recent examples. Eventually the term P-Code was replaced with bytecode, since most of the pseudo-operations are a byte long.
A Just-In-Time (JIT) compiler is a feature of the run-time interpreter, that instead of interpreting bytecode every time a method is invoked, will compile the bytecode into the machine code instructions of the running machine, and then invoke this object code instead. Ideally the efficiency of running object code will overcome the inefficiency of recompiling the program every time it runs.
JIT-Just in time
the word itself says when it's needed (on demand)
Typical scenario:
The source code is completely converted into machine code
JIT scenario:
The source code will be converted into assembly language like structure [for ex IL (intermediate language) for C#, ByteCode for java].
The intermediate code is converted into machine language only when the application needs that is required codes are only converted to machine code.
JIT vs Non-JIT comparison:
In JIT not all the code is converted into machine code first a part
of the code that is necessary will be converted into machine code
then if a method or functionality called is not in machine then that
will be turned into machine code... it reduces burden on the CPU.
As the machine code will be generated on run time....the JIT
compiler will produce machine code that is optimised for running
machine's CPU architecture.
JIT Examples:
In Java JIT is in JVM (Java Virtual Machine)
In C# it is in CLR (Common Language Runtime)
In Android it is in DVM (Dalvik Virtual Machine), or ART (Android RunTime) in newer versions.
As other have mentioned
JIT stands for Just-in-Time which means that code gets compiled when it is needed, not before runtime.
Just to add a point to above discussion JVM maintains a count as of how many time a function is executed. If this count exceeds a predefined limit JIT compiles the code into machine language which can directly be executed by the processor (unlike the normal case in which javac compile the code into bytecode and then java - the interpreter interprets this bytecode line by line converts it into machine code and executes).
Also next time this function is calculated same compiled code is executed again unlike normal interpretation in which the code is interpreted again line by line. This makes execution faster.
JIT compiler only compiles the byte-code to equivalent native code at first execution. Upon every successive execution, the JVM merely uses the already compiled native code to optimize performance.
Without JIT compiler, the JVM interpreter translates the byte-code line-by-line to make it appear as if a native application is being executed.
Source
JIT stands for Just-in-Time which means that code gets compiled when it is needed, not before runtime.
This is beneficial because the compiler can generate code that is optimised for your particular machine. A static compiler, like your average C compiler, will compile all of the code on to executable code on the developer's machine. Hence the compiler will perform optimisations based on some assumptions. It can compile more slowly and do more optimisations because it is not slowing execution of the program for the user.
After the byte code (which is architecture neutral) has been generated by the Java compiler, the execution will be handled by the JVM (in Java). The byte code will be loaded in to JVM by the loader and then each byte instruction is interpreted.
When we need to call a method multiple times, we need to interpret the same code many times and this may take more time than is needed. So we have the JIT (just-in-time) compilers. When the byte has been is loaded in to JVM (its run time), the whole code will be compiled rather than interpreted, thus saving time.
JIT compilers works only during run time, so we do not have any binary output.
A just in time compiler (JIT) is a piece of software which takes receives an non executable input and returns the appropriate machine code to be executed. For example:
Intermediate representation JIT Native machine code for the current CPU architecture
Java bytecode ---> machine code
Javascript (run with V8) ---> machine code
The consequence of this is that for a certain CPU architecture the appropriate JIT compiler must be installed.
Difference compiler, interpreter, and JIT
Although there can be exceptions in general when we want to transform source code into machine code we can use:
Compiler: Takes source code and returns a executable
Interpreter: Executes the program instruction by instruction. It takes an executable segment of the source code and turns that segment into machine instructions. This process is repeated until all source code is transformed into machine instructions and executed.
JIT: Many different implementations of a JIT are possible, however a JIT is usually a combination of a compiler and an interpreter. The JIT first turn intermediary data (e.g. Java bytecode) which it receives into machine language via interpretation. A JIT can often measures when a certain part of the code is executed often and the will compile this part for faster execution.
Just In Time Compiler (JIT) :
It compiles the java bytecodes into machine instructions of that specific CPU.
For example, if we have a loop statement in our java code :
while(i<10){
// ...
a=a+i;
// ...
}
The above loop code runs for 10 times if the value of i is 0.
It is not necessary to compile the bytecode for 10 times again and again as the same instruction is going to execute for 10 times. In that case, it is necessary to compile that code only once and the value can be changed for the required number of times. So, Just In Time (JIT) Compiler keeps track of such statements and methods (as said above before) and compiles such pieces of byte code into machine code for better performance.
Another similar example , is that a search for a pattern using "Regular Expression" in a list of strings/sentences.
JIT Compiler doesn't compile all the code to machine code. It compiles code that have a similar pattern at run time.
See this Oracle documentation on Understand JIT to read more.
You have code that is compliled into some IL (intermediate language). When you run your program, the computer doesn't understand this code. It only understands native code. So the JIT compiler compiles your IL into native code on the fly. It does this at the method level.
I know this is an old thread, but runtime optimization is another important part of JIT compilation that doesn't seemed to be discussed here. Basically, the JIT compiler can monitor the program as it runs to determine ways to improve execution. Then, it can make those changes on the fly - during runtime. Google JIT optimization (javaworld has a pretty good article about it.)
just-in-time (JIT) compilation, (also dynamic translation or run-time compilation), is a way of executing computer code that involves compilation during execution of a program – at run time – rather than prior to execution.
IT compilation is a combination of the two traditional approaches to translation to machine code – ahead-of-time compilation (AOT), and interpretation – and combines some advantages and drawbacks of both. JIT compilation combines the speed of compiled code with the flexibility of interpretation.
Let's consider JIT used in JVM,
For example, the HotSpot JVM JIT compilers generate dynamic optimizations. In other words, they make optimization decisions while the Java application is running and generate high-performing native machine instructions targeted for the underlying system architecture.
When a method is chosen for compilation, the JVM feeds its bytecode to the Just-In-Time compiler (JIT). The JIT needs to understand the semantics and syntax of the bytecode before it can compile the method correctly. To help the JIT compiler analyze the method, its bytecode are first reformulated in an internal representation called trace trees, which resembles machine code more closely than bytecode. Analysis and optimizations are then performed on the trees of the method. At the end, the trees are translated into native code.
A trace tree is a data structure that is used in the runtime compilation of programming code. Trace trees are used in a type of 'just in time compiler' that traces code executing during hotspots and compiles it. Refer this.
Refer :
http://www.oracle.com/webfolder/technetwork/tutorials/obe/java/gc01/index.html
https://en.wikipedia.org/wiki/Just-in-time_compilation
A non-JIT compiler takes source code and transforms it into machine specific byte code at compile time. A JIT compiler takes machine agnostic byte code that was generated at compile time and transforms it into machine specific byte code at run time. The JIT compiler that Java uses is what allows a single binary to run on a multitude of platforms without modification.
Jit stands for just in time compiler
jit is a program that turns java byte code into instruction that can be sent directly to the processor.
Using the java just in time compiler (really a second compiler) at the particular system platform complies the bytecode into particular system code,once the code has been re-compiled by the jit complier ,it will usually run more quickly in the computer.
The just-in-time compiler comes with the virtual machine and is used optionally. It compiles the bytecode into platform-specific executable code that is immediately executed.
20% of the byte code is used 80% of the time. The JIT compiler gets these stats and optimizes this 20% of the byte code to run faster by adding inline methods, removal of unused locks etc and also creating the bytecode specific to that machine. I am quoting from this article, I found it was handy. http://java.dzone.com/articles/just-time-compiler-jit-hotspot
Just In Time compiler also known as JIT compiler is used for
performance improvement in Java. It is enabled by default. It is
compilation done at execution time rather earlier.
Java has popularized the use of JIT compiler by including it in
JVM.
JIT refers to execution engine in few of JVM implementations, one that is faster but requires more memory,is a just-in-time compiler. In this scheme, the bytecodes of a method are compiled to native machine code the first time the method is invoked. The native machine code for the method is then cached, so it can be re-used the next time that same method is invoked.
JVM actually performs compilation steps during runtime for performance reasons. This means that Java doesn't have a clean compile-execution separation. It first does a so called static compilation from Java source code to bytecode. Then this bytecode is passed to the JVM for execution. But executing bytecode is slow so the JVM measures how often the bytecode is run and when it detects a "hotspot" of code that's run very frequently it performs dynamic compilation from bytecode to machinecode of the "hotspot" code (hotspot profiler). So effectively today Java programs are run by machinecode execution.