Profiling Rust with execution time for each *line* of code?

I have profiled my Rust code and see one processor-intensive function that takes a large portion of the time. Since I cannot break the function into smaller parts, I hope I can see which line in the function takes what portion of time. Currently I have tried CLion's Rust profiler, but it does not have that feature.
It would be best if the tool runs on macOS, since I do not have a Windows/Linux machine (except through virtualization).
P.S. Visual Studio seems to have this feature, but I am using Rust. https://learn.microsoft.com/en-us/visualstudio/profiling/how-to-collect-line-level-sampling-data?view=vs-2017 It says:
Line-level sampling is the ability of the profiler to determine where in the code of a processor-intensive function, such as a function that has high exclusive samples, the processor has to spend most of its time.
Thanks for any suggestions!
EDIT: With C++, I do see source-line-level information. For example, the following toy example shows that the "for" loop takes most of the time within the big function. But I am using Rust...

To get source code annotations in perf annotate or perf report, you need to compile with debug = 2 set in the profile you build with in your Cargo.toml.
If you also want source annotations for standard library functions, you additionally need to pass -Zbuild-std to cargo (requires nightly).
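As a rough sketch of what that setup might look like: the Cargo.toml fragment and the perf commands are shown as comments, and the names big_function and <your-binary> are made up for illustration. perf itself is a Linux tool, so on macOS this assumes a Linux VM or a different sampling profiler that reads the same debug info.

```rust
// Assumed Cargo.toml fragment (put it in whichever profile you actually build with):
//
//   [profile.release]
//   debug = 2
//
// Then, on Linux, something along these lines:
//
//   cargo build --release
//   perf record ./target/release/<your-binary>
//   perf annotate        # or: perf report, then annotate the hot symbol
//
// `big_function` and `<your-binary>` are hypothetical names for this sketch.

/// A toy "big function": cheap setup followed by one hot loop.
pub fn big_function(n: u64) -> u64 {
    let mut acc: u64 = 0;
    // With debug info available, samples should cluster on this loop body.
    for i in 0..n {
        acc = acc.wrapping_add(i.wrapping_mul(i) ^ (i >> 3));
    }
    acc
}

fn main() {
    // black_box hides the argument from the optimizer so the loop is not
    // evaluated at compile time.
    let result = big_function(std::hint::black_box(200_000_000));
    println!("{result}");
}
```

The per-line numbers you get this way are still an approximation, since the optimizer may have merged or reordered the instructions that belong to each line.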

Once compiled, "lines" of Rust do not exist. The optimiser does its job by completely reorganising the code you wrote and finding the minimal machine code that behaves the same as what you intended.
Functions are often inlined, so even measuring the time spent in a function can give incorrect results - or else change the performance characteristics of your program if you prevent it from being inlined to do so.
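To make the inlining trade-off concrete, here is a minimal sketch (all function names are hypothetical). The two helpers have identical bodies; one is forced to stay a separate symbol with #[inline(never)], so a profiler can attribute samples to it, while the other is folded into its caller. Timing both hints at how much the measurement-friendly version changes the thing being measured.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Stays a separate symbol, so it shows up by name in a profile,
/// at the cost of a real call on every iteration.
#[inline(never)]
fn step_outlined(x: u64) -> u64 {
    x.rotate_left(5) ^ 0x9E37_79B9_7F4A_7C15
}

/// Same body, but folded into the caller; its cost is then reported
/// as part of the caller's samples.
#[inline(always)]
fn step_inlined(x: u64) -> u64 {
    x.rotate_left(5) ^ 0x9E37_79B9_7F4A_7C15
}

pub fn run_outlined(n: u64) -> u64 {
    let mut acc = 0u64;
    for i in 0..n {
        acc = step_outlined(acc ^ i);
    }
    acc
}

pub fn run_inlined(n: u64) -> u64 {
    let mut acc = 0u64;
    for i in 0..n {
        acc = step_inlined(acc ^ i);
    }
    acc
}

fn main() {
    let n = black_box(100_000_000u64);

    let t = Instant::now();
    let a = run_outlined(n);
    let outlined = t.elapsed();

    let t = Instant::now();
    let b = run_inlined(n);
    let inlined = t.elapsed();

    // Both variants compute the same value; only the code shape differs.
    assert_eq!(a, b);
    println!("outlined: {outlined:?}, inlined: {inlined:?}");
}
```

The actual difference depends on the compiler version, target, and optimization level, so treat any numbers from this as illustrative only.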

Related

Where is WASM stuck - how do I find that?

I am building a compiler for WASM,
but now my (quite complex) test program gets stuck when I execute it in Google Chrome.
How can I find out which function it is stuck in, other than by printing every function it calls, of course? Is there an elegant way?
You can use the integrated debugger in Chrome, or Firefox. You can browse instructions, place break points, step in/out of function calls, view the call stack, the memory bytes, etc.
To be able to see the source code of your language you can use source maps or, better, the DWARF format, because source maps are only a temporary solution at this point.
There are compilers that emit source maps and/or DWARF format, but in your case you might have to develop that yourself.

Using `callgrind` to count function calls in Linux

I am trying to track function call counts on a program I'm interested in. If I run the program on its own, it will run fine. If I try to run it with valgrind using the command seen below I seem to be getting a different result.
Command run:
Produces this output immediately, even though the execution is normally slow.
I'd say that this is more likely to be related to this issue. However, to be certain you will need to tell us what compilation options are being used (specifically, are you using anything related to AVX or x87?) and what hardware this is running on.
It would help if you can cut this down to a small example and either update this or the frexp bugzilla items.
valgrind has limited floating point support. You're probably using non-standard or very large floats.
UPDATE: since you're using long double, you're outta luck. Unfortunately, your least-worst option is to find a way to make your world work just using standard IEEE 754 64-bit double precision.
This probably isn't easy considering you're using an existing project.

Determining the rate at which a function is called with OllyDbg

How can I find out how many times per second a particular function is called using OllyDbg? Alternatively, how can I count the total number of times the EIP has a certain value?
I don't want OllyDbg to break on executing this code.
I believe you can accomplish this by using the trace feature with conditional logging; the new OllyDbg seems to implement tracing with some better features. It's been a while since I've done this, but you can check it out. I would actually suggest using Immunity Debugger (an OllyDbg clone with a Python plugin interface) and maybe writing a Python script to do this.

How does GCC's '-pg' flag work in relation to profilers?

I'm trying to understand how the -pg (or -p) flag works when compiling C code with GCC.
The official GCC documentation only states:
-pg
Generate extra code to write profile information suitable for the analysis program gprof. You must use this option when compiling the source files you want data about, and you must also use it when linking.
This really interests me, as I'm doing some research on profilers and trying to pick the best tool for the job.
Compiling with -pg instruments your code, so that Gprof reports detailed information. See gprof's manual, 9.1 Implementation of Profiling:
Profiling works by changing how every function in your program is compiled so that when it is called, it will stash away some information about where it was called from. From this, the profiler can figure out what function called it, and can count how many times it was called. This change is made by the compiler when your program is compiled with the -pg option, which causes every function to call mcount (or _mcount, or __mcount, depending on the OS and compiler) as one of its first operations.
The mcount routine, included in the profiling library, is responsible for recording in an in-memory call graph table both its parent routine (the child) and its parent's parent. This is typically done by examining the stack frame to find both the address of the child, and the return address in the original parent. Since this is a very machine-dependent operation, mcount itself is typically a short assembly-language stub routine that extracts the required information, and then calls __mcount_internal (a normal C function) with two arguments—frompc and selfpc. __mcount_internal is responsible for maintaining the in-memory call graph, which records frompc, selfpc, and the number of times each of these call arcs was traversed.
...
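To illustrate the bookkeeping described in that excerpt, here is a small simulation in Rust. It is not how mcount is implemented (the real routine is an assembly stub that reads the stack frame); the addresses, function names, and the CallGraph type are all invented for this sketch, and each "function" reports its caller explicitly instead of having it recovered from a return address.

```rust
use std::collections::HashMap;

/// Stand-in for a code address. A real mcount works with raw program-counter
/// values; here each function just gets a made-up constant.
type Pc = usize;

const MAIN: Pc = 0x1000;
const PARSE: Pc = 0x2000;
const HELPER: Pc = 0x3000;

/// In-memory call graph: one counter per (frompc, selfpc) arc.
#[derive(Default)]
pub struct CallGraph {
    arcs: HashMap<(Pc, Pc), u64>,
}

impl CallGraph {
    /// Plays the role of __mcount_internal(frompc, selfpc).
    pub fn record(&mut self, frompc: Pc, selfpc: Pc) {
        *self.arcs.entry((frompc, selfpc)).or_insert(0) += 1;
    }

    /// How many times `frompc` called `selfpc`.
    pub fn arc_count(&self, frompc: Pc, selfpc: Pc) -> u64 {
        self.arcs.get(&(frompc, selfpc)).copied().unwrap_or(0)
    }

    /// Total calls into `selfpc`, summed over all callers.
    pub fn calls_to(&self, selfpc: Pc) -> u64 {
        self.arcs
            .iter()
            .filter(|(arc, _)| arc.1 == selfpc)
            .map(|(_, count)| *count)
            .sum()
    }
}

// Each simulated function "calls mcount" as its first operation.
fn helper(graph: &mut CallGraph, caller: Pc) {
    graph.record(caller, HELPER);
}

fn parse(graph: &mut CallGraph, caller: Pc) {
    graph.record(caller, PARSE);
    for _ in 0..3 {
        helper(graph, PARSE);
    }
}

/// Runs the simulated program: main calls parse once and helper once.
pub fn simulate() -> CallGraph {
    let mut graph = CallGraph::default();
    parse(&mut graph, MAIN);
    helper(&mut graph, MAIN);
    graph
}

fn main() {
    let graph = simulate();
    println!("main -> parse:   {}", graph.arc_count(MAIN, PARSE));
    println!("parse -> helper: {}", graph.arc_count(PARSE, HELPER));
    println!("calls to helper: {}", graph.calls_to(HELPER));
}
```

Note that this only counts calls; it says nothing about how long each call took, which is why gprof combines it with sampling.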
Please note that with such an instrumenting profiler, you're profiling the same source code you would compile in release, but not quite the same binary: there is an overhead associated with the instrumentation code itself, and the instrumentation code may also alter instruction and data cache usage.
In contrast to an instrumenting profiler, a sampling profiler like Intel VTune works on non-instrumented code by looking at the target program's program counter at regular intervals using operating system interrupts. It can also query special CPU registers to give you even more insight into what's going on.
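The sampling idea can be sketched with a deterministic simulation. This is not how VTune or any real profiler is implemented: the Step type, the tick unit, and the line numbers are all assumptions for illustration. The program is modelled as a trace of (source line, duration) steps, and the "interrupt" fires every fixed number of ticks and records which line was running.

```rust
use std::collections::BTreeMap;

/// One step of a simulated run: which source line executes, and for how long.
pub struct Step {
    pub line: u32,
    pub ticks: u64,
}

/// Samples the simulated program counter every `interval` ticks and
/// returns the number of samples that landed on each line.
pub fn sample(trace: &[Step], interval: u64) -> BTreeMap<u32, u64> {
    assert!(interval > 0, "sampling interval must be positive");
    let mut hits = BTreeMap::new();
    let mut now = 0u64; // ticks elapsed before the current step
    let mut next_sample = interval; // absolute tick of the next "interrupt"
    for step in trace {
        let end = now + step.ticks;
        // A step covers ticks (now, end]; every sample point in that
        // window is attributed to the step's line.
        while next_sample <= end {
            *hits.entry(step.line).or_insert(0) += 1;
            next_sample += interval;
        }
        now = end;
    }
    hits
}

/// Hypothetical trace: short setup, a long loop, short teardown.
pub fn example_trace() -> Vec<Step> {
    vec![
        Step { line: 10, ticks: 5 },
        Step { line: 11, ticks: 90 },
        Step { line: 12, ticks: 5 },
    ]
}

fn main() {
    for (line, count) in sample(&example_trace(), 10) {
        println!("line {line}: {count} samples");
    }
}
```

With an interval of 10 ticks, the short setup step on line 10 is never sampled at all, which shows why sampling results are statistical: cheap code can be missed entirely, while hot code dominates the counts.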
See also Profilers Instrumenting Vs Sampling.
This link gives a brief explanation of how gprof works.
This link gives an extensive critique of it.
(Check my answer to the archived question.)
From "Measuring Function Duration with Ftrace":
Instrumentation comes in two main forms: explicitly declared tracepoints, and implicit tracepoints. Explicit tracepoints consist of developer defined declarations which specify the location of the tracepoint, and additional information about what data should be collected at a particular trace site. Implicit tracepoints are placed into the code automatically by the compiler, either due to compiler flags or by developer redefinition of commonly used macros.
To instrument functions implicitly, when the kernel is configured to support function tracing, the kernel build system adds -pg to the flags used with the compiler. This causes the compiler to add code to the prologue of each function, which calls a special assembly routine called mcount. This compiler option is specifically intended to be used for profiling and tracing purposes.

How Does AQTime Do It?

I've been testing out the performance and memory profiler AQTime to see if it's worthwhile spending those big $$$ for it for my Delphi application.
What amazes me is how it can give you source line level performance tracing (which includes the number of times each line was executed and the amount of time that line took) without modifying the application's source code and without adding an inordinate amount of time to the debug run.
The way that they do this so efficiently makes me think there might be some techniques/technologies used here that I don't know about that would be useful to know about.
Do you know what kind of methods they use to capture the execution line-by-line without code changes?
Are there other profiling tools that also do non-invasive line-by-line checking and if so, do they use the same techniques?
I've made an open source profiler for Delphi which does the same:
http://code.google.com/p/asmprofiler/
It's not perfect, but it's free :-). It also uses the Detours technique.
It stores every call (you must manually select which functions you want to profile), so it can build an exact call history tree, including a time chart (!).
This is just speculation, but perhaps AQtime is based on a technology that is similar to Microsoft Detours?
Detours is a library for instrumenting arbitrary Win32 functions on x86, x64, and IA64 machines. Detours intercepts Win32 functions by re-writing the in-memory code for target functions.
I don't know about Delphi in particular, but a C application debugger can do line-by-line profiling relatively easily - it can load the code and associate every code path with a block of code. Then it can break on all the conditional jump instructions and just watch and see what code path is taken. Debuggers like gdb can operate relatively efficiently because they work through the kernel and don't modify the code, they just get informed when each line is executed. If something causes the block to be exited early (longjmp), the debugger can hook that and figure out how far it got into the blocks when it happened and increment only those lines.
Of course, it would still be tough to code, but when I say easily I mean that you could do it without wasting time breaking on each and every instruction to update a counter.
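The block-based counting idea might be sketched like this. It is only a model of the bookkeeping, not of how gdb or AQTime actually work: the Block type, the line ranges, and the recorded path are hypothetical, and the path stands in for what the debugger would observe at each conditional jump.

```rust
use std::collections::BTreeMap;
use std::ops::RangeInclusive;

/// A straight-line block of code and the source lines it covers.
pub struct Block {
    pub lines: RangeInclusive<u32>,
}

/// Given the sequence of blocks entered (by index), computes how many
/// times each source line was executed.
pub fn line_counts(blocks: &[Block], path: &[usize]) -> BTreeMap<u32, u64> {
    // One counter update per block entry, not per line or per instruction.
    let mut block_hits = vec![0u64; blocks.len()];
    for &index in path {
        block_hits[index] += 1;
    }

    // Expand block counts to line counts once, at the end.
    let mut counts = BTreeMap::new();
    for (block, &hits) in blocks.iter().zip(&block_hits) {
        for line in block.lines.clone() {
            *counts.entry(line).or_insert(0) += hits;
        }
    }
    counts
}

/// Hypothetical function layout: setup (lines 1-2), loop body (3-4), exit (5).
pub fn example_blocks() -> Vec<Block> {
    vec![
        Block { lines: 1..=2 },
        Block { lines: 3..=4 },
        Block { lines: 5..=5 },
    ]
}

fn main() {
    // Setup once, loop body three times, then exit.
    let path = [0, 1, 1, 1, 2];
    for (line, count) in line_counts(&example_blocks(), &path) {
        println!("line {line}: executed {count} times");
    }
}
```

This sketch assumes every entered block runs to completion; handling an early exit such as longjmp would need extra tracking of how far into the block execution got.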
The long-since-defunct TurboPower also had a great profiling/analysis tool for Delphi called Sleuth QA Suite. I found it a lot simpler than AQTime, and also far easier to get meaningful results from. Might be worth trying to track down - eBay, maybe?

Resources