Processing gcov data files for tracing purposes - gcc

I'm trying to create a tool similar to TraceGL, but for C-type languages:
As you can see, the tool above highlights code flows that were not executed in red.
In terms of building this tool for Objective-C, for example, I know that gcov (and libprofile_rt in clang) output data files that can help determine how many times a given line of code has been executed. However, would the gcov data files be able to tell me when a given line of code occurred during a program's execution?
For example, if line X is called during code paths A and B, would I be able to ascertain from the gcov that code paths A and B called line X given line X alone?

As far as I know, GCOV instrumentation data only tells that some point in the code was executed (and maybe how many times). But there is no relationship between the code points that are instrumented.
It sounds like what you want is to determine paths through the code. To do that, you either need to do static analysis of the code (requiring a full up C parser, name resolver, flow analyzer), or you need to couple the dynamic instrumentation points together in execution order.
The first requires you find machinery capable of processing C in all of its glory; you don't want to repeat that yourself. GCC, Clang, our DMS Toolkit are choices. I know the GCC and Clang do pretty serious analysis; I'm pretty sure you could find at least intraprocedural control flow analysis; I know that DMS can do this. You'd have to customize GCC and Clang to extract this data. You'd have to configure DMS to extract this data; configuration is easier than customization because it is a design property rather than a "custom" action. YMMV.
Then, using the GCOV data, you could determine the flows between the GCOV data points. It isn't clear to me that this buys you anything beyond what you already get with just the static control flow analysis, unless your goal is to exhibit execution traces.
To do this dynamically, what you could do is force each data collection point in the instrumented code to note that it is the most recent point encountered; before doing that, it would record the most recent point encountered before it was. This would produce in effect a chain of references between points which would match the control flow. This has two problems from your point of view, I think: a) you'd have to modify GCOV or some other tool to insert this different kind of instrumentation, b) you have to worry about what and how you record "predecessors" when a data collection point gets hit more than once.

gcov (or lcov) is one option. It does produce most of the information you are looking for, though how often those files are updated depends on how often __gcov_flush() is called. It's not really intended to be real time, and does not include all of the information you are looking for (notably, the 'when'). There is a short summary of the gcov data format here and in the header file here. lcov data is described here.
For what you are looking for DTrace should be able to provide all of the information you need, and in real time. For Objective-C on Apple platforms there are dtrace probes for the runtime which allow you to trace pretty much anything. There are a number of useful guides and examples out there for learning about dtrace and how to write scripts. Brendan Gregg provides some really great examples. Big Nerd Ranch has done a series of articles on it.

Related

How to find out what implicit(s) are used in my scala code

Problem statement:
I read in multiple sources/articles that implicits drive up scala compilation time
I want to remove/reduce them to minimum possible to see what compilation time without them will look like (codebase is around 1000 files of various complexity based on scalaz & akka & slick)
I don't really know what kind of static analysis I can perform. Any liks/references to already existing tooling highly appreciated.
It is true that implicits can degrade compilation speed, especially for code that uses them for type-level computations. It is definitely worth it to measure their impact. Unfortunately it can be difficult to track down the culprits. There are tools that can help though:
Run scalac with -Ystatistics:typer to see how many tree nodes are processed during type-checking. E.g. you can check the number of ApplyToImplicitArgs and ApplyImplicitView relative to the total (and maybe compare this to another code base).
There is currently an effort by the Scala Center to improve the status quo hosted at scalacenter/scalac-profiling. It includes an sbt plugin that should be able to give you an idea about implicit search times, but it's still in its infancy (not published yet at the time of writing). I haven't tested it myself but you can still give it a try.
You can also compile with -Xlog-implicits, pipe the output to a file and analyse the logs. It will show a message for each implicit candidate that was considered but failed, complete with source position, search type and reason for the failure. Such failed searches are expensive. You can write a simple script with your favourite scripting language (why not Scala?) to summarize the data and even plot it with some nice graphics.
Aside: How to resolve a specific implicit instance?
Just use reify and good ol' println debugging:
import scala.collection.SortedSet
import scala.reflect.runtime.universe._
println(showCode(reify { SortedSet(1,2,3) }.tree))
// => SortedSet.apply(1, 2, 3)(Ordering.Int)
There is scalac profiling being done by scala center now: https://scalacenter.github.io/scalac-profiling/

What is the ".scatterload" file in macOS?

I've downloaded Apple's TextEdit example app (here) and I'm a bit puzzled by one thing I see there: the TextEdit.scatterload file. It contains a list of functions and methods. My guess is that it provides information to the linker as to which functions/methods will be needed, and in what order, when the app launches, and that this is used to order the binary generated by the linker for maximum efficiency. Oddly, I seem to be unable to find any information whatsoever about this file through Google. So. First of all, is my guess as to the function of this file correct? And second, if so, can I generate a .scatterload file for my own macOS app, to make it launch faster? How would I do that? Seems like a good idea! (I am using Objective-C, but perhaps this question is not specific to that, so I'm not going to tag for it here.)
Scatter loading refers to a way to organize the mapping of your code in memory by specifying which part of code must be near which one, etc. This is to optimize page faults, etc.
You can read about it here Improving locality of reference (HTML)
or here Improving locality of reference (PDF).
.scatterload file is used by the linker to position code in memory layout of the executable.
Except if your app really need tight performance tuning, I would not encourage you to have a look at this.

How can I determine what objects ARC is retaining using Instruments or viewing assembly?

This question is not about finding out who retained a particular object but rather looking at a section of code that appears from the profiler to have excessive retain/release calls and figuring out which objects are responsible.
I have a Swift application that after initial porting was spending 90% of its time in retain/release code. After a great deal of restructuring to avoid referencing objects I have gotten that down to about 25% - but this remaining bit is very hard to attribute. I can see that a given chunk of it is coming from a given section of code using the profiler, but sometimes I cannot see anything in that code that should (to my understanding) be causing a retain/release. I have spent time viewing the assembly code in both Instruments (with the side-by-side view when it's working) and also the output of otool -tvV and sometimes the proximity of the retain/release calls to a recognizable section give me a hint as to what is going on. I have even inserted dummy method calls at places just to give me a better handle on where I am in the code and turned off optimization to limit code reordering, etc. But in many cases it seems like I would have to trace the code back to follow branches and figure out what is on the stack in order to understand the calls and I am not familiar enough with x86 to know know if that is practical. (I will add a couple of screenshots of the assembly view in Instruments and some otool output for reference below).
My question is - what else can I be doing to debug / examine / attribute these seemingly excessive retain/release calls to particular code? Is there something else I can do in Instruments to count these calls? I have played around with the allocation view and turned on the reference counting option but it didn't seem to give me any new information (I'm not actually sure what it did). Alternately, if I just try harder to interpret the assembly should I be able to figure out what objects are being retained by it? Are there any other tools or tricks I should know on that front?
EDIT: Rob's info below about single stepping into the assembly was what I was looking for. I also found it useful to set a symbolic breakpoint in XCode on the lib retain/release calls and log the item on the stack (using Rob's suggested "p (id)$rdi") to the console in order to get a rough count of how many calls are being made rather than inspect each one.
You should definitely focus on the assembly output. There are two views I find most useful: the Instruments view, and the Assembly assistant editor. The problem is that Swift doesn't support the Assembly assistant editor currently (I typically do this kind of thing in ObjC), so we come around to your complaint.
It looks like you're already working with the debug assembly view, which gives somewhat decent symbols and is useful because you can step through the code and hopefully see how it maps to the assembly. I also find Hopper useful, because it can give more symbols. Once you have enough "unique-ish" function calls in an area, you can usually start narrowing down how the assembly maps back to the source.
The other tool I use is to step into the retain bridge and see what object is being passed. To do this, instruction-step (^F7) into the call to swift_bridgeObjectRetain. At that point, you can call:
p (id)$rdi
And it should print out at least some type information about the what's being passed ($rdi is correct on x86_64 which is what you seem to be working with). I don't always have great luck extracting more information. It depends on exactly is in there. For example, sometimes it's a ContiguousArrayStorage<Swift.CVarArgType>, and I happen to have learned that usually means it's an NSArray. I'm sure better experts in LLDB could dig deeper, but this usually gets me at least in the right ballpark.
(BTW, I don't know why I can't call p (id)$rdi before jumping inside bridgeObjectRetain, but it gives strange type errors for me. I have to go into the function call.)
Wish I had more. The Swift tool chain just hasn't caught up to where the ObjC tool chain is for tracing this kind of stuff IMO.

How to debug a program without a debugger? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 9 years ago.
Interview question-
Often its pretty easier to debug a program once you have trouble with your code.You can put watches,breakpoints and etc.Life is much easier because of debugger.
But how to debug a program without a debugger?
One possible approach which I know is simply putting print statements in your code wherever you want to check for the problems.
Are there any other approaches other than this?
As its a general question, its not restricted to any specific language.So please share your thoughts on how you would have done it?
EDIT- While submitting your answer, please mention a useful resource (if you have any) about any concept. e.g. Logging
This will be lot helpful for those who don't know about it at all.(This includes me, in some cases :)
UPDATE: Michal Sznajderhas put a real "best" answer and also made it a community wiki.Really deserves lots of up votes.
Actually you have quite a lot of possibilities. Either with recompilation of source code or without.
With recompilation.
Additional logging. Either into program's logs or using system logging (eg. OutputDebugString or Events Log on Windows). Also use following steps:
Always include timestamp at least up to seconds resolution.
Consider adding thread-id in case of multithreaded apps.
Add some nice output of your structures
Do not print out enums with just %d. Use some ToString() or create some EnumToString() function (whatever suits your language)
... and beware: logging changes timings so in case of heavily multithreading you problems might disappear.
More details on this here.
Introduce more asserts
Unit tests
"Audio-visual" monitoring: if something happens do one of
use buzzer
play system sound
flash some LED by enabling hardware GPIO line (only in embedded scenarios)
Without recompilation
If your application uses network of any kind: Packet Sniffer or I will just choose for you: Wireshark
If you use database: monitor queries send to database and database itself.
Use virtual machines to test exactly the same OS/hardware setup as your system is running on.
Use some kind of system calls monitor. This includes
On Unix box strace or dtrace
On Windows tools from former Sysinternals tools like http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx, ProcessExplorer and alike
In case of Windows GUI stuff: check out Spy++ or for WPF Snoop (although second I didn't use)
Consider using some profiling tools for your platform. It will give you overview on thing happening in your app.
[Real hardcore] Hardware monitoring: use oscilloscope (aka O-Scope) to monitor signals on hardware lines
Source code debugging: you sit down with your source code and just pretend with piece of paper and pencil that you are computer. Its so called code analysis or "on-my-eyes" debugging
Source control debugging. Compare diffs of your code from time when "it" works and now. Bug might be somewhere there.
And some general tips in the end:
Do not forget about Text to Columns and Pivot Table in Excel. Together with some text tools (awk, grep or perl) give you incredible analysis pack. If you have more than 32K records consider using Access as data source.
Basics of Data Warehousing might help. With simple cube you may analyse tons of temporal data in just few minutes.
Dumping your application is worth mentioning. Either as a result of crash or just on regular basis
Always generate you debug symbols (even for release builds).
Almost last but not least: most mayor platforms has some sort of command line debugger always built in (even Windows!). With some tricks like conditional debugging and break-print-continue you can get pretty good result with obscure bugs
And really last but not least: use your brain and question everything.
In general debugging is like science: you do not create it you discover it. Quite often its like looking for a murderer in a criminal case. So buy yourself a hat and never give up.
First of all, what does debugging actually do? Advanced debuggers give you machine hooks to suspend execution, examine variables and potentially modify state of a running program. Most programs don't need all that to debug them. There are many approaches:
Tracing: implement some kind of logging mechanism, or use an existing one such as dtrace(). It usually worth it to implement some kind of printf-like function that can output generally formatted output into a system log. Then just throw state from key points in your program to this log. Believe it or not, in complex programs, this can be more useful than raw debugging with a real debugger. Logs help you know how you got into trouble, while a debugger that traps on a crash assumes you can reverse engineer how you got there from whatever state you are already in. For applications that you use other complex libraries that you don't own that crash in the middle of them, logs are often far more useful. But it requires a certain amount of discipline in writing your log messages.
Program/Library self-awareness: To solve very specific crash events, I often have implemented wrappers on system libraries such as malloc/free/realloc which extensions that can do things like walk memory, detect double frees, attempts to free non-allocated pointers, check for obvious buffer over-runs etc. Often you can do this sort of thing for your important internal data types as well -- typically you can make self-integrity checks for things like linked lists (they can't loop, and they can't point into la-la land.) Even for things like OS synchronization objects, often you only need to know which thread, or what file and line number (capturable by __FILE__, __LINE__) the last user of the synch object was to help you work out a race condition.
If you are insane like me, you could, in fact, implement your own mini-debugger inside of your own program. This is really only an option in a self-reflective programming language, or in languages like C with certain OS-hooks. When compiling C/C++ in Windows/DOS you can implement a "crash-hook" callback which is executed when any program fault is triggered. When you compile your program you can build a .map file to figure out what the relative addresses of all your public functions (so you can work out the loader initial offset by subtracting the address of main() from the address given in your .map file). So when a crash happens (even pressing ^C during a run, for example, so you can find your infinite loops) you can take the stack pointer and scan it for offsets within return addresses. You can usually look at your registers, and implement a simple console to let you examine all this. And voila, you have half of a real debugger implemented. Keep this going and you can reproduce the VxWorks' console debugging mechanism.
Another approach, is logical deduction. This is related to #1. Basically any crash or anomalous behavior in a program occurs when it stops behaving as expected. You need to have some feed back method of knowing when the program is behaving normally then abnormally. Your goal then is to find the exact conditions upon which your program goes from behaving correctly to incorrectly. With printf()/logs, or other feedback (such as enabling a device in an embedded system -- the PC has a speaker, but some motherboards also have a digital display for BIOS stage reporting; embedded systems will often have a COM port that you can use) you can deduce at least binary states of good and bad behavior with respect to the run state of your program through the instrumentation of your program.
A related method is logical deduction with respect to code versions. Often a program was working perfectly at one state, but some later version is not longer working. If you use good source control, and you enforce a "top of tree must always be working" philosophy amongst your programming team, then you can use a binary search to find the exact version of the code at which the failure occurs. You can use diffs then to deduce what code change exposes the error. If the diff is too large, then you have the task of trying to redo that code change in smaller steps where you can apply binary searching more effectively.
Just a couple suggestions:
1) Asserts. This should help you work out general expectations at different states of the program. As well familiarize yourself with the code
2) Unit tests. I have used these at times to dig into new code and test out APIs
One word: Logging.
Your program should write descriptive debug lines which include a timestamp to a log file based on a configurable debug level. Reading the resultant log files gives you information on what happened during the execution of the program. There are logging packages in every common programming language that make this a snap:
Java: log4j
.Net: NLog or log4net
Python: Python Logging
PHP: Pear Logging Framework
Ruby: Ruby Logger
C: log4c
I guess you just have to write fine-grain unit tests.
I also like to write a pretty-printer for my data structures.
I think the rest of the interview might go something like this...
Candidate: So you don't buy debuggers for your developers?
Interviewer: No, they have debuggers.
Candidate: So you are looking for programmers who, out of masochism or chest thumping hamartia, make things complicated on themselves even if they would be less productive?
Interviewer: No, I'm just trying to see if you know what you would do in a situation that will never happen.
Candidate: I suppose I'd add logging or print statements. Can I ask you a similar question?
Interviewer: Sure.
Candidate: How would you recruit a team of developers if you didn't have any appreciable interviewing skill to distinguish good prospects based on relevant information?
Peer review. You have been looking at the code for 8 hours and your brain is just showing you what you want to see in the code. A fresh pair of eyes can make all the difference.
Version control. Especially for large teams. If somebody changed something you rely on but did not tell you it is easy to find a specific change set that caused your trouble by rolling the changes back one by one.
On *nix systems, strace and/or dtrace can tell you an awful lot about the execution of your program and the libraries it uses.
Binary search in time is also a method: If you have your source code stored in a version-control repository, and you know that version 100 worked, but version 200 doesn't, try to see if version 150 works. If it does, the error must be between version 150 and 200, so find version 175 and see if it works... etc.
use println/log in code
use DB explorer to look at data in DB/files
write tests and put asserts in suspicious places
More generally, you can monitor side effects and output of the program, and trigger certain events in the program externally.
A Print statement isn't always appropriate. You might use other forms of output such as writing to the Event Log or a log file, writing to a TCP socket (I have a nice utility that can listen for that type of trace from my program), etc.
For programs that don't have a UI, you can trigger behavior you want to debug by using an external flag such as the existence of a file. You might have the program wait for the file to be created, then run through a behavior you're interested in while logging relevant events.
Another file's existence might trigger the program's internal state to be written to your logging mechanism.
like everyone else said:
Logging
Asserts
Extra Output
&
your favorite task manager or process
explorer
links here and here
Another thing I have not seen mentioned here that I have had to use quite a bit on embedded systems is serial terminals.
You can cannot a serial terminal to just about any type of device on the planet (I have even done it to embedded CPUs for hydraulics, generators, etc). Then you can write out to the serial port and see everything on the terminal.
You can get real fancy and even setup a thread that listens to the serial terminal and responds to commands. I have done this as well and implemented simple commands to dump a list, see internal variables, etc all from a simple 9600 baud RS-232 serial port!
Spy++ (and more recently Snoop for WPF) are tremendous for getting an insight into Windows UI bugs.
A nice read would be Delta Debugging from Andreas Zeller. It's like binary search for debugging

How Does AQTime Do It?

I've been testing out the performance and memory profiler AQTime to see if it's worthwhile spending those big $$$ for it for my Delphi application.
What amazes me is how it can give you source line level performance tracing (which includes the number of times each line was executed and the amount of time that line took) without modifying the application's source code and without adding an inordinate amount of time to the debug run.
The way that they do this so efficiently makes me think there might be some techniques/technologies used here that I don't know about that would be useful to know about.
Do you know what kind of methods they use to capture the execution line-by-line without code changes?
Are there other profiling tools that also do non-invasive line-by-line checking and if so, do they use the same techniques?
I've made an open source profiler for Delphi which does the same:
http://code.google.com/p/asmprofiler/
It's not perfect, but it's free :-). Is also uses the Detour technique.
It stores every call (you must manual set which functions you want to profile),
so it can make an exact call history tree, including a time chart (!).
This is just speculation, but perhaps AQtime is based on a technology that is similar to Microsoft Detours?
Detours is a library for instrumenting
arbitrary Win32 functions on x86, x64,
and IA64 machines. Detours intercepts
Win32 functions by re-writing the
in-memory code for target functions.
I don't know about Delphi in particular, but a C application debugger can do line-by-line profiling relatively easily - it can load the code and associate every code path with a block of code. Then it can break on all the conditional jump instructions and just watch and see what code path is taken. Debuggers like gdb can operate relatively efficiently because they work through the kernel and don't modify the code, they just get informed when each line is executed. If something causes the block to be exited early (longjmp), the debugger can hook that and figure out how far it got into the blocks when it happened and increment only those lines.
Of course, it would still be tough to code, but when I say easily I mean that you could do it without wasting time breaking on each and every instruction to update a counter.
The long-since-defunct TurboPower also had a great profiling/analysis tool for Delphi called Sleuth QA Suite. I found it a lot simpler than AQTime, but also far easier to get meaningful result. Might be worth trying to track down - eBay, maybe?

Resources