Where is WASM stuck - how do I find that? - debugging

I am building a compiler for WASM,
however now my (quite complex) test program is stuck when executing it in Google Chrome.
How can I find out which function it is stuck in, other than by printing every function it calls? Is there an elegant way?

You can use the integrated debugger in Chrome or Firefox. It lets you browse instructions, set breakpoints, step into and out of function calls, and view the call stack, the memory bytes, etc.
To see the source code of your language you can use source maps or, better, the DWARF format, because source maps are only a temporary solution at this time.
Some compilers emit source maps and/or DWARF, but since you are writing your own compiler you might have to implement that yourself.

Related

Profiling Rust with execution time for each *line* of code?

I have profiled my Rust code and see one processor-intensive function that takes a large portion of the time. Since I cannot break the function into smaller parts, I hope I can see which line in the function takes what portion of time. Currently I have tried CLion's Rust profiler, but it does not have that feature.
It would be best if the tool runs on MacOS since I do not have a Windows/Linux machine (except for virtualization).
P.S. Visual studio seems to have this feature; but I am using Rust. https://learn.microsoft.com/en-us/visualstudio/profiling/how-to-collect-line-level-sampling-data?view=vs-2017 It has:
Line-level sampling is the ability of the profiler to determine where in the code of a processor-intensive function, such as a function that has high exclusive samples, the processor has to spend most of its time.
Thanks for any suggestions!
EDIT: With C++, I do see source code line level information. For example, the following toy shows that, the "for" loop takes most of the time within the big function. But I am using Rust...
To get source code annotation in perf annotate or perf report, you need to compile with debug = 2 set in the relevant profile (for example [profile.release]) of your Cargo.toml.
If you also want source annotations for standard library functions, you additionally need to pass -Zbuild-std to cargo (requires nightly).
Once compiled, "lines" of Rust do not exist. The optimiser does its job by completely reorganising the code you wrote and finding the minimal machine code that behaves the same as what you intended.
Functions are often inlined, so even measuring the time spent in a function can give incorrect results - or else change the performance characteristics of your program if you prevent it from being inlined to do so.
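As a rough illustration of the inlining point, the sketch below marks a hot function with `#[inline(never)]` so that it stays a separate symbol that a sampling profiler can attribute time to. The name `hot_sum` is made up for this example, and, as noted above, preventing inlining may itself change the performance you are trying to measure.

```rust
use std::hint::black_box;

/// Hypothetical hot function. `#[inline(never)]` asks the compiler to keep it
/// as its own symbol instead of merging it into the caller.
#[inline(never)]
fn hot_sum(n: u64) -> u64 {
    let mut total = 0u64;
    for i in 0..n {
        // black_box discourages the optimiser from folding the loop away.
        total = total.wrapping_add(black_box(i));
    }
    total
}

fn main() {
    let result = hot_sum(1_000);
    println!("{result}");
}
```

Built with debug info enabled, a function like this should show up under its own name in perf report rather than being folded into its caller.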

What is the ".scatterload" file in macOS?

I've downloaded Apple's TextEdit example app (here) and I'm a bit puzzled by one thing I see there: the TextEdit.scatterload file. It contains a list of functions and methods. My guess is that it provides information to the linker as to which functions/methods will be needed, and in what order, when the app launches, and that this is used to order the binary generated by the linker for maximum efficiency. Oddly, I seem to be unable to find any information whatsoever about this file through Google. So. First of all, is my guess as to the function of this file correct? And second, if so, can I generate a .scatterload file for my own macOS app, to make it launch faster? How would I do that? Seems like a good idea! (I am using Objective-C, but perhaps this question is not specific to that, so I'm not going to tag for it here.)
Scatter loading is a way to organize how your code is laid out in memory by specifying which parts of the code must be placed near each other. The aim is to reduce page faults.
You can read about it here Improving locality of reference (HTML)
or here Improving locality of reference (PDF).
The .scatterload file is used by the linker to position code in the memory layout of the executable.
Unless your app really needs tight performance tuning, I would not encourage you to look into this.

How can I determine what objects ARC is retaining using Instruments or viewing assembly?

This question is not about finding out who retained a particular object but rather looking at a section of code that appears from the profiler to have excessive retain/release calls and figuring out which objects are responsible.
I have a Swift application that after initial porting was spending 90% of its time in retain/release code. After a great deal of restructuring to avoid referencing objects I have gotten that down to about 25% - but this remaining bit is very hard to attribute. I can see that a given chunk of it is coming from a given section of code using the profiler, but sometimes I cannot see anything in that code that should (to my understanding) be causing a retain/release. I have spent time viewing the assembly code in both Instruments (with the side-by-side view when it's working) and also the output of otool -tvV, and sometimes the proximity of the retain/release calls to a recognizable section gives me a hint as to what is going on. I have even inserted dummy method calls at places just to give me a better handle on where I am in the code and turned off optimization to limit code reordering, etc. But in many cases it seems like I would have to trace the code back to follow branches and figure out what is on the stack in order to understand the calls, and I am not familiar enough with x86 to know if that is practical. (I will add a couple of screenshots of the assembly view in Instruments and some otool output for reference below).
My question is - what else can I be doing to debug / examine / attribute these seemingly excessive retain/release calls to particular code? Is there something else I can do in Instruments to count these calls? I have played around with the allocation view and turned on the reference counting option but it didn't seem to give me any new information (I'm not actually sure what it did). Alternately, if I just try harder to interpret the assembly should I be able to figure out what objects are being retained by it? Are there any other tools or tricks I should know on that front?
EDIT: Rob's info below about single stepping into the assembly was what I was looking for. I also found it useful to set a symbolic breakpoint in Xcode on the lib retain/release calls and log the item on the stack (using Rob's suggested "p (id)$rdi") to the console in order to get a rough count of how many calls are being made rather than inspect each one.
You should definitely focus on the assembly output. There are two views I find most useful: the Instruments view, and the Assembly assistant editor. The problem is that Swift doesn't support the Assembly assistant editor currently (I typically do this kind of thing in ObjC), so we come around to your complaint.
It looks like you're already working with the debug assembly view, which gives somewhat decent symbols and is useful because you can step through the code and hopefully see how it maps to the assembly. I also find Hopper useful, because it can give more symbols. Once you have enough "unique-ish" function calls in an area, you can usually start narrowing down how the assembly maps back to the source.
The other tool I use is to step into the retain bridge and see what object is being passed. To do this, instruction-step (^F7) into the call to swift_bridgeObjectRetain. At that point, you can call:
p (id)$rdi
And it should print out at least some type information about what's being passed ($rdi is correct on x86_64, which is what you seem to be working with). I don't always have great luck extracting more information. It depends on exactly what is in there. For example, sometimes it's a ContiguousArrayStorage<Swift.CVarArgType>, and I happen to have learned that usually means it's an NSArray. I'm sure better experts in LLDB could dig deeper, but this usually gets me at least in the right ballpark.
(BTW, I don't know why I can't call p (id)$rdi before jumping inside bridgeObjectRetain, but it gives strange type errors for me. I have to go into the function call.)
Wish I had more. The Swift tool chain just hasn't caught up to where the ObjC tool chain is for tracing this kind of stuff IMO.

How to find stuff in the kernel

I'm doing various tasks on the Linux kernel, and I end up reading source code from time to time. I haven't really needed to change the kernel yet (I'm fine with so-called "Loadable Kernel Modules"), so I didn't download the kernel source and just use http://lxr.free-electrons.com/ . Quite often I come across a function that has several implementations and have to start guessing which one I need.
For example, in the file Linux/virt/kvm/kvm_main.c, line 496 is a call to list_add. A click on it gives me two options: drivers/gpu/drm/radeon/mkregtable.c, line 84 and include/linux/list.h, line 60. It's quite clear that kvm will not send me to something under "gpu", but this is not always the case. I have looked at the includes of the file, which was not much help.
So my question: given a file from the kernel and a function call at line ###, what is the nicest way to find where that function call actually continues?
(I'll also be happy to hear about ways that don't involve the website and/or that require me to download the source code.)
Many things in the kernel are #define'd, typedef'd, or are functions referenced through structs (such as the fop struct in drivers), so there is no easy way to browse the kernel source. The lxr site helps, but it cannot go any further when you hit one of these constructs. The same is true of cscope/ctags. The best way, even though you would prefer to avoid it, is to download the source and browse through it.
Another method is to use kgdb and step through the code function by function, but to save time you need some idea of which functions are worth stepping into. Last but not least, you can increase the kernel log level and read the logs through dmesg. All of these, however, require you to have the kernel source.
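To illustrate why a cross-referencer cannot follow a call made through a function pointer stored in a struct, here is a small sketch. It is written in Rust rather than kernel C, and `FileOps`, `null_read`, `zero_read`, and `do_read` are invented names loosely modelled on the idea of an operations table, not real kernel symbols.

```rust
/// Hypothetical operations table: a struct holding a function pointer.
struct FileOps {
    read: fn(&mut [u8]) -> usize,
}

/// One possible implementation: reads nothing.
fn null_read(_buf: &mut [u8]) -> usize {
    0
}

/// Another possible implementation: fills the buffer with zeros.
fn zero_read(buf: &mut [u8]) -> usize {
    buf.fill(0);
    buf.len()
}

/// The call site names only the `read` field. Which function actually runs
/// depends on the table passed in at run time, so a text-based index cannot
/// tell the two candidates apart.
fn do_read(ops: &FileOps, buf: &mut [u8]) -> usize {
    (ops.read)(buf)
}

fn main() {
    let mut buf = [1u8; 4];
    let null_dev = FileOps { read: null_read };
    let zero_dev = FileOps { read: zero_read };
    println!("{}", do_read(&null_dev, &mut buf));
    println!("{}", do_read(&zero_dev, &mut buf));
}
```

Resolving the call in `do_read` means finding where the table was filled in, which is the kind of search that is much easier with the full source tree on disk.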

How Does AQTime Do It?

I've been testing out the performance and memory profiler AQTime to see if it's worthwhile spending those big $$$ for it for my Delphi application.
What amazes me is how it can give you source line level performance tracing (which includes the number of times each line was executed and the amount of time that line took) without modifying the application's source code and without adding an inordinate amount of time to the debug run.
The way that they do this so efficiently makes me think there might be some techniques/technologies used here that I don't know about that would be useful to know about.
Do you know what kind of methods they use to capture the execution line-by-line without code changes?
Are there other profiling tools that also do non-invasive line-by-line checking and if so, do they use the same techniques?
I've made an open source profiler for Delphi which does the same:
http://code.google.com/p/asmprofiler/
It's not perfect, but it's free :-). It also uses the Detour technique.
It stores every call (you must manually select which functions you want to profile),
so it can build an exact call history tree, including a time chart (!).
This is just speculation, but perhaps AQtime is based on a technology that is similar to Microsoft Detours?
Detours is a library for instrumenting arbitrary Win32 functions on x86, x64, and IA64 machines. Detours intercepts Win32 functions by re-writing the in-memory code for target functions.
I don't know about Delphi in particular, but a C application debugger can do line-by-line profiling relatively easily - it can load the code and associate every code path with a block of code. Then it can break on all the conditional jump instructions and just watch and see what code path is taken. Debuggers like gdb can operate relatively efficiently because they work through the kernel and don't modify the code, they just get informed when each line is executed. If something causes the block to be exited early (longjmp), the debugger can hook that and figure out how far it got into the blocks when it happened and increment only those lines.
Of course, it would still be tough to code, but when I say easily I mean that you could do it without wasting time breaking on each and every instruction to update a counter.
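The block-based idea above can be sketched roughly as follows: count how often each block of code is entered, then credit that count to every source line the block covers. This is only an illustrative model, not how AQTime or gdb is known to work; the `Block` type, the line ranges, and the hit counts are all invented for the example.

```rust
use std::collections::BTreeMap;

/// Hypothetical model of a basic block covering an inclusive range of source lines.
struct Block {
    first_line: u32,
    last_line: u32,
}

/// Turn per-block hit counts into per-line hit counts.
/// `hits[i]` is how many times `blocks[i]` was entered.
fn line_counts(blocks: &[Block], hits: &[u64]) -> BTreeMap<u32, u64> {
    let mut counts = BTreeMap::new();
    for (block, &n) in blocks.iter().zip(hits) {
        for line in block.first_line..=block.last_line {
            *counts.entry(line).or_insert(0) += n;
        }
    }
    counts
}

fn main() {
    let blocks = [
        Block { first_line: 1, last_line: 2 }, // setup
        Block { first_line: 3, last_line: 5 }, // loop body
        Block { first_line: 6, last_line: 6 }, // return
    ];
    // Invented counts: the loop body ran 100 times, the rest once.
    let hits: [u64; 3] = [1, 100, 1];
    for (line, n) in line_counts(&blocks, &hits) {
        println!("line {line}: {n}");
    }
}
```

Only one counter update per block entry is needed, which is why this is cheaper than stopping on every instruction.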
The long-since-defunct TurboPower also had a great profiling/analysis tool for Delphi called Sleuth QA Suite. I found it a lot simpler than AQTime, and also far easier to get meaningful results from. Might be worth trying to track down - eBay, maybe?

Resources