I have a single file of CUDA code compiled to PTX intermediate-representation code, example.ptx. I'd like to start poking around with this short file, trying to understand how it works.
I don't have previous experience fiddling around with intermediate code representations, but what I gather is that I can somehow print out a figure of the control flow, to support me as I try to reverse engineer this. The CUDA Binary Utilities documentation mentions nvdisasm and shows nice Graphviz figures of the control flow, but it seems to work only on cubin files. I understand that these cubin files are compiled further from PTX and optimized for a specific GPU architecture.
My question is: can I use nvdisasm to generate a control-flow image directly from example.ptx, or should I compile the ptx file to a cubin file, and use that to generate the image?
or compile the ptx file to a cubin file, and use that
Yes, you can do that. Compile your ptx file to a cubin with:
nvcc example.ptx -cubin
(the result will be in example.cubin or you could add e.g. -o myfile.cubin to name it something else)
This cubin file can be fed to nvdisasm for processing.
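For example, something like this (per the CUDA Binary Utilities documentation, nvdisasm's -cfg option emits the control-flow graph in Graphviz DOT format, so you can pipe it straight into dot; you may need to add -arch= on the nvcc line to match your GPU):

nvcc example.ptx -cubin -o example.cubin
nvdisasm -cfg example.cubin | dot -o cfg.png -Tpng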
While running an executable in gdb, I encountered the following error:
Program received signal SIGFPE, Arithmetic exception.
0x08158307 in radtra_ ()
How do I find out which file and line number the address 0x08158307 corresponds to, without recompiling or otherwise modifying the source? If it helps, the source language was Fortran.
How do I find out which file and line number the address 0x08158307 corresponds to, without recompiling or otherwise modifying the source?
That isn't easy. You could use GDB's disassemble command, look for accesses to global variables and for CALL instructions, and make a guess about where inside radtra_ you are. This gets harder the larger the routine is, the more optimizations the compiler has applied to it, and the fewer calls and global-variable accesses it performs.
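For example, from inside GDB (these are standard GDB commands; 0x08158307 is the faulting address from your report):

(gdb) info symbol 0x08158307
(gdb) disassemble 0x08158307
(gdb) x/10i 0x08158307

info symbol tells you the offset of the address within radtra_, disassemble shows the whole containing function, and x/10i dumps the next ten instructions starting at the fault.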
If you can't guess, your only options are:
Rebuild the application with the -g flag added, but leave all other compile options unmodified, then use addr2line to translate the address to a line number; see the example after this list. (This is how you should build the application from the start.)
If you can't rebuild the entire application, rebuild just the source file containing radtra_ (again with the same flags, plus -g). You should be able to match the output of objdump -d radtra.o against the output of disassemble. Once you have a match, read the output of readelf -wl radtra.o or objdump -g radtra.o to associate code offsets within radtra_ with the source lines the code was generated from.
Hire an expert to guess for you. This wouldn't be cheap, as people skilled in this kind of reverse engineering are usually gainfully employed and value their time.
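For the addr2line route in the first option, once you have a build with debug info the lookup is a single command (standard binutils flags: -e names the executable, -f also prints the function name, -C demangles):

addr2line -f -C -e ./your_app 0x08158307

(./your_app is a placeholder for your rebuilt executable.)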
What are the ".map" files generated by gcc/g++ linker option "-Map" used for ?
And how to read them ?
I recommend generating a map file and keeping a copy for any software you put into production.
It can be useful for deciphering crash reports. Depending on the system, you can likely get a stack dump from the crash. The stack dump will include memory addresses, and one of the registers will hold the instruction pointer, which tells you the memory address the code was executing at. On some systems code addresses can be moved around when dynamic libraries are loaded (hence "dynamic"), but the lower-order bytes should remain the same.
The map file is a MAP from memory location -> code location. It gives you the name of the function at a given memory address. Due to optimizations, it may not be extremely accurate, but it gives you a place to start in terms of looking for bugs that cause the crash.
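As a purely illustrative fragment (the exact layout varies between linkers and versions; the names and addresses here are made up), a GNU ld map might contain something like:

.text           0x00401000      0x1f2
 .text          0x00401000       0x45 main.o
                0x00401000                main
                0x00401038                helper

so a crash with the instruction pointer at, say, 0x00401040 falls inside helper.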
Now, in 30 years of writing commercial software, this is the only thing I've used the map files for. Twice successfully.
What are the ".map" files generated by gcc/g++ linker option "-Map" used for?
There is no such thing as a 'gcc linker': GCC and the linker are independent, separate projects.
Usually the map is used for understanding decisions that ld made while linking the binary. From man ld:
-M
--print-map
Print a link map to the standard output.
A link map provides information about the link, including the following:
· Where object files are mapped into memory.
· How common symbols are allocated.
· All archive members included in the link, with a mention of the symbol which caused the archive member to be brought in.
· The values assigned to symbols.
...
If you don't understand what that means, you likely don't (yet) have the questions that this output answers, and hence have no need to read it.
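In practice, when you drive the link through gcc or g++ you forward the option to ld with -Wl (the map file name is your choice):

gcc main.c -o main -Wl,-Map=main.map

which leaves the link map in main.map alongside the executable.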
The compiler (gcc) is one program that generates object-code files; the linker (ld) is a second program that combines those object files into an executable. The two can be invoked through a single command line.
If you are building a program to run on an ARM processor, you need to use arm-none-eabi-gcc and arm-none-eabi-ld so that the generated code is correct for the ARM architecture; plain gcc and ld generate code for your host computer.
I'm trying to generate a PE format executable; I'm at the stage where I have something that dumpbin is happy with, and as far as I can tell is not materially different from an empty program linked with Microsoft's linker, but Windows still rejects it: PE file - what's missing?
If I had some algorithm for generating a valid PE file, maybe I could hill climb from there. Here's what I've found so far:
There's plenty of documentation, sample code and tools for reading PE files, as opposed to generating them from scratch.
PE Bliss lists generation among its features, but won't compile.
Sample assembly language templates for PE file generation concentrate on minimizing size. The most promising looking one generates a file that Windows rejects even though as far as I can see it should be accepted; the one I found that did work, ironically, generates a file that Windows accepts even though as far as I can see it should be rejected, since almost every nominally essential component is missing or malformed.
Is there any sample code available that generates a correct PE file?
Here's the classic page about generating PE from scratch:
http://www.phreedom.org/research/tinype
As for a generic list of required and optional parts, see the corkami page on the PE format:
http://code.google.com/p/corkami/wiki/PE
See also the code tree for many examples of small PE files, generated completely from scratch.
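If it helps while debugging your own generator, below is a small self-contained C++ sketch (offsets hard-coded from the PE specification rather than taken from winnt.h, so it builds anywhere) that verifies the header chain the Windows loader consults first: the 'MZ' DOS magic, the e_lfanew field at offset 0x3C, and the 'PE\0\0' signature it points at. It only rules out the most basic structural mistakes; passing it says nothing about whether the loader will accept the rest of the file.

#include <cstdint>
#include <cstdio>
#include <vector>

// Check the DOS-header -> NT-signature chain described in the PE spec.
bool looks_like_pe(const std::vector<uint8_t>& img) {
    if (img.size() < 0x40) return false;               // too small for a DOS header
    if (img[0] != 'M' || img[1] != 'Z') return false;  // IMAGE_DOS_SIGNATURE "MZ"
    // e_lfanew: little-endian 32-bit offset of the NT headers, stored at 0x3C
    uint32_t e_lfanew = (uint32_t)img[0x3C] | ((uint32_t)img[0x3D] << 8) |
                        ((uint32_t)img[0x3E] << 16) | ((uint32_t)img[0x3F] << 24);
    if (e_lfanew > img.size() - 4) return false;
    // IMAGE_NT_SIGNATURE "PE\0\0"
    return img[e_lfanew] == 'P' && img[e_lfanew + 1] == 'E' &&
           img[e_lfanew + 2] == 0 && img[e_lfanew + 3] == 0;
}

int main(int argc, char** argv) {
    if (argc < 2) { std::fprintf(stderr, "usage: %s file.exe\n", argv[0]); return 1; }
    std::FILE* f = std::fopen(argv[1], "rb");
    if (!f) { std::perror("fopen"); return 1; }
    std::vector<uint8_t> img;
    uint8_t buf[4096];
    for (size_t n; (n = std::fread(buf, 1, sizeof buf, f)) > 0; )
        img.insert(img.end(), buf, buf + n);
    std::fclose(f);
    std::printf("%s: %s\n", argv[1],
                looks_like_pe(img) ? "MZ/PE signature chain OK" : "signature chain broken");
    return 0;
}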
I'm reading about binary formats, the ELF format for example. Suppose I have two binary files, one compiled as an ELF file and another as COFF (or another binary format): how does the kernel handle this? I mean, when you execute the program, how does Linux know how to handle each different format? Does the kernel have some interface that selects, according to the header of the binary, the correct code to handle each kind of binary?
As you said, the kernel detects the type of binary based on the header.
Different binary formats are registered using register_binfmt(). Take a look at the fs/binfmt_* files for the different implementations.
This is done by exec_binprm() (in fs/exec.c), which is basically the meat of the execve syscall. It calls search_binary_handler(), which searches the registered format handlers to find one willing to handle the file.
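To make the dispatch concrete without reading kernel source, here is a hedged user-space C++ sketch of the same pattern: a list of registered handlers, each asked in turn whether it recognizes the file's magic bytes. Only register_binfmt() and search_binary_handler() are real kernel names; everything else below is invented for illustration.

#include <cstdint>
#include <cstdio>
#include <cstring>
#include <functional>
#include <string>
#include <vector>

// Toy model of the kernel's binfmt list: each handler looks at the header
// and either claims the file or declines, as search_binary_handler() does
// with the formats added via register_binfmt().
struct BinFmt {
    std::string name;
    std::function<bool(const uint8_t*, size_t)> recognizes;
};

static std::vector<BinFmt> formats;  // stand-in for the kernel's handler list

void register_binfmt_sim(BinFmt fmt) { formats.push_back(std::move(fmt)); }

const BinFmt* search_binary_handler_sim(const uint8_t* hdr, size_t len) {
    for (const auto& f : formats)        // first format that claims the file wins
        if (f.recognizes(hdr, len)) return &f;
    return nullptr;                      // the kernel would fail with ENOEXEC here
}

int main() {
    register_binfmt_sim({"ELF", [](const uint8_t* h, size_t n) {
        return n >= 4 && std::memcmp(h, "\x7f" "ELF", 4) == 0;  // ELF magic
    }});
    register_binfmt_sim({"script", [](const uint8_t* h, size_t n) {
        return n >= 2 && h[0] == '#' && h[1] == '!';            // "#!" shebang
    }});

    const uint8_t hdr[] = { 0x7f, 'E', 'L', 'F', 2, 1, 1, 0 };
    if (const BinFmt* f = search_binary_handler_sim(hdr, sizeof hdr))
        std::printf("handled by the %s loader\n", f->name.c_str());
}

In the real kernel, each fs/binfmt_*.c file fills in a struct linux_binfmt whose load_binary callback returns an error such as -ENOEXEC when the header doesn't match, and search_binary_handler() then moves on to the next registered format.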
I have a question about optimizing an assembly file that I generated from a .cpp file.
This is my homework from a computer organization class.
The assignment is as follows.
I have to write a program that calculates the dot product of two vectors and generates a .asm file. Then I have to optimize the .asm file and compare the execution times using QueryPerformanceCounter in Visual Studio. I generated the .asm file and found the loop in it. I am trying to learn basic assembly language so I can optimize it. However, I have no idea how to execute the .asm file. My professor mentioned linking the .cpp file with the assembly, but I have no idea what that means.
Any help will be appreciated.
If I understand what your professor is asking, you need to do this in steps (a rough sketch of the timing harness follows the list):
Create a function in C++ to calculate the dot product.
In main(), call this function in a loop many thousands of times, say 5000.
Add a call to QueryPerformanceCounter before and after this loop.
Run your program and note the total time it took to call your function 5000 times.
Use the compiler to generate assembly for your function. Save that assembly to an .asm file and then optimize it.
Assemble the .asm file with an appropriate assembler in order to generate an .obj file.
Compile your .cpp file and link it with the .obj file you generated in the step above to produce an .exe file.
Run the program again and note the total time it took to call your optimized function 5000 times.
Compare the two measurements (and note how the compiler is probably better at optimizing than you are).
You don't say what compiler, assembler or hardware platform you're using, so I can't provide more details than that.
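That said, since you mention Visual Studio and QueryPerformanceCounter, here is a minimal sketch of steps 1-4, assuming MSVC on Windows (dot_product, the array size, and the 5000-iteration count are placeholders to adapt):

#include <windows.h>
#include <cstdio>

// Step 1: the function you will later swap for your hand-optimized .asm version.
// extern "C" keeps the symbol name undecorated, which makes the .asm easier to read.
extern "C" double dot_product(const double* a, const double* b, int n) {
    double sum = 0.0;
    for (int i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}

int main() {
    static double a[1024], b[1024];
    for (int i = 0; i < 1024; ++i) { a[i] = i; b[i] = 1024 - i; }

    LARGE_INTEGER freq, t0, t1;
    QueryPerformanceFrequency(&freq);   // counter ticks per second

    // Steps 2-4: time many calls so the measurement is meaningful.
    double sink = 0.0;                  // printed below so the loop isn't optimized away
    QueryPerformanceCounter(&t0);
    for (int i = 0; i < 5000; ++i)
        sink += dot_product(a, b, 1024);
    QueryPerformanceCounter(&t1);

    double seconds = (double)(t1.QuadPart - t0.QuadPart) / (double)freq.QuadPart;
    std::printf("5000 calls: %.6f s (checksum %f)\n", seconds, sink);
    return 0;
}

For steps 5-8, keep dot_product in its own .cpp so you can swap objects at link time: cl /c /FA dotprod.cpp writes the compiler's assembly to dotprod.asm, you edit that file, assemble it with ml /c dotprod.asm (or ml64 for x64; the generated listing may need minor edits to assemble), and then link with cl main.cpp dotprod.obj. Exact flags vary with your Visual Studio version, so treat these command lines as a starting point.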