GDB using variable name to access local variable name - debugging

gdb provides a command "print localx" which prints the value stored in the localx variable. So, it basically must be using the symbol table to find the mapping (localx -> addressx on stack). I am unable to understand how this mapping can be created.
What I tried
I studied gcc's intermediate temporary files using the -save-temps option, and observed that a local variable local1 was mapped to a symbol name "LASF8". However, the objdump utility did not show this symbol name.
Context:
I am working on a project that requires building a pin tool to print accesses to local variables. Given a function, I would like to be able to say that a given address corresponds to a given variable name. This requires reading the symbol table to map an address to a symbol table entry. GDB does the exact reverse mapping, so I would like to understand how it works.

The symbol table is contained in the debugging information. This debugging information is emitted by gcc -g. gdb reads the debugging information to get symbolic information, among other things.
Typically the debugging information is in DWARF format. See http://www.dwarfstd.org/ for the specification.
You can also see DWARF more directly using readelf. For example readelf -wi will show the main (".debug_info") debugging information for an ELF file.
Note that doing the mapping in reverse, that is, assigning a name to every stack slot, is not entirely easy. First, not every stack slot will have a name, because the compiler may spill temporaries to the stack. Second, many locals use DWARF location expressions to describe their location. This has three consequences: you will need to write an expression evaluator (not hard, but also not trivial); you could conceivably (unlikely in practice, but possible in theory) run into expressions that cannot be evaluated without a real stack frame; and the names will generally only be valid at a given PC.
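To give a feel for what such an expression evaluator involves, here is a minimal sketch in C. It handles only the single most common form for unoptimized locals, DW_OP_fbreg followed by a signed LEB128 offset, and rejects everything else. The function names and the idea of passing the frame base in as a plain integer are assumptions made for illustration; a real tool would take the frame base from the function's DW_AT_frame_base and support many more opcodes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opcode value as defined by the DWARF specification. */
#define DW_OP_FBREG 0x91u

/* Decode one signed LEB128 value starting at buf[*pos].
 * Returns 0 on success, -1 if the encoding is truncated or too long. */
static int read_sleb128(const uint8_t *buf, size_t len, size_t *pos,
                        int64_t *out)
{
    uint64_t result = 0;
    unsigned shift = 0;
    uint8_t byte;

    do {
        if (*pos >= len || shift >= 64)
            return -1;
        byte = buf[(*pos)++];
        result |= (uint64_t)(byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80);

    /* Sign-extend when the final byte has its sign bit (0x40) set. */
    if (shift < 64 && (byte & 0x40))
        result |= ~(uint64_t)0 << shift;

    *out = (int64_t)result;
    return 0;
}

/* Hypothetical helper: evaluate a location expression of the form
 * "DW_OP_fbreg <offset>" against a known frame base.
 * Returns 0 and stores the address in *addr, or -1 for any expression
 * this sketch does not understand. */
int eval_fbreg_location(const uint8_t *expr, size_t len,
                        uint64_t frame_base, uint64_t *addr)
{
    size_t pos = 0;
    int64_t offset;

    assert(expr != NULL && addr != NULL);

    if (len == 0 || expr[pos++] != DW_OP_FBREG)
        return -1;
    if (read_sleb128(expr, len, &pos, &offset) != 0 || pos != len)
        return -1;

    *addr = frame_base + (uint64_t)offset;
    return 0;
}
```

For example, the two bytes 0x91 0x6c encode "frame base minus 20", so with a frame base of 0x7ffc1000 the function yields 0x7ffc0fec. Register locations, location lists that vary by PC, and composite locations are deliberately left out of this sketch.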
I believe there's a feature request in gdb bugzilla to add this feature to gdb.

Related

Extract Structure definitions from executable

I need to extract structure definitions from an executable. How can I do that?
I read we can do it using ELF, but not sure how to do this. Any help here?
"I read we can do it using ELF, but not sure how to do this."
What you probably read is that if a binary contains debug info, then the types of variables, structures, and a great many other kinds of info can be extracted from that binary.
This isn't specific to ELF: many other executable formats (such as COFF) allow for embedding of debugging info as well.
Further, the format of that debugging info is different between different platforms. Some of the common UNIX ones are DWARF and STABS (with DWARF being more recent and much more powerful).
If you have an ELF binary, and you suspect that it may contain DWARF debug info, you can decode it using readelf -wi a.out (be prepared for there to be a lot of info, if any is present at all). objdump -g can be used to decode STABS (recent objdump versions can decode DWARF as well).
Or, as suggested by tristan, you can load the executable into GDB and use info types and ptype commands.
If the binary doesn't contain debug info, then DrPrItay's answer is correct: you can't easily recover structure definitions from it. However, you can still recover them by using reverse-engineering techniques. For example, many struct definitions used by the Wine project were obtained by such techniques.
As far as I know, you can't. C and C++ programs are not like Java: structs don't get a symbol. They're just definitions that tell your compiler how to align and pack variables within stack frames or other memory (struct data members). For example, unlike Java, there is nothing resembling class loading when you load shared objects (no header file is included in your C program); you can only load global variables and functions. Defining a struct is much like creating a data type, and its definition only needs to be present for compilation. You don't get a symbol in the symbol table for int or char, so why should you for a struct? It simply makes no sense. Symbols are solely meant for objects that your compiler cannot resolve during compilation, that is, at link time, load time, or run time.
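The point that a struct is only a layout recipe can be seen in a small C sketch. The struct and function below are made up for illustration; the function reads a member the way compiled code does, as "base address plus a constant offset", without any member name surviving to run time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical structure, used only for this illustration. */
struct sample_record {
    char tag;
    int  value;
};

/* Read `value` using nothing but the base address and a constant
 * offset, which is all that remains of the struct after compilation
 * when no debug info is emitted. */
int read_value_by_offset(const struct sample_record *rec)
{
    int out;

    assert(rec != NULL);
    memcpy(&out,
           (const unsigned char *)rec + offsetof(struct sample_record, value),
           sizeof out);
    return out;
}
```

Calling read_value_by_offset on a record returns the same value as rec->value. The actual offset is chosen by the compiler (padding after tag is implementation-defined), which is exactly the information that debug info records and the plain symbol table does not.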

Listing local variables with `nm` command

I am trying to extract information from an object file with the nm command, for a kind of static code analysis in which I have to count the number of declared variables and functions in C code. I have gone through the GNU Binutils documentation. I could find the variables declared at global scope in the symbol table returned by nm, but I couldn't find the variables that are declared at local scope. Why is that? How can I access them?
Is there any way other than nm to extract the information I want? As a compiler, gcc is supposed to generate a symbol table for its own use. Can I access it through any gcc command?
You cannot find local variables in an object file's symbol table, because gcc does not record them there. nm only lists the symbol table of an object file. That symbol table is used for linking, and local variables are not needed at link time. Neither are non-static members of structs and classes.
To make local variables visible, gcc can compile the program with special debug information about them. But for the purposes of static analysis, you should analyse the source code or the machine code in the object files.

Print addresses of all local variables in C

I want to print the addresses of all the local and global variables which are being used in a function, at different points of execution of a program and store them in a file.
I am trying to use gdb for this.
The "info locals" command prints the values of all local variables. I need something that prints the addresses in a similar way. Is there any built-in command for it?
Edit 1
I am working on a gcc plugin which generates a points-to graph at compile time.
I want to verify if the graph generated is correct, i.e. if the pointers do actually point to the variables, which the plugin tells they should be pointing to.
We want to validate this points-to information on large programs with thousands of lines of code. We will be validating this information using a program and not manually. There are several local and global variables in each function, so adding printf statements after every line of code is not feasible.
There is no built-in command to do this. There is an open feature request in gdb bugzilla to have a way to show the meaning of all the known slots in the current stack frame, but nobody has ever implemented this.
This can be done with a bit of gdb scripting. The simplest way is to use Python to iterate over the Blocks of the selected Frame. Then in each such Block, you can iterate over all the variables, and invoke info addr on the variable.
Note that printing the address with print &var will not always work. A variable does not always have an address -- but, if the variable exists, it will have a location, which is what info addr will show.
One simple way these ideas can differ is if the compiler decides to put the variable into a register. There are more complicated cases as well, though, for example the compiler can put the variable into different spots at different points in the function; or can split a local struct into its constituent parts and move them around.
By default info addr tries to print something vaguely human-readable. You can also ask it to just dump the DWARF location expressions if you need that level of detail.
Programmatically (in C/C++), you use the & operator to get the address of a variable (assuming it's not already a pointer):
int a = 0; // variable declaration
printf("%d\n", a); // print the value of the variable (as an integer)
printf("%p\n", (void *)&a); // print the address of the variable (as a pointer)
The same goes for gdb: just use &, as in print &a.
Plus, this question has already been answered elsewhere.

Absolute value of symbol

I was trying to understand the values presented in the System.map file, which gets created every time one compiles the Linux kernel.
The following is a sample from it:
000001d5 A kexec_control_code_size
00400000 A phys_startup_32
c0400000 T _text
c0400000 T startup_32
c04000b4 T start_cpu0
c04000c4 T startup_32_smp
c04000e0 t default_entry
c0400158 t enable_paging
c04001da t is486
In the first line, the type of the symbol kexec_control_code_size is shown as A. I know that A means the value of the symbol is absolute, but I wasn't able to work out what exactly that means. Does "value" mean the address of the symbol? Does "absolute address" mean that this symbol will be present at this address every time the kernel gets loaded into memory?
Please forgive, if the questions are too basic.
You can look up symbol types with "man nm". The nm tool shows all symbols in an object file, and its man page describes each symbol type. Linux kernel modules (.ko files) and kernel object files can be examined with nm. You can also investigate symbols in zImage, uImage, or any other kernel image, and in kernel modules, using objdump and readelf; see their man pages for detailed descriptions. The address of a symbol can be calculated as an offset from some base point, for example the start of a section. The other approach is an absolute value (probably an absolute value relative to the address space?). External symbols should be absolute. Symbols marked as absolute retain the same address through any link operation.
When the linker evaluates an expression, the result is either absolute or relative to some section. A relative expression is expressed as a fixed offset from the base of a section.
The position of the expression within the linker script determines whether it is absolute or relative. An expression which appears within an output section definition is relative to the base of the output section. An expression which appears elsewhere will be absolute.
A symbol set to a relative expression will be relocatable if you request relocatable output using the -r option. That means that a further link operation may change the value of the symbol. The symbol's section will be the section of the relative expression.
A symbol set to an absolute expression will retain the same value through any further link operation. The symbol will be absolute, and will not have any particular associated section.
(Taken from the GNU ld manual.)
An example is here. Look for line "The following example shows how two absolute symbol definitions can be defined. "

How can I force the order of functions in a binary with the gcc toolchain?

I'm building a static binary out of several source files and libraries, and I want to control the order in which the functions are put into the resulting binary.
The background is, I have external code which is linked against offsets in this binary. Now if I change the source, all the offsets change because gcc may decide to order the functions differently, so I want to put the referenced functions at the beginning in a fixed order so their offsets stay unchanged...
I looked through ld's documentation but couldn't find anything about order of functions.
The only thing I found was -fno-toplevel-reorder, which doesn't really help me.
There is really no clean and reliable way of forcing a function to a particular address (except for the entry function) or even forcing functions having a particular order (and if you could enforce the order that would still not mean that the addresses stay the same when the source is changed!).
The biggest problem that I see is that even if it may be possible to fix a function to some address, it will be all but impossible to fix all of them to exactly the addresses that the already existing external program expects (assuming you cannot modify this program). If that actually worked, it would be sheer luck.
It might be easiest to provide trampolines at the addresses that the other program expects, and have them point to the real functions (wherever they may be). That would require your code to use a different base address, so the actual program code doesn't collide with the trampolines.
There are three things that almost work for giving functions fixed addresses:
You can place each function that isn't allowed to move in its proper section using __attribute__ ((section ("some name"))). Unluckily, .text always appears as the first section, so if anything in .text changes so the size is bumped over the 512 byte boundary, your offsets will change. By default (but see below) you can't get a section to start before .text.
The -falign-functions=n commandline option lets you align functions to a boundary. Normally this is something around 16 bytes. Now, you could choose a large value like for example 1024. That will waste an immense amount of space, but it will also make sure that as long as functions only change moderately, the addresses of the following functions will remain the same. Obviously it still does not prevent the compiler/linker from reordering entire blocks when it feels like it (though -fno-toplevel-reorder will prevent this at least partially).
If you are willing to write a custom linker script, you can assign a start address for each section. These are virtual memory addresses, not positions in the executable, but I assume the hard linking works with VMAs (based on the default image base) too. So that could kind of work, although with much trouble and not in a pretty way.
When writing your own linker script, you could also consider putting the functions that must not move into their own sections and moving these sections at the beginning of the executable (in front of .text), so changes in .text won't move your functions around.
Update:
The "gcc" tag suggests that you probably target *NIX, so again this is probably not going to help you, but... if you have the option to use COFF, dollar-sign sections might work (the info might be interesting for others, in any case).
I just stumbled across this today (emphasis mine):
The "$" character (dollar sign) has a special interpretation in section names in object files. When determining the image section that will contain the contents of an object section, the linker discards the "$" and all characters that follow it. Thus, an object section named .text$X actually contributes to the .text section in the image. However, the characters following the "$" determine the ordering of the contributions to the image section. All contributions with the same object-section name are allocated contiguously in the image, and the blocks of contributions are sorted in lexical order by object-section name. Therefore, everything in object files with section name .text$X ends up together, after the .text$W contributions and before the .text$Y contributions.
If the documentation does not lie (and if I'm not reading wrong), this means you should be able to pack all the functions that you want located in the front into one section .text$A, and everything else into .text$B, and it should do just that.
Build your code with -ffunction-sections -- this will place each function into its own section.
If you are using GNU ld, the linker script gives you absolute control, but it is a very platform-specific and somewhat painful solution.
A better solution might be to use the recent work on gold, which allows exactly the function ordering you are seeking.
A lot of it comes from the order the functions are in the file and the order the files are on the command line when you link.
Embed something in the code that your external code can find, perhaps a const structure with some ASCII marker and the addresses of the functions. Then, no matter where the compiler puts the functions, you can find them.
Either that, or use the normal .dll or .so mechanisms and not have to mess with it.
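A minimal sketch of the marker-table idea in C follows. The marker string "FNTBL01", the struct layout, and all function names are invented for this example. The search function stands in for the external code and scans a memory image for the marker.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*binop_fn)(int, int);

static int add(int a, int b) { return a + b; }
static int sub(int a, int b) { return a - b; }

/* Table the external code searches for: an ASCII marker followed by
 * the function addresses in a fixed, documented order. */
struct export_table {
    char     magic[8];      /* "FNTBL01" plus the terminating NUL */
    binop_fn entries[2];    /* [0] = add, [1] = sub */
};

const struct export_table exported_functions = {
    "FNTBL01",
    { add, sub }
};

/* Scan `image` for the marker and copy the table out.
 * memcpy is used so the table may sit at any byte offset.
 * Returns 0 on success, -1 if the marker is not found. */
int find_export_table(const unsigned char *image, size_t len,
                      struct export_table *out)
{
    assert(image != NULL && out != NULL);

    for (size_t i = 0; i + sizeof *out <= len; i++) {
        if (memcmp(image + i, exported_functions.magic,
                   sizeof out->magic) == 0) {
            memcpy(out, image + i, sizeof *out);
            return 0;
        }
    }
    return -1;
}
```

Once found, entries[0] and entries[1] can be called regardless of where the functions ended up. Note the assumption that the image is the loaded program in memory: in the file on disk, the stored entries are link-time addresses (or relocation targets in a position-independent executable), so an external tool reading the file would still have to translate them.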
In my experience, gcc -O0 will fix the binary order of functions to match the order in the source code.
However as others have mentioned, even if the order is fixed, the offsets can change as you modify the source code or upgrade your toolchain.

Resources