Extract Structure definitions from executable - debugging

I need to extract structure definitions from an executable. How can I do that?
I read we can do it using ELF, but not sure how to do this. Any help here?

I read we can do it using ELF, but not sure how to do this.
What you probably read is that if a binary contains debug info, then the types of variables, structures, and great many other kinds of info can be extracted from that binary.
This isn't specific to ELF: many other executable formats (such as COFF) allow for embedding of debugging info as well.
Further, the format of that debugging info is different between different platforms. Some of the common UNIX ones are DWARF and STABS (with DWARF being more recent and much more powerful).
If you have an ELF binary, and you suspect that it may contain DWARF debug info, you can decode it using readelf -wi a.out (be prepared for there to be a lot of info, if any is present at all). objdump -g can be used to decode STABS (recent objdump versions can decode DWARF as well).
Or, as suggested by tristan, you can load the executable into GDB and use info types and ptype commands.
If the binary doesn't contain debug info, then DrPrItay's answer is correct: you can't easily recover structure definitions from it. However, you still can recover them by using reverse-engineering techniques. For example, many struct definitions used by the Wine project (example) were obtained by such techniques.

As much as I know, you can't. c / c++ programs are not like java, structs dont gain a symbol. Their just definitions for your compiler about how to align and pack variables within stack frames or some other memory (struct data members). For example unlike java you dont have what resembles class loading when loading shared objects's (no header file included within your c program ) you can only load global variables and functions. Defining a struct is much as creating some data type, it's definition should be only present for compilation, you dont get a symbol within the symtable for int or char then why should you for some struct? It simply makes no sense. Symbols aee soley meant for objects that your compiler doesn't recognize during compilation - link time/load time/run time

Related

Rust library for inspecting .rlib binaries

I'm looking for a way to load and inspect .rlib binaries generated by rustc. I've hunted around the standard library without much luck. My assumption is that an .rlib contains all the type information necessary to statically type check programs that "extern crate" it. rustc::metadata is where my hunt ended. I can't quite figure out if the structures available at this point in the compiler are intended as entry points for users, or if they are solely intermediate abstractions depending on a chain of previously initialized data.
Alternatively, If there's a way to dump an .rlib to stdout in a parsable form then that's also fantastic. I tried /usr/bin/nm, but it seemed to be excluding function type signatures. Maybe I'm missing something.
Anyways, I'm working on an editor utility for emacs that I hope at some point will provide contextually relevant information such as available methods, module items and their types, etc. I'd really appreciate any hints anyone has.
The .rlib file is an ar archive file. You can use readelf to read its content.
Try readelf -s <your_lib>.rlib. The type name may be mingled/decorated by the compiler so it may not be exactly the same as in .rs file.

GCC proper visibility for shared object written in C++

I have a huge project written in C++. It's all split into multiple static libraries that are eventually linked into one final shared library which has to export only a few simple functions.
If I do objdump of that final .so I see all my internal names etc. Because it uses long class names and namespaces these strings become excessively long and as a result final binary is big.
So, my question is how do I do it properly with GCC to make sure that all these internal functions do not show up in the final binary?
I'm aware about all these GCC-specific visibility modifiers, I use -fvisibility=hidden -fvisibility-inlines-hidden, I use -Wl,--no-whole-archive. I disable c++ exceptions and rtti (-fno-exceptions -fno-rtti) but i still can't get GCC to generate my final .so that doesn't contain names of my namespaces and classes that aren't supposed to be there at all!
I tried to use -Wl,--version-script= to control which functions should be visible, but still I see lot's of internal names in final stripped shared object. I read multiple similar entries on SO, but don't see anything that does the job.
Note: I compile for multiple platforms (Linux, Windows, iPhone etc) and only on windows in VS I don't have any problems.
thanks
You might want to try the --retain-symbols-file linker option when linking the final .so file (-Wl,--retain-symbols-file=filename) to specify JUST the symbols you want to keep (export) and delete everything else. The file is just a text file with symbols (one per line) to keep.

Static library "interface"

Is there any way to tell the compiler (gcc/mingw32) when building an object file (lib*.o) to only expose certain functions from the .c file?
The reason I want to do this is that I am statically linking to a 100,000+ line library (SQLite), but am only using a select few of the functions it offers. I am hoping that if I can tell the compiler to only expose those functions, it will optimize out all the code of the functions that are never needed for those few I selected, thus dratically decreasing the size of the library.
I found several possible solutions:
This is what I asked about. It is the gcc equivalent of Windows' dllexpoort:
http://gcc.gnu.org/onlinedocs/gcc-4.6.1/gcc/Code-Gen-Options.html (-fvisibility)
http://gcc.gnu.org/wiki/Visibility
I also discovered link-time code-generation. This allows the linker to see what parts of the code are actually used and get rid of the rest. Using this together with strip and -fwhole-program has given me drastically better results.
http://gcc.gnu.org/onlinedocs/gcc-4.6.1/gcc/Optimize-Options.html (see -flto and -fwhole-program)
Note: This flag only makes sense if you are not compiling the whole program in one call to gcc, which is what I was doing (making a sqlite.o file and then statically linking it in).
The third option which I found but have not yet looked into is mentioned here:
How to remove unused C/C++ symbols with GCC and ld?
That's probably the linkers job, not the compilers. When linking that as a program (.exe), the linker will take care of only importing the relevant symbols, and when linking a DLL, the __dllexport mechanism is probably what you are looking for, or some flags of ld can help you (man ld).

Does exist any utility to know the size of a compiled function in an executable?

I want a report showing me the size of diferent symbols(compiled) in the executable. Something like .map files in Delphi, but generic if possible. nm from binutils, shows start address(?), maybe could i use that information?
(I'm using object pascal + freepascal compiler)
FPC/LD can generate mapfiles too
various ways to analyze .o files. (nm, objdump and parse the address increments between sections)
maybe the information is stored in the .ppu, have a look in the ppu unit (compiler dir) which contains .ppu loaders

Size of a library and the executable

I have a static library *.lib created using MSVC on windows. The size of library is say 70KB. Then I have an application which links this library. But now the size of the final executable (*.exe) is 29KB, less than the library. What i want to know is :
Since the library is statically linked, I was thinking it should add directly to the executable size and the final exe size should be more than that? Does windows exe format also do some compression of the binary data?
How is it for linux systems, that is how do sizes of library on linux (*.a/*.la file) relate with size of linux executable (*.out) ?
-AD
A static library on both Windows and Unix is a collection of .obj/.o files. The linker looks at each of these object files and determines if it is needed for the program to link. If it isn't needed, then the object file won't get included in the final executable. This can lead to executables that are smaller then the library.
EDIT: As MSalters points out, on Windows the VC++ compiler now supports generating object files that enable function-level linking, e.g., see here. In fact, edit-and-continue requires this, since the edit-and-continue needs to be able to replace the smallest possible part of the executable.
There is additional bookkeeping information in the .lib file that is not needed for the final executable. This information helps the linker find the code to actually link. Also, debug information may be stored in the .lib file but not in the .exe file (I don't recall where debug info is stored for objs in a lib file, it might be somewhere else).
The static library probably contains several functions which are never used. When the linker links the library with the main executable, it sees that certain functions are never used (and that their addresses are never taken and stored in function pointers), it just throws away the code. It can also do this recursively: if function A() is never called, and A() calls B(), but B() is never otherwise called, it can remove the code for both A() and B(). On Linux, the same thing happens.
A static library has to contain every symbol defined in its source code, because it might get linked into an executable which needs just that specific symbol. But once it is linked into an executable, we know exactly which symbols end up being used, and which ones don't. So the linker can trivially remove unused code, trimming the file size by a lot. Similarly, any duplicate symbols (anything that's defined in both the static library and the executable it's linked into gets merged into a single instance.
Disclaimer: It's been a long time since I dealt with static linking, so take my answer with a grain of salt.
You wrote: I was thinking it should add directly to the executable size and final exe size should be more than that?
Naive linkers work exactly this way - back when I was doing hobby development for CP/M systems (a LONG time ago), this was a real problem.
Modern linkers are smarter, however - they only link in the functions referenced by the original code, or as required.
Additionally to the current answers, the linker is allowed to remove function definitions if they have identical object code - this is intended to help reduce the bloating effects of templated code.
#All: Thanks for the pointers.
#Greg Hewgill - Your answer was a good pointer. Thanks.
The answer i found out was as follows:
1.)During Library building what happens is if the option "Keep Program debug databse" in MSVC (or something alike ) is ON, then library will have this debug info bloating its size.
but when i statically include that library and create a executable, the linker strips all that debug info from the library before geenrating the exe and hence the exe size is less than that of the library.
2.) When i disabled the option "Keep Program debug databse", i got an library whose size was smaller than the final executable, which was what i thought is nromal in most situations.
-AD

Resources