How to use DWARF to map instructions in an object file to source code lines

I'm new to DWARF and have used some tools like addr2line and objdump. However, the problem I'm facing is that I want to map all the instructions in object files/static libraries to their source code lines. The tricky part is that there are no final addresses in an object file, since every function starts at 0x00. So addr2line doesn't work (or maybe I didn't use it correctly).
Do you know of any existing tools or other suggestions, so that I don't have to parse the DWARF info myself?
Thanks!

In the end I wrote a tool that iterates over the .debug_line info to build the mapping myself. BTW, I found that the order of functions in .debug_line is not the same as in the .debug_info section: .debug_line follows the order of the function code in the object file, but .debug_info's order follows some other rule.
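Once .debug_line has been decoded into rows, mapping an instruction back to a source line is a lookup in a sorted table, done separately for each sequence. A minimal Python sketch, assuming the rows have already been decoded, in program order, into a hypothetical `Row(address, file, line, end_sequence)` tuple. In an unrelocated object file the addresses are section-relative offsets, so when each function sits in its own section (e.g. with -ffunction-sections) several sequences can start at 0; you still have to know which sequence belongs to which section, which this sketch does not work out for you.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional, Tuple


class Row(NamedTuple):
    """One decoded line-table row (hypothetical representation)."""
    address: int       # section-relative offset in an unrelocated .o
    file: str
    line: int
    end_sequence: bool


def split_sequences(rows: List[Row]) -> List[List[Row]]:
    """Group rows into sequences; each sequence ends with an end_sequence row."""
    sequences: List[List[Row]] = []
    current: List[Row] = []
    for row in rows:
        current.append(row)
        if row.end_sequence:
            sequences.append(current)
            current = []
    return sequences


def lookup(sequence: List[Row], address: int) -> Optional[Tuple[str, int]]:
    """Return (file, line) for an address covered by one sequence, else None.

    Each row covers [row.address, next row's address); the end_sequence row
    only marks the first address past the end of the sequence.
    """
    if not sequence or not (sequence[0].address <= address < sequence[-1].address):
        return None
    addresses = [row.address for row in sequence]
    # Last row whose address is <= the queried address.
    row = sequence[bisect_right(addresses, address) - 1]
    return row.file, row.line


# Made-up rows: two functions, each in its own section, both starting at 0.
rows = [
    Row(0x00, "a.c", 3, False),
    Row(0x08, "a.c", 4, False),
    Row(0x14, "a.c", 6, False),
    Row(0x20, "a.c", 6, True),
    Row(0x00, "a.c", 10, False),
    Row(0x0C, "a.c", 11, False),
    Row(0x18, "a.c", 11, True),
]
seqs = split_sequences(rows)
print(lookup(seqs[0], 0x10))  # ('a.c', 4)
print(lookup(seqs[1], 0x10))  # ('a.c', 11)
```

Keeping one table per sequence, rather than one global table, is what avoids the collisions caused by every function starting at offset 0.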

Related

Which GCC pass handles const strings?

Which GCC pass builds the string array into the .rodata section? I'd like to write a plugin that also intercepts the strings in the source code. I know there are a bunch of tools in binutils that can achieve the same goal, but what if we want to do some post-processing, for example verifying words?
The read-only data section, .rodata, is generated after the last of the RTL passes. You can see how it works in varasm.c, which lives in the gcc/ folder. Look at the function
section *
default_function_rodata_section (tree decl)
and the code below it.
You can also add functions here that intercept the data and write it into the asm file or some other output file, or call out to an external function.
varasm.c handles the generation of all the assembler code except the instructions of a function. This includes declarations of variables and their initial values.
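If post-processing the compiled output is acceptable (the binutils route mentioned in the question), a plugin may not be needed at all: dump the read-only data and scan it for NUL-terminated strings, much like strings(1) does. A minimal Python sketch, assuming the section contents were already extracted to a raw file, e.g. with `objcopy -O binary --only-section=.rodata foo.o rodata.bin`. In object files GCC often places string literals in mergeable sections such as .rodata.str1.1 rather than .rodata itself, so check `readelf -S` for the right section name. The word check is a hypothetical stand-in for whatever verification you need, and since .rodata also holds non-string constants, expect some false positives.

```python
import re
from typing import List, Set

# Runs of at least 4 printable ASCII bytes (or tabs) followed by a NUL terminator.
STRING_RE = re.compile(rb"[\x20-\x7e\t]{4,}(?=\x00)")


def rodata_strings(blob: bytes) -> List[str]:
    """Return NUL-terminated printable strings found in a raw section dump."""
    return [m.group().decode("ascii") for m in STRING_RE.finditer(blob)]


def unknown_words(strings: List[str], dictionary: Set[str]) -> List[str]:
    """Hypothetical verification step: words that are not in the dictionary."""
    words = re.findall(r"[A-Za-z]+", " ".join(strings))
    return sorted({w for w in words if w.lower() not in dictionary})


# In practice: blob = open("rodata.bin", "rb").read()
blob = b"\x01\x00Hello, wrold\x00\xff\xfeab\x00Usage: %s FILE\x00"
found = rodata_strings(blob)
print(found)  # ['Hello, wrold', 'Usage: %s FILE']
print(unknown_words(found, {"hello", "usage", "file", "s"}))  # ['wrold']
```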

Extract Structure definitions from executable

I need to extract structure definitions from an executable. How can I do that?
I read that this can be done using ELF, but I'm not sure how. Any help here?
What you probably read is that if a binary contains debug info, then the types of variables, structures, and a great many other kinds of info can be extracted from that binary.
This isn't specific to ELF: many other executable formats (such as COFF) allow for embedding of debugging info as well.
Further, the format of that debugging info differs between platforms. Some of the common UNIX ones are DWARF and STABS (with DWARF being more recent and much more powerful).
If you have an ELF binary, and you suspect that it may contain DWARF debug info, you can decode it using readelf -wi a.out (be prepared for there to be a lot of info, if any is present at all). objdump -g can be used to decode STABS (recent objdump versions can decode DWARF as well).
Or, as suggested by tristan, you can load the executable into GDB and use info types and ptype commands.
If the binary doesn't contain debug info, then DrPrItay's answer is correct: you can't easily recover structure definitions from it. However, you can still recover them using reverse-engineering techniques. For example, many struct definitions used by the Wine project were obtained by such techniques.
As far as I know, you can't. C/C++ programs are not like Java: structs don't get a symbol. They're just definitions that tell your compiler how to align and pack variables (the struct data members) within stack frames or other memory. For example, unlike Java, there is nothing resembling class loading when loading shared objects (no header file is included in your C program); you can only load global variables and functions. Defining a struct is much like creating a data type, and its definition only needs to be present at compile time. You don't get a symbol in the symbol table for int or char, so why should you for a struct? It simply makes no sense. Symbols are solely meant for objects that your compiler can't resolve during compilation, i.e. those resolved at link time, load time, or run time.

Is it possible to make writeable variables in .text segment using DB directive in NASM?

I've tried declaring variables in the .text segment using e.g. file_handle: dd 0.
However, trying to store something in this variable, like mov [file_handle], eax, results in a write error.
I know I could declare writable variables in the .data segment, but to make the code more compact I'd like to try it as above.
Is the only possibility to use the stack for storing these values (e.g. the file handle), or could I somehow write to my variable above?
Executable code segments are not writable by default. This is a basic security precaution. No, it's not a good idea. But if you insist, as this is a toy project anyway, go ahead.
You can make yours writable by letting the linker know to mark it so, e.g. give the following argument to the MS linker:
link /SECTION:.text,EWR ....
You can actually arrange for the text segment of your Windows process to be mapped read+write+execute; see Kuba's answer. This is also possible on Linux with ELF binaries, since ELF segments carry similar permission flags; for example, GNU ld's -N (--omagic) option makes the text and data sections readable and writable.
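ELF does have such flags: each program header carries a p_flags bitmask with PF_X = 0x1, PF_W = 0x2 and PF_R = 0x4, which `readelf -l` shows in its Flg column. A minimal Python sketch that renders those bits the way readelf does; reading the program headers out of an actual file is left out, and the sample values are just typical examples.

```python
PF_X = 0x1  # segment is executable
PF_W = 0x2  # segment is writable
PF_R = 0x4  # segment is readable


def describe_p_flags(p_flags: int) -> str:
    """Render ELF segment permission bits in readelf's 'RWE' style."""
    return "".join([
        "R" if p_flags & PF_R else " ",
        "W" if p_flags & PF_W else " ",
        "E" if p_flags & PF_X else " ",
    ])


print(repr(describe_p_flags(0x5)))  # 'R E'  a typical text segment
print(repr(describe_p_flags(0x6)))  # 'RW '  a typical data segment
print(repr(describe_p_flags(0x7)))  # 'RWE'  what a writable text segment needs
```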
I think you could also call a Windows function (VirtualProtect) to change the mapping of your text segment to read+write+execute from inside your process.
Overall this sounds like a terrible idea, and you should definitely keep temporaries on the stack like a C compiler would, if you want to avoid having a data page.
Static storage for things you only use in part of the program is wasteful.
No, it's not possible to have a writable "variable" in the .text section of an assembly program.
When you write file_handle: dd 0 in the .text section and then assemble, your label file_handle refers to an address located in the text section of your binary. However, the text section is read-only.
If the text section weren't read-only, a program could modify itself while executing.

How to decode a debug_line section?

I'm trying to figure out how a DWARF 2 debug_line section is encoded. The standard document (http://www.dwarfstd.org/doc/dwarf-2.0.0.pdf) isn't much help to me, and I really don't understand how something like the following:
.4byte .debug_line
.4byte 0x736e7502, 0x656e6769, 0x6e692064, 0x04070074
represents anything. There's the "unsigned int" string encoded there, but what does the 0x02 value before it represent? I can't even find a standard enum/define header with the DWARF 2 constants. Can someone shed some light on how to parse a debug_line section in DWARF 2?
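Decoding those four words by hand shows where the 0x02 sits. A minimal Python sketch, assuming a little-endian target (the byte order under which the words spell out "unsigned int"). Note that this data looks like .debug_info rather than .debug_line: the `.4byte .debug_line` is plausibly the offset stored in a compile unit's DW_AT_stmt_list attribute, and the bytes that follow look like a base-type entry. Under that reading, the leading 0x02 would be a ULEB128 abbreviation code pointing into .debug_abbrev, which lists the attributes that follow and their forms; 0x07 happens to equal DW_ATE_unsigned and 0x04 is a plausible byte size, but which attribute each byte belongs to is only known from that abbreviation entry, so treat the split below as an assumption.

```python
import struct
from typing import Tuple

# The four words from the question, assumed to come from a little-endian
# target (.4byte then stores the low byte first).
words = [0x736E7502, 0x656E6769, 0x6E692064, 0x04070074]
data = struct.pack("<4I", *words)
print(data)  # b'\x02unsigned int\x00\x07\x04'


def read_uleb128(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one unsigned LEB128 value; return (value, next position)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


# Assumed layout: abbreviation code, inline NUL-terminated name, two data bytes.
abbrev_code, pos = read_uleb128(data, 0)
end = data.index(b"\x00", pos)
name = data[pos:end].decode("ascii")
trailing = list(data[end + 1:])
print(abbrev_code, repr(name), trailing)  # 2 'unsigned int' [7, 4]
```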
I know it's quite an old question, but someone might still be looking for a way to read the .debug_line section.
I found that readelf is able to parse it:
readelf --debug-dump=line <path/to/binary>
(or)
readelf --debug-dump=decodedline <path/to/binary>
The first shows the .debug_line content interpreted as it appears in the ELF binary. The second composes all the data into a more structured view, taking into account the references between the different records in a particular .debug_line unit.
There is also a tool called dwarfdump (available in the Ubuntu repos), but I haven't had a chance to try it.
If the DWARF standard isn't helping then all I can really suggest is reading some source code that implements .debug_line parsing. Maybe that will be more clear; or maybe reading it in conjunction with the DWARF standard will help. There are plenty of readers for this information; a relatively simple one is in the GNU binutils; grab the source and look for .debug_line decoding in "bfd/dwarf2.c".
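The heart of any .debug_line reader, including the one in bfd/dwarf2.c, is the line-number state machine described in the DWARF spec. Below is a minimal Python sketch of it for the DWARF 2 opcodes, assuming you have already parsed the line program header (minimum_instruction_length, line_base, line_range, opcode_base) and sliced out the opcode bytes that follow it, and assuming a little-endian target. It does not track columns, is_stmt or basic_block, and it rejects the extra standard opcodes added in DWARF 3. The header values and program bytes in the example are made up.

```python
from typing import List, Tuple

Row = Tuple[int, int, int, bool]  # (address, file index, line, end_sequence)


def uleb128(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode an unsigned LEB128 value; return (value, next position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


def sleb128(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a signed LEB128 value; return (value, next position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            if byte & 0x40:  # sign bit of the last byte: sign-extend
                result -= 1 << shift
            return result, pos


def run_line_program(prog: bytes, min_inst_length: int, line_base: int,
                     line_range: int, opcode_base: int) -> List[Row]:
    """Run the DWARF 2 line-number state machine over the opcode bytes."""
    rows: List[Row] = []
    address, file_index, line = 0, 1, 1  # initial state
    pos = 0
    while pos < len(prog):
        op = prog[pos]
        pos += 1
        if op >= opcode_base:
            # Special opcode: advance address and line together, then emit a row.
            adjusted = op - opcode_base
            address += (adjusted // line_range) * min_inst_length
            line += line_base + adjusted % line_range
            rows.append((address, file_index, line, False))
        elif op == 0:
            # Extended opcode: 0x00, ULEB128 length, sub-opcode, operands.
            length, pos = uleb128(prog, pos)
            sub = prog[pos]
            if sub == 1:  # DW_LNE_end_sequence: final row, then reset the state
                rows.append((address, file_index, line, True))
                address, file_index, line = 0, 1, 1
            elif sub == 2:  # DW_LNE_set_address
                address = int.from_bytes(prog[pos + 1:pos + length], "little")
            # Other extended opcodes (e.g. DW_LNE_define_file) are skipped.
            pos += length
        elif op == 1:  # DW_LNS_copy
            rows.append((address, file_index, line, False))
        elif op == 2:  # DW_LNS_advance_pc
            delta, pos = uleb128(prog, pos)
            address += delta * min_inst_length
        elif op == 3:  # DW_LNS_advance_line
            delta, pos = sleb128(prog, pos)
            line += delta
        elif op == 4:  # DW_LNS_set_file
            file_index, pos = uleb128(prog, pos)
        elif op == 5:  # DW_LNS_set_column (value ignored here)
            _, pos = uleb128(prog, pos)
        elif op in (6, 7):  # DW_LNS_negate_stmt, DW_LNS_set_basic_block
            pass
        elif op == 8:  # DW_LNS_const_add_pc: address advance of special opcode 255
            address += ((255 - opcode_base) // line_range) * min_inst_length
        elif op == 9:  # DW_LNS_fixed_advance_pc: unscaled 2-byte operand
            address += int.from_bytes(prog[pos:pos + 2], "little")
            pos += 2
        else:
            raise ValueError(f"standard opcode {op} not handled (DWARF 3+?)")
    return rows


prog = bytes([
    0x00, 0x05, 0x02, 0x00, 0x10, 0x00, 0x00,  # DW_LNE_set_address 0x1000
    0x03, 0x02,                                # DW_LNS_advance_line +2
    0x01,                                      # DW_LNS_copy
    0x48,                                      # special: address +4, line +1
    0x2D,                                      # special: address +2, line +2
    0x02, 0x04,                                # DW_LNS_advance_pc 4
    0x00, 0x01, 0x01,                          # DW_LNE_end_sequence
])
rows = run_line_program(prog, min_inst_length=1, line_base=-5,
                        line_range=14, opcode_base=10)
for address, file_index, line, end in rows:
    print(hex(address), file_index, line, end)
# 0x1000 1 3 False
# 0x1004 1 4 False
# 0x1006 1 6 False
# 0x100a 1 6 True
```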
As for a standard header, binutils also includes a dwarf2.h. But you are right -- I don't think there is a standard header, rather various projects (binutils, elfutils, libdwarf, probably others) make their own headers, generally derived from the DWARF spec.

Within a DLL, how is the function table structured?

I've been looking into the implementation of a device library that doesn't explicitly support my operating system. In particular, I have a disassembled DLL, and a fair amount of supporting source code. Now, how is the function table/export table structured?
My understanding is that the first structure of the .data section is a table of RVAs. Next is a table of strings linked by index to that first address table. This makes sense to me, as a linker could translate between symbols and addresses.
How do functions referenced by ordinals fit into this picture? How does one know which function has such and such ordinal number, and how does the linker resolve this? In other words, given that some other DLL imports SOME_LIBRARY_ordinal_7, how does the linker know which function to work with?
Thanks, all!
edit
More information...
I'm working with the FTDI libraries and would like to resolve which function is being invoked. In particular, I see something like:
extern FTD2XX_Ordinal_28: near
How might I go about determining which function is being referenced, and how does the linker do this?
To learn how the linker and the loader work on Windows, probably the most accessible information comes from a set of columns Matt Pietrek did more than a decade ago:
July 1997: http://www.microsoft.com/msj/0797/hood0797.aspx
April 1998: http://www.microsoft.com/msj/0498/hood0498.aspx
September 1999: http://www.microsoft.com/msj/0999/hood/hood0999.aspx
And the biggest and best one:
Peering Inside the PE: A tour of the Win32 Portable Executable File Format (from 1994!)
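On the ordinal question: in the PE format the export directory (IMAGE_EXPORT_DIRECTORY, typically in its own .edata section or merged into .rdata) points at three arrays: the export address table of function RVAs, the export name pointer table, and a parallel export ordinal table of 16-bit unbiased indices into the address table. An import by ordinal N is resolved by taking entry N - Base of the address table, where Base is a field of the export directory; names only matter for imports by name. So for an import like FTD2XX_Ordinal_28, you would look up ordinal 28 in the export directory of the FTDI DLL and, if that entry also has a name, find it in the name table. A minimal Python sketch over already-parsed tables with made-up values; locating and reading the real structures from the file is left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExportDirectory:
    """Already-parsed export tables of a PE image (hypothetical values)."""
    base: int                 # IMAGE_EXPORT_DIRECTORY.Base, the ordinal bias
    function_rvas: List[int]  # export address table, indexed by ordinal - base
    names: List[str]          # export names, sorted like the name pointer table
    name_ordinals: List[int]  # parallel to names: unbiased indices into function_rvas

    def rva_for_ordinal(self, ordinal: int) -> int:
        """Resolve an import by ordinal: index the address table directly."""
        index = ordinal - self.base
        if not 0 <= index < len(self.function_rvas):
            raise KeyError(f"ordinal {ordinal} is not exported")
        return self.function_rvas[index]

    def name_for_ordinal(self, ordinal: int) -> Optional[str]:
        """Reverse lookup; returns None for exports that have no name."""
        index = ordinal - self.base
        for name, name_index in zip(self.names, self.name_ordinals):
            if name_index == index:
                return name
        return None

    def ordinal_for_name(self, name: str) -> int:
        """Import by name: find the name, then re-apply the base bias."""
        return self.name_ordinals[self.names.index(name)] + self.base


exports = ExportDirectory(
    base=1,
    function_rvas=[0x1010, 0x1040, 0x1080],
    names=["FuncA", "FuncB"],
    name_ordinals=[1, 0],
)
print(hex(exports.rva_for_ordinal(2)))    # 0x1040
print(exports.name_for_ordinal(2))        # FuncA
print(exports.ordinal_for_name("FuncB"))  # 1
print(exports.name_for_ordinal(3))        # None (exported by ordinal only)
```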
