map memory addresses to line numbers using DWARF information - debugging

I have an application that traces program execution through memory. I tried to use readelf --debug-dump=decodedline to get memory address / line number information, but the memory addresses I see often don't match the ones given by that dump. I wrote something to match each address with the "most recent" one appearing in the DWARF data -- this seemed to clean some things up, but I'm not sure if that's the "official" way to interpret this data.
Can someone explain the exact process to map a program address to line number using DWARF?

Have a look at the program addr2line. It can probably give you some guidance on how to do this, if not solving your problem entirely (e.g. by shelling out to it, or linking its functionality in).

Indeed, as mentioned by Phil Miller's answer, addr2line is your friend. I have a gist where I show how I get the line number in the (C++) application source code from an address obtained from a backtrace.
This will not show you the exact mapping process you asked about, but it can give you an idea of how source code gets mapped to object code (in an executable or a library/archive). Hope it helps.
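For reference, the "most recent address" approach from the question can be sketched as below. In a DWARF line table, each row applies from its own address up to the address of the next row in the same sequence, and an end_sequence row marks the first address past the end of a sequence. The LineRow type, the helper names, and the sample rows are all hypothetical; in practice the rows would come from parsing the readelf --debug-dump=decodedline output or from a DWARF library.

```python
from bisect import bisect_right
from typing import NamedTuple


class LineRow(NamedTuple):
    """One row of a decoded line table (hypothetical container type)."""
    address: int
    file: str
    line: int
    end_sequence: bool = False


def build_index(rows):
    # Sort by address. When an end_sequence row and the first row of another
    # sequence share an address, put the end_sequence row first so that the
    # lookup below picks the row that starts the new sequence.
    ordered = sorted(rows, key=lambda r: (r.address, 0 if r.end_sequence else 1))
    return [r.address for r in ordered], ordered


def lookup(index, pc):
    """Return (file, line) for pc, or None if pc is not covered."""
    addresses, ordered = index
    i = bisect_right(addresses, pc) - 1  # last row with address <= pc
    if i < 0:
        return None                      # pc is below every row
    row = ordered[i]
    if row.end_sequence:
        return None                      # pc falls in a gap between sequences
    return row.file, row.line


# Made-up rows, for illustration only.
ROWS = [
    LineRow(0x400500, "main.c", 10),
    LineRow(0x400508, "main.c", 11),
    LineRow(0x400520, "main.c", 13),
    LineRow(0x400530, "main.c", 0, end_sequence=True),
    LineRow(0x400600, "util.c", 4),
    LineRow(0x400610, "util.c", 0, end_sequence=True),
]
INDEX = build_index(ROWS)
```

With these rows, lookup(INDEX, 0x40050c) resolves to the row at 0x400508, while an address between 0x400530 and 0x400600 resolves to nothing. Exact address matches are therefore not expected for most program counters; only the first instruction generated for a line appears in the table.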

Related

Using binary breakpoints in GDB - how exact is the location?

I have some memorydumps from Linux Redhat GCC compiled programs like:
/apps/suns/runtime/bin/mardb82[0x40853b]
When I open mardb82 and put the breakpoint with break *0x40853b it will give me C filename/lineno which seems quite correct, but not completely.
Can I trust it, and what does it depend on? Is it sufficient if the source file in question is the same or does the files making up the executable have to be the same?
Can I find the locations in sources in some other way?
(Max debug info and sources are present, I haven't tried not having the sources present or passing them in)
"When I open mardb82 and put the breakpoint with break *0x40853b it will give me C filename/lineno which seems quite correct, but not completely."
A faster way to get the filename/line:
addr2line -fe /path/to/mardb82 0x40853b
You didn't say where the ...bin/mardb82[0x40853b] line came from. Assuming it is part of a crash stack, note that the address is usually that of the instruction following a CALL (the return address), so you may be interested in 0x40853b-5 (on *86 architectures) for all but the innermost level in the stack.
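A minimal sketch of that adjustment, assuming stack lines in the path[0xaddr] form shown in the question. The helper names are hypothetical, and the default of 5 bytes simply follows the suggestion above (the size of a direct near CALL on x86). Indirect calls are shorter, so subtracting 1 instead is a common alternative when the lookup is range-based.

```python
import re

# Matches "path[0xADDR]" and tolerates extra text between path and address.
FRAME_RE = re.compile(r"^(?P<path>[^\s(\[]+).*\[(?P<addr>0x[0-9a-fA-F]+)\]\s*$")


def parse_frame(text):
    """Split a stack line into (binary path, address)."""
    m = FRAME_RE.match(text.strip())
    if m is None:
        raise ValueError("unrecognized frame: %r" % text)
    return m.group("path"), int(m.group("addr"), 16)


def lookup_address(addr, innermost, call_size=5):
    # Outer frames hold return addresses, so step back into the CALL itself.
    return addr if innermost else addr - call_size


def addr2line_command(frame_text, innermost, call_size=5):
    """Build the argument list for an addr2line lookup of one frame."""
    path, addr = parse_frame(frame_text)
    target = lookup_address(addr, innermost, call_size)
    return ["addr2line", "-f", "-e", path, hex(target)]
```

The resulting list can be passed to subprocess.run on a machine where addr2line and the matching binary are available.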
"what does it depend on? Is it sufficient if the source file in question is the same or does the files making up the executable have to be the same?"
The instruction address depends on the particular executable. Any change to source code comprising that executable, to compilation or linking flags, etc. etc. may cause the instructions to shift to a different address.

Debugging a program without source code (Unix / GDB)

This is homework. Tips only, no exact answers please.
I have a compiled program (no source code) that takes in command line arguments. There is a correct sequence of a given number of command line arguments that will make the program print out "Success." Given the wrong arguments it will print out "Failure."
One thing that is confusing me is that the instructions mention two system tools (without naming them) which will help in figuring out the correct arguments. The only tool I'm familiar with (unless I'm overlooking something) is GDB, so I believe I am missing a critical component of this challenge.
The challenge is to figure out the correct arguments. So far I've run the program in GDB and set a breakpoint at main but I really don't know where to go from there. Any pro tips?
Are you sure you have to debug it? It would be easier to disassemble it. When you disassemble it, look for cmp instructions.
There exist not only tools to disassemble x86 binaries into assembly listings, but also some which attempt to show a more high-level or readable listing. Try googling and see what you find. I'd be specific, but then, that would be counterproductive if your job is to learn some reverse engineering skills.
It is possible that the code is something like this: If Arg(1)='FOO' then print "Success". So you might not need to disassemble at all. Instead you might only need to find a tool which dumps out all strings in the executable that look like sequences of ASCII characters. If the sequence you are supposed to input is not in the set of characters easily input from the keyboard, there exist many utilities that will do this. If the program has been very carefully constructed, the author won't have left "FOO" (if that was the "password") in plain sight, but will have tried to obscure it somewhat.
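As a rough illustration of what such a string-dumping tool does, here is a small sketch. It assumes "printable" means bytes 0x20 through 0x7e and uses a minimum run length of 4, which is the usual default of the strings utility. The function name is hypothetical.

```python
import re


def ascii_strings(data, min_len=4):
    """Return every run of at least min_len printable ASCII bytes in data."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def strings_in_file(path, min_len=4):
    # Read the whole binary; fine for small executables like a homework binary.
    with open(path, "rb") as f:
        return ascii_strings(f.read(), min_len)
```

If the literals were obscured in the binary, a scan like this will not reveal them, which is itself a useful hint about where to look next.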
Personally I would start with an ltrace of the program with any arbitrary set of arguments. I'd then use the strings command and guess from that what some of the hidden argument literals might be. (Let's assume, for the moment, that the professor hasn't encrypted or obfuscated the strings and that they appear in the binary as literals.) Then try again with one or two (or the requisite number, if it is known).
If you're lucky the program was compiled and provided to you without running strip. In that case you might have the symbol table to help. Then you could try single stepping through the program (read the gdb manuals). It might be tedious, but there are ways to set a breakpoint and tell the debugger to run through some function call (such as any from the standard libraries) and stop upon return. Doing this repeatedly (identify where it's calling into standard or external libraries, set a breakpoint for the next instruction after the return, let gdb run the process through the call), you can then inspect what the code is doing besides that.
Coupled with the ltrace it should be fairly easy to see the sequencing of the strcmp() (or similar) calls. As you see the string against which your input is being compared you can break out of the whole process and re-invoke the gdb and the program with that one argument, trace through 'til the next one and so on. Or you might learn some more advanced gdb tricks and actually modify your argument vector and restart main() from scratch.
It actually sounds like fun and I might have my wife whip up a simple binary for me to try this on. I might also create a little program to generate binaries of this sort. I'm thinking of a little #include in the sources which provides the "passphrase" of arguments, and a makefile that selects three to five words from /usr/dict/words, generates that #include file from a template, then compiles the binary using that sequence.

How do I determine where process executable code starts and ends when loaded in memory?

Say I have app TestApp.exe
While TestApp.exe is running I want a separate program to be able to read the executable code that is resident in memory. I'd like to ignore stack and heap and anything else that is tangential.
Put another way, I guess I'm asking how to determine where the memory-side equivalent of the .exe binary data on disk resides. I realize it's not a 1:1 stuffing into memory.
Edit: I think what I'm asking for is shown as Image in the following screenshot of vmmap.exe
Edit: I am able to read all memory that is tagged with any protect flag of Execute* (PAGE_EXECUTE, etc.) using VirtualQueryEx and ReadProcessMemory. There are a couple of issues with that. First, I'm grabbing about 2 megabytes of data for notepad.exe, which is a 189 kilobyte file on disk. Everything I'm grabbing has a protect flag of PAGE_EXECUTE. Second, if I run it on a different Win7 64-bit machine I get the same data, only split in half and in a different order. I could use some expert guidance. :)
Edit: Also, not sure why I'm at -1 for this question. If I need to clear anything up please let me know.
Inject a DLL into the target process and call GetModuleHandle with the name of the executable. That will point to its PE header as loaded in memory. Once you have this information, you can parse the PE header manually and find where the .text section is located relative to the base address of the image in memory.
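The header-parsing step could look roughly like the sketch below. It assumes you already have the bytes of the header region read from the module base (for example with ReadProcessMemory, which the question already uses) and that you know the base address. The function names are hypothetical; the offsets follow the documented PE layout (e_lfanew at 0x3C, a 20-byte COFF header after the 4-byte signature, 40-byte section headers).

```python
import struct

IMAGE_SCN_MEM_EXECUTE = 0x20000000  # section characteristics flag: executable


def parse_sections(image):
    """Return (name, VirtualAddress, VirtualSize, Characteristics) per section."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    (num_sections,) = struct.unpack_from("<H", image, e_lfanew + 6)
    (opt_size,) = struct.unpack_from("<H", image, e_lfanew + 20)
    table = e_lfanew + 24 + opt_size  # section table follows the optional header
    sections = []
    for i in range(num_sections):
        off = table + 40 * i
        name, vsize, vaddr = struct.unpack_from("<8sII", image, off)
        (flags,) = struct.unpack_from("<I", image, off + 36)
        text_name = name.rstrip(b"\0").decode("ascii", "replace")
        sections.append((text_name, vaddr, vsize, flags))
    return sections


def executable_ranges(image, base):
    """Return (name, start, end) in process addresses for executable sections."""
    return [(name, base + vaddr, base + vaddr + vsize)
            for name, vaddr, vsize, flags in parse_sections(image)
            if flags & IMAGE_SCN_MEM_EXECUTE]
```

Restricting the read to these ranges of the main module, rather than every page marked executable in the process, would leave out the code of other loaded DLLs.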
There is no need to inject a DLL. Use the native API hooking APIs.
I learned a ton doing this project. I ended up parsing the PE header and using that information to route me all over. In the end I accomplished what I set out to and I am more knowledgeable as a result.

How to map a file offset in an EXE to its PE section

I've opened up a program I wrote with ImageHlp.dll to play around with it a little, and I noticed that there seem to be large gaps in the file. As I understand it, for each PE section, the section header gives its offset in the file as PhysicalAddress, and its size as SizeOfRawData, and thus everything from PhysicalAddress to PhysicalAddress + SizeOfRawData ought to be that section. But there are large swaths of the EXE file that aren't covered by these ranges, so I must be missing something.
I know I can use ImageRVAToSection and give it an RVA address to find out which section that RVA is located in. Is there any way to do something similar with file offsets? How can I find out which PE section byte $ED178 or whatever belongs to?
Edit: Sorry, I didn't read your question carefully enough.
Doing some looking, I'm finding a few files like you mentioned, that the data in the section headers doesn't cover the entire contents of the file. Most of those I've found so far contain a debug record that's not covered. There are a few others with discrepancies I haven't been able to figure out yet though. When/if I can figure out more, I'll add it.
In How does one use VirtualAllocEx do make room for a code cave? I posted a code fragment which examines PEs currently loaded in memory. You will probably find the answer to your question if you compare the contents of the DLL in memory with its contents on disk (which is what ImageHlp.dll shows).
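For the file-offset lookup itself, a sketch along these lines may help. Note that in IMAGE_SECTION_HEADER the file offset of a section is stored in PointerToRawData; Misc.PhysicalAddress shares a union with VirtualSize and is not a file offset. The RawSection type, the function name, and the section layout below are made up for illustration; real values would come from the section headers of the file being inspected.

```python
from typing import NamedTuple


class RawSection(NamedTuple):
    """File-side view of one section header (hypothetical container type)."""
    name: str
    pointer_to_raw_data: int  # PointerToRawData: offset of the data in the file
    size_of_raw_data: int     # SizeOfRawData: size of the data in the file


def section_for_offset(sections, offset):
    """Return the name of the section containing a file offset, or None."""
    for s in sections:
        start = s.pointer_to_raw_data
        if s.size_of_raw_data and start <= offset < start + s.size_of_raw_data:
            return s.name
    return None  # headers, or data outside every section (e.g. debug records)


# Made-up layout, for illustration only.
SECTIONS = [
    RawSection(".text", 0x400, 0xC0000),
    RawSection(".data", 0xC0400, 0x20000),
    RawSection(".rsrc", 0xE0400, 0x10000),
]
```

With this made-up layout, offset 0xED178 would land in .rsrc, while an offset inside the headers or past the last section returns None, which corresponds to the uncovered regions described above.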

Getting line number from pdb in release mode

Is it possible for the debugger (or the CLR exception handler) to show the line where the exception happened in Release mode using the pdb?
The code, in release mode, is optimized and does not always follow the order and logic of the "original" code.
It's also surprising that the debugger can navigate through my code step by step, even in Release mode. The optimization should make the navigation very uncomfortable.
Could you please clarify those two points for me?
I'm not as familiar with how this is done with CLR, but it's probably very similar to how it's done with native code. When the compiler generates machine instructions, it adds entries to the pdb that basically say "the instruction at the current address, X, came from line 25 in foo.cpp".
The debugger knows what program address is currently executing. So it looks up some address, X, in the pdb and sees that it came from line 25 in foo.cpp. Using this, it's able to "step" through your source code.
This process is the same regardless of Debug or Release mode (provided that a pdb is generated at all in Release mode). You are right, however, that often in release mode due to optimizations the debugger won't step "linearly" through the code. It might jump around to different lines unexpectedly. This is due to the optimizer changing the order of instructions, but it doesn't change the address-to-source-line mapping, so the debugger is still able to follow it.
[#Not Sure] has it almost right. The compiler makes a best effort at identifying an appropriate line number that closely matches the current machine code instruction.
The PDB and the debugger don't know anything about optimizations; the PDB file essentially maps address locations in the machine code to source code line numbers. In optimized code, it's not always possible to match exactly an assembly instruction to a specific line of source code, so the compiler will write to the PDB the closest thing it has at hand. This might be "the source code line before", or "the source code line of the enclosing context (loop, etc)" or something else.
Regardless, the debugger essentially finds the entry in the PDB map closest (as in "before or equal") to the current IP (Instruction Pointer) and highlights that line.
Sometimes the match is not very good, and that's when you see the highlighted area jumping all over the place.
The debugger makes a best-effort guess at where the problem occurred. It is not guaranteed to be 100% accurate, and with fully optimized code, it often will be inaccurate - I've found the inaccuracies ranging anywhere from a few lines off to having an entirely wrong call stack.
How accurate the debugger is with optimized code really depends on the code itself and which optimizations you're making.
Reference the following SO question:
Display lines number in stack trace for .NET assembly in release mode

Resources