I'm used to using gdb quite effectively when dealing with ELF binaries that have been compiled with the -ggdb flag. However, I run into a few difficulties with ordinary non-stripped binaries that lack debug information.
I can set a breakpoint at main, but what if I need to set a breakpoint at a fixed offset (say, 10 lines) from the start of main?
Usually I get the address of a character array (say, buf) with print &buf. In the current case, however, I get a message saying that buf cannot be found in the current context.
How do I deal with the above issues? It would be great if you could provide some reading material too.
To get things like source line numbers and variable information, your code needs to be compiled with debug symbols (-ggdb or similar). Compiling without debug symbols but leaving the binary unstripped keeps function and global-variable names, but nothing else. Stripping the executable removes even some of those. So, to answer your question: you can't do the things you want without compiling with -g.
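That said, even without -g, gdb can still work at the symbol and instruction level. A minimal sketch of what is possible (the byte offset and address below are made up for illustration):
break *main stops on the first instruction of main.
x/20i main disassembles the first 20 instructions of main so you can pick an address.
break *(main+0x2f) stops at a byte offset into main; note this is an instruction offset, not a "10 lines" source offset.
info symbol 0x080484d2 maps an address back to the nearest symbol.
For local variables like buf there is simply no symbol to print without debug information; you have to work their stack offsets out of the disassembly.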
Related
Consider the following Linux kernel panic stack trace (you can trigger a panic from kernel source code, for example, by calling panic("debugging a Linux kernel panic");):
[<001360ac>] (unwind_backtrace+0x0/0xf8) from [<00147b7c>] (warn_slowpath_common+0x50/0x60)
[<00147b7c>] (warn_slowpath_common+0x50/0x60) from [<00147c40>] (warn_slowpath_null+0x1c/0x24)
[<00147c40>] (warn_slowpath_null+0x1c/0x24) from [<0014de44>] (local_bh_enable_ip+0xa0/0xac)
[<0014de44>] (local_bh_enable_ip+0xa0/0xac) from [<0019594c>] (bdi_register+0xec/0x150)
In unwind_backtrace+0x0/0xf8 what does +0x0/0xf8 stand for?
How can I see the C code of unwind_backtrace+0x0/0xf8?
How to interpret the panic's content?
It's just an ordinary backtrace; the functions are listed in reverse call order (each one was called by the one that follows it):
unwind_backtrace+0x0/0xf8
warn_slowpath_common+0x50/0x60
warn_slowpath_null+0x1c/0x24
local_bh_enable_ip+0xa0/0xac
bdi_register+0xec/0x150
bdi_register+0xec/0x150 is the symbol plus the offset/length; there's more information about that in Understanding a Kernel Oops, including how you can debug a kernel oops. There's also an excellent tutorial on Debugging the Kernel.
Note: as Eugene suggests below, you may want to try addr2line first; it still needs an image with debugging symbols, though. For example:
addr2line -e vmlinux_with_debug_info 0019594c(+offset)
Here are two alternatives to addr2line. Assuming you have the proper target toolchain, you can do one of the following:
Use objdump:
Locate your vmlinux or the .ko file under the kernel root directory, then disassemble the object file:
objdump -dS vmlinux > /tmp/kernel.s
Open the generated assembly file, /tmp/kernel.s, with a text editor such as vim. Go to unwind_backtrace+0x0/0xf8, i.e., search for the address of unwind_backtrace plus the offset. You have now located the problematic part of your source code.
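For example, to jump straight to the relevant label in the disassembly (symbol name taken from the trace above):
grep -n "<unwind_backtrace>:" /tmp/kernel.s
Then move forward by the offset (+0x0 here) from the address printed at that label.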
Use gdb:
IMO, an even more elegant option is to use the one and only gdb. Assuming you have the suitable toolchain on your host machine:
Run gdb <path-to-vmlinux>.
Execute in gdb's prompt: list *(unwind_backtrace+0x10).
For additional information, you may check out the following resources:
Kernel Debugging Tricks.
Debugging The Linux Kernel Using Gdb
In unwind_backtrace+0x0/0xf8, what does the +0x0/0xf8 stand for?
The first number (+0x0) is the offset from the beginning of the function (unwind_backtrace in this case). The second number (0xf8) is the total length of the function. Given these two pieces of information, if you already have a hunch about where the fault occurred this might be enough to confirm your suspicion (you can tell (roughly) how far along in the function you were).
To get the exact source line of the corresponding instruction (generally better than hunches), use addr2line or the other methods in other answers.
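For example, using the bracketed address from the trace above (and assuming a vmlinux built with debug info):
addr2line -f -e vmlinux 0019594c
The -f flag also prints the function name, so this should report bdi_register together with the source file and line corresponding to bdi_register+0xec.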
My program is panicking, so I followed its advice to run with RUST_BACKTRACE=1 and I got this (just a little snippet):
1: 0x800c05b5 - std::sys::imp::backtrace::tracing::imp::write::hf33ae72d0baa11ed
at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:42
2: 0x800c22ed - std::panicking::default_hook::{{closure}}::h59672b733cc6a455
at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/panicking.rs:351
A panic stops the whole program, so how can I figure out which line it's panicking on?
Are these lines telling me there is a problem at line 42 and line 351?
The whole backtrace is in this image; I felt it would be too messy to copy and paste it here.
I've never heard of a stack trace or a back trace. I'm compiling with warnings, but I don't know what debugging symbols are.
What is a stack trace?
If your program panics, you encountered a bug and would like to fix it; a stack trace wants to help you here. When the panic happens, you would like to know the cause of the panic (the function in which the panic was triggered). But the function directly triggering the panic is usually not enough to really see what's going on. Therefore we also print the function that called the previous function... and so on. We trace back all function calls leading to the panic up to main() which is (pretty much) the first function being called.
What are debug symbols?
When the compiler generates machine code, it pretty much only needs to emit instructions for the CPU. The problem is that it's virtually impossible to quickly see which Rust function a set of instructions came from. Therefore the compiler can insert additional information into the executable that is ignored by the CPU but used by debugging tools.
One important part is file locations: the compiler records which instruction came from which file at which line. This also means that we can later see where a specific function is defined. If we don't have debug symbols, we can't.
In your stack trace you can see a few file locations:
1: 0x800c05b5 - std::sys::imp::backtrace::tracing::imp::write::hf33ae72d0baa11ed
at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:42
The Rust standard library is shipped with debug symbols. As such, we can see where the function is defined (gcc_s.rs line 42).
If you compile in debug mode (rustc or cargo build), debug symbols are activated by default. If you, however, compile in release mode (rustc -O or cargo build --release), debug symbols are disabled by default as they increase the executable size and... usually aren't important for the end user. You can tweak whether or not you want debug symbols in your Cargo.toml in a specific profile section with the debug key.
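For example, to keep debug symbols in an optimized build, a minimal Cargo.toml tweak (the release profile is shown because that is the common case):
[profile.release]
debug = true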
What are all these strange functions?!
When you first look at a stack trace you might be confused by all the strange function names you're seeing. Don't worry, this is normal! You are interested in what part of your code triggered the panic, but the stack trace shows all functions somehow involved. In your example, you can ignore the first 9 entries: those are just functions handling the panic and generating the exact message you are seeing.
Entry 10 is still not your code, but might be interesting as well: the panic was triggered in the index() function of Vec<T> which is called when you use the [] operator. And finally, entry 11 shows a function you defined. But you might have noticed that this entry is missing a file location... the above section describes how to fix that.
What to do with a stack trace? (tl;dr)
Activate debug symbols if you haven't already (e.g. just compile in debug mode).
Ignore any functions from std and core at the top of the stack trace.
Look at the first function you defined, find the corresponding location in your file and fix the bug.
If you haven't already, change all camelCase function and method names to snake_case to stick to the community wide style guide.
I understand the idea more or less: when compiling separate modules and producing assembly code, functions calling each other have to strictly respect the calling convention, which kills the opportunity for many optimisations across module boundaries.
For instance if I have function A which calls function B which calls function C, all 3 in their own separate source files, it becomes possible to allocate registers evenly within the functions so that no register saving on the stack is necessary at all during those calls. With traditional compile-assembly-linking this is not possible, as the caller-saved and callee-saved registers are imposed by the calling convention.
Another optimisation is to inline functions that are called only once. Previously this was possible only if the function was local, but thanks to link-time optimisation it's now possible even if the function is in another source file.
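A minimal sketch of that cross-file inlining (file and function names are made up):
/* b.c */
int square(int x) { return x * x; }
/* a.c */
int square(int x);                    /* only a declaration is visible here */
int main(void) { return square(5); }
gcc -O2 -flto -c a.c b.c              # a.o and b.o now carry GIMPLE, not final code
gcc -O2 -flto a.o b.o -o prog         # code generation happens at this step
Without -flto, the call to square must go through the normal calling convention because the compiler never sees both translation units at once; with -flto, the link step can inline square into main.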
Now, if I compile with both -flto and -S flags, I see that instead of normal assembly instructions, gcc generates an encoded representation of the program, such as this:
.section .gnu.lto_.inline.c3c5e6ef8ec983c,"dr0"
.ascii "x\234mQ;N\303#\20}\273\353\17\370C\234\20\242`\"!Q\20\11Ah\322&\25\242\314\231|\4\32\220\220(,$.#\205D\343\3P Z.\341Tn\231\35\274\31L\342\342\355\314\274\371<\317\30\354\376\356\365\357\333\7\262"
.ascii "1\240G\325\273\202\7\216\232\204\36\205"
.ascii "8\242\370\240|\222"
.ascii "8\374\21\205ty\352\"*r\340!:!n\357n%]\224\345\10|\304\23\342\274z\346"
.ascii "8\35\23\370\7\4\1\366s\362\203j\271]\27bb{\316\353\27\343\310\4\371\374\237*n#\220\342rA\31"
.ascii "7\365\263\327\231\26\364\10"
.ascii "2\\-\311\277\255^w\220}|\340\233\306\352\263\362Qo+e+\314\354\277\246\354\252\277\20\364\224%T\233'eR\301{\32\340\372\313\362\263\242\331\314\340\24\6\21s\210\243!\371\347\325\333&m\210\305\203\355\277*\326\236\34\300-\213\327\306\2Td\317\27\231\26tl,\301\26\21cd\27\335#\262L\223"
.ascii "8\353\30\351\264{I\26\316\11\14"
.ascii "9\326h\254\220B}6a\247\13\353\27M\274\231"
.ascii "0\23M\332\272\272%d[\274\36Q\200\37\321\1&\35"
Since the data is in its own particular section, the linker sees it and does the code generation. If the module had been written in assembly, or compiled without the -flto flag, the linker would instead see ordinary code in the .text section, so there is no possible confusion for the linker.
The problem is: How can the linker generate code? Normally only gcc can generate code, the linker's role is just here to change a few offsets and adapt the binary format. In order to generate code, the linker would need to contain a second copy of the entire gcc backend (half of the compiler which generates assembly code from intermediate representation), as well as the entire assembler (since no assembly code was produced). How is such a thing possible, especially considering that binutils is a completely separate entity from gcc, developed by different teams?
GCC's -flto emits a serialized form of GCC's internal representation, as you discovered.
Then, at link time, the linker reinvokes GCC and passes it the objects that need final compilation. GCC reads the internal representation and does the work.
I think the actual work is done in collect2, the part of GCC that is used when invoking the linker (I'm a little fuzzy on the details). There is also a "linker plugin" system that makes this work a little better (for example, letting the linker decide how to split the compilation). It is implemented at least by binutils ld and by gold; but as far as I recall this is just an optimization and isn't needed to get the basic -flto feature to work. You can find a bit more information on the original LTO project page; links from there may explain more.
There is more overlap between the GCC and binutils teams than you might think. The two projects share some code and have a long history of working together. Some people work on both projects.
From https://gcc.gnu.org/wiki/LinkTimeOptimization:
Despite the "link time" name, LTO does not need to use any special
linker features. The basic mechanism needed is the detection of GIMPLE
sections inside object files. This is currently implemented in
collect2 [which is called by gcc; -ps]. Therefore, LTO will work on any linker already supported by
GCC.
I assume this means you must link by calling the compiler driver gcc. Simply linking with the system's vanilla linker wouldn't optimize the whole program, as you already concluded.
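You can also see the mechanism the quote describes: an object compiled with -flto carries the intermediate representation in sections named .gnu.lto_* (like the .gnu.lto_.inline section shown in the question). For example, with a hypothetical object file:
objdump -h a.o | grep gnu.lto
A plain linker that isn't invoked through gcc (or through a linker plugin) cannot turn those sections into code, which is why the whole-program optimization step would be skipped.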
Update:
https://gcc.gnu.org/onlinedocs/gccint/Collect2.html says
The program collect2 is installed as ld in the directory where the passes of the compiler are installed. When collect2 needs to find the real ld it tries the following file names: [...]
(The page goes on detailing how collect2 looks for configuration-dependent executables and ones with well-known names like real-ld, finally even ld; but will not call itself recursively.)
I am working with Firefox on a research project. Firefox makes use of lots of JIT'ed code at run time.
I instrumented Firefox using a custom PIN tool to find the locations (addresses) of some things I was looking for. The issue is that those locations are in JIT'ed code, and I want to know what is actually happening in that code.
To do this I dumped the corresponding memory region and used objdump to disassemble the dump.
I used objdump -D -b binary -mi386 file.dump to see the instructions that would have been executed. To my surprise, the only section listed is the .data section (a very big one).
Either I am disassembling it incorrectly or something else is wrong with my understanding. I expect to see more sections, like a .text section where the actual executable instructions should be, and the .data section should not be executable.
Am I correct in my understanding here?
Also If some one can please advise me on how to properly know what is happening in Jit'ed code.
Machine
Linux 3.13.0-24-generic #47-Ubuntu SMP x86_64
or something else is wrong with my understanding
Yes: something else is wrong with your understanding.
Sections (such as .text and .data) only make sense at static link time (the static linker groups .text from multiple .o files together into a single .text in the final executable). They are not useful, and in fact could be completely stripped, at execution time. On ELF systems, all that you need at runtime are segments (PT_LOAD segments in particular), which you can see with readelf -l binary.
Sections in an ELF file are "parts of the file". When you dump memory, it doesn't even make sense to talk about sections.
The .data that you see in the objdump output is not really there either; it's just an artifact that objdump manufactures for raw binary input.
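Two things that may help when poking at a dump like this (the address and paths below are made up): tell objdump the address at which the region was mapped, so the disassembly lines up with what your PIN tool reported, and look at the process's load segments instead of sections:
objdump -D -b binary -m i386:x86-64 --adjust-vma=0x7f3c5e2a0000 file.dump
readelf -l /path/to/firefox
Also note that if the Firefox process is 64-bit (your kernel is x86_64), you probably want the i386:x86-64 disassembler rather than plain -mi386.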
I'm working on the Pintos toy operating system at university, but there's a strange bug when using GCC 4.6.2. When I push my system call arguments (just 3 pushl-s in inline assembly), some mysterious data also appears on the stack, and the arguments are in the wrong order. Setting -fno-omit-frame-pointer gets rid of the strange data, but the arguments are still in the wrong order. GCC 4.5 works fine. Any idea what specific option could fix this?
NOTE: the problem still occurs with -O0.
Without a code example and a listing of the result from your different compilations, it's difficult to help you. But here are three possible causes for your problems:
Make sure you understand how arguments are pushed onto the stack. Arguments are pushed from the back (right to left); this makes it possible for printf(char *, ...) to examine the first item to find out how many more there are. If you want to call int foo(int a, int b, int c), you need to push c, then b, and finally a (see the sketch at the end of this answer).
Could the strange data on the stack be a return address or EFLAGS? I don't know Pintos and how system calls are made, but make sure that you understand the difference between CALL/RET and INT/IRET. INT pushes the flags onto the stack.
If your inline assembly has side effects, you might want to mark it volatile/__volatile__ (i.e., asm volatile (...)). Otherwise GCC is allowed to move it around when optimizing.
I need to see your code to better understand what's going on.
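To make the first and third points concrete, here is a minimal sketch of a three-argument syscall wrapper in GCC extended inline assembly. The int $0x30 vector and pushing the syscall number along with the arguments are Pintos-style assumptions; the things to note are the right-to-left push order, the volatile qualifier, and restoring %esp afterwards:
/* Hypothetical three-argument syscall wrapper (Pintos-style int $0x30). */
static inline int
syscall3 (int number, int arg0, int arg1, int arg2)
{
  int ret;
  asm volatile ("pushl %[c]\n\t"        /* arguments go on in reverse order */
                "pushl %[b]\n\t"
                "pushl %[a]\n\t"
                "pushl %[no]\n\t"       /* syscall number on top, Pintos-style */
                "int $0x30\n\t"
                "addl $16, %%esp"       /* pop the four pushes again */
                : "=a" (ret)            /* result comes back in %eax */
                : [no] "r" (number), [a] "r" (arg0),
                  [b] "r" (arg1), [c] "r" (arg2)
                : "memory");
  return ret;
}
If you leave out the addl, GCC's %esp-relative addressing of locals (the default with -fomit-frame-pointer) is off by the size of your pushes afterwards, which fits the "mysterious data" symptom and would explain why -fno-omit-frame-pointer appears to help.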
The culprit was -fomit-frame-pointer, which has been enabled by default since 4.6.2. -fno-omit-frame-pointer fixed the issue.
Did you clean the parameters off the stack after the syscall? GCC may not be aware that you touched the stack and may generate code that depends on the stack pointer it expects.
-fno-omit-frame-pointer forces GCC to use %ebp/%rbp for accessing local data, but it just hides the actual problem.