gdb break address is different when break (function name) / break *(function name) - debugging

#include <stdio.h>
int main(void){
int sum = 0;
sum += 0xabcd;
printf("%x", sum);
return 0;
}
This is my code. When I use gdb, I get a different address for break main than for break *main.
When I type disassemble main, it shows this:
Dump of assembler code for function main:
0x080483c4 <+0>: push %ebp
0x080483c5 <+1>: mov %esp,%ebp
0x080483c7 <+3>: and $0xfffffff0,%esp
0x080483ca <+6>: sub $0x20,%esp
0x080483cd <+9>: movl $0x0,0x1c(%esp)
0x080483d5 <+17>: addl $0xabcd,0x1c(%esp)
0x080483dd <+25>: mov $0x80484c0,%eax
0x080483e2 <+30>: mov 0x1c(%esp),%edx
0x080483e6 <+34>: mov %edx,0x4(%esp)
0x080483ea <+38>: mov %eax,(%esp)
0x080483ed <+41>: call 0x80482f4 <printf@plt>
0x080483f2 <+46>: mov $0x0,%eax
0x080483f7 <+51>: leave
0x080483f8 <+52>: ret
End of assembler dump.
So when I type [break *main] the breakpoint is at 0x080483c4, but when I type [break main] it is at 0x080483cd.
Why is the start address different?

Why is the address different?
Because break function and break *address are not the same thing (*address specifies the address of the function's first instruction, before the stack frame and arguments have been set up).
In the first case, GDB skips the function prologue (the code that sets up the current frame) and stops at the first instruction after it.

Total guess - and prepared to be totally wrong.
*main is the address of the function.
Breaking inside main uses the first available address to stop inside the function when it is being executed.
Note that 0x080483cd is the first place a debugger can stop, as it is modifying a variable (i.e. assigning zero to sum).
When you break at 0x080483c4, you are before the setup assembler that C knows nothing about.

Related

How get EIP from x86 inline assembly by gcc

I want to get the value of EIP with the following code, but it does not compile.
Command:
gcc -o xxx x86_inline_asm.c -m32 && ./xxx
Contents of x86_inline_asm.c:
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
int main()
{
unsigned int eip_val;
__asm__("mov %0,%%eip":"=r"(eip_val));
return 0;
}
How can I use inline assembly to get the value of EIP so that it compiles successfully on x86?
How should I modify the code and the command to do this?
This sounds unlikely to be useful (vs. just taking the address of the whole function like void *tmp = main), but it is possible.
Just get a label address, or use . (the address of the current line), and let the linker worry about getting the right immediate into the machine code. So you're not architecturally reading EIP, just reading the value it currently has from an immediate.
asm volatile("mov $., %0" : "=r"(address_of_mov_instruction) );
AT&T syntax is mov src, dst, so what you wrote would be a jump if it assembled.
(Architecturally, EIP = the end of an instruction while it's executing, so arguably you should do this instead:)
asm volatile(
"mov $1f, %0 \n\t" // reference label 1 forward
"1:" // GAS local label
: "=r"(address_after_mov) // output operand; the colon starts the output list
);
I'm using asm volatile in case this asm statement gets duplicated multiple times inside the same function by inlining or something. If you want each case to get a different address, it has to be volatile. Otherwise the compiler can assume that all instances of this asm statement produce the same output. Normally that will be fine.
In 32-bit mode you don't have RIP-relative addressing for LEA, so the only good way to actually read EIP at run time is call / pop (see "Reading program counter directly"). EIP is not a general-purpose register, so you can't just use it as the source or destination of a mov or any other instruction.
But really you don't need inline asm for this at all.
The question "Is it possible to store the address of a label in a variable and use goto to jump to it?" shows how to use the GNU C extension where &&label takes a label's address.
int foo;
void *addr_inside_function() {
foo++;
lab1: ; // labels only go on statements, not declarations
void *tmp = &&lab1;
foo++;
return tmp;
}
There's nothing you can safely do with this address outside the function; I returned it just as an example to make the compiler put a label in the asm and see what happens. Without a goto to that label, it can still optimize the function pretty aggressively, but you might find it useful as an input for an asm goto(...) somewhere else in the function.
But anyway, it compiles on Godbolt to this asm
# gcc -O3 -m32
addr_inside_function:
.L2:
addl $2, foo
movl $.L2, %eax
ret
#clang -O3 -m32
addr_inside_function:
movl foo, %eax
leal 1(%eax), %ecx
movl %ecx, foo
.Ltmp0: # Block address taken
addl $2, %eax
movl %eax, foo
movl $.Ltmp0, %eax # retval = label address
retl
So clang loads the global, computes foo+1 and stores it, then after the label computes foo+2 and stores that (instead of loading twice). So you still can't usefully jump to the label from anywhere, because it depends on having foo's old value in eax, and on the desired behaviour being to store foo+2.
I don't know gcc inline assembly syntax for this, but for masm:
call next0
next0: pop eax ;eax = eip for this line
In the case of Masm, $ represents the current location, and since call is a 5 byte instruction, an alternative syntax without a label would be:
call $+5
pop eax

Stack allocation, why the extra space?

I was playing around a bit to get a better grip on calling conventions and how the stack is handled, but I can't figure out why main allocates three extra double words when setting up the stack (at <main+0>). It's neither aligned to 8 bytes nor 16 bytes, so that's not why as far as I know. As I see it, main requires 12 bytes for the two parameters to func and the return value.
What am I missing?
The program is C code compiled with "gcc -ggdb" on an x86 architecture.
Edit: I removed the -O0 flag from gcc, and it made no difference to the output.
(gdb) disas main
Dump of assembler code for function main:
0x080483d1 <+0>: sub esp,0x18
0x080483d4 <+3>: mov DWORD PTR [esp+0x4],0x7
0x080483dc <+11>: mov DWORD PTR [esp],0x3
0x080483e3 <+18>: call 0x80483b4 <func>
0x080483e8 <+23>: mov DWORD PTR [esp+0x14],eax
0x080483ec <+27>: add esp,0x18
0x080483ef <+30>: ret
End of assembler dump.
Edit: Of course I should have posted the C code:
int func(int a, int b) {
int c = 9;
return a + b + c;
}
void main() {
int x;
x = func(3, 7);
}
The platform is Arch Linux i686.
The parameters to a function (including, but not limited to main) are already on the stack when you enter the function. The space you allocate inside the function is for local variables. For functions with simple return types such as int, the return value will normally be in a register (eax, with a typical 32-bit compiler on x86).
If, for example, main was something like this:
int main(int argc, char **argv) {
char a[35];
return 0;
}
...we'd expect to see at least 35 bytes allocated on the stack as we entered main to make room for a. Assuming a 32-bit implementation, that would normally be rounded up to the next multiple of 4 (36, in this case) to maintain 32-bit alignment of the stack. We would not expect to see any space allocated for the return value. argc and argv would be on the stack, but they'd already be on the stack before main was entered, so main would not have to do anything to allocate space for them.
In the case above, after allocating space for a, a would typically start at [esp], the return address would be at [esp+36], argc would be at [esp+40] and argv would be at [esp+44] (or those two might be reversed, depending on whether arguments were pushed left to right or right to left). The stack grows toward lower addresses, so everything that was pushed before main's allocation sits at positive offsets from esp.
Edit: Here's a diagram of the stack on entry to the function, and after setting up the stack frame:
Edit 2: Based on your updated question, what you have is slightly roundabout, but not particularly hard to understand. Upon entry to main, it's allocating space not only for the variables local to main, but also for the parameters you're passing to the function you call from main.
That accounts for at least some of the extra space being allocated (though not necessarily all of it).
It's alignment. I assumed for some reason that esp would be aligned from the start, which it clearly isn't.
gcc aligns stack frames to 16 bytes by default, which is what happened.

The blocks in code coverage with VS2010

I ran the C++ code below to get code coverage results, as described in this post.
#include <iostream>
using namespace std;
int testfunction(int input)
{
if (input > 0) {
return 1;
}
else {
return 0;
}
}
int main()
{
testfunction(-1);
testfunction(1);
}
The code coverage result says there are three blocks in main() and four blocks in testfunction(). What does "block" mean? Why are there three and four blocks in main and testfunction, respectively?
ADDED
When I modified the code as follows,
int main()
{
testfunction(1);
testfunction(1);
}
or as follows
int main()
{
testfunction(-1);
testfunction(-1);
}
I have this result.
And it seems that the testfunction() has four blocks.
the function entry
if block
else block
condition
I got hints from this post.
The technical term for a block in code coverage is basic block. To crib directly from the Wikipedia entry:
"The code in a basic block has one entry point, meaning no code within it is the destination of a jump instruction anywhere in the program, and it has one exit point, meaning only the last instruction can cause the program to begin executing code in a different basic block. Under these circumstances, whenever the first instruction in a basic block is executed, the rest of the instructions are necessarily executed exactly once, in order."
A basic block is important in code coverage because we can insert a probe at the beginning of the basic block. When this probe is hit, we know that all of the following instructions in that basic block will be executed (due to the properties of a basic block).
Unfortunately, with compilers (and especially with optimizations), it's not always apparent how source code maps to basic blocks. The easiest way to tell is to look at the generated assembly. For example, let's look at your original main & testfunction:
For main, I see the assembly below (interleaved with the original source). Similarly to what Peter does here, I have noted where the basic blocks start.
int main()
{
013B2D20 push ebp <--- Block 0 (initial)
013B2D21 mov ebp,esp
013B2D23 sub esp,40h
013B2D26 push ebx
013B2D27 push esi
013B2D28 push edi
testfunction(-1);
013B2D29 push 0FFFFFFFFh
013B2D2B call testfunction (013B10CDh)
013B2D30 add esp,4 <--- Block 1 (due to call)
testfunction(1);
013B2D33 push 1
013B2D35 call testfunction (013B10CDh)
013B2D3A add esp,4 <--- Block 2 (due to call)
}
013B2D3D xor eax,eax
013B2D3F pop edi
013B2D40 pop esi
013B2D41 pop ebx
013B2D42 mov esp,ebp
013B2D44 pop ebp
013B2D45 ret
We see that main has three basic blocks: one initial block, and the other two because of the function calls. Looking at the code, this seems reasonable. testfunction is a little tougher. Just looking at the source, there appear to be three blocks:
The entry to the function and logic test (input > 0)
The condition true branch (return 1)
The condition false branch (return 0)
However, because of the actual generated assembly, there are four blocks. I'm assuming you built your code with optimizations disabled. When I build with VS2010 in the Debug configuration (optimizations disabled), I see the following disassembly for testfunction:
int testfunction(int input)
{
013B2CF0 push ebp <--- Block 0 (initial)
013B2CF1 mov ebp,esp
013B2CF3 sub esp,40h
013B2CF6 push ebx
013B2CF7 push esi
013B2CF8 push edi
if (input > 0) {
013B2CF9 cmp dword ptr [input],0
013B2CFD jle testfunction+18h (013B2D08h)
return 1;
013B2CFF mov eax,1 <--- Block 1 (due to jle branch)
013B2D04 jmp testfunction+1Ah (013B2D0Ah)
}
else {
013B2D06 jmp testfunction+1Ah (013B2D0Ah) <--- Not a block (unreachable code)
return 0;
013B2D08 xor eax,eax <--- Block 2 (due to jmp branch @ 013B2D04)
}
}
013B2D0A pop edi <--- Block 3 (due to being jump target from 013B2D04)
013B2D0B pop esi
013B2D0C pop ebx
013B2D0D mov esp,ebp
013B2D0F pop ebp
013B2D10 ret
Here, we have four blocks:
The entry to the function
The condition true branch
The condition false branch
The shared function epilog (cleaning up the stack and returning)
Had the compiler duplicated the function epilog in both the condition true and condition false branches, you would only see three blocks. Also, interestingly, the compiler inserted a spurious jmp instruction at 013B2D06. Because it's unreachable code, it's not treated as a basic block.
In general, all of this analysis is overkill since the overall code coverage metric will tell you what you need to know. This answer was just to highlight why the number of blocks isn't always obvious or what's expected.
According to MSDN on Code Coverage Data Overview:
"Code coverage data is calculated for code blocks, lines of code, and partial lines if they are executed by a test run. A code block is a code path with a single entry point, a single exit point, and a set of instructions that are all run in sequence. A code block ends when it reaches a decision point such as a new conditional statement block, a function call, exception throw, enter, leave, try, catch, or a finally construct."
Main Block:
Method entry
testfunction
testfunction
Testfunction block:
Method entry
If / Else
Return
Method call

Displaying Value of register in assembly

I have assembly code that evaluates a mathematical expression and stores the result in the ebx register. How can I display the value of the register? I was thinking of pushing the value of ebx onto the stack, then the format string "%i\n", and calling printf, but even if this would work, I don't know how to code it in GCC assembly on macOS.
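#include <stdio.h>
int f(void)
{
    int result;
    /* Put 42 in ebx, then copy it to eax; "=a" binds eax to result. */
    __asm__ volatile("movl $42, %%ebx\n\t"
                     "movl %%ebx, %%eax"
                     : "=a"(result)   /* output: eax */
                     :                /* no inputs */
                     : "ebx");        /* ebx is modified by the asm */
    return result;
}
int main(void) {
    printf("%i\n", f()); // displays 42.
    return 0;
}
As you can see, the content of the EAX register is used as the return value of f(): on x86, an int return value is passed back in EAX. The "=a" output constraint tells the compiler that the asm writes EAX, and the "ebx" clobber tells it that EBX is modified, so the value reaches printf reliably.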

Is it possible to set a conditional breakpoint at the end of a function based on what the function is about to return?

I have a more complicated version of the following:
unsigned int foo ();
unsigned int bar ();
unsigned int myFunc () {
return foo()+bar();
}
In my case, myFunc is called from lots of places. In one of the contexts there is something going wrong. I know from debugging further down what the return value of this function is when things are bad, but unfortunately I don't know what path resulted in this value.
I could add a temporary variable that stored the result of the expression "foo()+bar()" and then add the conditional breakpoint on that value, but I was wondering if it is possible to do in some other way.
I'm working on x86 architecture.
From this and this answer I thought I could set a breakpoint at the exact location of the return from the function:
gdb> break *$eip
And then add a conditional breakpoint based on the $eax register, but at least in my tests here the return is not in this register.
Is this possible?
Agree with previous commenter that this is probably something you don't want to do, but for me, setting a conditional breakpoint at the last instruction on $eax (or $rax if you are on 64-bit x86) works just fine.
For the code
unsigned int foo(void) { return 1; }
unsigned int bar(void) { return 4; }
unsigned int myFunc(void) { return foo()+bar(); }
using gdb ..
(gdb) disass myFunc
Dump of assembler code for function myFunc:
0x080483d8 <myFunc+0>: push %ebp
0x080483d9 <myFunc+1>: mov %esp,%ebp
0x080483db <myFunc+3>: push %ebx
0x080483dc <myFunc+4>: call 0x80483c4 <foo>
0x080483e1 <myFunc+9>: mov %eax,%ebx
0x080483e3 <myFunc+11>: call 0x80483ce <bar>
0x080483e8 <myFunc+16>: lea (%ebx,%eax,1),%eax
0x080483eb <myFunc+19>: pop %ebx
0x080483ec <myFunc+20>: pop %ebp
0x080483ed <myFunc+21>: ret
End of assembler dump.
(gdb) b *0x080483ed if $eax==5
Breakpoint 1 at 0x80483ed
(gdb) run
Starting program: /tmp/x
Breakpoint 1, 0x080483ed in myFunc ()
(gdb)
I don't get whether you're compiling from the command line or not, but from within Visual Studio, once you set your breakpoint, right-click it and click the "Condition..." option for a dialog to appear to let you edit the condition for your breakpoint to break.
Hope this helps! :-)
