While following up on some windbg tutorials I have noticed that some callstacks using k command are in this format, specially mine
Child-SP RetAddr Call Site
While other online resources like CodeProject have the k command spit out info in this format
Child-EBP RetAddr Call Site
I am confused to why there is a difference between my output and theirs and to what that truly means.
It depends on what calling convention is being used. Some functions allocate locals and parameters that it will pass to the functions it calls using the base pointer, such as cdecl, but Windows x64 calling convention uses rsp.
Child-SP is the value the stack pointer of that frame, which will be the byte before the return address for all frames except for frame 0, which might have a breakpoint before or during the prologue. If the breakpoint is on the first instruction, then the rsp would have only decreased by 8 on the Child-SP of the previous frame; this rsp value is read from the trap frame. The Child-SP is the address of the frame at the frame number if you do not consider the return address of the callee to be part of the frame (which makes sense in this scenario because the return address column of the frame is showing the return address to the previous frame to be part of this frame).
ChildEBP is the value of ebp when in that frame (not the ebp that is pushed to the frame, but the new value of ebp)
RetAddr is the return address that belongs to the frame, so it is the address it will return to
The call site is the address of the instruction after the call instruction that was called that ended the frame (so basically the callee return address -- call instruction + call instruction length), or the address of the breakpoint or other exception that caused it to break into the debugger (note: the address of the exception, not the instruction after it, so this will be rip in the trap frame) in the case of frame 0 (the frame at the top of the stack). This will indicate the name of the function that owns the frame on this row as long as the function doesn't contain a label, and args to child are the arguments passed to it. Indeed the callsite is the return address of the frame that it calls (the frame above it).
Args to child are the arguments passed to the function that owns the stack frame on the current row, not the arguments passed to the callee function it calls in the allocated space by the prologue. This is almost never accurate on x64, where it shows the first 3 quadwords of the homespace, because the first 4 arguments can be passed in registers, and the callee function may not home these arguments (save them in the homespace) if it is -O0 or not a varargs function, or may put something else in the homespace entirely. 'Child' in 'Args to child' and 'Child-SP' is a false and misleading name which implies the function the frame calls, but it actually refers to the current function.
There is an example stacktrace on this site which shows a breakpoint on notepad!ShowOpenSaveDialog, which will be the first instruction.
0:011> bp notepad!ShowOpenSaveDialog
0:011> g
Breakpoint 0 hit
notepad!ShowOpenSaveDialog:
00007ff7`0307182c 48895c2408 mov qword ptr [rsp+8],
rbx ss:00000073`74d2f310=0000000000000000
0:000> k
# Child-SP RetAddr Call Site
00 00000073`74d2f308 00007ff7`03071aeb notepad!ShowOpenSaveDialog
01 00000073`74d2f310 00007ff7`030721fa notepad!InvokeOpenDialog+0x14f
02 00000073`74d2f370 00007ff7`030738d6 notepad!NPCommand+0x4a2
03 00000073`74d2f6f0 00007fff`664b6d41 notepad!NPWndProc+0x726
04 00000073`74d2f9f0 00007fff`664b6713 USER32!UserCallWinProcCheckWow+0x2c1
05 00000073`74d2fb80 00007ff7`03073bdb USER32!DispatchMessageWorker+0x1c3
06 00000073`74d2fc10 00007ff7`03089333 notepad!WinMain+0x27f
07 00000073`74d2fd10 00007fff`68ea3034 notepad!__mainCRTStartup+0x19f
08 00000073`74d2fdd0 00007fff`69073691 KERNEL32!BaseThreadInitThunk+0x14
09 00000073`74d2fe00 00000000`00000000 ntdll!RtlUserThreadStart+0x21
0:000> r #rcx=0
0:000> r
rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000000
rdx=00000213f5020968 rsi=000000499cb1f338 rdi=00000213f5021c20
rip=00007ff70307182c rsp=000000499cb1f288 rbp=000000499cb1f2d0
r8=00000213f4ffe9d6 r9=000000499cb1f338 r10=00000ffee060e246
r11=0000000000014140 r12=0000000000000000 r13=0000000000000001
r14=000000000004094e r15=00000000ffffffff
iopl=0 nv up ei pl zr na po nc
cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000246
notepad!ShowOpenSaveDialog:
00007ff7`0307182c 48895c2408 mov qword ptr [rsp+8],
rbx ss:00000049`9cb1f290=0000000000000000
0:000> ub notepad!InvokeOpenDialog+0x14f
notepad!InvokeOpenDialog+0x133:
00007ff7`03071acf 8bd8 mov ebx,eax
00007ff7`03071ad1 85c0 test eax,eax
00007ff7`03071ad3 782c js notepad!InvokeOpenDialog+0x165 (00007ff7`03071b01)
00007ff7`03071ad5 4c8b05fc080200 mov r8,qword ptr [notepad!szOpenCaption (00007ff7`030923d8)]
00007ff7`03071adc 4c8bce mov r9,rsi
00007ff7`03071adf 488b5538 mov rdx,qword ptr [rbp+38h]
00007ff7`03071ae3 498bce mov rcx,r14
00007ff7`03071ae6 e841fdffff call notepad!ShowOpenSaveDialog (00007ff7`0307182c)
The trap frame created by nt!KiBreakpointTrap is not part of the stack trace, because trap frames are pushed to the threads's kernel stack not the user stack.
If you are kernel debugging then you will see a fusion of the user and kernel stacks if there is a trap frame and the process address space is accessible:
lkd> .process /P fffffa80723296f0
lkd> .reload
lkd> ld *
lkd> !process 3490
Searching for Process with Cid == 3490
Cid handle table at fffff8a00195f000 with 4126 entries in use
PROCESS fffffa80723296f0
SessionId: 1 Cid: 3490 Peb: 7fffffdf000 ParentCid: 1470
DirBase: 403874000 ObjectTable: fffff8a02b3a2ce0 HandleCount: 293.
Image: chrome.exe
VadRoot fffffa805fc816e0 Vads 295 Clone 0 Private 16586. Modified 1536. Locked 0.
DeviceMap fffff8a00208d0b0
Token fffff8a03466d9e0
ElapsedTime 00:23:43.376
UserTime 00:00:00.717
KernelTime 00:00:00.000
QuotaPoolUsage[PagedPool] 0
QuotaPoolUsage[NonPagedPool] 0
Working Set Sizes (now,min,max) (27282, 50, 345) (109128KB, 200KB, 1380KB)
PeakWorkingSetSize 33917
VirtualSize 870 Mb
PeakVirtualSize 891 Mb
PageFaultCount 81452
MemoryPriority BACKGROUND
BasePriority 4
CommitCharge 19316
Job fffffa805f9f46d0
THREAD fffffa802e890b50 Cid 3490.469c Teb: 000007fffffdd000 Win32Thread: fffff900c53a48c0 WAIT: (UserRequest) UserMode Non-Alertable
fffffa8062daf060 SynchronizationEvent
Not impersonating
DeviceMap fffff8a00208d0b0
Owning Process fffffa80723296f0 Image: chrome.exe
Attached Process N/A Image: N/A
Wait Start TickCount 45844297 Ticks: 42 (0:00:00:00.655)
Context Switch Count 21234 LargeStack
UserTime 00:00:07.300
KernelTime 00:00:00.234
Win32 Start Address chrome!IsSandboxedProcess (0x000000013fbcbe90)
Stack Init fffff8802d5bec70 Current fffff8802d5be7c0
Base fffff8802d5bf000 Limit fffff8802d5b7000 Call 0
Priority 4 BasePriority 4 UnusualBoost 0 ForegroundBoost 0 IoPriority 2 PagePriority 5
Child-SP RetAddr Call Site
fffff880`2d5be800 fffff800`0367ec32 nt!KiSwapContext+0x7a
fffff880`2d5be940 fffff800`0368145f nt!KiCommitThreadWait+0x1d2
fffff880`2d5be9d0 fffff800`0397602e nt!KeWaitForSingleObject+0x19f
fffff880`2d5bea70 fffff800`03678c13 nt!NtWaitForSingleObject+0xde
fffff880`2d5beae0 00000000`7782bd7a nt!KiSystemServiceCopyEnd+0x13 (TrapFrame # fffff880`2d5beae0)
00000000`002eec08 000007fe`fd6110ac ntdll!NtWaitForSingleObject+0xa
00000000`002eec10 000007fe`d97c2107 KERNELBASE!WaitForSingleObjectEx+0x79
00000000`002eecb0 000007fe`d858aeab chrome_child!GetHandleVerifier+0x15d8417
00000000`002eed40 000007fe`d823004d chrome_child!GetHandleVerifier+0x3a11bb
00000000`002eedb0 000007fe`d822fc91 chrome_child!GetHandleVerifier+0x4635d
00000000`002eee10 000007fe`d8217c59 chrome_child!GetHandleVerifier+0x45fa1
00000000`002eee40 000007fe`d82176de chrome_child!GetHandleVerifier+0x2df69
00000000`002ef040 000007fe`d82105d1 chrome_child!GetHandleVerifier+0x2d9ee
00000000`002ef1b0 000007fe`d81e518b chrome_child!GetHandleVerifier+0x268e1
00000000`002ef250 000007fe`d81e4c58 chrome_child!ChromeMain+0x377d
00000000`002ef570 000007fe`d81e1b2e chrome_child!ChromeMain+0x324a
00000000`002ef600 00000001`3faf354c chrome_child!ChromeMain+0x120
00000000`002ef6d0 00000001`3faf1699 chrome+0x354c
00000000`002ef7c0 00000001`3fbcbe33 chrome+0x1699
00000000`002efba0 00000000`775d59cd chrome!IsSandboxedProcess+0x61483
00000000`002efbe0 00000000`7780a561 kernel32!BaseThreadInitThunk+0xd
00000000`002efc10 00000000`00000000 ntdll!RtlUserThreadStart+0x1d
The trap frame is made by nt!KiSystemCall64, which has a label in it called nt!KiSystemServiceCopyEnd, which makes the call to the system service, in this case nt!NtWaitForSingleObject. The trap frame is the first stack frame hence function on the kernel stack and begins at the rsp (Child-SP) of the function nt!KiSystemCall64 and starts with the homespace that it allocates for the next function. The first instruction of nt!KiSystemCall64 is swapgs to swap gs with the gs in IA32_KERNEL_GS_BASE, and then it stores rsp into gs:10h, which is the UserRsp field in the KPCR and then stores gs:1A8h into rsp, which is the RspBase field in the KPRCB, this is clearly swapping the user rsp with the kernel rsp for the thread, which is cached in the hyperthread that currently has that thread running's KPRCB. It then starts building the trap frame, where the rsp is currently pointing to offset 0x190 in the trap frame by definition (the byte after the end of the frame), starting with pushing SS, because the CPU doesn't do this when you use syscall instead of int 0x2e.
When a base pointer is used, the trap frame would be at 8 less than the ebp of the frame above it (subtract ebp and return address, 4 bytes each), so the frame of nt!NtWaitForSingleObject
Exceptions have an exception frame that stores the non-volatile registers using nt!KiExceptionDispatch, after nt!KiBreakpointTrap for instance creates the trap frame (and the trap frame building different to a system call because the processor pushes SS:RSP RFLAGS, CS:RIP and ErrorCode to the stack for a user-mode exception and RFLAGS, CS:RIP and ErrorCode to the stack for a kernel-mode exception, and does not save/load any rsp from the KPCR / KPRCB). nt!KiExceptionDispatch creates an exception frame KEXCEPTION_FRAME beginning at the next rsp in the stack trace and the EXCEPTION_RECORD is immediately after above that on the stack, with a total combined size of 1D8, so you'll see a difference of 1E0 between the stack pointers of nt!KiExceptionDispatch and nt!KiBreakpointTrap when you include the return address to nt!KiBreakpointTrap.
The trap frame stores all registers that can be clobbered by the kernel
during the array of functions that are called in the kernel, called volatile registers, because the volatile registers are guaranteed not to be changed during a syscall or an exception. It does not save non-volatile registers because the functions that are called in the kernel save any non-volatile registers they use, so by the time it returns to this function to perform a sysret, all the nonvolatile registers will be the value they were before. You need an exception frame for exceptions because it needs to know in the function that performs the stack unwinding what the nonvolatile registers contained at the time of the exception so that the stack can be unwound. A CONTEXT structure is created on the stack in nt!KiDispatchException to represent the full context at the time of the exception, which is a combination of the nonvolatile and volatile register state.
You also need another exception frame when swapping thread contexts in order to save the state of the nonvolatile registers at the time the thread is switched out, so this will be the state in nt!KiSwapContext, so this can be restored when the thread is switched back in. The volatile state does not need to be saved, because it's assumed that the call to SwapContext discards all volatile registers. The trap frame created when entering kernel mode in the first place will be used to restore the volatile register context to the thread when it returns to user mode.
Fun fact: for a breakpoint exception INT 3, the CPU pushes the rip starting after the end of the breakpoint instruction and not the breakpoint instruction which means that windows needs to decrement the address so the top of the stack callsite is the rip in the trap frame and not the return address pushed by the CPU.
Related
I have been trying to create an ISR handler following this
tutorial by James Molloy but I got stuck. Whenever I throw a software interrupt, general purpose registers and the data segment register is pushed onto the stack with the variables automatically pushed by the CPU. Then the data segment is changed to the value of 0x10 (Kernel Data Segment Descriptor) so the privilege levels are changed. Then after the handler returns those values are poped. But whenever the value in ds is changed a GPE is thrown with the error code 0x2544 and after a few seconds the VM restarts. (linker and compiler i386-elf-gcc , assembler nasm)
I tried placing hlt instructions in between instructions to locate which instruction was throwing the GPE. After that I was able to find out that the the `mov ds,ax' instruction. I tried various things like removing the stack which was initialized by the bootstrap code to deleting the privilege changing parts of the code. The only way I can return from the common stub is to remove the parts of my code which change the privilege levels but as I want to move towards user mode I still want them to stay.
Here is my common stub:
isr_common_stub:
pusha ; Pushes edi,esi,ebp,esp,ebx,edx,ecx,eax
xor eax,eax
mov ax, ds ; Lower 16-bits of eax = ds.
push eax ; save the data segment descriptor
mov ax, 0x10 ; load the kernel data segment descriptor
mov ds, ax
mov es, ax
mov fs, ax
mov gs, ax
call isr_handler
xor eax,eax
pop eax
mov ds, ax ; This is the instruction everything fails;
mov es, ax
mov fs, ax
mov gs, ax
popa
iret
My ISR handler macros:
extern isr_handler
%macro ISR_NOERRCODE 1
global isr%1 ; %1 accesses the first parameter.
isr%1:
cli
push byte 0
push %1
jmp isr_common_stub
%endmacro
%macro ISR_ERRCODE 1
global isr%1
isr%1:
cli
push byte %1
jmp isr_common_stub
%endmacro
ISR_NOERRCODE 0
ISR_NOERRCODE 1
ISR_NOERRCODE 2
ISR_NOERRCODE 3
...
My C handler which results in "Received interrupt: 0xD err. code 0x2544"
#include <stdio.h>
#include <isr.h>
#include <tty.h>
void isr_handler(registers_t regs) {
printf("ds: %x \n" ,regs.ds);
printf("Received interrupt: %x with err. code: %x \n", regs.int_no, regs.err_code);
}
And my main function:
void kmain(struct multiboot *mboot_ptr) {
descinit(); // Sets up IDT and GDT
ttyinit(TTY0); // Sets up the VGA Framebuffer
asm volatile ("int $0x1"); // Triggers a software interrupt
printf("Wow"); // After that its supposed to print this
}
As you can see the code was supposed to output,
ds: 0x10
Received interrupt: 0x1 with err. code: 0
but results in,
...
ds: 0x10
Received interrupt: 0xD with err. code: 0x2544
ds: 0x10
Received interrupt: 0xD with err. code: 0x2544
...
Which goes on until the VM restarts itself.
What am I doing wrong?
The code isn't complete but I'm going to guess what you are seeing is a result of a well known bug in James Molloy's OSDev tutorial. The OSDev community has compiled a list of known bugs in an errata list. I recommend reviewing and fixing all the bugs mentioned there. Specifically in this case I believe the bug that is causing problems is this one:
Problem: Interrupt handlers corrupt interrupted state
This article previously told you to know the ABI. If you do you will
see a huge problem in the interrupt.s suggested by the tutorial: It
breaks the ABI for structure passing! It creates an instance of the
struct registers on the stack and then passes it by value to the
isr_handler function and then assumes the structure is intact
afterwards. However, the function parameters on the stack belongs to
the function and it is allowed to trash these values as it sees fit
(if you need to know whether the compiler actually does this, you are
thinking the wrong way, but it actually does). There are two ways
around this. The most practical method is to pass the structure as a
pointer instead, which allows you to explicitly edit the register
state when needed - very useful for system calls, without having the
compiler randomly doing it for you. The compiler can still edit the
pointer on the stack when it's not specifically needed. The second
option is to make another copy the structure and pass that
The problem is that the 32-bit System V ABI doesn't guarantee that data passed by value will be unmodified on the stack! The compiler is free to reuse that memory for whatever purposes it chooses. The compiler probably generated code that trashed the area on the stack where DS is stored. When DS was set with the bogus value it crashed. What you should be doing is passing by reference rather than value. I'd recommend these code changes in the assembly code:
irq_common_stub:
pusha
mov ax, ds
push eax
mov ax, 0x10 ;0x10
mov ds, ax
mov es, ax
mov fs, ax
mov gs, ax
push esp ; At this point ESP is a pointer to where GS (and the rest
; of the interrupt handler state resides)
; Push ESP as 1st parameter as it's a
; pointer to a registers_t
call irq_handler
pop ebx ; Remove the saved ESP on the stack. Efficient to just pop it
; into any register. You could have done: add esp, 4 as well
pop ebx
mov ds, bx
mov es, bx
mov fs, bx
mov gs, bx
popa
add esp, 8
sti
iret
And then modify irq_handler to use registers_t *regs instead of registers_t regs :
void irq_handler(registers_t *regs) {
if (regs->int_no >= 40) port_byte_out(0xA0, 0x20);
port_byte_out(0x20, 0x20);
if (interrupt_handlers[regs->int_no] != 0) {
interrupt_handlers[regs->int_no](*regs);
}
else
{
klog("ISR: Unhandled IRQ%u!\n", regs->int_no);
}
}
I'd actually recommend each interrupt handler take a pointer to registers_t to avoid unnecessary copying. If your interrupt handlers and the interrupt_handlers array used function that took registers_t * as the parameter (instead of registers_t) then you'd modify the code:
interrupt_handlers[r->int_no](*regs);
to be:
interrupt_handlers[r->int_no](regs);
Important: You have to make these same type of changes for your ISR handlers as well. Both the IRQ and ISR handlers and associated code have this same problem.
UPDATE: Sure enough, it was a bug in the latest version of nasm. I "downgraded" and after fixing my code as shown in the answer I accepted, everything is working properly. Thanks, everyone!
I'm having problems with what should be a very simple program in 32-bit assembler on OS X.
First, the code:
section .data
hello db "Hello, world", 0x0a, 0x00
section .text
default rel
global _main
extern _printf, _exit
_main:
sub esp, 12 ; 16-byte align stack
push hello
call _printf
push 0
call _exit
It assembles and links, but when I run the executable it crashes with a segmentation fault: 11.
The command lines to assemble and link are:
nasm -f macho32 hello32x.asm -o hello32x.o
I know the -o there is not 100 percent necessary
Linking:
ld -lc -arch i386 hello32x.o -o hello32x
When I run it into lldb to debug it, everything is fine until it enters into the call to _printf, where it crashes as shown below:
(lldb) s
Process 1029 stopped
* thread #1: tid = 0x97a4, 0x00001fac hello32x`main + 8, queue = 'com.apple.main-thread', stop reason = instruction step into
frame #0: 0x00001fac hello32x`main + 8
hello32x`main:
-> 0x1fac <+8>: calll 0xffffffff991e381e
0x1fb1 <+13>: pushl $0x0
0x1fb3 <+15>: calll 0xffffffff991fec84
0x1fb8: addl %eax, (%eax)
(lldb) s
Process 1029 stopped
* thread #1: tid = 0x97a4, 0x991e381e libsystem_c.dylib`vfprintf + 49, queue = 'com.apple.main-thread', stop reason = instruction step into
frame #0: 0x991e381e libsystem_c.dylib`vfprintf + 49
libsystem_c.dylib`vfprintf:
-> 0x991e381e <+49>: xchgb %ah, -0x76f58008
0x991e3824 <+55>: popl %esp
0x991e3825 <+56>: andb $0x14, %al
0x991e3827 <+58>: movl 0xc(%ebp), %ecx
(lldb) s
Process 1029 stopped
* thread #1: tid = 0x97a4, 0x991e381e libsystem_c.dylib`vfprintf + 49, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x890a7ff8)
frame #0: 0x991e381e libsystem_c.dylib`vfprintf + 49
libsystem_c.dylib`vfprintf:
-> 0x991e381e <+49>: xchgb %ah, -0x76f58008
0x991e3824 <+55>: popl %esp
0x991e3825 <+56>: andb $0x14, %al
0x991e3827 <+58>: movl 0xc(%ebp), %ecx
As you can see toward the bottom, it stops due to a bad access error.
16-byte Stack Alignment
One serious issue with your code is stack alignment. 32-bit OS/X code requires 16-byte stack alignment at the point you make a CALL. The Apple IA-32 Calling Convention says this:
The function calling conventions used in the IA-32 environment are the same as those used in the System V IA-32 ABI, with the following exceptions:
Different rules for returning structures
The stack is 16-byte aligned at the point of function calls
Large data types (larger than 4 bytes) are kept at their natural alignment
Most floating-point operations are carried out using the SSE unit instead of the x87 FPU, except when operating on long double values. (The IA-32 environment defaults to 64-bit internal precision for the x87 FPU.)
You subtract 12 from ESP to align the stack to a 16 byte boundary (4 bytes for return address + 12 = 16). The problem is that when you make a CALL to a function the stack MUST be 16 bytes aligned just prior to the CALL itself. Unfortunately you push 4 bytes before the call to printf and exit. This misaligns the stack by 4, when it should be aligned to 16 bytes. You'll have to rework the code with proper alignment. As well you must clean up the stack after you make a call. If you use PUSH to put parameters on the stack you need to adjust ESP after your CALL to restore the stack to its previous state.
One naive way (not my recommendation) to fix the code would be to do this:
section .data
hello db "Hello, world", 0x0a, 0x00
section .text
default rel
global _main
extern _printf, _exit
_main:
sub esp, 8
push hello ; 4(return address)+ 8 + 4 = 16 bytes stack aligned
call _printf
add esp, 4 ; Remove arguments
push 0 ; 4 + 8 + 4 = 16 byte alignment again
call _exit ; This will not return so no need to remove parameters after
The code above works because we can take advantage of the fact that both functions (exit and printf) require exactly one DWORD being placed on the stack for parameters. 4 bytes for main's return address, 8 for the stack adjustment we made, 4 for the DWORD parameter = 16 byte alignment.
A better way to do this is to compute the amount of stack space you will need for all your stack based local variables (in this case 0) in your main function, plus the maximum number of bytes you will need for any parameters to function calls made by main and then make sure you pad enough bytes to make the value evenly divisible by 12. In our case the maximum number of bytes needed to be pushed for any one given function call is 4 bytes. We then add 8 to 4 (8+4=12) to become evenly divisible by 12. We then subtract 12 from ESP at the start of our function.
Instead of using PUSH to put parameters on the stack you can now move the parameters directly onto the stack into the space we have reserved. Because we don't PUSH the stack doesn't get misaligned. Since we didn't use PUSH we don't need to fix ESP after our function calls. The code could then look something like:
section .data
hello db "Hello, world", 0x0a, 0x00
section .text
default rel
global _main
extern _printf, _exit
_main:
sub esp, 12 ; 16-byte align stack + room for parameters passed
; to functions we call
mov [esp],dword hello ; First parameter at esp+0
call _printf
mov [esp], dword 0 ; First parameter at esp+0
call _exit
If you wanted to pass multiple parameters you place them manually on the stack as we did with a single parameter. If we wanted to print an integer 42 as part of our call to printf we could do it this way:
section .data
hello db "Hello, world %d", 0x0a, 0x00
section .text
default rel
global _main
extern _printf, _exit
_main:
sub esp, 12 ; 16-byte align stack + room for parameters passed
; to functions we call
mov [esp+4], dword 42 ; Second parameter at esp+4
mov [esp],dword hello ; First parameter at esp+0
call _printf
mov [esp], dword 0 ; First parameter at esp+0
call _exit
When run we should get:
Hello, world 42
16-byte Stack Alignment and a Stack Frame
If you are looking to create a function with a typical stack frame then the code in the previous section has to be adjusted. Upon entry to a function in a 32-bit application the stack is misaligned by 4 bytes because the return address was placed on the stack. A typical stack frame prologue looks like:
push ebp
mov ebp, esp
Pushing EBP into the stack after entry to your function still results in a misaligned stack, but it is misaligned now by 8 bytes (4 + 4).
Because of that the code must subtract 8 from ESP rather than 12. As well when determining the space needed to hold parameters, local stack variables, and pad bytes for alignment the stack allocation size will have to be evenly divisible by 8, not by 12. Code with a stack frame could look like:
section .data
hello db "Hello, world %d", 0x0a, 0x00
section .text
default rel
global _main
extern _printf, _exit
_main:
push ebp
mov ebp, esp ; Set up stack frame
sub esp, 8 ; 16-byte align stack + room for parameters passed
; to functions we call
mov [esp+4], dword 42 ; Second parameter at esp+4
mov [esp],dword hello ; First parameter at esp+0
call _printf
xor eax, eax ; Return value = 0
mov esp, ebp
pop ebp ; Remove stack frame
ret ; We linked with C library that calls _main
; after initialization. We can do a RET to
; return back to the C runtime code that will
; exit the program and return the value in EAX
; We can do this instead of calling _exit
Because you link with the C library on OS/X it will provide an entry point and do initialization before calling _main. You can call _exit but you can also do a RET instruction with the program's return value in EAX.
Yet Another Potential NASM Bug?
I discovered that NASM v2.12 installed via MacPorts on El Capitan seems to generate incorrect relocation entries for _printf and _exit, and when linked to a final executable the code doesn't work as expected. I observed almost the identical errors you did with your original code.
The first part of my answer still applies about stack alignment, however it appears you will need to work around the NASM issue as well. One way to do this install the NASM that comes with the latest XCode command line tools. This version is much older and only supports Macho-32, and doesn't support the default directive. Using my previous stack aligned code this should work:
section .data
hello db "Hello, world %d", 0x0a, 0x00
section .text
;default rel ; This directive isn't supported in older versions of NASM
global _main
extern _printf, _exit
_main:
sub esp, 12 ; 16-byte align stack
mov [esp+4], dword 42 ; Second parameter at esp+4
mov [esp],dword hello ; First parameter at esp+0
call _printf
mov [esp], dword 0 ; First parameter at esp+0
call _exit
To assemble with NASM and link with LD you could use:
/usr/bin/nasm -f macho hello32x.asm -o hello32x.o
ld -macosx_version_min 10.8 -no_pie -arch i386 -o hello32x hello32x.o -lc
Alternatively you could link with GCC:
/usr/bin/nasm -f macho hello32x.asm -o hello32x.o
gcc -m32 -Wl,-no_pie -o hello32x hello32x.o
/usr/bin/nasm is the location of the XCode command line tools version of NASM that Apple distributes. The version I have on El Capitan with latest XCode command line tools is:
NASM version 0.98.40 (Apple Computer, Inc. build 11) compiled on Jan 14 2016
I don't recommend NASM version 2.11.08 because it has a serious bug related to macho64 format. I recommend 2.11.09rc2. I have tested that version here and it does seem to work properly with the code above.
I'm trying to programatically generate a stack trace. When my users are having a crash, in particular a random one, it's hard to talk them through the process of getting a dump so I can fix the problem. In the past once they would send me the trace I would cross reference the addresses in it to the Intermediate/foo.map file to figure out which function was the problem (is that the best way?)
I built a library from various examples I found around the net, to output a minidump to make my job easier. I staged a crash, but the stack trace I get from the minidump file is wildly different from a live stack trace I get from attaching windbg. Examples of both are below:
MiniDump.dmp:
KERNELBASE.dll!76a6c42d()
[Frames below may be incorrect and/or missing, no symbols loaded for KERNELBASE.dll]
KERNELBASE.dll!76a6c42d()
kernel32.dll!75bd14bd()
game.exe!00759035()
game.exe!00575ba3()
WinDbg.exe:
0:000:x86> kv
ChildEBP RetAddr Args to Child
00186f44 00bc8ea9 19460268 0018a9b7 03f70a28 Minidump!crashme+0x2 (FPO: [0,0,0]) (CONV: cdecl) [c:\project\debug\minidump.cpp # 68]
0018795c 00b9ef31 0018796c 03f56c00 6532716d Main!LoadPlugin+0x339 (FPO: [1,642,4]) (CONV: cdecl) [c:\project\main\pluginloader.cpp # 129]
00188968 00b9667d 19460268 0018a9ac 00000000 Main!Command+0x1f1 (FPO: [2,1024,4]) (CONV: cdecl) [c:\project\main\commands.cpp # 2617]
*** WARNING: Unable to verify checksum for C:\Game\game.exe
*** ERROR: Module load completed but symbols could not be loaded for C:\Game\game.exe
0018b1a8 005b5095 19460268 0018beac 00000000 Main!Hook::Detour+0x52d (FPO: [2,2570,0]) (CONV: thiscall) [c:\project\main\hook.cpp # 275]
WARNING: Stack unwind information not available. Following frames may be wrong.
0018b1b4 00000000 19495200 19495200 00000006 game+0x1b5095
game.exe is not mine, and I don't have the source/symbols. The Main.dll is injected into game.exe and it provides front end functionality to load additional DLLs from within the game. The debug code, and the staged crash is in Minidump.dll. After Main.dll loads Minidump it calls AfterLoad(), which sets the exception filter, and then triggers the crash. The relevant minidump code is below:
When I opened the MiniDump.dmp I pointed it to all of my symbol files (with the exception of game.exe, which I don't have) and that part seems like it's working. I do point it to the game.exe binary since I have that. The stack trace I get out of it just really isn't helpful though. My ultimate goal is that the user can just load the DLL, cause the crash, and email the dump file to me. Then I'll attach the symbol files and binaries and be able to diagnose the problem for them. Am I doing something wrong, or is it just not possible to get what I want.
typedef BOOL (WINAPI *MINIDUMPWRITEDUMP)(
HANDLE hProcess,
DWORD ProcessId,
HANDLE hFile,
MINIDUMP_TYPE DumpType,
CONST PMINIDUMP_EXCEPTION_INFORMATION ExceptionParam,
CONST PMINIDUMP_USER_STREAM_INFORMATION UserStreamParam,
CONST PMINIDUMP_CALLBACK_INFORMATION CallbackParam
);
LONG WINAPI WriteDumpFilter(struct _EXCEPTION_POINTERS *pExceptionPointers)
{
HANDLE hFile = NULL;
HMODULE hDll = NULL;
MINIDUMPWRITEDUMP pMiniDumpWriteDump = NULL;
_MINIDUMP_EXCEPTION_INFORMATION ExceptionInformation = {0};
//load MiniDumpWriteDump
hDll = LoadLibrary(TEXT("DbgHelp.dll"));
pMiniDumpWriteDump = (MINIDUMPWRITEDUMP)GetProcAddress(hDll, "MiniDumpWriteDump");
//create output file
hFile = CreateFile( _T( "C:\\temp\\MiniDump.dmp"),
GENERIC_READ|GENERIC_WRITE, 0, NULL,
CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL );
//bail if we don't have a file
if ((hFile != NULL) && (hFile != INVALID_HANDLE_VALUE))
{
//get exception information
ExceptionInformation.ThreadId = GetCurrentThreadId();
ExceptionInformation.ExceptionPointers = pExceptionPointers;
ExceptionInformation.ClientPointers = TRUE;
//write the debug dump
pMiniDumpWriteDump( GetCurrentProcess(), GetCurrentProcessId(),
hFile, MiniDumpWithFullMemory, &ExceptionInformation,
NULL, NULL );
//close the debug output file
CloseHandle(hFile);
}
return EXCEPTION_EXECUTE_HANDLER;
}
VOID crashme() {int* foo = 0; *foo = 0;}
VOID AfterLoad(VOID)
{
SetUnhandledExceptionFilter(WriteDumpFilter);
crashme();
}
I tried to trim some of the fat out of all the details to simplify the problem, but I can be more explicit if needed. I found the good write-up on CodeProject, and I tried finding more background information to read in order to help me understand the problem, but what I could find didn't help me understand they were just step-by-steps to get it running (which is already is). Anyone have any idea what I'm doing wrong, or maybe point me to relevant reading?
After Sergei's suggestion I did .ecxr in windbg and got better output, but it still doesn't match the trace I get when I hook windbg straight up to the process and trigger the crash. Here is the minidump trace;
*** Stack trace for last set context - .thread/.cxr resets it
ChildEBP RetAddr Args to Child
WARNING: Stack unwind information not available. Following frames may be wrong.
0018e774 00759035 e06d7363 00000001 00000003 KERNELBASE!RaiseException+0x58
0018e7b4 00575ba3 00000000 00000000 00000001 game+0x359035
0018fc50 0057788a 009855ef 0018fdcb 00000001 game+0x175ba3
0018fc78 77b7e013 012d9230 002d91d0 002d9200 game+0x17788a
0018fc90 77ba9567 00290000 00000000 002d91d0 ntdll!RtlFreeHeap+0x7e
0018fd6c 0076ece2 0018ff78 007e1b7e ffffffff ntdll!LdrRemoveLoadAsDataTable+0x4e0
002bbc38 5c306174 61666544 00746c75 5d4c3055 game+0x36ece2
002bbc3c 61666544 00746c75 5d4c3055 8c000000 0x5c306174
002bbc40 00746c75 5d4c3055 8c000000 00000101 0x61666544
002bbc44 5d4c3055 8c000000 00000101 01000000 game+0x346c75
002bbc48 8c000000 00000101 01000000 00000000 0x5d4c3055
002bbc4c 00000000 01000000 00000000 0000006e 0x8c000000
and the trace from attaching the debugger to the process
0:000:x86> kv
ChildEBP RetAddr Args to Child
00186f44 00bc8ea9 19460268 0018a9b7 03f70a28 Minidump!crashme+0x2 (FPO: [0,0,0]) (CONV: cdecl) [c:\project\debug\minidump.cpp # 68]
0018795c 00b9ef31 0018796c 03f56c00 6532716d Main!LoadPlugin+0x339 (FPO: [1,642,4]) (CONV: cdecl) [c:\project\main\pluginloader.cpp # 129]
00188968 00b9667d 19460268 0018a9ac 00000000 Main!Command+0x1f1 (FPO: [2,1024,4]) (CONV: cdecl) [c:\project\main\commands.cpp # 2617]
*** WARNING: Unable to verify checksum for C:\Game\game.exe
*** ERROR: Module load completed but symbols could not be loaded for C:\Game\game.exe
0018b1a8 005b5095 19460268 0018beac 00000000 Main!Hook::Detour+0x52d (FPO: [2,2570,0]) (CONV: thiscall) [c:\project\main\hook.cpp # 275]
WARNING: Stack unwind information not available. Following frames may be wrong.
0018b1b4 00000000 19495200 19495200 00000006 game+0x1b5095
I don't have the source for game.exe (I have it for the DLLs which is where the error is), but I decompiled game.exe and here is what is at game+0x359035.
.text:00759001 ; =============== S U B R O U T I N E =======================================
.text:00759001
.text:00759001 ; Attributes: library function bp-based frame
.text:00759001
.text:00759001 ; __stdcall _CxxThrowException(x, x)
.text:00759001 __CxxThrowException#8 proc near ; CODE XREF: .text:0040100Fp
.text:00759001 ; sub_401640+98p ...
.text:00759001
.text:00759001 dwExceptionCode = dword ptr -20h
.text:00759001 dwExceptionFlags= dword ptr -1Ch
.text:00759001 nNumberOfArguments= dword ptr -10h
.text:00759001 Arguments = dword ptr -0Ch
.text:00759001 var_8 = dword ptr -8
.text:00759001 var_4 = dword ptr -4
.text:00759001 arg_0 = dword ptr 8
.text:00759001 arg_4 = dword ptr 0Ch
.text:00759001
.text:00759001 push ebp
.text:00759002 mov ebp, esp
.text:00759004 sub esp, 20h
.text:00759007 mov eax, [ebp+arg_0]
.text:0075900A push esi
.text:0075900B push edi
.text:0075900C push 8
.text:0075900E pop ecx
.text:0075900F mov esi, offset unk_853A3C
.text:00759014 lea edi, [ebp+dwExceptionCode]
.text:00759017 rep movsd
.text:00759019 mov [ebp+var_8], eax
.text:0075901C mov eax, [ebp+arg_4]
.text:0075901F mov [ebp+var_4], eax
.text:00759022 lea eax, [ebp+Arguments]
.text:00759025 push eax ; lpArguments
.text:00759026 push [ebp+nNumberOfArguments] ; nNumberOfArguments
.text:00759029 push [ebp+dwExceptionFlags] ; dwExceptionFlags
.text:0075902C push [ebp+dwExceptionCode] ; dwExceptionCode
.text:0075902F call ds:RaiseException
.text:00759035 pop edi
.text:00759036 pop esi
.text:00759037 leave
.text:00759038 retn 8
.text:00759038 __CxxThrowException#8 endp
My error that I'm triggering is in Minidump.dll, but this code at the top of the stack is in game.exe. There could be plenty going on inside the game.exe that I'm unaware of, could it maybe be hijacking the error that I'm triggering somehow? I.E., I trigger the error in the DLL, but something setup in the game.exe captures program flow before the exception filter that writes the minidump is called?
If that's the case, when I attach the debugger to the process, trigger the error and get the right output that points to the error being in my DLL, then that means game.exe isn't capturing the program flow before the debugger can do the trace. How could I make my minidump code behave the same way... This is getting into territory I'm not terribly familiar with. Any ideas?
I chased further back, and the function calling that one, has this line in it:
.text:00575A8D mov esi, offset aCrashDumpTooLa ; "Crash dump too large to send.\n"
So, I think game.exe is hijacking the exception to do it's own dump before my code tries to get the dump. And then my dumps trace is just a trace of game.exe's dump process...
Answer
I've got it figured out. I'm not sure how to answer my own post, so here is the deal.
.text:0057494A push offset aDbghelp_dll ; "DbgHelp.dll"
.text:0057494F call ds:LoadLibraryA
.text:00574955 test eax, eax
.text:00574957 jz short loc_5749C8
.text:00574959 push offset aMinidumpwrited ; "MiniDumpWriteDump"
.text:0057495E push eax ; hModule
.text:0057495F call ds:GetProcAddress
.text:00574965 mov edi, eax
.text:00574967 test edi, edi
.text:00574969 jz short loc_5749C8
.text:0057496B mov edx, lpFileName
.text:00574971 push 0 ; hTemplateFile
.text:00574973 push 80h ; dwFlagsAndAttributes
.text:00574978 push 2 ; dwCreationDisposition
.text:0057497A push 0 ; lpSecurityAttributes
.text:0057497C push 0 ; dwShareMode
.text:0057497E push 40000000h ; dwDesiredAccess
.text:00574983 push edx ; lpFileName
.text:00574984 call ds:CreateFileA
.text:0057498A mov esi, eax
.text:0057498C cmp esi, 0FFFFFFFFh
.text:0057498F jz short loc_5749C8
.text:00574991 call ds:GetCurrentThreadId
.text:00574997 push 0
.text:00574999 push 0
.text:0057499B mov [ebp+var_1C], eax
.text:0057499E lea eax, [ebp+var_1C]
.text:005749A1 push eax
.text:005749A2 push 0
.text:005749A4 push esi
.text:005749A5 mov [ebp+var_18], ebx
.text:005749A8 mov [ebp+var_14], 1
.text:005749AF call ds:__imp_GetCurrentProcessId
.text:005749B5 push eax
.text:005749B6 call ds:GetCurrentProcess
.text:005749BC push eax
.text:005749BD call edi
.text:005749BF push esi ; hObject
.text:005749C0 call ds:CloseHandle
.text:005749C6 jmp short loc_574A02
Thats from game.exe. It turns out game.exe does it's own minidump. My minidump was triggering after theirs so what I was seeing in my stack trace was a trace of their dump process. I found a dmp file in the game''s installation directory and once I loaded my symbols into it, it showed the correct output I was after.
You are doing just fine. When you open the minidump you generated, after you load the symbols, do
.ecxr
first to set context to what you saved in ExceptionInformation parameter to MiniDumpWriteDump(). Then you will have a legit stack trace.
We use a similar dump generation mechanism at the place where I work.
there are some future gotchas though. You want to check whether your dump catch mechanism is triggered on things like an abort() call.
For that, check out _set_invalid_parameter_handler() and signal(SIGABRT, ...).
I figured it out. Basically game.exe had its own MiniDumpWriteDump code that was triggering before my code. So the stack trace I was getting wasn't a trace of the error, it was a trace of game.exe doing its own MiniDump. I put more details up in the original post.
Thanks!
I'm trying to write and debug a C program for an ATMega8 device using the Atmel Studio Simulator.
For example, let's say I'm trying to debug this piece of code:
int main(void)
{
while(1)
{
SPI_transaction(0xFF);
}
}
uint8_t SPI_transaction(unsigned char byte_to_send){
value = byte_to_send;
SPDR = byte_to_send; //Sends command
while (checkBitStatus(SPSR,SPIF) == 0){}; //Do nothing until transfer is completed
value = SPDR; //Receives command
CLEARBIT(SPSR, SPIF);
return value;
}
unsigned char checkBitStatus(unsigned char byte, unsigned char bit){
unsigned char bit_status;
if ((byte & (1 << bit)) == 0){
bit_status = 0;
}
else{
bit_status = 1;
}
return bit_status;
}
This code buids without any problem but when I try to debug this happens:
At some points the program counter goes outside my source files. I've been trying to find out why this happens but I have ot been able yet to find an answer. I leave here the disassembly code for you to have a look. I find it really strange.
Init of main function:
00000024 PUSH R28 Push register on stack
00000025 PUSH R29 Push register on stack
00000026 IN R28,0x3D In from I/O location
00000027 IN R29,0x3E In from I/O location
SPI_transaction(0xFF);
00000028 SER R24 Set Register
00000029 RCALL PC+0x0002 Relative call subroutine
uint8_t SPI_transaction(unsigned char byte_to_send){
0000002B PUSH R28 Push register on stack
0000002C PUSH R29 Push register on stack
0000002D PUSH R1 Push register on stack
0000002E IN R28,0x3D In from I/O location
0000002F IN R29,0x3E In from I/O location
00000030 STD Y+1,R24 Store indirect with displacement
while (checkBitStatus(SPSR,SPIF) == 0){}; //Do nothing until transfer is completed
00000039 NOP No operation
--- No source file -------------------------------------------------------------
0000003A LDI R24,0x2E Load immediate
0000003B LDI R25,0x00 Load immediate
0000003C MOVW R30,R24 Copy register pair
--- No source file -------------------------------------------------------------
0000003D LDD R24,Z+0 Load indirect with displacement
0000003E LDI R22,0x07 Load immediate
0000003F RCALL PC+0x0018 Relative call subroutine
From that it jumps here:
unsigned char checkBitStatus(unsigned char byte, unsigned char bit){
00000057 PUSH R28 Push register on stack
00000058 PUSH R29 Push register on stack
00000059 RCALL PC+0x0001 Relative call subroutine
0000005A PUSH R1 Push register on stack
0000005B IN R28,0x3D In from I/O location
0000005C IN R29,0x3E In from I/O location
0000005D STD Y+2,R24 Store indirect with displacement
0000005E STD Y+3,R22 Store indirect with displacement
if ((byte & (1 << bit)) == 0){
0000005F LDD R24,Y+2 Load indirect with displacement
00000060 MOV R24,R24 Copy register
00000061 LDI R25,0x00 Load immediate
00000062 LDD R18,Y+3 Load indirect with displacement
00000063 MOV R18,R18 Copy register
00000064 LDI R19,0x00 Load immediate
00000065 MOV R0,R18 Copy register
00000066 RJMP PC+0x0003 Relative jump
00000067 ASR R25 Arithmetic shift right
00000068 ROR R24 Rotate right through carry
00000069 DEC R0 Decrement
0000006A BRPL PC-0x03 Branch if plus
0000006B ANDI R24,0x01 Logical AND with immediate
0000006C CLR R25 Clear Register
0000006D SBIW R24,0x00 Subtract immediate from word
0000006E BRNE PC+0x03 Branch if not equal
bit_status = 0;
0000006F STD Y+1,R1 Store indirect with displacement
00000070 RJMP PC+0x0003 Relative jump
bit_status = 1;
00000071 LDI R24,0x01 Load immediate
00000072 STD Y+1,R24 Store indirect with displacement
return bit_status;
00000073 LDD R24,Y+1 Load indirect with displacement
}
00000074 POP R0 Pop register from stack
00000075 POP R0 Pop register from stack
00000076 POP R0 Pop register from stack
00000077 POP R29 Pop register from stack
00000078 POP R28 Pop register from stack
00000079 RET Subroutine return
The function returns to somewhere out my source files:
--- No source file -------------------------------------------------------------
00000040 TST R24 Test for Zero or Minus
00000041 BREQ PC-0x07 Branch if equal
0000003A LDI R24,0x2E Load immediate
0000003B LDI R25,0x00 Load immediate
0000003C MOVW R30,R24 Copy register pair
0000003D LDD R24,Z+0 Load indirect with displacement
0000003E LDI R22,0x07 Load immediate
0000003F RCALL PC+0x0018 Relative call subroutine
And the goes again to CheckBitStatus() function.
Any help will be really appreciated.
Alex
The code executed outside your source is most likely an interrupt service routine for a hardware interrupt handled by an atmel or c standard library. These routinely occur for stuff like USB events that must be handled asynchronously.
Another simple reason for the debugger to display the assembly view and the message "no source file found" , if you attempt to use the debugger while the configuration manager (Under Build tap) has the configuration option set for release, the program compilation will go to the release directory and not to the debug directory.
Sorry if the questions are dumb, but they are really confusing me!
According to elf standard the binary is divided into segments like text segment (containing code and RO data) and data segment (containing RW & BSS) which is loaded into memory when the program is executed and process is created, with the segments providing information for environment preparation for process execution.
The question is, how it is decided that how much stack to allocate to process, when i am not providing stack size during process creation?
Also, using the data segment we can determine how much memory the process requires (for global variables) but once this memory is allocated how mapping of variables is done with the address space inside this allocated memory?
Lastly, is there any relation of this with scatter loading? which i think is not the case as scatter loading is done when image is to be loaded into memory and once control is passed to OS, the memory to be allocated to executable or applications is take care off by the OS itself!
I know these are too many questions, but any help will be greatly appreciated.
If u can provide any reference books or links where i can study in detail about this, that is also appreciated.
Thanks a tonne! :)
The question is, how it is decided that how much stack to allocate to process, when i am not providing stack size during process creation?
When a new process created, execve() system call is used to load the new program as process image into memory from the current running process image. Which mean execve when new program is loaded replaces older .text, .data segments, heap and reset the stack. Now ELF executable file is mapped into memory address space making stack space getting initialized with environment array and the argument array to main().
In do_execve_common() procedure call under subroutine bprm_mm_init() handles tasks such as,
New instance of mm_struct to manage process address space using call to mm_alloc().
Initialize this instance with init_new_context().
bprm_mm_init() initializes stack.
search_binary_handler() routine searches for suitable binary format i.e load_binary, load_shlib to load programs or dynamic libraries respectively. Followed by mapping memory to virtual address space and making process ready to run when scheduler identifies the process.
Therefore, stack memory finally looks like below, which will appear to main() routine at start of the execution. Now and then each environment of a subset of function calls, including parameters and local variables are stored or pushed in stack memory zone dynamically when the calls happen.
-----------------
| | <--- Top of the Stack
| environmental |
| variables and |
| the other |
| parameters to |
| main() |
_________________ <--- Stack Pointer
| |
| Stack Space |
| |
Also, using the data segment we can determine how much memory the process requires (for global variables) but once this memory is allocated how mapping of variables is done with the address space inside this allocated memory?
Let try figuring out how variables are mapped to different parts of memory segments by debugging a simple C program as follows,
/* File Name: elf.c : Demonstrating Global variables */
#include <stdio.h>
int add_numbers(void);
int value1 = 10; // Global Initialized: .data section
int value2; // Global Initialized: .bss section
int add_numbers(void)
{
int result; // Local Uninitialized: Stack section
result = value1 + value2;
return result;
}
int main(void)
{
int final_result; // Local Uninitialized: Stack section
value2 = 20;
final_result = add_numbers();
printf("The sum of %d + %d is %d\n",
value1, value2, final_result);
}
Using readelf to display .data section header as below,
$readelf -a elf
...
Section Headers:
[26] .data PROGBITS 00000000006c2060 000c2060
00000000000016b0 0000000000000000 WA 0 0 32
[27] .bss NOBITS 00000000006c3720 000c3710
0000000000002bc8 0000000000000000 WA 0 0 32
...
$readelf -x 26 elf
Hex dump of section '.data':
0x006c2060 00000000 00000000 00000000 00000000 ................
0x006c2070 0a000000 00000000 00000000 00000000 ................
...
Let's use GDB to look at what these section contain,
(gdb) disassemble 0x006c2060
Dump of assembler code for function `data_start`:
0x00000000006c2060 <+0>: add %al,(%rax)
0x00000000006c2062 <+2>: add %al,(%rax)
0x00000000006c2064 <+4>: add %al,(%rax)
0x00000000006c2066 <+6>: add %al,(%rax)
End of assembler dump.
The above first address of .data section refers to data_start subroutine.
(gdb) disassemble 0x006c2070
Dump of assembler code for function `value1`:
0x00000000006c2070 <+0>: or (%rax),%al
0x00000000006c2072 <+2>: add %al,(%rax)
End of assembler dump.
....
The above disassemble dumps address of global variable value1 initialized to
10. But we don't see global uninitialized variable value2 in next addresses.
Let's look at printing the address of value2,
(gdb) p &value2
$1 = (int *) 0x6c5eb0
(gdb) info symbol 0x6c5eb0
value2 in section **.bss**
(gdb) disassemble 0x6c5eb0
Dump of assembler code for function `value2`:
0x00000000006c5eb0 <+0>: add %al,(%rax)
0x00000000006c5eb2 <+2>: add %al,(%rax)
End of assembler dump.
Tada! Disassembling reference pointer of value2 revels that the variable is stored in .bss section. This explains how the uninitialized global variables mapped to process memory space.
Lastly, is there any relation of this with scatter loading?
No.