Generated assembly for extended alignment of stack variables [duplicate] - gcc

This question already has answers here:
Trying to understand gcc's complicated stack-alignment at the top of main that copies the return address
(3 answers)
Why is GCC pushing an extra return address on the stack?
(2 answers)
What's up with gcc weird stack manipulation when it wants extra stack alignment?
(1 answer)
Closed 3 years ago.
I was digging into the assembly of code that was using extended alignment for a stack-based variable. This is a smaller version of the code
struct Something {
Something();
};
void foo(Something*);
void bar() {
alignas(128) Something something;
foo(&something);
}
This, when compiled with clang 8.0 generates the following code (https://godbolt.org/z/lf8WW-)
bar(): # #bar()
push rbp
mov rbp, rsp
and rsp, -128
sub rsp, 128
mov rdi, rsp
call Something::Something() [complete object constructor]
mov rdi, rsp
call foo(Something*)
mov rsp, rbp
pop rbp
ret
And earlier versions of gcc produce the following (https://godbolt.org/z/LLQ8gW). Starting gcc 8.1, both produce the same code
bar():
lea r10, [rsp+8]
and rsp, -128
push QWORD PTR [r10-8]
push rbp
mov rbp, rsp
push r10
sub rsp, 232
lea rax, [rbp-240]
mov rdi, rax
call Something::Something() [complete object constructor]
lea rax, [rbp-240]
mov rdi, rax
call foo(Something*)
nop
add rsp, 232
pop r10
pop rbp
lea rsp, [r10-8]
ret
I'm not too familiar with x86 and just out of curiosity - what exactly is happening here in both pieces of code? Does the compiler pull tricks like std::align() and round up the current stack position to a multiple of 128 for the on-stack variable something?

Nothing magical here. Line-by-line:
bar(): # #bar()
push rbp ; preserve base pointer
mov rbp, rsp ; set base poiner
and rsp, -128 ; Anding with -128 aligns it on 128 boundary
sub rsp, 128 ; incrementing stack grows down, incrementing it gives us the space for new object
mov rdi, rsp ; address of the new (future) object is passed as an argument to the constructor, in %RDI
call Something::Something() [complete object constructor] # call constructor
mov rdi, rsp ; callee might have changed %RDI, so need to restore it
call foo(Something*) ; calling a function given it address of fully constructed object
mov rsp, rbp ; restore stack pointer
pop rbp ; restore base pointer
ret

Related

Why does a function double dereference arguments stored on stack and how is that possible? [duplicate]

This question already has answers here:
Basic use of immediates vs. square brackets in YASM/NASM x86 assembly
(4 answers)
x86 Nasm assembly - push'ing db vars on stack - how is the size known?
(2 answers)
Referencing the contents of a memory location. (x86 addressing modes)
(2 answers)
Why do you have to dereference the label of data to store something in there: Assembly 8086 FASM
(1 answer)
Closed 7 months ago.
I tried to understand "lfunction" stack arguments loading to "flist" in following assembly code I found on a book (The book doesn't explain it. Code compiles and run without errors giving intended output displaying "The string is: ABCDEFGHIJ".) but I can't grasp the legality or logic of the code. What I don't understand is listed below.
In lfunction:
Non-volatile (as per Microsoft x64 calling convention) register RBX is not backed up before 'XOR'ing. (But it is not what bugs me most.)
In portion ";arguments on stack"
mov rax, qword [rbp+8+8+32]
mov bl,[rax]
Here [rbp+8+8+32] dereferences corresponding address stored in stack so RAX should
be loaded with value represented by'fourth' which is char 'D'(0x44) as per my understanding (Why qword?). And if so, what dereferencing char 'D' in second line can possibly mean (There should be a memory address to dereference but 'D' is a char.)?
Original code is listed below:
%include "io64.inc"
; stack.asm
extern printf
section .data
first db "A"
second db "B"
third db "C"
fourth db "D"
fifth db "E"
sixth db "F"
seventh db "G"
eighth db "H"
ninth db "I"
tenth db "J"
fmt db "The string is: %s",10,0
section .bss
flist resb 14 ;length of string plus end 0
section .text
global main
main:
push rbp
mov rbp,rsp
sub rsp, 8
mov rcx, flist
mov rdx, first
mov r8, second
mov r9, third
push tenth ; now start pushing in
push ninth ; reverse order
push eighth
push seventh
push sixth
push fifth
push fourth
sub rsp,32 ; shadow
call lfunc
add rsp,32+8
; print the result
mov rcx, fmt
mov rdx, flist
sub rsp,32+8
call printf
add rsp,32+8
leave
ret
;––––––––––––––––––––––––-
lfunc:
push rbp
mov rbp,rsp
xor rax,rax ;clear rax (especially higher bits)
;arguments in registers
mov al,byte[rdx] ; move content argument to al
mov [rcx], al ; store al to memory(resrved at section .bss)
mov al, byte[r8]
mov [rcx+1], al
mov al, byte[r9]
mov [rcx+2], al
;arguments on stack
xor rbx,rbx
mov rax, qword [rbp+8+8+32] ; rsp + rbp + return address + shadow
mov bl,[rax]
mov [rcx+3], bl
mov rax, qword [rbp+48+8]
mov bl,[rax]
mov [rcx+4], bl
mov rax, qword [rbp+48+16]
mov bl,[rax]
mov [rcx+5], bl
mov rax, qword [rbp+48+24]
mov bl,[rax]
mov [rcx+6], bl
mov rax, qword [rbp+48+32]
mov bl,[rax]
mov [rcx+7], bl
mov rax, qword [rbp+48+40]
mov bl,[rax]
mov [rcx+8], bl
mov rax, qword [rbp+48+48]
mov bl,[rax]
mov [rcx+9], bl
mov bl,0 ; terminating zero
mov [rcx+10], bl
leave
ret
Additional info:
I cannot look at register values just after line 50 which
corresponds to "XOR RAX, RAX" in lfunc because debugger auto skips
single stepping to line 37 of main function which corresponds to
"add RSP, 32+8". Even If I marked breakpoints in between
aforementioned lines in lfunc code the debugger simply hangs so I
have to manually abort debugging.
In portion ";arguments on stack"
mov rax, qword [rbp+8+8+32]
mov bl,[rax]
I am mentioning this again to be more precise of what am asking because question was marked as duplicate and
provided links with answers that doesn't address my specific issue. At line
[rbp+8+8+32] == 0x44 because clearly, mov with square brackets dereferences reference address (which I assume 64bit width) rbp+3h. So, the size of 0x44 is byte. That is why ask "Why qword?" because it implies "lea [rbp+8+8+32]" which is a qword reference, not mov. So if [rbp+8+8+32] equals 0x44, then [rax] == [0x0000000000000044], which a garbage ( not relevant to our code here) address.

Segmentation fault after function call in x86 assembly

I'm currently writing a compiler and I have created some tests and only one of them fails with Segmentation fault (core dumped) as error message.
This is the code that gets compiled
function main(): int {
var f: int = 10;
return func(f) - func(f / 2);
}
function func(a: int): int {
return a;
}
And this is the generated assembly code (it's not really optimized as you can see)
section .text
global _start
_start:
call function_main
mov rdi, rax
mov rax, 60
syscall
global function_main
function_main:
push rbp
mov rbp, rsp
sub rsp, 4
mov rax, 10
mov DWORD[rbp-0], eax
mov eax, DWORD[rbp-0]
mov rdi, rax
call function_func
push rax
mov eax, DWORD[rbp-0]
mov rbx, 2
idiv rbx
mov rdi, rax
call function_func
mov rbx, rax
pop rax
sub rax, rbx
mov rsp, rbp
pop rbp
ret
global function_func
function_func:
push rbp
mov rbp, rsp
sub rsp, 4
mov DWORD[rbp-0], edi
mov eax, DWORD[rbp-0]
mov rsp, rbp
pop rbp
ret
The assembly file is compiled with nasm -f elf64 ./test9.lv.asm -o ./test9.lv.asm.o and ld -g ./test9.lv.asm.o a.out
I've used gdb to debug the binary file and it seems like the program receives the SIGSEGV signal right after func(f) returns the first time.
But now I don't know why this is happening in this case.

What is wrong with my implementation of the Lisp "cons" function?

I am trying to make the Lisp function cons in x86_84 assembly on MacOS. Below I am trying to make a pair of 2 and 3, but it is not working; I am getting a segmentation fault.
.global _main
.extern _malloc
.text
.macro make_node register
mov rdi, 8 # 64-bit number
call _malloc # failed on malloc
mov [rax], \register # contents of register in address of rax
mov \register, [rax]
.endm
cons:
push rbp
mov rbp, rsp
mov r8, [rbp + 16]
make_node r8
mov r9, [rbp + 24]
make_node r9
mov rsp, rbp
pop rbp
ret
_main:
push 3
push 2
call cons
add rsp, 16
# I should now be able to do whatever I want with r8 (2) and r9 (3)
mov rdi, 0
mov rax, 0x2000001
syscall
I stepped through it with GDB and I see that it failed on calling malloc, but to me, there doesn't seem to be a problem since malloc only takes one argument (the number of bytes to allocate) in the rdi register.
Dump of assembler code for function cons:
0x0000000100003f48 <+0>: push %rbp
0x0000000100003f49 <+1>: mov %rsp,%rbp
0x0000000100003f4c <+4>: mov 0x10(%rbp),%r8
0x0000000100003f50 <+8>: mov $0x8,%rdi
=> 0x0000000100003f57 <+15>: callq 0x100003f96
0x0000000100003f5c <+20>: mov %r8,(%rax)
0x0000000100003f5f <+23>: mov (%rax),%r8
0x0000000100003f62 <+26>: mov 0x18(%rbp),%r9
0x0000000100003f66 <+30>: mov $0x8,%rdi
0x0000000100003f6d <+37>: callq 0x100003f96
0x0000000100003f72 <+42>: mov %r9,(%rax)
0x0000000100003f75 <+45>: mov (%rax),%r9
0x0000000100003f78 <+48>: mov %rbp,%rsp
0x0000000100003f7b <+51>: pop %rbp
0x0000000100003f7c <+52>: retq
End of assembler dump.
(gdb) ni
Thread 2 received signal SIGSEGV, Segmentation fault.
I am assembling on a Mac like this: clang -masm=intel cell.asm.
Does anyone familiar with x86 assembly know the source of my error?
(Also, in case anyone asks, I know that it's important to call free after malloc but this code is the only code necessary to demonstrate my problem.)

x64 asm ret lands in no mans land

I assumed I had push'ed something without popping it, or vice versa, but I can't find anything wrong! I write to the console with a call to a dll that links properly, and I inexplicably am in no mans land... (address 0x0000000000000000)
I've put some sleeps in, and I'm sure that the api call WriteConsoleA is returning. It's on my last ret under the print function.
Any ideas?
.exe:
extern FreeConsole
extern Sleep
extern ExitProcess
extern print
extern newconsole
extern strlen
section .data BITS 64
title: db 'Consolas!',0
message: db 'Hello, world',0,0
section .text bits 64
global Start
Start:
mov rcx, title
call newconsole
mov rcx, 1000
call Sleep
mov rcx, message
call print
mov rcx, 10000
call Sleep
call FreeConsole
xor rcx, rcx
call ExitProcess
.dll:
extern AllocConsole
extern SetConsoleTitleA
extern GetStdHandle
extern WriteConsoleA
extern Sleep
export newconsole
export strlen
export print
section .data BITS 64
console.writehandle: dq 0
console.readhandle: dq 0
console.write.result: dq 0
section .text BITS 64
global strlen
strlen:
push rax
push rdx
push rdi
mov rdi, rcx
xor rax, rax
mov rcx, dword -1
cld
repnz scasb
neg rcx
sub rcx, 2
pop rdi
pop rdx
pop rax
ret
global print
print:
mov rbp, rsp
push rcx
call strlen
mov r8, rcx
pop rdx
mov rcx, [console.writehandle]
mov r9, console.write.result
push qword 0
call WriteConsoleA
ret
global newconsole
newconsole:
push rax
push rcx
call AllocConsole
pop rcx
call SetConsoleTitleA
mov rcx, -11
call GetStdHandle
mov [console.writehandle], rax
pop rax
ret
I assume you're talking about this function:
global print
print:
mov rbp, rsp
push rcx
call strlen
mov r8, rcx
pop rdx
mov rcx, [console.writehandle]
mov r9, console.write.result
push qword 0
call WriteConsoleA
ret
The x64 ABI requires that stack space is reserved even for parameters passed in registers. WriteConsoleA is free to use those stack locations for whatever it wants - so you need to make sure that you've adjusted the stack appropriately. As it stands, you're pushing only the last reserved pointer parameter. I think something like the following will do the trick for you:
push qword 0
sub rsp, 4 * 8 // reserve stack for register parameters
call WriteConsoleA
mov rsp, rbp // restore rsp
ret
See http://msdn.microsoft.com/en-us/library/ms235286.aspx (emphasis added):
The x64 Application Binary Interface (ABI) is a 4 register fast-call calling convention, with stack-backing for those registers.
...
The caller is responsible for allocating space for parameters to the callee, and must always allocate sufficient space for the 4 register parameters, even if the callee doesn’t have that many parameters.
According to calling convention, you have to clean up arguments you put on the stack. In this case that applies to the 5th argument to WriteConsoleA. Since you have a copy of original rsp in rbp, you can reload rsp from rbp, or just add 8 after the call.

Accessing malloc'd memory in assembly

I'm trying to access memory I have malloced in assembly but I keep just repeatedly getting segfault errors. What am I doing wrong in the following code, I'm sure it's simple but I just can't see it!
EDIT: I am using 64 bit NASM assembly
; Allocate room for 8 integers
mov r8, 8
mov rdi, r8
imul rdi, 8 ; Multiply by 8 (8 bytes per entry in 64bit)
xor rax, rax
call malloc
add rsp, 8
test rax, rax
jz malloc_failure
mov r8, rsp
; r8 now = base of array
; Set the first element to be 100
mov r9, 0
add r9, r8
mov qword [r9], 100
malloc_failure:
deallocate_start:
dealloc_1:
mov rdi, r8
xor rax, rax
call free
add rsp, 8
deallocate_end:
call os_return ; return to operating system
And the segfault (Not very interesting...)
matrix05% ./arr5
Segmentation fault
mov r8, 8
mov rdi, r8
imul rdi, 8
xor rax, rax
call malloc
add rsp, 8 ;; here we _add_ 8 bytes to the stack pointer
;; this is equivalent to _popping_ off the stack
;; remember, the x86 stack grows down!
test rax, rax ;; rax is indeed where the return value is..... but:
jz malloc_failure
mov r8, rsp ;; we overwrite r8 with the stack pointer (why??)
; r8 now = base of array ;; no it's not
mov r9, 0
add r9, r8 ;; r9 = r8 = stack pointer
mov qword [r9], 100 ;; we now write 100 to the current stack pointer.
;; The stack pointer initially (on entry to the function)
;; pointed to a return address; where exactly are you overwriting?
malloc_failure:
deallocate_start:
dealloc_1:
mov rdi, r8
xor rax, rax
call free
add rsp, 8 ;; we pop from the stack pointer _again_. I do hope there's a sub rsp, 16 at the top...
deallocate_end:
call os_return ; return to operating system (and probably crash because our stack is FUBAR'd)

Resources