printf in Windows x86-64 shellcode, strings on stack, and alignment - windows

From x64 shellcode on Windows 10, I want to call printf("String: %s\n", "Hello World") with the string "Hello World" defined on the stack. I ran into problems that I’ve somewhat fixed, but I understand neither the problem nor my solution. I read that the x64 ABI mandates the stack to be 16-byte aligned before call instructions, but this does not seem to explain all of it.
After some playing around with nasm, I arrived at the following code, which works:
bits 64
default rel
segment .data
msg db "String: %s", 0xd, 0xa, 0
segment .text
global main
extern ExitProcess
extern printf
main:
push rbp
mov rbp, rsp
sub rsp, 64 ; relevant number 1
mov rdx, rsp
add rdx, 32 ; relevant number 2
mov dword [rdx], "Hell"
mov dword [rdx + 4], "o Wo"
mov dword [rdx + 8], "rld"
lea rcx, [msg]
call printf
xor rcx, rcx
call ExitProcess
Curiously, there are certain combinations of values for the rsp and the position rdx of the string on the stack that work and certain combinations that don’t. However, I don’t see the general rule.
Number 1
Number 2
Works?
Output
64
0
no
64
8
no
x¨/(☻
64
16
no
Pf{tè☻
64
24
no
Ϩ;í
64
32
yes
Hello World
64
40
yes
Hello World
64
48
yes
Hello World
64
56
yes
Hello World
64
64
yes
Hello World
32
0
no
32
8
no
°²øRg
32
16
no
ákG×│☺
32
24
no
↑°wd¢
32
32
yes
Hello World
What’s going on? Why do we need to store the string at least 32 bytes away from the rsp? This seem unrelated to the 16-byte alignment requirement.

Related

Repeated call of WriteConsole (NASM x64 on Win64)

I started to learn Assembly lately and for practice, I thought of makeing a small game.
To make the border graphic of the game I need to print a block character n times.
To test this, I wrote the following code:
bits 64
global main
extern ExitProcess
extern GetStdHandle
extern WriteConsoleA
section .text
main:
mov rcx, -11
call GetStdHandle
mov rbx, rax
drawFrame:
mov r12, [sze]
l:
mov rcx, rbx
mov rdx, msg
mov r8, 1
sub rsp, 48
mov r9, [rsp+40]
mov qword [rsp+32], 0
call WriteConsoleA
dec r12
jnz l
xor rcx, rcx
call ExitProcess
section .data
score dd 0
sze dq 20
msg db 0xdb
I wanted to make this with the WinAPI Function for ouput.
Interestingly, this code stops after printing one char when using WriteConsoleA, but when I use C's putchar, it works correctly. I could also manage to make a C equivalent with the WriteConsoleA function, which also works fine. The disassembly of the C code didn't bring me further.
I suspect there's something wrong in my use of the stack that I don't see. Hopefully someone can explain or point out.
You don't want to keep subtracting 48 from RSP through each loop. You only need to allocate that space once before the loop and before you call a C library function or the WinAPI.
The primary problem is with your 4th parameter in R9. The WriteConsole function is defined as:
BOOL WINAPI WriteConsole(
_In_ HANDLE hConsoleOutput,
_In_ const VOID *lpBuffer,
_In_ DWORD nNumberOfCharsToWrite,
_Out_opt_ LPDWORD lpNumberOfCharsWritten,
_Reserved_ LPVOID lpReserved
);
R9 is supposed to be a pointer to a memory location that returns a DWORD with the number of characters written, but you do:
mov r9, [rsp+40]
This moves the 8 bytes starting at memory address RSP+40 to R9. What you want is the address of [rsp+40] which can be done using the LEA instruction:
lea r9, [rsp+40]
Your code could have looked like:
bits 64
global main
extern ExitProcess
extern GetStdHandle
extern WriteConsoleA
section .text
main:
sub rsp, 56 ; Allocate space for local variable(s)
; Allocate 32 bytes of space for shadow store
; Maintain 16 byte stack alignment for WinAPI/C library calls
; 56+8=64 . 64 is evenly divisible by 16.
mov rcx, -11
call GetStdHandle
mov rbx, rax
drawFrame:
mov r12, [sze]
l:
mov rcx, rbx
mov rdx, msg
mov r8, 1
lea r9, [rsp+40]
mov qword [rsp+32], 0
call WriteConsoleA
dec r12
jnz l
xor rcx, rcx
call ExitProcess
section .data
score dd 0
sze dq 20
msg db 0xdb
Important Note: In order to be compliant with the 64-bit Microsoft ABI you must maintain the 16 byte alignment of the stack pointer prior to calling a WinAPI or C library function. Upon calling the main function the stack pointer (RSP) was 16 byte aligned. At the point the main function starts executing the stack is misaligned by 8 because the 8 byte return address was pushed on the stack. 48+8=56 doesn't get you back on a 16 byte aligned stack address (56 is not evenly divisible by 16) but 56+8=64 does. 64 is evenly divisible by 16.

Understanding Hello World's lea macOS x86-64 [duplicate]

Consider the following variable reference in x64 Intel assembly, where the variable a is declared in the .data section:
mov eax, dword ptr [rip + _a]
I have trouble understanding how this variable reference works. Since a is a symbol corresponding to the runtime address of the variable (with relocation), how can [rip + _a] dereference the correct memory location of a? Indeed, rip holds the address of the current instruction, which is a large positive integer, so the addition results in an incorrect address of a?
Conversely, if I use x86 syntax (which is very intuitive):
mov eax, dword ptr [_a]
, I get the following error: 32-bit absolute addressing is not supported in 64-bit mode.
Any explanation?
1 int a = 5;
2
3 int main() {
4 int b = a;
5 return b;
6 }
Compilation: gcc -S -masm=intel abs_ref.c -o abs_ref:
1 .section __TEXT,__text,regular,pure_instructions
2 .build_version macos, 10, 14
3 .intel_syntax noprefix
4 .globl _main ## -- Begin function main
5 .p2align 4, 0x90
6 _main: ## #main
7 .cfi_startproc
8 ## %bb.0:
9 push rbp
10 .cfi_def_cfa_offset 16
11 .cfi_offset rbp, -16
12 mov rbp, rsp
13 .cfi_def_cfa_register rbp
14 mov dword ptr [rbp - 4], 0
15 mov eax, dword ptr [rip + _a]
16 mov dword ptr [rbp - 8], eax
17 mov eax, dword ptr [rbp - 8]
18 pop rbp
19 ret
20 .cfi_endproc
21 ## -- End function
22 .section __DATA,__data
23 .globl _a ## #a
24 .p2align 2
25 _a:
26 .long 5 ## 0x5
27
28
29 .subsections_via_symbols
GAS syntax for RIP-relative addressing looks like symbol + current_address (RIP), but it actually means symbol with respect to RIP.
There's an inconsistency with numeric literals:
[rip + 10] or AT&T 10(%rip) means 10 bytes past the end of this instruction
[rip + a] or AT&T a(%rip) means to calculate a rel32 displacement to reach a, not RIP + symbol value. (The GAS manual documents this special interpretation)
[a] or AT&T a is an absolute address, using a disp32 addressing mode. This isn't supported on OS X, where the image base address is always outside the low 32 bits. (Or for mov to/from al/ax/eax/rax, a 64-bit absolute moffs encoding is available, but you don't want that).
Linux position-dependent executables do put static code/data in the low 31 bits (2GiB) of virtual address space, so you can/should use mov edi, sym there, but on OS X your best option is lea rdi, [sym+RIP] if you need an address in a register. Unable to move variables in .data to registers with Mac x86 Assembly.
(In OS X, the convention is that C variable/function names are prepended with _ in asm. In hand-written asm you don't have to do this for symbols you don't want to access from C.)
NASM is much less confusing in this respect:
[rel a] means RIP-relative addressing for [a]
[abs a] means [disp32].
default rel or default abs sets what's used for [a]. The default is (unfortunately) default abs, so you almost always want a default rel.
Example with .set symbol values vs. a label
.intel_syntax noprefix
mov dword ptr [sym + rip], 0x11111111
sym:
.equ x, 8
inc byte ptr [x + rip]
.set y, 32
inc byte ptr [y + rip]
.set z, sym
inc byte ptr [z + rip]
gcc -nostdlib foo.s && objdump -drwC -Mintel a.out (on Linux; I don't have OS X):
0000000000001000 <sym-0xa>:
1000: c7 05 00 00 00 00 11 11 11 11 mov DWORD PTR [rip+0x0],0x11111111 # 100a <sym> # rel32 = 0; it's from the end of the instruction not the end of the rel32 or anywhere else.
000000000000100a <sym>:
100a: fe 05 08 00 00 00 inc BYTE PTR [rip+0x8] # 1018 <sym+0xe>
1010: fe 05 20 00 00 00 inc BYTE PTR [rip+0x20] # 1036 <sym+0x2c>
1016: fe 05 ee ff ff ff inc BYTE PTR [rip+0xffffffffffffffee] # 100a <sym>
(Disassembling the .o with objdump -dr will show you that there aren't any relocations for the linker to fill in, they were all done at assemble time.)
Notice that only .set z, sym resulted in a with-respect-to calculation. x and y were original from plain numeric literals, not labels, so even though the instruction itself used [x + RIP], we still got [RIP + 8].
(Linux non-PIE only): To address absolute 8 wrt. RIP, you'd need AT&T syntax incb 8-.(%rip). I don't know how to write that in GAS intel_syntax; [8 - . + RIP] is rejected with Error: invalid operands (*ABS* and .text sections) for '-'.
Of course you can't do that anyway on OS X, except maybe for absolute addresses that are in range of the image base. But there's probably no relocation that can hold the 64-bit absolute address to be calculated for a 32-bit rel32.
Related:
How to load address of function or label into register AT&T version of this
32-bit absolute addresses no longer allowed in x86-64 Linux? PIE vs. non-PIE executables, when you have to use position-independent code.

Assembly, Reserving Space with resq in YASM

Using YASM I have tried to reserve space for 2000 quadwords, but when I
do this I get a SIGSEGV when I try to write into the reserved block of quadwords.
If I reserve space for only 300 quadwords, the program runs without error.
What causes this?
; Using Windows 7 (Intel Celeron 64-bits)
; yasm -f win64 -l forth.lst forth.asm
; gcc -o forth forth.obj
segment .data
controlstr db "%x", 13, 10, 0
segment .bss
dictionaryspace resq 2000
datastackspace resq 300
databottom dq 0
returnstackspace resq 300
returnbottom dq 0
segment .text
global main
extern printf
main:
push rbp ; setup stack frame
mov rbp, rsp
sub rsp, 32 ; reserve space
lea r15, [databottom] ; initialize data stack pointer
sub r15, 8 ; point to the last word in data stack
mov rax, 666
mov [r15], rax ; SIGSEGV happens here.
mov rdx, [r15]
lea rcx, [controlstr]
call printf
leave
ret
; End of code.

Triple fault on stack in C code [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 3 years ago.
Improve this question
I am writing a toy kernel for learning purposes, and I am having a bit of trouble with it. I have made a simple bootloader that loads a segment from a floppy disk (which is written in 32 bit code), then the bootloader enables the A20 gate and turns on protected mode. I can jump to the 32 bit code fine if I write it in assembler, but if I write it in C, I get a triple fault. When I disassemble the C code I can see that the first two instructions involve setting up a new stack frame. This is the key difference between the working ASM code and the failing C code.
I am using NASM v2.10.05 for the ASM code, and GCC from the DJGPP 4.72 collection for the C code.
This is the bootloader code:
org 7c00h
BITS 16
entry:
mov [drive], dl ;Save the current drive
cli
mov ax,cs ; Setup segment registers
mov ds,ax ; Make DS correct
mov ss,ax ; Make SS correct
mov bp,0fffeh
mov sp,0fffeh ;Setup a temporary stack
sti
;Set video mode to text
;===================
mov ah, 0
mov al, 3
int 10h
;===================
;Set current page to 0
;==================
mov ah, 5
mov al, 0
int 10h
;==================
;Load the sector
;=============
call load_image
;=============
;Clear interrupts
;=============
cli
;=============
;Disable NMIs
;============
in ax, 70h
and ax, 80h ;Set the high bit to 1
out 70h, ax
;============
;Enable A20:
;===========
mov ax, 02401h
int 15h
;===========
;Load the GDT
;===========
lgdt [gdt_pointer]
;===========
;Clear interrupts
;=============
cli
;=============
;Enter protected mode
;==================
mov eax, cr0
or eax, 1 ;Set the low bit to 1
mov cr0, eax
;==================
jmp 08h:clear_pipe ;Far jump to clear the instruction queue
;======================================================
load_image:
reset_drive:
mov ah, 00h
; DL contains *this* drive, given to us by the BIOS
int 13h
jc reset_drive
read_sectors:
mov ah, 02h
mov al, 01h
mov ch, 00h
mov cl, 02h
mov dh, 00h
; DL contains *this* drive, given to us by the BIOS
mov bx, 7E0h
mov es, bx
mov bx, 0
int 13h
jc read_sectors
ret
;======================================================
BITS 32 ;Protected mode now!
clear_pipe:
mov ax, 10h ; Save data segment identifier
mov ds, ax ; Move a valid data segment into the data segment register
mov es, ax
mov fs, ax
mov gs, ax
mov ss, ax ; Move a valid data segment into the stack segment register
mov esp, 90000h ; Move the stack pointer to 90000h
mov ebp, esp
jmp 08h:7E00h ;Jump to the kernel proper
;===============================================
;========== GLOBAL DESCRIPTOR TABLE ==========
;===============================================
gdt: ; Address for the GDT
gdt_null: ; Null Segment
dd 0
dd 0
gdt_code: ; Code segment, read/execute, nonconforming
dw 0FFFFh ; LIMIT, low 16 bits
dw 0 ; BASE, low 16 bits
db 0 ; BASE, middle 8 bits
db 10011010b ; ACCESS byte
db 11001111b ; GRANULARITY byte
db 0 ; BASE, low 8 bits
gdt_data: ; Data segment, read/write, expand down
dw 0FFFFh
dw 0
db 0
db 10010010b
db 11001111b
db 0
gdt_end: ; Used to calculate the size of the GDT
gdt_pointer: ; The GDT descriptor
dw gdt_end - gdt - 1 ; Limit (size)
dd gdt ; Address of the GDT
;===============================================
;===============================================
drive: db 00 ;A byte to store the current drive in
times 510-($-$$) db 00
db 055h
db 0AAh
And this is the kernel code:
void main()
{
asm("mov byte ptr [0x8000], 'T'");
asm("mov byte ptr [0x8001], 'e'");
asm("mov byte ptr [0x8002], 's'");
asm("mov byte ptr [0x8003], 't'");
}
The kernel simply inserts those four bytes into memory, which I can check as I am running the code in a VMPlayer virtual machine. If the bytes appear, then I know the code is working. If I write code in ASM that looks like this, then the program works:
org 7E00h
BITS 32
main:
mov byte [8000h], 'T'
mov byte [8001h], 'e'
mov byte [8002h], 's'
mov byte [8003h], 't'
hang:
jmp hang
The only differences are therefore the two stack operations I found in the disassembled C code, which are these:
push ebp
mov ebp, esp
Any help on this matter would be greatly appreciated. I figure I am missing something relatively minor, but crucial, here, as I know this sort of thing is possible to do.
Try using the technique here:
Is there a way to get gcc to output raw binary?
to produce a flat binary of the .text section from your object file.
I'd try moving the address of your protected mode stack to something else - say 0x80000 - which (according to this: http://wiki.osdev.org/Memory_Map_(x86)) is RAM which is guaranteed free for use (as opposed to the address you currently use - 0x90000 - which apparently may not be present - depending a bit on what you're using as a testing environment).

NASM jmp wonkiness

I'm writing a Forth inner interpreter and getting stuck at what should be the simplest bit. Using NASM on Mac (macho)
msg db "k thx bye",0xA ; string with carriage return
len equ $ - msg ; string length in bytes
xt_test:
dw xt_bye ; <- SI Starts Here
dw 0
db 3,'bye'
xt_bye dw $+2 ; <- Should point to...
push dword len ; <-- code here
push dword msg ; <--- but it never gets here
push dword 1
mov eax, 0x4 ; print the msg
int 80h
add esp, 12
push dword 0
mov eax, 0x1 ; exit(0)
int 80h
_main:
mov si,xt_test ; si points to the first xt
lodsw ; ax now points to the CFA of the first word, si to the next word
mov di,ax
jmp [di] ; jmp to address in CFA (Here's the segfault)
I get Segmentation Fault: 11 when it runs. As a test, I can change _main to
_main:
mov di,xt_bye+2
jmp di
and it works
EDIT - Here's the simplest possible form of what I'm trying to do, since I think there are a few red herrings up there :)
a dw b
b dw c
c jmp _my_actual_code
_main:
mov si,a
lodsw
mov di,ax
jmp [di]
EDIT - After hexdumping the binary, I can see that the value in b above is actually 0x1000 higher than the address where label c is compiled. c is at 0x00000f43, but b contains 0x1f40
First, it looks extremely dangerous to use the 'si' and 'di' 16-bit registers on a modern x86 machine which is at least 32-bit.
Try using 'esi' and 'edi'. You might be lucky to avoid some of the crashes when 'xt_bye' is not larger than 2^16.
The other thing: there is no 'RET' at the end of xt_bye.
One more: see this linked question Help with Assembly. Segmentation fault when compiling samples on Mac OS X
Looks like you're changing the ESP register to much and it becomes unaligned by 16 bytes. Thus the crash.
One more: the
jmp [di]
may not load the correct address because the DS/ES regs are not used thus the 0x1000 offset.

Resources