I'm doing x86 on Mac OS X with NASM. Copying an example and experimenting I noticed that my print command needed a four bytes pushed onto the stack after the other parameters but can't figure out why line five is necessary:
1 push dword len ;Length of message
2 push dword msg ;Message to write
3 push dword 1 ;STDOUT
4 mov eax,4 ;Command code for 'writing'
5 sub esp,4 ;<<< Effectively 'push' Without this the print breaks
6 int 0x80 ;SYSCALL
7 add esp,16 ;Functionally 'pop' everything off the stack
I am having trouble finding any documentation on this 'push the parameters to the stack' syntax that NASM/OS X seems to require. If anyone can point me to a resource for that in general that would most likely answer this question as well.
(Most of the credit goes to #Michael Petch's comment; I'm repeating it here so that it is an answer, and also in order to further clarify the reason for the additional four bytes on the stack.)
macOS is based on BSD, and, as per FreeBSD's documentation re system calls, by default the kernel uses the C calling conventions (which means arguments are pushed to the stack, from last to first), but assuming four extra bytes pushed to the stack, as "it is assumed the program will call a function that issues int 80h, rather than issuing int 80h directly".
That is, the kernel is not built for direct int 80h calls, but rather for code that looks like this:
kernel: ; subroutine to make system calls
int 80h
ret
.
.
.
; code that makes a system call
call kernel ; instead of invoking int 80h directly
Notice that call kernel would push the return address (used by the kernel subroutine's ret to return to calling code after the system call) onto the stack, accounting for four additional bytes – that's why it's necessary to manually push four bytes to the stack (any four bytes – their actual value doesn't matter, as it is ignored by the kernel – so one way to achieve this is sub esp, 4) when invoking int 80h directly.
The reason the kernel expects this behaviour – of calling a method which invokes the interrupt instead of invoking it directly – is that when writing code that can be run on multiple platforms it's then only needed to provide a different version of the kernel subroutine, rather than of every place where a system call is invoked (more details and examples in the link above).
Note: all the above is for 32-bit; for 64-bit the calling conventions are different – registers are used to pass the arguments rather than the stack (there's also a call convention for 32-bit which uses registers, but even then it's not the same registers), the syscall instruction is used instead of int 80h, and no extra four bytes (which, on 64-bit systems, would actually be eight bytes) need to be pushed.
Related
When I was investigating in an executable file,I reached to the piece of code below:
MOV EAX,11B9
MOV EDX,7FFE0300
CALL DWORD PTR DS:[EDX]
RETN 10
This is used to demand a system call. Until here, there is no problem.
I searched within the whole system call code of Windows OS, but none of them is equal to 11B9 in the instruction in the first row "MOV EAX,11B9".
Could everybody guide me, what it means here exactly?
Syscalls numbered 0x1XXX are calls to win32k.sys.
Here is a great table created and updated by j00ru showing the win32k syscall IDs for different versions of Windows:
If I am using the Win32 entry point and I have the following code (in NASM):
extern _ExitProcess#4
global _start
section .text
_start:
mov ebp, esp
; Reserve space onto the stack for two 4 bytes variables
sub esp, 4
sub esp, 4
; ExitProcess(0)
push 0
call _ExitProcess#4
Now before exiting the process, should I increase the esp value to remove the two variables from the stack like I do with any "normal" function?
ExitProcess api can be called from any place. in any function and sub-function. and stack pointer of course can be any. you not need set any registers (include stack pointer) to some (and which ?) values. so answer - you not need increase the esp
as noted #HarryJohnston of course stack must be valid and aligned. as and before any api call. ExiProcess is usual api. and can be call as any another api. and like any another api it require only valid stack but not concrete stack pointer value. non-volatile registers need restore only we return to caller. but ExiProcess not return to caller. it at all never return
so rule is very simply - if you return from any function (entry point or absolute any - does not matter) - we need restore non volatile registers (stack pointer esp or rsp based on calling conventions) and return. if we not return to caller - we and not need restore/preserve any registers. if we return from thread or process entry point, despite good practice also restore all registers as well - in current windows implementations - even if we not do this, any way all will be work, because kernel32 shell caller simply just call ExitThread after we return. it not use any non volatile registers or local variables here. so code will be worked even without restore this from entry point, but much better restore it anyway
I'm looking at some Linux code coming out of the Intel compiler. It looks like functions are being compiled for 2 calling conventions at once. The map file has lots of function name pairs like this:
0x0000000008000000 __foo
0x0000000008000008 __foo.
The offset between the pairs of functions is 4, 8, or 12 bytes. Each of those corresponds to 1, 2, or 3 mov instructions that are moving stack args to registers like this:
__foo:
mov eax, [esp+4]
mov edx, [esp+8]
__foo.:
push ebp
...
After those instructions, it looks like a function using the regparm convention starts.
Does the Intel compiler generate functions with two different calling conventions and then use whichever entry address is correct for the given caller?
Actually, I'd say you have answered yourself to the question:
Does the Intel compiler generate functions with two different calling
conventions and then use whichever entry address is correct for the
given caller?
The foo function appears to be declared with the __regcall attribute. My educated guess is that you must probably have your program compiled using Debug profile, as stack frame based calling conventions allows some information to be available more easily.
Just wondering, in regards to my post Alternatives to built-in Macros, is it possible to avoid using the StdOut macro by using the int 21h windows API? Such as:
.data
msg dd 'This will be displayed'
;original macro usage:
invoke StdOut, addr msg
;what I want to know will work
push msg
int 21h ; If this does what I think it does, it should print msg
Does such a thing exist (as in using int 21h to print things), or does something like it exist, but not exactly int 21h. Or am I completely wrong.
Could someone clarify this for me?
Thanks,
Progrmr
The interrupt 21h was the entry point for MS-DOS functions.
For example to print something on stdout you have to:
mov ah, 09h ; Required ms-dos function
mov dx, msg ; Address of the text to print
int 21h ; Call the MS-DOS API entry-point
The string must be terminated with the '$' character.
But:
You cannot use interrupts in Windows desktop application (they're available only for device drivers).
You must write a 16 bit application if you need to call MS-DOS functions.
Then...yes, you can't use it to print messages, nothing like that exists: you have to call OS functions to print your messages and they are not available via interrupts.
DOS interrupts cannot be used in protected mode on Windows.
You can use the WriteFile Win32 API function to write to the console, or use the MASM macro instead.
The other answers saying that you cannot use interrupts in Windows are quite wrong. If you really want, you can (that's not recommended). At least on 32-bit x86 Windows there's the legacy int 2Eh-based interface for system calls. See e.g. this page for a bit of discussion of system call mechanisms on x86 and x86_64 Windows.
Here's a very simple example (compiled with FASM) of a program, which immediately exits on Windows 7 using int 0x2e (and crashes on most other versions):
format PE
NtTerminateProcess_Wind7=0x172
entry $
; First call terminates all threads except caller thread, see for details:
; http://www.rohitab.com/discuss/topic/41523-windows-process-termination/
mov eax, NtTerminateProcess_Wind7
mov edx, terminateParams
int 0x2e
; Second call terminates current process
mov eax, NtTerminateProcess_Wind7
mov edx, terminateParams
int 0x2e
ud2 ; crash if we failed to terminate
terminateParams:
dd 0, 0 ; processHandle, exitStatus
Do note though, that this is an unsupported way of using Windows: the system call numbers are changing quite often and in general can't be relied on. On this page you can see that e.g. NtCreateFile on Windows XP calls system call number 0x25, while already on Windows Server 2003 this number corresponds to NtCreateEvent, and on Vista it's NtAlpcRevokeSecurityContext.
The supported (albeit not much documented) way of doing the system calls is through the functions of the Native API library, ntdll.dll.
But even if you use the Native API, "printing things" is still very version-dependent. Namely, if you have a redirect to file, you must use NtWriteFile, but when writing to a true console window, you have to use LPC, where the target process depends on Windows version.
I would like to test a buffer-overflow by writing "Hello World" to console (using Windows XP 32-Bit). The shellcode needs to be null-free in order to be passed by "scanf" into the program I want to overflow. I've found plenty of assembly-tutorials for Linux, however none for Windows. Could someone please step me through this using NASM? Thxxx!
Assembly opcodes are the same, so the regular tricks to produce null-free shellcodes still apply, but the way to make system calls is different.
In Linux you make system calls with the "int 0x80" instruction, while on Windows you must use DLL libraries and do normal usermode calls to their exported functions.
For that reason, on Windows your shellcode must either:
Hardcode the Win32 API function addresses (most likely will only work on your machine)
Use a Win32 API resolver shellcode (works on every Windows version)
If you're just learning, for now it's probably easier to just hardcode the addresses you see in the debugger. To make the calls position independent you can load the addresses in registers. For example, a call to a function with 4 arguments:
PUSH 4 ; argument #4 to the function
PUSH 3 ; argument #3 to the function
PUSH 2 ; argument #2 to the function
PUSH 1 ; argument #1 to the function
MOV EAX, 0xDEADBEEF ; put the address of the function to call
CALL EAX
Note that the argument are pushed in reverse order. After the CALL instruction EAX contains the return value, and the stack will be just like it was before (i.e. the function pops its own arguments). The ECX and EDX registers may contain garbage, so don't rely on them keeping their values after the call.
A direct CALL instruction won't work, because those are position dependent.
To avoid zeros in the address itself try any of the null-free tricks for x86 shellcode, there are many out there but my favorite (albeit lengthy) is encoding the values using XOR instructions:
MOV EAX, 0xDEADBEEF ^ 0xFFFFFFFF ; your value xor'ed against an arbitrary mask
XOR EAX, 0xFFFFFFFF ; the arbitrary mask
You can also try NEG EAX or NOT EAX (sign inversion and bit flipping) to see if they work, it's much cheaper (two bytes each).
You can get help on the different API functions you can call here: http://msdn.microsoft.com
The most important ones you'll need are probably the following:
WinExec(): http://msdn.microsoft.com/en-us/library/ms687393(VS.85).aspx
LoadLibrary(): http://msdn.microsoft.com/en-us/library/windows/desktop/ms684175(v=vs.85).aspx
GetProcAddress(): http://msdn.microsoft.com/en-us/library/ms683212%28v=VS.85%29.aspx
The first launches a command, the next two are for loading DLL files and getting the addresses of its functions.
Here's a complete tutorial on writing Windows shellcodes: http://www.codeproject.com/Articles/325776/The-Art-of-Win32-Shellcoding
Assembly language is defined by your processor, and assembly syntax is defined by the assembler (hence, at&t, and intel syntax) The main difference (at least i think it used to be...) is that windows is real-mode (call the actual interrupts to do stuff, and you can use all the memory accessible to your computer, instead of just your program) and linux is protected mode (You only have access to memory in your program's little cubby of memory, and you have to call int 0x80 and make calls to the kernel, instead of making calls to the hardware and bios) Anyway, hello world type stuff would more-or-less be the same between linux and windows, as long as they are compatible processors.
To get the shellcode from your program you've made, just load it into your target system's
debugger (gdb for linux, and debug for windows) and in debug, type d (or was it u? Anyway, it should say if you type h (help)) and between instructions and memory will be the opcodes.
Just copy them all over to your text editor into one string, and maybe make a program that translates them all into their ascii values. Not sure how to do this in gdb tho...
Anyway, to make it into a bof exploit, enter aaaaa... and keep adding a's until it crashes
from a buffer overflow error. But find exactly how many a's it takes to crash it. Then, it should tell you what memory adress that was. Usually it should tell you in the error message. If it says '9797[rest of original return adress]' then you got it. Now u gotta use ur debugger to find out where this was. disassemble the program with your debugger and look for where scanf was called. Set a breakpoint there, run and examine the stack. Look for all those 97's (which i forgot to mention is the ascii number for 'a'.) and see where they end. Then remove breakpoint and type the amount of a's you found out it took (exactly the amount. If the error message was "buffer overflow at '97[rest of original return adress]" then remove that last a, put the adress you found examining the stack, and insert your shellcode. If all goes well, you should see your shellcode execute.
Happy hacking...