On most x86-based Unix systems you can construct a "static" executable that does not load any system-provided DLL(-equivalent)s, and runs a bare minimum of instructions before terminating itself normally. For instance, this works on x86/Linux (32-bit). Technically I might not even need the second mov instruction, as IIRC the ABI guarantees all registers are cleared to zero at the program entrypoint.
$ cat > test.s
.text
.globl start
start:
movl $1,%eax # _exit
movl $0,%ebx
int $0x80
$ as -32 test.s -o test.o
$ ld -m elf_i386 -e start test.o -o test
My question is how close you can get on Windows to this bare minimum of instructions executed in user space between process creation and termination. I have heard rumors that the kernelside process creation logic will load ntdll.dll and possibly also kernel32.dll into every process whether or not the PE file references them, and that both of these have nontrivial startup code that may be unavoidable. I have also heard rumors that system call numbers are not part of the stable ABI, so you have to call through ntdll for cross-version compatibility, even if you're bypassing Win32. I would like to know to what extent these rumors are true, and to what extent their implications can be worked around.
This is an exercise in what is possible in an experiment, rather than what is a good idea in a product shipped to end-users. A concrete motivation for asking this question is that if it were possible to cut the "mandatory" system DLLs completely out of the loop then it would be straightforward to measure what proportion of process startup time is due to their self-initialization.
I'm not very experienced with low-level Windows programming, so if you can give a step-by-step recipe like the above for constructing the "minimal" executable you propose as your answer, that would be appreciated.
I might be able to answer part of your question, but I don't know whether you can bypass them (and I doubt it).
I have also heard rumors that system call numbers are not part of the
stable ABI, so you have to call through ntdll for cross-version
compatibility, even if you're bypassing Win32
This is true: each major kernel version comes with new system call numbers.
The reason the syscall numbers are not permanent is that the syscall table is generated by name (not by number). So each time a new syscall is inserted, the older ones get "pushed" farther down (and the other way around if a syscall gets removed, although this is quite rare).
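To make the "pushed farther" effect concrete, here is a toy model. It assumes, as described above, that a service's number is simply its position in a name-ordered list. The three-entry table and the resulting numbers are made up for illustration and do not match any real Windows build.

```python
def assign_numbers(names):
    """Number each service by its position in a name-sorted list (toy model)."""
    return {name: index for index, name in enumerate(sorted(names))}

# Hypothetical "old" build with three services.
old_build = assign_numbers(["NtClose", "NtCreateFile", "NtOpenFile"])
# Hypothetical "new" build: one extra service sorts in before NtCreateFile.
new_build = assign_numbers(["NtClose", "NtCreateEvent", "NtCreateFile", "NtOpenFile"])

for name in sorted(old_build):
    print(f"{name}: {old_build[name]:#x} -> {new_build[name]:#x}")
```

Everything that sorts after the new entry moves up by one, which is why a hard-coded number only matches the build it was taken from.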
The syscall table name (kernel side) is KiServiceTable (part of KeServiceDescriptorTable and KeServiceDescriptorTableShadow).
kd> dps nt!KeServiceDescriptorTable L4
fffff800`1236ba80 fffff800`1215f700 nt!KiServiceTable
fffff800`1236ba88 00000000`00000000
fffff800`1236ba90 00000000`000001b1
fffff800`1236ba98 fffff800`1216048c nt!KiArgumentTable
There are 0x1B1 system calls (Windows 8.1), and the pointers to them are stored in KiServiceTable.
A userland syscall stub looks like this (Windows 10):
0:004> u ntdll!ntcreatefile
ntdll!NtCreateFile:
00007fff`1d913ac0 4c8bd1 mov r10,rcx ; args
00007fff`1d913ac3 b855000000 mov eax,55h ; syscall number
00007fff`1d913ac8 0f05 syscall ; x64 instruction, perform ring3 -> ring0 transition
00007fff`1d913aca c3 ret
00007fff`1d913acb 0f1f440000 nop dword ptr [rax+rax]
The same one from Windows 8.1 x64:
0:003> u ntdll!ntcreatefile
ntdll!NtCreateFile:
00007ff8`62071720 4c8bd1 mov r10,rcx
00007ff8`62071723 b854000000 mov eax,54h
00007ff8`62071728 0f05 syscall
00007ff8`6207172a c3 ret
00007ff8`6207172b 0f1f440000 nop dword ptr [rax+rax]
As you can see, the same function leads to different syscall numbers (0x55 for Windows 10 and 0x54 for Windows 8.1).
Entries in the syscall table (inside the kernel) are now "encoded" in a simple way (they were plain pointers before): each 32-bit entry is shifted right by 4 and added to the table base, as the expression below shows. Let's take a look at index 0x54:
kd> ? nt!KiServiceTable+(dwo(nt!KiServiceTable + 0x54 * 4) >> 4)
Evaluate expression: -8795786429460 = fffff800`12463bec
Which symbol is at this address?
kd> ln fffff800`12463bec
Browse module
Set bu breakpoint
(fffff800`12463bec) nt!NtCreateFile | (fffff800`12463c70) nt!IopCreateFile
Exact matches:
nt!NtCreateFile (<no parameter info>)
So ntdll!ntcreatefile leads to kernel function nt!NtCreateFile (not a big surprise :)
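The decoding done by the debugger expression above can be sketched in a few lines. The table base comes from the dump earlier in this answer. The raw entry value is an assumption: it was reconstructed backwards from the two addresses shown and was not read from the debugger, and its low four bits are a placeholder, since the shift discards them. Treating the entry as a signed 32-bit value is also an assumption.

```python
def decode_service_entry(table_base, raw_entry):
    """Mirror the debugger expression: table_base + (entry >> 4)."""
    # Treat the 32-bit entry as signed (assumption), so that a target
    # below the table base would decode correctly as well.
    if raw_entry & 0x80000000:
        raw_entry -= 1 << 32
    # Python's >> on a negative int is an arithmetic shift.
    return (table_base + (raw_entry >> 4)) & 0xFFFFFFFFFFFFFFFF

KI_SERVICE_TABLE = 0xFFFFF8001215F700  # nt!KiServiceTable from the dump above
RAW_ENTRY_0X54 = 0x03044EC7            # assumed value for index 0x54, see lead-in

print(hex(decode_service_entry(KI_SERVICE_TABLE, RAW_ENTRY_0X54)))
# -> 0xfffff80012463bec, the address resolved to nt!NtCreateFile above
```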
You can find a syscall table for major Windows systems at this URL.
Actually, the leaked source from the windows XP kernel (in fact the WRK) shows how the service table is generated (in an assembly file).
I have heard rumors that the kernelside process creation logic will
load ntdll.dll and possibly also kernel32.dll into every process
whether or not the PE file references them, and that both of these
have nontrivial startup code that may be unavoidable
That's true. I won't go through the whole process, which is very complicated and discussed at great length in the Windows Internals books.
ntdll is loaded because a big part of the user-land Windows loader is located there (if you have symbolic information, look at all the functions starting with Ldr).
kernel32.dll is also loaded into the process address space because part of the main thread initialization is located there. It is also needed because part of the exception handling is done there.
I could have gone with an executable that executes just a single instruction (namely RET on x86 / x64), but the result is the same with notepad.
Put a breakpoint at entry point:
0:000> bp $exentry
0:000> bl
0 e 00007ff6`275c4030 0001 (0001) 0:**** notepad!WinMainCRTStartup
0:000> g
Breakpoint 0 hit
notepad!WinMainCRTStartup:
00007ff6`275c4030 4883ec28 sub rsp,28h
Stack trace at entry:
0:000> kb
# RetAddr : Args to Child : Call Site
00 00007fff`1ce62d92 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : notepad!WinMainCRTStartup
01 00007fff`1d889f64 : 00007fff`1ce62d70 00000000`00000000 00000000`00000000 00000000`00000000 : KERNEL32!BaseThreadInitThunk+0x22
02 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x34
So we have ntdll!RtlUserThreadStart, which calls KERNEL32!BaseThreadInitThunk, which calls the entry point of the executable.
0:000> u KERNEL32!BaseThreadInitThunk L 10
KERNEL32!BaseThreadInitThunk:
00007fff`1ce62d70 48895c2408 mov qword ptr [rsp+8],rbx
00007fff`1ce62d75 57 push rdi
00007fff`1ce62d76 4883ec20 sub rsp,20h
00007fff`1ce62d7a 498bf8 mov rdi,r8
00007fff`1ce62d7d 488bda mov rbx,rdx
00007fff`1ce62d80 85c9 test ecx,ecx
00007fff`1ce62d82 7517 jne KERNEL32!BaseThreadInitThunk+0x2b (00007fff`1ce62d9b)
00007fff`1ce62d84 488bca mov rcx,rdx
00007fff`1ce62d87 ff15d3390600 call qword ptr [KERNEL32!_guard_check_icall_fptr (00007fff`1cec6760)]
00007fff`1ce62d8d 488bcf mov rcx,rdi
00007fff`1ce62d90 ffd3 call rbx ; call entry point
00007fff`1ce62d92 8bc8 mov ecx,eax
00007fff`1ce62d94 ff15be2f0600 call qword ptr [KERNEL32!_imp_RtlExitUserThread (00007fff`1cec5d58)]
00007fff`1ce62d9a cc int 3
As you can see, returning from the entry point calls KERNEL32!_imp_RtlExitUserThread (which calls ExitProcess() for the main thread).
The closest you can get to the initialization itself is with TLS callbacks, as far as I'm aware; here is some explanation on how things work. TLS callbacks are executed before the entry point of the application, and they do have some limitations (that can be worked around with some effort).
As for measuring startup time, you should avoid trying to do it inside your own application; a separate process would be best for that (a debugger could do the trick in a much more reliable way).
Regarding a minimal executable, you can build an executable with only RET (as mentioned by @Neitsa); Windows will load the program into memory but will not execute anything; it will basically only map things into memory and that's all.
With FASM you can build an exe that does literally nothing, like the following:
include '%fasm%\win32ax.inc'
section 'a' code readable executable
start:
retn
.end start
I think my real problem is I don't completely understand the stack frame mechanism so I am looking to understand why the following code causes the program execution to resume at the end of the application.
This code is called from a C function which is several call levels deep and the pushf causes program execution to revert back several levels through the stack and completely exit the program.
Since my workaround works as expected, I would like to know why using the pushf instruction appears to be (I assume) corrupting the stack.
In the routines I usually set up and clean up the stack with:
sub rsp, 28h
...
add rsp, 28h
However I noticed that this is only necessary when the assembly code calls a C function.
So I tried removing this from both routines but it made no difference. SaveFlagsCmb is an assembly function but could easily be a macro.
The code represents an emulated 6809 CPU Rora (Rotate Right Register A).
PUBLIC Rora_I_A ; Op 46 - Rotate Right through Carry A reg
Rora_I_A PROC
sub rsp, 28h
; Restore Flags
mov cx, word ptr [x86flags]
push cx
popf
; Rotate Right the byte and save the FLAGS
rcr byte ptr [q_s+AREG], 1
; rcr only affects Carry. Save the Carry first in dx then
; add 0 to result to trigger Zero and Sign/Neg flags
pushf ; this causes jump to end of program ????
pop dx ; this line never reached
and dx, CF ; Save only Carry Flag
add [q_s+AREG], 0 ; trigger NZ flags
mov rcx, NF+ZF+CF ; Flag Mask NZ
Call SaveFlagsCmb ; NZ from the add and CF saved in dx
add rsp, 28h
ret
Rora_I_A ENDP
However if I use this code it works as expected:
PUBLIC Rora_I_A ; Op 46 - Rotate Right through Carry A reg
Rora_I_A PROC
; sub rsp, 28h ; works with or without this!!!
; Restore Flags
mov ah, byte ptr [x86flags+LSB]
sahf
; Rotate Right the byte and save the FLAGS
rcr byte ptr [q_s+AREG], 1
; rcr only affects Carry. Save the Carry first in dx then
; add 0 to result to trigger Zero and Sign/Neg flags
lahf
mov dl, ah
and dx, CF ; Save only Carry Flag
add [q_s+AREG], 0 ; trigger NZ flags
mov rcx, NF+ZF+CF ; Flag Mask NZ
Call SaveFlagsCmb ; NZ from the add and CF saved in dx
; add rsp, 28h ; works with or without this!!!
ret
Rora_I_A ENDP
Your reported behaviour doesn't really make sense. Mostly this answer is just providing some background not a real answer, and a suggestion not to use pushf/popf in the first place for performance reasons.
Make sure your debugging tools work properly and aren't being fooled by something into falsely showing a "jump" to somewhere. (And jump where exactly?)
There's little reason to mess around with 16-bit operand size, but that's probably not your problem.
In Visual Studio / MASM, apparently (according to OP's comment) pushf assembles as pushfw, 66 9C which pushes 2 bytes. Presumably popf also assembles as popfw, only popping 2 bytes into FLAGS instead of the normal 8 bytes into RFLAGS. Other assemblers are different (see footnote 1).
So your code should work. Unless you're accidentally setting some other bit in FLAGS that breaks execution? There are bits in EFLAGS/RFLAGS other than condition codes, including the single-step TF Trap Flag: debug exception after every instruction.
We know you're in 64-bit mode, not 32-bit compat mode, otherwise rsp wouldn't be a valid register. And running 64-bit machine code in 32-bit mode wouldn't explain your observations either.
I'm not sure how that would explain pushf being a jump to anywhere. pushf itself can't fault or jump, and if popf set TF then the instruction after popf would have caused a debug exception.
Are you sure you're assembling 64-bit machine code and running it in 64-bit mode? The only thing that would be different if a CPU decoded your code in 32-bit mode should be the REX prefix on sub rsp, 28h, and the RIP-relative addressing mode on [x86flags] decoding as absolute (which would presumably fault). So I don't think that could explain what you're seeing.
Are you sure you're single-stepping by instructions (not source lines or C statements) with a debugger to test this?
Use a debugger to look at the machine code as you single-step. This seems really weird.
Anyway, it seems like a very low-performance idea to use pushf / popf at all, and also to be using 16-bit operand-size creating false dependencies.
e.g. you can set x86 CF with movzx ecx, word ptr [x86flags] / bt ecx, CF.
You can capture the output CF with setc cl
Also, if you're going to do multiple things to the byte from the guest memory, load it into an x86 register. A memory-destination RCR and a memory-destination ADD are unnecessarily slow vs. load / rcr / ... / test reg,reg / store.
LAHF/SAHF may be useful, but you can also do without them too for many cases. popf is quite slow (https://agner.org/optimize/) and it forces a round trip through memory. However, there is one condition-code outside the low 8 in x86 FLAGS: OF (signed overflow). asm-source compatibility with 8080 is still hurting x86 in 2019 :(
You can restore OF from a 0/1 integer with add al, 127: if AL was originally 1, it will overflow to 0x80, otherwise it won't. You can then restore the rest of the condition codes with SAHF. You can extract OF with seto al. Or you can just use pushf/popf.
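The add al, 127 trick can be checked with a small model of an 8-bit add. This is only a sketch of the flag arithmetic, not emulator code. The helper name add8 is made up here, and the overflow rule used is the usual one: both operands have the same sign bit and the result has the opposite one.

```python
def add8(a, b):
    """8-bit add; returns (result, overflow) the way the CPU would set OF."""
    result = (a + b) & 0xFF
    # Signed overflow: the result's sign bit differs from both operands'.
    overflow = ((a ^ result) & (b ^ result) & 0x80) != 0
    return result, overflow

for saved_of in (0, 1):
    result, overflow = add8(saved_of, 127)
    print(f"AL={saved_of}: result={result:#04x}, OF={int(overflow)}")
```

With AL = 1 the sum is 0x80 and OF comes out set; with AL = 0 the sum is 0x7f and OF stays clear, so OF ends up equal to the saved 0/1 value.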
; sub rsp, 28h ; works with or without this!!!
Yes of course. You have a leaf function that doesn't use any stack space.
You only need to reserve another 40 bytes (align the stack + 32 bytes of shadow space) if you were going to make another function call from this function.
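For reference, the 40 (0x28) comes from simple arithmetic, sketched below. The sketch assumes the usual Windows x64 rules: RSP is 16-byte aligned just before a call, the call pushes an 8-byte return address, and the callee is owed 32 bytes of shadow space. The function name frame_size is made up for illustration.

```python
SHADOW_SPACE = 32    # bytes a caller must reserve for any callee
RETURN_ADDRESS = 8   # pushed by the call that entered this function

def frame_size(local_bytes=0):
    """Smallest reservation that leaves RSP 16-byte aligned at the next call."""
    size = SHADOW_SPACE + local_bytes
    # On entry RSP is 8 bytes past a 16-byte boundary because of the
    # return address, so the reservation has to absorb that too.
    while (size + RETURN_ADDRESS) % 16 != 0:
        size += 1
    return size

print(hex(frame_size()))  # 0x28, the value used in sub rsp, 28h
```

A leaf function that calls nothing never needs this reservation, which matches the observation that the code works with or without it.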
Footnote 1: pushf/popf in other assemblers:
In NASM, pushf/popf default to the same width as other push/pop instructions: 8 bytes in 64-bit mode. You get the normal encoding without an operand-size prefix. (https://www.felixcloutier.com/x86/pushf:pushfd:pushfq)
Like for integer registers, both 16 and 64-bit operand-size for pushf/popf are encodeable in 64-bit mode, but 32-bit operand size isn't.
In NASM, your code would be broken because push cx / popf would push 2 bytes and pop 8, popping 6 bytes of your return address into RFLAGS.
But apparently MASM isn't like that. Probably a good idea to use explicit operand-size specifiers anyway, like pushfw and popfw if you use it at all, to make sure you get the 66 9C encoding, not just 9C pushfq.
Or better, use pushfq and pop rcx like a normal person: only write to 8 or 16-bit partial registers when you need to, and keep the stack qword-aligned. (16-byte alignment before call, 8-byte alignment always.)
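The width mismatch described in this footnote is easy to see by adding up how far RSP moves. The sketch below only tracks byte counts. The two instruction sequences are the ones discussed above, with the operand sizes as this answer describes them for each assembler, and the helper name is made up.

```python
def net_rsp_change(ops):
    """Sum the RSP movement for a list of (operation, size_in_bytes) pairs."""
    delta = 0
    for op, size in ops:
        # A push moves RSP down by its operand size, a pop moves it up.
        delta += -size if op == "push" else size
    return delta

nasm_style = [("push", 2), ("pop", 8)]  # push cx ; popf assembled as popfq
masm_style = [("push", 2), ("pop", 2)]  # push cx ; popf assembled as popfw

print(net_rsp_change(nasm_style))  # 6: six bytes too many were popped
print(net_rsp_change(masm_style))  # 0: the stack is balanced
```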
I believe this is a bug in Visual Studio. I'm using 2022, so it's an issue that's been around for a while.
I don't know exactly what is triggering it, however stepping over one specific pushf in my code had the same symptoms, albeit with the code actually working.
Putting a breakpoint on the line after the pushf did break, and allowed further debugging of my app. Adding a push ax, pop ax before the pushf also seemed to fix the issue. So it must be a Visual Studio issue.
At this point I think MASM and debugging in Visual Studio have pretty much been abandoned. Any suggestions for alternatives for developing DLLs on Windows would be appreciated!
While investigating an executable file, I came across the piece of code below:
MOV EAX,11B9
MOV EDX,7FFE0300
CALL DWORD PTR DS:[EDX]
RETN 10
This is used to make a system call. Up to here, there is no problem.
I searched through all the system call numbers of Windows, but none of them is equal to the 11B9 in the first instruction, "MOV EAX,11B9".
Could anybody explain what exactly it means here?
Syscalls numbered 0x1XXX are calls to win32k.sys.
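As a rough sketch of why the leading 1 matters: the number is commonly described as two fields, with bits 12-13 selecting the service table and the low 12 bits giving the index within it. The table labels below are informal descriptions and the function name is made up for illustration.

```python
TABLE_NAMES = {0: "ntoskrnl (native services)", 1: "win32k.sys (GUI services)"}

def split_syscall_number(number):
    """Split a syscall number into (table selector, index within that table)."""
    table = (number >> 12) & 0x3   # bits 12-13 pick the service table
    index = number & 0xFFF         # low 12 bits index into that table
    return table, index

table, index = split_syscall_number(0x11B9)
print(TABLE_NAMES[table], hex(index))  # win32k.sys (GUI services) 0x1b9
```

So 0x11B9 is entry 0x1B9 of the win32k table, which is why it does not show up in a list of native (ntoskrnl) system calls.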
Here is a great table created and updated by j00ru showing the win32k syscall IDs for different versions of Windows:
According to the docs I can find on calling windows functions, the following applies:-
The Microsoft x64 calling convention[12][13] is followed on Windows
and pre-boot UEFI (for long mode on x86-64). It uses registers RCX,
RDX, R8, R9 for the first four integer or pointer arguments (in that
order), and additional arguments are pushed onto the stack (right to
left). Integer return values (similar to x86) are returned in RAX if
64 bits or less.
In the Microsoft x64 calling convention, it's the caller's
responsibility to allocate 32 bytes of "shadow space" on the stack
right before calling the function (regardless of the actual number of
parameters used), and to pop the stack after the call. The shadow
space is used to spill RCX, RDX, R8, and R9,[14] but must be made
available to all functions, even those with fewer than four
parameters.
The registers RAX, RCX, RDX, R8, R9, R10, R11 are considered volatile
(caller-saved).[15]
The registers RBX, RBP, RDI, RSI, RSP, R12, R13, R14, and R15 are
considered nonvolatile (callee-saved).[15]
So, I have been happily calling kernel32 until a call to GetEnvironmentVariableA failed under certain circumstances. I finally traced it back to the fact that the direction flag DF was set and I needed to clear it.
I have not up till now been able to find any mention of this and wondered if it was prudent to always clear it before a call.
Or maybe that would cause other problems. Anyone aware of the conventions of calling in this instance?
Windows assumes that the direction flag is cleared. Although the article talks only about the C run-time, this holds for all of Windows (I think because Windows itself is written primarily in C/C++). So when your program begins to execute, you can assume that DF is 0. Usually you do not need to change this flag. However, if you temporarily set it to 1 in some internal routine, you must clear it with cld before calling any Windows API or any external module, because that code assumes DF is 0.
All Windows interrupt handlers clear DF at the very beginning of their execution, so it is safe to set DF to 1 temporarily in your own internal code; the main thing is to reset it to 0 before any external call.
I have a rather complex, but extremely well-tested assembly language x86-32 application running on variety of x86-32 and x86-64 boxes. This is a runtime system for a language compiler, so it supports the execution of another compiled binary program, the "object code".
It uses Windows SEH to catch various kinds of traps: division by zero, illegal access, ... and prints a register dump using the context information provided by Windows, that shows the state of the machine at the time of the trap. (It does lots of other stuff irrelevant to the question, such as printing a function backtrace or recovering from the division by zero as appropriate). This allows the writer of the "object code" to get some idea what went wrong with his program.
It behaves differently on two Windows 7-64 systems, that are more or less identical, on what I think is an illegal memory access. The specific problem is that the "object code" (not the well-tested runtime system) somewhere stupidly loads 0x82 into EIP; that is a nonexistent page in the address space AFAIK. I expect a Windows trap through the SEH, and expect to see a register dump with EIP=00000082 etc.
On one system, I get exactly that register dump. I could show it here, but it doesn't add anything to my question. So, it is clear the SEH in my runtime system can catch this, and display the situation. This machine does not have any MS development tools on it.
On the other ("mystery") system, with the same exact binaries for runtime system and object code, all I get is the command prompt. No further output. FWIW, this machine has MS Visual Studio 2010 on it. The mystery machine is used heavily for other purposes, and shows no other funny behaviors in normal use.
I assume the behavior difference is caused by a Windows configuration somewhere, or something that Visual Studio controls. It isn't the DEP configuration in the system menu; they are both configured (vanilla) as "DEP for standard system processes". And my runtime system executable has "No (/NXCOMPAT:NO)" configured.
Both machines are i7 but different chips, 4 cores, lots of memory, different motherboards. I don't think this is relevant; surely both of these CPUs take traps the same way.
The runtime system includes the following line on startup:
SetErrorMode(SEM_FAILCRITICALERRORS | SEM_NOGPFAULTERRORBOX); // stop Windows pop-up on crashes
This was recently added to prevent the "mystery" system from showing a pop-up window, "xxx.exe has stopped working" when the crash occurs. The pop-up box behaviour doesn't happen on the first system, so all this did was push the problem into a different corner on the "mystery" machine.
Any clue where I look to configure/control this?
I provide here the SEH code I am using. It has been edited
to remove a considerable amount of sanity-checking code
that I claim has no effect on the apparent state seen
in this code.
The top level of the runtime system generates a set of worker
threads (using CreateThread) and points each to execute ASMGrabGranuleAndGo;
each thread sets up its own SEH, and branches off to a work-stealing scheduler, RunReadyGranule. To the best of my knowledge, the SEH is not changed
after that; at least, the runtime system and the "object code" do
not do this, but I have no idea what the underlying (e.g, standard "C")
libraries might do.
Further down I provide the trap handler, TopLevelEHFilter.
Yes, it's possible the register printing machinery itself blows
up causing a second exception. I'll try to check into this again soon,
but IIRC my last attempt to catch this in the debugger on the
mystery machine, did not pass control to the debugger, just
got me the pop up window.
public ASMGrabGranuleAndGo
ASSUME FS:NOTHING ; cancel any assumptions made for this register
ASMGrabGranuleAndGo:
;Purpose: Entry for threads as workers in PARLANSE runtime system.
; Each thread initializes as necessary, just once,
; It then goes and hunts for work in the GranulesQ
; and start executing a granule whenever one becomes available
; install top level exception handler
; Install handler for hardware exceptions
cmp gCompilerBreakpointSet, 0
jne HardwareEHinstall_end ; if set, do not install handler
push offset TopLevelEHFilter ; push new exception handler on Windows thread stack
mov eax, [TIB_SEH] ; expected to be empty
test eax, eax
BREAKPOINTIF jne
push eax ; save link to old exception handler
mov fs:[TIB_SEH], esp ; tell Windows that our exception handler is active for this thread
HardwareEHinstall_end:
;Initialize FPU to "empty"... all integer grains are configured like this
finit
fldcw RTSFPUStandardMode
lock sub gUnreadyProcessorCount, 1 ; signal that this thread has completed its initialization
@@: push 0 ; sleep for 0 ticks
call MySleep ; give up CPU (lets other threads run if we don't have enuf CPUs)
lea esp, [esp+4] ; pop arguments
mov eax, gUnreadyProcessorCount ; spin until all other threads have completed initialization
test eax, eax
jne @b
mov gThreadIsAlive[ecx], TRUE ; signal to scheduler that this thread now officially exists
jmp RunReadyGranule
ASMGrabGranuleAndGo_end:
;-------------------------------------------------------------------------------
TopLevelEHFilter: ; catch Windows Structured Exception Handling "trap"
; Invocation:
; call TopLevelEHFilter(&ReportRecord,&RegistrationRecord,&ContextRecord,&DispatcherRecord)
; The arguments are passed in the stack at an offset of 8 (<--NUMBER FROM MS DOCUMENT)
; ESP here "in the stack" being used by the code that caused the exception
; May be either grain stack or Windows thread stack
extern exit :proc
extern syscall @RTSC_PrintExceptionName@4:near ; FASTCALL
push ebp ; act as if this is a function entry
mov ebp, esp ; note: Context block is at offset ContextOffset[ebp]
IF_USING_WINDOWS_THREAD_STACK_GOTO unknown_exception, esp ; don't care what it is, we're dead
; *** otherwise, we must be using PARLANSE function grain stack space
; Compiler has ensured there's enough room, if the problem is a floating point trap
; If the problem is illegal memory reference, etc,
; there is no guarantee there is enough room, unless the application is compiled
; with -G ("large stacks to handle exception traps")
; check what kind of exception
mov eax, ExceptionRecordOffset[ebp]
mov eax, ExceptionRecord.ExceptionCode[eax]
cmp eax, _EXCEPTION_INTEGER_DIVIDE_BY_ZERO
je div_by_zero_exception
cmp eax, _EXCEPTION_FLOAT_DIVIDE_BY_ZERO
je float_div_by_zero_exception
jmp near ptr unknown_exception
float_div_by_zero_exception:
mov ebx, ContextOffset[ebp] ; ebx = context record
mov Context.FltStatusWord[ebx], CLEAR_FLOAT_EXCEPTIONS ; clear any floating point exceptions
mov Context.FltTagWord[ebx], -1 ; Marks all registers as empty
div_by_zero_exception: ; since RTS itself doesn't do division (that traps),
; if we get *here*, then we must be running a granule and EBX for granule points to GCB
mov ebx, ContextOffset[ebp] ; ebx = context record
mov ebx, Context.Rebx[ebx] ; grain EBX has to be set for AR Allocation routines
ALLOCATE_2TOK_BYTES 5 ; 5*4=20 bytes needed for the exception structure
mov ExceptionBufferT.cArgs[eax], 0
mov ExceptionBufferT.pException[eax], offset RTSDivideByZeroException ; copy ptr to exception
mov ebx, ContextOffset[ebp] ; ebx = context record
mov edx, Context.Reip[ebx]
mov Context.Redi[ebx], eax ; load exception into thread's edi
GET_GRANULE_TO ecx
; This is Windows SEH (Structured Exception Handler... see use of Context block below!
mov eax, edx
LOOKUP_EH_FROM_TABLE ; protected by DelayAbort
TRUST_JMP_INDIRECT_OK eax
mov Context.Reip[ebx], eax
mov eax, ExceptionContinueExecution ; signal to Windows: "return to caller" (we've revised the PC to go to Exception handler)
leave
ret
TopLevelEHFilter_end:
unknown_exception:
<print registers, etc. here>
"DEP for standard system processes" won't help you; it's internally known as "OptIn". What you need is the IMAGE_DLLCHARACTERISTICS_NX_COMPAT flag set in the PE header of your .exe file. Or call the SetProcessDEPPolicy function in kernel32.dll. SetProcessMitigationPolicy would also work, but it isn't available until Windows 8.
There's some nice explanation on Ed Maurer's blog, which explains both how .NET uses DEP (which you won't care about) but also the system rules (which you do).
BIOS settings can also affect whether hardware NX is available.
Recently I've been using lot of assembly language in *NIX operating systems. I was wondering about the Windows domain.
Calling convention in Linux:
mov $SYS_Call_NUM, %eax
mov $param1 , %ebx
mov $param2 , %ecx
int $0x80
That's it. That is how we make a system call in Linux.
Reference of all system calls in Linux:
Regarding which $SYS_Call_NUM & which parameters we can use this reference : http://docs.cs.up.ac.za/programming/asm/derick_tut/syscalls.html
OFFICIAL Reference : http://kernel.org/doc/man-pages/online/dir_section_2.html
Calling convention in Windows:
???
Reference of all system calls in Windows:
???
Unofficial : http://www.metasploit.com/users/opcode/syscalls.html , but how do I use these in assembly unless I know the calling convention.
OFFICIAL : ???
If you say they didn't document it, then how is one going to write a libc for Windows without knowing the system calls? How is one going to do Windows assembly programming? At least in driver programming one needs to know these, right?
Now, what's up with the so-called Native API? Are "Native API" and "system calls" on Windows two terms for the same thing? To check, I compared the two using these UNOFFICIAL sources:
System Calls: http://www.metasploit.com/users/opcode/syscalls.html
Native API: http://undocumented.ntinternals.net/aindex.html
My observations:
All system calls begin with the letters Nt, whereas the Native API contains a lot of functions that do not begin with Nt.
The system calls of Windows are a subset of the Native API; they are just one part of it.
Can any one confirm this and explain.
EDIT:
There was another answer. It was a 2nd answer. I really liked it but I don't know why answerer has deleted it. I request him to repost his answer.
If you're doing assembly programming under Windows you don't do manual syscalls. You use NTDLL and the Native API to do that for you.
The Native API is simply a wrapper around the kernelmode side of things. All it does is perform a syscall for the correct API.
You should NEVER need to manually syscall so your entire question is redundant.
Linux syscall codes do not change, Windows's do, that's why you need to work through an extra abstraction layer (aka NTDLL).
EDIT:
Also, even if you're working at the assembly level, you still have full access to the Win32 API, there's no reason to be using the NT API to begin with! Imports, exports, etc all work just fine in assembly programs.
EDIT2:
If you REALLY want to do manual syscalls, you're going to need to reverse NTDLL for each relevant Windows version, add version detection (via the PEB), and perform a syscall lookup for each call.
However, that would be silly. NTDLL is there for a reason.
People have already done the reverse-engineering part: see https://j00ru.vexillium.org/syscalls/nt/64/ for a table of system-call numbers for each Windows kernel. (Note that the later rows do change even between versions of Windows 10.) Again, this is a bad idea outside of personal-use-only experiments on your own machine to learn more about asm and/or Windows internals. Don't inline system calls into code that you distribute to anyone else.
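For a personal experiment, the "version detection plus lookup" step could look roughly like the sketch below. The table holds only the two NtCreateFile numbers that appear on this page. The version labels, the table name and the function name are made up for illustration, and a real table would have to be keyed by exact build.

```python
# NtCreateFile numbers as shown in the disassembly on this page only.
NT_CREATE_FILE_NUMBERS = {
    "Windows 8.1 x64": 0x54,
    "Windows 10 x64": 0x55,
}

def syscall_number_for(version_label):
    """Return the known number for this version, or refuse to guess."""
    try:
        return NT_CREATE_FILE_NUMBERS[version_label]
    except KeyError:
        # Failing is safer than issuing some other system call by accident.
        raise LookupError(f"no syscall number recorded for {version_label!r}")

print(hex(syscall_number_for("Windows 10 x64")))  # 0x55
```

Refusing on an unknown version is the important part: a stale number does not fail cleanly, it invokes whatever service now occupies that slot.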
The other thing you need to know about the windows syscall convention is that as I understand it the syscall tables are generated as part of the build process. This means that they can simply change - no one tracks them. If someone adds a new one at the top of the list, it doesn't matter. NTDLL still works, so everyone else who calls NTDLL still works.
Even the mechanism used to perform syscalls (which int, or sysenter) is not fixed in stone and has changed in the past, and I think that once upon a time the same version of windows used different DLLs which used different entry mechanisms depending on the CPU in the machine.
I was interested in doing a windows API call in assembly with no imports (as an educational exercise), so I wrote the following FASM assembly to do what NtDll!NtCreateFile does. It's a rough demonstration on my 64-bit version of Windows (Win10 1803 Version 10.0.17134), and it crashes out after the call, but the return value of the syscall is zero so it is successful. Everything is set up per the Windows x64 calling convention, then the system call number is loaded into RAX, and then it's the syscall assembly instruction to run the call. My example creates the file c:\HelloWorldFile_FASM, so it has to be run "as administrator".
format PE64 GUI 4.0
entry start
section '.text' code readable executable
start:
;putting the first four parameters into the right registers
mov rcx, _Handle
mov rdx, [_access_mask]
mov r8, objectAttributes
mov r9, ioStatusBlock
;I think we need 1 stack word of padding:
push 0x0DF0AD8B
;pushing the other params in reverse order:
push [_eaLength]
push [_eaBuffer]
push [_createOptions]
push [_createDisposition]
push [_shareAcceses]
push [_fileAttributes]
push [_pLargeInterger]
;adding the shadow space (4x8)
; push 0x0
; push 0x0
; push 0x0
; push 0x0
;pushing the 4 register params into the shadow space for ease of debugging
push r9
push r8
push rdx
push rcx
;now pushing the return address to the stack:
push endOfProgram
mov r10, rcx ;copied from ntdll!NtCreateFile; syscall overwrites rcx with the return address, so the first argument is passed in r10
mov eax, 0x55
syscall
endOfProgram:
retn
section '.data' data readable writeable
;parameters------------------------------------------------------------------------------------------------
_Handle dq 0x0
_access_mask dq 0x00000000c0100080
_pObjectAttributes dq objectAttributes ; at 00402058
_pIoStatusBlock dq ioStatusBlock
_pLargeInterger dq 0x0
_fileAttributes dq 0x0000000000000080
_shareAcceses dq 0x0000000000000002
_createDisposition dq 0x0000000000000005
_createOptions dq 0x0000000000000060
_eaBuffer dq 0x0000000000000000 ; "optional" param
_eaLength dq 0x0000000000000000
;----------------------------------------------------------------------------------------------------------
align 16
objectAttributes:
_oalength dq 0x30
_rootDirectory dq 0x0
_objectName dq unicodeString
_attributes dq 0x40
_pSecurityDescriptor dq 0x0
_pSecurityQualityOfService dq securityQualityOfService
unicodeString:
_unicodeStringLength dw 0x34
_unicodeStringMaxumiumLength dw 0x34, 0x0, 0x0
_pUnicodeStringBuffer dq _unicodeStringBuffer
_unicodeStringBuffer du '\??\c:\HelloWorldFile_FASM' ; may need to "run as administrator" for the file create to work.
ioStatusBlock:
_status_pointer dq 0x0
_information dq 0x0
securityQualityOfService:
_sqlength dd 0xC
_impersonationLevel dd 0x2
_contextTrackingMode db 0x1
_effectiveOnly db 0x1, 0x0, 0x0
I used the documentation for Ntdll!NtCreateFile, and I also used the kernel debugger to look at and copy a lot of the params.
__kernel_entry NTSTATUS NtCreateFile(
OUT PHANDLE FileHandle,
IN ACCESS_MASK DesiredAccess,
IN POBJECT_ATTRIBUTES ObjectAttributes,
OUT PIO_STATUS_BLOCK IoStatusBlock,
IN PLARGE_INTEGER AllocationSize OPTIONAL,
IN ULONG FileAttributes,
IN ULONG ShareAccess,
IN ULONG CreateDisposition,
IN ULONG CreateOptions,
IN PVOID EaBuffer OPTIONAL,
IN ULONG EaLength
);
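To cross-check the pushes in the listing against this prototype, here is a small sketch that maps each of the eleven parameters to its slot under the Windows x64 convention. It assumes the standard rules (first four integer or pointer arguments in RCX, RDX, R8, R9, the rest on the stack above 32 bytes of shadow space). Offsets are relative to RSP just before a call instruction, so add 8 once a return address has been pushed. The function name is made up.

```python
REGISTER_ARGS = ["rcx", "rdx", "r8", "r9"]
SHADOW_SPACE = 0x20  # reserved for the four register arguments

def arg_location(index):
    """Location of the zero-based integer/pointer argument before the call."""
    if index < len(REGISTER_ARGS):
        return REGISTER_ARGS[index]
    offset = SHADOW_SPACE + 8 * (index - len(REGISTER_ARGS))
    return f"[rsp+{offset:#x}]"

PARAMS = ["FileHandle", "DesiredAccess", "ObjectAttributes", "IoStatusBlock",
          "AllocationSize", "FileAttributes", "ShareAccess",
          "CreateDisposition", "CreateOptions", "EaBuffer", "EaLength"]

for i, name in enumerate(PARAMS):
    print(f"{name:18} {arg_location(i)}")
```

Seven parameters land on the stack, from AllocationSize at [rsp+0x20] up to EaLength at [rsp+0x50], which lines up with the seven pushes before the shadow-space pushes in the listing.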
Windows system calls are performed by calling into system DLLs such as kernel32.dll or gdi32.dll, which is done with ordinary subroutine calls. The mechanism for trapping into the OS privileged layer is undocumented, but that is okay because DLLs like kernel32.dll do this for you.
And by system calls, I'm referring to documented Windows API entry points like CreateProcess() or GetWindowText(). Device drivers will generally use a different API from the Windows DDK.
OFFICIAL Calling convention in Windows: http://msdn.microsoft.com/en-us/library/7kcdt6fy.aspx
(hope this link survives in the future; if it doesn't, just search for "x64 Software Conventions" on MSDN).
The function calling convention differs in Linux & Windows x86_64. In both ABIs, parameters are preferably passed via registers, but the registers used differ. More on the Linux ABI can be found at http://www.x86-64.org/documentation/abi.pdf