When I assemble a file using the GCC tools (from the MinGW package), calls to WINAPI functions in system DLLs take this form:
call label
...
ret
label: jmp dword [ExitProcess]
Instead of:
call dword [ExitProcess]
...
ret
How can I force GCC to call the .idata section pointers directly instead of generating that extra code?
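Simple solution:
Instead of writing this (it will cause a segmentation fault):
call *_MessageBoxA@16
write this:
call *__imp__MessageBoxA@16
and you will get an indirect call without the extra, useless code. (_MessageBoxA@16 is the address of the jmp stub itself, so an indirect call through it reads the stub's instruction bytes as if they were a pointer; __imp__MessageBoxA@16 is the import table slot that holds the real function address.)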
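A portable C sketch can make the two call paths concrete. It is only a model under stated assumptions: the import table slot is simulated by the function-pointer variable imp_MessageBoxA, the MinGW stub by thunk_MessageBoxA, and the DLL code by fake_MessageBoxA. All three names are hypothetical, and the stdcall convention itself is not modelled.

```c
#include <assert.h>

/* Toy model only: nothing here is a real Windows symbol. */
static int thunk_calls = 0;   /* counts hops through the stub */

/* Stands in for the real code inside user32.dll. */
static int fake_MessageBoxA(int hwnd, const char *text, const char *caption, int type)
{
    (void)hwnd; (void)text; (void)caption;
    return type + 1;          /* arbitrary result so callers can see it ran */
}

/* Stands in for the slot __imp__MessageBoxA@16: a variable holding a code
   address. The Windows loader fills it in; here it is initialised statically. */
static int (*imp_MessageBoxA)(int, const char *, const char *, int) = fake_MessageBoxA;

/* Stands in for the stub "_MessageBoxA@16: jmp dword [__imp__MessageBoxA@16]". */
static int thunk_MessageBoxA(int hwnd, const char *text, const char *caption, int type)
{
    thunk_calls++;            /* the extra hop that "call label / jmp [slot]" costs */
    return imp_MessageBoxA(hwnd, text, caption, type);
}

/* Models "call _MessageBoxA@16": call the stub, which forwards through the slot. */
static int call_via_thunk(void) { return thunk_MessageBoxA(0, "hi", "title", 0); }

/* Models "call *__imp__MessageBoxA@16": call straight through the slot. */
static int call_via_slot(void) { return (*imp_MessageBoxA)(0, "hi", "title", 0); }
```

Both call_via_thunk() and call_via_slot() reach the same code, but only the first goes through the extra stub, which thunk_calls records.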
I am learning assembly using fasm, and I am having trouble returning from main after a function call. With an empty program, it works fine:
format PE console
entry start
include 'win32a.inc'
section '.text' code executable
start:
push ebp
mov ebp, esp
leave
ret
section '.rdata' data readable
format_str db '%d', 10, 0
section '.idata' data readable import
library msvcrt, 'msvcrt.dll'
import msvcrt, printf, 'printf'
But if I add a function call (printf in this case), like so:
format PE console
entry start
include 'win32a.inc'
section '.text' code executable
start:
push ebp
mov ebp, esp
push esp
push format_str ;set to '%d',10,0 in the data section
call [printf]
add esp, 2*4
leave
ret
section '.rdata' data readable
format_str db '%d', 10, 0
section '.idata' data readable import
library msvcrt, 'msvcrt.dll'
import msvcrt, printf, 'printf'
The program prints successfully, but then it fails to exit and crashes.
What is happening in the function call that causes my return statement to fail and how can I correct for it?
The initial thread in a process basically looks like this:
call LdrLoadAllTheThings ; Might call TLS callbacks etc
call pe_entrypoint ; Your function
push somenumber
call ExitThread ; Exit this thread and possibly the process
A process ends after all of its threads have exited. Just returning works for very simple programs, but as soon as somebody calls CreateThread or one of the thread pool functions, the process no longer ends when you return: it sticks around as long as other threads are doing work or waiting. On older versions of Windows it was usually OK for console programs to just return, but as you have discovered, that only worked because the functions you called did not create new threads (which relies on internal implementation details). In a GUI program it is even less likely to work, and hard to debug, because things like PlaySound, or clicking on a standard UI element, might create a thread. That is why a hand-written entry point should end with an explicit call to ExitProcess rather than a ret.
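The lifetime rule described above can be captured in a toy model. It is a sketch under stated assumptions: Process, exit_thread, exit_process and run_entry_point_and_return are hypothetical names that only mimic the semantics of ExitThread and ExitProcess; nothing here calls the Windows API.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: a process is just a count of live threads,
   and it ends only when that count reaches zero. */
typedef struct {
    int live_threads;
    bool ended;
} Process;

/* Models ExitThread: the process ends only if this was the last thread. */
static void exit_thread(Process *p)
{
    assert(p->live_threads > 0);
    if (--p->live_threads == 0)
        p->ended = true;
}

/* Models ExitProcess: every thread is terminated at once. */
static void exit_process(Process *p)
{
    p->live_threads = 0;
    p->ended = true;
}

/* Models the initial thread: a plain "ret" from the PE entry point falls
   through to ExitThread. helper_threads is how many threads some library
   call started behind your back. */
static void run_entry_point_and_return(Process *p, int helper_threads)
{
    p->live_threads = 1 + helper_threads;
    p->ended = false;
    exit_thread(p);   /* what "ret" from the entry point amounts to */
}
```

With no helper threads, returning ends the process; with one helper thread, the process stays alive until exit_process is called.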
If you build a C/C++ application with the Microsoft toolchain and link with its runtime library, then your main function is not the real entry point. The real entry point is mainCRTStartup, and it basically works like this:
__declspec(noreturn) void __cdecl mainCRTStartup()
{
int code;
char **argv;
int argc = parse(GetCommandLine(), &argv);
call_constructors();
code = main(argc, argv); // Your main function
call_destructors_and_atexit_callbacks();
ExitProcess(code); // End this thread and all other threads
}
To begin, I'd like to say that I have sufficient background in assembly to understand most of what one needs to know to be a functional assembly programmer. Unfortunately, I do not understand how a Windows API call works in terms of the return address.
Here's some example code written in GAS assembly for Windows, using MinGW's assembler (as) and linker (ld):
.extern _ExitProcess@4
.text
.globl _main
_main:
pushl $0
call _ExitProcess@4
This code compiles and runs after assembling it:
as program.s -o program.o
and linking it:
ld program.o -o program.exe -lkernel32
From my understanding, Windows API calls take arguments via push instructions, as can be seen above. Then, during the call
call _ExitProcess@4
the return address for the function is placed on the stack. Then, and this is where I'm confused, the function pops all the arguments off the stack.
I am confused because the stack is last-in, first-out, so in my mind, popping the arguments off the stack would pop off the return address first: the arguments were pushed first and the return address came next, so technically the return address would be popped off first.
My question is: what does the stack look like after the arguments have been pushed and the return address has been placed on the stack by the call? How are the arguments and the return address popped off the stack by the function as it executes? And finally, how is the return address popped off the stack so that the function returns to that address?
Almost all Windows API functions use the stdcall calling convention. This works like the normal "cdecl" convention, except that, as you've seen, the called function is responsible for removing the arguments when it returns. It does this using the RET instruction, which takes an optional immediate operand. This operand is the number of bytes to pop off the stack after first popping off the return address (into EIP).
In both the cdecl and stdcall calling conventions, the arguments to a function aren't popped off the stack while the function is executing. They're left on the stack and accessed using ESP- or EBP-relative addressing. So when ExitProcess needs to access its argument, it uses an instruction like mov 4(%esp), %eax (on entry, when 0(%esp) holds the return address) or, after the usual push %ebp / mov %esp, %ebp prologue, mov 8(%ebp), %eax (0(%ebp) holds the saved EBP and 4(%ebp) the return address).
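The stack layout and the effect of RET with an immediate operand can be replayed in a small C model. This is a sketch, not real CPU code: Cpu, push, pop, load, call and ret_n are hypothetical helpers, addresses are byte offsets into a tiny array, and because ExitProcess never actually returns, the model stands for any stdcall function taking one DWORD argument.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated 32-bit stack that grows downward, like the real one. */
#define STACK_WORDS 16

typedef struct {
    uint32_t mem[STACK_WORDS];   /* mem[i] holds the dword at address i*4 */
    uint32_t esp, ebp, eip;
} Cpu;

/* push: decrement esp by 4, then store. */
static void push(Cpu *c, uint32_t v)
{
    assert(c->esp >= 4);
    c->esp -= 4;
    c->mem[c->esp / 4] = v;
}

/* pop: load from esp, then increment esp by 4. */
static uint32_t pop(Cpu *c)
{
    assert(c->esp / 4 < STACK_WORDS);
    uint32_t v = c->mem[c->esp / 4];
    c->esp += 4;
    return v;
}

/* Read a dword without moving esp, like mov 4(%esp), %eax. */
static uint32_t load(const Cpu *c, uint32_t addr) { return c->mem[addr / 4]; }

/* "call target": push the return address, then jump. */
static void call(Cpu *c, uint32_t return_address, uint32_t target)
{
    push(c, return_address);
    c->eip = target;
}

/* "ret n": pop the return address FIRST, then discard n bytes of arguments. */
static void ret_n(Cpu *c, uint32_t n)
{
    c->eip = pop(c);
    c->esp += n;
}
```

For example, starting from Cpu c = { .esp = STACK_WORDS * 4 }, push(&c, 0) followed by call(&c, 0x401005, 0x7000) leaves the return address at load(&c, c.esp) and the argument at load(&c, c.esp + 4). The argument is never popped individually: ret_n(&c, 4) sets eip to 0x401005 and then moves esp back to where it was before the push.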
I have started developing a small 16-bit OS under GCC/G++.
I am using a GCC cross-compiler, which I compiled under Cygwin. I am putting asm(".code16gcc\n") as the first line of each .CPP file and using Intel ASM syntax. The command lines for compiling and linking a .CPP file look like this:
G++: i586-elf-g++ -c $(CPP_FILE) -o $(OBJECT_OUTPUT) -nostdinc -ffreestanding -nostdlib -fno-builtin -fno-rtti -fno-exceptions -fpermissive -masm=intel
LD: i586-elf-ld -T $(LD_SCRIPT) $(OBJECT_OUTPUT) -o $(BINARY_OUTPUT)
The problem I am currently facing is the way GCC translates function-calling code into assembly.
To be more specific, instead of using the PUSH instruction to pass the arguments, GCC "calculates" the offsets relative to ESP the arguments should be located at, and then uses the MOV instruction to write the stack manually.
This is not beneficial for me, since I rely on the PUSH instruction in my assembly code. To illustrate my problem more clearly, take these two functions:
void f2(int x);
void f1(){
int arg = 8;
asm("mov eax, 5"); // note: super hacky unsafe use of GNU C inline asm
asm("push eax"); // Writing registers without declaring a clobber is UB
f2(arg);
asm("pop eax");
}
void f2(int x){
}
In function f1, I save EAX using the PUSH instruction, and I expect it to be restored to 5 after calling f2 and executing the "POP EAX" instruction. It turns out, however, that EAX becomes 8, not 5. That's because the assembly code GCC generates looks like this (I've included the source as well for clarity):
void f1()
C++: {
push ebp
mov ebp,esp
sub esp,byte +0x14
C++: int arg = 8;
mov dword [ebp-0x4],0x8
C++: asm("mov eax, 5");
mov eax,0x5
C++: asm("push eax");
push eax
C++: f2(arg);
mov eax,[ebp-0x4]
mov [dword esp],eax ; <== HERE'S THE PROBLEM: WHY NOT 'PUSH EAX'?!
call f2
C++: asm("pop eax");
pop eax
C++: }
o32 leave
o32 ret
void f2(int x)
C++: {
push ebp
mov ebp,esp
C++: }
pop ebp
o32 ret
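The overwrite in f1 can be replayed step by step with a small C model of the stack. This is a sketch under assumptions: Machine, push32, pop32 and eax_after_f1 are hypothetical names, the caller's EBP and f2's return address are dummy zeros, and the push-based branch is only a guess at the shape of code without pre-accumulated outgoing arguments, not actual GCC output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Minimal downward-growing stack model (not real CPU code). */
enum { WORDS = 16 };
typedef struct { uint32_t mem[WORDS]; uint32_t esp, eax; } Machine;

static void push32(Machine *m, uint32_t v) { m->esp -= 4; m->mem[m->esp / 4] = v; }
static uint32_t pop32(Machine *m) { uint32_t v = m->mem[m->esp / 4]; m->esp += 4; return v; }

/* Replays the body of f1 from the listing above and returns EAX at the end.
   accumulate_outgoing_args == true mimics "mov [esp], eax";
   false mimics a guessed "push eax" + "add esp, 4" sequence. */
static uint32_t eax_after_f1(bool accumulate_outgoing_args)
{
    Machine m = { .esp = WORDS * 4 };
    uint32_t ebp;

    push32(&m, 0);                 /* push ebp (dummy caller value) */
    ebp = m.esp;                   /* mov ebp, esp */
    m.esp -= 0x14;                 /* sub esp, 0x14: locals + reserved outgoing-arg space */
    m.mem[(ebp - 4) / 4] = 8;      /* mov dword [ebp-4], 8   (int arg = 8) */
    m.eax = 5;                     /* asm("mov eax, 5") */
    push32(&m, m.eax);             /* asm("push eax"): 5 now sits at [esp] */
    m.eax = m.mem[(ebp - 4) / 4];  /* mov eax, [ebp-4] */
    if (accumulate_outgoing_args) {
        m.mem[m.esp / 4] = m.eax;  /* mov [esp], eax: overwrites the saved 5 */
        push32(&m, 0);             /* call f2 (dummy return address) */
        (void)pop32(&m);           /* f2 returns */
    } else {
        push32(&m, m.eax);         /* push eax: argument goes below the saved 5 */
        push32(&m, 0);             /* call f2 */
        (void)pop32(&m);           /* f2 returns */
        m.esp += 4;                /* add esp, 4: caller removes the cdecl argument */
    }
    m.eax = pop32(&m);             /* asm("pop eax") */
    return m.eax;
}
```

The model shows the core issue: GCC's generated code assumes it owns the dword at [esp] because it reserved that space in the prologue, so a value pushed there by unannounced inline assembly gets clobbered.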
I have tried G++ compilation flags like -mpush-args and -mno-push-args, plus another one I can't remember, and GCC still doesn't want to use PUSH. The version I'm using is i586-elf-g++ (GCC) 4.7.2 (cross-compiler recompiled in Cygwin).
Thank you in advance!
UPDATE: Here's a webpage I've found: http://fixunix.com/linux/6799-gcc-function-call-pass-arguments-via-push.html
That just seems really stupid for GCC to do, considering that it limits the usability of inline assembly for complex stuff. :( Please leave an answer if you have a suggestion.
It took some luck to find a solution to this problem, but GCC finally does what I want it to do.
Here's what the GCC manual for version 4.7.2 states:
-mpush-args
-mno-push-args
Use PUSH operations to store outgoing parameters. This method is shorter
and usually equally fast as method using SUB/MOV operations and is enabled
by default. In some cases disabling it may improve performance because of
improved scheduling and reduced dependencies.
-maccumulate-outgoing-args
If enabled, the maximum amount of space required for outgoing arguments will
be computed in the function prologue. This is faster on most modern CPUs
because of reduced dependencies, improved scheduling and reduced stack usage
when preferred stack boundary is not equal to 2. The drawback is a notable
increase in code size. This switch implies ‘-mno-push-args’.
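I say I was lucky because -mpush-args does not work; what works instead is "-mno-accumulate-outgoing-args", which is not explicitly documented! (It is the negative form of -maccumulate-outgoing-args. Since, as quoted above, that switch implies -mno-push-args, turning it off is presumably what lets GCC use PUSH again.)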
I had a similar question recently, and I guess people didn't find it important. I found this undocumented option too; it works at least for GCC 4.8.1, but I don't know about the latest 4.9 version.
Someone said they get the "warning: stack probing requires -maccumulate-outgoing-args for correctness [enabled by default]" message.
To disable stack probing, use -mno-stack-arg-probe. So, to be safe, I'd pass all of these options:
-mpush-args -mno-accumulate-outgoing-args -mno-stack-arg-probe
This works for me now: GCC uses PUSH, the code is much smaller and better, and it is much easier to debug with OllyDbg.
I don't know how to ask this better, but why does this:
call ExitProcess
do the same as this?:
mov eax, ExitProcess
mov eax, [eax]
call eax
I would think that these would be equivalent:
call ExitProcess
mov eax, ExitProcess
call eax
When importing the code from a DLL, the symbol ExitProcess isn't actually the address of the code that exits your process (it's the address of the address). So, in that case, you have to dereference it to get the actual code address.
That means that you must use:
call [ExitProcess]
to call it.
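The "address of the address" idea maps directly onto a pointer to a function pointer in C. This is a toy model under stated assumptions, not real PE loading: ExitProcess_slot stands in for the import slot that the assembler symbol ExitProcess names, fake_exit_process stands in for the kernel32 code, and both names are hypothetical.

```c
#include <assert.h>

typedef int (*exit_fn)(unsigned code);

static unsigned last_exit_code = 0;

/* Stands in for the real code inside kernel32.dll. */
static int fake_exit_process(unsigned code)
{
    last_exit_code = code;
    return 1;
}

/* Stands in for the import slot: the loader stores the code address here. */
static exit_fn ExitProcess_slot = fake_exit_process;

/* mov eax, ExitProcess ; mov eax, [eax] ; call eax   (same as call [ExitProcess]) */
static int call_through_slot(unsigned code)
{
    exit_fn *eax = &ExitProcess_slot;  /* mov eax, ExitProcess: address of the slot */
    exit_fn target = *eax;             /* mov eax, [eax]: address of the code */
    return target(code);               /* call eax */
}
```

Skipping the dereference would mean calling &ExitProcess_slot itself, that is, jumping into the data slot as if it were code; the C type system refuses that without a cast, which mirrors why "mov eax, ExitProcess / call eax" goes wrong.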
For example, there's some code at this location containing the following:
;; Note how we use 'AllocConsole' as if it was a variable. 'AllocConsole', to
;; NASM, means the address of the AllocConsole "variable" ; but since the
;; pointer to the AllocConsole() Win32 API function is stored in that
;; variable, we need to call the address from that variable.
;; So it's "call the code at the address: whatever's at the address
;; AllocConsole" .
call [AllocConsole]
However, importing the DLL directly in user code is not the only way to get at the function. I'll explain why you're seeing both ways below.
The "normal" way of calling a DLL function is to mark it extern and then import it from the DLL:
extern ExitProcess
import ExitProcess kernel32.dll
:
call [ExitProcess]
Because that sets up the symbol to be an indirect reference to the code, you need to call it indirectly.
After some searching, it appears there is code in the wild that uses the naked form:
call ExitProcess
From what I can tell, this all seems to use the alink linker, which links with the win32.lib library file. It's possible that this library provides the stub for calling the actual DLL code, something like:
import ExitProcessActual kernel32.dll ExitProcess
global ExitProcess
ExitProcess:
jmp [ExitProcessActual]
In NASM, this would import the address of ExitProcess from the DLL and call it ExitProcessActual, keeping in mind that this address is an indirect reference to the code, not the address of the code itself.
It would then export the ExitProcess entry point (the one in this LIB file, not the one in the DLL) so that others could use it.
Then someone could simply write:
extern ExitProcess
:
call ExitProcess
to exit the process - the library would jump to the actual DLL code.
In fact, with a little more research, this is exactly what's happening. From the alink.txt file which comes with the alink download:
A sample import library for Win32 is included as win32.lib. All named exports in Kernel32, User32, GDI32, Shell32, ADVAPI32, version, winmm, lz32, commdlg and commctl are included.
Use:
alink -oPE file[.obj] win32.lib
to include it or specify
INCLUDELIB "win32"
in your source file.
This consists of a series of entries for import redirection - call MessageBoxA, and it jumps to [__imp_MessageBoxA], which is in the import table.
Thus calls to imports will run faster if call [__imp_importName] is used instead of call importName.
See test.asm, my sample program, which calls up a message box both ways:
includelib "win32.lib"
extrn MessageBoxA:near
extrn __imp_MessageBoxA:dword
codeseg
start:
push 0 ; OK button
push offset title1
push offset string1
push 0
call MessageBoxA
push 0 ; OK button
push offset title1
push offset string2
push 0
call large [large __imp_MessageBoxA]
(__imp_MessageBoxA is the symbol imported from the DLL, equivalent to my ExitProcessActual above).
I'm on a Windows 7 machine. I opened kernel32.dll in IDA, and IDA says that the address of the IsDebuggerPresent function is 0x77e2b020. I'm trying to call the function using inline assembly.
In VS2010, I tried the following code:
#include<iostream>
using namespace std;
int blah() {
__asm {
xor eax, eax
mov ebx, 0x77e2b020
call ebx
}
}
int main() {
cout<<blah();
return 0;
}
After building the exe, I can see that kernel32.dll is loaded.
I tried debugging the exe in OllyDbg and the error is an "Access violation" when the "call" instruction executes.
Yes, I know that calling the API directly from C++ is the best/right way to do this. I'm doing this for fun; I just don't understand why it doesn't work.
The address 0x77e2b020 is not static; you MUST call the function by name rather than by an explicit address.
When you reboot, the library will be loaded at a different address if ASLR is enabled. You also cannot guarantee the library load order, so that will affect the address too.
If you're trying to do an indirect call, consider using LoadLibrary and GetProcAddress to find the address of IsDebuggerPresent at runtime.
Another issue is that you're trashing eax and ebx. You should use pushad and popad to keep the registers safe whilst you do such inline assembly, for example:
__asm {
pushad
call IsDebuggerPresent
mov dbgPresent, eax
popad
}