How does the mingw32-g++ compiler know where to inject system calls into the Win32 machine executable? - winapi

Today, purely for testing purposes, I had the following idea: create and compile a naive source file in CodeBlocks, using the Release target to strip the unnecessary debugging code, with a main function containing only three nop instructions so I could quickly find where the entry point of the main function is.
CodeBlocks sample naive program:
Using the IDA disassembler, I saw something strange: the OS can actually add extra machine-code calls to the main function (added implicitly), a call to a system function that resides in kernel32.dll and is used for OS thread handling.
IDA program view:
In the machine code, purely for testing, the three "nop" (90) opcodes were replaced by "and esp, 0FFFFFFF0h" and the program was re-patched; that is why the "no operation" opcodes are not visible in the view.
Observed behaviour:
It seems logical to create a new thread for each process that is started; as we can see in the Task Manager, a process runs in its own thread, and that is presumably why the compiler adds this code (the implicit default thread).
My questions:
How does the compiler know where to "inject" this call automatically?
Why is this call not made earlier, in the enclosing function (sub_401B8C) that routes to the main function's entry point?

To quote the gcc manual:
If no init section is available, when GCC compiles any function called main (or, more accurately, any function designated as a program entry point by the language front end calling expand_main_function), it inserts a procedure call to __main as the first executable code after the function prologue. The __main function is defined in libgcc2.c and runs the global constructors.
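A quick way to see this for yourself is a minimal sketch like the following (assuming MinGW gcc; the exact symbol spelling, e.g. __main vs ___main, and the constructor mechanism depend on the toolchain version). Compiling with gcc -O2 -S and looking at the top of main should show the inserted call right after the prologue:

#include <stdio.h>

/* A GCC constructor: on MinGW targets (which have no .init section) this is
   the kind of work that the inserted __main call arranges before main's body runs. */
__attribute__((constructor))
static void early(void)
{
    puts("constructor runs before the body of main");
}

int main(void)
{
    puts("main body");
    return 0;
}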

Related

Entry point and main method in PE executable Windows

How can I find the main method in a PE executable file? Should I find the entry point address and start from that point, or look for three pushes onto the stack in case the PE is written in C?
There are not going to be 3 pushes because main is not the real entry point on Windows. The compiler will insert extra code that initializes things and then calls main/WinMain. There is probably too much code between the real start and main to automate finding main. You would have to consider multiple versions of Visual Studio and MinGW. And some exe files do not use the C run-time at all and execute directly from the real entrypoint.
The entry point is a function that takes zero arguments. Its address is the load address of the .exe (Starting with MZ) + IMAGE_OPTIONAL_HEADER.AddressOfEntryPoint.
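As a hedged sketch of that arithmetic, here is how the entry-point address can be computed from the headers of an already-loaded module (error handling omitted):

#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* Load address of our own image; the same math applies to any mapped PE file. */
    BYTE *base = (BYTE *)GetModuleHandle(NULL);

    IMAGE_DOS_HEADER *dos = (IMAGE_DOS_HEADER *)base;               /* starts with "MZ" */
    IMAGE_NT_HEADERS *nt  = (IMAGE_NT_HEADERS *)(base + dos->e_lfanew);

    /* AddressOfEntryPoint is an RVA, so add it to the load address. */
    void *entry = base + nt->OptionalHeader.AddressOfEntryPoint;

    printf("load address : %p\n", (void *)base);
    printf("entry RVA    : 0x%lx\n", (unsigned long)nt->OptionalHeader.AddressOfEntryPoint);
    printf("entry point  : %p\n", entry);
    return 0;
}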

Why is ExitProcess necessary under Win32 when you can use a RET?

I've noticed that many assembly language examples built using straight Win32 calls (no C Runtime dependency) illustrate the use of an explicit call to ExitProcess() to end the program at the end of the entry-point code. I'm not talking about using ExitProcess() to exit at some nested location within the program. There are surprisingly fewer examples where the entry-point code simply exits with a RET instruction. One example that comes to mind is the famous TinyPE, where the program variations exit with a RET instruction, because a RET instruction is a single byte. Using either ExitProcess() or a RET both seem to do the job.
A RET from an executable's entry-point returns the value of EAX back to the Windows loader in KERNEL32, which ultimately propagates the exit code back to NtTerminateProcess(), at least on Windows 7. On Windows XP, I think I remember seeing that ExitProcess() was even called directly at the end of the thread-cleanup chain.
Since there are many respected optimizations in assembly language that are chosen purely on generating smaller code, I wonder why more code floating around prefers the explicit call to ExitProcess() rather than RET. Is this habit or is there another reason?
In its purest sense, wouldn't a RET instruction be preferable to a direct call to ExitProcess()? A direct call to ExitProcess() seems akin to exiting your program by killing it from the Task Manager, since it short-circuits the normal flow of returning to where the Windows loader called your entry point and thus skips various thread cleanup operations.
I can't seem to locate any information specific to this issue, so I was hoping someone could shed some light on the topic.
If your main function is being called from the C runtime library, then exiting will result in a call to ExitProcess() and the process will exit.
If your main function is being called directly by Windows, as may well be the case with assembly code, then exiting will only cause the thread to exit. The process will exit if and only if there are no other threads. That's a problem nowadays, because even if you didn't create any threads, Windows may have created one or more on your behalf.
As far as I know this behaviour is not properly documented, but is described in Raymond Chen's blog post, "If you return from the main thread, does the process exit?".
(I have also tested this myself on both Windows 7 and Windows 10 and confirmed that they behaved as Raymond describes.)
Addendum: in recent versions of Windows 10, the process loader is itself multi-threaded, so there will always be additional threads present when the process first starts.
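To make the difference concrete, here is a minimal CRT-free sketch (the build line is an assumption and varies by toolchain; with MSVC something like cl tiny.c /link /entry:start /subsystem:console kernel32.lib is typical, and entry-point calling-convention details differ between x86 and x64):

#include <windows.h>

static DWORD WINAPI worker(LPVOID p)
{
    (void)p;
    Sleep(5000);   /* pretend to do background work */
    return 0;
}

/* Entered directly by the Windows loader, no C runtime in between. */
void start(void)
{
    CreateThread(NULL, 0, worker, NULL, 0, NULL);

    /* A plain RET here would only end this thread; with the worker (or any
       loader-created thread) still alive, the process would keep running.
       ExitProcess ends every thread and the process itself. */
    ExitProcess(0);
}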

Indirect jumps for DLL function calls

Address fixup for calls to DLL functions is a multistage process: the linker directs the call instruction to an indirect jump instruction, and the indirect jump instruction to a word of memory in the import table in the .rdata section where the Windows program loader will place the address of the function when the DLL is loaded at runtime.
The indirect jump instruction must be generated by the linker because the compiler doesn't know the function will turn out to be in a DLL. Program file size is minimized by generating only one indirect jump instruction for each function, no matter how many places it's called from.
Given that, the obvious way to do it is to gather all the indirect jump instructions at the end of the text section, after all the compiler-generated code in all the object files, and that does seem to be what happens when I try a simple test case with the Microsoft linker /nodefaultlib switch (which generates a small enough executable that I can understand the full disassembly).
When I link a small program in the normal way with the C standard library, the resulting executable is large enough that I can't follow all of the disassembly, but as far as I can see, the indirect jump instructions seem to be scattered throughout the code in small groups of maybe three at a time.
Is there a reason for this that I'm missing?
The indirect jump instruction must be generated by the linker because
the compiler doesn't know the function will turn out to be in a DLL.
Actually, this is not always the case. If you mark the function with __declspec(dllimport), the compiler does know it will be a DLL import and in that case it can generate an indirect call:
; HMODULE = LoadLibrary("mylib");
push offset $SG66630
call [__imp__LoadLibraryA@4]
(__imp__LoadLibraryA@4 is the pointer to the import in the IAT)
If you do not use dllimport then the compiler generates a relative function call:
push offset $SG66630
call _LoadLibraryA@4
And in such case the linker has to generate a jump stub:
LoadLibraryA proc near
jmp [__imp__LoadLibraryA@4]
LoadLibraryA endp
And, in fact, it does group such jump stubs together (though possibly by compile unit and/or imported DLL, not 100% sure here).
Note: in the past, the linker did not explicitly generate jump stubs but took them from the import libraries, which contained complete object files with both the stubs and the structures necessary for generating the PE import directory. See this article for how it all worked: https://www.microsoft.com/msj/0498/hood0498.aspx
These days the import libraries have only the API and DLL names and the linker knows how to generate the necessary code and metadata for importing them.
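At the source level, the contrast described above looks roughly like this (mylib.dll and the frob functions are hypothetical; compile the unit with -c or /c and inspect the object's disassembly to see the two call forms):

/* Plain extern declaration: the compiler emits a relative "call _plain_frob@4",
   and the linker must later synthesize a "jmp [__imp__plain_frob@4]" stub. */
int __stdcall plain_frob(int x);

/* dllimport declaration: the compiler knows this comes from a DLL and emits
   the indirect "call [__imp__imported_frob@4]" itself, so no stub is needed. */
__declspec(dllimport) int __stdcall imported_frob(int x);

int demo(int x)
{
    return plain_frob(x) + imported_frob(x);
}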

What exactly happened with the Lisp REPL on JPL's DS-1?

I've heard the Google talk (http://www.youtube.com/watch?v=_gZK0tW8EhQ) by Ron Garret and read the paper (http://www.flownet.com/gat/jpl-lisp.html), but I'm not understanding how it worked to "correct" supposedly running code with a REPL. Was the DS-1's Lisp code running in some sort of virtual machine? Or was it "live" in the REPL's actual world? Or was the Lisp code an executable that got replaced? What exactly happened/happens when you dynamically change running Lisp code through a REPL?
Whereas most programs are built and distributed as an executable that contains only the necessary components to run the program, Lisp can be distributed as an image that contains not just the components for the specific program, but also much or all of the Lisp runtime and development environment.
The REPL is the quintessential mechanism for providing interactive access to a running Lisp environment. The two key components of the REPL, Read and Eval, expose much of the Lisp runtime system. For example, many Lisp systems today implement Eval by compiling the provided form (the one produced by the Reader) to machine code and then executing the result, in contrast to interpreting the form. Some systems, especially in the past, contained both an interpreter, which starts executing immediately and is well suited to interactive use, and a compiler, which produces better code. But modern systems are fast enough that the compilation step isn't noticeable, and they simply forgo the interpreter.
Of course, you can do very similar things today. A simple example is using SSH to reach the Linux box that's hosting PHP. Your PHP server is up and running and live, serving pages and requests. But you log in through SSH, go fix a PHP file, and as soon as you save that file, all of your users see the new result in real time -- the system updated itself on the fly.
The fact that PHP is running on a Linux runtime versus Lisp running on a Lisp runtime is a detail. The effect is the same. The fact that PHP isn't compiled is also a detail. For example, you can do the same thing on a Java server: modify a JSP, save it, and the JSP is converted into a Servlet as Java source code, compiled on the fly by the Java runtime, and then loaded into the executing container, replacing the old code.
Lisp's capability to do this is very nice, and it was very interesting back in the day. Today it's less so, as there are different systems providing similar capabilities.
Addenda:
No, Lisp is not a virtual machine; there's no need for it to be that complicated.
The key to the concept is dynamic dispatch. With dynamic dispatch there is some lookup involved before a function is invoked.
In a static language like C, locations of things are pretty much set in stone once the linker and loader have finished processing the executable in preparation to start executing.
So, in C if you have something simple like:
int add(int i) {
    return i + 1;
}
void main() {
    add(1);
}
After compiling, linking, and loading of the program, the address of the add function will be set in stone, and thus anything referring to that function will know exactly where to find it.
So, in assembly language: (note this is a contrived assembly language)
add: pop r1 ; pop R1 from the stack, loading the i parameter
add r1, 1; Add 1 to the parameter.
push r1 ; push result of function call
rts ; return from subroutine
main: push 1 ; Push parameter to function
call add ; call function
pop r1 ; gather (and ignore) the result
So, you can see here that add is fixed in place.
In something like Lisp, functions are referred to indirectly.
int add(int i) {
    return i + 1;
}
int (*add_ptr)(int) = &add;
void main() {
    (*add_ptr)(1);
}
In assembly you get:
add: pop r1 ; pop R1 from the stack, loading the i parameter
add r1, 1; Add 1 to the parameter.
push r1 ; push result of function call
rts ; return from subroutine
add_ptr: dw add ; put the address of the add routine in add_ptr
main: push 1 ; Push parameter to function
mov r1, add_ptr ; Put the contents of add_ptr into R1
call (r1) ; call function indirectly through R1
pop r1 ; gather (and ignore) the result
Now, you can see here that rather than calling add directly, it is called indirectly through add_ptr. A Lisp runtime has the capability of compiling new code, and when that happens, add_ptr is overwritten to point to the newly compiled code. You can see how the code in main never has to change; it will call whatever function add_ptr is pointing to.
Since almost all of the functions in Lisp are indirectly referenced through their symbols, a lot can change "behind the back" of a running system, and the system will continue to run.
When a function is recompiled, the old function code (assuming no other references) becomes eligible for garbage collection and will typically, eventually, go away.
You can also see that when the system is garbage collected, any code that gets moved (such as the code for the add function) can be relocated by the runtime, with its new location updated in add_ptr, so the system keeps operating even after code has been relocated by the garbage collector.
So, the key to it all, is to have your functions invoked through some lookup mechanism. Having this gives you a lot of flexibility.
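The same idea in a few lines of C (add_v1 and add_v2 are just illustrative stand-ins for the old and the recompiled definitions):

#include <stdio.h>

static int add_v1(int i) { return i + 1; }
static int add_v2(int i) { return i + 2; }   /* pretend this is the recompiled version */

/* Every caller goes through this pointer, never through add_v1 directly. */
static int (*add_ptr)(int) = add_v1;

int main(void)
{
    printf("%d\n", add_ptr(1));   /* old definition: prints 2 */
    add_ptr = add_v2;             /* "recompile": overwrite the lookup slot */
    printf("%d\n", add_ptr(1));   /* new definition: prints 3 */
    return 0;
}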
Note that you can also do this in a running C system, for example. You can put code in a dynamic library, load the library, execute the code, and if you want you can build a new dynamic library, close the old one, open the new one, and invoke the new code -- all in a "running" system. The dynamic library interface provides the lookup mechanism that isolates the code.
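As a hedged Win32 sketch of that last point (plugin.dll and its exported add function are hypothetical), reloading the library swaps in whatever code was rebuilt in the meantime:

#include <windows.h>
#include <stdio.h>

typedef int (*add_fn)(int);

/* Load (or reload) the library and look up its add() export. */
static add_fn load_add(HMODULE *out)
{
    *out = LoadLibraryA("plugin.dll");
    return *out ? (add_fn)GetProcAddress(*out, "add") : NULL;
}

int main(void)
{
    HMODULE lib = NULL;
    add_fn add = load_add(&lib);
    if (add)
        printf("add(1) = %d\n", add(1));

    /* Rebuild plugin.dll out of band, then swap the old code for the new. */
    if (lib) FreeLibrary(lib);
    add = load_add(&lib);
    if (add)
        printf("add(1) after reload = %d\n", add(1));

    if (lib) FreeLibrary(lib);
    return 0;
}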

XCode/gdb loses stack when debugging over calls to dynamic library functions on iOS

I've got an iOS project that links to an external static library written in C++. The static library makes calls to functions implemented by libstdc++, which is dynamically linked. For instance, I call the initialization function for this library (let's call it foo_init()) and it immediately calls setlocale().
The static library is compiled with -g, meaning debug symbols are around for me to step into code inside the debugger. I successfully step into foo_init(). When I attempt to Step Over the call to setlocale(), XCode doesn't quite do that. It ends up in a function called dyld_stub_setlocale. This function is a single jmp instruction to perform the dynamic load & function call.
I've tried Stepping Over/In/Out of dyld_stub_setlocale but they don't get me where I want, which is back into foo_init(). Step Over and Step In end up in stub_helpers, and Step Out acts like continue. If I try Step Over/In inside stub_helpers, XCode single steps and the stack window displaying foo_init() changes to ??. At this point, the decision tree for stepping in/out kind of explodes so I won't go into further details, but no combinations I've tried end up back to the line after the call to setlocale.
I am able to set a breakpoint for the line, hit continue, and have it work, but this is not a scalable solution for debugging a static library with which I am not very familiar.
Note that I tried to find a way to link libstdc++ statically instead, so I could avoid the dynamic loader issues, but Apple has removed the library from newer SDKs and I don't have the older ones.
Is there a linker or compiler option to make the code easier for the debugger to decipher?
