Do the APIs in kernel32.dll (or other DLLs) have subroutines? - windows

I was wondering whether the APIs in kernel32.dll (or other DLLs) have subroutines.
For example, the CopyFile function should take different actions to copy a file from C: to D: than to copy it from a network share path (\\HOSTNAME\SHAREDFOLDER\FILENAME) to somewhere else, or to trigger ODX, the new feature in Windows Server 2012 (Hyper-V).
So in the definition of the CopyFile function there should be some if/else branches that call sub-functions, shouldn't there?
If such subroutines exist, is it possible to call them directly, and is it possible to hook them?
Thanks.

As far as I know, the current implementation of kernel32.dll calls functions in ntdll.dll. The functions in ntdll.dll then do a syscall into the kernel somehow.
To answer your question, yes, it calls subroutines, and they probably can be hooked, but most of the logic about how specifically to read from and write to filesystems in different ways is probably buried in the kernel.
Keep in mind that you're probably not supposed to be digging into the internals of these DLLs; it's best to use the public interface. Relying on implementation details makes your code more fragile and likely to break with operating system upgrades.

Related

Does the actual machine code behind Win APIs stay in OS kernel memory space, or is it compiled as part of the app?

If this question deals with too basic a matter, please forgive me.
As a somewhat-close-to-beginner-level programmer, I really wonder about this: is the underlying code of every Win API function compiled into the app when it is built, or does the machine code that executes the Win APIs stay in memory as part of the OS from the time the PC boots, with the app merely using it?
All the APIs of an OS are used by many apps by means of function calls. So I thought that, rather than every individual app including the API machine code on its own, apps just contain the headers or signatures needed to call the APIs, and the addresses of the API machine code are mapped when the app is launched.
I am sorry that I failed to make this question succinct due to my poor English. I really would like to get your insights. Thank you.
The implementation for (most) API calls is provided by the system by way of compiled modules (Portable Executable images). Application code only contains enough information so that the system can identify and load the required modules, and resolve the respective imports.
As an example consider the following code that shows a message box, waits for it to close, and then exits the program:
#include <Windows.h>
int main()
{
    ::MessageBoxW(nullptr, L"Foo", L"Bar", MB_OK);
}
Given the function signature (declared in WinUser.h, which gets pulled in from Windows.h) the compiler can almost generate a call instruction. It knows the number of arguments, their expected types, and the order and location the callee expects them in. What's missing is the actual target address inside user32.dll. That is only known after the process has been fully initialized and has had the user32.dll module mapped into its address space.
Clearly, the compiler cannot postpone code generation until after load time. It needs to generate a call instruction now. Since we know that "all problems in computer science can be solved by another level of indirection" that's what the compiler does, too: Instead of emitting a direct call instruction it generates an indirect call. The difference is that, while a direct call immediately needs to provide the target address, an indirect call can specify the address at which the target address is stored.
In x86 assembly, instead of having to say
call _MessageBoxW@16 ; uh-oh, not yet known
the compiler can conveniently delegate the call to the Import Address Table (IAT):
call dword ptr [__imp__MessageBoxW@16]
Disaster averted: we've bought ourselves just enough time to fix things up before the code actually executes.
Once a process object is created the system hands over control to its primary thread to finish initialization. Part of that initialization is loading dependencies (such as user32.dll here). Once that has completed, the system finally knows the load address (and ultimately the address of imported symbols, such as _MessageBoxW@16), and can overwrite the IAT entry at address __imp__MessageBoxW@16 with the imported function address.
And that is approximately how the system provides implementations for system services without requiring client applications to know where (physically) they will find them.
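The indirection described above can be mimicked in portable C++ without any Windows headers. This is only a toy model, not how the loader is actually implemented: RealMessageBox, imp_MessageBox, ResolveImports and CallThroughImportSlot are made-up names standing in for the exported function, the IAT entry, the loader's fix-up step and the compiler-generated call site.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a function exported by a system DLL.
// It just returns the length of the text so the result is observable.
static int RealMessageBox(const std::string& text) {
    return static_cast<int>(text.size());
}

// Signature shared by the import slot and its target.
using MessageBoxFn = int (*)(const std::string&);

// Stand-in for one IAT entry: a pointer-sized slot in the executable image.
// It stays empty until the "loader" fills it in.
static MessageBoxFn imp_MessageBox = nullptr;

// Stand-in for the loader's fix-up step: once the target address is known,
// it is written into the slot.
static void ResolveImports() {
    imp_MessageBox = &RealMessageBox;
}

// Stand-in for the call site: the call goes through the slot (an indirect
// call), so this code never needs to know the target address itself.
static int CallThroughImportSlot(const std::string& text) {
    assert(imp_MessageBox != nullptr && "imports must be resolved before the first call");
    return imp_MessageBox(text);
}
```

Calling ResolveImports() once and then CallThroughImportSlot("Foo") reaches RealMessageBox, even though the call site was written before the target address existed in the slot.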
I'm saying "approximately" because things are somewhat more involved in reality. If that is something you'll want to learn about, I'll leave it up to Raymond Chen. He has published a series of blog entries covering this topic in far more detail:
How were DLL functions exported in 16-bit Windows?
How were DLL functions imported in 16-bit Windows?
How are DLL functions exported in 32-bit Windows?
Exported functions that are really forwarders
Rethinking the way DLL exports are resolved for 32-bit Windows
Calling an imported function, the naive way
How a less naive compiler calls an imported function
Issues related to forcing a stub to be created for an imported function
What happens when you get dllimport wrong?
Names in the import library are decorated for a reason
Why can't I GetProcAddress a function I dllexport'ed?

How are Windows API calls made on Assembly Level?

I've written some high level interpreters and a simple byte code compiler and interpreter and I want to start making a powerful intermediate language for my small operating system.
It has its own API just like Windows does, and the only thing that prevents me from starting this project is not knowing how these specific API calls (for example the Win32 forms API) are made at the assembly level.
Is there a way to see the assembly output of unoptimized C code, for example, and look at how exactly the calls are being made? Or are there any sources on the WWW?
Thanks in advance
Having C documentation for the API, and knowing the calling convention / ABI, should be enough to create asm that uses it. There's no "magic" needed (no inline syscall instructions or anything like that).
Much of the Win32 API is implemented in user-space DLLs, so API calls are no different from other library function calls. (i.e. an indirect CALL with a function pointer, if I recall correctly).
Often the library function implementation will involve a syscall to interact with the kernel (or for 32-bit code, maybe an int or sysenter, I'm not sure), but this interface is not documented and is not stable across different Windows versions.

Can I put LowLevelMouseProc and LowLevelKeyboardProc in the main EXE?

Global Windows hooks must be in a DLL because the hook is going to be called in the context of a different process, so the hook procedure's code must be injected into that process. However, there are limitations:
SetWindowsHookEx can be used to inject a DLL into another process. A 32-bit DLL cannot be injected into a 64-bit process, and a 64-bit DLL cannot be injected into a 32-bit process. If an application requires the use of hooks in other processes, it is required that a 32-bit application call SetWindowsHookEx to inject a 32-bit DLL into 32-bit processes, and a 64-bit application call SetWindowsHookEx to inject a 64-bit DLL into 64-bit processes. The 32-bit and 64-bit DLLs must have different names.
For this reason, I'd rather use the low-level hooks WH_MOUSE_LL and WH_KEYBOARD_LL, instead of WH_MOUSE and WH_KEYBOARD. As seen from their documentation:
This hook is called in the context of the thread that installed it. The call is made by sending a message to the thread that installed the hook. Therefore, the thread that installed the hook must have a message loop.
This leads me to think that these particular hook procedures do not need to be in a separate DLL, and can just live inside the EXE that hooked them up. The documentation for SetWindowsHookEx, however, says:
lpfn
[in] Pointer to the hook procedure. If the dwThreadId parameter is zero or specifies the identifier of a thread created by a different process, the lpfn parameter must point to a hook procedure in a DLL.
No explicit exception for the two low-level hooks is mentioned.
I have seen several .NET applications that use the low-level hooks without having their hook procedures in a separate DLL. That is another hint that this is acceptable. However, I'm a bit scared to do this myself since the documentation forbids it.
Does anyone foresee any trouble if I don't use a DLL and just put these low-level hook procedures straight into my EXE?
Edit: For the bounty, I would like a definitive "yes, this is ok, because..." or "no, this can go wrong, because...".
It turns out that this is actually documented, although not in the documentation of SetWindowsHookEx and friends but in a .NET knowledge base article:
Low-level hook procedures are called on the thread that installed the hook. Low-level hooks do not require that the hook procedure be implemented in a DLL.
There is one exception to the rule that a global hook function must live in a DLL. Low-level mouse and keyboard hooks are executed in the context of the calling process, not the process being hooked (internally, Windows notifies your hook via a Windows message). Therefore the hook code is not executed in an arbitrary process and can be written in .NET. See http://www.codeproject.com/KB/cs/CSLLKeyboardHook.aspx for an example.
For other hooks, though, you do need to call the 32-bit version of SetWindowsHookEx and pass a hook function in a 32-bit process, and call the 64-bit version of SetWindowsHookEx and pass a hook function in a 64-bit process.
Global hooks, whether low or high level, have to be in a separate DLL that can be loaded into each process. The documentation you quoted makes that pretty clear, and if there was an exception that applied to the low-level hooks, that documentation would say so as well.
Rule of thumb: When the docs say not to do something, there's usually a pretty good reason for it. While it may work in some cases, the fact that it works may be an implementation detail and subject to change. If the implementation is ever modified, your code will break.
Edit: I take back my previous answer. It turns out that WH_MOUSE_LL and WH_KEYBOARD_LL are exceptions to the usual rule about global hooks:
What is the HINSTANCE passed to SetWindowsHookEx used for?

Working around FLS limitations with too many statically linked CRTs?

When loading external DLLs (not under our control) via LoadLibrary, we're hitting a problem where the statically linked CRTs in those DLLs are failing to allocate fiber-local storage. This is similar to MSKB 193462, except that this is FLS and there are only 128 of them.
Are there any useful ways to work around the problem? The CRT is using GetProcAddress to find FlsAlloc anyway (since that apparently never existed in XP), so does it even really need it?
(This is on Vista, where FlsAlloc actually exists; the DLLs appear to be using MSVC8)
There is frankly no solution here, short of loading fewer DLLs.
You could hook the DLL's import address table, but that will happen too late: you can only install an IAT hook once LoadLibrary returns, and the CRT initialization code probably executes in response to DLL_PROCESS_ATTACH, which will already have been processed.
You could, I guess, find the kernel32.dll module in memory and patch the export address for GetProcAddress or perhaps FlsAlloc to point to your implementation. But that approach is getting seriously hackish.

How to intercept dll method calls?

How to intercept dll method calls?
What are the techniques available for it?
Can it be done only in C/C++?
How to intercept method calls from all running processes to a given dll?
How to intercept method calls from a given processes to a given dll?
There are two standard ways I can think of for doing this:
DLL import table hook.
For this you need to parse the PE header of the DLL, find the import table and write the address of your own function in place of what is already written there. You can save the address of the original function to be able to call it later. The references in the external links of this Wikipedia article should give you all the information you need to be able to do this.
Direct modification of the code.
Find the actual code of the function you want to hook and modify its first opcodes to jump to your own code. You need to save the opcodes that were there so they will eventually get executed. This is simpler than it sounds, mostly because it has already been implemented by no less than Microsoft themselves in the form of the Detours library.
This is a really neat thing to do. With just a couple of lines of code you can, for instance, replace all calls to GetSystemMetrics() from, say, outlook.exe and watch the wonders that occur.
The advantages of one method are the disadvantages of the other. The first method allows you to add a surgical hook to exactly the DLL you want, while all other DLLs go unhooked. The second method gives you the most global kind of hook, intercepting all calls to the function.
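The core of the first method (swap the pointer in the table, keep the old one so you can still call it) can be shown in a few lines of portable C++. This is a simplified model under stated assumptions: import_slot stands in for an import table entry that the loader has already resolved, and RealGetMetric, HookedGetMetric, InstallHook and RemoveHook are hypothetical names. A real hook would additionally have to locate the entry by parsing the PE headers and make the memory page writable before patching it.

```cpp
#include <cassert>

// Signature of the function being hooked (hypothetical).
using MetricFn = int (*)(int);

// Stand-in for the real function exported by some DLL.
static int RealGetMetric(int index) {
    return index * 10;
}

// Stand-in for the import table entry, already filled in by the loader.
static MetricFn import_slot = &RealGetMetric;

// Saved address of the original function, so the hook can still call it.
static MetricFn original_fn = nullptr;

// The replacement: forwards to the original and alters the result.
static int HookedGetMetric(int index) {
    assert(original_fn != nullptr);
    return original_fn(index) + 1;
}

// Save the current target, then overwrite the slot with the hook.
static void InstallHook() {
    if (original_fn == nullptr) {
        original_fn = import_slot;
        import_slot = &HookedGetMetric;
    }
}

// Put the saved target back into the slot.
static void RemoveHook() {
    if (original_fn != nullptr) {
        import_slot = original_fn;
        original_fn = nullptr;
    }
}
```

Only callers that go through import_slot see the hook. Code that calls RealGetMetric directly is unaffected, which is the "surgical" property of the import table approach.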
Provided that you know all the DLL functions in advance, one technique is to write your own wrapper DLL that will forward all function calls to the real DLL. This DLL doesn't have to be written in C/C++. All you need to do is to match the function calling convention of the original DLL.
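A wrapper of this kind boils down to one forwarding function per export. The following portable C++ sketch models that idea only: Real_Add plays the function in the real DLL, LookupRealExport is a hypothetical stand-in for looking up an export by name (on Windows that job would fall to LoadLibrary and GetProcAddress), and Wrapper_Add is the wrapper's export with the same signature.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical function living in the "real" library.
static int Real_Add(int a, int b) {
    return a + b;
}

using AddFn = int (*)(int, int);

// Stand-in for resolving an export of the real library by name.
// Returns nullptr for names the real library does not export.
static AddFn LookupRealExport(const char* name) {
    return std::strcmp(name, "Add") == 0 ? &Real_Add : nullptr;
}

// Counts intercepted calls; a real wrapper might log arguments instead.
static int g_forwarded_calls = 0;

// The wrapper's export: same signature as the original, so callers
// cannot tell the difference. It records the call and forwards it.
int Wrapper_Add(int a, int b) {
    static AddFn real = LookupRealExport("Add");  // resolved once, on first call
    assert(real != nullptr && "real library must export Add");
    ++g_forwarded_calls;
    return real(a, b);
}
```

Because the wrapper has to provide every function its callers import, this only works when the full export list and calling conventions are known in advance, as noted above.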
See Microsoft Detours for a library with a C/C++ API. It's a bit non-trivial to inject it in all other programs without triggering virus scanners/malware detectors. But your own process is fair game.
On Linux, this can be done with the LD_PRELOAD environment variable. Set this variable to point at a shared library that contains a symbol you'd like to override, then launch your app.

Resources