Can I put LowLevelMouseProc and LowLevelKeyboardProc in the main EXE? - winapi

Global Windows hooks must be in a DLL because the hook is going to be called in the context of a different process, so the hook procedure's code must be injected into that process. However, there are limitations:
SetWindowsHookEx can be used to inject a DLL into another process. A 32-bit DLL cannot be injected into a 64-bit process, and a 64-bit DLL cannot be injected into a 32-bit process. If an application requires the use of hooks in other processes, it is required that a 32-bit application call SetWindowsHookEx to inject a 32-bit DLL into 32-bit processes, and a 64-bit application call SetWindowsHookEx to inject a 64-bit DLL into 64-bit processes. The 32-bit and 64-bit DLLs must have different names.
For this reason, I'd rather use the low-level hooks WH_MOUSE_LL and WH_KEYBOARD_LL, instead of WH_MOUSE and WH_KEYBOARD. As seen from their documentation:
This hook is called in the context of the thread that installed it. The call is made by sending a message to the thread that installed the hook. Therefore, the thread that installed the hook must have a message loop.
This leads me to think that these particular hook procedures do not need to be in a separate DLL, and can just live inside the EXE that hooked them up. The documentation for SetWindowsHookEx, however, says:
lpfn
[in] Pointer to the hook procedure. If the dwThreadId parameter is zero or specifies the identifier of a thread created by a different process, the lpfn parameter must point to a hook procedure in a DLL.
No explicit exception for the two low-level hooks is mentioned.
I have seen several .NET applications that use the low-level hooks without having their hook procedures in a separate DLL. That is another hint that this is acceptable. However, I'm a bit scared to do this myself since the documentation forbids it.
Does anyone foresee any trouble if I don't use a DLL and just put these low-level hook procedures straight into my EXE?
Edit: For the bounty, I would like a definitive "yes, this is ok, because..." or "no, this can go wrong, because...".

It turns out that this is actually documented, although not in the documentation of SetWindowsHookEx and friends, but in a .NET knowledge base article:
Low-level hook procedures are called on the thread that installed the hook. Low-level hooks do not require that the hook procedure be implemented in a DLL.
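A minimal sketch of what that looks like when the hook procedure lives in the EXE itself. The helper is_key_down, the globals g_hook and g_key_downs, and the function run_keyboard_hook are hypothetical names chosen for illustration; the Win32 calls are the documented ones. The Windows-specific part is guarded so the small message filter can be compiled anywhere.

```cpp
#include <cassert>
#ifdef _WIN32
#include <windows.h>
#endif

// Same values as WM_KEYDOWN and WM_SYSKEYDOWN in WinUser.h, repeated here so
// the filter below compiles on any platform.
constexpr unsigned kWmKeyDown = 0x0100;
constexpr unsigned kWmSysKeyDown = 0x0104;

// Hypothetical helper: true for the two "key pressed" notifications.
constexpr bool is_key_down(unsigned message) {
    return message == kWmKeyDown || message == kWmSysKeyDown;
}

#ifdef _WIN32
static HHOOK g_hook = nullptr;
static unsigned long g_key_downs = 0;

// Hook procedure defined in the EXE. Windows calls it on the thread that
// installed the hook, so no DLL is injected anywhere.
static LRESULT CALLBACK LowLevelKeyboardProc(int code, WPARAM wParam, LPARAM lParam) {
    if (code == HC_ACTION && is_key_down(static_cast<unsigned>(wParam))) {
        ++g_key_downs;
    }
    // Always pass the event on to the next hook in the chain.
    return CallNextHookEx(g_hook, code, wParam, lParam);
}

// Installs the hook, pumps messages until WM_QUIT arrives, then unhooks.
static int run_keyboard_hook() {
    assert(g_hook == nullptr && "hook already installed");
    g_hook = SetWindowsHookExW(WH_KEYBOARD_LL, LowLevelKeyboardProc,
                               GetModuleHandleW(nullptr), 0);
    if (g_hook == nullptr) {
        return 1;
    }
    MSG msg;
    // The message loop is what lets the hook procedure be called at all.
    while (GetMessageW(&msg, nullptr, 0, 0) > 0) {
        TranslateMessage(&msg);
        DispatchMessageW(&msg);
    }
    UnhookWindowsHookEx(g_hook);
    g_hook = nullptr;
    return 0;
}
#endif
```

On Windows, calling run_keyboard_hook() from main would keep the hook alive until something posts WM_QUIT to that thread. The hook procedure should return quickly, since it runs inside the installing thread's message loop.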

There is one exception to the rule that a global hook function must live in a DLL. Low-level mouse and keyboard hooks are executed in the context of the calling process, not the process being hooked (internally, Windows notifies your hook via a Windows message). Therefore the hook code is not executed in an arbitrary process and can be written in .NET. See http://www.codeproject.com/KB/cs/CSLLKeyboardHook.aspx for an example.
For other hooks, though, you do need to call the 32-bit version of SetWindowsHookEx and pass a hook function from a 32-bit process, and call the 64-bit version of SetWindowsHookEx and pass a hook function from a 64-bit process.

Global hooks, whether low or high level, have to be in a separate DLL that can be loaded into each process. The documentation you quoted makes that pretty clear, and if there was an exception that applied to the low-level hooks, that documentation would say so as well.

Rule of thumb: When the docs say not to do something, there's usually a pretty good reason for it. While it may work in some cases, the fact that it works may be an implementation detail and subject to change. If the implementation is ever modified, your code will break.

Edit: I take back my previous answer. It turns out that WH_MOUSE_LL and WH_KEYBOARD_LL are exceptions to the usual rule about global hooks:
What is the HINSTANCE passed to SetWindowsHookEx used for?

Related

Does the machine code behind Win APIs stay in OS kernel memory space, or is it compiled as part of the app?

If this question deals with too basic a matter, please forgive me.
As a near-beginner programmer, I really wonder whether the underlying code of every Win API function is compiled into an app when the app is built, or whether the machine code for the Win APIs stays in memory as part of the OS from the time the PC boots, with apps merely using it.
All the APIs of an OS are used by many apps by means of function calls. So I thought that, rather than every individual app including the API machine code on its own, apps just contain the header or signature needed to call the APIs, and the addresses of the API machine code are mapped when the app is launched.
I am sorry that I failed to make this question succinct due to my poor English. I really would like to get your insights. Thank you.
The implementation for (most) API calls is provided by the system by way of compiled modules (Portable Executable images). Application code only contains enough information so that the system can identify and load the required modules, and resolve the respective imports.
As an example consider the following code that shows a message box, waits for it to close, and then exits the program:
#include <Windows.h>

int main()
{
    ::MessageBoxW(nullptr, L"Foo", L"Bar", MB_OK);
}
Given the function signature (declared in WinUser.h, which gets pulled in from Windows.h) the compiler can almost generate a call instruction. It knows the number of arguments, their expected types, and the order and location the callee expects them in. What's missing is the actual target address inside user32.dll; that is only known after the process has been fully initialized and has had the user32.dll module mapped into its address space.
Clearly, the compiler cannot postpone code generation until after load time. It needs to generate a call instruction now. Since we know that "all problems in computer science can be solved by another level of indirection" that's what the compiler does, too: Instead of emitting a direct call instruction it generates an indirect call. The difference is that, while a direct call immediately needs to provide the target address, an indirect call can specify the address at which the target address is stored.
In x86 assembly, instead of having to say
    call _MessageBoxW@16 ; uh-oh, not yet known
the compiler can conveniently delegate the call to the Import Address Table (IAT):
    call dword ptr [__imp__MessageBoxW@16]
Disaster averted: we've bought ourselves just enough time to fix things up before the code actually executes.
Once a process object is created the system hands over control to its primary thread to finish initialization. Part of that initialization is loading dependencies (such as user32.dll here). Once that has completed, the system finally knows the load address (and ultimately the address of imported symbols, such as _MessageBoxW@16), and can overwrite the IAT entry at address __imp__MessageBoxW@16 with the imported function address.
And that is approximately how the system provides implementations for system services without requiring client applications to know where (physically) they will find them.
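The indirection can be modeled in portable C++ with an ordinary function pointer. This is only a sketch of the idea, not what the loader literally does: real_message_box, imp_message_box, resolve_imports, and show are hypothetical stand-ins for the DLL export, the IAT slot, the loader's fix-up step, and the application code.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for a function exported by a system DLL.
static int real_message_box(const char* text) {
    return static_cast<int>(std::strlen(text));
}

// Stand-in for one Import Address Table slot. The compiler emits calls
// through this pointer; it stays empty until "load time".
using message_box_fn = int (*)(const char*);
static message_box_fn imp_message_box = nullptr;

// Stand-in for the loader: once the module is mapped and its address is
// known, the slot is overwritten with the real target.
static void resolve_imports() {
    imp_message_box = &real_message_box;
}

// Application code: an indirect call through the slot, the analogue of the
// indirect call instruction that goes through the IAT.
static int show(const char* text) {
    assert(imp_message_box != nullptr && "imports must be resolved first");
    return imp_message_box(text);
}
```

The caller never names the target's address. It only knows where the address will be stored, which is what lets the same compiled code work wherever the DLL ends up being loaded.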
I'm saying "approximately" because things are somewhat more involved in reality. If that is something you'll want to learn about, I'll leave it up to Raymond Chen. He has published a series of blog entries covering this topic in far more detail:
How were DLL functions exported in 16-bit Windows?
How were DLL functions imported in 16-bit Windows?
How are DLL functions exported in 32-bit Windows?
Exported functions that are really forwarders
Rethinking the way DLL exports are resolved for 32-bit Windows
Calling an imported function, the naive way
How a less naive compiler calls an imported function
Issues related to forcing a stub to be created for an imported function
What happens when you get dllimport wrong?
Names in the import library are decorated for a reason
Why can't I GetProcAddress a function I dllexport'ed?

Do the APIs in kernel32.dll (or others) have subroutines?

I was wondering whether the APIs in kernel32.dll (or others) have subroutines.
For example, the CopyFile function presumably has to act differently when copying a file from C: to D: than when copying from a network share path (\\HOSTNAME\SHAREDFOLDER\FILENAME) to somewhere else, or when triggering ODX, the new feature in Windows Server 2012 (Hyper-V).
So in the definition of the CopyFile function there should be some if/else branches that call sub-functions, shouldn't there?
If these subroutines exist, is it possible to call them directly, and is it possible to hook them?
Thanks.
As far as I know, the current implementation of kernel32.dll calls functions in ntdll.dll. The functions in ntdll.dll then do a syscall into the kernel somehow.
To answer your question, yes, it calls subroutines, and they probably can be hooked, but most of the logic about how specifically to read from and write to filesystems in different ways is probably buried in the kernel.
Keep in mind that you're probably not supposed to be digging into the internals of these DLLs — it's best to use the public interface. Relying on implementation details makes your code more fragile and likely to break with operating system upgrades.

Is it safe to call LoadLibrary from DllMain if you've used a kernel driver to ensure yours is the first library loaded?

I've been looking at some hooking code which selectively loads a library into certain processes and then hooks certain native API functions (using Detours). The chain of events looks like this:
• Kernel driver loads A.dll into every process.
• A.dll::DllMain() decides whether to load B.dll (LoadLibraryEx) which contains actual Detours hooks.
• B.dll runs for the duration of the process hooking said functions.
The second bullet here appears to break the DllMain rules specified here, but I'm trying to work out if the way the driver loads A.dll works around the limitations. Specifically, the kernel driver uses PsSetLoadImageNotifyRoutine to get notifications when each process starts and then queues an APC to call LoadLibraryEx on A.dll which means it's pretty much the first DLL loaded when the process starts. Does this circumvent the problems with calling LoadLibrary within DllMain?
It doesn't matter how the LoadLibraryEx call was triggered. Once triggered, the DLL loading process is the same, and the same rules apply.
The documentation very specifically says not to call LoadLibrary in DllMain. Even in the unlikely event that you figured out a safe way to make it work, it may not work in the next version (or even the next service pack) of Windows.

Pthread win32 library, PTHREAD_PROCESS_SHARED not supported

I am using the pthread win32 library to implement mqueue.
But when it reaches the following code, it fails with error 40, which should be ENOSYS, meaning the function is not supported by the system.
pthread_mutexattr_setpshared(&mattr, PTHREAD_PROCESS_SHARED);
i = pthread_mutex_init(&mqhdr->mqh_lock, &mattr);
pthread_mutexattr_destroy(&mattr); /* be sure to destroy */
i is 40 after the call fails. Does anybody have an idea about this? Or do you have an alternative solution, such as a Win32 thread function that could replace it?
Note: Has anyone successfully implemented an mqueue on Win32?
Thanks
You will want to read up on the Windows interprocess synchronization functions.
For an inter-process mutex in Windows, one choice is to implement your own using shared memory and InterlockedCompareExchange (spin, then sleep or wait on an Event).
The other choice, easier to program but not as performant, is to use the OS-provided named Mutex object. These perform about 10 times worse than a CriticalSection used between threads of one process.
In my own production code, which I was porting from Linux pthreads, I played with the first solution but ended up releasing the code using the Mutex solution. It was more reliable and I was sure it would work in all cases.
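A rough, portable sketch of the first option (spin, then sleep), not the production code mentioned above. It assumes std::atomic's compare-exchange as a stand-in for InterlockedCompareExchange, and the type name SpinSleepLock and the spin limit of 1000 are made up for illustration. For real cross-process use the lock word would have to live in a shared memory mapping, which this sketch does not set up.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical lock built on a single word: 0 = free, 1 = held.
struct SpinSleepLock {
    std::atomic<int> state{0};

    // One compare-and-swap attempt; succeeds only if the lock was free.
    bool try_lock() {
        int expected = 0;
        return state.compare_exchange_strong(expected, 1,
                                             std::memory_order_acquire);
    }

    // Spin for a bounded number of attempts, then back off by sleeping.
    void lock() {
        int spins = 0;
        while (!try_lock()) {
            if (++spins < 1000) {
                continue;
            }
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }

    void unlock() {
        assert(state.load() == 1 && "unlock called without holding the lock");
        state.store(0, std::memory_order_release);
    }
};
```

Unlike a named Mutex object, a lock like this is not released automatically if the owning process dies while holding it, which is one reason the OS-provided object can be the more reliable choice.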
I recognize the code you are using. Just comment out these two lines in the code:
pthread_mutexattr_setpshared(&mattr, PTHREAD_PROCESS_SHARED);
pthread_condattr_setpshared(&cattr, PTHREAD_PROCESS_SHARED);
It works fine as an intra-process message queue, unless you need it across processes.
I don't know if you feel comfortable hacking inside the Win32 PThread library, but, while the full PTHREAD_PROCESS_SHARED behavior cannot be attained, it IS possible to duplicate handles to kernel objects into other processes using the DuplicateHandle API, so it should be possible to add some Windows-specific extensions (that would compile out in Unix builds) that allow a mutex to be shared between processes.
• A child process created by the CreateProcess function can inherit a handle to a mutex object if the lpMutexAttributes parameter of CreateMutex enabled inheritance. This mechanism works for both named and unnamed mutexes.
• A process can specify the handle to a mutex object in a call to the DuplicateHandle function to create a duplicate handle that can be used by another process. This mechanism works for both named and unnamed mutexes.
• A process can specify a named mutex in a call to the OpenMutex or CreateMutex function to retrieve a handle to the mutex object.
I believe that is Aurelio Medina's code from 2000.
Unfortunately, his test code was a single process, so it didn't care whether the PTHREAD_PROCESS_SHARED flag was set or not, since pthreads-32 has never supported it. When he built it in 2000, I bet that pthreads didn't even throw an error, so his test code ran fine.
Unfortunately for all of us, it seems he died in 2013, so he's not going to finish his opus.
I've taken up the torch and rewritten the mutex/signal handling to use native Windows mutexes and events. Please look here for the code:
https://github.com/marklakata/mqueue-w32

How to intercept dll method calls?

How to intercept dll method calls?
What are the techniques available for it?
Can it be done only in C/C++?
How to intercept method calls from all running processes to a given dll?
How to intercept method calls from a given processes to a given dll?
There are two standard ways I can think of for doing this:
• DLL import table hook. For this you need to parse the PE header of the DLL, find the import table, and write the address of your own function in place of the one already there. You can save the address of the original function so you can call it later. The references in the external links of this Wikipedia article should give you all the information you need to do this.
• Direct modification of the code. Find the actual code of the function you want to hook and modify its first opcodes to jump to your own code. You need to save the opcodes that were there so they eventually get executed. This is simpler than it sounds, mostly because it has already been implemented by no less than Microsoft themselves in the form of the Detours library.
This is a really neat thing to do. With just a couple of lines of code you can, for instance, replace all calls to GetSystemMetrics() from, say, outlook.exe and watch the wonders that occur.
The advantages of one method are the disadvantages of the other. The first method lets you add a surgical hook to exactly the DLL you want, leaving all other DLLs unhooked. The second method gives you the most global kind of hook, intercepting all calls to the function.
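The import table approach can be modeled in portable C++ as swapping one function pointer and remembering the old value. This is a simplified sketch: import_slot stands in for a single module's import table entry, and real_get_metric, hooked_get_metric, install_hook, and remove_hook are hypothetical names. A real hook would locate the slot by parsing the PE header and would have to make the page writable first.

```cpp
#include <cassert>

using metric_fn = int (*)(int);

// Hypothetical stand-in for the real imported function.
static int real_get_metric(int index) { return index * 10; }

// Stand-in for the import table slot of ONE module. Only calls made through
// this slot are affected by the hook.
static metric_fn import_slot = &real_get_metric;

// Saved original address, so the hook can still call through to it.
static metric_fn original_get_metric = nullptr;

static int hooked_get_metric(int index) {
    // Adjust the result, then defer to the original implementation.
    return original_get_metric(index) + 1;
}

static void install_hook() {
    assert(original_get_metric == nullptr && "hook already installed");
    original_get_metric = import_slot;
    import_slot = &hooked_get_metric;
}

static void remove_hook() {
    assert(original_get_metric != nullptr && "hook not installed");
    import_slot = original_get_metric;
    original_get_metric = nullptr;
}
```

Code that calls real_get_metric directly, or through another module's slot, is untouched. That is the "surgical" property described above, and also why this method misses callers that the second method would catch.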
Provided that you know all the DLL functions in advance, one technique is to write your own wrapper DLL that will forward all function calls to the real DLL. This DLL doesn't have to be written in C/C++. All you need to do is to match the function calling convention of the original DLL.
See Microsoft Detours for a library with a C/C++ API. It's a bit non-trivial to inject it in all other programs without triggering virus scanners/malware detectors. But your own process is fair game.
On Linux, this can be done with the LD_PRELOAD environment variable. Set this variable to point at a shared library that contains a symbol you'd like to override, then launch your app.

Resources