The two key-state functions in the Windows API, GetKeyState() and GetAsyncKeyState(), both determine key state based on key up/down messages rather than the physical state of the key.
I am working on a program which manipulates input, using SendInput(), to release modifier keys (Alt, Ctrl, etc.), send input, and then re-press the modifier keys.
The problem is that I don't know whether the modifier keys are still physically pressed after the input is sent: I have sent the key-up event, and both of the functions above report the key as up regardless of the state of the physical key. So if I assume the keys are still down, the user is left with a dangling Ctrl-down that causes problems until the user presses and releases Ctrl (or any modifier key) again. Otherwise the key may be left up even though the physical key is still down.
So is there any way (preferably without anything too low level) to detect the physical key state? Windows-only methods are fine. Key monitoring (listening for key-up events) really isn't possible (or at least really, really not preferable).
You aren't giving the window manager enough time to process the input you just injected. Until it reaches the "update key states for GetAsyncKeyState" part of the code, GetAsyncKeyState will report the old value. (In particular, it won't reach that point until all low-level keyboard hooks have had a chance to inspect the action and possibly reject it.)
In other words, your code has a race condition, and that is what you are observing.
You are getting a bit confused here. In fact GetAsyncKeyState() does return the key state at the instant that GetAsyncKeyState() was called. On the other hand, GetKeyState() returns the key state based on the history of queued messages.
After much testing, I seem to have figured it out. MSDN states about GetKeyState() :
The key status returned from this function changes as a thread reads key messages from its message queue.
GetAsyncKeyState() still works on key up/down messages (not physical key state); it just doesn't wait for the message to be read. So if a key event is sent via SendInput(), it will still return an incorrect value. In fact, it will be incorrect before GetKeyState() is, because it is incorrect immediately after the call.
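To make the observation concrete, here is a toy model in Python. It is not the Windows API: the class and method names are made up for illustration, and it only assumes what the post reports, namely that the table behind a GetAsyncKeyState-like query follows input events (real or injected) rather than the switch itself.

```python
class ToyKeyboard:
    """Toy model: the OS-side key table follows input events, not the switch."""

    def __init__(self):
        self.physical_down = set()  # what the user's fingers are doing
        self.async_table = set()    # what a GetAsyncKeyState-like query sees

    def _apply(self, key, down):
        if down:
            self.async_table.add(key)
        else:
            self.async_table.discard(key)

    def hardware_event(self, key, down):
        # A real press/release changes both the switch and the table.
        if down:
            self.physical_down.add(key)
        else:
            self.physical_down.discard(key)
        self._apply(key, down)

    def inject(self, key, down):
        # An injected (SendInput-like) event updates only the table.
        self._apply(key, down)

    def async_is_down(self, key):
        return key in self.async_table


kb = ToyKeyboard()
kb.hardware_event("CTRL", True)  # user holds Ctrl
kb.inject("CTRL", False)         # program injects a Ctrl key-up

print(kb.async_is_down("CTRL"))     # False: the table followed the injected event
print("CTRL" in kb.physical_down)   # True: the key is still physically held
```

In this model nothing ever brings the table back in line with the held key until another real event for that key arrives, which matches the dangling-modifier symptom described in the question.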
A simple test to demonstrate this functionality is here (VS2010 solution) or just the source here.
My understanding is that TranslateMessage collates a sequence of key events and adds a WM_CHAR message to the queue if it results in a character. So on my computer what happens when I press Alt+1234 (6 relevant events/messages, including releasing the Alt) is the single character "Ê" comes out (in certain places).
Let's say I have a sequence of virtual key codes and related keypress data generated from the LL keyboard hook. Is there some way of using the Windows OS logic to translate this sequence into a real character? For example, could I construct contrived MSG structures, call TranslateMessage on them and then catch the WM_CHAR ensuing events? That seems very outside of Windows' expectations; haven't tried it yet but it seems like it could cause all kinds of subtle problems.
The cleanest solution I can think of so far is just to re-implement the logic myself to figure out the characters from the virtual codes. This is unfortunate of course since Windows internals already seem to know how to do this! Is there a better way?
I am aware of the existence of MapVirtualKeyA but this does not seem to handle a sequence of virtual key codes.
I am also aware that it is possible to hook all GetMessage calls, which could be used just to grab the WM_CHAR messages from every process. However this seems an extremely heavy solution: I need to make a separate DLL, unlike for the WH_KEYBOARD_LL hook, and then use some sort of IPC to send the characters back to my host process. Also, MSDN explicitly says that you should avoid using global hooks for anything other than debugging, and I need this to work on production machines.
I am also also aware of KeysConverter in .NET (I am fine to use .NET for this) but again this does not seem to deal with sequences of virtual keys like given above.
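For the re-implementation route, the Alt+numpad case from the example can be sketched in a few lines of Python. This rests on two assumptions that are not stated in the question: that a code typed without a leading zero is reduced modulo 256 and looked up in the OEM code page, and that the OEM code page on the machine in question is 850. The function name is hypothetical.

```python
def alt_code_to_char(digits: str, oem_codepage: str = "cp850") -> str:
    """Map the digits typed while holding Alt to a character.

    Assumption: codes without a leading zero wrap modulo 256 and are
    interpreted in the OEM code page (cp850 is a guess for this machine).
    """
    value = int(digits) % 256
    return bytes([value]).decode(oem_codepage)


print(alt_code_to_char("1234"))  # 1234 % 256 == 210 (0xD2), which is "Ê" in cp850
```

This covers only Alt+numpad entry; dead keys, AltGr combinations and layout-specific shift states would each need their own handling in a hand-written translator.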
I've done some research in this field (with a single input device, though) and found that in most situations the messages are sent in pairs: first WM_INPUT, then WM_KEYDOWN. So it is possible to link them together for filtering, i.e. WM_INPUT flags whether its corresponding WM_KEYDOWN should be withheld from the receiver (in my case I first discard all WM_KEYDOWN messages and then decide whether I need to send them on to their recipients). I simply assume that every subsequent WM_KEYDOWN belongs to the last WM_INPUT.
My exact question: can I seriously rely on that principle? Won't those messages get mixed up if I use multiple input devices?
There are some serious questions about its reliability already:
1. How do I distinguish repeated input from multiple devices? (The answer is obvious: I can't.)
2. Would WM_INPUT/WM_KEYDOWN pairs get mixed up with input from multiple devices, i.e. form a sequence like WM_INPUT, WM_INPUT, WM_KEYDOWN, WM_KEYDOWN?
Also, maybe it is possible to just discard all WM_KEYDOWN messages and generate all keyboard events myself? Although that would be technically quite difficult, because one WM_INPUT may produce multiple WM_KEYDOWNs (key repeat works that way: multiple WM_KEYDOWN, one WM_KEYUP).
Just in case, here's what I need to achieve:
I need to filter all messages by the time between them: all user input is filtered by the interval between keypresses. If two messages arrive less than 50 ms apart, I discard the first one; the second is held until its TTL expires and is then sent on to its recipient.
The difficulty is that there can be multiple input devices, and their timings will interfere with each other.
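The filtering rule described above can be sketched in Python as a small per-device filter. This is only one reading of the description: the class name, the `device_id` keys and the explicit `now_ms` timestamps are all hypothetical, and how a real program would obtain a device identifier for each message is left open.

```python
from dataclasses import dataclass, field

TTL_MS = 50  # the interval named in the description


@dataclass
class IntervalFilter:
    ttl_ms: int = TTL_MS
    # device_id -> (arrival time in ms, message); one held message per device
    pending: dict = field(default_factory=dict)

    def flush(self, now_ms):
        """Return held messages whose TTL has expired, oldest first."""
        ready = [(ts, dev) for dev, (ts, _) in self.pending.items()
                 if now_ms - ts >= self.ttl_ms]
        ready.sort(key=lambda item: item[0])
        return [self.pending.pop(dev)[1] for _, dev in ready]

    def push(self, device_id, now_ms, message):
        """Accept a new message; return messages that became deliverable."""
        delivered = self.flush(now_ms)
        # Whatever is still held for this device is younger than the TTL,
        # so the newcomer replaces (discards) it.
        self.pending[device_id] = (now_ms, message)
        return delivered


f = IntervalFilter()
out = []
out += f.push("kbd0", 0, "A")   # held
out += f.push("kbd1", 10, "B")  # different device, held independently
out += f.push("kbd0", 30, "C")  # 30 ms after A on kbd0: A is discarded
out += f.flush(100)             # B and C have both outlived their TTL
print(out)                      # ['B', 'C']
```

Keeping one held message per device is what stops the timings of different devices from interfering: "B" from kbd1 survives even though it arrived within 50 ms of messages from kbd0.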
I understand your issue having multiple devices and things getting messed up.
Every device has its own Product ID and Vendor ID, which differ from device to device, so what I suggest is to differentiate the devices on the basis of their Product and Vendor IDs.
I have been working on a HID device recently so this might help you too.
I figured out that the keyboard hook (WH_KEYBOARD) actually fires before the WM_KEYDOWN message. I can't check whether simultaneous input from several devices will mess up the order of WM_INPUT and keyboard hook events (e.g. the sequence Dev0_WM_INPUT, Dev1_WM_INPUT, Dev0_KBDHook, Dev1_KBDHook; that sequence can be handled, though. What I fear is Dev1_KBDHook appearing before Dev0_KBDHook, or worse).
With WM_KEYDOWN such a mix-up was possible; I still don't know whether the same holds for the keyboard hook.
Anyway, it is a possible solution. On WM_INPUT I create the message itself and partly fill it in; on the next keyboard hook event I fill in the remaining part.
Generally WM_INPUT and keyboard hook events occur in pairs. As I mentioned, I don't know exactly whether they can get mixed up, but even if they do, as long as the order of keyboard hook events and WM_INPUTs is maintained (like Dev0_INPUT, Dev1_INPUT and then Dev0_KBDEvent, Dev1_KBDEvent), parsing those sequences will be no trouble. For example, with one stack:
WM_INPUT pushes a new message struct; KBDEvent pops it and fills in the remaining parts.
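A rough Python sketch of that pairing follows. One deliberate change from the wording above: for the ordering described (Dev0 before Dev1 on both sides) the oldest open record has to be matched first, so the sketch uses a FIFO queue rather than a LIFO stack. The class and field names are hypothetical, and the sketch assumes exactly the thing the post is unsure about, that hook events arrive in the same order as their WM_INPUT messages.

```python
from collections import deque


class InputPairer:
    """Pairs raw-input notifications with later keyboard hook events."""

    def __init__(self):
        self.partial = deque()  # records opened by WM_INPUT, oldest first

    def on_raw_input(self, device_id):
        # WM_INPUT side: open a partly filled record naming the device.
        self.partial.append({"device": device_id})

    def on_keyboard_hook(self, vk_code, lparam):
        # Hook side: complete the oldest open record, if there is one.
        if not self.partial:
            return None  # hook event with no matching WM_INPUT
        record = self.partial.popleft()
        record.update(vk=vk_code, lparam=lparam)
        return record


p = InputPairer()
p.on_raw_input("Dev0")
p.on_raw_input("Dev1")
first = p.on_keyboard_hook(0x41, 0x001E0001)
second = p.on_keyboard_hook(0x42, 0x00300001)
print(first["device"], second["device"])  # Dev0 Dev1
```

If the hook events can overtake each other across devices, no queue discipline will fix the pairing, and some field shared by both sides would be needed to match them instead.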
Not a generally good solution, but I guess it is good enough to use if no other exists; it solves the problem, at least partially.
If I manage to test its behaviour under simultaneous input from multiple devices, I will post the info here. Although I really doubt there will be any mix-up that can't be handled, unless Windows picks the time to send the corresponding keyboard event at random...
Forgot to mention: yes, it's partially possible to discard all input and generate it manually. I just PostMessage a manually forged message (I get the lParam from the keyboard hook event). But it causes some problems: hotkeys won't work, and neither will anything that uses GetAsyncKeyState. In my case that is acceptable, though.
In Charles Petzold's book "Programming Windows", he mentioned the following:
"Be careful with GetKeyState. It is not a real-time keyboard status check. Rather it reflects the keyboard status up to and including the current message being processed."
"Do not do while(GetKeyState(VK_F1) >= 0);", it is guaranteed to hang your program.
I don't understand these at all. Could someone give an explanation for these two facts, please?
Every time you read a queued keyboard message, for example by calling GetMessage, the OS updates private keyboard state data associated with the calling thread. When you call GetKeyState that private keyboard state data is used to determine the returned key state. Thus, so long as you don't read another queued message, GetKeyState will always return the same value.
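A toy Python model of that behaviour (not the Windows API; all names are made up) shows why the polling loop from the book cannot make progress. The thread's private snapshot only changes inside `get_message`, so polling `get_key_state` without reading messages sees the same value every time.

```python
from collections import deque


class ToyThread:
    def __init__(self):
        self.queue = deque()  # queued keyboard messages: (key, is_down)
        self.key_state = {}   # private per-thread snapshot

    def post(self, key, is_down):
        # The system queues a message; the snapshot is NOT touched here.
        self.queue.append((key, is_down))

    def get_message(self):
        # Reading a message is the only thing that updates the snapshot.
        key, is_down = self.queue.popleft()
        self.key_state[key] = is_down
        return key, is_down

    def get_key_state(self, key):
        return self.key_state.get(key, False)


t = ToyThread()
t.post("F1", True)  # the user presses F1

# Bounded stand-in for `while (GetKeyState(VK_F1) >= 0);`
polls = [t.get_key_state("F1") for _ in range(1000)]
print(any(polls))             # False: the loop would never see the key go down

t.get_message()               # pump one message
print(t.get_key_state("F1"))  # True
```

The real loop is unbounded and never calls anything that reads the queue, so in terms of this model it would spin forever on the stale snapshot.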
I come from the world of web programming and usually the server sets a superglobal variable through the specified method (get, post, etc) that makes available the data a user inputs into a field. Another way is to use AJAX to register a callback method to an event that the AJAX XMLhttpRequest object will initiate once notified by the browser (I'm assuming...). So I guess my question would be if there is some sort of dispatch interface that a systems programmer's code must interact with vicariously to execute in response to user input or does the programmer control the "waiting" process directly? And if there is a dispatch is there a loop structure in an OS that waits for a particular event to occur?
I was prompted to ask this question here because I'm in a basic programming logic class and the professor won't answer such a "sophisticated" question as this one. My book gives a vague pseudocode example like:
//start
sentinel_val = 'stop';
get user_input;
while (user_input not equal to sentinel_val)
{
// do something.
get user_input;
}
//stop
This example leads me to believe 1) that if no input is received from the user the loop will continue to repeat the sequence "do something" with the old or no input until the new input magically appears and then it will repeat again with that or a null value. It seems the book has tried to use the example of priming and reading from a file to convey how a program would get data from event driven input, no?
I'm confused :(
At the lowest level, input to the computer is asynchronous-- it happens via "interrupts", which is basically something external to the CPU (a keyboard controller) sending a signal to the CPU that says "stop what you're doing and accept this data". (It's complex, but this is the general idea). So the CPU stops, grabs the keystroke, and puts it in a buffer to be read, and then continues doing what it was doing before the interrupt.
Very similar things happen with inbound network traffic, and the results of reading from a disk, etc.
At a higher level, it gets more dependent on the operating system or framework that you're using.
With keyboard input, there might be a process (application, basically) that is blocked, waiting for user input. That "block" doesn't mean the computer just sits there waiting, it lets other processes run instead. But when the keyboard result comes in, it will wake up the one who was waiting for it.
From the point of view of that waiting process, they called some function "get_next_character()" and that function returned with the character. Etc.
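As a rough illustration in Python, a background thread can stand in for the interrupt-driven keyboard and a queue for the input buffer. The names `keyboard_interrupt_source` and `key_buffer` are invented for this sketch; `get_next_character` is the hypothetical function from the paragraph above. The caller simply blocks until a character is available.

```python
import queue
import threading
import time

key_buffer = queue.Queue()  # stands in for the OS input buffer


def keyboard_interrupt_source():
    # Pretend hardware: a "keystroke" arrives every 50 ms.
    for ch in "hi":
        time.sleep(0.05)
        key_buffer.put(ch)


def get_next_character():
    # Blocks the calling thread until a character is available;
    # other threads keep running in the meantime.
    return key_buffer.get()


threading.Thread(target=keyboard_interrupt_source, daemon=True).start()
received = get_next_character() + get_next_character()
print(received)  # hi
```

The waiting code never checks the buffer in a loop of its own; the blocking call returns only when the producer has put something there.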
Frankly, how all this stuff ties together is super interesting and useful to understand. :)
An OS is driven by hardware events (called interrupts). An OS does not busy-wait for an interrupt; instead, when it has nothing to do, it executes a special instruction (such as HLT on x86) inside a loop to put the CPU to sleep. When a hardware event occurs, the corresponding interrupt handler is invoked.
"It seems the book has tried to use the example of priming and reading from a file to convey how a program would get data from event driven input, no?"
Yes, that is what the book is doing. In fact, the Unix operating system is built on the idea of abstracting all input and output of any device to look like this.
In reality most operating systems and hardware make use of interrupts that jump to what we can call a sub-routine to perform the low level data read and then return control back to the operating system.
Also, on most systems many of the devices work independently of the rest of the operating system and present a high-level API to it. For example, a keyboard port (or maybe a better example is a network card) processes interrupts itself, and the keyboard driver then presents the operating system with a different API. You can look at the standards for devices to see what these are. If you want to know the API the keyboard port presents, for example, you could look at the source code of the keyboard driver in a Linux distro.
A basic explanation based on my understanding...
Your get user_input pseudo function is often something like readLine. That means that the function will block until the data read contains a new line character.
Below this, the OS uses interrupts (meaning it does not deal with the keyboard unnecessarily, only when required) so that it can respond when the user hits some keys. The keyboard interrupt causes execution to jump to a special routine which fills an input buffer with data from the keyboard. The OS then allows the appropriate process, generally the active one, to use readLine-style functions to access this data.
There's a bunch more complexity in there but that's a simple view. If someone offers a better explanation I'll willingly bow to superior knowledge.
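The book's sentinel loop can be written as runnable Python along these lines. It is a sketch: `run` and the upper-casing "do something" step are placeholders, and an in-memory stream is used so the example runs without a keyboard. With `sys.stdin` in its place, each `readline()` call would block until the user presses Enter, so the loop body never runs with old or empty input.

```python
import io


def run(stream, sentinel="stop"):
    handled = []
    raw = stream.readline()           # priming read
    while raw and raw.strip() != sentinel:
        handled.append(raw.strip().upper())  # "do something"
        raw = stream.readline()       # blocks until the next full line
    return handled


print(run(io.StringIO("a\nb\nstop\nignored\n")))  # ['A', 'B']
```

The extra `raw and` check ends the loop at end of input, which the pseudocode does not cover.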
MSDN advises that the RegisterWindowMessage() function be used only for registering messages that are sent between processes. If a message is only sent within one process, its identifier can safely be chosen from the range WM_APP through 0xBFFF.
However in our codebase I often see that RegisterWindowMessage() is used for messages only sent within one process. I suppose that this was done because of perceived simplicity of using RegisterWindowMessage() since it doesn't require manually distributing the message identifiers in the WM_APP..0xBFFF range.
Do I understand correctly that if many applications are run on one machine and they all call RegisterWindowMessage() with different strings they could exhaust the range of message identifiers allowed to return by RegisterWindowMessage() and for some of them it will just return a value indicating a failure? What could be a valid reason for using RegisterWindowMessage() messages in cases where WM_APP..0xBFFF range messages would suffice?
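For comparison, distributing private identifiers by hand need not be much work. The Python sketch below uses invented message names; only the WM_APP value (0x8000) and the 0xBFFF upper bound come from the Windows headers and the text above. Keeping all private messages in one enumeration lets a duplicate value be caught at import time.

```python
from enum import IntEnum, unique

WM_APP = 0x8000
WM_APP_LAST = 0xBFFF


@unique  # raises ValueError if two names share a value
class AppMessage(IntEnum):
    # Hypothetical in-process messages
    REFRESH_VIEW = WM_APP + 1
    JOB_FINISHED = WM_APP + 2
    LOG_LINE = WM_APP + 3


# Every private message must stay inside the WM_APP..0xBFFF range.
assert all(WM_APP <= m <= WM_APP_LAST for m in AppMessage)
print([hex(m) for m in AppMessage])  # ['0x8001', '0x8002', '0x8003']
```

In C or C++ the same idea is usually a single header with an enum or a list of `#define`s based on WM_APP.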
IMHO there is no valid reason to use RegisterWindowMessage if you are only sending messages to yourself
There is no (documented) way to un-register a message, so after your app quits, that registered message will stay in the atom table until reboot/logoff (I can't remember exactly where this atom table is stored, the window station or terminal server session instance probably)
The reason you need to use RegisterWindowMessage even when messaging to yourself is that it protects you from the idiot who broadcasts messages in the WM_APP + N range.
Yes, this does happen.
Abusing RegisterWindowMessage can potentially make a Windows box unusable. This is especially true if the window message names are dynamically generated and a bug causes out-of-control window message allocation. In this case the global atom table in your window station/desktop will fill up and any process using User32.dll (basically, any app) will fail to start, create windows, etc.
There is a bug out there in Delphi / Borland products that registers messages that start with ControlOfsXXXXXX where XXXX is a memory address (or other dynamic modifier). Apps that are started and stopped frequently will register multiple ControlOfsXXXX atoms and eventually exhaust atom space. For more details see:
http://blogs.msdn.com/b/ntdebugging/archive/2012/01/31/identifying-global-atom-table-leaks.aspx
And
https://forums.embarcadero.com/thread.jspa?threadID=47678
A possible advantage is that Spy++ can display more informative text, therefore debugging is a bit easier. Compare
<00058> 00330CA2 S message:0x0419 [User-defined:WM_USER+25] wParam:00000000 lParam:00000000
with
<00129> 004F0DA0 S message:0xC2B0 [Registered:"AFX_WM_ONCHANGE_ACTIVE_TAB"] wParam:00000001 lParam:02B596E8
Of course, in principle there is a chance of running out of message IDs. On the other hand, the source code of the MFC Feature Pack contains 52 calls to RegisterWindowMessage, so roughly 16,300 of the 16,384 IDs in the registered range (0xC000 through 0xFFFF) are still left for other applications.
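The ranges behind these numbers can be checked with a small Python helper. The function name is made up; the boundaries are the documented ones (WM_USER = 0x0400, WM_APP = 0x8000, registered messages from 0xC000 through 0xFFFF). It classifies the two message values from the Spy++ lines above and counts the registered range.

```python
WM_USER = 0x0400
WM_APP = 0x8000
REGISTERED_FIRST = 0xC000
REGISTERED_LAST = 0xFFFF


def describe(msg):
    """Name the range a window message identifier falls into."""
    if REGISTERED_FIRST <= msg <= REGISTERED_LAST:
        return "registered"
    if WM_APP <= msg < REGISTERED_FIRST:
        return f"WM_APP+{msg - WM_APP}"
    if WM_USER <= msg < WM_APP:
        return f"WM_USER+{msg - WM_USER}"
    return "system"


print(describe(0x0419))  # WM_USER+25
print(describe(0xC2B0))  # registered

total = REGISTERED_LAST - REGISTERED_FIRST + 1
print(total, total - 52)  # 16384 16332
```

Note that the table is shared by everything registering strings in the session, so the number actually available to any one application depends on what else has registered before it.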