How does Windows accumulate WM_TIMER messages? - winapi

As I understand it, WM_PAINT messages are not generated when a function like InvalidateRect is called; rather an object that can be be thought of as a flag containing some info about the dirty region is set, and when the other messages in the queue are handled, a single WM_PAINT message is generated which combines the regions from all flags since the last successful WM_PAINT message. Presumably this is done once per HWND associated with a particular thread
I'm wondering how WM_TIMER messages are accumulated? I thought perhaps only the most recent flag for each thread would be kept, but what if different TIMERPROCs and window handles have different timers. What if two SetTimer calls with different intervals point to the same window handle; will one WM_TIMER be generated for each timer ID?

What if two SetTimer calls with different intervals point to the same window handle; will one WM_TIMER be generated for each timer ID?
Yes, there will be one message generated for each distinct timer.
You explicitly state "with different intervals". But this is not a factor. What matters is timer identity, defined by the timer ID. You can have multiple distinct timers with the same interval.
What can happen is that timer events can coalesce if the message queue is not serviced as frequently as the events are logically generated. So if multiple timer intervals elapse between calls to pump the message, only a single timer message is generated. Don't think of timer events as indicating how much timer has elapsed, rather treat them as indicating that at least the specified interval has elapsed.

Related

How TTimer actually works internally?

A TTimer with the interval set to 1 sec sends a message every 1 second. This message is processed in application's message loop, which results in the OnTimer event being triggered.
If the application is busy and doesn't have time to process the message loop, the OnTimer event is skipped.
I knwo that TTimer uses internally SetTimer.
My questions are:
Does TTimer use an internal/separate thread (via SetTimer)?
How come that the form that holds the timer (and its OnTimer even) can still do stuff if a modal MessageDlg is "blocking" the form? (see code below)
The documentations says that SetTimer requires Win2000 minimum. How was TTimer implemented in Win98?
void __fastcall TForm1::Timer1Timer(TObject *Sender)
{
Caption = i;
i++;
MessageDlg(stuff); <----- we "block" application here but form's caption is still updated.
}
If the application is busy and doesn't have time to process the message loop, the OnTimer event is skipped.
That is effectively correct. This and this blog posts on MSDN give some internal implementation details, in particular they mention that an expiring timer causes the QS_TIMER flag of the message queue's state to be set. No further time lapse will cause the queue state flag to be even more set. When this flag is set and [Peek|Get]Message cannot pick any higher priority message, a timer message is generated.
Although timer messages do not pile up in the queue, it is possible to have the timer fire again while a previous event handler is executing. This is possible when the code in the handler takes longer to execute than the timer interval and re-entrancy is allowed. If the timer handler causes the application to process queued messages, any pending queue state flag may be cleared and the message posted again, which may cause the timer to fire before the handler finishes executing.
Does TTimer use an internal/separate thread (via SetTimer)?
No. A utility window is created in the main thread which will receive the timer messages. Upon receiving a timer message this window calls the event handler if one is assigned.
How come that the form that holds the timer (and its OnTimer even)
can still do stuff if a modal MessageDlg is "blocking" the form?
The modal loop continues processing the queue, it calls HandleMessage of the Application in a loop which calls ProcessMessage. Hence timer messages are still processed.
That is a potential cause for re-entrancy mentioned above. You may use a flag or disable/enable the timer to prevent that. Or factor out any message processing in the handler altogether.
The documentations says that SetTimer requires Win2000 minimum. How was TTimer implemented in Win98
Same. Documentation keeps changing, occasionally MSDN drops unsupported OS versions from minimum requirements - rather inconsistently. My XE2 API documentation states:
Minimum operating systems Windows 95, Windows NT 3.1
for the WM_TIMER message.
WM_TIMER messages are never placed in the message queue. They are generated when the queue is empty and a flag indicating that the timer has expired is set. So there can never be more than one WM_TIMER message at a time in the queue and if you application is too busy to process the queue you don't get lots of WM_TIMER messages waiting to be processed.
WM_PAINT messages work the same way.

How is the "blocking" behavior of Win32 API GetMessage() implemented?

According to here, GetMessage() is a blocking call which won't return until there's a message can be retrieved from the message queue.
So, how is this blocking behavior implemented?
Does GetMessage() use some kind of spin lock so that the UI thread just busy waits for new messages showing up in the message queue? If so, I guess at least one of my CPU cores should have high usage when a UI application is running. But I didn't see that happen. So how does it work?
ADD 1
Thanks for the hint in the comments. Spin lock is meant to reduce the cost of thread context switch. It shouldn't be used here. I also thought about maybe some event paradigm is used here. But if it is event-driven, how does this event model work?
My guess is like this:
An event for input checking is raised periodically. Perhaps via some hardware timer interrupt. Then the timer interrupt handler will check for various input device buffers for input events. And then put that into certain application's message queue based on the current desktop context, such as which is the active window.
And I guess maybe some other things are also based on the timer interrupt, such as thread context switching.
ADD 2
Based on replies so far. There's some event object that the UI thread waits on. But since UI thread is waiting on something, it is not active and can do nothing by itself yet. And the event object is just some passive state information. So there has to be someone else to wake up the thread upon the event state change. I tink it should be the thread scheduler. And the thread scheduler may be pulsed by the timer interrupt.
The thread scheduler will check the event state periodically and wake up thread and put messages into its queue as necessary.
Am I right about the whole picture?
ADD 3
And there's a remaining question: who modify the state of an event object? Based on here, it seems events are just some data structures that can be modified by any active parties. I think thread scheduler just use the relations among threads and events to decide which thread to run or not.
And by the time a thread is scheduled to run, all it's requirements should already been fulfilled. Such as a message should have been put into its queue before the event it waits on is raised. This is reasonable because otherwise it may be too late. (thanks to RbMm's comments.)
ADD 4
In JDK, the LinkedBlockingDeque type also offers a similar blocking behavior with the take() method.
Retrieves and removes the head of the queue represented by this deque
(in other words, the first element of this deque), waiting if
necessary until an element becomes available.
And the .NET counterpart is the BlockingCollection< T > type. A thread to discuss it.
Here is a thread about how to implement a blocking queue in C#.
The exact implementation is internal and not documented.
Most of the window manager in Windows is implemented in kernel mode. GetMessage is no different. It waits on an event object in kernel mode. When a thread is waiting on an event (or any other kernel-based synchronization object), it does not use any CPU time, because it is not scheduled to run until the wait is satisfied.
When a message is posted to the waiting thread's message queue, or another thread sends a message to a window belonging to the waiting thread, or a timer in the waiting thread fires, or a window in the waiting thread is ready to be painted, the event is signaled, the thread wakes up, and GetMessage() acts accordingly, whether that is to return a posted message to the caller, or to dispatch a sent message directly to the target window procedure and then go back to waiting on the event.
You can see the implementation leaking into the public API if you compare the maximum number of objects you can wait on in WaitForMultipleObjects and MsgWaitForMultipleObjects:
The maximum number of object handles is MAXIMUM_WAIT_OBJECTS.
vs
The maximum number of object handles is MAXIMUM_WAIT_OBJECTS minus
one.

How does the message queue work in Win32?

I read some stuff on Win32 and how the message loop works, and there's something that is still unclear to me: What exactly is stored in the message queue? The integer value that corresponds to a message (WM_COMMAND, WM_CREATE, etc) or a pointer to a MSG structure containing the message integer value and other stuff like wParam, lParam, etc?
To answer your question narrowly, each message in the queue stores, at the least,
a window handle to which the message is directed,
the message code, wParam and lParam, as you already correctly noted,
the time when the message was posted, that you retrieve with GetMessageTime(),
for UI messages, the position of the cursor when the message was posted (see GetMessagePos()).
Note that not all messages are actually stored in the queue. Messages that are sent with SendMessage() to a window from the thread that owns the window are never stored; instead, a receiver window's message function is called directly. Messages sent from other threads are stored until processed, and the sending thread blocks until the message is replied to, either by exiting the window function or explicitly with a call to ReplyMessage(). The API function InSendMessage() helps figure out whether the windows function is processing a message sent from another thread.
Messages that you or the system post are stored in the queue, with some exceptions.
WM_TIMER messages are never actually stored; instead, GetMessage() constructs a timer message if there are no other messages in the queue and a timer has matured. This means that, first, the timer messages have the lowest dequeuing priority, and, second, that multiple messages from a short period timer would never overflow queue, even if GetMessage() is not called for a while. As a result, a single WM_TIMER message is sent for the given timer, even if the timer has elapsed multiple times since the last WM_TIMER message from that timer has been processed.
Similarly, WM_QUIT is also not stored, and rather only flagged. GetMessage() pretends to have retrieved the WM_QUIT after the queue has been exhausted, and this is the last message it retrieves.
Another example is the WM_PAINT message (hat tip to #cody-gray for reminding about this). This message is also simulated when any part of the window¹ is marked as "dirty" and needs repainting. This is also a low-priority message, made so that multiple invalidated regions in a window are repainted all at once when the queue becomes empty, to reduce responsiveness of the GUI and reduce flicker. You can force an immediate repaint by calling UpdateWindow(). This function acts like SendMessage(), in the sense that it does not return until the exposed part of the window is redrawn. This function does not send a WM_PAINT to the window if the invalid region of that window is empty, as an obvious optimization.
Probably, there are other exceptions and internal optimizations.
Messages posted with PostMessage() end up in the queue of a thread that owns the window to which the message is posted.
In what form the messages are stored internally, we do not know, and we do not care. Windows API abstracts that completely. The MSG structure is filled in memory you pass to GetMessage() or PeekMessage(). You do not need to know or to worry about the details of internal implementation beyond those documented in Windows SDK guides.
¹ I do not know how exactly WM_PAINT and WM_TIMER are prioritized relative to each other. I assume WM_PAINT has a lower priority, but I may be wrong.
The message queue in Windows is an abstraction. Very helpful to think of it as a queue but the actual implementation of it is far more detailed. There are four distinct sources of messages in Windows:
Messages that are delivered by SendMessage(). Windows directly calls the window procedure, the message isn't returned by Peek/GetMessage() but does require a call to that function to get dispatched. By far the most messages are delivered this way. WM_COMMAND is like that, it is directly sent by the code that translates a key-down event, like TranslateAccelerator(). No queue-like behavior.
Messages that are synthesized from the window state. Best examples are WM_PAINT, delivered when the "window has a dirty rectangle" state flag is set. And WM_TIMER, delivered when the "timer has expired" state flag is set. These are 'low priority' messages, only delivered when the message queue is empty. They are delivered by GetMessage() but do not otherwise live on the queue.
Input event messages for the keyboard and mouse. These are the messages that truly have queue-like behavior. For the keyboard this ensures that type-ahead works, no keystroke gets lost when the program isn't ready to accept a keystroke. A bunch of state is associated with it, the entire state of the keyboard gets copied for example. Roughly the same for the mouse except that there's less state. The WM_MOUSEMOVE message is an interesting corner case, the queue does not store every single pixel traversed by the cursor. Position changes are accumulated into a single message, stored or delivered when necessary.
Messages stored in the queue by an explicit PostMessage() call. That's entirely up to the program code, clearly it only needs to store the arguments of the call plus the time the call was made so it can accurately be replayed back at GetMessage() time.
MSDN has a nice article here explaining everything on messages and message queues.
But to answer your question, the Queue exists for each window, it temporarily stores the messages and their associate parameters, whether of not its a queue of MSG's is implementation defined, but it most likely is (or something similar). It should also be noted that not all messages will go to the queue, some require immediate processing.

how to create a blocking function

I have a thread that needs to dispatch a message (a simulated mouse event that it sends using SendInput) as soon as it's generated. I want this to happen in a loop where there is no sleep; adding any sleep hurts performance as I basically want the events to go in to the event loop immediately once they have been generated. Of course I also don't want the loop in the consumer thread to hog all the cpu so I can't just keep it running, although this gives me good performance.
As far as I understand this, the task is to make the consuming thread to wait for something that signals that the producing thread has provided something to dispatch (?) but how to best do this? I think I need something like two mutexes if I want to make the two threads mutually exclusive; consumer waits for the producer and the producer continues as soon as the consumer resumes running? Didn't really get this working so far and I'm really not sure how to best do this; CriticalSections vs. mutexes, something entirely different?
The reason I don't want to call SendInput from the producer thread is that that thread (the 'main thread'?) actually runs in response to a mouse move message, intercepted by a mouse hook, and sending more mouse messages from that thread didn't allow the thread to finnish before the simulated mouse move event was processed, messing things for me. As I suspected, moving the SendInput call to another thread so that the original thread could finish out of the way fixed the problem but now I need to make the consumer more responsive; mouse messages keep coming at a good pace I suppose since just a 1 ms Sleep made the loop too slow and message processing started lagging; all works well if I have no sleep.
Thanks.
Sounds like you want to use win32 event objects instead of mutexes or critical sections. See the docs here. The event functions allow a thread to wait on a condition that can be signaled from another thread.
Windows threads support message queues - usually used for windows messages but completely usable for messaging between worker threads. PostThreadMessage can be used in the Hook Proc to post messages to another thread for processing.
The worker thread can do a normal GetMessage loop to extract messages to process - rather than passing them on to DispatchMessage as you would in a UI thread you simply check the HWND is NULL in the message structure indicating its a thread message, then process the message yourself. In the case of potential message flooding PeekMessage can be used to cull any outstanding messages from the queue.

What are alternatives to Win32 PulseEvent() function?

The documentation for the Win32 API PulseEvent() function (kernel32.dll) states that this function is “… unreliable and should not be used by new applications. Instead, use condition variables”. However, condition variables cannot be used across process boundaries like (named) events can.
I have a scenario that is cross-process, cross-runtime (native and managed code) in which a single producer occasionally has something interesting to make known to zero or more consumers. Right now, a well-known named event is used (and set to signaled state) by the producer using this PulseEvent function when it needs to make something known. Zero or more consumers wait on that event (WaitForSingleObject()) and perform an action in response. There is no need for two-way communication in my scenario, and the producer does not need to know if the event has any listeners, nor does it need to know if the event was successfully acted upon. On the other hand, I do not want any consumers to ever miss any events. In other words, the system needs to be perfectly reliable – but the producer does not need to know if that is the case or not. The scenario can be thought of as a “clock ticker” – i.e., the producer provides a semi-regular signal for zero or more consumers to count. And all consumers must have the correct count over any given period of time. No polling by consumers is allowed (performance reasons). The ticker is just a few milliseconds (20 or so, but not perfectly regular).
Raymen Chen (The Old New Thing) has a blog post pointing out the “fundamentally flawed” nature of the PulseEvent() function, but I do not see an alternative for my scenario from Chen or the posted comments.
Can anyone please suggest one?
Please keep in mind that the IPC signal must cross process boundries on the machine, not simply threads. And the solution needs to have high performance in that consumers must be able to act within 10ms of each event.
I think you're going to need something a little more complex to hit your reliability target.
My understanding of your problem is that you have one producer and an unknown number of consumers all of which are different processes. Each consumer can NEVER miss any events.
I'd like more clarification as to what missing an event means.
i) if a consumer started to run and got to just before it waited on your notification method and an event occurred should it process it even though it wasn't quite ready at the point that the notification was sent? (i.e. when is a consumer considered to be active? when it starts or when it processes its first event)
ii) likewise, if the consumer is processing an event and the code that waits on the next notification hasn't yet begun its wait (I'm assuming a Wait -> Process -> Loop to Wait code structure) then should it know that another event occurred whilst it was looping around?
I'd assume that i) is a "not really" as it's a race between process start up and being "ready" and ii) is "yes"; that is notifications are, effectively, queued per consumer once the consumer is present and each consumer gets to consume all events that are produced whilst it's active and doesn't get to skip any.
So, what you're after is the ability to send a stream of notifications to a set of consumers where a consumer is guaranteed to act on all notifications in that stream from the point where it acts on the first to the point where it shuts down. i.e. if the producer produces the following stream of notifications
1 2 3 4 5 6 7 8 9 0
and consumer a) starts up and processes 3, it should also process 4-0
if consumer b) starts up and processes 5 but is shut down after 9 then it should have processed 5,6,7,8,9
if consumer c) was running when the notifications began it should have processed 1-0
etc.
Simply pulsing an event wont work. If a consumer is not actively waiting on the event when the event is pulsed then it will miss the event so we will fail if events are produced faster than we can loop around to wait on the event again.
Using a semaphore also wont work as if one consumer runs faster than another consumer to such an extent that it can loop around to the semaphore call before the other completes processing and if there's another notification within that time then one consumer could process an event more than once and one could miss one. That is you may well release 3 threads (if the producer knows there are 3 consumers) but you cant ensure that each consumer is released just the once.
A ring buffer of events (tick counts) in shared memory with each consumer knowing the value of the event it last processed and with consumers alerted via a pulsed event should work at the expense of some of the consumers being out of sync with the ticks sometimes; that is if they miss one they will catch up next time they get pulsed. As long as the ring buffer is big enough so that all consumers can process the events before the producer loops in the buffer you should be OK.
With the example above, if consumer d misses the pulse for event 4 because it wasn't waiting on its event at the time and it then settles into a wait it will be woken when event 5 is produced and since it's last processed counted is 3 it will process 4 and 5 and then loop back to the event...
If this isn't good enough then I'd suggest something like PGM via sockets to give you a reliable multicast; the advantage of this would be that you could move your consumers off onto different machines...
The reason PulseEvent is "unreliable" is not so much because of anything wrong in the function itself, just that if your consumer doesn't happen to be waiting on the event at the exact moment that PulseEvent is called, it'll miss it.
In your scenario, I think the best solution is to manually keep the counter yourself. So the producer thread keeps a count of the current "clock tick" and when a consumer thread starts up, it reads the current value of that counter. Then, instead of using PulseEvent, increment the "clock ticks" counter and use SetEvent to wake all threads waiting on the tick. When the consumer thread wakes up, it checks it's "clock tick" value against the producer's "clock ticks" and it'll know how many ticks have elapsed. Just before it waits on the event again, it can check to see if another tick has occurred.
I'm not sure if I described the above very well, but hopefully that gives you an idea :)
There are two inherent problems with PulseEvent:
if it's used with auto-reset events, it releases one waiter only.
threads might never be awaken if they happen to be removed from the waiting queue due to APC at the moment of the PulseEvent.
An alternative is to broadcast a window message and have any listener have a top-level message -only window that listens to this particular message.
The main advantage of this approach is that you don't have to block your thread explicitly. The disadvantage of this approach is that your listeners have to be STA (can't have a message queue on an MTA thread).
The biggest problem with that approach would be that the processing of the event by the listener will be delayed with the amount of time it takes the queue to get to that message.
You can also make sure you use manual-reset events (so that all waiting threads are awaken) and do SetEvent/ResetEvent with some small delay (say 150ms) to give a bigger chance for threads temporarily woken by APC to pick up your event.
Of course, whether any of these alternative approaches will work for you depends on how often you need to fire your events and whether you need the listeners to process each event or just the last one they get.
If I understand your question correctly, it seems like you can simply use SetEvent. It will release one thread. Just make sure it is an auto-reset event.
If you need to allow multiple threads, you could use a named semaphore with CreateSemaphore. Each call to ReleaseSemaphore increases the count. If the count is 3, for example, and 3 threads wait on it, they will all run.
Events are more suitable for communications between the treads inside one process (unnamed events). As you have described, you have zero ore more clients that need to read something interested. I understand that the number of clients changes dynamically. In this case, the best chose will be a named pipe.
Named Pipe is King
If you need to just send data to multiple processes, it’s better to use named pipes, not the events. Unlike auto-reset events, you don't need own pipe for each of the client processes. Each named pipe has an associated server process and one or more associated client processes (and even zero). When there are many clients, many instances of the same named pipe are automatically created by the operating system for each of the clients. All instances of a named pipe share the same pipe name, but each instance has its own buffers and handles, and provides a separate conduit for client/server communication. The use of instances enables multiple pipe clients to use the same named pipe simultaneously. Any process can act as both a server for one pipe and a client for another pipe, and vice versa, making peer-to-peer communication possible.
If you will use a named pipe, there would be no need in the events at all in your scenario, and the data will have guaranteed delivery no matter what happens with the processes – each of the processes may get long delays (e.g. by a swap) but the data will be finally delivered ASAP without your special involvement.
On The Events
If you are still interested in the events -- the auto-reset event is king! ☺
The CreateEvent function has the bManualReset argument. If this parameter is TRUE, the function creates a manual-reset event object, which requires the use of the ResetEvent function to set the event state to non-signaled. This is not what you need. If this parameter is FALSE, the function creates an auto-reset event object, and system automatically resets the event state to non-signaled after a single waiting thread has been released.
These auto-reset events are very reliable and easy to use.
If you wait for an auto-reset event object with WaitForMultipleObjects or WaitForSingleObject, it reliably resets the event upon exit from these wait functions.
So create events the following way:
EventHandle := CreateEvent(nil, FALSE, FALSE, nil);
Wait for the event from one thread and do SetEvent from another thread. This is very simple and very reliable.
Don’t' ever call ResetEvent (since it automatically reset) or PulseEvent (since it is not reliable and deprecated). Even Microsoft has admitted that PulseEvent should not be used. See https://msdn.microsoft.com/en-us/library/windows/desktop/ms684914(v=vs.85).aspx
This function is unreliable and should not be used, because only those threads will be notified that are in the "wait" state at the moment PulseEvent is called. If they are in any other state, they will not be notified, and you may never know for sure what the thread state is. A thread waiting on a synchronization object can be momentarily removed from the wait state by a kernel-mode Asynchronous Procedure Call, and then returned to the wait state after the APC is complete. If the call to PulseEvent occurs during the time when the thread has been removed from the wait state, the thread will not be released because PulseEvent releases only those threads that are waiting at the moment it is called.
You can find out more about the kernel-mode Asynchronous Procedure Calls at the following links:
https://msdn.microsoft.com/en-us/library/windows/desktop/ms681951(v=vs.85).aspx
http://www.drdobbs.com/inside-nts-asynchronous-procedure-call/184416590
http://www.osronline.com/article.cfm?id=75
We have never used PulseEvent in our applications. As about auto-reset events, we are using them since Windows NT 3.51 (although they appeared in the first 32-bit version of NT - 3.1) and they work very well.
Your Inter-Process Scenario
Unfortunately, your case is a little bit more complicated. You have multiple threads in multiple processes waiting for an event, and you have to make sure that all the threads did in fact receive the notification. There is no other reliable way other than to create own event for each consumer. So, you will need to have as many events as are the consumers. Besides that, you will need to keep a list of registered consumers, where each consumer has an associated event name. So, to notify all the consumers, you will have to do SetEvent in a loop for all the consumer events. This is a very fast, reliable and cheap way. Since you are using cross-process communication, the consumers will have to register and de-register its events via other means of inter-process communication, like SendMessage. For example, when a consumer process registers itself at your main notifier process, it sends SendMessage to your process to request a unique event name. You just increment the counter and return something like Event1, Event2, etc, and creating events with that name, so the consumers will open existing events. When the consumer de-registers – it closes the event handle that it opened before, and sends another SendMessage, to let you know that you should CloseHandle too on your side to finally release this event object. If the consumer process crashes, you will end up with a dummy event, since you will not know that you should do CloseHandle, but this should not be a problem - the events are very fast and very cheap, and there is virtually no limit on the kernel objects - the per-process limit on kernel handles is 2^24. If you are still concerned, you may to the opposite – the clients create the events but you open them. If they won’t open – then the client has crashed and you just remove it from the list.

Resources