What exactly is kqueue's EV_RECEIPT for?

The kqueue mechanism has an event flag, EV_RECEIPT, which according to the linked man page:
... is useful for making bulk changes to a kqueue without draining any pending events. When passed as input, it forces EV_ERROR to always be returned. When a filter is successfully added the data field will be zero.
My understanding, however, is that it is trivial to make bulk changes to a kqueue without draining any pending events: simply pass 0 for the nevents parameter to kevent, thus drawing no events from the queue. With that in mind, why is EV_RECEIPT necessary?
Some sample code in Apple documentation for OS X actually uses EV_RECEIPT:
kq = kqueue();
EV_SET(&changes, gTargetPID, EVFILT_PROC, EV_ADD | EV_RECEIPT, NOTE_EXIT, 0, NULL);
(void) kevent(kq, &changes, 1, &changes, 1, NULL);
But, seeing as the changes array is never examined after the kevent call, it's totally unclear to me why EV_RECEIPT was used in this case.
Is EV_RECEIPT actually necessary? In what situation would it really be useful?

If you are making bulk changes and one of them causes an error, then the event will be placed in the eventlist with EV_ERROR set in flags and the system error in data.
Therefore it is possible to identify which changelist element caused the error.
If you set nevents to zero, you get the error code but no indication of which event caused the error.
So EV_RECEIPT allows you to set nevents to a non-zero value without draining any pending events.
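To make that concrete, here is a minimal sketch of the receipt pattern, assuming an existing kqueue descriptor kq and two placeholder file descriptors fd1 and fd2:
#include <sys/event.h>
#include <stdio.h>
#include <string.h>

struct kevent changes[2], receipts[2];
EV_SET(&changes[0], fd1, EVFILT_READ, EV_ADD | EV_RECEIPT, 0, 0, NULL);
EV_SET(&changes[1], fd2, EVFILT_READ, EV_ADD | EV_RECEIPT, 0, 0, NULL);

// Every change is echoed back as a receipt with EV_ERROR set in flags;
// the receipts occupy the eventlist slots, so no pending events are drained.
int n = kevent(kq, changes, 2, receipts, 2, NULL);
for (int i = 0; i < n; i++) {
    // data == 0 means this change succeeded; otherwise it holds the errno.
    if (receipts[i].data != 0)
        fprintf(stderr, "change %d failed: %s\n", i, strerror((int)receipts[i].data));
}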


Stop WndProc processing during shutdown of the app

I have defined a WndProc which looks similar to the one below (the code is written in C++Builder, but it applies in similar form to Delphi as well):
#define WM_SETTINGS_UPDATE (WM_APP + 1234)
#define WM_GUI_UPDATE      (WM_APP + 1235)

void __fastcall TForm1::WndProc(TMessage &fMessage)
{
    switch (fMessage.Msg)
    {
        default:
            break;
        case WM_SETTINGS_UPDATE:
            ProcessMySettingsUpdate();
            fMessage.Result = 1;
            return;
            break; // I know this is not needed, I do this purely for aesthetics
        case WM_GUI_UPDATE:
            ProcessMyGUIUpdate();
            fMessage.Result = 1;
            return;
            break; // I know this is not needed, I do this purely for aesthetics
    }
    TForm::WndProc(fMessage);
}
So in essence this:
checks for my custom WM_APP-range messages (WM_SETTINGS_UPDATE and WM_GUI_UPDATE in this case).
processes those messages, sets fMessage.Result to 1 to indicate the message was processed, and returns (avoiding TForm::WndProc(fMessage), which I understand is the same as inherited in Delphi and is not needed if the message is processed).
if my messages are not processed, it just calls TForm::WndProc(fMessage) (or inherited) to do the default processing.
My concern is this - what if there is a situation where the application is shutting down, and there are still some unprocessed messages (left in the message queue by PostMessage())?
What I'd like to do in such a situation is to avoid calling my own processing, as it might result in Access Violations from uninitialized memory or similar. Is this basically enough, or do I need to apply some special processing to avoid that kind of situation, or process some "application shutting down" code?
And also - do I need to call inherited / TForm::WndProc(fMessage) after I am done with my processing, or is it safe to do it like above (set Result to 1 and return)?
Note - this does not necessarily apply to the MainForm (the first Form being created).
what if there is a situation where the application is shutting down, and there are still some unprocessed messages (left in the message queue by PostMessage())?
That is perfectly OK. Once a Form's HWND is destroyed, any remaining messages that were posted for that window will simply be discarded when dispatched by the main message loop. Your WndProc will not be called for them.
What I'd like to do in such a situation is to avoid calling my own processing, as it might result in Access Violations from uninitialized memory or similar.
That should not be a concern. One, because messages won't be dispatched to your Form's HWND after it is destroyed, and two, when the MainForm is closed, the main message loop is signaled to stop running before any active Form HWNDs are destroyed. So there won't be any further message dispatching anyway.
Is this basically enough, or do I need to apply some special processing to avoid that kind of situation
It should be enough. You typically do not need to handle this specially (unless your custom messages carry pointers to objects that have to be freed).
or process some "application shutting down" code?
If you really want to detect when the app is being shut down, have your WndProc handle the WM_CLOSE message (or use the Form's OnClose/OnCloseQuery events).
do I need to call inherited / TForm::WndProc(fMessage) after I am done with my processing, or is it safe to do it like above (set Result to 1 and return)?
That is perfectly OK for custom messages.
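If you nevertheless want an explicit shutdown guard, a minimal sketch could look like this (the FShuttingDown member flag is hypothetical, not part of the VCL):
void __fastcall TForm1::WndProc(TMessage &fMessage)
{
    if (fMessage.Msg == WM_CLOSE)
        FShuttingDown = true; // hypothetical flag your custom handlers can check
    TForm::WndProc(fMessage); // always pass WM_CLOSE on for default close processing
}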

How can you tell if SendMessage() was successful (message was delivered)?

Is it possible to determine whether a SendMessage() call was successful and delivered my message to the target window?
The SendMessage() documentation in the Windows API is silent on this and only says the following:
The return value specifies the result of the message processing; it depends on the message sent.
This obviously refers to the fact that the return code reflects the value returned by the wndproc of the target window.
But what will the return code be if the message wasn't delivered at all (e.g. due to access control, or because the window was destroyed in the meantime)? How can I detect such situations?
There is no way to reliably tell whether a call to SendMessage succeeded or failed. After all, it has only a single return value, and the entire range of values is used; there is no designated error value.
Playing tricks with the calling thread's last error code won't work either. The scheme proposed in this answer (clearing the last error code prior to the call and evaluating it after the call) is brittle. It's easy to see that it can fail when sending a message to a window owned by the calling thread (as that window's window procedure is at liberty to change the last error any way it sees fit).
It is less well known that this can also fail when sending a message to a window owned by a different thread. The code suggests that no intervening API call can happen in the calling thread. Yet in that scenario SendMessage will also dispatch incoming cross-thread messages (see When can a thread receive window messages?), allowing the thread's last error code to change.
The only option, then, is to use SendMessageTimeoutW, as that API reports both the result of the call and an error/success indicator. Passing the SMTO_NOTIMEOUTIFNOTHUNG flag ensures that an arbitrarily chosen timeout will not adversely affect the outcome.
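For illustration, a sketch of that approach (the SendMessageChecked helper name is mine, not a Windows API):
#include <windows.h>

// Returns true only if the message was delivered; 'result' then holds the
// value returned by the target's window procedure.
bool SendMessageChecked(HWND hwnd, UINT msg, WPARAM wp, LPARAM lp, LRESULT &result)
{
    DWORD_PTR res = 0;
    // SMTO_NOTIMEOUTIFNOTHUNG: the timeout only kicks in if the target hangs
    if (SendMessageTimeoutW(hwnd, msg, wp, lp,
                            SMTO_NOTIMEOUTIFNOTHUNG, 5000, &res) == 0)
        return false; // not delivered; GetLastError() has the reason
    result = static_cast<LRESULT>(res);
    return true;
}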
If SendMessage fails, it sets the thread's last Win32 error code, which we can retrieve via GetLastError(). In the failure case that code will never be 0 (NOERROR), although this is admittedly not documented. But how do we detect that SendMessage failed at all? We cannot rely on the return value, but we can call SetLastError(NOERROR) before SendMessage and check GetLastError() afterwards:
if SendMessage failed, the value returned by GetLastError() will be non-zero (most commonly ERROR_INVALID_WINDOW_HANDLE or ERROR_ACCESS_DENIED in this case).
if we still get 0 as the last error, the message was delivered without error.
However, it is possible that during the SendMessage call some window procedure in the current thread is invoked, and code inside that window procedure sets the last error to a non-zero value. We then end up with a non-zero last error, but not because SendMessage failed. Distinguishing between these two cases is very problematic.
SetLastError(NOERROR);
LRESULT lr = SendMessageW(hwnd, *, *, *);
ULONG dwErrorCode = GetLastError();
if (dwErrorCode == NOERROR)
{
    DbgPrint("message was delivered (%p)\n", lr);
}
else
{
    DbgPrint("fail with %u error\n", dwErrorCode);
}

How GetKeyState works exactly?

I have been struggling to understand how GetKeyState operates. I have done endless Google searching and still haven't managed to understand exactly how it works.
According to MSDN:
The key status returned from this function changes as a thread reads key messages from its message queue.
Take a look at the following code. I didn't create a message-processing loop. 65 represents the virtual key of the character 'A'.
while (true) {
    printf("the character %c, the vkey_state is %x\n",
           MapVirtualKey(65, MAPVK_VK_TO_CHAR), GetKeyState(65) & 0x8000);
    Sleep(150);
}
I pressed "A" on the keyboard while focused on my program's console window. Sometimes the vkey_state value is 0x8000, as expected, and sometimes it is not.
What exactly is happening under the hood? I didn't write any message-processing code, so I assume it is created automatically. When I press 'A', a WM_KEYDOWN is sent to my thread's message queue. When I release the key 'A', a WM_KEYUP is sent to my thread's message queue. Other key-related messages might be sent in between. What happens when I call GetKeyState? When exactly will it set the MSB of its return value to '1'? When will it change back to 0? Is it related to the calls to GetMessage?
In addition - and this is what confused me the most - when I switched to another program (cmd.exe) and typed 'A', my program was able to monitor it while in the background. But cmd.exe's thread has another message queue - why does it work? However, it did not work if I started cmd.exe in elevated mode (high integrity).
This contradicts the information I found here:
If the user has switched to another program, then the GetKeyState function will not see the input that the user typed into that other program, since that input was not sent to your input queue.

3 queues + 1 finish or device-side checkpoints for all queues

Is there a special "wait for event" function that can wait on 3 queues at the same time on the device side, so that the host doesn't have to wait for each queue serially?
Is there a checkpoint command that can be sent into a command queue such that it waits for the other command queues to reach the same (vertical) barrier/checkpoint before continuing, entirely on the device side, so that no host-side round trip is needed?
For now, I tried two different versions:
clWaitForEvents(3, evt_);
and
int evtStatus0 = 0;
clGetEventInfo(evt_[0], CL_EVENT_COMMAND_EXECUTION_STATUS,
               sizeof(cl_int), &evtStatus0, NULL);
while (evtStatus0 > 0)
{
    clGetEventInfo(evt_[0], CL_EVENT_COMMAND_EXECUTION_STATUS,
                   sizeof(cl_int), &evtStatus0, NULL);
    Sleep(0);
}

int evtStatus1 = 0;
clGetEventInfo(evt_[1], CL_EVENT_COMMAND_EXECUTION_STATUS,
               sizeof(cl_int), &evtStatus1, NULL);
while (evtStatus1 > 0)
{
    clGetEventInfo(evt_[1], CL_EVENT_COMMAND_EXECUTION_STATUS,
                   sizeof(cl_int), &evtStatus1, NULL);
    Sleep(0);
}

int evtStatus2 = 0;
clGetEventInfo(evt_[2], CL_EVENT_COMMAND_EXECUTION_STATUS,
               sizeof(cl_int), &evtStatus2, NULL);
while (evtStatus2 > 0)
{
    clGetEventInfo(evt_[2], CL_EVENT_COMMAND_EXECUTION_STATUS,
                   sizeof(cl_int), &evtStatus2, NULL);
    Sleep(0);
}
The second one is a bit faster (I saw it from someone else), and both are executed after 3 flush commands.
Looking at CodeXL profiler results, the first one waits longer between finish points, and some operations don't even seem to overlap. The second one shows all 3 finish points within 3 milliseconds, so it is faster, and longer stretches are overlapped (read + write + compute at the same time).
If there is a way to achieve this with only 1 wait command from the host side, there must be a "flush" version of it too, but I couldn't find one.
Is there any way to achieve the picture below instead of adding flushes between each pipeline step?
queue1: write        checkpoint   write     checkpoint   write
queue2: -  compute   checkpoint   compute   checkpoint   compute
queue3: -            checkpoint   read      checkpoint   read
All checkpoints have to be vertically synchronized, and none of these actions may start until a signal is given. Such as:
queue1.ndwrite(...);
queue1.ndcheckpoint(...);
queue1.ndwrite(...);
queue1.ndcheckpoint(...);
queue1.ndwrite(...);
queue2.ndrangekernel(...);
queue2.ndcheckpoint(...);
queue2.ndrangekernel(...);
queue2.ndcheckpoint(...);
queue2.ndrangekernel(...);
queue3.ndread(...);
queue3.ndcheckpoint(...);
queue3.ndread(...);
queue3.ndcheckpoint(...);
queue3.ndread(...);
queue1.flush()
queue2.flush()
queue3.flush()
queue1.finish()
queue2.finish()
queue3.finish()
Checkpoints are all handled on the device side, and only 3 finish commands are needed from the host side (even better: only 1 finish for all queues?).
Here is how I currently bind 3 queues to 3 events for "clWaitForEvents(3, evt_);":
hCommandQueue->commandQueue.enqueueBarrierWithWaitList(NULL, &evt[0]);
hCommandQueue2->commandQueue.enqueueBarrierWithWaitList(NULL, &evt[1]);
hCommandQueue3->commandQueue.enqueueBarrierWithWaitList(NULL, &evt[2]);
if this "enqueue barrier" can talk with other queues, how could I achieve that? Do I need to keep host-side events alive until all queues are finished or can I delete them or re-use them later? From the documentation, it seems like first barrier's event can be put to second queue and second one's barrier event can be put to third one along with first one's event so maybe it is like:
hCommandQueue->commandQueue.enqueueBarrierWithWaitList(NULL, &evt[0]);
hCommandQueue2->commandQueue.enqueueBarrierWithWaitList(evt_0, &evt[1]);
hCommandQueue3->commandQueue.enqueueBarrierWithWaitList(evt_0_and_1, &evt[2]);
and in the end wait only for evt[2], or use the same event for all:
hCommandQueue->commandQueue.enqueueBarrierWithWaitList(sameEvt, &evt[0]);
hCommandQueue2->commandQueue.enqueueBarrierWithWaitList(sameEvt, &evt[1]);
hCommandQueue3->commandQueue.enqueueBarrierWithWaitList(sameEvt, &evt[2]);
Where do I get the sameEvt object?
Has anyone tried this? Should I start all queues with a barrier so they don't start until I raise an event from the host side, or is the lazy execution of "enqueue" 100% trustworthy not to start anything until I flush/finish? How do I raise an event from host to device (sameEvt doesn't have a "raise" function; is it clCreateUserEvent?)?
All 3 queues are in-order and in the same context. The out-of-order type is not supported by all graphics cards. The C++ bindings are being used.
There are also enqueueWaitList (is this deprecated?) and clEnqueueMarker, but I don't know how to use them, and the documentation on Khronos' website has no examples for them.
You asked many questions and sketched many variants, so there is no single solution to give; I will answer in general terms so you can work out the most suitable approach.
If the queues are bound to the same context (possibly to different devices within the same context), then it is possible to synchronize them through events. That is, you can obtain an event from a command submitted to one queue and use this event to make a command submitted to another queue wait on it, e.g.
queue1.enqueue(comm1, /*dependency*/ NULL, /*result event*/ &e1);
queue2.enqueue(comm2, /*dependency*/ &e1, /*result event*/ NULL);
In this example, comm2 will wait for comm1 completion.
If you need to enqueue commands first but not allow them to execute yet, you can create a user event (clCreateUserEvent) and signal it manually (clSetUserEventStatus). The implementation is allowed to process commands as soon as they are enqueued (the driver is not required to wait for a flush).
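For example, a sketch with the C++ bindings (context, queue, buffer, and kernel names are placeholders):
cl::UserEvent gate(context);            // user event, initially unsignaled
std::vector<cl::Event> wait{ gate };

// enqueued now, but none of these commands may start until the gate is signaled
queue1.enqueueWriteBuffer(buf, CL_FALSE, 0, size, hostPtr, &wait, nullptr);
queue2.enqueueNDRangeKernel(kernel, cl::NullRange, global, local, &wait, nullptr);
queue3.enqueueReadBuffer(outBuf, CL_FALSE, 0, size, outPtr, &wait, nullptr);

gate.setStatus(CL_COMPLETE);            // the host-side "raise"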
A barrier seems like overkill for your purpose because it waits for all commands previously submitted to the queue. You can instead use clEnqueueMarkerWithWaitList, which waits for a given list of events and provides one event for other commands to wait on.
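A sketch of the marker variant (event and queue names are placeholders):
// 'marker' completes only when both cross-queue events have completed
std::vector<cl::Event> deps{ evtFromQueue1, evtFromQueue2 };
cl::Event marker;
queue3.enqueueMarkerWithWaitList(&deps, &marker);
// later commands can depend on 'marker' alone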
As far as I know, you can release an event at any moment once you no longer need it. The implementation will prolong the event's lifetime if it is required for internal purposes.
I do not know what enqueueWaitList is.
Off-topic: if you need non-trivial dependencies between calculations, you may want to consider the TBB flow graph and its opencl_node. The opencl_node uses events for synchronization and avoids host-device synchronization where possible. However, it can be tricky to use multiple queues for the same device.
As far as I know, Intel HD Graphics 530 supports out-of-order queues (at least on the host side).
You are making it much harder than it needs to be. On the write queue, take an event. Use that as a condition for the compute on the compute queue, and take another event. Use that as a condition for the read on the read queue. There is no reason to force any other synchronization. Note: my reading of the spec is that you must clFlush a queue you took an event from before using that event as a condition on another queue.
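A sketch of that chain with the C++ bindings (queue, buffer, and kernel names are assumptions):
cl::Event wrote, computed;
std::vector<cl::Event> afterWrite, afterCompute;

writeQueue.enqueueWriteBuffer(buf, CL_FALSE, 0, size, hostPtr, nullptr, &wrote);
writeQueue.flush();        // per the note above: flush before another queue waits on 'wrote'

afterWrite.push_back(wrote);
computeQueue.enqueueNDRangeKernel(kernel, cl::NullRange, global, local,
                                  &afterWrite, &computed);
computeQueue.flush();

afterCompute.push_back(computed);
readQueue.enqueueReadBuffer(outBuf, CL_FALSE, 0, size, outPtr,
                            &afterCompute, nullptr);
readQueue.finish();        // the single host-side wait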

Is the OVERLAPPED::hEvent set even if pipe connection is immediate with ERROR_PIPE_CONNECTED?

If, after an asynchronous ConnectNamedPipe(), GetLastError() returns ERROR_PIPE_CONNECTED, will the event in the OVERLAPPED structure passed to the function still be set, or does it only get set if the result was ERROR_IO_PENDING?
A secondary question: if the completion notification mode is set to FILE_SKIP_SET_EVENT_ON_HANDLE, the documentation specifies that the handle's event won't be set, but that the OVERLAPPED structure's event will still be set, if present. What is the use of the handle event, and why is that setting not the default?
Going by this example: if GetLastError() returns ERROR_PIPE_CONNECTED, then you have to set the event yourself.
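A sketch following that example (hPipe is assumed to be a pipe handle created with FILE_FLAG_OVERLAPPED):
OVERLAPPED ov = {};
ov.hEvent = CreateEventW(NULL, TRUE, FALSE, NULL); // manual-reset, nonsignaled

if (!ConnectNamedPipe(hPipe, &ov))
{
    switch (GetLastError())
    {
    case ERROR_IO_PENDING:      // connection in progress; event set on completion
        break;
    case ERROR_PIPE_CONNECTED:  // a client connected before the call:
        SetEvent(ov.hEvent);    // signal the event ourselves
        break;
    default:
        // genuine failure: handle and bail out
        break;
    }
}
WaitForSingleObject(ov.hEvent, INFINITE);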
