I have a sync process that waits, by polling a database, for an async process to respond. Basically the async process inserts a new record into a database, and the sync process checks periodically for an update to it, continuing its execution after an update is found. The async process is not invoked by the sync one.
I'm believing that this isn't the only solution available, concerning a sync process waiting for an async one. My question here is, is there any better way to do it? Concerning performance specially.
If there's already an answer to this question, i'm sorry, but i haven't found it.
A better way would be to send a message from the async process to the sync one. The latter could just block with a receive activity until the message arrives. This is better from a performance viewpoint, because the sync process does not have to wake up regularily and check the database. Instead, it could just sleep until the engine detects the arrival of the message and wakes up the sync process.
Please note that this implies that you have to use correlationSets [1] on the side of the sync process, which have to be initiated some point in time before waiting on the actual message (for instance during your initial receive). This is needed so that the engine can decide to which process instance to forward the message to. You also might want to add an additional partnerLink with myRole to the sync process so that there is an operation to invoke by the async process (which then also needs a corresponding partnerLink). The structure of the sync process which I mean looks somehow like this:
<sequence>
<receive name="initialSyncReceive" createInstance="yes" partnerLink="originalPartnerLink" ... />
<!-- some logic before waiting on async process -->
<receive name="notificationFromAsync" partnerLink="newPartnerLinkForAsyncOnly" ... />
<!-- some logic after notification from async process -->
<reply name="replyToSyncRequest" partnerLink="originalPartnerLink" ... />
<sequence>
Disclaimer: This is just an outline and you cannot just copy/paste the code. Getting this scenario to work requires a number of non-trivial changes to both processes, e.g. adding correlationSets, additional WSDLs, propertyAliases, and partnerlinks.
Related
Say I am doing an Ask() on actor with some timeout, if the ask timesout, is there a way to get the underlying actor to stop processing things? For example, I don't want the main thread / caller to continue and this actor is still processing the timed out request
Short answer is no, you cannot do it.
Long answer is it depends.
You could move actor's work to another execution context via a Future for example. This will allow your actor react on other messages but that Future that actor has started cannot be cancelled if it was picked up by an execution context and not hanging in the queue of the execution context.
You could write some smart Future wrapper that would check if future was cancelled before starting the work. But if processing has started, the only thing you can do is calling interrupt on the thread executing the future (meaning that you need to capture this thread somehow) and hopping that the work will hit Object.wait or Thread.sleep methods, ie the only places when the interrupt exception can be received. But there is no guarantee of this ever happening.
No you can't. The only thing you can do to an actor is send a message to it. You can't 'get' the actor in any kind of other kind of way to interrupt it. And since, under normal circumstances, messages are processed in order any subsequent "stop processing" message will only be processed after the first message was already completed.
I think the solution to your problem will depend a bit on why you suspect the actor may "time out" in responding.
For example, where you might expect the Actor may sometimes have a backlog of messages, I think the best solution may be to include the timeout as part of the message. In your receiveMessage handler you can check this "request expiry" time before doing actual work, and if the timeout has already passed, just discard the message.
Is it safe to assume that async workflows will execute in the order they were triggered? For e.g.
WF1 on custom object gets triggered.
WF1 causes WF2 and WF3 to get triggered in that order i.e. they both go to the async queue.
Can I safely assume WF2 will actually execute before WF3?
I couldn't find anything official that talk about this so maybe shouldn't assume?
There is no guarantee that it will be FIFO. Async services can process the async jobs in an order of available resources.
Any asynchronous operation that has a AsyncOperation.DependencyToken value of null executes independent of all other asynchronous operations in the queue. The order of execution with regard to other independent operations is not guaranteed. However, asynchronous operations created earlier have a better chance of executing before operations created later. This assumes that the operations are not postponed and are not set to a state of Completed.
The dependency token must be set when the asynchronous operation is created. Because Dynamics 365 creates asynchronous operations for bulk operations such as bulk email, bulk delete, and import, you cannot make use of the dependency token for these operations. In addition, the dependency token cannot be used to order execution of asynchronous registered plug-ins because the asynchronous operation that executes plug-ins is created by the Queue Manager.
Read more
Activate means different in CRM WF case. I assume you are talking about triggering WF. If the WF2 and WF3 are triggered as child WF sequentially inside WF1, then Dependency token will be issued by platform accordingly to go in order. ie first WF2 then WF3.
I don't understand part of the latest Windows threadpool API. I need help with that.
From the documentation, the recipe to use it for I/O (in my case, for SOCKET) can be summarized as follows:
Call CreateThreadpoolIo.
Call StartThreadpoolIo. You can find this warning there:
You must call this function before initiating each asynchronous I/O operation on the file handle bound to the I/O completion object. Failure to do so will cause the thread pool to ignore an I/O operation when it completes and will cause memory corruption.
Call the operation on the file handle (e.g., WSARecvFrom). If it fails, call CancelThreadpoolIo. Otherwise, process the result when it is available. WSARecvFrom, when used asynchronously, asks for a WSAOVERLAPPED (that you have to create beforehand) but not for any information that links it to the previous call to StartThreadpoolIo. CancelThreadpoolIo only asks for the PTP_IO, but not for any additional information to derive a specific asynchronous operation.
Repeat steps 2 and 3.
Call CloseThreadpoolIo to finish. You can find this warning there:
It may be necessary to cancel threadpool I/O notifications to prevent memory leaks. For more information, see CancelThreadpoolIo.
I usually need it for UDP, so I strive to have several reception operations queued (asynchronous WSARecvFrom operations started) at any given time. That way I don't have to rush to start another reception operation at the beginning of the callback function nor synchronize access to the reception buffers (I can have a pool of them, each one able to contain a datagram, and reissue the reception operation when I finish processing each message; in the interim, other queued operations will keep the receiver busy). Datagrams are independent and self contained. I'm aware that this approach may not be valid for TCP.
StartThreadpoolIo/CancelThreadpoolIo seem to me the source of the problem: StartThreadpoolIo and WSARecvFrom are not directly bound (they don't share any arguments). So:
How can the framework know which operation to cancel when you call CancelThreadpoolIo? How does it cancel just the operation that failed and not any of the pending ones?
You can say, "don't call StartThreadpoolIo concurrently". I can live without several concurrent WSARecvFrom's, but I can't live without concurrent WSARecvFrom and WSASendTo. So I think being unable to have several asynchronous operations at the same time can't be the way the API was designed.
You can say, "call StartThreadpoolIo only once, that will suffice to register the callback; it is an on/off process". But the documentation says:
You must call this function before initiating each asynchronous I/O operation on the file handle...
You can say, "it cancels the operation started by the same thread that just called StartThreadpoolIo". But then the advice of calling CancelThreadpoolIo in the context of calling CloseThreadpoolIo doesn't make sense (I will call CloseThreadpoolIo from the thread that triggers stopping, which will be completely independent from the threads issuing the asynchronous operations; and a single call to CancelThreadpoolIo may not be enough to cancel several operations). Being unable to trigger cancellation from a different thread is a serious limitation, anyway. I'm aware of the existence of CreateThreadpoolCleanupGroup, but my question is more fundamental. I want to understand how this API can be fundamentally right and useful.
You can say "call CreateThreadpoolIo several times, so that you have independent PTP_IO's to work with". It doesn't work. When I call CreateThreadpoolIo a second time, nullptr is returned.
Am I wrong, or is this API awkward? Normally, other asynchronous APIs work with one of these patterns:
Create an operation and receive a handle => call methods passing the handle.
Create a reusable handle => call methods (including starting operations) passing the handle.
The latest Windows threadpool API, in which the handle seems to be implicit, or there are several handles for the same operation (TP_IO, WSAOVERLAPPED, StartThreadpoolIo) and they aren't all explicitly linked together, uses neither of them.
Thank you very much for your help.
How can the framework know which operation to cancel when you call CancelThreadpoolIo? How does it cancel just the operation that failed
and not any of the pending ones?
CancelThreadpoolIo() doesn't cancel IO. It is reciprocal to StartThreadpoolIo(). StartThreadpoolIo() prepares threadpool to accept a completion. If threadpool doesn't expect a completion, it won't wait for it, thus you may miss it. If threadpool expects a completion but completion doesn't happen, threadpool may waste resources.
CancelThreadpoolIo() undoes whatever StartThreadpoolIo() did.
The documentation for the Win32 API PulseEvent() function (kernel32.dll) states that this function is “… unreliable and should not be used by new applications. Instead, use condition variables”. However, condition variables cannot be used across process boundaries like (named) events can.
I have a scenario that is cross-process, cross-runtime (native and managed code) in which a single producer occasionally has something interesting to make known to zero or more consumers. Right now, a well-known named event is used (and set to signaled state) by the producer using this PulseEvent function when it needs to make something known. Zero or more consumers wait on that event (WaitForSingleObject()) and perform an action in response. There is no need for two-way communication in my scenario, and the producer does not need to know if the event has any listeners, nor does it need to know if the event was successfully acted upon. On the other hand, I do not want any consumers to ever miss any events. In other words, the system needs to be perfectly reliable – but the producer does not need to know if that is the case or not. The scenario can be thought of as a “clock ticker” – i.e., the producer provides a semi-regular signal for zero or more consumers to count. And all consumers must have the correct count over any given period of time. No polling by consumers is allowed (performance reasons). The ticker is just a few milliseconds (20 or so, but not perfectly regular).
Raymen Chen (The Old New Thing) has a blog post pointing out the “fundamentally flawed” nature of the PulseEvent() function, but I do not see an alternative for my scenario from Chen or the posted comments.
Can anyone please suggest one?
Please keep in mind that the IPC signal must cross process boundries on the machine, not simply threads. And the solution needs to have high performance in that consumers must be able to act within 10ms of each event.
I think you're going to need something a little more complex to hit your reliability target.
My understanding of your problem is that you have one producer and an unknown number of consumers all of which are different processes. Each consumer can NEVER miss any events.
I'd like more clarification as to what missing an event means.
i) if a consumer started to run and got to just before it waited on your notification method and an event occurred should it process it even though it wasn't quite ready at the point that the notification was sent? (i.e. when is a consumer considered to be active? when it starts or when it processes its first event)
ii) likewise, if the consumer is processing an event and the code that waits on the next notification hasn't yet begun its wait (I'm assuming a Wait -> Process -> Loop to Wait code structure) then should it know that another event occurred whilst it was looping around?
I'd assume that i) is a "not really" as it's a race between process start up and being "ready" and ii) is "yes"; that is notifications are, effectively, queued per consumer once the consumer is present and each consumer gets to consume all events that are produced whilst it's active and doesn't get to skip any.
So, what you're after is the ability to send a stream of notifications to a set of consumers where a consumer is guaranteed to act on all notifications in that stream from the point where it acts on the first to the point where it shuts down. i.e. if the producer produces the following stream of notifications
1 2 3 4 5 6 7 8 9 0
and consumer a) starts up and processes 3, it should also process 4-0
if consumer b) starts up and processes 5 but is shut down after 9 then it should have processed 5,6,7,8,9
if consumer c) was running when the notifications began it should have processed 1-0
etc.
Simply pulsing an event wont work. If a consumer is not actively waiting on the event when the event is pulsed then it will miss the event so we will fail if events are produced faster than we can loop around to wait on the event again.
Using a semaphore also wont work as if one consumer runs faster than another consumer to such an extent that it can loop around to the semaphore call before the other completes processing and if there's another notification within that time then one consumer could process an event more than once and one could miss one. That is you may well release 3 threads (if the producer knows there are 3 consumers) but you cant ensure that each consumer is released just the once.
A ring buffer of events (tick counts) in shared memory with each consumer knowing the value of the event it last processed and with consumers alerted via a pulsed event should work at the expense of some of the consumers being out of sync with the ticks sometimes; that is if they miss one they will catch up next time they get pulsed. As long as the ring buffer is big enough so that all consumers can process the events before the producer loops in the buffer you should be OK.
With the example above, if consumer d misses the pulse for event 4 because it wasn't waiting on its event at the time and it then settles into a wait it will be woken when event 5 is produced and since it's last processed counted is 3 it will process 4 and 5 and then loop back to the event...
If this isn't good enough then I'd suggest something like PGM via sockets to give you a reliable multicast; the advantage of this would be that you could move your consumers off onto different machines...
The reason PulseEvent is "unreliable" is not so much because of anything wrong in the function itself, just that if your consumer doesn't happen to be waiting on the event at the exact moment that PulseEvent is called, it'll miss it.
In your scenario, I think the best solution is to manually keep the counter yourself. So the producer thread keeps a count of the current "clock tick" and when a consumer thread starts up, it reads the current value of that counter. Then, instead of using PulseEvent, increment the "clock ticks" counter and use SetEvent to wake all threads waiting on the tick. When the consumer thread wakes up, it checks it's "clock tick" value against the producer's "clock ticks" and it'll know how many ticks have elapsed. Just before it waits on the event again, it can check to see if another tick has occurred.
I'm not sure if I described the above very well, but hopefully that gives you an idea :)
There are two inherent problems with PulseEvent:
if it's used with auto-reset events, it releases one waiter only.
threads might never be awaken if they happen to be removed from the waiting queue due to APC at the moment of the PulseEvent.
An alternative is to broadcast a window message and have any listener have a top-level message -only window that listens to this particular message.
The main advantage of this approach is that you don't have to block your thread explicitly. The disadvantage of this approach is that your listeners have to be STA (can't have a message queue on an MTA thread).
The biggest problem with that approach would be that the processing of the event by the listener will be delayed with the amount of time it takes the queue to get to that message.
You can also make sure you use manual-reset events (so that all waiting threads are awaken) and do SetEvent/ResetEvent with some small delay (say 150ms) to give a bigger chance for threads temporarily woken by APC to pick up your event.
Of course, whether any of these alternative approaches will work for you depends on how often you need to fire your events and whether you need the listeners to process each event or just the last one they get.
If I understand your question correctly, it seems like you can simply use SetEvent. It will release one thread. Just make sure it is an auto-reset event.
If you need to allow multiple threads, you could use a named semaphore with CreateSemaphore. Each call to ReleaseSemaphore increases the count. If the count is 3, for example, and 3 threads wait on it, they will all run.
Events are more suitable for communications between the treads inside one process (unnamed events). As you have described, you have zero ore more clients that need to read something interested. I understand that the number of clients changes dynamically. In this case, the best chose will be a named pipe.
Named Pipe is King
If you need to just send data to multiple processes, it’s better to use named pipes, not the events. Unlike auto-reset events, you don't need own pipe for each of the client processes. Each named pipe has an associated server process and one or more associated client processes (and even zero). When there are many clients, many instances of the same named pipe are automatically created by the operating system for each of the clients. All instances of a named pipe share the same pipe name, but each instance has its own buffers and handles, and provides a separate conduit for client/server communication. The use of instances enables multiple pipe clients to use the same named pipe simultaneously. Any process can act as both a server for one pipe and a client for another pipe, and vice versa, making peer-to-peer communication possible.
If you will use a named pipe, there would be no need in the events at all in your scenario, and the data will have guaranteed delivery no matter what happens with the processes – each of the processes may get long delays (e.g. by a swap) but the data will be finally delivered ASAP without your special involvement.
On The Events
If you are still interested in the events -- the auto-reset event is king! ☺
The CreateEvent function has the bManualReset argument. If this parameter is TRUE, the function creates a manual-reset event object, which requires the use of the ResetEvent function to set the event state to non-signaled. This is not what you need. If this parameter is FALSE, the function creates an auto-reset event object, and system automatically resets the event state to non-signaled after a single waiting thread has been released.
These auto-reset events are very reliable and easy to use.
If you wait for an auto-reset event object with WaitForMultipleObjects or WaitForSingleObject, it reliably resets the event upon exit from these wait functions.
So create events the following way:
EventHandle := CreateEvent(nil, FALSE, FALSE, nil);
Wait for the event from one thread and do SetEvent from another thread. This is very simple and very reliable.
Don’t' ever call ResetEvent (since it automatically reset) or PulseEvent (since it is not reliable and deprecated). Even Microsoft has admitted that PulseEvent should not be used. See https://msdn.microsoft.com/en-us/library/windows/desktop/ms684914(v=vs.85).aspx
This function is unreliable and should not be used, because only those threads will be notified that are in the "wait" state at the moment PulseEvent is called. If they are in any other state, they will not be notified, and you may never know for sure what the thread state is. A thread waiting on a synchronization object can be momentarily removed from the wait state by a kernel-mode Asynchronous Procedure Call, and then returned to the wait state after the APC is complete. If the call to PulseEvent occurs during the time when the thread has been removed from the wait state, the thread will not be released because PulseEvent releases only those threads that are waiting at the moment it is called.
You can find out more about the kernel-mode Asynchronous Procedure Calls at the following links:
https://msdn.microsoft.com/en-us/library/windows/desktop/ms681951(v=vs.85).aspx
http://www.drdobbs.com/inside-nts-asynchronous-procedure-call/184416590
http://www.osronline.com/article.cfm?id=75
We have never used PulseEvent in our applications. As about auto-reset events, we are using them since Windows NT 3.51 (although they appeared in the first 32-bit version of NT - 3.1) and they work very well.
Your Inter-Process Scenario
Unfortunately, your case is a little bit more complicated. You have multiple threads in multiple processes waiting for an event, and you have to make sure that all the threads did in fact receive the notification. There is no other reliable way other than to create own event for each consumer. So, you will need to have as many events as are the consumers. Besides that, you will need to keep a list of registered consumers, where each consumer has an associated event name. So, to notify all the consumers, you will have to do SetEvent in a loop for all the consumer events. This is a very fast, reliable and cheap way. Since you are using cross-process communication, the consumers will have to register and de-register its events via other means of inter-process communication, like SendMessage. For example, when a consumer process registers itself at your main notifier process, it sends SendMessage to your process to request a unique event name. You just increment the counter and return something like Event1, Event2, etc, and creating events with that name, so the consumers will open existing events. When the consumer de-registers – it closes the event handle that it opened before, and sends another SendMessage, to let you know that you should CloseHandle too on your side to finally release this event object. If the consumer process crashes, you will end up with a dummy event, since you will not know that you should do CloseHandle, but this should not be a problem - the events are very fast and very cheap, and there is virtually no limit on the kernel objects - the per-process limit on kernel handles is 2^24. If you are still concerned, you may to the opposite – the clients create the events but you open them. If they won’t open – then the client has crashed and you just remove it from the list.
For another question, I'm running into a misconception that seems to arise here at SO occasionally. Some questioners seem to think that Triggers are to Databases as Events are to OOP.
Does anyone have a good analogy to explain why this is a flawed comparison, and the consequences of misapplying it?
EDIT:
Bill K. has hit it correctly, but maybe doesn't see the importance of the critical differeence between the event and the callback function that strikes me, anyway. Triggers actually cause code to execute every time the event occurs; callbacks only occur whenever one has been registered for an event (which is not true for the vast majority of events); and even then, in most cases the callback's first action is to deregister itself (or at least the callback contains a qualifcation exit so it only executes once.)
If you write a trigger, it will unfailingly execute every time the event occurs, because there's no way to register or deregister to code segment.
Triggers are a way to interpose repeating logic synchronously into the thread of execution (i.e. synchronicity). Events are a means to defer logic until later (i.e. implement asynchronicity).
There are exceptions and mitigations in both cases, but the basic patterns of triggers and callbacks are mostly opposite in intention and implementation. Often the distinction doesn't seem to have fully sunk in. (IMHO, YMMV). :D
They're not the same thing, but they're not unrelated.
In both cases, the mechanism can be described approximately as follows:
Some block of code declares "interest" for changes in state.
Your application affects some change.
The system runs the block of code in response to the change.
Perhaps a database trigger is more like a callback function that has registered interest in a specific event.
Here's an analogy: the event is a rubber ball that you throw. The trigger is a dog that chases after a thrown ball.
If there's some other difference that you have in mind that makes it "dangerous" (note: OP has edited this choice of word out of the question) to compare triggers and events, you can describe what you mean.
Triggers are a way to interpose
repeating logic synchronously into the
thread of execution (i.e.
synchronicity). Events are a means to
defer logic until later (i.e.
implement asynchronicity).
Okay, I see what you mean more clearly. But I think it's in some ways subject to the implementation. I wouldn't assume an event handler has to deregister itself; it depends on the system you're using. A UNIX signal handler, for example, has to prevent itself from catching a new signal while it's already handling one. But a Java servlet inside a Tomcat container should be thread-safe because it may be called concurrently by multiple threads. They're both event handlers, of different kinds.
Event handlers may be synchronous or asynchronous. Can a handler in a publish/subscribe system read messages that were posted recently, but prior to the handler registering its interest? Or only messages posted concurrently?
There's another important reason to treat triggers as different from event handlers: I frequently recommend against doing anything in a trigger that affects state outside the database.
For example, sending an email, writing to a file, posting to a web service, or forking a process is inappropriate inside a trigger. If for no other reason than the transaction that spawned the trigger may be rolled back, but you can't roll back those external effects. You may not even be using explicit transactions, but say you send an email in a BEFORE trigger, but the operation fails because of a NOT NULL constraint or something.
Instead, all such work should be done by code in one's application, after one has confirmed that the SQL operation was successful and the transaction committed.
It's too bad that people keep trying to do inappropriate work inside a trigger. There are senior developers at MySQL who promote UDFs to read and write data in memcached. Wow -- I just noticed these have made it into the MySQL 6.0 product!! Shocking!
So here's another attempt at an analogy, comparing triggers and events to the process of a criminal trial:
A BEFORE trigger is an allegation.
An AFTER trigger is an indictment.
COMMIT is a conviction after a guilty verdict.
ROLLBACK is an acquittal after an innocent verdict.
You only want to put the perpetrator in prison after they are convicted.
Whereas an EVENT is the crime itself.