activemessaging with stomp and activemq.prefetchSize=1 - ruby

I have a situation where I have a single activemq broker with 2 queues, Q1 and Q2. I have two ruby-based consumers using activemessaging. Let's call them C1 and C2. Both consumers subscribe to each queue. I'm setting activemq.prefetchSize=1 when subscribing to each queue. I'm also setting ack=client.
Consider the following sequence of events:
1) A message that triggers a long-running job is published to queue Q1. Call this M1.
2) M1 is dispatched to consumer C1, kicking off a long operation.
3) Two messages that trigger short jobs are published to queue Q2. Call these M2 and M3.
4) M2 is dispatched to C2 which quickly runs the short job.
5) M3 is dispatched to C1, even though C1 is still running M1. It's able to dispatch to C1 because prefetchSize=1 is set on the queue subscription, not on the connection. So the fact that a Q1 message has already been dispatched doesn't stop one Q2 message from being dispatched.
Since activemessaging consumers are single-threaded, the net result is that M3 sits and waits on C1 for a long time until C1 finishes processing M1. So, M3 is not processed for a long time, despite the fact that consumer C2 is sitting idle (since it quickly finishes with message M2).
Essentially, whenever a long Q1 job is run and then a whole bunch of short Q2 jobs are created, exactly one of the short Q2 jobs gets stuck on a consumer waiting for the long Q1 job to finish.
Is there a way to set prefetchSize at the connection level rather than at the subscription level? I really don't want any messages dispatched to C1 while it is processing M1. The other alternative is that I could create a consumer dedicated to processing Q1 and then have other consumers dedicated to processing Q2. But, I'd rather not do that since Q1 messages are infrequent--Q1's dedicated consumers would sit idle most of the day tying up memory.

The activemq.prefetchSize header is only available on a SUBSCRIBE frame, not on CONNECT, according to the ActiveMQ docs for their extended STOMP headers (http://activemq.apache.org/stomp.html). Here is the relevant info:
verb: SUBSCRIBE
header: activemq.prefetchSize
type: int
description: Specifies the maximum number of pending messages that will be dispatched to the client. Once this maximum is reached no more messages are dispatched until the client acknowledges a message. Set to 1 for very fair distribution of messages across consumers where processing messages can be slow.
My reading of this, and my experience with it, is that since M1 has not been ack'd (because you have client ack turned on), M1 should be the one message allowed by prefetchSize=1 on that subscription. I am surprised to hear that it didn't work, but perhaps I need to run a more detailed test. Your settings should be correct for the behavior you want.
I have heard of flakiness from others about the activemq dispatch, so it is possible this is a bug with the version you are using.
One suggestion I would have is to either sniff the network traffic to see if M1 is getting ack'd for some reason, or throw some puts statements into the ruby stomp gem to watch the communication (this is what I usually end up doing when debugging stomp problems).
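If you do end up sniffing the traffic, the SUBSCRIBE frame for Q1 should look roughly like this on the wire (a sketch; the destination name is just an example, and the blank line plus the trailing ^@ is the NUL byte that terminates a STOMP frame):

SUBSCRIBE
destination:/queue/Q1
ack:client
activemq.prefetchSize:1

^@

If the activemq.prefetchSize header is missing from the frame, or an ACK frame for M1 shows up before the long job finishes, that would explain the dispatch behavior you're seeing.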
If I get a chance to try this out, I'll update my comment with my own results.
One other suggestion: it is very possible that multiple long-processing messages could be sent, and if the number of long-processing messages exceeds your number of processes, you'll be in the same fix, with quick-processing messages left waiting.
I tend to have at least one dedicated process that just does quick jobs, or to put it another way, dedicate a set number of processes to the longer jobs. Having every poller consumer process listen to both long and short destinations can end up with sub-optimal results no matter what dispatch does. Processor groups are the way to configure a consumer to listen to a subset of destinations: http://code.google.com/p/activemessaging/wiki/Configuration
processor_group name, *list_of_processors
A processor group is a way to run the poller to only execute a subset of the processors by passing the name of the group in the poller command line arguments.
You specify the name of the processor as its underscored lowercase version. So if you have a FooBarProcessor and BarFooProcessor in a processor group, it would look like this:
ActiveMessaging::Gateway.define do |s|
  ...
  s.processor_group :my_group, :foo_bar_processor, :bar_foo_processor
end
The processor group is passed into the poller like the following:
./script/poller start -- process-group=my_group

I'm not sure if ActiveMessaging supports this, but you could unsubscribe your other consumers when the long-processing message arrives and then re-subscribe them after it gets processed.
It should give you the desired effect.

Related

ActiveMQ - Competing Consumers with Selector - messages starve in the queue

ActiveMQ 5.15.13
Context: I have a single queue with multiple consumers. I want to stop some consumers from processing certain messages. This has to be dynamic; I don't want to create separate queues for this. Using selectors, this works without any problems, e.g. Consumer1 ignores Stocks, so Consumer1 can process all invoices and Consumer2 can process all Stocks.
But if there is a large number of messages already in the Queue (of one type, e.g. stocks) and I send a message of another type (e.g. invoices), Consumer1 won't process the message of type invoices. It will instead be idle until Consumer2 has processed all Stocks messages. It does not happen every time, but quite often.
Is there any option to change the order of the new messages coming into the queue, such that an idle consumer with matching selector picks up the new message?
Things I've already tried:
using a PendingMessageLimitStrategy -> it seems like it does not work for queues
increasing the maxPageSize and maxBrowsePageSize in the hope that once all Messages are in RAM, the Consumers will search for their messages.
Exclusive Consumers aren't an option since I want to be able to use more than one Consumer per message type.
I'm pretty sure that there is some configuration which allows this type of usage. I'm aware that there are better solutions for this issue, but sadly I can't use them easily due to other constraints.
Thanks a lot in advance!
EDIT: I noticed that when I refresh the localhost queue browser, the stuck messages get processed immediately. It seems like this action performs some sort of queue refresh where the messages get filtered based on their selectors again. So I just need this action to happen whenever a new message enters the queue...
This is a 'window' problem where the next set of 'stocks' data needs to be processed before the 'invoicing' data can be processed.
The gotcha with window problems like this is that you need to account for the fact that some messages may never come through, or a consumer may never come back online either. Also, eventually you will be asked 'how many invoices or stocks are left to be processed'-- aka observability.
ActiveMQ has you covered-- check out wild-card destinations and consumers.
Produce 'stocks' to:
queue://data.stocks.input
Produce 'invoices' to:
queue://data.invoices.input
You then set up consumers to connect to:
queue://data.*.input
Note the wildcard '*'.
ActiveMQ will match queues based on the wildcard pattern, and then process data accordingly. As a bonus, you can still use a selector.
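For example, an 'invoices' consumer over the wildcard destination might look roughly like this (a sketch using the ActiveMQ JMS client; the broker URL and the 'type' message property are assumptions for illustration):

import javax.jms.Connection;
import javax.jms.Message;
import javax.jms.MessageConsumer;
import javax.jms.Queue;
import javax.jms.Session;
import org.apache.activemq.ActiveMQConnectionFactory;

public class InvoiceConsumer {
    public static void main(String[] args) throws Exception {
        ActiveMQConnectionFactory factory =
                new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();
        connection.start();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);

        // Wildcard destination: matches data.stocks.input, data.invoices.input, etc.
        Queue input = session.createQueue("data.*.input");

        // The selector still applies on top of the wildcard match.
        MessageConsumer consumer = session.createConsumer(input, "type = 'invoices'");

        while (true) {
            Message msg = consumer.receive();
            // ... process the invoice message here ...
        }
    }
}

A 'stocks' consumer would use the same wildcard destination with a type = 'stocks' selector, so any new queue matching the pattern is picked up without reconfiguring the consumers.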

Does EventStoreDB provide message ordering by an event-key on the consumer side?

I have been exploring EventStoreDB and trying to understand more about the ordering of messages on the consumer side. Read about persistent subscriptions and also the Pinned consumer strategy here.
I have a scenario wherein inventory updates get pushed to eventstore and different streams get created by the different unique inventoryIds in the inventory event.
We have multiple consumers with the same consumerGroup name to read these inventory events. We are using Pinned Persistent Subscription with ResolveLinkTos enabled.
My question:
Will every message from a particular stream always go to the same consumer instance of the consumerGroup?
If the answer to the above question is yes, will every message from that particular stream reach the particular consumer instance in the same order as the events were ingested?
The documentation has a warning that ordered message processing using persistent subscriptions is not guaranteed. Any strategy delivers messages with the best-effort level of ordering guarantees, if applicable.
There are a few reasons for this, some of those are:
Spreading messages across the consumers in a group leads to a non-linearised checkpoint commit, which means that some messages can be processed before other messages.
Persistent subscriptions attempt to buffer messages, but when a timeout happens on the client side, the whole buffer is redelivered, which can eventually break the processing order.
Built-in retry policies can essentially break the message order at any time.
Most event log-based brokers, if not all, don't even attempt to guarantee ordered message delivery across multiple consumers. I often hear "but Kafka does it", ignoring the fact that Kafka delivers messages from one partition to at most one consumer in a group. There's no load balancing of one partition between multiple consumers due to exactly the same issue. That being said, EventStoreDB is still not a broker, but a database for events.
So, here are the answers:
Will every message from a particular stream always go to the same consumer instance of the consumer group?
No. It might work most of the time, but it will eventually break.
Will every message from that particular stream reach the particular consumer instance in the same order as the events were ingested?
Most of the time, yes. But again, if a message is being retried, you might get the next message before the previous one is acked.
Overall, load-balanced, ordered processing of messages that aren't pre-partitioned on the server is not an easy task. The most you can count on is that messages get re-delivered if the checkpoint fails to persist at some point and the consumers restart.

AWS SQS - Queue not delivering any messages until Visibility Timeout expires for one message

EDIT: Solved this one while I was writing it up :P -- I love those kinds of solutions. I figured I'd post it anyway; maybe someone else will have the same problem and find my solution. I don't care about points/karma, etc.; I had already written the whole thing up, so I figured I'd post it along with the solution.
I have an SQS FIFO queue. It is using a dead letter queue. Here is how it had been configured:
I have a single producer microservice, and I have 10 ECS images that are running as consumers.
It is important that we process the messages close to the time they are delivered in the queue for business reasons.
We're using a fairly recent version of the AWS SDK Golang client package for both producer and consumer code (if important, I can go look up the version, but it is not terribly outdated).
I capture the logs for the producer so I know exactly when messages were put in the queue and what the messages were.
I capture aggregate logs for all the consumers, so I have a full view of all 10 consumers and when messages were received and processed.
Here's what I see under normal conditions looking at the logs:
Message put in the queue at time x
Message received by one of the 10 consumers at time x
Message processed by consumer successfully
Message deleted from queue by consumer at time x + (0-2 seconds)
Repeat ad infinitum for up to about 700 messages / day at various times per day
But the problem I am seeing now is that some messages are not being processed in a timely manner. Occasionally we deliberately fail processing a message because of the state of the system for that message (e.g. maybe a user is still logged in, so the consumer should back off and retry...which it does). The problem is that when a consumer fails a message, the queue stops delivering any other messages to any other consumers.
"Failure to process a message" here just means the message was received, but the consumer declared it a failure, so we just log an error, and do not proceed to delete it from the queue. Thus, the visibility timeout (here 5m) will expire and it will be re-delivered to another consumer and retried up to 10 times, after which it will go to the dead letter queue.
After delving into the logs and analyzing it, here's what I'm seeing:
Process begins like above (message produced, consumed, deleted).
New message received at time x by consumer
Consumer fails -- logs error and just returns (does not delete)
Same message is received again at time x + 5m (visibility timeout)
Consumer fails -- logs error and just returns (does not delete)
Repeat up to 10x -- message goes to dead-letter queue
New message received but it is now 50 minutes late!
Now all messages that were put in the queue between steps 2-7 are 50 minutes late (5m visibility timeout * 10 retries)
All the docs I've read tell me the queue should not behave this way, but I've verified it several times in our logs. Sadly, we don't have a paid AWS support plan, or I'd file a ticket with them. But just consider the fact that we have 10 separate consumers all reading from the same queue. They only read from this queue, and we don't have any other queues in use.
For de-duplication we are using the automated hash of the message body. Messages are small JSON documents.
My expectation would be if we have a single bad message that causes a visibility timeout, that the queue would still happily deliver any other messages it has available while there are available consumers.
OK, so it turns out I missed this little nugget of info about FIFO queues in the documentation:
https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/FIFO-queues.html
When you receive a message with a message group ID, no more messages for the same message group ID are returned unless you delete the message or it becomes visible.
I was indeed using the same message group ID for every message. I hadn't given it a second thought. Just be aware: if you do that and any one of your messages fails to process, it will back up all the other messages in the queue until that message is finally dealt with. The solution for me was to change the message group ID; there is a business-logic ID I can append to it that will work for me.
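To illustrate the fix, sending with a per-job group ID looks roughly like this (a sketch with the AWS SDK for Java v2 rather than the Go SDK we actually use; jobId here is a stand-in for whatever business-logic ID you choose):

import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.SendMessageRequest;

public class Publisher {
    // Messages that share a MessageGroupId are delivered strictly in order, and an
    // unacknowledged message blocks the rest of its group until it is deleted or
    // becomes visible again. A per-job group ID keeps unrelated messages flowing.
    public static void send(SqsClient sqs, String queueUrl, String body, String jobId) {
        sqs.sendMessage(SendMessageRequest.builder()
                .queueUrl(queueUrl)
                .messageBody(body) // content-based deduplication hashes the body
                .messageGroupId("job-" + jobId)
                .build());
    }
}

With that change, a failing message only holds up its own group while it waits out the visibility timeout; everything else keeps flowing to the idle consumers.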

Multi-Thread Processing in .NET

I already have a few ideas, but I'd like to hear some differing opinions and alternatives from everyone if possible.
I have a Windows console app that uses Exchange web services to connect to Exchange and download e-mail messages. The goal is to take each individual message object, extract metadata, parse attachments, etc. The app is checking the inbox every 60 seconds. I have no problems connecting to the inbox and getting the message objects. This is all good.
Here's where I am accepting input from you: When I get a message object, I immediately want to process the message and do all of the busy work explained above. I was considering a few different approaches to this:
Queuing the e-mail objects up in a table and processing them one-by-one.
Passing the e-mail object off to a local Windows service to do the busy work.
I don't think db queuing would be a good approach because, at times, multiple e-mail objects need to be processed. It's not fair if a low-priority e-mail with 30 attachments is processed before a high-priority e-mail with 5 attachments is processed. In other words, e-mails lower in the stack shouldn't need to wait in line to be processed. It's like waiting in line at the store with a single register for the bonehead in front of you to scan 100 items. It's just not fair. Same concept for my e-mail objects.
I'm somewhat unsure about the Windows service approach. However, I'm pretty confident that I could have an installed service listening, waiting on demand for an instruction to process a new e-mail. If I have 5 separate e-mail objects, can I make 5 separate calls to the Windows service and process without collisions?
I'm open to suggestions or alternative approaches. However, the solution must be presented using .NET technology stack.
One option is to do the processing in the console application. What you have looks like a standard producer-consumer problem with one producer (the thread that gets the emails) and multiple consumers. This is easily handled with BlockingCollection.
I'll assume that your message type (what you get from the mail server) is called MailMessage.
So you create a BlockingCollection<MailMessage> at class scope. I'll also assume that you have a timer that ticks every 60 seconds to gather messages and enqueue them:
private BlockingCollection<MailMessage> MailMessageQueue =
    new BlockingCollection<MailMessage>();

// Timer is created as a one-shot and re-initialized at each tick.
// This prevents the timer proc from being re-entered if it takes
// longer than 60 seconds to run.
System.Threading.Timer ProducerTimer = new System.Threading.Timer(
    TimerProc, null, TimeSpan.FromSeconds(60), TimeSpan.FromMilliseconds(-1));

void TimerProc(object state)
{
    var newMessages = GetMessagesFromServer();
    foreach (var msg in newMessages)
    {
        MailMessageQueue.Add(msg);
    }
    ProducerTimer.Change(TimeSpan.FromSeconds(60), TimeSpan.FromMilliseconds(-1));
}
Your consumer threads just read the queue:
void MessageProcessor()
{
    foreach (var msg in MailMessageQueue.GetConsumingEnumerable())
    {
        ProcessMessage(msg);
    }
}
The timer will cause the producer to run once per minute. To start the consumers (say you want two of them):
var t1 = Task.Factory.StartNew(MessageProcessor, TaskCreationOptions.LongRunning);
var t2 = Task.Factory.StartNew(MessageProcessor, TaskCreationOptions.LongRunning);
So you'll have two threads processing messages.
It makes no sense to have more processing threads than you have available CPU cores. The producer thread presumably won't require a lot of CPU resources, so you don't have to dedicate a thread to it. It'll just slow down message processing briefly whenever it's doing its thing.
I've skipped over some detail in the description above, particularly cancellation of the threads. When you want to stop the program, but let the consumers finish processing messages, just kill the producer timer and set the queue as complete for adding:
MailMessageQueue.CompleteAdding();
The consumers will empty the queue and exit. You'll of course want to wait for the tasks to complete (see Task.Wait).
If you want the ability to kill the consumers without emptying the queue, you'll need to look into Cancellation.
The default backing store for BlockingCollection is a ConcurrentQueue, which is a strict FIFO. If you want to prioritize things, you'll need to come up with a concurrent priority queue that implements the IProducerConsumerCollection interface. .NET doesn't have such a thing (or even a priority queue class), but a simple binary heap that uses locks to prevent concurrent access would suffice in your situation; you're not talking about hitting this thing very hard.
Of course you'd need some way to prioritize the messages. Probably sort by number of attachments so that messages with no attachments are processed quicker. Another option would be to have two separate queues: one for messages with 0 or 1 attachments, and a separate queue for those with lots of attachments. You could have one of your consumers dedicated to the 0 or 1 queue so that easy messages always have a good chance of being processed first, and the other consumers take from the 0 or 1 queue unless it's empty, and then take from the other queue. It would make your consumers a little more complicated, but not hugely so.
If you choose to move the message processing to a separate program, you'll need some way to persist the data from the producer to the consumer. There are many possible ways to do that, but I just don't see the advantage of it.
I'm somewhat a novice here, but it seems like an initial approach could be to have a separate high-priority queue. Every time a worker is available to obtain a new message, it could do something like:
If lowPriorityQueue.Count > 0 AndAlso DateTime.Now - lowPriorityQueue.Peek().AddedTime > maxWaitTime Then
    ' The oldest low-priority message has waited too long; don't let it starve.
    ProcessMessage(lowPriorityQueue.Dequeue())
ElseIf highPriorityQueue.Count > 0 Then
    ProcessMessage(highPriorityQueue.Dequeue())
ElseIf lowPriorityQueue.Count > 0 Then
    ProcessMessage(lowPriorityQueue.Dequeue())
End If
In a single thread, while you can still have one message blocking the others, higher priority messages could be processed sooner.
Depending on how fast most messages get processed, the application could create a new worker on a new thread if the queues are getting too big or too old.
Please tell me if I'm completely off-base here though.

What are alternatives to Win32 PulseEvent() function?

The documentation for the Win32 API PulseEvent() function (kernel32.dll) states that this function is “… unreliable and should not be used by new applications. Instead, use condition variables”. However, condition variables cannot be used across process boundaries like (named) events can.
I have a scenario that is cross-process, cross-runtime (native and managed code) in which a single producer occasionally has something interesting to make known to zero or more consumers. Right now, a well-known named event is used (and set to signaled state) by the producer using this PulseEvent function when it needs to make something known. Zero or more consumers wait on that event (WaitForSingleObject()) and perform an action in response. There is no need for two-way communication in my scenario, and the producer does not need to know if the event has any listeners, nor does it need to know if the event was successfully acted upon. On the other hand, I do not want any consumers to ever miss any events. In other words, the system needs to be perfectly reliable – but the producer does not need to know if that is the case or not. The scenario can be thought of as a “clock ticker” – i.e., the producer provides a semi-regular signal for zero or more consumers to count. And all consumers must have the correct count over any given period of time. No polling by consumers is allowed (performance reasons). The tick interval is just a few milliseconds (20 or so, but not perfectly regular).
Raymond Chen (The Old New Thing) has a blog post pointing out the “fundamentally flawed” nature of the PulseEvent() function, but I do not see an alternative for my scenario from Chen or the posted comments.
Can anyone please suggest one?
Please keep in mind that the IPC signal must cross process boundaries on the machine, not simply threads. And the solution needs to have high performance, in that consumers must be able to act within 10ms of each event.
I think you're going to need something a little more complex to hit your reliability target.
My understanding of your problem is that you have one producer and an unknown number of consumers all of which are different processes. Each consumer can NEVER miss any events.
I'd like more clarification as to what missing an event means.
i) If a consumer has started up and is just about to wait on your notification mechanism when an event occurs, should it process that event even though it wasn't quite ready at the point the notification was sent? (i.e. when is a consumer considered to be active: when it starts, or when it processes its first event?)
ii) Likewise, if the consumer is processing an event and the code that waits on the next notification hasn't yet begun its wait (I'm assuming a Wait -> Process -> Loop to Wait code structure), should it know that another event occurred whilst it was looping around?
I'd assume that i) is a "not really", as it's a race between process start-up and being "ready", and ii) is a "yes"; that is, notifications are effectively queued per consumer once the consumer is present, and each consumer gets to consume all events that are produced whilst it's active and doesn't get to skip any.
So, what you're after is the ability to send a stream of notifications to a set of consumers where a consumer is guaranteed to act on all notifications in that stream from the point where it acts on the first to the point where it shuts down. i.e. if the producer produces the following stream of notifications
1 2 3 4 5 6 7 8 9 0
and consumer a) starts up and processes 3, it should also process 4-0
if consumer b) starts up and processes 5 but is shut down after 9 then it should have processed 5,6,7,8,9
if consumer c) was running when the notifications began it should have processed 1-0
etc.
Simply pulsing an event won't work. If a consumer is not actively waiting on the event when the event is pulsed then it will miss it, so we will fail if events are produced faster than we can loop around to wait on the event again.
Using a semaphore also won't work: if one consumer runs faster than another to such an extent that it can loop back around to the semaphore call before the other completes processing, and there's another notification within that time, then one consumer could process an event more than once and another could miss one. That is, you may well release 3 threads (if the producer knows there are 3 consumers), but you can't ensure that each consumer is released exactly once.
A ring buffer of events (tick counts) in shared memory, with each consumer remembering the value of the event it last processed and with consumers alerted via a pulsed event, should work at the expense of some of the consumers being out of sync with the ticks sometimes; that is, if they miss one pulse they will catch up the next time they get pulsed. As long as the ring buffer is big enough that all consumers can process the events before the producer wraps around the buffer, you should be OK.
With the example above, if consumer d misses the pulse for event 4 because it wasn't waiting on its event at the time and it then settles into a wait, it will be woken when event 5 is produced, and since its last processed count is 3 it will process 4 and 5 and then loop back to the event...
If this isn't good enough then I'd suggest something like PGM via sockets to give you a reliable multicast; the advantage of this would be that you could move your consumers off onto different machines...
The reason PulseEvent is "unreliable" is not so much because of anything wrong in the function itself, just that if your consumer doesn't happen to be waiting on the event at the exact moment that PulseEvent is called, it'll miss it.
In your scenario, I think the best solution is to manually keep the counter yourself. So the producer thread keeps a count of the current "clock tick", and when a consumer thread starts up, it reads the current value of that counter. Then, instead of using PulseEvent, increment the "clock ticks" counter and use SetEvent to wake all threads waiting on the tick. When a consumer thread wakes up, it checks its own "clock tick" value against the producer's "clock ticks" counter, and it'll know how many ticks have elapsed. Just before it waits on the event again, it can check to see if another tick has occurred.
I'm not sure if I described the above very well, but hopefully that gives you an idea :)
There are two inherent problems with PulseEvent:
if it's used with auto-reset events, it releases one waiter only.
Threads might never be awakened if they happen to be removed from the waiting queue due to an APC at the moment of the PulseEvent.
An alternative is to broadcast a window message and have each listener create a top-level message-only window that listens for this particular message.
The main advantage of this approach is that you don't have to block your thread explicitly. The disadvantage of this approach is that your listeners have to be STA (can't have a message queue on an MTA thread).
The biggest problem with that approach would be that the processing of the event by the listener will be delayed with the amount of time it takes the queue to get to that message.
You can also make sure you use manual-reset events (so that all waiting threads are awaken) and do SetEvent/ResetEvent with some small delay (say 150ms) to give a bigger chance for threads temporarily woken by APC to pick up your event.
Of course, whether any of these alternative approaches will work for you depends on how often you need to fire your events and whether you need the listeners to process each event or just the last one they get.
If I understand your question correctly, it seems like you can simply use SetEvent. It will release one thread. Just make sure it is an auto-reset event.
If you need to allow multiple threads, you could use a named semaphore with CreateSemaphore. Each call to ReleaseSemaphore increases the count. If the count is 3, for example, and 3 threads wait on it, they will all run.
Events are better suited to communication between threads inside one process (unnamed events). As you have described, you have zero or more clients that need to read something of interest. I understand that the number of clients changes dynamically. In this case, the best choice will be a named pipe.
Named Pipe is King
If you need to just send data to multiple processes, it's better to use named pipes, not events. Unlike auto-reset events, you don't need a separate pipe for each of the client processes. Each named pipe has an associated server process and one or more associated client processes (or even zero). When there are many clients, many instances of the same named pipe are automatically created by the operating system for each of the clients. All instances of a named pipe share the same pipe name, but each instance has its own buffers and handles, and provides a separate conduit for client/server communication. The use of instances enables multiple pipe clients to use the same named pipe simultaneously. Any process can act as both a server for one pipe and a client for another pipe, and vice versa, making peer-to-peer communication possible.
If you use a named pipe, there will be no need for events at all in your scenario, and the data will have guaranteed delivery no matter what happens to the processes – each of the processes may experience long delays (e.g. from being swapped out), but the data will still be delivered as soon as possible, without any special involvement on your part.
On The Events
If you are still interested in the events -- the auto-reset event is king! ☺
The CreateEvent function has the bManualReset argument. If this parameter is TRUE, the function creates a manual-reset event object, which requires the use of the ResetEvent function to set the event state to non-signaled. This is not what you need. If this parameter is FALSE, the function creates an auto-reset event object, and system automatically resets the event state to non-signaled after a single waiting thread has been released.
These auto-reset events are very reliable and easy to use.
If you wait for an auto-reset event object with WaitForMultipleObjects or WaitForSingleObject, it reliably resets the event upon exit from these wait functions.
So create events the following way:
EventHandle := CreateEvent(nil, FALSE, FALSE, nil);
Wait for the event from one thread and do SetEvent from another thread. This is very simple and very reliable.
Don't ever call ResetEvent (since the event resets automatically) or PulseEvent (since it is not reliable and deprecated). Even Microsoft has admitted that PulseEvent should not be used. See https://msdn.microsoft.com/en-us/library/windows/desktop/ms684914(v=vs.85).aspx
This function is unreliable and should not be used, because only those threads will be notified that are in the "wait" state at the moment PulseEvent is called. If they are in any other state, they will not be notified, and you may never know for sure what the thread state is. A thread waiting on a synchronization object can be momentarily removed from the wait state by a kernel-mode Asynchronous Procedure Call, and then returned to the wait state after the APC is complete. If the call to PulseEvent occurs during the time when the thread has been removed from the wait state, the thread will not be released because PulseEvent releases only those threads that are waiting at the moment it is called.
You can find out more about the kernel-mode Asynchronous Procedure Calls at the following links:
https://msdn.microsoft.com/en-us/library/windows/desktop/ms681951(v=vs.85).aspx
http://www.drdobbs.com/inside-nts-asynchronous-procedure-call/184416590
http://www.osronline.com/article.cfm?id=75
We have never used PulseEvent in our applications. As about auto-reset events, we are using them since Windows NT 3.51 (although they appeared in the first 32-bit version of NT - 3.1) and they work very well.
Your Inter-Process Scenario
Unfortunately, your case is a little bit more complicated. You have multiple threads in multiple processes waiting for an event, and you have to make sure that all the threads did in fact receive the notification. There is no reliable way to do that other than to create a separate event for each consumer. So, you will need as many events as there are consumers. Besides that, you will need to keep a list of registered consumers, where each consumer has an associated event name. So, to notify all the consumers, you will have to call SetEvent in a loop over all the consumer events. This is a very fast, reliable and cheap approach.
Since you are using cross-process communication, the consumers will have to register and de-register their events via some other means of inter-process communication, like SendMessage. For example, when a consumer process registers itself with your main notifier process, it sends a SendMessage to your process to request a unique event name. You just increment a counter and return something like Event1, Event2, etc., creating an event with that name so the consumer can open the existing event. When the consumer de-registers, it closes the event handle that it opened before and sends another SendMessage to let you know that you should CloseHandle on your side too, to finally release the event object.
If a consumer process crashes, you will end up with an orphaned event, since you will not know that you should call CloseHandle, but this should not be a problem – events are very fast and very cheap, and there is virtually no limit on kernel objects (the per-process limit on kernel handles is 2^24). If you are still concerned, you can do the opposite – the clients create the events and you open them. If one won't open, then that client has crashed and you just remove it from the list.
