Solution to slow consumer(eventProcessor) issue in LMAX Disruptor pattern - consumer

While using the disruptor, there may be a consumer(s) that is lagging behind, and because of that slow consumer, the whole application is affected.
Keeping in mind that every producer(Publisher) and consumer(EventProcessor) is running on a single thread each, what can be the solution to the slow consumer problem?
Can we use multiple threads on a single consumer? If not, what is a better alternative?

Generally speaking use a WorkerPool to allow multiple pooled worker threads to work on a single consumer, which is good if you have tasks that are independent and of a potentially variable duration (eg: some short tasks, some longer).
The other option is to have multiple independent workers parallel process over the events, but each worker only handle modulo N workers (eg 2 threads, and one thread processes odd, one thread processes even event IDs). This works great if you have consistent duration processing tasks, and allows batching to work very efficiently too.
Another thing to consider is that the consumer can do "batching", which is especially useful for example in auditing. If your consumer has 10 events waiting, rather than write 10 events to an audit log independently, you can collect all 10 events and write them at the same time. In my experience this more than covers the need to run multiple threads.

Try to separate slow part to other thread (I/O, not O(1) or O(log) calculations, etc.), or to apply some kind of back pressure when the consumer is overloaded (by yielding or temporary parking producers, replying with 503 or 429 status codes, etc.):
http://mechanical-sympathy.blogspot.com/2012/05/apply-back-pressure-when-overloaded.html

Use a set of identical eventHandlers. To avoid more than 1 eventHandler acting upon a single event, I use the following approach.
Create a thread pool of size Number of cores in the system
Executor executor = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors()); // a thread pool to which we can assign tasks
Then create a handler array
HttpEventHandler [] handlers = new HttpEventHandler[Runtime.getRuntime().availableProcessors()];
for(int i = 0; i<Runtime.getRuntime().availableProcessors();i++){
handlers[i] = new HttpEventHandler(i);
}
disruptor.handleEventsWith(handlers);
In the EventHandler
public void onEvent(HttpEvent event, long sequence, boolean endOfBatch) throws InterruptedException
{
if( sequence % Runtime.getRuntime().availableProcessors()==id){
System.out.println("-----On event Triggered on thread "+Thread.currentThread().getName()+" on sequence "+sequence+" -----");
//your event handler logic
}

Related

Design Pattern - Spring KafkaListener processing 1 million records in 1 hour

My spring boot application is going to listen to 1 million records an hour from a kafka broker. The entire processing logic for each message takes 1-1.5 seconds including a database insert. Broker has 64 partitions, which is also the concurrency of my #KafkaListener.
My current code is only able to process 90 records in a minute in a lower environment where I am listening to around 50k records an hour. Below is the code and all other config parameters like max.poll.records etc are default values:
#KafkaListener(id="xyz-listener", concurrency="64", topics="my-topic")
public void listener(String record) {
// processing logic
}
I do get "it is likely that the consumer was kicked out of the group" 7-8 times an hour. I think both of these issues can be solved through isolating listener method and multithreading processing of each message but I am not sure how to do that.
There are a few points to consider here. First, 64 consumers seems a bit too much for a single application to handle consistently.
Considering each poll by default fetches 500 records per consumer at a time, your app might be getting overloaded and causing the consumers to get kicked out of the group if a single batch takes more than the 5 minutes default for max.poll.timeout.ms to be processed.
So first, I'd consider scaling the application horizontally so that each application handles a smaller amount of partitions / threads.
A second way to increase throughput would be using a batch listener, and handling processing and DB insertions in batches as you can see in this answer.
Using both, you should be processing a sensible amount of work in parallel per app, and should be able to achieve your desired throughput.
Of course, you should load test each approach with different figures to have proper metrics.
EDIT: Addressing your comment, if you want to achieve this throughput I wouldn't give up on batch processing just yet. If you do the DB operations row by row you'll need a lot more resources for the same performance.
If your rule engine doesn't do any I/O you can iterate each record from the batch through it without losing performance.
About data consistency, you can try some strategies. For example, you can have a lock to ensure that even through a rebalance only one instance will process a given batch of records at a given time - or perhaps there's a more idiomatic way of handling that in Kafka using the rebalance hooks.
With that in place, you can batch load all the information you need to filter out duplicated / outdated records when you receive the records, iterate each record through the rule engine in memory, and then batch persist all results, to then release the lock.
Of course, it's hard to come up with an ideal strategy without knowing more details about the process. The point is by doing that you should be able to handle around 10x more records within each instance, so I'd definitely give it a shot.

Is it possible to share the same message queue between two threads?

What I would like to do is to have one thread waiting for messages (WaitMessage) and another processing the logic of the application. The first thread would wake up on every message, signal somehow this event to the other thread, go to sleep again, etc. Is this possible?
UPDATE
Consider the following situation. We have a GUI thread, and this thread is busy in a long calculation. If there is no other thread, there is no option but to check for new messages from time to time. Otherwise, the GUI would become irresponsive during the long calculation. Right now my system uses this "polling" approach (it has a single thread that checks the message queue from time to time.) However, I would like to know whether this other solution is possible: Have another thread waiting on the OS message queue of the GUI so that when a Windows message arrives this thread will wake up and tell the other about the message. Note that I'm not asking how to communicate the news between threads but whether it is possible for the second thread to wait for OS messages that arrive in the queue of the first thread.
I should also add that I cannot have two different threads, one for the GUI and another for the calculations, because the system I'm working on is a Virtual Machine on top of which runs a Smalltalk image that is not thread safe. That's why having a thread that only signals new OS messages would be the ideal solution (if possible.)
This depends on what the second thread needs to do once the first thread has received a message.
If the second thread simply needs to know the first thread received a message, the first thread could signal an Event object using SetEvent() or PulseEvent(), and the second thread could wait on that event using WaitForSingleObject().
If the second thread needs data from the first thread, it could use an I/O Completion Port. The first thread could wrap the data inside a dynamically allocated struct and post it to the port using PostQueuedCompletionStatus(), and the second thread could wait for the data using GetQueuedCompletionStatus() and then free it when done using it.
Update: based on new information you have provided, it is not possible for one thread to wait on or service another thread's message queue. Only the thread that created and owns the queue can poll messages from its queue. Each thread has its own message queue.
You really need to move your long calculations to a different thread, they don't belong in the GUI thread to begin with. Let the GUI thread manage the GUI and service messages, do any long-running things in another thread.
If you can't do that because your chosen library is not thread safe, then you have 4 options:
find a different library that is thread safe.
have the calculations poll the message queue periodically when running in the GUI thread.
break up the calculations into small chunks that can be triggered by the GUI thread posting messages to itself. Post a message and return to the message loop. When the message is received, do a little bit of work, post the next message, and return to the message loop. Repeat as needed until the work is done. This allows the GUI thread to continue servicing the message queue in between each calculation step.
move the library to a separate process that communicates back with your main app as needed.

creating delphi timers dynamically at runtime (performance, cpu consuming)

In my current project I have a structure like this:
Main Thread (GUI):
->Parser Thread
->Healer Thread
->Scripts Thread
the problem is that the Healer & Scripts Threads have to create childthreads with their appropiate timer, it would look like this:
->Parser Thread
->Healer Thread:
-->Healer 1
-->Healer 2
--> (...)
->Scripts Thread:
-->Script 1
--> (...)
For doing this I have thought about coding a dynamically Timer which would be created at runtime when a new Heal/Script is added.
Now the problem/question is:
maybe I have like 20 timers runing at the same time because of this, wouldn't this be a problem to my program performance (CPU consuming, etc)?
Is this the best way to achieve what I'm looking for?
Thanks in advance
There's no problem with having up to 20 timers active at one time in an application. Modern hardware is more than capable of handling that.
Remember also that timer messages are low priority messages and so are only synthesised when the message queue is empty. So, you need to keep the message queues of your threads serviced promptly in order for the messages to be delivered in a timely manner.
A bigger problem for you is that you cannot create TTimer instances outside the GUI/VCL thread. That's because the timer component calls AllocateHWnd which is not thread safe and can only be called from the GUI/VCL thread. So, you'll need to interact with the raw Win32 timer API directly and not use the VCL TTimer wrapper.

MFC CEvent class member function SetEvent , difference with Thread Lock() function?

what i s the difference between SetEvent() and Thread Lock() function? anyone please help me
Events are used when you want to start/continue processing once a certain task is completed i.e. you want to wait until that event occurs. Other threads can inform the waiting thread about the completion of this task using SetEvent.
On the other hand, critical section is used when you want only one thread to execute a block of code at a time i.e. you want a set of instructions to be executed by one thread without any other thread changing the state at that time. For example, you are inserting an item into a linked list which involves multiple steps, at that time you don't want another thread to come and try to insert one more object into the list. So you block the other thread until first one finishes using critical sections.
Events can be used for inter-process communication, ie synchronising activity amongst different processes. They are typically used for 'signalling' the occurrence of an activity (e.g. file write has finished). More information on events:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms686915%28v=vs.85%29.aspx
Critical sections can only be used within a process for synchronizing threads and use a basic lock/unlock concept. They are typically used to protect a resource from multi-threaded access (e.g. a variable). They are very cheap (in CPU terms) to use. The inter-process variant is called a Mutex in Windows. More info:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms682530%28v=vs.85%29.aspx

Forcing context switch in Windows

Is there a way to force a context switch in C++ to a specific thread, assuming I have the thread handle or thread ID?
No, you won't be able to force operating system to run the thread you want. You can use yield to force a context switch though...
yield in Win32 API is function SwitchToThread. If there is no other thread available for running, then a ZERO value will be returned and current thread will keep running anyway.
You can only encourage the Windows thread scheduler to pick a certain thread, you can't force it. You do so first by making the thread block on a synchronization object and signaling it. Secondary by bumping up its priority.
Explicit context switching is supported, you'll have to use fibers. Review SwitchToFiber(). A fiber is not a thread by a long shot, it is similar to a co-routine of old. Fibers' heyday has come and gone, they are not competitive with threads anymore. They have very crappy cpu cache locality and cannot take advantage of multiple cores.
The only way to force a particular thread to run is by using process/thread affinity, but I can't imagine ever having a problem for which this was a reasonable solution.
The only way to force a context switch is to force a thread onto a different processor using affinity.
In other words, what you are trying to do isn't really viable.
Calling SwitchToThread() will result in a context switch if there is another thread ready to run that are eligible to run on this processor. The documentation states it as follows:
If calling the SwitchToThread function
causes the operating system to switch
execution to another thread, the
return value is nonzero.
If there are no other threads ready to
execute, the operating system does not
switch execution to another thread,
and the return value is zero.
You can temporarily bump the priority of the other thread, while looping with Sleep(0) calls: this passes control to other threads. Suppose that the other thread has increased a lock variable and you need to wait until it becomes zero again:
// Wait until other thread releases lock
SetThreadPriority(otherThread, THREAD_PRIORITY_HIGHER);
while (InterlockedRead(&lock) != 0)
Sleep(0);
SetThreadPriority(otherThread, THREAD_PRIORITY_NORMAL);
I would check out the book Concurrent Programming for Windows. The scheduler seems to do a few things worth noting.
Sleep(0) only yields to higher priority threads (or possibly others at the same priority). This means you cannot fix priority inversion situations with just a Sleep(0), where other lower priority threads need to run. You must use SwitchToThread, Sleep a non-zero duration, or fully block on some kernel HANDLE.
You can create two synchronization objects (such as two events) and use the API SignalObjectAndWait.
If the hObjectToWaitOn is non-signaled and your other thread is waiting on the hObjectToSignal, the OS can theoretically perform quick context switch inside this API, before end of time slice.
And if you want the current thread to automatically resume, simply inform a small value (such as 50 or 100) on the dwMilliseconds.

Resources