Is there a way to setup a spring integration channel in such a way that lets say it only sends the messages to output channel once it has accumulated 50 incoming messages. To look at it from polling perspective, I want the polling process to be based on the number of messages instead of a fixed time interval .. somehow poll the previous channel possibly multiple times but only accept messages once it has enough to process
Use an <aggregator/> with a release-strategy-expression="size == 50" and a correlation-strategy-expression="'foo'" (and expire-groups-on-completion="true). The expire-groups setting allows the next group ('foo') to form.
Follow the aggregator with a simple <splitter /> (no expressions, just in/out channels).
The aggregator will accumulate messages until 50 arrive and then release them as a collection, and the splitter will split the collection back to single messages.
If you want to release based on size or elapsed time (release a short group if x seconds elapse) then configure a MessageGroupStoreReaper.
Related
ActiveMQ 5.15.13
Context: I have a single queue with multiple Consumers. I want to stop some consumers from processing certain messages. This has to be dynamic, I don't want to create separate queues for this. This works without any problems. e.g. Consumer1 ignores Stocks -> Consumer1 can process all invoices and Consumer2 can process all Stocks
But if there is a large number of messages already in the Queue (of one type, e.g. stocks) and I send a message of another type (e.g. invoices), Consumer1 won't process the message of type invoices. It will instead be idle until Consumer2 has processed all Stocks messages. It does not happen every time, but quite often.
Is there any option to change the order of the new messages coming into the queue, such that an idle consumer with matching selector picks up the new message?
Things I've already tried:
using a PendingMessageLimitStrategy -> it seems like it does not work for queues
increasing the maxPageSize and maxBrowsePageSize in the hope that once all Messages are in RAM, the Consumers will search for their messages.
Exclusive Consumers aren't an option since I want to be able to use more than one Consumer per message type.
Im pretty sure that there is some configuration which allows this type of usage. I'm aware that there are better solutions for this issue, but sadly I can't use them easily due to other constraints.
Thanks a lot in advance!
EDIT: I noticed that when I'm refreshing on the localhost queue browser, the stuck messages get executed immediately. It seems like this action performs some sort of queue refresh where the messages get filtered based on their selector again. So I just need this action whenever a new message enters the queue...
This is a 'window' problem where the next set of 'stocks' data needs to be processed before the 'invoicing' data can be processed.
The gotcha with window problems like this is that you need to account for the fact that some messages may never come through, or a consumer may never come back online either. Also, eventually you will be asked 'how many invoices or stocks are left to be processed'-- aka observability.
ActiveMQ has you covered-- check out wild-card destinations and consumers.
Produce 'stocks' to:
queue://data.stocks.input
Produce 'invoices' to:
queue://data.invoices.input
You then setup consumes to connect:
queue://data.*.input
note: the wildard '*'.
ActiveMQ will match queues based on the wildcard pattern, and then process data accordingly. As a bonus, you can still use a selector.
I have a channel in slack, to which a CI tool sends notification. The CI tool sends notification for failure for every operation and there is no way to filter it out. But I know that important notifications come from 12 AM to 2 AM. Is there a way that I can apply a filter daily on that channel between two time intervals ?
Yes. you can call the API method conversations.history, which will return messages from a channel. By settings the parameters oldest and latest accordingly you will only get messages from a specified timeframe.
Note that those parameters are provided as absolute timestamps (e.g. 1234567890.123456), so you need to calculate them for the current day.
I would like to process multiple messages at a time e.g. get 10 messages from the channel at a time and write them to a log file at once.
Given the scenario, can I write a service activator which will get messages in predefined set i.e. 5 or 10 messages and process it? If this is not possible then how to achieve this using Spring Integration.
That is exactly what you can get with the Aggregator. You can collect several messages to the group using simple expression like size() == 10. When the group is complete, the DefaultAggregatingMessageGroupProcessor emits a single message with the list of payloads of messages in the group. The result you can send to the service-activator for handling the batch at once.
UPDATE
Something like this:
.aggregate(aggregator -> aggregator
.correlationStrategy(message -> 1)
.releaseStrategy(group -> group.size() == 10)
.outputProcessor(g -> new GenericMessage<Collection<Message<?>>>(g.getMessages()))
.expireGroupsUponCompletion(true))
So, we correlate messages (group or buffer them) by the static 1 key.
The group (or buffer size is 10) and when we reach it we emit a single message which contains all the message from the group. After emitting the result we clean the store from this group to allow to form a new one for a fresh sequence of messages.
It depends on what is creating the messages in the first place; if a message-driven channel adapter, the concurrency in that adapter is the key.
For other message sources, you can use an ExecutorChannel as the input channel to the service activator, with an executor with a pool size of 10.
Depending on what is sending messages, you need to be careful about losing messages in the event of a server failure.
It's difficult to provide a general answer without more information about your application.
I have a system where we process text messages. Each message gets split up into sentences, and each sentence gets processed individually and the results of each sentence get published to a topic. This all happens asynchronously.
I want to be able to aggregate the results for the sentences.
The problem is that I want the window to end when the total number of sentences have been reached, or when a total amount of time has passed. Basically Tumbling time windows, but can end when a total number of results have been received.
Secondarily I want to be able to know when that window ends so that I can process the aggregation as an atomic event.
It's possible but you have to implement a custom processor - your requirements are simply to specific for the high-level API to cater for.
Your processor would store messages into a state store and use punctuate to periodically check if the window expired. It would also keep a running counter and check if the max number of results have been received. If either condition is met, it does the aggregation, removes messages from the state store and sends the results downstream.
You'd have to think about what to do on restart (failover/re-balancing). When starting up, the processor should inspect its state store and calculate the current running count and the window expiry time.
Now Apache Kafka offers you a way to wait closing the window. Here piece of code;
suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
For more, check it out.
I already have a few ideas, but I'd like to hear some differing opinions and alternatives from everyone if possible.
I have a Windows console app that uses Exchange web services to connect to Exchange and download e-mail messages. The goal is to take each individual message object, extract metadata, parse attachments, etc. The app is checking the inbox every 60 seconds. I have no problems connecting to the inbox and getting the message objects. This is all good.
Here's where I am accepting input from you: When I get a message object, I immediately want to process the message and do all of the busy work explained above. I was considering a few different approaches to this:
Queuing the e-mail objects up in a table and processing them one-by-one.
Passing the e-mail object off to a local Windows service to do the busy work.
I don't think db queuing would be a good approach because, at times, multiple e-mail objects need to be processed. It's not fair if a low-priority e-mail with 30 attachments is processed before a high-priority e-mail with 5 attachments is processed. In other words, e-mails lower in the stack shouldn't need to wait in line to be processed. It's like waiting in line at the store with a single register for the bonehead in front of you to scan 100 items. It's just not fair. Same concept for my e-mail objects.
I'm somewhat unsure about the Windows service approach. However, I'm pretty confident that I could have an installed service listening, waiting on demand for an instruction to process a new e-mail. If I have 5 separate e-mail objects, can I make 5 separate calls to the Windows service and process without collisions?
I'm open to suggestions or alternative approaches. However, the solution must be presented using .NET technology stack.
One option is to do the processing in the console application. What you have looks like a standard producer-consumer problem with one producer (the thread that gets the emails) and multiple consumers. This is easily handled with BlockingCollection.
I'll assume that your message type (what you get from the mail server) is called MailMessage.
So you create a BlockingCollection<MailMessage> at class scope. I'll also assume that you have a timer that ticks every 60 seconds to gather messages and enqueue them:
private BlockingCollection<MailMessage> MailMessageQueue =
new BlockingCollection<MailMessage>();
// Timer is created as a one-shot and re-initialized at each tick.
// This prevents the timer proc from being re-entered if it takes
// longer than 60 seconds to run.
System.Threading.Timer ProducerTimer = new System.Threading.Timer(
TimerProc, null, TimeSpan.FromSeconds(60), TimeSpan.FromMilliseconds(-1));
void TimerProc(object state)
{
var newMessages = GetMessagesFromServer();
foreach (var msg in newMessages)
{
MailMessageQueue.Add(msg);
}
ProducerTimer.Change(TimeSpan.FromSeconds(60), TimeSpan.FromMilliseconds(-1));
}
Your consumer threads just read the queue:
void MessageProcessor()
{
foreach (var msg in MailMessageQueue.GetConsumingEnumerable())
{
ProcessMessage();
}
}
The timer will cause the producer to run once per minute. To start the consumers (say you want two of them):
var t1 = Task.Factory.StartNew(MessageProcessor, TaskCreationOptions.LongRunning);
var t2 = Task.Factory.StartNew(MessageProcessor, TaskCreationOptions.LongRunning);
So you'll have two threads processing messages.
It makes no sense to have more processing threads than you have available CPU cores. The producer thread presumably won't require a lot of CPU resources, so you don't have to dedicate a thread to it. It'll just slow down message processing briefly whenever it's doing its thing.
I've skipped over some detail in the description above, particularly cancellation of the threads. When you want to stop the program, but let the consumers finish processing messages, just kill the producer timer and set the queue as complete for adding:
MailMessageQueue.CompleteAdding();
The consumers will empty the queue and exit. You'll of course want to wait for the tasks to complete (see Task.Wait).
If you want the ability to kill the consumers without emptying the queue, you'll need to look into Cancellation.
The default backing store for BlockingCollection is a ConcurrentQueue, which is a strict FIFO. If you want to prioritize things, you'll need to come up with a concurrent priority queue that implements the IProducerConsumerCollection interface. .NET doesn't have such a thing (or even a priority queue class), but a simple binary heap that uses locks to prevent concurrent access would suffice in your situation; you're not talking about hitting this thing very hard.
Of course you'd need some way to prioritize the messages. Probably sort by number of attachments so that messages with no attachments are processed quicker. Another option would be to have two separate queues: one for messages with 0 or 1 attachments, and a separate queue for those with lots of attachments. You could have one of your consumers dedicated to the 0 or 1 queue so that easy messages always have a good chance of being processed first, and the other consumers take from the 0 or 1 queue unless it's empty, and then take from the other queue. It would make your consumers a little more complicated, but not hugely so.
If you choose to move the message processing to a separate program, you'll need some way to persist the data from the producer to the consumer. There are many possible ways to do that, but I just don't see the advantage of it.
I'm somewhat a novice here, but it seems like an initial approach could be to have a separate high-priority queue. Every time a worker is available to obtain a new message, it could do something like:
If DateTime.Now - lowPriorityQueue.Peek.AddedTime < maxWaitTime Then
ProcessMessage(lowPriorityQueue.Dequeue())
Else If highPriorityQueue.Count > 0 Then
ProcessMessage(highPriorityQueue.Dequeue())
Else
ProcessMessage(lowPriorityQueue.Dequeue())
End If
In a single thread, while you can still have one message blocking the others, higher priority messages could be processed sooner.
Depending on how fast most messages get processed, the application could create a new worker on a new thread if the queues are getting too big or too old.
Please tell me if I'm completely off-base here though.