When multiple MessageConsumers connect to the same queue (WebSphere MQ), how do I load-balance the consumers? - jms

I am using WebSphere MQ 7, and I have two clients connected to the same QMgr and consuming messages from the same queue, using code like the following:
while (true) {
    TextMessage message = (TextMessage) consumer.receive(1000);
    if (message != null) {
        System.out.println("*********************" + message.getText());
    }
}
I found that only one client ever retrieves messages. Is there a way to load-balance message consumption across the two clients? Are there any configuration options on the MQ server side?

When managing queue handles, it is MUCH faster for WMQ to put them in a stack rather than a FIFO queue. So if messages arrive on the queue more slowly than they are processed, an instance can process a message and perform another GET, which WMQ pushes onto the top of the stack, before the next message arrives. The result is that only one instance will see messages in a low-volume use case.
In larger environments where there are many instances waiting on messages, it is possible that activity will round-robin amongst a portion of those instances while the other instances starve for messages. For example, with 10 GETters on the queue you may see three processing messages and 7 idle.
Although this is considerably faster for MQ, it is confusing to customers who are not aware of how it works internally and so they open PMRs asking this exact question. IBM had to choose among several alternatives:
Adding several code paths to manage by stack for performance when fully loaded, versus manage by FIFO for apparent balancing when lightly loaded. This bloats the code, adds many new decision points to introduce errors and solves a problem that was one of perception rather than reliability or performance.
Educate the customers as to how it works. Of course, once you document it, then you can't change it. The way I found out about this was attending the "WMQ Internals" presentation at IMPACT. It's not in the Infocenter so IBM can change it, but it is available for customers.
Do nothing. Although this is the best result from the code design point of view, the behavior is counter-intuitive. Users need to understand why things do not behave as expected and will waste time trying to find the configuration that results in the desired behavior, or open a PMR.
I don't know for sure that it still works this way but I expect that it does. The way I used to test it was to put many messages on the queue at once and then see how they were distributed. If you drop about 50 messages on the queue in one unit of work, you should see a better distribution between the two instances.
How do you drop 50 messages on the queue at once? First generate them with the applications turned off or to a spare queue. If you generated them in the target queue, use the Q program to move them to the spare queue. Now start the apps and make sure the queue's IPPROC count equals however many instances of the app you started. Using Q again, copy all of the messages to the original queue in a single unit of work. Since they all become available on the queue at once, your two app instances should both immediately be passed a message. If you used copy instead of move, you can repeat this as often as required.
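The effect described above can be reproduced with a toy model. The sketch below is not WMQ code: it assumes the queue manager always hands the next message to the waiter at the head of a deque, and that in the low-volume case a consumer finishes and re-issues its GET before the next message arrives. All class and method names are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

public class WaiterOrderDemo {

    /**
     * Low-volume case: messages arrive one at a time, and the consumer that
     * receives a message is waiting again before the next one arrives.
     * lifo=true re-queues the waiter at the head (stack), lifo=false at the
     * tail (FIFO). Returns how many messages each consumer received.
     */
    static int[] trickle(int consumers, int messages, boolean lifo) {
        Deque<Integer> waiters = new ArrayDeque<>();
        for (int c = 0; c < consumers; c++) {
            waiters.addLast(c);
        }
        int[] received = new int[consumers];
        for (int m = 0; m < messages; m++) {
            int c = waiters.pollFirst();   // waiter chosen for this message
            received[c]++;
            if (lifo) {
                waiters.addFirst(c);       // back on top of the stack
            } else {
                waiters.addLast(c);        // back of the line
            }
        }
        return received;
    }

    /**
     * Burst case: all messages become available at once, so every waiting
     * getter is handed a message before any of them can re-issue its GET.
     * Assumes every consumer takes equally long per message.
     */
    static int[] burst(int consumers, int messages) {
        int[] received = new int[consumers];
        int remaining = messages;
        while (remaining > 0) {
            for (int c = 0; c < consumers && remaining > 0; c++) {
                received[c]++;
                remaining--;
            }
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println("stack, trickle: " + Arrays.toString(trickle(2, 50, true)));
        System.out.println("FIFO,  trickle: " + Arrays.toString(trickle(2, 50, false)));
        System.out.println("burst of 50:    " + Arrays.toString(burst(2, 50)));
    }
}
```

With two consumers and 50 messages, the stack model gives everything to one consumer in the trickle case, while the FIFO model and the burst case split the messages evenly. Real timings are messier, but the shape matches the behavior in the question.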

Your client is not doing much, so one instance can probably handle the full load. Try implementing a more realistic workload, or, simpler yet, put a Thread.sleep in the client.

Related

Spring Boot Kafka: Consume same message with all instances for specific topic

I have a spring boot application (let's say it's called app-1) that is connected to a kafka cluster and that consumes from a specific topic, let's say the topic is called "foo". Topic foo always receives a message when another application (let's say it's called app-2) has imported a new foo-item into the database.
The topic is primarily meant to be used in a third application (let's say it's called app-3) which sends out e-mail notifications to people who may be interested in this new foo-item. App-3 is clustered, meaning there are multiple instances of it running at the same time. Kafka automatically balances the foo-topic messages between all these instances because they use the same consumer group id (group.id). This is good and in the case of app-3 it is actually desired.
In the case of app-2, however, the messages from the foo-topic are used for cache eviction. The logic is, basically, that if there is a new foo-item then the currently existing caches should probably be cleared, because their content depends on the foo-items. The issue is that app-2 is also clustered, which means that by default kafka-logic, every instance will only receive some of the messages sent to the foo-topic. This does not work correctly for this specific app, though, because whenever there is a new foo-item, all of the instances need to know about it because all of them need to clear their local caches.
From what I understand I have these two options if I want to keep the current logic:
Introduce a distributed cache for all instances of app-2 so that they all share the same cache. Then it does not matter if only one instance receives a foo-item, because the cache eviction will also affect the cache of the other instances; even though they never learned about the foo-item. I would like to avoid this solution, as a distributed cache would add a noticeable amount of complexity and also overhead.
Somehow manage to use a different consumer group id for each instance of app-2. Then they would be considered different consumers by kafka and they all would get each foo-topic message. However, I don't even know how to do this programmatically. The code of the application is not aware of replicated instances, and there is no way to access any information about which node it is running on. If I use a randomly generated string on startup, then each time such an instance restarts it would be considered a new consumer and would have to re-process all previous messages. That would be incorrect behavior as well.
Here is my bottom line question: Is it possible to make all instances of app-2 receive all messages from the foo-topic without completely breaking the way kafka is supposed to work? I know that it is probably very unconventional to use kafka-messages for cache eviction and I am entirely able to find an alternative mechanism for the cache eviction logic that does not depend on kafka-topic messages. However, the applications are for demonstration purposes and I thought it would be cool if more than one app read from this topic. But if I end up having to hack a dirty workaround to make it work then it's also bad for demonstration purposes and I would rather implement an alternative way of cache eviction.
As you mentioned, you could give each instance a different consumer group id built from a random string.
If notifications are being read from the beginning, then you probably have ConsumerConfig.AUTO_OFFSET_RESET_CONFIG set to "earliest" somewhere in your consumer configuration. If this is the case, removing it (so that the default, "latest", applies) will probably solve your problem: when the app starts, it will only receive notifications sent after the consumer started listening.
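A minimal sketch of that idea, using only java.util so it runs without the Kafka client on the classpath. "group.id" and "auto.offset.reset" are the standard Kafka consumer property names; the "app-2-cache-eviction-" prefix and the class and method names are assumptions made up for this example.

```java
import java.util.Properties;
import java.util.UUID;

public class PerInstanceGroupId {

    /**
     * Builds consumer settings in which each application instance joins its
     * own consumer group, so every instance receives every record on the topic.
     */
    static Properties consumerProperties(String instanceSuffix) {
        Properties props = new Properties();
        // Hypothetical naming scheme: fixed prefix plus a per-instance suffix.
        props.setProperty("group.id", "app-2-cache-eviction-" + instanceSuffix);
        // A brand-new group has no committed offsets. "latest" makes it start
        // with records produced after it subscribes instead of replaying the topic.
        props.setProperty("auto.offset.reset", "latest");
        return props;
    }

    public static void main(String[] args) {
        // A fresh random suffix per start-up, so no node identity is needed.
        Properties props = consumerProperties(UUID.randomUUID().toString());
        System.out.println(props.getProperty("group.id"));
    }
}
```

These values would then be merged into whatever configuration the consumer is built from. Messages published while an instance is down are skipped, which should be acceptable for cache eviction because a freshly started instance has an empty cache anyway.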

ActiveMQ alerting old messages

I am using ActiveMQ and want to generate alerts for messages that have been sitting in the queue for a very long time. I looked at the "Advisory Message" feature but it has no such provision. It is very important for me to use a solution which does not add too much overhead on AMQ.
Note: This requirement is very different from alerts when a message moves to the DLQ after expiry.
The only means of reviewing what is in a Queue really is to browse it and the broker will place limitations on how far into the contents of the queue you can browse.
A message broker is not a database and you should not try to treat it as such. If you have concerns about things remaining on a queue for too long, then explicit expiration is your most effective tool.
You can build your own tooling to track the advisories around message enqueue and dequeue, but you'd just end up needing to persist that information to make it effective, so going back and reevaluating why you need to do this and what might be a better choice of architecture might be appropriate.
If you insist on auditing the contents of the queues, then you'd want to look at the configuration for max browse page size to try to get further into the queue on a browse, but depending on depth this probably won't get you everything you want.

Fault tolerant redundancy

This might result in biased and opinion based answers, if so I'll close the question but...
I have a rather basic requirement of improving our up-time and speed. As part of this I'm looking at the two main competing approaches, traditional pub/sub and akka.net. We don't have any issues currently or expect to have any need for concurrency control.
What we have is several basic workflows which are data analysis, manipulation and persistence of the result:
Step 1) Capture work to be done (i.e. what objects need to do some work)
Step 2) Execute that work load and produce a result
Step 3) Save result
Using traditional pub/sub, this seems rather easy. Have microservices for each step, and push a message at the end of each step with the data required (or, more to the point, data that might be useful) for the next step. Using any off-the-shelf message queue/topic/subscription software, this provides a nice ability to:
1) geographically spread the loads around the world to where the source data is located
2) increase the number of "workers" that subscribe to increase throughput
3) push to something central that can support the idea of connecting "workers" with a minimal learning curve
4) any component (or set of workers for a component) further down the workflow has/have a queue where the messages queue and wait for said component to come back online (even if the whole component disconnects)
5) adding new components that do something new and different, is as easy as registering a new subscription to a topic.
It's all pretty much out-of-the-box easy joy... assuming sensible aggregate and bounded context patterns are adhered to here. I'm not seeking advice on how to write good distributed code; I'm looking for how to deploy it, support it, debug rogue/missing/corrupt messages, etc. Which is why I want to know what Akka.net offers.
I've seen there's Akka.net clustering. It may or may not be production ready yet, but it's best I understand what it can/could do for us.
So the main questions I have are:
1) Where are messages stored prior to arriving? So long as a publisher has access to the messaging bus/software endpoint, any such software will store and hold messages waiting for a subscriber to connect and pick up its messages (obvious assumptions about the subscription having already been registered so the messages queue for it). How does Akka.net cluster handle all of this?
2) What tooling exists for operational support of these queues and mailboxes in Akka.net cluster? What tools give an operator insight into what is in a mailbox received but waiting to be processed and what tools exist for viewing what has been "published" and not yet "received"? Most competing Pub/Sub software has operational tools so I'm looking for some comparison here.
3) How do you debug rogue, missing or corrupt messages? We all know we should trust our software, but a bad message can cause a system to spiral out of control, so how would I eject a bad message from the system? How can I modify a message so it's going to behave differently because the business needs something fixed at 3:30 am? How can I answer "where is my message" with "it IS in the system and it IS waiting to be received" or "it has been received and is just in the mailbox"?
4) If a component goes down HARD (recycle, hardware failure, whatever), what will restore the mailboxes, queues, etc.? Any message that's actually being processed has an acceptable loss tolerance, but 1000 messages in a mailbox getting lost isn't so tolerable. What persistence and tolerance is there?
5) The light review I've done appears to advocate for a supervisor pattern to be built into your software to marshal messages around (I'm guessing to manage and release concurrency locks?). Given concurrency isn't an issue here, what out-of-the-box pub/sub mechanism do you support that isn't basic message remoting between two (or x, internally defined in code) components? Again, with subscriptions and topics in most pub/sub software, your first object pushes a message (it's central, so it's a potential single point of failure), but that component (like any other code) doesn't have to be aware of what will consume that message. It's expansion nirvana compared to the old-school way where we manually pushed a message from one object to the next (and to the next), rebuilding or recompiling for each new class that same message had to go to. I'm keen to not have to build our own message router.
6) When all instances of a particular component go offline (say step 3 above), what remembers that there's actually something there that needs to queue and remember those messages (say the ones pushed blindly from step 2 above)? In other software, until you delete the subscription the messages keep queuing up based on whatever rules are defined for TTL etc. What is provided for this?

Multi-Thread Processing in .NET

I already have a few ideas, but I'd like to hear some differing opinions and alternatives from everyone if possible.
I have a Windows console app that uses Exchange web services to connect to Exchange and download e-mail messages. The goal is to take each individual message object, extract metadata, parse attachments, etc. The app is checking the inbox every 60 seconds. I have no problems connecting to the inbox and getting the message objects. This is all good.
Here's where I am accepting input from you: When I get a message object, I immediately want to process the message and do all of the busy work explained above. I was considering a few different approaches to this:
Queuing the e-mail objects up in a table and processing them one-by-one.
Passing the e-mail object off to a local Windows service to do the busy work.
I don't think db queuing would be a good approach because, at times, multiple e-mail objects need to be processed. It's not fair if a low-priority e-mail with 30 attachments is processed before a high-priority e-mail with 5 attachments is processed. In other words, e-mails lower in the stack shouldn't need to wait in line to be processed. It's like waiting in line at the store with a single register for the bonehead in front of you to scan 100 items. It's just not fair. Same concept for my e-mail objects.
I'm somewhat unsure about the Windows service approach. However, I'm pretty confident that I could have an installed service listening, waiting on demand for an instruction to process a new e-mail. If I have 5 separate e-mail objects, can I make 5 separate calls to the Windows service and process without collisions?
I'm open to suggestions or alternative approaches. However, the solution must be presented using .NET technology stack.
One option is to do the processing in the console application. What you have looks like a standard producer-consumer problem with one producer (the thread that gets the emails) and multiple consumers. This is easily handled with BlockingCollection.
I'll assume that your message type (what you get from the mail server) is called MailMessage.
So you create a BlockingCollection<MailMessage> at class scope. I'll also assume that you have a timer that ticks every 60 seconds to gather messages and enqueue them:
private BlockingCollection<MailMessage> MailMessageQueue =
    new BlockingCollection<MailMessage>();

// Timer is created as a one-shot and re-initialized at each tick.
// This prevents the timer proc from being re-entered if it takes
// longer than 60 seconds to run.
System.Threading.Timer ProducerTimer = new System.Threading.Timer(
    TimerProc, null, TimeSpan.FromSeconds(60), TimeSpan.FromMilliseconds(-1));

void TimerProc(object state)
{
    var newMessages = GetMessagesFromServer();
    foreach (var msg in newMessages)
    {
        MailMessageQueue.Add(msg);
    }
    ProducerTimer.Change(TimeSpan.FromSeconds(60), TimeSpan.FromMilliseconds(-1));
}
Your consumer threads just read the queue:
void MessageProcessor()
{
    // Blocks while the queue is empty; ends after CompleteAdding() once the queue drains.
    foreach (var msg in MailMessageQueue.GetConsumingEnumerable())
    {
        ProcessMessage(msg);
    }
}
The timer will cause the producer to run once per minute. To start the consumers (say you want two of them):
var t1 = Task.Factory.StartNew(MessageProcessor, TaskCreationOptions.LongRunning);
var t2 = Task.Factory.StartNew(MessageProcessor, TaskCreationOptions.LongRunning);
So you'll have two threads processing messages.
If the processing is CPU-bound, it makes no sense to have more processing threads than you have available CPU cores. The producer thread presumably won't require a lot of CPU resources, so you don't have to dedicate a thread to it. It'll just slow down message processing briefly whenever it's doing its thing.
I've skipped over some detail in the description above, particularly cancellation of the threads. When you want to stop the program, but let the consumers finish processing messages, just kill the producer timer and set the queue as complete for adding:
MailMessageQueue.CompleteAdding();
The consumers will empty the queue and exit. You'll of course want to wait for the tasks to complete (see Task.Wait).
If you want the ability to kill the consumers without emptying the queue, you'll need to look into Cancellation.
The default backing store for BlockingCollection is a ConcurrentQueue, which is a strict FIFO. If you want to prioritize things, you'll need to come up with a concurrent priority queue that implements the IProducerConsumerCollection interface. .NET doesn't have such a thing built in (the PriorityQueue class added in .NET 6 is not thread-safe and does not implement that interface), but a simple binary heap that uses locks to prevent concurrent access would suffice in your situation; you're not talking about hitting this thing very hard.
Of course you'd need some way to prioritize the messages. Probably sort by number of attachments so that messages with no attachments are processed quicker. Another option would be to have two separate queues: one for messages with 0 or 1 attachments, and a separate queue for those with lots of attachments. You could have one of your consumers dedicated to the 0 or 1 queue so that easy messages always have a good chance of being processed first, and the other consumers take from the 0 or 1 queue unless it's empty, and then take from the other queue. It would make your consumers a little more complicated, but not hugely so.
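To show the ordering idea in runnable form, here is a sketch in Java (the language of the JMS snippet at the top of this page), where the standard library already ships a thread-safe priority queue, PriorityBlockingQueue. It is an analogue of the .NET design, not a translation of it: the Mail class is a hypothetical stand-in for the real message type, and prioritizing purely by attachment count is an assumption taken from the paragraph above.

```java
import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;

public class AttachmentPriorityDemo {

    /** Hypothetical stand-in for the mail message type; only what the ordering needs. */
    static final class Mail {
        final String subject;
        final int attachmentCount;

        Mail(String subject, int attachmentCount) {
            this.subject = subject;
            this.attachmentCount = attachmentCount;
        }
    }

    /** Thread-safe queue that hands out the message with the fewest attachments first. */
    static PriorityBlockingQueue<Mail> newQueue() {
        return new PriorityBlockingQueue<>(
                16, Comparator.comparingInt((Mail m) -> m.attachmentCount));
    }

    public static void main(String[] args) throws InterruptedException {
        PriorityBlockingQueue<Mail> queue = newQueue();
        queue.put(new Mail("quarterly report", 30));
        queue.put(new Mail("quick question", 0));
        queue.put(new Mail("invoice", 5));

        // take() blocks while the queue is empty, much like iterating
        // GetConsumingEnumerable() does on the .NET side.
        while (!queue.isEmpty()) {
            Mail next = queue.take();
            System.out.println(next.attachmentCount + " attachment(s): " + next.subject);
        }
    }
}
```

The messages come out in the order 0, 5, 30 attachments regardless of insertion order. Two caveats apply to any such scheme: messages with equal priority are not guaranteed to come out in arrival order, and a steady stream of light messages can starve the heavy ones, which is what the dedicated-consumer, two-queue variant above is meant to avoid.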
If you choose to move the message processing to a separate program, you'll need some way to persist the data from the producer to the consumer. There are many possible ways to do that, but I just don't see the advantage of it.
I'm somewhat a novice here, but it seems like an initial approach could be to have a separate high-priority queue. Every time a worker is available to obtain a new message, it could do something like:
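' Serve a low-priority message first only if it has already waited too long.
If lowPriorityQueue.Count > 0 AndAlso DateTime.Now - lowPriorityQueue.Peek().AddedTime > maxWaitTime Then
    ProcessMessage(lowPriorityQueue.Dequeue())
ElseIf highPriorityQueue.Count > 0 Then
    ProcessMessage(highPriorityQueue.Dequeue())
ElseIf lowPriorityQueue.Count > 0 Then
    ProcessMessage(lowPriorityQueue.Dequeue())
End If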
In a single thread, while you can still have one message blocking the others, higher priority messages could be processed sooner.
Depending on how fast most messages get processed, the application could create a new worker on a new thread if the queues are getting too big or too old.
Please tell me if I'm completely off-base here though.

How can I monitor/manage queue in ZeroMQ?

First of all, I'm new to ZeroMQ and message queue systems, so what I'm trying to do may be solved through a different approach. I'm designing a messaging system that does the following:
Multiple clients connect to a broker and send the id of an item that needs to be processed. The client disconnects immediately and does not wait for a response.
The broker sends items to workers, one item per worker, to perform some processing. Each worker returns a signal that the processing was completed.
I have a rudimentary system setup which is processing requests/replies correctly, but I'd also like to be able to do the following:
Query the broker to see how many processes are actually running on the workers and how many are simply waiting to be run.
Have the broker ensure that only one process per id is running - if a duplicate id arrives and that item is not currently being processed by a worker, do not add it to the queue.
I'm using a poll setup with broker/dealer sockets. The code I'm using is very similar to this example from Ian Barber.
My first inclination (although I'm not sure how to implement it in zmq) is to have the broker keep track of the ids that have been received, and those that are actively being processed by workers. It seems that the broker forwards requests to workers immediately, regardless of whether or not they are available to actually run the processing. The workers then queue up the ids and process them in order. This isn't ideal since I'm looking to be able to monitor and control what is going on in the system centrally to achieve reliability.
Anyways, any hints, tips or examples of this type of setup would be greatly appreciated.
ZeroMQ is, in my opinion, best used in broker-less designs, for which the library is designed. If you want to monitor the number of items in a queue, or throughput, or whatever, you're going to have to build that into the application/device/producer yourself. Since you're new to messaging, that could get out of hand real quick. Given this, I'd suggest looking into RabbitMQ (or a similar broker), which would provide these services for you out of the box. If you do adopt RabbitMQ (or rather, AMQP), I'd suggest using a fanout exchange for the scenario you describe above.
The Python library for ZeroMQ seems to come with a pattern for dealing with this: http://zeromq.github.com/pyzmq/devices.html#monitoredqueue
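Whichever transport is used, the bookkeeping the question asks for (pending versus running counts, and no duplicate ids) can live in a small broker-side structure. The sketch below is plain Java with no ZeroMQ calls. It assumes workers announce when they are free, so the broker hands out one id at a time instead of forwarding eagerly, and it assumes a duplicate id should be dropped whether the original is still waiting or already running. The class and method names are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

public class JobTracker {
    private final Queue<String> pending = new ArrayDeque<>(); // ids waiting for a worker
    private final Set<String> known = new HashSet<>();        // ids that are pending or running
    private final Set<String> running = new HashSet<>();      // ids currently on a worker

    /** Queues an id unless it is already pending or running; returns false for a duplicate. */
    public synchronized boolean submit(String id) {
        if (!known.add(id)) {
            return false;
        }
        pending.add(id);
        return true;
    }

    /** Call when a worker reports it is free; returns the next id, or null if none is waiting. */
    public synchronized String nextForWorker() {
        String id = pending.poll();
        if (id != null) {
            running.add(id);
        }
        return id;
    }

    /** Call when a worker signals that processing of the id has finished. */
    public synchronized void completed(String id) {
        if (running.remove(id)) {
            known.remove(id); // the id may be submitted again from now on
        }
    }

    public synchronized int pendingCount() {
        return pending.size();
    }

    public synchronized int runningCount() {
        return running.size();
    }

    public static void main(String[] args) {
        JobTracker tracker = new JobTracker();
        tracker.submit("item-1");
        tracker.submit("item-1"); // ignored: already pending
        tracker.submit("item-2");
        tracker.nextForWorker();  // item-1 is now running
        System.out.println("pending=" + tracker.pendingCount()
                + " running=" + tracker.runningCount());
    }
}
```

The broker loop would call submit when a client message arrives, nextForWorker when a worker says it is ready, and completed when the worker's done signal comes back. The two count methods answer the monitoring question directly.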
