Missing Kafka events with 2 consumers on a Topic - go

Hi I have 2 consumers on a kafka topic. Both of them belong to different consumer groups but consuming from the same topic. I am using Go kafka library to consume the messages through both the consumers.
This problem only occurs when I see a number of events on this topic. for ex when 1000 events are seen on this topic for some reason one consumer is receiving around 600 and the other around 200. This number varies from time to time. In normal circumstances when there is 1 event, It is always consumed by both of them.
And I also noticed that the missing message on one consumer is seen on the other and vice versa. Though there might be messages missing on both of them as well. This eliminates the producer not producing the messages.
Is there anyway in which consumer groups get mixed up. I'm not sure what exactly is going on.
Does anyone know what's going on and how I can debug this further?
Thanks in advance.

Related

ActiveMQ - Competing Consumers with Selector - messages starve in the queue

ActiveMQ 5.15.13
Context: I have a single queue with multiple Consumers. I want to stop some consumers from processing certain messages. This has to be dynamic, I don't want to create separate queues for this. This works without any problems. e.g. Consumer1 ignores Stocks -> Consumer1 can process all invoices and Consumer2 can process all Stocks
But if there is a large number of messages already in the Queue (of one type, e.g. stocks) and I send a message of another type (e.g. invoices), Consumer1 won't process the message of type invoices. It will instead be idle until Consumer2 has processed all Stocks messages. It does not happen every time, but quite often.
Is there any option to change the order of the new messages coming into the queue, such that an idle consumer with matching selector picks up the new message?
Things I've already tried:
using a PendingMessageLimitStrategy -> it seems like it does not work for queues
increasing the maxPageSize and maxBrowsePageSize in the hope that once all Messages are in RAM, the Consumers will search for their messages.
Exclusive Consumers aren't an option since I want to be able to use more than one Consumer per message type.
Im pretty sure that there is some configuration which allows this type of usage. I'm aware that there are better solutions for this issue, but sadly I can't use them easily due to other constraints.
Thanks a lot in advance!
EDIT: I noticed that when I'm refreshing on the localhost queue browser, the stuck messages get executed immediately. It seems like this action performs some sort of queue refresh where the messages get filtered based on their selector again. So I just need this action whenever a new message enters the queue...
This is a 'window' problem where the next set of 'stocks' data needs to be processed before the 'invoicing' data can be processed.
The gotcha with window problems like this is that you need to account for the fact that some messages may never come through, or a consumer may never come back online either. Also, eventually you will be asked 'how many invoices or stocks are left to be processed'-- aka observability.
ActiveMQ has you covered-- check out wild-card destinations and consumers.
Produce 'stocks' to:
queue://data.stocks.input
Produce 'invoices' to:
queue://data.invoices.input
You then setup consumes to connect:
queue://data.*.input
note: the wildard '*'.
ActiveMQ will match queues based on the wildcard pattern, and then process data accordingly. As a bonus, you can still use a selector.

Does EventStoreDB provide message ordering by an event-key on the consumer side?

I have been exploring EventStoreDB and trying to understand more about the ordering of messages on the consumer side. Read about persistent subscriptions and also the Pinned consumer strategy here.
I have a scenario wherein inventory updates get pushed to eventstore and different streams get created by the different unique inventoryIds in the inventory event.
We have multiple consumers with the same consumerGroup name to read these inventory events. We are using Pinned Persistent Subscription with ResolveLinkTos enabled.
My question:
Will every message from a particular stream always go to the same consumer instance of the consumerGroup?
If the answer to the above question is yes, will every message from that particular stream reach the particular consumer instance in the same order as the events were ingested?
The documentation has a warning that ordered message processing using persistent subscriptions is not guaranteed. Any strategy delivers messages with the best-effort level of ordering guarantees, if applicable.
There are a few reasons for this, some of those are:
Spreading out messages across consumer groups lead to a non-linearised checkpoint commit. It means that some messages can be processed before other messages.
Persistent subscriptions attempt to buffer messages, but when a timeout happens on the client side, the whole buffer is redelivered, which can eventually break the processing order
Built-in retry policies essentially can break the message order at any time
Most event log-based brokers, if not all, don't even attempt to guarantee ordered message delivery across multiple consumers. I often hear "but Kafka does it", ignoring the fact that Kafka delivers messages from one partition to at most one consumer in a group. There's no load balancing of one partition between multiple consumers due to exactly the same issue. That being said, EventStoreDB is still not a broker, but a database for events.
So, here are the answers:
Will every message from a particular stream always go to the same consumer instance of the consumer group?
No. It might work most of the time, but it will eventually break.
will every message from that particular stream reach the particular consumer instance in the same order as the events were ingested?
Most of the time, yes, but again, if a message is being retried, you might get the next message before the previous one is Acked.
Overall, load-balancing ordered processing of messages, which aren't pre-partitioned on the server is not an easy task. At most, you get messages re-delivered if the checkpoint fails to persist at some point, and the consumers restart.

One partition multiple consumers same group, consumer IDs

We have one topic with one partition due to ordering of message requirements. We have two consumers running on different servers with same set of configurations i.e. groupId, consumerId, consumerGroup. i.e.
1 Topic -> 1 Partition -> 2 Consumers
When we deploy consumers same code is deployed on both the servers. Noticed when a message comes we see both the consumers are consuming message rather than only one processing. Reason having consumers running on two separate servers is if one server crashes at least other can continue processing messages. But looks like if both up both consuming messages. Reading Kafka docs it says if we have more consumers than partitions then some stay idle don't see that happening. Anything we are missing on configuration side apart from consumerId & groupId. Thanks
As #Gary Russel said, as long as the two consumer instances have their own consumer group, they will consume every event that is written to the topic. Just put them into the same consumer-group. You can provide a consumer-group-id in the consumer.properties.

AWS SQS - Queue not delivering any messages until Visibility Timeout expires for one message

EDIT: Solved this one while I was writing it up :P -- I love those kind of solutions. I figured I'd post it anyway, maybe someone else will have the same problem and find my solution. Don't care about points/karma, etc. I just already wrote the whole thing up, so figured I'd post it and the solution.
I have an SQS FIFO queue. It is using a dead letter queue. Here is how it had been configured:
I have a single producer microservice, and I have 10 ECS images that are running as consumers.
It is important that we process the messages close to the time they are delivered in the queue for business reasons.
We're using a fairly recent version of the AWS SDK Golang client package for both producer and consumer code (if important, I can go look up the version, but it is not terribly outdated).
I capture the logs for the producer so I know exactly when messages were put in the queue and what the messages were.
I capture aggregate logs for all the consumers, so I have a full view of all 10 consumers and when messages were received and processed.
Here's what I see under normal conditions looking at the logs:
Message put in the queue at time x
Message received by one of the 10 consumers at time x
Message processed by consumer successfully
Message deleted from queue by consumer at time x + (0-2 seconds)
Repeat ad infinitum for up to about 700 messages / day at various times per day
But the problem I am seeing now is that some messages are not being processed in a timely manner. Occasionally we fail processing a message deliberately b/c of the state of the system for that message (e.g. maybe users still logged in, so it should back off and retry...which it does). The problem is if the consumer fails a message it is causing the queue to stop delivering any other messages to any other consumers.
"Failure to process a message" here just means the message was received, but the consumer declared it a failure, so we just log an error, and do not proceed to delete it from the queue. Thus, the visibility timeout (here 5m) will expire and it will be re-delivered to another consumer and retried up to 10 times, after which it will go to the dead letter queue.
After delving into the logs and analyzing it, here's what I'm seeing:
Process begins like above (message produced, consumed, deleted).
New message received at time x by consumer
Consumer fails -- logs error and just returns (does not delete)
Same message is received again at time x + 5m (visibility timeout)
Consumer fails -- logs error and just returns (does not delete)
Repeat up to 10x -- message goes to dead-letter queue
New message received but it is now 50 minutes late!
Now all messages that were put in the queue between steps 2-7 are 50 minutes late (5m visibility timeout * 10 retries)
All the docs I've read tells me the queue should not behave this way, but I've verified it several times in our logs. Sadly, we don't have a paid AWS support plan, or I'd file a ticket with them. But just consider the fact that we have 10 separate consumers all reading from the same queue. They only read from this queue. We don't have any other queues it is using.
For de-duplication we are using the automated hash of the message body. Messages are small JSON documents.
My expectation would be if we have a single bad message that causes a visibility timeout, that the queue would still happily deliver any other messages it has available while there are available consumers.
OK, so turns out I missed this little nugget of info about FIFO queues in the documentation:
https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/FIFO-queues.html
When you receive a message with a message group ID, no more messages
for the same message group ID are returned unless you delete the
message or it becomes visible.
I was indeed using the same Message Group ID. Hadn't given it a second thought. Just be aware, if you do that and any one of your messages fails to process, it will back up all other messages in the queue, until the time that the message is finally dealt with. The solution for me was to change the message group id. There is some business logic id I can postfix on it that will work for me.

Approach for taking action on reception of two different JMS messages

Say I have one JMS message FooCompleted
{"businessId": 1,"timestamp": "20140101 01:01:01.000"}
and another JMS message BazCompleted
{"businessId": 1,"timestamp": "20140101 01:02:02.000"}
The use case is that I want some action triggered when both messages have been received for the business id in question - essentially a join point of reception of the two messages. The two messages are published on two different queues and order between reception of FooCompleted and BazCompleted may change. In reality, I may need to have join of reception of several different messages for the businessId in question.
The naive approach was that to store the reception of the message in a db and check if message(s) its dependent join arm(s) have been received and only then kick off the action desired. Given that the problem seems generic enough, we were wondering if there is a better way to solve this.
Another thought was to move messages from these two queues into a third queue on reception. The listener on this third queue will be using a special avataar of DefaultMessageListenerContainer which overrides the doReceiveAndExecute to call receiveMessage for all outstanding messages in the queue and adding messages back to the queue whose all dependent messages have not yet arrived - the remaining ones will be acknowledged and hence removed. Given that the quantum of messages will be low, probing the queue over and adding messages again should not be a problem. The advantage would be avoiding the DB dependency and the associated scaffolding code. Wanted to see if there is something glaringly bad with this
Gurus, please critique and point out better ways to achieve this.
Thanks in advance!
Spring Integration with a JMS message-driven adapter and an aggregator with custom correlation and release strategies, and a peristent (JDBC) message store will provide your first solution without writing much (or any) code.

Resources