How to read events only once in Azure Event Hub

I want to read events from Azure Event Hub only once, so that each event gets processed exactly once.
But while reading these events I am getting the same event in all partitions.
Tech stack: Spring Boot, reading events via
EventProcessorHost host = EventProcessorHost.EventProcessorHostBuilder
        .newBuilder(EventProcessorHost.createHostName(hostNamePrefix), consumerGroupName)
        .useAzureStorageCheckpointLeaseManager(storageConnectionString, storageContainerName, null)
        .useEventHubConnectionString(eventHubConnectionString.toString(), eventHubName)
        .build();
Thanks.

You can create up to 32 partitions on an Event Hub, but when you send a message it goes to only a single partition (if you do not set a partition ID, Event Hubs sends the message to a random partition).
Event Hubs retains data for a configured retention time that applies across all partitions in the event hub. Events expire on a time basis; you cannot explicitly delete them. Because partitions are independent and contain their own sequence of data, they often grow at different rates.
So this may be an issue with your code. To confirm this behaviour you can use Microsoft's Service Bus Explorer to send and receive messages from the Event Hub. When you create an Event Hub listener through Service Bus Explorer, you can see the partition ID and messages in real time.
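For context, with the EventProcessorHost SDK shown in the question, an event lives on exactly one partition and is delivered to the processor instance that currently owns that partition; checkpointing is what prevents the same events from being read again after a restart or lease move. A minimal sketch of such a processor (class name and logging are illustrative):

import java.nio.charset.StandardCharsets;
import com.microsoft.azure.eventhubs.EventData;
import com.microsoft.azure.eventprocessorhost.CloseReason;
import com.microsoft.azure.eventprocessorhost.IEventProcessor;
import com.microsoft.azure.eventprocessorhost.PartitionContext;

public class SimpleEventProcessor implements IEventProcessor {

    @Override
    public void onOpen(PartitionContext context) {
        System.out.println("Opened partition " + context.getPartitionId());
    }

    @Override
    public void onClose(PartitionContext context, CloseReason reason) {
        System.out.println("Closing partition " + context.getPartitionId() + ": " + reason);
    }

    @Override
    public void onEvents(PartitionContext context, Iterable<EventData> events) throws Exception {
        // This callback only sees events from the single partition this host owns,
        // so a given event is handed to one processor, not to all of them.
        for (EventData event : events) {
            String body = new String(event.getBytes(), StandardCharsets.UTF_8);
            System.out.println(context.getPartitionId() + ": " + body);
        }
        // Persist the offset so these events are not read again after a restart.
        context.checkpoint().get();
    }

    @Override
    public void onError(PartitionContext context, Throwable error) {
        System.err.println("Error on partition " + context.getPartitionId() + ": " + error);
    }
}

It would be registered with the host built in the question, e.g. host.registerEventProcessor(SimpleEventProcessor.class).get(). If two instances really do receive the same events, check that they use the same consumer group and the same storage container for leases and checkpoints.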

Related

Microservice event-driven communication - how to notify only the caller with a command/event approach

I wonder, how do you avoid notifying all services that consume the same event? For example, service A and service B both consume event X. Based on some rules, you want to send event X only to service A. I am not talking about consumer groups (Kafka) or correlation IDs, as I am using event-driven microservices with the command & event approach.
I think it's quite easy using Kafka partitions/partition keys.
For example: you could create your topic X with many partitions. For each invocation, every service specifies its own key, and based on that key Kafka does the rest of the job. So every time service A sends a command, the consumer service (the one that handles the command) sends the resulting event with the same key. In the end, service A (the producer of the command) receives the event on its own partition and is the only service receiving it. So with the command/event approach it may work.
On the other hand, by doing so you are limiting one of the main benefits of partitions, which is scalability.
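A minimal sketch with the plain Kafka producer API (topic names, key, and payloads are made up) showing the key carried from the command over to the reply event:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KeyedReplyExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // illustrative broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Service A issues the command keyed with its own identity...
            producer.send(new ProducerRecord<>("commands.x", "service-A", "{\"type\":\"DO_X\"}"));
            // ...and the handling service publishes the resulting event with the same key,
            // so it hashes to the same partition of the events topic that service A reads.
            producer.send(new ProducerRecord<>("events.x", "service-A", "{\"type\":\"X_DONE\"}"));
        }
    }
}

Note that this only isolates delivery if service A is the sole consumer assigned to that partition, which is exactly the scalability trade-off mentioned above.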

Autoscaling Backend and RabbitMQ Queues

I have an IoT system with around 100k devices, publishing their state every second to a backend written in Java/Spring Boot. Until now I was using gRPC, but I see excessive CPU usage, so I was planning to let the devices publish to RabbitMQ instead and let backend workers process the messages.
Processing: Updating the db table.
Since data from the same device must be processed sequentially, I was planning to use RabbitMQ's consistent-hash exchange and bind n queues for n workers. But I'm not sure how that would work with autoscaling.
I thought of creating an auto-delete queue for each backend instance and binding them to the exchange, but I couldn't figure out:
How to rebalance messages already sitting in a queue? If a connectivity issue occurs, the queue might get deleted, so I would need to re-forward those messages to the remaining queues.
Are there any algorithms for handling the autoscaling of workers? For instance, if messages pile up, I need to spawn new workers even though CPU/memory usage is low.
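For reference, the consistent-hash setup described above would look roughly like this with the RabbitMQ Java client (it requires the rabbitmq_consistent_hash_exchange plugin; exchange, queue, and device names are illustrative):

import java.nio.charset.StandardCharsets;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class ConsistentHashSetup {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // illustrative host
        try (Connection connection = factory.newConnection();
             Channel channel = connection.createChannel()) {
            // The "x-consistent-hash" type comes from the rabbitmq_consistent_hash_exchange plugin.
            channel.exchangeDeclare("device-state", "x-consistent-hash", true);
            // One queue per worker; for this exchange type the binding key is a weight, not a pattern.
            for (int i = 1; i <= 3; i++) {
                String queue = "device-state-worker-" + i;
                channel.queueDeclare(queue, true, false, false, null);
                channel.queueBind(queue, "device-state", "1");
            }
            // Publishing with the device id as the routing key keeps all messages of one
            // device on the same queue, so a single worker processes them sequentially.
            channel.basicPublish("device-state", "device-42", null,
                    "{\"state\":\"OK\"}".getBytes(StandardCharsets.UTF_8));
        }
    }
}

Adding or removing a bound queue changes the hash ring, which is exactly why rebalancing the messages already sitting in a queue becomes the hard part when autoscaling.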
I think I'll go with MQTT's shared subscriptions for this case.
https://emqx.medium.com/introduction-to-mqtt-5-0-protocol-shared-subscription-4c23e7e0e3c1
Sharing strategy
Although shared subscriptions allow subscribers to consume messages in a load-balanced manner, the MQTT protocol does not specify what load-balancing strategy the server should use. For reference, EMQ X provides four strategies for users to choose from: random, round_robin, sticky, and hash.
random: randomly select one of all shared subscription sessions to publish the message to
round_robin: select in turn according to subscription order
sticky: use a random strategy to select a subscription session, then keep using that session until the subscription is cancelled or the client disconnects, and repeat the process
hash: hash the ClientID of the sender and select a subscription session based on the hash result
Hash seems like what I'm looking for.
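As a rough sketch (using the Eclipse Paho MQTT client; broker URL, group, and topic names are made up, and it assumes the broker, EMQ X here, accepts the $share prefix), a worker joins a shared subscription group by prefixing the topic with $share/<group>/, while the balancing strategy itself (e.g. hash) is configured on the broker side:

import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttConnectOptions;

public class SharedSubscriptionWorker {
    public static void main(String[] args) throws Exception {
        MqttClient client = new MqttClient("tcp://broker.example.com:1883", "worker-1");
        MqttConnectOptions options = new MqttConnectOptions();
        options.setCleanSession(true);
        client.connect(options);
        // All clients subscribing with the "$share/device-workers/" prefix form one group;
        // the broker load-balances messages matching devices/+/state across the group.
        client.subscribe("$share/device-workers/devices/+/state", 1, (topic, message) ->
                System.out.println(topic + " -> " + new String(message.getPayload())));
    }
}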

Does EventStoreDB provide message ordering by an event-key on the consumer side?

I have been exploring EventStoreDB and trying to understand more about the ordering of messages on the consumer side. Read about persistent subscriptions and also the Pinned consumer strategy here.
I have a scenario wherein inventory updates get pushed to eventstore and different streams get created by the different unique inventoryIds in the inventory event.
We have multiple consumers with the same consumerGroup name to read these inventory events. We are using Pinned Persistent Subscription with ResolveLinkTos enabled.
My question:
Will every message from a particular stream always go to the same consumer instance of the consumerGroup?
If the answer to the above question is yes, will every message from that particular stream reach the particular consumer instance in the same order as the events were ingested?
The documentation has a warning that ordered message processing using persistent subscriptions is not guaranteed. Any strategy delivers messages with the best-effort level of ordering guarantees, if applicable.
There are a few reasons for this, some of those are:
Spreading out messages across consumers in a group leads to a non-linearised checkpoint commit. It means that some messages can be processed before other messages.
Persistent subscriptions attempt to buffer messages, but when a timeout happens on the client side, the whole buffer is redelivered, which can eventually break the processing order.
Built-in retry policies can essentially break the message order at any time.
Most event log-based brokers, if not all, don't even attempt to guarantee ordered message delivery across multiple consumers. I often hear "but Kafka does it", ignoring the fact that Kafka delivers messages from one partition to at most one consumer in a group. There's no load balancing of one partition between multiple consumers due to exactly the same issue. That being said, EventStoreDB is still not a broker, but a database for events.
So, here are the answers:
Will every message from a particular stream always go to the same consumer instance of the consumer group?
No. It might work most of the time, but it will eventually break.
will every message from that particular stream reach the particular consumer instance in the same order as the events were ingested?
Most of the time, yes, but again, if a message is being retried, you might get the next message before the previous one is Acked.
Overall, load-balanced, ordered processing of messages that aren't pre-partitioned on the server is not an easy task. At most, you get messages re-delivered if the checkpoint fails to persist at some point and the consumers restart.

How to use the same RabbitMQ queue in different Java microservices [duplicate]

I have implemented the example from the RabbitMQ website:
RabbitMQ Example
I have expanded it to have an application with a button to send a message.
Now I started two consumers on two different computers.
When I send messages, the first message is sent to computer1, then the second message is sent to computer2, the third to computer1, and so on.
Why is this, and how can I change the behavior to send each message to each consumer?
Why is this
As noted by Yazan, messages are consumed from a single queue in a round-robin manner. The behavior you are seeing is by design; it makes it easy to scale up the number of consumers for a given queue.
how can I change the behavior to send each message to each consumer?
To have each consumer receive the same message, you need to create a queue for each consumer and deliver the same message to each queue.
The easiest way to do this is to use a fanout exchange. This will send every message to every queue that is bound to the exchange, completely ignoring the routing key.
If you need more control over the routing, you can use a topic or direct exchange and manage the routing keys.
Whatever type of exchange you choose, though, you will need to have a queue per consumer and have each message routed to each queue.
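A minimal sketch of that setup with the RabbitMQ Java client (exchange name and host are illustrative), where each consumer declares its own server-named queue and binds it to a fanout exchange:

import java.nio.charset.StandardCharsets;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DeliverCallback;

public class FanoutConsumer {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // illustrative host
        Connection connection = factory.newConnection();
        Channel channel = connection.createChannel();

        // Every queue bound to a fanout exchange receives a copy of every message.
        channel.exchangeDeclare("button-clicks", "fanout");
        // Server-named, exclusive, auto-delete queue: one private queue per consumer.
        String queueName = channel.queueDeclare().getQueue();
        channel.queueBind(queueName, "button-clicks", "");

        DeliverCallback callback = (consumerTag, delivery) -> {
            String message = new String(delivery.getBody(), StandardCharsets.UTF_8);
            System.out.println("Received: " + message);
        };
        channel.basicConsume(queueName, true, callback, consumerTag -> { });
        // The connection stays open so the consumer keeps receiving messages.
    }
}

Run one copy per computer and both consumers will receive every message published to the exchange.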
You can't; it's controlled by the server. Check the Round-robin dispatching section.
The server decides whose turn it is. I'm not sure if there is a set of algorithms you can pick from, but in the end the server controls this (I think round robin is the default algorithm), unless you want to use routing keys and exchanges.
I would see this more as a design question. Ideally, producers should create the exchanges and consumers should create the queues: each consumer can create its own queue and bind it to an exchange. This makes sure every consumer gets its messages on its own private queue.
What you're doing is essentially the 'worker queues' model, which is used to distribute tasks among worker nodes. Since each task needs to be performed only once, the message is sent to only one node. If you want to send a message to all the nodes, you need a different model called 'pub-sub', where each message is broadcast to all subscribers. The following link shows a simple pub-sub tutorial:
https://www.rabbitmq.com/tutorials/tutorial-three-python.html

How to handle side effects based on multiple events in a message driven microservice system?

We are currently working in a message-driven microservice environment and some of our messages/events are event sourced (using Apache Kafka). Now we are struggling with implementing more complex business requirements, where we have to take multiple events into account to create new events and side effects.
In the current situation we are working with devices that can produce errors, and we already process them and have a single topic which contains ERROR_OCCURRED and ERROR_RESOLVED events (so they are in order). We also make sure that all messages regarding a specific device always go onto the same partition, and both messages share an ID that identifies that specific error incident. We already have a projection that consumes those events and provides an API for our customers, so they can see all errors that have occurred and their current state.
Now we have to deal with the following requirement:
Reporting Errors
We need a push system that reports errors of devices to our external partners, but only after 15 minutes and if they have not been resolved in that timeframe. Our first approach was to consume all ERROR_RESOLVED events, store the IDs and have another consumer that is handling the ERROR_OCCURRED events in a delayed fashion (e.g. by only consuming the next ERROR_OCCURRED event on the topic if its timestamp is at least 15 minutes old). We would then be able to know if that particular error has already been resolved and does not need to be reported (since they share a common ID with the corresponding ERROR_RESOLVED event). Otherwise we send an HTTP request to our external partner and create an ERROR_REPORTED event on a new topic. Is there any better approach for delayed and conditional message processing?
We also have to take the following special use cases into account:
Service restarts: currently we are planning to keep the list of resolved errors in memory, so if a service restarts, that list has to be created from scratch. We could just replay the ERROR_RESOLVED messages, but that may take some time, and in that time no ERROR_OCCURRED events should be processed, because that may result in reporting errors that were resolved in less than 15 minutes without us being aware of it. Are there any good practices regarding replay vs. "normal" processing?
Scaling: we may increase or decrease the number of instances of our service at any time, so the partition assignment may change during runtime. That should not be a problem if we create a consumer group for each service instance when consuming the ERROR_RESOLVED events, s.t. every instance knows all resolved errors while still only handling the ERROR_OCCURRED events of its assigned partitions (in another consumer group which is shared by all instances). Is there a better approach for handling partition reassignment and internal state?
Thanks in advance!
For side effects, I would record all "side" actions in the event store. In your particular example, when it is time to send a notification, I would issue a SEND_NOTIFICATION command that emits a NOTIFICATION_SENT event. These events would be processed by some worker process that does the actual HTTP request.
Actually, I would elaborate this even further: since notifications could fail, I would have, say, two events, NOTIFICATION_REQUIRED and NOTIFICATION_SENT, so we can retry failed notifications.
And finally your logic would be: "if the error was not resolved within 15 minutes and a notification was not sent, send a notification (or just discard it if it missed its timeframe)".
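A rough sketch of such a worker with the plain Kafka clients (topic names, group id, and the sendToPartner helper are made up for illustration): it consumes NOTIFICATION_REQUIRED events, performs the HTTP call, and records the outcome as a NOTIFICATION_SENT event.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class NotificationWorker {
    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092"); // illustrative
        consumerProps.put("group.id", "notification-worker");
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("enable.auto.commit", "false");

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            consumer.subscribe(Collections.singletonList("notification-required"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    if (sendToPartner(record.value())) {
                        // Record the side effect as its own event, keyed by the error incident id,
                        // so a replay or retry can see that this notification already went out.
                        producer.send(new ProducerRecord<>("notification-sent", record.key(), record.value()));
                    }
                    // A real worker would handle failed deliveries explicitly (e.g. via a retry topic)
                    // rather than silently dropping them.
                }
                consumer.commitSync();
            }
        }
    }

    // Placeholder for the actual HTTP request to the external partner.
    private static boolean sendToPartner(String payload) {
        return true;
    }
}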
