Microservice event-driven design with multiple instances - Spring

We are currently planning to move our system to a microservice architecture.
To achieve loose coupling, we are considering an event-driven design built on a JMS topic. This looks great, but I don't know how to solve the problem of multiple instances of a microservice.
For failover and load balancing we run n instances of each service. If an event is published to the topic, every instance receives and processes that event.
We could handle this with locks and processed-state flags in the data store, but that looks very expensive, and every instance still does the same work. That is not load balancing to me.
Is there a good solution or best practice for this pattern?

Why not use a Queue instead of a Topic? Then your instances will compete for messages rather than all get a copy.
EDIT
RabbitMQ might be a better fit for you: publish to a fanout exchange and bind any number of queues to it, each queue with any number of competing consumers.
I have also seen JMS topics used where competing clients connect with the same client id. Some (all?) brokers will only allow one such client to consume. The others keep trying to reconnect until the current consumer dies.
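The difference between the two delivery models can be shown with a small in-memory sketch. This is not JMS code: `Topic`, `Queue` and `run` are hypothetical stand-ins, and round-robin dispatch is only an assumption standing in for "whichever instance is free next".

```python
from collections import defaultdict


class Topic:
    """Publish/subscribe: every subscriber receives its own copy."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, handler):
        self.subscribers.append(handler)

    def publish(self, message):
        for handler in self.subscribers:
            handler(message)


class Queue:
    """Point-to-point: each message goes to exactly one consumer."""

    def __init__(self):
        self.consumers = []
        self._next = 0

    def subscribe(self, handler):
        self.consumers.append(handler)

    def publish(self, message):
        # Assumption: round-robin approximates competing consumers.
        handler = self.consumers[self._next % len(self.consumers)]
        self._next += 1
        handler(message)


def run(destination, instances=3, messages=6):
    """Attach identical service instances and record the work each one does."""
    work = defaultdict(list)
    for i in range(instances):
        destination.subscribe(lambda msg, i=i: work[i].append(msg))
    for n in range(messages):
        destination.publish(f"event-{n}")
    return dict(work)


topic_work = run(Topic())
queue_work = run(Queue())
print(sum(len(v) for v in topic_work.values()))  # 18: every instance did all the work
print(sum(len(v) for v in queue_work.values()))  # 6: the work was shared
```

With the topic, three instances handle six events each; with the queue, the six events are split between them, which is the load balancing the question asks for.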

Related

AWS SNS — How generic should topics be and when should we reuse/create topics?

We are introducing SNS + SQS to handle event production and propagation in our micro services architecture, which has so far relied on HTTPS calls to communicate with each other. We are considering connecting multiple SQS queues onto one SNS topic. The events in the queues will then be consumed by a lambda or a service running in EC2.
My question is, how generic should the topics be? When should we create new topics?
Say, we have a user domain which needs to publish two events—created and deleted. Two options we are considering are:
OPTION A: Have two topics, "user-created" and "user-deleted". Each topic guarantees a single event type.
the consumers would not have to worry about discarding events that they are not interested in, as they already know that messages coming from a "user-created" topic relate only to user creations.
multiple different parts of the code publishing to the same topic
OPTION B: Have one topic, "users", that accepts multiple event types
the consumers would have an additional responsibility of filtering through the events or taking different actions depending on the type of the event (they can also configure their queues subscriptions to filter certain event types)
can ensure a single publisher for each topic
Does anyone have a strong preference for either of the options and why would that be?
On a related note, where would you include the cloud configuration for each of the resources? (should the queue resource creation be deployed together with the consumers, or should they live independently from any of the publishers/consumers?)
I think you should go with Option B and keep all events concerning a given "domain" (e.g. "user") in a single topic:
keeps your infrastructure simple
you might introduce services interested in multiple event types (e.g. "create" and "delete"). It's tricky to get the ordering right when consuming these from two topics; imagine a "user-delete" event arriving before the "user-create" event
throughput might be an issue, but this really depends on your domain (creating and deleting users doesn't sound like a high-volume workload)
think about changes to the data structures in your topics; introducing changes in two or more topics simultaneously can get complicated pretty fast
Concerning your other question: Keep your topic/infrastructure configuration separate from your services. It's an individual piece of infrastructure (like a database) and should be kept separate, especially if you introduce more consumers and producers to your system.
EDIT: This might be an example "setup":
Repository user-service contains the service/lambda code, cloudformation/terraform templates for the service and its topic subscriptions
Repository sns contains all cloudformation/terraform templates concerning SNS topics
Repository sqs contains all cloudformation/terraform templates concerning SQS queues
You can think about keeping the SNS & SQS infra code in a single repository (the last two), but I would strongly recommend everything specific to a certain service/lambda to be kept in separate repositories.
Generally it helps to think about your topics as a "database", this line of thinking should point you in the right direction for all your questions.
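As a rough illustration of Option B, a consumer of a single "users" topic can dispatch on an event-type field and skip anything it does not handle. This is a minimal sketch, not SNS/SQS client code: the message shape (`type`, `user_id`, `payload`) and the handler names are assumptions, and it also assumes events arrive in publish order, which standard SNS topics and SQS queues do not guarantee (the FIFO variants do).

```python
import json


def handle_created(state, event):
    state[event["user_id"]] = event["payload"]


def handle_deleted(state, event):
    state.pop(event["user_id"], None)


# Hypothetical event-type names; a real system would define its own.
HANDLERS = {
    "user.created": handle_created,
    "user.deleted": handle_deleted,
}


def consume(raw_messages):
    """Apply events in arrival order; count the ones this consumer ignores."""
    state, ignored = {}, 0
    for raw in raw_messages:
        event = json.loads(raw)
        handler = HANDLERS.get(event["type"])
        if handler is None:
            ignored += 1  # not interested in this event type
            continue
        handler(state, event)
    return state, ignored


messages = [
    json.dumps({"type": "user.created", "user_id": "u1", "payload": {"name": "Ann"}}),
    json.dumps({"type": "user.created", "user_id": "u2", "payload": {"name": "Bob"}}),
    json.dumps({"type": "user.renamed", "user_id": "u2", "payload": {"name": "Rob"}}),
    json.dumps({"type": "user.deleted", "user_id": "u1", "payload": {}}),
]
state, ignored = consume(messages)
print(state)    # {'u2': {'name': 'Bob'}}
print(ignored)  # 1
```

Because create and delete for the same user travel through one stream, the consumer never has to reconcile two independently ordered topics.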

MassTransit Multiple Consumers

I have an environment where I have only one app server. I have some messages that take a while to service (10 seconds or so), and I'd like to increase throughput by running multiple instances of my consumer application to process these messages. I've read about the "competing consumer" pattern and gather that it should be avoided when using MassTransit. According to the MassTransit docs here, each receive endpoint should have a unique queue name. I'm struggling to understand how to map this recommendation to my environment. Is it possible to have N instances of consumers running that each receive the same message, but only one of the instances will actually act on it? In other words, can we implement the "competing consumer" pattern but across multiple queues instead of one?
Or am I looking at this wrong? Do I really need to look into the "Send" method as opposed to "Publish"? The downside with "Send" is that it requires the sender to have direct knowledge of the existence of an endpoint, and I want to be dynamic with the number of consumers/endpoints I have. Is there anything built in to MassTransit that could help with the keeping track of how many consumer instances/queues/endpoints there are that can service a particular message type?
Thanks,
Andy
The "avoid competing consumers" guidance dates from when MSMQ was the primary transport. MSMQ would fall over if multiple threads were reading from the queue.
If you are using RabbitMQ, then competing consumers work brilliantly. Competing consumers is the right answer. Each competing consumer will receive from the same endpoint.

RabbitMQ Fanout Exchange (VirtualTopic Equivalent)

I'm looking at swapping out ActiveMQ with RabbitMQ for a few reasons. I currently have multiple services which are each capable of publishing events (and they publish those events to a specific VirtualTopic in AMQ). Each of the services is also capable of consuming messages from the other services. Consumers are set up such that they subscribe as a consumer to a queue on the VirtualTopic.
This buys me the ability to fan messages out to multiple queues (topic-like functionality) while keeping the benefits of queues (load balancing and persistence).
It seems like this is roughly equivalent to RabbitMQ's fanout exchange. However, the part that I found very useful in ActiveMQ is that the producer doesn't need to have any knowledge of the consumers. It simply publishes to the virtual topic. It seems that in RabbitMQ, when the exchange is created, I need a definitive list of queues to publish that message to.
tl;dr
Is there any routing scheme in RabbitMQ that is equivalent to ActiveMQ's Virtual Topic, such that I can produce messages to a topic that are distributed to any queue that has been created off of that Virtual Topic, without requiring a hard-coded routing scheme somewhere in RMQ?
I realized after posting this question that it is pretty trivial to do this (not sure why I never thought of it before).
I was looking at it from the wrong direction, wondering how I could automatically have the publisher configure queues for the recipients - which isn't the right way to approach this question.
Instead, I have the subscribers, when they start up, bind themselves to the exchange that the publisher uses, which provides the inversion of control I'm looking for (publishers need not know anything about their consumers).
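The subscriber-side binding described above can be sketched with an in-memory model. `FanoutExchange` and the queue names are hypothetical and this is not a RabbitMQ client; it only mirrors the idea that the publisher names an exchange while each subscriber declares and binds its own queue at startup.

```python
from collections import deque


class FanoutExchange:
    """Copies every published message into each bound queue."""

    def __init__(self):
        self.queues = {}  # queue name -> deque of pending messages

    def bind(self, queue_name):
        # Called by a subscriber at startup. Idempotent, so every instance
        # of the same service can call it and end up sharing one queue.
        return self.queues.setdefault(queue_name, deque())

    def publish(self, message):
        # The publisher knows only the exchange, never the queues behind it.
        for pending in self.queues.values():
            pending.append(message)


exchange = FanoutExchange()

# Two instances of a hypothetical billing service share one queue ...
billing = exchange.bind("billing.user-events")
billing_again = exchange.bind("billing.user-events")
# ... while a hypothetical audit service binds its own.
audit = exchange.bind("audit.user-events")

exchange.publish("user-created:42")

first_instance_gets = billing.popleft()  # one billing instance takes the copy
left_for_second = len(billing_again)     # nothing remains for the other
print(first_instance_gets, left_for_second, len(audit))  # user-created:42 0 1
```

Each service gets its own copy of the event (topic-like fan-out), while instances of the same service compete for that single copy (queue-like load balancing).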

What is the best way to deliver real-time messages to Client that can not be requested

We need to deliver real-time messages to our clients, but their servers are behind a proxy, and we cannot initialize a connection; webhook variant won't work.
What is the best way to deliver real-time messages considering that:
the client is behind a proxy
client can be off for a long period of time, and all messages must be delivered
the protocol/way must be common enough, so that even a PHP developer could easily use it
I have in mind three variants:
WebSocket - the client opens a WebSocket connection, and we send both the messages stored in the DB and the messages arriving in real time.
RabbitMQ - all messages are stored in a durable, persistent queue. What happens if the partner does not read from the queue for some time?
HTTP GET - the partner pulls messages in batches. With this approach it is hard to pick an optimal polling interval.
Any suggestions would be appreciated. Thanks!
Since you seem to have to store messages when your peer is not connected, the question applies to any other solution equally: what if the peer is not connected and messages are queueing up?
RabbitMQ is great if you want loose coupling: separating the producer and the consumer sides. The broker will store messages for you if no consumer is connected. This can indeed fill up memory and/or disk space on the broker after some time; in that case RabbitMQ raises a resource alarm and blocks publishers until resources are freed.
In general, RabbitMQ is a great tool for messaging-based architectures like the one you describe:
Load balancing: you can use multiple publishers and/or consumers, thus sharing load.
Flexibility: you can configure multiple exchanges/queues/bindings if your business logic needs it. You can easily change routing on the broker without reconfiguring multiple publisher/consumer applications.
Flow control: RabbitMQ also gives you some built-in methods for flow control - if a consumer is too slow to keep up with publishers, RabbitMQ will slow down publishers.
You can refactor the architecture later easily. You can set up multiple brokers and link them via shovel/federation. This is very useful if you need your app to work via multiple data centers.
You can easily spot if one side is slower than the other, since queues will start growing if your consumers can't read fast enough from a queue.
High availability and fault tolerance. RabbitMQ is very good at these (thanks to Erlang).
So I'd recommend it over the other two (which might be fine for a small-scale app, but you might outgrow them quickly if requirements change and you need to scale up).
Edit: something I missed - if it's not vital to deliver all messages, you can configure queues with a TTL (messages are discarded after a timeout) or with a length limit (this caps the number of messages in the queue; once it is reached, RabbitMQ by default discards the oldest messages to make room, and can instead be configured to reject new ones).
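The effect of a TTL combined with a length limit can be sketched in memory. `BoundedQueue` is a hypothetical model, not a RabbitMQ client; it drops the oldest message when full, which is what RabbitMQ does by default (rejecting new messages instead is a configurable alternative), and it takes the current time as an explicit argument so the behaviour is easy to follow.

```python
from collections import deque


class BoundedQueue:
    """Queue with a maximum length and a per-message time-to-live."""

    def __init__(self, max_length, ttl_seconds):
        self.ttl = ttl_seconds
        # A deque with maxlen discards from the head (oldest) on append.
        self.items = deque(maxlen=max_length)

    def publish(self, message, now):
        self.items.append((now, message))

    def consume(self, now):
        """Return the oldest message that has not expired, or None."""
        while self.items:
            published_at, message = self.items.popleft()
            if now - published_at <= self.ttl:
                return message
            # Otherwise the message outlived its TTL and is dropped.
        return None


q = BoundedQueue(max_length=3, ttl_seconds=60)
for n in range(5):
    q.publish(f"m{n}", now=n * 10)  # m0 and m1 are pushed out by the limit

# The consumer was offline and reconnects at t=85.
received = [q.consume(now=85), q.consume(now=85), q.consume(now=85)]
print(received)  # ['m3', 'm4', None]
```

Here m0 and m1 are lost to the length limit and m2 to the TTL, so this configuration only suits the case where delivering every message is not required.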

JMS Producer-Consumer-Observer (PCO)

In JMS there are Queues and Topics. As I understand it so far, queues are best used for producer/consumer scenarios, whereas topics can be used for publish/subscribe. However, in my scenario I need a way to combine both approaches and create a producer-consumer-observer architecture.
Particularly I have producers which write to some queues and workers, which read from these queues and process the messages in those queues, then write it to a different queue (or topic). Whenever a worker has done a job my GUI should be notified and update its representation of the current system state. Since workers and GUI are different processes I cannot apply a simple observer pattern or notify the GUI directly.
What is the best way to realize this using a combination of queues and/or topics? The GUI should always be notified, but it should never consume anything from a queue?
I would like to solve this with JMS directly and not use any additional technology such as RMI to implement the observer part.
To give a more concrete example:
1. I have a queue with packages (PACKAGEQUEUE), produced by a machine (PackageProducer)
2. I have a worker which takes a package from the PACKAGEQUEUE, adds an address and then writes it to a MAILQUEUE (AddressWorker)
3. Another worker processes the MAILQUEUE and sends the packages out by mail (MailWorker).
After step 2. when a message is written to the MAILQUEUE, I want to notify the GUI and update the status of the package. Of course the GUI should not consume the messages in the MAILQUEUE, only the MailWorker must consume them.
You can use a combination of queue and topic for your solution.
Your GUI application can subscribe to a topic, say MAILQUEUE_NOTIFICATION. Every time AddressWorker writes a message to MAILQUEUE (i.e. at step 2), it should also publish a copy of that message to the MAILQUEUE_NOTIFICATION topic. Since the GUI application has subscribed to the topic, it will receive that publication, which contains the status of the package, and can update itself from its contents.
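The queue-plus-topic combination can be sketched with an in-memory broker. This is not JMS code: `InMemoryBroker`, `address_worker` and the message fields are hypothetical, and only the destination names come from the question.

```python
from collections import defaultdict, deque


class InMemoryBroker:
    """Toy broker: queues deliver to one receiver, topics to every subscriber."""

    def __init__(self):
        self.queues = defaultdict(deque)
        self.subscribers = defaultdict(list)

    def send(self, queue, message):
        self.queues[queue].append(message)

    def receive(self, queue):
        pending = self.queues[queue]
        return pending.popleft() if pending else None

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in self.subscribers[topic]:
            callback(dict(message))  # each observer gets its own copy


def address_worker(broker):
    package = broker.receive("PACKAGEQUEUE")
    if package is None:
        return
    package["address"] = "1 Example Street"  # placeholder address
    broker.send("MAILQUEUE", package)                   # work for MailWorker
    broker.publish("MAILQUEUE_NOTIFICATION", package)   # copy for observers


broker = InMemoryBroker()
gui_status = {}
broker.subscribe(
    "MAILQUEUE_NOTIFICATION",
    lambda msg: gui_status.update({msg["id"]: "addressed"}),
)

broker.send("PACKAGEQUEUE", {"id": "pkg-1"})
address_worker(broker)

mailed = broker.receive("MAILQUEUE")  # still there for MailWorker to consume
print(gui_status)    # {'pkg-1': 'addressed'}
print(mailed["id"])  # pkg-1
```

The GUI learns about the package without ever touching MAILQUEUE. In real JMS, the send and the publish could share one transacted session so that both happen or neither does, and a non-durable topic subscriber would miss notifications published while it is disconnected.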
HTH

Resources