How do I ensure that only one consumer actually consumes a published message? - spring-boot

I use RabbitMQ with a microservice architecture. I use topic and direct exchanges for many of my use cases, and it works fine. However, I have a use case where I have to delete a record from the database. When I delete the record, several other services need to be called to maintain or delete the referenced records. I could achieve that by simply calling those services through a direct exchange, but I read that choreography is preferred over orchestration. That means I should implement the publish/subscribe pattern (fanout in RabbitMQ).
My question is: if I use the publish/subscribe pattern in a distributed system, how do I make sure that only one instance per service consumes each published message?

Your question doesn't deal so much with publish-subscribe, as it does with basic message processing. The fundamental issue is whether or not you can guarantee that an operation will be performed exactly one time. The short answer is that you probably want to use a direct exchange such that a message goes into one queue and is processed by one (of possibly many) consumers.
The long answer is that "exactly once" cannot be guaranteed, so you need to make this part of your design.
Background
It is best practice to have message processing be an idempotent operation. In fact, idempotency is a critical design assumption of almost any external interface (and I would argue it is equally-important in internal interfaces).
Additionally, you should be aware that it is not possible to guarantee "exactly once" delivery. Mathematically, no such guarantee can be made. Instead, you can have one of two things (they are mutually exclusive), where n is the number of times a message is delivered:
At most once delivery (0 <= n <= 1)
At least once delivery (1 <= n)
From the RabbitMQ documentation:
Use of acknowledgements guarantees at-least-once delivery. Without acknowledgements, message loss is possible during publish and consume operations and only at-most-once delivery is guaranteed.
Several things are happening when messages are published and consumed. Because of the asynchronous nature of message handling systems, and the AMQP protocol in particular, there is no way to guarantee exactly-once processing while still yielding the performance you would need from a messaging system (essentially, trying to ensure exactly-once would force everything through a serial process at the point of de-duplication).
Design Implications
Given the above, it is important that your design rely upon "at least once" delivery. For a delete operation, this involves re-writing the definition of that operation to be assertive rather than procedural (e.g. "Delete this" becomes "Ensure this does not exist."). The difference is that you describe the end-state rather than the process.
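To make the "ensure this does not exist" idea concrete, here is a minimal sketch in Python. The `RecordStore` class, the `handle_delete_message` function and the message shape are hypothetical names invented for illustration; a real service would run the equivalent statement against its own database.

```python
class RecordStore:
    """In-memory stand-in for the service's database table (hypothetical)."""

    def __init__(self, records):
        self._records = dict(records)

    def ensure_absent(self, record_id):
        """Assert the end state: the record must not exist afterwards.

        Returns True if this call removed the record and False if it was
        already gone. Either way the call succeeds, so a redelivered
        message is harmless.
        """
        if record_id in self._records:
            del self._records[record_id]
            return True
        return False

    def exists(self, record_id):
        return record_id in self._records


def handle_delete_message(store, message):
    """Handle a 'record deleted' message; safe to call more than once."""
    removed = store.ensure_absent(message["record_id"])
    return "deleted" if removed else "already absent"


store = RecordStore({42: "order #42"})
message = {"record_id": 42}

# At-least-once delivery: the same message may arrive twice.
first = handle_delete_message(store, message)
second = handle_delete_message(store, message)
```

The second delivery is not an error; it simply confirms that the desired end state already holds.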

I think you should have a separate queue for each service whose instances should be notified about the DB record deletion. The exchange puts a copy of the message in every queue. The instances of a service compete for access to that service's dedicated queue, so only one of them gets each message.
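The topology described above can be sketched without a broker. This is a toy in-memory model, not RabbitMQ client code; `FanoutExchange`, `consume_one` and the queue names are made up for the example.

```python
from collections import deque


class FanoutExchange:
    """Toy model of a fanout exchange: every bound queue gets a copy."""

    def __init__(self):
        self._queues = {}

    def bind(self, queue_name):
        # One queue per *service*, shared by all instances of that service.
        return self._queues.setdefault(queue_name, deque())

    def publish(self, message):
        for bound_queue in self._queues.values():
            bound_queue.append(message)


def consume_one(service_queue, instance_name):
    """An instance takes the next message from its service's queue, if any."""
    if not service_queue:
        return None
    return (instance_name, service_queue.popleft())


exchange = FanoutExchange()
billing_queue = exchange.bind("billing.record-deleted")
search_queue = exchange.bind("search.record-deleted")

exchange.publish({"record_id": 42})

# Two billing instances compete for the same queue: only one gets the message.
billing_results = [
    consume_one(billing_queue, "billing-1"),
    consume_one(billing_queue, "billing-2"),
]
search_results = [consume_one(search_queue, "search-1")]
```

Every service sees the event once, and within a service whichever instance reads the queue first handles it.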

Related

Is REPLICATE DATA pattern good option to minimize synchronous micro-services communication?

In a world of microservices, often one microservice needs to invoke another, synchronous or asynchronous way.
In the case of synchronous communication, I understand that it affects the availability of the services, as both services need to be available during a call.
To minimize synchronous communication, one possible solution is DATA REPLICATION at the client service. The client service keeps its copy of the data up to date by listening to events published by the other services.
In my opinion, this is not a good choice: we are duplicating data, the copy might become stale, and there is additional database overhead.
In what scenario would the above pattern be the best fit?
Microservices are distributed systems. This means that they are constrained by the CAP theorem, which basically means you have a choice between:
Sacrifice availability to preserve consistency: this would (among other things) lead to one service invoking functionality in another in a synchronous way. If the other service is unavailable, so is all functionality in this service which depends on that service's functionality.
Sacrifice consistency to preserve availability: you build services to be autonomous and not depend on other services being up. This leads in fairly short order to services not sharing databases and to asynchronous replication of data (because if service A has synchronously replicated data from service B, then service B being down doesn't affect A's availability, but A being down affects B's availability): with asynchronous replication, the best you can hope for is eventual consistency.
The choice between those two (if you happen to have the ability to freeze the entire universe if there's a network partition, you might be able to sacrifice partition tolerance for consistency and availability) is ultimately a business question (it's worth noting that there's a continuum of approaches between those extremes). How much are you spending on storage and on designing an (arguably) more complex system vs. how much are you losing by being unavailable?
It should be noted that the universe is inherently eventually consistent: the sun could have gone supernova a few minutes ago and we can't know it for a few minutes more.
As for the concern about duplicated data: chances are the data is already duplicated (backups) and in any database worth using the data is duplicated (the write-ahead log).
As for situations, it's a lot harder to think of a situation where aiming for strong consistency is strictly the most suitable option.
But for an example, consider a chain of coffee shops. We have a cash register service and we have a loyalty/rewards service. Data from the loyalty/rewards service is needed by the cash register (if a customer is redeeming a "50% off a latte" reward you'd want the register to know that it's valid), and every transaction (at least those with a loyalty ID) at the register should be known by the rewards service.
If we want the reward redemptions to be consistent, then it implies that if the loyalty/rewards service is inaccessible from the register, no rewards can be redeemed. There's a nonzero chance that a customer who can't redeem a reward just walks out (and a further nonzero chance that they never get coffee from you again).
Conversely, if we want both services to have a consistent view then we're demanding that if the power's out at any store we can't determine new rewards, or if the loyalty/rewards service is inaccessible from the register, no new sales can be made.
The solution is for both services to maintain the data they need to function, even if another service controls updates to that data. They'll eventually catch up. In the case of reward redemption, assuming the unavailability happens rarely enough, it may even be desirable to have the cash register perform a preliminary validation and if that passes, assume that the reward is valid and submit it later to the loyalty/reward service.
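A rough sketch of the register side of that idea follows. All names here (`CashRegister`, `on_reward_issued`, the event fields) are assumptions made for the example, and the event delivery itself is left out.

```python
class CashRegister:
    """Register that keeps its own copy of reward data (hypothetical model)."""

    def __init__(self):
        # reward_id -> customer_id, replicated asynchronously from events.
        self.known_rewards = {}
        # Redemptions to report once the loyalty service is reachable.
        self.pending_redemptions = []

    def on_reward_issued(self, event):
        # Applied whenever an event from the loyalty service arrives.
        self.known_rewards[event["reward_id"]] = event["customer_id"]

    def redeem(self, reward_id, customer_id):
        # Preliminary validation against the local, possibly stale, copy.
        if self.known_rewards.get(reward_id) != customer_id:
            return False
        del self.known_rewards[reward_id]
        self.pending_redemptions.append((reward_id, customer_id))
        return True

    def flush(self, submit):
        # Called when the loyalty service is reachable again.
        while self.pending_redemptions:
            submit(self.pending_redemptions.pop(0))


register = CashRegister()
register.on_reward_issued({"reward_id": "latte-50", "customer_id": "c1"})

accepted = register.redeem("latte-50", "c1")  # works with the loyalty service down
repeated = register.redeem("latte-50", "c1")  # local copy already marks it as used

reported = []
register.flush(reported.append)
```

The register never waits on the loyalty service to make a sale; the loyalty service learns about the redemption later, which is the eventual consistency trade-off described above.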

ZeroMQ pattern for load balancing work across workers based on idleness

I have a single producer and n workers that I only want to give work to when they're not already processing a unit of work and I'm struggling to find a good zeroMQ pattern.
1) REQ/REP
The producer is the requestor and creates a connection to each worker. It tracks which worker is busy and round-robins to idle workers
Problem:
How to be notified of responses and still able to send new work to idle workers without dedicating a thread in the producer to each worker?
2) PUSH/PULL
Producer pushes into one socket that all workers feed off, and workers push into another socket that the producer listens to.
Problem:
Has no concept of worker idleness, i.e. work gets stuck behind long units of work
3) PUB/SUB
Non-starter, since there is no way to make sure work doesn't get lost
4) Reverse REQ/REP
Each worker is the REQ end and requests work from the producer and then sends another request when it completes the work
Problem:
Producer has to block on a request for work until there is work (since each recv has to be paired with a send). This prevents workers from responding with work completion
Could be fixed with a separate completion channel, but the producer still needs some polling mechanism to detect new work and stay on the same thread.
5) PAIR per worker
Each worker has its own PAIR connection allowing independent sending of work and receipt of results
Problem:
Same problem as REQ/REP with requiring a thread per worker
As much as zeroMQ is non-blocking/async under the hood, I cannot find a pattern that allows my code to be asynchronous as well, rather than blocking in many many dedicated threads or polling spin-loops in fewer. Is this just not a good use case for zeroMQ?
Your problem is solved with the Load Balancing Pattern in the ZMQ Guide. It's all about flow control whilst also being able to send and receive messages. The producer will only send work requests to idle workers, whilst the workers are able to send and receive other messages at all times, e.g. abort, shutdown, etc.
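The bookkeeping behind that pattern can be sketched without any sockets. This is only a model of the idea (workers announce readiness, the producer dispatches to announced workers only); the `LoadBalancer` class and its method names are hypothetical, and the Guide's real implementation uses ZeroMQ sockets rather than in-process calls.

```python
from collections import deque


class LoadBalancer:
    """Hands work only to workers that have announced they are idle."""

    def __init__(self):
        self.idle_workers = deque()
        self.backlog = deque()
        self.assignments = []  # (worker, task) pairs, in dispatch order

    def worker_ready(self, worker):
        # A worker reports "ready" at startup and after finishing each task.
        self.idle_workers.append(worker)
        self._dispatch()

    def submit(self, task):
        self.backlog.append(task)
        self._dispatch()

    def _dispatch(self):
        # Pair idle workers with waiting tasks until one side runs out.
        while self.idle_workers and self.backlog:
            self.assignments.append(
                (self.idle_workers.popleft(), self.backlog.popleft())
            )


lb = LoadBalancer()
lb.worker_ready("w1")
lb.worker_ready("w2")
for task in ["t1", "t2", "t3"]:
    lb.submit(task)

# t3 waits in the backlog because both workers are busy.
waiting = list(lb.backlog)

# w2 finishes first and asks for more, so it gets t3 rather than w1.
lb.worker_ready("w2")
```

Because both `worker_ready` and `submit` are plain event handlers, a single thread polling one socket can drive them; no thread per worker is needed.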
Push/Pull is your answer.
When you send a message in ZeroMQ, all that happens initially is that it sits in a queue waiting to be delivered to the destination(s). When it has been successfully transferred it is removed from the queue. The queue is limited in length, but can be set by changing a socket's high water mark.
There is a/some background thread(s) that manage all this on your behalf, and your calls to the ZeroMQ API are simply issuing instructions to that/those threads. The threads at either end of a socket connection are collaborating to marshal the transfer of messages, i.e. a sender won't send a message unless the recipient can receive it.
Consider what this means in a push/pull set up. Suppose one of your pull workers is falling behind. It won't then be accepting messages. That means that messages being sent to it start piling up until the high water mark is reached. ZeroMQ will no longer send messages to that pull worker. In fact AFAIK in ZeroMQ, a pull worker whose queue is more full than those of its peers will receive fewer messages, so the workload is evened out across all workers.
So What Does That Mean?
Just send the messages. Let 0MQ sort it out for you.
Whilst there's no explicit flag saying 'already busy', if messages can be sent at all then that means that some pull worker somewhere is able to receive it solely because it has kept up with the workload. It will therefore be best placed to process new messages.
There are limitations. If all the workers are full up then no messages are sent and you get blocked in the push when it tries to send another message. You can discover this only (it seems) by timing how long the zmq_send() took.
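The behaviour described above can be modelled with bounded queues. This is a toy simulation, not ZeroMQ code: `PushSocket` is a made-up class, and the round-robin over non-full pipes is an assumption about how the push side distributes messages.

```python
import queue


class PushSocket:
    """Toy PUSH socket: one bounded outgoing pipe per connected worker."""

    def __init__(self, worker_count, high_water_mark):
        self.pipes = [
            queue.Queue(maxsize=high_water_mark) for _ in range(worker_count)
        ]
        self._next = 0

    def send(self, message):
        """Round-robin over pipes that still have room.

        Returns the index of the pipe used, or None if every pipe is at
        its high water mark (where a blocking send would stall).
        """
        for offset in range(len(self.pipes)):
            index = (self._next + offset) % len(self.pipes)
            try:
                self.pipes[index].put_nowait(message)
            except queue.Full:
                continue
            self._next = (index + 1) % len(self.pipes)
            return index
        return None


push = PushSocket(worker_count=2, high_water_mark=2)

# Worker 0 is stuck on a long task and never drains its pipe;
# worker 1 drains its pipe after every message.
routed = []
for n in range(6):
    index = push.send(f"job-{n}")
    routed.append(index)
    if index == 1:
        push.pipes[1].get_nowait()

# With the only worker stuck, the sender runs out of room.
stalled = PushSocket(worker_count=1, high_water_mark=1)
stalled.send("a")
blocked = stalled.send("b")
```

Once the slow worker's pipe is full, everything goes to the worker that keeps up, and when nobody keeps up the sender is the one that notices.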
Don't Forget the Network
There's also the matter of network bandwidth to consider. Messages queued in the push will transfer at the rate at which they're consumed by the recipients, or at the speed of the network (whichever is slower). If your network is fundamentally too slow, then it's the Wrong Network for the job.
Latency
Of course, messages piling up in buffers represents latency. This can be restricted by setting the high water mark to be quite low.
This won't cure a high latency problem, but it will allow you to find out that you have one. If you have an inadequate number of pull workers, a low high water mark will result in message sending failing/blocking sooner.
Actually I think in ZeroMQ it blocks for push/pull; you'd have to measure elapsed time in the call to zmq_send() to discover whether things had got bottled up.
Thought about Nanomsg?
Nanomsg is a reboot of ZeroMQ, one of the same guys is involved. There are many things I prefer about it, and ultimately I think it will replace ZeroMQ. It has some fancier patterns which are more universally usable (PAIR works on all transports, unlike in ZeroMQ). Also the patterns are essentially a pluggable component in the source code, so it is far simpler for patterns to be developed and integrated than in ZeroMQ. There is a discussion on the differences here
Philosophical Discussion
Actor Model
ZeroMQ is definitely in the realms of Actor Model programming. Messages get stuffed into queues / channels / sockets, and at some undetermined point in time later they emerge at the recipient end to be processed.
The danger of this type of architecture is that it is possible to have the potential for deadlock without knowing it.
Suppose you have a system where messages pass both ways down a chain of processes, say instructions in one way and results in the other. It is possible that one of the processes will be trying to send a message whilst the recipient is actually also trying to send a message back to it.
That only works so long as the queues aren't full and can (temporarily) absorb the messages, allowing everyone to move on.
But suppose the network briefly became a little busy for some reason, and that delayed message transfer. The message send might then fail because the high water mark had been reached. Whoops! No one is then sending anything to anyone anymore!
CSP
A development of the Actor Model, called Communicating Sequential Processes, was invented to solve this problem. It has a restriction; there is no buffering of messages at all. No process can complete sending a message until the recipient has received all the data.
The theoretical consequence of this was that it was then possible to mathematically analyse a system design and pronounce it to be free of deadlock. The practical consequence is that if you've built a system that can deadlock, it will do so every time. That's actually not so bad; it'll show up in testing, not post-deployment.
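A small model may help show why zero buffering turns an occasional deadlock into a reproducible one. The function below is a simplified simulation under stated assumptions: two peers each try to send `burst` messages to the other before receiving anything, and each direction has room for `capacity` messages. The function name and parameters are invented for this example.

```python
def exchange_deadlocks(capacity, burst):
    """Return True if two peers that send before receiving get stuck.

    Each peer sends `burst` messages before it starts receiving. Each
    direction buffers at most `capacity` messages. A peer blocks when its
    outgoing buffer is full; if both are blocked, nobody ever reaches the
    receive step.
    """
    buffered_a_to_b = 0
    buffered_b_to_a = 0
    sent_by_a = 0
    sent_by_b = 0
    while sent_by_a < burst or sent_by_b < burst:
        progressed = False
        if sent_by_a < burst and buffered_a_to_b < capacity:
            buffered_a_to_b += 1
            sent_by_a += 1
            progressed = True
        if sent_by_b < burst and buffered_b_to_a < capacity:
            buffered_b_to_a += 1
            sent_by_b += 1
            progressed = True
        if not progressed:
            return True  # both peers are blocked in send: deadlock
    return False  # both finished sending and can now receive


# Buffers absorb the usual load, so the flaw stays hidden...
normal_day = exchange_deadlocks(capacity=2, burst=2)
# ...until a busy network lets one extra message pile up.
busy_day = exchange_deadlocks(capacity=2, burst=3)
# With no buffering at all, the same design fails on the first message.
csp_style = exchange_deadlocks(capacity=0, burst=1)
```

With buffers, the design only fails under load; with zero capacity it fails in the very first test run, which is the practical benefit described above.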
Curiously this is hinted at in the documentation of Microsoft's Task Parallel library, where they advocate setting buffer lengths to zero in the interests of achieving a more robust application.
It'd be like setting the ZeroMQ high water mark to zero, but in zmq_setsockopt() a high water mark of 0 means no limit, not nought. The default is non-zero...
CSP is much more suited to real time applications. Any shortage of available workers immediately results in an inability to send messages (so your system knows it's failed to keep up with the real time demand) instead of resulting in an increased latency as data is absorbed by sockets, etc. (which is far harder to discover).
Unfortunately almost every communications technology we have (Ethernet, TCP/IP, ZeroMQ, nanomsg, etc) leans towards Actor Model. Everything has some sort of buffer somewhere, be it a packet buffer on a NIC or a socket buffer in an operating system.
Thus to implement CSP in the real world one has to implement flow control on top of the existing transports. This takes work, and it's slightly inefficient. But if a system needs it, it's definitely the way to go.
Personally I'd love to see 0MQ and Nanomsg adopt it as a behavioural option.

JMS queue consumer: synchronous receive() or single-threaded onMessage()

I need to consume from a queue and stamp a sequence key on each message to indicate the ordering, i.e. the consumption needs to be sequential. From a performance/throughput point of view, would I be better off using a blocking receive() method, or an async listener with a single-threaded configuration on the onMessage() method?
Thanks.
There are many aspects that will affect the performance and throughput; in pure JMS terms it's not really possible to state that the sync or async model of getting messages will be any less or more efficient. It will depend on a large number of factors from how the application is written, other resources it's using, implementation of your chosen messaging provider and other factors such as machine performance and configuration of both client and server machines.
This discussion, Single vs Multi-threaded JMS Producer, covered some of these topics.
As for the sequence: if you are single threaded, with a single session, the JMS specification gives some assurances on message ordering; best to review the spec to see if it matches your overall requirements.
Often people will insert an application sequence number at message production time; the consumer can therefore check they are getting the correct message in order. Adding a sequence number at consumption time won't specifically help that consumer.
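As an illustration of stamping at production time, here is a small language-neutral sketch in Python rather than JMS code. The `produce` function, the `OrderedConsumer` class and the `seq` field are hypothetical; in JMS the number would typically travel as a message property.

```python
import itertools


def produce(payloads):
    """Stamp each payload with a sequence number at production time."""
    counter = itertools.count(1)
    return [{"seq": next(counter), "body": body} for body in payloads]


class OrderedConsumer:
    """Checks that messages arrive in the order the producer stamped."""

    def __init__(self):
        self.expected = 1
        self.processed = []
        self.problems = []  # (expected_seq, received_seq) pairs

    def on_message(self, message):
        seq = message["seq"]
        if seq != self.expected:
            # A gap or reordering happened somewhere between producer and us.
            self.problems.append((self.expected, seq))
        self.processed.append(message["body"])
        self.expected = seq + 1


messages = produce(["a", "b", "c"])

in_order = OrderedConsumer()
for message in messages:
    in_order.on_message(message)

shuffled = OrderedConsumer()
for message in [messages[0], messages[2], messages[1]]:
    shuffled.on_message(message)
```

The consumer can only detect a problem because the number was assigned before the message entered the queue; a number assigned on arrival would always look consecutive.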
Keep in mind that the stricter the requirement for message ordering, the more restrictive the overall architecture gets and the harder it is to implement horizontal scalability.

MassTransit selective consumers without round tripping

I am looking at using MassTransit and have a need for selectively sending messages to consumers at the end of unreliable and slow network links (they are in the same WAN but use a slow and expensive cellular link).
I am expecting a fanout of 1 to 200, where the sites with the lowest volume of messages and the least reliable / most expensive links need to ignore the potentially high amount of message traffic other consumers will see.
I have looked at using the Selective consumer interface but this seems to imply that the message is always sent to all consumers, and then discarded if it doesn't match the predicate. This overhead is not acceptable.
Without using endpoint factory and manually managing uri end points to do a Send(), is there a nice way to do this using subscriptions?
Simple answer: nope.
You do have a few options though. Is it just routing based upon load/processing? You could use competing consumers to do load balancing. All the endpoints read off the same queue (but they must be the same consumers on every process reading from the queue) and just pick up the next one. If you're slow, you just pick off fewer messages. (You can only use competing consumers with RabbitMQ).
For MSMQ there's a distributor that was built for load balancing. You could look at rebuilding that on top of RabbitMQ if that's your transport. It's not super complicated, but would take some effort to do.
Other than that, I think you're likely down to writing something from scratch. It's not really pub/sub any more. So it falls outside MT's wheelhouse.

Regarding Akka message transfer performance: many small messages or fewer large messages?

For a data-mining algorithm I am currently developing using Akka, I was wondering if Akka implements performance optimizations of the messages that are sent.
For instance, if I have an Actor that emits a very large number of messages to the same other Actor, is it good to encapsulate a set of messages into another large message? Or does Akka have some sort of buffer itself so that not one message but many messages are transferred over the network at once?
I am asking this question because the algorithm is supposed to be executed remotely on a cluster where transfer performance is important and I currently have no option to just do benchmarks myself.
For messages passed in Akka on the same machine, I don't think it matters a lot whether you use small messages or an aggregation of messages as a single message. The additional overhead of many calls versus having to loop while processing the aggregation is minimal, I think.
I would prefer using small messages because it keeps the system simpler.
However, when sending messages over the network, Akka remoting serializes each message and sends it over a network transport (TCP in classic remoting), so each message carries extra serialization and framing overhead. Therefore you might choose here to aggregate some messages into a single message.
However, this also depends on your use case. Buffering implies waiting until there are enough messages (or a timeout occurred). If you cannot wait, e.g. because you need fast responses, then you still need to send each message over individually.
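The "enough messages or a timeout" rule can be sketched independently of Akka. The `Batcher` class below is a hypothetical helper written for this example, with an injectable clock so the timeout can be exercised without sleeping; in an actor it would live inside the sender, with `tick` driven by a scheduled message.

```python
class Batcher:
    """Collects messages and sends them as one batch on size or age."""

    def __init__(self, send, max_size, max_wait, clock):
        self._send = send          # callable that receives a list of messages
        self._max_size = max_size  # flush when this many are buffered
        self._max_wait = max_wait  # flush when the oldest is this old (seconds)
        self._clock = clock        # callable returning the current time
        self._buffer = []
        self._oldest = None

    def add(self, message):
        if not self._buffer:
            self._oldest = self._clock()
        self._buffer.append(message)
        if len(self._buffer) >= self._max_size:
            self.flush()

    def tick(self):
        """Call periodically; flushes if the oldest message waited too long."""
        if self._buffer and self._clock() - self._oldest >= self._max_wait:
            self.flush()

    def flush(self):
        if self._buffer:
            self._send(list(self._buffer))
            self._buffer.clear()


now = [0.0]  # fake clock so the example is deterministic
sent = []
batcher = Batcher(sent.append, max_size=3, max_wait=0.5, clock=lambda: now[0])

for n in range(4):
    batcher.add(n)  # the third add triggers a size-based flush

now[0] = 1.0
batcher.tick()  # the leftover message is flushed by the timeout
```

The `max_wait` value is the latency you are willing to trade for fewer network sends; setting it to zero degenerates to sending every message individually.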
I don't think there is a standard Akka actor available which does some aggregation of messages. Maybe a special kind of routing could be applied which does the buffering.
Or you might have a look at Akka Streams. That does support buffering of messages.

Resources