Techniques to deal with complexity of event driven apps - events

What are the techniques, in general, to write system which receives events from outside world? E.g. Bluetooth application can receive many low level events from HCI. It has to track the state which is very often temporary and allow/disallow operation based on current state.
Are there some specific techniques that helps deal with such complexity? Is there any practical literature to describe some in more details?

Well observer pattern is the primary and perhaps only pattern to deal with such cases. Though there are variations to it like message driven collaborations, pub-sub, topic or tag based collaborations but most of them all reuse the ideology of observer pattern.
Now comping to complexity. Normally event observers and publishers are stateless in nature. i.e. they receive and event and they either propagate them or consume them. Now propagation is easy but consumptions may be tricky as the meaning of consumption may differ based on the target and the event type.
Then there are complexities related to performance of the operations of producing and consuming events. Here again a good design would tend to keep events mutually exclusive to each other. that means that two clicks of a button or two file writes are not dependent on each other in any way. Same should be the case about consumption of event as well. The consumption process should not be dependent on the sequence of arrival of events.
A good event observer design leverages stateless producers and consumers, domain driven event objects (an event object that know where it is supposed to go) and concurrency in producing and consuming to the brim.

Related

Is REPLICATE DATA pattern good option to minimize synchronous micro-services communication?

In a world of microservices, often one microservice needs to invoke another, synchronous or asynchronous way.
In the case of synchronous way of communication, I have understood that it affects the availbility of services, as both services need to be available during calls.
To minimize this synchronous way of communication, one possible solution is to have DATA REPLICATION at client service. The client service also up-to-date data by listening to events published by services.
According to me, this is not a good choice as we are duplicating data and it might become stale and also database overhead.
what will be the best suitable scenario when the above pattern will be the best suit?
Microservices are distributed systems. This means that they are constrained by the CAP theorem, which basically means you have a choice between:
Sacrifice availability to preserve consistency: this would (among other things) lead to one service invoking functionality in another in a synchronous way. If the other service is unavailable, so is all functionality in this service which depends on that service's functionality.
Sacrifice consistency to preserve availability: you build services to be autonomous and not depend on other services being up. This leads in fairly short order to services not sharing databases and to asynchronous replication of data (because if service A has synchronously replicated data from service B, then service B being down doesn't affect A's availability, but A being down affects B's availability): with asynchronous replication, the best you can hope for is eventual consistency.
The choice between those two (if you happen to have the ability to freeze the entire universe if there's a network partition, you might be able to sacrifice partition tolerance for consistency and availability) is ultimately a business question (it's worth noting that there's a continuum of approaches between those extremes). How much are you spending on storage and on designing an (arguably) more complex system vs. how much are you losing by being unavailable?
It should be noted that the universe is inherently eventually consistent: the sun could have gone supernova a few minutes ago and we can't know it for a few minutes more.
As for the concern about duplicated data: chances are the data is already duplicated (backups) and in any database worth using the data is duplicated (the write-ahead log).
As for situations, it's a lot harder to think of a situation where aiming for strong consistency is strictly the most suitable option.
But for an example, consider a chain of coffee shops. We have a cash register service and we have a loyalty/rewards service. Data from the loyalty/rewards service is needed by the cash register (if a customer is redeeming a "50% off a latte" reward you'd want the register to know that it's valid), and every transaction (at least those with a loyalty ID) at the register should be known by the rewards service.
If we want the reward redemptions to be consistent, then it implies that if the loyalty/rewards service is inaccessible from the register, no rewards can be redeemed. There's a nonzero chance that a customer who can't redeem a reward just walks out (and a further nonzero chance that they never get coffee from you again).
Conversely, if we want both services to have a consistent view then we're demanding that if the power's out at any store we can't determine new rewards, or if the loyalty/rewards service is inaccessible from the register, no new sales can be made.
The solution is for both services to maintain the data they need to function, even if another service controls updates to that data. They'll eventually catch up. In the case of reward redemption, assuming the unavailability happens rarely enough, it may even be desirable to have the cash register perform a preliminary validation and if that passes, assume that the reward is valid and submit it later to the loyalty/reward service.

How do I ensure that only one consumer actually consumes a published message?

I use Rabbitmq with microservice architecture. I use topic and direct exchange for many of my use-cases, and it works fine. However I have a use-case where I have to delete a record from database. When I deleted the record several other services needs to be called and maintain/delete the referenced records. I could achieve that by simple call those services with direct exchange, but I read that it is choreography preferred instead orchestration. That means the I should implement publish/subscribe pattern(fanout in rabbitmq).
My question is that if I use the publish/subscribe pattern in a distributed system how to make sure that only one instance by service consumes the published messages?
Your question doesn't deal so much with publish-subscribe, as it does with basic message processing. The fundamental issue is whether or not you can guarantee that an operation will be performed exactly one time. The short answer is that you probably want to use a direct exchange such that a message goes into one queue and is processed by one (of possibly many) consumers.
The long answer is that "exactly once" cannot be guaranteed, so you need to make this part of your design.
Background
It is best practice to have message processing be an idempotent operation. In fact, idempotency is a critical design assumption of almost any external interface (and I would argue it is equally-important in internal interfaces).
Additionally, you should be aware of the fact it is not possible to guarantee "exactly once" delivery. Mathematically, no such guarantee can be made. Instead, you can have one of two things (being mutually exclusive):
At most once delivery (0 < n <= 1)
At least once delivery (1 <= n)
From the RabbitMQ documentation:
Use of acknowledgements guarantees at-least-once delivery. Without acknowledgements, message loss is possible during publish and consume operations and only at-most-once delivery is guaranteed.
Several things are happening when messages are published and consumed. Because of the asynchronous nature of message handling systems, and the AMQP protocol in particular, there is no way to guarantee exactly once processing while still yielding the performance you would need from a messaging system (essentially, to try to ensure exactly-once would forces everything through a serial process at the point of de-duplication).
Design Implications
Given the above, it is important that your design rely upon "at least once" delivery. For a delete operation, this involves re-writing the definition of that operation to be assertive rather than procedural (e.g. "Delete this" becomes "Ensure this does not exist."). The difference is that you describe the end-state rather than the process.
I thing you should have a separate queue for each of the service that instance should be notified about db record deletion. The exchanger puts a copy of a message in all queues. Service instances compete for access to dedicated queue (only one gets a message).

Using data structures and algorithms, other than queue and FIFO, for handling and prioritizing requests

Using queue and first-in-first-out algorithm seems like a standard way of dealing with requests to servers/databases/services. But could any other data structures and algorithms be used for dealing with large quantities of requests?
There is an article at http://queue.acm.org/detail.cfm?id=2991130 "Scaling Synchronization in Multicore Programs" which goes into the detail of reducing the locking costs when multiple cores are trying to update the same FIFO queue.
If different users or different requests have different priorities then a FIFO queue won't do, and you need something that takes account of the priority. I have often seen ordered maps used for this, such as TreeMap and std::map or std::multimap. It's not the use case these are designed for, but people know how to use them, and they are typically highly optimised, so they can be faster than most people's attempt at implementing a more specialised data structure.

JMS queue consumer: synchronous receive() or single-threaded onMessage()

I need to consume from a Q, and stamp a sequence key on each message to indicate the ordering. i.e. the consumption needs to be sequential. From performance/throughput point of view, would I be better off using a blocking receive() method, or an async listener with a single-threaded configuration on the onMessage() method?
Thanks.
There are many aspects that will affect the performance and throughput; in pure JMS terms it's not really possible to state that the sync or async model of getting messages will be any less or more efficient. It will depend on a large number of factors from how the application is written, other resources it's using, implementation of your chosen messaging provider and other factors such as machine performance and configuration of both client and server machines.
This discussion,
Single vs Multi-threaded JMS Producer, covered some of these topics.
To the sequence, if you are single threaded, with a single session the JMS specification gives some assurances on message ordering; best to review the spec to see if it matches your overall requirements.
Often people will insert an application sequence number at message production time; the consumer can therefore check they are getting the correct message in order. Adding a sequence number at consumption time won't specifically help that consumer.
Keep in mind that the stricter the requirement for messaging ordering the more restrictive the overall architecture gets and the harder it is to implement horizontal scalabilty.

How to dedupe parallel event streams

Is there a standard approach for deduping parallel event streams ? Before I attempt to reinvent the wheel, I want to know if this problem has some known approaches.
My client component will be communicating with two servers. Each one is providing a near real-time event stream (~1 second). The events may occasionally be out of order. Assume I can uniquely identify the events. I need to send a single stream of events to the consuming code at the same near real-time performance.
A lot has been written about this kind of problem. Here's a foundational paper, by Leslie Lamport:
http://research.microsoft.com/en-us/um/people/lamport/pubs/pubs.html#time-clocks
The Wikipedia article on Operational Transformation theory is a perfectly good starting point for further research:
http://en.wikipedia.org/wiki/Operational_transformation
As for your problem, you'll have to choose some arbitrary weight to measure the cost of delay vs the cost of dropped events. You can maintain two priority queues, time-ordered, where incoming events go. You'd do a merge-and on the heads of the two queues with some delay (to allow for out-of-order events), and throw away events that happened "before" the timestamp of whatever event you last sent. If that's no better than what you had in mind already, well, at least you get to read that cool Lamport paper!
I think that the optimization might be OS-specific. From the task as you described it I think about two threads consuming incoming data and appending it to the common stream having access based on mutexes. Both Linux and Win32 have mutex-like procedures, but they may have slow performance if you have data rate is really great. In this case I'd operate by blocks of data, that will allow to use mutexes not so often. Sure there's a main thread that consumes the data and it also access it with a mutex.

Resources