CQRS: project out-of-order notifications in an ElasticSearch read model

We have a microservice architecture and apply the CQRS pattern. A command sent to a microservice triggers an application state change and the emission of the corresponding event on our Kafka bus. We project these events in a read model built with ElasticSearch.
So far, so good.
Our microservices are eventually consistent with each other. But at any given time, they aren't (necessarily). Consequently, the events they send are not always consistent with each other either.
Moreover, to guarantee the coherence between an application state change and the emission of the corresponding event, we persist the new state and the corresponding event in the DB in the same transaction (I am aware that we could use event sourcing and avoid persisting the state altogether). An asynchronous worker is then responsible for sending these events on the Kafka bus. This pattern guarantees that each event is sent at least once for each state change (duplicates are not an issue since our events are idempotent). However, since each microservice has its own event table and asynchronous worker, we cannot guarantee that events will be sent in the sequence in which the corresponding state changes occurred in their respective microservices.
EDIT: to clarify, each microservice has its own database, its own event table and its own worker. A specific worker processes the events in the order in which they were persisted in its corresponding event table, but different workers on different event tables, i.e. for distinct microservices, give no such guarantee relative to each other.
The problem arises when projecting these incoherent or out-of-sequence events from different microservices in the same ElasticSearch document.
A concrete example: let's imagine three different aggregates A, B and C (aggregate in the Domain Driven Design sense) managed by different microservices:
There is a many-to-many relation between A and B. Aggregate A references the aggregate roots B it is bound to, but B is unaware of its relationships with A. When B is deleted, the microservice managing A listens for the corresponding event and undoes the binding of A with B.
Similarly, there is a many-to-many relation between B and C. B knows of all related C aggregates, but the inverse is not true. When C is deleted, the microservice managing B listens for the corresponding event and undoes the binding of B with C.
C has a property "name".
One of the use cases is to find, through ElasticSearch, all aggregates A that are bound to an aggregate B that is in turn bound to an aggregate C with a specific name.
As explained above, the separate event tables and workers could introduce variable delays between the emission of events from different microservices. Creating A, B and C and binding them together could for example result in the following sequence of events:
1. B created
2. B bound to C
3. C created with name XYZ
4. A created
5. A bound to B
Another example of a batch of events: let's suppose we initially have aggregates B and C and two commands are issued simultaneously:
- delete C
- bind B to C
This could result in the events:
1. C deleted
2. B bound to C
3. B unbound from C (in response to event 1)
Concretely, we have trouble projecting these events in ElasticSearch document(s) because the events sometimes reference aggregates that do not exist anymore or do not exist yet. Any help would be appreciated.

I don't think the problem you raise is exclusive to the projection part of your system - it can also happen between microservices A, B and C.
Normally, the projector gets C created at the same time as B does. Only then can B bind itself to C, which makes it impossible for the specific order you mentioned to happen to the projector.
However, you're right to say that the messages could arrive in the wrong order if for instance, the network communication between B and C is considerably faster than between C and the projector.
I've never come across such a problem, but a few options come to mind:
Don't enforce "foreign keys" at the read model level. Store B with its C reference even if you know very little about C for now. In other words, make B bound to C and C created commutative.
Add a causation ID to your events. This allows a client to recognize and deal with out of order messages. You can choose your own policy - reject, wait for causation event to arrive, try to process anyway, etc. That is not trivial to implement, though.
Messaging platforms can guarantee ordering under certain conditions. Kafka, which you mentioned, does so only within a single partition of a topic. RabbitMQ, I think, has even stronger prerequisites.
I'm not a messaging expert, but it looks like the inter-microservice communication scenarios where this would be feasible are limited. It also seems to go against the current trend in eventual consistency, where we tend to favor commutative operations (see CRDTs) over ensuring total order.
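A minimal sketch of the first option (no "foreign keys" in the read model), in Java. It is not the asker's projector: TolerantProjection and its method names are hypothetical, and plain in-memory maps stand in for ElasticSearch documents. Bindings are stored even when the referenced aggregate has not been seen yet, deletions are kept as tombstones, and consistency is only evaluated at query time. It assumes, as stated in the question, that events from a single microservice arrive in order, so only cross-service ordering varies.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical in-memory stand-in for the read model; all names are illustrative. */
class TolerantProjection {
    private final Map<String, Set<String>> aToB = new HashMap<>();
    private final Map<String, Set<String>> bToC = new HashMap<>();
    private final Map<String, String> cNames = new HashMap<>();
    private final Set<String> deletedC = new HashSet<>(); // tombstones

    // No existence checks: a binding may reference an aggregate not seen yet.
    void aBoundToB(String a, String b) { aToB.computeIfAbsent(a, k -> new TreeSet<>()).add(b); }
    void bBoundToC(String b, String c) { bToC.computeIfAbsent(b, k -> new TreeSet<>()).add(c); }
    void bUnboundFromC(String b, String c) {
        Set<String> cs = bToC.get(b);
        if (cs != null) cs.remove(c);
    }
    void cCreated(String c, String name) { cNames.put(c, name); }
    // A tombstone wins over a "created" event that arrives late.
    void cDeleted(String c) { deletedC.add(c); }

    /** All A ids bound to a B that is bound to a live C with the given name. */
    Set<String> findAByCName(String name) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Set<String>> a : aToB.entrySet()) {
            for (String b : a.getValue()) {
                for (String c : bToC.getOrDefault(b, Collections.<String>emptySet())) {
                    if (!deletedC.contains(c) && name.equals(cNames.get(c))) {
                        result.add(a.getKey());
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TolerantProjection p = new TolerantProjection();
        p.bBoundToC("b1", "c1");   // arrives before "C created"
        p.cCreated("c1", "XYZ");
        p.aBoundToB("a1", "b1");
        System.out.println(p.findAByCName("XYZ")); // [a1]
    }
}
```

With this shape, "B bound to C" and "C created" commute, and the "C deleted, then B bound to C" sequence simply leaves a dangling reference that the query ignores.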

Related

BPMN Modelling: Parallel Processes, Dependency on Status of an Incomplete Process

I am trying to model a process that splits into 2 parallel threads, where thread 1 progresses independently through milestones, while thread 2 needs to take into consideration its own progress + the status of thread 1 to progress through the milestones. At the end, both threads need to complete. How do I model it? (my best try below)
What you modeled would work. However, you don't need the intermediate events. You can directly connect to the tasks. And you don't need an inclusive gateway. It would work, but a parallel gateway would do the same and be less complicated.
In short
There is an issue in the way you merge the incoming event with the normal flow on the lower branch. The symbol used is ambiguous and does not guarantee compliance with the execution semantics.
More details
The diagram will probably be understood as you expect. But it is not correct from the point of view of the BPMN execution semantics due to a missing synchronisation.
Let's analyse the flow with the concept of token, according to the execution semantics (chapter 13 of the specs):
A Process is instantiated when one of its Start Events occurs.
Each Start Event that occurs creates a token on its outgoing Sequence Flows
For a Process instance to become completed, all tokens in that instance MUST reach an end node, i.e., a node without outgoing Sequence Flows
So at the start of your process, a token is created, and it is passed to the first task. You then have a parallel gateway for a fork:
The Parallel Gateway consumes exactly one token from each incoming Sequence Flow and produces exactly one token at each outgoing Sequence Flow.
You then have 2 tokens that will flow to the first upper and the first lower task. The upper token will continue to the "none" intermediate event. The lower token will reach the entry of a "merge gate". The question is whether we are guaranteed to keep one token on each parallel branch.
The "none" intermediate event will throw and pass the token down the outgoing flows. 2 tokens are hence generated: one to the next upper task, and one to the "merge gate".
What I colloquially called a "merge gate" is in fact ambiguous in your diagram:
It cannot be an exclusive gateway, since this would route each incoming token through it. This would mean that in the lower branch we would end up with two tokens. This would not be legal.
It could be an inclusive gateway. But the symbol inside should be a simple circle and not a double circle as you have used. The inclusive gateway consumes all tokens AVAILABLE on the input, but it requires at least one to become active and does not wait for all tokens to be there. There is no synchronisation guarantee, and you could end up with more than one token on the lower flow if there is the slightest delay in one of the branches. This is not acceptable.
Event-based gateways are 2-step gateways. The first step is an event with a pentagon inside, and it must have several outgoing flows, each leading to a different kind of event to be received. In this case it makes no sense, since we do not expect several kinds of events.
According to the book "Real-Life BPMN" written by Freund & Rücker from Camunda, the solution would be to use a complex gateway, i.e. with a large internal '*' symbol and the description of a condition stating that all inputs must be available. You'd then be guaranteed to have only one outgoing token in the lower flow.
I personally would recommend a parallel join gateway: in fact the two outgoing flows from the intermediate event are uncontrolled flows and are to be understood as implicitly starting a new parallel branch. The join gateway would then clearly show the merge of the new implicit branch with the lower branch and clearly document the synchronisation (i.e. waiting for both tokens to be available). This seems to be the most appropriate alternative so far.
An even easier alternative would be to get rid of the lower merge gate and have two incoming flows for the second lower task. This is then understood as two incoming uncontrolled flows, similar to an implicit join. It's equivalent to the previous solution but with fewer symbols.
The last two options are the only ones that guarantee exactly one token remains on the upper branch and one on the lower branch. The rest of the flow is then trivial until the end.

In consistent global states, what is the difference between a run and a consistent run?

I am referring to global states in a distributed system as published in the paper Consistent Global States of Distributed Systems: Fundamental Concepts and Mechanisms by Özalp Babaoğlu and Keith Marzullo (pdf).
On page 7, they define a run as follows:
A run of a distributed computation is total ordering that includes all
of the events in the global history and that is consistent with each
local history.
Then on page 11, they define a consistent run as follows:
A run R is said to be consistent if for all events, e happens before e' implies that e appears
before e' in R. In other words, the total order imposed by R on the events
is an extension of the partial order defined by causal precedence.
So far so good, but in the following sentences they go on to say that there can be two consistent global states A and B, and they define some sort of leads-to operator which says A leads to B if there exists some consistent run in which this holds.
My question is, how can there be multiple runs? Won't there be a unique run (a totally ordered relation of events based on some tie breaker for concurrent events) for every system setup? How can there be multiple runs (other than the ones induced by different concurrent event tie breakers)?

RxJS event order guarantee

While exploring Rx for our project, we ran into the following puzzler:
We have one stream S1 that can receive two distinct events (A and B).
If we create two separate streams (Sx1 and Sx2) from that stream (S1) that subscribe specifically for either A or B events (Sx1 for A and Sx2 for B), is there any guarantee that the subscribers will receive the events in the order they arrive in S1?
It all depends on which merging method you choose, since that determines the order in which the results are emitted.
Take a look at RxMarbles; it has great visual examples.
For this case I'd say concat would keep the events in the order they went in, but if you are dealing with async data this might not be the best option. Look at the combining operators on RxMarbles.

Design/Code Dispatcher for a Publish-Subscribe System

A friend of mine was asked this problem in an interview. I would like to discuss it here.
What would be an efficient implementation for this problem?
A simple idea that comes to mind is a normal memqueue: use Memcache machines to absorb the requests at scale, with a consumer job that writes entries from Memcache to the DB.
For the second part, we can then just run a SQL query to find the list of matching subscribers.
PROBLEM:-
Events get published to this system. Each event can be thought of as containing a fixed number (N) of string columns called C1, C2, … CN. Each event can thus be passed around as an array of Strings (C1 being the 0th element in the array, C2 the 1st and so on).
There are M subscribers – S1, … SM
Each subscriber registers a predicate that specifies what subset of the events it’s interested in. Each predicate can contain:
Equality clause on columns, for example: (C1 == “US”)
Conjunctions of such clauses, for example:
a. (C1 == “IN”) && (C2 == “home.php”)
b. (C1 == “IN”) && (C2 == “search.php”) && (C3 == “nytimes.com”)
(In the above examples, C1 stands for the country code of an event, C2 for the web page of the site and C3 for the referrer code.)
That is, each predicate is a conjunction of some number of equality conditions. Note that a predicate does not necessarily have an equality clause for ALL columns (i.e. a predicate may not care about the value of some or all columns). In the examples above, #a does not care about the columns C3, … CN.
We have to design and code a Dispatcher that can match incoming events to registered subscribers. The incoming event rate is in millions per second. The number of subscribers is in thousands. So this dispatcher has to be very efficient. In plain words:
When the system boots, all the subscribers register their predicates to the dispatcher
After this events start coming to the dispatcher
For each event, the dispatcher has to emit the id of the matching subscribers.
In terms of an interface specification, the following can be roughly spelt out (in Java):
class Dispatcher {
    public Dispatcher(int N /* number of columns in each event, fixed up front */);
    public void registerSubscriber(String subscriberId /* assume no conflicts */,
                                   String predicate /* predicate for this subscriberId */);
    public List<String> findMatchingIds(String[] event /* assume each event has N Strings */);
}
That is, the dispatcher is constructed, then a bunch of registerSubscriber calls are made. After this we continuously invoke the method findMatchingIds(), and the goal of this exercise is to make this function as efficient as possible.
As Hanno Binder implied, the problem is clearly set up to allow pre-processing the subscriptions to obtain an efficient lookup structure. Hanno says the lookup should be a map
(N, K) -> set of subscribers who specified K in field N
(N, "") -> set of subscribers who omitted a predicate for field N
When an event arrives, just look up all the applicable sets and find their intersection. A lookup failure returns the empty set. I'm only recapping Hanno's fine answer to point out that a hash table is O(1) and perhaps faster in this application than a tree. On the other hand, intersecting trees can be faster, O(S + log N) where S is the intersection size. So it depends on the nature of the sets.
Alternative
Here is my alternative lookup structure, again created only once during preprocessing. Begin by compiling a map
(N, K) -> unique token T (small integer)
There is also a distinguished token 0 that stands for "don't care."
Now every predicate can be thought of as a regular expression-like pattern with N tokens, either representing a specific event string key or "don't care."
We can now build a decision tree in advance. You can also think of this tree as a Deterministic Finite Automaton (DFA) for recognizing the patterns. Edges are labeled with tokens, including "don't care". A don't care edge is taken if no other edge matches. Accepting states contain the respective subscriber set.
Processing an event starts with converting the keys to a token pattern. If this fails due to a missing map entry, there are no subscribers. Otherwise feed the pattern to the DFA. If the DFA consumes the pattern without crashing, the final state contains the subscriber set. Return this.
For the example, we would have the map:
(1, "IN") -> 1
(2, "home.php") -> 2
(2, "search.php") -> 3
(3, "nytimes.com") -> 4
For N=4, the DFA would look like this:
o --1--> o --2--> o --0--> o --0--> o
          \
           -3--> o --4--> o --0--> o
Note that since there are no subscribers who don't care about e.g. C1, the starting state doesn't have a don't care transition. Any event without "IN" in C1 will cause a crash, and the null set will be properly returned.
With only thousands of subscribers, the size of this DFA ought to be reasonable.
Processing time here is of course O(N) and could be very fast in practice. For real speed, the preprocessing could generate and compile a nest of C switch statements. In this fashion you might actually get millions of events per second with a small number of processors.
You might even be able to coax a standard tool like the flex scanner generator to do most of the work for you.
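As a rough illustration of the decision-tree idea, here is a minimal Java sketch. It is a variation rather than the exact automaton described above: instead of taking the "don't care" edge only when no other edge matches, it follows both the exact edge and the "don't care" edge, so that a subscriber with a specific value and another with a wildcard in the same column are both found without precomputing merged states. TrieDispatcher is a hypothetical name, and predicates are assumed to arrive already parsed into an array of N entries where null means "don't care".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical trie-based matcher; predicates are pre-parsed patterns (null = don't care). */
class TrieDispatcher {
    private static final class Node {
        final Map<String, Node> exact = new HashMap<>(); // edges labeled with a column value
        Node any;                                        // the "don't care" edge
        final List<String> subscribers = new ArrayList<>(); // filled only at depth N
    }

    private final int n;
    private final Node root = new Node();

    TrieDispatcher(int n) { this.n = n; }

    void registerSubscriber(String subscriberId, String[] pattern) {
        Node node = root;
        for (int i = 0; i < n; i++) {
            String key = pattern[i];
            if (key == null) {
                if (node.any == null) node.any = new Node();
                node = node.any;
            } else {
                node = node.exact.computeIfAbsent(key, k -> new Node());
            }
        }
        node.subscribers.add(subscriberId);
    }

    List<String> findMatchingIds(String[] event) {
        List<String> out = new ArrayList<>();
        collect(root, event, 0, out);
        return out;
    }

    // Follows the exact edge and the don't-care edge at every level.
    private void collect(Node node, String[] event, int depth, List<String> out) {
        if (node == null) return;
        if (depth == n) { out.addAll(node.subscribers); return; }
        collect(node.exact.get(event[depth]), event, depth + 1, out);
        collect(node.any, event, depth + 1, out);
    }

    public static void main(String[] args) {
        TrieDispatcher d = new TrieDispatcher(3);
        d.registerSubscriber("s1", new String[] {"IN", "home.php", null});
        d.registerSubscriber("s3", new String[] {"IN", null, null});
        System.out.println(d.findMatchingIds(new String[] {"IN", "home.php", "nytimes.com"})); // [s1, s3]
    }
}
```

The walk visits at most two children per level, so the work per event is bounded by the number of registered pattern paths that are compatible with the event, not by the total number of subscribers.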
A solution that comes to my mind would be:
For each Cn we have a mapping from values to sets of subscribers, covering those subscribers who subscribed to a specific value of Cn. Additionally, for each Cn we have a set of subscribers who don't care about the value of Cn ('ANY').
When receiving an event, we look up the subscribers with a matching subscription for Cn and get a set of 0 or more subscribers. To this set we add the subscribers from the 'ANY' set for this Cn.
We do this for every n <= N, yielding N sets of subscribers. The intersection of all N sets is the set of subscribers matching this event.
The mapping from Cn values to subscribers can be stored efficiently as a tree, which gives a lookup complexity of O(log k) for a single Cn, given that there are subscriptions to k different values.
Thus, for N columns the lookups cost O(N log k).
Intersecting the N sets can also be done in O(N log m), so that we end up with logarithmic complexity in total, which shouldn't be too bad.
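The per-column lookup can be sketched in Java roughly as follows. This is an illustration under assumptions, not a reference solution: IndexedDispatcher is a hypothetical name, hash maps and bit sets replace the trees mentioned above, predicates are parsed with a simple regular expression that accepts straight or curly quotes, and each predicate is assumed to have at most one clause per column. Instead of materialising the 'ANY' set, the sketch records which subscribers constrain a column and removes those whose required value differs, which yields the same intersection.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical per-column index: value -> subscribers requiring that value. */
class IndexedDispatcher {
    // Matches clauses such as C2 == "home.php" (straight or curly quotes).
    private static final Pattern CLAUSE =
        Pattern.compile("C(\\d+)\\s*==\\s*[\"\u201C]([^\"\u201D]*)[\"\u201D]");

    private final int n;
    private final List<String> ids = new ArrayList<>();
    private final List<Map<String, BitSet>> byValue = new ArrayList<>();
    private final List<BitSet> constrained = new ArrayList<>(); // subscribers with a clause on the column

    IndexedDispatcher(int n) {
        this.n = n;
        for (int i = 0; i < n; i++) {
            byValue.add(new HashMap<>());
            constrained.add(new BitSet());
        }
    }

    void registerSubscriber(String subscriberId, String predicate) {
        int idx = ids.size();
        ids.add(subscriberId);
        Matcher m = CLAUSE.matcher(predicate);
        while (m.find()) {
            int col = Integer.parseInt(m.group(1)) - 1; // C1 is element 0
            byValue.get(col).computeIfAbsent(m.group(2), k -> new BitSet()).set(idx);
            constrained.get(col).set(idx);
        }
    }

    List<String> findMatchingIds(String[] event) {
        BitSet result = new BitSet();
        result.set(0, ids.size()); // start with everyone, then eliminate per column
        for (int col = 0; col < n && !result.isEmpty(); col++) {
            // Failing subscribers: constrained on this column, but on another value.
            BitSet failing = (BitSet) constrained.get(col).clone();
            BitSet hit = byValue.get(col).get(event[col]);
            if (hit != null) failing.andNot(hit);
            result.andNot(failing);
        }
        List<String> out = new ArrayList<>();
        for (int i = result.nextSetBit(0); i >= 0; i = result.nextSetBit(i + 1)) {
            out.add(ids.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        IndexedDispatcher d = new IndexedDispatcher(3);
        d.registerSubscriber("s1", "(C1 == \"IN\") && (C2 == \"home.php\")");
        d.registerSubscriber("s2", "(C1 == \"IN\") && (C2 == \"search.php\") && (C3 == \"nytimes.com\")");
        System.out.println(d.findMatchingIds(new String[] {"IN", "home.php", "google.com"})); // [s1]
    }
}
```

With thousands of subscribers each bit set spans only a few dozen machine words, so the per-event cost is N hash lookups plus N word-wise bit operations.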
Interesting.
My initial thoughts.
I feel it would be easier if subscriber predicates such as
(C1 == “IN”) && (C2 == “search.php”) && (C3 == “nytimes.com”)
that reach the Dispatcher's registerSubscriber method were flattened, so that they are cheaper to compare. Something like below (wild guess):
C1IN|C2search.php|C3nytimes.com
A map from the flattened event string to subscriber ids then needs to be maintained in memory.
In the findMatchingIds method, the String array of the event also needs to be flattened with the same rules, so that a lookup can be done for the matching subscriber ids.
This way the Dispatchers can be scaled horizontally, serving many events in parallel.
I think this is more of a design question; I don't think the interviewer would have been looking for working code. The general problem is called content-based publish-subscribe, and if you search for papers in this area you will get a lot of results.
For instance, this paper.
Here are a few things the system would need:
1) A data store for the subscriptions, which needs to hold:
a) The list of subscribers
b) The list of subscriptions
2) A means of authenticating the subscription requests and the nodes themselves:
a) Server and subscribers communicate over SSL. Handling thousands of SSL connections is a CPU-intensive task for the server, especially if lots of connections are set up in bursts.
b) If all the subscriber nodes are in the same trusted network, SSL is not needed.
3) Whether we want a push- or pull-based model:
a) The server maintains the latest timestamp seen per node, per matched filter. When an event matches a filter, it sends a notification to the subscriber. The client then sends a request, and the server starts sending the matching events.
b) The server matches the filter and sends the matching events to the clients in one shot.
The difference between (a) and (b) is that in (a) more state is maintained on the client side, which makes it easier to add subscriber-specific logic later on. In (b) the client is dumb: it has no means of saying that it does not want to receive events for whatever reason (say, a network clog).
4) How are the events maintained in memory on the server side?
a) The logical model here is a table with string columns (C1..CN), where each new row is a new event.
b) We could have a hash table per column storing a tuple of (timestamp, pointer to event structure), with each event given a unique id. Different data structures lead to different schemes.
c) Events are considered an infinite stream. With a 32-bit event id, integer overflow is a possibility.
d) If a timer function on the server drives matching and dispatching, what is the actual resolution of the system timer? Does that have any implications?
e) Memory allocation is a very expensive operation. If the filter-matching logic allocates and frees memory frequently, performance will suffer. How can we manage the memory pool for this particular operation? Would we need different size buckets of page-aligned memory?
5) What should happen if a subscriber node loses connectivity or goes down?
a) Is it acceptable for the client to lose events during that period, or should the server buffer everything?
b) If the subscriber goes down, how far back in time can it request matching events?
6) More details of the messaging layer between server and subscriber:
a) Is the communication between the server and the subscribers synchronous or asynchronous?
b) Do we need a binary or a text-based protocol between client and server? (There are trade-offs either way.)
7) Do we need any rate-limiting logic on the server side? What should we do if we starve some of the clients while serving data to a few others?
8) How are changes to subscriptions managed? If a client wishes to change its subscription, should it be updated in memory first and then in the permanent data store, or vice versa? What happens if the server goes down before the data store is written to? How do we ensure consistency of the data store (the subscriptions and the server list)?
9) This assumed a single server. What if we need a cluster of servers that the subscribers can connect to? (A whole bunch of issues here.)
a) How can network partitioning be handled? (For example, of 5 nodes, 3 can reach each other and the other 2 can only reach each other.)
b) How are events and workload distributed among the members of the cluster?
10) Is absolute correctness of the information sent to the subscriber a requirement, i.e. can the client receive more than what its subscription rules indicate? This can determine the choice of data structure, for example using a probabilistic data structure like a Bloom filter on the server side while filtering.
11) How is the time ordering of events maintained on the server side? (A time-ordered linked list? Timestamps?)
12) Will the predicate-logic parser for the subscriptions need Unicode support?
In conclusion, content-based pub-sub is a pretty vast area. It is a distributed system that involves the interaction of databases, networking, algorithms and node behavior (systems go down, disks go bad, a system runs out of memory because of a memory leak, etc.), and we have to look at all of these aspects. Most importantly, we have to look at the time available for the actual implementation, and then determine how we want to go about solving this problem.

Database design for bus reservation

I'm developing a reservation module for buses and I have trouble designing the right database structure for it.
Let's take following case:
Buses go from A to D with stopovers at B and C. A passenger can reserve a ticket for any route, i.e. from A to B, C to D, A to D, etc.
So each route can have many "subroutes", and bigger ones contain smaller ones.
I want to design a table structure for routes and stops in a way that makes it easy to search for free seats. So if someone reserves a seat from A to B, then seats from B to C or D would still be available.
All ideas would be appreciated.
I'd probably go with a "brute force" structure similar to this basic idea:
(There are many more fields that should exist in the real model. This is only a simplified version containing the bare essentials necessary to establish relationships between tables.)
The ticket "covers" stops through TICKET_STOP table, For example, if a ticket covers 3 stops, then TICKET_STOP will contain 3 rows related to that ticket. If there are 2 other stops not covered by that ticket, then there will be no related rows there, but there is nothing preventing a different ticket from covering these stops.
Liberal usage of natural keys / identifying relationships ensures two tickets cannot cover the same seat/stop combination. Look at how LINE.LINE_ID "migrates" alongside both edges of the diamond-shaped dependency, only to be merged at its bottom, in the TICKET_STOP table.
This model, by itself, won't protect you from anomalies such as a single ticket "skipping" some stops - you'll have to enforce some rules through the application logic. But, it should allow for a fairly simple and fast determination of which seats are free for which parts of the trip, something like this:
SELECT *
FROM
    STOP CROSS JOIN SEAT
WHERE
    STOP.LINE_ID = :line_id
    AND SEAT.BUS_NO = :bus_no
    AND NOT EXISTS (
        SELECT *
        FROM TICKET_STOP
        WHERE
            TICKET_STOP.LINE_ID = :line_id
            AND TICKET_STOP.BUS_ID = :bus_no
            AND TICKET_STOP.TRIP_NO = :trip_no
            AND TICKET_STOP.SEAT_NO = SEAT.SEAT_NO
            AND TICKET_STOP.STOP_NO = STOP.STOP_NO
    )
(Replace the parameter prefix : with what is appropriate for your DBMS.)
This query essentially generates all combinations of stops and seats for given line and bus, then discards those that are already "covered" by some ticket on the given trip. Those combinations that remain "uncovered" are free for that trip.
You can easily add: STOP.STOP_NO IN ( ... ) or SEAT.SEAT_NO IN ( ... ) to the WHERE clause to restrict the search on specific stops or seats.
From the perspective of a bus company:
Usually a route is considered a series of sections: A to B, B to C, C to D, etc. The fill is calculated for each of those sections separately. So if the bus leaves A full and people get off at C, a user can still buy a ticket from C.
We model it so that each route has an ID and each section belongs to that route ID. If a user buys a ticket spanning more than one section, each of those sections is marked. For the next passenger, the system then checks whether all sections along the way are available.
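The section-marking approach can be sketched in a few lines of Java. This is only an in-memory illustration under assumptions: TripSeats is a hypothetical class for a single trip, stops are numbered 0, 1, 2, ... along the route (A = 0, B = 1, and so on), and section i is the leg from stop i to stop i + 1. A real system would keep this state in the database and guard reservations with a transaction.

```java
import java.util.BitSet;

/** Hypothetical per-trip seat map: one bit per (seat, section). */
class TripSeats {
    private final BitSet[] occupied; // occupied[seat] has bit i set if section i is taken
    private final int sections;

    TripSeats(int seatCount, int stopCount) {
        this.sections = stopCount - 1;
        this.occupied = new BitSet[seatCount];
        for (int i = 0; i < seatCount; i++) {
            occupied[i] = new BitSet(sections);
        }
    }

    /** True if the seat is free on every section between fromStop and toStop. */
    boolean isFree(int seat, int fromStop, int toStop) {
        // nextSetBit returns -1 when no section at or after fromStop is taken.
        int next = occupied[seat].nextSetBit(fromStop);
        return next < 0 || next >= toStop;
    }

    /** Marks all sections of the journey; returns false if any is already taken. */
    boolean reserve(int seat, int fromStop, int toStop) {
        if (fromStop < 0 || toStop > sections || fromStop >= toStop) {
            throw new IllegalArgumentException("invalid journey");
        }
        if (!isFree(seat, fromStop, toStop)) return false;
        occupied[seat].set(fromStop, toStop); // sections fromStop .. toStop - 1
        return true;
    }

    public static void main(String[] args) {
        TripSeats trip = new TripSeats(1, 4);          // stops A, B, C, D
        System.out.println(trip.reserve(0, 0, 1));     // A to B: true
        System.out.println(trip.isFree(0, 1, 3));      // B to D still free: true
        System.out.println(trip.reserve(0, 0, 3));     // A to D overlaps A to B: false
    }
}
```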

Resources