Mechanisms for response aggregation in event sourcing based microservices - microservices

When it comes to implementing event sourcing based microservices, one of the main concerns that we've come across is aggregating data for responses. For an example we may have two entities like school and student. One microservice may be responsible for handling school related business logic while another may handle students.
Now if someone makes a query through a REST endpoint and ask for a particular student and they might expect both school and student details, then the only known ways for me are the following.
Use something like service chaining. An example would be an Api-Gateway aggregating a response after making couple of requests to couple of microservices.
Having everything replicated throughout all services. Essentially, data would be duplicated.
Having services calling each other for those extra bit of information. This solution works but hard to scale and goes against basic idea of using event sourcing.
My question is that what other ways are there to do this ?

A better approach can be to create a separate reporting/search service, that aggregates the data from both services. For example implemented using ElasticSearch or SOLR.This now allows the users to do search and queries across multiple services and aggregates.
Sure, it will be eventually consistent, but I doubt that is s a problem. This gives a better separation of concerns and you get a nice search experience for your users at the same time.

Related

Proper separation of concerns between microservices

Let's assume I would like to create a blogging platform that would allow managing user accounts, therefore I came up with 2 microservices:
Blogging - managing posts, tags, etc.
Users - managing users, their roles, etc.
It's clear to me that Post contains the ID of the author (User) and User contains IDs of the Posts he wrote.
The problem is that when a client requests a Post, I would also like to return the name of the author and send it together to the client in a DTO (view model). I see 2 possible solutions for that:
Introduce the concept of the User (only ID and name) in a domain of the Blogging service. Every time a client requests a Post, all relevant data is fetched only from this microservice. Every time a user's name is modified in a Users microservice, this service is notified and updated author's names.
Keep the concerns completely separated. Every time a client asks for a Post, Blogging microservice is called to fetch Post and then Users microservice is called to fetch the author's name based on the ID.
So far I was leaning towards solution #1 as it would require only 1 call to the 1 microservice when Post is requested, but then let's say if the number of functions (microservices) starts growing and I keep adding small concepts from each of them only to limit the number of calls, I'm afraid I would end up in a spaghetti Blogging microservice... But then I also don't think the number of microservices will grow significantly ;)
Which way do you find better and why? Does any of the approaches break the microservices architecture principles and my concerns are justified?
It really depends on the way your program is going to develop.
Solution #1 (editing the Blogging model)
Here you edit your model for the sake of the performance. Most of the time I don't think altering the model itself for the sake of performance is the right way. If you don't think the concept of an User inside the Blogging context is right, you shouldn't put it there. Another problem is that you would have to deal with eventual consistency between you bounded contexts (BC). If you think the concept of User inside Blogging is not a bad idea, it might be probably the way to go.
Solution #2 (querying the data separately)
Although this might not be performance friendly, I think this would be the easiest way at the beginning. You won't have to deal with eventual consistency (+dealing with synchronizing the data between BC) and you don't have to alter your model. But if you really care about performance and keeping your model clean, you might be interested in read model.
Another solution - read model
The concept of read model is about having at least two separate models: one for reading the data and for writing/editing the data. If we want to implement this concept for your problem, we would create a read model for the Blogging BC where Posts and Users could be merged together (something like in solution #1) and then queried. We can synchronize this data with the help of events, so when the Post or User changes, it would raise an event and save the changed data into your read model. This way you can merge/alter any data you want for better performance. It might be very similar to solution #1, but the main difference is that you don't edit your main model. You create new one just for the reading. It is probably the hardest solution, but if your project is large and performance heavy, this might be a way.
I wouldn't say any solution is the best or the worst for solving your problem. It always depends on the context.

Child service data access to other services with Apollo Federation configuration

We've been using Apollo Federation for about 1.5 years as our main api. Behind the federation gateway are 6 child graphql services which are all combined at the gateway. This configuration really works excellent when you have a result set of data which spans the different services. E.g. A list of tickets which references the user who purchased and event it is associated with it, etc.
One place we have experienced this breaking down is when a pre-set of data is needed which is already defined in another child service (or across other child services) (resolver/path). There is no way (that has been discovered by us) to query the federation from a child service to get a federated set of data for use by a resolver to work with that data.
For example, say we have a graphql query defined which queries all tickets for an event, and through federation returns the purchaser's data, the event's data and the products data. If I need this data set from a resolver, I would need to make all those queries again myself duplicating dataSource logic and having to match up the data in code.
One crazy thought which came up is to setup apollo-datasource-rest dataSource to make queries against our gateway end point as a dataSource for our resolvers. This way we can request the data we need and let Apollo Federation stitch all the data together as it is designed to do. So instead of the resolver querying the database for all the different pieces of data and then matching them up, we would request the data from our graphql gateway where this query is already defined.
What we are trying to avoid by doing this is having a repeated set of queries in child services to get the details which are already available in (or across) other services.
The question
Is this a really bad idea?
Is it a plausible idea?
Has anyone tried something like this before?
Yes we would have to ensure that there aren't circular dependencies on the resolvers. In our case I see the "dataSource accessing the gateway" utilized in gathering initial data in mutations.
Example of a federated query. In this query, event, allocatedTo, purchasedBy, and product are all types in other services. event is an Event type, allocatedTo and purchasedBy are a Profile type, and product is a Product type. This query provides me with all the data I would use to say, send an email notification to the people in the result set. Though to get this data from a resolver in a mutation to queue up those emails means I need to make many queries and align all the data through code myself instead of using the Gateway/federation which does this already with the already established query. The thought around using apollo-datasource-rest to query our own gateway is get at this data in this form. Not through separate queries and code to align id's etc.
query getRegisteredUsers($eventId: ID!) {
communications {
event(eventId: $eventId) {
registered {
event {
name
}
isAllocated,
hasCheckedIn,
lastUpdatedAt,
allocatedTo {
firstName
lastName
email
}
purchasedBy {
id
firstName
lastName
}
product {
__typename
...on Ticket {
id
name
}
}
}
}
}
}
FYI, I didn't quite understand the question until I looked at your edits, which had some examples.
Is this a really bad idea?
In my experience, yes. Not as an idea, as you're in good company with other very smart people who have done this.
Is it a plausible idea?
Absolutely it's plausible, but I don't recommend it.
Has anyone tried something like this before?
Yes, but I hope you don't.
Your Question
Having resolvers make requests back to the Gateway:
I do not recommend this. I've seen this happen, and I've personally worked to help companies out of the mess this takes you into. Circular dependencies are going to happen. Latency is just going to skyrocket as you have more and more hops, TLS handshakes, etc. Do orchestration instead. It feels weird to introduce non-GraphQL, but IMO in the end it's way simpler, faster, and more maintainable than where "just talk to the gateway" takes you.
What then?
When you're dealing with some mutations which require data from across multiple data sources to be able to process a single thing (like sending a transaction email to a person), you have some choices. Something that helped me figure this out was the question "how would I have done this before GraphQL?"
Orchestration: you have a single "orchestration service", which takes the mutation and makes calls (preferably non-GraphQL, so REST, gRPC, Lambda?) to the owner services to collect the data. The orchestration layer does NOT own data, but it can speak with the other services. It's like Federation, but for sending the data into the request, instead of into the response.
Choreography: you trigger roughly the same thing, but via an event stream. (doesn't work as well with the request / response model of GraphQL)
CQRS (projections): Copies of database data, used for things like reporting. CQRS is basically "the way you read data doesn't have to be the same as the way you write it", and it allows for things like event-sourced data. If all of your data sources actually share the same database, you don't even need "projections" as much as you would just want a read replica. If you're not at enough scale to do replicas, just skip it and promise never to write data that your current domain doesn't own.
What I Do
Where I work, I have gotten us to:
Queries
queries always start with "one database call".
if the "one database call" goes to one domain of data (most often true), that query goes into one service, and Federation fills in the leaves of your tree. If you really follow CQRS, this could go the same way as #3, but we don't.
if your "one database call" needs data from across domains (e.g. get all orders with Product X in it, but sorted by the customer's first name), you need a database projection. Preferably this can be handled by a "reporting service": it doesn't OWN any data, but it READS all data.
Mutations
if your top-level mutation modifies acts only within one domain, the mutation goes in a service, it can use database transactions, and Federation fills in the leaves
if your mutation is required to write across multiple domains and requires immediate consistency (placing an order with inventory, payments, etc), we chose orchestration to write across multiple services (and roll-back when necessary, since we don't have database transactions to do it for us).
if your mutation requires data from many places to send further into the request (like sending an email), we chose orchestration to pull from the multiple services and to push that data down. This feels very much like Federation, but in reverse.

How to get grouped data from a microservice?

Let's say we have a system to store appointments. Each appointment has multiple resources (e.g. trainers, rooms, etc.). We have decided to move all appointment data into an Appointment Service and all resources into a Resources Service.
Now we need a UI that shows filters for the appointments, to filter by trainer. Usually, you only want to display checkboxes for trainers that actually have appointments and not all trainers.
That means we can't really use the Resource Service to get all trainers, instead, we would have to ask the Appointment Service to get a grouped view of all trainers that have at least one appointment. Then we would have to call the Resource Service to get more info about each trainer.
So how do you get grouped data from a microservice?
Edit: Each system has it's own database. We also use RabbitMQ to sync data between services.
This is an interesting question with many possible solutions. #Welbog comment makes a good point about it depending on the scale of the application. Denormalized databases are obviously a possibility.
Getting grouped data is one of the challenges of implementing microservices, and this challenge becomes greater the more granular our services get. What does your database setup look like? I'm assuming your two services are using different databases otherwise your question would have a simple solution.
Without knowing the ins and outs of your system, I would assume that denormalizing your db's would be the path of least resistance.
You could possible explore the idea that maybe these two services should in fact be a single service. Nanoservices are not what we are after, and sometimes it just makes more logical sense for two services to actually be together. Things that must change together, should be contained together. I'm not saying this is applicable in your case, I'm just saying it's worth considering.
I'm certain others will have other ideas, but based on what little I know about the entirety of your system, it's hard to say; however I think this is an interesting question that I will follow to see what other peoples proposed solutions are.

Microservices: model sharing between bounded contexts

I am currently building a microservices-based application developed with the mean stack and am running into several situations where I need to share models between bounded contexts.
As an example, I have a User service that handles the registration process as well as login(generate jwt), logout, etc. I also have an File service which handles the uploading of profile pics and other images the user happens to upload. Additionally, I have an Friends service that keeps track of the associations between members.
Currently, I am adding the guid of the user from the user table used by the User service as well as the first, middle and last name fields to the File table and the Friend table. This way I can query for these fields whenever I need them in the other services(Friend and File) without needing to make any rest calls to get the information every time it is queried.
Here is the caveat:
The downside seems to be that I have to, I chose seneca with rabbitmq, notify the File and Friend tables whenever a user updates their information from the User table.
1) Should I be worried about the services getting too chatty?
2) Could this lead to any performance issues, if alot of updates take place over an hour, let's say?
3) in trying to isolate boundaries, I just am not seeing another way of pulling this off. What is the recommended approach to solving this issue and am I on the right track?
It's a trade off. I would personally not store the user details alongside the user identifier in the dependent services. But neither would I query the users service to get this information. What you probably need is some kind of read-model for the system as a whole, which can store this data in a way which is optimized for your particular needs (reporting, displaying together on a webpage etc).
The read-model is a pattern which is popular in the event-driven architecture space. There is a really good article that talks about these kinds of questions (in two parts):
https://www.infoq.com/articles/microservices-aggregates-events-cqrs-part-1-richardson
https://www.infoq.com/articles/microservices-aggregates-events-cqrs-part-2-richardson
Many common questions about microservices seem to be largely around the decomposition of a domain model, and how to overcome situations where requirements such as querying resist that decomposition. This article spells the options out clearly. Definitely worth the time to read.
In your specific case, it would mean that the File and Friends services would only need to store the primary key for the user. However, all services should publish state changes which can then be aggregated into a read-model.
If you are worry about a high volume of messages and high TPS for example 100,000 TPS for producing and consuming events I suggest that Instead of using RabbitMQ use apache Kafka or NATS (Go version because NATS has Rubby version also) in order to support a high volume of messages per second.
Also Regarding Database design you should design each micro-service base business capabilities and bounded-context according to domain driven design (DDD). so because unlike SOA it is suggested that each micro-service should has its own database then you should not be worried about normalization because you may have to repeat many structures, fields, tables and features for each microservice in order to keep them Decoupled from each other and letting them work independently to raise Availability and having scalability.
Also you can use Event sourcing + CQRS technique or Transaction Log Tailing to circumvent 2PC (2 Phase Commitment) - which is not recommended when implementing microservices - in order to exchange events between your microservices and manipulating states to have Eventual Consistency according to CAP theorem.

Event sourcing, CQRS and database in Microservice

I am quite new in context of Micro-service architecture and reading this post : http://microservices.io/patterns/data/event-sourcing.html to get familiar with Event sourcing and data storage in Microservice architecture.
I have read many documents about 3 important aspect of system :
Using event sourcing instead of a simply shared DB and ORM and
row update
Events are JAVA objects.
In case of saving data permanently
, we need to use DB (either relational or noSQL)
Here are my questions :
How database comes along with event sourcing? I have read CQRS
pattern, but I can not understand how CQRS pattern is related to
event store and event objects ?
Can any body provide me a
complete picture and set of operations happens with all players to
gather: CQRS pattern , Event sourcing (including event storage
module) and finally different microservices?
In a system
composed of many microservices, should we have one event storage or
each microservice has its own ? or both possible ?
same
question about CQRS. This pattern is implemented in all
microservices or only in one ?
Finally, in case of using
microservice architecture, it is mandatory to have only one DB or
each Microserivce should have its own ?
As you can see, I have understood all small pieces of game , but I can not relate them together to compose a whole image. Specially relevance between CQRS and event sourcing and storing data in DB.
I read many articles for example :
https://ookami86.github.io/event-sourcing-in-practice/
https://msdn.microsoft.com/en-us/library/jj591577.aspx
But in all of them small players are discussed. Even a hand drawing piece of image will be appreciated.
How database comes along with event sourcing? I have read CQRS pattern, but I can not understand how CQRS pattern is related to event store and event objects ?
"Query" part of CQRS instructs you how to create a projection of events, which is applicable in some "bounded context", where the database could be used as a means to persist that projection. "Command" part allows you to isolate data transformation logic and decouple it from the "query" and "persistence" aspects of your app. To simply put it - you just project your event stream into the database in many ways (projection could be relational as well), depending on the task. In this model "query" and "command" have their own way of projecting and storing events data, optimised for the needs of that specific part of the application. Same data will be stored in events and in projections, this will allow achieving simplicity and loose coupling among subdomains (bounded contexts, microservices).
Can any body provide me a complete picture and set of operations happens with all players to gather: CQRS pattern , Event sourcing (including event storage module) and finally different microservices?
Have you seen Greg Young's attempt to provide simplest possible implementation? If you still confused, consider creating more specific question about his example.
In a system composed of many microservices, should we have one event storage or each microservice has its own ? or both possible ?
It is usually one common event storage, but there definitely could be some exceptions, edge cases where you really will need multiple storages for different microservices here and there. It all depends on the business case. If you not sure - most likely you just need a single storage for now.
same question about CQRS. This pattern is implemented in all microservices or only in one ?
It could be implemented in most performance-demanding microservices. It all depends on how complex your implementation becomes when you are introducing CQRS into it. If it gets simpler - why not implement it everywhere? But if people in your team become more and more confused by the need to perform more explicit synchronisation between commands and queries parts - maybe cqrs is too much for you. It all depends on your team, on your domain ... there is no single simple answer, unfortunately.
Finally, in case of using microservice architecture, it is mandatory to have only one DB or each Microservice should have its own ?
If same microservices sharing same tables - this is usually considered as an antipattern, as it increases coupling, the system becomes more fragile. You can still share the same database, but there should be no shared tables. Also, tables from one microservice better not have FK's to tables in another microservice. Same reason - to reduce coupling.
PS: consider not to ask coarse-grained questions, as it will be harder to get a response from people. Several smaller, more specific questions will have better chance to be answered.

Resources