How to deal with "Foreign Key" in microservice architecture? - spring-boot

Quick question on Foreign key in Microservices. I already tried looking for answer. But, they did not give me the exact answer I was looking for.
Usecase : Every blog post will have many comments. Traditional monolith will have comments table with foreign key to blog post. However in microservice, we will have two services.
Service 1 : Post Microservie with these table fields (PostID, Name, Content)
Service 2 : Comments Microservie with these table fields (CommentID, PostID, Cpmment)
The question is, Do we need "PostID" in service 2 (Comments Microservice) ? I guess the answer is yes, as we need to know which comment belongs to which post. But then, it will create tight coupling? I mean if I delete service 1(Blog post service), it will impact service 2(Comments service) ?

I'm going to use another example I'm more familiar with to explain how I believe most people would do this.
Consider an Order Management System (OMS) and an Inventory Management System (IMS).
When a customer places an order in the company web site, we ask the OMS to create an order entry in the backend (e.g. via an HTTP endpoint).
The OMS system then broadcasts an event e.g. OrderPlaced containing all the details of the customer order. We may have a pub/sub (e.g. Redis), or a queue (e.g. RabbitMQ), or an event stream (e.g. Kafka) where we place the event (although this can be done in many other ways).
The thing is that we have one or more subscribers interested in this event. One of those could be the IMS, which has the responsibility of assigning the best inventory available every time an order is placed.
We can expect that the IMS will keep a copy of the relevant order information it received when it processed the OrderPlaced event such that it does not ask every little detail of the order to the OMS all the time. So, if the IMS needed a join with the order, instead of calling an endpoint in the Order API, it would probably just do a join with its local copy of the orders table.
Say now that our customer called to cancel her order. A customer service representative then cancelled it in the OMS Web User Interface. At that point an event OrderCanceled is broadcast. Guess who is listening for that event? Correct, the IMS receives notification and acts accordingly reversing the inventory assignation and probably even deleting the order record because it is no longer necessary on this domain.
So, as you can see, the best way to do this is by using events and making copies of the relevant details on the other domain.
Since events need time to get broadcast and processed by interested parties, we say that the order data in the IMS is eventually consistent.
Followup Questions
Q: So, if I understood right in microservises we prefer to duplicate data and get better performance? That is the concept? I mean I know the concept is scaling and flexibility but when we must share data we will just duplicate it?
Not really. That´s definitively not what I meant although it may have sounded like that due to my poor choice of words in the original explanation. It appears to me that at the heart of your question lies a lack of sufficient understanding of the concept of a bounded context.
In my explanation I meant to indicate that the OMS has a domain concept known as the order, but so does the IMS. Therefore, they both have an entity within their domain that represents it. There is a good chance that the order entity in the OMS is much richer than the corresponding representation of the same concept in the IMS.
For example, if the system I was describing was not for retail, but for wholesale, then the same concept of a "sales order" in our system corresponds to the concept of a "purchase order" in that of our customers. So you see, the same data, mapped under a different name, simply because under a different bounded context the data may have a different perspective and meaning.
So, this is the realization that a given concept from our model may be represented in multiple bounded contexts, perhaps from a different perspective and names from our ubiquitous language.
Just to give another example, the OMS needs to know about the customer, but the representation of the idea of a customer in the OMS is probably different than the same representation of such a concept or entity in the CRM. In the OMS the customer's name, email, shipping and billing addresses are probably enough representation of this idea, but for the CRM the customer encompasses much more.
Another example: the IMS needs to know the shipping address of the customer to choose the best inventory (e.g. the one in a facility closest to its final destination), but probably does not care much about the billing address. On the other hand, the billing address is fundamental for the Payment Management System (PMS). So, both the IMS and PMS may have a concept of an "order", it is just that it is not exactly the same, neither it has the same meaning or perspective, even if we store the same data.
One final example: the accounting system cares about the inventory for accounting purposes, to be able to tell how much we own, but perhaps accounting does not care about the specific location of the inventory within the warehouse, that's a detail only the IMS cares about.
In conclusion, I would not say this is about "copying data", this is about appropriately representing a fundamental concept within your bounded context and the realization that some concepts from the model may overlap between systems and have different representations, sometimes even under different names and levels of details. That's why I suggested that you investigate the idea of context mapping some more.
In other words, from my perspective, it would be a mistake to assume that the concept of an "order" only exists in the OMS. I could probably say that the OMS is the master of record of orders and that if something happens to an order we should let other interested systems know about those events since they care about some of that data because those other systems could have mapping concepts related to orders and when reacting to the changes in the master of record, they probably want to change their data as well.
From this point of view, copying some data is a side effect of having a proper design for the bounded context and not a goal in itself.
I hope that answers your question.

Related

How to get grouped data from a microservice?

Let's say we have a system to store appointments. Each appointment has multiple resources (e.g. trainers, rooms, etc.). We have decided to move all appointment data into an Appointment Service and all resources into a Resources Service.
Now we need a UI that shows filters for the appointments, to filter by trainer. Usually, you only want to display checkboxes for trainers that actually have appointments and not all trainers.
That means we can't really use the Resource Service to get all trainers, instead, we would have to ask the Appointment Service to get a grouped view of all trainers that have at least one appointment. Then we would have to call the Resource Service to get more info about each trainer.
So how do you get grouped data from a microservice?
Edit: Each system has it's own database. We also use RabbitMQ to sync data between services.
This is an interesting question with many possible solutions. #Welbog comment makes a good point about it depending on the scale of the application. Denormalized databases are obviously a possibility.
Getting grouped data is one of the challenges of implementing microservices, and this challenge becomes greater the more granular our services get. What does your database setup look like? I'm assuming your two services are using different databases otherwise your question would have a simple solution.
Without knowing the ins and outs of your system, I would assume that denormalizing your db's would be the path of least resistance.
You could possible explore the idea that maybe these two services should in fact be a single service. Nanoservices are not what we are after, and sometimes it just makes more logical sense for two services to actually be together. Things that must change together, should be contained together. I'm not saying this is applicable in your case, I'm just saying it's worth considering.
I'm certain others will have other ideas, but based on what little I know about the entirety of your system, it's hard to say; however I think this is an interesting question that I will follow to see what other peoples proposed solutions are.

Event Sourcing: How to model relationship between different events?

I am new to Event Sourcing and I have encountered an example which I am not quite sure the pros and cons of different approaches.
Let's say this is a bank example, I have three entities Account, Deposit and Transfer.
My idea is, when a use deposits, command bank.deposit will create two events:
deposit.created and account.deposited. Can I or should I include the deposit.created event uuid in account.deposited as a reference?
Taking to the next step, if later the bank has a transfer feature, should I made a separate event account.transfer_received or I should created a more general event account.credited to be used by both deposit and transfer?
Thanks in advance.
A good article to review is Nobody Needs Reliable Messaging. One key observation is that you often need identifiers at the domain level.
For instance, when I look at my bank account, and see that the account history includes a specific deposit, there is an identifier for the deposit that is reported in the view.
If you imagine it from an event sourced perspective, before the deposit the balance was X, and the history did not include deposit 12345; after processing the deposit, the balance was X+Y and deposit 12345 was in the account history.
(This means, among other things, that if a second copy of deposit 12345 were to appear, the domain model would know to ignore it even if the identifier for the event were different).
Now, there are reasons that you might want to keep various message ids around. See Hohpe's work on Enterprise Integration Patterns; in particular Correlation Identifier.
should I made a separate event
Usually. "Make the implicit, explicit". The fact that two events happen to have similar representations is not a reason to blur them when the ubiquitous language distinguishes the two.
It's somewhat analogous, in motivation, to providing a task based ui or eschewing the user of generic repositories.

Validate Command in CQRS that related to other domain

I am learning to develop microservices using DDD, CQRS, and ES. It is HTTP RESTful service. The microservices is about online shop. There are several domains like products, orders, suppliers, customers, and so on. The domains built in separate services. How to do the validation if the command payload relates to other domains?
For example, here is the addOrderItemCommand payload in the order service (command-side).
{
"customerId": "CUST111",
"productId": "SKU222",
"orderId":"SO333"
}
How to validate the command above? How to know that the customer is really exists in database (query-side customer service) and still active? How to know that the product is exists in database and the status of the product is published? How to know whether the customer eligible to get the promo price from the related product?
Is it ok to call API directly (like point-to-point / ajax / request promise) to validate this payload in order command-side service? But I think, the performance will get worse if the API called directly just for validation. Because, we have developed an event processor outside the command-service that listen from the event and apply the event to the materalized view.
Thank you.
As there are more than one bounded contexts that need to be queried for the validation to pass you need to consider eventual consistency. That being said, there is always a chance that the process as a whole can be in an invalid state for a "small" amount of time. For example, the user could be deactivated after the command is accepted and before the order is shipped. An online shop is a complex system and exceptions could appear in any of its subsystems. However, being implemented as an event-driven system helps; every time the ordering process enters an invalid state you can take compensatory actions/commands. For example, if the user is deactivated in the meantime you can cancel all its standing orders, release the reserved products, announce the potential customers that have those products in the wishlist that they are not available and so on.
There are many kinds of validation in DDD but I follow the general rule that the validation should be done as early as possible but without compromising data consistency. So, in order to be early you could query the readmodel to reject the commands that couldn't possible be valid and in order for the system to be consistent you need to make another check just before the order is shipped.
Now let's talk about your specific questions:
How to know that the customer is really exists in database (query-side customer service) and still active?
You can query the readmodel to verify that the user exists and it is still active. You should do this as a command that comes from an invalid user is a strong indication of some kind of attack and you don't want those kind of commands passing through your system. However, even if a command passes this check, it does not necessarily mean that the order will be shipped as other exceptions could be raised in between.
How to know that the product is exists in database and the status of the product is published?
Again, you can query the readmodel in order to notify the user that the product is not available at the moment. Or, depending on your business, you could allow the command to pass if you know that those products will be available in less than 24 hours based on some previous statistics (for example you know that TV sets arrive daily in your stock). Or you could let the customer choose whether it waits or not. In this case, if the products are not in stock at the final phase of the ordering (the shipping) you notify the customer that the products are not in stock anymore.
How to know whether the customer eligible to get the promo price from the related product?
You will probably have to query another bounded context like Promotions BC to check this. This depends on how promotions are validated/used.
Is it ok to call API directly (like point-to-point / ajax / request promise) to validate this payload in order command-side service? But I think, the performance will get worse if the API called directly just for validation.
This depends on how resilient you want your system to be and how fast you want to reject invalid commands.
Synchronous call are simpler to implement but they lead to a less resilient system (you should be aware of cascade failures and use technics like circuit breaker to stop them).
Asynchronous (i.e. using events) calls are harder to implement but make you system more resilient. In order to have async calls, the ordering system can subscribe to other systems for events and maintain a private state that can be queried for validation purposes as the commands arrive. In this way, the ordering system continues to work even of the link to inventory or customer management systems are down.
In any case, it really depends on your business and none of us can tell you exaclty what to do.
As always everything depends on the specifics of the domain but as a general principle cross domain validation should be done via the read model.
In this case, I would maintain a read model within each microservice for use in validation. Of course, that brings with it the question of eventual consistency.
How you handle that should come from your understanding of the domain. Factors such as the length of the eventual consistency compared to the frequency of updates should be considered. The cost of getting it wrong for the business compared to the cost of development to minimise the problem. In many cases, just recording the fact there has been a problem is more than adequate for the business.
I have a blog post dedicated to validation which you can find here: How To Validate Commands in a CQRS Application

Microservices: model sharing between bounded contexts

I am currently building a microservices-based application developed with the mean stack and am running into several situations where I need to share models between bounded contexts.
As an example, I have a User service that handles the registration process as well as login(generate jwt), logout, etc. I also have an File service which handles the uploading of profile pics and other images the user happens to upload. Additionally, I have an Friends service that keeps track of the associations between members.
Currently, I am adding the guid of the user from the user table used by the User service as well as the first, middle and last name fields to the File table and the Friend table. This way I can query for these fields whenever I need them in the other services(Friend and File) without needing to make any rest calls to get the information every time it is queried.
Here is the caveat:
The downside seems to be that I have to, I chose seneca with rabbitmq, notify the File and Friend tables whenever a user updates their information from the User table.
1) Should I be worried about the services getting too chatty?
2) Could this lead to any performance issues, if alot of updates take place over an hour, let's say?
3) in trying to isolate boundaries, I just am not seeing another way of pulling this off. What is the recommended approach to solving this issue and am I on the right track?
It's a trade off. I would personally not store the user details alongside the user identifier in the dependent services. But neither would I query the users service to get this information. What you probably need is some kind of read-model for the system as a whole, which can store this data in a way which is optimized for your particular needs (reporting, displaying together on a webpage etc).
The read-model is a pattern which is popular in the event-driven architecture space. There is a really good article that talks about these kinds of questions (in two parts):
https://www.infoq.com/articles/microservices-aggregates-events-cqrs-part-1-richardson
https://www.infoq.com/articles/microservices-aggregates-events-cqrs-part-2-richardson
Many common questions about microservices seem to be largely around the decomposition of a domain model, and how to overcome situations where requirements such as querying resist that decomposition. This article spells the options out clearly. Definitely worth the time to read.
In your specific case, it would mean that the File and Friends services would only need to store the primary key for the user. However, all services should publish state changes which can then be aggregated into a read-model.
If you are worry about a high volume of messages and high TPS for example 100,000 TPS for producing and consuming events I suggest that Instead of using RabbitMQ use apache Kafka or NATS (Go version because NATS has Rubby version also) in order to support a high volume of messages per second.
Also Regarding Database design you should design each micro-service base business capabilities and bounded-context according to domain driven design (DDD). so because unlike SOA it is suggested that each micro-service should has its own database then you should not be worried about normalization because you may have to repeat many structures, fields, tables and features for each microservice in order to keep them Decoupled from each other and letting them work independently to raise Availability and having scalability.
Also you can use Event sourcing + CQRS technique or Transaction Log Tailing to circumvent 2PC (2 Phase Commitment) - which is not recommended when implementing microservices - in order to exchange events between your microservices and manipulating states to have Eventual Consistency according to CAP theorem.

Design of notification events

I am designing some events that will be raised when actions are performed or data changes in a system. These events will likely be consumed by many different services and will be serialized as XML, although more broadly my question also applies to the design of more modern funky things like Webhooks.
I'm specifically thinking about how to describe changes with an event and am having difficulty choosing between different implementations. Let me illustrate my quandry.
Imagine a customer is created, and a simple event is raised.
<CustomerCreated>
<CustomerId>1234</CustomerId>
<FullName>Bob</FullName>
<AccountLevel>Silver</AccountLevel>
</CustomerCreated>
Now let's say Bob spends lots of money and becomes a gold customer, or indeed any other property changes (e.g.: he now prefers to be known as Robert). I could raise an event like this.
<CustomerModified>
<CustomerId>1234</CustomerId>
<FullName>Bob</FullName>
<AccountLevel>Gold</AccountLevel>
</CustomerModified>
This is nice because the schema of the Created and Modified events are the same and any subscriber receives the complete current state of the entity. However it is difficult for any receiver to determine which properties have changed without tracking state themselves.
I then thought about an event like this.
<CustomerModified>
<CustomerId>1234</CustomerId>
<AccountLevel>Gold</AccountLevel>
</CustomerModified>
This is more compact and only contains the properties that have changed, but comes with the downside that the receiver must apply the changes and reassemble the current state of the entity if they need it. Also, the schemas of the Created and Modified events must be different now; CustomerId is required but all other properties are optional.
Then I came up with this.
<CustomerModified>
<CustomerId>1234</CustomerId>
<Before>
<FullName>Bob</FullName>
<AccountLevel>Silver</AccountLevel>
</Before>
<After>
<FullName>Bob</FullName>
<AccountLevel>Gold</AccountLevel>
</After>
</CustomerModified>
This covers all bases as it contains the full current state, plus a receiver can figure out what has changed. The Before and After elements have the exact same schema type as the Created event. However, it is incredibly verbose.
I've struggled to find any good examples of events; are there any other patterns I should consider?
You tagged the question as "Event Sourcing", but your question seems to be more about Event-Driven SOA.
I agree with #Matt's answer--"CustomerModified" is not granular enough to capture intent if there are multiple business reasons why a Customer would change.
However, I would back up even further and ask you to consider why you are storing Customer information in a local service, when it seems that you (presumably) already have a source of truth for customer. The starting point for consuming Customer information should be getting it from the source when it's needed. Storing a copy of information that can be queried reliably from the source may very well be an unnecessary optimization (and complication).
Even if you do need to store Customer data locally (and there are certainly valid reasons for need to do so), consider passing only the data necessary to construct a query of the source of truth (the service emitting the event):
<SomeInterestingCustomerStateChange>
<CustomerId>1234</CustomerId>
</SomeInterestingCustomerStateChange>
So these event types can be as granular as necessary, e.g. "CustomerAddressChanged" or simply "CustomerChanged", and it is up to the consumer to query for the information it needs based on the event type.
There is not a "one-size-fits-all" solution--sometimes it does make more sense to pass the relevant data with the event. Again, I agree with #Matt's answer if this is the direction you need to move in.
Edit Based on Comment
I would agree that using an ESB to query is generally not a good idea. Some people use an ESB this way, but IMHO it's a bad practice.
Your original question and your comments to this answer and to Matt's talk about only including fields that have changed. This would definitely be problematic in many languages, where you would have to somehow distinguish between a property being empty/null and a property not being included in the event. If the event is getting serialized/de-serialized from/to a static type, it will be painful (if not impossible) to know the difference between "First Name is being set to NULL" and "First Name is missing because it didn't change".
Based on your comment that this is about synchronization of systems, my recommendation would be to send the full set of data on each change (assuming signal+query is not an option). That leaves the interpretation of the data up to each consuming system, and limits the responsibility of the publisher to emitting a more generic event, i.e. "Customer 1234 has been modified to X state". This event seems more broadly useful than the other options, and if other systems receive this event, they can interpret it as they see fit. They can dump/rewrite their own data for Customer 1234, or they can compare it to what they have and update only what changed. Sending only what changed seems more specific to a single consumer or a specific type of consumer.
All that said, I don't think any of your proposed solutions are "right" or "wrong". You know best what will work for your unique situation.
Events should be used to describe intent as well as details, for example, you could have a CustomerRegistered event with all the details for the customer that was registered. Then later in the stream a CustomerMadeGoldAccount event that only really needs to capture the customer Id of the customer who's account was changed to gold.
It's up to the consumers of the events to build up the current state of the system that they are interested in.
This allows only the most pertinent information to be stored in each event, imagine having hundreds of properties for a customer, if every command that changed a single property had to raise an event with all the properties before and after, this gets unwieldy pretty quickly. It's also difficult to determine why the change occurred if you just publish a generic CustomerModified event, which is often a question that is asked about the current state of an entity.
Only capturing data relevant to the event means that the command that issues the event only needs to have enough data about the entity to validate the command can be executed, it doesn't need to even read the whole customer entity.
Subscribers of the events also only need to build up a state for things that they are interested in, e.g. perhaps an 'account level' widget is listening to these events, all it needs to keep around is the customer ids and account levels so that it can display what account level the customer is at.
Instead of trying to convey everything through payload xmls' fields, you can distinguish between different operations based on -
1. Different endpoint URLs depending on the operation(this is preferred)
2. Have an opcode(operation code) as an element in the xml file which tells which operation is to used to handle the incoming request.(more nearer to your examples)
There are a few enterprise patterns applicable to your business case - messaging and its variants, and if your system is extensible then Enterprise Service Bus should be used. An ESB allows reliable handling of events and processing.

Resources