I am designing the database for a microservice-oriented, distributed application that manages universities. I have several universities, say A, B, and C. Each university has its own users who work with that university's business data. I am planning to give each university a separate database for its user data, plus one additional database for its application tables. So if I have 2 universities, I will have 2 user-detail databases and 2 more databases for application tables.
My confusion is that when I search for database designs, I only see the approach of keeping one common database for all users (here, one database for the users of all universities), so every user is mixed together in one database.
If I use a separate database for each university, is it still possible to follow distributed database architecture patterns and microservice standards? Or do I need to keep one database for all users?
How can I find out which approach is appropriate for a microservice / distributed database design?
There can be multiple solutions and no single one is best; the best solution is the one that fits your product's requirements.
I think it would be better to go with a separate database for each of your clients (universities), so that the data always stays isolated even if something goes wrong. Also, with a single shared database, the database could over time grow so large that it becomes a problem to configure and manage separate backups, cleanups, etc. for individual clients.
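As a rough illustration of the database-per-university idea, here is a minimal sketch in Python. It assumes in-memory SQLite databases as stand-ins for real per-tenant databases; the names TenantDatabases, add_user, list_users and the tenant ids are hypothetical and not taken from the question.

```python
import sqlite3


class TenantDatabases:
    """Keeps one isolated database connection per tenant (university)."""

    def __init__(self):
        self._connections = {}

    def connection_for(self, tenant_id):
        # Create the tenant's database lazily on first use.
        if tenant_id not in self._connections:
            # Assumption: an in-memory SQLite database stands in for a
            # real, separately hosted database per university.
            conn = sqlite3.connect(":memory:")
            conn.execute(
                "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
            )
            self._connections[tenant_id] = conn
        return self._connections[tenant_id]


def add_user(dbs, tenant_id, name):
    conn = dbs.connection_for(tenant_id)
    conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
    conn.commit()


def list_users(dbs, tenant_id):
    conn = dbs.connection_for(tenant_id)
    return [row[0] for row in conn.execute("SELECT name FROM users ORDER BY id")]


dbs = TenantDatabases()
add_user(dbs, "university_a", "alice")
add_user(dbs, "university_b", "bob")
```

Every query goes through connection_for, so a user of one university can never end up in another university's database by accident.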
With separate databases, however, comes the challenge of managing distributed transactions across databases, since you don't know which part is going to fail among many. To manage that, you may have to implement a message/event-driven mechanism across all your microservices to ensure consistency.
Regarding the message/event mechanism, here is a simple use case. Suppose there are two services, "A" (user-registration) and "B" (email-service):
"A" registers a user temporarily and publishes an event requesting a confirmation email.
The message goes to the message broker.
The message is received by "B".
The confirmation email is sent to the user.
The user confirms the email to "B".
"B" publishes a user-confirmation event to the broker.
"A" receives the confirmation event and the process is completed.
The above is the best-case scenario; problems can still happen in between, even with the broker itself.
You will have to go deeper into this if you think you need it.
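The registration flow between "A" and "B" could be sketched roughly as follows. This is only a toy, assuming a synchronous in-memory broker; a real broker would be asynchronous and durable, and the class names, topic names and email address are all made up for illustration.

```python
from collections import defaultdict


class Broker:
    """Toy synchronous in-memory broker (assumption: no persistence, no retries)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, payload):
        for handler in list(self._subscribers[topic]):
            handler(payload)


class RegistrationService:  # service "A"
    def __init__(self, broker):
        self.broker = broker
        self.users = {}  # email -> "pending" or "confirmed"
        broker.subscribe("email.confirmed", self.on_email_confirmed)

    def register(self, email):
        # Register the user temporarily, then ask for a confirmation email.
        self.users[email] = "pending"
        self.broker.publish("confirmation.requested", {"email": email})

    def on_email_confirmed(self, event):
        self.users[event["email"]] = "confirmed"


class EmailService:  # service "B"
    def __init__(self, broker):
        self.broker = broker
        self.outbox = []  # stands in for actually sending mail
        broker.subscribe("confirmation.requested", self.on_confirmation_requested)

    def on_confirmation_requested(self, event):
        self.outbox.append(event["email"])

    def user_clicked_link(self, email):
        # The user confirmed the email; tell everyone who is interested.
        self.broker.publish("email.confirmed", {"email": email})


broker = Broker()
registration = RegistrationService(broker)
email_service = EmailService(broker)

registration.register("student@example.edu")
state_before = registration.users["student@example.edu"]
email_service.user_clicked_link("student@example.edu")
state_after = registration.users["student@example.edu"]
```

Neither service calls the other directly; each one only knows the broker and the topics it cares about. The failure cases (lost messages, duplicate deliveries, a broker outage) are exactly what this sketch leaves out.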
Some links that may help.
http://how-to-implement-a-microservice-event-driven-architecture-with-spring-cloud-stre
A Guide to Transactions Across Microservices
I don't think this is a valid design. A database per client is a multi-tenant architecture practice, while a database per microservice is a microservice architecture practice. You are mixing the two up.
If you use a microservice architecture, you should design it around bounded contexts, where each context has its own database, to achieve the main rule of microservices: autonomy.
I understand that microservice architecture suggests that each service should have its own private database. But when such a service is scaled, then is it one db per service instance or one db shared by all service instances?
Your first statement may be misleading to some: "each service should have its own private database."
Your architecture should be careful about sharing a single set of tables across multiple services-- that sharing frequently leads to a shared schema dependency, which creates a tight coupling that makes it difficult to update the schema without updating many of the services that share that schema at the same time.
However, sharing a single database instance (or database cluster) doesn't mean your services are accessing the same tables or even the same schema within the database. And if they aren't accessing the same tables, they aren't coupled. (Relying on the same database instance isn't coupling any more than relying on the same network. Don't confuse coupling with shared infrastructure.)
Frequently, multiple instances of the same service share the same database. In my opinion, there is nothing inherently wrong with this, but there are some things to be aware of. If you go this route, you need to be very careful when making changes to the data schema. Because multiple versions of that service may be accessing the data at the same time during updates, any schema change needs to be compatible with at least any two adjacent versions. If you add a column or table, that's fine. The older version won't attempt to use it, so there will be no problem. (Note, too, that the older version won't populate it either.) Removing a column or table is another problem entirely, and to make that kind of breaking change you will likely need to do it in several smaller steps to ensure that the older version of the service isn't broken. It can be done; it's just tougher.
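To make the "adding a column is fine" case concrete, here is a small sketch, assuming an in-memory SQLite database and two hypothetical versions of the same service (insert_user_v1 and insert_user_v2 are invented names). The new column is nullable, so the old version keeps working and simply leaves it empty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")


def insert_user_v1(conn, name):
    # Version 1 of the service only knows about the name column.
    conn.execute("INSERT INTO users (name) VALUES (?)", (name,))


def insert_user_v2(conn, name, email):
    # Version 2 also writes the new email column.
    conn.execute("INSERT INTO users (name, email) VALUES (?, ?)", (name, email))


# Additive migration: the new column is nullable, so version 1 is unaffected.
conn.execute("ALTER TABLE users ADD COLUMN email TEXT")

insert_user_v1(conn, "alice")                   # old instance, still running
insert_user_v2(conn, "bob", "bob@example.edu")  # new instance

rows = conn.execute("SELECT name, email FROM users ORDER BY id").fetchall()
```

The row written by the old version has no email, which is the "won't populate it either" caveat: version 2 has to tolerate missing values until every old instance is gone.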
A general rule of microservice development is that each microservice
should manage its own data. In an ideal world, the data managed by
each service would be completely independent. There would be no need
to propagate data changes made in one service to other services.
In the real world, however, complete data independence is impossible.
There will always be overlaps between the data used in different
services. Consequently, as an architect, you need to think carefully about
sharing data and managing data consistency. You need to think about
the microservices as an interacting system rather than as individual
units.
This means:
You should isolate data within each system service with as little
data sharing as possible.
If data sharing is unavoidable, you should design microservices so
that most sharing is read-only, with a minimal number of
services responsible for data updates.
If services are replicated in your system, you must include a
mechanism that can keep the database copies used by replica
services consistent.
Good question indeed. I would answer it like: "at least a database per microservice (not instance)"
A concern is the scalability of the database itself, i.e. can service instances outscale the database?
If so, you could opt for e.g. an in-memory database or a sidecar for your microservice. The database would be ephemeral and you would need to populate it after the pod/container (re)starts. So the state does not really live in the database.
Apache Kafka is a tool that fits this spot, as it would allow you to populate the database after the service comes up and also provides the tooling to synchronize state for all currently running and future instances. Successfully implementing event sourcing with Kafka is not a trivial task, though, and you could even come to the conclusion that you don't need databases at all.
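The idea of repopulating an ephemeral store after a restart can be sketched without Kafka at all. The example below assumes a plain Python list as the event log and invented event types (UserCreated, UserRenamed, UserDeleted); with Kafka, the log would be a topic that the service reads from the beginning.

```python
# Assumption: this list stands in for a durable, ordered event log.
event_log = [
    {"type": "UserCreated", "id": 1, "name": "alice"},
    {"type": "UserRenamed", "id": 1, "name": "alice.smith"},
    {"type": "UserCreated", "id": 2, "name": "bob"},
    {"type": "UserDeleted", "id": 2},
]


def rebuild_state(events):
    """Replay every event in order to rebuild the in-memory state."""
    state = {}
    for event in events:
        if event["type"] in ("UserCreated", "UserRenamed"):
            state[event["id"]] = event["name"]
        elif event["type"] == "UserDeleted":
            state.pop(event["id"], None)
    return state


# A freshly started instance has no state until it replays the log.
state = rebuild_state(event_log)
```

Any number of instances can run the same replay and arrive at the same state, which is why the log, not the database, is the source of truth in this style.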
So the question remains, can service instances really outscale the database?
The answer would be "no" more often than not.
So having a database instance per microservice (physically or logically) already gives you a lot in terms of "loose coupling and cohesive behaviour", as you don't share databases.
Another concern is breaking changes to the database between versions of the microservice. If things go wrong, you could find yourself unable to roll back. An ephemeral database could sync itself up in a compatible way.
Some say they change database technologies throughout the lifetime of a microservice. I never had the necessity to do so, but an in-memory/sidecar approach would fit here very well.
I presume you share one database with all instances of one microservice, so that one update is available to every instance of the same microservice immediately. You may use one database instance per microservice instance to avoid the database being a single point of failure. But you would then have to keep every database in sync, which seems like unnecessary overhead for the database and the application. I assume the database is able to keep a group of db instances in sync (every insert, update, and delete is properly propagated).
It seems that in the traditional microservice architecture, each service gets its own database with a different understanding of the data (described here). Sometimes it is considered permissible for databases to duplicate data. For instance, the "Users" service might know essentially everything about a user, whereas the "Posts" service might just store primary keys and usernames (so that the author of a post can have their name displayed, for instance). This page talks about eventual consistency, sources of truth, and other related concepts when data is duplicated. I understand that microservice architectures sometimes include a shared database, but most places I look suggest that this is a rare strategy.
As for why each service typically gets its own database, all I've seen so far is "so that each service owns its own resources," but I'm not convinced that a) the service layer in any way "owns" the persisted resources accessed through the database to begin with, or that b) services even need to own the resources they require rather than accessing necessary subsets of the master resources through a shared database.
So what are some of the justifications that each service in a microservice architecture should get its own database?
There are a few reasons why it does make sense to use a separate database per micro-service. Some of them are:
Scaling
Splitting your domain into micro-services is fine. You can scale up a particular micro-service on its web server on demand, or scale out as needed. That is obviously one of the benefits of using micro-services. More importantly, you can have micro-service-1 running on, for example, 10 servers because its traffic demands it, while micro-service-2 only requires 1 web server, so you deploy it on 1 server. The good thing is that you control this and can manage your computing resources in order to save money, as cloud providers are not cheap.
Considering this, what about the database?
If you have one database for multiple services, you cannot do this: you cannot scale the databases individually, as they would all be on one server.
Data partitioning to reduce size
As you split your domain into micro-services, each with its own database, you automatically split the amount of data stored in each database. Ideally, this lets you use smaller database servers with less computing power and/or RAM.
In general, paying for multiple small servers is cheaper than paying for one large one.
So in this case you could make use of this fact and save some resources as well.
If a database that is already split by domain still holds a large amount of data, techniques like data sharding or data partitioning could be applied in addition, but this is another topic.
Which db technology fits the business requirement
This is a very important argument for having multiple databases. It allows you to pick the database technology that best fits your business requirement, in order to get the best performance or usability out of it. For example, one micro-service might have read-heavy operations with very complex filter options and a full-text search requirement. Using Elasticsearch in this case would be a good choice. Another micro-service might use SQL Server because it requires SQL-specific features like transactional behavior or similar. If for some reason you have one database for all services, you would be stuck with that particular database technology, which might not perform well for those requirements. It is a compromise for sure.
Developer discipline
If for some reason a couple of micro-services share their database, you need to deal with the human factor. The developers would need to be disciplined enough not to cross domains and access or modify another micro-service's database objects (tables, collections, etc.), which is hard to achieve and control. In large organisations with a lot of developers this could be a serious problem. With a hard/physical split this is not an issue.
Summary
There are some arguments for having a database per micro-service but also some against it. In general, the guidelines and suggestions when using micro-services are to keep the micro-service together with its data autonomous, so that it can work independently in the ideal case (this is not always the case). It is definitely a compromise, as is using micro-services in general. As always, the rule is the rule but there are exceptions to it. A micro-services architecture is flexible and very dependent on your domain's needs and requirements. If you and your team identify that it makes sense to merge multiple micro-service databases into 1 and that it solves a lot of your problems, then go for it.
Microservices
Microservices advocate design constraints where each service is developed, deployed and scaled independently. This philosophy is only possible if you have a database per service. How can I continue my business if I have a DB failure, and what steps can I take to mitigate this? The DB is an essential part of any enterprise application. I agree there are a number of challenges when each service has its own database.
Why Independent database?
Unlike other approaches, this approach not only keeps your code base clean and extendable, but also truly removes the single point of failure in your business. To achieve this, services can sometimes hold duplicated data, as long as each service is autonomous, and services can only be autonomous if there is a database per service.
From a business point of view, let's take an eCommerce application. You have microservices like Booking, Order, Payment, Recommendation, Search and so on. The database is shared. What happens if the DB is down? All your services are down! And there is no point in using a microservices architecture, other than having a clean code base.
If each service has its own database, I don't mind if my recommendation service is not working: I can still search and book the order, and I haven't lost the customer. That's the whole point.
It comes with costs and challenges, but in the long run it pays off.
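The eCommerce scenario can be illustrated with a small sketch. It assumes the services are plain Python functions and that a boolean flag simulates the recommendation database being down; ServiceUnavailable, recommendations, search and render_page are invented names.

```python
class ServiceUnavailable(Exception):
    """Raised when a service cannot reach its own database."""


def recommendations(user_id, recommendation_db_up):
    # Assumption: the flag simulates an outage of the recommendation database.
    if not recommendation_db_up:
        raise ServiceUnavailable("recommendation database is down")
    return ["item-1", "item-2"]


def search(query):
    # Search has its own database, so it is unaffected by the outage above.
    return [f"result for {query}"]


def render_page(user_id, query, recommendation_db_up):
    page = {"results": search(query)}
    try:
        page["recommendations"] = recommendations(user_id, recommendation_db_up)
    except ServiceUnavailable:
        # Degrade gracefully: show the page without recommendations.
        page["recommendations"] = []
    return page


degraded_page = render_page(1, "books", recommendation_db_up=False)
healthy_page = render_page(1, "books", recommendation_db_up=True)
```

With one shared database, search would fail at the same moment as recommendations, and there would be nothing left to degrade to.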
SQL / NoSQL
Each service has its own needs. To get the best performance, I can use SQL for the payment service (transactions) and I can (and should) use NoSQL for the recommendation service. A shared database wouldn't help me in this case. In modern cloud architectures using patterns like CQRS, Event Sourcing, and materialized views, we sometimes use 2 different databases for the same service to get the best performance out of it.
Again, database per service is not only about resources or how much data a service should own; we really have to see the bigger picture. Yes, there are practices around how much data and duplication is good or bad, but that's another debate.
Hope that helps !
What is the standard pattern of orchestrating microservices?
If a microservice only knows about its own domain, but there is a flow of data that requires that multiple services interact in some manner, what's the way to go about it?
Let's say we have something like this:
Invoicing
Shipment
And for the sake of the argument, let's say that once an order has been shipped, the invoice should be created.
Somewhere, someone presses a button in a GUI, "I'm done, let's do this!" In a classic monolith service architecture, I'd say that there is either an ESB handling this, or the Shipment service has knowledge of the invoice service and just calls that.
But what is the way people deal with this in this brave new world of micro-services?
I do get that this could be considered highly opinion-based, but there is a concrete side to it, as micro-services are not supposed to do the above. So there has to be a "what should it by definition do instead", which is not opinion-based.
Shoot.
Well, there are various best practices for databases when dealing with microservices; they may differ with respect to the domain of the entities being used, and also the scope of your application's use.
There are a few best practices for database design in microservices; to start with, here are a few of them:
1 - Private-tables-per-service – each service owns a set of tables that must only be accessed by that service
2 - Schema-per-service – each service has a database schema that’s private to that service
3 - Database-server-per-service – each service has its own database server.
You can mix and match these as per your data size and data count.
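As a rough sketch of the schema-per-service option, the example below uses one SQLite connection as a stand-in for a shared database server, with an attached in-memory database playing the role of each private schema. This is an assumption made to keep the example runnable; on a real server you would typically use separate schemas together with per-service credentials. The service classes and table names are hypothetical.

```python
import sqlite3

# One shared database server, emulated here by a single SQLite connection.
conn = sqlite3.connect(":memory:")
# Each attached database stands in for a schema owned by exactly one service.
conn.execute("ATTACH DATABASE ':memory:' AS orders")
conn.execute("ATTACH DATABASE ':memory:' AS billing")
conn.execute(
    "CREATE TABLE orders.purchase (id INTEGER PRIMARY KEY, item TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE billing.invoice "
    "(id INTEGER PRIMARY KEY, order_id INTEGER NOT NULL, amount REAL NOT NULL)"
)


class OrderService:
    """Only ever touches tables in the orders schema."""

    def __init__(self, conn):
        self._conn = conn

    def place_order(self, item):
        cur = self._conn.execute(
            "INSERT INTO orders.purchase (item) VALUES (?)", (item,)
        )
        return cur.lastrowid


class BillingService:
    """Only ever touches tables in the billing schema."""

    def __init__(self, conn):
        self._conn = conn

    def create_invoice(self, order_id, amount):
        self._conn.execute(
            "INSERT INTO billing.invoice (order_id, amount) VALUES (?, ?)",
            (order_id, amount),
        )

    def invoices(self):
        return self._conn.execute(
            "SELECT order_id, amount FROM billing.invoice ORDER BY id"
        ).fetchall()


order_service = OrderService(conn)
billing_service = BillingService(conn)
order_id = order_service.place_order("textbook")
billing_service.create_invoice(order_id, 49.5)
```

Billing stores only the order id, not a foreign key into the orders schema, so either schema can later move to its own server without a schema change in the other.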
I would like you to refer and go through this page for a perfect example.
Microservices Database Best practices
How can I share a database connection among Spring Cloud module microservices? If there are many microservices, how can I use the same db connection, or should I use a db connection per microservice?
In my opinion, the thing that you've asked for is impossible, simply because each microservice is a dedicated process that runs inside its own JVM (and probably on more than one server). When you create a connection to the database (assuming you use a connection pool), it's always at the level of a single JVM.
I understand that chances are you meant something different, but I had to put this up front because it directly answers your question.
Now, you can share the same database between microservices (the same schema, tables, etc) so that each JVM will have a set of connections opened (in accordance with connection pool definitions).
However, this is a really bad practice - you don't want to share the databases between microservices. The reason is the cost of change: if you (as a maintainer of microservice A) decide to, say, alter one of the tables, now all microservices will have to support this, and this is not a trivial thing to do.
So, a better approach is to have a service that has a "sole responsibility" for your data in some domain. Now, all the services could contact this service and ask for the required data through well-established APIs that should never be broken. In this approach, the cost of change is much "cheaper" since only this "data service" should be changed in a way that it doesn't break existing APIs.
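A "data service" with a stable API might look roughly like this. The sketch assumes an in-memory SQLite database and an invented contract: callers always receive a dict with "id" and "name", no matter how the service stores the data internally. CustomerDataService and its methods are hypothetical names.

```python
import sqlite3


class CustomerDataService:
    """Sole owner of the customer tables; other services only call its API."""

    def __init__(self):
        self._conn = sqlite3.connect(":memory:")
        # Internal detail: the name is stored in two columns. Callers never
        # see this, so it can change without breaking them.
        self._conn.execute(
            "CREATE TABLE customers "
            "(id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
        )

    def add_customer(self, first_name, last_name):
        cur = self._conn.execute(
            "INSERT INTO customers (first_name, last_name) VALUES (?, ?)",
            (first_name, last_name),
        )
        self._conn.commit()
        return cur.lastrowid

    def get_customer(self, customer_id):
        row = self._conn.execute(
            "SELECT id, first_name, last_name FROM customers WHERE id = ?",
            (customer_id,),
        ).fetchone()
        if row is None:
            return None
        # Public contract (assumed for this sketch): {"id": ..., "name": ...}.
        return {"id": row[0], "name": f"{row[1]} {row[2]}"}


service = CustomerDataService()
customer_id = service.add_customer("Ada", "Lovelace")
customer = service.get_customer(customer_id)
```

If the table layout changes later, only get_customer has to be adapted; every consumer keeps working as long as the returned shape stays the same.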
Now regarding the database connection thing: you usually will have more than one JVM that runs the same microservice (like the data microservice), so it's not that you share connections between them, but rather that you share the same way of working with the database (because, after all, it's the same code).
When dealing with a microservice architecture it is usually the case that you have a distributed system.
Most microservices that communicate with each other are not on the same machine, instance or container. Communication between them is most commonly done via http, though there are many other ways.
I would suggest designing microservices around a single concern of your application. For example, in your case, you could have a "persistence microservice" that would be responsible for dealing with data persistence operations on one or more types of data stores. It could possibly deal with relational DBs, NoSQL, file storage etc. Then, via REST endpoints, you can expose any persistence functionality to the microservices that deal with business logic.
A very easy way to build a REST service like this would be with the help of Spring Data REST project.
To answer your actual question, I'm not aware of any way to share actual connections between processes. Beyond that, having many microservices running on the same instance is not a good practice most of the time.
Microservices are very popular these days and everybody is trying to transition to them. My advice would be to make sure you don't "over-engineer" your project.
Hope I didn't misunderstand your question, but to be fair it is a little vague. If you could provide a longer more detailed description of your architecture and use case I can suggest more tools/frameworks you can use to achieve your cloudy goals.
First and most important - your microservice should be responsible for handling all data in a given business domain/bounded context. So the question is - 'Why do you need to share database connection between microservices and isn't this a sign you went too far with slicing your system?' Microservice is a tool and word 'micro' may be misleading a bit :)
For more reading I would suggest e.g. https://learn.microsoft.com/en-us/dotnet/standard/microservices-architecture/architect-microservice-container-applications/identify-microservice-domain-model-boundaries (don't worry, it's general enough to be applicable also to Spring).
At my company, we're about to move to a microservices architecture. I read a lot about it, and there are tons of obscure areas where things are specific to the project being built, but there is one area where everyone seems to agree: microservices need to have isolated persistence, or to put it another way, they need to have their own database.
Now I love the idea, that means every microservice has its own database schema, its own domain objects and is 100% independent of any other microservice data structure.
There are things I don't quite understand though.
The "Customer Service" is obviously central to the application, and we can see that basically any other microservice will need some data about the user at some point. Whether it'd be the user's credit amount, its ID, or its name.
But since other microservices can't directly read into the Customer Service database, they'll need to query this service over and over again. This is fine (I guess) for simple stuff like getting the name of current logged user, but when we need to display 60 users on a page and we can't do any SQL join, it feels like we're missing something. This is even worse when microservices depend upon tons of microservices.
So I found out that some people actually queried microservices X times a day to get data into their own microservices.
So if microservice "Search" needs data from "Product", "Customer", it'll actually query these microservices and will persist the data with its own data structure.
The question I have is should it be "Search" that queries "Product" and "Customer", or should "Product" and "Customer" send data to "Search" ?
The first option looks a bit easier to do, we only need to have this logic on one side, and that's where the data is needed. But we'll only get static freshness of data which is not very smart, but could definitely work.
The second option looks a bit more difficult but more scalable too, because we could have very fresh data when we need it, since the data changed where it's sent, it could also be more granular.
I think you correctly identified downsides to the microservices approach! And there are no elegant solutions to these specific problems. You will have to eat the additional work and architecture deterioration that this brings.
Concretely addressing your question now:
The question I have is should it be "Search" that queries "Product" and "Customer", or should "Product" and "Customer" send data to "Search" ?
You seem to be looking for a data synchronization service. You want to decide between push and pull. You are concerned about data freshness and logic duplication.
The key point here is that the source service cannot know about its consumers. This is to prevent an unwanted reverse dependency. This would break architectural isolation. Any data sync process that maintains this is fine. You can do what is most convenient.
For example, you could make the data source expose two APIs:
An API to get the whole data set. This would be called periodically by the destination (e.g. nightly). It can also be used to seed the destination at will and to fix data errors there.
A feed of changes in the source database keyed by the date and time the change occurred. The destination can now poll that change feed very frequently (e.g. every few seconds or minutes) and apply the small delta that occurred.
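The two APIs could be sketched as follows. Note one deliberate swap: the change feed here is keyed by a monotonically increasing sequence number instead of the date and time mentioned above, purely to keep the example deterministic. ProductSource, SearchIndex and their methods are invented names, and everything lives in memory.

```python
class ProductSource:
    """Source service: exposes a full snapshot and a feed of changes."""

    def __init__(self):
        self._products = {}
        self._changes = []  # append-only list of (sequence, product_id, data)
        self._sequence = 0

    def upsert(self, product_id, data):
        self._sequence += 1
        self._products[product_id] = data
        self._changes.append((self._sequence, product_id, data))

    def snapshot(self):
        # API 1: the whole data set, plus the position it corresponds to.
        return dict(self._products), self._sequence

    def changes_since(self, sequence):
        # API 2: only the changes that happened after the given position.
        return [change for change in self._changes if change[0] > sequence]


class SearchIndex:
    """Destination service: keeps its own copy and pulls from the source."""

    def __init__(self, source):
        self.source = source
        self.documents = {}
        self.last_seen = 0

    def full_sync(self):
        # Seed (or repair) the local copy from the full snapshot.
        self.documents, self.last_seen = self.source.snapshot()

    def poll(self):
        # Apply the small delta since the last position we processed.
        for sequence, product_id, data in self.source.changes_since(self.last_seen):
            self.documents[product_id] = data
            self.last_seen = sequence


source = ProductSource()
index = SearchIndex(source)

source.upsert("p1", {"name": "Laptop"})
index.full_sync()
source.upsert("p2", {"name": "Mouse"})
source.upsert("p1", {"name": "Laptop Pro"})
index.poll()
```

The source never references SearchIndex, so the dependency only points from consumer to source, which is the isolation property described above.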
You can even build a realtime change feed through a publish-subscribe middleware. Many message queue products can do that. The source would just send out changes to the middleware.
Building all of this is conceptually simple but takes a lot of work. It also creates lots of ongoing work and increases the potential for bugs. Debugging becomes much harder. I have worked on systems like that.
I'm going to add a subjective note: Microservices are not well understood by many teams. The downsides are often ignored. You identified a few of the downsides correctly and they are nasty! Given what I read on the web I believe many teams do not realize the mess they are getting themselves into. Managing disparate data stores can be a nightmare. This is not a one-time "mess" but an ongoing one.
As an alternative I'd recommend using a common data store and building services simply as classes or projects that live in the same process. This gives you the microservices code structuring with the convenience of normal development. It also leaves a few of the upsides of microservices on the table.
Your identification of the problem is correct.
But the solution will vary from use case to use case.
In your example of the search service, the product service and customer service should publish their events to Kafka or a similar messaging system, and the search service should listen to them and update itself.
In the case of, let's say, an order service that wants to check that a customer exists while creating an order for that customer, you might do it by calling the sync API of the customer service, but there are various other approaches for that as well; I have answered here: linking Microservices and allowing for one to be unavailable
From my perspective, sync communication between services should be avoided, and there are ways around it; the link above should help.
You can use the domain-driven design philosophy to correctly split up your services and their contracts.