Why does each microservice get its own database?

It seems that in the traditional microservice architecture, each service gets its own database with a different understanding of the data (described here). Sometimes it is considered permissible for databases to duplicate data. For instance, the "Users" service might know essentially everything about a user, whereas the "Posts" service might just store primary keys and usernames (so that the author of a post can have their name displayed, for instance). This page talks about eventual consistency, sources of truth, and other related concepts when data is duplicated. I understand that microservice architectures sometimes include a shared database, but most places I look suggest that this is a rare strategy.
As for why each service typically gets its own database, all I've seen so far is "so that each service owns its own resources," but I'm not convinced that a) the service layer in any way "owns" the persisted resources accessed through the database to begin with, or that b) services even need to own the resources they require rather than accessing necessary subsets of the master resources through a shared database.
So what are some of the justifications that each service in a microservice architecture should get its own database?

There are a few reasons why it makes sense to use a separate database per microservice. Some of them are:
Scaling
Splitting your domain into microservices is fine. You can scale a particular microservice on its own servers on demand, or scale it out as needed; that is obviously one of the benefits of using microservices. More importantly, you can have microservice-1 running on, for example, 10 servers because its traffic demands it, while microservice-2 only requires 1 web server, so you deploy it on 1 server. The good thing is that you control this and can manage your computing resources in order to save money, as cloud providers are not cheap.
Considering this, what about the database?
If you have one database for multiple services, you cannot do this. You cannot scale the databases individually, as they would all live on one server.
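To make this concrete, here is a minimal sketch of the idea: each service points its own connection pool at its own database host, so the database behind a busy service can be sized and scaled independently of the others. This assumes HikariCP and PostgreSQL; the host names, roles and environment variables are hypothetical, and in practice each configuration would of course live in its own service's code base.

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

import javax.sql.DataSource;

public class ServiceDataSources {

    // The Posts service talks only to its own database host,
    // which can stay a small, cheap instance...
    static DataSource postsDb() {
        HikariConfig cfg = new HikariConfig();
        cfg.setJdbcUrl("jdbc:postgresql://posts-db.internal:5432/posts");
        cfg.setUsername("posts_svc");
        cfg.setPassword(System.getenv("POSTS_DB_PASSWORD"));
        cfg.setMaximumPoolSize(10);
        return new HikariDataSource(cfg);
    }

    // ...while the Users service points at a larger, separately scaled database.
    static DataSource usersDb() {
        HikariConfig cfg = new HikariConfig();
        cfg.setJdbcUrl("jdbc:postgresql://users-db.internal:5432/users");
        cfg.setUsername("users_svc");
        cfg.setPassword(System.getenv("USERS_DB_PASSWORD"));
        cfg.setMaximumPoolSize(50);
        return new HikariDataSource(cfg);
    }
}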
Data partitioning to reduce size
As you split your domain into microservices, each with its own database, you automatically split the amount of data stored in each database. Ideally, this means you can use smaller database servers with less computing power and/or RAM.
In general, paying for multiple small servers is cheaper than paying for one large one.
So in this case you can take advantage of this fact and save some resources as well.
If a database that has already been split by domain still holds a large amount of data, techniques like data sharding or data partitioning can additionally be applied, but that is another topic.
Which db technology fits the business requirement
This is a very important argument for having multiple databases. It allows you to pick the database technology that fits your business requirement best, in order to get the best performance or usage out of it. For example, a specific microservice might have read-heavy operations with very complex filter options and a full-text search requirement; using Elasticsearch would be a good choice in that case. Another microservice might use SQL Server because it requires SQL-specific features like transactional behavior or similar. If you have one database for all services, you are stuck with one particular database technology, which might not be very performant for those requirements. It is a compromise for sure.
Developer discipline
If a couple of microservices were to share their database, you would need to deal with the human factor. The developers would need to be disciplined enough not to cross domains and access or modify the other microservices' database objects (tables, collections, etc.), which would be hard to achieve and control. In large organisations with a lot of developers this can be a serious problem. With a hard, physical split this is not an issue.
Summary
There are some arguments for having a database per microservice, and also some against it. In general, the guideline when using microservices is to keep a microservice and its data autonomous so that it can, ideally, work independently (this is not always the case). It is definitely a compromise, as is using microservices in general. As always, there are exceptions to the rule. Microservices architecture is flexible and very dependent on your domain's needs and requirements. If you and your team identify that it makes sense to merge multiple microservice databases into one and that it solves a lot of your problems, then go for it.

Microservices
Microservices advocate design constraints where each service is developed, deployed and scaled independently. This philosophy is only possible if you have a database per service. How can I continue my business if I have a DB failure, and what steps can I take to mitigate it? The DB is an essential part of any enterprise application. I agree there are a number of challenges when each service has its own database.
Why Independent database?
Unlike other approaches, this approach not only keeps your code base clean and extendable, but it also truly removes the single point of failure in your business. To achieve this, services can sometimes hold duplicated data as well, as long as the service stays autonomous, and services can only be autonomous if each has its own database.
From a business point of view, let's take an eCommerce application. You have microservices like Booking, Order, Payment, Recommendation, Search and so on, and the database is shared. What happens if the DB goes down? All your services are down! And then there is no point in using a microservices architecture, other than having a cleaner code base.
If each service has its own database, I don't mind if my Recommendation service is not working, because I can still search and book the order, and I haven't lost the customer. That's the whole point.
It comes with costs and challenges, but in the longer run it pays off.
SQL / NoSQL
Each service has its own needs. To get the best performance, I can use SQL for the Payment service (transactions) and I can (and should) use NoSQL for the Recommendation service. A shared database wouldn't help me in this case. In modern cloud architectures built around CQRS, Event Sourcing and materialized views, we sometimes even use two different databases for the same service to get the best performance out of it.
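As a rough sketch of that last point (not a full CQRS implementation): the hypothetical Payment service below writes transactionally to its own SQL database, while a separate read-optimized view is kept for queries. In a real system the read side would typically be its own NoSQL or search store, updated from published events; here it is just an in-memory map to keep the sketch short.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PaymentCqrsSketch {

    // Read model: "total spent per customer", rebuilt from payment facts.
    private final Map<String, Long> totalSpentByCustomer = new ConcurrentHashMap<>();

    // Command side: transactional insert into the service's own SQL database.
    public void recordPayment(String customerId, long amountCents) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:postgresql://payments-db:5432/payments",
                "payments_svc", System.getenv("PAYMENTS_DB_PASSWORD"))) {
            con.setAutoCommit(false);
            try (PreparedStatement ps = con.prepareStatement(
                    "INSERT INTO payment (customer_id, amount_cents) VALUES (?, ?)")) {
                ps.setString(1, customerId);
                ps.setLong(2, amountCents);
                ps.executeUpdate();
            }
            con.commit();
        }
        // Query side: apply the same fact to the read-optimized view
        // (in a real system this would be driven by a published event).
        totalSpentByCustomer.merge(customerId, amountCents, Long::sum);
    }

    public long totalSpent(String customerId) {
        return totalSpentByCustomer.getOrDefault(customerId, 0L);
    }
}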
Again, database-per-service is not only about resources or how much data a service should own; we really have to see the bigger picture. Yes, there are practices about how much data and how much duplication is good or bad, but that's another debate.
Hope that helps!

Related

Data seeding of microservices

In order to perform testing in feature branches, sometimes it's necessary to recreate data from scratch, e.g.:
An experimental branch irreversibly transforms data on some testing environment, so before testing another branch we need to refill the databases/buckets.
When testing, one microservice calls another microservice's API and changes that other service's data, so we want to always start testing in a clean environment.
In the case of a monolith, it's usually as easy as:
Create a dump
When needed, drop the database and apply the dump after the migrations (or in the middle of them, depending on the dump's last migration); see the sketch below.
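Assuming a PostgreSQL-backed monolith, that procedure can be scripted in a few lines; the database name, file name and reliance on the standard pg_dump/psql command-line tools are only illustrative.

import java.io.File;

public class ResetMonolithDb {

    static void run(String... cmd) throws Exception {
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        if (p.waitFor() != 0) {
            throw new IllegalStateException("command failed: " + String.join(" ", cmd));
        }
    }

    public static void main(String[] args) throws Exception {
        File dump = new File("baseline.sql");
        if (!dump.exists()) {
            // One-time step: capture the known-good baseline.
            run("pg_dump", "--no-owner", "-f", dump.getPath(), "app_test");
        } else {
            // Before each test run: recreate the database and load the baseline.
            run("dropdb", "--if-exists", "app_test");
            run("createdb", "app_test");
            run("psql", "-d", "app_test", "-f", dump.getPath());
        }
    }
}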
But it's getting much harder when we're talking about microservice architecture:
Each microservice uses its own database(s).
It means we have N databases with some denormalized data.
Some microservices have links to entities of another microservice, so it's important that all the dumps are coherent and consistent.
Different teams own different services, so it's much harder to agree on how and when to update the initial data dumps so that dumps are consistent across the services.
I'm looking for some best practices on this. Do we have anything better than trying to make snapshots of all databases at once?

How to design and build microservices in an AWS serverless architecture?

I'm totally new to the concept of microservices and AWS serverless architecture. I have a project that I have to divide into microservices that should run on AWS Lambda, but I'm facing difficulty in how to design it.
When searching, I could not find useful documentation about how to divide and design microservices; all the docs I saw either compared a monolithic app to a microservices app or showed how to deploy a microservice on AWS Lambda.
In my case I have to develop an ERP (Enterprise Resource Planning) system that has to manage clients, manage stocks, manage books, manage commands, and so on. So should I make a service for clients and a service for books, etc., and then, if I notice a lot of dependency between two microservices, make them one?
And for the DB, is it good to use one DB (DynamoDB) for all microservices instead of a DB for every service in this case (ERP)?
Any help is really appreciated.
If anybody has a useful document that can help me, I will be very thankful.
Thanks a lot.
I think the architecture of your data and services can depend on a few things:
Which data sources are used/available
What your requirements/desired functionalities are
Business logic or any other restrictions/concerns
In order to reduce the size of a service, we want to limit the reasons why an application or another service would access that service to as few as possible. This reduces the amount of overall maintenance needed to manage it and also gives you a lot of flexibility in how you deploy it.
For example: a service which transforms data from multiple sources and makes it available via an API can be split into an API service and a data-processing service that produces a new, cleaner data source. This prevents over-reliance on large, older services and data, and makes it easier for your applications to integrate with the newer, smaller service.
In your scenario, you may get away with having services for managing clients, books, and stocks separately, but it also depends on how your data sources are integrated as well as what services are already available to you. You may want to create other microservices or databases to help reduce the size and organize the data into the format you want.
Whether you combine two microservices or keep them separate can also depend on different things. Does one of these services have the potential to be useful for other applications? Is it dedicated to a specific project? Keeping services separate, small, and focused gives you room to expand or shrink things if needed. The same goes for data sources.
There are always many ways to approach it. Consider what your needs and problems are first, before opting for a certain tool to build a solution.
Microservices
It's simply small services that can be scaled and deployed independently.
AWS Serverless
Every application is different, so you may not find a single architecture that fits every application. Generally, a simple serverless application consists of a Lambda function, an API Gateway and a DB (SQL/NoSQL). Serverless is a great cloud-native choice: you get availability and scalability out of the box, and you can deploy your stuff very quickly.
How to design Serverless Application
There is no single right answer. You need to architect your system in a way that the individual microservices can work cohesively. In your case, Books and Stocks need to be separate microservices, which means they are separate Lambda functions. For the DB, DynamoDB is a good and powerful choice as long as you know how NoSQL works and the caveats around it. You need to think beforehand about the challenges around NoSQL and how you would partition your data. What if you need complex reporting; would NoSQL still be a good choice? There are patterns around to get past that issue. Since DynamoDB operates at the table level, each microservice should preferably get its own table that can be scaled independently, which makes more sense.
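For illustration, one microservice in this style could be a single Lambda function writing to its own DynamoDB table. The sketch below assumes the AWS SDK for Java v2 and the aws-lambda-java-core library; the "Books" table and the shape of the input event are hypothetical.

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.PutItemRequest;

import java.util.Map;

public class CreateBookHandler implements RequestHandler<Map<String, String>, String> {

    private final DynamoDbClient dynamo = DynamoDbClient.create();

    @Override
    public String handleRequest(Map<String, String> input, Context context) {
        // The Books service only ever touches its own table; the Stocks service
        // would be a separate function with its own table.
        dynamo.putItem(PutItemRequest.builder()
                .tableName("Books")
                .item(Map.of(
                        "bookId", AttributeValue.builder().s(input.get("bookId")).build(),
                        "title", AttributeValue.builder().s(input.get("title")).build()))
                .build());
        return "created " + input.get("bookId");
    }
}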
What's the right architecture for my application?
Instead of looking for one right answer, I would strongly suggest reading about the individual components before making up your mind. There are tons of articles and blogs. If I were you, I would look at things in the following order:
Microservices - why do we need them?
Serverless - In General
Event Driven architecture
SQL vs NoSQL
Lambda and DynamoDB and how they actually work
DevOps and how that would work in serverless
Patterns
Once you have a bit of understanding, you will be in a much better position to decide what suits you best.

Microservice architecture - is database shared across all instances of the service?

I understand that microservice architecture suggests that each service should have its own private database. But when such a service is scaled, is it one DB per service instance or one DB shared by all service instances?
Your first statement may be misleading to some: "each service should have its own private database."
Your architecture should be careful about sharing a single set of tables across multiple services-- that sharing frequently leads to a shared schema dependency, which creates a tight coupling that makes it difficult to update the schema without updating many of the services that share that schema at the same time.
However, sharing a single database instance (or database cluster) doesn't mean your services are accessing the same tables or even the same schema within the database. And if they aren't accessing the same tables, they aren't coupled. (Relying on the same database instance isn't coupling any more than relying on the same network. Don't confuse coupling with shared infrastructure.)
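A small sketch of that distinction, assuming PostgreSQL: two services connect to the same database instance, but each uses its own schema (and, ideally, its own role whose grants are limited to that schema), so they never touch the same tables. All names here are hypothetical.

import java.sql.Connection;
import java.sql.DriverManager;

public class SeparateSchemasSketch {

    // Role "orders_svc" only has privileges on the "orders" schema.
    static Connection ordersService() throws Exception {
        return DriverManager.getConnection(
                "jdbc:postgresql://shared-db.internal:5432/app?currentSchema=orders",
                "orders_svc", System.getenv("ORDERS_DB_PASSWORD"));
    }

    // Same instance, different schema and role: shared infrastructure, but no schema coupling.
    static Connection billingService() throws Exception {
        return DriverManager.getConnection(
                "jdbc:postgresql://shared-db.internal:5432/app?currentSchema=billing",
                "billing_svc", System.getenv("BILLING_DB_PASSWORD"));
    }
}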
Frequently, multiple instances of the same service share the same database. In my opinion, there is nothing inherently wrong with this, but there are some things to be aware of. If you go this route, you need to be very careful when making changes to the data schema. Because multiple versions of that service may be accessing the data at the same time during updates, any schema changes need to be compatible with at least any two adjacent versions. If you add a column or table, that's fine. The older version won't attempt to use it, so there will be no problem. (Note, too, that the older version won't populate it either.) Removing a column or table is another problem entirely, and to make that kind of breaking change you will likely need to do it in several smaller steps to ensure that the older version of the service isn't broken. It can be done, it's just tougher.
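For illustration, those "several smaller steps" usually follow an expand-and-contract pattern. The sketch below shows a column rename spread over releases, using plain JDBC against a hypothetical users table (the connection details are made up); the point is that no single release breaks the version running next to it.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class ExpandContractMigration {

    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:postgresql://users-db:5432/users",
                "users_svc", System.getenv("USERS_DB_PASSWORD"));
             Statement st = con.createStatement()) {

            // Release N ("expand"): add the new column; the old version simply ignores it.
            st.execute("ALTER TABLE users ADD COLUMN IF NOT EXISTS display_name text");

            // Release N (backfill): make sure both columns hold the same value.
            st.execute("UPDATE users SET display_name = full_name WHERE display_name IS NULL");

            // Release N+1: the service writes both columns but reads only display_name.

            // Release N+2 ("contract"): only once no running version still reads full_name
            // can the old column be dropped:
            // st.execute("ALTER TABLE users DROP COLUMN full_name");
        }
    }
}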
A general rule of microservice development is that each microservice should manage its own data. In an ideal world, the data managed by each service would be completely independent. There would be no need to propagate data changes made in one service to other services.
In the real world, however, complete data independence is impossible. There will always be overlaps between the data used in different services. Consequently, as an architect, you need to think carefully about sharing data and managing data consistency. You need to think about the microservices as an interacting system rather than as individual units.
This means:
You should isolate data within each system service, with as little data sharing as possible.
If data sharing is unavoidable, you should design microservices so that most sharing is read-only, with a minimal number of services responsible for data updates.
If services are replicated in your system, you must include a mechanism that can keep the database copies used by replica services consistent.
Good question indeed. I would answer it like this: "at least a database per microservice (not per instance)".
One concern is the scalability of the database itself, i.e. can service instances outscale the database?
If so, you could opt for e.g. an in-memory database or a sidecar database for your microservice. The database would be ephemeral, and you would need to populate it after the pod/container (re)starts, so the state does not really live in the database.
Apache Kafka is a tool that fits this spot, as it would allow you to populate the database after the service comes up, and it also provides the tooling to synchronize state across all currently running and future instances. Successfully implementing Event Sourcing with Kafka is not a trivial task, but you might even come to the conclusion that you don't need databases at all.
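A rough sketch of that replay-on-startup idea, using the plain Kafka consumer API: a fresh instance reads its (ideally compacted) topic from the beginning and rebuilds its local, ephemeral view before serving traffic. The topic name, the per-instance group naming and the event shape are hypothetical.

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class LocalStateRebuilder {

    private final Map<String, String> localState = new ConcurrentHashMap<>();

    public void run() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka.internal:9092");
        // A unique group per instance so every replica receives all partitions.
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "orders-view-" + UUID.randomUUID());
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("order-events"), new ConsumerRebalanceListener() {
                @Override
                public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                    // Start from the beginning so a fresh instance rebuilds the full view.
                    consumer.seekToBeginning(partitions);
                }

                @Override
                public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                }
            });

            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
                    localState.put(record.key(), record.value()); // apply each event to the local view
                }
            }
        }
    }
}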
So the question remains, can service instances really outscale the database?
The answer would be "no" more often than not.
So having a database instance per microservice (physically or logically) already gives you a lot in terms of loose coupling and cohesive behaviour, because you don't share databases.
Another concern is breaking changes to the database between versions of the microservice. If things go wrong, you could find yourself unable to roll back. An ephemeral database could sync itself up in a compatible way.
Some say they change database technologies throughout the lifetime of a microservice. I have never had the necessity to do so, but an in-memory/sidecar approach would fit here very well.
I presume you share one database among all instances of one microservice, so that an update is immediately visible to every instance of that microservice. You could use one database instance per microservice instance to avoid the database being a single point of failure, but then you would have to keep every database in sync, which seems like unnecessary overhead for the database and the application. I assume the database itself is able to keep a group of DB instances in sync (so every insert, update and delete is properly propagated).

How to share a database connection between microservices in Spring Cloud

How can I share a database connection among microservices in Spring Cloud modules? If there are many microservices, can I use the same DB connection, or should I use a DB connection per microservice?
In my opinion, the thing that you've asked for is impossible, simply because each microservice is a dedicated process that runs inside its own JVM (probably on more than one server). When you create a connection to the database (assuming you use a connection pool), it's always at the level of a single JVM.
I understand that chances are you meant something different, but I had to mention this because it directly answers your question.
Now, you can share the same database between microservices (the same schema, tables, etc) so that each JVM will have a set of connections opened (in accordance with connection pool definitions).
However, this is a really bad practice - you don't want to share databases between microservices. The reason is the cost of change: if you (as a maintainer of microservice A) decide to, say, alter one of the tables, now all microservices will have to support this, and this is not a trivial thing to do.
So, a better approach is to have a service that has a "sole responsibility" for your data in some domain. Now, all the services could contact this service and ask for the required data through well-established APIs that should never be broken. In this approach, the cost of change is much "cheaper" since only this "data service" should be changed in a way that it doesn't break existing APIs.
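As a small illustration of such a "data service" (Spring Web assumed; class and path names are hypothetical): it owns the user data and exposes a deliberately small, stable contract, so consumers depend on the API rather than on the underlying schema.

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class UserDataController {

    // A deliberately small DTO: callers depend on this contract, not on the tables
    // behind it, so the schema can evolve without breaking them.
    public record UserSummary(String id, String displayName) { }

    @GetMapping("/users/{id}")
    public UserSummary getUser(@PathVariable String id) {
        // In a real service this would load from the service's own database.
        return new UserSummary(id, "Jane Doe");
    }
}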
Now, regarding the database connection thing: you will usually have more than one JVM running the same microservice (like the data microservice), so it's not that you share connections between them, but rather that you share the same way of working with the database (because, after all, it's the same code).
When dealing with a microservice architecture, it is usually the case that you have a distributed system.
Most microservices that communicate with each other are not on the same machine, instance or container. Communication between them is most commonly done via HTTP, though there are many other ways.
I would suggest designing microservices around a single concern of your application. For example, in your case, you could have a "persistence microservice" that would be responsible for data persistence operations on one or multiple types of data stores. It could deal with relational DBs, NoSQL, file storage, etc. Then, via REST endpoints, you can expose that persistence functionality to the microservices that deal with business logic.
A very easy way to build a REST service like this would be with the help of the Spring Data REST project.
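For example, a minimal Spring Data REST sketch could look like this (entity and repository names are hypothetical; recent Spring Boot with Jakarta Persistence is assumed, older versions would use javax.persistence). The repository is exposed automatically as REST endpoints such as GET/POST /documents, without hand-written controllers.

import jakarta.persistence.Entity;
import jakarta.persistence.GeneratedValue;
import jakarta.persistence.Id;
import org.springframework.data.repository.CrudRepository;
import org.springframework.data.rest.core.annotation.RepositoryRestResource;

@Entity
class StoredDocument {
    @Id
    @GeneratedValue
    Long id;
    String name;
    String location;
}

// Exposed by Spring Data REST under /documents with no controller code.
@RepositoryRestResource(collectionResourceRel = "documents", path = "documents")
interface StoredDocumentRepository extends CrudRepository<StoredDocument, Long> {
}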
To answer your actual question, I'm not aware of any way to share actual connections between processes. Beyond that, having many microservices running on the same instance is not a good practice most of the time.
Microservices are very popular these days and everybody is trying to transition to them. My advice would be to make sure you don't "over-engineer" your project.
Hope I didn't misunderstand your question, but to be fair it is a little vague. If you could provide a longer, more detailed description of your architecture and use case, I can suggest more tools/frameworks you can use to achieve your cloudy goals.
First and most important: your microservice should be responsible for handling all the data in a given business domain/bounded context. So the question is: why do you need to share a database connection between microservices, and isn't this a sign you went too far with slicing your system? Microservices are a tool, and the word 'micro' may be a bit misleading :)
For more reading I would suggest e.g. https://learn.microsoft.com/en-us/dotnet/standard/microservices-architecture/architect-microservice-container-applications/identify-microservice-domain-model-boundaries (don't worry, it's general enough to be applicable to Spring as well).

Microservices, Dependencies and Events

I've been doing a lot of googling about managing dependencies between microservices. We're trying to move away from a big monolithic app to microservices in order to scale organizationally and be able to develop faster, with multiple teams working in parallel.
However, as we try to functionally partition the monolith into microservices, we see how intertwined business logic and data really are. This was not a problem when we were sitting on top of one big DB and were able to do big relational joins. But with microservices, this becomes a problem.
One solution is to make microservice-A go to 5-10 other microservices to get the necessary data (the equivalent of a DB view with joins). Another solution is to make microservice-A listen to events from 5-10 other services and populate local storage with the relevant info (the equivalent of a materialized view). Either way, microservice-A is coupled with 5-10 other services, and if new info is needed in microservice-A, some of the services it depends upon may need to be released prior to microservice-A. Please note that microservice-A is itself depended upon by other services. Bottom line: we end up with DISTRIBUTED dependency hell.
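For illustration, the first option (API composition) might look roughly like the sketch below, using the JDK's built-in HttpClient; the service URLs and paths are made up. Every call in composeOrderView is another service that microservice-A is coupled to.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class OrderViewComposer {

    private final HttpClient http = HttpClient.newHttpClient();

    private String fetch(String url) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return http.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    // The "join" that used to be a single SQL query now spans several services.
    public String composeOrderView(String orderId) throws Exception {
        String order = fetch("http://orders-service/orders/" + orderId);
        String customer = fetch("http://customers-service/customers/for-order/" + orderId);
        String shipping = fetch("http://shipping-service/shipments/for-order/" + orderId);
        return order + customer + shipping; // in reality: parse the JSON and merge the fields
    }
}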
Many articles advocate for the second solution, i.e. something along the lines of Event Sourcing, Choreography, etc.
I would appreciate any shared experiences, recommendations and insights.
Philometor.
While not technically an "answer", I can definitely share some of my observations and experiences. Your question concerning services calling other services for database operations reminded me of a project where an architect sold senior management on the idea of "decoupling" persistence from the rest of the applications by implementing hundreds of REST interfaces in what essentially was a distributed DAO pattern in front of a very large enterprise database. The project ended up exactly the way I predicted - a dismal failure.
Microservices aren't about turning a monolithic application into a distributed monolithic application. In my example project above, the monolith was turned into a stove-piped, fragile, chaotic mess, with the coupling only moved to service contracts instead of Java class method signatures, and with a performance hit so bad the application was unusable. Last I heard they are still running their original monolith.
Microservices should be more of a vertical partitioning of your application and not a horizontal one. In my opinion it's better to think in terms of business function partitioning rather than "converting" an existing monolith. There's no rule that determines how big a microservice must be, but it should be big enough to do one complete synchronous function without needing to directly depend on outside services (as much as possible) to complete its work. If a microservice performs a complex business function that affects 50 tables, so be it! It owns those many tables. Ideally if a service goes down, it should affect only that business functionality it's responsible for, and not directly affect other services. As you can see, this thinking is the complete opposite from that which produced the distributed mess in my project example.
Not only do you need to ensure that the motivation behind replacing monoliths with microservices is sound, but also you need to step outside the monolith and revisit the actual business and begin partitioning that instead. Like everything else, baby steps are the way to go. Start with one small complete business function, and convert that into a single microservice instead of trying to replace a monolith all at once.
