Sharing huge data between microservices

Sharing huge data between microservices - microservices

I am designing an review analysis platform in microservices architecture.
Application is works like below;
all product reviews retrieved from ecommerce-site-a ( site-a ) as an excel file
reviews are uploaded to system with excel
Analysis agent can list all reviews, edit some of them, delete or approve
Analysis agent can export all reviews for site-a
Automated regexp based checks are applied for each review on upload and editing.
I have 3 microservices.
Reviews: Responsible for Review Crud operations plus operations similar to approve/reject..
Validations: Responsible for defining and applying validation rules on review.
Export/Import: Export service exports huge files given site name (like site-a)
The problem is at some point, validation service requires to get all reviews for site-a, apply validation rules and generate errors if is there any. I know sharing database schema's and entities breaks micro-services architecture.
One possible solution is
Whenever validation service requires reviews for a site, it requests gateway, gateway redirects request to Reviews service and response taken.
Two possible drawbacks of this approach is
validation service knows about gateway? Is it brings a dependency?
in case I have 1b reviews for a site, getting all reviews via rest request may be a problem. ( or not, I can make paginated requests from validation service to gateway..)
So what is the best practice for sharing huge data between micro-services without
sharing entity
and dublicating data
I read lot about using messaging queues but I think in my case it is not good to use messaging queue to share gigabytes of data.
edit 1: Instead of sharing entity, using data stores with rest API can be a solution? Assume I am using mongodb, instead of sharing my entity object between microservices, I can use rest interface of mongo (http://restheart.org/) and query data whenever possible.

Your problem here is not "sharing huge data", but rather the boundaries you choose to separate your micro services based on.
I can tell from your requirements that the 3 micro services you chose to separate (Reviews, Validations, Import/Export) are actually operating on the same context and business domain .. which is Reviews.
I would encourage you to reconsider your design decision and consider Reviews, as a single micro service, that handles all reviews operations and logic as a black box.

I assume that reviews are independent from each other and that validating a review therefore requires only that review, and no other reviews.
You don't want to share entities, which rules out things like shared databases, Hadoop clusters or data stores like Redis. You also don't want to duplicate data, thereby ruling out plain file copies or trigger-based replication on database level.
In summary, I'd say your aim should be a stream. Let the Validator request everything from Reviews about Site A, but not in one bulk, but in a stream of single or small packages of reviews.
The Validator can now process the reviews one after the other, at constant memory and processor consumption. To gain performance, you can make multiple instances of the Validator who pull different, disjunct pieces of the stream at the same time. Similarly, you can create multiple instances of the Reviews microservice if one alone wouldn't be able to answer the pull fast enough.
The Validator does not persist this stream, it produces only the errors and stores or sends them somewhere; this should fulfill your no-sharing no-duplication requirements.

Related

Why do microservices need to communicate with each other?

I'm new with this but in case if we have fronted + few different microservices I just don't get why do we need any of them to communicate with each other if we can manipulate between their data via axios on frontend. What is the purpose of event bus and event-driven architecture in the case if we use both frontend and backend microservices?
Okay, for my example I'm using 5 microservices. There are 2 of them:
Shopping cart
Posts
And I want to access posts microservice directly, pass their data through the event bus, so the shopping cart microservice would have its information. The reason is that posts and shopping cart both have different data bases, so is a good example doing this that way, or just through frontend with axios service?

What you are suggesting could be true for a very simple application, which hardly even needs an architecture such as microservice. It is clear why services need communication:
some services are not even accessable (for various reasons such as security) in client, so a change in them must be initiated in other backend services with such priviledge
some changes are raised in backend services and not client, e.g. a cronjob for doing some task
it would question reusability as you must consider the service to be used by not only client, but in any environment
what would happen if you want your services to be used by public, what if they do not implement part of the needed logic intentionally or by mistake
making client do everything could be so complex and would reduce flexibility
some services such as authentication are acting as a supporting mechanism to ensure safety (or anything other than main logic), these should be communicated directly by the service needing them
As for second part of your question, it depends on several factors like your business needs & models, desired scalability, performance, availability, etc. so the right answer or in another say, the answer that fits would be different.
For your problem, using event bus which is async would not be a good solution as it would hurt consistency in your services. Instead synchronous ways like a simple API call in your posts service would be a better idea.

Why does each microservice get its own database?

It seems that in the traditional microservice architecture, each service gets its own database with a different understanding of the data (described here). Sometimes it is considered permissible for databases to duplicate data. For instance, the "Users" service might know essentially everything about a user, whereas the "Posts" service might just store primary keys and usernames (so that the author of a post can have their name displayed, for instance). This page talks about eventual consistency, sources of truth, and other related concepts when data is duplicated. I understand that microservice architectures sometimes include a shared database, but most places I look suggest that this is a rare strategy.
As for why each service typically gets its own database, all I've seen so far is "so that each service owns its own resources," but I'm not convinced that a) the service layer in any way "owns" the persisted resources accessed through the database to begin with, or that b) services even need to own the resources they require rather than accessing necessary subsets of the master resources through a shared database.
So what are some of the justifications that each service in a microservice architecture should get its own database?

There are a few reasons why it does make sense to use a separate database per micro-service. Some of them are:
Scaling
Splitting your domain in micro-services is fine. You can scale your particular micro-service on the deployed web-server on demand or scale out as needed. That it obviously one of the benefits when using micro-services. More importantly you can have micro-service-1 running for example on 10 servers as it demands this traffic but micro-service-2 only requires 1 web-server so you deploy it on 1 server. The good thing is that you control this and you can manage your computing resources like in order to save money as Cloud providers are not cheap.
Considering this what about the database?
If you have one database for multiple services you could not do this. You could not scale the databases individually as they would be on one server.
Data partitioning to reduce size
Automatically as you split your domain in micro-services with each containing 1 database you split the amount of data that is stored in each database. Ideally if you do this you can have smaller database servers with less computing power and/or RAM.
In general paying for multiple small servers is cheaper then one large one.
So in this case you could make use of this fact and save some resources as well.
If it happens that the already spited by domain database have large amount of data techniques like data sharding or data partitioning could be applied additional, but this is another topic.
Which db technology fits the business requirement
This is very important pro fact for having multiple databases. It would allow you to pick the database technology which fits your Business requirement best in order to get the best performance or usage of it. For example some specific micro-service might have some Read-heavy operations with very complex filter options and a full text search requirement. Using Elastic Search in this case would be a good choice. Some other micro-service might use SQL Server as it requires SQL specific features like transnational behavior or similar. If for some reason you have one database for all services you would be stuck with the particular database technology which might not be so performant for those requirement. It is a compromise for sure.
Developer discipline
If for some reason you would have a couple micro-services which would share their database you would need to deal with the human factor. The developers would need to be disciplined to not cross domains and access/modify the other micro-services database(tables, collections and etc) which would be hard to achieve and control. In large organisations with a lot of developers this could be a serious problem. With a hard/physical split this is not an issue.
Summary
There are some arguments for having database per micro-service but also some against it. In general the guidelines and suggestions when using micro-services are to have the micro-service together with its data autonomous in order to work independent in Ideal case(this is not the case always). It is defiantly a compromise as well as using micro-services in general. As always the rule is the rule but there are exceptions to it. Micro-services architecture is flexible and very dependent of your Domain needs and requirements. If you and your team identify that it makes sense to merge multiple micro-service databases to 1 and that it solves a lot of your problems then go for it.

Microservices
Microservices advocate design constraints where each service is developed, deployed and scaled independently. This philosophy is only possible if you have database per service. How can i continue my business if i have DB failure and what steps i can take to mitigate this?DB is essential part of any enterprise application. I agree there are different number of challenges when services has its own databases.
Why Independent database?
Unlike other approaches this approach not only keeps your code-base clean and extendable but you truly omit the single point of failure in your business. To achieve this services sometimes can have duplicated data as well, as long as my service is autonomous and services can only be autonomous if i have database per service.
From business point of view, Lets take eCommerce application. you have microserivces like Booking, Order, Payment, Recommendation , search and so on. Database is shared. What happens if the DB is down ? All your services are down ! and there is no point using Microservies architecture other than you have clean code base.
If you have each service having it's own database , i don't mind if my recommendation service is not working but i can still search and book the order and i haven't lost the customer. that's the whole point.
It comes at cost and challenges, but in longer run it pays off.
SQL / NoSQL
Each service has it's own needs. To get the best performance I can use SQL for payment service (transaction) and I can use (I should) NoSQL for recommendation service. Shared database wouldn't help me in this case. In modern cloud Architectures like CQRS, Event Sourcing, Materialized views, we sometimes use 2 different databases for same service to get the performance out of it.
Again Database per service is not only about resources or how much data should it own. But we really have to see the bigger picture. Yes we have certain practices how much data and duplication is good or bad but that's another debate.
Hope that helps !

Designing microservices in practice

Yet another question on how to or how not to split up a microservice :-D
The scenario:
What do we need?
Sending emails at different points of time within the work flow of an ecommerce order process. These mails will be containing order information.
What do we have?
1 x persistence service which retrieves order information
Several services which subscribe to order events and processes the relevant use case (e.g. Confirmation, delivery, invoice)
1 x service which can be triggered to send a mail
What's the next step?
Designing the architectural component which transforms the order information so they will fit the data structure of the email rendering service.
The current options are
1 having each processing service transform already existing order information for the mail template and send them to the mail rendering service.
2 have each processing service call a new service which would aggregate and transform the order information and call the mail rendering service.
Currently we're not sure yet if the data structures for the mail templates will be mostly common or if there will be differences.
So what do you think of these options in terms of cohesion, coupling and separation of concerns?
Do you need any more information? Any constructive thoughts are welcome!

Your software architecture should reflect your organizational structure, see Conway's law
Do you have multiple teams, and you want to minimize dependencies between the teams.
Are "services" large and complex enough to justify them being separated into modules?
Does the size of the product justify having advanced devops in place to orchestrate the microservices?
Do you need the flexibility in terms of deployment and replaceability of individual "services"?
If you can answer yes to most of these questions, it would make sense to go for microservices. Otherwise, you are just making your life complicated.
Frankly, microservices require a lot of coordination overhead which makes sense only if the product is large enough. Most (small) projects are just fine with monolithic and MVC architecture.

This is how I propose to proceed man, it's how one of my project's architecture does all SMTP related stuff.
API receives an HTTP request
It persists data needed to the database.
It offloads the long-running and memory intensive processes to mail builder.
Optional, mail builder builds attachment files (XLSX, PDF, etc)
Mail builder uploads to File Server
Mail builder offloads generic SMTP sending to SMTP service.
I suggested this format because it allows you to scale the instance of each piece (Mail builder will have tons of instances) depending on bottlenecks in your processing pipeline.

Given that you have asked this question in microservices, I am assuming you are asking the question in reference to cloud native patterns.
I suggest you start with looking at microservices pattern. An excellent site for the patterns is https://microservices.io/patterns/microservices.html.
Your question does not have the necessary details to provide an educated advice on what patterns are suitable and what are not. So, I suggest you look at these few patterns...
https://microservices.io/patterns/data/shared-database.html
https://microservices.io/patterns/data/database-per-service.html
Also take a look at event sourcing pattern
https://microservices.io/patterns/data/event-sourcing.html
Hope this helps.

Distributed Database Design Architecture Use Case for Users & Authentication

I am now trying to design database for my micro service-oriented application in a distributed way. My application is related with management of universities. I have different universities say A, B, C. Each university have separate users for using their business data. Now I am planning to design separate databases for separate universities for storing their user data. So each university has their own database for their users and additional one database for managing their application tables. If I have 2 universities, Then I have 2 user details DB and other 2 DB for application tables.
Here my confusion is that, when I am searching for database design, I only see the approach of keeping one common database for storing all users (Here for one DB for all users of all universities). So every user is mixed within one database.
If I am following separate database for each university, Is possible to support distributed DB architecture pattern and micro service oriented standard? Or Do I need to keep one DB for all users?
How can I find out which method is appropriate for microservice / Distributed database design pattern?

Actually there could be multiple solutions and not one solution is best, the best solution is the one which is appropriate for your product's requirements.
I think it would be a better idea to go with separate databases for each of your client (university) to keep the data always isolated even if somethings wrong happens. Also with time, the database could go so huge that it could cause problems to configure/manage separate backups, cleanups for individual clients etc.
Now with separate databases there comes a challenge for managing distributed transactions across databases as you don't know which part is going to fail among many. To manage that, you may have to implement message/event driven mechanism across all your micro-services and ensure consistency.
Regarding message/event mechanism, here is a simple use case scenario, suppose there are two services "A" (user-registration) and "B" (email-service)
"A" registers a user temporarily and publishes an event of sending confirmation email.
The message goes to message broker
The message is received by "B".
The confirmation email is sent to the user.
The user confirms the email to "B"
The "B" publishes event of user confirmation to the broker
"A" receives the event of confirmation and the process is completed.
The above is the best case scenario, problems still can happen in between even with broker itself.
You have to go deep into it if you think you need this.
Some links that may help.
http://how-to-implement-a-microservice-event-driven-architecture-with-spring-cloud-stre
A Guide to Transactions Across Microservices

I don't think that this is a valid design, using a database per client which is a Multi-tenant architecture practice, and database per microservice is a microservice architecture practice. you are mixing things up.
if you will use microservice architecture you better design it as Bounded contexts and each Context has its own database to achieve microservices main rule Autonomy

Microservices and isolated persistence - how should the data be stored/fetched?

At my company, we're about to move to the micro services architecture. I read a lot about it, and there are tons of obscure areas where it's specific to the project built, but one area seems to get everyone to agree, microservices need to have isolated persistence or another way to say it, they need to have they own database.
Now I love the idea, that means every microservice has its own database schema, its own domain objects and is 100% independent of any other microservice data structure.
There are things I don't quite understand though.
The "Customer Service" is obviously central to the application, and we can see that basically any other microservice will need some data about the user at some point. Whether it'd be the user's credit amount, its ID, or its name.
But since other microservices can't directly read into the Customer Service database, they'll need to query this service over and over again. This is fine (I guess) for simple stuff like getting the name of current logged user, but when we need to display 60 users on a page and we can't do any SQL join, it feels like we're missing something. This is even worse when microservices depend upon tons of microservices.
So I found out that some people actually queried microservices X times a day to get data into their own microservices.
So if microservice "Search" needs data from "Product", "Customer", it'll actually query these microservices and will persist the data with its own data structure.
The question I have is should it be "Search" that queries "Product" and "Customer", or should "Product" and "Customer" send data to "Search" ?
The first option looks a bit easier to do, we only need to have this logic on one side, and that's where the data is needed. But we'll only get static freshness of data which is not very smart, but could definitely work.
The second option looks a bit more difficult but more scalable too, because we could have very fresh data when we need it, since the data changed where it's sent, it could also be more granular.

I think you correctly identified downsides to the microservices approach! And there are no elegant solutions to these specific problems. You will have to eat the additional work and architecture deterioration that this brings.
Concretely addressing your question now:
The question I have is should it be "Search" that queries "Product" and "Customer", or should "Product" and "Customer" send data to "Search" ?
You seem to be looking for a data synchronization service. You want to decide between push and pull. You are concerned about data freshness and logic duplication.
The key point here is that the source service cannot know about its consumers. This is to prevent an unwanted reverse dependency. This would break architectural isolation. Any data sync process that maintains this is fine. You can do what is most convenient.
For example, you could make the data source expose two APIs:
An API to get the whole data set. This would be called periodically by the destination (e.g. nightly). It can also be used to seed the destination at will and to fix data errors there.
A feed of changes in the source database keyed by the date and time the change occurred. The destination can now poll that change feed very frequently (e.g. every few seconds or minutes) and apply the small delta that occurred.
You can even build a realtime change feed through a publish-subscribe middleware. Many message queue softwares can do that. The source would just send out changes to the middleware.
Building all of this is conceptually simple but takes a lot of work. It also creates lots of ongoing work and increases the potential for bugs. Debugging becomes much harder. I have worked on systems like that.
I'm going to add a subjective note: Microservices are not well understood by many teams. The downsides are often ignored. You identified a few of the downsides correctly and they are nasty! Given what I read on the web I believe many teams do not realize the mess they are getting themselves into. Managing disparate data stores can be a nightmare. This is not a one-time "mess" but an ongoing one.
As an alternative I'd recommend using a common data store and building services simply as classes or projects that live in the same process. This gives you the microservices code structuring with the convenience of normal development. It also leaves a few of the upsides of microservices on the table.

your identification of the problem is correct.
But the solution to your problem will depend on use case to use case.
In your example of search service , product service and customer service should publish their events on kafka or similar messaging and search service listen to them and updates it.
In case of lets say in order service while creating an order for a customer , you want to check customer exists , then you might do it by calling the sync api of customer service , but for that also there are variour other approaches , i have answered here linking Microservices and allowing for one to be unavailable
From my perspective sync communication between services should be avoided , and there are way around for this , above link would help
You can use domain driven design philosophy to correctly break your services and their contract

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio