We have a multi-tenant monolithic system and are moving to a microservices architecture. We plan to extract 4 microservices. For data migration, we are adding a REST endpoint in the monolith to fetch the data and an endpoint in each microservice to insert the data.
On a high level, the following steps are needed (the fetch endpoint is sketched below):
REST endpoint in the monolith to fetch data; the API will be paginated, with a filter like tenantId.
Process/transform the data.
Insert the data into the microservice's database; the microservice will expose a REST endpoint for this.
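For illustration, the fetch endpoint I have in mind looks roughly like this (a sketch only; I'm assuming a Flask-style handler here, and fetch_rows stands in for the monolith's real data-access code):

```python
# Sketch of the paginated fetch endpoint on the monolith side.
# Flask is assumed purely for illustration; the real monolith may differ.
from flask import Flask, jsonify, request

app = Flask(__name__)

def fetch_rows(tenant_id, offset, limit):
    # Placeholder for the monolith's actual data-access layer.
    return []

@app.route("/export")
def export():
    tenant_id = request.args["tenantId"]
    offset = int(request.args.get("offset", 0))
    limit = min(int(request.args.get("limit", 10)), 100)  # cap the page size
    return jsonify(fetch_rows(tenant_id, offset, limit))
```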
I am new to Airflow and would like to know if this is a viable solution.
Following are my constraints:
I want to pace the data copy (say, 10 items at a time) so it does not create load on the monolith, which is serving live traffic as well.
Ability to retry on failures.
After an initial reading about Airflow, I came up with two options:
Option 1:
I will create a DAG that does the ETL with two tasks:
Extract and Transform task: calls the monolith API to fetch the data, transforms it, and puts it into an S3 bucket.
Load task: inserts the transformed data into the microservice.
Option 2:
Only one task.
Extract, Transform, and Load: no need for an S3 bucket. Fetch 10 items, transform them, and load them into the microservice.
I am new to ETL and Airflow, so let me know which approach is better. Also, in the case of a retry, is there a way to create a checkpoint in Airflow? Say a task fails after inserting 100 records into the microservice; when the retry happens, I don't want to start from the beginning.
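For reference, here is roughly what I have in mind for Option 2 (a sketch only, assuming Airflow 2.x; the URLs, tenant id, and transform are placeholders). Since, as far as I understand, an Airflow retry reruns the whole task rather than resuming mid-task, I am keeping the last successfully loaded offset in an Airflow Variable as a manual checkpoint; is that a reasonable approach?

```python
# Sketch of Option 2: one Extract-Transform-Load task, paced at 10 items
# per request, with a manual checkpoint in an Airflow Variable so a retry
# resumes where the failed attempt stopped. Assumes Airflow 2.x; endpoint
# URLs and the transform are placeholders.
import requests
from airflow.decorators import dag, task
from airflow.models import Variable
from pendulum import datetime

MONOLITH_URL = "https://monolith.example.com/export"       # hypothetical
MICROSERVICE_URL = "https://orders-ms.example.com/import"  # hypothetical
PAGE_SIZE = 10  # pace the copy so the monolith's live traffic is not hurt

def transform(item: dict) -> dict:
    # Placeholder mapping from the monolith's schema to the microservice's.
    return item

@dag(start_date=datetime(2024, 1, 1), schedule=None, catchup=False)
def migrate_tenant():

    @task(retries=3)
    def etl(tenant_id: str = "tenant-1"):
        # Resume from the checkpoint; 0 on the very first attempt.
        offset = int(Variable.get("migration_offset", default_var=0))
        while True:
            resp = requests.get(
                MONOLITH_URL,
                params={"tenantId": tenant_id, "offset": offset, "limit": PAGE_SIZE},
                timeout=30,
            )
            resp.raise_for_status()
            items = resp.json()
            if not items:
                break  # no more pages: migration done
            payload = [transform(i) for i in items]
            requests.post(MICROSERVICE_URL, json=payload, timeout=30).raise_for_status()
            offset += len(items)
            # Advance the checkpoint only after a successful insert.
            Variable.set("migration_offset", offset)

    etl()

migrate_tenant()
```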
I understand that you are trying to migrate from a monolith to microservices.
Is the goal to:
slowly migrate from the monolith to microservices by distributing traffic between the two systems,
or
completely cut over to the new microservices on a chosen day?
If it's the second approach, I would do an ETL job for the data migration.
If it's the first approach:
Implement CDC (or just add changes in the monolith) to publish the persistence operations to a messaging system (Kafka, RabbitMQ).
Implement the subscribers in the microservices and update their DBs.
Once confident in the pub/sub implementation, redirect all reads to the microservices.
Then slowly divert some percentage of the write calls to the microservices, which will make a REST call to the old system to keep the old DB updated.
Once you are confident in the new services, the data quality, and the other requirements (performance), completely cut over to the new microservices.
** Note that you need to do a historic sync before starting the async messaging process.
This is one way to smoothly cut over from a monolith to microservices.
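As a rough illustration of the publish side (a sketch only, assuming Kafka and the kafka-python client; the broker address, topic name, and event shape are made up):

```python
# Sketch of the monolith publishing its persistence operations to Kafka,
# assuming the kafka-python client. The broker address, topic name, and
# event shape are illustrative, not prescriptive.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="kafka:9092",  # hypothetical broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def publish_change(entity: str, operation: str, payload: dict) -> None:
    """Call this right after the monolith commits a write."""
    event = {"entity": entity, "op": operation, "data": payload}
    # Key by record id so all events for one record stay ordered
    # within a single partition.
    producer.send("monolith.changes",
                  key=str(payload["id"]).encode("utf-8"),
                  value=event)

# Example: after the monolith saves an order
publish_change("order", "UPDATE", {"id": 42, "status": "SHIPPED"})
producer.flush()
```

The subscriber in each microservice then consumes this topic and applies the changes to its own DB.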
Related
The microservice architecture is Docker-based; one microservice (the transaction database, keyed by userId) is in Node.js, and the other (the user database) is in Rust. We need to create a common API or function to retrieve data from both microservices. MongoDB is used as the database for both.
There are several approaches to do that.
One possible solution is that one of the microservices is responsible for aggregating this data: it calls the other to obtain the data, combines it with its own, and returns the result to the caller. This makes sense when the operation is part of the domain of one of the microservices. For example, if the consumer needs user information, it is natural to call the user service, and this service makes whatever calls are needed to other services to return all the information.
Another possibility is the BFF (Backend For Frontend) pattern. This makes sense when the consumer (for example, a frontend) needs information from different domains to populate the UI. In this case you create an additional service that exposes an API with everything the consumer needs, and this service does the aggregation. In certain cases this can be done directly in the API gateway, if you are using one.
The third way is similar to the first one, but it requires duplicating data, so I don't know if it will be suitable for you. It consists of keeping a read-only copy of the data owned by one service inside the other service, and updating it asynchronously via events whenever the data is modified. The benefit of this approach is better performance, because you don't need inter-service communication at read time. The disadvantage is eventual consistency.
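A minimal sketch of the first approach, with the user service doing the aggregation (HTTP/JSON between the services is assumed; the URL, field names, and projection are illustrative):

```python
# Sketch of approach 1: the user service combines its own MongoDB data
# with transaction data fetched from the other microservice. The service
# URL and field names are assumptions for illustration.
import requests

TRANSACTION_SERVICE = "http://transactions:3000"  # hypothetical address

def get_user_profile(user_id: str, own_db) -> dict:
    # 1. Read this service's own data (own_db is its pymongo database handle).
    user = own_db.users.find_one({"_id": user_id}, {"name": 1, "email": 1}) or {}

    # 2. Ask the transaction service for the rest.
    resp = requests.get(f"{TRANSACTION_SERVICE}/transactions",
                        params={"userId": user_id}, timeout=5)
    resp.raise_for_status()

    # 3. Return one combined document to the caller.
    return {**user, "transactions": resp.json()}
```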
Is it considered good practice to connect to two different databases in one microservice API, or do I need to implement another microservice for working with the second database and call the new microservice's API from the first one?
The main rule is that you have only one microservice per database, but it is OK to have multiple databases per microservice if the business case requires it.
Your microservice can abstract multiple data sources, connect them, etc., and then just expose a consistent API to whoever is using it. The consumer doesn't care how many data sources there actually are.
It becomes an issue if you have the same database abstracted by multiple microservices. Then your microservice is no longer isolated and can break, because the data source you are using was changed by another team that uses the same data source.
I am trying to create a microservice architecture for a hobby project, and I am confused about some decisions. Can you please help me, as I have never worked with microservices before?
One of my requirements is that my AngularJS GUI will need to show some drop-downs or lists of values (for example, a list of countries). These can be fetched with a microservice REST call, but where should the values come from? Can I fetch them from my config server, or should they come from a database? If the latter, should each microservice have its own database for lookup values, or can it be a common one?
How would server-side validation work in this case? I mean, there will certainly be a microservice call the GUI makes for validation, but should the validation service be one common microservice for all use cases/screens, should there be one per GUI page, or should the CRUD microservice be reused for validation as well?
How do I deal with a use case where the back end is not a database but a web-service call? Will I still need some local DB to maintain state between these calls (especially to handle the scenario where the web-service call fails) and finally pass the status on to the GUI?
First of all, there is no single way to design a microservice; one has to choose according to the use case and project requirements.
Can I fetch these from my config server, or should they come from a database?
Again, it depends on the use case and requirements. However, since every MS should have its own DB anyway, you can use a DB even if the countries are only names; and if they have relationships with city/state data, then you should definitely use a DB.
If a DB, should each of the microservices have its own DB for lookup values, or can it be a common one?
No. IMO multiple MSs should not depend on a single DB, because if that DB fails then all the MSs fail, which should not happen. Each MS should work on its own, without depending on another DB or MS.
Should the validation service be a common microservice for all use cases/screens?
Same as point 2
How do I deal with a use case where the back end is not a database call but another web-service call? Will I still need some local DB to maintain state between these calls and finally pass the status on to the GUI?
If you are using HTTP, then you should not save the state of any request. If you want to forward the request to another MS, you can use a Feign client, which provides a very convenient way to call REST APIs along with other important features such as client-side load balancing.
Microservice architecture is simple: you divide each task into a separate service (for example, a Spring Boot application).
For example, in every application there will be a login function, a registration function, and so on; each of these becomes a separate service in a microservice architecture.
1. You can store the values in a database, since if you want to add more values in the future it is easy to do. You can maintain a separate DB or a single one: a single DB with separate collections or tables for each microservice.
2. By validation, do you mean who can use which microservice (role-based access)?
3. I think you have to use a local DB.
Microservices are a collection of loosely coupled services. For example, if you are creating an e-commerce application, user management can be a service, order management can be a service, and refund & chargeback management can be another service. Each of these services can be further divided into smaller units; let's call them API endpoints. For example, user management can have login as one endpoint and signup as another.
If you want to leverage the power of microservice architecture in its true sense, here is what I would suggest. For the above example, create 3 Spring Boot applications, one per service. The first thing you should do after this is establish trust between those applications; I would prefer JWTs for trust establishment. After that, everything is a piece of cake. Here are the answers you are looking for:
You should ideally use a database, as opposed to keeping the values in the config server, for fetching the list of countries, so that you need not recompile your code every time a new country is added.
You can easily restrict access using @PreAuthorize, if role-based access is what you are referring to.
You can use OkHttp or any other HTTP client in this use case, and you certainly need not maintain any local DB. However, you can cache the output of the web-service call if that is a requirement.
P.S.: Establishing trust between microservices can be a complex task if you don't understand all the subtleties. In that case, I would recommend going ahead with a single Spring Boot application, which is a monolithic architecture. I would still recommend JWTs, though.
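To make the JWT idea concrete, here is a minimal sketch of service-to-service trust with a shared secret (PyJWT is used for brevity; the same pattern applies in Spring with a Java JWT library, and the secret and claims here are illustrative):

```python
# Sketch of service-to-service trust with JWTs, using PyJWT for brevity.
# The shared secret and claim names are illustrative only.
import time
import jwt  # PyJWT

SHARED_SECRET = "change-me"  # distribute via configuration, never hard-code

def issue_token(service_name: str) -> str:
    claims = {
        "iss": service_name,           # which service is making the call
        "exp": int(time.time()) + 60,  # keep service tokens short-lived
    }
    return jwt.encode(claims, SHARED_SECRET, algorithm="HS256")

def verify_token(token: str) -> dict:
    # Raises jwt.InvalidTokenError if the token is forged or expired.
    return jwt.decode(token, SHARED_SECRET, algorithms=["HS256"])

# The caller sends "Authorization: Bearer <issue_token('user-service')>"
# and the receiving service runs verify_token() before handling the request.
```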
I am working on a jobs site where I am thinking of breaking the job-matching section out into a microservice; everything else stays a monolith.
But when thinking about how the microservice should have its own separate database, that would mean the microservice keeps a separate copy of all the jobs, given that the monolith would still handle all job CRUD functionality.
Am I thinking about this the right way, and is it normal to have multiple copies of the same data spread across different microservices?
The idea of having different databases with the same data scares me a bit, since it creates the potential for things to get out of sync.
You are trying to move away from a monolith, and the approach you are taking is very common: carve out the part of the monolith that can be converted into a microservice. The monolith then shrinks over time, and you end up with a growing number of MSs.
Coming to your question about data duplication: yes, this is a challenge, and some data does need to be duplicated, but how much varies case by case and is difficult to say without looking at the application.
You may expose an API so the monolith can get/create the data when needed, and I strongly suggest not sacrificing or compromising the microservice's data model to avoid duplication, because the MS is going to matter more than your monolith in the future. Keep in mind that you should avoid adding any new code to the monolith, and even if you have to, ask the MS for data instead of the monolith.
One more thing you can try: instead of REST API calls between microservices, use a caching mechanism with an event bus. Every microservice publishes its CRUD changes to the event bus, and each interested microservice consumes those events and updates its local cache accordingly.
The problem with REST calls is that when the dependent service is down, we cannot query the main microservice, which can become a bottleneck.
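A sketch of the consuming side of this idea (Kafka and the kafka-python client are assumed; the topic name and event shape are made up):

```python
# Sketch of a microservice keeping a local cache warm from CRUD events,
# so reads keep working even when the owning service is down. Kafka and
# kafka-python are assumed; topic and event shape are illustrative.
import json
from kafka import KafkaConsumer

local_cache: dict = {}  # in practice a Redis instance or a local table

consumer = KafkaConsumer(
    "jobs.changes",                  # hypothetical topic of CRUD events
    bootstrap_servers="kafka:9092",  # hypothetical broker address
    group_id="matching-service",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    event = message.value
    if event["op"] in ("CREATE", "UPDATE"):
        local_cache[event["data"]["id"]] = event["data"]
    elif event["op"] == "DELETE":
        local_cache.pop(event["data"]["id"], None)
    # Reads in this service now hit local_cache instead of calling the
    # jobs service, at the cost of eventual consistency.
```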
I'm going to apply microservices to my data warehouse application. There are 4 main microservices in the application:
1) Data Service: imports/exports external data sources to the DWH and queries data from the DWH.
2) Analytics Service: for chart visualization in the UI.
3) Machine Learning: for the recommendation system.
4) Reports: for report generation.
Each service has its own DB, and they communicate directly with each other via TCP and Thrift serialization. The problem here is that the Data Service suffers a high load from the other services and can become a SPOF for the application. The data in the DWH is big, too (maybe up to hundreds of millions of records). How do I avoid the bottleneck at the Data Service in this case? Or how do I define a proper bounded context to avoid the bottleneck?
You may think about:
splitting the Data Service into a few microservices, based on some business logic;
modifying the Data Service (if needed) to support more than one instance of the service, then using a load balancer to split requests between those instances.
A load balancer is a device that acts as a reverse proxy and distributes network or application traffic across a number of servers. Load balancers are used to increase capacity (concurrent users) and reliability of applications.
Regarding "One database, multiple services":
Each microservice needs to have its own data storage; otherwise you do not have a decomposition. If we are talking about a relational database, then this can be achieved using one of the following patterns (a small sketch of the schema-per-service option follows the list):
Private tables per service – each service owns a set of tables that must only be accessed by that service.
Schema per service – each service has a database schema that's private to that service.
Database per service – each service has its own database.
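For illustration, the schema-per-service pattern could look like this (a sketch using SQLAlchemy; the connection URL, schema names, and tables are made up):

```python
# Sketch of the "schema per service" pattern with SQLAlchemy. The URL,
# schema names, and table definitions are illustrative assumptions.
from sqlalchemy import Column, Integer, String, MetaData, Table, create_engine

engine = create_engine("postgresql://dwh-host/warehouse")  # hypothetical URL

# Each service declares its tables inside a schema private to that service.
analytics_meta = MetaData(schema="analytics_service")
reports_meta = MetaData(schema="reports_service")

chart_configs = Table(
    "chart_configs", analytics_meta,
    Column("id", Integer, primary_key=True),
    Column("definition", String),
)

report_jobs = Table(
    "report_jobs", reports_meta,
    Column("id", Integer, primary_key=True),
    Column("status", String),
)

# create_all only touches the owning service's schema, so one service
# cannot accidentally alter tables that belong to another.
analytics_meta.create_all(engine)
reports_meta.create_all(engine)
```

Combined with per-schema database grants, this keeps each service's tables isolated even though they share one physical database.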
If your services use separate tables from the data warehouse database, and the Data Service only provides an access layer to the database without any additional processing logic, then yes, you may remove the Data Service and move the data-access logic into the corresponding services. But think about it from another angle: right now you have only one place (the Data Service) that knows how to access and manipulate the data warehouse, and that is what microservices are about.