How to handle data migrations in distributed microservice databases

How to handle data migrations in distributed microservice databases - microservices

so im learning about microservices and common patterns and i cant seem to find how to address this one issue.
Lets say that my customer needs a module managing customers, and a module managing purchase orders.
I believe that when dealing with microservices its pretty natural to split these two functionalities into separate services - each having its own data.
CustomerService
PurchaseOrderService
Also, he wants to have a table of purchase orders displaying the data of both customers and purchase orders, ie .: Customer name, Order number.
Now, i dont want to use the API Composition pattern because the user must be able to sort over any column he wants which (afaik) is impossible to do without slaughtering the performance using that pattern.
Instead, i choose CQRS pattern
after every purchase order / customer update a message is sent to the message broker
message broker notifies the third service about that message
the third service updates its projection in its own database
So, our third service .:
PurchaseOrderTableService
It stores all the required data in the single database - now we can query it, sort over any column we like while still maintaining a good performance.
And now, the tricky part .:
In the future, client can change his mind and say "Hey, i need the purchase orders table to display additional column - 'Customer country'"
How does one handle that data migration? So far, The PurchaseOrderTableService knows only about two columns - 'Customer name' and 'Order number'.
I imagine that this probably a pretty common problem, so what can i do to avoid reinventing the wheel?
I can of course make CustomerService generate 'CustomerUpdatedMessage' for every existing customer which would force PurchaseOrderTableService to update all its projections, but that seems like a workaround.
If that matters, the stack i thought of is java, spring, kafka, postgresql.

Divide the problem in 2:
Keeping live data in sync: your projection service from now on also needs to persist Customer Country, so all new orders will have the country as expected.
Backfill the older orders: this is a one off operation, so how you implement it really depends on your organization, technologies, etc. For example, you or a DBA can use whatever database tools you have to extract the data from the source database and do a bulk update to the target database. In other cases, you might have to solve it programmatically, for example creating a process in the projection microservice that will query the Customer's microservice API to get the data and update the local copy.
Also note that in most cases, you will already have a process to backfill data, because the need for the projection microservice might arrive months or years after the orders and customers services were created. Other times, the search service is a 3rd party search engine, like Elastic Search instead of a database. In those cases, I would always keep in hand a process to fully reindex the data.

Related

Is it possible to replicate tables from multiple databases in Google Cloud?

The company that I work at uses a microservices architecture with the 'database per service' pattern. This pattern makes it harder to query based on data from multiple services, since each service has its own database. Imagine a service for managing your products and one for managing stock. You would have to somehow combine the data from both services to query for products based on stock.
I know that event sourcing and API composition are potential solutions to the problem, but I was wondering if it is possible to continuously replicate specific tables from the product and stock databases based on database transaction logs. Wouldn't this be much simpler than say implementing an event based solution like event sourcing? One service that I am working with contains a lot of domain events, which would make implementing and maintaining event-based solution rather complex.
Another reason for why I am considering to look at the problem from a different angle is that there is a lot of data. In-memory joins with say API composition will most likely be slow.
To sum it all up, I would like to know if it is possible to continuously replicate specific tables from different databases into one database.
The technologies that my company uses are primarily Spring Framework and PostgreSQL.

I would step back and ask why you have microservices (including why you have multiple databases). This is because it's quite easy to make choices that are superficially easy but which achieve that ease by negating the reason you had the microservices to begin with, and in such a situation, it may in fact be easier to just not do microservices.
For example, you might be doing microservices because you want to be able to have the team maintaining your product service be able to make changes without coordinating with the stock service or vice versa. By setting up a direct replication of a table from service A's database into service B's database, you essentially require many changes service A might want to make to that table to be coordinated with service B. It's perhaps less operationally coupled than unifying the services into a monolith, but in terms of developer velocity, you're giving up a fair amount.
Alternatively, if the rationale is to allow one service to be down (failures, maintenance, releases: doesn't matter) without taking the others down, a replication which guarantees strong consistency implies that taking service B's database down prevents service A from updating its database (because if you allowed service A to update its database in that situation, you couldn't have strong consistency).
Rather than direct replication, it might make sense to use change data capture (e.g. with Debezium) to publish a stream of changes from the transaction logs (e.g. to Kafka). The critical difference from logical replication is that the consumer can, for instance, choose to ignore updates to columns it doesn't care about: the stock service might include details like where things are stocked in a warehouse, for instance, which is data you don't need for answering a query like "show me the products in this category which are in stock". This can be a nice middle ground between going full event-sourcing and other approaches.

how to break products and users and orders table in microservice design

Assume that I have three table. products, and orders, and users
I can sell my products in three ways. mobile, and desktop version and mobile-web version.
I wanted to know how should I design my microservice. I believe that I need to have users in all of my services, because, for example I need to see list of orders plus who has ordered in my imaginary order service. Additionally, I need to have user in my ProductCatalog service in order to know who has created this product . (Imagine I haven't separated my backend user's and my end users)

I would suggest thinking about the data. What are the data types that you have? Who is the producer of each type? Who is the consumer? What is the relationship between data types? What type of queries will be circulated in your system? What does a service need to know in order to reply to a query?
Then plan your services around the data. One or more services that produce each type. One or more services that consume each type. One or more services that reply to each query type.
You would obviously need a runtime engine that supports the suggested model. I can suggest Spiderwiz, an open source programming model and runtime that I believe would fit your needs nicely.

How can I divide one database to multi databases?

I want to decompose my application to adopt microservices architecture, and i will need to come up with a solid strategy to split my database (Mysql) into multiple small databases (mysql) aligned with my applications.

TL;DR: Depends on the scenario and from what each service will do
Although there is no clear answer to this, since it really depends on your needs and on what each service should do, you can come up with a general starting point (assuming you don't need to keep the existing database type).
Let's assume you have a monolithic application for an e-commerce, and you want to split this application into smaller services, each one with it's own database.
The approach you could use is to create some services that handles some parts of the website: for example you could have one service that handles users authentication,one for the orders, one for the products, one for the invoices and so on...
Now, each service will have it's own database, and here's come another question: which database a specific service should have? Because one of the advantages of this kind of architecture is that each service can have it's own kind of database, so for example the products service can have a non relational database, such as MongoDB, since all it does is getting details about products, so you don't have to manage any relation.
The orders service, on the other hand, could have a relational database, since you want to keep a relation between the order and the invoice for that order. But wait, invoices are handled by the invoice service, so how can you keep the relation between these two without sharing the database? Well, that's one of the "issues" of this approach: you have to keep services independent while also let them communicate each other. How can we do this? There is no clear answer here too... One approach could be to just pass all invoices details to the orders service as well, or you can just pass the invoice ID when saving the order and later retrieve the invoice via an API call to the invoice service, or you can pass all the relevant details you need for the invoice to an API endpoint in the order service that stores these data to a specific table in the database (since most of the times you don't need the entire actual object), etc... The possibilities are endless...

Distributed database design style for microservice-oriented architecture

I am trying to convert one monolithic application into micro service oriented architecture style. Back end I am using spring , spring boot frameworks for development. Front-end I am using angular 2. And also using PostgreSQL as database.
Here my confusion is that, when I am designing my databases as distributed, according to functionalities it may contain 5 databases. Means I am designing according to vertical partition. Then I am thinking to implement inter-microservice communication services to achieve the entire functionality.
The other way I am thinking that to horizontally partition the current structure. So my domain is based on some educational university. So half of university go under one DB and remaining will go under another DB. And deploy services according to Two region (two for two set of university).
Currently I am decided to continue with the last mentioned approach. I am new to these types of tasks, since it referring some architecture task. Also I am beginner to this microservice and distributed database world. Would someone confirm that my approach will give solution to my issue? Can I continue with my second approach - horizontal partitioning of databases according to domain object?

Can I continue with my second approach - Horizontal partitioning of
databases according to domain object?
Temporarily yes, if based on that you are able to scale your current system to meet your needs.
Now lets think about why on the first place you want to move to Microserices as a development style.
Small Components - easier to manager
Independently Deployable - Continous Delivery
Multiple Languages
The code is organized around business capabilities
and .....
When moving to Microservices, you should not have multiple services reading directly from each other databases, which will make them tightly coupled.
One service should be completely ignorant on how the other service designed its internal structure.
Now if you want to move towards microservices and take complete advantage of that, you should have vertical partition as you say and services talk to each other.
Also while moving towards microservices your will get lots and lots of other problems. I tried compiling on how one should start on microservices on this link .
How to separate services which are reading data from same table:
Now lets first create a dummy example: we have three services Order , Shipping , Customer all are three different microservices.
Following are the ways in which multiple services require data from same table:
Service one needs to read data from other service for things like validation.
Order and shipping service might need some data from customer service to complete their operation.
Eg: While placing a order one will call Order Service API with customer id , now as Order Service might need to validate whether its a valid customer or not.
One approach Database level exposure -- not recommened -- use the same customer table -- which binds order service to customer service Impl
Another approach, Call another service to get data
Variation - 1 Call Customer service to check whether customer exists and get some customer data like name , and save this in order service
Variation - 2 do not validate while placing the order, on OrderPlaced event check in async from Customer Service and validate and update state of order if required
I recommend Call another service to get data based on the consistency you want.
In some use cases you want a single transaction between data from multiple services.
For eg: Delete a customer. you might want that all order of the customer also should get deleted.
In this case you need to deal with eventual consistency, service one will raise an event and then service 2 will react accordingly.
Now if this answers your question than ok, else specify in what kind of scenario multiple service require to call another service.
If still not solved, you could email me on puneetjindal.11#gmail.com, will answer you

Currently I am decided to continue with the last mentioned approach.
If you want horizontal scalability (scaling for increasingly large number of client connections) for your database you may be better of with a technology that was designed to work as a scalable, distributed system. Something like CockroachDB or NoSQL. Cockroachdb for example has built in data sharding and replication and allows you to grow with adding server nodes as required.
when I am designing my databases as distributed, according to functionalities it may contain 5 databases
This sounds like you had the right general idea - split by domain functionality. Here's a link to a previous answer regarding general DB design with micro services.

In the Microservices world, each Microservice owns a set of functionalities and the data manipulated by these functionalities. If a microservice needs data owned by another microservice, it cannot directly go to the database maintained/owned by the other microservice rather it would call an API exposed by the other microservice.
Now, regarding the placement of data, there are various options - you can store data owned by a microservice in a NoSQL database like MongoDB, DynamoDB, Cassandra (it really depends on the microservice's use-case) OR you can have a different table for each micro-service in a single instance of a SQL database. BUT remember, if you choose a single instance of a SQL Database with multiple tables, then there would be no joins (basically no interaction) between tables owned by different microservices.
I would suggest you start small and then think about database scaling issues when the usage of the system grows.

Multi tenancy with tenant sharing data

I'm currently in the process of making a webapp that sell subscriptions as a multi tenant app. The tech i'm using is rails.
However, it will not just be isolated tenants using the current app.
Each tenant create products and publish them on their personnal instance of the app. Each tenant has it's own user base.
The problematic specification is that a tenant may share its product to others tenants, so they can resell it.
Explanation :
FruitShop sells apple oranges and tomatoes.
VegetableShop sells radish and pepper bell.
Fruitshop share tomatoes to other shops.
VegetableShop decide to get tomatoes from the available list of shared
items and add it to its inventory.
Now a customer browsing vegetableshop will see radish, pepper bell and
Tomatoes.
As you can guess, a select products where tenant_id='vegetableshop_ID' will not work.
I was thinking of doing a many to many relation with some kind of tenant_to_product table that would have tenant_id, product_id, price_id and even publish begin-end dates. And products would be a "half tenanted table" where the tenant ID is replaced by tenant_creator_id to know who is the original owner.
To me it seems cumbersome, adding it would mean complex query, even for shop selling only their own produts. Getting the sold products would be complicated :
select tenant_to_products.*
where tenant_to_products.tenant_ID='current tenant'
AND (tenant_to_products.product match publication constraints)
for each tenant_to_product do
# it will trigger a lot of DB call
Display tenant_to_product.product with tenant_to_product.price
Un-sharing a product would also mean a complex update modifying all tenant_to_products referencing the original product.
I'm not sure it would be a good idea to implement this constraint like this, what do you suggest me to do? Am I planning to do something stupid or is it a not so bad idea?

You are going to need a more complicated subscription to product mechanism, as you have already worked out. It sounds like you are on the right track.
Abstract the information as much as possible. For example, don't call the table 'tenant_to_product', instead call it 'tenant_relationships', and have the product Id as a column in this table.
Then, when the tenant wants to have services, you can simply add a column to this table 'service Id' without having to add a whole extra table.
For performance, you can have a read-only database server with tenant relationships that is updated on a slight delay. Azure or similar cloud services would make this easy to spin up. However, that probably isn't needed unless you're in the order of 1 million+ users.
I would suggest you consider:
Active/Inactive (Vegetable shop may prefer to temporarily stop selling Tomatoes, as they are quite faulty at the moment, until the grower stops including bugs with them)
Server-side services for notification, such as 'productRemoved' service. These services will batch-up changes, providing faster feedback to the user.
Don't delete information, instead set columns 'delete_date' and 'delete_user_id' or similar.
Full auditing history of changes to products, tenants, relationships, etc. This table will grow quite large, so avoid reading from it and ensure updates are asynchronous so that the caller isn't blocked waiting for the table to update. But it will probably be very useful from a business perspective.
EDIT:
This related question may be useful if you haven't already seen it: How to create a multi-tenant database with shared table structures?

Multi-tenancy does seem the obvious answer as you are providing multiple clients your system.
However as an alternative, perhaps consider a reseller 'service layer', this would enable a degree of isolation whilst still offering integration. Taking inspiration to how reseller accounts work with 3rd parties like Amazon.
I realise this is very abstract reasoning but perhaps considering the integration at a higher tier than the data layer could be of benefit.
From experience strictly enforcing multi-tenancy at a data layer level we have found that tenancy sometimes has to be manipulated at a business layer (like your reseller ideas) to such a degree that the tenancy becomes a tricky concept. So considering alternatives early on could help.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio