Multi-tenancy with tenants sharing data

I'm currently building a multi-tenant web app that sells subscriptions. The tech I'm using is Rails.
However, it will not just be isolated tenants using the app.
Each tenant creates products and publishes them on their personal instance of the app. Each tenant has its own user base.
The problematic requirement is that a tenant may share its products with other tenants, so they can resell them.
Explanation:
FruitShop sells apples, oranges, and tomatoes.
VegetableShop sells radishes and bell peppers.
FruitShop shares tomatoes with other shops.
VegetableShop decides to pick tomatoes from the list of available shared items and adds them to its inventory.
Now a customer browsing VegetableShop will see radishes, bell peppers, and tomatoes.
As you can guess, a plain select from products where tenant_id = 'vegetableshop_id' will not work.
I was thinking of creating a many-to-many relationship with some kind of tenant_to_products table that would have tenant_id, product_id, price_id, and even publication begin/end dates. Products would become a "half-tenanted" table where the tenant ID is replaced by a tenant_creator_id identifying the original owner.
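For concreteness, the schema I have in mind would look roughly like this (a sketch in PostgreSQL-flavored SQL; it assumes tenants and prices tables already exist):

```sql
-- Products belong to the tenant that created them ("half tenanted").
CREATE TABLE products (
    id                BIGSERIAL PRIMARY KEY,
    tenant_creator_id BIGINT NOT NULL REFERENCES tenants(id),
    name              TEXT   NOT NULL
);

-- Which tenant sells which product, at what price, during which window.
CREATE TABLE tenant_to_products (
    tenant_id     BIGINT NOT NULL REFERENCES tenants(id),
    product_id    BIGINT NOT NULL REFERENCES products(id),
    price_id      BIGINT NOT NULL REFERENCES prices(id),
    publish_begin TIMESTAMP,
    publish_end   TIMESTAMP,
    PRIMARY KEY (tenant_id, product_id)
);
```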
To me it seems cumbersome: adding it would mean complex queries, even for shops selling only their own products. Getting the products on sale would be complicated:
```
select tenant_to_products.*
from tenant_to_products
where tenant_to_products.tenant_id = 'current_tenant'
  and (tenant_to_products matches the publication-date constraints)

for each tenant_to_product do
  # this will trigger a lot of DB calls
  display tenant_to_product.product with tenant_to_product.price
```
Un-sharing a product would also mean a complex update touching every tenant_to_products row that references the original product.
I'm not sure it would be a good idea to implement the constraint like this. What would you suggest I do? Am I planning something stupid, or is it a not-so-bad idea?

You are going to need a more complicated subscription to product mechanism, as you have already worked out. It sounds like you are on the right track.
Abstract the information as much as possible. For example, don't call the table 'tenant_to_product'; instead call it 'tenant_relationships', and have the product ID as a column in this table.
Then, when a tenant wants to offer services too, you can simply add a 'service_id' column to this table without having to add a whole extra table.
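This also answers the "lots of DB calls" worry: fetching everything the current tenant sells, shared or not, becomes a single join instead of one query per row. A sketch, reusing the column names from the question:

```sql
-- All products the current tenant sells, own or resold, in one query.
SELECT p.*, r.price_id
FROM tenant_relationships r
JOIN products p ON p.id = r.product_id
WHERE r.tenant_id = :current_tenant_id
  AND (r.publish_begin IS NULL OR r.publish_begin <= now())
  AND (r.publish_end   IS NULL OR r.publish_end   >  now());
```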
For performance, you can have a read-only database server with the tenant relationships that is updated on a slight delay. Azure or similar cloud services would make this easy to spin up. However, that probably isn't needed unless you're on the order of 1 million+ users.
I would suggest you also consider the following (a combined sketch follows the list):
Active/Inactive (Vegetable shop may prefer to temporarily stop selling Tomatoes, as they are quite faulty at the moment, until the grower stops including bugs with them)
Server-side services for notification, such as a 'productRemoved' service. These services will batch up changes, providing faster feedback to the user.
Don't delete information, instead set columns 'delete_date' and 'delete_user_id' or similar.
Full auditing history of changes to products, tenants, relationships, etc. This table will grow quite large, so avoid reading from it and ensure updates are asynchronous so that the caller isn't blocked waiting for the table to update. But it will probably be very useful from a business perspective.
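Pulling those suggestions together, the relationship table and its audit trail might look like this (a sketch; the column names are illustrative):

```sql
ALTER TABLE tenant_relationships
    ADD COLUMN active         BOOLEAN NOT NULL DEFAULT TRUE, -- temporarily stop selling
    ADD COLUMN delete_date    TIMESTAMP,                     -- soft delete instead of DELETE
    ADD COLUMN delete_user_id BIGINT;

-- Append-only audit history; write to it asynchronously and avoid
-- reading from it on hot paths, as it will grow quite large.
CREATE TABLE tenant_relationship_audit (
    id         BIGSERIAL PRIMARY KEY,
    tenant_id  BIGINT    NOT NULL,
    product_id BIGINT    NOT NULL,
    changed_by BIGINT    NOT NULL,
    changed_at TIMESTAMP NOT NULL DEFAULT now(),
    change     JSONB     NOT NULL  -- old/new values of whatever changed
);
```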
EDIT:
This related question may be useful if you haven't already seen it: How to create a multi-tenant database with shared table structures?

Multi-tenancy does seem the obvious answer, as you are providing your system to multiple clients.
However, as an alternative, perhaps consider a reseller 'service layer'; this would enable a degree of isolation whilst still offering integration, taking inspiration from how reseller accounts work with third parties like Amazon.
I realise this is very abstract reasoning, but perhaps considering the integration at a higher tier than the data layer could be of benefit.
From experience of strictly enforcing multi-tenancy at the data layer, we have found that tenancy sometimes has to be manipulated at the business layer (like your reseller idea) to such a degree that tenancy becomes a tricky concept. So considering alternatives early on could help.

Related

How to handle data migrations in distributed microservice databases

So I'm learning about microservices and common patterns, and I can't seem to find how to address this one issue.
Let's say my customer needs a module for managing customers and a module for managing purchase orders.
I believe that when dealing with microservices it's pretty natural to split these two functionalities into separate services, each having its own data:
CustomerService
PurchaseOrderService
Also, he wants a table of purchase orders displaying data from both customers and purchase orders, i.e.: customer name, order number.
Now, I don't want to use the API Composition pattern, because the user must be able to sort by any column he wants, which (AFAIK) is impossible with that pattern without slaughtering performance.
Instead, I chose the CQRS pattern:
after every purchase order / customer update, a message is sent to the message broker
the message broker notifies the third service about that message
the third service updates its projection in its own database
So, our third service:
PurchaseOrderTableService
It stores all the required data in a single database; now we can query it and sort by any column we like while still maintaining good performance.
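For illustration, the projection can be as simple as one denormalized table that PurchaseOrderTableService keeps up to date from broker messages (a sketch; the table and column names are made up):

```sql
-- Denormalized read model owned by PurchaseOrderTableService.
CREATE TABLE purchase_order_view (
    order_id      BIGINT PRIMARY KEY,
    order_number  TEXT   NOT NULL,
    customer_id   BIGINT NOT NULL,
    customer_name TEXT   NOT NULL
);

-- Applied whenever a customer-updated message arrives:
UPDATE purchase_order_view
SET customer_name = :new_name
WHERE customer_id = :customer_id;
```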
And now, the tricky part:
In the future, the client can change his mind and say, "Hey, I need the purchase orders table to display an additional column: 'Customer country'."
How does one handle that data migration? So far, PurchaseOrderTableService knows only about two columns: 'Customer name' and 'Order number'.
I imagine this is probably a pretty common problem, so what can I do to avoid reinventing the wheel?
I can of course make CustomerService generate a 'CustomerUpdatedMessage' for every existing customer, which would force PurchaseOrderTableService to update all its projections, but that seems like a workaround.
If it matters, the stack I had in mind is Java, Spring, Kafka, and PostgreSQL.
Divide the problem in two:
Keeping live data in sync: your projection service from now on also needs to persist Customer country, so all new orders will have the country as expected.
Backfilling the older orders: this is a one-off operation, so how you implement it really depends on your organization, technologies, etc. For example, you or a DBA can use whatever database tools you have to extract the data from the source database and do a bulk update to the target database (sketched below). In other cases, you might have to solve it programmatically, for example by creating a process in the projection microservice that queries the Customer microservice's API to get the data and updates the local copy.
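A minimal sketch of the bulk-update variant, assuming the countries have been exported from the Customer service into a staging table in the projection's database (all names hypothetical, and it presumes the customer_country column has already been added to the projection):

```sql
-- Staging table loaded from an export of the Customer service's data.
CREATE TABLE customer_country_staging (
    customer_id      BIGINT PRIMARY KEY,
    customer_country TEXT NOT NULL
);

-- One-off backfill of the new column on the projection.
UPDATE purchase_order_view v
SET customer_country = s.customer_country
FROM customer_country_staging s
WHERE s.customer_id = v.customer_id;
```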
Also note that in most cases you will already have a process to backfill data, because the need for the projection microservice might arise months or years after the orders and customers services were created. Other times, the search service is a third-party search engine such as Elasticsearch instead of a database. In those cases, I would always keep on hand a process to fully reindex the data.

Is it possible to replicate tables from multiple databases in Google Cloud?

The company that I work at uses a microservices architecture with the 'database per service' pattern. This pattern makes it harder to query based on data from multiple services, since each service has its own database. Imagine a service for managing your products and one for managing stock. You would have to somehow combine the data from both services to query for products based on stock.
I know that event sourcing and API composition are potential solutions to the problem, but I was wondering if it is possible to continuously replicate specific tables from the product and stock databases based on the database transaction logs. Wouldn't this be much simpler than, say, implementing an event-based solution like event sourcing? One service that I am working with contains a lot of domain events, which would make implementing and maintaining an event-based solution rather complex.
Another reason why I am considering the problem from a different angle is that there is a lot of data. In-memory joins with, say, API composition will most likely be slow.
To sum it all up, I would like to know if it is possible to continuously replicate specific tables from different databases into one database.
The technologies that my company uses are primarily Spring Framework and PostgreSQL.
I would step back and ask why you have microservices (including why you have multiple databases). This is because it's quite easy to make choices that are superficially easy but which achieve that ease by negating the reason you had the microservices to begin with, and in such a situation, it may in fact be easier to just not do microservices.
For example, you might be doing microservices because you want to be able to have the team maintaining your product service be able to make changes without coordinating with the stock service or vice versa. By setting up a direct replication of a table from service A's database into service B's database, you essentially require many changes service A might want to make to that table to be coordinated with service B. It's perhaps less operationally coupled than unifying the services into a monolith, but in terms of developer velocity, you're giving up a fair amount.
Alternatively, if the rationale is to allow one service to be down (failures, maintenance, releases: doesn't matter) without taking the others down, a replication which guarantees strong consistency implies that taking service B's database down prevents service A from updating its database (because if you allowed service A to update its database in that situation, you couldn't have strong consistency).
Rather than direct replication, it might make sense to use change data capture (e.g. with Debezium) to publish a stream of changes from the transaction logs (e.g. to Kafka). The critical difference from logical replication is that the consumer can, for instance, choose to ignore updates to columns it doesn't care about: the stock service might include details like where things are stocked in a warehouse, for instance, which is data you don't need for answering a query like "show me the products in this category which are in stock". This can be a nice middle ground between going full event-sourcing and other approaches.
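As a small illustration of that middle ground, the consuming service can maintain a narrow read model and simply never map the fields it doesn't care about (table and column names invented for the example):

```sql
-- Narrow read model in the product service, fed from the stock
-- service's change stream. Warehouse-location columns in the source
-- are never mapped here, so updates to them can be ignored.
CREATE TABLE product_stock_view (
    product_id BIGINT PRIMARY KEY,
    in_stock   BOOLEAN NOT NULL
);

-- Applied per row-change event (e.g. a Debezium message):
INSERT INTO product_stock_view (product_id, in_stock)
VALUES (:product_id, :quantity > 0)
ON CONFLICT (product_id) DO UPDATE SET in_stock = EXCLUDED.in_stock;
```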

How can I divide one database into multiple databases?

I want to decompose my application to adopt a microservices architecture, and I will need to come up with a solid strategy to split my database (MySQL) into multiple small databases (MySQL) aligned with my applications.
TL;DR: it depends on the scenario and on what each service will do.
Although there is no clear answer to this, since it really depends on your needs and on what each service should do, you can come up with a general starting point (assuming you don't need to keep the existing database type).
Let's assume you have a monolithic application for an e-commerce site, and you want to split it into smaller services, each one with its own database.
The approach you could use is to create services that each handle some part of the website: for example, you could have one service that handles user authentication, one for the orders, one for the products, one for the invoices, and so on...
Now, each service will have its own database, and here comes another question: which kind of database should a specific service have? One of the advantages of this kind of architecture is that each service can have its own kind of database. For example, the products service could have a non-relational database, such as MongoDB, since all it does is fetch details about products, so you don't have to manage any relations.
The orders service, on the other hand, could have a relational database, since you want to keep a relation between an order and the invoice for that order. But wait, invoices are handled by the invoice service, so how can you keep the relation between these two without sharing the database? Well, that's one of the "issues" of this approach: you have to keep services independent while also letting them communicate with each other. How can we do this? There is no clear answer here either... One approach could be to pass all invoice details to the orders service as well; or you can just store the invoice ID when saving the order and later retrieve the invoice via an API call to the invoice service; or you can pass the relevant invoice details to an API endpoint in the order service that stores them in a specific table in its database (since most of the time you don't need the entire object), etc... The possibilities are endless...
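A sketch of the ID-reference variant: the order merely records which invoice belongs to it, and the invoice itself lives in the other service (names are illustrative; note there is deliberately no foreign key, because the invoice row is in another database):

```sql
-- In the orders service's database. invoice_id is an opaque reference
-- to a row owned by the invoice service, resolved via an API call,
-- so no cross-database foreign key is possible or wanted.
CREATE TABLE orders (
    id          BIGSERIAL PRIMARY KEY,
    customer_id BIGINT    NOT NULL,
    invoice_id  BIGINT,   -- ID issued by the invoice service
    created_at  TIMESTAMP NOT NULL DEFAULT now()
);
```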

Microservices - Maintaining Multiple Data stores, initial data load etc

On aspects of microservice granularity, I have read about the two-pizza rule, services that can be developed in two weeks, etc. When you read the case studies of Amazon, Netflix, or Gilt, you hear about hundreds of services. While the service granularity does make sense, what is still not clear to me is the data stores of each of these microservices. Will there not be just too many data stores if each of the services stores/maintains its own data? It might be the same logical entity, like a product or a customer, that is sliced up, with the relevant portion/attributes stored/maintained by a corresponding microservice. There could be a service that maintains basic customer information and another that maintains additional customer information, like, say, subscription information or interests.
A couple of questions come to mind around the data stores:
Will this not be a huge maintenance issue in terms of backups, restores, etc.?
How is the initial data populated into these stores? Are there any best practices around this? Organisations are bound to have huge volumes of customer or product data, and it will most likely be mastered in other systems.
How does this approach of multiple data stores impact the 'omni-channel' approach, which implies getting a single view of all data? Organisations might have had data consolidation initiatives going on to achieve exactly that.
Edit: Edited the subject a bit
1. Will this not be a huge maintenance issue in terms of backups, restores, etc.?
From your view, yes it will. I mean, at the end of the day you will not have just one database server to back up but tens or hundreds of them. But mostly people - at least that is what we do - use a cloud database service to get rid of all this maintenance effort.
2. How is the initial data populated into these stores? Are there any best practices around this?
I am not sure if there is a best way, but we created a client to read the data from the legacy system, then convert and split it into parts for each microservice, and push them to those microservices by consuming their services. We used message queues to be sure about the health of the migration.
3. How does this approach of multiple data stores impact the 'omni-channel' approach, which implies getting a single view of all data?
Well, I don't know what "omni-channel" is, so I can't answer that.
Lastly, you mentioned logical entities shared between services. The real hardest part of implementing microservices is defining what each service will provide. While doing that, you should carefully examine the data needs of each service, and the services should share as little as possible, e.g. only entity IDs. At least that is what we are doing.

One database or many?

I am developing a website that will manage data for multiple entities. No data is shared between entities, but they may be owned by the same customer, and a customer may want to manage all their entities from a single "dashboard". So should I have one database for everything, or keep the data separated into individual databases?
Is there a best practice? What are the positives/negatives of having:
a database for the entire site (entity has a "customerID", data has an "entityID")
a database for each customer (data has an "entityID")
a database for each entity (the relation of database to customer is kept outside the database)
Multiple databases seems like it would have better performance (fewer rows and joins) but may eventually become a maintenance nightmare.
Personally, I prefer separate databases, specifically a database for each entity. I like this approach for the following reasons:
Smaller = faster regarding the queries.
Queries are simpler.
No risk of ever accidentally displaying one customer's data to another.
One database could pose a performance bottleneck as it gets large (as the number of entities increases). You get a sort of built-in horizontal scalability with one database per entity.
Easy data clean up as customers or entities are removed.
Sure, it'll take more time to upgrade the schema, but in my experience modifications are fairly uncommon once you deploy, and additions are trivial.
I think this is hard to answer without more information.
I lean toward one database. Properly coded business objects should prevent you from forgetting the clientId in your queries.
The type of database you are using and how it scales might help you make your decision.
For schema changes down the road, it seems one database would be easier from a maintenance perspective - you have one place to make them.
What about backup and restore? Could you experience a customer wanting to restore a backup for one of their entities?
This is a fairly normal scenario in multi-tenant SAAS applications. Both approaches have their pros and cons. Search on best practices for multi-tenant SAAS (software as a service) and you will find tons of stuff to ponder upon.
Check out this article on Microsoft's site. I think it does a nice job of laying out the different costs and benefits associated with multi-tenant designs. Also look at the Multi-tenancy article on Wikipedia. There are many trade-offs, and your best match greatly depends on what type of product you are developing.
One good argument for keeping them in separate databases is that it's easier to scale (you can simply have multiple installations of the server, with the client databases distributed across the servers).
Another argument is that once you are logged in, you don't need to add an extra WHERE clause (for the client ID) to each of your queries.
So a master DB backed by one DB per client may be a better approach.
If the client would ever need to restore only a single entity from a backup and leave the others in their current state, then maintenance will be much easier if each entity is in a separate database. If they can be backed up and restored together, then it may be easier to maintain the entities as a single database.
I think you have to go with the most realistic scenario and not necessarily what a customer "may" want to do in the future. If you are going to market that feature (i.e. seeing all your entities in one dashboard), then you have to either find a solution (maybe have the dashboard pull from multiple databases) or use a single database for the whole app.
IMHO, having the data for multiple clients in the same database just seems like a bad idea to me. You'll have to remember to always filter your queries by clientID.
It also depends on your RDBMS, e.g.:
With SQL Server, databases are cheap.
With Oracle, it is easy to partition tables by customer ("customerID"), so a single large database can run as fast as a small database per customer (see the sketch below).
However, whichever you choose, try to hide it at a low level in your data access code.
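For instance, partitioning by the customer key might look like this (a sketch in Oracle-style SQL; table and column names are invented):

```sql
-- Each customer's rows hash into their own partition, so queries that
-- filter on customer_id only have to touch one partition.
CREATE TABLE customer_orders (
    order_id    NUMBER PRIMARY KEY,
    customer_id NUMBER NOT NULL,
    total       NUMBER
)
PARTITION BY HASH (customer_id) PARTITIONS 16;
```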
Do you plan to have your code deployed to multiple environments?
If so, then try to keep it within one database and have all table references prefixed with a namespace from a configuration file.
The single-database option would also make maintenance much easier.
