Where should lookup tables be placed in a microservices architecture?

In a microservices architecture, each microservice has its own database, and tables should not be duplicated across databases.
But some tables, like lookup tables (also called reference tables), are needed by multiple microservices.
Should we put the lookup tables in each microservice's database, or is it better to create a new microservice with a database holding all the lookup tables?

Lookup tables usually contain read-only data (they are like view models), so they can be made available across the system through whatever technical solution you choose: a shared read-only database table, a distributed cache, even a CDN.
Make sense?
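For instance, each consuming service can keep its own cached copy of the reference data and refresh it periodically. A minimal Python sketch, assuming a hypothetical fetch_languages() source and a one-hour refresh interval (none of these names come from the original answer):

```python
import time
from typing import Dict

# Hypothetical fetch from wherever the reference data lives:
# a shared read-only table, a REST endpoint, or a CDN-hosted JSON file.
def fetch_languages() -> Dict[str, str]:
    return {"en": "English", "fr": "French", "de": "German"}

class LookupCache:
    """Read-through cache for read-mostly reference data."""

    def __init__(self, loader, ttl_seconds: int = 3600):
        self._loader = loader
        self._ttl = ttl_seconds
        self._data: Dict[str, str] = {}
        self._loaded_at = 0.0

    def get(self, key: str) -> str:
        # Reload the whole table when the cached copy is stale.
        if time.time() - self._loaded_at > self._ttl:
            self._data = self._loader()
            self._loaded_at = time.time()
        return self._data[key]

languages = LookupCache(fetch_languages)
print(languages.get("fr"))  # -> "French"
```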

Related

Oracle schema sharing, is it possible?

I'm trying to understand whether there is any concept like this in Oracle Database.
Let's say I have two databases, Database_A and Database_B.
Database_A has schema_A; is there a way I can attach this schema to Database_B?
What I mean is: if a job populates TABLE_A in schema_A, I want a read-only view of it in Database_B. We are trying to split a big Oracle database into two smaller databases; we have a vast amount of PL/SQL code and are trying to minimize the refactoring.
Sharding might be what you're looking for. The schemas and tables will still logically exist on all databases, but you can arrange for the data to be physically stored in specific databases. There might be a way to set up shardspaces, tablespaces, and user default tablespaces so that each schema's data is automatically stored in a specific database.
But I haven't actually used sharding. From what I've read, it seems to be designed for massive distributed OLTP systems, and it is likely complicated to administer. I'd guess this feature isn't worth the hassle unless you have petabytes of data.
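If it helps, the system-managed flavour of Oracle Sharding spreads a table's partitions across the shard databases via a tablespace set. A rough sketch, issued here through python-oracledb against the shard catalog; the connection details, table, and tablespace set name are illustrative assumptions, and pinning each schema's data to one specific database would instead need user-defined sharding with shardspaces:

```python
import oracledb  # python-oracledb driver

# Placeholders only; sharded DDL is issued on the shard catalog database,
# and the tablespace set (ts_set_a) must already have been created.
conn = oracledb.connect(user="app", password="secret", dsn="catalog-host/orclpdb")
cur = conn.cursor()

# Distribute the table's partitions across shards via the tablespace set.
cur.execute("""
    CREATE SHARDED TABLE table_a (
        id      NUMBER NOT NULL,
        payload VARCHAR2(200),
        CONSTRAINT table_a_pk PRIMARY KEY (id)
    )
    PARTITION BY CONSISTENT HASH (id)
    PARTITIONS AUTO
    TABLESPACE SET ts_set_a
""")
```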

Syncing data between services using Kafka JDBC Connector

I have a system with a microservice architecture. It has two services, Service A and Service B, each with its own database, as in the following diagram.
As far as I understand, having a separate database for each service is the better approach. In this design each service is the owner of its data and is responsible for creating, updating, deleting, and enforcing constraints.
In order to have Service A's data in Database B I was thinking of using the Kafka JDBC Connector, but I am not sure whether Table1 and Table2 in Database B should enforce the constraints from Database A.
If a constraint, like the foreign key from Table2 to Table1, should exist in Database B, is there a way to have the connector know about this?
What are other common or better ways to sync data or solve this problem?
The easiest solution seems to be to sync per table without any constraints in Database B. That would make things easier, but it could also lead to a situation where Service A's data in Service B is inconsistent, for example entries in Table2 that point to a non-existing entry in Table1.
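For concreteness, the source side of such a setup might be registered through the Kafka Connect REST API roughly like this (a sketch only; the Connect host, JDBC URL, column names, and topic prefix are assumptions):

```python
import requests

# Register a JDBC source connector that streams Table1 and Table2 from
# Database A into Kafka topics. All connection details are placeholders.
connector = {
    "name": "database-a-source",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:postgresql://database-a:5432/service_a",
        "connection.user": "replicator",
        "connection.password": "secret",
        "table.whitelist": "table1,table2",
        "mode": "incrementing",
        "incrementing.column.name": "id",
        "topic.prefix": "service-a.",
    },
}

resp = requests.post("http://connect:8083/connectors", json=connector)
resp.raise_for_status()
```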
"If a constraint, like the foreign key from Table2 to Table1, should exist in Database B, is there a way to have the connector know about this?"
No, unfortunately the Kafka JDBC Connector does not know about constraints.
Based on your question I assume that Table1 and Table2 in Database B are duplicates of tables that exist in Database A, and that Database A has constraints which you are not sure you should also add in Database B.
If that is the case, then I am not sure that using the Kafka JDBC Connector to sync the data is the best choice.
You have a couple of options:
Enforce constraints like foreign keys in Database B, but update it from your application level and not through the Kafka JDBC Connector. For this option you cannot use the Kafka JDBC Connector; you would need to write a small service/worker that reads the data from the Kafka topic and populates your database tables (see the sketch at the end of this answer). This way you control what is saved to the db and you can validate the constraints before saving to your database. But the question here is: do you really need the constraints? They are important in micro-service-A, but do you really need them in micro-service-B, which only holds a copy of the data?
Do not use constraints and allow temporary inconsistency. This is common in the microservices world. When working with distributed systems you always have to think about the CAP theorem, so you accept that some data might at some point be inconsistent, but you make sure that it is eventually brought back to a consistent state. This means you need to develop, at the application level, some cleanup/healing mechanism which will recognize this data and correct it. Db constraints do not necessarily have to be enforced on data which the micro-service does not own and which is considered external data to that micro-service's domain.
Rethink your design. Usually we duplicate data from micro-service-A in micro-service-B in order to avoid coupling between the services, so that micro-service-B can live and operate even when micro-service-A is down or not running for some reason. We also do it to reduce the load that micro-service-B would otherwise put on micro-service-A for every operation which needs data from Table1 and Table2. Table1 and Table2 are owned by micro-service-A, and micro-service-A is the only source of truth for this data. Micro-service-B uses a duplicate of that data for its operations.
Looking at your database design, the following questions might help you figure out the best option for your system:
Is it necessary to duplicate the data in micro-service-B?
If I duplicate the data, do I need both tables, and do I need all of their columns/data in micro-service-B? Usually you store/duplicate only the subset of the entity/table that you need.
Do I need the same table structure in micro-service-B as in micro-service-A? You have to decide this based on your domain, but very often you denormalize the tables and change them to fit the needs of micro-service-B's operations. As usual, all of these design decisions depend on your application domain and use case.
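As a rough illustration of the first option, a small worker could consume the topics and check the foreign key itself before writing to Database B. A sketch using kafka-python, with SQLite standing in for Database B; the topic names, payload shape, and table layout are assumptions:

```python
import json
import sqlite3

from kafka import KafkaConsumer  # kafka-python

# SQLite stands in for Database B; the schema mirrors the duplicated tables.
db = sqlite3.connect("database_b.db")
db.execute("CREATE TABLE IF NOT EXISTS table1 (id INTEGER PRIMARY KEY, name TEXT)")
db.execute(
    "CREATE TABLE IF NOT EXISTS table2 ("
    " id INTEGER PRIMARY KEY,"
    " table1_id INTEGER NOT NULL REFERENCES table1(id),"
    " amount REAL)"
)

consumer = KafkaConsumer(
    "service-a.table1",
    "service-a.table2",
    bootstrap_servers="kafka:9092",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

for message in consumer:
    row = message.value
    if message.topic.endswith("table1"):
        db.execute(
            "INSERT OR REPLACE INTO table1 (id, name) VALUES (?, ?)",
            (row["id"], row["name"]),
        )
    else:
        # Validate the foreign key before writing to the local copy.
        parent = db.execute(
            "SELECT 1 FROM table1 WHERE id = ?", (row["table1_id"],)
        ).fetchone()
        if parent is None:
            continue  # skipped here; a real worker would retry or dead-letter the row
        db.execute(
            "INSERT OR REPLACE INTO table2 (id, table1_id, amount) VALUES (?, ?, ?)",
            (row["id"], row["table1_id"], row["amount"]),
        )
    db.commit()
```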

Datavault: How to get hashes for foreign key relationships (populating link tables)

I've read the data vault book end to end, but I'm still trying to resolve one specific thing: how you'd populate the link tables (how to get all the hashes for that). The Scalefree blog post on massively parallel processing demonstrates that satellites and hubs can be loaded in a fully parallel fashion, but it doesn't go into much detail about the link tables.
Links require hash keys, and thus in some way the 'business keys' from multiple tables, to establish the relationships; that's what they do, they record relations between hubs. There aren't many good or in-depth explanations of how you would retrieve the business keys of the related entities when populating these link tables.
For a specific table like 'customer' things are easy for the hub and satellite: just convert the business key to a hash and load both of them in parallel.
But a customer details table or a transaction table from an OLTP system needs some kind of join in order to look up the business key for the customer, or to look up all the related entities in the transaction (product, customer, store, etc.), because those tables do not typically store (all) business key(s) as attributes.
If I assume that staging is loaded incrementally and truncated, then staging doesn't necessarily hold all the entities needed to perform those joins there. How do I resolve this dilemma and create a design that works?
Join tables in the source OLTP systems to generate the business keys there and propagate them onwards as hashes? (This goes wrong if the business key was chosen incorrectly.)
Use a persistent staging area, i.e. never truncate? (Then it's always possible to join on any table in there to resolve the keys.)
Use some kind of surrogate key -> business key index and perform a lookup there? (This minimizes I/O a bit further and is a mix between incremental staging and persistent staging.)
Some other method?
Essentially, what is the best practice for generating the hashes for all foreign key relations of your OLTP systems?
I talked to an expert about this and this is the answer I accepted from him:
The only two sensible ways to produce hashes for tables that do not have all the columns necessary to produce their business key are:
In the case where you have a full load of all the tables that hold the business keys (even if the link table itself is loaded incrementally), join to the relevant source tables in staging. This is fine, because you can guarantee that all the data is in staging at that moment.
In the case where you have incremental loads for the tables holding the business keys, you must use a persistent staging area (PSA) to do the resolution for you.
It is considered bad practice to join tables in source-system queries in order to generate the business keys, because the data warehouse should have as little operational impact on the source systems as possible.
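As a concrete illustration of the PSA route, resolving the customer business key for a customer-transaction link might look roughly like this. The table layout and the hashing convention (MD5 over trimmed, upper-cased business keys joined by a delimiter) are assumptions based on common Data Vault 2.0 practice, with SQLite standing in for the warehouse:

```python
import hashlib
import sqlite3

def hash_key(*business_keys: str) -> str:
    """MD5 over trimmed, upper-cased business keys joined by a delimiter,
    a common Data Vault 2.0 hashing convention."""
    normalized = "||".join(str(k).strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

# psa_customer is a persistent staging table; stg_transaction is the
# incremental (truncate-and-load) staging table.
dw = sqlite3.connect(":memory:")
dw.executescript("""
    CREATE TABLE psa_customer (customer_id INTEGER, customer_no TEXT);
    CREATE TABLE stg_transaction (transaction_no TEXT, customer_id INTEGER);
    CREATE TABLE link_customer_transaction (
        link_hk TEXT PRIMARY KEY, customer_hk TEXT, transaction_hk TEXT);
    INSERT INTO psa_customer VALUES (42, 'CUST-0042');
    INSERT INTO stg_transaction VALUES ('TX-1001', 42);
""")

# Resolve the customer business key through the PSA, since the incremental
# staging load only carries the surrogate customer_id.
rows = dw.execute("""
    SELECT t.transaction_no, c.customer_no
    FROM stg_transaction t
    JOIN psa_customer c ON c.customer_id = t.customer_id
""").fetchall()

for transaction_no, customer_no in rows:
    dw.execute(
        "INSERT OR IGNORE INTO link_customer_transaction VALUES (?, ?, ?)",
        (hash_key(customer_no, transaction_no),  # link hash key
         hash_key(customer_no),                  # customer hub hash key
         hash_key(transaction_no)),              # transaction hub hash key
    )
dw.commit()
```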

Multi-tenant database. One collection or one db per tenant?

For a multi-tenancy architecture for a web application using a document-oriented database I can see two conceivable options:
Having one database per tenant, with collections logically separating the different kinds of objects.
Having one collection per tenant, with all user data stored in one database and some kind of flag or object-type identifier on each record.
Have there been any studies or has any documentation been produced regarding these two options and the differences between them?
Is there a particular standard or good reason why someone designing a web application which allows multiple users to store vastly different kinds of data would choose one over the other?
Aside from speed/efficiency issues, are there any other things to be said about this that would influence the decision?
EDIT: I'm aware some of the terminology might be database-specific; for anyone wondering, I am specifically referring to MongoDB.
I wouldn't want tenant-specific collections. In my applications I usually hard-code collection names, in the same way I'd hard-code table names if I were using SQL tables: there'd be one comments collection that stores all comments for a blog. I would not want to deal with collection names like comments_tenant_1 and comments_tenant_2, because 1) that feels error-prone, 2) it would make the application code more complicated (collection names would have to be replaced with functions that compute the collection name), and 3) the number of collections in a single database could grow huge, which would make a list of all collections look daunting; MongoDB also isn't built for having very many collections (see the link David B posted in the comment below your question, https://docs.mongohq.com/use-cases/multi-tenant.html).
However, database names aren't coupled to application data structures, and you can grant permissions on databases (but not on single collections), so one database per tenant could be reasonable. So could a per-document tenant_id field in a single database shared by all tenants (see the above-mentioned link).
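To make the two layouts concrete, here is a small pymongo sketch of both; the collection and field names are illustrative:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")

# Option A: one database per tenant, shared collection names.
# Permissions can be granted per database, and the application keeps
# hard-coded collection names like "comments".
def comments_for_tenant(tenant: str):
    return client[f"tenant_{tenant}"]["comments"]

comments_for_tenant("acme").insert_one({"post_id": 1, "text": "hi"})

# Option B: one shared database, a tenant_id field on every document.
shared = client["app"]["comments"]
shared.insert_one({"tenant_id": "acme", "post_id": 1, "text": "hi"})
for doc in shared.find({"tenant_id": "acme"}):
    print(doc["text"])
```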

Working with LINQ to Entities against multiple SQL Server databases

I'm building a project made up of a number of sites with a common subject.
The sites rely on one central database that holds the common info for all of them.
In addition, each site has another database that holds its unique info (I will refer to it as the unique-db below so I won't be misunderstood).
For example, the Languages table sits in the central db. I then noticed that I need to reference the Languages table from one of my unique-dbs so it can act as a foreign key target, without creating the same table again in the unique-db.
Do I have to create the same table again, this time in the unique-db? Or is there a way to reference tables across separate databases?
In addition, we decided to use LINQ to Entities, and soon we're going to run some complex queries against the different databases. Will that be a problem?
How should I go on with this? Was it wise to split the data into a few databases?
I really appreciate all the help I can get!
One thing that might make your life easier is to create views of the central tables in each unique db. LINQ to Entities will pick up views as if they were tables.
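On SQL Server that can be as simple as a cross-database view in each unique db, sketched here via pyodbc; the server, database, and column names are assumptions:

```python
import pyodbc

# Connect to the site-specific (unique) database.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=dbserver;DATABASE=SiteOneDb;Trusted_Connection=yes;"
)
cur = conn.cursor()

# Expose the central Languages table inside the unique db so LINQ to
# Entities can map the view like a local table. Note that a real foreign
# key cannot span databases, so referential integrity would still need to
# be enforced in application code or with a trigger.
cur.execute("""
    CREATE VIEW dbo.Languages AS
    SELECT LanguageId, Name
    FROM CentralDb.dbo.Languages
""")
conn.commit()
```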
