I have an ER diagram and I want to divide it so that I can implement multiple microservices.
I used to implement monolithic applications. Is there a guideline on how to divide an ER diagram into microservices?
For example, a user microservice would have the user table and the user type table. How should I split the remaining tables: car part, car, country, complaint, shop?
In the microservices architecture, the established way of defining microservices is known as decomposition by subdomain, which is a top-down approach, based on the bounded contexts of your system.
Once they're identified, you can then figure out the data access model that is appropriate (which tables should be used by each microservice).
So I would advise following that model instead of going bottom-up, from the database to the definition of the microservices.
Is it a cardinal rule of microservices that a single database table should only be represented by a single microservice? I was asked that in an interview. My first reaction was that it should only be 1 to 1. But then I think I was overthinking it, thinking that maybe there are some edge case scenarios where that may be acceptable.
So is it a cardinal rule of microservices that a single database table should always be represented by a single microservice? Or are there some edge case scenarios where that may be acceptable? If it's a cardinal rule, is there a standard acronym that includes that principle? For example, relational databases have the ACID principles.
It is not a cardinal rule. But, it is the most effective way to manage data. Design patterns are not set in stone. You may choose to handle things differently.
But, each microservice should be independent. This is why we use the microservices architecture. But, say you update a table using multiple microservices, then they (the services) become interdependent. Loose coupling no longer exists. The services will impact each other any time a change takes place.
This is why you may want to follow one of these patterns:
Private-tables-per-service – each service owns a set of tables that must only be accessed by that service.
Schema-per-service – each service has a database schema that's private to that service.
Database-server-per-service – each service has its own database server.
Refer to the data management section here for more: https://microservices.io/patterns/
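As a rough illustration of the private-tables idea, here is a minimal sketch in Python. It assumes two hypothetical services, `UserService` and `ComplaintService`, each backed by its own in-memory SQLite database; the class names, schemas, and methods are made up for the example. The point is only that the complaint service never queries the user table directly and goes through the user service's interface instead.

```python
import sqlite3


class UserService:
    """Hypothetical service that owns the user table; nobody else touches its database."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")  # private to this service
        self._db.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT)")

    def create_user(self, name):
        cur = self._db.execute("INSERT INTO user (name) VALUES (?)", (name,))
        self._db.commit()
        return cur.lastrowid

    def get_user_name(self, user_id):
        row = self._db.execute(
            "SELECT name FROM user WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None


class ComplaintService:
    """Hypothetical service that owns the complaint table and stores only user ids."""

    def __init__(self, user_service):
        self._db = sqlite3.connect(":memory:")  # a separate, private database
        self._db.execute(
            "CREATE TABLE complaint (id INTEGER PRIMARY KEY, user_id INTEGER, text TEXT)"
        )
        # In a real deployment this would be an HTTP or messaging client.
        self._users = user_service

    def file_complaint(self, user_id, text):
        # User data is reached through the owning service, never by a cross-database join.
        if self._users.get_user_name(user_id) is None:
            raise ValueError("unknown user")
        cur = self._db.execute(
            "INSERT INTO complaint (user_id, text) VALUES (?, ?)", (user_id, text)
        )
        self._db.commit()
        return cur.lastrowid

    def describe(self, complaint_id):
        user_id, text = self._db.execute(
            "SELECT user_id, text FROM complaint WHERE id = ?", (complaint_id,)
        ).fetchone()
        return f"{self._users.get_user_name(user_id)}: {text}"
```

If the user service later changes its schema, the complaint service is unaffected as long as `get_user_name` keeps its contract, which is the loose coupling described above.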
It is not just about a separate database for each microservice; there are other factors to consider when developing microservices, such as the codebase, configuration, and logs.
Please refer to the link below, which explains this in detail.
https://12factor.net/
I am a fan of domain-driven design and I always try to persuade companies to use DDD, but it is always rejected because of its poor performance! I am currently working on a project with a large amount of data, where some of the stored procedures take more than an hour to complete. Is it possible to use DDD with this kind of project? I guess it would take a day to run that stored procedure's logic in domain models, even without an ORM!
I don't see why domain driven design ruins performance. In my experience it adds a few extra mapping layers DB <-> Persistence Model <-> Domain Model. The mapping overhead tends towards zero when factored against round trips to a database.
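A minimal sketch of those mapping layers in Python, assuming a hypothetical `OrderRow` persistence model and `Order` domain model (neither comes from the question). The two mapping functions are plain in-memory attribute copies, which is why their cost is small next to a database round trip.

```python
from dataclasses import dataclass


@dataclass
class OrderRow:
    """Persistence model: mirrors a database row (hypothetical columns)."""
    id: int
    total_cents: int
    status: str


@dataclass
class Order:
    """Domain model: carries behaviour and rules, not just data."""
    id: int
    total_cents: int
    paid: bool

    def pay(self):
        # A domain rule enforced in code rather than in the database.
        if self.paid:
            raise ValueError("order already paid")
        self.paid = True


def to_domain(row: OrderRow) -> Order:
    """Persistence model -> domain model."""
    return Order(id=row.id, total_cents=row.total_cents, paid=(row.status == "PAID"))


def to_row(order: Order) -> OrderRow:
    """Domain model -> persistence model."""
    status = "PAID" if order.paid else "OPEN"
    return OrderRow(id=order.id, total_cents=order.total_cents, status=status)
```

This does not address set-based work that already takes an hour inside a stored procedure; it only shows what the extra layers amount to for ordinary object-at-a-time operations.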
I did a bit of R&D on fact tables, specifically on whether they are normalized or de-normalized.
I came across some findings that left me confused.
According to Kimball:
Dimensional models combine normalized and denormalized table structures. The dimension tables of descriptive information are highly denormalized with detailed and hierarchical roll-up attributes in the same table. Meanwhile, the fact tables with performance metrics are typically normalized. While we advise against a fully normalized with snowflaked dimension attributes in separate tables (creating blizzard-like conditions for the business user), a single denormalized big wide table containing both metrics and descriptions in the same table is also ill-advised.
The other finding, which I also think is OK, is by fazalhp at GeekInterview:
The main funda of DW is de-normalizing the data for faster access by the reporting tool...so if ur building a DW ..90% it has to be de-normalized and off course the fact table has to be de normalized...
So my question is: are fact tables normalized or de-normalized? Whichever it is, how and why?
From the point of relational database design theory, dimension tables are usually in 2NF and fact tables anywhere between 2NF and 6NF.
However, dimensional modelling is a methodology unto itself, tailored to:
one use case, namely reporting
mostly one basic type (pattern) of a query
one main user category -- business analyst, or similar
row-store RDBMS like Oracle, SQL Server, Postgres ...
one independently controlled load/update process (ETL); all other clients are read-only
There are other DW design methodologies out there, like
Inmon's -- data structure driven
Data Vault -- data structure driven
Anchor modelling -- schema evolution driven
The main thing is not to mix up database design theory with a specific design methodology. You may look at a certain methodology from a database design theory perspective, but you have to study each methodology separately.
Most people working with a data warehouse are familiar with transactional RDBMS and apply various levels of normalization, so those concepts are used to describe working with a star schema. What they're doing is trying to get you to unlearn all those normalization habits. This can get confusing because there is a tendency to focus on what "not" to do.
The fact table(s) will probably be the most normalized, since they usually contain just numerical values along with various IDs for linking to dimensions. The key with fact tables is how granular you need to get with your data. An example for Purchases could be specific line items by product in an order, or data aggregated at a daily, weekly, or monthly level.
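To make the grain point concrete, here is a small sketch using Python's built-in `sqlite3` module. The table names (`dim_product`, `dim_date`, `fact_purchase`), the columns, and the sample rows are all invented for illustration. The fact table holds only dimension keys and numeric measures at order-line grain, and a coarser monthly view is derived from it by aggregation.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, day TEXT, month TEXT);
-- Fact table at order-line grain: dimension keys plus numeric measures only.
CREATE TABLE fact_purchase (
    order_id INTEGER,
    product_id INTEGER REFERENCES dim_product,
    date_id INTEGER REFERENCES dim_date,
    quantity INTEGER,
    amount_cents INTEGER
);
""")
db.executemany("INSERT INTO dim_product VALUES (?, ?, ?)", [
    (1, "brake pad", "parts"), (2, "wiper", "parts"), (3, "car wash", "service"),
])
db.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [
    (1, "2024-01-05", "2024-01"), (2, "2024-01-20", "2024-01"), (3, "2024-02-02", "2024-02"),
])
db.executemany("INSERT INTO fact_purchase VALUES (?, ?, ?, ?, ?)", [
    (100, 1, 1, 2, 4000), (100, 2, 1, 1, 1500), (101, 3, 2, 1, 1000), (102, 1, 3, 1, 2000),
])


def monthly_totals(conn):
    """Roll the line-item grain up to (month, category) totals."""
    return conn.execute("""
        SELECT d.month, p.category, SUM(f.amount_cents)
        FROM fact_purchase f
        JOIN dim_product p USING (product_id)
        JOIN dim_date d USING (date_id)
        GROUP BY d.month, p.category
        ORDER BY d.month, p.category
    """).fetchall()
```

Storing the finest grain you need and aggregating at query time keeps the options open; storing only monthly totals would make line-item questions impossible to answer later.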
My suggestion is to keep searching and studying how to design a warehouse based on your needs. Don't aim for high normal forms. Think more about the reports you want to generate and the analysis capabilities you want to give your users.
Let's imagine a social network where each user can gain reputation from others by, say, delegation. So, given that A and B each initially have a reputation of 1, when A delegates to B, A has 0 and B has 2.
Then B can delegate to C and so on.
Also, a delegation has a scope, and scopes can be nested. So A can delegate reputation on all topics, or only on programming, or only on C#. And A can delegate on programming to B but on C# to C. That means the final reputation varies depending on the given scope.
So we get a kind of directed graph structure (probably a tree, but it's not yet clear how to handle cycles) which we need to traverse to calculate the reputation.
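One possible reading of that traversal, sketched in Python. It rests on several assumptions that are not stated above: every user starts with 1 point, a user has at most one delegate per scope, the most specific matching scope wins, every delegate appears in `users`, and a cycle is treated as an error. The function and parameter names are hypothetical.

```python
def scope_chain(scope, parent):
    """Yield the scope and then its ancestors, most specific first."""
    while scope is not None:
        yield scope
        scope = parent.get(scope)


def effective_delegate(user, scope, delegations, parent):
    """Return the delegate for the most specific scope that applies, or None."""
    for s in scope_chain(scope, parent):
        if (user, s) in delegations:
            return delegations[(user, s)]
    return None


def reputation(users, scope, delegations, parent):
    """Follow each user's single point along the delegation chain to its final holder."""
    rep = {u: 0 for u in users}
    for u in users:
        seen = {u}
        holder = u
        while True:
            nxt = effective_delegate(holder, scope, delegations, parent)
            if nxt is None:
                break
            if nxt in seen:
                # Assumption: cycles are rejected rather than resolved.
                raise ValueError(f"delegation cycle involving {nxt!r}")
            seen.add(nxt)
            holder = nxt
        rep[holder] += 1
    return rep
```

With `parent = {"all": None, "programming": "all", "c#": "programming"}` and A delegating programming to B and C# to C, A's point ends up with C in the C# scope and with B in the programming scope. The walk costs one chain traversal per user, so it is a baseline rather than an efficient solution for large graphs.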
I'm trying to model that with DDD principles and I'm not sure what is the aggregate here.
I suppose the delegation tree/graph is a candidate for that, as the aggregate is a unit of consistency. However, that means the aggregates would be very large. The scope complicates it even more because it makes the aggregate boundary unclear. Is a delegation on C# part of the same aggregate as delegations on programming?
What about the user? As an aggregate it would have to store references (delegations) to/from other users. Again, which aggregate does a given user belong to?
A separate question is how to efficiently calculate the reputation. I guess a graph database would be more appropriate than a relational one in this case, but is that the only good answer?
A root aggregate is meant to enforce invariants. The rules of delegation you've described are one set of invariants. Without knowing what other invariants you may require, it is hard to tell what a suitable root aggregate would be, but going only by what you've presented, "user" seems to me a perfect root aggregate to enforce all your delegation rules as invariants. A user may have one or more delegation scopes, which may themselves be root aggregates. A user can, under the rules of delegation, delegate to another user, who may in turn delegate under those same rules. This allows you to enforce all your invariants, and there is no problem with storing references to (other) users under the rules of DDD.
Keep asking how you can enforce your domain specific rules consistently and you will find your root aggregates.
On your separate question: a graph database seems like a better idea than a relational database, but it's hard to tell with limited information. I suggest that you post this question separately and include your considerations about relational versus graph databases.
I am new to Hadoop, MapReduce, and Big Data, and am trying to evaluate their viability for a specific use case that is extremely interesting to the project I am working on. I am not sure, however, whether what I would like to accomplish is A) possible or B) recommended with the MapReduce model.
We essentially have a significant volume of widgets (known structure of data) and pricing models (codified in JAR files) and what we want to be able to do is to execute every combination of widget and pricing model to determine the outcomes of the pricing across the permutations of the models. The pricing models themselves will examine each widget and determine pricing based on the decision tree within the model.
To my mind this makes sense from the perspective of parallel processing on commodity infrastructure. From a technical perspective, however, I do not know whether it's possible to execute external models within MR jobs, and from a practical perspective I do not know whether I am trying to force a use case onto the technology.
The question therefore becomes is it possible; does it make sense to implement in this fashion; and if not what are other options / patterns more suited to this scenario?
EDIT
The volume and variety will grow over time. Assume for the sake of discussion that we currently have a terabyte of widgets and tens of pricing models. We would then expect to grow into multiple terabytes and hundreds of pricing models, and the execution of the permutations would happen frequently as widgets change and/or are added and as new categories of pricing models are introduced.
You certainly need a scalable, parallelizable solution, and Hadoop can be that. You just have to massage your solution a bit so that it fits into the Hadoop world.
First, you'll need to make the models and widgets implement common interfaces (speaking very abstractly here) so that you can apply an arbitrary model to an arbitrary widget without having to know anything about the actual implementation or representation.
Second, you'll have to be able to reference both models and widgets by id. That will let you build objects (writables) that hold the id of a model and the id of a widget and would thus represent one "cell" in the cross product of widgets and models. You distribute these instances across multiple servers and in doing so distribute the application of models to widgets across multiple servers. These objects (call the class ModelApply) would hold the results of a specific model-to-widget application and can be processed in the usual way with Hadoop to report on the best applications.
Third, and this is the tricky part, you need to compute the actual cross product of models to widgets. You say the number of models (and therefore model id's) will number in at most the hundreds. This means that you could load that list of id's into memory in a mapper and map that list to widget id's. Each call to the mapper's map() method would pass in a widget id and would write out one instance of ModelApply for each model.
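The shape of that map step can be sketched in plain Python. This is not the Hadoop API; `make_mapper` and `map_widget` are hypothetical names, and `ModelApply` is reduced to a simple dataclass. It only illustrates keeping the small list of model ids in memory and emitting one `ModelApply` per model for each incoming widget id.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional


@dataclass(frozen=True)
class ModelApply:
    """One cell in the widget x model cross product; result is filled in later."""
    model_id: str
    widget_id: str
    result: Optional[float] = None


def make_mapper(model_ids: List[str]) -> Callable[[str], Iterator[ModelApply]]:
    """Build a map function that closes over the in-memory list of model ids."""

    def map_widget(widget_id: str) -> Iterator[ModelApply]:
        # Called once per widget id; emits one record per model.
        for model_id in model_ids:
            yield ModelApply(model_id=model_id, widget_id=widget_id)

    return map_widget
```

For example, with `mapper = make_mapper(["m1", "m2"])`, calling `mapper` on each of three widget ids yields six `ModelApply` records in total. In a real job the model id list would be loaded once during mapper setup rather than passed to a factory function.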
I'll leave it at that for now.