How small should a micro service be? [closed] - microservices

What are the conditions based on which a system should be split into micro-services, and how "small" should a micro-service be?

We were implementing a micro-services architecture in multiple projects, and I
will try to share my experience with it and how we were doing it.
Let me first explain how we split our Domain into micro-services; along the way, the
criteria for how small or how big a micro-service should be will become clear as well. In order to
understand that, we need to see the whole approach:
Conditions based on which we split our system into micro-services:
We were splitting the micro-services based on two sets of environments:
Based on the Production/Staging setup.
This is, in general, how the system would run in a Production environment to be used
by our customers.
Based on the Development (developer's machine) setup.
This is the setup which each developer has to have on their machine in order
to run/debug/develop the full system or parts of it.
Considering only the Production/Staging setup:
Based on DDD (Domain-Driven Design) Bounded Contexts:
Where 1 Bounded Context = 1 micro-service.
This was the biggest micro-service that we ended up having. Most of the time a
Bounded Context was further split into multiple micro-services.
Why?
The reason is that our Domain was very big, so keeping a whole Bounded Context
as one micro-service was very inefficient. By inefficient I mean mostly scaling reasons, but
also development scaling (having smaller teams take care of one micro-service each), and some other
reasons as well.
CQRS (Command Query Responsibility Segregation):
After splitting into one micro-service per Bounded Context, or multiple micro-services per Bounded Context,
we split some of those micro-services further into two or more micro-services instead of one: a Command/Write micro-service and a Read/Query micro-service.
For example,
let's say you have a "Users micro-service" and a "Users-Read micro-service". The "Users micro-service" was responsible for
creating, updating, deleting and general management of Users. The "Users-Read micro-service", on the other hand, was responsible only
for retrieving Users (it was read-only). We were following the CQRS pattern; a minimal sketch of this split follows below.
In some extreme cases the Write/Domain micro-service had multiple Read micro-services. Sometimes these Read micro-services were so small that they held just one de-normalized, View-like
representation, mostly using some kind of NoSQL DB for fast access. In some cases a Read micro-service was so small that,
from a code perspective, it had just a couple of C#/Java classes in it and one or two Tables or JSON Collections
in its Database.
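To make the Write/Read split concrete, here is a minimal C# sketch. It is an illustration under assumptions, not the original code: IEventBus, IUsersRepository and IDocumentStore are hypothetical contracts standing in for whatever bus and stores were actually used.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical contracts (not from the original answer), just to make the shape clear.
public interface IEventBus { Task PublishAsync<T>(T @event); }
public interface IUsersRepository { Task SaveAsync(User user); }
public interface IDocumentStore
{
    Task UpsertAsync(string collection, Guid id, object doc);
    Task<T?> GetAsync<T>(string collection, Guid id);
}

public record User(Guid Id, string Name, string Email);
public record UserCreated(Guid Id, string Name, string Email);
public record UserView(string Name, string Email);

// Write side ("Users micro-service"): owns create/update/delete and
// publishes an event after each successful local write.
public class UsersService
{
    private readonly IUsersRepository _repo;
    private readonly IEventBus _bus;
    public UsersService(IUsersRepository repo, IEventBus bus) { _repo = repo; _bus = bus; }

    public async Task CreateUserAsync(string name, string email)
    {
        var user = new User(Guid.NewGuid(), name, email);
        await _repo.SaveAsync(user);                                    // local write
        await _bus.PublishAsync(new UserCreated(user.Id, name, email)); // notify readers
    }
}

// Read side ("Users-Read micro-service"): a separate deployable that keeps
// one de-normalized view up to date from events and only serves queries.
public class UsersReadService
{
    private readonly IDocumentStore _views;
    public UsersReadService(IDocumentStore views) { _views = views; }

    // Subscribed to UserCreated events from the bus.
    public Task HandleAsync(UserCreated e) =>
        _views.UpsertAsync("users-view", e.Id, new UserView(e.Name, e.Email));

    // The only public query: read-only access to the view.
    public Task<UserView?> GetUserAsync(Guid id) =>
        _views.GetAsync<UserView>("users-view", id);
}
```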
Services which provide Domain-agnostic or static work/services:
Example 1 of this was a very small micro-service which was responsible for generating
specific reports in the form of PDFs from an HTML template (a sketch of such a service follows below).
Example 2 was a micro-service which just sent simple text messages to specific users
based on their permissions.
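To give a feel for how little such a service contains, here is a minimal ASP.NET Core sketch of an html-to-pdf endpoint. The IHtmlToPdfRenderer interface and its stub are hypothetical stand-ins; the original answer does not say which rendering library was used.

```csharp
using System.Text;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.DependencyInjection;

var builder = WebApplication.CreateBuilder(args);

// A real deployment would register an actual HTML-to-PDF library here;
// the stub below exists only so the sketch runs end to end.
builder.Services.AddSingleton<IHtmlToPdfRenderer, StubRenderer>();

var app = builder.Build();

// The whole public surface of the service: one endpoint that takes a
// filled-in HTML template and returns the rendered PDF bytes.
app.MapPost("/reports/pdf", (RenderRequest req, IHtmlToPdfRenderer renderer) =>
    Results.File(renderer.Render(req.Html), "application/pdf", "report.pdf"));

app.Run();

public record RenderRequest(string Html);

// Hypothetical abstraction over whatever HTML-to-PDF library is used.
public interface IHtmlToPdfRenderer { byte[] Render(string html); }

// Placeholder only: echoes the HTML bytes instead of producing a real PDF.
public class StubRenderer : IHtmlToPdfRenderer
{
    public byte[] Render(string html) => Encoding.UTF8.GetBytes(html);
}
```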
Considering the Development Setup:
In addition to the micro-services from the Production/Staging setup, for local development/running purposes we needed
special micro-services which would do some work in order to make the local setup work.
The local setup was done using Docker (docker-compose).
Examples of small micro-services there were:
Database, Cache, Identity, API Gateway and File Storage.
For all these things, in the Production/Staging setup we were using a Cloud provider which
offers them as managed services, so we did not need to run them as micro-services ourselves.
But in order to have the whole system running for development/testing purposes, we needed to
create small micro-services in the form of Docker containers to replace these Cloud services; an illustrative compose file follows below.
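As an illustration, such a local docker-compose file might look roughly like this. The images, ports and service names are illustrative stand-ins, not the original project's actual setup:

```yaml
version: "3.8"
services:
  database:            # replaces the cloud provider's managed database
    image: postgres:15
    environment:
      POSTGRES_PASSWORD: dev-only-password
    ports: ["5432:5432"]

  cache:               # replaces the managed cache service
    image: redis:7
    ports: ["6379:6379"]

  file-storage:        # S3-compatible stand-in for managed blob storage
    image: minio/minio
    command: server /data
    ports: ["9000:9000"]

  users:               # one of the real domain micro-services
    build: ./users
    depends_on: [database, cache]

  users-read:          # its read-only CQRS counterpart
    build: ./users-read
    depends_on: [database]
```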
Adding test/seeding data into the system.
In order to feed the local development system of micro-services with data, we needed a
small micro-service whose sole purpose was to call some APIs exposed by the micro-services and
post some data into them.
This way we could set up a working development environment with predefined data in order to test
some simple business scenarios.
The good thing about this is that you can use this test data setup in combination with the local
development setup for creating integration and end-to-end tests.
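A minimal sketch of such a seeding service, assuming the other micro-services expose plain HTTP APIs; the endpoint URL and payloads here are invented for illustration:

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Json;
using System.Threading.Tasks;

// One-shot seeder: runs once at startup of the local environment,
// posts predefined data through the public APIs, then exits.
public static class Seeder
{
    public static async Task Main()
    {
        using var http = new HttpClient();

        // Illustrative payloads; the real services' APIs will differ.
        var users = new[]
        {
            new { Name = "Alice", Email = "alice@example.com" },
            new { Name = "Bob",   Email = "bob@example.com" },
        };

        foreach (var user in users)
        {
            var response = await http.PostAsJsonAsync(
                "http://users:8080/api/users", user);
            response.EnsureSuccessStatusCode();
        }

        Console.WriteLine("Seed data posted.");
    }
}
```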
How small should a micro-service be?
In one of our cases, the smallest micro-services were a couple of
View/Read-only micro-services which had only one de-normalized View (1 Table or 1 JSON Collection) and which, from a code perspective,
had a couple of C#/Java classes in them. So when it comes to code, I don't think that anything much smaller than this would
be a reasonable option. Again, this is a subjective view.
This size can be considered "too small" based on some suggestions about micro-services
which you can read online. We did it because it helped us solve performance issues.
The demand for this data was so big that we isolated it so that we could scale it independently.
This gave us the possibility to scale this micro-service/view individually, based on its needs and independently
from the rest of that Domain.
The second case of a small micro-service is the html-to-pdf service, which was just creating PDF documents based
on a specifically formatted HTML template. You can imagine how small this subset of functionality was.
Suggestion:
My suggestion for everyone designing micro-services would be to ask the right questions:
How big should a micro-service be so that we don't end up with a monolith split into multiple monoliths?
That would mean the created micro-services are too big and hard to manage, which was the problem
with monoliths in the first place; on top of this, you get the drawbacks of distributed systems.
Is the size of a micro-service going to affect your performance?
For your customers the key is that the system's performance is good, so considering this as a
criterion for the micro-services architecture could be a very valid point.
Should or can we extract some critical part of the functionality/logic in order to isolate it?
Which logic is so critical that you cannot afford to have it broken, or to have service downtime
for it?
This way you can protect the most critical parts of the system.
Can I organize my team or teams around this kind of micro-services split?
Assuming the micro-services architecture is done and I have "n" micro-services,
how will I manage them? This means supporting them, orchestrating deployment, scaling based on needs,
monitoring and so on.
If the architecture you came up with turns out to be challenging and not manageable for your
organisation or teams, then reconsider it. Nobody needs an unmanageable system.
There are many more questions which could lead you in the right direction, but these were the ones we were following.
Answers to those questions will automatically lead you to the smallest/biggest possible micro-service
for your Domain/Business. My example about micro-service size from above might not work for your case,
and neither might the rules that we were using, but answering these questions will bring you closer to your
own approach/rule for deciding that.
Conclusion
A micro-service should be as small as you need it to be to fit your needs. The name "micro" indicates that it can be very small.
Be careful not to make this a rule for your whole system.
The smallest micro-services are rather an exception to solve some specific problem, like scaling or
critical-logic isolation, than a rule for designing/splitting the whole system into micro-services of that size.
If you have too many very small micro-services just for the sake of having them and them being small, you will have a hard time
managing them, with no real benefit. Be careful how you split.

Related

Is it a cardinal rule of microservices that a single database table should only be represented by a single microservice?

Is it a cardinal rule of microservices that a single database table should only be represented by a single microservice? I was asked that in an interview. My first reaction was that it should only be 1 to 1. But then I think I was overthinking it, thinking that maybe there are some edge case scenarios where that may be acceptable.
So is it a cardinal rule of microservices that a single database table should always be represented by a single microservice? Or are there some edge case scenarios where that may be acceptable? If it's a cardinal rule, then is there any kind of standard acronym that includes that principle? For example, relational databases have the ACID principles.
It is not a cardinal rule. But it is the most effective way to manage data. Design patterns are not set in stone; you may choose to handle things differently.
However, each microservice should be independent; this is why we use the microservices architecture. If, say, you update a table using multiple microservices, then they (the services) become interdependent. Loose coupling no longer exists, and the services will impact each other any time a change takes place.
This is why you may want to follow one of the following paradigms:
Private-tables-per-service – each service owns a set of tables that must only be accessed by that service.
Schema-per-service – each service has a database schema that's private to that service.
Database-server-per-service – each service has its own database server.
Refer to the data management section here for more: https://microservices.io/patterns/
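For instance, here is a hedged C# (EF Core) sketch of the schema-per-service idea, where each service's DbContext is pinned to its own private schema; all names and the connection string are illustrative:

```csharp
using Microsoft.EntityFrameworkCore;

// Each service has its own DbContext, its own credentials, and its own schema;
// no other service gets credentials for the "orders" schema.
public class OrdersDbContext : DbContext
{
    public DbSet<Order> Orders => Set<Order>();

    protected override void OnConfiguring(DbContextOptionsBuilder options) =>
        options.UseNpgsql(
            "Host=db;Database=shop;Username=orders_svc;Password=dev-only-password");

    protected override void OnModelCreating(ModelBuilder modelBuilder) =>
        modelBuilder.HasDefaultSchema("orders");   // private to this service
}

public class Order
{
    public int Id { get; set; }
    public decimal Total { get; set; }
}
```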
It is not just about a separate database for each microservice; there are other factors that need to be considered while developing microservices, such as codebase, config, logs, etc.
Please refer to the link below, which explains this in detail:
https://12factor.net/

Guidance on Patterns and recommendations on achieving database Atomicity in distributed architecture (microservices)

Folks, I am evaluating options/patterns and practices around a key challenge we are facing in a distributed (microservices) architecture: maintaining DB atomicity across multiple tables.
Atomicity, reliability and scale are all critical for the business (this is probably common across businesses, just putting it out there).
I read a few articles about achieving this, but it all comes at a significant cost and not without certain trade-offs, which I am not ready to make.
I read a couple of SO questions, and one concept, SAGA, seems interesting, but I don't think our legacy database is meant to handle it.
So here I am, asking experts for their personal opinions, guidance and past experience, so I can save time and effort without trying and learning a bunch of options.
Appreciate your time and effort.
CAP theorem
The CAP theorem is the key when it comes to distributed systems. Start with it to decide whether you want availability or consistency.
Distributed transactions
You are right, there are trade-offs involved and there is no single right answer, and when it comes to distributed transactions it's no different. In a microservices architecture, atomicity is not easy to achieve. Normally we design microservices with eventual consistency in mind; strong consistency is very hard and not a simple solution.
SAGA vs 2PC
2PC: it's very easy to achieve atomicity using two-phase commit, but that option is not for microservices. Your system can't scale, since if any of the microservices goes down your transaction will hang in an abnormal state, and locks are very common with this approach.
SAGA is the most accepted and scalable approach. You commit a local transaction (atomically); once done, you publish an event, and all the interested services consume the event and update their own local databases. If there is an exception, or a particular microservice can't accept the event data, it raises a compensating transaction, which means you have to reverse and undo the actions taken by all microservices for that event. This is a widely accepted pattern and it scales; a sketch follows below.
I don't get the legacy DB part. What makes you think the legacy DB will have a problem? SAGA has nothing to do with the legacy system. It simply means you either accept the event or you don't: if yes, save it into the database; if not, raise a compensating transaction so all the other services can undo.
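Here is a minimal C# sketch of one saga participant along those lines. The event bus and repository interfaces are invented for illustration; this is a shape sketch, not a production saga implementation:

```csharp
using System;
using System.Threading.Tasks;

public record OrderPlaced(Guid OrderId, Guid CustomerId, decimal Amount);
public record OrderRejected(Guid OrderId, string Reason);   // compensation trigger

public interface IEventBus { Task PublishAsync<T>(T @event); }
public interface IPaymentsRepository { Task DebitAsync(Guid customerId, decimal amount); }

// One saga participant: commits its own local transaction in response to
// the event, or emits a compensating event so earlier steps can undo.
public class PaymentsHandler
{
    private readonly IPaymentsRepository _repo;
    private readonly IEventBus _bus;
    public PaymentsHandler(IPaymentsRepository repo, IEventBus bus)
    {
        _repo = repo; _bus = bus;
    }

    public async Task HandleAsync(OrderPlaced e)
    {
        try
        {
            // Local, atomic transaction inside this service's own database.
            await _repo.DebitAsync(e.CustomerId, e.Amount);
        }
        catch (Exception ex)
        {
            // Can't accept the event: publish a compensating event so the
            // Orders service (and any other participant) reverses its work.
            await _bus.PublishAsync(new OrderRejected(e.OrderId, ex.Message));
        }
    }
}
```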
What's the right approach?
Well, it really depends on you eventually. There are many patterns around when it comes to saving the transaction. Have a look at the CQRS and event sourcing patterns, which are used to save all the domain events. Distributed transactions can be complex; CQRS solves many of the related problems, e.g. eventual consistency.
Hope that helps! Shoot me questions if you have any.
One possible option is Command Query Responsibility Segregation (CQRS) - maintain one or more materialized views that contain data from multiple services. The views are kept by services that subscribe to the events that each service publishes when it updates its data. For example, the online store could implement a query that finds customers in a particular region and their recent orders by maintaining a view that joins customers and orders. The view is updated by a service that subscribes to customer and order events.
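A rough C# sketch of such a view-maintaining service for the customers/orders example; the event and store types are invented for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public record CustomerUpdated(Guid CustomerId, string Name, string Region);
public record OrderCreated(Guid OrderId, Guid CustomerId, DateTime PlacedAt);

// Denormalized row combining customer and order data, queryable by region.
public record CustomerOrdersView(Guid CustomerId, string Name, string Region,
                                 List<Guid> RecentOrderIds);

public interface IViewStore
{
    Task<CustomerOrdersView?> GetAsync(Guid customerId);
    Task SaveAsync(CustomerOrdersView view);
}

// Subscribes to events from both the customer and order services and
// keeps the joined view up to date (eventually consistent).
public class CustomerOrdersProjection
{
    private readonly IViewStore _store;
    public CustomerOrdersProjection(IViewStore store) { _store = store; }

    public async Task HandleAsync(CustomerUpdated e)
    {
        var view = await _store.GetAsync(e.CustomerId)
                   ?? new CustomerOrdersView(e.CustomerId, e.Name, e.Region, new());
        await _store.SaveAsync(view with { Name = e.Name, Region = e.Region });
    }

    public async Task HandleAsync(OrderCreated e)
    {
        var view = await _store.GetAsync(e.CustomerId);
        if (view is null) return;            // customer event not seen yet
        view.RecentOrderIds.Add(e.OrderId);
        await _store.SaveAsync(view);
    }
}
```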

Is a data warehouse a good solution for sharing customer data across technologies?

I want to be able to share data across all areas of our business in a way that reduces the overall complexity of our infrastructure.
The Problem
Our problem is that we currently have 4 main applications that all connect to our CRM application (Microsoft Dynamics 2011).
The decision-makers at our firm are currently wanting to upgrade our CRM to the most current version and, then, stay up to date as new upgrades are released (every 2-3 years). Almost all of our applications are rigidly integrated with Microsoft Dynamics so each upgrade is very expensive and risky. I want to design another approach that will reduce this expense and risk.
Research
In 2006, Roger Sessions wrote an article called A Better Path to Enterprise Architectures (here) which outlines ways to improve business IT systems. One of the central themes in his discussion is reducing complexity: by arranging dice in different ways, he shows that you can exponentially reduce the complexity of systems by partitioning technologies into segments rather than letting any technology connect to any other technology. Jeanne Ross has a great presentation on this topic as well (here), and she talks about having a digitized platform for sharing core data and services between areas of the business in order to reduce the complexity of the overall system and increase agility in responding to current and future business needs.
Conclusions
As I reflect on the lessons from Sessions and Ross, I am confident that we need to take Microsoft Dynamics out of the center of our architecture if we want to overhaul the technology every 2-3 years. We'll just need to replace it with something that will allow our core data (mostly customer data) to be shared across applications. I know that data warehouses are often used for aggregating data across the organization. Could this work?
I understand that data warehouses are mostly used for reporting, so I don't know if having direct connections to the data warehouse would be ideal. However, the applications would not need the ability to update any data in the data warehouse; they just need the ability to grab entity IDs to set up relationships between global data-warehouse entities (customers) and various unit-specific entities within each application's database.
Questions
Which of these three options would meet my needs: (1) a data warehouse to which all applications connect directly, (2) a data warehouse that feeds data to each application-specific database through overnight updates or (3) something else?
Thanks
What you're after is a data integration architecture - that doesn't necessarily mean a data warehouse. The pattern you're describing is called "hub and spoke," and it's very common - I'd say you're definitely on the right track for resolving the integration problem you're describing.
This page goes into this problem and pattern in much more depth, and it also has a section on the differences between data warehousing and data integration. You've noted that you're aware data warehouses are commonly used for reporting - that's true, and they're also used heavily for analytics, as the link discusses. They're traditionally a data source for business intelligence efforts. This can mean they're not always focused on the kind of data you're interested in - i.e. operational data which your systems need to function, but which might not be of interest for reporting or analytical purposes. Or, they might not function in a way that's helpful for your needs - for instance, traditional overnight ETL loads might not be the best solution if you need your applications to be up-to-date more quickly.
All this is to say that data warehouses can definitely be used as a data hub - the EDW becomes your "master data" source, any data quality processes needed run on the EDW data, and ETL processes fire corrected data back out to the various sources - but you'll probably be better served by researching the topic of data integration than the topic of data warehousing, even if the two share a lot of similarities and can overlap.
If you create a data warehouse without any business intelligence requirements, it might not function very well as a data warehouse. A very suitable data integration/master data solution might not resolve all of the future requirements you have for a data warehouse. Equally, if you were to create a traditional data warehouse after researching data warehousing best practices, it might not fulfill your data integration requirements, or fulfill them in the best way. As the link suggests, separate the two ideas: resolve your data integration problem, and if you want a data warehouse in the future, you can use your data integration solution to help populate it.

What is the design & architecture behind facebook's status update mechanism?

I'm planning on creating a social network and I don't think I quite understand how the status update module of facebook is designed. Hoping I can find some help here. At algorithmic and datastructure level, what is the most efficient way to create a status update mechanism in a social network?
A full table scan for all friends and then sorting their updates is very naive and costly. Do we use some sort of mechanism based on hashing or something else? Please let me know.
P.S: I'm not talking about their EdgeRank algorithm but the basic status update. How do they find and fetch them from the database?
Thanks in advance for the help!
Here is a great presentation that answers your question. The specific answer comes up at around minute 55:40, but I suggest that you watch the entire presentation to understand how the solution fits into the entire architecture.
In short:
A particular server ("leaf") stores all feed items for a particular user. So data for each of your friends is stored entirely at a specific destination.
When you want to view your news feed, one of the aggregator servers sends requests to all the leaf servers for your friends and ranks the results. The aggregator knows which servers to send requests to based on the user id of each friend.
This is terribly simplified, of course. This only works because all of it is memcached, the system is designed to minimize latency, some ranking is done at the leaf server that contains the friend's feed items, etc.
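In code, the fan-out described above might look roughly like this toy C# sketch; the real system's partitioning, ranking and caching are far more involved, and these types are invented for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public record FeedItem(long AuthorId, DateTime CreatedAt, string Text);

public interface ILeafClient
{
    // Fetch the stored feed items for one user from the leaf that owns them.
    Task<IReadOnlyList<FeedItem>> GetItemsAsync(long userId, int limit);
}

public class FeedAggregator
{
    private readonly ILeafClient[] _leaves;
    public FeedAggregator(ILeafClient[] leaves) { _leaves = leaves; }

    // Each user's items live entirely on one leaf, chosen here by user id.
    private ILeafClient LeafFor(long userId) =>
        _leaves[(int)(userId % _leaves.Length)];

    public async Task<IReadOnlyList<FeedItem>> BuildFeedAsync(
        IEnumerable<long> friendIds, int limit)
    {
        // Fan out to the relevant leaves in parallel...
        var tasks = friendIds.Select(id => LeafFor(id).GetItemsAsync(id, limit));
        var results = await Task.WhenAll(tasks);

        // ...then merge and rank (here: simply newest first).
        return results.SelectMany(items => items)
                      .OrderByDescending(item => item.CreatedAt)
                      .Take(limit)
                      .ToList();
    }
}
```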
You really don't want to be hitting the database for any of this to work at a reasonable speed. FB uses MySQL mostly as a key-value store; JOINing tables is just impossible at their scale. Then they put memcache servers in front of the databases and application servers.
Having said that, don't worry about scaling problems until you have them (unless, of course, you are worrying about them for the fun of it.) On day one, scaling is the least of your problems.

Can 'moving business logic to application layer' increase performance?

In my current project, the business logic is implemented in stored procedures (1000+ of them) and now they want to scale it up as the business is growing. The architects have decided to move the business logic to the application layer (.net) to boost performance and scalability. But they are not redesigning/rewriting anything. In short, the same SQL queries which are fired from an SP will be fired from a .net function using ADO.Net. How can this yield any performance gain?
To the best of my understanding, we need to move business logic to the application layer when we need DB independence or when there is some business logic that can be better implemented in an OOP language than in an RDBMS engine (like traversing a hierarchy or some image processing, etc.). In the rest of the cases, if there is no complicated business logic to implement, I believe that it is better to keep the business logic in the DB itself; at least the network delays between the application layer and the DB can be avoided this way.
Please let me know your views. I am a developer looking at some architecture decisions with a little hesitation; pardon my ignorance on the subject.
If your business logic is still in SQL statements, the database will be doing as much work as before, and you will not get better performance. (It may even be more work, if it is not able to cache query plans as effectively as when stored procedures were used.)
To get better performance you need to move some work to the application layer: can you, for example, cache data on the application server and do a lookup or a validation check without hitting the database?
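For example, here is a hedged C# sketch of such an app-tier cache using .NET's IMemoryCache; the repository interface and data are invented for illustration:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Extensions.Caching.Memory;

public record Country(string Code, string Name);
public interface ICountryRepository { Task<Country?> FindAsync(string code); }

// Caches slow-changing reference data on the app server so that
// lookups and validation checks don't hit the database every time.
public class CountryLookup
{
    private readonly IMemoryCache _cache;
    private readonly ICountryRepository _db;
    public CountryLookup(IMemoryCache cache, ICountryRepository db)
    {
        _cache = cache; _db = db;
    }

    public Task<Country?> GetAsync(string code) =>
        _cache.GetOrCreateAsync($"country:{code}", entry =>
        {
            entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromHours(1);
            return _db.FindAsync(code);   // hits the database only on a miss
        });

    // A validation check that normally never touches the database.
    public async Task<bool> IsValidCodeAsync(string code) =>
        await GetAsync(code) is not null;
}
```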
Architectural arguments such as these often need to consider many trade-offs; considering performance in isolation, or indeed considering only one aspect of performance such as response time, tends to miss the larger picture.
There is clearly some trade-off between executing logic in the database layer and shipping the data back to the application layer and processing it there: data-shipping costs versus processing costs. As you indicate, the cost and complexity of the business logic will be a significant factor; the size of the data to be shipped would be another.
It is conceivable, if the DB layer is getting busy, that offloading processing to another layer may allow greater overall throughput even if individual response times are increased. We could then scale the App tier in order to deal with the extra load. Would you now say that performance has improved (greater overall throughput) or worsened (some increase in response time)?
Now consider whether the app tier might implement interesting caching strategies. Perhaps we get a very large performance win - no load on the DB at all for some requests!
I think those decisions should not be justified using architectural dogma. Data would make a great deal more sense.
Statements like "All business logic belongs in stored procedures" or "Everything should be on the middle tier" tend to be made by people whose knowledge is restricted to databases or objects, respectively. Better to combine both when you judge, and do it on the basis of measurements.
For example, if one of your procedures is crunching a lot of data and returning a handful of results, there's an argument that says it should remain on the database. There's little sense in bringing millions of rows into memory on the middle tier, crunching them, and then updating the database with another round trip.
Another consideration is whether or not the database is shared between apps. If so, the logic should stay in the database so all can use it.
Middle tiers tend to come and go, but data remains forever.
I'm an object guy myself, but I would tread lightly.
It's a complicated problem. I don't think that black and white statements will work in every case.
Well, as others have already said, it depends on many factors. But from your question it seems the architects are proposing moving the stored procedures from inside the DB to dynamic SQL inside the application. That sounds very dubious to me.
SQL is a set-oriented language, and business logic that requires massaging large numbers of data records is better done in SQL; think complicated search and reporting-type functions. On the other hand, line-item edits with corresponding business-rule validation are much better done in a programming language. Caching slow-changing data in the app tier is another advantage. This is even better if you have a dedicated middle-tier service that acts as a gateway to all the data. If data is shared directly among disparate applications, then stored procedures may be a good idea.
You also have to factor in the availability/experience of SQL talent vs programming talent in the organisation.
There is really no general answer to this question.
