I am wondering what the best way to structure microservices is. In the past, the team I worked with used Axon Framework and PostgreSQL. Each microservice had its own event store in the PostgreSQL database, and we built communication between the services using REST.
I am thinking that it would be smarter to have all microservices talk to the same event store as we would be able to share events faster instead of rewriting the communication lines using REST.
The questions that follow from this backstory are:
What is the best practice for having an event store?
Would each service have its own? Would they share the same eventstore?
Where can I find more information and inspiration? Searching the internet for best practices on how to structure an Event Store feels like searching for a needle in a haystack.
Bear in mind that this question is not aimed at Axon Framework specifically, but at the general idea of building scalable, good code, given that each application would work with its own event store for its write model and read models.
Thank you for reading and I wish you all the best
-- Me
I'd add a slightly different notion to Tore's response, although the mainline is identical to what I'm sharing here. So, I don't aim to overrule Tore, just hoping to provide additional insight.
If the (micro)services belong to the same Bounded Context, then they're allowed to "learn about each other's language."
This language thus includes the events these applications publish and store.
Whenever there's communication required between different Bounded Contexts, you'd separate the stores, as one context shouldn't be bothered by the specifics of another context.
Hence it is beneficial to deduce what services belong to which Bounded Context since that would dictate the required separation.
Axon aims to support this by allowing multiple contexts with the Axon Server, as you can read here.
It simply allows the registration of applications to specific contexts, within which it will completely separate all message streams (so commands, events, and queries) and the Event Store.
You can also set this up from scratch yourself, of course. Tore's recommendation of Kafka is what's used quite broadly for Event Streaming needs between applications. Honestly, any broadcast type of infrastructure suits event distribution, as that's how events are typically propagated.
You want to have one EventStore per service, just as you would want to have one relational database per service for a non-event-sourced system.
Sharing a database/eventstore between services creates coupling and we have all learned the hard way that this is an anti-pattern today.
If you want to use an event log to share events across services, then Kafka is a popular choice.
It is important to remember that you only do event sourcing within a service's bounded context.
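To make the split concrete, here is a minimal in-memory sketch in Python (not Axon or Kafka code). `PrivateEventStore`, `SharedLog`, `OrderService`, and `ShippingService` are hypothetical names invented for illustration, and `SharedLog` only stands in for a broker topic such as a Kafka topic. Each service appends to its own store and learns about the other only through the shared log.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Event:
    stream_id: str
    type: str
    data: dict


class PrivateEventStore:
    """Event store owned by exactly one service (in-memory stand-in)."""

    def __init__(self) -> None:
        self._streams: Dict[str, List[Event]] = {}

    def append(self, event: Event) -> None:
        self._streams.setdefault(event.stream_id, []).append(event)

    def read(self, stream_id: str) -> List[Event]:
        return list(self._streams.get(stream_id, []))


class SharedLog:
    """Stand-in for a broker topic; it only fans events out to subscribers."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[Event], None]] = []

    def subscribe(self, handler: Callable[[Event], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, event: Event) -> None:
        for handler in self._subscribers:
            handler(event)


class OrderService:
    def __init__(self, log: SharedLog) -> None:
        self.store = PrivateEventStore()  # nobody else touches this store
        self._log = log

    def place_order(self, order_id: str) -> None:
        event = Event(order_id, "OrderPlaced", {"order_id": order_id})
        self.store.append(event)   # source of truth for this service
        self._log.publish(event)   # shared with other services via the log


class ShippingService:
    def __init__(self, log: SharedLog) -> None:
        self.store = PrivateEventStore()
        log.subscribe(self._on_event)

    def _on_event(self, event: Event) -> None:
        # React to the published event by recording an event of our own.
        if event.type == "OrderPlaced":
            self.store.append(
                Event(f"shipment-{event.stream_id}", "ShipmentRequested",
                      {"order_id": event.stream_id})
            )
```

In this sketch `ShippingService` never reads `OrderService`'s store; it reacts to what was published and records its own events.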
I am new to GraphQL. We have built a backend GraphQL server using Elixir, and we are building a frontend app using React and react-relay.
My question is whether it is better to have one large subscription at the root of my query renderer instead of having loads of smaller subscriptions for individual components. I think I would prefer using lots and lots of smaller subscriptions rather than fewer (or even one) very large subscriptions but there are concerns that too many subscriptions will be very heavy. Is this valid?
TIA
There are a few things to consider here, and really, they all depend on what your definition of "very heavy" is. Note "very heavy" might mean something very different for your Elixir server implementation than it does on the client, so I will attempt to cover some directions you may want to investigate for both here.
What is your subscription transport? Websockets can be expensive and difficult to scale on both ends at a certain point, but if you can deal with unidirectional data flow (server to client only), SSE (Server-Sent Events) are a great option. See more on a breakdown between SSE and WS here. This is more a comment on your server than on your client.
From an API design perspective, I'd caution against the few (or one) large subscriptions idea. Why? Inevitably, you are going to be pushing data to the client that it never asked for; this causes unnecessary work for both client and server. Furthermore, an individual component should only be able to subscribe to data streams with data specifically designated for it. If you go the large subscription route, then you'll have to write a good deal of defensive code to filter the event stream, looking for the data you need. That shouldn't be your responsibility to micromanage, not to mention the dirty event stream on your server.
This is not necessarily to lead you down the "small subscription" route either. Ultimately, you might want to look at this hybrid approach, which articulates my opinions on the matter better than I can myself. TL;DR: design the subscriptions API so that you can enjoy the tightly scoped benefits of lots of small subscriptions ("per entity," as the author titles them), but can still share payloads and reuse the same handlers that your mutations use to resolve data.
Plus, if you wanted to use persisted queries, the hybrid approach is going to serve you better.
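As a rough illustration of the scoping argument (plain Python, not Absinthe or Relay code; `SubscriptionHub` and the topic naming are assumptions made up for this sketch), compare a per-entity subscription with a catch-all one. The per-entity handler only ever sees its own entity, while the catch-all handler receives everything and has to filter defensively.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[dict], None]


class SubscriptionHub:
    """Toy pub/sub keyed by topic; '*' is a hypothetical catch-all topic."""

    def __init__(self) -> None:
        self._by_topic: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._by_topic[topic].append(handler)

    def publish(self, entity_type: str, entity_id: str, payload: dict) -> None:
        message = {"type": entity_type, "id": entity_id, **payload}
        # Deliver to the per-entity topic and to the catch-all topic.
        for topic in (f"{entity_type}:{entity_id}", "*"):
            for handler in self._by_topic[topic]:
                handler(message)


hub = SubscriptionHub()
narrow: List[dict] = []   # what a component scoped to todo 1 receives
broad: List[dict] = []    # what one large root subscription receives

hub.subscribe("todo:1", narrow.append)
hub.subscribe("*", broad.append)

hub.publish("todo", "1", {"done": True})
hub.publish("todo", "2", {"done": False})

# The broad subscriber must filter for the entity it actually cares about.
relevant_from_broad = [m for m in broad if m["id"] == "1"]
```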
In the current plan, incoming commands are handled via Function Apps, resulting in Events being sent to an Event Hub, which are then used to materialize the views.
Someone is arguing that, instead of storing events in something like Table Storage and materializing views based on events and snapshots, we should:
Just stream events to a log in Azure Monitor to have auditing
Make changes to a domain object immediately in response to a command, and use the change feed as our source of events for materialized views.
He doesn't see the advantage of even having a materialized view. Why not just use a query? His argument is that we don't expect a lot of traffic.
He wants to satisfy the audit requirement entirely by saving events to the Azure Monitor log, which is just an application log. Commands would instead directly modify the representation of an entity in Cosmos DB, and we'd use the Cosmos DB change feed as our domain object events, or create new events from it via subscribers to that stream.
Is this actually an advantageous approach? Can y'all think of any reasons why we wouldn't want to do that? Seems like we'd be losing something here.
He's saying we'd no longer need to be concerned with eventual consistency, as we'd have immediate consistency.
Every reference implementation I've evaluated does NOT do it the way he's suggesting. I'm not deeply versed in the advantages and disadvantages of the event sourcing / CQRS paradigm, so I'm at a loss at the moment. Currently researching furiously.
This is a conceptual issue, so there isn't really a code example. However, here are some references that seem to back up the approach I'm taking:
https://medium.com/#thomasweiss_io/planet-scale-event-sourcing-with-azure-cosmos-db-48a557757c8d
https://sajeetharan.com/2019/02/03/event-sourcing-with-azure-eventhub-and-cosmosdb/
https://learn.microsoft.com/en-us/azure/architecture/patterns/event-sourcing
If your goal is only to have the audit log, state-based persistence could be a good choice. Event sourcing adds some complexity to the implementation side and unless you can identify more advantages of using it, you might not convince your team to bring this complexity to the system. There are numerous questions and answers on SO, as well as in some blog posts, about pros and cons of event sourcing, so I won't get into that discussion here.
I can warn you, though, that the second article in your list is very weak and would most probably lead you into many difficulties. The role of Event Hub there is completely unclear, and it doesn't explain anything about projections and read-models (what you call "materialised views"). Only a very limited number of use-cases can live with only getting one entity by id, without being able to execute a query across multiple entities. That also probably answers your concern about having read-models at all. You will need them as soon as you first have to get a list of entities based on some condition (a query).
Using CosmosDb as the event store is completely feasible, as described in the first article, if you can manage the costs involved. Just remember to set the change feed TTL to -1; otherwise, you won't be able to replay your projections when you need to.
To summarise:
Keeping the audit log can be done without event-sourcing, but you need to ensure that events are published reliably, preferably in the same transaction as the entity state update. This is often hard or impossible, but you might accept the risk if your audit requirement is not strict. You can also base your audit log on the CosmosDb change feed, just collecting document changes and logging them somewhere.
Event sourcing is a powerful technique but it has both pros and cons. The most common prejudice against using event sourcing is its implementation complexity. It might not be a big issue if you have a team that is somewhat experienced in building event-sourced systems. If you don't have such a team, you might want to build a small-scale spike to get some experience.
If you don't get full buy-in from the team to use event sourcing, you will later get all the blame if anything goes wrong. And it will go wrong at some point, especially with little experience in this area.
Spend some time reading books and trying out things yourself, before going wild in production.
Don't use Event Hub for anything that it is not designed for. Event Hub is a powerful event ingestion transport with limited TTL, and it should be used for that purpose.
Don't use Table Storage as the event store, unless you only read entities by id. I used it in production for such a scenario and it worked (to some extent) but you can't project read-models from there.
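For the first point in the summary, one common way to get "published in the same transaction as the entity state update" is an outbox table: write the state change and the event row in one local transaction, and let a separate relay publish the rows afterwards. Below is a minimal sketch using SQLite from Python's standard library. The table names, `change_status`, and `drain_outbox` are hypothetical, and this only applies where your database gives you such a transaction.

```python
import json
import sqlite3
from typing import Callable


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE outbox ("
        " seq INTEGER PRIMARY KEY AUTOINCREMENT,"
        " type TEXT NOT NULL,"
        " payload TEXT NOT NULL,"
        " published INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def change_status(conn: sqlite3.Connection, order_id: str, status: str) -> None:
    # 'with conn' commits both statements together, or rolls both back on error.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO orders (id, status) VALUES (?, ?)",
            (order_id, status),
        )
        conn.execute(
            "INSERT INTO outbox (type, payload) VALUES (?, ?)",
            ("OrderStatusChanged", json.dumps({"id": order_id, "status": status})),
        )


def drain_outbox(conn: sqlite3.Connection,
                 publish: Callable[[str, dict], None]) -> int:
    """Publish pending rows in order; returns how many rows were handled."""
    rows = conn.execute(
        "SELECT seq, type, payload FROM outbox WHERE published = 0 ORDER BY seq"
    ).fetchall()
    for seq, event_type, payload in rows:
        publish(event_type, json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE seq = ?", (seq,))
    return len(rows)
```

If the relay crashes after `publish` but before marking the row, the event is published again on the next run, so consumers should tolerate duplicates.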
A simple rule of thumb is to not use products for tasks they weren't designed for.
Azure Monitor was not designed to store application domain data. Azure Monitor is designed to store telemetry data from your applications and services and provides features such as alerts and other types of integration into DevOps tools for managing the operation and health of your apps.
There is a simple reason why you were able to find articles on event sourcing using Cosmos DB and why our own docs talk about it: it was designed to be used this way. It is simple to set up Cosmos DB to be an append-only event store for your applications and use Change Feed to fire off messages in other apps or services or, in your case, to maintain a materialized view state of domain objects within your app.
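A minimal sketch of that shape, assuming nothing about the real Cosmos DB SDK: `AppendOnlyStore` is a hypothetical in-memory stand-in for an append-only container, and `read_changes` imitates a change feed by returning everything after a continuation position. The view keeps its own position, so it can be rebuilt from scratch by replaying from zero.

```python
from typing import Dict, List, Tuple


class AppendOnlyStore:
    """In-memory stand-in for an append-only event container."""

    def __init__(self) -> None:
        self._events: List[dict] = []

    def append(self, event: dict) -> None:
        self._events.append(dict(event))  # copy so callers cannot mutate history

    def read_changes(self, continuation: int = 0) -> Tuple[List[dict], int]:
        """Return events after `continuation` plus the new continuation."""
        return self._events[continuation:], len(self._events)


class OrdersByStatusView:
    """Materialized view answering 'which orders have status X?'."""

    def __init__(self) -> None:
        self.status_by_order: Dict[str, str] = {}
        self.continuation = 0

    def catch_up(self, store: AppendOnlyStore) -> None:
        batch, self.continuation = store.read_changes(self.continuation)
        for event in batch:
            if event["type"] == "OrderPlaced":
                self.status_by_order[event["order_id"]] = "placed"
            elif event["type"] == "OrderShipped":
                self.status_by_order[event["order_id"]] = "shipped"

    def orders_with_status(self, status: str) -> List[str]:
        return sorted(o for o, s in self.status_by_order.items() if s == status)
```

The query across multiple entities (`orders_with_status`) is exactly the kind of read that getting one entity by id cannot answer, which is why the view exists.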
Yet another question on how to or how not to split up a microservice :-D
The scenario:
What do we need?
Sending emails at different points in time within the workflow of an ecommerce order process. These mails will contain order information.
What do we have?
1 x persistence service which retrieves order information
Several services which subscribe to order events and process the relevant use case (e.g. confirmation, delivery, invoice)
1 x service which can be triggered to send a mail
What's the next step?
Designing the architectural component which transforms the order information so that it fits the data structure of the email rendering service.
The current options are
1. Have each processing service transform the existing order information for the mail template and send it to the mail rendering service.
2. Have each processing service call a new service which would aggregate and transform the order information and call the mail rendering service.
Currently we're not sure yet if the data structures for the mail templates will be mostly common or if there will be differences.
So what do you think of these options in terms of cohesion, coupling and separation of concerns?
Do you need any more information? Any constructive thoughts are welcome!
Your software architecture should reflect your organizational structure; see Conway's law.
Do you have multiple teams, and do you want to minimize dependencies between the teams?
Are "services" large and complex enough to justify them being separated into modules?
Does the size of the product justify having advanced devops in place to orchestrate the microservices?
Do you need the flexibility in terms of deployment and replaceability of individual "services"?
If you can answer yes to most of these questions, it would make sense to go for microservices. Otherwise, you are just making your life complicated.
Frankly, microservices require a lot of coordination overhead which makes sense only if the product is large enough. Most (small) projects are just fine with monolithic and MVC architecture.
This is how I'd propose to proceed, man. It's how one of my projects handles all SMTP-related stuff.
API receives an HTTP request
It persists data needed to the database.
It offloads the long-running and memory-intensive processes to the mail builder.
Optionally, the mail builder builds attachment files (XLSX, PDF, etc.).
The mail builder uploads them to the File Server.
The mail builder offloads generic SMTP sending to the SMTP service.
I suggested this format because it allows you to scale the instance of each piece (Mail builder will have tons of instances) depending on bottlenecks in your processing pipeline.
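The steps above can be sketched as three stages connected by queues. This is only an illustration: the dictionaries standing in for the database and file server, the queue names, and the function names are all hypothetical, and in a real deployment each stage would be its own process reading from a real message broker.

```python
import queue
from typing import Dict, List

build_queue: "queue.Queue[dict]" = queue.Queue()   # API -> mail builder
smtp_queue: "queue.Queue[dict]" = queue.Queue()    # mail builder -> SMTP service
database: Dict[str, dict] = {}                     # stand-in for the database
file_server: Dict[str, bytes] = {}                 # stand-in for the file server
sent: List[dict] = []                              # what the SMTP service "sent"


def api_handle_request(order_id: str, recipient: str, with_invoice: bool) -> None:
    # Persist what is needed, then hand the heavy work to the mail builder.
    database[order_id] = {"recipient": recipient}
    build_queue.put({"order_id": order_id, "with_invoice": with_invoice})


def mail_builder_step() -> None:
    job = build_queue.get_nowait()
    order = database[job["order_id"]]
    attachments: List[str] = []
    if job["with_invoice"]:
        # Optional step: build an attachment and upload it to the file server.
        path = f"invoices/{job['order_id']}.pdf"
        file_server[path] = b"placeholder invoice bytes"
        attachments.append(path)
    # Hand the generic sending over to the SMTP service.
    smtp_queue.put({
        "to": order["recipient"],
        "subject": f"Order {job['order_id']}",
        "attachments": attachments,
    })


def smtp_service_step() -> None:
    sent.append(smtp_queue.get_nowait())
```

Because the stages only share queues, more mail builder workers can be started when building attachments is the bottleneck, without touching the API or the SMTP stage.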
Given that you have asked this question in microservices, I am assuming you are asking the question in reference to cloud native patterns.
I suggest you start by looking at the microservices pattern. An excellent site for the patterns is https://microservices.io/patterns/microservices.html.
Your question does not have the necessary details to provide educated advice on which patterns are suitable and which are not. So, I suggest you look at these few patterns:
https://microservices.io/patterns/data/shared-database.html
https://microservices.io/patterns/data/database-per-service.html
Also take a look at the event sourcing pattern:
https://microservices.io/patterns/data/event-sourcing.html
Hope this helps.
As far as my little current experience allows me to understand, one of the core concepts about "microservice" is that it relies on its own database which is independent from other microservices.
Diving into how to handle distributed transactions in a microservices system, the best strategy seems to be the Event Sourcing pattern whose core is the Event Store.
Is the event store shared between different microservices? Or are there multiple independent event store databases, one for each microservice, and a single common event broker?
If the first option is the solution, using CQRS I can now assume that every microservice's database is intended as query-side, while the shared event store is on the command-side. Is it a wrong assumption?
And since we are on the topic: how many retries do I have to do in case of a concurrent write to a Stream using optimistic locking?
A very big big thanks in advance for every piece of advice you can give me!
Is the event store shared between different microservices? Or are there multiple independent event store databases, one for each microservice, and a single common event broker?
Every microservice should write to its own Event store, from their point of view. This could mean separate instances or separate partitions inside the same instance. This allows the microservices to be scaled independently.
If the first option is the solution, using CQRS I can now assume that every microservice's database is intended as query-side, while the shared event store is on the command-side. Is it a wrong assumption?
Kinda. As I wrote above, each microservice should have its own Event store (or a partition inside a shared instance). A microservice should not append events to another microservice's Event store.
Regarding reading events, I think that reading should in general be permitted. Polling the Event store is the simplest (and, in my opinion, the best) way to propagate changes to other microservices. It has the advantage that the remote microservice polls at the rate it can handle and for the events it wants. This scales very nicely by creating as many Event store replicas as needed.
There are some cases where you would not want to publish every domain event from the Event store. Some say that there could be internal domain events on which other microservices should not depend. In this case, you could mark each event as available (or not) for external consumption.
The cleanest solution to propagate changes in a microservice is to have live queries to which other microservices can subscribe. It has the advantage that the projection logic does not leak into other microservices, but it also has the disadvantage that the emitting microservice must define and implement those queries; you can do this when you notice that other microservices duplicate the projection logic. An example of such a query is the total order price in an ecommerce application. You could have a query like WhatIsTheTotalPriceOfTheOrder that is published every time an item is added to, removed from, or updated in an Order.
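A minimal sketch of such a live query, in Python for brevity. `OrderTotalLiveQuery`, `set_item`, and the callback signature are hypothetical; the point is only that the emitting service owns the projection logic (how a total is computed) and subscribers receive the finished answer.

```python
from typing import Callable, Dict, List, Tuple

TotalHandler = Callable[[str, int], None]  # (order_id, total in cents)


class OrderTotalLiveQuery:
    """Answers 'what is the total price of the order?' and republishes on change."""

    def __init__(self) -> None:
        # order_id -> sku -> (unit price in cents, quantity)
        self._items: Dict[str, Dict[str, Tuple[int, int]]] = {}
        self._subscribers: List[TotalHandler] = []

    def subscribe(self, handler: TotalHandler) -> None:
        self._subscribers.append(handler)

    def set_item(self, order_id: str, sku: str,
                 unit_price_cents: int, quantity: int) -> None:
        """Add or update an item; a quantity of 0 removes it."""
        items = self._items.setdefault(order_id, {})
        if quantity == 0:
            items.pop(sku, None)
        else:
            items[sku] = (unit_price_cents, quantity)
        self._publish(order_id)

    def _publish(self, order_id: str) -> None:
        # The projection logic lives here and nowhere else.
        total = sum(price * qty for price, qty in self._items[order_id].values())
        for handler in self._subscribers:
            handler(order_id, total)
```

A subscriber such as an invoicing service would register a handler once and never needs to know about items, prices, or quantities.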
And since we are on the topic: how many retries do I have to do in case of a concurrent write to a Stream using optimistic locking?
As many as you need, i.e. until the write succeeds. You could have a limit of 99999, just to detect when something is horribly wrong with the retry mechanism. In any case, a write should be retried only when another write is done at the same time on the same stream (for one Aggregate instance), not on the entire Event store.
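A minimal sketch of that retry loop, assuming an event store that checks an expected stream version on append. `StreamStore`, `ConcurrencyError`, `execute_with_retry`, and the default `max_attempts` value are all hypothetical choices for illustration; the version check is per stream, so writes to other streams never cause a retry.

```python
from typing import Callable, Dict, List, Tuple


class ConcurrencyError(Exception):
    """Raised when the stream changed since it was read."""


class StreamStore:
    def __init__(self) -> None:
        self._streams: Dict[str, List[str]] = {}

    def read(self, stream_id: str) -> Tuple[List[str], int]:
        events = self._streams.get(stream_id, [])
        return list(events), len(events)  # history and current version

    def append(self, stream_id: str, expected_version: int,
               new_events: List[str]) -> None:
        events = self._streams.setdefault(stream_id, [])
        if len(events) != expected_version:
            raise ConcurrencyError(
                f"{stream_id}: expected version {expected_version}, "
                f"found {len(events)}"
            )
        events.extend(new_events)


def execute_with_retry(store: StreamStore, stream_id: str,
                       decide: Callable[[List[str]], List[str]],
                       max_attempts: int = 5) -> int:
    """Re-read, re-decide, and re-append until the write succeeds.

    Returns the number of attempts used. The limit exists only to surface
    a broken retry mechanism, not to give up under normal contention.
    """
    for attempt in range(1, max_attempts + 1):
        history, version = store.read(stream_id)
        try:
            store.append(stream_id, version, decide(history))
            return attempt
        except ConcurrencyError:
            continue  # someone else wrote first; try again on fresh state
    raise RuntimeError(f"gave up on {stream_id} after {max_attempts} attempts")
```

Note that `decide` runs again on every attempt, so the decision is always made against the latest history rather than blindly re-sent.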
As a rule: in service architectures, which includes microservices, each service tracks its state in a private database.
"Private" here primarily means that no other service is permitted to write to or read from it. This could mean that each service has a dedicated database server of its own, or services might share a single appliance but only have access permissions for their own piece.
Expressed another way: services communicate with each other by sharing information via the public API, not by writing messages into each other's databases.
For services using event sourcing, each service would have read and write access only to its streams. If those streams happen to be stored on the same home - fine; but the correctness of the system should not depend on different services storing their events on the same appliance.
TLDR: All of these patterns apply to a single bounded context (service, if you like). Don't distribute domain events outside your bounded context; publish integration events onto an ESB (enterprise service bus) or something similar as the public interface.
Ok so we have three patterns here to briefly cover individually and then together.
Microservices
CQRS
Event Sourcing
Microservices
https://learn.microsoft.com/en-us/azure/architecture/microservices/
Core objective: Isolate and decouple changes in a system to individual services, enabling independent deployment and testing without collateral impact.
This is achieved by encapsulating change behind a public API and limiting runtime dependencies between services.
CQRS
https://learn.microsoft.com/en-us/azure/architecture/patterns/cqrs
Core objective: Isolate and decouple write concerns from read concerns in a single service.
This can be achieved in a few ways, but the core idea is that the read model is a projection of the write model optimised for querying.
Event Sourcing
https://learn.microsoft.com/en-us/azure/architecture/patterns/event-sourcing
Core objective: Use the business domain rules as your data model.
This is achieved by modelling state as an append-only stream of immutable domain events and rebuilding the current aggregate state by replaying the stream from the start.
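A minimal sketch of replaying a stream to rebuild state, using a hypothetical bank-account aggregate (the event names and the `Account` type are invented for illustration). State is never stored directly; it is a left fold of `apply` over the events, starting from an empty aggregate.

```python
from dataclasses import dataclass, replace
from functools import reduce
from typing import Iterable, Tuple

Event = Tuple[str, int]  # (event type, amount in cents; 0 when unused)


@dataclass(frozen=True)
class Account:
    balance: int = 0
    closed: bool = False


def apply(state: Account, event: Event) -> Account:
    """Pure transition: current state + one event -> next state."""
    kind, amount = event
    if kind == "Deposited":
        return replace(state, balance=state.balance + amount)
    if kind == "Withdrawn":
        return replace(state, balance=state.balance - amount)
    if kind == "Closed":
        return replace(state, closed=True)
    raise ValueError(f"unknown event type: {kind}")


def replay(events: Iterable[Event]) -> Account:
    """Rebuild the aggregate by replaying the stream from the start."""
    return reduce(apply, events, Account())
```

For example, replaying `[("Deposited", 100), ("Withdrawn", 30)]` yields an open account with a balance of 70.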
All Together
There is a lot of great content here https://learn.microsoft.com/en-us/previous-versions/msp-n-p/jj554200(v=pandp.10)
Each of these has its own complexity, trade-offs and challenges, and while it is a fun exercise, you should consider whether the costs outweigh the benefits. All of them apply within a single service or bounded context. As soon as you start sharing a data store between services, you open yourself up to issues, as the shared data store cannot be changed in isolation: it is now a public interface.
Rather, try publishing integration events to a shared bus as the public interface for other services and bounded contexts to consume and use to build projections of other domain contexts' data.
It's a good idea to publish integration events as idempotent snapshots of the current aggregate state (upsert X, delete X), especially if your bus is not persistent. This allows you to republish integration events from a domain if needed without producing an inconsistent state between consumers.
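A minimal sketch of why snapshot-style integration events are safe to republish. The message shape (`op`, `id`, `state`) and `apply_integration_event` are assumptions made up for this example; the consumer simply overwrites or removes its local copy, so applying the same message twice leaves it in the same state.

```python
from typing import Dict


def apply_integration_event(view: Dict[str, dict], event: dict) -> Dict[str, dict]:
    """Apply an 'upsert X' or 'delete X' message to a consumer's local view."""
    if event["op"] == "upsert":
        # The message carries the full current state, not a delta.
        view[event["id"]] = dict(event["state"])
    elif event["op"] == "delete":
        view.pop(event["id"], None)  # deleting twice is harmless
    else:
        raise ValueError(f"unknown op: {event['op']}")
    return view


messages = [
    {"op": "upsert", "id": "c-1", "state": {"name": "Ada", "tier": "gold"}},
    {"op": "upsert", "id": "c-2", "state": {"name": "Bob", "tier": "basic"}},
    {"op": "delete", "id": "c-2"},
]

once: Dict[str, dict] = {}
for message in messages:
    apply_integration_event(once, message)

# Republishing the whole sequence to a consumer that already saw it
# ends in the same state.
twice: Dict[str, dict] = {}
for message in messages + messages:
    apply_integration_event(twice, message)
```

A delta-style message such as "add 10 loyalty points" would not have this property, which is the reason for preferring snapshots on a non-persistent bus.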
How can I share a database connection among Spring Cloud microservices? If there are many microservices, how can I use the same DB connection, or should I use one DB connection per microservice?
In my opinion, the thing that you've asked for is impossible, simply because each microservice is a dedicated process that runs inside its own JVM (probably on more than one server). When you create a connection to the database (assuming you use a connection pool), it's always at the level of a single JVM.
I understand that chances are you meant something different, but I had to put this first because it directly answers your question.
Now, you can share the same database between microservices (the same schema, tables, etc) so that each JVM will have a set of connections opened (in accordance with connection pool definitions).
However, this is a really bad practice: you don't want to share databases between microservices. The reason is the cost of change: if you (as a maintainer of microservice A) decide to, say, alter one of the tables, now all microservices will have to support this, and this is not a trivial thing to do.
So, a better approach is to have a service that has a "sole responsibility" for your data in some domain. Now, all the services could contact this service and ask for the required data through well-established APIs that should never be broken. In this approach, the cost of change is much "cheaper" since only this "data service" should be changed in a way that it doesn't break existing APIs.
Now, regarding the database connection thing: you usually will have more than one JVM that runs the same microservice (like the data microservice), so it's not that you share connections between them, but rather that you share the same way of working with the database (because, after all, it's the same code).
When dealing with a microservice architecture, it is usually the case that you have a distributed system.
Most microservices that communicate with each other are not on the same machine, instance or container. Communication between them is most commonly done via HTTP, though there are many other ways.
I would suggest designing microservices around a single concern of your application. For example, in your case, you could have a "persistence microservice" that would be responsible for dealing with data persistence operations on a single or multiple types of data stores. It could possibly deal with relational DBs, NoSQL, file storage, etc. Then, via REST endpoints, you can expose any persistence functionality to the microservices that deal with business logic.
A very easy way to build a REST service like this would be with the help of the Spring Data REST project.
To answer your actual question, I'm not aware of any way to share actual connections between processes. Beyond that, having many microservices running on the same instance is not a good practice most of the time.
Microservices are very popular these days and everybody is trying to transition to them. My advice would be to make sure you don't "over-engineer" your project.
Hope I didn't misunderstand your question, but to be fair it is a little vague. If you could provide a longer, more detailed description of your architecture and use case, I can suggest more tools/frameworks you can use to achieve your cloudy goals.
First and most important: your microservice should be responsible for handling all data in a given business domain/bounded context. So the question is: 'Why do you need to share a database connection between microservices, and isn't this a sign you went too far with slicing your system?' A microservice is a tool, and the word 'micro' may be a bit misleading :)
For more reading I would suggest e.g. https://learn.microsoft.com/en-us/dotnet/standard/microservices-architecture/architect-microservice-container-applications/identify-microservice-domain-model-boundaries (don't worry, it's general enough to be applicable to Spring as well).