Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I am currently building an app, and I would like to use microservices as the pattern and GraphQL for communication. I am thinking about using Kafka / RabbitMQ + AuthZ + Auth0 + Apollo + Prisma, all running on Docker.
I found many resources on event sourcing and its advantages/disadvantages, but I am stuck on how it works in the real world. So far, this is how I plan to do it:
Apollo Engine to monitor requests / responses
Auth0 for authentication management
AuthZ for authorization
A GraphQL gateway. Sadly, I did not find a reliable solution; I guess I have to build it myself using Apollo + graphql-tools to merge the schemas.
And ideally:
Prisma for the read side of Bill's MS
Node.js for the write side of Bill's MS
Now, if I understand correctly, using Apache Kafka + ZooKeeper:
Kafka as the message broker
ZooKeeper as an event store.
If I am right, can I assume:
There would be 2 ways to validate if the request is valid:
The write side only gets events (from the event store, AKA ZooKeeper) to validate whether the requested mutation is possible.
The write side gets a snapshot from a traditional database to validate the requested mutation.
Then it publishes an event to Kafka (I assume Kafka updates ZooKeeper automatically), and the message can be used by the read side to update a private snapshot of the entity. Of course, this message can also be used by other MSs.
I do not know Apache Kafka + ZooKeeper very well; in the past I only used messaging services such as RabbitMQ. They seem similar in shape but very different in usage.
Is the main difference between event sourcing and basic messaging the use of the event store instead of an entity's snapshot? In that case, can we assume that not all MSs need an event-store tactic (I mean, validating via the event store and not via a "private" database)? If yes, can anyone explain when you need an event store and when you don't?
I'll try to answer your major concerns at a conceptual level, without getting tied up in the specifics of frameworks and implementations. I hope this helps.
There would be 2 ways to validate if the request is valid:
1. The write side only gets events (from the event store, AKA ZooKeeper) to validate whether the requested mutation is possible.
2. The write side gets a snapshot from a traditional database to validate the requested mutation.
I'd go with the first option. To execute a command, you should rely on the current event stream as the authority for determining your model's current state.
The read model of your architecture is only eventually consistent, which means there is an arbitrary delay between a command happening and it being reflected in the read model. Although you can work on your architecture to keep this delay as small as possible (even ignoring the costs of doing so), you will always have a window where your read model is not yet up to date.
That being said, your commands should be run against a command model built from your current event store.
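A minimal sketch of the first option, in Python. Everything in it is hypothetical (the `InMemoryEventStore` class, the event dicts, the `handle_rename` handler) and none of it is the Kafka or ZooKeeper API. The handler rebuilds the current state from the stream, validates the command against that state, and appends with an expected version, so a concurrent write is rejected rather than silently overwritten.

```python
from dataclasses import dataclass, field


class ConcurrencyError(Exception):
    """Raised when the stream changed after the handler read it."""


@dataclass
class InMemoryEventStore:
    # Hypothetical store: stream id -> ordered list of event dicts.
    streams: dict = field(default_factory=dict)

    def load(self, stream_id):
        return list(self.streams.get(stream_id, []))

    def append(self, stream_id, events, expected_version):
        current = self.streams.setdefault(stream_id, [])
        # Optimistic concurrency: refuse the write if another command appended first.
        if len(current) != expected_version:
            raise ConcurrencyError(f"expected version {expected_version}, found {len(current)}")
        current.extend(events)


def current_state(events):
    """Fold the stream into the state the command needs to validate itself."""
    state = {"exists": False, "name": None}
    for event in events:
        if event["type"] == "Created":
            state = {"exists": True, "name": event["name"]}
        elif event["type"] == "Renamed":
            state["name"] = event["name"]
    return state


def handle_rename(store, entity_id, new_name):
    history = store.load(entity_id)
    state = current_state(history)
    # Validate against the event stream, never against the eventually consistent read model.
    if not state["exists"]:
        raise ValueError(f"entity {entity_id} does not exist")
    if state["name"] == new_name:
        return []  # already in the requested state: emit nothing
    new_events = [{"type": "Renamed", "name": new_name}]
    store.append(entity_id, new_events, expected_version=len(history))
    return new_events


store = InMemoryEventStore()
store.append("1", [{"type": "Created", "name": "Something"}], expected_version=0)
print(handle_rename(store, "1", "Other Thing"))  # [{'type': 'Renamed', 'name': 'Other Thing'}]
```

The expected-version check is one common way to make "validate, then append" safe when two commands race on the same stream; a real event store would perform that check atomically on the server side.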
Is the main difference between event sourcing and basic messaging the use of the event store instead of an entity's snapshot? In that case, can we assume that not all MSs need an event-store tactic (I mean, validating via the event store and not via a "private" database)? If yes, can anyone explain when you need an event store and when you don't?
The whole concept of Event Sourcing is: instead of storing your state as an "updatable" piece of data which only reflects the latest stage of such data, you store your state as a series of actions (events) that can be interpreted to reach such state.
So, imagine you have a piece of your domain which reads (in free-form notation):
Entity A = { Id: 1; Name: "Something"; }
Then something happens, and a command arrives to change the name of this entity to "Other Thing".
In a traditional storage, you would reach for such record and update it to:
{ Id: 1; Name: "Other Thing"; }
But in an event-sourced storage, you wouldn't have such a record; instead, you would have an event stream with data such as:
{Entity Created with Id = 1} > {Entity with Id = 1 renamed to "Something"} > {Entity with Id = 1 renamed to "Other Thing"}
Now if you "replay" these events in order, you will reach the same state as the traditional storage, except that you will "know" how you got to that state, whereas the traditional storage forgets that history.
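The replay can be sketched as a left fold over the stream. This assumes events are plain `(type, value)` tuples, an illustrative encoding rather than any particular event store's format:

```python
from functools import reduce


def apply_event(state, event):
    """Return the state after one event, without mutating the previous state."""
    kind, value = event
    if kind == "created":
        return {"Id": value, "Name": None}
    if kind == "renamed":
        return {**state, "Name": value}
    raise ValueError(f"unknown event type {kind!r}")


# The event stream from the example above.
stream = [
    ("created", 1),
    ("renamed", "Something"),
    ("renamed", "Other Thing"),
]

# Replaying all events in order reproduces the record a traditional store would hold.
state = reduce(apply_event, stream, None)
print(state)  # {'Id': 1, 'Name': 'Other Thing'}
```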
Now, to answer your question, you're absolutely right. Not all microservices should use an event store; that isn't even recommended. In fact, in a microservices architecture each microservice should have its own persistence mechanism (often each a different technology), and no microservice should have direct access to another's persistence (as your diagram implies with "Another MS" reaching into the "Event Store" of your "Bill's MS").
So the basic decision factors for you should be:
Is your microservice one where you gain more from actively storing the evolution of state inside the domain (rather than just reactively logging it)?
Is your microservice's domain one where you are interested in analyzing old computations? (That is, being able to restore the domain to a given point in time so you can understand how its state evolved; consider here something such as complex auditing, where you want to understand past computations.)
Even if you answer "yes" to both of these questions... will the added complexity of such architecture be worth it?
Just as a closing remark on this topic, note there are multiple patterns intertwined in your model:
Event Sourcing is just the act of storing state as a series of actions instead of an updatable central data-hub.
The pattern that deals with having Read Model vs Command Model is called CQRS (Command-Query Responsibility Segregation)
These 2 patterns are frequently used together because they match up so nicely but this is not a prerequisite. You can store your data with events and not use CQRS to split into two models AND you can organize your domain in two models (commands and queries) without storing any of them primarily as events.
I'm new to ES and still trying to sort everything out in my head. I have heard that ES actually solves the consistency issue between the write and read databases (with some delay, for sure), but I still do not fully understand how.
If a command comes to the domain and the aggregate root fires an event to update the event store, is the same event sent to update the read side? But what if the message is lost? Then we will have an outdated read side.
Are projections the only solution? That is, instead of being updated from events, the read side walks through the event store and reproduces the aggregate (from the beginning or from some snapshot). But in that case it probably breaks some rules, since the read side should be simple and should not know about the domain. Also, the read side is usually a separate application, so it can't know about the aggregate.
Sure, we can also use RabbitMQ or some other message broker to avoid losing messages, and I actually think we need to. But I also read that to make it consistent "you can use rabbit or ES"; again, how can ES make it consistent on its own?
Benjamin is completely right about the purpose of Event Sourcing.
My answer aims to add some more details.
First:
Read models and projections aren't supposed to represent the aggregate state.
Projections are the way for event-sourced systems to build the read model for CQRS. CQRS in essence postulates that write and read models usually serve different purposes and therefore it makes perfect sense to use another model for the read side.
Therefore, you often find multiple projections building different, narrowly purposed models, targeting specific needs for queries.
Second:
By "solving consistency issues" you probably mean that in event-sourced systems each state transition is represented as an event (or multiple events). Therefore, writes are always transactional. The database you choose as your event store should support (possibly using some library or additional tool) real-time subscriptions that allow your projection to receive new events in order. A new projection starts reading from the beginning and eventually catches up to real time. Subscriptions usually need to keep the current processing position in the global stream of events, so that when the projection restarts, it starts receiving events from the last position it knows about.
By doing this, you will guarantee that every state transition in the write model will be reflected in the read model. This is probably what you mean in your original question.
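A rough sketch of such a checkpointed subscription, assuming a hypothetical in-memory `EventLog` with a single global order. A real event store would push new events to the subscriber; this version polls with `catch_up`, which is enough to show how the stored position lets a restarted projection resume and a new projection start from zero:

```python
from dataclasses import dataclass, field


@dataclass
class EventLog:
    # Hypothetical event store: one globally ordered, append-only list.
    events: list = field(default_factory=list)

    def append(self, event):
        self.events.append(event)

    def read_from(self, position):
        # Yield (position, event) pairs in order, starting at `position`.
        for pos in range(position, len(self.events)):
            yield pos, self.events[pos]


@dataclass
class Projection:
    view: dict = field(default_factory=dict)
    checkpoint: int = 0  # next global position to process

    def handle(self, event):
        if event["type"] == "ItemAdded":
            self.view[event["id"]] = event["name"]
        elif event["type"] == "ItemRemoved":
            self.view.pop(event["id"], None)

    def catch_up(self, log):
        for pos, event in log.read_from(self.checkpoint):
            self.handle(event)
            # Advance the checkpoint only after the event has been applied.
            self.checkpoint = pos + 1


log = EventLog()
log.append({"type": "ItemAdded", "id": 1, "name": "a"})
log.append({"type": "ItemAdded", "id": 2, "name": "b"})

projection = Projection()
projection.catch_up(log)  # processes positions 0 and 1
log.append({"type": "ItemRemoved", "id": 1})
projection.catch_up(log)  # resumes at position 2 only
print(projection.view, projection.checkpoint)  # {2: 'b'} 3
```

In a real system the view update and the checkpoint should be persisted in the same transaction; otherwise a crash between the two either replays or skips an event.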
Third:
Now, all those things above imply that you cannot use a message bus (only) to deliver events to projections. Brokers give no ordering guarantees and can deliver one message more than once. Also, message brokers don't keep history so you cannot build new projections at will.
However, it doesn't mean that you can't use brokers at all. Some projections don't require ordering and are idempotent. But the feed for events to publish via a broker is the same subscription, so you get guaranteed delivery and can read past events if necessary.
Fourth:
CQRS doesn't imply separate databases. Sometimes, using CQRS just means that you use some persistence layer for your domain objects, so you read and write aggregates. But for queries, you just query at will, whatever you want. A database view is a technical example of CQRS.
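To illustrate the database-view case, here is a small sketch using Python's built-in `sqlite3`; the tables and the `customer_order_summary` view are made up for the example. Commands write to normalized tables, while queries read a view shaped for one screen, all in the same database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Write model: normalized tables the domain code reads and writes.
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total_cents INTEGER NOT NULL
    );
    -- Read model: a view shaped for one query, in the same database.
    CREATE VIEW customer_order_summary AS
        SELECT c.name AS customer,
               COUNT(o.id) AS order_count,
               COALESCE(SUM(o.total_cents), 0) AS spent_cents
        FROM customers c LEFT JOIN orders o ON o.customer_id = c.id
        GROUP BY c.id;
""")

with conn:  # commands go through the write model
    conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada'), (2, 'Bob')")
    conn.executemany(
        "INSERT INTO orders (customer_id, total_cents) VALUES (?, ?)",
        [(1, 1500), (1, 2500)],
    )

# Queries go through the read model.
rows = conn.execute(
    "SELECT customer, order_count, spent_cents FROM customer_order_summary ORDER BY customer"
).fetchall()
print(rows)  # [('Ada', 2, 4000), ('Bob', 0, 0)]
```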
Almost there:
Projections need to have little to no logic, it is true. The main point here is to ensure idempotency, if possible, so projections usually should not use operations to calculate new values based on old values and information from events.
But projections will know about your domain. Everything in your system should know about your domain.
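The difference between deriving a new value from the old one and copying a value carried by the event can be shown with two hypothetical projection functions. The sketch assumes the event includes the resulting balance (`balance_after`), which is a schema choice rather than something every event will have:

```python
def project_increment(view, event):
    # Derives the new value from the old one: a redelivered event is counted twice.
    view[event["account"]] = view.get(event["account"], 0) + event["amount"]


def project_set(view, event):
    # Copies the value carried by the event: applying it again changes nothing.
    view[event["account"]] = event["balance_after"]


event = {"account": "acc-1", "amount": 10, "balance_after": 110}
incremented = {"acc-1": 100}
copied = {"acc-1": 100}

for _ in range(2):  # simulate a broker delivering the same event twice
    project_increment(incremented, event)
    project_set(copied, event)

print(incremented)  # {'acc-1': 120}  (wrong: the duplicate was applied)
print(copied)       # {'acc-1': 110}  (correct despite the duplicate)
```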
And last:
You can definitely use different databases for write and read models without getting to Event Sourcing. You just need to choose a database that supports a change feed. SQL Server, Postgres, CosmosDb and other databases have such functionality.
P.S. I'd suggest spending some time studying these concepts. I can point you to the book repository, which has CQRS and Event Sourcing examples: https://github.com/PacktPublishing/Hands-On-Domain-Driven-Design-with-.NET-Core
I have heard that ES is actually solving the consistency issue between write and read database
To the best of my knowledge, event sourcing has NOTHING to do with consistency between reads and writes to your database. Read/write consistency actually has more to do with the type of database you are using: relational databases are mostly ACID, whereas non-relational databases are often eventually consistent.
ES is not meant for that. Instead, in Martin Fowler's words, ES means: "Capture all changes to an application state as a sequence of events".
ES works like a time machine, which allows you to restore the state of your application to a specific point in time in the past.
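The "time machine" amounts to replaying only the events recorded up to a chosen moment. A minimal sketch, assuming a hypothetical, already ordered stream of timestamped account events:

```python
from datetime import datetime

# Hypothetical stream of (timestamp, event) pairs for one account, in order.
stream = [
    (datetime(2024, 1, 1, 9, 0), ("deposited", 100)),
    (datetime(2024, 1, 5, 12, 0), ("withdrew", 30)),
    (datetime(2024, 2, 1, 8, 0), ("deposited", 50)),
]


def balance_as_of(events, moment):
    """Replay only the events recorded at or before `moment`."""
    balance = 0
    for recorded_at, (kind, amount) in events:
        if recorded_at > moment:
            break  # the stream is ordered, so nothing later applies
        balance += amount if kind == "deposited" else -amount
    return balance


print(balance_as_of(stream, datetime(2024, 1, 3)))    # 100
print(balance_as_of(stream, datetime(2024, 1, 31)))   # 70
print(balance_as_of(stream, datetime(2024, 12, 31)))  # 120
```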
Two Generals' Problem - EventStore and persistence layer?
I would like to understand how the industry actually deals with this problem!
Suppose microservice 1 persists object X into database A. At the same time, so that microservice 2 can consume the data from microservice 1, microservice 1 writes the same object X to event store B.
Now, the question I have is, where do I write object X first?
1. Database A first and then event store B: is it fair to roll back the thread at the app level if database A is down? Also, what would be the ideal error handling if database A is online and has persisted object X, but event store B is down?
2. What should the error handling look like if we go the other way around from point 1?
I do understand that in today's world of distributed, highly available systems, a system going down is arguably rare. But it can happen. I want to understand what needs to be done when either the database or the event store system/cluster is down.
In general you want to avoid relying on a two-phase commit of the kind you describe.
In general (presuming an event-sourced system; I'm not sure whether that's implicit in your question or an option for you, but perhaps SqlStreamStore might be relevant in your context), this is typically managed by having something project from a single authoritative set of events on a pull basis: each downstream that needs to act on the written events maintains a pointer to how far it has got projecting events from the base stream, and restarts from there if interrupted.
First of all, an Event store is a type of Persistence, which stores the applications state as a series of events as opposed to a flat persistence that stores the last projected state.
Suppose microservice 1 persists object X into database A. At the same time, so that microservice 2 can consume the data from microservice 1, microservice 1 writes the same object X to event store B.
You are trying to have two sources of truth that must be kept in sync by some sort of distributed transaction which is not very scalable.
This is an unusual way of using an Event store. In general, an Event store is the canonical source of information, the single source of truth. You are trying to use it as a communication channel. The Event store is the persistence of an event-sourced Aggregate (see Domain Driven Design).
I see two options:
You could refactor your architecture and make object X an event-sourced entity, with the Event store as its persistence. Then have a read model subscribe to the Event store and build a flat representation of object X that is persisted in database A. In other words, write first to the Event store and then to database A (but in an eventually consistent manner!). This is a big jump, and you should really think about whether you want to go event-sourced.
You could use CQRS without Event sourcing. This means that after every modification, object X emits one or more Domain events that are persisted in database A in the same local transaction as object X itself. Microservice 2 could then subscribe to database A to get the emitted events. The actual subscription mechanism depends on the type of database.
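This second option is commonly called the transactional outbox pattern. A minimal sketch with Python's built-in `sqlite3` standing in for database A; the `objects` and `outbox` tables are assumptions for illustration, and Microservice 2 simply reads the outbox in order instead of using a database-specific subscription:

```python
import json
import sqlite3

# Stand-in for "database A": the entity table and the outbox live in the same database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE objects (id TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE outbox (
        seq INTEGER PRIMARY KEY AUTOINCREMENT,
        event_type TEXT NOT NULL,
        payload TEXT NOT NULL
    );
""")


def rename_object(conn, object_id, new_name):
    # One local transaction: the state change and its domain event commit (or roll back) together.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO objects (id, name) VALUES (?, ?)",
            (object_id, new_name),
        )
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("ObjectXRenamed", json.dumps({"id": object_id, "name": new_name})),
        )


rename_object(conn, "x-1", "first")
rename_object(conn, "x-1", "second")

# If writing the event fails, the entity change is rolled back as well.
try:
    with conn:
        conn.execute("INSERT OR REPLACE INTO objects (id, name) VALUES ('x-2', 'orphan')")
        conn.execute("INSERT INTO outbox (event_type, payload) VALUES ('ObjectXRenamed', NULL)")
except sqlite3.IntegrityError:
    pass

# Microservice 2 (or a relay process) reads the events in commit order.
events = conn.execute("SELECT seq, event_type, payload FROM outbox ORDER BY seq").fetchall()
print([json.loads(payload)["name"] for _, _, payload in events])  # ['first', 'second']
```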
I have a feeling you are using the event store as a channel of communication instead of as a database. If you want microservice 2 to feed on the data from microservice 1, then you should communicate through REST services.
Of course, relying on REST services might make you less resilient to outages. In that case, using a piece of technology dedicated to communication would be the right way to go. (I'm thinking MQ/Topics, such as RabbitMQ, Kafka, etc.)
Then, once your services are talking to each other, you will still need to persist your data... but only at one single location.
Therefore, you will need to define where you want to store the data.
Ask yourself:
Who will have governance over the data persistence?
Is it Microservice1? If so, then every time Microservice2 needs to read the data, it will make a REST call to Microservice1.
Is it the other way around? Does Microservice2 have governance over the data, with Microservice1 consuming it?
It could be a third microservice that you haven't even created yet. It depends how you applied your separation of concerns.
Let's take an example:
Microservice1's responsibility is to process our data to export them in PDF and other formats
Microservice2's responsibility is to expose a service for a legacy partner, that requires our data to be returned in a very proprietary representation.
Who is going to store the data here?
Microservice1 should not be the one to persist the data : its job is only to convert the data to other formats. If it requires some data, it will fetch them from the one having the governance of the data.
Microservice2 should not be the one to persist the data. After all, maybe we have a number of other Microservices similar to this one, but for other partners, with different proprietary formats.
If there is a service where you can do CRUD operations, this is your guy. If you don't have such a service, maybe you can find an existing Microservice who wouldn't have conflicting responsibilities.
For instance, suppose I have a Microservice3 that, every time my ObjectX is changed, sends a PDF representation of it to some address and notifies all my partners that the data are out of date. In that scenario, this Microservice looks like a good candidate to become the "governor of the data" for this part of the domain, and to be the one-stop shop for writing to and reading from the database.
I am currently building a microservices-based application developed with the MEAN stack and am running into several situations where I need to share models between bounded contexts.
As an example, I have a User service that handles the registration process as well as login (generating a JWT), logout, etc. I also have a File service which handles the uploading of profile pics and other images the user happens to upload. Additionally, I have a Friends service that keeps track of the associations between members.
Currently, I am adding the GUID of the user from the user table used by the User service, as well as the first, middle and last name fields, to the File table and the Friend table. This way I can query these fields whenever I need them in the other services (Friend and File) without needing to make a REST call to get the information every time it is queried.
Here is the caveat:
The downside seems to be that I have to notify the File and Friend tables (I chose Seneca with RabbitMQ) whenever a user updates their information in the User table.
1) Should I be worried about the services getting too chatty?
2) Could this lead to any performance issues if, let's say, a lot of updates take place over an hour?
3) In trying to isolate boundaries, I just am not seeing another way of pulling this off. What is the recommended approach to solving this issue, and am I on the right track?
It's a trade off. I would personally not store the user details alongside the user identifier in the dependent services. But neither would I query the users service to get this information. What you probably need is some kind of read-model for the system as a whole, which can store this data in a way which is optimized for your particular needs (reporting, displaying together on a webpage etc).
The read-model is a pattern which is popular in the event-driven architecture space. There is a really good article that talks about these kinds of questions (in two parts):
https://www.infoq.com/articles/microservices-aggregates-events-cqrs-part-1-richardson
https://www.infoq.com/articles/microservices-aggregates-events-cqrs-part-2-richardson
Many common questions about microservices seem to be largely around the decomposition of a domain model, and how to overcome situations where requirements such as querying resist that decomposition. This article spells the options out clearly. Definitely worth the time to read.
In your specific case, it would mean that the File and Friends services would only need to store the primary key for the user. However, all services should publish state changes which can then be aggregated into a read-model.
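A sketch of such a read model, assuming hypothetical `UserRegistered`, `UserRenamed` and `FileUploaded` events published by the User and File services. The File service keeps only `user_id`; names are joined in on the read side only:

```python
from collections import defaultdict


class UserFilesReadModel:
    """Denormalized view built only from published events."""

    def __init__(self):
        self.user_names = {}
        self.files_by_user = defaultdict(list)

    def apply(self, event):
        kind = event["type"]
        if kind in ("UserRegistered", "UserRenamed"):  # from the User service
            self.user_names[event["user_id"]] = event["full_name"]
        elif kind == "FileUploaded":  # from the File service, which stores only user_id
            self.files_by_user[event["user_id"]].append(event["file_name"])

    def profile(self, user_id):
        return {
            "name": self.user_names.get(user_id),
            "files": list(self.files_by_user.get(user_id, [])),
        }


read_model = UserFilesReadModel()
for event in [
    {"type": "UserRegistered", "user_id": "u1", "full_name": "Ada Lovelace"},
    {"type": "FileUploaded", "user_id": "u1", "file_name": "avatar.png"},
    {"type": "UserRenamed", "user_id": "u1", "full_name": "Ada King"},
]:
    read_model.apply(event)

print(read_model.profile("u1"))  # {'name': 'Ada King', 'files': ['avatar.png']}
```

A rename now touches only the read model, not the File and Friends tables, which is the chattiness the question was worried about.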
If you are worried about a high volume of messages and high TPS, for example 100,000 TPS for producing and consuming events, I suggest using Apache Kafka or NATS (the Go version, because NATS also has a Ruby version) instead of RabbitMQ, in order to support a high volume of messages per second.
Also, regarding database design, you should design each microservice around business capabilities and bounded contexts, according to domain-driven design (DDD). Unlike in SOA, it is suggested that each microservice have its own database, so you should not worry about normalization: you may have to repeat many structures, fields, tables and features in each microservice in order to keep them decoupled from each other and let them work independently, which improves availability and scalability.
You can also use Event Sourcing + CQRS, or Transaction Log Tailing, to avoid 2PC (two-phase commit), which is not recommended when implementing microservices, and still exchange events between your microservices and update state with eventual consistency, in line with the CAP theorem.
I'm developing a small CQRS+ES framework and building applications with it. In my system, I need to log some client actions and use them for analytics and statistics, and maybe in the future do something in the domain with them. For example, a client (on the web) downloads some resource(s), and I need to save the date, time, type (download, partial, ...), region or country (maybe from the IP), etc. After that, in some view, the client can see the download count or some complex report. I'm not sure how to implement this feature.
The first solution is to create an analytics context and some aggregate; on each client action, send a command like IncreaseDownloadCounter(resourced), then handle the command, raise domain events and update the view. But in this scenario the download has already occurred before I send the command, so this is not really a command, and on top of that, version conflicts increase.
The second solution is to raise an event from the client side and update the view model based on it. But with this type of handling, my event is not stored in the event store, because it is not raised by a command and never changes any domain context. If I do store it in the event store, there is no aggregate to handle it when it is fetched for some other use.
The third solution is to raise an event from the client side and store it in another database, maybe with a special table for each type of event. But with this approach I have multiple event stores with different schemas, which makes it difficult to recreate view models and to trace events when recreating context states. So if in the future I add some domain that uses these types of events, it will be difficult to use them.
What is the best approach and solution for this scenario?
First solution creates analytic context and some aggregate
Unquestionably the wrong answer; the event has already happened, so it is too late for the domain model to complain.
What you have is a stream of events. Putting them in the same event store that you use for your aggregate event streams is fine. Putting them in a separate store is also fine. So you are going to need some other constraint to make a good choice.
Typically, reads vastly outnumber writes, so one concern might be that these events are going to saturate the domain store. That might push you towards storing these events separately from your data model (prior art: we typically keep the business data in our persistent book of record, but the sequence of http requests received by the server is typically written instead to a log...)
If you are supporting an operational view, challenge the requirement that the state be recovered after a restart. You might be able to get by with building your view from an in-memory model of the event counts, and use something more practical for the representation of the events themselves.
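A minimal sketch of that in-memory view, assuming a hypothetical `DownloadRecorded` event reported by the client. No aggregate or command is involved; after a restart the view is simply rebuilt by replaying the raw log:

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DownloadRecorded:
    # Hypothetical client-reported fact: it already happened, so nothing is validated.
    resource_id: str
    kind: str  # e.g. "full" or "partial"
    country: str


class DownloadStatsView:
    """In-memory counters, rebuilt from the raw event log after a restart."""

    def __init__(self):
        self.by_resource = Counter()
        self.by_country = Counter()

    def apply(self, event):
        self.by_resource[event.resource_id] += 1
        self.by_country[event.country] += 1

    @classmethod
    def rebuild(cls, log):
        view = cls()
        for event in log:
            view.apply(event)
        return view


log = [
    DownloadRecorded("r1", "full", "DE"),
    DownloadRecorded("r1", "partial", "FR"),
    DownloadRecorded("r2", "full", "DE"),
]
view = DownloadStatsView.rebuild(log)
print(view.by_resource["r1"], view.by_country["DE"])  # 2 2
```

Because the counters increment, the log itself must not contain duplicates; if the source can redeliver, de-duplicate by event id before applying.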
Thanks for your thorough answer. So I should create something like the ES schema without some fields (aggregate name or type, version, etc.) and collect client events in that repository; then some offline process reads them and updates the read model, or creates commands to do something in the domain space.
Something like that, yes. If the view for the client doesn't actually require any validation by your model at all, then building the read model from the externally provided events is fine.
Are you recommending saving some claim or authorization token of the user and the sender app, for validation in another process?
Maybe, maybe not. The token describes the authority of the event; our own event handler is the authority for the command(s) that is/are derived from the events. It's an interesting question that probably requires more context -- I'd suggest you open a new question on that point.
For a couple of days I've been trying to figure out how to inform the rest of the microservices that a new entity was created in microservice A, which stores that entity in MongoDB.
I want to:
Have low coupling between the microservices
Avoid distributed transactions between microservices like Two Phase Commit (2PC)
At first, a message broker like RabbitMQ seems to be a good tool for the job, but then I see the problem that committing the new document in MongoDB and publishing the message to the broker is not atomic.
Why event sourcing? by eventuate.io:
One way of solving this issue is to make the schema of the documents a bit dirtier by adding a mark that says whether the document has been published to the broker, and to have a scheduled background process that searches for unpublished documents in MongoDB and publishes them to the broker using confirmations; when the confirmation arrives, the document is marked as published (using at-least-once and idempotency semantics). This solution is proposed in this and this answer.
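The flag-and-poll approach can be sketched as below. An in-memory list of `Document` objects stands in for the MongoDB collection and `FakeBroker` for a broker with publisher confirms; neither uses a real driver API, and all names are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    id: str
    body: dict
    published: bool = False  # the extra marker added to the document schema


@dataclass
class FakeBroker:
    # Stand-in for a broker with publisher confirms; `fail_next` simulates an outage.
    delivered: list = field(default_factory=list)
    fail_next: bool = False

    def publish_and_confirm(self, message):
        if self.fail_next:
            self.fail_next = False
            raise ConnectionError("broker unavailable")
        self.delivered.append(message)


def publish_pending(collection, broker):
    """One pass of the scheduled background job."""
    for doc in collection:
        if doc.published:
            continue
        try:
            broker.publish_and_confirm({"id": doc.id, **doc.body})
        except ConnectionError:
            return  # leave the rest unpublished; the next scheduled pass retries
        doc.published = True  # mark only after the confirmation arrived


docs = [Document("a", {"v": 1}), Document("b", {"v": 2})]
broker = FakeBroker(fail_next=True)
publish_pending(docs, broker)  # broker down: nothing is marked
publish_pending(docs, broker)  # retry publishes both
print([m["id"] for m in broker.delivered])  # ['a', 'b']
```

If the process crashes after the confirmation but before the flag is saved, the next pass publishes the document again, which is why this only guarantees at-least-once delivery and consumers must be idempotent.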
Reading an Introduction to Microservices by Chris Richardson I ended up in this great presentation of Developing functional domain models with event sourcing where one of the slides asked:
How to atomically update the database and publish events without 2PC? (dual write problem).
The answer is simple (on the next slide)
Update the database and publish events
This is a different approach from this one, which is based on CQRS à la Greg Young:
The domain repository is responsible for publishing the events, this would normally be inside a single transaction together with storing the events in the event store.
I think delegating the responsibility for storing and publishing the events to the event store is a good thing, because it avoids the need for 2PC or a background process.
However, in a certain way it's true that:
If you rely on the event store to publish the events you'd have a tight coupling to the storage mechanism.
But we could say the same if we adopt a message broker for communication between the microservices.
The thing that worries me more is that the Event Store seems to become a Single Point of Failure.
If we look at this example from eventuate.io,
we can see that if the event store is down, we can't create accounts or money transfers, losing one of the advantages of microservices (although the system will continue responding to queries).
So, is it correct to say that the Event Store, as used in the eventuate example, is a Single Point of Failure?
What you are facing is an instance of the Two Generals' Problem. Basically, you want two entities on a network to agree on something, but the network is not fail-safe. This has been proven impossible.
So no matter how many entities you add to your network, the message queue being one of them, you will never have 100% certainty that agreement will be reached. In fact, the opposite takes place: the more entities you add to your distributed system, the less certain you can be that an agreement will eventually be reached.
A practical answer in your case is that 2PC is not that bad compared with adding even more complexity and single points of failure. If you absolutely do not want a single point of failure and want to assume that the network is reliable (in other words, that the network itself cannot be a single point of failure), you can try a P2P algorithm such as a DHT, but for two peers I bet it reduces to simple 2PC.
We handle this with the Outbox approach in NServiceBus:
http://docs.particular.net/nservicebus/outbox/
This approach requires that the initial trigger for the whole operation came in as a message on the queue but works very well.
You could also create a flag for each entry inside of the event store which tells if this event was already published. Another process could poll the event store for those unpublished events and put them into a message queue or topic. The disadvantage of this approach is that consumers of this queue or topic must be designed to de-duplicate incoming messages because this pattern does only guarantee at-least-once delivery. Another disadvantage could be latency because of the polling frequency. But since we have already entered the eventually consistent area here this might not be such a big concern.
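On the consumer side, de-duplication can be as simple as remembering processed event ids. A sketch, assuming every message carries a unique `event_id` (a hypothetical field name):

```python
class DeduplicatingConsumer:
    """Wraps a handler so each event id takes effect at most once."""

    def __init__(self, handler):
        self.handler = handler
        self.seen_ids = set()  # in production, persist these with the handler's writes

    def on_message(self, message):
        event_id = message["event_id"]
        if event_id in self.seen_ids:
            return False  # duplicate from at-least-once delivery: ignore it
        self.handler(message)
        self.seen_ids.add(event_id)
        return True


amounts = []
consumer = DeduplicatingConsumer(lambda m: amounts.append(m["amount"]))
for msg in [
    {"event_id": "e1", "amount": 5},
    {"event_id": "e1", "amount": 5},  # redelivered by the poller
    {"event_id": "e2", "amount": 7},
]:
    consumer.on_message(msg)
print(amounts)  # [5, 7]
```

Recording the id only after the handler succeeds means a crash in between can still re-apply the event; storing the id in the same transaction as the handler's own writes closes that gap.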
How about having two event stores, so that whenever a Domain Event is created, it is queued onto both of them? The event handler on the query side then handles events popped from both event stores.
Of course, handling every event should be idempotent.
But wouldn’t this solve our problem of the event store being a single point of entry?
Not specifically a MongoDB solution, but have you considered leveraging the Streams feature introduced in Redis 5 to implement a reliable event store? Take a look at this intro.
I find that it has a rich set of features, like message tailing and message acknowledgement, as well as the ability to extract unacknowledged messages easily. This certainly helps to implement at-least-once messaging guarantees. It also supports load balancing of messages using the "consumer group" concept, which can help with scaling the processing part.
Regarding your concern about being the single point of failure, as per the documentation, streams and consumer information can be replicated across nodes and persisted to disk (using regular Redis mechanisms I believe). This helps address the single point of failure issue. I'm currently considering using this for one of my microservices projects.