In a microservice with event sourcing, should I save commands or events in the service's own DB, or should there be one big DB? - microservices

As the question says, what's the best practice for storing commands and events?
Should I store only commands, since commands will generate the events?

Storing only "commands" works in some settings. For example, if you review what the team at LMAX was sharing about their designs, you'll see that what they were writing into their journals were the input messages.
In their context, they didn't need to worry that the underlying domain model of the process was going to change (that would happen during the daily maintenance window, when everything was quiet), so there was never any question what the state of the system would be after a given sequence of events.
But event-sourcing is normally understood to mean saving a representation of the state of the system -- just that instead of overwriting our data structure, we are extending it (think linked list of changes). The changes we persist tend to be the observable effects of the inputs, rather than the raw inputs alone.

In an event sourcing architecture, events are the source of your state. So you need to store events: the facts recorded in your system.
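To make the distinction concrete, here is a minimal sketch in Python, assuming a toy bank account domain. The names Deposit, Deposited, handle and balance are hypothetical and not taken from any framework. The command is a request that can be rejected; only the resulting event is appended to the store, and state is derived from the events.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deposit:
    """Command (hypothetical): a request that may still be rejected."""
    account_id: str
    amount: int


@dataclass(frozen=True)
class Deposited:
    """Event (hypothetical): a fact that has already happened."""
    account_id: str
    amount: int


def handle(command: Deposit) -> list:
    """Validate the command and return the events it produces."""
    if command.amount <= 0:
        raise ValueError("deposit must be positive")
    return [Deposited(command.account_id, command.amount)]


def balance(events: list) -> int:
    """Derive the current state from the stored events alone."""
    return sum(event.amount for event in events)


event_store = []  # an in-memory list stands in for the real event store
event_store.extend(handle(Deposit("acc-1", 50)))
event_store.extend(handle(Deposit("acc-1", 25)))
print(balance(event_store))  # prints 75
```

A rejected command raises before anything is appended, so the store only ever contains facts.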

Related

Event source the whole system is bad

I'm learning a proper microservice architecture using CQRS, MassTransit and different types of storage for the read side. One thing which often comes along with CQRS is event sourcing. I do understand it's not mandatory at all. However, I can't think of why using it on the whole system is really an anti pattern.
Having a store for all events as a single source of truth can help you build / rebuild a read store on the fly whenever you want.
You are not locked in to any vendor (except for the event store).
For me, the question is more like: is it easier not to start with event sourcing (and still have separate data storage depending on the microservice, e.g. Elasticsearch, MongoDB, etc.), migrating and provisioning whenever it's needed, or, on the other hand, to start with event sourcing everywhere so that you don't have to deal with migration later on?
I can't think of why using it on the whole system is really an anti pattern.
I agree -- calling it an "anti pattern" is an overstatement.
The phrasing I would use: event sourcing on the whole system isn't cost effective today.
It could be tomorrow, as we get more practice with it, the costs of designing these systems go down, and we learn to extract more benefit from them.
In the meantime, how valuable are the temporal queries that you get from event sourcing? In your core domain, where you get competitive advantage, they could be quite valuable. In places where you are just doing bookkeeping of information provided to you by the outside world? Not so much - you may be getting everything you need out of simpler solutions that only keep track of "now".
I recently published a blog post about this issue. It explains why event sourcing is a persistence strategy and shouldn't be used at global scale.
To summarize it: Event Sourcing forces you to emit an event for every change to your data. This can result in very fine-grained events. If you use Event Sourcing for inter-microservice communication, you expose those events to the outside world.
In the end you expose your persistence layer, comparable to exposing your (relational) database schema in a CRUD-based persistence strategy.
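One common way to avoid that exposure is to translate the internal events into a coarser public message at the service boundary. The sketch below is only an illustration in Python; the event names (StreetChanged, CityChanged, NewsletterFlagToggled, CustomerAddressChanged) and the function to_integration_event are hypothetical.

```python
from typing import Optional

# Fine-grained internal events, as they might sit in one service's event store.
internal_events = [
    {"type": "StreetChanged", "customer_id": "c-1", "street": "1 Main St"},
    {"type": "CityChanged", "customer_id": "c-1", "city": "Springfield"},
    {"type": "NewsletterFlagToggled", "customer_id": "c-1", "enabled": True},
]

ADDRESS_EVENTS = {"StreetChanged", "CityChanged"}


def to_integration_event(events: list) -> Optional[dict]:
    """Collapse internal address events into one public message.

    Returns None when nothing relevant to other services happened.
    """
    address = {}
    customer_id = None
    for event in events:
        if event["type"] in ADDRESS_EVENTS:
            customer_id = event["customer_id"]
            for key, value in event.items():
                if key not in ("type", "customer_id"):
                    address[key] = value
    if customer_id is None:
        return None
    return {
        "type": "CustomerAddressChanged",
        "customer_id": customer_id,
        "address": address,
    }


public_event = to_integration_event(internal_events)
```

Other services then depend only on CustomerAddressChanged, so the internal events can be reshaped without breaking consumers.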

What does data look like when using Event Sourcing?

I'm trying to understand how Event Sourcing changes the data architecture of a service. I've been doing a lot of research, but I can't seem to understand how data is supposed to be properly stored with event sourcing.
Let's say I have a service that keeps track of vehicles transporting packages. The current non relational structure for the data model is that each document represents a vehicle, and has many fields representing origin location, destination location, types of packages, amount of packages, status of the vehicle, etc. Normally this gets queried for information to be read to the front end. When changes are made by the user, the appropriate changes are made to this document in order to update this.
With event sourcing, it seems that a snapshot of every event is stored, but there seem to be a few ways to interpret that:
The first is that the multiple versions of the document I described exist, each a new snapshot every time a change is made. Each event would create a new version of this document and alter it. This is the easiest way for me to wrap my head around it, but I believe this to be incorrect.
Another interpretation I have is that each event stores SPECIFIC information about what's been altered in the document. When the vehicle status changes from On Road to Available, for example, an event specifically for vehicle status changes is triggered. Let's say it's called VehicleStatusUpdatedEvent, and contains the Vehicle ID number, the new status, and the timestamp for this event. So this event is stored and is published to a messaging queue. When picked up from the queue, the appropriate changes are made to the current version of the document. I can understand this, but I think I still have some misconceptions here. My understanding is that event sourcing allows us to have a snapshot of data upon each change, so we can know what it looks like at any point. What I just described would keep a log of changes, but still only have one version of the file, as the events only contain specific pieces of the whole file.
Can someone describe how the data flow and architecture works with event sourcing? Using the vehicle data example I provided might help me frame it better. I feel that I am close to understanding this, but I am missing some fundamental pieces that I can't seem to understand by searching online.
The current non relational structure for the data model is that each document represents a vehicle
OK, let's start from there.
In the data model you've described, storage of a document destroys the earlier copy.
Now imagine that instead we were storing the document in a git repository. Then saving the document would also save metadata, and that metadata would include a pointer to the previous document.
Of course, we've probably got a lot of duplication in that case. So instead of storing the complete document every time, we'll store a patch document (think JSON Patch), and metadata pointing to the previous patch.
Take that same idea again, but instead of storing generic patch documents, we use domain specific messages that describe what is going on in terms of the model.
That's what the data model of an event sourced entity looks like: a list of domain specific descriptions of document transformations.
When you need to reconstitute the current state, you start with a state you know (which could be the "null" state of the document before anything happened to it), and replay onto that document all of the patches (events) that have occurred since.
If you want to do a temporal query, the game is the same, you replay the events up to the point in time that you are interested in.
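As a rough sketch of that replay in Python, using the vehicle example from the question: the event names, the integer "at" timestamps and the functions apply and replay are all assumptions made for illustration. For brevity, apply simply merges the event's fields into the document; a real model would usually handle each event type explicitly.

```python
from typing import Optional

# Hypothetical event log for one vehicle, in write order.
events = [
    {"type": "VehicleRegistered", "at": 1, "vehicle_id": "v-42", "status": "Available"},
    {"type": "RouteAssigned", "at": 2, "origin": "Lyon", "destination": "Paris"},
    {"type": "VehicleStatusUpdated", "at": 3, "status": "On Road"},
    {"type": "VehicleStatusUpdated", "at": 4, "status": "Available"},
]


def apply(state: dict, event: dict) -> dict:
    """Return a new document with one event applied."""
    changes = {k: v for k, v in event.items() if k not in ("type", "at")}
    return {**state, **changes}


def replay(events: list, as_of: Optional[int] = None) -> dict:
    """Rebuild the document from the empty state.

    When as_of is given, stop at the first event later than that time.
    """
    state = {}
    for event in events:
        if as_of is not None and event["at"] > as_of:
            break
        state = apply(state, event)
    return state


current = replay(events)           # state after every event
earlier = replay(events, as_of=3)  # temporal query: state as of time 3
```

Here current has status "Available" while earlier still shows "On Road", and both come from the same list of events.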
So essentially when referring to an older build, you reconstruct the document using the events, correct?
Yes, that's exactly right.
So is there still a "current status" document or is that considered bad practice?
"It depends". In the general case, there is no current status document; only the write-ordered list of events is "real", and everything else is derived from that.
Conversations about event sourcing often lead to consideration of dedicated message stores for managing persistence of those ordered lists, and it is common that the message stores do not also support document storage. So trying to keep a "current version" around would require commits to two different stores.
At this point, designers typically either decide that "recent version" is good enough, in which case they build eventually consistent representations of documents outside of the transaction boundary... OR they decide current version is important, and look into storage solutions that support storing the current version in the same transaction as the events (ex: using an RDBMS).
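For the second option, a minimal sketch using Python's built-in sqlite3 module might look like the following. The table names, columns and the append function are assumptions for illustration, not a recommended schema. The point is only that the event and the current document are written in one transaction.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (stream_id TEXT, version INTEGER, body TEXT, "
    "PRIMARY KEY (stream_id, version))"
)
conn.execute(
    "CREATE TABLE current_docs (stream_id TEXT PRIMARY KEY, version INTEGER, body TEXT)"
)


def append(conn, stream_id: str, version: int, event: dict, new_doc: dict) -> None:
    """Write the event and the updated document atomically."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "INSERT INTO events VALUES (?, ?, ?)",
            (stream_id, version, json.dumps(event)),
        )
        conn.execute(
            "INSERT OR REPLACE INTO current_docs VALUES (?, ?, ?)",
            (stream_id, version, json.dumps(new_doc)),
        )


append(
    conn,
    "vehicle-42",
    1,
    {"type": "VehicleStatusUpdated", "status": "On Road"},
    {"status": "On Road"},
)
```

The primary key on (stream_id, version) also rejects a second writer that tries to append the same version, which gives a simple form of optimistic concurrency.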
what is the procedure used to generate the snapshot you want using the events?
IF you want to generate a snapshot, then you'll normally end up using a pattern called a projection, to iterate over the events and either fold or reduce them to create the document.
Roughly, you have a function somewhere that looks like
document-with-meta-data = projection(event-history-with-metadata)
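In Python, such a function could be sketched as a fold with functools.reduce. The event types (PackagesLoaded, PackagesDelivered), the "sequence" field and the shape of the resulting document are assumptions for illustration.

```python
from functools import reduce


def step(doc: dict, event: dict) -> dict:
    """Fold one event into the document and record the version it reflects."""
    data = dict(doc["data"])
    if event["type"] == "PackagesLoaded":
        data["packages"] = data.get("packages", 0) + event["count"]
    elif event["type"] == "PackagesDelivered":
        data["packages"] = data.get("packages", 0) - event["count"]
    return {"data": data, "version": event["sequence"]}


def projection(history: list) -> dict:
    """Reduce the ordered event history to a document plus metadata."""
    return reduce(step, history, {"data": {}, "version": 0})


history = [
    {"sequence": 1, "type": "PackagesLoaded", "count": 10},
    {"sequence": 2, "type": "PackagesDelivered", "count": 4},
]
snapshot = projection(history)
```

The version kept in the metadata tells a later reader which event the snapshot is current up to, so replay can resume from there instead of from the beginning.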

Record User Interaction for a Tcl Tk Test Automation

I want to do some tests on our tcl tk application regarding the user interaction. As the application has parts similar to a CAD for which every mouse movement is relevant, I would like to do something like record all events of some user interactions. My goal would be to play back these events later, after every program change, to discover potential changes. Or even better, to ensure the GUI always behaves the same and always produces the same data.
I know that I can generate some enter, motion and button events, but this would not be the same as the thousands of events generated by a real user interaction. But it is very important for me to have exactly these thousands of events.
Is there any possibility to achieve this?
It's relatively easy to record events of particular types with bind — you'll find that <ButtonPress>, <ButtonRelease>, <Enter>, <Leave>, <FocusIn>, <FocusOut>, <KeyPress> and <KeyRelease> cover pretty much everything that you are interested in — and then play them back with event generate. (You need to record quite a bit of information about each event in order to regenerate it correctly, but the underlying model is that of X events with similar names.) Assuming you're not wanting to support inter-application cut-and-paste or drag-and-drop for the purposes of recording; those complicate things a lot. You'll likely have a lot of events; recording to an SQLite database might make a lot of sense.
However, you should think carefully about which parts of the application you want to record. Does it matter if the order of two buttons in the outer shell of the application outside the CAD-like area get swapped in order? For most users, provided you're clear about what the buttons do (through clear labels and icons) it isn't very important, but for replaying recorded events it can matter hugely. Instead, for the parts of the application that are simple buttons and edit fields, I'd not record the details of them but would instead just record when the buttons are clicked and the changes to the text content of entries and so on. In effect, it's capturing higher-level events, and that's much easier to replay correctly. It's only when the user is in that main CAD area that you need the full detail.
Also, beware of changes to font sizes and screen sizes/scaling. They can change how things are laid out and may happen because of system-level alterations outside the scope of your application.
We started out the way you describe: record all those thousands of motion events, etc. Including exact timings which are extremely important for a GUI application as well.
It quickly became apparent that those recordings became too hard to maintain. They are also overly brittle in light of UI changes. Another problem were the hardcoded time values. A switch to a more powerful machine (or a CPU under load) would break the execution.
The two biggest improvements we introduced:
Event compression: recognize the high-level action the user wanted to perform (like selecting a menu item). The recorded activateItem command would then perform the necessary work (event emulation) on replay.
Synchronization functions: instead of relying on a particular timing, commands like waitForObject wait for an object to come into existence and become ready for interaction.
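The synchronization idea can be sketched generically as a polling helper. The following Python is an illustration only: wait_for and find_ok_button are hypothetical names, not the Squish API, and a counter stands in for a real widget lookup.

```python
import time


def wait_for(condition, timeout: float = 5.0, interval: float = 0.05):
    """Poll condition until it returns a truthy value or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.2f s" % timeout)
        time.sleep(interval)


# Stand-in for a GUI: the "widget" only appears on the third lookup.
lookups = {"count": 0}


def find_ok_button():
    lookups["count"] += 1
    return "ok_button" if lookups["count"] >= 3 else None


button = wait_for(find_ok_button, timeout=2.0, interval=0.01)
```

Because the replay waits on a condition instead of a recorded delay, the same script keeps working on a faster machine or under load, as long as the timeout is generous.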
It took several years for this to work fluently, however. Including a central Object Map repository, property and screenshot verifications, high-level test descriptions in BDD and others. Feel free to take a look at the Squish for Tk product that came out of this work.

How to handle a legal enforced data delete request in an event sourced system?

In an event sourced system, historic data in the form of events is never thrown away. Doing so could result in a corrupted state. Now imagine there is a court ruling, stating some data needs to be deleted (for example, search engines had to delete privacy specific data). How would you achieve this?
That's a really good question.
So far, I've learned of two possibilities.
Easy part first: if you are using event sourcing, then all of your views of your data should be derivable from the events in your event store. Therefore, all of the data that you have stored for reading (caches, screens, projections, reports) can be blown away and regenerated after you scrub the tainted data from the event store.
So you only need to figure out that part.
First, if the tainted data never gets into the store, you don't have to worry about scrubbing it out. For instance, sensitive information can be isolated in a key value store; references to that data in the event store are always by surrogate key. When you need to scrub, the data in the key value store is nuked, you have a bunch of events that point to something no longer readable, and you just need to ensure that your read models can continue to function if the referenced data is not available.
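A minimal sketch of the surrogate key approach, in Python with in-memory stand-ins for both stores. The names sensitive_store, register_customer and customer_names are hypothetical.

```python
import uuid

sensitive_store = {}  # stands in for a separate key-value store
event_store = []      # append-only event log


def register_customer(name: str) -> str:
    """Keep the personal data out of the event; store only a surrogate key."""
    key = str(uuid.uuid4())
    sensitive_store[key] = {"name": name}
    event_store.append({"type": "CustomerRegistered", "personal_data_key": key})
    return key


def customer_names() -> list:
    """Read model that tolerates references to scrubbed data."""
    names = []
    for event in event_store:
        record = sensitive_store.get(event["personal_data_key"])
        names.append(record["name"] if record else "[deleted]")
    return names


alice_key = register_customer("Alice")
register_customer("Bob")
del sensitive_store[alice_key]  # the scrub: the events themselves are untouched
print(customer_names())  # prints ['[deleted]', 'Bob']
```

The event history stays intact and replayable; only the lookup for the scrubbed key comes back empty.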
If the data does need to get into the event store -- because it's needed to maintain the integrity of the domain model -- then the idea of "aggregates" may be able to help.
Aggregates are an idea taken from DDD; the basic idea is that your domain can be decomposed into elements that don't need to share data directly. One aggregate never references data within another directly; instead you use indirect references by ID, the ID itself being another surrogate key.
Since these aggregates are isolated from each other, they can have their own event history. In which case you can scrub the tainted data by simply eliminating any aggregates that have been contaminated. You just delete the event streams.
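As an illustration, assuming one event stream per aggregate keyed by aggregate ID, scrubbing amounts to dropping a stream and rebuilding the read models. The order events and the revenue function below are hypothetical.

```python
# One event stream per aggregate, keyed by aggregate ID.
streams = {
    "order-1": [{"type": "OrderPlaced", "total": 30}],
    "order-2": [
        {"type": "OrderPlaced", "total": 45},
        {"type": "OrderRefunded", "total": 45},
    ],
    "order-3": [{"type": "OrderPlaced", "total": 20}],
}


def revenue(streams: dict) -> int:
    """Read model rebuilt from whatever streams still exist."""
    total = 0
    for events in streams.values():
        for event in events:
            if event["type"] == "OrderPlaced":
                total += event["total"]
            elif event["type"] == "OrderRefunded":
                total -= event["total"]
    return total


before = revenue(streams)  # 50
del streams["order-1"]     # scrub the contaminated aggregate
after = revenue(streams)   # 20
```

The remaining streams replay exactly as before; the derived total simply no longer includes the deleted aggregate.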
A response like this doesn't put you in a corrupted state, just an inconsistent one. Everything still runs, there's just a bunch of data missing.
There's also the weapon of a "compensating event" available in the toolkit; you might be able to introduce a new stream of events that brings the system back to a consistent state. For example, if scrubbing a bunch of transactions takes the books out of balance, you may be able to publish an event that creates a charge against iCouldTellYouButThen....

What's the best Erlang approach to being able to identify a process's identity from its process id?

When I'm debugging, I'm usually looking at about 5000 processes, each of which could be one of about 100 gen_servers, fsms, etc. If I want to know WHAT an erlang process is, I can do:
process_info(pid(0,1,0), initial_call).
And get a result like:
{initial_call,{proc_lib,init_p,5}}
...which is all but useless.
More recently, I hit upon the idea (brace yourselves) of registering each process with a name that told me WHO that process represented. For example, player_1150 is the player process that represents player 1150. Yes, I end up making a couple million atoms over the course of a week-long run. (And I would love to hear comments on the drawbacks of boosting the limit to 10,000,000 atoms when my system runs with about 8GB of real memory unused, if there are any.) Doing this meant that I could, at the console of a live system, query all processes for how long their message queue was, find the top offenders, then check to see if those processes were registered and print out the atom they were registered with.
I've hit a snag with this: I'm moving processes from one node to another. Now a player process can have 3 different names; player_1158, player_1158_deprecating, player_1158_replacement. And I have to make absolutely sure I register and unregister these names with precision timing to make sure that a process is always named and that the appropriate names always exist, AND that I don't try to register a name that some dying process already holds. There is some slop room, since this is only used for console debugging of a live system. Nonetheless, the moment I started feeling like this mechanism was affecting how I develop the system (the one that moves processes around) I felt like it was time to do something else.
There are two ideas on the table for me right now. An ets table that associates process ids with their description:
ets:insert(process_names, {self(), {player, 1158}}).  % process_names is a hypothetical named table
I don't really like that one because I have to manually keep the tables clean. When a player exits (or crashes) someone is responsible for making sure that his data are removed from the ets table.
The second alternative was to use the process dictionary, storing similar information. When my exploration of a live system led me to wonder who a process is, I could just look at his process dictionary using process_info.
I realize that none of these solutions is functionally clean, but given that the system itself is never, EVER the consumer of these data, I'm not too worried about it. I need certain debugging tools to work quickly and easily, so the behavior described is not open for debate. Are there any convincing arguments to go one way or another (other than the academic "don't use the _, it's evil" canned garbage?) I'd be happy to hear other suggestions and their justifications.
You should try out gproc, it's a very convenient application for keeping process metadata.
A process can be registered with several names and you can associate arbitrary properties to a process (where the key and value can be any erlang term). Also gproc monitors the registered processes and unregisters them automatically if they crash.
If you're debugging gen_servers and gen_fsms while they're still running, I would implement the handle_info functions for these behaviors. When you send each process a {get_info, ReplyPid} tuple, the process in question can send back a term describing its own state, what it is, etc. That way you don't have to keep track of this information outside of the process itself.
Isac mentions there is already a built in way to do this
