Is it possible to import old events in a new EventSourcing system - event-sourcing

I'm current trying to choose a technical solution for this problem : How to import, (why not) replay, and access a list of events (from various internal and external sources), in an adapted system?
EventSourcing seems to be a good solution for that but I can't find if it is possible to import old events.
I must say that I can receive old events anytime, but for me the important concept is not to have the states of the objects, but store the events themselves, then give it to the apps which need it.
Thanks for your help!

Of course you can import old events into new system.
I have had similar problem: several event storage with different event structure (mongodb, sql, sql with json text fields). At the end of researching we decided to replicate all events (with transformation to similar structure - timestamp, aggregateName, etc) to a new event storage. And now all applications works with this new es.
The only thing you must keep in mind is reactive replication of events from old storages if they are not read-only.

Related

Amount of properties per command/event in event sourcing

I'm learning cqrs/event sourcing, and recently I listen some speach and speaker told that you need pass as few parameters to event as possible, in other words to make events tiny as possible. The main reason for that is it's impossible to change events later as it will break the event history, and its easelly to design small events correctly. But what if for example in UI you need fill in for example form with 10 fields to create new aggregate, and same situation can be with updating the aggregate? How to be in such a case? And how to be if business later consider to change something, but we have huge event which updating 10 fields?
The decision is always context-specific and each case deserves its own review of using thin events vs fat events.
The motivation for using thin domain events is to include just enough information that is required to ensure the state transition.
As for fat events, your projections might require a piece of entity state to avoid using any logic in the projection itself (best practice).
For integration, you'd prefer emitting fat events because you hardly know who will consume your event. Still, the content of the event should convey the information related to the meaning of the event itself.
References:
Putting your events on a diet
Patterns for Decoupling in Distributed Systems: Fat Event
recently I listen some speach and speaker told that you need pass as few parameters to event as possible, in other words to make events tiny as possible.
I'm not convinced that holds up. If you are looking for good ides about designing events, you should review Greg Young's e-book on versioning.
If you are event sourcing, then you are primarily concerned with ensuring that your stream of events allows you to recreate the state of your domain model. The events themselves should be representations of changes that a domain expert will recognize. If you find yourself trying to invent smaller events just to fit some artificial constraint like "no more than three properties per event" then you are going to end up with data that doesn't really match the way your domain experts think -- which is to say, technical debt.

Event source the whole system is bad

I'm learning a proper microservice architecture using CQRS, MassTransit and different type of storage for the read side. One thing which often comes along CQRS is the event sourcing. I do understand it's not mandatory at all. However, I can't think of why using it on the whole system is really an anti pattern.
Having an store for all events as a single source of truth can help you build / rebuild a read store on the fly whenever you want.
You are not locked in to any vendor (except for the event store)
For me, the question is more like is it easier to not start with event sourcing (and still have separate data storage depending on which the microservices. eg: elasticsearch, mongodb, etc etc) and migrating / provisioning whenever it's needed or on the other hand, start with event sourcing everything so that you don't have to deal with migration later on.
I can't think of why using it on the whole system is really an anti pattern.
I agree -- calling it an "anti pattern" is an overstatement.
The spelling I believe? Using event sourcing on the whole system isn't cost effective today.
It could be tomorrow, as we get more practice with it, and the costs of designing these systems goes down and we learn to extract more benefit from them.
In the mean time - how valuable are the temporal queries that you get from event sourcing? In your core domain, where you get competitive advantage, they could be quite valuable. In places where you are just doing bookkeeping of information provided to you by the outside world? Not so much - you may be getting everything you need out of simpler solutions that only keep track of "now".
I recently published a blog post about this issue. It explains why event sourcing is a persistence strategy and shouldn't be used at global scale.
To summarize it: Event Sourcing forces you to emit an event for every changed data. This can result in very fine grained events. If you use Event Sourcing for inter microservice communication, you expose those events to the outside world.
In the end you expose the your persistence layer, comparable to exposing your (relational) database schema in a CRUD based persistence strategy.

What does data look like when using Event Sourcing?

I'm trying to understand how Event Sourcing changes the data architecture of a service. I've been doing a lot of research, but I can't seem to understand how data is supposed to be properly stored with event sourcing.
Let's say I have a service that keeps track of vehicles transporting packages. The current non relational structure for the data model is that each document represents a vehicle, and has many fields representing origin location, destination location, types of packages, amount of packages, status of the vehicle, etc. Normally this gets queried for information to be read to the front end. When changes are made by the user, the appropriate changes are made to this document in order to update this.
With event sourcing, it seems that a snapshot of every event is stored, but there seem to be a few ways to interpret that:
The first is that the multiple versions of the document I described exist, each a new snapshot every time a change is made. Each event would create a new version of this document and alter it. This is the easiest way for me to wrap my head around it, but I believe this to be incorrect.
Another interpretation I have is that each event stores SPECIFIC information about what's been altered in the document. When the vehicle status changes from On Road to Available, for example, an event specifically for vehicle status changes is triggered. Let's say it's called VehicleStatusUpdatedEvent, and contains the Vehicle ID number, the new status, and the timestamp for this event. So this event is stored and is published to a messaging queue. When picked up from the queue, the appropriate changes are made to the current version of the document. I can understand this, but I think I still have some misconceptions here. My understanding is that event sourcing allows us to have a snapshot of data upon each change, so we can know what it looks like at any point. What I just described would keep a log of changes, but still only have one version of the file, as the events only contain specific pieces of the whole file.
Can someone describe how the data flow and architecture works with event sourcing? Using the vehicle data example I provided might help me frame it better. I feel that I am close to understanding this, but I am missing some fundamental pieces that I can't seem to understand by searching online.
The current non relational structure for the data model is that each document represents a vehicle
OK, let's start from there.
In the data model you've described, storage of a document destroys the earlier copy.
Now imagine that instead we were storing the the document in a git repository. Then then saving the document would also save metadata, and that metadata would include a pointer to the previous document.
Of course, we've probably got a lot of duplication in that case. So instead of storing the complete document every time, we'll store a patch document (think JSON Patch), and metadata pointing to the original patch.
Take that same idea again, but instead of storing generic patch documents, we use domain specific messages that describe what is going on in terms of the model.
That's what the data model of an event sourced entity looks like: a list of domain specific descriptions of document transformations.
When you need to reconstitute the current state, you start with a state you know (which could be the "null" state of the document before anything happened to it, and replay onto that document all of the patches (events) that have occurred since.
If you want to do a temporal query, the game is the same, you replay the events up to the point in time that you are interested in.
So essentially when referring to an older build, you reconstruct the document using the events, correct?
Yes, that's exactly right.
So is there still a "current status" document or is that considered bad practice?
"It depends". In the general case, there is no current status document; only the write-ordered list of events is "real", and everything else is derived from that.
Conversations about event sourcing often lead to consideration of dedicated message stores for managing persistence of those ordered lists, and it is common that the message stores do not also support document storage. So trying to keep a "current version" around would require commits to two different stores.
At this point, designers typically either decide that "recent version" is good enough, in which case they build eventually consistent representations of documents outside of the transaction boundary... OR they decide current version is important, and look into storage solutions that support storing the current version in the same transaction as the events (ex: using an RDBMS).
what is the procedure used to generate the snapshot you want using the events?
IF you want to generate a snapshot, then you'll normally end up using a pattern called a projection, to iterate over the events and either fold or reduce them to create the document.
Roughly, you have a function somewhere that looks like
document-with-meta-data = projection(event-history-with-metadata)

Using Core Data as cache

I am using Core Data for its storage features. At some point I make external API calls that require me to update the local object graph. My current (dumb) plan is to clear out all instances of old NSManagedObjects (regardless if they have been updated) and replace them with their new equivalents -- a trump merge policy of sorts.
I feel like there is a better way to do this. I have unique identifiers from the server, so I should be able to match them to my objects in the store. Is there a way to do this without manually fetching objects from the context by their identifiers and resetting each property? Is there a way for me to just create a completely new context, regenerate the object graph, and just give it to Core Data to merge based on their unique identifiers?
Your strategy of matching, based on the server's unique IDs, is a good approach. Hopefully you can get your server to deliver only the objects that have changed since the time of your last update (which you will keep track of, and provide in the server call).
In order to update the Core Data objects, though, you will have to fetch them, instantiate the NSManagedObjects, make the changes, and save them. You can do this all in a background thread (child context, performBlock:), but you'll still have to round-trip your objects into memory and back to store. Doing it in a child context and its own thread will keep your UI snappy, but you'll still have to do the processing.
Another idea: In the last day or so I've been reading about AFIncrementalStore, an NSIncrementalStore implementation which uses AFNetworking to provide Core Data properties on demand, caching locally. I haven't built anything with it yet but it looks pretty slick. It sounds like your project might be a good use of this library. Code is on GitHub: https://github.com/AFNetworking/AFIncrementalStore.

How to "bind" a persistent data structure to a GUI in Scala?

I need a GUI control to update whenever a persistent data structure (PDS) is updated.
I need to have the PDS updated when the user takes certain actions.
So, for example, an SWT Tree and a simple tree data structure.
There are lots of manual, ugly ways to do this, but it seems to me this is a very common situation and there would likely be a very clean approach out there.
I've been reading about FRP, Lenses, Actors, etc... seems like there could be a very simple, clean, effective approach to handling this type of situation.
What I can think about is having a component with a mutable reference to the PDS. This component could raise an event with the new version of the PDS every time it changes the value of the var. Your GUI control could be listening to that event and react to it by redrawing itself with the new info. Other option is that the component that listens to the event be the parent of your GUI control, reacting by creating a new instance of it, so the control can receive the PDS in the constructor and draw itself only once.
A persistent data structure never updates. You probably have a reference to a persistent data structure that changes to the new version, when you change it. If you want to track incremental change in the PDS, it's going to be awkward. The thing is, at the point you're storing the new version of the PDS, you still have the old version. Maybe you can run a diff to produce the incremental change.
Yes, there is a nice and clean approach: ValueModels. It should be pretty easy to implement in Scala (I've found nothing in a quick search). AFAIK there is a Java implementation embedded in Spring Rich Client.
How you describe it, it seems that the user calls takes certain actions inside the GUI, and then the GUI and the Database has to be updated. As long as the database update is a side effect you could completely rely on all SWT events.

Resources