In a system I'm working on, we have a bunch of "processes".
Each process can be created, renamed, and (soft-)deleted.
When creating a process, a name must be provided before we are allowed to continue.
We are considering having a "meta-stream", containing instances of the following events:
Create {name, id}
Delete {id}
Rename {name, id}
As well as another stream containing events that represent the internal state of the process.
Pros:
If we don't do this, we'd have to add an "Initialize {name}" command to the non-meta process stream, and we'd have to enforce the invariant that this initialize command comes first in each and every process's stream.
Cons:
The true state of the process is split across two different streams: we need to go to one stream to check whether the process exists, and then to another to get its state. This might lead to inconsistencies.
Any reflections on this?
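For concreteness, here is a rough sketch (Java, with hypothetical type and field names) of what the two streams described above could look like:

import java.util.UUID;

// Hypothetical event types for the "meta-stream"; names are for illustration only.
interface ProcessMetaEvent {}
record ProcessCreated(UUID id, String name) implements ProcessMetaEvent {}
record ProcessRenamed(UUID id, String name) implements ProcessMetaEvent {}
record ProcessDeleted(UUID id) implements ProcessMetaEvent {}

// A separate per-process stream would then carry the domain events that
// represent the internal state of each process, correlated by the same id.
interface ProcessEvent { UUID processId(); }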
Related
I have a situation where data is coming from a third party service. It is being passed through to a function that formats the data and then saves it to a view model in a way that I can visualize for my system.
In an event-driven approach, should I save the payload of the request (as this can easily be replayed) in the event stream, or the formatted changes it produces to the view model (a more accurate representation of the current state of the data)?
Or something else completely?
Thanks
The incoming data can be viewed as a command expressing the intent to ultimately update some state. In this case the command is from outside our system, but commands can also be internal to our system. Especially for external commands, one critical thing to remember is that a command can be rejected.
In event sourcing, however, events are internal and express that the change has occurred and cannot be denied (at most it can be ignored). Thus it's probably best to store them in the format that is the most convenient for that internal use.
I would characterize the requests as commands and the formatted changes as events. Saving the payload is command sourcing, saving the formatted changes is event sourcing (confusingly, Fowler's earliest descriptions of event sourcing are more like command sourcing), and both are valid approaches. Event sourcing tends to imply a commitment to replay to a similar state, while command sourcing leaves open the possibility that replay depends on something in the outside world. I've seen (and even developed) applications which used both techniques (e.g. incoming data is dumped to Kafka, a consumer treats those messages as commands against aggregates whose state is persisted as a stream of events, which gets projected back into Kafka).
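As a rough sketch (Java, hypothetical names throughout), treating the incoming payload as a command that may be rejected, and persisting only the formatted change as an event, might look like this:

import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical types for illustration only.
record ThirdPartyPayload(String rawJson) {}                               // the external "command"
record MeasurementRecorded(String sensorId, double value, Instant at) {} // the internal event

class IngestionHandler {
    private final List<MeasurementRecorded> eventStream = new ArrayList<>();

    // A command can be rejected; only accepted changes are appended as events.
    void handle(ThirdPartyPayload payload) {
        MeasurementRecorded event = format(payload); // stand-in for the real formatting step
        if (event == null) {
            return; // rejected or ignored: nothing is recorded
        }
        eventStream.add(event); // event sourcing: store the formatted change
    }

    private MeasurementRecorded format(ThirdPartyPayload payload) {
        // placeholder: parse/validate the payload and derive the change
        return new MeasurementRecorded("sensor-1", 42.0, Instant.now());
    }
}

Saving the raw ThirdPartyPayload to a log and re-running format() on replay would instead be command sourcing.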
If you (in CQRS/ES fashion) consider the read-side of your application to be a separate autonomous component from the write-side, then you reach the interesting conclusion that when the write-side publishes events, from the read-side's perspective it's publishing commands to the read-side. "One component's events are often another component's commands".
I'm still learning event sourcing and I don't understand something.
When I get a command to change an object, do I first recreate that object from the event store, then change it and save the event, or should I have a separate table that holds the last state?
What is the practice here?
The first rule of optimization: Don't.
For handling commands, all of the information that you need to have is stored in your event history; simply loading the history and recomputing any state you need will get the job done.
In the case where you need low latency in your command handler, AND recomputing the state you need from the event history is too slow to meet your service level targets, then you might look into saving a "snapshot", and using that to speed up the load of your data.
Current consensus is that snapshots should be saved separately from the event history (i.e., a snapshot is not another kind of event), as though the snapshot were another "read model".
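A minimal sketch (Java, with a hypothetical aggregate and event) of that ordering: replay the history by default, and only seed from a snapshot, kept outside the event history, when replay is too slow:

import java.util.List;
import java.util.Optional;

// Hypothetical aggregate state and event, for illustration only.
record AccountState(long balance) {}
record MoneyDeposited(long amount) {}

class AccountCommandHandler {

    // Default approach: rebuild current state by replaying the full event history.
    AccountState rehydrate(List<MoneyDeposited> history) {
        AccountState state = new AccountState(0);
        for (MoneyDeposited e : history) {
            state = new AccountState(state.balance() + e.amount());
        }
        return state;
    }

    // Optional optimization: start from a snapshot (stored like a read model,
    // not as another event) and replay only the events recorded after it.
    AccountState rehydrate(Optional<AccountState> snapshot, List<MoneyDeposited> eventsAfter) {
        AccountState state = snapshot.orElse(new AccountState(0));
        for (MoneyDeposited e : eventsAfter) {
            state = new AccountState(state.balance() + e.amount());
        }
        return state;
    }
}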
I'm building a Kafka Streams application that generates change events by comparing every new calculated object with the last known object.
So for every message on the input topic, I update an object in a state store and every once in a while (using punctuate), I apply a calculation on this object and compare the result with the previous calculation result (coming from another state store).
To make sure this operation is consistent, I do the following after the punctuate triggers:
write a tuple to the state store
compare the two values, create change events and context.forward them. So the events go to the results topic.
swap the tuple by the new_value and write it to the state store
I use this tuple for scenarios where the application crashes or rebalances, so I can always send out the correct set of events before continuing.
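Roughly, the procedure described above might look like this (type names, the key, and the forwarded payload are placeholders, not the real application):

import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.processor.Punctuator;
import org.apache.kafka.streams.state.KeyValueStore;

class ChangeDetectingPunctuator implements Punctuator {

    record ValuePair(String oldValue, String newValue) {}

    private final ProcessorContext context;
    private final KeyValueStore<String, ValuePair> resultStore;
    private final String key = "some-key"; // illustration only

    ChangeDetectingPunctuator(ProcessorContext context,
                              KeyValueStore<String, ValuePair> resultStore) {
        this.context = context;
        this.resultStore = resultStore;
    }

    @Override
    public void punctuate(long timestamp) {
        ValuePair previous = resultStore.get(key);
        String newValue = calculate(); // stand-in for the periodic calculation
        String oldValue = previous == null ? null : previous.newValue();

        // 1. write the (old, new) tuple to the state store, so the same change
        //    events can be re-derived after a crash or rebalance
        resultStore.put(key, new ValuePair(oldValue, newValue));

        // 2. compare the two values, create change events and forward them
        if (oldValue == null || !oldValue.equals(newValue)) {
            context.forward(key, "changed:" + newValue);
        }

        // 3. swap the tuple for the new value and write it back
        resultStore.put(key, new ValuePair(newValue, newValue));
    }

    private String calculate() {
        return "result"; // stand-in for the per-object calculation
    }
}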
Now, I noticed the resulting events are not always consistent, especially if the application frequently rebalances. It looks like in rare cases the Kafka Streams application emits events to the results topic, but the changelog topic is not up to date yet. In other words, I produced something to the results topic, but my changelog topic is not at the same state yet.
So, when I do a stateStore.put() and the method call returns successfully, are there any guarantees about when it will appear on the changelog topic?
Can I enforce a changelog flush? When I do context.commit(), when will that flush+commit happen?
To get complete consistency, you will need to enable processing.guarantee="exactly_once" -- otherwise, in the case of a failure, you might get inconsistent results.
If you want to stay with "at_least_once", you might want to use a single store and update the store after processing is done (i.e., after calling forward()). This minimizes the time window in which inconsistencies can occur.
And yes: if you call context.commit(), all stores will be flushed to disk and all pending producer writes will be flushed before the input topic offsets are committed.
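A minimal configuration sketch for enabling that (note the spelling of the value; the application id and bootstrap servers are placeholders):

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

class ExactlyOnceConfig {
    static Properties streamsProperties() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "change-detector");   // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        // Makes state-store changelog writes, result writes, and offset commits atomic.
        // Newer Kafka Streams versions also offer "exactly_once_v2".
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);
        return props;
    }
}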
I've been searching for a solution to get all created/modified and deleted files by a specific process from an event trace (ETW) session (I will process data from an etl file not from a real-time session).
Apparently the simplest solution to get this done was to take the FileCreate and FileDelete events from the FileIo_Name class and map them to the corresponding DiskIo_TypeGroup1 events. However, this solution isn't working for me, since I don't receive any DiskIo_TypeGroup1 events for the corresponding FileDelete events, so I cannot get the process ID. Also, not all FileCreate events have an associated DiskIo_TypeGroup1 event (I think this happens for empty created files, or for files that were only opened).
Note: I need the DiskIo_TypeGroup1 mapping because FileIo_Name events don't have the ThreadId and ProcessId members populated - they are set to (ULONG)-1. Also, I cannot decide which files were just opened versus modified without knowing the "file write size". DiskIo_TypeGroup1 also doesn't have the ThreadId and ProcessId members populated (in the event header, on newer OSes), but it has the IssuingThreadId structure member, from which I can obtain the ProcessId by mapping to Thread_TypeGroup1 class events.
So I investigated how the FileIo_Create class can help me, and noticed that I can get the CreateOptions member, which can have the following flags: (FILE_SUPERSEDE, FILE_CREATE, FILE_OPEN, FILE_OPEN_IF, FILE_OVERWRITE, FILE_OVERWRITE_IF). But the initial problem still persists: how can I check whether a file was created from scratch instead of just being opened (e.g. in the case of FILE_SUPERSEDE)?
Maybe I can use the FileIo_ReadWrite class to get Write events, just as with the DiskIo_TypeGroup1 class. So, if something was written to a file, can I assume that the file was either created or modified?
To find the deleted files, I think the FileIo_Info class and its Delete event are the solution. I guess that I can receive Delete events and map them to FileIo_Name to get the file names.
Note: The FileIo_Create, FileIo_Info, and FileIo_ReadWrite events contain the process ID.
Are my suppositions right? What will be the best solution for my problem?
I will share the solution I implemented, as follows:
Created Files:
I stored all FileIo_Create events as pending create operations and waited for the associated FileIo_OpEnd event to decide, based on the ExtraInfo structure member, whether the file was opened, created, overwritten, or superseded.
Modified Files:
I marked files as dirty for every Write event from FileIo_ReadWrite, and for every SetInfo event from FileIo_Info with InfoClass->FileEndOfFileInformation or InfoClass->FileValidDataLengthInformation. Finally, on the Cleanup event from FileIo_SimpleOp, I check whether the file was marked as dirty and, if so, store it as modified.
Deleted files:
I marked files as deleted if they were opened with the CreateOptions->FILE_DELETE_ON_CLOSE flag from FileIo_Create, or if a Delete event from FileIo_Info appeared. Finally, on the Cleanup event from FileIo_SimpleOp, I stored the file as deleted.
The process ID and file name were obtained from the FileIo_Create events, more precisely from the OpenPath structure member and the ProcessId event header member.
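The bookkeeping described above can be sketched roughly as follows (Java is used only for illustration; it assumes the FileIo_* events have already been parsed out of the .etl file, and all type and field names are simplified stand-ins, not a real ETW API):

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class FileActivityTracker {
    // keyed by whatever correlates a FileIo_Create with its FileIo_OpEnd
    private final Map<Long, String> pendingCreates = new HashMap<>();
    private final Set<String> created = new HashSet<>();
    private final Set<String> dirty = new HashSet<>();
    private final Set<String> deleted = new HashSet<>();

    void onCreate(long key, String openPath, boolean deleteOnClose) {
        pendingCreates.put(key, openPath);        // wait for FileIo_OpEnd to classify the open
        if (deleteOnClose) deleted.add(openPath); // CreateOptions -> FILE_DELETE_ON_CLOSE
    }

    void onOpEnd(long key, boolean createdOrOverwritten /* derived from ExtraInfo */) {
        String path = pendingCreates.remove(key);
        if (path != null && createdOrOverwritten) created.add(path);
    }

    void onWriteOrSetInfo(String path) { // FileIo_ReadWrite Write, or FileIo_Info SetInfo on EOF / valid-data-length
        dirty.add(path);
    }

    void onDeleteInfo(String path) {     // FileIo_Info Delete event
        deleted.add(path);
    }

    String onCleanup(String path) {      // FileIo_SimpleOp Cleanup: finalize the file's fate
        if (deleted.contains(path)) return "deleted";
        if (created.contains(path)) return "created";
        if (dirty.contains(path)) return "modified";
        return "opened";
    }
}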
I need a list of all the components of a certain stream that still need delivery to the default target stream.
There doesn't seem to be one simple command to use.
You would have to initiate a preview of a deliver to see which ones actually need to be delivered.
cleartool deliver -preview
Since the default for a deliver is to deliver all activities in the stream that have changed since the last deliver operation from the stream, that would allow you to detect any activity that is a candidate for delivery, component after component.
You would also need to:
list all components in a stream, using fmt_ccase:
cleartool descr -fmt "%[mod_comps]CXp" stream:<stream>#<apvobtag>
(that would list the modifiable components for a given stream)
cross-reference the activities' elements with the root folders of the components in order to find which component is involved in an activity listed as to be delivered.