According to my understanding, I have to "construct" the latest state of the system by iterating over all events for a given aggregate. If I need to find the latest quantity for a product, I have to iterate over every event that added or deducted quantity for that particular item. Can I, however, append the "latest" quantity of that product as part of the given event? So instead of having events like {added: 3}, {deducted: 1} = 2 available, I can have {added: 3, available: 3}, {deducted: 1, available: 2}, so that I can just grab the last event from the store instead of iterating over all of them OR keeping a snapshot. Is this against the "rules", and what are the possible implications?
A general rule is that an event should not contain computed values, but at the same time it's a trade-off between complexity and ease of use.
An alternative is to rely on snapshots, if performance is an issue. However, in most cases the read side can handle those aggregated questions for you. You should also not be afraid of having to load a set of events to get to the current state of a given aggregate.
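To put the last point in perspective, here is a minimal C# sketch (the event types are made up for illustration) of what loading a set of events and folding them into the current state looks like; for typical stream lengths this is cheap:

using System.Collections.Generic;
using System.Linq;

// Hypothetical event types; a real system would version and serialize these.
public abstract record InventoryEvent;
public record QuantityAdded(int Amount) : InventoryEvent;
public record QuantityDeducted(int Amount) : InventoryEvent;

public static class InventoryState
{
    // Fold the whole stream into the current quantity:
    // {added: 3}, {deducted: 1} => 2 available.
    public static int CurrentQuantity(IEnumerable<InventoryEvent> events) =>
        events.Aggregate(0, (available, e) => e switch
        {
            QuantityAdded a => available + a.Amount,
            QuantityDeducted d => available - d.Amount,
            _ => available
        });
}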
I am in the process of scaling out an application horizontally and have realised that read model updates (an external projection via an event handler) will need to be handled on a competing-consumer basis.
I initially assumed that I would need to ensure ordering, but this requirement is message dependent. In the case of shopping cart checkouts where I want to know totals, I can add the totals regardless of order: get the message, update the SQL database, and ACK the message.
I am now racking my brain to think of scenarios or messages that would be anything but order independent; I know such cases exist, so some extra clarity and examples would be immensely useful.
The questions I need help with are:
For what types of message would ordering be important, and how would this be resolved using the messages as-is?
How would we know which event to resubscribe from when processes join or leave? I can see possible timing issues that could cause a subscription to be requested on a message that has just been processed by another process.
I see there is a Pinned consumer strategy for best-effort affinity of a stream to a subscriber, but this is not guaranteed. I could solve this by making a specific stream single-threaded, processing only those messages in order. Is it possible for a process to have multiple subscriptions to different streams?
To use your example of a shopping cart, ordering would be potentially important for the following events:
Add item
Update item count
Remove item
You might have sequences like A: 'Add item, remove item' or B: 'Add item, Update item count (to 2), Update item count (to 3)'. For A, if you process the remove before the add, obviously you're in trouble. For B, if you process two update item counts out of order, you'll end up with the wrong final count.
This is normally scaled out by using some kind of sharding scheme, where a subset of all aggregates is allocated to each shard. For Event Store, I believe this can be done by creating a user-defined projection using partitionBy to partition the stream into multiple streams (aka 'shards'). Then you need to allocate partitions/shards to processing nodes in some way. Some technologies are built around this approach to horizontal scaling (Kafka and Kinesis spring to mind).
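As a rough illustration of the allocation idea (a conceptual sketch, not Event Store's actual API), shard assignment usually boils down to a stable hash of the aggregate id, so that all events for one aggregate land on one shard and keep their relative order:

using System.Linq;

public static class Sharding
{
    // Stable, non-negative hash of the aggregate id picks the shard.
    public static int ShardFor(string aggregateId, int shardCount)
    {
        var hash = aggregateId.Aggregate(17, (h, c) => unchecked(h * 31 + c));
        return (hash & 0x7fffffff) % shardCount;
    }
}

Each processing node then consumes one or more shards single-threaded, which preserves per-aggregate ordering without giving up parallelism across aggregates.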
I'm implementing a view for Google Calendar events in my application using the following endpoint:
https://developers.google.com/google-apps/calendar/v3/reference/events/list
The problem I have is implementing a feature that makes it possible to go to the previous page of events. For example: the user sees 20 events for the current date, and once they press the button, they see the 20 previous events.
As far as I can see, Google provides only:
"nextPageToken": string
which fetches the results for the next page.
The ways I see to solve the problem:
Fetch results in descending order and then traverse them the same way as we do with nextPageToken. The problem is that the docs state that only ascending order is available:
"startTime": Order by the start date/time (ascending). This is only
available when querying single events (i.e. the parameter singleEvents
is True)
Fetch all the events for a specific time period, traverse the pages until I get to the current date or to the end of the list, and memorize all the nextPageTokens. Use the memorized values to be able to go backwards. The clear drawback is that we need to go through an unpredictable number of pages to get to the current date, which can dramatically affect performance. But at least it is something the Google APIs allow. Update: I checked that approach with a 5-year time span, and sometimes it takes up to 20 seconds to get the current date's page token.
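A sketch of that second approach (in C#; EventsPage and fetchPage are hypothetical stand-ins for the events.list response and request, not the real client API):

using System;
using System.Collections.Generic;

public record EventsPage(IReadOnlyList<DateTime> StartTimes, string? NextPageToken);

public static class PagingHelper
{
    // Walk forward from the range start, memorizing the token that leads to
    // each page; "previous page" is then a re-fetch with tokens[index - 1].
    public static List<string?> CollectTokensUpTo(
        Func<string?, EventsPage> fetchPage, DateTime target)
    {
        var tokens = new List<string?> { null };   // null = first page
        string? token = null;
        while (true)
        {
            var page = fetchPage(token);
            bool reached = page.StartTimes.Count > 0 && page.StartTimes[^1] >= target;
            if (reached || page.NextPageToken is null) break;
            token = page.NextPageToken;
            tokens.Add(token);
        }
        return tokens;
    }
}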
Is there a more convenient way to implement the ability to go to the previous pages?
During development of my application, I found that I need to emit some events that don't actually modify the state of the aggregate, but are needed in order to update read models (transient events?). E.g., in my code (domain model) I hold the state of a hierarchy of numbers in layers, like:
1 4 7
5 8
3 9
and the read model is doing a projection of events like this (top number from left to right):
1
5
3
then, when I trigger the event RemovedNumber(1) in the aggregate root, and if this is the only event I trigger (since it is enough to update the aggregate state), the read model will not know that it needs to replace number 1 with 4:
? <--- SHOULD BE 4 SINCE 4 IS UNDER 1
5
3
So here, basically, I need to additionally trigger NowShowNumber(4 instead of 1), and then the read model will know to project:
4
5
3
The event RemovedNumber(1) should be kept in the event store, since it affects the internal state of the aggregate. The event NowShowNumber(4 instead of 1) should also be stored in the event store, since it affects the read model (and should be replayed when re-projecting), but it should probably not be used during reconstruction of the aggregate root from the event stream.
Is this standard practice in CQRS/Event Sourcing systems? Is there some alternative solution?
Why doesn't the Read model know to show number 4?
Didn't the Aggregate emit an AddNumber(4) prior to AddNumber(1)?
Then the Read model has the necessary state replicated on its side, basically a stack of numbers, in order to pull the previous number and show it.
In CQRS, in order to help the Read models, when the state changes and an Event is emitted, the Aggregate can also include bits of the previous state in the Event.
In your case, the emitted Event could have the following signature: RemovedNumber(theRemovedNumber, theNewCurrentNumber), and in particular RemovedNumber(1, 4).
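As a sketch (illustrative only, using the names from above), such an event is self-describing, so the projection needs no extra state:

// Illustrative only: the event carries both the removed number and the
// number that becomes visible, so the read model can project it without
// replicating the aggregate's internal layers.
public record RemovedNumber(int TheRemovedNumber, int TheNewCurrentNumber);

// Emitted by the aggregate as: new RemovedNumber(1, 4)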
I call these events out of band events and save them to a different stream than I hydrate aggregates with.
I haven't heard of anyone else doing it, but I also haven't heard any good arguments not to do it - especially if you have a legitimate case for posting events that have no effect at all on the aggregate.
In your case, if I understand your problem well enough, I would just have the domain write a TopLevelNumberChanged event, which the read model would see and process.
And obviously it would not read that event when hydrating.
I cannot see that it is at all an issue to have events that don't effect changes in your projections. Depending on the projection, it may be that the projection ignores many events.
That being said, if you are saying that these two events go hand in hand, you may need to have another look at the design/intention. How do you know to call the second command? Would a single command not perhaps do the trick? The command could return an event containing the full change:
NumberReplacedEvent ReplaceNumber(1, 4);
The event would contain all the state:
public class NumberReplacedEvent
{
    // The number that was removed from the visible layer.
    int ReplacedNumber { get; set; }
    // The number that becomes visible in its place.
    int WithNumber { get; set; }
}
From my understanding, there's no single correct answer. CQRS / Event Sourcing is just a tool that helps you model your data flow. But it's still your data, your business rules, your use case. In other words: another company could use the exact same data model but have a different event structure, because it fits their use case better.
An example:
Let's imagine we have an online shop. Every time a customer buys a product, we decrease the inStock value for that product. If the customer sends the product back, we increase the value.
The command is pretty simple: BuyProduct(id: "123", amount: 4)
For the resulting event we have (at least) 2 options:
ProductBuyed(id: "123", amount: 4) (delta value)
ProductBuyed(id: "123", newInStockValue: 996) (new total value)
(you could also publish a simple ProductBuyed(id: "123") event 4 times)
Or you can have multiple resulting events at the same time:
ProductBuyed(id: "123", amount: 4)
InStockValueForProductChanged(id: "123", newValue: 996)
An online shop will probably have multiple read models that are interested in these events. The Product Page wants to display "only 996 items left!", and the Shop Statistics Page wants to display "sold 4 items today", so both options (total and delta) can be useful.
But both pages could also work if only one of the two events exists; the read side must then do the calculation itself: oldTotal - newTotal = delta, or oldTotal - delta = newTotal.
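For instance (a trivial sketch with the numbers from the example, assuming the read side remembers oldTotal = 1000):

public static class StockMath
{
    // A total-style event arrived: recover the delta for the statistics page.
    public static int DeltaFrom(int oldTotal, int newInStockValue) =>
        oldTotal - newInStockValue;                 // 1000 - 996 = 4

    // A delta-style event arrived: recover the total for the product page.
    public static int TotalFrom(int oldTotal, int amount) =>
        oldTotal - amount;                          // 1000 - 4 = 996
}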
There are even more possible solutions. For example:
Checkout Service publishes ProductBuyed(id: "123", amount: 4) event
Stock Service receives this event, decreases the stock and then publishes the InStockValueForProductChanged(id: "123", newValue: 996) event
It really depends on the needs of your business.
My suggestions:
I prefer it when the write model is only responsible for managing the business rules: get the command, validate it, publish event(s) which look pretty similar to the command contents.
And the read model should be as simple as possible, too: get the event, update the model.
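For example, a read-model handler can be as simple as this sketch (the event record is the hypothetical one from above):

using System.Collections.Generic;

// Hypothetical event shape from the example above.
public record InStockValueForProductChanged(string Id, int NewValue);

// A deliberately simple read model: get event, update model, no rules.
public class ProductPageProjection
{
    private readonly Dictionary<string, int> _inStock = new();

    public void Handle(InStockValueForProductChanged e) =>
        _inStock[e.Id] = e.NewValue;

    public int ItemsLeft(string productId) => _inStock[productId];
}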
If calculations have to be done, there are a few options:
Is the calculation part of a business rule? Then your write side has to compute the result anyway. In that case you have already written the algorithm, the CPU has done its work, and you get the resulting value for free. (Just include the result with the published event.)
Is the calculation really complex and/or are there multiple event consumers that need the result? Then it might be better to compute it once and include the result in an event, instead of computing it n times for every involved event consumer. Complex could mean:
Takes a lot of time
Very CPU / memory intensive
Needs special / huge external libs (imagine you had to include some Image Processing library with every read service)
Is the calculation the result of a combination of a lot of different events (i.e. it's getting complex)? Build an external service which is responsible for the calculation. This way you can easily scale out by providing multiple instances of this service.
If the calculation is not part of a business rule, is simple, and only a single service needs the result, or if it's only relevant for the read model: place it in the read side.
In the end it's a tradeoff:
Duplicate algorithms? You could have multiple event consumers written in different programming languages. Do you want to implement the algorithm multiple times?
More network traffic / a bigger event store? If you include the calculation result with the event, there's more data to store and transfer between the services. Can your infrastructure handle that?
Can your write / read service take the additional load?
Assuming we have an event stream with events with the following two attributes:
{"first_name", "last_name"}
and we partition on both attributes using fieldsGrouping:
.fieldsGrouping("spout", new Fields("first_name", "last_name"))
The processing bolt is parallelized across two tasks, and the following events enter the stream in the specified order:
1: {"foo", "foo"}
2: {"bar", "bar"}
3: {"foo", "bar"}
Now suppose events 1 and 2 go to tasks one and two respectively; what will happen with event 3? If it goes to either task, it would seem to break the fields grouping on one of the attributes.
How does Storm handle this? Or am I not understanding fieldsGrouping correctly?
Edit:
Thinking about this a bit more, I probably misunderstood the behaviour of fieldsGrouping. If both fields are considered together, events 1, 2, and 3 each form a distinct partition, which removes the problem.
However, this is not immediately clear from the only official documentation I can find on fieldsGrouping, so if anybody could point me to more detailed documentation, that would help.
You are grouping by first name and last name, meaning that not all tuples with the same first name will end up at the same destination, but tuples with the same first name and the same last name will.
Storm Applied (Sec 3.5.3) has a good example of this, based on grouping street checkins by time interval and city instead of by time interval only. Grouping by time interval alone created a bottleneck: all checkins in the same interval ended up in the same bolt, no matter the city. By adding city to the fields grouping, they kept the requirement of having all checkins for the same interval and city in the same bolt, and at the same time removed the bottleneck.
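As a mental model (this is not Storm's actual implementation, just a sketch in C#), you can think of fields grouping as hashing the combined values of all grouped fields to pick a task:

using System;

public static class FieldsGroupingModel
{
    // The task is chosen from the *combined* field values, so ("foo","bar")
    // is a partition of its own, distinct from ("foo","foo") and ("bar","bar").
    public static int TaskFor(string firstName, string lastName, int numTasks)
    {
        int combined = HashCode.Combine(firstName, lastName);
        return (combined & 0x7fffffff) % numTasks;   // non-negative index
    }
}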
I'm using https://github.com/google/google-api-ruby-client to connect to different google API in particular the Google Calendar one.
Creating an event, updating it, and deleting it work most of the time with what one can usually find around.
The issue appears when one tries to update an event's details after a previous update to the dates of the event.
In that case, the id provided is not enough, and the request fails with an error:
SmhwCalendar::GoogleServiceException: Invalid sequence value. 400
Yet the documentation does not mention such things: https://developers.google.com/google-apps/calendar/v3/reference/calendars/update
The events documentation does describe the sequence attribute, without saying much: https://developers.google.com/google-apps/calendar/v3/reference/events/update
What's needed to update an event?
Are there specific attributes to keep track of when creating and updating events, besides the event id?
How does the ruby google api client handle those?
I think my answer from Cannot Decrease the Sequence Number of an Event applies here too.
The sequence number must not decrease (and if you don't supply it, that's the same as supplying 0), and some operations (such as time changes) will bump the sequence number. Make sure to always work on the most recent copy of the event (the one that was provided in the response).
@luc's answer is pretty much correct, but here are some details.
Google's API documentation is unclear about this (https://developers.google.com/google-apps/calendar/v3/reference/events/update).
You should consider that the first response contains a sequence number of 0.
The first update should contain that sequence number (alongside the title, description, etc.). The response to that request will contain an incremented sequence number (1 in this case), which you should store and reuse on the next update.
While the first update would imply a sequence number of 0 (and work) if you don't pass any, the second might still pass, but the third will probably not (because a sequence of 1 is expected by then).
So that attribute might appear optional, but it is actually not optional at all.
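To make the rule concrete, here is a self-contained sketch (in C# for consistency with the rest of this page; the types are hypothetical, not the ruby client's API):

using System;

// Model of the server-side rule: an update whose sequence is lower than
// the stored one is rejected; a missing sequence counts as 0.
public record CalendarEvent(string Summary, int Sequence = 0);

public static class SequenceCheck
{
    public static void Validate(CalendarEvent stored, CalendarEvent update)
    {
        if (update.Sequence < stored.Sequence)
            throw new InvalidOperationException("Invalid sequence value. 400");
    }
}

The practical consequence: always build the next update from the most recent response, so you carry its (possibly bumped) sequence forward.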