I'm new to EDA (event-driven architecture). I've read a lot about its benefits and would like to apply it in my next project, but there is something I still haven't understood.
When raising an event, which pattern is the most suited:
Name the event "CustomerUpdate" and include all information (updated or not) about the customer
Name the event "CustomerUpdate" and include only information that has really been updated
Name the event "CustomerUpdate" and include minimum information (Identifier) and/or a URI to let the consumer retrieve information about this Customer.
I ask the question because some of our events could be heavy and frequent.
Name the event "CustomerUpdate"
First, let's start with your event name. The purpose of an event is to describe something that has already happened. This is different from a command, which is an instruction for something yet to happen.
Your event name "CustomerUpdate" sounds ambiguous in this respect, as it could be describing something in the past or something in the future.
CustomerUpdated would be better, but even then, Updated is another ambiguous term, and is nonspecific in a business context. Why was the customer updated in this instance? Was it because they changed their payment details? Moved home? Were they upgraded from silver to gold status? Events can be made as specific as needed.
This may seem at first to be overthinking, but event naming becomes especially relevant as you remove data and context from the event payload, moving more toward skinny events (the "option 3" from your question, which I discuss below).
That is not to suggest that it is always appropriate to define events at this level of granularity, only that it is an avenue which is open to you early on in the project which may pay dividends later on (or may swamp you with thousands of event types).
Going back to your actual question, let's take each of your options in turn:
Name the event "CustomerUpdate" and include all information (updated or not) about the customer
Let's call this "pattern" the Fat message.
Fat messages (also called snapshots) represent the state of the described entity at a given point in time with all the event context present in the payload. They are interesting because the message itself represents the contract between service and consumer. They can be used for communicating changes of state between business domains, where it may be preferred that all event context be present during message processing by the consumer.
Advantages:
Self consistent - can be consumed entirely without knowledge of other systems.
Simple to consume (upsert).
Disadvantages:
Brittle - the contract between service and consumer is coupled to the message itself.
Easy to overwrite current data with old data if messages arrive in the wrong order (hint: you can mitigate this by using the event sourcing pattern)
Large.
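A minimal sketch of consuming fat messages by upsert, assuming an in-memory dict as the consumer's store. The `version` field and the guarded variant are assumptions added for illustration (a simpler alternative to the event sourcing mitigation in the hint above); without a guard, a late-arriving older snapshot simply overwrites newer data.

```python
from typing import Any

# Consumer-side store: customer_id -> latest snapshot (hypothetical in-memory store).
store: dict[str, dict[str, Any]] = {}


def upsert_naive(snapshot: dict[str, Any]) -> None:
    """Fat message consumption: replace whatever we hold with the full snapshot."""
    store[snapshot["customer_id"]] = snapshot


def upsert_guarded(snapshot: dict[str, Any]) -> bool:
    """Same upsert, but ignore snapshots older than the one already stored.

    Assumes the producer stamps each snapshot with a monotonically increasing
    'version' per customer (an assumption, not part of the original answer).
    """
    current = store.get(snapshot["customer_id"])
    if current is not None and current["version"] >= snapshot["version"]:
        return False  # stale or duplicate snapshot: keep what we have
    store[snapshot["customer_id"]] = snapshot
    return True


# Two snapshots delivered out of order: v2 (Gold) arrives before v1 (Silver).
v1 = {"customer_id": "1234", "version": 1, "full_name": "Bob", "account_level": "Silver"}
v2 = {"customer_id": "1234", "version": 2, "full_name": "Bob", "account_level": "Gold"}

upsert_naive(v2)
upsert_naive(v1)
print(store["1234"]["account_level"])  # Silver: old data overwrote new data

store.clear()
upsert_guarded(v2)
upsert_guarded(v1)
print(store["1234"]["account_level"])  # Gold: the stale snapshot was ignored
```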
Name the event "CustomerUpdate" and include only information that has really been updated
Let's call this pattern the Delta message.
Deltas are similar to fat messages in many ways, though they are generally more complex to generate and consume. A good example here is the JSON Patch standard (RFC 6902).
Because they are only a partial description of the event entity, deltas also come with a built-in assumption that the consumer knows something about the event being described. For this reason they may be less suitable for sending outside a business domain, where the event entity may not be well known.
Deltas really shine when synchronising data between systems sharing the same entity model, ideally persisted in non-relational storage (e.g. NoSQL). In this instance an entity can be retrieved, the delta applied, and then persisted again with minimal effort.
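To make the retrieve/apply/persist cycle concrete, here is a minimal sketch that applies a JSON Patch-style delta to a stored entity. It supports only a subset of RFC 6902 (`add`, `replace`, `remove` on object members; no array indices, no `~0`/`~1` escaping), and the `customer` document is made up; a real system would use a complete JSON Patch library.

```python
import copy
from typing import Any


def apply_patch(doc: dict[str, Any], patch: list[dict[str, Any]]) -> dict[str, Any]:
    """Apply a tiny subset of JSON Patch (add/replace/remove on object members)."""
    result = copy.deepcopy(doc)  # never mutate the stored entity in place
    for op in patch:
        # "/address/town" -> ["address", "town"]; escapes (~0, ~1) are not handled here.
        *parents, key = op["path"].split("/")[1:]
        target = result
        for part in parents:
            target = target[part]
        if op["op"] in ("add", "replace"):
            if op["op"] == "replace" and key not in target:
                raise KeyError(f"cannot replace missing member {op['path']}")
            target[key] = op["value"]
        elif op["op"] == "remove":
            del target[key]
        else:
            raise ValueError(f"unsupported op {op['op']}")
    return result


# Retrieve the entity, apply the delta, then persist `updated` again.
customer = {"id": "1234", "name": "Bob", "level": "Silver", "address": {"town": "Leeds"}}
delta = [
    {"op": "replace", "path": "/level", "value": "Gold"},
    {"op": "replace", "path": "/address/town", "value": "York"},
]
updated = apply_patch(customer, delta)
print(updated)
```

Note that the consumer must already hold the full entity for the patch to make sense, which is the "assumes knowledge of the data entity" drawback listed below.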
Advantages:
Smaller than Fat messages
Excels in use cases involving shared entity models
Portable (if based on a standard such as JSON Patch, or to a lesser extent, DiffGram)
Disadvantages:
Similar to the Fat message, assumes complete knowledge of the data entity.
Easy to overwrite current data with old data.
Complex to generate and consume (except for specific use cases)
Name the event "CustomerUpdate" and include minimum information (Identifier) and/or a URI to let the consumer retrieve information about this Customer.
Let's call this the Skinny message.
Skinny messages are different from the other message patterns you have defined, in that the service/consumer contract is no longer explicit in the message, but implied in that at some later time the consumer will retrieve the event context. This decouples the contract and the message exchange, which is a good thing.
This may or may not lend itself well to cross-business domain communication of events, depending on how your enterprise is set up. Because the event payload is so small (usually an ID with some headers), there is no context other than the name of the event on which the consumer can base processing decisions; therefore it becomes more important to make sure the event is named appropriately, especially if there are multiple ways a consumer could handle a CustomerUpdated message.
Additionally it may not be good practice to include an actual resource address in the event data - because events are things which have already happened, event messages are generally immutable and therefore any information in the event should be true forever in case the events need to be replayed. In this instance a resource address could easily become obsolete and events would not be re-playable.
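A minimal sketch of a skinny event and its consumer, assuming the event carries only the customer ID and a timestamp, with the meaning carried by the event name. Following the point about resource addresses, the consumer builds the lookup from the ID at processing time instead of trusting a URI stored in the event. `CUSTOMER_API_BASE`, `fetch_customer` and the event name are hypothetical, and the fetch is stubbed with a dict so the example runs offline.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical current location of the customer service; it can change over time
# without invalidating old events, because events never store it.
CUSTOMER_API_BASE = "https://customers.example.internal/v2/customers"

# Stub standing in for the customer service (the source of truth).
_CUSTOMER_SERVICE = {"1234": {"id": "1234", "name": "Robert", "level": "Gold"}}


@dataclass(frozen=True)
class CustomerUpgradedToGold:
    """Skinny event: the name carries the meaning, the payload only identifies the entity."""
    customer_id: str
    occurred_at: datetime


def fetch_customer(customer_id: str) -> dict:
    """Hypothetical lookup; a real consumer would issue an HTTP GET here."""
    url = f"{CUSTOMER_API_BASE}/{customer_id}"  # resolved now, not stored in the event
    print("GET", url)
    return _CUSTOMER_SERVICE[customer_id]


def handle(event: CustomerUpgradedToGold) -> dict:
    # The extra call is the cost of skinny events: the consumer must know about
    # the customer service, and the data may have changed since the event occurred.
    return fetch_customer(event.customer_id)


evt = CustomerUpgradedToGold("1234", datetime.now(timezone.utc))
print(handle(evt)["level"])
```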
Advantages:
Decouples service contract from message.
Information about the event contained in the event name.
Naturally idempotent (with time-stamp).
Generally tiny.
Simple to generate and consume.
Disadvantages:
Consumer must make additional call to retrieve event context - requires explicit knowledge of other systems.
Event context may have become obsolete at the point where the consumer retrieves it, making this approach generally unsuitable for some real-time applications.
When raising an event, which pattern is the most suited?
I think the answer to this is: it depends on lots of things, and there is probably no one right answer.
Update from comments: Also worth reading, a very old, classic, blog post on messaging: https://learn.microsoft.com/en-gb/archive/blogs/nickmalik/killing-the-command-message-should-we-use-events-or-documents (also here: http://vanguardea.com/killing-the-command-message-should-we-use-events-or-documents/)
Martin Fowler gave a great talk about "The Many Meanings of Event-Driven Architecture" (the content is based on this paper) in which he mentioned the Event-Carried State Transfer pattern.
It seems to be close to your second option, "Delta message", with the difference that it doesn't try to describe an entity, but instead describes a named business fact that happened and carries all the data necessary to understand this fact.
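A sketch of what events named after business facts might look like, in the spirit of Event-Carried State Transfer: each event type carries the data needed to understand that one fact, and the consumer decides which parts it keeps, rather than receiving a generic entity diff. The event names (`CustomerRelocated`, `CustomerUpgraded`) and the consumer's local view are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class CustomerRelocated:
    customer_id: str
    new_street: str
    new_town: str
    new_postcode: str


@dataclass(frozen=True)
class CustomerUpgraded:
    customer_id: str
    from_level: str
    to_level: str


CustomerEvent = Union[CustomerRelocated, CustomerUpgraded]


def apply(view: dict, event: CustomerEvent) -> None:
    """The consumer keeps only the fields it cares about, decided per business fact."""
    row = view.setdefault(event.customer_id, {})
    if isinstance(event, CustomerRelocated):
        row["postcode"] = event.new_postcode  # e.g. a shipping service needs only this
    elif isinstance(event, CustomerUpgraded):
        row["level"] = event.to_level
    else:
        raise TypeError(f"unknown event {event!r}")


shipping_view: dict = {}
apply(shipping_view, CustomerRelocated("1234", "1 High St", "York", "YO1 7HH"))
apply(shipping_view, CustomerUpgraded("1234", "Silver", "Gold"))
print(shipping_view)  # {'1234': {'postcode': 'YO1 7HH', 'level': 'Gold'}}
```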
I don't think it matters how you have modeled your persistence layer when it comes to designing domain events. Likewise, I don't think it matters how your consumer has modeled its own persistence layer when designing domain events.
Thus, I don't think it's wise to list as an advantage the fact that you can apply the event as a patch directly on your data (from a consumer point of view), because it pushes the producer to design their events around the persistence model of a consumer.
In that case, I would tend to think that you're designing persistence patches, instead of domain events.
What do you think?
When we talk about sourcing events, we have a simple dual-write architecture where we write to the database and then write the events to a queue like Kafka. Other downstream systems can read those events and act on them accordingly.
But the problem arises when trying to keep the DB and the events in sync, as the ordering of these events is required to make sense of them.
To solve this problem, people recommend using database commit logs as a source of events, and there are tools built around this, like Airbnb's Spinal Tap, Red Hat's Debezium, Oracle's GoldenGate, etc. They solve the problems of consistency, ordering guarantees and so on.
But the problem with using the database commit log as an event source is that we become tightly coupled to the DB schema. The DB schema of a microservice is exposed, and any breaking change in the DB schema, like a datatype change or a column rename, can break the downstream systems.
So is using the DB CDC as an event source a good idea?
A talk on this problem and using Debezium for event sourcing
Extending Constantin's answer:
TL;DR:
Transaction log tailing/mining should be hidden from others.
It is not strictly an event stream, as you should not access it directly from other services. It is generally used when gradually transitioning a legacy system to a microservices-based architecture. The flow could look like this:
Service A commits a transaction to the DB
A framework or service polls the commit log and maps new commits to Kafka as events
Service B is subscribed to a Kafka stream and consumes events from there, not from the DB
Longer story:
Service B doesn't see that your event originated from the DB, nor does it access the DB directly. The commit data should be projected into an event. If you change the DB, you should only modify your projection rule to map commits in the new schema to the "old" event format, so consumers don't have to change. (I am not familiar with Debezium, or whether it can do this projection.)
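A sketch of such a projection rule, assuming change records shaped loosely like a Debezium change event (an `op` code plus `before`/`after` row images). The column names and the public `CustomerRenamed` event format are hypothetical. The point is that only this function knows the table layout: when a column is renamed, the function changes and the published event stays the same.

```python
from typing import Any, Optional


def project(change: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Map a raw row-level change to the stable, public event format.

    Returns None for changes that have no meaning for other services.
    """
    if change["op"] != "u":  # only updates are projected in this sketch
        return None
    before, after = change["before"], change["after"]
    # Tolerate both the old column name and a hypothetical renamed column,
    # so the schema migration is invisible to consumers.
    old_name = before.get("display_name", before.get("full_name"))
    new_name = after.get("display_name", after.get("full_name"))
    if old_name == new_name:
        return None
    return {
        "type": "CustomerRenamed",
        "customer_id": str(after["id"]),
        "new_name": new_name,
    }


# Before the migration: the column is called full_name.
old_schema = {"op": "u", "before": {"id": 1234, "full_name": "Bob"},
              "after": {"id": 1234, "full_name": "Robert"}}
# After the migration: the column is renamed to display_name.
new_schema = {"op": "u", "before": {"id": 1234, "display_name": "Bob"},
              "after": {"id": 1234, "display_name": "Robert"}}

print(project(old_schema) == project(new_schema))
print(project(new_schema))
```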
Your events should be idempotent, because publishing an event and committing a transaction atomically is a problem in a distributed scenario: at best, tools guarantee at-least-once delivery with exactly-once processing semantics, and the exactly-once part is rarer. This is because the event origin (the transaction log) is not the same as the stream that other services access, i.e. the system is distributed. And this is only the producer side; the same problem exists on the Kafka->consumer channel, for a different reason. Also, Kafka will not behave like an event store, so what you have achieved is a message queue.
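Because delivery is at-least-once at best, a consumer can make its processing idempotent by remembering which event IDs it has already applied. A minimal sketch, assuming every event carries a unique `event_id` (an assumption) and keeping the processed set in memory; in a real service that set would be stored in the same transaction as the consumer's own state change.

```python
from typing import Any


class IdempotentConsumer:
    """Applies each event at most once, even if the broker redelivers it."""

    def __init__(self) -> None:
        self.processed_ids: set[str] = set()
        self.balance = 0  # example state: a running total

    def handle(self, event: dict[str, Any]) -> bool:
        if event["event_id"] in self.processed_ids:
            return False  # duplicate delivery: acknowledge but do nothing
        self.balance += event["amount"]
        # In production, persist processed_ids and balance atomically.
        self.processed_ids.add(event["event_id"])
        return True


consumer = IdempotentConsumer()
deliveries = [
    {"event_id": "e1", "amount": 10},
    {"event_id": "e2", "amount": 5},
    {"event_id": "e1", "amount": 10},  # redelivered after a lost acknowledgement
]
for d in deliveries:
    consumer.handle(d)
print(consumer.balance)  # 15
```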
I recommend using a dedicated event store instead if possible, like Greg Young's: https://eventstore.org/. This solves the problem by integrating an event store and a message broker into a single solution. By storing an event (in JSON) to a stream, you also "publish" it, as consumers are subscribed to this stream. If you want to further decouple the services, you can write projections that map events from one stream to another stream. Event consumption should be idempotent with this approach too, but you get an event store that is partitioned by aggregate and is pretty fast to read.
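The idea that storing an event is also publishing it can be sketched with a toy in-memory store: appending to a stream and notifying that stream's subscribers happen in the same call, so there is no second system to fall out of sync with. This illustrates the concept only and is not the EventStoreDB client API; `append`, `subscribe` and the `expected_version` check (which mimics per-stream optimistic concurrency) are made-up names.

```python
from collections import defaultdict
from typing import Any, Callable

Handler = Callable[[str, int, dict[str, Any]], None]


class InMemoryEventStore:
    """Toy event store: each stream is an ordered list, and appends are pushed to subscribers."""

    def __init__(self) -> None:
        self.streams: dict[str, list[dict[str, Any]]] = defaultdict(list)
        self.subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, stream: str, handler: Handler) -> None:
        self.subscribers[stream].append(handler)

    def append(self, stream: str, event: dict[str, Any], expected_version: int) -> int:
        events = self.streams[stream]
        if len(events) != expected_version:
            raise RuntimeError(f"concurrency conflict on {stream}")
        events.append(event)
        version = len(events)
        # Storing is publishing: subscribers see the event as part of the append.
        for handler in self.subscribers[stream]:
            handler(stream, version, event)
        return version


received: list[tuple[str, int, str]] = []
event_store = InMemoryEventStore()
event_store.subscribe("customer-1234", lambda s, v, e: received.append((s, v, e["type"])))
event_store.append("customer-1234", {"type": "CustomerCreated"}, expected_version=0)
event_store.append("customer-1234", {"type": "CustomerUpgraded"}, expected_version=1)
print(received)
```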
If you want to store the data in the SQL DB too, then listen to these events and insert/update the tables based on them; just do not use your SQL DB as your event store, because it will be hard to implement it correctly (failure-proof).
For the ordering part: reading events from one stream will be ordered. Projections that aggregate multiple event streams can only guarantee ordering between events originating from the same stream. That is usually more than enough. (By the way, you could reorder the messages on the consumer side based on some field if necessary.)
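For the consumer-side reordering mentioned in the parenthetical, a minimal sketch, assuming each message carries a per-stream sequence number starting at 1 (a hypothetical `seq` field). Messages that arrive early are buffered until the gap before them is filled, and anything already released is dropped as a duplicate.

```python
from typing import Any


class Reorderer:
    """Releases messages in seq order, buffering any that arrive ahead of time."""

    def __init__(self) -> None:
        self.next_seq = 1
        self.pending: dict[int, dict[str, Any]] = {}

    def push(self, msg: dict[str, Any]) -> list[dict[str, Any]]:
        """Accept one message and return every message that is now in order."""
        if msg["seq"] >= self.next_seq:
            self.pending.setdefault(msg["seq"], msg)  # ignore duplicates of buffered msgs
        released = []
        while self.next_seq in self.pending:
            released.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return released


r = Reorderer()
order = []
for seq in [2, 3, 1, 3, 4]:  # 1 arrives late, 3 is delivered twice
    for m in r.push({"seq": seq, "body": f"event {seq}"}):
        order.append(m["seq"])
print(order)  # [1, 2, 3, 4]
```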
If you are using Event sourcing:
Then the coupling should not exist. The Event store is generic, it doesn't care about the internal state of your Aggregates. You are in the worst case coupled with the internal structure of the Event store itself but this is not specific to a particular Microservice.
If you are not using Event sourcing:
In this case there is coupling between the internal structure of the Aggregates and the CDC component (which captures the data change and publishes the event to a message queue or similar). To limit the effects of this coupling to the Microservice itself, the CDC component should be part of it. This way, when the internal structure of the Aggregates in the Microservice changes, the CDC component is changed too and the outside world doesn't notice. Both changes are deployed at the same time.
So is using the DB CDC as an event source a good idea?
"Is it a good idea?" is a question that depends on your context and on the costs and benefits of the different trade-offs that you need to make.
That said, it's not an idea that is consistent with the heritage of event sourcing as I learned it.
Event sourcing - the idea that our book of record is a ledger of state changes - has been around a long long time. After all, when we talk about "ledger", we are in fact alluding to those documents written centuries ago that kept track of commerce.
But a lot of the discussion of event sourcing in software is heavily influenced by domain-driven design; DDD advocates (among other things) aligning your code concepts with the concepts in the domain you are modeling.
So here's the problem: unless you are in some extreme edge case, your database is probably some general purpose application that you are customizing/configuring to meet your needs. Change data capture is going to be limited by the fact that it is implemented using general purpose mechanisms. So the events that are produced are going to look like general purpose patch documents (here's the diff between before and after).
But if we are trying to align our events with our domain concepts (i.e., what does this change to our persisted state mean?), then patch documents are a step in the wrong direction.
For example, our domain might have multiple "events" that make changes to the same, or very similar, sets of fields in our model. Trying to rediscover the motivation for a change by reverse engineering the diff is kind of a dumb problem to have; especially when we have already fought with the same sort of problem learning user interface design.
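A tiny illustration of that point: two different domain events can produce exactly the same row-level diff, so a consumer that only sees the diff cannot tell them apart. The row, the fields and the two upgrade reasons are hypothetical, and `row_diff` only approximates what a general-purpose CDC tool reports.

```python
def row_diff(before: dict, after: dict) -> dict:
    """Roughly what a general-purpose CDC tool can report: the columns that changed."""
    return {k: v for k, v in after.items() if before.get(k) != v}


row = {"customer_id": 1234, "level": "Silver"}

# Hypothetical domain event A: upgraded because of spending thresholds.
after_spending_upgrade = {**row, "level": "Gold"}
# Hypothetical domain event B: upgraded as compensation for a service failure.
after_goodwill_upgrade = {**row, "level": "Gold"}

diff_a = row_diff(row, after_spending_upgrade)
diff_b = row_diff(row, after_goodwill_upgrade)
print(diff_a, diff_b, diff_a == diff_b)  # {'level': 'Gold'} {'level': 'Gold'} True
```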
In some domains, a general purpose change is good enough. In some contexts, a general purpose change is good enough for now. Horses for courses.
But it's not really the sort of implementation that the "event sourcing" community is talking about.
Besides doing it in the CDC component, as Constantin Galbenu mentioned, you can also do it on the event storage side, for example with the Kafka Streams API.
What is the Kafka Streams API? Input is read from one or more topics in order to generate output to one or more topics, effectively transforming the input streams into output streams.
After transforming the detailed data into abstract data, your DB schema is bound only to the transformation, which removes the tight coupling between the DB and the subscribers.
If your data schema needs to change a lot, maybe you should add a new topic for it.
In a classical microservice architecture, you have relevant domain events published on some messaging system which allows other parts of the system to react.
Now imagine you have three microservices: Customers, Orders and Recommendation. The Recommendation microservice needs information from Customers and Orders to provide its functionality, such as the list of all customers and all orders, which is going to be analyzed by some machine learning algorithm. Now you need to have the state of Customers "joined" with Orders in the Recommendation microservice:
You have the Recommendation microservice listen to domain events published by Customers and Orders and build its own state. This leads to logic duplication, since you probably have that same logic inside Customers and Orders already.
On each relevant domain message from Customers and Orders, you just go to them and ask for the state of a specific customer or order. This works fine; however, if you have N services rather than just one that need to build a materialized view, you will put a big load on Customers and Orders.
You have Customers and Orders themselves publish "heavy-weight" events (not domain events) that allow any other microservice to build a materialized view without processing domain events. This lets you a) avoid duplicating the logic and b) avoid repeatedly asking for the same information.
Does pattern no. 3 have drawbacks we couldn't figure out, and if not, how do you implement it in Lagom?
I will try to explain a few more bits in the hope to give you some more perspective on that matter and how you can achieve it in a reliable way in Lagom.
We have a few concepts that we must keep in mind. The most important one which is the source of all is Event Sourcing itself. Event Sourcing means that any State in the system has its source in Events.
The first State that we will deal with is the State of the PersistentEntity. This State is prominent because, together with the Command and Event Handler, it defines the consistency boundary of your model.
But there are other States in the system. Actually, we can create as many as we want because we have the Event Journal. A read model is also a State, and it’s also generated from the events.
There are many reasons why you shouldn’t publish the State of the PersistentEntity to other systems. The first one being a matter of avoiding coupling. You don’t want your data to leak to other services. That’s all about having an anti-corruption layer (ACL).
So, from here we could say: before publishing Order and Customer to Recommendation Service, I will transform it to OrderView and CustomerView (ACL 101).
The question now is when will you do it? If you try to publish it in Kafka after you have handled a command, you don’t have any guarantee that the State will be published. There are no XA transactions between the event journal and the Kafka topic. So, there is a chance that the events are persisted, but for some reason, the State is not published in Kafka.
If you want data to get out of a service in a reliable way and without creating coupling between services, you have the following options:
Use the broker API and publish the events to a topic. You should not publish the events as they are, but transform them into the format of your external API (ACL).
Use a read-side processor to generate a view of it, again in the external API format you want to make available. If you want, you can publish that ViewState to a topic so other services can consume it directly.
That said, there is nothing wrong in publishing something in a topic that is not a real event, but some derived State. The problem is how you can guarantee that it is effectively published. Doing that from inside the PersistentEntity is risky because you have at-most-once semantics. The most reliable way of doing it is a read-side process that gives you at-least-once semantics.
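A minimal sketch of why a read-side processor gives at-least-once publishing: it reads the journal from a stored offset, transforms each event into the external format (the ACL), publishes it, and only then saves the new offset. If it crashes between publishing and saving the offset, the event is published again on restart rather than lost. The journal, topic and offset store are plain Python stand-ins here, not the Lagom API.

```python
from typing import Any

journal = [  # events persisted by the entity, in order
    {"type": "OrderPlaced", "order_id": "o1", "internal_note": "vip"},
    {"type": "OrderShipped", "order_id": "o1", "internal_note": "fast lane"},
]
topic: list[dict[str, Any]] = []  # stands in for a Kafka topic
saved_offset = 0  # stands in for the processor's durable offset store


def to_external(event: dict[str, Any]) -> dict[str, Any]:
    """Anti-corruption layer: publish only the public part of the event."""
    return {"type": event["type"], "order_id": event["order_id"]}


def run_processor(crash_after_publish_at: int = -1) -> None:
    global saved_offset
    offset = saved_offset
    while offset < len(journal):
        topic.append(to_external(journal[offset]))  # publish first
        if offset == crash_after_publish_at:
            raise RuntimeError("crashed before saving offset")
        offset += 1
        saved_offset = offset  # then record progress


try:
    run_processor(crash_after_publish_at=1)
except RuntimeError:
    pass
run_processor()  # restart resumes from the last saved offset
print([e["type"] for e in topic])
# ['OrderPlaced', 'OrderShipped', 'OrderShipped']: duplicated, but nothing lost
```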
Further comments inline...
Listen to domain events from customer and orders and rebuild the state in the recommendation service. This is a horrible idea because you would need to duplicate the logic that handles events across different bounded contexts
That's not a horrible idea. That's how you make your services independent from each other. The logic that you will need to implement to consume the events is not the same. As you said, it's a different bounded context; as such, it only gets what it needs.
Leaking the State from a BC to another is more problematic for the reasons I mentioned above (anti-corruption layer).
To achieve decoupling you do need more coding, and there is nothing wrong with that. At the end of the day, the reason for building microservices is to avoid coupling and be able to let the services evolve and scale without interfering with each other. There is a price to pay for that, and the price is writing more code. You need to evaluate the trade-offs.
You can consume your own events, produce an OrderView and CustomerView and publish into Kafka, but that's the same as consuming the events directly on the Recommendation Service.
Note that you also need to store OrderView and CustomerView somewhere in the Recommendation Service. So you end up storing it three times: in the original service (view table), in Kafka, and in the Recommendation Service.
That's why publishing events in a topic is the best option to propagate data between services.
Every time we receive a domain event from customers or orders, go to them and ask them the state. This is horrible because if you have more than one microservice that needs their state, you will end up producing load on customers and orders
That is indeed a horrible idea because you will make the Recommendation Service be dependent on the other two services. If Order or Customer is down, the Recommendation will be down as well. That's what a broker helps to solve.
Have customers and orders not only publish events but also state, and have all the services that need to build materialized views listen to the state they need. How do you apply the last pattern with Lagom? We found no way to listen to state changes, just to events. One solution we considered implied publishing the state with pubSub in the onEvent handler of a persistent entity, but I am not sure this is the right place to make it happen.
Using pubSub in the onEvent handler is the worst solution of all. For the following reasons:
pubSub has at-most-once semantics (see comments above)
Event handlers are called many times. Whenever you rehydrate an Entity, the events are replayed and the event handlers are used for that, which means that you will re-publish the state each time. Actually, you would solve the at-most-once pubSub problem, but not the way you might expect/desire.
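A toy illustration of the second point: if the event handler has a side effect such as publishing state, that side effect fires again every time the entity is rehydrated by replaying its events. `CounterEntity`, `rehydrate` and the `published` list are hypothetical stand-ins, not Lagom's PersistentEntity API.

```python
published: list[int] = []  # stands in for a pub-sub topic


class CounterEntity:
    def __init__(self) -> None:
        self.count = 0

    def on_event(self, event: str) -> None:
        if event == "Incremented":
            self.count += 1
        published.append(self.count)  # side effect inside the event handler: the mistake


journal = ["Incremented", "Incremented"]


def rehydrate() -> CounterEntity:
    entity = CounterEntity()
    for e in journal:  # replay, as happens after a restart or when the entity is reloaded
        entity.on_event(e)
    return entity


rehydrate()
rehydrate()  # e.g. the entity was unloaded from memory and loaded again
print(published)  # [1, 2, 1, 2]: state re-published on every replay
```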
You could use the afterPersist callback for that, but that's not reliable either, because pubSub is at-most-once.
PubSub inside a PersistentEntity should not be used for something that you need to be reliable. It's a best-effort capability, that's all.
I have recently been building an application on top of Greg Young's EventStore as my persistence layer, and I have been pondering how big I should allow an event to get.
For example, I have a UK Address aggregate with the following fields:
UK_Address
-BuildingName
-Street
-Locality
-Town
-Postcode
Now I'm building the UI using React/Redux and was thinking: should I create a single FAT addressUpdated event containing all the above fields?
Or should I create an event for each of the different fields (buildingNameUpdated, streetUpdated, localityUpdated) and batch them within the client until the Save event is fired?
I'm not sure the answer is as black and white as I have asked it; what I would really like to know is what conditions/constraints you could use to make the decision.
should I create a event for each of the different fields?
No. The representations of your events are part of the API -- so you want to use spellings that make sense at the level of the business, not at the level of the implementation.
Now I'm building the UI using React/Redux and was thinking should I create a single FAT updateAddress Event containing all the above fields?
You don't need to constrain the data that you send to your UI to match that which is in the persistence store. The UI is just a cached representation of a read model; there's no reason that representation needs to have the same form as what is in your event store.
Consider the React model itself -- your code makes changes to the "in memory" representation of your data, and then the library computes the new DOM and replaces it, which in turn causes the browser to update its view, which in turn causes the pixels on the screen to change.
So taking a fat event from the store, and breaking it into field level events for the UI is fine. Taking multiple events from the store and aggregating them into a single message for the UI is also fine. Taking events from the event store and transforming them into a spelling that the UI will recognize is also fine.
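A sketch of those transformations, assuming a fat `AddressUpdated` store event with the UK_Address fields from the question and made-up Redux-style UI actions (`{"type": ..., "payload": ...}` dicts). The same stored events can be split, merged, or renamed for the UI without changing what is stored; the action names and sample address are hypothetical.

```python
from typing import Any

AddressUpdated = dict[str, Any]  # fat store event

stored_event: AddressUpdated = {
    "type": "AddressUpdated",
    "address_id": "a1",
    "building_name": "Rose Cottage",
    "street": "High St",
    "locality": "Heworth",
    "town": "York",
    "postcode": "YO31 7AA",
}

FIELDS = ["building_name", "street", "locality", "town", "postcode"]


def split_for_ui(event: AddressUpdated) -> list[dict[str, Any]]:
    """One fat store event -> one field-level UI action per field."""
    return [
        {"type": field.upper() + "_CHANGED",
         "payload": {"id": event["address_id"], "value": event[field]}}
        for field in FIELDS
    ]


def merge_for_ui(events: list[AddressUpdated]) -> dict[str, Any]:
    """Many store events -> a single UI message holding the latest value of each field."""
    merged: dict[str, Any] = {}
    for e in events:
        merged.update({f: e[f] for f in FIELDS})
    return {"type": "ADDRESS_LOADED", "payload": merged}


print(len(split_for_ui(stored_event)))  # 5
print(merge_for_ui([stored_event])["payload"]["town"])  # York
```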
Do you have any comment regarding Arien's answer about keeping fields that need to be consistent together, so that regardless of when you snapshot the current state of the world, it would be in a valid state?
I don't believe that this makes sense, and I'm not sure if it is possible in general.
It doesn't make sense, because "valid state" is a write model concern only; events are things that have happened, and it's too late to vote on whether they are valid or not. For instance, if you deploy a new model with a new invariant, it still needs to respect the history of what happened before. So you can build a snapshot for that new model, but the snapshot may not be "valid". Too bad.
Given that, I don't think it makes sense to worry over whether each individual event in a commit leaves the snapshot in a valid state.
In particular, if a particular transaction involves multiple entities, it is very likely that the domain language will suggest an event for each entity (we "debit cash" and "credit accounts receivable"). The entities themselves, of course, are capable of changing independently of each other -- it's the aggregate that maintains the balance.
You have to bundle all the information together in one event when this data has to be consistent.
So when you update an address one field at a time, a reader can end up with an unwanted address.
This will happen when the client has not processed all the events at a certain time due to eventual consistency.
Example:
Change address (City=1, Street=1, Housenumber=1) to (City=2, Street=2, Housenumber=2)
When you do this with 3 events and you have just processed one at the time of reading you could get the address: (City=2, Street=1, Housenumber=1).
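The example can be reproduced with a small simulation: a read model that has processed only some of the per-field events shows an address that never existed, whereas a single fat event is applied all at once. The event shapes are hypothetical.

```python
def apply_events(address: dict, events: list[dict], processed: int) -> dict:
    """Apply only the first `processed` events, as a lagging read model would."""
    view = dict(address)
    for e in events[:processed]:
        view.update(e["changes"])
    return view


start = {"City": 1, "Street": 1, "Housenumber": 1}

per_field = [
    {"type": "CityChanged", "changes": {"City": 2}},
    {"type": "StreetChanged", "changes": {"Street": 2}},
    {"type": "HousenumberChanged", "changes": {"Housenumber": 2}},
]
fat = [{"type": "AddressChanged", "changes": {"City": 2, "Street": 2, "Housenumber": 2}}]

print(apply_events(start, per_field, processed=1))  # {'City': 2, 'Street': 1, 'Housenumber': 1}
print(apply_events(start, fat, processed=1))        # {'City': 2, 'Street': 2, 'Housenumber': 2}
```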
If in doubt, try the solution that is easier to implement. I guess the "FAT" event will be easier: you will end up spending less time implementing, debugging and supporting it.
This is usually referred to as the YAGNI, KISS and Occam's razor principles.
In theory, and I find it to be a good rule of thumb, your commands and events should reflect the intent of the user, staying true to DDD. You can find a good explanation of the pros and cons of event granularity here: https://medium.com/@hugo.oliveira.rocha/what-they-dont-tell-you-about-event-sourcing-6afc23c69e9a
For a couple of days I've been trying to figure out how to inform the rest of the microservices that a new entity was created in microservice A, which stores that entity in MongoDB.
I want to:
Have low coupling between the microservices
Avoid distributed transactions between microservices like Two Phase Commit (2PC)
At first, a message broker like RabbitMQ seems to be a good tool for the job, but then I see the problem that committing the new document to MongoDB and publishing the message to the broker is not atomic.
Why event sourcing? by eventuate.io:
One way of solving this issue involves making the schema of the documents a bit dirtier by adding a flag that says whether the document has been published to the broker, and having a scheduled background process that searches for unpublished documents in MongoDB and publishes them to the broker using confirmations; when the confirmation arrives, the document is marked as published (using at-least-once and idempotency semantics). This solution is proposed in these two answers.
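A minimal sketch of that flag-and-poll approach with in-memory stand-ins: `documents` plays the MongoDB collection, and `FlakyBroker.publish` plays a broker call that returns True only when a publisher confirmation arrives. Both are hypothetical. A document is marked as published only after confirmation, so a failure leads to a retry on the next poll (at-least-once), which is why consumers must de-duplicate.

```python
from typing import Any

documents: list[dict[str, Any]] = []  # stands in for the MongoDB collection


def create_entity(entity_id: str) -> None:
    # Single local write: the entity and its "published" flag are stored together.
    documents.append({"_id": entity_id, "published": False})


class FlakyBroker:
    """Hypothetical broker whose first publish attempt is never confirmed."""

    def __init__(self) -> None:
        self.delivered: list[str] = []
        self.attempts = 0

    def publish(self, message: str) -> bool:
        self.attempts += 1
        self.delivered.append(message)  # the message may reach the broker...
        return self.attempts > 1        # ...but the first confirmation is lost


def poll_once(broker: FlakyBroker) -> None:
    """Background job: publish unpublished documents, flag them only on confirmation."""
    for doc in documents:
        if not doc["published"] and broker.publish(f"EntityCreated:{doc['_id']}"):
            doc["published"] = True


broker = FlakyBroker()
create_entity("42")
poll_once(broker)  # publish not confirmed: the document stays unpublished
poll_once(broker)  # retried and confirmed
print(broker.delivered, documents[0]["published"])
# ['EntityCreated:42', 'EntityCreated:42'] True: duplicated, never lost
```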
While reading An Introduction to Microservices by Chris Richardson, I ended up at this great presentation, Developing functional domain models with event sourcing, where one of the slides asked:
How to atomically update the database and publish events without 2PC? (dual write problem).
The answer is simple (on the next slide)
Update the database and publish events
This is a different approach from the one based on CQRS à la Greg Young.
The domain repository is responsible for publishing the events; this would normally be inside a single transaction together with storing the events in the event store.
I think that delegating the responsibilities of storing and publishing the events to the event store is a good thing, because it avoids the need for 2PC or a background process.
However, in a certain way it's true that:
If you rely on the event store to publish the events you'd have a tight coupling to the storage mechanism.
But we could say the same if we adopt a message broker for intercommunication between the microservices.
The thing that worries me more is that the Event Store seems to become a Single Point of Failure.
If we look at this example from eventuate.io
we can see that if the event store is down, we can't create accounts or money transfers, losing one of the advantages of microservices (although the system will continue responding to queries).
So, is it correct to say that the Event Store, as used in the eventuate example, is a Single Point of Failure?
What you are facing is an instance of the Two Generals' Problem. Basically, you want two entities on a network to agree on something, but the network is not fail-safe. This has been proven to be impossible.
So no matter how many new entities you add to your network, the message queue being one, you will never have 100% certainty that agreement will be reached. In fact, the opposite takes place: the more entities you add to your distributed system, the less you can be certain that an agreement will eventually be reached.
A practical answer to your case is that 2PC is not that bad compared with adding even more complexity and single points of failure. If you absolutely do not want a single point of failure and want to assume that the network is reliable (in other words, that the network itself cannot be a single point of failure), you can try a P2P algorithm such as DHT, but for two peers I bet it reduces to simple 2PC.
We handle this with the Outbox approach in NServiceBus:
http://docs.particular.net/nservicebus/outbox/
This approach requires that the initial trigger for the whole operation came in as a message on the queue but works very well.
You could also create a flag for each entry inside the event store which tells whether this event has already been published. Another process could poll the event store for those unpublished events and put them into a message queue or topic. The disadvantage of this approach is that consumers of this queue or topic must be designed to de-duplicate incoming messages, because this pattern only guarantees at-least-once delivery. Another disadvantage could be latency because of the polling frequency. But since we have already entered eventually consistent territory here, this might not be such a big concern.
How about if we have two event stores, and whenever a Domain Event is created, it is queued onto both of them. And the event handler on the query side, handles events popped from both the event stores.
Of course, every event should be idempotent.
But wouldn’t this solve our problem of the event store being a single point of failure?
This is not specifically a MongoDB solution, but have you considered leveraging the Streams feature introduced in Redis 5 to implement a reliable event store? Take a look at this intro here.
I find that it has a rich set of features like message tailing and message acknowledgement, as well as the ability to extract unacknowledged messages easily. This surely helps to implement at-least-once messaging guarantees. It also supports load balancing of messages using the "consumer group" concept, which can help with scaling the processing part.
Regarding your concern about being the single point of failure, as per the documentation, streams and consumer information can be replicated across nodes and persisted to disk (using regular Redis mechanisms I believe). This helps address the single point of failure issue. I'm currently considering using this for one of my microservices projects.
I am designing some events that will be raised when actions are performed or data changes in a system. These events will likely be consumed by many different services and will be serialized as XML, although more broadly my question also applies to the design of more modern funky things like Webhooks.
I'm specifically thinking about how to describe changes with an event and am having difficulty choosing between different implementations. Let me illustrate my quandary.
Imagine a customer is created, and a simple event is raised.
<CustomerCreated>
<CustomerId>1234</CustomerId>
<FullName>Bob</FullName>
<AccountLevel>Silver</AccountLevel>
</CustomerCreated>
Now let's say Bob spends lots of money and becomes a gold customer, or indeed any other property changes (e.g. he now prefers to be known as Robert). I could raise an event like this.
<CustomerModified>
<CustomerId>1234</CustomerId>
<FullName>Bob</FullName>
<AccountLevel>Gold</AccountLevel>
</CustomerModified>
This is nice because the schemas of the Created and Modified events are the same and any subscriber receives the complete current state of the entity. However, it is difficult for any receiver to determine which properties have changed without tracking state themselves.
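To make the trade-off concrete, here is a sketch of what "tracking state themselves" could look like for a receiver of the full-state event, assuming the flat XML shape shown above and Python's standard `xml.etree.ElementTree`. `CustomerMirror` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


class CustomerMirror:
    """Hypothetical subscriber that keeps the last state it saw per customer."""

    def __init__(self) -> None:
        self.state: dict[str, dict[str, str]] = {}

    def on_event(self, xml_text: str) -> dict[str, tuple]:
        root = ET.fromstring(xml_text)
        fields = {child.tag: (child.text or "") for child in root}
        customer_id = fields.pop("CustomerId")
        previous = self.state.get(customer_id, {})
        # Diff against our own copy: this copy is the state the receiver must keep.
        changes = {
            name: (previous.get(name), value)
            for name, value in fields.items()
            if previous.get(name) != value
        }
        self.state[customer_id] = fields
        return changes
```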
I then thought about an event like this.
<CustomerModified>
<CustomerId>1234</CustomerId>
<AccountLevel>Gold</AccountLevel>
</CustomerModified>
This is more compact and only contains the properties that have changed, but comes with the downside that the receiver must apply the changes and reassemble the current state of the entity if they need it. Also, the schemas of the Created and Modified events must be different now; CustomerId is required but all other properties are optional.
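A sketch of the receiving side for the changed-fields-only event, assuming the receiver already holds the customer as a flat dictionary (a hypothetical representation). Fields the event does not mention are kept as they were.

```python
import xml.etree.ElementTree as ET


def apply_delta(current: dict[str, str], xml_text: str) -> dict[str, str]:
    """Merge a changed-fields-only CustomerModified event into local state."""
    root = ET.fromstring(xml_text)
    changes = {child.tag: (child.text or "") for child in root}
    # CustomerId is the only required element; everything else is optional.
    if "CustomerId" not in changes:
        raise ValueError("CustomerId is required")
    merged = dict(current)  # copy, so fields absent from the event survive
    merged.update(changes)
    return merged
```

Note that `child.text or ""` reads an empty element as an empty string, so this shape has no way to say "this field was cleared".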
Then I came up with this.
<CustomerModified>
<CustomerId>1234</CustomerId>
<Before>
<FullName>Bob</FullName>
<AccountLevel>Silver</AccountLevel>
</Before>
<After>
<FullName>Bob</FullName>
<AccountLevel>Gold</AccountLevel>
</After>
</CustomerModified>
This covers all bases as it contains the full current state, plus a receiver can figure out what has changed. The Before and After elements have the exact same schema type as the Created event. However, it is incredibly verbose.
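With the Before/After shape, a receiver can compute the changes without keeping any state of its own. A minimal sketch using the standard library's `xml.etree.ElementTree`, assuming the element names shown above:

```python
import xml.etree.ElementTree as ET


def changed_fields(xml_text: str) -> dict[str, tuple]:
    """Return {field: (before, after)} for every field whose value differs."""
    root = ET.fromstring(xml_text)
    before = {c.tag: c.text for c in root.find("Before")}
    after = {c.tag: c.text for c in root.find("After")}
    # Union of keys, so fields added or dropped between the two are reported too.
    return {
        name: (before.get(name), after.get(name))
        for name in before.keys() | after.keys()
        if before.get(name) != after.get(name)
    }
```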
I've struggled to find any good examples of events; are there any other patterns I should consider?
You tagged the question as "Event Sourcing", but your question seems to be more about Event-Driven SOA.
I agree with @Matt's answer: "CustomerModified" is not granular enough to capture intent if there are multiple business reasons why a Customer would change.
However, I would back up even further and ask you to consider why you are storing Customer information in a local service, when it seems that you (presumably) already have a source of truth for customers. The starting point for consuming Customer information should be to get it from the source when it's needed. Storing a copy of information that can be queried reliably from the source may very well be an unnecessary optimization (and complication).
Even if you do need to store Customer data locally (and there are certainly valid reasons for needing to do so), consider passing only the data necessary to construct a query against the source of truth (the service emitting the event):
<SomeInterestingCustomerStateChange>
<CustomerId>1234</CustomerId>
</SomeInterestingCustomerStateChange>
So these event types can be as granular as necessary, e.g. "CustomerAddressChanged" or simply "CustomerChanged", and it is up to the consumer to query for the information it needs based on the event type.
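A sketch of a consumer handling such signal events, assuming the emitting service exposes some query API. `InMemoryCustomerService`, `get_customer` and `get_address` are hypothetical stand-ins for that API (in practice probably an HTTP call), and `CustomerAddressChanged` is used as an example of a more granular event type.

```python
import xml.etree.ElementTree as ET


class InMemoryCustomerService:
    """Stand-in for the emitting service's query API (hypothetical)."""

    def __init__(self, customers: dict[str, dict[str, str]]) -> None:
        self._customers = customers

    def get_customer(self, customer_id: str) -> dict[str, str]:
        return dict(self._customers[customer_id])

    def get_address(self, customer_id: str) -> str:
        return self._customers[customer_id]["Address"]


def handle_signal(xml_text: str, source: InMemoryCustomerService, local: dict) -> None:
    """Handle a thin event that carries only a CustomerId."""
    root = ET.fromstring(xml_text)
    customer_id = root.findtext("CustomerId")
    # The event type (the root tag) decides what to query for.
    if root.tag == "CustomerAddressChanged":
        local.setdefault(customer_id, {})["Address"] = source.get_address(customer_id)
    else:
        # Generic signal: refresh the whole record from the source of truth.
        local[customer_id] = source.get_customer(customer_id)
```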
There is not a "one-size-fits-all" solution; sometimes it does make more sense to pass the relevant data with the event. Again, I agree with @Matt's answer if this is the direction you need to move in.
Edit Based on Comment
I would agree that using an ESB to query is generally not a good idea. Some people use an ESB this way, but IMHO it's a bad practice.
Your original question, and your comments on this answer and on Matt's, talk about only including fields that have changed. This would definitely be problematic in many languages, where you would have to somehow distinguish between a property being empty/null and a property not being included in the event. If the event is getting serialized/de-serialized from/to a static type, it will be painful (if not impossible) to know the difference between "First Name is being set to NULL" and "First Name is missing because it didn't change".
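In a dynamic language, one way to keep the two cases apart is a sentinel value for "absent", combined with an explicit null marker in the XML. This sketch assumes the publisher marks explicit nulls with the XML Schema `xsi:nil="true"` attribute; that convention, the `MISSING` sentinel and `read_field` are assumptions for illustration, not something the events above already use.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the xsi prefix to its namespace URI.
XSI_NIL = "{http://www.w3.org/2001/XMLSchema-instance}nil"


class _Missing:
    """Sentinel type meaning "element absent, so the field did not change"."""

    def __repr__(self) -> str:
        return "MISSING"


MISSING = _Missing()


def read_field(root: ET.Element, name: str):
    """Return MISSING, None (explicit null), or the element's text."""
    element = root.find(name)
    if element is None:
        return MISSING  # not in the event: unchanged
    if element.get(XSI_NIL) == "true":
        return None  # explicitly set to null
    return element.text or ""
```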
Based on your comment that this is about synchronization of systems, my recommendation would be to send the full set of data on each change (assuming signal+query is not an option). That leaves the interpretation of the data up to each consuming system, and limits the responsibility of the publisher to emitting a more generic event, i.e. "Customer 1234 has been modified to X state". This event seems more broadly useful than the other options, and if other systems receive this event, they can interpret it as they see fit. They can dump/rewrite their own data for Customer 1234, or they can compare it to what they have and update only what changed. Sending only what changed seems more specific to a single consumer or a specific type of consumer.
All that said, I don't think any of your proposed solutions are "right" or "wrong". You know best what will work for your unique situation.
Events should be used to describe intent as well as details. For example, you could have a CustomerRegistered event with all the details of the customer that was registered. Then, later in the stream, a CustomerMadeGoldAccount event only really needs to capture the ID of the customer whose account was changed to gold.
It's up to the consumers of the events to build up the current state of the system that they are interested in.
This allows only the most pertinent information to be stored in each event. Imagine having hundreds of properties for a customer: if every command that changed a single property had to raise an event with all the properties before and after, this would get unwieldy pretty quickly. It's also difficult to determine why a change occurred if you just publish a generic CustomerModified event, and "why did this change?" is often asked about the current state of an entity.
Capturing only the data relevant to the event means that the command that issues the event only needs enough data about the entity to validate that the command can be executed; it doesn't even need to read the whole customer entity.
Subscribers to the events also only need to build up state for the things they are interested in. For example, perhaps an 'account level' widget is listening to these events; all it needs to keep around is the customer IDs and account levels, so that it can display which account level each customer is at.
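A minimal sketch of such granular events and the 'account level' widget as a subscriber, assuming events are delivered as Python objects (the class names mirror the event names above; everything else is hypothetical). The widget keeps only customer IDs and levels and ignores every other event type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerRegistered:
    customer_id: str
    full_name: str
    account_level: str


@dataclass(frozen=True)
class CustomerMadeGoldAccount:
    customer_id: str  # the event name carries the intent; the ID is enough


class AccountLevelWidget:
    """Subscriber that keeps only customer IDs and account levels."""

    def __init__(self) -> None:
        self.levels: dict[str, str] = {}

    def apply(self, event) -> None:
        if isinstance(event, CustomerRegistered):
            self.levels[event.customer_id] = event.account_level
        elif isinstance(event, CustomerMadeGoldAccount):
            self.levels[event.customer_id] = "Gold"
        # Any other event type is irrelevant to this widget and is ignored.
```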
Instead of trying to convey everything through the fields of the XML payload, you can distinguish between different operations based on either:
1. Different endpoint URLs depending on the operation (this is preferred).
2. An opcode (operation code) element in the XML that tells the receiver which operation should be used to handle the incoming request (closer to your examples).
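A sketch of option 2, assuming a hypothetical `<OpCode>` element with values `Create` and `Update` inside a `<Customer>` payload; the receiver looks up a handler by opcode instead of inferring the operation from which fields are present.

```python
import xml.etree.ElementTree as ET

# Hypothetical local copy of customers, keyed by CustomerId.
customers: dict[str, dict[str, str]] = {}


def _fields(root: ET.Element) -> dict[str, str]:
    # Every child except the opcode itself is treated as customer data.
    return {c.tag: c.text or "" for c in root if c.tag != "OpCode"}


def on_create(root: ET.Element) -> None:
    fields = _fields(root)
    customers[fields["CustomerId"]] = fields


def on_update(root: ET.Element) -> None:
    fields = _fields(root)
    customers[fields["CustomerId"]].update(fields)


HANDLERS = {"Create": on_create, "Update": on_update}


def dispatch(xml_text: str) -> None:
    root = ET.fromstring(xml_text)
    opcode = root.findtext("OpCode")
    handler = HANDLERS.get(opcode)
    if handler is None:
        raise ValueError(f"unknown opcode: {opcode!r}")
    handler(root)
```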
There are a few enterprise patterns applicable to your business case: messaging and its variants, and, if your system is extensible, an Enterprise Service Bus should be used. An ESB allows reliable handling and processing of events.