Kafka Streams' RocksDB state store is fault tolerant. How can the data of a store that is no longer functioning be restored from the changelog?
The restoration of all built-in storage engines in the Kafka Streams API is fully automated.
Further details are described at http://docs.confluent.io/current/streams/developer-guide.html#fault-tolerant-state-stores, some of which I quote here:
In order to make state stores fault-tolerant (e.g., to recover from machine crashes) as well as to allow for state store migration without data loss (e.g., to migrate a stateful stream task from one machine to another when elastically adding or removing capacity from your application), a state store can be continuously backed up to a Kafka topic behind the scenes. We sometimes refer to this topic as the state store’s associated changelog topic or simply its changelog. In the case of a machine failure, for example, the state store and thus the application’s state can be fully restored from its changelog. You can enable or disable this backup feature for a state store, and thus its fault tolerance.
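For illustration, here is a minimal Kafka Streams sketch (the topic and store names are made up) showing that changelog-based backup is on by default for the built-in stores and can be toggled per store via Materialized:

```java
import java.util.Map;

import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

public class ChangelogExample {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> input = builder.stream("events"); // topic name is illustrative

        input.groupByKey()
             .count(Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("counts-store")
                     // Changelog backup is enabled by default; calling this makes it explicit
                     // and lets you pass extra topic configs for the changelog topic.
                     .withLoggingEnabled(Map.of("min.insync.replicas", "2")));

        // To opt out of fault tolerance for a store, use .withLoggingDisabled() instead,
        // in which case the store cannot be restored after a failure.
        builder.build();
    }
}
```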
Related
Is it okay to hold large state in RocksDB when using Kafka Streams? We are planning to use RocksDB as an event store to hold billions of events for an infinite amount of time.
Yes, you can store a lot of state there but there are some considerations:
The entire state will also be replicated to the changelog topics, which means your brokers will need enough disk space for it. Note that this will NOT be mitigated by KIP-405 (Tiered Storage), as tiered storage does not apply to compacted topics.
As #OneCricketeer mentioned, rebuilding the state can take a long time if there's a crash. However, you can mitigate it in several ways:
Use a persistent store and re-start the application on a node with access to the same disk (StatefulSet + PersistentVolume in K8s works).
Under exactly-once semantics, the state will still be rebuilt from scratch after an unclean shutdown until KIP-844 is implemented. Once that change is merged, only a small amount of data will have to be replayed.
Have standby replicas. They will enable failover as soon as the consumer session timeout expires once the Kafka Streams instance crashes (a configuration sketch follows this list).
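A minimal configuration sketch for these mitigations (the application id, bootstrap servers, and state directory are placeholders):

```java
import java.util.Properties;

import org.apache.kafka.streams.StreamsConfig;

public class RecoveryFriendlyConfig {
    public static Properties build() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");          // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");  // placeholder

        // Keep RocksDB state on a persistent volume so a restarted instance can reuse it
        props.put(StreamsConfig.STATE_DIR_CONFIG, "/var/lib/kafka-streams");

        // Hot standby copies of the state on other instances for faster failover
        props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);

        // Exactly-once processing (requires a recent Kafka Streams version;
        // see the KIP-844 caveat above regarding unclean shutdowns)
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);
        return props;
    }
}
```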
The main limitation would be disk space, so sure, it can be done, but if the app crashes for any reason, you might be waiting for a while for the app to rebuild its state.
We are migrating from a legacy monolith application to a microservice architecture. We use the CQRS and event sourcing patterns and a message broker (RabbitMQ) as the communication mechanism. We are now facing a challenge: how can we convert the old database to the new architecture, and how can we use event sourcing for it? Assuming the old database did not have events, can we do the data conversion without creating events? What is the starting point of our old database data in the event sourcing pattern?
One important thing to remember is that many databases internally event source: every write goes to a log and that log is used to update tables, replicate etc., after which the log is truncated. It's equivalent to event sourcing with a lot of snapshots and very little retention of events and old snapshots.
In these databases (which include the likes of Postgres, MySQL, Oracle, SQL Server, Cassandra, CosmosDB, to name ones I know from experience do this), there's a technique called Change Data Capture which essentially taps into the log and exposes a stream of changes to the database which can be treated as events from the database (or by extension as commands: "one service's events are another service's commands"). Debezium can be used to write CDC records to Kafka; for RabbitMQ you may need to roll something yourself, in which case you'll want to get acquainted with how CDC is exposed in your database.
Even if the database doesn't support CDC, if the data isn't that large, you can often turn it into an ersatz event stream by periodically dumping its data (if the records are timestamped, this can even work if the data is particularly slow moving) and implementing a service to track what changed: this won't tell you about changes that netted out, but it's often better than nothing. This sort of dump is also likely to be required if you need a "genesis" event to ensure that your initial state is current as of when you moved to event sourcing or CDC.
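As a sketch of that periodic-dump approach (the record representation and event names are made up), you can diff two successive dumps keyed by primary key and emit change events; as noted, changes that netted out between dumps stay invisible:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class DumpDiffer {

    // Hypothetical change event derived from comparing two dumps
    public record ChangeEvent(String key, String kind, String before, String after) {}

    /** Compares two dumps keyed by primary key and emits created/updated/deleted events. */
    public static List<ChangeEvent> diff(Map<String, String> previousDump,
                                         Map<String, String> currentDump) {
        List<ChangeEvent> events = new ArrayList<>();
        for (Map.Entry<String, String> entry : currentDump.entrySet()) {
            String before = previousDump.get(entry.getKey());
            if (before == null) {
                events.add(new ChangeEvent(entry.getKey(), "created", null, entry.getValue()));
            } else if (!Objects.equals(before, entry.getValue())) {
                events.add(new ChangeEvent(entry.getKey(), "updated", before, entry.getValue()));
            }
        }
        for (String key : previousDump.keySet()) {
            if (!currentDump.containsKey(key)) {
                events.add(new ChangeEvent(key, "deleted", previousDump.get(key), null));
            }
        }
        return events;
    }
}
```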
This whole broad family of techniques has limitations compared to full event sourcing: reifying what changed is not as valuable as reifying what changed and why it changed. But it can be a useful middle ground in migrating to event-sourcing.
Referring to #alexey-zimarev's answer in this post: it's essential to have a starting event in your event-sourced database. You cannot rehydrate an event-sourced aggregate without replaying its events. Therefore, you need to map the legacy snapshot to an initial domain event of the relevant aggregate.
Either way, considering the event sourcing definition by Martin Fowler:
The fundamental idea of Event Sourcing is that of ensuring every change to the state of an application is captured in an event object, and that these event objects are themselves stored in the sequence they were applied for the same lifetime as the application state itself.
Therefore, it's not an appropriate solution to migrate legacy snapshots into the new system without extracting and storing domain events. It would turn your event-sourced project into a semi-event-sourced project, which is not a recognized design paradigm.
You have an event store, which is a database for events. You can create the event data you need from the old database and insert it into the event store. After that, replay the events to create the read models.
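A minimal sketch of that migration step, assuming a hypothetical EventStore interface, a hypothetical legacy DAO, and a made-up CustomerMigrated genesis event; each legacy row becomes the starting event of its aggregate's stream, and read models are then built by replaying:

```java
import java.time.Instant;
import java.util.List;

public class LegacyMigration {

    // Hypothetical genesis event capturing the legacy snapshot of one aggregate
    public record CustomerMigrated(String customerId, String name, String email, Instant migratedAt) {}

    // Hypothetical event store abstraction
    public interface EventStore {
        void append(String streamId, Object event);
    }

    // Hypothetical read side of the legacy database; each row: [id, name, email]
    public interface LegacyCustomerDao {
        List<String[]> loadAll();
    }

    public static void migrate(LegacyCustomerDao legacyDb, EventStore eventStore) {
        for (String[] row : legacyDb.loadAll()) {
            CustomerMigrated genesis = new CustomerMigrated(row[0], row[1], row[2], Instant.now());
            // One starting event per aggregate; later domain events are appended after it
            eventStore.append("customer-" + row[0], genesis);
        }
        // Afterwards, replay the event streams through your projections to build the read models.
    }
}
```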
We have several services that publish and subscribe to Domain Events. What we usually do is log events whenever we publish and log events whenever we process events. We basically use this to apply the choreography pattern.
We are not doing Event Sourcing in these systems, and there's no programmatic use for them after publishing/processing. That's the main driver we opted not to store these in a durable container, like a database or event store.
Question is, are we missing some fundamental thing by doing this?
Is storing Events a must?
I consider queued messages as system messages, even if they represent some domain event in an event-driven architecture (pub/sub messaging).
There is absolutely no hard-and-fast rule about their storage. If you would like to keep them around you could have your messaging mechanism forward them to some auditing endpoint for storage and then remove them after some time (if necessary).
You are not missing anything fundamental by not storing them.
You're definitely not missing out on anything (but there is a catch), especially if the business doesn't need it. An event-sourced system would definitely store all the events generated by the system in a database (or any other event store).
The main use of an event store is to be able to restore the state of the system to its current state in case of a failure by replaying messages. To make this recovery process faster, we have snapshots.
In your case, since these events are only relevant until the process is completed, it would not make sense to store them until you have a failure (this is the catch), especially in a distributed-transaction scenario.
What I would suggest:
Don't store the events themselves, but log the relevant details about them, and maybe use an ELK stack or Grafana to store these logs.
Use either the Saga Pattern or the Routing Slip pattern in case of a Distributed Transaction and log them as well.
If a failure occurs while processing an event, put that event into an exception queue and handle it there. If it's part of a distributed transaction, make sure the events either all share the same TransactionId or carry a CorrelationId, so you can look up the logs and recover your system (a small sketch follows below).
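A small sketch of that idea, using a hypothetical event envelope and a hypothetical exception-queue abstraction; the point is simply that every event carries a correlation id that ends up in the logs:

```java
import java.util.UUID;
import java.util.function.Consumer;

public class EventDispatch {

    // Hypothetical envelope: the correlation id ties all steps of one distributed transaction together
    public record Envelope(String correlationId, String type, String payload) {}

    // Hypothetical sink for events that could not be processed
    public interface ExceptionQueue {
        void publish(Envelope failed, Exception cause);
    }

    public static Envelope newEnvelope(String type, String payload) {
        return new Envelope(UUID.randomUUID().toString(), type, payload);
    }

    public static void handle(Envelope envelope, Consumer<Envelope> handler, ExceptionQueue exceptionQueue) {
        try {
            handler.accept(envelope);
        } catch (Exception e) {
            // Keep the correlation id in the log line so the failure can be traced across services
            System.err.printf("event %s (correlationId=%s) failed: %s%n",
                    envelope.type(), envelope.correlationId(), e.getMessage());
            exceptionQueue.publish(envelope, e);
        }
    }
}
```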
For reliably performing your business transactions in a distributed architecture, you somehow need to make sure that your events are published at least once.
So a service that publishes events needs to persist such an event within the same transaction that causes it to get created.
Considering you are publishing events via infrastructure services (e.g., a messaging service), you cannot rely on them being available all the time.
Also, your own service instance could go down after persisting your newly created or changed aggregate but before it had the chance to publish the event via, for instance, a messaging service.
Question is, are we missing some fundamental thing by doing this? Is storing Events a must?
It doesn't matter that you are not doing event sourcing. Unless it is okay from the business perspective to sometimes lose an event forever, you need to temporarily persist your event within your local transaction until it has been published.
You can look into the Transactional Outbox Pattern to achieve reliable event publishing.
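A minimal JDBC sketch of the outbox idea (the table and column names are made up): the aggregate change and the event row are written in the same local transaction, and a separate relay later reads the outbox and publishes to the broker:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.time.Instant;
import java.util.UUID;

public class OutboxWriter {

    /** Persists the aggregate change and the event atomically in one local transaction. */
    public static void saveOrderAndEvent(Connection conn, String orderId, String eventJson)
            throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement order = conn.prepareStatement(
                     "INSERT INTO orders (id, status) VALUES (?, ?)");
             PreparedStatement outbox = conn.prepareStatement(
                     "INSERT INTO outbox (id, type, payload, created_at) VALUES (?, ?, ?, ?)")) {

            order.setString(1, orderId);
            order.setString(2, "CREATED");
            order.executeUpdate();

            outbox.setString(1, UUID.randomUUID().toString());
            outbox.setString(2, "OrderCreated");
            outbox.setString(3, eventJson);
            outbox.setTimestamp(4, Timestamp.from(Instant.now()));
            outbox.executeUpdate();

            conn.commit(); // either both rows are stored or neither is
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
        // A relay (polling or CDC-based) reads the outbox table and publishes the
        // events to the message broker, deleting or marking rows once acknowledged.
    }
}
```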
Note: Logging/tracking your events somehow for monitoring or later analyzing/reporting purpose is a different thing and has another motivation.
I am working on a microservice architecture. One of my services is exposed to a source system, which is used to post the data. This microservice publishes the data to Redis. I am using Redis pub/sub, which is further consumed by a couple of microservices.
Now, if another microservice is down and not able to process the data from Redis pub/sub, then I have to retry with the published data when that microservice comes up. The source cannot push the data again, and manual intervention is not possible, so I thought of three approaches:
1. Additionally using Redis itself for storing and retrieving the data.
2. Using a database for storing the data before publishing. I have many source and target microservices which use Redis pub/sub. If I use this approach, every time I have to insert the request into the DB first and then its response status. I would also have to use a shared database; this approach adds a couple more exception-handling cases and doesn't look very efficient to me.
3. Using Kafka in place of Redis pub/sub. As traffic is low, I went with Redis pub/sub, and it is not feasible to change.
In both of the above cases, I have to use a scheduler, and there is a duration before which I have to retry, or else subsequent requests will fail.
Is there any other way to handle the above cases?
For point 2:
- Store the data in the DB.
- Create a daemon process which will process the data from the table.
- This daemon process can be configured as per your needs.
- The daemon process will poll the DB and publish any pending data, and delete the data once published (see the sketch below).
Not in a microservice architecture, but I have seen this approach work efficiently when communicating with 3rd-party services.
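A rough sketch of such a daemon (the table name and the publisher interface are hypothetical): poll the table on a schedule, publish each pending row, and delete it only after a successful publish:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import javax.sql.DataSource;

public class OutboundDaemon {

    // Hypothetical publisher over Redis pub/sub, RabbitMQ, etc.
    public interface Publisher {
        void publish(String channel, String payload) throws Exception;
    }

    public static void start(DataSource dataSource, Publisher publisher) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(() -> drain(dataSource, publisher), 0, 10, TimeUnit.SECONDS);
    }

    private static void drain(DataSource dataSource, Publisher publisher) {
        try (Connection conn = dataSource.getConnection();
             PreparedStatement select = conn.prepareStatement(
                     "SELECT id, channel, payload FROM pending_messages ORDER BY id");
             PreparedStatement delete = conn.prepareStatement(
                     "DELETE FROM pending_messages WHERE id = ?")) {
            try (ResultSet rs = select.executeQuery()) {
                while (rs.next()) {
                    publisher.publish(rs.getString("channel"), rs.getString("payload"));
                    // Delete only after the publish succeeded; otherwise the row is retried next run
                    delete.setLong(1, rs.getLong("id"));
                    delete.executeUpdate();
                }
            }
        } catch (Exception e) {
            // Leave remaining rows in place; the next scheduled run retries them
            System.err.println("drain failed, will retry: " + e.getMessage());
        }
    }
}
```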
At the very outset, as you mentioned, we do indeed seem to have only three possibilities.
This is one of those situations where you want to get a handshake from the service after pushing and after processing. To accomplish this, using a middleware queuing system would be the right move.
Although a bit more complex to accomplish, what you can do is use Kafka for streaming this. Configuring producer and consumer groups properly can help you do the job smoothly.
Using a DB for storage would be overkill in a situation where the data merely needs to be processed and temporarily persisted.
BUT, alternatively, storing the data in Redis and reading it in a cron job / scheduled job would make your job much simpler. Once the job has run successfully, you may remove the data from the cache and thus save Redis memory.
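A sketch of that alternative using the Jedis client (the key name and channel handling are made up; in production you would use a pooled connection per thread): undelivered payloads are parked in a Redis list, a scheduled job re-publishes them, and consumers remove entries once processed:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import redis.clients.jedis.Jedis;

public class RedisRetryJob {

    private static final String PENDING_KEY = "pending:events"; // illustrative key name

    /** Called by the producing service: publish and also park the payload for retry. */
    public static void publishWithBuffer(Jedis jedis, String channel, String payload) {
        jedis.rpush(PENDING_KEY, payload); // keep a copy until a consumer confirms processing
        jedis.publish(channel, payload);
    }

    /** Scheduled job: re-publish anything still pending; processed entries are removed by consumers. */
    public static void startRetryJob(Jedis jedis, String channel) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(() -> {
            for (String payload : jedis.lrange(PENDING_KEY, 0, -1)) {
                jedis.publish(channel, payload);
            }
        }, 1, 5, TimeUnit.MINUTES);
    }

    /** Called by the consuming service after it has processed a payload successfully. */
    public static void acknowledge(Jedis jedis, String payload) {
        jedis.lrem(PENDING_KEY, 1, payload); // drop one matching copy from the pending list
    }
}
```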
If you can comment further on the architecture and the implementation, I can update my answer accordingly. :)
I am currently working with a legacy system that consists of several services which (among others) communicate through some kind of Enterprise Service Bus (ESB) to synchronize data.
I would like to gradually move this system towards a microservices architecture. I am planning to reduce the dependency on the ESB and use more of a message broker like RabbitMQ or Kafka. Due to some resource and existing-technology limitations, I don't think I will be able to completely avoid data replication between services, even though I should be able to clearly define a single service as the data owner.
What I am wondering now is: how can I safely restore a database backup for a single service when necessary? Doing so will cause the service to be out of sync with the other services that hold the replicated data. Any experience or suggestions regarding this?
Have your primary database publish events every time a database mutation occurs, and let the replicating services subscribe to these events and apply the same mutations to their replicated data.
You already use a message broker, so you can leverage your existing stack for broadcasting the events. By having replication done through events, a restore being applied to the primary database will be propagated to all other services.
Depending on the scale of the backup, there will be a short period where the data on the other services will be stale. This might or might not be acceptable for your use case. Think of the staleness as some sort of eventual consistency model.
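As a rough sketch of that setup (the event shape, topic name, and the broker and store abstractions are all illustrative), the owning service publishes a mutation event through the broker you already have, and each service holding a replica applies it locally; a restore on the primary then simply produces a fresh stream of such events:

```java
import java.util.function.Consumer;

public class DataReplication {

    // Hypothetical mutation event broadcast by the data owner
    public record MutationEvent(String table, String rowId, String operation, String payloadJson) {}

    // Thin abstraction over whatever broker you already run (RabbitMQ, Kafka, ...)
    public interface Broker {
        void publish(String topic, MutationEvent event);
        void subscribe(String topic, Consumer<MutationEvent> handler);
    }

    // Hypothetical local storage of the replicated copy
    public interface ReplicaStore {
        void upsert(String table, String rowId, String payloadJson);
        void delete(String table, String rowId);
    }

    /** Owning service: emit an event for every mutation, including those caused by a restore. */
    public static void onPrimaryMutation(Broker broker, MutationEvent event) {
        broker.publish("data-mutations", event);
    }

    /** Replicating service: apply each mutation to its local copy, converging eventually. */
    public static void startReplication(Broker broker, ReplicaStore store) {
        broker.subscribe("data-mutations", event -> {
            if ("delete".equals(event.operation())) {
                store.delete(event.table(), event.rowId());
            } else {
                store.upsert(event.table(), event.rowId(), event.payloadJson());
            }
        });
    }
}
```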