Does Kafka Streams clean up its state stores on rebalance, or simply accumulate them? - apache-kafka-streams

I wonder what happens to state stores when there is a rebalance in a Kafka Streams application. Let's say an instance goes offline for a while and then comes back, either outside the static membership time window, or with no static membership at all. Do the old state stores corresponding to the old task assignment get deleted, or do they live on alongside the new state stores corresponding to the new task assignment?

Related

Sharing local store between different KafkaStreams

Given: two KafkaStreams instances, each with its own DSL topology. A local state store is added to one of the topologies. What is the optimal way for the second KafkaStreams instance to update the local store in the first one?
I could think about adding a processor to the KafkaStreams instance that owns the local store. This processor would have (1) some static task list populated by the second KafkaStreams instance, and (2) a Punctuator which processes tasks from that list.
Unfortunately, this design doesn't provide any guarantee of fault tolerance.
Any better approach?
Local state of an application should only be updated by the application itself.
Not sure what exactly you want to achieve. One way to "update" state from another Kafka Streams instance might be via a topic: instance A creates a table from a topic, and instance B writes into this topic when it wants to update A's table state.
Hope this helps. If not, maybe update your question to give more details about what you want to achieve.
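A minimal sketch of that topic-based pattern, purely as an illustration (the topic name "table-updates", the store name "shared-store", and the string serdes are assumptions, not taken from the question):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.common.serialization.StringSerializer;
    import org.apache.kafka.common.utils.Bytes;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.KTable;
    import org.apache.kafka.streams.kstream.Materialized;
    import org.apache.kafka.streams.state.KeyValueStore;

    public class SharedTableSketch {

        // Instance A: materialize the shared topic as its local table / state store.
        static KTable<String, String> buildTableOnInstanceA(StreamsBuilder builder) {
            return builder.table(
                "table-updates",                                            // hypothetical topic name
                Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("shared-store")
                    .withKeySerde(Serdes.String())
                    .withValueSerde(Serdes.String()));
        }

        // Instance B: "update" A's table state by producing to the same topic.
        static void updateFromInstanceB(String key, String newValue) {
            Properties config = new Properties();
            config.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
            config.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
            config.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(config)) {
                producer.send(new ProducerRecord<>("table-updates", key, newValue));
            }
        }
    }

Because the table is backed by the topic, instance A's local store is also rebuilt from that topic after a crash or rebalance, which addresses the fault-tolerance concern raised in the question.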

Regarding fault-tolerance for suppress operator in KTable

We are planning to use the suppress operator over a session-windowed KTable.
We are wondering about fault tolerance when using the suppress operator.
We understand that a buffer is used to store events/aggregations until the window closes.
Now let us say a rebalance has happened, and the active task is moved to a different machine. We are wondering what happens to this (in-memory?) buffer.
Let us say we are tracking click count by user, and we configured the session window's inactivity period to be 3 minutes. A session window has started for the key alice, and aggregations have happened for that key for 2 minutes. For example, the buffer holds an (alice -> 5) entry representing that alice has made 5 clicks in this session so far.
And say there is no activity after that from alice.
If things are working fine, then once the session is over, the downstream processor will get the event alice -> 5.
But what if there is a rebalance now, and the active task that is maintaining the session window for alice is moved to a new machine?
Since there is no further activity from alice, will the downstream processor, now running on the new machine, miss the event alice -> 5?
The suppress operator provides fault tolerance similarly to any other state store in Streams. Although the active data structure is in memory, the suppression buffer maintains a changelog (an internal Kafka topic).
So, when you have that rebalance, the previous active task flushes its state to the changelog and discards the in-memory buffer. The new active task re-creates the state by replaying the changelog topic, resulting in the exact same buffered contents as if there had been no rebalance.
In other words, just like in-memory state stores, the suppression buffer is made durable (in a Kafka topic) even though it is not persistent (on the local disk).
Does that make sense?
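For reference, a rough sketch of the kind of topology described in the question (the "clicks" topic, string serdes, and unbounded buffer are assumptions; the 3-minute inactivity gap comes from the question). The buffer created by suppress() is backed by its own changelog topic, which is what gets replayed on the new machine after the rebalance:

    import java.time.Duration;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.Topology;
    import org.apache.kafka.streams.kstream.Consumed;
    import org.apache.kafka.streams.kstream.Grouped;
    import org.apache.kafka.streams.kstream.SessionWindows;
    import org.apache.kafka.streams.kstream.Suppressed;

    public class SuppressedSessionCounts {

        static Topology build() {
            StreamsBuilder builder = new StreamsBuilder();
            builder.stream("clicks", Consumed.with(Serdes.String(), Serdes.String()))
                .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
                // Close a session after 3 minutes of inactivity per key.
                .windowedBy(SessionWindows.with(Duration.ofMinutes(3)))
                // Count clicks per user per session, e.g. (alice -> 5) while the session is open.
                .count()
                // Hold results in the suppression buffer until the session closes; the buffer
                // is changelog-backed, so it is rebuilt on the new task after a rebalance.
                .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
                .toStream()
                .foreach((windowedUser, clicks) ->
                    System.out.println(windowedUser.key() + " -> " + clicks));
            return builder.build();
        }
    }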

Which guarantees does Kafka Streams provide when using a RocksDB state store with a changelog?

I'm building a Kafka Streams application that generates change events by comparing every new calculated object with the last known object.
So for every message on the input topic, I update an object in a state store and every once in a while (using punctuate), I apply a calculation on this object and compare the result with the previous calculation result (coming from another state store).
To make sure this operation is consistent, I do the following after the punctuate triggers:
write a tuple to the state store
compare the two values, create change events and context.forward them. So the events go to the results topic.
replace the tuple with the new_value and write it to the state store
I use this tuple for scenarios where the application crashes or rebalances, so I can always send out the correct set of events before continuing.
Now, I noticed the resulting events are not always consistent, especially if the application frequently rebalances. It looks like in rare cases the Kafka Streams application emits events to the results topic, but the changelog topic is not up to date yet. In other words, I produced something to the results topic, but my changelog topic is not at the same state yet.
So, when I do a stateStore.put() and the method call returns successfully, are there any guarantees when it will be on the changelog topic?
Can I enforce a changelog flush? When I do context.commit(), when will that flush+commit happen?
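To make the sequence concrete, here is one possible reading of those three steps as code, purely as a sketch (the Tuple holder, the long value types, and the per-key handling are hypothetical, not taken from the actual application):

    import org.apache.kafka.streams.processor.ProcessorContext;
    import org.apache.kafka.streams.state.KeyValueStore;

    public class PunctuateSketch {

        // Hypothetical holder for <previously emitted result, newly calculated result>.
        public static class Tuple {
            public final long previous;
            public final long current;
            public Tuple(long previous, long current) {
                this.previous = previous;
                this.current = current;
            }
        }

        // One possible reading of the three steps, applied to a single key from inside a punctuator.
        public static void onPunctuate(String key,
                                       long newValue,
                                       KeyValueStore<String, Tuple> store,
                                       ProcessorContext context) {
            Tuple stored = store.get(key);
            long previousValue = stored == null ? 0L : stored.current;

            // 1) write the tuple <previous, new> to the state store
            store.put(key, new Tuple(previousValue, newValue));

            // 2) compare the two values and forward change events to the results topic
            if (newValue != previousValue) {
                context.forward(key, newValue);
            }

            // 3) replace the tuple with the new value and write it back
            store.put(key, new Tuple(newValue, newValue));
        }
    }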
To get complete consistency, you will need to enable processing.guarantee="exactly_once" -- otherwise, with a potential error, you might get inconsistent results.
If you want to stay with "at_least_once", you might want to use a single store and update the store after processing is done (i.e., after calling forward()). This minimizes the time window in which inconsistencies can occur.
And yes, if you call context.commit(), all stores will be flushed to disk and all pending producer writes will be flushed before the input topic offsets are committed.
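A hedged sketch of that configuration (the application id and bootstrap servers are placeholders):

    import java.util.Properties;
    import org.apache.kafka.streams.StreamsConfig;

    public class ExactlyOnceConfig {
        static Properties streamsConfig() {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "change-detector");    // placeholder
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // placeholder
            // With exactly-once, forwarded results, changelog writes, and input-topic offsets
            // are committed atomically, so the results topic cannot run ahead of the changelog.
            props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);
            return props;
        }
    }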

How to restart a KafkaStreams consumer group in a way that avoids recreating the state store from its changelog topic

In a deployment with multiple nodes hosting KafkaStreams (0.10.2.1) instances with persistent state stores, what is the recommended way to restart all nodes while avoiding replaying the entire state store changelog topic? This has to be done without changing the application.id, as I don't want to lose the data I already have in the state store.
I increased session.timeout.ms so that all nodes will be up by the time the broker starts to reassign partitions, and avoided calling KafkaStreams.close() to prevent an unneeded partition reassignment as I'm restarting all nodes during deployment.
When the broker starts to reassign partitions (after all nodes are up), it seems that the KafkaStreams instances are replaying the entire state store changelog topic, instead of picking up from the offset to which they arrived just before the restart.
I guess that in order to pick up from the latest offset, these conditions have to be met:
1) Partitions will be assigned to instances containing their matching persistent store.
2) KafkaStreams will pick up from the latest offset in the changelog topic, instead of replaying the entire changelog.
Is there a way to achieve this?
Kafka Streams writes local state and local checkpoint files that are used to track a state store's health. If a checkpoint file is missing, it indicates a corrupted state store, and thus Kafka Streams wipes out the store and recreates it from scratch by replaying the state store's changelog topic.
In 0.10.2.1, those local checkpoint files are written on a clean shutdown only. Thus, as you don't call KafkaStreams#close(), you don't get a clean shutdown (that might also corrupt your state, as some writes might not have been flushed to disk).
In Kafka 0.11.0.x, local checkpoint files are written on every commit allowing more aggressive reuse of local state stores.
I would highly recommend upgrading to 0.11.0.1 or 1.0.0 (which will be released shortly) -- it contains many improvements with regard to state store handling and rebalancing. Note, you don't need to upgrade your brokers for this, as Kafka Streams is compatible with older brokers, too (cf. https://docs.confluent.io/current/streams/upgrade-guide.html#compatibility)
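A minimal sketch of the clean-shutdown pattern the answer alludes to, using the 1.0+ API it recommends upgrading to (the application id, bootstrap servers, and placeholder topology are assumptions); closing cleanly is what lets the checkpoint files be written so local state can be reused on restart:

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.Topology;
    import org.apache.kafka.streams.kstream.Consumed;
    import org.apache.kafka.streams.kstream.Produced;

    public class CleanShutdownExample {

        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");            // keep the existing application.id
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder

            KafkaStreams streams = new KafkaStreams(buildTopology(), props);
            // A clean close() writes the local checkpoint files, so after the restart the
            // instance can reuse its local RocksDB state instead of replaying the changelog.
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
            streams.start();
        }

        static Topology buildTopology() {
            StreamsBuilder builder = new StreamsBuilder();
            // Placeholder topology; the real application's (stateful) topology goes here.
            builder.stream("input", Consumed.with(Serdes.String(), Serdes.String()))
                   .to("output", Produced.with(Serdes.String(), Serdes.String()));
            return builder.build();
        }
    }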

Is it possible to combine Spring Statemachine with the event sourcing pattern?

My idea is to keep track of the states of a domain object with Spring Statemachine, i.e., the state machine defines how to transition between states of the domain object. When events are persisted to / restored from the event store, the state of the domain object can be (re)generated by sending the events to the state machine.
However, it seems that creating a state machine object is relatively expensive, so it's not that performant to create one whenever a state transition happens on a domain object. If I only maintain a single state machine object, I would worry about concurrency problems. One approach is to have a 'statemachine pool', but it gets messy if I have to create state machines for multiple different domain objects.
So is it a good idea to apply Spring Statemachine with the event sourcing pattern?
Provided that all the transitions are based on events I would say that it is a pretty good idea, yes.
The fundamental idea of Event Sourcing is that of ensuring every change to the state of an application is captured in an event object, and that these event objects are themselves stored in the sequence they were applied for the same lifetime as the application state itself.
The main point about event sourcing is that you store the events leading to a particular state - instead of just storing the current state - so that you can replay them up to a given point of time.
Thus, using event sourcing has no impact on how you create your state machines.
However, it seems that creating a state machine object is relatively expensive, so it's not that performant to create one whenever a state transition happens on a domain object.
Creating a state machine every time there is a state transition is not related to event sourcing. Would you do it differently if you were only storing the current state? You'd still need to either create the state machine from the last stored state - or look it up in a cache or a pool - before you could apply the transition.
The only performance hit derived from using event sourcing would be that of replaying the transitions from the beginning in order to reach the current state. Now, if this is costly you can use snapshots to minimize the amount of transitions that must be replayed.
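To make the replay-plus-snapshot point concrete, here is a schematic sketch in plain Java (deliberately not using the Spring Statemachine API; the state and event types, the transition function, and the order lifecycle are hypothetical):

    import java.util.Arrays;
    import java.util.List;
    import java.util.function.BiFunction;

    public class EventSourcedStateMachine<S, E> {

        private final BiFunction<S, E, S> transition;   // how a single event moves the state

        public EventSourcedStateMachine(BiFunction<S, E, S> transition) {
            this.transition = transition;
        }

        // Rebuild the current state by replaying the stored events on top of the latest
        // snapshot (or on top of the initial state if no snapshot exists).
        public S rebuild(S snapshotOrInitialState, List<E> eventsSinceSnapshot) {
            S state = snapshotOrInitialState;
            for (E event : eventsSinceSnapshot) {
                state = transition.apply(state, event);
            }
            return state;
        }

        public static void main(String[] args) {
            // Hypothetical order lifecycle: CREATED -> PAID -> SHIPPED.
            EventSourcedStateMachine<String, String> order = new EventSourcedStateMachine<>(
                (state, event) -> "PAY".equals(event) && "CREATED".equals(state) ? "PAID"
                                : "SHIP".equals(event) && "PAID".equals(state)   ? "SHIPPED"
                                : state);
            // Replaying the event log from the initial state yields the current state.
            System.out.println(order.rebuild("CREATED", Arrays.asList("PAY", "SHIP")));   // SHIPPED
        }
    }

Snapshotting then just means passing a stored intermediate state instead of "CREATED", together with only the events recorded after that snapshot.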
