kstream topology with inmemory statestore data not commited - spring-boot

I need to aggregate client information and every hours push it to an output topic.
I have a topology with :
input-topic
processor
sink topic
Data arrives in input-topic with a key in string which contains a clientID concatenated with date in YYYYMMDDHH
.
In my processor I use a simple InMemoryKeyValueStore (withCachingDisabled) to merge/aggregate data with specific rules (data are sometime not aggregated according to business logic).
In a punctuator, every hours the program parse the statestore to get all the messages transform it and forward it to the sink topic, after what I clean the statestore for all the message processed.
After the punctuation, I ask the size of the store which is effectivly empty (by .all() and
approximateNumEntries), every thing is OK.
But when I restart the application, the statstore is restored with all the elements normally deleted.
When I parse manually (with a simple KafkaConsumer) the changelog topic of the statestore in Kafka, I view that I have two records for each key :
The first record is commited and the message contains my aggregation.
The second record is a deletion message (message with null) but is not commited (visible only with read_uncommitted) which is dangerous in my case because the next punctuator will forward again the aggregate.
I have play with commit in the punctuator which forward, I have create an other punctuator which commit the context periodically (every 3 seconds) but after the restart I still have my data restored in the store (normal my delete message in not commited.)
I have a classic kstream configuration :
acks=all
enable.idempotence=true
processing.guarantee=exactly_once_v2
commit.interval.ms=100
isolation.level=read_committed
with the last version of the library kafka-streams 3.2.2 and a cluster in 2.6
Any help is welcome to have my record in the statestore commited. I don't use TimeWindowedKStream which is not exactly my need (sometime I don't aggregate but directly forward)

Related

Why Kafka streams creates topics for aggregation and joins

I recently created my first Kafka stream application for learning. I used spring-cloud-stream-kafka-binding. This is a simple eCommerce system, in which I am reading a topic called products, which have all the product entries whenever a new stock of a product comes in. I am aggregating the quantity to get the total quantity of a product.
I had two choices -
Send the aggregate details (KTable) to another kafka topic called aggregated-products
Materialize the aggregated data
I opted second option and what I found out that application created a kafka topic by itself and when I consumed messages from that topic then got the aggregated messages.
.peek((k,v) -> LOGGER.info("Received product with key [{}] and value [{}]",k, v))
.groupByKey()
.aggregate(Product::new,
(key, value, aggregate) -> aggregate.process(value),
Materialized.<String, Product, KeyValueStore<Bytes, byte[]>>as(PRODUCT_AGGREGATE_STATE_STORE).withValueSerde(productEventSerde)//.withKeySerde(keySerde)
// because keySerde is configured in application.properties
);
Using InteractiveQueryService, I am able to access this state store in my application to find out the total quantity available for a product.
Now have few questions -
why application created a new kafka topic?
if answer is 'to store aggregated data' then how is this different from option 1 in which I could have sent the aggregated data by my self?
Where does RocksDB come into picture?
Code of my application (which does more than what I explained here) can be accessed from this link -
https://github.com/prashantbhardwaj/kafka-stream-example/blob/master/src/main/java/com/appcloid/kafka/stream/example/config/SpringStreamBinderTopologyBuilderConfig.java
The internal topics are called changelog topics and are used for fault-tolerance. The state of the aggregation is stored both locally on the disk using RocksDB and on the Kafka broker in the form of a changelog topic - which is essentially a "backup". If a task is moved to a new machine or the local state is lost for a different reason, the local state can be restored by Kafka Streams by reading all changes to the original state from the changelog topic and applying it to a new RocksDB instance. After restoration has finished (the whole changelog topic was processed), the same state should be on the new machine, and the new machine can continue processing where the old one stopped. There are a lot of intricate details to this (e.g. in the default setting, it can happen that the state is updated twice for the same input record when failures happen).
See also https://developer.confluent.io/learn-kafka/kafka-streams/stateful-fault-tolerance/

Flink, Kafka and JDBC sink

I have a Flink 1.11 job that consumes messages from a Kafka topic, keys them, filters them (keyBy followed by a custom ProcessFunction), and saves them into the db via JDBC sink (as described here: https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/connectors/jdbc.html)
The Kafka consumer is initialized with these options:
properties.setProperty("auto.offset.reset", "earliest")
kafkaConsumer = new FlinkKafkaConsumer(topic, deserializer, properties)
kafkaConsumer.setStartFromGroupOffsets()
kafkaConsumer.setCommitOffsetsOnCheckpoints(true)
Checkpoints are enabled on the cluster.
What I want to achieve is a guarantee for saving all filtered data into the db, even if the db is down for, let's say, 6 hours, or there are programming errors while saving to the db and the job needs to be updated, redeployed and restarted.
For this to happen, any checkpointing of the Kafka offsets should mean that either
Data that was read from Kafka is in Flink operator state, waiting to be filtered / passed into the sink, and will be checkpointed as part of Flink operator checkpointing, OR
Data that was read from Kafka has already been committed into the db.
While looking at the implementation of the JdbcSink, I see that it does not really keep any internal state that will be checkpointed/restored - rather, its checkpointing is a write out to the database. Now, if this write fails during checkpointing, and Kafka offsets do get saved, I'll be in a situation where I've "lost" data - subsequent reads from Kafka will resume from committed offsets and whatever data was in flight when the db write failed is now not being read from Kafka anymore nor is in the db.
So is there a way to stop advancing the Kafka offsets whenever a full pipeline (Kafka -> Flink -> DB) fails to execute - or potentially the solution here (in pre-1.13 world) is to create my own implementation of GenericJdbcSinkFunction that will maintain some ValueState until the db write succeeds?
There are 3 options that I can see:
Try out the JDBC 1.13 connector with your Flink version. There is a good chance it might just work.
If that doesn't work immediately, check if you can backport it to 1.11. There shouldn't be too many changes.
Write your own 2-phase-commit sink, either by extending TwoPhaseCommitSinkFunction or implement your own SinkFunction with CheckpointedFunction and CheckpointListener. Basically, you create a new transaction after a successful checkpoint and commit it with notifyCheckpointCompleted.

Execute code when two previous events have been processed (Apache Kafka)

I´m new in Apache Kafka and Spring Boot. I´m trying to create a Spring Boot listener that generates a new event only when two specific messages (sent through Apache Kafka) have been received (for a determined resource).
The obvious solution is to use the database to change the status of the resource when the first event comes, and execute the code when the second event comes (if the customer is in the correct status in database). In this case, I'm worried if both events arrive at the same time.
Is there a way to aggregate both messages in Spring Boot/Apache Kafka instead do this manually?
Thanks.
You can do it with kafka streams. Example topology:
input stream (key/value from input topic A)
filter (filter by event type for example)
groupBy (group events by key or some field)
aggregate (aggregate events into new data structure)
filter (verify if aggregate its complete)
map (generate new output event with aggregate values)
output stream (key/value to topic B)
Check details in official doc: https://kafka.apache.org/24/documentation/streams/developer-guide/dsl-api.html#creating-source-streams-from-kafka

Kafka streams: Is there a way to delete an entry from kv store only if successfully committed to topics?

In Kafka streams, my kv store is linked to a sink which sends records to output topic.
- What exceptions would we get if for some reason sink can't commit records to topics?
If the sink cannot write the record it will internally retry and after all retries are exhausted the whole application goes down with an exception. If the store was updated successfully it will (by default only) contain the data and you cannot delete it. This is the guarantee "at-least-once" processing gives you.
As of Kafka 0.11 you can enable "exactly-once" processing:
properties.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);
For this case, on application restart the store will be deleted and recreated before any processing is repeated. This ensures, that the data written to the store before the error will be "removed" before processing continues.

Kafka Stream state store data is not getting Deleted

Using Kafka State store (Rock DB) in low level processor API with custom key.
Now trying to delete the KvStore and flush after that. Still it is available when in next punctuate is getting called .
Why Not is is delete the data from Rock DB ?
KeyValueStore kvStore
kvStore.delete(CustomKey)

Resources