I am using Kafka Streams 2.2.1.
I am using suppress to hold back events until a window closes. I am using event time semantics.
However, the suppressed results are only emitted once a new message arrives on the stream.
The following code is a reduced sample that reproduces the problem:
KStream<UUID, String>[] branches = is
    .branch((key, msg) -> "a".equalsIgnoreCase(msg.split(",")[1]),
            (key, msg) -> "b".equalsIgnoreCase(msg.split(",")[1]),
            (key, value) -> true);
KStream<UUID, String> sideA = branches[0];
KStream<UUID, String> sideB = branches[1];
KStream<Windowed<UUID>, String> sideASuppressed =
    sideA.groupByKey(Grouped.with(new MyUUIDSerde(), Serdes.String()))
         .windowedBy(TimeWindows.of(Duration.ofMinutes(31)).grace(Duration.ofMinutes(32)))
         .reduce((v1, v2) -> v1)
         .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
         .toStream();
Messages are only emitted from 'sideASuppressed' when a new message reaches the 'sideA' stream. Messages arriving on 'sideB' do not cause the suppress operator to emit anything, even if the window close time passed long ago.
In production the problem is unlikely to occur often because of the high volume, but there are enough cases where it is essential not to wait for a new message on the 'sideA' stream.
Thanks in advance.
According to Kafka streams documentation:
Stream-time is only advanced if all input partitions over all input topics have new data (with newer timestamps) available. If at least one partition does not have any new data available, stream-time will not be advanced and thus punctuate() will not be triggered if PunctuationType.STREAM_TIME was specified. This behavior is independent of the configured timestamp extractor, i.e., using WallclockTimestampExtractor does not enable wall-clock triggering of punctuate().
I am not sure why this is the case, but it explains why suppressed messages are only emitted when new messages are available on the stream that feeds the suppress operator.
If anyone knows why the implementation works this way, I will be happy to learn. This behavior forces my implementation to emit extra messages just to get the suppressed message emitted in time, which makes the code much less readable.
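To make the behavior concrete, here is a toy model in plain Java. It is not the Kafka Streams API: the class StreamTimeSuppressDemo and its methods are hypothetical, and it assumes the operator only knows the highest timestamp of the records it has seen so far and that timestamps are non-negative. It keeps the first value per window, as the reduce in the question does.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only, NOT the Kafka Streams API. It mimics a buffer that releases
// a window once observed stream time reaches window end + grace.
class StreamTimeSuppressDemo {
    private final long windowSizeMs;
    private final long graceMs;
    private long streamTime = Long.MIN_VALUE; // highest record timestamp seen so far
    private final TreeMap<Long, String> buffer = new TreeMap<>(); // window start -> first value

    StreamTimeSuppressDemo(long windowSizeMs, long graceMs) {
        this.windowSizeMs = windowSizeMs;
        this.graceMs = graceMs;
    }

    /** Processes one record and returns the windows that closed because of it. */
    List<String> process(long timestampMs, String value) {
        // Stream time only moves when a record arrives; wall-clock time plays no role.
        streamTime = Math.max(streamTime, timestampMs);
        long windowStart = timestampMs - (timestampMs % windowSizeMs);
        if (windowStart + windowSizeMs + graceMs > streamTime) {
            buffer.putIfAbsent(windowStart, value); // keep the first value per window
        } // otherwise the record is too late and is dropped

        List<String> emitted = new ArrayList<>();
        Iterator<Map.Entry<Long, String>> it = buffer.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, String> entry = it.next();
            if (entry.getKey() + windowSizeMs + graceMs <= streamTime) {
                emitted.add(entry.getKey() + ":" + entry.getValue());
                it.remove();
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        StreamTimeSuppressDemo demo = new StreamTimeSuppressDemo(10, 5);
        System.out.println(demo.process(1, "a"));  // window [0,10) is still open
        System.out.println(demo.process(12, "b")); // stream time 12 is below the close time 15
        System.out.println(demo.process(20, "c")); // stream time 20 closes window [0,10)
    }
}
```

In this model nothing is released between calls to process, however much real time passes, which matches the symptom described above.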
Given the default configuration and this binding:
@Bean
public Function<Flux<Message<Input>>, Flux<Message<Output>>> process() {
    return input -> input
        .map(message -> {
            // simplified
            return MessageBuilder.build();
        });
}
Is there any guarantee that the input message offset is committed only after the output is written to Kafka? I don't need full transactions, and I can live with at-least-once delivery and possible duplicates, but I cannot lose an output message. I was unable to find this exact scenario in the docs. I believe the previous channel-based binding worked the way I need, since it was blocking by nature, but I am not sure about the functional one.
Is there an equivalent of PublishSubject from RxJava in Kotlin Coroutines library?
Channels cannot be a replacement for PublishSubject since they do not publish values to multiple collectors (each value can be collected by a single collector only). Even MutableSharedFlow, which supports multiple collectors, does not by default allow emitting values without waiting for collectors to finish processing previous values. How can we create a flow with functionality similar to PublishSubject?
The following code will create a Flow equivalent to the PublishSubject:
fun <T> publishFlow(): MutableSharedFlow<T> {
    return MutableSharedFlow(
        replay = 0,
        extraBufferCapacity = Int.MAX_VALUE
    )
}
The main attributes of PublishSubject are that it does not replay old values to new observers, and that it allows publishing new values/events without waiting for the observers to handle them. This functionality can be achieved with MutableSharedFlow by specifying replay = 0, which prevents new collectors from collecting old values, and extraBufferCapacity = Int.MAX_VALUE, which allows publishing new values without waiting for busy collectors to finish collecting previous values.
One can add the following forceEmit function to be called instead of tryEmit, to ensure that the value is actually emitted:
fun <T> MutableSharedFlow<T>.forceEmit(value: T) {
    val emitted = tryEmit(value)
    check(emitted) { "Failed to emit into shared flow." }
}
Since we have a buffer with MAX_VALUE capacity, this forceEmit function should never fail when used with our publishFlow. If the flow is later replaced with a different flow that does not support emitting without suspending, we will get an exception and will know to handle the case where the buffer is full and one cannot emit without suspending.
Note that a buffer of MAX_VALUE capacity may cause high memory consumption if the collectors take a long time to process values, so it is more suitable for cases where the collectors perform a short synchronous operation (similar to RxJava observers).
I'm pretty new to Kafka. I'm using Spring Cloud Stream Kafka to produce and consume messages.
@StreamListener(Sink.INPUT)
public void process(Order order) {
    try {
        // have my message processing
    }
    catch (Exception e) {
        // retry here that record..
    }
}
I just want to know how I can implement a retry. Any help on this is highly appreciated.
Hi,
There are multiple ways to handle "retries", and the right one depends on the kind of errors you encounter.
For basic issues, the Kafka client will retry for you to recover from an error condition. For example, in case of a short network outage, the consumer and producer APIs retry automatically.
In particular, Kafka supports built-in producer/consumer retries that handle a large variety of errors without loss of messages, but as a developer you must still be able to handle other types of errors with the try-catch block you mention.
Errors in Kafka can be divided into the following categories:
(producer & consumer side) Non-retriable broker errors, such as errors regarding message size, authorization errors, etc. -> you must handle them in the design phase of your app.
(producer side) Errors that occur before the message is sent to the broker, for example serialization errors -> you must handle them at runtime.
(producer & consumer side) Errors that occur when the producer has exhausted all retry attempts, or when the memory available to the producer is full because all of it is being used to store messages while retrying -> you should handle these errors.
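As a rough illustration of the built-in producer retries mentioned above, the sketch below collects the producer settings that usually govern them. The keys are standard Kafka producer configuration names; the values, the broker address and the class name ProducerRetryConfigSketch are assumptions chosen for illustration, not recommendations.

```java
import java.util.Properties;

class ProducerRetryConfigSketch {
    /** Builds producer settings that rely on the client's built-in retries. */
    static Properties retryFriendlyProducerProps() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // hypothetical broker address
        props.setProperty("acks", "all");                   // wait for all in-sync replicas
        props.setProperty("retries", "5");                  // illustrative retry count
        props.setProperty("retry.backoff.ms", "200");       // pause between attempts
        props.setProperty("delivery.timeout.ms", "120000"); // upper bound on send plus retries
        return props;
    }

    public static void main(String[] args) {
        retryFriendlyProducerProps().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Once the retries or the delivery timeout are exhausted, the error surfaces to the application, which is the third category in the list above.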
Another point of attention regarding "how to retry" is getting the order of commits right when auto-commit is disabled (enable.auto.commit set to false).
A common and simple pattern to get commit order right is to use a monotonically increasing sequence number. Increase the sequence number every time you commit, and pass the sequence number at the time of the commit to the commit callback.
When you're getting ready to send a retry, check if the commit sequence number the callback got is equal to the instance variable; if it is, there was no newer commit and it is safe to retry. If the instance sequence number is higher, don't retry because a newer commit was already sent.
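The sequence-number check can be sketched in plain Java as follows. The class CommitRetryGuard and its method names are hypothetical; in a real consumer the number returned by nextCommit() would be captured by the asynchronous commit callback, which is only simulated here.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the sequence-number pattern; no Kafka classes are involved.
class CommitRetryGuard {
    private final AtomicLong latestSequence = new AtomicLong();

    /** Call when issuing a commit; the returned number travels with that commit's callback. */
    long nextCommit() {
        return latestSequence.incrementAndGet();
    }

    /** Call from a failed commit's callback: retry only if no newer commit was issued. */
    boolean shouldRetry(long sequenceOfFailedCommit) {
        return sequenceOfFailedCommit == latestSequence.get();
    }

    public static void main(String[] args) {
        CommitRetryGuard guard = new CommitRetryGuard();
        long first = guard.nextCommit();
        long second = guard.nextCommit();
        System.out.println("retry first commit?  " + guard.shouldRetry(first));
        System.out.println("retry second commit? " + guard.shouldRetry(second));
    }
}
```

An AtomicLong is used on the assumption that callbacks may run on a different thread from the one issuing commits; with a single-threaded consumer loop a plain field would do.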
Hi all,
I am trying to build event-driven microservices. Currently, I am able to consume a message from Kafka and update a database record when the message is received, using Quarkus and the SmallRye Reactive Messaging extension. What I want to achieve next is to send a message to another topic in case of success, and to an error topic otherwise. I know that we can use a return value and the @Outgoing annotation to emit a new message, but I don't think that fits my use case. I need some guidance on what to do if an error happens while consuming a message. Should I return the message to the original topic (by not acknowledging the message), or should I consume it and produce an error message to a different topic to roll back the original transaction?
Here is my code:
@Incoming("new-payment")
public void newMessage(String msg) {
    LOG.info("New payment has been received.");
    LOG.info("Payload is {}", msg);
    PaymentEvent pe = jsob.fromJson(msg, PaymentEvent.class);
    mysqlPool.preparedQuery("select totalBuyers from Book where isbn = ? ",
                            Tuple.of(pe.getIsbn()))
        .thenApply(rs -> {
            RowIterator<Row> iterator = rs.iterator();
            if (iterator.hasNext()) {
                return iterator.next().getInteger(0) + 1;
            } else {
                return Integer.valueOf(0);
            }
        })
        .thenApply(totalCount -> {
            return mysqlPool.preparedQuery("update Book set totalBuyers = ?",
                                           Tuple.of(totalCount));
        })
        .whenComplete((rs, err) -> {
            if (err != null) {
                // Emit an error to error topic.
            } else {
                // Emit a msg to other service.
            }
        });
}
Also, if you have better code, please share it. I am still a newbie in reactive programming :).
I've been doing enterprise integration for years and I think that you would want to do both.
Should I return message to the original topic (by not acknowledging
the message) or should I consume it and produce error message to
different topic to rollback the original transaction.
The event should remain on the topic for another instance to potentially pick up and process. And an error message should be logged as an event. Perhaps the same consumer could pick up and reprocess the event successfully.
An EDA (Event Driven Architecture) may offer different ways to handle this, but on an ESB the message would be marked as tried. Generally, after three failed attempts the message would be sent to a dead-letter queue so that it can be corrected and reprocessed later.
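The "three attempts, then dead-letter queue" idea can be sketched in plain Java like this. The class RetryThenDeadLetter is hypothetical and keeps dead letters in an in-memory list; in a real system they would be published to a dedicated queue or topic, and the retry loop would typically include a backoff.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch only: retries a handler a fixed number of times, then parks the message.
class RetryThenDeadLetter<T> {
    private final int maxAttempts;
    private final Consumer<T> handler;
    private final List<T> deadLetters = new ArrayList<>(); // stand-in for a dead-letter queue

    RetryThenDeadLetter(int maxAttempts, Consumer<T> handler) {
        this.maxAttempts = maxAttempts;
        this.handler = handler;
    }

    /** Returns true if the handler succeeded within maxAttempts, false if the message was parked. */
    boolean handle(T message) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                handler.accept(message);
                return true;
            } catch (RuntimeException e) {
                // Swallow the failure and try again until the attempts run out.
            }
        }
        deadLetters.add(message);
        return false;
    }

    List<T> deadLetters() {
        return deadLetters;
    }

    public static void main(String[] args) {
        RetryThenDeadLetter<String> processor = new RetryThenDeadLetter<>(3, msg -> {
            throw new IllegalStateException("simulated failure for " + msg);
        });
        System.out.println("processed: " + processor.handle("payment-1"));
        System.out.println("dead letters: " + processor.deadLetters());
    }
}
```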
Our enterprise is also starting to design and build applications using EDA, so I am interested to read what others have to say on this question. And kudos to you for focusing on Quarkus. I believe it is one of the best technologies I have seen come from Red Hat!
Another problem with this approach is that you are doing "2 writes in 1 service", i.e. one call to the db and another one to a topic. This becomes problematic when one of the 2 writes fails.
If you want to avoid this and use a pure event-driven approach, you need to reorder your events so that writing to the db is the last step in the whole flow. That way no service performs 2 writes.
Thus in your case: change the 2nd thenApply(..) method from updating the db to firing a new event to another topic. The consumer of this new topic should do the db update. The flow then becomes:
Producer -> topic1 -> consumer (select from ...) & fire event to another topic -> topic2 -> consumer (update table).
I have a Spring Cloud Stream (Kafka Streams version 2.1) application with a Kafka Streams binder, and I am doing time window aggregations where I only want to perform an action (an API call) once the window closes. The behavior I'm observing is that on every application restart, my mapValues function is called for every record stored in the changelog, resulting in a huge number of calls being made to the API.
My understanding of suppress() is that for every closed time window, a tombstone record should be sent to the aggregate changelog topic, effectively preventing me from reprocessing it, even after application restarts.
What could be causing messages to be reprocessed on an app restart?
I've already confirmed that the app is not reconsuming the source topic.
Snippet of the relevant code below:
Serde<Aggregator> aggregatorSerde = new JsonSerde<>(Aggregator.class, objectMapper);
Materialized<String, Aggregator, WindowStore<Bytes, byte[]>> stateStore =
    Materialized.<String, Aggregator, WindowStore<Bytes, byte[]>>with(Serdes.String(), aggregatorSerde);
// The chain ends in toStream().flatMapValues(), so the result is a KStream of single events.
KStream<String, Event> windowedEventKTable = inputKStream
    .groupByKey()
    .windowedBy(TimeWindows.of(Duration.ofSeconds(30)).grace(Duration.ofSeconds(5)))
    .aggregate(Aggregator::new, ((key, value, aggregate) -> aggregate.aggregate(value)), stateStore)
    .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()).withName(supressStoreName))
    .mapValues((windowedKey, groupedTriggerAggregator) -> {//code here returning a list})
    .toStream((k,v) -> k.key())
    .flatMapValues((readOnlyKey, value) -> value);