I have a SourceTask with a simple poll() method that completes quite fast. I found that the offsets returned by context.offsetStorageReader() are mostly stale, i.e. they do not match the offsets returned with the records of the previous poll() call.
At the same time, I can see from the logs that the offsets only become "fresh" after a "commitOffsets successfully" entry has occurred.
My question is: is this by design? Should I decrease the "OFFSET_COMMIT_INTERVAL_MS_CONFIG" value to ensure that offsets are committed faster than SourceTask.poll() is executed?
The comment on the org.apache.kafka.connect.runtime.OffsetStorageWriter class says "Offset data should only be read during startup or reconfiguration of a task...", rather than on each execution of the poll() method.
I had the same misconception about how Kafka Connect SourceTasks work:
Fetching the current offset in poll() only makes sense if you think about Tasks as "one-off" jobs: The Task is started, then poll() is called, which sends the record(s) to Kafka and persists its offset with it; then the Task is killed and the next Task will pick up the offset and continue reading data from the source.
This would not work well with Kafka, because the Connector partitions/offsets are themselves persisted in a Kafka topic - so there is no guarantee on how long the replication of the partition/offset value will take. This is why you receive stale offsets in the poll() method.
In reality, Tasks are started once, and after that their poll() method is called for as long as the Task is running. So the Task can store its offset in memory (e.g. in a field of the class deriving from SourceTask) and access/update it during each poll(). An obvious consequence is that multiple tasks working on the same partition (e.g. same file, same table, ...) can lead to message duplication, as their offsets are not in sync within the partition.
FileStreamSourceTask from the official Kafka repository is a good example of how reading offsets can be handled in a Source connector:
// stream is only null if the poll method has not run yet - only in this case the offset will be read!
if (stream == null) {
    Map<String, Object> offset = context.offsetStorageReader().offset(Collections.singletonMap(FILENAME_FIELD, filename));
    // offset is null when nothing has been committed yet for this source partition
    Object lastRecordedOffset = (offset != null) ? offset.get(POSITION_FIELD) : null;
    streamOffset = (lastRecordedOffset != null) ? (Long) lastRecordedOffset : 0L;
}
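To make the "read once, then keep the offset in memory" pattern concrete, here is a minimal sketch in plain Java. It does not use the Kafka Connect API: the class InMemoryOffsetTask, the committedOffsets map (standing in for what context.offsetStorageReader() would return) and the list-based source are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a SourceTask; this is NOT the Kafka Connect API.
class InMemoryOffsetTask {
    // Assumption: plays the role of the committed offsets behind offsetStorageReader().
    private final Map<String, Long> committedOffsets;
    private final String partition;
    private final List<String> source;

    private boolean started = false; // becomes true on the first poll()
    private long position = 0L;      // in-memory offset used by every later poll()

    InMemoryOffsetTask(Map<String, Long> committedOffsets, String partition, List<String> source) {
        this.committedOffsets = committedOffsets;
        this.partition = partition;
        this.source = source;
    }

    List<String> poll(int maxRecords) {
        if (!started) {
            // Read the persisted offset exactly once, when the task starts working.
            position = committedOffsets.getOrDefault(partition, 0L);
            started = true;
        }
        List<String> batch = new ArrayList<>();
        while (batch.size() < maxRecords && position < source.size()) {
            batch.add(source.get((int) position));
            position++; // advance the in-memory offset, never re-read the committed one
        }
        return batch;
    }
}
```

Even if the committed offset stays stale between commits, the second poll() continues from the in-memory position rather than from the last committed value.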
Does Shopify/sarama provide an option similar to transactional.id in JVM API?
The library supports idempotence (Config.Producer.Idempotent, similar to enable.idempotence), but I don't understand how to use it without transactional.id.
Please correct me if I'm wrong; the documentation about these options in Sarama is a bit lacking. According to the JVM docs, idempotence without the identifier is limited to a single producer session. In other words, we lose the guarantee when the producer fails and restarts.
I found relevant properties in the source code and some tests (for example), but I don't understand how to use them externally.
Shopify/sarama provides Kafka exactly-once (idempotent) delivery through an idempotence-enabled producer, but the following configuration must be in place for it to work.
From Shopify/sarama/config.go
if c.Producer.Idempotent {
	if !c.Version.IsAtLeast(V0_11_0_0) {
		return ConfigurationError("Idempotent producer requires Version >= V0_11_0_0")
	}
	if c.Producer.Retry.Max == 0 {
		return ConfigurationError("Idempotent producer requires Producer.Retry.Max >= 1")
	}
	if c.Producer.RequiredAcks != WaitForAll {
		return ConfigurationError("Idempotent producer requires Producer.RequiredAcks to be WaitForAll")
	}
	if c.Net.MaxOpenRequests > 1 {
		return ConfigurationError("Idempotent producer requires Net.MaxOpenRequests to be 1")
	}
}
Here is how Shopify/sarama does this: the AsyncProducer's transactionManager holds a producerEpoch ID (see Shopify/sarama/async_producer.go). This ID is initialised when the producer is initialised and is incremented each time a message is produced successfully; read the bumpEpoch() function in async_producer.go to see this.
This is the sequence ID for that producer session with the broker. It is sent with each message and incremented when a message is published successfully.
Read this example. It describes how idempotence works.
You are correct about the producer session. Exactly-once is only promised for a single producer session. When the producer is restarted just after a failed send, there can be a duplicate.
When the producer restarts, a new PID is assigned, so idempotency is promised only for a single producer session. Within that session, even though the producer retries requests on failures, each message is persisted in the log exactly once. There can still be duplicates depending on the source the producer gets its data from: Kafka won't take care of duplicate data received by the producer. So, in some cases, you may need an additional de-duplication system.
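The session-scoped nature of the guarantee can be illustrated with a toy model in plain Java. This is a sketch under simplifying assumptions, not the real broker logic: ToyPartitionLog is a hypothetical class that tracks only the last sequence number per producer ID for a single partition and ignores epochs and out-of-order sequences.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of one partition log that de-duplicates by (producer ID, sequence number).
// Hypothetical and heavily simplified; the real broker also checks epochs and sequence gaps.
class ToyPartitionLog {
    private final List<String> log = new ArrayList<>();
    private final Map<Long, Integer> lastSequenceByProducerId = new HashMap<>();

    // Returns true if the record was appended, false if it was dropped as a duplicate.
    boolean append(long producerId, int sequence, String value) {
        int last = lastSequenceByProducerId.getOrDefault(producerId, -1);
        if (sequence <= last) {
            return false; // a retry of something already written in this session
        }
        lastSequenceByProducerId.put(producerId, sequence);
        log.add(value);
        return true;
    }

    List<String> contents() {
        return new ArrayList<>(log);
    }
}
```

A retry with the same producer ID and sequence number is dropped, but if the producer restarts, gets a new producer ID and resends the same payload starting from sequence 0, the toy log accepts it and the payload appears twice.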
Is there an equivalent of PublishSubject from RxJava in Kotlin Coroutines library?
Channels cannot replace PublishSubject, since they do not publish values to multiple collectors (each value can be collected by a single collector only). Even MutableSharedFlow, which supports multiple collectors, by default does not allow emitting values without waiting for collectors to finish processing previous values. How can we create a flow with functionality similar to PublishSubject?
The following code will create a Flow equivalent to the PublishSubject:
fun <T> publishFlow(): MutableSharedFlow<T> {
    return MutableSharedFlow(
        replay = 0,
        extraBufferCapacity = Int.MAX_VALUE
    )
}
The main attributes of PublishSubject are that it does not replay old values to new observers and that it allows publishing new values/events without waiting for the observers to handle them. This functionality can be achieved with MutableSharedFlow by specifying replay = 0, which prevents new collectors from collecting old values, and extraBufferCapacity = Int.MAX_VALUE, which allows publishing new values without waiting for busy collectors to finish collecting previous values.
One can add the following forceEmit function to be called instead of tryEmit, to ensure that the value is actually emitted:
fun <T> MutableSharedFlow<T>.forceEmit(value: T) {
    val emitted = tryEmit(value)
    check(emitted) { "Failed to emit into shared flow." }
}
Since we have a buffer with MAX_VALUE capacity, this forceEmit function should never fail when used with our publishFlow. If the flow is ever replaced with a different flow that does not support emitting without suspending, we will get an exception and will know to handle the case where the buffer is full and one cannot emit without suspending.
Notice that having a buffer of MAX_VALUE capacity may cause high consumption of memory if the collection of values by the collectors takes a long time, so it is more suitable for cases where the collectors perform a short synchronous operation (similarly to RxJava observers).
I have a MongoDB database that contains a list of "tasks" and two instances of an executor. These 2 executors have to read a task from the DB, save it in the state "IN_EXECUTION" and execute the task. Of course I do not want my 2 executors to execute the same task, and this is my problem.
I use a transactional query. This way, when an executor tries to change the state of a task that the other one has already taken, it gets a "write exception" and has to start again and read a new task. The problem with this approach is that sometimes an executor gets a lot of errors before it can save the change of task state correctly and execute a new task. So it is as if I had only one executor.
- I do not want to block my entire DB on read/write because that would slow down the entire process.
- I think it is necessary to save the state of the task because it could be a long task.
I am asking whether it is possible to lock only certain records and execute a query on the "not-locked" records, but any advice that solves my problem will be really appreciated.
Thanks in advance.
Sorry, I simplified the concept in the question above. Actually I extract n messages that I have to send. I have to send these messages in blocks of 100, so my executors basically split the extracted messages into blocks of 100 and pass them to other executors.
Each executor extracts the messages and then updates them with the new state. I hope this is clearer now.
@Transactional(readOnly = false, propagation = Propagation.REQUIRED)
public List<PushMessageDB> assignPendingMessages(int limitQuery, boolean sortByClientPriority,
        LocalDateTime now, String senderId) {
    final List<PushMessageDB> messages = repositoryMessage.findByNotSendendAndSpecificError(limitQuery, sortByClientPriority, now);
    long count = repositoryMessage.updateStateAndSenderId(messages, senderId, MessageState.IN_EXECUTION);
    return messages;
}
DB update:
public long updateStateAndSenderId(List<String> ids, String senderId, MessageState messageState) {
    Query query = new Query(Criteria.where(INTERNAL_ID).in(ids));
    Update update = new Update().set(MESSAGE_STATE, messageState).set(SENDER_ID, senderId);
    return mongoTemplate.updateMulti(query, update, PushMessageDB.class).getModifiedCount();
}
You will have to do the locking one-by-one.
Trying to lock 100 records at once and at the same time have a second process also lock 100 records (without any coordination between the two) will almost certainly result in an overlapping set unless you have a huge selection of available records.
Depending on your application, having all work done by one thread (and the other being just a "hot standby") may also be acceptable as long as that single worker does not get overloaded.
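Locking one-by-one boils down to an atomic "change the state only if it is still pending" step per record. The sketch below models that idea in plain Java with an in-memory map; TaskClaimer and its State enum are hypothetical names, and the map stands in for the collection. With MongoDB, the analogous step would presumably be a single-document find-and-modify whose filter includes the pending state (for example via mongoTemplate.findAndModify), which is an assumption about how you would wire it up rather than something shown in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// In-memory model of claiming tasks one at a time; not MongoDB code.
class TaskClaimer {
    enum State { PENDING, IN_EXECUTION }

    private final ConcurrentHashMap<String, State> states = new ConcurrentHashMap<>();

    void addPending(String id) {
        states.put(id, State.PENDING);
    }

    // Claims up to 'limit' tasks. Each claim is an atomic compare-and-set, so two
    // executors calling this concurrently can never both claim the same task.
    List<String> claim(int limit) {
        List<String> claimed = new ArrayList<>();
        for (String id : states.keySet()) {
            if (claimed.size() >= limit) {
                break;
            }
            // Succeeds only if the task is still PENDING at this instant.
            if (states.replace(id, State.PENDING, State.IN_EXECUTION)) {
                claimed.add(id);
            }
        }
        return claimed;
    }
}
```

An executor that loses the race on one record simply moves on to the next one instead of failing a whole batch of 100 and starting over.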
I am using Kafka streams 2.2.1.
I am using suppress to hold back events until a window closes. I am using event time semantics.
However, the triggered messages are only triggered once a new message is available on the stream.
The following code is extracted to sample the problem:
KStream<UUID, String>[] branches = is
.branch((key, msg) -> "a".equalsIgnoreCase(msg.split(",")[1]),
(key, msg) -> "b".equalsIgnoreCase(msg.split(",")[1]),
(key, value) -> true);
KStream<UUID, String> sideA = branches[0];
KStream<UUID, String> sideB = branches[1];
KStream<Windowed<UUID>, String> sideASuppressed =
        ...
                Grouped.with(new MyUUIDSerde(),
                ...
        .reduce((v1, v2) -> {
            return v1;
        })
        ...
Messages are only emitted from 'sideASuppressed' when a new message arrives on the 'sideA' stream (messages arriving on 'sideB' do not cause the suppress to emit anything, even if the window close time passed a long time ago).
Although in production the problem is unlikely to occur often due to high volume, there are enough cases where it is essential not to wait for a new message to arrive on the 'sideA' stream.
Thanks in advance.
According to Kafka streams documentation:
Stream-time is only advanced if all input partitions over all input topics have new data (with newer timestamps) available. If at least one partition does not have any new data available, stream-time will not be advanced and thus punctuate() will not be triggered if PunctuationType.STREAM_TIME was specified. This behavior is independent of the configured timestamp extractor, i.e., using WallclockTimestampExtractor does not enable wall-clock triggering of punctuate().
I am not sure why this is the case, but it explains why suppressed messages are only emitted when new messages are available on the stream the suppression reads from.
If anyone can explain why the implementation works this way, I will be happy to learn. This behavior forces my implementation to emit extra messages just to get the suppressed message emitted in time, which makes the code much less readable.
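The effect can be reproduced with a toy model in plain Java. This is only a sketch of the idea that stream-time is driven by record timestamps, not by the wall clock; ToySuppressBuffer is a hypothetical class, not Kafka Streams code, and it assumes non-negative timestamps, tumbling windows and no grace period.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model of "suppress until window closes" driven purely by stream-time.
class ToySuppressBuffer {
    private final long windowSizeMs;
    // One pending result per window, keyed by window start.
    private final TreeMap<Long, String> pendingByWindowStart = new TreeMap<>();
    // Stream-time is the highest record timestamp seen so far.
    private long streamTime = Long.MIN_VALUE;

    ToySuppressBuffer(long windowSizeMs) {
        this.windowSizeMs = windowSizeMs;
    }

    // Processes one record and returns the results of all windows it closes.
    List<String> process(long timestampMs, String value) {
        long windowStart = timestampMs - (timestampMs % windowSizeMs);
        List<String> emitted = new ArrayList<>();
        if (windowStart + windowSizeMs <= streamTime) {
            return emitted; // record is late, its window has already closed
        }
        // Keeps the first value of the window, like reduce((v1, v2) -> v1).
        pendingByWindowStart.putIfAbsent(windowStart, value);
        streamTime = Math.max(streamTime, timestampMs);
        // Only a new record can advance stream-time and therefore close windows.
        while (!pendingByWindowStart.isEmpty()
                && pendingByWindowStart.firstKey() + windowSizeMs <= streamTime) {
            emitted.add(pendingByWindowStart.pollFirstEntry().getValue());
        }
        return emitted;
    }

    int pending() {
        return pendingByWindowStart.size();
    }
}
```

With a window size of 1000 ms, records at timestamps 100 and 500 leave one result pending no matter how much wall-clock time passes; only a later record, e.g. at timestamp 1200, advances stream-time far enough to release it.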
I'm using spring-kafka version 1.1.3 to consume messages from a topic. Auto commit is set to true and max.poll.records to 10 in the consumer config. session.timeout.ms is negotiated to 10 seconds with the server.
Upon receiving a message I persist part of it to the database. My database tends to be quite slow sometimes, which results in a session timeout by the kafka listener:
Auto offset commit failed for group mygroup: Commit cannot be completed
since the group has already rebalanced and assigned the partitions to
another member. This means that the time between subsequent calls to
poll() was longer than the configured session.timeout.ms, which
typically implies that the poll loop is spending too much time message
processing. You can address this either by increasing the session
timeout or by reducing the maximum size of batches returned in poll()
with max.poll.records.
Since I can't increase the session timeout on the server and max.poll.records is already down at 10, I'd like to be able to wrap my database call in a transaction that would roll back in case of a Kafka session timeout.
Is this possible and how can I accomplish this?
Unfortunately I wasn't able to find a solution in the docs.
You should consider upgrading to Spring Kafka 1.2 and Kafka 0.10.x. The old Apache Kafka client has a flaw with the heartbeat: it only sends heartbeats from within poll(). So, with autoCommit and a slow listener you end up with unexpected rebalancing, and you are on your own with such a problem. The version of Spring Kafka you use has logic like this:
// if the container is set to auto-commit, then execute in the
// same thread
// otherwise send to the buffering queue
if (this.autoCommit) {
    ...
}
else {
    if (sendToListener(records)) {
        if (this.assignedPartitions != null) {
            // avoid group management rebalance due to a slow
            // consumer
            ...
            this.paused = true;
            this.unsent = records;
        }
    }
}
So, you may consider switching off autoCommit and relying on the built-in pause feature, which is turned on by default.
Decided to upgrade to Kafka 0.11 since it adds transaction support (see Release Notes).