Kafka Go: working of enable.auto.offset.store=false with enable.auto.commit=false

Using Go 1.18 with confluent-kafka-go v1.8.2.
We set enable.auto.commit = false and manually commit the offset after each message is processed successfully.
However, even with this config, once we got an error while processing, we did not see the message with the same key again (meaning the offset somehow got committed even in the error scenario).
Note: in the error scenario, it took 8-9 seconds to process the message and finally treat it as an error.
Also got this from the link: "It is recommended to set enable.auto.offset.store=false for long-time processing applications and then explicitly store offsets (using offsets_store()) after message processing, to make sure offsets are not auto-committed prior to processing has finished."
Questions:
By default, how long does Kafka wait before it auto-commits the offset?
Do we have a mechanism to stop this auto-commit entirely?

Offsets are committed at the interval configured in auto.commit.interval.ms, which is 5 seconds by default.
Setting enable.auto.commit to false should be enough to disable auto-committing completely.
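To tie this back to the question, here is a minimal sketch of that pattern with confluent-kafka-go: both settings disabled, the offset stored and committed only after processing succeeds. The broker address, group id, topic name, and process function are assumptions for illustration:

package main

import (
    "fmt"
    "time"

    "github.com/confluentinc/confluent-kafka-go/kafka"
)

func main() {
    consumer, err := kafka.NewConsumer(&kafka.ConfigMap{
        "bootstrap.servers":        "localhost:9092", // assumed broker
        "group.id":                 "example-group",  // assumed group id
        "enable.auto.commit":       false,            // no background commits
        "enable.auto.offset.store": false,            // we decide which offsets count as "done"
    })
    if err != nil {
        panic(err)
    }
    defer consumer.Close()

    if err := consumer.Subscribe("orders", nil); err != nil { // assumed topic
        panic(err)
    }

    for {
        msg, err := consumer.ReadMessage(time.Second)
        if err != nil {
            continue // timeout or transient error
        }
        if procErr := process(msg); procErr != nil {
            // Neither stored nor committed: the offset stays put, so the
            // message is redelivered after a seek or a consumer restart.
            fmt.Printf("processing failed: %v\n", procErr)
            continue
        }
        // Mark the message as processed (offset + 1), then commit synchronously.
        if _, err := consumer.StoreOffsets([]kafka.TopicPartition{{
            Topic:     msg.TopicPartition.Topic,
            Partition: msg.TopicPartition.Partition,
            Offset:    msg.TopicPartition.Offset + 1,
        }}); err != nil {
            continue
        }
        if _, err := consumer.Commit(); err != nil {
            fmt.Printf("commit failed: %v\n", err)
        }
    }
}

func process(msg *kafka.Message) error {
    fmt.Printf("got %s\n", string(msg.Value)) // placeholder business logic
    return nil
}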

Related

How do Kafka retries work with request.timeout.ms?

I have configured my producer with request.timeout.ms = 70,000 ms and retries = 5. I have a doubt about how this actually works:
after request.timeout.ms = 70,000 expires, does it retry 5 times, or does it retry 5 times within the given request.timeout.ms = 70,000, with retry.backoff.ms between attempts?
There are 3 important configs to be aware of:
"request.timeout.ms" - the time to wait for a single request
"delivery.timeout.ms" - the time to complete the entire send operation
"retries" - how many times to retry when the broker responds with retriable errors
The Apache Kafka recommendation is to set "delivery.timeout.ms" and leave the other two configurations at their default values. The idea is that the main thing you, as a user, should worry about is how long you want to wait for Kafka to figure things out before giving up on it. It doesn't really matter what is taking Kafka so long - the connection, getting metadata, long queues, etc.; the only thing that matters is how long you are willing to wait.
Now to your question: request.timeout.ms applies to each retry. So the producer will send the record batch to Kafka, and if there's no response after 70,000 ms, it will consider this a failure and retry. Note that most errors (say, NotLeaderForPartition) will return from the broker much faster (which is why retry backoffs are needed).
Reasoning about delivery times with retries + request.timeout.ms turned out to be near impossible even for those who wrote the producer. Hence the introduction of delivery.timeout.ms, with a very clear contract.
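As a sketch of that recommended setup with confluent-kafka-go (where, to my understanding, delivery.timeout.ms is an alias of librdkafka's message.timeout.ms): cap the whole send operation and leave request.timeout.ms and retries at their defaults. The broker address and topic are assumptions:

package main

import (
    "fmt"

    "github.com/confluentinc/confluent-kafka-go/kafka"
)

func main() {
    producer, err := kafka.NewProducer(&kafka.ConfigMap{
        "bootstrap.servers":   "localhost:9092", // assumed broker
        "delivery.timeout.ms": 120000,           // total budget for a send, retries included
        // request.timeout.ms and retries are left at their defaults, as recommended
    })
    if err != nil {
        panic(err)
    }
    defer producer.Close()

    topic := "orders" // assumed topic
    deliveryChan := make(chan kafka.Event, 1)
    err = producer.Produce(&kafka.Message{
        TopicPartition: kafka.TopicPartition{Topic: &topic, Partition: kafka.PartitionAny},
        Value:          []byte("hello"),
    }, deliveryChan)
    if err != nil {
        panic(err)
    }

    // The delivery report carries an error if delivery.timeout.ms expires
    // before a successful acknowledgement.
    if m, ok := (<-deliveryChan).(*kafka.Message); ok && m.TopicPartition.Error != nil {
        fmt.Printf("delivery failed: %v\n", m.TopicPartition.Error)
    }
}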

KafkaConsumer poll() behavior understanding

Trying to understand (new to Kafka) how the poll loop in Kafka works.
Use case: 25 records on the topic, max poll size set to 5.
max.poll.interval.ms = 5000 // 5 seconds
max.poll.records = 5
Sequence of tasks:
Poll the records from the topic.
Process the records in a for loop.
Some processing logic where the logic either passes or fails.
If the logic passes, the record (with its offset) is added to a map.
It is then committed using a commitSync call.
If it fails, the loop breaks and whatever succeeded before that point is committed. The problem starts after this.
The next poll just keeps moving in batches of 5 even after the error - is that expected?
What we basically expect is that the loop breaks, the offsets up to the last successfully processed message get committed, and the next poll continues from the failed message.
Example: in the first poll, 5 messages are polled; offsets 1 and 2 are successful and committed, then 3 fails. Yet the poll call keeps moving to the next batches (5-10, 10-15). If there is an error in between, we expect it to stop at that point, so the next poll starts from 3 in the first case; or, if it fails at 8 in the second batch, the next poll should start from offset 8, not from the start of the next max-poll batch. (If it matters, this is a Spring Boot project and enable.auto.commit is false.)
I have tried finding this in the documentation but with no luck.
I also tried tweaking max.poll.interval.ms, but that did not help.
EDIT: Not accepting the answer because there is no direct solution for a custom consumer. Keeping this for informational purposes.
max.poll.interval.ms is in milliseconds, not seconds, so it should be 5000.
Once the records have been returned by the poll (and offsets not committed), they won't be returned again unless you restart the consumer or perform seek() operations on the consumer to reset the offset to the unprocessed ones.
The Spring for Apache Kafka project provides a SeekToCurrentErrorHandler to perform this task for you.
If you are using the consumer yourself (which it sounds like), you must do the seeks.
You can manually seek back to the first offset of the poll for all the assigned partitions on failure (I am not sure how to do this with the Spring consumer).
Sample code (Scala) for seeking the offset back to the beginning of the poll with a plain consumer.
In the code below I get the record list per partition, take the offset of the first record, and seek to it.
import org.apache.kafka.clients.consumer.ConsumerRecords
import scala.jdk.CollectionConverters._ // scala.collection.JavaConverters before Scala 2.13

// Rewind each partition of the current poll to its first record, so the next
// poll() re-delivers the whole batch. `consumer` is the KafkaConsumer in scope.
def seekBack(records: ConsumerRecords[String, String]): Unit = {
  records.partitions().asScala.foreach { partition =>
    val partitionedRecords = records.records(partition)
    val offset = partitionedRecords.get(0).offset() // first record of this poll
    consumer.seek(partition, offset)
  }
}
One problem: doing this in production all the time is bad; you only want to seek back when you hit a transient error, otherwise you will end up retrying infinitely.
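For illustration, here is a sketch of that pattern in Go with confluent-kafka-go (the question uses the Java client, but the idea is identical). The function name, the maxAttempts cap, and the process callback are assumptions; the cap is there precisely to avoid the infinite-retry problem described above:

import (
    "fmt"
    "time"

    "github.com/confluentinc/confluent-kafka-go/kafka"
)

// consumeWithSeekBack retries a failed record up to maxAttempts times by
// seeking back to its offset, then commits and moves on. Assumes consumer
// was created with enable.auto.commit=false and is already subscribed.
func consumeWithSeekBack(consumer *kafka.Consumer, process func(*kafka.Message) error, maxAttempts int) {
    attempts := map[string]int{} // "partition:offset" -> retry count
    for {
        msg, err := consumer.ReadMessage(time.Second)
        if err != nil {
            continue // timeout or transient poll error
        }
        key := fmt.Sprintf("%d:%d", msg.TopicPartition.Partition, msg.TopicPartition.Offset)
        if procErr := process(msg); procErr != nil {
            attempts[key]++
            if attempts[key] < maxAttempts {
                // Rewind this partition to the failed record; the next
                // ReadMessage returns it (and its successors) again.
                if err := consumer.Seek(msg.TopicPartition, 0); err != nil {
                    fmt.Printf("seek failed: %v\n", err)
                }
                continue
            }
            // Retries exhausted: fall through, commit, and move on
            // (log or dead-letter the record here).
        }
        delete(attempts, key)
        // Commit up to and including this record.
        if _, err := consumer.CommitMessage(msg); err != nil {
            fmt.Printf("commit failed: %v\n", err)
        }
    }
}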

Kafka how to set producer retries to Infinity

How can I set the Spring Boot property spring.kafka.producer.retries to Integer.MAX_VALUE?
Does it work to leave this property unset, or will it then default to 0?
See the Kafka defaults in KIP-98:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging
According to the Kafka docs it defaults to Integer.MAX_VALUE (at least in the current version), which concurs with the KIP.
The default value of ProducerConfig.RETRIES_CONFIG is 2147483647, so not defining the retries property will give you the default value.
By default it is 2147483647, which is Integer.MAX_VALUE; you can set any value in [0, ..., 2147483647].
From the retries docs:
Setting a value greater than zero will cause the client to resend any record whose send fails with a potentially transient error. Note that this retry is no different than if the client resent the record upon receiving the error. Allowing retries without setting max.in.flight.requests.per.connection to 1 will potentially change the ordering of records because if two batches are sent to a single partition, and the first fails and is retried but the second succeeds, then the records in the second batch may appear first. Note additionally that produce requests will be failed before the number of retries has been exhausted if the timeout configured by delivery.timeout.ms expires first before successful acknowledgement. Users should generally prefer to leave this config unset and instead use delivery.timeout.ms to control retry behavior.
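If you do want to set it explicitly rather than leaving it unset, a minimal application.properties sketch using the value the docs give as the default:

# application.properties
spring.kafka.producer.retries=2147483647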

Indefinite log retention on kafka

I am using Kafka for event sourcing. I realized that we still need to configure the log retention time, i.e. log.retention.hours.
What is the best value to use if I want to keep all my messages indefinitely? The sample configuration sets log.retention.bytes to -1; can I also use -1 for log.retention.hours?
See the following Kafka JIRA, which is due for the 0.9.0.0 release. For the time being, set as suggested:
log.retention.bytes = -1
log.retention.hours = 2147483647
which is the same as forever (~250K years).
Then, when the 0.9.0.0 release is available, log.retention.hours should support a similar -1 value.
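Put together, a broker-side sketch of those settings (and note that on modern brokers, log.retention.ms accepts -1 to disable time-based retention outright):

# server.properties - keep messages indefinitely
log.retention.bytes=-1
log.retention.hours=2147483647
# on modern brokers: log.retention.ms=-1 disables time-based retention entirely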

Delay / Lag between Commit and select with Distributed Transactions when two connections are enlisted to the transaction in Oracle with ODAC

We have an application calling two Oracle databases over two connections (which are kept open throughout the application). For certain functionality we use distributed transactions. We have Enlist=false in the connection string and manually enlist the connections in the transaction.
The problem comes in a scenario where we update the same record very frequently within a distributed transaction, and we see a delay before the data committed in the previous run becomes visible.
For example:
using (OracleConnection connection1 = new OracleConnection())
using (OracleConnection connection2 = new OracleConnection())
{
    connection1.ConnectionString = connection1String;
    connection1.Open();
    connection2.ConnectionString = connection2String;
    connection2.Open();

    for (int i = 0; i < 100; i++) // do the update 100 times
    {
        // .. check the previously updated value
        connection1.EnlistTransaction(currentTransaction);
        connection2.EnlistTransaction(currentTransaction);
        // .. do an update using connection1
        // .. do some updates with connection2
    }
}
As in the code fragment above, we do an update and then check the previously updated value in the next iteration. The issue comes up when we run this frequently for a single record: in the next iteration we don't see the update committed in the previous iteration, even though it was committed. But when this happens, the update is visible to other applications after a very, very small delay, and even within our code it's visible if we debug and run the line again.
It's almost like a delay in the commit, even though the previous commit call has already returned.
Anyone have any ideas?
It turned out that there's no way to control this behavior through ODAC, so the only viable solution was to implement retry behavior in our code. Since this occurs very rarely, when it happens we delay 10 seconds and retry the same operation.
Additional details on what I found about this can be found here.
