Kafka how to set producer retries to Infinity - spring-boot

How can I set the Spring Boot property spring.kafka.producer.retries to Integer.MAX_VALUE?
If I leave this property unset, does that work, or will it default to 0?
See the Kafka default in the KIP:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging

According to the Kafka docs it defaults to Integer.MAX_VALUE (at least with the current version), which concurs with the KIP.

The default value of ProducerConfig.RETRIES_CONFIG is 2147483647, so leaving the retries property undefined should give you that default.

By default it is 2147483647, which is Integer.MAX_VALUE. You can set it to any value in [0, ..., 2147483647].
From the retries docs:
Setting a value greater than zero will cause the client to resend any record whose send fails with a potentially transient error. Note that this retry is no different than if the client resent the record upon receiving the error. Allowing retries without setting max.in.flight.requests.per.connection to 1 will potentially change the ordering of records because if two batches are sent to a single partition, and the first fails and is retried but the second succeeds, then the records in the second batch may appear first. Note additionally that produce requests will be failed before the number of retries has been exhausted if the timeout configured by delivery.timeout.ms expires first before successful acknowledgement. Users should generally prefer to leave this config unset and instead use delivery.timeout.ms to control retry behavior.
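As a minimal sketch of what the docs recommend, the following builds producer settings that leave retries unset and bound retrying by time instead. It uses only java.util.Properties with the plain config key names; the class name, method name, and the broker address in the usage note are hypothetical, and the ordering flag is only relevant if per-partition ordering matters to you.

```java
import java.util.Properties;

public class ProducerRetrySketch {
    /**
     * Builds producer settings that control retrying through
     * delivery.timeout.ms and deliberately leave "retries" unset.
     */
    static Properties producerProps(String bootstrapServers,
                                    int deliveryTimeoutMs,
                                    boolean strictOrdering) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        // "retries" is intentionally absent, so the client default applies.
        props.put("delivery.timeout.ms", Integer.toString(deliveryTimeoutMs));
        if (strictOrdering) {
            // Per the retries docs: with retries enabled, more than one
            // in-flight request can reorder records within a partition.
            props.put("max.in.flight.requests.per.connection", "1");
        }
        return props;
    }
}
```

For example, producerProps("localhost:9092", 120000, true) yields a Properties object that could be passed to a KafkaProducer constructor. In Spring Boot, the same keys can likely be supplied under spring.kafka.producer.properties.* instead of being set in code.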

Related

Kafka go-lang working of enable.auto.offset.store=false with auto.commit = false

Using Go 1.18 with confluent-kafka-go v1.8.2.
We set enable.auto.commit = false and manually commit the offset once a message has been processed successfully.
However, even with this config, once we hit an error while processing, we do not see the message with the same key again (which suggests the offset is somehow being committed even in error scenarios).
Note: in the error scenario, processing took 8-9 seconds before it finally failed.
We also found this in the linked docs: It is recommended to set `enable.auto.offset.store=false` for long-time processing applications and then explicitly store offsets (using offsets_store()) after message processing, to make sure offsets are not auto-committed prior to processing has finished.
Questions:
By default, how long does Kafka wait before it auto-commits the offset?
Is there a mechanism to stop this auto-commit altogether?
Offsets are committed at the intervals configured in auto.commit.interval.ms, which by default is 5 seconds.
Setting enable.auto.commit to false should be enough to disable auto-committing completely.

How Kafka retries works with request.timeout.?

I have configured my producer with request.timeout.ms = 70,000 ms and retries = 5. I am unsure how this actually works:
Does it retry 5 times after request.timeout.ms = 70,000 expires, or does it retry 5 times within the given request.timeout.ms = 70,000, using the retry.backoff.ms value?
There are 3 important configs to be aware of:
"request.timeout.ms" - how long to wait for the response to a single request before treating that attempt as failed
"delivery.timeout.ms" - time to complete the entire send operation
"retries" - how many times to retry when the broker responds with retriable errors.
The Apache Kafka recommendation is to set "delivery.timeout.ms" and leave the other two configurations at their default values. The idea is that the main thing you as a user should worry about is how long you want to wait for Kafka to figure things out before giving up on it. It doesn't really matter what is taking Kafka so long (the connection, getting metadata, long queues, etc.); the only thing that matters is how long you are willing to wait.
Now to your question: request.timeout.ms applies to each attempt. The producer sends the record batch to Kafka, and if there is no response after 70,000 ms it considers this a failure and retries. Note that most errors (say, NotLeaderForPartition) will return from the broker much faster (which is why retry backoffs are needed).
Reasoning about delivery times with retries + request.timeout.ms turned out to be near impossible even for those who wrote the producer. Hence the introduction of delivery.timeout.ms, with a very clear contract.
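To make the interplay concrete, here is a rough worked calculation using the numbers from the question. It is a simplified model, not the producer's actual logic: it assumes every attempt runs into request.timeout.ms, ignores linger, metadata fetches, and queueing, and takes 100 ms for retry.backoff.ms and 120,000 ms for delivery.timeout.ms as assumed values. The class and method names are hypothetical.

```java
public class DeliveryBoundSketch {
    /**
     * Time spent if every attempt times out: one initial attempt plus
     * `retries` retries, with a backoff before each retry.
     */
    static long retryBudgetMs(int retries, long requestTimeoutMs, long retryBackoffMs) {
        return (retries + 1L) * requestTimeoutMs + retries * retryBackoffMs;
    }

    /** The send is failed as soon as either limit is reached. */
    static long upperBoundMs(int retries, long requestTimeoutMs,
                             long retryBackoffMs, long deliveryTimeoutMs) {
        return Math.min(retryBudgetMs(retries, requestTimeoutMs, retryBackoffMs),
                        deliveryTimeoutMs);
    }

    public static void main(String[] args) {
        // 6 attempts * 70,000 ms + 5 backoffs * 100 ms
        System.out.println(retryBudgetMs(5, 70_000, 100));         // 420500
        // delivery.timeout.ms expires long before the retries run out
        System.out.println(upperBoundMs(5, 70_000, 100, 120_000)); // 120000
    }
}
```

Under these assumptions, delivery.timeout.ms is the limit that actually applies, which is the reason for tuning it rather than retries.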

KafkaConsumer poll() behavior understanding

Trying to understand (new to Kafka) how the poll event loop in Kafka works.
Use case: 25 records on the topic, max poll size is set to 5.
max.poll.interval.ms = 5000 //5 seconds by default
max.poll.records = 5
Sequence of tasks
Poll the records from the topic.
Process the records in a for loop.
Some processing logic runs for each record, and it either passes or fails.
If the logic passes, the record (with its offset) is added to a map.
The map is then committed using a commitSync call.
If it fails, the loop breaks and whatever succeeded before is committed. The problem starts after this.
The next poll just keeps moving on in batches of 5 even after an error. Is this expected?
What we expect is that the loop breaks, the offsets up to the last successfully processed message get committed, and the next poll continues from the failed message.
Example: the 1st poll returns 5 messages; offsets 1 and 2 succeed and are committed, then the 3rd fails. The poll call keeps moving to the next batches (5-10, 10-15). If there is an error in between, we expect it to stop at that point, so the next poll should start from 3 in the first case. If it fails in the 2nd batch at 8, the next poll should start from offset 8, not from the next batch as determined by the max poll setting (5 in this case). In case it matters: this is a Spring Boot project and enable.auto.commit is false.
I have tried finding this in the documentation, but with no luck.
I tried tweaking max.poll.interval.ms, but that did not help.
EDIT: I have not accepted an answer because there is no direct solution for a custom consumer. Keeping this for informational purposes.
max.poll.interval.ms is milliseconds, not seconds so it should be 5000.
Once the records have been returned by the poll (and offsets not committed), they won't be returned again unless you restart the consumer or perform seek() operations on the consumer to reset the offset to the unprocessed ones.
The Spring for Apache Kafka project provides a SeekToCurrentErrorHandler to perform this task for you.
If you are using the consumer yourself (which it sounds like), you must do the seeks.
On failure, you can manually seek back to the first offset of the poll for all the assigned partitions. I am not sure how this is done with the Spring consumer.
Sample code for seeking back to the beginning of the batch with a plain consumer:
In the code below, I get the list of records per partition and then take the offset of the first record to seek to.
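import org.apache.kafka.clients.consumer.{ConsumerRecords, KafkaConsumer}
import scala.jdk.CollectionConverters._ // Scala 2.13+; needed to iterate the Java Set

// Rewind every partition in this batch to the offset of its first record,
// so that the next poll() returns the same records again.
def seekBack(consumer: KafkaConsumer[String, String],
             records: ConsumerRecords[String, String]): Unit =
  records.partitions().asScala.foreach { partition =>
    val firstOffset = records.records(partition).get(0).offset()
    consumer.seek(partition, firstOffset)
  }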
One problem with doing this in production: you don't want to seek back on every failure, only when the error is transient; otherwise you will end up retrying indefinitely.
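The behaviour described in these answers can be illustrated with a toy model that needs no Kafka dependency. This is a sketch under stated assumptions, not the real client: ToyConsumer is a hypothetical stand-in for a single partition whose poll() always advances the position, and run() seeks back to a failed offset but gives up on it after maxAttempts failures so that a permanent error cannot loop forever. Commits are not modelled at all.

```java
import java.util.ArrayList;
import java.util.List;

public class SeekBackSketch {
    /** Toy single-partition consumer: poll() advances the position every time. */
    static final class ToyConsumer {
        private final int logEndOffset;
        private final int maxPollRecords;
        private int position = 0;

        ToyConsumer(int logEndOffset, int maxPollRecords) {
            this.logEndOffset = logEndOffset;
            this.maxPollRecords = maxPollRecords;
        }

        /** Returns up to maxPollRecords offsets starting at the current position. */
        List<Integer> poll() {
            List<Integer> batch = new ArrayList<>();
            while (batch.size() < maxPollRecords && position < logEndOffset) {
                batch.add(position++);
            }
            return batch;
        }

        /** Moves the position so the next poll() starts at the given offset. */
        void seek(int offset) {
            position = offset;
        }
    }

    /**
     * Processes the whole log. failingOffset fails timesToFail times; after a
     * failure the consumer seeks back to it, or past it once maxAttempts
     * failures have occurred. Returns the offsets in the order attempted.
     */
    static List<Integer> run(ToyConsumer consumer, int failingOffset,
                             int timesToFail, int maxAttempts) {
        List<Integer> attempted = new ArrayList<>();
        int failures = 0;
        while (true) {
            List<Integer> batch = consumer.poll();
            if (batch.isEmpty()) {
                break; // reached the end of the toy log
            }
            for (int offset : batch) {
                attempted.add(offset);
                boolean failed = offset == failingOffset && failures < timesToFail;
                if (failed) {
                    failures++;
                    // Without this seek, the next poll() would skip ahead.
                    consumer.seek(failures < maxAttempts ? offset : offset + 1);
                    break; // drop the rest of the batch; it will be polled again
                }
            }
        }
        return attempted;
    }

    public static void main(String[] args) {
        // 10 records, batches of 5, offset 3 fails once and then succeeds.
        System.out.println(run(new ToyConsumer(10, 5), 3, 1, 3));
        // prints [0, 1, 2, 3, 3, 4, 5, 6, 7, 8, 9]
    }
}
```

With the real client the same idea applies per TopicPartition via seek(partition, offset), and the attempt counter would have to be tracked per failed record.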

Indefinite log retention on kafka

I am using kafka for event-sourcing. I realized that we still need to configure the log retention time, i.e. log.retention.hours.
What is the best value to use if I want to keep all my messages indefinitely? The sample configuration for log.retention.bytes is set to -1, can I use -1 also in the log.retention.hours?
See the following Kafka JIRA which is due for the 0.9.0.0 release. For the time being set as suggested:
log.retention.bytes = -1
log.retention.hours = 2147483647
This is effectively forever (roughly 245,000 years).
Once the 0.9.0.0 release is available, log.retention.hours should accept a similar -1 value.
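As a quick check of that figure, this small sketch converts the suggested log.retention.hours value into years, assuming 365-day years. The class and method names are only for illustration.

```java
public class RetentionArithmetic {
    /** Converts a retention period in hours to whole 365-day years. */
    static long hoursToYears(long hours) {
        return hours / (24L * 365L);
    }

    public static void main(String[] args) {
        // 2147483647 hours, the value suggested for log.retention.hours
        System.out.println(hoursToYears(Integer.MAX_VALUE)); // 245146
    }
}
```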

Timeout of JMS Point-to-point requests in JMeter does not result in an error

We are using Apache JMeter 2.12 to measure the response time of our JMS queue. However, we would like to see how many of those requests take less than a certain time. According to the official JMeter site (http://jmeter.apache.org/usermanual/component_reference.html), this should be set by the Timeout property. You can see in the photo below what our configuration looks like:
However, setting the timeout does not result in an error after sending 100 requests, even though some of them apparently take longer than that amount of time:
Is there some other setting I am missing or is there a way to achieve my goal?
Thanks!
The JMeter documentation for JMS Point-to-Point describes the timeout as
The timeout in milliseconds for the reply-messages. If a reply has not been received within the specified time, the specific testcase failes and the specific reply message received after the timeout is discarded. Default value is 2000 ms.
This times the receipt of a response, not the actual sending of the message.
The source for the JMeter Point-to-Point sampler checks whether you have a 'Receive Queue' configured. If you do, it goes through the fixed-queue executor path and uses the timeout value; otherwise it does not use the timeout value.
if (useTemporyQueue()) {
    executor = new TemporaryQueueExecutor(session, sendQueue);
} else {
    producer = session.createSender(sendQueue);
    executor = new FixedQueueExecutor(producer, getTimeoutAsInt(), isUseReqMsgIdAsCorrelId());
}
In your screenshot, the JNDI name Receive Queue is not defined, so it uses a temporary queue and does not use the timeout. Whether the timeout should be supported in this case is best discussed in the JMeter forum.
Alternatively, if you want to see request times in percentiles/buckets, please read this Stack Overflow Q/A:
I want to find out the percentage of HTTPS requests that take less than a second in JMeter
