Timeouts in camel-jms

I'm using the split-aggregate functionality in Camel to split some work up amongst some JMS clients.
The Camel route is defined (using groovy) as follows:
from("vm:getQuotes")
.split(new MethodCallExpression("requestSplitter", "splitAmongstBots"), new ArrayListAggregationStrategy())
.to("jms:queue:quoteRequests?requestTimeout=${responseTimeout}s")
.unmarshal().json(JsonLibrary.Jackson)
.end()
The JMS clients can take between approx 15 and 90 seconds to process the tasks.
I'm seeing this exception 30 seconds after the initial split:
Caused by: org.apache.camel.ExchangeTimedOutException: The OUT message was not received within: 30000 millis. Exchange[Message: {village=CHEC}]
at org.apache.camel.component.seda.SedaProducer.process(SedaProducer.java:144)
at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:191)
at org.apache.camel.util.AsyncProcessorHelper.process(AsyncProcessorHelper.java:109)
at org.apache.camel.processor.UnitOfWorkProducer.process(UnitOfWorkProducer.java:68)
at org.apache.camel.impl.ProducerCache$2.doInProducer(ProducerCache.java:375)
at org.apache.camel.impl.ProducerCache$2.doInProducer(ProducerCache.java:343)
at org.apache.camel.impl.ProducerCache.doInProducer(ProducerCache.java:233)
at org.apache.camel.impl.ProducerCache.sendExchange(ProducerCache.java:343)
at org.apache.camel.impl.ProducerCache.send(ProducerCache.java:201)
at org.apache.camel.impl.DefaultProducerTemplate.send(DefaultProducerTemplate.java:128)
at org.apache.camel.impl.DefaultProducerTemplate.send(DefaultProducerTemplate.java:115)
at org.apache.camel.impl.DefaultProducerTemplate.sendBodyAndHeader(DefaultProducerTemplate.java:182)
... 116 common frames omitted
I have tried adding this line into the route just after the call to split:
.timeout(1000L * 60)
but to no avail - the exception is still thrown after 30 seconds.
Any ideas how I can increase the timeout that is in effect here?

The timeout comes from the vm endpoint (vm extends seda), see
http://camel.apache.org/seda
You can set a higher timeout there, or configure it not to wait for the task to complete.
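For example, a minimal sketch of the sending side (the stack trace shows a ProducerTemplate feeding vm:getQuotes, so the timeout has to be raised on that producer URI; the body and header names here are made up):
// Wait up to 2 minutes for the reply instead of the default 30000 ms.
producerTemplate.sendBodyAndHeader("vm:getQuotes?timeout=120000", requestBody, "requestId", id);
// Or fire-and-forget, so the caller never waits on the downstream route:
// producerTemplate.sendBody("vm:getQuotes?waitForTaskToComplete=Never", requestBody);
Note that the fire-and-forget variant only makes sense if the caller does not need the aggregated reply.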

Related

Spring camel kafka - Re-balancing and removing consumer

We have seen a consumer removed from the consumer group, but I can't understand why.
As you can see from the errors below, it suggests a timeout on poll().
The TPS is less than 1, so very low, and each request takes around 200ms to ingest and push to DB.
This happened on two occasions within days of each other.
The result was that the service no longer read messages from the partition and a restart was required (not good when you don't have alerting on offset buildup).
Any help/pointers would be greatly appreciated
Spring boot 2.5.13
Camel 3.16.0
2 Java applications (One in each DC)
1 Topic with 2 partitions
ERROR org.apache.camel.processor.errorhandler.DeadLetterChannel - log - Failed delivery for (MessageId: 4AA2CA19996CA12-000000000000424E on ExchangeId: 4AA2CA19996CA12-000000000000424E). On delivery attempt: 0 caught: org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
WARN org.apache.camel.component.kafka.KafkaFetchRecords - handlePollErrorHandler - Deferring processing to the exception handler based on polling exception strategy
ERROR org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - handle - [Consumer clientId=consumer-pdr-writer-service-2, groupId=pdr-writer-service] Offset commit failed on partition MY-TOPIC-0 at offset 166742: The coordinator is not aware of this member.
auto.commit.interval.ms = 5000
auto.offset.reset = latest
connections.max.idle.ms = 540000
session.timeout.ms = 10000
max.poll.interval.ms = 300000
max.poll.records = 500
partition.assignment.strategy = [org.apache.kafka.clients.consumer.RangeAssignor]
group.id = a438f569-5701-4a83-885c-9111dfcbc743
group.instance.id = null
heartbeat.interval.ms = 3000
enable.auto.commit = true
A log we saw only once, at the same time we had these issues:
Requesting the consumer to retry polling the same message based on polling exception strategy
Exception org.apache.kafka.common.errors.TimeoutException caught while polling TOPIC-NAME-Thread 0 from kafka topic TOPIC-NAME at offset {TOPIC-NAME/1=166743}: Timeout of 5000ms expired before successfully committing offsets {TOPIC-NAME-1=OffsetAndMetadata{offset=166744, leaderEpoch=null, metadata=''}}
ERROR org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - handle - [Consumer clientId=consumer-pdr-writer-service-2, groupId=pdr-writer-service] Offset commit failed on partition TOPIC-NAME-1 at offset 166744: The coordinator is not aware of this member.
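There is no accepted answer quoted here, but the CommitFailedException above names the two knobs to try first: raise max.poll.interval.ms or reduce max.poll.records. A minimal sketch of the consumer endpoint, assuming camel-kafka 3.16's maxPollRecords / maxPollIntervalMs URI options (check the exact option names against the component docs for your version; broker address and downstream route are placeholders):
from("kafka:MY-TOPIC"
        + "?brokers=localhost:9092"        // placeholder
        + "&groupId=pdr-writer-service"
        + "&maxPollRecords=50"             // smaller batches per poll()
        + "&maxPollIntervalMs=600000")     // allow up to 10 minutes between polls
    .to("direct:writeToDb");               // hypothetical downstream step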

messages duplicated during rebalancing after service recovery from Kafka SSLHandshakeException

Current setup - Our Spring Boot application consumes messages from a Kafka topic. We are processing one message at a time (we are not using streams). Below are the config properties and versions being used.
ConsumerConfig.REQUEST_TIMEOUT_MS_CONFIG- 30000
ConsumerConfig.AUTO_OFFSET_RESET_CONFIG-earliest
ContainerProperties.AckMode-RECORD
Spring boot version-2.5.7
Spring-kafka version- 2.7.8
Kafka-clients version-2.8.1
number of partitions- 6
consumer group- 1
consumers- 2
Issue - When the Spring Boot application stays idle for a long time (idle time varying from 4 hours to 3 days), we see org.apache.kafka.common.errors.SslAuthenticationException: SSL handshake failed
Exception error message - org.apache.kafka.common.errors.SslAuthenticationException: SSL handshake failed
Caused by: java.security.cert.CertificateException: No subject alternative DNS name matching kafka-2.broker.emh-dev.service.dev found.
2022-04-07 06:58:42.437 ERROR 24180 --- [ntainer#0-0-C-1] o.s.k.l.KafkaMessageListenerContainer : Authentication/Authorization Exception, retrying in 10000 ms
After the service recovers, we see message duplication with the same partition and offsets, which is inconsistent.
Below are the exceptions:
Consumer clientId=XXXXXX, groupId=XXXXXX] Offset commit failed on partition XXXXXX at offset 354: The coordinator is not aware of this member
Seek to current after exception; nested exception is org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records
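No accepted fix is quoted here, but the CommitFailedException above again points at max.poll.interval.ms and max.poll.records, and with Kafka's at-least-once delivery some duplicates after a rebalance are expected unless processing is idempotent. A minimal sketch of setting those consumer properties on a plain spring-kafka DefaultKafkaConsumerFactory (bootstrap servers, group id and deserializers are placeholder values):
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.springframework.context.annotation.Bean;
import org.springframework.kafka.core.DefaultKafkaConsumerFactory;

@Bean
public DefaultKafkaConsumerFactory<String, String> consumerFactory() {
    Map<String, Object> props = new HashMap<>();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // placeholder
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-consumer-group");         // placeholder
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 600_000);         // more headroom for slow processing
    props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 50);                  // fewer records per poll()
    return new DefaultKafkaConsumerFactory<>(props);
}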

How to represent or monitor java.io.InterruptedIOException timeout?

We are getting Caused by: java.io.InterruptedIOException: timeout in the logs from the server. However, the server is not returning a response code to us.
I am looking for the standard practice for timeout monitoring in Splunk or AppDynamics, to plot a graph of the number of timeouts received per second.
Should we add an error code like 408 to the exception on the client side, or should we plot the graph based on counting the text "timeout" over time?
Exception Logs
java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at lambdainternal.LambdaRTEntry.main(LambdaRTEntry.java:150)
Caused by: java.io.InterruptedIOException: timeout
at okhttp3.internal.connection.Transmitter.timeoutExit(Transmitter.kt:104)
at okhttp3.internal.connection.Transmitter.maybeReleaseConnection(Transmitter.kt:293)
at okhttp3.internal.connection.Transmitter.noMoreExchanges(Transmitter.kt:257)
at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.kt:192)
at okhttp3.RealCall.execute(RealCall.kt:66)
For AppDynamics the ideal solution would be for a "bad error code" to be returned from the server - this would cause an error to be detected (and mark any associated Business Transaction as being in error) - see https://docs.appdynamics.com/22.2/en/application-monitoring/troubleshooting-applications/errors-and-exceptions#ErrorsandExceptions-BusinessTransactionError
Failing that, you can use a Custom Error Configuration to set a logger which signals errors - see https://docs.appdynamics.com/22.2/en/application-monitoring/configure-instrumentation/error-detection#ErrorDetection-ErrorDetectionConfiguration
Alternatively, you can capture values using a Data Collector and then use these in Analytics to break out errors - see https://docs.appdynamics.com/22.2/en/application-monitoring/configure-instrumentation/data-collectors + https://docs.appdynamics.com/22.2/en/analytics/configure-analytics/collect-transaction-analytics-data
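For the "logger which signals errors" route, a minimal sketch (class, logger and marker names are made up; the point is a single, countable log line per timeout that Splunk or an AppDynamics custom error configuration can match on):
import java.io.IOException;
import java.io.InterruptedIOException;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.Response;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class UpstreamClient {                       // hypothetical wrapper class
    private static final Logger LOG = LoggerFactory.getLogger(UpstreamClient.class);
    private final OkHttpClient client = new OkHttpClient();

    Response call(Request request) throws IOException {
        try {
            return client.newCall(request).execute();
        } catch (InterruptedIOException e) {
            // One well-known line per timeout; treat it like a 408 when charting.
            LOG.error("UPSTREAM_TIMEOUT code=408 url={}", request.url(), e);
            throw e;
        }
    }
}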

seekToCurrentErrorHandler fails in case there are multiple failed records from different partitions if FixedBackOff is set as FixedBackOff(0L, 1)

With spring-kafka-2.5.4.RELEASE version, when there are multiple failed records from different partitions, seekToCurrentErrorHandler fails if FixedBackOff is set with maxAttempts as 1 and interval other than -1L.
SeekToCurrentErrorHandler seekToCurrentErrorHandler = new SeekToCurrentErrorHandler(new FixedBackOff(0L, 1));
Although setting an interval other than -1L doesn't make sense when the maxAttempts count is 1 (as there would be no retry and hence no retry interval), shouldn't it either fail at startup complaining about this, or be handled appropriately?
It fails at run time when there are multiple failed records from different partitions, with the below error.
ERROR org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer - Error handler threw an exception
org.springframework.kafka.KafkaException: Seek to current after exception; nested exception is org.springframework.kafka.listener.ListenerExecutionFailedException: <some IO Exception here, not one of them defined in FailedRecordProcessor.configureDefaultClassifier()>
at org.springframework.kafka.listener.SeekUtils.seekOrRecover(SeekUtils.java:157)
This seems to come from the line below:
Line 96 of FailedRecordTracker (i.e. if (nextBackOff != BackOffExecution.STOP) { )
https://github.com/spring-projects/spring-kafka/blob/v2.5.4.RELEASE/spring-kafka/src/main/java/org/springframework/kafka/listener/FailedRecordTracker.java#L96
which subsequently results in reaching line 157 of SeekUtils (i.e. throw new KafkaException("Seek to current after exception", level, thrownException);)
https://github.com/spring-projects/spring-kafka/blob/v2.5.4.RELEASE/spring-kafka/src/main/java/org/springframework/kafka/listener/SeekUtils.java#L157
Perhaps you are migrating from an older version.
maxAttempts in FixedBackOff means the maximum number of retry attempts, so it should be 0 for no retries.
See https://docs.spring.io/spring-kafka/docs/2.5.10.RELEASE/reference/html/#seek-to-current
Starting with version 2.3, a BackOff can be provided to the SeekToCurrentErrorHandler and DefaultAfterRollbackProcessor so that the consumer thread can sleep for some configurable time between delivery attempts. Spring Framework provides two out of the box BackOff s, FixedBackOff and ExponentialBackOff. The maximum back off time must not exceed the max.poll.interval.ms consumer property, to avoid a rebalance.
IMPORTANT: Previously, the configuration was "maxFailures" (which included the first delivery attempt). When using a FixedBackOff, its maxAttempts property represents the number of delivery retries (one less than the old maxFailures property). Also, maxFailures=-1 meant retry indefinitely with the old configuration, with a BackOff you would set the maxAttempts to Long.MAX_VALUE for a FixedBackOff and leave the maxElapsedTime to its default in an ExponentialBackOff.
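So with retries disabled entirely, a minimal sketch under spring-kafka 2.5.x (assuming factory is your existing ConcurrentKafkaListenerContainerFactory bean):
import org.springframework.kafka.listener.SeekToCurrentErrorHandler;
import org.springframework.util.backoff.FixedBackOff;

// maxAttempts counts retries, not total deliveries: 0 = deliver once, never retry.
factory.setErrorHandler(new SeekToCurrentErrorHandler(new FixedBackOff(0L, 0L)));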

KafkaProducerException on sending message to a topic

Spring boot properties for kafka producer:
spring.kafka.bootstrap-servers=localhost:9092
spring.kafka.client-id=bam
#spring.kafka.producer.acks= # Number of acknowledgments the producer requires the leader to have received before considering a request complete.
spring.kafka.producer.batch-size=0
spring.kafka.producer.bootstrap-servers=localhost:9092
#spring.kafka.producer.buffer-memory= # Total bytes of memory the producer can use to buffer records waiting to be sent to the server.
spring.kafka.producer.client-id=bam-producer
spring.kafka.consumer.auto-offset-reset=earliest
#spring.kafka.producer.compression-type= # Compression type for all data generated by the producer.
spring.kafka.producer.key-serializer= org.apache.kafka.common.serialization.StringSerializer
#spring.kafka.producer.retries= # When greater than zero, enables retrying of failed sends.
spring.kafka.producer.value-serializer= org.apache.kafka.common.serialization.StringSerializer
#spring.kafka.properties.*= # Additional properties used to configure the client.
I am getting the below exception when I try to send a message to a Kafka topic:
Caused by: org.springframework.kafka.core.KafkaProducerException: Failed to send; nested exception is org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for bam-0 due to 30004 ms has passed since last append
at org.springframework.kafka.core.KafkaTemplate$1.onCompletion(KafkaTemplate.java:255)
at org.apache.kafka.clients.producer.internals.RecordBatch.done(RecordBatch.java:109)
at org.apache.kafka.clients.producer.internals.RecordBatch.maybeExpire(RecordBatch.java:160)
at org.apache.kafka.clients.producer.internals.RecordAccumulator.abortExpiredBatches(RecordAccumulator.java:245)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:212)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:135)
... 1 more
Caused by: org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for bam-0 due to 30004 ms has passed since last append
I am not able to figure out why I am getting this exception. Can someone please help?
The producer is timing out trying to send messages. I notice you are using localhost in your bootstrap. Make sure a broker is available locally and listening on port 9092.
Issue resolved by setting advertised.listeners in server.properties to PLAINTEXT://<ExternalIP>:9092.
Note: Kafka is deployed on AWS.
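For reference, the broker-side change described above looks roughly like this in server.properties (the external IP stays a placeholder):
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://<ExternalIP>:9092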
