KafkaTransactionManager with Spring transaction annotation - spring

We are trying to implement Kafka transactions with Spring Boot and noticed some interesting things.
@Configuration
public class KafkaProducerConfiguration {

    @Value(value = "${spring.kafka.bootstrap-servers}")
    private String bootstrapAddress;

    @Bean
    public ProducerFactory<Long, Message> producerFactory() {
        Map<String, Object> configProps = new HashMap<>();
        configProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapAddress);
        configProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, LongSerializer.class);
        configProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, JsonSerializer.class);
        configProps.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "transaction-");
        return new DefaultKafkaProducerFactory<>(configProps);
    }

    @Bean
    public KafkaTemplate<Long, Message> kafkaTemplate() {
        return new KafkaTemplate<>(producerFactory());
    }
}
Here it gets interesting: after setting ProducerConfig.TRANSACTIONAL_ID_CONFIG, we expected the auto-configuration class KafkaAutoConfiguration to create this bean:
@Bean
@ConditionalOnProperty(
    name = {"spring.kafka.producer.transaction-id-prefix"}
)
@ConditionalOnMissingBean
public KafkaTransactionManager<?, ?> kafkaTransactionManager(ProducerFactory<?, ?> producerFactory) {
    return new KafkaTransactionManager(producerFactory);
}
But that does not happen, so spring.kafka.producer.transaction-id-prefix is apparently not the same thing as ProducerConfig.TRANSACTIONAL_ID_CONFIG. We could not find a suitable constant in ProducerConfig.
For reasons unknown to us, when kafkaTemplate.send() is called inside the transaction, we see JpaTransactionManager in the log:
15:56:13.939 [http-nio-8081-exec-2] INFO o.a.k.c.p.i.TransactionManager - [Producer clientId=producer-transaction-0, transactionalId=transaction-0] Invoking InitProducerId for the first time in order to acquire a producer ID
15:56:13.991 [kafka-producer-network-thread | producer-transaction-0] INFO org.apache.kafka.clients.Metadata - [Producer clientId=producer-transaction-0, transactionalId=transaction-0] Cluster ID: M-DqpGu4T_25jGui6701-w1
15:56:14.013 [kafka-producer-network-thread | producer-transaction-0] INFO o.a.k.c.p.i.TransactionManager - [Producer clientId=producer-transaction-0, transactionalId=transaction-0] Discovered transaction coordinator localhost:9094 (id: 0 rack: null)
15:56:14.185 [kafka-producer-network-thread | producer-transaction-0] INFO o.a.k.c.p.i.TransactionManager - [Producer clientId=producer-transaction-0, transactionalId=transaction-0] ProducerId set to 5 with epoch 14
15:56:14.258 [kafka-producer-network-thread | producer-transaction-0] INFO org.apache.kafka.clients.Metadata - [Producer clientId=producer-transaction-0, transactionalId=transaction-0] Resetting the last seen epoch of partition event-rating-0 to 0 since the associated topicId changed from null to dEt3oJ_SQqW0Gz1mgPc5Fg
15:56:14.259 [kafka-producer-network-thread | producer-transaction-0] INFO org.apache.kafka.clients.Metadata - [Producer clientId=producer-transaction-0, transactionalId=transaction-0] Resetting the last seen epoch of partition event-rating-5 to 0 since the associated topicId changed from null to dEt3oJ_SQqW0Gz1mgPc5Fg
15:56:14.259 [kafka-producer-network-thread | producer-transaction-0] INFO org.apache.kafka.clients.Metadata - [Producer clientId=producer-transaction-0, transactionalId=transaction-0] Resetting the last seen epoch of partition event-rating-2 to 0 since the associated topicId changed from null to dEt3oJ_SQqW0Gz1mgPc5Fg
15:56:14.259 [kafka-producer-network-thread | producer-transaction-0] INFO org.apache.kafka.clients.Metadata - [Producer clientId=producer-transaction-0, transactionalId=transaction-0] Resetting the last seen epoch of partition event-rating-8 to 0 since the associated topicId changed from null to dEt3oJ_SQqW0Gz1mgPc5Fg
15:56:14.259 [kafka-producer-network-thread | producer-transaction-0] INFO org.apache.kafka.clients.Metadata - [Producer clientId=producer-transaction-0, transactionalId=transaction-0] Resetting the last seen epoch of partition event-rating-9 to 0 since the associated topicId changed from null to dEt3oJ_SQqW0Gz1mgPc5Fg
15:56:14.259 [kafka-producer-network-thread | producer-transaction-0] INFO org.apache.kafka.clients.Metadata - [Producer clientId=producer-transaction-0, transactionalId=transaction-0] Resetting the last seen epoch of partition event-rating-4 to 0 since the associated topicId changed from null to dEt3oJ_SQqW0Gz1mgPc5Fg
15:56:14.260 [kafka-producer-network-thread | producer-transaction-0] INFO org.apache.kafka.clients.Metadata - [Producer clientId=producer-transaction-0, transactionalId=transaction-0] Resetting the last seen epoch of partition event-rating-1 to 0 since the associated topicId changed from null to dEt3oJ_SQqW0Gz1mgPc5Fg
15:56:14.260 [kafka-producer-network-thread | producer-transaction-0] INFO org.apache.kafka.clients.Metadata - [Producer clientId=producer-transaction-0, transactionalId=transaction-0] Resetting the last seen epoch of partition event-rating-6 to 0 since the associated topicId changed from null to dEt3oJ_SQqW0Gz1mgPc5Fg
15:56:14.260 [kafka-producer-network-thread | producer-transaction-0] INFO org.apache.kafka.clients.Metadata - [Producer clientId=producer-transaction-0, transactionalId=transaction-0] Resetting the last seen epoch of partition event-rating-7 to 0 since the associated topicId changed from null to dEt3oJ_SQqW0Gz1mgPc5Fg
15:56:14.406 [http-nio-8081-exec-2] DEBUG o.s.orm.jpa.JpaTransactionManager - Initiating transaction commit
15:56:14.406 [http-nio-8081-exec-2] DEBUG o.s.orm.jpa.JpaTransactionManager - Committing JPA transaction on EntityManager [SessionImpl(1789264984<open>)]
15:56:14.511 [http-nio-8081-exec-2] DEBUG o.s.orm.jpa.JpaTransactionManager - Closing JPA EntityManager [SessionImpl(1789264984<open>)] after transaction
import org.springframework.transaction.annotation.Transactional;

@AllArgsConstructor
@Service
public class TestService {

    private final KafkaTemplate<Long, Message> kafkaTemplate;

    @Transactional
    public void test(Long id, Message message) {
        kafkaTemplate.send("topic", id, message);
    }
}
As described above:
We wonder why KafkaTransactionManager is not created (although, notably, the transaction succeeded without this bean).
For some reason JpaTransactionManager appeared in the logs instead.

Boot knows nothing about
configProps.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "transaction-");
Boot auto configuration requires you to set the property in application.yml/properties.
https://docs.spring.io/spring-boot/docs/current/reference/html/application-properties.html#application-properties.integration.spring.kafka.producer.transaction-id-prefix
Instead of defining your own producer factory, let boot auto configure it for you.
You must also have JPA enabled to get its transaction manager.
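For illustration, a minimal sketch of what the answer suggests, with the transactional id prefix moved into application.yml so that Boot creates the producer factory, KafkaTemplate and KafkaTransactionManager itself (property names are from the Boot documentation linked above; the bootstrap address and serializer choices are assumptions):
application.yml:
spring:
  kafka:
    bootstrap-servers: localhost:9092
    producer:
      transaction-id-prefix: transaction-
      key-serializer: org.apache.kafka.common.serialization.LongSerializer
      value-serializer: org.springframework.kafka.support.serializer.JsonSerializer
@Service
public class TestService {

    // Injected from Boot's auto-configuration instead of a hand-rolled producer factory.
    private final KafkaTemplate<Object, Object> kafkaTemplate;

    public TestService(KafkaTemplate<Object, Object> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    // In the original setup the kafkaTransactionManager bean was never created, so the only
    // transaction manager in the context was the JPA one, which is why JpaTransactionManager
    // shows up in the log. Once both exist, name the one you want on @Transactional.
    @Transactional("kafkaTransactionManager")
    public void test(Long id, Message message) {
        kafkaTemplate.send("topic", id, message);
    }
}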

Related

Kafka SASL_PLAINTEXT / SCRAM authentication fails in Spring Boot consumer

I am trying Kafka authentication with SASL_PLAINTEXT / SCRAM, but authentication fails in Spring Boot.
When I switch to SASL_PLAINTEXT / PLAIN it works, but with SCRAM (both SHA-512 and SHA-256) authentication fails.
I have tried many different things, but it's still not working.
How can I fix it?
broker log
broker1 | [2020-12-31 02:57:37,831] INFO [SocketServer brokerId=1] Failed authentication with /172.29.0.1 (Authentication failed during authentication due to invalid credentials with SASL mechanism SCRAM-SHA-512) (org.apache.kafka.common.network.Selector)
broker2 | [2020-12-31 02:57:37,891] INFO [SocketServer brokerId=2] Failed authentication with /172.29.0.1 (Authentication failed during authentication due to invalid credentials with SASL mechanism SCRAM-SHA-512) (org.apache.kafka.common.network.Selector)
Spring boot log
2020-12-31 11:57:37.438 INFO 82416 --- [ restartedMain] o.a.k.c.s.authenticator.AbstractLogin : Successfully logged in.
2020-12-31 11:57:37.497 INFO 82416 --- [ restartedMain] o.a.kafka.common.utils.AppInfoParser : Kafka version: 2.6.0
2020-12-31 11:57:37.499 INFO 82416 --- [ restartedMain] o.a.kafka.common.utils.AppInfoParser : Kafka commitId: 62abe01bee039651
2020-12-31 11:57:37.499 INFO 82416 --- [ restartedMain] o.a.kafka.common.utils.AppInfoParser : Kafka startTimeMs: 1609383457495
2020-12-31 11:57:37.502 INFO 82416 --- [ restartedMain] o.a.k.clients.consumer.KafkaConsumer : [Consumer clientId=consumer-Test-Consumer-1, groupId=Test-Consumer] Subscribed to topic(s): test
2020-12-31 11:57:37.508 INFO 82416 --- [ restartedMain] o.s.s.c.ThreadPoolTaskScheduler : Initializing ExecutorService
2020-12-31 11:57:37.528 INFO 82416 --- [ restartedMain] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port(s): 8080 (http) with context path ''
2020-12-31 11:57:37.546 INFO 82416 --- [ restartedMain] i.m.k.p.KafkaProducerScramApplication : Started KafkaProducerScramApplication in 2.325 seconds (JVM running for 3.263)
2020-12-31 11:57:37.833 INFO 82416 --- [ntainer#0-0-C-1] o.apache.kafka.common.network.Selector : [Consumer clientId=consumer-Test-Consumer-1, groupId=Test-Consumer] Failed authentication with localhost/127.0.0.1 (Authentication failed during authentication due to invalid credentials with SASL mechanism SCRAM-SHA-512)
2020-12-31 11:57:37.836 ERROR 82416 --- [ntainer#0-0-C-1] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-Test-Consumer-1, groupId=Test-Consumer] Connection to node -1 (localhost/127.0.0.1:9091) failed authentication due to: Authentication failed during authentication due to invalid credentials with SASL mechanism SCRAM-SHA-512
2020-12-31 11:57:37.837 WARN 82416 --- [ntainer#0-0-C-1] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-Test-Consumer-1, groupId=Test-Consumer] Bootstrap broker localhost:9091 (id: -1 rack: null) disconnected
2020-12-31 11:57:37.842 ERROR 82416 --- [ntainer#0-0-C-1] essageListenerContainer$ListenerConsumer : Consumer exception
java.lang.IllegalStateException: This error handler cannot process 'org.apache.kafka.common.errors.SaslAuthenticationException's; no record information is available
at org.springframework.kafka.listener.SeekUtils.seekOrRecover(SeekUtils.java:151) ~[spring-kafka-2.6.4.jar:2.6.4]
at org.springframework.kafka.listener.SeekToCurrentErrorHandler.handle(SeekToCurrentErrorHandler.java:113) ~[spring-kafka-2.6.4.jar:2.6.4]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.handleConsumerException(KafkaMessageListenerContainer.java:1425) ~[spring-kafka-2.6.4.jar:2.6.4]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.run(KafkaMessageListenerContainer.java:1122) ~[spring-kafka-2.6.4.jar:2.6.4]
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[na:na]
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[na:na]
at java.base/java.lang.Thread.run(Thread.java:832) ~[na:na]
Caused by: org.apache.kafka.common.errors.SaslAuthenticationException: Authentication failed during authentication due to invalid credentials with SASL mechanism SCRAM-SHA-512
my docker-compose.yml
...
...
  zookeeper3:
    image: confluentinc/cp-zookeeper:6.0.1
    hostname: zookeeper3
    container_name: zookeeper3
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper3:2183
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://zookeeper:2183
      ZOOKEEPER_CLIENT_PORT: 2183
      ZOOKEEPER_TICK_TIME: 2000
      ZOOKEEPER_SERVER_ID: 3
      KAFKA_OPTS: "-Djava.security.auth.login.config=/etc/kafka/secrets/sasl/zookeeper_jaas.conf \
        -Dzookeeper.authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider \
        -Dzookeeper.authProvider.2=org.apache.zookeeper.server.auth.DigestAuthenticationProvider \
        -Dquorum.auth.enableSasl=true \
        -Dquorum.auth.learnerRequireSasl=true \
        -Dquorum.auth.serverRequireSasl=true \
        -Dquorum.auth.learner.saslLoginContext=QuorumLearner \
        -Dquorum.auth.server.saslLoginContext=QuorumServer \
        -Dquorum.cnxn.threads.size=20 \
        -DrequireClientAuthScheme=sasl"
    volumes:
      - /etc/kafka/secrets/sasl:/etc/kafka/secrets/sasl
  broker1:
    image: confluentinc/cp-kafka:6.0.1
    hostname: broker1
    container_name: broker1
    depends_on:
      - zookeeper1
      - zookeeper2
      - zookeeper3
    ports:
      - "9091:9091"
      - "9101:9101"
      - "29091:29091"
    expose:
      - "29090"
    environment:
      KAFKA_OPTS: "-Dzookeeper.sasl.client=true -Djava.security.auth.login.config=/etc/kafka/secrets/sasl/kafka_server_jaas.conf"
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: 'zookeeper1:2181,zookeeper2:2182,zookeeper3:2183'
      KAFKA_INTER_BROKER_LISTENER_NAME: INSIDE
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INSIDE:PLAINTEXT,OUTSIDE:PLAINTEXT,SASL_PLAINHOST:SASL_PLAINTEXT
      KAFKA_LISTENERS: INSIDE://:29090,OUTSIDE://:29091,SASL_PLAINHOST://:9091
      KAFKA_ADVERTISED_LISTENERS: INSIDE://broker1:29090,OUTSIDE://localhost:29091,SASL_PLAINHOST://localhost:9091
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_JMX_PORT: 9101
      KAFKA_JMX_HOSTNAME: localhost
      KAFKA_SECURITY_INTER_BROKER_PROTOCAL: SASL_PLAINTEXT
      KAFKA_SASL_ENABLED_MECHANISMS: SCRAM-SHA-512
      KAFKA_SASL_MECHANISM_INTER_BROKER_PROTOCOL: PLAINTEXT
    volumes:
      - /etc/kafka/secrets/sasl:/etc/kafka/secrets/sasl
  broker2:
    image: confluentinc/cp-kafka:6.0.1
    hostname: broker2
    container_name: broker2
    depends_on:
      - zookeeper1
      - zookeeper2
      - zookeeper3
    ports:
      - "9092:9092"
      - "9102:9102"
      - "29092:29092"
    expose:
      - "29090"
    environment:
      KAFKA_OPTS: "-Dzookeeper.sasl.client=true -Djava.security.auth.login.config=/etc/kafka/secrets/sasl/kafka_server_jaas.conf"
      KAFKA_BROKER_ID: 2
      KAFKA_ZOOKEEPER_CONNECT: 'zookeeper1:2181,zookeeper2:2182,zookeeper3:2183'
      KAFKA_INTER_BROKER_LISTENER_NAME: INSIDE
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INSIDE:PLAINTEXT,OUTSIDE:PLAINTEXT,SASL_PLAINHOST:SASL_PLAINTEXT
      KAFKA_LISTENERS: INSIDE://:29090,OUTSIDE://:29092,SASL_PLAINHOST://:9092
      KAFKA_ADVERTISED_LISTENERS: INSIDE://broker2:29090,OUTSIDE://localhost:29092,SASL_PLAINHOST://localhost:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_JMX_PORT: 9102
      KAFKA_JMX_HOSTNAME: localhost
      KAFKA_SECURITY_INTER_BROKER_PROTOCAL: SASL_PLAINTEXT
      KAFKA_SASL_ENABLED_MECHANISMS: SCRAM-SHA-512
      KAFKA_SASL_MECHANISM_INTER_BROKER_PROTOCOL: PLAINTEXT
    volumes:
      - /etc/kafka/secrets/sasl:/etc/kafka/secrets/sasl
kafka_server_jaas.conf
KafkaServer {
org.apache.kafka.common.security.scram.ScramLoginModule required
username="admin"
password="password"
user_admin="password"
user_client="password";
};
Client {
org.apache.kafka.common.security.plain.PlainLoginModule required
username="admin"
password="password";
};
KafkaClient {
org.apache.kafka.common.security.scram.ScramLoginModule required
username="client"
password="password";
};
zookeeper_jaas.conf
Server {
org.apache.kafka.common.security.plain.PlainLoginModule required
user_admin="password";
};
QuorumServer {
org.apache.zookeeper.server.auth.DigestLoginModule required
user_admin="password";
};
QuorumLearner {
org.apache.zookeeper.server.auth.DigestLoginModule required
username="admin"
password="password";
};
ConsumerConfig.java
private static final String BOOTSTRAP_ADDRESS = "localhost:9091,localhost:9092";
private static final String JAAS_TEMPLATE = "org.apache.kafka.common.security.scram.ScramLoginModule required username=\"%s\" password=\"%s\";";

public Map<String, Object> consumerConfigs() {
    Map<String, Object> props = new HashMap<>();
    String jaasCfg = String.format(JAAS_TEMPLATE, "client", "password");
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, BOOTSTRAP_ADDRESS);
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "1000");
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "Test-Consumer");
    props.put("sasl.jaas.config", jaasCfg);
    props.put("sasl.mechanism", "SCRAM-SHA-512");
    props.put("security.protocol", "SASL_PLAINTEXT");
    return props;
}

@Bean
public ConsumerFactory<String, String> consumerFactory() {
    return new DefaultKafkaConsumerFactory<>(consumerConfigs());
}

@Bean
public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory<String, String> factory =
            new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory());
    return factory;
}
Solved. It was because I didn't add the user information in ZooKeeper. Add this code:
  zookeeper-add-kafka-users:
    image: confluentinc/cp-kafka:6.0.1
    container_name: "zookeeper-add-kafka-users"
    depends_on:
      - zookeeper1
      - zookeeper2
      - zookeeper3
    command: "bash -c 'echo Waiting for Zookeeper to be ready... && \
      cub zk-ready zookeeper1:2181 120 && \
      cub zk-ready zookeeper2:2182 120 && \
      cub zk-ready zookeeper3:2183 120 && \
      kafka-configs --zookeeper zookeeper1:2181 --alter --add-config 'SCRAM-SHA-512=[iterations=4096,password=password]' --entity-type users --entity-name admin && \
      kafka-configs --zookeeper zookeeper1:2181 --alter --add-config 'SCRAM-SHA-512=[iterations=4096,password=password]' --entity-type users --entity-name client '"
    environment:
      KAFKA_BROKER_ID: ignored
      KAFKA_ZOOKEEPER_CONNECT: ignored
      KAFKA_OPTS: -Djava.security.auth.login.config=/etc/kafka/secrets/sasl/kafka_server_jaas.conf
    volumes:
      - /home/mysend/dev/docker/kafka/sasl:/etc/kafka/secrets/sasl
kafka SASL_PLAIN SCRAM
If you don't use Docker, you can use this command:
bin/kafka-configs --zookeeper localhost:2181 --alter --add-config 'SCRAM-SHA-256=[password=admin-secret],SCRAM-SHA-512=[password=admin-secret]' --entity-type users --entity-name admin
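The consumer in the question authenticates as the client principal from kafka_server_jaas.conf, so that user needs SCRAM credentials as well, for example (assuming the same password as in the JAAS file):
bin/kafka-configs --zookeeper localhost:2181 --alter --add-config 'SCRAM-SHA-512=[iterations=4096,password=password]' --entity-type users --entity-name client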

Spring kafka application with multiple consumer groups stops consuming messages

Kafka version: 2.3.1
Spring Boot version: 2.2.5.RELEASE
I have a Spring Boot Kafka application with 3 consumer groups. It stops consuming messages because of a failing heartbeat. I tried updating the consumer configuration as suggested by multiple Stack Overflow threads, but even after that I am facing the issue.
According to the logs, consumers take less than one second per message right up to the point where consumption suddenly stops. Some of the processing in the consumer also happens in an asynchronous thread.
Below is the configuration for one of the consumer factories.
I allowed a buffer of 10 seconds per record and configured MAX_POLL_INTERVAL_MS_CONFIG based on that.
@Bean
public ConsumerFactory<Object, Object> reqConsumerFactory()
{
    Map<String, Object> props = new HashMap<>();
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "req-event-group");
    props.put(ConsumerConfig.CLIENT_ID_CONFIG, "req-event-group");
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, brokerConfig.getBootstrapAddress());
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
    props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
    props.put(ConsumerConfig.RETRY_BACKOFF_MS_CONFIG, 1000);
    props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 50*15*1000);
    props.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, 5000);
    props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 50*10*1000);
    props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 50);
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    return new DefaultKafkaConsumerFactory<>(props);
}

@Bean
public KafkaListenerContainerFactory<ConcurrentMessageListenerContainer<Object, Object>> reqKafkaListenerContainerFactory()
{
    ConcurrentKafkaListenerContainerFactory<Object, Object> factory = new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(reqConsumerFactory());
    factory.getContainerProperties().setAckMode(AckMode.MANUAL_IMMEDIATE);
    factory.setErrorHandler(new SeekToCurrentErrorHandler(2));
    factory.setConcurrency(2);
    return factory;
}
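For reference, the numbers in reqConsumerFactory() work out as follows: MAX_POLL_RECORDS_CONFIG = 50 records per poll, MAX_POLL_INTERVAL_MS_CONFIG = 50 * 10 * 1000 = 500,000 ms (the 10-second budget per record mentioned above), and SESSION_TIMEOUT_MS_CONFIG = 50 * 15 * 1000 = 750,000 ms with a 5,000 ms heartbeat interval.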
One of the consumer methods:
@KafkaListener(topicPattern = "${process.update.requirement.topic.name}", containerFactory = "reqKafkaListenerContainerFactory", groupId = "req-event-group")
public void handleCompleteAndErrorRequirement(ConsumerRecord<String, Object> consumerRecord, Acknowledgment acknowledgment)
{
    RequirementEventMsg requirementEventMsg = (RequirementEventMsg) consumerRecord.value();
    acknowledgment.acknowledge();
    //asynchronous method call here
}
I don't see any error other than this
2020-06-23 13:28:06.815 [INFO ] AbstractCoordinator:855 - [Consumer clientId=consumer-6, groupId=process-consumer-group] Attempt to heartbeat failed since group is rebalancing
2020-06-23 13:28:06.835 [INFO ] AbstractCoordinator:855 - [Consumer clientId=consumer-4, groupId=process-consumer-group] Attempt to heartbeat failed since group is rebalancing
2020-06-23 13:28:07.175 [INFO ] ConsumerCoordinator:472 - [Consumer clientId=consumer-4, groupId=process-consumer-group] Revoking previously assigned partitions [UPDATE_REQUIREMENT_TOPIC-1, UPDATE_REQUIREMENT_TOPIC-0]
2020-06-23 13:28:07.176 [INFO ] KafkaMessageListenerContainer:394 - partitions revoked: [UPDATE_REQUIREMENT_TOPIC-1, UPDATE_REQUIREMENT_TOPIC-0]
2020-06-23 13:28:07.177 [INFO ] AbstractCoordinator:509 - [Consumer clientId=consumer-4, groupId=process-consumer-group] (Re-)joining group
2020-06-23 13:28:07.233 [INFO ] ConsumerCoordinator:472 - [Consumer clientId=consumer-6, groupId=process-consumer-group] Revoking previously assigned partitions [PROCESS_EVENT_TOPIC-0, PROCESS_EVENT_TOPIC-1]
2020-06-23 13:28:07.233 [INFO ] KafkaMessageListenerContainer:394 - partitions revoked: [PROCESS_EVENT_TOPIC-0, PROCESS_EVENT_TOPIC-1]

KAFKA : splitting and retrying (1 attempts left). Error: MESSAGE_TOO_LARGE

I am sending 10 messages. 2 messages are "right" and 1 message is over 1 MB in size, which gets rejected by the Kafka broker due to a RecordTooLargeException.
I have 2 doubts:
1) MESSAGE_TOO_LARGE appears only from the second time the scheduler calls the method onwards. When the method is called for the first time by the scheduler, "splitting and retrying (1 attempts left). Error: MESSAGE_TOO_LARGE" does not appear.
2) Why are the retries not getting decreased? I have set retries=1.
I am calling the Sender class using the Spring Boot scheduling mechanism, something like this:
@Scheduled(fixedDelay = 30000)
public void process() {
    sender.sendThem();
}
I am using Spring Boot KafkaTemplate.
@Configuration
@EnableKafka
public class KakfaConfiguration {

    @Bean
    public ProducerFactory<String, String> producerFactory() {
        Map<String, Object> config = new HashMap<>();
        // props.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SSL");
        // props.put(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG, appProps.getJksLocation());
        // props.put(SslConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG, appProps.getJksPassword());
        config.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        config.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        config.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        config.put(ProducerConfig.ACKS_CONFIG, acks);
        config.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, retryBackOffMsConfig);
        config.put(ProducerConfig.RETRIES_CONFIG, retries);
        config.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
        config.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "prod-99");
        return new DefaultKafkaProducerFactory<>(config);
    }

    @Bean
    public KafkaTemplate<String, String> kafkaTemplate() {
        return new KafkaTemplate<>(producerFactory());
    }

    @Bean(name = "ktm")
    public KafkaTransactionManager<String, String> kafkaTransactionManager() {
        KafkaTransactionManager<String, String> ktm = new KafkaTransactionManager<>(producerFactory());
        ktm.setTransactionSynchronization(AbstractPlatformTransactionManager.SYNCHRONIZATION_ON_ACTUAL_TRANSACTION);
        return ktm;
    }
}
@Component
@EnableTransactionManagement
class Sender {

    @Autowired
    private KafkaTemplate<String, String> template;

    private static final Logger LOG = LoggerFactory.getLogger(Sender.class);

    @Transactional("ktm")
    public void sendThem(List<String> toSend) throws InterruptedException {
        List<ListenableFuture<SendResult<String, String>>> futures = new ArrayList<>();
        CountDownLatch latch = new CountDownLatch(toSend.size());
        ListenableFutureCallback<SendResult<String, String>> callback = new ListenableFutureCallback<SendResult<String, String>>() {

            @Override
            public void onSuccess(SendResult<String, String> result) {
                LOG.info(" message sucess : " + result.getProducerRecord().value());
                latch.countDown();
            }

            @Override
            public void onFailure(Throwable ex) {
                LOG.error("Message Failed ");
                latch.countDown();
            }
        };
        toSend.forEach(str -> {
            ListenableFuture<SendResult<String, String>> future = template.send("t_101", str);
            futures.add(future);
            future.addCallback(callback);
        });
        if (latch.await(12, TimeUnit.MINUTES)) {
            LOG.info("All sent ok");
        } else {
            for (int i = 0; i < toSend.size(); i++) {
                if (!futures.get(i).isDone()) {
                    LOG.error("No send result for " + toSend.get(i));
                }
            }
        }
    }
}
I am getting the following logs
2020-05-01 15:55:18.346 INFO 6476 --- [ scheduling-1] o.a.kafka.common.utils.AppInfoParser : Kafka startTimeMs: 1588328718345
2020-05-01 15:55:18.347 INFO 6476 --- [ scheduling-1] o.a.k.c.p.internals.TransactionManager : [Producer clientId=producer-prod-991, transactionalId=prod-991] ProducerId set to -1 with epoch -1
2020-05-01 15:55:18.351 INFO 6476 --- [oducer-prod-991] org.apache.kafka.clients.Metadata : [Producer clientId=producer-prod-991, transactionalId=prod-991] Cluster ID: bL-uhcXlRSWGaOaSeDpIog
2020-05-01 15:55:48.358 INFO 6476 --- [oducer-prod-991] o.a.k.c.p.internals.TransactionManager : [Producer clientId=producer-prod-991, transactionalId=prod-991] ProducerId set to 13000 with epoch 10
Value of kafka template----- 1518752790
2020-05-01 15:55:48.377 WARN 6476 --- [oducer-prod-991] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-prod-991, transactionalId=prod-991] Got error produce response in correlation id 8 on topic-partition t_101-2, splitting and retrying (1 attempts left). Error: MESSAGE_TOO_LARGE
2020-05-01 15:55:48.379 INFO 6476 --- [oducer-prod-991] com.a.kafkaproducer.producer.Sender : message sucess : TTTT0
2020-05-01 15:55:48.379 INFO 6476 --- [oducer-prod-991] com.a.kafkaproducer.producer.Sender : message sucess : TTTT1
2020-05-01 15:55:48.511 ERROR 6476 --- [oducer-prod-991] com.a.kafkaproducer.producer.Sender : Message Failed
2020-05-01 15:55:48.512 ERROR 6476 --- [oducer-prod-991] o.s.k.support.LoggingProducerListener : Exception thrown when sending a message with key='null' and payload='
2020-05-01 15:55:48.514 WARN 6476 --- [oducer-prod-991] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-prod-991, transactionalId=prod-991] Got error produce response in correlation id 10 on topic-partition t_101-2, splitting and retrying (1 attempts left). Error: MESSAGE_TOO_LARGE
2020-05-01 15:55:48.518 WARN 6476 --- [oducer-prod-991] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-prod-991, transactionalId=prod-991] Got error produce response in correlation id 11 on topic-partition t_101-2, splitting and retrying (1 attempts left). Error: MESSAGE_TOO_LARGE
2020-05-01 15:55:48.523 WARN 6476 --- [oducer-prod-991] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-prod-991, transactionalId=prod-991] Got error produce response in correlation id 12 on topic-partition t_101-2, splitting and retrying (1 attempts left). Error: MESSAGE_TOO_LARGE
2020-05-01 15:55:48.527 WARN 6476 --- [oducer-prod-991] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-prod-991, transactionalId=prod-991] Got error produce response in correlation id 13 on topic-partition t_101-2, splitting and retrying (1 attempts left). Error: MESSAGE_TOO_LARGE
2020-05-01 15:55:48.531 WARN 6476 --- [oducer-prod-991] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-prod-991, transactionalId=prod-991] Got error produce response in correlation id 14 on topic-partition t_101-2, splitting and retrying (1 attempts left). Error: MESSAGE_TOO_LARGE
2020-05-01 15:55:48.534 WARN 6476 --- [oducer-prod-991] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-prod-991, transactionalId=prod-991] Got error produce response in correlation id 15 on topic-partition t_101-2, splitting and retrying (1 attempts left). Error: MESSAGE_TOO_LARGE
2020-05-01 15:55:48.538 WARN 6476 --- [oducer-prod-991] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-prod-991, transactionalId=prod-991] Got error produce response in correlation id 16 on topic-partition t_101-2, splitting and retrying (1 attempts left). Error: MESSAGE_TOO_LARGE
2020-05-01 15:55:48.542 WARN 6476 --- [oducer-prod-991] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-prod-991, transactionalId=prod-991] Got error produce response in correlation id 17 on topic-partition t_101-2, splitting and retrying (1 attempts left). Error: MESSAGE_TOO_LARGE
2020-05-01 15:55:48.546 WARN 6476 --- [oducer-prod-991] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-prod-991, transactionalId=prod-991] Got error produce response in correlation id 18 on topic-partition t_101-2, splitting and retrying (1 attempts left). Error: MESSAGE_TOO_LARGE
Then, after some time, the program completes with the following log:
Caused by: org.apache.kafka.common.errors.TimeoutException: Expiring 2 record(s) for t_101-0:120000 ms has passed since batch creation
2020-05-01 16:18:31.322 WARN 17816 --- [ scheduling-1] o.s.k.core.DefaultKafkaProducerFactory : Error during transactional operation; producer removed from cache; possible cause: broker restarted during transaction: CloseSafeProducer [delegate=org.apache.kafka.clients.producer.KafkaProducer#7085a4dd, txId=prod-991]
2020-05-01 16:18:31.322 INFO 17816 --- [ scheduling-1] o.a.k.clients.producer.KafkaProducer : [Producer clientId=producer-prod-991, transactionalId=prod-991] Closing the Kafka producer with timeoutMillis = 5000 ms.
2020-05-01 16:18:31.324 INFO 17816 --- [oducer-prod-991] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-prod-991, transactionalId=prod-991] Aborting incomplete transaction due to shutdown
error message here
------ processing done in parent class------
A broad picture of the producer workflow is given below.
By setting the RETRIES_CONFIG property, we can guarantee that in case of failure this producer will try to resend the message.
If a batch is too large, the producer splits it and sends the split batches again. The retry attempts are not decremented in this case.
You can go through the source code linked below to find the scenarios in which the retry count is decremented.
https://github.com/apache/kafka/blob/68ac551966e2be5b13adb2f703a01211e6f7a34b/clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java
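For illustration only, here is a rough, hypothetical paraphrase of that logic (the names below are made up for the sketch; the real code is in the Sender.java link above):
// Hypothetical sketch of the behaviour described above, not the actual Kafka source.
void completeBatch(ProducerBatch batch, Errors error) {
    if (error == Errors.MESSAGE_TOO_LARGE && batch.recordCount() > 1) {
        // The oversized batch is split and the halves are re-enqueued.
        // The retry counter is NOT decremented here, which is why the log
        // keeps printing "splitting and retrying (1 attempts left)".
        accumulator.splitAndReenqueue(batch);
    } else if (canRetry(batch, error)) {
        // Only a plain resend of the same batch consumes one of the configured retries.
        reenqueueBatch(batch);
    } else {
        // Out of retries or a non-retriable error: fail the batch, which is what
        // eventually surfaces as the TimeoutException and the onFailure callback.
        failBatch(batch, error);
    }
}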

Enabling exactly once causes streams shutdown due to timeout while initializing transactional state

I've written a simple example to test the join functionality. As I sometimes get duplicated messages in the resulting topic, and sometimes messages are missing from it, I thought I would enable exactly-once semantics while pinpointing the problem. I did this through:
props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);
However, I then get a timeout that causes Kafka Streams to shut down in my app:
2019-05-02 17:02:32.585 INFO 153056 --- [-StreamThread-1] o.a.kafka.common.utils.AppInfoParser : Kafka version : 2.0.1
2019-05-02 17:02:32.585 INFO 153056 --- [-StreamThread-1] o.a.kafka.common.utils.AppInfoParser : Kafka commitId : fa14705e51bd2ce5
2019-05-02 17:02:32.593 INFO 153056 --- [-StreamThread-1] o.a.k.c.p.internals.TransactionManager : [Producer clientId=join-test-90a0aa93-dfd8-4d4f-894b-85a3c5634f72-StreamThread-1-0_0-producer, transactionalId=join-test-0_0] ProducerId set to -1 with epoch -1
2019-05-02 17:03:32.599 ERROR 153056 --- [-StreamThread-1] o.a.k.s.p.internals.StreamThread : stream-thread [join-test-90a0aa93-dfd8-4d4f-894b-85a3c5634f72-StreamThread-1] Error caught during partition assignment, will abort the current process and re-throw at the end of rebalance: {}
org.apache.kafka.common.errors.TimeoutException: Timeout expired while initializing transactional state in 60000ms.
2019-05-02 17:03:32.599 INFO 153056 --- [-StreamThread-1] o.a.k.s.p.internals.StreamThread : stream-thread [join-test-90a0aa93-dfd8-4d4f-894b-85a3c5634f72-StreamThread-1] partition assignment took 60044 ms.
current active tasks: []
current standby tasks: []
previous active tasks: []
2019-05-02 17:03:32.601 INFO 153056 --- [-StreamThread-1] o.a.k.s.p.internals.StreamThread : stream-thread [join-test-90a0aa93-dfd8-4d4f-894b-85a3c5634f72-StreamThread-1] State transition from PARTITIONS_ASSIGNED to PENDING_SHUTDOWN
2019-05-02 17:03:32.601 INFO 153056 --- [-StreamThread-1] o.a.k.s.p.internals.StreamThread : stream-thread [join-test-90a0aa93-dfd8-4d4f-894b-85a3c5634f72-StreamThread-1] Shutting down
2019-05-02 17:03:32.615 INFO 153056 --- [-StreamThread-1] o.a.k.s.p.internals.StreamThread : stream-thread [join-test-90a0aa93-dfd8-4d4f-894b-85a3c5634f72-StreamThread-1] State transition from PENDING_SHUTDOWN to DEAD
2019-05-02 17:03:32.615 INFO 153056 --- [-StreamThread-1] org.apache.kafka.streams.KafkaStreams : stream-client [join-test-90a0aa93-dfd8-4d4f-894b-85a3c5634f72] State transition from REBALANCING to ERROR
2019-05-02 17:03:32.615 WARN 153056 --- [-StreamThread-1] org.apache.kafka.streams.KafkaStreams : stream-client [join-test-90a0aa93-dfd8-4d4f-894b-85a3c5634f72] All stream threads have died. The instance will be in error state and should be closed.
2019-05-02 17:03:32.615 INFO 153056 --- [-StreamThread-1] o.a.k.s.p.internals.StreamThread : stream-thread [join-test-90a0aa93-dfd8-4d4f-894b-85a3c5634f72-StreamThread-1] Shutdown complete
Exception in thread "join-test-90a0aa93-dfd8-4d4f-894b-85a3c5634f72-StreamThread-1" org.apache.kafka.streams.errors.StreamsException: stream-thread [join-test-90a0aa93-dfd8-4d4f-894b-85a3c5634f72-StreamThread-1] Failed to rebalance.
at org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:870)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:810)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:767)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:736)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timeout expired while initializing transactional state in 60000ms.
static String ORIGINAL = "original-sensor-data";
static String ERROR = "error-score";

public static void main(String[] args) throws IOException {
    SpringApplication.run(JoinTest.class, args);
    Properties props = getProperties();
    final StreamsBuilder builder = new StreamsBuilder();
    final KStream<String, OriginalSensorData> original = builder.stream(ORIGINAL, Consumed.with(Serdes.String(), new OriginalSensorDataSerde()));
    final KStream<String, ErrorScore> error = builder.stream(ERROR, Consumed.with(Serdes.String(), new ErrorScoreSerde()));
    KStream<String, ErrorScore> result = original.join(
            error,
            (originalValue, errorValue) -> new ErrorScore(new Date(originalValue.getTimestamp()), errorValue.getE(),
                    originalValue.getData().get("TE700PV").doubleValue(), errorValue.getT(), errorValue.getR()),
            // KStream-KStream joins are always windowed joins, hence we must provide a join window.
            JoinWindows.of(Duration.ofMillis(3000).toMillis()),
            Joined.with(
                    Serdes.String(), /* key */
                    new OriginalSensorDataSerde(), /* left value */
                    new ErrorScoreSerde() /* right value */
            )
    ).through("atl-joined-data-repartition", Produced.with(Serdes.String(), new ErrorScoreSerde()));
    result.foreach((key, value) -> System.out.println("Join Stream: " + key + " " + value));
    KafkaStreams streams = new KafkaStreams(builder.build(), props);
    streams.start();
}

private static Properties getProperties() {
    Properties props = new Properties();
    //Url of the kafka broker, this can also be found in the Aiven console
    props.put("bootstrap.servers", "localhost:9095");
    props.put("group.id", "join-test");
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
    props.put("application.id", "join-test");
    props.put("default.timestamp.extractor", "com.my.SensorDataTimestampExtractor");
    //The key of a message is a string
    props.put("key.deserializer", StringDeserializer.class.getName());
    props.put("value.deserializer", StringDeserializer.class.getName());
    props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);
    props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 1000);
    return props;
}
I'm expecting the app to start without the timeout and keep working.
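One thing that may be worth checking on a small development cluster: exactly-once needs the internal __transaction_state topic, and the broker defaults for that topic assume three brokers (replication factor 3, min ISR 2). On a single-broker setup, the InitProducerId call can then block until exactly this kind of 60-second timeout unless the broker overrides are lowered, for example in server.properties (assuming a one-broker test cluster):
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1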

Spring Boot always tries to reconnect the failed node in a Redis cluster environment

I have a Redis cluster with 3 shards. Each shard has 2 nodes, 1 primary and 1 replica. I'm using Spring Boot 2.0.1, and the following is the configuration and code I'm using to create the Redis cluster connection.
pom.xml:
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-redis</artifactId>
    </dependency>
</dependencies>
application.yml:
redis:
  cluster:
    nodes: 172.18.0.155:7010,172.18.0.155:7011,172.18.0.155:7012,172.18.0.156:7020,172.18.0.156:7021,172.18.0.156:7022
    max-redirects: 3
  timeout: 5000
  lettuce:
    pool:
      max-active: 200
      max-idle: 8
      min-idle: 0
      max-wait: 1000
  database: 0
RedisConfig.java:
package com.central.redis.config;

import org.springframework.boot.autoconfigure.condition.ConditionalOnProperty;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Primary;
import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.data.redis.serializer.RedisSerializer;
import org.springframework.data.redis.serializer.StringRedisSerializer;
import com.central.redis.config.util.RedisObjectSerializer;

@Configuration
public class RedisConfig {

    @Primary
    @Bean("redisTemplate")
    @ConditionalOnProperty(name = "spring.redis.cluster.nodes", matchIfMissing = false)
    public RedisTemplate<String, Object> getRedisTemplate(RedisConnectionFactory factory) {
        RedisTemplate<String, Object> redisTemplate = new RedisTemplate<String, Object>();
        redisTemplate.setConnectionFactory(factory);
        RedisSerializer stringSerializer = new StringRedisSerializer();
        // RedisSerializer redisObjectSerializer = new RedisObjectSerializer();
        RedisSerializer redisObjectSerializer = new RedisObjectSerializer();
        redisTemplate.setKeySerializer(stringSerializer);
        redisTemplate.setHashKeySerializer(stringSerializer);
        redisTemplate.setValueSerializer(redisObjectSerializer);
        redisTemplate.afterPropertiesSet();
        redisTemplate.opsForValue().set("hello", "wolrd");
        return redisTemplate;
    }

    @Primary
    @Bean("redisTemplate")
    @ConditionalOnProperty(name = "spring.redis.host", matchIfMissing = true)
    public RedisTemplate<String, Object> getSingleRedisTemplate(RedisConnectionFactory factory) {
        RedisTemplate<String, Object> redisTemplate = new RedisTemplate<String, Object>();
        redisTemplate.setConnectionFactory(factory);
        redisTemplate.setKeySerializer(new StringRedisSerializer());
        redisTemplate.setValueSerializer(new RedisObjectSerializer());
        redisTemplate.afterPropertiesSet();
        return redisTemplate;
    }
}
We recently had an issue where the primary node in one shard of the cluster had a problem, which triggered a failover. The shard had two nodes, 001 (primary) and 002 (replica); 001 failed over, and 002 became primary.
After that, the app kept trying to reconnect the failed node, and access to the Redis cluster started to fail. My assumption was that even if one node failed, the client should automatically refresh the topology and connect to the new master node. But it didn't.
Here are the logs:
2018-11-03 17:46:21.992 [main] INFO org.springframework.jmx.export.annotation.AnnotationMBeanExporter - Located managed bean 'environmentManager': registering with JMX server as MBean [org.springframework.cloud.context.environment:name=environmentManager,type=EnvironmentManager]
2018-11-03 17:46:22.015 [main] INFO org.springframework.jmx.export.annotation.AnnotationMBeanExporter - Located MBean 'dataSourceLog': registering with JMX server as MBean [com.alibaba.druid.spring.boot.autoconfigure:name=dataSourceLog,type=DruidDataSourceWrapper]
2018-11-03 17:46:22.022 [main] INFO org.springframework.jmx.export.annotation.AnnotationMBeanExporter - Located managed bean 'refreshScope': registering with JMX server as MBean [org.springframework.cloud.context.scope.refresh:name=refreshScope,type=RefreshScope]
2018-11-03 17:46:22.055 [main] INFO org.springframework.jmx.export.annotation.AnnotationMBeanExporter - Located managed bean 'configurationPropertiesRebinder': registering with JMX server as MBean [org.springframework.cloud.context.properties:name=configurationPropertiesRebinder,context=70a9f84e,type=ConfigurationPropertiesRebinder]
2018-11-03 17:46:22.075 [main] INFO org.springframework.jmx.export.annotation.AnnotationMBeanExporter - Located MBean 'dataSourceCore': registering with JMX server as MBean [com.alibaba.druid.spring.boot.autoconfigure:name=dataSourceCore,type=DruidDataSourceWrapper]
2018-11-03 17:46:22.078 [main] INFO org.springframework.jmx.export.annotation.AnnotationMBeanExporter - Located MBean 'statFilter': registering with JMX server as MBean [com.alibaba.druid.filter.stat:name=statFilter,type=StatFilter]
2018-11-03 17:46:22.101 [main] INFO org.springframework.context.support.DefaultLifecycleProcessor - Starting beans in phase 0
2018-11-03 17:46:22.136 [main] INFO org.springframework.cloud.netflix.eureka.InstanceInfoFactory - Setting initial instance status as: STARTING
2018-11-03 17:46:22.203 [main] INFO com.netflix.discovery.DiscoveryClient - Initializing Eureka in region us-east-1
2018-11-03 17:46:22.317 [main] INFO com.netflix.discovery.provider.DiscoveryJerseyProvider - Using JSON encoding codec LegacyJacksonJson
2018-11-03 17:46:22.317 [main] INFO com.netflix.discovery.provider.DiscoveryJerseyProvider - Using JSON decoding codec LegacyJacksonJson
2018-11-03 17:46:22.557 [main] INFO com.netflix.discovery.provider.DiscoveryJerseyProvider - Using XML encoding codec XStreamXml
2018-11-03 17:46:22.558 [main] INFO com.netflix.discovery.provider.DiscoveryJerseyProvider - Using XML decoding codec XStreamXml
2018-11-03 17:46:23.144 [main] INFO com.netflix.discovery.shared.resolver.aws.ConfigClusterResolver - Resolving eureka endpoints via configuration
2018-11-03 17:46:23.188 [main] INFO com.netflix.discovery.DiscoveryClient - Disable delta property : false
2018-11-03 17:46:23.189 [main] INFO com.netflix.discovery.DiscoveryClient - Single vip registry refresh property : null
2018-11-03 17:46:23.189 [main] INFO com.netflix.discovery.DiscoveryClient - Force full registry fetch : false
2018-11-03 17:46:23.189 [main] INFO com.netflix.discovery.DiscoveryClient - Application is null : false
2018-11-03 17:46:23.189 [main] INFO com.netflix.discovery.DiscoveryClient - Registered Applications size is zero : true
2018-11-03 17:46:23.189 [main] INFO com.netflix.discovery.DiscoveryClient - Application version is -1: true
2018-11-03 17:46:23.189 [main] INFO com.netflix.discovery.DiscoveryClient - Getting all instance registry info from the eureka server
2018-11-03 17:46:23.578 [main] INFO com.netflix.discovery.DiscoveryClient - The response status is 200
2018-11-03 17:46:23.587 [main] INFO com.netflix.discovery.DiscoveryClient - Starting heartbeat executor: renew interval is: 10
2018-11-03 17:46:23.596 [main] INFO com.netflix.discovery.InstanceInfoReplicator - InstanceInfoReplicator onDemand update allowed rate per min is 4
2018-11-03 17:46:23.602 [main] INFO com.netflix.discovery.DiscoveryClient - Discovery Client initialized at timestamp 1541238383601 with initial instances count: 7
2018-11-03 17:46:23.622 [main] INFO org.springframework.cloud.netflix.eureka.serviceregistry.EurekaServiceRegistry - Registering application AUTH-SERVER with eureka with status UP
2018-11-03 17:46:23.624 [main] INFO com.netflix.discovery.DiscoveryClient - Saw local status change event StatusChangeEvent [timestamp=1541238383623, current=UP, previous=STARTING]
2018-11-03 17:46:23.634 [DiscoveryClient-InstanceInfoReplicator-0] INFO com.netflix.discovery.DiscoveryClient - DiscoveryClient_AUTH-SERVER/auth-server:172.18.0.153:8000 : registering service...
2018-11-03 17:46:23.636 [main] INFO org.springframework.context.support.DefaultLifecycleProcessor - Starting beans in phase 2147483647
2018-11-03 17:46:23.637 [main] INFO springfox.documentation.spring.web.plugins.DocumentationPluginsBootstrapper - Context refreshed
2018-11-03 17:46:23.706 [main] INFO springfox.documentation.spring.web.plugins.DocumentationPluginsBootstrapper - Found 1 custom documentation plugin(s)
2018-11-03 17:46:23.720 [DiscoveryClient-InstanceInfoReplicator-0] INFO com.netflix.discovery.DiscoveryClient - DiscoveryClient_AUTH-SERVER/auth-server:172.18.0.153:8000 - registration status: 204
2018-11-03 17:46:23.908 [DiscoveryClient-InstanceInfoReplicator-0] INFO com.alibaba.druid.pool.DruidDataSource - {dataSource-1} inited
2018-11-03 17:46:23.940 [main] INFO springfox.documentation.spring.web.scanners.ApiListingReferenceScanner - Scanning for api listing references
2018-11-03 17:46:24.391 [main] INFO springfox.documentation.spring.web.readers.operation.CachingOperationNameGenerator - Generating unique operation named: rolesUsingGET_1
2018-11-03 17:46:24.525 [main] INFO springfox.documentation.spring.web.readers.operation.CachingOperationNameGenerator - Generating unique operation named: getUserTokenInfoUsingPOST_1
2018-11-03 17:46:24.559 [main] INFO springfox.documentation.spring.web.readers.operation.CachingOperationNameGenerator - Generating unique operation named: deleteUsingDELETE_1
2018-11-03 17:46:24.578 [main] INFO springfox.documentation.spring.web.readers.operation.CachingOperationNameGenerator - Generating unique operation named: saveOrUpdateUsingPOST_1
2018-11-03 17:46:24.893 [DiscoveryClient-InstanceInfoReplicator-0] INFO com.alibaba.druid.pool.DruidDataSource - {dataSource-2} inited
2018-11-03 17:46:24.947 [main] INFO org.springframework.scheduling.annotation.ScheduledAnnotationBeanPostProcessor - No TaskScheduler/ScheduledExecutorService bean found for scheduled processing
2018-11-03 17:46:24.969 [main] INFO org.apache.coyote.http11.Http11NioProtocol - Starting ProtocolHandler ["http-nio-8000"]
2018-11-03 17:46:24.971 [main] INFO org.apache.tomcat.util.net.NioSelectorPool - Using a shared selector for servlet write/read
2018-11-03 17:46:25.027 [main] INFO org.springframework.boot.web.embedded.tomcat.TomcatWebServer - Tomcat started on port(s): 8000 (http) with context path ''
2018-11-03 17:46:25.029 [main] INFO org.springframework.cloud.netflix.eureka.serviceregistry.EurekaAutoServiceRegistration - Updating port to 8000
2018-11-03 17:46:25.034 [main] INFO com.central.OpenAuthServerApp - Started OpenAuthServerApp in 31.95 seconds (JVM running for 33.267)
2018-11-03 17:48:47.239 [lettuce-eventExecutorLoop-1-1] INFO io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was /172.18.0.156:7022
2018-11-03 17:48:47.239 [lettuce-eventExecutorLoop-1-2] INFO io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was /172.18.0.156:7022
2018-11-03 17:48:56.236 [lettuce-eventExecutorLoop-1-2] INFO io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:48:56.236 [lettuce-eventExecutorLoop-1-1] INFO io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:49:04.436 [lettuce-eventExecutorLoop-1-3] INFO io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:49:04.436 [lettuce-eventExecutorLoop-1-2] INFO io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:49:20.835 [lettuce-eventExecutorLoop-1-1] INFO io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:49:20.835 [lettuce-eventExecutorLoop-1-3] INFO io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:49:50.935 [lettuce-eventExecutorLoop-1-2] INFO io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:49:50.935 [lettuce-eventExecutorLoop-1-1] INFO io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:50:21.035 [lettuce-eventExecutorLoop-1-3] INFO io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:50:21.037 [lettuce-eventExecutorLoop-1-2] INFO io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:50:51.135 [lettuce-eventExecutorLoop-1-1] INFO io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:50:51.135 [lettuce-eventExecutorLoop-1-3] INFO io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:51:21.235 [lettuce-eventExecutorLoop-1-1] INFO io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:51:21.236 [lettuce-eventExecutorLoop-1-2] INFO io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:51:23.191 [AsyncResolver-bootstrap-executor-0] INFO com.netflix.discovery.shared.resolver.aws.ConfigClusterResolver - Resolving eureka endpoints via configuration
2018-11-03 17:51:51.335 [lettuce-eventExecutorLoop-1-3] INFO io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:51:51.335 [lettuce-eventExecutorLoop-1-2] INFO io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:52:21.435 [lettuce-eventExecutorLoop-1-1] INFO io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:52:21.435 [lettuce-eventExecutorLoop-1-3] INFO io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:52:51.535 [lettuce-eventExecutorLoop-1-1] INFO io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:52:51.535 [lettuce-eventExecutorLoop-1-2] INFO io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
As you can see, there are no error logs here, just some Lettuce ConnectionWatchdog reconnect messages after the primary node failed. I know the reconnect behaviour is normal, but why would it affect access to the Redis cluster?
Has anyone met this problem before? Did I miss anything important?
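For context, Lettuce does not refresh the cluster topology periodically or adaptively unless it is told to. A minimal sketch of enabling that through a custom connection factory might look like the following (class and builder names are from Lettuce / Spring Data Redis and should be checked against the exact versions in use; the node list is the one from application.yml):
import java.time.Duration;
import java.util.Arrays;
import io.lettuce.core.cluster.ClusterClientOptions;
import io.lettuce.core.cluster.ClusterTopologyRefreshOptions;
import org.springframework.context.annotation.Bean;
import org.springframework.data.redis.connection.RedisClusterConfiguration;
import org.springframework.data.redis.connection.lettuce.LettuceClientConfiguration;
import org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory;

@Bean
public LettuceConnectionFactory redisConnectionFactory() {
    // Re-read the cluster topology every 30 seconds and also whenever Lettuce
    // sees MOVED/ASK redirects or persistent reconnect attempts.
    ClusterTopologyRefreshOptions refreshOptions = ClusterTopologyRefreshOptions.builder()
            .enablePeriodicRefresh(Duration.ofSeconds(30))
            .enableAllAdaptiveRefreshTriggers()
            .build();

    LettuceClientConfiguration clientConfig = LettuceClientConfiguration.builder()
            .clientOptions(ClusterClientOptions.builder()
                    .topologyRefreshOptions(refreshOptions)
                    .build())
            .build();

    RedisClusterConfiguration clusterConfig = new RedisClusterConfiguration(Arrays.asList(
            "172.18.0.155:7010", "172.18.0.155:7011", "172.18.0.155:7012",
            "172.18.0.156:7020", "172.18.0.156:7021", "172.18.0.156:7022"));

    return new LettuceConnectionFactory(clusterConfig, clientConfig);
}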

Resources