Exactly Once Two Kafka Clusters - spring-boot

I have 2 Kafka clusters. Cluster A and Cluster B. These clusters are completely separate.
I have a spring-boot application that listens to a topic on Cluster A transforms the event and then produces it onto Cluster B. I require exactly once as these are financial events. I have noticed that with my current application I sometimes get duplicates as well as miss some events. I have tried to implement exactly once as best I could. One of the posts said flink would be a better option over spring-boot. Should i move over to flink? Please see the Spring code below.
ConsumerConfig
#Configuration
public class KafkaConsumerConfig {
#Value("${kafka.server.consumer}")
String server;
#Value("${kafka.kerberos.service.name:}")
String kerberosServiceName;
#Bean
public Map<String, Object> consumerConfigs() {
Map<String, Object> config = new HashMap<>();
config.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, server);
config.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
config.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 30000);
config.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
config.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, AvroDeserializer.class);
config.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
// skip kerberos if no value is provided
if (kerberosServiceName.length() > 0) {
config.put("security.protocol", "SASL_PLAINTEXT");
config.put("sasl.kerberos.service.name", kerberosServiceName);
}
return config;
}
#Bean
public ConsumerFactory<String, AccrualSchema> consumerFactory() {
return new DefaultKafkaConsumerFactory<>(consumerConfigs(), new StringDeserializer(),
new AvroDeserializer<>(AccrualSchema.class));
}
#Bean
public ConcurrentKafkaListenerContainerFactory<String, AccrualSchema> kafkaListenerContainerFactory(ConsumerErrorHandler errorHandler) {
ConcurrentKafkaListenerContainerFactory<String, AccrualSchema> factory = new ConcurrentKafkaListenerContainerFactory<>();
factory.setConsumerFactory(consumerFactory());
factory.setAutoStartup(true);
factory.getContainerProperties().setAckMode(AckMode.RECORD);
factory.setErrorHandler(errorHandler);
return factory;
}
#Bean
public KafkaConsumerAccrual receiver() {
return new KafkaConsumerAccrual();
}
}
ProducerConfig
#Configuration
public class KafkaProducerConfig {
#Value("${kafka.server.producer}")
String server;
#Bean
public ProducerFactory<String, String> producerFactory() {
Map<String, Object> config = new HashMap<>();
config.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, server);
config.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
config.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
config.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
config.put(ProducerConfig.ACKS_CONFIG, "all");
config.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "prod-1");
config.put(ProducerConfig.COMPRESSION_TYPE_CONFIG,"snappy");
config.put(ProducerConfig.LINGER_MS_CONFIG, "10");
config.put(ProducerConfig.BATCH_SIZE_CONFIG, Integer.toString(32*1024));
return new DefaultKafkaProducerFactory<>(config);
}
#Bean
public KafkaTemplate<String, String> kafkaTemplate() {
return new KafkaTemplate<>(producerFactory());
}
}
KafkaProducer
#Service
public class KafkaTopicProducer {
#Autowired
private KafkaTemplate<String, String> kafkaTemplate;
public void topicProducer(String payload, String topic) {
kafkaTemplate.executeInTransaction(kt->kt.send(topic, payload));
}
}
KafkaConsumer
public class KafkaConsumerAccrual {
#Autowired
KafkaTopicProducer kafkaTopicProducer;
#Autowired
Gson gson;
#KafkaListener(topics = "topicname", groupId = "groupid", id = "listenerid")
public void consume(AccrualSchema accrual,
#Header(KafkaHeaders.RECEIVED_PARTITION_ID) Integer partition, #Header(KafkaHeaders.OFFSET) Long offset,
#Header(KafkaHeaders.CONSUMER) Consumer<?, ?> consumer) {
AccrualEntity accrualEntity = convertBusinessObjectToAccrual(accrual,partition,offset);
kafkaTopicProducer.topicProducer(gson.toJson(accrualEntity, AccrualEntity.class), accrualTopic);
}
public AccrualEntity convertBusinessObjectToAccrual(AccrualSchema ao, Integer partition,
Long offset) {
//Transform code goes here
return ae;
}
}

Exactly Once semantics are not supported across clusters; the key is producing records and committing the consumed offset(s) within one atomic transaction.
That is not possible in your environment since a transaction can't span two clusters.

According to this specific Draft KIP :
https://cwiki.apache.org/confluence/display/KAFKA/KIP-656%3A+MirrorMaker2+Exactly-once+Semantics
The better thing that you can expect right know is an at-least-one semantic between two clusters.
It implies that you have to protect from duplication on the consumer side of the target broker.
As an example, you can identify a set a properties that must be unique for a specific windows of time. But i will really depend on your use case.

Related

Kafka multiple consumer instances does not receive message

i have an app that i want to open in multiple instances and i need each instance to receive the message.Actually each instance receives message only if the instance have different group id.I put more partitions than actual instances and it does not work.The app works as consumer/producer at the same time.
code:
#EnableKafka
#Configuration
public class KafkaProducerConfigForDepartment {
#Value(value = "${kafka.bootstrapAddress}")
private String bootstrapAddress;
#Bean
public ProducerFactory<String, MessageEventForDepartment> producerFactoryForDepartment() {
Map<String, Object> configProps = new HashMap<>();
configProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapAddress);
configProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
configProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, JsonSerializer.class);
return new DefaultKafkaProducerFactory<>(configProps);
}
#Bean
public NewTopic topic1() {
return TopicBuilder.name("MARCEL")
.partitions(10)
.compact()
.build();
}
#Bean
public KafkaTemplate<String, MessageEventForDepartment> kafkaTemplate() {
return new KafkaTemplate<>(producerFactoryForDepartment());
}
}
#Configuration
public class KafkaTopicConfig {
#Value(value = "${kafka.bootstrapAddress}")
private String bootstrapAddress;
/* #Value(value = "${kafka.configId}")
private String configId;*/
#Bean
public ConsumerFactory<String, MessageEventForDepartment> consumerFactoryForDepartments() {
Map<String, Object> props = new HashMap<>();
props.put(JsonDeserializer.TRUSTED_PACKAGES, "*");
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapAddress);
//props.put(ConsumerConfig.GROUP_ID_CONFIG, configId);
return new DefaultKafkaConsumerFactory<>(props, new StringDeserializer(), new JsonDeserializer<>(MessageEventForDepartment.class));
}
#Bean
public ConcurrentKafkaListenerContainerFactory<String, MessageEventForDepartment>
kafkaListenerContainerFactoryForDepartments() {
ConcurrentKafkaListenerContainerFactory<String, MessageEventForDepartment> factory =
new ConcurrentKafkaListenerContainerFactory<>();
factory.setConsumerFactory(consumerFactoryForDepartments());
return factory;
}
}
#Component
#Slf4j
public class DepartmentKafkaService {
#Autowired
private DepartmentRepository departmentRepository;
#KafkaListener(topics = "MARCEL" , groupId = "ren",containerFactory = "kafkaListenerContainerFactoryForDepartments")
public void listenGroupFoo(MessageEventForDepartment message) {
...
Do you know why this happening because i thought if i will have more partitions per group than instances this will work?
That is how Kafka is designed; for multiple instances to get all the records, they must be in different groups.
When they are in the same group, the partitions are distributed across the active instances. Each time a new member arrives, or leaves, the partitions are rebalanced

How to use "li-apache-kafka-clients" in spring boot app to send large message (above 1MB) from Kafka producer?

How to use li-apache-kafka-clients in spring boot app to send large message (above 1MB) from Kafka producer to Kafka Consumer? Below is the GitHub link of li-apache-kafka-clients:
https://github.com/linkedin/li-apache-kafka-clients
I have imported .jar file of li-apache-kafka-clients and put the below configuration for producer:
props.put("large.message.enabled", "true");
props.put("max.message.segment.bytes", 1000 * 1024);
props.put("segment.serializer", DefaultSegmentSerializer.class.getName());
and for consumer:
message.assembler.buffer.capacity,
max.tracked.messages.per.partition,
exception.on.message.dropped,
segment.deserializer.class
but still getting error for large message. Please help me to solve error.
Below is my code, please let me know where I need to create LiKafkaProducer:
#Configuration
public class KafkaProducerConfig {
#Value("${kafka.boot.server}")
private String kafkaServer;
#Bean
public ProducerFactory<String, String> producerFactory() {
return new DefaultKafkaProducerFactory<>(producerConfig());
}
#Bean
public KafkaTemplate<String, String> kafkaTemplate() {
return new KafkaTemplate<String, String>(producerFactory());
}
#Bean
public Map<String, Object> producerConfig() {
// TODO Auto-generated method stub
Map<String, Object> config = new HashMap<>();
config.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
config.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
config.put("bootstrap.servers", "localhost:9092");
config.put("acks", "all");
config.put("retries", 0);
config.put("batch.size", 16384);
config.put("linger.ms", 1);
config.put("buffer.memory", 33554432);
// The following properties are used by LiKafkaProducerImpl
config.put("large.message.enabled", "true");
config.put("max.message.segment.bytes", 1000 * 1024);
config.put("segment.serializer", DefaultSegmentSerializer.class.getName());
config.put("auditor.class", LoggingAuditor.class.getName());
return config;
}
}
#RestController
#RequestMapping("/kafkaProducer")
public class KafkaProducerController {
#Autowired
private KafkaSender sender;
#PostMapping
public ResponseEntity<List<Student>> sendData(#RequestBody List<Student> student){
sender.sendData(student);
return new ResponseEntity<List<Student>>(student, HttpStatus.OK);
}
}
#Service
public class KafkaSender {
private static final Logger LOGGER = LoggerFactory.getLogger(KafkaSender.class);
#Autowired
private KafkaTemplate<String, String> kafkaTemplate;
#Value("${kafka.topic.name}")
private String topicName;
public void sendData(List<Student> student) {
// TODO Auto-generated method stub
Map<String, Object> headers = new HashMap<>();
headers.put(KafkaHeaders.TOPIC, topicName);
headers.put("payload", student.get(0));
// Construct a JSONObject from a Map.
JSONObject HeaderObject = new JSONObject(headers);
System.out.println("\nMethod-2: Using new JSONObject() ==> " + HeaderObject);
final String record = HeaderObject.toString();
Message<String> message = MessageBuilder.withPayload(record).setHeader(KafkaHeaders.TOPIC, topicName)
.setHeader(KafkaHeaders.MESSAGE_KEY, "Message")
.build();
kafkaTemplate.send(topicName, message.toString());
}
}
You would need to implement your own ConsumerFactory and ProducerFactory to create the LiKafkaConsumer and LiKafkaProducer respectively.
You should be able to subclass the default factories provided by the framework.

#KafkaListener in Unit test case does not consume from the container factory

I wrote a JUnit test case to test the code in the "With Java Configuration" lesson in the Spring Kafka docs. (https://docs.spring.io/spring-kafka/reference/htmlsingle/#_with_java_configuration). The onedifference is that I am using an Embedded Kafka Server in the class, instead of a localhost server. I am using Spring Boot 2.0.2 and its Spring-Kafka dependency.
While running this test case, I see that the Consumer is not reading the message from the topic and the "assertTrue" check fails. There are no other errors.
#RunWith(SpringRunner.class)
public class SpringConfigSendReceiveMessage {
public static final String DEMO_TOPIC = "demo_topic";
#Autowired
private Listener listener;
#Test
public void testSimple() throws Exception {
template.send(DEMO_TOPIC, 0, "foo");
template.flush();
assertTrue(this.listener.latch.await(60, TimeUnit.SECONDS));
}
#Autowired
private KafkaTemplate<Integer, String> template;
#Configuration
#EnableKafka
public static class Config {
#Bean
public KafkaEmbedded kafkaEmbedded() {
return new KafkaEmbedded(1, true, 1, DEMO_TOPIC);
}
#Bean
public ConsumerFactory<Integer, String> createConsumerFactory() {
Map<String, Object> props = new HashMap<>();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, kafkaEmbedded().getBrokersAsString());
props.put(ConsumerConfig.GROUP_ID_CONFIG, "group1");
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, true);
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, IntegerDeserializer.class);
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
return new DefaultKafkaConsumerFactory<>(props);
}
#Bean
public ConcurrentKafkaListenerContainerFactory<Integer, String> kafkaListenerContainerFactory() {
ConcurrentKafkaListenerContainerFactory<Integer, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
factory.setConsumerFactory(createConsumerFactory());
return factory;
}
#Bean
public Listener listener() {
return new Listener();
}
#Bean
public ProducerFactory<Integer, String> producerFactory() {
Map<String, Object> props = new HashMap<>();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, kafkaEmbedded().getBrokersAsString());
props.put(ProducerConfig.RETRIES_CONFIG, 0);
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, IntegerSerializer.class);
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
return new DefaultKafkaProducerFactory<>(props);
}
#Bean
public KafkaTemplate<Integer, String> kafkaTemplate() {
return new KafkaTemplate<Integer, String>(producerFactory());
}
}
}
class Listener {
public final CountDownLatch latch = new CountDownLatch(1);
#KafkaListener(id = "foo", topics = DEMO_TOPIC)
public void listen1(String foo) {
this.latch.countDown();
}
}
I think that this is because the #KafkaListener is using some wrong/default setting when reading from the topic. I dont see any errors in the logs.
Is this unit test case correct? How can i find the object that is created for the KafkaListener annotation and see which Kafka broker it consumes from? Any inputs will be helpful. Thanks.
The message is sent before the consumer starts.
By default, new consumers start consuming at the end of the topic.
Add
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
The answer by #gary-russell is the best solution. Another way to resolve this issue was to delay the message send step by some time. This will enable the consumer to be ready. The following is also a correct solution.
Lesson learned - For unit testing Kafka consumers, either consume all the messages in the test case, or ensure that the Consumer is ready before Producer sends the message.
#Test
public void testSimple() throws Exception {
Thread.sleep(1000);
template.send(DEMO_TOPIC, 0, "foo");
template.flush();
assertTrue(this.listener.latch.await(60, TimeUnit.SECONDS));
}

Unable to read same message on same topic by multiple consumers

I am publishing a message to topicX. To consume this message by multiple consumers, I have written following ConsumerConfig (Note, each consumer has different groupd_id.
#EnableKafka
#Configuration
public class ConsumerConfig {
#Value("${spring.kafka.bootstrap-servers}")
private String bootstrapServers;
#Bean
public Map<String, Object> consumerConfigs(String groupId) {
Map<String, Object> props = new HashMap<>();
props.put(BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
props.put(GROUP_ID_CONFIG, "my.groupid." + groupId);
props.put(AUTO_OFFSET_RESET_CONFIG, "earliest");
return props;
}
#Bean
public ConsumerFactory<String, Car> consumerFactory(String groupId) {
return new DefaultKafkaConsumerFactory<>(consumerConfigs(groupId), new StringDeserializer(),
new JsonDeserializer<>(Car.class));
}
#Bean
public ConcurrentKafkaListenerContainerFactory<String, Car> consumerOneKafkaListenerContainerFactory() {
ConcurrentKafkaListenerContainerFactory<String, Car> factory = new ConcurrentKafkaListenerContainerFactory<>();
factory.setConsumerFactory(consumerFactory("one"));
return factory;
}
#Bean
public ConcurrentKafkaListenerContainerFactory<String, Car> consumerTwoKafkaListenerContainerFactory() {
ConcurrentKafkaListenerContainerFactory<String, Car> factory = new ConcurrentKafkaListenerContainerFactory<>();
factory.setConsumerFactory(consumerFactory("two"));
return factory;
}
}
And then on specific consumer, I have added following annotation (specified the group id and container factory)
#KafkaListener(id = "my.groupid.one", topics = "${app.topic.topicname}", containerFactory = "consumerOneKafkaListenerContainerFactory")
But when i publish the message, only one consumer consumes it. How can i i configure kafka so that each published message is consumed by multiple consumers?
I have tried this code with 1 partition and n partitions (where n = number of consumers) and yet i am having this issue.
UPDATE
Here is my topic producer class
#Service
#Slf4j
public class TopicProducer {
#Autowired
private KafkaTemplate<String, TopicPayload> kafkaTemplate;
public void send(String topicName, TopicPayload payload) {
log.info("sending message={} to topic={}", payload, topicName);
kafkaTemplate.send(topicName, payload);
}
}
And here is my TopicProducerConfig
#Configuration
public class TopicProducerConfig {
#Value("${spring.kafka.bootstrap-servers}")
private String bootstrapServers;
#Bean
public Map<String, Object> producerConfigs() {
Map<String, Object> props = new HashMap<>();
props.put(BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
props.put(KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
props.put(VALUE_SERIALIZER_CLASS_CONFIG, JsonSerializer.class);
return props;
}
#Bean
public ProducerFactory<String, TopicPayload> producerFactory() {
return new DefaultKafkaProducerFactory<>(producerConfigs());
}
#Bean
public KafkaTemplate<String, TopicPayload> kafkaTemplate() {
return new KafkaTemplate<>(producerFactory());
}
}
I post a message by calling producer.send(topicName, payload).

Spring Kafka and number of topic consumers

In my Spring Boot/Kafka project I have the following consumer config:
#Configuration
public class KafkaConsumerConfig {
#Bean
public ConsumerFactory<String, String> consumerFactory(KafkaProperties kafkaProperties) {
return new DefaultKafkaConsumerFactory<>(kafkaProperties.buildConsumerProperties(), new StringDeserializer(), new JsonDeserializer<>(String.class));
}
#Bean
public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(KafkaProperties kafkaProperties) {
ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
factory.setConsumerFactory(consumerFactory(kafkaProperties));
factory.setConcurrency(10);
return factory;
}
#Bean
public ConsumerFactory<String, Post> postConsumerFactory(KafkaProperties kafkaProperties) {
return new DefaultKafkaConsumerFactory<>(kafkaProperties.buildConsumerProperties(), new StringDeserializer(), new JsonDeserializer<>(Post.class));
}
#Bean
public ConcurrentKafkaListenerContainerFactory<String, Post> postKafkaListenerContainerFactory(KafkaProperties kafkaProperties) {
ConcurrentKafkaListenerContainerFactory<String, Post> factory = new ConcurrentKafkaListenerContainerFactory<>();
factory.setConsumerFactory(postConsumerFactory(kafkaProperties));
return factory;
}
}
This is my PostConsumer:
#Component
public class PostConsumer {
#Autowired
private PostService postService;
#KafkaListener(topics = "${kafka.topic.post.send}", containerFactory = "postKafkaListenerContainerFactory")
public void sendPost(ConsumerRecord<String, Post> consumerRecord) {
postService.sendPost(consumerRecord.value());
}
}
and the application.properties:
spring.kafka.bootstrap-servers=${kafka.host}:${kafka.port}
spring.kafka.consumer.auto-offset-reset=earliest
spring.kafka.consumer.group-id=groupname
spring.kafka.consumer.enable-auto-commit=false
kafka.topic.post.send=post.send
kafka.topic.post.sent=post.sent
kafka.topic.post.error=post.error
As you may see, I have added factory.setConcurrency(10); but it doesn't work. All of the PostConsumer.sendPost execute in the same Thread with name org.springframework.kafka.KafkaListenerEndpointContainer#1-8-C-1
I'd like to be able to control the number of concurrent PostConsumer.sendPost listeners in order to work in parallel. Please show me how it can be achieved with Spring Boot and Spring Kafka.
The problem is here with consistency we are chasing in Spring Kafka using Apache Kafka Consumer. Such a concurrency is distributed between partitions in the topics provided. If you have only one topic and one partition in it, then there is indeed not going to be any concurrency. The point is to consume all the records from one partition in the same thread.
There is some info on the matter in the Docs: https://docs.spring.io/spring-kafka/docs/2.1.7.RELEASE/reference/html/_reference.html#_concurrentmessagelistenercontainer
If, say, 6 TopicPartition s are provided and the concurrency is 3; each container will get 2 partitions. For 5 TopicPartition s, 2 containers will get 2 partitions and the third will get 1. If the concurrency is greater than the number of TopicPartitions, the concurrency will be adjusted down such that each container will get one partition.
And also JavaDocs:
/**
* The maximum number of concurrent {#link KafkaMessageListenerContainer}s running.
* Messages from within the same partition will be processed sequentially.
* #param concurrency the concurrency.
*/
public void setConcurrency(int concurrency) {
To create & manage partitioned topic,
#Bean
public KafkaAdmin admin() {
Map<String, Object> configs = new HashMap<>();
configs.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, KAFKA_URL);
return new KafkaAdmin(configs);
}
#Bean
public NewTopic topicToTarget() {
return new NewTopic(Constant.Topic.PUBLISH_MESSAGE_TOPIC_NAME, <no. of partitions>, (short) <replication factor>);
}
To send message into different partitions, use Partitioner interface
#Bean
public Map<String, Object> producerConfigs() {
Map<String, Object> props = new HashMap<>();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, KAFKA_URL);
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, <your custom Partitioner implementation>);
return props;
}
To consume message from multiple partitions using a single consumer (each message from different partitions will spawn new thread and consumer method will be called in parallel)
#KafkaListener(topicPartitions = {
#TopicPartition(
topic = Constant.Topic.PUBLISH_MESSAGE_TOPIC_NAME,
partitions = "#{kafkaGateway.createPartitionArray()}"
)
}, groupId = "group.processor")
public void consumeWriteRequest(#Payload String data) {
//your code
}
Here the consumers (if multiple instances started) belongs to same group, hence one of them will be invoked for each message.

Resources