Way to determine Kafka topic for @KafkaListener on application startup?

We have 5 topics and we want a service that scales to, for example, 5 instances of the same app.
This means I would want to dynamically determine (via, for example, Redis locking or a similar mechanism) which instance should listen to which topic.
I know that we could have 1 topic with 5 partitions, and each node in the same consumer group would pick up a partition. Also, if we have a separately deployed service, we can set the topic via properties.
The issue is that neither of those two options suits our situation, and we want to see if it is possible to do it the way I explained above.
@PostConstruct
private void postConstruct() {
    // Do logic via Redis locking or something to determine the topic
    dynamicallyDeterminedVariable = // SOME LOGIC
}

@KafkaListener(topics = "#{dynamicallyDeterminedVariable}")
void listener(String data) {
    LOG.info(data);
}

Yes, you can use SpEL for the topic name:
#{@someOtherBean.whichTopicToUse()}
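As a minimal sketch of how that could look (the bean and method names below follow the answer but are illustrative, and the selection logic is a stand-in for the Redis-based logic from the question):

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component("someOtherBean")
public class TopicSelector {

    // Stand-in for the Redis locking logic from the question
    public String whichTopicToUse() {
        return "topic-assigned-to-this-instance";
    }
}

@Component
public class DynamicTopicListener {

    private static final Logger LOG = LoggerFactory.getLogger(DynamicTopicListener.class);

    // The SpEL expression is evaluated once, when the listener container is created
    @KafkaListener(topics = "#{@someOtherBean.whichTopicToUse()}")
    void listener(String data) {
        LOG.info(data);
    }
}

Note that the expression is resolved only once, at container startup, so the topic cannot change later without restarting the container.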

Related

Multiple consumers with the same name in different projects subscribed to the same queue

We have a UserCreated event that gets published from UserManagement.Api. I have two other APIs, Payments.Api and Notification.Api, that should react to that event.
In both APIs I have public class UserCreatedConsumer : IConsumer<UserCreated> (so different namespaces), but only one queue (on SQS) gets created for both consumers.
What is the best way to deal with this situation?
You didn't share your configuration, but if you're using:
x.AddConsumer<UserCreatedConsumer>();
As part of your MassTransit configuration, you can specify an InstanceId for that consumer to generate a unique endpoint address.
x.AddConsumer<UserCreatedConsumer>()
    .Endpoint(e => e.InstanceId = "unique-value");
Every separate service (not an instance of the same service) needs a different queue name for its receive endpoint, as described in the docs:
cfg.ReceiveEndpoint("queue-name-per-service-type", e =>
{
    // rest of the configuration
});
It's also mentioned in the common mistakes article.

Spring-Kafka Concurrency Property

I am writing my first Kafka consumer using Spring-Kafka. I had a look at the different options provided by the framework and have a few doubts. Can someone please clarify the points below if you have already worked with it?
Question - 1 : As per the Spring-Kafka documentation, there are 2 ways to implement a Kafka consumer: "You can receive messages by configuring a MessageListenerContainer and providing a message listener or by using the @KafkaListener annotation". Can someone tell me when I should choose one option over the other?
Question - 2 : I have chosen the @KafkaListener approach for writing my application. For this I need to initialize a container factory instance, and inside the container factory there is an option to control concurrency. I just want to double-check whether my understanding of concurrency is correct.
Suppose I have a topic named MyTopic which has 4 partitions. To consume messages from MyTopic, I've started 2 instances of my application, each with concurrency set to 2. So, ideally, as per the Kafka assignment strategy, 2 partitions should go to consumer1 and the 2 other partitions to consumer2. Since the concurrency is set to 2, will each consumer start 2 threads and consume data from the topic in parallel? Also, is there anything we should consider when consuming in parallel?
Question - 3 : I have chosen manual ack mode and am not managing the offsets externally (not persisting them to any database/filesystem). So do I need to write custom code to handle a rebalance, or will the framework manage it automatically? I think not, as I am acknowledging only after processing all the records.
Question - 4 : Also, with manual ack mode, which listener will give more performance: the batch message listener or the normal message listener? I guess if I use the normal message listener, the offsets will be committed after processing each message.
Pasted the code below for your reference.
Batch Acknowledgement Consumer:
public class BatchAckConsumer implements BatchAcknowledgingConsumerAwareMessageListener<String, String> {

    @Override
    public void onMessage(List<ConsumerRecord<String, String>> records, Acknowledgment acknowledgment,
            Consumer<?, ?> consumer) {
        for (ConsumerRecord<String, String> record : records) {
            System.out.println("Record : " + record.value());
            // Process the message here..
            listener.addOffset(record.topic(), record.partition(), record.offset());
        }
        // Commit only after the whole batch has been processed
        acknowledgment.acknowledge();
    }
}
Initialising container factory:
@Bean
public ConsumerFactory<String, String> consumerFactory() {
    return new DefaultKafkaConsumerFactory<String, String>(consumerConfigs());
}

@Bean
public Map<String, Object> consumerConfigs() {
    Map<String, Object> configs = new HashMap<String, Object>();
    configs.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootStrapServer);
    configs.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
    configs.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, enableAutoCommit);
    configs.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, maxPollInterval);
    configs.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, autoOffsetReset);
    configs.put(ConsumerConfig.CLIENT_ID_CONFIG, clientId);
    configs.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    configs.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    return configs;
}

@Bean
public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<String, String>();
    // Each container created by this factory will run 2 consumer threads
    factory.setConcurrency(2);
    factory.setBatchListener(true);
    factory.getContainerProperties().setAckMode(AckMode.MANUAL);
    factory.getContainerProperties().setConsumerRebalanceListener(RebalanceListener.getInstance());
    factory.setConsumerFactory(consumerFactory());
    factory.getContainerProperties().setMessageListener(new BatchAckConsumer());
    return factory;
}
@KafkaListener is a message-driven "POJO"; it adds features like payload conversion, argument matching, etc. If you implement MessageListener, you can only get the raw ConsumerRecord from Kafka. See @KafkaListener Annotation.
Yes, the concurrency represents the number of threads; each thread creates a Consumer; they run in parallel; in your example, each would get 2 partitions.
Also, is there anything we should consider when consuming in parallel?
Your listener must be thread-safe (no shared state, or any such state needs to be protected by locks).
It's not clear what you mean by "handle rebalance events". When a rebalance occurs, the framework will commit any pending offsets.
It doesn't make a difference; message listener vs. batch listener is just a preference. Even with a message listener, with MANUAL ack mode, the offsets are committed when all the results from the poll have been processed. With MANUAL_IMMEDIATE mode, the offsets are committed one-by-one.
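For comparison with the batch consumer above, a minimal record-listener sketch (String types assumed, as in the question's configuration):

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.kafka.listener.AcknowledgingMessageListener;
import org.springframework.kafka.support.Acknowledgment;

public class RecordAckConsumer implements AcknowledgingMessageListener<String, String> {

    @Override
    public void onMessage(ConsumerRecord<String, String> record, Acknowledgment acknowledgment) {
        System.out.println("Record : " + record.value());
        // With AckMode.MANUAL this only marks the record; the commit happens
        // after the whole poll is processed. With AckMode.MANUAL_IMMEDIATE
        // the commit happens right here, record by record.
        acknowledgment.acknowledge();
    }
}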
Q1:
From the documentation,
The #KafkaListener annotation is used to designate a bean method as a
listener for a listener container. The bean is wrapped in a
MessagingMessageListenerAdapter configured with various features, such
as converters to convert the data, if necessary, to match the method
parameters.
"You can configure most attributes on the annotation with SpEL by using #{…} or property placeholders (${…}). See the Javadoc for more information."
This approach can be useful for simple POJO listeners, and you do not need to implement any interfaces. It also lets you listen on any topics and partitions declaratively via the annotation. You can also potentially return the value you received, whereas with a MessageListener you are bound by the signature of the interface.
Q2:
Ideally, yes. If you have multiple topics to consume from, it gets more complicated, though. Kafka by default uses the RangeAssignor, which has its own behaviour (you can change this; see below).
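For instance, the assignor can be swapped via an extra entry in the consumerConfigs() map shown in the question (RoundRobinAssignor here is just one of the Kafka-provided alternatives; pick whichever fits your topology):

// org.apache.kafka.clients.consumer.RoundRobinAssignor
configs.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
        RoundRobinAssignor.class.getName());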
Q3:
If your consumer dies, there will be rebalancing. If you acknowledge manually and your consumer dies before committing offsets, you do not need to do anything; Kafka handles that. But you could end up with some duplicate messages (at-least-once delivery).
Q4:
It depends on what you mean by "performance". If you mean latency, then consuming each record as fast as possible is the way to go. If you want to achieve high throughput, then batch consumption is more efficient.
I have written some samples using Spring Kafka and various listeners - check out this repo.

Workaround to fix StreamListener constant Channel Name

I am consuming messages using Spring Cloud Stream, with something like:
@StreamListener(target = "CONSTANT_CHANNEL_NAME")
public void readingData(String input) {
    System.out.println("consumed info is " + input);
}
But I want the channel name to vary per environment and be picked up from a property file, whereas as per Spring the channel name should be a constant.
Is there any workaround for this problem?
Edit 1:
Here is the actual situation:
I am using multiple queues and DLQ queues, and their binding is done with RabbitMQ.
I want to change my channel names and queue names per environment.
I want to do it all on the same AMQP host.
My Sink code:
public interface ProcessorSink extends Sink {

    @Input(CONSTANT_CHANNEL_NAME)
    SubscribableChannel channel();

    @Input(CONSTANT_CHANNEL_NAME_1)
    SubscribableChannel channel2();

    @Input(CONSTANT_CHANNEL_NAME_2)
    SubscribableChannel channel3();
}
You can pick the target value from a property file as below:
@StreamListener(target = "${streamListener.target}")
public void readingData(String input) {
    System.out.println("consumed info is " + input);
}
application.yml
streamListener:
  target: CONSTANT_CHANNEL_NAME
While there are many ways to do that, I wonder why you even care. In fact, if anything, you do want to keep the channel name constant so it is always the same, and instead map it to different remote destinations (e.g., Kafka, Rabbit, etc.) through configuration properties. For example, spring.cloud.stream.bindings.input.destination=myKafkaTopic states that the channel named input will be mapped to (bridged with) the Kafka topic named myKafkaTopic.
In fact, to further prove my point, we completely abstracted away channels altogether for users who use the spring-cloud-function programming model, but that is a whole different discussion.
My point is that I believe you are actually creating a problem rather than solving one, since by externalising the channel name you create the risk that, due to misconfiguration, the channel you actually bind and the channel you mention in your properties will not be the same.
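In that spirit, a sketch of keeping the channel constant and externalising only the destination (the environment variable and destination name below are illustrative, not from the original question):
spring:
  cloud:
    stream:
      bindings:
        CONSTANT_CHANNEL_NAME:
          destination: ${APP_ENV}-user-events   # e.g. dev-user-events vs. prod-user-events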

Leader election initialisation for multiple roles in clustered environment

I am currently working with an implementation based on:
org.springframework.integration.support.leader.LockRegistryLeaderInitiator
It has multiple candidate roles so that only one application instance within the cluster is elected as leader for each role. During initialisation of the cluster, if the autoStartup property is set to true, the first application instance that is initialised will be elected as leader for all roles. This is something we want to avoid; instead, we want a fair distribution of the leader roles across the cluster.
One possible solution might be that when the cluster is ready and properly initialised, an endpoint is invoked that executes:
lockRegistryLeaderInitiator.start()
on all instances in the cluster, so that the election process starts and the roles are fairly distributed across instances. One drawback is that this needs to be part of the deployment process, which adds complexity.
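A sketch of what such a trigger endpoint could look like (the mapping path and wiring below are illustrative, not from the original setup):

import java.util.List;
import org.springframework.integration.support.leader.LockRegistryLeaderInitiator;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class LeaderElectionController {

    private final List<LockRegistryLeaderInitiator> initiators;

    public LeaderElectionController(List<LockRegistryLeaderInitiator> initiators) {
        this.initiators = initiators;
    }

    // Invoked (e.g. by the deployment pipeline) once all instances are up
    @PostMapping("/leader-election/start")
    public void startElection() {
        initiators.forEach(LockRegistryLeaderInitiator::start);
    }
}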
What is the proposed best practice here? Are there any plans for related additional features, for example to autoStartup the leader election only when X application instances are available?
I suggest you take a look at the Spring Cloud Bus project. I don't know its details, but it looks like your idea of autoStartup = false for all the LockRegistryLeaderInitiator instances, starting them via some distributed event, is the way to go.
I'm not sure what we can do for you from the Spring Integration perspective; it really feels like this is not its responsibility, and all the coordination and rebalancing should be done via some other tool. Fortunately, all our Spring projects can be used together as a single platform.
I think with the Bus you really can track the number of instances that have joined the cluster and decide yourself when and how to publish a StartLeaderInitiators event.
It would be relatively easy with the ZooKeeper LeaderInitiator because you could check ZooKeeper for the instance count before starting it.
It's not so easy with the lock registry because there's no inherent information about instances; you would need some external mechanism (such as ZooKeeper, in which case you might as well use the ZK initiator).
Or, you could use something like Spring Cloud Bus (with RabbitMQ or Kafka) to send a signal to all instances that it's time to start electing leadership.
I found a very simple approach to do this.
You can add a scheduled task to each node that periodically tries to yield leaderships if the node holds too many of them.
For example, if you have N nodes and 2*N roles and you want to achieve a completely fair leadership distribution (each node holds exactly two leaderships), you can use something like this:
@Component
@RequiredArgsConstructor
public class FairLeaderDistributor {

    private final List<LeaderInitiator> initiators;

    @Scheduled(fixedDelay = 300_000) // once per 5 minutes
    public void yieldExcessLeaderships() {
        initiators.stream()
                .map(LeaderInitiator::getContext)
                .filter(Context::isLeader)
                .skip(2) // keep only 2 leaderships
                .forEach(Context::yield);
    }
}
Once all nodes are up, you will eventually get a completely fair leadership distribution.
You can also implement dynamic distribution based on the current active node count if you use the ZooKeeper LeaderInitiator implementation.
The current number of participants can easily be retrieved from the Curator LeaderSelector::getParticipants method.
You can get the LeaderSelector via reflection from the LeaderInitiator.leaderSelector field.
@Slf4j
@Component
@RequiredArgsConstructor
public class DynamicFairLeaderDistributor {

    final List<LeaderInitiator> initiators;

    @SneakyThrows
    private static int getParticipantsCount(LeaderInitiator leaderInitiator) {
        Field field = LeaderInitiator.class.getDeclaredField("leaderSelector");
        field.setAccessible(true);
        LeaderSelector leaderSelector = (LeaderSelector) field.get(leaderInitiator);
        return leaderSelector.getParticipants().size();
    }

    @Scheduled(fixedDelay = 5_000)
    public void yieldExcessLeaderships() {
        int rolesCount = initiators.size();
        if (rolesCount == 0) return;
        int participantsCount = getParticipantsCount(initiators.get(0));
        if (participantsCount == 0) return;
        // Ceiling division: each node keeps at most ceil(rolesCount / participantsCount) leaderships
        int maxLeadershipsCount = (rolesCount - 1) / participantsCount + 1;
        log.info("rolesCount={}, participantsCount={}, maxLeadershipsCount={}", rolesCount, participantsCount, maxLeadershipsCount);
        initiators.stream()
                .map(LeaderInitiator::getContext)
                .filter(Context::isLeader)
                .skip(maxLeadershipsCount)
                .forEach(Context::yield);
    }
}

Is it possible for a JMS topic to have multiple publishers

From what I've read so far, a JMS topic is one-to-many, and I wonder if it's possible to support many-to-many using a topic. Consider a topic called "Reports" with multiple services spread out across an enterprise needing to publish scheduled reports. Having multiple publishers would alleviate the need to subscribe interested applications to a separate topic for each of the reporting services.
Note:
I'm going to use Spring and ActiveMQ in my solution.
@Mondain: yes, very much possible. A practical example would be a live stock market price feed provided by multiple sources, with those feeds consumed by multiple channels.
Yes, you can create many TopicPublishers from your TopicSession, and many applications can connect to the same Topic using TopicPublisher or TopicSubscriber.
You can do something like this, and call CreateMessageProducer to create a new producer instance anywhere in your application:
public ActiveMqProducer(string activeMqServiceUrl)
{
    _activeMqServiceUrl = activeMqServiceUrl;
    IConnectionFactory factory = new ConnectionFactory(new Uri(_activeMqServiceUrl));
    _activeMqConnection = factory.CreateConnection();
    _activeMqSession = _activeMqConnection.CreateSession(AcknowledgementMode.Transactional);
    _activeMqConnection.Start();
}

private IMessageProducer CreateMessageProducer(string mqTopicName)
{
    ITopic destination = SessionUtil.GetTopic(_activeMqSession, mqTopicName);
    var producer = _activeMqSession.CreateProducer(destination);
    return producer;
}
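Since the question mentions Spring and ActiveMQ, here is a minimal Java sketch of one of several publishers sending to the same topic via JmsTemplate (the broker URL and topic name below are illustrative):

import javax.jms.ConnectionFactory;
import org.apache.activemq.ActiveMQConnectionFactory;
import org.springframework.jms.core.JmsTemplate;

public class ReportsPublisher {

    private final JmsTemplate jmsTemplate;

    public ReportsPublisher(String brokerUrl) {
        ConnectionFactory connectionFactory = new ActiveMQConnectionFactory(brokerUrl);
        this.jmsTemplate = new JmsTemplate(connectionFactory);
        this.jmsTemplate.setPubSubDomain(true); // publish to topics, not queues
    }

    public void publishReport(String report) {
        // Any number of services can run this code against the same topic
        jmsTemplate.convertAndSend("Reports", report);
    }
}

// Usage: new ReportsPublisher("tcp://localhost:61616").publishReport("daily-report");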
