How to make a Spring Kafka client distributed

I have messages coming in from Kafka, so I am planning to write a listener with an `onMessage` method that processes each message and pushes it into Solr.
My question is more architectural: I have worked on web apps all my career, so in a big-data setting, how do I deploy the Spring Kafka listener so that I can process thousands of messages a second? How do I make my Spring code use multiple nodes to distribute the load?
I am planning to write a Spring Boot application that runs in a Tomcat container.

If you use the same group id for all instances, Kafka will assign different partitions to different consumers (the instances of your application).
So make sure you create enough partitions in the topic you are going to consume: the partition count is the upper bound on the number of active consumers in a group.
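As a minimal sketch of that setup, assuming Spring Boot's standard Kafka properties (the group id and concurrency value below are placeholders): every instance uses the same group id, and the topic is created with at least as many partitions as the total number of consumer threads across all instances.

```yaml
# application.yml -- identical on every instance
spring:
  kafka:
    consumer:
      group-id: solr-indexer      # same group id on all instances
      auto-offset-reset: earliest
    listener:
      concurrency: 4              # consumer threads per instance
```

With N instances at concurrency 4, the topic needs at least 4N partitions for every thread to receive an assignment; extra threads beyond the partition count sit idle.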

How can I retrieve Kafka messages inside a controller in Spring Boot?

The messages created by the producer are all being consumed as expected.
The thing is, I need to create an endpoint to retrieve the latest messages from the consumer.
Is there a way to do it?
Like an on-demand consumer?
I found this SO post, but it only covers consuming the last N records. I want to consume the latest without caring about offsets.
Spring Kafka Consumer, rewind consumer offset to go back 'n' records
I'm working with Kotlin but if you have the answer in Java I don't mind either.
There are several ways to create listener containers dynamically; you can then start/stop them on demand. To get the records back into the controller, you'd need to use something like a blocking queue, or make the controller itself a MessageListener.
These answers show a couple of techniques for creating containers on demand:
How to dynamically create multiple consumers in Spring Kafka
Kafka Consumer in spring can I re-assign partitions programmatically?
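The blocking-queue handoff mentioned above can be sketched in plain Java as a bounded "latest records" buffer (the Spring wiring is omitted for brevity; in the real app `onMessage` would be the body of a `@KafkaListener` method, and the controller would call `latest()`; the class name and capacity are illustrative):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Keeps only the most recent N records: the listener offers, the controller reads.
public class LatestRecordsBuffer {
    private final int capacity;
    private final Deque<String> buffer = new ArrayDeque<>();

    public LatestRecordsBuffer(int capacity) {
        this.capacity = capacity;
    }

    // Called from the Kafka listener thread for every record received.
    public synchronized void onMessage(String record) {
        if (buffer.size() == capacity) {
            buffer.removeFirst(); // drop the oldest to make room
        }
        buffer.addLast(record);
    }

    // Called from the controller: returns the latest records, newest last.
    public synchronized List<String> latest() {
        return new ArrayList<>(buffer);
    }
}
```

This sidesteps offsets entirely: the consumer keeps running with its normal group id, and the endpoint only ever sees whatever arrived most recently.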

Create and clean up instance-specific RabbitMQ queues

I have a set of microservices using Spring Boot REST. These microservices will be deployed in an autoscaled and load-balanced environment. One of these services is responsible for managing the system's configuration. When the other microservices start up, they obtain the configuration from this service. If and when the configuration is updated, I need to inform all currently running microservice instances to update their cached configuration.
I am considering using RabbitMQ with a fanout exchange. In this solution, each instance will create its own queue at startup and bind that queue to the exchange. When there is a configuration change, the configuration service will publish an update to all queues currently bound to that exchange.
However, as service instances are deleted, I cannot figure out how I would delete the queue specific to that instance. I googled but could not find a complete working example of a solution.
Any help or advice?
The idea and the solution are correct. What you are missing is that those queues, created by your consumer services, can be declared with auto-delete=true: https://www.rabbitmq.com/queues.html. As long as your service is up, the queue is there as well. When you stop your service, its consumers are stopped and unsubscribed, and the moment the last consumer unsubscribes, the queue is deleted from the broker.
I would also suggest looking into the Spring Cloud Bus project, which is aimed at exactly this kind of task: https://spring.io/projects/spring-cloud-bus.
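A minimal Spring AMQP configuration fragment for that setup (the exchange name is illustrative; `AnonymousQueue` is Spring AMQP's stock queue type that is declared exclusive, non-durable, and auto-delete, with a broker-generated unique name):

```java
import org.springframework.amqp.core.AnonymousQueue;
import org.springframework.amqp.core.Binding;
import org.springframework.amqp.core.BindingBuilder;
import org.springframework.amqp.core.FanoutExchange;
import org.springframework.amqp.core.Queue;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ConfigUpdateListenerConfig {

    @Bean
    public FanoutExchange configExchange() {
        // "config.updates" is a placeholder exchange name
        return new FanoutExchange("config.updates");
    }

    @Bean
    public Queue instanceQueue() {
        // Exclusive, auto-delete queue with a unique generated name;
        // it disappears when this instance's connection closes.
        return new AnonymousQueue();
    }

    @Bean
    public Binding binding(FanoutExchange configExchange, Queue instanceQueue) {
        return BindingBuilder.bind(instanceQueue).to(configExchange);
    }
}
```

Because the queue is exclusive and auto-delete, no explicit cleanup step is needed when an instance is scaled away: closing the connection removes the queue.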

What should the TransactionIdPrefix be for multiple Spring Boot consumer/producer apps connected to Kafka (3 brokers)?

I have multiple Spring Boot applications connected to Kafka (a cluster with 3 brokers), and I have integrated transaction synchronization (ChainedKafkaTransactionManager). Should I give the same TransactionIdPrefix value in the Kafka config for all the applications, or a different one for each?
I tried giving a randomly generated TransactionIdPrefix to each application, but sometimes, in a multi-threaded environment, the listener methods read stale data from the database (JPA repositories).
Is this a problem caused by the different TransactionIdPrefix values?
It depends; if they are multiple instances of the same app and the transactions are started by consumers, the prefix must be the same, so that zombie fencing is handled properly when partitions move from one instance to another after a rebalance.
If the transactions are started by producers, the prefix must be unique in each instance.
If they are different applications they should have different prefixes, regardless of what starts the transaction.
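As an illustration using Spring Boot's standard property (the prefix values are placeholders): for consumer-initiated transactions, every instance of the same app shares one prefix, while a different app uses a different one.

```yaml
# App A -- identical on every instance (consumer-initiated transactions):
spring:
  kafka:
    producer:
      transaction-id-prefix: app-a-tx-

# App B -- identical on every B instance, but distinct from App A's:
#   spring.kafka.producer.transaction-id-prefix: app-b-tx-
```

Sharing the prefix within an app is what lets the broker fence a "zombie" producer for a partition after that partition is rebalanced to another instance.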

Scheduling jobs while consuming Kafka messages

I want to build a single Spring Boot application that performs several different tasks concurrently. I did some research on the internet but could not find a way to do this, so let me go into detail.
I would like to start jobs at certain intervals, for example once a day; I can do that with Spring Quartz. I would also like to listen for messages on a dedicated address, with the messages coming from the Apache Kafka platform, so I would like to use the Kafka integration for the Spring framework.
Is this practical (listening for messages all the time while also executing scheduled jobs on time)?
Functionally speaking, this design is fine: a single Spring Boot app can consume Kafka messages while also executing Quartz jobs.
But at a higher level, you should ask why these two functions belong in a single app. Is there some inherent relationship between the Quartz jobs and the Kafka messages being consumed? Or are you combining them solely to limit yourself to one app and save on compute/memory resources?
You should also consider the impact on scalability. What if you need to increase the rate at which you consume Kafka messages? If you scale your app out to get more Kafka consumers, you now have to worry about multiple apps firing your Quartz jobs.
So yes, it can be done, but without more detail it sounds like you should split this design into two separate applications: one for Quartz and one for Kafka consuming.
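A minimal sketch of the single-app variant (Spring's own `@Scheduled` is shown instead of full Quartz wiring for brevity; the cron expression, topic, and group id are placeholders, and the snippet assumes spring-kafka is on the classpath):

```java
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.annotation.Scheduled;

@Configuration
@EnableScheduling
public class JobsAndListener {

    // Runs once a day at midnight, on the scheduler's thread pool.
    @Scheduled(cron = "0 0 0 * * *")
    public void dailyJob() {
        // ... batch work here ...
    }

    // Runs continuously on the listener container's consumer threads.
    @KafkaListener(topics = "my-topic", groupId = "my-group")
    public void onMessage(String message) {
        // ... process the record ...
    }
}
```

The two run on separate thread pools, which is why they coexist without blocking each other; but note that scaling the app out multiplies both, which is the caveat about duplicate scheduled jobs.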

How does Spring XD load balance between instances of the same module in different containers

I have read this post, but it's not my case and not clear enough:
How does load balancing in Spring XD get done?
I have a composed job with different instances of the same sub-jobs deployed in different containers. My composed job is scheduled to run periodically. I need to know how Spring XD choose the sub-jobs instances to invoke for every new request to the composed job.
The same question for a stream triggered every X minutes.
It's handled by the transport (Rabbit, Redis).
Each downstream module competes for messages; with Rabbit it will generally be round-robin, with Redis it will be more random.
