Scheduling jobs while consuming Kafka messages - spring-boot

I want to build a single Spring Boot application that performs multiple different tasks concurrently. I did some research on the internet but could not find a clear way to do it. Let me get into detail.
I would like to start jobs at certain intervals, for example once a day; I can do that with Quartz through Spring. I would also like to listen for messages arriving at a dedicated address, coming from the Apache Kafka platform, so I would like to use the Spring for Apache Kafka integration.
Is this practical (listening for messages all the time while also executing the scheduled jobs on time)?

Functionally speaking, this design is fine: a single Spring Boot app can consume Kafka messages while also executing Quartz jobs.
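As a minimal sketch of the combined approach (using Spring's own @Scheduled scheduler here for brevity instead of Quartz; the topic name, group id, and cron expression are placeholders):

    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.SpringBootApplication;
    import org.springframework.kafka.annotation.KafkaListener;
    import org.springframework.scheduling.annotation.EnableScheduling;
    import org.springframework.scheduling.annotation.Scheduled;

    @SpringBootApplication
    @EnableScheduling
    public class CombinedApp {

        public static void main(String[] args) {
            SpringApplication.run(CombinedApp.class, args);
        }

        // Runs on the scheduler's thread pool, independent of the consumer threads
        @Scheduled(cron = "0 0 2 * * *") // once a day at 02:00
        public void dailyJob() {
            // batch work goes here
        }

        // Runs on the listener container's consumer thread(s)
        @KafkaListener(topics = "orders", groupId = "combined-app")
        public void onMessage(String message) {
            // message processing goes here
        }
    }

The two concerns never block each other because Spring runs them on separate thread pools, which is why this works functionally.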
But at a higher level, you should ask why these two functions belong in a single app. Is there some inherent relationship between the Quartz jobs and the Kafka messages being consumed? Or are you combining them solely to keep to one app and save on compute/memory resources?
You should also consider the impact on scalability. What if you need to increase the rate at which you consume Kafka messages? If you scale the app out to get more Kafka consumers, you now have multiple instances all firing your Quartz jobs.
So yes, it can be done, but without any more detail it sounds like you should break this design into 2 separate applications: one for Quartz and one for Kafka consuming.

Related

Advisable to run a Kafka producer + consumer in same application?

Spring + Apache Kafka noob here. I'm wondering if it's advisable to run a single Spring Boot application that handles both producing and consuming messages.
A lot of the applications I've seen using Kafka lately usually have one separate application send/emit the message to a Kafka topic, and another one that consumes/processes the message from that topic. For larger applications, I can see a case for separate producer and consumer applications, but what about smaller ones?
For example: I have a simple app that processes HTTP requests and forwards them to a third-party service; to make the calls retryable, I put each request on a Kafka topic and consume it with a service that uses the @Retryable annotation. Is that advisable?
And what other considerations might come into play, given that it would be built on the Spring framework?
Note: as your question invites, what I'll say is advice based on my beliefs and experience rather than some absolute truth written in stone.
Your use case sounds more like a proxy than an application with real business logic. You should make sure that an asynchronous service is what you actually need - maybe it's good enough to simply hold the connection until you get a response from the third party and let your client handle retries when you return an error; of course, you can also retry yourself until some timeout.
That would avoid common issues with asynchronous designs, such as forcing your client to poll or expose a webhook to get the result, or having to check whether a record is still worth processing after a long outage or high consumer lag.
If your client doesn't care about the result as long as it gets done, and you don't expect high-throughput on either side, a single Spring Boot application should be enough for handling both producer and consumer sides - while also keeping it simple.
If you do expect high throughput, I'd look into building a WebFlux-based application with the reactor-kafka library - high-throughput proxies are an excellent use case for reactive applications.
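For illustration, a rough sketch of such a reactive consumer with reactor-kafka (the topic, group id, broker address, and the third-party call are all placeholders):

    import java.util.Collections;
    import java.util.Map;

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import reactor.core.publisher.Mono;
    import reactor.kafka.receiver.KafkaReceiver;
    import reactor.kafka.receiver.ReceiverOptions;
    import reactor.kafka.receiver.ReceiverRecord;

    public class ReactiveProxyConsumer {

        public static void main(String[] args) {
            Map<String, Object> props = Map.of(
                    ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092",
                    ConsumerConfig.GROUP_ID_CONFIG, "proxy-consumer",
                    ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class,
                    ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);

            ReceiverOptions<String, String> options = ReceiverOptions.<String, String>create(props)
                    .subscription(Collections.singleton("requests"));

            KafkaReceiver.create(options)
                    .receive()
                    // forward each record without blocking the receive loop
                    .flatMap(ReactiveProxyConsumer::forwardToThirdParty)
                    .subscribe();
        }

        private static Mono<Void> forwardToThirdParty(ReceiverRecord<String, String> record) {
            // placeholder for a non-blocking HTTP call (e.g. via WebClient);
            // acknowledge the offset once the call succeeds
            return Mono.fromRunnable(() -> record.receiverOffset().acknowledge());
        }
    }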
Another option would be having a simple serverless function that handles the http requests and produces the records, and a standard Spring Boot application to consume them.
To be honest, I don't see a use case where having two full-fledged Java applications for proxy duty would pay off, unless you already have infrastructure solid enough that managing two applications is no harder than managing one, and the extra resource usage is not an issue.
That said, if you expect really high traffic and a serverless function wouldn't work, or you want to stick to Java-based solutions, you could have a simple WebFlux-based application handle the HTTP requests and send the messages, and a standard Spring Boot (or another WebFlux) application handle consumption. That way you can scale up the former to accommodate the traffic and independently scale the latter to match your processing requirements.
As for the retry part, if you stick to non-reactive Spring Kafka applications, you might want to look into Spring Kafka's non-blocking retries feature. It lets your consumer process other records while a failed one waits to be retried - the @Retryable approach is deprecated in favor of DefaultErrorHandler, and both block consumption while waiting.
Note that with non-blocking retries you lose ordering guarantees, so use them only if the order in which requests are processed is not important.
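A hedged sketch of the non-blocking retries feature, via spring-kafka's @RetryableTopic (topic name, group id, and backoff values are made up for illustration):

    import org.springframework.kafka.annotation.DltHandler;
    import org.springframework.kafka.annotation.KafkaListener;
    import org.springframework.kafka.annotation.RetryableTopic;
    import org.springframework.retry.annotation.Backoff;
    import org.springframework.stereotype.Component;

    @Component
    public class RetryingConsumer {

        // Failed records are republished to auto-created retry topics,
        // so the consumer on the main topic keeps moving
        @RetryableTopic(attempts = "4", backoff = @Backoff(delay = 1000, multiplier = 2.0))
        @KafkaListener(topics = "requests", groupId = "proxy-consumer")
        public void onMessage(String payload) {
            // call the third-party service; throwing an exception triggers a retry
        }

        // Invoked once all attempts are exhausted
        @DltHandler
        public void onDeadLetter(String payload) {
            // log, alert, or persist the record for manual handling
        }
    }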

Scaling consumers with @StreamListener

We're using Spring Cloud Stream to serve asynchronous tasks. I wonder if there is any way to scale the listeners set up with @StreamListener? The goal is to have multiple workers within one application instance.
I read about spring.cloud.stream.instanceCount, but I don't want to replicate the whole application, only increase the worker count.
You should be able to accomplish that via the spring.cloud.stream.bindings.input.consumer.concurrency consumer property; see the consumer properties section of the Spring Cloud Stream reference documentation for more info.
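For example, assuming a binding named input as in the property above, a minimal application.properties entry would be:

    # number of consumer threads for the "input" binding within one instance
    spring.cloud.stream.bindings.input.consumer.concurrency=4

Note that with the Kafka binder, threads beyond the topic's partition count will sit idle, so the partition count caps the effective concurrency.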

How to make a Spring Kafka client distributed

I have messages coming in from Kafka, so I am planning to write a listener with an onMessage method, process each message, and push it into Solr.
My question is more architectural: I have worked on web apps all my career, so how should I deploy a Spring Kafka listener in a big-data setting so that I can process thousands of messages a second?
How do I make my Spring code use multiple nodes to distribute the load? I am planning to write a Spring Boot application running in a Tomcat container.
If you use the same group id for all instances, the topic's partitions will be distributed across the consumers (the instances of your application).
So make sure you have specified enough partitions on the topic you are going to consume.
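For example (topic and group names are illustrative), every instance running this listener joins the same consumer group and receives its own share of the partitions:

    import org.springframework.kafka.annotation.KafkaListener;
    import org.springframework.stereotype.Component;

    @Component
    public class SolrIndexingListener {

        // All instances share the group id "solr-indexer", so Kafka splits the
        // topic's partitions among them; adding instances adds parallelism,
        // up to the number of partitions
        @KafkaListener(topics = "incoming-events", groupId = "solr-indexer")
        public void onMessage(String message) {
            // transform the message and push the document into Solr here
        }
    }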

How does spring-kafka handle consumer threads?

Can someone please tell me whether spring-kafka has a feature like Spring JMS that can dynamically spin up or wind down threads based on load?
When using Kafka, do we need to worry about this thread management at all? I know the best practice for a Kafka consumer is to have as many threads as there are partitions on the topic.
Spring for Apache Kafka does not dynamically adjust the number of consumer threads in any way. The partitions will be distributed across the number of threads you configure.
You could query the topic and configure the container appropriately before starting it.
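A minimal sketch of such a fixed configuration (the topic name and a concurrency of 3 are assumptions; with a 6-partition topic, each of the 3 threads would be assigned 2 partitions):

    import org.springframework.kafka.annotation.KafkaListener;
    import org.springframework.stereotype.Component;

    @Component
    public class FixedConcurrencyListener {

        // Spring creates 3 consumer threads for this listener; the partitions
        // are distributed across them, and the thread count does not change
        // at runtime in response to load
        @KafkaListener(topics = "events", concurrency = "3")
        public void onMessage(String message) {
            // process the record
        }
    }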

How does Spring XD load balance between instances of the same module in different containers

I have read this post, but it does not cover my case and is not clear enough:
How does load balancing in Spring XD get done?
I have a composed job with several instances of the same sub-jobs deployed in different containers, and the composed job is scheduled to run periodically. I need to know how Spring XD chooses which sub-job instances to invoke for each new request to the composed job.
The same question applies to a stream triggered every X minutes.
It's handled by the transport (RabbitMQ, Redis).
Each downstream module competes for messages - with RabbitMQ it will generally be round-robin; with Redis it will be more random.
