Advisable to run a Kafka producer + consumer in same application?

Spring + Apache Kafka noob here. I'm wondering if it's advisable to run a single Spring Boot application that handles both producing and consuming messages.
A lot of the applications I've seen using Kafka lately usually have one separate application send/emit the message to a Kafka topic, and another one that consumes/processes the message from that topic. For larger applications, I can see a case for separate producer and consumer applications, but what about smaller ones?
For example: I have a simple app that processes HTTP requests by sending them to a third-party service. To ensure retryability, could I put each request on a Kafka topic and consume it with a service that uses the @Retryable annotation?
And what other considerations might come into play since it would be on the Spring framework?

Note: As your question states, what I'll say is more advice based on my beliefs and experience than some absolute truth written in stone.
Your use case seems more like a proxy than an actual application with business logic. You should make sure that making this an asynchronous service makes sense - maybe it's good enough to simply hold the connection until you get a response from the third party, and let your client handle retries if you get an error - of course, you can also retry until some timeout.
This would avoid common asynchronous issues such as making your client need to poll or have a webhook in order to get a result, or making sure a record still makes sense to be processed after a lot of time has elapsed after an outage or a high consumer lag.
If your client doesn't care about the result as long as it gets done, and you don't expect high-throughput on either side, a single Spring Boot application should be enough for handling both producer and consumer sides - while also keeping it simple.
If you do expect high throughput, I'd look into building a WebFlux based application with the reactor-kafka library - high throughput proxies are an excellent use case for reactive applications.
Another option would be having a simple serverless function that handles the http requests and produces the records, and a standard Spring Boot application to consume them.
TBH, I don't see a use case where having two full-fledged Java applications to handle proxy duty would pay off, unless you have infrastructure solid enough that managing two applications instead of one makes no difference and the extra resource usage is not an issue.
Actually, if you expect really high traffic and a serverless function wouldn't work, or maybe you want to stick to Java-based solutions, then you could have a simple WebFlux-based application to handle the HTTP requests and send the messages, and a standard Spring Boot or another WebFlux application to handle consumption. This way you'd be able to scale up the former to accommodate the high traffic, and independently scale the latter according to your performance requirements.
As for the retry part, if you stick to non-reactive Spring Kafka applications, you might want to look into the non-blocking retries feature from Spring Kafka. This will enable your consumer application to process other records while waiting to retry a failed one - the @Retryable approach is deprecated in favor of DefaultErrorHandler, and both will block consumption while waiting.
Note that with non-blocking retries you lose ordering guarantees, so use them only if the order in which the requests are processed is not important.
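To see why non-blocking retries give up ordering, here is a minimal in-memory sketch in plain Java. It is not Spring Kafka's API (there, the feature is exposed through the @RetryableTopic annotation). The class name NonBlockingRetryDemo, the two queues standing in for the main and retry topics, and the "fails only on the first attempt" rule are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// In-memory stand-in for a main topic plus a retry topic (hypothetical, not Spring Kafka).
class NonBlockingRetryDemo {

    // Processes all records. A record listed in failOnce fails on its first attempt
    // and is parked on the retry queue instead of blocking the records behind it.
    static List<String> process(List<String> records, Set<String> failOnce) {
        Deque<String> mainTopic = new ArrayDeque<>(records);
        Deque<String> retryTopic = new ArrayDeque<>();
        Set<String> alreadyFailed = new HashSet<>();
        List<String> completed = new ArrayList<>();

        while (!mainTopic.isEmpty() || !retryTopic.isEmpty()) {
            // Drain the main topic first; retries are only picked up afterwards.
            String record = !mainTopic.isEmpty() ? mainTopic.poll() : retryTopic.poll();
            // add() returns true only the first time, so each record fails at most once.
            boolean fails = failOnce.contains(record) && alreadyFailed.add(record);
            if (fails) {
                retryTopic.add(record);   // retry later, keep consuming
            } else {
                completed.add(record);    // the call to the third party succeeded
            }
        }
        return completed;
    }

    public static void main(String[] args) {
        System.out.println(process(List.of("A", "B", "C"), Set.of("B")));
    }
}
```

With the input A, B, C where B fails once, the completion order is A, C, B: the consumer never stalled, but B was handled after C. A blocking retry would have kept the order at the cost of holding up C.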

Related

How to initialize a continuously running stream using Alpakka, Spring Boot & Akka Streams?

All,
I am developing an application which uses the Alpakka Spring Boot integration to read data from Kafka. I have most of the code ready; the only place I am stuck is how to initialize a continuously running stream, as this is going to be a backend application and won't have any API to be called from.
As far as I know, Alpakka's Spring integration is basically designed around exposing Akka Streams via a Spring HTTP controller. So I'm not sure what purpose bringing Spring into this serves, since there's quite an impedance mismatch between the way an Akka application will tend to like to work and the way a Spring application will tend to like to work.
Assuming you're talking about using Alpakka Kafka, the most idiomatic thing to do would be to just start a stream fed by an Alpakka Kafka Source in your main method and it will run until killed or it fails. You may want to use a RestartSource around the consumer and business logic to ensure that in the event of failure the stream restarts (note that one should generally expect messages for which the offset commit hadn't happened to be processed again, as Kafka in typical cases can only guarantee at-least-once processing).

Message Aggregation using SQS and SpringBoot

I have a use case wherein SQS (standard) will be flooded with messages (500k+). A microservice (Spring Boot based) listens to these events, consumes them, and makes a REST API call (batch-based) to a third-party SaaS system (I have attached a high-level diagram for the same).
The limitation here is that the Spring Boot consumer can receive a maximum of 10 messages at a time from SQS, transform the payload, and make the REST API call with these 10 messages (records).
Is there a way to aggregate these messages to, say, 100 messages before making the REST API call (assuming that the target SaaS system accepts 100 records of data)? Would Spring Batch help in this case?
Should I have to look at a different stack for this kind of need? Any help/guidance is much appreciated.
Thanks
What you are describing is actually the chunk-oriented processing model of Spring Batch: items could be read from the queue, accumulated in chunks of 100 items (that is the configurable chunk-size) and posted to your REST API in bulk mode.
Spring Batch handles the chunking of items (and much more) for you. So yes, even though I'm biased, I believe Spring Batch is a very good option for your use case.
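As a rough illustration of what that chunking amounts to, here is a minimal sketch in plain Java. It is not Spring Batch's API: ChunkAggregator is a hypothetical helper that buffers the small batches a consumer receives (up to 10 messages each in the question) and hands back full chunks of a configurable size.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: buffers small received batches and emits fixed-size chunks.
class ChunkAggregator<T> {
    private final int chunkSize;
    private final List<T> buffer = new ArrayList<>();

    ChunkAggregator(int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
    }

    // Adds one received batch and returns every chunk that became full as a result.
    // Most calls return an empty list.
    List<List<T>> add(List<T> batch) {
        List<List<T>> fullChunks = new ArrayList<>();
        for (T item : batch) {
            buffer.add(item);
            if (buffer.size() == chunkSize) {
                fullChunks.add(new ArrayList<>(buffer));
                buffer.clear();
            }
        }
        return fullChunks;
    }

    // Returns whatever is left (possibly fewer than chunkSize items),
    // for example on a timeout or at shutdown.
    List<T> flush() {
        List<T> rest = new ArrayList<>(buffer);
        buffer.clear();
        return rest;
    }
}
```

Ten calls to add() with 10 messages each yield exactly one chunk of 100 on the tenth call. What this sketch leaves out is the part a framework helps with: messages buffered in memory are lost if the process dies, so in a real consumer they should only be deleted from the queue after the bulk REST call has succeeded.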
Maybe you should try the Aggregator from Spring Integration.
The Aggregator combines a group of related messages, by correlating
and storing them until the group is deemed to be complete. At that
point, the aggregator creates a single message by processing the whole
group and sends the aggregated message as output.
https://docs.spring.io/spring-integration/reference/html/aggregator.html
And please refer to this GitHub repo for spring integration with AWS services
https://github.com/spring-projects/spring-integration-aws/tree/main/src/test/java/org/springframework/integration/aws
I'm assuming you have multiple instances of your application and can scale up easily if required (since you have 500k+ messages). Even so, your application is prone to data loss, and building a reliable system is always challenging. Since you are already on the cloud, maybe you should think about utilizing other cloud services.
I think for your case, you should have a look at Amazon Kinesis Data Streams and Kinesis Data Firehose.
You can refer to this:
https://aws.amazon.com/blogs/big-data/stream-data-to-an-http-endpoint-with-amazon-kinesis-data-firehose/

Thread model for Async API implementation using Spring

I am working on a microservice developed using Spring Boot. I have implemented the following layers:
Controller layer: Invoked when the user sends an API request
Service layer: Processes the request. Either sends a request to a third-party service or sends a request to the database
Repository layer: Used to interact with the database
Methods in all of the above layers return a CompletableFuture. I have the following questions related to this setup:
Is it good practice to return CompletableFuture from all methods across all layers?
Is it always recommended to use the @Async annotation when using CompletableFuture? What happens when I use the default fork-join pool to process the requests?
How can I configure the threads for the above methods? Would it be a good idea to configure a thread pool per layer? What other configurations can I consider here?
Which metrics should I focus on while optimizing performance for this microservice?
If the work your application is doing can be done on the request thread without too much latency, I would recommend doing it there. You can always move to an async model if you find that your web server is running out of worker threads.
The @Async annotation is basically helping with scheduling. If you can, use it - it can keep the code free of references to the thread pool on which the work will be scheduled. As for what thread actually does your async work, that's really up to you. If you can, use your own pool. That will make sure you can add instrumentation and expose configuration options that you may need once your service is running.
Technically you will have two pools in play. One that Spring will use to consume the result of your future, and another that you will use to do the async work. If I recall correctly, Spring Boot will configure its pool if you don't already have one, and will log a warning if you didn't explicitly configure one. As for your worker threads, start simple. Consider using Spring's ThreadPoolTaskExecutor.
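To illustrate the "use your own pool" advice without any Spring wiring, here is a small sketch in plain Java. The class DedicatedPoolDemo, the thread name prefix, and the pool size are assumptions chosen for the example. In a Spring application a ThreadPoolTaskExecutor bean would typically play the role of the ExecutorService.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class DedicatedPoolDemo {

    // A small fixed pool with named threads so it shows up clearly in logs and thread dumps.
    static ExecutorService newWorkerPool(int size) {
        AtomicInteger counter = new AtomicInteger();
        return Executors.newFixedThreadPool(size, runnable -> {
            Thread t = new Thread(runnable, "service-worker-" + counter.incrementAndGet());
            t.setDaemon(true);
            return t;
        });
    }

    // Runs the (simulated) service call on the given pool and reports which thread did the work.
    static CompletableFuture<String> callService(ExecutorService pool) {
        return CompletableFuture.supplyAsync(() -> Thread.currentThread().getName(), pool);
    }

    public static void main(String[] args) {
        ExecutorService pool = newWorkerPool(4);
        System.out.println("ran on: " + callService(pool).join());
        pool.shutdown();
    }
}
```

If the executor argument is left out, supplyAsync falls back to the JVM-wide ForkJoinPool.commonPool(). That pool is sized for CPU-bound work and shared with everything else in the process, so blocking calls to a database or a third-party service can starve unrelated tasks. Passing an explicit pool keeps that work isolated and configurable.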
Regarding which metrics to monitor, start by choosing how you will monitor. Using something like Spring Cloud Sleuth coupled with Spring Boot Actuator will give you a lot of information out of the box. There are a lot of services that can collect all the metrics Actuator generates into time-series databases that you can then use to analyze performance and get some ideas on what to tweak.
One final recommendation is that Spring WebFlux is designed from the start to be async. It has a learning curve for sure, since reactive code is very different from the usual MVC stuff. However, that framework also addresses all the questions you are asking, so it might be better suited for your application, especially if you want to make everything async by default.

Does Spring Boot with its Blocking IO really fit well with Microservices?

There are a lot of tutorials and articles (including official site) promoting spring boot as a good tool for building microservices.
Let's say we have some rest api endpoint (User profile) which aggregates data from multiple services (User service, Stat service, Friends service).
To achieve this, user profile endpoint makes 3 http calls to those services.
But in Spring, requests are blocking and as I see, the server will quickly run out of available resources (threads) to serve request in such system.
So to me, it seems quite an inefficient way to build such systems (compared to non-blocking frameworks, like Play Framework or Node.js).
Am I missing something?
P.S.: I do not mean here spring 5 with its new webflux framework.
No one prevents you from building an asynchronous microservice architecture with Spring Boot :).
Something along these lines:
Instead of one service calling another synchronously, a service can put events to a queue (e.g. RabbitMQ). The events are delivered to services that subscribe to those events.
Using RabbitMQ and its "exchange" concept, the event producing service doesn't even need to know the consumers of its events.
A blog post detailing this with Spring Boot code can be found here: https://reflectoring.io/event-messaging-with-spring-boot-and-rabbitmq/
This is not a limitation of Spring; rather, it has more to do with the application architecture.
For instance, the scenario that you have is commonly solved using the Aggregator design pattern.
While this solution is quite prevalent, it has the limitation of being synchronous, and thus blocking. Asynchronous behaviour in such scenarios should be implemented in an application-specific way.
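One application-specific way to do this without WebFlux is to fan out the three calls with CompletableFuture and combine the results once all of them are done. The sketch below is plain Java under stated assumptions: ProfileAggregator, UserProfile, and the three Supplier parameters are hypothetical stand-ins for the user, stat, and friends services from the question.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Supplier;

class ProfileAggregator {

    // Hypothetical response type for the aggregated endpoint.
    static final class UserProfile {
        final String user;
        final String stats;
        final List<String> friends;

        UserProfile(String user, String stats, List<String> friends) {
            this.user = user;
            this.stats = stats;
            this.friends = friends;
        }
    }

    // Starts the three lookups concurrently on the given executor and combines the
    // results once all have completed. The calling thread is not blocked, and the
    // total latency is roughly that of the slowest lookup rather than the sum.
    static CompletableFuture<UserProfile> load(
            Supplier<String> userService,
            Supplier<String> statService,
            Supplier<List<String>> friendsService,
            Executor executor) {
        CompletableFuture<String> user = CompletableFuture.supplyAsync(userService, executor);
        CompletableFuture<String> stats = CompletableFuture.supplyAsync(statService, executor);
        CompletableFuture<List<String>> friends =
                CompletableFuture.supplyAsync(friendsService, executor);

        // join() cannot block here because allOf only completes after all three futures have.
        return CompletableFuture.allOf(user, stats, friends)
                .thenApply(ignored -> new UserProfile(user.join(), stats.join(), friends.join()));
    }
}
```

Spring MVC accepts a CompletableFuture as a controller return value, so the request thread can be released while the lookups run. The lookups themselves still occupy threads of the executor while they block, which is the part a fully non-blocking HTTP client would remove.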
Having said that, if you have to call other services in order to serve a response to a request from an outside client, this is typically an architectural problem. It really doesn't matter if you are using HTTP or asynchronous message passing (with a request-reply pattern); the overall response time for the outside client will be bad.
Also, I have seen quite a few applications which use synchronous REST calls for external clients, but when communication is needed between internal microservices, it should always be asynchronous. You can read an interesting paper on this topic here: MicroServices Messaging Patterns

Microservices: Service discovery/ circuit breaker for Event-driven architecture

I'm fairly new to Microservices...
I've taken an interest in learning more about two main patterns like service discovery and circuit breaker and I have conducted research on how these could be implemented.
As a Java Developer, I'm using Spring Boot. From what I understand, these patterns are useful if microservices communicate via HTTP.
One of the topics I've recently seen is the importance of event-driven architecture, which makes use of an event message bus that services send messages to, for other services that subscribe to the bus and process the messages.
Given this event-driven nature, how can service-discovery and circuit breakers be achieved/implemented, given that these are commonly applicable for services communicating via HTTP?
"From what I understand, these patterns are useful if microservices communicate via HTTP."
It is irrelevant that the communication is HTTP. The circuit breaker is useful for preventing cascading failures, which are more likely to occur in architectures that use a synchronous communication style.
Event-driven architectures are in general asynchronous, so cascading failures are less likely to occur.
Service discovery is used so that microservices can discover each other, but in event-driven architectures microservices communicate only with the messaging infrastructure (e.g. the event store in event sourcing), so discoverability could be used only at the infrastructure level.
I. Circuit breaker and service discovery are patterns. Being patterns, they can be implemented in any programming language. The HTTP protocol is only for the transfer of data.
A circuit breaker can be implemented in Java. You can find many implementations (of course, with varying capabilities and interpretations of the pattern) on GitHub.
Some of the well-known, purpose-built implementations are:
Hystrix from NetflixOSS. For using Hystrix, you can follow the Spring Guide - Spring Circuit Breaker
Apache Polygene - which has an example of a JMX circuit breaker
Resilience4j
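To make the pattern itself concrete, here is a deliberately small hand-rolled breaker in plain Java. It is a sketch, not the API of Hystrix or Resilience4j: SimpleCircuitBreaker, its threshold, and its open interval are hypothetical, and the coarse synchronized methods would serialize calls in a real service.

```java
import java.util.function.Supplier;

// Minimal illustration of the circuit breaker pattern (hypothetical class).
class SimpleCircuitBreaker {
    private final int failureThreshold;
    private final long openMillis;
    private int consecutiveFailures = 0;
    private long openedAt = -1;          // -1 means the circuit is closed

    SimpleCircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    // Runs the call unless the circuit is open, in which case the fallback is used immediately.
    synchronized <T> T call(Supplier<T> remoteCall, Supplier<T> fallback) {
        if (isOpen()) {
            return fallback.get();                 // open: fail fast, leave the remote service alone
        }
        try {
            T result = remoteCall.get();           // closed, or a trial call after the open interval
            consecutiveFailures = 0;
            openedAt = -1;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                openedAt = System.currentTimeMillis();   // trip (or re-trip) the breaker
            }
            return fallback.get();
        }
    }

    synchronized boolean isOpen() {
        return openedAt >= 0 && System.currentTimeMillis() - openedAt < openMillis;
    }
}
```

With a threshold of 2, the first two failing calls reach the remote service and return the fallback. From then on, until the open interval has elapsed, calls return the fallback without touching the remote service at all, which is what stops a failure from cascading to callers.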
II. About,
Given this event-driven nature, how can service-discovery and circuit
breakers be achieved/ implemented, given that these are commonly
applicable for services communicating via HTTP?
It seems you need a bit more research on the topic of microservice interactions.
There are two ways in which microservices can interact. You have to choose one over the other; you cannot, and should not, mix both.
Orchestration: An interaction style that has an intelligent controller that dispatches events to processes. Please note the word 'processes' which is representing business processes here. Orchestration style was preferred in old SOA implementations as well.
Choreography: An interaction style that allows processes to subscribe to events and handle them independently or through integration with other processes without the need for a central controller.
These topics are covered well under: Orchestration vs. Choreography
Need of Service Discovery:
With choreography, two or more microservices can coordinate their activities and processes to share information and value.
But these microservices may not be aware of each other's existence, i.e. there are no hard-coded service references or dependency endpoints configured or coded into them. We do this to avoid any kind of coupling between services. So the question remains: how will one service, if required, find another service's endpoint? This is where a service discovery mechanism is used.
Another perspective: when microservices are deployed in containers, their endpoints are not tied to any particular host [due to spin-up and spin-down of containers]. So for this case as well, we need a service discovery mechanism.
So, in a service discovery mechanism, a centralized service discovery tool helps services register themselves and discover other services via a DNS or HTTP interface.
Service discovery can be implemented with
1. Server-side service discovery
2. Client Side service discovery
Consul, etcd, and ZooKeeper are some of the key tools in the service discovery space.
Spring Boot integrates well with Spring Cloud, and Spring Cloud provides Eureka (for service discovery) as well as Hystrix (for the circuit breaker pattern). There is also Spring Cloud Stream, which provides event-driven patterns.
They are very easy to use with Spring Boot.
I believe there is a misunderstanding in the question in that you assume that event-driven architectures cannot be implemented on top of HTTP.
An event-driven architecture may be implemented in many different ways and (when the architecture is that of a distributed system), on top of many different protocols.
It can be implemented using a message broker (e.g. Kafka, RabbitMQ, ActiveMQ, etc.), as you suggested. However, this is just a choice and certainly not the only way to do it.
For example, the seminal book Building Microservices by Sam Newman, in Chapter 4: Integration, under Implementing Asynchronous Event-Based Collaboration says:
“Another approach is to try to use HTTP as a way of propagating
events. ATOM is a REST-compliant specification that defines semantics
(among other things) for publishing feeds of resources. Many client
libraries exist that allow us to create and consume these feeds. So
our customer service could just publish an event to such a feed when
our customer service changes. Our consumers just poll the feed,
looking for changes. On one hand, the fact that we can reuse the
existing ATOM specification and any associated libraries is useful,
and we know that HTTP handles scale very well. However, HTTP is not
good at low latency (where some message brokers excel), and we still
need to deal with the fact that the consumers need to keep track of
what messages they have seen and manage their own polling schedule.
I have seen people spend an age implementing more and more of the
behaviors that you get out of the box with an appropriate message
broker to make ATOM work for some use cases. For example, the
Competing Consumer pattern describes a method whereby you bring up
multiple worker instances to compete for messages, which works well
for scaling up the number of workers to handle a list of independent
jobs. However, we want to avoid the case where two or more workers see
the same message, as we’ll end up doing the same task more than we
need to. With a message broker, a standard queue will handle this.
With ATOM, we now need to manage our own shared state among all the
workers to try to reduce the chances of reproducing effort. If you
already have a good, resilient message broker available to you,
consider using it to handle publishing and subscribing to events. But
if you don’t already have one, give ATOM a look, but be aware of the
sunk-cost fallacy. If you find yourself wanting more and more of the
support that a message broker gives you, at a certain point you might
want to change your approach.”
Likewise, if your design uses a message broker for the event-driven architecture, then I'm not sure a circuit breaker is needed, because in that case the consumer applications control the rate at which event messages are consumed from the queues. The producer application can publish event messages at its own pace, and the consumer applications can add as many competing consumers as they want to keep up with that pace. If the producer application is down, the consumer applications can still continue consuming any remaining messages in the queues, and once the queues are empty, they will just remain waiting for more messages to arrive. None of this puts any burden on the producer application. The producer and the consumer applications are decoupled in this scenario, and the work the circuit breaker does in other scenarios is handled by the message broker.
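The decoupling described above can be sketched with an in-memory queue standing in for the broker. This is plain Java under stated assumptions, not a Kafka or RabbitMQ client: CompetingConsumersDemo is a hypothetical class, and the workers stop once the queue is drained so that the demo terminates, whereas real consumers would keep waiting for new messages.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.LinkedBlockingQueue;

class CompetingConsumersDemo {

    // Publishes the messages to an in-memory queue, lets workerCount workers drain it,
    // and returns how many times each message was delivered.
    static ConcurrentMap<String, Integer> run(List<String> messages, int workerCount) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(messages);  // producer side
        ConcurrentMap<String, Integer> deliveries = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[workerCount];

        for (int i = 0; i < workerCount; i++) {
            workers[i] = new Thread(() -> {
                String message;
                // poll() hands each message to exactly one worker; null means the queue is drained.
                while ((message = queue.poll()) != null) {
                    deliveries.merge(message, 1, Integer::sum);
                }
            }, "worker-" + i);
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return deliveries;
    }
}
```

Given distinct messages, every entry in the returned map has a count of 1 regardless of the number of workers: the queue, not the workers, guarantees that each message is handed out once. The producer side never learns how many consumers exist or how fast they are.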
Something similar can be said of the service discovery feature. Since the producer and the consumer do not talk to each other directly, but only through the message broker, the only service you need to discover is the message broker.

Resources