Development compromises in using Spring Cloud Stream - spring-boot

The case for event-driven microservices such as Spring Cloud Stream is their asynchronous nature, which I do agree it makes them more scalable
But I have an issue regarding how to code it in a way where I don't lose certain key features that I have access to using synchronous services
In a servlet-based MS, I make full use of servlet context variables and servlet-based Spring autowiring functions
For e.g., I leverage heavily on HTTP headers to carry metadata between microservices without having to impact the payload. But in Spring Cloud Stream using Kafka, Kafka doesn't support message headers of any kind! I lose that immediately if I use SCS. Putting them into the payload causes all sort of changes in my model classes if I define the attributes clearly. Yes, I can use a simple Hashmap to simulate the HTTP header object but it really seems like reinventing the wheel to me.
On the auto-wiring side: I maintain an audit log record per request, which I implement by declaring a request-scoped Hashmap bean and autowiring it into any methods in the Servlet's call stack that needs to append data to the audit log. Basically it's just a global variable to hold some data within a single request. But in SCS, again, I lose that cos bean scopes that leverage on servlets are not available.
So far, there seems to be a lot of trade-offs that I have to make just to make Spring Cloud Stream work for me.
I thought about an alternative approach where I use SCS just to create an entry point but the Source method would just get the event, use a Processor to construct a HTTP request and send the request along to a HTTP endpoint. But, why go through all that trouble then?
Hoping that some more experienced devs would be able to shed some light on how they leverage on SCS.

#feicipet Thanks for the detailed question. let me try to address some of your concerns in the order you have listed them:
+1
+1
I am not sure why you are referring to it as servlet-based instead of Spring-based? Those are features provided by Spring, but read on. . .
Spring Cloud Stream doesn't use Kafka, the end user does while Spring Cloud Stream provides Kafka binder allowing Spring Cloud Stream to integrate with Kafka. Further more, while Kafka indeed did not support headers prior to version 0.11, Spring Cloud Stream always supported and will continue support headers even with Kafka pre-0.11, embedding them in the Message and then extracting them in the consumer side into the proper Message headers completely transparent to the end user. In other words one would assume that Kafka did support headers by simply using Spring Cloud Stream. With Kafka 0.11+ headers are supported natively and we have adjusted to that with the same level of transparency.
So, you don't need to put anything in the payload. Just create an appropriate Message<payload, headers> and SCSt will take care of the rest regardless of the broker (Kafka, Rabbit, Foo etc.).
Yes you do simply due to the fact that as you eluded earlier SCSt promotes an asynchronous and stateless architecture. However, I do not agree that what you are trying to accomplish is un-accomplishable. Rather it is accomplishable the way you are describing, but there are other way to maintain context and I would be more then glad to discuss it as a separate topic.
I would not call them trade-offs, rather difference in the architecture, that has its benefits, but it is a not one-size-fits-all architecture and therefore its viability should be discussed within the context of a concrete use case.
+1. You don't have to separate it as Source and Processor. You can simply create a custom Source app with exposed REST endpoint and custom processing logic. However we are currently working on enhancements i the framework to ensure that you could do the same with the existing starter apps.
Obviously we have touched on many points here and some of them would probably need to be debated further, but I hope this clears up some of your concerns.
Cheers

Related

Advisable to run a Kafka producer + consumer in same application?

Spring + Apache Kafka noob here. I'm wondering if its advisable to run a single Spring Boot application that handles both producing messages as well as consuming messages.
A lot of the applications I've seen using Kafka lately usually have one separate application send/emit the message to a Kafka topic, and another one that consumes/processes the message from that topic. For larger applications, I can see a case for separate producer and consumer applications, but what about smaller ones?
For example: I'm a simple app that processes HTTP requests => send requests to a third party service, but to ensure retryability, I put the request on a Kafka queue with a service using the #Retryable annotation?
And what other considerations might come into play since it would be on the Spring framework?
Note: As your question states, what'll say is more of an advice based on my beliefs and experience rather than some absolute truth written in stone.
Your use case seems more like a proxy than an actual application with business logic. You should make sure that making this an asynchronous service makes sense - maybe it's good enough to simply hold the connection until you get a response from the 3p, and let your client handle retries if you get an error - of course, you can also retry until some timeout.
This would avoid common asynchronous issues such as making your client need to poll or have a webhook in order to get a result, or making sure a record still makes sense to be processed after a lot of time has elapsed after an outage or a high consumer lag.
If your client doesn't care about the result as long as it gets done, and you don't expect high-throughput on either side, a single Spring Boot application should be enough for handling both producer and consumer sides - while also keeping it simple.
If you do expect high throughput, I'd look into building a WebFlux based application with the reactor-kafka library - high throughput proxies are an excellent use case for reactive applications.
Another option would be having a simple serverless function that handles the http requests and produces the records, and a standard Spring Boot application to consume them.
TBH, I don't see a use case where having two full-fledged java applications to handle a proxy duty would pay off, unless maybe you have a really sound infrastructure to easily manage them that it doesn't make a difference having two applications instead of one and using more resources is not an issue.
Actually, if you expect really high traffic and a serverless function wouldn't work, or maybe you want to stick to Java-based solutions, then you could have a simple WebFlux-based application to handle the http requests and send the messages, and a standard Spring Boot or another WebFlux application to handle consumption. This way you'd be able to scale up the former in order to accommodate the high traffic, and independently scale the later in correspondence with your performance requirements.
As for the retry part, if you stick to non-reactive Spring Kafka applications, you might want to look into the non-blocking retries feature from Spring Kafka. This will enable your consumer application to process other records while waiting to retry a failed one - the #Retryable approach is deprecated in favor of DefaultErrorHandler and both will block consumption while waiting.
Note that with that you lose ordering guarantees, so use it only if the order the requests are processed is not important.

Thread model for Async API implementation using Spring

I am working on the micro-service developed using Spring Boot . I have implemented following layers:
Controller layer: Invoked when user sends API request
Service layer: Processes the request. Either sends request to third-part service or sends request to database
Repository layer: Used to interact with the
database
.
Methods in all of above layers returns the CompletableFuture. I have following questions related to this setup:
Is it good practice to return Completable future from all methods across all layers?
Is it always recommended to use #Async annotation when using CompletableFuture? what happens when I use default fork-join pool to process the requests?
How can I configure the threads for above methods? Will it be a good idea to configure the thread pool per layer? what are other configurations I can consider here?
Which metrics I should focus while optimizing performance for this micro-service?
If the work your application is doing can be done on the request thread without too much latency, I would recommend it. You can always move to an async model if you find that your web server is running out of worker threads.
The #Async annotation is basically helping with scheduling. If you can, use it - it can keep the code free of the references to the thread pool on which the work will be scheduled. As for what thread actually does your async work, that's really up to you. If you can, use your own pool. That will make sure you can add instrumentation and expose configuration options that you may need once your service is running.
Technically you will have two pools in play. One that Spring will use to consume the result of your future, and another that you will use to do the async work. If I recall correctly, Spring Boot will configure its pool if you don't already have one, and will log a warning if you didn't explicitly configure one. As for your worker threads, start simple. Consider using Spring's ThreadPoolTaskExecutor.
Regarding which metrics to monitor, start first by choosing how you will monitor. Using something like Spring Sleuth coupled with Spring Actuator will give you a lot of information out of the box. There are a lot of services that can collect all the metrics actuator generates into time-based databases that you can then use to analyze performance and get some ideas on what to tweak.
One final recommendation is that Spring's Web Flux is designed from the start to be async. It has a learning curve for sure since reactive code is very different from the usual MVC stuff. However, that framework is also thinking about all the questions you are asking so it might be better suited for your application, specially if you want to make everything async by default.

Jersey for desktop application [duplicate]

I have a small Java SE application, it´s practically a fat client sitting on top of a database. To further my Java skills, I decided to make a client-server application out of it. The server application communicates with the database and handles all kinds of lengthy operations, while the client application only receives the results, mostly ArrayLists of moderate length and primitives.
To this end, I started reading on RMI and did the Oracle tutorial, which I found surprisingly difficult to understand and even get to work.
Is there anything else I can use instead of RMI, without having to dive into JavaEE?
One way I would suggest would be to use JSON as the format for data exchange. You can use GSON to convert the data from Java objects to JSON and back. The transport could be done directly on the HTTP protocol using REST. You can use Jersey as a REST server/client or roll your own (since you don't want to use JEE, which Jersey is part of).
SIMON is as simple to use as RMI, but with fewer traps in the initial setup. It also has some advantages over RMI. Here is a simple hello-world example:
http://dev.root1.de/projects/simon/wiki/Sample_helloworld110
I take it RMI = Remote Method Invocation ...
You can have a look at XMLRPC, JSONRPC, JMS, or if you want to roll your own, use JSON to POST messages and convert the JSON back to a java object on the other side using GSON (from Google) or setup RabbitMQ and use AMQP to submit and receive messages if you don't want to handle the POSTing yourself, Spring AMQP makes it fairly easy to do this.

Is it possible to get exactly once processing with Spring Cloud Stream?

Currently I'm using SCS with almost default configuration for sending and receiving message between microservices.
Somehow I've read this
https://www.confluent.io/blog/enabling-exactly-kafka-streams
and wonder that it is gonna works or not if we just put the property called "processing.guarantee" with value "exactly-once" there through properties in Spring boot application ?
In the context of your question you should look at Spring Cloud Stream as just a delegate between target system (e.g., Kafka) and your code. The binders that enable such delegation are usually implemented in such way that they propagate whatever functionality supported by the target system.

Does Spring Boot with its Blocking IO really fit well with Microservices?

There are a lot of tutorials and articles (including official site) promoting spring boot as a good tool for building microservices.
Let's say we have some rest api endpoint (User profile) which aggregates data from multiple services (User service, Stat service, Friends service).
To achieve this, user profile endpoint makes 3 http calls to those services.
But in Spring, requests are blocking and as I see, the server will quickly run out of available resources (threads) to serve request in such system.
So to me, it as quite inefficient way to build such systems (compared to non-blocking frameworks, like play! framework or node.js)
Do I miss something?
P.S.: I do not mean here spring 5 with its new webflux framework.
No one prevents you from building an asynchronous microservice architecture with Spring Boot :).
Something along these lines:
Instead of one service calling another synchronously, a service can put events to a queue (e.g. RabbitMQ). The events are delivered to services that subscribe to those events.
Using RabbitMQ and its "exchange" concept, the event producing service doesn't even need to the consumers of its events.
A blog post detailing this with Spring Boot code can be found here: https://reflectoring.io/event-messaging-with-spring-boot-and-rabbitmq/
This is not a limitation of Spring rather it is more to do with the Application Architecture.
For instance, the scenario that you have is commonly solved using Aggregate Design Pattern
While this solution is quite prevalent,it has the limitation of being synchronous, and thus blocking. Asynchronous behaviour in such scenarios should be implemented in an application specific way.
Having said that if you have to call other services in order to be able to serve a response to a request from a client(outside), this is typically an architectural problem. It really doesn’t matter if you are using HTTP or asynchronous message passing (with a request-reply pattern), the overall response time for the outside client will be bad
Also, I have seen quite a few applications which uses synchronous REST calls for external clients, but when communication is needed between internal MicroServices, it should always be asynchronous. You can read an interesting paper on this topic here MicroServices Messaging Patterns

Resources