how to initialize a continous running stream using alpakka, spring boot & Akka-stream? - spring-boot

All,
I am developing an application, which use alpakka spring boot integration to read data from kafka. I have most of the code ready, the only place i am stuck is how to initialize a continuous running stream, as this is going to be a backend application and wont be having any api to be called from ?

As far as I know, Alpakka's Spring integration is basically designed around exposing Akka Streams via a Spring HTTP controller. So I'm not sure what purpose bringing Spring into this serves, since there's quite an impedance mismatch between the way an Akka application will tend to like to work and the way a Spring application will tend to like to work.
Assuming you're talking about using Alpakka Kafka, the most idiomatic thing to do would be to just start a stream fed by an Alpakka Kafka Source in your main method and it will run until killed or it fails. You may want to use a RestartSource around the consumer and business logic to ensure that in the event of failure the stream restarts (note that one should generally expect messages for which the offset commit hadn't happened to be processed again, as Kafka in typical cases can only guarantee at-least-once processing).

Related

Advisable to run a Kafka producer + consumer in same application?

Spring + Apache Kafka noob here. I'm wondering if its advisable to run a single Spring Boot application that handles both producing messages as well as consuming messages.
A lot of the applications I've seen using Kafka lately usually have one separate application send/emit the message to a Kafka topic, and another one that consumes/processes the message from that topic. For larger applications, I can see a case for separate producer and consumer applications, but what about smaller ones?
For example: I'm a simple app that processes HTTP requests => send requests to a third party service, but to ensure retryability, I put the request on a Kafka queue with a service using the #Retryable annotation?
And what other considerations might come into play since it would be on the Spring framework?
Note: As your question states, what'll say is more of an advice based on my beliefs and experience rather than some absolute truth written in stone.
Your use case seems more like a proxy than an actual application with business logic. You should make sure that making this an asynchronous service makes sense - maybe it's good enough to simply hold the connection until you get a response from the 3p, and let your client handle retries if you get an error - of course, you can also retry until some timeout.
This would avoid common asynchronous issues such as making your client need to poll or have a webhook in order to get a result, or making sure a record still makes sense to be processed after a lot of time has elapsed after an outage or a high consumer lag.
If your client doesn't care about the result as long as it gets done, and you don't expect high-throughput on either side, a single Spring Boot application should be enough for handling both producer and consumer sides - while also keeping it simple.
If you do expect high throughput, I'd look into building a WebFlux based application with the reactor-kafka library - high throughput proxies are an excellent use case for reactive applications.
Another option would be having a simple serverless function that handles the http requests and produces the records, and a standard Spring Boot application to consume them.
TBH, I don't see a use case where having two full-fledged java applications to handle a proxy duty would pay off, unless maybe you have a really sound infrastructure to easily manage them that it doesn't make a difference having two applications instead of one and using more resources is not an issue.
Actually, if you expect really high traffic and a serverless function wouldn't work, or maybe you want to stick to Java-based solutions, then you could have a simple WebFlux-based application to handle the http requests and send the messages, and a standard Spring Boot or another WebFlux application to handle consumption. This way you'd be able to scale up the former in order to accommodate the high traffic, and independently scale the later in correspondence with your performance requirements.
As for the retry part, if you stick to non-reactive Spring Kafka applications, you might want to look into the non-blocking retries feature from Spring Kafka. This will enable your consumer application to process other records while waiting to retry a failed one - the #Retryable approach is deprecated in favor of DefaultErrorHandler and both will block consumption while waiting.
Note that with that you lose ordering guarantees, so use it only if the order the requests are processed is not important.

Using rabbitmq to send String/Custom Object from one spring boot application to another

My requirement is to for starters send a string from one spring-boot application to another using AMQP.
I am new to it and I have gone through this spring-boot guide, so i know the basic fundamentals of Queue, Exchange, Binding, Container and listener.
So, above guide shows the steps when amqp is received in same application.
I am a little confused on where to start if I want to achieve above type of communication between 2 different spring-boot applications.
What are the properties needed for that, etc.
Let me know if any details required.
Just divide the application into two:
One without Receiver and ...
Another without Sender
Make sure your application and configuration etc stays the same. With Spring boot's built-in RabbitMQ, you will be able to run it alright.
Next step is to call sender as and when needed from your business logic.

Development compromises in using Spring Cloud Stream

The case for event-driven microservices such as Spring Cloud Stream is their asynchronous nature, which I do agree it makes them more scalable
But I have an issue regarding how to code it in a way where I don't lose certain key features that I have access to using synchronous services
In a servlet-based MS, I make full use of servlet context variables and servlet-based Spring autowiring functions
For e.g., I leverage heavily on HTTP headers to carry metadata between microservices without having to impact the payload. But in Spring Cloud Stream using Kafka, Kafka doesn't support message headers of any kind! I lose that immediately if I use SCS. Putting them into the payload causes all sort of changes in my model classes if I define the attributes clearly. Yes, I can use a simple Hashmap to simulate the HTTP header object but it really seems like reinventing the wheel to me.
On the auto-wiring side: I maintain an audit log record per request, which I implement by declaring a request-scoped Hashmap bean and autowiring it into any methods in the Servlet's call stack that needs to append data to the audit log. Basically it's just a global variable to hold some data within a single request. But in SCS, again, I lose that cos bean scopes that leverage on servlets are not available.
So far, there seems to be a lot of trade-offs that I have to make just to make Spring Cloud Stream work for me.
I thought about an alternative approach where I use SCS just to create an entry point but the Source method would just get the event, use a Processor to construct a HTTP request and send the request along to a HTTP endpoint. But, why go through all that trouble then?
Hoping that some more experienced devs would be able to shed some light on how they leverage on SCS.
#feicipet Thanks for the detailed question. let me try to address some of your concerns in the order you have listed them:
+1
+1
I am not sure why you are referring to it as servlet-based instead of Spring-based? Those are features provided by Spring, but read on. . .
Spring Cloud Stream doesn't use Kafka, the end user does while Spring Cloud Stream provides Kafka binder allowing Spring Cloud Stream to integrate with Kafka. Further more, while Kafka indeed did not support headers prior to version 0.11, Spring Cloud Stream always supported and will continue support headers even with Kafka pre-0.11, embedding them in the Message and then extracting them in the consumer side into the proper Message headers completely transparent to the end user. In other words one would assume that Kafka did support headers by simply using Spring Cloud Stream. With Kafka 0.11+ headers are supported natively and we have adjusted to that with the same level of transparency.
So, you don't need to put anything in the payload. Just create an appropriate Message<payload, headers> and SCSt will take care of the rest regardless of the broker (Kafka, Rabbit, Foo etc.).
Yes you do simply due to the fact that as you eluded earlier SCSt promotes an asynchronous and stateless architecture. However, I do not agree that what you are trying to accomplish is un-accomplishable. Rather it is accomplishable the way you are describing, but there are other way to maintain context and I would be more then glad to discuss it as a separate topic.
I would not call them trade-offs, rather difference in the architecture, that has its benefits, but it is a not one-size-fits-all architecture and therefore its viability should be discussed within the context of a concrete use case.
+1. You don't have to separate it as Source and Processor. You can simply create a custom Source app with exposed REST endpoint and custom processing logic. However we are currently working on enhancements i the framework to ensure that you could do the same with the existing starter apps.
Obviously we have touched on many points here and some of them would probably need to be debated further, but I hope this clears up some of your concerns.
Cheers

SCDF: Can I use an outside microservice as a source?

I am trying to work through a solution where the workflow is like this:
User hits a microservice to upload images
That microservice de-duplicates the image and if it really is new, queues it up for processing
The processing chain lives in Spring Cloud Dataflow
The microservice already exists, and we are trying to extend it to do the fancy processing. My initial cut was to use the Http Source from the sample starter pack since that would be something I didn't have to create. The problem is that the source doesn't register itself with Spring Discovery server, so there is no way to get an end point without making gross assumptions (like it lives on the dataflow server at port XYZ).
We can create a Queue endpoint and send the data directly a Queue source that receives the outside event and forwards it to an SCDF queue.
What would be awesome is if DataFlow could connect the start of the queue for me, without repackaging the microservice as a Source.
The major issue with Spring Data Flow is that it does not automatically start up deployed streams when the server starts up, and we need to be reasonably sure that microservice is always up.
The lifecycle of the server is decoupled from the apps it deploys, that was intentional.
I'm not following your thoughts on how dataflow could connect the start of the queue, but from your description there's a few things you could do:
You would need to modify the app in order to have it registered with eureka, but this is a very simple operation, no more than a few lines of code:
You can either start from a stream app perspective: https://start-scs.cfapps.io/ , select http source, your binder, and then add the spring-cloud-netflix library as well as #EnableDiscoveryClient at the Main boot class
Start with http://start.spring.io Select Stream Rabbit or Stream Kafka, add Web and netflix libraries, then add the #EnableDiscoveryClient and #EnableBinding annotations and create a simple HTTP endpoint for your use case.
In any case should be a small addition.
You can also open an issue at :https://github.com/spring-cloud-stream-app-starters/http/issues suggesting that we add #EnableDiscoveryClient to the http source app, we can take that in consideration on our next iteration as well.
I'll try to clarify few bits.
upload images -> if it really is new -> queues it up for processing
Upon a new upload event, you'd want to process the image. Here's a similar use-case, but more of a real-time streaming style solution. This is not what you're looking to do, but I thought it might be useful.
Porting the image processing code to a Spring Cloud Stream application is as simple as adding #EnableBinding(Processor.class). It is the same business logic - whether you're running it separately or orchestrating it via SCDF, it is still a standalone microservice. However, SCDF expects it to be either a Source, Processor, Sink, or Task application types. We will be opening this up to support any arbitrary "functions" (lambdas) in the future release.
We can create a Queue endpoint and send the data directly a Queue source that receives the outside event and forwards it to an SCDF queue.
This is one of the standard solutions. You can directly consume new events (images) from a queue/topic and process it in the image-processor that we created in previous step. The named-channel support in DSL facilitates just that.
What would be awesome is if DataFlow could connect the start of the queue for me, without repackaging the microservice as a Source.
I'm not sure I understand this. If I were to assume, you're looking for "named-channel" as source and that is supported.
The major issue with Spring Data Flow is that it does not automatically start up deployed streams when the server starts up, and we need to be reasonably sure that microservice is always up.
The moment you deploy a Stream in SCDF, all the individual steps included in the DSL (i.e., stream definition) are resolved and deployed as standalone apps in the target runtime (cloud foundry, kubernetes, etc.,). Once deployed, it is left to the platform where the apps run for lifecycle management. SCDF does not retain or track the app states.

Reactive stream Kafka Stream Fan-out to http actors

I'm very new to Akka Streaming and reactive streaming. I have a question: is it possible to have a rest API receiving a message dropping it on Kafka Bus, and the Kafka streaming consumer then aggregates the messages in a max. time window and retrun the answer back?
How to implement such a system? Or where to start?
Thanks
For a rest API you can consider the Kafka REST Proxy: https://github.com/confluentinc/kafka-rest
Or course you can instead build your own using akka-http and akka-stream-kafka.
As to windowing, I'm sure it can be done in akka streams but personally, I'd suggest using Kafka Streams as the first port of call:
http://docs.confluent.io/current/streams/developer-guide.html#windowing
I'm not sure what exactly you mean by returning the answer back, but if you follow the approach above, you can use use REST Proxy to consume the windowed-aggregated messages or you can build a REST service that queries the Kafka Streams state stores via the so-called "interactive queries". This post shows how to do it using javax.ws.rs: https://www.confluent.io/blog/unifying-stream-processing-and-interactive-queries-in-apache-kafka/ but for a reactive application you can do the same using akka-http instead (I'm implementing this exact thing on one of my projects).

Resources