Access Spring Cloud Stream Kafka materialized view with external client - apache-kafka-streams

Let's pretend that my Java application using Spring Cloud Stream with Kafka creates a materialized view (KTable) that is used to fulfil get(id) requests.
How can I see the data from "outside" if I want to do some maintenance or troubleshooting? Is there a way to explore or reprocess the collected data by connecting directly to the materialized view (like assigning to a topic) from another, external application and converting it to a stream? One option is creating a specialized REST API that maps requests to materialized-view methods. Is there a better, more efficient way? Does Spring Cloud Stream provide such a thing? Maybe a tool?
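For the REST-API option mentioned above, a minimal sketch using the Kafka Streams binder's InteractiveQueryService could look like the following; the store name "products-store", the String value type, and the endpoint path are assumptions for illustration:

    import org.apache.kafka.streams.state.QueryableStoreTypes;
    import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;
    import org.springframework.cloud.stream.binder.kafka.streams.InteractiveQueryService;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.PathVariable;
    import org.springframework.web.bind.annotation.RestController;

    @RestController
    public class MaterializedViewController {

        private final InteractiveQueryService queryService;

        public MaterializedViewController(InteractiveQueryService queryService) {
            this.queryService = queryService;
        }

        // "products-store" is a hypothetical name; it must match the name used
        // when the KTable was materialized.
        @GetMapping("/products/{id}")
        public String get(@PathVariable String id) {
            ReadOnlyKeyValueStore<String, String> store =
                    queryService.getQueryableStore("products-store",
                            QueryableStoreTypes.keyValueStore());
            return store.get(id);
        }
    }

An external application can then hit this endpoint for troubleshooting; for reprocessing, the view's changelog topic can also be consumed directly with a plain Kafka consumer.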

Related

Calling Hibernate in Spring Cloud Stream

I'm new to Spring Cloud Stream.
Say I have a Spring Cloud Stream app that listens to some topic from Kafka using @StreamListener("input-channel").
I want to do some calculation and send the result to another topic, but in the middle of the processing I also need to call Hibernate (via Spring Data JPA) to persist some data to my MySQL database.
Is it valid to call Hibernate in the middle of stream processing? Is there another pattern for doing this?
Yes, it's a database call, so why not? People do it all the time.
Also, @StreamListener has been deprecated for 3 years now and has already been removed from the new versions, so please transition to the functional programming model.
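A minimal sketch of the functional equivalent, with a JPA call in the middle of processing; AuditRecord and AuditRepository are hypothetical stand-ins for your own Spring Data JPA entity and repository, and the binding follows the process-in-0/process-out-0 naming convention:

    import java.util.function.Function;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    @Configuration
    public class ProcessingConfig {

        // Replaces @StreamListener("input-channel"): Spring Cloud Stream binds
        // this function's input and output to the process-in-0 and
        // process-out-0 destinations.
        @Bean
        public Function<String, String> process(AuditRepository repository) {
            return payload -> {
                repository.save(new AuditRecord(payload)); // persist via JPA mid-stream
                return payload.toUpperCase();              // stand-in for the real calculation
            };
        }
    }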

Logstash vs Spring Cloud Data Flow: which one is suitable for data preprocessing?

I'm using Spring Boot along with Elasticsearch to build a search system for my website.
I have some data that I need to push into Elasticsearch. This data (a product, for example) must be processed first: passed to another microservice that filters the JSON, adds some fields for better search results, does some calculations, and returns the object I want to store. Is it possible to do this with Logstash, or do I need to use Spring Cloud Data Flow?
What I want to do:
save a product (product service)
log the saved product or stream it
process it before storage (another service)
save the document (Elasticsearch server)
Thanks in advance.
Obviously it depends on various factors, but I can try to provide some insights on Spring Cloud Data Flow from a technical standpoint.
If you want to construct a streaming pipeline where your filtering apps are connected via a messaging system that handles this flow of data processing, you can check out Spring Cloud Data Flow.
Spring Cloud Data Flow (and the underlying framework support such as Spring Cloud Stream and Spring Cloud Task) provides operational benefits for managing your streaming pipelines, but it may not make sense if you don't need a data pipeline with a messaging system. In that case, you would just stick to a simple Spring Boot app that does this whole filtering model. As soon as you start distributing these applications loosely coupled via a messaging system, Spring Cloud Data Flow becomes handy.
Please check out the SCDF guide to understand some of the features and recipes, learn more about what SCDF can offer, and choose what fits your case.
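As an illustration, the "process it before storage" step could be a small Spring Cloud Stream processor like the sketch below, which SCDF could then compose into a pipeline such as product-source | enrich | elasticsearch-sink (all names here are hypothetical, as are the Map-based product and the specific field manipulations):

    import java.util.HashMap;
    import java.util.Map;
    import java.util.function.Function;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    @Configuration
    public class EnrichmentProcessor {

        // Filters and augments the incoming product (deserialized JSON as a Map)
        // before a downstream Elasticsearch sink indexes it.
        @Bean
        public Function<Map<String, Object>, Map<String, Object>> enrich() {
            return product -> {
                Map<String, Object> enriched = new HashMap<>(product);
                enriched.remove("internalNotes"); // drop fields not needed for search
                enriched.put("searchBoost", 1.0); // add a field for better ranking
                return enriched;
            };
        }
    }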

Spring Cloud Stream - query topic without consuming a KTable/KStream explicitly?

I'm using Spring Cloud Stream library in a Java application. I want to use the Kafka Streams binder for a state store. The application will post messages to a topic, and I wish to use the Kafka Streams InteractiveQueryService to retrieve data from the same topic. Is it possible to perform such queries as-is, or do I need to first consume the topic as a KTable/KStream and materialize it before I can perform queries? I don't have any requirement to perform KTable/KStream processing on the topic, I just want to query the topic contents. I'm hoping there is some way to implicitly materialize it as a state store.
Interactive Queries is a feature that allows you to query client-side state stores. It is not a feature that allows you to query topics.
Hence, if you have data in a topic that you want to query using Interactive Queries, you need to load that data into a state store within Kafka Streams.
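A minimal sketch of that loading step with the Kafka Streams binder: merely consuming the topic as a KTable materializes it into a store that InteractiveQueryService can query. The binding name table-in-0, the store name my-store, and the String key/value types are assumptions:

    import java.util.function.Consumer;
    import org.apache.kafka.streams.kstream.KTable;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    @Configuration
    public class TableMaterializer {

        // No explicit processing is needed; consuming the KTable builds the store.
        // Assumed configuration in application.yml:
        //   spring.cloud.stream.bindings.table-in-0.destination: my-topic
        //   spring.cloud.stream.kafka.streams.bindings.table-in-0.consumer.materializedAs: my-store
        @Bean
        public Consumer<KTable<String, String>> table() {
            return t -> { };
        }
    }

The store can then be fetched with queryService.getQueryableStore("my-store", QueryableStoreTypes.keyValueStore()).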

Azure alternative to Spring Cloud Data Flow process

I'm looking for the Azure alternative to the Data Flow model of data source-processor-sink.
I want the three entities to be separate microservices and to use messaging as the link between them.
Basically, the source app takes the data from another service and sends it to the processor, while the processor app acts on it and sends a relevant notification/alert to the sink.
I'm aware I can use RabbitMQ for the messaging, but I need to know which one will be better in Azure: Service Bus topics or Event Hubs? And how can I use them?
At the moment, there isn't a Spring Cloud Stream binder implementation for Azure Event Hubs.
Until we have one, neither the out-of-the-box nor custom apps can be built as messaging microservices, where Spring Cloud Stream provides the programming model and Spring Cloud Data Flow lets you orchestrate the individual microservices into a data pipeline (i.e., source-processor-sink) via the DSL or drag-and-drop GUI.
Microsoft was exploring a binder implementation in the past; possibly it will end up in the Azure Spring Boot project. Feel free to drop an issue on their backlog.

Development compromises in using Spring Cloud Stream

The case for event-driven microservices such as Spring Cloud Stream is their asynchronous nature, which I agree makes them more scalable.
But I have an issue regarding how to code them in a way where I don't lose certain key features that I have access to in synchronous services.
In a servlet-based MS, I make full use of servlet context variables and servlet-based Spring autowiring functions
For example, I lean heavily on HTTP headers to carry metadata between microservices without having to touch the payload. But in Spring Cloud Stream using Kafka, Kafka doesn't support message headers of any kind! I lose that immediately if I use SCS. Putting them into the payload causes all sorts of changes in my model classes if I define the attributes explicitly. Yes, I can use a simple HashMap to simulate the HTTP header object, but it really seems like reinventing the wheel to me.
On the autowiring side: I maintain an audit log record per request, which I implement by declaring a request-scoped HashMap bean and autowiring it into any method in the servlet's call stack that needs to append data to the audit log. Basically, it's just a global variable to hold some data within a single request. But in SCS, again, I lose that because bean scopes that rely on servlets are not available.
So far, there seem to be a lot of trade-offs I have to make just to get Spring Cloud Stream to work for me.
I thought about an alternative approach where I use SCS just to create an entry point, but the Source method would just get the event and use a Processor to construct an HTTP request and send it along to an HTTP endpoint. But why go through all that trouble then?
Hoping that some more experienced devs can shed some light on how they leverage SCS.
@feicipet Thanks for the detailed question. Let me try to address some of your concerns in the order you have listed them:
1. +1
2. +1
3. I am not sure why you are referring to it as servlet-based instead of Spring-based; those are features provided by Spring, but read on...
4. Spring Cloud Stream doesn't use Kafka; the end user does, while Spring Cloud Stream provides a Kafka binder that allows Spring Cloud Stream to integrate with Kafka. Furthermore, while Kafka indeed did not support headers prior to version 0.11, Spring Cloud Stream has always supported and will continue to support headers even with pre-0.11 Kafka, embedding them in the Message and then extracting them on the consumer side into the proper Message headers, completely transparently to the end user. In other words, one could assume Kafka supported headers simply by using Spring Cloud Stream. With Kafka 0.11+, headers are supported natively, and we have adjusted to that with the same level of transparency. So you don't need to put anything in the payload. Just create an appropriate Message<payload, headers> and SCSt will take care of the rest regardless of the broker (Kafka, Rabbit, Foo, etc.); see the sketch after this list.
5. Yes, you do, simply because, as you alluded to earlier, SCSt promotes an asynchronous and stateless architecture. However, I do not agree that what you are trying to accomplish is unaccomplishable; rather, it is not accomplishable the way you are describing, but there are other ways to maintain context, and I would be more than glad to discuss that as a separate topic.
6. I would not call them trade-offs, rather differences in architecture that have their benefits; but it is not a one-size-fits-all architecture, and therefore its viability should be discussed within the context of a concrete use case.
7. +1. You don't have to separate it into a Source and a Processor. You can simply create a custom Source app with an exposed REST endpoint and custom processing logic (see the sketch after this list). However, we are currently working on enhancements in the framework to ensure that you could do the same with the existing starter apps.
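A minimal sketch covering points 4 and 7 together, assuming the newer functional model with StreamBridge (Spring Cloud Stream 3.x); the binding name ingest-out-0, the endpoint path, and the header name are hypothetical:

    import org.springframework.cloud.stream.function.StreamBridge;
    import org.springframework.messaging.support.MessageBuilder;
    import org.springframework.web.bind.annotation.PostMapping;
    import org.springframework.web.bind.annotation.RequestBody;
    import org.springframework.web.bind.annotation.RequestHeader;
    import org.springframework.web.bind.annotation.RestController;

    // A custom Source app: HTTP in, message out, with metadata carried as
    // Message headers rather than in the payload.
    @RestController
    public class IngestController {

        private final StreamBridge streamBridge;

        public IngestController(StreamBridge streamBridge) {
            this.streamBridge = streamBridge;
        }

        @PostMapping("/ingest")
        public void ingest(@RequestBody String payload,
                           @RequestHeader(value = "x-correlation-id", required = false) String correlationId) {
            // The binder maps this header to a native Kafka header (0.11+) or
            // embeds it transparently for older brokers, as described in point 4.
            streamBridge.send("ingest-out-0",
                    MessageBuilder.withPayload(payload)
                            .setHeader("x-correlation-id", correlationId)
                            .build());
        }
    }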
Obviously we have touched on many points here and some of them would probably need to be debated further, but I hope this clears up some of your concerns.
Cheers
