Azure alternative to Spring Cloud Data Flow process - spring

I'm looking for the Azure alternative to the Data Flow model of source-processor-sink.
I want the three entities to be separate microservices, and I want to use messaging as the link between them.
Basically, the source app takes data from another service and sends it to the processor, while the processor app acts on it and sends the relevant notification/alert to the sink.
I'm aware I can use RabbitMQ for the messaging, but I need to know which one would be better on Azure - Service Bus topics or Event Hubs? And how can I use them?

At the moment, there isn't a Spring Cloud Stream binder implementation for Azure Event Hubs.
Unless we have this, the out-of-the-box or custom apps cannot be built as messaging microservices, where Spring Cloud Stream provides the programming model and Spring Cloud Data Flow lets you orchestrate the individual microservices into a data pipeline (i.e., source-processor-sink) via the DSL or the drag-and-drop GUI.
Microsoft was exploring a binder implementation in the past; possibly it will end up in the Azure Spring Boot project. Feel free to drop an issue on their backlog.

Related

How to update data in real-time

I have a small stock-market application built with Spring Boot, and when any product is updated I want to serve the updated product to clients in real time.
Does it make sense to use message queues like RabbitMQ together with SSE (Server-Sent Events) for this, or is there a more sensible solution?
Solution
Publish your updated data to some channel.
Your clients should subscribe to that channel to get the updated feed in real time.
Tools
Use an in-house setup of RabbitMQ, ActiveMQ, Kafka or another open-source broker and implement WebSocket (for front-end applications).
Use a commercial service like Google Cloud Pub/Sub.
Use a ready-made, fully packaged solution with supported SDKs for backend and frontend, e.g. https://www.pubnub.com/.
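For the browser-facing side, here is a minimal sketch of the "clients subscribe to a channel" idea using Spring MVC's SseEmitter. The class, endpoint, and event names are illustrative, and the broker listener that would call onProductUpdate (e.g. a @RabbitListener or @KafkaListener method) is assumed to exist elsewhere:

```java
import java.io.IOException;
import java.util.Set;
import java.util.concurrent.CopyOnWriteArraySet;

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.servlet.mvc.method.annotation.SseEmitter;

@RestController
public class ProductFeedController {

    // Active SSE connections; each subscribed client holds one open emitter.
    private final Set<SseEmitter> emitters = new CopyOnWriteArraySet<>();

    // Clients subscribe here, e.g. new EventSource("/products/stream") in the browser.
    @GetMapping("/products/stream")
    public SseEmitter subscribe() {
        SseEmitter emitter = new SseEmitter(0L); // no timeout
        emitters.add(emitter);
        emitter.onCompletion(() -> emitters.remove(emitter));
        emitter.onTimeout(() -> emitters.remove(emitter));
        return emitter;
    }

    // Call this from your broker listener whenever a product update arrives on the channel.
    public void onProductUpdate(String productJson) {
        for (SseEmitter emitter : emitters) {
            try {
                emitter.send(SseEmitter.event().name("product-update").data(productJson));
            } catch (IOException ex) {
                emitters.remove(emitter); // drop clients that have gone away
            }
        }
    }
}
```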
For this you can use any of the following:
Spring Integration
WebSockets
JMS
Spring Integration is an implementation of the Enterprise Integration Patterns and is ideal for asynchronously processing data in real time.
However, looking at your scope, it is only about the publisher-subscriber pattern, so it can be solved with JMS alone.
With JMS, subscribers/consumers can register and de-register dynamically. It also provides ways to implement fall-backs and tracking.
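A rough sketch of the JMS pub/sub approach with Spring Boot, assuming a JMS broker such as ActiveMQ is on the classpath; the destination and class names are illustrative:

```java
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.jms.annotation.JmsListener;
import org.springframework.jms.core.JmsTemplate;
import org.springframework.stereotype.Component;

@Component
public class StockUpdatePublisher {

    private final JmsTemplate jmsTemplate;

    @Autowired
    public StockUpdatePublisher(JmsTemplate jmsTemplate) {
        this.jmsTemplate = jmsTemplate;
        // Publish to a topic (pub/sub) rather than a queue (point-to-point).
        this.jmsTemplate.setPubSubDomain(true);
    }

    public void publish(String productUpdateJson) {
        jmsTemplate.convertAndSend("product-updates", productUpdateJson);
    }
}

@Component
class StockUpdateSubscriber {

    // Subscribers register/de-register dynamically; each listener container is one consumer.
    // For topic consumption, set spring.jms.pub-sub-domain=true (or use a pub/sub container factory).
    @JmsListener(destination = "product-updates")
    public void onUpdate(String productUpdateJson) {
        // Push to the client here, e.g. via the SSE endpoint sketched above.
        System.out.println("Received update: " + productUpdateJson);
    }
}
```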

Spring Data Flow and GCP Pub/Sub

I'm building an event-driven microservice architecture, which is supposed to be cloud-agnostic (as much as possible). Since this is initially going into GCP and I don't want to spend a long time on configuration and all that, I was going to use GCP's Pub/Sub directly for the event queue and take care of other cloud implementations later, but then I came across Spring Cloud Data Flow, which seemed nice because these are Spring Boot microservices and I needed a way to orchestrate them.
Does Spring Cloud Data Flow support Pub/Sub as its event queue?
Would it make my life easier in terms of configuration and setup to go down that path, rather than choosing a non-native broker?
It'd be useful first to unpack Spring Cloud Stream's "binder abstraction": by using this framework, you end up with a portable event-driven streaming application that can run locally on your laptop or in any cloud of your choice, against the desired message broker.
Learn more about the binder abstraction here. Here are all the available binder implementations to choose from. Google Pub/Sub is an option, and its binder is maintained by Google here.
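In practice, switching brokers is a dependency swap rather than a code change. For example, assuming Maven and Spring Cloud GCP 1.x artifact coordinates (check the project for the current ones):

```xml
<!-- Swap the message broker by swapping the binder dependency; the
     application code written against Spring Cloud Stream stays the same. -->

<!-- Google Cloud Pub/Sub binder (maintained in the Spring Cloud GCP project) -->
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-gcp-pubsub-stream-binder</artifactId>
</dependency>

<!-- ...or, for local development, the RabbitMQ binder instead: -->
<!--
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-stream-binder-rabbit</artifactId>
</dependency>
-->
```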
Now, let's talk about Spring Cloud Data Flow (SCDF). Once you have built the streaming applications, you can use SCDF to design and create a data pipeline made of such applications. There's also the option to mix in and reuse the collection of utility applications that we build, maintain, and release. The utility applications can be packaged with the Google Pub/Sub binder or any other binder. More details here.
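For illustration, a pipeline stitched from the out-of-the-box http, transform, and log apps could be defined and deployed from the SCDF shell like this (the stream name is arbitrary):

```
dataflow:> stream create --name events --definition "http | transform | log"
dataflow:> stream deploy --name events
```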
When you deploy the data pipeline, SCDF resolves and downloads the individual applications and deploys them natively on platforms like Kubernetes or Cloud Foundry. We have users doing the same on a variety of cloud infrastructure (VMs, bare metal, EC2, Rackspace, etc.), including DIY platforms, too.
While automating the deployment of the applications, SCDF also automates the configuration setup, based on naming conventions derived from the combination of stream/task and application names. When the apps bootstrap, they will have automatically received their connection configuration (from SCDF) as well as the destination/topic to connect to, along with the other metadata needed to reason about a collection of apps as a "stream" or a "task/batch" data pipeline. This allows you to monitor and manage the pipelines centrally.
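Roughly speaking, for a stream named events defined as http | transform | log, the http app would receive binding properties along these lines at deployment time (an illustration of the naming convention; exact property names vary by SCDF/Stream version):

```properties
# Output binding of the http app; the downstream app consumes from the same destination
spring.cloud.stream.bindings.output.destination=events.http
# The next app (transform) would receive, roughly:
# spring.cloud.stream.bindings.input.destination=events.http
# spring.cloud.stream.bindings.input.group=events
```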
Lastly, there's the native ability in SCDF to rolling-upgrade or rolling-downgrade one or many applications in a data pipeline without impacting the upstream or downstream consumers in production. More details here. There's also a webinar recording (the demo starts at ~41:25) on how to do this with CI/CD automation.

Checkpointing with Spring AWS Integration

According to the Spring release notes, spring-integration-aws 1.1.0.M1 does not include a DynamoDB MetadataStore implementation. There is still the ConcurrentMetadataStore class, which is a key-value based store, and based on the implementation I suppose it maps streams to the latest sequence number read. But it does not use any data store to persist and retrieve checkpoints.
I am using Spring Integration for Kinesis consumption and need to implement checkpointing. I am wondering whether I need to do it manually by connecting to DynamoDB and updating checkpoints myself, or whether there is another way of doing it within the Spring framework.
P.S.: I can't use the Spring Cloud KinesisBinderConfiguration because I dynamically consume events from a configurable list of streams.
Thank you
If you are not talking about Spring Cloud Stream and the AWS Kinesis binder implementation, then I don't see any blockers for you to upgrade your solution to Spring Integration AWS 2.0 and go ahead with the already-provided DynamoDbMetaDataStore.
Or, if moving to Spring Integration 5.0 is too hard for you, you can simply copy the implementation into your own class and inject it into the KinesisMessageDrivenChannelAdapter: https://github.com/spring-projects/spring-integration-aws/blob/master/src/main/java/org/springframework/integration/aws/metadata/DynamoDbMetaDataStore.java
Although it is really available in the 1.1.0.RELEASE already - I don't see a reason for you to stick with 1.1.0.M1: https://spring.io/blog/2017/11/27/spring-integration-for-aws-1-1-ga-available
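For reference, a rough sketch of plugging DynamoDbMetaDataStore into the adapter; the AWS client beans, table, and stream names are placeholders, and exact constructor signatures may differ between releases:

```java
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBAsync;
import com.amazonaws.services.kinesis.AmazonKinesis;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.aws.inbound.kinesis.KinesisMessageDrivenChannelAdapter;
import org.springframework.integration.aws.metadata.DynamoDbMetaDataStore;
import org.springframework.integration.channel.DirectChannel;
import org.springframework.messaging.MessageChannel;

@Configuration
public class KinesisCheckpointConfig {

    @Bean
    public DynamoDbMetaDataStore checkpointStore(AmazonDynamoDBAsync dynamoDb) {
        // Persists shard sequence numbers (checkpoints) in the given DynamoDB table
        return new DynamoDbMetaDataStore(dynamoDb, "my-checkpoint-table");
    }

    @Bean
    public MessageChannel kinesisChannel() {
        return new DirectChannel();
    }

    @Bean
    public KinesisMessageDrivenChannelAdapter kinesisAdapter(AmazonKinesis kinesis,
            DynamoDbMetaDataStore checkpointStore) {
        // The stream names could come from configuration instead of being hard-coded
        KinesisMessageDrivenChannelAdapter adapter =
                new KinesisMessageDrivenChannelAdapter(kinesis, "stream-1", "stream-2");
        adapter.setCheckpointStore(checkpointStore);
        adapter.setOutputChannel(kinesisChannel());
        return adapter;
    }
}
```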

SCDF: Can I use an outside microservice as a source?

I am trying to work through a solution where the workflow is like this:
User hits a microservice to upload images
That microservice de-duplicates the image and if it really is new, queues it up for processing
The processing chain lives in Spring Cloud Dataflow
The microservice already exists, and we are trying to extend it to do the fancy processing. My initial cut was to use the Http Source from the sample starter pack, since that would be something I didn't have to create. The problem is that the source doesn't register itself with the Spring discovery server, so there is no way to get an endpoint without making gross assumptions (like that it lives on the Data Flow server at port XYZ).
We could create a Queue endpoint and send the data directly to a Queue source that receives the outside event and forwards it to an SCDF queue.
What would be awesome is if DataFlow could connect the start of the queue for me, without repackaging the microservice as a Source.
The major issue with Spring Cloud Data Flow is that it does not automatically start up deployed streams when the server starts up, and we need to be reasonably sure that the microservice is always up.
The lifecycle of the server is decoupled from the apps it deploys; that was intentional.
I'm not following your thoughts on how Data Flow could connect the start of the queue, but from your description there are a few things you could do:
You would need to modify the app in order to have it registered with Eureka, but this is a very simple operation, no more than a few lines of code. You can either:
Start from a stream-app perspective: https://start-scs.cfapps.io/ , select the http source and your binder, and then add the spring-cloud-netflix library as well as @EnableDiscoveryClient on the main Boot class; or
Start with http://start.spring.io, select Stream Rabbit or Stream Kafka, add the Web and Netflix libraries, then add the @EnableDiscoveryClient and @EnableBinding annotations and create a simple HTTP endpoint for your use case.
In either case it should be a small addition; a rough sketch of the second option follows below.
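A minimal sketch of that second option (all names are illustrative): a Boot app that registers with Eureka, exposes an HTTP endpoint, and forwards the payload to the Source output channel bound to your broker:

```java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.client.discovery.EnableDiscoveryClient;
import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.messaging.Source;
import org.springframework.messaging.support.MessageBuilder;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@SpringBootApplication
@EnableDiscoveryClient          // registers the app with Eureka
@EnableBinding(Source.class)    // binds an "output" channel to the broker (Rabbit/Kafka)
@RestController
public class ImageIngestApplication {

    private final Source source;

    public ImageIngestApplication(Source source) {
        this.source = source;
    }

    // Simple HTTP endpoint that pushes the posted payload onto the stream
    @PostMapping("/images")
    public void ingest(@RequestBody byte[] image) {
        source.output().send(MessageBuilder.withPayload(image).build());
    }

    public static void main(String[] args) {
        SpringApplication.run(ImageIngestApplication.class, args);
    }
}
```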
You can also open an issue at https://github.com/spring-cloud-stream-app-starters/http/issues suggesting that we add @EnableDiscoveryClient to the http source app; we can take that into consideration in our next iteration as well.
I'll try to clarify a few bits.
upload images -> if it really is new -> queues it up for processing
Upon a new upload event, you'd want to process the image. Here's a similar use case, but more of a real-time streaming-style solution. This is not exactly what you're looking to do, but it might be useful.
Porting the image-processing code to a Spring Cloud Stream application is as simple as adding @EnableBinding(Processor.class). It is the same business logic - whether you're running it separately or orchestrating it via SCDF, it is still a standalone microservice. However, SCDF expects it to be either a Source, Processor, Sink, or Task application type. We will be opening this up to support arbitrary "functions" (lambdas) in a future release.
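As a sketch of what that porting might look like (the processing logic itself is a placeholder):

```java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.annotation.StreamListener;
import org.springframework.cloud.stream.messaging.Processor;
import org.springframework.messaging.handler.annotation.SendTo;

@SpringBootApplication
@EnableBinding(Processor.class)
public class ImageProcessorApplication {

    // Existing business logic, now triggered by messages from the "input" channel;
    // the return value is published to the "output" channel.
    @StreamListener(Processor.INPUT)
    @SendTo(Processor.OUTPUT)
    public String process(byte[] image) {
        return analyze(image); // placeholder for the real image-processing code
    }

    private String analyze(byte[] image) {
        return "processed " + image.length + " bytes";
    }

    public static void main(String[] args) {
        SpringApplication.run(ImageProcessorApplication.class, args);
    }
}
```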
We could create a Queue endpoint and send the data directly to a Queue source that receives the outside event and forwards it to an SCDF queue.
This is one of the standard solutions. You can directly consume the new events (images) from a queue/topic and process them in the image-processor that we created in the previous step. The named-channel support in the DSL facilitates just that.
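For example, from the SCDF shell (destination and app names are illustrative): your existing microservice publishes to the images destination on the broker, and the stream consumes from it via the named-channel syntax:

```
dataflow:> stream create --name image-pipeline --definition ":images > image-processor | notification-sink"
dataflow:> stream deploy --name image-pipeline
```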
What would be awesome is if DataFlow could connect the start of the queue for me, without repackaging the microservice as a Source.
I'm not sure I understand this. If I were to guess, you're looking for a "named channel" as the source, and that is supported.
The major issue with Spring Cloud Data Flow is that it does not automatically start up deployed streams when the server starts up, and we need to be reasonably sure that the microservice is always up.
The moment you deploy a stream in SCDF, all the individual steps included in the DSL (i.e., the stream definition) are resolved and deployed as standalone apps in the target runtime (Cloud Foundry, Kubernetes, etc.). Once deployed, the platform where the apps run is responsible for their lifecycle management. SCDF does not retain or track the app states.

How to monitor streaming apps inside SCDF?

I am a novice with Spring Cloud Data Flow and Spring Cloud Stream applications.
Currently my project diagram looks like the following:
I route a POST request from an outside client through a Zuul API gateway to a microservice called Composite. Composite creates a stream using a REST POST and deploys it onto the Spring Cloud Data Flow server. As far as I know, the microservices mongodb and file run as co-existing JVM processes. If my client has to know the status of the stream and the status of the processed data, how should the Composite microservice interact with the Spring Cloud Data Flow server? Currently, when I make the POST call to deploy the stream, I don't even get the status from the SCDF server. Does SCDF expose any hooks to look at the individual apps? Also, how can I change the flow at runtime to create a dynamic mesh?
Currently I am using the local Spring Cloud Data Flow server for development.
Runtime platform is local
The local runtime is recommended only for development purposes. If you're preparing for production, please make sure to choose a platform variant (e.g., CF, K8s, YARN, ...) that provides the non-functional requirements to support reliable and durable execution of all the applications running in the streaming pipeline.
As far as I know the microservices mongodb and file run as co-existing JVM processes.
If your stream definition is file | mongodb, you'd have two different JVMs even when using the local runtime. They're independent Boot applications.
How should Composite Microservice interact with Spring Cloud Data Flow Server?
It's not clear what you mean by "composite" here. All the microservice applications in SCDF communicate via messaging middleware such as Kafka or RabbitMQ. SCDF provides the orchestration capability to run such applications on various runtime platforms.
Currently when I make the POST call to deploy the stream I don't even get the status from the SCDF server
You can use SCDF's REST APIs to query the current status of the apps, and they are platform agnostic. You can view the list of supported APIs by hitting the root URL of the server (there's a gap in the docs here - we will fix it). The following APIs could be useful for status checks.
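For example, assuming the server runs on its default port 9393:

```bash
# List the stream definitions along with their deployment status
curl http://localhost:9393/streams/definitions

# Runtime status (instance count, state) of every deployed app
curl http://localhost:9393/runtime/apps
```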
Does SCDF expose any hooks to look at the individual apps?
Once the apps are deployed to a runtime platform, you can take advantage of Boot's actuator endpoints to explore more details such as trace, metrics, health, and env at each application level. See Boot's actuator endpoints for more details. For instance, if your mongodb app is running locally on port 23000, you can check granular metrics for this application at http://localhost:23000/metrics.
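For example, against that locally running app (Boot 1.x actuator paths):

```bash
curl http://localhost:23000/health
curl http://localhost:23000/metrics
curl http://localhost:23000/env
```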
[As an FYI: future SCDF releases will include integration with Spring Boot + Spring Cloud Sleuth metrics and a visual representation of the same.]
Also how can I change the flow at runtime to create a dynamic mesh?
If you're referring to editing a running streaming pipeline with additions/deletions, we are currently exploring the design approach to support this functionality.
