Spring Data Flow and GCP Pub/Sub - microservices

I'm building an event-driven microservice architecture that is supposed to be cloud agnostic (as much as possible). Since it is initially going into GCP, and I don't want to spend a long time on configuration, I was going to use GCP's Pub/Sub directly for the event queue and take care of other cloud implementations later. But then I came across Spring Cloud Data Flow, which seemed nice because these are Spring Boot microservices and I needed a way to orchestrate them.
Does Spring Cloud Data Flow support Pub/Sub as its event queue?
Would it make my life easier, in terms of configuration and setup, to go down that path rather than choosing a non-native broker?

It'd be useful first to unpack Spring Cloud Stream's "binder abstraction". Using this framework, you'd have a portable event-driven streaming application that can run locally on your laptop or in any cloud of your choice, against the desired message broker.
Learn more about the binder abstraction here. Here are all the available binder implementations to choose from. Google Pub/Sub is an option, and its binder is maintained by Google here.
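To make the portability concrete, here is a minimal sketch of a Spring Cloud Stream processor (the class name and transform logic are illustrative). Nothing in the code is broker-specific; swapping the binder dependency on the classpath (e.g., the RabbitMQ binder for Google's spring-cloud-gcp-pubsub-stream-binder) retargets the same application at Pub/Sub:

```java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.annotation.StreamListener;
import org.springframework.cloud.stream.messaging.Processor;
import org.springframework.messaging.handler.annotation.SendTo;

// Minimal event-driven processor: consumes from the bound input destination
// and publishes the result to the bound output destination. The binder on the
// classpath (Rabbit, Kafka, Google Pub/Sub, ...) supplies the middleware.
@SpringBootApplication
@EnableBinding(Processor.class)
public class UppercaseProcessorApplication {

    @StreamListener(Processor.INPUT)
    @SendTo(Processor.OUTPUT)
    public String transform(String payload) {
        return payload.toUpperCase();
    }

    public static void main(String[] args) {
        SpringApplication.run(UppercaseProcessorApplication.class, args);
    }
}
```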
Now, let's talk about Spring Cloud Data Flow (SCDF). Once you have built the streaming applications, you can use SCDF to design and create a data pipeline made of such applications. There's also the option to mix in and reuse the collection of utility applications that we build, maintain, and release; these utility applications can be packaged with the Google Pub/Sub binder or any other binder. More details here.
When you deploy the data pipeline, SCDF resolves and downloads the individual applications, then deploys them natively on platforms like Kubernetes or Cloud Foundry. We have users doing the same on a variety of cloud infrastructure (VMs, bare metal, EC2, Rackspace, etc.), including DIY platforms, too.
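For illustration, here's a sketch of creating and deploying such a pipeline through SCDF's Java REST client, DataFlowTemplate (from spring-cloud-dataflow-rest-client). The server URL, stream name, and the http | transform | log definition are made-up examples, and exact client method signatures vary across SCDF releases:

```java
import java.net.URI;
import java.util.Collections;

import org.springframework.cloud.dataflow.rest.client.DataFlowTemplate;

public class CreatePipeline {
    public static void main(String[] args) {
        // Points at a running SCDF server (local default port shown).
        DataFlowTemplate dataFlow = new DataFlowTemplate(URI.create("http://localhost:9393"));

        // The app names ("http", "transform", "log") refer to the
        // out-of-the-box utility applications mentioned above.
        dataFlow.streamOperations().createStream(
                "mystream",                  // stream name
                "http | transform | log",    // pipeline DSL: source | processor | sink
                false);                      // create only; deploy separately below

        // Deployment properties (instance counts, etc.) can be passed here.
        dataFlow.streamOperations().deploy("mystream", Collections.emptyMap());
    }
}
```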
In addition to automating the deployment of the applications, SCDF automates the configuration setup based on naming conventions derived from the combination of stream/task and application names. When the apps bootstrap, they automatically receive the connection configuration from SCDF, as well as the destination/topic to connect to, along with other metadata needed to reason about a collection of apps as a "stream" or a "task/batch" data pipeline. This allows you to monitor and manage the pipelines centrally.
Lastly, SCDF has the native ability to rolling-upgrade/rolling-downgrade one or many applications in a data pipeline without impacting the upstream or downstream consumers in production. More details here. There's also a webinar recording (demo starts at ~41.25) on how to do this with CI/CD automation.

Related

How is state handled in Go Cloud?

In Terraform we get a state file, and CloudFormation also has a notion of a working state. How does Go Cloud handle state? Do we have to create it ourselves?
For more info on Go Cloud:
https://github.com/google/go-cloud
https://godoc.org/github.com/google/go-cloud
Terraform wants to solve the problem of managing and provisioning Cloud services.
Go Cloud wants to solve the problem of using Cloud services in application code.
So, they work well together. For example, the Go Cloud sample guestbook app (https://github.com/google/go-cloud/tree/master/samples/guestbook) uses Terraform to provision the resources needed to run the app on various Cloud providers; the application code in the sample has a small amount of provider-specific setup code, but the application logic itself is provider-agnostic.
go-cloud:
The Go Cloud Project is an initiative that will allow application developers to seamlessly deploy cloud applications on any combination of cloud providers. It does this by providing stable, idiomatic interfaces for common uses like storage and databases. Think database/sql for cloud products.
Terraform:
Terraform is a tool for building, changing, and versioning infrastructure safely and efficiently. Terraform can manage existing and popular service providers as well as custom in-house solutions.
So go-cloud is not a provisioning tool like Terraform; for now, it provides generic application-facing APIs for:
Unstructured binary (blob) storage
Variables that change at runtime (configuration)
Connecting to MySQL databases
Server startup and diagnostics: request logging, tracing, and health checking

Azure alternative to Spring Cloud Data Flow process

I'm looking for the Azure alternative to the Data Flow model of source-processor-sink.
I want the three entities to be separate microservices, with messaging as the link between them.
Basically, the source app takes data from another service and sends it to the processor, while the processor app acts on it and sends a relevant notification/alert to the sink.
I'm aware I can use RabbitMQ for the messaging, but I need to know which one would be better in Azure - Service Bus topics or Event Hubs? And how can I use them?
At the moment, there isn't a Spring Cloud Stream binder implementation for Azure Event Hubs.
Unless we have this, the out-of-the-box or custom apps cannot be built as messaging microservices, where Spring Cloud Stream provides the programming model and Spring Cloud Data Flow lets you orchestrate the individual microservices into a data pipeline (i.e., source-processor-sink) via the DSL or the drag-and-drop GUI.
Microsoft was exploring a binder implementation in the past; possibly it would end up in the Azure Spring Boot project. Feel free to drop an issue on their backlog.

REST API for streams redeployment

Is it possible to redeploy streams using the REST API? The current documentation does not provide much info - https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#api-guide-resources-stream-deployments
I am guessing one would need to execute this as a 2-step process (assuming there are REST APIs):
invoke undeploy, followed by
deploy
Thanks in advance!
That is correct. We do not have a redeployment workflow in SCDF. You will have to do it in 2 steps, or you can operate on the individual applications through the runtime platform's (e.g., CF or K8s) blue-green deployment options. Any external operations will eventually be reflected in the current state of the streams in SCDF.
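For reference, a minimal sketch of that two-step flow against SCDF's REST API directly. The endpoints are DELETE /streams/deployments/{name} (undeploy) and POST /streams/deployments/{name} (deploy); the host and stream name below are examples:

```java
import java.util.Collections;

import org.springframework.web.client.RestTemplate;

public class RedeployStream {
    public static void main(String[] args) {
        RestTemplate rest = new RestTemplate();
        String deployment = "http://localhost:9393/streams/deployments/mystream";

        // Step 1: undeploy the running stream.
        rest.delete(deployment);

        // Step 2: deploy it again; the body carries optional deployment
        // properties (serialized as JSON, assuming Jackson on the classpath).
        rest.postForObject(deployment, Collections.emptyMap(), Void.class);
    }
}
```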
All that said, you may want to look into the Skipper + SCDF integration. It is purpose-built to open up CI/CD operations natively from within SCDF. The first milestone of the Skipper + SCDF integration will be released this week. Here's an example of the UX in CF and K8s.
The stream skipper update operation is backed by a REST endpoint, and that can be used standalone, too.

How to monitor streaming apps inside SCDF?

I am a novice with Spring Cloud Data Flow and Spring Cloud Stream applications.
Currently my project looks like the following:
I route a POST request from an outside client through the Zuul API gateway to a microservice called Composite. Composite creates a stream using a REST POST and deploys it onto the Spring Cloud Data Flow server. As far as I know, the microservices mongodb and file run as co-existing JVM processes. If my client has to know the status of the stream and the status of the processed data, how should the Composite microservice interact with the Spring Cloud Data Flow server? Currently, when I make the POST call to deploy the stream, I don't even get the status from the SCDF server. Does SCDF expose any hooks to look at the individual apps? Also, how can I change the flow at runtime to create a dynamic mesh?
Currently I am using the local Spring Cloud Data Flow server for development.
The runtime platform is local.
Local runtime is recommended only for development purposes. If you're preparing for production, please make sure to choose a platform variant (e.g., CF, K8s, YARN, ...) that comes with the non-functional guarantees needed for reliable and durable execution of all the applications running in a streaming pipeline.
As far as I know the microservices mongodb and file run as co-existing JVM processes.
If your stream definition is file | mongodb, you'd have 2 different JVMs even when using the local runtime. They're independent Boot applications.
How should Composite Microservice interact with Spring Cloud Data Flow Server?
Not clear what you mean by "composite" here. All the microservice applications in SCDF communicate via messaging middleware such as Kafka or RabbitMQ. SCDF provides the orchestration capability to deploy such applications to various runtime platforms.
Currently, when I make the POST call to deploy the stream, I don't even get the status from the SCDF server
You can use SCDF's REST APIs to query the current status of the apps, and they are platform agnostic. You can view the list of supported APIs by hitting the root URL of the server; there's a gap in the docs that we will fix. The following APIs could be useful for status checks.
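As a sketch of such a status check, GET /runtime/apps returns the deployment state of every app in the deployed streams (the server URL is the local default; adjust as needed):

```java
import org.springframework.web.client.RestTemplate;

public class StreamStatusCheck {
    public static void main(String[] args) {
        RestTemplate rest = new RestTemplate();

        // Raw JSON listing each app instance and its state (deployed, failed, ...).
        String runtimeApps = rest.getForObject(
                "http://localhost:9393/runtime/apps", String.class);
        System.out.println(runtimeApps);
    }
}
```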
Does SCDF expose any hooks to look at the individual apps?
Once the apps are deployed to a runtime platform, you can take advantage of Boot's actuator endpoints to explore details such as trace, metrics, health, and env at the individual application level. See Boot's actuator endpoints for more details. For instance, if your mongodb app is running locally on port 23000, you can check granular metrics for that application at: http://localhost:23000/metrics.
[As an FYI: future SCDF releases will include Spring Boot + Spring Cloud Sleuth metrics integration and a visual representation of the same.]
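As a quick illustration, reusing the example app and port from above, a trivial probe of that actuator endpoint might look like this (note that Boot 1.x exposes /metrics directly, while Boot 2.x moves it under /actuator):

```java
import org.springframework.web.client.RestTemplate;

public class AppMetricsCheck {
    public static void main(String[] args) {
        RestTemplate rest = new RestTemplate();

        // Granular per-app metrics from the Boot actuator of the deployed app.
        String metrics = rest.getForObject(
                "http://localhost:23000/metrics", String.class);
        System.out.println(metrics);
    }
}
```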
Also, how can I change the flow at runtime to create a dynamic mesh?
If you're referring to editing a running streaming pipeline with additions/deletions, we are currently exploring the design approach to support this functionality.

How to dynamically deploy standalone Spring Batch jobs using Spring Cloud Task

We are planning to retire our existing legacy Java batch applications and recreate them with the latest available batch framework.
Given that we have a large number of batch jobs to modernise, we are looking for a framework or architecture that would allow us to:
Develop a batch solution that would let us dynamically deploy a new batch job as and when it is created, without disturbing the already-deployed applications. Does Spring Cloud Task provide any of this? Note: we are looking only to deploy the apps to our local server, and this has nothing to do with the cloud.
If Spring Batch/Boot can provide the features we typically expect from a batch application, what is the special value-add of Spring Cloud Task? I wasn't able to completely understand this from the Spring documentation available online.
From the documentation of Spring Cloud Task, I understand that it allows an application to have many tasks within it. What should I do if each of the tasks has its own library dependencies, which might conflict with the dependencies of other tasks? Should each of these tasks be moved to a new application in that case, or is there a workaround?
To answer your questions:
Does Spring Cloud Task handle orchestration - No. Spring Cloud Task does not handle orchestration of tasks or jobs. The component in this ecosystem that handles the deployment/orchestration of tasks or jobs is really Spring Cloud Data Flow (which is why I asked if you use any type of cloud platform including YARN, Cloud Foundry, Kubernetes, or Mesos...the environments supported by Spring Cloud Data Flow).
What added value does Spring Cloud Task provide over Spring Boot/Spring Batch - Spring Cloud Task is designed to provide a few things:
Similar abilities to Spring Batch with regard to state management, without needing to create a batch job. When running a Boot application on a cloud environment, there is no standard way of getting the results from environment to environment (YARN handles job results differently from tasks on Cloud Foundry, which is different again from jobs on Kubernetes, etc.). Spring Batch provides this, but then all short-lived processes carry the overhead of the Batch API; Spring Cloud Task provides a lighter touch for those use cases (see the minimal task sketch after this list).
Automatically adds informational listeners. With Spring XD, when you ran a job in an XD container, the XD container automatically added a number of informational listeners that broadcast events that you could listen for. Spring Cloud Task brings the same functionality without the need for the XD container.
Integration with Spring Cloud Stream. Spring Cloud Task provides the ability to launch tasks from messages received from Spring Cloud Stream. Also, the informational messages previously mentioned (both Batch events as well as Task events) are sent via Spring Cloud Stream channels.
The DeployerPartitionHandler. When working in a cloud environment, this PartitionHandler implementation allows you to launch workers for a partitioned batch job as tasks. This allows for the dynamic scaling of partitioned batch jobs instead of the traditional option of pre-deploying workers that listen for work which wastes resources in a modern cloud environment.
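To ground that first point, here is a minimal Spring Cloud Task application (a sketch; the class name and logic are illustrative). @EnableTask records the run's start time, end time, and exit code in the task repository, giving the short-lived Boot process Batch-like state management without defining a batch job:

```java
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.task.configuration.EnableTask;
import org.springframework.context.annotation.Bean;

@SpringBootApplication
@EnableTask
public class SimpleTaskApplication {

    // The whole application is the task: it runs this logic once and exits,
    // with the execution's state persisted by Spring Cloud Task.
    @Bean
    public CommandLineRunner run() {
        return args -> System.out.println("Task executed");
    }

    public static void main(String[] args) {
        SpringApplication.run(SimpleTaskApplication.class, args);
    }
}
```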
How does the packaging of multiple tasks work with dependencies - In short, this is not recommended. The idea of Spring Cloud Task is that the execution of the Spring Boot application is the task. While you could package up multiple tasks and, using different mechanisms, have them execute based on different stimuli, that goes against the 12-factor application concepts, which are essential for the correct use of Spring Cloud Task.
My two cents
For the best option for a modern batch platform, you really need to look into some form of platform first, and that begins at the Cloud Foundry/Kubernetes/Mesos/YARN layer. Without that, you end up building a large part of the infrastructure yourself. That is why Spring XD evolved into Spring Cloud Data Flow: the added complexity that lived in the containers of Spring XD is removed by requiring a modern platform to run on (since those platforms all handle such guarantees themselves). Without that piece, you're going to spend a lot of time managing the deployment and orchestration of applications that most modern platforms handle for you.
From there, the choice becomes pretty easy IMHO with Spring Cloud Task for simple tasks, Spring Batch for batch jobs, and Spring Cloud Data Flow for orchestration.
