SCDF: Can I use an outside microservice as a source? - spring

I am trying to work through a solution where the workflow is like this:
User hits a microservice to upload images
That microservice de-duplicates the image and if it really is new, queues it up for processing
The processing chain lives in Spring Cloud Dataflow
The microservice already exists, and we are trying to extend it to do the fancy processing. My initial cut was to use the Http Source from the sample starter pack since that would be something I didn't have to create. The problem is that the source doesn't register itself with Spring Discovery server, so there is no way to get an end point without making gross assumptions (like it lives on the dataflow server at port XYZ).
We can create a Queue endpoint and send the data directly to a Queue source that receives the outside event and forwards it to an SCDF queue.
What would be awesome is if DataFlow could connect the start of the queue for me, without repackaging the microservice as a Source.
The major issue with Spring Cloud Data Flow is that it does not automatically start up deployed streams when the server starts up, and we need to be reasonably sure that the microservice is always up.

The lifecycle of the server is decoupled from the apps it deploys; that was intentional.
I'm not following your thoughts on how dataflow could connect the start of the queue, but from your description there are a few things you could do:
You would need to modify the app in order to have it registered with Eureka, but this is a very simple operation, no more than a few lines of code:
Either start from a stream app perspective: go to https://start-scs.cfapps.io/ , select the http source and your binder, then add the spring-cloud-netflix library as well as @EnableDiscoveryClient on the main Boot class.
Or start with http://start.spring.io : select Stream Rabbit or Stream Kafka, add the Web and Netflix libraries, then add the @EnableDiscoveryClient and @EnableBinding annotations and create a simple HTTP endpoint for your use case.
In either case it should be a small addition.
You can also open an issue at https://github.com/spring-cloud-stream-app-starters/http/issues suggesting that we add @EnableDiscoveryClient to the http source app; we can take that into consideration for our next iteration as well.

I'll try to clarify a few bits.
upload images -> if it really is new -> queues it up for processing
Upon a new upload event, you'd want to process the image. Here's a similar use-case, but more of a real-time streaming style solution. This is not what you're looking to do, but I thought it might be useful.
Porting the image processing code to a Spring Cloud Stream application is as simple as adding @EnableBinding(Processor.class). It is the same business logic: whether you're running it separately or orchestrating it via SCDF, it is still a standalone microservice. However, SCDF expects it to be one of the Source, Processor, Sink, or Task application types. We will be opening this up to support any arbitrary "functions" (lambdas) in a future release.
We can create a Queue endpoint and send the data directly to a Queue source that receives the outside event and forwards it to an SCDF queue.
This is one of the standard solutions. You can directly consume new events (images) from a queue/topic and process them in the image-processor that we created in the previous step. The named-channel support in the DSL facilitates just that.
What would be awesome is if DataFlow could connect the start of the queue for me, without repackaging the microservice as a Source.
I'm not sure I understand this. If I were to assume, you're looking for "named-channel" as source and that is supported.
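For illustration, here is a minimal Java sketch of the named-channel idea: the existing microservice keeps publishing to a broker destination, and the stream starts from that destination instead of from a Source app. The destination name `uploads`, the stream name `image-pipeline`, the `log` sink, and the server address `http://localhost:9393` are assumptions made up for this example. The `:name > app` syntax and the `/streams/definitions` endpoint with its `name`, `definition`, and `deploy` parameters should be checked against the documentation of the SCDF version in use.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

/** Sketch: build a stream definition that reads from a named destination. */
final class NamedDestinationStream {

    /** Joins the apps with pipes and prefixes the named destination, e.g. ":uploads > a | b". */
    static String definition(String destination, String... apps) {
        return ":" + destination + " > " + String.join(" | ", apps);
    }

    /** Form-encodes the stream name, the DSL definition, and the deploy flag. */
    static String formBody(String streamName, String definition, boolean deploy) {
        return "name=" + encode(streamName)
                + "&definition=" + encode(definition)
                + "&deploy=" + deploy;
    }

    /** Builds (but does not send) a POST to the assumed /streams/definitions endpoint. */
    static HttpRequest createRequest(String serverUrl, String body) {
        return HttpRequest.newBuilder(URI.create(serverUrl + "/streams/definitions"))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical names: "uploads" is the destination the microservice publishes to.
        String dsl = definition("uploads", "image-processor", "log");
        HttpRequest request = createRequest(
                "http://localhost:9393", formBody("image-pipeline", dsl, true));
        System.out.println(dsl);
        System.out.println(request.method() + " " + request.uri());
    }
}
```

With this shape the microservice does not have to be repackaged as a Source; it only needs to publish to the same destination name on the broker that the stream's binder is connected to.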
The major issue with Spring Cloud Data Flow is that it does not automatically start up deployed streams when the server starts up, and we need to be reasonably sure that the microservice is always up.
The moment you deploy a Stream in SCDF, all the individual steps included in the DSL (i.e., the stream definition) are resolved and deployed as standalone apps in the target runtime (Cloud Foundry, Kubernetes, etc.). Once deployed, lifecycle management is left to the platform where the apps run. SCDF does not retain or track the app states.

Related

How to initialize a continuously running stream using Alpakka, Spring Boot & Akka Streams?

All,
I am developing an application that uses the Alpakka Spring Boot integration to read data from Kafka. I have most of the code ready; the only place I am stuck is how to initialize a continuously running stream, as this is going to be a backend application and won't have any API through which it is called.
As far as I know, Alpakka's Spring integration is basically designed around exposing Akka Streams via a Spring HTTP controller. So I'm not sure what purpose bringing Spring into this serves, since there's quite an impedance mismatch between the way an Akka application will tend to like to work and the way a Spring application will tend to like to work.
Assuming you're talking about using Alpakka Kafka, the most idiomatic thing to do would be to just start a stream fed by an Alpakka Kafka Source in your main method and it will run until killed or it fails. You may want to use a RestartSource around the consumer and business logic to ensure that in the event of failure the stream restarts (note that one should generally expect messages for which the offset commit hadn't happened to be processed again, as Kafka in typical cases can only guarantee at-least-once processing).

How to implement microservice Saga in google cloud platform

I am investigating a solution to implement the microservice Saga pattern on a platform hosted in K8S in GCP.
There are 2 options: Eventuate Tram and Axon. However, these frameworks do not seem to support a message broker managed by the cloud provider, such as Google Cloud Pub/Sub, and I do not want to deploy either Kafka or RabbitMQ to K8S since GCP supports Pub/Sub already.
So is there any way to integrate either Eventuate or Axon with Google Cloud Pub/Sub?
Thanks
Uncertain about Eventuate's angle on this, but Axon works with extensions as message brokers other than Axon Server. Throughout Axon's lifecycle (read: last 10 years), some of these have been provided, but none are currently used for all types of messages defined by Axon Framework. So, you wouldn't be able to use Kafka for sending commands in Axon for example.
Reasoning for this? Commands, events and queries have different routing requirements which should be reflected by using the right tool for the job.
To be a bit more specific on Axon's side, the following extensions can be used for distributing your messages:
AMQP -> for Events
Kafka -> for Events
JGroups -> for Commands
Spring Cloud Discovery -> for Commands
As you can tell, there currently is no Pub/Sub extension out there to allow you to distribute your messages. Added on top of that, my gut would tell me if it was available, then it would likely only be used for Event messages due to Pub/Sub's intent when it comes to being a message broker.
Luckily this actually makes it rather straightforward to create just such an extension yourself. Going into all the details to build this would be a little much, so I would recommend having a look at Axon's AMQP extension first when it comes to achieving this. Hints on the matter are that for publication, you should add a component to handle Axon's events and publish them on Pub/Sub. For handling events, you are required to build a StreamableMessageSource or SubscribableMessageSource. These interfaces are used respectively by the TrackingEventProcessor and SubscribingEventProcessor, which in turn are the components in charge of dealing with the technical aspect of handling events.
By the way, if you would be building such an extension and you need a hand, it would be best to request this at AxonIQ forum, which you can find here.
Last note, and rather important I'd say, is the argument that such a connector would not be able to deal with all types of messages. If you would require a more full fledged Axon application to run in a distributed fashion, I would highly recommend to give Axon Server a try prior to building your own solution from the ground up.

Spring dataflow and GCP Pub Sub

I'm building an event-driven microservice architecture, which is supposed to be Cloud agnostic (as much as possible). Since this is initially going in GCP and I don't want to spend a long time in configurations and all that, I was going to use GCP's Pub/Sub directly for the event queue and would take care of other Cloud implementations later, but then I came across Spring Cloud Dataflow, which seemed nice because these are Spring Boot microservices and I needed a way to orchestrate them.
Does Spring Cloud Dataflow support Pub/Sub as its event queue?
Would it make my life easier in terms of configuration and setup to go down that path, rather than choosing a non-native broker?
It'd be useful first to unpack Spring Cloud Stream's "binder abstraction", because it is by using this framework that you get a portable event-driven streaming application, which can run locally on your laptop or in any cloud of your choice against the desired message broker.
Learn more about the binder-abstraction here. Here are all the available binder implementations of choice. Google PubSub is an option, and it is maintained by Google here.
Now, let's talk about Spring Cloud Data Flow (SCDF). Once you have built the streaming applications, you could use SCDF to design and create a data pipeline made of such applications. There's the option to mix and reuse the collection of utility applications that we build, maintain, and release as well. The utility applications can be packaged with Google PubSub or other binders. More details here.
When you deploy the data pipeline, SCDF will resolve and download the individual applications to deploy them natively on platforms like Kubernetes or Cloud Foundry. We have users doing the same in a variety of cloud infrastructure (VMs, Bare-metal, EC2, Rackspace, etc.), including DIY platforms, too.
Besides automating the deployment of the applications, SCDF also automates the configuration setup based on naming conventions derived from the combination of stream/task and application names. So, when the apps bootstrap, they will automatically have received (from SCDF) the connection configuration as well as the destination/topic to connect to, along with the other metadata needed to reason about a collection of apps as a "stream" or a "task/batch" data pipeline. This allows you to monitor and manage the pipelines centrally.
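As a rough illustration of what "naming conventions" can mean here, the Java sketch below derives binding properties for two neighbouring apps in a stream. It assumes the convention where the destination is named `<stream>.<app>` and the consumer group is the stream name, using the `input` and `output` binding names of the annotation-based programming model; the class and method names are hypothetical, and this is not SCDF's own code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of an assumed convention: destination "<stream>.<app>", group "<stream>". */
final class StreamBindingConventions {

    /** Destination that connects an app to its downstream neighbour. */
    static String destination(String streamName, String appName) {
        return streamName + "." + appName;
    }

    /** Properties the upstream (producing) app would be started with. */
    static Map<String, String> producerProperties(String streamName, String appName) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("spring.cloud.stream.bindings.output.destination",
                destination(streamName, appName));
        return props;
    }

    /** Properties the downstream (consuming) app would be started with. */
    static Map<String, String> consumerProperties(String streamName, String upstreamApp) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("spring.cloud.stream.bindings.input.destination",
                destination(streamName, upstreamApp));
        // All instances of the consumer share one group named after the stream.
        props.put("spring.cloud.stream.bindings.input.group", streamName);
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical stream "ticktock" defined as "time | log".
        System.out.println(producerProperties("ticktock", "time"));
        System.out.println(consumerProperties("ticktock", "time"));
    }
}
```

Because the binder is chosen by the dependency on the classpath, the same properties apply whether the apps are packaged with the Pub/Sub, Kafka, or Rabbit binder.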
Lastly, there's the native ability in SCDF to rolling-upgrade/rolling-downgrade one or many applications in a data pipeline without impacting the upstream or downstream consumers in production. More details here. There's a webinar recording (demo starts at ~41.25) on how to do this with CI/CD automation.

Azure alternative to spring cloud dataflow process

I'm looking for the Azure alternative for the Data Flow model of source-processor-sink.
I want the three entities to be separate microservices. I want to use messaging as a link between these three.
Basically, the source app takes the data from another service and sends it to the processor, while the processor app acts on it and sends the relevant notification/alert to the sink.
I'm aware I can use RabbitMQ for the messaging, but I need to know which one will be better in Azure: Service Bus topics or Event Hubs? And how can I use them?
At the moment, there isn't a Spring Cloud Stream binder implementation for Azure Event Hubs.
Unless we have this, neither the out-of-the-box apps nor custom apps can be built as messaging-microservice apps, where Spring Cloud Stream provides the programming model and Spring Cloud Data Flow lets you orchestrate the individual microservices into a data pipeline (i.e., source-processor-sink) via the DSL/Drag-and-Drop GUI.
Microsoft was exploring the binder implementation in the past; possibly it would end up in Azure Spring Boot project. Feel free to drop an issue on their backlog.

How to monitor streaming apps Inside SCDF?

I am a novice with Spring Cloud Data Flow and Spring Cloud Stream applications.
Currently my project diagram looks like the following:
I route a POST request from an outside client, using the Zuul API gateway, to a microservice called Composite. Composite creates a stream using a REST POST and deploys it onto the Spring Cloud Data Flow Server. As far as I know the microservices mongodb and file run as co-existing JVM processes. If my client has to know the status of the stream and the status of the processed data, how should Composite Microservice interact with Spring Cloud Data Flow Server? Currently when I make a POST call to deploy the stream I don't even get the status from the SCDF Server. Does SCDF expose any hooks to look at the individual apps? Also how can I change the flow at runtime to create a dynamic mesh?
Currently I am using Local Spring Cloud Data Flow Server for development.
Runtime platform is local
The Local runtime is recommended only for development purposes. If you're preparing for production, please make sure to choose a platform variant (e.g., cf, k8s, yarn, ...) that meets the non-functional requirements for reliable and durable execution of all the applications running in the streaming pipeline.
As far as I know the microservices mongodb and file run as co-existing JVM processes.
If your stream definition is file | mongodb, you'd have 2 different JVMs even when using the Local runtime. They're independent Boot applications.
How should Composite Microservice interact with Spring Cloud Data Flow Server?
It's not clear what you mean by "composite" here. All the microservice applications in SCDF communicate via messaging middleware such as Kafka or Rabbit. SCDF provides the orchestration capability to run such applications on various runtime platforms.
Currently when I make a POST call to deploy the stream I don't even get the status from the SCDF Server
You can use SCDF's REST APIs to query for the current status of the apps, and this is platform agnostic. You can view the list of supported APIs by hitting the root URL (see image below). There's a gap in the docs, which we will fix. The following APIs could be useful for status checks.
Does SCDF expose any hooks to look at the individual apps?
Once the apps are deployed in a runtime platform, you can take advantage of Boot's actuator endpoints to explore more details such as trace, metrics, health, env among others at each application level. See Boot's actuator endpoints for more details. For instance, if your mongodb app is running locally and on port 23000, then you can check granular metrics for this application at: http://localhost:23000/metrics.
[As an FYI: future SCDF releases would include integrating Spring Boot + Spring Cloud Sleuth metrics and visual representation of the same.]
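A small Java sketch of such status checks, as they might be made from the Composite service, could look like the following. The actuator URL `http://localhost:23000/metrics` is the example from the answer above; the SCDF server address `http://localhost:9393` and the `/runtime/apps` path are assumptions to verify against the list returned by the server's root URL. The class name is hypothetical, and the requests are only sent when an argument is passed, so that the sketch also runs without a server.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Sketch: poll the SCDF server and an app's actuator endpoint for status. */
final class PipelineStatusCheck {

    /** Builds a JSON GET request for baseUrl + path. */
    static HttpRequest get(String baseUrl, String path) {
        return HttpRequest.newBuilder(URI.create(baseUrl + path))
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    /** Sends the request and returns "<status code> <body>". */
    static String fetch(HttpClient client, HttpRequest request) throws Exception {
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode() + " " + response.body();
    }

    public static void main(String[] args) throws Exception {
        // Assumed SCDF endpoint listing the deployed apps and their state.
        HttpRequest runtimeApps = get("http://localhost:9393", "/runtime/apps");
        // Actuator metrics of the mongodb app from the example above.
        HttpRequest appMetrics = get("http://localhost:23000", "/metrics");

        System.out.println(runtimeApps.method() + " " + runtimeApps.uri());
        System.out.println(appMetrics.method() + " " + appMetrics.uri());

        // Only talk to the network when explicitly asked to, e.g. "java ... send".
        if (args.length > 0) {
            HttpClient client = HttpClient.newHttpClient();
            System.out.println(fetch(client, runtimeApps));
            System.out.println(fetch(client, appMetrics));
        }
    }
}
```

Polling like this keeps the Composite service independent of the target platform, since it only talks to the SCDF server and to the apps' own HTTP endpoints.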
Also how can I change the flow at runtime to create a dynamic mesh?
If you're referring to editing a running streaming pipeline with additions/deletions, we are currently exploring design approaches to support this functionality.
