Spring XD Failover - spring-xd

Does Spring XD support automatic failover when running in distributed mode? Right now, if I create a simple stream on a setup with two containers, the source gets deployed to one container and the sink gets deployed to the other. If I shut down the second container, the stream is still listed as deployed. If I undeploy and redeploy the stream, things start working as expected, with both source and sink deployed to the remaining container. I would expect this to happen automatically. I am using version 1.0.0.M3 with the following example stream:
stream create --definition "time | log" --name ticktock
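For reference, the undeploy/redeploy workaround I describe above corresponds to these two standard XD shell commands:
stream undeploy --name ticktock
stream deploy --name ticktock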

No, it does not support automatic failover. Better management of streams, e.g. deployment manifests and monitoring, is starting to be developed now; stay tuned.
Cheers,
Mark

Related

How to check whether the current environment is a Spring container?

I designed a component that collects application runtime data and sends it to an analysis server via Kafka. In most cases, the host application already integrates Kafka itself. To avoid connecting to the same Kafka cluster twice, I need to determine whether the app uses Kafka; if it does, I want to reuse its connections directly.
So, how can I detect that an app uses Kafka?
And if the app integrates spring-kafka, what should I do?
avoid connecting to the same Kafka twice
There should be no issue with this. Kafka clients are lightweight enough to create more than one of them in a single app, from one machine, etc.
determine whether the app uses Kafka.
Beyond decompilation, if it's a fat jar you can run jar -tf app.jar | grep -i kafka; however, this only tells you there are files with "kafka" in the name, not necessarily that any Apache Kafka clients are actually in use.
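If your collector runs inside the target app's JVM, a more direct check is a classpath probe. A minimal sketch, assuming you only need to know whether the client libraries are present (the two class names are the standard entry points of the Apache Kafka client and spring-kafka; everything else is illustrative):

// Probe the classpath for Kafka client classes without requiring them at compile time.
public final class KafkaDetector {

    public static boolean kafkaClientPresent() {
        return isPresent("org.apache.kafka.clients.producer.KafkaProducer");
    }

    public static boolean springKafkaPresent() {
        return isPresent("org.springframework.kafka.core.KafkaTemplate");
    }

    private static boolean isPresent(String className) {
        try {
            // initialize=false: just check visibility, don't run static initializers
            Class.forName(className, false, KafkaDetector.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }
}

Note that presence of the class only proves the dependency is on the classpath, not that the app actually opens connections; reusing the app's own connections would still require a hook into its configuration.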

REST API for streams redeployment

Is it possible to redeploy streams using the REST API? The current documentation does not provide much info - https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#api-guide-resources-stream-deployments
I am guessing one would need to execute this as a 2-step process (assuming there are REST APIs)
invoke un-deploy, followed by
deploy
Thanks in advance!
That is correct. We do not have a re-deployment workflow in SCDF. You will have to do it in two steps, or you can operate on the individual applications through the runtime platform's (e.g., CF or K8s) blue-green deployment options. Any external operations will eventually be reflected in the current state of SCDF streams.
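For illustration, a minimal sketch of the two-step flow against the /streams/deployments endpoints from the API guide linked above; the server address and stream name are placeholders:

import org.springframework.web.client.RestTemplate;

public class StreamRedeploy {

    public static void main(String[] args) {
        RestTemplate rest = new RestTemplate();
        String server = "http://localhost:9393";   // default local SCDF server address
        String stream = "mystream";                // hypothetical stream name

        // Step 1: un-deploy the stream
        rest.delete(server + "/streams/deployments/" + stream);

        // Step 2: deploy it again (deployment properties, if any, would go in the body)
        rest.postForLocation(server + "/streams/deployments/" + stream, null);
    }
}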
All that said, you may want to look into the Skipper + SCDF integration. It is purpose-built to open up CI/CD operations natively from within SCDF. The first milestone of the Skipper + SCDF integration will be released this week. Here's an example of the UX in CF and K8s.
The stream skipper update .. operation is backed by a REST endpoint, and that can be used standalone, too.

Spring Cloud Data Flow Remote RabbitMQ Server Config

I am new to SCDF and am trying to get started with a RabbitMQ transport layer and SCDF version 1.2.2. I have set up RabbitMQ in a separate VM, and I have the SCDF local server and SCDF shell jars in another VM. Can someone suggest how I can point SCDF at my RabbitMQ server (which is on a different host in the same network) to use as the transport?
For reasons outside my control I need to use the MQ setup on a different machine. Please advise.
SCDF itself doesn't require RabbitMQ; I think you are trying to use RabbitMQ as the binder for the Spring Cloud Stream applications that are orchestrated via SCDF.
You would need to configure the properties mentioned here.
You can find more information here on how to specify these properties at the SCDF level.
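Concretely, the binder settings are the standard Spring Boot spring.rabbitmq.* properties, and SCDF can pass them to every stream app it deploys via its common-application-properties prefix. A hedged example of starting the 1.2.2 local server this way (the host name and credentials are illustrative):

java -jar spring-cloud-dataflow-server-local-1.2.2.RELEASE.jar \
  --spring.cloud.dataflow.applicationProperties.stream.spring.rabbitmq.host=rabbit-vm.example.com \
  --spring.cloud.dataflow.applicationProperties.stream.spring.rabbitmq.port=5672 \
  --spring.cloud.dataflow.applicationProperties.stream.spring.rabbitmq.username=guest \
  --spring.cloud.dataflow.applicationProperties.stream.spring.rabbitmq.password=guest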

SCDF: Can I use an outside microservice as a source?

I am trying to work through a solution where the workflow is like this:
User hits a microservice to upload images
That microservice de-duplicates the image and, if it really is new, queues it up for processing
The processing chain lives in Spring Cloud Dataflow
The microservice already exists, and we are trying to extend it to do the fancy processing. My initial cut was to use the Http Source from the sample starter pack, since that was something I didn't have to create. The problem is that the source doesn't register itself with the Spring Discovery server, so there is no way to get an endpoint without making gross assumptions (like that it lives on the dataflow server at port XYZ).
We can create a Queue endpoint and send the data directly to a Queue source that receives the outside event and forwards it to an SCDF queue.
What would be awesome is if DataFlow could connect the start of the queue for me, without repackaging the microservice as a Source.
The major issue with Spring Data Flow is that it does not automatically start up deployed streams when the server starts up, and we need to be reasonably sure that microservice is always up.
The lifecycle of the server is decoupled from the apps it deploys; that was intentional.
I'm not following your thoughts on how Data Flow could connect the start of the queue, but from your description there are a few things you could do:
You would need to modify the app in order to have it register with Eureka, but this is a very simple operation, no more than a few lines of code (see the sketch after this list):
You can either start from a stream-app perspective: https://start-scs.cfapps.io/ , select the http source and your binder, then add the spring-cloud-netflix library as well as @EnableDiscoveryClient at the main Boot class.
Or start with http://start.spring.io , select Stream Rabbit or Stream Kafka, add the Web and Netflix libraries, then add the @EnableDiscoveryClient and @EnableBinding annotations and create a simple HTTP endpoint for your use case.
In any case it should be a small addition.
You can also open an issue at https://github.com/spring-cloud-stream-app-starters/http/issues suggesting that we add @EnableDiscoveryClient to the http source app; we can take that into consideration in our next iteration as well.
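As promised above, a minimal sketch of the second option; the class name and endpoint path are hypothetical, and the annotations are the standard Spring Cloud Stream/Netflix ones:

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.client.discovery.EnableDiscoveryClient;
import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.messaging.Source;
import org.springframework.messaging.support.MessageBuilder;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@SpringBootApplication
@EnableDiscoveryClient          // registers this app with Eureka
@EnableBinding(Source.class)    // binds the output channel to the Rabbit/Kafka binder
@RestController
public class HttpSourceApplication {

    private final Source source;

    public HttpSourceApplication(Source source) {
        this.source = source;
    }

    @PostMapping("/images")
    public void accept(@RequestBody byte[] payload) {
        // forward the uploaded payload onto the stream's output channel
        source.output().send(MessageBuilder.withPayload(payload).build());
    }

    public static void main(String[] args) {
        SpringApplication.run(HttpSourceApplication.class, args);
    }
}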
I'll try to clarify a few bits.
upload images -> if it really is new -> queues it up for processing
Upon a new upload event, you'd want to process the image. Here's a similar use case, but more of a real-time streaming-style solution. This is not exactly what you're looking to do, but I thought it might be useful.
Porting the image-processing code to a Spring Cloud Stream application is as simple as adding @EnableBinding(Processor.class); a sketch follows. It is the same business logic: whether you're running it separately or orchestrating it via SCDF, it is still a standalone microservice. However, SCDF expects it to be a Source, Processor, Sink, or Task application type. We will be opening this up to support arbitrary "functions" (lambdas) in a future release.
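A minimal sketch of such a processor; the identity transform stands in for the real image-processing logic:

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.annotation.StreamListener;
import org.springframework.cloud.stream.messaging.Processor;
import org.springframework.messaging.handler.annotation.SendTo;

@SpringBootApplication
@EnableBinding(Processor.class)   // wires input and output channels to the binder
public class ImageProcessorApplication {

    @StreamListener(Processor.INPUT)
    @SendTo(Processor.OUTPUT)
    public byte[] process(byte[] image) {
        // the existing business logic would go here; identity transform as a placeholder
        return image;
    }

    public static void main(String[] args) {
        SpringApplication.run(ImageProcessorApplication.class, args);
    }
}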
We can create a Queue endpoint and send the data directly to a Queue source that receives the outside event and forwards it to an SCDF queue.
This is one of the standard solutions. You can consume new events (images) directly from a queue/topic and process them in the image processor that we created in the previous step. The named-channel support in the DSL facilitates just that; an example follows.
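For illustration, a hedged example of the DSL (the destination and app names are hypothetical); the leading :new-images consumes from an existing queue/topic rather than deploying a source app:

stream create --definition ":new-images > image-processor | log" --name image-flow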
What would be awesome is if DataFlow could connect the start of the queue for me, without repackaging the microservice as a Source.
I'm not sure I understand this. If I were to assume, you're looking for "named-channel" as a source, and that is supported.
The major issue with Spring Data Flow is that it does not automatically start up deployed streams when the server starts up, and we need to be reasonably sure that microservice is always up.
The moment you deploy a stream in SCDF, all the individual steps included in the DSL (i.e., the stream definition) are resolved and deployed as standalone apps in the target runtime (Cloud Foundry, Kubernetes, etc.). Once deployed, lifecycle management is left to the platform where the apps run. SCDF does not retain or track the app states.

Spring XD Distributed Environment

I am working on Spring XD and GemFire XD. I want to understand how Spring XD's distributed environment works. I know Spring XD uses either Redis or RabbitMQ as the transport.
I am clear on that part: I have installed Spring XD and RabbitMQ on one machine, and I changed the redis.properties file to add the hostnames.
Do I need to install Spring XD on all the machines? If so, after installing, how do I bring those up?
On the master machine, I will run ./xd-admin and ./xd-container.
How do you start up the nodes (Spring XD instances/workers) so that they can listen for instructions from xd-admin?
Please help me on this.
Thanks,
-Suyodhan
Redis is used for analytics; it is the only supported platform for that. For transport, you need either Redis or RabbitMQ.
Basically you just need to install Redis and RabbitMQ per their respective documentation. They can be on the same or different servers; ideally you would use their high-availability options, for example Redis Sentinel. You don't need RabbitMQ unless you want to change the default transport from Redis to Rabbit. Once you install Redis and Rabbit, bring them up and provide their host:port info (and any additional settings as applicable) in servers.yml in the XD install (on all nodes), then bring up the admin and the containers. Everything should work automatically, using ZooKeeper as the means to manage the distributed runtime.
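To make the startup order concrete, a hedged sketch for a multi-node install (host roles are illustrative); each node runs the scripts from the XD install's bin directory once its servers.yml points at the shared ZooKeeper/Redis/Rabbit hosts:

# on the admin node
./xd-admin

# on each worker node
./xd-container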
If you use Spring XD in distributed mode, I assume you have set up ZooKeeper as well. (If not, see http://docs.spring.io/spring-xd/docs/1.0.0.M7/reference/html/#_setting_up_zookeeper .)
Admin and container instances register themselves with ZooKeeper as they come up. The admin queries ZooKeeper for available containers and assigns tasks such as deploying modules. ZooKeeper is the trick behind distributed mode.
Hope this helps.
You will install Spring XD one time, on one machine, and Spring XD will be connected to your distributed, scaled-out HDFS environment.
You need to start the following:
1. Redis, or RabbitMQ in your case
2. the HSQLDB server
3. a container
4. the admin
When you start Spring XD, you first need to register the name node using the shell command:
hadoop config fs --namenode hdfs://serverip:8020
Then you can use any module defined in Spring XD (in a stream or batch job) by specifying its parameters directly, without specifying them in the servers.yml file.
Moha.
