spring xd on openstack - spring-xd

Can Spring XD run on top of OpenStack, which is an IaaS/PaaS platform?
Since a Hadoop service can be encapsulated or integrated on OpenStack, and supposing Spring XD has an interface to OpenStack for talking to the Hadoop framework, is this already supported?

OpenStack is used to set up virtual Linux machines, storage, and networking, and you can run the Spring XD processes (xd-admin, xd-container) on these Linux machines. So yes, you can run Spring XD on OpenStack.
You could also run Spring XD on top of YARN running on OpenStack. See the Spring XD guide for this: http://docs.spring.io/spring-xd/docs/current/reference/html/#running-on-YARN . This way you can share the computing resources between Spring XD and Hadoop jobs.
Spring XD can interoperate with Hadoop (to be precise, with HDFS) through its HDFS sinks and HDFS sources.

Related

How to run a task on a different server than the Spring Cloud Data Flow Local server

I want to host a Spring Cloud Data Flow Local server for monitoring and executing my various Spring Boot batch projects.
The infrastructure I want to achieve is this: my Spring Cloud Data Flow server is hosted on Server A and is able to execute Spring Boot batches/tasks on Server B.
Is this configuration possible? If not, how should I achieve it? I have a few Spring Boot batch applications that run on different servers.
This is not how SCDF works, so I do not think it is possible. If you want to monitor your batch jobs, you need to register your jobs in the SCDF server.
It depends on how you launch and configure your batch applications. You can have a custom task application (call it batch-launcher) that launches your batch job on an external cluster. In terms of monitoring, however, SCDF can help monitor the task application (the batch-launcher) that launches your actual batch, but not the actual job running on the external cluster (unless you have a mechanism to retrieve the metrics of the batch application into the batch-launcher).
Launching a Spark compute job on a Spark cluster using an SCDF task (using the Spark client) is one such example. In this case, you would register the SCDF task and monitor only the Spark client task app via SCDF (not the Spark compute job).
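To make the batch-launcher idea above concrete, here is a minimal sketch in plain Java. It is not SCDF or Spring Cloud Task code; `BatchLauncher`, `JobSubmitter`, and `commandSubmitter` are hypothetical names introduced only for illustration. It shows a launcher process that hands the real work to an external client command and reports nothing but the resulting exit code, which is roughly all a server tracking the launcher would be able to see.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical launcher: the tracked process is this one, not the remote job.
final class BatchLauncher {

    /** Hypothetical abstraction over "submit a job to the external cluster and wait". */
    interface JobSubmitter {
        int submitAndWait(List<String> jobArgs) throws Exception;
    }

    /** Submitter that shells out to a client command (for example a Spark client). */
    static JobSubmitter commandSubmitter(List<String> baseCommand) {
        return jobArgs -> {
            List<String> command = new ArrayList<>(baseCommand);
            command.addAll(jobArgs);
            Process process = new ProcessBuilder(command).inheritIO().start();
            return process.waitFor(); // exit code of the client process
        };
    }

    /** Runs the submission and maps any failure to a non-zero exit code. */
    static int launch(JobSubmitter submitter, List<String> jobArgs) {
        try {
            return submitter.submitAndWait(jobArgs);
        } catch (Exception e) {
            System.err.println("Submission failed: " + e.getMessage());
            return 1;
        }
    }

    public static void main(String[] args) {
        // Stub submitter for demonstration; a real launcher would pass
        // commandSubmitter(...) with the cluster's own client command.
        JobSubmitter stub = jobArgs -> {
            System.out.println("Would submit job with args " + jobArgs);
            return 0;
        };
        int exitCode = launch(stub, List.of(args));
        if (exitCode != 0) {
            System.exit(exitCode);
        }
    }
}
```

Anything beyond the exit code (progress, step counts, metrics of the remote job) would have to be pulled back into the launcher explicitly, which is the limitation the answer describes.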

Data stream from a Kerberized Kafka cluster to a Hadoop cluster using Spring Boot

I have a streaming use case: a Spring Boot application that reads data from a Kafka topic and writes it to an HDFS path. Kafka and Hadoop run on two distinct clusters.
The application worked fine when the Kafka cluster had no Kerberos authentication and only Hadoop was Kerberized.
Issues started when both clusters were Kerberized: at any one time I could authenticate to only one cluster.
I did some analysis and googling, but could not find much help.
My theory is that we cannot log in to two Kerberized clusters from the same JVM instance, because the REALM and KDC details we set in code are not client-specific but JVM-specific.
It might be that I did not use the proper APIs; I am very new to Spring Boot.
I know we can do this by setting up cross-realm trust between the clusters, but I am looking for an application-level solution if possible.
I have a few questions:
Is it possible to log in to two separate Kerberized clusters from the same JVM instance? If so, please help; use of Spring Boot is preferred.
What would be the best solution to stream data from a Kafka cluster to a Hadoop cluster?
Kafka's Connect API is for streaming integration of sources and targets with Kafka, using just configuration files - no coding! The HDFS connector is what you want, and supports Kerberos authentication. It is open source and available standalone or as part of Confluent Platform.
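On the first question, one detail may be worth illustrating. The JVM system properties `java.security.krb5.realm` and `java.security.krb5.kdc` describe a single default realm and KDC, which matches the "JVM-specific" behaviour described in the question. A `krb5.conf` file, selected through the `java.security.krb5.conf` property, can list several realms instead. The sketch below only renders such a file and points the JVM at it; the realm and KDC names are placeholders, `MultiRealmKrb5Sketch` is a hypothetical helper, and this is not a verified fix for the setup above (a real configuration may also need `[domain_realm]` mappings and per-client login settings).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds a krb5.conf that knows about more than one realm.
final class MultiRealmKrb5Sketch {

    /** Renders a minimal krb5.conf with one [realms] entry per cluster. */
    static String render(String defaultRealm, Map<String, String> realmToKdc) {
        StringBuilder sb = new StringBuilder();
        sb.append("[libdefaults]\n");
        sb.append("  default_realm = ").append(defaultRealm).append("\n\n");
        sb.append("[realms]\n");
        for (Map.Entry<String, String> entry : realmToKdc.entrySet()) {
            sb.append("  ").append(entry.getKey()).append(" = {\n");
            sb.append("    kdc = ").append(entry.getValue()).append("\n");
            sb.append("  }\n");
        }
        return sb.toString();
    }

    /** Writes the config to a temp file and selects it for this JVM. */
    static Path install(String conf) throws IOException {
        Path file = Files.createTempFile("krb5-", ".conf");
        Files.writeString(file, conf);
        // Must be set before the JVM's Kerberos classes first load their configuration.
        System.setProperty("java.security.krb5.conf", file.toString());
        return file;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> realms = new LinkedHashMap<>();
        realms.put("KAFKA.EXAMPLE.COM", "kdc-kafka.example.com");   // placeholder values
        realms.put("HADOOP.EXAMPLE.COM", "kdc-hadoop.example.com"); // placeholder values
        String conf = render("KAFKA.EXAMPLE.COM", realms);
        System.out.println(conf);
        System.out.println("Installed at " + install(conf));
    }
}
```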

Spring Cloud Data Flow Remote RabbitMQ Server Config

I am new to SCDF and am trying to get started with a RabbitMQ transport layer and SCDF version 1.2.2. I have set up RabbitMQ in a separate VM and have the SCDF local server and SCDF shell jar in one VM. Can someone suggest how I can specify the server details of my RabbitMQ (which is on a different host in the same network) for SCDF to use as a transport?
For reasons outside my control I need to use the MQ setup on a different machine. Please advise.
SCDF doesn't require RabbitMQ and I think you are trying to use RabbitMQ as the binder for your Spring Cloud Stream applications that are orchestrated via SCDF.
You would need to configure the properties mentioned here
You can find more information here on how to specify these properties at SCDF.

Programmatic method of starting xd-admin and xd-container in Spring XD

In a distributed setup, how do I programmatically start containers?
More specifically, does there exist any API, similar to the one for deploying and undeploying streams, for setting up and tearing down containers?
There is currently no way to do this via an API. Containers are only known to the cluster after they are started. Upon initialization, the container registers itself with ZooKeeper. Running a container requires XD to be installed on that host, which is currently a manual process (download, unzip, configure), as is starting the container. Some automation of operations will likely be provided in a future release.

Spring XD Distributed Environment

I am working on Spring XD and GemFire XD. I want to understand how Spring XD's distributed environment works. I know Spring XD uses either Redis or RabbitMQ as the transport.
I am clear about this. I have installed Spring XD and RabbitMQ on one machine. I changed the redis.properties file and added hostnames.
Do I need to install Spring XD on all the machines? If so, after installing, how do I bring those up?
On the master machine, I will run ./xd-admin and ./xd-container.
How do you start up the nodes (Spring XD instances/workers) so that they can listen for instructions from xd-admin?
Please help me with this.
Thanks,
-Suyodhan
Redis is the only supported platform for analytics. For transport, you need either Redis or Rabbit.
Basically you just need to install Redis and RabbitMQ per their respective documentation. They can be on the same or different servers. Ideally you would use their high-availability options, for example Redis Sentinel. You don't need RabbitMQ unless you want to change the default transport from Redis to Rabbit. Once you install Redis and Rabbit, bring them up, provide their host:port info (and any additional settings as applicable) in the servers.yml of the XD install (on all nodes), and bring up the admin and containers. Everything should work automatically, using ZooKeeper as the means to manage the distributed runtime.
If you use Spring XD in distributed mode, I assume you have set up ZooKeeper as well. (If not, check this: http://docs.spring.io/spring-xd/docs/1.0.0.M7/reference/html/#_setting_up_zookeeper )
Admin and container instances register themselves with ZooKeeper as they come up. The admin queries ZooKeeper for available containers and assigns tasks such as deploying modules. ZooKeeper is the trick behind distributed mode.
Hope this helps.
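The registration flow described in this answer can be pictured with a small in-memory stand-in, written in plain Java. This is not Spring XD's code and it does not talk to ZooKeeper: the class name, the idea of a map playing the role of ephemeral nodes, and the round-robin assignment are all assumptions made purely for illustration. It only shows the shape of the interaction: containers announce themselves when they start, and the admin picks among whatever is currently registered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentSkipListMap;

// In-memory stand-in for the coordination role ZooKeeper plays; hypothetical, not XD code.
final class ClusterRegistrySketch {

    // containerId -> host; stands in for entries that exist only while a container is alive
    private final Map<String, String> containers = new ConcurrentSkipListMap<>();
    private int nextIndex = 0;

    /** A container calls this on startup, mirroring self-registration. */
    void register(String containerId, String host) {
        containers.put(containerId, host);
    }

    /** Mirrors a registration disappearing when the container goes away. */
    void unregister(String containerId) {
        containers.remove(containerId);
    }

    /** What the admin sees when it asks for available containers (sorted by id). */
    List<String> availableContainers() {
        return new ArrayList<>(containers.keySet());
    }

    /** Admin-side choice of a container for a module; round-robin is an assumption. */
    Optional<String> assignModule(String moduleName) {
        List<String> ids = availableContainers();
        if (ids.isEmpty()) {
            return Optional.empty(); // nothing registered yet, nothing to deploy to
        }
        String chosen = ids.get(Math.floorMod(nextIndex++, ids.size()));
        return Optional.of(chosen);
    }

    public static void main(String[] args) {
        ClusterRegistrySketch registry = new ClusterRegistrySketch();
        registry.register("c1", "hostA");
        registry.register("c2", "hostB");
        System.out.println("http -> " + registry.assignModule("http"));
        System.out.println("log  -> " + registry.assignModule("log"));
        registry.unregister("c1");
        System.out.println("file -> " + registry.assignModule("file"));
    }
}
```

The point of the sketch is that the admin never needs a static list of worker hosts: a node becomes usable simply by starting a container that is configured to reach the same coordination service.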
You install Spring XD once, on one machine; Spring XD will be connected to your distributed, scaled-out HDFS environment.
You need to start the following:
1. Redis, or RabbitMQ in your case
2. hsqldb server
3. container
4. admin
When you start Spring XD, you first need to register the name node in the XD shell using the command:
hadoop config fs --namenode hdfs://serverip:8020
Then you can use any module defined in Spring XD (in a stream or batch job) by specifying its parameters directly, without specifying them in the servers.yml file.
Moha.
