File path issue while creating simple spring XD stream - spring-xd

I am running Spring XD on a distributed YARN setup. I am using Hortonworks Data Platform with 6 data nodes and 1 name node, and I am using the name node as a client node. I have invoked the XD shell from the name node, and the admin and containers are running on the data nodes. So when I create a Spring XD stream definition as below:
xd> stream create --name filetest --definition "file | log" --deploy
It looks for /tmp/xd/input/filetest on the data nodes, to which I don't have access. Is this the normal behavior of Spring XD? I think it should look for the location on the node from which I invoked the XD shell. Could you please help me with this?

The containers (regardless of whether they are running on YARN or not) have no knowledge of where the shell is running.
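In other words, the file source polls the local filesystem of whichever data node hosts the module, so the input directory must exist on that node, not on the node where the shell was started. A minimal sketch, run on the data node hosting the module (the path comes from the stream definition above):

```shell
# Create the directory the file source polls, on the data node
# that hosts the file module (not on the client/name node):
mkdir -p /tmp/xd/input/filetest
# Any file dropped into this directory is picked up by the file source:
echo "sample line" > /tmp/xd/input/filetest/sample.txt
```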

Related

Run MapReduce Jar in Spring Cloud Data Flow

I need to run a MapReduce Spring Boot application in Spring Cloud Data Flow. Usually, applications registered in SCDF are executed using the "java -jar jar-name" command. But my program is a MapReduce job, and it has to be executed using "hadoop jar jar-name". How do I achieve this? What would be a better approach to run a MapReduce application in SCDF? Is it possible to directly register MapReduce apps?
I'm using local data flow server to register the application.
In SCDF, the format of the command used to run a JAR file is managed by a deployer. For example, there is a local deployer, a Cloud Foundry deployer, etc. There was a Hadoop/YARN deployer, but I believe it was discontinued.
Given that the deployer itself is an SPI, you can easily implement your own, or even fork/extend the local deployer and modify only what's needed.

Spring XD Yarn: Stream runs only on exactly two containers

Spring XD Yarn ver 1.2.1
1. In servers.yml, I set the number of containers to 15. (I have 16 node managers in my YARN cluster.)
2. All 15 containers are created. I confirmed this by executing 'runtime containers' in xd-shell.
3. When I run a Spring XD stream from a Kafka source to an HDFS sink, exactly two of the 15 containers are used; the remaining 13 sit idle. The stream runs for 6 to 7 hours, and in all that time only those two live containers are used for this stream.
4. Please let me know how to make my stream run on all 15 live containers. Is there any configuration I missed?
You can take a look at the deployment manifest: http://docs.spring.io/spring-xd/docs/current/reference/html/#deployment-manifest
You can use deployment properties to scale up your stream and control the module count, i.e. how many instances of each module you are deploying. I suspect that your stream runs with the default count of 1, which means you get exactly one source module instance and one sink module instance. The default deployment algorithm would indeed place them on separate containers.
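For example, the module count can be raised at deploy time via deployment properties (the stream name and counts below are illustrative, not a prescription):

```
xd:> stream create --name kafkatohdfs --definition "kafka | hdfs"
xd:> stream deploy --name kafkatohdfs --properties "module.kafka.count=5,module.hdfs.count=5"
```

With counts above 1, the deployer spreads the module instances across the available containers instead of using only two.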

Spring XD on YARN: ver 1.2.1 direct binding support for kafka source

1. I know this is not supported yet (as of ver 1.3.0); a definite date/version would help our project schedule.
2. This direct binding support for the Kafka source is very critical for our project. We are in a situation where we would totally abandon Spring XD on YARN in our project just because of this.
Trying to do
stream create --name directkafkatohdfs --definition "kafka | hdfs"
stream deploy directkafkatohdfs --properties "module.*.count=0"
Hitting the exception "must be a positive number. 0-count kafka sources are not currently supported"
I just want to eliminate the use of a message bus/transport (Redis/Kafka/RabbitMQ) and have a direct binding of the source (kafka) and sink (hdfs) in the same YARN container.
Thanks
Satish Srinivasan
satsrinister#gmail.com
Thanks for the interest in Spring XD :).
For Spring XD 1.x, we suggest using composition instead of direct binding with the Kafka bus - or, in your case, the Kafka source. However, apart from that, in Spring XD 1.x it is not possible to create an entire stream without at least one hop over the bus (regardless of the type of bus or modules being used).
We are addressing direct binding (including support for entire directly bound streams) as part of Spring Cloud Data Flow (http://cloud.spring.io/spring-cloud-dataflow/) - which is the next evolution of Spring XD. We are intending to support it as a specific configuration option, rather than as a side-effect of zero-count modules. From an end-user perspective, SCDF supports the same DSL as Spring XD (with minor variations) and has the same administration UI, and definitely supports YARN, so it should be a fairly seamless transition. I would suggest starting to take a look at that. The upcoming 1.0.0.M2 release of Spring Cloud Data Flow will not support direct binding via DSL yet, but the intent is to support it in the final release which is currently planned for Q1 2016.
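As a sketch of the composition approach mentioned above for Spring XD 1.x: a composed module runs its inner modules together with direct binding between them, so only the hop from the composed module to the next module crosses the bus. The module and stream names below are illustrative:

```
xd:> module compose kafkain --definition "kafka | transform --expression=payload"
xd:> stream create --name kafkatohdfs --definition "kafkain | hdfs" --deploy
```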

jdbc to HDFS import using spring batch job

I am able to import data from MS SQL to HDFS using the jdbchdfs Spring Batch job. But if that container fails, the job does not shift to another container. How do I make the job fault tolerant?
I am using spring xd 1.0.1 release
You don't mention which version of Spring XD you're currently using so I can't verify the exact behavior. However, on a container failure with a batch job running in the current version, the job should be re-deployed to a new eligible container. That being said, it will not restart the job automatically. We are currently looking at options for how to allow a user to specify if they want it restarted (there are scenarios that fall into both camps so we need to allow a user to configure that).
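If the job has been re-deployed to a new container but not restarted automatically, it can be relaunched manually from the XD shell (the job name below is hypothetical):

```
xd:> runtime containers
xd:> job launch jdbchdfsimport
```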

Configure multiple instances of ActiveMq listeners on multiple JVM's

I want to configure multiple instances of ActiveMQ listeners on multiple JVMs (there is a 1-1 mapping between queues and listeners). We are separating listeners for high performance. Currently I have a few options, such as configuring in a database, Spring XML, or a properties file. Not sure which is the best approach... any help appreciated. Thanks.
Configuring ActiveMQ listeners via Spring's MessageListenerContainer in Spring XML is the standard approach.
See this page for more details: http://activemq.apache.org/spring-support.html
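A minimal Spring XML sketch of one listener container per queue, assuming a queue named `orders.queue`, a listener bean `orderListener`, and a local broker URL (all three are illustrative assumptions):

```xml
<!-- Connection to the broker (URL is an assumption) -->
<bean id="connectionFactory" class="org.apache.activemq.ActiveMQConnectionFactory">
  <property name="brokerURL" value="tcp://localhost:61616"/>
</bean>

<!-- One listener container per queue; repeat this bean per queue/listener pair -->
<bean class="org.springframework.jms.listener.DefaultMessageListenerContainer">
  <property name="connectionFactory" ref="connectionFactory"/>
  <property name="destinationName" value="orders.queue"/>
  <property name="messageListener" ref="orderListener"/>
  <property name="concurrentConsumers" value="5"/>
</bean>
```

Each JVM can load its own XML file containing only the containers for the queues it owns, which keeps the 1-1 queue-to-listener mapping explicit per process.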
You can create multiple instances of ActiveMQ using the following steps.
Go to your ActiveMQ directory's bin folder and run the commands below.
Create the Instance 1
cd /apache-activemq-5.8.0/bin
./activemq create instance1
./activemq setup ~/.activemqrc-instance-instance1
ln -s /home/[yourHomeDir]/.activemqrc-instance-instance1
Create the Instance 2
./activemq create instance2
./activemq setup ~/.activemqrc-instance-instance2
ln -s /home/[yourHomeDir]/.activemqrc-instance-instance2
Once the above commands are executed, go to the instance2 conf directory and change the default ports for OpenWire and AMQP in activemq.xml, and also change the connector port in jetty.xml.
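For example, the transport connector section of instance2's activemq.xml might be edited like this so the two brokers do not collide (61617 and 5673 are illustrative choices; the stock defaults are 61616 and 5672):

```xml
<transportConnectors>
    <transportConnector name="openwire" uri="tcp://0.0.0.0:61617"/>
    <transportConnector name="amqp" uri="amqp://0.0.0.0:5673"/>
</transportConnectors>
```

The web console port in jetty.xml (8161 by default) would need a similar bump for the second instance.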
You can start each instance as below.
cd apache-activemq-5.8.0/bin/instance1/bin
./instance1 console
Open a new terminal tab:
cd apache-activemq-5.8.0/bin/instance2/bin
./instance2 console
