I have installed a single-node Apache YARN with Kafka-Zookeeper and Spring Cloud Dataflow 1.0.3.
All is working fine, but when I try some example deployments, such as:
stream create --name "ticktock" --definition "time | hdfs --rollover=100" --deploy
http --port=8000 | log --name=logtest --level=INFO
The stream's status never stays at "deployed"; it keeps cycling through "undeployed" -> "partial" -> "deployed" in a constant loop.
The YARN application itself deploys successfully, but it seems as though the communication between the Spring Cloud Dataflow server instance and Apache Hadoop is constantly failing.
What could be causing this?
Thanks in advance!
Well, there's not a lot of information to go on here, but you may want to check that the necessary base directory exists in HDFS and that your YARN user has read/write permission to it.
spring.cloud.deployer.yarn.app.baseDir=/dataflow
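As a rough illustration, assuming the default base directory above and a hadoop user/group (both assumptions; adjust for your setup), the directory could be created and opened up like this:
# create the Data Flow base directory in HDFS and grant access (user/group are placeholders)
hdfs dfs -mkdir -p /dataflow
hdfs dfs -chown -R hadoop:hadoop /dataflow
hdfs dfs -chmod -R 775 /dataflow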
Thanks to all for your answers! I have been busy with other high-priority projects, but I was able to build everything using Ambari (including the plugin mentioned by Sabby Adannan). Now everything is working great!
I'm not sure whether anyone can help me, but I'll give it a try.
I'm running Jenkins on an OpenShift cluster, using it for deployment and as a job server for running ETL jobs. These jobs transfer data from flat files to databases and from database to database.
Now I need to expand the system to transfer data to a Hadoop cluster using MapR.
What I would like to know is how I can use a new Jenkins slave as a job server on an edge node of the MapR Hadoop cluster. Do I need Jenkins on the edge node, or can I use MapR from my existing Jenkins job server?
Maybe someone is able to help me or has some information/links on how to solve this.
Thanks to all.
"Use MapR" isn't quite clear to me because I just view it as Hadoop at the end of the day, but you can effectively make your Jenkins slave an "edge node" by installing only the Hadoop Java (maybe also MapR) client utilities plus any XML configuration files from the other edge nodes that define how to communicate with the cluster.
Then Jenkins would be able to run sh("hadoop jar app.jar"), for example.
If you're using OpenShift, you might also try putting a Hadoop client inside a Docker image that could run in Jenkins, or anywhere else.
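A rough sketch of that edge-node setup on a Jenkins slave (the Hadoop version, download URL, and configuration paths below are assumptions; a MapR cluster would use the MapR client packages instead):
# install the Hadoop client utilities on the Jenkins slave
curl -O https://archive.apache.org/dist/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz
tar -xzf hadoop-2.7.7.tar.gz -C /opt
export HADOOP_HOME=/opt/hadoop-2.7.7
# copy the cluster's client configuration from an existing edge node
scp edgenode:/etc/hadoop/conf/*-site.xml $HADOOP_HOME/etc/hadoop/
# a Jenkins job can now submit work to the cluster, e.g. via sh("hadoop jar app.jar")
$HADOOP_HOME/bin/hadoop jar app.jar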
Has anyone successfully run Flink jobs with this kind of setup (GitHub CI/CD and Kubernetes)?
Since Flink jobs can't be dockerized and deployed in a natural way as part of the container, I am not sure what the best way of doing this is.
Thanks
Yes, this can be done. For the dockerizing portion, see the docs about running Flink on Docker and running Flink on Kubernetes, as well as Patrick Lukas' Flink Forward talk on "Flink in Containerland". You'll find links to Docker Hub, GitHub, SlideShare, and YouTube behind these links.
dA Platform 2 is a commercial offering from data Artisans that supports CI/CD integrations for Flink on Kubernetes. The demo video from the product announcement at Flink Forward Berlin 2017 illustrates this.
I solved this. Flink jobs can be dockerized and deployed in a natural way as part of the container.
I extended the Flink Docker image and added a "flink run some.jar" step to it.
It works perfectly:
https://github.com/Aleksandr-Filichkin/flink-k8s
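A minimal sketch of that kind of image, assuming the official flink base image and a job jar named some.jar (the tag, paths, and startup commands are illustrative, not the exact contents of the repository above):
FROM flink:1.4
COPY target/some.jar /opt/some.jar
# start a local JobManager and TaskManager, submit the job, then keep the container alive
CMD /opt/flink/bin/jobmanager.sh start && \
    /opt/flink/bin/taskmanager.sh start && \
    sleep 10 && \
    /opt/flink/bin/flink run /opt/some.jar && \
    tail -f /opt/flink/log/*.out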
I have been using a Hadoop cluster, created using Google's script, for a few months.
Every time I boot the machines I have to manually start Hadoop using:
sudo su hadoop
cd /home/hadoop/hadoop-install/sbin
./start-all.sh
Besides scripting, how can I resolve this?
Or is this just the way it is by default?
(The first boot after cluster creation always starts Hadoop automatically; why not every subsequent boot?)
You have to configure this using init.d.
The document provides more details and a sample script for Datameer; you need to follow similar steps. The script should be smart enough to check that all the nodes in the cluster are up before invoking start-all.sh on them over SSH.
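As a sketch, a systemd unit along these lines (the unit name is an assumption; the path and user come from the question) would start Hadoop on every boot:
# /etc/systemd/system/hadoop.service
[Unit]
Description=Start Hadoop daemons at boot
After=network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
User=hadoop
ExecStart=/home/hadoop/hadoop-install/sbin/start-all.sh

[Install]
WantedBy=multi-user.target
Enable it once with "systemctl enable hadoop"; an equivalent init.d script works on older distributions.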
While different third-party scripts and "getting started" solutions like Cloud Launcher have varying degrees of support for automatic restart of Hadoop on boot, the officially supported tools are bdutil as a do-it-yourself deployment tool, and Google Cloud Dataproc as a managed service, both of which are already configured with init.d and/or systemd to automatically start Hadoop on boot.
More detailed instructions on using bdutil here.
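If the managed route fits, a Dataproc cluster can be created with a single command (the cluster name and zone below are placeholders):
gcloud dataproc clusters create my-cluster --zone us-central1-a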
I have created the Node-RED boilerplate and bound the Analytics for Apache Hadoop service.
It clearly appears as a bound service in the dashboard.
But when I launch the Node-RED app and add an HDFS node, I get the following message:
"Unbounded Service: Big Insights service not bound. This node wont work"
Any idea what I am doing wrong? It used to work well for me a few weeks ago.
You will need to attach the BigInsights for Apache Hadoop service to your app.
Please attach the service and restage your app.
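For example, with the Cloud Foundry CLI (the app and service instance names below are placeholders; use the names from your dashboard):
cf bind-service my-nodered-app my-biginsights-hadoop
cf restage my-nodered-app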
I have installed Storm correctly, but I am struggling with how to run an example on it. Can anyone please give me a link or a suggestion for how to execute an example? Also, what is the benefit of running Storm under supervision?
Assuming you have installed Storm on your local machine, you have an example project bundled with it, which you can find under examples/storm-starter in your Storm repository.
To run this example, follow the steps in the README.markdown file in the root of the storm-starter folder. The steps can also be found at https://github.com/apache/storm/tree/v0.10.0/examples/storm-starter
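As a rough sketch of what those steps boil down to (the exact jar name and topology class vary by version, so treat these as placeholders and check the README):
cd examples/storm-starter
mvn clean package -DskipTests
# submit one of the bundled example topologies to the cluster
storm jar target/storm-starter-*.jar storm.starter.WordCountTopology wordcount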
Regarding running Storm under supervision: since Storm and ZooKeeper have a fail-fast policy, the daemons shut down whenever they hit an error. Running them under a supervisor process brings them back up automatically if they exit because of an error.
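For instance, a minimal supervisord configuration along these lines (paths and program names are illustrative) restarts the daemons automatically when they exit:
[program:storm-nimbus]
command=/opt/storm/bin/storm nimbus
autorestart=true

[program:storm-supervisor]
command=/opt/storm/bin/storm supervisor
autorestart=true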