Benefit of running Storm under supervision, example/sample code - apache-storm

I have installed Storm correctly, but I am struggling to run an example on it. Can anyone give me a link or suggestion for executing an example? Also, what are the benefits of running Storm under supervision?

Assuming you have installed Storm on your local machine, you have an example project bundled with it, which you can find in examples/storm-starter of your Storm repository.
To run this example, follow the steps in the README.markdown file in the root of the storm-starter folder. The steps can also be found at https://github.com/apache/storm/tree/v0.10.0/examples/storm-starter
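For instance, the README describes running the bundled WordCount topology in local mode with Maven. A minimal sketch based on the v0.10.0 README (the starter package was moved to org.apache.storm.starter in later versions, so check the README for your tag):

# build Storm itself from the repository root first
mvn clean install -DskipTests=true
# then, from examples/storm-starter, run the topology in local mode
mvn compile exec:java -Dstorm.topology=storm.starter.WordCountTopology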
Regarding running Storm under supervision: Storm and ZooKeeper follow a fail-fast policy, so their daemons shut down whenever they hit an unexpected error. Running them under a process supervisor (such as supervisord, daemontools, or monit; not to be confused with Storm's own supervisor daemon, which is one of the processes being supervised) brings the servers back up automatically if they exit because of errors.
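For example, a minimal supervisord sketch; the install path under /opt/storm is an assumption, not part of the original answer:

; /etc/supervisord.conf fragment -- paths are assumptions
[program:storm-nimbus]
command=/opt/storm/bin/storm nimbus
autostart=true
autorestart=true

[program:storm-supervisor]
command=/opt/storm/bin/storm supervisor
autostart=true
autorestart=true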

Related

Apache Flink on Windows

First, I am a complete newbie with Flink. I have installed Apache Flink on Windows.
I start Flink with start-cluster.bat. It prints out
Starting a local cluster with one JobManager process and one
TaskManager process. You can terminate the processes via CTRL-C in the
spawned shell windows. Web interface by default on
http://localhost:8081/.
Anyway, when I submit the job, I have a bunch of messages:
DEBUG org.apache.flink.runtime.rest.RestClient - Received response
{"status":{"id":"IN_PROGRESS"}}.
In the log in the web UI at http://localhost:8081/, I see:
2019-02-15 16:04:23.571 [flink-akka.actor.default-dispatcher-4] WARN
akka.remote.ReliableDeliverySupervisor
flink-akka.remote.default-remote-dispatcher-6 - Association with
remote system [akka.tcp://flink@127.0.0.1:56404] has failed, address
is now gated for [50] ms. Reason: [Disassociated]
If I go to the Task Manager tab, it is empty.
I tried to find out whether any port needed by Flink was already in use, but that does not seem to be the case.
Any idea to solve this?
I was running Flink locally using IntelliJ, starting from the Maven archetype that gives you ready-to-go examples:
https://ci.apache.org/projects/flink/flink-docs-stable/dev/projectsetup/java_api_quickstart.html
You do not necessarily have to install Flink unless you are using it as a service on a cluster. For a single run, the IDE compiles the job and executes it against an embedded local Flink instance.
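As a sketch of that quickstart (the archetype coordinates come from the linked page; the version number here is only an example):

mvn archetype:generate \
  -DarchetypeGroupId=org.apache.flink \
  -DarchetypeArtifactId=flink-quickstart-java \
  -DarchetypeVersion=1.8.0

And a minimal job you can run straight from the IDE; when no cluster is configured, getExecutionEnvironment() returns an embedded local environment, so nothing has to be installed:

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class LocalRun {
    public static void main(String[] args) throws Exception {
        // In the IDE this creates an embedded local environment, no cluster needed
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.fromElements("hello", "flink").print();
        env.execute("ide-local-run");
    }
}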

Jenkins as JobServer on Hadoop EdgeNode

I'm not sure that anyone can help me, but I'll give it a try.
I'm running Jenkins on an OpenShift cluster, using it for deployments and as a job server for running ETL jobs. These jobs transfer data from flat files to databases and from database to database.
Now I need to expand the system to transfer data to a Hadoop cluster using MapR.
What I would like to know is: how can I use a new Jenkins slave as a job server on an edge node of the Hadoop cluster using MapR? Do I need Jenkins on the edge node, or can I use MapR from my existing Jenkins job server?
Maybe someone is able to help me or has some information/links on how to solve this.
Thanks to all.
"Use MapR" isn't quite clear to me because I just view it as Hadoop at the end of the day, but you can effectively make your Jenkins slave an "edge node" by installing only the Hadoop Java (maybe also MapR) client utilities plus any XML configuration files from the other edge nodes that define how to communicate with the cluster.
Then, Jenkins would be able to run sh("hadoop jar app.jar"), for example
If you're using Openshift, you might also try putting a Hadoop client inside a Docker image that could run in Jenkins, or anywhere else
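To make that concrete, a minimal declarative Jenkinsfile sketch; the agent label, jar name, main class, and HDFS paths are all hypothetical:

// Jenkinsfile sketch -- label, jar, class, and paths are assumptions
pipeline {
    agent { label 'hadoop-edge' }
    stages {
        stage('Run ETL on the cluster') {
            steps {
                // works once the Hadoop client and cluster XML configs are on this node
                sh 'hadoop jar app.jar com.example.EtlJob /input /output'
            }
        }
    }
}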

Unsuccessful deployment on Spring Cloud Dataflow with Apache YARN

I have installed a single-node Apache YARN with Kafka-Zookeeper and Spring Cloud Dataflow 1.0.3.
Everything is working fine, but when I try some example deployments, like:
stream create --name "ticktock" --definition "time | hdfs --rollover=100" --deploy
http --port=8000 | log --name=logtest --level=INFO
The stream's status never stays at "deployed"; it keeps cycling through "undeployed" -> "partial" -> "deployed" in a constant loop.
The YARN application itself deploys successfully, but it looks as if the communication between the Spring Cloud Data Flow server instance and Apache Hadoop is constantly failing.
What can be the possible issue for this?
Thanks in advance!
Well, there's not a lot of information to go on here, but you may want to check that you have the necessary base directory created in HDFS and that your YARN user has read/write permission to it:
spring.cloud.deployer.yarn.app.baseDir=/dataflow
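A sketch of that check, assuming the default /dataflow base directory and that the deployer runs as user 'yarn' (match both to your configuration):

# paths and user are assumptions
hdfs dfs -mkdir -p /dataflow
hdfs dfs -chown -R yarn:yarn /dataflow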
Thanks to all for your answers! I've been busy with other high-priority projects, but I was able to build everything using Ambari (including the plugin mentioned by Sabby Adannan). Now everything is working great!

Giraph ShortestPath demo never exits, Patch 756 already applied (I think)

I am a novice to Hadoop and Giraph. I am trying to run the Giraph ShortestPaths example using Giraph 1.1 on our server, which is running YARN. After much hair-pulling, I finally got it to run. Now the problem is to get it to stop.
The giraph process initializes, and begins running. And then it keeps running. I see log messages that state it is running (with a number of containers) and the time elapsed.
I browsed StackOverflow and other sites to find a solution to the problem. One post mentioned a patch 756 to giraph. However, I inspected the code, and it appears that I already have a patched version (I see the HaltInstructionsWriter class, for instance).
How do I get Giraph to recognize a request to halt? Or do I need to modify the example code?
The Giraph example should end in approximately 50 seconds and doesn't need that patch to finish. Most likely there is some problem that prevents it from finishing. You should check the logs, for example those of the Giraph application master, and see what is happening with the containers it created.
Even when Giraph reports your shortest-paths example in the "RUNNING" state, that doesn't mean everything is all right ;)
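For example, a sketch of pulling the container logs with the YARN CLI; the application id below is a placeholder, take the real one from the ResourceManager UI:

# application id is a placeholder
yarn logs -applicationId application_1234567890123_0042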

Cascading for Impatient TFIDF example freezing

I'm trying to work with Cascading to create and execute complex data processing workflows on a local Hadoop cluster.
I wish to create a TFIDF vector so I can apply Machine Learning algorithms such as NaiveBayes on it using the Apache Spark framework.
The problem is that after I create the jar and launch it using the following commands, the program freezes. Here is the log file.
You can find the sources here. The related source code is in part6.
Thanks!
I have found the problem. The nodes of the cluster were unhealthy, but the log doesn't show that, and Cascading freezes because its task stays UNASSIGNED.
To solve the problem you have to fix the nodes' health; in my case I just had to correct the hadoop-yarn containers directory and the local NameNode directory.
You might run into other errors, so I suggest you check your Hadoop log files and the web admin UI for the Hadoop nodes.
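As a quick sketch of checking node health from the command line (the node id is a placeholder):

# UNHEALTHY nodes are what leave tasks stuck in UNASSIGNED
yarn node -list -all
yarn node -status <node-id>   # the health report typically names the failing local/log dirs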
