How to list currently running topologies from the Storm command line client? - apache-storm

Is there any way to display all the currently running Storm topologies from the Storm command line client?
The Storm documentation doesn't say anything about this:
http://storm.apache.org/documentation/Command-line-client.html

You can run:
$STORM_HOME/bin/storm list
Storm also provides a web-based UI for monitoring this kind of information.
Alternatively, you can write your own Thrift client that connects to Nimbus and fetches whatever metrics you need. If you come from a Java background (or similar), it is easy to write and run from the prompt.
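For example, a minimal sketch of such a client in Java (package names below are from the pre-1.0 backtype.storm namespace; releases 1.0 and later use org.apache.storm instead):

import java.util.Map;
import backtype.storm.generated.ClusterSummary;
import backtype.storm.generated.TopologySummary;
import backtype.storm.utils.NimbusClient;
import backtype.storm.utils.Utils;

public class ListTopologies {
    public static void main(String[] args) throws Exception {
        // Picks up nimbus.host and nimbus.thrift.port from storm.yaml on the classpath
        Map conf = Utils.readStormConfig();
        NimbusClient nimbus = NimbusClient.getConfiguredClient(conf);

        // getClusterInfo() is the same Thrift call that `storm list` relies on
        ClusterSummary cluster = nimbus.getClient().getClusterInfo();
        for (TopologySummary t : cluster.get_topologies()) {
            System.out.println(t.get_name() + "\t" + t.get_status()
                    + "\t" + t.get_num_workers() + " workers");
        }
    }
}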

Related

Apache Flink on Windows

First, I am a complete newbie with Flink. I have installed Apache Flink on Windows.
I start Flink with start-cluster.bat. It prints out
Starting a local cluster with one JobManager process and one
TaskManager process. You can terminate the processes via CTRL-C in the
spawned shell windows. Web interface by default on
http://localhost:8081/.
Anyway, when I submit the job, I have a bunch of messages:
DEBUG org.apache.flink.runtime.rest.RestClient - Received response
{"status":{"id":"IN_PROGRESS"}}.
In the log in the web UI at http://localhost:8081/, I see:
2019-02-15 16:04:23.571 [flink-akka.actor.default-dispatcher-4] WARN
akka.remote.ReliableDeliverySupervisor
flink-akka.remote.default-remote-dispatcher-6 - Association with
remote system [akka.tcp://flink@127.0.0.1:56404] has failed, address
is now gated for [50] ms. Reason: [Disassociated]
If I go to the Task Manager tab, it is empty.
I tried to find if any port needed by flink was in use but it does not seem to be the case.
Any idea to solve this?
So I was running Flink locally using IntelliJ, using the Maven archetype that gives you ready-to-go examples:
https://ci.apache.org/projects/flink/flink-docs-stable/dev/projectsetup/java_api_quickstart.html
You don't necessarily have to install Flink unless you are running it as a service on a cluster. The IDE will compile and run the job just fine, spinning up a throwaway local Flink instance for a single run.
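For illustration, a minimal sketch of a job that runs straight from the IDE with no installed Flink (class name and sample values are made up):

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class LocalFlinkSketch {
    public static void main(String[] args) throws Exception {
        // Outside a real cluster this returns an embedded local environment,
        // so no separate Flink installation is needed for a single run
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<String> words = env.fromElements("flink", "runs", "locally");
        words.map(String::toUpperCase).print(); // print() also triggers execution
    }
}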

Apache Storm Flux change topology

Is it possible to change the topology layout while it is running? I would like to change the stream groupings and bolts while it is active.
Submitting the YAML file with the new topology layout fails because the topology is already running.
I'm using Apache Storm 0.10.0
Thanks
It is not possible to change the structure of a topology while it is running. You need to kill the topology and redeploy the new version afterwards.
The only parameter you can change while a topology is running is the parallelism. See here for more details:
V 2.0.0 - https://storm.apache.org/releases/2.0.0/Understanding-the-parallelism-of-a-Storm-topology.html
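For completeness, parallelism is changed at runtime with the storm rebalance CLI command, or programmatically through the Nimbus Thrift API. A rough sketch against the 0.10.x API (topology and bolt names are placeholders):

import java.util.Map;
import backtype.storm.generated.RebalanceOptions;
import backtype.storm.utils.NimbusClient;
import backtype.storm.utils.Utils;

public class RebalanceSketch {
    public static void main(String[] args) throws Exception {
        Map conf = Utils.readStormConfig();
        NimbusClient nimbus = NimbusClient.getConfiguredClient(conf);

        // Roughly equivalent to: storm rebalance my-topology -w 10 -n 5 -e my-bolt=8
        RebalanceOptions options = new RebalanceOptions();
        options.set_wait_secs(10);                  // drain time before rebalancing
        options.set_num_workers(5);                 // new number of worker processes
        options.put_to_num_executors("my-bolt", 8); // new executor count for one component

        nimbus.getClient().rebalance("my-topology", options);
    }
}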

Have To Manually Start Hadoop Cluster on GCloud

I have been using a Hadoop cluster, created using Google's script, for a few months.
Every time I boot the machines I have to manually start Hadoop using:
sudo su hadoop
cd /home/hadoop/hadoop-install/sbin
./start-all.sh
Besides scripting, how can I resolve this?
Or is this just the way it is by default?
(The first boot after cluster creation always starts Hadoop automatically, why not always?)
You have to configure this using init.d.
The init.d documentation provides more details and a sample script (for Datameer); you need to follow similar steps. The script should be smart enough to check that all the nodes in the cluster are up before invoking start-all.sh over ssh.
While different third-party scripts and "getting started" solutions like Cloud Launcher have varying degrees of support for automatic restart of Hadoop on boot, the officially supported tools are bdutil as a do-it-yourself deployment tool, and Google Cloud Dataproc as a managed service, both of which are already configured with init.d and/or systemd to automatically start Hadoop on boot.
More detailed instructions on using bdutil are available here.

Running Hadoop Job Through Web Interface

Is there any way to run a Hadoop job through a web interface? e.g., triggering a Hadoop job with a button.
I want to implement a web interface for my Hadoop project.
Thanks!
Cloudera's Hue will be useful; it is designed for exactly this purpose.
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.2/Hue-2-User-Guide/hue26.html
Try the below options:
Option 1
Create a Java web project with a web service and gather all the UI inputs on this client side.
Create another web project as the remote server, which receives all of the above job inputs and passes them to the jobs.
The remote-server project should always be up and running on the cluster, waiting for client requests.
Use JSch on the server side to invoke whatever Hadoop commands you pass from the UI (see the sketch after these options).
OR
Option 2
You could use a MySQL database and store all job parameters from the UI, then write a simple Java program with JSch that polls the DB and runs these Hadoop commands. A runnable jar running all the time.
Hope these two ideas help.
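A rough sketch of the JSch call both options rely on, with hypothetical host, credentials, and jar path:

import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import com.jcraft.jsch.ChannelExec;
import com.jcraft.jsch.JSch;
import com.jcraft.jsch.Session;

public class RemoteHadoopJob {
    public static void main(String[] args) throws Exception {
        Session session = new JSch().getSession("hadoop", "edge-node.example.com", 22);
        session.setPassword("secret");
        session.setConfig("StrictHostKeyChecking", "no"); // acceptable in a sketch, not in production
        session.connect();

        ChannelExec channel = (ChannelExec) session.openChannel("exec");
        channel.setCommand("hadoop jar /opt/jobs/wordcount.jar WordCount /input /output");
        InputStream out = channel.getInputStream(); // obtain before connect(), per JSch convention
        channel.connect();

        // Stream the job's console output back, e.g. to show progress in the UI
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(out))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
        channel.disconnect();
        session.disconnect();
    }
}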

Benefit of running Storm under supervision, example/sample code

I have installed Storm correctly, but I am struggling with how to run an example on it. Can anyone please give me a link or a suggestion for executing an example? Also, what is the benefit of running Storm under supervision?
Assuming you have installed Storm on your local machine, an example project is bundled along with it, which you can find in examples/storm-starter of your Storm repository.
To run this example, follow the series of steps mentioned in the README.markdown file in the root of the storm-starter folder. The steps can also be found at https://github.com/apache/storm/tree/v0.10.0/examples/storm-starter
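If you just want to see a topology run without going through the whole README, here is a minimal local-mode sketch along the same lines (0.10.x package names; TestWordSpout is a test utility bundled with Storm):

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.testing.TestWordSpout;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Tuple;

public class LocalModeSketch {
    public static class PrintBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            System.out.println(tuple.getString(0)); // just echo each word
        }
        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // terminal bolt, emits nothing
        }
    }

    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("words", new TestWordSpout());
        builder.setBolt("print", new PrintBolt()).shuffleGrouping("words");

        LocalCluster cluster = new LocalCluster(); // in-process cluster, no daemons needed
        cluster.submitTopology("demo", new Config(), builder.createTopology());
        Thread.sleep(10000);                       // let it run for a few seconds
        cluster.shutdown();
    }
}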
Regarding running Storm under supervision: since the Storm and ZooKeeper daemons follow a fail-fast policy, they shut themselves down whenever an error occurs. Running them under a supervision tool (e.g. daemontools or monit) brings the daemons back up automatically when they exit because of errors.
