Problem running topologies on an apache-storm-2.2.1 cluster? - apache-storm

I run 3 topologies on a Storm cluster. The 'storm list' command shows all 3 topologies with state ACTIVE, but the /home/storm/data/supervisor/stormdist/ directory contains only 2 of them. Why is one of them not running?

Related

How can I stop Apache Storm Nimbus, UI and Supervisor?

I run Apache Storm in a cluster and I was looking for ways to stop and/or restart Nimbus, Supervisor and UI. Would writing a service help? What should I write in this service file, and where should I place it? Thank you in advance.
Yes, writing a service is the recommended way to run Storm. The commands you want to run are storm nimbus to start Nimbus (minimum 1 per cluster), storm supervisor to run the supervisor (1 per worker machine), storm ui (1 per cluster), and storm logviewer (1 per worker machine). There are other commands you can run as well; you can find them by simply running storm with no arguments, which prints the full list.
Regarding how to write the service, take a look at the upstart cookbook http://upstart.ubuntu.com/cookbook/.
There's an example script here you can probably use to get started https://unix.stackexchange.com/a/84289
You can set them up as services that start when the node boots; the same mechanism can be used to stop them:
/etc/rc.d/SERVICE start|stop|restart
You can find the process with "ps -aux | grep nimbus" (or grep supervisor, etc.), then stop it by passing the process ID to the "kill" command.
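The upstart cookbook linked above covers Ubuntu's older init system. On systemd-based distributions, a minimal unit file might look like the sketch below; the install path /opt/storm and the storm user are assumptions, so adjust them for your setup:

```ini
# /etc/systemd/system/storm-nimbus.service  (hypothetical path and values)
[Unit]
Description=Apache Storm Nimbus
After=network.target

[Service]
User=storm
ExecStart=/opt/storm/bin/storm nimbus
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

After placing the file, systemctl enable --now storm-nimbus starts it and registers it at boot; analogous units work for storm supervisor, storm ui, and storm logviewer.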

YARN shell commands to get top 3 yarn application in running state

I want to get the top 3 YARN applications in the RUNNING state from all clusters.
output: ApplicationId, Application Type, Memory occupied, start time
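A minimal sketch with the YARN CLI: list the RUNNING applications and keep the first 3 IDs, then query each for details. The 'NR > 2' assumes the default two-line header that 'yarn application -list' prints before the data rows, which can vary by version; start time and memory are not in the list output, so they have to come from a per-application '-status' call.

```shell
# Print the application IDs of the first 3 RUNNING YARN applications.
# The first two lines of 'yarn application -list' output are a summary
# line and a column header, so data rows start at NR > 2.
top_running_apps() {
  yarn application -list -appStates RUNNING \
    | awk 'NR > 2 { print $1 }' \
    | head -3
}

# Usage sketch: pull start time, type, and resource usage per application.
# for id in $(top_running_apps); do yarn application -status "$id"; done
```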

Spark on Hadoop YARN - executor missing

I have a cluster of 3 macOS machines running Hadoop and Spark-1.5.2 (though with Spark-2.0.0 the same problem exists). With 'yarn' as the Spark master URL, I am running into a strange issue where tasks are only allocated to 2 of the 3 machines.
Based on the Hadoop dashboard (port 8088 on the master) it is clear that all 3 nodes are part of the cluster. However, any Spark job I run only uses 2 executors.
For example here is the "Executors" tab on a lengthy run of the JavaWordCount example:
"batservers" is the master. There should be an additional slave, "batservers2", but it's just not there.
Why might this be?
Note that none of my YARN or Spark (or, for that matter, HDFS) configurations are unusual, except provisions for giving the YARN resource- and node-managers extra memory.
Remarkably, all it took was a detailed look at the spark-submit help message to discover the answer:
YARN-only:
...
--num-executors NUM Number of executors to launch (Default: 2).
If I specify --num-executors 3 in my spark-submit command, the 3rd node is used.
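Concretely, the fix is just adding the flag to the submit command. This is only an invocation sketch against a live YARN cluster; the jar path and class are the stock Spark example, and your paths will differ:

```shell
# Request 3 executors explicitly instead of the YARN default of 2.
spark-submit --master yarn \
  --num-executors 3 \
  --class org.apache.spark.examples.JavaWordCount \
  examples/target/spark-examples.jar input.txt
```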

Parallel Map Reduce Jobs in Hadoop

I have to run many (maybe 12) jobs in Hadoop 1.0.4. I want the first five to run in parallel; when they all finish, the next 4 jobs should run in parallel; and finally the last 3 should run in parallel. How can I set this up in Hadoop 1.0.4? As far as I can see, all my jobs run one after another, not in parallel.
The JobControl API can be used to express MR job dependencies. For complex workflows, Oozie or Azkaban is recommended.
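If the jobs can each be launched from the command line, the three waves can also be sketched in plain shell: start each wave's jobs in the background and 'wait' for the whole wave before starting the next. run_job here is a placeholder for whatever actually launches a job (e.g. a 'hadoop jar' call):

```shell
# Each wave launches its jobs in parallel with '&'; 'wait' blocks until
# every background job in the wave has finished.
run_job() {
  echo "job $1 started"
  # hadoop jar my-jobs.jar MyJobClass "$1"   # placeholder for the real call
  echo "job $1 done"
}

for j in 1 2 3 4 5;  do run_job "$j" & done; wait   # wave 1: first 5 in parallel
for j in 6 7 8 9;    do run_job "$j" & done; wait   # wave 2: starts after wave 1
for j in 10 11 12;   do run_job "$j" & done; wait   # wave 3: last 3 in parallel
```

This gives coarse wave-level ordering only; JobControl or Oozie can express finer per-job dependencies.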

storm list doesn't show running topology?

Nimbus and Supervisor are on the same machine. I'm submitting my jar to Storm with 'bin/storm XX.jar XX.XX', and stdout shows that the topology is running well.
But when I run 'bin/storm list', no information about my topology is shown.
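One thing worth checking: the standard submission form for the Storm client is 'storm jar <topology-jar> <main-class> [args]'. If the jar is launched some other way, the topology may run locally without ever being registered with Nimbus, which would explain 'storm list' showing nothing. A sketch with placeholder names:

```shell
# Submit via the 'storm jar' subcommand (jar, class, and name are placeholders):
bin/storm jar mytopology.jar com.example.MyTopology my-topology-name

# 'storm list' queries Nimbus, so the client's storm.yaml (nimbus.seeds)
# must point at the same Nimbus the topology was submitted to:
bin/storm list
```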
