Submitting a topology to Storm - apache-storm

I have configured Storm on my machine. ZooKeeper, Nimbus, and the Supervisor are running properly.
Now I want to submit a topology to this Storm cluster.
I am trying to use storm jar, but I am not able to submit it.
Can anybody please give an example of this?
It would be very helpful.
Thanks in advance :)

The answer is in the official documentation, and it is clear enough. Run storm jar path/to/allmycode.jar org.me.MyTopology arg1 arg2 arg3 (replacing the jar path, class name, and arguments with your own). Make sure you are using the StormSubmitter class instead of LocalCluster.
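For reference, here is a minimal sketch of what such a main class might look like (the topology name and the MySpout/MyBolt components are hypothetical placeholders; substitute your own):

package org.me;

import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;

public class MyTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        // MySpout and MyBolt stand in for your own spout and bolt classes
        builder.setSpout("spout", new MySpout(), 1);
        builder.setBolt("bolt", new MyBolt(), 2).shuffleGrouping("spout");

        Config conf = new Config();
        conf.setNumWorkers(2);

        // StormSubmitter ships the topology to the cluster;
        // LocalCluster would only run it inside the current JVM.
        StormSubmitter.submitTopology("my-topology", conf, builder.createTopology());
    }
}

Any extra arguments on the storm jar command line arrive in args, so you can, for example, take the topology name from args[0].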

Unfortunately, almost all the examples on the internet show the word-count example and do not lay out the required steps in a simple way.
All you need to do is this:
1. Navigate to your storm bin folder:
cd /Users/nav/programming/apache-storm-1.0.1/bin
2. Start nimbus
./storm nimbus
3. Start supervisor
./storm supervisor
4. Start the ui program
./storm ui
5. Make sure you build your jar file with the Storm library excluded from it (e.g., mark the storm-core dependency as provided, since the cluster supplies it at runtime).
6. Make sure your /Users/nav/programming/apache-storm-1.0.1/conf/storm.yaml file is valid (this should've been step 2).
7. Make sure that in your code, you are submitting the topology using StormSubmitter.submitTopology
8. Navigate to the storm bin folder again
cd /Users/nav/programming/apache-storm-1.0.1/bin
9. Submit your jar file to storm
./storm jar /Users/nav/myworkspace/StormTrial/build/libs/StormTrial.jar com.abc.stormtrial.StormTrial
The above command is basically just this:
stormExecutable jarOption pathToYourJarFile theClassContainingYourMainFile
If you want to pass commandline arguments to your program, add it at the end:
stormExecutable jarOption pathToYourJarFile theClassContainingYourMainFile commandlineArguments
Here, com.abc.stormtrial is the package name and StormTrial is the name of the class that contains your main method.
Now open your browser, go to http://127.0.0.1:8080, and you'll see your topology running via Storm's UI.

Related

See print in python script running on spark with spark-submit

I have to test some code using Spark and I'm pretty new to it.
The code runs an ETL script on a cluster. The ETL script is written in Python and has several print statements in it, but I'm unable to see those prints. The Python script is added to spark-submit via the --py-files flag. I don't know whether those prints are unreachable because they happen in the YARN executors, and whether I should change them to log4j logging or add them to an accumulator readable by the driver.
Any suggestions would help.
The final goal is to see how the execution of the code is going. I don't know if simple prints are the best solution, but they were already in the code I was given to test.

I cannot see the running applications in hadoop 2.5.2 (yarn)

I installed Hadoop 2.5.2 and I can run the wordcount sample successfully. However, when I want to see the applications running on YARN, I cannot: the All Applications page is always empty.
Is there any way to make the jobs visible?
Please try localhost:19888, or check the value of the job history web UI property (mapreduce.jobhistory.webapp.address) configured in your mapred-site.xml.

Storm logviewer page not found

I'm able to submit a topology job to the multi-tenant cluster. The job is running. However, the logviewer page is not available. Is there any way to solve this issue?
You need to start the logviewer daemon first; only then will clicking a port on the topology page show you the logs.
To start the logviewer, run:
$ storm logviewer (the same way you would run $ storm list)
I faced the same issue with the logviewer's home page, but navigating directly to a particular log file that exists in the logs folder works. Try this:
MachineIP:8000/log?file=worker-6700.log

Differences between hadoop jar and yarn jar

What's the difference between running a jar file with the commands hadoop jar and yarn jar?
I've used the hadoop jar command on my Mac successfully, but I want to be sure that the execution is correct and parallel across my four cores.
Thanks!
Short Answer
They are probably identical for you, but even if they aren't, they should both utilize your cluster to the best of its ability.
Longer Answer
The /usr/bin/yarn script sets up the execution environment so that all of the yarn commands can be run. The /usr/bin/hadoop script isn't as concerned with YARN-specific functionality. However, if your cluster is set up to use YARN as the default MapReduce framework (MRv2), then hadoop jar will probably act the same as yarn jar for a MapReduce job.
Either way you're probably fine, but you can always check the ResourceManager (or JobTracker) web interface to see how your job is distributed across the cluster (whether it's a single-node cluster or not).

Running MapReduce code that uses zooKeeper

I want to ask how to execute MapReduce Java code that uses ZooKeeper.
My first program just creates a znode and has each mapper modify it.
So I modified the WordCount code just to test ZooKeeper for the first time.
When I run it from the Eclipse console, everything goes well, and I can see the changes to the value of the znode, etc.
However, when I tried to execute it from the Linux command line:
bin/hadoop jar ./myjar.jar algo.WordCount /input.txt /out
I got the following error:
Error: java.lang.ClassNotFoundException: org.apache.zookeeper.Watcher
I added the path of the jar file using conf.set("mapred.jar", "...."); in the MapReduce code, but I don't know why it did not recognize the ZooKeeper classes.
Any idea?
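The error suggests the ZooKeeper jar is not on the classpath of the task JVMs: conf.set("mapred.jar", ...) only points at your job jar and does not ship extra dependencies. One common fix is to bundle ZooKeeper into a fat jar; another is to implement Tool and launch through ToolRunner so that the generic -libjars option ships the ZooKeeper jar to the cluster. A minimal sketch of the ToolRunner approach, with the driver class and paths assumed from the question:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCount extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // getConf() already reflects the generic options parsed by ToolRunner,
        // including any jars passed with -libjars
        Job job = Job.getInstance(getConf(), "wordcount-with-zookeeper");
        job.setJarByClass(WordCount.class);
        // set your mapper, reducer, and output key/value classes here as usual
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new WordCount(), args));
    }
}

With that in place, an invocation like bin/hadoop jar ./myjar.jar algo.WordCount -libjars /path/to/zookeeper.jar /input.txt /out should make the ZooKeeper classes visible to the mappers.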
