When I am submitting my topology to storm, i am getting an error from "lein" - apache-storm

I am trying to submit my topology to storm:
sparse submit
Virtual environment is creating successfully and also "lean clean" is successfully is done.
But, when it trying to create a jar,
lein jar <topology>
it is giving me an error "Failed to create jar"
I have check in my system, lein is successfully installed and inside "project.clj" the version numbers are proper.
And this error is appearing only when I am submitting my topology to storm.
Even I checked my topology, spout and bolt code, there are no error in it.
Error image

Streamparse doesn't work with the latest version of Lein. I was having the same issue and I down graded Lein to 2.7.1.
It works perfectly for me.

Related

Apache Kylin installation without Sandbox

I was wondering if there are any resources regarding Apache Kylin installation without any sandbox (like cloudera, hortonworks) support. I have managed to do the following:
Install Hadoop 2.6
Install Hive
Install HBase
Then I used the binary from kylin site and so far been able to run it. The problem start when I try to build a cube, the map reduce job gets stuck in step 2. I am thinking if it is still assuming to be in sandbox mode and not submitting job to hadoop at all (there is no entry in hadoop jobtracker).
So I need solution regarding the two:
1. Possible configuration of kylin in pure hadoop setup (no sandbox)
2. somehow enable the kylin setup to submit job to hadoop.
There is no such sandbox or non-sandbox configuration in Kylin. Just make sure the machine where Kylin runs has hadoop setup correctly and you should be fine.
Under the scene, kylin.sh uses hbase classpath and hive -e set | grep 'env:CLASSPATH' to detect hadoop settings. Double check these commands work as expect if you are not sure what cluster Kylin connects to.
If Kylin has problem submitting MR jobs, check two places. First is hadoop resource manager, see if the job has really been submitted or not. Sometimes it's just running slow. Second check kylin.log, see if any exception there. Post the log to kylin dev mailing list and someone will be able to help.
You can install hadoop-2.6 , hive-0.14 ,hbase-0.98.8-hadoop2 with Zookeeper inbuilt or external zookeeper-3.5
Now you can run kylin-v1.1-release on it
If you still face Issues paste the log here

Spark EC-2 deployment error: Exiting due to error from cluster scheduler: All masters are unresponsive! Giving up

I have a question in regard to deploying a spark application on a standalone EC-2 Cluster. I have followed the tutorial by Spark ans was able to successfully deploy a standalone EC-2 cluster. I verified that by connecting to the clusrer UI and making sure that everything is as it supposed to be. I developed a simple application and tested it locally. Everything works fine. When I submit it to the cluster (just changing --master local[4] into --masers spark://.... ) I get the following error: ERROR TaskSchedulerImpl: Exiting due to error from cluster scheduler: All masters are unresponsive! Giving up. Does any one know how to overcome this problem. my deploy-mode is client.
Make sure that you have provided the correct url to the master.
Basically, the exact spark master URL is displayed on the page when you connected to the Web UI.
URL on the page is something like: Spark Master at spark://IPAddress:port
Also you may notice that web UI and the Spark running port numbers may be different

Benefit of running Storm under supervision, example/sample code

I have installed the storm correctly. But, I am struggling how to run an example on storm. Can anyone please give me the link or suggestion by which I can execute the example?Also, what are the benefit of running storm under supervision?
Assuming you have installed the storm on your local machine then you have an example storm project bundled along it which you can find in the examples/storm-starter of your storm repository.
To run this example you can follow the series of steps mentioned in README.markdown file in the root folder of storm-starter folder. The steps can also be found at https://github.com/apache/storm/tree/v0.10.0/examples/storm-starter
Regarding running storm and under supervision, the benefit is that since Storm and zookeeper have a fail fast policy, the servers will shutdown if there is an error. Using a supervisor process can bring up the servers in case of they exit the process because of errors.

Cascading HBase Tap

I am trying to write Scalding jobs which have to connect to HBase, but I have trouble using the HBase tap. I have tried using the tap provided by Twitter Maple, following this example project, but it seems that there is some incompatibility between the Hadoop/HBase version that I am using and the one that was used as client by Twitter.
My cluster is running Cloudera CDH4 with HBase 0.92 and Hadoop 2.0.0-cdh4.1.3. Whenever I launch a Scalding job connecting to HBase, I get the exception
java.lang.NoSuchMethodError: org.apache.hadoop.net.NetUtils.getInputStream(Ljava/net/Socket;)Ljava/io/InputStream;
at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:363)
at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1046)
...
It seems that the HBase client used by Twitter Maple is expecting some method on NetUtils that does not exist on the version of Hadoop deployed on my cluster.
How do I track down what exactly is the mismatch - what version would the HBase client expect and so on? Is there in general a way to mitigate these issues?
It seems to me that often client libraries are compiled with hardcoded version of the Hadoop dependencies, and it is hard to make those match the actual versions deployed.
The method actually exists but has changed its signature. Basically, it boils down to having different versions of Hadoop libraries on your client and server. If your server is running Cloudera, you should be using the HBase and Hadoop libraries from Cloudera. If you're using Maven, you can use Cloudera's Maven repository.
It seems like library dependencies are handled in Build.scala. I haven't used Scala yet, so I'm not entirely sure how to fix it there.
The change that broke compatibility was committed as part of HADOOP-8350. Take a look at Ted Yu's comments and the responses. He works on HBase and had the same issue. Later versions of the HBase libraries should automatically handle this issue, according to his comment.

Hadoop Mapper not running my class

Using Hadoop version 0.20.. I am creating a chain of jobs job1 and job2 (mappers of which are in x.jar, there is no reducer) , with dependency and submitting to hadoop cluster using JobControl. Note I have setJarByClass and getJar gives the correct jar file, when checked before submission.
Submission goes through and there seem to be no errors in user logs and jobtracker. But I dont see my Mapper getting executed (no sysouts or log output), but default output seems to be coming to the output folder (input file is read as is and output). I am able to run the job directly using x.jar, but I am really out of clues as to why it is not running with Jobcontrol.
Please help !
This issue bugged me for quite some days. Finally I found that it was the UsedGenericOptionsParser which created the issue. Set this to true and everything started working fine.

Resources