How can I fix ClassNotFounException when executing HBase java application from command line? - bash

I don't know anything about bash, but i put together a script to help me run my Hbase java application:
#!/bin/bash
HADOOP_CLASSPATH="$(hbase classpath)"
hadoop jar my.jar my_pkg.my_class
When I run it I get a:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/regionserver/IncreasingToUpperBoundRegionSplitPolicy
When I echo out the HADOOP_CLASSPATH I see that hbase-server-1.2.0-cdh5.8.0.jar is there...
Is the hadoop jar command ignoring the HADOOP_CLASSPATH?
Also I have tried to run the commands from the command-line instead of using my script. I get the same error.
The approach was inspired by this cloduera-question

The solution was to include the Hadoop class path on the same line. I am not certain what the difference is, but this works:
HADOOP_CLASSPATH="$(hbase classpath)" hadoop jar my.jar my_pkg.my_class

Related

Running MapReduce code that uses zooKeeper

I want to ask about how to execute a MapReduce java code that uses zooKeeper.
My first code is just to create a variable (znode) and to modify it by each mapper.
So I modified the wordCount code just to test zookeeper for the first time.
When I run it using the eclipse console, everything goes well, so I can see the changes on the value of the znode, etc.
However, I was trying to execute it using linux command line:
**bin/hadoop jar ./myjar.jar algo.WordCount /input.txt /out
I got the following error
**Error: java.lang.ClassNotFoundException: org.apache.zookeeper.Watcher
Although that I added the path of the jar file using conf.set("mapred.jar","...."); in the mapreduce code but I don't know why it did not recognize the classes of zookeeper.
Any idea?

Hadoop JAR command - Setting java.library.path

I'm trying to run a java program on Hadoop cluster. Here's the command-
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/local/lib/*:/home/rgupta/bdAnalytics/lib/*
hadoop jar $jarpath bigdat.twitter.queue.TweetOMQSub > $logsFldr/subsHdpOMQ_$1.log 2>&1 &
#java -Djava.library.path=/usr/local/lib -classpath class/:lib/:lib/jzmq-2.1.3.jar bigdat.twitter.queue.TweetOMQSub > log/subsFilterOMQ_$1.log 2>&1 &
This throws following error -
Exception in thread "main" java.lang.UnsatisfiedLinkError: no jzmq in java.library.path
If I use the Java native command above, it works fine. Also, the hadoop node where I m trying to test it, does have the necessary jzmq jars under /usr/local/lib directory. Is there a way I can set java.library.path to Hadoop JAR command.
Please suggest how can I fix this.
sorry misread your question so editing:
you should be able to use the libjars option
In your case:
HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/local/lib/:/home/rgupta/bdAnalytics/lib/
hadoop jar $jarpath bigdat.twitter.queue.TweetOMQSub -libjars /usr/local/lib ...
Try export HADOOP_OPTS=$HADOOP_OPTS -Djava.library.path=/usr/local/lib
and export other jars the usual way you are doing before running a job - using HADOOP_CLASSPATH
Hope this helps.

Error: Failed to create Data Storage while running embedded pig in java

I wrote a simple program to test the embedded pig in java to run in mapreduce mode.
The hadoop version in the server I am running is 0.20.2-cdh3u4a, and pig version is 0.10.0-cdh3u4a.
When I try to run in local mode, it runs successfully. But when I try to run in mapreduce mode, it gives me the error.
I run my program using the following commands as shown in http://pig.apache.org/docs/r0.9.1/cont.html#embed-java
javac -cp pig.jar EmbedPigTest.java
javac -cp pig.jar:.:/etc/hadoop/conf EmbedPigTest.java input.txt
My program gives error as:
Exception in thread "main" java.lang.RuntimeException: Failed to create DataStorage
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:75)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:58)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:214)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:134)
at org.apache.pig.impl.PigContext.connect(PigContext.java:183)
at org.apache.pig.PigServer.<init>(PigServer.java:226)
at org.apache.pig.PigServer.<init>(PigServer.java:215)
at org.apache.pig.PigServer.<init>(PigServer.java:211)
at org.apache.pig.PigServer.<init>(PigServer.java:207)
at WordCount.main(EmbedPigTest.java:9)
In some online resources they say that this problem occurs due to different hadoop version. But, I didn't understand what I should do. Suggestions please !!
This is happening because you are linking to the wrong jar, Please see the link below it describes this issue very well.
http://localsteve.wordpress.com/2012/09/30/embedding-pig-for-cdh4-java-apps-fer-realz/
I was faced same kind of issue when I tried to use pig in map reduce mode without starting the services.
Please check all services using jps before using pig in map reduce mode.

How to run the hadoop simple program through command line

I'm new to the hadoop technologies .How to run the simple program through command line.I'm using windows environment.I install the Cygwin.Can you help me ...
Try the below URLs.
http://v-lad.org/Tutorials/Hadoop/00%20-%20Intro.html
http://hayesdavis.net/2008/06/14/running-hadoop-on-windows/
If you are new to Hadoop, try using one of the IDE plugins. This will help you get started quickly.
http://karmasphere.com/Studio-Eclipse/quick-click-guide.html
http://wiki.apache.org/hadoop/EclipsePlugIn
FYI ..... Hadoop on Windows is not recommended for Production.
Are your program written in Java? If so, you need to compile your program and pack the compiled files into a Jar file. And then run the program with hadoop command:
${hadoop_home}/bin/hadoop jar ${your_program_jar_file} ${main_class_of_jar}
You can run the Hadoop commands from anywhere in the terminal/command line, but only if the $path variable is set properly.
The syntax would be like this:
hadoop fs -<command> or hdfs fs -<command>
You review the docs for more information.

hadoop - Where are input/output files stored in hadoop and how to execute java file in hadoop?

Suppose I write a java program and i want to run it in Hadoop, then
where should the file be saved?
how to access it from hadoop?
should i be calling it by the following command? hadoop classname
what is the command in hadoop to execute the java file?
The simplest answers I can think of to your questions are:
1) Anywhere
2,3,4)$HADOOP_HOME/bin/hadoop jar [path_to_your_jar_file]
A similar question was asked here Executing helloworld.java in apache hadoop
It may seem complicated, but it's simpler than you might think!
Compile your map/reduce classes, and your main class into a jar. Let's call this jar myjob.jar.
This jar does not need to include the Hadoop libraries, but it should include any other dependencies you have.
Your main method should set up and run your map/reduce job, here is an example.
Put this jar on any machine with the hadoop command line utility installed.
Run your main method using the hadoop command line utility:
hadoop jar myjob.jar
Hope that helps.
where should the file be saved?
The data should be saved in "hdfs". You will want to probably load it into the cluster from your data source using something like Apache Flume. The file can be placed anywhere but most home is /user/hadoop/
how to access it from hadoop?
SSH into the hadoop cluster headnode like a standard linux server.
To list your hadoop root hdfs
hadoop fs -ls /
should i be calling it by the following command? hadoop classname
You should be using the hadoop command to access your data and run your programs, try hadoop help
what is the command in hadoop to execute the java file?
hadoop -jar MyJar.jar com.mycompany.MainDriver arg[0] arg[1] ...

Resources