Hadoop data join package - hadoop

I am new to Hadoop. While exploring the Hadoop data join package, I ran the command below:
hadoop jar /home/biadmin/DataJoin.jar com.datajoin.DataJoin
/user/biadmin/Datajoin/customers.txt
/user/biadmin/Datajoin/orders.txt
/user/biadmin/Datajoin/outpu1
I am getting the error below:
Exception in thread "main"
java.lang.NoClassDefFoundError: org.apache.hadoop.contrib.utils.join.DataJoinMapperBase
at java.lang.ClassLoader.defineClassImpl(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:364)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:154)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:777)
at java.net.URLClassLoader.access$400(URLClassLoader.java:96)

You need to add the hadoop-datajoin jar to the classpath when running the job. Use the -libjars option to add extra jars to the classpath. Your command will look like this; provide the correct path to the jar, or download it if you don't already have it.
hadoop jar /home/biadmin/DataJoin.jar com.datajoin.DataJoin
-libjars <path>/hadoop-datajoin.jar
/user/biadmin/Datajoin/customers.txt
/user/biadmin/Datajoin/orders.txt
/user/biadmin/Datajoin/outpu1
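Note that -libjars only ships the jar to the task JVMs (and only when the driver parses generic options, e.g. via ToolRunner); the stack trace above comes from the client JVM, which needs the jar on its own classpath as well. A minimal sketch, assuming the jar sits at /opt/lib/hadoop-datajoin.jar (the path is an assumption; adjust it to where your jar really is):
# client-side classpath, so the driver class can be loaded locally
export HADOOP_CLASSPATH=/opt/lib/hadoop-datajoin.jar
# ship the same jar to the task JVMs with -libjars
hadoop jar /home/biadmin/DataJoin.jar com.datajoin.DataJoin \
  -libjars /opt/lib/hadoop-datajoin.jar \
  /user/biadmin/Datajoin/customers.txt \
  /user/biadmin/Datajoin/orders.txt \
  /user/biadmin/Datajoin/outpu1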

Related

How can I submit an Apache Storm topology to a Storm cluster?

I'm following this tutorial: https://learn.microsoft.com/en-us/azure/hdinsight/storm/apache-storm-develop-java-topology
What I've done so far:
Maven setup
vi the *.java files (in the src/main/java/com/microsoft/example directory):
RandomSentenceSpout.java
SplitSentence.java
WordCount.java
WordCountTopology.java
mvn compile
jar cf storm.jar *.class (in target/classes/com/microsoft/example directory)
RandomSentenceSpout.class SplitSentence.class WordCount.class WordCountTopology.class
These four class files were used to make the storm.jar file.
Then, I tried
storm jar ./storm.jar com.microsoft.example.WordCountTopology WordCountTopology
and
storm jar ./storm.jar WordCountTopology
Both of these failed, saying:
Error: Could not find or load main class com.microsoft.example.WordCountTopology
or
Error: Could not find or load main class WordCountTopology
According to the documentation, it says:
Syntax: storm jar topology-jar-path class ...
Runs the main method of class with the specified arguments. The storm
jars and configs in ~/.storm are put on the classpath. The process is
configured so that StormSubmitter will upload the jar at
topology-jar-path when the topology is submitted.
I cannot figure out what needs fixing.
How can I resolve this?
I think your jar file does not contain the class WordCountTopology. You can check it with jar tf storm.jar | grep WordCountTopology.
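If the grep prints nothing, or prints the class without its package path, the jar was probably built from inside the package directory and the com/microsoft/example/ prefix was lost. What you want to see (expected entry inferred from the package name):
jar tf storm.jar
# want: com/microsoft/example/WordCountTopology.class
# not:  WordCountTopology.class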
Looks like your jar does not contain a manifest file, which records the main class.
Try including a manifest file, or run the jar command below, which adds one for you:
jar cvfe storm.jar mainClassNameWithoutDotClassExtn *.class
Hope this works!
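For the classes in this question, that would look roughly like the following; run it from target/classes so the package directories are kept, and give the fully qualified class name since the classes live in a package (a sketch based on the tutorial's names, not verified against it):
cd target/classes
jar cvfe storm.jar com.microsoft.example.WordCountTopology com/microsoft/example/*.class
storm jar ./storm.jar com.microsoft.example.WordCountTopology WordCountTopology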

HBase Hive handler is not working

Hi, I am planning to integrate HBase and Hive for one of my projects.
I am confused about adding jars: where do I add them?
I am using Hadoop 2.6.0-cdh5.7.0 .
I have downloaded jars:
guava-r09.jar
hbase-0.92.0.jar
hive-hbase-handler-0.9.0.jar
zookeeper-3.3.4.jar
I ran this command to create the table:
CREATE TABLE hbase_table_emp(id int, name string, role string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:name,cf1:role")
TBLPROPERTIES ("hbase.table.name" = "emp");
Now, where should I copy all these jars?
Do I have to copy them to /usr/lib/hive and then run the add jar command?
Will these jar versions work with my Hadoop version?
For now I have copied the jars into a directory and am giving Hive the path to that directory; when I run the add jar command, it throws this error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.IllegalArgumentException: Not a host:port pair: PBUF
quickstart.cloudera���ʼ��+��
If you place the jars in Hive's lib directory, they are all automatically available on the Hive CLASSPATH and you don't need to add them explicitly with the add jar command.
The error you are getting is because the add jar command expects the fully qualified path of a jar file, not a directory:
add jar <fully qualified path of jar>;
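For example, assuming you copied the four jars from the question to /usr/lib/hive/lib (the directory is an assumption; use whatever path you actually chose):
add jar /usr/lib/hive/lib/hive-hbase-handler-0.9.0.jar;
add jar /usr/lib/hive/lib/hbase-0.92.0.jar;
add jar /usr/lib/hive/lib/zookeeper-3.3.4.jar;
add jar /usr/lib/hive/lib/guava-r09.jar;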
Read the Hive HBase handler documentation for more details.

spark jobserver ERROR classnotfoundexception

I have been trying out Spark using spark-shell. All my data is in SQL.
I used to include external jars with the --jars flag, like /bin/spark-shell --jars /path/to/mysql-connector-java-5.1.23-bin.jar --master spark://sparkmaster.com:7077
I also included it in the classpath by changing the bin/compute-classpath.sh file.
I was running successfully with this config.
Now, when I run a standalone job through jobserver, I get the following error message:
result: {
  "message" : "com.mysql.jdbc.Driver",
  "errorClass" : "java.lang.ClassNotFoundException",
  "stack" : [.......]
}
I have included the jar file in my local.conf file as below.
context-settings{
.....
dependent-jar-uris = ["file:///absolute/path/to/the/jarfile"]
......
}
All of your dependencies should be included in your spark-jobserver application JAR (e.g. create an "uber-jar"), or be included on the classpath of the Spark executors. I recommend configuring the classpath, as it's faster and requires less disk-space since the third-party library dependencies don't need to be copied to each worker whenever your application runs.
Here are the steps to configure the worker (executor) classpath on Spark 1.3.1:
Copy the third-party JAR(s) to each of your Spark workers and the Spark master
Place the JAR(s) in the same directory on each host (e.g. /home/ec2-user/lib)
Add the following line to the Spark /root/spark/conf/spark-defaults.conf file on the Spark master:
spark.executor.extraClassPath /root/ephemeral-hdfs/conf:/home/ec2-user/lib/name-of-your-jar-file.jar
Here's an example of my own modifications to use the Stanford NLP library:
spark.executor.extraClassPath /root/ephemeral-hdfs/conf:/home/ec2-user/lib/stanford-corenlp-3.4.1.jar:/home/ec2-user/lib/stanford-corenlp-3.4.1-models.jar
You might not have /path/to/mysql-connector-java-5.1.23-bin.jar on your workers.
You can either copy the required dependency to all Spark workers, or
bundle the jar you submit with its required dependencies.
I use Maven to build the jar; the scope of the dependencies must be runtime.
To register the job jar:
curl --data-binary @/PATH/jobs_jar_2.10-1.0.jar 192.168.0.115:8090/jars/job_to_be_registered
To post a dependency jar:
curl -d "" 'http://192.168.0.115:8090/contexts/new_context?dependent-jar-uris=file:///path/dependent.jar'
This works for jobserver 1.6.1
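As a sanity check after uploading, you can list what the jobserver has registered (host and port taken from the examples above; GET endpoints as documented for spark-jobserver's REST API):
curl 192.168.0.115:8090/jars       # uploaded jars
curl 192.168.0.115:8090/contexts   # running contexts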

How to get the .class files before using the hadoop command to run them?

I am new to Hadoop and I am reading the 'Definitive Guide' book.
In ch02 there is a simple hadoop example which has a mapper, a reducer and a class with main function.
As the book says, I have to use
% export HADOOP_CLASSPATH=hadoop-examples.jar
% hadoop MaxTemperature input/ncdc/sample.txt output
to run the code. MaxTemperature is the class with the main method, followed by the input and output paths.
When I ran the command above, I got this exception:
Exception in thread "main" java.lang.NoClassDefFoundError: MaxTemperature
Caused by: java.lang.ClassNotFoundException: MaxTemperature
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
Could not find the main class: MaxTemperature. Program will exit.
I found out that I did not have the .class files, so I tried to use javac to compile the Java files. It then gave me errors saying that none of the Hadoop classes could be found.
I turned to Eclipse and created a project with the Hadoop build path. That worked fine, and I found the .class files in the bin folder of the Eclipse project. Since I now have those .class files, I can use:
% hadoop MaxTemperature input/ncdc/sample.txt output
My question is:
How can I configure the classpath properly so that javac can compile the Java files and produce the .class files?
(I ran hadoop classpath and found that the output is very long. Do I really have to set up a classpath that long?)
Thank you :)
javac -classpath solved all the problems!
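Since hadoop classpath prints the full (long) classpath, you can hand its output straight to javac instead of typing it out; a minimal sketch using the book's example files (the classes output directory is an assumption):
mkdir -p classes
javac -classpath "$(hadoop classpath)" MaxTemperature*.java -d classes
export HADOOP_CLASSPATH=classes
hadoop MaxTemperature input/ncdc/sample.txt output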

How to run a Hadoop program?

I have set up Hadoop on my laptop and successfully ran the example program given in the installation guide. But I am not able to run my own program.
rohit@renaissance1:~/hadoop/ch2$ hadoop MaxTemperature input/ncdc/sample.txt output
Exception in thread "main" java.lang.NoClassDefFoundError: MaxTemperature
Caused by: java.lang.ClassNotFoundException: MaxTemperature
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
Could not find the main class: MaxTemperature. Program will exit.
The book said that we should set the Hadoop classpath by writing:
rohit@renaissance1:~/hadoop/ch2$ export HADOOP_CLASSPATH=build/classes
The main class is defined in the MaxTemperature.java file that I am executing. How do we set the Hadoop classpath? Do we have to do it for every program execution, or only once? And where should I put the input folder? My code is at /home/rohit/hadoop/ch2 and my Hadoop installation is at /home/hadoop.
You should package your application into a JAR file; that's much easier and less error-prone than fiddling with classpath folders.
In your case, you must also compile the .java file. You said it's MaxTemperature.java, but there must also be a MaxTemperature.class before you can run it.
First compile the Java files as told by walid:
javac -classpath path-to-hadoop-0.19.2-core.jar .java-files -d folder-to-contain-classes
Create a jar file of the application classes using the following command:
jar cf filename.jar *.class
In either case, whether you export the classes into a jar file or use a specific folder to store the class files, you should define HADOOP_CLASSPATH to point to that jar file or to the folder containing the class files, so that when you run the hadoop command it knows where to look for the main class.
set HADOOP_CLASSPATH
export HADOOP_CLASSPATH=path-to-filename.jar
or
export HADOOP_CLASSPATH=path-to-folder-containing-classes
Run using Hadoop command:
hadoop main-class args
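Putting the steps together, a concrete run-through might look like this (the core jar path and the jar name maxtemp.jar are assumptions; point them at your installation's jar and whatever name you prefer):
mkdir -p classes
javac -classpath /home/hadoop/hadoop-0.19.2-core.jar MaxTemperature*.java -d classes
jar cf maxtemp.jar -C classes .
export HADOOP_CLASSPATH=maxtemp.jar
hadoop MaxTemperature input/ncdc/sample.txt output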
I found this problem as well when going through the Hadoop book (O'Reilly). I fixed it by setting the HADOOP_CLASSPATH variable in the hadoop-env.sh file in the configuration directory.
Here is the answer in 3 steps:
1:
javac -verbose -classpath C:\hadoop\hadoop-0.19.2-core.jar MaxTemperature*.java -d build/classes
2:
put the *.class files in build/classes
3:
export HADOOP_CLASSPATH=${HADOOP_HOME}/path/to/build/classes
(you have to create the build/classes directory)
Best Regards
walid
You do not necessarily need a jar file, but did you put MaxTemperature in a package?
If so, say your MaxTemperature.class file is in yourdir/bin/yourpackage/, all you need to do is:
export HADOOP_CLASSPATH=yourdir/bin
hadoop yourpackage.MaxTemperature
After you make your class into a jar file:
hadoop jar MaxTemperature.jar MaxTemperature
Basically:
hadoop jar jar-file main-class [args]
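With the example class from this question, that would be (input and output paths assumed from the question above):
hadoop jar MaxTemperature.jar MaxTemperature input/ncdc/sample.txt output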
