Making Sqoop1 work with Hadoop2

I have had a hard time making Sqoop1 work on Hadoop2. I always run into a Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.util.Tool error, which suggests that Sqoop1 is trying to use Hadoop1. But I downloaded the Sqoop1 jar built for the Hadoop 2.0.4-alpha release from http://www.us.apache.org/dist/sqoop/1.4.5/.
Then why does it not work with Hadoop2?
PS: I have tried hard to make Sqoop2 work, but I ran into a lot of problems during setup.
Also, this post http://mmicky.blog.163.com/blog/static/1502901542013118115417262/ suggests that it should work, but I keep running into this ClassNotFoundException.

I figured out the problem. Whatever classpath I was setting was probably being overridden by the hadoop executable, so I had to modify the hadoop executable at the place where it calls the java command and add a -cp flag with the classpath of my Hadoop jars, like below:
exec "$JAVA" -cp "$CLASSPATH:/usr/local/Cellar/hadoop/2.4.1/libexec/share/hadoop/common/:/usr/local/Cellar/hadoop/2.4.1/libexec/share/hadoop/common/lib/:/usr/local/Cellar/hadoop/2.4.1/libexec/share/hadoop/hdfs/:/usr/local/Cellar/hadoop/2.4.1/libexec/share/hadoop/hdfs/lib/:/usr/local/Cellar/hadoop/2.4.1/libexec/share/hadoop/mapreduce/:/usr/local/Cellar/hadoop/2.4.1/libexec/share/hadoop/mapreduce/lib/:/usr/local/Cellar/hadoop/2.4.1/libexec/share/hadoop/tools/:/usr/local/Cellar/hadoop/2.4.1/libexec/share/hadoop/tools/lib/:/usr/local/Cellar/hadoop/2.4.1/libexec/share/hadoop/yarn/:/usr/local/Cellar/hadoop/2.4.1/libexec/share/hadoop/yarn/lib/" $JAVA_HEAP_MAX $HADOOP_OPTS $CLASS "$#"

Related

How to find jar dependencies when running Apache Pig script?

I am having some difficulty running a simple Pig script to import data into HBase using HBaseStorage.
The error I have encountered is given by:
Caused by: <file demo.pig, line 14, column 0> pig script failed to validate: java.lang.RuntimeException: could not instantiate 'org.apache.pig.backend.hadoop.hbase.HBaseStorage' with arguments '[rdf:predicate rdf:object]'
Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.hbase.client.Scan.setCacheBlocks(Z)V
at org.apache.pig.backend.hadoop.hbase.HBaseStorage.initScan(HBaseStorage.java:427)
at org.apache.pig.backend.hadoop.hbase.HBaseStorage.<init>(HBaseStorage.java:368)
at org.apache.pig.backend.hadoop.hbase.HBaseStorage.<init>(HBaseStorage.java:239)
... 29 more
According to other questions and threads, the main answer to this issue is to register the appropriate jars required for the HBaseStorage references. What I am stumped by is how I am supposed to identify the required jar for a given Pig function.
I even tried opening the various jar files under the hbase and pig folders to ensure the appropriate classes were registered in the Pig script.
For example, since the java.lang.NoSuchMethodError was caused by org.apache.hadoop.hbase.client.Scan.setCacheBlocks(Z)V,
I specifically imported the jar that contains org.apache.hadoop.hbase.client.Scan, to no avail.
Pig's documentation does not provide any obvious pointers that I can refer to.
I am using Hadoop 2.7.0, HBase 1.0.1.1, and Pig 0.15.0.
If you need any other clarification, feel free to ask. I would really appreciate it if someone could help me out with this issue.
Also, is it better to install Hadoop and the related software from scratch, or to use one of the prepackaged Hadoop bundles available?
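On the general question of finding which jar provides a given class, one low-tech approach that usually works is to scan the jar listings directly; a minimal sketch, assuming the HBase jars live under $HBASE_HOME/lib (adjust the path to your install):
# list every jar under $HBASE_HOME/lib that contains the Scan class
CLASS='org/apache/hadoop/hbase/client/Scan.class'
for j in "$HBASE_HOME"/lib/*.jar; do
  unzip -l "$j" 2>/dev/null | grep -q "$CLASS" && echo "$j"
done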
There is something wrong with the released jar: hbase-client-1.0.1.1.jar.
You can test it with this code; the error will show up:
Scan scan = new Scan();
scan.setCacheBlocks(true);
I've tried other setter functions, like setCaching, and they throw the same error, yet when I checked the source code those functions exist. Maybe just compile hbase-client-1.0.1.1.jar manually; I'm still looking for a better solution...
============
Update on the above: I found that the root cause is an incompatibility between hbase-client-1.0.1.1.jar and older versions.
https://issues.apache.org/jira/browse/HBASE-10841
https://issues.apache.org/jira/browse/HBASE-10460
The return type of the setter functions changed (they now return the object itself instead of void), so jars compiled against the old version won't work with the current one.
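You can see that signature change without writing any code by disassembling the class with javap (the jar path here is illustrative):
# older releases declare void setCacheBlocks(boolean), i.e. the (Z)V the error refers to;
# hbase-client-1.0.1.1 declares a setter that returns the Scan object instead
javap -classpath hbase-client-1.0.1.1.jar org.apache.hadoop.hbase.client.Scan | grep setCacheBlocks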
For your question: you can modify the Pig launcher script $PIG_HOME/bin/pig and set debug=true; it will then just print the run information.
Did you register the required jars? The most important ones are the hbase, zookeeper, and guava jars.
I solved a similar issue by registering the zookeeper jar in my Pig script.
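If you would rather not hard-code REGISTER statements, the same jars can be handed to Pig on the command line; a sketch with illustrative paths that you would point at your actual hbase-client, zookeeper, and guava jars:
# pig.additional.jars takes a colon-separated list and serves the same purpose as REGISTER
pig -Dpig.additional.jars=/usr/lib/hbase/lib/hbase-client-1.0.1.1.jar:/usr/lib/zookeeper/zookeeper.jar:/usr/lib/hbase/lib/guava-12.0.1.jar demo.pig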

Conflicting jars while using Unirest on CDH

I'm trying to use Unirest to send a POST request from a MapReduce job on a Cloudera Hadoop 5.2.1 cluster.
One of Unirest's dependencies is httpcore-4.3.3.jar, but the CDH package puts httpcore-4.2.5.jar on the classpath. While trying to run my code, I got a "ClassNotFound" exception.
I added a line in my code to check which jar the conflicting class was actually being loaded from, and the answer was troubling: /opt/cloudera/parcels/CDH/jars/httpcore-4.2.5.jar.
I've looked everywhere online and tried everything I found. Needless to say, nothing seems to work.
I tried setting the HADOOP_CLASSPATH environment variable, I tried setting HADOOP_USER_CLASSPATH_FIRST, and I tried using the -libjars parameter with the hadoop jar command.
Anyone have any idea how to solve this?
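One approach that often helps with this kind of clash on MR2 (not verified against this particular CDH release; the jar and class names below are placeholders) is to ask the framework to prefer the job's own jars via job-level properties rather than client-side environment variables:
# mapreduce.job.user.classpath.first / mapreduce.job.classloader make the tasks prefer
# or isolate the job's own jars; -D and -libjars are parsed only if the driver uses ToolRunner
hadoop jar my-job.jar com.example.MyDriver \
  -D mapreduce.job.user.classpath.first=true \
  -D mapreduce.job.classloader=true \
  -libjars /path/to/httpcore-4.3.3.jar \
  <input> <output>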

How to use an external Jar file in the Hadoop program

I have a Hadoop program in which I use a couple of external jar files. When I submit the jar file of my program to the Hadoop cluster it gives me the following error.
Exception in thread "main" java.lang.NoClassDefFoundError: edu/uci/ics/jung/graph/Graph
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.hadoop.util.RunJar.main(RunJar.java:201)
I understand what the problem is but don't know how to solve it. How can I add the jar files to my program?
I think you can also modify the environment of the job's running task attempts explicitly by specifying the JAVA_LIBRARY_PATH or LD_LIBRARY_PATH variables:
hadoop jar <jar file> [main class] \
  -D mapred.child.env="LD_LIBRARY_PATH=/path/to/your/libs" ...
You can use the LIBJARS option when submitting jobs, like this:
export LIBJARS=/path/jar1,/path/jar2
hadoop jar my-example.jar com.example.MyTool -libjars ${LIBJARS} -mytoolopt value
I would recommend reading this article which describes precisely what you're looking for, in detail:
http://grepalex.com/2013/02/25/hadoop-libjars/
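The main caveat (and the point of the article above): -libjars is handled by GenericOptionsParser, so it only takes effect when the main class is run through ToolRunner, and the driver JVM also needs the jars on its own classpath. A sketch:
export LIBJARS=/path/jar1,/path/jar2
# the client-side JVM needs the jars too; note colons here versus commas for -libjars
export HADOOP_CLASSPATH=$(echo ${LIBJARS} | tr ',' ':')
hadoop jar my-example.jar com.example.MyTool -libjars ${LIBJARS} -mytoolopt value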
Add the external jar files into the hadoop/lib folder to get rid of the error...

Hadoop WordCount.java Dependency Issues

I am trying to compile the WordCount.java file into a jar, inside /Desktop/Hadoop/playground/src.
Here's the command I am using.
javac -classpath hadoop-1.2.1-core.jar -d playground/classes playground/src/WordCount.java
The compiler seems to be getting invoked; however, I am getting tons of errors like this:
error: package org.apache.hadoop.conf does not exist
import org.apache.hadoop.conf.Configuration;
How do I go about fixing this?
Maybe there is an answer to this issue already; however, I could not fix it.
You need to set the paths of hadoop-1.2.1-core.jar and all the other dependent jars correctly.
Try this exactly while you are in the Desktop/Hadoop directory (valid in your case only, based solely on the inputs you provided in the comments):
javac -classpath *:lib/* -d playground/classes playground/src/WordCount.java
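If the dependent jars are not all collected under a single lib/ directory, an alternative (assuming the hadoop command is on your PATH) is to let Hadoop print its own classpath, pass that to javac, and then package the classes:
javac -classpath "$(hadoop classpath)" -d playground/classes playground/src/WordCount.java
jar -cvf playground/wordcount.jar -C playground/classes/ .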

How to run GIS codes through hadoop's prompt?

I am running a GIS code through hadoop's prompt in the following manner:
Wrote the GIS code in Eclipse, including all the relevant GIS jars.
Went into the directory where my Eclipse workspace is.
Compiled the code by adding all the relevant jars to the classpath. (The compilation was successful.)
Built the jar.
Ran the same jar using hadoop: bin/hadoop jar my_jar_file_name.jar my_pkg_structure.Main_class_file
Now, in spite of the code being error free, when I try to execute it through hadoop's prompt, it gives me multiple issues.
Is there a workable alternative way to do the same without any hassles?
Also note that the GIS code runs beautifully in Eclipse. Since I have to do geo-processing over Hadoop, I need to run it through hadoop's prompt.
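One workable pattern (the jar names below are placeholders for your GIS libraries) is to keep the job jar thin and hand the dependencies to Hadoop explicitly, on both the client side and the task side; note that -libjars only works if the main class goes through ToolRunner/GenericOptionsParser:
# client-side classpath (colon separated) so the driver can resolve the GIS classes
export HADOOP_CLASSPATH=/path/to/gis-lib-a.jar:/path/to/gis-lib-b.jar
# -libjars (comma separated) ships the same jars to the map/reduce tasks
bin/hadoop jar my_jar_file_name.jar my_pkg_structure.Main_class_file \
  -libjars /path/to/gis-lib-a.jar,/path/to/gis-lib-b.jar
Packing the dependency jars into a lib/ directory inside the job jar itself is another option that avoids the extra flags, at the cost of a larger jar.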
