MapReduce with third-party APIs - Hadoop

I am using some third-party APIs in my MapReduce code. I tried a few things, but they didn't work out for me. (I am using Cloudera 5.9.)
1) I tried the fat jar approach (building a jar with all dependencies included), and my code worked, but this is not a good approach (the jar becomes very heavy),
so I thought of separating out the third-party jars and sharing them via the distributed cache.
2) I tried some options, as listed below.
i) I used the command hadoop jar <jar_file_name> <main_class_name> <arguments> -libjars <list of jar files>. --> Didn't work; I get a ClassNotFoundException. (My driver is sketched after this list.)
i.a) I kept the third-party jars in a local folder and used that path in the -libjars option. --> Didn't work; ClassNotFoundException.
i.b) I kept the third-party jars in HDFS and used that path in the -libjars option. --> Didn't work; ClassNotFoundException.
ii) I updated the code to use DistributedCache.addFileToClassPath(). --> Didn't work; ClassNotFoundException.
iii) I updated the code to use job.addCacheFile(). --> Didn't work; ClassNotFoundException.
iv) I updated /etc/hadoop/conf/hadoop-env.sh, added a new line export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/home/mani/test/lib/*, and restarted the cluster, but it still didn't work.
v) I tried running the command as hadoop jar <jar_file_name> <main_class_name> <arguments> -D mapred.child.env="LD_LIBRARY_PATH=path/to/my/lib/*" --> Didn't work; same exception.
vi) I also tried -Dyarn.application.classpath, but still the same issue.
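For reference, here is a minimal sketch of the driver I am using (the class, job name, and paths below are placeholders); as far as I understand, -libjars is only picked up when the job is launched through ToolRunner/GenericOptionsParser:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// The driver extends Configured and implements Tool so that GenericOptionsParser
// strips the generic options (-libjars, -files, -D ...) before run() sees the args.
public class MyJobDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf(), "third-party-api-job");
        job.setJarByClass(MyJobDriver.class);
        // Mapper/Reducer and output key/value classes omitted for brevity;
        // they are where the third-party API is actually called.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new MyJobDriver(), args));
    }
}

With such a driver, the generic options must come right after the main class name and before the job's own arguments, for example (jar names and paths are illustrative): hadoop jar myjob.jar MyJobDriver -libjars /home/mani/test/lib/dep1.jar,/home/mani/test/lib/dep2.jar /input/path /output/path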
I have gone through some forums, Cloudera blog posts, and other websites. Everyone tells the same story, but I cannot get it to work even after following those posts carefully. Am I missing something here?
Can someone please help me find a solution?
Thanks,
Manindar.

Related

Shouldn't Oozie/Sqoop jar location be configured during package installation?

I'm using HDP 2.4 in CentOS 6.7.
I have created the cluster with Ambari, so Oozie was installed and configured by Ambari.
I got two errors related to jar file locations while running Oozie/Sqoop. The first concerned postgresql-jdbc.jar, since the Sqoop job is incrementally importing from Postgres. I added the postgresql-jdbc.jar file to HDFS and pointed to it in workflow.xml:
<file>/user/hdfs/sqoop/postgresql-jdbc.jar</file>
That solved the problem. The second error, however, seems to concern kite-data-mapreduce.jar, and doing the same for this file:
<file>/user/hdfs/sqoop/kite-data-mapreduce.jar</file>
does not seem to solve the problem:
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], main() threw exception, org/kitesdk/data/DatasetNotFoundException
java.lang.NoClassDefFoundError: org/kitesdk/data/DatasetNotFoundException
It seems strange that this is not automatically configured by Ambari and that we have to copy jar files into HDFS as we start getting errors.
Is this the correct methodology or did I miss some configuration step?
This is happening due to missing jars in the classpath. I would suggest you set the property oozie.use.system.libpath=true in the job.properties file. All the Sqoop-related jars will then be added to the classpath automatically from /user/oozie/share/lib/lib_<timestamp>/sqoop/*.jar. After that, add only the custom jars you need to the lib directory of the workflow application path.
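As a sketch (host names and paths below are only placeholders), the relevant part of job.properties would look like:

# job.properties - values are illustrative
nameNode=hdfs://namenode-host:8020
jobTracker=resourcemanager-host:8050
# let Oozie add the Sqoop sharelib (/user/oozie/share/lib/lib_<timestamp>/sqoop) to the classpath
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/hdfs/sqoop/workflow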

Conflicting jars while using Unirest on CDH

I'm trying to use Unirest to send a POST request from a MapReduce job on a Cloudera Hadoop 5.2.1 cluster.
One of Unirest's dependencies is httpcore-4.3.3.jar. The CDH package includes httpcore-4.2.5.jar in the classpath. While trying to run my code, I got a "ClassNotFound" exception.
I added a line to my code to check where the class was being loaded from, and the answer was troubling: /opt/cloudera/parcels/CDH/jars/httpcore-4.2.5.jar.
I've looked everywhere online and tried everything I found. Needless to say, nothing seems to work.
I tried setting the HADOOP_CLASSPATH environment variable, I tried setting HADOOP_USER_CLASSPATH_FIRST, and I tried using the -libjars parameter of the hadoop jar command; the attempts are sketched below.
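For concreteness, the attempts look roughly like this (jar names, paths, and the driver class are placeholders):

# 1) put the newer jar on the client classpath
export HADOOP_CLASSPATH=/path/to/httpcore-4.3.3.jar
# 2) ask the hadoop script to put user jars before the bundled CDH jars
export HADOOP_USER_CLASSPATH_FIRST=true
# 3) ship the newer jar with the job
hadoop jar my-unirest-job.jar com.example.MyDriver -libjars /path/to/httpcore-4.3.3.jar /input /output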
Anyone have any idea how to solve this?

Missing hadoop-mapreduce-client-core-[0-9.]*.jar in Hadoop 1.2.1

I have installed Hadoop 1.2.1 on a three-node cluster. While installing Oozie, when I try to generate a war file for the web console, I get this error:
hadoop-mapreduce-client-core-[0-9.]*.jar' not found in '/home/hduser/hadoop'
I believe the version of Hadoop that I am using doesn't have this jar file (I don't know where to find it). So can anyone please tell me how to create the war file and enable the web console? Any help is appreciated.
You are correct. You have two options:
1. Download the individual jars, put them inside your Hadoop 1.2.1 directory, and generate the war file (a sketch of this is at the end of this answer).
2. Download Hadoop 2.x, point to it while creating the war file, and once the war has been built, continue using your Hadoop 1.2.1.
For example, from the oozie-3.3.2 directory:
bin/oozie-setup.sh prepare-war -hadoop hadoop-1.1.2 ~/hadoop-eco/hadoop-2.2.0 -extjs ~/hadoop-eco/oozie/oozie-3.3.2/webapp/src/main/webapp/ext-2.2
Here I built Oozie 3.3.2 to be used with hadoop-1.1.2, but the war itself is generated against hadoop-2.2.0.
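A rough sketch of option 1 (the jar version and paths are illustrative; in a Hadoop 2.x download the client jars sit under share/hadoop/mapreduce):

# copy the missing MapReduce client jar into the directory the error complains about
cp ~/hadoop-eco/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.2.0.jar /home/hduser/hadoop/
# then rerun bin/oozie-setup.sh prepare-war as before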
HTH

Eclipse plugin error for Hadoop on Ubuntu

I installed Hadoop version 1.0.3 and its related Eclipse plugin successfully. All the Hadoop functionality and examples work well, but when I try to use the plugin in Eclipse, it cannot connect to HDFS and I get the error:
An internal error occurred during: "Connecting to DFS localhost".
org/apache/commons/configuration/Configuration.
Could anybody help me solve this problem?
Thanks
You are facing this problem because the plugin is missing some necessary jars. To solve it, you need to rebuild the plugin after including those jars. I have seen this kind of question a lot on SO, and they all point to the same thing. Please see these links:
Eclipse Hadoop plugin issue(Call to localhost/127.0.0.1:50070 )Can any body give me the solution for this?
Hadoop eclipse mapreduce is not working?
Installing Hadoop's Eclipse Plugin
I followed the instructions in this blog post to build the Hadoop Eclipse plugin 1.0.4:
http://iredlof.com/part-4-compile-hadoop-v1-0-4-eclipse-plugin-on-ubuntu-12-10/
but it seems to be missing some parts, for example:
in MANIFEST.MF you should add:
/lib/commons-cli-1.2.jar
and in build-contrib.xml you should also add:
<property name="commons-cli.version" value="1.2"/>
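For concreteness, the jar is appended to the Bundle-ClassPath attribute in the plugin's META-INF/MANIFEST.MF. A sketch of how that attribute might end up (the exact jar list and versions depend on your Hadoop lib directory, and the error above suggests the commons-configuration jar is needed as well):

Bundle-ClassPath: classes/,
 lib/hadoop-core.jar,
 lib/commons-cli-1.2.jar,
 lib/commons-configuration-1.6.jar,
 lib/commons-lang-2.4.jar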
I hope these are useful!
You must start Hadoop from the command line first:
./[hadoop-path]/bin/start-all.sh

How to run GIS code through Hadoop's prompt?

I am running GIS code through Hadoop's prompt in the following manner:
Wrote the GIS code in Eclipse, including all the relevant GIS jars.
Went into the directory where my Eclipse workspace is.
Compiled the code, adding all the relevant jars to the classpath. (The compilation was successful.)
Built the jar.
Now running that jar using Hadoop: bin/hadoop jar my_jar_file_name.jar my_pkg_structure.Main_class_file (the steps are sketched as commands below).
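A sketch of those steps as commands (the GIS jar names, workspace path, and project name are placeholders, and I am assuming the hadoop classpath command is available):

cd ~/eclipse_workspace/my_gis_project
# compile against Hadoop plus the GIS jars
javac -classpath "$(hadoop classpath):lib/gis-geometry.jar:lib/gis-tools.jar" -d classes src/my_pkg_structure/*.java
# package the compiled classes
jar cf my_jar_file_name.jar -C classes .
# run it through Hadoop's prompt
bin/hadoop jar my_jar_file_name.jar my_pkg_structure.Main_class_file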
Now, in spite of the code being error free, when I try to execute it through Hadoop's prompt, it gives me multiple issues.
Is there a workable alternative way to do the same without any hassles?
Also note that the GIS code runs beautifully in Eclipse. Since I have to do geoprocessing over Hadoop, I need to run it through Hadoop's prompt.
