I recently upgraded my cluster from Apache Hadoop 1.0 to CDH4.4.0. I have a WebLogic server on another machine, from which I submit jobs to this remote cluster via the MapReduce client. I still want to use MR1 and not YARN. I have compiled my client code against the client jars in the CDH installation (/usr/lib/hadoop/client/*).
I am getting the error below when creating a JobClient instance. There are many posts about the same issue, but all the solutions cover submitting the job to a local cluster, not to a remote one, and specifically not from a WebLogic (WLS) container as in my case.
JobClient jc = new JobClient(conf);
Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
But running from the command prompt on the cluster works perfectly fine.
Appreciate your timely help!

I had a similar error; adding the following jars to the classpath fixed it for me:

It's likely that your app is looking at your old Hadoop 1.x configuration files. Maybe your app hard-codes some config? This error tends to indicate you are using the new client libraries but that they are not seeing new-style configuration.
The new configuration must exist, since the command-line tools see it fine. Check your HADOOP_HOME and HADOOP_CONF_DIR environment variables too, although those are what the command-line tools tend to pick up, and they work.
Note that you need to install the 'mapreduce' service and not 'yarn' in CDH 4.4 to make it compatible with MR1 clients. See also the '...-mr1-...' artifacts in Maven.
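
To act on the HADOOP_CONF_DIR suggestion, a minimal shell sketch like the following can confirm that the directory the client is pointed at actually contains the site files. The fallback directory /etc/hadoop/conf and the helper name missing_conf_files are assumptions for illustration, not something from the original posts.

```shell
# Hypothetical helper: print the name of each expected site file
# that is NOT present in the given configuration directory.
missing_conf_files() {
  conf_dir="$1"
  for f in core-site.xml hdfs-site.xml mapred-site.xml; do
    [ -f "$conf_dir/$f" ] || printf '%s\n' "$f"
  done
}

# Assumption: fall back to /etc/hadoop/conf when HADOOP_CONF_DIR is unset.
conf_dir="${HADOOP_CONF_DIR:-/etc/hadoop/conf}"
echo "Checking $conf_dir"
missing_conf_files "$conf_dir"
```

It is probably most useful when run as the same user and in the same environment that starts the application server, since that is the environment the client code inherits.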

In my case, this error was due to the version of the jars. Make sure that you are using the same version as on the server.

export HADOOP_MAPRED_HOME=/cloudera/parcels/CDH-4.1.3-1.cdh4.1.3.p0.23/lib/hadoop-0.20-mapreduce

In my case I was running Sqoop 1.4.5 and pointing it to the latest hadoop 2.0.0-cdh4.4.0, which also included the YARN components; that is why it was complaining.
When I pointed Sqoop to hadoop-0.20/2.0.0-cdh4.4.0 (MR1, I think) it worked.

As with Akshay (comment by Setob_b) all I needed to fix was to get hadoop-mapreduce-client-shuffle-.jar on my classpath.
As follows for Maven:

In my case, strangely, this error occurred because I had used an IP address rather than a hostname in my 'core-site.xml' file.
As soon as I replaced the IP address with the hostname in both "core-site.xml" and "mapred.xml" and re-installed the mapreduce lib files, the error was resolved.

In my case, I resolved this by using hadoop jar instead of java -jar.
This is useful because hadoop provides the configuration context from hdfs-site.xml, core-site.xml, and so on.


Running Mahout Job on Hadoop: Got ClassNotFoundException

I am trying to run a Mahout KMeans example on the Cloudera QuickStart VM for Hadoop. I read in a Cloudera blog post and in a Stack Overflow post that I can use the -libjars option to attach the Mahout jars.
I put the jar files KMeansHadoop.jar, mahout-core-0.9.jar and mahout-math-0.9.jar in the same folder and ran:
hadoop jar KMeansHadoop.jar SimpleKMeansClustering -libjars mahout-core-0.9.jar mahout-math-0.9.jar
But I still get the error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/mahout/math/Vector
What am I doing wrong? Thank you!
Firstly, I believe that the -libjars values need to be comma-separated. But that only makes your third-party jars available to the cluster. You may also need to use HADOOP_CLASSPATH to make those jars available on the client side (e.g., on the edge node from which you're kicking off your job).
Check out this post. It helped me a lot when I was working my way through this exact issue with getting Driven to work with Cascading.
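
As a rough sketch of the two points above: -libjars takes a comma-separated list, while HADOOP_CLASSPATH uses the usual colon-separated classpath form. The helper join_by is a hypothetical name; the jar and class names are the ones from the question. The commands are only printed here, not executed. Note also that -libjars is a generic option, so it only takes effect if the driver class parses generic options (for example by running through ToolRunner).

```shell
# Hypothetical helper: join all remaining arguments with the
# single-character separator given as the first argument.
# The body runs in a subshell so the caller's IFS is untouched.
join_by() (
  IFS="$1"
  shift
  printf '%s' "$*"
)

# Jar names taken from the question.
JARS="mahout-core-0.9.jar mahout-math-0.9.jar"

# Comma-separated for -libjars (shipped to the cluster).
LIBJARS="$(join_by , $JARS)"
# Colon-separated for HADOOP_CLASSPATH (visible to the client JVM).
CLIENT_CP="$(join_by : $JARS)"

# Print the commands one would run on the edge node.
echo "export HADOOP_CLASSPATH=$CLIENT_CP"
echo "hadoop jar KMeansHadoop.jar SimpleKMeansClustering -libjars $LIBJARS"
```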

Cannot start standalone instance of HBase

I was unable to configure the HBase standalone instance. Following are the steps I followed:
Downloaded hbase-0.98.9-hadoop2 and extracted it.
Set my JAVA_HOME in the environment variables.
Edited conf/hbase-site.xml and changed the configuration as mentioned in the Apache HBase quick start guide.
Ran the bin/ and this error came up.
Can anyone tell me what I'm missing or doing wrong? Thanks
Here are the steps:
HBase cannot be installed without Cygwin tooling.

Installing/Configuring Hue with Apache frameworks

Has anyone tried configuring Hue with the frameworks from Apache rather than with CDH? The documentation says to set the mapred.jobtracker.plugins property to org.apache.hadoop.thriftfs.ThriftJobTrackerPlugin and check the JT log files to make sure that the Thrift plugin has been loaded. But I don't see anything related to Thrift in the JT log files. It also looks like mapred.jobtracker.plugins is not defined in the mapred-site.xml for Hadoop 1.2.1, which is the latest stable release.
Did you add mapred.jobtracker.plugins to mapred-site.xml? Hadoop has supported plugins since 1.2.0, so you should be good.

Using different hadoop-mapreduce-client-core.jar to run hadoop cluster

I'm working on a hadoop cluster with CDH4.2.0 installed and ran into this error. It's been fixed in later versions of hadoop but I don't have access to update the cluster. Is there a way to tell hadoop to use this jar when running my job through the command line arguments like
hadoop jar MyJob.jar -D hadoop.mapreduce.client=hadoop-mapreduce-client-core-2.0.0-cdh4.2.0.jar
where the new mapreduce-client-core.jar file is the patched jar from the ticket. Or must hadoop be completely recompiled with this new jar? I'm new to hadoop so I don't know all the command line options that are possible.
I'm not sure how that would work as when you're executing the hadoop command you're actually executing code in the client jar.
Can you not use MR1? The ticket says this issue only occurs when you're using MR2, so unless you really need YARN you're probably better off using the MR1 library to run your map/reduce.

Hue Hive -- Beeswax Server Can't Find JDBC Driver for MySQL

We're using Cloudera 3.7.5 and having a tough time configuring the Beeswax server so that Hue can access the Hive databases. I followed all the instructions in the Cloudera documentation to set up MySQL as Hive's metastore, but when I restart the Hue services and check the Beeswax server's StdErr logs, I still see the painful "javax.jdo.JDOFatalInternalException: Error creating transactional connection factory", which is caused by
org.datanucleus.exceptions.NucleusException: Attempt to invoke the "DBCP" plugin to create a ConnectionPool gave an error : The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
This is bizarre to me, because the logs also indicate that the environment variable HIVE_HOME is equal to "/usr/lib/hive", and sure enough I have copied the "mysql-connector-java-5.1.15-bin.jar" into the /usr/lib/hive/lib directory, as the documents dictate.
I have also tried the instructions in the blog post, which involved copying the mysql-connector jar into "/usr/share/hue/apps/beeswax/hive/lib/". Unfortunately I did not have a hive/lib subdirectory in the beeswax folder, so I attempted to make one. This also did not work.
Any advice on how I can get the MySQL JDBC library onto Beeswax's classpath?
We finally decided to just bite the bullet and upgrade to CDH4. Placing the JDBC jar in /usr/share/hive/lib allowed the Beeswax server to function perfectly without issue.
If anyone else is experiencing this issue, I recommend upgrading from CDH3 to CDH4. The UI is much cleaner and smoother, and we had far fewer installation and maintenance bugs with CDH4.
You have to copy your MySQL connector jar into HUE_HOME/apps/beeswax/hive/lib.
If this path doesn't exist, create hive/lib and then copy the MySQL connector there. I hope this solves your problem.
Starting with Cloudera 4.5, everything is moved into parcels, so this exact problem on my Hive metastore server was fixed by the command below. Essentially you're just re-adding modules. I'm sure you can modify the extra classpath in the Hive config file so that this survives parcel updates.
cp /usr/lib/hive/lib/mysql-connector-java-5.1.17-bin.jar /opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hive/lib/.
So a real fix might be something like this:
cp `locate mysql-connector | grep jar | head -n 1` /opt/cloudera/parcels/*/lib/hive/lib/.
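which is meant to copy the jar into every parcel. Note that if the glob matches more than one parcel directory, cp treats only the last match as the destination, so this works as written only when a single parcel is installed.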
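
If several parcels are installed, a loop is one way to get the jar into each of them. The following is a minimal sketch; copy_into_each is a hypothetical helper name, and the usage line reuses the paths quoted in the answer above, which may differ on your system.

```shell
# Hypothetical helper: copy one file into every directory given
# after it, skipping arguments that are not existing directories
# (for example an unmatched glob pattern).
copy_into_each() {
  src="$1"
  shift
  for dest in "$@"; do
    [ -d "$dest" ] || continue
    cp "$src" "$dest"/
  done
}

# Usage (paths taken from the answer above, adjust as needed):
# copy_into_each /usr/lib/hive/lib/mysql-connector-java-5.1.17-bin.jar \
#   /opt/cloudera/parcels/*/lib/hive/lib
```

The shell expands the glob into one argument per parcel, and the loop runs a separate cp for each, so every matching hive/lib directory receives its own copy.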
