Installing/Configuring Hue with Apache frameworks - hadoop

Has anyone tried configuring Hue with the frameworks from Apache rather than with CDH? The documentation says to set the mapred.jobtracker.plugins property to org.apache.hadoop.thriftfs.ThriftJobTrackerPlugin and to check the JT log files to make sure that the Thrift plugin has been loaded. However, I don't see anything related to Thrift in the JT log files. It also looks like mapred.jobtracker.plugins is not defined in mapred-site.xml for Hadoop 1.2.1, which is the latest stable release.

Did you add the mapred.jobtracker.plugins property to mapred-site.xml? Hadoop has supported plugins since 1.2.0, so you should be good.
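A quick way to confirm the property is really present in the file the JobTracker reads is to parse the XML and look it up. The following is a minimal sketch, not part of Hue or Hadoop: the helper name get_hadoop_property and the SAMPLE content are made up for illustration, and it assumes the usual <configuration>/<property>/<name>/<value> layout of Hadoop site files.

```python
import xml.etree.ElementTree as ET


def get_hadoop_property(xml_text, name):
    """Return the value of a <property> in a Hadoop *-site.xml document, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            return prop.findtext("value", default="").strip()
    return None


# Hypothetical mapred-site.xml content with the property from the question added.
SAMPLE = """<configuration>
  <property>
    <name>mapred.jobtracker.plugins</name>
    <value>org.apache.hadoop.thriftfs.ThriftJobTrackerPlugin</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(get_hadoop_property(SAMPLE, "mapred.jobtracker.plugins"))
```

To check a real file, read its contents and pass the text to the function. A result of None means the property is not defined in that file.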

Related

Create hdfs when using integrated spark build

I'm working with Windows and trying to set up Spark.
Previously I installed Hadoop in addition to Spark, edited the config files, ran hadoop namenode -format, and away we went.
I'm now trying to achieve the same by using the bundled version of Spark that is pre-built with Hadoop: spark-1.6.1-bin-hadoop2.6.tgz.
So far it's been a much cleaner, simpler process. However, I no longer have access to the command that creates the HDFS, the config files for the HDFS are no longer present, and there is no 'hadoop' in any of the bin folders.
There wasn't a Hadoop folder in the Spark install; I created one for the purpose of winutils.exe.
It feels like I've missed something. Do the pre-built versions of spark not include hadoop? Is this functionality missing from this variant or is there something else that I'm overlooking?
Thanks for any help.
Saying that Spark is built with Hadoop means that Spark is built with the Hadoop dependencies, i.e. with the clients for accessing Hadoop (or HDFS, to be more precise).
Thus, if you use a version of Spark that is built for Hadoop 2.6, you will be able to access, via Spark, the HDFS filesystem of a cluster running Hadoop 2.6.
It doesn't mean that Hadoop is part of the package and gets installed when you download Spark. You have to install Hadoop separately.
If you download a Spark release without Hadoop support, you'll need to include the Hadoop client libraries in all the applications you write which are supposed to access HDFS (via textFile, for instance).
I am also using the same Spark build on Windows 10. What I did was create a C:\winutils\bin directory and put winutils.exe there, then create the variable HADOOP_HOME=C:\winutils. If you have set all the environment variables (SPARK_HOME, HADOOP_HOME, etc.) and PATH, then it should work.
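The setup described above can be sanity-checked with a few lines of Python. This is only a sketch under the assumptions in that answer (HADOOP_HOME points at a folder whose bin subfolder holds winutils.exe); the function name check_winutils is hypothetical.

```python
import ntpath
import os


def check_winutils(env, exists=os.path.isfile):
    """Return a list of problems with the HADOOP_HOME/winutils setup (empty if none)."""
    hadoop_home = env.get("HADOOP_HOME")
    if not hadoop_home:
        return ["HADOOP_HOME is not set"]
    # ntpath builds Windows-style paths regardless of the platform running this check.
    winutils = ntpath.join(hadoop_home, "bin", "winutils.exe")
    if not exists(winutils):
        return ["missing " + winutils]
    return []


if __name__ == "__main__":
    # On the Windows machine itself, check the real environment.
    print(check_winutils(os.environ))
```

An empty list means both the variable and the file were found. The exists parameter is there only so the check can be exercised without touching the disk.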

Hadoop Customization overwritten by Ambari - hadoop-env.sh

It seems a simple task: change JAVA_HOME in /etc/hadoop/conf/hadoop-env.sh to use a different version of Java.
However, Ambari will, it seems, overwrite any changes you make in hadoop-env.sh by using its templating scheme.
The template seems to contain the following line:
export JAVA_HOME={{java_home}}
So, now if this is used to generate and replace the environment on each node, how do I define the {{java_home}}?
As of Ambari 1.7.0, you can modify hadoop-env from Ambari Web UI. You can learn about the other features in Ambari 1.7.0, as well as 2.0.0 (which is the latest release) from the links on this page:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=30755705
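To see why hand edits to the generated file do not survive, it helps to picture what a template step does with the {{java_home}} placeholder. The snippet below is a toy illustration of placeholder substitution only; it is not Ambari's actual template engine, and the render helper and the /usr/lib/jvm/java-7 path are made-up examples.

```python
import re

# Matches {{name}} with optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template, values):
    """Replace each {{name}} placeholder with values[name]; KeyError if one is missing."""
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


if __name__ == "__main__":
    line = "export JAVA_HOME={{java_home}}"
    # Hypothetical value; in practice it comes from the managing tool's own configuration.
    print(render(line, {"java_home": "/usr/lib/jvm/java-7"}))
```

Because the file is regenerated from the template and the stored value each time, the lasting fix is to change the stored value (for example through the Ambari Web UI, as described above) rather than the rendered hadoop-env.sh.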

Setting up Hadoop Client on Mac OS X

Currently, I have a 3-node cluster running CDH 5.0 using MRv1. I am trying to figure out how to set up Hadoop on my Mac so that I can submit jobs to the cluster. According to "Managing Hadoop API Dependencies in CDH 5", you just need the files in /usr/lib/hadoop/client-0.20/*. Does Cloudera provide hadoop-client as a tarball? Do I need the following files too?
- core-site.xml
- hdfs-site.xml
- mapred-site.xml
Yes, I think you can use the Cloudera tarball to set up a Hadoop client. It can be downloaded from the link below. The configuration files are available under the etc/hadoop/ directory of the Hadoop tarball; you just need to modify them according to your environment.
http://archive-primary.cloudera.com/cdh5/cdh/5/hadoop-2.2.0-cdh5.0.0-beta-2.tar.gz
If the above link doesn't match your version, use the following link to browse the available Hadoop versions:
http://archive-primary.cloudera.com/cdh5/cdh/5/

Install Hue without Cloudera

Has anyone tried/succeeded in installing Hue on Hadoop without Cloudera?
I have gotten to a point where I can reliably reproduce a hadoop cluster with hbase and hive and can set it all up in about 15 minutes. I'd love to have Hue along with all this without having to go back and redo my setup with Cloudera.
Check out slides #19 and #5: Hue is getting everywhere and is compatible with Hadoop 0.20 / 1.2.0 / 2.2.0: http://gethue.com/hue-goes-to-paris-hug-france/
Hue has tarball releases that you are free to install. You can also simply clone the source code from GitHub (Hue is open source and Apache licensed), https://github.com/cloudera/hue, and build the branch you want.
Upstream documentation is here or CDH's one here.
Hue is also packaged in BigTop (and so based on Vanilla Hadoop).
Hue is a Web Server (Django based) which acts as a view on top of Hadoop. So Hue just needs to be installed and then configured by adding the hosts of NameNode, JobTracker, Resource Manager, Oozie, HiveServer... etc in its hue.ini.
Also, as detailed on gethue.com/releases, the version you need might depend on your Hive version.
Note that without Cloudera's distribution your mileage might vary, but feel free to chime in on the Hue user list or gethue.com ;)
We are also looking into improving Hue setup with Amazon AWS/EMR!
To build and run Hue 3.6.0 with Apache Hadoop 2.4.1:
- Clone the source:
  $ git clone https://github.com/cloudera/hue.git
  (Note: releases/tag/release-3.6.0 is unstable. It's better to build from the latest master. I built from Aug 7, 87d6b2da1, and it's stable.)
- Enter the source tree and edit the Maven POM:
  $ cd hue
  $ vi maven/pom.xml
  - change hadoop.version to 2.4.1
  - replace hadoop-core with hadoop-common
  - set the hadoop-test version to 1.2.1
- Remove the files which need Hadoop MR1:
  $ rm desktop/libs/hadoop/java/src/main/java/org/apache/hadoop/mapred/ThriftJobTrackerPlugin.java
  $ rm desktop/libs/hadoop/java/src/main/java/org/apache/hadoop/thriftfs/ThriftJobTrackerPlugin.java
- Build Hue:
  $ make apps
- Configure Hue:
  $ vi desktop/conf/pseudo-distributed.ini
- Run the Hue server in dev mode:
  $ build/env/bin/hue runserver 0.0.0.0:8000
Follow the Hue manual installation steps from the Hortonworks documentation; they take you step by step through doing it manually.
Quote: "...without Cloudera's distribution your mileage might vary...."
Indeed, it will vary A LOT! It would seem that the following is quite true:
Per the install guide:
http://cloudera.github.io/hue/docs-2.0.1/manual.html#_install_hue
NOTE:
Hue requires the Hadoop contained in Cloudera’s Distribution including Apache Hadoop (CDH), version 3 update 4 or later.
I've tried it and have run into walls with Hue trying to connect to Hive, Pig and Oozie.
At this stage, from my experience at least, Hue will NOT run on a standard Apache Hadoop installation using standard Apache tools like Hive and Pig. It must be some version of Cloudera’s Distribution.
If anyone has any other (positive) experiences installing Hue outside of the Cloudera’s Distribution, I'd be quite interested to hear about them...

Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses - submitting job to remote cluster

I recently upgraded my cluster from Apache Hadoop 1.0 to CDH 4.4.0. I have a WebLogic server on another machine, from which I submit jobs to this remote cluster via the MapReduce client. I still want to use MR1 and not YARN. I have compiled my client code against the client jars in the CDH installation (/usr/lib/hadoop/client/*).
I am getting the error below when creating a JobClient instance. There are many posts related to the same issue, but all the solutions refer to submitting the job to a local cluster, not to a remote one, and specifically not from a WebLogic container as in my case.
JobClient jc = new JobClient(conf);
Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
But running from the command prompt on the cluster works perfectly fine.
Appreciate your timely help!
I had a similar error and added the following jars to classpath and it worked for me:
hadoop-mapreduce-client-jobclient-2.2.0.2.0.6.0-76:hadoop-mapreduce-client-shuffle-2.3.0.jar:hadoop-mapreduce-client-common-2.3.0.jar
It's likely that your app is looking at your old Hadoop 1.x configuration files. Maybe your app hard-codes some config? This error tends to indicate that you are using the new client libraries but that they are not seeing new-style configuration.
The new-style configuration must exist, since the command-line tools see it fine. Check your HADOOP_HOME and HADOOP_CONF_DIR environment variables too, although those are what the command-line tools tend to pick up, and they work.
Note that you need to install the 'mapreduce' service and not 'yarn' in CDH 4.4 to make it compatible with MR1 clients. See also the '...-mr1-...' artifacts in Maven.
In my case, this error was due to the version of the jars. Make sure that you are using the same version as on the server.
export HADOOP_MAPRED_HOME=/cloudera/parcels/CDH-4.1.3-1.cdh4.1.3.p0.23/lib/hadoop-0.20-mapreduce
In my case I was running Sqoop 1.4.5 and pointing it to the latest Hadoop 2.0.0-cdh4.4.0, which also had the YARN stuff; that's why it was complaining.
When I pointed Sqoop to hadoop-0.20/2.0.0-cdh4.4.0 (MR1, I think) it worked.
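Several of the answers above come down to mixed jar versions on the client classpath. A rough way to spot that is to group the jar names by their version suffix. This is a sketch only: the jar_versions helper and its name-version pattern are assumptions for illustration, and real jar names may not all follow that pattern.

```python
import re
from collections import defaultdict

# Artifact name, then "-<version>" where the version starts with a digit; ".jar" optional.
JAR = re.compile(r"^(?P<name>[A-Za-z][\w.-]*?)-(?P<version>\d[\w.-]*?)(?:\.jar)?$")


def jar_versions(classpath, sep=":"):
    """Map each version string to the artifact names found with it on the classpath."""
    found = defaultdict(list)
    for entry in classpath.split(sep):
        base = entry.strip().rsplit("/", 1)[-1]  # drop any directory part
        match = JAR.match(base)
        if match:
            found[match.group("version")].append(match.group("name"))
    return dict(found)


# The classpath fragment quoted in one of the answers above.
CP = ("hadoop-mapreduce-client-jobclient-2.2.0.2.0.6.0-76:"
      "hadoop-mapreduce-client-shuffle-2.3.0.jar:"
      "hadoop-mapreduce-client-common-2.3.0.jar")

if __name__ == "__main__":
    for version, names in sorted(jar_versions(CP).items()):
        print(version, names)
```

More than one key in the result means the classpath mixes versions, which is worth ruling out before digging into the configuration files.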
As with Akshay (comment by Setob_b) all I needed to fix was to get hadoop-mapreduce-client-shuffle-.jar on my classpath.
As follows for Maven:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-shuffle</artifactId>
    <version>${hadoop.version}</version>
</dependency>
In my case, strangely, this error occurred because I had used an IP address rather than a hostname in my 'core-site.xml' file.
The moment I used the hostname in place of the IP address in both "core-site.xml" and "mapred.xml" and re-installed the mapreduce lib files, the error was resolved.
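If you want to check your own configuration values for this, the following sketch tells a literal IP address apart from a hostname in an address value. The helper name host_is_ip and the sample addresses are hypothetical, and which properties to feed it depends on your setup.

```python
import ipaddress
from urllib.parse import urlsplit


def host_is_ip(address):
    """True if the host part of a URI or host:port string is a literal IP address."""
    if "://" not in address:
        address = "//" + address  # lets urlsplit treat "host:port" as a network location
    host = urlsplit(address).hostname
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    # Made-up example values, not taken from any real cluster.
    for value in ("hdfs://10.0.0.5:8020", "hdfs://namenode01.example.com:8020"):
        print(value, host_is_ip(value))
```

A True result only flags the value for review; whether an IP address actually causes the error depends on how the cluster resolves names.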
In my case, I resolved this by using hadoop jar instead of java -jar.
This is useful because hadoop will provide the configuration context from hdfs-site.xml, core-site.xml, and so on.

Resources