Can open source HBase work on a Cloudera distribution of Hadoop?

I have a Cloudera Distribution installed as a 5-node cluster. Now I do not want to use the HBase parcel that comes with Cloudera;
instead I want to use only HDFS from the Cloudera setup and an open-source version of HBase.
So my question is: will this work, or will I have to install a normal open-source version of Apache Hadoop for HDFS and then go forward with the open-source version of Apache HBase on top of it?

As long as the version of Hadoop running on the cluster matches the version of the Hadoop client that your HBase release was built against, it should all work.
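A quick way to sanity-check that (a rough sketch; `HBASE_HOME` is just an assumed pointer to the unpacked open-source HBase tarball) is to compare the Hadoop client jars bundled under HBase's lib directory with what the cluster reports:

```
# Hadoop version running on the CDH cluster
hadoop version

# Hadoop client jars bundled with the open-source HBase tarball
# (HBASE_HOME is assumed to point at the unpacked HBase directory)
ls "$HBASE_HOME"/lib | grep hadoop
```

If the two diverge significantly, the bundled jars are usually swapped out for the ones matching the cluster, as discussed for the HBase 0.94.8 question further down.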

Related

How to check the hadoop distribution used in my cluster?

How can I know whether my cluster has been set up using Hortonworks, Cloudera, or a plain installation of the Hadoop components?
Also, how can I find the port numbers of the various services?
It is difficult to identify the Hadoop distribution from port numbers, since the Apache, Hortonworks, and Cloudera distros use different port numbers.
Another option is to check for cluster-management service agents, for example as shown below:
- Cloudera Manager agent start-up script: /etc/init.d/cloudera-scm-agent
- Hortonworks Ambari agent start-up script: /etc/init.d/ambari-agent
- Vanilla Apache Hadoop will not have any such agents on the server.
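A quick check for those agent scripts (a minimal sketch based on the paths listed above):

```
# Any output here suggests Cloudera Manager or Ambari is managing the node
ls /etc/init.d/ | grep -E 'cloudera-scm-agent|ambari-agent'
```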
Another option is to check the Hadoop classpath; the command below can be used to print it.
`hadoop classpath`
Most Hadoop distributions include the distro name in the classpath. If the classpath doesn't contain either of the keywords below, the distribution/setup is a plain Apache installation.
hdp - (Hortonworks)
cdh - (Cloudera)
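A quick way to run that check (a minimal sketch):

```
# Print the classpath one entry per line and look for a distro marker;
# no match suggests a plain Apache installation
hadoop classpath | tr ':' '\n' | grep -ioE 'cdh|hdp' | sort -u
```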
The simplest way is to run the hadoop version command; in the output you will see which version of Hadoop you have and also which distribution and distribution version you are running. If you find the word cdh or hdp, then cdh stands for Cloudera and hdp for Hortonworks.
For example, I am running Cloudera, and the hadoop version command gives the output below.
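(Illustrative output only; the exact version numbers, commit hash, and jar path will differ on your cluster.)

```
$ hadoop version
Hadoop 2.6.0-cdh5.4.0
Subversion http://github.com/cloudera/hadoop -r <commit-hash>
...
This command was run using /usr/lib/hadoop/hadoop-common-2.6.0-cdh5.4.0.jar
```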
Here the first line shows the Hadoop version followed by the distribution and its version.
Hope this helps.
The `hdfs version` command will likewise give you the version of Hadoop and its distribution.

Create HDFS when using the integrated Spark build

I'm working with Windows and trying to set up Spark.
Previously I installed Hadoop in addition to Spark, edited the config files, ran `hadoop namenode -format`, and away we went.
I'm now trying to achieve the same with the bundled version of Spark that is pre-built with Hadoop - spark-1.6.1-bin-hadoop2.6.tgz
So far it's been a much cleaner, simpler process; however, I no longer have access to the command that creates the HDFS, the config files for HDFS are no longer present, and there is no 'hadoop' in any of the bin folders.
There wasn't a Hadoop folder in the Spark install, so I created one for the purpose of winutils.exe.
It feels like I've missed something. Do the pre-built versions of spark not include hadoop? Is this functionality missing from this variant or is there something else that I'm overlooking?
Thanks for any help.
Saying that Spark is built with Hadoop means that Spark is built with the Hadoop dependencies, i.e. with the clients for accessing Hadoop (or HDFS, to be more precise).
Thus, if you use a version of Spark which is built for Hadoop 2.6 you will be able to access HDFS filesystem of a cluster with the version 2.6 of Hadoop via Spark.
It doesn't mean that Hadoop is part of the package and that downloading it installs Hadoop as well. You have to install Hadoop separately.
If you download a Spark release without Hadoop support, you'll need to include the Hadoop client libraries in all the applications you write which are supposed to access HDFS (via textFile, for instance).
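As a side note, if you pick one of the "Hadoop free" builds (spark-x.y.z-bin-without-hadoop), the approach described in the Spark documentation is to point Spark at an existing Hadoop installation rather than bundling the client jars yourself, roughly like this (a sketch; the spark-env.sh path assumes a standard Spark layout):

```
# conf/spark-env.sh in the Spark installation:
# make the Hadoop client jars from an existing Hadoop install visible to Spark
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
```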
I am also using the same Spark on my Windows 10 machine. What I did was create a C:\winutils\bin directory and put winutils.exe there, then create a HADOOP_HOME=C:\winutils environment variable. If you have set all the
env variables and PATH entries like SPARK_HOME, HADOOP_HOME, etc., then it should work.
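A minimal sketch of that setup in a Windows Command Prompt (the Spark path is just an example; adjust it to wherever you unpacked the download, with winutils.exe expected at %HADOOP_HOME%\bin\winutils.exe):

```
REM winutils.exe lives in C:\winutils\bin
set HADOOP_HOME=C:\winutils
set SPARK_HOME=C:\spark-1.6.1-bin-hadoop2.6
set PATH=%PATH%;%HADOOP_HOME%\bin;%SPARK_HOME%\bin

REM start the shell to verify the setup
spark-shell
```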

Which Hadoop 0.23.8 jars are needed for HBase 0.94.8

I'm using Hadoop 0.23.8 pseudo distributed and HBase 0.94.8. My HBase master is failing with:
Server IPC version 5 cannot communicate with client version 4
I think this is because HBase is using hadoop-core-1.0.4.jar in its lib folder.
Now http://cloudfront.blogspot.in/2012/06/how-to-configure-habse-in-pseudo.html#.UYfPYkAW38s suggests I should replace this jar by copying:
the hadoop-core-*.jar from your HADOOP_HOME ...
but there are no hadoop-core-*.jars in 0.23.8.
Will this process work for 0.23.8, and if so, which jars should I be using?
TIA!
I gave up with this and am using hadoop 2.2.0 which works well (ish) with HBase.
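For reference, the usual fix for this kind of "Server IPC version X cannot communicate with client version Y" error is the one the HBase reference guide describes: replace the Hadoop jars bundled under HBase's lib/ directory with the ones from the Hadoop version actually running on the cluster. A rough sketch for a Hadoop 2.x tarball install (the paths are assumptions about standard layouts):

```
# Remove the Hadoop jars that ship with HBase
rm "$HBASE_HOME"/lib/hadoop-*.jar

# Copy in the client jars from the running Hadoop installation
cp "$HADOOP_HOME"/share/hadoop/common/hadoop-common-*.jar   "$HBASE_HOME"/lib/
cp "$HADOOP_HOME"/share/hadoop/hdfs/hadoop-hdfs-*.jar       "$HBASE_HOME"/lib/
cp "$HADOOP_HOME"/share/hadoop/common/lib/hadoop-auth-*.jar "$HBASE_HOME"/lib/
```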

Setting up Hadoop Client on Mac OS X

Currently, I have a 3-node cluster running CDH 5.0 using MRv1. I am trying to figure out how to set up Hadoop on my Mac so that I can submit jobs to the cluster. According to "Managing Hadoop API Dependencies in CDH 5", you just need the files in /usr/lib/hadoop/client-0.20/*. Do I need the following files too? Does Cloudera have hadoop-client in a tarball?
- core-site.xml
- hdfs-site.xml
- mapred-site.xml
Yes, I think you can make use of the Cloudera tarball for setting up a Hadoop client; it can be downloaded from the path below. The configuration files are available under the etc/hadoop/ directory inside the Hadoop distribution, and you just need to modify those files according to your environment (a rough outline is sketched after the links below).
http://archive-primary.cloudera.com/cdh5/cdh/5/hadoop-2.2.0-cdh5.0.0-beta-2.tar.gz
If the above link doesn't match your version, use the following link to see the available Hadoop versions:
http://archive-primary.cloudera.com/cdh5/cdh/5/
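A rough outline of the client-side setup under those assumptions (the tarball name comes from the link above; paths are placeholders):

```
# Unpack the CDH Hadoop tarball on the Mac
tar xzf hadoop-2.2.0-cdh5.0.0-beta-2.tar.gz
export HADOOP_HOME="$PWD/hadoop-2.2.0-cdh5.0.0-beta-2"
export PATH="$PATH:$HADOOP_HOME/bin"

# Copy core-site.xml, hdfs-site.xml and mapred-site.xml from a cluster node
# into etc/hadoop/ (or edit the ones there to point at the cluster)
export HADOOP_CONF_DIR="$HADOOP_HOME/etc/hadoop"

# Smoke test against the cluster
hadoop fs -ls /
```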

CDH4 installation using tarball

I have been struggling to install CDH via tarball; there is no document that describes the steps or guides you through them. I do have root access on the server and wish to install CDH4 via tarball in pseudo-distributed mode. Can anyone help? On the same server Apache Hadoop is also installed, and I want to install this CDH without affecting the existing Apache Hadoop.
It will not work, because in the end CDH4 will use the same ports that your existing Apache Hadoop is using. It will work if you shut down your existing Hadoop cluster and then start your CDH4 cluster, or else change all the port numbers for the namenode, secondary namenode, jobtracker, tasktracker, and datanode, and their respective web UI ports, which is kind of tedious. It would also be helpful if you provide some error logs, so I can highlight what exactly the problem is.
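Before starting CDH4 you can check which of the usual ports are already held by the existing Apache Hadoop daemons, for example (the list below is the classic set of Hadoop 1/MRv1 defaults and may not match your configuration):

```
# NameNode RPC/web, DataNode, SecondaryNameNode web, JobTracker RPC/web, TaskTracker web
sudo netstat -tlnp | grep -E ':(8020|50070|50010|50075|50090|8021|50030|50060)\b'
```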
