CDH4 installation using tarball - hadoop

I have been struggling to install CDH via tarball, there is no document that describes the steps or guides through. I do have root access on the server & wish to install CDH4 via tarball in Pseudo mode. Can anyone help?. On the same server apache hadoop is also installed, i want to install this CDH, without effecting the existing apache hadoop.

It will not work..because in the end CDH4 will use the same ports which your existing apache hadoop is using..It will work ..if you shutdown your existing hadoop cluster and then start your CDH4 cluster. Or else change all the port numbers for namenode,secondary namenode,jobtracker, tasktracker and datanode and their respective web UI's port..which is kind of tedious.. It would be also helpful if you provide some error logs..So I can highlight what exactly is the problem.

Related

How to check the hadoop distribution used in my cluster?

How can I know whether my cluster has been setup using Hortonworks,Cloudera or normal installation of hadoop components?
Also how can I know the port number of various services?
It is difficult to identify hadoop distribution from port number, since Apache, Hortonworks, Cloudera distros uses different port numbers
Other options are to check for cluster management service agents (Cloudera Manager - agent start up script - /etc/init.d/cloudera-scm-agent , Hortonworks - Ambari agent start up script - /etc/init.d/ambari-agent, Vanilla Apache hadoop will not have any agents in the server
Another option is to check hadoop classpath, below command can be used to get the classpath.
`hadoop classpath`
Most of hadoop distributions include distro name in the classpath, If classpath doesn't contains any of below keywords, distribution/setup will be Apache/Normal installation.
hdp - (Hortonworks)
cdh - (Cloudera)
The simplest way is to run hadoop version command and in output you will see, what version of Hadoop you are having and also which distribution and its version you are running with. If you will find words like cdh or hdp then cdh stands for cloudera and hdp for hortonworks.
For example, here I am having cloudera and with hadoop version command below is output.
Here in first line Hadoop version followed by hadoop distribution and its version.
Hope this will help.
Command hdfs version will give you version of the hadoop and its distribution

How could I install Hortonworks' HDP?

I am new on this and I want to know how could I install the solution provided by Hortonworks, HDP, (http://hortonworks.com/products/data-center/hdp/) following the next specification: I have 2 Virtual Machines and another local machine to work with, and I want to use the 2 VM as Master node and Worker node by the time I configure Apache SPARK.
But my question is: what do I have to do to install HDP correctly? I have to install te solution in my local machine and configure Apache SPARK to use those 2 Virtual Machines as Master node and Worker node? Or I must to install HDP in the 3 machines that I have?
I repeat that I am new on this, and it wold be very helpfull for me any answer or comment that you could give.
Thank you, so much!
if you are trying to deploy HDP on a cluster (MultiNode Enviroment)
use Apache Ambari to Install HDP and other services such as spark.
I have tried it on centos environment below are the steps and links for installation.
get the repo by using the command below
wget
http://public-repo-1.hortonworks.com/ambari/centos7/2.x/updates/2.2.2.0/ambari.repo
Install Ambari by using the command below
yum install ambari-server
Start the server by using the command below
ambari-server start
Now you can setup your by going to web browser and typing
http://:8080.
Now you can select the desired HDP version. Add the other nodes and services and deploy your cluster.
Detailed steps for installation can be found here

How to install impala on an already running hadoop cluster

I have an already up and running Hadoop, 5-node cluster. I want to install Impala on the HDFS cluster without the Cloudera Manager. Can anyone supposedly guide me through the process or a link of the same.
Thanks.

Setting up Hadoop Client on Mac OS X

Currently, I have 3-node cluster running CDH 5.0 using MRv1. I am trying to figure out how to setup Hadoop on my Mac. So, I can submit jobs to the cluster. According to the "Managing Hadoop API Dependencies in CDH 5", you just need the files in /usr/lib/hadoop/client-0.20/* Do I need the following files too? Does Cloudera has hadoop-client in tarball?
- core-site.xml
- hdfs-site.xml
- mapred-site.xml
Yes, I'nk you can make use of cloudera tarball for setting up hadoop client, the same can be downloaded from the following path, configuration files are availble under etc/hadoop/ directory under Hadoop, just need to modify those files according to your environment.
http://archive-primary.cloudera.com/cdh5/cdh/5/hadoop-2.2.0-cdh5.0.0-beta-2.tar.gz
If the above link doesn't match your version, use the following link for getting the available hadoop versions
http://archive-primary.cloudera.com/cdh5/cdh/5/

where is the hadoop task manager UI

I installed the hadoop 2.2 system on my ubuntu box using this tutorial
http://codesfusion.blogspot.com/2013/11/hadoop-2x-core-hdfs-and-yarn-components.html
Everything worked fine for me and now when I do
http://localhost:50070
I can see the management UI for HDFS. Very good!!
But the I am going through another tutorial which tells me that there must be a task manager UI running at http://mymachine.com:50030 and http://mymachine.com:50060
on my machine I cannot open these ports.
I have already done
start-dfs.sh
start-yarn.sh
start-all.sh
is something wrong? why can't I see the task manager UI?
You have installed YARN (MRv2) which runs the ResourceManager. The URL http://mymachine.com:50030 is the web address for the JobTracker daemon that comes with MRv1 and hence you are not able to see it.
To see the ResourceManager UI, check your yarn-site.xml file for the following property:
yarn.resourcemanager.webapp.address
By default, it should point to : resource_manager_hostname:8088
Assuming your ResourceManager runs on mymachine, you should see the ResourceManager UI at http://mymachine.com:8088/
Make sure all your deamons are up and running before you visit the URL for the ResourceManager.
For Hadoop 2[aka YARN/MRV2] - Any hadoop installation version-ed 2.x or higher its at port number 8088. eg. localhost:8088
For Hadoop 1 - Any hadoop installation version-ed lower than 2.x[eg 1.x or 0.x] its at port number 50030. eg localhost:50030
By default HadoopUI location is as below
http://mymachine.com:50070

Resources