HDP 2.1 to 2.2 upgrade RHEL6 - hadoop

I have a cluster with 1 NameNode and 4 DataNodes on Red Hat Enterprise Linux 6. My HDP version is 2.1. Ambari was at version 1.7, but I upgraded it to 2.1. I want to upgrade HDP to version 2.2. I have read that if I want to upgrade HDP from 2.1 to 2.2, I have to do it before upgrading Ambari to 2.1. When I upgrade HDP to 2.2, Ambari does not see any of the changes and nothing works. I am using this tutorial:
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.2.0/HDP_Man_Upgrade_v22/index.html#Item1
How can I do this? I tried to downgrade Ambari to 1.7 but got many errors. What if I upgrade HDP to 2.2 now and then upgrade Ambari from 2.1 to 2.1.1? Will that work? The problem is that I have very little time.
Thank you in advance

I am upgrading from HDP-2.0/Hadoop-2.2 to HDP-2.2/Hadoop-2.6 (maybe temporarily, on the way to HDP-2.3) on a development/testing cluster. So far I have gotten the upgraded HDFS up and running. I am not yet starting/stopping HDFS via Ambari, nor do I have YARN running yet.
Update: I got YARN, MapReduce, and Hive to run after finding HDP-2.2 documentation (current links added below).
Here are my rough notes on how I got this far (a consolidated command sketch follows the notes):
upgrade ambari-1.4 to 1.7
update the HDP yum repo file on each node and upgrade the packages with yum (via cssh)
http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.2.6.0/hdp.repo
hdp-select
sudo su -l hdfs -c "/usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh start namenode -upgrade"
check /etc/hadoop/conf.empty/core-site.xml.rpmsave and hdfs-site.xml (configs that rpm saved aside during the package upgrade)
sudo su -l hdfs -c "/usr/hdp/current/hadoop-hdfs-datanode/../hadoop/sbin/hadoop-daemon.sh start datanode"
hadoop dfsadmin -finalizeUpgrade
[update]
upgrade ambari-1.7 to 2.1
configure yarn & mapreduce - yarn-site.xml, yarn-env.sh[?], container-executor.cfg, mapred-site.xml
sudo ln -s /usr/hdp/2.2.6.0-2800/hadoop/libexec/ /usr/lib/hadoop/ # Ambari insists on /usr/lib/hadoop
stop-start resourcemanager, start nodemanagers
hive-site.xml in both conf.dist & conf.server; manual start
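As a consolidated sketch, the repo/selector part of the notes above looked roughly like this on each node. The repo file placement, the yum package list, and the exact hdp-select invocation are assumptions based on the HDP manual-upgrade docs; 2.2.6.0-2800 was simply the build on my cluster:
sudo wget -nv http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.2.6.0/hdp.repo -O /etc/yum.repos.d/hdp.repo
sudo yum clean all
sudo yum upgrade "hadoop*"            # the note above just says "update with yum"; the exact package list is an assumption
sudo hdp-select set all 2.2.6.0-2800  # point the /usr/hdp/current symlinks at the new build
# then start the NameNode with -upgrade and the DataNodes as shown in the notes, and finalize the upgrade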
The following resources were useful:
https://cwiki.apache.org/confluence/display/AMBARI/Install+Ambari+1.7.0+from+Public+Repositories
https://developer.ibm.com/hadoop/blog/2015/10/08/back-up-and-restore-ambari-server-postgresql-database/
http://docs.hortonworks.com/HDPDocuments/Ambari-2.1.0.0/bk_Installing_HDP_AMB/content/_hdp_stack_repositories.html
http://docs.hortonworks.com/HDPDocuments/Ambari-2.1.0.0/bk_upgrading_Ambari/content/_Upgrade_HDFS_mamiu.html
https://wiki.apache.org/hadoop/Hadoop_Upgrade
http://docs.hortonworks.com/HDPDocuments/Ambari-2.1.0.0/bk_upgrading_Ambari/content/_complt_upgrd_21-23_upgrade_hdfs.html
http://solaimurugan.blogspot.com/2014/11/upgrade-hadoop-with-latest-version.html
https://cwiki.apache.org/confluence/display/AMBARI/Install+Ambari+2.0.2+from+Public+Repositories
[update]
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.2.0/bk_upgrading_hdp_manually/content/index.html
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.2.0/bk_upgrading_hdp_manually/content/configure-yarn-mr-21.html
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.2.0/bk_upgrading_hdp_manually/content/start-hive-hcat-21.html
https://brucebcampbell.wordpress.com/2014/12/11/hortonworks-fix-missing-jar-error-in-hive-after-upgrade-to-hdp-2-2/

Related

Install ambari on existing single node hadoop

I have a single-node Hadoop setup on Ubuntu (personal dev environment).
Now I want to install Ambari.
Question: Can we install Ambari on an existing Hadoop setup? If yes, please assist me.
No. Ambari requires you to first install Ambari agents, then the Ambari server to monitor the agents and install/configure Hadoop.
In theory, if you installed Hadoop in the same way that Ambari expects, it might work, but adding any node into Ambari will throw lots of warnings if the node is running any extra processes other than the agent.
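For reference, a minimal sketch of what the agent + server install looks like from the public repos, assuming the Ambari yum repo file is already in place (see the Ambari install guide for the repo setup):
sudo yum install ambari-agent         # on every cluster node
sudo ambari-agent start
sudo yum install ambari-server        # on the management node
sudo ambari-server setup              # prompts for JDK, database, and service account
sudo ambari-server start              # then register hosts and add services from the web UI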

Spark clustering with yarn

I would like to set up Spark clustering with YARN.
Do I need to:
1. install Hadoop master and slaves with the YARN config, or
2. install Hadoop master/slaves and YARN master/slaves separately?
If option 1 is OK, I'm going to work with this Docker image (link). Is it suitable for this?
Installing the Hadoop master and slaves with the YARN config is sufficient to run Spark over YARN, but you also need to make sure that the Spark version you download supports YARN. Once installed, Spark should be able to access the YARN configuration, and the YARN-related jar files must also be on Spark's path.
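As a minimal sketch of what that looks like in practice, assuming a YARN-enabled Spark build and that the Hadoop client configs live in /etc/hadoop/conf (the example jar path varies by Spark version):
export HADOOP_CONF_DIR=/etc/hadoop/conf
export YARN_CONF_DIR=/etc/hadoop/conf
./bin/spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --class org.apache.spark.examples.SparkPi \
    examples/jars/spark-examples_*.jar 100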

how to install apache phoenix to ambari 1.7 with hbase?

I'm new to Hadoop. I want to install Phoenix with HBase, but I installed my Hadoop cluster using Ambari 1.7 on Ubuntu. I'm not able to find any tutorial for doing this.
If you build your own Hadoop stack:
https://phoenix.apache.org/download.html
https://phoenix.apache.org/installation.html
If you use e.g. IBM Open Platform (which is free, by the way):
https://developer.ibm.com/hadoop/blog/2015/10/21/installing-apache-phoenix-ibm-open-platform-apache-hadoop-4-1/
HBase should be available as a service under the Add Service button on the home page.
For installing Phoenix I used this link:
http://dev.hortonworks.com.s3.amazonaws.com/HDPDocuments/HDP2/HDP-2-trunk/bk_installing_manually_book/content/upgrade-22-7-a.html
Basically, yum install phoenix on each node and then create soft links to the Phoenix server jar file.
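Roughly, that amounts to something like the following on each node (the exact jar name and HBase lib path are assumptions; check the linked doc for your HDP version):
sudo yum install phoenix
sudo ln -sf /usr/hdp/current/phoenix-client/phoenix-server.jar /usr/hdp/current/hbase-regionserver/lib/phoenix-server.jar
# restart HBase afterwards so the region servers pick up the Phoenix server jar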
hth

How to find cdh version hadoop

When connecting to a Hadoop cluster, how can I find out which version of Hadoop the cluster is running? In particular, this is important for proper configuration of libraries when compiling and packaging Hadoop Java jobs with Maven.
The simplest way, if you have SSH access to a Hadoop node, is to run the command:
$ hadoop version
If you are looking for the CDH version, then check /usr/lib/hadoop/cloudera/cdh_version.properties
In the CDH cluster I am using, there is no cdh_version.properties (or I couldn't find it).
If your cluster uses "Parcels", you can check which version of CDH is in use by looking in:
/opt/cloudera/parcels
The version appears as the name of the folder:
CDH-5.5.1-1.cdh5.5.1.p0.11
Note: I know this is not a general rule for finding which CDH version is in use; I am just showing an alternative way that worked for me.
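Put together as commands, the checks above look like this (the CDH symlink under the parcels directory is an assumption; Cloudera Manager creates it on parcel-based installs):
hadoop version
cat /usr/lib/hadoop/cloudera/cdh_version.properties
ls /opt/cloudera/parcels
readlink /opt/cloudera/parcels/CDH    # resolves to the active parcel, e.g. CDH-5.5.1-1.cdh5.5.1.p0.11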
We can check the installed version with the following command:
cat /usr/lib/hadoop/cloudera/cdh_version.properties
Hope this helps.

How to start hadoop in a certain version

We are using hadoop-2.0.0-cdh4.0.0, and we start the NameNode using hadoop namenode. How do I start the Hadoop processes in either 0.20 mode or 0.23 mode?
You'll need to install and configure those versions of Hadoop if you want to use them; there is no 'switch' that allows 2.0.0 (or any other Hadoop version) to run in a lower version / mode.
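As an illustration only (the paths below are made-up examples, not an existing layout): each version lives in its own install, and you start daemons from that install's own scripts, e.g. for a separately installed 0.20 release:
export HADOOP_HOME=/opt/hadoop-0.20.2
export HADOOP_CONF_DIR=$HADOOP_HOME/conf
$HADOOP_HOME/bin/hadoop-daemon.sh start namenode    # uses whichever version HADOOP_HOME points at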
