Install Spark 1.5 in existing Hortonworks HDP Cluster - hadoop

I'm new to Hadoop and want to find out how to install Spark 1.5.1 on an existing Hadoop cluster: 4 nodes, Ubuntu 14.04, Hadoop 2.3.2, Ambari 2.1.2.1. I followed a tutorial, but it only offers Spark packages for Ubuntu 12, which I cannot install on our system, so I got stuck after step 1:
sudo apt-get install spark_2_3_2_1_12-master -y
It fails with this error:
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package spark_2_3_2_1_12-master
Can anyone provide some guidelines on how to install 1.5?
We currently have Spark 1.4 installed and running, but we need functionality that is only available in 1.5.

Ubuntu 14.04 Trusty Tahr is not officially supported by HDP. If you look at the HDP stack public repos available for stack updates, they only cover CentOS, Red Hat, and Oracle Linux. Did you try building the Spark 1.5 source against your Hadoop install with SBT (Simple Build Tool)? You would need to set SPARK_HADOOP_HOME=your hadoop location. See this for a step-by-step guide using Ubuntu 14.04 and an earlier version of Spark. I don't see why the same steps would fail with Spark 1.5.
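A minimal sketch of the build step, under assumptions that are not in the answer above: it uses the Maven wrapper (build/mvn) that ships in the Spark source tree rather than SBT, and it assumes the hadoop-2.6 build profile, which the Spark 1.5 build documentation lists for Hadoop 2.6.x and later. The helper name spark_build_cmd and the version 2.7.1 are placeholders; check your own version with `hadoop version`. The function only prints the command, so you can review it before running it from the unpacked Spark source directory.

```shell
#!/bin/sh
# Print (do not run) a Maven command that builds Spark with YARN support
# against a specific Hadoop version.
# Assumption: the hadoop-2.6 profile is the right one for Hadoop 2.6.x and later.
spark_build_cmd() {
    hadoop_version="$1"   # e.g. the version reported by `hadoop version`
    printf 'build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=%s -DskipTests clean package\n' "$hadoop_version"
}

# Example with a placeholder version; substitute your cluster's version.
spark_build_cmd 2.7.1
```

A build like this lives alongside the Spark 1.4 that is already installed, so it can be tried out without touching the packaged version.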

Related

Updating individual CDH Components in a Community Edition via '1 Click Installer'

Can someone let me know if it is possible to update an individual CDH component from 5.7 to 5.13 via the "1 Click Installer" for Community Edition?
For example, let's say I want to update only hadoop-hdfs-datanode to the latest version on a server. If I run sudo apt-get install hadoop-hdfs-datanode, it also updates the other CDH components running on that node (such as resource-manager, node-manager, etc.).
As discussed here, when I try to upgrade hadoop-yarn-resourcemanager, it upgrades almost all of the CDH Hadoop components:
support#platform1:~$ sudo apt-get install hadoop-yarn-resourcemanager
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following extra packages will be installed:
hadoop hadoop-0.20-mapreduce hadoop-client hadoop-conf-pseudo hadoop-hdfs
hadoop-hdfs-datanode hadoop-hdfs-journalnode hadoop-hdfs-namenode
hadoop-hdfs-secondarynamenode hadoop-hdfs-zkfc hadoop-mapreduce
hadoop-mapreduce-historyserver hadoop-yarn hadoop-yarn-nodemanager
The following packages will be upgraded:
hadoop hadoop-0.20-mapreduce hadoop-client hadoop-conf-pseudo hadoop-hdfs
hadoop-hdfs-datanode hadoop-hdfs-journalnode hadoop-hdfs-namenode
hadoop-hdfs-secondarynamenode hadoop-hdfs-zkfc hadoop-mapreduce
hadoop-mapreduce-historyserver hadoop-yarn hadoop-yarn-nodemanager
hadoop-yarn-resourcemanager
15 upgraded, 0 newly installed, 0 to remove and 16 not upgraded.
"it is updating other CDH component also running in that node"
I doubt it is upgrading everything on the node, only the services that depend on the Hadoop client being upgraded.
If you were to install Hadoop all by itself, it includes HDFS, MapReduce, YARN, and the Hadoop client libraries. Therefore, it makes sense that upgrading the datanode package would try to grab those, but not HBase, Hive, Pig, Spark, Oozie, etc. packages.
Essentially, you need to ensure all your Hadoop client libraries are the same version. CDH itself hasn't moved off of Hadoop 2.6.0 between those releases, although it has added patches to that base release, so it might be fine to upgrade.
However, let's take HBase as an example. According to its documentation, Hadoop 2.6.0, 2.7.0, and 2.8.x are not supported, and Hadoop 3.x is not tested; only 2.6.1+ or 2.7.1+ are supported.
It goes on to say:
In distributed mode, it is critical that the version of Hadoop that is out on your cluster match what is under HBase... Make sure you replace the jar in HBase across your whole cluster. Hadoop version mismatch issues have various manifestations but often all look like its hung
All component upgrades should be followed through, and Cloudera makes the effort to ensure all components of a single release work together, not mixed across releases.
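One way to check whether the Hadoop packages on a node ended up at the same version is sketched below. The helper name versions_consistent is hypothetical, and the version strings in the demo are made up; the function reads "package version" pairs on standard input and fails if it sees more than one distinct version.

```shell
#!/bin/sh
# Read "package version" lines on stdin.
# Exit 0 if every line carries the same version, 1 otherwise.
versions_consistent() {
    awk 'NF >= 2 { seen[$2] = 1 }
         END { n = 0; for (v in seen) n++; exit (n > 1) }'
}

# Demo on sample data (made-up version strings):
printf 'hadoop 2.6.0\nhadoop-hdfs 2.6.0\n' | versions_consistent && echo "consistent"
printf 'hadoop 2.6.0\nhadoop-yarn 2.5.0\n' | versions_consistent || echo "mixed versions"
```

On a Debian or Ubuntu node, the pairs could come from dpkg-query -W -f='${Package} ${Version}\n' 'hadoop*'. Build suffixes may differ between packages of the same release, so treat a failure as a prompt to inspect the list rather than as proof of a problem. To preview what a single install would pull in before committing to it, apt-get -s install hadoop-hdfs-datanode runs a simulation only.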

Failed dependencies when install pxf service

When I install the PXF service RPM in HAWQ, I get these errors:
error: Failed dependencies:
hadoop >= 2.6.0 is needed by pxf-service-0:3.0.0-root.noarch
hadoop-hdfs >= 2.6.0 is needed by pxf-service-0:3.0.0-root.noarch
What's your advice here?
Please make sure the PXF RPM matches your OS version and architecture. For example, if the PXF RPM was built for RHEL 6 and you are installing on RHEL 7, you may see dependency issues.
Could you please check which version of Hadoop you are running in the cluster? I guess you might be running an older version. You need at least Hadoop 2.6 to run the current version of PXF.
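The dependency error states a minimum version, so a quick comparison shows whether a given Hadoop version satisfies it. This is a minimal sketch that assumes GNU sort (for the -V option) is available; version_ge is a hypothetical helper name, and the sample versions are illustrative.

```shell
#!/bin/sh
# Succeed when version $1 is greater than or equal to version $2.
# Relies on GNU sort -V (version sort); the smaller version sorts first.
version_ge() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Demo with sample versions; 2.6.0 is the minimum from the error message.
version_ge 2.7.1 2.6.0 && echo "2.7.1 satisfies hadoop >= 2.6.0"
version_ge 2.5.2 2.6.0 || echo "2.5.2 is too old"
```

On an RPM-based system, first confirm with rpm -q hadoop that the package is actually registered; if it is, rpm -q --qf '%{VERSION}' hadoop prints the version to pass as the first argument. If rpm reports that hadoop is not installed, Hadoop was probably installed some other way, and the RPM dependency check cannot be satisfied whatever version is running.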
The wiki here uses the Bigtop RPMs for Hadoop:
https://cwiki.apache.org/confluence/display/HAWQ/Build+Package+and+Install+with+RPM
This means that if I install HAWQ 2.2.0 from RPM, other ways of installing Hadoop (binary distributions without RPM, such as a tarball) are not supported.
If I install Hadoop from a tarball, I must build HAWQ from source for now.
Please refer to:
https://issues.apache.org/jira/browse/HAWQ-1568

Upgrading Hadoop from version CDH4 to CDH5

I need to upgrade Hadoop from CDH4 to CDH5. I have 5 nodes.
Can I upgrade using Cloudera Manager using parcels?
What is the easiest way to upgrade Hadoop? Can someone provide me steps?
Thanks,
Raj Baba
In the Cloudera Hadoop distribution, Cloudera Manager (CM) and CDH are separate components. To upgrade from CDH4 to CDH5, you first need to upgrade Cloudera Manager to CM5. You cannot use parcels to upgrade Cloudera Manager itself, because it is the base component. Depending on your Linux distribution, you can use yum or apt to upgrade CM.
If you are using CentOS or RHEL, you can use yum. Update your /etc/yum.repos.d/cloudera-manager.repo file as follows:
[cloudera-manager]
# Packages for Cloudera Manager, Version 5, on RedHat or CentOS 6 x86_64
name=Cloudera Manager
baseurl=http://archive-primary.cloudera.com/cm5/redhat/6/x86_64/cm/5.2.0/
gpgkey = http://archive.cloudera.com/cm5/redhat/6/x86_64/cm/RPM-GPG-KEY-cloudera
gpgcheck = 0
After updating this file, run yum upgrade cloudera-manager-* to upgrade.
Once CM is at version 5, parcels are the best option for upgrading CDH4 to CDH5. The following Cloudera documentation describes the upgrade:
http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cm_mc_upgrade_tocdh5_using_parcels.html

Ambari install script location(s)

I'm setting up a HDP 2.1 cluster with Apache Ambari. All servers run SLES 11 SP3. The setup fails if I select to install Ganglia because of some dependencies:
Installing package apache2?mod_php* ('/usr/bin/zypper --quiet install --auto-agree-with-licenses --no-confirm apache2?mod_php*')
Problem: apache2-mod_php53-5.3.17-0.27.1.x86_64 conflicts with apache2-mod_php5 provided by apache2-mod_php5-5.2.14-0.7.30.50.1.x86_64
Solution 1: Following actions will be done:
do not install apache2-mod_php5-5.2.14-0.7.30.50.1.x86_64
deinstallation of php5-5.2.14-0.7.30.50.1.x86_64
deinstallation of php5-xmlwriter-5.2.14-0.7.30.50.1.x86_64
[... more PHP 5.2.x packages ...]
Solution 2: do not install apache2-mod_php53-5.3.17-0.27.1.x86_64
Apparently the regex picks the 5.3 version, even though a 5.2 version is available.
So my question is: where is the install script that Ambari runs here stored? I would like to replace the regex with the correct version of the package.
Information about what packages are to be installed is stored in
/var/lib/ambari-server/resources/stacks/HDP/2.0.6/services/GANGLIA/metainfo.xml
Change the value and restart the Ambari Server for the changes to take effect.
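To find the entry to change, you can search that file for the package pattern shown in the failing command. This is a minimal sketch: find_package_entry is a hypothetical helper, and the search string mod_php is taken from the error output in the question. Back up the file before editing it.

```shell
#!/bin/sh
# Print line numbers and lines in a file that mention a package pattern.
# -F treats the pattern as a fixed string, so '?' and '*' are not special.
find_package_entry() {
    file="$1"
    pattern="$2"
    grep -n -F -- "$pattern" "$file"
}

# Path from the answer above; adjust if your stack definition lives elsewhere.
METAINFO=/var/lib/ambari-server/resources/stacks/HDP/2.0.6/services/GANGLIA/metainfo.xml
if [ -f "$METAINFO" ]; then
    find_package_entry "$METAINFO" 'mod_php' || echo "no match found"
fi
```

After editing the matching line, ambari-server restart applies the change, as the answer describes.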

How to download CDH4 setup manually

How can I download the CDH4 setup manually?
I mean I want to download the setup without using apt-get from the ubuntu command prompt.
CDH4 is Cloudera's distribution of Apache Hadoop. CDH is a collection of Apache Hadoop and several components of its ecosystem.
Assuming that you are asking for the source, each component can be downloaded as a tarball from the following location:
CDH Packaging and Tarball Information
