Master and Slave system OS version - hadoop

I'm trying to create my own Hadoop cluster. All my data nodes have Ubuntu 18 installed, and the name node runs Ubuntu 14.
Is it mandatory that the name node and the data nodes have the same version of the OS?

It is recommended to have at least the same major version, to avoid kernel-level issues. If you come across these low-level problems, they are very difficult to debug.

As #piyush-p said, it's not recommended, but as long as you are running the same Java version across all the hosts you should be okay. You probably won't want to do this if you are using a commercial distribution of Hadoop (HDP, Cloudera), as their respective setup tools (Ambari, Cloudera Manager) will probably disallow it.
See HDP Support for mix of OS Releases within a cluster for more details.
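To make the "same Java version across all the hosts" check concrete, here is a minimal sketch (not from the original answers) that compares java -version output over SSH; the hostnames are placeholders and passwordless SSH is assumed.

    # Sketch: verify every cluster host reports the same Java version.
    # Hostnames below are hypothetical; passwordless SSH is assumed.
    import subprocess

    HOSTS = ["namenode", "datanode1", "datanode2"]

    def java_version(host):
        # "java -version" prints to stderr, so read stderr here
        result = subprocess.run(["ssh", host, "java", "-version"],
                                capture_output=True, text=True)
        return result.stderr.splitlines()[0] if result.stderr else "unknown"

    versions = {host: java_version(host) for host in HOSTS}
    for host, version in versions.items():
        print(f"{host}: {version}")
    if len(set(versions.values())) > 1:
        print("WARNING: Java versions differ across hosts")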

Related

Suitable hadoop framework for ubuntu

I want to start working with Hadoop and big data, and I need an easy graphical interface to get started. I tried Hue but couldn't get it configured.
Please help me choose a suitable Hadoop setup.
I use Ubuntu 14.04.
I think Cloudera or the Hortonworks Sandbox is the easy way; the hard way is installing directly on Ubuntu. I also run Ubuntu 14.04 with Hadoop (Hive, Pig) and Apache Spark installed, and I don't need to open a virtual machine.
There are 3 major Hadoop distributions that you can start with.
Cloudera
Hortonworks
MapR
Each one of them has a UI installer and manager. I think the best option for you, though, would be to use the virtual environments that these vendors provide.
The Hortonworks Developer Sandbox is an image including Hue as UI to get started. However, the downloadable sandbox image is based on CentOS.
If you want to install a Hortonworks distribution on Ubuntu, you need to run an Ambari installation (Downloads - Hortonworks Hadoop). Be aware that Hue is not included in the default Ambari installation, but it can easily be installed separately. To run properly, Hue on Hortonworks still needs Python 2.6.x.
There are distributions like Cloudera or Hortonworks, but their packages need a high-end machine configuration, for example 16 GB+ of RAM, which is not always available to the user. In addition, they include some Hadoop-related projects that the user doesn't need at all. If you want to enter this field seriously, I strongly recommend installing Hadoop on your own. In doing so you will work through the configuration yourself and become familiar with many Hadoop concepts.
You can start using this install tutorial.
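If you do install Hadoop by hand, one quick way to confirm HDFS came up is the WebHDFS REST API. A rough sketch, assuming a single-node install with the NameNode web port at its Hadoop 1.x/2.x default of 50070 (Hadoop 3.x uses 9870) and WebHDFS enabled:

    # Sanity check for a fresh install: list the HDFS root over WebHDFS.
    # Assumes the NameNode web UI is on localhost:50070 and
    # dfs.webhdfs.enabled is true.
    import json
    from urllib.request import urlopen

    url = "http://localhost:50070/webhdfs/v1/?op=LISTSTATUS"
    with urlopen(url) as response:
        statuses = json.load(response)["FileStatuses"]["FileStatus"]
    for entry in statuses:
        print(entry["type"], entry["pathSuffix"])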

How do I install Cloudera CDH on 100 Node cluster without using Cloudera manager?

How do I install Cloudera CDH on a 100-node cluster without using Cloudera Manager? Installing and configuring CDH manually on each node in a cluster is a difficult task. What tools and technologies are used to automate this task in production?
CDH supports both parcel-based and package-based installation. You can use configuration management tools such as Puppet or Chef to do a package-based install if you wish. However, the recommended way is to use Cloudera Manager to do a parcel-based installation. Cloudera Manager provides many features out of the box, including monitoring, configuration versioning, wizard-based security configuration, rolling upgrades, etc. If your reason for not using Cloudera Manager is that it is not open source, please note:
There is a free version of CM (some enterprise features are not available).
CM is just a management tool. Your data is still stored on HDFS, and your big data applications (Hive scripts, Spark/MapReduce applications, etc.) still run on the standard open source Hadoop platform, so there is no vendor lock-in.
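As one concrete illustration of what CM offers beyond installation, it also exposes a REST API that is handy for verifying a cluster of this size. A sketch that lists the registered hosts so you can confirm all 100 nodes checked in; the server name and credentials are placeholders, and the API version (v10 here) must match your CM release:

    # List the hosts registered with Cloudera Manager via its REST API.
    # CM's default port is 7180; admin/admin is the out-of-the-box login
    # and should be changed in production.
    import json
    from base64 import b64encode
    from urllib.request import Request, urlopen

    CM_HOST = "cm-server.example.com"  # placeholder
    auth = b64encode(b"admin:admin").decode()

    req = Request(f"http://{CM_HOST}:7180/api/v10/hosts",
                  headers={"Authorization": f"Basic {auth}"})
    with urlopen(req) as response:
        hosts = json.load(response)["items"]

    print(f"{len(hosts)} hosts registered with Cloudera Manager")
    for h in hosts:
        print(h["hostname"], h.get("healthSummary", ""))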

Cloudera support for docker container or Docker support for CM 5 image

Recently my org has been considering Docker. Our group is using Cloudera CDH 5.1.2.
1) Is Cloudera compatible with Docker containers?
2) Are there any known issues with the Docker and Cloudera combination?
I could not find any topic on Docker in this forum.
Any pointer would be helpful.
Thanks,
Amit
An official answer from Cloudera has been posted here:
I read through what Docker is, yesterday. I do not think this has been tested; there are a number of platform virtualization projects in progress, but I did not see this on the list.
Looking at its intent, it might work, but you would definitely want to test. The thing I'm concerned about is the level of effort to normalize between distribution types, as there is a large volume of subcomponents that are brought directly into the CDH "Parcel" that are platform specific.
You might be able to get a CM server and agents deployed in a generic way, but then you would want CM to manage the deployment of the CDH parcel across the target "cluster" once it was online, rather than abstracting that install as well.
Bottom line: installing Cloudera Manager inside a Docker container does not seem to be an easy route, because CM needs to manage the installation of the other Hadoop components.
Other options include:
Using Vagrant to create a CDH VM with Cloudera Manager (Cloudera Documentation Link)
Managing CDH components manually without Cloudera Manager (Cloudera Documentation Link)
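For experimentation only, Cloudera later published a single-node cloudera/quickstart image on Docker Hub. A hedged sketch using the Docker SDK for Python; the image name and startup script follow Cloudera's published quickstart instructions, so verify them against the current documentation:

    # Start the single-node CDH quickstart container (pip install docker).
    # Suitable for experimentation only, not for a production cluster.
    import docker

    client = docker.from_env()
    container = client.containers.run(
        "cloudera/quickstart:latest",
        "/usr/bin/docker-quickstart",    # startup script inside the image
        hostname="quickstart.cloudera",  # services expect this hostname
        privileged=True,
        tty=True,
        stdin_open=True,
        detach=True,
    )
    print("started container", container.id)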

Hadoop installation on asymmetric OS?

I have CentOS on two nodes and Ubuntu on two other nodes. Can I install Cloudera 4.5 or later on these servers?
I have searched the internet but could not find any relevant information.
How can I install Cloudera on these 4 servers?
No, we cannot install Hadoop on a cluster with heterogeneous operating systems.
This is one of the limitations of Hadoop.

What version of hadoop to install and run?

After reading this article...
http://blog.cloudera.com/blog/2012/01/an-update-on-apache-hadoop-1-0/
If I were to make a brand new installation of Hadoop to work with, is it still 0.23 today that has all the features? Or is there a better version out now that has everything and captures all the features and performance? There are so many guides out there that use 0.20, which makes it seem as if 1.0 is not to be trusted.
Here is a guide I have followed at least three times to install and run single-node and two-node clusters; Michael does a pretty good job of keeping it current:
Running Hadoop on Ubuntu Linux (Single-Node Cluster)
Running Hadoop on Ubuntu Linux (Multi-Node Cluster)
This uses Hadoop version 1.0.3, released in May 2012. The latest stable release as of this writing is 1.1.2, but if you want to do a first install to test and become familiar, a guide like the one above may help you get to know the system; you can then upgrade to the latest release once you have a reference point.
Check the Hadoop documentation for the status of the different releases. As of now 1.0.4 is the stable release.
I came across this tutorial for setting up a single-node cluster on Ubuntu 12.04: http://preciselyconcise.com/apis_and_installations/hadoop_installation.php. I followed the tutorial and successfully installed Hadoop 1.1.2 on my Linux system.
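Whichever release you pick, you can confirm what actually got installed. A small sketch that shells out to the standard hadoop CLI, assuming the hadoop binary is on your PATH:

    # Print the installed Hadoop release by parsing "hadoop version" output.
    import subprocess

    output = subprocess.run(["hadoop", "version"],
                            capture_output=True, text=True).stdout
    # The first line looks like "Hadoop 1.1.2"
    print(output.splitlines()[0])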
