Using CentOS 7 to Set Up a New Cluster - hadoop

I want to set up a Hadoop cluster. Is there any problem with using CentOS 7 as the operating system?
I already use CentOS 6.5, and for this setup I want to use the newest version.

I tested it myself and found this version unstable: I got a lot of crashes and random reboots in the cluster, so I went back to 6.5.

Related

What's the latest version of Cloudera Manager and CDH?

I'm currently running Cloudera Manager 5.14.3 and CDH 5.14.2
and want to upgrade to a higher and more stable version. Which version is the most stable and advisable to upgrade to? Version 6 or version 7?
In general, Cloudera recommends moving to the highest GA version of its big data platform.
However, be advised that Cloudera Manager 7 is actually the manager of the new generation of the platform: CDP.
As such, you may need to check with your account team whether any additional steps are needed to upgrade to that platform. This is likely still worth looking into, as CDH 6 end of life is already only about 18 months away.

Master and Slave system OS version

I'm trying to create my own Hadoop cluster. All my data nodes run Ubuntu 18 and the name node runs Ubuntu 14.
Is it mandatory that the name node and the data nodes run the same OS version?
It is recommended to have at least the same major version, to avoid kernel vulnerabilities. If you run into these low-level issues, they are very difficult to debug.
As @piyush-p said, it's not recommended, but as long as you are running the same Java version across all the hosts you should be okay. You probably won't want to do this if you are using a commercial distribution of Hadoop (HDP, Cloudera), as their respective setup tools (Ambari, Cloudera Manager) will probably disallow it.
See "HDP Support for mix of OS Releases within a cluster" for more details.
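To make the "same Java version across all the hosts" check concrete, here is a minimal Python sketch. It assumes you have already collected the output of `java -version` from each host by whatever means you prefer (SSH, a config management tool, copy and paste); the function names and the host names in the usage note are made up for illustration and are not part of Hadoop or any distribution's tooling.

```python
import re
from collections import defaultdict

# Matches the quoted version in the first line of `java -version` output,
# e.g.  java version "1.8.0_181"  or  openjdk version "11.0.2" 2019-01-15
_VERSION_RE = re.compile(r'version "([^"]+)"')


def parse_java_version(output):
    """Return the quoted version string from `java -version` output, or None."""
    match = _VERSION_RE.search(output)
    return match.group(1) if match else None


def group_hosts_by_java_version(outputs_by_host):
    """Map each detected Java version to the sorted list of hosts reporting it."""
    groups = defaultdict(list)
    for host, output in outputs_by_host.items():
        groups[parse_java_version(output)].append(host)
    return {version: sorted(hosts) for version, hosts in groups.items()}


def versions_consistent(outputs_by_host):
    """True only when every host reports the same, parseable Java version."""
    groups = group_hosts_by_java_version(outputs_by_host)
    return len(groups) == 1 and None not in groups
```

You would call `versions_consistent({"namenode": "...", "datanode1": "..."})` with the captured text per host, and use `group_hosts_by_java_version` to see which hosts are the odd ones out. Note that this only compares the version string; it does not tell OpenJDK and Oracle builds apart.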

How to install Apache Storm on Windows 7

Can anyone tell me how I can install Apache Storm on Windows 7?
I am new to big data, so I need a little help.
Please explain what does not work. Storm is Java + Python, so setting it up on Windows should not be a problem. ZooKeeper runs on Windows just fine. There are many Vagrant / Docker implementations that will work as well. So what problem are you trying to resolve?
BTW, if you are trying to set it up for development, you don't need a cluster. You can run it with local cluster settings (check the Storm documentation).
The general steps are:
Download ZooKeeper
Untar it and configure a single-node cluster
Download Storm
Follow Storm documentation and configure Nimbus/Supervisor settings
Follow Storm documentation to start Nimbus, Supervisor, Storm UI and Log Viewer.
Make sure you read the documentation for both 0.10.0 and 1.0.x. These releases are not compatible, and some of the libs you may want to use will not work with the newer Storm release.
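As a rough illustration of the "configure Nimbus/Supervisor settings" step and of how the two release lines differ, here is a small Python sketch that renders a minimal `storm.yaml`. It relies on my recollection of the setup guides, namely that 0.10.0 uses `nimbus.host` while 1.0.x uses `nimbus.seeds`, so treat that as an assumption and check it against the documentation for your release. The function name, the two slot ports and the example values are illustrative only.

```python
def render_storm_yaml(zookeeper_hosts, nimbus_host, local_dir, storm_version):
    """Return the text of a minimal storm.yaml for a small cluster.

    Assumption to verify in the Storm docs: releases before 1.0 take a
    single `nimbus.host`, while 1.0.x takes a `nimbus.seeds` list.
    """
    major = int(storm_version.split(".")[0])

    lines = ["storm.zookeeper.servers:"]
    lines += ['  - "{}"'.format(host) for host in zookeeper_hosts]

    if major >= 1:
        lines.append('nimbus.seeds: ["{}"]'.format(nimbus_host))
    else:
        lines.append('nimbus.host: "{}"'.format(nimbus_host))

    lines.append('storm.local.dir: "{}"'.format(local_dir))

    # Two worker slots per supervisor; add more ports for more workers.
    lines.append("supervisor.slots.ports:")
    lines += ["  - {}".format(port) for port in (6700, 6701)]

    return "\n".join(lines) + "\n"
```

For a single Windows machine you might call `render_storm_yaml(["localhost"], "localhost", "C:/storm-local", "1.0.2")` and save the result as `conf/storm.yaml`. The forward slashes in the path are deliberate, since a backslash inside a double-quoted YAML string is treated as an escape character.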

Ubuntu Reboot for elasticsearch

I have a 64-bit Ubuntu 12.04 server on Linode. I am running Elasticsearch as a service, along with RabbitMQ and Celery.
The problem is that after my instance had been running for 4 months and 21 days, Elasticsearch suddenly failed to start. After I rebooted Ubuntu, it appears to be working correctly.
Now the question is: what is the standard interval for rebooting an Ubuntu server that is deployed in production?
Should I reboot Ubuntu every 4 or 5 months? How do big production systems handle maintenance?
If everything works OK and you don't update any programs, you should not need to reboot. So the advice is: analyze. Look in the logs.
I had a similar problem when the JVM did not have enough memory.
Elasticsearch configuration: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-configuration.html
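If you suspect the JVM ran out of memory, one quick way to "look in the logs" is to search them for `java.lang.OutOfMemoryError`, which is the error the JVM raises in that case. The Python sketch below does only that; the function names are made up, and the log path is something you have to supply yourself, since it depends on how Elasticsearch was installed.

```python
import re

# The JVM reports heap exhaustion as java.lang.OutOfMemoryError.
OOM_PATTERN = re.compile(r"java\.lang\.OutOfMemoryError")


def find_oom_lines(log_lines):
    """Return (line_number, text) for each line mentioning an OutOfMemoryError."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(log_lines, start=1)
        if OOM_PATTERN.search(line)
    ]


def scan_log_file(path):
    """Scan one log file; undecodable bytes are replaced rather than raising."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_oom_lines(handle)
```

Calling `scan_log_file(path_to_your_elasticsearch_log)` returns an empty list if there are no matches. That does not rule out a memory problem, because the kernel's OOM killer may have ended the process instead, so it is worth checking the system log too.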

What version of hadoop to install and run?

After reading this article...
http://blog.cloudera.com/blog/2012/01/an-update-on-apache-hadoop-1-0/
If I were to make a brand new installation of Hadoop to work with, is 0.23 still the version that has all the features today? Or is there a better version out now that has all the features and the best performance? There are so many guides out there that use 0.20 that it seems as if 1.0 is not to be trusted...
Here is a guide I have followed at least three times to install and run Hadoop on single-node and two-node clusters; Michael does a pretty good job of keeping it current:
Running Hadoop on Ubuntu Linux (Single-Node Cluster)
Running Hadoop on Ubuntu Linux (Multi-Node Cluster)
This uses Hadoop version 1.0.3, released in May 2012. The latest stable release as of this writing is 1.1.2, but if you want to do a first install to test and become familiar with the system, a guide like the one above may help, and you can then upgrade to the latest release once you have a reference point.
Check the Hadoop documentation for the status of the different releases. As of now 1.0.4 is the stable release.
I came across this tutorial for setting up a single-node cluster on Ubuntu 12.04:
http://preciselyconcise.com/apis_and_installations/hadoop_installation.php
I followed the tutorial and successfully installed Hadoop 1.1.2 on my Linux system.
