Suitable Hadoop framework for Ubuntu

I want to start working with Hadoop and Big Data, and I need an easy graphical interface to begin with. I tried Hue but couldn't get it configured.
Please help me choose a suitable Hadoop distribution.
I use Ubuntu 14.04.

I think the Cloudera VM or the Sandbox (by Hortonworks) is the easy way. The hard way is installing directly on Ubuntu. I also run Ubuntu 14.04, with Hadoop (Hive, Pig) and Apache Spark installed natively, so I don't need to open a virtual machine.

There are 3 major Hadoop distributions that you can start with.
Cloudera
Hortonworks
MapR
Each of them has a UI installer and manager. I think the best option for you, though, would be to use the virtual environments that these vendors provide.
The Hortonworks Developer Sandbox is an image that includes Hue as its UI to get you started. However, the downloadable sandbox image is based on CentOS.
If you want to install a Hortonworks distribution on Ubuntu, you need to run an Ambari installation (Downloads - Hortonworks Hadoop). Be aware that Hue is not included in the default Ambari installation, but it can easily be installed separately. To run properly, Hue on Hortonworks still needs Python 2.6.x.

There are distributions such as Cloudera and Hortonworks, but their packages need a high-spec machine, for example 16 GB of RAM or more, which is not always available to the user. In addition, they include some Hadoop-related projects that the user may not need at all. If you want to enter this field seriously, I strongly recommend installing Hadoop on your own. Doing so, you will work through the configuration yourself and become familiar with many Hadoop concepts.
You can start using this install tutorial.
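To check the memory point above before downloading a vendor image, a minimal sketch along these lines could help. It assumes a Linux host that exposes /proc/meminfo; the function names are made up for illustration, and the 16 GB default is simply the figure mentioned above, not an official vendor requirement.

```python
def mem_total_gib(meminfo_text):
    """Return the MemTotal value from /proc/meminfo-style text, in GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # The line looks like: "MemTotal:       16314520 kB"
            kib = int(line.split()[1])
            return kib / (1024 * 1024)
    raise ValueError("MemTotal not found in the given text")


def enough_for_distribution(meminfo_text, required_gib=16):
    """Hypothetical helper: True if total RAM meets the assumed threshold.

    Note that a machine sold as "16 GB" usually reports slightly less
    than 16 GiB here, because some memory is reserved by the system.
    """
    return mem_total_gib(meminfo_text) >= required_gib


# Example with a hard-coded sample instead of the real file:
sample = "MemTotal:        8388608 kB\nMemFree:         1024000 kB\n"
print(enough_for_distribution(sample))
```

On a real machine you would pass in the contents of /proc/meminfo, for example `enough_for_distribution(open("/proc/meminfo").read())`.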

Related

Are there different ways to install Cloudera Hadoop packages?

Can I only install packages via RPM (Red Hat Package Manager)?
I'm using Cloudera and have heard a couple of times about CDH Parcel Services, but I'm not sure whether I can use that too. Or is there another mechanism?
Best regards
If you're using Debian or Ubuntu, then you'd use DEB packages, not RPM.
Parcels should work, and so would compiling the code from source.
Just because you're running Cloudera doesn't make the system any less of a regular Linux machine.
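As a rough illustration of the DEB-versus-RPM point, the sketch below guesses the native package format from the contents of /etc/os-release. It assumes the distribution ships that file (Ubuntu 14.04 does); the function names and the two family sets are illustrative and far from complete.

```python
# Distribution IDs as they appear in the ID / ID_LIKE fields of /etc/os-release.
# These sets are a small, assumed sample, not an exhaustive list.
DEB_FAMILY = {"debian", "ubuntu"}
RPM_FAMILY = {"rhel", "centos", "fedora", "suse", "sles", "opensuse"}


def parse_os_release(text):
    """Parse KEY=value lines into a dict, dropping surrounding quotes."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip("\"'")
    return fields


def package_format(os_release_text):
    """Return "deb", "rpm", or "unknown" for the described distribution."""
    fields = parse_os_release(os_release_text)
    # ID names the distribution itself; ID_LIKE lists related distributions.
    ids = [fields.get("ID", "")] + fields.get("ID_LIKE", "").split()
    for distro_id in ids:
        if distro_id in DEB_FAMILY:
            return "deb"
        if distro_id in RPM_FAMILY:
            return "rpm"
    return "unknown"


# Example with a hard-coded sample instead of the real file:
sample = 'NAME="Ubuntu"\nID=ubuntu\nID_LIKE=debian\n'
print(package_format(sample))
```

On a real system you would pass in the contents of /etc/os-release instead of the sample string.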
Parcels are Cloudera's way of installing its distribution. You install Cloudera Manager, and it installs all of the components using parcels (although I think you have a choice). This is done through a GUI (or an API) and is probably the easiest way to go about it.
If you are just learning, the Quick Start VM is not a bad way to get started.
I have installed Cloudera using parcels, and it is easier than package installation. Cloudera Manager picks up parcels conveniently for installation, and almost everything is ready once the parcel installation is done.

Cloudera support for Docker containers, or Docker support for a CM 5 image

My organization has recently been considering Docker. Our group is using Cloudera CDH 5.1.2.
1) Is Cloudera compatible with Docker containers?
2) Are there any known issues with the combination of Docker and Cloudera?
I could not find any topic on Docker in this forum.
Any pointers would be helpful.
Thanks,
Amit
An official answer from Cloudera has been posted here:
I read through what Docker is yesterday. I do not think this has
been tested; there are a number of platform virtualization projects in
progress, but I did not see this on the list.
Looking at its intent, it might work, but you would definitely want to
test. The thing I'm concerned about is the level of effort to
normalize between distribution types, as there is a large volume of
subcomponents that are brought directly into the CDH "Parcel" that are
platform specific.
You might be able to get a CM server and agents deployed in a generic
way, but then you would want CM to manage the deployment of CDH parcel
across the target "cluster" once it was online, rather than
abstracting that install as well.
The bottom line is that installing Cloudera Manager inside a Docker container does not seem to be an easy route, because CM needs to manage the installation of the other Hadoop components.
Other options include:
Using Vagrant to create a CDH VM with Cloudera Manager (Cloudera Documentation Link)
Managing CDH components manually without Cloudera Manager (Cloudera Documentation Link)

How to install Cloudera Manager without a CDH installation?

I have a Hadoop environment installed from the tarball that I downloaded from http://hadoop.apache.org/releases.html#Download.
Now I need to use Cloudera Manager to monitor my MapReduce application.
Is it possible to use Cloudera Manager without a CDH installation?
Cloudera Manager is useless without CDH. Any reason why you would not just use that? Usually the question is the other way around ("I have CDH installed, do I need to use Cloudera Manager?")

How to start working with Hadoop

Hi, I want to learn Hadoop. I have a basic idea of how Hadoop works with the MapReduce framework.
Now I want to practice on my local PC, so I want to know how to install Hadoop on a single node.
I installed VMware Workstation 10 and tried to install a Linux operating system on it in order to install Hadoop, but I am not able to load Ubuntu into VMware Workstation. I get an error message like "Exiting intel ...,Operating Not found".
Can anyone please provide the steps to get started with a Hadoop installation?
Should I go with one of the distributions (Cloudera, Hortonworks, MapR)? If that is simpler, please tell me how to install them. (I even tried importing the Cloudera VMware file into VMware Workstation, but it did not work for me.)
You can use the VM given by Udacity for its course on Hadoop. I found it really easy to set up.

What version of hadoop to install and run?

After reading this article...
http://blog.cloudera.com/blog/2012/01/an-update-on-apache-hadoop-1-0/
If I were to make a brand new installation of Hadoop to work with, is it still 0.23 today that has all the features? Or is there a better version out there now that has all the features and the best performance? There are so many guides out there that use 0.20 that it makes it seem as if 1.0 is not to be trusted.
Here is a guide I have followed at least three times to install and run on single node and two-node clusters and Michael does a pretty good job of keeping it current:
Running Hadoop on Ubuntu Linux (Single-Node Cluster)
Running Hadoop on Ubuntu Linux (Multi-Node Cluster)
This guide uses Hadoop version 1.0.3, released in May 2012. The latest stable release as of this writing is 1.1.2, but for a first install, a guide like the one above may help you become familiar with the system; you can then upgrade to the latest release once you have a reference point.
Check the Hadoop documentation for the status of the different releases. As of now 1.0.4 is the stable release.
I came across this tutorial for setting up a single-node cluster on Ubuntu 12.04:
http://preciselyconcise.com/apis_and_installations/hadoop_installation.php
I followed the tutorial and successfully installed Hadoop 1.1.2 on my Linux system.

Resources