How to Start working with Hadoop - hadoop

Hi, I want to learn Hadoop. I have a basic idea of how Hadoop works with the MapReduce framework.
Now I want to practice on my local PC, so I want to know how to install Hadoop on a single node.
I installed VMware Workstation 10 and tried to install a Linux operating system in it to host Hadoop, but I am not able to load Ubuntu into VMware Workstation. I am getting an error message: Exiting intel ..., Operating Not found.
Can anyone please provide steps on how to start with a Hadoop installation?
Should I go for one of the distributions (Cloudera, Hortonworks, MapR)? If that is simpler, please tell me how to install them. (I even tried importing the Cloudera VMware file into VMware Workstation, but it did not work for me.)

You can use the VM given by Udacity for its course on Hadoop. I found it really easy to set up.

Related

Are there different ways to install Cloudera Hadoop packages?

Can I only install packages via RPM (RPM Package Manager)?
I'm using Cloudera and I have heard a couple of times about CDH parcels, but I'm not sure whether I can use them here too. Or is there another mechanism?
Best regards
If you're using Debian or Ubuntu, then you'd use DEB packages, not RPM.
Parcels should work.
So would compiling the code from source.
Just because you're running Cloudera doesn't make the system any less of a regular Linux machine.
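To make the DEB-versus-RPM point concrete, here is a minimal sketch that guesses which package format a machine uses by looking for the `dpkg` and `rpm` tools on the PATH. The function name `detect_pkg_format` is made up for this example, and the check is only a heuristic: a machine can have both tools installed, in which case this sketch reports "deb".

```shell
#!/bin/sh
# Hypothetical helper: report which native package format this machine uses.
# Prints "deb", "rpm", or "unknown". dpkg is checked first, so a system with
# both tools installed is reported as "deb".
detect_pkg_format() {
    if command -v dpkg >/dev/null 2>&1; then
        echo "deb"
    elif command -v rpm >/dev/null 2>&1; then
        echo "rpm"
    else
        echo "unknown"
    fi
}

detect_pkg_format
```

The result only tells you which kind of package file to look for; parcels go through Cloudera Manager and do not depend on it.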
Parcels are Cloudera's way of installing their distribution. You install Cloudera Manager, and it installs all of the components using parcels (although I think you have a choice). This is done through a GUI (or an API), and it is probably the easiest way to go about it.
If you are just learning, the Quick Start VM is not a bad way to get started.
I have installed Cloudera using parcels, and it is easier than package installation. Cloudera Manager picks up parcels conveniently during installation, and almost everything is ready once the parcel installation is done.

Easiest way to install Hadoop in a VM on Windows 7?

I am reading a book about Hadoop now. The book says you need to download and install VMware Workstation Player (Windows 7 version):
https://my.vmware.com/en/web/vmware/free#desktop_end_user_computing/vmware_workstation_player/14_0
Then, apparently, I need to download and install CentOS6, from here.
https://sourceforge.net/projects/centos-6-vmware/
Once the VM is running, you need to go to File > Open and run the CentOS program. The problem I am having is that I can't start the VM; I'm getting an error message that reads 'This host supports Intel VT-x, but Intel VT-x is disabled'. I Googled this, but I don't have any of the options that are listed there (no Processor submenu, no Chipset, no Advanced CPU Configuration, and no Northbridge). There must be some way to install Hadoop, I hope. What is the easiest way to get this up and running on a Windows 7 machine? I just want to follow some of the steps from the book. Thanks.

Installing Apache Hive on Windows without using any virtual machine

I recently started learning about Hive and wanted to try it hands-on, but I can't find any tutorial for installing Hive on a Windows machine. My constraints are:
1. I cannot install Linux on my machine alongside Windows as a dual boot.
2. I cannot install VMware or VirtualBox.
All I am looking for is a way to play with Hive queries within the constraints above.

Suitable hadoop framework for ubuntu

I want to start working with Hadoop and Big Data. I need an easy graphical interface to start with. I tried Hue but couldn't get it configured.
Please help me choose a suitable Hadoop setup.
I use Ubuntu 14.04.
I think Cloudera or the Sandbox (by Hortonworks) is the easy way. The hard way is installing directly on Ubuntu. I also have Ubuntu 14.04 with Hadoop (Hive, Pig) and Apache Spark installed, so I don't need to open a virtual machine.
There are 3 major Hadoop distributions that you can start with.
Cloudera
Hortonworks
MapR
Each one of them has a UI installer and manager. I think the best option for you, though, would be to use the virtual environments that these vendors provide.
The Hortonworks Developer Sandbox is an image including Hue as UI to get started. However, the downloadable sandbox image is based on CentOS.
If you want to install a Hortonworks distribution on Ubuntu, you need to run an Ambari installation (Downloads - Hortonworks Hadoop). Be aware that Hue is not included in the default Ambari installation, but it can easily be installed separately. To run properly, Hue on Hortonworks still needs Python 2.6.x.
There are distributions like Cloudera or Hortonworks, but their packages need a high-spec machine, for example 16 GB of RAM or more, which is not always possible for the user. In addition, they include some Hadoop-related projects that the user may not need at all. If you want to enter this field seriously, I strongly recommend installing Hadoop on your own. That way you do some of the configuration yourself and get familiar with many Hadoop concepts.
You can start using this install tutorial.
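As a rough idea of what "doing some configuration yourself" involves, the sketch below writes the two small files that the Apache single-node (pseudo-distributed) setup guide asks for on Hadoop 2 and later: `core-site.xml` with `fs.defaultFS` and `hdfs-site.xml` with `dfs.replication`. The `CONF_DIR` scratch directory is an assumption made for this example so that nothing real is overwritten; in an actual install these files live under `etc/hadoop/` in the Hadoop directory.

```shell
#!/bin/sh
# Sketch: write a minimal pseudo-distributed (single-node) configuration.
# CONF_DIR is a scratch directory chosen for this example only; a real
# install keeps these files under etc/hadoop/ in the Hadoop directory.
CONF_DIR="${CONF_DIR:-./hadoop-conf-sketch}"
mkdir -p "$CONF_DIR"

# core-site.xml: tells clients where the default file system (NameNode) is.
cat > "$CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

# hdfs-site.xml: a single node can only hold one copy of each block.
cat > "$CONF_DIR/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF

echo "Wrote $CONF_DIR/core-site.xml and $CONF_DIR/hdfs-site.xml"
```

After these files are in place, the usual next steps are formatting the NameNode and starting HDFS, which any single-node install tutorial walks through.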

Is there a good online tutorial for Hadoop development on a Windows 7 machine? [closed]

I've been following the awesome Yahoo! Hadoop tutorial, which worked great for getting a virtual machine environment set up (Module 3 of the tutorial). But now I'm getting stumped by the HDFS section (Module 2) and think it might be easier if I had a Windows specific tutorial. I tried following this one, but some of the steps weren't quite right. I've been trying to find a good tutorial that will work for me on my Windows 7 machine, but am a bit stuck. Is there a good place to go for this? Hadoop seems to be very geared toward Linux users, and unfortunately I have to use my work laptop, which is Windows 7. Can I make this work or does it really only work for Linux users?
The Hadoop tutorial on the Yahoo Developer Network is outdated and problematic. Half of the steps didn't work for me at all (I was running their image in VMware Player on Windows 7), and the other half were vague. The Java code examples were poorly written and wouldn't compile. At any rate, they are written for the old Hadoop API.
I gave up on that tutorial and instead used the Cloudera Demo VM image. This comes pre-configured with Hadoop, Pig, Hive, HBase, etc. I was in business at once and had no problems compiling and running Hadoop jobs and Pig scripts.
The Cloudera Demo VM downloads on their main support page (https://ccp.cloudera.com/display/SUPPORT/Cloudera's+Hadoop+Demo+VM) are all 64-bit. If you are looking for a 32-bit version like I was, you can get one here: https://downloads.cloudera.com/cloudera-demo-0.3.7.vmwarevm.tar.bz2
This one has a slightly older version of the Cloudera distro (CDH3u0) running on Ubuntu 10.10 with Gnome desktop. I installed Eclipse for compiling my Hadoop jobs, but didn't bother trying to install the Hadoop plugin, which I've heard is problematic. The first time around, I made the mistake of accidentally updating the Cloudera distro to CDH3u3 via the system's Update Manager and this messed up my Hadoop configuration. I didn't know how to reconfigure it properly, so I just started over from the original image.
To get Pig running, you need to first set the JAVA_HOME variable: export JAVA_HOME=/usr/lib/jvm/java-6-sun
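A slightly more defensive variant of that step is sketched below: it only exports `JAVA_HOME` if the directory really contains a `bin/java` executable. The function name `set_java_home` is made up for this example, and the path is the one from this answer, so it may differ on other images.

```shell
#!/bin/sh
# Hypothetical helper: export JAVA_HOME only if the candidate directory
# contains an executable bin/java. Returns 1 and leaves JAVA_HOME
# unchanged otherwise.
set_java_home() {
    candidate="$1"
    if [ -x "$candidate/bin/java" ]; then
        export JAVA_HOME="$candidate"
        echo "JAVA_HOME set to $JAVA_HOME"
    else
        echo "No java binary under $candidate; JAVA_HOME left unchanged" >&2
        return 1
    fi
}

# Path taken from the answer above; adjust it for your own VM image.
set_java_home /usr/lib/jvm/java-6-sun || true
```

To make the setting survive new terminal sessions, the same export line can go into `~/.bashrc`.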
Unfortunately, I wasted a ton of time with that old YDN tutorial before a Java developer friend familiar with Hadoop pointed me to the Cloudera distribution.
I was completely new to Hadoop, and honestly I found the Cloudera tutorials and information completely unhelpful. Give the IBM ones a shot; they're super helpful and very friendly for beginners. They have step-by-step instructions for pretty much all of the core Hadoop applications and a few specific to IBM's distro.
Here's the download link:
https://www14.software.ibm.com/webapp/iwm/web/preLogin.do?source=swg-ibmibqsevmw&S_TACT=109HF38W&S_CMP=109HF
You have to make an account but it's free and doesn't take that long.
I can't post more than one link right now, but it is pretty easy to find the tutorials online, and they also exist within the VM.
There's also a forum where I've posted my questions when I got stuck, and somebody from IBM has always helped me out within an hour to a day. I can't post the link, but if you google "IBM InfoSphere BigInsights Forum", it's the first hit.
Good Luck!
I am also trying to learn Hadoop right now, and what I did was download VirtualBox ( http://www.virtualbox.org/ ), load some Linux images on it, and start following tutorials.
You can even get a pre-made Hadoop setup image from Cloudera. I think this approach is far better than installing and setting up on your main machine, because if there's a problem your main machine won't be affected (you can simply revert to an old copy of your virtual Linux image, or scrap it and start again, without any impact).
Good luck!
Developing for Hadoop on Windows is doable but hard to get right. It requires installing Cygwin, and getting all the environment variables right can be tricky.
To get started developing on Windows, I recommend installing VMware Player and running the pre-configured virtual machine by Cloudera. This simply means you will be doing the Hadoop development in Linux without rebooting or reinstalling your Windows system and without the installation troubles associated with Cygwin.
https://ccp.cloudera.com/display/SUPPORT/Cloudera%27s+Hadoop+Demo+VM
I've been banging my head against the Yahoo tutorial for a long time as well. The Eclipse plugin is no longer maintained and is pretty unreliable. Hopefully the Cloudera image will do the trick.
I have just finished the "Hadoop Fundamentals I - Version 2" course at http://bigdatauniversity.com.
It comes with IBM BigInsights VMware images and works very well.
The images include a local-mode one and a cluster-mode one. The cluster-mode image is able to simulate a multi-node cluster on my Windows 8 workstation with 8 GB of RAM.
Hope this information is helpful :-)

Resources