Hadoop cluster with Linux as master and Windows 7 as slave - hadoop

I want to set up a Hadoop environment with Fedora Linux as the master and a Windows 7 machine as the slave. Is this combination possible and, if so, do I need to install Cygwin on Windows 7?

Good practice is not to run Hadoop on Windows (simple as that).
Why do you want to do that?
If you just want to test something, use pseudo-distributed mode (all Hadoop services run on one machine).
In addition, I would recommend using a Hadoop distribution, for instance Cloudera.
This link explains step by step how to set it up:
https://ccp.cloudera.com/display/CDH4DOC/CDH4+Installation+Guide
It is pretty simple and, importantly, concisely documented.
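To make the pseudo-distributed suggestion a bit more concrete, here is a minimal sketch that renders the two small config files a single-machine setup typically needs. `fs.defaultFS` and `dfs.replication` are standard Hadoop property names; the helper name `hadoop_site_xml` and port 9000 are assumptions for illustration, and a Cloudera install may manage these files for you.

```python
import xml.etree.ElementTree as ET


def hadoop_site_xml(properties):
    """Render a Hadoop *-site.xml document from a dict of property name -> value.

    Hypothetical helper for illustration; it only builds the XML text.
    """
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Pseudo-distributed mode: HDFS runs on localhost (port 9000 is an assumption)
# and each block is stored once, because there is only one DataNode.
CORE_SITE = hadoop_site_xml({"fs.defaultFS": "hdfs://localhost:9000"})
HDFS_SITE = hadoop_site_xml({"dfs.replication": 1})

if __name__ == "__main__":
    print(CORE_SITE)  # contents for etc/hadoop/core-site.xml
    print(HDFS_SITE)  # contents for etc/hadoop/hdfs-site.xml
```

The replication factor of 1 is the main difference from a real cluster, where the default is higher so that blocks survive the loss of a node.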

Related

Hadoop install on computer or virtual machine?

I have a school project that requires a Hadoop installation (it is basically so we get familiar with it; I don't expect to need it for anything further). Would you recommend installing it on my computer (I have a Mac with an M1) or using Parallels and installing it in a Windows VM?
TIA
I would definitely not recommend a Windows environment for Hadoop, virtual or not.
If it's a throwaway environment, a VM (or Docker setup) would be preferred. However, it's easiest to install directly on the host (brew install hadoop), where it will have full access to your machine for multithreading.
Alternatively, cloud providers offer schools deep discounts, and a cluster of several machines is a few clicks away rather than needing to tune everything just for your one machine.

Mesos slaves on windows server 2012r2, what are my options?

I have a cluster of machines running Windows Server 2012 R2.
I would like to manage them with Mesos.
To the best of my knowledge, Microsoft is actively contributing to Mesos (DC/OS) and will support containers natively on Windows Server 2016. Furthermore, it looks like there is another container flavour based on Hyper-V.
I can run my Mesos masters on Linux hosts. However, I need my slaves on Windows Server 2012 R2 hosts. It is not clear to me which technologies are already available (and production ready) for my Windows Server version.
What are my options for using Mesos to manage the resources of my Windows Server machines?
Is the mesos-agent for Windows (Server 2012 R2) production ready?
Can I use containers (Hyper-V or Docker)? If not, does resource isolation work on Windows (on Linux you can use cgroups)?
Can I run any framework I like, or are some not compatible with Windows?
Mesos 1.0.0, released recently, allows you to run the slave and launcher on Windows. Not the master, unfortunately: that is still Linux-only, but does it ever really need to be on Windows? The slave was the important bit for bringing Windows machines into the Mesos domain.
I've just been investigating using the Mesos slave on Windows. I'm pleased to say that it appears to be working OK (this opinion is subject to change, as I'm still testing it). Whether it is production ready is something each business has to decide for itself.
Mesos has always had its own isolation technology. Interestingly, the containerizer implementation has been redone and now accepts a number of container image formats, so you can use your Docker images as well as a few others, which should suit you. There was a good presentation on this at MesosCon: https://www.youtube.com/watch?v=rHUngcGgzVM
Docker has been stealing the show to some extent. But if you use the Mesos agent, Windows Server 2016 and its container technology (Docker) aren't needed, so it should run on Windows Server 2012. I haven't got around to trying this yet, but it's definitely worth testing, as it opens up deployment options. Anyone?
One thing to remember about containers: they are not VMs. The guest image must be derived from the host's OS, so you can't run a Linux image on a Windows machine. This is causing me a headache: I can't use Nano Server at the moment, so my image sizes are 4 GB+ and the initial deploy time is hours.

Hadoop features when installed on windows using virtual box

Do I get fewer features or functions of the Hadoop environment when it is installed on a Windows machine using VirtualBox? Is this sort of Hadoop installation good for beginners' practice? What is the difference between Hadoop installed on a Linux machine and Hadoop installed in VirtualBox on a Windows machine?
You can have a fully distributed cluster on your Windows machine by running multiple nodes in VirtualBox. For beginners, however, I recommend setting up a single-node cluster and practicing on that. You will not get fewer features: you will be running Hadoop in pseudo-distributed mode, with all the daemons running. The only limitation is that a single Windows machine has limited storage and RAM, so you can't test the cluster with huge amounts of data. Hope this helps.
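One way to check the "all the daemons will be running" claim on a single-node setup is to look at the output of the JDK's `jps` tool, which prints one `<pid> <class name>` line per running JVM. The sketch below parses such output; the daemon list assumes a Hadoop 2+ install with YARN, and the sample text is made up for illustration rather than captured from a real machine.

```python
# Daemons expected in pseudo-distributed mode (assumes Hadoop 2+ with YARN).
EXPECTED_DAEMONS = {
    "NameNode",
    "DataNode",
    "SecondaryNameNode",
    "ResourceManager",
    "NodeManager",
}


def missing_daemons(jps_output, expected=EXPECTED_DAEMONS):
    """Return the expected daemon names that do not appear in `jps` output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:  # "<pid> <name>"; skip blank or nameless lines
            running.add(parts[1])
    return sorted(set(expected) - running)


# Made-up sample: HDFS is up, but the YARN daemons have not been started.
SAMPLE_JPS = """\
4821 NameNode
4950 DataNode
5170 SecondaryNameNode
5603 Jps
"""

if __name__ == "__main__":
    print(missing_daemons(SAMPLE_JPS))
```

On a real VM you would feed it the text returned by running `jps` as the user that started Hadoop.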

Setup multinode Hadoop cluster using virtual machines on my laptop

I have a Windows 7 laptop and I need to set up a multi-node Hadoop cluster on it.
I have the following ready:
Virtualization software, i.e. VirtualBox and VMware Player.
Two virtual machines, i.e.
Ubuntu for the Hadoop master and
Ubuntu for one Hadoop slave
Has anyone set up such a cluster using virtual machines on a laptop?
If yes, please help me install it.
I've searched on Google but can't work out how to configure a multi-node Hadoop cluster using VMs.
How do I run two Ubuntu guests on Windows 7 using VMware or VirtualBox?
Should the VM images use the same Ubuntu version, or can they use different versions of Ubuntu Linux?
Yes, you can use two Ubuntu nodes. I am using five nodes (1 master, 4 datanodes).
If you want to install a multi-node cluster in VMware:
Download Ubuntu from this link: http://www.ubuntu.com/download/desktop
Install two machines, and install Java and OpenSSH on each.
Then download the shell script for multi-node installation from this link:
https://github.com/tonyreddy/Apache-MultiNode-Insatallation-Shellscript
And try it.
All the best.
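Whatever script you use, the two VMs need to resolve each other by name, and the master needs a list of its worker hosts. Below is a rough sketch of generating those two pieces of text. The hostnames and IP addresses are hypothetical, and the file name is an assumption that depends on your version: the worker list lives in `etc/hadoop/workers` on Hadoop 3 and `etc/hadoop/slaves` on Hadoop 2.

```python
def hosts_entries(nodes):
    """Build /etc/hosts lines from a dict of hostname -> IP address."""
    return "".join(f"{ip}\t{name}\n" for name, ip in nodes.items())


def workers_file(nodes, master):
    """List every node except the master, one hostname per line."""
    return "".join(f"{name}\n" for name in nodes if name != master)


# Hypothetical names and addresses for a two-VM cluster on a private network.
NODES = {
    "hadoop-master": "192.168.56.101",
    "hadoop-slave1": "192.168.56.102",
}

if __name__ == "__main__":
    print(hosts_entries(NODES))                   # append to /etc/hosts on both VMs
    print(workers_file(NODES, "hadoop-master"))   # worker list on the master
```

If you also want the master to store data, add its hostname to the worker list as well.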
Since you're running Hadoop on your laptop, you're presumably doing it for learning, building a POC, or functional debugging.
Instead of going through the hassle of installing and setting up Hadoop and related big-data software, you can simply install a pre-configured pseudo-distributed VM.
Some good options are:
Cloudera QuickStart VM
Hortonworks Sandbox
I've been using Cloudera's VM on my laptop for quite some time now and it's been working great.
Cloudera and Hortonworks are the fastest way to get up and running.
Make sure your laptop has enough RAM for the operating system that is already running; otherwise your laptop will often restart abruptly while you use the virtual machines.
Let me give you an example:
Windows 10 needs 3-5 GB of RAM to work smoothly.
This means that if you load a 5 GB virtual machine into RAM on an 8 GB laptop, Windows may crash when it cannot find enough RAM to operate.
You should upgrade the RAM from 8 GB to 12 GB, or ideally 16 GB, for smooth operation of your laptop.
Hope it helps
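The arithmetic behind that example can be written down as a tiny helper. This is only a back-of-the-envelope sketch using the figures from the answer above (3-5 GB for Windows 10, a 5 GB VM); the function name is made up, and real memory use varies with what else is running.

```python
def ram_headroom_gb(total_gb, host_os_gb, vm_sizes_gb):
    """RAM left over after the host OS and all VMs take their share.

    A negative result means the VMs do not fit alongside the host OS.
    """
    return total_gb - host_os_gb - sum(vm_sizes_gb)


if __name__ == "__main__":
    # 8 GB laptop, Windows 10 at the upper estimate of 5 GB, one 5 GB VM.
    print(ram_headroom_gb(8, 5, [5]))
    # Same VM after upgrading to 12 GB and 16 GB.
    print(ram_headroom_gb(12, 5, [5]))
    print(ram_headroom_gb(16, 5, [5]))
```

For a two-VM cluster, pass both VM sizes in the list; with the same figures, 16 GB is the first size that leaves any headroom.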

Hadoop cluster with ubuntu and Windows

I have three laptops (with Ubuntu) that I am networking to act as a Hadoop cluster. I also have a Windows-only machine. Is it possible to add it to the cluster and make it act as a node? Is it feasible? Has anyone come across such an issue?
If you have a Windows environment, I would suggest that you use VirtualBox with any Linux as the guest OS.
You can build your Hadoop cluster on that. There are numerous installation procedures available for Linux, and you can't go wrong with that.
We use exactly this setup for development. Performance of the Hadoop cluster is less of a concern than functionality.
It also lets you fine-tune your DevOps, since you can tear down a VM and start afresh with a new one.
The easiest approach to building it this way is to:
Install VirtualBox
Install Vagrant
Use a community provided box from: http://www.vagrantbox.es/
Bootstrap your VM for yum packages
Move from NAT interface to Bridged Ethernet interface
Install Hadoop using SCM: http://www.cloudera.com/products-services/tools/
Bring up your cluster
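As a rough illustration of the Vagrant and bridged-network steps in the list above, this sketch renders a multi-machine Vagrantfile as text. The box name and node names passed in are placeholders, not recommendations; `public_network` is Vagrant's name for a bridged interface, and installing Hadoop itself is left to the later steps.

```python
def render_vagrantfile(box, node_names):
    """Return Vagrantfile text defining one bridged VM per node name."""
    lines = [
        'Vagrant.configure("2") do |config|',
        f'  config.vm.box = "{box}"',
    ]
    for name in node_names:
        lines += [
            f'  config.vm.define "{name}" do |node|',
            f'    node.vm.hostname = "{name}"',
            '    node.vm.network "public_network"  # bridged instead of NAT only',
            "  end",
        ]
    lines.append("end")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "centos/7" and the node names are placeholders for illustration.
    print(render_vagrantfile("centos/7", ["hadoop-master", "hadoop-slave1"]))
```

Saving the output as `Vagrantfile` and running `vagrant up` in that directory should bring up both machines, after which you can bootstrap packages on each.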
Yes, it is possible. On the Ubuntu machines, Hadoop installation should be straightforward; you just need to follow the regular steps. Since Hadoop runs in a Linux environment, you need to install Cygwin on your Windows machine. Cygwin is a Linux-like environment for Windows that will enable you to install and run Linux-based applications (like Hadoop) on a Windows machine.
Here is the link for Cygwin installation: http://www.cygwin.com/install.html

Resources