Is it possible to start a multi-physical-node Hadoop cluster using Docker? - hadoop

I've been searching for a way to start Docker on multiple physical machines and connect them into a Hadoop cluster. So far I have only found ways to start a cluster locally on one machine. Is there a way to do this?

Yes, you can provision a multi-node Hadoop cluster with Docker.
The posts below give some insight into how to do it:
http://blog.sequenceiq.com/blog/2014/06/19/multinode-hadoop-cluster-on-docker/
Run a hadoop cluster on docker containers

Related

Dask on Hadoop Kubernetes

I've installed Hadoop via a helm chart on my microk8s kubernetes cluster.
I would like to know how to create a Dask cluster across my different machines on this Hadoop cluster. I tried following the tutorials on the Dask website, but I keep getting errors because Dask is looking for a local YARN/Hadoop installation. How do I point it to the Hadoop running on Kubernetes so I can create the cluster?
If you want to launch Dask on YARN, we recommend using https://yarn.dask.org.
However, if you are using Kubernetes already you might consider https://kubernetes.dask.org, which is more commonly used today.

Install Hadoop in openstack

I'm new to big data, and I have a question about installing Hadoop.
Currently I use an image on VirtualBox, but I would like to create a cluster on OpenStack. At first I thought I just needed to instantiate a Hadoop image on OpenStack, or launch several instances and use the Hadoop Docker image.
But I found several examples that use OpenStack Sahara. Given that I already have an OpenStack deployment shared with several people, is it possible to create a Hadoop cluster without going through OpenStack Sahara? Or is that not recommended?
I'm not sure about OpenStack Sahara, but you can certainly create a Hadoop cluster using VM nodes on OpenStack.
Single-node installation guide:
http://tecadmin.net/setup-hadoop-2-4-single-node-cluster-on-linux/#
Yes, it's possible to create a Hadoop cluster on an OpenStack cloud without using OpenStack Sahara. You can launch three virtual machines on OpenStack and assign a floating IP to each of them.
One can be used as the master and the other two as slaves. Follow the Hadoop multi-node installation steps on these virtual machines and connect them using the SSH configuration described in the Hadoop multi-node setup guide.
You can also write a shell script to automate launching Hadoop on OpenStack.
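As a rough illustration of the master/slave layout described above, the following Python sketch generates the two pieces of configuration a multi-node setup typically needs: a core-site.xml that points every node at the master, and the list of slave hostnames. The hostnames (hadoop-master, hadoop-slave-1, hadoop-slave-2) and port 9000 are assumptions made for this example, not values from the answers. fs.defaultFS is the standard Hadoop property for the default filesystem URI. This is only a sketch and does not replace the multi-node setup guide.

```python
import xml.etree.ElementTree as ET

# Hypothetical hostnames for the three OpenStack VMs; replace with your own.
MASTER = "hadoop-master"
SLAVES = ["hadoop-slave-1", "hadoop-slave-2"]


def core_site_xml(master_host, port=9000):
    """Return core-site.xml content pointing fs.defaultFS at the master."""
    configuration = ET.Element("configuration")
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = "fs.defaultFS"
    # Port 9000 is an assumed NameNode RPC port, not taken from the answers.
    ET.SubElement(prop, "value").text = f"hdfs://{master_host}:{port}"
    return ET.tostring(configuration, encoding="unicode")


def slaves_file(slave_hosts):
    """Return the slaves file content: one hostname per line."""
    return "\n".join(slave_hosts) + "\n"


if __name__ == "__main__":
    print(core_site_xml(MASTER))
    print(slaves_file(SLAVES), end="")
```

The same core-site.xml would be copied to all three machines. The host list goes into etc/hadoop/slaves on Hadoop 2.x; Hadoop 3.x names that file workers.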

Multi-Node Hadoop in kubernetes

I have already installed minikube, the single-node Kubernetes cluster. I would like help with deploying a multi-node Hadoop cluster inside this Kubernetes node. I just need a starting point, please.
For clarification, do you want hadoop to leverage k8s components to run jobs or do you just want it to run as a k8s pod?
Unfortunately, I could not find an example of Hadoop built as a Kubernetes scheduler. You can probably still run it similarly to the Spark example.
Update: Spark now ships with better integration for Kubernetes. Information can be found here

Multiple datanodes on a single machine in hadoop2.7.1

I am working with Hadoop HDFS 2.7.1. I have set up a single-node cluster with one datanode, but now I need to set up three datanodes on the same machine. I have tried various methods found on the internet, but I am unable to start a Hadoop cluster with three datanodes on the same machine. Please help me.
You can run a multi-node cluster on a single machine using Docker containers. The guys at SequenceIQ, a company that was recently acquired by Hortonworks, have even prepared Docker images that you can download. See here:
http://blog.sequenceiq.com/blog/2014/06/19/multinode-hadoop-cluster-on-docker/
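To make the container approach more concrete, here is a minimal Python sketch that only builds the `docker run` command lines for one NameNode container and three DataNode containers on a single host. It does not execute them. The image name my-hadoop-image, the network name hadoop-net, and the container names are hypothetical, and the sketch assumes an image whose Hadoop configuration already points the DataNodes at the NameNode's hostname. This is not how the SequenceIQ images are started; see the linked post for that.

```python
import shlex

IMAGE = "my-hadoop-image"   # hypothetical image with Hadoop installed
NETWORK = "hadoop-net"      # hypothetical user-defined Docker network


def docker_run_args(name, command):
    """Build the argument list for starting one detached container."""
    return [
        "docker", "run", "--detach",
        "--name", name,
        "--hostname", name,
        "--network", NETWORK,
        IMAGE,
    ] + shlex.split(command)


def cluster_commands(datanode_count=3):
    """Return commands for one NameNode and `datanode_count` DataNodes."""
    commands = [docker_run_args("namenode", "hdfs namenode")]
    for i in range(1, datanode_count + 1):
        commands.append(docker_run_args(f"datanode{i}", "hdfs datanode"))
    return commands


if __name__ == "__main__":
    # The network must exist first: docker network create hadoop-net
    for args in cluster_commands():
        print(shlex.join(args))
```

Because each container has its own filesystem and network namespace, the three DataNodes do not compete for the same ports or data directories, which is the usual obstacle when starting several DataNodes directly on one machine. A fresh NameNode would also need to be formatted once before its first start.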

Run a hadoop cluster on docker containers

I want to run a multi-node Hadoop cluster, with each node inside a Docker container on a different host. This image (https://github.com/sequenceiq/hadoop-docker) works well for starting Hadoop in pseudo-distributed mode. What is the easiest way to modify it so that each node runs in a different container on a separate EC2 host?
I did this with two containers running the master and slave nodes on two different Ubuntu hosts, using Weave for the networking between the containers. I installed Hadoop in the containers the same way it is installed on separate hosts. I have published both images, with the commands to run Hadoop on them, under the Docker Hub account div4:
https://registry.hub.docker.com/u/div4/hadoop_master/
https://registry.hub.docker.com/u/div4/hadoop_slave/.
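When the master and slave containers sit on different hosts, a quick way to confirm that the cross-host networking works is to check from the slave container that the master's port accepts TCP connections. The sketch below uses only the Python standard library. The hostname hadoop-master and port 9000 in the comment are assumptions for illustration and are not taken from the images above.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example call from inside the slave container (hypothetical host and port):
#     port_is_reachable("hadoop-master", 9000)
```

A False result only says that the connection could not be opened. It does not distinguish between a networking problem and a Hadoop daemon that is not running on the master.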
The people at SequenceIQ have created a new project called Cloudbreak that is designed to work with different cloud providers and create Hadoop clusters on them easily. You just enter your credentials, and then, as far as I can see, it works the same way for all providers.
So for EC2, this is now probably the easiest solution (especially because of its nice GUI):
https://github.com/sequenceiq/cloudbreak-deployer
