How to set up a POC environment with DC/OS, Kafka and ElasticSearch on two nodes with Docker Swarm or Kubernetes containers?

The instructions for installing Mesosphere DC/OS on AWS use a CloudFormation template where the minimum configuration indicates:
You have the option of 1 or 3 Mesos master nodes.
5 private Mesos agent nodes is the default.
1 public Mesos agent node is the default.
For our POC, so as not to incur too much up-front cost, is it possible to do this all with two nodes? One for DC/OS and the other containerized with ElasticSearch and Kafka?
If not, what would be a good configuration for this type of architecture?

DC/OS does not run on Docker Swarm or Kubernetes. But you can run a docker-in-docker local development deployment on Linux (or in a VM on Mac/Windows): dcos-docker
You could then install ElasticSearch and Kafka on top of DC/OS.
You could also use dcos-vagrant to run a multi-VM local DC/OS dev cluster.
Warning: the current Vagrant v1.9.1 has a crippling CentOS networking bug, if you need a VM. dcos-vagrant includes a monkey-patch workaround; dcos-docker does not.
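Once the local cluster is up, ElasticSearch and Kafka install through the DC/OS CLI. A minimal sketch, assuming the packages are published as kafka and elastic in your cluster's Universe repository (package names have varied between releases) and that the master URL matches your dcos-docker/dcos-vagrant setup:

# Point the CLI at the local cluster (URL is a placeholder)
dcos config set core.dcos_url http://m1.dcos/
# Install the frameworks from the DC/OS Universe
dcos package install kafka
dcos package install elastic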

Related

Multi-node Hadoop in Kubernetes

I have already installed Minikube, the single-node Kubernetes cluster. I just want help with how to deploy a multi-node Hadoop cluster inside this Kubernetes node. I need a starting point, please!
For clarification, do you want Hadoop to leverage k8s components to run jobs, or do you just want it to run as a k8s pod?
Unfortunately I could not find an example of Hadoop built as a Kubernetes scheduler. You can probably still run it similarly to the Spark example.
Update: Spark now ships with better integration for Kubernetes. Information can be found here.
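For reference, Spark's Kubernetes integration is driven through spark-submit. A minimal sketch, where the API server address, container image, and example jar version are placeholders you would substitute for your own cluster:

# Submit the bundled SparkPi example directly to a Kubernetes cluster
spark-submit \
  --master k8s://https://<api-server-host>:6443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.container.image=<your-spark-image> \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.3.0.jar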

Does Kubernetes evenly distribute across an ec2 cluster?

So, I'm trying to understand CPU and VM allocation with Kubernetes, Docker and AWS ECS.
Does this seem right?
Locally, running "docker compose" with a few services:
each container gets added to the single Docker Machine VM. You can allocate CPU shares from this single VM.
AWS, running ECS, generated from a docker-compose file:
each container (all of them) gets added to a single EC2 VM. You can allocate CPU shares from that single VM. The fact that you deploy to a cluster of 5 EC2 instances makes no difference unless you manually "add instances" to your app. Your 5 containers will be sharing 1 EC2 instance.
AWS, running Kubernetes, using replication controllers and service YAMLs:
each container gets distributed amongst ALL of the EC2 instances in your Kubernetes cluster?
If I spin up a cluster of 5 EC2 instances and then deploy 5 replication controllers / services, will they actually be distributed across the EC2 instances? This seems like a major difference from ECS and local development. Just trying to get the facts right.
Here are the answers to your different questions:
1. Yes, you are right: you have a single VM, and any container you run will get CPU shares from this single VM. You also have the option of spawning a Swarm cluster and trying that out. Docker Compose supports Swarm for containers connected via an overlay network spread over multiple VMs.
2. Yes, the containers defined in a single task will end up on the same EC2 instance. When you spin up more than one instance of the task, the tasks get spread over the instances that are part of the cluster. No task should require more resources than the maximum available on any one of your EC2 instances.
3. Kubernetes is more evolved than ECS in many respects, but for container distribution it works similarly to ECS. A Kubernetes pod is roughly equivalent to an ECS task: one or more containers colocated on a single VM. In Kubernetes, too, you cannot have a pod that needs more resources than the maximum available on any one of your underlying compute nodes (see the sketch below).
In all three scenarios, you are bound by the maximum capacity of the underlying resource when deploying a large container or pod.
You should not equate these Docker platforms with VM creation and management platforms.
All of these platforms expect you to define tasks that fit into the VMs and to scale horizontally with a higher task count when needed. Kubernetes comes with service discovery, which allows seamless routing of requests to the deployed containers using DNS lookups. You will have to build your own service discovery with Swarm and ECS; Consul, Eureka, etc. are tools you can use for that.
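To make point 3 concrete, here is a hypothetical replication controller manifest; the image name and resource figures are placeholders, not from the question. With replicas: 5 on a 5-node cluster, the scheduler places each pod on whichever node has spare capacity, which typically spreads them across the EC2 instances:

apiVersion: v1
kind: ReplicationController
metadata:
  name: web
spec:
  replicas: 5               # the scheduler spreads these across nodes with capacity
  selector:
    app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.21   # placeholder image
        resources:
          requests:
            cpu: 250m       # placement decisions are based on these requests
            memory: 128Mi

A Service in front of this controller then routes requests to the pods wherever they landed.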

Does the DCOS installation process work the same with an existing Mesos installation or do we need to start from scratch?

We have an existing Apache Mesos cluster and want to try DC/OS in its shiny new open-source form. However, it would be painful to do a destructive re-install. So is it possible to just 'overlay' DC/OS on an existing Mesos installation? Would any of the steps in the DC/OS installation guide change, or could the installer detect the existing Mesos and install the DC/OS components over it?
I don't think you can simply overlay DC/OS on top of your Mesos cluster. There are multiple reasons for that; one of them is that the configuration of Mesos and Marathon is done differently in DC/OS than it is for standalone Mesos clusters.
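For context, the DC/OS installer generates its own Mesos and Marathon configuration from a genconf/config.yaml rather than adopting an existing one. A hypothetical minimal fragment, where all values are placeholders:

cluster_name: poc-cluster
bootstrap_url: http://<bootstrap-host>:80   # where agents fetch the install bundle
master_discovery: static
master_list:
  - 10.0.0.10                               # existing master IPs go here
resolvers:
  - 8.8.8.8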

Is there Docker orchestration for a Hadoop cluster?

I was looking at Rancher (an orchestration engine for Docker). I don't think there is built-in support for a Hadoop setup.
Take a look at the latest version of Rancher; it has a catalog function that includes Hadoop deployment out of the box.
This is definitely in the latest 0.49 release of Rancher.
One source of information would be "Docker Releases Orchestration Tool Kit", which mentions Docker Machine, Docker Swarm and, more importantly, Mesosphere, built on top of the Swarm API.
Mesosphere’s technology is the only way for an organization to run a Docker Swarm workload in a highly elastic way on the same cluster as other types of workloads.
For example, you can run Cassandra, Kafka, Storm, Hadoop and Docker Swarm workloads alongside each other on a single Mesosphere cluster, all sharing the same resources.

How to deploy a Cassandra cluster on two ec2 machines?

It's a known fact that you cannot create a cluster on a single machine just by changing ports. The workaround is to add virtual Ethernet devices to the machine and use those to configure the cluster.
I want to deploy a cluster of, let's say, 6 nodes on two EC2 instances, i.e. 3 nodes on each machine. Is that possible? If so, what should the seed node addresses be?
Is it a good idea for production?
You can use the DataStax AMI on AWS. DataStax Enterprise is a suitable solution for production.
I am not sure about your cluster layout, because by default each node needs its own config files, and I have no idea how to change that.
There are simple instructions here. When you configure the instance settings, you have to provide advanced settings for the cluster, like --clustername yourCluster --totalnodes 6 --version community etc. You can also install Cassandra manually by installing the latest versions of Java and Cassandra.
You can build the cluster by modifying fields in /etc/cassandra/cassandra.yaml (Ubuntu 12.04) like cluster_name, seeds, listen_address, rpc_address and initial_token. cluster_name has to be the same for the whole cluster. A seed is a contact node whose IP you should add to every node's config. I am still confused about tokens.
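For illustration, here is a hypothetical cassandra.yaml fragment for one of the six nodes; the IPs and cluster name are placeholders. Each node (or virtual interface) gets its own listen_address, while cluster_name and the seed list stay identical on every node:

cluster_name: 'yourCluster'             # must match on every node
listen_address: 10.0.0.11               # this node's private or virtual-interface IP
rpc_address: 10.0.0.11
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "10.0.0.11,10.0.1.11"    # e.g. one seed per EC2 instance

Giving each of the three colocated nodes on a machine its own virtual-interface IP in listen_address is what makes running several nodes per EC2 instance possible.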
