For a Docker container based implementation, does it make sense to run a Kafka server and ZooKeeper server pair inside the same container?

I'm trying to implement a Kafka cluster consisting of three Kafka servers and three ZooKeeper servers using Docker containers. Which of the following is preferred, or if neither, what is the preferred way?
Three Docker containers, each hosting a Kafka/ZooKeeper server pair
Six Docker containers: three for Kafka servers and three for ZooKeeper servers
I'm asking this because a three-node ZooKeeper ensemble needs a majority (two of three) to stay available, so it only survives a single-node failure, whereas a three-node Kafka cluster could potentially survive a two-node failure (you may have to set the topic replication factor to 3). So is it better to run them in different containers, if creating new containers isn't too costly? Speaking of which, how costly is it to start a new Docker container?
In case I am advised to run one server per container, is it preferable to build a tailored Docker image for every kind of server (in this case, one Docker image for Kafka and another for ZooKeeper), or one unified image for all the different servers? I'm guessing it doesn't make sense to create two separate images just for Kafka and ZooKeeper, but what if I have all different kinds of clusters and servers, think Elasticsearch, to simulate? At what point would it start to make sense to create different Docker images to be used inside a single project?

If I had the time to do that, I would make two different images, one for Kafka and one for ZooKeeper, and I'd build a docker-compose file to launch the cluster.
So, six different containers.
https://docs.docker.com/compose/
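A minimal sketch of what that Compose file could look like, assuming the Confluent-packaged images (confluentinc/cp-zookeeper and confluentinc/cp-kafka are assumptions here, not anything from the question; any Kafka/ZooKeeper images with equivalent settings would do). Only the first ZooKeeper and Kafka services are spelled out; the other two of each follow the same pattern:

```yaml
# docker-compose.yml - sketch only; image names, env vars and ports are illustrative.
version: "3"
services:
  zookeeper-1:
    image: confluentinc/cp-zookeeper
    environment:
      ZOOKEEPER_SERVER_ID: 1
      ZOOKEEPER_CLIENT_PORT: 2181
      # every ensemble member lists all three ZooKeeper servers
      ZOOKEEPER_SERVERS: "zookeeper-1:2888:3888;zookeeper-2:2888:3888;zookeeper-3:2888:3888"
  # zookeeper-2 and zookeeper-3 look the same, each with its own ZOOKEEPER_SERVER_ID

  kafka-1:
    image: confluentinc/cp-kafka
    depends_on:
      - zookeeper-1
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: "zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181"
      # replicate across all three brokers, matching the replication-factor-3 point in the question
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 3
  # kafka-2 and kafka-3 look the same, each with its own KAFKA_BROKER_ID
```

This keeps one image per role and lets you stop or scale the Kafka and ZooKeeper services independently, which is the point of running them in separate containers.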

Related

Run several microservice Docker images together on local dev with Minikube

I have several microservices, around 20 or so, that I need to run for my local development. The microservices are Spring Boot services built with Maven. I wanted to know: when I have to run them on my AWS server, can I run all these containers individually? They might have a shared database, so is that one issue I might face? Or is it possible to run all these services together in one single Docker image?
Also, I have to configure this with Kubernetes, so I have set up Minikube in my local dev environment. It would be helpful to know if there are any considerations to take into account when running around 20 services on my Minikube, or even a full Kubernetes environment.
PS: I know this is a basic question, but I don't have much experience with DevOps.
Ideally you should have a separate Docker image for each microservice and create a Kubernetes Deployment for each of them. This keeps the scaling of individual microservices decoupled from each other. Also, communication between microservices should go via a Kubernetes Service. This keeps communication stable, because Service IPs and FQDNs don't change even as pods are created, deleted, and scaled up and down.
Just be mindful of how much memory and CPU the microservices will need and whether the machine running Minikube has that many resources. If the available memory and CPU of a Kubernetes node are not enough to schedule a pod, the pod will be stuck in the Pending state.
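For illustration, a minimal Deployment plus Service for one such microservice could look like the sketch below; the name orders, the image, the port, and the resource figures are placeholders, not anything from the question:

```yaml
# Sketch only: one Deployment and Service per microservice; all names and numbers are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders
spec:
  replicas: 2
  selector:
    matchLabels:
      app: orders
  template:
    metadata:
      labels:
        app: orders
    spec:
      containers:
        - name: orders
          image: myrepo/orders-service:1.0   # placeholder Spring Boot image
          ports:
            - containerPort: 8080
          resources:
            requests:          # the scheduler uses this; pods stay Pending if no node can satisfy it
              cpu: 250m
              memory: 512Mi
---
apiVersion: v1
kind: Service
metadata:
  name: orders
spec:
  selector:
    app: orders
  ports:
    - port: 80
      targetPort: 8080
# other microservices call http://orders (a stable DNS name) regardless of pod churn
```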
Since you have that many microservices, I suggest you create a Kubernetes cluster on AWS with 3-4 VMs (more info here). Then try to deploy all your microservices on that. For that, you need to build a container image for each service and create a Kubernetes Deployment for each service.
can I run all these containers individually? They might have a shared database, so is that one issue I might face?
Since you have a shared database, I suggest you run your database server on a separate host and connect to it remotely from your services. This way you can share the database between your microservices.
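One way to wire such an externally hosted database into the cluster under a stable name is an ExternalName Service; the hostname below is a placeholder for wherever the database actually runs:

```yaml
# Sketch: gives pods an in-cluster DNS name (shared-db) for the external database host.
apiVersion: v1
kind: Service
metadata:
  name: shared-db
spec:
  type: ExternalName
  externalName: db.example.internal   # placeholder hostname of the shared database server
```

Each microservice then points its datasource at shared-db, so the real database host can change without touching the services.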

Wildfly 11 - High Availability - Single deploy on slave

I have two servers in HA mode. I'd like to know whether it is possible to deploy an application only on the slave server. If yes, how do I configure that in JGroups? I need to run a specific program that accesses the master database, but I would not like to run it on the master server, to avoid overhead there.
JGroups itself does not know much about WildFly and the deployments; it only creates a communication channel between nodes. I don't know where you get the notion of master/slave, but JGroups always has a single* node marked as the coordinator. You can check the membership through Channel.getView().
However, you still need to deploy the app on both nodes and just make it inactive if this is not its target node.
*) If there's no split-brain partition, or similar rare/temporary issues

Does Kubernetes evenly distribute across an ec2 cluster?

So, I'm trying to understand CPU and VM allocation with Kubernetes, Docker, and AWS ECS.
Does this seem right?
Locally, running "docker compose" with a few services:
each container gets added to the single Docker Machine VM. You can allocate CPU shares from this single VM.
AWS, running ECS, generated from a docker compose:
each container (all of them) gets added to a single ec2 VM. You can allocate CPU shares from that single VM. The fact that you deploy to a cluster of 5 ec2 instances makes no difference unless you manually "add instances" to your app. Your 5 containers will be sharing 1 ec2.
AWS, running kubernetes, using replication controllers and service yamls:
each container gets distributed amongst ALL of your EC2 instances in your Kubernetes cluster?
If I spin up a cluster of 5 EC2 instances and then deploy 5 replication controllers / services, will they actually be distributed across the EC2 instances? This seems like a major difference from ECS and local development. Just trying to get the facts right.
Here are the answers to your different questions:
1> Yes, you are right: you have a single VM, and any container you run gets CPU shares from that single VM. You also have the option of spinning up a Swarm cluster and trying that out; Docker Compose supports Swarm for containers connected via an overlay network spread over multiple VMs.
2> Yes, containers defined in a single task will end up on the same EC2 instance. When you spin up more than one instance of the task, the tasks get spread over the instances that are part of the cluster. None of the tasks should have a resource requirement greater than the maximum resources available on any one of your EC2 instances.
3> Kubernetes is more evolved than ECS in many respects, but in terms of container distribution it works similarly to ECS. A Kubernetes pod is the equivalent of an ECS task: one container or a group of containers colocated on a single VM. In Kubernetes, too, a pod cannot request more resources than the maximum available on one of your underlying compute nodes.
In all three scenarios you are bound by the maximum capacity of the underlying node when deploying a large container or pod.
You should not equate the Docker platform with a VM creation and management platform.
All these Docker platforms expect you to define tasks that fit into the VMs and to scale horizontally with a higher task count when needed. Kubernetes comes with service discovery, which allows seamless routing of requests to the deployed containers using DNS lookups. You will have to build your own service discovery with Swarm and ECS; Consul, Eureka, etc. are tools you can use for that.
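To illustrate the distribution question, a sketch of a Deployment with 5 replicas is below (the name, image, and resource figures are placeholders, not anything from the question). The scheduler places each pod on whichever node has enough unreserved capacity for its requests, so with 5 adequately sized EC2 nodes the replicas typically land on different instances, though even spreading is not guaranteed without affinity or topology-spread rules:

```yaml
# Sketch only: names, image and resource figures are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 5                 # one pod per replica; the scheduler picks a node for each
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: myrepo/web:1.0            # placeholder image
          resources:
            requests:                      # a single pod may not request more than any one node offers,
              cpu: "1"                     # otherwise it stays Pending
              memory: 1Gi
```

A matching Service then gives those replicas one stable DNS name, which is the built-in service discovery mentioned above.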

Strategy to persist the node's data for dynamic Elasticsearch clusters

I'm sorry that this is probably a kind of broad question, but I haven't found a solution to this problem yet.
I'm trying to run an Elasticsearch cluster on Mesos through Marathon with Docker containers. To that end, I built a Docker image that can start on Marathon and dynamically scale via either the frontend or the API.
This works great for test setups, but the question remains how to persist the data so that if the cluster is either scaled down (I know this is also a matter of the index configuration itself) or stopped, I can restart later (or scale up) with the same data.
The thing is that Marathon decides where (on which Mesos slave) the nodes run, so from my point of view it's not predictable whether all the data is available to the "new" nodes upon restart if I persist the data to the Docker hosts via Docker volumes.
The only things that come to mind are:
Using a distributed file system like HDFS or NFS, with volumes mounted either on the Docker host or in the Docker images themselves. Still, that leaves the question of how to load all the data during the new cluster's startup if the "old" cluster had, for example, 8 nodes and the new one only has 4.
Using the Snapshot API of Elasticsearch to save to a common drive somewhere in the network. I assume that this will have performance penalties...
Is there any other way to approach this? Are there any recommendations? Unfortunately, I didn't find a good resource on this topic. Thanks a lot in advance.
Elasticsearch and NFS are not the best of pals ;-). You don't want to run your cluster on NFS; it's much too slow, and Elasticsearch works better the faster the storage is. If you introduce the network into this equation, you'll get into trouble. I have no idea about Docker or Mesos, but I definitely recommend against NFS. Use snapshot/restore.
The first snapshot will take some time, but the rest of the snapshots should take less space and less time. Also, note that "incremental" means incremental at file level, not document level.
The snapshot itself needs all the nodes that hold the primaries of the indices you want snapshotted, and those nodes all need access to the common location (the repository) so that they can write to it. This common access to the same location is usually not that obvious, which is why I'm mentioning it.
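As a sketch of that shared-repository requirement (the mount path below is a placeholder): each node points path.repo in its elasticsearch.yml at the commonly mounted location, and the repository is then registered and snapshots are created through the _snapshot REST API.

```yaml
# elasticsearch.yml - sketch; /mnt/es-backups is a placeholder for a location
# that every node holding primaries can write to (e.g. a volume mounted into
# each Docker container from the host, or a shared mount).
path.repo: ["/mnt/es-backups"]
# After restarting with this setting, register a repository of type "fs"
# pointing at /mnt/es-backups via the _snapshot API and create snapshots there.
# A snapshot can later be restored into a cluster with a different node count,
# which covers the 8-nodes-down-to-4 case from the question.
```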
The best way to run Elasticsearch on Mesos is to use a specialized Mesos framework. The first effort in this area is https://github.com/mesosphere/elasticsearch-mesos. There is a more recent project which is, AFAIK, currently under development: https://github.com/mesos/elasticsearch. I don't know what its status is, but you may want to give it a try.

Run a hadoop cluster on docker containers

I want to run a multi-node Hadoop cluster, with each node inside a Docker container on a different host. This image - https://github.com/sequenceiq/hadoop-docker - works well for starting Hadoop in pseudo-distributed mode. What is the easiest way to modify this so that each node runs in a different container on a separate EC2 host?
I did this with two containers running master and slave nodes on two different Ubuntu hosts. I handled the networking between the containers using Weave. I have uploaded the container images to the Docker Hub account div4. I installed Hadoop in the same way as it is installed on separate hosts. I have added the two images, with the commands to run Hadoop on them, here:
https://registry.hub.docker.com/u/div4/hadoop_master/
https://registry.hub.docker.com/u/div4/hadoop_slave/
The people from SequenceIQ have created a new project called Cloudbreak, which is designed to work with different cloud providers and create Hadoop clusters on them easily. You just have to enter your credentials, and then it works the same for all providers, as far as I can see.
So for EC2, this will now probably be the easiest solution (especially because of its nice GUI):
https://github.com/sequenceiq/cloudbreak-deployer
