I have a running Mesos master with a public IP on a KVM-based virtual machine in the cloud, to which I can connect agents across the network. One machine in my lab has a public IP, and I can connect it to the Mesos master. I used the commands below to start the master and the slave in the cluster.
sudo ./bin/mesos-master.sh --work_dir=/var/lib/mesos --advertise_ip=129.xxx.110.yy
sudo ./bin/mesos-slave.sh --master=129.xxx.110.yy:5050 --advertise_ip=129.xxx.111.zz
Now the problem is that I have many other machines in my lab which do not have public IP addresses.
How can I connect them to the Mesos master?
Should I run consul slaves alongside nomad slaves or inside them?
The latter might not make sense at all, but I'm asking it just in case.
I brought up my own Nomad cluster with Consul slaves running alongside the Nomad slaves (inside the worker nodes); my deployable artifacts are Docker containers (Java Spring applications).
The issue with my current setup is that my applications can't reach the Consul slaves to read their configuration (none of 0.0.0.0, localhost, or the worker node IP worked).
Let's say my service exposes 8080. I configured the Docker part (in the HCL file) to use bridge as the network mode, and Nomad maps 8080 to 43210.
Everything is fine until my service tries to reach the Consul slave to read its configuration. Ideally, giving Spring the Nomad worker node's IP as the Consul host should suffice, but for some reason it doesn't.
I'm using the latest version of Nomad.
I configured my Nomad slaves like this: https://github.com/bmd007/statefull-geofencing-faas/blob/master/infrastructure/nomad/client1.hcl
And the link below shows how I configured/ran my consul slave:
https://github.com/bmd007/statefull-geofencing-faas/blob/master/infrastructure/server2.yml
Note: if I use static port mapping and host as the network mode for Docker (in Nomad) I'll be fine, but then I can't deploy more than one instance of each application on each worker node (due to port conflicts).
Nomad jobs listen on a specific host/port pair.
You might want to ssh into the server and run docker ps to see what host/port pair the job is listening on.
a93c5cb46a3e image-name bash 2 hours ago Up 2 hours 10.0.47.2:21435->8000/tcp, 10.0.47.2:21435->8000/udp foo-bar
Additionally, you will need to ensure that the Consul Nomad job is listening on 0.0.0.0, or on the specific IP of the machine. I believe that is this config value: https://www.consul.io/docs/agent/options.html#_bind
All of those will need to match up in order for Consul to be reachable.
More generally, I might recommend: if you're going to run Consul with Nomad, you might want to switch to host networking, so that you don't have to deal with the specifics of the networking within a container. Additionally, you could schedule Consul as a system job so that it is automatically present on every host.
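For reference, a minimal sketch of the relevant Consul agent flags for a client agent on a worker node (the addresses, data dir, and datacenter name here are assumptions, not values from your setup):
# -bind: gossip/RPC address (the worker node's own IP, not 127.0.0.1)
# -client: where the HTTP and DNS APIs listen (0.0.0.0 so bridged containers can reach them)
# -retry-join: address of an existing Consul server (placeholder)
consul agent -bind=10.0.47.2 -client=0.0.0.0 -retry-join=10.0.47.10 -data-dir=/opt/consul -datacenter=dc1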
So I managed to solve the issue like this:
nomad.job.group.network.mode = host
nomad.job.group.network.port: port "http" {}
nomad.job.group.task.driver = docker
nomad.job.group.task.config.network_mode = host
nomad.job.group.task.config.ports = ["http"]
nomad.job.group.task.service.connect: connect { native = true }
nomad.job.group.task.env: SERVER_PORT= "${NOMAD_PORT_http}"
nomad.job.group.task.env: SPRING_CLOUD_CONSUL_HOST = "localhost"
nomad.job.group.task.env: SPRING_CLOUD_SERVICE_REGISTRY_AUTO_REGISTRATION_ENABLED = "false"
Running the Consul agents (slaves) via docker-compose alongside the Nomad agents (slaves) with host as the network mode, and exposing all required ports.
Example of nomad job: https://github.com/bmd007/statefull-geofencing-faas/blob/master/infrastructure/nomad/location-update-publisher.hcl
Example of consul agent config (docker-compose file): https://github.com/bmd007/statefull-geofencing-faas/blob/master/infrastructure/server2.yml
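For anyone mapping the settings above onto an actual job file, here is a rough HCL sketch (the job, group, task, and image names are placeholders; note that newer Nomad versions expect the connect block in a group-level service rather than under the task):
job "spring-app" {
  datacenters = ["dc1"]

  group "app" {
    network {
      mode = "host"
      port "http" {}                  # dynamic port, assigned directly on the host
    }

    service {
      name = "spring-app"
      port = "http"
      connect {
        native = true                 # Connect-native: the app talks to Consul directly, no sidecar proxy
      }
    }

    task "app" {
      driver = "docker"

      config {
        image        = "example/spring-app:latest"   # placeholder image
        network_mode = "host"
        ports        = ["http"]
      }

      env {
        SERVER_PORT              = "${NOMAD_PORT_http}"
        SPRING_CLOUD_CONSUL_HOST = "localhost"
        SPRING_CLOUD_SERVICE_REGISTRY_AUTO_REGISTRATION_ENABLED = "false"
      }
    }
  }
}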
Disclaimer: the LAB is part of a cluster visualization framework called LiteArch Trafik, which I created as an exercise to understand Nomad and Consul.
It took me a long time to shift my mind from K8s to Nomad and Consul; integrating them was one of the efforts I spent on over the last year.
When service resolution doesn't work, I found it usually comes down to the DNS configuration on the servers.
There is a section for it in the HashiCorp documentation called DNS Forwarding:
Hashicorp DNS Forwarding
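As an illustration of what that section covers, one common approach is to forward the .consul domain to the local Consul agent's DNS listener with dnsmasq (the file path is just a convention; systemd-resolved can be configured similarly):
# /etc/dnsmasq.d/10-consul
# Forward queries for the .consul domain to the local Consul agent (its DNS interface listens on 8600 by default)
server=/consul/127.0.0.1#8600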
I have created a LAB which explains how to set up Nomad and Consul.
You can also use the LAB separately.
I created the LAB after learning the hard way how to install the cluster and how to integrate Nomad and Consul.
For the LAB you need Ubuntu Multipass installed.
You execute one script and you get a fully functional cluster locally, with three servers and three nodes.
It also shows how to install Docker and how to integrate the services with Consul and DNS on Ubuntu.
After running the LAB you will get the links to Nomad, Fabio, and Consul.
Hopefully it will guide you through the learning process of Nomad and Consul.
LAB: LAB
Trafik: Trafik Visualizer
I currently have a 5-node Cassandra cluster running with Ec2Snitch in us-west-2.
I want to add a second datacenter in us-west-1, so I need to use Ec2MultiRegionSnitch instead.
How should I go about doing this?
On a test deployment, I tried setting every node's broadcast_address to its public IP, changing the snitch, and using public IPs for the seeds.
After doing this, when running nodetool from a node, I only see UN for that node and DN for all the other nodes, which are only reporting their private IP addresses.
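For context, the changes described above amount to roughly the following in each node's cassandra.yaml (the addresses are placeholders):
# cassandra.yaml (per node; addresses below are placeholders)
endpoint_snitch: Ec2MultiRegionSnitch
listen_address: 10.0.1.15            # the node's private IP
broadcast_address: 54.xx.yy.zz       # the node's public IP
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "54.aa.bb.cc,54.dd.ee.ff"   # public IPs of the seed nodes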
I want to have Hadoop 2.3.0 in a multi-node bare-metal cluster using Docker. I have a master container and a slave container (in this first setup). When the master and slave containers are on the same host (and therefore on the same Flannel subnet), Hadoop works perfectly. However, if the master and slave are on different bare-metal nodes (hence, different Flannel subnets), it simply does not work (I get a connection refused error). Both containers can ping and ssh one another, so there is no connectivity problem. For some reason, it seems that Hadoop needs all the nodes in the cluster to be in the same subnet. Is there a way to circumvent this?
Thanks
I think having the nodes in separate flannel subnets introduces some NAT-related rules which cause such issues.
See the link below, which seems to address a similar issue:
Re: Networking Problem in creating HDFS cluster.
Hadoop uses a bunch of other ports for communication between the nodes; the above assumes these ports are unblocked.
ssh and ping are not enough. If you have iptables or any other firewall, you need to either disable it or open up the ports. You can set up the cluster as long as the hosts can communicate with each other and the ports are open. Run telnet <namenode> <port> to ensure the hosts are communicating on the desired ports.
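For example, from the slave container you could check the NameNode's default RPC and web UI ports (the hostname is a placeholder; adjust the ports to whatever fs.defaultFS and dfs.namenode.http-address are set to in your config):
telnet master-container 8020     # NameNode RPC (fs.defaultFS) on Hadoop 2.x
telnet master-container 50070    # NameNode web UI on Hadoop 2.x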
When the Mesos or Marathon service restarts for some reason and the Mesos and Marathon leaders end up on different machines, deployments get stuck in Marathon and nothing happens in Mesos. This leads to terrible results: Marathon cannot restart failed services and does nothing with deployments until the leaders match again.
Our cluster has 3 masters (installed through the Mesosphere website), and this situation happens quite often. Is there any way to fix that?
Marathon v.0.9.0
Mesos v0.22.1
It sounds like either Mesos or Marathon is binding to a local address (localhost/127.0.0.1), so they aren't able to talk to each other.
You should be able to solve your issue by setting a routable IP using the respective --ip command-line flag or the LIBPROCESS_IP environment variable.
One particularly useful setting is LIBPROCESS_IP, which tells the master and slave binaries which IP address to bind to; in some installations, the default interface that the hostname resolves to is not the machine’s external IP address, so you can set the right IP through this variable.
Source: http://mesos.apache.org/documentation/latest/deploy-scripts/
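A rough sketch of what that looks like in practice (the addresses, ZooKeeper URL, and quorum are placeholders for your own values):
# Bind the master and slave to a routable IP instead of 127.0.0.1
sudo LIBPROCESS_IP=10.10.0.5 ./bin/mesos-master.sh --ip=10.10.0.5 --work_dir=/var/lib/mesos --zk=zk://10.10.0.5:2181/mesos --quorum=2
sudo LIBPROCESS_IP=10.10.0.6 ./bin/mesos-slave.sh --ip=10.10.0.6 --master=zk://10.10.0.5:2181/mesos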
I am working through this tutorial: http://mesosphere.io/docs/getting-started/cloud-install/
Just learning on an Ubuntu instance on Digital Ocean, I let the master process bind to the public IP, and the Mesos and Marathon web interfaces became publicly accessible. No surprises there.
Do Mesos and Marathon rely on ZooKeeper to create private IPs between instances? Could you skip using ZooKeeper by manually setting up a private network between instances? Would the proper way to start the master and slave processes then be to bind to the secondary, private IPs of each instance?
Digital Ocean can set up private IPs automatically, but this is kind of a learning exercise for me. I am aware of the broad rule that administrator access to a server shouldn't come through a public IP. Another way of phrasing this question is: does private networking provide the security for Mesos and Marathon?
I'm only starting with one Ubuntu instance, running both the master and the slave, for now. I realize binding to the loopback address would fix this issue for just one machine.
ZooKeeper is used for a few different things for both Marathon and Mesos:
1. Leader election
2. Storing state
3. Resolving the Mesos masters
At the moment, you can't skip ZooKeeper entirely because of 2 and 3 (although later versions of Mesos have their own registry which keeps track of state). AFAIK, Mesos doesn't rely on ZooKeeper for creation of private IPs - it'll bind to whatever is available (but you can force this via the ip parameter). So, you won't be able to forgo ZooKeeper entirely with a private network.
Private networking will provide some security for Mesos and Marathon - assuming you firewall off their access to the external world.
A good (although not necessarily the best) solution for keeping the instances on a private network is to set up an OpenVPN (or similar) network to one of the masters. Then, launch each instance on its private IP and make sure you also set the hostname parameter to that IP. Connect to the Mesos/Marathon web consoles via their private IPs and the VPN, and everything should resolve correctly.
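Concretely, that might look like this on a master and a slave whose private/VPN addresses are 10.8.0.1 and 10.8.0.2 (example addresses only):
./bin/mesos-master.sh --ip=10.8.0.1 --hostname=10.8.0.1 --work_dir=/var/lib/mesos --zk=zk://10.8.0.1:2181/mesos --quorum=1
./bin/mesos-slave.sh --ip=10.8.0.2 --hostname=10.8.0.2 --master=zk://10.8.0.1:2181/mesos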
Mesos and Marathon don't create private IPs between instances.
For that, I suggest you use tinc, or directly a tinc Docker image.
Using this, I was able to do the config you want in 5 minutes. It's easier to configure than OpenVPN, and each host can connect to the others; there is no need for a VPN server to route all the traffic.
Each node stores a private and a public key for connecting to each server of the private network.
You should set up a private network for using Mesos.
After that, you can add all the hosts, with their internal-network IPs, to /etc/hosts.
You will then be able to bind ZooKeeper using the private network:
zk://master-1:2181,master-2:2181,master-3:2181
Then the proper way to start the master and slave processes is to bind to the secondary private IPs of each instance.
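For example, /etc/hosts on every machine could carry the private addresses behind the names used in the ZooKeeper connection string above (the addresses are placeholders):
# /etc/hosts on each master and slave (private-network addresses)
192.168.10.1  master-1
192.168.10.2  master-2
192.168.10.3  master-3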