Marathon loses control over Mesos when Marathon and Mesos leaders mismatch - mesos

When mesos or marathon service restart due to some reasons and leader of mesos and marathon is not on the same machine, deployments stuck in marathon and nothing happens in mesos, that leads to terrible results when marathon can not restart failed services and do nothing with deployments until leaders will not match again.
Our cluster has 3 masters (installed through mesosphere website) and this situation happens quite often, is there any way to fix that?
Marathon v.0.9.0
Mesos v0.22.1

It sounds like either Mesos or Marathon use a private ip (localhost/127.0.0.1), thus they weren't able to talk to each other.
You should be able to solve your issue by setting a public ip using the respective --ip command line flag or LIBPROCESS_IP environment var.
One particularly useful setting is LIBPROCESS_IP, which tells the master and slave binaries which IP address to bind to; in some installations, the default interface that the hostname resolves to is not the machine’s external IP address, so you can set the right IP through this variable.
/source http://mesos.apache.org/documentation/latest/deploy-scripts/

Related

Nomad and consul setup

Should I run consul slaves alongside nomad slaves or inside them?
The later might not make sense at all but I'm asking it just in case.
I brought my own nomad cluster up with consul slaves running alongside nomad slaves (inside worker nodes), my deployable artifacts are docker containers (java spring applications).
The issue with my current setup is that my applications can't access consul slaves (to read configurations) (none of 0.0.0.0, localhost, worker node ip worked)
Lets say my service exposes 8080, I configured docker part (in hcl file) to use bridge as network mode. Nomad maps 8080 to 43210.
Everything is fine until my service tries to reach the consul slave to read configuration. Ideally giving nomad worker node IP as consul host to Spring should suffice. But for some reason it's not.
I'm using latest version of nomad.
I configured my nomad slaves like https://github.com/bmd007/statefull-geofencing-faas/blob/master/infrastructure/nomad/client1.hcl
And the link below shows how I configured/ran my consul slave:
https://github.com/bmd007/statefull-geofencing-faas/blob/master/infrastructure/server2.yml
Note: if I use static port mapping and host as the network mode for docker (in nomad) I'll be fine but then I can't deploy more than one instance of each application in each worker node (due to port conflic)
Nomad jobs listen on a specific host/port pair.
You might want to ssh into the server and run docker ps to see what host/port pair the job is listening on.
a93c5cb46a3e image-name bash 2 hours ago Up 2 hours 10.0.47.2:21435->8000/tcp, 10.0.47.2:21435->8000/udp foo-bar
Additionally, you will need to ensure that the consul nomad job is listening on port 0.0.0.0, or the specific ip of the machine. I believe that is this config value: https://www.consul.io/docs/agent/options.html#_bind
All those will need to match up in order to consul to be reachable.
More generally, I might recommend: if you're going to run consul with nomad, you might want to switch to host networking, so that you don't have to deal with the specifics of the networking within a container. Additionally, you could schedule consul as a system job so that it is automatically present on every host.
So I managed to solve the issue like this:
nomad.job.group.network.mode = host
nomad.job.group.network.port: port "http" {}
nomad.job.group.task.driver = docker
nomad.job.group.task.config.network_mode = host
nomad.job.group.task.config.ports = ["http"]
nomad.job.group.task.service.connect: connect { native = true }
nomad.job.group.task.env: SERVER_PORT= "${NOMAD_PORT_http}"
nomad.job.group.task.env: SPRING_CLOUD_CONSUL_HOST = "localhost"
nomad.job.group.task.env: SPRING_CLOUD_SERVICE_REGISTRY_AUTO_REGISTRATION_ENABLED = "false"
Running consul agent (slaves) using docker-compose alongside nomad agent (slave) with host as network mode + exposing all required ports.
Example of nomad job: https://github.com/bmd007/statefull-geofencing-faas/blob/master/infrastructure/nomad/location-update-publisher.hcl
Example of consul agent config (docker-compose file): https://github.com/bmd007/statefull-geofencing-faas/blob/master/infrastructure/server2.yml
Disclaimer: The LAB is part of Cluster Visualization Framework called: LiteArch Trafik which I have created as an interesting exercise to understand Nomad and Consul.
It took me long time to shift my mind from K8S to Nomad and Consul,
Integration them was one of my effort I spent in the last year.
When service resolution doesn't work, I found out it's more or less the DNS configuration on servers.
There is a section for it on Hashicorp documentation called DNS Forwarding
Hashicorp DNS Forwarding
I have created a LAB which explains how to set up Nomad and Consul.
But you can use the LAB seperately.
I created the LAB after learning the hard way how to install the cluster and how to integrate Nomad and Consul.
With the LAB you need Ubuntu Multipass installed.
You execute one script and you will get full functional Cluster locally with three servers and three nodes.
It shows you as well how to install docker and integrate the services with Consul and DNS services on Ubuntu.
After running the LAB you will get the links to Nomad, Fabio, Consul.
Hopefully it will guide you through the learning process of Nomad and Consul
LAB: LAB
Trafik:Trafik Visualizer

DC/OS - Dashboard showing 0 connected nodes

After restarting my 3 masters in my DC/OS cluster, the DC/OS dashboard is showing 0 connected nodes. However from the DC/OS cli I see all 6 of my agent nodes:
$ dcos node
HOSTNAME IP ID
172.16.1.20 172.16.1.20 a7af5134-baa2-45f3-892e-5e578cc00b4d-S7
172.16.1.21 172.16.1.21 a7af5134-baa2-45f3-892e-5e578cc00b4d-S12
172.16.1.22 172.16.1.22 a7af5134-baa2-45f3-892e-5e578cc00b4d-S8
172.16.1.23 172.16.1.23 a7af5134-baa2-45f3-892e-5e578cc00b4d-S6
172.16.1.24 172.16.1.24 a7af5134-baa2-45f3-892e-5e578cc00b4d-S11
172.16.1.25 172.16.1.25 a7af5134-baa2-45f3-892e-5e578cc00b4d-S10`
I am still able to schedule tasks in Marathon both from the dcos cli and from the Marathon gui, they then are properly scheduled and executed on the agents. Also, from the mesos interface on :5050 I can see all of the agents in the slaves page.
I have restarted agent nodes and master nodes. I have also rerun the DC/OS GUI installer and run preflight check, which of course fails with an "already installed" error.
Is there a way to re-register the node with DC/OS GUI short of uninstalling/reinstalling a node?
For anyone who is running into this, my problem was related to our corporate proxy. In order to get the Universe working in my cluster I had to add proxy settings to /opt/mesosphere/environment. I then restarted the dcos-cosmos.service and life was good. However, upon server restart, dcos-history-service.service was now running with the new environment and was unable to resolve my local names with our proxy server. To solve, I added a NO_PROXY to the /opt/mesosphere/environment and DCOS dashboard is again happy.

How to reliably discover Marathon URL

Since webui_url entry in Mesos state JSON is optional, one can try the luck with hostname (which is also optional).
However, if both of the above entries are missing from Mesos state, is there any other way to reliably discover where Marathon API server is listening?
Furthermore, if a Marathon instance is migrated to another location, Mesos webui_url seems to retain the old, stale value. This looks like a bug? Is there any workaround?
As far as I know you should be able to use the Mesos UI ("Frameworks" -> "Active Frameworks") to find the marathon framework.
If the hostname resolution of the host Marathon is running on works correctly, you should be able to click on the link of the "host" column and be redirected to the Marathon UI.
See
https://mesosphere.github.io/marathon/docs/command-line-flags.html
--hostname (Optional. Default: hostname of machine): The advertised hostname that is used for the communication with the mesos master. The value is also stored in the persistent store so another standby host can redirect to the elected leader. Note: Default is determined by InetAddress.getLocalHost.

Hadoop Cluster distributed in different sub-networks (Docker + Flannel)

I want to have Hadoop 2.3.0 in a multi bare-metal cluster using Docker. I have a master container and a slave container (in this first setup). When Master and Slave containers are in the same host (and therefore, same Flannel subnet), Hadoop works perfectly. However, if the Master and Slave are in different bare metal nodes (hence, different flannel subnets), it simply does not work (I get a connection refused error). Both containers can ping and ssh one another, so there is no connectivity problem. For some reason, it seems that hadoop needs all the nodes in the cluster to be in the same subnet. Is there a way to circumvent this?
Thanks
I think having the nodes in separate flannel subnets introduces some NAT-related rules which cause such issues.
See the below link which seems to have addressed a similar issue
Re: Networking Problem in creating HDFS cluster.
Hadoop uses a bunch of other ports for communication between the nodes, the above assumes these ports are unblocked.
ssh and ping are not enough. If you have iptables or any other firewalls, either you need to disable or open up the ports. You can set up the cluster, as long as hosts can communicate with each other and ports are open. Run telnet <namenode> <port> to ensure hosts are communicating on desired ports.

Private networking necessary for Mesos and Marathon?

I am working through this tutorial: http://mesosphere.io/docs/getting-started/cloud-install/
Just learning on an Ubuntu instance on Digital Ocean, I let the master process bind to the public IP, and the Mesos and Marathon web interfaces became publicly accessible. No surprises there.
Do Mesos and Marathon rely on Zookeeper to create private IPs between instances? Could you skip using Zookeeper by manually setting up a private network between instances? Then the proper way to start the master and slave processes is to bind to the secondary, private IPs of each instance?
Digital Ocean can set up private IPs automatically, but this is kind of a learning exercise for me. I am aware of the broad rule that administrator access to a server shouldn't come through a public IP. Another way of phrasing this posting is, does private networking provide the security for Mesos and Marathon?
Only starting with one Ubuntu instance, running both master and slave, for now. Binding to the loopback address would fix this issue for just one machine, I realize.
ZooKeeper is used for a few different things for both Marathon and Mesos:
Leader election
Storing state
Resolving the Mesos masters
At the moment, you can't skip ZooKeeper entirely because of 2 and 3 (although later versions of Mesos have their own registry which keeps track of state). AFAIK, Mesos doesn't rely on ZooKeeper for creation of private IPs - it'll bind to whatever is available (but you can force this via the ip parameter). So, you won't be able to forgo ZooKeeper entirely with a private network.
Private networking will provide some security for Mesos and Marathon - assuming you firewall off their access to the external world.
A good (although not necessarily the best) solution for keeping the instances on a private network is to set up an OpenVPN (or similar) network to one of the masters. Then, launch each instance on its private IP and make you also set the hostname parameter to that IP. Connect to the Mesos/Marathon web consoles via their private IP and the VPN and all should resolve correctly.
Mesos and marathon doesn't create private IPs between instance.
For that, I suggest you use tinc or directly a docker image tinc
Using this, I was able to do the config you want in 5 minutes, it's easier to configure than openvpn, and each host can connect to another, no need to use a vpn server to route all the traffic.
Each node will store a private and public for connecting to each server of the private network.
You should setup a private network for using mesos.
After that, you can add in /etc/hosts all the hosts with the IP of the internal network.
You will be able to bind zookeeper using the private network :
zk://master-1:2181,master-2:2181,master-3:2181
Then the proper way to start the master and slave processes is to bind to the secondary private IPs of each instance.

Resources