Mesos master configuration on DC/OS

I am in the process of creating a DC/OS cluster on AWS for running Kafka->Spark->Cassandra workloads.
I am interested in the minimum specification for a master node in a DC/OS environment. I see that DC/OS suggests m3.xlarge instances, but I do not know why I would need 4 processors and 15 GB of RAM when the master only runs the processes described at: https://docs.mesosphere.com/overview/architecture/
-> There is no actual data processing performed by the master.
I would maybe go with m3.large or r3.large instances.
Kindest regards,
Srdjan

DC/OS masters are not used to run any heavy computation, but their memory usage tends to be quite high, so a large instance is recommended.
There might be a way to use smaller instances and compensate for the missing memory with swap files, but straying from the vendor's recommendations should only be done after careful consideration of the potential consequences.
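If you want to check how much memory your masters actually use before downsizing, the master's metrics endpoint is an easy thing to poll. A minimal sketch (the master hostname is a placeholder, and the metric names assume a reasonably recent Mesos; verify against your version's /metrics/snapshot output):

```python
import requests

# Placeholder address for a DC/OS / Mesos master; adjust for your cluster.
MASTER = "http://master.example.com:5050"

# The master publishes gauges and counters at /metrics/snapshot.
snapshot = requests.get(f"{MASTER}/metrics/snapshot", timeout=5).json()

total = snapshot["system/mem_total_bytes"]
free = snapshot["system/mem_free_bytes"]
print(f"host memory used: {(total - free) / 2**30:.1f} GiB of {total / 2**30:.1f} GiB")
print(f"active agents:    {snapshot.get('master/slaves_active', 'n/a')}")
```

Watching these numbers while your agent count and framework load grow should tell you whether an m3.large/r3.large would actually hold up for your cluster.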

Related

k8s tasks slowdown with no excess CPU or RAM usage

I have a small virtualised k8s cluster running on top of KVM on 2 physical machines. After deploying Ceph (a storage framework), all the k8s tasks like creating or starting containers became insufferably slow, taking over a minute to get from creating to starting a container.
I checked the nodes for excess CPU or RAM usage; both the worker nodes and the master node are well below consuming half of their assigned resources. I have about 10-20 pods running on each node at the moment.
I am not sure what to google, and given my level of k8s knowledge I am completely out of ideas. Anyone with similar experience, or who could point me in the right direction, would be much appreciated!

Is there a way to force re-election in an Apache Mesos master quorum?

We have an Apache Mesos master running in HA mode with 3 nodes (each with 4 CPUs and 15 GB of memory). This cluster stops offering resources when the memory gets completely exhausted (this happens every week).
We have >200 agents connected to this master and the number keeps growing, so the long-term solution is to increase CPU and memory. But until we get bigger VMs, we have to babysit the cluster every day, monitoring the CPU load and memory and restarting the mesos-master service (which forces a re-election) as a precaution.
To avoid this manual effort, we are planning to force a re-election of this cluster at a fixed interval, say every 2 days.
So my question is whether the Mesos master has support for forcing a re-election like this; if so, how, is it recommended, and does it have any caveats?
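For context, the manual babysitting we want to automate looks roughly like this (a sketch only: hostnames and the threshold are placeholders, and the restart itself is still done by hand on the leading master's host):

```python
import requests

# Placeholder hostnames for our three masters; adjust to the real ones.
MASTERS = ["master1:5050", "master2:5050", "master3:5050"]
MIN_FREE_BYTES = 2 * 2**30  # arbitrary threshold: alert below 2 GiB free

for host in MASTERS:
    m = requests.get(f"http://{host}/metrics/snapshot", timeout=5).json()
    is_leader = m.get("master/elected") == 1
    free = m["system/mem_free_bytes"]
    load = m["system/load_5min"]
    print(f"{host} leader={is_leader} free={free / 2**30:.1f} GiB load5={load}")
    if is_leader and free < MIN_FREE_BYTES:
        # This is the point where we currently restart mesos-master by hand
        # on that host, which triggers a leader election among the others.
        print(f"  -> leader {host} is low on memory; restart it to force re-election")
```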
I appreciate your time in answering and helping me out here!

Running Hadoop in a virtual environment

I would like to know whether I should expect problems when running a Hadoop cluster on virtual instead of physical machines.
I'm mostly worried about using the same hard drive; I have read that I should count on 1-2 containers per drive, but in my case only one drive will exist. Could that be a problem?
I think it depends on how much memory you allocate per container. Of course, there is a limit to the number of containers you can run if memory is constrained: for example, a node with 16 GB of RAM and 2 GB per container can only run a handful of containers once memory is reserved for the OS and the Hadoop daemons.
I can highlight a few points to consider when running a Hadoop cluster in a virtual environment:
Network configuration in the case of a multi-node cluster
Obviously, the performance of the application
Effect on scalability, since resources are limited if you are planning to run the cluster on a host with low-end hardware

How to select the CPU parameter for Marathon apps run on Mesos?

I've been playing with a Mesos cluster for a little bit, and am thinking of utilizing it in our production environment. One problem I can't seem to find an answer to: how do I properly schedule long-running apps that will have varying load?
Marathon has a "cpus" property, where you can set a weight for CPU allocation for a particular app. (I'm planning on running Docker containers.) But from what I've read, it is only a weight, not a reservation, allocation, or limitation that I am setting for the app. The app can still use 100% of the CPU on the server if it's the only thing running. The problem is that for long-running apps, resource demands change over time: a web server's load, for example, is directly proportional to its traffic. Coupled with Mesos treating this setting as a "reservation," I am choosing between two evils. Set it too low, and Marathon may start too many processes on the same host and all of them will suffer, with the host CPU going past 100%. Set it too high, and CPU will sit idle, since the reservation is made (or so Mesos thinks) but nothing is using those resources.
How do you approach this problem? Am I missing something in how Mesos and Marathon handle resources?
I was thinking of an ideal way of doing this:
Specify a weight for CPU for different apps (on the order of, say, 0.1 through 1), so that when the going gets tough, the higher priority gets more (as it is right now)
Have the Mesos slave report "Available LA" with its status (e.g. if the 10-minute LA is 2, with 8 CPUs available, report 6 "Available LA")
Configure Marathon to require an "Available LA" resource on the slave to schedule a task (e.g. don't start on a particular host if its Available LA is < 2)
When Available LA goes to 0 (due to an influx of traffic at the same time as some job was started on the same server before the influx), have Marathon move jobs to another slave, one that has more "Available LA"
Is there a way to achieve any of this?
So far, I gather that I could possibly write a custom isolator module that will run on the slaves and report this custom metric to the master. Then I could use it in resource negotiation. Is this true?
I wasn't able to find anything about Marathon rescheduling tasks on different nodes if one becomes overloaded. Any suggestions?
As of Mesos 0.23.0, oversubscription is supported. Unfortunately, it is not yet implemented in Marathon: https://github.com/mesosphere/marathon/issues/2424
In order to do allocation dynamically, you can use the Mesos slave metrics along with the Marathon HTTP API to scale, as I've done here in a different context. My colleague Niklas did related work with nibbler, which might also be of help.
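As a rough illustration of that approach (this is not the code from the linked examples; the agent and Marathon hostnames, the app id, and the thresholds are all invented for the sketch), you could poll an agent's load average and adjust an app's instance count through Marathon's REST API:

```python
import requests

AGENT = "http://agent1.example.com:5051"       # a Mesos agent (slave); placeholder host
MARATHON = "http://marathon.example.com:8080"  # Marathon endpoint; placeholder host
APP_ID = "/web-frontend"                       # the app to scale; placeholder id

# 1. Read the load average and CPU count from the agent's metrics endpoint
#    (Mesos exposes 1/5/15-minute load averages, not a 10-minute one).
m = requests.get(f"{AGENT}/metrics/snapshot", timeout=5).json()
available_la = m["system/cpus_total"] - m["system/load_5min"]

# 2. Read the current instance count from Marathon.
app = requests.get(f"{MARATHON}/v2/apps{APP_ID}", timeout=5).json()["app"]
instances = app["instances"]

# 3. Naive policy: add an instance when the agent is saturated,
#    remove one when there is plenty of headroom.
if available_la <= 0:
    instances += 1
elif available_la > 2 and instances > 1:
    instances -= 1

# Scaling an app is just a PUT with the new instance count.
requests.put(f"{MARATHON}/v2/apps{APP_ID}", json={"instances": instances}, timeout=10)
```

A loop like this run from cron (or as a Marathon app itself) approximates the "Available LA"-based scheduling described in the question, at the application level rather than inside the Mesos allocator.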

What keeps the cluster resource manager running?

I would like to use Apache Marathon to manage resources in a clustered product. Mesos and Marathon solve some of the "cluster resource manager" problems for additional components that need to be kept running with HA, failover, etc.
However, there are a number of services that need to be kept running to keep Mesos and Marathon themselves running (like ZooKeeper, Mesos itself, etc.). What can we use to keep those services running with HA, failover, etc.?
It seems like solving this across a cluster (managing how many instances of ZooKeeper, etc. there are, where they run, and how they fail over) is exactly the problem that Mesos/Marathon are trying to solve.
As the Mesos HA doc explains, you can start multiple Mesos masters and let ZK elect the leader. Then if your leading master fails, you still have at least 2 left to handle things. It is common to use something like systemd to automatically restart the mesos-master on the same host if it's still healthy, or something like Amazon AutoScalingGroups to ensure you always have 3 master machines even if a host dies.
The same can be done for Marathon in its HA mode (on by default if you start multiple instances pointing to the same znode). Many users start these on the same 3 nodes as their Mesos masters, using systemd to restart failed Marathon services, and the same ASG to ensure there are 3 Mesos/Marathon master nodes.
These same 3 nodes are often configured to be the ZK quorum as well, so there are only 3 nodes you have to manage for all these services running outside of Mesos.
Conceivably, you could bootstrap both Mesos-master and Marathon into the cluster as Marathon/Mesos tasks. Spin up a single Mesos+Marathon master to get the cluster started, then create a Mesos-master app in Marathon to launch 2-3 masters as Mesos tasks, and a Marathon-master app in Marathon to launch a couple of HA Marathon instances (as Mesos tasks). Once those are healthy, you can kill the original standalone Mesos/Marathon master and the cluster would failover to the self-hosted Mesos and Marathon masters, which would be automatically restarted elsewhere on the cluster if they failed. Maybe this would work with ZK too. You'd probably need something like Mesos-DNS and/or ELB to let other services find Mesos/Marathon. I doubt anybody's running Mesos this way, but it's crazy enough it just might work!
In order to understand this, I suggest you spend a few minutes reading up on the architecture and the HA part in the official Mesos doc. There, it is clearly explained how HA/failover in Mesos core is handled (which is, BTW, nothing magic—many systems I know of use pretty much exactly this model, incl. HBase, Storm, Kafka, etc.).
Also, note that, naturally, the challenge of keeping a handful of Mesos masters/ZK nodes alive is not directly comparable with keeping potentially tens of thousands of processes across a cluster alive, evicting them, or failing them over (in terms of fan-out, memory footprint, throughput, etc.).
