Mesos Cgroup resource usage

I am building an autoscaling system using Mesos and Marathon. The scenario: I want to autoscale a task if it consumes more than 80% of the memory allocated to it. How do I find out the amount of memory used at the cgroup level?
Is this the right approach?

You can get the statistics for each task by hitting http://host:5051/monitor/statistics.json, where host is the Mesos slave.
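For example, a minimal Python sketch of that check (the agent address is a placeholder; mem_rss_bytes and mem_limit_bytes are the usage and cgroup-limit fields the endpoint exposes):
import requests
AGENT = "http://host:5051"  # replace "host" with your Mesos slave's address
# /monitor/statistics.json returns a list of per-executor resource statistics.
for entry in requests.get(AGENT + "/monitor/statistics.json").json():
    stats = entry["statistics"]
    used = stats.get("mem_rss_bytes", 0)   # current memory usage of the cgroup
    limit = stats.get("mem_limit_bytes")   # memory limit of the cgroup
    if limit and used / limit > 0.8:
        print("%s uses %.0f%% of its memory limit -> scale up"
              % (entry["executor_id"], 100.0 * used / limit))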
This repo will give you an idea of how to autoscale Marathon applications.

Related

How to modify the RAM requirement of an Aurora task in a Heron cluster deployed on the Aurora scheduler?

I deployed a Heron cluster using the Aurora scheduler and Mesos. When I ran the default WordCountTopology on this cluster, I found that the RAM demand of the Aurora task is 4G. However, the WordCountTopology's configuration is as follows:
componentRam: 1G
containerRamRequested: 1G
containerCpuRequested: 2 cores
containerDiskRequested: 2G
Aurora task.json content is:
It shows that this Aurora task needs 4G of RAM, but I don't know why it requests that much. How can I modify this RAM requirement?
In addition, there are two slave hosts in my Heron cluster, and their resources are:
In addition to the RAM requested by the topology's components, some additional resources (CPU, memory) are requested for Heron's daemon processes, e.g. the stream manager (see how additional CPU is packed in the RoundRobin packing algorithm).
A second cause of the larger resource request is that Aurora only allows homogeneous containers. The packing algorithm therefore picks the maximum container resources as the resource request for all containers. For example, if a topology has two containers, one requesting 2 CPUs and the other 3 CPUs, then eventually all containers will request 3 CPUs.
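A toy sketch of that rounding-up effect (illustrative only, not Heron's actual packing code):
# Containers as requested by the topology: (cpus, ram_gb) per container.
requested = [(2.0, 1.0), (3.0, 1.0)]
# Aurora only allows homogeneous containers, so every container is sized
# to the maximum request along each resource dimension.
max_cpus = max(cpus for cpus, _ in requested)
max_ram = max(ram for _, ram in requested)
containers = [(max_cpus, max_ram)] * len(requested)
print(containers)  # [(3.0, 1.0), (3.0, 1.0)] -- both containers now request 3 CPUs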

How do I find a Marathon runaway process?

I have a Mesos/Marathon system, and it is working well for the most part. There are upwards of 20 processes running, most of them using only part of a CPU. However, sometimes (especially during development) a process will spin up and start using as much CPU as is available. I can see on my system monitor that there is a pegged CPU, but I can't tell which Marathon process is causing it.
Is there a monitoring app showing CPU usage for Marathon jobs, something that shows it over time? This would also help with understanding scaling and CPU requirements. Tracking memory usage would be good, but is secondary to CPU.
It seems that you haven't configured any isolation mechanism on your agent (slave) nodes. mesos-slave comes with an --isolation flag that defaults to posix/cpu,posix/mem, which means isolation at the process level (pretty much no isolation at all). Using cgroups/cpu,cgroups/mem isolation ensures that a given task will be killed by the kernel if it exceeds its memory limit. Memory is a hard constraint that can be easily enforced.
Restricting CPU is more complicated. If you have a machine that offers 8 CPU cores to Mesos and each of your tasks is set to require cpus=2.0, you'll be able to run at most 4 tasks there. That's easy, but at any given moment any of your 4 tasks might utilize all idle cores, so if one of your jobs misbehaves, it might affect other jobs running on the same machine. For restricting CPU utilization, see the Completely Fair Scheduler (or the related question How to understand CPU allocation in Mesos? for more details).
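As a rough sketch of the arithmetic behind a hard CFS cap (the 100ms period is the Linux default; the numbers are illustrative):
CFS_PERIOD_US = 100000  # scheduling period of 100ms, the Linux default
def cfs_quota_us(cpus):
    # A cgroup capped at `cpus` gets cpus * period of runtime per period;
    # once the quota is used up, the kernel throttles it until the next period.
    return int(cpus * CFS_PERIOD_US)
# cpus=2.0 -> 200000us of CPU time per 100ms period, i.e. at most 2 full cores.
print(cfs_quota_us(2.0))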
Regarding monitoring, there are many possibilities available; choose an option that suits your requirements. You can combine several of them; some are open-source, others enterprise-level solutions (in random order):
collectd for gathering stats, Graphite for storing, Grafana for visualization
Telegraf for gathering stats, InfluxDB for storing, Grafana for visualization
Prometheus for storing and gathering data, Grafana for visualization
Datadog for a cloud based monitoring solution
Sysdig platform for monitoring and deep insights
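If you just need to spot the runaway quickly, a minimal polling sketch against the agent's /monitor/statistics.json endpoint can rank tasks by CPU burn (the agent address is a placeholder; cpus_user_time_secs and cpus_system_time_secs are cumulative counters):
import time
import requests
AGENT = "http://host:5051"  # the Mesos agent on the machine with the pegged CPU
def cpu_seconds():
    # Map executor_id -> cumulative CPU seconds (user + system) so far.
    totals = {}
    for entry in requests.get(AGENT + "/monitor/statistics.json").json():
        s = entry["statistics"]
        totals[entry["executor_id"]] = (s.get("cpus_user_time_secs", 0.0)
                                        + s.get("cpus_system_time_secs", 0.0))
    return totals
INTERVAL = 10
before = cpu_seconds()
time.sleep(INTERVAL)
after = cpu_seconds()
# CPU seconds burned per wall-clock second is roughly the number of cores in use.
usage = {k: (v - before.get(k, 0.0)) / INTERVAL for k, v in after.items()}
for executor, cores in sorted(usage.items(), key=lambda kv: kv[1], reverse=True):
    print("%s: ~%.2f cores" % (executor, cores))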

Mesos: what are the OS-level techniques for resource allocation?

I understand the Mesos architecture at a high level, but I'm not clear about the OS-level techniques used to implement resource allocation. For example, if Mesos offers one framework 1 CPU and 400MB of memory, and another framework 2 CPUs and 1GB of memory, how is this actually implemented at the OS level?
tl;dr: Mesos itself doesn't "allocate" any resources at the OS level. The resources are still allocated by the OS, although Mesos can use OS-level primitives like cgroups to ensure that a task doesn't use more resources than it should.
The Mesos agent at the node advertises that some resources are available at the host (e.g., 4 CPUs and 16GB of RAM) -- either by auto-detecting what is available at the host or because the available resources have been explicitly configured (recommended for production).
The master then offers those resources to a framework.
The framework can then launch a task, using some or all of the resources available at the agent: e.g., the framework might launch a task with 2 CPUs and 8GB of RAM.
The agent then launches an executor to run the task.
How strictly the "2 CPUs and 8GB of RAM" resource limit is enforced depends on how Mesos is configured. For example, if the agent host supports cgroups and the agent is started with --isolation='cgroups/cpu,cgroups/mem', cgroups will be used to throttle the CPU appropriately, and to kill the task if it tries to exceed its memory allocation.
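You can see those limits directly on the agent host; a minimal sketch, assuming a cgroup-v1 hierarchy and Mesos' default cgroup root of "mesos" (paths may differ on your distribution):
import glob
import os
# With cgroups/mem enabled, each Mesos container gets its own memory cgroup.
for path in glob.glob("/sys/fs/cgroup/memory/mesos/*/"):
    with open(os.path.join(path, "memory.limit_in_bytes")) as f:
        limit = int(f.read())
    with open(os.path.join(path, "memory.usage_in_bytes")) as f:
        usage = int(f.read())
    container = os.path.basename(path.rstrip("/"))
    print("%s: %d MB used of %d MB limit" % (container, usage >> 20, limit >> 20))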

Mesos master configuration on DC/OS

I am in the process of creating a DC/OS cluster on AWS for running Kafka->Spark->Cassandra workloads.
I am interested in the minimum specification for the master node in a DC/OS environment. I see that DC/OS suggests m3.xlarge instances, but I do not know why I need 4 processors and 15GB of RAM when the master only runs the processes described at https://docs.mesosphere.com/overview/architecture/, which states: "There is no actual data processing performed by the master."
I would maybe go with m3.large or r3.large instances.
DC/OS masters are not used to run any heavy computation loads, but their memory usage tends to be substantial, so a large instance is recommended.
There might be a way to use smaller instances and compensate for the missing memory using swap files, but straying from the supplier's recommendations should only be done after careful consideration of the potential consequences.

How to select the CPU parameter for Marathon apps run on Mesos?

I've been playing with a Mesos cluster for a little bit and am thinking of utilizing it in our production environment. One problem I can't seem to find an answer to: how do you properly schedule long-running apps that will have varying load?
Marathon has a "CPUs" property where you can set a weight for CPU allocation to a particular app (I'm planning on running Docker containers). But from what I've read, it is only a weight, not a reservation, allocation, or limitation that I am setting for the app: the app can still use 100% of the CPU on the server if it's the only thing running. The problem is that for long-running apps, resource demands change over time; a web server's demand, for example, is directly proportional to its traffic. Coupled with Mesos treating this setting as a "reservation", I am choosing between two evils: set it too low, and too many processes may be started on the same host and all of them will suffer, with host CPU going past 100%; set it too high, and CPU will sit idle, since the reservation is made (or so Mesos thinks) but nothing is using those resources.
How do you approach this problem? Am I missing something in how Mesos and Marathon handle resources?
I was thinking of an ideal way of doing this:
Specify a weight for CPU for different apps (on the order of, say, 0.1 through 1), so that when the going gets tough, higher-priority apps get more (as is the case right now)
Have Mesos slave report "Available LA" with its status (e.g. if 10 minute LA is 2, with 8 CPUs available, report 6 "Available LA")
Configure Marathon to require "Available LA" resource on the slave to schedule a task (e.g. don't start on particular host if Available LA is < 2)
When available LA goes to 0 (due to an influx of traffic at the same time as some job was started on the same server before the influx), have Marathon move jobs to another slave, one that has more "Available LA"
Is there a way to achieve any of this?
So far, I gather that I can possibly write a custom isolator module that will run on the slaves and report this custom metric to the master. Then I can use it in resource negotiation. Is this true?
I wasn't able to find anything on Marathon rescheduling tasks on different nodes if one becomes overloaded. Any suggestions?
As of Mesos 0.23.0, oversubscription is supported. Unfortunately, it is not yet implemented in Marathon: https://github.com/mesosphere/marathon/issues/2424
To do allocation dynamically, you can use the Mesos slave metrics along with the Marathon HTTP API to scale, as I've done here in a different context. My colleague Niklas did related work with nibbler, which might also be of help.
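For illustration, scaling an app through the Marathon HTTP API is a single PUT; a minimal sketch (the Marathon address and app id are placeholders):
import requests
MARATHON = "http://marathon.example.com:8080"  # placeholder Marathon address
APP_ID = "/my-app"                             # placeholder application id
# Read the current instance count, then scale out by one instance.
app = requests.get(MARATHON + "/v2/apps" + APP_ID).json()["app"]
requests.put(MARATHON + "/v2/apps" + APP_ID, json={"instances": app["instances"] + 1})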
