Does Mesos really treat all your resources as a single pool? - mesos

Mesos is advertised as a system that lets you program against your datacenter like it's a single pool of resources (see the Mesos website). But is it really true that you don't need to consider the configuration of the individual machines? Using Mesos, can you request more resources for a task than are available on a single machine?
For example, if you have 10 machines, each with 2 cores, 2 GB of RAM and a 20 GB disk, can you really request 10 cores, 15 GB of RAM and 100 GB of disk space for a single task?
If so, how does this work? Is Mesos able to address memory across machines for you, and use other CPUs as local threads and create a single filesystem from a number of distributed nodes?
How does it accomplish this without suffering from the Fallacies of distributed computing, especially those related to network latency and transport cost?

According to the Mesos architecture documentation, you can't aggregate resources from different agents (slaves / machines) and use them for a single task.
In that architecture, every task is strictly tied to one agent.
The example in the documentation says much the same:
Let’s walk through the events in the figure.
Agent 1 reports to the master that it has 4 CPUs and 4 GB of memory
free. The master then invokes the allocation policy module, which
tells it that framework 1 should be offered all available resources.
The master sends a resource offer describing what is available on
agent 1 to framework 1. The framework’s scheduler replies to the
master with information about two tasks to run on the agent, using <2
CPUs, 1 GB RAM> for the first task, and <1 CPUs, 2 GB RAM> for the
second task. Finally, the master sends the tasks to the agent, which
allocates appropriate resources to the framework’s executor, which in
turn launches the two tasks (depicted with dotted-line borders in the
figure). Because 1 CPU and 1 GB of RAM are still unallocated, the
allocation module may now offer them to framework 2.
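The offer walkthrough above, and the 10-machine case from the question, can be sketched in a few lines of Python. This is only an illustrative model, not Mesos code: the Resources class and the launch function are hypothetical names made up for this sketch, and only CPU and memory are modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Hypothetical stand-in for the resources in one agent's offer."""
    cpus: float
    mem_gb: float

    def fits_in(self, other):
        # A task fits only if every dimension fits on the same agent.
        return self.cpus <= other.cpus and self.mem_gb <= other.mem_gb

    def minus(self, other):
        return Resources(self.cpus - other.cpus, self.mem_gb - other.mem_gb)


def launch(offer, tasks):
    """Carve tasks out of a single agent's offer and return what is left."""
    remaining = offer
    for task in tasks:
        if not task.fits_in(remaining):
            raise ValueError(f"{task} does not fit in {remaining}")
        remaining = remaining.minus(task)
    return remaining


# The documentation example: agent 1 offers <4 CPUs, 4 GB>.
agent1 = Resources(4, 4)
leftover = launch(agent1, [Resources(2, 1), Resources(1, 2)])

# The question's cluster: 10 agents with <2 cores, 2 GB> each.
cluster = [Resources(2, 2)] * 10
big_task = Resources(10, 15)
placeable = any(big_task.fits_in(agent) for agent in cluster)
```

Here leftover is the <1 CPU, 1 GB> that can be offered to framework 2, and placeable is False: the cluster has 20 cores in total, but no single offer can hold a 10-core task.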

Related

Creating Elasticsearch cluster from three servers

We have three physical servers. Each server has 2 CPUs (32 cores), 96 TB HDD, and 768 GB RAM. We would like to use these servers in an Elasticsearch cluster.
Each server will be located in a different data center, connecting each server using a private connection.
How can we optimize our configuration for high performance? Also, how should we best run Elasticsearch on these machines? For example, should we use virtualization to create multiple nodes per machine, or not?
You have a very large amount of RAM (768 GB) on each physical server, and according to the ES documentation on heap settings the heap shouldn't exceed 32 GB, so you will have to use virtualization to create multiple nodes per physical server to make better use of your infrastructure.
Apart from this, there are various cluster and node settings that you can optimize, but since you have not provided them, it's difficult to make recommendations.
Another thing to note is that you have a huge amount of RAM and disk, but the CPU is not in proportion to it, so it would be good to increase the CPU capacity as well.

Mesos: what are the OS level techniques for resources allocation?

I understand the Mesos architecture at a high level, but I'm not clear about the OS-level techniques used to implement resource allocation. For example, if Mesos offers one framework 1 CPU and 400 MB of memory, and another framework 2 CPUs and 1 GB of memory, how is this actually implemented at the OS level?
tl;dr: Mesos itself doesn't "allocate" any resources at the OS-level. The resources are still allocated by the OS, although Mesos can use OS-level primitives like cgroups to ensure that a task doesn't use more resources than it should.
The Mesos agent at the node advertises that some resources are available at the host (e.g., 4 CPUs and 16GB of RAM) -- either by auto-detecting what is available at the host or because the available resources have been explicitly configured (recommended for production).
The master then offers those resources to a framework.
The framework can then launch a task, using some or all of the resources available at the agent: e.g., the framework might launch a task with 2 CPUs and 8GB of RAM.
The agent then launches an executor to run the task.
How strictly the "2 CPUs and 8GB of RAM" resource limit is enforced depends on how Mesos is configured. For example, if the agent host supports cgroups and the agent is started with --isolation='cgroups/cpu,cgroups/mem', cgroups will be used to throttle the CPU appropriately, and to kill the task if it tries to exceed its memory allocation.
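As a rough illustration of what "using cgroups to enforce 2 CPUs and 8GB of RAM" could translate to, here is a minimal Python sketch that maps a task's resources onto cgroup v1 control files. The file names (cpu.shares, cpu.cfs_period_us, cpu.cfs_quota_us, memory.limit_in_bytes) are standard cgroup v1 knobs, but the function name, the constants (1024 shares per CPU, a 100 ms CFS period) and the enable_cfs switch are assumptions made for this sketch and are not taken from the answer; check the values your Mesos version actually uses.

```python
# Assumed conventions for this sketch, not verified Mesos constants.
CPU_SHARES_PER_CPU = 1024   # relative weight per CPU
CFS_PERIOD_US = 100_000     # 100 ms scheduling period


def cgroup_limits(cpus, mem_mb, enable_cfs=False):
    """Return the cgroup v1 values a task's resources might map to."""
    limits = {
        # Shares are a relative weight: they only matter under contention.
        "cpu.shares": max(2, int(cpus * CPU_SHARES_PER_CPU)),
        # Hard memory cap: exceeding it gets the task killed.
        "memory.limit_in_bytes": mem_mb * 1024 * 1024,
    }
    if enable_cfs:
        # A CFS quota is a hard ceiling: the task is throttled beyond it.
        limits["cpu.cfs_period_us"] = CFS_PERIOD_US
        limits["cpu.cfs_quota_us"] = int(cpus * CFS_PERIOD_US)
    return limits


task_limits = cgroup_limits(cpus=2, mem_mb=8192, enable_cfs=True)
```

The distinction worth noticing is that CPU shares are a soft, proportional limit while a CFS quota and the memory limit are hard ones, which is why how strictly the limit is enforced depends on configuration.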

resource offer showing less memory than added in mesos

I am currently exploring Mesos. I have set up a Mesos cluster with one slave node. The hardware has 1 CPU core and 2 GB of RAM, but the Mesos UI shows 1 CPU core and 1001 MB of RAM, roughly 1 GB less. Does anyone know where the remaining 1 GB of RAM is being used?
If you don't specify via --resources how much RAM a Mesos slave (now: agent) is supposed to use, the default kicks in; see the Mesos containerizer for details.
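To make the default concrete, here is a small Python sketch of one plausible rule: when more than 1 GB is detected, leave 1 GB for the operating system and advertise the rest. Both the rule and the 2025 MB detected total are assumptions chosen because they reproduce the 1001 MB in the question; the function name is hypothetical, and the exact logic should be checked against the containerizer source for your version.

```python
def default_agent_mem_mb(detected_total_mb):
    """Assumed default: keep 1 GB back for the OS if more than 1 GB exists."""
    reserve_mb = 1024
    if detected_total_mb > reserve_mb:
        return detected_total_mb - reserve_mb
    # Small hosts: assumed to advertise everything that was detected.
    return detected_total_mb


# A "2 GB" VM often reports slightly less or more than 2048 MB;
# 2025 MB is an assumed figure that matches the UI value in the question.
advertised = default_agent_mem_mb(2025)
```

Under this assumption the memory is not being used by anything; it is simply held back from offers. Setting the memory explicitly when starting the agent, for example with --resources='cpus:1;mem:1800', avoids relying on the default.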

How Much Ram For a Jenkins Master Instance?

I want to set up a Jenkins Master that will let slaves do all the builds.
The master is just a traffic cop, getting SVN hook triggers and kicking off slave builds.
There will be about 10 Java Maven build jobs in this setup.
I wish to run the Jenkins master on a hosted server that has limited resources (RAM).
I will run the slaves on some nicely loaded machines on my own network.
So my question is how little RAM could I get away with allocating to the Master Jenkins instance? 256M? 384M? 512M? Other?
I cannot seem to find this specific info in the Jenkins docs.
A coworker asked me the same question and my first answer was that 1-2 GB should be enough. Later I discovered this entry from the Jenkins documentation:
Have a beefy machine for Jenkins master & do not run slaves on the
master machine. Every slave has certain memory allocated in the master
JVM, so the bigger the RAM for the master, the better it is. We
typically hear customers allocate 16G or so.
Source: https://docs.cloudbees.com/docs/cloudbees-core/latest/traditional-install-guide/system-requirements
As of mid-2016, the official documentation says
Memory Requirements for the Master
The amount of memory Jenkins needs is largely dependent on many factors, which is why the RAM allotted for it can range from 200 MB for a small installation to 70+ GB for a single and massive Jenkins master. However, you should be able to estimate the RAM required based on your project build needs.
Each build node connection will take 2-3 threads, which equals about 2 MB or more of memory. You will also need to factor in CPU overhead for Jenkins if there are a lot of users who will be accessing the Jenkins user interface.
It is generally a bad practice to allocate executors on a master, as builds can quickly overload a master’s CPU/memory/etc and crash the instance, causing unnecessary downtime. Instead, it is advisable to set up slaves that the Jenkins master can delegate build jobs to, keeping the bulk of the work off of the master itself.
I don't think there is a rule of thumb for this. Our master uses 2G and we have 6 slaves. We have close to 60 jobs - most of them maven. We have never had memory issues in the past. And our slaves are always busy (I always see some job or the other being kicked off).
You could start with 512M and see how it works. If you see memory issues, increase the memory. That is the only way I can think of. To monitor the memory of your master, use the Jenkins Monitoring Plugin. This plugin integrates JavaMelody and lets you monitor the JVM of your master and even of your slaves. Good luck!
I have a Jenkins master with a few dozen jobs and slaves, but I don't run builds or tests on the master. Based on my observation, the memory consumption is not that big: it rarely went above 2 or 3 GB. I also believe it depends on the memory size option you specify for the Jenkins Java process. I would recommend at least 2 GB of RAM in your case. You could always load balance the builds to slaves if you need to.
It depends on what you are building, how often you are running builds, and how fast you need them to build. From my experience you need 1 GB per concurrent build, but this can vary depending on how resource-intensive your build is. Monitor your memory usage under its heaviest load: if usage reaches 70-80% or more, add more memory; if it stays around 30-40% or less, allocate less.
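The rule of thumb in the answer above can be written down as a tiny Python helper. This is just a sketch of that heuristic: the function name is made up, and the exact cut-offs (70% and 40%) are one reading of the "70-80%" and "30-40%" ranges, not official Jenkins guidance.

```python
def memory_advice(peak_used_mb, allocated_mb, high=0.70, low=0.40):
    """Suggest a sizing action from peak usage under the heaviest load."""
    ratio = peak_used_mb / allocated_mb
    if ratio >= high:
        return "add memory"
    if ratio <= low:
        return "allocate less"
    return "keep as is"


# Example: a master with 2 GB allocated that peaks at 1.6 GB.
advice = memory_advice(peak_used_mb=1600, allocated_mb=2048)
```

The peak figure would come from whatever monitoring you already have, such as the Monitoring Plugin mentioned in another answer.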
Also keep an eye on disk usage. About 4 years ago I had a TeamCity build server that kept falling to its knees; the problem turned out to be that the master and the slaves were all virtual servers sharing the same attached storage, and the disks couldn't keep up. That was more of a problem with the VM environment, but it is still something to keep in mind.

Hadoop and map-reduce on multicore machines

I have read a lot about Hadoop and Map-Reduce running on clusters of machines. Does anyone know if the Apache distribution can be run on an SMP machine with several cores? In particular, can multiple Map-Reduce processes be run on the same machine, with the scheduler taking care of spreading them across multiple cores? Thanks. - KG
Yes. Each machine has multiple map and reduce slots, and their number is determined by the RAM and CPU (each JVM instance needs 1 GB by default, so an 8 GB machine with 16 cores should still have only 7 task slots).
From the Hadoop wiki:
Use the configuration knob: mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum to control the number of
maps/reduces spawned simultaneously on a TaskTracker. By default, it
is set to 2, hence one sees a maximum of 2 maps and 2 reduces at a
given instance on a TaskTracker.
You can set those on a per-tasktracker basis to accurately reflect
your hardware (i.e. set those to higher nos. on a beefier tasktracker
etc.).
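The slot arithmetic in the answer above (an 8 GB, 16-core machine ending up with 7 slots) can be sketched in Python as follows. The 1 GB per JVM comes from the answer; the 1 GB held back for the OS and Hadoop daemons is an assumption inferred from the "7 slots" figure, and the function name is hypothetical.

```python
def task_slots(ram_gb, cores, jvm_gb=1, reserved_gb=1):
    """Estimate concurrent task slots, bounded by both memory and cores."""
    by_memory = (ram_gb - reserved_gb) // jvm_gb
    return max(0, min(cores, by_memory))


# Memory is the bottleneck here: 16 cores, but only 7 GB left for task JVMs.
slots = task_slots(ram_gb=8, cores=16)
```

A number like this is what you would then split between mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum on that TaskTracker.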
You can also use one of the lightweight MapReduce frameworks designed for multicore computers.
For example:
LeoTask: A lightweight, productive, and reliable mapreduce framework for multicore computers
https://github.com/mleoking/LeoTask
For Apache Hadoop 2.7.3, my experience has been that enabling YARN will also enable multi-core support. Here is a simple guide for enabling YARN on a single node:
https://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-common/SingleCluster.html#YARN_on_a_Single_Node
The default configuration seems to work pretty well. If you want to tune your core usage, then perhaps look into setting 'yarn.scheduler.minimum-allocation-vcores' and 'yarn.scheduler.maximum-allocation-vcores' within yarn-site.xml (https://hadoop.apache.org/docs/r2.7.1/hadoop-yarn/hadoop-yarn-common/yarn-default.xml)
Also, see here for instructions on how to configure a simple Hadoop sandbox with multicore support: https://bitbucket.org/aperezrathke/hadoop-aee

Resources