Could a Resource Manager and a Node Manager run on the same node? [MapR] - hadoop

We have a node where the Resource Manager and the Node Manager are both running. Is that recommended?

The Resource Manager and the Node Manager can run on the same node, provided it has enough memory. If the memory you've allocated to both services, plus the memory allocated to the rest of the services and daemons running on that node, exceeds the host's memory, you can trigger OOMEs and have applications killed by the Linux OOM killer.
For most distributions it's not a recommended practice. Running the RM and NM on separate nodes provides better memory and CPU isolation. The NM in particular allocates work to containers, and those containers require memory, so if you've oversubscribed your node by using up all the memory to run services, you won't have any memory left to run your containers.
http://www.oracle.com/technetwork/articles/servers-storage-dev/oom-killer-1911807.html
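As a back-of-the-envelope sketch of that budgeting (all figures below are illustrative assumptions for a hypothetical 64 GB host, not recommendations), whatever is left after the co-located daemons and OS overhead is what you could hand to the NM for containers, e.g. via yarn.nodemanager.resource.memory-mb:

    # Rough memory budget for co-locating the ResourceManager and NodeManager.
    # All numbers below are illustrative assumptions for a hypothetical 64 GB host.
    host_memory_gb = 64

    daemons_gb = {
        "resourcemanager_heap": 4,   # RM JVM heap (assumed)
        "nodemanager_heap": 2,       # NM JVM heap (assumed)
        "other_daemons": 2,          # any other co-located services (assumed)
        "os_and_overhead": 8,        # OS, page cache, JVM off-heap overhead (assumed)
    }

    # What the NM could safely hand out to containers
    # (roughly what yarn.nodemanager.resource.memory-mb should reflect)
    container_budget_gb = host_memory_gb - sum(daemons_gb.values())

    print(f"Memory left for YARN containers: {container_budget_gb} GB")
    if container_budget_gb <= 0:
        print("Oversubscribed: expect OOMEs / the Linux OOM killer to step in")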

Related

Does Mesos really treat all your resources as a single pool?

Mesos is advertised as a system that lets you program against your datacenter as if it were a single pool of resources (see the Mesos website). But is it really true that you don't need to consider the configuration of the individual machines? Using Mesos, can you request more resources for a task than are available on a single machine?
For example, if you have 10 machines each with 2 cores and 2g of RAM and 20g HD, can you really request 10 cores, 15g of RAM and 100g of disk space for a single task?
If so, how does this work? Is Mesos able to address memory across machines for you, and use other CPUs as local threads and create a single filesystem from a number of distributed nodes?
How does it accomplish this without suffering from the Fallacies of distributed computing, especially those related to network latency and transport cost?
According to this Mesos architecture, you can't aggregate resources from different slaves (agents / machines) to use them for one task.
As you can see, there is a strict "task per agent" situation.
Also, their example says pretty much the same thing:
Let’s walk through the events in the figure. Agent 1 reports to the master that it has 4 CPUs and 4 GB of memory free. The master then invokes the allocation policy module, which tells it that framework 1 should be offered all available resources. The master sends a resource offer describing what is available on agent 1 to framework 1. The framework’s scheduler replies to the master with information about two tasks to run on the agent, using <2 CPUs, 1 GB RAM> for the first task, and <1 CPU, 2 GB RAM> for the second task. Finally, the master sends the tasks to the agent, which allocates appropriate resources to the framework’s executor, which in turn launches the two tasks (depicted with dotted-line borders in the figure). Because 1 CPU and 1 GB of RAM are still unallocated, the allocation module may now offer them to framework 2.
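To make the "no aggregation" point concrete, here is a toy model of the offer flow quoted above (plain Python, not the Mesos API; the names and structure are made up for illustration): the master offers one agent's resources to one framework, each task must fit entirely within that single agent's offer, and only the leftover goes into the next offer.

    # Toy model of the quoted example: a task can never span agents.
    agent = {"cpus": 4, "mem_gb": 4}            # Agent 1 reports 4 CPUs, 4 GB free

    def launch(task, free):
        """Accept part of an offer for one task; it must fit on this one agent."""
        if task["cpus"] > free["cpus"] or task["mem_gb"] > free["mem_gb"]:
            raise ValueError("task does not fit in the offer from this agent")
        return {"cpus": free["cpus"] - task["cpus"],
                "mem_gb": free["mem_gb"] - task["mem_gb"]}

    free = dict(agent)                               # everything offered to framework 1
    free = launch({"cpus": 2, "mem_gb": 1}, free)    # first task  <2 CPUs, 1 GB RAM>
    free = launch({"cpus": 1, "mem_gb": 2}, free)    # second task <1 CPU, 2 GB RAM>
    print(free)   # {'cpus': 1, 'mem_gb': 1} -> may now be offered to framework 2

    # A single task asking for, say, 10 CPUs and 15 GB would simply never fit
    # in any offer from a 2-core / 2 GB machine.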

Mesos: what are the OS-level techniques for resource allocation?

I understand the Mesos architecture at a high level, but I'm not clear about the OS-level techniques used to implement resource allocation. For example, if Mesos offers one framework 1 CPU and 400MB of memory, and another framework 2 CPUs and 1GB of memory, how is this actually implemented at the OS level?
tl;dr: Mesos itself doesn't "allocate" any resources at the OS level. The resources are still allocated by the OS, although Mesos can use OS-level primitives like cgroups to ensure that a task doesn't use more resources than it should.
The Mesos agent at the node advertises that some resources are available at the host (e.g., 4 CPUs and 16GB of RAM) -- either by auto-detecting what is available at the host or because the available resources have been explicitly configured (recommended for production).
The master then offers those resources to a framework.
The framework can then launch a task, using some or all of the resources available at the agent: e.g., the framework might launch a task with 2 CPUs and 8GB of RAM.
The agent then launches an executor to run the task.
How strictly the "2 CPUs and 8GB of RAM" resource limit is enforced depends on how Mesos is configured. For example, if the agent host supports cgroups and the agent is started with --isolation='cgroups/cpu,cgroups/mem', cgroups will be used to throttle the CPU appropriately, and to kill the task if it tries to exceed its memory allocation.
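As a rough illustration of what "cgroups will be used" means at the OS level, the sketch below shows the kind of cgroups v1 writes an isolator performs for a task limited to 2 CPUs and 8 GB of RAM. The paths and group name are illustrative assumptions and this is not Mesos code; the real isolator manages its own hierarchy under /sys/fs/cgroup and needs root to do so.

    import os

    # Illustrative cgroups v1 writes for a task limited to 2 CPUs / 8 GB RAM.
    TASK = "mesos/illustrative-task-id"      # hypothetical cgroup name
    CPU_PERIOD_US = 100_000                  # default CFS period
    cpus, mem_bytes = 2, 8 * 1024**3

    def write(path, value):
        with open(path, "w") as f:
            f.write(str(value))

    cpu_dir = f"/sys/fs/cgroup/cpu/{TASK}"
    mem_dir = f"/sys/fs/cgroup/memory/{TASK}"
    os.makedirs(cpu_dir, exist_ok=True)
    os.makedirs(mem_dir, exist_ok=True)

    write(f"{cpu_dir}/cpu.cfs_period_us", CPU_PERIOD_US)
    write(f"{cpu_dir}/cpu.cfs_quota_us", cpus * CPU_PERIOD_US)   # throttle CPU to 2 cores
    write(f"{mem_dir}/memory.limit_in_bytes", mem_bytes)         # exceed this -> OOM kill
    write(f"{mem_dir}/cgroup.procs", os.getpid())                # move a process into the group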

What does container/resource allocation mean in Hadoop and in Spark when running on YARN?

As Spark runs in-memory, what does resource allocation mean in Spark when running on YARN, and how does it contrast with Hadoop's container allocation?
Just curious to know, since Hadoop's data and computations are on disk, whereas Spark is in-memory.
Hadoop is a framework capable of processing large volumes of data. It has two layers: a distributed file system layer called HDFS, and a distributed processing layer. In Hadoop 2.x, the processing layer is architected in a generic way so that it can be used for non-MapReduce applications as well.
For doing any processing, we need system resources such as memory, network, disk and CPU. The term container came in with Hadoop 2.x; in Hadoop 1.x, the equivalent term was slot. A container is an allocation or share of memory and CPU. YARN is a general resource management framework which enables efficient utilization of the resources on the cluster nodes through proper allocation and sharing.
In-memory processing means the data is loaded completely into memory and processed without writing the intermediate data to disk. This is faster because the computation happens in memory without many disk I/O operations, but it needs more memory because the entire dataset is loaded into memory.
Batch processing means the data is taken and processed in batches; intermediate results are stored on disk and supplied again to the next stage. This also needs memory and CPU, but less than a fully in-memory processing system.
YARN's resource manager acts as the central resource allocator for applications such as MapReduce, Impala (with Llama), Spark (in YARN mode), etc. So when we trigger a job, it asks the resource manager for the resources required for execution. The resource manager allocates resources based on availability, in the form of containers. A container is just an allocation of memory and CPU. One job may need multiple containers, and containers are allocated across the cluster depending on availability. The tasks are executed inside the containers.
For example, when we submit a MapReduce job, an MR application master is launched, and it negotiates with the resource manager for additional resources. Map and reduce tasks are spawned in the allocated containers.
Similarly, when we submit a Spark job (in YARN mode), a Spark application master is launched, and it negotiates with the resource manager for additional resources. Spark executors run inside the allocated containers, and the RDDs are computed within those executors.
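As a toy illustration of "a container is just an allocation of memory and CPU", the sketch below places requested containers on whichever node still has room. The node sizes, container sizes and the greedy placement are made-up assumptions for illustration, not the actual YARN scheduler algorithm.

    # Toy sketch of YARN-style container allocation.
    nodes = {"node1": {"mem_gb": 16, "vcores": 8},
             "node2": {"mem_gb": 16, "vcores": 8}}

    def allocate(n_containers, mem_gb, vcores):
        """Greedily place containers on whichever node still has room."""
        placements = []
        for _ in range(n_containers):
            for name, free in nodes.items():
                if free["mem_gb"] >= mem_gb and free["vcores"] >= vcores:
                    free["mem_gb"] -= mem_gb
                    free["vcores"] -= vcores
                    placements.append(name)
                    break
            else:
                raise RuntimeError("cluster cannot satisfy the request right now")
        return placements

    # e.g. a job asking for 4 containers of 4 GB / 2 vcores each
    print(allocate(4, mem_gb=4, vcores=2))   # containers spread across the cluster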

Running Hadoop in virtual environment

I would like to know whether I should expect problems when running a Hadoop cluster on virtual instead of physical machines.
I'm mostly worried about using the same hard drive. I read that I should count on 1-2 containers per drive, but in my case only one drive will exist. Could that be a problem?
I think it depends on how much memory you allocate per container. Of course, there will be a limit on the number of containers if memory is constrained.
I can highlight a few points to consider when running a Hadoop cluster in a virtual environment:
Network configuration in the case of a multi-node cluster
Obviously, the performance of the application
Effect on scalability, since resources are limited if you plan to run the cluster on a host with low-spec hardware
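A rough capacity check under those constraints might look like the sketch below. The "1-2 containers per drive" figure is the rule of thumb from the question; the other numbers are illustrative assumptions.

    # Rough capacity check for a single-drive virtual node.
    node_mem_gb = 16              # memory available for containers (assumed)
    container_mem_gb = 4          # per-container allocation (assumed)
    drives = 1
    containers_per_drive = 2      # upper end of the rule of thumb

    by_memory = node_mem_gb // container_mem_gb
    by_disk = drives * containers_per_drive
    print(f"containers limited by memory: {by_memory}, by disk: {by_disk}")
    print(f"practical ceiling: {min(by_memory, by_disk)}")   # the single disk becomes the bottleneck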

Mesos Cgroup resource usage

I am building an autoscaling system using Mesos and Marathon. The scenario is: I want to autoscale a task if more than 80% of the memory allocated to it is consumed. How do I find out the amount of memory used at the cgroup level?
Is this the right approach?
You can get the statistics for each task by hitting http://host:5051/monitor/statistics.json, where host is the Mesos slave (agent).
This repo will give you an idea of how to autoscale Marathon applications.
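As a sketch of the 80% check against that endpoint (the field names mem_rss_bytes, mem_limit_bytes and executor_id follow the /monitor/statistics.json output as I understand it, and "host" is a placeholder for your agent's address):

    import json, urllib.request

    # Fetch per-executor statistics from one Mesos agent and flag heavy memory users.
    AGENT = "http://host:5051/monitor/statistics.json"   # placeholder host

    with urllib.request.urlopen(AGENT) as resp:
        executors = json.load(resp)          # one entry per executor on this agent

    for e in executors:
        stats = e["statistics"]
        ratio = stats["mem_rss_bytes"] / stats["mem_limit_bytes"]
        if ratio > 0.8:
            print(f"{e['executor_id']}: {ratio:.0%} of allocated memory used -> scale up")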
