How do YARN applications estimate needed resources - hadoop

I'm wondering how a YARN application (let's say a MapReduce job) estimates the resources (CPU, RAM) it needs for a single mapper/reducer.

The question is quite broad, but I'll try to give a direction for investigation. When a YARN application is executed, it requests resources from the ResourceManager. Resource management in YARN is implemented by means of pluggable schedulers; the two most commonly used are:
Fair Scheduler
Capacity Scheduler
Schedulers define the rules used for sizing the "slots" (containers) granted to an application. For some configurations a "slot" is defined only by the memory the application asks for (Capacity Scheduler with DefaultResourceCalculator); others (e.g. DominantResourceCalculator) take the number of CPUs into account as well.
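In practice, a MapReduce job does not really estimate anything per task: each mapper/reducer simply requests whatever is configured for it (by default 1024 MB and 1 vcore in a stock Hadoop 2.x setup), and the scheduler typically rounds the request up to its minimum/increment allocation. A minimal sketch of the relevant settings, in mapred-site.xml or per job (the values are illustrative, not recommendations):
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value> <!-- memory requested for each map container -->
</property>
<property>
  <name>mapreduce.map.cpu.vcores</name>
  <value>1</value> <!-- vcores requested for each map container -->
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>4096</value> <!-- memory requested for each reduce container -->
</property>
<property>
  <name>mapreduce.reduce.cpu.vcores</name>
  <value>2</value> <!-- vcores requested for each reduce container -->
</property>
Whether the vcore part of the request is honoured depends on the resource calculator mentioned above; with DefaultResourceCalculator only the memory figure matters.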

Related

Deploy 2 different topologies on a single Nimbus with 2 different hardware

I have 2 sets of Storm topologies in use today; one is up 24/7 and does its own work.
The other is deployed on demand and handles much bigger loads of data.
As of today, we have N supervisor instances, all on the same type of hardware (CPU/RAM). I'd like my on-demand topology to run on stronger hardware, but as far as I know there's no way to control which supervisor is assigned to which topology.
So if I can't control it, it's possible that the 24/7 topology would assign one of the stronger workers to itself.
Any ideas, if there is such a way?
Thanks in advance
Yes, you can control which topologies go where. This is the job of the scheduler.
You very likely want either the isolation scheduler or the resource aware scheduler. See https://storm.apache.org/releases/2.0.0-SNAPSHOT/Storm-Scheduler.html and https://storm.apache.org/releases/2.0.0-SNAPSHOT/Resource_Aware_Scheduler_overview.html.
The isolation scheduler lets you prevent Storm from running any other topologies on the machines you use to run the on demand topology. The resource aware scheduler would let you set the resource requirements for the on demand topology, and preferentially assign the strong machines to the on demand topology. See the priority section at https://storm.apache.org/releases/2.0.0-SNAPSHOT/Resource_Aware_Scheduler_overview.html#Topology-Priorities-and-Per-User-Resource.

Why are mapreduce attempts killed due to "Container preempted by scheduler"?

I just noticed the fact that many Pig jobs on Hadoop are killed due to the following reason: Container preempted by scheduler
Could someone explain me what causes this, and if I should (and am able to) do something about this?
Thanks!
If you have the Fair Scheduler and a number of different queues enabled, then higher-priority applications can terminate your jobs (in a preemptive fashion).
Hortonworks have a pretty good explanation with more details
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_yarn_resource_mgt/content/preemption.html
Should you do anything about it? That depends on whether your application is meeting its SLAs and performing within expectations. General good practice is to review your job's priority and the queue it's assigned to.
If your Hadoop cluster is shared by many business units, admins usually decide the queue for each unit, and every queue has its own priority (also decided by the admins). If preemption is enabled at the scheduler level, higher-priority applications do not have to wait just because lower-priority applications have taken up the available capacity: the lower-priority tasks must release resources, when none are otherwise available in the cluster, to let the higher-priority applications run.
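For reference, whether preemption can happen at all is controlled by cluster-level switches in yarn-site.xml. A minimal sketch (both are disabled by default):
<property>
  <name>yarn.scheduler.fair.preemption</name>
  <value>true</value> <!-- allow the Fair Scheduler to preempt containers -->
</property>
<property>
  <name>yarn.resourcemanager.scheduler.monitor.enable</name>
  <value>true</value> <!-- enable the monitor policies used for Capacity Scheduler preemption -->
</property>
If your jobs keep getting preempted, though, the fix usually lies in the queue configuration (fair shares, priorities) rather than in these switches.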

Hadoop Resource management

I have a 12-node cluster and I am running a YARN architecture. It seems that my nodes are busy most of the time and jobs often fail. How can I check the usage of the resources at any point in time?
Also, is there any method to set a resource limit per user, e.g. if a user submits a job they should be given only 25 GB of memory and 12 cores?
There are multiple ways to monitor the cluster.
If you are using the Cloudera distribution, you can go to Cloudera Manager to monitor and manage the resources.
If you are using the Hortonworks distribution, you can go to the Ambari web interface to monitor and manage the resources.
If you are not using either distribution, clusters are typically monitored using the Ganglia or Nagios web interfaces.
Even if you do not have any of these, you can go to the ResourceManager web interface, which typically runs on http://<resource-manager-host>:8088. 8088 is the default port number; it can be customized, and you can get that information from yarn-site.xml.
If your organization does not provide access to the web interfaces, you can use commands such as yarn application -list and mapred job -list to see what is going on in the cluster.
Monitoring actual usage is a little more tedious; you need to know the Linux monitoring commands and develop shell scripts around them.
Also, is there any method to set a resource limit per user, e.g. if a user submits a job they should be given only 25 GB of memory and 12 cores?
Yes, you need to use the queue/pool concept of the schedulers embedded in YARN. There are three scheduler types: FIFO, Capacity and Fair. FIFO should not be used on real clusters; it is mainly for development. You need to understand the Capacity and Fair Schedulers and set the limits there, as in the sketch below.
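As a concrete sketch for the 25 GB / 12 cores requirement with the Fair Scheduler: the limits go into the allocations file (fair-scheduler.xml), and each user can be placed into a queue of their own. The queue name and numbers here are illustrative:
<allocations>
  <queue name="sample_user">
    <maxResources>25600 mb,12 vcores</maxResources>
    <maxRunningApps>10</maxRunningApps>
  </queue>
  <queuePlacementPolicy>
    <rule name="user" create="true"/> <!-- place each user in a queue named after them -->
    <rule name="default"/>
  </queuePlacementPolicy>
</allocations>
The Capacity Scheduler can achieve the same effect through per-queue capacities and user limits, as described in the answers below.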
It seems that my nodes are busy most of the time and many times job fails
You can implement some generic performance tuning guidelines to improve the throughput. Have a look at this post: Tips to improve MapReduce Job performance in Hadoop, the Cloudera article and the MapReduce performance article.
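To make "generic tuning" a little more concrete, these are the kinds of knobs such articles discuss; the values are purely illustrative starting points, not recommendations:
<property>
  <name>mapreduce.map.output.compress</name>
  <value>true</value> <!-- compress intermediate map output to cut shuffle I/O -->
</property>
<property>
  <name>mapreduce.task.io.sort.mb</name>
  <value>256</value> <!-- buffer (MB) used for sorting map output -->
</property>
<property>
  <name>mapreduce.job.reduce.slowstart.completedmaps</name>
  <value>0.8</value> <!-- start reducers only after 80% of the maps have finished -->
</property>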
Also, is there any method to set a resource limit per user, e.g. if a user submits a job they should be given only 25 GB of memory and 12 cores?
Adding to Durga's answer,
Fair scheduling is a method of assigning resources to applications such that all apps get, on average, an equal share of resources over time. Hadoop NextGen is capable of scheduling multiple resource types.
By default, the Fair Scheduler bases scheduling fairness decisions only on memory. It can be configured to schedule with both memory and CPU, using the notion of Dominant Resource Fairness developed by Ghodsi.
The scheduler organizes apps further into “queues”, and shares resources fairly between these queues. By default, all users share a single queue, named “default”. If an app specifically lists a queue in a container resource request, the request is submitted to that queue. It is also possible to assign queues based on the user name included with the request through configuration.
e.g.
<user name="sample_user">
<maxRunningApps>30</maxRunningApps>
</user>
<userMaxAppsDefault>5</userMaxAppsDefault>
The CapacityScheduler is designed to run Hadoop applications as a shared, multi-tenant cluster in an operator-friendly manner while maximizing the throughput and the utilization of the cluster.
Traditionally each organization has its own private set of compute resources that have sufficient capacity to meet the organization's SLA under peak or near-peak conditions. This generally leads to poor average utilization and the overhead of managing multiple independent clusters, one per organization.
<property>
<name>yarn.scheduler.capacity.queue-mappings</name>
<value>u:user1:queue1,g:group1:queue2,u:%user:%user,u:user2:%primary_group</value>
<description>
Here, user1 is mapped to queue1 and group1 is mapped to queue2; u:%user:%user maps each user to a queue with the same name as the user; and user2 is mapped to a queue named after that user's primary group. The mappings will be evaluated from left to right, and the first valid mapping will be used.
</description>
</property>
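Queue mappings only route applications; the actual limits come from the queue definitions in capacity-scheduler.xml. A minimal sketch (queue names and percentages are illustrative):
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>default,queue1,queue2</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.capacity</name>
  <value>40</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.queue1.capacity</name>
  <value>30</value> <!-- guaranteed share of the cluster, in percent -->
</property>
<property>
  <name>yarn.scheduler.capacity.root.queue2.capacity</name>
  <value>30</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.queue1.maximum-capacity</name>
  <value>50</value> <!-- hard ceiling queue1 may grow to when the cluster has spare capacity -->
</property>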
Have a look at Fair scheduler and Capacity scheduler

Deploying optaplanner on a cluster

I know optaplanner scales very well wrt problem size.
But how can it scale wrt the number of problem requests?
Currently, we have exposed optaplanner as a REST service.
It can get hundreds of scheduling requests per day.
Search is stopped after 10 seconds. This means that at some peaks there are several scheduling requests in the queue.
What could we do to parallelize the requests on multiple machines?
All low-level support for a multi-tenant setup (on a cluster) is available:
The SolverFactory is thread safe, 1 per node
1 Solver per thread per node. Because a single Solver hogs a single Thread (it does not do any IO, unlike web requests etc), I recommend against running more Solvers than there are Threads.
Solver.terminateEarly() is thread-safe and can be called on all Solvers if the node is exiting.
However, there is no high-level support for a multi-tenant setup yet. So you'll need to build that yourself:
Queue requests to be handled, in case more requests come in than there are threads available. A JDK ExecutorService with a pool size equal to the number of CPU cores should suffice for this. Simply submit the requests as Futures, build the Solver inside the Future, and build the SolverFactory at bootstrap.
Optionally play with allowing more/less solving time depending on user saturation. terminateEarly() can help in this regard.
Load balancing over the nodes of the cluster. Each node has 1 ExecutorService.
Work stealing?
High-availability
Failover
...
We'll build high-level support in the future. Please add your requirements to this JIRA issue.

What additional benefit does Yarn bring to the existing map reduce?

Yarn differs in its infrastructure layer from the original map reduce architecture in the following way:
In YARN, the JobTracker is split into two different daemons called the ResourceManager and the NodeManager (node-specific). The ResourceManager only manages the allocation of resources to the different jobs, apart from comprising a scheduler which just takes care of scheduling jobs without worrying about any monitoring or status updates. Different resources such as memory, CPU time, network bandwidth etc. are put into one unit called the Resource Container. There are different ApplicationMasters running on different nodes which talk to a number of these resource containers and accordingly update the NodeManager with the monitoring/status details.
I want to know how using this kind of approach increases performance from the MapReduce perspective. Also, if there is any definitive content on the motivation behind YARN and its benefits over the existing implementation of MapReduce, please point me to it.
Here are some of the articles (1, 2, 3) about YARN. These talk about the benefits of using YARN.
YARN is more general than MR and it should be possible to run other computing models like BSP besides MR. Prior to YARN, a separate cluster was required for MR, BSP and others. Now they can coexist in a single cluster, which leads to higher usage of the cluster. Here are some of the applications ported to YARN.
From a MapReduce perspective, in legacy MR there are separate slots for Map and Reduce tasks, but in YARN there is no fixed purpose for a container. The same container can be used for a Map task, a Reduce task, a Hama BSP task or something else. This leads to better utilization.
Also, it makes it possible to run different versions of Hadoop in the same cluster, which is not possible with legacy MR and which makes maintenance easier.
Here are some of the additional links for YARN. Also, Hadoop: The Definitive Guide, 3rd Edition has an entire section dedicated to YARN.
FYI, it was a bit controversial to develop YARN instead of using one of the frameworks that had been doing something similar and had been running successfully for ages with the bugs ironed out.
I do not think that YARN will speed up the existing MR framework. Looking at the architecture we can see that the system is now more modular, but modularity usually works against peak performance.
It can be claimed that YARN has nothing to do with MapReduce; MapReduce just became one of the YARN applications. You can see it as moving from an embedded program to an embedded OS with that program running inside it.
At the same time, YARN opens the door for different MR implementations built on different frameworks. For example, if we assume that our dataset is smaller than the cluster memory, we can get much better performance. I think http://www.spark-project.org/ is one such example.
To summarize it: Yarn does not improve the existing MR, but will enable other MR implementations to be better in all aspects.
All the above answers cover a lot of information; I am simplifying it as follows:
1. MapReduce is platform plus application in Hadoop 1.0, and only one of the applications in Hadoop 2.0. YARN is the platform in Hadoop 2.0 and does not exist in Hadoop 1.0.
2. MapReduce is a single-purpose system: it can run MapReduce jobs only. YARN is a multi-purpose system: it can run MapReduce, Spark, Tez, Flink, BSP, MPP, MPI, Giraph etc. (general purpose).
3. MapReduce has the JobTracker scalability problem: both resource management and job management live in one daemon. In YARN they are separated and handled by the RM+NM and by paradigm-specific ApplicationMasters respectively.
4. MapReduce has a rigid resource management system, i.e. fixed map/reduce slots. YARN has flexible resource management, i.e. containers.
5. MapReduce is not highly available. YARN provides high availability and reliability.
6. MapReduce scales out to about 5000 nodes. YARN scales out to 10000-plus nodes.
7. In MapReduce the unit of work is Job -> tasks. In YARN it is Application -> DAG of jobs -> tasks.
8. Classical MapReduce = MapReduce API + MapReduce framework + MapReduce system. YARN MapReduce = MapReduce API + MapReduce framework + YARN system, so MR programs written for Hadoop 1.0 also run on YARN without changing a single line of code, i.e. backward compatibility.
Let's look at the Hadoop 1.0 drawbacks, which have been addressed by Hadoop 2.0 with the addition of YARN.
Issue of scalability: the JobTracker runs on a single machine even though you have thousands of nodes in the Hadoop cluster. Its responsibilities are resource management, job and task scheduling, and monitoring. Since all these processes run on a single node, this model is not scalable.
Issue of availability (single point of failure): the JobTracker is a single point of failure.
Resource utilization: due to the predefined number of Map and Reduce task slots, resources are not utilized properly. When all Mapper nodes are busy, Reducer nodes are idle and cannot be used to process Mapper tasks.
Tight integration with the MapReduce framework: Hadoop 1.x can run MapReduce jobs only; support for jobs other than MapReduce jobs does not exist.
Now single Job Tracker bottleneck has been removed with YARN architecture in Hadoop 2.x
The fundamental idea of YARN is to split up the functionalities of resource management and job scheduling/monitoring into separate daemons. The idea is to have a global ResourceManager (RM) and per-application ApplicationMaster (AM). An application is either a single job or a DAG of jobs.
The ResourceManager has two main components: Scheduler and ApplicationsManager.
The Scheduler is responsible for allocating resources to the various running applications, subject to familiar constraints of capacities, queues etc. The Scheduler is a pure scheduler in the sense that it performs no monitoring or tracking of status for the application.
The ApplicationsManager is responsible for accepting job submissions, negotiating the first container for executing the application-specific ApplicationMaster, and providing the service for restarting the ApplicationMaster container on failure.
The per-application ApplicationMaster has the responsibility of negotiating appropriate resource containers from the Scheduler, tracking their status and monitoring for progress.
Now advantages of YARN
Scalability issues have been resolved
No single point of failure. All components are highly available
Resource utilization has been improved: flexible containers replace the fixed Map and Reduce slots.
Non Map Reduce Jobs can be submitted
It looks like this link might be what you're looking for: http://developer.yahoo.com/blogs/hadoop/posts/2011/02/mapreduce-nextgen/.
My understanding is that YARN is supposed to be more generic. You can create your own YARN applications that negotiate directly with the ResourceManager for resources (1), and MapReduce is just one of several application masters that already exist (2).
