Is it allowed in YARN to have "multiple containers" of the "same application" running on one DataNode?
Yes.
Example: multiple mappers of a job running on the same DN.
Yes, any data node can have multiple containers running in parallel.
The number of parallel containers is calculated by the YARN ResourceManager from the amount of RAM and the number of CPU cores available on that node.
You will typically see multiple containers on the same data node when the ResourceManager decides to run several mappers/reducers of the same job in containers on that node.
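As a rough sketch (the values are illustrative, not defaults), the per-node limits come from settings like these in yarn-site.xml; the number of containers that fit in parallel is roughly the smaller of the memory ratio and the vcore ratio:
<!-- yarn-site.xml (illustrative values) -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>16384</value>   <!-- memory this node offers to YARN -->
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>8</value>       <!-- vcores this node offers to YARN -->
</property>
<!-- With 2048 MB / 1 vcore containers, this node can host roughly
     min(16384 / 2048, 8 / 1) = 8 containers in parallel,
     and several of them can belong to the same application. -->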
Related
We have a 100-node Hadoop cluster. I wrote a Flink app that writes many files to HDFS via BucketingSink. When I run the Flink app on YARN, I found that all task managers are placed on the same NodeManager, which means all subtasks run on that one node. This opens many file descriptors on the DataNode of that busy node. (I think the Flink filesystem connector prefers connecting to the local DataNode.) This puts high pressure on that node and easily fails the job.
Any good ideas for solving this problem? Thank you very much!
This sounds like a YARN scheduling problem. Please take a look at YARN's capacity scheduler, which allows you to schedule containers on nodes based on the available capacity. Moreover, you can tell YARN to also consider virtual cores for scheduling. This lets you define a resource dimension in addition to memory.
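For example, the capacity scheduler only looks at memory by default; switching its resource calculator makes it consider vcores as well (a minimal capacity-scheduler.xml sketch):
<!-- capacity-scheduler.xml: schedule on memory AND vcores instead of memory only -->
<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>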
If I have 3 spark applications all using the same yarn cluster, how should I set
yarn.nodemanager.resource.cpu-vcores
in each of the 3 yarn-site.xml files?
(each Spark application is required to have its own yarn-site.xml on the classpath)
Does this value even matter in the client yarn-site.xml files?
If it does:
Let's say the cluster has 16 cores.
Should the value in each yarn-site.xml be 5 (for a total of 15, leaving 1 core for system processes)? Or should I set each one to 15?
(Note: Cloudera indicates one core should be left for system processes here: http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/; however, they do not go into detail about using multiple clients against the same cluster.)
Assume Spark is running with yarn as the master, and running in cluster mode.
Are you talking about the server-side configuration for each YARN NodeManager? If so, it would typically be configured to be a little less than the number of CPU cores (or virtual cores if you have hyperthreading) on each node in the cluster. So if you have 4 nodes with 4 cores each, you could dedicate, for example, 3 per node to the YARN NodeManager, and your cluster would have a total of 12 virtual CPUs.
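Sticking with that example, the server-side setting on each NodeManager would look something like this (yarn-site.xml, illustrative value):
<!-- yarn-site.xml on each node: offer 3 of the 4 cores to YARN -->
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>3</value>
</property>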
Then you request the desired resources when submitting the Spark job (see http://spark.apache.org/docs/latest/submitting-applications.html for example) to the cluster and YARN will attempt to fulfill that request. If it can't be fulfilled, your Spark job (or application) will be queued up or there will eventually be a timeout.
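For illustration, a cluster-mode submission that asks for a slice of those 12 vcores could look like this (the class and jar names are placeholders):
# request 3 executors with 3 cores each (9 vcores) plus the driver's core
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 3 \
  --executor-cores 3 \
  --executor-memory 4g \
  --driver-memory 2g \
  --class com.example.MyApp \
  myapp.jar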
You can configure different resource pools in YARN to guarantee a specific amount of memory/CPU resources to such a pool, but that's a little bit more advanced.
If you submit your Spark application in cluster mode, you have to consider that the Spark driver will run on a cluster node and not on your local machine (the one that submitted it). Therefore it will require at least 1 more virtual CPU.
Hope that clarifies things a little for you.
In YARN, the application master requests the resource manager for the resources, so that the containers for that application can be launched.
Does the application master wait for all the resources to be allocated before it launches the first container, or does it request each container individually and launch a container as soon as the resources for it are obtained?
I.e., what happens when only part of the resources are available? Does it wait for resources to be freed, or proceed with what is available?
How does the MR application master decide the resource requirements for an MR job? Does the YARN MR client determine this and send it to the AM, or does the AM work it out itself? If so, what is it based on? I believe this is configurable, but I am asking about the default case when memory and CPU are not specified.
No, the AM does not wait for all resources to be allocated. Instead it schedules / launches containers as resources are given to it by the resource manager.
The size requested for each container is defined in the job configuration when the job is created by the driver. If values were not set explicitly for the job, values from mapred-site and mapred-default are used (see https://hadoop.apache.org/docs/r2.7.1/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml for the default values of mapreduce.map.memory.mb, mapreduce.reduce.memory.mb, mapreduce.map.cpu.vcores and mapreduce.reduce.cpu.vcores). How these values get translated into granted resources is a bit complicated and depends on the scheduler being used, the minimum container allocation settings, etc.
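For illustration, a job (or mapred-site.xml) can set those per-task container requests explicitly; the values below are made up:
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value>
</property>
<property>
  <name>mapreduce.map.cpu.vcores</name>
  <value>1</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>4096</value>
</property>
<property>
  <name>mapreduce.reduce.cpu.vcores</name>
  <value>2</value>
</property>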
I don't know for certain if there's a maximum number of containers that the MR app master will request other than (# of input splits for mappers) + (number of reducers). The MR app master will release containers when it is done with them (e.g., if you have 1,000 mapper containers but only 20 reducers it will release the other 980 containers once they are no longer needed).
I am running Spark with YARN (Hadoop 2.6) as the cluster manager. YARN is running in pseudo-distributed mode. I started the Spark shell with 6 executors and was expecting the same:
spark-shell --master yarn --num-executors 6
But in the Spark Web UI, I see only 4 executors.
Any reason for this?
PS: I ran the nproc command on my Ubuntu (14.04) machine and the result is given below. I believe this means my system has 8 cores:
mountain#mountain:~$ nproc
8
Did you take into account spark.yarn.executor.memoryOverhead?
It possibly creates a hidden memory requirement, and in the end YARN cannot provide all of the requested resources.
Also note that YARN rounds the container size up according to yarn.scheduler.increment-allocation-mb.
All the details are here:
http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
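A back-of-the-envelope calculation of how this can turn 6 requested executors into 4 (the numbers are illustrative and the defaults depend on your Spark/YARN versions):
# executor memory              = 1024 MB (spark.executor.memory default 1g)
# memory overhead              = max(384 MB, ~10% of 1024 MB) = 384 MB
# requested per executor       = 1024 + 384 = 1408 MB
# rounded up by YARN           = 2048 MB (with a 1024 MB allocation granularity)
# node offers 8192 MB to YARN  -> 8192 / 2048 = 4 containers
# so only 4 of the 6 requested executors can be started
# (the AM container is ignored here for simplicity)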
This happens when there are not enough resources on your cluster to start more executors. The following things are taken into account:
A Spark executor runs inside a YARN container. The minimum container size is determined by the value of yarn.scheduler.minimum-allocation-mb in yarn-site.xml, so check this property. If your existing containers already consume all available memory, then no memory is left for new containers and no new executors will be started.
The Storage Memory column in the UI displays the amount of memory used for execution and RDD storage. By default, this equals (HEAP_SPACE - 300 MB) * 75%. The rest of the memory is used for internal metadata, user data structures and other overhead. Ref: Spark on YARN: Less executor memory than set via spark-submit.
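A quick worked example of that formula (the 75% corresponds to spark.memory.fraction and varies between Spark versions):
# spark.executor.memory = 4g
# storage/execution memory ≈ (4096 MB - 300 MB) * 0.75 ≈ 2847 MB
# so the UI shows roughly 2.8 GB per executor, not the full 4 GB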
I hope this helps.
In Microsoft HPC Cluster Manager, is it possible to run two jobs (MPI jobs) simultaneously on the same node? If so, how should a job be configured?
I experimented with HPC Cluster Manager and found the following solution:
First, the job scheduler configuration must be set to Balanced.
Second, the job resource type must be set to Core or Socket, not Node.
With these two settings, if the minimum requested resources are available for both jobs, they start running simultaneously on the same node.