YARN applications cannot start when specifying YARN node labels - hadoop

I'm trying to use YARN node labels to tag worker nodes, but when I run applications on YARN (Spark or simple YARN app), those applications cannot start.
With Spark, when specifying --conf spark.yarn.am.nodeLabelExpression="my-label", the job cannot start (it stays blocked on "Submitted application [...]", see details below).
With a YARN application (like distributedshell), when specifying -node_label_expression my-label, the application cannot start either.
Here are the tests I have made so far.
YARN node labels setup
I'm using Google Dataproc to run my cluster (example: 4 workers, 2 of them on preemptible nodes). My goal is to force any YARN application master to run on a non-preemptible node; otherwise a preemptible node can be shut down at any time, making the application fail hard.
I'm creating the cluster using YARN properties (--properties) to enable node labels:
gcloud dataproc clusters create \
my-dataproc-cluster \
--project [PROJECT_ID] \
--zone [ZONE] \
--master-machine-type n1-standard-1 \
--master-boot-disk-size 10 \
--num-workers 2 \
--worker-machine-type n1-standard-1 \
--worker-boot-disk-size 10 \
--num-preemptible-workers 2 \
--properties 'yarn:yarn.node-labels.enabled=true,yarn:yarn.node-labels.fs-store.root-dir=/system/yarn/node-labels'
Versions of the packaged Hadoop and Spark:
Hadoop version: 2.8.2
Spark version: 2.2.0
After that, I create a label (my-label), and update the two non-preemptible workers with this label:
yarn rmadmin -addToClusterNodeLabels "my-label(exclusive=false)"
yarn rmadmin -replaceLabelsOnNode "\
[WORKER_0_NAME].c.[PROJECT_ID].internal=my-label \
[WORKER_1_NAME].c.[PROJECT_ID].internal=my-label"
I can see the created label in the YARN Web UI:
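The label can also be checked from the command line (a quick sanity check; the exact output varies a bit between Hadoop versions):
yarn cluster --list-node-labels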
Spark
When I run a simple example (SparkPi) without specifying anything about node labels:
spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode client \
/usr/lib/spark/examples/jars/spark-examples.jar \
10
In the Scheduler tab on YARN Web UI, I see the application launched on <DEFAULT_PARTITION>.root.default.
But when I run the job specifying spark.yarn.am.nodeLabelExpression to set the location of the Spark application master:
spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode client \
--conf spark.yarn.am.nodeLabelExpression="my-label" \
/usr/lib/spark/examples/jars/spark-examples.jar \
10
The job is not launched. In the YARN Web UI, I see:
YarnApplicationState: ACCEPTED: waiting for AM container to be allocated, launched and register with RM.
Diagnostics: Application is Activated, waiting for resources to be assigned for AM. Details : AM Partition = my-label ; Partition Resource = <memory:6144, vCores:2> ; Queue's Absolute capacity = 0.0 % ; Queue's Absolute used capacity = 0.0 % ; Queue's Absolute max capacity = 0.0 % ;
I suspect that the queue related to the label partition (not <DEFAULT_PARTITION>, the other one) does not have sufficient resources to run the job:
Here, Used Application Master Resources is <memory:1024, vCores:1>, but the Max Application Master Resources is <memory:0, vCores:0>. That explains why the application cannot start, but I can't figure out how to change this.
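The queue's label-related configuration can also be inspected from the CLI (a sketch; the fields printed depend on the Hadoop version):
yarn queue -status default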
I tried to update different parameters, but without success:
yarn.scheduler.capacity.root.default.accessible-node-labels=my-label
Or increasing those properties:
yarn.scheduler.capacity.root.default.accessible-node-labels.my-label.capacity
yarn.scheduler.capacity.root.default.accessible-node-labels.my-label.maximum-capacity
yarn.scheduler.capacity.root.default.accessible-node-labels.my-label.maximum-am-resource-percent
yarn.scheduler.capacity.root.default.accessible-node-labels.my-label.user-limit-factor
yarn.scheduler.capacity.root.default.accessible-node-labels.my-label.minimum-user-limit-percent
without success either.
YARN Application
The issue is the same when running a YARN application:
hadoop jar \
/usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell.jar \
-shell_command "echo ok" \
-jar /usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell.jar \
-queue default \
-node_label_expression my-label
The application cannot start, and the logs keep repeating:
INFO distributedshell.Client: Got application report from ASM for, appId=6, clientToAMToken=null, appDiagnostics= Application is Activated, waiting for resources to be assigned for AM. Details : AM Partition = my-label ; Partition Resource = <memory:6144, vCores:2> ; Queue's Absolute capacity = 0.0 % ; Queue's Absolute used capacity = 0.0 % ; Queue's Absolute max capacity = 0.0 % ; , appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, appStartTime=1520354045946, yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, [...]
If I don't specify -node_label_expression my-label, the application starts on <DEFAULT_PARTITION>.root.default and succeeds.
Questions
Am I doing something wrong with the labels? I followed the official documentation and this guide.
Is this a problem specific to Dataproc? The previous guides seem to work in other environments.
Maybe I need to create a specific queue and associate it with my label? But since I'm running a "one-shot" cluster for a single Spark job, I don't need dedicated queues; running jobs on the default root queue is not a problem for my use case.
Thanks for helping

A Google engineer answered us (on a private issue we raised, not in the PIT) and gave us a solution, by specifying an initialization script at Dataproc cluster creation. I don't think the issue comes from Dataproc; this is basically just YARN configuration. The script sets the following properties in capacity-scheduler.xml, just after creating the node label (my-label):
<property>
  <name>yarn.scheduler.capacity.root.accessible-node-labels</name>
  <value>my-label</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.accessible-node-labels.my-label.capacity</name>
  <value>100</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.accessible-node-labels</name>
  <value>my-label</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.accessible-node-labels.my-label.capacity</name>
  <value>100</value>
</property>
According to the comment accompanying the script, this "sets accessible-node-labels on both root (the root queue) and root.default (the default queue applications actually get run on)". The root.default part is what was missing in my tests. The capacity for both is set to 100.
Then, restarting YARN (systemctl restart hadoop-yarn-resourcemanager.service) is needed to apply the modifications.
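For reference, a rough sketch of what such an initialization action can look like (the property names and values are the ones listed above; the bdconfig helper, the metadata lookup and the file path are Dataproc-specific assumptions on my side, not the exact script we received):
#!/bin/bash
# Sketch of an initialization action applying the capacity-scheduler fix above.
# Runs only on the master node, where the ResourceManager lives.
set -euxo pipefail

ROLE="$(/usr/share/google/get_metadata_value attributes/dataproc-role)"
if [[ "${ROLE}" == "Master" ]]; then
  CONF=/etc/hadoop/conf/capacity-scheduler.xml

  # Create the (non-exclusive) label first.
  yarn rmadmin -addToClusterNodeLabels "my-label(exclusive=false)"

  # Make the label accessible to root and root.default, with 100% capacity on both.
  for QUEUE in root root.default; do
    bdconfig set_property --configuration_file "${CONF}" \
      --name "yarn.scheduler.capacity.${QUEUE}.accessible-node-labels" \
      --value "my-label" --clobber
    bdconfig set_property --configuration_file "${CONF}" \
      --name "yarn.scheduler.capacity.${QUEUE}.accessible-node-labels.my-label.capacity" \
      --value "100" --clobber
  done

  # Pick up the new scheduler configuration.
  systemctl restart hadoop-yarn-resourcemanager.service
fi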
After that, I was able to start the jobs that would not start in my question.
Hope this helps people running into the same or similar issues.

Related

Spark on YARN: execute driver without worker

Running Spark on YARN, cluster mode.
3 data nodes with YARN
YARN => 32 vCores, 32 GB RAM
I am submitting the Spark program like this:
spark-submit \
--class com.blablacar.insights.etl.SparkETL \
--name ${JOB_NAME} \
--master yarn \
--num-executors 1 \
--deploy-mode cluster \
--driver-memory 512m \
--driver-cores 1 \
--executor-memory 2g \
--executor-cores 20 \
toto.jar json
I can see 2 jobs running fine on 2 nodes. But I can also see 2 other jobs with just a driver container!
Is it possible to not run the driver if there are no resources for the workers?
Actually, there is a setting to limit the resources given to the "Application Master" (in the case of Spark, this is the driver):
yarn.scheduler.capacity.maximum-am-resource-percent
From http://maprdocs.mapr.com/home/AdministratorGuide/Hadoop2.xCapacityScheduler-RunningPendingApps.html:
Maximum percent of resources in the cluster that can be used to run
application masters - controls the number of concurrent active
applications.
This way, YARN will not take all its resources for Spark drivers, and will keep resources for the executors. Youpi!
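For illustration, a minimal sketch of applying it (the 0.5 value is only an example, not a recommendation):
# Illustrative addition to capacity-scheduler.xml (0.5 = at most half of the
# resources may be used by application masters, i.e. Spark drivers here):
#   <property>
#     <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
#     <value>0.5</value>
#   </property>
# Then reload the CapacityScheduler configuration without restarting YARN:
yarn rmadmin -refreshQueues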

Spark over Yarn - Incorrect Application Master selection

I'm trying to fire some jobs with Spark on YARN with the following command (this is just an example; actually I'm using different amounts of memory and cores):
./bin/spark-submit --class org.mypack.myapp \
--master yarn-cluster \
--num-executors 3 \
--driver-memory 4g \
--executor-memory 2g \
--executor-cores 1 \
lib/myapp.jar
When I look at the Web UI to see what's really happening under the hood, I notice that YARN is picking as Application Master a node that is not the Spark Master. This is a problem because the real Spark Master node is forcibly involved in the distributed computation, leading to unnecessary network transfers of data (because, of course, the Spark Master has no data to start with).
From what I saw during my tests, YARN picks the AM in a seemingly random fashion, and I can't find a way to force it to pick the Spark Master as the AM.
My cluster is made of 4 nodes (3 Spark slaves, 1 Spark Master) with 64 GB of total RAM and 32 cores, built on HDP 2.4 from Hortonworks. The Spark Master only hosts the NameNode; the three slaves are DataNodes.
You want to be able to specify a node that does not host a DataNode to run the Spark Master. As far as I know, this is not possible out of the box.
What you could do is run the master in yarn-client mode on the node which is running the NameNode, but this is probably not what you are looking for.
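A minimal sketch of that yarn-client alternative, reusing the options from the question and launched from the NameNode host itself (so the driver stays there while the executors are still scheduled on the DataNodes):
./bin/spark-submit --class org.mypack.myapp \
  --master yarn-client \
  --num-executors 3 \
  --driver-memory 4g \
  --executor-memory 2g \
  --executor-cores 1 \
  lib/myapp.jar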
Another way would be to create your own Spark client, where you use the YARN API to request that certain nodes be preferred over others for your Spark Master.

Spark not able to run in yarn cluster mode

I am trying to execute my code on a YARN cluster.
The command I am using is:
$SPARK_HOME/bin/spark-submit \
--class "MyApp" \
target/scala-2.10/my-application_2.10-1.0.jar \
--master yarn-cluster \
--num-executors 3 \
--driver-memory 6g \
--executor-memory 7g \
<outputPath>
But I can see that this program is running only on localhost.
It's able to read the file from HDFS.
I have tried this in standalone mode and it works fine.
Please suggest where it is going wrong.
I am using Hadoop 2.4 with Spark 1.1.0. I was able to get it running in cluster mode.
To solve it, we simply removed all the configuration files from all the slave nodes. Earlier we were running in standalone mode, and that led to duplicating the configuration on all the slaves. Once that was done, it ran as expected in cluster mode, although performance is not up to standalone mode.
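Roughly, the cleanup looked like this (hostnames and the exact list of files are assumptions for a typical tarball install; adapt them to your layout):
# Remove the leftover standalone configuration from each slave node,
# so that only the YARN-side configuration remains.
for host in slave1 slave2 slave3; do
  ssh "$host" 'rm -f "$SPARK_HOME"/conf/spark-env.sh "$SPARK_HOME"/conf/spark-defaults.conf'
done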
Thanks.

How to report JMX from Spark Streaming on EC2 to VisualVM?

I have been trying to get a Spark Streaming job, running on an EC2 instance, to report to VisualVM using JMX.
As of now I have the following config file:
spark/conf/metrics.properties:
*.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink
master.source.jvm.class=org.apache.spark.metrics.source.JvmSource
worker.source.jvm.class=org.apache.spark.metrics.source.JvmSource
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource
And I start the spark streaming job like this:
(I added the -D bits afterwards in the hope of getting remote access to the EC2 instance's JMX)
terminal:
spark/bin/spark-submit --class my.class.StarterApp --master local --deploy-mode client \
project-1.0-SNAPSHOT.jar \
-Dcom.sun.management.jmxremote \
-Dcom.sun.management.jmxremote.port=54321 \
-Dcom.sun.management.jmxremote.authenticate=false \
-Dcom.sun.management.jmxremote.ssl=false
There are two issues with the spark-submit command line:
local - you must not run a Spark Streaming application with the bare local master URL, because that leaves no threads to run your computations (jobs): you need at least two, i.e. one for the receiver and another one to process the received data. You should see the following WARN in the logs:
WARN StreamingContext: spark.master should be set as local[n], n > 1
in local mode if you have receivers to get data, otherwise Spark jobs
will not get resources to process the received data.
-D options are not picked up by the JVM as they're given after the Spark Streaming application jar and effectively become its command-line arguments. Put them before project-1.0-SNAPSHOT.jar and start over (you have to fix the above issue first!).
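Putting the two fixes together, the submission would look roughly like this (a sketch: local[2] to cover the receiver plus the processing, and the JMX flags handed to the driver JVM through --driver-java-options instead of trailing application arguments):
spark/bin/spark-submit --class my.class.StarterApp --master "local[2]" --deploy-mode client \
  --driver-java-options "-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=54321 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false" \
  project-1.0-SNAPSHOT.jar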
spark-submit --conf "spark.driver.extraJavaOptions=-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=8090 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false" /path/example/src/main/python/pi.py 10000
Note: the configuration format is --conf "params". Tested under Spark 2.+.

Spark on yarn concept understanding

I am trying to understand how Spark runs on a YARN cluster/client. I have the following questions in mind.
Is it necessary that Spark is installed on all the nodes in the YARN cluster? I think it should be, because the worker nodes in the cluster execute tasks and should be able to decode the code (Spark APIs) in the Spark application sent to the cluster by the driver.
It says in the documentation "Ensure that HADOOP_CONF_DIR or YARN_CONF_DIR points to the directory which contains the (client side) configuration files for the Hadoop cluster". Why does the client node have to install Hadoop when it is sending the job to the cluster?
Adding to other answers.
Is it necessary that Spark is installed on all the nodes in the YARN cluster?
No, not if the Spark job is scheduled on YARN (either client or cluster mode). A Spark installation is needed on all the nodes only for standalone mode.
These are the visualizations of spark app deployment modes.
Spark Standalone Cluster
In cluster mode, the driver will be sitting in one of the Spark Worker nodes, whereas in client mode it will be on the machine that launched the job.
YARN cluster mode
YARN client mode
This table offers a concise list of differences between these modes:
pics source
It says in the documentation "Ensure that HADOOP_CONF_DIR or YARN_CONF_DIR points to the directory which contains the (client-side) configuration files for the Hadoop cluster". Why does the client node have to install Hadoop when it is sending the job to the cluster?
A Hadoop installation is not mandatory, but the configurations (not all of them) are! We can call these gateway nodes. This is for two main reasons.
The configuration contained in HADOOP_CONF_DIR directory will be distributed to the YARN cluster so that all containers used by the application use the same configuration.
In YARN mode, the ResourceManager's address is picked up from the Hadoop configuration (yarn-default.xml). Thus, the --master parameter is yarn.
Update: (2017-01-04)
Spark 2.0+ no longer requires a fat assembly jar for production
deployment. source
We are running spark jobs on YARN (we use HDP 2.2).
We don't have spark installed on the cluster. We only added the Spark assembly jar to the HDFS.
For example to run the Pi example:
./bin/spark-submit \
--verbose \
--class org.apache.spark.examples.SparkPi \
--master yarn-cluster \
--conf spark.yarn.jar=hdfs://master:8020/spark/spark-assembly-1.3.1-hadoop2.6.0.jar \
--num-executors 2 \
--driver-memory 512m \
--executor-memory 512m \
--executor-cores 4 \
hdfs://master:8020/spark/spark-examples-1.3.1-hadoop2.6.0.jar 100
--conf spark.yarn.jar=hdfs://master:8020/spark/spark-assembly-1.3.1-hadoop2.6.0.jar - this config tells YARN where to take the Spark assembly from. If you don't use it, the jar will be uploaded from the machine where you run spark-submit.
About your second question: the client node doesn't need Hadoop installed. It only needs the configuration files. You can copy the directory from your cluster to your client.
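A small sketch of that copy step (the hostname, paths, and class/jar names are placeholders; the point is just to make the cluster's client configuration visible to spark-submit):
# Copy the Hadoop/YARN client configuration from a cluster node to the
# gateway/client machine, then point the Hadoop/Spark tooling at it.
scp -r user@cluster-node:/etc/hadoop/conf ./hadoop-conf
export HADOOP_CONF_DIR="$PWD/hadoop-conf"
export YARN_CONF_DIR="$PWD/hadoop-conf"
# The client can now submit to the cluster's ResourceManager:
spark-submit --master yarn --deploy-mode cluster --class MyApp my-app.jar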
1 - Spark follows a slave/master architecture, so on your cluster you have to install a Spark master and N Spark slaves. You can run Spark in standalone mode, but using the YARN architecture will give you some benefits.
There is a very good explanation of it here: http://blog.cloudera.com/blog/2014/05/apache-spark-resource-management-and-yarn-app-models/
2 - It is necessary if you want to use YARN or HDFS, for example, but as I said before, you can run it in standalone mode.
Let me try to cut through the glue and make it short for the impatient.
6 components: 1. client, 2. driver, 3. executors, 4. application master, 5. workers, and 6. resource manager; 2 deploy modes; and 2 resource (cluster) managers.
Here's the relation:
Client
Nothing special; it is the one submitting the Spark app.
Worker, executors
Nothing special; one worker holds one or more executors.
Master & resource (cluster) manager
(no matter client or cluster mode)
In YARN, the resource manager and the master sit on two different nodes;
in standalone, resource manager == master, i.e. the same process on the same node.
Driver
In client mode, it sits with the client.
In yarn-cluster mode, it sits with the master (in this case, the client process exits after submitting the app).
In standalone-cluster mode, it sits with one worker.
Voilà!
