I am trying to use the YARN REST API to submit jobs that I normally run via spark-submit on the command line.
My command-line spark-submit looks like this:
JAVA_HOME=/usr/local/java7/ HADOOP_CONF_DIR=/etc/hadoop/conf /usr/local/spark-1.5/bin/spark-submit \
--driver-class-path "/etc/hadoop/conf" \
--class MySparkJob \
--master yarn-cluster \
--conf "spark.executor.extraClassPath=/usr/local/hadoop/client/hadoop-*" \
--conf "spark.driver.extraClassPath=/usr/local/hadoop/client/hadoop-*" \
spark-job.jar --retry false --counter 10
Reading through the YARN REST API documentation (https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_APISubmit_Application), I tried to create the JSON payload to POST, which looks like this:
{
"am-container-spec": {
"commands": {
"command": "JAVA_HOME=/usr/local/java7/ HADOOP_CONF_DIR=/etc/hadoop/conf org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --jar spark-job.jar --class MySparkJob --arg --retry --arg false --arg --counter --arg 10"
},
"local-resources": {
"entry": [
{
"key": "spark-job.jar",
"value": {
"resource": "hdfs:///spark-job.jar",
"size": 3214567,
"timestamp": 1452408423000,
"type": "FILE",
"visibility": "APPLICATION"
}
}
]
}
},
"application-id": "application_11111111111111_0001",
"application-name": "test",
"application-type": "Spark"
}
The problem I see is that the Hadoop config directory used to be local to the machine I was submitting jobs from. Now that I submit the job via the REST API and it runs directly on the RM, I am not sure how to provide these details.
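From the same Submit Application docs it looks like the am-container-spec can also carry an environment block alongside commands and local-resources. I assume that part of the payload would look roughly like the fragment below (the values are just my local paths), but I am not sure this alone helps, since /etc/hadoop/conf would still have to exist on whatever node the AM lands on:
"am-container-spec": {
    "environment": {
        "entry": [
            { "key": "JAVA_HOME", "value": "/usr/local/java7/" },
            { "key": "HADOOP_CONF_DIR", "value": "/etc/hadoop/conf" }
        ]
    },
    ...
}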
If you are trying to submit Spark jobs via REST APIs, I suggest having a look at Livy. It is a simple and easy way to submit Spark jobs to a cluster; a sketch of a batch submission follows the feature list below.
Livy is an open source REST interface for interacting with Apache Spark from anywhere. It supports executing snippets of code or programs in a Spark context that runs locally or in Apache Hadoop YARN.
Interactive Scala, Python and R shells
Batch submissions in Scala, Java, Python
Multiple users can share the same server (impersonation support)
Can be used for submitting jobs from anywhere with REST
Does not require any code change to your programs
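For example, the command-line job above could be submitted through Livy's batch endpoint roughly like this (a sketch; the Livy host/port and the HDFS path to the jar are placeholders, not taken from your setup):
curl -X POST http://livy-host:8998/batches \
  -H "Content-Type: application/json" \
  -d '{
    "file": "hdfs:///spark-job.jar",
    "className": "MySparkJob",
    "args": ["--retry", "false", "--counter", "10"],
    "conf": {
      "spark.executor.extraClassPath": "/usr/local/hadoop/client/hadoop-*",
      "spark.driver.extraClassPath": "/usr/local/hadoop/client/hadoop-*"
    }
  }'
Since Livy itself runs spark-submit on a machine that already has the Hadoop client configuration, it also sidesteps the HADOOP_CONF_DIR problem from the question.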
We have also tried submitting the application through the Java RMI option.
Related
I have used aspect-oriented programming for logging in a Java Maven project.
While running it through Eclipse, I have to initialize the javaagent in the VM arguments, as follows:
-javaagent:lib/aspectjweaver-1.9.1.jar
Now I want to submit the produced jar to a Spark worker. I have written a shell script to do it. The job runs, but the javaagent is never initialized.
export SPARK_PATH=/xyz
export SPARK_URL=spark://abc:0000
export JAVA_OPTS="$JAVA_OPTS -javaagent:../aspectweaver-1.9.1.jar"
$SPARK_PATH/spark-submit --master $SPARK_URL --jars --class com.main.index ../index-0.0.1-SNAPSHOT.jar
I have tried a number of things, such as setting JAVA_OPTS and CATALINA_OPTS and creating spark-env.sh, but none of them worked. I have been struggling with this for the last three days.
I checked a few similar questions on Stack Overflow, but none of them were helpful in setting the javaagent. Please help.
Thanks.
EDIT:
I am checking whether the javaagent is initialized using the code below:
try {
org.aspectj.weaver.loadtime.Agent.getInstrumentation();
} catch (NoClassDefFoundError | UnsupportedOperationException e) {
System.out.println(e);
}
I get the NoClassDefFoundError, which tells me that the javaagent is not set.
I found the answer for this: I had to use "--driver-java-options". Below is the updated script.
$SPARK_PATH/spark-submit --master $SPARK_URL --driver-java-options "-javaagent:../aspectjweaver-1.9.1.jar" --class com.main.index ../index-0.0.1-SNAPSHOT.jar "$1"
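If the weaving also has to happen inside the executors (not only in the driver), my understanding is that the executor-side counterpart is the spark.executor.extraJavaOptions setting. A sketch, assuming the agent jar is available at the same relative path on every worker (paths are placeholders):
$SPARK_PATH/spark-submit --master $SPARK_URL \
  --driver-java-options "-javaagent:../aspectjweaver-1.9.1.jar" \
  --conf "spark.executor.extraJavaOptions=-javaagent:../aspectjweaver-1.9.1.jar" \
  --class com.main.index ../index-0.0.1-SNAPSHOT.jar "$1"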
I used my PC as both the Spark master and the Spark worker, running Spark 2.3.1.
At first, I was on Ubuntu 16.04 LTS.
Everything worked fine: I was able to run the SparkPi example (using spark-submit and spark-shell) without any problem.
I also tried to run it using Spark's REST API, with this POST request:
curl -X POST http://192.168.1.107:6066/v1/submissions/create --header "Content-Type:application/json" --data '{
"action": "CreateSubmissionRequest",
"appResource": "file:/home/Workspace/Spark/spark-2.3.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.1.jar",
"clientSparkVersion": "2.3.1",
"appArgs": [ "10" ],
"environmentVariables" : {
"SPARK_ENV_LOADED" : "1"
},
"mainClass": "org.apache.spark.examples.SparkPi",
"sparkProperties": {
"spark.jars": "file:/home/Workspace/Spark/spark-2.3.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.1.jar",
"spark.driver.supervise":"false",
"spark.executor.memory": "512m",
"spark.driver.memory": "512m",
"spark.submit.deployMode":"cluster",
"spark.app.name": "SparkPi",
"spark.master": "spark://192.168.1.107:7077"
}
}'
After testing this and that, I had to move to Windows, since the final setup will run on Windows anyway.
I was able to run the master and worker (manually), add winutils.exe, and run the SparkPi example using spark-shell and spark-submit; everything ran fine there too.
The problem appears when I use the REST API, with this POST request:
curl -X POST http://192.168.1.107:6066/v1/submissions/create --header "Content-Type:application/json" --data '{
"action": "CreateSubmissionRequest",
"appResource": "file:D:/Workspace/Spark/spark-2.3.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.1.jar",
"clientSparkVersion": "2.3.1",
"appArgs": [ "10" ],
"environmentVariables" : {
"SPARK_ENV_LOADED" : "1"
},
"mainClass": "org.apache.spark.examples.SparkPi",
"sparkProperties": {
"spark.jars": "file:D:/Workspace/Spark/spark-2.3.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.1.jar",
"spark.driver.supervise":"false",
"spark.executor.memory": "512m",
"spark.driver.memory": "512m",
"spark.submit.deployMode":"cluster",
"spark.app.name": "SparkPi",
"spark.master": "spark://192.168.1.107:7077"
}
}'
Only the path is a little different, but my worker always fails.
The logs say:
"Exception from the cluster: java.lang.NullPointerException
org.apache.spark.deploy.worker.DriverRunner.downloadUserJar(DriverRunner.scala:151)
org.apache.spark.deploy.worker.DriverRunner.prepareAndRunDriver(DriverRunner.scala:173)
org.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:92)"
I searched, but no solution has come up yet.
So, finally I found the cause.
I read the source from:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/worker/DriverRunner.scala
From inspecting it, I concluded that the problem was not in Spark itself; the parameter was simply not being read correctly, which means I had used the wrong format.
So, after trying several things, this one turned out to be right:
appResource": "file:D:/Workspace/Spark/spark-2.3.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.1.jar"
changed to:
appResource": "file:///D:/Workspace/Spark/spark-2.3.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.1.jar"
And I did the same with the spark.jars property.
That little difference cost me almost 24 hours of work...
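As a side note, the same hidden REST service exposes a status endpoint, which makes this kind of driver failure quicker to spot than digging through worker logs. A sketch (the driver ID is a placeholder; use the submissionId returned by the create call):
curl http://192.168.1.107:6066/v1/submissions/status/driver-20180612345678-0000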
I am trying this from the CLI:
aws cognito-identity update-identity-pool \
--identity-pool-id MyIdentityPoolId \
--identity-pool-name MyIdentityPoolName \
--allow-unauthenticated-identities \
--supported-login-providers graph.facebook.com=MyFacebookAppId,api.twitter.com=MyTwitterConsumerKey;MyTwitterConsumerSecret \
--region $MyRegion
The CLI response says:
{
"SupportedLoginProviders": {
"api.twitter.com": "MyTwitterConsumerKey",
"graph.facebook.com": "MyFacebookAppId"
},
"AllowUnauthenticatedIdentities": true,
"IdentityPoolName": "MyIdentityPoolName",
"IdentityPoolId": "MyIdentityPoolId"
}
MyTwitterConsumerSecret: command not found
Unlike configuring Facebook, which requires only one credential (the FacebookAppId), configuring Twitter requires two credentials (the ConsumerKey and the ConsumerSecret).
When I delimit the two credentials with a semicolon, it looks like only the first part gets set in the Twitter configuration for Amazon Cognito.
What is the format to pass BOTH the ConsumerKey and the ConsumerSecret when configuring Twitter?
I referred to these AWS docs:
Update Identity Pool via CLI
Create Identity Pool via CLI
Configuring Twitter/Digits with Amazon Cognito
OK, how silly. I simply needed to wrap the credentials passed to --supported-login-providers in double quotes.
--supported-login-providers graph.facebook.com="MyFacebookAppId",api.twitter.com="MyTwitterConsumerKey;MyTwitterConsumerSecret"
Then it worked.
{
"SupportedLoginProviders": {
"api.twitter.com": "MyTwitterConsumerKey;MyTwitterConsumerSecret",
"graph.facebook.com": "MyFacebookAppId"
},
"AllowUnauthenticatedIdentities": true,
"IdentityPoolName": "MyIdentityPoolName",
"IdentityPoolId": "MyIdentityPoolId"
}
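To double-check the stored value without the console, the pool can also be read back with describe-identity-pool (a sketch, reusing the placeholder pool ID from above):
aws cognito-identity describe-identity-pool \
    --identity-pool-id MyIdentityPoolId \
    --region $MyRegion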
The problem may be caused by Mesos and Marathon being out of sync, but the solution mentioned on GitHub doesn't work for me.
When I find the orphaned tasks, what I do is restart Marathon.
However, Marathon does not pick up the orphaned tasks again; it just starts new ones.
The orphaned tasks still hold their resources, so I have to delete them.
All the orphaned tasks sit under framework ef169d8a-24fc-41d1-8b0d-c67718937a48-0000, and
curl -XGET http://c196:5050/master/frameworks
shows that this framework is listed under unregistered_frameworks:
{
"frameworks": [
.....
],
"completed_frameworks": [ ],
"unregistered_frameworks": [
"ef169d8a-24fc-41d1-8b0d-c67718937a48-0000",
"ef169d8a-24fc-41d1-8b0d-c67718937a48-0000",
"ef169d8a-24fc-41d1-8b0d-c67718937a48-0000"
]
}
I tried to delete the framework by its ID (so that the tasks under the framework would be deleted too):
curl -XPOST http://c196:5050/master/teardown -d 'frameworkId=ef169d8a-24fc-41d1-8b0d-c67718937a48-0000'
but I get "No framework found with specified ID".
So, how to delete orphaned tasks?
There are two options.
Option 1: Register a framework with the same framework ID, do task reconciliation, and kill every task you receive. For example, you can do it in the following manner:
Download the code: git clone https://github.com/janisz/mesos-cookbook.git
Change directory: cd mesos-cookbook/4_understanding_frameworks
In scheduler.go, change master to your Mesos master URL
If you want to mimic some other framework, create /tmp/framework.json and fill it with the FrameworkInfo data:
{
"id": "<mesos-framewokr-id>",
"user": "<framework-user>",
"name": "<framework-name>",
"failover_timeout": 3600,
"checkpoint": true,
"hostname": "<hostname>",
"webui_url": "<framework-web-ui>"
}
Run it: go run scheduler.go scheduler.pb.go mesos.pb.go
Get the list of all tasks: curl localhost:9090
Delete a task with: curl -X DELETE "http://10.10.10.10:9090/?id=task_id"
Option 2: Wait until failover_timeout expires, so that Mesos deletes these tasks for you.
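Either way, you can check afterwards that the orphans are really gone by asking the master for its task list (a sketch, reusing the master host from the question; /master/state works too if your Mesos version does not expose /master/tasks):
curl -XGET http://c196:5050/master/tasks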
I am trying to test Docker port mapping by specifying parameters in the Chronos job definition. The parameters option doesn't seem to have any effect on the docker run.
The job definition is as follows:
{
"schedule": "R0//P",
"name": "testjob",
"container": {
"type": "DOCKER",
"image": "path_to_image",
"network": "BRIDGE",
"parameters" : [
{"key" : "-p", "value": "8888:4400"}
]
},
"cpus": "4",
"mem": "512",
"uris": ["path to dockercfg.tar.gz"],
"command" : "./command-to-execute"
}
1) The docker run on the node doesn't take the parameters into consideration. Any suggestions on the correct way to include parameters in the docker run would be highly appreciated.
2) The Docker image I am trying to run has an ENTRYPOINT specified, so technically the ENTRYPOINT should run when Docker starts the container. However, with the way Chronos is set up, I am forced to provide the "command" option in the job JSON (skipping the command option during job submission throws an error). When the container is actually scheduled on the target node, it runs the command from the job definition JSON instead of the ENTRYPOINT from the Dockerfile.
Can someone explain how to make Chronos run the ENTRYPOINT instead of the command from the Chronos job JSON definition?
Notes:
Setting the command to blank doesn't help.
The ENTRYPOINT could be specified as the command in the JSON job definition, which should fix the command problem, but I don't have access to the ENTRYPOINT for all the containers.
Edit 1: Modified the question with some more context and clarity.
You should have a look at the official docs regarding how to run a Docker job.
A docker job takes the same format as a scheduled job or a dependency job and runs on a Docker container. To configure it, an additional container argument is required, which contains a type (required), an image (required), a network mode (optional), mounted volumes (optional) and whether Mesos should always (force)Pull(Image) the latest image before executing or not (optional).
Concerning 1)
IMHO there's no way to set the parameters like you're trying to. Port mappings are specified differently with Marathon, but as I understand the docs they are not possible with Chronos at all, and probably not necessary for a batch job.
If you want to run long-running services, use Marathon.
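For comparison, a long-running version of the same container with the 8888 -> 4400 mapping would be posted to Marathon roughly like this (a sketch; the Marathon host, the app id and the image path are placeholders, and newer Marathon versions moved portMappings out of the docker block):
curl -X POST http://marathon-host:8080/v2/apps \
  -H "Content-Type: application/json" \
  -d '{
    "id": "/testservice",
    "cpus": 1,
    "mem": 512,
    "instances": 1,
    "container": {
      "type": "DOCKER",
      "docker": {
        "image": "path_to_image",
        "network": "BRIDGE",
        "portMappings": [
          { "containerPort": 4400, "hostPort": 8888, "protocol": "tcp" }
        ]
      }
    }
  }'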
Concerning 2)
Not sure if I understand you correctly, but normally this would be implemented by specifying an ENTRYPOINT in the Dockerfile.
Concerning 3)
Not sure if I understand you correctly, but I think you should be able to omit the command property with Docker jobs.
To use the Docker container's ENTRYPOINT you must set "shell" to false, and the command has to be blank. If the command is anything other than blank, Chronos will pass it as an argument to the ENTRYPOINT. Your JSON would look like the example below.
I don't know whether you should use the "uris" field; it is deprecated and, if it is what I think it is, it no longer seems to be required to start Docker apps.
About the Docker parameters, I think the problem is the key name you used: it seems you must omit the - symbol. Try it as below.
{
"schedule": "R0//P",
"name": "testjob",
"shell": false,
"container": {
"type": "DOCKER",
"image": "path_to_image",
"network": "BRIDGE",
"parameters" : [
{"key" : "p", "value": "8888:4400"}
]
},
"cpus": "4",
"mem": "512",
"command" : ""
}
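For completeness, a sketch of posting such a definition to Chronos's REST API (the host and port are placeholders, and depending on your Chronos version the path may be /v1/scheduler/iso8601 instead):
# save the job definition above as testjob.json, then:
curl -L -X POST http://chronos-host:4400/scheduler/iso8601 \
  -H "Content-Type: application/json" \
  -d @testjob.json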