Spark REST API: submit application NullPointerException on Windows

I used my PC as the Spark master and, at the same time, as the Spark worker, using Spark 2.3.1.
At first, I used Ubuntu 16.04 LTS.
Everything worked fine: I ran the SparkPi example (using spark-submit and spark-shell) and it completed without problems.
I also tried to run it through Spark's REST API, with this POST request:
curl -X POST http://192.168.1.107:6066/v1/submissions/create --header "Content-Type:application/json" --data '{
  "action": "CreateSubmissionRequest",
  "appResource": "file:/home/Workspace/Spark/spark-2.3.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.1.jar",
  "clientSparkVersion": "2.3.1",
  "appArgs": [ "10" ],
  "environmentVariables": {
    "SPARK_ENV_LOADED": "1"
  },
  "mainClass": "org.apache.spark.examples.SparkPi",
  "sparkProperties": {
    "spark.jars": "file:/home/Workspace/Spark/spark-2.3.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.1.jar",
    "spark.driver.supervise": "false",
    "spark.executor.memory": "512m",
    "spark.driver.memory": "512m",
    "spark.submit.deployMode": "cluster",
    "spark.app.name": "SparkPi",
    "spark.master": "spark://192.168.1.107:7077"
  }
}'
After testing this and that, I had to move to Windows, since the work will be done on Windows anyway.
I was able to run the master and the worker (manually), add winutils.exe, and run the SparkPi example with spark-shell and spark-submit; everything ran fine there as well.
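(For reference, the spark-submit run that worked on Windows was along these lines; a sketch, the exact invocation may have differed slightly:)
spark-submit --class org.apache.spark.examples.SparkPi --master spark://192.168.1.107:7077 D:/Workspace/Spark/spark-2.3.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.1.jar 10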
The problem appears when I use the REST API, with this POST request:
curl -X POST http://192.168.1.107:6066/v1/submissions/create --header "Content-Type:application/json" --data '{
  "action": "CreateSubmissionRequest",
  "appResource": "file:D:/Workspace/Spark/spark-2.3.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.1.jar",
  "clientSparkVersion": "2.3.1",
  "appArgs": [ "10" ],
  "environmentVariables": {
    "SPARK_ENV_LOADED": "1"
  },
  "mainClass": "org.apache.spark.examples.SparkPi",
  "sparkProperties": {
    "spark.jars": "file:D:/Workspace/Spark/spark-2.3.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.1.jar",
    "spark.driver.supervise": "false",
    "spark.executor.memory": "512m",
    "spark.driver.memory": "512m",
    "spark.submit.deployMode": "cluster",
    "spark.app.name": "SparkPi",
    "spark.master": "spark://192.168.1.107:7077"
  }
}'
Only the paths are slightly different, but my worker always fails.
The logs said:
"Exception from the cluster: java.lang.NullPointerException
org.apache.spark.deploy.worker.DriverRunner.downloadUserJar(DriverRunner.scala:151)
org.apache.spark.deploy.worker.DriverRunner.prepareAndRunDriver(DriverRunner.scala:173)
org.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:92)"
I searched around, but no solution has turned up yet.

So, finally I found the cause.
I read the source from:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/worker/DriverRunner.scala
From inspecting it, I concluded that the problem is not in Spark itself: the parameter was simply not being read correctly, which means I was somehow using the wrong parameter format.
So, after trying out several things, this turned out to be the fix:
appResource": "file:D:/Workspace/Spark/spark-2.3.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.1.jar"
changed to:
appResource": "file:///D:/Workspace/Spark/spark-2.3.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.1.jar"
And I did the same with the spark.jars property.
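Putting it together, the two corrected entries in the payload are:
"appResource": "file:///D:/Workspace/Spark/spark-2.3.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.1.jar",
"spark.jars": "file:///D:/Workspace/Spark/spark-2.3.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.1.jar",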
That little difference cost me almost 24 hours of work...

Related

Nexus: configure initial repositories non-interactively

I would like to create a Docker image for our Nexus instance with the correct repositories, proxies etc. already created.
Inspired by this question, I started using the script API to configure my repositories. The repositories configured through this API don't work like the ones configured manually, though (how sad, especially if you imagine the trouble I went through to get the configuration done with the undocumented script API...). I have already filed a bug, so if you really want to know the details: https://issues.sonatype.org/browse/NEXUS-19891
Now my question: is there another way to configure the repositories non-interactively?
For Jenkins it is possible to put some default configuration in /usr/share/jenkins/ref, which is then used only at the first startup to give you an initial configuration. I was wondering whether something similar exists for Nexus, or some other way that I don't know about?
I use Python to do something similar to this:
curl -X POST -u admin:admin123 --header 'Content-Type: application/json' http://localhost:8081/service/rest/v1/script -d '{"name":"test","type":"groovy","content":"repository.createYumProxy('\''test'\'', '\''http://repository:8080/'\'')"}'
curl -X POST -u admin:admin123 --header "Content-Type: text/plain" 'http://127.0.0.1:8081/service/rest/v1/script/test/run'
the exact script that I post (more readable here than with all those escaped quotes):
repository.createYumProxy('{name}', '{url}');
configuration = repository.repositoryManager.get('{name}').configuration.copy();
configuration.attributes['proxy'] = [
    remoteUrl : "{url}",
    contentMaxAge : 0,
    metadataMaxAge : 0
]
configuration.attributes['negativeCache'] = [
    timeToLive : 1.0
]
repository.repositoryManager.update(configuration)
The part that was missing in my case was the repositoryManager.update(). As quoted on the ticket:
I think the important item(s) missing from your script is that you are not updating the repositoryManager with the new (copied) configuration (which causes the repository to stop/start and therefore reload config)
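Since the inline, quote-escaped form above is hard to read and maintain, the same script can also be posted from a file. A sketch, assuming the JSON body from the first curl call is saved in a (hypothetical) file named create-yum-proxy.json:
curl -u admin:admin123 -X POST --header 'Content-Type: application/json' http://localhost:8081/service/rest/v1/script -d @create-yum-proxy.json
curl -u admin:admin123 -X POST --header 'Content-Type: text/plain' http://localhost:8081/service/rest/v1/script/test/run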

How to send a request with current coverage in post-Integration script using Xcode Server?

I set up a bot to perform my continuous integration. But I need to send info about coverage to my database using its own API.
Using the following address: http://lb.mycompany.org/api/public/metrics I need to send a POST with the following parameters:
{"project_public_id": "myprojectid", "type": "coverage", "value": "50", "platform": "ios"}
How can I do this? How do I access code coverage from within the trigger script?
You can use the XCS_* environment variables that Xcode Server exposes to trigger scripts, but I don't know how to get the coverage value. Example for XCS_TESTS_COUNT:
curl -i -X POST -H "Content-Type:application/json" your_http_address -d '{"value":'$XCS_TESTS_COUNT'}'
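For completeness, a sketch of what the full post-integration call could look like, combining the payload from the question with the pattern above; COVERAGE_VALUE is just a placeholder shell variable, since (as noted) I don't know of an XCS_* variable that exposes coverage:
curl -i -X POST -H "Content-Type:application/json" http://lb.mycompany.org/api/public/metrics -d '{"project_public_id": "myprojectid", "type": "coverage", "value": "'$COVERAGE_VALUE'", "platform": "ios"}'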

How to remove orphaned tasks in Apache Mesos?

The problem may be caused by Mesos and Marathon being out of sync, but the solution mentioned on GitHub doesn't work for me.
When I found the orphaned tasks, what I did was:
restart Marathon
Marathon does not re-sync the orphaned tasks, but starts new tasks instead.
The orphaned tasks still hold resources, so I have to delete them.
I found all the orphaned tasks under the framework ef169d8a-24fc-41d1-8b0d-c67718937a48-0000, and
curl -XGET http://c196:5050/master/frameworks
shows that the framework is listed under unregistered_frameworks:
{
  "frameworks": [
    .....
  ],
  "completed_frameworks": [ ],
  "unregistered_frameworks": [
    "ef169d8a-24fc-41d1-8b0d-c67718937a48-0000",
    "ef169d8a-24fc-41d1-8b0d-c67718937a48-0000",
    "ef169d8a-24fc-41d1-8b0d-c67718937a48-0000"
  ]
}
I tried to delete the framework by its framework ID (so that the tasks under the framework would be deleted too):
curl -XPOST http://c196:5050/master/teardown -d 'frameworkId=ef169d8a-24fc-41d1-8b0d-c67718937a48-0000'
but I get No framework found with specified ID.
So, how do I delete the orphaned tasks?
There are two options:
Register a framework with the same framework ID, do reconciliation, and kill all the tasks you receive. For example, you can do it in the following manner:
Download the code: git clone https://github.com/janisz/mesos-cookbook.git
Change directory: cd mesos-cookbook/4_understanding_frameworks
In scheduler.go, change master to your URL.
If you want to mimic some other framework, create /tmp/framework.json and fill it with FrameworkInfo data:
{
  "id": "<mesos-framework-id>",
  "user": "<framework-user>",
  "name": "<framework-name>",
  "failover_timeout": 3600,
  "checkpoint": true,
  "hostname": "<hostname>",
  "webui_url": "<framework-web-ui>"
}
Run it: go run scheduler.go scheduler.pb.go mesos.pb.go
Get the list of all tasks: curl localhost:9090
Delete a task with: curl -X DELETE "http://10.10.10.10:9090/?id=task_id"
Alternatively (the second option), wait until the failover_timeout expires, so Mesos will delete these tasks for you.

Create a project for SonarQube with the REST API / Web API

We are trying to automate the creation of projects (including user/group management) in SonarQube, and I have already found the Web API documentation in our SonarQube 5.6 installation. But if I try to create a project with the following settings,
JSON file create-project.json:
{"key": "test1", "name": "Testprojekt1"}
curl request:
curl --noproxy '*' -D -X POST -k -u admin:admin -H 'content-type: application/json' -d create_project.json http://localhost:9000/api/projects/create
I get the error:
{"err_code":400,"err_msg":"Missing parameter: key"}
It's a bit strange because if I try e.g. the URL:
http://localhost:9000/api/projects/index
I get the list of the projects I created manuelly and if I try a request like
curl -u admin:admin -X POST 'http://localhost:9000/api/projects/create?key=myKey&name=myProject'
it works too, but I would like to use the new API because it looks like it supports many more functions than the 4.x API of SonarQube.
Maybe someone here can help me with this problem; I would be very thankful for every useful hint.
best regards
Dan
I found this question because I got the same "parameter missing" error message.
So here is what we both did not understand: the SonarQube API expects the parameters as plain URL parameters, not as a JSON body the way most REST APIs do today.
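For example, with the project data from the question, the create call works once the values are passed as URL/form parameters instead of a JSON body (a sketch; --data-urlencode simply takes care of any characters that would otherwise need manual URL encoding):
curl -u admin:admin -X POST 'http://localhost:9000/api/projects/create' --data-urlencode 'key=test1' --data-urlencode 'name=Testprojekt1'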
PS: It would be nice if this could be added to the SQ documentation.

YARN REST API - Spark job submission

I am trying to use the YARN REST API to submit the Spark jobs that I generally run via spark-submit on the command line.
My command-line spark-submit looks like this:
JAVA_HOME=/usr/local/java7/ HADOOP_CONF_DIR=/etc/hadoop/conf /usr/local/spark-1.5/bin/spark-submit \
--driver-class-path "/etc/hadoop/conf" \
--class MySparkJob \
--master yarn-cluster \
--conf "spark.executor.extraClassPath=/usr/local/hadoop/client/hadoop-*" \
--conf "spark.driver.extraClassPath=/usr/local/hadoop/client/hadoop-*" \
spark-job.jar --retry false --counter 10
Reading through the YARN REST API documentation at https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_APISubmit_Application, I tried to create the JSON payload to POST, which looks like this:
{
  "am-container-spec": {
    "commands": {
      "command": "JAVA_HOME=/usr/local/java7/ HADOOP_CONF_DIR=/etc/hadoop/conf org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --jar spark-job.jar --class MySparkJob --arg --retry --arg false --arg --counter --arg 10"
    },
    "local-resources": {
      "entry": [
        {
          "key": "spark-job.jar",
          "value": {
            "resource": "hdfs:///spark-job.jar",
            "size": 3214567,
            "timestamp": 1452408423000,
            "type": "FILE",
            "visibility": "APPLICATION"
          }
        }
      ]
    }
  },
  "application-id": "application_11111111111111_0001",
  "application-name": "test",
  "application-type": "Spark"
}
The problem I see is that the Hadoop config directory used to be local to the machine I was running jobs from; now that I submit the job via the REST API and it runs directly on the RM, I am not sure how to provide these details.
If you are trying to submit Spark jobs via REST APIs, I suggest having a look at Livy. It is a simple and easy way to submit Spark jobs to a cluster.
Livy is an open source REST interface for interacting with Apache Spark from anywhere. It supports executing snippets of code or programs in a Spark context that runs locally or in Apache Hadoop YARN.
Interactive Scala, Python and R shells
Batch submissions in Scala, Java, Python
Multiple users can share the same server (impersonation support)
Can be used for submitting jobs from anywhere with REST
Does not require any code change to your programs
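For example, a batch submission of the job from the question could look roughly like this; a sketch, assuming a Livy server running on its default port 8998 (livy-host is a placeholder) and the jar already uploaded to HDFS as in the local-resources entry above:
curl -X POST -H "Content-Type: application/json" http://livy-host:8998/batches -d '{
  "file": "hdfs:///spark-job.jar",
  "className": "MySparkJob",
  "args": ["--retry", "false", "--counter", "10"]
}'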
We have also tried submitting applications through the Java RMI option.
