How to remove orphaned tasks in Apache Mesos? - mesos

The problem may be caused by Mesos and Marathon being out of sync, but the solution mentioned on GitHub doesn't work for me.
When I found the orphaned tasks, what I did was restart Marathon. Marathon did not re-sync the orphaned tasks, but started new tasks instead.
The orphaned tasks still hold their resources, so I have to delete them.
I found all the orphaned tasks under framework ef169d8a-24fc-41d1-8b0d-c67718937a48-0000, and
curl -XGET http://c196:5050/master/frameworks
shows that the framework is listed under unregistered_frameworks:
{
"frameworks": [
.....
],
"completed_frameworks": [ ],
"unregistered_frameworks": [
"ef169d8a-24fc-41d1-8b0d-c67718937a48-0000",
"ef169d8a-24fc-41d1-8b0d-c67718937a48-0000",
"ef169d8a-24fc-41d1-8b0d-c67718937a48-0000"
]
}
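To pull just the unregistered framework IDs out of that response (assuming jq is installed):
curl -s http://c196:5050/master/frameworks | jq '.unregistered_frameworks'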
I tried to delete the framework by framework ID (so that the tasks under it would be deleted too):
curl -XPOST http://c196:5050/master/teardown -d 'frameworkId=ef169d8a-24fc-41d1-8b0d-c67718937a48-0000'
but I got No framework found with specified ID.
So, how do I delete the orphaned tasks?

There are two options.
Option 1: Register a framework with the same framework ID, do reconciliation, and kill every task you receive. For example, you can do it in the following manner:
Download the code: git clone https://github.com/janisz/mesos-cookbook.git
Change directory: cd mesos-cookbook/4_understanding_frameworks
In scheduler.go, change master to your master URL.
If you want to mimic some other framework, create /tmp/framework.json and fill it with its FrameworkInfo data:
{
"id": "<mesos-framework-id>",
"user": "<framework-user>",
"name": "<framework-name>",
"failover_timeout": 3600,
"checkpoint": true,
"hostname": "<hostname>",
"webui_url": "<framework-web-ui>"
}
Run it: go run scheduler.go scheduler.pb.go mesos.pb.go
Get the list of all tasks: curl localhost:9090
Delete a task with: curl -X DELETE "http://10.10.10.10:9090/?id=task_id"
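If there are many orphaned tasks, a small shell loop over that DELETE endpoint can help. This is only a sketch; it assumes the task listing comes back as one task ID per line, so adjust the parsing to the scheduler's actual output:
for task_id in $(curl -s http://localhost:9090); do
  curl -X DELETE "http://localhost:9090/?id=${task_id}"
done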
Option 2: Wait until the failover_timeout expires, so that Mesos deletes these tasks for you.

Related

What is the best way to automate the elasticsearch snapshot and restore

I need to automate the snapshot and restore from one cluster to a backup cluster, but when I try to restore the snapshot it complains that the indices already exist. Then I either need to delete those indices or close them so they can be freshly restored. Is there any --force kind of option to overwrite everything from the live cluster to the backup cluster?
There is a re-indexing option, but that is slow compared to snapshot and restore.
You can define rename_pattern and rename_replacement as the documentation suggests. To make it fully automated you could add a time/date suffix:
POST /_snapshot/my_backup/snapshot_1/_restore
{
"indices": "index_1,index_2",
"ignore_unavailable": true,
"include_global_state": true,
"rename_pattern": "(.+)",
"rename_replacement": "$1_20180820"
}
And then use aliases to make this "backup" index look like a "normal" one:
POST /_aliases
{
"actions" : [
{ "add" : { "index" : "index_1_20180820", "alias" : "index_1" } }
]
}
Of course this means that you would have to write some automation scripts that generate that time/date and check the snapshot restore progress.
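A minimal bash sketch of that automation, reusing the repository and index names from the example above (the localhost:9200 host and the progress check via _cat/recovery are assumptions to adapt):
SUFFIX=$(date +%Y%m%d)
curl -s -X POST "http://localhost:9200/_snapshot/my_backup/snapshot_1/_restore" -H 'Content-Type: application/json' -d "{
  \"indices\": \"index_1,index_2\",
  \"ignore_unavailable\": true,
  \"include_global_state\": true,
  \"rename_pattern\": \"(.+)\",
  \"rename_replacement\": \"\$1_${SUFFIX}\"
}"
# Then poll recovery until the restored indices report "done"
curl -s "http://localhost:9200/_cat/recovery?v"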
Hope that helps!

Spark REST API, submit application NullPointerException on Windows

I use my PC as the Spark server and, at the same time, as the Spark worker, using Spark 2.3.1.
At first, I used my Ubuntu 16.04 LTS machine.
Everything works fine: I tried to run the SparkPi example (using spark-submit and spark-shell) and it ran without problems.
I also try to run it using REST API from Spark, with this POST string:
curl -X POST http://192.168.1.107:6066/v1/submissions/create --header "Content-Type:application/json" --data '{
"action": "CreateSubmissionRequest",
"appResource": "file:/home/Workspace/Spark/spark-2.3.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.1.jar",
"clientSparkVersion": "2.3.1",
"appArgs": [ "10" ],
"environmentVariables" : {
"SPARK_ENV_LOADED" : "1"
},
"mainClass": "org.apache.spark.examples.SparkPi",
"sparkProperties": {
"spark.jars": "file:/home/Workspace/Spark/spark-2.3.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.1.jar",
"spark.driver.supervise":"false",
"spark.executor.memory": "512m",
"spark.driver.memory": "512m",
"spark.submit.deployMode":"cluster",
"spark.app.name": "SparkPi",
"spark.master": "spark://192.168.1.107:7077"
}
}'
After testing this and that, I had to move to Windows, since it will be done on Windows anyway.
I was able to run the server and worker (manually), add winutils.exe, and run the SparkPi example using spark-shell and spark-submit; everything ran there too.
The problem is when I used the REST API, using this POST string:
curl -X POST http://192.168.1.107:6066/v1/submissions/create --header "Content-Type:application/json" --data '{
"action": "CreateSubmissionRequest",
"appResource": "file:D:/Workspace/Spark/spark-2.3.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.1.jar",
"clientSparkVersion": "2.3.1",
"appArgs": [ "10" ],
"environmentVariables" : {
"SPARK_ENV_LOADED" : "1"
},
"mainClass": "org.apache.spark.examples.SparkPi",
"sparkProperties": {
"spark.jars": "file:D:/Workspace/Spark/spark-2.3.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.1.jar",
"spark.driver.supervise":"false",
"spark.executor.memory": "512m",
"spark.driver.memory": "512m",
"spark.submit.deployMode":"cluster",
"spark.app.name": "SparkPi",
"spark.master": "spark://192.168.1.107:7077"
}
}'
Only the path is a little different, but my worker always failed.
The logs said:
"Exception from the cluster: java.lang.NullPointerException
org.apache.spark.deploy.worker.DriverRunner.downloadUserJar(DriverRunner.scala:151)
org.apache.spark.deploy.worker.DriverRunner.prepareAndRunDriver(DriverRunner.scala:173)
org.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:92)"
I searched but no solution has come up yet.
So, finally I found the cause.
I read the source from:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/worker/DriverRunner.scala
From inspecting it, I concluded that the problem was not in Spark itself, but that the parameter was not being read correctly, which means I had somehow used the wrong parameter format.
So, after trying out several things, this is the right one:
"appResource": "file:D:/Workspace/Spark/spark-2.3.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.1.jar"
changed to:
"appResource": "file:///D:/Workspace/Spark/spark-2.3.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.1.jar"
And I did the same with the spark.jars param.
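So the corrected properties in the Windows request end up looking like this (same payload as before, only the two URIs change):
"appResource": "file:///D:/Workspace/Spark/spark-2.3.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.1.jar",
...
"spark.jars": "file:///D:/Workspace/Spark/spark-2.3.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.1.jar",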
That little difference cost me almost 24 hours of work...

Heroku release phase bundle not found

I have the following release phase in my Procfile:
release: bundle exec rake db:migrate
It works great when I merge PR's into my staging and production apps, but it fails when running on a new review app. The Heroku docs say that the release phase is run after a successful build, so I don't know why it can't find bundle.
This is my output
heroku releases:output 9 --app my-app-pr-253
/bin/sh: 1: bundle: not found
For Heroku's review apps, you must specify all buildpacks and ENV vars you need in the app.json file. You can either manually create one, or have Heroku generate one for you.
https://devcenter.heroku.com/articles/github-integration-review-apps#app-json
Confirm that in your app.json you have specified:
1) The required buildpacks (https://devcenter.heroku.com/changelog-items/670). Since you are using bundle, I'm guessing heroku/ruby will be one of them. Below is an example.
"buildpacks": [
{
"url": "https://github.com/heroku/heroku-buildpack-ruby.git"
},
2) Also make sure you specify any config variables that you want your review app to inherit from the parent app it is being built off: https://devcenter.heroku.com/articles/app-json-schema#env. Missing one of these could also cause the build to fail (see the combined example below).
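Putting both together, a minimal app.json sketch might look like the following (SOME_SECRET is just a placeholder for whatever config var your release phase needs):
{
  "buildpacks": [
    { "url": "https://github.com/heroku/heroku-buildpack-ruby.git" }
  ],
  "env": {
    "SOME_SECRET": { "required": true }
  }
}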
If neither of these works, try checking the logs for your Heroku app. Watch them in the Heroku GUI during the build, and also try tailing the logs from the CLI:
heroku logs -t -a <review_app_name>
I figured out my problem. It was a silly typo:
"buildpacks": [
{
"url": "heroku/ruby",
"url": "https://github.com/guillaume-tgl/heroku-buildpack-ghostscript.git"
}
]
should have been:
"buildpacks": [
{ "url": "heroku/ruby"},
{ "url": "https://github.com/guillaume-tgl/heroku-buildpack-ghostscript.git" }
]

"Error detecting buildpack" during Heroku CI Test Setup

All,
I'm attempting to use Heroku's new-ish Continuous Integration service, but it doesn't appear to want to play well with its own framework.
I've set up my Heroku Pipeline as outlined in the CI article: https://devcenter.heroku.com/articles/heroku-ci#configuration-using-app-json.
My deployments to review apps work correctly.
But my CI tests error with the following
app.json
"buildpacks": [
{ "url": "heroku/jvm" },
{ "url": "heroku/nodejs" }
],
Results in
$ heroku ci:debug --pipeline mypipelinename
Preparing source... done
Creating test run... done
Running setup and attaching to test dyno...
~ $ ci setup && eval $(ci env)
-----> Fetching heroku/jvm buildpack...
error downloading buildpack
I'm using the JVM buildpack so that I can install liquibase, which manages version control for my PostgreSQL DB, but I'm actually deploying a Node.js app.
Why would my review apps deploy without problems but die during the test setup?
I managed to get past this by using the GitHub URL for the Node buildpack:
"buildpacks": [
{
"url": "https://github.com/heroku/heroku-buildpack-nodejs"
}
],
I imagine it will work the same for the JVM buildpack.
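For reference, the same workaround applied to both buildpacks would presumably look like this (heroku-buildpack-jvm-common is my guess at the repository behind the heroku/jvm shorthand):
"buildpacks": [
  { "url": "https://github.com/heroku/heroku-buildpack-jvm-common" },
  { "url": "https://github.com/heroku/heroku-buildpack-nodejs" }
],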

Create database by manifest when deploying with Jelastic Packaging Standard

I'm trying to write a manifest for JPS deployment of a Jelastic application.
Creating nodes and deploying webapps works fine, but I can't create a database and load a SQL dump into it using the manifest directives.
My configs section looks like this:
"configs": [
{
"nodeType": "postgres9",
"restart": false,
"database": [{
"name": "somedbname",
"user" : "someusername",
"dump": "http://www.somehost.de/jelastic/somedump.sql"
}]
},
...
]
...
It seems that the database section is completely ignored.
Any ideas?
Likely you have extra square brackets around the database object definition, i.e. you must have "database": { ... } instead of "database": [{ ... }].
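In other words, the snippet from the question would become (same values, just without the array brackets):
"configs": [
  {
    "nodeType": "postgres9",
    "restart": false,
    "database": {
      "name": "somedbname",
      "user": "someusername",
      "dump": "http://www.somehost.de/jelastic/somedump.sql"
    }
  },
  ...
]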
I can also suggest reviewing the example from Cyclos. Their idea is to download an executable bash script, which is then started by cron and does everything required to set up the database, including adding a new user, extensions, etc.
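A rough sketch of that kind of setup script (the webadmin user and the password are placeholders, and it assumes psql is available on the Postgres node):
#!/bin/bash
# Download the dump and load it into a freshly created database
curl -fsSL http://www.somehost.de/jelastic/somedump.sql -o /tmp/somedump.sql
psql -U webadmin -d postgres -c "CREATE USER someusername WITH PASSWORD 'changeme';"
psql -U webadmin -d postgres -c "CREATE DATABASE somedbname OWNER someusername;"
psql -U webadmin -d somedbname -f /tmp/somedump.sql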
Best regards.
