Event-hook upon up/down-scaling or deletion of an App - mesos

I couldn't find any information in the Marathon REST API docs at https://mesosphere.github.io/marathon/docs/rest-api.html on whether it is possible to define something like an event hook for up/down-scaling or deletion of an app.
What I'd like to achieve is to back up some data from a running Docker container before it is destroyed. For example, I run a cluster of Elasticsearch nodes on Marathon, and I would like to delay the deletion of the app until the "Create snapshot to external disk resource" process that it triggers has finished.
Is there currently something I could use?

Marathon provides an Event Bus covering some phases of the lifecycle. Beyond that, currently the only other option I see is to go for Mesos Modules/Hooks.
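As a rough illustration of consuming the Event Bus, here is a minimal Python sketch that parses a server-sent-events stream and keeps only some lifecycle events. It assumes the stream comes from Marathon's /v2/events endpoint and that the event type names in WATCHED match your Marathon version; both should be checked against the docs. The sketch only observes events and does not by itself delay a deletion.

```python
import json

# Assumption: these event type names exist in your Marathon version.
WATCHED = {"deployment_info", "app_terminated_event", "status_update_event"}


def parse_sse(lines):
    """Yield (event_type, payload) pairs from an iterable of SSE text lines."""
    event_type, data = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("event:"):
            event_type = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "" and data:
            # A blank line terminates one event in the SSE format.
            yield event_type, json.loads("\n".join(data))
            event_type, data = None, []


def watched_events(lines):
    """Return only the events whose type is listed in WATCHED."""
    return [(t, p) for t, p in parse_sse(lines) if t in WATCHED]
```

In practice the lines would come from a streaming HTTP response opened with the header Accept: text/event-stream; a backup job could then be started when an event for the relevant app ID shows up.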

Related

Mesos task history after restart

I am using Mesos for container orchestration and get the task history from Mesos using the /tasks endpoint.
Mesos is running in a 7-node cluster and ZooKeeper is running in a 3-node cluster. I assumed Mesos uses ZooKeeper to store the task history, but we sometimes lose the history when we restart Mesos. Is it stored in memory? I am trying to understand what is happening here.
My questions are,
Where does it store task histories?
How can we configure the task history cleanup policy?
Why do we lose complete task history on restarting Mesos?
To answer your questions:
Task history/state for Mesos is stored in memory and in the replicated_log. The replicated_log is the default; to keep state entirely in memory without it, set the flag --registry=in_memory (see the Mesos configuration page).
Most users configure task history cleanup with these three flags (there are more, but these are the most common): --max_completed_frameworks=VALUE, --max_completed_tasks_per_framework=VALUE, and --max_unreachable_tasks_per_framework=VALUE, as described in the same configuration documentation.
Yes, task history for the /tasks endpoint is lost every time a Mesos Master is restarted. However, the /state endpoint will still contain all task status changes over time.
Edit: revised to reflect information about the /tasks endpoint, not the /state endpoint.
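To make the cleanup flags concrete, here is a small Python sketch that assembles them into command-line arguments for the Mesos master. The helper name and the numeric values are hypothetical; only the flag names come from the answer above, and a real master needs further flags that are not shown here.

```python
def master_history_flags(max_frameworks, max_tasks_per_framework, max_unreachable_tasks):
    """Build the three history-related master flags as a list of CLI arguments."""
    return [
        f"--max_completed_frameworks={max_frameworks}",
        f"--max_completed_tasks_per_framework={max_tasks_per_framework}",
        f"--max_unreachable_tasks_per_framework={max_unreachable_tasks}",
    ]


# Hypothetical values, chosen only for illustration.
flags = master_history_flags(50, 1000, 1000)
print(" ".join(flags))
```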

How to specify where the application should run when failover in Marathon

I'm using Mesos and Marathon. I created an application on Marathon.
When an application fails over to another node in the cluster, can we control where it is started?
I tried a LIKE constraint in Marathon, but it doesn't work as I expected.
Thanks in advance
You can use a LIKE or UNLIKE constraint (or a set of constraints) to restrict where Marathon can place any given app instance; however, you can't choose a specific node upon failure.
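For illustration, here is a minimal Python sketch that builds a Marathon app definition with a hostname constraint. The app ID, command, resource values, and hostname pattern are hypothetical; the constraint is the usual three-element list of field, operator, and regular expression.

```python
import json


def app_with_hostname_constraint(app_id, cmd, operator, pattern):
    """Return an app definition restricted to (or away from) matching hosts."""
    if operator not in ("LIKE", "UNLIKE"):
        raise ValueError("operator must be LIKE or UNLIKE")
    return {
        "id": app_id,
        "cmd": cmd,
        "instances": 1,
        "cpus": 0.1,
        "mem": 32,
        # Field, operator, regular expression matched against the hostname.
        "constraints": [["hostname", operator, pattern]],
    }


# Hypothetical app and host names.
app = app_with_hostname_constraint("/my-app", "sleep 3600", "LIKE", "node-[1-3]")
print(json.dumps(app, indent=2))
```

The resulting JSON would be posted to Marathon's apps endpoint. The pattern narrows the set of eligible nodes, but Marathon still picks which of them to use.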

What services can I turn off to trigger a second application attempt?

Hadoop YARN includes a configuration to modify how many times an application can be started: yarn.resourcemanager.am.max-attempts.
I am interested in hitting this limit to observe how the system may fail, and I want to be able to do it without modifying code. To mimic production scenarios, I would like to turn off other Hadoop services to cause a second attempt of the application.
What services can I turn off during the application run to trigger another application attempt?
For simplicity, just shut down the storage services (those hosting your source or target data), for example the HDFS service or the Hive service.

Provision to start group of applications on same Mesos slave

I have a cluster of 3 Mesos slaves on which I run two applications: “redis” and “memcached”. redis depends on memcached, and the requirement is that both applications/services start on the same node instead of on different slave nodes.
So I created an application group and added the dependency in the JSON file. After launching the JSON file via the “v2/groups” REST API, I observe that sometimes both applications in the group start on the same node, but sometimes they start on different slaves, which breaks our requirement.
So the requirement is: if either application fails to start on a slave, both applications should fail over to another slave node. Also, can I configure the JSON file to tell Marathon to start the application group on slave-1 (a specific slave) first if it is available, and otherwise on another slave in the cluster? And if for some reason the application group starts on another slave, can Marathon relaunch it on slave-1 once that slave is available to serve requests?
Thanks in advance for help.
Edit/Update (2):
Mesos, Marathon, and DC/OS support for PODs is available now:
DC/OS: https://dcos.io/docs/1.9/usage/pods/using-pods/
Mesos: https://github.com/apache/mesos/blob/master/docs/nested-container-and-task-group.md
Marathon: https://github.com/mesosphere/marathon/blob/master/docs/docs/pods.md
I assume you are talking about marathon apps.
Marathon application groups don't have any semantics concerning co-location on the same node, and the same is true of dependencies.
You seem to be looking for a Kubernetes-like Pod abstraction in Marathon, which is on the roadmap but not yet available (see update above :-)).
Hope this helps!
I think this should be possible (as a workaround) if you specify the correct app constraints within the group's JSON.
Have a look at the example request at
https://mesosphere.github.io/marathon/docs/generated/api.html#v2_groups_post
and the constraints syntax at
https://mesosphere.github.io/marathon/docs/constraints.html
e.g.
"constraints": [["hostname", "CLUSTER", "slave-1"]]
should do. The downside is that there will be no automatic failover to another slave that way. Still, I'd be curious why both apps need to run specifically on the same slave node...
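A sketch of that workaround in Python: it builds a group payload in which every app carries the same CLUSTER constraint on the hostname, and each app depends on the one before it. The group ID, commands, and resource values are hypothetical, and the exact payload shape should be checked against the v2/groups documentation linked above.

```python
import json


def colocated_group(group_id, hostname, app_names):
    """Build a group whose apps are all pinned to one host via CLUSTER."""
    apps = []
    for i, name in enumerate(app_names):
        app = {
            "id": f"{group_id}/{name}",
            "cmd": "sleep 3600",  # placeholder command
            "instances": 1,
            "cpus": 0.1,
            "mem": 64,
            "constraints": [["hostname", "CLUSTER", hostname]],
        }
        if i > 0:
            # Each app depends on the previous one in the list.
            app["dependencies"] = [f"{group_id}/{app_names[i - 1]}"]
        apps.append(app)
    return {"id": group_id, "apps": apps}


group = colocated_group("/cache", "slave-1", ["memcached", "redis"])
print(json.dumps(group, indent=2))
```

Because every app is pinned to the same hostname, neither of them can move elsewhere if that slave goes away, which is the failover limitation mentioned above.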

High availability issue with rethinkdb cluster in kubernetes

I'm setting up a RethinkDB cluster inside Kubernetes, but it doesn't work as expected for our high-availability requirement. When a pod goes down, Kubernetes creates another pod, which runs another container of the same image; the old mounted data (which is already persisted on the host disk) is erased, and the new pod joins the cluster as a brand-new instance. I'm running k8s on CoreOS v773.1.0 stable.
Please correct me if I'm wrong, but that way it seems impossible to set up a database cluster inside k8s.
Update: As documented at http://kubernetes.io/v1.0/docs/user-guide/pod-states.html#restartpolicy, with RestartPolicy: Always the container is restarted if it exits with a failure. Does "restart" mean that the same container is brought up again, or that another one is created? Or is it that, because I stopped the pod with the command kubectl stop po, the same container isn't restarted?
That's how Kubernetes works, and other solutions probably work the same way. When a machine dies, the containers on it are rescheduled to run on another machine, and that other machine has none of the container's state. Even when it is the same machine, the container is created as a new one instead of the exited container (with the data inside it) being restarted.
To persist data, you need some kind of external storage (NFS, EBS, EFS, ...). In the case of k8s, you may want to look into https://github.com/kubernetes/kubernetes/blob/master/docs/design/persistent-storage.md. This GitHub issue also has a lot of information: https://github.com/kubernetes/kubernetes/issues/6893
And indeed, that's the way to achieve HA, in my opinion. Containers are all stateless; they don't hold anything inside them. Any configuration they need should be stored outside, for example in something like Consul or etcd. Separating things like this makes it easier to restart a container.
Try using PetSets http://kubernetes.io/docs/user-guide/petset/
That allows you to name your (pet) pods. If a pod is killed, then it will come back with the same name.
Summary of the petset feature is as follows.
Stable hostname
Stable domain name
Multiple pets of a similar type will be named with a "-n" (rethink-0, rethink-1, ... rethink-n for example)
Persistent volumes
Now apps can cluster/peer together
When a pet pod dies, a new one will be started and will assume all the same "state" (including disk) of the previous one.

Resources