I was following this guide for setting up Kubernetes on AWS.
At the end of the guide, I was running 1 master and 3 minions on AWS EC2 instances. Then I decided to shut them all off, thinking that if the master is down, none of them would come back up.
A few hours later, I found 4 minions running.
Could it be that the master came back up as a minion? Did the order of operations somehow cause this to happen?
How do I safely kill them all?
To safely shut down your cluster you should run cluster/kube-down.sh as described at the end of the guide you linked to.
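In case it helps, that looks roughly like this (assuming you are still in the Kubernetes release directory the guide had you set up, and that the AWS provider is selected; adjust if your setup differs):
export KUBERNETES_PROVIDER=aws
cluster/kube-down.sh
That tears down the EC2 instances that kube-up.sh created, so nothing should reappear afterwards.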
I have created a GCP Dataproc cluster in Standard mode (1 master, N workers). Now I want to upgrade it to High Availability mode (3 masters, N workers). Is that possible?
I tried the gcloud, gcloud alpha, and gcloud beta commands. For example, the gcloud beta command is documented here: https://cloud.google.com/sdk/gcloud/reference/beta/dataproc/clusters/update.
It has an option to scale the worker nodes, but it does not have an option to switch from Standard to High Availability mode. Am I correct?
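For reference, a worker-scaling call looks roughly like this (cluster name and worker count are placeholders); as you say, there is no flag to change the master configuration:
gcloud beta dataproc clusters update my-cluster --num-workers 5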
You can upgrade the master node by going into the VM Instances section under your cluster, stopping your master VM, and editing its configuration to a larger machine type.
You can always upgrade your master node's machine type and also add more worker nodes.
While that will improve your cluster's job performance, it has nothing to do with HA.
The answer is no. Once an HA cluster is created, it can't be downgraded, and vice versa. You can add worker nodes; however, the master nodes can't be altered.
Yes, you can always do that. To change the machine type of the master node, you first need to stop the master VM instance; then you can change the machine type.
Even the machine type of a worker node can be changed: all you need to do is stop the machine and edit its machine configuration.
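Sketched out with gcloud (the instance name, zone, and machine type below are placeholders you would need to adapt to your own cluster):
# stop the VM, change its machine type, then start it again
gcloud compute instances stop my-cluster-master --zone us-central1-a
gcloud compute instances set-machine-type my-cluster-master --machine-type n1-highmem-8 --zone us-central1-a
gcloud compute instances start my-cluster-master --zone us-central1-a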
We are running a 3-node Mesos cluster, and a Mesos master is running on each node. Also, 2 slaves are running on each node. Is this good practice? Won't 2 slaves on each node end up sending too many offers and getting overloaded? What is the recommended configuration for a 3-node cluster?
Thread from Mesos User Mailing List
It depends on your isolation setting (mainly cgroup, or any node level resources). In general, we don't recommend folks use multiple agents on a node.
It's possible to make it work by setting cgroup_root separately for MesosContainerizer. For DockerContainerizer, currently, we hard code DOCKER_NAME_PREFIX, making it not possible to use two agents on a node properly.
Running Docker containers won't work properly because restarting one agent will cause Docker containers managed by the other agent to be deleted.
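As a rough illustration of the MesosContainerizer case (the master address, ports, paths, and cgroup names are made up, and the agent flag is spelled --cgroups_root in the versions I'm aware of), two agents on one node would need at least distinct ports, work directories, and cgroup roots:
# first agent on this node
sudo mesos-slave --master=master-host:5050 --port=5051 --work_dir=/var/lib/mesos/agent1 --cgroups_root=mesos_agent1
# second agent on the same node, isolated from the first
sudo mesos-slave --master=master-host:5050 --port=5052 --work_dir=/var/lib/mesos/agent2 --cgroups_root=mesos_agent2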
I have installed DC/OS (3 master and 7 slave servers, all CentOS 7).
I see a problem: when one of the slave servers shuts down, Mesos/Marathon only starts the killed application instances elsewhere after 5 minutes.
For example, I run 8 instances of a simple web application in Mesos/Marathon. When I shut down one slave server or deactivate its network interface, Marathon shows that some instances are killed. From this moment, Mesos/Marathon waits 5 minutes and then starts the killed instances on another online slave server.
My question is: how can I change this time? 5 minutes is too long. I have read the DC/OS documentation, but I can't find the variable responsible for this.
I will be very thankful for your help.
You can have a look at the Marathon command-line flags. Based on your description, I guess the default for either task_launch_timeout or scale_apps_interval could be responsible for this.
I'm unsure, though, whether this can be configured on the fly or only during installation in DC/OS. I saw that there's a fairly recent enhancement request to Make Marathon flags passable via environment variables.
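Purely as an illustration of how such a flag would be set on a plain (non-DC/OS) Marathon install (the ZooKeeper addresses are placeholders, the value is in milliseconds, and whether this particular flag actually governs the 5-minute delay is only the guess above):
./bin/start --master zk://zk-host:2181/mesos --zk zk://zk-host:2181/marathon --task_launch_timeout 60000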
I have a cluster of 3 Mesos slaves, and I have two applications: “redis” and “memcached”. Redis depends on memcached, and the requirement is that both applications/services should start on the same node rather than on different slave nodes.
So I created the application group and added the dependency in the JSON file. After launching the JSON via the “v2/groups” REST API, I observe that sometimes both applications in the group start on the same node, but sometimes they start on different slaves, which breaks our requirement.
So the intent/requirement is: if either application fails to start on a slave, both applications should fail over to another slave node. Also, can I configure the JSON file to tell Marathon to start the application group on slave-1 (a specific slave) first if it is available, and otherwise start it on another slave in the cluster? And if for some reason the application group does start on another slave, can Marathon relaunch it on slave-1 once slave-1 is available to serve requests?
Thanks in advance for help.
Edit/Update (2):
Mesos, Marathon, and DC/OS support for PODs is available now:
DC/OS: https://dcos.io/docs/1.9/usage/pods/using-pods/
Mesos: https://github.com/apache/mesos/blob/master/docs/nested-container-and-task-group.md
Marathon: https://github.com/mesosphere/marathon/blob/master/docs/docs/pods.md
I assume you are talking about marathon apps.
Marathon application groups don't have any semantics concerning co-location on the same node and the same is the case for dependencies.
You seem to be looking for a Kubernetes-like Pod abstraction in Marathon, which is on the roadmap but not yet available (see the update above :-)).
Hope this helps!
I think this should be possible (as a workaround) if you specify the correct app constraints within the group's JSON.
Have a look at the example request at
https://mesosphere.github.io/marathon/docs/generated/api.html#v2_groups_post
and the constraints syntax at
https://mesosphere.github.io/marathon/docs/constraints.html
e.g.
"constraints": [["hostname", "CLUSTER", "slave-1"]]
should do it. The downside is that there will be no automatic failover to another slave that way. Still, I'd be curious why both apps specifically need to run on the same slave node...
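For illustration, a group definition pinning both apps to slave-1 might be posted like this (the Marathon host, app IDs, commands, and resource figures are placeholders; only the constraints and dependencies fields matter here):
curl -X POST http://marathon-host:8080/v2/groups -H 'Content-Type: application/json' -d '{
  "id": "/cache",
  "apps": [
    {
      "id": "memcached",
      "cmd": "memcached -p 11211",
      "cpus": 0.5, "mem": 64, "instances": 1,
      "constraints": [["hostname", "CLUSTER", "slave-1"]]
    },
    {
      "id": "redis",
      "cmd": "redis-server --port 6379",
      "cpus": 0.5, "mem": 128, "instances": 1,
      "dependencies": ["/cache/memcached"],
      "constraints": [["hostname", "CLUSTER", "slave-1"]]
    }
  ]
}'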
I'm trying to run a test cluster locally following this guide https://mesosphere.com/2014/07/07/installing-mesos-on-your-mac-with-homebrew/
Currently, I'm able to have a master running at localhost:5050 and a slave running at the default port 5051 (with slave ID, say, S0). However, when I tried to start another slave on a different port, it re-registered itself as S0 and the master console only showed 1 activated slave. Does anybody know how I would start another slave, S1? Thanks!
Did you specify another work_dir?
E.g.
sudo /usr/local/sbin/mesos-slave --master=localhost:5050 --port=5052 --work_dir=/tmp/mesos2
To explain a bit why this is needed and where the error you saw came from:
Mesos supports so-called slave recovery to help with upgrades and error recovery.
Therefore, when starting, a slave will check its work_dir for a checkpoint and try to recover that state (i.e., reconnect to still-running executors).
In your case, as both slaves wanted to start from the same working directory, the second one tried to recover the checkpoint of the still-running first slave...
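For completeness, running the two slaves side by side would then look something like this (the ports and directories are just example choices):
sudo /usr/local/sbin/mesos-slave --master=localhost:5050 --port=5051 --work_dir=/tmp/mesos1
sudo /usr/local/sbin/mesos-slave --master=localhost:5050 --port=5052 --work_dir=/tmp/mesos2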
P.S. I should probably replace all the above occurrences of slave with worker (https://issues.apache.org/jira/browse/MESOS-1478), but I hope this is easier to read.