Killing xxx because the cloud is no longer accepting new H2O nodes - h2o

plz help~
i create h2o-stateful-set which set replicas: 3, then i run a h2o automl job, it works well. but suddenly one of pod breakdown, i use kubectl delete pod h2o-k8s-1 to delete this pod. the statefulset create a new pod has same name h2o-k8s-1.
But here's the problem, the new pod can't join h2o cluster, and job stuck, logs as follows
FJ-126-3 WARN water.default: Killing h2o-stateful-set-1.h2o-service.dhr-h2o.svc.cluster.local/10.177.5.212:54321 because the cloud is no longer accepting n
ew H2O nodes.
i know New H2O nodes join to form a cluster during launch. After a job has started on the cluster, it prevents new members from joining. but what should i do if cluster pod breakdown during training?

Yes, this is expected. Once one of the nodes crashes, you will need to restart the whole cluster. You need to make sure that you configure your kubernets jobs so that the pods are not preempted.

Related

Build a Kubernetes Operator For rolling updates

I have created a Kubernetes application (Say deployment D1, using docker image I1), that will run on client clusters.
Requirement 1 :
Now, I want to roll updates whenever I update my docker image I1, without any efforts from client side
(Somehow, client cluster should automatically pull the latest docker image)
Requirement 2:
Whenever, I update a particular configMap, the client cluster should automatically start using the new configMap
How should I achieve this ?
Using Kubernetes Cronjobs ?
Kubernetes Operators ?
Or something else ?
I heard that k8s Operator can be useful
Starting with the Requirement 2:
Whenever, I update a particular configMap, the client cluster should
automatically start using the new configMap
If configmap is mounted to the deployment it will get auto-updated however if getting injected as the Environment restart is only option unless you are using the sidecar solution or restarting the process.
For ref : Update configmap without restarting POD
How should I achieve this ?
ImagePullpolicy is not a good option i am seeing however, in that case, manual intervention is required to restart deployment and it
pulls the latest image from the client side and it won't be in a
controlled manner.
Using Kubernetes Cronjobs ?
Cronjobs you will run which side ? If client-side it's fine to do
that way also.
Else you can keep deployment with Exposed API which will run Job to
update the deployment with the latest tag when any image gets pushed
to your docker registry.
Kubernetes Operators ?
An operator is a good native K8s option you can write in Go,
Python or your preferred language with/without Operator framework or Client Libraries.
Or something else?
If you just looking for updating the deployment, Go with running the API in the deployment or Job you can schedule in a controlled manner, no issue with the operator too would be a more native and a good approach if you can create, manage & deploy one.
If in the future you have a requirement to manage all clusters (deployment, service, firewall, network) of multiple clients from a single source of truth place you can explore the Anthos.
Config management from Git repo sync with Anthos
You can build a Kubernetes operator to watch your particular configmap and trigger cluster restart. As for the rolling updates, you can configure the deployment according to your requirement. A Deployment's rollout is triggered if and only if the Deployment's Pod template (that is, .spec.template) is changed, for example, if the labels or container images of the template are updated. Add the specifications for rolling update on your Kubernetes deployment .spec section:
type: RollingUpdate
rollingUpdate:
maxSurge: 3 //the maximum number of pods to be created beyond the desired state during the upgrade
maxUnavailable: 1 //the maximum number of unavailable pods during an update
timeoutSeconds: 100 //the time (in seconds) that waits for the rolling event to timeout
intervalSeconds: 5 //the time gap in seconds after an update
updatePeriodSeconds: 5 //time to wait between individual pods migrations or updates

Spring Cloud Kubernetes Leader Election Not Updating When Pod is Killed

I have a Spring Boot web application, packaged as a container that I am deploying in Kubernetes. The application has a scheduled job that must run periodically only on one of the running pods.
I'm using Spring Cloud Kubernetes's leader election capability, with Fabric8 as the provider, to select the pod that runs this job.
Suppose I run this service with one pod instance. The first time I start this service, it elects the one pod as the leader, and everything works as expected. But if I kill the pod, the new pod instance does not recover; it is unable to elect itself as the leader. The configmap that indicates which pod is the leader still points to the old killed pod.
If I delete the configmap entry, the pod detects that there is no leader and then elects itself.
How can I fix this so that my new pod elects itself as the leader automatically, without me having to delete the configmap?
Ideally upon the leader died lease won't get renewed as per example after sometime, thus lease resources should be acquired by another candidate after a timeout.
Otherwise terminate the pod gracefully, in the shutdown hook execute the cleanup logic.

Daemonset pod to check a specific service that is different for every pod

I have a clickhouse cluster that is running on pods in kubernetes and I have a clickhouse-explorer that should be checking one clickhouse node each. I was thinking about using daemonset for this but I don't know how to force in the config that each pod to connect to a different service. Note I don't want to use sidecontainer for the clickhouse.
Any suggestions?

High availability issue with rethinkdb cluster in kubernetes

I'm setting up rethinkdb cluster inside kubernetes, but it doesn't work as expected for high availability requirement. Because when a pod is down, kubernetes will creates another pod, which runs another container of the same image, old mounted data (which is already persisted on host disk) will be erased and the new pod will join the cluster as a brand new instance. I'm running k8s in CoreOS v773.1.0 stable.
Please correct me if i'm wrong, but that way it seems impossible to setup a database cluster inside k8s.
Update: As documented here http://kubernetes.io/v1.0/docs/user-guide/pod-states.html#restartpolicy, if RestartPolicy: Always it will restart the container if exits failure. It means by "restart" that it brings up the same container, or create another one? Or maybe because I stop the pod via command kubectl stop po so it doesn't restart the same container?
That's how Kubernetes works, and other solution works probably same way. When a machine is dead, the container on it will be rescheduled to run on another machine. That other machine has no state of container. Event when it is the same machine, the container on it is created as a new one instead of restarting the exited container(with data inside it).
To persistent data, you need some kind of external storage(NFS, EBS, EFS,...). In case of k8s, you may want to look into this https://github.com/kubernetes/kubernetes/blob/master/docs/design/persistent-storage.md This Github issue also has many information https://github.com/kubernetes/kubernetes/issues/6893
And in deed, that's the way to achieve HA in my opinion. Container are all stateless, they don't hold anything inside them. Any configuration needs for them should be store outside such as using thing like Consul or Etcd. By separating this like this, it's easier to restart a container
Try using PetSets http://kubernetes.io/docs/user-guide/petset/
That allows you to name your (pet) pods. If a pod is killed, then it will come back with the same name.
Summary of the petset feature is as follows.
Stable hostname
Stable domain name
Multiple pets of a similar type will be named with a "-n" (rethink-0,
rethink-1, ... rethink-n for example)
Persistent volumes
Now apps can cluster/peer together
When a pet pod dies, a new one will be started and will assume all the same "state" (including disk) of the previous one.

Adding node to existing cluster in Kubernetes

I have a kubernetes cluster running on 2 machines (master-minion node and minion node). I want to add a new minion node without disrupting the current set up, is there a way to do it?
I have seen that when I try to add the new node, the services on the other nodes stops it, due to which I have to stop the services before deploying the new node to the existing cluster.
To do this in the latest version (tested on 1.10.0) you can issue following command on the masternode:
kubeadm token create --print-join-command
It will then print out a new join command (like the one you got after kubeadmn init):
kubeadm join 192.168.1.101:6443 --token tokentoken.lalalalaqyd3kavez --discovery-token-ca-cert-hash sha256:complexshaoverhere
You need to run kubelet and kube-proxy on a new minion indicating api address in params.
Example:
kubelet --api_servers=http://<API_SERVER_IP>:8080 --v=2 --enable_server --allow-privileged
kube-proxy --master=http://<API_SERVER_IP>:8080 --v=2
After this you should see new node in
kubectl get no
In my case the issue was due to an existing wront Route53 "A" record.
Once it's been updated to point to internal IPs of API servers, kube-proxy was able to reach the masters and the node appeared in the list (kubectl get nodes).

Resources