Spring Cloud Kubernetes Leader Election Not Updating When Pod is Killed - spring-boot

I have a Spring Boot web application, packaged as a container that I am deploying in Kubernetes. The application has a scheduled job that must run periodically only on one of the running pods.
I'm using Spring Cloud Kubernetes's leader election capability, with Fabric8 as the provider, to select the pod that runs this job.
Suppose I run this service with one pod instance. The first time I start this service, it elects the one pod as the leader, and everything works as expected. But if I kill the pod, the new pod instance does not recover; it is unable to elect itself as the leader. The configmap that indicates which pod is the leader still points to the old killed pod.
If I delete the configmap entry, the pod detects that there is no leader and then elects itself.
How can I fix this so that my new pod elects itself as the leader automatically, without me having to delete the configmap?

Ideally, once the leader dies its lease is no longer renewed, so another candidate should be able to acquire the lease after the timeout expires.
Otherwise, terminate the pod gracefully and execute the cleanup logic (relinquishing leadership, i.e. removing the pod's entry from the configmap) in a shutdown hook.
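To make the graceful-shutdown path more concrete, here is a minimal pod-template sketch (container name, image and grace period are placeholder assumptions): the point is to give the JVM time to run its Spring shutdown hooks, which is where the cleanup logic (e.g. removing this pod's entry from the leader configmap) would execute. Note that a hard kill (SIGKILL or a node crash) never reaches a shutdown hook, which is exactly the stale-configmap situation described above.
# pod template snippet - a sketch; container name, image and grace period are placeholders
spec:
  terminationGracePeriodSeconds: 60   # give the JVM time to run its Spring shutdown hooks
  containers:
  - name: app
    image: registry.example.com/my-spring-app:1.0   # placeholder image
    lifecycle:
      preStop:
        exec:
          # brief pause so in-flight work drains before SIGTERM reaches the JVM; the
          # Spring context shutdown triggered by SIGTERM is where the configmap cleanup runs
          command: ["sh", "-c", "sleep 5"]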

Related

Build a Kubernetes Operator For rolling updates

I have created a Kubernetes application (say deployment D1, using Docker image I1) that will run on client clusters.
Requirement 1:
I want to roll out updates whenever I update my Docker image I1, without any effort on the client side
(somehow, the client cluster should automatically pull the latest Docker image).
Requirement 2:
Whenever I update a particular configMap, the client cluster should automatically start using the new configMap.
How should I achieve this?
Using Kubernetes CronJobs?
Kubernetes Operators?
Or something else?
I have heard that a k8s Operator can be useful.
Starting with Requirement 2:
Whenever I update a particular configMap, the client cluster should
automatically start using the new configMap
If the configmap is mounted into the deployment as a volume, it will get updated automatically; however, if it is injected as environment variables, a restart is the only option, unless you use a sidecar solution to reload or restart the process.
For reference: Update configmap without restarting POD
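A minimal sketch of the two consumption styles in one Deployment (all names are placeholders): values mounted as files are refreshed by the kubelet after a short delay, while values injected as environment variables are fixed at container start and require a restart.
# configmap consumed two ways - illustrative only, names are placeholders
apiVersion: apps/v1
kind: Deployment
metadata:
  name: d1
spec:
  replicas: 1
  selector:
    matchLabels: { app: d1 }
  template:
    metadata:
      labels: { app: d1 }
    spec:
      containers:
      - name: app
        image: registry.example.com/i1:latest
        envFrom:
        - configMapRef:
            name: app-config        # env values are captured at container start -> restart needed
        volumeMounts:
        - name: config
          mountPath: /etc/app       # files here are refreshed automatically (with some delay)
      volumes:
      - name: config
        configMap:
          name: app-config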
How should I achieve this ?
imagePullPolicy on its own is not a good option as I see it; manual intervention is still required to restart the deployment so that it
pulls the latest image on the client side, and it won't happen in a
controlled manner.
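For completeness, the pull policy mentioned above is set per container; even with Always, a fresh pull only happens when a pod is (re)created, so a restart is still needed:
# container snippet - illustrative, names are placeholders
containers:
- name: app
  image: registry.example.com/i1:latest
  imagePullPolicy: Always   # pulls on every pod start, but running pods keep the old image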
Using Kubernetes Cronjobs ?
On which side will the CronJobs run? If client-side, it's fine to do it
that way as well.
Otherwise, you can keep a deployment with an exposed API that runs a Job to
update the deployment with the latest tag whenever an image gets pushed
to your Docker registry.
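As a sketch of the client-side CronJob idea (the names, schedule and service account are assumptions, and the service account needs RBAC permission to patch deployments): periodically restart the rollout so that, together with imagePullPolicy: Always and a fixed tag, the new pods pull whatever currently sits behind the tag.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: d1-refresher          # placeholder name
spec:
  schedule: "0 * * * *"       # hourly, adjust as needed
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: deploy-refresher   # placeholder; needs RBAC to patch deployments
          restartPolicy: OnFailure
          containers:
          - name: kubectl
            image: bitnami/kubectl:latest
            # rollout restart changes the pod template (an annotation), which triggers a
            # rolling update; with imagePullPolicy: Always the new pods pull the latest image
            command: ["kubectl", "rollout", "restart", "deployment/d1"]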
Kubernetes Operators ?
An operator is a good, K8s-native option that you can write in Go,
Python or your preferred language, with or without the Operator Framework or the client libraries.
Or something else?
If you are just looking to update the deployment, go with running the API in the deployment, or a Job you can schedule in a controlled manner; an operator would also be a more native and good approach, provided you can create, manage and deploy one.
If in the future you have a requirement to manage all resources (deployments, services, firewall, network) of multiple client clusters from a single source of truth, you can explore Anthos.
Config management from Git repo sync with Anthos
You can build a Kubernetes operator to watch your particular configmap and trigger a restart of the affected workloads. As for the rolling updates, you can configure the deployment according to your requirements. A Deployment's rollout is triggered if and only if the Deployment's Pod template (that is, .spec.template) is changed, for example if the labels or container images of the template are updated. Add the rolling-update settings to your Kubernetes Deployment's .spec section:
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 3        # the maximum number of pods to be created beyond the desired state during the upgrade
    maxUnavailable: 1  # the maximum number of unavailable pods during an update
# timeoutSeconds, intervalSeconds and updatePeriodSeconds (timeout for the rolling event,
# pause after an update, and wait between individual pod updates) are not part of the
# apps/v1 Deployment API; use .spec.progressDeadlineSeconds for a rollout deadline instead
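Relatedly, a common pattern (not specific to this answer) for making a configmap change count as a .spec.template change is to stamp a hash of the configmap contents into a pod-template annotation, typically maintained by CI or a templating tool; the annotation key and hash below are placeholders:
spec:
  template:
    metadata:
      annotations:
        # updated whenever the configmap content changes; changing it modifies
        # .spec.template and therefore triggers a rolling update
        checksum/config: "9f86d081884c7d65"   # placeholder hash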

Kubernetes custom deployment

Our horizontal scaling is currently suffering because of Liquibase.
We would want our deployments to always deploy one pod which runs Liquibase (-Dspring.liquibase.enabled=true), and then all subsequent pods to not run it (-Dspring.liquibase.enabled=false).
Is there anything that Kubernetes offers which could do this out of the box?
I'm unfamiliar with Liquibase and I'm unclear how the non-first Pods use it, but you may be able to use a lock to control access. The Pod that acquires the lock sets the property to true; a Pod that cannot acquire the lock sets it to false.
One challenge will be ensuring that the lock is released if the first Pod terminates, and understanding the consequence for the other Pods: is an existing Pod promoted?
Even though Kubernetes leverages etcd for its own distributed locking purposes, users are encouraged to run separate etcd instances if they need locks. Since you have to choose, you may as well choose what you prefer, e.g. Redis or ZooKeeper.
You could use an init Container or sidecar for the locking mechanism and a shared volume to record its state.
It feels as though Liquibase should be a distinct Deployment exposed as a Service that all Pods access.
Have you contacted Liquibase to see what it recommends for Kubernetes deployments?
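As one possible reading of the "keep Liquibase separate from the scaled Pods" suggestion, here is a hedged sketch that runs the migration as a one-off Job and disables Liquibase in the Deployment, relying on Spring Boot's relaxed binding of SPRING_LIQUIBASE_ENABLED to spring.liquibase.enabled; the image names and the Job wrapper are assumptions, and the lock-based alternatives above remain valid.
# one-off migration run - placeholder names
apiVersion: batch/v1
kind: Job
metadata:
  name: app-liquibase-migrate
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: migrate
        image: registry.example.com/app:1.2.3
        env:
        - name: SPRING_LIQUIBASE_ENABLED
          value: "true"            # this run applies the changelog
---
# the horizontally scaled Deployment never runs Liquibase
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 5
  selector:
    matchLabels: { app: app }
  template:
    metadata:
      labels: { app: app }
    spec:
      containers:
      - name: app
        image: registry.example.com/app:1.2.3
        env:
        - name: SPRING_LIQUIBASE_ENABLED
          value: "false"           # all replicas skip Liquibase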

Killing xxx because the cloud is no longer accepting new H2O nodes

Please help!
I created an H2O StatefulSet with replicas: 3, then ran an H2O AutoML job, and it worked well. But suddenly one of the pods broke down, so I used kubectl delete pod h2o-k8s-1 to delete it. The StatefulSet created a new pod with the same name, h2o-k8s-1.
But here's the problem: the new pod can't join the H2O cluster and the job is stuck. The logs are as follows:
FJ-126-3 WARN water.default: Killing h2o-stateful-set-1.h2o-service.dhr-h2o.svc.cluster.local/10.177.5.212:54321 because the cloud is no longer accepting new H2O nodes.
I know that new H2O nodes join to form a cluster during launch, and that after a job has started on the cluster it prevents new members from joining. But what should I do if a cluster pod breaks down during training?
Yes, this is expected. Once one of the nodes crashes, you will need to restart the whole cluster. You also need to make sure that you configure your Kubernetes jobs so that the pods are not preempted.
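A sketch of the "not preempted" part, assuming the H2O pods carry an app: h2o-k8s label (names and values are placeholders): a high-value PriorityClass makes the pods unlikely to be preempted by the scheduler, and a PodDisruptionBudget with maxUnavailable: 0 blocks voluntary evictions such as node drains.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: h2o-high-priority        # placeholder name
value: 1000000                   # high value -> unlikely to be preempted by other workloads
globalDefault: false
description: "Priority for H2O training pods"
---
# referenced from the StatefulSet pod template:
#   spec:
#     template:
#       spec:
#         priorityClassName: h2o-high-priority
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: h2o-pdb
spec:
  maxUnavailable: 0              # blocks voluntary evictions (e.g. node drains) of the H2O pods
  selector:
    matchLabels:
      app: h2o-k8s               # placeholder label, must match the StatefulSet pods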

Does ECS update-service command marks the container instance state to draining when use with the --force-new-deployment option?

The command:
aws ecs update-service --service my-http-service --task-definition amazon-ecs-sample --force-new-deployment
As per the AWS docs: you can use this option (--force-new-deployment) to trigger a new deployment with no service definition changes. For example, you can update a service's tasks to use a newer Docker image with the same image/tag combination (my_image:latest) or to roll Fargate tasks onto a newer platform version.
My question: if I use --force-new-deployment (as I will use the existing tag and definition), will the underlying ECS container instance automatically be set to the DRAINING state, so that no new task will start on the EXISTING container instance that is supposed to go away during the rolling-update deployment strategy (or deployment controller)?
In other words, is there any chance:
for a new task to be created on the existing/old container instance that is supposed to go away during the rolling update?
Also, what would happen to the ongoing tasks running on this existing/old container instance that is supposed to go away during the rolling update?
Ref: https://docs.aws.amazon.com/cli/latest/reference/ecs/update-service.html
Please note that no container instance is going anywhere with this update-service command. It will only create a new deployment under the ECS service and, when the new tasks become healthy, remove the old task(s).
Edit 1:
What about the requests that were being served by the old tasks?
I am assuming the tasks are behind an Application Load Balancer. In this case, old tasks will be deregistered from the ALB.
Note: In the following discussion, target is the ECS Task.
To give you a brief description of how the deregistration delay works with ECS, the following is the sequence of events when a task connected to an ALB is stopped. This can be due to a scale-in event, deployment of a new task definition, a decrease in the number of tasks, a forced deployment, etc.
ECS sends a DeregisterTargets call and the targets change their status to "draining". New connections will not be routed to these targets.
If the deregistration delay elapses and there are still in-flight requests, the ALB will terminate them and clients will receive 5XX responses originating from the ALB.
The targets are deregistered from the target group.
ECS sends the stop call to the tasks and the ECS agent gracefully stops the containers (SIGTERM).
If the containers are not stopped within the stop timeout period (ECS_CONTAINER_STOP_TIMEOUT, 30s by default), they will be force-stopped (SIGKILL).
As per the ELB documentation [1], if a deregistering target has no in-flight requests and no active connections, Elastic Load Balancing immediately completes the deregistration process without waiting for the deregistration delay to elapse. However, even though target deregistration is complete, the target's status will be displayed as draining, and you can still see the registered target of the old task in the Target Group console with the status draining until the deregistration delay elapses.
ECS is designed to stop the task after the draining process completes, as mentioned in the ECS documentation [2]; hence the ECS service waits for the target group to finish draining before issuing the stop call.
Ref:
[1] Target Groups for Your Application Load Balancers - Deregistration Delay - https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-target-groups.html#deregistration-delay
[2] Updating a Service - https://docs.aws.amazon.com/AmazonECS/latest/developerguide/update-service.html
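For the two timing knobs discussed above, here is a hedged CloudFormation-style sketch (values are examples, the VPC id is a placeholder); the per-container StopTimeout is, to my understanding, the task-definition counterpart of the ECS_CONTAINER_STOP_TIMEOUT agent variable mentioned earlier.
# illustrative sketch only - tune the values to your workload
Resources:
  ServiceTargetGroup:
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      TargetType: ip
      Port: 80
      Protocol: HTTP
      VpcId: vpc-0123456789abcdef0        # placeholder
      TargetGroupAttributes:
        - Key: deregistration_delay.timeout_seconds
          Value: "30"                      # how long draining targets keep serving in-flight requests
  TaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      Family: my-http-service
      ContainerDefinitions:
        - Name: app
          Image: amazon/amazon-ecs-sample
          Memory: 512
          StopTimeout: 30                  # seconds between SIGTERM and SIGKILL for the container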

High availability issue with rethinkdb cluster in kubernetes

I'm setting up a RethinkDB cluster inside Kubernetes, but it doesn't work as expected for my high-availability requirement: when a pod goes down, Kubernetes creates another pod, which runs another container of the same image; the old mounted data (already persisted on the host disk) is lost, and the new pod joins the cluster as a brand-new instance. I'm running k8s on CoreOS v773.1.0 stable.
Please correct me if I'm wrong, but this way it seems impossible to set up a database cluster inside k8s.
Update: as documented here http://kubernetes.io/v1.0/docs/user-guide/pod-states.html#restartpolicy, with RestartPolicy: Always the container will be restarted if it exits with a failure. Does "restart" mean it brings up the same container, or does it create another one? Or is it because I stopped the pod via the command kubectl stop po that it doesn't restart the same container?
That's how Kubernetes works, and other solutions probably work the same way. When a machine dies, the containers on it are rescheduled to run on another machine, and that other machine has no state from the container. Even when it is the same machine, the container on it is created as a new one instead of restarting the exited container (with the data inside it).
To persist data, you need some kind of external storage (NFS, EBS, EFS, ...). In the case of k8s, you may want to look into this: https://github.com/kubernetes/kubernetes/blob/master/docs/design/persistent-storage.md. This GitHub issue also has a lot of information: https://github.com/kubernetes/kubernetes/issues/6893
And indeed, that's the way to achieve HA in my opinion. Containers are all stateless; they don't hold anything inside them. Any configuration they need should be stored outside, using something like Consul or etcd. By separating things like this, it's easier to restart a container.
Try using PetSets http://kubernetes.io/docs/user-guide/petset/
That allows you to name your (pet) pods. If a pod is killed, then it will come back with the same name.
A summary of the PetSet feature is as follows.
Stable hostname
Stable domain name
Multiple pets of a similar type will be named with a "-n" suffix (rethink-0, rethink-1, ... rethink-n, for example)
Persistent volumes
Now apps can cluster/peer together
When a pet pod dies, a new one will be started and will assume all the same "state" (including disk) of the previous one.
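For reference, PetSet was later renamed StatefulSet; a minimal sketch of the same idea with the current API (image tag, storage size and the headless service name are placeholder assumptions):
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: rethink
spec:
  serviceName: rethink            # headless Service providing the stable DNS names
  replicas: 3
  selector:
    matchLabels: { app: rethink }
  template:
    metadata:
      labels: { app: rethink }
    spec:
      containers:
      - name: rethinkdb
        image: rethinkdb:2.4      # placeholder tag
        volumeMounts:
        - name: data
          mountPath: /data
  volumeClaimTemplates:           # each pod (rethink-0, rethink-1, ...) keeps its own PVC across restarts
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi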
