How can I scale the PVC of a statefulset? - elasticsearch

When I try to edit the PVC, Kubernetes gives an error saying:
The StatefulSet "es-data" is invalid: spec: Forbidden: updates to
statefulset spec for fields other than 'replicas', 'template', and
'updateStrategy' are forbidden.
I am trying to increase the disk size of Elasticsearch, which is deployed as a StatefulSet on AKS.

The error is self-explanatory: you can only update the replicas, template, and updateStrategy parts of a StatefulSet, and you cannot resize a PVC through it. However, from Kubernetes 1.11 you can resize the PVC itself, though resizing an in-use PVC is still an alpha feature.
Ref: Resizing an in-use PersistentVolumeClaim
Note: Alpha features are not enabled by default and have to be enabled manually when creating the cluster.

It is possible to expand the PVC of a StatefulSet on AKS by following the four steps in this answer:
https://stackoverflow.com/a/71175193/4083568
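In practice those steps amount roughly to growing each PVC and recreating the StatefulSet object without touching its pods. A sketch, assuming the StorageClass supports expansion and using placeholder names (managed-premium, es-data, data-es-data-0, 200Gi):

    # 1. Allow expansion on the StorageClass (once)
    kubectl patch storageclass managed-premium \
      -p '{"allowVolumeExpansion": true}'

    # 2. Grow every PVC owned by the StatefulSet
    kubectl patch pvc data-es-data-0 \
      -p '{"spec":{"resources":{"requests":{"storage":"200Gi"}}}}'

    # 3. Delete the StatefulSet object but leave its pods running
    kubectl delete statefulset es-data --cascade=orphan

    # 4. Re-apply the StatefulSet manifest with the larger size in
    #    volumeClaimTemplates, then restart pods one at a time so the
    #    filesystem resize completes
    kubectl apply -f es-data-statefulset.yaml
    kubectl delete pod es-data-0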

Related

Build a Kubernetes Operator For rolling updates

I have created a Kubernetes application (say deployment D1, using Docker image I1) that will run on client clusters.
Requirement 1:
Now, I want to roll out updates whenever I update my Docker image I1, without any effort from the client side
(somehow, the client cluster should automatically pull the latest Docker image).
Requirement 2:
Whenever I update a particular ConfigMap, the client cluster should automatically start using the new ConfigMap.
How should I achieve this?
Using Kubernetes CronJobs?
Kubernetes Operators?
Or something else?
I heard that a k8s Operator can be useful.
Starting with Requirement 2:
Whenever, I update a particular configMap, the client cluster should
automatically start using the new configMap
If the ConfigMap is mounted as a volume in the deployment, it will get auto-updated; however, if it is injected as environment variables, a restart is the only option unless you use a sidecar solution or restart the process yourself.
For ref: Update configmap without restarting POD
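For illustration only, a minimal sketch of the two injection styles (app-config and the mount path are placeholder names): files from a ConfigMap mounted as a volume are refreshed by the kubelet after an update, while values injected with envFrom are only read at container start.

    apiVersion: v1
    kind: Pod
    metadata:
      name: config-demo
    spec:
      containers:
      - name: app
        image: nginx
        envFrom:                  # snapshot taken at container start; needs a restart to pick up changes
        - configMapRef:
            name: app-config
        volumeMounts:
        - name: config
          mountPath: /etc/app     # files here are refreshed automatically after the ConfigMap is edited
      volumes:
      - name: config
        configMap:
          name: app-config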
How should I achieve this?
Setting imagePullPolicy alone is not a good option as I see it; in that case manual intervention is still required to restart the deployment so that it pulls the latest image on the client side, and it won't happen in a controlled manner.
Using Kubernetes CronJobs?
On which side would the CronJobs run? If client-side, it's fine to do it that way as well.
Otherwise, you can keep a deployment with an exposed API which runs a Job to update the deployment to the latest tag whenever an image gets pushed to your Docker registry, as sketched below.
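As a rough sketch of that Job-based approach (the deployment/container names and registry URL are hypothetical, and the service account running the Job needs RBAC permission to patch deployments), the Job would essentially run:

    # Triggered by a registry webhook: roll the deployment to the freshly pushed tag
    kubectl set image deployment/d1 app=registry.example.com/i1:"$NEW_TAG"
    kubectl rollout status deployment/d1 --timeout=120s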
Kubernetes Operators?
An operator is a good native K8s option that you can write in Go, Python, or your preferred language, with or without the Operator Framework or client libraries.
Or something else?
If you are just looking to update the deployment, go with an API running in the deployment, or a Job you can schedule in a controlled manner. An operator is fine too and would be a more native and sound approach if you can create, manage, and deploy one.
If in the future you need to manage all clusters (deployment, service, firewall, network) of multiple clients from a single source of truth, you can explore Anthos.
Config management from Git repo sync with Anthos
You can build a Kubernetes operator to watch your particular ConfigMap and trigger a restart of the workloads that use it. As for the rolling updates, you can configure the deployment according to your requirement. A Deployment's rollout is triggered if and only if the Deployment's Pod template (that is, .spec.template) is changed, for example if the labels or container images of the template are updated. Add the rolling update strategy to your Kubernetes Deployment's .spec section:
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 3         # maximum number of pods created beyond the desired count during the update
    maxUnavailable: 1   # maximum number of pods that may be unavailable during the update
(The timeoutSeconds, intervalSeconds, and updatePeriodSeconds parameters belong to OpenShift's DeploymentConfig rollingParams rather than to a vanilla Kubernetes Deployment.)
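A widely used way to make ConfigMap changes trigger such a rollout (not from the answer above, just a common pattern, e.g. what many Helm charts do) is to stamp a hash of the config into the pod template annotations, so every config change alters .spec.template:

    spec:
      template:
        metadata:
          annotations:
            # recompute and re-apply whenever the ConfigMap content changes
            checksum/config: "<sha256 of the rendered ConfigMap>"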

Best practices for data storage with Elasticsearch and Kubernetes

After reading some documentation regarding Persistent Volumes in Kubernetes I am wondering which one would be the best setup (storage speaking) for running a highly available ElasticSearch cluster. I am not running the typical EFK (or ELK) setup, but I am using ElasticSearch as a proper full-text search engine.
I've read the official Elastic documentation, but I find it quite lacking in clarity. According to "Kubernetes in Action", Chapter 6:
When an application running in a pod needs to persist data to disk and
have that same data available even when the pod is rescheduled to
another node, you can’t use any of the volume types we’ve mentioned so
far. Because this data needs to be accessible from any cluster node,
it must be stored on some type of network-attached storage (NAS).
So if I am not mistaken, I need a volume and I should access it through a PersistentVolume and a PersistentVolumeClaim with a Retain policy.
Looking at the official volumes documentation, I get the feeling that one should define the volume type oneself. Yet, looking at a DigitalOcean guide, there does not seem to be any volume setup there.
I picked that tutorial, but there are dozens on Medium that are all doing the same thing.
So: which one is the best setup for an Elasticsearch cluster? Of course keeping in mind not to lose any data within an index, and being able to add pods (Kubernetes) or nodes (Elasticsearch) that can access the index.
A good pattern to deploy an Elasticsearch cluster in Kubernetes is to define a StatefulSet.
Because the StatefulSet replicates more than one Pod, you cannot simply reference a single persistent volume claim. Instead, you need to add a persistent volume claim template to the StatefulSet definition.
In order for these replicated persistent volumes to work, you need dynamic volume provisioning through a StorageClass, which allows storage volumes to be created on demand.
In the DigitalOcean guide tutorial, the persistent volume claim template is as follows:
volumeClaimTemplates:
- metadata:
    name: data
    labels:
      app: elasticsearch
  spec:
    accessModes: [ "ReadWriteOnce" ]
    storageClassName: do-block-storage
    resources:
      requests:
        storage: 100Gi
Here, the StorageClass is do-block-storage. You can replace it with your own StorageClass.
Very interesting question,
You need to think of an Elasticsearch node in Kubernetes as being equivalent to an Elasticsearch pod.
Kubernetes needs to keep the identity of each pod so it can reattach it to the correct PersistentVolumeClaim in case of an outage; this is where the StatefulSet comes in.
A StatefulSet will ensure the same PersistentVolumeClaim stays bound to the same Pod throughout its lifetime.
A PersistentVolume (PV) is a Kubernetes abstraction for storage on the provided hardware. This can be AWS EBS, DigitalOcean Volumes, etc.
I'd recommend having a look at the official Elasticsearch Helm chart: https://github.com/elastic/helm-charts/tree/master/elasticsearch
Also Elasticsearch Operator: https://operatorhub.io/operator/elastic-cloud-eck
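If you go the Helm route, the install is roughly the following sketch (the value keys follow the chart's values.yaml and may differ between chart versions; the storage class name is whatever your provider offers):

    helm repo add elastic https://helm.elastic.co
    helm repo update
    helm install elasticsearch elastic/elasticsearch \
      --set volumeClaimTemplate.storageClassName=do-block-storage \
      --set volumeClaimTemplate.resources.requests.storage=100Gi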

Is it possible to find zone and region of the node my container is running on

I want to find the region and zone of my node; I need this to log monitoring data.
The Kubernetes spec and metadata don't provide this information. I checked out
https://github.com/kubernetes/client-go which looks promising but I can't find
the info I am looking for.
Any suggestion? Thanks
If you are using GKE then node zone and region should be in node's labels:
failure-domain.beta.kubernetes.io/region
failure-domain.beta.kubernetes.io/zone
topology.kubernetes.io/region
topology.kubernetes.io/zone
You can see node labels using kubectl get nodes --show-labels
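For example, to read the zone label of a specific node (the node name is a placeholder; the dots in the label key have to be escaped in the jsonpath):

    kubectl get node <NODE_NAME> \
      -o jsonpath='{.metadata.labels.topology\.kubernetes\.io/zone}'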
This is what I ended up doing.
Enabled Workload Identity
gcloud container clusters update <CLUSTER_NAME> \
--workload-pool=<PROJECT_ID>.svc.id.goog
Updated Node pool to use workload identity
gcloud container node-pools update <NODEPOOL_NAME> \
  --cluster=<CLUSTER_NAME> \
  --workload-metadata=GKE_METADATA
Created a service account for my app
Bound my KSA to the GSA
gcloud iam service-accounts add-iam-policy-binding \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:<PROJECT_ID>.svc.id.goog[<K8S_NAMESPACE>/<KSA_NAME>]" \
  <GSA_NAME>@<PROJECT_ID>.iam.gserviceaccount.com
Annotated the Kubernetes service account with the email address of the GSA
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    iam.gke.io/gcp-service-account: <GSA_NAME>@<PROJECT_ID>.iam.gserviceaccount.com
  name: <KSA_NAME>
  namespace: <K8S_NAMESPACE>
This authenticated my container with the GSA, and I was able to get the metadata info using https://cloud.google.com/compute/docs/storing-retrieving-metadata
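With that in place, the pod can ask the metadata server directly, for example for the instance zone (the region is the zone minus its last suffix):

    curl -s -H "Metadata-Flavor: Google" \
      "http://metadata.google.internal/computeMetadata/v1/instance/zone"
    # returns e.g. projects/<PROJECT_NUMBER>/zones/us-central1-a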
Not in any direct way. You could use the downward API to expose the node name to the pod, and then fetch the annotations/labels for that node. But that would require fairly broad read permissions (all nodes) so might be a security risk.
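A minimal sketch of that downward API part (the pod then has to call the Kubernetes API, GET /api/v1/nodes/$NODE_NAME, to read the topology labels, which is where the broad node read permission comes in):

    # in the container spec of the pod:
    env:
    - name: NODE_NAME
      valueFrom:
        fieldRef:
          fieldPath: spec.nodeName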

How to attach storage volume with elasticsearch nodes in kubernetes?

I am setting up Elasticsearch on Kubernetes. I have created an Elasticsearch cluster of 2 nodes, and I want to attach storage to both of these nodes: 80Gi to the first node and 100Gi to the second node.
My Kubernetes cluster is on EC2 and I am using EBS as storage.
In order to attach persistence, you need:
A StorageClass Object (Define the Storage)
A PersistentVolume Object (Provision the Storage)
A PersistentVolumeClaim Object (Attach the storage)
You can then attach a claim to each Elasticsearch node (pod) through the deployment or pod object definition.
An easier way is to deploy the ES cluster using the Helm chart.
As per helm chart documentation:
Automated testing of this chart is currently only run against GKE (Google Kubernetes Engine). If you are using a different Kubernetes provider you will likely need to adjust the storageClassName in the volumeClaimTemplate
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: elast
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
  fsType: ext4
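Regarding the two different sizes (80Gi and 100Gi): a volumeClaimTemplate gives every replica the same size, so one option, as a sketch, is to pre-create the PVCs with the names the StatefulSet expects (<template-name>-<statefulset-name>-<ordinal>; data and es-cluster are placeholder names here) so each Elasticsearch pod gets its own size:

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: data-es-cluster-0
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: elast
      resources:
        requests:
          storage: 80Gi
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: data-es-cluster-1
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: elast
      resources:
        requests:
          storage: 100Gi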
Hope this helps.

Expand size PVC of statefulset on k8s 1.9

I have a StatefulSet of Kafka. I need to expand the disk size; I tried without success to use the automatic resize feature of k8s 1.9.
Here : https://kubernetes.io/docs/concepts/storage/persistent-volumes/#expanding-persistent-volumes-claims
I did activate the feature gates and admission plugin, and I think it works because I can successfully change the size of the PVC.
But nothing happened after I modified the size of the PVC from 50Gi to 250Gi.
The capacity changed everywhere in the PVC, but not on AWS: the EBS volume is still 50 GB, and a df -h in the pod still shows 50 GB.
Did I miss something? Do I have to manually resize on AWS?
Thank you
That is an alpha feature which has some problems and limitations.
Try to find some information in the GitHub issues related to your problem:
Support automatic resizing of volumes
[pvresize]Display of pvc capacity did not make corresponding changes when pv resized
Also, check that comment, it can be useful:
#discordianfish please try EBS PVC resize with 1.10. Currently the user experience of resizing volumes with file systems is not ideal. You will have to edit the pvc and then wait for FileSystemResizePending condition to appear on PVC and then delete and recreate the pod that was using the PVC. If there was no pod using the PVC, then once condition FileSystemResizePending appears on PVC then you will have to start a pod using it for file system resize to finish.
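In kubectl terms that workflow looks roughly like this (claim and pod names are placeholders):

    # grow the claim
    kubectl patch pvc data-kafka-0 \
      -p '{"spec":{"resources":{"requests":{"storage":"250Gi"}}}}'

    # wait until the claim reports the FileSystemResizePending condition
    kubectl get pvc data-kafka-0 -o jsonpath='{.status.conditions}'

    # delete the pod using the claim; the StatefulSet recreates it and the
    # filesystem resize finishes when the volume is mounted again
    kubectl delete pod kafka-0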
I made the feature work, but in a very very dirty way.
Modify the size of the PVC
Modify the size of the EBS manually
Force unmount the volume on AWS
The pod crashes and is rescheduled by the StatefulSet; when the pod is up again, the volume and partition have the correct size.
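For reference, the manual AWS side of that workaround would look roughly like the following (the volume ID is a placeholder; on recent Kubernetes versions plain PVC expansion makes this unnecessary):

    # grow the EBS volume backing the PV
    aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --size 250

    # force-detach it so the pod crashes and gets rescheduled with the new size
    aws ec2 detach-volume --volume-id vol-0123456789abcdef0 --force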
