How to attach a storage volume to Elasticsearch nodes in Kubernetes? - elasticsearch

I am setting up Elasticsearch on Kubernetes. I have created an Elasticsearch cluster of 2 nodes and I want to attach storage to both of them, e.g. 80Gi to the first node and 100Gi to the second node.
My Kubernetes cluster is on EC2 and I am using EBS as storage.

In order to attach persistence, you need:
A StorageClass object (defines the storage)
A PersistentVolume object (provisions the storage)
A PersistentVolumeClaim object (attaches the storage)
Each Elasticsearch node (pod) then mounts its claim through the volumes section of the Deployment/Pod object definition, as in the sketch below.
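A minimal sketch of the claim and the mount, assuming dynamic provisioning through a StorageClass like the "elast" one shown further down (claim name, pod name, image tag, and size are illustrative):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: es-data-claim-0
spec:
  accessModes: [ "ReadWriteOnce" ]
  storageClassName: elast
  resources:
    requests:
      storage: 80Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: es-node-0
spec:
  containers:
  - name: elasticsearch
    image: docker.elastic.co/elasticsearch/elasticsearch:7.10.1
    volumeMounts:
    - name: es-data
      mountPath: /usr/share/elasticsearch/data   # default Elasticsearch data path
  volumes:
  - name: es-data
    persistentVolumeClaim:
      claimName: es-data-claim-0                 # binds the pod to the claim above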
An easier way is to deploy the ES cluster using the official Helm chart.

As per the Helm chart documentation:
Automated testing of this chart is currently only run against GKE (Google Kubernetes Engine). If you are using a different Kubernetes provider you will likely need to adjust the storageClassName in the volumeClaimTemplate
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: elast
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
  fsType: ext4
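If you go the Helm route instead, the chart's volumeClaimTemplate value is where that StorageClass gets referenced. A rough sketch of values.yaml, assuming the elastic/elasticsearch chart (adjust to your chart version); note that a single template gives every replica the same size, so 80Gi for one node and 100Gi for the other would likely need two separate node groups or releases:
# values.yaml (illustrative)
volumeClaimTemplate:
  accessModes: [ "ReadWriteOnce" ]
  storageClassName: elast
  resources:
    requests:
      storage: 80Gi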
Hope this helps.

Related

Best practices for data storage with Elasticsearch and Kubernetes

After reading some documentation regarding Persistent Volumes in Kubernetes, I am wondering which would be the best setup (storage-wise) for running a highly available Elasticsearch cluster. I am not running the typical EFK (or ELK) setup; I am using Elasticsearch as a proper full-text search engine.
I've read the official Elastic documentation, but I find it quite lacking in clarity. According to "Kubernetes in Action", Chapter 6:
When an application running in a pod needs to persist data to disk and have that same data available even when the pod is rescheduled to another node, you can’t use any of the volume types we’ve mentioned so far. Because this data needs to be accessible from any cluster node, it must be stored on some type of network-attached storage (NAS).
So if I am not mistaken, I need a Volume and to access it through a PersistentVolume and a PersistentVolumeClaim with a Retain policy.
When looking at the official Volumes documentation, I get the feeling that one should define the Volume type oneself. Yet when looking at a DigitalOcean guide, there does not seem to be any Volume setup at all.
I picked that tutorial, but there are dozens on Medium that all do the same thing.
So: which is the best setup for an Elasticsearch cluster? Of course keeping in mind not to lose any data within an index, and being able to add pods (Kubernetes) or nodes (Elasticsearch) that can access the index.
A good pattern for deploying an Elasticsearch cluster in Kubernetes is to define a StatefulSet.
Because the StatefulSet replicates more than one Pod, you cannot simply reference a single persistent volume claim. Instead, you need to add a persistent volume claim template to the StatefulSet definition.
In order for these replicated persistent volumes to work, you need dynamic volume provisioning and a StorageClass, which allow storage volumes to be created on demand.
In the DigitalOcean guide tutorial, the persistent volume claim template is as follows:
volumeClaimTemplates:
- metadata:
    name: data
    labels:
      app: elasticsearch
  spec:
    accessModes: [ "ReadWriteOnce" ]
    storageClassName: do-block-storage
    resources:
      requests:
        storage: 100Gi
Here, the StorageClass is do-block-storage. You can replace it with your own StorageClass.
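For context, here is a trimmed sketch of where that template sits in a StatefulSet and how the Pod template mounts the resulting claim (image tag, replica count, and service name are illustrative):
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: es-cluster
spec:
  serviceName: elasticsearch
  replicas: 3
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch:7.10.1
        volumeMounts:
        - name: data                              # must match volumeClaimTemplates metadata.name
          mountPath: /usr/share/elasticsearch/data
  volumeClaimTemplates:
  - metadata:
      name: data
      labels:
        app: elasticsearch
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: do-block-storage
      resources:
        requests:
          storage: 100Gi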
Very interesting question.
Think of an Elasticsearch node in Kubernetes as being equivalent to an Elasticsearch Pod.
Kubernetes needs to hold the identity of each Pod so it can attach it to the correct PersistentVolumeClaim in case of an outage; this is where the StatefulSet comes in.
A StatefulSet will ensure the same PersistentVolumeClaim stays bound to the same Pod throughout its lifetime.
A PersistentVolume (PV) is a Kubernetes abstraction for storage on the provided hardware. This can be AWS EBS, DigitalOcean Volumes, etc.
I'd recommend having a look at the official Elasticsearch Helm chart: https://github.com/elastic/helm-charts/tree/master/elasticsearch
Also the Elasticsearch Operator (ECK): https://operatorhub.io/operator/elastic-cloud-eck
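For completeness, a minimal sketch of what an Elasticsearch resource managed by the ECK operator looks like (name, version, storage class, and size here are assumptions; adjust them to your environment):
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 7.17.0
  nodeSets:
  - name: default
    count: 3
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data        # the claim ECK mounts for data
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: gp2
        resources:
          requests:
            storage: 100Gi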

Is it possible to find zone and region of the node my container is running on

I want to find the region and zone of my node; I need this to log monitoring data.
The Kubernetes spec and metadata don't provide this information. I checked out https://github.com/kubernetes/client-go, which looks promising, but I can't find the info I am looking for.
Any suggestion? Thanks
If you are using GKE then the node zone and region should be in the node's labels:
failure-domain.beta.kubernetes.io/region
failure-domain.beta.kubernetes.io/zone
topology.kubernetes.io/region
topology.kubernetes.io/zone
You can see node labels using kubectl get nodes --show-labels
This is what I ended up doing.
Enabled Workload Identity
gcloud container clusters update <CLUSTER_NAME> \
  --workload-pool=<PROJECT_ID>.svc.id.goog
Updated the node pool to use Workload Identity
gcloud container node-pools update <NODEPOOL_NAME> \
  --cluster=<CLUSTER_NAME> \
  --workload-metadata=GKE_METADATA
Created a service account for my app
Bound my KSA to the GSA
gcloud iam service-accounts add-iam-policy-binding \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:<PROJECT_ID>.svc.id.goog[<K8S_NAMESPACE>/<KSA_NAME>]" \
  <GSA_NAME>@<PROJECT_ID>.iam.gserviceaccount.com
Annotated service account using the email address of the GSA
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    iam.gke.io/gcp-service-account: <GSA_NAME>@<PROJECT_ID>.iam.gserviceaccount.com
  name: <KSA_NAME>
  namespace: <K8S_NAMESPACE>
This authenticated my container with the GSA and I was able to get the metadata info using https://cloud.google.com/compute/docs/storing-retrieving-metadata
Not in any direct way. You could use the downward API to expose the node name to the pod, and then fetch the annotations/labels for that node. But that would require fairly broad read permissions (all nodes) so might be a security risk.
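A minimal sketch of that Downward API approach (pod name and image are illustrative); given RBAC permission to read nodes, the application can then look up that node's topology.kubernetes.io/zone and topology.kubernetes.io/region labels via the API server:
apiVersion: v1
kind: Pod
metadata:
  name: zone-logger
spec:
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "echo running on node $NODE_NAME; sleep 3600"]
    env:
    - name: NODE_NAME
      valueFrom:
        fieldRef:
          fieldPath: spec.nodeName      # the node this pod was scheduled onto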

Kubernetes - Apply pod affinity rule to live deployment

I guess I am just asking for confirmation really, as we had some major issues in the past with our Elasticsearch cluster on Kubernetes.
Is it fine to add a pod affinity rule to an already running deployment? This is a live production Elasticsearch cluster and I want to pin the Elasticsearch pods to specific nodes with large storage.
I kind of understand Kubernetes but not really Elasticsearch, so I don't want to cause any production issues/outages, as there is no one around who could really help fix them.
Currently running 6 replicas but want to reduce to 3 that run on 3 worker nodes with plenty of storage.
I have labelled my 3 worker nodes with the label 'priority-elastic-node=true'
This is the pod affinity I will add to my YAML file and apply:
podAffinity:
  preferredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      matchExpressions:
      - key: priority-elastic-node
        operator: In
        values:
        - "true"
    topologyKey: "kubernetes.io/hostname"
What I assume will happen is nothing immediately after I apply it, but then when I start scaling down the Elasticsearch replicas, the remaining pods stay on the preferred worker nodes.
Any change to the pod template will cause the deployment to roll all pods. That includes a change to those fields. So it’s fine to change, but your cluster will be restarted. This should be fine as long as your replication settings are cromulent.
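One side note, offered as a hedged sketch: since priority-elastic-node=true is a label on the worker nodes rather than on pods, node affinity (not pod affinity) is the field that actually matches node labels; the equivalent preferred rule would look roughly like this:
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      preference:
        matchExpressions:
        - key: priority-elastic-node
          operator: In
          values:
          - "true"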

ELK to monitor Kubernetes

I have a Kubernetes cluster running and have created the ELK stack on a different machine.
Now I want to ship the logs from the Kubernetes cluster to ELK. How can I achieve that?
The ELK stack is outside the cluster.
Have you tried Fluentd? It is a logging agent that collects logs and is able to ship them to Elasticsearch.
UPDATE
I just found some examples in the kops repo. You can check here
You can run Filebeat to collect logs from Kubernetes.
Follow the instructions in the documentation at the link.
After you download kubernetes.yaml, change:
- name: ELASTICSEARCH_HOST
  value: [your elastic search domain]
- name: ELASTICSEARCH_PORT
  value: "9200"
- name: ELASTICSEARCH_USERNAME
  value: elastic
- name: ELASTICSEARCH_PASSWORD
  value: changeme
Pay attention! You need admin privileges to create the Filebeat ServiceAccount.
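For reference, those environment variables are substituted into the filebeat.yml ConfigMap shipped with the manifest, roughly like this (a sketch; the exact file depends on your Filebeat version):
output.elasticsearch:
  hosts: ['${ELASTICSEARCH_HOST:elasticsearch}:${ELASTICSEARCH_PORT:9200}']
  username: ${ELASTICSEARCH_USERNAME}
  password: ${ELASTICSEARCH_PASSWORD}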
We can use the EFK stack for Kubernetes logging and monitoring. We need a Kubernetes cluster with the following capabilities:
Ability to run privileged containers.
Helm and Tiller enabled.
StatefulSets and dynamic volume provisioning capability: Elasticsearch is deployed as a StatefulSet on Kubernetes. It’s best to use the latest version of Kubernetes (v1.10 as of this writing).
Please refer to https://platform9.com/blog/kubernetes-logging-and-monitoring-the-elasticsearch-fluentd-and-kibana-efk-stack-part-2-elasticsearch-configuration/ for a step-by-step guide.
You can use logging modules like Winston to ship logs to Elasticsearch with the plugins they provide.
It is very straightforward and easy to set up.
In my Node application I used this
Winston plugin

How can I scale the PVC of a statefulset?

When I try to edit the PVC, Kubernetes gives an error saying:
The StatefulSet "es-data" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden.
I am trying to increase the disk size of elasticsearch which is deployed as a statefulset on AKS.
The error is self-explanatory. You can only update the replicas, template, and updateStrategy parts of a StatefulSet spec. Also, you can't resize a PVC through the StatefulSet. However, from Kubernetes 1.11 you can resize an in-use PVC, but it is still an alpha feature.
Ref: Resizing an in-use PersistentVolumeClaim
Note: Alpha features are not enabled by default and you have to enable them manually when creating the cluster.
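For reference, once volume expansion is available, the resize is done by editing the PVC object itself (not the StatefulSet), provided its StorageClass sets allowVolumeExpansion: true. A minimal sketch (the claim name and storage class are illustrative):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: elasticsearch-data-es-data-0    # hypothetical claim created by the StatefulSet
spec:
  accessModes: [ "ReadWriteOnce" ]
  storageClassName: managed-premium     # must have allowVolumeExpansion: true
  resources:
    requests:
      storage: 200Gi                    # only increases are allowed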
It is possible to expand the PVC of a StatefulSet on AKS by following these four steps:
https://stackoverflow.com/a/71175193/4083568
