Elasticsearch CrashLoopBackOff when deploying with ECK on Kubernetes (OKD 4.11)

I am running Kubernetes using OKD 4.11 (on vSphere) and have validated the basic functionality (including dynamic volume provisioning) using applications such as nginx.
I also applied
oc adm policy add-scc-to-group anyuid system:authenticated
to allow authenticated users to use anyuid (which seems to have been required to deploy the nginx example I was testing with).
Then I installed ECK using this quickstart with kubectl to install the CRD and RBAC manifests. This seems to have worked.
Then I deployed the most basic ElasticSearch quickstart example with kubectl apply -f quickstart.yaml using this manifest:
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 8.4.2
  nodeSets:
  - name: default
    count: 1
    config:
      node.store.allow_mmap: false
The deployment proceeds as expected, pulling the image and starting the container, but it ends in a CrashLoopBackOff. Elasticsearch logs the following error at the end of the log:
"elasticsearch.cluster.name":"quickstart",
"error.type":"java.lang.IllegalStateException",
"error.message":"failed to obtain node locks, tried [/usr/share/elasticsearch/data]; maybe these locations are not writable or multiple nodes were started on the same data path?"
Looking into the storage, the PV and PVC are created successfully, the output of kubectl get pv,pvc,sc -A -n my-namespace is:
NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                                     STORAGECLASS   REASON   AGE
persistentvolume/pvc-9d7b57db-8afd-40f7-8b3d-6334bdc07241   1Gi        RWO            Delete           Bound    my-namespace/elasticsearch-data-quickstart-es-default-0   thin                    41m

NAMESPACE      NAME                                                               STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
my-namespace   persistentvolumeclaim/elasticsearch-data-quickstart-es-default-0   Bound    pvc-9d7b57db-8afd-40f7-8b3d-6334bdc07241   1Gi        RWO            thin           41m

NAMESPACE   NAME                                         PROVISIONER                    RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
            storageclass.storage.k8s.io/thin (default)   kubernetes.io/vsphere-volume   Delete          Immediate              false                  19d
            storageclass.storage.k8s.io/thin-csi         csi.vsphere.vmware.com         Delete          WaitForFirstConsumer   true                   19d
Looking at the pod YAML, the volume appears to be attached correctly:
volumes:
- name: elasticsearch-data
  persistentVolumeClaim:
    claimName: elasticsearch-data-quickstart-es-default-0
- name: downward-api
  downwardAPI:
    items:
    - path: labels
      fieldRef:
        apiVersion: v1
        fieldPath: metadata.labels
    defaultMode: 420
....
volumeMounts:
...
- name: elasticsearch-data
  mountPath: /usr/share/elasticsearch/data
I cannot understand why the volume would be read-only or rather why ES cannot create the lock.
I did find this similar issue, but I am not sure how to apply the UID permissions when working with ECK. In general I am fairly naive about how permissions work in OKD.
Does anyone with deeper K8s/OKD or ECK/Elasticsearch knowledge have an idea how to better isolate and/or resolve this issue?
Update: I believe this has something to do with this issue, and I am researching the options related to OKD.
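One way to narrow down a "not writable" lock error is to check, from the mounted path, which uid the process runs as, who owns the directory, and whether a file can actually be created there. The sketch below does this with the Python standard library only. It assumes a Python interpreter is available in a container that mounts the same volume; the Elasticsearch image itself may not include one. `probe_data_path` is a hypothetical helper, not part of ECK or Elasticsearch, and it tests only ordinary file creation, not Elasticsearch's own locking.

```python
import os
import sys
import tempfile


def probe_data_path(path: str) -> dict:
    """Report ownership of `path` and whether this process can create files in it."""
    st = os.stat(path)
    report = {
        "process_uid": os.getuid(),
        "process_gid": os.getgid(),
        "owner_uid": st.st_uid,
        "owner_gid": st.st_gid,
        "mode": oct(st.st_mode & 0o777),
        "access_w": os.access(path, os.W_OK),
        "can_create": False,
    }
    try:
        # Create a throwaway file in the directory; it is deleted when the block exits.
        with tempfile.NamedTemporaryFile(dir=path):
            report["can_create"] = True
    except OSError as exc:
        report["error"] = str(exc)
    return report


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "/usr/share/elasticsearch/data"
    print(probe_data_path(target))
```

If `owner_uid` differs from `process_uid` and `can_create` is false, the volume was never chowned for the user Elasticsearch runs as.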

For posterity: ECK starts an init container that should take care of the chown on the data volume, but it can only do so if it is running as root.
The resolution for me was documented here:
https://repo1.dso.mil/dsop/elastic/elasticsearch/elasticsearch/-/issues/7
The manifest now looks like this:
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 8.4.2
  nodeSets:
  - name: default
    count: 1
    config:
      node.store.allow_mmap: false
    # run init container as root to chown the volume to uid 1000
    podTemplate:
      spec:
        securityContext:
          runAsUser: 1000
          runAsGroup: 0
        initContainers:
        - name: elastic-internal-init-filesystem
          securityContext:
            runAsUser: 0
            runAsGroup: 0
And the pod starts up and can write to the volume as uid 1000.
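If you generate manifests programmatically, you can build the same resource as a plain dictionary and emit it as JSON, which `kubectl apply -f -` accepts on standard input. This minimal sketch mirrors the YAML above field for field; the function name and its defaults are illustrative. It encodes the split the fix relies on: the pod-level security context (Elasticsearch runs as uid 1000) and the override for ECK's `elastic-internal-init-filesystem` init container (root, so it can chown the volume).

```python
import json


def quickstart_manifest(name: str = "quickstart", version: str = "8.4.2", count: int = 1) -> dict:
    """Build the Elasticsearch resource from the answer as a plain dict."""
    return {
        "apiVersion": "elasticsearch.k8s.elastic.co/v1",
        "kind": "Elasticsearch",
        "metadata": {"name": name},
        "spec": {
            "version": version,
            "nodeSets": [
                {
                    "name": "default",
                    "count": count,
                    "config": {"node.store.allow_mmap": False},
                    "podTemplate": {
                        "spec": {
                            # Elasticsearch itself runs as uid 1000, gid 0.
                            "securityContext": {"runAsUser": 1000, "runAsGroup": 0},
                            "initContainers": [
                                {
                                    # Same name as ECK's init container, as in the answer,
                                    # so that it runs as root and can chown the volume.
                                    "name": "elastic-internal-init-filesystem",
                                    "securityContext": {"runAsUser": 0, "runAsGroup": 0},
                                }
                            ],
                        }
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    # Example: python quickstart.py | kubectl apply -f -
    print(json.dumps(quickstart_manifest(), indent=2))
```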

Related

Statefulsets in K8S are being recreated after deletion

I'm experimenting with the Elasticsearch operator for Kubernetes and created two StatefulSets (see https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-orchestration.html):
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 7.12.1
  nodeSets:
  - name: master-nodes
    count: 3
    config:
      node.roles: ["master"]
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
        storageClassName: standard
  - name: data-nodes
    count: 3
    config:
      node.roles: ["data"]
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
        storageClassName: standard
The problem is that I cannot delete the stateful sets. After deletion, they're recreated automatically:
my-PC:~$ kubectl get sts
NAME                         READY   AGE
quickstart-es-data-nodes     0/0     14m
quickstart-es-master-nodes   0/0     18m
my-PC:~$ kubectl delete sts quickstart-es-data-nodes --force --grace-period=0
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
statefulset.apps "quickstart-es-data-nodes" force deleted
my-PC:~$ kubectl get sts
NAME                         READY   AGE
quickstart-es-data-nodes     0/3     3s
quickstart-es-master-nodes   0/0     18m
Before deletion, I had already scaled the StatefulSet down to 0 to ensure that all pods were terminated. But after deletion, the StatefulSet is recreated (see quickstart-es-data-nodes).
Does anyone know how I can delete the StatefulSets without them being recreated?
This is caused by the operator you are using for Elasticsearch. The operator manages the StatefulSets and will recreate them if you delete them.
Behind the scenes, ECK translates each NodeSet specified in the
Elasticsearch resource into a StatefulSet in Kubernetes.
See the documentation: https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-orchestration.html#k8s-statefulsets
https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#on-delete
You have to delete the custom object. The operator owns those StatefulSets and will continually update them to match its expected content.
I finally got the answer... I need to run the following command for deletion:
kubectl delete elasticsearch quickstart
This finally removed the quickstart examples.
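The behavior in this thread follows from the operator pattern. ECK keeps deriving the StatefulSets it wants from the Elasticsearch resource and recreates any that are missing, so the StatefulSets only go away when that resource is deleted. The toy model below illustrates this. It is a simplified simulation with hypothetical names, not ECK's actual reconciliation code; the `<cluster>-es-<nodeset>` naming comes from the `kubectl get sts` output above.

```python
from dataclasses import dataclass, field
from typing import Optional


def desired_statefulsets(es: Optional[dict]) -> set:
    """StatefulSet names the operator wants: one per NodeSet, <cluster>-es-<nodeset>."""
    if es is None:
        return set()
    cluster = es["metadata"]["name"]
    return {f"{cluster}-es-{ns['name']}" for ns in es["spec"]["nodeSets"]}


@dataclass
class ToyCluster:
    """Toy stand-in for the API server: an optional Elasticsearch resource plus StatefulSets."""
    elasticsearch: Optional[dict]
    statefulsets: set = field(default_factory=set)


def reconcile(cluster: ToyCluster) -> None:
    """One pass of a simplified control loop that converges on the desired state."""
    desired = desired_statefulsets(cluster.elasticsearch)
    # Anything deleted by hand (kubectl delete sts ...) is simply created again.
    cluster.statefulsets |= desired
    # Removing the owning resource is what lets the StatefulSets go away;
    # this intersection stands in for that cleanup in the toy model.
    cluster.statefulsets &= desired


if __name__ == "__main__":
    es = {"metadata": {"name": "quickstart"},
          "spec": {"nodeSets": [{"name": "master-nodes"}, {"name": "data-nodes"}]}}
    c = ToyCluster(es)
    reconcile(c)
    c.statefulsets.discard("quickstart-es-data-nodes")  # kubectl delete sts ...
    reconcile(c)
    print(sorted(c.statefulsets))  # ['quickstart-es-data-nodes', 'quickstart-es-master-nodes']
    c.elasticsearch = None  # kubectl delete elasticsearch quickstart
    reconcile(c)
    print(sorted(c.statefulsets))  # []
```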

How to resize an ECK cluster

I have an elasticsearch cluster that has the storage field set to 10Gi, I want to resize this cluster (for testing purposes to 15Gi). However, after changing the storage value from 10Gi to 15Gi I can see that the cluster still did not resize and the generated PVC is still set to 10Gi.
From what I can tell, the aws-ebs provisioner (https://kubernetes.io/docs/concepts/storage/storage-classes/) allows volume expansion when the field allowVolumeExpansion is true. But even with that set, the volume is never expanded when I change the storage value:
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: elasticsearch-storage
  namespace: test
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
reclaimPolicy: Delete
allowVolumeExpansion: true
---
apiVersion: elasticsearch.k8s.elastic.co/v1beta1
kind: Elasticsearch
metadata:
  name: elasticsearch
  namespace: test
spec:
  version: 7.4.2
  http:
    tls:
      certificate:
        secretName: es-cert
  nodeSets:
  - name: default
    count: 3
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
        annotations:
          volume.beta.kubernetes.io/storage-class: elasticsearch-storage
      spec:
        accessModes:
        - ReadWriteOnce
        storageClassName: elasticsearch-storage
        resources:
          requests:
            storage: 15Gi
    config:
      node.master: true
      node.data: true
      node.ingest: true
      node.store.allow_mmap: false
      xpack.security.authc.realms:
        native:
          native1:
            order: 1
---
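Two preconditions from the question can be checked mechanically. The StorageClass must set `allowVolumeExpansion: true`, and the new request must be larger than the current one, because Kubernetes does not shrink PVCs. The sketch below checks only those two points for a single claim. `expansion_problems` and its small quantity parser are hypothetical helpers that handle plain integers with common suffixes. Passing these checks does not mean the cloud provider or the workload controller will actually perform the resize.

```python
import re

# Suffixes accepted by this sketch: decimal (k, M, G, T) and binary (Ki, Mi, Gi, Ti).
_FACTORS = {"": 1, "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
            "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def to_bytes(quantity: str) -> int:
    """Parse an integer quantity such as '15Gi' into bytes."""
    m = re.fullmatch(r"(\d+)([A-Za-z]*)", quantity.strip())
    if m is None or m.group(2) not in _FACTORS:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    return int(m.group(1)) * _FACTORS[m.group(2)]


def expansion_problems(storage_class: dict, current: str, requested: str) -> list:
    """Reasons a claim of size `current` would not grow to `requested`."""
    problems = []
    if storage_class.get("allowVolumeExpansion") is not True:
        problems.append(
            f"StorageClass {storage_class['metadata']['name']} "
            "does not set allowVolumeExpansion: true")
    if to_bytes(requested) <= to_bytes(current):
        problems.append(f"{requested} is not larger than {current}")
    return problems


if __name__ == "__main__":
    sc = {"metadata": {"name": "elasticsearch-storage"}, "allowVolumeExpansion": True}
    print(expansion_problems(sc, "10Gi", "15Gi"))  # []
```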
Technically it should work, but your Kubernetes cluster might not be able to connect to the AWS API to expand the volume. Did you check the actual EBS volume in the EC2 console or with the AWS CLI? You can debug this issue by looking at the kube-controller-manager and cloud-controller-manager logs.
My guess is that some kind of permission issue prevents your K8s cluster from talking to the AWS/EC2 API.
If you are running EKS, make sure that the IAM cluster role that you are using has permissions for EC2/EBS. You can check the control plane logs (kube-controller-manager, kube-apiserver, cloud-controller-manager, etc) on CloudWatch.
EDIT:
The Elasticsearch operator uses StatefulSets and as of this date Volume expansion is not supported on StatefulSets.

Encrypt the elasticsearch data in k8s

I have installed the Elasticsearch image in k8s on a PV that is created using ceph-rook DFS.
When installing ceph-rook, the encryption mode was enabled.
#pvc for elasticsearch pod
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: core-pv-claim
  namespace: test
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: rook-cephfs
#volume mount for elasticsearch pod
volumeMounts:
- name: persistent-storage
  mountPath: /usr/local/elastic/data
volumes:
- name: persistent-storage
  persistentVolumeClaim:
    claimName: core-pv-claim
The pod was deployed successfully, and the data is being saved in "/usr/local/elastic/data".
When I logged into the pod and changed to that path, I could see the data at rest in "/usr/local/elastic/data" without any encryption:
#kubectl exec -it elastic-pod12 bash
#ls /usr/local/elastic/data
#your data
Is there a way to encrypt this data as well, or to restrict users from accessing it via kubectl?

Digital Ocean managed Kubernetes volume in pending state

This isn't really DigitalOcean-specific; it would be really nice to verify whether this is expected behavior or not.
I'm trying to set up an Elasticsearch cluster on a DO managed Kubernetes cluster with the Helm chart from Elastic itself.
Elastic says that I need to specify a storageClassName in a volumeClaimTemplate in order to use a volume provided by the managed Kubernetes service. According to DO's docs, for DO that is do-block-storage. It also seems unnecessary to define a PVC myself, since the Helm chart should create it.
Here's the config I'm using:
# Specify node pool
nodeSelector:
  doks.digitalocean.com/node-pool: elasticsearch
# Shrink default JVM heap.
esJavaOpts: "-Xmx128m -Xms128m"
# Allocate smaller chunks of memory per pod.
resources:
  requests:
    cpu: "100m"
    memory: "512M"
  limits:
    cpu: "1000m"
    memory: "512M"
# Specify Digital Ocean storage
# Request smaller persistent volumes.
volumeClaimTemplate:
  accessModes: [ "ReadWriteOnce" ]
  storageClassName: do-block-storage
  resources:
    requests:
      storage: 10Gi
extraInitContainers: |
  - name: create
    image: busybox:1.28
    command: ['mkdir', '/usr/share/elasticsearch/data/nodes/']
    volumeMounts:
    - mountPath: /usr/share/elasticsearch/data
      name: elasticsearch-master
  - name: file-permissions
    image: busybox:1.28
    command: ['chown', '-R', '1000:1000', '/usr/share/elasticsearch/']
    volumeMounts:
    - mountPath: /usr/share/elasticsearch/data
      name: elasticsearch-master
I'm deploying the Helm chart with Terraform, but it shouldn't matter which way you do it:
resource "helm_release" "elasticsearch" {
  name      = "elasticsearch"
  chart     = "elastic/elasticsearch"
  namespace = "elasticsearch"
  values = [
    file("charts/elasticsearch.yaml")
  ]
}
Here's what I've got when checking pod logs:
51s Normal Provisioning persistentvolumeclaim/elasticsearch-master-elasticsearch-master-2 External provisioner is provisioning volume for claim "elasticsearch/elasticsearch-master-elasticsearch-master-2"
2m28s Normal ExternalProvisioning persistentvolumeclaim/elasticsearch-master-elasticsearch-master-2 waiting for a volume to be created, either by external provisioner "dobs.csi.digitalocean.com" or manually created by system administrator
I'm pretty sure the problem is the volume. It should have been provisioned automatically by Kubernetes. Describing the PersistentVolumeClaim gives this:
holms#debian ~/D/c/s/b/t/s/post-infra> kubectl describe pvc elasticsearch-master-elasticsearch-master-0 --namespace elasticsearch
Name:          elasticsearch-master-elasticsearch-master-0
Namespace:     elasticsearch
StorageClass:  do-block-storage
Status:        Pending
Volume:
Labels:        app=elasticsearch-master
Annotations:   volume.beta.kubernetes.io/storage-provisioner: dobs.csi.digitalocean.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Mounted By:    elasticsearch-master-0
Events:
  Type    Reason                Age                    From                                                                              Message
  ----    ------                ----                   ----                                                                              -------
  Normal  Provisioning          4m57s (x176 over 14h)  dobs.csi.digitalocean.com_master-setupad-eu_04e43747-fafb-11e9-b7dd-e6fd8fbff586  External provisioner is provisioning volume for claim "elasticsearch/elasticsearch-master-elasticsearch-master-0"
  Normal  ExternalProvisioning  93s (x441 over 111m)   persistentvolume-controller                                                       waiting for a volume to be created, either by external provisioner "dobs.csi.digitalocean.com" or manually created by system administrator
I've googled everything already. Everything seems to be correct, and the volume should come up on the DO side without problems, but it hangs in the Pending state. Is this expected behavior, or should I ask DO support to check what's going on on their side?
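To list every claim stuck in this state across all namespaces, you can filter the JSON output of `kubectl get pvc -A -o json` on `status.phase`. This is a minimal sketch that assumes `kubectl` is on the PATH and already points at the cluster. `pending_claims` and `fetch_pvcs` are hypothetical helpers.

```python
import json
import subprocess


def pending_claims(pvc_list: dict) -> list:
    """Return (namespace, name, storageClassName) for every PVC whose phase is Pending."""
    result = []
    for item in pvc_list.get("items", []):
        if item.get("status", {}).get("phase") == "Pending":
            meta = item["metadata"]
            result.append((meta.get("namespace"), meta["name"],
                           item.get("spec", {}).get("storageClassName")))
    return result


def fetch_pvcs() -> dict:
    # Runs: kubectl get pvc -A -o json
    out = subprocess.run(["kubectl", "get", "pvc", "-A", "-o", "json"],
                         check=True, capture_output=True, text=True).stdout
    return json.loads(out)


if __name__ == "__main__":
    for ns, name, sc in pending_claims(fetch_pvcs()):
        print(f"{ns}/{name} (storageClass={sc})")
```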
Yes, this is expected behavior. This chart might not be compatible with Digital Ocean Kubernetes service.
Digital Ocean documentation has the following information in Known Issues section:
Support for resizing DigitalOcean Block Storage Volumes in Kubernetes has not yet been implemented.
In the DigitalOcean Control Panel, cluster resources (worker nodes, load balancers, and block storage volumes) are listed outside of the Kubernetes page. If you rename or otherwise modify these resources in the control panel, you may render them unusable to the cluster or cause the reconciler to provision replacement resources. To avoid this, manage your cluster resources exclusively with kubectl or from the control panel’s Kubernetes page.
The charts/stable/elasticsearch chart lists specific prerequisites:
- Kubernetes 1.10+
- PV dynamic provisioning support on the underlying infrastructure
You can ask Digital Ocean support for help or try to deploy ElasticSearch without helm chart.
It is even mentioned on github that:
Automated testing of this chart is currently only run against GKE (Google Kubernetes Engine).
Update:
The same issue is present on my kubeadm ha cluster.
However, I managed to get it working by manually creating PersistentVolumes for my StorageClass.
My storageclass definition: storageclass.yaml:
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: ssd
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: pd-ssd
$ kubectl apply -f storageclass.yaml
$ kubectl get sc
NAME   PROVISIONER   AGE
ssd    local         50m
My PersistentVolume definition: pv.yaml:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: task-pv-volume
  labels:
    type: local
spec:
  storageClassName: ssd
  capacity:
    storage: 30Gi
  accessModes:
  - ReadWriteOnce
  hostPath:
    path: "/mnt/data"
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - <name of the node>
kubectl apply -f pv.yaml
After that, I installed the Helm chart:
helm install stable/elasticsearch --name my-release --set data.persistence.storageClass=ssd,data.storage=30Gi --set data.persistence.storageClass=ssd,master.storage=30Gi
PVC finally got bound.
$ kubectl get pvc -A
NAMESPACE   NAME                                     STATUS    VOLUME            CAPACITY   ACCESS MODES   STORAGECLASS   AGE
default     data-my-release-elasticsearch-data-0     Bound     task-pv-volume2   30Gi       RWO            ssd            17m
default     data-my-release-elasticsearch-master-0   Pending                                                              17m
Note that I manually satisfied only a single PVC, and provisioning volumes for Elasticsearch by hand might be very inefficient.
I suggest contacting DO support for automated volume provisioning solution.
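Each Elasticsearch pod claims its own volume, so satisfying the remaining claims this way means creating one PersistentVolume per claim. The sketch below generates such hostPath volumes, mirroring `pv.yaml` above, and pins each one to a different node so that two volumes never point at the same `/mnt/data` directory on one host. The node names, the `task-pv-volume-<n>` naming scheme and the defaults are assumptions for illustration. The output is a `v1` `List`, which `kubectl apply -f -` accepts.

```python
import json


def local_pv(name: str, node: str, size: str = "30Gi",
             storage_class: str = "ssd", path: str = "/mnt/data") -> dict:
    """A hostPath PersistentVolume pinned to a single node, mirroring pv.yaml."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name, "labels": {"type": "local"}},
        "spec": {
            "storageClassName": storage_class,
            "capacity": {"storage": size},
            "accessModes": ["ReadWriteOnce"],
            "hostPath": {"path": path},
            "nodeAffinity": {"required": {"nodeSelectorTerms": [{
                "matchExpressions": [{
                    "key": "kubernetes.io/hostname",
                    "operator": "In",
                    "values": [node],
                }],
            }]}},
        },
    }


def pv_list(nodes: list) -> dict:
    """One PV per node, wrapped in a v1 List so a single apply creates all of them."""
    items = [local_pv(f"task-pv-volume-{i}", node) for i, node in enumerate(nodes)]
    return {"apiVersion": "v1", "kind": "List", "items": items}


if __name__ == "__main__":
    # Hypothetical node names; use the names shown by `kubectl get nodes`.
    print(json.dumps(pv_list(["node-a", "node-b"]), indent=2))
```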
What a strange situation: after I changed 10Gi to 10G, it started to work. Maybe it has something to do with the storage class itself, but it works now.
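The two spellings are not the same size. `Gi` is a binary suffix (2^30 bytes) and `G` is a decimal one (10^9 bytes), so `10Gi` is about 7% larger than `10G`. Why the provisioner behaved differently for the two values is not established in this thread; the snippet only shows the arithmetic. `quantity_bytes` is a hypothetical helper that handles just these two suffixes.

```python
# Kubernetes quantity suffixes used in this thread: binary "Gi" and decimal "G".
SUFFIX_BYTES = {"Gi": 2**30, "G": 10**9}


def quantity_bytes(quantity: str) -> int:
    """Convert '<integer>Gi' or '<integer>G' to bytes; other suffixes are out of scope."""
    for suffix, factor in SUFFIX_BYTES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    raise ValueError(f"unsupported quantity: {quantity!r}")


if __name__ == "__main__":
    gi, g = quantity_bytes("10Gi"), quantity_bytes("10G")
    print(gi, g, gi - g)  # 10737418240 10000000000 737418240
```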

Elasticsearch deployment on kubernetes using Persistent Volume

I am trying to deploy an Elasticsearch cluster (replicas: 3) using a StatefulSet in Kubernetes, and I need to store the Elasticsearch data in a PersistentVolume (PV). Since each Elasticsearch instance has its own data folder, I need a separate data folder for each replica in the PV. I am trying to use volumeClaimTemplates with mountPath: /usr/share/elasticsearch/data, but this results in the error "pod has unbound immediate PersistentVolumeClaims" in the second pod. How can I achieve this using a StatefulSet?
Thanks in advance.
You haven't said how you are installing Elasticsearch. As examples, please follow:
- this tutorial
- helm-charts
As per documentation for StatefulSet - limitations:
The storage for a given Pod must either be provisioned by a PersistentVolume Provisioner based on the requested storage class, or pre-provisioned by an admin.
This matches your case: the problem is dynamic storage provisioning.
Please verify the storage class, whether the PV and PVC were created and bound together, and the storage class in volumeClaimTemplates:
volumeMounts:
- name: "elasticsearch-master"
  mountPath: /usr/share/elasticsearch/data
volumeClaimTemplates:
- metadata:
    name: elasticsearch-master
  spec:
    accessModes:
    - ReadWriteOnce
    storageClassName: name # check whether you are relying on the default storage class; otherwise, specify this parameter explicitly
    resources:
      requests:
        storage: 30Gi
Hope this helps.
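A StatefulSet creates one claim per replica from each `volumeClaimTemplates` entry. Each claim is named `<template name>-<pod name>`, and each pod is named `<statefulset name>-<ordinal>`; that is why the claims elsewhere on this page look like `elasticsearch-master-elasticsearch-master-0` or `elasticsearch-data-quickstart-es-default-0`. Every one of those claims must be satisfied, either by the provisioner or by its own pre-created PV, so a second pod can report unbound claims when only one volume exists. A sketch of the naming, using a hypothetical helper:

```python
def claim_names(template: str, statefulset: str, replicas: int) -> list:
    """PVC names a StatefulSet creates from one volumeClaimTemplate.

    Pod i is named <statefulset>-<i>; its claim is <template>-<pod name>,
    so each replica gets a separate claim and therefore a separate volume.
    """
    return [f"{template}-{statefulset}-{i}" for i in range(replicas)]


if __name__ == "__main__":
    for name in claim_names("elasticsearch-master", "elasticsearch-master", 3):
        print(name)
```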
If you are using dynamic provisioning, the volume is created automatically in the backend (for example, in Azure a disk backs PVs for ReadWriteOnce access); otherwise, you need to create it manually.
Once you have created the volume, create a PVC in the appropriate namespace whose size matches the PV, and pass the volume name in the PVC definition. The PVC will then be bound automatically.
You can try something like this -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: claimName
  namespace: namespace
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: default
  volumeName: pv-volumeName
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 1Gi
Please share if you still face issues.
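When a pre-created PV does not bind, compare the two objects field by field. The sketch below encodes a simplified subset of the matching rules: named volume, storage class, access modes, capacity and an existing reservation. It deliberately ignores label selectors, volume modes and default StorageClass assignment, so an empty result is a hint rather than a guarantee. `binding_problems` is a hypothetical helper, and it understands only `Mi`, `Gi` and `Ti` suffixes.

```python
# Binary suffixes only; enough for the sizes used in this thread.
UNITS = {"Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def _size(quantity: str) -> int:
    for suffix, factor in UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain byte count


def binding_problems(pv: dict, pvc: dict) -> list:
    """Simplified subset of the checks that decide whether `pvc` can bind to `pv`."""
    pv_spec, pvc_spec = pv["spec"], pvc["spec"]
    problems = []
    if pvc_spec.get("volumeName") not in (None, pv["metadata"]["name"]):
        problems.append("volumeName points at a different PV")
    if pv_spec.get("storageClassName", "") != pvc_spec.get("storageClassName", ""):
        problems.append("storageClassName differs")
    if not set(pvc_spec.get("accessModes", [])) <= set(pv_spec.get("accessModes", [])):
        problems.append("PV does not offer every requested access mode")
    if _size(pv_spec["capacity"]["storage"]) < _size(pvc_spec["resources"]["requests"]["storage"]):
        problems.append("PV capacity is smaller than the request")
    claim_ref = pv_spec.get("claimRef")
    if claim_ref and claim_ref.get("name") != pvc["metadata"]["name"]:
        problems.append("PV is already reserved for another claim")
    return problems
```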
