Kubernetes Helm Elastic Stack CrashLoopBackOff with Java Errors in Log - elasticsearch

I'm trying to deploy the ELK stack to my development Kubernetes cluster. It seems that I do everything as described in the tutorials; however, the pods keep failing with Java errors (see below). I will describe the whole process, from installing the cluster up to the point where the error happens.
Step 1: Installing the cluster
# Apply sysctl params without reboot
cat <<EOF | sudo tee /etc/modules-load.d/containerd.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
# Setup required sysctl params, these persist across reboots.
cat <<EOF | sudo tee /etc/sysctl.d/99-kubernetes-cri.conf
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
sudo sysctl --system
#update and install apt https stuff
sudo apt-get update
sudo apt-get install apt-transport-https ca-certificates curl gnupg lsb-release
# add docker repo for containerd and install it
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo \
"deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install -y containerd.io
# copy config
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml
sudo systemctl restart containerd
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
br_netfilter
EOF
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1 # somewhat redundant
net.bridge.bridge-nf-call-iptables = 1 # somewhat redundant
EOF
sudo sysctl --system
#install kubernetes binaries
sudo curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/apt/doc/apt-key.gpg
echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl
#disable swap and comment swap in fstab
sudo swapoff -v /dev/mapper/main-swap
sudo nano /etc/fstab
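# Alternatively (a non-interactive sketch; the exact swap entry in your fstab may differ),
# comment out the swap line with sed instead of editing it in nano; a backup goes to /etc/fstab.bak
sudo sed -i.bak '/\sswap\s/s/^/#/' /etc/fstab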
#init cluster
sudo kubeadm init --pod-network-cidr=192.168.0.0/16
#make user to kubectl admin
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
#install calico
kubectl create -f https://docs.projectcalico.org/manifests/tigera-operator.yaml
kubectl create -f https://docs.projectcalico.org/manifests/custom-resources.yaml
#untaint master node that pods can run on it
kubectl taint nodes --all node-role.kubernetes.io/master-
#install helm
curl https://baltocdn.com/helm/signing.asc | sudo apt-key add -
sudo apt-get install apt-transport-https --yes
echo "deb https://baltocdn.com/helm/stable/debian/ all main" | sudo tee /etc/apt/sources.list.d/helm-stable-debian.list
sudo apt-get update
sudo apt-get install helm
Step 2: Install ECK (https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-install-helm.html) and elasticsearch (https://github.com/elastic/helm-charts/blob/master/elasticsearch/README.md#installing)
# add helm repo
helm repo add elastic https://helm.elastic.co
helm repo update
# install eck
#### omitted as suggested in the comment section: helm install elastic-operator elastic/eck-operator -n elastic-system --create-namespace
helm install elasticsearch elastic/elasticsearch
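Note: since a customized values.yaml is shown further down, the chart can also be installed with that file instead of the chart defaults (a sketch; the file name values.yaml is assumed):
helm install elasticsearch elastic/elasticsearch -f values.yaml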
Step 3: Add PersistentVolume
---
apiVersion: v1
kind: PersistentVolume
metadata:
name: elk-data1
labels:
type: local
spec:
capacity:
storage: 30Gi
accessModes:
- ReadWriteOnce
hostPath:
path: "/mnt/data1"
---
apiVersion: v1
kind: PersistentVolume
metadata:
name: elk-data2
labels:
type: local
spec:
capacity:
storage: 30Gi
accessModes:
- ReadWriteOnce
hostPath:
path: "/mnt/data2"
---
apiVersion: v1
kind: PersistentVolume
metadata:
name: elk-data3
labels:
type: local
spec:
capacity:
storage: 30Gi
accessModes:
- ReadWriteOnce
hostPath:
path: "/mnt/data3"
Apply it:
sudo mkdir /mnt/data1
sudo mkdir /mnt/data2
sudo mkdir /mnt/data3
kubectl apply -f storage.yaml
Now the pods (or at least one) should run. But I keep getting STATUS CrashLoopBackOff with Java errors in the log.
kubectl get pv,pvc,pods
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
persistentvolume/elk-data1 30Gi RWO Retain Bound default/elasticsearch-master-elasticsearch-master-1 140m
persistentvolume/elk-data2 30Gi RWO Retain Bound default/elasticsearch-master-elasticsearch-master-2 140m
persistentvolume/elk-data3 30Gi RWO Retain Bound default/elasticsearch-master-elasticsearch-master-0 140m
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
persistentvolumeclaim/elasticsearch-master-elasticsearch-master-0 Bound elk-data3 30Gi RWO 141m
persistentvolumeclaim/elasticsearch-master-elasticsearch-master-1 Bound elk-data1 30Gi RWO 141m
persistentvolumeclaim/elasticsearch-master-elasticsearch-master-2 Bound elk-data2 30Gi RWO 141m
NAME READY STATUS RESTARTS AGE
pod/elasticsearch-master-0 0/1 CrashLoopBackOff 32 141m
pod/elasticsearch-master-1 0/1 Pending 0 141m
pod/elasticsearch-master-2 0/1 Pending 0 141m
Logs and Error:
kubectl logs pod/elasticsearch-master-2
Exception in thread "main" java.lang.InternalError: java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.platform.Metrics.systemMetrics(Metrics.java:65)
at java.base/jdk.internal.platform.Container.metrics(Container.java:43)
at jdk.management/com.sun.management.internal.OperatingSystemImpl.<init>(OperatingSystemImpl.java:48)
at jdk.management/com.sun.management.internal.PlatformMBeanProviderImpl.getOperatingSystemMXBean(PlatformMBeanProviderImpl.java:279)
at jdk.management/com.sun.management.internal.PlatformMBeanProviderImpl$3.nameToMBeanMap(PlatformMBeanProviderImpl.java:198)
at java.management/java.lang.management.ManagementFactory.lambda$getPlatformMBeanServer$0(ManagementFactory.java:487)
at java.base/java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:273)
at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
at java.base/java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1766)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
at java.management/java.lang.management.ManagementFactory.getPlatformMBeanServer(ManagementFactory.java:488)
at org.apache.logging.log4j.core.jmx.Server.reregisterMBeansAfterReconfigure(Server.java:140)
at org.apache.logging.log4j.core.LoggerContext.setConfiguration(LoggerContext.java:558)
at org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:263)
at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:207)
at org.apache.logging.log4j.core.config.Configurator.initialize(Configurator.java:220)
at org.apache.logging.log4j.core.config.Configurator.initialize(Configurator.java:197)
at org.elasticsearch.common.logging.LogConfigurator.configureStatusLogger(LogConfigurator.java:248)
at org.elasticsearch.common.logging.LogConfigurator.configureWithoutConfig(LogConfigurator.java:95)
at org.elasticsearch.cli.CommandLoggingConfigurator.configureLoggingWithoutConfig(CommandLoggingConfigurator.java:29)
at org.elasticsearch.cli.Command.main(Command.java:76)
at org.elasticsearch.common.settings.KeyStoreCli.main(KeyStoreCli.java:32)
Caused by: java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:78)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:567)
at java.base/jdk.internal.platform.Metrics.systemMetrics(Metrics.java:61)
... 26 more
Caused by: java.lang.ExceptionInInitializerError
at java.base/jdk.internal.platform.CgroupSubsystemFactory.create(CgroupSubsystemFactory.java:107)
at java.base/jdk.internal.platform.CgroupMetrics.getInstance(CgroupMetrics.java:167)
... 31 more
Caused by: java.lang.NullPointerException
at java.base/java.util.Objects.requireNonNull(Objects.java:208)
at java.base/sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:260)
at java.base/java.nio.file.Path.of(Path.java:147)
at java.base/java.nio.file.Paths.get(Paths.java:69)
at java.base/jdk.internal.platform.CgroupUtil.lambda$readStringValue$1(CgroupUtil.java:66)
at java.base/java.security.AccessController.doPrivileged(AccessController.java:554)
at java.base/jdk.internal.platform.CgroupUtil.readStringValue(CgroupUtil.java:68)
at java.base/jdk.internal.platform.CgroupSubsystemController.getStringValue(CgroupSubsystemController.java:65)
at java.base/jdk.internal.platform.CgroupSubsystemController.getLongValue(CgroupSubsystemController.java:124)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.getLongValue(CgroupV1Subsystem.java:272)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.getHierarchical(CgroupV1Subsystem.java:218)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.setPath(CgroupV1Subsystem.java:201)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.setSubSystemControllerPath(CgroupV1Subsystem.java:173)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.lambda$initSubSystem$5(CgroupV1Subsystem.java:113)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.initSubSystem(CgroupV1Subsystem.java:113)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.<clinit>(CgroupV1Subsystem.java:47)
... 33 more
Exception in thread "main" java.lang.InternalError: java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.platform.Metrics.systemMetrics(Metrics.java:65)
at java.base/jdk.internal.platform.Container.metrics(Container.java:43)
at jdk.management/com.sun.management.internal.OperatingSystemImpl.<init>(OperatingSystemImpl.java:48)
at jdk.management/com.sun.management.internal.PlatformMBeanProviderImpl.getOperatingSystemMXBean(PlatformMBeanProviderImpl.java:279)
at jdk.management/com.sun.management.internal.PlatformMBeanProviderImpl$3.nameToMBeanMap(PlatformMBeanProviderImpl.java:198)
at java.management/sun.management.spi.PlatformMBeanProvider$PlatformComponent.getMBeans(PlatformMBeanProvider.java:195)
at java.management/java.lang.management.ManagementFactory.getPlatformMXBean(ManagementFactory.java:686)
at java.management/java.lang.management.ManagementFactory.getOperatingSystemMXBean(ManagementFactory.java:388)
at org.elasticsearch.tools.launchers.DefaultSystemMemoryInfo.<init>(DefaultSystemMemoryInfo.java:28)
at org.elasticsearch.tools.launchers.JvmOptionsParser.jvmOptions(JvmOptionsParser.java:125)
at org.elasticsearch.tools.launchers.JvmOptionsParser.main(JvmOptionsParser.java:86)
Caused by: java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:78)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:567)
at java.base/jdk.internal.platform.Metrics.systemMetrics(Metrics.java:61)
... 10 more
Caused by: java.lang.ExceptionInInitializerError
at java.base/jdk.internal.platform.CgroupSubsystemFactory.create(CgroupSubsystemFactory.java:107)
at java.base/jdk.internal.platform.CgroupMetrics.getInstance(CgroupMetrics.java:167)
... 15 more
Caused by: java.lang.NullPointerException
at java.base/java.util.Objects.requireNonNull(Objects.java:208)
at java.base/sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:260)
at java.base/java.nio.file.Path.of(Path.java:147)
at java.base/java.nio.file.Paths.get(Paths.java:69)
at java.base/jdk.internal.platform.CgroupUtil.lambda$readStringValue$1(CgroupUtil.java:66)
at java.base/java.security.AccessController.doPrivileged(AccessController.java:554)
at java.base/jdk.internal.platform.CgroupUtil.readStringValue(CgroupUtil.java:68)
at java.base/jdk.internal.platform.CgroupSubsystemController.getStringValue(CgroupSubsystemController.java:65)
at java.base/jdk.internal.platform.CgroupSubsystemController.getLongValue(CgroupSubsystemController.java:124)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.getLongValue(CgroupV1Subsystem.java:272)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.getHierarchical(CgroupV1Subsystem.java:218)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.setPath(CgroupV1Subsystem.java:201)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.setSubSystemControllerPath(CgroupV1Subsystem.java:173)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.lambda$initSubSystem$5(CgroupV1Subsystem.java:113)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.initSubSystem(CgroupV1Subsystem.java:113)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.<clinit>(CgroupV1Subsystem.java:47)
... 17 more
values.yaml from helm chart
---
clusterName: "elasticsearch"
nodeGroup: "master"
# The service that non master groups will try to connect to when joining the cluster
# This should be set to clusterName + "-" + nodeGroup for your master group
masterService: ""
# Elasticsearch roles that will be applied to this nodeGroup
# These will be set as environment variables. E.g. node.master=true
roles:
master: "true"
ingest: "true"
data: "true"
remote_cluster_client: "true"
ml: "true"
replicas: 3
minimumMasterNodes: 2
esMajorVersion: ""
# Allows you to add any config files in /usr/share/elasticsearch/config/
# such as elasticsearch.yml and log4j2.properties
esConfig: {}
# elasticsearch.yml: |
# key:
# nestedkey: value
# log4j2.properties: |
# key = value
# Extra environment variables to append to this nodeGroup
# This will be appended to the current 'env:' key. You can use any of the kubernetes env
# syntax here
extraEnvs: []
# - name: MY_ENVIRONMENT_VAR
# value: the_value_goes_here
# Allows you to load environment variables from kubernetes secret or config map
envFrom: []
# - secretRef:
# name: env-secret
# - configMapRef:
# name: config-map
# A list of secrets and their paths to mount inside the pod
# This is useful for mounting certificates for security and for mounting
# the X-Pack license
secretMounts: []
# - name: elastic-certificates
# secretName: elastic-certificates
# path: /usr/share/elasticsearch/config/certs
# defaultMode: 0755
hostAliases: []
#- ip: "127.0.0.1"
# hostnames:
# - "foo.local"
# - "bar.local"
image: "docker.elastic.co/elasticsearch/elasticsearch"
imageTag: "7.12.1"
imagePullPolicy: "IfNotPresent"
podAnnotations: {}
# iam.amazonaws.com/role: es-cluster
# additionals labels
labels: {}
esJavaOpts: "-Xmx1g -Xms1g"
resources:
requests:
cpu: "1000m"
memory: "2Gi"
limits:
cpu: "1000m"
memory: "2Gi"
initResources: {}
# limits:
# cpu: "25m"
# # memory: "128Mi"
# requests:
# cpu: "25m"
# memory: "128Mi"
sidecarResources: {}
# limits:
# cpu: "25m"
# # memory: "128Mi"
# requests:
# cpu: "25m"
# memory: "128Mi"
networkHost: "0.0.0.0"
volumeClaimTemplate:
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 30Gi
rbac:
create: false
serviceAccountAnnotations: {}
serviceAccountName: ""
podSecurityPolicy:
create: false
name: ""
spec:
privileged: true
fsGroup:
rule: RunAsAny
runAsUser:
rule: RunAsAny
seLinux:
rule: RunAsAny
supplementalGroups:
rule: RunAsAny
volumes:
- secret
- configMap
- persistentVolumeClaim
- emptyDir
persistence:
enabled: true
labels:
# Add default labels for the volumeClaimTemplate of the StatefulSet
enabled: false
annotations: {}
extraVolumes: []
# - name: extras
# emptyDir: {}
extraVolumeMounts: []
# - name: extras
# mountPath: /usr/share/extras
# readOnly: true
extraContainers: []
# - name: do-something
# image: busybox
# command: ['do', 'something']
extraInitContainers: []
# - name: do-something
# image: busybox
# command: ['do', 'something']
# This is the PriorityClass settings as defined in
# https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass
priorityClassName: ""
# By default this will make sure two pods don't end up on the same node
# Changing this to a region would allow you to spread pods across regions
antiAffinityTopologyKey: "kubernetes.io/hostname"
# Hard means that by default pods will only be scheduled if there are enough nodes for them
# and that they will never end up on the same node. Setting this to soft will do this "best effort"
antiAffinity: "hard"
# This is the node affinity settings as defined in
# https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#node-affinity-beta-feature
nodeAffinity: {}
# The default is to deploy all pods serially. By setting this to parallel all pods are started at
# the same time when bootstrapping the cluster
podManagementPolicy: "Parallel"
# The environment variables injected by service links are not used, but can lead to slow Elasticsearch boot times when
# there are many services in the current namespace.
# If you experience slow pod startups you probably want to set this to `false`.
enableServiceLinks: true
protocol: http
httpPort: 9200
transportPort: 9300
service:
labels: {}
labelsHeadless: {}
type: ClusterIP
nodePort: ""
annotations: {}
httpPortName: http
transportPortName: transport
loadBalancerIP: ""
loadBalancerSourceRanges: []
externalTrafficPolicy: ""
updateStrategy: RollingUpdate
# This is the max unavailable setting for the pod disruption budget
# The default value of 1 will make sure that kubernetes won't allow more than 1
# of your pods to be unavailable during maintenance
maxUnavailable: 1
podSecurityContext:
fsGroup: 1000
runAsUser: 1000
securityContext:
capabilities:
drop:
- ALL
# readOnlyRootFilesystem: true
runAsNonRoot: true
runAsUser: 1000
# How long to wait for elasticsearch to stop gracefully
terminationGracePeriod: 120
sysctlVmMaxMapCount: 262144
readinessProbe:
failureThreshold: 3
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 3
timeoutSeconds: 5
# https://www.elastic.co/guide/en/elasticsearch/reference/7.12/cluster-health.html#request-params wait_for_status
clusterHealthCheckParams: "wait_for_status=green&timeout=1s"
## Use an alternate scheduler.
## ref: https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/
##
schedulerName: ""
imagePullSecrets: []
nodeSelector: {}
tolerations: []
# Enabling this will publicly expose your Elasticsearch instance.
# Only enable this if you have security enabled on your cluster
ingress:
enabled: false
annotations: {}
# kubernetes.io/ingress.class: nginx
# kubernetes.io/tls-acme: "true"
hosts:
- host: chart-example.local
paths:
- path: /
tls: []
# - secretName: chart-example-tls
# hosts:
# - chart-example.local
nameOverride: ""
fullnameOverride: ""
# https://github.com/elastic/helm-charts/issues/63
masterTerminationFix: false
lifecycle: {}
# preStop:
# exec:
# command: ["/bin/sh", "-c", "echo Hello from the postStart handler > /usr/share/message"]
# postStart:
# exec:
# command:
# - bash
# - -c
# - |
# #!/bin/bash
# # Add a template to adjust number of shards/replicas
# TEMPLATE_NAME=my_template
# INDEX_PATTERN="logstash-*"
# SHARD_COUNT=8
# REPLICA_COUNT=1
# ES_URL=http://localhost:9200
# while [[ "$(curl -s -o /dev/null -w '%{http_code}\n' $ES_URL)" != "200" ]]; do sleep 1; done
# curl -XPUT "$ES_URL/_template/$TEMPLATE_NAME" -H 'Content-Type: application/json' -d'{"index_patterns":['\""$INDEX_PATTERN"\"'],"settings":{"number_of_shards":'$SHARD_COUNT',"number_of_replicas":'$REPLICA_COUNT'}}'
sysctlInitContainer:
enabled: true
keystore: []
networkPolicy:
## Enable creation of NetworkPolicy resources. Only Ingress traffic is filtered for now.
## In order for a Pod to access Elasticsearch, it needs to have the following label:
## {{ template "uname" . }}-client: "true"
## Example for default configuration to access HTTP port:
## elasticsearch-master-http-client: "true"
## Example for default configuration to access transport port:
## elasticsearch-master-transport-client: "true"
http:
enabled: false
## if explicitNamespacesSelector is not set or set to {}, only client Pods being in the networkPolicy's namespace
## and matching all criteria can reach the DB.
## But sometimes, we want the Pods to be accessible to clients from other namespaces, in this case, we can use this
## parameter to select these namespaces
##
# explicitNamespacesSelector:
# # Accept from namespaces with all those different rules (only from whitelisted Pods)
# matchLabels:
# role: frontend
# matchExpressions:
# - {key: role, operator: In, values: [frontend]}
## Additional NetworkPolicy Ingress "from" rules to set. Note that all rules are OR-ed.
##
# additionalRules:
# - podSelector:
# matchLabels:
# role: frontend
# - podSelector:
# matchExpressions:
# - key: role
# operator: In
# values:
# - frontend
transport:
## Note that all Elasticsearch Pods can talk to themselves using the transport port even if enabled.
enabled: false
# explicitNamespacesSelector:
# matchLabels:
# role: frontend
# matchExpressions:
# - {key: role, operator: In, values: [frontend]}
# additionalRules:
# - podSelector:
# matchLabels:
# role: frontend
# - podSelector:
# matchExpressions:
# - key: role
# operator: In
# values:
# - frontend
# Deprecated
# please use the above podSecurityContext.fsGroup instead
fsGroup: ""

What you are experiencing is not an issue related to Elasticsearch. It is a problem resulting from the cgroup configuration for the version of containerd you are using. I haven't unpacked the specifics, but the exception in the Elasticsearch logs relates to the JDK failing when attempting to retrieve the required cgroup information.
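As a quick check for whether you are hitting the same thing, the following commands (a diagnostic sketch, run on the affected node) show the containerd version, whether the host is on cgroup v1 or v2, and which cgroup driver containerd is configured with:
containerd --version
stat -fc %T /sys/fs/cgroup/   # "cgroup2fs" means cgroup v2, "tmpfs" means cgroup v1
grep SystemdCgroup /etc/containerd/config.toml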
I had the same issue and resolved it by executing the following steps, before installing Kubernetes, to install a later version of containerd and configure it to use cgroups with systemd:
Add the GPG key for the official Docker repository.
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
Add the Docker repository to APT sources.
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu focal stable"
Install the latest containerd.io package instead of the containerd package from Ubuntu.
sudo apt-get -y install containerd.io
Generate the default containerd configuration.
containerd config default | sudo tee /etc/containerd/config.toml
Configure containerd to use systemd to manage the cgroups by setting SystemdCgroup = true in /etc/containerd/config.toml:
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"
runtime_engine = ""
runtime_root = ""
privileged_without_host_devices = false
base_runtime_spec = ""
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true
Restart the containerd service.
sudo systemctl restart containerd
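After the restart it is worth verifying that the change took effect, and that the kubelet uses the matching cgroup driver (a sketch; kubeadm defaults the kubelet to the systemd driver from v1.22 on, older setups may need it set explicitly in the kubelet configuration):
sudo containerd config dump | grep SystemdCgroup   # should print: SystemdCgroup = true
sudo systemctl status containerd --no-pager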

For the ELK stack to work, you need all three PersistentVolumeClaims to be bound, as I recall. Instead of creating one 30 GB PV, create three of the same size so the claims can bind, then re-install. The other nodes have unmet dependencies.
Also, please do not handle the volumes by hand. There are guidelines for deploying dynamic volumes; use OpenEBS, for example. That way you won't need to worry about the PVCs. After providing the PVs, if anything still goes wrong, write again with your cluster installation process. A sketch of the dynamic approach follows below.
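As an illustration of the dynamic-provisioning approach, a local hostPath provisioner can replace hand-made PVs. This is only a sketch and assumes the OpenEBS Local PV hostpath provisioner is installed in the cluster:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-hostpath
  annotations:
    openebs.io/cas-type: local
    cas.openebs.io/config: |
      - name: StorageType
        value: hostpath
provisioner: openebs.io/local          # OpenEBS Local PV provisioner
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
With that in place, the chart's volumeClaimTemplate can simply set storageClassName: openebs-hostpath instead of relying on pre-created hostPath PVs.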
I was obviously wrong: in this particular problem, filesystems and cgroups play a role, and the main cause is an old problem, affecting versions from 5.2.1 to 8.0.0.
Reinstall the chart by pulling the chart, edit the values file, and definitely change the container version. It should then either be fine or produce a different error log stack.

I got it solved by going down to the 7.11.x version.
My Kubernetes version is v1.13.x.
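For reference, downgrading the image with this chart only requires overriding imageTag (a sketch; 7.11.2 is just an example 7.11.x tag):
helm upgrade --install elasticsearch elastic/elasticsearch -f values.yaml --set imageTag=7.11.2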

Related

Nifi cluster does not spin up using NifiKop

I have followed this link https://orange-opensource.github.io/nifikop/docs/next/2_setup/1_getting_started to install nifikop and spin up the cluster; however, the cluster doesn't seem to spin up.
Below is the series of commands I executed.
Create namespace:
kubectl create namespace nifi
kubectl create namespace zookeeper
kubectl create namespace cert-manager
Create custom Storageclass
cat <<EOF | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: local-storage
parameters:
type: pd-standard
provisioner: kubernetes.io/gce-pd
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
EOF
Create service account
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
name: nifikop
EOF
Install zookeeper
helm install nifikop-zk bitnami/zookeeper \
--namespace=nifi \
--set resources.requests.memory=256Mi \
--set resources.requests.cpu=250m \
--set resources.limits.memory=256Mi \
--set resources.limits.cpu=250m \
--set networkPolicy.enabled=true \
--set replicaCount=3 \
--set namespaces={"nifi"}
Install CRDS
kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.1.0/cert-manager.yaml
kubectl apply -f https://raw.githubusercontent.com/Orange-OpenSource/nifikop/master/config/crd/bases/nifi.orange.com_nificlusters.yaml
kubectl apply -f https://raw.githubusercontent.com/Orange-OpenSource/nifikop/master/config/crd/bases/nifi.orange.com_nifiusers.yaml
kubectl apply -f https://raw.githubusercontent.com/Orange-OpenSource/nifikop/master/config/crd/bases/nifi.orange.com_nifiusergroups.yaml
kubectl apply -f https://raw.githubusercontent.com/Orange-OpenSource/nifikop/master/config/crd/bases/nifi.orange.com_nifidataflows.yaml
kubectl apply -f https://raw.githubusercontent.com/Orange-OpenSource/nifikop/master/config/crd/bases/nifi.orange.com_nifiparametercontexts.yaml
kubectl apply -f https://raw.githubusercontent.com/Orange-OpenSource/nifikop/master/config/crd/bases/nifi.orange.com_nifiregistryclients.yaml
Install NifiKOP
helm install nifikop orange-incubator/nifikop \
--namespace=nifi \
--version="0.4.2-alpha" \
--set resources.requests.memory=256Mi \
--set resources.requests.cpu=250m \
--set resources.limits.memory=256Mi \
--set resources.limits.cpu=250m \
--set namespaces={"nifi"} --skip-crds
Create cluster
cat <<EOF | kubectl create -n nifi -f -
apiVersion: nifi.konpyutaika.com/v1
kind: NifiCluster
metadata:
name: simplenifi
spec:
service:
headlessEnabled: true
zkAddress: "nifikop-zk-zookeeper:2181"
zkPath: /simplenifi
clusterImage: "apache/nifi:1.17.0"
oneNifiNodePerNode: false
nodeConfigGroups:
default_group:
isNode: true
serviceAccountName: nifikop
storageConfigs:
- mountPath: "/opt/nifi/nifi-current/logs"
name: logs
pvcSpec:
accessModes:
- ReadWriteOnce
storageClassName: "local-storage"
resources:
requests:
storage: 10Gi
resourcesRequirements:
limits:
cpu: "2"
memory: 3Gi
requests:
cpu: "1"
memory: 1Gi
nodes:
- id: 1
nodeConfigGroup: "default_group"
- id: 2
nodeConfigGroup: "default_group"
propagateLabels: true
nifiClusterTaskSpec:
retryDurationMinutes: 10
listenersConfig:
internalListeners:
- containerPort: 8080
type: http
name: http
- containerPort: 6007
type: cluster
name: cluster
- containerPort: 10000
type: s2s
name: s2s
- containerPort: 9090
type: prometheus
name: prometheus
- containerPort: 6342
type: load-balance
name: load-balance
EOF
But I can see only these pods under the nifi namespace:
root@bh-gsn-57-asca-dev-01:~/nifikop# k get pods -n nifi
NAME READY STATUS RESTARTS AGE
nifikop-5d6f94854-fjx4q 1/1 Running 0 32m
nifikop-zk-zookeeper-0 1/1 Running 0 39m
nifikop-zk-zookeeper-1 1/1 Running 0 39m
nifikop-zk-zookeeper-2 1/1 Running 0 39m
root@bh-gsn-57-asca-dev-01:~/nifikop# k get NifiCluster -n nifi
NAME AGE
simplenifi 19m
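A first step when the NifiCluster pods never appear is to check the custom resource's status and the operator's logs (a troubleshooting sketch; the nifikop deployment name is taken from the pod listing above):
kubectl describe nificluster simplenifi -n nifi
kubectl logs deployment/nifikop -n nifi --tail=100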

Expose Elastic Search externally with istio ingress controller

My Kubernetes cluster is an Azure Kubernetes Service one with Istio as the ingress controller. I have a working Elasticsearch instance that I'm having trouble exposing.
I'm deploying the Elasticsearch instance with Elastic's Helm chart for version 7.17 and the following values.yml:
---
clusterName: "elasticsearch"
nodeGroup: "master"
# The service that non master groups will try to connect to when joining the cluster
# This should be set to clusterName + "-" + nodeGroup for your master group
masterService: ""
# Elasticsearch roles that will be applied to this nodeGroup
# These will be set as environment variables. E.g. node.master=true
roles:
master: "true"
ingest: "true"
data: "true"
remote_cluster_client: "true"
ml: "true"
replicas: 1
minimumMasterNodes: 1
esMajorVersion: ""
clusterDeprecationIndexing: "false"
# Allows you to add any config files in /usr/share/elasticsearch/config/
# such as elasticsearch.yml and log4j2.properties
esConfig:
elasticsearch.yml: |
xpack.security.enabled: false
esJvmOptions: {}
# processors.options: |
# -XX:ActiveProcessorCount=3
# Allows you to load environment variables from kubernetes secret or config map
envFrom: []
# - secretRef:
# name: env-secret
# - configMapRef:
# name: config-map
hostAliases: []
#- ip: "127.0.0.1"
# hostnames:
# - "foo.local"
# - "bar.local"
image: "docker.elastic.co/elasticsearch/elasticsearch"
imageTag: "7.17.3"
imagePullPolicy: "IfNotPresent"
podAnnotations:
{}
# iam.amazonaws.com/role: es-cluster
# additionals labels
labels: {}
esJavaOpts: "" # example: "-Xmx1g -Xms1g"
resources:
requests:
cpu: "2500m"
memory: "2Gi"
limits:
cpu: "2500m"
memory: "5Gi"
initResources:
{}
# limits:
# cpu: "25m"
# # memory: "128Mi"
# requests:
# cpu: "25m"
# memory: "128Mi"
networkHost: "0.0.0.0"
volumeClaimTemplate:
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 5Gi
rbac:
create: false
serviceAccountAnnotations: {}
serviceAccountName: ""
automountToken: true
podSecurityPolicy:
create: false
name: ""
spec:
privileged: true
fsGroup:
rule: RunAsAny
runAsUser:
rule: RunAsAny
seLinux:
rule: RunAsAny
supplementalGroups:
rule: RunAsAny
volumes:
- secret
- configMap
- persistentVolumeClaim
- emptyDir
persistence:
enabled: true
labels:
# Add default labels for the volumeClaimTemplate of the StatefulSet
enabled: false
annotations: {}
extraVolumes:
[]
# - name: extras
# emptyDir: {}
extraVolumeMounts:
[]
# - name: extras
# mountPath: /usr/share/extras
# readOnly: true
extraContainers:
[]
# - name: do-something
# image: busybox
# command: ['do', 'something']
extraInitContainers:
[]
# - name: do-something
# image: busybox
# command: ['do', 'something']
# This is the PriorityClass settings as defined in
# https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass
priorityClassName: ""
# By default this will make sure two pods don't end up on the same node
# Changing this to a region would allow you to spread pods across regions
antiAffinityTopologyKey: "kubernetes.io/hostname"
# Hard means that by default pods will only be scheduled if there are enough nodes for them
# and that they will never end up on the same node. Setting this to soft will do this "best effort"
antiAffinity: "hard"
# This is the node affinity settings as defined in
# https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#node-affinity-beta-feature
nodeAffinity: {}
# The default is to deploy all pods serially. By setting this to parallel all pods are started at
# the same time when bootstrapping the cluster
podManagementPolicy: "Parallel"
# The environment variables injected by service links are not used, but can lead to slow Elasticsearch boot times when
# there are many services in the current namespace.
# If you experience slow pod startups you probably want to set this to `false`.
enableServiceLinks: true
protocol: http
httpPort: 9200
transportPort: 9300
service:
enabled: true
labels: {}
labelsHeadless: {}
type: ClusterIP
# Consider that all endpoints are considered "ready" even if the Pods themselves are not
# https://kubernetes.io/docs/reference/kubernetes-api/service-resources/service-v1/#ServiceSpec
publishNotReadyAddresses: false
nodePort: ""
annotations: {}
httpPortName: http
transportPortName: transport
loadBalancerIP: ""
loadBalancerSourceRanges: []
externalTrafficPolicy: ""
updateStrategy: RollingUpdate
# This is the max unavailable setting for the pod disruption budget
# The default value of 1 will make sure that kubernetes won't allow more than 1
# of your pods to be unavailable during maintenance
maxUnavailable: 1
podSecurityContext:
fsGroup: 1000
runAsUser: 1000
securityContext:
capabilities:
drop:
- ALL
# readOnlyRootFilesystem: true
runAsNonRoot: true
runAsUser: 1000
# How long to wait for elasticsearch to stop gracefully
terminationGracePeriod: 120
sysctlVmMaxMapCount: 262144
readinessProbe:
failureThreshold: 3
initialDelaySeconds: 200
periodSeconds: 10
successThreshold: 3
timeoutSeconds: 5
# https://www.elastic.co/guide/en/elasticsearch/reference/7.17/cluster-health.html#request-params wait_for_status
clusterHealthCheckParams: "wait_for_status=green&timeout=1s"
## Use an alternate scheduler.
## ref: https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/
##
schedulerName: ""
imagePullSecrets: []
nodeSelector: {}
tolerations:
- key: "kubernetes.azure.com/scalesetpriority"
operator: "Equal"
value: "spot"
effect: "NoSchedule"
# Enabling this will publicly expose your Elasticsearch instance.
# Only enable this if you have security enabled on your cluster
ingress:
enabled: true
annotations:
kubernetes.io/ingress.class: istio
# kubernetes.io/tls-acme: "true"
className: "istio"
pathtype: ImplementationSpecific
hosts:
- host: "internal.company.domain.com"
paths:
- path: /elastic
tls: []
# - secretName: chart-example-tls
# hosts:
# - chart-example.local
nameOverride: ""
fullnameOverride: ""
healthNameOverride: ""
lifecycle:
{}
# preStop:
# exec:
# command: ["/bin/sh", "-c", "echo Hello from the postStart handler > /usr/share/message"]
# postStart:
# exec:
# command:
# - bash
# - -c
# - |
# #!/bin/bash
# # Add a template to adjust number of shards/replicas
# TEMPLATE_NAME=my_template
# INDEX_PATTERN="logstash-*"
# SHARD_COUNT=8
# REPLICA_COUNT=1
# ES_URL=http://localhost:9200
# while [[ "$(curl -s -o /dev/null -w '%{http_code}\n' $ES_URL)" != "200" ]]; do sleep 1; done
# curl -XPUT "$ES_URL/_template/$TEMPLATE_NAME" -H 'Content-Type: application/json' -d'{"index_patterns":['\""$INDEX_PATTERN"\"'],"settings":{"number_of_shards":'$SHARD_COUNT',"number_of_replicas":'$REPLICA_COUNT'}}'
sysctlInitContainer:
enabled: true
keystore: []
networkPolicy:
## Enable creation of NetworkPolicy resources. Only Ingress traffic is filtered for now.
## In order for a Pod to access Elasticsearch, it needs to have the following label:
## {{ template "uname" . }}-client: "true"
## Example for default configuration to access HTTP port:
## elasticsearch-master-http-client: "true"
## Example for default configuration to access transport port:
## elasticsearch-master-transport-client: "true"
http:
enabled: false
## if explicitNamespacesSelector is not set or set to {}, only client Pods being in the networkPolicy's namespace
## and matching all criteria can reach the DB.
## But sometimes, we want the Pods to be accessible to clients from other namespaces, in this case, we can use this
## parameter to select these namespaces
##
# explicitNamespacesSelector:
# # Accept from namespaces with all those different rules (only from whitelisted Pods)
# matchLabels:
# role: frontend
# matchExpressions:
# - {key: role, operator: In, values: [frontend]}
## Additional NetworkPolicy Ingress "from" rules to set. Note that all rules are OR-ed.
##
# additionalRules:
# - podSelector:
# matchLabels:
# role: frontend
# - podSelector:
# matchExpressions:
# - key: role
# operator: In
# values:
# - frontend
transport:
## Note that all Elasticsearch Pods can talk to themselves using transport port even if enabled.
enabled: false
# explicitNamespacesSelector:
# matchLabels:
# role: frontend
# matchExpressions:
# - {key: role, operator: In, values: [frontend]}
# additionalRules:
# - podSelector:
# matchLabels:
# role: frontend
# - podSelector:
# matchExpressions:
# - key: role
# operator: In
# values:
# - frontend
tests:
enabled: true
# Deprecated
# please use the above podSecurityContext.fsGroup instead
fsGroup: ""
But when I browse to https://internal.company.domain.com/elastic I get nothing, so evidently I'm not doing the ingress part as I should.
If I use Lens to inspect the Pod's logs, the instance is working OK, and I can even use port forwarding to confirm that all is well with the container.
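For context, with Istio the usual alternative to a plain Kubernetes Ingress is a Gateway plus VirtualService pair that routes to the chart's ClusterIP service. A minimal sketch, assuming the default elasticsearch-master service name, the host above, and the default istio-ingressgateway:
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: elasticsearch-gateway
spec:
  selector:
    istio: ingressgateway   # default Istio ingress gateway label
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "internal.company.domain.com"
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: elasticsearch-vs
spec:
  hosts:
  - "internal.company.domain.com"
  gateways:
  - elasticsearch-gateway
  http:
  - match:
    - uri:
        prefix: /elastic
    rewrite:
      uri: /
    route:
    - destination:
        host: elasticsearch-master   # chart's ClusterIP service
        port:
          number: 9200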

K8s Elasticsearch with filebeat keeps 'not ready' after rebooting

I'm going through a situation I can't quite make sense of.
Environment
Two dedicated nodes with azure centos 8.2 (2vcpu, 16G ram), not AKS
1 master node, 1 worker node.
kubernetes v1.19.3
helm v2.16.12
Helm charts Elastic (https://github.com/elastic/helm-charts/tree/7.9.3)
At first, it works fine with the installation below.
## elasticsearch, filebeat
# kubectl apply -f pv.yaml
# helm install -f values.yaml --name elasticsearch elastic/elasticsearch
# helm install --name filebeat --version 7.9.3 elastic/filebeat
curl <elasticsearch-ip>:9200 and curl <elasticsearch-ip>:9200/_cat/indices
return the expected values,
but after rebooting the worker node, it just keeps showing READY 0/1 and stops working.
NAME READY STATUS RESTARTS AGE
elasticsearch-master-0 0/1 Running 10 71m
filebeat-filebeat-67qm2 0/1 Running 4 40m
In this situation, after removing /mnt/data/nodes and rebooting again,
it works fine.
The elasticsearch pod shows nothing special, I think.
#describe
{"type": "server", "timestamp": "2020-10-26T07:49:49,708Z", "level": "INFO", "component": "o.e.c.r.a.AllocationService", "cluster.name": "elasticsearch", "node.name": "elasticsearch-master-0", "message": "Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[filebeat-7.9.3-2020.10.26-000001][0]]]).", "cluster.uuid": "sWUAXJG9QaKyZDe0BLqwSw", "node.id": "ztb35hToRf-2Ahr7olympw" }
#logs
Normal SandboxChanged 4m4s (x3 over 4m9s) kubelet Pod sandbox changed, it will be killed and re-created.
Normal Pulled 4m3s kubelet Container image "docker.elastic.co/elasticsearch/elasticsearch:7.9.3" already present on machine
Normal Created 4m1s kubelet Created container configure-sysctl
Normal Started 4m1s kubelet Started container configure-sysctl
Normal Pulled 3m58s kubelet Container image "docker.elastic.co/elasticsearch/elasticsearch:7.9.3" already present on machine
Normal Created 3m58s kubelet Created container elasticsearch
Normal Started 3m57s kubelet Started container elasticsearch
Warning Unhealthy 91s (x14 over 3m42s) kubelet Readiness probe failed: Waiting for elasticsearch cluster to become ready (request params: "wait_for_status=green&timeout=1s" )
Cluster is not yet ready (request params: "wait_for_status=green&timeout=1s" )
#events
6m1s Normal Pulled pod/elasticsearch-master-0 Container image "docker.elastic.co/elasticsearch/elasticsearch:7.9.3" already present on machine
6m1s Normal Pulled pod/filebeat-filebeat-67qm2 Container image "docker.elastic.co/beats/filebeat:7.9.3" already present on machine
5m59s Normal Started pod/elasticsearch-master-0 Started container configure-sysctl
5m59s Normal Created pod/elasticsearch-master-0 Created container configure-sysctl
5m59s Normal Created pod/filebeat-filebeat-67qm2 Created container filebeat
5m58s Normal Started pod/filebeat-filebeat-67qm2 Started container filebeat
5m56s Normal Created pod/elasticsearch-master-0 Created container elasticsearch
5m56s Normal Pulled pod/elasticsearch-master-0 Container image "docker.elastic.co/elasticsearch/elasticsearch:7.9.3" already present on machine
5m55s Normal Started pod/elasticsearch-master-0 Started container elasticsearch
61s Warning Unhealthy pod/filebeat-filebeat-67qm2 Readiness probe failed: elasticsearch: http://elasticsearch-master:9200...
parse url... OK
connection...
parse host... OK
dns lookup... OK
addresses: 10.97.133.135
dial up... ERROR dial tcp 10.97.133.135:9200: connect: connection refused
59s Warning Unhealthy pod/elasticsearch-master-0 Readiness probe failed: Waiting for elasticsearch cluster to become ready (request params: "wait_for_status=green&timeout=1s" )
Cluster is not yet ready (request params: "wait_for_status=green&timeout=1s" )
The /mnt/data path is chowned to 1000:1000,
and in the case of Elasticsearch alone, without Filebeat, rebooting causes no problem.
I can't figure this out at all. :(
What am I missing?
pv.yaml
kind: PersistentVolume
apiVersion: v1
metadata:
name: elastic-pv
labels:
type: local
app: elastic
spec:
storageClassName: local-storage
capacity:
storage: 10Gi
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Retain
claimRef:
namespace: default
name: elasticsearch-master-elasticsearch-master-0
hostPath:
path: "/mnt/data"
values.yaml
---
clusterName: "elasticsearch"
nodeGroup: "master"
# The service that non master groups will try to connect to when joining the cluster
# This should be set to clusterName + "-" + nodeGroup for your master group
masterService: ""
# Elasticsearch roles that will be applied to this nodeGroup
# These will be set as environment variables. E.g. node.master=true
roles:
master: "true"
ingest: "true"
data: "true"
replicas: 1
minimumMasterNodes: 1
esMajorVersion: ""
# Allows you to add any config files in /usr/share/elasticsearch/config/
# such as elasticsearch.yml and log4j2.properties
esConfig: {}
# elasticsearch.yml: |
# key:
# nestedkey: value
# log4j2.properties: |
# key = value
# Extra environment variables to append to this nodeGroup
# This will be appended to the current 'env:' key. You can use any of the kubernetes env
# syntax here
extraEnvs: []
# - name: MY_ENVIRONMENT_VAR
# value: the_value_goes_here
# Allows you to load environment variables from kubernetes secret or config map
envFrom: []
# - secretRef:
# name: env-secret
# - configMapRef:
# name: config-map
# A list of secrets and their paths to mount inside the pod
# This is useful for mounting certificates for security and for mounting
# the X-Pack license
secretMounts: []
# - name: elastic-certificates
# secretName: elastic-certificates
# path: /usr/share/elasticsearch/config/certs
# defaultMode: 0755
image: "docker.elastic.co/elasticsearch/elasticsearch"
imageTag: "7.9.3"
imagePullPolicy: "IfNotPresent"
podAnnotations: {}
# iam.amazonaws.com/role: es-cluster
# additionals labels
labels: {}
esJavaOpts: "-Xmx1g -Xms1g"
resources:
requests:
cpu: "500m"
memory: "1Gi"
limits:
cpu: "1000m"
memory: "2Gi"
initResources: {}
# limits:
# cpu: "25m"
# # memory: "128Mi"
# requests:
# cpu: "25m"
# memory: "128Mi"
sidecarResources: {}
# limits:
# cpu: "25m"
# # memory: "128Mi"
# requests:
# cpu: "25m"
# memory: "128Mi"
networkHost: "0.0.0.0"
volumeClaimTemplate:
accessModes: [ "ReadWriteOnce" ]
storageClassName: local-storage
resources:
requests:
storage: 5Gi
rbac:
create: false
serviceAccountAnnotations: {}
serviceAccountName: ""
podSecurityPolicy:
create: false
name: ""
spec:
privileged: true
fsGroup:
rule: RunAsAny
runAsUser:
rule: RunAsAny
seLinux:
rule: RunAsAny
supplementalGroups:
rule: RunAsAny
volumes:
- secret
- configMap
- persistentVolumeClaim
persistence:
enabled: true
name: elastic-vc
labels:
# Add default labels for the volumeClaimTemplate of the StatefulSet
app: elastic
annotations: {}
extraVolumes: []
# - name: extras
# emptyDir: {}
extraVolumeMounts: []
# - name: extras
# mountPath: /usr/share/extras
# readOnly: true
extraContainers: []
# - name: do-something
# image: busybox
# command: ['do', 'something']
extraInitContainers: []
# - name: do-something
# image: busybox
# command: ['do', 'something']
# This is the PriorityClass settings as defined in
# https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass
priorityClassName: ""
# By default this will make sure two pods don't end up on the same node
# Changing this to a region would allow you to spread pods across regions
antiAffinityTopologyKey: "kubernetes.io/hostname"
# Hard means that by default pods will only be scheduled if there are enough nodes for them
# and that they will never end up on the same node. Setting this to soft will do this "best effort"
antiAffinity: "hard"
# This is the node affinity settings as defined in
# https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#node-affinity-beta-feature
nodeAffinity: {}
# The default is to deploy all pods serially. By setting this to parallel all pods are started at
# the same time when bootstrapping the cluster
podManagementPolicy: "Parallel"
# The environment variables injected by service links are not used, but can lead to slow Elasticsearch boot times when
# there are many services in the current namespace.
# If you experience slow pod startups you probably want to set this to `false`.
enableServiceLinks: true
protocol: http
httpPort: 9200
transportPort: 9300
service:
labels: {}
labelsHeadless: {}
type: ClusterIP
nodePort: ""
annotations: {}
httpPortName: http
transportPortName: transport
loadBalancerIP: ""
loadBalancerSourceRanges: []
externalTrafficPolicy: ""
updateStrategy: RollingUpdate
# This is the max unavailable setting for the pod disruption budget
# The default value of 1 will make sure that kubernetes won't allow more than 1
# of your pods to be unavailable during maintenance
maxUnavailable: 1
podSecurityContext:
fsGroup: 1000
runAsUser: 1000
securityContext:
capabilities:
drop:
- ALL
#readOnlyRootFilesystem: false
runAsNonRoot: true
runAsUser: 1000
# How long to wait for elasticsearch to stop gracefully
terminationGracePeriod: 120
sysctlVmMaxMapCount: 262144
readinessProbe:
failureThreshold: 3
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 3
timeoutSeconds: 5
# https://www.elastic.co/guide/en/elasticsearch/reference/7.9/cluster-health.html#request-params wait_for_status
clusterHealthCheckParams: "wait_for_status=green&timeout=1s"
## Use an alternate scheduler.
## ref: https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/
##
schedulerName: ""
imagePullSecrets: []
nodeSelector: {}
tolerations: []
# - effect: NoSchedule
# key: node-role.kubernetes.io/master
# Enabling this will publicly expose your Elasticsearch instance.
# Only enable this if you have security enabled on your cluster
ingress:
enabled: false
annotations: {}
# kubernetes.io/ingress.class: nginx
# kubernetes.io/tls-acme: "true"
path: /
hosts:
- chart-example.local
tls: []
# - secretName: chart-example-tls
# hosts:
# - chart-example.local
nameOverride: ""
fullnameOverride: ""
# https://github.com/elastic/helm-charts/issues/63
masterTerminationFix: false
lifecycle: {}
# preStop:
# exec:
# command: ["/bin/sh", "-c", "echo Hello from the postStart handler > /usr/share/message"]
# postStart:
# exec:
# command:
# - bash
# - -c
# - |
# #!/bin/bash
# # Add a template to adjust number of shards/replicas
# TEMPLATE_NAME=my_template
# INDEX_PATTERN="logstash-*"
# SHARD_COUNT=8
# REPLICA_COUNT=1
# ES_URL=http://localhost:9200
# while [[ "$(curl -s -o /dev/null -w '%{http_code}\n' $ES_URL)" != "200" ]]; do sleep 1; done
# curl -XPUT "$ES_URL/_template/$TEMPLATE_NAME" -H 'Content-Type: application/json' -d'{"index_patterns":['\""$INDEX_PATTERN"\"'],"settings":{"number_of_shards":'$SHARD_COUNT',"number_of_replicas":'$REPLICA_COUNT'}}'
sysctlInitContainer:
enabled: true
keystore: []
# Deprecated
# please use the above podSecurityContext.fsGroup instead
fsGroup: ""
Issue
There is an issue with the Elasticsearch readiness probe when running a single-replica cluster.
Warning Unhealthy 91s (x14 over 3m42s) kubelet Readiness probe failed: Waiting for elasticsearch cluster to become ready (request params: "wait_for_status=green&timeout=1s" )
Cluster is not yet ready (request params: "wait_for_status=green&timeout=1s" )
Solution
As mentioned here by @adinhodovic:
If you're running a single replica cluster add the following helm value:
clusterHealthCheckParams: "wait_for_status=yellow&timeout=1s"
Your status will never go green with a single replica cluster.
The following values should work:
replicas: 1
minimumMasterNodes: 1
clusterHealthCheckParams: 'wait_for_status=yellow&timeout=1s'
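The same change can also be applied without editing the values file, via --set overrides (a sketch):
helm upgrade --install elasticsearch elastic/elasticsearch \
  --set replicas=1 \
  --set minimumMasterNodes=1 \
  --set clusterHealthCheckParams="wait_for_status=yellow&timeout=1s"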

StatefulSet FailedCreate ElasticSearch

When attempting to install Elasticsearch for Kubernetes on a PKS instance, I am running into an issue: after running kubectl get events --all-namespaces I see create Pod logging-es-default-0 in StatefulSet logging-es-default failed error: pods "logging-es-default-0" is forbidden: SecurityContext.RunAsUser is forbidden. Does this have something to do with a pod security policy? Is there any way to deploy Elasticsearch to Kubernetes if privileged containers are not allowed?
Edit: here is the values.yml file that I am passing into the elasticsearch helm chart.
---
clusterName: "elasticsearch"
nodeGroup: "master"
# The service that non master groups will try to connect to when joining the cluster
# This should be set to clusterName + "-" + nodeGroup for your master group
masterService: ""
# Elasticsearch roles that will be applied to this nodeGroup
# These will be set as environment variables. E.g. node.master=true
roles:
master: "true"
ingest: "true"
data: "true"
replicas: 3
minimumMasterNodes: 2
esMajorVersion: ""
# Allows you to add any config files in /usr/share/elasticsearch/config/
# such as elasticsearch.yml and log4j2.properties
esConfig: {}
# elasticsearch.yml: |
# key:
# nestedkey: value
# log4j2.properties: |
# key = value
# Extra environment variables to append to this nodeGroup
# This will be appended to the current 'env:' key. You can use any of the kubernetes env
# syntax here
extraEnvs: []
# - name: MY_ENVIRONMENT_VAR
# value: the_value_goes_here
# A list of secrets and their paths to mount inside the pod
# This is useful for mounting certificates for security and for mounting
# the X-Pack license
secretMounts: []
# - name: elastic-certificates
# secretName: elastic-certificates
# path: /usr/share/elasticsearch/config/certs
image: "docker.elastic.co/elasticsearch/elasticsearch"
imageTag: "7.4.1"
imagePullPolicy: "IfNotPresent"
podAnnotations: {}
# iam.amazonaws.com/role: es-cluster
# additionals labels
labels: {}
esJavaOpts: "-Xmx1g -Xms1g"
resources:
requests:
cpu: "100m"
memory: "2Gi"
limits:
cpu: "1000m"
memory: "2Gi"
initResources: {}
# limits:
# cpu: "25m"
# # memory: "128Mi"
# requests:
# cpu: "25m"
# memory: "128Mi"
sidecarResources: {}
# limits:
# cpu: "25m"
# # memory: "128Mi"
# requests:
# cpu: "25m"
# memory: "128Mi"
networkHost: "0.0.0.0"
volumeClaimTemplate:
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 30Gi
rbac:
create: false
serviceAccountName: ""
podSecurityPolicy:
create: false
name: ""
spec:
privileged: false
fsGroup:
rule: RunAsAny
runAsUser:
rule: RunAsAny
seLinux:
rule: RunAsAny
supplementalGroups:
rule: RunAsAny
volumes:
- secret
- configMap
- persistentVolumeClaim
persistence:
enabled: true
annotations: {}
extraVolumes: ""
# - name: extras
# emptyDir: {}
extraVolumeMounts: ""
# - name: extras
# mountPath: /usr/share/extras
# readOnly: true
extraInitContainers: ""
# - name: do-something
# image: busybox
# command: ['do', 'something']
# This is the PriorityClass settings as defined in
# https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass
priorityClassName: ""
# By default this will make sure two pods don't end up on the same node
# Changing this to a region would allow you to spread pods across regions
antiAffinityTopologyKey: "kubernetes.io/hostname"
# Hard means that by default pods will only be scheduled if there are enough nodes for them
# and that they will never end up on the same node. Setting this to soft will do this "best effort"
antiAffinity: "hard"
# This is the node affinity settings as defined in
# https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#node-affinity-beta-feature
nodeAffinity: {}
# The default is to deploy all pods serially. By setting this to parallel all pods are started at
# the same time when bootstrapping the cluster
podManagementPolicy: "Parallel"
protocol: http
httpPort: 9200
transportPort: 9300
service:
labels: {}
labelsHeadless: {}
type: ClusterIP
nodePort: ""
annotations: {}
httpPortName: http
transportPortName: transport
updateStrategy: RollingUpdate
# This is the max unavailable setting for the pod disruption budget
# The default value of 1 will make sure that kubernetes won't allow more than 1
# of your pods to be unavailable during maintenance
maxUnavailable: 1
podSecurityContext:
fsGroup: null
runAsUser: null
# The following value is deprecated,
# please use the above podSecurityContext.fsGroup instead
fsGroup: ""
securityContext:
capabilities: null
# readOnlyRootFilesystem: true
runAsNonRoot: null
runAsUser: null
# How long to wait for elasticsearch to stop gracefully
terminationGracePeriod: 120
sysctlVmMaxMapCount: 262144
readinessProbe:
failureThreshold: 3
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 3
timeoutSeconds: 5
# https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-health.html#request-params wait_for_status
clusterHealthCheckParams: "wait_for_status=green&timeout=1s"
## Use an alternate scheduler.
## ref: https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/
##
schedulerName: ""
imagePullSecrets: []
nodeSelector: {}
tolerations: []
# Enabling this will publicly expose your Elasticsearch instance.
# Only enable this if you have security enabled on your cluster
ingress:
enabled: false
annotations: {}
# kubernetes.io/ingress.class: nginx
# kubernetes.io/tls-acme: "true"
path: /
hosts:
- chart-example.local
tls: []
# - secretName: chart-example-tls
# hosts:
# - chart-example.local
nameOverride: ""
fullnameOverride: ""
# https://github.com/elastic/helm-charts/issues/63
masterTerminationFix: false
lifecycle: {}
# preStop:
# exec:
# command: ["/bin/sh", "-c", "echo Hello from the postStart handler > /usr/share/message"]
# postStart:
# exec:
# command: ["/bin/sh", "-c", "echo Hello from the postStart handler > /usr/share/message"]
sysctlInitContainer:
enabled: false
keystore: []
The values listed above produce the following error:
create Pod elasticsearch-master-0 in StatefulSet elasticsearch-master failed error: pods "elasticsearch-master-0" is forbidden: SecurityContext.RunAsUser is forbidden
Solved: I learned that my Istio deployment was causing issues when attempting to deploy any other service into my cluster. I had wrongly assumed that Istio, together with my cluster security policies, wasn't causing my issue.
is forbidden: SecurityContext.RunAsUser is forbidden. Does this have something to do with a pod security policy?
Yes, that's exactly what it has to do with
Evidently the StatefulSet has included a securityContext: stanza, but your cluster administrator forbids such an action
Is there any way to be able to deploy ElasticSearch to Kubernetes if privileged containers are not allowed?
That's not exactly what's going on here -- it's not the "privileged" part that is causing you problems -- it's the PodSpec requesting to run the container as a user other than the one in the docker image. In fact, I would actually be very surprised if any modern elasticsearch docker image requires modifying the user at all, since all the recent ones do not run as root to begin with
Remove that securityContext: stanza from the StatefulSet and report back what new errors arise (if any)
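To see which policy is actually rejecting the pod, it can also help to inspect the PodSecurityPolicies the cluster admin has in place (a sketch; PSPs were still current on the Kubernetes versions in use here):
kubectl get psp
kubectl describe psp <policy-name>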

Readiness and Liveness probes for elasticsearch 6.3.0 on Kubernetes failing

I am trying to set up the EFK stack on Kubernetes. The Elasticsearch version being used is 6.3.2. Everything works fine until I place the probes configuration in the deployment YAML file. I am getting the error below. This causes the pod to be declared unhealthy and eventually restarted, which appears to be a false restart.
Warning Unhealthy 15s kubelet, aks-agentpool-23337112-0 Liveness probe failed: Get http://10.XXX.Y.ZZZ:9200/_cluster/health: dial tcp 10.XXX.Y.ZZZ:9200: connect: connection refused
I did try using telnet from a different container to the Elasticsearch pod's IP and port, and that was successful; it is only the kubelet on the node that is unable to reach the IP of the pod, causing the probes to fail.
Below is the snippet from the pod spec of the Kubernetes StatefulSet YAML. Any assistance with the resolution would be really helpful. I've spent quite a lot of time on this without any clue :(
PS: The stack is being setup on AKS cluster
- name: es-data
image: quay.io/pires/docker-elasticsearch-kubernetes:6.3.2
env:
- name: NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: CLUSTER_NAME
value: myesdb
- name: NODE_MASTER
value: "false"
- name: NODE_INGEST
value: "false"
- name: HTTP_ENABLE
value: "true"
- name: NODE_DATA
value: "true"
- name: DISCOVERY_SERVICE
value: "elasticsearch-discovery"
- name: NETWORK_HOST
value: "_eth0:ipv4_"
- name: ES_JAVA_OPTS
value: -Xms512m -Xmx512m
- name: PROCESSORS
valueFrom:
resourceFieldRef:
resource: limits.cpu
resources:
requests:
cpu: 0.25
limits:
cpu: 1
ports:
- containerPort: 9200
name: http
- containerPort: 9300
name: transport
livenessProbe:
httpGet:
port: http
path: /_cluster/health
initialDelaySeconds: 40
periodSeconds: 10
readinessProbe:
httpGet:
path: /_cluster/health
port: http
initialDelaySeconds: 30
timeoutSeconds: 10
The pods/containers runs just fine without the probes in place . Expectation is that the probes should work fine when set on the deployment YAMLs and the POD should not get restarted.
The thing is that Elasticsearch itself has its own health statuses (red, yellow, green), and you need to take that into account in your configuration.
Here is what I found in my own ES configuration, based on the official ES Helm chart:
readinessProbe:
failureThreshold: 3
initialDelaySeconds: 40
periodSeconds: 10
successThreshold: 3
timeoutSeconds: 5
exec:
command:
- sh
- -c
- |
#!/usr/bin/env bash -e
# If the node is starting up wait for the cluster to be green
# Once it has started only check that the node itself is responding
START_FILE=/tmp/.es_start_file
http () {
local path="${1}"
if [ -n "${ELASTIC_USERNAME}" ] && [ -n "${ELASTIC_PASSWORD}" ]; then
BASIC_AUTH="-u ${ELASTIC_USERNAME}:${ELASTIC_PASSWORD}"
else
BASIC_AUTH=''
fi
curl -XGET -s -k --fail ${BASIC_AUTH} http://127.0.0.1:9200${path}
}
if [ -f "${START_FILE}" ]; then
echo 'Elasticsearch is already running, lets check the node is healthy'
http "/"
else
echo 'Waiting for elasticsearch cluster to become green'
if http "/_cluster/health?wait_for_status=green&timeout=1s" ; then
touch ${START_FILE}
exit 0
else
echo 'Cluster is not yet green'
exit 1
fi
fi
First, please check the logs using
kubectl logs <pod name> -n <namespacename>
You have to first run an init container and change the volume permissions.
You have to run the whole config as user 1000; also, before the Elasticsearch container starts, you have to change the volume permissions using an init container.
apiVersion: apps/v1
kind: StatefulSet
metadata:
labels:
app : elasticsearch
component: elasticsearch
release: elasticsearch
name: elasticsearch
spec:
podManagementPolicy: Parallel
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app : elasticsearch
component: elasticsearch
release: elasticsearch
serviceName: elasticsearch
template:
metadata:
creationTimestamp: null
labels:
app : elasticsearch
component: elasticsearch
release: elasticsearch
spec:
containers:
- env:
- name: cluster.name
value: <SET THIS>
- name: discovery.type
value: single-node
- name: ES_JAVA_OPTS
value: -Xms512m -Xmx512m
- name: bootstrap.memory_lock
value: "false"
image: elasticsearch:6.5.0
imagePullPolicy: IfNotPresent
name: elasticsearch
ports:
- containerPort: 9200
name: http
protocol: TCP
- containerPort: 9300
name: transport
protocol: TCP
resources:
limits:
cpu: 250m
memory: 1Gi
requests:
cpu: 150m
memory: 512Mi
securityContext:
privileged: true
runAsUser: 1000
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /usr/share/elasticsearch/data
name: elasticsearch-data
dnsPolicy: ClusterFirst
initContainers:
- command:
- sh
- -c
- chown -R 1000:1000 /usr/share/elasticsearch/data
- sysctl -w vm.max_map_count=262144
- chmod 777 /usr/share/elasticsearch/data
- chmod 777 /usr/share/elasticsearch/data/node
- chmod g+rwx /usr/share/elasticsearch/data
- chgrp 1000 /usr/share/elasticsearch/data
image: busybox:1.29.2
imagePullPolicy: IfNotPresent
name: set-dir-owner
resources: {}
securityContext:
privileged: true
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /usr/share/elasticsearch/data
name: elasticsearch-data
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 10
updateStrategy:
type: OnDelete
volumeClaimTemplates:
- metadata:
creationTimestamp: null
name: elasticsearch-data
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 10Gi
Check out my YAML config above; you can use it. It's for a single node of Elasticsearch.
The probe outlined in my answer works with 3-node discovery when Istio is present. If the livenessProbe is wrong, then k8s will restart the container without even allowing it to start properly. I use the internal Elastic port (for node-to-node communication) to test liveness. This port speaks TCP.
livenessProbe:
tcpSocket:
port: 9300
initialDelaySeconds: 60 # it takes time for the JVM process to start up to the point where the discovery process starts
timeoutSeconds: 10
- name: discovery.zen.minimum_master_nodes
value: "2"
- name: discovery.zen.ping.unicast.hosts
value: elastic
