Expose Elasticsearch externally with Istio ingress controller - elasticsearch

My Kubernetes cluster is an Azure Kubernetes Service (AKS) cluster with Istio as the ingress controller. I have a working Elasticsearch instance that I'm having trouble exposing externally.
I'm deploying the Elasticsearch instance with Elastic's Helm chart for version 7.17 and the following values.yml:
---
clusterName: "elasticsearch"
nodeGroup: "master"
# The service that non master groups will try to connect to when joining the cluster
# This should be set to clusterName + "-" + nodeGroup for your master group
masterService: ""
# Elasticsearch roles that will be applied to this nodeGroup
# These will be set as environment variables. E.g. node.master=true
roles:
master: "true"
ingest: "true"
data: "true"
remote_cluster_client: "true"
ml: "true"
replicas: 1
minimumMasterNodes: 1
esMajorVersion: ""
clusterDeprecationIndexing: "false"
# Allows you to add any config files in /usr/share/elasticsearch/config/
# such as elasticsearch.yml and log4j2.properties
esConfig:
elasticsearch.yml: |
xpack.security.enabled: false
esJvmOptions: {}
# processors.options: |
# -XX:ActiveProcessorCount=3
# Allows you to load environment variables from kubernetes secret or config map
envFrom: []
# - secretRef:
# name: env-secret
# - configMapRef:
# name: config-map
hostAliases: []
#- ip: "127.0.0.1"
# hostnames:
# - "foo.local"
# - "bar.local"
image: "docker.elastic.co/elasticsearch/elasticsearch"
imageTag: "7.17.3"
imagePullPolicy: "IfNotPresent"
podAnnotations:
{}
# iam.amazonaws.com/role: es-cluster
# additionals labels
labels: {}
esJavaOpts: "" # example: "-Xmx1g -Xms1g"
resources:
requests:
cpu: "2500m"
memory: "2Gi"
limits:
cpu: "2500m"
memory: "5Gi"
initResources:
{}
# limits:
# cpu: "25m"
# # memory: "128Mi"
# requests:
# cpu: "25m"
# memory: "128Mi"
networkHost: "0.0.0.0"
volumeClaimTemplate:
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 5Gi
rbac:
create: false
serviceAccountAnnotations: {}
serviceAccountName: ""
automountToken: true
podSecurityPolicy:
create: false
name: ""
spec:
privileged: true
fsGroup:
rule: RunAsAny
runAsUser:
rule: RunAsAny
seLinux:
rule: RunAsAny
supplementalGroups:
rule: RunAsAny
volumes:
- secret
- configMap
- persistentVolumeClaim
- emptyDir
persistence:
enabled: true
labels:
# Add default labels for the volumeClaimTemplate of the StatefulSet
enabled: false
annotations: {}
extraVolumes:
[]
# - name: extras
# emptyDir: {}
extraVolumeMounts:
[]
# - name: extras
# mountPath: /usr/share/extras
# readOnly: true
extraContainers:
[]
# - name: do-something
# image: busybox
# command: ['do', 'something']
extraInitContainers:
[]
# - name: do-something
# image: busybox
# command: ['do', 'something']
# This is the PriorityClass settings as defined in
# https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass
priorityClassName: ""
# By default this will make sure two pods don't end up on the same node
# Changing this to a region would allow you to spread pods across regions
antiAffinityTopologyKey: "kubernetes.io/hostname"
# Hard means that by default pods will only be scheduled if there are enough nodes for them
# and that they will never end up on the same node. Setting this to soft will do this "best effort"
antiAffinity: "hard"
# This is the node affinity settings as defined in
# https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#node-affinity-beta-feature
nodeAffinity: {}
# The default is to deploy all pods serially. By setting this to parallel all pods are started at
# the same time when bootstrapping the cluster
podManagementPolicy: "Parallel"
# The environment variables injected by service links are not used, but can lead to slow Elasticsearch boot times when
# there are many services in the current namespace.
# If you experience slow pod startups you probably want to set this to `false`.
enableServiceLinks: true
protocol: http
httpPort: 9200
transportPort: 9300
service:
enabled: true
labels: {}
labelsHeadless: {}
type: ClusterIP
# Consider that all endpoints are considered "ready" even if the Pods themselves are not
# https://kubernetes.io/docs/reference/kubernetes-api/service-resources/service-v1/#ServiceSpec
publishNotReadyAddresses: false
nodePort: ""
annotations: {}
httpPortName: http
transportPortName: transport
loadBalancerIP: ""
loadBalancerSourceRanges: []
externalTrafficPolicy: ""
updateStrategy: RollingUpdate
# This is the max unavailable setting for the pod disruption budget
# The default value of 1 will make sure that kubernetes won't allow more than 1
# of your pods to be unavailable during maintenance
maxUnavailable: 1
podSecurityContext:
fsGroup: 1000
runAsUser: 1000
securityContext:
capabilities:
drop:
- ALL
# readOnlyRootFilesystem: true
runAsNonRoot: true
runAsUser: 1000
# How long to wait for elasticsearch to stop gracefully
terminationGracePeriod: 120
sysctlVmMaxMapCount: 262144
readinessProbe:
failureThreshold: 3
initialDelaySeconds: 200
periodSeconds: 10
successThreshold: 3
timeoutSeconds: 5
# https://www.elastic.co/guide/en/elasticsearch/reference/7.17/cluster-health.html#request-params wait_for_status
clusterHealthCheckParams: "wait_for_status=green&timeout=1s"
## Use an alternate scheduler.
## ref: https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/
##
schedulerName: ""
imagePullSecrets: []
nodeSelector: {}
tolerations:
- key: "kubernetes.azure.com/scalesetpriority"
operator: "Equal"
value: "spot"
effect: "NoSchedule"
# Enabling this will publicly expose your Elasticsearch instance.
# Only enable this if you have security enabled on your cluster
ingress:
  enabled: true
  annotations:
    kubernetes.io/ingress.class: istio
    # kubernetes.io/tls-acme: "true"
  className: "istio"
  pathtype: ImplementationSpecific
  hosts:
    - host: "internal.company.domain.com"
      paths:
        - path: /elastic
  tls: []
  # - secretName: chart-example-tls
  #   hosts:
  #     - chart-example.local
nameOverride: ""
fullnameOverride: ""
healthNameOverride: ""
lifecycle:
{}
# preStop:
# exec:
# command: ["/bin/sh", "-c", "echo Hello from the postStart handler > /usr/share/message"]
# postStart:
# exec:
# command:
# - bash
# - -c
# - |
# #!/bin/bash
# # Add a template to adjust number of shards/replicas
# TEMPLATE_NAME=my_template
# INDEX_PATTERN="logstash-*"
# SHARD_COUNT=8
# REPLICA_COUNT=1
# ES_URL=http://localhost:9200
# while [[ "$(curl -s -o /dev/null -w '%{http_code}\n' $ES_URL)" != "200" ]]; do sleep 1; done
# curl -XPUT "$ES_URL/_template/$TEMPLATE_NAME" -H 'Content-Type: application/json' -d'{"index_patterns":['\""$INDEX_PATTERN"\"'],"settings":{"number_of_shards":'$SHARD_COUNT',"number_of_replicas":'$REPLICA_COUNT'}}'
sysctlInitContainer:
enabled: true
keystore: []
networkPolicy:
## Enable creation of NetworkPolicy resources. Only Ingress traffic is filtered for now.
## In order for a Pod to access Elasticsearch, it needs to have the following label:
## {{ template "uname" . }}-client: "true"
## Example for default configuration to access HTTP port:
## elasticsearch-master-http-client: "true"
## Example for default configuration to access transport port:
## elasticsearch-master-transport-client: "true"
http:
enabled: false
## if explicitNamespacesSelector is not set or set to {}, only client Pods being in the networkPolicy's namespace
## and matching all criteria can reach the DB.
## But sometimes, we want the Pods to be accessible to clients from other namespaces, in this case, we can use this
## parameter to select these namespaces
##
# explicitNamespacesSelector:
# # Accept from namespaces with all those different rules (only from whitelisted Pods)
# matchLabels:
# role: frontend
# matchExpressions:
# - {key: role, operator: In, values: [frontend]}
## Additional NetworkPolicy Ingress "from" rules to set. Note that all rules are OR-ed.
##
# additionalRules:
# - podSelector:
# matchLabels:
# role: frontend
# - podSelector:
# matchExpressions:
# - key: role
# operator: In
# values:
# - frontend
transport:
## Note that all Elasticsearch Pods can talk to themselves using transport port even if enabled.
enabled: false
# explicitNamespacesSelector:
# matchLabels:
# role: frontend
# matchExpressions:
# - {key: role, operator: In, values: [frontend]}
# additionalRules:
# - podSelector:
# matchLabels:
# role: frontend
# - podSelector:
# matchExpressions:
# - key: role
# operator: In
# values:
# - frontend
tests:
enabled: true
# Deprecated
# please use the above podSecurityContext.fsGroup instead
fsGroup: ""
But when I browse to https://internal.company.domain.com/elastic I get nothing, so evidently I'm not doing the ingress part as I should.
If I use Lens to inspect the Pod's logs, the instance is working fine, and I can even use port-forwarding to confirm that all is good with the container.
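For reference, the Istio-native way to publish a service through the ingress gateway is usually a Gateway plus a VirtualService rather than (or in addition to) a Kubernetes Ingress annotated with kubernetes.io/ingress.class: istio. The following is only a minimal sketch: the resource names, the istio: ingressgateway selector, the plain-HTTP port, and the service name elasticsearch-master (the chart's default for this clusterName/nodeGroup) are assumptions that would need to match the actual cluster.
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: elasticsearch-gateway            # hypothetical name
spec:
  selector:
    istio: ingressgateway                # assumes the default Istio ingress gateway labels
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "internal.company.domain.com"
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: elasticsearch-vs                 # hypothetical name
spec:
  hosts:
    - "internal.company.domain.com"
  gateways:
    - elasticsearch-gateway
  http:
    - match:
        - uri:
            prefix: /elastic
      rewrite:
        uri: /
      route:
        - destination:
            host: elasticsearch-master   # assumes the chart's default master service name
            port:
              number: 9200
With something like this in place, a request to https://internal.company.domain.com/elastic would be rewritten to / and routed to port 9200 of the Elasticsearch service, assuming DNS points at the ingress gateway's external IP and TLS is terminated at the gateway.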

Related

ElasticSearch Cluster on AKS with loadbalancer

I am trying to deploy Elasticsearch to AKS with a load balancer.
What I am struggling to achieve is a load balancer that only directs traffic to my client nodes.
This is what I have:
Elasticsearch deployment YAML:
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 7.16.2
  nodeSets:
    # 3 dedicated master nodes
    - name: master
      count: 3
      podTemplate:
        spec:
          initContainers:
            - name: sysctl
              securityContext:
                privileged: true
              command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
      config:
        node.roles: ["master"]
        #node.remote_cluster_client: false
    # 3 ingest-data nodes
    - name: ingest-data
      count: 3
      podTemplate:
        spec:
          initContainers:
            - name: sysctl
              securityContext:
                privileged: true
              command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
      config:
        node.roles: ["data", "ingest"]
    # 3 client nodes
    - name: client
      count: 3
      podTemplate:
        spec:
          initContainers:
            - name: sysctl
              securityContext:
                privileged: true
              command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
      config:
        node.roles: []
Load Balancer YAML:
apiVersion: v1
kind: Service
metadata:
  name: ingress-controller
spec:
  type: LoadBalancer
  ports:
    - name: http
      port: 9200
      targetPort: 9200
      protocol: TCP
  selector:
    elasticsearch.k8s.elastic.co/cluster-name: "quickstart"
    elasticsearch.k8s.elastic.co/node-master: "false"
    elasticsearch.k8s.elastic.co/node-data: "false"
    elasticsearch.k8s.elastic.co/node-ingest: "false"
    elasticsearch.k8s.elastic.co/node-ml: "false"
    elasticsearch.k8s.elastic.co/node-transform: "false"
Output of kubectl get svc ingress-controller (public ip redacted)
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
ingress-controller LoadBalancer 10.0.221.180 20.0.0.95 9200:32360/TCP 54m
This load balancer does not respond with anything on port 9200, so I suspect it's not working anyway, but I am not sure how to achieve what I'm trying to do at all.
Thank you in advance. I appreciate any tips on how to solve this.
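One thing worth checking here is whether the Service's selector matches any pods at all; if it selects nothing, the load balancer has no endpoints and connections to port 9200 will simply be refused. A quick, read-only sketch using the names above:
# Endpoints behind the Service; an empty list means the selector matches no pods
kubectl get endpoints ingress-controller -o wide
# Labels actually present on the ECK-managed pods, for comparison with the selector
kubectl get pods --show-labels -l elasticsearch.k8s.elastic.co/cluster-name=quickstart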

Kubernetes Helm Elasticstack CrashLoopBackOff with JavaErrors in Log

I'm trying to deploy the ELK stack to my development Kubernetes cluster. It seems that I'm doing everything as described in the tutorials; however, the pods keep failing with Java errors (see below). I will describe the whole process, from installing the cluster until the error happens.
Step 1: Installing the cluster
# Apply sysctl params without reboot
cat <<EOF | sudo tee /etc/modules-load.d/containerd.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
# Setup required sysctl params, these persist across reboots.
cat <<EOF | sudo tee /etc/sysctl.d/99-kubernetes-cri.conf
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
sudo sysctl --system
#update and install apt https stuff
sudo apt-get update
sudo apt-get install apt-transport-https ca-certificates curl gnupg lsb-release
# add docker repo for containerd and install it
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo \
"deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install -y containerd.io
# copy config
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml
sudo systemctl restart containerd
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
br_netfilter
EOF
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1 // somewhat redundant
net.bridge.bridge-nf-call-iptables = 1 // somewhat redundant
EOF
sudo sysctl --system
#install kubernetes binaries
sudo curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/apt/doc/apt-key.gpg
echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl
#disable swap and comment swap in fstab
sudo swapoff -v /dev/mapper/main-swap
sudo nano /etc/fstab
#init cluster
sudo kubeadm init --pod-network-cidr=192.168.0.0/16
#make user to kubectl admin
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
#install calico
kubectl apply -f
kubectl create -f https://docs.projectcalico.org/manifests/tigera-operator.yaml
kubectl create -f https://docs.projectcalico.org/manifests/custom-resources.yaml
#untaint master node that pods can run on it
kubectl taint nodes --all node-role.kubernetes.io/master-
#install helm
curl https://baltocdn.com/helm/signing.asc | sudo apt-key add -
sudo apt-get install apt-transport-https --yes
echo "deb https://baltocdn.com/helm/stable/debian/ all main" | sudo tee /etc/apt/sources.list.d/helm-stable-debian.list
sudo apt-get update
sudo apt-get install helm
Step 2: Install ECK (https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-install-helm.html) and elasticsearch (https://github.com/elastic/helm-charts/blob/master/elasticsearch/README.md#installing)
# add helm repo
helm repo add elastic https://helm.elastic.co
helm repo update
# install eck
#### omitted as suggested in the comment section!!!! helm install elastic-operator elastic/eck-operator -n elastic-system --create-namespace
helm install elasticsearch elastic/elasticsearch
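(For what it's worth, if custom settings are wanted, a values file such as the one shown further below would normally be passed at install time; a sketch, with a hypothetical filename:)
# sketch: install the chart with a custom values file
helm install elasticsearch elastic/elasticsearch -f values.yaml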
Step 3: Add PersistentVolume
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: elk-data1
  labels:
    type: local
spec:
  capacity:
    storage: 30Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/mnt/data1"
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: elk-data2
  labels:
    type: local
spec:
  capacity:
    storage: 30Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/mnt/data2"
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: elk-data3
  labels:
    type: local
spec:
  capacity:
    storage: 30Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/mnt/data3"
apply it
sudo mkdir /mnt/data1
sudo mkdir /mnt/data2
sudo mkdir /mnt/data3
kubectl apply -f storage.yaml
Now the pods (or at least one) should run. But I keep getting STATUS CrashLoopBackOff with Java errors in the log.
kubectl get pv,pvc,pods
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
persistentvolume/elk-data1 30Gi RWO Retain Bound default/elasticsearch-master-elasticsearch-master-1 140m
persistentvolume/elk-data2 30Gi RWO Retain Bound default/elasticsearch-master-elasticsearch-master-2 140m
persistentvolume/elk-data3 30Gi RWO Retain Bound default/elasticsearch-master-elasticsearch-master-0 140m
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
persistentvolumeclaim/elasticsearch-master-elasticsearch-master-0 Bound elk-data3 30Gi RWO 141m
persistentvolumeclaim/elasticsearch-master-elasticsearch-master-1 Bound elk-data1 30Gi RWO 141m
persistentvolumeclaim/elasticsearch-master-elasticsearch-master-2 Bound elk-data2 30Gi RWO 141m
NAME READY STATUS RESTARTS AGE
pod/elasticsearch-master-0 0/1 CrashLoopBackOff 32 141m
pod/elasticsearch-master-1 0/1 Pending 0 141m
pod/elasticsearch-master-2 0/1 Pending 0 141m
Logs and Error:
kubectl logs pod/elasticsearch-master-2
Exception in thread "main" java.lang.InternalError: java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.platform.Metrics.systemMetrics(Metrics.java:65)
at java.base/jdk.internal.platform.Container.metrics(Container.java:43)
at jdk.management/com.sun.management.internal.OperatingSystemImpl.<init>(OperatingSystemImpl.java:48)
at jdk.management/com.sun.management.internal.PlatformMBeanProviderImpl.getOperatingSystemMXBean(PlatformMBeanProviderImpl.java:279)
at jdk.management/com.sun.management.internal.PlatformMBeanProviderImpl$3.nameToMBeanMap(PlatformMBeanProviderImpl.java:198)
at java.management/java.lang.management.ManagementFactory.lambda$getPlatformMBeanServer$0(ManagementFactory.java:487)
at java.base/java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:273)
at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
at java.base/java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1766)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
at java.management/java.lang.management.ManagementFactory.getPlatformMBeanServer(ManagementFactory.java:488)
at org.apache.logging.log4j.core.jmx.Server.reregisterMBeansAfterReconfigure(Server.java:140)
at org.apache.logging.log4j.core.LoggerContext.setConfiguration(LoggerContext.java:558)
at org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:263)
at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:207)
at org.apache.logging.log4j.core.config.Configurator.initialize(Configurator.java:220)
at org.apache.logging.log4j.core.config.Configurator.initialize(Configurator.java:197)
at org.elasticsearch.common.logging.LogConfigurator.configureStatusLogger(LogConfigurator.java:248)
at org.elasticsearch.common.logging.LogConfigurator.configureWithoutConfig(LogConfigurator.java:95)
at org.elasticsearch.cli.CommandLoggingConfigurator.configureLoggingWithoutConfig(CommandLoggingConfigurator.java:29)
at org.elasticsearch.cli.Command.main(Command.java:76)
at org.elasticsearch.common.settings.KeyStoreCli.main(KeyStoreCli.java:32)
Caused by: java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:78)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:567)
at java.base/jdk.internal.platform.Metrics.systemMetrics(Metrics.java:61)
... 26 more
Caused by: java.lang.ExceptionInInitializerError
at java.base/jdk.internal.platform.CgroupSubsystemFactory.create(CgroupSubsystemFactory.java:107)
at java.base/jdk.internal.platform.CgroupMetrics.getInstance(CgroupMetrics.java:167)
... 31 more
Caused by: java.lang.NullPointerException
at java.base/java.util.Objects.requireNonNull(Objects.java:208)
at java.base/sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:260)
at java.base/java.nio.file.Path.of(Path.java:147)
at java.base/java.nio.file.Paths.get(Paths.java:69)
at java.base/jdk.internal.platform.CgroupUtil.lambda$readStringValue$1(CgroupUtil.java:66)
at java.base/java.security.AccessController.doPrivileged(AccessController.java:554)
at java.base/jdk.internal.platform.CgroupUtil.readStringValue(CgroupUtil.java:68)
at java.base/jdk.internal.platform.CgroupSubsystemController.getStringValue(CgroupSubsystemController.java:65)
at java.base/jdk.internal.platform.CgroupSubsystemController.getLongValue(CgroupSubsystemController.java:124)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.getLongValue(CgroupV1Subsystem.java:272)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.getHierarchical(CgroupV1Subsystem.java:218)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.setPath(CgroupV1Subsystem.java:201)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.setSubSystemControllerPath(CgroupV1Subsystem.java:173)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.lambda$initSubSystem$5(CgroupV1Subsystem.java:113)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.initSubSystem(CgroupV1Subsystem.java:113)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.<clinit>(CgroupV1Subsystem.java:47)
... 33 more
Exception in thread "main" java.lang.InternalError: java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.platform.Metrics.systemMetrics(Metrics.java:65)
at java.base/jdk.internal.platform.Container.metrics(Container.java:43)
at jdk.management/com.sun.management.internal.OperatingSystemImpl.<init>(OperatingSystemImpl.java:48)
at jdk.management/com.sun.management.internal.PlatformMBeanProviderImpl.getOperatingSystemMXBean(PlatformMBeanProviderImpl.java:279)
at jdk.management/com.sun.management.internal.PlatformMBeanProviderImpl$3.nameToMBeanMap(PlatformMBeanProviderImpl.java:198)
at java.management/sun.management.spi.PlatformMBeanProvider$PlatformComponent.getMBeans(PlatformMBeanProvider.java:195)
at java.management/java.lang.management.ManagementFactory.getPlatformMXBean(ManagementFactory.java:686)
at java.management/java.lang.management.ManagementFactory.getOperatingSystemMXBean(ManagementFactory.java:388)
at org.elasticsearch.tools.launchers.DefaultSystemMemoryInfo.<init>(DefaultSystemMemoryInfo.java:28)
at org.elasticsearch.tools.launchers.JvmOptionsParser.jvmOptions(JvmOptionsParser.java:125)
at org.elasticsearch.tools.launchers.JvmOptionsParser.main(JvmOptionsParser.java:86)
Caused by: java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:78)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:567)
at java.base/jdk.internal.platform.Metrics.systemMetrics(Metrics.java:61)
... 10 more
Caused by: java.lang.ExceptionInInitializerError
at java.base/jdk.internal.platform.CgroupSubsystemFactory.create(CgroupSubsystemFactory.java:107)
at java.base/jdk.internal.platform.CgroupMetrics.getInstance(CgroupMetrics.java:167)
... 15 more
Caused by: java.lang.NullPointerException
at java.base/java.util.Objects.requireNonNull(Objects.java:208)
at java.base/sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:260)
at java.base/java.nio.file.Path.of(Path.java:147)
at java.base/java.nio.file.Paths.get(Paths.java:69)
at java.base/jdk.internal.platform.CgroupUtil.lambda$readStringValue$1(CgroupUtil.java:66)
at java.base/java.security.AccessController.doPrivileged(AccessController.java:554)
at java.base/jdk.internal.platform.CgroupUtil.readStringValue(CgroupUtil.java:68)
at java.base/jdk.internal.platform.CgroupSubsystemController.getStringValue(CgroupSubsystemController.java:65)
at java.base/jdk.internal.platform.CgroupSubsystemController.getLongValue(CgroupSubsystemController.java:124)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.getLongValue(CgroupV1Subsystem.java:272)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.getHierarchical(CgroupV1Subsystem.java:218)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.setPath(CgroupV1Subsystem.java:201)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.setSubSystemControllerPath(CgroupV1Subsystem.java:173)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.lambda$initSubSystem$5(CgroupV1Subsystem.java:113)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.initSubSystem(CgroupV1Subsystem.java:113)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.<clinit>(CgroupV1Subsystem.java:47)
... 17 more
values.yaml from the Helm chart:
---
clusterName: "elasticsearch"
nodeGroup: "master"
# The service that non master groups will try to connect to when joining the cluster
# This should be set to clusterName + "-" + nodeGroup for your master group
masterService: ""
# Elasticsearch roles that will be applied to this nodeGroup
# These will be set as environment variables. E.g. node.master=true
roles:
master: "true"
ingest: "true"
data: "true"
remote_cluster_client: "true"
ml: "true"
replicas: 3
minimumMasterNodes: 2
esMajorVersion: ""
# Allows you to add any config files in /usr/share/elasticsearch/config/
# such as elasticsearch.yml and log4j2.properties
esConfig: {}
# elasticsearch.yml: |
# key:
# nestedkey: value
# log4j2.properties: |
# key = value
# Extra environment variables to append to this nodeGroup
# This will be appended to the current 'env:' key. You can use any of the kubernetes env
# syntax here
extraEnvs: []
# - name: MY_ENVIRONMENT_VAR
# value: the_value_goes_here
# Allows you to load environment variables from kubernetes secret or config map
envFrom: []
# - secretRef:
# name: env-secret
# - configMapRef:
# name: config-map
# A list of secrets and their paths to mount inside the pod
# This is useful for mounting certificates for security and for mounting
# the X-Pack license
secretMounts: []
# - name: elastic-certificates
# secretName: elastic-certificates
# path: /usr/share/elasticsearch/config/certs
# defaultMode: 0755
hostAliases: []
#- ip: "127.0.0.1"
# hostnames:
# - "foo.local"
# - "bar.local"
image: "docker.elastic.co/elasticsearch/elasticsearch"
imageTag: "7.12.1"
imagePullPolicy: "IfNotPresent"
podAnnotations: {}
# iam.amazonaws.com/role: es-cluster
# additionals labels
labels: {}
esJavaOpts: "-Xmx1g -Xms1g"
resources:
requests:
cpu: "1000m"
memory: "2Gi"
limits:
cpu: "1000m"
memory: "2Gi"
initResources: {}
# limits:
# cpu: "25m"
# # memory: "128Mi"
# requests:
# cpu: "25m"
# memory: "128Mi"
sidecarResources: {}
# limits:
# cpu: "25m"
# # memory: "128Mi"
# requests:
# cpu: "25m"
# memory: "128Mi"
networkHost: "0.0.0.0"
volumeClaimTemplate:
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 30Gi
rbac:
create: false
serviceAccountAnnotations: {}
serviceAccountName: ""
podSecurityPolicy:
create: false
name: ""
spec:
privileged: true
fsGroup:
rule: RunAsAny
runAsUser:
rule: RunAsAny
seLinux:
rule: RunAsAny
supplementalGroups:
rule: RunAsAny
volumes:
- secret
- configMap
- persistentVolumeClaim
- emptyDir
persistence:
enabled: true
labels:
# Add default labels for the volumeClaimTemplate of the StatefulSet
enabled: false
annotations: {}
extraVolumes: []
# - name: extras
# emptyDir: {}
extraVolumeMounts: []
# - name: extras
# mountPath: /usr/share/extras
# readOnly: true
extraContainers: []
# - name: do-something
# image: busybox
# command: ['do', 'something']
extraInitContainers: []
# - name: do-something
# image: busybox
# command: ['do', 'something']
# This is the PriorityClass settings as defined in
# https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass
priorityClassName: ""
# By default this will make sure two pods don't end up on the same node
# Changing this to a region would allow you to spread pods across regions
antiAffinityTopologyKey: "kubernetes.io/hostname"
# Hard means that by default pods will only be scheduled if there are enough nodes for them
# and that they will never end up on the same node. Setting this to soft will do this "best effort"
antiAffinity: "hard"
# This is the node affinity settings as defined in
# https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#node-affinity-beta-feature
nodeAffinity: {}
# The default is to deploy all pods serially. By setting this to parallel all pods are started at
# the same time when bootstrapping the cluster
podManagementPolicy: "Parallel"
# The environment variables injected by service links are not used, but can lead to slow Elasticsearch boot times when
# there are many services in the current namespace.
# If you experience slow pod startups you probably want to set this to `false`.
enableServiceLinks: true
protocol: http
httpPort: 9200
transportPort: 9300
service:
labels: {}
labelsHeadless: {}
type: ClusterIP
nodePort: ""
annotations: {}
httpPortName: http
transportPortName: transport
loadBalancerIP: ""
loadBalancerSourceRanges: []
externalTrafficPolicy: ""
updateStrategy: RollingUpdate
# This is the max unavailable setting for the pod disruption budget
# The default value of 1 will make sure that kubernetes won't allow more than 1
# of your pods to be unavailable during maintenance
maxUnavailable: 1
podSecurityContext:
fsGroup: 1000
runAsUser: 1000
securityContext:
capabilities:
drop:
- ALL
# readOnlyRootFilesystem: true
runAsNonRoot: true
runAsUser: 1000
# How long to wait for elasticsearch to stop gracefully
terminationGracePeriod: 120
sysctlVmMaxMapCount: 262144
readinessProbe:
failureThreshold: 3
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 3
timeoutSeconds: 5
# https://www.elastic.co/guide/en/elasticsearch/reference/7.12/cluster-health.html#request-params wait_for_status
clusterHealthCheckParams: "wait_for_status=green&timeout=1s"
## Use an alternate scheduler.
## ref: https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/
##
schedulerName: ""
imagePullSecrets: []
nodeSelector: {}
tolerations: []
# Enabling this will publically expose your Elasticsearch instance.
# Only enable this if you have security enabled on your cluster
ingress:
enabled: false
annotations: {}
# kubernetes.io/ingress.class: nginx
# kubernetes.io/tls-acme: "true"
hosts:
- host: chart-example.local
paths:
- path: /
tls: []
# - secretName: chart-example-tls
# hosts:
# - chart-example.local
nameOverride: ""
fullnameOverride: ""
# https://github.com/elastic/helm-charts/issues/63
masterTerminationFix: false
lifecycle: {}
# preStop:
# exec:
# command: ["/bin/sh", "-c", "echo Hello from the postStart handler > /usr/share/message"]
# postStart:
# exec:
# command:
# - bash
# - -c
# - |
# #!/bin/bash
# # Add a template to adjust number of shards/replicas
# TEMPLATE_NAME=my_template
# INDEX_PATTERN="logstash-*"
# SHARD_COUNT=8
# REPLICA_COUNT=1
# ES_URL=http://localhost:9200
# while [[ "$(curl -s -o /dev/null -w '%{http_code}\n' $ES_URL)" != "200" ]]; do sleep 1; done
# curl -XPUT "$ES_URL/_template/$TEMPLATE_NAME" -H 'Content-Type: application/json' -d'{"index_patterns":['\""$INDEX_PATTERN"\"'],"settings":{"number_of_shards":'$SHARD_COUNT',"number_of_replicas":'$REPLICA_COUNT'}}'
sysctlInitContainer:
enabled: true
keystore: []
networkPolicy:
## Enable creation of NetworkPolicy resources. Only Ingress traffic is filtered for now.
## In order for a Pod to access Elasticsearch, it needs to have the following label:
## {{ template "uname" . }}-client: "true"
## Example for default configuration to access HTTP port:
## elasticsearch-master-http-client: "true"
## Example for default configuration to access transport port:
## elasticsearch-master-transport-client: "true"
http:
enabled: false
## if explicitNamespacesSelector is not set or set to {}, only client Pods being in the networkPolicy's namespace
## and matching all criteria can reach the DB.
## But sometimes, we want the Pods to be accessible to clients from other namespaces, in this case, we can use this
## parameter to select these namespaces
##
# explicitNamespacesSelector:
# # Accept from namespaces with all those different rules (only from whitelisted Pods)
# matchLabels:
# role: frontend
# matchExpressions:
# - {key: role, operator: In, values: [frontend]}
## Additional NetworkPolicy Ingress "from" rules to set. Note that all rules are OR-ed.
##
# additionalRules:
# - podSelector:
# matchLabels:
# role: frontend
# - podSelector:
# matchExpressions:
# - key: role
# operator: In
# values:
# - frontend
transport:
## Note that all Elasticsearch Pods can talks to themselves using transport port even if enabled.
enabled: false
# explicitNamespacesSelector:
# matchLabels:
# role: frontend
# matchExpressions:
# - {key: role, operator: In, values: [frontend]}
# additionalRules:
# - podSelector:
# matchLabels:
# role: frontend
# - podSelector:
# matchExpressions:
# - key: role
# operator: In
# values:
# - frontend
# Deprecated
# please use the above podSecurityContext.fsGroup instead
fsGroup: ""
What you are experiencing is not an issue related to Elasticsearch. It is a problem resulting from the cgroup configuration for the version of containerd you are using. I haven't unpacked the specifics, but the exception in the Elasticsearch logs relates to the JDK failing when attempting to retrieve the required cgroup information.
I had the same issue and resolved it by executing the following steps, before installing Kubernetes, to install a later version of containerd and configure it to use cgroups with systemd:
Add the GPG key for the official Docker repository.
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
Add the Docker repository to APT sources.
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu focal stable"
Install the latest containerd.io package instead of the containerd package from Ubuntu.
apt-get -y install containerd.io
Generate the default containerd configuration.
containerd config default > /etc/containerd/config.toml
Configure containerd to use systemd to manage the cgroups.
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
  runtime_type = "io.containerd.runc.v2"
  runtime_engine = ""
  runtime_root = ""
  privileged_without_host_devices = false
  base_runtime_spec = ""
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
    SystemdCgroup = true
Restart the containerd service.
systemctl restart containerd
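On kubeadm-based clusters it is also worth keeping the kubelet on the same cgroup driver as the runtime. A minimal sketch, assuming a kubeadm setup where the kubelet configuration is switched to systemd cgroups as well (defaults differ between kubeadm versions, so verify before relying on it):
# KubeletConfiguration document appended to the kubeadm init config (sketch)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd   # matches containerd's SystemdCgroup = true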
For the ELK stack to work, you need all three PersistentVolumeClaims to be bound, as I recall. Instead of creating one 30 GB PV, create three of the same size to match the claims and then re-install. The other nodes have unmet dependencies.
Also, please do not handle the volumes by hand. There are guidelines for deploying dynamic volumes; use OpenEBS, for example. That way you won't need to worry about the PVCs. After providing the PVs, if anything still goes wrong, write again with your cluster installation process.
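As a rough illustration of the dynamic-provisioning suggestion: once a local-PV provisioner such as OpenEBS is installed, the chart's volumeClaimTemplate can simply reference its StorageClass instead of hand-made hostPath PVs. The class name openebs-hostpath below is an assumption (it is what the OpenEBS local-PV install typically creates) and should be verified with kubectl get storageclass:
# values.yaml override (sketch): let the StatefulSet claim dynamically provisioned volumes
volumeClaimTemplate:
  storageClassName: openebs-hostpath   # assumed class name; verify with kubectl get storageclass
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 30Gi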
I was obviously wrong: in this particular problem, filesystems and cgroups play a role, and the root cause is an old problem affecting versions from 5.2.1 to 8.0.0.
Reinstall the chart by pulling it, edit the values file, and definitely change the container version. It should then be fine, or at least produce a different error stack.
I got it working by going down to a 7.11.x version.
My Kubernetes version is v1.13.x.
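For what it's worth, changing the container version does not require editing the packaged chart; the imageTag value can be overridden at upgrade time. A sketch only, with 7.11.2 as an example tag:
# Re-deploy the release with a different Elasticsearch image tag (example tag only)
helm upgrade --install elasticsearch elastic/elasticsearch \
  -f values.yaml \
  --set imageTag=7.11.2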

K8s Elasticsearch with filebeat is keeping 'not ready' after rebooting

I'm going through a situation I find hard to understand.
Environment
Two dedicated nodes with Azure CentOS 8.2 (2 vCPU, 16 GB RAM), not AKS
1 master node, 1 worker node.
kubernetes v1.19.3
helm v2.16.12
Helm charts Elastic (https://github.com/elastic/helm-charts/tree/7.9.3)
At first, it works fine with the installation below.
## elasticsearch, filebeat
# kubectl apply -f pv.yaml
# helm install -f values.yaml --name elasticsearch elastic/elasticsearch
# helm install --name filebeat --version 7.9.3 elastic/filebeat
curl elasticsearchip:9200 and curl elasitcsearchip:9200/_cat/indices
show the right values.
But after rebooting the worker node, it just keeps showing READY 0/1 and doesn't work.
NAME READY STATUS RESTARTS AGE
elasticsearch-master-0 0/1 Running 10 71m
filebeat-filebeat-67qm2 0/1 Running 4 40m
In this situation, after removing /mnt/data/nodes and rebooting again,
it works fine.
The elasticsearch pod shows nothing special, I think.
#describe
{"type": "server", "timestamp": "2020-10-26T07:49:49,708Z", "level": "INFO", "component": "o.e.c.r.a.AllocationService", "cluster.name": "elasticsearch", "node.name": "elasticsearch-master-0", "message": "Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[filebeat-7.9.3-2020.10.26-000001][0]]]).", "cluster.uuid": "sWUAXJG9QaKyZDe0BLqwSw", "node.id": "ztb35hToRf-2Ahr7olympw" }
#logs
Normal SandboxChanged 4m4s (x3 over 4m9s) kubelet Pod sandbox changed, it will be killed and re-created.
Normal Pulled 4m3s kubelet Container image "docker.elastic.co/elasticsearch/elasticsearch:7.9.3" already present on machine
Normal Created 4m1s kubelet Created container configure-sysctl
Normal Started 4m1s kubelet Started container configure-sysctl
Normal Pulled 3m58s kubelet Container image "docker.elastic.co/elasticsearch/elasticsearch:7.9.3" already present on machine
Normal Created 3m58s kubelet Created container elasticsearch
Normal Started 3m57s kubelet Started container elasticsearch
Warning Unhealthy 91s (x14 over 3m42s) kubelet Readiness probe failed: Waiting for elasticsearch cluster to become ready (request params: "wait_for_status=green&timeout=1s" )
Cluster is not yet ready (request params: "wait_for_status=green&timeout=1s" )
#events
6m1s Normal Pulled pod/elasticsearch-master-0 Container image "docker.elastic.co/elasticsearch/elasticsearch:7.9.3" already present on machine
6m1s Normal Pulled pod/filebeat-filebeat-67qm2 Container image "docker.elastic.co/beats/filebeat:7.9.3" already present on machine
5m59s Normal Started pod/elasticsearch-master-0 Started container configure-sysctl
5m59s Normal Created pod/elasticsearch-master-0 Created container configure-sysctl
5m59s Normal Created pod/filebeat-filebeat-67qm2 Created container filebeat
5m58s Normal Started pod/filebeat-filebeat-67qm2 Started container filebeat
5m56s Normal Created pod/elasticsearch-master-0 Created container elasticsearch
5m56s Normal Pulled pod/elasticsearch-master-0 Container image "docker.elastic.co/elasticsearch/elasticsearch:7.9.3" already present on machine
5m55s Normal Started pod/elasticsearch-master-0 Started container elasticsearch
61s Warning Unhealthy pod/filebeat-filebeat-67qm2 Readiness probe failed: elasticsearch: http://elasticsearch-master:9200...
parse url... OK
connection...
parse host... OK
dns lookup... OK
addresses: 10.97.133.135
dial up... ERROR dial tcp 10.97.133.135:9200: connect: connection refused
59s Warning Unhealthy pod/elasticsearch-master-0 Readiness probe failed: Waiting for elasticsearch cluster to become ready (request params: "wait_for_status=green&timeout=1s" )
Cluster is not yet ready (request params: "wait_for_status=green&timeout=1s" )
The /mnt/data path is chowned to 1000:1000,
and in the case of elasticsearch only, without filebeat, rebooting causes no problem.
I can't figure this out at all. :(
What am I missing?
pv.yaml
kind: PersistentVolume
apiVersion: v1
metadata:
  name: elastic-pv
  labels:
    type: local
    app: elastic
spec:
  storageClassName: local-storage
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  claimRef:
    namespace: default
    name: elasticsearch-master-elasticsearch-master-0
  hostPath:
    path: "/mnt/data"
values.yaml
---
clusterName: "elasticsearch"
nodeGroup: "master"
# The service that non master groups will try to connect to when joining the cluster
# This should be set to clusterName + "-" + nodeGroup for your master group
masterService: ""
# Elasticsearch roles that will be applied to this nodeGroup
# These will be set as environment variables. E.g. node.master=true
roles:
master: "true"
ingest: "true"
data: "true"
replicas: 1
minimumMasterNodes: 1
esMajorVersion: ""
# Allows you to add any config files in /usr/share/elasticsearch/config/
# such as elasticsearch.yml and log4j2.properties
esConfig: {}
# elasticsearch.yml: |
# key:
# nestedkey: value
# log4j2.properties: |
# key = value
# Extra environment variables to append to this nodeGroup
# This will be appended to the current 'env:' key. You can use any of the kubernetes env
# syntax here
extraEnvs: []
# - name: MY_ENVIRONMENT_VAR
# value: the_value_goes_here
# Allows you to load environment variables from kubernetes secret or config map
envFrom: []
# - secretRef:
# name: env-secret
# - configMapRef:
# name: config-map
# A list of secrets and their paths to mount inside the pod
# This is useful for mounting certificates for security and for mounting
# the X-Pack license
secretMounts: []
# - name: elastic-certificates
# secretName: elastic-certificates
# path: /usr/share/elasticsearch/config/certs
# defaultMode: 0755
image: "docker.elastic.co/elasticsearch/elasticsearch"
imageTag: "7.9.3"
imagePullPolicy: "IfNotPresent"
podAnnotations: {}
# iam.amazonaws.com/role: es-cluster
# additionals labels
labels: {}
esJavaOpts: "-Xmx1g -Xms1g"
resources:
requests:
cpu: "500m"
memory: "1Gi"
limits:
cpu: "1000m"
memory: "2Gi"
initResources: {}
# limits:
# cpu: "25m"
# # memory: "128Mi"
# requests:
# cpu: "25m"
# memory: "128Mi"
sidecarResources: {}
# limits:
# cpu: "25m"
# # memory: "128Mi"
# requests:
# cpu: "25m"
# memory: "128Mi"
networkHost: "0.0.0.0"
volumeClaimTemplate:
accessModes: [ "ReadWriteOnce" ]
storageClassName: local-storage
resources:
requests:
storage: 5Gi
rbac:
create: false
serviceAccountAnnotations: {}
serviceAccountName: ""
podSecurityPolicy:
create: false
name: ""
spec:
privileged: true
fsGroup:
rule: RunAsAny
runAsUser:
rule: RunAsAny
seLinux:
rule: RunAsAny
supplementalGroups:
rule: RunAsAny
volumes:
- secret
- configMap
- persistentVolumeClaim
persistence:
enabled: true
name: elastic-vc
labels:
# Add default labels for the volumeClaimTemplate fo the StatefulSet
app: elastic
annotations: {}
extraVolumes: []
# - name: extras
# emptyDir: {}
extraVolumeMounts: []
# - name: extras
# mountPath: /usr/share/extras
# readOnly: true
extraContainers: []
# - name: do-something
# image: busybox
# command: ['do', 'something']
extraInitContainers: []
# - name: do-something
# image: busybox
# command: ['do', 'something']
# This is the PriorityClass settings as defined in
# https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass
priorityClassName: ""
# By default this will make sure two pods don't end up on the same node
# Changing this to a region would allow you to spread pods across regions
antiAffinityTopologyKey: "kubernetes.io/hostname"
# Hard means that by default pods will only be scheduled if there are enough nodes for them
# and that they will never end up on the same node. Setting this to soft will do this "best effort"
antiAffinity: "hard"
# This is the node affinity settings as defined in
# https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#node-affinity-beta-feature
nodeAffinity: {}
# The default is to deploy all pods serially. By setting this to parallel all pods are started at
# the same time when bootstrapping the cluster
podManagementPolicy: "Parallel"
# The environment variables injected by service links are not used, but can lead to slow Elasticsearch boot times when
# there are many services in the current namespace.
# If you experience slow pod startups you probably want to set this to `false`.
enableServiceLinks: true
protocol: http
httpPort: 9200
transportPort: 9300
service:
labels: {}
labelsHeadless: {}
type: ClusterIP
nodePort: ""
annotations: {}
httpPortName: http
transportPortName: transport
loadBalancerIP: ""
loadBalancerSourceRanges: []
externalTrafficPolicy: ""
updateStrategy: RollingUpdate
# This is the max unavailable setting for the pod disruption budget
# The default value of 1 will make sure that kubernetes won't allow more than 1
# of your pods to be unavailable during maintenance
maxUnavailable: 1
podSecurityContext:
fsGroup: 1000
runAsUser: 1000
securityContext:
capabilities:
drop:
- ALL
#readOnlyRootFilesystem: false
runAsNonRoot: true
runAsUser: 1000
# How long to wait for elasticsearch to stop gracefully
terminationGracePeriod: 120
sysctlVmMaxMapCount: 262144
readinessProbe:
failureThreshold: 3
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 3
timeoutSeconds: 5
# https://www.elastic.co/guide/en/elasticsearch/reference/7.9/cluster-health.html#request-params wait_for_status
clusterHealthCheckParams: "wait_for_status=green&timeout=1s"
## Use an alternate scheduler.
## ref: https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/
##
schedulerName: ""
imagePullSecrets: []
nodeSelector: {}
tolerations: []
# - effect: NoSchedule
# key: node-role.kubernetes.io/master
# Enabling this will publically expose your Elasticsearch instance.
# Only enable this if you have security enabled on your cluster
ingress:
enabled: false
annotations: {}
# kubernetes.io/ingress.class: nginx
# kubernetes.io/tls-acme: "true"
path: /
hosts:
- chart-example.local
tls: []
# - secretName: chart-example-tls
# hosts:
# - chart-example.local
nameOverride: ""
fullnameOverride: ""
# https://github.com/elastic/helm-charts/issues/63
masterTerminationFix: false
lifecycle: {}
# preStop:
# exec:
# command: ["/bin/sh", "-c", "echo Hello from the postStart handler > /usr/share/message"]
# postStart:
# exec:
# command:
# - bash
# - -c
# - |
# #!/bin/bash
# # Add a template to adjust number of shards/replicas
# TEMPLATE_NAME=my_template
# INDEX_PATTERN="logstash-*"
# SHARD_COUNT=8
# REPLICA_COUNT=1
# ES_URL=http://localhost:9200
# while [[ "$(curl -s -o /dev/null -w '%{http_code}\n' $ES_URL)" != "200" ]]; do sleep 1; done
# curl -XPUT "$ES_URL/_template/$TEMPLATE_NAME" -H 'Content-Type: application/json' -d'{"index_patterns":['\""$INDEX_PATTERN"\"'],"settings":{"number_of_shards":'$SHARD_COUNT',"number_of_replicas":'$REPLICA_COUNT'}}'
sysctlInitContainer:
enabled: true
keystore: []
# Deprecated
# please use the above podSecurityContext.fsGroup instead
fsGroup: ""
Issue
There is an issue with the elasticsearch readiness probe when running a single-replica cluster.
Warning Unhealthy 91s (x14 over 3m42s) kubelet Readiness probe failed: Waiting for elasticsearch cluster to become ready (request params: "wait_for_status=green&timeout=1s" )
Cluster is not yet ready (request params: "wait_for_status=green&timeout=1s" )
Solution
As mentioned here by @adinhodovic:
If you're running a single-replica cluster, add the following Helm value:
clusterHealthCheckParams: "wait_for_status=yellow&timeout=1s"
Your status will never go green with a single-replica cluster.
The following values should work:
replicas: 1
minimumMasterNodes: 1
clusterHealthCheckParams: 'wait_for_status=yellow&timeout=1s'
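Applying that to an existing release is just a Helm upgrade with the overridden values. A minimal sketch, assuming the release is named elasticsearch as in the question and the overrides live in a hypothetical values-single-node.yaml:
# values-single-node.yaml contains the three values above
helm upgrade elasticsearch elastic/elasticsearch -f values-single-node.yaml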

fluent-bit cannot parse kubernetes logs

I would like to forward Kubernetes logs from fluent-bit to Elasticsearch through fluentd, but fluent-bit cannot parse the Kubernetes logs properly. To install fluent-bit and fluentd, I use Helm charts. I tried both stable/fluentbit and fluent/fluentbit and faced the same problem:
#0 dump an error event: error_class=Fluent::Plugin::ElasticsearchErrorHandler::ElasticsearchError error="400 - Rejected by Elasticsearch [error type]: mapper_parsing_exception [reason]: 'Could not dynamically add mapping for field [app.kubernetes.io/component]. Existing mapping for [kubernetes.labels.app] must be of type object but found [text].'"
I put the following lines into the fluent-bit values file, as shown here:
remapMetadataKeysFilter:
  enabled: true
  match: kube.*
  ## List of the respective patterns and replacements for metadata keys replacements
  ## Pattern must satisfy the Lua spec (see https://www.lua.org/pil/20.2.html)
  ## Replacement is a plain symbol to replace with
  replaceMap:
    - pattern: "[/.]"
      replacement: "_"
...nothing changed; the same errors are listed.
Is there a workaround to get rid of that bug?
my values.yaml is here:
# Default values for fluent-bit.
# kind -- DaemonSet or Deployment
kind: DaemonSet
# replicaCount -- Only applicable if kind=Deployment
replicaCount: 1
image:
repository: fluent/fluent-bit
pullPolicy: Always
# tag:
imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""
serviceAccount:
create: true
annotations: {}
name:
rbac:
create: true
podSecurityPolicy:
create: false
podSecurityContext:
{}
# fsGroup: 2000
securityContext:
{}
# capabilities:
# drop:
# - ALL
# readOnlyRootFilesystem: true
# runAsNonRoot: true
# runAsUser: 1000
service:
type: ClusterIP
port: 2020
annotations:
prometheus.io/path: "/api/v1/metrics/prometheus"
prometheus.io/port: "2020"
prometheus.io/scrape: "true"
serviceMonitor:
enabled: true
namespace: monitoring
interval: 10s
scrapeTimeout: 10s
# selector:
# prometheus: my-prometheus
resources:
{}
# limits:
# cpu: 100m
# memory: 128Mi
# requests:
# cpu: 100m
# memory: 128Mi
nodeSelector: {}
tolerations: []
affinity: {}
podAnnotations: {}
priorityClassName: ""
env: []
envFrom: []
extraPorts: []
# - port: 5170
# containerPort: 5170
# protocol: TCP
# name: tcp
extraVolumes: []
extraVolumeMounts: []
## https://docs.fluentbit.io/manual/administration/configuring-fluent-bit
config:
## https://docs.fluentbit.io/manual/service
service: |
[SERVICE]
Flush 1
Daemon Off
Log_Level info
Parsers_File parsers.conf
Parsers_File custom_parsers.conf
HTTP_Server On
HTTP_Listen 0.0.0.0
HTTP_Port 2020
## https://docs.fluentbit.io/manual/pipeline/inputs
inputs: |
[INPUT]
Name tail
Path /var/log/containers/*.log
Parser docker
Tag kube.*
Mem_Buf_Limit 5MB
Skip_Long_Lines On
[INPUT]
Name systemd
Tag host.*
Systemd_Filter _SYSTEMD_UNIT=kubelet.service
Read_From_Tail On
## https://docs.fluentbit.io/manual/pipeline/filters
filters: |
[FILTER]
Name kubernetes
Match kube.*
Kube_URL https://kubernetes.default.svc:443
Kube_CA_File /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
Kube_Token_File /var/run/secrets/kubernetes.io/serviceaccount/token
Kube_Tag_Prefix kube.var.log.containers.
Merge_Log On
Merge_Log_Key log_processed
K8S-Logging.Parser On
K8S-Logging.Exclude Off
[FILTER]
Name lua
Match kube.*
script /fluent-bit/etc/functions.lua
call dedot
## https://docs.fluentbit.io/manual/pipeline/outputs
outputs: |
[OUTPUT]
Name forward
Match *
Host fluentd-in-forward.elastic-system.svc.cluster.local
Port 24224
tls off
tls.verify off
## https://docs.fluentbit.io/manual/pipeline/parsers
customParsers: |
[PARSER]
Name docker_no_time
Format json
Time_Keep Off
Time_Key time
Time_Format %Y-%m-%dT%H:%M:%S.%L
I had the same issue, which is caused by multiple labels being converted into JSON. I renamed the conflicting keys to match the newer format of the recommended labels:
<filter **>
  @type rename_key
  rename_rule1 ^app$ app.kubernetes.io/name
  rename_rule2 ^chart$ helm.sh/chart
  rename_rule3 ^version$ app.kubernetes.io/version
  rename_rule4 ^component$ app.kubernetes.io/component
  rename_rule5 ^istio$ istio.io/name
</filter>
I think that your problem isn't in Kubernetes or in the fluent-bit/fluentd charts; your problem is in Elasticsearch, specifically in the mapping.
In Elasticsearch version 7.x you can't have different types (string, int, etc.) for the same field.
To solve this issue, I use "ignore_malformed": true in the index template that I use for the Kubernetes logs.
https://www.elastic.co/guide/en/elasticsearch/reference/current/ignore-malformed.html
The malformed field is not indexed, but other fields in the document are processed normally.
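A minimal sketch of that approach, using the same legacy _template API that appears elsewhere on this page; the template name, index pattern, and Elasticsearch URL are placeholders:
curl -XPUT "http://elasticsearch-master:9200/_template/kubernetes_logs" \
  -H 'Content-Type: application/json' -d'
{
  "index_patterns": ["logstash-*"],
  "settings": {
    "index.mapping.ignore_malformed": true
  }
}'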

StatefulSet FailedCreate ElasticSearch

When attempting to install Elasticsearch for Kubernetes on a PKS instance, I am running into an issue: after running kubectl get events --all-namespaces I see create Pod logging-es-default-0 in StatefulSet logging-es-default failed error: pods "logging-es-default-0" is forbidden: SecurityContext.RunAsUser is forbidden. Does this have something to do with a pod security policy? Is there any way to deploy Elasticsearch to Kubernetes if privileged containers are not allowed?
Edit: here is the values.yml file that I am passing into the elasticsearch helm chart.
---
clusterName: "elasticsearch"
nodeGroup: "master"
# The service that non master groups will try to connect to when joining the cluster
# This should be set to clusterName + "-" + nodeGroup for your master group
masterService: ""
# Elasticsearch roles that will be applied to this nodeGroup
# These will be set as environment variables. E.g. node.master=true
roles:
master: "true"
ingest: "true"
data: "true"
replicas: 3
minimumMasterNodes: 2
esMajorVersion: ""
# Allows you to add any config files in /usr/share/elasticsearch/config/
# such as elasticsearch.yml and log4j2.properties
esConfig: {}
# elasticsearch.yml: |
# key:
# nestedkey: value
# log4j2.properties: |
# key = value
# Extra environment variables to append to this nodeGroup
# This will be appended to the current 'env:' key. You can use any of the kubernetes env
# syntax here
extraEnvs: []
# - name: MY_ENVIRONMENT_VAR
# value: the_value_goes_here
# A list of secrets and their paths to mount inside the pod
# This is useful for mounting certificates for security and for mounting
# the X-Pack license
secretMounts: []
# - name: elastic-certificates
# secretName: elastic-certificates
# path: /usr/share/elasticsearch/config/certs
image: "docker.elastic.co/elasticsearch/elasticsearch"
imageTag: "7.4.1"
imagePullPolicy: "IfNotPresent"
podAnnotations: {}
# iam.amazonaws.com/role: es-cluster
# additionals labels
labels: {}
esJavaOpts: "-Xmx1g -Xms1g"
resources:
requests:
cpu: "100m"
memory: "2Gi"
limits:
cpu: "1000m"
memory: "2Gi"
initResources: {}
# limits:
# cpu: "25m"
# # memory: "128Mi"
# requests:
# cpu: "25m"
# memory: "128Mi"
sidecarResources: {}
# limits:
# cpu: "25m"
# # memory: "128Mi"
# requests:
# cpu: "25m"
# memory: "128Mi"
networkHost: "0.0.0.0"
volumeClaimTemplate:
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 30Gi
rbac:
create: false
serviceAccountName: ""
podSecurityPolicy:
create: false
name: ""
spec:
privileged: false
fsGroup:
rule: RunAsAny
runAsUser:
rule: RunAsAny
seLinux:
rule: RunAsAny
supplementalGroups:
rule: RunAsAny
volumes:
- secret
- configMap
- persistentVolumeClaim
persistence:
enabled: true
annotations: {}
extraVolumes: ""
# - name: extras
# emptyDir: {}
extraVolumeMounts: ""
# - name: extras
# mountPath: /usr/share/extras
# readOnly: true
extraInitContainers: ""
# - name: do-something
# image: busybox
# command: ['do', 'something']
# This is the PriorityClass settings as defined in
# https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass
priorityClassName: ""
# By default this will make sure two pods don't end up on the same node
# Changing this to a region would allow you to spread pods across regions
antiAffinityTopologyKey: "kubernetes.io/hostname"
# Hard means that by default pods will only be scheduled if there are enough nodes for them
# and that they will never end up on the same node. Setting this to soft will do this "best effort"
antiAffinity: "hard"
# This is the node affinity settings as defined in
# https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#node-affinity-beta-feature
nodeAffinity: {}
# The default is to deploy all pods serially. By setting this to parallel all pods are started at
# the same time when bootstrapping the cluster
podManagementPolicy: "Parallel"
protocol: http
httpPort: 9200
transportPort: 9300
service:
labels: {}
labelsHeadless: {}
type: ClusterIP
nodePort: ""
annotations: {}
httpPortName: http
transportPortName: transport
updateStrategy: RollingUpdate
# This is the max unavailable setting for the pod disruption budget
# The default value of 1 will make sure that kubernetes won't allow more than 1
# of your pods to be unavailable during maintenance
maxUnavailable: 1
podSecurityContext:
fsGroup: null
runAsUser: null
# The following value is deprecated,
# please use the above podSecurityContext.fsGroup instead
fsGroup: ""
securityContext:
capabilities: null
# readOnlyRootFilesystem: true
runAsNonRoot: null
runAsUser: null
# How long to wait for elasticsearch to stop gracefully
terminationGracePeriod: 120
sysctlVmMaxMapCount: 262144
readinessProbe:
failureThreshold: 3
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 3
timeoutSeconds: 5
# https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-health.html#request-params wait_for_status
clusterHealthCheckParams: "wait_for_status=green&timeout=1s"
## Use an alternate scheduler.
## ref: https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/
##
schedulerName: ""
imagePullSecrets: []
nodeSelector: {}
tolerations: []
# Enabling this will publically expose your Elasticsearch instance.
# Only enable this if you have security enabled on your cluster
ingress:
enabled: false
annotations: {}
# kubernetes.io/ingress.class: nginx
# kubernetes.io/tls-acme: "true"
path: /
hosts:
- chart-example.local
tls: []
# - secretName: chart-example-tls
# hosts:
# - chart-example.local
nameOverride: ""
fullnameOverride: ""
# https://github.com/elastic/helm-charts/issues/63
masterTerminationFix: false
lifecycle: {}
# preStop:
# exec:
# command: ["/bin/sh", "-c", "echo Hello from the postStart handler > /usr/share/message"]
# postStart:
# exec:
# command: ["/bin/sh", "-c", "echo Hello from the postStart handler > /usr/share/message"]
sysctlInitContainer:
enabled: false
keystore: []
The values listed above produce the following error:
create Pod elasticsearch-master-0 in StatefulSet elasticsearch-master failed error: pods "elasticsearch-master-0" is forbidden: SecurityContext.RunAsUser is forbidden
Solved: I learned that my Istio deployment was causing issues when attempting to deploy any other service into my cluster. I had wrongly assumed that Istio, together with my cluster security policies, was not the cause of my issue.
is forbidden: SecurityContext.RunAsUser is forbidden. Does this have something to do with a pod security policy?
Yes, that's exactly what it has to do with.
Evidently the StatefulSet includes a securityContext: stanza, but your cluster administrator forbids such an action.
Is there any way to be able to deploy ElasticSearch to Kubernetes if privileged containers are not allowed?
That's not exactly what's going on here: it's not the "privileged" part that is causing you problems, it's the PodSpec requesting to run the container as a user other than the one in the Docker image. In fact, I would be very surprised if any modern Elasticsearch Docker image requires modifying the user at all, since all the recent ones do not run as root to begin with.
Remove that securityContext: stanza from the StatefulSet and report back what new errors arise (if any).
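If it helps to confirm what the cluster actually forbids before changing any chart values, a couple of read-only checks can narrow it down. This is only a sketch and assumes PodSecurityPolicies are in use (as was common on PKS) and that the StatefulSet is named elasticsearch-master:
# List the PodSecurityPolicies and look at their runAsUser rules
kubectl get psp
kubectl describe psp <policy-name>
# Show what securityContext the chart actually rendered into the StatefulSet
kubectl get statefulset elasticsearch-master \
  -o jsonpath='{.spec.template.spec.securityContext}'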
