fluent-bit cannot parse kubernetes logs - elasticsearch

I would like to forward Kubernetes logs from Fluent Bit to Elasticsearch through Fluentd, but Fluent Bit cannot parse the Kubernetes logs properly. I install Fluent Bit and Fluentd with Helm charts. I tried both stable/fluentbit and fluent/fluentbit and ran into the same problem:
#0 dump an error event: error_class=Fluent::Plugin::ElasticsearchErrorHandler::ElasticsearchError error="400 - Rejected by Elasticsearch [error type]: mapper_parsing_exception [reason]: 'Could not dynamically add mapping for field [app.kubernetes.io/component]. Existing mapping for [kubernetes.labels.app] must be of type object but found [text].'"
I put the following lines into the fluent-bit values file, as shown here:
remapMetadataKeysFilter:
  enabled: true
  match: kube.*
  ## List of the respective patterns and replacements for metadata keys replacements
  ## Pattern must satisfy the Lua spec (see https://www.lua.org/pil/20.2.html)
  ## Replacement is a plain symbol to replace with
  replaceMap:
    - pattern: "[/.]"
      replacement: "_"
...but nothing changed; the same errors keep appearing.
Is there a workaround to get rid of that bug?
My values.yaml is here:
# Default values for fluent-bit.
# kind -- DaemonSet or Deployment
kind: DaemonSet
# replicaCount -- Only applicable if kind=Deployment
replicaCount: 1
image:
repository: fluent/fluent-bit
pullPolicy: Always
# tag:
imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""
serviceAccount:
create: true
annotations: {}
name:
rbac:
create: true
podSecurityPolicy:
create: false
podSecurityContext:
{}
# fsGroup: 2000
securityContext:
{}
# capabilities:
# drop:
# - ALL
# readOnlyRootFilesystem: true
# runAsNonRoot: true
# runAsUser: 1000
service:
type: ClusterIP
port: 2020
annotations:
prometheus.io/path: "/api/v1/metrics/prometheus"
prometheus.io/port: "2020"
prometheus.io/scrape: "true"
serviceMonitor:
enabled: true
namespace: monitoring
interval: 10s
scrapeTimeout: 10s
# selector:
# prometheus: my-prometheus
resources:
{}
# limits:
# cpu: 100m
# memory: 128Mi
# requests:
# cpu: 100m
# memory: 128Mi
nodeSelector: {}
tolerations: []
affinity: {}
podAnnotations: {}
priorityClassName: ""
env: []
envFrom: []
extraPorts: []
# - port: 5170
# containerPort: 5170
# protocol: TCP
# name: tcp
extraVolumes: []
extraVolumeMounts: []
## https://docs.fluentbit.io/manual/administration/configuring-fluent-bit
config:
## https://docs.fluentbit.io/manual/service
service: |
[SERVICE]
Flush 1
Daemon Off
Log_Level info
Parsers_File parsers.conf
Parsers_File custom_parsers.conf
HTTP_Server On
HTTP_Listen 0.0.0.0
HTTP_Port 2020
## https://docs.fluentbit.io/manual/pipeline/inputs
inputs: |
[INPUT]
Name tail
Path /var/log/containers/*.log
Parser docker
Tag kube.*
Mem_Buf_Limit 5MB
Skip_Long_Lines On
[INPUT]
Name systemd
Tag host.*
Systemd_Filter _SYSTEMD_UNIT=kubelet.service
Read_From_Tail On
## https://docs.fluentbit.io/manual/pipeline/filters
filters: |
[FILTER]
Name kubernetes
Match kube.*
Kube_URL https://kubernetes.default.svc:443
Kube_CA_File /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
Kube_Token_File /var/run/secrets/kubernetes.io/serviceaccount/token
Kube_Tag_Prefix kube.var.log.containers.
Merge_Log On
Merge_Log_Key log_processed
K8S-Logging.Parser On
K8S-Logging.Exclude Off
[FILTER]
Name lua
Match kube.*
script /fluent-bit/etc/functions.lua
call dedot
## https://docs.fluentbit.io/manual/pipeline/outputs
outputs: |
[OUTPUT]
Name forward
Match *
Host fluentd-in-forward.elastic-system.svc.cluster.local
Port 24224
tls off
tls.verify off
## https://docs.fluentbit.io/manual/pipeline/parsers
customParsers: |
[PARSER]
Name docker_no_time
Format json
Time_Keep Off
Time_Key time
Time_Format %Y-%m-%dT%H:%M:%S.%L
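For reference, the dedot function that the Lua filter above calls (from the chart's /fluent-bit/etc/functions.lua) replaces dots and slashes in the kubernetes label keys with underscores. A minimal sketch of such a function (the chart's actual functions.lua may differ):
-- sketch of a dedot function for the fluent-bit Lua filter
function dedot(tag, timestamp, record)
    -- only touch records that carry kubernetes label metadata
    if record["kubernetes"] == nil or record["kubernetes"]["labels"] == nil then
        return 0, timestamp, record
    end
    local new_labels = {}
    for k, v in pairs(record["kubernetes"]["labels"]) do
        -- e.g. "app.kubernetes.io/component" becomes "app_kubernetes_io_component"
        new_labels[string.gsub(k, "[./]", "_")] = v
    end
    record["kubernetes"]["labels"] = new_labels
    -- return code 1 = record and timestamp passed back (record modified)
    return 1, timestamp, record
end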

I had the same issue; it is caused by multiple labels being converted into JSON.
I renamed the conflicting keys to match the newer format of the recommended labels:
<filter **>
  @type rename_key
  rename_rule1 ^app$ app.kubernetes.io/name
  rename_rule2 ^chart$ helm.sh/chart
  rename_rule3 ^version$ app.kubernetes.io/version
  rename_rule4 ^component$ app.kubernetes.io/component
  rename_rule5 ^istio$ istio.io/name
</filter>
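Note that rename_key is not a core Fluentd filter; this assumes the fluent-plugin-rename-key plugin is present in the Fluentd image. A sketch of a custom image that adds it (the base image tag is just an example):
# Dockerfile sketch: Fluentd image with the rename_key filter plugin installed
FROM fluent/fluentd:v1.14-1
USER root
RUN gem install fluent-plugin-rename-key --no-document
USER fluent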

I think your problem isn't in Kubernetes and isn't in the fluent-bit/fluentd charts; your problem is in Elasticsearch, specifically in the mapping.
In Elasticsearch 7.x you can't have different types (string, int, etc.) for the same field.
To solve this issue, I use "ignore_malformed": true in the index template that I use for the Kubernetes logs.
https://www.elastic.co/guide/en/elasticsearch/reference/current/ignore-malformed.html
The malformed field is not indexed, but other fields in the document are processed normally.
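For example, a minimal legacy index template that applies this setting to the Kubernetes log indices could look like the following sketch (the template name, index pattern, and host are placeholders; adjust them to your setup):
curl -XPUT "http://elasticsearch-master:9200/_template/kubernetes-logs" -H 'Content-Type: application/json' -d'
{
  "index_patterns": ["logstash-*"],
  "settings": {
    "index.mapping.ignore_malformed": true
  }
}'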

Related

Trying to verify ELK installation, Kibana dashboard not showing Filebeat logs in Discover tab

I used Helm to deploy the ELK stack on Kubernetes.
I ran the following commands:
minikube start --cpus 4 --memory 8192
minikube addons enable ingress
helm repo add elastic https://helm.elastic.co
helm repo update
Then I deployed Elasticsearch with values-02.yml:
replicas: 1
minimumMasterNodes: 1
ingress:
enabled: true
hosts:
- host: es-elk.s9.devopscloud.link #Change the hostname to the one you need
paths:
- path: /
volumeClaimTemplate:
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 1Gi
Applied it
helm install elk-elasticsearch elastic/elasticsearch -f values-02.yml
Then I deployed Kibana with values-03.yml:
elasticsearchHosts: "http://elasticsearch-master:9200"
ingress:
enabled: true
className: "nginx"
hosts:
- host:
paths:
- path: /
Applied it
helm install elk-kibana elastic/kibana -f values-03.yml
Then I deployed Logstash with values-04.yaml:
persistence:
enabled: true
logstashConfig:
logstash.yml: |
http.host: 0.0.0.0
xpack.monitoring.enabled: false
logstashPipeline:
logstash.conf: |
input {
beats {
port => 5044
}
}
output {
elasticsearch {
hosts => "http://elasticsearch-master.logging.svc.cluster.local:9200"
manage_template => false
index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
document_type => "%{[@metadata][type]}"
}
}
service:
type: ClusterIP
ports:
- name: beats
port: 5044
protocol: TCP
targetPort: 5044
- name: http
port: 8080
protocol: TCP
targetPort: 8080
Applied it
helm install elk-logstash elastic/logstash -f values-04.yaml
Then I deployed Filebeat with values-05.yaml:
daemonset:
filebeatConfig:
filebeat.yml: |
filebeat.inputs:
- type: container
paths:
- /var/log/containers/*.log
processors:
- add_kubernetes_metadata:
host: ${NODE_NAME}
matchers:
- logs_path:
logs_path: "/var/log/containers/"
output.logstash:
hosts: ["elk-logstash-logstash:5044"]
Then applied it
helm install elk-filebeat elastic/filebeat -f values-05.yaml
Everything is up and running:
kubectl get pods
NAME READY STATUS RESTARTS AGE
elasticsearch-master-0 1/1 Running 0 61m
elk-filebeat-filebeat-ggjhc 1/1 Running 0 45m
elk-kibana-kibana-6d658894bf-grb8x 1/1 Running 0 52m
elk-logstash-logstash-0 1/1 Running 0 47m
But when I go to the Discover page
http://172.21.95.140/app/management/kibana/indexPatterns?bannerMessage=To%20visualize%20and%20explore%20data%20in%20Kibana,%20you%20must%20create%20an%20index%20pattern%20to%20retrieve%20data%20from%20Elasticsearch.
it does not show anything for Filebeat.
Instead I get a "Ready to try Kibana? First, you need data" message.
I was following this tutorial:
https://blog.knoldus.com/how-to-deploy-elk-stack-on-kubernetes/#deploy-elastic-search
and ran the default Kibana and Filebeat YAML files.
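As a first check, it may help to confirm whether any Filebeat/Logstash indices were created in Elasticsearch at all before expecting them in Kibana. A quick sketch (the service name matches the elasticsearchHosts value above):
kubectl port-forward svc/elasticsearch-master 9200:9200 &
curl -s "http://localhost:9200/_cat/indices?v"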

Expose Elasticsearch externally with Istio ingress controller

My Kubernetes cluster is an Azure Kubernetes Service one with Istio as the ingress controller. I have a working Elasticsearch instance that I'm having trouble exposing.
I'm deploying the Elasticsearch instance with Elastic's Helm chart for version 7.17 and the following values.yml:
---
clusterName: "elasticsearch"
nodeGroup: "master"
# The service that non master groups will try to connect to when joining the cluster
# This should be set to clusterName + "-" + nodeGroup for your master group
masterService: ""
# Elasticsearch roles that will be applied to this nodeGroup
# These will be set as environment variables. E.g. node.master=true
roles:
master: "true"
ingest: "true"
data: "true"
remote_cluster_client: "true"
ml: "true"
replicas: 1
minimumMasterNodes: 1
esMajorVersion: ""
clusterDeprecationIndexing: "false"
# Allows you to add any config files in /usr/share/elasticsearch/config/
# such as elasticsearch.yml and log4j2.properties
esConfig:
elasticsearch.yml: |
xpack.security.enabled: false
esJvmOptions: {}
# processors.options: |
# -XX:ActiveProcessorCount=3
# Allows you to load environment variables from kubernetes secret or config map
envFrom: []
# - secretRef:
# name: env-secret
# - configMapRef:
# name: config-map
hostAliases: []
#- ip: "127.0.0.1"
# hostnames:
# - "foo.local"
# - "bar.local"
image: "docker.elastic.co/elasticsearch/elasticsearch"
imageTag: "7.17.3"
imagePullPolicy: "IfNotPresent"
podAnnotations:
{}
# iam.amazonaws.com/role: es-cluster
# additionals labels
labels: {}
esJavaOpts: "" # example: "-Xmx1g -Xms1g"
resources:
requests:
cpu: "2500m"
memory: "2Gi"
limits:
cpu: "2500m"
memory: "5Gi"
initResources:
{}
# limits:
# cpu: "25m"
# # memory: "128Mi"
# requests:
# cpu: "25m"
# memory: "128Mi"
networkHost: "0.0.0.0"
volumeClaimTemplate:
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 5Gi
rbac:
create: false
serviceAccountAnnotations: {}
serviceAccountName: ""
automountToken: true
podSecurityPolicy:
create: false
name: ""
spec:
privileged: true
fsGroup:
rule: RunAsAny
runAsUser:
rule: RunAsAny
seLinux:
rule: RunAsAny
supplementalGroups:
rule: RunAsAny
volumes:
- secret
- configMap
- persistentVolumeClaim
- emptyDir
persistence:
enabled: true
labels:
# Add default labels for the volumeClaimTemplate of the StatefulSet
enabled: false
annotations: {}
extraVolumes:
[]
# - name: extras
# emptyDir: {}
extraVolumeMounts:
[]
# - name: extras
# mountPath: /usr/share/extras
# readOnly: true
extraContainers:
[]
# - name: do-something
# image: busybox
# command: ['do', 'something']
extraInitContainers:
[]
# - name: do-something
# image: busybox
# command: ['do', 'something']
# This is the PriorityClass settings as defined in
# https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass
priorityClassName: ""
# By default this will make sure two pods don't end up on the same node
# Changing this to a region would allow you to spread pods across regions
antiAffinityTopologyKey: "kubernetes.io/hostname"
# Hard means that by default pods will only be scheduled if there are enough nodes for them
# and that they will never end up on the same node. Setting this to soft will do this "best effort"
antiAffinity: "hard"
# This is the node affinity settings as defined in
# https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#node-affinity-beta-feature
nodeAffinity: {}
# The default is to deploy all pods serially. By setting this to parallel all pods are started at
# the same time when bootstrapping the cluster
podManagementPolicy: "Parallel"
# The environment variables injected by service links are not used, but can lead to slow Elasticsearch boot times when
# there are many services in the current namespace.
# If you experience slow pod startups you probably want to set this to `false`.
enableServiceLinks: true
protocol: http
httpPort: 9200
transportPort: 9300
service:
enabled: true
labels: {}
labelsHeadless: {}
type: ClusterIP
# Consider that all endpoints are considered "ready" even if the Pods themselves are not
# https://kubernetes.io/docs/reference/kubernetes-api/service-resources/service-v1/#ServiceSpec
publishNotReadyAddresses: false
nodePort: ""
annotations: {}
httpPortName: http
transportPortName: transport
loadBalancerIP: ""
loadBalancerSourceRanges: []
externalTrafficPolicy: ""
updateStrategy: RollingUpdate
# This is the max unavailable setting for the pod disruption budget
# The default value of 1 will make sure that kubernetes won't allow more than 1
# of your pods to be unavailable during maintenance
maxUnavailable: 1
podSecurityContext:
fsGroup: 1000
runAsUser: 1000
securityContext:
capabilities:
drop:
- ALL
# readOnlyRootFilesystem: true
runAsNonRoot: true
runAsUser: 1000
# How long to wait for elasticsearch to stop gracefully
terminationGracePeriod: 120
sysctlVmMaxMapCount: 262144
readinessProbe:
failureThreshold: 3
initialDelaySeconds: 200
periodSeconds: 10
successThreshold: 3
timeoutSeconds: 5
# https://www.elastic.co/guide/en/elasticsearch/reference/7.17/cluster-health.html#request-params wait_for_status
clusterHealthCheckParams: "wait_for_status=green&timeout=1s"
## Use an alternate scheduler.
## ref: https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/
##
schedulerName: ""
imagePullSecrets: []
nodeSelector: {}
tolerations:
- key: "kubernetes.azure.com/scalesetpriority"
operator: "Equal"
value: "spot"
effect: "NoSchedule"
# Enabling this will publicly expose your Elasticsearch instance.
# Only enable this if you have security enabled on your cluster
ingress:
enabled: true
annotations:
kubernetes.io/ingress.class: istio
# kubernetes.io/tls-acme: "true"
className: "istio"
pathtype: ImplementationSpecific
hosts:
- host: "internal.company.domain.com"
paths:
- path: /elastic
tls: []
# - secretName: chart-example-tls
# hosts:
# - chart-example.local
nameOverride: ""
fullnameOverride: ""
healthNameOverride: ""
lifecycle:
{}
# preStop:
# exec:
# command: ["/bin/sh", "-c", "echo Hello from the postStart handler > /usr/share/message"]
# postStart:
# exec:
# command:
# - bash
# - -c
# - |
# #!/bin/bash
# # Add a template to adjust number of shards/replicas
# TEMPLATE_NAME=my_template
# INDEX_PATTERN="logstash-*"
# SHARD_COUNT=8
# REPLICA_COUNT=1
# ES_URL=http://localhost:9200
# while [[ "$(curl -s -o /dev/null -w '%{http_code}\n' $ES_URL)" != "200" ]]; do sleep 1; done
# curl -XPUT "$ES_URL/_template/$TEMPLATE_NAME" -H 'Content-Type: application/json' -d'{"index_patterns":['\""$INDEX_PATTERN"\"'],"settings":{"number_of_shards":'$SHARD_COUNT',"number_of_replicas":'$REPLICA_COUNT'}}'
sysctlInitContainer:
enabled: true
keystore: []
networkPolicy:
## Enable creation of NetworkPolicy resources. Only Ingress traffic is filtered for now.
## In order for a Pod to access Elasticsearch, it needs to have the following label:
## {{ template "uname" . }}-client: "true"
## Example for default configuration to access HTTP port:
## elasticsearch-master-http-client: "true"
## Example for default configuration to access transport port:
## elasticsearch-master-transport-client: "true"
http:
enabled: false
## if explicitNamespacesSelector is not set or set to {}, only client Pods being in the networkPolicy's namespace
## and matching all criteria can reach the DB.
## But sometimes, we want the Pods to be accessible to clients from other namespaces, in this case, we can use this
## parameter to select these namespaces
##
# explicitNamespacesSelector:
# # Accept from namespaces with all those different rules (only from whitelisted Pods)
# matchLabels:
# role: frontend
# matchExpressions:
# - {key: role, operator: In, values: [frontend]}
## Additional NetworkPolicy Ingress "from" rules to set. Note that all rules are OR-ed.
##
# additionalRules:
# - podSelector:
# matchLabels:
# role: frontend
# - podSelector:
# matchExpressions:
# - key: role
# operator: In
# values:
# - frontend
transport:
## Note that all Elasticsearch Pods can talk to themselves using transport port even if enabled.
enabled: false
# explicitNamespacesSelector:
# matchLabels:
# role: frontend
# matchExpressions:
# - {key: role, operator: In, values: [frontend]}
# additionalRules:
# - podSelector:
# matchLabels:
# role: frontend
# - podSelector:
# matchExpressions:
# - key: role
# operator: In
# values:
# - frontend
tests:
enabled: true
# Deprecated
# please use the above podSecurityContext.fsGroup instead
fsGroup: ""
But when I browse to https://internal.company.domain.com/elastic I get nothing, so evidently I'm not doing the ingress part as I should.
If I use Lens to inspect the Pod's logs, the instance is working OK, and I can even use port-forwarding to confirm it's all good with the container.
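To see what the chart actually rendered for the ingress part (and whether Istio picked it up), something along these lines can help; the resource name elasticsearch-master is an assumption based on the clusterName and nodeGroup values above:
kubectl get ingress --all-namespaces
kubectl get ingress elasticsearch-master -o yaml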

Can't communicate with Elasticsearch endpoint from FluentBit

Problem:
The connection to the AWS Elasticsearch endpoint is refused when pushing Kubernetes logs through a Fluent Bit forwarder.
Here is the Fluent Bit setup:
apiVersion: v1
kind: ConfigMap
metadata:
name: fluent-bit-config
namespace: logging
labels:
k8s-app: fluent-bit
data:
# Configuration files: server, input, filters and output
# ======================================================
fluent-bit.conf: |
[SERVICE]
Flush 1
Log_Level info
Daemon off
Parsers_File parsers.conf
HTTP_Server On
HTTP_Listen 0.0.0.0
HTTP_Port 2020
@INCLUDE input-kubernetes.conf
@INCLUDE filter-kubernetes.conf
@INCLUDE output-elasticsearch.conf
input-kubernetes.conf: |
[INPUT]
Name tail
Tag kube.*
Path /var/log/containers/*.log
Parser docker
DB /var/log/flb_kube.db
Mem_Buf_Limit 50MB
Skip_Long_Lines On
Refresh_Interval 10
filter-kubernetes.conf: |
[FILTER]
Name kubernetes
Match kube.*
Kube_URL https://kubernetes.default.svc.cluster.local:443
Merge_Log On
K8S-Logging.Parser On
output-elasticsearch.conf: |
[OUTPUT]
Name es
Match *
Host ${FLUENT_ELASTICSEARCH_HOST}
Port ${FLUENT_ELASTICSEARCH_PORT}
Logstash_Format On
Retry_Limit False
tls Off
tls.verify Off
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: fluent-bit
namespace: logging
labels:
k8s-app: fluent-bit-logging
version: v1
kubernetes.io/cluster-service: "true"
spec:
selector:
matchLabels:
k8s-app: fluent-bit-logging
template:
metadata:
labels:
k8s-app: fluent-bit-logging
version: v1
kubernetes.io/cluster-service: "true"
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "2020"
prometheus.io/path: /api/v1/metrics/prometheus
spec:
containers:
- name: fluent-bit
image: fluent/fluent-bit:1.5
imagePullPolicy: Always
ports:
- containerPort: 2020
env:
- name: FLUENT_ELASTICSEARCH_HOST
value: "https://vpc-tf-test2-xyzxyzxyzxyz.eu-west-2.es.amazonaws.com"
- name: FLUENT_ELASTICSEARCH_PORT
value: "443"
volumeMounts:
- name: varlog
mountPath: /var/log
- name: varlibdockercontainers
mountPath: /var/lib/docker/containers
readOnly: true
- name: fluent-bit-config
mountPath: /fluent-bit/etc/
terminationGracePeriodSeconds: 10
volumes:
- name: varlog
hostPath:
path: /var/log
- name: varlibdockercontainers
hostPath:
path: /var/lib/docker/containers
- name: fluent-bit-config
configMap:
name: fluent-bit-config
serviceAccountName: fluent-bit
tolerations:
- key: node-role.kubernetes.io/master
operator: Exists
effect: NoSchedule
- operator: "Exists"
effect: "NoExecute"
- operator: "Exists"
effect: "NoSchedule"
The Fluent Bit forwarder logs show this:
[2021/02/01 09:09:11] [error] [io] connection #46 failed to: https://vpc-tf-test2-xyzxyzxyz.eu-west-2.es.amazonaws.com:443
[2021/02/01 09:09:11] [ warn] [engine] failed to flush chunk '1-1611849613.623397482.flb', retry in 1521 seconds: task_id=1980, input=tail.0 > output=es.0
[2021/02/01 09:09:11] [ warn] [engine] failed to flush chunk '1-1611849347.548817423.flb', retry in 1806 seconds: task_id=1623, input=tail.0 > output=es.0
[2021/02/01 09:09:11] [ warn] [engine] failed to flush chunk '1-1611849095.485002520.flb', retry in 1286 seconds: task_id=1284, input=tail.0 > output=es.0
[2021/02/01 09:09:13] [ warn] net_tcp_fd_connect: getaddrinfo(host='https://vpc-tf-test2-xyzxyzxyzxyz.eu-west-2.es.amazonaws.com'): Name or service not known
[2021/02/01 09:09:13] [error] [io] connection #46 failed to: https://vpc-tf-test2-xyzxyzxyz.eu-west-2.es.amazonaws.com:443
[2021/02/01 09:09:13] [ warn] [engine] failed to flush chunk '1-1611849450.549250742.flb', retry in 799 seconds: task_id=1766, input=tail.0 > output=es.0
I am trying to trace where the access is getting blocked.
Access to the ES endpoint is protected by a Security Group with these inbound rules:
Type: All traffic
Protocol: All
Port range: All
Source: sg-xyzxyzxyz (eks-cluster-sg-vrs2-eks-dev-xyzxyzyxz)
Change the FLUENT_ELASTICSEARCH_HOST value to vpc-tf-test2-xyzxyzxyzxyz.eu-west-2.es.amazonaws.com, i.e. drop the https:// prefix. The Host field expects a bare hostname, which is why the getaddrinfo lookup in the logs fails.
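In other words, the Host must be a hostname, not a URL, and since the endpoint listens on 443 the es output should most likely have TLS enabled as well. A sketch of the corrected pieces, reusing the names from the manifests above:
env:
  - name: FLUENT_ELASTICSEARCH_HOST
    value: "vpc-tf-test2-xyzxyzxyzxyz.eu-west-2.es.amazonaws.com"
  - name: FLUENT_ELASTICSEARCH_PORT
    value: "443"
[OUTPUT]
    Name            es
    Match           *
    Host            ${FLUENT_ELASTICSEARCH_HOST}
    Port            ${FLUENT_ELASTICSEARCH_PORT}
    Logstash_Format On
    Retry_Limit     False
    tls             On
    tls.verify      On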

K8s Elasticsearch with Filebeat stays 'not ready' after rebooting

I'm going through a situation I can't make sense of.
Environment
Two dedicated nodes with azure centos 8.2 (2vcpu, 16G ram), not AKS
1 master node, 1 worker node.
kubernetes v1.19.3
helm v2.16.12
Helm charts Elastic (https://github.com/elastic/helm-charts/tree/7.9.3)
At first, it works fine with the installation below.
## elasticsearch, filebeat
# kubectl apply -f pv.yaml
# helm install -f values.yaml --name elasticsearch elastic/elasticsearch
# helm install --name filebeat --version 7.9.3 elastic/filebeat
curl <elasticsearch-ip>:9200 and curl <elasticsearch-ip>:9200/_cat/indices show the right values,
but after rebooting the worker node, the pods just stay at READY 0/1 and do not work.
NAME READY STATUS RESTARTS AGE
elasticsearch-master-0 0/1 Running 10 71m
filebeat-filebeat-67qm2 0/1 Running 4 40m
In this situation, after removing /mnt/data/nodes and rebooting again, it works fine again.
The elasticsearch pod has nothing special in it, I think.
#logs
{"type": "server", "timestamp": "2020-10-26T07:49:49,708Z", "level": "INFO", "component": "o.e.c.r.a.AllocationService", "cluster.name": "elasticsearch", "node.name": "elasticsearch-master-0", "message": "Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[filebeat-7.9.3-2020.10.26-000001][0]]]).", "cluster.uuid": "sWUAXJG9QaKyZDe0BLqwSw", "node.id": "ztb35hToRf-2Ahr7olympw" }
#describe
Normal SandboxChanged 4m4s (x3 over 4m9s) kubelet Pod sandbox changed, it will be killed and re-created.
Normal Pulled 4m3s kubelet Container image "docker.elastic.co/elasticsearch/elasticsearch:7.9.3" already present on machine
Normal Created 4m1s kubelet Created container configure-sysctl
Normal Started 4m1s kubelet Started container configure-sysctl
Normal Pulled 3m58s kubelet Container image "docker.elastic.co/elasticsearch/elasticsearch:7.9.3" already present on machine
Normal Created 3m58s kubelet Created container elasticsearch
Normal Started 3m57s kubelet Started container elasticsearch
Warning Unhealthy 91s (x14 over 3m42s) kubelet Readiness probe failed: Waiting for elasticsearch cluster to become ready (request params: "wait_for_status=green&timeout=1s" )
Cluster is not yet ready (request params: "wait_for_status=green&timeout=1s" )
#events
6m1s Normal Pulled pod/elasticsearch-master-0 Container image "docker.elastic.co/elasticsearch/elasticsearch:7.9.3" already present on machine
6m1s Normal Pulled pod/filebeat-filebeat-67qm2 Container image "docker.elastic.co/beats/filebeat:7.9.3" already present on machine
5m59s Normal Started pod/elasticsearch-master-0 Started container configure-sysctl
5m59s Normal Created pod/elasticsearch-master-0 Created container configure-sysctl
5m59s Normal Created pod/filebeat-filebeat-67qm2 Created container filebeat
5m58s Normal Started pod/filebeat-filebeat-67qm2 Started container filebeat
5m56s Normal Created pod/elasticsearch-master-0 Created container elasticsearch
5m56s Normal Pulled pod/elasticsearch-master-0 Container image "docker.elastic.co/elasticsearch/elasticsearch:7.9.3" already present on machine
5m55s Normal Started pod/elasticsearch-master-0 Started container elasticsearch
61s Warning Unhealthy pod/filebeat-filebeat-67qm2 Readiness probe failed: elasticsearch: http://elasticsearch-master:9200...
parse url... OK
connection...
parse host... OK
dns lookup... OK
addresses: 10.97.133.135
dial up... ERROR dial tcp 10.97.133.135:9200: connect: connection refused
59s Warning Unhealthy pod/elasticsearch-master-0 Readiness probe failed: Waiting for elasticsearch cluster to become ready (request params: "wait_for_status=green&timeout=1s" )
Cluster is not yet ready (request params: "wait_for_status=green&timeout=1s" )
The /mnt/data path is chowned to 1000:1000,
and with only Elasticsearch, without Filebeat, rebooting causes no problem.
I can't figure this out at all. :(
What am I missing?
pv.yaml
kind: PersistentVolume
apiVersion: v1
metadata:
name: elastic-pv
labels:
type: local
app: elastic
spec:
storageClassName: local-storage
capacity:
storage: 10Gi
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Retain
claimRef:
namespace: default
name: elasticsearch-master-elasticsearch-master-0
hostPath:
path: "/mnt/data"
values.yaml
---
clusterName: "elasticsearch"
nodeGroup: "master"
# The service that non master groups will try to connect to when joining the cluster
# This should be set to clusterName + "-" + nodeGroup for your master group
masterService: ""
# Elasticsearch roles that will be applied to this nodeGroup
# These will be set as environment variables. E.g. node.master=true
roles:
master: "true"
ingest: "true"
data: "true"
replicas: 1
minimumMasterNodes: 1
esMajorVersion: ""
# Allows you to add any config files in /usr/share/elasticsearch/config/
# such as elasticsearch.yml and log4j2.properties
esConfig: {}
# elasticsearch.yml: |
# key:
# nestedkey: value
# log4j2.properties: |
# key = value
# Extra environment variables to append to this nodeGroup
# This will be appended to the current 'env:' key. You can use any of the kubernetes env
# syntax here
extraEnvs: []
# - name: MY_ENVIRONMENT_VAR
# value: the_value_goes_here
# Allows you to load environment variables from kubernetes secret or config map
envFrom: []
# - secretRef:
# name: env-secret
# - configMapRef:
# name: config-map
# A list of secrets and their paths to mount inside the pod
# This is useful for mounting certificates for security and for mounting
# the X-Pack license
secretMounts: []
# - name: elastic-certificates
# secretName: elastic-certificates
# path: /usr/share/elasticsearch/config/certs
# defaultMode: 0755
image: "docker.elastic.co/elasticsearch/elasticsearch"
imageTag: "7.9.3"
imagePullPolicy: "IfNotPresent"
podAnnotations: {}
# iam.amazonaws.com/role: es-cluster
# additionals labels
labels: {}
esJavaOpts: "-Xmx1g -Xms1g"
resources:
requests:
cpu: "500m"
memory: "1Gi"
limits:
cpu: "1000m"
memory: "2Gi"
initResources: {}
# limits:
# cpu: "25m"
# # memory: "128Mi"
# requests:
# cpu: "25m"
# memory: "128Mi"
sidecarResources: {}
# limits:
# cpu: "25m"
# # memory: "128Mi"
# requests:
# cpu: "25m"
# memory: "128Mi"
networkHost: "0.0.0.0"
volumeClaimTemplate:
accessModes: [ "ReadWriteOnce" ]
storageClassName: local-storage
resources:
requests:
storage: 5Gi
rbac:
create: false
serviceAccountAnnotations: {}
serviceAccountName: ""
podSecurityPolicy:
create: false
name: ""
spec:
privileged: true
fsGroup:
rule: RunAsAny
runAsUser:
rule: RunAsAny
seLinux:
rule: RunAsAny
supplementalGroups:
rule: RunAsAny
volumes:
- secret
- configMap
- persistentVolumeClaim
persistence:
enabled: true
name: elastic-vc
labels:
# Add default labels for the volumeClaimTemplate fo the StatefulSet
app: elastic
annotations: {}
extraVolumes: []
# - name: extras
# emptyDir: {}
extraVolumeMounts: []
# - name: extras
# mountPath: /usr/share/extras
# readOnly: true
extraContainers: []
# - name: do-something
# image: busybox
# command: ['do', 'something']
extraInitContainers: []
# - name: do-something
# image: busybox
# command: ['do', 'something']
# This is the PriorityClass settings as defined in
# https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass
priorityClassName: ""
# By default this will make sure two pods don't end up on the same node
# Changing this to a region would allow you to spread pods across regions
antiAffinityTopologyKey: "kubernetes.io/hostname"
# Hard means that by default pods will only be scheduled if there are enough nodes for them
# and that they will never end up on the same node. Setting this to soft will do this "best effort"
antiAffinity: "hard"
# This is the node affinity settings as defined in
# https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#node-affinity-beta-feature
nodeAffinity: {}
# The default is to deploy all pods serially. By setting this to parallel all pods are started at
# the same time when bootstrapping the cluster
podManagementPolicy: "Parallel"
# The environment variables injected by service links are not used, but can lead to slow Elasticsearch boot times when
# there are many services in the current namespace.
# If you experience slow pod startups you probably want to set this to `false`.
enableServiceLinks: true
protocol: http
httpPort: 9200
transportPort: 9300
service:
labels: {}
labelsHeadless: {}
type: ClusterIP
nodePort: ""
annotations: {}
httpPortName: http
transportPortName: transport
loadBalancerIP: ""
loadBalancerSourceRanges: []
externalTrafficPolicy: ""
updateStrategy: RollingUpdate
# This is the max unavailable setting for the pod disruption budget
# The default value of 1 will make sure that kubernetes won't allow more than 1
# of your pods to be unavailable during maintenance
maxUnavailable: 1
podSecurityContext:
fsGroup: 1000
runAsUser: 1000
securityContext:
capabilities:
drop:
- ALL
#readOnlyRootFilesystem: false
runAsNonRoot: true
runAsUser: 1000
# How long to wait for elasticsearch to stop gracefully
terminationGracePeriod: 120
sysctlVmMaxMapCount: 262144
readinessProbe:
failureThreshold: 3
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 3
timeoutSeconds: 5
# https://www.elastic.co/guide/en/elasticsearch/reference/7.9/cluster-health.html#request-params wait_for_status
clusterHealthCheckParams: "wait_for_status=green&timeout=1s"
## Use an alternate scheduler.
## ref: https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/
##
schedulerName: ""
imagePullSecrets: []
nodeSelector: {}
tolerations: []
# - effect: NoSchedule
# key: node-role.kubernetes.io/master
# Enabling this will publically expose your Elasticsearch instance.
# Only enable this if you have security enabled on your cluster
ingress:
enabled: false
annotations: {}
# kubernetes.io/ingress.class: nginx
# kubernetes.io/tls-acme: "true"
path: /
hosts:
- chart-example.local
tls: []
# - secretName: chart-example-tls
# hosts:
# - chart-example.local
nameOverride: ""
fullnameOverride: ""
# https://github.com/elastic/helm-charts/issues/63
masterTerminationFix: false
lifecycle: {}
# preStop:
# exec:
# command: ["/bin/sh", "-c", "echo Hello from the postStart handler > /usr/share/message"]
# postStart:
# exec:
# command:
# - bash
# - -c
# - |
# #!/bin/bash
# # Add a template to adjust number of shards/replicas
# TEMPLATE_NAME=my_template
# INDEX_PATTERN="logstash-*"
# SHARD_COUNT=8
# REPLICA_COUNT=1
# ES_URL=http://localhost:9200
# while [[ "$(curl -s -o /dev/null -w '%{http_code}\n' $ES_URL)" != "200" ]]; do sleep 1; done
# curl -XPUT "$ES_URL/_template/$TEMPLATE_NAME" -H 'Content-Type: application/json' -d'{"index_patterns":['\""$INDEX_PATTERN"\"'],"settings":{"number_of_shards":'$SHARD_COUNT',"number_of_replicas":'$REPLICA_COUNT'}}'
sysctlInitContainer:
enabled: true
keystore: []
# Deprecated
# please use the above podSecurityContext.fsGroup instead
fsGroup: ""
Issue
There is an issue with the Elasticsearch readiness probe when running a single-replica cluster.
Warning Unhealthy 91s (x14 over 3m42s) kubelet Readiness probe failed: Waiting for elasticsearch cluster to become ready (request params: "wait_for_status=green&timeout=1s" )
Cluster is not yet ready (request params: "wait_for_status=green&timeout=1s" )
Solution
As mentioned here by @adinhodovic:
If you're running a single-replica cluster, add the following Helm value:
clusterHealthCheckParams: "wait_for_status=yellow&timeout=1s"
Your status will never go green with a single-replica cluster.
The following values should work:
replicas: 1
minimumMasterNodes: 1
clusterHealthCheckParams: 'wait_for_status=yellow&timeout=1s'
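As a sketch, the same override can also be applied when (re)installing the chart, using the release name from the commands above:
helm upgrade --install elasticsearch elastic/elasticsearch \
  -f values.yaml \
  --set replicas=1 \
  --set minimumMasterNodes=1 \
  --set clusterHealthCheckParams="wait_for_status=yellow&timeout=1s"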

StatefulSet FailedCreate ElasticSearch

When attempting to install Elasticsearch for Kubernetes on a PKS instance, I am running into an issue: after running kubectl get events --all-namespaces I see create Pod logging-es-default-0 in StatefulSet logging-es-default failed error: pods "logging-es-default-0" is forbidden: SecurityContext.RunAsUser is forbidden. Does this have something to do with a pod security policy? Is there any way to deploy Elasticsearch to Kubernetes if privileged containers are not allowed?
Edit: here is the values.yml file that I am passing into the elasticsearch helm chart.
---
clusterName: "elasticsearch"
nodeGroup: "master"
# The service that non master groups will try to connect to when joining the cluster
# This should be set to clusterName + "-" + nodeGroup for your master group
masterService: ""
# Elasticsearch roles that will be applied to this nodeGroup
# These will be set as environment variables. E.g. node.master=true
roles:
master: "true"
ingest: "true"
data: "true"
replicas: 3
minimumMasterNodes: 2
esMajorVersion: ""
# Allows you to add any config files in /usr/share/elasticsearch/config/
# such as elasticsearch.yml and log4j2.properties
esConfig: {}
# elasticsearch.yml: |
# key:
# nestedkey: value
# log4j2.properties: |
# key = value
# Extra environment variables to append to this nodeGroup
# This will be appended to the current 'env:' key. You can use any of the kubernetes env
# syntax here
extraEnvs: []
# - name: MY_ENVIRONMENT_VAR
# value: the_value_goes_here
# A list of secrets and their paths to mount inside the pod
# This is useful for mounting certificates for security and for mounting
# the X-Pack license
secretMounts: []
# - name: elastic-certificates
# secretName: elastic-certificates
# path: /usr/share/elasticsearch/config/certs
image: "docker.elastic.co/elasticsearch/elasticsearch"
imageTag: "7.4.1"
imagePullPolicy: "IfNotPresent"
podAnnotations: {}
# iam.amazonaws.com/role: es-cluster
# additionals labels
labels: {}
esJavaOpts: "-Xmx1g -Xms1g"
resources:
requests:
cpu: "100m"
memory: "2Gi"
limits:
cpu: "1000m"
memory: "2Gi"
initResources: {}
# limits:
# cpu: "25m"
# # memory: "128Mi"
# requests:
# cpu: "25m"
# memory: "128Mi"
sidecarResources: {}
# limits:
# cpu: "25m"
# # memory: "128Mi"
# requests:
# cpu: "25m"
# memory: "128Mi"
networkHost: "0.0.0.0"
volumeClaimTemplate:
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 30Gi
rbac:
create: false
serviceAccountName: ""
podSecurityPolicy:
create: false
name: ""
spec:
privileged: false
fsGroup:
rule: RunAsAny
runAsUser:
rule: RunAsAny
seLinux:
rule: RunAsAny
supplementalGroups:
rule: RunAsAny
volumes:
- secret
- configMap
- persistentVolumeClaim
persistence:
enabled: true
annotations: {}
extraVolumes: ""
# - name: extras
# emptyDir: {}
extraVolumeMounts: ""
# - name: extras
# mountPath: /usr/share/extras
# readOnly: true
extraInitContainers: ""
# - name: do-something
# image: busybox
# command: ['do', 'something']
# This is the PriorityClass settings as defined in
# https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass
priorityClassName: ""
# By default this will make sure two pods don't end up on the same node
# Changing this to a region would allow you to spread pods across regions
antiAffinityTopologyKey: "kubernetes.io/hostname"
# Hard means that by default pods will only be scheduled if there are enough nodes for them
# and that they will never end up on the same node. Setting this to soft will do this "best effort"
antiAffinity: "hard"
# This is the node affinity settings as defined in
# https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#node-affinity-beta-feature
nodeAffinity: {}
# The default is to deploy all pods serially. By setting this to parallel all pods are started at
# the same time when bootstrapping the cluster
podManagementPolicy: "Parallel"
protocol: http
httpPort: 9200
transportPort: 9300
service:
labels: {}
labelsHeadless: {}
type: ClusterIP
nodePort: ""
annotations: {}
httpPortName: http
transportPortName: transport
updateStrategy: RollingUpdate
# This is the max unavailable setting for the pod disruption budget
# The default value of 1 will make sure that kubernetes won't allow more than 1
# of your pods to be unavailable during maintenance
maxUnavailable: 1
podSecurityContext:
fsGroup: null
runAsUser: null
# The following value is deprecated,
# please use the above podSecurityContext.fsGroup instead
fsGroup: ""
securityContext:
capabilities: null
# readOnlyRootFilesystem: true
runAsNonRoot: null
runAsUser: null
# How long to wait for elasticsearch to stop gracefully
terminationGracePeriod: 120
sysctlVmMaxMapCount: 262144
readinessProbe:
failureThreshold: 3
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 3
timeoutSeconds: 5
# https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-health.html#request-params wait_for_status
clusterHealthCheckParams: "wait_for_status=green&timeout=1s"
## Use an alternate scheduler.
## ref: https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/
##
schedulerName: ""
imagePullSecrets: []
nodeSelector: {}
tolerations: []
# Enabling this will publically expose your Elasticsearch instance.
# Only enable this if you have security enabled on your cluster
ingress:
enabled: false
annotations: {}
# kubernetes.io/ingress.class: nginx
# kubernetes.io/tls-acme: "true"
path: /
hosts:
- chart-example.local
tls: []
# - secretName: chart-example-tls
# hosts:
# - chart-example.local
nameOverride: ""
fullnameOverride: ""
# https://github.com/elastic/helm-charts/issues/63
masterTerminationFix: false
lifecycle: {}
# preStop:
# exec:
# command: ["/bin/sh", "-c", "echo Hello from the postStart handler > /usr/share/message"]
# postStart:
# exec:
# command: ["/bin/sh", "-c", "echo Hello from the postStart handler > /usr/share/message"]
sysctlInitContainer:
enabled: false
keystore: []
The values listed above produce the following error:
create Pod elasticsearch-master-0 in StatefulSet elasticsearch-master failed error: pods "elasticsearch-master-0" is forbidden: SecurityContext.RunAsUser is forbidden
Solved: I learned that my Istio deployment was causing issues when attempting to deploy any other service into my cluster. I had wrongly assumed that Istio, together with my cluster security policies, wasn't causing my issue.
is forbidden: SecurityContext.RunAsUser is forbidden. Does this have something to do with a pod security policy?
Yes, that's exactly what it has to do with.
Evidently the StatefulSet includes a securityContext: stanza, but your cluster administrator forbids such a request.
Is there any way to be able to deploy ElasticSearch to Kubernetes if privileged containers are not allowed?
That's not exactly what's going on here -- it's not the "privileged" part that is causing you problems -- it's the PodSpec requesting to run the container as a user other than the one in the docker image. In fact, I would be very surprised if any modern Elasticsearch docker image requires modifying the user at all, since the recent ones do not run as root to begin with.
Remove that securityContext: stanza from the StatefulSet and report back what new errors arise (if any).
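To confirm exactly what the rendered StatefulSet is requesting (and therefore what the admission control is rejecting), something like the following can be used; the StatefulSet name elasticsearch-master is taken from the error message above:
kubectl get statefulset elasticsearch-master -o jsonpath='{.spec.template.spec.securityContext}'
kubectl get statefulset elasticsearch-master -o jsonpath='{.spec.template.spec.containers[0].securityContext}'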
