Can't communicate with Elasticsearch endpoint from FluentBit

Problem:
The connection to the AWS Elasticsearch endpoint fails when pushing Kubernetes logs through a Fluent Bit forwarder.
Here is the Fluent Bit setup:
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: logging
  labels:
    k8s-app: fluent-bit
data:
  # Configuration files: server, input, filters and output
  # ======================================================
  fluent-bit.conf: |
    [SERVICE]
        Flush 1
        Log_Level info
        Daemon off
        Parsers_File parsers.conf
        HTTP_Server On
        HTTP_Listen 0.0.0.0
        HTTP_Port 2020
    @INCLUDE input-kubernetes.conf
    @INCLUDE filter-kubernetes.conf
    @INCLUDE output-elasticsearch.conf
  input-kubernetes.conf: |
    [INPUT]
        Name tail
        Tag kube.*
        Path /var/log/containers/*.log
        Parser docker
        DB /var/log/flb_kube.db
        Mem_Buf_Limit 50MB
        Skip_Long_Lines On
        Refresh_Interval 10
  filter-kubernetes.conf: |
    [FILTER]
        Name kubernetes
        Match kube.*
        Kube_URL https://kubernetes.default.svc.cluster.local:443
        Merge_Log On
        K8S-Logging.Parser On
  output-elasticsearch.conf: |
    [OUTPUT]
        Name es
        Match *
        Host ${FLUENT_ELASTICSEARCH_HOST}
        Port ${FLUENT_ELASTICSEARCH_PORT}
        Logstash_Format On
        Retry_Limit False
        tls Off
        tls.verify Off
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
  labels:
    k8s-app: fluent-bit-logging
    version: v1
    kubernetes.io/cluster-service: "true"
spec:
  selector:
    matchLabels:
      k8s-app: fluent-bit-logging
  template:
    metadata:
      labels:
        k8s-app: fluent-bit-logging
        version: v1
        kubernetes.io/cluster-service: "true"
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "2020"
        prometheus.io/path: /api/v1/metrics/prometheus
    spec:
      containers:
      - name: fluent-bit
        image: fluent/fluent-bit:1.5
        imagePullPolicy: Always
        ports:
        - containerPort: 2020
        env:
        - name: FLUENT_ELASTICSEARCH_HOST
          value: "https://vpc-tf-test2-xyzxyzxyzxyz.eu-west-2.es.amazonaws.com"
        - name: FLUENT_ELASTICSEARCH_PORT
          value: "443"
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: fluent-bit-config
          mountPath: /fluent-bit/etc/
      terminationGracePeriodSeconds: 10
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: fluent-bit-config
        configMap:
          name: fluent-bit-config
      serviceAccountName: fluent-bit
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      - operator: "Exists"
        effect: "NoExecute"
      - operator: "Exists"
        effect: "NoSchedule"
The fluentBit forwarder logs show this:
[2021/02/01 09:09:11] [error] [io] connection #46 failed to: https://vpc-tf-test2-xyzxyzxyz.eu-west-2.es.amazonaws.com:443
[2021/02/01 09:09:11] [ warn] [engine] failed to flush chunk '1-1611849613.623397482.flb', retry in 1521 seconds: task_id=1980, input=tail.0 > output=es.0
[2021/02/01 09:09:11] [ warn] [engine] failed to flush chunk '1-1611849347.548817423.flb', retry in 1806 seconds: task_id=1623, input=tail.0 > output=es.0
[2021/02/01 09:09:11] [ warn] [engine] failed to flush chunk '1-1611849095.485002520.flb', retry in 1286 seconds: task_id=1284, input=tail.0 > output=es.0
[2021/02/01 09:09:13] [ warn] net_tcp_fd_connect: getaddrinfo(host='https://vpc-tf-test2-xyzxyzxyzxyz.eu-west-2.es.amazonaws.com'): Name or service not known
[2021/02/01 09:09:13] [error] [io] connection #46 failed to: https://vpc-tf-test2-xyzxyzxyz.eu-west-2.es.amazonaws.com:443
[2021/02/01 09:09:13] [ warn] [engine] failed to flush chunk '1-1611849450.549250742.flb', retry in 799 seconds: task_id=1766, input=tail.0 > output=es.0
I am trying to trace where the access is getting blocked.
Access to the ES endpoint is controlled by a security group with these inbound rules:
Type: All traffic
Protocol: All
Port range: All
Source: sg-xyzxyzxyz (eks-cluster-sg-vrs2-eks-dev-xyzxyzyxz)

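Change the FLUENT_ELASTICSEARCH_HOST value to vpc-tf-test2-xyzxyzxyzxyz.eu-west-2.es.amazonaws.com, without the https:// prefix. The es output's Host parameter expects a bare hostname: the getaddrinfo(host='https://vpc-...') warning in the log shows that Fluent Bit tried to resolve the whole URL, scheme included, as a DNS name, which fails with "Name or service not known".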
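A minimal sketch of the corrected settings, assuming the same ConfigMap and DaemonSet as in the question (the redacted hostname is reused as-is). One further assumption worth verifying: because the endpoint is reached on port 443, it is likely that tls also has to be On; with tls Off, Fluent Bit would send plain HTTP to an HTTPS port.

```yaml
# DaemonSet container env: bare hostname, no "https://" scheme
env:
- name: FLUENT_ELASTICSEARCH_HOST
  value: "vpc-tf-test2-xyzxyzxyzxyz.eu-west-2.es.amazonaws.com"
- name: FLUENT_ELASTICSEARCH_PORT
  value: "443"
---
# ConfigMap data key (excerpt): output section
# Assumption: the domain expects HTTPS on 443, so TLS is switched on here
output-elasticsearch.conf: |
  [OUTPUT]
      Name es
      Match *
      Host ${FLUENT_ELASTICSEARCH_HOST}
      Port ${FLUENT_ELASTICSEARCH_PORT}
      Logstash_Format On
      Retry_Limit False
      tls On
      tls.verify Off
```

Only the scheme and the tls flag differ from the question's config; everything else is left unchanged so the two can be compared line by line.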

Related

Setting up elastic search cluster on kubernetes pods can't talk to each other by hostname

I'm trying to set up an Elasticsearch cluster on Kubernetes. The problem I'm having is that the pods can't reach each other by hostname, although the IP addresses work.
For example, I'm currently setting up 3 master nodes, es-master-0, es-master-1 and es-master-2. If I log into one of the containers and ping another by its pod IP, it works fine, but if I try to ping, say, es-master-1 from es-master-0 by hostname, it can't be found.
I'm clearly missing something. This is the config I'm currently deploying to try to get it working:
apiVersion: v1
kind: Service
metadata:
  name: ed
  labels:
    component: elasticsearch
    role: master
spec:
  selector:
    component: elasticsearch
    role: master
  ports:
  - name: transport1
    port: 9300
    protocol: TCP
  clusterIP: None
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: es-master
  labels:
    component: elasticsearch
    role: master
spec:
  selector:
    matchLabels:
      component: elasticsearch
      role: master
  serviceName: ed
  replicas: 3
  template:
    metadata:
      labels:
        component: elasticsearch
        role: master
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - { key: es-master, operator: In, values: [ "true" ] }
      initContainers:
      - name: init-sysctl
        image: busybox:1.27.2
        command:
        - sysctl
        - -w
        - vm.max_map_count=262144
        securityContext:
          privileged: true
      dnsPolicy: "None"
      dnsConfig:
        options:
        - name: ndots
          value: "6"
        nameservers:
        - 10.85.0.10
        searches:
        - ed.es.svc.cluster.local
        - es.svc.cluster.local
        - svc.cluster.local
        - cluster.local
        - home
        - node1
      containers:
      - name: es-master
        image: docker.elastic.co/elasticsearch/elasticsearch:7.17.5
        imagePullPolicy: Always
        securityContext:
          privileged: true
        env:
        - name: ES_JAVA_OPTS
          value: -Xms2048m -Xmx2048m
        resources:
          requests:
            cpu: "0.25"
          limits:
            cpu: "2"
        ports:
        - containerPort: 9300
          name: transport1
        livenessProbe:
          tcpSocket:
            port: transport1
          initialDelaySeconds: 60
          periodSeconds: 10
        volumeMounts:
        - name: storage
          mountPath: /data
        - name: config
          mountPath: /usr/share/elasticsearch/config/elasticsearch.yml
          subPath: elasticsearch.yml
      volumes:
      - name: config
        configMap:
          name: es-master-config
  volumeClaimTemplates:
  - metadata:
      name: storage
    spec:
      storageClassName: "local-path"
      accessModes: [ ReadWriteOnce ]
      resources:
        requests:
          storage: 2Gi
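The hostnames are clearly not being resolved somehow.
For pod-to-pod communication, use the headless Service you defined (ed). Each StatefulSet pod gets a DNS name of the form <pod-name>.<service-name>.<namespace>.svc.cluster.local, for example es-master-1.ed.<namespace>.svc.cluster.local, rather than being resolvable by its bare pod name alone.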
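A sketch of how discovery could rely on the headless Service's per-pod names, assuming the pods run in a namespace called es (inferred from the ed.es.svc.cluster.local search domain in the question's dnsConfig) and the default cluster.local domain. If the manifests were applied without -n es, they land in the current namespace instead, and that search domain would never match, which is one possible reason the short names fail. The es-master-config ConfigMap is not shown in the question, so its contents below, including the cluster name, are hypothetical. publishNotReadyAddresses publishes the pod DNS records before the pods are Ready, which matters while the cluster bootstraps.

```yaml
# Headless Service: each StatefulSet pod gets the DNS name
# <pod>.<service>.<namespace>.svc.cluster.local
apiVersion: v1
kind: Service
metadata:
  name: ed
  namespace: es                    # assumption: inferred from the dnsConfig search list
spec:
  clusterIP: None
  publishNotReadyAddresses: true   # publish DNS records before pods are Ready
  selector:
    component: elasticsearch
    role: master
  ports:
  - name: transport1
    port: 9300
    protocol: TCP
---
# Hypothetical es-master-config using Elasticsearch 7.x discovery settings
apiVersion: v1
kind: ConfigMap
metadata:
  name: es-master-config
  namespace: es
data:
  elasticsearch.yml: |
    cluster.name: es-cluster       # hypothetical name
    network.host: 0.0.0.0
    # node.name defaults to the hostname, i.e. the pod name (es-master-0, ...)
    discovery.seed_hosts:
      - es-master-0.ed.es.svc.cluster.local
      - es-master-1.ed.es.svc.cluster.local
      - es-master-2.ed.es.svc.cluster.local
    cluster.initial_master_nodes:
      - es-master-0
      - es-master-1
      - es-master-2
```

With the cluster DNS reachable this way, the custom dnsPolicy: "None" block is probably unnecessary and the default ClusterFirst policy could be used instead.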

Elasticsearch cluster on Kubernetes - nodes are not communicating

I have an Elasticsearch cluster (6.3) running on Kubernetes (GKE) with the following manifest file:
---
# Source: elasticsearch/templates/manifests.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: elasticsearch-configmap
  labels:
    app.kubernetes.io/name: "elasticsearch"
    app.kubernetes.io/component: elasticsearch-server
data:
  elasticsearch.yml: |
    cluster.name: "${CLUSTER_NAME}"
    node.name: "${NODE_NAME}"
    path.data: /usr/share/elasticsearch/data
    path.repo: ["${BACKUP_REPO_PATH}"]
    network.host: 0.0.0.0
    discovery.zen.minimum_master_nodes: 1
    discovery.zen.ping.unicast.hosts: ${DISCOVERY_SERVICE}
  log4j2.properties: |
    status = error
    appender.console.type = Console
    appender.console.name = console
    appender.console.layout.type = PatternLayout
    appender.console.layout.pattern = [%d{ISO8601}][%-5p][%-25c{1.}] %marker%m%n
    rootLogger.level = info
    rootLogger.appenderRef.console.ref = console
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch
  labels: &ElasticsearchDeploymentLabels
    app.kubernetes.io/name: "elasticsearch"
    app.kubernetes.io/component: elasticsearch-server
spec:
  selector:
    matchLabels: *ElasticsearchDeploymentLabels
  serviceName: elasticsearch-svc
  replicas: 2
  updateStrategy:
    # The procedure for updating the Elasticsearch cluster is described at
    # https://www.elastic.co/guide/en/elasticsearch/reference/current/rolling-upgrades.html
    type: OnDelete
  template:
    metadata:
      labels: *ElasticsearchDeploymentLabels
    spec:
      terminationGracePeriodSeconds: 180
      initContainers:
      # This init container sets the appropriate limits for mmap counts on the hosting node.
      # https://www.elastic.co/guide/en/elasticsearch/reference/current/vm-max-map-count.html
      - name: set-max-map-count
        image: marketplace.gcr.io/google/elasticsearch/ubuntu16_04@...
        imagePullPolicy: IfNotPresent
        securityContext:
          privileged: true
        command:
        - /bin/bash
        - -c
        - 'if [[ "$(sysctl vm.max_map_count --values)" -lt 262144 ]]; then sysctl -w vm.max_map_count=262144; fi'
      containers:
      - name: elasticsearch
        image: eu.gcr.io/projectId/elasticsearch6.3@sha256:...
        imagePullPolicy: Always
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: CLUSTER_NAME
          value: "elasticsearch-cluster"
        - name: DISCOVERY_SERVICE
          value: "elasticsearch-svc"
        - name: BACKUP_REPO_PATH
          value: ""
        ports:
        - name: prometheus
          containerPort: 9114
          protocol: TCP
        - name: http
          containerPort: 9200
        - name: tcp-transport
          containerPort: 9300
        volumeMounts:
        - name: configmap
          mountPath: /etc/elasticsearch/elasticsearch.yml
          subPath: elasticsearch.yml
        - name: configmap
          mountPath: /etc/elasticsearch/log4j2.properties
          subPath: log4j2.properties
        - name: elasticsearch-pvc
          mountPath: /usr/share/elasticsearch/data
        readinessProbe:
          httpGet:
            path: /_cluster/health?local=true
            port: 9200
          initialDelaySeconds: 5
        livenessProbe:
          exec:
            command:
            - /usr/bin/pgrep
            - -x
            - "java"
          initialDelaySeconds: 5
        resources:
          requests:
            memory: "2Gi"
      - name: prometheus-to-sd
        image: marketplace.gcr.io/google/elasticsearch/prometheus-to-sd@sha256:8e3679a6e059d1806daae335ab08b304fd1d8d35cdff457baded7306b5af9ba5
        ports:
        - name: profiler
          containerPort: 6060
        command:
        - /monitor
        - --stackdriver-prefix=custom.googleapis.com
        - --source=elasticsearch:http://localhost:9114/metrics
        - --pod-id=$(POD_NAME)
        - --namespace-id=$(POD_NAMESPACE)
        - --monitored-resource-types=k8s
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
      volumes:
      - name: configmap
        configMap:
          name: "elasticsearch-configmap"
  volumeClaimTemplates:
  - metadata:
      name: elasticsearch-pvc
      labels:
        app.kubernetes.io/name: "elasticsearch"
        app.kubernetes.io/component: elasticsearch-server
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: standard
      resources:
        requests:
          storage: 50Gi
---
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch-prometheus-svc
  labels:
    app.kubernetes.io/name: elasticsearch
    app.kubernetes.io/component: elasticsearch-server
spec:
  clusterIP: None
  ports:
  - name: prometheus-port
    port: 9114
    protocol: TCP
  selector:
    app.kubernetes.io/name: elasticsearch
    app.kubernetes.io/component: elasticsearch-server
---
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch-svc-internal
  labels:
    app.kubernetes.io/name: "elasticsearch"
    app.kubernetes.io/component: elasticsearch-server
spec:
  ports:
  - name: http
    port: 9200
  - name: tcp-transport
    port: 9300
  selector:
    app.kubernetes.io/name: "elasticsearch"
    app.kubernetes.io/component: elasticsearch-server
  type: ClusterIP
---
apiVersion: v1
kind: Service
metadata:
  name: ilb-service-elastic
  annotations:
    cloud.google.com/load-balancer-type: "Internal"
  labels:
    app: elasticsearch-svc
spec:
  type: LoadBalancer
  loadBalancerIP: some-ip-address
  selector:
    app.kubernetes.io/component: elasticsearch-server
    app.kubernetes.io/name: elasticsearch
  ports:
  - port: 9200
    protocol: TCP
This manifest was written from the template that used to be available on the GCP marketplace.
I'm encountering the following issue: the cluster is supposed to have 2 nodes, and indeed 2 pods are running. However:
- a call to ip:9200/_nodes returns just one node;
- there still seems to be a second node running that receives traffic (at least read traffic), as visible in the logs. Those requests typically fail because the requested entities don't exist on that node (only on the master node).
I can't wrap my head around how that node can be invisible to the master node while still receiving read traffic from the load balancer pointing to the StatefulSet.
Am I missing something subtle?
Have you checked the role of each node?
There are master nodes and data nodes. Only one master is elected at a time, while the other master-eligible nodes stay in the background; if the elected master goes down, a new one is elected and handles further requests.
I can't see any node-role configuration in the StatefulSet. I would recommend using the Elasticsearch Helm chart to set up and deploy on GKE.
Helm chart: https://github.com/elastic/helm-charts/tree/main/elasticsearch
Here is an example env config for reference:
env:
- name: NAMESPACE
  valueFrom:
    fieldRef:
      fieldPath: metadata.namespace
- name: NODE_NAME
  valueFrom:
    fieldRef:
      fieldPath: metadata.name
- name: CLUSTER_NAME
  value: my-es
- name: NODE_MASTER
  value: "false"
- name: NODE_INGEST
  value: "false"
- name: HTTP_ENABLE
  value: "false"
- name: ES_JAVA_OPTS
  value: -Xms256m -Xmx256m
Read more at: https://faun.pub/https-medium-com-thakur-vaibhav23-ha-es-k8s-7e655c1b7b61
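Separately from node roles, the manifest in the question has a naming gap that may explain both symptoms. The StatefulSet's serviceName and the DISCOVERY_SERVICE value used by discovery.zen.ping.unicast.hosts both point at elasticsearch-svc, but no Service with that name is defined; only elasticsearch-prometheus-svc, elasticsearch-svc-internal and ilb-service-elastic are. If that name does not resolve, each pod can end up electing itself master, and with discovery.zen.minimum_master_nodes: 1 nothing prevents two single-node clusters from forming. That would match _nodes showing one node while the load balancer still sends requests to a pod that lacks the data. A sketch of the missing headless Service and of the quorum for two master-eligible nodes, (2 / 2) + 1 = 2, assuming the labels from the question:

```yaml
# Headless Service matching the StatefulSet's serviceName, so that
# "elasticsearch-svc" resolves to the transport addresses of the pods
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch-svc
  labels:
    app.kubernetes.io/name: "elasticsearch"
    app.kubernetes.io/component: elasticsearch-server
spec:
  clusterIP: None
  publishNotReadyAddresses: true   # lets nodes discover each other before they are Ready
  selector:
    app.kubernetes.io/name: "elasticsearch"
    app.kubernetes.io/component: elasticsearch-server
  ports:
  - name: tcp-transport
    port: 9300
---
# elasticsearch.yml in elasticsearch-configmap (Elasticsearch 6.x zen discovery):
# quorum of master-eligible nodes = (2 / 2) + 1 = 2
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.unicast.hosts: ${DISCOVERY_SERVICE}
```

The trade-off is that a two-node cluster with a quorum of 2 cannot elect a master while either node is down, which is why three master-eligible nodes are usually recommended.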

Kubernetes giving CrashLoopBackOff error while running the packetbeat in kubernetes cluster

I'm trying to deploy Packetbeat as a DaemonSet on a Kubernetes cluster, but the Packetbeat pod keeps going into CrashLoopBackOff. I have checked the Packetbeat pod logs; here they are:
2020-08-23T14:28:00.054Z INFO instance/beat.go:475 Beat UUID: 69d32e5f-c8f2-41bf-9242-48435688c540
2020-08-23T14:28:00.054Z INFO instance/beat.go:213 Setup Beat: packetbeat; Version: 6.2.4
2020-08-23T14:28:00.061Z INFO add_cloud_metadata/add_cloud_metadata.go:301 add_cloud_metadata: hosting provider type detected as ec2, metadata={"availability_zone":"us-east-1f","instance_id":"i-05b8121af85c94236","machine_type":"t2.medium","provider":"ec2","region":"us-east-1"}
2020-08-23T14:28:00.061Z INFO kubernetes/watcher.go:77 kubernetes: Performing a pod sync
2020-08-23T14:28:00.074Z INFO kubernetes/watcher.go:108 kubernetes: Pod sync done
2020-08-23T14:28:00.074Z INFO elasticsearch/client.go:145 Elasticsearch url: http://elasticsearch:9200
2020-08-23T14:28:00.074Z INFO kubernetes/watcher.go:140 kubernetes: Watching API for pod events
2020-08-23T14:28:00.074Z INFO pipeline/module.go:76 Beat name: ip-172-31-72-117
2020-08-23T14:28:00.075Z INFO procs/procs.go:78 Process matching disabled
2020-08-23T14:28:00.076Z INFO [monitoring] log/log.go:97 Starting metrics logging every 30s
2020-08-23T14:28:00.076Z INFO elasticsearch/client.go:145 Elasticsearch url: http://elasticsearch:9200
2020-08-23T14:28:00.083Z WARN transport/tcp.go:36 DNS lookup failure "elasticsearch": lookup elasticsearch on 172.31.0.2:53: no such host
2020-08-23T14:28:00.083Z ERROR elasticsearch/elasticsearch.go:165 Error connecting to Elasticsearch at http://elasticsearch:9200: Get http://elasticsearch:9200: lookup elasticsearch on 172.31.0.2:53: no such host
2020-08-23T14:28:00.085Z INFO [monitoring] log/log.go:132 Total non-zero metrics {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":20,"time":28},"total":{"ticks":160,"time":176,"value":160},"user":{"ticks":140,"time":148}},"info":{"ephemeral_id":"70e07383-3aae-4bc1-a6e1-540a6cfa8ad8","uptime":{"ms":35}},"memstats":{"gc_next":26511344,"memory_alloc":21723000,"memory_total":23319008,"rss":51834880}},"libbeat":{"config":{"module":{"running":0}},"output":{"type":"elasticsearch"},"pipeline":{"clients":5,"events":{"active":0}}},"system":{"cpu":{"cores":2},"load":{"1":0.11,"15":0.1,"5":0.14,"norm":{"1":0.055,"15":0.05,"5":0.07}}}}}}
2020-08-23T14:28:00.085Z INFO [monitoring] log/log.go:133 Uptime: 37.596889ms
2020-08-23T14:28:00.085Z INFO [monitoring] log/log.go:110 Stopping metrics logging.
2020-08-23T14:28:00.085Z ERROR instance/beat.go:667 Exiting: Error importing Kibana dashboards: fail to create the Elasticsearch loader: Error creating Elasticsearch client: Couldn't connect to any of the configured Elasticsearch hosts. Errors: [Error connection to Elasticsearch http://elasticsearch:9200: Get http://elasticsearch:9200: lookup elasticsearch on 172.31.0.2:53: no such host]
Exiting: Error importing Kibana dashboards: fail to create the Elasticsearch loader: Error creating Elasticsearch client: Couldn't connect to any of the configured Elasticsearch hosts. Errors: [Error connection to Elasticsearch http://elasticsearch:9200: Get http://elasticsearch:9200: lookup elasticsearch on 172.31.0.2:53: no such host]
Here is packetbeat.yaml:
apiVersion: v1
kind: ConfigMap
metadata:
  name: packetbeat-dynamic-config
  namespace: kube-system
  labels:
    k8s-app: packetbeat-dynamic
    kubernetes.io/cluster-service: "true"
data:
  packetbeat.yml: |-
    setup.dashboards.enabled: true
    setup.template.enabled: true
    setup.template.settings:
      index.number_of_shards: 2
    packetbeat.interfaces.device: any
    packetbeat.protocols:
    - type: dns
      ports: [53]
      include_authorities: true
      include_additionals: true
    - type: http
      ports: [80, 8000, 8080, 9200]
    - type: mysql
      ports: [3306]
    - type: redis
      ports: [6379]
    packetbeat.flows:
      timeout: 30s
      period: 10s
    processors:
    - add_cloud_metadata:
    - add_kubernetes_metadata:
        host: ${HOSTNAME}
        indexers:
        - ip_port:
        matchers:
        - field_format:
            format: '%{[ip]}:%{[port]}'
    cloud.id: ${ELASTIC_CLOUD_ID}
    cloud.auth: ${ELASTIC_CLOUD_AUTH}
    #setup.kibana.host: kibana:5601
    setup.ilm.overwrite: true
    output.elasticsearch:
      hosts: ['${ELASTICSEARCH_HOST:elasticsearch}:${ELASTICSEARCH_PORT:9200}']
      username: ${ELASTICSEARCH_USERNAME}
      password: ${ELASTICSEARCH_PASSWORD}
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: packetbeat-dynamic
  namespace: kube-system
  labels:
    k8s-app: packetbeat-dynamic
    kubernetes.io/cluster-service: "true"
spec:
  selector:
    matchLabels:
      k8s-app: packetbeat-dynamic
      kubernetes.io/cluster-service: "true"
  template:
    metadata:
      labels:
        k8s-app: packetbeat-dynamic
        kubernetes.io/cluster-service: "true"
    spec:
      serviceAccountName: packetbeat-dynamic
      terminationGracePeriodSeconds: 30
      hostNetwork: true
      containers:
      - name: packetbeat-dynamic
        image: docker.elastic.co/beats/packetbeat:6.2.4
        imagePullPolicy: Always
        args: [
          "-c", "/etc/packetbeat.yml",
          "-e",
        ]
        securityContext:
          runAsUser: 0
          capabilities:
            add:
            - NET_ADMIN
        env:
        - name: ELASTICSEARCH_HOST
          value: elasticsearch
        - name: ELASTICSEARCH_PORT
          value: "9200"
        - name: ELASTICSEARCH_USERNAME
          value: elastic
        - name: ELASTICSEARCH_PASSWORD
          value: changeme
        - name: CLOUD_ID
          value:
        - name: ELASTIC_CLOUD_AUTH
          value:
        - name: KIBANA_HOST
          value: kibana
        - name: KIBANA_PORT
          value: "5601"
        volumeMounts:
        - name: config
          mountPath: /etc/packetbeat.yml
          readOnly: true
          subPath: packetbeat.yml
        - name: data
          mountPath: /usr/share/packetbeat/data
      volumes:
      - name: config
        configMap:
          defaultMode: 0600
          name: packetbeat-dynamic-config
      - name: data
        emptyDir: {}
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: packetbeat-dynamic
subjects:
- kind: ServiceAccount
  name: packetbeat-dynamic
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: packetbeat-dynamic
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: packetbeat-dynamic
  labels:
    k8s-app: packetbeat-dynamic
rules:
- apiGroups: [""] # "" indicates the core API group
  resources:
  - namespaces
  - pods
  verbs:
  - get
  - watch
  - list
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: packetbeat-dynamic
  namespace: kube-system
  labels:
    k8s-app: packetbeat-dynamic
Could anyone suggest how to resolve this issue? Any useful links would also be helpful.
kubectl describe daemonset packetbeat-dynamic -n kube-system
Name: packetbeat-dynamic
Selector: k8s-app=packetbeat-dynamic,kubernetes.io/cluster-service=true
Node-Selector: <none>
Labels: k8s-app=packetbeat-dynamic
kubernetes.io/cluster-service=true
Annotations: deprecated.daemonset.template.generation: 1
Desired Number of Nodes Scheduled: 1
Current Number of Nodes Scheduled: 1
Number of Nodes Scheduled with Up-to-date Pods: 1
Number of Nodes Scheduled with Available Pods: 0
Number of Nodes Misscheduled: 1
Pods Status: 2 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
Labels: k8s-app=packetbeat-dynamic
kubernetes.io/cluster-service=true
Service Account: packetbeat-dynamic
Containers:
packetbeat-dynamic:
Image: docker.elastic.co/beats/packetbeat:6.2.4
Port: <none>
Host Port: <none>
Args:
-c
/etc/packetbeat.yml
-e
Environment:
ELASTICSEARCH_HOST: elasticsearch
ELASTICSEARCH_PORT: 9200
ELASTICSEARCH_USERNAME: elastic
ELASTICSEARCH_PASSWORD: changeme
CLOUD_ID:
ELASTIC_CLOUD_AUTH:
KIBANA_HOST: kibana
KIBANA_PORT: 5601
Mounts:
/etc/packetbeat.yml from config (ro,path="packetbeat.yml")
/usr/share/packetbeat/data from data (rw)
Volumes:
config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: packetbeat-dynamic-config
Optional: false
data:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
Events: <none>
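The logs point to a likely cause: the lookup for elasticsearch goes to 172.31.0.2:53, which is the node's VPC resolver rather than the cluster DNS. That happens because the DaemonSet runs with hostNetwork: true; pods on the host network use the node's DNS settings unless dnsPolicy is set to ClusterFirstWithHostNet. A sketch of the relevant part of the pod template, assuming an Elasticsearch Service actually exists in the cluster. The short name elasticsearch only resolves from kube-system if that Service also lives in kube-system, so the example uses a fully qualified name with a hypothetical namespace called logging.

```yaml
# Pod template spec of the packetbeat-dynamic DaemonSet (excerpt)
spec:
  serviceAccountName: packetbeat-dynamic
  hostNetwork: true
  # Without this, host-network pods resolve names through the node's
  # /etc/resolv.conf (here 172.31.0.2) and cannot see cluster Services.
  dnsPolicy: ClusterFirstWithHostNet
  containers:
  - name: packetbeat-dynamic
    image: docker.elastic.co/beats/packetbeat:6.2.4
    env:
    - name: ELASTICSEARCH_HOST
      # hypothetical: a Service named "elasticsearch" in namespace "logging"
      value: elasticsearch.logging.svc.cluster.local
    - name: ELASTICSEARCH_PORT
      value: "9200"
```

Because setup.dashboards.enabled is true, the beat exits as soon as this first connection fails, which is what turns the DNS error into a CrashLoopBackOff.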

kubernetes statefulset-controller privileged init containers for elasticsearch

I'm trying to create an Elasticsearch StatefulSet (STS) with init containers that raise the worker nodes' vm.max_map_count to 262144 and the ulimit -n to 65536.
However, from what I can tell, some PodSecurityPolicy (PSP) is denying the privileged containers:
Warning FailedCreate 1s (x12 over 11s) statefulset-controller
create Pod elasticsearch-node-0 in StatefulSet elasticsearch-node
failed error: pods "elasticsearch-node-0" is forbidden: unable to
validate against any pod security policy:
[spec.initContainers[0].securityContext.privileged: Invalid value:
true: Privileged containers are not allowed
spec.initContainers[1].securityContext.privileged: Invalid value:
true: Privileged containers are not allowed]
And there are in fact two PSPs in the cluster, privileged and unprivileged. Do I need to reference the privileged PSP in the STS somehow? Or a service account?
The k8s server version is 1.9.8, if it matters.
This is the STS (with some Helm templating):
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch-node
  namespace: {{ .Release.Namespace }}
  labels:
    component: elasticsearch
    role: node
spec:
  replicas: {{ .Values.replicas }}
  serviceName: elasticsearch-discovery
  selector:
    matchLabels:
      component: elasticsearch
      role: node
  template:
    metadata:
      namespace: {{ .Release.Namespace }}
      labels:
        component: elasticsearch
        role: node
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: component
                    operator: In
                    values:
                      - elasticsearch
                  - key: role
                    operator: In
                    values:
                      - node
              topologyKey: kubernetes.io/hostname
      terminationGracePeriodSeconds: 100
      securityContext:
        fsGroup: 1000
      initContainers:
        # To increase the default vm.max_map_count to 262144
        - name: increase-vm-max-map-count
          image: busybox
          command:
            - sysctl
            - -w
            - vm.max_map_count=262144
          securityContext:
            privileged: true
        # To increase the ulimit to 65536
        - name: increase-ulimit
          image: busybox
          command:
            - sh
            - -c
            - ulimit -n 65536
          securityContext:
            privileged: true
      containers:
        - name: elasticsearch
          image: docker.elastic.co/elasticsearch/elasticsearch:{{ .Values.global.version }}
          imagePullPolicy: Always
          ports:
            - name: http
              containerPort: 9200
            - name: transport
              containerPort: 9300
          volumeMounts:
            # - name: storage
            #   mountPath: /data
            - name: config
              mountPath: /usr/share/elasticsearch/config/elasticsearch.yml
              subPath: elasticsearch.yml
          resources:
{{ toYaml .Values.resources | indent 12 }}
          env:
            - name: ES_JAVA_OPTS
              value: {{ .Values.java.options }}
      volumes:
        - name: config
          configMap:
            name: elasticsearch-node
$ kubectl describe sts elasticsearch-node
Name: elasticsearch-node
Namespace: default
CreationTimestamp: Tue, 12 Nov 2019 17:09:50 +0100
Selector: component=elasticsearch,role=node
Labels: component=elasticsearch
role=node
Annotations: <none>
Replicas: 2 desired | 0 total
Update Strategy: RollingUpdate
Partition: 824638159384
Pods Status: 0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
Labels: component=elasticsearch
role=node
Init Containers:
increase-vm-max-map-count:
Image: busybox
Port: <none>
Host Port: <none>
Command:
sysctl
-w
vm.max_map_count=262144
Environment: <none>
Mounts: <none>
increase-ulimit:
Image: busybox
Port: <none>
Host Port: <none>
Command:
sh
-c
ulimit -n 65536
Environment: <none>
Mounts: <none>
Containers:
elasticsearch:
Image: docker.elastic.co/elasticsearch/elasticsearch:7.3.2
Ports: 9200/TCP, 9300/TCP
Host Ports: 0/TCP, 0/TCP
Limits:
cpu: 1
memory: 3Gi
Requests:
cpu: 250m
memory: 2Gi
Environment:
ES_JAVA_OPTS: -Xms2G -Xmx2G
Mounts:
/usr/share/elasticsearch/config/elasticsearch.yml from config (rw,path="elasticsearch.yml")
Volumes:
config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: elasticsearch-node
Optional: false
Volume Claims: <none>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreate 1s (x17 over 78s) statefulset-controller create Pod elasticsearch-node-0 in StatefulSet elasticsearch-node failed error: pods "elasticsearch-node-0" is forbidden: unable to validate against any pod security policy: [spec.initContainers[0].securityContext.privileged: Invalid value: true: Privileged containers are not allowed spec.initContainers[1].securityContext.privileged: Invalid value: true: Privileged containers are not allowed]
Been staring at the PSP docs for some time now: https://kubernetes.io/docs/concepts/policy/pod-security-policy/
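PodSecurityPolicy admission only accepts a pod if some policy that allows it is usable by the pod's service account (or by whoever creates the pod, which here is the StatefulSet controller). A sketch of granting the privileged policy to a dedicated service account, assuming the policy is literally named privileged and the release runs in the default namespace shown by kubectl describe; both names, and the service account name, are hypothetical and should be checked with kubectl get psp. On 1.9 the PSP resource still lives in the extensions API group (policy from 1.10 onwards), so both groups are listed.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: elasticsearch                 # hypothetical name
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: use-privileged-psp
  namespace: default
rules:
- apiGroups: ["extensions", "policy"]  # "extensions" on 1.9, "policy" from 1.10
  resources: ["podsecuritypolicies"]
  resourceNames: ["privileged"]       # assumption: name of the privileged PSP
  verbs: ["use"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: elasticsearch-use-privileged-psp
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: use-privileged-psp
subjects:
- kind: ServiceAccount
  name: elasticsearch
  namespace: default
```

The StatefulSet's pod template would then set serviceAccountName: elasticsearch under template.spec. Note also that ulimit -n in an init container only changes the limit of that init container's shell; it does not carry over to the elasticsearch container.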

kubelet does not create symlinks to /var/log/containers

I am trying to set up an EFK stack on my k8s cluster using an Ansible repo.
When I browse the Kibana dashboard, it shows me the following output:
After some research, I found out that Fluentd isn't detecting any logs.
I am running k8s 1.2.4 on the minions and 1.2.0 on the master.
What I understood is that the kubelet creates the /var/log/containers directory and symlinks the logs of all containers running on the node into it. Fluentd then mounts the shared /var/log volume from the minion and so has access to all container logs, which it can send to Elasticsearch.
In my case /var/log/containers was created, but it is empty; even /var/lib/docker/containers does not contain any log files.
I used the following controllers and services to set up the EFK stack:
es-controller.yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: elasticsearch-logging-v1
  namespace: kube-system
  labels:
    k8s-app: elasticsearch-logging
    version: v1
    kubernetes.io/cluster-service: "true"
spec:
  replicas: 2
  selector:
    k8s-app: elasticsearch-logging
    version: v1
  template:
    metadata:
      labels:
        k8s-app: elasticsearch-logging
        version: v1
        kubernetes.io/cluster-service: "true"
    spec:
      containers:
      - image: gcr.io/google_containers/elasticsearch:v2.4.1
        name: elasticsearch-logging
        resources:
          # need more cpu upon initialization, therefore burstable class
          limits:
            cpu: 1000m
          requests:
            cpu: 100m
        ports:
        - containerPort: 9200
          name: db
          protocol: TCP
        - containerPort: 9300
          name: transport
          protocol: TCP
        volumeMounts:
        - name: es-persistent-storage
          mountPath: /data
        env:
        - name: "NAMESPACE"
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
      volumes:
      - name: es-persistent-storage
        emptyDir: {}
es-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch-logging
  namespace: kube-system
  labels:
    k8s-app: elasticsearch-logging
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: "Elasticsearch"
spec:
  ports:
  - port: 9200
    protocol: TCP
    targetPort: db
  selector:
    k8s-app: elasticsearch-logging
fluentd-es.yaml
apiVersion: v1
kind: Pod
metadata:
  name: fluentd-es-v1.20
  namespace: kube-system
  labels:
    k8s-app: fluentd-es
    version: v1.20
spec:
  containers:
  - name: fluentd-es
    image: gcr.io/google_containers/fluentd-elasticsearch:1.20
    command:
      - '/bin/sh'
      - '-c'
      - '/usr/sbin/td-agent 2>&1 >> /var/log/fluentd.log'
    resources:
      limits:
        cpu: 100m
    volumeMounts:
    - name: varlog
      mountPath: /var/log
    - name: varlibdockercontainers
      mountPath: /var/lib/docker/containers
      readOnly: true
  terminationGracePeriodSeconds: 30
  volumes:
  - name: varlog
    hostPath:
      path: /var/log
  - name: varlibdockercontainers
    hostPath:
      path: /var/lib/docker/containers
kibana-controller.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: kibana-logging
  namespace: kube-system
  labels:
    k8s-app: kibana-logging
    kubernetes.io/cluster-service: "true"
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: kibana-logging
  template:
    metadata:
      labels:
        k8s-app: kibana-logging
    spec:
      containers:
      - name: kibana-logging
        image: gcr.io/google_containers/kibana:v4.6.1
        resources:
          # keep request = limit to keep this container in guaranteed class
          limits:
            cpu: 100m
          requests:
            cpu: 100m
        env:
        - name: "ELASTICSEARCH_URL"
          value: "http://elasticsearch-logging:9200"
        ports:
        - containerPort: 5601
          name: ui
          protocol: TCP
kibana-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: kibana-logging
  namespace: kube-system
  labels:
    k8s-app: kibana-logging
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: "Kibana"
spec:
  type: NodePort
  ports:
  - port: 5601
    protocol: TCP
    targetPort: ui
  selector:
    k8s-app: kibana-logging
Update:
I changed fluentd-es.yaml as follows:
apiVersion: v1
kind: Pod
metadata:
  name: fluentd-elasticsearch
  namespace: kube-system
  labels:
    k8s-app: fluentd-logging
spec:
  containers:
  - name: fluentd-elasticsearch
    image: gcr.io/google_containers/fluentd-elasticsearch:1.15
    resources:
      limits:
        memory: 200Mi
      requests:
        cpu: 100m
        memory: 200Mi
    volumeMounts:
    - name: varlog
      mountPath: /var/log
    - name: varlibdockercontainers
      mountPath: /var/lib/docker/containers
      readOnly: true
  terminationGracePeriodSeconds: 30
  volumes:
  - name: varlog
    hostPath:
      path: /var/log
  - name: varlibdockercontainers
    hostPath:
      path: /var/lib/docker/containers
But when I run a pod named "gateway", I get the following error in the Fluentd log:
/var/log/containers/gateway-c3cuu_default_gateway-d5966a86e7cb1519329272a0b900182be81f55524227db2f524e6e23cd75ba04.log unreadable. It is excluded and would be examined next time.
Finally, I found out what was causing the issue.
When Docker is installed from the CentOS 7 repo, it is configured with the option --log-driver=journald, which forces Docker to send container log output to journald. The default behavior is to write these logs to json.log files. So the only thing I had to do was delete that option from /etc/sysconfig/docker.
