OpenShift Ansible-based operator hangs inconsistently

I have an Ansible-based operator running within an OpenShift 4.2 cluster.
Most of the time, when I apply the relevant CR, the operator runs perfectly. Occasionally, though, it hangs without reporting any further logs.
It always hangs at the same step, but only intermittently and with no other apparent contributing factors, and I am not sure how to diagnose it.
Restarting the operator always resolves the issue, but is there anything I could do to diagnose the problem and prevent it from happening altogether?
- name: allow Pods to reference images in myproject project
  k8s:
    definition:
      apiVersion: rbac.authorization.k8s.io/v1
      kind: RoleBinding
      metadata:
        name: "system:image-puller-{{ meta.name }}"
        namespace: myproject
      roleRef:
        apiGroup: rbac.authorization.k8s.io
        kind: ClusterRole
        name: system:image-puller
      subjects:
      - apiGroup: rbac.authorization.k8s.io
        kind: Group
        name: "system:serviceaccounts:{{ meta.name }}"
The operator's logs simply stop right after the above step and right before the following one:
- name: fetch some-secret
  set_fact:
    some_secret: "{{ lookup('k8s', kind='Secret', namespace='myproject', resource_name='some-secret') }}"
The output of oc describe on the operator pod is as follows:
oc describe -n openshift-operators pod my-ansible-operator-849b44d6cc-nr5st
Name:               my-ansible-operator-849b44d6cc-nr5st
Namespace:          openshift-operators
Priority:           0
PriorityClassName:  <none>
Node:               worker1.openshift.mycompany.com/10.0.8.21
Start Time:         Wed, 10 Jun 2020 22:35:45 +0100
Labels:             name=my-ansible-operator
                    pod-template-hash=849b44d6cc
Annotations:        k8s.v1.cni.cncf.io/networks-status:
                      [{
                          "name": "openshift-sdn",
                          "interface": "eth0",
                          "ips": [
                              "10.254.20.128"
                          ],
                          "default": true,
                          "dns": {}
                      }]
Status:             Running
IP:                 10.254.20.128
Controlled By:      ReplicaSet/my-ansible-operator-849b44d6cc
Containers:
  ansible:
    Container ID:  cri-o://63b86ddef4055be4bcd661a3fcd70d525f9788cb96b7af8dd383ac08ea670047
    Image:         image-registry.openshift-image-registry.svc:5000/openshift-operators/my-ansible-operator:v0.0.1
    Image ID:      image-registry.openshift-image-registry.svc:5000/openshift-operators/my-ansible-operator#sha256:fda68898e6fe0c61760fe8c50fd0a55de392e63635c5c8da47fdb081cd126b5a
    Port:          <none>
    Host Port:     <none>
    Command:
      /usr/local/bin/ao-logs
      /tmp/ansible-operator/runner
      stdout
    State:          Running
      Started:      Wed, 10 Jun 2020 22:35:56 +0100
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /tmp/ansible-operator/runner from runner (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from my-ansible-operator-token-vbwlr (ro)
  operator:
    Container ID:  cri-o://365077a3c1d83b97428d27eebf2f0735c9d670d364b16fad83fff5bb02b479fe
    Image:         image-registry.openshift-image-registry.svc:5000/openshift-operators/my-ansible-operator:v0.0.1
    Image ID:      image-registry.openshift-image-registry.svc:5000/openshift-operators/my-ansible-operator#sha256:fda68898e6fe0c61760fe8c50fd0a55de392e63635c5c8da47fdb081cd126b5a
    Port:          <none>
    Host Port:     <none>
    State:          Running
      Started:      Wed, 10 Jun 2020 22:35:57 +0100
    Ready:          True
    Restart Count:  0
    Environment:
      WATCH_NAMESPACE:    openshift-operators (v1:metadata.namespace)
      POD_NAME:           my-ansible-operator-849b44d6cc-nr5st (v1:metadata.name)
      OPERATOR_NAME:      my-ansible-operator
      ANSIBLE_GATHERING:  explicit
    Mounts:
      /tmp/ansible-operator/runner from runner (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from my-ansible-operator-token-vbwlr (ro)
Conditions:
  Type             Status
  Initialized      True
  Ready            True
  ContainersReady  True
  PodScheduled     True
Volumes:
  runner:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  my-ansible-operator-token-vbwlr:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  my-ansible-operator-token-vbwlr
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:          <none>
Is there anything else I could do to diagnose the problem further or prevent the operator from hanging occasionally?

I found a very similar issue in the operator-sdk repository, linking to the root cause in the Ansible k8s module:
Ansible 2.7 stuck on Python 3.7 in docker-ce
From the discussion in the issue, it seems the problem is related to tasks that never time out, and the current workaround is:

For now we just override ansible local connection and normal action plugins, so:
- all communicate() calls have a 60 second timeout
- all raised TimeoutExpired exceptions are retried a few times

Can you check whether this resolves your issue? As the issue is still open, you might want to follow up on the issue as well.
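Until that fix lands, one way to approximate the workaround inside your own role is to put a hard time budget on the suspect step, so a hang surfaces as a task failure on the next reconcile instead of blocking the operator forever. This is only a sketch of that idea, not the operator-sdk fix itself: it assumes your operator image ships the k8s_info module (Ansible 2.9+; on 2.8 the equivalent was k8s_facts), and the 60-second budget is arbitrary:

- name: fetch some-secret, but give up after 60 seconds
  k8s_info:
    kind: Secret
    namespace: myproject
    name: some-secret
  register: secret_result
  # run the module in the background and poll it; if it has not finished
  # within 60 seconds the task fails instead of hanging indefinitely
  async: 60
  poll: 5

- name: expose the secret under the same fact name as before
  set_fact:
    some_secret: "{{ secret_result.resources[0] }}"

A failed attempt can then be retried, for example on the next reconcile or with a retries/until loop around a wrapper task, which roughly mirrors the timeout-plus-retry behaviour described in the issue.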

Related

Microk8s Pod fails to start when I create a service for it with 1000 UDP ports

I am creating a deployment from a custom image I have in a private registry. The container has lots of ports that need to be exposed, and I want to expose them with a NodePort service. If I create a service with 1000 UDP ports and then create the deployment, the deployment's pod keeps crashing; if I delete both and then create the deployment alone, without the service, the pod starts normally.
Any clue why this would be happening?
Pod Description:
Name:           freeswitch-7764cff4c9-d8zvh
Namespace:      default
Priority:       0
Node:           cc-lab/192.168.102.55
Start Time:     Wed, 01 Jun 2022 15:44:09 +0000
Labels:         app=freeswitch
                pod-template-hash=7764cff4c9
Annotations:    cni.projectcalico.org/containerID: de4baf5c4522e1f3c746a08a60bd7166179bac6c4aef245708205112ad71058a
                cni.projectcalico.org/podIP: 10.1.5.8/32
                cni.projectcalico.org/podIPs: 10.1.5.8/32
Status:         Running
IP:             10.1.5.8
IPs:
  IP:           10.1.5.8
Controlled By:  ReplicaSet/freeswitch-7764cff4c9
Containers:
  freeswtich:
    Container ID:   containerd://9cdae9120cc075af73d57ea0759b89c153c8fd5766bc819554d82fdc674e03be
    Image:          192.168.102.55:32000/freeswitch:v2
    Image ID:       192.168.102.55:32000/freeswitch#sha256:e6a36d220f4321e3c17155a889654a83dc37b00fb9d58171f969ec2dccc0a774
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    139
      Started:      Wed, 01 Jun 2022 15:47:16 +0000
      Finished:     Wed, 01 Jun 2022 15:47:20 +0000
    Ready:          False
    Restart Count:  5
    Environment:    <none>
    Mounts:
      /etc/freeswitch from freeswitch-config (rw)
      /tmp from freeswitch-tmp (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-mwkc8 (ro)
Conditions:
  Type             Status
  Initialized      True
  Ready            False
  ContainersReady  False
  PodScheduled     True
Volumes:
  freeswitch-config:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  freeswitch-config
    ReadOnly:   false
  freeswitch-tmp:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  freeswitch-tmp
    ReadOnly:   false
  kube-api-access-mwkc8:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason   Age                   From     Message
  ----     ------   ----                  ----     -------
  Normal   Pulled   4m3s (x5 over 5m44s)  kubelet  Container image "192.168.102.55:32000/freeswitch:v2" already present on machine
  Normal   Created  4m3s (x5 over 5m43s)  kubelet  Created container freeswtich
  Normal   Started  4m3s (x5 over 5m43s)  kubelet  Started container freeswtich
  Warning  BackOff  41s (x24 over 5m35s)  kubelet  Back-off restarting failed container
Service:
apiVersion: v1
kind: Service
metadata:
  name: freeswitch
spec:
  type: NodePort
  selector:
    app: freeswitch
  ports:
  - port: 30000
    nodePort: 30000
    name: rtp30000
    protocol: UDP
  - port: 30001
    nodePort: 30001
    name: rtp30001
    protocol: UDP
  - port: 30002
    nodePort: 30002
    name: rtp30002
    protocol: UDP
  - port: 30003
    nodePort: 30003
    name: rtp30003
    protocol: UDP
  - port: 30004...... this goes on for port 30999
Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: freeswitch
spec:
  selector:
    matchLabels:
      app: freeswitch
  template:
    metadata:
      labels:
        app: freeswitch
    spec:
      containers:
      - name: freeswtich
        image: 192.168.102.55:32000/freeswitch:v2
        imagePullPolicy: IfNotPresent
        volumeMounts:
        - name: freeswitch-config
          mountPath: /etc/freeswitch
        - name: freeswitch-tmp
          mountPath: /tmp
      restartPolicy: Always
      volumes:
      - name: freeswitch-config
        persistentVolumeClaim:
          claimName: freeswitch-config
      - name: freeswitch-tmp
        persistentVolumeClaim:
          claimName: freeswitch-tmp
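One thing worth ruling out, since the pod only crashes while the service exists: for every service port, Kubernetes injects several docker-link-style environment variables into all pods in the service's namespace, so a 1000-port service adds thousands of variables to the container's environment, which some entrypoints and programs cannot handle (exit code 139 is a segfault). That injection can be disabled per pod with enableServiceLinks: false; a hedged sketch against the deployment above:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: freeswitch
spec:
  selector:
    matchLabels:
      app: freeswitch
  template:
    metadata:
      labels:
        app: freeswitch
    spec:
      # don't inject one set of env vars per service port into the container;
      # with a 1000-port service the injected environment gets enormous
      enableServiceLinks: false
      containers:
      - name: freeswtich
        image: 192.168.102.55:32000/freeswitch:v2

If the pod then starts cleanly with the service present, the injected environment was the culprit.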

container "sonarqube" in pod "sonar-574d99bfb5-dr8nx" is waiting to start: CreateContainerConfigError

I am facing a problem with my Sonar setup. I've been trying to set it up, but I get this error from kubectl logs sonar-574d99bfb5-dr8nx -n sonar: container "sonarqube" in pod "sonar-574d99bfb5-dr8nx" is waiting to start: CreateContainerConfigError.
When I run kubectl describe pod sonar-574d99bfb5-dr8nx -n sonar
I get this:
Name:           sonar-574d99bfb5-dr8nx
Namespace:      sonar
Priority:       0
Node:           master01/192.168.137.136
Start Time:     Tue, 22 Mar 2022 20:30:16 +0000
Labels:         app=sonar
                pod-template-hash=574d99bfb5
Annotations:    cni.projectcalico.org/containerID: 734ba33acb9e2c007861112ffe7c1fce84fa3a434494a0df6951a7b4b6b8dacb
                cni.projectcalico.org/podIP: 10.42.241.105/32
                cni.projectcalico.org/podIPs: 10.42.241.105/32
Status:         Pending
IP:             10.42.241.105
IPs:
  IP:           10.42.241.105
Controlled By:  ReplicaSet/sonar-574d99bfb5
Containers:
  sonarqube:
    Container ID:
    Image:          sonarqube:latest
    Image ID:
    Port:           9000/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       CreateContainerConfigError
    Ready:          False
    Restart Count:  0
    Limits:
      memory:  2Gi
    Requests:
      memory:  1Gi
    Environment Variables from:
      sonar-config  ConfigMap  Optional: false
    Environment:    <none>
    Mounts:
      /opt/sonarqube/data/ from app-pvc (rw,path="data")
      /opt/sonarqube/extensions/ from app-pvc (rw,path="extensions")
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-q22lb (ro)
Conditions:
  Type             Status
  Initialized      True
  Ready            False
  ContainersReady  False
  PodScheduled     True
Volumes:
  app-pvc:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  sonar-pvc
    ReadOnly:   false
  kube-api-access-q22lb:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  12m                   default-scheduler  Successfully assigned sonar/sonar-574d99bfb5-dr8nx to master01
  Warning  Failed     10m (x12 over 12m)    kubelet            Error: stat /home/mtst/data-sonar-pvc: no such file or directory
  Normal   Pulled     2m24s (x50 over 12m)  kubelet            Container image "sonarqube:latest" already present on machine
Here's my PV/PVC YAML:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: sonar-pv
  namespace: sonar
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 3Gi
  accessModes:
  - ReadWriteOnce
  hostPath:
    path: "/home/mtst/data-sonar-pvc"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: sonar-pvc
  namespace: sonar
  labels:
    type: local
spec:
  storageClassName: manual
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
If there's anything that can help me resolve my issue, I would appreciate it.
Thank you.
I had the same issue with the AWX Postgres deployment.
Credit goes to "Kubernetes 1.17.2 Rancher 2.3.5 CreateContainerConfigError: stat no such file or directory but the directory IS there" for resolving the issue for me.
Running a Rancher Kubernetes cluster and wanting a custom PV, I needed to add the type field below to my PersistentVolume definition:
hostPath:
  path: "/home/mtst/data-sonar-pvc"
  type: "DirectoryOrCreate"

Kubernetes giving CrashLoopBackOff error while running Packetbeat in a Kubernetes cluster

I'm trying to deploy Packetbeat as a DaemonSet on a Kubernetes cluster, but Kubernetes gives a CrashLoopBackOff error while running Packetbeat. I have checked the Packetbeat pod logs; they are below.
2020-08-23T14:28:00.054Z INFO instance/beat.go:475 Beat UUID: 69d32e5f-c8f2-41bf-9242-48435688c540
2020-08-23T14:28:00.054Z INFO instance/beat.go:213 Setup Beat: packetbeat; Version: 6.2.4
2020-08-23T14:28:00.061Z INFO add_cloud_metadata/add_cloud_metadata.go:301 add_cloud_metadata: hosting provider type detected as ec2, metadata={"availability_zone":"us-east-1f","instance_id":"i-05b8121af85c94236","machine_type":"t2.medium","provider":"ec2","region":"us-east-1"}
2020-08-23T14:28:00.061Z INFO kubernetes/watcher.go:77 kubernetes: Performing a pod sync
2020-08-23T14:28:00.074Z INFO kubernetes/watcher.go:108 kubernetes: Pod sync done
2020-08-23T14:28:00.074Z INFO elasticsearch/client.go:145 Elasticsearch url: http://elasticsearch:9200
2020-08-23T14:28:00.074Z INFO kubernetes/watcher.go:140 kubernetes: Watching API for pod events
2020-08-23T14:28:00.074Z INFO pipeline/module.go:76 Beat name: ip-172-31-72-117
2020-08-23T14:28:00.075Z INFO procs/procs.go:78 Process matching disabled
2020-08-23T14:28:00.076Z INFO [monitoring] log/log.go:97 Starting metrics logging every 30s
2020-08-23T14:28:00.076Z INFO elasticsearch/client.go:145 Elasticsearch url: http://elasticsearch:9200
2020-08-23T14:28:00.083Z WARN transport/tcp.go:36 DNS lookup failure "elasticsearch": lookup elasticsearch on 172.31.0.2:53: no such host
2020-08-23T14:28:00.083Z ERROR elasticsearch/elasticsearch.go:165 Error connecting to Elasticsearch at http://elasticsearch:9200: Get http://elasticsearch:9200: lookup elasticsearch on 172.31.0.2:53: no such host
2020-08-23T14:28:00.085Z INFO [monitoring] log/log.go:132 Total non-zero metrics {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":20,"time":28},"total":{"ticks":160,"time":176,"value":160},"user":{"ticks":140,"time":148}},"info":{"ephemeral_id":"70e07383-3aae-4bc1-a6e1-540a6cfa8ad8","uptime":{"ms":35}},"memstats":{"gc_next":26511344,"memory_alloc":21723000,"memory_total":23319008,"rss":51834880}},"libbeat":{"config":{"module":{"running":0}},"output":{"type":"elasticsearch"},"pipeline":{"clients":5,"events":{"active":0}}},"system":{"cpu":{"cores":2},"load":{"1":0.11,"15":0.1,"5":0.14,"norm":{"1":0.055,"15":0.05,"5":0.07}}}}}}
2020-08-23T14:28:00.085Z INFO [monitoring] log/log.go:133 Uptime: 37.596889ms
2020-08-23T14:28:00.085Z INFO [monitoring] log/log.go:110 Stopping metrics logging.
2020-08-23T14:28:00.085Z ERROR instance/beat.go:667 Exiting: Error importing Kibana dashboards: fail to create the Elasticsearch loader: Error creating Elasticsearch client: Couldn't connect to any of the configured Elasticsearch hosts. Errors: [Error connection to Elasticsearch http://elasticsearch:9200: Get http://elasticsearch:9200: lookup elasticsearch on 172.31.0.2:53: no such host]
Exiting: Error importing Kibana dashboards: fail to create the Elasticsearch loader: Error creating Elasticsearch client: Couldn't connect to any of the configured Elasticsearch hosts. Errors: [Error connection to Elasticsearch http://elasticsearch:9200: Get http://elasticsearch:9200: lookup elasticsearch on 172.31.0.2:53: no such host]
Here is packetbeat.yaml:
apiVersion: v1
kind: ConfigMap
metadata:
  name: packetbeat-dynamic-config
  namespace: kube-system
  labels:
    k8s-app: packetbeat-dynamic
    kubernetes.io/cluster-service: "true"
data:
  packetbeat.yml: |-
    setup.dashboards.enabled: true
    setup.template.enabled: true
    setup.template.settings:
      index.number_of_shards: 2
    packetbeat.interfaces.device: any
    packetbeat.protocols:
    - type: dns
      ports: [53]
      include_authorities: true
      include_additionals: true
    - type: http
      ports: [80, 8000, 8080, 9200]
    - type: mysql
      ports: [3306]
    - type: redis
      ports: [6379]
    packetbeat.flows:
      timeout: 30s
      period: 10s
    processors:
      - add_cloud_metadata:
      - add_kubernetes_metadata:
          host: ${HOSTNAME}
          indexers:
          - ip_port:
          matchers:
          - field_format:
              format: '%{[ip]}:%{[port]}'
    cloud.id: ${ELASTIC_CLOUD_ID}
    cloud.auth: ${ELASTIC_CLOUD_AUTH}
    #setup.kibana.host: kibana:5601
    setup.ilm.overwrite: true
    output.elasticsearch:
      hosts: ['${ELASTICSEARCH_HOST:elasticsearch}:${ELASTICSEARCH_PORT:9200}']
      username: ${ELASTICSEARCH_USERNAME}
      password: ${ELASTICSEARCH_PASSWORD}
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: packetbeat-dynamic
  namespace: kube-system
  labels:
    k8s-app: packetbeat-dynamic
    kubernetes.io/cluster-service: "true"
spec:
  selector:
    matchLabels:
      k8s-app: packetbeat-dynamic
      kubernetes.io/cluster-service: "true"
  template:
    metadata:
      labels:
        k8s-app: packetbeat-dynamic
        kubernetes.io/cluster-service: "true"
    spec:
      serviceAccountName: packetbeat-dynamic
      terminationGracePeriodSeconds: 30
      hostNetwork: true
      containers:
      - name: packetbeat-dynamic
        image: docker.elastic.co/beats/packetbeat:6.2.4
        imagePullPolicy: Always
        args: [
          "-c", "/etc/packetbeat.yml",
          "-e",
        ]
        securityContext:
          runAsUser: 0
          capabilities:
            add:
            - NET_ADMIN
        env:
        - name: ELASTICSEARCH_HOST
          value: elasticsearch
        - name: ELASTICSEARCH_PORT
          value: "9200"
        - name: ELASTICSEARCH_USERNAME
          value: elastic
        - name: ELASTICSEARCH_PASSWORD
          value: changeme
        - name: CLOUD_ID
          value:
        - name: ELASTIC_CLOUD_AUTH
          value:
        - name: KIBANA_HOST
          value: kibana
        - name: KIBANA_PORT
          value: "5601"
        volumeMounts:
        - name: config
          mountPath: /etc/packetbeat.yml
          readOnly: true
          subPath: packetbeat.yml
        - name: data
          mountPath: /usr/share/packetbeat/data
      volumes:
      - name: config
        configMap:
          defaultMode: 0600
          name: packetbeat-dynamic-config
      - name: data
        emptyDir: {}
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: packetbeat-dynamic
subjects:
- kind: ServiceAccount
  name: packetbeat-dynamic
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: packetbeat-dynamic
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: packetbeat-dynamic
  labels:
    k8s-app: packetbeat-dynamic
rules:
- apiGroups: [""] # "" indicates the core API group
  resources:
  - namespaces
  - pods
  verbs:
  - get
  - watch
  - list
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: packetbeat-dynamic
  namespace: kube-system
  labels:
    k8s-app: packetbeat-dynamic
Could anyone suggest how to resolve this issue? Any helpful links would also be appreciated. Here is the DaemonSet description:
kubectl describe daemonset packetbeat-dynamic -n kube-system
Name:           packetbeat-dynamic
Selector:       k8s-app=packetbeat-dynamic,kubernetes.io/cluster-service=true
Node-Selector:  <none>
Labels:         k8s-app=packetbeat-dynamic
                kubernetes.io/cluster-service=true
Annotations:    deprecated.daemonset.template.generation: 1
Desired Number of Nodes Scheduled: 1
Current Number of Nodes Scheduled: 1
Number of Nodes Scheduled with Up-to-date Pods: 1
Number of Nodes Scheduled with Available Pods: 0
Number of Nodes Misscheduled: 1
Pods Status:  2 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           k8s-app=packetbeat-dynamic
                    kubernetes.io/cluster-service=true
  Service Account:  packetbeat-dynamic
  Containers:
   packetbeat-dynamic:
    Image:      docker.elastic.co/beats/packetbeat:6.2.4
    Port:       <none>
    Host Port:  <none>
    Args:
      -c
      /etc/packetbeat.yml
      -e
    Environment:
      ELASTICSEARCH_HOST:      elasticsearch
      ELASTICSEARCH_PORT:      9200
      ELASTICSEARCH_USERNAME:  elastic
      ELASTICSEARCH_PASSWORD:  changeme
      CLOUD_ID:
      ELASTIC_CLOUD_AUTH:
      KIBANA_HOST:             kibana
      KIBANA_PORT:             5601
    Mounts:
      /etc/packetbeat.yml from config (ro,path="packetbeat.yml")
      /usr/share/packetbeat/data from data (rw)
  Volumes:
   config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      packetbeat-dynamic-config
    Optional:  false
   data:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
Events:         <none>
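A hint sits in the log line lookup elasticsearch on 172.31.0.2:53: the DaemonSet runs with hostNetwork: true, so with the default dnsPolicy the pod resolves names through the node's resolver (here the AWS VPC DNS) rather than through cluster DNS, and the bare service name elasticsearch can never resolve there. Two hedged options, assuming Elasticsearch runs as a Service inside the cluster: set dnsPolicy: ClusterFirstWithHostNet in the pod spec, or point ELASTICSEARCH_HOST at the service's fully qualified name. A sketch of both (the service's namespace below is an assumption):

spec:
  template:
    spec:
      hostNetwork: true
      # with hostNetwork: true the default policy uses the node's DNS;
      # this makes the pod resolve through cluster DNS again
      dnsPolicy: ClusterFirstWithHostNet
      containers:
      - name: packetbeat-dynamic
        env:
        - name: ELASTICSEARCH_HOST
          # alternative: use the service's FQDN (namespace assumed)
          value: elasticsearch.default.svc.cluster.local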

kubernetes statefulset-controller privileged init containers for elasticsearch

I'm trying to create an Elasticsearch StatefulSet (STS) with init containers that raise the worker nodes' vm.max_map_count to 262144 and the open-file limit (ulimit -n) to 65536.
However, a PodSecurityPolicy (PSP) seems to be denying the privileged containers, from what I can tell.
Warning FailedCreate 1s (x12 over 11s) statefulset-controller
create Pod elasticsearch-node-0 in StatefulSet elasticsearch-node
failed error: pods "elasticsearch-node-0" is forbidden: unable to
validate against any pod security policy:
[spec.initContainers[0].securityContext.privileged: Invalid value:
true: Privileged containers are not allowed
spec.initContainers[1].securityContext.privileged: Invalid value:
true: Privileged containers are not allowed]
There are in fact two PSPs in the cluster, privileged and unprivileged. Do I need to reference the privileged PSP in the STS somehow? Or via a service account?
The Kubernetes server version is 1.9.8, if it matters.
This is the STS (with some Helm templating):
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch-node
  namespace: {{ .Release.Namespace }}
  labels:
    component: elasticsearch
    role: node
spec:
  replicas: {{ .Values.replicas }}
  serviceName: elasticsearch-discovery
  selector:
    matchLabels:
      component: elasticsearch
      role: node
  template:
    metadata:
      namespace: {{ .Release.Namespace }}
      labels:
        component: elasticsearch
        role: node
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: component
                operator: In
                values:
                - elasticsearch
              - key: role
                operator: In
                values:
                - node
            topologyKey: kubernetes.io/hostname
      terminationGracePeriodSeconds: 100
      securityContext:
        fsGroup: 1000
      initContainers:
      # To increase the default vm.max_map_count to 262144
      - name: increase-vm-max-map-count
        image: busybox
        command:
        - sysctl
        - -w
        - vm.max_map_count=262144
        securityContext:
          privileged: true
      # To increase the ulimit to 65536
      - name: increase-ulimit
        image: busybox
        command:
        - sh
        - -c
        - ulimit -n 65536
        securityContext:
          privileged: true
      containers:
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch:{{ .Values.global.version }}
        imagePullPolicy: Always
        ports:
        - name: http
          containerPort: 9200
        - name: transport
          containerPort: 9300
        volumeMounts:
        # - name: storage
        #   mountPath: /data
        - name: config
          mountPath: /usr/share/elasticsearch/config/elasticsearch.yml
          subPath: elasticsearch.yml
        resources:
{{ toYaml .Values.resources | indent 12 }}
        env:
        - name: ES_JAVA_OPTS
          value: {{ .Values.java.options }}
      volumes:
      - name: config
        configMap:
          name: elasticsearch-node
$ kubectl describe sts elasticsearch-node
Name:               elasticsearch-node
Namespace:          default
CreationTimestamp:  Tue, 12 Nov 2019 17:09:50 +0100
Selector:           component=elasticsearch,role=node
Labels:             component=elasticsearch
                    role=node
Annotations:        <none>
Replicas:           2 desired | 0 total
Update Strategy:    RollingUpdate
  Partition:        824638159384
Pods Status:        0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:  component=elasticsearch
           role=node
  Init Containers:
   increase-vm-max-map-count:
    Image:      busybox
    Port:       <none>
    Host Port:  <none>
    Command:
      sysctl
      -w
      vm.max_map_count=262144
    Environment:  <none>
    Mounts:       <none>
   increase-ulimit:
    Image:      busybox
    Port:       <none>
    Host Port:  <none>
    Command:
      sh
      -c
      ulimit -n 65536
    Environment:  <none>
    Mounts:       <none>
  Containers:
   elasticsearch:
    Image:       docker.elastic.co/elasticsearch/elasticsearch:7.3.2
    Ports:       9200/TCP, 9300/TCP
    Host Ports:  0/TCP, 0/TCP
    Limits:
      cpu:     1
      memory:  3Gi
    Requests:
      cpu:     250m
      memory:  2Gi
    Environment:
      ES_JAVA_OPTS:  -Xms2G -Xmx2G
    Mounts:
      /usr/share/elasticsearch/config/elasticsearch.yml from config (rw,path="elasticsearch.yml")
  Volumes:
   config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      elasticsearch-node
Volume Claims:  <none>
Events:
  Type     Reason        Age                From                    Message
  ----     ------        ----               ----                    -------
  Warning  FailedCreate  1s (x17 over 78s)  statefulset-controller  create Pod elasticsearch-node-0 in StatefulSet elasticsearch-node failed error: pods "elasticsearch-node-0" is forbidden: unable to validate against any pod security policy: [spec.initContainers[0].securityContext.privileged: Invalid value: true: Privileged containers are not allowed spec.initContainers[1].securityContext.privileged: Invalid value: true: Privileged containers are not allowed]
I've been staring at the PSP docs for some time now: https://kubernetes.io/docs/concepts/policy/pod-security-policy/
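In case it helps: a PSP is not referenced from the StatefulSet spec at all. The admission controller admits the pod if the creating controller or the pod's service account is authorized for the use verb on a suitable policy via RBAC, so the usual fix is to grant your workload's service account use of the privileged PSP. A minimal sketch, where the policy name privileged and the namespace's default service account are both assumptions to substitute (and on 1.9 the PSP API group may still be extensions rather than policy):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: use-privileged-psp
rules:
- apiGroups: ["policy", "extensions"]  # older clusters expose PSPs under "extensions"
  resources: ["podsecuritypolicies"]
  resourceNames: ["privileged"]        # assumed PSP name; use your actual policy
  verbs: ["use"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: elasticsearch-use-privileged-psp
  namespace: {{ .Release.Namespace }}
subjects:
- kind: ServiceAccount
  name: default   # or a dedicated SA set via serviceAccountName in the STS
  namespace: {{ .Release.Namespace }}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: use-privileged-psp

If you move the pods to a dedicated service account via serviceAccountName, bind that account instead of default.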

Fluentd capture when a Kubernetes Pod terminates with 'CrashLoopBackOff'?

I'm running a pod that writes a simple message to its terminationMessagePath and then exits with CrashLoopBackOff. I would like to be able to debug through Kibana instead of having to log in to each Kubernetes node. I queried Kibana for the container's last-state reason and message values (e.g. "CrashLoopBackOff") and could not locate an entry.
I can see the fields for the pod in Kibana, but the fields I'm looking for (in bold in the YAML below) are empty.
What configuration is needed in Fluentd to get these values from the Kubernetes pod? Or does something need to be configured in Kubernetes itself?
$ kubectl get pod_name_1 -o=yaml
    terminationMessagePath: /var/log/containers/dt.log
    volumeMounts:
    - mountPath: /var/log/containers
      name: data
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-s0w2n
      readOnly: true
  dnsPolicy: ClusterFirst
  nodeName: dev-master-01
  restartPolicy: Always
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  volumes:
  - hostPath:
      path: /var/log/containers
    name: data
  - name: default-token-s0w2n
    secret:
      defaultMode: 420
      secretName: default-token-s0w2n
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: 2017-07-05T14:45:11Z
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: 2017-07-05T17:00:22Z
    message: 'containers with unready status: [dt-termination-demo]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: 2017-07-05T14:45:11Z
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://9649c26527cf0e1cd3bd67ba9c606c0b78e6b4f08bacf96175627ddc7d250772
    image: debian
    imageID: docker pullable://docker.io/debian#sha256:7d067f77d2ae5a23fe6920f8fbc2936c4b0d417e9d01b26372561860750815f0
    lastState:
      terminated:
        containerID: docker://9649c26527cf0e1cd3bd67ba9c606c0b78e6b4f08bacf96175627ddc7d250772
        exitCode: 0
        finishedAt: 2017-07-05T17:00:22Z
        **message: |
          Sleep expired**
        reason: Completed
        startedAt: 2017-07-05T17:00:12Z
    name: dt-termination-demo
    ready: false
    restartCount: 30
    state:
      waiting:
        message: Back-off 5m0s restarting failed container=dt-termination-demo pod=dt-termination-demo-2814930607-8kshj_default(8c247b15-6190-11e7-acb7-00505691210d)
        **reason: CrashLoopBackOff**
  hostIP: 192.21.19.128
  phase: Running
  podIP: 10.0.0.8
  startTime: 2017-07-05T14:45:11Z
When Fluentd is deployed as a DaemonSet, it aims to collect all logs from the node and its pods. As a guide to accomplishing this, please check the following YAML file and the repository it belongs to:
https://github.com/fluent/fluentd-kubernetes-daemonset/blob/master/fluentd-daemonset-elasticsearch.yaml
https://github.com/fluent/fluentd-kubernetes-daemonset
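In outline, that manifest runs one Fluentd pod per node, tails the node's log directories (container stdout/stderr lands under /var/log/containers), and ships everything to Elasticsearch. A heavily abridged sketch; the image tag, labels, and Elasticsearch host below are placeholders taken as assumptions, so refer to the linked file for the real values, RBAC, and tolerations:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: fluentd-logging
  template:
    metadata:
      labels:
        k8s-app: fluentd-logging
    spec:
      containers:
      - name: fluentd
        image: fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch
        env:
        - name: FLUENT_ELASTICSEARCH_HOST  # your Elasticsearch service (placeholder)
          value: "elasticsearch-logging"
        - name: FLUENT_ELASTICSEARCH_PORT
          value: "9200"
        volumeMounts:
        - name: varlog
          mountPath: /var/log  # container logs live under /var/log/containers
      volumes:
      - name: varlog
        hostPath:
          path: /var/log

One caveat relevant to the question: fields such as reason: CrashLoopBackOff live in the pod's status object in the Kubernetes API, not in any log file on the node, so a log-tailing DaemonSet alone will not index them. What you can expect in Kibana is whatever the container itself writes out, including a termination message if it is also written to a path that Fluentd tails.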
If you need additional assistance you can also join our Slack channel:
http://slack.fluentd.org
