websockets on GKE with istio gives 'no healthy upstream' and 'CrashLoopBackOff' - socket.io

I am on GKE using Istio version 1.0.3 . I try to get my express.js with socket.io (and uws engine) backend working with websockets and had this backend running before on a 'non kubernetes server' with websockets without problems.
When I simply enter the external_gke_ip as url I get my backend html page, so http works. But when my client-app makes socketio authentication calls from my client-app I get 503 errors in the browser console:
WebSocket connection to 'ws://external_gke_ip/socket.io/?EIO=3&transport=websocket' failed: Error during WebSocket handshake: Unexpected response code: 503
And when I enter the external_gke_ip as url while socket calls are made I get: no healthy upstream in the browser. And the pod gives: CrashLoopBackOff.
I find somewhere: 'in node.js land, socket.io typically does a few non-websocket Handshakes to the Server before eventually upgrading to Websockets. If you don't have sticky-sessions, the upgrade never works.' So maybe I need sticky sessions? Or not... as I just have one replica of my app? It seems to be done by setting sessionAffinity: ClientIP, but with istio I do not know how to do this and in the GUI I can edit some values of the loadbalancers, but Session affinity shows 'none' and I can not edit it.
Other settings that might be relevant and that I am not sure of (how to set using istio) are:
externalTrafficPolicy=Local
Ttl
My manifest config file:
apiVersion: v1
kind: Service
metadata:
name: myapp
labels:
app: myapp
spec:
selector:
app: myapp
ports:
- port: 8089
targetPort: 8089
protocol: TCP
name: http
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp
labels:
app: myapp
spec:
selector:
matchLabels:
app: myapp
template:
metadata:
labels:
app: myapp
spec:
containers:
- name: app
image: gcr.io/myproject/firstapp:v1
imagePullPolicy: Always
ports:
- containerPort: 8089
env:
- name: POSTGRES_DB_HOST
value: 127.0.0.1:5432
- name: POSTGRES_DB_USER
valueFrom:
secretKeyRef:
name: mysecret
key: username
- name: POSTGRES_DB_PASSWORD
valueFrom:
secretKeyRef:
name: mysecret
key: password
readinessProbe:
httpGet:
path: /healthz
scheme: HTTP
port: 8089
initialDelaySeconds: 10
timeoutSeconds: 5
- name: cloudsql-proxy
image: gcr.io/cloudsql-docker/gce-proxy:1.11
command: ["/cloud_sql_proxy",
"-instances=myproject:europe-west4:osm=tcp:5432",
"-credential_file=/secrets/cloudsql/credentials.json"]
securityContext:
runAsUser: 2
allowPrivilegeEscalation: false
volumeMounts:
- name: cloudsql-instance-credentials
mountPath: /secrets/cloudsql
readOnly: true
volumes:
- name: cloudsql-instance-credentials
secret:
secretName: cloudsql-instance-credentials
---
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
name: myapp-gateway
spec:
selector:
istio: ingressgateway
servers:
- port:
number: 80
name: http
protocol: HTTP
hosts:
- "*"
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: myapp
spec:
hosts:
- "*"
gateways:
- myapp-gateway
http:
- match:
- uri:
prefix: /
route:
- destination:
host: myapp
weight: 100
websocketUpgrade: true
---
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
name: google-apis
spec:
hosts:
- "*.googleapis.com"
ports:
- number: 443
name: https
protocol: HTTPS
location: MESH_EXTERNAL
---
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
name: cloud-sql-instance
spec:
hosts:
- 35.204.XXX.XX # ip of cloudsql database
ports:
- name: tcp
number: 3307
protocol: TCP
location: MESH_EXTERNAL
Various output (while making socket calls, when I stop these the deployment restarts and READY returns to 3/3):
kubectl get pods
NAME READY STATUS RESTARTS AGE
myapp-8888 2/3 CrashLoopBackOff 11 1h
$ kubectl describe pod/myapp-8888 gives:
Name: myapp-8888
Namespace: default
Node: gke-standard-cluster-1-default-pool-888888-9vtk/10.164.0.36
Start Time: Sat, 19 Jan 2019 14:33:11 +0100
Labels: app=myapp
pod-template-hash=207157
Annotations:
kubernetes.io/limit-ranger:
LimitRanger plugin set: cpu request for container app; cpu request for container cloudsql-proxy
sidecar.istio.io/status:
{"version":"3c9617ff82c9962a58890e4fa987c69ca62487fda71c23f3a2aad1d7bb46c748","initContainers":["istio-init"],"containers":["istio-proxy"]...
Status: Running
IP: 10.44.0.5
Controlled By: ReplicaSet/myapp-64c59c94dc
Init Containers:
istio-init:
Container ID: docker://a417695f99509707d0f4bfa45d7d491501228031996b603c22aaf398551d1e45
Image: gcr.io/gke-release/istio/proxy_init:1.0.2-gke.0
Image ID: docker-pullable://gcr.io/gke-release/istio/proxy_init#sha256:e30d47d2f269347a973523d0c5d7540dbf7f87d24aca2737ebc09dbe5be53134
Port: <none>
Host Port: <none>
Args:
-p
15001
-u
1337
-m
REDIRECT
-i
*
-x
-b
8089,
-d
State: Terminated
Reason: Completed
Exit Code: 0
Started: Sat, 19 Jan 2019 14:33:19 +0100
Finished: Sat, 19 Jan 2019 14:33:19 +0100
Ready: True
Restart Count: 0
Environment: <none>
Mounts: <none>
Containers:
app:
Container ID: docker://888888888888888888888888
Image: gcr.io/myproject/firstapp:v1
Image ID: docker-pullable://gcr.io/myproject/firstapp#sha256:8888888888888888888888888
Port: 8089/TCP
Host Port: 0/TCP
State: Terminated
Reason: Completed
Exit Code: 0
Started: Sat, 19 Jan 2019 14:40:14 +0100
Finished: Sat, 19 Jan 2019 14:40:37 +0100
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Sat, 19 Jan 2019 14:39:28 +0100
Finished: Sat, 19 Jan 2019 14:39:46 +0100
Ready: False
Restart Count: 3
Requests:
cpu: 100m
Readiness: http-get http://:8089/healthz delay=10s timeout=5s period=10s #success=1 #failure=3
Environment:
POSTGRES_DB_HOST: 127.0.0.1:5432
POSTGRES_DB_USER: <set to the key 'username' in secret 'mysecret'> Optional: false
POSTGRES_DB_PASSWORD: <set to the key 'password' in secret 'mysecret'> Optional: false
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-rclsf (ro)
cloudsql-proxy:
Container ID: docker://788888888888888888888888888
Image: gcr.io/cloudsql-docker/gce-proxy:1.11
Image ID: docker-pullable://gcr.io/cloudsql-docker/gce-proxy#sha256:5c690349ad8041e8b21eaa63cb078cf13188568e0bfac3b5a914da3483079e2b
Port: <none>
Host Port: <none>
Command:
/cloud_sql_proxy
-instances=myproject:europe-west4:osm=tcp:5432
-credential_file=/secrets/cloudsql/credentials.json
State: Running
Started: Sat, 19 Jan 2019 14:33:40 +0100
Ready: True
Restart Count: 0
Requests:
cpu: 100m
Environment: <none>
Mounts:
/secrets/cloudsql from cloudsql-instance-credentials (ro)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-rclsf (ro)
istio-proxy:
Container ID: docker://f3873d0f69afde23e85d6d6f85b1f
Image: gcr.io/gke-release/istio/proxyv2:1.0.2-gke.0
Image ID: docker-pullable://gcr.io/gke-release/istio/proxyv2#sha256:826ef4469e4f1d4cabd0dc846
Port: <none>
Host Port: <none>
Args:
proxy
sidecar
--configPath
/etc/istio/proxy
--binaryPath
/usr/local/bin/envoy
--serviceCluster
myapp
--drainDuration
45s
--parentShutdownDuration
1m0s
--discoveryAddress
istio-pilot.istio-system:15007
--discoveryRefreshDelay
1s
--zipkinAddress
zipkin.istio-system:9411
--connectTimeout
10s
--statsdUdpAddress
istio-statsd-prom-bridge.istio-system:9125
--proxyAdminPort
15000
--controlPlaneAuthPolicy
NONE
State: Running
Started: Sat, 19 Jan 2019 14:33:54 +0100
Ready: True
Restart Count: 0
Requests:
cpu: 10m
Environment:
POD_NAME: myapp-64c59c94dc-8888 (v1:metadata.name)
POD_NAMESPACE: default (v1:metadata.namespace)
INSTANCE_IP: (v1:status.podIP)
ISTIO_META_POD_NAME: myapp-64c59c94dc-8888 (v1:metadata.name)
ISTIO_META_INTERCEPTION_MODE: REDIRECT
Mounts:
/etc/certs/ from istio-certs (ro)
/etc/istio/proxy from istio-envoy (rw)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
cloudsql-instance-credentials:
Type: Secret (a volume populated by a Secret)
SecretName: cloudsql-instance-credentials
Optional: false
default-token-rclsf:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-rclsf
Optional: false
istio-envoy:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
istio-certs:
Type: Secret (a volume populated by a Secret)
SecretName: istio.default
Optional: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 7m31s default-scheduler Successfully assigned myapp-64c59c94dc-tdb9c to gke-standard-cluster-1-default-pool-65b9e650-9vtk
Normal SuccessfulMountVolume 7m31s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk MountVolume.SetUp succeeded for volume "istio-envoy"
Normal SuccessfulMountVolume 7m31s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk MountVolume.SetUp succeeded for volume "cloudsql-instance-credentials"
Normal SuccessfulMountVolume 7m31s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk MountVolume.SetUp succeeded for volume "default-token-rclsf"
Normal SuccessfulMountVolume 7m31s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk MountVolume.SetUp succeeded for volume "istio-certs"
Normal Pulling 7m30s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk pulling image "gcr.io/gke-release/istio/proxy_init:1.0.2-gke.0"
Normal Pulled 7m25s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk Successfully pulled image "gcr.io/gke-release/istio/proxy_init:1.0.2-gke.0"
Normal Created 7m24s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk Created container
Normal Started 7m23s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk Started container
Normal Pulling 7m4s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk pulling image "gcr.io/cloudsql-docker/gce-proxy:1.11"
Normal Pulled 7m3s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk Successfully pulled image "gcr.io/cloudsql-docker/gce-proxy:1.11"
Normal Started 7m2s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk Started container
Normal Pulling 7m2s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk pulling image "gcr.io/gke-release/istio/proxyv2:1.0.2-gke.0"
Normal Created 7m2s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk Created container
Normal Pulled 6m54s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk Successfully pulled image "gcr.io/gke-release/istio/proxyv2:1.0.2-gke.0"
Normal Created 6m51s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk Created container
Normal Started 6m48s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk Started container
Normal Pulling 111s (x2 over 7m22s) kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk pulling image "gcr.io/myproject/firstapp:v3"
Normal Created 110s (x2 over 7m4s) kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk Created container
Normal Started 110s (x2 over 7m4s) kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk Started container
Normal Pulled 110s (x2 over 7m7s) kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk Successfully pulled image "gcr.io/myproject/firstapp:v3"
Warning Unhealthy 99s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk Readiness probe failed: HTTP probe failed with statuscode: 503
Warning BackOff 85s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk Back-off restarting failed container
And:
$ kubectl logs myapp-8888 myapp
> api_server#0.0.0 start /usr/src/app
> node src/
info: Feathers application started on http://localhost:8089
And the database logs (which looks ok, as some 'startup script entries' from app can be retrieved using psql):
$ kubectl logs myapp-8888 cloudsql-proxy
2019/01/19 13:33:40 using credential file for authentication; email=proxy-user#myproject.iam.gserviceaccount.com
2019/01/19 13:33:40 Listening on 127.0.0.1:5432 for myproject:europe-west4:osm
2019/01/19 13:33:40 Ready for new connections
2019/01/19 13:33:54 New connection for "myproject:europe-west4:osm"
2019/01/19 13:33:55 couldn't connect to "myproject:europe-west4:osm": Post https://www.googleapis.com/sql/v1beta4/projects/myproject/instances/osm/createEphemeral?alt=json: oauth2: cannot fetch token: Post https://oauth2.googleapis.com/token: dial tcp 74.125.143.95:443: getsockopt: connection refused
2019/01/19 13:39:06 New connection for "myproject:europe-west4:osm"
2019/01/19 13:39:06 New connection for "myproject:europe-west4:osm"
2019/01/19 13:39:06 Client closed local connection on 127.0.0.1:5432
2019/01/19 13:39:13 New connection for "myproject:europe-west4:osm"
2019/01/19 13:39:14 New connection for "myproject:europe-west4:osm"
2019/01/19 13:39:14 New connection for "myproject:europe-west4:osm"
2019/01/19 13:39:14 New connection for "myproject:europe-west4:osm"
EDIT:
Here is the serverside log of the 503 of websocket calls to my app:
{
insertId: "465nu9g3xcn5hf"
jsonPayload: {
apiClaims: ""
apiKey: ""
clientTraceId: ""
connection_security_policy: "unknown"
destinationApp: "myapp"
destinationIp: "10.44.XX.XX"
destinationName: "myapp-888888-88888"
destinationNamespace: "default"
destinationOwner: "kubernetes://apis/extensions/v1beta1/namespaces/default/deployments/myapp"
destinationPrincipal: ""
destinationServiceHost: "myapp.default.svc.cluster.local"
destinationWorkload: "myapp"
httpAuthority: "35.204.XXX.XXX"
instance: "accesslog.logentry.istio-system"
latency: "1.508885ms"
level: "info"
method: "GET"
protocol: "http"
receivedBytes: 787
referer: ""
reporter: "source"
requestId: "bb31d922-8f5d-946b-95c9-83e4c022d955"
requestSize: 0
requestedServerName: ""
responseCode: 503
responseSize: 57
responseTimestamp: "2019-01-18T20:53:03.966513Z"
sentBytes: 164
sourceApp: "istio-ingressgateway"
sourceIp: "10.44.X.X"
sourceName: "istio-ingressgateway-8888888-88888"
sourceNamespace: "istio-system"
sourceOwner: "kubernetes://apis/extensions/v1beta1/namespaces/istio-system/deployments/istio-ingressgateway"
sourcePrincipal: ""
sourceWorkload: "istio-ingressgateway"
url: "/socket.io/?EIO=3&transport=websocket"
userAgent: "Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_1 like Mac OS X) AppleWebKit/603.1.30 (KHTML, like Gecko) Version/10.0 Mobile/14E304 Safari/602.1"
xForwardedFor: "10.44.X.X"
}
logName: "projects/myproject/logs/stdout"
metadata: {
systemLabels: {
container_image: "gcr.io/gke-release/istio/mixer:1.0.2-gke.0"
container_image_id: "docker-pullable://gcr.io/gke-release/istio/mixer#sha256:888888888888888888888888888888"
name: "mixer"
node_name: "gke-standard-cluster-1-default-pool-88888888888-8887"
provider_instance_id: "888888888888"
provider_resource_type: "gce_instance"
provider_zone: "europe-west4-a"
service_name: [
0: "istio-telemetry"
]
top_level_controller_name: "istio-telemetry"
top_level_controller_type: "Deployment"
}
userLabels: {
app: "telemetry"
istio: "mixer"
istio-mixer-type: "telemetry"
pod-template-hash: "88888888888"
}
}
receiveTimestamp: "2019-01-18T20:53:08.135805255Z"
resource: {
labels: {
cluster_name: "standard-cluster-1"
container_name: "mixer"
location: "europe-west4-a"
namespace_name: "istio-system"
pod_name: "istio-telemetry-8888888-8888888"
project_id: "myproject"
}
type: "k8s_container"
}
severity: "INFO"
timestamp: "2019-01-18T20:53:03.965100Z"
}
In the browser at first it properly seems to switch protocol but then causes a repeated 503 response and subsequent health issues cause a repeating restart. The protocol switch websocket call:
General:
Request URL: ws://localhost:8080/sockjs-node/842/s4888/websocket
Request Method: GET
Status Code: 101 Switching Protocols [GREEN]
Response headers:
Connection: Upgrade
Sec-WebSocket-Accept: NS8888888888888888888
Upgrade: websocket
Request headers:
Accept-Encoding: gzip, deflate, br
Accept-Language: nl-NL,nl;q=0.9,en-US;q=0.8,en;q=0.7
Cache-Control: no-cache
Connection: Upgrade
Cookie: _ga=GA1.1.1118102238.18888888; hblid=nSNQ2mS8888888888888; olfsk=ol8888888888
Host: localhost:8080
Origin: http://localhost:8080
Pragma: no-cache
Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits
Sec-WebSocket-Key: b8zkVaXlEySHasCkD4aUiw==
Sec-WebSocket-Version: 13
Upgrade: websocket
User-Agent: Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_1 like Mac OS X) AppleWebKit/603.1.30 (KHTML, like Gecko) Version/10.0 Mobile/14E304 Safari/602.1
Its frames:
Following the above I get multiple of these:
Chrome output regarding websocket call:
general:
Request URL: ws://35.204.210.134/socket.io/?EIO=3&transport=websocket
Request Method: GET
Status Code: 503 Service Unavailable
response headers:
connection: close
content-length: 19
content-type: text/plain
date: Sat, 19 Jan 2019 14:06:39 GMT
server: envoy
request headers:
Accept-Encoding: gzip, deflate
Accept-Language: nl-NL,nl;q=0.9,en-US;q=0.8,en;q=0.7
Cache-Control: no-cache
Connection: Upgrade
Host: 35.204.210.134
Origin: http://localhost:8080
Pragma: no-cache
Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits
Sec-WebSocket-Key: VtKS5xKF+GZ4u3uGih2fig==
Sec-WebSocket-Version: 13
Upgrade: websocket
User-Agent: Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_1 like Mac OS X) AppleWebKit/603.1.30 (KHTML, like Gecko) Version/10.0 Mobile/14E304 Safari/602.1
The frames:
Data: (Opcode -1)
Length: 63
Time: 15:06:44.412

Using uws (uWebSockets) as websocket engine causes these errors. When I swap in my backend app this code:
app.configure(socketio({
wsEngine: 'uws',
timeout: 120000,
reconnect: true
}))
for this:
app.configure(socketio())
Everything works as expected.
EDIT: Now it also works with uws. I used alpine docker container which is based on node 10, which does not work with uws. After switching to container based on node 8 it works.

Related

Sonarqube plugin installation using netrCreds

We are installing Sonarqube as a self managed service via the helm charts at https://SonarSource.github.io/helm-chart-sonarqube.. Sonarqube instance was working fine but we did a change to use netrc type Credentials to download plugins from JFrog artifactory after which our pods started failing.
Log details can be found as below
bash-3.2$ kubectl logs sonarqube-sonarqube-0 install-plugins -n sonarqube
sh: /opt/sonarqube/extensions/downloads/sonar-pmd-plugin-3.3.1.jar: unknown operand
curl: (22) The requested URL returned error: 403
bash-3.2$ kubectl exec sonarqube-sonarqube-0 -n sonarqube -- ls /opt/sonarqube/extensions/download
Defaulted container "sonarqube" out of: sonarqube, init-sysctl (init), concat-properties (init), inject-prometheus-exporter (init), init-fs (init), install-plugins (init)
error: unable to upgrade connection: container not found ("sonarqube")
NAME READY STATUS RESTARTS AGE
sonarqube-sonarqube-0 0/1 Init:CrashLoopBackOff 525 44h
Name: sonarqube-sonarqube-0
Namespace: sonarqube
Priority: 0
Node: ip-10-110-198-195.eu-west-1.compute.internal/10.110.198.195
Start Time: Sat, 10 Sep 2022 13:57:31 +0200
Labels: app=sonarqube
controller-revision-hash=sonarqube-sonarqube-6d6c785f6f
release=sonarqube
statefulset.kubernetes.io/pod-name=sonarqube-sonarqube-0
Annotations: checksum/config: 823d389fbc2ce9b41133d9542232fb023520659597f5473b44f9c0a870c2c6a7
checksum/init-fs: ad6cbc139b1960af56d3e813d56eb450949be388fa84686c48265d32e68cb895
checksum/init-sysctl: 3fc2c9dee4de70eed6b8b0b7112095ccbf69694166ee05c3e59ccfc7571461aa
checksum/plugins: 649c5fdb8f1b2f07b1999a8d5f7e56f9ae65d05e25d537fcdfc7e1c5ff6c9103
checksum/prometheus-ce-config: b2643e1c7fd0d26ede75ee98c7e646dfcb9255b1f73d1c51616dc3972499bb08
checksum/prometheus-config: 3f1303040aa8c859addcf37c7b82e376b3d90adcdc0b209fa251ca72ec9bee7e
checksum/secret: 7b9cfd0db7ecd7dc34ee86567e5bc93601ccca66047d3452801b6222fd44df84
kubernetes.io/psp: eks.privileged
Status: Pending
IP: 10.110.202.249
IPs:
IP: 10.110.202.249
Controlled By: StatefulSet/sonarqube-sonarqube
Init Containers:
init-sysctl:
Container ID: docker://3e66f63924be5c251a46cf054107951f5056f23a096b2f6c8c31b77842e0f29d
Image: leaseplan.jfrog.io/docker-hub/busybox:latest
Image ID: docker-pullable://leaseplan.jfrog.io/docker-hub/busybox#sha256:20142e89dab967c01765b0aea3be4cec3a5957cc330f061e5503ef6168ae6613
Port: <none>
Host Port: <none>
Command:
sh
-e
/tmp/scripts/init_sysctl.sh
State: Terminated
Reason: Completed
Exit Code: 0
Started: Sat, 10 Sep 2022 13:57:42 +0200
Finished: Sat, 10 Sep 2022 13:57:42 +0200
Ready: True
Restart Count: 0
Limits:
cpu: 50m
memory: 128Mi
Requests:
cpu: 20m
memory: 64Mi
Environment: <none>
Mounts:
/tmp/scripts/ from init-sysctl (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-n89wf (ro)
concat-properties:
Container ID: docker://b04f51eaa84bf4198437c7a782e0d186ea93337ac91cc6dae862b836fc6ef6a9
Image: leaseplan.jfrog.io/docker-hub/busybox:latest
Image ID: docker-pullable://leaseplan.jfrog.io/docker-hub/busybox#sha256:20142e89dab967c01765b0aea3be4cec3a5957cc330f061e5503ef6168ae6613
Port: <none>
Host Port: <none>
Command:
sh
-c
#!/bin/sh
if [ -f /tmp/props/sonar.properties ]; then
cat /tmp/props/sonar.properties > /tmp/result/sonar.properties
fi
if [ -f /tmp/props/secret.properties ]; then
cat /tmp/props/secret.properties > /tmp/result/sonar.properties
fi
if [ -f /tmp/props/sonar.properties -a -f /tmp/props/secret.properties ]; then
awk 1 /tmp/props/sonar.properties /tmp/props/secret.properties > /tmp/result/sonar.properties
fi
State: Terminated
Reason: Completed
Exit Code: 0
Started: Sat, 10 Sep 2022 13:57:43 +0200
Finished: Sat, 10 Sep 2022 13:57:43 +0200
Ready: True
Restart Count: 0
Limits:
cpu: 50m
memory: 128Mi
Requests:
cpu: 20m
memory: 64Mi
Environment: <none>
Mounts:
/tmp/props/sonar.properties from config (rw,path="sonar.properties")
/tmp/result from concat-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-n89wf (ro)
inject-prometheus-exporter:
Container ID: docker://22d8f7458c95d1d7ad096f2f804cac5fef64b889895274558739f691820786e0
Image: leaseplan.jfrog.io/docker-hub/curlimages/curl:7.76.1
Image ID: docker-pullable://leaseplan.jfrog.io/docker-hub/curlimages/curl#sha256:fa32ef426092b88ee0b569d6f81ab0203ee527692a94ec2e6ceb2fd0b6b2755c
Port: <none>
Host Port: <none>
Command:
/bin/sh
-c
Args:
curl -s 'https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.16.0/jmx_prometheus_javaagent-0.16.0.jar' --output /data/jmx_prometheus_javaagent.jar -v
State: Terminated
Reason: Completed
Exit Code: 0
Started: Sat, 10 Sep 2022 13:57:43 +0200
Finished: Sat, 10 Sep 2022 13:57:44 +0200
Ready: True
Restart Count: 0
Limits:
cpu: 50m
memory: 128Mi
Requests:
cpu: 20m
memory: 64Mi
Environment:
http_proxy:
https_proxy:
no_proxy:
Mounts:
/data from sonarqube (rw,path="data")
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-n89wf (ro)
init-fs:
Container ID: docker://2005fe2dbe2ca4c5150d91955563c9df948864ea65fca9d9bfa397b6f8699410
Image: leaseplan.jfrog.io/docker-hub/busybox:latest
Image ID: docker-pullable://leaseplan.jfrog.io/docker-hub/busybox#sha256:20142e89dab967c01765b0aea3be4cec3a5957cc330f061e5503ef6168ae6613
Port: <none>
Host Port: <none>
Command:
sh
-e
/tmp/scripts/init_fs.sh
State: Terminated
Reason: Completed
Exit Code: 0
Started: Sat, 10 Sep 2022 13:57:44 +0200
Finished: Sat, 10 Sep 2022 13:57:44 +0200
Ready: True
Restart Count: 0
Limits:
cpu: 50m
memory: 128Mi
Requests:
cpu: 20m
memory: 64Mi
Environment: <none>
Mounts:
/opt/sonarqube/data from sonarqube (rw,path="data")
/opt/sonarqube/extensions from sonarqube (rw,path="extensions")
/opt/sonarqube/logs from sonarqube (rw,path="logs")
/opt/sonarqube/temp from sonarqube (rw,path="temp")
/tmp from tmp-dir (rw)
/tmp/scripts/ from init-fs (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-n89wf (ro)
install-plugins:
Container ID: docker://58a6bed99749e3da7c4818a6f0e0061ac5bced70563020ccc55b4b63ab721125
Image: leaseplan.jfrog.io/docker-hub/curlimages/curl:7.76.1
Image ID: docker-pullable://leaseplan.jfrog.io/docker-hub/curlimages/curl#sha256:fa32ef426092b88ee0b569d6f81ab0203ee527692a94ec2e6ceb2fd0b6b2755c
Port: <none>
Host Port: <none>
Command:
sh
-e
/tmp/scripts/install_plugins.sh
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 22
Started: Mon, 12 Sep 2022 10:53:52 +0200
Finished: Mon, 12 Sep 2022 10:53:56 +0200
Ready: False
Restart Count: 525
Limits:
cpu: 50m
memory: 128Mi
Requests:
cpu: 20m
memory: 64Mi
Environment:
http_proxy:
https_proxy:
no_proxy:
Mounts:
/opt/sonarqube/extensions/downloads from sonarqube (rw,path="extensions/downloads")
/opt/sonarqube/lib/common from sonarqube (rw,path="lib/common")
/root from plugins-netrc-file (rw)
/tmp/scripts/ from install-plugins (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-n89wf (ro)
Containers:
sonarqube:
Container ID:
Image: leaseplan.jfrog.io/docker-hub/sonarqube:9.5.0-developer
Image ID:
Ports: 9000/TCP, 8000/TCP, 8001/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Limits:
cpu: 4
memory: 6Gi
Requests:
cpu: 1
memory: 4Gi
Liveness: http-get http://:http/api/system/liveness delay=60s timeout=1s period=30s #success=1 #failure=6
Readiness: exec [sh -c #!/bin/bash
# A Sonarqube container is considered ready if the status is UP, DB_MIGRATION_NEEDED or DB_MIGRATION_RUNNING
# status about migration are added to prevent the node to be kill while sonarqube is upgrading the database.
host="$(hostname -i || echo '127.0.0.1')"
if wget --proxy off -qO- http://${host}:9000/api/system/status | grep -q -e '"status":"UP"' -e '"status":"DB_MIGRATION_NEEDED"' -e '"status":"DB_MIGRATION_RUNNING"'; then
exit 0
fi
exit 1
] delay=60s timeout=1s period=30s #success=1 #failure=6
Startup: http-get http://:http/api/system/status delay=30s timeout=1s period=10s #success=1 #failure=24
Environment Variables from:
sonarqube-sonarqube-jdbc-config ConfigMap Optional: false
Environment:
SONAR_WEB_JAVAOPTS: -javaagent:/opt/sonarqube/data/jmx_prometheus_javaagent.jar=8000:/opt/sonarqube/conf/prometheus-config.yaml
SONAR_CE_JAVAOPTS: -javaagent:/opt/sonarqube/data/jmx_prometheus_javaagent.jar=8001:/opt/sonarqube/conf/prometheus-ce-config.yaml
SONAR_JDBC_PASSWORD: <set to the key 'password' in secret 'sonarqube-database'> Optional: false
SONAR_WEB_SYSTEMPASSCODE: <set to the key 'SONAR_WEB_SYSTEMPASSCODE' in secret 'sonarqube-sonarqube-monitoring-passcode'> Optional: false
Mounts:
/opt/sonarqube/conf/ from concat-dir (rw)
/opt/sonarqube/conf/prometheus-ce-config.yaml from prometheus-ce-config (rw,path="prometheus-ce-config.yaml")
/opt/sonarqube/conf/prometheus-config.yaml from prometheus-config (rw,path="prometheus-config.yaml")
/opt/sonarqube/data from sonarqube (rw,path="data")
/opt/sonarqube/extensions from sonarqube (rw,path="extensions")
/opt/sonarqube/logs from sonarqube (rw,path="logs")
/opt/sonarqube/temp from sonarqube (rw,path="temp")
/tmp from tmp-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-n89wf (ro)
Conditions:
Type Status
Initialized False
Ready False
ContainersReady False
PodScheduled True
Volumes:
config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: sonarqube-sonarqube-config
Optional: false
plugins-netrc-file:
Type: Secret (a volume populated by a Secret)
SecretName: eks-prv-0001-maven-local-default
Optional: false
init-sysctl:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: sonarqube-sonarqube-init-sysctl
Optional: false
init-fs:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: sonarqube-sonarqube-init-fs
Optional: false
install-plugins:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: sonarqube-sonarqube-install-plugins
Optional: false
prometheus-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: sonarqube-sonarqube-prometheus-config
Optional: false
prometheus-ce-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: sonarqube-sonarqube-prometheus-ce-config
Optional: false
sonarqube:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: sonarqube-sonarqube
ReadOnly: false
tmp-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
concat-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-n89wf:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulled 29m (x521 over 44h) kubelet Container image "leaseplan.jfrog.io/docker-hub/curlimages/curl:7.76.1" already present on machine
Warning BackOff 4m39s (x12224 over 44h) kubelet Back-off restarting failed container

Microk8s Pod fails to start when i create a service for it with 1000 UDP ports

I am creating a deployment with a custom image i have in a private registry, the container will have lots of ports that need to be exposed, i want to expose them with a NodePort service, if i create a service with 1000 UDP ports then create the deployment, the deployment pod will keep crashing, if i delete the service and the deployment then create the deployment only without the service, the pod starts normally.
Any clue why would this be happening ?
Pod Description:
Name: freeswitch-7764cff4c9-d8zvh
Namespace: default
Priority: 0
Node: cc-lab/192.168.102.55
Start Time: Wed, 01 Jun 2022 15:44:09 +0000
Labels: app=freeswitch
pod-template-hash=7764cff4c9
Annotations: cni.projectcalico.org/containerID: de4baf5c4522e1f3c746a08a60bd7166179bac6c4aef245708205112ad71058a
cni.projectcalico.org/podIP: 10.1.5.8/32
cni.projectcalico.org/podIPs: 10.1.5.8/32
Status: Running
IP: 10.1.5.8
IPs:
IP: 10.1.5.8
Controlled By: ReplicaSet/freeswitch-7764cff4c9
Containers:
freeswtich:
Container ID: containerd://9cdae9120cc075af73d57ea0759b89c153c8fd5766bc819554d82fdc674e03be
Image: 192.168.102.55:32000/freeswitch:v2
Image ID: 192.168.102.55:32000/freeswitch#sha256:e6a36d220f4321e3c17155a889654a83dc37b00fb9d58171f969ec2dccc0a774
Port: <none>
Host Port: <none>
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 139
Started: Wed, 01 Jun 2022 15:47:16 +0000
Finished: Wed, 01 Jun 2022 15:47:20 +0000
Ready: False
Restart Count: 5
Environment: <none>
Mounts:
/etc/freeswitch from freeswitch-config (rw)
/tmp from freeswitch-tmp (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-mwkc8 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
freeswitch-config:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: freeswitch-config
ReadOnly: false
freeswitch-tmp:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: freeswitch-tmp
ReadOnly: false
kube-api-access-mwkc8:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulled 4m3s (x5 over 5m44s) kubelet Container image "192.168.102.55:32000/freeswitch:v2" already present on machine
Normal Created 4m3s (x5 over 5m43s) kubelet Created container freeswtich
Normal Started 4m3s (x5 over 5m43s) kubelet Started container freeswtich
Warning BackOff 41s (x24 over 5m35s) kubelet Back-off restarting failed container
Service :
apiVersion: v1
kind: Service
metadata:
name: freeswitch
spec:
type: NodePort
selector:
app: freeswitch
ports:
- port: 30000
nodePort: 30000
name: rtp30000
protocol: UDP
- port: 30001
nodePort: 30001
name: rtp30001
protocol: UDP
- port: 30002
nodePort: 30002
name: rtp30002
protocol: UDP
- port: 30003
nodePort: 30003
name: rtp30003
protocol: UDP
- port: 30004...... this goes on for port 30999
Deployment :
apiVersion: apps/v1
kind: Deployment
metadata:
name: freeswitch
spec:
selector:
matchLabels:
app: freeswitch
template:
metadata:
labels:
app: freeswitch
spec:
containers:
- name: freeswtich
image: 192.168.102.55:32000/freeswitch:v2
imagePullPolicy: IfNotPresent
volumeMounts:
- name: freeswitch-config
mountPath: /etc/freeswitch
- name: freeswitch-tmp
mountPath: /tmp
restartPolicy: Always
volumes:
- name: freeswitch-config
persistentVolumeClaim:
claimName: freeswitch-config
- name: freeswitch-tmp
persistentVolumeClaim:
claimName: freeswitch-tmp

Greenplum Operator on kubernetes zapr error

I am trying to deploy Greenplum Operator on kubernetes and I get the following error:
kubectl describe pod greenplum-operator-87d989b4d-ldft6:
Name: greenplum-operator-87d989b4d-ldft6
Namespace: greenplum
Priority: 0
Node: node-1/some-ip
Start Time: Mon, 23 May 2022 14:07:26 +0200
Labels: app=greenplum-operator
pod-template-hash=87d989b4d
Annotations: cni.projectcalico.org/podIP: some-ip
cni.projectcalico.org/podIPs: some-ip
Status: Running
IP: some-ip
IPs:
IP: some-ip
Controlled By: ReplicaSet/greenplum-operator-87d989b4d
Containers:
greenplum-operator:
Container ID: docker://364997050b1f337ff61b8ce40534697bbc13aae29f7b9ae5255245375acce03f
Image: greenplum-operator:v2.3.0
Image ID: docker-pullable://greenplum-operator:v2.3.0
Port: <none>
Host Port: <none>
Command:
greenplum-operator
--logLevel
debug
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Mon, 23 May 2022 15:29:59 +0200
Finished: Mon, 23 May 2022 15:30:32 +0200
Ready: False
Restart Count: 19
Environment:
GREENPLUM_IMAGE_REPO: greenplum-operator:v2.3.0
GREENPLUM_IMAGE_TAG: v2.3.0
OPERATOR_IMAGE_REPO: greenplum-operator:v2.3.0
OPERATOR_IMAGE_TAG: v2.3.0
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from greenplum-system-operator-token-xcz4q (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
greenplum-system-operator-token-xcz4q:
Type: Secret (a volume populated by a Secret)
SecretName: greenplum-system-operator-token-xcz4q
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 32s (x340 over 84m) kubelet Back-off restarting failed container
kubectl logs greenplum-operator-87d989b4d-ldft6
{"level":"INFO","ts":"2022-05-23T13:35:38.735Z","logger":"setup","msg":"Go Info","Version":"go1.14.10","GOOS":"linux","GOARCH":"amd64"}
{"level":"INFO","ts":"2022-05-23T13:35:41.242Z","logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":":8080"}
{"level":"INFO","ts":"2022-05-23T13:35:41.262Z","logger":"setup","msg":"starting manager"}
{"level":"INFO","ts":"2022-05-23T13:35:41.262Z","logger":"admission","msg":"starting greenplum validating admission webhook server"}
{"level":"INFO","ts":"2022-05-23T13:35:41.262Z","logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"greenplumpxfservice","source":"kind source: /, Kind="}
{"level":"INFO","ts":"2022-05-23T13:35:41.264Z","logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"greenplumplservice","source":"kind source: /, Kind="}
{"level":"INFO","ts":"2022-05-23T13:35:41.264Z","logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"greenplumcluster","source":"kind source: /, Kind="}
{"level":"INFO","ts":"2022-05-23T13:35:41.262Z","logger":"controller-runtime.manager","msg":"starting metrics server","path":"/metrics"}
{"level":"INFO","ts":"2022-05-23T13:35:41.265Z","logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"greenplumtextservice","source":"kind source: /, Kind="}
{"level":"INFO","ts":"2022-05-23T13:35:41.361Z","logger":"admission","msg":"CertificateSigningRequest: created"}
{"level":"INFO","ts":"2022-05-23T13:35:41.363Z","logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"greenplumpxfservice","source":"kind source: /, Kind="}
{"level":"INFO","ts":"2022-05-23T13:35:41.364Z","logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"greenplumplservice","source":"kind source: /, Kind="}
{"level":"INFO","ts":"2022-05-23T13:35:41.364Z","logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"greenplumcluster","source":"kind source: /, Kind="}
{"level":"INFO","ts":"2022-05-23T13:35:41.366Z","logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"greenplumtextservice","source":"kind source: /, Kind="}
{"level":"INFO","ts":"2022-05-23T13:35:41.464Z","logger":"controller-runtime.controller","msg":"Starting Controller","controller":"greenplumpxfservice"}
{"level":"INFO","ts":"2022-05-23T13:35:41.464Z","logger":"controller-runtime.controller","msg":"Starting Controller","controller":"greenplumplservice"}
{"level":"INFO","ts":"2022-05-23T13:35:41.465Z","logger":"controller-runtime.controller","msg":"Starting workers","controller":"greenplumplservice","worker count":1}
{"level":"INFO","ts":"2022-05-23T13:35:41.465Z","logger":"controller-runtime.controller","msg":"Starting Controller","controller":"greenplumcluster"}
{"level":"INFO","ts":"2022-05-23T13:35:41.465Z","logger":"controller-runtime.controller","msg":"Starting workers","controller":"greenplumpxfservice","worker count":1}
{"level":"INFO","ts":"2022-05-23T13:35:41.465Z","logger":"controller-runtime.controller","msg":"Starting workers","controller":"greenplumcluster","worker count":1}
{"level":"INFO","ts":"2022-05-23T13:35:41.466Z","logger":"controller-runtime.controller","msg":"Starting Controller","controller":"greenplumtextservice"}
{"level":"INFO","ts":"2022-05-23T13:35:41.466Z","logger":"controller-runtime.controller","msg":"Starting workers","controller":"greenplumtextservice","worker count":1}
{"level":"ERROR","ts":"2022-05-23T13:36:11.368Z","logger":"setup","msg":"error","error":"getting certificate for webhook: failure while waiting for approval: timed out waiting for the condition","errorCauses":[{"error":"getting certificate for webhook: failure while waiting for approval: timed out waiting for the condition"}],"stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/zapr#v0.1.0/zapr.go:128\nmain.main\n\t/greenplum-for-kubernetes/greenplum-operator/cmd/greenplumOperator/main.go:35\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203"}
I tried to redeploy the cert-manager and check logs but couldn't find anything. Documentation of the greenplum-for-kubernetes doesn't mention anything about that. Read the whole troubleshooting document from the pivotal website too

OpenShift Ansible-based operator hangs inconsistently

I have an Ansible-based operator running within an OpenShift 4.2 cluster.
Most of times when I apply the relevant CR, the operator runs perfectly.
Occasionally though the operator hangs without reporting any further logs.
The step where this happens is the same however the problem is that this happens inconsistently without any other factors involved and I am not sure how to diagnose it.
Restarting the operator always resolves the issue, but I wonder if there's anything I could do to diagnose it and prevent this from happening altogether?
- name: allow Pods to reference images in myproject project
k8s:
definition:
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: "system:image-puller-{{ meta.name }}"
namespace: myproject
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: system:image-puller
subjects:
- apiGroup: rbac.authorization.k8s.io
kind: Group
name: "system:serviceaccounts:{{ meta.name }}"
The operator's logs simply hang right after the above step and right before the following step:
- name: fetch some-secret
set_fact:
some_secret: "{{ lookup('k8s', kind='Secret', namespace='myproject', resource_name='some-secret') }}"
oc describe is as follows
oc describe -n openshift-operators pod my-ansible-operator-849b44d6cc-nr5st
Name: my-ansible-operator-849b44d6cc-nr5st
Namespace: openshift-operators
Priority: 0
PriorityClassName: <none>
Node: worker1.openshift.mycompany.com/10.0.8.21
Start Time: Wed, 10 Jun 2020 22:35:45 +0100
Labels: name=my-ansible-operator
pod-template-hash=849b44d6cc
Annotations: k8s.v1.cni.cncf.io/networks-status:
[{
"name": "openshift-sdn",
"interface": "eth0",
"ips": [
"10.254.20.128"
],
"default": true,
"dns": {}
}]
Status: Running
IP: 10.254.20.128
Controlled By: ReplicaSet/my-ansible-operator-849b44d6cc
Containers:
ansible:
Container ID: cri-o://63b86ddef4055be4bcd661a3fcd70d525f9788cb96b7af8dd383ac08ea670047
Image: image-registry.openshift-image-registry.svc:5000/openshift-operators/my-ansible-operator:v0.0.1
Image ID: image-registry.openshift-image-registry.svc:5000/openshift-operators/my-ansible-operator#sha256:fda68898e6fe0c61760fe8c50fd0a55de392e63635c5c8da47fdb081cd126b5a
Port: <none>
Host Port: <none>
Command:
/usr/local/bin/ao-logs
/tmp/ansible-operator/runner
stdout
State: Running
Started: Wed, 10 Jun 2020 22:35:56 +0100
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/tmp/ansible-operator/runner from runner (ro)
/var/run/secrets/kubernetes.io/serviceaccount from my-ansible-operator-token-vbwlr (ro)
operator:
Container ID: cri-o://365077a3c1d83b97428d27eebf2f0735c9d670d364b16fad83fff5bb02b479fe
Image: image-registry.openshift-image-registry.svc:5000/openshift-operators/my-ansible-operator:v0.0.1
Image ID: image-registry.openshift-image-registry.svc:5000/openshift-operators/my-ansible-operator#sha256:fda68898e6fe0c61760fe8c50fd0a55de392e63635c5c8da47fdb081cd126b5a
Port: <none>
Host Port: <none>
State: Running
Started: Wed, 10 Jun 2020 22:35:57 +0100
Ready: True
Restart Count: 0
Environment:
WATCH_NAMESPACE: openshift-operators (v1:metadata.namespace)
POD_NAME: my-ansible-operator-849b44d6cc-nr5st (v1:metadata.name)
OPERATOR_NAME: my-ansible-operator
ANSIBLE_GATHERING: explicit
Mounts:
/tmp/ansible-operator/runner from runner (rw)
/var/run/secrets/kubernetes.io/serviceaccount from my-ansible-operator-token-vbwlr (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
runner:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
my-ansible-operator-token-vbwlr:
Type: Secret (a volume populated by a Secret)
SecretName: my-ansible-operator-token-vbwlr
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events: <none>
Is there anything else I could do to diagnose the problem further or prevent the operator from hanging occasionally?
I found a very similar issue in the operator-sdk repository, linking to the root cause in the Ansible k8s module:
Ansible 2.7 stuck on Python 3.7 in docker-ce
From the discussion in the issue it seems that the problem is related to tasks that do not time out and the current workaround seems to be:
For now we just override ansible local connection and normal action plugins, so:
all communicate() calls have 60 second timeout
all raised TimeoutExpired exceptions are retried a few times
Can you check if this resolves your issue? As the issue is still "open", you might need to reach out to the issue as well.

Elasticsearch enable security issues

I have a Elasticsearch 7.6 cluster installed base on
https://github.com/openstack/openstack-helm-infra/tree/master/elasticsearch
Following is what I did to enable security:
a. Generate certificate
./bin/elasticsearch-certutil ca
File location: /usr/share/elasticsearch/elastic-stack-ca.p12
./bin/elasticsearch-certutil cert --ca elastic-stack-ca.p12
File location: /usr/share/elasticsearch/elastic-certificates.p12
kubectl create secret generic elastic-certificates --from-file=elastic-certificates.p12
b. Enable Security on statefulset for master pod
kubectl edit statefulset elasticsearch-master
----
- name: xpack.security.enabled
value: "true"
- name: xpack.security.transport.ssl.enabled
value: "true"
- name: xpack.security.transport.ssl.verification_mode
value: certificate
- name: xpack.security.transport.ssl.keystore.path
value: /usr/share/elasticsearch/config/certs/elastic-certificates.p12
- name: xpack.security.transport.ssl.truststore.path
value: /usr/share/elasticsearch/config/certs/elastic-certificates.p12
----
- mountPath: /usr/share/elasticsearch/config/certs
name: elastic-certificates
readOnly: true
----
- name: elastic-certificates
secret:
defaultMode: 444
secretName: elastic-certificates
c. Enable security on statefulset for data pod
kubectl edit statefulset elasticsearch-data
----
- name: xpack.security.enabled
value: "true"
- name: xpack.security.transport.ssl.enabled
value: "true"
- name: xpack.security.transport.ssl.verification_mode
value: certificate
----
- mountPath: /usr/share/elasticsearch/config/certs
name: elastic-certificates
----
- name: elastic-certificates
secret:
defaultMode: 444
secretName: elastic-certificates
d. Enable security on deployment for client
kubectl edit deployment elasticsearch-client
----
- name: xpack.security.enabled
value: "true"
- name: xpack.security.transport.ssl.enabled
value: "true"
- name: xpack.security.transport.ssl.verification_mode
value: certificate
- name: xpack.security.transport.ssl.keystore.path
value: /usr/share/elasticsearch/config/certs/elastic-certificates.p12
- name: xpack.security.transport.ssl.truststore.path
value: /usr/share/elasticsearch/config/certs/elastic-certificates.p12
----
- mountPath: /usr/share/elasticsearch/config/certs
name: elastic-certificates
----
- name: elastic-certificates
secret:
defaultMode: 444
secretName: elastic-certificates
After pods restart, I got the following issue:
a. data pots are stuck in init stage
kubectl get pod |grep data
elasticsearch-data-0 1/1 Running 0 42m
elasticsearch-data-1 0/1 Init:0/3 0 10m
kubectl logs elasticsearch-data-1 -c init |tail -1
Entrypoint WARNING: <date/time> entrypoint.go:72: Resolving dependency Service elasticsearch-logging in namespace osh-infra failed: Service elasticsearch-logging has no endpoints .
b. Client pod errors regarding connection refused
Warning Unhealthy 18m (x4 over 19m) kubelet, s1-worker-2 Readiness probe failed: Get http://192.180.71.82:9200/_cluster/health: dial tcp 192.180.71.82:9200: connect: connection refused
Warning Unhealthy 4m17s (x86 over 18m) kubelet, s1-worker-2 Readiness probe failed: HTTP probe failed with statuscode: 401
c. Service "elasticsearch-logging" endpoints is empty
Any suggestions how to fix or what is wrong?
Thanks.

Resources